You may need to use the gpu_memory_limit and/or lora_on_cpu config options to avoid running out of memory. If you still run out of CUDA memory, you can try to merge in system RAM instead.
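One common way to keep a merge in system RAM is to hide every GPU from the process, so that CUDA-aware libraries fall back to the CPU. The sketch below assumes this is the mechanism intended here; the helper names `cpu_only_env` and `run_on_cpu` are hypothetical, and the merge command itself is left as a placeholder because the text above does not give it.

```python
import os
import subprocess


def cpu_only_env(base=None):
    """Return a copy of the environment with all CUDA devices hidden.

    Hypothetical helper: an empty CUDA_VISIBLE_DEVICES makes the CUDA
    runtime report no GPUs, so tensors are allocated in system RAM.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_on_cpu(command):
    """Run `command` (a list of arguments) with GPUs hidden.

    Hypothetical helper: `command` stands in for whatever merge
    command your setup uses; it is not specified here.
    """
    return subprocess.run(command, env=cpu_only_env(), check=True)
```

Expect a CPU-only merge to be much slower than one on the GPU, so it is usually worth trying the two config options first.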