CUDA usage is low

#28

by Max545 - opened Jul 15, 2024

Jul 15, 2024

When I train Gemma 2, GPU usage is low (0% most of the time). When I use the same method (LoRA via the peft library) to train Llama, GPU usage stays at about 100%. What is the reason?

GopiUppari

Google org Oct 1, 2024

Hi @Max545 ,

I ran both models on one GPU of type NVIDIA_TESLA_A100. With models such as google/gemma-2b and meta-llama/Llama-2-7b-hf, if you do not specify a device (for example with device_map="auto"), the model is loaded on the CPU and uses system RAM instead of the GPU. If you explicitly set device="cuda", the model runs on the GPU and uses its compute for faster processing. Please refer to the following gist for more details: link to gist.
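To confirm where a loaded model actually lives, you can count its parameters per device. The snippet below is only a minimal sketch: `summarize_devices` is a hypothetical helper (not part of transformers or peft), and it assumes the model exposes `named_parameters()` with `.device` and `.numel()` on each parameter, as PyTorch modules do.

```python
from collections import Counter


def summarize_devices(model):
    """Return {device_string: parameter_count} for a model.

    Hypothetical helper: works with any object whose named_parameters()
    yields (name, param) pairs where param has .device and .numel(),
    which is the case for torch.nn.Module.
    """
    counts = Counter()
    for _name, param in model.named_parameters():
        counts[str(param.device)] += param.numel()
    return dict(counts)


# Usage sketch (assumes transformers and a CUDA GPU are available):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto")
#   print(summarize_devices(model))
```

If every parameter is reported under "cpu", training will run on the CPU and GPU usage will stay near 0%.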

If both models are on the GPU and the usage still differs during LoRA fine-tuning, several factors could explain it:

  Model architecture: The LLaMA implementation may be better optimized for the GPU, while the Gemma2 implementation may not be as well tuned for GPU-heavy workloads.
  Memory bottlenecks: Inefficient memory management or slow data transfer between CPU and GPU in the Gemma2 run can leave the GPU idle and lower its usage.
  Framework support: LLaMA may have more mature support in the PEFT library and related tools, which could lead to better GPU utilization than with Gemma2.
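To check the memory-bottleneck point, you can measure how much of each step is spent waiting for data versus running the step itself. This is a rough sketch, not a real profiler: `time_training_loop`, `step_fn`, and `sync_fn` are hypothetical names. On a GPU you would pass `torch.cuda.synchronize` as `sync_fn`, because CUDA kernels run asynchronously and the timings are misleading without it.

```python
import time


def time_training_loop(batches, step_fn, sync_fn=None):
    """Split wall-clock time into data wait and compute.

    batches: any iterable yielding batches (e.g. a DataLoader).
    step_fn: callable that runs one training step on a batch.
    sync_fn: optional callable run after each step, e.g.
             torch.cuda.synchronize, so GPU work is finished
             before the timer is read.
    """
    data_wait = 0.0
    compute = 0.0
    steps = 0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        if sync_fn is not None:
            sync_fn()
        t2 = time.perf_counter()
        data_wait += t1 - t0
        compute += t2 - t1
        steps += 1
    return {"steps": steps, "data_wait_s": data_wait, "compute_s": compute}
```

If `data_wait_s` dominates, the GPU is mostly idle waiting for input, and the data pipeline is a more likely cause than the model itself.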

Thank you.
