G4 VM’s P2P fabric boosts multi-GPU workloads
Today, we announced the general availability of the G4 VM family based on NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Thanks to unique platform optimizations only available in Google Cloud, G4 VMs deliver the best performance of any commercially available NVIDIA RTX PRO 6000 Blackwell GPU offering for inference and fine-tuning on a wide range of models, from fewer than 30B to over 100B parameters. In this post, we discuss the need for these platform optimizations, how they work, and how to use them in your own environment.
Collective communications performance matters
Large language models (LLMs) vary significantly in size, as characterized by their number of parameters: small (~7B), medium (~70B), and large (~350B+). LLMs often exceed the memory capacity of a single GPU, including the NVIDIA RTX PRO 6000 Blackwell’s, with its 96GB of GDDR7 memory. A common solution is to use tensor parallelism, or TP, which ...
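To make the memory-capacity point concrete, here is a rough back-of-the-envelope sketch. It assumes 16-bit weights (2 bytes per parameter) and counts only the weights, ignoring the KV cache, activations, and framework overhead, so real requirements are higher. The 96GB figure and the three model sizes come from the text above; the function names and the 2-byte assumption are illustrative only.

```python
import math

GPU_MEMORY_GB = 96    # from the post: RTX PRO 6000 Blackwell, 96GB of GDDR7
BYTES_PER_PARAM = 2   # assumption: 16-bit (FP16/BF16) weights


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params, gpu_memory_gb=GPU_MEMORY_GB,
                         bytes_per_param=BYTES_PER_PARAM):
    """Lower bound on the number of GPUs needed just to hold the weights."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # Model sizes as characterized in the text: small, medium, large.
    for name, params in [("small", 7e9), ("medium", 70e9), ("large", 350e9)]:
        print(f"{name}: {weight_memory_gb(params):.0f} GB of weights, "
              f"at least {min_gpus_for_weights(params)} GPU(s)")
```

Under these assumptions, a ~70B-parameter model already needs about 140 GB for its weights, more than a single 96GB GPU can hold, which is why the weights have to be split across multiple GPUs.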
Copyright of this story belongs solely to Google Cloud Blog.