Microsoft Azure has officially launched its new NDv6 GB300 virtual machine (VM) series. The launch marks the industry’s first supercomputing-scale production cluster built on Nvidia’s GB300 NVL72 systems. The new VMs are designed to handle OpenAI’s most demanding artificial intelligence (AI) inference tasks, with a focus on reasoning models, advanced agentic AI systems, and multimodal generative AI workloads. The architecture represents a significant leap for Azure, coming less than a year after the introduction of its ND GB200 v6 VMs.
Microsoft Azure Boosts Cloud AI Capabilities with Cutting-Edge Nvidia Hardware
In a recent blog post, Microsoft’s Azure cloud division detailed the development of these next-generation virtual machines. The core of this immense cluster is a fleet of Nvidia GB300 NVL72 rack-scale systems housing more than 4,600 of the company’s Blackwell Ultra GPUs, interconnected through a high-speed InfiniBand network. Microsoft anticipates the cluster will cut AI model training times from months to weeks, while also providing the inference throughput needed to serve models with “hundreds of trillions of parameters.”
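As a rough sanity check on the months-to-weeks claim, the back-of-envelope below combines the figures quoted in this article (roughly 4,600 GPUs, 1,440 PFLOPS of FP4 compute per 72-GPU rack) with the common ~6·N·D approximation for training FLOPs. The model size, token count, and utilization are illustrative assumptions, not published numbers, and FP4 peak overstates realistic training throughput:

```python
# Back-of-envelope training-time estimate from the article's figures.
# Model size, token count, and utilization below are assumptions for
# illustration only; training normally runs at FP8/BF16, not FP4 peak.

GPUS = 4_600
RACK_PFLOPS = 1_440                         # FP4 peak per 72-GPU rack
racks = GPUS // 72                          # ~63 racks
peak_flops = racks * RACK_PFLOPS * 1e15     # cluster peak, FLOP/s

params = 1e12           # assumed: a 1-trillion-parameter model
tokens = 20e12          # assumed: 20T training tokens
utilization = 0.3       # assumed: 30% of peak sustained

train_flops = 6 * params * tokens           # common ~6*N*D approximation
seconds = train_flops / (peak_flops * utilization)
print(f"{racks} racks, {peak_flops / 1e18:.0f} EFLOPS peak")
print(f"Estimated training time: {seconds / 86_400:.0f} days")
```

Under these assumptions the run lands around seven weeks, which is at least consistent with the scale of speedup Microsoft is describing.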
The system’s design is built on a rack-scale architecture. Each rack houses 18 virtual machines, collectively equipped with 72 GPUs and 36 Nvidia Grace CPUs. For traffic leaving the rack, every GPU gets a dedicated 800 Gb/s of scale-out bandwidth via Nvidia’s Quantum-X800 InfiniBand, twice the per-GPU bandwidth of the previous GB200 NVL72 generation.
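Those per-rack figures compose straightforwardly; the sketch below derives the per-VM split and the rack’s aggregate scale-out bandwidth using only the numbers quoted above:

```python
# Per-rack topology, derived only from figures quoted in the article.
VMS_PER_RACK = 18
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
SCALE_OUT_GBPS_PER_GPU = 800   # Quantum-X800 InfiniBand, per GPU

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK      # 4 GPUs per VM
gpus_per_cpu = GPUS_PER_RACK // CPUS_PER_RACK    # 2 GPUs per Grace CPU
rack_scale_out_tbps = GPUS_PER_RACK * SCALE_OUT_GBPS_PER_GPU / 1_000

print(f"{gpus_per_vm} GPUs per VM, {gpus_per_cpu} GPUs per Grace CPU")
print(f"Aggregate cross-rack bandwidth: {rack_scale_out_tbps:.1f} Tb/s "
      f"(~{rack_scale_out_tbps / 8:.1f} TB/s)")
```

That works out to 4 GPUs per VM and roughly 7.2 TB/s of cross-rack bandwidth per rack.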
Within each rack, ultra-fast NVLink connections facilitate data transfer at 130 TB/s, and a pool of 37 TB of fast memory is available for the largest models. This configuration allows a single rack to deliver up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core compute, positioning it as one of the world’s fastest platforms for AI tasks.
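Dividing those rack-level figures across the 72 GPUs gives a feel for what each accelerator contributes; notably, the per-GPU NVLink share of roughly 1.8 TB/s matches the published per-GPU bandwidth of fifth-generation NVLink on Blackwell:

```python
# Per-GPU shares of the rack-level figures quoted in the article.
GPUS = 72
NVLINK_TBPS = 130      # intra-rack NVLink bandwidth, TB/s
FAST_MEMORY_TB = 37    # pooled fast memory per rack
FP4_PFLOPS = 1_440     # FP4 Tensor Core peak per rack

print(f"NVLink per GPU:  {NVLINK_TBPS / GPUS:.1f} TB/s")          # ~1.8 TB/s
print(f"Memory per GPU:  {FAST_MEMORY_TB * 1000 / GPUS:.0f} GB")  # ~514 GB
print(f"FP4 per GPU:     {FP4_PFLOPS / GPUS:.0f} PFLOPS")         # 20 PFLOPS
```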
The NVLink and NVSwitch fabric within each rack lets the 72 GPUs communicate directly with one another, so the full 37 TB of memory can be shared at up to 130 TB/s. Such tight integration empowers AI models to tackle larger tasks, process longer sequences of information, and efficiently run complex agentic (self-directed decision-making AI) or multimodal (AI processing text, images, and audio together) workloads with minimal latency.
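To make the pooled memory concrete, the estimate below asks how large a model could reside in a single rack at inference time. The FP4 weight size (0.5 bytes per parameter) is standard arithmetic, but the fraction of memory left over after KV cache and activations is purely an assumption:

```python
# Capacity estimate: how many FP4 parameters fit in one rack's memory pool.
# The weight_fraction is an assumption, not a figure from the article.
FAST_MEMORY_TB = 37
BYTES_PER_PARAM_FP4 = 0.5      # 4-bit weights
weight_fraction = 0.6          # assumed share left after KV cache/activations

budget_bytes = FAST_MEMORY_TB * 1e12 * weight_fraction
max_params = budget_bytes / BYTES_PER_PARAM_FP4
print(f"Roughly {max_params / 1e12:.0f}T parameters fit in one rack")  # ~44T
```

Spread across many racks, capacity on that order is what puts models with “hundreds of trillions of parameters” within reach.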
To scale operations beyond a single rack, Microsoft Azure employs a full fat-tree, non-blocking network architecture, ensuring seamless and unhindered communication across all racks. This network, driven by InfiniBand, allows AI training to scale effectively across tens of thousands of GPUs while maintaining minimal communication delays. By significantly reducing synchronization overhead (the time GPUs spend waiting for each other), the GPUs can dedicate more time to computation, accelerating the training of colossal AI models and reducing overall costs for researchers.
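A simple cost model shows why per-GPU bandwidth translates directly into less waiting. The sketch below uses the classic ring all-reduce estimate for gradient synchronization; the gradient size, step time, and GPU count are illustrative assumptions rather than Azure’s figures:

```python
# Why bandwidth cuts synchronization overhead: a ring all-reduce model.
# All concrete values below are illustrative assumptions.
def allreduce_seconds(size_bytes: float, n_gpus: int, bw_bytes_per_s: float) -> float:
    """Ring all-reduce moves roughly 2*(N-1)/N of the buffer per GPU."""
    return 2 * (n_gpus - 1) / n_gpus * size_bytes / bw_bytes_per_s

grads = 200e9           # assumed: 200 GB of gradients (100B params at FP16)
step_compute = 10.0     # assumed: 10 s of pure compute per training step

for gbps in (400, 800):                      # per-GPU scale-out bandwidth
    t_comm = allreduce_seconds(grads, 10_000, gbps / 8 * 1e9)
    idle = t_comm / (t_comm + step_compute)
    print(f"{gbps} Gb/s per GPU: {t_comm:.1f} s syncing, "
          f"{idle:.0%} of each step spent idle")
```

Doubling per-GPU bandwidth from 400 to 800 Gb/s halves the sync time in this toy model, cutting idle time from roughly 44% to 29% of each step.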
Azure’s meticulously co-designed stack incorporates custom protocols, specialized collective libraries, and advanced in-network computing features to guarantee network reliability and optimal utilization. Furthermore, Microsoft’s innovative cooling systems integrate standalone heat exchanger units with existing facility cooling to minimize water consumption. On the software front, the company has completely reengineered its stacks for storage, orchestration, and scheduling to complement this powerful new hardware.
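In-network computing refers to offloading collective operations into the switches themselves (Nvidia’s SHARP is the canonical example), so each GPU injects its data once rather than circulating it around a ring. A deliberately simplified comparison of bytes sent per GPU, with an assumed buffer size:

```python
# Simplified view of in-network reduction (SHARP-style) vs. ring all-reduce,
# counting bytes *sent* per GPU; receives are symmetric. Buffer size and GPU
# count are illustrative assumptions.
def ring_bytes_sent_per_gpu(size_bytes: float, n: int) -> float:
    # Each GPU forwards ~2*(N-1)/N of the buffer around the ring.
    return 2 * (n - 1) / n * size_bytes

def in_network_bytes_sent_per_gpu(size_bytes: float) -> float:
    # Each GPU sends its buffer once; the switches perform the reduction.
    return size_bytes

size = 200e9   # assumed: 200 GB gradient buffer
n = 10_000     # assumed: GPU count
print(f"Ring all-reduce:      {ring_bytes_sent_per_gpu(size, n) / 1e9:.0f} GB sent per GPU")
print(f"In-network reduction: {in_network_bytes_sent_per_gpu(size) / 1e9:.0f} GB sent per GPU")
```

In this simplified model, in-network aggregation roughly halves the data each GPU must push through the fabric, which is one reason such features improve effective network utilization at scale.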