Elon Musk announces GROK 3 training at Memphis with NVIDIA H100 GPUs

Elon Musk has officially announced the start of GROK 3 training at the Memphis supercomputer facility, which is equipped with NVIDIA’s current-generation H100 GPUs. The facility, which Musk calls ‘the most powerful AI training cluster in the world,’ began operating on Monday with 100,000 liquid-cooled H100 GPUs on a single RDMA fabric.

Training kicked off at 4:20 am local time in Memphis. In a follow-up post on X, Musk said the world’s “most advanced AI” could be ready by December of this year, and congratulated the teams from xAI, X, and NVIDIA on their excellent work.

xAI shifts strategy and cancels Oracle server deal

The announcement comes in the wake of the recent cancellation of a $10 billion server deal between xAI and Oracle. Musk indicated that the xAI Gigafactory of Compute, initially expected to be operational by the fall of 2025, has started operations ahead of schedule.

xAI had previously rented AI compute from Oracle but decided to part ways in order to build its own advanced supercomputer. The project now plans to harness state-of-the-art H100 GPUs, which cost around $30,000 each. GROK 2 used 20,000 GPUs, and GROK 3 required five times as many to build a more sophisticated AI chatbot.
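The scale-up implied by those figures can be sketched with a quick back-of-the-envelope calculation. The GPU counts and the roughly $30,000 unit price come from the article, so the total is only an order-of-magnitude estimate:

```python
# Back-of-the-envelope GPU count and cost for the GROK 3 cluster,
# using the article's figures; the unit price is approximate.

grok2_gpus = 20_000
scale_factor = 5
grok3_gpus = grok2_gpus * scale_factor       # 100,000 H100s

price_per_gpu_usd = 30_000                   # approximate price cited in the article
total_gpu_cost = grok3_gpus * price_per_gpu_usd

print(grok3_gpus)                            # 100000
print(f"${total_gpu_cost / 1e9:.0f}B")       # roughly $3B in GPUs alone
```

At that scale, the hardware bill alone lands in the billions, which helps explain why the cancelled Oracle arrangement was reportedly worth $10 billion.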

The timing is somewhat surprising, as NVIDIA has only recently announced the upcoming H200 GPUs, also based on the Hopper architecture. Rather than wait for the H200 or the forthcoming Blackwell-based B100 and B200 GPUs, xAI chose to begin training on H100s. The H200, which entered mass production in Q2, promises significant performance gains, but xAI’s immediate focus is on leveraging its existing H100 infrastructure to meet its ambitious targets.

Analyst questions power supply for Memphis Supercluster

Dylan Patel, an expert in AI and semiconductors, raised concerns about powering the Memphis Supercluster. He pointed out that the current grid supply of 7 megawatts can sustain only about 4,000 GPUs. The Tennessee Valley Authority (TVA) is expected to supply 50MW to the facility under a deal expected to be signed by August 1. However, the substation needed to meet the full power demand will not be completed until late 2024.

Analyzing satellite images, Patel noted that Musk has deployed 14 VoltaGrid mobile generators, each yielding 2.5 megawatts. Together, these generators produce 35 megawatts of electricity. Combined with the 8MW from the grid, this makes a total of 43MW, which is enough to power about 32,000 H100 GPUs with some power capping.
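Patel’s power math can be checked with a short sketch. The generator and grid figures are from the article; the per-GPU draw (H100 board power plus cooling and networking overhead, under power capping) is an assumption chosen to match his ~32,000-GPU estimate:

```python
# Rough power-budget check for the Memphis Supercluster figures above.
# Generator and grid numbers are from the article; the per-GPU draw
# is an assumed figure including cooling/networking overhead.

generators = 14
mw_per_generator = 2.5           # VoltaGrid mobile generator output
grid_mw = 8                      # grid supply cited alongside the generators

total_mw = generators * mw_per_generator + grid_mw   # 35 + 8 = 43 MW

kw_per_gpu = 1.34                # assumed draw per H100 with power capping
supported_gpus = int(total_mw * 1000 / kw_per_gpu)

print(total_mw)                  # 43.0
print(supported_gpus)            # roughly 32,000
```

Under these assumptions the on-site power supports only about a third of the 100,000 GPUs, consistent with Patel’s point that full utilization will have to wait for the substation.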
