01.11.2024

HPE and Dell Unveil Advanced AI Server Solutions for LLM Training

The AI arms race intensifies as Hewlett Packard Enterprise (HPE) and Dell Technologies introduce new high-performance servers tailored for large language model (LLM) training and multimodal AI applications.

HPE’s New ProLiant Compute XD685 for Enhanced AI Processing

HPE recently launched the ProLiant Compute XD685, a server designed to optimize AI workloads. The XD685 pairs AMD’s 5th Gen Epyc processors with Instinct MI325X accelerators to handle natural language processing, LLM development, and complex multimodal AI tasks. The 5U server has a modular chassis that supports a flexible mix of CPU and GPU options, with either air cooling or direct liquid cooling. With support for up to eight MI325X accelerators, each with 6 TB/s of memory bandwidth, the XD685 is built for demanding AI workloads.

One key advantage of the MI325X accelerators is their shared HBM3E memory, which pools the memory of multiple cards into one large space. This lets users reach a given level of performance with fewer cards, lowering total cost of ownership.
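To see why more memory per card can mean fewer cards, here is a rough Python sketch that counts how many accelerators are needed just to hold a model's weights. It assumes 256 GB of HBM3E per MI325X (AMD's published figure), a hypothetical 80 GB card for comparison, and a hypothetical 405-billion-parameter model stored at 2 bytes per parameter. The function name min_accelerators_for_weights is illustrative only. The estimate ignores activations, KV cache, and optimizer state, so real training jobs need considerably more memory than it suggests.

```python
GB = 10**9  # decimal gigabytes, the unit accelerator vendors quote


def min_accelerators_for_weights(num_params: int, bytes_per_param: int, hbm_gb_per_card: int) -> int:
    """Smallest number of cards whose combined HBM can hold the raw model weights.

    Counts weights only; activations, KV cache, optimizer state and framework
    overhead are ignored, so this is a lower bound, not a sizing guide.
    """
    weight_bytes = num_params * bytes_per_param
    card_bytes = hbm_gb_per_card * GB
    return -(-weight_bytes // card_bytes)  # integer ceiling division


# Illustrative assumptions (not from the article):
MI325X_HBM_GB = 256          # AMD's published HBM3E capacity per MI325X
SMALLER_CARD_HBM_GB = 80     # hypothetical comparison card
PARAMS = 405_000_000_000     # hypothetical 405B-parameter model
BYTES_16BIT = 2              # FP16/BF16 weights

mi325x_cards = min_accelerators_for_weights(PARAMS, BYTES_16BIT, MI325X_HBM_GB)
smaller_cards = min_accelerators_for_weights(PARAMS, BYTES_16BIT, SMALLER_CARD_HBM_GB)
xd685_pool_gb = 8 * MI325X_HBM_GB  # combined HBM of eight accelerators in one XD685

print(f"MI325X cards needed: {mi325x_cards}")              # → 4
print(f"80 GB cards needed: {smaller_cards}")               # → 11
print(f"Combined HBM in an 8-card XD685: {xd685_pool_gb} GB")  # → 2048
```

Under these assumptions the weights fit on 4 MI325X cards versus 11 of the smaller cards; the ratio, not the absolute count, is what matters for the cost argument.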

The XD685 also comes with a comprehensive support package from HPE Services, aimed at facilitating efficient deployment and setup for large-scale AI clusters. HPE provides configuration, validation, and testing support, reducing the time needed to get clusters up and running. Security features, including the HPE Integrated Lights-Out (iLO) technology, are embedded directly into the hardware, providing enterprise-grade security for critical AI infrastructure.

HPE expects the ProLiant Compute XD685 to be widely available by Q1 of 2025.

Dell’s Expansion in AI Infrastructure with PowerEdge and PowerScale Solutions

Dell Technologies is advancing its AI portfolio with a suite of new products under the Dell AI Factory banner, addressing the growing demands of generative AI and LLM training.

Central to Dell’s launch are the PowerEdge XE9712 servers, built for demanding LLM workloads and large-scale inferencing. The XE9712 is based on NVIDIA’s GB200 NVL72 platform, which combines up to 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs in a single rack. This NVLink-connected GPU domain operates as one unified GPU, delivering up to 30x faster real-time inferencing of trillion-parameter LLMs compared with NVIDIA H100 GPUs.
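The 36-CPU, 72-GPU figures follow from how the rack is assembled. The sketch below assumes NVIDIA's published GB200 NVL72 layout of 18 compute trays, each carrying two GB200 superchips that pair one Grace CPU with two Blackwell GPUs. The names GB200Superchip and rack_totals are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GB200Superchip:
    grace_cpus: int = 1      # one NVIDIA Grace CPU per GB200 superchip
    blackwell_gpus: int = 2  # two Blackwell GPUs per GB200 superchip


def rack_totals(compute_trays: int, superchips_per_tray: int) -> tuple[int, int]:
    """Return (CPU count, GPU count) for a rack built from GB200 superchips."""
    chip = GB200Superchip()
    superchips = compute_trays * superchips_per_tray
    return superchips * chip.grace_cpus, superchips * chip.blackwell_gpus


# Assumed layout: 18 compute trays with two GB200 superchips each.
cpus, gpus = rack_totals(compute_trays=18, superchips_per_tray=2)
print(cpus, gpus)  # → 36 72
```

Modeling the superchip as the unit makes the fixed 1:2 CPU-to-GPU ratio explicit: scaling the tray count changes both totals together.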

Dell also introduced the PowerEdge M7725 server, optimized for high-density computing, catering to sectors like research, government, finance, and academia. The M7725 provides between 24K and 27K cores per rack, featuring 64 or 72 dual-socket nodes with AMD’s 5th Gen Epyc processors. With options for both direct liquid cooling and air cooling, the M7725 ensures efficient temperature management across intensive computational tasks.
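The per-rack core range is simple multiplication of nodes, sockets, and cores. This sketch assumes every socket holds a 192-core 5th Gen Epyc processor, the highest core count AMD offers in that generation; configurations built on lower-core SKUs would land below these totals. The helper cores_per_rack is hypothetical.

```python
def cores_per_rack(nodes: int, sockets_per_node: int, cores_per_socket: int) -> int:
    """Total CPU cores in a rack of identical multi-socket nodes."""
    return nodes * sockets_per_node * cores_per_socket


# Assumption: a 192-core 5th Gen Epyc part in every socket.
CORES_PER_SOCKET = 192

for nodes in (64, 72):
    total = cores_per_rack(nodes, sockets_per_node=2, cores_per_socket=CORES_PER_SOCKET)
    print(f"{nodes} dual-socket nodes -> {total:,} cores")
# 64 dual-socket nodes -> 24,576 cores
# 72 dual-socket nodes -> 27,648 cores
```

The results, roughly 24,600 and 27,600 cores, line up with the 24K to 27K range quoted for the M7725.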

Alongside its compute products, Dell has upgraded its PowerScale unstructured data storage line, aiming to improve AI performance and simplify data management across global data environments. The PowerScale enhancements include faster metadata processing, a 61TB drive for higher capacity and efficiency, and a smaller data center footprint. PowerScale now also supports InfiniBand and 200GbE Ethernet adapters, which Dell says deliver up to 63% higher throughput.

To support these high-density setups, Dell introduced the Integrated Rack 7000 (IR7000). Built to Open Compute Project (OCP) standards, the IR7000 provides advanced power management, sustainable cooling, and higher rack density. Designed for liquid cooling, it supports deployments of up to 480 kW and captures nearly 100% of the heat generated. It offers plug-and-play capability and works with both Dell and third-party networking components.

Dell’s latest AI infrastructure offerings signify a substantial leap in performance, aiming to support the evolving needs of LLM training, large-scale AI applications, and data management in AI-centric industries.

Maxinvest