Nvidia, others promise to use new Intel Xeon processors
Intel’s 4th Generation Xeon processors include virtual machine security as well as AI accelerators.
Intel has formally introduced its 4th Gen Intel Xeon Scalable Processors (aka Sapphire Rapids) and the Intel Max Series CPUs and GPUs. That isn’t much of a secret, as we have already documented the processors here, but there are a few new features to go along with them.
Those new features include a virtual machine (VM) isolation solution and an independent trust verification service to help build what it calls the “industry’s most comprehensive confidential computing portfolio.”
The VM isolation solution, called Intel Trust Domain Extensions (TDX), is designed to protect data inside a trusted execution environment (TEE) in the VM. It builds on Intel’s Software Guard Extensions (SGX) for security and is similar to AMD’s Secure Encrypted Virtualization in that it provides real-time encryption and protection for the contents of a VM.
Intel also introduced Project Amber, a multicloud SaaS-based trust verification service to help enterprises verify the TEEs, devices, and roots of trust. Project Amber launches later this year.
All told, Intel introduced 56 chips, from eight to 60 cores, with the top end weighing in at 350 watts. Still, the company is making sustainability claims for performance per watt.
For example, it claims that thanks to the accelerators and software optimizations, the new Xeon improves performance-per-watt efficiency by an average of 2.9 times compared to the previous generation of Xeon CPUs.
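To see how a chip that draws more power can still come out ahead on efficiency, here is a minimal sketch of the performance-per-watt arithmetic. The throughput and wattage figures are made up for illustration and are not Intel measurements; the function names are hypothetical as well.

```python
def perf_per_watt(throughput: float, watts: float) -> float:
    """Work done per watt of power drawn (any consistent throughput unit)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return throughput / watts


def efficiency_gain(new_throughput: float, new_watts: float,
                    old_throughput: float, old_watts: float) -> float:
    """Ratio of new performance per watt to old performance per watt."""
    return (perf_per_watt(new_throughput, new_watts)
            / perf_per_watt(old_throughput, old_watts))


# Illustrative numbers only: the newer part draws more power (350 W vs. 250 W)
# but finishes enough extra work that its efficiency still improves.
gain = efficiency_gain(new_throughput=406, new_watts=350,
                       old_throughput=100, old_watts=250)
print(f"performance-per-watt gain: {gain:.2f}x")
```

The point of the ratio is that raw wattage alone says little; what matters is how much more work gets done for each watt.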
Intel on On Demand
Intel also provided more information regarding its Intel On Demand service. The new Xeon Scalable processors ship with specialty processing engines onboard that require a license in order to be accessed.
The service includes an API for ordering licenses and a software agent for license provisioning and activation of the CPU features. Customers have the option of buying the On Demand features at time of purchase or post-purchase as an upgrade.
Intel is working with a few partners to implement a metering adoption model in which On Demand features can be turned on and off when needed and payment is based on usage rather than a one-time license.
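Whether metering or a one-time license works out cheaper depends on how many hours a feature actually runs. The sketch below shows the break-even arithmetic under a simple assumed pricing shape (a flat fee versus a flat hourly rate). Intel has not published prices here, so every figure and function name is hypothetical.

```python
def break_even_hours(one_time_fee: float, hourly_rate: float) -> float:
    """Hours of use at which metered charges equal the one-time fee."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return one_time_fee / hourly_rate


def cheaper_option(one_time_fee: float, hourly_rate: float,
                   expected_hours: float) -> str:
    """Return which pricing model costs less for the expected usage."""
    metered_cost = hourly_rate * expected_hours
    return "metered" if metered_cost < one_time_fee else "one-time"


# Hypothetical prices: a 1,000-unit license versus 0.50 units per hour of use.
print(break_even_hours(1000.0, 0.5))        # hours needed to justify the license
print(cheaper_option(1000.0, 0.5, 500))     # light, bursty use
print(cheaper_option(1000.0, 0.5, 5000))    # heavy, sustained use
```

Under these assumptions, occasional workloads favor metering while features that stay on most of the time favor the up-front license.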
AI Everywhere
It has long been conventional wisdom that AI and machine learning workloads are best done on a GPU, but Intel wants to make the CPU an equal to the GPU, even as it prepares its own GPU for the data center.
The new Xeon processors come with a number of AI accelerators, and Intel is launching a software toolkit called AI Software Suite that provides both open-source and commercial tools to help build, deploy, and optimize AI workloads.
A key component of the new Xeons is the integration of Intel Advanced Matrix Extensions (AMX), which Intel said can provide a tenfold performance increase in AI inference over Intel’s third-generation Xeon processors.
Intel also said the new processors support a tenfold increase in PyTorch real-time inference and training performance using Intel Advanced Matrix Extensions versus the prior generation.
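For readers who want to check whether a given Linux server exposes AMX, the kernel reports it through CPU feature flags in /proc/cpuinfo. The following is a minimal sketch, assuming a Linux host and the flag names amx_tile, amx_bf16, and amx_int8; the helper names are our own, and frameworks such as PyTorch do their own detection internally.

```python
from pathlib import Path

# Feature flags the Linux kernel reports for AMX-capable CPUs.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_in(cpuinfo_text: str) -> set:
    """Return the AMX flags listed on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            present = set(value.split())
            return {flag for flag in AMX_FLAGS if flag in present}
    return set()


def detect_amx(path: str = "/proc/cpuinfo") -> set:
    """Read cpuinfo from disk; returns an empty set if the file is missing."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        return set()  # not Linux, or /proc is not mounted
    return amx_flags_in(cpuinfo.read_text())


if __name__ == "__main__":
    found = detect_amx()
    print("AMX flags found:", sorted(found) if found else "none")
```

An empty result on a 4th Gen Xeon would usually point to an older kernel or a hypervisor that does not pass the feature through to the guest.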
Nvidia Teams for AI Systems
OEMs Supermicro and Lenovo announced new products based on the 4th Gen Xeon Scalable processors. A surprise announcement came from Nvidia, showing things are definitely more cordial between the two firms than they used to be.
Nvidia and its partners have launched a series of accelerated computing systems that are built for energy-efficient AI, combining the new Xeon with Nvidia’s H100 Tensor Core GPU. All told, there will be more than 60 servers featuring new Xeon Scalables and H100 GPUs from Nvidia partners around the world.
Nvidia says these systems will run workloads an average of 25 times more efficiently than traditional CPU-only data-center servers, and compared to prior-generation accelerated systems, these servers speed training and inference to boost energy efficiency by 3.5 times.
The servers also feature Nvidia’s ConnectX-7 network adapters. All told, this architecture delivers up to nine times greater performance than the previous generation and 20 to 40 times the performance of unaccelerated x86 dual-socket servers for AI training and HPC workloads.