Nvidia has set new MLPerf Inference performance records with its H200 Tensor Core GPU and TensorRT-LLM software.
MLPerf Inference is a benchmarking suite that measures inference performance across deep-learning use cases.
The latest version of the suite, MLPerf Inference v4.0, adds two workloads that represent generative AI use cases: a large language model (LLM) benchmark based on Meta's Llama 2 70B, and a text-to-image test based on Stable Diffusion XL.
Nvidia has set performance records on both new workloads, providing the highest performance across all MLPerf Inference workloads in the data center category.
The company's TensorRT-LLM is an open-source software library developed to double the speed of LLM inference on its H100 GPUs. In the MLPerf v4.0 GPT-J test, H100 GPUs running TensorRT-LLM achieved speedups of 2.4x in the offline scenario and 2.9x in the server scenario, compared with the same GPUs' results in the v3.1 round six months earlier.
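For readers unfamiliar with the library, the following is a minimal sketch of running text generation through TensorRT-LLM's high-level LLM API, available in recent releases. The model checkpoint and sampling settings here are illustrative assumptions, not the tuned configuration behind Nvidia's MLPerf submission.

    # Minimal TensorRT-LLM generation sketch using the library's
    # high-level LLM API. The checkpoint and sampling values are
    # illustrative assumptions, not Nvidia's benchmark configuration.
    from tensorrt_llm import LLM, SamplingParams

    def main():
        # Builds an optimized TensorRT engine for the model on first use.
        llm = LLM(model="meta-llama/Llama-2-7b-hf")  # assumption: any supported checkpoint

        prompts = ["What does MLPerf Inference measure?"]
        sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

        # Runs batched inference on the compiled engine and prints each completion.
        for output in llm.generate(prompts, sampling):
            print(output.outputs[0].text)

    if __name__ == "__main__":
        main()

The speedups Nvidia reports come from engine-level optimizations applied during that build step, such as kernel fusion and in-flight batching, rather than from changes to the model itself.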
For the MLPerf Llama 2 70B benchmark, Nvidia's TensorRT-LLM running on the company's H200 GPUs delivered up to 43 percent and 45 percent higher performance than the H100 in the server and offline scenarios, respectively, when the GPUs were configured to a 1,000W TDP.
Read more: datacenterdynamics.com
Image: Nvidia H200 Tensor Core GPU