Nvidia unveiled its Nvidia A100 artificial intelligence chip today, and CEO Jensen Huang called it the ultimate instrument for advancing AI.
Huang said it can make supercomputing tasks — which are vital in the fight against COVID-19 — much more cost-efficient and powerful than today's more expensive systems.
The chip has a monstrous 54 billion transistors (the on-off switches that are the building blocks of all things digital), and it can execute 5 petaflops of performance, or about 20 times more than the previous-generation Volta chip. Huang made the announcement during his keynote at the Nvidia GTC event, which was digital this year.
The launch was originally scheduled for March 24 but was delayed by the pandemic. Nvidia rescheduled the release for today, as the chips and the DGX A100 systems that use them are now available and shipping.
The Nvidia A100 chip uses the same Ampere architecture (named after French mathematician and physicist André-Marie Ampère) that could be used in consumer applications such as Nvidia's GeForce graphics chips. In contrast to Advanced Micro Devices (AMD), Nvidia is focused on creating a single microarchitecture for its GPUs for both commercial AI and consumer graphics use. But Huang said mixing and matching the different elements on the chip will determine whether it's used for AI or graphics.
The DGX A100 is the third generation of Nvidia's AI DGX platform, and Huang said it essentially puts the capabilities of an entire datacenter into a single rack. That's hyperbole, but Paresh Kharya, director of product management for datacenter and cloud platforms, said in a press briefing that the 7-nanometer chip, codenamed Ampere, can take the place of a lot of AI systems being used today.
"You get all the overhead of extra memory, CPUs, and power supplies of 56 servers … collapsed into one," Huang said. "The economic value proposition is really off the charts, and that's the thing that is really exciting."
Above: Nvidia's Jensen Huang holds the world's largest graphics card.
For instance, to handle AI training tasks today, one customer needs 600 central processing unit (CPU) systems to handle millions of queries for datacenter applications. That costs $11 million, and it would require 25 racks of servers and 630 kilowatts of power. With Ampere, Nvidia can do the same amount of processing for $1 million, a single server rack, and 28 kilowatts of power.
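Those quoted figures can be sanity-checked with a few lines of arithmetic. This is a quick sketch using only the numbers in Nvidia's comparison; the variable names are ours:

```python
# Nvidia's quoted figures: a 600-server CPU cluster vs. a single
# DGX A100 rack handling the same AI processing workload.
cpu_cluster = {"cost_usd": 11_000_000, "racks": 25, "power_kw": 630}
dgx_a100 = {"cost_usd": 1_000_000, "racks": 1, "power_kw": 28}

for metric in cpu_cluster:
    ratio = cpu_cluster[metric] / dgx_a100[metric]
    print(f"{metric}: {ratio:.1f}x reduction")
```

By these numbers, the DGX A100 rack is 11x cheaper and draws 22.5x less power for the same workload, which is the comparison behind Huang's "economic value proposition" claim.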
"That's why you hear Jensen say, 'The more you buy, the more you save,'" Kharya said.
Huang added, "It's going to replace a whole bunch of inference servers. The throughput of training and inference is off the charts — 20 times is off the charts."
The first order
Above: DGX A100 servers in use at the Argonne National Lab.
The first order for the chips is going to the U.S. Department of Energy's (DOE) Argonne National Laboratory, which will use the cluster's AI and computing power to better understand and fight COVID-19.
DGX A100 systems use eight of the new Nvidia A100 Tensor Core GPUs, providing 320 gigabytes (GB) of memory for training the largest AI data sets, along with the latest high-speed Nvidia Mellanox HDR 200Gbps interconnects.
Multiple smaller workloads can be accelerated by partitioning the DGX A100 into as many as 56 instances per system, using the A100 multi-instance GPU feature. Combining these capabilities enables enterprises to optimize computing power and resources on demand to accelerate diverse workloads — including data analytics, training, and inference — on a single fully integrated, software-defined platform.
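The 56-instance figure follows directly from the system's layout: each A100 can be carved into up to seven multi-instance GPU (MIG) slices, and a DGX A100 carries eight GPUs. A minimal sketch of that arithmetic:

```python
GPUS_PER_SYSTEM = 8      # A100 Tensor Core GPUs in one DGX A100
MIG_SLICES_PER_GPU = 7   # maximum multi-instance GPU partitions per A100

max_instances = GPUS_PER_SYSTEM * MIG_SLICES_PER_GPU
print(max_instances)  # 56 isolated GPU instances per system
```

Each slice gets its own compute and memory partition, which is how one box can serve dozens of smaller inference jobs at once.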
Immediate DGX A100 adoption and help
Above: DGX A100 system at Argonne National Lab.
Nvidia said a number of the world's largest companies, service providers, and government agencies have placed initial orders for the DGX A100, with the first systems delivered to Argonne earlier this month.
Rick Stevens, associate laboratory director for Computing, Environment, and Life Sciences at Argonne National Lab, said in a statement that the center's supercomputers are being used to fight the coronavirus, with AI models and simulations running on the machines in hopes of finding treatments and a vaccine. The DGX A100 systems' power will enable scientists to do a year's worth of work in months or days.
The University of Florida will be the first U.S. institution of higher learning to receive DGX A100 systems, which it will deploy to infuse AI across its entire curriculum to foster an AI-enabled workforce.
Among other early adopters is the Center for Biomedical AI at the University Medical Center Hamburg-Eppendorf, Germany, which will leverage the DGX A100 to advance clinical decision support and process optimization.
Thousands of previous-generation DGX systems are currently being used around the globe by a wide range of public and private organizations. Among these users are some of the world's leading businesses, including automakers, health care providers, retailers, financial institutions, and logistics companies that are adopting AI across their industries.
Above: A DGX SuperPod
Nvidia also unveiled its next-generation DGX SuperPod, a cluster of 140 DGX A100 systems capable of achieving 700 petaflops of AI computing power. Combining 140 DGX A100 systems with Nvidia Mellanox HDR 200Gbps InfiniBand interconnects, the company built its own next-generation DGX SuperPod AI supercomputer for internal research in areas such as conversational AI, genomics, and autonomous driving.
It took only three weeks to build that SuperPod, Kharya said, and the cluster is one of the world's fastest AI supercomputers — achieving a level of performance that previously required thousands of servers.
To help customers build their own A100-powered datacenters, Nvidia has released a new DGX SuperPod reference architecture. This gives customers a blueprint that follows the same design principles and best practices Nvidia used.
DGXpert program, DGX-ready software program
Above: Nvidia A100 chip on a circuit card.
Nvidia also launched the Nvidia DGXpert program, which brings DGX customers together with the company's AI experts, and the Nvidia DGX-ready software program, which helps customers take advantage of certified, enterprise-grade software for AI workflows.
The company said that each DGX A100 system has eight Nvidia A100 Tensor Core graphics processing units (GPUs), delivering 5 petaflops of AI power, with 320GB of total GPU memory and 12.4TB per second of bandwidth.
The systems also have six Nvidia NVSwitch interconnect fabrics with third-generation Nvidia NVLink technology for 4.8 terabytes per second of bidirectional bandwidth. And they have nine Nvidia Mellanox ConnectX-6 HDR 200Gb-per-second network interfaces, offering a total of 3.6 terabits per second of bidirectional bandwidth.
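The network total can be cross-checked from the per-link numbers: nine HDR interfaces at 200Gb/s each give 1.8Tb/s in each direction, or 3.6Tb/s bidirectional. A quick sketch (variable names are ours):

```python
nics = 9              # Mellanox ConnectX-6 HDR network interfaces per system
link_rate_gbps = 200  # HDR InfiniBand rate per interface, each direction

one_way_tbps = nics * link_rate_gbps / 1000  # 1.8 Tb/s in each direction
bidirectional_tbps = 2 * one_way_tbps
print(bidirectional_tbps)  # 3.6
```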
The chips are made by TSMC in a 7-nanometer process. Nvidia DGX A100 systems start at $199,000 and are shipping now through Nvidia Partner Network resellers worldwide.
Huang said the DGX A100 uses the HGX motherboard, which weighs about 50 pounds and is "the most complex motherboard in the world." (This is the board he pulled out of his home oven in a teaser video.) It has 30,000 components and a kilometer of wire traces.
As for a consumer graphics chip, Nvidia would configure an Ampere-based chip in a very different way. The A100 uses high-bandwidth memory for datacenter applications, but that wouldn't be used in consumer graphics. The cores would also be heavily biased for graphics instead of the double-precision floating point calculations datacenters need, he said.
"We'll bias it differently, but every single workload runs on every single GPU," Huang said.