
Amazon is ditching Nvidia GPUs in favor of their own silicon

What just happened? Amazon has announced that they’re migrating their artificial intelligence processing to custom AWS Inferentia chips. This means that Amazon’s biggest inferencing workloads, like virtual assistant Alexa, will run on faster, specialized silicon instead of on more general-purpose GPUs.

Amazon has already shifted about 80% of Alexa processing onto Elastic Compute Cloud (EC2) Inf1 instances, which use the new AWS Inferentia chips. Compared to the G4 instances, which used traditional GPUs, the Inf1 instances push throughput up by 30% and costs down by 45%. Amazon reckons that they’re the best instances on the market for inferencing natural language and voice processing workloads.
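Inf1 instances are standard EC2 offerings, so anyone can rent the same class of hardware that Alexa now runs on. Below is a minimal boto3 sketch that requests a single Inf1 instance; the AMI ID, key pair name, and region are placeholders, not values from Amazon.

```python
import boto3

# Placeholders: substitute a real Deep Learning AMI ID and your own key pair name.
AMI_ID = "ami-xxxxxxxxxxxxxxxxx"
KEY_NAME = "my-key-pair"

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single inf1.xlarge instance (one Inferentia chip, i.e. four NeuronCores).
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType="inf1.xlarge",
    KeyName=KEY_NAME,
    MinCount=1,
    MaxCount=1,
)
print("Launched", response["Instances"][0]["InstanceId"])
```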

Alexa works like this: the actual speaker box (or cylinder, as it may be) does basically nothing, while AWS processors in the cloud do everything. Or to put it more technically… the system kicks in once the wake word has been detected by the Echo’s on-device chip. The Echo then streams the audio to the cloud in real time. Off in a data center somewhere, the audio is turned into text (one example of inferencing). Then, meaning is extracted from the text (another example of inferencing). Finally, any required actions are carried out, like pulling up the day’s weather information.
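As a rough illustration of that request path, here is a short Python sketch. Every function in it is a hypothetical stand-in rather than a real Alexa or AWS API; the point is only to show where the two inference steps sit in the flow.

```python
# Illustrative stubs only -- none of these functions are real Alexa or AWS APIs.

def transcribe(audio: bytes) -> str:
    """Speech-to-text: the first inference step, run in the cloud."""
    return "what is the weather today"                 # dummy transcription

def extract_intent(text: str) -> dict:
    """Natural-language understanding: the second inference step."""
    return {"intent": "get_weather", "slots": {"day": "today"}}

def fulfil(intent: dict) -> dict:
    """Non-inference work: look up whatever the intent asks for."""
    return {"forecast": "rain", "high_c": 14}

def handle_request(audio_stream):
    """Cloud side of an Alexa-style request, after the on-device wake word fires."""
    audio = b"".join(audio_stream)                     # audio arrives as a real-time stream
    text = transcribe(audio)                           # inference #1: audio -> text
    intent = extract_intent(text)                      # inference #2: text -> meaning
    return fulfil(intent)                              # act on the request

if __name__ == "__main__":
    print(handle_request([b"\x00\x01", b"\x02\x03"]))
```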

Once Alexa has completed your request, she needs to communicate the answer to you. What she’s supposed to say is chosen from a modular script. Then the script is turned into an audio file (another example of inferencing) and sent to your Echo device. The Echo plays the file and you decide to bring an umbrella to work with you.
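The response path can be sketched the same way; again, the template text and the synthesize() stub below are purely hypothetical, standing in for the third inference step.

```python
# Illustrative stubs only -- the template and synthesize() are hypothetical.

RESPONSE_TEMPLATES = {
    "get_weather": "Expect {forecast} today, with a high of {high_c} degrees.",
}

def synthesize(text: str) -> bytes:
    """Text-to-speech: the final inference step before audio goes back to the Echo."""
    return text.encode("utf-8")                        # dummy "audio" payload

def build_response(intent: str, result: dict) -> bytes:
    """Pick the modular script for this intent, fill it in, then synthesize audio."""
    script = RESPONSE_TEMPLATES[intent].format(**result)
    return synthesize(script)                          # inference #3: text -> audio

if __name__ == "__main__":
    audio = build_response("get_weather", {"forecast": "rain", "high_c": 14})
    print(len(audio), "bytes of audio to stream back to the Echo")
```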

Evidently, inferencing makes up a large part of the job, so it’s unsurprising that Amazon has invested millions of dollars into designing purpose-built inferencing chips.

Speaking of which, each Inferentia chip contains four NeuronCores. Each one implements a “high-performance systolic array matrix multiply engine.” Roughly speaking, each NeuronCore is a large grid of small data processing units (DPUs) that pass operands to their neighbors and perform multiply-accumulate operations in lockstep. Each Inferentia chip also has a large on-chip cache, which keeps latency down.
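To make the “systolic array” idea concrete, here is a minimal Python simulation of an output-stationary systolic matrix multiply. It is only a schematic sketch of the general technique named in the quote, not a description of Amazon’s actual NeuronCore hardware.

```python
def systolic_matmul(A, B):
    """Simulate C = A @ B on a grid of multiply-accumulate cells.

    Each cell (i, j) keeps one output value in place (output-stationary),
    while A's rows stream in from the left and B's columns stream in from
    the top, skewed so the right operand pair reaches each cell on the
    right clock step.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]                 # one accumulator per cell
    for t in range(n + m + k - 2):                  # global clock steps
        for i in range(n):
            for j in range(m):
                s = t - i - j                       # which operand pair arrives now
                if 0 <= s < k:
                    C[i][j] += A[i][s] * B[s][j]    # one multiply-accumulate
    return C

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    assert systolic_matmul(A, B) == [[19, 22], [43, 50]]
```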
