Amazon today debuted AWS Trainium, a chip custom-designed to deliver what the company describes as cost-effective machine learning model training in the cloud. It comes ahead of the availability of new Amazon Elastic Compute Cloud (EC2) instances built specifically for machine learning training and powered by Intel’s new Habana Gaudi processors.
“We know that we want to keep pushing the price performance on machine learning training, so we’re going to have to invest in our own chips,” AWS CEO Andy Jassy said during a keynote address at Amazon’s re:Invent conference this morning. “You have an unmatched array of instances in AWS, coupled with innovation in chips.”
Amazon claims that Trainium will offer the most teraflops of any machine learning instance in the cloud, a teraflop being one trillion floating-point operations per second. (Amazon is quoting 30% higher throughput and 45% lower cost per inference compared with standard AWS GPU instances.) When Trainium becomes available to customers in the second half of 2021, as EC2 instances and in SageMaker, Amazon’s fully managed machine learning development platform, it will support popular frameworks including Google’s TensorFlow, Facebook’s PyTorch, and Apache MXNet. Moreover, Amazon says it will use the same Neuron SDK as Inferentia, the company’s cloud-hosted chip for machine learning inference.
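A teraflop figure translates into training time only through total operation count and how much of the chip's peak a workload actually sustains. As a rough illustration, here is a back-of-envelope sketch; all of the numbers are hypothetical, since Amazon has not published Trainium's sustained throughput:

```python
# Back-of-envelope: what a teraflop rating buys you in training time.
# All figures below are hypothetical illustrations, not Trainium specs.

TERAFLOP = 1e12  # floating-point operations per second

def seconds_to_train(total_flops: float, peak_tflops: float,
                     utilization: float = 1.0) -> float:
    """Time to execute `total_flops` on a chip rated at `peak_tflops`,
    sustaining the given fraction of its peak."""
    return total_flops / (peak_tflops * TERAFLOP * utilization)

# A model needing 1e18 operations, on a chip rated at 100 TFLOPS
# that sustains 50% of peak in practice:
t = seconds_to_train(1e18, 100, 0.5)
print(f"{t:.0f} seconds (~{t / 3600:.1f} hours)")  # 20000 seconds (~5.6 hours)
```

The utilization factor matters: real training jobs rarely sustain a chip's peak rating, which is why headline teraflop comparisons only loosely predict wall-clock training time.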
“While Inferentia addressed the cost of inference, which constitutes up to 90% of ML infrastructure costs, many development teams are also limited by fixed ML training budgets,” AWS wrote in a blog post. “This puts a cap on the scope and frequency of training needed to improve their models and applications. AWS Trainium addresses this challenge by providing the highest performance and lowest cost for ML training in the cloud. With both Trainium and Inferentia, customers will have an end-to-end flow of ML compute from scaling training workloads to deploying accelerated inference.”
Absent benchmark results, it’s unclear how Trainium’s performance might compare with Google’s tensor processing units (TPUs), the search giant’s chips for AI training workloads hosted in Google Cloud Platform. Google says its forthcoming fourth-generation TPU offers more than double the matrix multiplication teraflops of a third-generation TPU. (Matrices are often used to represent the data that feeds into AI models.) It also offers a “significant” boost in memory bandwidth while benefiting from unspecified advances in interconnect technology.
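Matrix multiplication dominates these workloads, and its operation count is straightforward to estimate: multiplying an m×k matrix by a k×n matrix takes roughly 2·m·k·n floating-point operations, which is why vendors quote "matrix multiplication teraflops" at all. A minimal sketch of that arithmetic:

```python
# Estimate the floating-point operation count of a dense matrix multiply.

def matmul_flops(m: int, k: int, n: int) -> int:
    # Each of the m*n output entries needs k multiplies and k-1 adds,
    # conventionally rounded to 2*k operations per entry.
    return 2 * m * k * n

# A single 1024 x 1024 by 1024 x 1024 multiply:
print(matmul_flops(1024, 1024, 1024))  # 2147483648, i.e. ~2.1 billion operations
```

At that rate, a chip sustaining even one teraflop clears hundreds of such multiplies per second, which is why doubling matmul throughput, as Google claims for its fourth-generation TPU, is a meaningful generational jump.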
Machine learning deployments have historically been constrained by the size and speed of algorithms and the need for costly hardware. In fact, a report from MIT found that machine learning might be approaching computational limits. A separate Synced study estimated that the University of Washington’s Grover fake news detection model cost $25,000 to train in about two weeks. OpenAI reportedly racked up a whopping $12 million to train its GPT-3 language model, and Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.
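Dollar figures like these follow from simple arithmetic: total cost is roughly training hours times instance count times the hourly rate. A hedged sketch with hypothetical numbers — these are illustrative and do not reproduce the Grover, GPT-3, or BERT estimates:

```python
# Back-of-envelope cloud training cost: hours * instances * hourly price.
# All inputs below are hypothetical illustrations.

def training_cost(hours: float, num_instances: int,
                  price_per_hour: float) -> float:
    """Total on-demand cost of a multi-instance training run."""
    return hours * num_instances * price_per_hour

# Two weeks of training on 8 instances at a $3.06/hour on-demand rate:
cost = training_cost(14 * 24, 8, 3.06)
print(f"${cost:,.2f}")
```

Because the cost scales linearly in all three factors, halving price-performance per chip — the pitch Amazon is making for Trainium — directly halves the budget for a fixed workload.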
Amazon has increasingly leaned into AI and machine learning training and inference services as demand in the enterprise grows. According to one estimate, the global machine learning market was valued at $1.58 billion in 2017 and is expected to reach $20.83 billion by 2024. In November, Amazon announced that it had shifted part of the computing for Alexa and Rekognition to Inferentia-powered instances, aiming to make the work faster and cheaper while moving it away from Nvidia chips. At the time, the company claimed the shift to Inferentia for some of its Alexa work resulted in 25% better latency at a 30% lower cost.