Microsoft today upgraded its DeepSpeed library for training large neural networks with ZeRO-2. Microsoft says the memory-optimizing technology is capable of training machine learning models with 170 billion parameters. For context, Nvidia's massive Megatron language model is one of the largest in the world today at 11 billion parameters.
Today's announcement follows the February open source release of the DeepSpeed library, which was used to create Turing-NLG. At 17 billion parameters, Turing-NLG is the largest known language model in the world today. Microsoft introduced the Zero Redundancy Optimizer (ZeRO) in February alongside DeepSpeed.
ZeRO achieves its results by reducing memory redundancy in data parallelism, another technique for fitting large models into memory. Whereas ZeRO-1 included some model state memory optimization, ZeRO-2 also delivers optimization for activation memory and fragmented memory.
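In practice, ZeRO is enabled through DeepSpeed's JSON-style configuration. Below is a minimal sketch of what turning on ZeRO stage 2 looks like; the toy model, batch size, and learning rate are illustrative assumptions rather than Microsoft's benchmark setup, and exact config options can vary by DeepSpeed version.

```python
import torch
import deepspeed

# Stand-in model; in practice this would be a large transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

# Illustrative DeepSpeed config enabling ZeRO stage 2.
ds_config = {
    "train_batch_size": 32,  # placeholder value
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer state and gradients
        "contiguous_gradients": True,  # copy gradients into contiguous buffers to curb fragmentation
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
    },
}

# deepspeed.initialize wraps the model and optimizer for
# distributed, memory-optimized training.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

A script like this is normally started with DeepSpeed's own launcher (e.g. `deepspeed train.py`), which handles spawning one process per GPU.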
DeepSpeed is designed for distributed model training across multiple servers, but ZeRO-2 also brings improvements for training models on a single GPU, reportedly training models like Google's BERT 30% faster.
Further details will be shared Wednesday in a keynote address by Microsoft CTO Kevin Scott.
The news comes at the start of Microsoft's all-digital Build developer conference, where a number of AI advances were announced, including the debut of the WhiteNoise toolkit for differential privacy in machine learning and Project Bonsai for industrial applications of AI.
Last week, Nvidia CEO Jensen Huang unveiled the Ampere GPU architecture and A100 GPU. The new GPU chip, alongside trends like the creation of multimodal models and massive recommender systems, will lead to larger machine learning models in the years ahead.