In a technical paper quietly published earlier this year, IBM detailed what it calls the IBM Neural Computer, a custom-designed, reconfigurable parallel processing system built to research and develop emerging AI algorithms and computational neuroscience. This week, the company published a preprint describing the first application demonstrated on the Neural Computer: a deep "neuroevolution" system that combines a hardware implementation of an Atari 2600, image preprocessing, and AI models in an optimized pipeline. The coauthors report results competitive with state-of-the-art techniques, but perhaps more significantly, they claim the system achieves a record training throughput of 1.2 million image frames per second.

The Neural Computer represents something of a shot across the bow in the AI computational arms race. According to an analysis recently released by OpenAI, from 2012 to 2018 the amount of compute used in the largest AI training runs grew more than 300,000-fold, with a 3.5-month doubling time, far exceeding the pace of Moore's law. Keeping pace with this, supercomputers like Intel's forthcoming Aurora at the Department of Energy's Argonne National Laboratory and AMD's Frontier at Oak Ridge National Laboratory promise in excess of an exaflop (a quintillion floating-point operations per second) of computing performance.

Video games are a well-established platform for AI and machine learning research. They've gained currency not only because of their availability and the low cost of running them at scale, but because in certain domains like reinforcement learning, where an AI learns optimal behaviors by interacting with its environment in pursuit of rewards, game scores serve as direct reward signals. AI algorithms developed within games have proven adaptable to more practical uses, like protein folding prediction. And if the results from IBM's Neural Computer prove repeatable, the system could be used to accelerate the development of these AI algorithms.

The Neural Computer

IBM's Neural Computer consists of 432 nodes (27 nodes on each of 16 modular cards) based on field-programmable gate arrays (FPGAs) from Xilinx, a longtime strategic collaborator of IBM's. (FPGAs are integrated circuits designed to be configured after manufacturing.) Each node comprises a Xilinx Zynq system-on-chip, a dual-core ARM A9 processor paired with an FPGA on the same die, along with 1GB of dedicated RAM. The nodes are arranged in a 3D mesh topology, interconnected vertically with electrical connections called through-silicon vias that pass completely through silicon wafers or dies.
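To get a feel for how nodes might be addressed in such a 3D mesh, here is a minimal sketch. The mesh's exact dimensions aren't spelled out beyond the 432-node total, so the 12 x 6 x 6 grid below is purely an illustrative assumption, as are the function names.

```python
# Hypothetical node addressing in a 3D mesh of 432 nodes.
# The 12 x 6 x 6 shape is an assumption for illustration only;
# IBM's paper specifies only the node count and mesh topology.
DIMS = (12, 6, 6)  # 12 * 6 * 6 = 432 nodes

def coords(node_id, dims=DIMS):
    """Map a flat node id to (x, y, z) coordinates in the mesh."""
    x_dim, y_dim, _ = dims
    z, rem = divmod(node_id, x_dim * y_dim)
    y, x = divmod(rem, x_dim)
    return (x, y, z)

def neighbors(node_id, dims=DIMS):
    """Return the flat ids of a node's face-adjacent mesh neighbors."""
    x, y, z = coords(node_id, dims)
    result = []
    for dx, dy, dz in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < dims[0] and 0 <= ny < dims[1] and 0 <= nz < dims[2]:
            result.append(nx + ny * dims[0] + nz * dims[0] * dims[1])
    return result

print(coords(0))          # (0, 0, 0)
print(len(neighbors(0)))  # a corner node has 3 neighbors
```

In a mesh like this, a node talks directly only to its face-adjacent neighbors, which is what makes per-card, application-specific tuning of the communication channels practical.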


IBM claims its Neural Computer achieves record AI model training time

Above: A single card from IBM’s Neural Computer.

Image Credit: IBM

On the networking side, the FPGAs provide access to the physical communication links among cards in order to establish multiple distinct channels of communication. A single card can theoretically support transfer speeds of up to 432GB per second, but the Neural Computer's network interfaces can be adjusted and progressively optimized to best suit a given application.

“The availability of FPGA resources on every node allows application-specific processor offload, a feature that is not available on any parallel machine of this scale that we are aware of,” wrote the coauthors of a paper detailing the Neural Computer's architecture. “[M]ost of the performance-critical steps [are] offloaded and optimized on the FPGA, with the ARM [processor] … providing auxiliary support.”

Playing Atari games with AI

The researchers used 26 of the 27 nodes per card within the Neural Computer, carrying out experiments on a total of 416 nodes. Two instances of their Atari game-playing application (which extracted frames from a given Atari 2600 game, performed image preprocessing, ran the images through machine learning models, and carried out an action within the game) ran on each of the 416 FPGAs, scaling up to 832 instances running in parallel.
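The instance count follows directly from the node arithmetic:

```python
# Node and instance counts as reported in the article.
nodes_per_card = 26       # 26 of the 27 nodes on each card were used
cards = 16
nodes = nodes_per_card * cards          # total FPGAs in the experiments
instances_per_node = 2                  # two game-playing instances per FPGA
instances = nodes * instances_per_node  # parallel instances overall

print(nodes, instances)  # 416 832
```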

To obtain the highest performance, the team shied away from emulating the Atari 2600 in software, instead opting to use the FPGAs to implement the console's functionality at higher frequencies. They tapped a framework from the open source MiSTer project, which aims to recreate consoles and arcade machines using modern hardware, and bumped the Atari 2600's processor clock to 150MHz, up from 3.58MHz. This produced roughly 2,514 frames per second, compared with the original 60 frames per second.
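The reported frame rate is consistent with simply scaling the console's native 60 frames per second by the clock speedup, as this quick check shows:

```python
# Frame-rate gain from overclocking the Atari 2600 core on the FPGA.
original_clock_mhz = 3.58   # stock Atari 2600 processor clock
boosted_clock_mhz = 150.0   # clock used on the Neural Computer's FPGAs
original_fps = 60           # the console's native frame rate

speedup = boosted_clock_mhz / original_clock_mhz
estimated_fps = original_fps * speedup

print(round(speedup, 1))      # ~41.9x faster
print(round(estimated_fps))   # ~2514 frames per second
```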

In the image preprocessing step, IBM's application converted the frames from color to grayscale, eliminated flickering, rescaled the images to a smaller resolution, and stacked the frames into groups of four. It then passed these to an AI model that reasoned about the game environment, and to a submodule that selected the action for the next frames by identifying the maximum reward predicted by the AI model.
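A pipeline like this can be sketched in a few lines of NumPy. The function names, the 84 x 84 target resolution, and the max-over-frames de-flicker trick below are assumptions borrowed from common Atari reinforcement learning setups, not details from IBM's paper:

```python
import numpy as np

def preprocess(frame_rgb, prev_frame_rgb, out_size=(84, 84)):
    """One illustrative preprocessing step for a raw Atari frame."""
    # De-flicker: pixelwise max of two consecutive frames, since the
    # Atari 2600 draws some sprites only on alternating frames.
    merged = np.maximum(frame_rgb, prev_frame_rgb)
    # Color to grayscale using standard luminance weights.
    gray = merged @ np.array([0.299, 0.587, 0.114])
    # Downscale by sampling a regular grid of pixels (a stand-in for
    # proper interpolation, to keep the sketch dependency-free).
    h, w = gray.shape
    ys = np.linspace(0, h - 1, out_size[0]).astype(int)
    xs = np.linspace(0, w - 1, out_size[1]).astype(int)
    return gray[np.ix_(ys, xs)]

def stack_frames(frames):
    """Stack the last four preprocessed frames into one model input."""
    return np.stack(frames[-4:], axis=0)  # shape: (4, 84, 84)
```

Stacking four consecutive frames is what lets a model infer motion, such as the direction a ball is traveling, from otherwise static images.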


Above: Results from the experiments.

Image Credit: IBM

Yet another algorithm, a genetic algorithm, ran on an external computer connected to the Neural Computer via a PCIe connection. It evaluated the performance of each instance and identified the top performers of the bunch, which it selected as “parents” of the next generation of instances.
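The selection step at the heart of such a scheme can be illustrated with simple truncation selection plus Gaussian mutation. The parameter names and values here are assumptions for the sake of the sketch, not IBM's actual settings:

```python
import random

def next_generation(population, scores, n_parents=10, noise=0.02):
    """Illustrative neuroevolution step: keep the top scorers as
    parents and produce mutated children from them.

    population -- list of flat parameter vectors (lists of floats)
    scores     -- the game score each instance achieved
    """
    # Rank instances by score and keep the top performers as parents.
    ranked = sorted(zip(scores, range(len(population))), reverse=True)
    parents = [population[i] for _, i in ranked[:n_parents]]
    # Each child is a copy of a random parent with Gaussian noise
    # added to every weight.
    children = []
    for _ in range(len(population)):
        parent = random.choice(parents)
        children.append([w + random.gauss(0.0, noise) for w in parent])
    return children
```

Because only scores and parameter vectors cross the PCIe link, the external machine can run this loop while the FPGAs stay busy playing games.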

Over the course of five experiments, IBM researchers ran 59 Atari 2600 games on the Neural Computer. The results indicate that the approach wasn't data-efficient compared with other reinforcement learning methods: it required 6 billion game frames in total and failed at challenging exploration games like Montezuma's Revenge and Pitfall. But it managed to outperform a popular baseline, a deep Q-network (an architecture pioneered by DeepMind), in 30 out of 59 games after six minutes of training (200 million training frames), versus the deep Q-network's 10 days of training. With 6 billion training frames, it surpassed the deep Q-network in 36 games while taking two orders of magnitude less training time (2 hours and 30 minutes).