100x improvement in throughput and energy efficiency on Xilinx™ FPGAs
Xilinx™, an AMD company, is a semiconductor company and the primary supplier of programmable logic devices. Known for inventing the field-programmable gate array (FPGA), Xilinx provides adaptable, accelerated computing that can be deployed at global scale and respond to dynamic needs.
CHALLENGE
Overcoming performance problems in deep learning without increasing energy consumption
Deep learning networks have accomplished a great deal but are hitting bottlenecks as they scale to more complex tasks and larger models. Breaking through the performance bottlenecks of today’s machine learning techniques typically requires adding more compute power and more data. The result is enormous models that consume vast amounts of power, limiting scalability and causing environmental harm.
We need a new approach to achieve significant breakthroughs in performance and scalability while reducing power consumption on today’s hardware.
SOLUTION
Brain-inspired, optimized networks on FPGAs yield multiplicative throughput improvements
In contrast to the standard dense representations used in most deep learning networks, we created networks that borrow several aspects of the brain’s efficient structure. These brain-inspired, optimized networks not only deliver equivalent accuracy to their standard counterparts, they drastically reduce computational requirements and can run on today’s hardware.
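The compute savings from sparsity can be sketched with a back-of-the-envelope calculation. This is an illustrative model only, not Numenta's actual network: the layer size and the 5% weight density and 10% activation density figures are hypothetical, chosen to show how sparsity in weights and activations compounds multiplicatively.

```python
# Illustrative sketch (hypothetical numbers, not Numenta's implementation):
# when both weights and activations are sparse, the savings multiply.

def macs_dense(n_in, n_out):
    """Multiply-accumulate operations for a dense fully connected layer."""
    return n_in * n_out

def macs_sparse(n_in, n_out, weight_density, activation_density):
    """MACs when only a fraction of weights are nonzero and only a
    fraction of input activations are active; the savings compound."""
    return int(n_in * n_out * weight_density * activation_density)

dense = macs_dense(1024, 1024)
sparse = macs_sparse(1024, 1024, weight_density=0.05, activation_density=0.1)
print(dense // sparse)  # compute-reduction factor for this toy layer
```

Under these toy assumptions the sparse layer needs roughly 1/200th of the multiply-accumulates of its dense counterpart, which is how sparsity can translate into large throughput and energy gains on hardware that can exploit it.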
We demonstrated these performance improvements on inference tasks using the Google Speech Commands (GSC) dataset. We created optimized networks on two off-the-shelf Xilinx products:
- Alveo™ U250 – a powerful platform designed for datacenters
- Zynq™ UltraScale+ ZU3EG – a smaller platform designed for embedded applications
RESULTS
100x throughput speedup and power improvement, and new possibilities for deep learning at the edge
Our optimized networks delivered over a 100x throughput speedup and power improvement over their traditional counterparts on the large FPGA platform. Additionally, our optimized network ran efficiently on even the smaller embedded platform, where the standard network could not fit, opening new possibilities for Edge AI.
BENEFITS
Better resource utilization, untapped edge opportunities and critical energy savings
This dramatic speed improvement provides great benefits, enabling:
- Implementation of much larger networks using the same resources
- Implementation of more copies of networks on the same resources
- Ability to run networks on edge platforms where traditional networks don’t fit
- Massive energy savings and lower costs due to scaling efficiencies
ADDITIONAL RESOURCES
Related Case Studies
Developing AI-powered games on existing CPU infrastructures without breaking the bank
AI is opening a new frontier for gaming, enabling more immersive and interactive experiences than ever before. NuPIC enables game studios and developers to leverage these AI technologies on existing CPU infrastructure as they embark on building new AI-powered games.
20x inference acceleration for long sequence length tasks on Intel Xeon Max Series CPUs
Numenta technologies running on the Intel 4th Gen Xeon Max Series CPU enables unparalleled performance speedups for longer sequence length tasks.
Numenta + Intel achieve 123x inference performance improvement for BERT Transformers
Numenta technologies combined with the new Advanced Matrix Extensions (Intel AMX) in the 4th Gen Intel Xeon Scalable processors yield breakthrough results.