100x improvement in throughput and energy efficiency on Xilinx™ FPGAs
Xilinx™, an AMD company, is a semiconductor company and the primary supplier of programmable logic devices. Known for inventing the field-programmable gate array (FPGA), Xilinx provides adaptable, accelerated computing that can be deployed at global scale and respond to dynamic needs.
CHALLENGE
Overcoming performance problems in deep learning without increasing energy consumption
Deep learning networks have accomplished a great deal but are hitting bottlenecks as they scale to more complex tasks and larger models. Breaking through the performance bottlenecks of today’s machine learning techniques typically requires adding more compute power and more data. The result is enormous models that consume vast amounts of power, limiting scalability and causing environmental harm.
We need a new approach to achieve significant breakthroughs in performance and scalability while reducing power consumption on today’s hardware.
SOLUTION
Brain-inspired, optimized networks on FPGAs yield multiplicative throughput improvements
In contrast to the standard dense representations used in most deep learning networks, we created networks that borrow several aspects of the brain’s efficient structure. These brain-inspired, optimized networks not only deliver equivalent accuracy to their standard counterparts, they drastically reduce computational requirements and can run on today’s hardware.
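The compute savings from sparsity can be sketched with a back-of-the-envelope calculation. This is an illustrative model only, not Numenta's actual network: the layer size and the 5% weight density and 10% activation density figures are hypothetical, chosen to show how sparsity in weights and activations compounds multiplicatively.

```python
# Illustrative sketch (hypothetical numbers, not Numenta's implementation):
# when both weights and activations are sparse, the savings multiply.

def macs_dense(n_in, n_out):
    """Multiply-accumulate operations for a dense fully connected layer."""
    return n_in * n_out

def macs_sparse(n_in, n_out, weight_density, activation_density):
    """MACs when only a fraction of weights are nonzero and only a
    fraction of input activations are active; the savings compound."""
    return int(n_in * n_out * weight_density * activation_density)

dense = macs_dense(1024, 1024)
sparse = macs_sparse(1024, 1024, weight_density=0.05, activation_density=0.1)
print(dense // sparse)  # compute-reduction factor for this toy layer
```

Under these toy assumptions the sparse layer needs roughly 1/200th of the multiply-accumulates of its dense counterpart, which is how sparsity can translate into large throughput and energy gains on hardware that can exploit it.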
We demonstrated these performance improvements on inference tasks using the Google Speech Commands (GSC) dataset. We created optimized networks on two off-the-shelf Xilinx products:
- Alveo™ U250 – a powerful platform designed for datacenters
- Zynq™ UltraScale+ ZU3EG – a smaller platform designed for embedded applications
RESULTS
100x throughput speedup and power improvement, and new possibilities for deep learning at the edge
Our optimized networks delivered over a 100x throughput speedup and power improvement over their traditional counterparts on the large FPGA platform. Additionally, our optimized network ran efficiently on even the smaller embedded platform, where the standard network could not fit, opening new possibilities for Edge AI.
BENEFITS
Better resource utilization, untapped edge opportunities and critical energy savings
This dramatic speed improvement provides great benefits, enabling:
- Implementation of much larger networks using the same resources
- Implementation of more copies of networks on the same resources
- Ability to run networks on edge platforms where traditional networks don’t fit
- Massive energy savings and lower costs due to scaling efficiencies
ADDITIONAL RESOURCES
Related Case Studies
Developing AI-powered games on existing CPU infrastructures without breaking the bank
AI is opening a new frontier for gaming, enabling more immersive and interactive experiences than ever before. NuPIC enables game studios and developers to leverage these AI technologies on existing CPU infrastructure as they embark on building new AI-powered games.
20x inference acceleration for long sequence length tasks on Intel Xeon Max Series CPUs
Numenta technologies running on the Intel 4th Gen Xeon Max Series CPU enables unparalleled performance speedups for longer sequence length tasks.
Numenta + Intel achieve 123x inference performance improvement for BERT Transformers
Numenta technologies combined with the new Advanced Matrix Extensions (Intel AMX) in the 4th Gen Intel Xeon Scalable processors yield breakthrough results.