AMD Updates AI Engine In New Versal Series

NUREMBERG, Germany—AMD has updated the AI Engine in its second-generation Versal AI Edge Series Gen 2 and added post-processing capabilities, putting an application’s entire data path on a single piece of silicon.

Manuel Uhm, director of Versal marketing at AMD, told EE Times that the focus on integrating data pre-processing was part of the Xilinx heritage that came to AMD when it acquired Xilinx in 2022.

“When you look at non-adaptive SoCs from other companies, typically what you see is they have an inference capability along with a number of high-performance Arm cores, and they often have an FPGA at the front end to do pre-processing, that’s very common,” he said. “Another approach, which customers are taking with our first gen Versal AI Edge parts, is using programmable logic for pre-processing and the inference on the AI Engines, but then have a separate CPU or MPU, and there’s a lot of overhead associated with doing that.”

Versal AI Edge Gen 1 had pre-processing capabilities provided by its programmable logic block and AI inference accelerated by AMD’s on-chip AI Engine. Second-generation parts have added embedded Arm CPUs for post-processing.

Arm cores have been added to Versal AI Edge for post-processing. (Source: AMD)

Putting pre-processing, AI inference and post-processing on the same chip avoids a multi-chip solution, which would be bigger, more complicated and more power-hungry, Uhm said. Multiple chips also mean more security vulnerabilities and points of failure.



For the second generation, AMD has added eight Arm Cortex-A78AE application processors and ten Arm Cortex-R52 real-time processors. These handle post-processing tasks such as complex decision-making based on inference results, as well as control functions. Both core types are designed for ASIL-D and SIL-3 applications, covering automotive, industrial, robotics, machine vision, aerospace and defense.
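
As a rough, hypothetical illustration of the sort of decision logic these cores could run, the sketch below maps detections coming out of an inference stage onto a simple control action; the Detection structure, labels and thresholds are invented for the example and do not represent AMD or customer code.

from dataclasses import dataclass

@dataclass
class Detection:
    """One object reported by the inference stage (hypothetical format)."""
    label: str          # e.g. "pedestrian" or "vehicle"
    confidence: float   # 0.0 to 1.0, produced by the inference stage
    distance_m: float   # estimated range, e.g. from a stereo pre-processing stage

def post_process(detections: list[Detection],
                 brake_distance_m: float = 15.0,
                 min_confidence: float = 0.6) -> str:
    """Toy decision step: turn inference output into a control action."""
    for det in detections:
        if det.confidence < min_confidence:
            continue                      # ignore low-confidence detections
        if det.label == "pedestrian" and det.distance_m < brake_distance_m:
            return "BRAKE"                # highest-priority action
        if det.label == "vehicle" and det.distance_m < brake_distance_m:
            return "WARN"
    return "CRUISE"

# Example: two detections handed back by the inference stage.
print(post_process([Detection("vehicle", 0.9, 40.0),
                    Detection("pedestrian", 0.8, 12.0)]))   # prints BRAKE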


AMD customer Subaru has already said it will use Versal AI Edge Gen 2 in future versions of its EyeSight system, which handles pre-collision braking, lane departure warnings, adaptive cruise control and lane-keep assist. Subaru needs to run both the application processors and the real-time processors in lockstep for functional safety. A stereo vision algorithm runs on camera data in the programmable logic portion of the device, letting Subaru differentiate against LiDAR-based systems, which are more expensive. Subaru also cited AMD’s low-latency AI Engine with advanced data type support among its reasons for choosing Versal AI Edge Gen 2.


Second-gen AI Engine

AMD’s AI Engine is an array of vector processors designed for efficient AI inference. Compared with the first generation, AMD has doubled the compute and increased the memory in each tile. The lineup includes parts with AI Engines ranging from 30 to 185 TOPS (INT8).


AMD has also introduced support for the new shared-exponent number formats MX6 and MX9, which boost throughput without giving up accuracy relative to integer formats. The company said up to 3× the TOPS/W is achievable by combining the bigger compute tiles with quantizing to MX6 instead of INT8.
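
MX6 and MX9 are block floating point formats: a group of values shares one exponent, so each element stores only a small mantissa. The sketch below, in plain NumPy, shows the general shared-exponent idea; the block size, mantissa width and rounding here are illustrative and do not follow the exact MX6/MX9 bit layouts.

import numpy as np

def shared_exponent_quantize(block: np.ndarray, mantissa_bits: int = 4):
    """Quantize a block of values to a shared-exponent representation.
    Illustrative only: one exponent per block, a small signed mantissa per
    element; this is not the exact MX6/MX9 bit layout."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return np.zeros(block.shape, dtype=np.int8), 0
    # Pick the shared exponent so the largest element uses the full mantissa range.
    exponent = int(np.floor(np.log2(max_abs)))
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    limit = 2 ** mantissa_bits - 1
    mantissas = np.clip(np.round(block / scale), -limit, limit).astype(np.int8)
    return mantissas, exponent

def shared_exponent_dequantize(mantissas, exponent, mantissa_bits: int = 4):
    """Reconstruct approximate values from the mantissas and shared exponent."""
    scale = 2.0 ** (exponent - (mantissa_bits - 1))
    return mantissas.astype(np.float32) * scale

block = np.array([0.12, -0.5, 0.33, 0.02], dtype=np.float32)
q, e = shared_exponent_quantize(block)
print(q, e)                                  # mantissas and the shared exponent
print(shared_exponent_dequantize(q, e))      # approximate reconstruction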



AMD’s AI Engine now has hardened control processors, freeing up programmable logic for other tasks. (Source: AMD)

FP8 and FP16 formats have also been introduced for applications that need the dynamic range and resolution of floating point.


“Our customers are doing all kinds of amazing, cool, crazy things and we have to support those,” Uhm said. “Some of them require the additional resolution and they’re willing to give up performance to do that. For others, it’s all about the throughput, and they can sacrifice some level of dynamic range or accuracy. We give them the ability to choose where they want to be.”
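
A toy NumPy comparison, unrelated to AMD’s specific hardware formats, illustrates the trade-off Uhm describes: with a single per-tensor scale, integer quantization crushes values at the small end of a wide dynamic range, while a floating-point type keeps a per-element exponent and preserves them.

import numpy as np

# Activation-like values spanning several orders of magnitude.
x = np.array([0.001, 0.05, 0.9, 120.0], dtype=np.float32)

# INT8 with a single per-tensor scale: the small values collapse toward zero.
scale = np.max(np.abs(x)) / 127.0
int8_roundtrip = np.round(x / scale).astype(np.int8) * scale

# FP16 keeps a per-element exponent, so the small values survive the round trip.
fp16_roundtrip = x.astype(np.float16).astype(np.float32)

print("int8:", int8_roundtrip)   # 0.001 and 0.05 are lost
print("fp16:", fp16_roundtrip)   # all four values preserved to ~0.1% or better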


AMD has also hardened the AI Engine’s control processors, previously implemented in programmable logic, into the engine itself.



AMD Versal AI Edge Gen 2 parts will come in versions between 30 and 185 TOPS (INT8). (Source: AMD)

“Whereas before you used to have to use a combination of some level of programmable logic in order to support the overall inference engine…the goal is you never have to go to the programmable logic,” Uhm said. “The programmable logic is used specifically for other things you need to do to supplement pre-processing and maybe post processing, signal processing, data conditioning, etc.”


These control processors handle data movement through the compute arrays, which Uhm said is “non-trivial.”


“When you think of an array of hundreds of processors and you may be broadcasting, you may be unicasting, you may be parallelising the flows, that’s non-trivial, we’ve found,” he said. “Building that into the array makes it a lot simpler to build out the NPU IP on top of the AI Engine.”
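
The snippet below is a deliberately simplified software model of the two movement patterns Uhm mentions, broadcast and unicast, just to show the difference in fan-out; in the actual AI Engine this routing is performed by the hardened control processors and interconnect, not by Python.

def distribute(tile_ids: list[int], buffers: list[list[float]], mode: str) -> dict:
    """Toy software model of moving data into an array of compute tiles.
    'broadcast' sends one buffer (e.g. shared weights) to every tile, while
    'unicast' gives each tile its own slice (e.g. a different chunk of
    activations). In the real AI Engine this routing is handled in hardware."""
    if mode == "broadcast":
        return {t: buffers[0] for t in tile_ids}
    if mode == "unicast":
        return {t: buf for t, buf in zip(tile_ids, buffers)}
    raise ValueError(f"unknown mode: {mode}")

weights = [[0.1, 0.2, 0.3]]                 # one buffer shared by all tiles
activations = [[1.0], [2.0], [3.0], [4.0]]  # one slice per tile

print(distribute([0, 1, 2, 3], weights, "broadcast"))
print(distribute([0, 1, 2, 3], activations, "unicast"))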


Vitis AI

Versal AI Edge Gen 2 will use AMD’s Vitis development environment, which is built for heterogeneous compute, to program all the on-chip compute blocks.


“The secret here is having our tools all targeting a single device, which allows you to do system level debug, even when you have separate developers working in parallel,” Uhm said. “Software folks working in parallel with hardware folks, in parallel with AI scientists…we’re able to bring it together in a single development environment, with a single tool and do the system-level debugging, which you can’t do when you have four or five different devices with four or five different tools.”


Within Vitis, Vitis AI includes a quantizer, pruning tools, a model compiler, runtime, drivers and firmware. It can also be used with third-party quantizers and sparsity tools.


“We’re not going to teach AI scientists how to write RTL,” Uhm said, noting that Vitis AI sits above AMD’s Vivado RTL flow in the software stack. “They want to use PyTorch, TensorFlow, Triton, the tools they are used to, and they want to leverage open-source models. The key for us is by having the auto-quantization happen in Vitis AI, we can optimize open-source models for our processor.”
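
Vitis AI’s quantizer has its own interface, which is not shown here; as a stand-in, the sketch below uses PyTorch’s built-in dynamic quantization on a toy model to show the general shape of the post-training quantization step that such a flow automates.

import torch
import torch.nn as nn

# A toy stand-in model; in practice this would be an open-source network
# built in PyTorch, as Uhm describes.
model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)
model.eval()

# Post-training dynamic quantization of the Linear layers to INT8 using
# PyTorch's generic built-in flow; Vitis AI performs a comparable but
# hardware-aware quantization step with its own tooling.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Both models take the same input; the quantized one stores INT8 weights.
x = torch.randn(1, 64)
print(quantized(x).shape)   # torch.Size([1, 10])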


Vitis can also compile AI workloads for the programmable logic, though this is generally less efficient than running them on the AI Engine. It may still be useful in some circumstances, perhaps for unsupported data types, Uhm suggested.


Silicon samples for Versal AI Edge Gen 2 parts will be available in the first half of 2025.

 ----From EE Times
