Startup Delivers Accelerated Compute IP
December 1, 2020
by Kevin Morris
Autonomous driving is a wildly challenging problem. Of all the headline-grabbing technologies in development today, replacing the human driver in a car probably takes the most computing power, although you wouldn’t guess that from the obvious lack of computing power demonstrated by many of our fellow human drivers on the road. But the invention of “artificial stupidity” notwithstanding (did you SEE the way that idiot cut me off?), the industry is taking a methodical, phased approach to the task, with six levels of automation (numbered zero to five) starting with no automation whatsoever (level zero) up to no human intervention whatsoever (level five).
By the time we reach levels three (conditional automation) and four (high automation), the computing power required is enormous, because the vehicle must have a contextual understanding of the driving environment. That means rapidly processing loads of data from a large number of sensors and making critical real-time decisions based on that data. Clearly, a complex heterogeneous computing system is required, with massive parallel capabilities, low latency, minimal power consumption, small footprint, high reliability, and – ultimately – low cost. AI inferencing must be done in coordination with conventional computing, and workloads must be managed and coordinated across a bewildering array of computational elements.
Like we said, it’s a wildly challenging problem.
VSORA is a technology startup that has been creating digital signal processing (DSP) design IP for the wireless and communications industry. The company realized that the architectures it was creating to support 5G infrastructure could also serve autonomous driving if its approach to DSP acceleration were combined with AI inference technology. The result is IP that implements a new programmable, scalable, software-driven, multi-core, dual-purpose device that combines DSP and artificial intelligence (AI) acceleration for the autonomous driving industry.
Their design, called AD1028, is an IP block implementing what the company says is “the first PetaFLOPS computational platform to accelerate Level 4 (L4) and Level 5 (L5) autonomous vehicle designs. The IP block allows the design of a compact, low-power single chip L4/L5 control unit.” AD1028 can be implemented on 7-nanometer process technology (and possibly on 5nm technology). In 7nm, the AD1028 logic area measures 35 mm² and delivers petaFLOPS performance at less than 35 watts. VSORA says AD1028 has computational power of 1,028 TeraFLOPS (roughly one PetaFLOPS) running at 2GHz, and that it can process eight-million-pixel images with YOLOv3 in less than 7 ms and full HD images in less than 1.6 ms.
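To put those headline numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the vendor figures quoted above; the function and constant names are ours, not VSORA's. It assumes the 35-watt and latency figures are upper bounds (so the derived efficiency and frame rates are lower bounds) and that frames are processed one at a time with no pipelining or batching.

```python
# Back-of-the-envelope arithmetic on the vendor-quoted AD1028 figures.
# These are claims from the article, not measurements; all names here
# are illustrative and not part of any VSORA tool or API.

PEAK_TFLOPS = 1028       # "1,028 TeraFLOPS"
CLOCK_GHZ = 2.0          # "running at 2GHz"
POWER_W = 35.0           # "less than 35 watts" (treated as an upper bound)
LATENCY_8MP_MS = 7.0     # eight-million-pixel image, "less than 7 ms"
LATENCY_FHD_MS = 1.6     # full HD image, "less than 1.6 ms"


def flops_per_cycle(tflops: float, ghz: float) -> float:
    """Floating-point operations the whole block must retire per clock."""
    return (tflops * 1e12) / (ghz * 1e9)


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak efficiency; a lower bound if `watts` is an upper bound."""
    return tflops / watts


def frames_per_second(latency_ms: float) -> float:
    """Frame rate assuming one frame at a time, back to back (assumption)."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    print(f"FLOPs per clock cycle: {flops_per_cycle(PEAK_TFLOPS, CLOCK_GHZ):,.0f}")
    print(f"Peak TFLOPS per watt:  {tflops_per_watt(PEAK_TFLOPS, POWER_W):.1f}")
    print(f"8-megapixel frames/s:  {frames_per_second(LATENCY_8MP_MS):.0f}")
    print(f"Full HD frames/s:      {frames_per_second(LATENCY_FHD_MS):.0f}")
```

Taken at face value, the claims work out to about 514,000 floating-point operations per clock cycle, at least roughly 29 TFLOPS per watt, and at least about 143 eight-megapixel or 625 full HD frames per second through YOLOv3.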
The IP block is delivered as synthesizable RTL, and it is fully programmable, with a complete application development flow carried out entirely in high-level languages that VSORA describes as “Matlab-like,” “Tensorflow-like,” and C++. This high-level programmability should give automotive developers the ability to quickly implement autonomous driving designs without needing to dive into low-level languages and without being constrained by hardwired accelerator solutions.
A typical application has an array of environmental sensors such as radar, lidar, cameras, ultrasonic sensors, GPS/GNSS, inertial sensors, and so forth, as well as external data supplied by cellular vehicle-to-everything (C-V2X) networks. The data from all these sources must be collected, filtered, and combined to extract the context – the actual situation in the environment through which the car is moving – what obstacles are relevant to the car’s path, what traffic rules are in force, what environmental factors such as weather and road condition exist, what traffic exists along the path, and anything else relevant to the safe conduct of the trip.
VSORA attacks this sensor fusion challenge with an architecture that combines DSP and DNN acceleration, with the two exchanging data via a shared very-high-bandwidth on-chip memory. This allows signal processing and AI algorithms to execute in parallel, minimizing latency in the “perception” phase of autonomous driving (typically handled by conventional DSP algorithms), and giving the “planning” phase (often handled by AI inference) direct low-latency access to the data so that decisions can be made rapidly.
This setup, with large amounts of on-chip memory, shortens the datapath and dramatically improves throughput, latency, and power consumption compared with relying on external memory, which is often a bottleneck in these systems. VSORA says AD1028 “can handle multiple parallel instances of any combination of both types of sensor fusion, including hybrid fusion, when the fusion requires signal processing as well as AI functionality [such as] fusions between camera and lidar and/or radar.”
Ultimately, the only way to get cost, power consumption, and form factor down and performance up to the levels required for mass deployment will be for the various suppliers to create their own custom silicon. Having pre-designed IP that is up to the task will accelerate that custom silicon development, reduce the risk, and let system designers use VSORA’s novel technology in their own custom designs. The in-system programmability will also allow the silicon to adapt easily and rapidly to the high rate of change in ADAS and autonomous driving technology, without requiring silicon re-spins.
VSORA says the AD1028 is the first in a family of products with a range of processing power. The AD514 will be capable of 514 TFLOPS and the AD2056 of 2,056 TFLOPS, so AD1028 will have siblings with either more or less compute power as needed. The company says both of the additional devices will be released before the end of 2020.