Embedded Systems

GOURD: Tensorizing Streaming Applications to Generate Multi-Instance Compute Platforms

by Patrick Schmid, Paul Palomero Bernardo, Christoph Gerum, and Oliver Bringmann
In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(11): 4166-4177, 2024.

Keywords: Integrated circuits, Actuators, Design automation, Shape, Memory architecture, Focusing, Edge AI, Hardware, Sensors, Data orchestration, Hardware platform generation, Multidimensional data flow, Synchronization

Abstract

In this article, we rethink the dataflow processing paradigm at a higher level of abstraction to automate the generation of multi-instance compute and memory platforms with interfaces to I/O devices (sensors and actuators). Because the different compute instances (NPUs, CPUs, DSPs, etc.) and I/O devices do not necessarily have compatible interfaces at the dataflow level, an automated translation is required. In multidimensional dataflow scenarios, however, it is inherently difficult to reason about buffer sizes and iteration order without knowing the shape of the data access pattern (DAP) that the dataflow follows. To capture this shape and the platform composition, we define a domain-specific representation (DSR) and devise a toolchain that generates a synthesizable platform, including appropriate streaming buffers for platform-specific tensorization of the data between incompatible interfaces. Platforms such as sensor edge AI devices can thus be specified by focusing only on the shape of the data provided by the sensors and transmitted among compute units, which makes it possible to evaluate and generate different dataflow design alternatives with significantly reduced design time.
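
As a rough illustration of why the shape of the data access pattern determines buffer size, the sketch below regroups a row-major element stream (such as pixels arriving from a sensor) into 2-D windows for a consumer that expects tiles. It is not the GOURD DSR or toolchain: the function names `min_window_buffer` and `stream_windows`, the row-major scan order, and the stride-1 sliding window are all assumptions chosen for the example.

```python
from collections import deque


def min_window_buffer(width, kh, kw):
    """Elements that must be held so a kh x kw window is available
    from a row-major stream of rows that are `width` elements wide
    (hypothetical helper, assumes stride 1)."""
    return (kh - 1) * width + kw


def stream_windows(elements, width, kh, kw):
    """Yield every kh x kw window (nested lists) from a row-major stream,
    keeping only the minimal number of elements in the buffer."""
    size = min_window_buffer(width, kh, kw)
    buf = deque(maxlen=size)  # oldest element is dropped automatically
    for i, value in enumerate(elements):
        buf.append(value)
        row, col = divmod(i, width)
        # A full window ends at (row, col) once enough rows and columns arrived.
        if row >= kh - 1 and col >= kw - 1:
            # buf[0] is the top-left element of the current window.
            yield [[buf[r * width + c] for c in range(kw)] for r in range(kh)]


if __name__ == "__main__":
    # A 3 x 4 frame streamed element by element, regrouped into 2 x 2 windows.
    for window in stream_windows(range(12), width=4, kh=2, kw=2):
        print(window)
```

Under these assumptions the buffer holds `(kh - 1) * width + kw` elements, so the required storage follows directly from the frame width and the window shape. A different scan order or access pattern would change both the formula and the iteration order.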