GOURD: Tensorizing Streaming Applications to Generate Multi-Instance Compute Platforms
In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(11): 4166-4177, 2024.
Keywords: Integrated circuits, Actuators, Design automation, Shape, Memory architecture, Focusing, Edge AI, Hardware, Sensors, Data orchestration, hardware platform generation, multidimensional data flow, synchronization
Abstract
In this article, we rethink the dataflow processing paradigm at a higher level of abstraction to automate the generation of multi-instance compute and memory platforms with interfaces to I/O devices (sensors and actuators). Because the different compute instances (NPUs, CPUs, DSPs, etc.) and I/O devices do not necessarily have compatible interfaces at the dataflow level, an automated translation is required. In multidimensional dataflow scenarios, however, it is inherently difficult to reason about buffer sizes and iteration order without knowing the shape of the data access pattern (DAP) that the dataflow follows. To capture this shape and the platform composition, we define a domain-specific representation (DSR) and devise a toolchain that generates a synthesizable platform, including appropriate streaming buffers for platform-specific tensorization of the data between incompatible interfaces. Platforms such as sensor edge AI devices can thus be specified by focusing only on the shape of the data provided by the sensors and transmitted among compute units, which makes it possible to evaluate and generate different dataflow design alternatives with significantly reduced design time.
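
To illustrate why the shape of the data access pattern determines buffer size, the following is a minimal sketch and not the GOURD DSR or toolchain. It assumes a hypothetical setup: a sensor streams a 2D frame in row-major order, one element per time step, and a consumer reads k x k sliding windows with stride 1, each window being consumed as soon as its last element arrives. The function names `peak_buffer_occupancy` and `line_buffer_size` are illustrative only.

```python
def peak_buffer_occupancy(height: int, width: int, k: int) -> int:
    """Peak number of elements held by a streaming buffer (assumed model).

    Producer: row-major stream, one element per time step.
    Consumer: k x k sliding windows, stride 1, consumed when complete.
    An element is kept from its arrival until the last window using it
    has been consumed (inclusive).
    """
    if not (1 <= k <= min(height, width)):
        raise ValueError("window must fit inside the frame")

    # Difference array over time steps: +1 on arrival, -1 after release.
    delta = [0] * (height * width + 1)
    for i in range(height):
        for j in range(width):
            t_in = i * width + j
            # Top-left corner of the last window that still needs (i, j).
            r_last = min(i, height - k)
            c_last = min(j, width - k)
            # That window completes when its bottom-right element arrives.
            t_out = (r_last + k - 1) * width + (c_last + k - 1)
            delta[t_in] += 1
            delta[t_out + 1] -= 1

    peak = live = 0
    for d in delta:
        live += d
        peak = max(peak, live)
    return peak


def line_buffer_size(width: int, k: int) -> int:
    """Closed form for the same assumed model: k-1 full rows plus k elements."""
    return (k - 1) * width + k


if __name__ == "__main__":
    for h, w, k in [(6, 8, 3), (5, 10, 2), (4, 4, 1)]:
        print((h, w, k), peak_buffer_occupancy(h, w, k), line_buffer_size(w, k))
```

Under these assumptions the simulated peak matches the closed form, which depends only on the frame width and the window size. A different iteration order or window shape would change both the release times and the resulting buffer size, which is the kind of dependency the abstract refers to.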