Embedded Systems

Compiler-aware AI Hardware Design for Edge Devices

by Paul Palomero Bernardo, Patrick Schmid, Christoph Gerum, and Oliver Bringmann
In Proceedings of the 8th International Workshop on Edge Systems, Analytics and Networking, pages 31–36. Association for Computing Machinery, 2025.

Keywords: edge computing, hardware accelerator, deep learning compiler

Abstract

The adoption of novel AI hardware is often hindered by the lack of available deployment solutions. To address this challenge, multi-target deep learning (DL) compilers have emerged, offering a large variety of optimizations and automated deployment solutions that bridge the gap between AI software and hardware. However, support for dedicated AI accelerators remains limited due to their vast architectural differences and unique characteristics. This paper introduces a novel compiler-aware AI hardware design for edge devices that improves the deployment of AI workloads on edge AI accelerators by aligning the hardware architecture with the capabilities of DL compilers. By analyzing workload representations in the DL compiler TVM, we derive architecture-level and component-level design principles to enhance accelerator usability. We demonstrate this approach with a programmable edge AI accelerator, optimized through a compiler-driven design space exploration and implemented in GlobalFoundries 22FDX+, achieving an energy efficiency of 0.697 pJ/MAC for EEG-based seizure detection and video capsule endoscopy.
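
For illustration, the sketch below shows how a workload can be inspected at TVM's Relay level, one of the workload representations the abstract refers to. This is a minimal sketch, not the paper's actual tooling: the tiny int8 convolution is a hypothetical stand-in for the EEG and endoscopy models, and the operator census it prints is just one example of the kind of information a compiler-aware hardware design could draw on.

```python
# Minimal sketch (not the paper's tooling): inspect a workload's Relay
# representation in TVM and enumerate the operators an edge accelerator
# would need to support. The int8 conv2d + relu is a hypothetical
# stand-in for the paper's EEG/video-capsule workloads.
import tvm
from tvm import relay

# Hypothetical stand-in workload: one quantized convolution + activation.
data = relay.var("data", shape=(1, 3, 32, 32), dtype="int8")
weight = relay.var("weight", shape=(8, 3, 3, 3), dtype="int8")
out = relay.nn.conv2d(data, weight, kernel_size=(3, 3),
                      padding=(1, 1), out_dtype="int32")
out = relay.nn.relu(out)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
mod = relay.transform.InferType()(mod)

# Count operator occurrences in the Relay graph; statistics like this
# can inform which compute units the accelerator needs and how the
# compiler can map workloads onto them.
op_counts = {}

def visit(node):
    if isinstance(node, relay.Call) and isinstance(node.op, tvm.ir.Op):
        op_counts[node.op.name] = op_counts.get(node.op.name, 0) + 1

relay.analysis.post_order_visit(mod["main"], visit)
print(mod)        # full Relay IR of the workload
print(op_counts)  # e.g. {'nn.conv2d': 1, 'nn.relu': 1}
```

Aligning the hardware with what such a representation exposes (operator set, data types, fusion patterns) is what lets a multi-target compiler like TVM target the accelerator without bespoke deployment glue.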