Embedded Systems

A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator

by Oliver Bause, Paul Palomero Bernardo, and Oliver Bringmann
In MBMV 2024; 27. Workshop, pages 31–40, 2024.

Abstract

As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory hierarchy framework tailored for the per-layer adaptive memory access patterns of DNNs. The hierarchy requests data on demand from the off-chip memory to provide it to the accelerator's compute units. The objective is to strike an optimized balance between minimizing the required memory capacity and maintaining high accelerator performance. The framework is characterized by its configurability, allowing the creation of a tailored memory hierarchy with up to five levels. Furthermore, the framework incorporates an optional shift register as the final level to increase the flexibility of the memory management process. A comprehensive loop-nest analysis of DNN layers shows that the framework can efficiently execute the access patterns of most loop unrollings. Synthesis results and a case study of the DNN accelerator UltraTrail indicate a possible reduction in chip area of up to 62.2%, as smaller memory modules can be used. At the same time, the performance loss can be minimized to 2.4%.
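To make the idea of a per-layer configurable hierarchy more concrete, the sketch below shows how such a configuration could be described in Python. It is purely illustrative and not the paper's actual framework or interface; all names (MemLevel, HierarchyConfig, the field names, and the example sizes) are hypothetical assumptions, chosen only to mirror the constraints stated in the abstract (one to five buffer levels, optional shift register as the final level).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemLevel:
    """One on-chip buffer level of the hypothetical hierarchy (names assumed)."""
    depth_words: int  # number of data words the level can hold
    word_bits: int    # width of one data word in bits


@dataclass
class HierarchyConfig:
    """Per-layer configuration: up to five buffer levels plus an optional
    shift register as the final level, as described in the abstract."""
    levels: List[MemLevel] = field(default_factory=list)
    use_shift_register: bool = False
    shift_register_len: int = 0

    def validate(self) -> None:
        # Enforce the constraints mentioned in the abstract.
        if not 1 <= len(self.levels) <= 5:
            raise ValueError("hierarchy supports one to five levels")
        if self.use_shift_register and self.shift_register_len <= 0:
            raise ValueError("shift register length must be positive")


# Example: a small two-level configuration with a shift-register final stage
# (sizes are invented for illustration, e.g. a 3x3 convolution window).
cfg = HierarchyConfig(
    levels=[MemLevel(depth_words=1024, word_bits=8),
            MemLevel(depth_words=64, word_bits=8)],
    use_shift_register=True,
    shift_register_len=9,
)
cfg.validate()
```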