Embedded Systems

Testing reliability techniques for SoCs with fault tolerant CGRA by using live FPGA fault injection

by Johannes M. Kühn, Thomas Schweizer, Dustin Peterson, Thommy Kuhn, and Wolfgang Rosenstiel
In 2013 International Conference on Field-Programmable Technology (FPT): 462-465, 2013.

Keywords: fault tolerance, field programmable gate arrays, integrated circuit reliability, logic design, reconfigurable architectures, redundancy, system-on-chip, field programmable gate array, dynamic functional verification, dynamic remapping, TMR technique, triple modular redundancy technique, SoC design, system on chip design, coarse grained reconfigurable architectures, fault injection method, live FPGA, fault tolerant CGRA, testing reliability technique, Tunneling magnetoresistance, Reliability, Circuit faults, Context, Computer architecture

Abstract

In this work, we intend to demonstrate a number of reliability techniques developed for Coarse Grained Reconfigurable Architectures (CGRA). The techniques to be demonstrated target different portions of a System on Chip (SoC) design consisting of a general-purpose CPU, various accelerators, and a CGRA, which may also be used for application acceleration. On the CGRA, we will demonstrate a lightweight Triple Modular Redundancy (TMR) technique which mitigates the hardware overhead usually incurred by TMR. If a CGRA fault is detected, we use Dynamic Remapping of the application to avoid faulty components and thus restore the functionality of the mapped application. At the SoC level, we demonstrate Dynamic Functional Verification, which samples components of the SoC in a time-multiplexed manner and thus detects faults in them. The complete system is emulated on a Field Programmable Gate Array (FPGA), for which we developed a fast and accurate fault injection method to test the developed techniques in a live and realistic way.
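
As a rough illustration of the idea behind TMR combined with fault injection, the following Python sketch runs one operation on three software "replicas", optionally flips a single bit in one replica's output, and takes a bitwise majority vote. It shows classic TMR only, not the lightweight variant described in the abstract, and it is a software model rather than FPGA-level injection. All names (`majority_vote`, `inject_bit_flip`, `tmr_execute`, `add16`) are hypothetical and do not come from the paper.

```python
from typing import Callable, Optional, Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def inject_bit_flip(value: int, bit: int) -> int:
    """Emulate a single-bit fault by flipping one bit of a value."""
    return value ^ (1 << bit)


def tmr_execute(
    op: Callable[[int, int], int],
    x: int,
    y: int,
    faulty_replica: Optional[int] = None,
    fault_bit: int = 0,
) -> Tuple[int, Optional[int]]:
    """Run `op` on three replicas and vote on the result.

    If `faulty_replica` is given (0, 1 or 2), that replica's output is
    corrupted by a single bit flip before voting. Returns the voted
    result and the index of a disagreeing replica, or None if all agree.
    """
    outputs = [op(x, y) for _ in range(3)]
    if faulty_replica is not None:
        outputs[faulty_replica] = inject_bit_flip(
            outputs[faulty_replica], fault_bit
        )
    voted = majority_vote(*outputs)
    suspects = [i for i, out in enumerate(outputs) if out != voted]
    return voted, (suspects[0] if suspects else None)


def add16(x: int, y: int) -> int:
    """Hypothetical 16-bit adder standing in for a processing element."""
    return (x + y) & 0xFFFF


if __name__ == "__main__":
    # Fault-free run: all replicas agree.
    print(tmr_execute(add16, 1000, 234))
    # Single-bit fault in replica 1: the vote masks it and flags the replica.
    print(tmr_execute(add16, 1000, 234, faulty_replica=1, fault_bit=3))
```

In this sketch the index of the disagreeing replica is the kind of information a remapping step could use to avoid a faulty component; how the actual system detects and reacts to faults is not specified here.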