Loop transformation methodology for fixed-rate video, image and telecom processing applications

Abstract
Many real-time signal processing applications are dominated by iterative loop constructs which exhibit a large amount of parallelism. In general, a realisation matched to the required rate of these applications exploits only a relatively small part of the parallelism available in the algorithm. This paper addresses the important problem of selecting the appropriate algorithmic-level decisions, in particular loop manipulations and the like, to arrive at an area-optimized specification for use in register-transfer level synthesis tools. One of the crucial cost factors in this optimisation is related to memory storage. An effective model and methodology are proposed to derive an optimized architecture with fully matched throughput, while avoiding a full traversal of the large search space. The effectiveness of our approach is substantiated with several realistic test cases.
