Results: 27

(searched for: doi:10.1109/date.2010.5456950)
IEEE Transactions on Multi-scale Computing Systems, Volume 4, pp 127-140; https://doi.org/10.1109/tmscs.2017.2774294

Abstract:
The implementation and optimization of dynamic dataflow programs on multi/many-core platforms require solving a very difficult problem: how to partition and schedule the processing elements and dimension their interconnecting buffers according to given optimization functions in terms of throughput, memory usage, and energy consumption. This problem is NP-hard even for two cores. Thus, finding a close-to-optimal solution consists of exploring the design space by appropriate heuristics identifying those design points that maximize or minimize the desired (multiple) objective functions subject to a set of constraints. In general, exploring the design space efficiently is a challenging task due to the massive number of admissible design points. Efficient estimation methodologies are necessary to support an effective search of the design space by reducing to a minimum the cost and the number of measurements on the physical platform. This paper presents a new methodology that provides high-precision estimations of the performance of dynamic dataflow programs on multi/many-core platforms for any set of design configurations. The estimations rely on post-processing the execution trace obtained from a single execution of the program. The paper describes the estimation methodology, implementation tools, and the information that is obtained from many/multi-core dataflow executions and used to drive the optimization heuristics.
, Miguel Angel Aguilar, Juan Fernando Eusse, Jeronimo Castrillon, Weihua Sheng
Published: 27 September 2017
The publisher has not yet granted permission to display this abstract.
, Robert Khasanov, Jeronimo Castrillon, Marcus Hähnel, Till Smejkal, Hermann Härtig
Published: 12 June 2017
Abstract:
For embedded system software, it is common to use static mappings of tasks to cores. This becomes considerably more challenging in multi-application scenarios. In this paper, we propose TETRiS, a multi-application run-time system for static mappings for heterogeneous system-on-chip architectures. It leverages compile-time information to map and migrate tasks in a fashion that preserves the predictable performance of using static mappings, allowing the system to accommodate multiple applications. TETRiS runs on off-the-shelf embedded systems and is Linux-compatible. We embed our approach in a state-of-the-art compiler for multicore systems and evaluate the proposed run-time system on a modern heterogeneous platform using realistic benchmarks. We present two experiments whose execution time and energy consumption are comparable to those obtained by the highly-optimized Linux scheduler CFS, and where execution time variance is reduced by a factor of 510, and energy consumption variance by a factor of 83.
M. Michalska, J. J. Ahmad, E. Bezati, S. Casale-Brunet, M. Mattavelli
Abstract:
The exploration of different design configurations of dynamic dataflow programs executed on many-core or multi-core platforms is, in general, a very difficult task. Determining a close-to-optimal partitioning, scheduling and buffer dimensioning configuration, when associated with a performance optimization function, belongs to the class of NP-complete problems. In order to explore the space of feasible solutions with efficient heuristics looking for solutions of good quality, it is important to be able to evaluate the design points in terms of the performance optimization function with sufficient precision without having to physically execute the program on the platform. This paper presents a performance estimation approach and an associated SW tool capable of exploring, with a high level of accuracy, the space of feasible solutions by using only a limited set of measurements from the physical processing platform. Moreover, the estimation model allows an identification of possible improvements that can be applied to different configurations. The results reported validate the accuracy of the methodology using examples of dataflow implementations of dynamic video codec designs for two different classes of platforms: Transport Triggered Architecture and Intel platforms.
IEEE Design & Test, Volume 34, pp 77-90; https://doi.org/10.1109/mdat.2016.2626445

Abstract:
As embedded systems grow more complex and as new applications such as IoT require many design constraints, sophisticated design space exploration techniques are essential in order to find the best compromise between different design goals and their tradeoff. This tutorial gives a structured insight into the field of design space exploration for embedded systems.
Malgorzata Michalska, Simone Casale-Brunet, Endri Bezati, Marco Mattavelli
Abstract:
An important challenge for a dataflow designer is to efficiently explore the design space in order to find a set of configurations that satisfy the defined objective function. The exploration directions may involve the partitioning, scheduling and buffer dimensioning, and all together should drive the designer to maximally benefit from the potential parallelism of an application. Successful exploration can be strongly facilitated by means of performance estimation. This paper presents a tool that allows a high-precision estimation of a program execution on a given platform, when various sets of configurations can be applied. It demonstrates which information related to the multi-core program execution can be extracted and successfully used to drive the optimization procedures. The experimental results are confirmed by an actual execution on different types of platforms.
, Endri Bezati, Marco Mattavelli
Abstract:
The growing complexity of digital signal processing applications makes a compelling case for the adoption of higher-level programming models such as dataflow for the implementation of applications on programmable logic devices and many/multi-core embedded processors. Past research has shown that raising the level of abstraction of design stages does not necessarily incur penalties in performance or resource requirements. Dataflow programs provide high-level behavioral descriptions capable of expressing both sequential and parallel components of application algorithms and enable natural design abstractions, modularity, and portability. This paper presents an overview of the main features, recent achievements and results of an entirely dataflow-based design flow and the associated tools capable of implementing and optimizing complex signal processing system applications on heterogeneous and massively parallel embedded systems.
, , Juan Fernando Eusse, Jeronimo Castrillon, Weihua Sheng
Published: 1 January 2016
The publisher has not yet granted permission to display this abstract.
Malgorzata Michalska, Nicolas Zufferey, Marco Mattavelli
Published: 1 January 2016
Procedia Computer Science, Volume 80, pp 1577-1588; https://doi.org/10.1016/j.procs.2016.05.486

Malgorzata Michalska, Endri Bezati, , Marco Mattavelli
Published: 1 January 2016
Procedia Computer Science, Volume 80, pp 2287-2291; https://doi.org/10.1016/j.procs.2016.05.415

Anastasia Stulova, Rainer Leupers, Gerd Ascheid
Abstract:
Due to energy efficiency requirements of modern embedded systems, chip vendors are inclined towards multicore architectures with different types of processing engines and non-uniform interconnect fabrics. At the same time multiple applications are intended to run concurrently on the devices with such heterogeneous architectures. This rapid growth in the complexity of the hardware and its use cases imposes new challenges on the software development tools. To overcome this complexity, model of computation based approaches are becoming increasingly promising. Synchronous Data Flow (SDF) is a popular specification formalism for streaming applications with inherently concurrent nature. However, the parallelism expressed in the original representation is often not sufficient to maximally exploit the potential of multicore platforms. In this paper we present a holistic methodology for improving the throughput of streaming applications while mapping them onto heterogeneous architectures. The approach uses transformations that adapt the parallelism in SDF according to available platform resources. We use a genetic algorithm to explore SDF instances with the objective of maximizing throughput on a target platform. Our model supports architecture heterogeneity and multi-application scenarios. The experiments indicate that our approach outperforms other techniques for exploiting parallelism on a single application in most of the test cases and enables concurrent applications optimization.
, Jiali Teddy Zhai, Hristo Nikolov, Todor Stefanov
Abstract:
The increasing complexity of modern embedded streaming applications imposes new challenges on system designers nowadays. For instance, the applications evolved to the point that in many cases hard-real-time execution on multiprocessor platforms is needed in order to meet the applications' timing requirements. Moreover, in some cases, there is a need to run a set of such applications simultaneously on the same platform with support for accepting new incoming applications at run-time. Dealing with all these new challenges significantly increases the complexity of system design. However, the design time must remain acceptable. This requires the development of novel systematic and automated design methodologies driven by the aforementioned challenges. In this paper, we propose such a novel methodology for automated design of an embedded multiprocessor system, which can run multiple hard-real-time streaming applications simultaneously. Our methodology does not need the complex and time-consuming design space exploration phase present in most of the current state-of-the-art multiprocessor design frameworks. In contrast, our methodology applies very fast yet accurate schedulability analysis to determine the minimum number of processors needed to schedule the applications, and the mapping of applications' tasks to processors. Furthermore, our methodology enables the use of hard-real-time multiprocessor scheduling theory to schedule the applications in a way that temporal isolation and a given throughput of each application are guaranteed. We evaluate an implementation of our methodology using a set of real-life streaming applications and demonstrate that it can greatly reduce the design time and effort while generating high quality hard-real-time systems.
Jeronimo Castrillon, ,
IEEE Transactions on Industrial Informatics, Volume 9, pp 527-545; https://doi.org/10.1109/tii.2011.2173941

Abstract:
Processor Systems on Chip (MPSoCs) in order to cope with increasing application demands and the tight energy budget of portable devices. The complexity of these systems makes them difficult to program, which has caused academia and industry to look for alternative methodologies and models. In the particular case of multimedia and baseband processing, dataflow models are being proposed and appear to be a sensible choice to represent applications. While high-level models, like dataflow, increase programmers' productivity, new, powerful tools are badly needed to lower the abstract specification into an efficient implementation. In this paper, a framework is presented that provides support for mapping multiple dataflow applications onto heterogeneous MPSoCs. The framework is aware of design constraints, provides different means for performance estimation and supports a variety of mapping heuristics. The tool is showcased on three applications on a virtual platform containing heterogeneous processing elements. The heuristics for single applications reported a speedup of up to 40% when compared against random walk. The multi-application component helped to find an appropriate scheduling configuration that met real-time constraints when the three applications were running simultaneously.
Dmitry Nadezhkin, Todor Stefanov
Abstract:
Process Networks (PNs) are a suitable parallel model of computation (MoC) used to specify embedded streaming applications in a parallel form facilitating the efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome the associated difficulties, an automated procedure exists for derivation of specific polyhedral process networks (PPNs) from static affine nested loop programs (SANLPs). This procedure is implemented in the pn compiler. However, there are many applications, e.g., multimedia applications, signal processing, etc., that have adaptive and dynamic behavior which cannot be expressed as SANLPs. Therefore, in order to handle more dynamic applications, in this paper we address the important question whether we can relax some of the restrictions of the SANLPs while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way. The main contribution of this paper is a first approach for automated translation of affine nested loop programs with while-loops into input-output equivalent PPNs.
Ashkan Beyranvand Nejad, Anca Molnos,
Abstract:
Multi-processor Systems on Chip (MPSoCs) execute multiple applications concurrently. These applications may belong to different domains, i.e., may have firm-, soft-, or non-real time requirements. A composable system simplifies system design, integration, and verification by avoiding the inter-application interference. Existing work demonstrates composability for applications expressed using a single model of computation. For example, Kahn Process Network (KPN) and dataflow are two common data-driven parallel models of computation, each with different properties and suited for different application domains. This paper extends existing work with support for concurrent, composable execution of KPN and dataflow applications on the same MPSoC platform. We formalize a unified execution model by defining its operations that implement the different models of computation on the MPSoC, and discuss the trade-offs involved. Our experiments indicate that multiple applications modeled in KPN and dataflow run composably on an MPSoC platform.
Jeronimo Castrillon, Weihua Sheng, Rainer Leupers
Abstract:
The increasing software content in current and future embedded systems has forced academia and industry to devise new programming methodologies. Only with new methods will software productivity keep pace with users' demands in the very competitive embedded market. Software synthesis, a traditionally formal approach of code generation from abstract models, is an attractive concept to solve the programming problem. In this paper we describe some of the major trends in software synthesis and its place in the overall environment of Electronic System Level (ESL) design and verification. The trends are illustrated by using the Multi-Processor System on Chip Application Programming Studio (MAPS) as an example.
Jerónimo Castrillón, Aamer Shah, , Rainer Leupers, Gerd Ascheid
Abstract:
Advances in process integration, the power wall and end-user application demands have made Multi-Processor Systems on Chip (MPSoCs) a reality. In mobile embedded devices, these systems are heterogeneous in order to cope with stringent real-time and energy constraints, which makes them difficult to program, debug and verify. Therefore, a lot of research in industry and academia has focused on providing solutions to this MPSoC programming problem. In this paper we study and extend one such framework, namely, the MPSoC Application Programming Studio (MAPS) [1]. We analyze the retargetability of MAPS by adding a new backend for a heterogeneous MPSoC with the OSIP hardware scheduler [2]. The new backend exports high-level debugging information that is included in an environment for application debugging based on virtual platforms. The extensions are demonstrated on a heterogeneous virtual platform running the JPEG application.
Dmitry Nadezhkin, Hristo Nikolov, Todor Stefanov
Abstract:
The Process Network (PN) is a suitable parallel model of computation (MoC) used to specify embedded streaming applications in a parallel form facilitating the efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome the associated difficulties, an automated procedure exists for derivation of specific polyhedral process networks (PPNs) from static affine nested loop programs (SANLPs). This procedure is implemented in the pn compiler. However, there are many applications, e.g., multimedia applications (MPEG coders/decoders, smart cameras, etc.) that have adaptive and dynamic behavior which cannot be expressed as SANLPs. Therefore, in order to handle more dynamic multimedia applications, in this paper we address the important question whether we can relax some of the restrictions of the SANLPs while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way. The main contribution of this paper is a first approach for automated translation of affine nested loop programs with dynamic loop bounds into input-output equivalent polyhedral process networks.
Rainer Leupers, Lothar Thiele, Xiaoning Nie, Bart Kienhuis, Matthias Weiss, Tsuyoshi Isshiki
Abstract:
This paper summarizes a special session on multi-core/multi-processor system-on-chip (MPSoC) programming challenges. Wireless multimedia terminals are among the key drivers for MPSoC platform evolution. Heterogeneous multi-processor architectures achieve high performance and can lead to a significant reduction in energy consumption for this class of applications. However, just designing energy-efficient hardware is not enough. Programming models and tools for efficient MPSoC programming are equally important to ensure optimum platform utilization. Unfortunately, this discipline is still in its infancy, which endangers the return on investment for MPSoC architecture designs. On one hand, there is a need for maintaining and gradually porting a large amount of legacy code to MPSoCs. On the other hand, special C language extensions for parallel programming as well as adapted process network programming models provide a great opportunity to completely rethink the traditional sequential programming paradigm for the sake of higher efficiency and productivity. MPSoC programming is more than just code parallelisation, though. Besides energy efficiency, limited and specialized processing resources, and real-time constraints, growing software complexity and the mapping of simultaneous applications also need to be taken into account. We analyze the programming methodology requirements for heterogeneous MPSoC platforms and outline new approaches.
Rainer Leupers, Jeronimo Castrillon
Abstract:
The problem of efficiently programming complex embedded heterogeneous multi-processor systems-on-chip (MPSoCs) continues to be one of the biggest hurdles in the IT community. Extracting parallelism from sequential applications, dealing with different programming models, and handling real time constraints in the presence of multiple concurrent applications are some of the challenges that make MPSoC programming so difficult. In this paper we describe the MAPS tool suite, which tries to tackle these aspects of MPSoC programming in an integrated development environment built upon the Eclipse framework. We give an overview of the MAPS framework, highlighting its differences to the previous work in, and report on experiences using the tool.