Research

22 pages, 2980 KiB

Open Access Article

ASIR: Application-Specific Instruction-Set Router for NoC-Based MPSoCs

by Jens Rettkowski and Diana Göhringer

Computers 2018, 7(3), 38; https://doi.org/10.3390/computers7030038 - 27 Jun 2018

Cited by 3 | Viewed by 7256


The end of Dennard scaling led to the use of heterogeneous multi-processor systems-on-chip (MPSoCs). Heterogeneous MPSoCs provide high efficiency in terms of energy and performance because each processing element can be optimized for an application task. However, MPSoCs are evolving toward a growing number of processing elements (PEs), which leads to tremendous communication costs that tend to become the performance bottleneck. Networks-on-chip (NoCs) are a promising and scalable intra-chip communication technology for MPSoCs. However, these technological advances require novel and effective programming methodologies to exploit them efficiently. This work presents a novel router architecture called application-specific instruction-set router (ASIR) for field-programmable gate array (FPGA)-based MPSoCs. It combines data transfers with application-specific processing by adding high-level synthesized processing units to the routers of the NoC. Executing application-specific operations during data exchange between PEs makes efficient use of the transmission time. Furthermore, the processing units can be programmed in C/C++ using high-level synthesis, and accordingly, they can be specifically optimized for an application. This approach enables transferred data to be processed either by a processing element, such as a MicroBlaze processor, before the transmission or by a router during the transmission. Moreover, a static mapping algorithm for applications modeled by a Kahn process network-based graph is introduced that maps tasks to the MicroBlaze processors and processing units. The mapping algorithm optimizes the communication cost by allocating tasks to nearest neighboring PEs. This complete methodology significantly simplifies the design and programming of ASIR-based MPSoCs. Furthermore, it efficiently exploits the heterogeneity of processing capabilities inside the routers and MicroBlaze processors.
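The abstract says only that the mapping algorithm reduces communication cost by allocating tasks to nearest neighboring PEs. As a rough illustration of that idea (not the authors' algorithm), the sketch below greedily places the most communication-heavy tasks of a task graph onto a 2D mesh, scoring each free tile by channel weight times Manhattan hop distance to already-placed partners. The mesh shape, the hop metric, the channel weights, and the names greedy_map and comm_cost are all assumptions made for illustration.

```python
from itertools import product


def manhattan(a, b):
    # Hop count between two mesh tiles, assuming dimension-ordered (XY) routing.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_map(edges, tasks, width, height):
    """Hypothetical greedy nearest-neighbor mapping onto a width x height mesh.

    edges: {(src, dst): weight}, an assumed per-channel traffic volume
    for a Kahn-process-network-style task graph.
    """
    free = set(product(range(width), range(height)))
    placement = {}
    # Total traffic per task; the heaviest communicators are placed first.
    volume = {t: 0 for t in tasks}
    for (s, d), w in edges.items():
        volume[s] += w
        volume[d] += w
    for task in sorted(tasks, key=lambda t: -volume[t]):
        def cost(tile):
            # Weighted hop distance to every partner that is already placed.
            c = 0
            for (s, d), w in edges.items():
                other = d if s == task else s if d == task else None
                if other is not None and other in placement:
                    c += w * manhattan(tile, placement[other])
            return c
        best = min(sorted(free), key=cost)  # sorted() makes ties deterministic
        placement[task] = best
        free.remove(best)
    return placement


def comm_cost(edges, placement):
    # Sum of weight x hops over all channels of the task graph.
    return sum(w * manhattan(placement[s], placement[d]) for (s, d), w in edges.items())
```

For a three-stage pipeline A → B → C with weights 10 and 5 on a 2×2 mesh, the sketch places every communicating pair one hop apart, giving a cost of 15. A greedy heuristic like this is cheap but not guaranteed optimal for larger graphs.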

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


22 pages, 5208 KiB

Open Access Article

Hardware-Assisted Secure Communication in Embedded and Multi-Core Computing Systems

by Ahmed Saeed, Ali Ahmadinia and Mike Just

Computers 2018, 7(2), 31; https://doi.org/10.3390/computers7020031 - 15 May 2018

Viewed by 6757

Abstract


With the sharp rise of functionality and connectivity in multi-core embedded systems, these systems have become notably vulnerable to security attacks. Conventional software security mechanisms fail to deliver full safety and also significantly affect system performance. In this paper, a hardware-based security procedure is proposed to handle critical information in real time through comprehensive separation, without needing any help from software. To evaluate the proposed system, an authentication system based on an image processing solution has been implemented on a reconfigurable device. In addition, the proposed security mechanism is evaluated for networks-on-chip, where minimal area, power consumption, and performance overheads are achieved.

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


22 pages, 3771 KiB

Open Access Feature Paper Article

Mixed Cryptography Constrained Optimization for Heterogeneous, Multicore, and Distributed Embedded Systems

by Hyunsuk Nam and Roman Lysecky

Computers 2018, 7(2), 29; https://doi.org/10.3390/computers7020029 - 24 Apr 2018

Cited by 7 | Viewed by 5289

Abstract


Embedded systems continue to execute computational- and memory-intensive applications with vast data sets, dynamic workloads, and dynamic execution characteristics. Adaptive distributed and heterogeneous embedded systems are increasingly critical in supporting dynamic execution requirements. With pervasive network access within these systems, security is a critical design concern that must be considered and optimized within such dynamically adaptive systems. This paper presents a modeling and optimization framework for distributed, heterogeneous embedded systems. A dataflow-based modeling framework for adaptive streaming applications integrates models for computational latency, mixed cryptographic implementations for inter-task and intra-task communication, security levels, communication latency, and power consumption. For the security model, we present a level-based modeling of cryptographic algorithms using mixed cryptographic implementations. This level-based security model enables the development of an efficient, multi-objective genetic optimization algorithm to optimize security and energy consumption subject to current application requirements and security policy constraints. The presented methodology is evaluated using a video-based object detection and tracking application and several synthetic benchmarks representing various application types and dynamic execution characteristics. Experimental results demonstrate the benefits of a mixed cryptographic algorithm security model compared to using a single, fixed cryptographic algorithm. Results also highlight how security policy constraints can yield increased security strength and cryptographic diversity for the same energy constraint.

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


28 pages, 1372 KiB

Open Access Article

Designing Domain-Specific Heterogeneous Architectures from Dataflow Programs

by Süleyman Savas, Zain Ul-Abdin and Tomas Nordström

Computers 2018, 7(2), 27; https://doi.org/10.3390/computers7020027 - 22 Apr 2018

Cited by 6 | Viewed by 8615

Abstract


The last ten years have seen performance and power requirements pushing computer architectures from a single core toward so-called manycore systems with hundreds of cores on a single chip. To further increase performance and energy efficiency, we are now seeing the development of heterogeneous architectures with specialized and accelerated cores. However, designing these heterogeneous systems is a challenging task due to their inherent complexity. We proposed an approach for designing domain-specific heterogeneous architectures based on instruction augmentation through the integration of hardware accelerators into simple cores. These hardware accelerators were determined based on their common use among applications within a certain domain. The objective was to generate heterogeneous architectures by integrating many of these accelerated cores and connecting them with a network-on-chip. The proposed approach aimed to ease the design of heterogeneous manycore architectures, and consequently the exploration of the design space, by automating the design steps. To evaluate our approach, we enhanced our software tool chain with a tool that can generate accelerated cores from dataflow programs. This new tool chain was evaluated with the aid of two use cases: radar signal processing and mobile baseband processing. We achieved an approximately 4× improvement in performance while executing complete applications on the augmented cores, with a small impact (2.5–13%) on area usage. The generated accelerators are competitive, achieving more than 90% of the performance of hand-written implementations.

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


19 pages, 2377 KiB

Open Access Article

Feedback-Based Admission Control for Firm Real-Time Task Allocation with Dynamic Voltage and Frequency Scaling

by Piotr Dziurzanski and Amit Kumar Singh

Computers 2018, 7(2), 26; https://doi.org/10.3390/computers7020026 - 16 Apr 2018

Cited by 5 | Viewed by 5463

Abstract


Feedback-based mechanisms can be employed to monitor the performance of Multiprocessor Systems-on-Chips (MPSoCs) and steer task execution even if the exact workload is not known a priori. In particular, traditional proportional-integral controllers can be used with firm real-time tasks to either admit them to the processing cores or reject them so as not to violate the timeliness of already admitted tasks. During periods of lower computational demand, dynamic voltage and frequency scaling (DVFS) can be used to reduce energy dissipation in the cores while still meeting the tasks’ time constraints. Depending on the workload pattern and weight, platform size, and DVFS granularity, energy savings can reach up to 60% at the cost of a slight performance degradation.
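The abstract names traditional proportional-integral (PI) controllers for admission and DVFS for energy reduction but does not give a control law. A minimal sketch, assuming the controlled variable is core utilization with a hypothetical setpoint and gains, and assuming execution time scales inversely with clock frequency, could look like the following; PIAdmission and pick_frequency are illustrative names, not the paper's.

```python
class PIAdmission:
    """Hypothetical PI admission controller; setpoint and gains are illustrative."""

    def __init__(self, kp, ki, setpoint):
        self.kp, self.ki, self.setpoint = kp, ki, setpoint
        self.integral = 0.0

    def update(self, measured_util):
        # Positive error means spare capacity below the utilization setpoint.
        error = self.setpoint - measured_util
        self.integral += error
        return self.kp * error + self.ki * self.integral

    def admit(self, measured_util):
        # Admit a new firm real-time task only while the controller output is positive.
        return self.update(measured_util) > 0.0


def pick_frequency(util_at_fmax, freqs, fmax, bound=1.0):
    """Lowest available frequency whose scaled utilization stays within bound.

    Assumes utilization grows as fmax / f when the clock is slowed down.
    Falls back to fmax when no listed frequency keeps the load feasible.
    """
    for f in sorted(freqs):
        if util_at_fmax * fmax / f <= bound:
            return f
    return fmax
```

With gains 0.5 and 0.1 and a setpoint of 0.8, a measured utilization of 0.5 admits a task, while a following reading of 0.95 rejects one. Lowering the frequency only during light load mirrors the abstract's point that DVFS is applied in periods of lower demand.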

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


21 pages, 2181 KiB

Open Access Article

Scheduling and Tuning for Low Energy in Heterogeneous and Configurable Multicore Systems

by Mohamad Hammam Alsafrjalani and Ann Gordon-Ross

Computers 2018, 7(2), 25; https://doi.org/10.3390/computers7020025 - 14 Apr 2018

Cited by 3 | Viewed by 5274

Abstract


Heterogeneous and configurable multicore systems provide hardware specialization to meet disparate application hardware requirements. However, effective multicore system specialization can require a priori knowledge of the applications, application profiling information, and/or dynamic hardware tuning to schedule and execute applications on the most energy-efficient cores. Furthermore, even though highly disparate core heterogeneity and/or highly configurable parameters with numerous potential parameter values result in more fine-grained specialization and higher energy savings potential, these large design spaces are challenging to explore efficiently. To address these challenges, we propose a novel configuration-subsetted heterogeneous and configurable multicore system, wherein each core offers a small subset of the design space, and a novel scheduling and tuning (SaT) algorithm to efficiently exploit the energy savings potential of this system. Our proposed architecture and algorithm require no a priori application knowledge or profiling, and incur minimal runtime overhead. Results reveal the energy savings potential and provide insights into energy trade-offs in heterogeneous, configurable systems.

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


23 pages, 3949 KiB

Open Access Feature Paper Article

Low Effort Design Space Exploration Methodology for Configurable Caches

by Mohamad Hammam Alsafrjalani and Ann Gordon-Ross

Computers 2018, 7(2), 21; https://doi.org/10.3390/computers7020021 - 27 Mar 2018

Cited by 2 | Viewed by 5373

Abstract


Designers can reduce design space exploration time and effort using the design space subsetting method, which removes energy-redundant configurations. However, the subsetting method requires a priori knowledge of all applications. We analyze the impact of a priori application knowledge on subset quality by varying the amount of a priori application information available to designers at design time, from no information to general knowledge of the application domain. The results showed that only a small set of applications representative of the anticipated applications’ general domains alleviated the design effort and was sufficient to provide energy savings within 5.6% of the complete, unsubsetted design space. Furthermore, since using a small set of applications was likely to reduce the design space exploration time, we analyze and quantify the impact of a priori application knowledge on the speedup in the time needed to select the desired configurations. The results revealed that basic knowledge of the anticipated applications reduced the subset design space exploration time by up to 6.6×.
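Design space subsetting keeps only a few configurations such that each application's best configuration within the subset stays close in energy to its best configuration in the full space. The abstract does not detail how the subset is formed, so the following greedy sketch is just one plausible approach, assuming an energy value is available for every application and configuration pair; the data layout and the names greedy_subset and overhead are hypothetical. overhead reports the relative energy increase over the complete, unsubsetted design space.

```python
def greedy_subset(energy, k):
    """Hypothetical greedy subsetting: pick k configurations minimizing total energy.

    energy: {app: {config: energy}}, assumed to cover every config for every app.
    Each app is charged the energy of its best configuration inside the subset.
    """
    configs = sorted({c for per_app in energy.values() for c in per_app})
    subset = []

    def total(sub):
        return sum(min(per_app[c] for c in sub) for per_app in energy.values())

    for _ in range(min(k, len(configs))):
        # Add the configuration that lowers the summed best-in-subset energy the most.
        best = min((c for c in configs if c not in subset),
                   key=lambda c: total(subset + [c]))
        subset.append(best)
    return subset


def overhead(energy, subset):
    """Relative energy increase of the subset versus the full design space."""
    full = sum(min(per_app.values()) for per_app in energy.values())
    sub = sum(min(per_app[c] for c in subset) for per_app in energy.values())
    return (sub - full) / full
```

In this sketch, widening the set of applications used to build the subset changes which configurations are kept, which is the kind of effect the paper studies when varying a priori application knowledge.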

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)


1576 KiB

Open Access Feature Paper Article

TaPT: Temperature-Aware Dynamic Cache Optimization for Embedded Systems

by Tosiron Adegbija and Ann Gordon-Ross

Computers 2018, 7(1), 3; https://doi.org/10.3390/computers7010003 - 22 Dec 2017

Cited by 4 | Viewed by 5893

Abstract


Embedded systems have stringent design constraints, which has led much prior research to focus on optimizing energy consumption and/or performance. Since embedded systems typically have fewer cooling options, rising temperature, and thus temperature optimization, is an emerging concern. Most embedded systems dissipate heat only by passive convection, due to the absence of dedicated thermal management hardware mechanisms. An embedded system’s temperature affects not only the system’s reliability but also its performance, power, and cost. Thus, embedded systems require efficient thermal management techniques. However, thermal management can conflict with other optimization objectives, such as execution time and energy consumption. In this paper, we focus on managing the temperature using a synergy of cache optimization and dynamic frequency scaling, while also optimizing the execution time and energy consumption. This paper provides new insights into the impact of cache parameters on efficient temperature-aware cache tuning heuristics. In addition, we present temperature-aware phase-based tuning, TaPT, which determines Pareto-optimal clock frequency and cache configurations for fine-grained execution time, energy, and temperature trade-offs. TaPT enables autonomous system optimization and also allows designers to specify temperature constraints and optimization priorities. Experiments show that TaPT can effectively reduce execution time, energy, and temperature, while imposing minimal hardware overhead.
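TaPT is described as selecting Pareto-optimal clock frequency and cache configurations across execution time, energy, and temperature, subject to designer-specified temperature constraints and priorities. The sketch below shows only a generic Pareto filter followed by a constrained selection, assuming each candidate configuration has already been measured; the Point type, its field names, and the choose policy are hypothetical and do not reproduce TaPT's phase-based tuning heuristics.

```python
from typing import NamedTuple


class Point(NamedTuple):
    # One measured (clock frequency, cache configuration) candidate; fields are assumed.
    name: str
    time: float
    energy: float
    temp: float


def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one."""
    ka, kb = (a.time, a.energy, a.temp), (b.time, b.energy, b.temp)
    return all(x <= y for x, y in zip(ka, kb)) and ka != kb


def pareto_front(points):
    # Keep only candidates that no other candidate dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]


def choose(points, max_temp, priority="energy"):
    """Pick the Pareto point under max_temp that is best for the given priority."""
    feasible = [p for p in pareto_front(points) if p.temp <= max_temp]
    return min(feasible, key=lambda p: getattr(p, priority)) if feasible else None
```

Separating the Pareto filter from the final choice lets the same front serve different priorities: tightening the temperature limit or switching the priority from energy to time only changes the last step.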

(This article belongs to the Special Issue Multi-Core Systems-On-Chips Design and Optimization)

