Article

A Script-Based Cycle-True Verification Framework to Speed-Up Hardware and Software Co-Design: Performance Evaluation on ECC Accelerator Use-Case

Department of Information Engineering, University of Pisa, Via G. Caruso, 16, 56122 Pisa, Italy
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2022, 11(22), 3704; https://doi.org/10.3390/electronics11223704
Submission received: 11 October 2022 / Revised: 4 November 2022 / Accepted: 9 November 2022 / Published: 12 November 2022
(This article belongs to the Special Issue VLSI Design, Testing, and Applications)

Abstract

The complexity of digital designs has increased exponentially in the last decades. Heterogeneous Systems-on-Chip integrate many different hardware components which require a reliable and scalable verification environment. The effort to set up such environments has increased as well and plays a significant role in digital design projects, taking more than 50% of the total project time. Several solutions have been developed to automate this task by integrating various steps of the Very Large Scale Integration design flow, but without addressing the exploration of the design space on both the software and hardware sides. Early in the co-design phase, designers break the system down into hardware and software parts, considering different choices to explore the design space. This work describes the use of a framework for automating the verification of such choices, considering both the hardware and software development flows. The framework automates software compilation, cycle-true simulations, and analyses on synthesised netlists. It accelerates design space exploration by exploiting the GNU Make tool, and we focus on ensuring consistency of results and providing a mechanism to make the design flow reproducible. In design teams, the latter feature fosters cooperation and the sharing of knowledge from individual experts to the whole team. Using flow recipes, designers can configure the various third-party tools integrated into the modular structure of the framework and customise the workflow execution. We demonstrate how the developed framework can be used to speed up the setup of the evaluation flow of an Elliptic-Curve-Cryptography accelerator, performing post-synthesis analyses. The framework can be easily configured, taking approximately 30 min, instead of a few days, to build up an environment to assess the accelerator performance and its resistance to simple power analysis side-channel attacks.

1. Introduction

The complexity of digital designs has increased considerably in recent decades with heterogeneous architectures [1,2] and Multi-Processor Systems-on-Chips (MPSoCs) in general [3,4]. Today's challenges lie in optimising each module of a heterogeneous system and its integration within the whole. The design space that System-on-Chip (SoC) designers have to explore is very wide and includes the addition of functionalities by selecting available modules, as well as the tuning of the firmware or software application. The evaluation of hardware and software co-design choices allows the selection of the best candidate configuration for the given requirements.
The RISC-V [5] open-source Instruction Set Architecture (ISA), together with its open-source implementations, has boosted co-design methodologies aimed at producing custom hardware with dedicated ISA extensions [6]. Thus, from development to tape-out, designers have to dedicate a substantial amount of time to setting up all the tools needed to execute the entire workflow of their projects. With these systems, the verification process is a very expensive task [7] and represents a large portion of the total cost of the project.
The target of this work, which extends our previous publication [8], is the workflow of heterogeneous digital designs described in Figure 1. The development cycle of these designs is very long. The most prominent hardware/software partitioning solutions must be evaluated in terms of performance, power consumption, and area utilisation. The power analysis can be performed only at the final stage, when the hardware and software development paths are integrated. Hardware emulation can integrate the two separate paths earlier; however, it validates only the functionality, not hardware-dependent results.
For simple designs, workflow automation is conducted using scripting languages such as Bash [9]. However, this approach is not scalable and is prone to errors caused by manual modification of the scripts. To ensure the consistency of the results, the workflow should implement dependency checks between tools and should support the integration of different tools. Furthermore, since these workflows take up to several hours to complete, it is important that their automation mechanisms do not produce undesirable results due to errors in the environment itself. Our contribution focuses on the automation of design and verification workflows. Designers should not have to take care of tool configurations, inter-tool dependencies, and automation scripts. We want to eliminate the time overhead caused by these tasks and their tendency to be error prone when performed manually.
We propose an automation framework implementing hardware and software compilation techniques to accelerate the design space exploration of hardware/software co-designs in heterogeneous digital systems. It makes the customisation of state-of-the-art design tools simple and efficient by interacting with them at a single, high level of abstraction provided by our Makefile-based infrastructure. The aim of the proposed framework is to improve the productivity of designers of such complex systems. In a team, designers use their knowledge to customise a particular portion of the design flow, benefiting from the contributions of the others. Our framework provides a standardised interface to adapt the workflow to the needs of the designer, without limiting the capabilities of state-of-the-art tools such as compilers, Hardware Description Language (HDL) simulators, synthesisers, and others. All the different hardware and software choices identified during the co-design phase of the project can be evaluated by setting up multiple workflow executions with different parameters. We also integrate into the framework the RISC-V toolchain, an emerging state-of-the-art technology that is pervading every field of application. To prepare the framework for use cases that rely on cycle-accurate simulations to gather accurate time-based power consumption, we focused its development on post-synthesis analyses and the related tools. Design space explorations based on accurate power consumption and area occupancy evaluations cannot be done with faster but less detailed tools, such as emulators or ISA simulators [10]. Our framework can be exploited when designing systems robust against Side-Channel Attacks (SCAs) [11], ultra-low-power and heterogeneous SoCs [12], or space-grade SoCs, where the target technology, in both Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) development flows, determines the time-based power consumption and area occupancy outcomes. It has been used in [6] to assist the design of a Post-Quantum Cryptography ISA extension for RISC-V, and in [13] to collect post-synthesis results of a cryptographic hardware accelerator for the Advanced Encryption Standard (AES). To describe our work, we present a use case in which the performance of an Elliptic Curve Cryptography (ECC) accelerator [14] is assessed using the proposed framework, showing how the provided automation reduces the complexity of the design flow. The implemented workflow is important in security applications, where SCAs could exploit physical vulnerabilities to obtain information on the secret keys used for cryptographic operations.

1.1. Related Works

Among the flow optimisation works for heterogeneous digital system design found in the literature [15,16,17,18,19,20,21], Highly Agile Masks Made Effortlessly from RTL (HAMMER) [22] is one of the projects closest to our work, as it focuses on workflow automation. It is used within the Chipyard framework [23] to automate its Very Large Scale Integration (VLSI) design flow. The generation of scripts and collaterals is done by exploiting Python and Make, and it can be extended by implementing a set of Application Program Interfaces (APIs) to run the steps of the flow. It is focused on the hardware design space exploration, i.e., synthesis, Place-and-Route (P&R), Design Rule Check (DRC), gate-level simulation, and power analysis, and does not provide software support. Its execution is delegated to an executable separate from GNU Make; hence, it cannot trace the dependencies between each tool using the Make dependency check system. Furthermore, it does not provide a way to run the tools in the flow multiple times with different configurations. Our framework is better integrated with the GNU Make tool, providing finer control over the execution of the various design tools and over their output files. It also integrates the bare-metal software design flow, which is required when designing SoC solutions composed of processors (based on the RISC-V ISA in our case) and peripherals.

1.2. Outline

In Section 2 we describe the proposed co-design verification framework and how it can be used and configured. In Section 3 we go through the various parts of the framework, describing all their configuration options. In Section 4 we describe the use-case workflow for the post-synthesis performance evaluation of an ECC accelerator that will be integrated into the Hardware Secure Module (HSM) of the European Processor Initiative (EPI) [24]. Finally, we draw the conclusions in Section 5.

2. Our Script-Based Verification Framework

This section describes the main aspects of the framework and explains how designers can exploit it in their work. Figure 2 illustrates the four main components of the framework: projects, tool handlers, the Register Transfer Level (RTL) library and the Software-Development-Kits (SDKs). Tool handlers are scripts written in Make syntax that execute an associated third-party tool, and are organised by type (software compilers, simulators, synthesisers, and power analysers). So far, four types of tools are supported, each driven by the associated tool handler, which customises its behaviour according to the project configuration:
  • Software Compilation: generates synthesisable Read-Only Memories (ROMs) or simulation-only initialised Random-Access Memories (RAMs) through the RISC-V GNU Toolchain 12.1.0.
  • RTL Synthesis: provides the netlists used in timing verification and power analysis through Synopsys Design Compiler 2022.03.
  • RTL Simulation: performs functional and gate-level simulations (depending on whether the synthesis has been performed or not) through Mentor QuestaSIM 21.3.
  • Power Analysis: provides cycle-true power consumption profiles using the simulation results and the synthesised netlists through Synopsys PrimeTime 2022.03.
Other tools can be integrated into the framework (such as the VCS or Verilator simulators, the Synplify synthesis tool and the Clang compiler) thanks to the standardised interface represented by the naming of Make variables and the paths of generated files, which the tool handlers use to communicate with each other. The requested tool handlers define a set of targets that perform the design tasks. These targets are invoked by the framework through one of two workflow executors, each implementing a particular behaviour to conduct the design flow:
  • Default: during parallel executions, Make decides how to schedule the invocation of all the targets defined by the selected tool handlers.
  • Limited Power-Analysis: targets defined by the simulation and power analysis tool handlers are grouped together to constrain their concurrent execution, thus reducing the number of Value Change Dump (VCD) files residing on the disk simultaneously. With many parallel simulations, this solution can limit disk usage.

2.1. Workflow Customisation: Projects

Figure 3 illustrates that each project is composed of flow recipes, RTL sources, tool-related configuration files and the main Makefile. Flow recipes are also written in Make syntax and define the variables, henceforth referred to as properties, which customise the execution of the tools. Designers select the tools to run at Make invocation by writing the flow recipes of the project. To increase the productivity of design space exploration, recipes and tools can be executed in parallel while respecting the dependency constraints. During the execution of the workflow, the framework generates the output artefacts of each recipe in the relevant folder, where it in turn invokes the required tools in separate sub-folders.
By sharing only the flow recipes and the related configuration files, each designer of a team, whether working on software or hardware, can test the system with the contributions of the others. The automation mechanisms make the evaluation of each optimisation choice simple, reliable, and reproducible, because the same flow recipe always gives the same results. By customising flow recipes, designers can choose:
  • the HDL files to include in the workflow;
  • the applications to compile for generating ROMs or initialising RAMs;
  • the hardware modules to synthesise;
  • how many parametrised simulations to run to validate different SoC configurations;
  • whether to perform power analysis to evaluate the power consumption in each simulated scenario.
As reported in Figure 4, by configuring the flow recipes the workflow accomplishes the following verification scenarios (a minimal recipe sketch is given after this list):
  • Functional verification: it may compile some applications and then perform RTL simulations to verify the functionality of both hardware and software.
  • Post-Synthesis verification: it may compile some applications used to stimulate the netlists of the RTL modules synthesised with the requested technology library. After that, it performs a timing gate-level simulation of the system to verify the constraints and performs a cycle-true power consumption analysis. Furthermore, post-layout (for ASIC-based flows) and post-routing (for FPGA-based flows) netlists can be simulated.
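As a minimal sketch, a post-synthesis recipe could be reduced to the following lines; the property names are those described in Section 2.2 and Section 3, while the target names are purely illustrative and the further per-target properties (e.g., the testbench name or the synthesis top level) are omitted. Removing the SYN and PWR lines (and the related dependencies) turns it into a functional-verification recipe.
  • # Hypothetical post-synthesis flow recipe (illustrative target names)
  • SW  = cxx     # compile one bare-metal application
  • SYN = dc      # synthesise the accelerator netlist
  • SIM = questa  # run a gate-level simulation
  • PWR = pt      # analyse its power consumption
  •  
  • SW_TARGETS  = app
  • SYN_TARGETS = acc
  • SIM_TARGETS = run1
  • SIM_run1_REQUIRE_SW  = app
  • SIM_run1_REQUIRE_SYN = acc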

2.2. Underlying Makefile System

The framework is started by invoking Make inside the project directory (see Figure 3) where the main Makefile is located. Make takes the instructions to generate the targets from a file called Makefile. The targets are files generated by the execution of a list of commands called a recipe. The generation of some targets may depend on other files, which can be source files located on the disk or can in turn be other Make targets. The dependencies of a target are called prerequisites. For example, a target A which has two targets B and C as prerequisites is not generated until they have been satisfied. Thus, the recipes of B and C are executed if they are missing or if they are older than the prerequisites on which they in turn depend. This mechanism ensures that the generated files are consistent with all the sources on which they depend directly or indirectly. Each target in the Makefile can be defined with the following syntax (a concrete example is given after it):
  • target … : prerequisites …     
  •     recipe     
  •     …     
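As a concrete illustration of this mechanism (the file names and commands below are purely illustrative), the following minimal Makefile rebuilds B or C only when their source files change, and regenerates A only when B or C have been updated:
  • A: B C
  •     cat B C > A     # A is rebuilt whenever B or C changes
  • B: b.src
  •     cp b.src B      # executed only if b.src is newer than B
  • C: c.src
  •     cp c.src C      # executed only if c.src is newer than C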
The Makefile of the project includes the entry script which starts the execution of the framework. After loading the selected flow recipe, it loads the requested tool handlers, which define the targets performing the tasks of the workflow. Designers can set the property related to a tool type with the name of the desired tool:
  • SW: can be set to cxx to enable compilation with RISC-V GNU Toolchain.
  • SIM: can be set to questa to enable the simulation with Mentor QuestaSIM.
  • SYN: can be set to dc to enable the synthesis with Synopsys Design Compiler.
  • PWR: can be set to pt to enable the power analysis with Synopsys PrimeTime.
The Make targets defined by each tool handler are responsible for running the design flow. Make ensures the dependencies between them, allowing the workflow to be executed without any user interaction. By simply invoking make, the designer lets the requested workflow executor handle the tool invocations and carry out the whole workflow. The designer can also run a single step by invoking make on the related target (e.g., make sw to compile all applications).
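A few typical invocations, assuming a configured project directory (mod1 is a hypothetical synthesis target), could be:
  • make            # run the whole workflow defined by the flow recipe
  • make -j8        # run the workflow with up to eight parallel jobs
  • make sw         # compile all the software applications only
  • make syn-mod1   # synthesise only the module of target mod1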
The execution of the desired tools is obtained by setting them as prerequisites of the default Make target default, which is built when the designer invokes make without targets. For example, for a flow recipe that defines a workflow composed of all four steps, the dependencies between the tools could be described by the following pseudo-Make syntax:
  • default: sw syn sim pwr
  • sw:  sw-app1 sw-app2
  • syn: syn-mod1 syn-mod2 syn-mod3
  • sim: sim-run1 sim-run2
  • pwr: pwr-run1-mod1 pwr-run1-mod2 pwr-run1-mod3
  •      pwr-run2-mod1 pwr-run2-mod2 pwr-run2-mod3
  •  
  • sw-app1: app1.mem <other deps>
  • app1.mem: <src deps>
  •     <commands>
  • […]
  •  
  • syn-mod1: mod1.v <other deps>
  • mod1.v: <hdl deps>
  •     <commands>
  • […]
  •  
  • sim-run1: run1-dump.vcd <other deps>
  • run1-dump.vcd: app1.mem mod1.v <hdl deps>
  •     <commands>
  • […]
  •  
  • pwr-run1-mod1: run1-mod1-power.out <other deps>
  • run1-mod1-power.out: run1-dump.vcd mod1.v <other deps>
  •     <commands>
  • […]
The four tools are instructed to build two application images (sw-app1, sw-app2), generate three synthesised netlists (syn-mod1, syn-mod2, syn-mod3), run two simulations (sim-run1, sim-run2), and perform six power analyses (all the pwr-run*-mod* targets). Each target lists its dependencies as prerequisites and defines the commands to launch to carry out its part of the design flow.

3. Tool Handlers

In this section, we describe the most important configuration options provided by the various tool handlers of the framework. The workflow execution can be customised according to the user's needs by setting the properties described in the next paragraphs.

3.1. Software Tool Handler

Software compilation is the first step of the workflow. As described in Figure 5, it uses the desired toolchain to generate the Executable and Linkable Format (ELF) images of the applications to run. The ELF images can be used to load the applications onto the real platform or can be dumped for debugging purposes.
The compilation process can be customised with the flow recipe using C/C++ defines, and it can feed the successive phases of the workflow by converting the built images into an HDL ROM or a simulation-only initialised RAM. These files can be used to synthesise the boot ROMs of the SoC or to run the application within the simulator. The compilation can be performed with different toolchains by setting the SW property to the desired one. Using cxx, the targets are compiled with the GCC toolchain using the tool handler we wrote. The designer can define multiple targets to generate more than one application image, which can be useful for projects containing several ROMs or initialised RAMs. The idea behind this is to support multi-ISA heterogeneous architectures in which the binaries can be compiled with different toolchain settings.

3.1.1. Make Targets

For each requested application, defined with the SW_TARGETS property, the framework generates a sw-<target> Make target on which sw depends, along with other utility targets:
  • sw-<target>: starts the compilation of the specific target.
  • sw-<target>-recompile: removes the generated output files and causes the recompilation of the modified sources to update the resulting image.
  • sw-<target>-dump: after compilation, it prints the objdump of the specified target.
  • sw-<target>-clean: deletes the build directory and the generated output.
The compilation process can be customised with several properties that can be provided to the handler (a configuration sketch follows this list):
  • SW_<target>_SDK: specifies the SDK to use for compilation. Up to now, only a bare-metal SDK is integrated within the framework. The actual compilation is performed by a sub-invocation of make in the desired software application directory using its own build system. In this way, any other SDK that produces ELF images as output can be included in the framework.
  • SW_<target>_TYPE: can be bin, rom or ram. The binary output is used to load the application onto the target platform. The rom output generates a synthesisable SystemVerilog read-only memory. The ram output generates a simulation-only memory initialised with the application code.
  • SW_<target>_MEMNAME: in case of rom or ram output, it specifies the name of the generated SystemVerilog module. This is needed to correctly instantiate the memory in the SoC.
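A hypothetical recipe fragment configuring two software targets, one generating a synthesisable boot ROM and one generating a simulation-only RAM image (the target and module names are illustrative), could read:
  • SW = cxx
  • SW_TARGETS = boot app
  •  
  • SW_boot_SDK     = baremetal
  • SW_boot_TYPE    = rom
  • SW_boot_MEMNAME = boot_rom
  •  
  • SW_app_SDK     = baremetal
  • SW_app_TYPE    = ram
  • SW_app_MEMNAME = main_ram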

3.1.2. Baremetal SDK

Applications can be compiled using different SDKs and build systems. The compilation process is accomplished through a sub-invocation of Make (or another build utility) in the desired project of the SDK. The reason behind this is to keep compatibility with the many software environments available online, for example those based on the seL4 Microkernel [25], the Zephyr Project [26] and the Yocto Project [27]. In this work, we set up a bare-metal SDK based on the Newlib C standard library [28] provided by the RISC-V GCC toolchain.
As shown in Figure 6, the SDK is composed of three parts: the platform-independent code, composed of drivers and syscall redefinitions; the platform-dependent code, which initialises the peripherals using the drivers and provides the linker script; and the applications, which use the Newlib interface and the API provided by the drivers.
Each image compiled with the baremetal SDK can be customized using these properties provided by the framework:
  • SW_<target>_BAREMETAL_APP: used to select an available application located into the baremetal SDK directory.
  • SW_<target>_BAREMETAL_ARCH: used to select the Application Binary Interface (ABI) string of the compiled image. It can be [soft|hard][32|64].
  • SW_<target>_BAREMETAL_PLATFORM: used to select an available platform, which provides the linker script, initialises the drivers, and redefines part of the Newlib syscalls.
  • SW_<target>_BAREMETAL_DEFINES: a list of C defines passed to the compiler to customize the build process.

3.2. Synthesis Tool Handler

RTL synthesis can be performed to generate the netlists of different modules of the system using an available synthesis library. As illustrated in Figure 7, if the previous tools provide any synthesisable sources (e.g., ROMs from the software tool), this step is executed after their completion.
The synthesis can be performed with different tools by setting the SYN property to the desired one. Using dc, the modules are synthesised with Synopsys Design Compiler using the tool handler we wrote. Multiple targets can be defined in the flow recipe to generate more than one synthesised netlist. The idea behind this is to perform a mixed functional and gate-level simulation to evaluate the post-synthesis performance of the desired modules, using the stimuli given by the functional part of the simulated system.

Make Targets

For each module requested to be synthesised through the SYN_TARGETS property, the framework generates a syn-<target> Make target on which syn depends, along with other utility targets:
  • syn-<target>: starts the synthesis process for the module <target>.
  • syn-<target>-gui: starts the Graphical User Interface (GUI) of the third-party tool for the module <target>.
  • syn-<target>-clean: deletes the build directory and the generated output.
The synthesis process can be customised with several properties that can be provided to the handler (a configuration sketch follows this list):
  • SYN_NETLIST_ANNOTATED: instructs the tool to generate the Standard Delay Format (SDF) files for the synthesised netlists.
  • SYN_<target>_TOP: the name of the top-level module to synthesise for the target.
  • SYN_<target>_TOP_LIB: the name of the library where the top-level module of the target resides.
  • SYN_<target>_RTL_DEFINES: can be used to provide per-target HDL defines to customise the RTL elaboration.
  • SYN_<target>_REQUIRE_SW: if the requested target depends on a ROM generated by the software compilation step, this property can be set with the names of the related software targets. Make will ensure the dependency with the files generated by these targets.
  • SYN_DC_LIB: relative path to the technology library to use for synthesis.
  • SYN_DC_SETUP_FILE: the designer can pass a custom setup script to Design Compiler to customise further settings. It is executed before the RTL analysis step.
  • SYN_DC_SDC_FILES: can be used to provide constraints to the synthesis process of all targets.
  • SYN_<target>_DC_SDC_FILES: can be used to provide per-target constraints to the synthesis process.
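For example, if a synthesis target embeds a boot ROM produced by a software target named boot (all names here are illustrative), the dependency could be declared as follows:
  • SYN = dc
  • SYN_TARGETS = soc_top
  •  
  • SYN_soc_top_TOP        = soc_top
  • SYN_soc_top_TOP_LIB    = soc_lib
  • SYN_soc_top_REQUIRE_SW = boot   # wait for the ROM generated by sw-boot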

3.3. Simulation Tool Handler

Functional and gate-level simulations are performed using the HDL modules provided by the framework library, the local sources and the modules generated by the previous tools. As shown in Figure 8, different simulations can be started and customised to perform the functional verification of the SoC, starting either the GUI or a batch simulation.
The designer can use this phase to produce VCD files for further evaluations with the successive tools. The simulation step can be performed with different tools by setting the SIM property to the desired one. Using questa, the targets are simulated with Mentor QuestaSIM using the tool handler we wrote. Multiple targets can be defined in the flow recipe to simulate testbenches with different parameters. The HDL outputs of the software compilation and RTL synthesis steps are automatically added to the prerequisites of the simulation Make targets to ensure their generation before the simulation begins.

Make Targets

For each simulation target defined through the SIM_TARGETS property, the framework generates a sim-<target> Make target on which sim depends, along with other utility targets:
  • sim-<target>: after compiling and optimising the HDL sources, the simulation is executed in batch mode. Further commands can be provided to the simulator specifying the inclusion of a .do file with the SIM_CMD_FILE property.
  • sim-<target>-gui: after compiling and optimizing the HDL sources, the GUI of the simulator tool is launched to continue the simulation manually. Further commands can be provided to the simulator specifying the inclusion of a .do file with the SIM_WAVE_FILE property, which is generally used to setup the Wave window of QuestaSIM.
  • sim-<target>-compile: compiles and optimises the HDL sources.
  • sim-<target>-clean: deletes the build directory and the generated output files.
The SIM_CMD_FILE and SIM_WAVE_FILE properties are evaluated in the same manner by the simulator. They are kept separate to provide different behaviours for batch and GUI simulations.
Batch and GUI simulations can be customized with several properties that can be provided to the handler:
  • SIM_TB: specifies the name of the testbench module to run for the simulation.
  • SIM_TIMESCALE: sets the time unit and the time resolution for the simulation. A different timescale can be set for a specific target using the property SIM_<target>_TIMESCALE.
  • SIM_RUNTIME: specifies the duration of the simulation; it can be set to all to run the simulation until it ends.
  • SIM_RTL_DEFINES: list of defines passed to the compiler when compiling the functional HDL modules for all targets. Different defines can be specified for a specific target using the property SIM_<target>_RTL_DEFINES.
  • SIM_SYN_DEFINES: list of defines passed to the compiler when compiling the synthesised HDL modules for all targets. Different defines can be specified for a specific target using the property SIM_<target>_SYN_DEFINES.
  • SIM_VCD_MODULES: a list of modules whose net activity will be saved in the compressed VCD file.
  • SIM_VCD_ENTITIES: a list of entities whose activity will be saved in the compressed VCD file.
  • SIM_<target>_REQUIRE_SW: list of software targets on which the simulation for <target> depends.
  • SIM_<target>_REQUIRE_SYN: list of synthesis targets on which the simulation for <target> depends.
  • SIM_DELETE_INTERMEDIATES: if enabled, at the end of each simulation the compiled library is deleted to save disk space. It can be useful when many simulations that cannot share the compiled library (see the next property) are performed.
  • SIM_SHARE_LIB: if the design units do not need to be recompiled for each target, this property instructs the framework to compile the sources just once and share the library across the targets. It speeds up the simulation process and saves disk space when the designer sets up many targets.
In addition, there are dedicated properties for netlist optimisation and tool-dependent command-line arguments (e.g., vopt […] -sdftyp /dut/path=/sdf/path) that can be specified.
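A hypothetical fragment configuring a single gate-level simulation run that depends on a compiled application and a synthesised module, dumping the activity of that module into a VCD file (testbench and target names are illustrative), could read:
  • SIM = questa
  • SIM_TB      = tb_soc
  • SIM_RUNTIME = all
  •  
  • SIM_TARGETS = run1
  • SIM_run1_REQUIRE_SW  = app
  • SIM_run1_REQUIRE_SYN = acc
  • SIM_VCD_MODULES      = acc_wrapper   # module whose net activity is dumped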

3.4. Power Analysis Tool Handler

Power analysis is performed to provide an estimation of the time-based power consumption of the synthesised modules, using their netlists, the synthesis library and the VCD files generated by the simulations.
The power analysis step can be performed with different tools by setting the PWR property to the desired one. Using pt, the power consumption evaluation is done by Synopsys PrimeTime using the tool handler we wrote. The Make targets do not need to be defined in the flow recipe because they are generated automatically: one for each combination of synthesised module and simulation run, as shown in Figure 9. The VCD files generated by the simulator and the synthesised modules are automatically added to the prerequisites of the power analysis Make targets to ensure their generation before the analysis begins.

Make Targets

For each combination of the simulation runs (SIM_TARGETS property) and the synthesised modules (SYN_TARGETS property) on which they depend (defined by the SIM_<target>_REQUIRE_SYN properties), the framework generates a pwr-<target> Make target (where <target> = <syn_target>-<sim_target>) on which pwr depends, along with a cleaning target:
  • pwr-<target>: starts the power analysis process for the target.
  • pwr-<target>-clean: deletes the build directory and the generated outputs of the target.
The power analysis process can be customised with the following two properties provided to the handler (see the sketch after this list):
  • PWR_DELETE_INPUTS: deletes the input VCD file when the power analysis completes, to save disk space.
  • PWR_<syn_target>_NETLIST_PATH: identifies the synthesised module <syn_target> within the VCD record.
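Continuing the illustrative fragments above, the power analysis step would then only need the location of the synthesised module within the VCD dump:
  • PWR = pt
  • PWR_acc_NETLIST_PATH = tb_soc/dut/acc   # hierarchical path of the acc netlist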

3.5. Limited Power-Simulation Workflow Executor

In complete workflows, when both simulation and power analysis are requested, the generated VCD files can be very large and, if the recipe defines several simulation targets, disk space could easily run out. The solution is to delete each VCD file as soon as it has been used by the power analysis tool. This cannot easily be done with the execution of the default Make target recipe, because the order in which the targets are executed when make is invoked with the -j option is not defined. All simulations could be executed before running all the power analyses, ending up with a lot of VCD files on disk. Even though the files are all deleted by the end of the workflow, the number of them residing on disk at the same time cannot be controlled. To handle this scenario, the designer must set the property WORKFLOW to limit-pwr-sim and define the property LIMIT_PWR_SIM_FILES = N. The framework will include the workflow executor we wrote, which defines a limit-pwr-sim-<sim> target for each simulation run. These targets are generated specifically to group together the simulation and power analysis targets related to the same VCD file.
The execution of its recipe performs a sub-invocation of the Make command with the specific pwr-<sim> target as argument, which in turn depends on all the pwr-<sim>-* targets, as described in Figure 10. This invocation is wrapped in a wait-post sequence on a semaphore file on the disk, initialised with N, to limit the number of sequences running simultaneously. In this way, only up to N simulation outputs can reside on disk at the same time. This number does not interfere with the limitation on parallel instances given by the set of *_PARALLEL properties, which is always respected. The Make recipe generated for simulation <sim> can be summarised by the following simplified version:
  • limit-pwr-sim-<sim>: $(LIMIT_VCD_DEPS)      
  •     $(limit-pwr-sim-semaphore) wait   
  •     $(MAKE) PWR_DELETE_INPUTS=yes pwr-<sim>   
  •     $(limit-pwr-sim-semaphore) post   
where limit-pwr-sim-semaphore is a shortcut for the execution of the Bash script which handles the semaphore for this task. When Make is invoked with the -jM option with M > N, only N spawned jobs can execute the pwr-<sim> targets at the same time; the other ones must wait on the semaphore.
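The actual helper script is part of the framework; a rough sketch of how such a file-based counting semaphore could be implemented with flock (the script name, interface, and polling strategy are our assumptions, not the framework's implementation) is the following:
  • #!/usr/bin/env bash
  • # Hypothetical counting-semaphore helper (illustrative only):
  • #   sem.sh <counter-file> init <N> | wait | post
  • file=$1; op=$2
  • case "$op" in
  •   init) echo "$3" > "$file" ;;
  •   wait)
  •     while true; do
  •       # atomically decrement the counter only if it is greater than zero
  •       if ( flock 9
  •            n=$(cat "$file")
  •            [ "$n" -gt 0 ] && echo $((n - 1)) > "$file"
  •          ) 9> "$file.lock"; then
  •         break
  •       fi
  •       sleep 1   # counter exhausted: wait and retry
  •     done ;;
  •   post)
  •     ( flock 9
  •       n=$(cat "$file")
  •       echo $((n + 1)) > "$file"
  •     ) 9> "$file.lock" ;;
  • esac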
The limit-pwr-sim-<sim> targets must take care of the dependency on the simulation targets to prevent a race condition within the sub-invocation of the Make command. There could be multiple executions of the software or synthesis targets on which the simulation and power analysis steps depend. Depending on the type of simulation performed, the dependencies are adjusted accordingly:
  • default: it runs software compilation and netlist synthesis before the sub-invocations of Make. Then, each sub-instance will compile the RTL sources in its build directory.
  • shared library: it performs the compilation of the first simulation target before the sub-invocations of Make. Then, each sub-instance will create a symbolic link to the compiled library to perform the simulation.
As for the other tools, this executor stores all the necessary files in its build directory and defines some Make targets: for each simulation run, it generates a limit-pwr-sim-<sim> Make target on which limit-pwr-sim depends, plus a cleaning target:
  • limit-pwr-sim-<sim>: waits on the limiting semaphore, runs Make on the pwr-<sim> target, creates a file to mark the run as completed and then posts on the semaphore.
  • limit-pwr-sim-<sim>-clean: deletes the file used to mark the <sim> run as completed, to evaluate whether it should be updated through the sub-invocation of Make.

4. Use-Case: Performance Evaluation of an ECC Accelerator within a RISC-V-Based SoC

The use-case in Figure 11 schematizes the typical scenarios in which the proposed framework aids the design process.
They are composed of one or more Central Processing Units (CPUs), an interconnection mechanism, some peripherals, and the memories in which the applications executed during the simulation reside. From this general scenario, we focus on the power consumption evaluation of some modules of the design, using different parameters to customise the analysis procedure. To further illustrate the framework features and prove the design productivity gain achieved with the aided workflow, we present a use-case example in which the framework helps to set up the environment needed to obtain the desired results. The execution of the workflows described in the next section has been performed on a computer equipped with two Intel(R) Xeon(R) E5-2650 v3 CPUs and 384 GiB of RAM running CentOS Linux 7.

4.1. Use-Case System-on-Chip

This section presents how the various parts of the framework have been used to perform the post-synthesis evaluation of a hardware accelerator for ECC. As reported in Section 1, this hardware module will be integrated into the Hardware Secure Module of the European Processor Initiative (EPI) chip and implemented in a 7 nm ARM® Artisan® technology. The use of the framework has reduced the time spent on building up the design and verification environment needed to set up all the synthesis, simulation, and power analysis steps of the workflow. It has also permitted the evaluation of many simulations with different input data.
As shown in Figure 12, the simulations instantiate a SoC composed of a RISC-V CPU (the 64-bit CVA6 by the OpenHW Group [29]), an Advanced eXtensible Interface (AXI) communication network, a simulation-only RAM initialised with the binary of the application, and the hardware accelerator for ECC (henceforth ECC Core). The ECC Core under evaluation can be configured to support both the NIST P-256 and NIST P-521 elliptic curves [30], which are used to accelerate different cryptographic schemes such as the Elliptic Curve Digital Signature Algorithm (ECDSA) and Elliptic Curve Diffie-Hellman (ECDH). In this work, we focus on the evaluation of the performance of the ECC Point Multiplication (PM) operation, which is the most important primitive in ECC. The architectural details of the ECC Core evaluated in this work are presented in [14]. In the cited work, three different algorithms are implemented to perform the PM, and an evaluation of their performance in terms of latency, power and area consumption is provided. In addition, [14] provides a preliminary evaluation of the resistance against Simple Power Analysis (SPA) attacks of the accelerator for the three different architectures, implemented on the ARM® Artisan® technology (typical corner: 0.75 V, 85 °C) at 100 MHz. The architecture of the ECC Core is reported in Figure 13: it features two computational units (i.e., the Point addition module and the Point doubling module in Figure 13) and a state machine. The latter manages the computational modules and the data flow according to the PM algorithm; at synthesis time, the state machine can be configured to execute three different PM algorithms, which are briefly described as follows:
  • DA: this configuration (already presented in [14]) performs the PM using the standard Double-and-Add (DA) algorithm, which is not resistant to SPA. This algorithm has no fixed latency, as the latency depends on the value of the key k.
  • DAA: this configuration (already presented in [14]) performs the PM using the Double-and-Add-Always (DAA) algorithm, which is considered secure against SPA.
  • MDAA: this configuration (already presented in [14]) performs the PM using a Modified Double-and-Add-Always (MDAA) algorithm.
In this work, we introduce an additional randomised MDAA architecture to improve the SCA resistance of the ECC Core, named Randomized Modified Double-and-Add-Always (RMDAA). All the architectures in [14] employ a redundant projective representation of the elliptic curve points (named Standard Projective representation), which reduces the computation time of the PM at the cost of higher resource consumption. This approach maps every generic point P(x, y) of the elliptic curve to its projective representation P(X, Y, Z), where Z can be arbitrarily chosen. In the presented solution, we randomised the Z-coordinate to find out whether this countermeasure provides benefits against SPA attacks. Furthermore, different works such as [31,32,33] showed that the randomisation of the Z-coordinate can also be used as a countermeasure against Differential Power Analysis (DPA) SCAs.
Thanks to the functionalities offered by the framework presented in this paper, we were able to obtain a more accurate characterisation of the ECC Core in the four different synthesis scenarios (i.e., DA, DAA, MDAA, RMDAA). In particular, we used the SDF files generated by Design Compiler for the gate-level simulations, synthesised the hardware designs at a frequency of 1 GHz, and evaluated the power consumption profiles of the four architectures. We used the proposed framework to synthesise the four ECC Core configurations (for simplicity, only the configuration for the NIST P-256 elliptic curve is synthesised, but the workflow allows all the configurations to be synthesised automatically), evaluating area utilisation, latency, and power consumption. Additionally, we performed an assessment of its resistance to SPA. We needed to extract the power trace of the ECC Core while performing the PM with different inputs, provided from the software side. For reasons of readability and conciseness, in this work we provide only six different keys for each architecture. Therefore, the workflow must execute: six software compilations, one per key (k1, k2, k3, k4, k5, k6); four syntheses, one per PM implementation (da, daa, mdaa, rmdaa); and twenty-four simulations and power analyses, one for each combination of compiled software and synthesised netlist (targets are named <syn>-<sw>). It should be noted that, for a complete characterisation of an ECC architecture against SPA or DPA SCAs, thousands of simulations would be required. The flow recipe can easily scale with the complexity of the desired simulations, reducing the time spent on the setup of the workflow and on the data collection.

4.2. Recipe Configuration

We wrote the recipe exploiting GNU Make functions to define the various properties dynamically. First, we select the RTL modules to include in the workflow by using the RTL_MODULES property. The RTL_DEFINES property is used to set some SystemVerilog defines common to all targets. The ecc_soc module includes as dependencies the RTL modules of the CPU, the AXI interconnect, and the ECC Core.
  • RTL_MODULES = ecc_soc     
  • RTL_DEFINES = <global defines>     
The software handler uses the GCC compiler and the Baremetal SDK included in the framework to build the six different applications.
  • SW = cxx          # CXX tool
  • SW_TARGETS = k1 k2 k3 k4 k5 k6
  •  
  • define gen-sw-props
  •  SW_$1_SDK = baremetal
  •  SW_$1_TYPE = vmem
  •  SW_$1_MEMNAME = initram
  •  SW_$1_BAREMETAL_APP = ecc/spa_test
  •  SW_$1_BAREMETAL_ARCH = soft64
  •  SW_$1_BAREMETAL_PLATFORM = ecc_soc
  • endef
  •  
  • $(foreach t,$(SW_TARGETS),$(eval $(call gen-sw-props,$t)))
  •  
  • SW_k1_BAREMETAL_DEFINES = ECC_K=101010
  • SW_k2_BAREMETAL_DEFINES = ECC_K=010101
  • SW_k3_BAREMETAL_DEFINES = ECC_K=001100
  • SW_k4_BAREMETAL_DEFINES = ECC_K=110011
  • SW_k5_BAREMETAL_DEFINES = ECC_K=110000
  • SW_k6_BAREMETAL_DEFINES = ECC_K=001111
The synthesis handler uses Design Compiler to synthesize the four configurations of the ECC Core for the provided Process Development Kit (PDK).
  • SYN = dc          # DesignCompiler tool
  • SYN_PARALLEL = 5  # Limit for license availability
  •  
  • SYN_TARGETS = da daa mdaa rmdaa
  •  
  • SYN_NETLIST_ANNOTATED = yes
  •  
  • SYN_DC_LIB = libs/epi7nm
  • SYN_DC_SETUP_FILE = scripts/dc_setup.tcl
  • SYN_DC_SDC_FILES = constr/ecc_core.sdc
  •  
  • define gen-syn-props
  •  SYN_$1_TOP = ecc_core_wrapper
  •  SYN_$1_TOP_LIB = ecc_core
  •  SYN_$1_REQUIRE_SW =  # Force no dependencies with SW
  • endef
  •  
  • $(foreach t,$(SYN_TARGETS),$(eval $(call gen-syn-props,$t)))
  •  
  • SYN_daa_RTL_DEFINES = ECC_PROT_DAA
  • SYN_mdaa_RTL_DEFINES = ECC_PROT_MDAA
  • SYN_rmdaa_RTL_DEFINES = ECC_PROT_MDAA ECC_RAND
The simulation handler uses QuestaSim to perform the twenty-four simulations. The software and synthesis dependencies of each target have been correctly limited using the SIM_<target>_REQUIRE_SYN/SW properties. The SDF annotation is performed by adding QuestaSim-specific command-line arguments to each simulation target.
  • SIM = questa  # QuestaSIM tool
  •  
  • SIM_TB     = tb_ecc_core
  • SIM_TB_LIB = ecc_core
  •  
  • SIM_TIMESCALE = 100ps/1ps
  • SIM_RUN_TIME  = all
  • SIM_OPT       = yes
  •  
  • # 1 GHz clock from the testbench
  • SIM_RTL_DEFINES = CLK_PERIOD=10
  •  
  • SIM_VCD_LOG_MODULES = tb_ecc_soc/soc/ecc_core
  •  
  • SIM_QUESTA_VOPT_ARGS = +sdf_verbose +sdf_iopath_to_prim_ok
  •  
  • # Generation of simulation targets
  • $(foreach t,$(SYN_TARGETS),\
  •   $(foreach k,$(SW_TARGETS),\
  •     $(eval SIM_TARGETS += $t-$k)\
  •     $(eval SIM_$t-$k_REQUIRE_SYN = $t)\
  •     $(eval SIM_$t-$k_REQUIRE_SW  = $k)\
  •     $(eval SIM_$t-$k_QUESTA_VOPT_ARGS = -sdftyp /tb_ecc_soc/soc/ecc_core=$t-ecc_core_wrapper.sdf)))
The power analysis handler generates the targets automatically; only the netlist path within the VCD file must be specified.
  • PWR = pt          # PrimeTime tool
  • PWR_PARALLEL = 5  # Limit for license availability
  •  
  • $(foreach t,$(SYN_TARGETS),\
  •   $(eval PWR_$t_NETLIST_PATH = tb_ecc_soc/soc/ecc_core))
To prevent huge space utilisation on the host disk, the Limited Power-Simulation workflow executor has been used, limiting the number of VCD files on disk to five.
  • WORKFLOW = limit-pwr-sim
  • LIMIT_PWR_SIM_FILES = 5

4.3. Performance Evaluation Results

A designer with a good knowledge of the GNU Make syntax takes just 30 min to set up the entire workflow, including the organisation of the RTL and software sources, the technology synthesis library, and all the scripts and constraint files. After that, the framework can be invoked with make -j to parallelise the workload. At the end of the workflow, which took ≃12 h, the designer finds all the files required to evaluate the performance of the architecture in the output directory of the flow recipe. In particular, the output folders of the synthesis targets contain the various synthesis reports, the simulation output folders contain the reports on the latency of the operations, and the power analysis output folders contain the power report and the power trace of each simulation. Table 1 reports the results for the different PM architectures.
We used a simulation-based approach to assess the resistance to SPA attacks. Figure 14 shows some power traces for each accelerator configuration. In particular, the four plots on the left side of Figure 14 are the ones acquired for the key k1: the first corresponds to the DA architecture, the second to the DAA, the third to the MDAA and the last to the RMDAA. The four plots on the right side of Figure 14 are the ones acquired for the key k3, in the same order. As already stated in [14], the information leakage of the DA architecture allows the whole private key to be easily retrieved, whereas with the DAA architecture some parts of the key can still be guessed. The power traces of the MDAA and RMDAA architectures are extremely similar, and there are no substantial differences between them.
To further investigate the behaviour of the MDAA and RMDAA architectures, we measured and plotted in Figure 15 the average power consumption during the computation of the PM operation. The power is averaged over the intervals corresponding to the processing of each ECC key bit. While the MDAA architecture presents the same power consumption profile each time it is used with the same key, the RMDAA architecture hides the information on the key, thanks to the randomness added by the random Z-coordinate of the projective representation.
The results presented in this work for the characterisation of the ECC Core are not intended to be complete and exhaustive; rather, the use-case aims to show how the proposed framework accelerates the verification, validation and characterisation of SoC design workflows. In particular, the results obtained in the previous work [14] required a few days for the development and verification of the simulation and analysis environment itself. Here, the correct use of the various third-party tools needed to complete the workflow is ensured and automated by the proposed framework, which permitted the setup of the environment in around 30 min, plus 12 h of actual flow execution. Using the proposed framework, we can continue the SCA assessment of the ECC Core, in particular against DPA attacks, which could require thousands of simulations and power analyses that will be completely automated. In an ordinary design flow, all the scripts for simulations and analyses must be carefully modified to adapt them to the new necessities; with our framework, instead, changing only a few properties in the flow recipe is enough to obtain very different design and verification environments.

5. Conclusions

This work described the use of a verification framework to automate the evaluation of software and hardware choices during the co-design of SoC solutions. After introducing the typical design workflow problems and comparing the related works, we described how our framework can be used to automate the compilation of software applications, synthesis, cycle-true simulations, and power analyses. The aim of our solution is to make the framework capable of running a completely customised workflow, taking advantage of its versatility and modular structure to integrate various tools and assist even more complex use-cases. We illustrated the flow recipes, which provide a simple method for ensuring consistency among different workflow executions and improve cooperation in development teams through the sharing of configurations. Finally, we presented a use-case demonstrating how the framework can accelerate the setup of the verification environment used to assess the performance of the ECC accelerator embedded into the HSM module of the EPI chip. The area occupation, latency, power consumption, and resistance to SPA SCAs of four different accelerator configurations have been reported and discussed. The framework made it possible to significantly reduce the time taken to set up the various tools needed to perform the described use-case and collect the results.

Author Contributions

Conceptualization, L.Z., S.D.M. and P.N.; methodology, L.Z., S.D.M. and P.N.; software, L.Z.; validation, L.Z., S.D.M. and P.N.; investigation, L.Z., S.D.M. and P.N.; resources, S.D.M. and P.N.; data curation, L.Z., S.D.M. and P.N.; writing—original draft preparation, L.Z., S.D.M. and P.N.; writing—review and editing, L.Z., S.D.M., P.N., S.S. and L.F.; visualization, L.Z., S.D.M., P.N., S.S. and L.F.; supervision, S.S. and L.F.; project administration, S.S. and L.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by European Processor Initiative Specific Grant Agreement No 101036168.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ABI	Application Binary Interface
AES	Advanced Encryption Standard
API	Application Program Interface
ASIC	Application Specific Integrated Circuit
AXI	Advanced eXtensible Interface
CPU	Central Processing Unit
DA	Double-and-Add
DAA	Double-and-Add-Always
DPA	Differential Power Analysis
DRC	Design Rule Check
ECC	Elliptic Curve Cryptography
ECDH	Elliptic Curve Diffie-Hellman
ECDSA	Elliptic Curve Digital Signature Algorithm
ELF	Executable and Linkable Format
EPI	European Processor Initiative
FPGA	Field Programmable Gate Array
GUI	Graphical User Interface
HAMMER	Highly Agile Masks Made Effortlessly from RTL
HDL	Hardware Description Language
HSM	Hardware Secure Module
ISA	Instruction Set Architecture
MDAA	Modified Double-and-Add-Always
MPSoC	Multi-Processor System-on-Chip
P&R	Place-and-Route
PDK	Process Development Kit
PM	Point Multiplication
RAM	Random-Access Memory
RMDAA	Randomized Modified Double-and-Add-Always
ROM	Read-Only Memory
RTL	Register Transfer Level
SCA	Side-Channel Attack
SDF	Standard Delay Format
SDK	Software-Development-Kit
SoC	System-on-Chip
SPA	Simple Power Analysis
VCD	Value Change Dump
VLSI	Very Large Scale Integration

References

  1. Balkind, J.; Lim, K.; Schaffner, M.; Gao, F.; Chirkov, G.; Li, A.; Lavrov, A.; Nguyen, T.M.; Fu, Y.; Zaruba, F.; et al. BYOC: A “Bring Your Own Core” Framework for Heterogeneous-ISA Research. In Proceedings of the ASPLOS ’20—Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 16–20 March 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 699–714.
  2. Mittal, S.; Vetter, J.S. A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Comput. Surv. 2015, 47, 1–35.
  3. Wolf, W.; Jerraya, A.A.; Martin, G. Multiprocessor System-on-Chip (MPSoC) Technology. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2008, 27, 1701–1713.
  4. Balkind, J.; McKeown, M.; Fu, Y.; Nguyen, T.; Zhou, Y.; Lavrov, A.; Shahrad, M.; Fuchs, A.; Payne, S.; Liang, X.; et al. OpenPiton: An Open Source Manycore Research Framework. SIGARCH Comput. Archit. News 2016, 44, 217–232.
  5. Waterman, A.; Asanovic, K. (Eds.) The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213; RISC-V Foundation: San Francisco, CA, USA, 2019.
  6. Nannipieri, P.; Di Matteo, S.; Zulberti, L.; Albicocchi, F.; Saponara, S.; Fanucci, L. A RISC-V Post Quantum Cryptography Instruction Set Extension for Number Theoretic Transform to Speed-Up CRYSTALS Algorithms. IEEE Access 2021, 9, 150798–150808.
  7. Chen, W.; Ray, S.; Bhadra, J.; Abadir, M.; Wang, L. Challenges and Trends in Modern SoC Design Verification. IEEE Des. Test 2017, 34, 7–22.
  8. Zulberti, L.; Nannipieri, P.; Fanucci, L. A Script-Based Cycle-True Verification Framework to Speed-Up Hardware and Software Co-Design of System-on-Chip exploiting RISC-V Architecture. In Proceedings of the 2021 16th International Conference on Design Technology of Integrated Systems in Nanoscale Era (DTIS), Montpellier, France, 28–30 June 2021; pp. 1–6.
  9. Bash. by Free Software Foundation. Available online: https://www.gnu.org/software/bash (accessed on 11 November 2022).
  10. Spike RISC-V ISA Simulator. Available online: https://github.com/riscv-software-src/riscv-isa-sim (accessed on 11 November 2022).
  11. Sarti, L.; Baldanzi, L.; Crocetti, L.; Carnevale, B.; Fanucci, L. A Simulated Approach to Evaluate Side Channel Attack Countermeasures for the Advanced Encryption Standard. In Proceedings of the 2018 14th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), Prague, Czech Republic, 2–5 July 2018; pp. 77–80.
  12. Adegbija, T.; Rogacs, A.; Patel, C.; Gordon-Ross, A. Microprocessor Optimizations for the Internet of Things: A Survey. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 7–20.
  13. Nannipieri, P.; Matteo, S.D.; Baldanzi, L.; Crocetti, L.; Zulberti, L.; Saponara, S.; Fanucci, L. VLSI Design of Advanced-Features AES Cryptoprocessor in the Framework of the European Processor Initiative. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2022, 30, 177–186.
  14. Di Matteo, S.; Baldanzi, L.; Crocetti, L.; Nannipieri, P.; Fanucci, L.; Saponara, S. Secure Elliptic Curve Crypto-Processor for Real-Time IoT Applications. Energies 2021, 14, 4676.
  15. Hoffmann, A.; Kogel, T.; Nohl, A.; Braun, G.; Schliebusch, O.; Wahlen, O.; Wieferink, A.; Meyr, H. A novel methodology for the design of application-specific instruction-set processors (ASIPs) using a machine description language. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2001, 20, 1338–1354.
  16. Kinsy, M.A.; Pellauer, M.; Devadas, S. Heracles: A Tool for Fast RTL-Based Design Space Exploration of Multicore Processors. In Proceedings of the FPGA ’13—ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, USA, 11–13 February 2013; Association for Computing Machinery: New York, NY, USA, 2013; pp. 125–134.
  17. Genko, N.; Atienza, D.; De Micheli, G.; Benini, L. Feature-NoC emulation: A tool and design flow for MPSoC. IEEE Circuits Syst. Mag. 2007, 7, 42–51.
  18. de Micheli, G.; Benini, L. Networks on Chip: A New Paradigm for Systems on Chip Design. In Proceedings of the DATE ’02—Conference on Design, Automation and Test in Europe, Paris, France, 4–8 March 2002; IEEE Computer Society: Washington, DC, USA, 2002; p. 418.
  19. Pani, D.; Palumbo, F.; Raffo, L. A fast MPI-based parallel framework for cycle-accurate HDL multi-parametric simulations. Int. J. High Perform. Syst. Archit. 2010, 2, 187–202.
  20. Wang, S.; Possignolo, R.T.; Skinner, H.B.; Renau, J. LiveHD: A Productive Live Hardware Development Flow. IEEE Micro 2020, 40, 67–75.
  21. Tagliavini, G.; Rossi, D.; Marongiu, A.; Benini, L. Synergistic HW/SW Approximation Techniques for Ultralow-Power Parallel Computing. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 982–995.
  22. HAMMER: Highly Agile Masks Made Effortlessly from RTL. By UC Berkeley Architecture Research. Available online: https://github.com/ucb-bar/hammer (accessed on 11 November 2022).
  23. Chipyard Framework. By UC Berkley Architecture Research. Available online: https://github.com/ucb-bar/chipyard (accessed on 11 November 2022).
  24. European Processor Initiative. Available online: https://www.european-processor-initiative.eu (accessed on 11 November 2022).
  25. seL4 Microkernel. A Series of LF Projects, LLC. Available online: https://sel4.systems (accessed on 11 November 2022).
  26. Zephyr Project. A Linux Foundation Project. Available online: https://www.zephyrproject.org (accessed on 1 November 2022).
  27. Yocto Project. A Linux Foundation Collaborative Project. Available online: https://www.yoctoproject.org (accessed on 11 November 2022).
  28. Newlib C standard Library. Maintained by Red Hat. Available online: https://sourceware.org/newlib (accessed on 11 November 2022).
  29. Zaruba, F.; Benini, L. The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-nm FDSOI Technology. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 2629–2640. [Google Scholar] [CrossRef]
  30. NIST, F. FIPS 186-4–Digital Signature Standard (DSS). 2013. Available online: https://csrc.nist.gov/publications/detail/fips/186/4/final (accessed on 11 November 2022).
  31. Möller, B. Parallelizable elliptic curve point multiplication method with resistance against side-channel attacks. In Information Security, Proceedings of the International Conference on Information Security, Sao Paulo, Brazil, 30 September–2 October 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 402–413. [Google Scholar]
  32. Giraud, C.; Verneuil, V. Atomicity improvement for elliptic curve scalar multiplication. In Smart Card Research and Advanced Application, Proceedings of the International Conference on Smart Card Research and Advanced Applications, Passau, Germany, 14–16 April 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 80–101. [Google Scholar]
  33. Dupuy, W.; Kunz-Jacques, S. Resistance of randomized projective coordinates against power analysis. In Cryptographic Hardware and Embedded Systems—CHES 2005, Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Edinburgh, UK, 29 August–1 September 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 1–14. [Google Scholar]
Figure 1. Common Workflow for Heterogeneous Digital Systems Design.
Figure 2. Structure of the Framework. From the left: the Baremetal SDK used to compile software; the RTL Library containing all the HDL sources; the tool handlers, which manage the actual tools and inter-tool dependencies; and the projects, which configure the workflow. At the bottom are the four third-party tools integrated into the framework.
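To make this structure concrete, a project could wire the tool handlers together with an include-based GNU Make recipe along the lines of the sketch below. The paths, handler file names, and targets are illustrative assumptions, not the framework's actual layout; the compile, synthesis, simulation, and power targets are assumed to be provided by the included handlers.

# Hypothetical top-level Makefile of a project (names are assumptions).
FRAMEWORK ?= ../framework

include $(FRAMEWORK)/handlers/compile.mk      # Baremetal SDK: software build
include $(FRAMEWORK)/handlers/synthesis.mk    # netlist generation
include $(FRAMEWORK)/handlers/simulation.mk   # cycle-true simulation
include $(FRAMEWORK)/handlers/power.mk        # power analysis on VCD dumps

.PHONY: all
all: compile synthesis simulation power       # full flow, in dependency order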
Figure 3. A Directory Tree of an Example Project.
Figure 4. Typical Design Workflow.
Figure 5. Flow execution and customization of the software compilation phase.
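As an illustration of how the software compilation phase can be customised, the following minimal GNU Make sketch builds a baremetal ELF with the standard RISC-V GNU toolchain. The target and variable names, source layout, and compiler flags are assumptions for illustration only, not the framework's actual interface; recipe lines must be tab-indented.

# Minimal compilation-recipe sketch (hypothetical names).
CROSS  ?= riscv64-unknown-elf-        # toolchain prefix, assumed to be on PATH
CFLAGS ?= -O2 -march=rv64imac -mabi=lp64
SRCS   := $(wildcard sw/*.c)
OBJS   := $(SRCS:.c=.o)

.PHONY: compile
compile: sw/app.elf                   # entry point the simulation phase can depend on

sw/app.elf: $(OBJS)
	$(CROSS)gcc $(CFLAGS) -o $@ $^

%.o: %.c
	$(CROSS)gcc $(CFLAGS) -c -o $@ $<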
Figure 6. Structure of the Baremetal SDK.
Figure 7. Flow execution, dependencies, and customization of the synthesis phase.
Figure 8. Flow execution, dependencies, and customization of the simulation phase.
Figure 9. Flow execution, dependencies, and customization of the power analysis phase.
Figure 10. Description of the VCD-limited workflow when the limiting factor is set to 5 and there are two synthesised modules to analyse per simulation.
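As a rough, hypothetical sketch of how such a limit could be enforced with GNU Make: each rule runs one simulation, analyses two synthesised modules from its dump, and removes the dump before finishing, so invoking the flow with -j$(VCD_LIMIT) keeps at most VCD_LIMIT value-change-dump files on disk at any time. The wrapper scripts, simulation names, and module names below are placeholders, not the framework's actual interface; recipe lines must be tab-indented.

# Hypothetical VCD-limited flow (placeholder names throughout).
VCD_LIMIT ?= 5
SIMS      ?= sim_k1 sim_k2 sim_k3 sim_k4 sim_k5 sim_k6

.PHONY: power-analysis
power-analysis:
	$(MAKE) -j$(VCD_LIMIT) $(SIMS:%=power/%.done)

power/%.done:
	mkdir -p vcd power
	./run_simulation.sh $* vcd/$*.vcd                         # dump one VCD
	./analyse_power.sh vcd/$*.vcd cpu_ss   > power/$*_cpu.rpt # first module
	./analyse_power.sh vcd/$*.vcd ecc_core > power/$*_ecc.rpt # second module
	rm -f vcd/$*.vcd                                          # free disk space
	touch $@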
Figure 11. Structure of a general SoC.
Figure 12. The System-on-Chip for the Evaluation of the ECC module.
Figure 13. Architecture of the ECC Core described in [14].
Figure 14. Power trace for k1 (the least significant part is K = ..101010, with MSB first) of DA (a), DAA (c), MDAA (e), RMDAA (g) and for k3 (the least significant part is K = ..010101, with MSB first) of DA (b), DAA (d), MDAA (f), RMDAA (h).
Figure 15. Power trace for k1 (the least significant part is K = ..101010, with MSB first) of MDAA (a), RMDAA (c,e,g), and for k3 (the least significant part is K = ..010101, with MSB first) of MDAA (b), RMDAA (d,f,h).
Table 1. Performance results for the four ECC Core configurations, averaged over the executions with the six keys. Power and latency of the DA architecture strongly depend on the Hamming weight of the key, which is around 128.

ECC PM Config    Area (kGE)    Latency (µs)    Power (mW)    Peak (mW)
DA               247.92        32.256          57.1          407.1
DAA              253.88        36.880          69.7          394.0
MDAA             248.05        36.848          68.1          401.3
RMDAA            249.80        36.848          68.6          398.5
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
