Chip Design of Multithreaded and Pipelined RISC-V Microcontroller Unit
Abstract
1. Introduction
2. Architecture Design
3. Dispatch Algorithm
3.1. Algorithm Steps
- Step 1: Round-robin scheduling assigns initial weights to the four threads from the values 4, 2, 1, and 0; the higher the weight, the higher the priority. In the first round, the four threads receive 4, 2, 1, and 0; in the second round, 0, 4, 2, and 1; in the third round, 1, 0, 4, and 2; and in the fourth round, 2, 1, 0, and 4. The weights are re-initialized in this rotating order so that every thread receives a fair chance of execution.
- Step 2: The instructions of the four threads are checked for a DIV (division) instruction or for a jump or branch instruction, which causes a control hazard. When either is found, the dispatcher switches to event-driven mode and adds 4 to the thread's weight to raise its priority. For a DIV instruction, this increases the utilization of an idle DIV unit and brings forward the completion time of the instruction. For a jump or branch instruction, it allows more time to fetch the next instruction: the prefetch is started first, and a countdown counter records the number of clock cycles that the thread needs to prefetch its next instruction.
- Step 3: The dispatcher issues at most N instructions at the same time. Because there is only one MUL unit, one DIV unit, and one CSR unit, at most one instruction of each of these three types is dispatched at a time, and the rest must be ALU-type instructions. The DIV unit takes three clock cycles to complete an execution, so a countdown counter records the remaining clock cycles; all other execution units complete in one clock cycle.
- Step 4: At this point, the dispatch is complete, and step 1 is repeated to update the initial weights. However, if the prefetch countdown counter from step 2 has not reached 0, the instruction prefetch of that thread has not finished. The weight of that thread is therefore set to 0 so that priority goes to threads whose instructions have been fetched but not yet executed, and steps 2, 3, and 4 are repeated to improve hardware resource utilization.
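The weight handling in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' RTL: the function names `initial_weights`, `apply_events`, and `mask_prefetching`, and the string labels used for instruction kinds, are hypothetical and chosen for readability.

```python
# Illustrative sketch of the weight handling; all names here are assumptions.
BASE_WEIGHTS = [4, 2, 1, 0]  # first-round weights of threads 1..4


def initial_weights(round_index):
    """Round-robin initial weights for the given round (0-based).

    Each new round shifts the base pattern one position to the right,
    which gives 4,2,1,0 -> 0,4,2,1 -> 1,0,4,2 -> 2,1,0,4 -> 4,2,1,0 ...
    """
    n = len(BASE_WEIGHTS)
    shift = round_index % n
    return [BASE_WEIGHTS[(i - shift) % n] for i in range(n)]


def apply_events(weights, instr_kinds, boost=4):
    """Event-driven boost: add `boost` to every thread whose pending
    instruction is a DIV, jump, or branch."""
    boosted = ("div", "jump", "branch")
    return [w + boost if k in boosted else w
            for w, k in zip(weights, instr_kinds)]


def mask_prefetching(weights, wait_counters):
    """Threads whose prefetch countdown has not reached 0 get weight 0."""
    return [0 if c != 0 else w for w, c in zip(weights, wait_counters)]


if __name__ == "__main__":
    for r in range(4):
        print(r + 1, initial_weights(r))
    w = apply_events(initial_weights(0), ["alu", "div", "branch", "alu"])
    print(mask_prefetching(w, [0, 0, 2, 0]))
```

Keeping the three operations as separate pure functions makes it easy to check each rule from the text in isolation before combining them into a full dispatcher loop.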
3.2. Pseudo Code of Dispatch Algorithm
Algorithm 1. Dispatcher for 4-thread RISC-V MCU with N ALUs

    Input: N = 1, 2, 3, or 4
           prefetch = 2, 3, or 4  // clock number of prefetch
    // Initializing weight
    thread_weight[1] ← 4
    thread_weight[2] ← 2
    thread_weight[3] ← 1
    thread_weight[4] ← 0
    DIV_used_cnter ← 0
    // Dispatching
    while !four_threads_all_empty do
        for i = 1 to N do
            ALU_used[i] ← false
        end for
        MUL_used ← false
        CSR_used ← false
        if DIV_used_cnter = 0 then  // Executing division needs 3 clocks
            DIV_used ← false
        else
            DIV_used_cnter ← DIV_used_cnter − 1
        end if
        if thread_wait_cnter[num] ≠ 0 then
            thread_wait_cnter[num] ← thread_wait_cnter[num] − 1
        end if
        for num = 1 to 4 do  // Event-driven
            if thread_instruction[num] = Branch_instruction then
                thread_weight[num] ← thread_weight[num] + 4
                thread_wait_cnter[num] ← prefetch
            end if
        end for
        for i = 1 to N do
            num ← max(thread_weight[1‥4])
            thread_weight[num] ← 0
            if thread_instruction[num] = MUL_instruction and MUL_used = false then
                MUL ← thread_instruction[num]
                MUL_used ← true
            else if thread_instruction[num] = DIV_instruction and DIV_used = false then
                DIV ← thread_instruction[num]
                DIV_used ← true
                // Executing division needs 3 clocks
                DIV_used_cnter ← 3
            else if thread_instruction[num] = CSR_instruction and CSR_used = false then
                CSR ← thread_instruction[num]
                CSR_used ← true
            else  // ALU_instruction
                for j = 1 to N do
                    if ALU_used[j] = false then
                        ALU[j] ← thread_instruction[num]
                        ALU_used[j] ← true
                    end if
                end for
            end if
        end for
        // Round-robin
        thread_weight[1] ← thread_weight[4]
        thread_weight[2] ← thread_weight[1]
        thread_weight[3] ← thread_weight[2]
        thread_weight[4] ← thread_weight[3]
        // Checking whether prefetch is done
        for num = 1 to 4 do
            if thread_wait_cnter[num] ≠ 0 then
                thread_weight[num] ← 0
            end if
        end for
    end while
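The unit-binding part of Algorithm 1 can be illustrated with a small runnable Python sketch of a single dispatch cycle. It is a simplified model, not the published design, and it makes three assumptions that are not stated in the source: ties in weight go to the lower-numbered thread, a MUL/DIV/CSR instruction whose unit is busy stalls for the cycle instead of falling through to an ALU, and an ALU instruction takes only the first free ALU. The function name `dispatch_cycle` and the lowercase instruction labels are hypothetical.

```python
def dispatch_cycle(instr_kinds, weights, n_alus, div_busy=False):
    """Bind up to n_alus threads, highest weight first, to free units.

    instr_kinds: pending instruction kind per thread ("alu", "mul", "div", "csr")
    weights:     current weight per thread (higher means higher priority)
    Returns a dict mapping a unit name to the 1-based thread number it serves.
    """
    # Stable sort: threads with equal weight keep ascending thread order (assumption).
    order = sorted(range(len(weights)), key=lambda t: -weights[t])
    # Only one MUL, one DIV, and one CSR unit exist.
    free_single = {"mul": True, "div": not div_busy, "csr": True}
    free_alus = list(range(1, n_alus + 1))
    assignments = {}
    for t in order[:n_alus]:  # at most N instructions per cycle
        kind = instr_kinds[t]
        if kind in free_single:
            if free_single[kind]:
                free_single[kind] = False
                assignments[kind.upper()] = t + 1
            # else: unit busy, the thread stalls this cycle (assumption)
        elif free_alus:
            assignments[f"ALU{free_alus.pop(0)}"] = t + 1
    return assignments


if __name__ == "__main__":
    kinds = ["alu", "div", "mul", "alu"]
    print(dispatch_cycle(kinds, [4, 6, 1, 0], n_alus=3))
    print(dispatch_cycle(kinds, [4, 6, 1, 0], n_alus=3, div_busy=True))
```

In the first call, thread 2 holds the boosted weight 6 and takes the DIV unit, followed by thread 1 on an ALU and thread 3 on the MUL unit. In the second call, the DIV unit is still counting down, so thread 2 is skipped for that cycle.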
3.3. Illustrations
4. Architecture Validation and Analysis
4.1. FPGA Verification
4.2. Architecture Analysis and Comparison
4.3. Chip Implementation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Waterman, A.; Lee, Y.; Avizienis, R.; Patterson, D.A.; Asanovic, K. The RISC-V Instruction Set Manual Volume II: Privileged Architecture Version 1.9; University of California: Berkeley, CA, USA, 2016.
- Lim, D.X.; Smitha, K.G. Pipelined MIPS Simulation: A plug-in to MARS simulator for supporting pipeline simulation and branch prediction. In Proceedings of the 2019 IEEE International Conference on Engineering, Technology and Education, Yogyakarta, Indonesia, 10–13 December 2019; pp. 1–7.
- Curran, B.W.; Jacobi, C.; Bonanno, J.J.; Schroter, D.A.; Alexander, K.J.; Puranik, A.; Helms, M.M. The IBM z13 multithreaded microprocessor. IBM J. Res. Dev. 2015, 59, 1–13.
- Mendoza Escobar, J. A Multithreading RISC-V Implementation for Lagarto Architecture; Universitat Politècnica de Catalunya: Barcelona, Spain, 2020.
- Das, A.; Jose, J.; Mishra, P. Data criticality in multithreaded applications: An insight for many-core systems. IEEE Trans. Very Large Scale Integr. Syst. 2021, 29, 1675–1679.
- Eni, Y.; Greenberg, S.; Ben-Shimol, Y. Efficient Hint-Based Event (EHE) Issue Scheduling for Hardware Multithreaded RISC-V Pipeline. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 69, 735–745.
- Nojiri, Y.; Yamasaki, N. A Design of Multithreaded RISC-V Processor for Real-Time System. In Proceedings of the 2023 Eleventh International Symposium on Computing and Networking Workshops (CANDARW), Matsue, Japan, 27–30 November 2023.
- Cheikh, A.; Cerutti, G.; Mastrandrea, A.; Menichelli, F.; Olivieri, M. The Microarchitecture of a Multi-threaded RISC-V Compliant Processing Core Family for IoT End-Nodes. In ApplePies 2017. Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2019; Volume 512.
- Collange, S. Simty: Generalized SIMT execution on RISC-V. In Proceedings of the CARRV 2017-1st Workshop on Computer Architecture Research with RISC-V, Boston, MA, USA, 14 October 2017.
- Hennessy, J.L.; Patterson, D.A. Computer Architecture: A Quantitative Approach; Elsevier: Amsterdam, The Netherlands, 2011.
- Lai, J.-Y.; Chen, C.-A.; Chen, S.-L.; Su, C.-Y. Implement 32-bit RISC-V Architecture Processor using Verilog HDL. In Proceedings of the 2021 International Symposium on Intelligent Signal Processing and Communication Systems, Hualien City, Taiwan, 16–19 November 2021.

Instruction type | Coremark | SHA | Dijkstra |
---|---|---|---|
ALU | 94.3% | 96.3% | 95.7% |
CSR | 0.2% | 0.5% | 0.2% |
MUL/DIV | 1.7% | 0% | 0.5% |
Others | 3.8% | 3.2% | 3.6% |

Item | 1 ALU [6] | 2 ALUs | 3 ALUs | 4 ALUs |
---|---|---|---|---|
RISC-V (mW) | 230.37 | 242.27 | 251.55 | 261.06 |
Power consumption comparison | 100% | 105.1% | 109.2% | 113.3% |

Prefetch | 1 ALU [6] | 2 ALUs | 3 ALUs | 4 ALUs |
---|---|---|---|---|
2 | 0.991 | 1.931 | 2.6 | 3.1 |
3 | 0.989 | 1.826 | 2.41 | 2.7 |
4 | 0.983 | 1.731 | 2.25 | 2.5 |

Prefetch | 1 ALU [6] | 2 ALUs | 3 ALUs | 4 ALUs |
---|---|---|---|---|
2 | 1 | 1.948 | 2.623 | 3.128 |
3 | 1 | 1.846 | 2.437 | 2.73 |
4 | 1 | 1.76 | 2.289 | 2.543 |

Item | 1 ALU [6] | 2 ALUs | 3 ALUs | 4 ALUs |
---|---|---|---|---|
Area (μm²) | 926,442 | 1,037,361 | 1,133,061 | 1,228,539 |

Prefetch | 2 ALUs | 3 ALUs | 4 ALUs |
---|---|---|---|
2 | 8,474,653 | 7,787,280 | 6,981,201 |
3 | 7,546,047 | 6,877,392 | 5,663,743 |
4 | 6,743,660 | 6,132,059 | 5,021,566 |

Item | Specification |
---|---|
Process | TSMC 180 nm |
Area | 1190 μm × 1190 μm |
Pin | 40 pin |
Core Power Pads | 4 sets |
Pad Power Pads | 4 sets |
Core voltage | 1.8 V |
Pad voltage | 3.3 V |
Power consumption | 13.3 mW |
Frequency | 133 MHz |
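
As a cross-check of the power and area tables above, the cost of adding ALUs can be normalized against the 1-ALU baseline. The sketch below only re-uses the figures printed in those tables; the helper name `relative_percent` and the choice of rounding to one decimal place are assumptions, so the last digit may differ from the printed comparison row if the source truncated rather than rounded.

```python
# Figures copied from the power (mW) and area (um^2) tables, keyed by ALU count.
power_mw = {1: 230.37, 2: 242.27, 3: 251.55, 4: 261.06}
area_um2 = {1: 926_442, 2: 1_037_361, 3: 1_133_061, 4: 1_228_539}


def relative_percent(values, baseline_key=1):
    """Express every entry as a percentage of the baseline entry,
    rounded to one decimal place (rounding mode is an assumption)."""
    base = values[baseline_key]
    return {k: round(100 * v / base, 1) for k, v in values.items()}


rel_power = relative_percent(power_mw)
rel_area = relative_percent(area_um2)

if __name__ == "__main__":
    for n in sorted(power_mw):
        print(f"{n} ALU(s): power {rel_power[n]}%, area {rel_area[n]}%")
```

Computed this way, the 2-ALU power figure comes out at 105.2%, slightly above the 105.1% printed in the table, while the 3-ALU and 4-ALU figures match the printed 109.2% and 113.3%.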
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yen, M.-H.; Lin, Y.-H.; Lin, T.-F.; Chen, Y.-H.; Ku, Y.-F.; Kao, C.-T. Chip Design of Multithreaded and Pipelined RISC-V Microcontroller Unit. Eng. Proc. 2025, 89, 31. https://doi.org/10.3390/engproc2025089031