Article

FPGA Hardware Realization of Membrane Calculation Optimization Algorithm with Great Parallelism

1 School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan 232000, China
2 West Anhui University, Lu’an 237000, China
3 Joint National-Local Engineering Research Center for Safe and Precise Coal Mining, Anhui University of Science and Technology, Huainan 232000, China
4 School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232000, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(10), 2199; https://doi.org/10.3390/sym14102199
Submission received: 22 September 2022 / Revised: 29 September 2022 / Accepted: 9 October 2022 / Published: 19 October 2022

Abstract:
The optimization algorithms of membrane computing (P systems) cannot exploit their inherent parallelism when implemented in MATLAB, which leads to slow optimization. To address this disadvantage, a dedicated digital hardware solution based on a field-programmable gate array (FPGA) is proposed to design and implement the single-cell-membrane algorithm (SCA). Because the SCA achieves an extensive global search through the symmetric processing of the solution set, with independent and symmetrically distributed submembrane structures, the FPGA-based design of the SCA system includes a control module, an HSP module, an initial value module, a fitness module, a random number module, and multiple submembrane modules with symmetrical structures. This research utilizes the inherent parallelism of the FPGA to achieve parallel computation across the symmetrically structured submembrane modules inside the SCA, and it achieves a high degree of parallelism of the rules inside each module by using non-blocking assignment. The benchmark Sphere function is used to verify the performance of the FPGA-based SCA system. The experimental results show that, at a similar calculation accuracy, the average computation time on the FPGA is 0.00041 s versus 0.0122 s in MATLAB, an improvement in calculation speed of nearly 30 times. This study implements the SCA on an FPGA and verifies the advantages of the maximum-parallelism theory and the distributed structure of membrane computing in terms of computing speed. The realization platform of membrane computing is thereby expanded, which provides a theoretical basis for further development of distributed computing models of cell populations.

1. Introduction

Inspired by the biofilm structure and internal biochemical reactions, Gheorghe Păun proposed the concept of membrane computing in the field of computer science in 1998 based on the theory of cell biology [1]. As a large-scale parallel-computing paradigm, membrane computing (also known as a P system) has a parallel nested or network-like distribution structure. Additionally, it has discrete, distributed, parallel, cell-like or tissue-like characteristics, processing multiple sets and evolving by rewriting rules [2].
Membrane computing realizes different computing structures through the abstraction of cell-membrane functions and the various distribution modes between membranes, and it realizes multiset computing by applying specific rules in different membranes. As an active branch of natural computing, it provides a rich computing paradigm for today’s biomolecular computing [3]. The membrane-computing systems proposed so far generally have a parallel nested or network-like distribution structure, and they are mainly divided into three types according to their topology: the cell-like type [4], the tissue-like type [5] and the spiking neural type [1]. In the development of membrane computing, scholars have proposed a variety of new P system models based on these three basic systems, such as the P system with protein organization [6], the spiking neural P system with extended rules [7], etc. Because the P system has the theoretical characteristics of computer science, in basic-theory research, researchers usually use a proposed P system model to simulate a Turing machine, which proves that the P system is Turing complete [8]. The calculation process of a P system often includes mechanisms of cell division and new cell generation, with the inner domain of the cell membrane acting as the reaction generator. This makes P system models independent and distributed, so parallel methods can be used to accelerate the calculation process.
Based on the above-mentioned characteristics (independence, distribution, completeness, etc.), the P system is regarded as a high-performance parallel-computing model and has been successfully applied in many fields. By incorporating cell-division rules into the organizational P system, membrane-computing structures for solving the SAT problem were constructed and achieved good results [9,10,11]. Based on the population-dynamics model of the probabilistic P system, a variety of biological populations have been accurately modelled [12,13,14]. In dynamic modelling, two infectious-disease models based on population P systems were designed to accurately model influenza in Japan [15] and COVID-19 in Guangdong, China [16]. The distributed structure and flexible evolution rules of membrane computing are also widely used in engineering optimization problems. In [17], particle swarm optimization rules were incorporated into a cell-like P system framework to construct the MA-PSO-M optimization algorithm, which was applied to solving Sudoku-type problems. In [18], parallel membrane systems were applied to the field of robot motion planning. In [19], a spiking neural P system was applied to fault diagnosis in complex power systems.
Among the P system implementation tools developed so far, MeCoSim [20,21] and MATLAB are the most used. The maximum parallelism of a P system means breaking a task down into a set of concurrently executable operations and distributing that set to multiple processing units to run simultaneously. These software tools, however, run on general-purpose CPUs, so when the scale of the target P system increases, the consumption of CPU time and of resources for communication between computers increases sharply, which limits the great-parallelism advantage of the P system [22]. Because the CPU infrastructure of a computer cannot be miniaturized and specialized, the application scope of the P system is limited.
The FPGA is a reconfigurable hardware circuit, i.e., a programmable gate array, with excellent flexibility and low latency. Common FPGA vendors include Xilinx, Altera and Lattice. As a device whose design abstraction has moved from the logic-gate level to the behavioral level, the FPGA is programmed using hardware description languages; the most commonly used are Verilog HDL, VHDL and SystemVerilog. FPGAs allow researchers to use abstract code to combine logic gates into specific digital circuits. At the same time, they can meet the needs of different situations by changing and reorganizing the logic circuit. FPGAs thus provide a small, customizable, low-cost way to implement algorithms.
Because FPGAs have the above-mentioned characteristics, some scholars have used FPGAs for optimization calculation in recent years and have achieved ideal results. In the literature [23], by adopting the design of a finite state machine, the BAT algorithm implements a parallel-computing architecture in FPGAs to obtain an excellent convergence speed. In the literature [24], using the characteristics of the FPGA modular design, the hardware customization of the RGBSO algorithm is realized. In the literature [25], the concept of implementation information of FPGA optimization calculation was proposed for the first time, as were the advantages of using FPGAs for optimization calculation. In addition, some classical optimization methods, namely PSO [26], ACO [27] and GA [28,29], have obtained a better application performance in FPGAs.
At present, hardware research on the P system mainly focuses on building new computing models. In [30], researchers designed the first FPGA circuit that simulates membrane computing (more precisely, a transition P system), realizing communication with the internal membranes as well as priorities between rules; a mechanism for the generation and dissolution of membranes in the simulated integrated circuit was also proposed. The authors of [31] realized the parallelism, distribution and non-determinism of membrane computing in a reconfigurable hardware membrane-computing system. As explained in [32], the FPGA-based parallel-computing platform Reconfig-P was used to simulate membrane computing; Reconfig-P is built on a region-oriented idea, in which regions, as computational entities, communicate with objects through message passing.
In the research direction combining P system optimization theory with FPGAs, there are currently few results. At this stage, most P system optimization theory is implemented on the MATLAB platform, where its function is verified in a pseudo-parallel fashion by simulating parallelism through serial operation. However, based on the above-mentioned achievements, it is feasible to use FPGAs to realize P system optimization theory, and this direction has great development potential.
Given the parallelism and modularity of the FPGA platform, this paper discusses the implementation method of the P system optimization theory on the FPGA platform. This study is based on the single-group setting of the multi-membrane search algorithm (MSA) [33] designed by the P system, namely the single-cell-membrane algorithm (SCA), for hardware implementation, and the study aims to discuss the optimization performance of the algorithm under the hardware platform.
The main contributions of this paper are as follows:
  • This paper implements the SCA optimization algorithm proposed by our team in FPGAs, verifies the performance of the membrane-computing optimization algorithm in the hardware chip, and realizes the miniaturization and specialization of the membrane-computing optimization system.
  • This paper designs the control, HSP and submembrane modules of the SCA, realizes the parallel computation of submembrane modules with a symmetric structure, constructs the FPGA hardware optimization system of the SCA, and provides the theoretical basis for further development of the distributed computational model of population cells.
  • This paper uses the finite-state-machine (FSM) and non-blocking-assignment language features of the FPGA to realize the extreme parallelism of membrane computation. It verifies that the maximum-parallelism theory and the distributed structure of membrane computing can significantly improve computing speed without reducing computing accuracy.
The rest of the paper is organized as follows: In the second part, the optimization model of the SCA is described, and detailed technical details are given. The third part describes the overall structure and operation process of the SCA’s FPGA system, as well as the design and implementation process of each module structure in the system. The fourth part introduces the SCA hardware system test and result analysis. Finally, the full text is summarized. The abbreviations used in this paper are listed in Table 1.

2. Single Membrane Optimization Algorithm

In this section, we provide the formal definition and static spatial-structure diagram of the SCA and describe the implementation process of single-cell system optimization in detail.

2.1. Formal Definition of SCA

The SCA is a single-cell-membrane P system, and its static structure can be regarded as a flattening algorithm for parallel multi-set rewriting. We describe it through the concept of spatial location (cell), which can be seen as a special interpretation of symbols. Its formal definition is as follows:
$$SCA = (m, V, T, u, X, Inf, R)$$
where:
m — the number of cell membranes in the SCA;
V — the alphabet, representing the set of all object elements within the system;
T = {Fbest, Xbest} — the collection of output objects, where T ⊆ V;
u = [0[1]1, [2]2 … [i]i … [I]I]0 — the nested structure and spatial position distribution of the membranes in the system;
X = {X1, …, XI} — the finite multisets initially associated with the membranes, where Xi ⊆ V for all 1 ≤ i ≤ I;
Inf = {f1, …, fI} — the formalized rules that exist in the intramembrane domains, for all 1 ≤ i ≤ I;
R — a finite set of rewriting rules of the following form:
$$X \rightarrow X';\ (HSP, Q)$$
where X′ is the finite multiset after the rewriting rule is executed; HSP and Q are the condition variables involved in the rule: HSP is the heat-shock protein in the SCA and Q is the intracellular protease activity constant, where X′, HSP, Q ∈ V.
Figure 1 is a single-cell system in which each submembrane is independent of each other and contains its own finite multiset, the rewriting rules involved in constructing the object and the adaptive membrane structure produced in the computation. In addition, the inner domain of the surface membrane contains selection rules related to conditional variables, enzyme catalysis rules and conditional communication rules between each submembrane.
The processing rule within the i-th submembrane takes j = i as the symmetric dimension: the solution converges to the optimum when j = i, while random convergence toward the optimal value is used for j < i and j > i. The symmetric dimension differs between submembranes, which increases the variation range of the solution set and enables a wide global search.

2.2. Optimization Process

Step 1: Initialize the membrane system:
$$\begin{cases} SCA = \big[[X_1][X_2]\cdots[X_i]\cdots[X_I]\big]^T \\ [X_i] = [x_i^1\ x_i^2 \cdots x_i^j \cdots x_i^{dim}];\quad i = 1,2,\ldots,I;\ I = dim + 1;\ j = 1,2,\ldots,dim \\ s = 0 \end{cases}$$
$$x_i^j = rand \times (ub_j - lb_j) + lb_j$$
where $[X_i]$ represents the i-th submembrane and its inner-domain multiset; $[x_i^1\ x_i^2 \cdots x_i^j \cdots x_i^{dim}]$ is the multiset contained in the i-th submembrane; $x_i^j$ is the j-th variable of the solution multiset in the i-th submembrane; s represents the current iteration number of the algorithm; rand is a random variable in [0,1]; and $ub_j$ and $lb_j$ are the upper and lower bounds of the j-th variable, respectively. $[X_i]$ is assigned random values within the definition domain during the initialization phase of the algorithm, and the number of submembranes is I = dim + 1.
Step 2: Calculate the fitness of each submembrane system according to the problem to be optimized:
$$F_i = f_i([X_i])$$
where Fi represents the fitness set corresponding to the i-th submembrane system under the current iteration number of the SCA, and fi represents the formal rules contained in the i-th submembrane.
Step 3: Find the optimal submembrane individual fitness and optimal multiset in the system:
$$\begin{cases} F_{best} \leftarrow F_i & \text{if } F_i \le F_1, F_2, \ldots, F_i, \ldots, F_I \\ X_{best} \leftarrow X_i & (x_{best}^i \in X_{best}) \end{cases}$$
where $F_{best}$ indicates the global best fitness of the SCA; $X_{best}$ represents the multiset of the global optimal solution; and $x_{best}^i$ represents the i-th dimension variable of the globally optimal multiset.
Step 4: Update the value of the condition variable HSP (heat shock protein) in the SCA:
$$HSP = \frac{1 - \exp[Q \times \log_2(s/S)]}{1 + \exp[Q \times \log_2(s/S)]}$$
where s is the current iteration number of the SCA algorithm; S is the maximum iteration number; and Q is the intracellular enzyme activity constant, which is an empirical value and can be set by the user.
Step 5: The solution set in each submembrane exchanges information with the optimal solution set and recalculates the fitness [33]:
$$x_i^j \xrightarrow{R} x_i^{j\prime}$$
$$R: \begin{cases} \begin{cases} x_i^{j\prime} = rand \times (x_{best}^i - x_i^j) + x_i^j & (j = i) \\ x_i^{j\prime} = x_{best}^i + HSP \times (rand - 0.5) \times (ub_i - lb_i) & (j \ne i) \end{cases} & i \le dim \\ x_{dim+1}^{j\prime} = x_{i=j}^{j} & i = dim + 1 \end{cases}$$
$$[X_i] \leftarrow \begin{cases} [X_i'], & f_i([X_i']) \le f_i([X_i]) \\ [X_i], & f_i([X_i']) > f_i([X_i]) \end{cases};\quad x_i^{j\prime} \in X_i'$$
$$\text{s.t. } F_{SCA}' = \{f_1([X_1'])\ f_2([X_2'])\cdots f_i([X_i'])\cdots f_I([X_I'])\}$$
where $x_i^{j\prime}$ represents the rewritten variable; $[X_i']$ represents the rewritten solution multiset; and $F_{SCA}'$ represents the fitness of the rewritten sets. If a rewritten variable is out of range, the boundary value is assigned to it.
Step 6: Update the optimal fitness of each submembrane system and the number of iterations of the membrane system;
$$\begin{cases} F_{best} \rightarrow F_{best}' \\ X_{best} \rightarrow X_{best}' \end{cases}$$
$$s = s + 1$$
where $F_{best}'$ represents the rewritten global optimal fitness value and $X_{best}'$ represents the rewritten global optimal solution.
Step 7: If s = S, end the search and output Fbest and Xbest. If s < S, go to step 4.
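As a behavioral reference (not the FPGA implementation itself), the seven steps above can be sketched in Python; the Sphere fitness, the bounds, and the random generator below are illustrative choices, and the out-of-range handling follows the boundary rule stated in Step 5:

```python
import numpy as np

def sca(fitness, dim=3, lb=-10.0, ub=10.0, S=200, Q=0.05, rng=None):
    """Behavioral sketch of the single-cell-membrane algorithm (SCA).
    I = dim + 1 submembranes, each holding one dim-dimensional multiset."""
    rng = np.random.default_rng() if rng is None else rng
    I = dim + 1
    # Step 1: random initialization of every submembrane multiset
    X = rng.uniform(lb, ub, size=(I, dim))
    # Steps 2-3: evaluate fitness and pick the global best
    F = np.array([fitness(X[i]) for i in range(I)])
    best = int(np.argmin(F))
    Fbest, Xbest = F[best], X[best].copy()
    for s in range(1, S + 1):
        # Step 4: heat-shock-protein condition variable (decays to 0 at s = S)
        e = np.exp(Q * np.log2(s / S))
        HSP = (1 - e) / (1 + e)
        # Step 5: rewrite every submembrane multiset
        Xnew = np.empty_like(X)
        for i in range(dim):          # the first dim submembranes
            for j in range(dim):
                if j == i:            # symmetric dimension: converge to the best
                    Xnew[i, j] = rng.random() * (Xbest[i] - X[i, j]) + X[i, j]
                else:                 # other dimensions: HSP-scaled random search
                    Xnew[i, j] = Xbest[i] + HSP * (rng.random() - 0.5) * (ub - lb)
        # the (dim+1)-th submembrane collects the symmetric variables x_j^j
        Xnew[dim] = np.array([Xnew[j, j] for j in range(dim)])
        np.clip(Xnew, lb, ub, out=Xnew)   # out-of-range values take the boundary
        # greedy acceptance of the rewritten sets
        Fnew = np.array([fitness(Xnew[i]) for i in range(I)])
        improved = Fnew <= F
        X[improved], F[improved] = Xnew[improved], Fnew[improved]
        # Step 6: update the global best
        best = int(np.argmin(F))
        if F[best] <= Fbest:
            Fbest, Xbest = F[best], X[best].copy()
    return Fbest, Xbest              # Step 7: output after S iterations

sphere = lambda x: float(np.sum(x ** 2))
Fbest, Xbest = sca(sphere, rng=np.random.default_rng(0))
```

Because the acceptance in Step 5 is greedy, the best fitness is non-increasing over iterations, mirroring the convergent waveform described later for the hardware system.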

3. Hardware Design

This section describes the FPGA chip used in this research and presents the hardware design of the SCA system: the overall module structure, the sequential model, and the internal design of each module.

3.1. Cyclone IV E FPGA

This research used Altera’s Cyclone IV E series EP4CE10 FPGA chip, which contains 10,000 logic elements (LEs), 23 digital-signal-processing (DSP) blocks, and a maximum of 414 Kb of embedded memory. The Cyclone IV E series has the advantages of low power consumption and high cost performance, and its resource distribution is shown in Figure 2.

3.2. System Design

The traditional circuit design method is bottom-up, while FPGA design is top-down: the user only needs to describe the desired function in a hardware description language, and the underlying components are generated by the software. This approach supports modular design, allowing modules to be instantiated multiple times.

3.2.1. SCA Hardware Module Structure

We designed the hardware structure of the SCA concerning the formal definition of the SCA, divided different functions into modules, designed the startup signal and excitation source when the system was running, and designed the communication between modules and their state changes.
In Figure 3, it can be seen that the FPGA hardware system of the SCA included six parts: the control module, HSP module, initial value module, fitness module, random number module and multiple submembrane modules. The control module scheduled each module, assigned computing tasks, and realized the entire optimization process.

3.2.2. Sequential Model

The SCA’s FPGA hardware-running process was as follows:
In Figure 4, it can be seen that the optimization process used the independence of submembrane modules to realize parallel computing between modules. Different submembrane modules independently selected the object multiset [Xi] and used the internal rules of the module to implement the rewriting of the contained multiset. Because there are often different evolutionary rules inside the module and this evolutionary rule is affected by random values, its change was not fixed, which makes the change of multiple sets a non-deterministic process.

3.3. Module Design

In this subsection, we design the modules in the SCA’s FPGA system and describe the internal structure and running process of each module in detail. At the same time, the parallel rules of the different modules are studied, and the great parallelism of the various rules within the modules is analyzed.

3.3.1. Control Module

The control module is the core module of the whole system; its function is to control the running state of all modules in the whole system and play the role of information processing and information transmission between modules. To improve the computational efficiency between modules, we used a finite state machine to implement parallel computing of different modules.
In Figure 5, it is shown that the finite state machine had five states as follows: IDLE (000), which corresponded to the initial state of the system; SOLVE (001), which corresponded to the system evaluation fitness state; UPDATE (010), which corresponded to the update state of the system’s multiple sets; CHECK (011), which corresponded to the updated value detection state of the system multiple sets; and STOP (100), which corresponded to the system-shutdown output state. The state machine ran under the unified coordination of the internal clock of the FPGA, and the conversion process is shown in Figure 6.
When STATE = 010, the control module activates the HSP module, random module and multiple submembrane modules at the same time and realizes the parallel computing of multiple modules. This design takes advantage of the maximum-parallelism characteristic of membrane computing through the parallel distribution of modules, increasing the occupancy rate of the internal logic array of the FPGA in exchange for a reduction in computing time, and taking advantage of the FPGA’s advantage of exchanging space for time. The maximum parallel rule set is the set of all rules that can be run in parallel under the current configuration.
When the system finite state machine STATE = UPDATE (010), the maximum parallel rule set between modules is as follows:
$$\text{if } STATE = UPDATE(010): \begin{cases} \text{Cell Module}: (\mathrm{Cell}_i: [X_i] \rightarrow [X_i]') \\ \text{HSP Module}: (s, STATE \rightarrow HSP) \\ \text{RANDOM Module}: (STATE \rightarrow rand) \end{cases}$$
In the process of calculation, submembrane modules are independent of each other. Different submembrane modules only select the object multisets and intramembrane rules of their respective intramembrane domains to rewrite the contained multisets and communicate with each other through the control module.
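The five-state control flow described above can be mimicked as a software state machine (a behavioral analogue of the Verilog FSM; the state encodings follow the paper, but the transition conditions are simplified assumptions):

```python
from enum import Enum

class State(Enum):
    IDLE = 0b000    # initial state of the system
    SOLVE = 0b001   # evaluate fitness
    UPDATE = 0b010  # HSP, random and submembrane modules run in parallel
    CHECK = 0b011   # detect/clamp the updated multiset values
    STOP = 0b100    # shut down and output Fbest, Xbest

def next_state(state, s, S):
    """One transition of the control FSM; s = current iteration, S = max."""
    if state is State.IDLE:
        return State.SOLVE
    if state is State.SOLVE:
        return State.STOP if s >= S else State.UPDATE
    if state is State.UPDATE:
        return State.CHECK
    if state is State.CHECK:
        return State.SOLVE
    return State.STOP

# walk one optimization iteration through the machine
trace, st = [State.IDLE], State.IDLE
for _ in range(4):
    st = next_state(st, s=1, S=200)
    trace.append(st)
```

In the hardware, each state directly drives the enable signals of the corresponding modules, so entering UPDATE activates several modules in the same clock cycle.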

3.3.2. HSP Generation Module

The HSP module mainly provides HSP-corresponding values for rewriting rules. The HSP calculation formula is as follows [33].
$$HSP = \frac{1 - \exp[Q \times \log_2(s/S)]}{1 + \exp[Q \times \log_2(s/S)]}$$
In FPGAs, it is more complicated to implement logarithmic calculation. To save computing resources, as shown in Figure 7, we first pre-calculated the HSP value, and inputted the corresponding HSP value under each iteration step when Q = 0.05 in the form of memory. The HSP module addressed the output according to the number of iterations.
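The pre-computation approach can be sketched as follows; Q = 0.05 and S = 200 follow the paper, while the ROM word width (16 fractional bits) is an illustrative assumption:

```python
import math

Q, S = 0.05, 200
FRAC_BITS = 16  # assumed fixed-point fraction width of each ROM word

def hsp(s, S=S, Q=Q):
    """HSP = (1 - exp(Q*log2(s/S))) / (1 + exp(Q*log2(s/S)))."""
    e = math.exp(Q * math.log2(s / S))
    return (1 - e) / (1 + e)

# pre-compute one ROM word per iteration step; the FPGA module simply
# addresses this table with the iteration counter instead of computing
# logarithms and exponentials in hardware
hsp_rom = [round(hsp(s) * (1 << FRAC_BITS)) for s in range(1, S + 1)]
```

The table is monotonically non-increasing and reaches exactly 0 at s = S, which matches the decaying step size described in Section 4.4.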

3.3.3. Fitness Module

The fitness evaluation module mainly implements the formal rules in the SCA, evaluates the iteratively updated multiple sets and outputs Fbest and [Xbest].
As shown in Figure 8, the fitness module remained stopped and was only activated when STATE = SOLVE (001). Due to the use of non-blocking assignments, parallel computations could be performed on multiple sets of input at the same time within one module.
In Figure 9, the registers holding each group of multiple solution sets in the fitness module correspond to a group of logic circuits, and the multiple groups of logic circuits perform their computations in parallel. The module feeds the calculated fitness results into a comparator and selects the optimal result, outputting the corresponding optimal multiset.

3.3.4. Submembrane Module

The submembrane module is mainly used to update and calculate the multiple sets inside the SCA algorithm, and to implement the following update rules under the call of the control module:
$$x_i^{j\prime} \leftarrow \begin{cases} rand \times (x_{best}^i - x_i^j) + x_i^j & (j = i) \\ x_{best}^i + HSP \times (rand - 0.5) \times (ub_i - lb_i) & (j \ne i) \end{cases};\quad i \le dim$$
During the operation of the FPGA, multiple submembrane modules are independently distributed and maintained in a parallel-computing state, realizing the characteristics of great parallelism of membrane computing.
In Figure 10 and Figure 11, it can be seen that the submembrane module achieved great parallelism of membrane computation at the module level during the update process of multiple sets. In the update process of the multiset inside the module, due to the inherent non-blocking assignment of the FPGA, the parallel-rewriting process was realized for the parameters of different dimensions in a single multiset.
The maximum-parallel rule set of a single submembrane module is as follows:
$$\mathrm{Cell}_i = \begin{cases} x_i^1 \xrightarrow{rand(0,1),\ R_1} x_i^{1\prime} \\ x_i^2 \xrightarrow{rand(0,1),\ R_2} x_i^{2\prime} \\ \quad\vdots \\ x_i^j \xrightarrow{rand(0,1),\ R_j} x_i^{j\prime} \\ \quad\vdots \\ x_i^J \xrightarrow{rand(0,1),\ R_J} x_i^{J\prime} \end{cases}$$
For [XI], after submembrane modules update [X1], [X2] … [Xi] … [XI], respectively, the control module implements the following updated rules:
$$[X_I] \leftarrow [x_1^1,\ x_2^2, \ldots, x_j^j, \ldots, x_{I-1}^{dim}];\quad I = dim + 1$$
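The effect of non-blocking assignment, which lets all variables of a multiset be rewritten in the same clock cycle, can be mimicked in software by evaluating every new value against the old register contents before any value is written back (a behavioral sketch; the rewrite rules used here are simplified placeholders, not the SCA rules):

```python
def nonblocking_update(regs, rules):
    """Apply one rule per register 'simultaneously', as Verilog non-blocking
    assignments (<=) do: all right-hand sides are evaluated against the OLD
    state, then every register is updated at once."""
    new = [rule(regs) for rule in rules]   # evaluate all RHS from the old state
    return new                             # commit all updates together

# example: parallel rewrite of a 3-variable multiset
regs = [1.0, 2.0, 3.0]
rules = [
    lambda r: r[1],         # x1' <= x2 (old value)
    lambda r: r[0],         # x2' <= x1 (old value)
    lambda r: r[0] + r[1],  # x3' <= x1 + x2 (both old values)
]
regs = nonblocking_update(regs, rules)
```

With blocking (sequential) semantics, the third rule would see the already-swapped values; the non-blocking result instead uses the pre-update state for every right-hand side, which is exactly what makes the dimension-wise rewrites inside a submembrane module mutually independent.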

3.3.5. Random Module

Because the FPGA cannot generate truly random numbers by itself, the random module uses a linear feedback shift register (LFSR) to generate pseudo-random numbers. Statistical pseudo-randomness means that, in a given bitstream sample, the number of 1s is roughly equal to the number of 0s; similarly, the patterns “10”, “01”, “00” and “11” occur in roughly equal numbers, so the stream has statistical characteristics similar to those of random numbers.
The characteristic polynomial corresponding to the random-number-generation module is as follows:
$$G(X) = g_m X^m + g_{m-1} X^{m-1} + g_{m-2} X^{m-2} + \cdots + g_2 X^2 + g_1 X + g_0$$
where $g_m$ denotes a coefficient of the polynomial, which can only be 1 or 0.
In Figure 12, it is shown that the random-number-generation module takes the output of the previous state and feeds a linear function of this output back into the shift register as input. An XOR operation is performed on selected bits (taps) of the register, and then the contents of the register are shifted as a whole.
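A maximal-length LFSR of this form can be sketched in Python; the 16-bit tap positions used below (polynomial x^16 + x^14 + x^13 + x^11 + 1) are a standard maximal-length choice and an assumption, not necessarily the taps used in the paper's design:

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11:
    XOR the tap bits of the current state, shift the whole register right,
    and feed the XOR result back in as the new top bit."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr_stream(seed, n):
    """Generate n pseudo-random 16-bit words from a non-zero seed."""
    out, s = [], seed
    for _ in range(n):
        s = lfsr16(s)
        out.append(s)
    return out

words = lfsr_stream(seed=0xACE1, n=100)
```

For a primitive polynomial, the register cycles through all 2^16 − 1 non-zero states before repeating, so different seeds (as used in the five test runs of Section 5.1) simply start the same sequence at different points.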

3.3.6. Initial Value Module

The initial value module also adopts the LFSR random-number-generation method to generate the initial multiset.

4. Experiment and Result Analysis

In this research, the FPGA hardware platform of the SCA was tested with Quartus II 64-bit and ModelSim-Altera, and the software platform of the SCA used Matlab R2020b. The operating system was the Windows 10 Professional 64-bit edition, the processor was an Intel i7-12700, and the memory was 32 GB DDR5-5200 MHz. As the test problem, we chose the Sphere function with dimension Dim = 3.

4.1. Sphere Function

The Sphere function is widely used in the field of optimization; as a unimodal benchmark function, it is continuous, differentiable and scalable. Although the function is relatively simple, it can reliably evaluate the robustness and convergence speed of an algorithm and is widely used in multi-platform optimization tests [22,34].
Sphere function:
$$F_1(x) = \sum_{i=1}^{n} x_i^2$$
As shown in Figure 13, the 3D stereo image of the Sphere function resembles the bottom shape of a valley. The function is continuous with only one minimum and no local optimum.

4.2. SCA-Specific Formal Definition

The SCA adaptively generates the membrane structure according to the characteristics of the problem to be solved. In this test of the FPGA hardware platform, its specific formal definition is as follows:
$$SCA = \begin{cases} u = [_0[_1]_1, [_2]_2, [_3]_3, [_4]_4]_0 \\ [X_1] = [x_1^1\ x_1^2\ x_1^3],\ [X_2] = [x_2^1\ x_2^2\ x_2^3],\ [X_3] = [x_3^1\ x_3^2\ x_3^3],\ [X_4] = [x_4^1\ x_4^2\ x_4^3] \\ Inf = \{f(x) = \sum_{i=1}^{3} x_i^2\} \\ T = \{F_{best}, X_{best}\} \\ R = \{[X]_1 \rightarrow [X]_1',\ [X]_2 \rightarrow [X]_2',\ [X]_3 \rightarrow [X]_3',\ [X]_4 \rightarrow [X]_4';\ HSP, Q = 0.05, S = 200\} \end{cases}$$
In Equation (21), it can be seen that the SCA tested here had a doubly nested membrane structure, with four submembranes contained within one outer membrane. The inner domain of each submembrane contained one three-dimensional multiset. In the test, we set the intracellular enzyme activity constant Q to an empirical value of 0.05, and the total number of iterations was 200. There are four different sets of rewriting rules in the SCA.

4.3. FPGA Hardware Architecture

Figure 14 shows the RTL structure of the SCA’s FPGA hardware module. Among them, CONTROL was the control module; HSP was the HSP module; Xnew was the initial value module; FIT was the fitness module; RANDOM was the random module; and CELL1~CELL3 were the submembrane modules.

4.4. Test Waveform

In Figure 15, it can be seen that, when the cell enzyme activity constant Q was set to 0.05, the decline rate of the HSP value of the conditional variable of the SCA was faster in the early stage of iteration, and the decline rate was slower in the middle and late stage. This design enables the SCA to obtain greater convergence in the early stage of optimization calculation. In the middle and later stages, the SCA iteration step size was reduced to carry out a more accurate optimization in the area where the global optimal solution may exist, and to improve search accuracy.
Figure 16, Figure 17 and Figure 18 show the iterative update waveforms of the multisets [X1], [X2] and [X3]: the three-dimensional parameters of the multisets change greatly in the early stage of the optimization calculation, and, as the number of iterations increases, the variation range gradually shrinks. This behavior follows from the gradual decrease in HSP with the number of iterations: the solution set changes widely in the early iterations for an extensive global search, and its range of change shrinks later to facilitate local exploitation. This shows that the FPGA-based SCA keeps a balance between global search and local exploration.
Meanwhile, with X2_2 as the axis, the curve variations in Figure 16 and Figure 18 show symmetric features, and because [X1], [X2] and [X3] are all feasible solutions within the SCA submembranes, Figure 16, Figure 17 and Figure 18 represent the search behavior of the SCA as a whole. The SCA achieves an extensive global search by the symmetric processing of the solution set, jointly with multiple submembranes.
During the update process of the multisets, the feasible-solution changes of the SCA showed a certain weak correlation. The waveforms for $x_1^1$, $x_2^2$ and $x_3^3$ in the multisets have similar trends. In Figure 19, the waveforms corresponding to $x_1^2$, $x_1^3$, $x_2^1$, $x_2^3$, $x_3^1$ and $x_3^2$ have similar trends. In the iterative waveform of [X4], $x_4^1$, $x_4^2$ and $x_4^3$ correspond to the iterative update waveforms of $x_1^1$, $x_2^2$ and $x_3^3$, respectively, in accordance with the rewriting rules of the SCA.
Although the submembrane modules perform parallel computing independently of each other, the control module realizes the information exchange between each submembrane module and the optimal multiset under the current iteration. This design ensures the weak correlation after an independent rewriting of multiple sets, and it improves the convergence speed and accuracy of the algorithm in the hardware platform.
Figure 20 shows that the optimal multiset [Xbest] of the FIT module continuously converged to the optimal value in the iterative process, indicating that the SCA based on the FPGA design has convergence in the optimization calculation.

5. Result Analysis

5.1. Use of Decimals in FPGAs

As an optimization algorithm based on the membrane-computing architecture, the SCA involves extensive decimal calculation. In the Matlab simulation, decimals are represented as floating-point numbers in the computing environment of a general-purpose CPU. The FPGA, however, being a programmable gate array, cannot natively represent decimals. In this study, fixed-point binary numbers are used, and the binary outputs must be interpreted according to the established format.
$$111.11111_2 = 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4} + 1 \times 2^{-5} = 7.96875$$
As described by the above formula, the binary number 111.11111 converts to the decimal number 7.96875. In this study, we used similar fixed-point numbers for decimal representation. For the multisets [Xi], we used a (1 + 15)-bit signed binary representation: among the 15 bits, the upper 7 bits represent the integer part and the lower 8 bits represent the fractional part. For the fitness calculation, we used a (1 + 32)-bit signed binary representation: among the 32 bits, the upper 16 bits represent the integer part and the lower 16 bits represent the fractional part.
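The interpretation of these fixed-point outputs can be sketched as a small host-side helper. The function name is ours, not the paper's; it simply applies the two's-complement fixed-point rules stated above.

```python
def fixed_to_decimal(bits, frac_bits):
    """Interpret a two's-complement bit string as a signed fixed-point value
    with `frac_bits` fractional bits (helper name is ours, not the paper's)."""
    raw = int(bits, 2)
    if bits[0] == '1':                 # sign bit set: negative value
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)

# Worked example from the text: unsigned 111.11111 -> 7.96875
assert int('11111111', 2) / (1 << 5) == 7.96875

# Multiset format: 1 sign bit + 7 integer bits + 8 fractional bits
x1_best = fixed_to_decimal('1111111111111110', 8)   # -2/256 = -0.0078125

# Fitness format: 1 sign bit + 16 integer bits + 16 fractional bits
fit_min = fixed_to_decimal('0' * 29 + '1000', 16)   # 8/65536, about 1.22e-4
```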
In this study, different starting values of random numbers were used for the LFSR, five tests were carried out, and the binary results obtained were converted into decimals, as is shown in Table 2 and Table 3.
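A Fibonacci LFSR of the kind used here for random-number generation can be sketched as follows. The paper does not specify the register width or tap positions actually synthesized on the FPGA; the 16-bit register with taps 16, 14, 13, 11 below is a common maximal-length choice and is an assumption.

```python
def lfsr16_step(state):
    """One shift of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11
    (a standard maximal-length polynomial; the actual width and taps
    used on the FPGA are not stated in the paper)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr16_sequence(seed, n):
    """Generate n successive LFSR states from a nonzero seed."""
    out, s = [], seed & 0xFFFF
    for _ in range(n):
        s = lfsr16_step(s)
        out.append(s)
    return out

# Different starting values yield different pseudo-random streams,
# mirroring the five tests run with different LFSR seeds.
stream_a = lfsr16_sequence(0xACE1, 8)
stream_b = lfsr16_sequence(0x1234, 8)
```

With a maximal-length polynomial, the register cycles through all 65,535 nonzero states before repeating, which is why a nonzero seed is required.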

5.2. Comparison of FPGA and Matlab Platform Results

In the Matlab platform, we also tested the SCA five times and compared the obtained results with those obtained in the FPGA, as shown in Table 4.
From the above simulation results, it can be seen that the average solution accuracy of the FPGA-based SCA was 1.77 × 10⁻⁴, while the Matlab platform obtained 1.75 × 10⁻⁴; the two are very close. In the comparison of solution stability, the standard deviations of the FPGA and Matlab platforms were 1.51 × 10⁻⁴ and 1.58 × 10⁻⁴, respectively, so the solution stability was essentially the same. In the comparison of the best solution found, the Matlab platform performed slightly better.
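The summary statistics quoted above can be reproduced directly from the five per-test optima in Tables 3 and 4; the Std values match a sample (n − 1 denominator) standard deviation.

```python
# Recomputing the summary statistics of Table 4 from the five per-test optima
fpga   = [1.22e-4, 1.52e-5, 9.15e-5, 3.97e-4, 2.59e-4]
matlab = [2.45e-4, 1.01e-5, 3.82e-4, 2.17e-4, 2.13e-5]

def mean(xs):
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation (n - 1 denominator), which reproduces
    the Std values reported in Table 4."""
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

print(f"FPGA:   AVE={mean(fpga):.2e}, Std={sample_std(fpga):.2e}")
print(f"Matlab: AVE={mean(matlab):.2e}, Std={sample_std(matlab):.2e}")
```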

5.3. Simulation Time Comparison

Figure 21, Figure 22, Figure 23, Figure 24 and Figure 25 show the time consumed by the SCA on the FPGA platform. A comparison of the time consumed by the SCA solution on the Matlab platform and the FPGA platform is summarized in Table 5.
It can be seen from Table 5 that the average time consumed by the FPGA platform was only 411,522 ns (≈0.00041 s), while the average time consumed by the Matlab platform was 0.0122 s, about 30 times that of the FPGA.

6. Conclusions

In this paper, the single-cell-membrane algorithm (SCA) was implemented on the FPGA platform. Owing to the inherent parallelism of the FPGA platform, the SCA uses multiple submembrane units jointly to achieve a symmetric search of the solution space. Under the control of the system's finite state machine, the space-for-time advantage of FPGAs was exploited to accelerate the computation by invoking multiple submembrane units in parallel. At the same time, inside each submembrane unit, the FPGA performed non-blocking assignments to the object sets when the multisets were updated, which realized the large parallelism of the P system. The Sphere benchmark function was used for testing and for comparison with the results obtained on the Matlab platform. The results show that the FPGA-based SCA design has a solution accuracy and stability similar to Matlab's, while its computation time is almost negligible compared to that of Matlab.
For future research, based on the existing SCA’s FPGA system, we will try to use the FPGA non-blocking assignment statement to realize the non-deterministic rules of membrane computation so as to improve the spatial exploration ability of the algorithm. Additionally, we will further explore the parallel advantages of FPGAs, combined with the space-for-time characteristics of membrane computing, to develop a distributed computing model of population cells.

Author Contributions

Conceptualization, Q.S. and Y.H.; methodology, Y.H.; software, Q.S.; validation, Q.S., W.L. and J.X.; formal analysis, S.X.; investigation, T.H.; resources, Y.H.; data curation, X.R.; writing—original draft preparation, Q.S.; writing—review and editing, W.L.; visualization, Q.S.; supervision, Y.H.; project administration, Q.S.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61772033. The APC was funded by Yourui Huang.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Single-cell system.
Figure 2. Cyclone IV E FPGA.
Figure 3. Hardware module structure.
Figure 4. SCA hardware optimization flowchart.
Figure 5. System finite state machine.
Figure 6. Control module finite state machine flow.
Figure 7. HSP module structure.
Figure 8. Fitness module flowchart.
Figure 9. Fitness module architecture.
Figure 10. Submembrane module update process.
Figure 11. Internal parallelism of a single submembrane module.
Figure 12. Random number module structure.
Figure 13. Sphere function.
Figure 14. RTL structure of SCA’s FPGA hardware module.
Figure 15. HSP waveforms.
Figure 16. [X1] waveforms.
Figure 17. [X2] waveforms.
Figure 18. [X3] waveforms.
Figure 19. [X4] waveforms.
Figure 20. [Xbest] waveforms.
Figure 21. First test results.
Figure 22. Second test results.
Figure 23. Third test results.
Figure 24. Fourth test results.
Figure 25. Fifth test results.
Table 1. Symbol table.

SAT problem | Boolean satisfiability problem
FPGA | Field-programmable gate array
SCA | Single-cell-membrane algorithm
MA-PSO-M | Cell-like P systems with particle swarm optimization
PSO | Particle swarm optimization
RGBSO | Random grouping brain storm optimization
ACO | Ant colony optimization
MSA | Multi-membrane search algorithm
GA | Genetic algorithm
FSM | Finite state machine
Table 2. Optimal multiset for FPGA platform testing.

Test | Multiset | Signed binary (1 sign bit + 7 integer bits + 8 fractional bits) | Decimal
1 | X1_best | 1111111111111110 | −7.81 × 10⁻³
1 | X2_best | 0000000000000010 | 7.81 × 10⁻³
1 | X3_best | 0000000000000000 | 0
2 | X1_best | 0000000000000000 | 0
2 | X2_best | 0000000000000000 | 0
2 | X3_best | 0000000000000001 | 3.90 × 10⁻³
3 | X1_best | 1111111111111101 | −3.90 × 10⁻³
3 | X2_best | 1000000000000001 | 7.80 × 10⁻³
3 | X3_best | 0111111111111100 | −3.90 × 10⁻³
4 | X1_best | 1111111111111101 | −1.17 × 10⁻²
4 | X2_best | 0000000000000001 | 3.90 × 10⁻³
4 | X3_best | 1111111111111100 | −1.56 × 10⁻²
5 | X1_best | 1111111111111111 | −3.90 × 10⁻³
5 | X2_best | 1111111111111100 | −1.56 × 10⁻²
5 | X3_best | 0000000000000000 | 0
Table 3. Optimal fitness of FPGA platform SCA test.

Test | FITmin signed binary (1 sign bit + 16 integer bits + 16 fractional bits) | Decimal
1 | 0 0000000000000000 0000000000001000 | 1.22 × 10⁻⁴
2 | 0 0000000000000000 0000000000000001 | 1.52 × 10⁻⁵
3 | 0 0000000000000000 0000000000000110 | 9.15 × 10⁻⁵
4 | 0 0000000000000000 0000000000011010 | 3.97 × 10⁻⁴
5 | 0 0000000000000000 0000000000010001 | 2.59 × 10⁻⁴
Table 4. Optimal fitness of SCA test based on FPGA and Matlab platforms.

Platform | 1 | 2 | 3 | 4 | 5 | AVE | Std | BEST | WORST
FPGA | 1.22 × 10⁻⁴ | 1.52 × 10⁻⁵ | 9.15 × 10⁻⁵ | 3.97 × 10⁻⁴ | 2.59 × 10⁻⁴ | 1.77 × 10⁻⁴ | 1.51 × 10⁻⁴ | 1.52 × 10⁻⁵ | 3.97 × 10⁻⁴
Matlab | 2.45 × 10⁻⁴ | 1.01 × 10⁻⁵ | 3.82 × 10⁻⁴ | 2.17 × 10⁻⁴ | 2.13 × 10⁻⁵ | 1.75 × 10⁻⁴ | 1.58 × 10⁻⁴ | 1.01 × 10⁻⁵ | 3.82 × 10⁻⁴
Table 5. Comparison of Matlab and FPGA consumption time.

Test | Matlab time | FPGA time
1 | 0.013 s | 412,017 ns ≈ 0.000412 s
2 | 0.012 s | 412,099 ns ≈ 0.000412 s
3 | 0.012 s | 416,414 ns ≈ 0.000416 s
4 | 0.012 s | 407,500 ns ≈ 0.000407 s
5 | 0.012 s | 409,580 ns ≈ 0.000409 s
AVE | 0.0122 s | 411,522 ns ≈ 0.000411 s
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Song, Q.; Huang, Y.; Lai, W.; Xu, J.; Xu, S.; Han, T.; Rong, X. FPGA Hardware Realization of Membrane Calculation Optimization Algorithm with Great Parallelism. Symmetry 2022, 14, 2199. https://doi.org/10.3390/sym14102199