Spiking PointCNN: An Efficient Converted Spiking Neural Network under a Flexible Framework

Tao, Yingzhi; Wu, Qiaoyun

doi:10.3390/electronics13183626


by Yingzhi Tao ¹ and Qiaoyun Wu ²,*

¹ School of Computer Science and Technology, Anhui University, Hefei 230601, China

² School of Artificial Intelligence, Anhui University, Hefei 230601, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3626; https://doi.org/10.3390/electronics13183626

Submission received: 12 August 2024 / Revised: 1 September 2024 / Accepted: 4 September 2024 / Published: 12 September 2024

(This article belongs to the Special Issue Artificial Intelligence in Image Processing and Computer Vision)


Abstract: Spiking neural networks (SNNs) are attracting wide attention due to their brain-like simulation capabilities and low energy consumption. Converting artificial neural networks (ANNs) to SNNs combines the high accuracy of ANNs with the robustness and energy efficiency of SNNs. Existing SNNs for point cloud processing have two unresolved issues: first, they lack a specialized surrogate gradient function; second, they are not robust enough to process real-world datasets. In this work, we present a high-accuracy converted SNN for 3D point cloud processing. Specifically, we first revise and redesign the Spiking X-Convolution module based on the X-transformation. To address the non-differentiable activation function that arises from the binary signals of spiking neurons, we propose an effective adjustable surrogate gradient function, which can fit various models well by tuning its parameters. Additionally, we introduce a versatile ANN-to-SNN conversion framework that enables modular transformations. Based on this framework and the Spiking X-Convolution module, we design Spiking PointCNN, a highly efficient converted SNN for processing 3D point clouds. We conduct experiments on the public 3D point cloud datasets ModelNet40 and ScanObjectNN, on which our proposed model achieves excellent accuracy. Code will be available on GitHub.

Keywords: point clouds processing; computer vision; pattern recognition; deep learning; spiking neural network; artificial intelligence

1. Introduction

Three-dimensional data, which provide an accurate description of shape and structure, are crucial in fields such as autonomous driving [1] and robotics [2]. Point clouds, as a representative form of these data, are composed of three-dimensional point data, making them highly valuable in modern technology development, like the research in [3,4]. As a fundamental and important branch, the progress in point cloud classification tasks has also made breakthroughs with the development of deep learning technology, such as the proposal of MAT-Net [5]. Despite many advancements, current point cloud processing methods still have some shortcomings. Due to the complexity of data and computation, deploying these methods on resource-limited mobile platforms, such as robots, is quite challenging [6].

Spiking neural networks (SNNs), representing the third generation of neural networks, have advantages over traditional artificial neural networks (ANNs) in some aspects. The working principle of SNNs is based on the transmission of pulses between neurons, and there is a lot of research in this area, such as [7]. When a neuron receives sufficient input signals, it generates a pulse and transmits it to the next layer of neurons. The information transfer in these neurons mimics the information transfer in biological neurons, i.e., via the precise timing of spikes or a sequence of spikes [8]. This event-driven approach allows SNNs to process sparse data more efficiently [9], with lower energy consumption [10].

Considering the advantages of both ANN and SNN, converting the former into the latter is an effective method to fully utilize the dual benefits and apply them to practical tasks, such as achieving high-precision and ultra-low latency SNNs in [11,12], which outperform state-of-the-art and directly trained SNNs in terms of accuracy and time step. Some recent studies have proposed parameter calibration techniques that align the activation patterns of SNNs with those of their ANN counterparts. For instance, Li et al. (2022) demonstrate that adjusting network parameters during conversion significantly reduces the mismatch between ANN and SNN activations, thus preserving accuracy [13]. Similarly, Deng et al. (2021) highlight the importance of optimal parameter tuning to minimize performance loss during conversion, ensuring that the resulting SNN maintains the functionality of the original ANN [14]. Additionally, Rueckauer et al. (2017) provide evidence that efficient event-driven networks can be derived from continuous-valued deep networks through careful conversion processes, enhancing computational efficiency without sacrificing performance [15]. The underlying mechanisms that enable these successful conversions are rooted in the temporal dynamics of spiking neurons, which allow SNNs to process time-dependent information more naturally, as discussed by Pfeiffer and Pfeil (2018) [16]. Moreover, Diehl et al. (2015) show that balancing weights and thresholds during the conversion process can lead to high-accuracy spiking networks that are well-suited for fast classification tasks [17]. These findings collectively underscore the viability of converting ANNs to SNNs, provided that the process is carefully managed to address the inherent differences between the two network types.

SNNs have widespread applications in many fields, especially in two-dimensional data processing, achieving excellent results in image classification tasks such as in [18,19]. In [20], the authors proposed a conversion framework. However, they did not specify the type of surrogate function employed in their approach. This omission suggests that they might have used either a Sigmoid or a Softplus function, both of which are commonly used as surrogate gradients but are not inherently designed for SNNs. A more specialized surrogate gradient function could potentially improve the results. Similarly, in [21], the authors demonstrated excellent accuracy on a synthetic dataset but did not extend their evaluation to real-world datasets. This limitation suggests that the model may not be practical enough to process real data that include noise or disturbances. In Section 4.3 of our study, we tested that model on a real-world dataset and observed that the results were less than satisfactory. To solve these problems, we address two key issues in this work: (1) We construct a specialized surrogate gradient function so that the spiking neurons remain trainable under backpropagation-based methods without sacrificing accuracy. (2) We design a structure that makes the model robust enough to achieve excellent accuracy on real-world datasets.

We focus on efficiently converting artificial neural networks (ANNs), specifically convolutional neural networks (CNNs), to spiking neural networks (SNNs) for 3D point cloud classification. To address the robustness issue, we apply the X-transformation operation used by PointCNN [22] together with a current-based LIF spiking model. We define a preprocessing spiking neuron that converts continuous floating-point tensors into discrete pulse signals. Based on this, we design two spiking neurons, one for lifting features into a higher-dimensional space and one for generating a feature matrix X. These neurons are fused into the traditional X-Conv operator to reconstruct the spiking X-Convolution module. Compared to the original X-Conv operator, the new X-Convolution unit gives better results within a spiking neural network structure. For the surrogate gradient issue, we propose an adjustable surrogate gradient function that enables the converted network to be trained correctly within the backpropagation framework. Our proposed surrogate function closely fits the spiking signals during backpropagation while remaining smooth, so that continuous gradients are available. Hence, the model can be trained effectively under an ANN training framework. Finally, we present a flexible conversion framework that combines our specially designed spiking X-Convolution module with the adjustable surrogate gradient function. The resulting network is Spiking PointCNN.

The rest of this paper is organized as follows. In the Related Works section, we introduce CNNs on point clouds (Section 2.1) and SNNs (Section 2.2). In the Methods section, we introduce the Spiking X-Convolution module (Section 3.1), the proposed adjustable surrogate gradient function (Section 3.2), and the all-purpose ANN-to-SNN converting framework (Section 3.3). The fourth section, Experiments and Results, presents the experimental settings (Section 4.1), the ablation study (Section 4.2), and a comparison with SOTAs (Section 4.3). Finally, the fifth section gives the conclusion. The main contributions are as follows:

(1) We define two different spiking MLP neurons and construct the spiking X-Convolution module, which combines spiking neurons with the X-transformation for the first time. The module enables convolutional neural networks to perform effective local feature extraction and permutation-invariant classification on unordered point cloud data.
(2) We propose an adjustable surrogate gradient function, which makes discrete signals differentiable during backpropagation, enabling the spiking neural network to be accurately supervised during training.
(3) We build an efficient spiking neural network converted from a convolutional neural network under a flexible conversion framework, which exhibits strong robustness and achieves high accuracy on two benchmark datasets, ModelNet40 [23] and ScanObjectNN [24].

2. Related Works

2.1. Convolutional Neural Networks on Point Clouds

The convolutional neural network (CNN) was first proposed and defined in [25] as a model consisting of multiple layers of trainable filters (convolutions) and sub-sampling. The application of deep CNNs to large-scale image classification in AlexNet [26] highlighted the effectiveness of ReLU activation functions and Dropout regularization, leading to the concept of depth. Proposed in 2015, Residual Networks (ResNets) [27] introduced residual blocks to address the vanishing gradient problem in very deep networks, significantly improving training performance. CNNs have since been applied extensively to grid data processing in a variety of practical applications, demonstrating their versatility and robustness. For instance, CNNs have shown exceptional performance in chronological age estimation [28], age and gender classification [29], and more complex tasks such as contour detection [30]. These applications underscore the adaptability of CNNs to a range of challenges in processing structured data.

Despite the success of CNNs in grid-like data structures, applying them to point clouds, which are irregular and unordered, presents unique challenges. Direct convolution between point-to-point features can result in loss of shape information and inconsistent outputs due to the varying order of input points. To address these challenges, pioneering models such as PointNet [31] and PointNet++ [32] were developed to directly process point clouds by learning global features through a shared MLP architecture. Building on this foundation, PointConv [33] redefines the convolution kernel for point clouds. It treats the kernel as a nonlinear function of local coordinates, using MLPs to learn a weight function and employing kernel density estimation to compute a density function. This method enhances the traditional CNN’s capability to handle the irregular format of point clouds, optimizing memory use and supporting deeper network architectures. Such advancements have enabled effective applications of CNNs to point cloud data in tasks like classification, segmentation, and object detection.

2.2. Spiking Neural Network

Spiking neural networks (SNNs) are a class of neural networks that aim to mimic the processing mechanisms of biological neurons more closely than traditional artificial neural networks. The foundation of SNNs lies in the spiking neuron models, which include classical models like Integrate-and-Fire (IF) applied in P2SResLNet [34], Leaky Integrate-and-Fire (LIF) utilized in Spiking PointNet [21], and Hodgkin–Huxley (HH) models. These models capture the dynamic behavior of biological neurons by considering the timing of spikes rather than the rate of firing. Recent innovations and explorations in this field have led to a variety of advanced models and approaches, as discussed in works such as [35].

Recent achievements of SNNs have demonstrated their potential in various applications, such as pattern recognition, sensory processing, and robotics. Novel SNN models have enhanced computational efficiency and performance by leveraging the temporal dimension of spike-based processing. For instance, Carpegna et al. developed an FPGA-optimized SNN accelerator, which significantly enhances both performance and energy efficiency, making it particularly suitable for applications that are energy-sensitive or require low latency [36]. However, this approach demands specialized FPGA expertise and might face limitations in scalability across different hardware configurations. Bybee et al. introduced Spiking Phasor Neural Networks (SPNNs), which leverage complex-valued DNNs and spike timing codes to achieve superior performance on complex datasets such as CIFAR-10 [37]. Despite their promising results, the complexity inherent in SPNNs, particularly in their use of complex-valued operations, could lead to increased computational costs and challenges in practical deployment. Kulkarni et al. provided a comprehensive review of various SNN algorithms and their applications, emphasizing the potential of SNNs in advancing cognitive computing [38]. However, as a review from 2017, it may not encompass the most recent developments in the fast-evolving field of SNN research. These studies collectively enhance our understanding of SNNs’ capabilities in pattern recognition, sensor data processing, and robotic control.

3. Methods

We propose an efficient converted spiking neural network, Spiking PointCNN, built with an all-purpose ANN-to-SNN converting framework and an adjustable surrogate gradient function. We first combine the X-transformation with leaky integrate-and-fire neurons to construct the spiking X-Convolution module. Then, we present an effective surrogate gradient function that can be tuned to fit various models and that copes with the otherwise uncomputable gradient. Next, we describe our all-purpose ANN-to-SNN framework, which can be applied to various ANN models and is especially suitable for CNNs. Finally, we combine the spiking X-Convolution module and the flexible converting framework with the proposed adjustable surrogate gradient function to construct Spiking PointCNN.

3.1. Spiking X-Convolution Module

Inspired by the combination of spiking neurons and typical ANN components, we introduce the LIF neuron into the X-transformation to construct a spiking X-Convolution module for 3D point cloud processing.

3.1.1. Leaky Integrate-and-Fire Neuron

The leaky integrate-and-fire (LIF) model is a simplified spiking neuron widely applied in spiking neural networks. It simulates the accumulation and discharge of voltage in a neuron to mimic its firing activity. The dynamics of an LIF neuron are given by

$$\frac{dV}{dt} = \frac{-(V(t) - V_{rest}) + R \cdot i(t)}{\tau_{m}} \tag{1}$$

$$\tau_{m} = RC \tag{2}$$

$$V(t^{+}) = \begin{cases} V_{reset} & \text{if } V(t) \geq V_{thresh} \\ V(t) & \text{otherwise} \end{cases} \tag{3}$$

where $V(t)$ represents the membrane potential at time $t$ and $\tau_{m}$ stands for the membrane time constant, which is the product of the membrane resistance $R$ and the membrane capacitance $C$. From the equations, it can be seen that the potential naturally decays towards the resting potential $V_{rest}$, simulating the condition when a neuron is not stimulated. When an external stimulus current $i(t)$ flows into the neuron, the potential accumulates and increases. If the membrane potential reaches or exceeds the voltage threshold $V_{thresh}$, the neuron is considered to have emitted a spike, and its potential is then reset to the lower potential $V_{reset}$.
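As a minimal sketch of Equations (1) and (3), the following Python function integrates the membrane potential with a forward-Euler step and applies the reset rule. The step size `dt`, the default constants, and the function name `simulate_lif` are illustrative assumptions, not values taken from the paper.

```python
def simulate_lif(current, dt=1.0, tau_m=10.0, r=1.0,
                 v_rest=0.0, v_reset=0.0, v_thresh=1.0):
    """Forward-Euler integration of Eq. (1) with the reset rule of Eq. (3).

    `current` is a sequence of input currents i(t), one per step of length dt.
    Returns the membrane potential trace and the binary spike train.
    All default constants are illustrative, not taken from the paper.
    """
    v = v_rest
    potentials, spikes = [], []
    for i_t in current:
        # Eq. (1): dV/dt = (-(V - V_rest) + R * i(t)) / tau_m
        v += dt * (-(v - v_rest) + r * i_t) / tau_m
        if v >= v_thresh:      # Eq. (3): threshold reached, emit a spike
            spikes.append(1)
            v = v_reset        # and reset the membrane potential
        else:
            spikes.append(0)
        potentials.append(v)
    return potentials, spikes


# A constant drive whose steady state (R * i = 1.5) lies above the threshold
# makes the neuron fire repeatedly; with no input it stays at rest.
driven_v, driven = simulate_lif([1.5] * 50)
silent_v, silent = simulate_lif([0.0] * 50)
print(sum(driven), sum(silent))
```

The leak term pulls the potential back towards `v_rest` between inputs, so a drive whose steady state stays below the threshold would never produce a spike in this sketch.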

3.1.2. X-Transformation Convolution [22]

Since point cloud data tend to be unordered, the X-transformation module is proposed to retain local correlation features. The key idea is to adapt the traditional convolution so that it suits point cloud data. First, using the KNN algorithm, the original point cloud data $P$ are transformed into local coordinates $P'$. Then, alongside the features $F$, $P'$ is passed through two different MLP operations, one for lifting the coordinates into a higher-dimensional space and the other for generating the feature transformation matrix, given by

$$F_{\delta} = MLP_{\delta}(P') \tag{4}$$

$$X = MLP_{trans}(P') \tag{5}$$

which were proposed in [22]. $MLP_{\delta}$ in Equation (4), as proposed in PointNet [31], is a multilayer perceptron that maps each local coordinate to a high-dimensional feature space in an attempt to capture the complexity of the local structure. $MLP_{trans}$ in Equation (5) is a combination of linear layers and nonlinear activation functions that transforms its input into a $K \times K$ matrix, where $K$ is the local neighborhood size. The outcomes of the two MLP operations are stored in $F_{\delta}$ and $X$. Then, $F_{*}$, the concatenation of $F_{\delta}$ and $F$, is permuted with the learned $K \times K$ X-transformation matrix $X$, given by

$$F_{X} = X \times F_{*} = X \times [F_{\delta}, F] \tag{6}$$

which was proposed in [22]. Finally, a typical convolution between the kernel $\mathbf{K}$ and $F_{X}$ is conducted to finish the module. Combining the previous formulas, the module can be summarized as

$$F_{p} = CONV(\mathbf{K}, MLP_{trans}(P - p) \times [MLP_{\delta}(P - p), F]) \tag{7}$$

where $P - p$ is the local coordinate set $P'$ relative to the representative point $p$. This completes all operations of the X-transformation convolution. The structure is shown in Figure 1.
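The data flow of Equations (4) to (7) can be sketched in NumPy for a single representative point. This is only an illustration under stated assumptions: the helper names (`mlp`, `x_conv`), the layer widths, the random weights, and the choice to flatten the K local coordinates before the transformation MLP are hypothetical, and the final convolution is written as a plain contraction over neighbours and channels.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, weights):
    """Tiny MLP: linear layers with ReLU in between (no bias, for brevity)."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]


def x_conv(p, P, F, w_delta, w_trans, kernel):
    """One X-Conv step for a single representative point p (Eqs. (4)-(7)).

    p: (3,) representative point, P: (K, 3) neighbours, F: (K, C1) features.
    """
    K = P.shape[0]
    P_local = P - p                                    # local coordinates P'
    F_delta = mlp(P_local, w_delta)                    # Eq. (4): (K, C_delta)
    X = mlp(P_local.reshape(1, -1), w_trans).reshape(K, K)  # Eq. (5): (K, K)
    F_star = np.concatenate([F_delta, F], axis=1)      # [F_delta, F]
    F_X = X @ F_star                                   # Eq. (6)
    # Eq. (7): convolution of the kernel with F_X, written here as a full
    # contraction over the K neighbours and the input channels.
    return np.einsum("kco,kc->o", kernel, F_X)


# Hypothetical sizes: K neighbours, C1 input channels, C2 output channels.
K, C1, C_delta, C2 = 8, 4, 6, 16
w_delta = [rng.normal(size=(3, 16)), rng.normal(size=(16, C_delta))]
w_trans = [rng.normal(size=(3 * K, 32)), rng.normal(size=(32, K * K))]
kernel = rng.normal(size=(K, C_delta + C1, C2))

P = rng.normal(size=(K, 3))
F = rng.normal(size=(K, C1))
F_p = x_conv(P[0], P, F, w_delta, w_trans, kernel)
print(F_p.shape)
```

Because both MLPs only see the local coordinates, translating the whole neighbourhood together with its representative point leaves the output unchanged in this sketch.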

3.1.3. Spiking X-Convolution Module

We introduce the LIF neuron into the X-transformation convolution to define the spiking X-Convolution module. We design two fused spiking neurons based on the same signal-converting spiking neuron. The spiking neurons are then applied in the MLP operations to reconstruct the module. With these specifically designed spiking neurons, the MLP operations are renamed $MLP_{\delta}^{'}$ and $MLP_{trans}^{'}$ and keep the same function. The converting spiking neuron is defined as

$$\begin{cases} I^{'} = FN(F) \\ V^{'}(t) = V_{rest} + R \cdot I^{'}(t) + V^{'}(0) \times \exp\left(-\frac{t}{\tau_{m}}\right) \\ s^{'}(t) = V^{'}(t) \mid V_{thresh} \\ S^{'} = \{s_{1}^{'}, s_{2}^{'}, \dots, s_{timestep}^{'}\} \end{cases} \tag{8}$$

where $I^{'}$ stands for the simulated current obtained after $F$ is processed by a fully connected layer $FN$, and $V^{'}(t)$ simulates the membrane potential accumulated up to time step $t$. The membrane potential is evaluated against the threshold $V_{thresh}$ at each time step to perform the spike conversion, and the resulting spike sequence is stored in $S^{'}$. This is a simple spiking neuron that converts the original continuous floating-point signal into preliminary spike signals, acting as a transition for the combinations that follow. It is applied in the spiking neuron of $MLP_{trans}^{'}$, which is defined as

$$\begin{cases} I = Conv1d(RS(BN(S^{'}))) \\ S = \{s_{1}, s_{2}, \dots, s_{timestep}\} \end{cases} \tag{9}$$

where the membrane potential accumulation step is omitted because it repeats Equation (8), and $I$ represents the simulated current stimulus obtained from the previous spike signals through batch normalization ($BN$), reshape ($RS$), and convolution operations, in that order. This neuron is the key component of $MLP_{trans}^{'}$ for reassembling features and fusing them by weighting. The parallel spiking neuron for $MLP_{\delta}^{'}$ is defined as

$$\begin{cases} I = FN(BN(S^{'})) \\ S = \{s_{1}, s_{2}, \dots, s_{timestep}\} \end{cases} \tag{10}$$

where the membrane potential accumulation step is omitted as well. In Equation (10), $I$ stands for the current signal after processing by a batch normalization operation and a fully connected layer. This neuron is used to lift features into a high-dimensional space, cooperating with the other components of $MLP_{\delta}^{'}$.

After combining with the spiking neurons, the structure of the proposed spiking X-Convolution module is shown in Figure 2. The use of spiking neurons enables the module to filter more noise and improves the temporal efficiency of its operations. To make the proposed module highly adaptable in model building, its input and output are designed to be continuous by applying other ANN layers.

3.2. Effective Adjustable Surrogate Gradient Function

Supervised training methods for ANNs are based on the chain rule and are generally efficient. Converted SNNs apply the same method to retain high accuracy. For a neuron $j$ in iteration $n$, the principal part of updating the parameters is computing the weight correction from the local gradient, which is derived from the backpropagation algorithm described in Simon Haykin’s book [39], given by

$$\delta_{j}(n) = \frac{\partial E(n)}{\partial v_{j}(n)} = \frac{\partial E(n)}{\partial e_{j}(n)} \frac{\partial e_{j}(n)}{\partial y_{j}(n)} \frac{\partial y_{j}(n)}{\partial v_{j}(n)} = e_{j}(n) \varphi_{j}^{'}(n) \tag{11}$$

$$\Delta w_{ji}(n) = -\eta \delta_{j}(n) y_{i}(n) \tag{12}$$

where $\delta_{j}(n)$ is the local gradient, $E(n)$ is the total error, $v_{j}(n)$ is the induced local field produced at the input of the activation function of neuron $j$, $e_{j}(n)$ is the difference between the actual output and the target, $\varphi_{j}^{'}(n)$ is the derivative of the activation function, $\Delta w_{ji}(n)$ represents the correction in weight, $\eta$ is the learning rate, and $y_{i}(n)$ is the output of neuron $i$.

Spiking neurons emit discrete binary pulses, and these discontinuous signals are not differentiable. As a result, the derivative $\varphi_{j}^{'}(n)$ in Equation (11) is either zero or infinite, causing the computed weight change to be zero or infinite as well. In this case, the supervised method fails when the gradient is calculated directly.

To address the problem, surrogate gradient functions are an effective method. Applying a differentiable surrogate gradient function enables spiking neurons to operate during backpropagation. We propose and apply an adjustable, general-purpose surrogate gradient function, given by

$$\varphi(x) = \frac{1}{2 \ln 2} \left(\tanh(k(x - V_{thres})) + 1\right) \cdot \ln(1 + \exp(-c(x - d))) + B \tag{13}$$

where $k$ controls how sharply the function changes near the threshold and $V_{thres}$ represents the activation threshold of the pulse. $c$, $d$, and $B$ are further constants that regulate the shape of the function and adapt it to different uses.
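Equation (13) is straightforward to evaluate directly. The sketch below uses only the Python standard library; the function name `surrogate` and the default parameter values are illustrative assumptions. It also checks numerically that for c = 0 the logarithmic factor equals ln 2, so the function reduces to a shifted sigmoid, sigmoid(2k(x - V_thres)) + B.

```python
import math


def surrogate(x, k=1.0, v_thres=0.5, c=0.0, d=0.0, b=0.0):
    """Eq. (13): tanh gate times a log term, scaled by 1 / (2 ln 2), plus B."""
    gate = math.tanh(k * (x - v_thres)) + 1.0
    log_term = math.log1p(math.exp(-c * (x - d)))
    return gate * log_term / (2.0 * math.log(2.0)) + b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# With c = 0 the log term is ln(1 + 1) = ln 2, so Eq. (13) collapses to
# (tanh(k (x - V_thres)) + 1) / 2 + B, which equals sigmoid(2 k (x - V_thres)) + B.
k = 3.0
max_gap = max(abs(surrogate(x, k=k) - sigmoid(2.0 * k * (x - 0.5)))
              for x in (-1.0, 0.0, 0.5, 2.0))
print(max_gap)
```

In this setting the function passes through 0.5 exactly at the threshold, and `k` alone determines how quickly it saturates towards 0 and 1 on either side.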

When $c$ equals 0, $\varphi(x)$ is $\tanh$-like and has the characteristics of a sigmoid. The gradient can be controlled through $k$. The function $\varphi(x)$ for different values of $k$, together with the corresponding gradient functions and that of the sigmoid, is shown in Figure 3. It can be seen that as $k$ increases, the gradient function fits the spiking values more accurately during backpropagation. However, if $k$ is set to a large value, the gradient is sharp within a narrow range, which brings back the risk of exploding or vanishing gradients. The reason is that the final weight gradient is calculated by multiplying gradients across time steps and layers, which makes it either very large or very small. In contrast, when $k$ is set low, the gradient is less accurate, and the discrepancy accumulated across time steps and layers produces a severe error in the final gradient. This effect is particularly pronounced in deep networks and adversely affects performance.
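The trade-off in k can be made concrete with a short worked derivation. It assumes the spiking configuration c = 0 and B = 0 of Equation (13); the half-maximum width is introduced here only as a convenient illustrative measure.

```latex
\[
\varphi(x)\big|_{c=0,\,B=0} = \tfrac{1}{2}\bigl(\tanh(k(x - V_{thres})) + 1\bigr)
\quad\Longrightarrow\quad
\varphi'(x) = \frac{k}{2}\,\operatorname{sech}^{2}\bigl(k(x - V_{thres})\bigr)
\]
\[
\max_{x}\varphi'(x) = \varphi'(V_{thres}) = \frac{k}{2},
\qquad
\varphi'(x) \ge \frac{k}{4}
\iff
|x - V_{thres}| \le \frac{\ln(1+\sqrt{2})}{k} \approx \frac{0.881}{k}
\]
```

For k = 1 the gradient peaks at 0.5 and stays above half of that peak within about 0.88 of the threshold. For k = 10 the peak is 5, but the window shrinks to about 0.088. A per-layer factor above 1 (any k > 2) can grow when multiplied across layers and time steps, while inputs outside the narrow window contribute almost no gradient.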

Apart from serving as a surrogate function in spiking neural networks, our proposed function is flexible enough to act as a normal activation function in various models. By tuning the constants, it can simulate or even replace activation functions such as ReLU, ELU, and SoftPlus, as shown in Figure 4.
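One special case can be worked out by hand. The parameter choice below is an illustrative assumption and not necessarily the setting used for Figure 4: setting k = 0 switches the tanh gate off, and c = -1, d = 0, B = 0 turns the logarithmic factor into a SoftPlus.

```latex
\[
k = 0,\; c = -1,\; d = 0,\; B = 0:
\qquad
\varphi(x) = \frac{1}{2\ln 2}\,(0 + 1)\cdot\ln\bigl(1 + e^{x}\bigr)
           = \frac{\operatorname{SoftPlus}(x)}{2\ln 2}
\]
\[
\varphi(x) \to 0 \quad (x \to -\infty),
\qquad
\varphi(x) \approx \frac{x}{2\ln 2} \quad (x \to +\infty)
\]
```

Under this choice the function is a rescaled SoftPlus, and its two limits show the ReLU-like behaviour: flat for strongly negative inputs and linear for large positive ones.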

3.3. The All-Purpose ANN-to-SNN Converting Framework

The structure of a spiking neural network differs from that of an artificial neural network in several aspects, which creates various challenges when converting ANNs to SNNs. To overcome these obstacles, we design an all-purpose ANN-to-SNN converting framework. The scheme is based on a modular design, which enhances generality and flexibility. In this section, we introduce the framework from different aspects.

3.3.1. Time Steps

Spiking neural networks are composed of spiking neurons that imitate brain activity. As described in the previous section, information transmission between spiking neurons relies on spiking signals organized in time steps. However, time steps are not part of the design of an ANN. To tackle this first barrier to conversion, we introduce time steps into the original artificial layers to form new spiking blocks.

Convolution, batch normalization, and fully connected operations are well designed, robust, and efficient. Thus, our conversion retains these operations and extends them with time steps without disrupting the underlying transformations. To obtain discrete signals over $n$ time steps after continuous numerical transformations, we repeat the tensor $n$ times and add a time dimension. The repeated tensors are then packaged as a minimum unit and passed on to the next stage. Until a spiking neuron is reached, the tensor is unpacked and each sub-tensor is processed individually by the artificial layers it arrives at. When a spiking neuron is encountered, the value is sent into it $n$ times to simulate membrane potential accumulation. In this way, the features and advantages of ANNs are retained while spiking operations can also be conducted under the framework.
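The repeat, unpack, and accumulate steps described above can be sketched in NumPy as follows. The helper names, the leak factor `decay`, the hard reset to zero, and the stand-in layer are hypothetical choices made for illustration. They are not taken from the authors' implementation.

```python
import numpy as np


def add_time_dim(x, n_steps):
    """Repeat a continuous tensor n_steps times along a new leading time axis."""
    return np.repeat(x[np.newaxis, ...], n_steps, axis=0)


def apply_per_step(layer, packed):
    """Unpack along time, run an ordinary (stateless) ANN layer on each
    sub-tensor, and pack the results again."""
    return np.stack([layer(step) for step in packed], axis=0)


def spiking_neuron(packed, v_thresh=0.5, decay=0.5):
    """Feed the value in once per time step and accumulate membrane potential.
    The leak factor `decay` and the hard reset to 0 are illustrative choices."""
    v = np.zeros_like(packed[0])
    spikes = []
    for step in packed:
        v = decay * v + step
        fired = v >= v_thresh
        spikes.append(fired.astype(packed.dtype))
        v = np.where(fired, 0.0, v)
    return np.stack(spikes, axis=0)


x = np.array([[0.1, 0.15, 0.9]])                    # (batch, features)
packed = add_time_dim(x, n_steps=4)                 # (T, batch, features)
packed = apply_per_step(lambda t: 2.0 * t, packed)  # stand-in for conv / BN / FC
spikes = spiking_neuron(packed)
print(spikes.shape, spikes.sum(axis=0))
```

In this toy run the weakest input never reaches the threshold, the middle one needs several steps of accumulation before it fires, and the strongest fires at every step, which is the rate-like behaviour the time dimension is meant to expose.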

3.3.2. Spiking Neuron Model

Spiking neurons can be seen as the key transformation in an ANN-to-SNN conversion. In the framework, the leaky integrate-and-fire neuron described above is applied for its precision and efficiency. The LIF model used in the scheme works as the activation function in hidden layers, turning a floating-point number into a binary spiking pulse. Therefore, during forward propagation, the tensors of a traditional artificial neural network are converted into spiking signals, which turns the ANN into an SNN. In this way, the proposed framework fuses the spiking neuron into the network and transforms the source model into a spiking neural network. The process is shown in Figure 5.

3.3.3. Training Methodology

Another significant difference between ANNs and SNNs is the training strategy. For SNNs, the most widely used methods are unsupervised learning algorithms such as STDP and the Hebbian learning rule. ANN training, by contrast, relies heavily on gradient-based optimization such as gradient descent and batch gradient descent, which can be regarded as a key factor in its excellent precision.

Under our proposed framework, to sustain high accuracy, we apply the stochastic gradient descent method to train the converted SNN. However, as mentioned earlier, discontinuous spiking signals are not differentiable, so the gradient cannot be computed directly. We therefore use the surrogate gradient strategy discussed previously, with our proposed function in Equation (13). The parameters are set to $c = d = B = 0$, and $V_{thres}$ represents the minimum activation threshold for spikes. The function works effectively by providing a continuous gradient while aligning as closely as possible with the pulse signals.
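The surrogate gradient strategy amounts to using a hard threshold in the forward pass and the derivative of Equation (13) in the backward pass. The NumPy sketch below writes this out by hand for one linear layer and one SGD step. The toy data, the squared-error loss, the learning rate, and all names are illustrative assumptions; a real implementation would normally hook into the training framework's automatic differentiation instead.

```python
import numpy as np

K_SLOPE, V_THRES = 1.0, 0.5   # k and V_thres of Eq. (13), with c = d = B = 0


def spike_forward(v):
    """Forward pass: hard threshold, i.e. a binary spike."""
    return (v >= V_THRES).astype(v.dtype)


def spike_backward(v, grad_out):
    """Backward pass: replace the step's derivative by that of Eq. (13),
    d/dv 0.5 * (tanh(k (v - V_thres)) + 1) = 0.5 * k / cosh(k (v - V_thres))**2."""
    return grad_out * 0.5 * K_SLOPE / np.cosh(K_SLOPE * (v - V_THRES)) ** 2


# One SGD step for a single linear layer followed by a spiking activation.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))          # 5 samples, 3 input features
w = rng.normal(size=(3, 2)) * 0.1    # weights to 2 spiking neurons
target = np.ones((5, 2))             # toy target: every neuron should fire

v = x @ w                            # induced local field
y = spike_forward(v)                 # binary output
grad_y = (y - target) / len(x)       # gradient of 0.5 * batch-mean squared error
grad_v = spike_backward(v, grad_y)   # surrogate gradient through the spike
grad_w = x.T @ grad_v                # chain rule back to the weights
w_new = w - 0.1 * grad_w             # plain SGD update, as in Eq. (12)
print(np.abs(grad_w).max())
```

With the true derivative of the step function, `grad_v` would be zero almost everywhere and the weights would never move; the surrogate keeps the update informative while the forward pass still emits binary spikes.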

Finally, we combine the spiking X-Convolution module with the converting framework and its specialized gradient function to create our network, Spiking PointCNN. The model is composed of four spiking X-Convolution modules, which also use the proposed converting scheme, and spiking fully connected layers. The structure can be seen in Figure 6.

4. Experiments and Results

4.1. Experimental Settings

To demonstrate the efficiency of the proposed network structure, experiments are conducted on two public 3D point cloud classification datasets: ModelNet40 and ScanObjectNN. ModelNet40 is a synthetic dataset consisting of 40 categories of 3D CAD models of common items. Released by Princeton University, it provides distinctive object features with little disturbance. Its training split comprises 9843 samples and its testing split comprises 2468 samples. ScanObjectNN is constructed from real-world scanning data, which contain ample noise and interference. The dataset comprises about 15,000 objects divided into 15 categories, with 11,416 instances in the training set and 2882 instances in the testing set. As evaluation metrics, we use the overall accuracy (OA) over the entire dataset and the mean accuracy (mACC) over the categories.
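The two metrics differ in how they weight classes, which matters for imbalanced test sets. The following sketch computes both from label lists, with OA as the fraction of correctly classified samples and mACC as the unweighted average of per-class accuracies. The function names and the toy labels are illustrative assumptions.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly (OA)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies (mACC); each class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Imbalanced toy labels: class 0 dominates, so OA and mACC differ.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 1]
oa = overall_accuracy(y_true, y_pred)
macc = mean_class_accuracy(y_true, y_pred)
print(oa, macc)
```

Here the dominant class is classified perfectly and lifts OA, while the errors on the two small classes pull mACC down, which is why both numbers are usually reported together.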

Since our proposed structure is converted from an artificial neural network, it involves a large amount of numerical computation, including convolution operations. Thus, training on a conventional PC performs better than on neuromorphic hardware. Our experiments are conducted on a PC with two Intel Xeon Silver 4210R 2.40 GHz 10-core processors and an NVIDIA GeForce RTX 3080Ti GPU. The implementation runs on Python 3.8 and is based on PyTorch and PyTorch Geometric. For the spiking layers that replace the original ones, the overall time latency T is set to 1 by default.

4.2. Ablation Study

All ablation experiments are conducted on ModelNet40 to determine the best structure of Spiking PointCNN and to examine the effectiveness of the different modules in the model.

4.2.1. Ablation on Components in Spiking X-Convolution Module

Figure 6 presents the structure of our proposed spiking X-Convolution module with a specifically designed spiking $MLP_{\delta}$ neuron and spiking $MLP_{trans}$ neuron. We designed four variants of the X-transformation module, with or without each spiking neuron. Experiments are conducted in the converted SNN framework to demonstrate the effectiveness of the module built from these spiking neurons. Table 1 reports the evaluation results on the ModelNet40 testing set. It is clear that, under the framework of converted spiking neural networks, our $MLP$s with the appropriate spiking neurons achieve the highest performance. Compared with the structure without any spiking neurons, our proposed module improves OA by 1.26% and mACC by 4.50%. However, when only one spiking neuron is applied to the module, the performance can be even lower than that of the original one. We suppose that this is caused by a mismatch between the two $MLP$ operations. We therefore apply both the spiking $MLP_{\delta}$ neuron and the spiking $MLP_{trans}$ neuron to construct the spiking X-Convolution module. This spiking X-Convolution module is used to build our final classifier in the other ablation experiments.

4.2.2. Ablation on Surrogate Gradient Function

Since our proposed surrogate gradient function is adjustable, we set its parameters to $c = d = B = 0$ and $V_{thres} = 0.5$ so that it simulates spiking signals. However, as mentioned earlier, $k$ affects both how closely the function simulates the spike and how smooth the gradient is, so $k$ can be neither too large nor too small. We test different values of $k$ on ModelNet40 and record the maximum overall accuracy and mean accuracy in Table 2. When $k = 1$, the network performs best in OA; when $k = 10$, the accuracy is worst, which may be because a large $k$ makes the function change sharply, resulting in excessively large gradients during backpropagation. The ablation experiments therefore set $k = 1$ as the default.
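
The effect of $k$ on gradient magnitude can be illustrated with a small sketch. It does not reproduce the adjustable surrogate function proposed in this paper; it assumes a plain sigmoid surrogate with slope $k$ centred on $V_{thres} = 0.5$, and all function names are hypothetical. For this assumed form, the gradient peaks at the threshold with value $k/4$ and decays faster away from it as $k$ grows.

```python
import math

V_THRES = 0.5  # firing threshold used in the ablation

def heaviside_spike(v, v_th=V_THRES):
    """Forward pass of a spiking neuron: emit 1 if the potential reaches the threshold."""
    return 1.0 if v >= v_th else 0.0

def sigmoid_surrogate(v, k, v_th=V_THRES):
    """Smooth stand-in for the step function (assumed sigmoid form, not the paper's exact function)."""
    return 1.0 / (1.0 + math.exp(-k * (v - v_th)))

def sigmoid_surrogate_grad(v, k, v_th=V_THRES):
    """Derivative of the surrogate with respect to v; it peaks at v = v_th with value k / 4."""
    s = sigmoid_surrogate(v, k, v_th)
    return k * s * (1.0 - s)

for k in (0.5, 1, 5, 10, 15):
    peak = sigmoid_surrogate_grad(V_THRES, k)
    far = sigmoid_surrogate_grad(V_THRES + 1.0, k)
    print(f"k = {k:>4}: gradient at threshold = {peak:.3f}, gradient at V_thres + 1 = {far:.6f}")
```

Under this assumption, a large $k$ gives a tall, narrow gradient (very large near the threshold, almost zero elsewhere), while a small $k$ gives a flat gradient that approximates the spike poorly, which matches the trade-off described above.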

4.2.3. Ablation on Time Steps

The time step $T$ is a key parameter in SNNs, and our evaluation results for different time steps are presented in Table 3. We set $T$ to 1, 2, 3, and 4 for the ablation study. Our network achieves the best OA when $T = 3$ and the highest mACC when $T = 2$. Although OA at $T = 3$ is higher than at $T = 2$, we infer from the trend of the best mACC that $T = 2$ has the higher average OA, which indicates that the best OA at $T = 3$ may be partly due to randomness. Generally, the result is consistent with the experiments on $T$ in Spiking PointNet. We consider that when $T = 1$, features of the input point clouds are lost because the spiking neurons act as binary machines: they output 1 when the value exceeds the threshold and 0 otherwise. This results in the loss of temporal and spatial sequence information. We also conjecture that when $T$ is too large, the information is expressed redundantly, creating noise beyond the features. For instance, spike sequences may repeat, adversely affecting the learning and fitting of the true features. Setting a larger $T$ in our experiments is not practical because of the physical limitations of our experimental setup. Still, we can infer that increasing $T$ beyond a certain point will not result in better performance. Hence, our other ablation studies are conducted with $T = 2$.
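
The "binary machine" behaviour at $T = 1$ can be seen in a toy simulation. The sketch below is a simplification for illustration only: it assumes an integrate-and-fire neuron without leak, a constant input, and a soft reset, whereas the model in this paper uses LIF neurons; the function name and input values are hypothetical.

```python
def firing_rate(x, T, v_th=0.5):
    """Simulate a simplified integrate-and-fire neuron for T steps with constant input x.

    Assumptions for illustration: no leak, soft reset (subtract the threshold after a spike).
    Returns the fraction of time steps in which the neuron fired.
    """
    v = 0.0
    spikes = 0
    for _ in range(T):
        v += x                 # integrate the input current
        if v >= v_th:          # fire when the threshold is reached
            spikes += 1
            v -= v_th          # soft reset keeps the residual potential
    return spikes / T

inputs = [0.1, 0.2, 0.3, 0.4, 0.6]
for T in (1, 2, 4):
    rates = [firing_rate(x, T) for x in inputs]
    print(f"T = {T}: {len(set(rates))} distinct rates -> {rates}")
```

With $T = 1$ every input collapses to either 0 or 1, so inputs of different strength become indistinguishable; with more time steps the firing rate can take intermediate values and separates the inputs more finely.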

4.3. Comparison with SOTAs

Table 4 presents the comparison results for various ANN-to-SNN converted models on two benchmark datasets. All of these networks are obtained by converting an ANN to an SNN. KPConv-SNN is the ANN-to-SNN network reported in [34], which follows [40] to convert an ANN-based method into SNN form. PointNet_Baseline, PointNet++_Baseline, PointNet_Efficient, and PointNet++_Efficient are the converted models compared or proposed in [20]; the Efficient variants use the conversion framework proposed there. However, since their code is not open-source, we only report their overall accuracy on the two datasets, without mean accuracy. Spiking PointNet [21] is converted with a new framework and achieves good performance. We ran its official code and simply changed the dataset to obtain its accuracy on the ScanObjectNN dataset for comparison.

From the table, it can be seen that our proposed model performs well on ModelNet40, where the overall accuracy and mean accuracy are 85.94% and 83.14%. Although its accuracy is lower than that of other networks by up to 3 percent on this synthetic dataset, whose data are free of any disturbance, the network achieves high accuracy on the real-world dataset ScanObjectNN. It reaches an OA of 76.10% and an mACC of 73.30%, which are higher than those of the other models. The OA on ScanObjectNN is 4.83% higher than that of PointNet++_Efficient.

The superior performance of Spiking PointNet on the synthetic ModelNet40 dataset is acknowledged; however, our proposed model demonstrates significantly better performance on the ScanObjectNN dataset, which contains real-world disturbances. This suggests that our model is more robust and better suited for practical applications, where background interference is often unavoidable. The ability to maintain high accuracy in such noisy environments underscores the practical applicability and resilience of our approach compared to models optimized on clean datasets.

5. Conclusions

In this paper, we have presented Spiking PointCNN, a converted spiking neural network based on spiking X-Convolution modules. The work was motivated by the great development potential of SNNs and by the unmet demand for SNN-based point cloud classification. We integrated spiking neurons to construct the spiking X-Convolution module, extracting additional feature dimensions while preserving the sequential relationship between points. Then, we proposed an adjustable surrogate gradient function that addresses the non-differentiability of spike activation functions. By tuning its parameters, this function can also substitute for other activation functions. Subsequently, we proposed a universal framework for converting ANNs to SNNs and combined it with our spiking X-Convolution module and surrogate gradient function to obtain our Spiking PointCNN model. The ablation study and the experiments on the final model were conducted on the ModelNet40 and ScanObjectNN datasets. Our network achieved excellent performance, in particular surpassing other converted SNNs by a large margin on the real-world dataset ScanObjectNN. This indicates that our model is not only converted efficiently but also has practical utility.

Author Contributions

Conceptualization, Y.T. and Q.W.; methodology, Y.T.; validation, Y.T.; formal analysis, Y.T.; investigation, Y.T.; writing—original draft preparation, Y.T.; writing—review and editing, Y.T. and Q.W.; visualization, Y.T.; supervision, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code created and used in this study is available on GitHub (https://github.com/DeadlyAbyss/Spiking-PointCNN, accessed on 11 August 2024). The ModelNet40 dataset is publicly available at ModelNet (https://modelnet.cs.princeton.edu/, accessed on 11 June 2024). The ScanObjectNN dataset is publicly available at ScanObjectNN (https://hkust-vgd.github.io/scanobjectnn/, accessed on 11 June 2024). Other data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors would like to thank the editor and anonymous reviewers for their comments to improve the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
CNN	Convolutional Neural Network
HH	Hodgkin–Huxley
IF	Integrate-and-Fire
LIF	Leaky Integrate-and-Fire
mACC	Mean accuracy
OA	Overall accuracy
SNN	Spiking Neural Network

References

1. Xu, F.; Xu, F.; Xie, J.; Pun, C.M.; Lu, H.; Gao, H. Action Recognition Framework in Traffic Scene for Autonomous Driving System. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22301–22311.
2. Liu, N.; Yuan, Y.; Zhang, S.; Wu, G.; Leng, J.; Wan, L. Instance Segmentation of Sparse Point Clouds with Spatio-Temporal Coding for Autonomous Robot. Mathematics 2024, 12, 1200.
3. Lin, H.; Gao, J.; Zhou, Y.; Lu, G.; Ye, M.; Zhang, C.; Liu, L.; Yang, R. Semantic decomposition and reconstruction of residential scenes from LiDAR data. ACM Trans. Graph. 2013, 32, 66.
4. Zhai, M.; Ni, K.; Xie, J.; Gao, H. Learning Scene Flow from 3d Point Clouds with Cross-Transformer and Global Motion Cues. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
5. Hu, J.; Wang, B.; Qian, L.; Pan, Y.; Guo, X.; Liu, L.; Wang, W. MAT-Net: Medial Axis Transform Network for 3D Object Recognition. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019.
6. Zhang, H.; Wang, C.; Tian, S.; Lu, B.; Zhang, L.; Ning, X.; Bai, X. Deep learning-based 3D point cloud classification: A systematic survey and outlook. Displays 2023, 79, 102456.
7. Yuan, Y.; Liu, J.; Zhao, P.; Huo, H.; Fang, T. Spike signal transmission between modules and the predictability of spike activity in modular neuronal networks. J. Theor. Biol. 2021, 526, 110811.
8. Ghosh-Dastidar, S.; Adeli, H. Spiking neural networks. Int. J. Neural Syst. 2009, 19, 295–308.
9. Chakraborty, B.; Kang, B.; Kumar, H.; Mukhopadhyay, S. Sparse spiking neural network: Exploiting heterogeneity in timescales for pruning recurrent SNN. arXiv 2024, arXiv:2403.03409.
10. Alawad, M.; Yoon, H.J.; Tourassi, G. Energy efficient stochastic-based deep spiking neural networks for sparse datasets. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 311–318.
11. Bu, T.; Fang, W.; Ding, J.; Dai, P.; Yu, Z.; Huang, T. Optimal ANN-SNN conversion for high-accuracy and ultra-low-latency spiking neural networks. arXiv 2023, arXiv:2303.04347.
12. Xing, F.; Yuan, Y.; Huo, H.; Fang, T. Homeostasis-Based CNN-to-SNN Conversion of Inception and Residual Architectures. In Proceedings of the Neural Information Processing; Gedeon, T., Wong, K.W., Lee, M., Eds.; Springer: Cham, Switzerland, 2019; pp. 173–184.
13. Li, Y.; Deng, S.; Dong, X.; Gu, S. Converting artificial neural networks to spiking neural networks via parameter calibration. arXiv 2022, arXiv:2205.10121.
14. Deng, S.; Gu, S.; Wu, Y. Optimal conversion of artificial neural networks to spiking neural networks with minimal performance loss. IEEE Trans. Neural Networks Learn. Syst. 2021, 32, 2916–2926.
15. Rueckauer, B.; Lungu, I.A.; Hu, Y.; Pfeiffer, M.; Liu, S.C. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Front. Neurosci. 2017, 11, 682.
16. Pfeiffer, M.; Pfeil, T. Deep learning with spiking neurons: Opportunities and challenges. Front. Neurosci. 2018, 12, 774.
17. Diehl, P.U.; Neil, D.; Binas, J.; Cook, M.; Liu, S.C.; Pfeiffer, M. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–8.
18. Li, J.; Hu, W.; Yuan, Y.; Huo, H.; Fang, T. Bio-Inspired Deep Spiking Neural Network for Image Classification. In Proceedings of the Neural Information Processing; Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S.M., Eds.; Springer: Cham, Switzerland, 2017; pp. 294–304.
19. Ji, S.; Gu, Q.; Yuan, Y.; Zhao, P.; Fang, T.; Huo, H.; Niu, X. A Retina-LGN-V1 Structure-like Spiking Neuron Network for Image Feature Extraction. In Proceedings of the 2021 5th International Conference on Video and Image Processing, Hayward, CA, USA, 22–25 December 2021; pp. 134–141.
20. Lan, Y.; Zhang, Y.; Ma, X.; Qu, Y.; Fu, Y. Efficient converted spiking neural network for 3d and 2d classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 9211–9220.
21. Ren, D.; Ma, Z.; Chen, Y.; Peng, W.; Liu, X.; Zhang, Y.; Guo, Y. Spiking PointNet: Spiking Neural Networks for Point Clouds. Adv. Neural Inf. Process. Syst. 2024, 36, 1811.
22. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. PointCNN: Convolution On X-Transformed Points. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, Montréal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
23. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920.
24. Uy, M.A.; Pham, Q.H.; Hua, B.S.; Nguyen, T.; Yeung, S.K. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1588–1597.
25. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
26. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
28. Xie, J.C.; Pun, C.M. Chronological Age Estimation Under the Guidance of Age-Related Facial Attributes. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2500–2511.
29. Xie, J.C.; Pun, C.M. Age and gender classification using convolutional neural networks. In Proceedings of the IEEE Workshop on Analysis and Modelling of Faces & Gestures’, IEEE Transactions on Information Forensics and Security, Lille, France, 14–18 May 2019; pp. 1556–6013.
30. Liu, N.; Yuan, Y.; Wan, L.; Huo, H.; Fang, T. A Comparative Study for Contour Detection Using Deep Convolutional Neural Networks. In Proceedings of the 2018 10th International Conference on Machine Learning and Computing, Macau, China, 26–28 February 2018; pp. 203–208.
31. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
32. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017.
33. Wu, W.; Qi, Z.; Fuxin, L. PointConv: Deep Convolutional Networks on 3D Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
34. Wu, Q.; Zhang, Q.; Tan, C.; Zhou, Y.; Sun, C. Point-to-Spike Residual Learning for Energy-Efficient 3D Point Cloud Classification. Proc. AAAI Conf. Artif. Intell. 2024, 38, 6092–6099.
35. Yuan, Y.; Huo, H.; Fang, T. Effects of Metabolic Energy on Synaptic Transmission and Dendritic Integration in Pyramidal Neurons. Front. Comput. Neurosci. 2018, 12, 79.
36. Carpegna, G.; Fraccaroli, M.; Marongiu, A.; Benini, L. SPIKER: An FPGA-Optimized Hardware Accelerator for Spiking Neural Networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 41, 914–927.
37. Bybee, M.; Hasani, R.; Grollier, J.; Daniel, L.; Eslami, M. Deep Spiking Phasor Neural Networks. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; pp. 1345–1355.
38. Kulkarni, S.R.; Rajendran, B. Spiking Neural Networks: A Review of Models, Learning Algorithms, and Applications. Front. Neurosci. 2017, 11, 286.
39. Haykin, S. Neural Networks and Learning Machines, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2009.
40. Cao, Y.; Chen, Y.; Khosla, D. Spiking deep convolutional neural networks for energy-efficient object recognition. Int. J. Comput. Vis. 2015, 113, 54–66.

Figure 1. Structure of X-transformation convolution.

Figure 2. Structure of spiking X-transformation convolution module.

Figure 3. Comparison of spiking signals and sigmoid and surrogate functions with different k. Left picture shows the functions while the right one shows their gradients.

Figure 4. Comparison of proposed adjustable surrogate gradient function with traditional activation functions.

Figure 5. The process of a numeric matrix being converted to spiking matrix sets.

Figure 6. The complete structure of Spiking PointCNN.

Table 1. Ablation study on spiking MLP components of spiking X-Convolution module.

Components	OA (%)	mACC (%)
$MLP_{\delta}, MLP_{trans}$	84.68	79.98
$MLP_{\delta}', MLP_{trans}$	84.64	80.51
$MLP_{\delta}, MLP_{trans}'$	82.46	79.46
$MLP_{\delta}', MLP_{trans}'$	85.94	84.48

Table 2. Ablation study on different k in proposed surrogate gradient function on the ModelNet40 dataset.

k	0.5	1	5	10	15
OA (%)	84.24	86.71	84.24	85.94	65.80
mACC (%)	83.31	82.21	79.50	84.48	58.71

Table 3. Ablation study on different time steps on the ModelNet40 dataset. Spiking PointCNN with $T = 3$ presents the highest OA, while $T = 2$ achieves the best mACC.

	T = 1	T = 2	T = 3	T = 4
OA (%)	84.28	85.78	85.94	84.97
mACC (%)	81.85	84.48	83.14	80.71

Table 4. Comparison of various ANN-to-SNN models on ModelNet40 and ScanObjectNN.

Model	ModelNet40 OA/mACC (%)	ScanObjectNN OA/mACC (%)
KPConv-SNN	70.5/67.6	43.9/38.7
PointNet_Baseline	84.2/-	62.6/-
PointNet++_Baseline	78.3/-	58.0/-
PointNet_Efficient	88.2/-	66.6/-
PointNet++_Efficient	89.5/-	69.2/-
Spiking PointNet	88.6/86.7	66.4/60.4
Spiking PointCNN	86.7/82.2	74.1/73.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Tao, Y.; Wu, Q. Spiking PointCNN: An Efficient Converted Spiking Neural Network under a Flexible Framework. Electronics 2024, 13, 3626. https://doi.org/10.3390/electronics13183626
