1. Introduction
Approximate computing has gained prominence as a practical alternative to accurate computing, offering substantial improvements in speed and power efficiency [1,2]. However, these benefits come at the cost of reduced accuracy, necessitating a careful balance between performance enhancement and tolerable accuracy loss. This approach has been successfully investigated across various real-world applications [3], such as multimedia processing [4], digital signal processing [5,6], big data analytics [7], neuromorphic computing [8], neural network implementations for artificial intelligence and machine learning [9,10], software development [11], memory storage [12,13], and low-power graphics processing [14], among others.
This paper explores the application of approximate computing to machine learning, with a particular emphasis on k-means clustering, a widely used unsupervised machine learning technique for partitioning data into clusters based on similarity. K-means does not require labeled data, which makes it suitable for unsupervised learning tasks. Its primary goal is to partition data points into k clusters, where k also denotes the number of centroids, each centroid serving as the central reference point of a cluster. The core idea of k-means is to minimize the sum of squared distances between data points and their assigned cluster centroids, i.e., to minimize the Euclidean distance between data points and their cluster centers. This reduces the variance within each cluster, making the results easy to evaluate and interpret. K-means is particularly valued for its simplicity, efficiency, and ease of implementation in applications such as data compression, market segmentation, anomaly detection, and dimensionality reduction. It works well with large datasets and can handle a considerable amount of data thanks to its linear time complexity, provided the number of clusters is manageable. The time complexity of k-means is typically O(n × k × t), where ‘n’ represents the number of data points, ‘k’ the number of clusters, and ‘t’ the number of iterations required for convergence. The k-means algorithm usually converges to a solution in a finite number of iterations, and it tends to converge quickly, especially when a good initial set of centroids is chosen, which depends on the data.
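The assign-and-update cycle described above can be sketched in a few lines of Python. This is a minimal illustration for orientation only, not the experimental code used in this paper:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Basic k-means: assign each point to its nearest centroid (squared
    Euclidean distance), then recompute each centroid as the mean of its
    members, repeating until assignments stabilize."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iters):
        new = [min(range(k),
                   key=lambda j, p=p: sum((a - b) ** 2
                                          for a, b in zip(p, centroids[j])))
               for p in points]
        if new == assignments:   # assignments stable -> converged
            break
        assignments = new
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:          # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(d) / len(members)
                                     for d in zip(*members))
    return centroids, assignments
```

For two well-separated groups of points, any initialization converges to the expected partition within a few iterations.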
There also exist limitations of k-means: it assumes clusters to be roughly spherical, which may not suit all types of data (e.g., data with clusters of varying shapes and densities); the algorithm requires the number of clusters, k, to be specified in advance, which is sometimes not intuitive or easy to determine; and k-means is sensitive to the initial placement of centroids, which can lead to suboptimal results if poor initializations are chosen. Despite these limitations, the simplicity, speed, and ease of use of k-means make it a go-to clustering method for many practical applications.
In k-means, during the clustering process, data points are assigned to the closest centroid to minimize the within-cluster sum of squares (WCSS), with a lower WCSS value indicating an effective clustering. Conventionally, WCSS is computed with high precision using an accurate adder. This study examines the impact of employing various approximate adders for WCSS computation, and compares the results with those obtained using an accurate adder. Further, a novel approximate adder is introduced that demonstrates clustering performance comparable to an accurate adder, while achieving significant improvements in design efficiency and outperforming many existing approximate adders in optimizing key design metrics.
The remainder of this paper is structured as follows: Section 2 discusses the two primary categories of approximate adders—those with fixed approximation and those with variable approximation—before introducing the proposed approximate adder. Section 3 gives a concise overview of the k-means clustering algorithm and compares the clustering results obtained using the accurate adder and the proposed approximate adder; it also discusses the clustering performance of other approximate adders and presents an error analysis of the various approximate adders. Section 4 presents the physical design metrics estimated for the accurate adder and the various approximate adders, including the proposed one, with all the adders implemented using a 28 nm CMOS standard cell library. Finally, Section 5 concludes by summarizing the contributions of this paper.
2. New Approximate Adder
Approximate adders are typically designed by deliberately introducing inaccuracies into an accurate adder. These adders can be broadly classified into two categories: fixed approximation adders (FAAs) and variable approximation adders (VAAs). FAAs maintain a constant level of approximation, generating either an approximate sum with a predefined accuracy or an accurate sum, depending on the given inputs, within one clock cycle. A fixed approximation guarantees savings in design metrics for an approximate adder compared to its accurate counterpart. In contrast, VAAs allow for a dynamic approximation level, enabling them to produce either an approximate or an accurate sum as needed, which may involve one or more clock cycles. VAAs often include an error detection and correction circuit (EDCC) alongside the adder logic to maintain the required output accuracy. While the EDCC is crucial in ensuring accuracy, it tends to introduce a design overhead. A study presented in [15] for a digital video encoding application found that the power saving achieved by a VAA was comparable to that of an FAA, which was attributed to the additional EDCC in VAAs that is absent in FAAs. Given that the proposed approximate adder falls under the FAA category, this paper focuses exclusively on FAAs. Various FAA architectures have been extensively documented and compared in the existing literature [16,17], to which the interested reader is referred for details; they are not discussed here to avoid repetition. Instead, we describe the proposed approximate adder in this section, while citing many FAAs in the next section and considering them for a comparative evaluation in this work.
An FAA is generally composed of two distinct sections: a precise segment, where computations are performed with full accuracy, and an imprecise segment, where intentional errors are introduced into the computation. The lower-order bits of the adder are designated to the imprecise section, while the higher-order bits are assigned to the precise section. As a result, the precise segment plays a more critical role in maintaining overall computational accuracy compared to the imprecise segment.
The proposed approximate adder (NAA), illustrated in Figure 1, is an FAA that is divided into two sections, called the precise and imprecise parts. As depicted in the figure, an N-bit NAA consists of an M-bit imprecise section and an (N−M)-bit precise section, with the latter being more critical in determining the overall accuracy of the computation. The imprecise section is highlighted in red, and the precise section is depicted in blue. The adder inputs are represented by A and B, and the adder output, i.e., the sum, is denoted by S.
In the imprecise section, a subset of the less significant sum bits is fixed at a binary value of 1, while the remaining sum bits are computed using reduced (i.e., approximate) logic. The M-bit imprecise part generates M sum bits, where SM−1, SM−2, and SM−3 employ approximate logic, while the lower-order sum bits from SM−4 to S0 are assigned a constant value of 1. Sum bits SM−2 and SM−3 are produced by the logical OR of corresponding input bit pairs (AM−2, BM−2) and (AM−3, BM−3), respectively. The input bit pair (AM−1, BM−1) is logically XOR-ed, the input bit pair (AM−2, BM−2) is logically AND-ed, and the outputs of these XOR and AND gates are logically OR-ed to produce the sum bit SM−1. Consequently, the less significant input bit pairs (A0, B0) through (AM−4, BM−4) are disregarded. An internal carry signal viz. CT is generated by logically AND-ing the input bit pair (AM−1, BM−1), which is then fed into the precise section as its carry input. The (N–M)-bit precise section performs exact addition for the more significant sum bits, ranging from SM to SN, with SN representing the carry overflow of the addition. The sum bits computed by the precise and imprecise sections are concatenated to generate the final sum output of NAA.
The generalized logic equations for the sum bits of the imprecise part, i.e., S0 up to SM−1, and the precise part, viz. SM up to SN, of the NAA are given below. Equation (5) expresses the internal carry logic that feeds the precise part. Equation (6) refers to an arbitrary Kth bit adder stage in the precise part, whose sum bit depends on the respective adder inputs and the carry output produced by the (K−1)th bit adder stage.
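The equation bodies do not survive in this copy; the following is a reconstruction consistent with the bit-level description of Figure 1 given above, with the equation numbering assumed to match the original:

```latex
\begin{align}
S_j &= 1, \qquad 0 \le j \le M-4 \tag{1}\\
S_{M-3} &= A_{M-3} \lor B_{M-3} \tag{2}\\
S_{M-2} &= A_{M-2} \lor B_{M-2} \tag{3}\\
S_{M-1} &= \left(A_{M-1} \oplus B_{M-1}\right) \lor \left(A_{M-2} \land B_{M-2}\right) \tag{4}\\
C_T &= A_{M-1} \land B_{M-1} \tag{5}\\
S_K &= A_K \oplus B_K \oplus C_{K-1}, \qquad M \le K \le N-1 \tag{6}
\end{align}
```

Here C_{M−1} = C_T, i.e., the internal carry of Equation (5) serves as the carry input of the least significant precise stage.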
Typically, an N-bit NAA has an M-bit imprecise part where M is at least 4, with S0 fixed at 1 and S1, S2, and S3 associated with approximate sum logic. When M ≥ 5, SM−1, SM−2, and SM−3 retain the same logic as shown in Figure 1, and the remaining less significant sum bits are fixed at 1. Further, the sizes of the precise and imprecise sections of an NAA can be adjusted based on specific application requirements, allowing flexibility in balancing accuracy and design efficiency.
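The NAA logic described above can be expressed as a short software model. The sketch below follows the bit-level description of Figure 1; it is a high-level illustration, not the Verilog used for synthesis:

```python
def naa_add(a, b, n=16, m=7):
    """High-level model of an N-bit NAA with an M-bit imprecise part (M >= 4)."""
    abit = lambda i: (a >> i) & 1
    bbit = lambda i: (b >> i) & 1
    # Imprecise part: sum bits S0 .. S(M-4) are tied to logic 1
    s = (1 << (m - 3)) - 1
    s |= (abit(m - 3) | bbit(m - 3)) << (m - 3)                   # S(M-3)
    s |= (abit(m - 2) | bbit(m - 2)) << (m - 2)                   # S(M-2)
    s |= ((abit(m - 1) ^ bbit(m - 1)) |
          (abit(m - 2) & bbit(m - 2))) << (m - 1)                 # S(M-1)
    ct = abit(m - 1) & bbit(m - 1)    # internal carry CT into the precise part
    # Precise part: exact addition of the upper (N - M) bits, giving SM .. SN
    return s | (((a >> m) + (b >> m) + ct) << m)
```

For example, `naa_add(0, 0)` returns 15 because the four least significant sum bits are forced to 1, while the upper bits, including the overflow bit SN, are always computed exactly.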
3. K-Means Clustering Involving Accurate and Approximate Adders
The traditional k-means clustering algorithm [18] has been recognized as a fundamental and elegant approach for dividing numerical data into distinct clusters. It operates through an iterative process, where data points are assigned to clusters based on their closeness to a set of centroids. The algorithm alternates between two key steps: first, each data point is allocated to the nearest centroid using the Euclidean distance; second, the centroids are recalculated based on the average position of the points within their respective clusters. This cycle repeats until the centroids reach a stable state, ultimately forming well-defined clusters of data points [19].
A key characteristic of the k-means algorithm is its iterative expectation-maximization approach, which provides a level of error resilience at each step. Since each iteration incrementally refines the cluster assignments without requiring absolute precision in every update, k-means can effectively mitigate minor inaccuracies, particularly during centroid recalculations. This error tolerance is especially relevant to our case study, which examines the influence of approximate adders on k-means clustering. The algorithm’s convergence remains largely unaffected by small computational errors, making it a suitable framework for evaluating how approximate arithmetic impacts clustering accuracy. By integrating approximate adders into centroid computations, k-means presents a good testbed for analyzing the trade-offs between computational accuracy and clustering performance. In this work, we assess the clustering accuracy of the k-means algorithm by considering the total within-cluster sum of squares (WCSS) as our evaluation metric. WCSS is effective in reflecting how accurately the k-means algorithm has been executed computationally. This enables us to compare the clustering performances of various approximate adders by analyzing their ability to handle k-means despite arithmetic imprecisions. The methodology for calculating WCSS is discussed in the following.
To evaluate the clustering accuracy of different approximate adders, we conducted software-based experiments using publicly available artificial datasets [20]. The primary goal was to analyze the impact of approximate arithmetic on the accuracy of k-means clustering. We implemented the k-means algorithm in Python (version 3.13) and initially simulated an accurate adder in software. Subsequently, we modified the algorithm to integrate approximate adders during two key computations—the Euclidean distance within each dimension between a centroid and a data point, and the total sum of these Euclidean distances across all dimensions.
Consider an adder function AA, which accepts two integers within the specified bit precision and returns their sum through approximate addition. In practical clustering scenarios, these integers may exceed the precision limits; therefore, preprocessing is necessary to ensure accuracy. To perform subtraction using the approximate adder, we define an auxiliary function, given by Equation (7), in which a non-negative offset keeps the adder from dealing with negative numbers. Hence, even if the subtraction itself is accurate, incorporating the AA function introduces the approximate adder into the calculation.
As an illustration, let a data point i and a centroid j be represented as vectors over the dataset's D numerical dimensions, and let d refer to one of these dimensions. The Euclidean distance within the dth dimension is expressed via Equation (8). We omit the square root in this calculation, as we are not concerned with the raw Euclidean distance but rather with a comparison of distances. The sum of the squared Euclidean distances across all dimensions is then represented by Equation (9). Next, we apply this calculation to all data points and centroids to decide the centroid assignment in each iteration. The notation on the right side of Equation (9) indicates that an approximate adder is utilized for each term in the summation.
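A software sketch of this methodology is given below. Since the body of Equation (7) is not reproduced here, the bias-based subtraction helper is one plausible reading of the description above, and the LOA-like `approx_add` is a stand-in for any approximate adder; both are illustrative assumptions, not the paper's exact formulation:

```python
BIAS = 1 << 12   # assumed offset; Equation (7)'s exact form is not shown here

def approx_add(a, b):
    """Stand-in 16-bit approximate adder: exact upper part, OR-ed lower
    4 bits (an LOA-like scheme used purely for illustration)."""
    m = 4
    return (((a >> m) + (b >> m)) << m) | ((a | b) & ((1 << m) - 1))

def approx_sub(x, y):
    """Equation (7)-style helper: route the (exact) difference through the
    approximate adder, with BIAS keeping its operands non-negative."""
    return approx_add(x - y + BIAS, BIAS) - 2 * BIAS

def sq_distance(point, centroid):
    """Squared Euclidean distance (Equations (8)-(9)): per-dimension
    differences via approx_sub, accumulated via approx_add; the square
    root is omitted since only distance comparisons matter."""
    total = 0
    for p, c in zip(point, centroid):
        d = approx_sub(p, c)
        total = approx_add(total, d * d)
    return total
```

Every call to `approx_sub` and every accumulation step exercises the approximate adder, so the adder's error characteristics propagate into every centroid-assignment decision.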
In [21], only the aggregation dataset of [20] was considered for clustering. For this paper, we considered four open-source artificial datasets from [20], namely aggregation, diamond9, DS850, and engytime, which exhibit quite distinct characteristics and pose varied challenges for k-means, allowing us to evaluate the performance of the approximate adders across different clustering tasks. It may be noted that not all datasets of [20] can be clustered using k-means due to its limitations, such as the inability to handle complex, non-linear relationships between data points, high-dimensional data, etc.
Here, besides the proposed approximate adder (NAA), we considered several other approximate adders, namely LOA [22], LOAWA [23], APPROX5 [24], HEAA [25], M-HEAA [26], OLOCA [27], HOERAA [28], LDCA [29], HPETA-II [30], HOAANED [31], HERLOA [32], M-HERLOA [33], COREA [21], DBAA [34], and SAAR [35], to assess their performance in k-means clustering. We set the maximum bit precision of the adders to 16 bits (i.e., N = 16), consistent with the approach followed in [21]. For NAA with N = 16 bits, we varied the size of the imprecise part (i.e., M) from 4 bits to 10 bits, performing clustering individually for each value of M across all four datasets. The clustering iterations were carried out until convergence, after which the WCSS was calculated.
We define WCSS as follows: let the clustering produce a set of clusters, where the kth cluster denotes the group of points assigned to the kth centroid; WCSS is then computed as shown in Equation (10). Interested readers may reproduce the clustering results by using the code and following the instructions provided in our GitHub link [36].
The clustering results obtained using the accurate adder and the proposed approximate adder (NAA) are presented side-by-side in Figure 2. Through trial and error, the optimum sizes of the precise and imprecise parts of NAA were found to be 9 bits and 7 bits, respectively, for all four datasets. These sizes were selected to ensure that the WCSS value obtained using NAA either matched or was very close to the WCSS value obtained with the accurate adder. As shown in Figure 2, the WCSS value for the diamond9 dataset remains identical for both the accurate adder and NAA, while the WCSS values for the aggregation, DS850, and engytime datasets exhibit slight differences between the two.
Table 1 illustrates the variation in WCSS for the four datasets when using a 16-bit NAA, with the size of the imprecise adder part varying from 4 bits to 10 bits. The WCSS values based on the accurate adder for clustering the different datasets, noted from Figure 2, are as follows: aggregation = 11,427.23, diamond9 = 1015.24, DS850 = 413.13, and engytime = 12,002.04. In Table 1, M represents the size of the imprecise part of an N-bit NAA, where N = 16 and M varies from 4 to 10. It may be recalled from the previous discussion that M should be at least 4 for an NAA.
When performing k-means clustering using NAA, our aim was to determine the optimum sizes for its imprecise and precise parts so that a specific configuration suitable for clustering all the considered datasets could be chosen. Generally, it is desirable to maximize the size of the imprecise part of an approximate arithmetic circuit such that it still yields an acceptable output quality while enabling significant savings in design metrics compared to its accurate counterpart. As seen in Table 1, the optimum sizes of the imprecise part (M) of NAA for clustering the different datasets were M = 5 for aggregation, M = 7 for diamond9, M = 6 for DS850, and M = 7 for engytime. However, for aggregation, both M = 5 and M = 7 give nearly identical WCSS values, and for DS850, M = 6 and M = 7 result in almost the same WCSS values. For engytime, M = 7 is optimal. Based on these findings, a configuration of 7 bits for the imprecise part and 9 bits for the precise part of the 16-bit NAA was chosen for clustering all four datasets.
The accurate adder and NAA produced the same WCSS value for clustering the diamond9 dataset, but slightly different WCSS values for the aggregation, DS850, and engytime datasets, as seen in Figure 2. These variations are attributed to differences in the characteristics of the datasets. Overall, the results suggest that NAA is a promising choice for k-means clustering, providing comparable or nearly identical clustering performance relative to the accurate adder. Based on the same procedure, the optimum sizes of the imprecise part of several existing approximate adders for k-means clustering were also determined, as follows: M = 5 for LOA, HEAA, M-HEAA, OLOCA, HOERAA, HOAANED, HERLOA, and M-HERLOA; and M = 4 for LOAWA, APPROX5, LDCA, HPETA-II, COREA, and DBAA. The approximate adder SAAR features a unique architecture that lacks an imprecise part, and a 16-bit SAAR was implemented in software as illustrated in [35].
Figure 3 shows which data points differ between the clustering performed using the accurate adder and that performed using the proposed approximate adder (NAA) for the different datasets. It can be noticed from Figure 3 that two data points (shown in red) were clustered differently by NAA compared to the accurate adder for the aggregation dataset, while one data point (shown in red) was clustered differently for each of the DS850 and engytime datasets. There is no difference in the clustering performed by the accurate adder and NAA for the diamond9 dataset. Thus, the minor (or absent) differences in clustering between the accurate adder and NAA explain the minor (or absent) variations between their corresponding WCSS values across the datasets.
Next, to analyze the error characteristics of the approximate adders, we computed two commonly used error metrics based on the optimum level of inaccuracy for each approximate adder, as discussed earlier. The error metrics considered were the mean error distance (MED), also known as the mean absolute error, and the root mean square error (RMSE). To perform the error analysis, the high-level functionality of the accurate adder and the different approximate adders was modeled in Python. Given that a 16-bit adder has 2^32 possible input combinations, considering all of them is impractical. Therefore, we supplied one million randomly generated input values to the adders and calculated their MED and RMSE relative to the sums produced by the accurate adder. The formulae for MED and RMSE are given by Equations (11) and (12).
In Equations (11) and (12), AccuSum(AL, BL) refers to the sum produced by the accurate adder, while AppxSum(AL, BL) represents the sum generated by an approximate adder. The notation (AL, BL) denotes a specific pair of input values given to the adder. K represents the number of inputs provided to the approximate adders for the calculation of the error metrics, with K set to 1 million. The MED and RMSE values computed for the approximate adders having optimum inaccuracy for clustering are given in Table 2.
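The Monte-Carlo error analysis described above can be sketched as follows; `error_metrics` accepts any adder model, and the sampling-based setup mirrors the paper's use of one million random input pairs in place of all 2^32 combinations (this is an illustrative sketch, not the authors' script):

```python
import random

def error_metrics(appx_add, n_bits=16, k=1_000_000, seed=1):
    """Monte-Carlo MED and RMSE of an approximate adder versus exact
    addition (Equations (11) and (12)), sampling K random input pairs."""
    rng = random.Random(seed)
    abs_err = sq_err = 0
    for _ in range(k):
        a, b = rng.getrandbits(n_bits), rng.getrandbits(n_bits)
        e = appx_add(a, b) - (a + b)   # signed error versus AccuSum
        abs_err += abs(e)
        sq_err += e * e
    return abs_err / k, (sq_err / k) ** 0.5
```

By definition, RMSE is always at least as large as MED, and an exact adder scores zero on both metrics.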
As seen in Table 2, although NAA exhibits numerically higher MED and RMSE values than many other approximate adders, k-means clustering was found to inherently tolerate some inaccuracy, making NAA still effective, as demonstrated by Figure 2 and Figure 3. However, it is important to note that the MED and RMSE values for NAA given in Table 2 correspond to M = 7, which is greater than the optimum value of M = 4 or M = 5 determined for the other approximate adders. Assuming M = 5 for NAA, its MED and RMSE were found to be 3.4515 and 4.7402, respectively, both of which are lower than the MED and RMSE values of many other approximate adders, such as LOA, HEAA, M-HEAA, OLOCA, HOERAA, and HOAANED. Similarly, with M = 4, the MED and RMSE for NAA were found to be 1.6572 and 2.3470, respectively, which are lower than those of approximate adders like LOAWA, APPROX5, COREA, LDCA, and HPETA-II. This indicates that NAA with M = 4 or M = 5 outperforms several approximate adders in terms of error metrics, while remaining suitable for clustering even with M = 7, showcasing its advantage.
Additionally, Table 2 reveals that SAAR has significantly greater MED and RMSE values than all the other approximate adders, which is due to its unique architecture. As a result, the WCSS values obtained by using SAAR for clustering the datasets differ noticeably from those generated by the accurate adder, as follows: aggregation—11,427.23 with the accurate adder versus 13,204.01 with SAAR; diamond9—1015.24 versus 1034.50; DS850—413.13 versus 419.94; and engytime—12,002.04 versus 12,573.49. These results imply that SAAR may not be useful for k-means clustering. In the next section, we discuss the physical synthesis of the accurate and approximate adders.
4. Accurate and Approximate Adders–Synthesis
The accurate 16-bit adder, along with several approximate 16-bit adders, was described in Verilog HDL and synthesized using gates from a 28 nm bulk CMOS standard cell library [37] with Synopsys Design Compiler (DC). For the accurate adder, addition was described in a data-flow style using the arithmetic operator (+), and synthesis was performed using the ‘compile_ultra’ command of DC, resulting in a 16-bit ripple carry adder (RCA). It is well known that the RCA is low-power and occupies less area than other high-speed adders, which is why it was selected as the sample architecture for this work. However, other high-speed adders, such as a carry look-ahead adder, may also be used to implement the accurate adder and the precise parts of the approximate adders. To ensure consistency and a fair comparison with the accurate adder, the RCA topology was also employed to implement the precise parts of all the approximate adders in this work. Accordingly, the precise sections of the approximate adders were described in data-flow style using the addition operator, while the imprecise parts were described structurally. The default wire-load model was included by DC during synthesis to account for interconnect and parasitic effects. Additionally, the sum bits of all the adders were uniformly assigned a fanout-of-4 drive strength.
Following synthesis, the total area occupied by each adder was estimated using DC. Subsequently, the gate-level netlists of all the adders underwent functional verification through simulation using Synopsys VCS. To do this, approximately 1000 randomly generated input values were uniformly supplied to all the adders via a test bench, with a latency of 2 ns to account for the delay of the 16-bit RCA. The switching activity recorded during functional simulation was used to estimate the total power dissipation using Synopsys PrimePower. The critical path delay of each adder was determined using Synopsys PrimeTime. For the timing estimation, a virtual clock was employed to constrain the primary inputs and outputs of the adders; however, it was not part of the physical implementation. The standard design metrics, namely total area, critical path delay, and total power dissipation, estimated for the various adders are presented in Table 3. The split-up of total area into cell area and interconnect area, and of total power dissipation into dynamic power and static (leakage) power, is also given in Table 3. Dynamic power was calculated by summing the cell internal power and the net switching power, as reported by Synopsys PrimePower.
In Table 3, it can be observed that NAA occupies the smallest area among the approximate adders, mainly because NAA has a 9-bit precise part and a 7-bit imprecise part, while the rest of the approximate adders have a 12-bit/11-bit precise part and a corresponding 4-bit/5-bit imprecise part. Since NAA has a relatively larger imprecise part, its logic is reduced, and consequently its area occupancy is lower than that of the other approximate adders. Nonetheless, NAA achieves a clustering quality comparable to the accurate adder despite its large imprecise part, an advantage resulting from its unique architecture. Compared to the accurate adder (i.e., RCA), NAA utilizes 21.6% less area, despite its 9-bit precise section having been implemented as an RCA.
Concerning the critical path delay, NAA achieves a 37.1% reduction compared to the accurate adder. Overall, SAAR has the shortest delay, for the following reasons: a 16-bit SAAR is divided into two more significant 6-bit precise parts and a less significant 4-bit precise part, according to its architecture, as shown in [35]. The carry inputs for the 6-bit precise parts do not come from an accurate carry logic, so the critical path delay of the 16-bit SAAR is determined by the delay of a 6-bit precise adder, which was implemented as an RCA. On the other hand, NAA features a 9-bit precise adder, which explains why its delay is greater than SAAR's. Other approximate adders, such as LOA, HEAA, M-HEAA, OLOCA, HOERAA, HOAANED, HERLOA, and M-HERLOA, use an 11-bit precise adder realized as an RCA, resulting in delays greater than SAAR's. Similarly, LOAWA, APPROX5, LDCA, COREA, and DBAA implement a 12-bit precise section, also realized as an RCA, leading to delays greater than SAAR's. However, SAAR's clustering quality is inferior to that of the other approximate adders, as noted in Section 3; hence, SAAR is not of interest despite being faster than the other approximate adders.
In terms of power, NAA dissipates less dynamic and static power and thus has less total power dissipation than the accurate adder and other approximate adders. This is primarily attributed to the smaller area occupancy of NAA. When compared with the accurate adder, NAA achieves a 31% reduction in power dissipation while facilitating the same or similar clustering quality.
The power-delay product (PDP) is a crucial metric in digital logic design that combines the power dissipation and maximum propagation delay of a circuit. PDP provides insight into the trade-off between the speed performance and energy efficiency of a digital circuit or system. Lower PDP values indicate circuits that are faster and dissipate less power, which is essential for optimizing battery life in portable devices. In high-performance systems, reducing PDP can lead to more efficient processing while minimizing heat generation. Designers aim to balance power and speed to meet specific design goals, whether for low-power applications or high-speed computing. Hence, PDP helps in making informed decisions to optimize a circuit's performance. We calculated the PDP for all the adders listed in Table 3 and subsequently normalized these values by dividing the actual PDP of each adder by the highest PDP value (in this case, that of the RCA). The resulting normalized PDP values are shown in Figure 4a. A lower PDP is more desirable, as both power and delay should be minimized for optimum performance; thus, the smallest normalized PDP value is preferred, which corresponds to NAA, as seen in Figure 4a.
The area-delay product (ADP) is another performance metric used alongside the PDP. ADP assesses the trade-off between area and delay, providing a measure of how effectively a design utilizes its area to achieve a specific performance level. Designs with lower ADP values are considered more efficient, as they minimize area while maintaining a low delay. Accordingly, the ADP for all the adders presented in Table 3 was computed and normalized by dividing each adder's actual ADP by the highest ADP value, which corresponds to the accurate adder (RCA). The normalized ADP values for the various adders are shown in Figure 4b, with the optimal value, i.e., the lowest normalized ADP, corresponding to NAA. NAA has a lower PDP and ADP than many other approximate adders, achieving a 56.7% reduction in PDP and a 50.9% reduction in ADP compared to the accurate adder (RCA) while providing nil or negligible compromise in clustering quality.
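The normalization described above amounts to a simple calculation, sketched below; the numbers in the usage example are placeholders for illustration, not the values of Table 3:

```python
def normalized_metrics(adders):
    """adders: {name: (power, delay, area)}.  Returns normalized PDP and
    ADP dictionaries, each divided by its largest value so the most
    efficient design scores lowest."""
    pdp = {n: p * d for n, (p, d, a) in adders.items()}
    adp = {n: a * d for n, (p, d, a) in adders.items()}
    top_pdp, top_adp = max(pdp.values()), max(adp.values())
    return ({n: v / top_pdp for n, v in pdp.items()},
            {n: v / top_adp for n, v in adp.items()})
```

With placeholder figures such as `{"RCA": (10.0, 2.0, 100.0), "NAA": (5.0, 1.0, 50.0)}`, the reference design normalizes to 1.0 and the more efficient design to a fraction of it, mirroring how Figure 4a,b are read.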