MDPI - Publisher of Open Access Journals

23 pages, 2769 KB

Open Access Article

Exploring CBC Data for Anemia Diagnosis: A Machine Learning and Ontology Perspective

by Amira S. Awaad, Yomna M. Elbarawy, H. Mancy and Naglaa E. Ghannam

BioMedInformatics 2025, 5(3), 35; https://doi.org/10.3390/biomedinformatics5030035 - 2 Jul 2025

Viewed by 1055

Background: Anemia, a common health disorder affecting populations globally, demands timely and accurate diagnosis for treatment to be effective. The aim of this paper is to detect and classify four types of anemia: hgb, iron-deficiency, folate-deficiency, and B12-deficiency anemia. Methods: This paper proposes an ontology-enhanced machine learning (ML) framework to classify types of anemia from CBC data obtained from Kaggle, which contains 15,300 patient records. It evaluates the effects of classical versus deep classifiers on imbalanced and oversampled training samples. The classifiers tested are KNN, SVM, DT, RF, CNN, CNN+SVM, CNN+RF, and XGBoost. A further contribution is the use of ontological reasoning via SPARQL queries to semantically enrich clinical features with categories like “Low Hemoglobin” or “Macrocytic MCV”. These semantic features were then used in both classical (SVM) and deep hybrid models (CNN+SVM). Results: Ontology-enhanced and CNN hybrid models perform competitively when paired with ROS or ADASYN, but their performance degrades significantly on the original dataset. Ontology-enhanced models showed large performance gains: Onto-CNN+SVM achieved an F1-score of 1.00 for all four types of anemia under ROS sampling, while Onto-SVM exhibited more than 20% improvement in F1-scores for minority categories like folate and B12 when compared to baseline models, except XGBoost. Conclusions: Ontology-driven knowledge coalescence has been shown to improve classification results; however, XGBoost consistently outperformed all other classifiers across all data conditions, making it the most robust and reliable model for clinically relevant decision-support systems in anemia diagnosis.
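
The abstract above compares classifiers on imbalanced and oversampled training data. As a rough illustration only, and assuming that ROS denotes random oversampling, the sketch below duplicates minority-class rows at random until every class matches the largest one. The function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    Hypothetical helper for illustration; the paper's actual pipeline is not
    described in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out
```

Oversampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.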

34 pages, 1197 KB

Open Access Article

PVkNN: A Publicly Verifiable and Privacy-Preserving Exact kNN Query Scheme for Cloud-Based Location Services

by Jingyi Li, Yuqi Song, Chengliang Tian and Weizhong Tian

Modelling 2025, 6(2), 44; https://doi.org/10.3390/modelling6020044 - 3 Jun 2025

Viewed by 666

Abstract

The k-nearest-neighbor (kNN) algorithm is crucial in data mining and machine learning, yet its deployment on large-scale datasets within cloud environments presents significant security and efficiency challenges. This paper is dedicated to advancing the resolution of these challenges and presents novel contributions to the development of efficient and secure exact kNN query schemes tailored for spatial datasets in cloud-based location services. Addressing existing limitations, our approach focuses on accelerating query processing while ensuring robust privacy preservation and public verifiability. Key contributions include the establishment of a formal framework underpinned by stringent security definitions, providing a solid groundwork for future advancements. Leveraging Paillier’s homomorphic cryptosystem and public-key signature techniques, our design achieves heightened security by safeguarding databases, query access patterns, and result integrity while enabling public verification. Additionally, our scheme enhances computational efficiency through optimized data-packing techniques and simplified Voronoi diagram-based ciphertext index construction, leading to substantial savings in computational and communication overheads. Rigorous and transparent theoretical analysis substantiates the correctness, security, and efficiency of our design, while comprehensive experimental evaluations confirm the effectiveness of our approach, showcasing its practical applicability and scalability across datasets of varying scales.
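
The scheme above builds on Paillier's homomorphic cryptosystem. The sketch below is a toy version of textbook Paillier with the common generator choice g = n + 1, shown only to illustrate the additive homomorphism (multiplying ciphertexts adds plaintexts). It is not the PVkNN protocol, the function names are hypothetical, and the tiny primes are insecure and chosen purely for readability.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m, r=None):
    """Encrypt m as c = (n + 1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    if r is None:
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)
```

With `n, priv = keygen(17, 19)`, decrypting `add_encrypted(n, encrypt(n, 5), encrypt(n, 7))` recovers the sum of the two plaintexts without either one being decrypted individually, which is the property that lets a server combine encrypted values.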

27 pages, 9653 KB

Open Access Article

DNS over HTTPS Tunneling Detection System Based on Selected Features via Ant Colony Optimization

by Hardi Sabah Talabani, Zrar Khalid Abdul and Hardi Mohammed Mohammed Saleh

Future Internet 2025, 17(5), 211; https://doi.org/10.3390/fi17050211 - 7 May 2025

Viewed by 1171

Abstract

DNS over HTTPS (DoH) is an advanced version of the traditional DNS protocol that prevents eavesdropping and man-in-the-middle attacks by encrypting queries and responses. However, it introduces new challenges such as encrypted traffic communication, masking malicious activity, tunneling attacks, and complicating intrusion detection system (IDS) packet inspection. In contrast, unencrypted packets in the traditional Non-DoH version remain vulnerable to eavesdropping, privacy breaches, and spoofing. To address these challenges, an optimized dual-path feature selection approach is designed to select the most efficient packet features for binary class (DoH-Normal, DoH-Malicious) and multiclass (Non-DoH, DoH-Normal, DoH-Malicious) classification. Ant Colony Optimization (ACO) is integrated with machine learning algorithms such as XGBoost, K-Nearest Neighbors (KNN), Random Forest (RF), and Convolutional Neural Networks (CNNs) using CIRA-CIC-DoHBrw-2020 as the benchmark dataset. Experimental results show that the proposed model selects the most effective features for both scenarios, achieving the highest detection performance and outperforming previous studies in IDS. The highest accuracy obtained for binary and multiclass classifications was 0.9999 and 0.9955, respectively. The optimized feature set contributed significantly to reducing computational costs and processing time across all utilized classifiers. The results provide a robust, fast, and accurate solution to challenges associated with encrypted DNS packets.

51 pages, 2432 KB

Open Access Article

A Hubness Information-Based k-Nearest Neighbor Approach for Multi-Label Learning

by Zeyu Teng, Shanshan Tang, Min Huang and Xingwei Wang

Mathematics 2025, 13(7), 1202; https://doi.org/10.3390/math13071202 - 5 Apr 2025

Viewed by 863

Abstract

Multi-label classification (MLC) plays a crucial role in various real-world scenarios. Prediction with nearest neighbors has achieved competitive performance in MLC. Hubness, a phenomenon in which a few points appear in the k-nearest neighbor (kNN) lists of many points in high-dimensional spaces, may significantly impact machine learning applications and has recently attracted extensive attention. However, it has not been adequately addressed in developing MLC algorithms. To address this issue, we propose a hubness-aware kNN-based MLC algorithm in this paper, named multi-label hubness information-based k-nearest neighbor (MLHiKNN). Specifically, we introduce a fuzzy measure of label relevance and employ a weighted kNN scheme. The hubness information is used to compute each training example’s membership in relevance and irrelevance to each label and calculate weights for the nearest neighbors of a query point. Then, MLHiKNN exploits high-order label correlations by training a logistic regression model for each label using the kNN voting results with respect to all possible labels. Experimental results on 28 benchmark datasets demonstrate that MLHiKNN is competitive among the compared methods, including nine well-established MLC algorithms and three commonly used hubness reduction techniques, in dealing with MLC problems.
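
Hubness, as described above, can be measured by counting how often each point appears in the kNN lists of the other points. The brute-force sketch below computes that count with Euclidean distance. It illustrates the phenomenon only and is not the MLHiKNN algorithm; the name `k_occurrence` is a hypothetical choice.

```python
import math


def k_occurrence(points, k):
    """Return, for each point, how many other points list it among their k nearest.

    Brute-force O(n^2 log n) illustration with Euclidean distance.
    """
    counts = [0] * len(points)
    for i, p in enumerate(points):
        # Rank every other point by its distance to p.
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts
```

Points with unusually large counts are the hubs. A hubness-aware method can use these counts to reweight neighbor votes, although the specific weighting used by MLHiKNN is not given in the abstract.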

19 pages, 6361 KB

Open Access Article

Investigating Intelligent Call Technology for Dispatching Telephones Towards System Integration

by Chunliang Tai, Yibo Sun, Shiming Sun, Zhixin Sun, Xing Chen, Yue Shi and Chao Liu

Electronics 2025, 14(1), 179; https://doi.org/10.3390/electronics14010179 - 4 Jan 2025

Viewed by 957

Abstract

The dispatching telephone functionality acts as a pivotal interconnection between the power grid dispatch business and telecommunications business, playing a vital role in ensuring the efficient conduct of grid dispatch activities. Nonetheless, the current power grid dispatch system and communication program-controlled exchange system are disjointed, leading to a cumbersome process for the dispatching telephone functionality that severely impacts grid dispatch efficiency. To tackle these challenges, in this paper, we introduce an innovative intelligent call technology designed to facilitate data interchange and information integration between the power grid dispatch system and the communication program-controlled exchange system. By leveraging the K-Nearest Neighbors (KNN) algorithm, the technology enables automated querying of operational information with heightened efficiency and precision, thereby optimizing the operations of the dispatching telephone functionality. Subsequently, a prototype software application is developed to conduct experimental testing of intelligent call technology. The findings demonstrate that the method proposed in this paper successfully reduces the time expenditure associated with the dispatching telephone functionality, enhancing the productivity of dispatchers in routine operations and emergency response, thus ensuring the secure and stable operation of the power grid.

(This article belongs to the Special Issue Intelligence-Empowered Modeling, Control, and Optimization of Complex Networked Systems)

18 pages, 68585 KB

Open Access Article

A Registration Method Based on Ordered Point Clouds for Key Components of Trains

by Kai Yang, Xiaopeng Deng, Zijian Bai, Yingying Wan, Liming Xie and Ni Zeng

Sensors 2024, 24(24), 8146; https://doi.org/10.3390/s24248146 - 20 Dec 2024

Viewed by 940

Abstract

Point cloud registration is pivotal across various applications, yet traditional methods rely on unordered point clouds, leading to significant challenges in terms of computational complexity and feature richness. These methods often use k-nearest neighbors (KNN) or neighborhood ball queries to access local neighborhood information, which is not only computationally intensive but also confines the analysis within the object’s boundary, making it difficult to determine if points are precisely on the boundary using local features alone. This indicates a lack of sufficient local feature richness. In this paper, we propose a novel registration strategy utilizing ordered point clouds, which are now obtainable through advanced depth cameras, 3D sensors, and structured light-based 3D reconstruction. Our approach (1) eliminates the need for computationally expensive KNN queries by leveraging the inherent ordering of points, significantly reducing processing time; (2) extracts local features by utilizing 2D coordinates, providing richer features compared to traditional methods, which are constrained by object boundaries; (3) compares feature similarity between two point clouds without keypoint extraction, enhancing efficiency and accuracy; and (4) integrates image feature-matching techniques, leveraging the coordinate correspondence between 2D images and 3D-ordered point clouds. Experiments on both synthetic and real-world datasets, including indoor and industrial environments, demonstrate that our algorithm achieves an optimal balance between registration accuracy and efficiency, with registration times consistently under one second.

(This article belongs to the Section Sensing and Imaging)

21 pages, 8333 KB

Open Access Article

Urban-Scale Acoustic Comfort Map: Fusion of Social Inputs, Noise Levels, and Citizen Comfort in Open GIS

by Farzaneh Zarei, Mazdak Nik-Bakht, Joonhee Lee and Farideh Zarei

Processes 2024, 12(12), 2864; https://doi.org/10.3390/pr12122864 - 13 Dec 2024

Cited by 1 | Viewed by 1398

Abstract

With advancements in the Internet of Things (IoT), diverse and high-resolution data sources, such as environmental sensors and user-generated inputs from mobile devices, have become available to model and estimate citizens’ acoustic comfort in urban environments. These IoT-enabled data sources offer scalable insights in real time into both objective parameters (e.g., noise levels and environmental conditions) and subjective perceptions (e.g., personal comfort and soundscape experiences), which were previously challenging to capture comprehensively by using traditional methods. Despite this, there remains a lack of a clear framework explicitly presenting the role of these diverse inputs in determining acoustic comfort. This paper contributes by (1) exploring the relationship between attributes governing the physical aspect of the built environment (sensory data) and the end-users’ characteristics/inputs/sensations (such as their acoustic comfort level) and how these attributes can correlate/connect; (2) developing a CityGML-based framework that leverages semantic 3D city models to integrate and represent both objective sensory data and subjective social inputs, enhancing data-driven decision making at the city level; and (3) introducing a novel approach to crowdsourcing citizen inputs to assess perceived acoustic comfort indicators, which inform predictive modeling efforts. Our solution is based on CityGML’s capacity to store and explain 3D city-related shapes with their semantic characteristics, which are essential for city-level operations such as spatial data mining and thematic queries. To do so, a crowdsourcing method was used, and 20 perceptive indicators were identified from the existing literature to evaluate people’s perceived acoustic attributes and types of sound sources and their relations to the perceived soundscape comfort. 
Three regression models, namely K-Nearest Neighbor (KNN), Support Vector Regression (SVR), and XGBoost, were trained on the collected data to predict acoustic comfort at bus stops in Montréal based on physical and psychological attributes of travellers. In the best-performing scenario, which incorporated psychological attributes and measured noise levels, the models achieved a normalized mean squared error (NMSE) as low as 0.0181, a mean absolute error (MAE) of 0.0890, and a root mean square error (RMSE) of 0.1349. These findings highlight the effectiveness of integrating subjective and objective data sources to accurately predict acoustic comfort in urban environments.

(This article belongs to the Special Issue Applications of Artificial Intelligence Technologies in Energy, Manufacturing and Automatic Control Processes)

21 pages, 701 KB

Open Access Article

FurMoLi: A Future Query Technique for Moving Objects Based on a Learned Index

by Jiwei Yang, Chong Zhang, Wen Tang, Bin Ge, Hongbin Huang and Shiyu Yang

Mathematics 2024, 12(13), 2032; https://doi.org/10.3390/math12132032 - 29 Jun 2024

Viewed by 988

Abstract

The future query of moving objects involves predicting their future positions based on their current locations and velocities to determine whether they will appear in a specified area. This technique is crucial for positioning and navigation, and its importance in our daily lives has become increasingly evident in recent years. Nonetheless, the growing volume of data renders traditional index structures for moving objects, such as the time-parameterized R-tree (TPR-tree), inefficient due to their substantial storage overhead and high query costs. Recent advancements in learned indexes have demonstrated a capacity to significantly reduce storage overhead and enhance query efficiency. However, most existing research primarily addresses static data, leaving a gap in the context of future queries for moving objects. We propose a novel future query technique for moving objects based on a learned index (FurMoLi for short). FurMoLi encompasses four key stages: firstly, a data partition through clustering based on velocity and position information; secondly, a dimensionality reduction mapping two-dimensional data to one dimension; thirdly, the construction of a learned index utilizing piecewise regression functions; and finally, the execution of a future range query and future KNN query leveraging the established learned index. The experimental results demonstrate that FurMoLi requires 4 orders of magnitude less storage overhead than TPR-tree and 5 orders of magnitude less than $B^{+}$-tree for moving objects ($B^{x}$-tree). Additionally, the future range query time is reduced to just 41.6% of that for TPR-tree and 34.7% of that for $B^{x}$-tree. For future KNN queries, FurMoLi’s query time is only 70.1% of that for TPR-tree and 47.4% of that for $B^{x}$-tree.
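
The future range and future KNN queries described above rest on extrapolating each object's position from its current location and velocity. The sketch below is a brute-force baseline under a constant-velocity assumption. It is not FurMoLi's learned index, and the names `MovingObject`, `future_range_query`, and `future_knn_query` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MovingObject:
    oid: str
    x: float   # position at reference time t0
    y: float
    vx: float  # velocity components, assumed constant
    vy: float
    t0: float


def predict(obj, t):
    """Linear extrapolation of the object's position to time t."""
    dt = t - obj.t0
    return obj.x + obj.vx * dt, obj.y + obj.vy * dt


def future_range_query(objects, rect, t):
    """IDs of objects predicted to lie inside rect = (xmin, ymin, xmax, ymax) at time t."""
    xmin, ymin, xmax, ymax = rect
    hits = []
    for obj in objects:
        px, py = predict(obj, t)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            hits.append(obj.oid)
    return hits


def future_knn_query(objects, q, k, t):
    """IDs of the k objects predicted to be closest to point q at time t."""
    ranked = sorted(objects, key=lambda o: math.dist(predict(o, t), q))
    return [o.oid for o in ranked[:k]]
```

Both functions scan every object, so they only define what a correct answer looks like. An index such as the one the abstract describes exists to avoid that full scan.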

18 pages, 14618 KB

Open Access Article

Novel Probabilistic Collision Detection for Manipulator Motion Planning Using HNSW

by Xiaofeng Zhang, Bo Tao, Du Jiang, Baojia Chen, Dalai Tang and Xin Liu

Machines 2024, 12(5), 321; https://doi.org/10.3390/machines12050321 - 7 May 2024

Cited by 1 | Viewed by 1535

Abstract

Collision detection is very important for robot motion planning. The existing accurate collision detection algorithms regard the evaluation of each node as a discrete event, ignoring the correlation between nodes, resulting in low efficiency. In this paper, we propose a novel approach that transforms collision detection into a binary classification problem. In particular, the proposed method searches for the k-nearest neighbors (KNN) of the new node and estimates its collision probability from the prior nodes. We use the hierarchical navigable small world (HNSW) method to query the nearest neighbor data and store the detected nodes to build the database incrementally. In addition, this research develops a KNN query technique tailored for linear data, incorporating threshold segmentation to facilitate collision detection along continuous paths. Moreover, it refines the distance function of the collision classifier to enhance the precision of probability estimations. Simulation results demonstrate the effectiveness of the proposed method.

(This article belongs to the Section Robotics, Mechatronics and Intelligent Machines)

26 pages, 21938 KB

Open Access Article

Navigating the Maps: Euclidean vs. Road Network Distances in Spatial Queries

by Pornrawee Tatit, Kiki Adhinugraha and David Taniar

Algorithms 2024, 17(1), 29; https://doi.org/10.3390/a17010029 - 10 Jan 2024

Cited by 9 | Viewed by 4496

Abstract

The use of spatial data in mobile applications has grown significantly, empowering users to explore locations, navigate unfamiliar areas, find transportation routes, employ geomarketing strategies, and model environmental factors. Spatial databases are pivotal in efficiently storing, retrieving, and manipulating spatial data to fulfill users’ needs. Two fundamental spatial query types, k-nearest neighbors (kNN) and range search, enable users to access specific points of interest (POIs) based on their location, measured by actual road distance. However, retrieving the nearest POIs using actual road distance can be computationally intensive due to the need to find the shortest distance. Using straight-line measurements could expedite the process but might compromise accuracy. Consequently, this study aims to evaluate the accuracy of the Euclidean distance method in POI retrieval by comparing it with the road network distance method. The primary focus is determining whether the trade-off between computational time and accuracy is justified, employing the Open Source Routing Machine (OSRM) for distance extraction. The assessment encompasses diverse scenarios and analyses factors influencing the accuracy of the Euclidean distance method. The methodology employs a quantitative approach, categorizing query points based on density and analyzing them using kNN and range query methods. Accuracy in the Euclidean distance method is evaluated against the road network distance method. The results demonstrate peak accuracy for kNN queries at $k = 1$, exceeding 85% across classes but declining as $k$ increases. Range queries show varied accuracy based on POI density, with higher-density classes exhibiting earlier accuracy increases. Notably, datasets with fewer POIs exhibit unexpectedly higher accuracy, providing valuable insights into spatial query processing.

(This article belongs to the Special Issue Recent Advances in Computational Intelligence for Path Planning)

21 pages, 2964 KB

Open Access Article

Graph-Indexed kNN Query Optimization on Road Network

by Wei Jiang, Guanyu Li, Mei Bai, Bo Ning, Xite Wang and Fangliang Wei

Electronics 2023, 12(21), 4536; https://doi.org/10.3390/electronics12214536 - 3 Nov 2023

Cited by 2 | Viewed by 1507

Abstract

The nearest neighbors query problem on road networks constitutes a crucial aspect of location-oriented services and has useful practical implications; e.g., it can locate the k-nearest hotels. However, researchers who study road networks still encounter obstacles due to the method’s inherent limitations with respect to object mobility. More popular methods employ indexes to store intermediate results to improve querying time efficiency, but these methods are often accompanied by high time costs. To balance the costs of time and space, a lightweight flow graph index is proposed to reduce the quantity of candidate nodes, and with this index the results of a kNN query can be efficiently obtained. Experiments on real road networks confirm the efficiency and accuracy of our optimized algorithm.

(This article belongs to the Special Issue Data Privacy in IoT Networks)

20 pages, 3841 KB

Open Access Article

High-Level K-Nearest Neighbors (HLKNN): A Supervised Machine Learning Model for Classification Analysis

by Elife Ozturk Kiyak, Bita Ghasemkhani and Derya Birant

Electronics 2023, 12(18), 3828; https://doi.org/10.3390/electronics12183828 - 10 Sep 2023

Cited by 37 | Viewed by 8488

Abstract

The k-nearest neighbors (KNN) algorithm has been widely used for classification analysis in machine learning. However, it suffers from noise samples that reduce its classification ability and therefore prediction accuracy. This article introduces the high-level k-nearest neighbors (HLKNN) method, a new technique for enhancing the k-nearest neighbors algorithm, which can effectively address the noise problem and contribute to improving the classification performance of KNN. Instead of only considering k neighbors of a given query instance, it also takes into account the neighbors of these neighbors. Experiments were conducted on 32 well-known datasets. The results showed that the proposed HLKNN method outperformed the standard KNN method with average accuracy values of 81.01% and 79.76%, respectively. In addition, the experiments demonstrated the superiority of HLKNN over previous KNN variants in terms of the accuracy metric in various datasets.
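
The idea of also consulting the neighbors of the query's neighbors can be sketched as follows. The abstract does not specify how the two levels are combined, so the pooling and plain majority vote below are assumptions made for illustration, and `two_level_knn_predict` is a hypothetical name rather than the authors' implementation.

```python
import math
from collections import Counter


def nearest(points, q, k, exclude=None):
    """Indices of the k points closest to q, optionally skipping one index."""
    idx = [i for i in range(len(points)) if i != exclude]
    idx.sort(key=lambda i: math.dist(points[i], q))
    return idx[:k]


def two_level_knn_predict(X, y, q, k):
    """Majority vote over the query's k neighbors plus each neighbor's own k neighbors."""
    first_level = nearest(X, q, k)
    pooled = list(first_level)
    for i in first_level:
        # Second level: neighbors of each first-level neighbor (excluding itself).
        pooled.extend(nearest(X, X[i], k, exclude=i))
    votes = Counter(y[i] for i in pooled)
    return votes.most_common(1)[0][0]
```

Pooling the second level enlarges the voting set from k to k + k^2 labels (with repeats), which is one plausible way for an isolated noisy neighbor to be outvoted.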

(This article belongs to the Special Issue Advances in Data Science: Methods, Systems, and Applications)

14 pages, 691 KB

Open Access Article

Efficient ϵ-Approximate k-Flexible Aggregate Nearest Neighbor Search for Arbitrary ϵ in Road Networks

by Hyuk-Yoon Kwon, Jaejun Yoo and Woong-Kee Loh

Electronics 2023, 12(17), 3622; https://doi.org/10.3390/electronics12173622 - 27 Aug 2023

Viewed by 1153

Abstract

Recently, complicated spatial search algorithms have emerged as spatial-information-based applications, such as location-based services (LBS), have become very diverse and frequent. The aggregate nearest neighbor (ANN) search is an extension of the existing nearest neighbor (NN) search; it finds the object $p^{*}$ that minimizes $G\{d(p^{*}, q_{i}),\ q_{i} \in Q\}$ from a set $Q$ of $M$ ($\geq 1$) query objects, where $G$ is an aggregate function and $d()$ is the distance between two objects. The flexible aggregate nearest neighbor (FANN) search is an extension of the ANN search by introducing flexibility factor $\phi$ ($0 < \phi \leq 1$); it finds the object $p^{*}$ that minimizes $G\{d(p^{*}, q_{i}),\ q_{i} \in Q_{\phi}\}$ from $Q_{\phi}$, a subset of $Q$ with $|Q_{\phi}| = \phi |Q|$. This paper proposes an efficient $\epsilon$-approximate $k$-FANN ($k \geq 1$) search algorithm for an arbitrary approximation ratio $\epsilon$ ($\geq 1$) in road networks. In general, $\epsilon$-approximate algorithms are expected to give an improved search performance at the cost of allowing an error ratio of up to the given $\epsilon$. Since the optimal value of $\epsilon$ varies greatly depending on applications and cases, the approximate algorithm for an arbitrary $\epsilon$ is essential. We prove that the error ratios of the approximate FANN objects returned by our algorithm do not exceed the given $\epsilon$. To the best of our knowledge, our algorithm is the first $\epsilon$-approximate $k$-FANN search algorithm in road networks for an arbitrary $\epsilon$. Through a series of experiments using various real-world road network datasets, we demonstrated that our approximate algorithm always outperformed the previous state-of-the-art exact algorithm and that the error ratios of the approximate FANN objects were significantly lower than the given $\epsilon$ value.
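
The FANN definition above can be made concrete with an exact brute-force search. The sketch below assumes Euclidean distance in place of road-network distance, rounds $\phi |Q|$ up to an integer, and uses an aggregate such as `sum` or `max`, for which the best subset $Q_{\phi}$ for a given candidate is simply its $\phi |Q|$ closest query points. It does not implement the paper's $\epsilon$-approximate algorithm, and the function name `fann` is hypothetical.

```python
import math


def fann(P, Q, phi, agg=sum):
    """Exact brute-force FANN: return (best object, aggregate cost).

    P   : candidate objects (coordinate tuples)
    Q   : query objects (coordinate tuples)
    phi : flexibility factor, 0 < phi <= 1
    agg : aggregate function G, e.g. sum or max
    """
    m = max(1, math.ceil(phi * len(Q)))  # size of the flexible subset Q_phi
    best = None
    for p in P:
        # For sum and max, the optimal Q_phi for p is its m nearest query points.
        dists = sorted(math.dist(p, q) for q in Q)
        cost = agg(dists[:m])
        if best is None or cost < best[1]:
            best = (p, cost)
    return best
```

For example, with `P = [(0, 0), (5, 5)]` and `Q = [(0, 1), (1, 0), (10, 10), (100, 100)]`, setting `phi=0.5` lets the search ignore the two distant query points and select `(0, 0)`, whereas `phi=1.0` (plain ANN) selects `(5, 5)`.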

(This article belongs to the Special Issue Application Research Using AI, IoT, HCI, and Big Data Technologies)

24 pages, 1122 KB

Open Access Editor’s Choice Article

Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level

by Karima Khettabi, Zineddine Kouahla, Brahim Farou, Hamid Seridi and Mohamed Amine Ferrag

Big Data Cogn. Comput. 2023, 7(2), 119; https://doi.org/10.3390/bdcc7020119 - 14 Jun 2023

Cited by 2 | Viewed by 2727

Abstract

Internet of Things (IoT) systems include many smart devices that continuously generate massive spatio-temporal data, which can be difficult to process. These continuous data streams need to be stored smartly so that query searches are efficient. In this work, we propose an efficient method, in the fog-cloud computing architecture, to index continuous and heterogeneous data streams in metric space. This method divides the fog layer into three levels: clustering, cluster processing, and indexing. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group the data from each stream into homogeneous clusters at the clustering fog level. Each cluster in the first data stream is stored in the cluster processing fog level and indexed directly in the indexing fog level in a Binary tree with Hyperplane (BH tree). The indexing of clusters in the subsequent data stream is determined by the coefficient of variation (CV) value of the union of the new cluster with the existing clusters in the cluster processing fog level. An analysis and comparison of our experimental results with other results in the literature demonstrated the effectiveness of the CV method in reducing energy consumption during BH tree construction, as well as reducing the search time and energy consumption during a k Nearest Neighbor (kNN) parallel query search.

21 pages, 4296 KB

Open Access Article

k-NN Query Optimization for High-Dimensional Index Using Machine Learning

by Dojin Choi, Jiwon Wee, Sangho Song, Hyeonbyeong Lee, Jongtae Lim, Kyoungsoo Bok and Jaesoo Yoo

Electronics 2023, 12(11), 2375; https://doi.org/10.3390/electronics12112375 - 24 May 2023

Cited by 4 | Viewed by 2495

Abstract

In this study, we propose three k-nearest neighbor (k-NN) optimization techniques for a distributed, in-memory-based, high-dimensional indexing method to speed up content-based image retrieval. The proposed techniques perform distributed, in-memory, high-dimensional indexing-based k-NN query optimization: a density-based optimization technique that performs k-NN optimization using data distribution; a cost-based optimization technique using query processing cost statistics; and a learning-based optimization technique using a deep learning model, based on query logs. The proposed techniques were implemented on Spark, which supports a master/slave model for large-scale distributed processing. We showed the superiority and validity of the proposed techniques through various performance evaluations, based on high-dimensional data.

(This article belongs to the Special Issue Application Research Using AI, IoT, HCI, and Big Data Technologies)

Search Results (44)
