Topic Editors

Computer Science Department, College of Engineering, Effat University, Jeddah, Saudi Arabia
Faculty of Theoretical and Applied Economics, The Bucharest University of Economic Studies, Romana Square, No. 6, 010374 Bucharest, Romania

Big Data and Artificial Intelligence

Abstract submission deadline
closed (30 September 2022)
Manuscript submission deadline
closed (31 December 2022)
Viewed by
247,727

Topic Information

Dear Colleagues,

The evolution of research in Big Data and artificial intelligence in recent years challenges almost all domains of human activity. The potential of artificial intelligence to act as a catalyst for existing business models, and the capacity of Big Data research to provide sophisticated data and services ecosystems on a global scale, create a challenging context for scientific contributions and applied research. This Topic promotes scientific dialogue on the added value of novel methodological approaches and research in the specified areas. Our interest covers the entire end-to-end spectrum of Big Data and artificial intelligence research, from the social sciences to computer science, including strategic frameworks, models, and best practices, as well as sophisticated research related to radical innovation. The topics include, but are not limited to, the following indicative list:

  • Enabling Technologies for Big Data and AI research:
    • Data warehouses;
    • Business intelligence;
    • Machine learning;
    • Neural networks;
    • Natural language processing;
    • Image processing;
    • Bot technology;
    • AI agents;
    • Analytics and dashboards;
    • Distributed computing;
    • Edge computing;
  • Methodologies, frameworks, and models for artificial intelligence and Big Data research:
    • Towards sustainable development goals;
    • As responses to social problems and challenges;
    • For innovations in business, research, academia, industry, and technology;
    • For theoretical foundations and contributions to the body of knowledge of AI and Big Data research;
  • Best practices and use cases;
  • Outcomes of R&D projects;
  • Advanced data science analytics;
  • Industry-government collaboration;
  • Systems of information systems;
  • Interoperability issues;
  • Security and privacy issues;
  • Ethics in Big Data and AI;
  • Social impact of AI;
  • Open data.

Prof. Dr. Miltiadis D. Lytras
Prof. Dr. Andreea Claudia Serban
Topic Editors

Keywords

  • artificial intelligence
  • big data
  • machine learning
  • open data
  • decision making

Participating Journals

Journal Name | Abbreviation | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Big Data and Cognitive Computing | BDCC | 3.7 | 4.9 | 2017 | 18.2 days | CHF 1800
Future Internet | futureinternet | 3.4 | 6.7 | 2009 | 11.8 days | CHF 1600
Information | information | 3.1 | 5.8 | 2010 | 18 days | CHF 1600
Remote Sensing | remotesensing | 5.0 | 7.9 | 2009 | 23 days | CHF 2700
Sustainability | sustainability | 3.9 | 5.8 | 2009 | 18.8 days | CHF 2400

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of the following benefits by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (87 papers)

13 pages, 2086 KiB  
Communication
Skillful Seasonal Prediction of Typhoon Track Density Using Deep Learning
by Zhihao Feng, Shuo Lv, Yuan Sun, Xiangbo Feng, Panmao Zhai, Yanluan Lin, Yixuan Shen and Wei Zhong
Remote Sens. 2023, 15(7), 1797; https://doi.org/10.3390/rs15071797 - 28 Mar 2023
Cited by 3 | Viewed by 1585
Abstract
Tropical cyclones (TCs) seriously threaten the safety of human life and property especially when approaching a coast or making landfall. Robust, long-lead predictions are valuable for managing policy responses. However, despite decades of efforts, seasonal prediction of TCs remains a challenge. Here, we introduce a deep-learning prediction model to make skillful seasonal prediction of TC track density in the Western North Pacific (WNP) during the typhoon season, with a lead time of up to four months. To overcome the limited availability of observational data, we use TC tracks from CMIP5 and CMIP6 climate models as the training data, followed by a transfer-learning method to train a fully convolutional neural network named SeaUnet. Through the deep-learning process (i.e., heat map analysis), SeaUnet identifies physically based precursors. We show that SeaUnet has a good performance for typhoon distribution, outperforming state-of-the-art dynamic systems. The success of SeaUnet indicates its potential for operational use. Full article
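A minimal sketch of the transfer-learning step described above, assuming a generic PyTorch fully convolutional network; the real SeaUnet architecture and data pipeline are not reproduced here, and the encoder/decoder parameter naming is a placeholder.

```python
import torch
import torch.nn as nn

def fine_tune(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    """Freeze the climate-model-pretrained encoder; fine-tune the decoder
    on the short observational record (the transfer-learning step)."""
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("decoder")  # placeholder naming
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    loss_fn = nn.MSELoss()  # track density is a continuous gridded field
    for _ in range(epochs):
        for x, y in loader:  # x: predictor fields, y: observed track density
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```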

24 pages, 7711 KiB  
Article
Analysis of the Numerical Solutions of the Elder Problem Using Big Data and Machine Learning
by Roman Khotyachuk and Klaus Johannsen
Big Data Cogn. Comput. 2023, 7(1), 52; https://doi.org/10.3390/bdcc7010052 - 20 Mar 2023
Viewed by 1584
Abstract
In this study, the numerical solutions to the Elder problem are analyzed using Big Data technologies and data-driven approaches. The steady-state solutions to the Elder problem are investigated with regard to Rayleigh numbers (Ra), grid sizes, perturbations, and other parameters of the system studied. The complexity analysis is carried out for the datasets containing different solutions to the Elder problem, and the time of the highest complexity of numerical solutions is estimated. An approach to the identification of transient fingers and the visualization of large ensembles of solutions is proposed. Predictive models are developed to forecast steady states based on early-time observations. These models are classified into three possible types depending on the features (predictors) used in a model. The numerical results of the prediction accuracy are given, including the estimated confidence intervals for the accuracy, and the estimated time of 95% predictability. Different solutions, their averages, principal components, and other parameters are visualized. Full article
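As a rough illustration of the ensemble analysis described above, a principal-component projection of many flattened steady-state fields can serve as a low-dimensional map of the solution space; the data shapes below are toy assumptions, not the paper's grids.

```python
import numpy as np
from sklearn.decomposition import PCA

ensemble = np.random.rand(200, 64 * 32)   # 200 flattened steady-state fields (toy)
pca = PCA(n_components=3).fit(ensemble)
coords = pca.transform(ensemble)          # low-dimensional map of the ensemble
print(pca.explained_variance_ratio_, coords.shape)
```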

15 pages, 2721 KiB  
Article
Multi-Layered Projected Entangled Pair States for Image Classification
by Lei Li and Hong Lai
Sustainability 2023, 15(6), 5120; https://doi.org/10.3390/su15065120 - 14 Mar 2023
Viewed by 1368
Abstract
Tensor networks have been recognized as a powerful numerical tool; they are applied in various fields, including physics, computer science, and more. The idea of a tensor network originates from quantum physics as an efficient representation of quantum many-body states and their operations. Matrix product states (MPS) form one of the simplest tensor networks and have been applied to machine learning for image classification. However, MPS has certain limitations when processing two-dimensional images, so it is preferable to introduce a projected entangled pair states (PEPS) tensor network, whose structure is similar to that of the image, into machine learning. PEPS tensor networks are significantly superior to other tensor networks on the image classification task. Based on a PEPS tensor network, this paper constructs a multi-layered PEPS (MLPEPS) tensor network model for image classification. PEPS is used to extract features layer by layer from the image mapped to the Hilbert space, which fully utilizes the correlation between pixels while retaining the global structural information of the image. When performing classification tasks on the Fashion-MNIST dataset, MLPEPS achieves a classification accuracy of 90.44%, exceeding tensor network models such as the original PEPS. On the COVID-19 radiography dataset, MLPEPS has a test set accuracy of 91.63%, which is very close to the results of GoogLeNet. Under the same experimental conditions, the learning ability of MLPEPS is already close to that of existing neural networks while having fewer parameters. MLPEPS can be used to build different network models by modifying the structure, and as such it has great potential in machine learning. Full article
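For intuition, a one-dimensional matrix product state contraction (the simpler cousin of PEPS mentioned above) can be sketched in a few lines; the bond dimension, pixel map, and random tensors are toy assumptions, and a real 2-D PEPS contraction is substantially more involved.

```python
import numpy as np

def local_map(p):                       # pixel in [0,1] -> Hilbert-space 2-vector
    return np.array([np.cos(np.pi * p / 2), np.sin(np.pi * p / 2)])

n, chi = 16, 4                          # sites, bond dimension (toy values)
rng = np.random.default_rng(0)
mps = [rng.normal(scale=0.5, size=(chi, 2, chi)) for _ in range(n)]

pixels = rng.random(n)                  # a toy 1-D "image"
v = np.ones(chi)
for A, p in zip(mps, pixels):
    v = np.einsum("i,ijk,j->k", v, A, local_map(p))  # contract site by site
print(v[:2])                            # components would feed a classifier head
```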

23 pages, 9522 KiB  
Article
Oriented Object Detection in Aerial Images Based on the Scaled Smooth L1 Loss Function
by Linhai Wei, Chen Zheng and Yijun Hu
Remote Sens. 2023, 15(5), 1350; https://doi.org/10.3390/rs15051350 - 28 Feb 2023
Cited by 1 | Viewed by 2659
Abstract
Although many state-of-the-art object detectors have been developed, detecting small and densely packed objects with complicated orientations in remote sensing aerial images remains challenging. For object detection in remote sensing aerial images, different scales, sizes, appearances, and orientations of objects from different categories could most likely enlarge the variance in the detection error. Undoubtedly, the variance in the detection error should have a non-negligible impact on the detection performance. Motivated by the above consideration, in this paper, we tackled this issue, so that we could improve the detection performance and reduce the impact of this variance on the detection performance as much as possible. By proposing a scaled smooth L1 loss function, we developed a new two-stage object detector for remote sensing aerial images, named Faster R-CNN-NeXt with RoI-Transformer. The proposed scaled smooth L1 loss function is used for bounding box regression and makes regression invariant to scale. This property ensures that the bounding box regression is more reliable in detecting small and densely packed objects with complicated orientations and backgrounds, leading to improved detection performance. To learn rotated bounding boxes and produce more accurate object locations, a RoI-Transformer module is employed. This is necessary because horizontal bounding boxes are inadequate for aerial image detection. The ResNeXt backbone is also adopted for the proposed object detector. Experimental results on two popular datasets, DOTA and HRSC2016, show that the variance in the detection error significantly affects detection performance. The proposed object detector is effective and robust, with the optimal scale factor for the scaled smooth L1 loss function being around 2.0. Compared to other promising two-stage oriented methods, our method achieves a mAP of 70.82 on DOTA, with an improvement of at least 1.26 and up to 16.49. On HRSC2016, our method achieves an mAP of 87.1, with an improvement of at least 0.9 and up to 1.4. Full article
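A hedged sketch of the core idea: a smooth L1 loss whose regression error is normalized by a per-box scale so small and large objects contribute comparably. The paper's exact formulation may differ; the factor of 2.0 merely echoes its reported optimum.

```python
import torch

def smooth_l1(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    ax = x.abs()
    return torch.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def scaled_smooth_l1(pred, target, scale, factor: float = 2.0):
    # Normalize the regression error by a per-box scale so small and large
    # objects contribute comparably; the exact form in the paper may differ.
    err = (pred - target) / (factor * scale.unsqueeze(-1))
    return smooth_l1(err).sum(dim=-1).mean()

pred, target = torch.randn(8, 4), torch.randn(8, 4)   # toy box regressions
scale = torch.rand(8) + 0.5                           # per-box size proxy
print(scaled_smooth_l1(pred, target, scale))
```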

19 pages, 611 KiB  
Article
How and When Does Big Data Analytics Capability Boost Innovation Performance?
by Hua Zhang and Shaofeng Yuan
Sustainability 2023, 15(5), 4036; https://doi.org/10.3390/su15054036 - 22 Feb 2023
Cited by 3 | Viewed by 2657
Abstract
The diffusion of big data in recent years has stimulated many companies to develop big data analytics capability (BDAC) to boost innovation performance. However, research regarding how and when BDAC can increase innovation performance is still scant. This study aims to test how (i.e., the mediating role of strategic flexibility and strategic innovation) and when (i.e., the moderating role of environmental uncertainty) BDAC can boost a firm’s innovation performance drawing on resource-based theory. Through a survey of 421 Chinese managers and employees who are engaged in the field of big data analytics, this study reveals that (1) BDAC has a positive effect on innovation performance, (2) strategic flexibility and strategic innovation play a significant serial mediating role in this relationship, and (3) the positive effect of BDAC on innovation performance is more significant under high (vs. low) environmental uncertainty conditions. This study contributes to the extant literature by verifying how BDAC can increase a firm’s innovation performance through the serial mediating role of strategic flexibility and strategic innovation. It also confirms a contingent factor (i.e., environmental uncertainty) regarding the positive effect of BDAC on innovation performance. Full article
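For readers unfamiliar with serial mediation, the chain BDAC -> strategic flexibility -> strategic innovation -> performance can be approximated with ordinary regressions on synthetic data; real studies estimate the indirect effect with bootstrapping, which this toy sketch omits.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
bdac = rng.normal(size=421)                      # survey-scale scores (toy)
flex = 0.5 * bdac + rng.normal(size=421)
innov = 0.4 * flex + rng.normal(size=421)
perf = 0.6 * innov + 0.1 * bdac + rng.normal(size=421)

a1 = sm.OLS(flex, sm.add_constant(bdac)).fit().params[1]
a2 = sm.OLS(innov, sm.add_constant(np.c_[flex, bdac])).fit().params[1]
b = sm.OLS(perf, sm.add_constant(np.c_[innov, flex, bdac])).fit().params[1]
print("serial indirect effect:", a1 * a2 * b)    # BDAC -> flex -> innov -> perf
```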

21 pages, 7958 KiB  
Article
A Multi-Level Distributed Computing Approach to XDraw Viewshed Analysis Using Apache Spark
by Junduo Dong and Jianbo Zhang
Remote Sens. 2023, 15(3), 761; https://doi.org/10.3390/rs15030761 - 28 Jan 2023
Viewed by 1390
Abstract
Viewshed analysis is a terrain visibility computation method based on the digital elevation model (DEM). With the rapid growth of remote sensing and data collection technologies, the volume of large-scale raster DEM data has reached a great size (ZB). However, data storage and GIS analysis based on such a large-scale digital data volume become extremely difficult. The usual distributed approaches based on Apache Hadoop and Spark can efficiently handle the viewshed analysis computation of large-scale DEM data, but there are still bottleneck and precision problems. In this article, we present a multi-level distributed XDraw (ML-XDraw) algorithm with Apache Spark to handle the viewshed analysis of large DEM data. The ML-XDraw algorithm mainly consists of 3 parts: (1) designing the XDraw algorithm into a multi-level distributed computing process, (2) introducing a multi-level data decomposition strategy to solve the calculating bottleneck problem of the cluster’s executor, and (3) proposing a boundary approximate calculation strategy to solve the precision loss problem in calculation near the boundary. Experiments show that the ML-XDraw algorithm adequately addresses the above problems and achieves better speed-up and accuracy as the volume of raster DEM data increases drastically. Full article
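A minimal PySpark sketch of the distribution pattern only: DEM tiles are parallelized and a per-tile visibility pass runs on each partition. The multi-level decomposition and boundary approximation that define ML-XDraw are not reproduced; tile_viewshed is a placeholder.

```python
from pyspark.sql import SparkSession
import numpy as np

def tile_viewshed(tile):
    tile_id, dem = tile
    # Placeholder pass: flag cells rising above the tile mean; the real XDraw
    # ray sweep and cross-tile boundary handling are far more involved.
    return tile_id, int((dem > dem.mean()).sum())

spark = SparkSession.builder.appName("xdraw-viewshed").getOrCreate()
tiles = [(i, np.random.rand(512, 512)) for i in range(64)]  # toy DEM tiles
counts = spark.sparkContext.parallelize(tiles, 16).map(tile_viewshed).collect()
print(counts[:4])
```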

20 pages, 3764 KiB  
Article
A Real-Time Computer Vision Based Approach to Detection and Classification of Traffic Incidents
by Mohammed Imran Basheer Ahmed, Rim Zaghdoud, Mohammed Salih Ahmed, Razan Sendi, Sarah Alsharif, Jomana Alabdulkarim, Bashayr Adnan Albin Saad, Reema Alsabt, Atta Rahman and Gomathi Krishnasamy
Big Data Cogn. Comput. 2023, 7(1), 22; https://doi.org/10.3390/bdcc7010022 - 28 Jan 2023
Cited by 24 | Viewed by 7129
Abstract
To constructively ameliorate and enhance traffic safety measures in Saudi Arabia, a prolific number of AI (Artificial Intelligence) traffic surveillance technologies have emerged, including Saher, throughout the past years. However, rapidly detecting a vehicle incident can play a cardinal role in ameliorating the response speed of incident management, which in turn minimizes road injuries that have been induced by the accident’s occurrence. To attain a permeating effect in increasing the entailed demand for road traffic security and safety, this paper presents a real-time traffic incident detection and alert system that is based on a computer vision approach. The proposed framework consists of three models, each of which is integrated within a prototype interface to fully visualize the system’s overall architecture. To begin, the vehicle detection and tracking model utilized the YOLOv5 object detector with the DeepSORT tracker to detect and track the vehicles’ movements by allocating a unique identification number (ID) to each vehicle. This model attained a mean average precision (mAP) of 99.2%. Second, a traffic accident and severity classification model attained a mAP of 83.3% while utilizing the YOLOv5 algorithm to accurately detect and classify an accident’s severity level, sending an immediate alert message to the nearest hospital if a severe accident has taken place. Finally, the ResNet152 algorithm was utilized to detect the ignition of a fire following the accident’s occurrence; this model achieved an accuracy rate of 98.9%, with an automated alert being sent to the fire station if this perilous event occurred. This study employed an innovative parallel computing technique for reducing the overall complexity and inference time of the AI-based system to run the proposed system in a concurrent and parallel manner. Full article
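A hedged sketch of the detection-plus-tracking front end using publicly available stand-ins (torch.hub YOLOv5 and the deep_sort_realtime package); the file name and thresholds are illustrative, not the authors' configuration.

```python
import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # pretrained detector
tracker = DeepSort(max_age=30)                           # ID assignment

cap = cv2.VideoCapture("traffic.mp4")                    # hypothetical input clip
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    det = model(rgb).xyxy[0]  # rows: x1, y1, x2, y2, conf, cls
    dets = [([x1, y1, x2 - x1, y2 - y1], conf, int(cls))
            for x1, y1, x2, y2, conf, cls in det.tolist()]
    for trk in tracker.update_tracks(dets, frame=frame):
        if trk.is_confirmed():
            print(trk.track_id, trk.to_ltrb())  # persistent vehicle ID + box
cap.release()
```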

18 pages, 3402 KiB  
Article
X-Wines: A Wine Dataset for Recommender Systems and Machine Learning
by Rogério Xavier de Azambuja, A. Jorge Morais and Vítor Filipe
Big Data Cogn. Comput. 2023, 7(1), 20; https://doi.org/10.3390/bdcc7010020 - 22 Jan 2023
Cited by 3 | Viewed by 6014
Abstract
In the current technological scenario of artificial intelligence growth, especially using machine learning, large datasets are necessary. Recommender systems appear with increasing frequency with different techniques for information filtering. Few large wine datasets are available for use with wine recommender systems. This work presents X-Wines, a new and consistent wine dataset containing 100,000 instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. They refer to the scale 1–5 ratings carried out over a period of 10 years (2012–2021) for wines produced in 62 different countries. A demonstration of some applications using X-Wines in the scope of recommender systems with deep learning algorithms is also presented. Full article
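A minimal matrix-factorization baseline one might train on X-Wines; the CSV file name and column names (UserID, WineID, Rating) are assumptions about the released data, not confirmed field names.

```python
import pandas as pd
import torch
import torch.nn as nn

df = pd.read_csv("XWines_ratings.csv")  # hypothetical file/column names
users = df["UserID"].astype("category").cat.codes.to_numpy()
wines = df["WineID"].astype("category").cat.codes.to_numpy()
r = torch.tensor(df["Rating"].to_numpy(), dtype=torch.float32)

n_u, n_w, k = int(users.max()) + 1, int(wines.max()) + 1, 32
P, Q = nn.Embedding(n_u, k), nn.Embedding(n_w, k)  # user/item latent factors
opt = torch.optim.Adam(list(P.parameters()) + list(Q.parameters()), lr=0.01)
u = torch.tensor(users, dtype=torch.long)
w = torch.tensor(wines, dtype=torch.long)
for _ in range(20):
    opt.zero_grad()
    pred = (P(u) * Q(w)).sum(dim=1)     # dot product ~ predicted rating
    nn.functional.mse_loss(pred, r).backward()
    opt.step()
```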

12 pages, 1751 KiB  
Article
The Evolution of Artificial Intelligence in the Digital Economy: An Application of the Potential Dirichlet Allocation Model
by Chunyi Shan, Jun Wang and Yongming Zhu
Sustainability 2023, 15(2), 1360; https://doi.org/10.3390/su15021360 - 11 Jan 2023
Cited by 3 | Viewed by 2265
Abstract
The most critical driver of the digital economy comes from breakthroughs in cutting-edge technologies such as artificial intelligence. In order to promote technological innovation and layout in the field of artificial intelligence, this paper analyzes the patent text of artificial intelligence technology using the LDA topic model from the perspective of the patent technology subject based on Derwent patent data. The results reveal that AI technology is upgraded from chips, sensing, and algorithms to innovative platforms and intelligent applications. Proposed countermeasures are necessary to advance the digitalization of the global economy and to achieve economic globalization in terms of industrial integration, building ecological systems, and strengthening independent innovation. Full article
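The LDA step is straightforward to reproduce in outline; the four-document corpus below merely stands in for the Derwent patent texts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

patents = [  # toy corpus standing in for Derwent patent abstracts
    "neural network accelerator chip for edge inference",
    "convolutional neural network image recognition method",
    "lidar sensor fusion module for autonomous driving",
    "vehicle driving assistance sensing apparatus",
]
vec = CountVectorizer(max_features=5000, stop_words="english")
dtm = vec.fit_transform(patents)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-5:][::-1]]  # 5 strongest terms
    print(f"topic {k}: {top}")
```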

21 pages, 2942 KiB  
Article
Prediction of Pork Supply Based on Improved Mayfly Optimization Algorithm and BP Neural Network
by Ji-Quan Wang, Hong-Yu Zhang, Hao-Hao Song, Pan-Li Zhang and Jin-Ling Bei
Sustainability 2022, 14(24), 16559; https://doi.org/10.3390/su142416559 - 09 Dec 2022
Cited by 3 | Viewed by 1237
Abstract
Focusing on the issues of slow convergence speed and the ease of falling into a local optimum when optimizing the weights and thresholds of a back-propagation artificial neural network (BPANN) by the gradient method, a prediction method for pork supply based on an improved mayfly optimization algorithm (MOA) and BPANN is proposed. Firstly, in order to improve the performance of MOA, an improved mayfly optimization algorithm with an adaptive visibility coefficient (AVC-IMOA) is introduced. Secondly, AVC-IMOA is used to optimize the weights and thresholds of a BPANN (AVC-IMOA_BP). Thirdly, the trained BPANN and the statistical data are adopted to predict the pork supply in Heilongjiang Province from 2000 to 2020. Finally, to demonstrate the effectiveness of the proposed method for predicting pork supply, the pork supply in Heilongjiang Province was predicted by using AVC-IMOA_BP, a BPANN based on the gradient descent method and a BPANN based on a mixed-strategy whale optimization algorithm (MSWOA_BP), a BPANN based on an artificial bee colony algorithm (ABC_BP) and a BPANN based on a firefly algorithm and sparrow search algorithm (FASSA_BP) in the literature. The results show that the prediction accuracy of the proposed method based on AVC-IMOA and a BPANN is obviously better than those of MSWOA_BP, ABC_BP and FASSA_BP, thus verifying the superior performance of AVC-IMOA_BP. Full article
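A hedged sketch of the general idea of replacing gradient descent with a population metaheuristic for BP-network weights; a simple elitist evolution loop stands in for the adaptive-visibility mayfly algorithm, whose update rules are specific to the paper.

```python
import numpy as np

def mlp_forward(w, X, n_in, n_hid):
    """Evaluate a 1-hidden-layer network from a flat weight vector."""
    W1 = w[: n_in * n_hid].reshape(n_in, n_hid)
    b1 = w[n_in * n_hid : n_in * n_hid + n_hid]
    W2 = w[n_in * n_hid + n_hid : -1].reshape(n_hid, 1)
    return np.tanh(X @ W1 + b1) @ W2 + w[-1]

def fitness(w, X, y, n_in=4, n_hid=8):
    return np.mean((mlp_forward(w, X, n_in, n_hid).ravel() - y) ** 2)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 4)), rng.normal(size=50)   # toy supply data
dim = 4 * 8 + 8 + 8 + 1
pop = rng.normal(size=(30, dim))
for _ in range(200):
    scores = np.array([fitness(w, X, y) for w in pop])
    elite = pop[scores.argsort()[:10]]                  # keep the 10 best
    pop = np.vstack([elite, elite.repeat(2, 0) + rng.normal(0, 0.1, (20, dim))])
best = pop[np.argmin([fitness(w, X, y) for w in pop])]
```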

15 pages, 2284 KiB  
Article
Competency-Based E-Learning Systems: Automated Integration of User Competency Portfolio
by Asta Margienė, Simona Ramanauskaitė, Justas Nugaras, Pavel Stefanovič and Antanas Čenys
Sustainability 2022, 14(24), 16544; https://doi.org/10.3390/su142416544 - 09 Dec 2022
Cited by 1 | Viewed by 1221
Abstract
In today’s learning environment, e-learning systems are becoming a necessity. A competency-based student portfolio system is also gaining popularity. Due to the variety of e-learning systems and the increasing mobility of students between different learning institutions or e-learning systems, a higher level of automated competency portfolio integration is required. Increasing mobility and complexity makes manual mapping of student competencies unsustainable. The purpose of this paper is to automate the mapping of e-learning system competencies with student-gained competencies from other systems. Natural language processing, text similarity estimation, and fuzzy logic applications were used to implement the automated mapping process. Multiple cases have been tested to determine the effectiveness of the proposed solution. The solution has been shown to be able to accurately predict the coverage of system course competency by students’ course competency with an accuracy of approximately 77%. As it is not possible to achieve 100% mapping accuracy, the competency mapping should be executed semi-automatically by applying the proposed solution to obtain the initial mapping, and then manually revising the results as necessary. When compared to a fully manual mapping of competencies, it reduces workload and increases resource sustainability. Full article
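A toy sketch of the automated mapping: competency descriptions are embedded, pairwise similarity is scored, and a crude fuzzy membership turns scores into coverage labels. TF-IDF cosine similarity stands in for the paper's full NLP pipeline, and the thresholds are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

system = ["design relational database schemas", "apply supervised machine learning"]
student = ["built ER models and normalized SQL schemas", "trained classifiers in Python"]

vec = TfidfVectorizer().fit(system + student)
sim = cosine_similarity(vec.transform(system), vec.transform(student))

def coverage(s):  # crude fuzzy membership: none / partial / full
    return "full" if s > 0.6 else "partial" if s > 0.3 else "none"

for i, row in enumerate(sim):
    j = row.argmax()  # best-matching student competency
    print(system[i], "->", student[j], coverage(row[j]))
```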

19 pages, 369 KiB  
Article
Nexus between Renewable Energy, Credit Gap Risk, Financial Development and R&D Expenditure: Panel ARDL Approach
by Ulaş Ünlü, Furkan Yıldırım, Ayhan Kuloğlu, Ersan Ersoy and Emin Hüseyin Çetenak
Sustainability 2022, 14(23), 16232; https://doi.org/10.3390/su142316232 - 05 Dec 2022
Cited by 4 | Viewed by 1468
Abstract
In the study, we investigate the relationships between renewable energy consumption sub-indicators of G-8 countries and financial development, credit gap risk, and R&D expenditure from 1996 to 2018. The relationships among the variables in the study are analyzed by employing the Panel ARDL method and the Dumitrescu–Hurlin panel causality test. The cointegration relationships between the variables have been analyzed using the bounds test approach, and an unrestricted error correction model has been established. Contrary to previous studies in the renewable energy literature, this study employed the variable of credit gap risk. Therefore, we believe that this study will fill the gap in the literature and attract the attention of researchers and policymakers. The results indicate that increases in total demand for renewable energy positively affect the financial development of countries. Moreover, R&D expenditures increase as the demand for hydro energy and solar energy increases. This result indicates that wind power consumption has a short-term impact on R&D expenditure, and such an impact ceases to exist in the long run. According to the empirical research findings, the rise in demand for renewable energy may be a factor mitigating the credit gap risk of countries. In other words, the credit gap risk, which is considered a leading indicator of systemic banking crises, can be mitigated by the rise in the demand for renewable energy. Full article
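As a rough illustration under strong assumptions, a mean-group flavor of panel ARDL can be sketched by fitting statsmodels' ARDL per country on synthetic series and averaging coefficients; the paper's bounds testing and error-correction form are not reproduced here.

```python
import numpy as np
from statsmodels.tsa.ardl import ARDL

rng = np.random.default_rng(0)
effects = []
for _ in range(8):                                  # 8 countries, toy annual data
    renew = np.cumsum(rng.normal(size=23))          # renewable energy demand
    findev = 0.3 * renew + rng.normal(size=23)      # financial development
    res = ARDL(findev, lags=1, exog=renew.reshape(-1, 1), order=1).fit()
    effects.append(res.params)                      # per-country ARDL estimates
print("mean-group coefficients:", np.mean(effects, axis=0))
```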
15 pages, 4071 KiB  
Case Report
PM2.5 Prediction Based on the CEEMDAN Algorithm and a Machine Learning Hybrid Model
by Wenchao Ban and Liangduo Shen
Sustainability 2022, 14(23), 16128; https://doi.org/10.3390/su142316128 - 02 Dec 2022
Cited by 10 | Viewed by 1661
Abstract
The current serious air pollution problem has become a closely investigated topic in people’s daily lives. If we want to provide a reasonable basis for haze prevention, then the prediction of PM2.5 concentrations becomes a crucial task. However, it is difficult to complete the task of PM2.5 concentration prediction using a single model; therefore, to address this problem, this paper proposes a fully adaptive noise ensemble empirical modal decomposition (CEEMDAN) algorithm combined with deep learning hybrid models. Firstly, the CEEMDAN algorithm was used to decompose the PM2.5 timeseries data into different modal components. Then long short-term memory (LSTM), a backpropagation (BP) neural network, a differential integrated moving average autoregressive model (ARIMA), and a support vector machine (SVM) were applied to each modal component. Lastly, the best prediction results of each component were superimposed and summed to obtain the final prediction results. The PM2.5 data of Hangzhou in recent years were substituted into the model for testing, which was compared with eight models, namely, LSTM, ARIMA, BP, SVM, CEEMDAN–ARIMA, CEEMDAN–LSTM, CEEMDAN–SVM, and CEEMDAN–BP. The results show that for the coupled CEEMDAN–LSTM–BP–ARIMA model, the prediction ability was better than all the other models, and the timeseries decomposition data of PM2.5 had their own characteristics. The data with different characteristics were predicted separately using appropriate models and the final combined model results obtained were the most satisfactory. Full article
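A minimal decompose-then-forecast sketch, assuming the PyEMD package for CEEMDAN; one ARIMA per mode stands in for the paper's per-mode choice among LSTM, BP, SVM, and ARIMA.

```python
import numpy as np
from PyEMD import CEEMDAN
from statsmodels.tsa.arima.model import ARIMA

# Toy stand-in for an hourly PM2.5 series.
pm25 = np.sin(np.linspace(0, 20, 300)) + np.random.default_rng(0).normal(0, 0.2, 300)

imfs = CEEMDAN()(pm25)                      # modes + residue, shape (n_modes, T)
horizon, total = 24, np.zeros(24)
for mode in imfs:
    fit = ARIMA(mode, order=(2, 0, 1)).fit()
    total += fit.forecast(horizon)          # sum per-mode forecasts
print(total)
```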

18 pages, 822 KiB  
Article
Analysis and Prediction of the IPv6 Traffic over Campus Networks in Shanghai
by Zhiyang Sun, Hui Ruan, Yixin Cao, Yang Chen and Xin Wang
Future Internet 2022, 14(12), 353; https://doi.org/10.3390/fi14120353 - 27 Nov 2022
Cited by 1 | Viewed by 1835
Abstract
With the exhaustion of IPv4 addresses, research on the adoption, deployment, and prediction of IPv6 networks becomes more and more significant. This paper analyzes the IPv6 traffic of two campus networks in Shanghai, China. We first conduct a series of analyses for the traffic patterns and uncover weekday/weekend patterns, the self-similarity phenomenon, and the correlation between IPv6 and IPv4 traffic. On weekends, traffic usage is smaller than on weekdays, but the distribution does not change much. We find that the self-similarity of IPv4 traffic is close to that of IPv6 traffic, and there is a strong positive correlation between IPv6 traffic and IPv4 traffic. Based on our findings on traffic patterns, we propose a new IPv6 traffic prediction model by combining the advantages of the statistical and deep learning models. In addition, our model would extract useful information from the corresponding IPv4 traffic to enhance the prediction. Based on two real-world datasets, it is shown that the proposed model outperforms eight baselines with a lower prediction error. In conclusion, our approach is helpful for network resource allocation and network management. Full article
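A toy sketch of the cross-protocol idea: predict the next IPv6 value from lagged IPv6 values plus the correlated IPv4 series; a ridge regression on synthetic data stands in for the paper's statistical/deep hybrid.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
t = np.arange(1000)
v4 = 10 + np.sin(2 * np.pi * t / 168) + rng.normal(0, 0.1, 1000)  # weekly pattern
v6 = 0.5 * v4 + rng.normal(0, 0.1, 1000)                          # correlated series

L = 24  # hours of history per sample
X = np.array([np.r_[v6[i - L:i], v4[i - L:i]] for i in range(L, 1000)])
y = v6[L:1000]
model = Ridge().fit(X[:-100], y[:-100])
print("test MSE:", np.mean((model.predict(X[-100:]) - y[-100:]) ** 2))
```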

17 pages, 1925 KiB  
Article
An Effective Online Sequential Stochastic Configuration Algorithm for Neural Networks
by Yuting Chen and Ming Li
Sustainability 2022, 14(23), 15601; https://doi.org/10.3390/su142315601 - 23 Nov 2022
Viewed by 1154
Abstract
Random Vector Functional-link (RVFL) networks, as a class of random learner models, have received careful attention from the neural network research community due to their advantages in obtaining fast learning algorithms and models, in which the hidden layer parameters are randomly generated and remain fixed during the training phase. However, their universal approximation ability may not be guaranteed if the random parameters are not properly selected in an appropriate range. Moreover, the resulting random learner’s generalization performance may seriously deteriorate once the RVFL network’s structure is not well-designed. The stochastic configuration (SC) algorithm, which incrementally constructs a universal approximator by obtaining random hidden parameters under a specified supervisory mechanism, instead of fixing the selection scope in advance and without any reference to training information, can effectively circumvent these awkward issues caused by randomness. This paper extends the SC algorithm to an online sequential version, termed the OSSC algorithm, by means of the recursive least squares (RLS) technique, aiming to cope with modeling tasks where training observations are sequentially provided. Compared to the online sequential learning of RVFL networks (OS-RVFL for short), our proposed OSSC algorithm can avoid the awkward setting of an unreasonable range for the random parameters, and can also successfully build a random learner with preferable learning and generalization capabilities. The experimental study has shown the effectiveness and advantages of our OSSC algorithm. Full article
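The recursive least squares (RLS) update at the heart of such online sequential schemes is compact enough to sketch; this minimal version updates only the linear output weights as observations arrive and omits the stochastic configuration of new hidden nodes.

```python
import numpy as np

class RLSOutputLayer:
    def __init__(self, n_hidden, n_out, delta=1e3):
        self.P = delta * np.eye(n_hidden)      # inverse covariance estimate
        self.W = np.zeros((n_hidden, n_out))   # linear output weights

    def update(self, h, y):                    # h: (n_hidden,), y: (n_out,)
        Ph = self.P @ h
        k = Ph / (1.0 + h @ Ph)                # gain vector
        self.W += np.outer(k, y - h @ self.W)  # correct the prediction error
        self.P -= np.outer(k, Ph)              # rank-1 covariance downdate

# Usage: feed hidden-layer activations and targets one observation at a time.
layer = RLSOutputLayer(n_hidden=50, n_out=1)
```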

17 pages, 3101 KiB  
Article
Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
by Sumeth Yuenyong and Konlakorn Wongpatikaseree
Big Data Cogn. Comput. 2022, 6(4), 136; https://doi.org/10.3390/bdcc6040136 - 11 Nov 2022
Viewed by 1576
Abstract
Due to the ubiquitous nature of CCTV cameras that record continuously, there is a large amount of video data that are unstructured. Often, when these recordings have to be reviewed, it is to look for a specific person that fits a certain description. Currently, this is achieved by manual inspection of the videos, which is both time-consuming and labor-intensive. While person description search is not a new topic, in this work, we made two contributions. First, we improve upon the existing state-of-the-art by proposing unsupervised finetuning on the language model that forms a main part of the text branch of person description search models. This led to higher recall values on the standard dataset. The second contribution is that we engineered a complete pipeline from video files to fast searchable objects. Due to the use of an approximate nearest neighbor search and some model optimizations, a person description search can be performed such that the result is available immediately when deployed on a standard PC with no GPU, allowing an interactive search. We demonstrated the effectiveness of the system on new data and showed that most people in the videos can be successfully discovered by the search. Full article
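A minimal sketch of the retrieval back end: gallery embeddings are indexed once and a text-query embedding is matched by nearest neighbors. sklearn's exact index stands in for an approximate library such as FAISS, and all vectors below are random placeholders.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

gallery = np.random.rand(10000, 512).astype("float32")   # person-crop embeddings
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(gallery)

query = np.random.rand(1, 512).astype("float32")         # text-branch embedding
query /= np.linalg.norm(query)
dist, ids = index.kneighbors(query)                      # top-5 matching crops
print(ids[0], 1 - dist[0])                               # indices + similarities
```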

19 pages, 856 KiB  
Article
Unsupervised Cluster-Wise Hyperspectral Band Selection for Classification
by Mateus Habermann, Elcio Hideiti Shiguemori and Vincent Frémont
Remote Sens. 2022, 14(21), 5374; https://doi.org/10.3390/rs14215374 - 27 Oct 2022
Cited by 1 | Viewed by 1337
Abstract
A hyperspectral image provides fine details about the scene under analysis, due to its multiple bands. However, the resulting high dimensionality in the feature space may render a classification task unreliable, mainly due to overfitting and the Hughes phenomenon. In order to attenuate such problems, one can resort to dimensionality reduction (DR). Thus, this paper proposes a new DR algorithm, which performs an unsupervised band selection technique following a clustering approach. More specifically, the data set was split into a predefined number of clusters, after which the bands were iteratively selected based on the parameters of a separating hyperplane, which provided the best separation in the feature space, in a one-versus-all scenario. Then, a fine-tuning of the initially selected bands took place based on the separability of clusters. A comparison with five other state-of-the-art frameworks shows that the proposed method achieved the best classification results in 60% of the experiments. Full article
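A toy sketch of the hyperplane-driven idea: cluster the pixels, fit linear one-vs-rest separators, and rank bands by the magnitude of hyperplane weights; the iterative selection and fine-tuning of the paper are condensed into a single ranking step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

X = np.random.rand(2000, 120)                 # pixels x spectral bands (toy cube)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

svm = LinearSVC(dual=False, max_iter=5000).fit(X, labels)  # one-vs-rest hyperplanes
scores = np.abs(svm.coef_).sum(axis=0)        # aggregate band importance
selected = np.argsort(scores)[-30:]           # keep the 30 most separating bands
print(sorted(selected))
```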

22 pages, 5337 KiB  
Article
Auto-Learning Correlation-Filter-Based Target State Estimation for Real-Time UAV Tracking
by Ziyang Bian, Tingfa Xu, Junjie Chen, Liang Ma, Wenjing Cai and Jianan Li
Remote Sens. 2022, 14(21), 5299; https://doi.org/10.3390/rs14215299 - 23 Oct 2022
Viewed by 1573
Abstract
Most existing tracking methods based on discriminative correlation filters (DCFs) update the tracker every frame with a fixed learning rate. However, constantly adjusting the tracker can hardly handle the fickle target appearance in UAV tracking (e.g., undergoing partial occlusion, illumination variation, or deformation). To mitigate this, we propose a novel auto-learning correlation filter for UAV tracking, which fully exploits valuable information behind response maps for adaptive feedback updating. Concretely, we first introduce a principled target state estimation (TSE) criterion to reveal the confidence level of the tracking results. We then suggest an auto-learning strategy with the TSE metric to update the tracker with adaptive learning rates. Based on the target state estimation, we further develop an innovative lost-and-found strategy to recognize and handle temporary target loss. Finally, we incorporate the TSE regularization term into the DCF objective function, which can be solved efficiently by alternating optimization iterations without much computational cost. Extensive experiments on four widely used UAV benchmarks have demonstrated the superiority of the proposed method compared to both DCF and deep-based trackers. Notably, ALCF achieved state-of-the-art performance on several benchmarks while running over 50 FPS on a single CPU. Code will be released soon. Full article
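A hedged sketch of confidence-gated updating: the average peak-to-correlation energy (APCE) of the response map, a common confidence proxy, scales the learning rate. The paper's TSE criterion may differ, and the threshold below is arbitrary.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy; high for sharp, confident peaks."""
    rmin, rmax = response.min(), response.max()
    return (rmax - rmin) ** 2 / np.mean((response - rmin) ** 2)

def adaptive_lr(response, base_lr=0.02, threshold=20.0):
    return base_lr * min(1.0, apce(response) / threshold)  # low confidence -> slow update

resp = np.random.rand(64, 64) * 0.1
resp[32, 32] = 1.0                             # sharp peak = confident track
print(apce(resp), adaptive_lr(resp))
```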

17 pages, 935 KiB  
Article
Supporting Meteorologists in Data Analysis through Knowledge-Based Recommendations
by Thoralf Reis, Tim Funke, Sebastian Bruchhaus, Florian Freund, Marco X. Bornschlegl and Matthias L. Hemmje
Big Data Cogn. Comput. 2022, 6(4), 103; https://doi.org/10.3390/bdcc6040103 - 28 Sep 2022
Cited by 2 | Viewed by 1961
Abstract
Climate change means coping directly or indirectly with extreme weather conditions for everybody. Therefore, analyzing meteorological data to create precise models is gaining more importance and might become inevitable. Meteorologists have extensive domain knowledge about meteorological data yet lack practical data analysis skills. This paper presents a method to bridge this gap by empowering the data knowledge carriers to analyze the data. The proposed system utilizes symbolic AI, a knowledge base created by experts, and a recommendation expert system to offer suiting data analysis methods or data pre-processing to meteorologists. This paper systematically analyzes the target user group of meteorologists and practical use cases to arrive at a conceptual and technical system design implemented in the CAMeRI prototype. The concepts in this paper are aligned with the AI2VIS4BigData Reference Model and comprise a novel first-order logic knowledge base that represents analysis methods and related pre-processings. The prototype implementation was qualitatively and quantitatively evaluated. This evaluation included recommendation validation for real-world data, a cognitive walkthrough, and measuring computation timings of the different system components. Full article
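A toy sketch of knowledge-based recommendation: dataset facts are matched against expert rules to suggest analysis steps; a rule table of Python predicates stands in for the paper's first-order logic knowledge base.

```python
dataset_facts = {"type": "time_series", "missing_values": True, "size": "large"}

rules = [  # (condition over facts, recommended action)
    (lambda f: f["missing_values"], "apply gap interpolation before analysis"),
    (lambda f: f["type"] == "time_series", "consider seasonal decomposition"),
    (lambda f: f["size"] == "large", "use incremental/streaming variants"),
]

recommendations = [advice for cond, advice in rules if cond(dataset_facts)]
print(recommendations)
```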

12 pages, 1256 KiB  
Article
Image Retrieval Algorithm Based on Locality-Sensitive Hash Using Convolutional Neural Network and Attention Mechanism
by Youmeng Luo, Wei Li, Xiaoyu Ma and Kaiqiang Zhang
Information 2022, 13(10), 446; https://doi.org/10.3390/info13100446 - 24 Sep 2022
Cited by 5 | Viewed by 2553
Abstract
With the continuous progress of image retrieval technology, the speed of searching for a desired image in a great deal of image data has become a hot issue in the field of image retrieval. Convolutional Neural Networks (CNN) have been used in the field of image retrieval. However, many image retrieval systems based on CNN have a poor ability to express image features, resulting in a series of problems such as low retrieval accuracy and robustness. When the target image is retrieved from a large amount of image data, the vector dimension after image coding is high and the retrieval efficiency is low. Locality-sensitive hashing is a method to find similar data in massive high-dimensional data. It reduces the data dimension of the original spatial data through hash coding and conversion while maintaining the similarity between the data, so the retrieval time and space complexity are low. Therefore, this paper proposes a locality-sensitive hash image retrieval method based on CNN and the attention mechanism. The steps of the method are as follows: the ResNet50 network is used as the feature extractor of the image, with the attention module added after the convolution layer of the model, and the output of the network’s fully connected layer is used to extract the features of the image database; the locality-sensitive hash algorithm is then used to hash-code the image features of the database to reduce the dimension and establish the index; finally, the features of the image to be retrieved are measured against the image database to obtain the most similar images, completing the content-based image retrieval task. The method in this paper is compared with other image retrieval methods on the corel1k and corel5k datasets. The experimental results show that this method can effectively improve the accuracy of image retrieval, and the retrieval efficiency is significantly improved. It also has higher robustness in different scenarios. Full article
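The random-hyperplane hash at the core of such systems is easy to sketch: deep features are binarized by the sign of random projections, and images sharing a code fall into the same bucket. The feature vectors below are random placeholders for ResNet50 outputs.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 2048))          # ResNet50-style image features
planes = rng.normal(size=(2048, 16))           # 16 random hyperplanes -> 16-bit code

def lsh_code(v):
    return tuple((v @ planes > 0).astype(int))

buckets = defaultdict(list)
for i, f in enumerate(feats):
    buckets[lsh_code(f)].append(i)             # build the index once

q = rng.normal(size=2048)                      # query image feature
candidates = buckets[lsh_code(q)]              # near neighbors share a code
print(len(candidates), "candidates to re-rank exactly")
```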

16 pages, 922 KiB  
Article
A Data-Driven Based Method for Pipeline Additional Stress Prediction Subject to Landslide Geohazards
by Meng Zhang, Jiatong Ling, Buyun Tang, Shaohua Dong and Laibin Zhang
Sustainability 2022, 14(19), 11999; https://doi.org/10.3390/su141911999 - 22 Sep 2022
Cited by 4 | Viewed by 1546
Abstract
Pipelines that cross complex geological terrains are inevitably threatened by natural hazards, among which landslide attracts extensive attention when pipelines cross mountainous areas. The landslides are typically associated with ground movements that would induce additional stress on the pipeline. Such stress state of pipelines under landslide interference seriously damage structural integrity of the pipeline. Up to the date, limited research has been done on the combined landslide hazard and pipeline stress state analysis. In this paper, a multi-parameter integrated monitoring system was developed for the pipeline stress-strain state and landslide deformation monitoring. Also, data-driven models for the pipeline additional stress prediction was established. The developed predictive models include individual and ensemble-based machine learning approaches. The implementation procedure of the predictive models integrates the field data measured by the monitoring system, with k-fold cross validation used for the generalization performance evaluation. The obtained results indicate that the XGBoost model has the highest performance in the prediction of the additional stress. Besides, the significance of the input variables is determined through sensitivity analyses by using feature importance criteria. Thus, the integrated monitoring system together with the XGBoost prediction method is beneficial to modeling the additional stress in oil and gas pipelines, which will further contribute to pipeline geohazards monitoring management. Full article
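A minimal sketch of the winning setup, gradient-boosted trees evaluated with k-fold cross-validation; the six input features and all data are synthetic placeholders for the monitoring variables.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import KFold

X = np.random.rand(500, 6)   # e.g., displacement, rainfall, soil moisture, ... (toy)
y = np.random.rand(500)      # measured additional pipeline stress (toy)

errors = []
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    model.fit(X[tr], y[tr])
    errors.append(np.mean((model.predict(X[te]) - y[te]) ** 2))
print("5-fold MSE:", np.mean(errors))
print("feature importances:", model.feature_importances_)
```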

13 pages, 13027 KiB  
Data Descriptor
A Worldwide Bibliometric Analysis of Publications on Artificial Intelligence and Ethics in the Past Seven Decades
by Chien-Wei Chuang, Ariana Chang, Mingchih Chen, Maria John P. Selvamani and Ben-Chang Shia
Sustainability 2022, 14(18), 11125; https://doi.org/10.3390/su141811125 - 06 Sep 2022
Cited by 1 | Viewed by 1942
Abstract
Issues related to artificial intelligence (AI) and ethics have gained much traction worldwide. The impact of AI on society has been extensively discussed. This study presents a bibliometric analysis of research results, citation relationships among researchers, and highly referenced journals on AI and ethics on a global scale. Papers published on AI and ethics were recovered from the Microsoft Academic Graph Collection data set, and the subject terms included “artificial intelligence” and “ethics.” With 66 nations’ researchers contributing to AI and ethics research, 1585 papers on AI and ethics were recovered, up to 5 July 2021. North America, Western Europe, and East Asia were the regions with the highest productivity. The top ten nations produced about 94.37% of the wide variety of papers. The United States accounted for 47.59% (286 articles) of all papers. Switzerland had the highest research production with a million-person ratio (1.39) when adjusted for populace size. It was followed by the Netherlands (1.26) and the United Kingdom (1.19). The most productive authors were found to be Khatib, O. (n = 10), Verner, I. (n = 9), Bekey, G. A. (n = 7), Gennert, M. A. (n = 7), and Chatila, R., (n = 7). Current research shows that research on artificial intelligence and ethics has evolved dramatically over the past 70 years. Moreover, the United States is more involved with AI and ethics research than developing or emerging countries. Full article

14 pages, 421 KiB  
Article
Hierarchical Co-Attention Selection Network for Interpretable Fake News Detection
by Xiaoyi Ge, Shuai Hao, Yuxiao Li, Bin Wei and Mingshu Zhang
Big Data Cogn. Comput. 2022, 6(3), 93; https://doi.org/10.3390/bdcc6030093 - 05 Sep 2022
Cited by 2 | Viewed by 3496
Abstract
Social media fake news has become a pervasive and problematic issue today with the development of the internet. Recent studies have utilized different artificial intelligence technologies to verify the truth of the news and provide explanations for the results, which have shown remarkable success in interpretable fake news detection. However, individuals’ judgments of news are usually hierarchical, prioritizing valuable words above essential sentences, which is neglected by existing fake news detection models. In this paper, we propose an interpretable novel neural network-based model, the hierarchical co-attention selection network (HCSN), to predict whether the source post is fake, as well as an explanation that emphasizes important comments and particular words. The key insight of the HCSN model is to incorporate the Gumbel–Max trick in the hierarchical co-attention selection mechanism that captures sentence-level and word-level information from the source post and comments following the sequence of words–sentences–words–event. In addition, HCSN enjoys the additional benefit of interpretability—it provides a conscious explanation of how it reaches certain results by selecting comments and highlighting words. According to the experiments conducted on real-world datasets, our model outperformed state-of-the-art methods and generated reasonable explanations. Full article
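The Gumbel-Max trick mentioned above is self-contained enough to show directly: adding Gumbel noise to attention logits and taking the argmax draws a discrete sample (an "important comment") from the attention distribution.

```python
import torch

def gumbel_max_select(logits: torch.Tensor) -> torch.Tensor:
    u = torch.rand_like(logits).clamp_min(1e-9)
    gumbel = -torch.log(-torch.log(u))            # Gumbel(0, 1) noise
    return torch.argmax(logits + gumbel, dim=-1)  # sampled discrete index

scores = torch.tensor([[2.0, 0.5, 1.0, 0.1]])  # attention logits over 4 comments
print(gumbel_max_select(scores))               # stochastic, but usually index 0
```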

15 pages, 299 KiB  
Article
Topical and Non-Topical Approaches to Measure Similarity between Arabic Questions
by Mohammad Daoud
Big Data Cogn. Comput. 2022, 6(3), 87; https://doi.org/10.3390/bdcc6030087 - 22 Aug 2022
Cited by 1 | Viewed by 2552
Abstract
Questions are crucial expressions in any language. Many Natural Language Processing (NLP) or Natural Language Understanding (NLU) applications, such as question-answering computer systems, automatic chatting apps (chatbots), digital virtual assistants, and opinion mining, can benefit from accurately identifying similar questions in an effective manner. We detail methods for identifying similarities between Arabic questions that have been posted online by Internet users and organizations. Our novel approach uses a non-topical rule-based methodology and topical information (textual similarity, lexical similarity, and semantic similarity) to determine if a pair of Arabic questions are similarly paraphrased. Our method counts the lexical and linguistic distances between each question. Additionally, it identifies questions in accordance with their format and scope using expert hypotheses (rules) that have been experimentally shown to be useful and practical. Even if there is a high degree of lexical similarity between a When question (Timex Factoid—inquiring about time) and a Who inquiry (Enamex Factoid—asking about a named entity), they will not be similar. In an experiment using 2200 question pairs, our method attained an accuracy of 0.85, which is remarkable given the simplicity of the solution and the fact that we did not employ any language models or word embedding. In order to cover common Arabic queries presented by Arabic Internet users, we gathered the questions from various online forums and resources. In this study, we describe a unique method for detecting question similarity that does not require intensive processing, a sizable linguistic corpus, or a costly semantic repository. Because there are not many rich Arabic textual resources, this is especially important for informal Arabic text processing on the Internet. Full article
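A toy sketch of the rule-plus-lexical combination: questions whose interrogative types differ are rejected outright, otherwise a lexical ratio decides. difflib and the tiny keyword table stand in for the paper's richer distance measures and rule set.

```python
from difflib import SequenceMatcher

TYPE_WORDS = {"متى": "time", "من": "person", "أين": "place", "كم": "quantity"}

def qtype(q):  # crude interrogative-type detector (toy keyword table)
    return next((t for w, t in TYPE_WORDS.items() if q.startswith(w)), "other")

def similar(q1, q2, threshold=0.7):
    if qtype(q1) != qtype(q2):           # non-topical rule: types must match
        return False
    return SequenceMatcher(None, q1, q2).ratio() >= threshold

# "When was Cairo University founded?" vs. "When was Cairo University established?"
print(similar("متى تأسست جامعة القاهرة؟", "متى أنشئت جامعة القاهرة؟"))
```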
17 pages, 1406 KiB  
Article
Machine-Learning-Based Gender Distribution Prediction from Anonymous News Comments: The Case of Korean News Portal
by Jong Hwan Suh
Sustainability 2022, 14(16), 9939; https://doi.org/10.3390/su14169939 - 11 Aug 2022
Cited by 2 | Viewed by 1368
Abstract
Anonymous news comment data from a news portal in South Korea, naver.com, can help conduct gender research and resolve related issues for sustainable societies. Nevertheless, only a small portion of gender information (i.e., gender distribution) is open to the public, and therefore, it has rarely been considered for gender research. Hence, this paper aims to resolve the matter of incomplete gender information and make the anonymous news comment data usable for gender research as new social media big data. This paper proposes a machine-learning-based approach for predicting the gender distribution (i.e., male and female rates) of anonymous news commenters for a news article. Initially, the big data of news articles and their anonymous news comments were collected and divided into labeled and unlabeled datasets (i.e., with and without gender information). The word2vec approach was employed to represent a news article by the characteristics of the news comments. Then, using the labeled dataset, various prediction techniques were evaluated for predicting the gender distribution of anonymous news commenters for a labeled news article. As a result, the neural network was selected as the best prediction technique, and it could accurately predict the gender distribution of anonymous news commenters of the labeled news article. Thus, this study showed that a machine-learning-based approach can overcome the incomplete gender information problem of anonymous social media users. Moreover, when the gender distributions of the unlabeled news articles were predicted using the best neural network model, trained with the labeled dataset, their distribution turned out different from the labeled news articles. The result indicates that using only the labeled dataset for gender research can result in misleading findings and distorted conclusions. The predicted gender distributions for the unlabeled news articles can help to better understand anonymous news commenters as humans for sustainable societies. Eventually, this study provides a new way for data-driven computational social science with incomplete and anonymous social media big data. Full article
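A minimal sketch of the pipeline, assuming gensim's Word2Vec and a small regressor: each article is represented by the average vector of its comment tokens, and the male rate is regressed on that vector. All data below are toy placeholders.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPRegressor

comments_per_article = [["great", "article"], ["totally", "disagree"]]  # toy tokens
male_rate = np.array([0.62, 0.48])  # known gender distribution (labeled articles)

w2v = Word2Vec(sentences=comments_per_article, vector_size=50, min_count=1)

def article_vec(tokens):
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

X = np.vstack([article_vec(t) for t in comments_per_article])
reg = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, male_rate)
print(reg.predict(X))  # predicted male rate; female rate = 1 - male rate
```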
(This article belongs to the Topic Big Data and Artificial Intelligence)
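A compact sketch of the pipeline the abstract describes: represent each article by aggregating word2vec vectors of its comments, then regress the commenters' male rate with a small neural network. The toy data and hyperparameters below are placeholders, not the paper's settings:

import numpy as np
from gensim.models import Word2Vec
from sklearn.neural_network import MLPRegressor

# Toy corpus: each article is a list of tokenized comments (placeholder data).
articles = [
    [["great", "game", "today"], ["amazing", "match"]],
    [["terrible", "policy"], ["bad", "decision", "again"]],
]
male_rates = [0.7, 0.4]  # labeled gender distribution per article (toy values)

# Train word2vec on all comments, then average the vectors per article.
all_comments = [c for art in articles for c in art]
w2v = Word2Vec(all_comments, vector_size=32, min_count=1, epochs=50)

def article_vector(article):
    vecs = [w2v.wv[w] for c in article for w in c if w in w2v.wv]
    return np.mean(vecs, axis=0)

X = np.stack([article_vector(a) for a in articles])
reg = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
reg.fit(X, male_rates)       # female rate is 1 - male rate
print(reg.predict(X))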
21 pages, 2206 KiB  
Article
Using Explainable Artificial Intelligence to Identify Key Characteristics of Deep Poverty for Each Household
by Wenguang Zhang, Ting Lei, Yu Gong, Jun Zhang and Yirong Wu
Sustainability 2022, 14(16), 9872; https://doi.org/10.3390/su14169872 - 10 Aug 2022
Viewed by 1667
Abstract
The first task in eradicating poverty is accurate poverty identification. Identifying deep poverty is conducive to investing resources to help deeply poor populations achieve prosperity, one of the most challenging tasks in poverty eradication. This study constructs a deep poverty identification model utilizing explainable artificial intelligence (XAI) to identify deeply poor households, based on data from 23,307 poor households in rural areas of China. For comparison, a logistic-regression-based model and an income-based model were developed as well. We found that our XAI-based model achieves higher identification performance, in terms of the area under the ROC curve, than both the logistic-regression-based model and the income-based model. For each rural household, the odds of being identified as deeply poor are obtained. Additionally, multidimensional household characteristics associated with deep poverty are specified and ranked for each poor household, whereas ordinary feature ranking methods can only provide ranking results for poor households as a whole. Taking all poor households into consideration, we found that common important characteristics for identifying deeply poor households include household income, disability, village attributes, lack of funds, labor force, disease, and number of household members, which are validated by mutual information analysis. In conclusion, our XAI-based model can be used to identify deep poverty and to specify key household characteristics associated with deep poverty for individual households, facilitating the development of new targeted poverty reduction strategies. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
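The abstract does not name the XAI technique, but per-sample feature attributions of exactly this kind are commonly obtained with SHAP on a tree ensemble. A hedged sketch of that pattern, with synthetic data and hypothetical feature names standing in for the household survey variables:

import numpy as np
import shap  # pip install shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["income", "disability", "lack_of_funds", "labor_force", "disease"]
X = rng.normal(size=(500, len(feature_names)))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) < 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer yields one attribution per feature per household, so the
# ranking below is specific to each individual household, unlike a global
# feature-importance list.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

household = 3
order = np.argsort(-np.abs(shap_values[household]))
print([feature_names[i] for i in order])  # per-household characteristic ranking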
20 pages, 905 KiB  
Article
Efficient Supervised Image Clustering Based on Density Division and Graph Neural Networks
by Qingchao Zhao, Long Li, Yan Chu, Zhen Yang, Zhengkui Wang and Wen Shan
Remote Sens. 2022, 14(15), 3768; https://doi.org/10.3390/rs14153768 - 05 Aug 2022
Cited by 1 | Viewed by 1815
Abstract
In recent research, supervised image clustering based on Graph Neural Network (GNN) connectivity prediction has demonstrated considerable improvements over traditional clustering algorithms. However, existing supervised image clustering algorithms are usually time-consuming, which limits their applications. In order to infer the connectivity between image instances, they usually create a subgraph for each image instance. Because a large number of subgraphs must be created and processed as GNN input, the computational overhead is enormous. To address this high computational overhead in GNN connectivity prediction, we present a time-efficient and effective GNN-based supervised clustering framework based on density division, named DDC-GNN. DDC-GNN divides all image instances into high-density and low-density parts, and only performs GNN subgraph connectivity prediction on the low-density parts, resulting in a significant reduction in redundant calculations. We test two typical models in the GNN connectivity prediction module of the DDC-GNN framework: a graph convolutional network (GCN)-based model and a graph auto-encoder (GAE)-based model. Meanwhile, adaptive subgraphs are generated instead of fixed-size subgraphs to ensure sufficient contextual information extraction for the low-density parts. According to experiments on different datasets, DDC-GNN achieves higher accuracy and is almost five times faster than equivalent models without the density division strategy. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
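A rough sketch of the density-division idea, assuming density is estimated from k-nearest-neighbor distances in feature space (one plausible choice; the abstract does not state the paper's exact criterion). High-density instances are linked directly to their nearest neighbor, and only low-density instances would be routed to the GNN connectivity predictor:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))  # embeddings of image instances (toy)

k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
dists, idx = nn.kneighbors(features)   # first column is the point itself

# Density proxy: inverse of the mean distance to the k nearest neighbors.
density = 1.0 / dists[:, 1:].mean(axis=1)
threshold = np.median(density)

high = np.where(density >= threshold)[0]
low = np.where(density < threshold)[0]

# High-density instances: connect straight to the nearest neighbor (cheap).
direct_edges = [(i, idx[i, 1]) for i in high]

# Low-density instances: only these would need GNN subgraph connectivity
# prediction in DDC-GNN, which is where the computation savings come from.
print(f"{len(direct_edges)} direct links, {len(low)} instances left for the GNN")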
16 pages, 4220 KiB  
Article
A Study on the Optimal Flexible Job-Shop Scheduling with Sequence-Dependent Setup Time Based on a Hybrid Algorithm of Improved Quantum Cat Swarm Optimization
by Haicao Song and Pan Liu
Sustainability 2022, 14(15), 9547; https://doi.org/10.3390/su14159547 - 03 Aug 2022
Cited by 4 | Viewed by 1665
Abstract
Multi-item and small-lot-size production modes lead to frequent setup, which involves significant setup times and has a substantial impact on productivity. In this study, we investigated the optimal flexible job-shop scheduling problem with sequence-dependent setup times. We built a mathematical model with the objective of minimizing the maximum completion time (makespan). Considering that the process sequence is influenced by setup time, processing time, and machine load limitations, processing machinery is first chosen based on machine load and processing time, and processing tasks are then scheduled based on setup time and processing time. An improved quantum cat swarm optimization (QCSO) algorithm is proposed to solve the problem: a quantum coding method is introduced, the quantum bit (Q-bit) representation is combined with cat swarm optimization (CSO), and the cats' positions are iteratively updated via the quantum rotation angle; the dynamic mixture ratio (MR) value is then selected according to the number of algorithm iterations. This approach broadens the explored search space and increases operational efficiency and speed. Finally, the improved QCSO algorithm and a parallel genetic algorithm (PGA) are compared through simulation experiments. The results show that the improved QCSO algorithm achieves better results and improved robustness. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
19 pages, 5895 KiB  
Article
LGB-PHY: An Evaporation Duct Height Prediction Model Based on Physically Constrained LightGBM Algorithm
by Xingyu Chai, Jincai Li, Jun Zhao, Wuxin Wang and Xiaofeng Zhao
Remote Sens. 2022, 14(14), 3448; https://doi.org/10.3390/rs14143448 - 18 Jul 2022
Cited by 7 | Viewed by 1834
Abstract
The evaporation duct is a special atmospheric stratification that significantly influences the propagation path of electromagnetic waves at sea, and it is hence crucial for the stability of radio communication systems. Because they depend on physical parameters that are not universal, traditional evaporation duct theoretical models often have limited accuracy and poor generalization ability; the remote sensing method, for example, is limited by its inversion algorithm. The accuracy, generalization ability, and scientific interpretability of existing purely data-driven evaporation duct height prediction models also still need to be improved. To address these issues, in this paper we use voyage observation data and propose a physically constrained LightGBM evaporation duct height prediction model (LGB-PHY). The proposed model integrates the Babin–Young–Carton (BYC) physical model into a custom loss function. Compared with an eXtreme Gradient Boosting (XGB) model on a 5-day voyage data set from the South China Sea, LGB-PHY provides a significant improvement: the RMSE index is reduced by 68%, while the SCC index is improved by 6.5%. We further carried out a cross-comparison experiment on regional generalization and show that in high-latitude sea areas, where the BYC model adapts strongly, the LGB-PHY model has stronger regional generalization performance than the XGB model. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
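The key mechanism here is a physical constraint folded into the LightGBM loss. A minimal sketch, assuming (hypothetically) a squared-error loss augmented with a quadratic penalty pulling predictions toward the BYC model's output h_byc; the paper's actual loss construction may differ:

import numpy as np
from lightgbm import LGBMRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                # meteorological inputs (toy)
y = 3.0 * X[:, 0] + rng.normal(size=1000)     # observed duct height (toy)
h_byc = y + rng.normal(scale=0.5, size=1000)  # BYC physical model output (toy);
                                              # must be row-aligned with X
lam = 0.3                                     # physics-penalty weight (assumed)

def physically_constrained_obj(y_true, y_pred):
    """Gradient and Hessian of (y_pred - y)^2 + lam * (y_pred - h_byc)^2."""
    grad = 2.0 * (y_pred - y_true) + 2.0 * lam * (y_pred - h_byc)
    hess = np.full_like(y_pred, 2.0 + 2.0 * lam)
    return grad, hess

model = LGBMRegressor(objective=physically_constrained_obj, n_estimators=200)
model.fit(X, y)
print(model.predict(X[:5]))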
25 pages, 10446 KiB  
Article
Object Localization in Weakly Labeled Remote Sensing Images Based on Deep Convolutional Features
by Yang Long, Xiaofang Zhai, Qiao Wan and Xiaowei Tan
Remote Sens. 2022, 14(13), 3230; https://doi.org/10.3390/rs14133230 - 05 Jul 2022
Cited by 1 | Viewed by 2127
Abstract
Object recognition, as one of the most fundamental and challenging problems in high-resolution remote sensing image interpretation, has received increasing attention in recent years. However, most conventional object recognition pipelines aim to recognize instances with bounding boxes in a supervised learning strategy, which requires intensive manual labor for instance annotation. In this paper, we propose a weakly supervised learning method to alleviate this problem. The core idea of our method is to recognize multiple objects in an image using only image-level semantic labels and to indicate the recognized objects with location points instead of box extents. Specifically, a deep convolutional neural network is first trained to perform semantic scene classification, the result of which is employed for the categorical determination of objects in an image. Then, by back-propagating the categorical feature from the fully connected layer to the deep convolutional layer, the categorical and spatial information of an image are combined to obtain an object discriminative localization map, which can effectively indicate the salient regions of objects. Next, a dynamic updating method of local response extrema is proposed to further determine the locations of objects in an image. Finally, extensive experiments are conducted to localize aircraft and oil tanks in remote sensing images using different convolutional neural networks. Experimental results show that the proposed method outperforms state-of-the-art methods, achieving precision, recall, and F1-scores of 94.50%, 88.79%, and 91.56% for aircraft localization and 89.12%, 83.04%, and 85.97% for oil tank localization, respectively. We hope that our work can serve as a basic reference for remote sensing object localization via a weakly supervised strategy and provide new opportunities for further research. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
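The localization map described is closely related to the classic class activation map (CAM): project the classifier's fully connected weights back onto the final convolutional features. A minimal PyTorch sketch of that projection (the paper's dynamic extremum-updating step is omitted; the network and sizes are illustrative):

import torch
import torch.nn as nn

class TinySceneNet(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.fc = nn.Linear(64, num_classes)  # one weight vector per class

    def forward(self, x):
        f = self.features(x)
        return self.fc(self.pool(f).flatten(1)), f

model = TinySceneNet().eval()
image = torch.randn(1, 3, 128, 128)
logits, fmaps = model(image)
cls = logits.argmax(1).item()

# CAM: weight each feature channel by the FC weight of the predicted class,
# then sum over channels to get a spatial localization map.
w = model.fc.weight[cls]                      # (64,)
cam = torch.einsum("c,chw->hw", w, fmaps[0])  # (H, W) saliency map
peak = torch.nonzero(cam == cam.max())[0]
print("object location point (y, x):", tuple(peak.tolist()))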
14 pages, 4001 KiB  
Article
Fog Computing Capabilities for Big Data Provisioning: Visualization Scenario
by Halimjon Khujamatov, Khaleel Ahmad, Nargiza Usmanova, Jamshid Khoshimov, Mai Alduailij and Mona Alduailij
Sustainability 2022, 14(13), 8070; https://doi.org/10.3390/su14138070 - 01 Jul 2022
Cited by 4 | Viewed by 1777
Abstract
With the development of Internet technologies, huge amounts of data are collected from various sources and used 'anytime, anywhere' to enrich and change the life of the whole of society, transform the way business is done, and better understand people's lives. Those datasets, called 'big data', need to be processed, stored, and retrieved, and special tools have been developed to analyze them. At the same time, the ever-increasing development of the Internet of Things (IoT) requires IoT devices to be mobile, with adequate data processing performance. The new fog computing paradigm makes computing resources more accessible and provides a flexible environment that will be widely used in next-generation networks, vehicles, etc., demonstrating enhanced capabilities and optimizing resources. This paper is devoted to analyzing fog computing capabilities for big data provisioning, considering this technology's different architectural and functional aspects. The analysis includes exploring the protocols suitable for fog computing by implementing an experimental fog computing network and assessing its capabilities for providing big data, originating from both a real-time stream and batch data, with appropriate visualization of big data processing. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
13 pages, 1889 KiB  
Article
GenericConv: A Generic Model for Image Scene Classification Using Few-Shot Learning
by Mohamed Soudy, Yasmine M. Afify and Nagwa Badr
Information 2022, 13(7), 315; https://doi.org/10.3390/info13070315 - 28 Jun 2022
Cited by 1 | Viewed by 2274
Abstract
Scene classification is one of the most complex tasks in computer vision. The accuracy of scene classification depends on other subtasks, such as object detection and object classification. Accurate results may be accomplished by employing object detection in scene classification, since prior information about objects in the image leads to an easier interpretation of the image content. Machine and transfer learning are widely employed in scene classification and achieve strong performance. Despite the promising performance of existing models in scene classification, major issues remain. First, the training phase for the models necessitates a large amount of data, which is difficult and time-consuming to obtain. Furthermore, most models rely on data previously seen in the training set, resulting in ineffective models that can only identify samples similar to the training set. As a result, few-shot learning has been introduced. Although a few attempts have been reported applying few-shot learning to scene classification, their accuracy leaves room for improvement. Motivated by these findings, in this paper we implement a novel few-shot learning model, GenericConv, for scene classification, evaluated on benchmark datasets: the MiniSun, MiniPlaces, and MIT-Indoor 67 datasets. The experimental results show that the proposed GenericConv model outperforms the other benchmark models on the three datasets, achieving accuracies of 52.16 ± 0.015, 35.86 ± 0.014, and 37.26 ± 0.014 for five-shots on the MiniSun, MiniPlaces, and MIT-Indoor 67 datasets, respectively. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
15 pages, 3816 KiB  
Article
An Effective Ensemble Automatic Feature Selection Method for Network Intrusion Detection
by Yang Zhang, Hongpo Zhang and Bo Zhang
Information 2022, 13(7), 314; https://doi.org/10.3390/info13070314 - 27 Jun 2022
Cited by 8 | Viewed by 2687
Abstract
The mass of redundant and irrelevant data in network traffic brings serious challenges to intrusion detection, and feature selection can effectively remove meaningless information from the data. Most current filter and embedded feature selection methods use a fixed threshold or ratio to determine the number of features in a subset, which requires a priori knowledge. In contrast, wrapper feature selection methods are computationally complex and time-consuming, and individual feature selection methods are biased in how they evaluate features. This work designs an ensemble-based automatic feature selection method called EAFS. Firstly, we calculate feature importances or ranks based on individual methods; then, we add features to subsets sequentially by importance and evaluate subset performance comprehensively by designing an NSOM (a comprehensive subset-evaluation score), so as to obtain the subset with the largest NSOM value. When searching for a subset, subsets whose accuracy exceeds that of the full feature set are retained, which lowers the computational complexity. Finally, the obtained subsets are ensembled. Experimental results on three large-scale public datasets show that the method described in this study aids classification and outperforms other recent methods in terms of performance. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
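A simplified sketch of the ensemble idea: average the feature rankings of several individual selectors, grow the subset in rank order, and keep the best-scoring prefix. Plain cross-validated accuracy stands in for the paper's NSOM criterion, which the abstract does not define:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Two individual importance measures; their ranks are averaged (ensemble).
rf_imp = RandomForestClassifier(random_state=0).fit(X, y).feature_importances_
mi_imp = mutual_info_classif(X, y, random_state=0)
avg_rank = (np.argsort(np.argsort(-rf_imp)) + np.argsort(np.argsort(-mi_imp))) / 2
order = np.argsort(avg_rank)  # best features first

best_score, best_subset = 0.0, order[:1]
for k in range(1, len(order) + 1):
    subset = order[:k]          # add features sequentially by importance
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X[:, subset], y, cv=3).mean()
    if score > best_score:
        best_score, best_subset = score, subset
print(f"selected {len(best_subset)} features, CV accuracy {best_score:.3f}")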
11 pages, 1717 KiB  
Article
VisualRPI: Visualizing Research Productivity and Impact
by Chihli Hung and Wei-Chao Lin
Sustainability 2022, 14(13), 7679; https://doi.org/10.3390/su14137679 - 23 Jun 2022
Cited by 1 | Viewed by 1272
Abstract
Research productivity and impact (RPI) is commonly measured through citation analysis, such as the h-index. Despite the popularity and objectivity of this type of method, it is still difficult to effectively compare a number of related researchers across various citation-related statistics at the same time, such as average cites per year/paper, the number of papers/citations, h-index, etc. In this work, we develop a method that employs information visualization technology and examine its applicability to the assessment of researchers' RPI. Specifically, we introduce our prototype, the visualizing research productivity and impact (VisualRPI) system, which is composed of clustering and visualization components. The clustering component hierarchically clusters similar research statistics into the same groups, and the visualization component displays the RPI in a clear manner. A case example using information on 85 information systems researchers demonstrates the usefulness of VisualRPI. The results show that this method easily measures RPI for various performance indicators, such as cites/paper and h-index. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
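A small sketch of the two components the abstract names, using SciPy's agglomerative clustering on toy citation statistics and a dendrogram for the visual part; the metrics and linkage choice are illustrative, not necessarily VisualRPI's actual design:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
# Columns: papers, citations, h-index, cites/paper (toy stats, 10 researchers).
stats = np.column_stack([
    rng.integers(10, 200, 10),
    rng.integers(50, 5000, 10),
    rng.integers(3, 40, 10),
    rng.uniform(1, 30, 10),
])
# Standardize so no single indicator dominates the distance.
z = (stats - stats.mean(0)) / stats.std(0)

Z = linkage(z, method="ward")                        # clustering component
dendrogram(Z, labels=[f"R{i}" for i in range(10)])   # visualization component
plt.ylabel("merge distance")
plt.tight_layout()
plt.show()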
22 pages, 2760 KiB  
Article
A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning
by Zihao Ren, Shuiping Gou, Zhang Guo, Shasha Mao and Ruimin Li
Remote Sens. 2022, 14(12), 2939; https://doi.org/10.3390/rs14122939 - 20 Jun 2022
Cited by 7 | Viewed by 5188
Abstract
Remote sensing image captioning aims to describe the content of images using natural language. In contrast with natural images, the scale, distribution, and number of objects generally vary in remote sensing images, making it hard to capture global semantic information and the relationships between objects at different scales. In this paper, in order to improve the accuracy and diversity of captioning, a mask-guided Transformer network with a topic token is proposed. Multi-head attention is introduced to extract features and capture the relationships between objects. On this basis, a topic token is added into the encoder; it represents the scene topic and serves as a prior in the decoder to help the model focus better on global semantic information. Moreover, a new Mask-Cross-Entropy strategy is designed to improve the diversity of the generated captions; it randomly replaces some input words with a special word (named [Mask]) in the training stage, with the aim of enhancing the model's learning ability and forcing the exploration of uncommon word relations. Experiments on three data sets show that the proposed method can generate captions with high accuracy and diversity, and the experimental results illustrate that the proposed method can outperform state-of-the-art models. Furthermore, the CIDEr score on the RSICD data set increased from 275.49 to 298.39. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
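The Mask-Cross-Entropy idea amounts to corrupting the decoder's input tokens at training time while keeping the original caption as the target. A tiny sketch of that corruption step, assuming (hypothetically) a fixed mask probability and a dedicated [Mask] token id:

import torch

MASK_ID = 3       # id of the special [Mask] token (assumed vocabulary layout)
PAD_ID = 0
MASK_PROB = 0.15  # fraction of input words to replace (illustrative value)

def mask_inputs(tokens: torch.Tensor) -> torch.Tensor:
    """Randomly replace non-padding input words with [Mask] during training.

    The cross-entropy target stays the original caption, so the model must
    predict words whose inputs it never saw, encouraging uncommon relations.
    """
    mask = torch.rand_like(tokens, dtype=torch.float) < MASK_PROB
    mask &= tokens != PAD_ID
    return torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)

captions = torch.tensor([[17, 9, 42, 8, 0, 0]])  # toy token ids, padded
print(mask_inputs(captions))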
20 pages, 3651 KiB  
Article
Application of Combined Models Based on Empirical Mode Decomposition, Deep Learning, and Autoregressive Integrated Moving Average Model for Short-Term Heating Load Predictions
by Yong Zhou, Lingyu Wang and Junhao Qian
Sustainability 2022, 14(12), 7349; https://doi.org/10.3390/su14127349 - 15 Jun 2022
Cited by 12 | Viewed by 2133
Abstract
Short-term building energy consumption prediction is of great significance for the optimized operation of building energy management systems and for energy conservation. Due to the high-dimensional nonlinear characteristics of building heat loads, traditional single machine-learning models cannot extract the features well. Therefore, in this paper, a combined model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), four deep learning (DL) models, and the autoregressive integrated moving average (ARIMA) model is proposed. The DL models include a convolutional neural network, long short-term memory (LSTM), bi-directional LSTM (bi-LSTM), and the gated recurrent unit. CEEMDAN decomposes the heating load into different components to extract their different features, while the DL and ARIMA models are used to predict the heating load components of high and low complexity, respectively. Single-DL models and CEEMDAN-DL combinations were also implemented for comparison purposes. The results show that the combined models achieved much higher accuracy than the single-DL models and the CEEMDAN-DL combinations. Compared to the single-DL models, the average coefficient of determination (R2), root mean square error (RMSE), and coefficient of variation of the RMSE (CV-RMSE) were improved by 2.91%, 47.93%, and 47.92%, respectively. Furthermore, CEEMDAN-bi-LSTM-ARIMA performed the best of all the combined models, achieving R2 = 0.983, RMSE = 70.25 kWh, and CV-RMSE = 1.47%. This study provides a new guide for developing combined models for building energy consumption prediction. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
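A minimal sketch of the decomposition step with the PyEMD package: split a load series into intrinsic mode functions (IMFs), then forecast the smooth, low-complexity residue with ARIMA. Routing the high-complexity IMFs to the DL models is only indicated by a comment; the signal, orders, and sizes are illustrative:

import numpy as np
from PyEMD import CEEMDAN                      # pip install EMD-signal
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(500, dtype=float)
load = 200 + 50 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=5, size=500)

ceemdan = CEEMDAN()
ceemdan(load)                                   # decompose the heating load
imfs, residue = ceemdan.get_imfs_and_residue()  # IMFs: high to low frequency

# The noisy, high-complexity IMFs (imfs[0], imfs[1], ...) would be handed to
# the DL models (CNN / LSTM / bi-LSTM / GRU); the smooth residue is a natural
# fit for ARIMA. The (2, 1, 2) order is illustrative, not the paper's choice.
arima = ARIMA(residue, order=(2, 1, 2)).fit()
print(arima.forecast(steps=24))                 # next 24 steps of the trend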
16 pages, 5612 KiB  
Article
EBBA: An Enhanced Binary Bat Algorithm Integrated with Chaos Theory and Lévy Flight for Feature Selection
by Jinghui Feng, Haopeng Kuang and Lihua Zhang
Future Internet 2022, 14(6), 178; https://doi.org/10.3390/fi14060178 - 09 Jun 2022
Cited by 6 | Viewed by 2176
Abstract
Feature selection can efficiently improve classification accuracy and reduce the dimensionality of datasets. However, feature selection is a challenging and complex task that requires a high-performance optimization algorithm. In this paper, we propose an enhanced binary bat algorithm (EBBA), which originates from the conventional binary bat algorithm (BBA), as the learning algorithm in a wrapper-based feature selection model. First, we model the feature selection problem and transform it into a fitness function. Then, we propose the EBBA for solving the feature selection problem. In the EBBA, we introduce a Lévy flight-based global search method, a population diversity boosting method, and a chaos-based loudness method to improve the BA and make it more applicable to feature selection problems. Finally, simulations are conducted to evaluate the proposed EBBA, and the simulation results demonstrate that it outperforms the comparison benchmarks. Moreover, we also illustrate the effectiveness of the proposed improvements through additional tests. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
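Of the three enhancements named, the Lévy flight global search is the most self-contained. A sketch of a Lévy step via Mantegna's algorithm, a standard construction for Lévy flights in metaheuristics (the β value and the sigmoid binarization are illustrative choices, not necessarily the paper's):

import numpy as np
from math import gamma, pi, sin

def levy_step(dim: int, beta: float = 1.5, rng=np.random.default_rng(0)):
    """Draw one Levy-distributed step using Mantegna's algorithm."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1 / beta)

# In a binary bat algorithm, a step like this perturbs a bat's real-valued
# position before it is squashed (e.g., by a sigmoid) into per-feature
# select/drop bits.
position = np.zeros(10)
position += 0.01 * levy_step(10)
bits = (1 / (1 + np.exp(-position)) > 0.5).astype(int)
print(bits)  # candidate feature-selection mask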
23 pages, 19906 KiB  
Article
Semi-Supervised Cloud Detection in Satellite Images by Considering the Domain Shift Problem
by Jianhua Guo, Qingsong Xu, Yue Zeng, Zhiheng Liu and Xiaoxiang Zhu
Remote Sens. 2022, 14(11), 2641; https://doi.org/10.3390/rs14112641 - 31 May 2022
Cited by 9 | Viewed by 2337
Abstract
In semi-supervised cloud detection work, efforts are being made to learn a promising cloud detection model from a limited number of pixel-wise labeled images and a large number of unlabeled ones. However, remote sensing images obtained from the same satellite sensor often show a data distribution drift problem due to the different cloud shapes and land-cover types on the Earth's surface. Therefore, there are domain distribution gaps between labeled and unlabeled satellite images. To solve this problem, we take the domain shift problem into account in the semi-supervised learning (SSL) network. Feature-level and output-level domain adaptations are applied to reduce the domain distribution gaps between labeled and unlabeled images, thus improving the prediction accuracy of the SSL network. Experimental results on Landsat-8 OLI and GF-1 WFV multispectral images demonstrate that the proposed semi-supervised cloud detection network (SSCDnet) is able to achieve promising cloud detection performance when using a limited number of labeled samples, and that it outperforms several state-of-the-art SSL methods. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
21 pages, 6130 KiB  
Article
UAVSwarm Dataset: An Unmanned Aerial Vehicle Swarm Dataset for Multiple Object Tracking
by Chuanyun Wang, Yang Su, Jingjing Wang, Tian Wang and Qian Gao
Remote Sens. 2022, 14(11), 2601; https://doi.org/10.3390/rs14112601 - 28 May 2022
Cited by 7 | Viewed by 4930
Abstract
In recent years, with the rapid development of unmanned aerial vehicle (UAV) technology and swarm intelligence technology, swarms of hundreds of small-scale, low-cost UAVs can carry out complex combat tasks in the form of ad hoc networks, which brings great threats and challenges to low-altitude airspace defense. To meet the security requirements of low-altitude airspace defense, using visual detection technology to detect and track incoming UAV swarms is the premise of any anti-UAV strategy. Therefore, this study first collected many UAV swarm videos and manually annotated a dataset, named the UAVSwarm dataset, for UAV swarm detection and tracking; thirteen different scenes and more than nineteen types of UAV were recorded, comprising 12,598 annotated images, with 3 to 23 UAVs per sequence. Then, two advanced deep detection models, Faster R-CNN and YOLOX, are used as strong benchmarks. Finally, two state-of-the-art multi-object tracking (MOT) models, GNMOT and ByteTrack, are used to conduct comprehensive tests and performance verification on the dataset and evaluation metrics. The experimental results show that the dataset has good availability, consistency, and universality. The UAVSwarm dataset can be widely used in the training and testing of various UAV detection tasks and UAV swarm MOT tasks. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
17 pages, 2018 KiB  
Technical Note
Rescaling-Assisted Super-Resolution for Medium-Low Resolution Remote Sensing Ship Detection
by Huanxin Zou, Shitian He, Xu Cao, Li Sun, Juan Wei, Shuo Liu and Jian Liu
Remote Sens. 2022, 14(11), 2566; https://doi.org/10.3390/rs14112566 - 27 May 2022
Cited by 1 | Viewed by 1664
Abstract
Medium-low resolution (M-LR) remote sensing ship detection is a challenging problem due to the small target sizes and insufficient appearance information. Although image super-resolution (SR) has become a popular solution in recent years, the ability of image SR is limited, since much information is lost in the input images. Inspired by the powerful information-embedding ability of the encoder in image rescaling, in this paper we introduce image rescaling to guide the training of image SR. Specifically, we add an adaption module before the SR network and use the pre-trained rescaling network to guide the optimization of the adaption module. In this way, more information is embedded in the adapted M-LR images, and the subsequent SR module can utilize this additional information to achieve better performance. Extensive experimental results demonstrate the effectiveness of our method for image SR. More importantly, our method can be used as a pre-processing approach to improve detection performance. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
19 pages, 8570 KiB  
Article
Efficient Shallow Network for River Ice Segmentation
by Daniel Sola and K. Andrea Scott
Remote Sens. 2022, 14(10), 2378; https://doi.org/10.3390/rs14102378 - 15 May 2022
Cited by 3 | Viewed by 1791
Abstract
River ice segmentation, used for surface ice concentration estimation, is important for validating river process and ice-formation models, predicting ice jam and flooding risks, and managing water supply and hydroelectric power generation. Furthermore, discriminating between anchor ice and frazil ice is an important factor in understanding sediment transport and release events. Modern deep learning techniques have proven able to deliver promising results; however, they can show poor generalization ability and can be inefficient when hardware and computing power are limited. As river ice images are often collected in remote locations by unmanned aerial vehicles with limited computation power, we explore the performance-latency trade-offs for river ice segmentation. We propose a novel convolution block inspired by both depthwise separable convolutions and local binary convolutions, giving additional efficiency and parameter savings. Our novel convolution block is used in a shallow architecture which has 99.9% fewer trainable parameters, 99% fewer multiply–add operations, and 69.8% less memory usage than a UNet, while achieving virtually the same segmentation performance. We find that this network trains fast and is able to achieve high segmentation performance early in training, due to an emphasis on both pixel intensity and texture. When compared to very efficient segmentation networks such as LR-ASPP with a MobileNetV3 backbone, we achieve good performance (mIoU of 64) 91% faster during training on a CPU, with an overall mIoU that is 7.7% higher. We also find that our network generalizes better to new domains, such as snowy environments. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
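The parameter savings claimed here come largely from replacing standard convolutions. A sketch of a depthwise separable block in PyTorch, one of the two ingredients the abstract cites (the local binary convolution part and the paper's exact block design are omitted):

import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

def n_params(m):
    return sum(p.numel() for p in m.parameters())

standard = nn.Conv2d(64, 128, 3, padding=1, bias=False)
separable = DepthwiseSeparableBlock(64, 128)
print(n_params(standard), "vs", n_params(separable))  # ~73.7k vs ~9k weights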
16 pages, 31169 KiB  
Article
MIMO: A Unified Spatio-Temporal Model for Multi-Scale Sea Surface Temperature Prediction
by Siyun Hou, Wengen Li, Tianying Liu, Shuigeng Zhou, Jihong Guan, Rufu Qin and Zhenfeng Wang
Remote Sens. 2022, 14(10), 2371; https://doi.org/10.3390/rs14102371 - 14 May 2022
Cited by 9 | Viewed by 2224
Abstract
Sea surface temperature (SST) is a crucial factor that affects global climate and marine activities. Predicting SST at different temporal scales benefits various applications, from short-term SST prediction for weather forecasting to long-term SST prediction for analyzing El Niño–Southern Oscillation (ENSO). However, existing approaches for SST prediction train separate models for different temporal scales, which is inefficient and cannot exploit the correlations among the temperatures of different scales to improve the prediction performance. In this work, we propose a unified spatio-temporal model termed the Multi-In and Multi-Out (MIMO) model to predict SST at different scales. MIMO is an encoder–decoder model, where the encoder learns spatio-temporal features from the SST data of multiple scales and fuses the learned features with a Cross Scale Fusion (CSF) operation. The decoder utilizes the learned features from the encoder to adaptively predict the SST of different scales. To the best of our knowledge, this is the first work to predict SST at different temporal scales simultaneously with a single model. According to the experimental evaluation on the Optimum Interpolation SST (OISST) dataset, MIMO achieves state-of-the-art prediction performance. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
10 pages, 721 KiB  
Article
Deep Learning Models for COVID-19 Detection
by Sertan Serte, Mehmet Alp Dirik and Fadi Al-Turjman
Sustainability 2022, 14(10), 5820; https://doi.org/10.3390/su14105820 - 11 May 2022
Cited by 7 | Viewed by 2017
Abstract
Healthcare is one of the crucial aspects of the Internet of Things. Connected machine-learning-based systems provide faster healthcare services, and doctors and radiologists can also use these systems to collaborate and provide better help to patients. The recently emerged coronavirus (COVID-19) is known to be highly infectious. Reverse transcription-polymerase chain reaction (RT-PCR) is recognised as one of the primary diagnostic tools; however, RT-PCR tests might not be accurate. In contrast, doctors can employ artificial intelligence techniques to analyse X-ray and CT scans. Artificial intelligence methods need a large number of images, which might not be available during a pandemic. In this paper, a novel data-efficient deep network is proposed for the identification of COVID-19 in CT images. This method augments the small number of available CT scans by generating synthetic versions of CT scans using a generative adversarial network (GAN). Then, we estimate the parameters of the convolutional and fully connected layers of the deep networks using synthetic and augmented data. The results show that the GAN-based deep learning model provides higher performance than classic deep learning models for COVID-19 detection. The performance evaluation is performed on the COVID19-CT and Mosmed datasets. The best performing models are ResNet-18 and MobileNetV2 on COVID19-CT and Mosmed, respectively, with area under curve values of 0.89 and 0.84. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
28 pages, 8028 KiB  
Review
A Survey on Memory Subsystems for Deep Neural Network Accelerators
by Arghavan Asad, Rupinder Kaur and Farah Mohammadi
Future Internet 2022, 14(5), 146; https://doi.org/10.3390/fi14050146 - 10 May 2022
Cited by 7 | Viewed by 3373
Abstract
From self-driving cars to detecting cancer, the applications of modern artificial intelligence (AI) rely primarily on deep neural networks (DNNs). Given raw sensory data, DNNs are able to extract high-level features after the network has been trained using statistical learning. However, due to the massive amounts of parallel processing in computations, the memory wall largely affects performance. Thus, a review of the different memory architectures applied in DNN accelerators would prove beneficial. While existing surveys only address DNN accelerators in general, this paper investigates novel advancements in efficient memory organizations and design methodologies in DNN accelerators. First, an overview of the various memory architectures used in DNN accelerators will be provided, followed by a discussion of memory organizations on non-ASIC DNN accelerators. Furthermore, flexible memory systems incorporating adaptable DNN computation will be explored. Lastly, an analysis of emerging memory technologies will be conducted. Through this article, the reader will: (1) gain the ability to analyze various proposed memory architectures; (2) discern various DNN accelerators with different memory designs; (3) become familiar with the trade-offs associated with memory organizations; and (4) become familiar with proposed new memory systems for modern DNN accelerators that address the memory wall and the other issues mentioned. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
12 pages, 374 KiB  
Article
Two New Datasets for Italian-Language Abstractive Text Summarization
by Nicola Landro, Ignazio Gallo, Riccardo La Grassa and Edoardo Federici
Information 2022, 13(5), 228; https://doi.org/10.3390/info13050228 - 29 Apr 2022
Cited by 6 | Viewed by 3517
Abstract
Text summarization aims to produce a short summary containing the relevant parts of a given text. Due to the lack of data for abstractive summarization in low-resource languages such as Italian, we propose two new original datasets: one collected from two Italian news websites, with multi-sentence summaries and corresponding articles, and one obtained by machine-translating a Spanish summarization dataset. These two datasets are currently the only ones available in Italian for this task. To evaluate the quality of these two datasets, we used them to train a T5-base model and an mBART model, obtaining good results with both. To better assess the results, we also compared the same models trained on automatically translated datasets, comparing the resulting summaries in the training language against the automatically translated summaries; this demonstrated the superiority of the models obtained from the proposed datasets. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
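A minimal sketch of using a model of this kind through the Hugging Face transformers pipeline; the checkpoint name below is a placeholder, since the abstract does not give released model identifiers:

from transformers import pipeline

# "your-org/t5-base-italian-summarization" is a hypothetical checkpoint name;
# substitute an actual fine-tuned Italian summarization model.
summarizer = pipeline("summarization",
                      model="your-org/t5-base-italian-summarization")

# Toy Italian input (a city-council announcement about a mobility plan).
article = ("Il consiglio comunale ha approvato ieri il nuovo piano per la "
           "mobilità sostenibile, che prevede nuove piste ciclabili e linee "
           "di autobus elettrici entro il 2025.")
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])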
21 pages, 6225 KiB  
Article
Accurate Air-Quality Prediction Using Genetic-Optimized Gated-Recurrent-Unit Architecture
by Chen Ding, Zhouyi Zheng, Sirui Zheng, Xuke Wang, Xiaoyan Xie, Dushi Wen, Lei Zhang and Yanning Zhang
Information 2022, 13(5), 223; https://doi.org/10.3390/info13050223 - 26 Apr 2022
Cited by 3 | Viewed by 2160
Abstract
Air pollution is becoming a serious concern with the development of society and urban expansion, and predicting air quality is a pressing problem for human beings. Recently, more and more machine-learning-based methods are being used to solve the air-quality-prediction problem, and gated recurrent units (GRUs) are a representative method because of their advantages in processing time-series data. However, for the same air-quality-prediction task, different researchers have always designed different GRU structures, based on their differing experience. Designing a GRU structure adaptively from the data has thus become a problem. In this paper, we propose an adaptive GRU to address this problem, in which the GRU structure is determined by the dataset through three main steps. Firstly, an encoding method for the GRU structure is proposed for representing the network structure as a fixed-length binary string; secondly, we define the reciprocal of the sum of the losses of each individual as the fitness function for the iterative computation; thirdly, a genetic algorithm is used to compute the data-adaptive GRU network structure, which can enhance the air-quality-prediction result. The experimental results on three real datasets from Xi'an show that the proposed method achieves better RMSE and SMAPE than existing LSTM-, SVM-, and RNN-based methods. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
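A sketch of the encoding and fitness pieces the abstract spells out: a fixed-length binary string decoded into GRU hyperparameters, with fitness defined as the reciprocal of the summed loss. The bit layout and the toy stand-in loss below are assumptions for illustration, not the paper's encoding:

import random

random.seed(0)
STRING_LEN = 8  # assumed layout: 4 bits hidden size, 2 bits layers, 2 bits lr

def decode(bits: str):
    """Map a fixed-length binary string to a GRU structure."""
    hidden = 16 * (int(bits[0:4], 2) + 1)    # 16..256 units
    layers = int(bits[4:6], 2) + 1           # 1..4 layers
    lr = 10 ** -(int(bits[6:8], 2) + 2)      # 1e-2..1e-5
    return hidden, layers, lr

def fitness(bits: str) -> float:
    hidden, layers, lr = decode(bits)
    # Stand-in for training the GRU and summing its validation losses;
    # the paper's fitness is 1 / (sum of each individual's loss).
    loss = abs(hidden - 96) / 96 + abs(layers - 2) + abs(lr - 1e-3) * 100
    return 1.0 / (loss + 1e-9)

population = ["".join(random.choice("01") for _ in range(STRING_LEN))
              for _ in range(12)]
for _ in range(30):  # simple GA: keep the fitter half, mutate copies
    population.sort(key=fitness, reverse=True)
    survivors = population[:6]
    children = [
        "".join(b if random.random() > 0.1 else str(1 - int(b)) for b in s)
        for s in survivors
    ]
    population = survivors + children
print("best structure:", decode(max(population, key=fitness)))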
18 pages, 1307 KiB  
Article
An Emergency Event Detection Ensemble Model Based on Big Data
by Khalid Alfalqi and Martine Bellaiche
Big Data Cogn. Comput. 2022, 6(2), 42; https://doi.org/10.3390/bdcc6020042 - 16 Apr 2022
Cited by 4 | Viewed by 3718
Abstract
Emergency events arise when a serious, unexpected, and often dangerous threat affects normal life. Hence, knowing what is occurring during and after emergency events is critical to mitigating the effect of the incident on human life, on the environment and our infrastructure, as well as the inherent financial consequences. Social network utilization in emergency event detection models can play an important role, as information is shared and users' statuses are updated once an emergency event occurs. Moreover, big data has proved its significance as a tool to assist in and alleviate emergency events by processing an enormous amount of data over a short time interval. This paper shows that it is necessary to have an appropriate emergency event detection ensemble model (EEDEM) to respond quickly once such unfortunate events occur. Furthermore, it integrates Snapchat maps to propose a novel method to pinpoint the exact location of an emergency event. Merging social networks and big data can accelerate emergency event detection: social network data, such as those from Twitter and Snapchat, allow us to manage, monitor, analyze, and detect emergency events. The main objective of this paper is to propose a novel and efficient big-data-based EEDEM that pinpoints the exact location of emergency events by employing data collected from social networks, such as Twitter and Snapchat, while integrating big data (BD) and machine learning (ML). Furthermore, this paper evaluates the performance of five ML base models and the proposed ensemble approach for detecting emergency events. Results show that the proposed ensemble approach achieved a very high accuracy of 99.87%, outperforming the base models. The best base models also yield a high level of accuracy, 99.72% and 99.70% for LSTM and decision tree, respectively, with acceptable training times. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
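The abstract describes an ensemble built over several base classifiers. A generic sketch of that pattern with scikit-learn's soft-voting ensemble on synthetic features; the base learners and features here are stand-ins, not the paper's exact five models or its Twitter/Snapchat data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("forest", RandomForestClassifier(random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # average predicted probabilities across base models
)
ensemble.fit(X_tr, y_tr)
print(f"ensemble accuracy: {ensemble.score(X_te, y_te):.4f}")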
23 pages, 6276 KiB  
Article
A Structural Approach to Some Contradictions in Worldwide Swine Production and Health Research
by Juan Felipe Núñez-Espinoza, Francisco Ernesto Martínez-Castañeda, Fernando Ávila-Pérez and María Camila Rendón-Rendón
Sustainability 2022, 14(8), 4748; https://doi.org/10.3390/su14084748 - 15 Apr 2022
Cited by 2 | Viewed by 2555
Abstract
Several biosafety gaps in agri-food sectors have become evident in recent years. Many of them are related to global livestock systems and the organizational models involved in their management and organization. For example, producing pigs requires a global system of massive confinement and specific technological innovations related to animal production and health, involving broad technical and scientific structures that are required to generate the specific knowledge needed for successful management. This suggests the need for an underlying, socially agglomerated technological ecosystem relevant to these issues. We therefore propose the analysis of a specialized scientific social structure in terms of the knowledge and technologies required for pig production and health. The objective of this work is to characterize structural patterns in research on the swine health sector worldwide. We used a mixed methodological approach based on social network analysis and obtained scientific information from 4868 specialized research works on health and pig production published between 2010 and 2018, from 47 countries. It was possible to analyze swine research dynamics, such as convergence and influence, at the country and regional levels, and to identify differentiated behaviors and high centralization in scientific communities that have a worldwide impact in terms of achievements but also exhibit significant omissions. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
17 pages, 7979 KiB  
Article
Landslide Displacement Prediction via Attentive Graph Neural Network
by Ping Kuang, Rongfan Li, Ying Huang, Jin Wu, Xucheng Luo and Fan Zhou
Remote Sens. 2022, 14(8), 1919; https://doi.org/10.3390/rs14081919 - 15 Apr 2022
Cited by 8 | Viewed by 2599
Abstract
Landslides are among the most common geological hazards and result in considerable human and economic losses globally. Researchers have put great effort into addressing the landslide prediction problem for decades. Previous methods either focus on analyzing landslide inventory maps obtained from aerial photography and satellite images, or propose machine learning models, trained on historical land deformation data, to predict future displacement and sedimentation. However, existing approaches generally fail to capture complex spatial deformations and their inter-dependencies across different areas. This work presents a novel landslide prediction model based on graph neural networks, which utilizes graph convolutions to aggregate spatial correlations among different monitored locations. In addition, we introduce a novel locally historical transformer network to capture dynamic spatio-temporal relations and predict the surface deformation. We conduct extensive experiments on real-world data and demonstrate that our model significantly outperforms state-of-the-art approaches in terms of prediction accuracy and model interpretability. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
19 pages, 7502 KiB  
Article
Local Transformer Network on 3D Point Cloud Semantic Segmentation
by Zijun Wang, Yun Wang, Lifeng An, Jian Liu and Haiyang Liu
Information 2022, 13(4), 198; https://doi.org/10.3390/info13040198 - 14 Apr 2022
Cited by 2 | Viewed by 2723
Abstract
Semantic segmentation is an important component of understanding 3D point cloud scenes. Whether we can effectively obtain local and global contextual information from points is of great significance for improving the performance of 3D point cloud semantic segmentation. In this paper, we propose a self-attention feature extraction module: the local transformer structure. By stacking encoder layers composed of this structure, we can extract local features while preserving global connectivity. The structure automatically learns each point's features from its neighborhood and is invariant to different point orders. We designed two unique key matrices, each of which focuses on either the feature similarities or the geometric structure relationships between points, to generate the attention weight matrices. Additionally, cross-skip selection of neighbors is used to obtain a larger receptive field for each point without increasing the number of calculations required, and it can therefore better handle the junctions between multiple objects. When the new network was verified on S3DIS, the mean intersection over union was 69.1%, and the segmentation accuracies on the complex outdoor scene datasets Semantic3D and SemanticKITTI were 94.3% and 87.8%, respectively, which demonstrates the effectiveness of the proposed methods. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
46 pages, 2630 KiB  
Systematic Review
Deep Learning for Vulnerability and Attack Detection on Web Applications: A Systematic Literature Review
by Rokia Lamrani Alaoui and El Habib Nfaoui
Future Internet 2022, 14(4), 118; https://doi.org/10.3390/fi14040118 - 13 Apr 2022
Cited by 12 | Viewed by 6718
Abstract
Web applications are the best Internet-based solution for providing online web services, but they also bring serious security challenges. Thus, enhancing web application security against hacking attempts is of paramount importance. Traditional Web Application Firewalls, based on manual rules, and traditional Machine Learning need a lot of domain expertise and human intervention, and they have limited detection performance when faced with the increasing number of unknown web attacks. To this end, more research work has recently been devoted to employing Deep Learning (DL) approaches for web attack detection. We performed a Systematic Literature Review (SLR) and quality analysis of 63 Primary Studies (PS) on DL-based web application security published between 2010 and September 2021. We investigated the PS from different perspectives and synthesized the results of the analyses. To the best of our knowledge, this study is the first SLR of its kind in this field. The key findings of our study include the following. (i) It is fundamental to generate standard real-world web attack datasets to encourage effective contributions in this field and to reduce the gap between research and industry. (ii) It is interesting to explore some advanced DL models, such as Generative Adversarial Networks and variants of Encoders–Decoders, in the context of web attack detection, as they have been successful in similar domains such as network intrusion detection. (iii) It is fundamental to bridge expertise in web application security and expertise in Machine Learning to build theoretical Machine Learning models tailored for web attack detection. (iv) It is important to create a corpus for web attack detection in order to take full advantage of text mining when constructing DL-based web attack detection models. (v) It is essential to define a common framework for developing and comparing DL-based web attack detection models. This SLR is intended to improve research work in the domain of DL-based web attack detection, as it covers a significant number of research papers and identifies the key points that need to be addressed in this research field. Such a contribution is helpful, as it allows researchers to compare existing approaches and to exploit the proposed future work opportunities. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
18 pages, 43805 KiB  
Article
A Two-Stage Low-Altitude Remote Sensing Papaver Somniferum Image Detection System Based on YOLOv5s+DenseNet121
by Qian Wang, Chunshan Wang, Huarui Wu, Chunjiang Zhao, Guifa Teng, Yajie Yu and Huaji Zhu
Remote Sens. 2022, 14(8), 1834; https://doi.org/10.3390/rs14081834 - 11 Apr 2022
Cited by 6 | Viewed by 2681
Abstract
Papaver somniferum (opium poppy) is not only a source of raw material for the production of medical narcotic analgesics but also the major raw material for certain psychotropic drugs. Therefore, it is stipulated by law that the cultivation of Papaver somniferum must be authorized by the government under stringent supervision. In certain areas, unauthorized and illicit Papaver somniferum cultivation on privately owned land occurs from time to time. These illegal cultivation sites are dispersed and highly concealed, posing a difficult problem for government supervision. Low-altitude inspection of Papaver somniferum cultivation by unmanned aerial vehicles has the advantages of high efficiency and time saving, but the large amount of image data collected needs to be manually screened, which not only consumes considerable manpower and material resources but is also prone to omissions. In response to the above problems, this paper proposes a two-stage (target detection and image classification) method for the detection of Papaver somniferum cultivation sites. In the first stage, the YOLOv5s algorithm was used to detect Papaver somniferum images for the purpose of identifying all the suspicious Papaver somniferum images from the original data. In the second stage, the DenseNet121 network was used to classify the detection results from the first stage, so as to exclude targets other than Papaver somniferum and retain only the images containing Papaver somniferum. For the first stage, YOLOv5s achieved the best overall performance among mainstream target detection models, with a Precision of 97.7%, Recall of 94.9%, and mAP of 97.4%. For the second stage, DenseNet121 with pre-training achieved the best overall performance, with a classification accuracy of 97.33% and a Precision of 95.81%. The experimental comparison between the one-stage method and the two-stage method suggests that the Recall of the two methods remained the same, but the two-stage method reduced the number of falsely detected images by 73.88%, which greatly reduces the workload for the subsequent manual screening of remote sensing Papaver somniferum images. This work provides an effective technical means to solve the problem of supervising illicit Papaver somniferum cultivation. Full article
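The detect-then-classify idea above is straightforward to prototype. Below is a minimal sketch assuming pretrained YOLOv5s and DenseNet121 weights loaded via torch.hub and torchvision; the poppy-specific weights, class handling, and acceptance threshold are hypothetical placeholders, not the authors' trained models.

```python
# Two-stage detect-then-classify sketch: stage 1 proposes candidate boxes with
# YOLOv5s, stage 2 re-examines each crop with DenseNet121 to filter false alarms.
import torch
import torchvision.transforms as T
from torchvision import models
from PIL import Image

detector = torch.hub.load("ultralytics/yolov5", "yolov5s")       # stage-1 detector
classifier = models.densenet121(weights="IMAGENET1K_V1").eval()  # stage-2 filter (would be fine-tuned)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def two_stage_detect(image_path: str, conf_thresh: float = 0.25):
    """Return crops that both stages agree look like the target class."""
    image = Image.open(image_path).convert("RGB")
    results = detector(image)                     # stage 1: candidate boxes
    keep = []
    for *xyxy, conf, cls in results.xyxy[0].tolist():
        if conf < conf_thresh:
            continue
        crop = image.crop(tuple(map(int, xyxy)))  # stage 2 re-examines each candidate
        with torch.no_grad():
            logits = classifier(preprocess(crop).unsqueeze(0))
        prob = logits.softmax(dim=1).max().item()
        if prob > 0.5:                            # hypothetical acceptance threshold
            keep.append((xyxy, conf, prob))
    return keep
```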

20 pages, 2566 KiB  
Article
Learning Spatio-Temporal Attention Based Siamese Network for Tracking UAVs in the Wild
by Junjie Chen, Bo Huang, Jianan Li, Ying Wang, Moxuan Ren and Tingfa Xu
Remote Sens. 2022, 14(8), 1797; https://doi.org/10.3390/rs14081797 - 08 Apr 2022
Cited by 6 | Viewed by 2428
Abstract
The popularity of unmanned aerial vehicles (UAVs) has made anti-UAV technology increasingly urgent. Object tracking, especially in thermal infrared videos, offers a promising solution to counter UAV intrusion. However, troublesome issues such as fast motion and tiny size make tracking infrared drone targets difficult and challenging. This work proposes a simple and effective spatio-temporal attention based Siamese method called SiamSTA, which alternates between reliable local searching and wide-range re-detection to robustly track drones in the wild. Concretely, SiamSTA builds a two-stage re-detection network to predict the target state using the template of the first frame and the prediction results of previous frames. To tackle the challenge of small-scale UAV targets in long-range acquisition, SiamSTA imposes spatial and temporal constraints on generating candidate proposals within local neighborhoods to eliminate interference from background distractors. Complementarily, in case the target is lost from local regions due to fast movement, a third-stage re-detection module is introduced, which exploits valuable motion cues through a correlation filter based on change detection to re-capture targets from a global view. Finally, a state-aware switching mechanism is adopted to adaptively integrate local searching and global re-detection, exploiting their complementary strengths for robust tracking. Extensive experiments on three anti-UAV datasets demonstrate SiamSTA's advantage over other competitors. Notably, SiamSTA is the foundation of the 1st-place winning entry in the 2nd Anti-UAV Challenge. Full article
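The state-aware switching mechanism can be summarized in a few lines of control logic. The sketch below is a toy rendering of that idea, assuming stand-in local-search and global re-detection routines and an illustrative confidence threshold; it is not the paper's network.

```python
# Toy state-aware switching: track locally while confidence is high, fall back
# to global re-detection when the target appears lost.
from dataclasses import dataclass
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)

@dataclass
class StateAwareTracker:
    local_search: Callable[[object, Box], Tuple[Box, float]]
    global_redetect: Callable[[object], Tuple[Box, float]]
    conf_thresh: float = 0.3  # hypothetical "target lost" threshold
    lost: bool = False

    def update(self, frame, last_box: Box) -> Box:
        if not self.lost:
            box, conf = self.local_search(frame, last_box)  # cheap local stage
            if conf >= self.conf_thresh:
                return box
            self.lost = True                # low confidence: switch state
        box, conf = self.global_redetect(frame)             # wide-range re-detection
        if conf >= self.conf_thresh:
            self.lost = False               # re-acquired: back to local tracking
        return box

tracker = StateAwareTracker(
    local_search=lambda f, b: (b, 0.9),             # dummy: always confident
    global_redetect=lambda f: ((0, 0, 10, 10), 0.8),
)
print(tracker.update(frame=None, last_box=(5, 5, 10, 10)))
```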

13 pages, 1983 KiB  
Article
Deep Learning with Word Embedding Improves Kazakh Named-Entity Recognition
by Gulizada Haisa and Gulila Altenbek
Information 2022, 13(4), 180; https://doi.org/10.3390/info13040180 - 02 Apr 2022
Cited by 4 | Viewed by 2650
Abstract
Named-entity recognition (NER) is a preliminary step for several text extraction tasks. In this work, we try to recognize Kazakh named entities by introducing a hybrid neural network model that leverages word semantics with multidimensional features and attention mechanisms. There are two major challenges. First, Kazakh is an agglutinative and morphologically rich language, which presents a data-sparsity challenge for NER. Second, Kazakh named entities have unclear boundaries, polysemy, and nesting. A common strategy to handle data sparsity is to apply subword segmentation; thus, we combined the semantics of words and stems, obtaining stems from the Kazakh morphological analysis system. Additionally, we constructed a graph structure of entities, with words, entities, and entity categories as nodes and inclusion relations as edges, and updated the nodes using a gated graph neural network (GGNN) with an attention mechanism. Finally, we extracted the final results through a conditional random field (CRF). Experimental results show that our method consistently outperforms all previous methods, achieving an F1 score of 88.04%. Full article
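The final CRF decoding step is a standard component. A minimal sketch follows, assuming the pytorch-crf package and using a toy linear layer in place of the paper's GGNN/attention encoder; the tag-set size and dimensions are illustrative.

```python
# Per-token emission scores (toy encoder) decoded with a CRF layer.
import torch
import torch.nn as nn
from torchcrf import CRF

NUM_TAGS, HIDDEN = 9, 128  # e.g., BIO tags for 4 entity types + "O"

encoder = nn.Linear(HIDDEN, NUM_TAGS)   # stand-in for the GGNN + attention encoder
crf = CRF(NUM_TAGS, batch_first=True)

features = torch.randn(2, 15, HIDDEN)   # (batch, sequence length, hidden)
emissions = encoder(features)
tags = torch.randint(0, NUM_TAGS, (2, 15))

loss = -crf(emissions, tags)            # negative log-likelihood for training
best_paths = crf.decode(emissions)      # Viterbi decoding at inference time
```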

22 pages, 3697 KiB  
Article
HealthFetch: An Influence-Based, Context-Aware Prefetch Scheme in Citizen-Centered Health Storage Clouds
by Chrysostomos Symvoulidis, George Marinos, Athanasios Kiourtis, Argyro Mavrogiorgou and Dimosthenis Kyriazis
Future Internet 2022, 14(4), 112; https://doi.org/10.3390/fi14040112 - 01 Apr 2022
Cited by 4 | Viewed by 2813
Abstract
Over the past few years, increasing attention has been given to the health sector and the integration of new technologies into it. Cloud computing and storage clouds have become state-of-the-art solutions in other major areas and are rapidly gaining ground in the health sector as well. More and more companies are working toward a future that will allow healthcare professionals to engage more with such infrastructures, opening up a vast number of possibilities for them. While this is a very important step, less attention has been given to the citizens. For this reason, this paper proposes a citizen-centered storage cloud solution that allows citizens to hold their health data in their own hands while also enabling the exchange of these data with healthcare professionals during emergency situations. Moreover, to reduce the health data transmission delay, a novel context-aware prefetch engine enriched with deep learning capabilities is proposed. The proposed prefetch scheme, along with the proposed storage cloud, undergoes a two-fold evaluation in several deployment and usage scenarios, examining its performance with respect to data transmission times and comparing its outcomes to other state-of-the-art solutions. The results show that the proposed solution significantly improves download speed compared with the plain storage cloud, especially when large data are exchanged. In addition, the evaluation of the proposed scheme shows that it improves the overall predictions, considering the coefficient of determination (R2 > 0.94) and the mean of errors (RMSE < 1), while also reducing the training data by 12%. Full article

14 pages, 4091 KiB  
Article
Num-Symbolic Homophonic Social Net-Words
by Yi-Liang Chung, Ping-Yu Hsu and Shih-Hsiang Huang
Information 2022, 13(4), 174; https://doi.org/10.3390/info13040174 - 29 Mar 2022
Cited by 1 | Viewed by 2200
Abstract
Many excellent studies about social networks and text analysis can be found in the literature, facilitating the rapid development of automated text analysis technology. Chinese lacks natural separators between words, and the numbers and symbols appearing in Chinese text also carry their own literal meanings. Thus, content that blends Chinese characters with numbers and symbols in user-generated content is a challenge for current analytic approaches and procedures. Therefore, we propose a new hybrid method for detecting blended numeric and symbolic homophony Chinese neologisms (BNShCNs). The words' actual semantics were interpreted according to their independence and relative position in context. This study obtained a shortlist of candidates from Internet-collected user-generated content using a probability-based approach; subsequently, we evaluated the shortlist with contextualized word-embedding vectors for BNShCN detection. The experiments show that the proposed method efficiently extracts BNShCNs from user-generated content. Full article

18 pages, 1203 KiB  
Article
Is Artificial Intelligence Better than Manpower? The Effects of Different Types of Online Customer Services on Customer Purchase Intentions
by Min Qin, Wei Zhu, Shiyue Zhao and Yu Zhao
Sustainability 2022, 14(7), 3974; https://doi.org/10.3390/su14073974 - 28 Mar 2022
Cited by 10 | Viewed by 6318
Abstract
Artificial intelligence has been widely applied to e-commerce and the online business service field. However, few studies have focused on the differences in the effects of different types of customer service on customer purchase intentions. Based on service encounter theory and superposition theory, we designed two shopping experiments to capture customers' thoughts and feelings, in order to explore the differences in the effects of three types of online customer service (AI customer service, manual customer service, and human–machine collaboration customer service) on customer purchase intention, and to analyze the superposition effect of human–machine collaboration customer service. The results show that the consumer's perceived service quality positively influences the customer's purchase intention and plays a mediating role in the effect of different types of online customer service on customer purchase intention; the product type plays a moderating role in the relationship between online customer service and customer purchase intention; and human–machine collaboration customer service has a superposition effect. This study helps AI developers and e-commerce platforms deepen their understanding of the application of AI in online business services, and provides reference suggestions for formulating better business service strategies. Full article

11 pages, 1388 KiB  
Article
A LiDAR–Camera Fusion 3D Object Detection Algorithm
by Leyuan Liu, Jian He, Keyan Ren, Zhonghua Xiao and Yibin Hou
Information 2022, 13(4), 169; https://doi.org/10.3390/info13040169 - 26 Mar 2022
Cited by 14 | Viewed by 4201
Abstract
3D object detection with LiDAR and camera fusion has always been a challenge for autonomous driving. This work proposes a deep neural network (namely FuDNN) for LiDAR–camera fusion 3D object detection. Firstly, a 2D backbone is designed to extract features from camera images. Secondly, an attention-based fusion sub-network is designed to fuse the features extracted by the 2D backbone with the features extracted from 3D LiDAR point clouds by PointNet++. In addition, FuDNN, which uses the RPN and the refinement network of PointRCNN to obtain 3D box predictions, was tested on the public KITTI dataset. Experiments on the KITTI validation set show that the proposed FuDNN achieves AP values of 92.48, 82.90, and 80.51 at the easy, moderate, and hard difficulty levels for car detection. The proposed FuDNN improves the performance of LiDAR–camera fusion 3D object detection in the car category of the public KITTI dataset. Full article
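Attention-based feature fusion of the kind described can be sketched compactly. The module below is an illustrative gating formulation, assuming per-point image features already sampled at each point's projection; it is not FuDNN's exact architecture.

```python
# Minimal attention-based fusion of camera features and LiDAR point features.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, img_dim: int = 64, pts_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(img_dim, pts_dim)      # align image features to point features
        self.gate = nn.Sequential(
            nn.Linear(2 * pts_dim, pts_dim), nn.Tanh(),
            nn.Linear(pts_dim, 1), nn.Sigmoid(),     # per-point attention weight
        )

    def forward(self, img_feat, pts_feat):
        # img_feat: (N, img_dim) image features sampled at each point's projection
        # pts_feat: (N, pts_dim) features from the point-cloud backbone
        img = self.proj(img_feat)
        w = self.gate(torch.cat([img, pts_feat], dim=-1))
        return pts_feat + w * img                    # weighted residual fusion

fused = AttentionFusion()(torch.randn(1024, 64), torch.randn(1024, 128))
```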

16 pages, 6021 KiB  
Article
Time Series Surface Temperature Prediction Based on Cyclic Evolutionary Network Model for Complex Sea Area
by Jiahao Shi, Jie Yu, Jinkun Yang, Lingyu Xu and Huan Xu
Future Internet 2022, 14(3), 96; https://doi.org/10.3390/fi14030096 - 21 Mar 2022
Cited by 4 | Viewed by 2240
Abstract
The prediction of marine elements has become increasingly important in the field of marine research. However, time series data in a complex environment vary significantly because they are composed of dynamic changes with multiple mechanisms, causes, and laws. For example, sea surface temperature (SST) can be influenced by ocean currents. Conventional models often focus on capturing the impact of historical data but ignore the spatio-temporal relationships in sea areas, and they cannot predict such widely varying data effectively. In this work, we propose the cyclic evolutionary network model (CENS), an error-driven network group composed of multiple network node units. Different regions of data can be automatically matched to a suitable network node unit for prediction, so that the model can cluster the data based on their characteristics and, therefore, be more practical. Experiments were performed on the Bohai Sea and the South China Sea. Firstly, we performed an ablation experiment to verify the effectiveness of the framework of the model. Secondly, we tested the model on sea surface temperature prediction, and the results verified the accuracy of CENS. Lastly, a noteworthy finding was that the clustering results of the model in the South China Sea matched the actual characteristics of the continental shelf of the South China Sea, and the clusters had spatial continuity. Full article

22 pages, 2325 KiB  
Article
Machine Learning for Pan Evaporation Modeling in Different Agroclimatic Zones of the Slovak Republic (Macro-Regions)
by Beáta Novotná, Ľuboš Jurík, Ján Čimo, Jozef Palkovič, Branislav Chvíla and Vladimír Kišš
Sustainability 2022, 14(6), 3475; https://doi.org/10.3390/su14063475 - 16 Mar 2022
Cited by 3 | Viewed by 2118
Abstract
Global climate change is likely to influence evapotranspiration (ET); as a result, many ET calculation methods may not give accurate results under different climatic conditions. The main objective of this study is to verify the suitability of machine learning (ML) models as calculation methods for pan evaporation (PE) modeling on the macro-regional scale. The most significant PE changes in the different agroclimatic zones of the Slovak Republic were compared, and their considerable impacts were analyzed. On the basis of the agroclimatic zones, 35 meteorological stations distributed across Slovakia were classified into six macro-regions. For each of the meteorological stations, 11 variables were recorded at a daily time step during the vegetation periods of the years 2010 to 2020. Eight different ML models—the neural network (NN) model, the autoneural network (AN) model, the decision tree (DT) model, the Dmine regression (DR) model, the DM neural network (DM NN) model, the gradient boosting (GB) model, the least angle regression (LARS) model, and the ensemble model (EM)—were employed to predict PE. It was found that the different models had diverse prediction accuracies in various geographical locations. In this study, the values predicted by the individual models are compared. Full article
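The underlying workflow, fitting several regressors to daily meteorological features and comparing their errors on held-out data, can be sketched with scikit-learn. The data below are synthetic, and the model list only approximates the paper's models.

```python
# Compare several regressors on synthetic stand-ins for daily meteorological data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 11))          # 11 daily meteorological variables
y = X[:, 0] * 2 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=2000)  # synthetic PE

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "NN": MLPRegressor(max_iter=2000, random_state=0),
    "DT": DecisionTreeRegressor(random_state=0),
    "GB": GradientBoostingRegressor(random_state=0),
    "LARS": Lars(),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: RMSE = {mean_squared_error(y_te, pred) ** 0.5:.3f}")
```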

17 pages, 3624 KiB  
Article
Unsupervised Anomaly Detection and Segmentation on Dirty Datasets
by Jiahao Guo, Xiaohuo Yu and Lu Wang
Future Internet 2022, 14(3), 86; https://doi.org/10.3390/fi14030086 - 13 Mar 2022
Cited by 1 | Viewed by 3114
Abstract
Industrial quality control is an important task. Most existing vision-based unsupervised industrial anomaly detection and segmentation methods require the training set to consist of normal samples only, which is difficult to ensure in practice. This paper proposes an unsupervised framework to solve the industrial anomaly detection and segmentation problem when the training set contains anomaly samples. Our framework uses a model pretrained on ImageNet as a feature extractor to extract patch-level features. After that, we propose a trimming method to estimate a robust Gaussian distribution based on the patch features at each position. Then, an iterative filtering process progressively removes the anomaly samples from the training set and re-estimates the Gaussian distribution at each position. In the prediction phase, the Mahalanobis distance between a patch feature vector and the center of the Gaussian distribution at the corresponding position is used as the anomaly score of that patch. The subsequent anomaly region segmentation is performed based on the patch anomaly scores. We tested the proposed method on three datasets containing anomaly samples and obtained state-of-the-art performance. Full article
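The core scoring step, a per-position Gaussian with trimming plus a Mahalanobis-distance anomaly score, can be sketched in NumPy. The trimming fraction, regularization, and shapes below are illustrative, and the single trimming pass stands in for the paper's iterative filtering.

```python
# Robust per-position Gaussian fit followed by Mahalanobis-distance scoring.
import numpy as np

def fit_trimmed_gaussian(feats: np.ndarray, trim: float = 0.1):
    """feats: (n_samples, dim) patch features at one spatial position."""
    mean = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 0.01 * np.eye(feats.shape[1])
    inv = np.linalg.inv(cov)
    d = np.einsum("nd,dk,nk->n", feats - mean, inv, feats - mean)
    keep = feats[d <= np.quantile(d, 1 - trim)]   # drop the most distant samples
    mean = keep.mean(axis=0)
    cov = np.cov(keep, rowvar=False) + 0.01 * np.eye(feats.shape[1])
    return mean, np.linalg.inv(cov)

def anomaly_score(patch: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> float:
    diff = patch - mean
    return float(np.sqrt(diff @ cov_inv @ diff))  # Mahalanobis distance

train = np.random.default_rng(0).normal(size=(500, 32))
mean, cov_inv = fit_trimmed_gaussian(train)
print(anomaly_score(train[0], mean, cov_inv))
```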

16 pages, 3278 KiB  
Article
Early Detection of Dendroctonus valens Infestation with Machine Learning Algorithms Based on Hyperspectral Reflectance
by Bingtao Gao, Linfeng Yu, Lili Ren, Zhongyi Zhan and Youqing Luo
Remote Sens. 2022, 14(6), 1373; https://doi.org/10.3390/rs14061373 - 11 Mar 2022
Cited by 3 | Viewed by 3024
Abstract
The red turpentine beetle (Dendroctonus valens LeConte) has caused severe ecological and economic losses since its invasion into China. It is gradually spreading northeast, resulting in many Chinese pine (Pinus tabuliformis Carr.) deaths. Early detection of D. valens infestation (i.e., at the green attack stage) is the basis of control measures to prevent its outbreak and spread. This study examined the changes in spectral reflectance after the initial attack of D. valens. We also explored the possibility of detecting early D. valens infestation based on spectral vegetation indices and machine learning algorithms. The spectral reflectance of infested trees was significantly different from that of healthy trees (p < 0.05), with an obvious decrease in the near-infrared region (760–1386 nm; p < 0.01). Spectral vegetation indices were input into three machine learning classifiers; the classification accuracy was 72.5–80%, while the sensitivity was 65–85%. Several spectral vegetation indices (DID, CUR, TBSI, DDn2, D735, SR1, NSMI, RNIR•CRI550 and RVSI) were sensitive indicators for the early detection of D. valens damage. Our results demonstrate that remote sensing technology can be successfully applied to the early detection of D. valens infestation and clarify the sensitive spectral regions and vegetation indices, which has important implications for early detection based on unmanned aerial vehicle and satellite data. Full article

21 pages, 9459 KiB  
Article
SR-Net: Saliency Region Representation Network for Vehicle Detection in Remote Sensing Images
by Fanfan Liu, Wenzhe Zhao, Guangyao Zhou, Liangjin Zhao and Haoran Wei
Remote Sens. 2022, 14(6), 1313; https://doi.org/10.3390/rs14061313 - 09 Mar 2022
Cited by 1 | Viewed by 1924
Abstract
Vehicle detection in remote sensing imagery is a challenging task because of its inherent attributes, e.g., dense parking, small sizes, various angles, etc. Prevalent vehicle detectors adopt an oriented/rotated bounding box as a basic representation, which needs to apply a distance regression of height, width, and angles of objects. These distance-regression-based detectors suffer from two challenges: (1) the periodicity of the angle causes a discontinuity of regression values, and (2) small regression deviations may also cause objects to be missed. To this end, in this paper, we propose a new vehicle modeling strategy, i.e., regarding each vehicle-rotated bounding box as a saliency area. Based on the new representation, we propose SR-Net (saliency region representation network), which transforms the vehicle detection task into a saliency object detection task. The proposed SR-Net, running in a distance (e.g., height, width, and angle)-regression-free way, can generate more accurate detection results. Experiments show that SR-Net outperforms prevalent detectors on multiple benchmark datasets. Specifically, our model yields 52.30%, 62.44%, 68.25%, and 55.81% in terms of AP on DOTA, UCAS-AOD, DLR 3K Munich, and VEDAI, respectively. Full article

18 pages, 6962 KiB  
Article
Depth-Wise Separable Convolution Attention Module for Garbage Image Classification
by Fucong Liu, Hui Xu, Miao Qi, Di Liu, Jianzhong Wang and Jun Kong
Sustainability 2022, 14(5), 3099; https://doi.org/10.3390/su14053099 - 07 Mar 2022
Cited by 16 | Viewed by 3250
Abstract
Currently, how to deal with the massive amount of garbage produced by various human activities is a pressing issue all around the world. A preliminary and essential step is to classify the garbage into different categories. However, the mainstream waste classification mode relies heavily on manual work, which consumes a lot of labor and is very inefficient. With the rapid development of deep learning, convolutional neural networks (CNNs) have been successfully applied to various application fields. Therefore, some researchers have directly adopted CNNs to classify garbage through images. However, compared with other images, garbage images have their own characteristics (such as inter-class similarity, intra-class variance and complex backgrounds), and neglecting these characteristics impairs the classification accuracy of CNNs. To overcome the limitations of existing garbage image classification methods, a Depth-wise Separable Convolution Attention Module (DSCAM) is proposed in this paper. In DSCAM, the inherent relationships of channels and spatial positions in garbage image features are captured by two attention modules with depth-wise separable convolutions, so that our method can focus on important information and ignore interference. Moreover, we adopt a residual network as the backbone of DSCAM to enhance its discriminative ability. We conducted experiments on five garbage datasets. The experimental results demonstrate that the proposed method can effectively classify garbage images and outperforms several classical methods. Full article
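The building block the module is named after, a depth-wise separable convolution driving an attention map, can be sketched in PyTorch. The layout below shows only the mechanism and is not the paper's exact module.

```python
# Depth-wise separable convolution used inside a simple spatial-attention gate.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SpatialAttention(nn.Module):
    """Weight each spatial position by a map produced with a separable conv."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = DepthwiseSeparableConv(channels, 1)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))   # (B, C, H, W) * (B, 1, H, W)

out = SpatialAttention(64)(torch.randn(2, 64, 56, 56))
```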

14 pages, 830 KiB  
Article
A Deep Learning Framework for Multimodal Course Recommendation Based on LSTM+Attention
by Xinwei Ren, Wei Yang, Xianliang Jiang, Guang Jin and Yan Yu
Sustainability 2022, 14(5), 2907; https://doi.org/10.3390/su14052907 - 02 Mar 2022
Cited by 15 | Viewed by 2854
Abstract
With the impact of COVID-19 on education, online education is booming, enabling learners to access various courses. However, due to course overload and redundant information, it is challenging for users to quickly locate courses they are interested in when faced with a massive number of courses. To solve this problem, we propose a deep course recommendation model with multimodal feature extraction based on the Long Short-Term Memory network (LSTM) and an Attention mechanism. The model fuses multimodal features extracted from course video, audio, title, and introduction. To build a complete learner portrait, user demographic information and explicit and implicit feedback data were added. We conducted extensive experiments on real datasets, and the results show that the model achieved an AUC of 79.89%, significantly higher than that of similar algorithms, and can provide users with more accurate recommendation results in course recommendation scenarios. Full article

13 pages, 963 KiB  
Article
Parallel Particle Swarm Optimization Using Apache Beam
by Jie Liu, Tao Zhu, Yang Zhang and Zhenyu Liu
Information 2022, 13(3), 119; https://doi.org/10.3390/info13030119 - 28 Feb 2022
Cited by 4 | Viewed by 2294
Abstract
The majority of complex research problems can be formulated as optimization problems. The Particle Swarm Optimization (PSO) algorithm is very effective in solving optimization problems because of its robustness, simplicity, and global search capabilities. Since the computational cost of these problems is usually high, it has been necessary to develop parallelized optimization algorithms. With the advent of big-data technology, such problems can be solved by distributed parallel computing. In previous related work, MapReduce (a programming model that implements a distributed parallel approach to processing and producing large datasets on a cluster) has been used to parallelize the PSO algorithm, but frequent file reads and writes make the execution time of MapReduce-based PSO (MRPSO) very long. We propose Apache Beam particle swarm optimization (BPSO), which uses the Apache Beam parallel programming model. In our experiments, we compared BPSO and MRPSO on four benchmark functions while varying the number of particles and the dimensionality of the problem. The experimental results show that the execution time of MRPSO remains largely constant while the number of particles is small (<1000) but increases rapidly once the number of particles exceeds a certain amount (>1000), whereas the execution time of BPSO grows slowly, and BPSO tends to yield better results than MRPSO. As the dimensionality of the optimization problem increases, BPSO takes half the time of MRPSO and obtains better results. MRPSO requires more execution time than BPSO as problem complexity varies, but neither algorithm is very sensitive to problem complexity. All program code and input data have been uploaded to GitHub. Full article
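One fitness-evaluation sweep of a swarm maps naturally onto a Beam pipeline. The sketch below runs on the local runner with a benchmark sphere function; a full BPSO would add iterative velocity and position updates around this stage, so this shows only the parallelized evaluation step, not the paper's complete algorithm.

```python
# Evaluate the fitness of a swarm of particles as an Apache Beam pipeline stage.
import random
import apache_beam as beam

DIM = 10

def sphere(particle):
    """Benchmark fitness function f(x) = sum(x_i^2)."""
    idx, position = particle
    return idx, sum(x * x for x in position)

swarm = [(i, [random.uniform(-5, 5) for _ in range(DIM)]) for i in range(1000)]

with beam.Pipeline() as p:
    (
        p
        | "CreateSwarm" >> beam.Create(swarm)
        | "Fitness" >> beam.Map(sphere)           # evaluated in parallel by the runner
        | "GlobalBest" >> beam.CombineGlobally(
            lambda fits: min(fits, key=lambda t: t[1])).without_defaults()
        | "Print" >> beam.Map(print)              # (best particle id, best fitness)
    )
```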

19 pages, 434 KiB  
Article
Electromagnetic Signal Classification Based on Class Exemplar Selection and Multi-Objective Linear Programming
by Huaji Zhou, Jing Bai, Linchun Niu, Jie Xu, Zhu Xiao, Shilian Zheng, Licheng Jiao and Xiaoniu Yang
Remote Sens. 2022, 14(5), 1177; https://doi.org/10.3390/rs14051177 - 27 Feb 2022
Cited by 3 | Viewed by 1950
Abstract
In the increasingly complex electromagnetic environment, a variety of new signal types are appearing; however, existing electromagnetic signal classification (ESC) models cannot handle new signal types. In this context, class-incremental learning has emerged, which aims to incrementally update the classification model as new categories appear. In this paper, an electromagnetic signal classification framework based on class exemplar selection and a multi-objective linear programming classifier (CES-MOLPC) is proposed in order to continuously learn new classes in an incremental manner. Specifically, our approach involves the adaptive selection of class exemplars based on normalized mutual information, together with a multi-objective linear programming classifier. The former maintains the classification capability of the model for previous categories by selecting key samples, while the latter allows the model to adapt quickly to new categories. Meanwhile, a weighted loss function based on cross-entropy and distillation loss is presented in order to fine-tune the model. We demonstrate the effectiveness of the proposed CES-MOLPC method through extensive experiments on the public RML2016.04c dataset and the large-scale real-world ACARS signal dataset. The results of the comparative experiments demonstrate that our method achieves significant improvements over state-of-the-art methods. Full article
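A weighted loss combining cross-entropy with distillation over previously learned classes can be sketched as follows; the mixing weight and temperature are illustrative, not the paper's settings.

```python
# Cross-entropy on new-class labels plus distillation against the old model.
import torch
import torch.nn.functional as F

def incremental_loss(new_logits, old_logits, targets, n_old: int,
                     alpha: float = 0.5, T: float = 2.0):
    ce = F.cross_entropy(new_logits, targets)      # learn the new categories
    # distill only over the logits of previously learned classes
    soft_new = F.log_softmax(new_logits[:, :n_old] / T, dim=1)
    soft_ref = F.softmax(old_logits[:, :n_old] / T, dim=1)
    kd = F.kl_div(soft_new, soft_ref, reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kd

new_logits = torch.randn(8, 12, requires_grad=True)   # 10 old + 2 new classes
old_logits = torch.randn(8, 10)
loss = incremental_loss(new_logits, old_logits, torch.randint(0, 12, (8,)), n_old=10)
loss.backward()
```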

19 pages, 2475 KiB  
Article
Graph-Based Embedding Smoothing Network for Few-Shot Scene Classification of Remote Sensing Images
by Zhengwu Yuan, Wendong Huang, Chan Tang, Aixia Yang and Xiaobo Luo
Remote Sens. 2022, 14(5), 1161; https://doi.org/10.3390/rs14051161 - 26 Feb 2022
Cited by 14 | Viewed by 2341
Abstract
As a fundamental task in the field of remote sensing, scene classification is increasingly attracting attention. The most popular way to solve scene classification is to train a deep neural network with a large-scale remote sensing dataset. However, given a small amount of data, how to train a deep neural network with outstanding performance remains a challenge. Existing methods seek to take advantage of transfer knowledge or meta-knowledge to resolve the scene classification issue of remote sensing images with a handful of labeled samples while ignoring various class-irrelevant noises existing in scene features and the specificity of different tasks. For this reason, in this paper, an end-to-end graph neural network is presented to enhance the performance of scene classification in few-shot scenarios, referred to as the graph-based embedding smoothing network (GES-Net). Specifically, GES-Net adopts an unsupervised non-parametric regularizer, called embedding smoothing, to regularize embedding features. Embedding smoothing can capture high-order feature interactions in an unsupervised manner, which is adopted to remove undesired noises from embedding features and yields smoother embedding features. Moreover, instead of the traditional sample-level relation representation, GES-Net introduces a new task-level relation representation to construct the graph. The task-level relation representation can capture the relations between nodes from the perspective of the whole task rather than only between samples, which can highlight subtle differences between nodes and enhance the discrimination of the relations between nodes. Experimental results on three public remote sensing datasets, UC Merced, WHU-RS19, and NWPU-RESISC45, showed that the proposed GES-Net approach obtained state-of-the-art results in the settings of limited labeled samples. Full article
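Embedding smoothing as described, propagating each embedding toward its neighbors through a normalized affinity matrix, can be sketched in a few lines; the affinity construction and mixing weight below are illustrative.

```python
# Smooth embeddings by mixing each one with the affinity-weighted average of its neighbors.
import torch

def smooth_embeddings(feats: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """feats: (n, d) embeddings for all samples of one few-shot task."""
    feats = torch.nn.functional.normalize(feats, dim=1)
    affinity = torch.relu(feats @ feats.T)            # cosine similarities
    affinity.fill_diagonal_(0)
    deg = affinity.sum(dim=1, keepdim=True).clamp(min=1e-8)
    propagated = (affinity / deg) @ feats             # average of neighbors
    return (1 - alpha) * feats + alpha * propagated   # smoothed embeddings

smoothed = smooth_embeddings(torch.randn(25, 64))
```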

16 pages, 2448 KiB  
Article
Tuberculosis Bacteria Detection and Counting in Fluorescence Microscopy Images Using a Multi-Stage Deep Learning Pipeline
by Marios Zachariou, Ognjen Arandjelović, Wilber Sabiiti, Bariki Mtafya and Derek Sloan
Information 2022, 13(2), 96; https://doi.org/10.3390/info13020096 - 18 Feb 2022
Cited by 14 | Viewed by 4139
Abstract
The manual observation of sputum smears by fluorescence microscopy for the diagnosis and treatment monitoring of patients with tuberculosis (TB) is a laborious and subjective task. In this work, we introduce an automatic pipeline which employs a novel deep learning-based approach to rapidly detect Mycobacterium tuberculosis (Mtb) organisms in sputum samples and thus quantify the burden of the disease. Fluorescence microscopy images are used as input in a series of networks, which ultimately produces a final count of present bacteria more quickly and consistently than manual analysis by healthcare workers. The pipeline consists of four stages: annotation by cycle-consistent generative adversarial networks (GANs), extraction of salient image patches, classification of the extracted patches, and finally, regression to yield the final bacteria count. We empirically evaluate the individual stages of the pipeline as well as perform a unified evaluation on previously unseen data that were given ground-truth labels by an experienced microscopist. We show that with no human intervention, the pipeline can provide the bacterial count for a sample of images with an error of less than 5%. Full article

18 pages, 972 KiB  
Article
The Ethical Governance for the Vulnerability of Care Robots: Interactive-Distance-Oriented Flexible Design
by Zhengqing Zhang, Chenggang Zhang and Xiaomeng Li
Sustainability 2022, 14(4), 2303; https://doi.org/10.3390/su14042303 - 17 Feb 2022
Cited by 1 | Viewed by 1916
Abstract
The application of care robots is currently a widely accepted solution to the problem of aging. However, for elderly groups who live in communal residences and share intelligent devices, care robots create intimacy and assistance dilemmas in the relationship between humans and non-human agents. This is an information-assisted machine setting, and the resulting design ethics issues are brought about by the binary values of human and machine, body and mind. The notion of "vulnerability" in risk ethics shows that the ethical problems of human institutions stem from increased dependence and obstructed intimacy, which are essentially caused by a greater degree of ethical risk exposure and the restriction of agency. Based on value-sensitive design, care ethics and machine ethics, this paper proposes a flexible, interaction-distance-oriented design and reworks the ethical design of care robots with intentional distance, representational distance and interpretive distance as indicators. The main purpose is to advocate a new type of human–machine interaction relationship emphasizing diversity and physical interaction. Full article

26 pages, 827 KiB  
Article
Digital Paradox: Platform Economy and High-Quality Economic Development—New Evidence from Provincial Panel Data in China
by Guoge Yang, Feng Deng, Yifei Wang and Xianhong Xiang
Sustainability 2022, 14(4), 2225; https://doi.org/10.3390/su14042225 - 16 Feb 2022
Cited by 20 | Viewed by 3413
Abstract
Based on provincial panel data of China from 2011 to 2019, this paper discusses the influence and mechanism of the platform economy on the high-quality development of regional economies. It is found that the platform economy has an inverted U-shaped impact on the high-quality development of regional economies. On the left side of the inverted U-shaped inflection point, the platform economy plays a significant role in promoting high-quality economic development; on the right side of the inflection point, it has an obvious inhibitory effect. Statistical analysis showed that 85% of the observations fell on the left side of the inflection point, indicating that China's platform economy as a whole is in the early stages of development. A grouping test by the strength of government intervention found that the platform economy has an inverted U-shaped effect only on the high-quality development of areas with weak intervention; in terms of the coefficients, however, the platform economy has a greater promoting effect on the high-quality development of areas with strong intervention. A grouping test by the quality of the market system found that the inverted U-shaped curve is steeper in areas with higher institutional quality, indicating that, in the early stage of development, the platform economy has a greater promoting effect on the high-quality development of areas with well-developed institutions. In addition, an analysis of regional heterogeneity showed that, in the early stage of development, the promoting effect of the platform economy on the high-quality development of the northeastern and western regions is more significant. After exceeding the threshold, the platform economy has an inhibitory effect on the high-quality development of all regions. The mechanism test shows that technology, talent, and capital can all play a positive regulatory role in the initial stage of development; after exceeding the threshold, platform economic monopoly may restrain high-quality economic development by hindering technological progress and causing a mismatch of labor–capital elements and resources. Full article

19 pages, 6477 KiB  
Article
HyperLiteNet: Extremely Lightweight Non-Deep Parallel Network for Hyperspectral Image Classification
by Jianing Wang, Runhu Huang, Siying Guo, Linhao Li, Zhao Pei and Bo Liu
Remote Sens. 2022, 14(4), 866; https://doi.org/10.3390/rs14040866 - 11 Feb 2022
Cited by 4 | Viewed by 2085
Abstract
Deep learning (DL) is widely applied in the field of hyperspectral image (HSI) classification and has proved to be an extremely promising research technique. However, the deployment of DL-based HSI classification algorithms in mobile and embedded vision applications tends to be limited by massive parameters, high memory costs, and the complex networks of DL models. In this article, we propose a novel, extremely lightweight, non-deep parallel network (HyperLiteNet) to address these issues. In line with the development trends of hardware devices, the proposed HyperLiteNet replaces a deep network with a parallel structure, offering fewer sequential computations and lower latency. The parallel structure can extract and optimize the diverse and divergent spatial and spectral features independently. Meanwhile, an elaborately designed feature-interaction module is constructed to acquire and fuse generalized abstract spectral and spatial features in different parallel layers. A lightweight dynamic convolution further compresses the memory footprint of the network to realize flexible spatial feature extraction. Experiments on several real HSI datasets confirm that the proposed HyperLiteNet can efficiently decrease the number of parameters and the execution time while achieving better classification performance than several recent state-of-the-art algorithms. Full article

14 pages, 1783 KiB  
Article
Semantic Segmentation of Metoceanic Processes Using SAR Observations and Deep Learning
by Aurélien Colin, Ronan Fablet, Pierre Tandeo, Romain Husson, Charles Peureux, Nicolas Longépé and Alexis Mouche
Remote Sens. 2022, 14(4), 851; https://doi.org/10.3390/rs14040851 - 11 Feb 2022
Cited by 11 | Viewed by 2451
Abstract
Through the Synthetic Aperture Radar (SAR) embarked on the satellites Sentinel-1A and Sentinel-1B of the Copernicus program, a large quantity of observations is routinely acquired over the oceans. A wide range of features of both oceanic (e.g., biological slicks, icebergs, etc.) and meteorological origin (e.g., rain cells, wind streaks, etc.) are distinguishable in these acquisitions. This paper studies the semantic segmentation of ten metoceanic processes, either in the context of a large quantity of image-level groundtruths (i.e., a weakly-supervised framework) or of scarce pixel-level groundtruths (i.e., a fully-supervised framework). Our main result is that a fully-supervised model outperforms any tested weakly-supervised algorithm. Adding more segmentation examples to the training set would further increase the precision of the predictions. Trained on 20 × 20 km imagettes acquired from the WV acquisition mode of the Sentinel-1 mission, the model is shown to generalize, under some assumptions, to wide-swath SAR data, which further extends its application domain to coastal areas. Full article

39 pages, 794 KiB  
Review
A Survey on Text Classification Algorithms: From Text to Predictions
by Andrea Gasparetto, Matteo Marcuzzo, Alessandro Zangari and Andrea Albarelli
Information 2022, 13(2), 83; https://doi.org/10.3390/info13020083 - 11 Feb 2022
Cited by 47 | Viewed by 14284
Abstract
In recent years, the exponential growth of digital documents has been met by rapid progress in text classification techniques. Newly proposed machine learning algorithms leverage the latest advancements in deep learning methods, allowing for the automatic extraction of expressive features. The swift development of these methods has led to a plethora of strategies to encode natural language into machine-interpretable data. The latest language modelling algorithms are used in conjunction with ad hoc preprocessing procedures, of which the description is often omitted in favour of a more detailed explanation of the classification step. This paper offers a concise review of recent text classification models, with emphasis on the flow of data, from raw text to output labels. We highlight the differences between earlier methods and more recent, deep learning-based methods in both their functioning and in how they transform input data. To give a better perspective on the text classification landscape, we provide an overview of datasets for the English language, as well as supplying instructions for the synthesis of two new multilabel datasets, which we found to be particularly scarce in this setting. Finally, we provide an outline of new experimental results and discuss the open research challenges posed by deep learning-based language models. Full article

18 pages, 72314 KiB  
Article
Tropical Cyclone Intensity Estimation Using Himawari-8 Satellite Cloud Products and Deep Learning
by Jinkai Tan, Qidong Yang, Junjun Hu, Qiqiao Huang and Sheng Chen
Remote Sens. 2022, 14(4), 812; https://doi.org/10.3390/rs14040812 - 09 Feb 2022
Cited by 13 | Viewed by 4401
Abstract
This study develops an objective deep-learning-based model for tropical cyclone (TC) intensity estimation. The model's basic structure is a convolutional neural network (CNN), a widely used technology in computer vision tasks. In order to optimize the model's structure and improve its feature extraction ability, both residual learning and attention mechanisms are embedded into the model. Five cloud products, including cloud optical thickness, cloud top temperature, cloud top height, cloud effective radius, and cloud type, which are level-2 products from the geostationary satellite Himawari-8, are used as the model training inputs. We sampled the cloud products under 13 rotational angles for each TC to augment the training dataset. For the independent test data, the model achieves a relatively low RMSE of 4.06 m/s and a mean absolute error (MAE) of 3.23 m/s, which are comparable to the results of previous studies. Various cloud organization patterns, storm whirling patterns, and TC structures from the feature maps are presented to interpret the model training process. An analysis of the overestimation and underestimation biases shows that the model's performance is strongly affected by the initial cloud products. Moreover, several controlled experiments using other deep learning architectures demonstrate that our designed model is well suited to estimating TC intensity, providing insight into the forecasting of other TC metrics. Full article
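The rotational augmentation is simple to reproduce. The sketch below rotates a multi-channel cloud-product array through 13 evenly spaced angles with SciPy; the array shape and interpolation settings are illustrative.

```python
# Augment one TC scene by rotating the stacked cloud products around its center.
import numpy as np
from scipy.ndimage import rotate

def augment_rotations(cloud_products: np.ndarray, n_angles: int = 13):
    """cloud_products: (H, W, C) stack of level-2 products for one TC scene."""
    for angle in np.linspace(0, 360, n_angles, endpoint=False):
        yield rotate(cloud_products, angle, axes=(0, 1), reshape=False, order=1)

samples = list(augment_rotations(np.random.rand(128, 128, 5)))
print(len(samples), samples[0].shape)   # 13 rotated copies, same shape
```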

26 pages, 6719 KiB  
Article
A Mixed Ensemble Learning and Time-Series Methodology for Category-Specific Vehicular Energy and Emissions Modeling
by Ehsan Moradi and Luis Miranda-Moreno
Sustainability 2022, 14(3), 1900; https://doi.org/10.3390/su14031900 - 07 Feb 2022
Cited by 2 | Viewed by 1977
Abstract
The serially-correlated nature of engine operation is overlooked in the vehicular fuel and emission modeling literature. Furthermore, enabling the calibration and use of time-series models for instrument-independent eco-driving applications requires reliable forecast aggregation procedures. To this end, an ensemble time-series machine-learning methodology is developed using data collected through extensive field experiments on a fleet of 35 vehicles. Among other results, it is found that Long Short-Term Memory (LSTM) architecture is the best fit for capturing the dynamic and lagged effects of speed, acceleration, and grade on fuel and emission rates. The developed vehicle-specific ensembles outperformed state-of-the-practice benchmark models by a significant margin and the category-specific models outscored the vehicle-specific sub-models by an average margin of 6%. The results qualify the developed ensembles to work as representatives for vehicle categories and allows them to be utilized in both eco-driving services as well as environmental assessment modules. Full article

13 pages, 4358 KiB  
Article
A Hybrid Robust-Learning Architecture for Medical Image Segmentation with Noisy Labels
by Jialin Shi, Chenyi Guo and Ji Wu
Future Internet 2022, 14(2), 41; https://doi.org/10.3390/fi14020041 - 26 Jan 2022
Cited by 4 | Viewed by 2781
Abstract
Deep-learning models require large amounts of accurately labeled data. However, for medical image segmentation, high-quality labels rely on expert experience, and less-experienced operators provide noisy labels. How one might mitigate the negative effects caused by noisy labels for 3D medical image segmentation has not been fully investigated. In this paper, our purpose is to propose a novel hybrid robust-learning architecture to combat noisy labels for 3D medical image segmentation. Our method consists of three components. First, we focus on the noisy annotations of slices and propose a slice-level label-quality awareness method, which automatically generates label-quality scores for slices in a set. Second, we propose a shape-awareness regularization loss based on distance transform maps to introduce prior shape information and provide extra performance gains. Third, based on a re-weighting strategy, we propose an end-to-end hybrid robust-learning architecture to weaken the negative effects caused by noisy labels. Extensive experiments are performed on two representative datasets (i.e., liver segmentation and multi-organ segmentation). Our hybrid noise-robust architecture has shown competitive performance, compared to other methods. Ablation studies also demonstrate the effectiveness of slice-level label-quality awareness and a shape-awareness regularization loss for combating noisy labels. Full article
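A distance-transform-based shape regularizer of the general kind described can be sketched with SciPy and PyTorch; this is one common formulation, not necessarily the paper's exact loss.

```python
# Penalize predicted foreground mass lying far from the reference shape,
# measured with a Euclidean distance transform of the label mask.
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def shape_loss(pred_probs: torch.Tensor, label: np.ndarray) -> torch.Tensor:
    """pred_probs: (H, W) foreground probabilities; label: (H, W) binary mask."""
    dist = distance_transform_edt(1 - label)      # 0 inside the shape, grows outside
    dist_map = torch.from_numpy(dist).float()
    return (pred_probs * dist_map).mean()         # mass outside the shape is penalized

label = np.zeros((64, 64), dtype=np.uint8)
label[20:40, 20:40] = 1
probs = torch.rand(64, 64, requires_grad=True)
loss = shape_loss(probs, label)
loss.backward()
```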

17 pages, 750 KiB  
Article
AR-Sanad 280K: A Novel 280K Artificial Sanads Dataset for Hadith Narrator Disambiguation
by Somaia Mahmoud, Omar Saif, Emad Nabil, Mohammad Abdeen, Mustafa ElNainay and Marwan Torki
Information 2022, 13(2), 55; https://doi.org/10.3390/info13020055 - 23 Jan 2022
Cited by 6 | Viewed by 6187
Abstract
Determining hadith authenticity is vitally important in the Islamic religion because hadiths record the sayings and actions of Prophet Muhammad (PBUH), and they are the second source of Islamic teachings following the Quran. When authenticating a hadith, the reliability of the hadith narrators is a major factor that hadith scholars consider. However, many narrators share similar names, and narrators' full names are not usually included in the narration chains of hadiths. Thus, ambiguous narrators first need to be identified; then, their reliability level can be determined. There are no available datasets that could help address this problem of identifying narrators. Here, we present a new dataset that contains narration chains (sanads) with identified narrators. The AR-Sanad 280K dataset has around 280K artificial sanads and can be used to identify 18,298 narrators. After creating the AR-Sanad 280K dataset, we address narrator disambiguation in several experimental setups. Hadith narrator disambiguation is modeled as a multiclass classification problem with 18,298 class labels. We test different representations and models in our experiments. The best results were achieved by fine-tuning a BERT-based deep learning model (AraBERT). We obtained a micro F1 score of 92.9 and a sanad error rate (SER) of 30.2 on the validation set of our artificial AR-Sanad 280K dataset. Furthermore, we extracted a real test set from the sanads of the famous six books of Islamic hadith. Evaluating the best model on the real test data, we achieved a micro F1 score of 83.5 and a sanad error rate of 60.6. Full article
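Framing narrator disambiguation as flat multiclass classification over 18,298 labels can be sketched with the Hugging Face transformers library. The checkpoint name and usage below are illustrative of the setup; the randomly initialized classification head would still have to be fine-tuned on the AR-Sanad data.

```python
# Multiclass sequence classification with an Arabic BERT checkpoint.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

checkpoint = "aubmindlab/bert-base-arabert"   # an AraBERT checkpoint on the HF hub
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=18298              # one class per candidate narrator
)

batch = tokenizer(["... sanad text with the ambiguous narrator marked ..."],
                  return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1))                  # predicted narrator id (untrained head here)
```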
(This article belongs to the Topic Big Data and Artificial Intelligence)
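For readers who want to picture the classification setup, the following hedged sketch fine-tunes a BERT-based Arabic model for 18,298-way narrator classification with the Hugging Face transformers library. The checkpoint name and the placeholder sanad text are our assumptions; the authors' exact training configuration may differ.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "aubmindlab/bert-base-arabertv02"    # an AraBERT checkpoint (assumed)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=18298)                 # one class label per narrator

sanad = "حدثنا يحيى بن يحيى عن مالك عن نافع"   # placeholder narration chain
inputs = tokenizer(sanad, return_tensors="pt",
                   truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits           # (1, 18298)
print(logits.argmax(dim=-1).item())           # predicted narrator id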
17 pages, 739 KiB  
Article
Reliability Assurance Dynamic SSC Placement Using Reinforcement Learning
by Wei Li, Yuan Jiang, Xiaoliang Zhang, Fangfang Dang, Feng Gao, Haomin Wang and Qi Fan
Information 2022, 13(2), 53; https://doi.org/10.3390/info13020053 - 21 Jan 2022
Cited by 2 | Viewed by 2263
Abstract
Software-defined networking (SDN) and network function virtualization (NFV) make a network programmable, resulting in a more flexible and agile network. An important and promising application of these two technologies is network security, where they can dynamically chain virtual security functions (VSFs), such as firewalls, intrusion detection systems, and intrusion prevention systems, and thus inspect, monitor, or filter traffic flows in cloud data center networks. In view of the strict delay constraints of security services and the high failure probability of VSFs, we propose a latency-aware security service chain (SSC) orchestration algorithm with reliability assurance (LARA). This algorithm includes an SSC orchestration module and a VSF backup module. We first use a reinforcement learning (RL)-based Q-learning algorithm to achieve efficient SSC orchestration and reduce the end-to-end delay of services. We then measure the importance of the physical nodes carrying VSF instances and back up VSFs according to node importance. Extensive simulation results indicate that the LARA algorithm is more effective at reducing delay and ensuring reliability than other algorithms. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
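The RL component can be read as ordinary tabular Q-learning over placement decisions. A minimal sketch follows; the state and action spaces and the delay-based reward are simplified stand-ins for the paper's formulation, not its actual environment model.

```python
import numpy as np

n_states, n_actions = 100, 10    # e.g., partial placements x candidate nodes
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(state, action):
    """Hypothetical environment: returns (next_state, reward), where the
    reward decreases with the end-to-end delay the action incurs."""
    delay = np.random.rand()                 # placeholder delay model
    return (state + 1) % n_states, -delay

state = 0
for _ in range(10000):
    action = (np.random.randint(n_actions) if np.random.rand() < eps
              else int(Q[state].argmax()))   # epsilon-greedy exploration
    nxt, reward = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max()
                                 - Q[state, action])
    state = nxt
```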
36 pages, 1458 KiB  
Article
Comparison of the Novel Probabilistic Self-Optimizing Vectorized Earth Observation Retrieval Classifier with Common Machine Learning Algorithms
by Jan Pawel Musial and Jedrzej Stanislaw Bojanowski
Remote Sens. 2022, 14(2), 378; https://doi.org/10.3390/rs14020378 - 14 Jan 2022
Cited by 5 | Viewed by 2175
Abstract
The Vectorized Earth Observation Retrieval (VEOR) algorithm is a novel algorithm suited to the efficient supervised classification of large Earth Observation (EO) datasets. VEOR addresses shortcomings of well-established machine learning methods with an emphasis on numerical performance. Its characteristics include (1) derivation of classification probability; (2) objective selection of classification features that maximize Cohen’s kappa coefficient (κ) derived from iterative “leave-one-out” cross-validation; (3) reduced sensitivity of the classification results to imbalanced classes; (4) smoothing of the classification probability field to reduce noise/mislabeling; (5) numerically efficient retrieval based on a pre-computed look-up vector (LUV); and (6) separate parametrization of the algorithm for each discrete feature class (e.g., land cover). Within this study, the performance of the VEOR classifier was compared to that of other commonly used machine learning algorithms: K-nearest neighbors, support vector machines, Gaussian processes, decision trees, random forests, artificial neural networks, AdaBoost, naive Bayes, and quadratic discriminant analysis. Firstly, the comparison was performed using synthetic two-dimensional (2D) datasets featuring different sample sizes, levels of noise (i.e., mislabeling), and class imbalance. Secondly, the same experiments were repeated for 7D datasets consisting of informative, redundant, and insignificant features. Ultimately, the benchmarking of the classifiers involved cloud discrimination using MODIS satellite spectral measurements and a reference cloud mask derived from combined CALIOP lidar and CPR radar data. The results revealed that the proposed VEOR algorithm accurately discriminated cloud cover using MODIS data and accurately classified large synthetic datasets with low or moderate levels of noise and class imbalance. In contrast, VEOR did not classify significantly distorted or small datasets well. Nevertheless, the comparisons showed that VEOR was among the three or four most accurate classifiers and that it can be applied to large Earth Observation datasets. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
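Characteristic (2), feature selection that maximizes Cohen's kappa, can be illustrated with a greedy forward search using scikit-learn. The base classifier and the 5-fold validation (in place of the paper's leave-one-out scheme) are our simplifications, and VEOR's look-up-vector machinery is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=7,
                           n_informative=3, random_state=0)
selected, remaining = [], list(range(X.shape[1]))
best_kappa = -1.0
while remaining:
    scored = []
    for f in remaining:
        cols = selected + [f]
        # 5-fold CV here for brevity; the paper uses leave-one-out.
        pred = cross_val_predict(KNeighborsClassifier(), X[:, cols], y, cv=5)
        scored.append((cohen_kappa_score(y, pred), f))
    kappa, f = max(scored)
    if kappa <= best_kappa:
        break                        # no remaining feature improves kappa
    best_kappa = kappa
    selected.append(f)
    remaining.remove(f)
print(selected, round(best_kappa, 3))
```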
15 pages, 581 KiB  
Article
Dual Co-Attention-Based Multi-Feature Fusion Method for Rumor Detection
by Changsong Bing, Yirong Wu, Fangmin Dong, Shouzhi Xu, Xiaodi Liu and Shuifa Sun
Information 2022, 13(1), 25; https://doi.org/10.3390/info13010025 - 09 Jan 2022
Cited by 10 | Viewed by 2864
Abstract
Social media has become increasingly popular due to the wide use of instant messaging. Nevertheless, rumor propagation on social media has become an increasingly important issue. The purpose of this study is to investigate the impact of various social media features on rumor detection, to propose a dual co-attention-based multi-feature fusion method for rumor detection, and to explore the capability of the proposed method in early rumor detection tasks. The proposed BERT-based Dual Co-attention Neural Network (BDCoNN) uses BERT for word embedding and simultaneously integrates features from three sources: publishing user profiles, source tweets, and comments. In BDCoNN, user discrete features and identity descriptors in user profiles are extracted using a one-dimensional convolutional neural network (CNN) and TextCNN, respectively. A bidirectional gated recurrent unit network (BiGRU) with a hierarchical attention mechanism is used to learn the hidden-layer representations of the tweet sequence and the comment sequence. A dual collaborative attention mechanism explores the correlations among publishing user profiles, tweet content, and comments, and the resulting feature vector is fed into a classifier to identify the implicit differences between rumor spreaders and non-rumor spreaders. We conducted several experiments on the Weibo and CED datasets collected from microblogs. The results show that the proposed method achieves state-of-the-art performance compared with baseline methods, outperforming dEFEND by 5.2% and 5% on the two datasets, with F1 values increased by 4.4% and 4%, respectively. In addition, experiments on early rumor detection verify that the proposed method detects rumors more quickly and accurately than its competitors. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
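A minimal PyTorch sketch of a co-attention block between two feature sequences (say, tweet and comment representations) follows; the dimensions and the exact dual wiring inside BDCoNN are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)   # affinity projection

    def forward(self, a, b):
        # a: (B, La, D), b: (B, Lb, D)
        affinity = torch.bmm(self.W(a), b.transpose(1, 2))   # (B, La, Lb)
        attn_a = affinity.softmax(dim=2) @ b     # b-aware summaries of a
        attn_b = affinity.softmax(dim=1).transpose(1, 2) @ a # a-aware of b
        return attn_a.mean(dim=1), attn_b.mean(dim=1)        # (B, D) each

coatt = CoAttention(64)
tweets, comments = torch.randn(8, 20, 64), torch.randn(8, 30, 64)
va, vb = coatt(tweets, comments)   # fused vectors for a downstream classifier
```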
19 pages, 4437 KiB  
Article
Utilizing Half Convolutional Autoencoder to Generate User and Item Vectors for Initialization in Matrix Factorization
by Tan Nghia Duong, Nguyen Nam Doan, Truong Giang Do, Manh Hoang Tran, Duc Minh Nguyen and Quang Hieu Dang
Future Internet 2022, 14(1), 20; https://doi.org/10.3390/fi14010020 - 04 Jan 2022
Cited by 9 | Viewed by 2869
Abstract
Recommendation systems based on convolutional neural networks (CNNs) have attracted great attention due to their effectiveness in processing unstructured data such as images or audio. However, a huge amount of the raw data produced by data crawling and digital transformation is structured, which makes it difficult to exploit the advantages of CNNs. This paper introduces a novel autoencoder, named the Half Convolutional Autoencoder, which adopts convolutional layers to discover the high-order correlations between structured features in the form of Tag Genome, the side information associated with each movie in the MovieLens 20M dataset, in order to generate robust feature vectors. Subsequently, these new movie representations, along with users' characteristics generated from Tag Genome and their past transactions, are fed into well-known matrix factorization models to resolve the initialization problem and enhance prediction results. This method not only outperforms traditional matrix factorization techniques by at least 5.35% in terms of accuracy but also stabilizes the training process and guarantees faster convergence. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
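One plausible reading of the "half convolutional" design is a 1D-convolutional encoder over the structured feature vector paired with a plain dense decoder, with the latent code seeding the matrix factorization item factors. The sketch below follows that reading under stated assumptions; all layer sizes, and the 1128-wide Tag Genome input, are illustrative.

```python
import torch
import torch.nn as nn

class HalfConvAE(nn.Module):
    def __init__(self, n_features=1128, latent=64):  # feature width assumed
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(8, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * ((n_features + 3) // 4), latent))
        self.decoder = nn.Sequential(              # dense, not convolutional
            nn.Linear(latent, 512), nn.ReLU(),
            nn.Linear(512, n_features))

    def forward(self, x):                          # x: (B, n_features)
        z = self.encoder(x.unsqueeze(1))           # latent item vector
        return self.decoder(z), z                  # z can seed MF factors
```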
19 pages, 32735 KiB  
Article
Positive Unlabelled Learning for Satellite Images' Time Series Analysis: An Application to Cereal and Forest Mapping
by Johann Desloires, Dino Ienco, Antoine Botrel and Nicolas Ranc
Remote Sens. 2022, 14(1), 140; https://doi.org/10.3390/rs14010140 - 29 Dec 2021
Cited by 4 | Viewed by 2483
Abstract
Applications in which researchers aim to extract a single land type from remotely sensed data are quite common in practical scenarios: extracting the urban footprint to make connections with socio-economic factors, mapping the forest extent to subsequently retrieve biophysical variables, or detecting a particular crop type to calibrate and deploy yield prediction models. In such scenarios, the (positive) targeted class is well defined, while the negative class is difficult to describe. This one-class classification setting is also referred to as positive unlabelled learning (PUL) in the general field of machine learning. To deal with this challenging setting when satellite image time series data are available, we propose a new framework named positive and unlabelled learning of satellite image time series (PUL-SITS). PUL-SITS involves two different stages: in the first, a recurrent neural network autoencoder is trained to reconstruct only positive samples, so that unlabelled samples it reconstructs poorly can be highlighted as reliable negatives. In the second stage, both labelled and unlabelled samples are exploited in a semi-supervised manner to build the final binary classification model. To assess the quality of our approach, experiments were carried out on a real-world benchmark, namely Haute-Garonne, located in the southwest of France. From this study site, we considered two different scenarios: a first one in which the objective is to map Cereals/Oilseeds cover versus the rest of the land cover classes, and a second one in which the class of interest is the Forest land cover. The evaluation was carried out by comparing the proposed approach with recent competitors for the considered positive and unlabelled learning scenarios. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
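The first stage can be sketched as follows: train a recurrent autoencoder on positive time series only, then flag the unlabelled samples it reconstructs worst as reliable negatives. The GRU sizes, training loop, and percentile threshold below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SeqAE(nn.Module):
    def __init__(self, n_bands=10, hidden=32):
        super().__init__()
        self.enc = nn.GRU(n_bands, hidden, batch_first=True)
        self.dec = nn.Linear(hidden, n_bands)

    def forward(self, x):                  # x: (B, T, n_bands)
        h, _ = self.enc(x)
        return self.dec(h)                 # per-timestep reconstruction

ae = SeqAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
positives = torch.randn(256, 24, 10)       # placeholder positive SITS samples
for _ in range(100):                       # train on positives only
    opt.zero_grad()
    loss = ((ae(positives) - positives) ** 2).mean()
    loss.backward(); opt.step()

unlabelled = torch.randn(1000, 24, 10)
with torch.no_grad():
    err = ((ae(unlabelled) - unlabelled) ** 2).mean(dim=(1, 2))
    reliable_neg = unlabelled[err > err.quantile(0.9)]  # worst-reconstructed
```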
19 pages, 5402 KiB  
Article
TAE-Net: Task-Adaptive Embedding Network for Few-Shot Remote Sensing Scene Classification
by Wendong Huang, Zhengwu Yuan, Aixia Yang, Chan Tang and Xiaobo Luo
Remote Sens. 2022, 14(1), 111; https://doi.org/10.3390/rs14010111 - 28 Dec 2021
Cited by 22 | Viewed by 2507
Abstract
Recently, approaches based on deep learning have become quite prevalent in remote sensing scene classification. Although significant success has been achieved, these approaches still suffer from an excess of parameters and depend heavily on large quantities of labeled data. In this study, few-shot learning is used for remote sensing scene classification, the goal being to recognize unseen scene categories given extremely limited labeled samples. For this purpose, a novel task-adaptive embedding network, referred to as TAE-Net, is proposed to facilitate few-shot scene classification of remote sensing images. A feature encoder is first trained on the base set to learn embedding features of input images in the pre-training phase. Then, in the meta-training phase, a new task-adaptive attention module is designed to yield task-specific attention, which can adaptively select informative embedding features across the whole task. Finally, in the meta-testing phase, the query image drawn from the novel set is predicted by the meta-trained model with limited support images. Extensive experiments are carried out on three public remote sensing scene datasets: UC Merced, WHU-RS19, and NWPU-RESISC45. The experimental results show that the proposed TAE-Net achieves new state-of-the-art performance for few-shot remote sensing scene classification. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
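The meta-testing step is easy to picture with a prototype-style sketch: embed the support images, average them into class prototypes, and label the query by the nearest prototype. The task-adaptive attention module is abstracted away here, so this illustrates the episodic protocol, not TAE-Net itself.

```python
import torch

def classify_query(support, support_y, query, n_way):
    """support: (N*K, D) embeddings, support_y: (N*K,), query: (D,)."""
    protos = torch.stack([support[support_y == c].mean(dim=0)
                          for c in range(n_way)])           # (N, D)
    return int(torch.cdist(query[None], protos).argmin())   # nearest class

support = torch.randn(5 * 5, 128)                  # a 5-way 5-shot episode
support_y = torch.arange(5).repeat_interleave(5)
query = torch.randn(128)
print(classify_query(support, support_y, query, n_way=5))
```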
13 pages, 2308 KiB  
Article
Parallel Particle Swarm Optimization Based on Spark for Academic Paper Co-Authorship Prediction
by Congmin Yang, Tao Zhu, Yang Zhang, Huansheng Ning, Liming Chen and Zhenyu Liu
Information 2021, 12(12), 530; https://doi.org/10.3390/info12120530 - 20 Dec 2021
Cited by 2 | Viewed by 2641
Abstract
The particle swarm optimization (PSO) algorithm has been widely used in various optimization problems. Although PSO has been successful in many fields, solving optimization problems in big data applications often requires processing massive amounts of data, which cannot be handled by traditional PSO on a single machine. Several parallel PSO algorithms based on Spark have been proposed; however, almost all of them target numerical optimization problems, and few address big data optimization problems. In this paper, we propose a new Spark-based parallel PSO algorithm to predict the co-authorship of academic papers, which we formulate as an optimization problem over massive academic data. Experimental results show that the proposed parallel PSO achieves good prediction accuracy. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
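The parallelization pattern itself is compact: each iteration, the expensive fitness evaluations are distributed over the cluster while the swarm update stays on the driver. A sketch with a placeholder objective (not the paper's co-authorship fitness) follows.

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="parallel-pso-sketch")

def fitness(position):
    return float(np.sum(position ** 2))     # placeholder objective

dim, n_particles = 20, 64
pos = np.random.rand(n_particles, dim)
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.full(n_particles, np.inf)

for _ in range(50):
    # Distribute the expensive evaluations across the cluster.
    f = np.array(sc.parallelize(list(pos)).map(fitness).collect())
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()]
    r1, r2 = np.random.rand(2)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
```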
16 pages, 522 KiB  
Article
Could Government Data Openness Enhance Urban Innovation Capability? An Evaluation Based on Multistage DID Method
by Yi Luo, Zhiwei Tang and Peiqi Fan
Sustainability 2021, 13(23), 13495; https://doi.org/10.3390/su132313495 - 06 Dec 2021
Cited by 3 | Viewed by 2277
Abstract
The wave of open government data has gradually swept the world since it arose in the United States in 2009. The purpose is not merely to open government data but to release the value of data and drive economic and social development through data accessibility. At present, academic work on open government data has mostly remained at the level of theoretical discussion and lacks empirical tests. Using a multistage difference-in-differences (DID) model, this paper analyzes panel data from 2009 to 2016, taking the two batches of Chinese cities that opened their data in 2014 and 2015 as samples, to test the impact of government data openness on urban innovation capability. The results show that the opening of government data significantly improves urban innovation capability. After accounting for the heterogeneity of urban characteristics and for fixed effects, the opening of government data still significantly improves urban innovation capability, and it plays a greater innovation-driving role in cities with high levels of economic development, human capital, and infrastructure. On this basis, we argue that the opening of government data should continue to be promoted, the value of data released, and attention paid to the Matthew effect that may appear between cities in the era of big data. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
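A multistage (staggered-adoption) DID of this kind reduces to a two-way fixed-effects regression in which the treatment indicator switches on city by city. The statsmodels sketch below uses a toy panel with assumed variable names, not the authors' data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy panel: each city opens its data platform in open_year (9999 = never).
df = pd.DataFrame({
    "city":       ["A"] * 8 + ["B"] * 8 + ["C"] * 8,
    "year":       list(range(2009, 2017)) * 3,
    "open_year":  [2014] * 8 + [2015] * 8 + [9999] * 8,
    "innovation": [i * 0.1 for i in range(24)],   # placeholder outcome
})
df["treated_post"] = (df["year"] >= df["open_year"]).astype(int)

# treated_post turns on at different years per city: the multistage design.
model = smf.ols("innovation ~ treated_post + C(city) + C(year)",
                data=df).fit()
print(model.params["treated_post"])    # the DID estimate
```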
12 pages, 1893 KiB  
Article
Person Re-Identification by Low-Dimensional Features and Metric Learning
by Xingyuan Chen, Huahu Xu, Yang Li and Minjie Bian
Future Internet 2021, 13(11), 289; https://doi.org/10.3390/fi13110289 - 18 Nov 2021
Cited by 3 | Viewed by 2162
Abstract
Person re-identification (Re-ID) has attracted attention due to its wide range of applications. Most recent studies have focused on extracting deep features while ignoring color features, which can remain stable under variations in illumination and person pose. Few studies combine the powerful learning capabilities of deep learning with color features. We therefore use the advantages of both to design a model with low computational resource consumption and excellent performance for person re-identification. In this paper, we design a color feature containing relative spatial information, namely the color feature with spatial information. Bidirectional long short-term memory (BLSTM) networks with an attention mechanism are then used to capture the contextual relationships contained in the hand-crafted color features. Experiments demonstrate that the proposed model improves recognition performance compared with traditional methods. At the same time, hand-crafted features based on human prior knowledge not only reduce computational consumption compared with deep-learning methods but also make the model more interpretable. Full article
(This article belongs to the Topic Big Data and Artificial Intelligence)
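A color feature "containing relative spatial information" can be illustrated by computing HSV histograms per horizontal stripe and concatenating them, so the descriptor records where each color occurs. The stripe count and bin sizes below are illustrative assumptions, not the paper's exact design.

```python
import cv2
import numpy as np

def striped_color_feature(img_bgr, n_stripes=6, bins=(8, 8, 4)):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    h = hsv.shape[0] // n_stripes
    feats = []
    for i in range(n_stripes):
        stripe = hsv[i * h:(i + 1) * h]
        hist = cv2.calcHist([stripe], [0, 1, 2], None, list(bins),
                            [0, 180, 0, 256, 0, 256]).flatten()
        feats.append(hist / (hist.sum() + 1e-8))  # per-stripe normalization
    return np.concatenate(feats)   # (n_stripes * prod(bins),) vector

img = np.random.randint(0, 255, (128, 64, 3), dtype=np.uint8)
print(striped_color_feature(img).shape)    # (6 * 256,) = (1536,)
```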