
Innovative Applications of Artificial Intelligence in Multidisciplinary Sciences: Latest Advances and Prospects

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 May 2024) | Viewed by 11083

Special Issue Editors


Guest Editor
State Key Laboratory of Power Transmission Equipment Technology, School of Electrical Engineering, Chongqing University, Chongqing, China
Interests: signal processing; electromagnetic parameter measurement; deep learning and machine learning; fault diagnosis; multi-physical field modeling

Guest Editor
College of Electrical Engineering, Sichuan University, Chengdu 610065, China
Interests: power system protection and control; power quality of the DC distribution network; power system stability and control

Guest Editor Assistant
Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Interests: fault diagnosis and condition recognition; vibration and noise detection and sensing technology; multi-physical field coupling simulation and calculation

Special Issue Information

Dear Colleagues,

With the transformation of the digital age, artificial intelligence has developed rapidly and spread into multidisciplinary application fields such as power systems, the Internet, bioengineering, medicine and pharmacy, traffic engineering, industrial control, and financial engineering. Artificial intelligence brings new technical methods to problems in different disciplines, including power equipment condition assessment and fault diagnosis, AC/DC power system operation planning and control, computer image processing, clinical trials, biological data recognition, traffic flow prediction, financial market stock price prediction, and complex multi-physical field modeling. Clearly, the extensive application of artificial intelligence technology, represented by deep learning, machine learning, and intelligent optimization algorithms, offers promising solutions to complex problems across these disciplines.

This Special Issue focuses on the latest applications of artificial intelligence in multidisciplinary sciences. Topics include, but are not limited to, the following: multi-physical field simulation and modeling, AC/DC power systems, intelligent traffic management, equipment fault diagnosis, image processing, medical diagnosis technology, and autonomous driving. We welcome research articles covering new technologies and methods in these and related disciplinary fields.

Prof. Dr. Zhanlong Zhang
Dr. Jianquan Liao
Guest Editors

Dr. Peiyu Jiang
Guest Editor Assistant

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • deep learning
  • machine learning
  • intelligent optimization algorithm
  • engineering application
  • image processing
  • time series data
  • fault diagnosis
  • power system
  • the internet
  • traffic engineering
  • medical pharmaceuticals
  • bioengineering
  • financial engineering

Published Papers (14 papers)


Research

15 pages, 1276 KiB  
Article
Transfer Learning in Multimodal Sunflower Drought Stress Detection
by Olivera Lazić, Sandra Cvejić, Boško Dedić, Aleksandar Kupusinac, Siniša Jocić and Dragana Miladinović
Appl. Sci. 2024, 14(14), 6034; https://doi.org/10.3390/app14146034 - 10 Jul 2024
Viewed by 264
Abstract
Efficient water supply and the timely detection of drought stress in crops are important tasks for increasing yields, considering that agriculture is the primary consumer of water globally. This is particularly significant for plants such as sunflowers, which are an important source of quality edible oils essential for human nutrition. Traditional detection methods are labor-intensive, time-consuming, and rely on advanced sensor technologies. We introduce an innovative approach based on neural networks and transfer learning for drought stress detection using a novel dataset including 209 non-invasive rhizotron images and 385 images of manually cleaned sections of sunflowers, subjected to normal watering or water stress. We used five neural network models: VGG16, VGG19, InceptionV3, DenseNet, and MobileNet, pre-trained on the ImageNet dataset, whose performance was compared to select the most efficient architecture. Accordingly, the most efficient model, MobileNet, was further refined using different data augmentation mechanisms. The introduction of targeted data augmentation and the use of grayscale images proved to be effective, demonstrating improved results, with an F1 score and an accuracy of 0.95. This approach encourages advances in water stress detection, highlighting the value of artificial intelligence in improving crop health monitoring and management for more resilient agricultural practices.
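
For readers who want a concrete starting point, the pipeline described above maps onto a few lines of Keras. The sketch below is illustrative only: the directory layout, image size, and hyperparameters are assumptions, not the authors' exact setup.

```python
# Illustrative transfer-learning sketch; paths, image size, and
# hyperparameters are assumptions, not the authors' exact setup.
import tensorflow as tf

base = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the ImageNet features first

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),  # MobileNet input range
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # stressed vs. watered
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# Targeted augmentation in the spirit the abstract describes
train_ds = tf.keras.utils.image_dataset_from_directory(
    "rhizotron_images/train", image_size=(224, 224), batch_size=16)
aug = tf.keras.Sequential([tf.keras.layers.RandomFlip("horizontal"),
                           tf.keras.layers.RandomRotation(0.1)])
train_ds = train_ds.map(lambda x, y: (aug(x, training=True), y))
model.fit(train_ds, epochs=10)
```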
17 pages, 13121 KiB  
Article
A Creep Model of Steel Slag–Asphalt Mixture Based on Neural Networks
by Bei Deng, Guowei Zeng and Rui Ge
Appl. Sci. 2024, 14(13), 5820; https://doi.org/10.3390/app14135820 - 3 Jul 2024
Viewed by 378
Abstract
To characterize the complex creep behavior of steel slag–asphalt mixture influenced by both stress and temperature, predictive models employing Back Propagation (BP) and Long Short-Term Memory (LSTM) neural networks are described and compared in this paper. Multiple-stress repeated creep recovery tests on AC-13 grade steel slag–asphalt mix samples were conducted at different temperatures. The experimental results were processed into a group of independent creep recovery test results, then divided into training and testing datasets. K-fold cross-validation was applied to the training datasets to fine-tune the hyperparameters of the neural networks effectively. Both the BP and LSTM models were evaluated against the experimental curves, and the broad applicability of the models was demonstrated. The performance of the trained LSTM model was assessed with a 95% confidence interval around the fit errors, thereby obtaining creep strain intervals for the testing dataset. The results suggest that the LSTM model gave better predictions than the BP model for the creep deformation trends of steel slag–asphalt mixture at various temperatures. Owing to the potent generalization strength of artificial intelligence technology, the LSTM model can be further extended to forecasting road rutting deformations.
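
As a rough illustration of the LSTM half of this comparison, the sketch below shows a sequence-to-sequence regressor in PyTorch; the feature set (e.g., time, stress, temperature) and layer sizes are assumptions rather than the paper's tuned configuration.

```python
# Sketch of an LSTM regressor for creep-strain sequences; shapes and
# hyperparameters are illustrative, not the paper's tuned values.
import torch
import torch.nn as nn

class CreepLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        # n_features could be e.g. time, stress level, temperature
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # predicted creep strain

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out)              # strain at every time step

model = CreepLSTM()
x = torch.randn(8, 100, 3)                 # dummy batch of load histories
loss = nn.MSELoss()(model(x), torch.randn(8, 100, 1))
loss.backward()                            # standard supervised training step
```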

17 pages, 1685 KiB  
Article
Active Collision Avoidance for Robotic Arm Based on Artificial Potential Field and Deep Reinforcement Learning
by Qiaoyu Xu, Tianle Zhang, Kunpeng Zhou, Yansong Lin and Wenhao Ju
Appl. Sci. 2024, 14(11), 4936; https://doi.org/10.3390/app14114936 - 6 Jun 2024
Viewed by 396
Abstract
To address the local minimum issue commonly encountered in active collision avoidance using artificial potential fields (APF), this paper presents a novel algorithm that integrates APF with deep reinforcement learning (DRL) for robotic arms. Firstly, to improve the training efficiency of DRL for the collision avoidance problem, Hindsight Experience Replay (HER) was enhanced by adjusting the positions of obstacles, resulting in Hindsight Experience Replay for Collision Avoidance (HER-CA). Subsequently, a robotic arm collision avoidance action network model was trained based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) and HER-CA methods. Further, a full-body collision avoidance potential field model of the robotic arm was established based on the artificial potential field. Lastly, the trained action network model was used to guide APF in real-time collision avoidance planning. Comparative experiments between HER and HER-CA were conducted: the model trained with HER-CA improves the average success rate of the collision avoidance task by about 10% compared to the model trained with HER. A collision avoidance simulation was also conducted on a rock drilling robotic arm, confirming the effectiveness of the guided APF method.
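
The potential-field side of this hybrid is the classic attractive/repulsive formulation; a minimal numpy sketch is given below, with gains and the obstacle influence radius chosen arbitrarily for illustration.

```python
# Classic APF force terms; the DRL policy in the paper helps escape the
# local minima this formulation is known for. Gains are illustrative.
import numpy as np

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=0.5, rho0=0.4):
    """Attractive pull toward the goal plus repulsive push from obstacles."""
    f = k_att * (goal - q)                  # attractive term
    for obs in obstacles:
        d = np.linalg.norm(q - obs)
        if d < rho0:                        # obstacle within influence radius
            f += k_rep * (1.0 / d - 1.0 / rho0) * (q - obs) / d**3
    return f

q = np.array([0.0, 0.0, 0.5])               # end-effector position
print(apf_force(q, goal=np.array([1.0, 0.0, 0.5]),
                obstacles=[np.array([0.5, 0.05, 0.5])]))
```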

19 pages, 6253 KiB  
Article
Intelligent Fault Diagnosis of Unbalanced Samples Using Optimized Generative Adversarial Network
by Yan Huo, Diyuan Guan and Lingyan Dong
Appl. Sci. 2024, 14(11), 4927; https://doi.org/10.3390/app14114927 - 6 Jun 2024
Viewed by 355
Abstract
The increasing range of faults encountered by mechanical systems has, in recent years, brought great challenges for conducting intelligent fault diagnosis from insufficient samples. To tackle the issue of unbalanced samples, an improved methodology based on a generative adversarial network that uses sample generation and classification is proposed. First, 1D vibration signals are transformed into 2D images considering the features of the vibration signals. Next, the optimized generative adversarial network is constructed for adversarial training to synthesize diverse fake 2D images according to actual sample characteristics, with the generative model as a generator and the discriminative model as a discriminator. Our model uses an attenuated learning rate with a cross-iteration batch normalization layer to enhance the validity of the generator. Last, the discriminative model is used as a classifier to identify the fault states. The experimental results demonstrate that the proposed strategy efficiently improves fault identification accuracy in the two cases of sample imbalance.
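
A common way to realize the 1D-to-2D step mentioned above is to normalize a fixed-length window of the vibration signal and reshape it into a grayscale image; the sketch below shows that baseline mapping, which may differ from the paper's exact transformation.

```python
# Baseline 1-D-signal-to-2-D-image mapping (plain normalize-and-reshape);
# the paper's exact transformation may differ.
import numpy as np

def signal_to_image(sig, size=64):
    seg = sig[: size * size]                          # take size*size samples
    seg = (seg - seg.min()) / (np.ptp(seg) + 1e-12)   # normalize to [0, 1]
    return (seg.reshape(size, size) * 255).astype(np.uint8)

vibration = np.sin(np.linspace(0, 200 * np.pi, 64 * 64)) + \
            0.1 * np.random.randn(64 * 64)            # synthetic vibration signal
img = signal_to_image(vibration)                      # 64x64 grayscale "image"
print(img.shape, img.dtype)
```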

15 pages, 5554 KiB  
Article
A Baseline Drift-Elimination Algorithm for Strain Measurement-System Signals Based on the Transformer Model
by Yusen Wang, Lei Zhang, Xue Qi, Xiaopeng Yang and Qiulin Tan
Appl. Sci. 2024, 14(11), 4447; https://doi.org/10.3390/app14114447 - 23 May 2024
Viewed by 428
Abstract
Strain measurements are vital in engineering trials, testing, and scientific research. During signal acquisition, baseline drift has a significant impact on the accuracy and validity of data. Traditional solutions, such as discrete wavelet transform and empirical mode decomposition, cannot be used in real-time systems. To solve this problem, this paper proposes a Transformer-based model to eliminate the drift in the signal. A self-attention mechanism is utilized in the encoder of the model to learn the interrelationships between the components of the input signal and capture the key features. Then, the decoder generates a corrected signal. Meanwhile, a high-precision strain acquisition system is constructed. The experiments tested the model's ability to remove drift from simulated voltage signals with and without Gaussian noise. The results demonstrated that the Transformer model excels at eliminating signal baseline drift. Additionally, the performance of the model was investigated under different temperature conditions and with different levels of force applied by the electronic universal testing machine to produce strain. The experimental results indicate that the Transformer model can largely eliminate drift in dynamic signals and has great potential for practical applications.
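
As a schematic of the idea, the sketch below applies a small self-attention encoder to a drifting signal and regresses a corrected sequence. It is an encoder-only simplification with illustrative dimensions, not the paper's encoder–decoder architecture.

```python
# Encoder-only simplification of a Transformer drift corrector
# (sequence in, corrected sequence out); dimensions are illustrative.
import torch
import torch.nn as nn

class DriftCorrector(nn.Module):
    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)     # scalar samples -> d_model
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)
        self.out = nn.Linear(d_model, 1)       # corrected sample

    def forward(self, x):                      # x: (batch, time, 1)
        return self.out(self.encoder(self.embed(x)))

x = torch.randn(4, 256, 1)                     # stand-in for a drifting signal
y = DriftCorrector()(x)                        # same shape, drift regressed out
print(y.shape)
```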

15 pages, 6829 KiB  
Article
Denoising of Wrapped Phase in Digital Speckle Shearography Based on Convolutional Neural Network
by Hao Zhang, Dawei Huang and Kaifu Wang
Appl. Sci. 2024, 14(10), 4135; https://doi.org/10.3390/app14104135 - 13 May 2024
Viewed by 480
Abstract
Speckle-shearing technology is widely used in defect detection due to its high precision and non-contact characteristics. However, the wrapped-phase recording defect information is often accompanied by a lot of speckle noise, which affects the evaluation of defect information. To solve the problems of traditional denoising algorithms in suppressing speckle noise and preserving the texture features of wrapped phases, this study proposes a speckle denoising algorithm called a speckle denoising convolutional neural network (SDCNN). The proposed method reduces the loss of texture information and the blurring of details in the denoising process by optimizing the loss function. Different from the previous simple assumption that the speckle noise is multiplicative, this study proposes a more realistic wrapped image-simulation method, which has better training results. Compared with representative algorithms such as BM3D, SDCNN can handle a wider range of speckle noise and has a better denoising effect. Simulated and real speckle-noise images are used to evaluate the denoising effect of SDCNN. The results show that SDCNN can effectively reduce the speckle noise of the speckle-shear wrapping phase and retain better texture details.
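
For context, the "simple assumption" the paper improves on is the textbook multiplicative speckle model, which can be simulated in a few lines; the gamma parameters below are illustrative.

```python
# Textbook multiplicative speckle model (the baseline assumption the paper
# refines); the number of looks is illustrative.
import numpy as np

def add_speckle(img, looks=4):
    """Multiply by unit-mean gamma noise, the classic speckle assumption."""
    noise = np.random.gamma(shape=looks, scale=1.0 / looks, size=img.shape)
    return np.clip(img * noise, 0.0, 1.0)

phase = np.random.rand(128, 128)   # stand-in for a wrapped-phase map in [0, 1]
noisy = add_speckle(phase)
print(noisy.shape)
```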

14 pages, 2653 KiB  
Article
WordBlitz: An Efficient Hard-Label Textual Adversarial Attack Method Jointly Leveraging Adversarial Transferability and Word Importance
by Xiangge Li, Hong Luo and Yan Sun
Appl. Sci. 2024, 14(9), 3831; https://doi.org/10.3390/app14093831 - 30 Apr 2024
Viewed by 501
Abstract
Existing textual attacks mostly perturb keywords in sentences to generate adversarial examples by relying on the prediction confidence of victim models. In practice, attackers can only access the prediction label, and the victim model can easily defend against such hard-label attacks by denying access based on the attack's query frequency. In this paper, we propose an efficient hard-label attack approach called WordBlitz. First, based on adversarial transferability, we train a substitute model to initialize the attack parameter set, including a candidate pool and two weight tables of keywords and candidate words. Then, adversarial examples are generated and optimized under the guidance of the two weight tables. During optimization, we design a hybrid local search algorithm with word importance to find the globally optimal solution while updating the two weight tables according to the attack results. Finally, the non-adversarial text generated during perturbation optimization is added to the training of the substitute model as data augmentation to improve adversarial transferability. Experimental results show that WordBlitz surpasses the baselines in effectiveness, efficiency, and cost. Its efficiency is especially pronounced in scenarios with broader search spaces, and its attack success rate on a Chinese dataset is higher than that of the baselines.
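
A toy version of the hard-label setting helps clarify the constraint: the attacker observes only the predicted label and tries substitutions in order of estimated word importance. The classifier, candidate pool, and importance scores below are stand-ins for illustration, not the WordBlitz components.

```python
# Toy hard-label substitution attack: only the predicted label is visible,
# and words are tried in importance order. All components are stand-ins.
def hard_label_attack(tokens, label, classify, candidates, importance):
    adv = list(tokens)
    for i in sorted(range(len(adv)), key=lambda i: -importance[i]):
        for cand in candidates.get(adv[i], []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            if classify(" ".join(trial)) != label:   # label flip = success
                return trial
    return None                                      # no flip found

# Dummy victim model: label 1 iff the word "good" appears
classify = lambda s: int("good" in s.split())
tokens = "this movie is good".split()
print(hard_label_attack(tokens, 1, classify,
                        {"good": ["decent", "fine"]},
                        importance=[0.1, 0.2, 0.1, 0.9]))
```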

19 pages, 5545 KiB  
Article
Ensemble Empirical Mode Decomposition Granger Causality Test Dynamic Graph Attention Transformer Network: Integrating Transformer and Graph Neural Network Models for Multi-Sensor Cross-Temporal Granularity Water Demand Forecasting
by Wenhong Wu and Yunkai Kang
Appl. Sci. 2024, 14(8), 3428; https://doi.org/10.3390/app14083428 - 18 Apr 2024
Viewed by 630
Abstract
Accurate water demand forecasting is crucial for optimizing the strategies across multiple water sources. This paper proposes the Ensemble Empirical Mode Decomposition Granger causality test Dynamic Graph Attention Transformer Network (EG-DGATN) for multi-sensor cross-temporal granularity water demand forecasting, which combines the Transformer and Graph Neural Networks. It employs the EEMD–Granger test to delineate the interconnections among sensors and extracts the spatiotemporal features within the causal domain by stacking dynamical graph spatiotemporal attention layers. The experimental results demonstrate that compared to baseline models, the EG-DGATN improves the MAPE metrics by 2.12%, 4.33%, and 6.32% in forecasting intervals of 15 min, 45 min, and 90 min, respectively. The model achieves an R2 score of 0.97, indicating outstanding predictive accuracy and exceptional explanatory power for the target variable. This research highlights significant potential applications in predictive tasks within smart water management systems.
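
The Granger step can be reproduced with statsmodels: for each sensor pair, a small p-value keeps the directed edge. The synthetic two-sensor example below is illustrative.

```python
# Pairwise Granger test of the kind used to decide which sensors feed the
# graph (synthetic data; sensor relationship and lag are illustrative).
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
x = rng.standard_normal(500)                         # upstream sensor
y = np.roll(x, 2) + 0.1 * rng.standard_normal(500)   # lags x by 2 steps

data = np.column_stack([y, x])          # test: does x Granger-cause y?
res = grangercausalitytests(data, maxlag=4, verbose=False)
p = res[2][0]["ssr_ftest"][1]           # p-value of the F-test at lag 2
print(f"p-value at lag 2: {p:.4f}")     # small p => keep directed edge x -> y
```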

19 pages, 9752 KiB  
Article
PPA-SAM: Plug-and-Play Adversarial Segment Anything Model for 3D Tooth Segmentation
by Jiahao Liao, Hongyuan Wang, Hanjie Gu and Yinghui Cai
Appl. Sci. 2024, 14(8), 3259; https://doi.org/10.3390/app14083259 - 12 Apr 2024
Viewed by 685
Abstract
In Cone Beam Computed Tomography (CBCT) images, accurate tooth segmentation is crucial for oral health, providing essential guidance for dental procedures such as implant placement and difficult tooth extractions (impactions). However, due to the lack of a substantial amount of dental data and the complexity of tooth morphology in CBCT images, the task of tooth segmentation faces significant challenges. This may lead to issues such as overfitting and training instability in existing algorithms, resulting in poor model generalization. Ultimately, this may impact the accuracy of segmentation results and could even provide incorrect diagnostic and treatment information. In response to these challenges, we introduce PPA-SAM, an innovative dual-encoder segmentation network that merges the currently popular Segment Anything Model (SAM) with the 3D medical segmentation network, VNet. Through the use of adapters, we achieve parameter reuse and fine-tuning, enhancing the model's adaptability to specific CBCT datasets. Simultaneously, we utilize a three-layer convolutional network as both a discriminator and a generator for adversarial training. The PPA-SAM model seamlessly integrates the high-precision segmentation performance of convolutional networks with the outstanding generalization capabilities of SAM models, achieving more accurate and robust three-dimensional tooth segmentation in CBCT images. Evaluation on a small CBCT dataset demonstrates that PPA-SAM outperforms other networks in terms of accuracy and robustness, providing a reliable and efficient solution for three-dimensional tooth segmentation in CBCT images. This research has a positive impact on the management of dentofacial conditions from oral implantology to orthognathic surgery, offering dependable technological support for future oral diagnostics and treatment planning.
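
The adapter mechanism mentioned above is, in essence, a small trainable bottleneck attached to a frozen pre-trained block. The PyTorch sketch below shows the pattern with illustrative dimensions and a stand-in for a SAM layer.

```python
# Adapter pattern: a small bottleneck trains while the big pre-trained
# block stays frozen. Dimensions and the frozen block are stand-ins.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):                      # residual bottleneck
        return x + self.up(torch.relu(self.down(x)))

frozen_block = nn.Linear(256, 256)             # stand-in for a SAM layer
for p in frozen_block.parameters():
    p.requires_grad = False                    # reuse pre-trained weights

adapter = Adapter()                            # only these weights train
x = torch.randn(4, 256)
out = adapter(frozen_block(x))
print(out.shape)
```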

17 pages, 3312 KiB  
Article
DBSTGNN-Att: Dual Branch Spatio-Temporal Graph Neural Network with an Attention Mechanism for Cellular Network Traffic Prediction
by Zengyu Cai, Chunchen Tan, Jianwei Zhang, Liang Zhu and Yuan Feng
Appl. Sci. 2024, 14(5), 2173; https://doi.org/10.3390/app14052173 - 5 Mar 2024
Cited by 1 | Viewed by 991
Abstract
As network technology continues to develop, the popularity of various intelligent terminals has accelerated, leading to a rapid growth in the scale of wireless network traffic. This growth has resulted in significant pressure on resource consumption and network security maintenance. The objective of this paper is to enhance the prediction accuracy of cellular network traffic in order to provide reliable support for subsequent base station sleep control or the identification of malicious traffic. To achieve this target, a cellular network traffic prediction method based on multi-modal data feature fusion is proposed. Firstly, an attributed K-nearest node (KNN) graph is constructed based on the similarity of data features, and the fused high-dimensional features are incorporated into the graph to provide more information for the model. Subsequently, a dual branch spatio-temporal graph neural network with an attention mechanism (DBSTGNN-Att) is designed for cellular network traffic prediction. Extensive experiments conducted on real-world datasets demonstrate that the proposed method outperforms baseline models such as temporal graph convolutional networks (T-GCNs) and spatial–temporal self-attention graph convolutional networks (STA-GCNs), with lower mean absolute error (MAE) values of 6.94% and 2.11%, respectively. Additionally, the ablation experimental results show that the MAE of multi-modal feature fusion using the attributed KNN graph is 8.54% lower compared to that of the traditional undirected graphs.
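
The attributed KNN graph can be built directly from feature similarity, e.g., with scikit-learn; the sketch below uses random stand-in features and an arbitrary k.

```python
# Attributed KNN graph construction from feature similarity
# (synthetic features; k is illustrative).
import numpy as np
from sklearn.neighbors import kneighbors_graph

features = np.random.rand(100, 16)        # fused features, one row per node
adj = kneighbors_graph(features, n_neighbors=5,
                       mode="connectivity", include_self=False)
print(adj.shape, adj.nnz)                 # sparse 100x100 adjacency matrix
```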

18 pages, 6514 KiB  
Article
Fault Diagnosis of Inter-Turn Fault in Permanent Magnet-Synchronous Motors Based on Cycle-Generative Adversarial Networks and Deep Autoencoder
by Wenkuan Huang, Hongbin Chen and Qiyang Zhao
Appl. Sci. 2024, 14(5), 2139; https://doi.org/10.3390/app14052139 - 4 Mar 2024
Viewed by 849
Abstract
This paper addresses the difficulty of obtaining inter-turn fault (ITF) samples in electric motors, specifically in permanent magnet synchronous motors (PMSMs), where the number of ITF samples in the stator windings is severely lacking compared to healthy samples. To effectively identify these faults, an improved fault diagnosis method based on the combination of a cycle-generative adversarial network (CycleGAN) and a deep autoencoder (DAE) is proposed. In this method, the CycleGAN is used to expand the collection of fault samples for PMSMs, while the DAE enhances the capability to extract and analyze these fault samples, thus improving the accuracy of fault diagnosis. The experimental results demonstrate that the CycleGAN exhibits an excellent capability to generate ITF samples. The proposed method achieves a diagnostic accuracy of up to 98.73% for ITF problems.
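
The DAE component is conceptually a deep bottleneck network trained to reconstruct its input; the sketch below shows the pattern with illustrative layer widths, not the paper's architecture.

```python
# Deep autoencoder sketch for fault-feature extraction; layer widths
# are illustrative.
import torch
import torch.nn as nn

class DAE(nn.Module):
    def __init__(self, n_in=512, code=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(),
                                 nn.Linear(128, code))
        self.dec = nn.Sequential(nn.Linear(code, 128), nn.ReLU(),
                                 nn.Linear(128, n_in))

    def forward(self, x):
        z = self.enc(x)                     # compact fault representation
        return self.dec(z), z

x = torch.randn(16, 512)                    # stand-in current/vibration features
recon, z = DAE()(x)
print(nn.MSELoss()(recon, x).item())        # reconstruction loss to minimize
```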

13 pages, 2924 KiB  
Article
Matting Algorithm with Improved Portrait Details for Images with Complex Backgrounds
by Rui Li, Dan Zhang, Sheng-Ling Geng and Ming-Quan Zhou
Appl. Sci. 2024, 14(5), 1942; https://doi.org/10.3390/app14051942 - 27 Feb 2024
Viewed by 799
Abstract
With the continuous development of virtual reality and digital image applications, video of complex scenes has proliferated, and portrait matting has therefore become a popular topic. In this paper, a new matting algorithm with improved portrait details for images with complex backgrounds (MORLIPO) is proposed. This work combines the background restoration module (BRM) and the fine-grained matting module (FGMatting) to achieve high-detail matting for images with complex backgrounds. We recover the background from a single input image or video, which serves as a prior and aids in generating a more accurate alpha matte. The main framework uses the image matting model MODNet, the MobileNetV2 lightweight network, and the background restoration module, which can both preserve the background information of the current image and provide a more accurate prediction of the alpha matte of the current frame for video images. It also provides the background prior of the previous frame to predict the alpha matte of the current frame more accurately. The fine-grained matting module is designed to extract fine-grained details of the foreground and retain the features, while combining with the semantic module to achieve more accurate matting. Our design allows training on a single NVIDIA 3090 GPU in an end-to-end manner, with experiments on publicly available datasets. Experimental validation shows that our method performs well on both visual effects and objective evaluation metrics.
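
The reason a restored background helps is the compositing identity that matting inverts: I = αF + (1 − α)B, so a known B strongly constrains α. A minimal numpy illustration:

```python
# Compositing identity behind matting: I = alpha*F + (1 - alpha)*B.
# All arrays here are random stand-ins.
import numpy as np

h, w = 4, 4
alpha = np.random.rand(h, w, 1)           # predicted matte in [0, 1]
fg = np.random.rand(h, w, 3)              # foreground (portrait)
bg = np.random.rand(h, w, 3)              # restored background prior
composite = alpha * fg + (1.0 - alpha) * bg
print(composite.shape)
```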

15 pages, 3912 KiB  
Article
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
by Yifang Xu, Yunzhuo Sun, Zien Xie, Benxiang Zhai and Sidan Du
Appl. Sci. 2024, 14(5), 1894; https://doi.org/10.3390/app14051894 - 25 Feb 2024
Viewed by 1144
Abstract
Video temporal grounding (VTG) aims to locate specific temporal segments from an untrimmed video based on a linguistic query. Most existing VTG models are trained on extensive annotated video-text pairs, a process that not only introduces human biases from the queries but also incurs significant computational costs. To tackle these challenges, we propose VTG-GPT, a GPT-based method for zero-shot VTG without training or fine-tuning. To reduce prejudice in the original query, we employ Baichuan2 to generate debiased queries. To lessen redundant information in videos, we apply MiniGPT-v2 to transform visual content into more precise captions. Finally, we devise the proposal generator and post-processing to produce accurate segments from debiased queries and image captions. Extensive experiments demonstrate that VTG-GPT significantly outperforms SOTA methods in zero-shot settings and surpasses unsupervised approaches. More notably, it achieves competitive performance comparable to supervised methods. The code is available on GitHub.
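
One simple way to picture the proposal step is sliding-window scoring of per-frame caption embeddings against the query embedding; the sketch below uses random stand-ins for the MiniGPT-v2 captions and Baichuan2 query, not the paper's actual generator.

```python
# Toy proposal step: score sliding windows of caption embeddings against
# the query and keep the best segment. Embeddings are random stand-ins.
import numpy as np

def best_segment(cap_emb, query_emb, win=5):
    sims = cap_emb @ query_emb                       # per-frame dot-product score
    scores = [sims[i:i + win].mean() for i in range(len(sims) - win + 1)]
    start = int(np.argmax(scores))
    return start, start + win                        # predicted frame interval

cap_emb = np.random.rand(100, 64)    # one embedding per frame caption
query_emb = np.random.rand(64)       # embedding of the debiased query
print(best_segment(cap_emb, query_emb))
```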

14 pages, 1691 KiB  
Article
Stable Low-Rank CP Decomposition for Compression of Convolutional Neural Networks Based on Sensitivity
by Chenbin Yang and Huiyi Liu
Appl. Sci. 2024, 14(4), 1491; https://doi.org/10.3390/app14041491 - 12 Feb 2024
Cited by 1 | Viewed by 1054
Abstract
Modern convolutional neural networks (CNNs) play a crucial role in computer vision applications. The intricacy of application scenarios and growing datasets both significantly raise the complexity of CNNs. As a result, they are often overparameterized and have significant computational costs. One potential solution for optimizing and compressing CNNs is to replace convolutional layers with low-rank tensor decomposition, and the most suitable technique for this is Canonical Polyadic (CP) decomposition. However, there are two primary issues with CP decomposition that lead to a significant loss in accuracy. Firstly, the selection of tensor ranks for CP decomposition is an unsolved issue. Secondly, degeneracy and instability are common problems in the CP decomposition of convolutional tensors, which makes fine-tuning the compressed model difficult. In this study, a novel approach is proposed for compressing CNNs using CP decomposition. The first step involves using the sensitivity of convolutional layers to determine the tensor ranks for CP decomposition effectively. Subsequently, to address the degeneracy issue and enhance the stability of the CP decomposition, two novel techniques are incorporated: optimization with sensitivity constraints and iterative fine-tuning based on sensitivity order. Finally, the proposed method was examined on common CNN structures for image classification tasks and demonstrated stable performance with significantly smaller reductions in classification accuracy.
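
The core operation, factorizing a 4-D convolution kernel into rank-R CP factors, can be sketched with the tensorly library (assuming it is installed); the rank below is arbitrary, whereas the paper selects it from layer sensitivity.

```python
# CP decomposition of a conv kernel via tensorly; the rank is arbitrary
# here, whereas the paper derives it from layer sensitivity.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend("numpy")
kernel = np.random.rand(64, 32, 3, 3)       # (out_ch, in_ch, kh, kw)
weights, factors = parafac(tl.tensor(kernel), rank=16, init="random")

# factors: matrices of shape (64, R), (32, R), (3, R), (3, R), which map
# onto a chain of cheap convolutions replacing the original layer
print([f.shape for f in factors])
approx = tl.cp_to_tensor((weights, factors))
rel_err = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
print(f"relative reconstruction error: {rel_err:.3f}")
```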
