Search Results (14)

Search Parameters:
Keywords = Urdu dataset classification

29 pages, 1051 KB  
Article
Urdu Toxicity Detection: A Multi-Stage and Multi-Label Classification Approach
by Ayesha Rashid, Sajid Mahmood, Usman Inayat and Muhammad Fahad Zia
AI 2025, 6(8), 194; https://doi.org/10.3390/ai6080194 - 21 Aug 2025
Viewed by 996
Abstract
Social media empowers freedom of expression but is often misused for abuse and hate. Detecting such content is crucial, especially in under-resourced languages like Urdu. To address this challenge, this paper first presents a comprehensive multilabel dataset, the Urdu toxicity corpus (UTC). Second, an Urdu toxicity detection model is developed to detect toxic content in Urdu text written in Nastaliq script. The proposed framework first preprocesses the gathered data and then applies feature engineering using term frequency-inverse document frequency, bag-of-words, and N-gram techniques. The synthetic minority over-sampling technique is then used to address class imbalance, and manual annotation is performed to ensure label accuracy. Four machine learning models, namely logistic regression, support vector machine, random forest, and gradient boosting, are applied to the preprocessed data; deep learning algorithms, including long short-term memory (LSTM), bidirectional LSTM, and gated recurrent unit networks, are also applied to the UTC. Random forest outperforms all the other models, achieving a precision, recall, F1-score, and accuracy of 0.97, 0.99, 0.98, and 0.99, respectively. The proposed model demonstrates strong potential to detect rude, offensive, abusive, and hate speech content in user comments written in Urdu Nastaliq.
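The feature-engineering step this abstract names (TF-IDF weighting of comment text) can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation; the token lists standing in for tokenized Urdu comments are hypothetical, and a real pipeline would use a library vectorizer.

```python
import math

def tf_idf(corpus):
    """Compute smoothed TF-IDF weights for a tokenized corpus.

    corpus: list of documents, each a list of tokens.
    Returns one {token: weight} dict per document.
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each token appears.
    df = {}
    for doc in corpus:
        for tok in set(doc):
            df[tok] = df.get(tok, 0) + 1
    vectors = []
    for doc in corpus:
        weights = {}
        for tok in set(doc):
            tf = doc.count(tok) / len(doc)                     # term frequency
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1   # smoothed IDF
            weights[tok] = tf * idf
        vectors.append(weights)
    return vectors

# Toy tokenized comments (hypothetical tokens standing in for Urdu text).
docs = [["bura", "lafz", "bura"], ["acha", "lafz"]]
vecs = tf_idf(docs)
```

A token shared by every document ("lafz" here) receives a lower weight than a document-specific one, which is the property the classifiers rely on.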

32 pages, 9129 KB  
Article
Detection and Recognition of Bilingual Urdu and English Text in Natural Scene Images Using a Convolutional Neural Network–Recurrent Neural Network Combination with a Connectionist Temporal Classification Decoder
by Khadija Tul Kubra, Muhammad Umair, Muhammad Zubair, Muhammad Tahir Naseem and Chan-Su Lee
Sensors 2025, 25(16), 5133; https://doi.org/10.3390/s25165133 - 19 Aug 2025
Viewed by 635
Abstract
Urdu and English are widely used for visual text communication worldwide in public spaces such as signboards and navigation boards. Text in such natural scenes carries useful information for modern applications such as language translation for foreign visitors, robot navigation, and autonomous vehicles, highlighting the importance of extracting it. Previous studies focused on Urdu alone or on printed text pasted manually onto images, and lacked sufficiently large datasets for effective model training. Herein, a pipeline for bilingual (Urdu and English) text detection and recognition in complex natural scene images is proposed. Additionally, a unilingual dataset is converted into a bilingual dataset and augmented using various techniques. For implementation, a customized convolutional neural network is used for feature extraction, a recurrent neural network (RNN) for feature learning, and connectionist temporal classification (CTC) for text recognition. Experiments are conducted using different RNNs and hidden-unit counts, yielding satisfactory results. Ablation studies are performed on the two best models by eliminating model components, and the proposed pipeline is compared to existing text detection and recognition methods. The proposed models achieved average accuracies of 98.5% for Urdu character recognition, 97.2% for Urdu word recognition, and 99.2% for English character recognition.
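The CTC decoder mentioned in this abstract maps a per-frame labelling from the RNN to a character sequence by a fixed rule: merge consecutive repeats, then drop blanks. A minimal greedy-decoding sketch (not the authors' code; label values are illustrative):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a best-path per-frame labelling into an output sequence
    using the standard CTC rule: merge consecutive repeats, drop blanks."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            decoded.append(lab)
        prev = lab
    return decoded

# Frames a recognizer might emit for a 3-character word (0 = blank).
# The blank between the two 7s keeps the repeated character distinct.
frames = [0, 7, 7, 0, 7, 3, 3, 0, 0, 5]
print(ctc_greedy_decode(frames))  # -> [7, 7, 3, 5]
```

In practice beam search over the RNN's per-frame probabilities replaces this greedy best path, but the collapse rule is the same.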
(This article belongs to the Section Sensor Networks)

23 pages, 3836 KB  
Article
RUDA-2025: Depression Severity Detection Using Pre-Trained Transformers on Social Media Data
by Muhammad Ahmad, Pierpaolo Basile, Fida Ullah, Ildar Batyrshin and Grigori Sidorov
AI 2025, 6(8), 191; https://doi.org/10.3390/ai6080191 - 18 Aug 2025
Viewed by 859
Abstract
Depression is a serious mental health disorder affecting cognition, emotions, and behavior. It impacts over 300 million people globally, with mental health care costs exceeding $1 trillion annually. Traditional diagnostic methods are often expensive, time-consuming, stigmatizing, and difficult to access. This study leverages NLP techniques to identify depressive cues in social media posts, focusing on both standard Urdu and code-mixed Roman Urdu, which are often overlooked in existing research. To the best of our knowledge, a script-conversion and combination-based approach for Roman Urdu and Nastaliq Urdu has not been explored before. To address this gap, our study makes the following key contributions. First, we created a manually annotated dataset named RUDA-2025, containing posts in code-mixed Roman Urdu and Nastaliq Urdu for both binary and multiclass classification. The binary classes are "depression" and "not depression", with the depression class further divided into fine-grained categories: mild, moderate, and severe depression. Second, we applied two novel techniques to the RUDA-2025 dataset for the first time: (1) a script-conversion approach that translates between code-mixed Roman Urdu and standard Urdu, and (2) a combination-based approach that merges both scripts into a single dataset to address linguistic challenges in depression assessment. Finally, we ran 60 experiments using a combination of traditional machine learning and deep learning techniques to find the best-fit model for depression detection. Based on our analysis, our proposed model (mBERT) with a custom attention mechanism outperformed the baseline (XGB) on the combination-based, code-mixed Roman Urdu, and Nastaliq Urdu script conversions.

24 pages, 2410 KB  
Article
UA-HSD-2025: Multi-Lingual Hate Speech Detection from Tweets Using Pre-Trained Transformers
by Muhammad Ahmad, Muhammad Waqas, Ameer Hamza, Sardar Usman, Ildar Batyrshin and Grigori Sidorov
Computers 2025, 14(6), 239; https://doi.org/10.3390/computers14060239 - 18 Jun 2025
Cited by 1 | Viewed by 2173
Abstract
The rise of social media has improved communication but also amplified the spread of hate speech, creating serious societal risks. Automated detection remains difficult due to subjectivity, linguistic diversity, and implicit language. While prior research focuses on high-resource languages, this study addresses the underexplored multilingual challenges of Arabic and Urdu hate speech through a comprehensive approach. To achieve this objective, this study makes four key contributions. First, we created a unique multilingual, manually annotated binary and multi-class dataset (UA-HSD-2025) sourced from X, covering the five most important multi-class categories of hate speech. Second, we wrote detailed annotation guidelines to produce a robust hate speech dataset. Third, we explore two strategies for handling multilingual data: a joint multilingual approach and a translation-based approach. The translation-based approach converts all input text into a single target language before applying a classifier; the joint multilingual approach instead trains a unified model to handle multiple languages simultaneously, classifying text across languages without translation. Finally, we ran 54 experiments spanning machine learning with TF-IDF features, deep learning with pre-trained word embeddings such as FastText and GloVe, and pre-trained language models with contextual embeddings. Based on the results, our language model (XLM-R) outperformed traditional supervised learning approaches, achieving 0.99 accuracy in binary classification for the Arabic, Urdu, and joint multilingual datasets, and 0.95, 0.94, and 0.94 accuracy in multi-class classification for the joint multilingual, Arabic, and Urdu datasets, respectively.
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)

18 pages, 373 KB  
Article
Machine Learning- and Deep Learning-Based Multi-Model System for Hate Speech Detection on Facebook
by Amna Naseeb, Muhammad Zain, Nisar Hussain, Amna Qasim, Fiaz Ahmad, Grigori Sidorov and Alexander Gelbukh
Algorithms 2025, 18(6), 331; https://doi.org/10.3390/a18060331 - 1 Jun 2025
Cited by 2 | Viewed by 1148
Abstract
Hate speech is a complex topic that transcends language, culture, and social spheres. The spread of hate speech on social media sites like Facebook has added a new layer of complexity to online safety and content moderation. This study addresses the problem by developing a tool for automatically detecting hate speech in Roman Urdu, an informal Latin-script rendering of Urdu commonly used in South Asian digital communication. Roman Urdu is particularly challenging because it has no standardized spellings, leading to wide orthographic variation that increases the difficulty of hate speech detection. To tackle this problem, we adopt a holistic strategy combining six machine learning (ML) and four deep learning (DL) models with a dataset of Facebook comments, which was preprocessed (tokenization, stop-word removal, etc.) and vectorized (TF-IDF, word embeddings). The ML algorithms used are LR, SVM, RF, NB, KNN, and GBM. We also use deep learning architectures, namely CNN, RNN, LSTM, and GRU, to further increase classification accuracy. The experimental results show that the deep learning models outperform the traditional ML approaches by a significant margin, with CNN and LSTM achieving accuracies of 95.1% and 96.2%, respectively. As far as we are aware, this is the first work to investigate QLoRA for fine-tuning large models for offensive language detection in Roman Urdu.
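The preprocessing steps this abstract lists (lowercasing, punctuation stripping, tokenization, stop-word removal) can be sketched as below. The stop-word list is an illustrative fragment, not the study's actual resource, and the sample comment is invented.

```python
import re

# Illustrative (not exhaustive) Roman Urdu stop-word list -- hypothetical.
STOPWORDS = {"hai", "ka", "ki", "ke", "mein", "aur", "to"}

def preprocess(comment):
    """Lowercase, strip punctuation, tokenize on whitespace, and
    drop stop words -- the cleanup steps named in the abstract."""
    comment = comment.lower()
    comment = re.sub(r"[^\w\s]", " ", comment)  # replace punctuation with spaces
    tokens = comment.split()
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("Yeh bohat bura comment hai!!!"))
# -> ['yeh', 'bohat', 'bura', 'comment']
```

The resulting token lists would then feed the TF-IDF or embedding stage before classification.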
(This article belongs to the Special Issue Linguistic and Cognitive Approaches to Dialog Agents)

15 pages, 293 KB  
Article
Fine-Tuning QurSim on Monolingual and Multilingual Models for Semantic Search
by Tania Afzal, Sadaf Abdul Rauf, Muhammad Ghulam Abbas Malik and Muhammad Imran
Information 2025, 16(2), 84; https://doi.org/10.3390/info16020084 - 23 Jan 2025
Viewed by 2032
Abstract
Transformers have made significant breakthroughs in natural language processing. These models are trained on large datasets and can handle multiple tasks. We compare monolingual and multilingual transformer models for semantic relatedness and verse retrieval. We leveraged the original QurSim dataset (Arabic) and used authentic multi-author translations in 22 languages to create a multilingual QurSim dataset, which we have released to the research community. We evaluated monolingual and multilingual LLMs for Arabic, and our results show that monolingual LLMs give better results for verse classification and matching-verse retrieval. We incrementally built monolingual models with Arabic, English, and Urdu, and multilingual models with all 22 languages supported by the multilingual paraphrase-MiniLM-L12-v2 model. Our results show improved classification accuracy with the incorporation of multilingual QurSim.
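Verse retrieval with sentence-transformer models like paraphrase-MiniLM-L12-v2 typically reduces to ranking verse embeddings by cosine similarity to the query embedding. A minimal sketch with toy vectors (the 3-d embeddings below are invented stand-ins; real model outputs are 384-dimensional):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query_vec, verse_vecs, top_k=2):
    """Return indices of the top_k verses ranked by similarity to the query."""
    ranked = sorted(range(len(verse_vecs)),
                    key=lambda i: cosine(query_vec, verse_vecs[i]),
                    reverse=True)
    return ranked[:top_k]

# Toy 3-d embeddings standing in for sentence-transformer outputs.
query = [1.0, 0.0, 1.0]
verses = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(retrieve(query, verses))  # -> [0, 2]
```

Because cosine similarity ignores vector magnitude, it ranks verses by direction in embedding space, which is what semantic-relatedness fine-tuning optimizes.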

14 pages, 1811 KB  
Article
Innovations in Urdu Sentiment Analysis Using Machine and Deep Learning Techniques for Two-Class Classification of Symmetric Datasets
by Khalid Bin Muhammad and S. M. Aqil Burney
Symmetry 2023, 15(5), 1027; https://doi.org/10.3390/sym15051027 - 5 May 2023
Cited by 13 | Viewed by 7144
Abstract
Many investigations have performed sentiment analysis to gauge public opinion in various languages, including English, French, and Chinese. The most widely spoken language in South Asia is Urdu, yet comparatively little work has been carried out on it, partly because Roman Urdu (Urdu written in the English alphabet) is also used on social media and is easier to handle with English-language processing software. Large amounts of data in Urdu, as well as in Roman Urdu, are posted on social media sites such as Instagram, Twitter, and Facebook. This research focused on collecting pure Urdu-language data, preprocessing it, applying feature extraction, and using innovative methods to perform sentiment analysis. After reviewing previous efforts, machine learning and deep learning algorithms were applied to the data. The obtained results were compared, and hybrid methods were also recommended, opening new avenues for sentiment analysis of Urdu-language data.

26 pages, 3512 KB  
Article
Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications
by Muhammad Bilal, Atif Khan, Salman Jan, Shahrulniza Musa and Shaukat Ali
Sensors 2023, 23(8), 3909; https://doi.org/10.3390/s23083909 - 12 Apr 2023
Cited by 35 | Viewed by 7008
Abstract
Social media applications, such as Twitter and Facebook, allow users to communicate and share their thoughts, status updates, opinions, photographs, and videos around the globe. Unfortunately, some people utilize these platforms to disseminate hate speech and abusive language. The growth of hate speech may result in hate crimes, cyber violence, and substantial harm to cyberspace, physical security, and social safety. As a result, hate speech detection is a critical issue for both cyberspace and physical society, necessitating a robust application capable of detecting and combating it in real time. Hate speech detection is a context-dependent problem that requires context-aware mechanisms. In this study, we employed a transformer-based model for Roman Urdu hate speech classification due to its ability to capture text context. In addition, we developed the first Roman Urdu pre-trained BERT model, named BERT-RU, by training BERT from scratch on the largest Roman Urdu dataset, consisting of 173,714 text messages. Traditional and deep learning models were used as baselines, including LSTM, BiLSTM, BiLSTM with an attention layer, and CNN. We also investigated transfer learning by using pre-trained BERT embeddings in conjunction with the deep learning models. Each model was evaluated in terms of accuracy, precision, recall, and F-measure, and its generalization was assessed on a cross-domain dataset. The experimental results revealed that the transformer-based model, when applied directly to the Roman Urdu hate speech classification task, outperformed traditional machine learning models, deep learning models, and pre-trained transformer-based models, with accuracy, precision, recall, and F-measure scores of 96.70%, 97.25%, 96.74%, and 97.89%, respectively. The transformer-based model also exhibited superior generalization on the cross-domain dataset.

15 pages, 6195 KB  
Article
Recognition and Classification of Handwritten Urdu Numerals Using Deep Learning Techniques
by Aamna Bhatti, Ameera Arif, Waqar Khalid, Baber Khan, Ahmad Ali, Shehzad Khalid and Atiq ur Rehman
Appl. Sci. 2023, 13(3), 1624; https://doi.org/10.3390/app13031624 - 27 Jan 2023
Cited by 12 | Viewed by 4119
Abstract
Urdu is a complex language, an amalgam of many South Asian and East Asian languages, which makes character recognition a large and difficult task. It is a bidirectional language: its numerals are written from left to right while its script is written in the opposite direction, which adds complexity to the recognition process. This paper presents the recognition and classification of a novel Urdu numeral dataset using convolutional neural networks (CNNs) and their variants. We propose a custom CNN model to extract features, which are classified by a softmax activation function and a support vector machine (SVM), and compare it with GoogLeNet and the residual network (ResNet) in terms of performance. Our proposed CNN achieves an accuracy of 98.41% with the softmax classifier and 99.0% with the SVM classifier, versus 95.61% for GoogLeNet and 96.4% for ResNet. Moreover, we develop datasets of handwritten Urdu numbers and Pakistani currency numbers to incorporate real-life problems. Our models achieve the best accuracies compared to previous optical character recognition (OCR) models in the literature.
(This article belongs to the Special Issue Digital Image Processing: Advanced Technologies and Applications)

22 pages, 2248 KB  
Article
Improving User Intent Detection in Urdu Web Queries with Capsule Net Architectures
by Sana Shams and Muhammad Aslam
Appl. Sci. 2022, 12(22), 11861; https://doi.org/10.3390/app122211861 - 21 Nov 2022
Cited by 4 | Viewed by 4858
Abstract
Detecting the communicative intent behind user queries is critical for search engines to understand a user's search goal and retrieve the desired results. With increased web searching in local languages, there is an emerging need to support language understanding for languages other than English. This article presents a distinctive capsule neural network architecture for intent detection from search queries in Urdu, a widely spoken South Asian language. The proposed two-tiered capsule network utilizes LSTM cells and an iterative routing mechanism between the capsules to effectively discriminate diversely expressed search intents. Since no Urdu query dataset was available, a benchmark intent-annotated dataset of 11,751 queries was developed, covering 11 query domains and annotated with Broder's intent taxonomy (navigational, transactional, and informational intents). Through rigorous experimentation, the proposed model attained state-of-the-art accuracy of 91.12%, significantly improving upon several alternative classification techniques and strong baselines. An error analysis revealed systematic error patterns owing to class imbalance and the large lexical variability of Urdu web queries.
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

15 pages, 9394 KB  
Article
Introducing Urdu Digits Dataset with Demonstration of an Efficient and Robust Noisy Decoder-Based Pseudo Example Generator
by Wisal Khan, Kislay Raj, Teerath Kumar, Arunabha M. Roy and Bin Luo
Symmetry 2022, 14(10), 1976; https://doi.org/10.3390/sym14101976 - 21 Sep 2022
Cited by 47 | Viewed by 4402
Abstract
In the present work, we propose a novel method utilizing only a decoder for the generation of pseudo-examples, which has shown great success in image classification tasks. The proposed method is particularly useful when data are limited, as in semi-supervised learning (SSL) or few-shot learning (FSL). While most previous works have used an autoencoder to improve classification performance for SSL, a single autoencoder may generate confusing pseudo-examples that degrade the classifier's performance, and models that use a full encoder-decoder architecture for sample generation can significantly increase computational overhead. To address these issues, we generate pseudo-examples efficiently by training only the generator (decoder) network, separately for each class, which proves effective for both SSL and FSL. In our approach, the decoder is trained on each class's samples using random noise, and multiple samples are then generated with the trained decoder. Our generator-based approach outperforms previous state-of-the-art SSL and FSL approaches. In addition, we release the Urdu digits dataset, consisting of 10,000 images (8000 training and 2000 test) collected through three different methods for diversity. Furthermore, we explore the effectiveness of the proposed method on the Urdu digits dataset under both SSL and FSL, demonstrating improvements of 3.04% and 1.50% in average accuracy, respectively, illustrating its superiority over current state-of-the-art models.
(This article belongs to the Special Issue Computer Vision, Pattern Recognition, Machine Learning, and Symmetry)

24 pages, 1076 KB  
Article
Attention-Based RU-BiLSTM Sentiment Analysis Model for Roman Urdu
by Bilal Ahmed Chandio, Ali Shariq Imran, Maheen Bakhtyar, Sher Muhammad Daudpota and Junaid Baber
Appl. Sci. 2022, 12(7), 3641; https://doi.org/10.3390/app12073641 - 4 Apr 2022
Cited by 29 | Viewed by 8173
Abstract
Deep neural networks have emerged as a leading approach to many natural language processing (NLP) tasks. Deep networks initially conquered problems in computer vision; however, sequential data such as text and sound remained difficult, because traditional deep networks do not reliably preserve contextual information. This may not harm results in image processing, where sequence does not matter, but for text it can produce disastrous results. Moreover, establishing sentence semantics in a colloquial text such as Roman Urdu is a challenge, and the sparsity and high dimensionality of such informal text pose a significant obstacle to building sentence semantics. To overcome these problems, we propose RU-BiLSTM, a deep recurrent architecture based on bidirectional LSTM (BiLSTM) coupled with word embeddings and an attention mechanism for sentiment analysis of Roman Urdu. Our proposed model uses the bidirectional LSTM to preserve context in both directions and the attention mechanism to concentrate on the more important features. Finally, a dense softmax output layer produces the binary and ternary classification results. We empirically evaluated our model on two available Roman Urdu datasets, RUECD and RUSA-19. Our proposed model outperformed the baseline models on many grounds, achieving a significant improvement of 6% to 8% over them.
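The attention mechanism this abstract describes weights the BiLSTM's per-token hidden states by a softmax over learned scores and pools them into one sentence vector. A simplified pure-Python sketch (not the authors' model; the toy states and scores are invented, and in the real model the scores come from a learned layer):

```python
import math

def attention_pool(hidden_states, scores):
    """Attention-style pooling (simplified): softmax the per-token scores,
    then return the weighted sum of the hidden-state vectors."""
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, hidden_states))
              for i in range(dim)]
    return pooled, weights

# Three toy 2-d BiLSTM outputs; the second token gets the highest score,
# so it dominates the pooled sentence representation.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, [0.1, 2.0, 0.1])
```

The pooled vector then feeds the dense softmax layer that produces the binary or ternary sentiment label.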
(This article belongs to the Special Issue Natural Language Processing: Recent Development and Applications)

18 pages, 1375 KB  
Article
Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media
by Lal Khan, Ammar Amjad, Kanwar Muhammad Afaq and Hsien-Tsung Chang
Appl. Sci. 2022, 12(5), 2694; https://doi.org/10.3390/app12052694 - 4 Mar 2022
Cited by 109 | Viewed by 13460
Abstract
Sentiment analysis (SA) has been an active research subject in natural language processing due to its important role in interpreting people's perspectives and drawing successful opinion-based judgments. On social media, Roman Urdu is one of the most extensively used dialects; its sentiment analysis is difficult due to morphological complexity and dialectal variation. This paper evaluates the performance of various word embeddings for Roman Urdu and English using a CNN-LSTM architecture with traditional machine learning classifiers. We introduce a novel deep learning architecture for Roman Urdu and English SA based on two layers: an LSTM for long-term dependency preservation and a one-layer CNN for local feature extraction. To obtain the final classification, the feature maps learned by the CNN and LSTM are fed to several machine learning classifiers, with various word embedding models supplying the input representations. Extensive tests on four corpora show that the proposed model performs exceptionally well on Roman Urdu and English text sentiment classification, with accuracies of 0.904, 0.841, 0.740, and 0.748 on the MDPI, RUSA, RUSA-19, and UCL datasets, respectively. The results show that the SVM classifier and the Word2Vec CBOW (Continuous Bag of Words) model are the better options for Roman Urdu sentiment analysis, whereas BERT word embeddings, a two-layer LSTM, and an SVM classifier are more suitable for English. The suggested model outperforms existing well-known advanced models on the relevant corpora, improving accuracy by up to 5%.
(This article belongs to the Topic Machine and Deep Learning)

13 pages, 2447 KB  
Article
AUDD: Audio Urdu Digits Dataset for Automatic Audio Urdu Digit Recognition
by Aisha Chandio, Yao Shen, Malika Bendechache, Irum Inayat and Teerath Kumar
Appl. Sci. 2021, 11(19), 8842; https://doi.org/10.3390/app11198842 - 23 Sep 2021
Cited by 32 | Viewed by 4879
Abstract
The ongoing development of audio datasets for numerous languages has spurred research toward designing smart speech recognition systems. A typical speech recognition system can be applied in many emerging applications, such as smartphone dialing, airline reservations, and automatic wheelchairs, among others. Urdu is the national language of Pakistan and is also widely spoken in many other South Asian countries (e.g., India, Afghanistan). We therefore present a comprehensive dataset of spoken Urdu digits from 0 to 9, comprising 25,518 sound samples collected from 740 participants. To test the proposed dataset, we apply several existing classification algorithms as baselines, including Support Vector Machine (SVM), Multilayer Perceptron (MLP), and variants of EfficientNet. Furthermore, we propose a convolutional neural network (CNN) for audio digit classification. The results show that the proposed CNN is efficient and outperforms the baseline algorithms in terms of classification accuracy.
(This article belongs to the Topic Machine and Deep Learning)
