Using Natural Language Processing and Machine Learning to Detect Online Radicalisation in the Maldivian Language, Dhivehi

Ibrahim, Hussain; Ibrahim, Ahmed; Johnstone, Michael N.

doi:10.3390/info16050342

by Hussain Ibrahim ¹,², Ahmed Ibrahim ²,* and Michael N. Johnstone ²

¹ Maldivian National Defence Force, Malé 20126, Maldives

² School of Science, Edith Cowan University, Joondalup 6027, Australia

* Author to whom correspondence should be addressed.

Information 2025, 16(5), 342; https://doi.org/10.3390/info16050342

Submission received: 4 March 2025 / Revised: 14 April 2025 / Accepted: 18 April 2025 / Published: 24 April 2025

(This article belongs to the Special Issue Natural Language Processing (NLP) with Applications and Natural Language Understanding (NLU))

Abstract

Early detection of online radical content is important for intelligence services to combat radicalisation and terrorism. The motivation for this research was the lack of language tools in the detection of radicalisation in the Maldivian language, Dhivehi. This research applied Machine Learning and Natural Language Processing (NLP) to detect online radicalisation content in Dhivehi, with the incorporation of domain-specific knowledge. The research used Machine Learning to evaluate the most effective technique for detection of radicalisation text in Dhivehi and used interviews with Subject Matter Experts and self-deradicalised individuals to validate the results, add contextual information and improve recognition accuracy. The contributions of this research to the existing body of knowledge include datasets in the form of labelled radical/non-radical text, sentiment corpus of radical words and primary interview data of self-deradicalised individuals and a technique for detection of radicalisation text in Dhivehi for the first time using Machine Learning. We found that the Naïve Bayes algorithm worked best for the detection of radicalisation text in Dhivehi with an Accuracy of 87.67%, Precision of 85.35%, Recall of 92.52% and an F₂ score of 91%. Inclusion of the radical words identified through the interviews with SMEs as a count feature improved the performance of ML algorithms and Naïve Bayes by 9.57%.

Keywords: Natural Language Processing; Machine Learning; radicalisation; terrorism; Dhivehi

1. Introduction

Radicalisation is derived from the Latin word radix, which refers to roots. Hence, the term broadly means the roots of terrorism. Radicalisation is a process that starts off with radicalisation of thought, where extreme views are accepted by an individual. This radicalisation of thought sometimes leads to radical actions, at which stage the individual becomes a violent extremist. The final stage or the culmination of the radicalisation process could lead to terrorism.

This research examines the ideological radicalisation in the Maldives, a country with close to 400,000 native speakers of Dhivehi, which is mainly based on Islamic radicalisation. The 20th and 21st centuries witnessed an increase in Islamic radicalisation worldwide, which has numerous push factors in the socio-economic and political domains. However, the ideological narratives of prominent Islamic scholars played an important role in shaping and driving the radicalised movements.

When international terrorist incidents are viewed through a lens of “cause and effect”, a large majority of the incidents are caused by individuals with extreme ideologies. The belief systems and ideologies of terrorists are usually crucial in leading them to commit their acts of terror, and the Internet plays an important role in their radicalisation. After studying the use of the Internet in 15 cases of terrorism and extremism in the United Kingdom (UK), von Behr et al. [1] found that the Internet accelerated the radicalisation process and aided in the process of self-radicalisation. Hence, the Internet is not only a tactical tool and a source of inspiration but also aids in accelerating the radicalisation process.

It is important to see the effect of the use of the Internet in the radicalisation process by examining the effect of this radicalisation on the international terrorism arena in general and the Maldivian context in particular. There are no official statistics on the number of terrorism cases in the Maldives. However, reports indicate that 173 foreign terrorist fighters from the Maldives travelled to Syria to participate in the Syrian Civil War in the name of jihad.

For the purpose of this research, the definition of radicalisation is the one used by the National Counter Terrorism Centre (NCTC) of the Maldives. The NCTC defined radicalisation as “Denial, violent advocacy and blatant disrespect of the spirit of the Maldivian Constitution, its laws and the societal norms through words and actions of violence, cruelty and mistreatment” [2].

The NCTC further elaborates the definition by stating that radicalisation makes a person move away from the path that is based on the culture and traditions of the land and the rules and regulations of the state [2]. Hence, a radical individual, by definition, is moving in a continuum that would encourage and ultimately lead to breaking of laws of the land and in some cases to committing acts of terrorism.

The contributions of this research include the following:

The creation of a dataset of radical/non-radical texts in Dhivehi, the Maldivian native language, through independent validation by Subject Matter Experts. Even though this dataset is relatively small, consisting of 162 radical and 162 non-radical texts, this is the first and the only independently verified radical dataset produced in Dhivehi.
The creation of a corpus of radicalisation words in Dhivehi for use in Machine Learning to detect radical sentiments. The corpus can be used by other researchers in the domain of radicalisation for Machine Learning and sentiment analysis.
A primary dataset in the form of transcripts and translations of real-life experiences of self-deradicalised individuals to identify the pathways of self-radicalisation and deradicalisation as well as their beliefs on controversial ideological issues. These kinds of primary data have heretofore only been collected inside prisons. This is the first dataset that is collected from individuals who are self-deradicalised.
This is the first time Machine Learning and Natural Language Processing (NLP) have been utilised for the detection of radicalisation in Dhivehi. The incorporation of Machine Learning and NLP tools specifically in the domain of radicalisation detection in Dhivehi has not been researched before.

The remainder of the paper is organised as follows: Section 2 explores the Maldivian context of terrorism, radicalisation through the Internet, background on NLP and Dhivehi. The research design and methods are presented in Section 3, followed by results and discussion in Section 4 and Section 5, respectively. Section 6 concludes the paper.

2. Related Works

This section provides an overview of key contextual, linguistic and technical aspects relevant to the research. It begins by introducing Internet radicalisation and the Maldives, covering its geography, demographics and historical transition to Islam, followed by an exploration of Dhivehi as a unique language, its script development and its integration into digital platforms. The section then delves into radicalisation, terrorism and the role of the Internet, outlining definitions, the impact of radicalisation in the Maldives and the influence of online platforms. This is followed by a review of the existing literature on radicalisation detection using Machine Learning techniques in prominent languages. The section further explores detection methods in Arabic due to its linguistic similarities with Dhivehi.

2.1. The Use of the Internet in Radicalisation

Similar to Awan [3], Aly [4] argues that the Internet is being increasingly used by terrorists to recruit new cadre and to further their agendas. Correa and Sureka [5] argue that the inherent features of online platforms such as low publication threshold, greater anonymity, large audiences and high speed make online platforms an ideal tool for terrorists in propagating hate speech and extremist content. This has led to security agencies using Investigative Data Mining (IDM) to detect online radicalisation. Wadhwa and Bhatia [6] categorised four different ways that security agencies use IDM in countering terrorism. These include identifying networks, assessing links between networks, predicting terrorist acts and using sentiment analysis to identify pathways of communication. It is the lack of this identification by counter terrorism organisations that led to some of the most devastating terrorist attacks of this century.

von Behr et al. [1] also ascertained that the Internet accelerates individual radicalisation processes by creating more opportunities to become radicalised and that it provides an “echo chamber”, a place where individuals find their ideas supported and echoed by other like-minded individuals. However, their study did not conclusively prove that the Internet is replacing the need for face-to-face contact during the radicalisation process. Instead, they suggest that it complements face-to-face communication. This conclusion is reiterated by Gunton [7], who concludes that echo chambers can reinforce existing belief systems but can only facilitate, rather than directly cause, radicalisation.

2.2. The Maldivian Context

Most of the acts of terrorism in the context of the Maldives fall under the first four acts mentioned in the definition of terrorism in the “Anti-terrorism Act” of the Maldives, which includes the act of killing, enforced disappearance or holding a person hostage, causing substantial damage to property and endangering the life of a person.

Radicalisation of Maldivians started in the early 1980s and culminated in the first terrorist attack in the Maldives. An Improvised Explosive Device (IED) was detonated in a public park near the Headquarters of the Maldives National Defence Force in the capital city, Male’, on 29 September 2007 wounding 12 foreign tourists from the UK, Japan and China. This was the first wakeup call to the security organisations in the nation of the emerging threat of radical Islamist terrorism. The terrorist attack was followed by a state-wide crackdown on extremists and resulted in a standoff between the radical elements on an island called Himandhoo and the security agencies, which later resulted in a violent confrontation, in which over 50 people were arrested [8]. Even before this first terrorist incident in 2007, a few incidents of concern were committed by Maldivians living abroad.

Some terrorist incidents of importance post-2007 that have occurred in the Maldives include, but are not limited to, two attempted assassinations of a sitting president and a former president and three high-profile killings, one of which was a prominent member of parliament and a religious scholar, the other two being journalists [9,10].

A large percentage (per capita) of Maldivians have also travelled to Syria to participate in the Syrian Civil War in the name of jihad (a holy war). According to Benmelech and Klor [11], the Maldives is second only to Tunisia in terms of foreign terrorist fighters per capita in Syria, with 500 per million of the population.

2.3. Use of the Internet and the Spread of Radicalisation in the Maldives

This century has seen an unprecedented growth in the volume of information sharing. This sharing was assisted by the development and growth of the Internet, which provided some degree of anonymity and a fast, effective and cheap communication medium for radicalisation to flourish. This, in turn, gave terrorists a tool to radicalise and recruit people for their cause and plan their activities.

Even though the Maldives is a developing nation, it is a nation where 63% [12] of the population use the Internet. This is higher than the global average of 53% and is comparable to most developed nations (for comparison, the percentage of Italians using the Internet is 63%). There is an unprecedented rapid increase in the prevalence of Internet use during the last few years, with the Internet use at 54% in 2015, 59% in 2016 and 63% in 2018 [12]. Maldivians are therefore increasingly exposed to cyber threats, including being more vulnerable to falling victim to radicalisation and recruitment attempts by individuals propagating radicalisation on the Internet and social media.

A Baseline Study of Radicalisation in the Maldives conducted by the NCTC shows that online radicalisation is the predominant factor leading to the recruitment of terrorists in the Maldives. The study was conducted as an intelligence-based analysis and showed that the main drivers of radicalisation are the online radical narratives that are being uploaded by Maldivians fighting in Syria and the radical elements operating in the Maldives using both the Dhivehi language for Dhivehi narratives and the English language for international narratives, aimed especially towards younger Maldivians who are proficient in the English language.

2.4. Approaches to Processing Large Volumes of Text

Before the advent of text analysis tools, analysing large text data was labour-intensive. However, content analysis software has made computers nearly as competent as humans in processing language to analyse the underlying meanings through NLP [13]. The increase in the computing power of today’s computers has enabled text analysis software to look at the concepts and themes expressed in large amounts of unstructured text and read into the underlying meaning of the conversation. Thompson [14] states that “social media is an effective tool to use, to radicalise and recruit members into a cause. It is always there whenever the user is. It lures its users with a promise of friendship, acceptance, or a sense of purpose” (p. 168). This promise of friendship and the sense of purpose have alienated today’s youth from the conventional sources of learning and increased reliance on the unverified sources of knowledge that are readily available on the Internet. In his conclusion, Thompson [14] states that “Analysts and decision makers involved in intelligence and national security need to be engaged in social media so they can understand the nuances of how nefarious users can leverage the benefits of social media to radicalise populations” (p. 179).

Most text analysis tools use NLP or Machine Learning algorithms for making predictions on large sets of unlabelled data. NLP employs computational techniques to learn, understand and produce human language [15]. The input could be in the form of a text consisting of alphabetical characters in the form of bytes or as a photograph having characters in the form of pixels or audio files. NLP is used in a wide range of applications including machine translation, speech recognition, speech synthesis, creation of spoken dialogue systems, speech-to-speech translation, social media mining and sentiment analysis.

Four key factors enabled NLP: the increase in the computational power of today’s computers, the availability of vast amounts of linguistic data, the development of Machine Learning methods and a richer understanding of the structure of human languages [15].

Unlike humans, who understand and associate meanings with the sounds of words, their tones and inflections, a machine works with numbers, so text must be converted to vectors before a machine can process language. Human language is complex and, in some contexts such as social media, very unstructured, so it may not follow the rules of grammar.

When text data are reduced to vectors, words with similar meanings (however different the letters that form the words may be) are represented by similar vectors. For example, if sovereign is represented by a vector, that vector would carry no gender: neither its direction nor its magnitude would have a gender bias. However, the vector for King would be close to the sum of the vectors for sovereign and masculine gender, whereas the vector for Queen would be close to the sum of the vectors for sovereign and feminine gender.
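The sovereign/King/Queen example can be made concrete with a toy vector space. The following is a minimal sketch only: the three-dimensional vectors and their axes (sovereignty, masculine, feminine) are hypothetical values chosen for illustration, whereas real word embeddings are learned from data and have many more dimensions.

```python
import math

# Hypothetical 3-D embeddings; axes are (sovereignty, masculine, feminine).
# These values are illustrative assumptions, not learned embeddings.
VECTORS = {
    "sovereign": (1.0, 0.0, 0.0),
    "king": (1.0, 1.0, 0.0),
    "queen": (1.0, 0.0, 1.0),
    "man": (0.0, 1.0, 0.0),
    "woman": (0.0, 0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to target."""
    candidates = {w: v for w, v in VECTORS.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


# king - man + woman: remove the masculine component, add the feminine one.
analogy = tuple(
    k - m + w
    for k, m, w in zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])
)
result = nearest(analogy, exclude=("king", "man", "woman"))
```

In this toy space, `result` is `"queen"`, because subtracting the masculine component from King and adding the feminine one lands exactly on the Queen vector.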

In the vector space, each word is represented by a vector in an axis/dimension. A sentence or document with a certain number of words is represented as a vector in that multi-dimensional space. The number of unique words in the document determines the number of dimensions.
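As an illustration of how the number of unique words sets the number of dimensions, the sketch below builds a simple count (bag-of-words) vector for each document. The English sentences and the helper names `build_vocabulary` and `to_count_vector` are assumptions made for this example, not artefacts of the study.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each unique token one axis (dimension) of the vector space."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: index for index, tok in enumerate(tokens)}


def to_count_vector(document, vocabulary):
    """Represent a document as token counts along each vocabulary axis."""
    counts = Counter(document.split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical placeholder documents.
docs = ["the sea is calm", "the sea is the road"]
vocab = build_vocabulary(docs)          # 5 unique tokens -> 5 dimensions
vectors = [to_count_vector(d, vocab) for d in docs]
```

Here the two documents share a five-word vocabulary (calm, is, road, sea, the), so each document becomes a five-dimensional vector.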

There are three basic approaches that use Machine Learning in the literature on radicalisation detection. The first is supervised Machine Learning, where a pre-labelled dataset is required for the Machine Learning algorithms to learn to classify text into the radical or non-radical category based on the features that the algorithm extracts from the text.

The second approach is based on the sentiment of the text. Here, a dictionary of positive and negative words is used to classify the text as negative (radical) or positive (non-radical). In this approach, separate lists of negation words and amplification words are used to negate or amplify the positivity or negativity of the text based on pre-determined rules. The advantage of this approach is that no labelled dataset is required. The disadvantage is that a dictionary of negative and positive words is needed. Another disadvantage is that a dictionary of domain-specific terms needs to be created for sentiment analysis to work in a particular domain.
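A dictionary-based classifier of this kind can be sketched in a few lines. The word lists below are hypothetical English placeholders, and the two rules (a negation flips the sign of the next sentiment word, an amplifier doubles its weight) are assumptions chosen for illustration; the literature uses a variety of rule sets.

```python
# Hypothetical placeholder lexicons, not the study's Dhivehi corpus.
NEGATIVE = {"attack", "destroy"}
POSITIVE = {"peace", "help"}
NEGATIONS = {"not", "never"}
AMPLIFIERS = {"very", "totally"}


def sentiment_score(text):
    """Sum word polarities, applying pending negations and amplifiers."""
    score = 0.0
    sign, weight = 1.0, 1.0
    for token in text.lower().split():
        if token in NEGATIONS:
            sign = -sign            # assumed rule: flip the next polarity
        elif token in AMPLIFIERS:
            weight *= 2.0           # assumed rule: double the next polarity
        elif token in POSITIVE or token in NEGATIVE:
            polarity = 1.0 if token in POSITIVE else -1.0
            score += sign * weight * polarity
            sign, weight = 1.0, 1.0  # modifiers affect one sentiment word only
    return score


def classify(text):
    """Map a negative overall score to 'radical', otherwise 'non-radical'."""
    return "radical" if sentiment_score(text) < 0 else "non-radical"
```

For example, `sentiment_score("we will totally destroy them")` is `-2.0` and is classified as radical, while `"we will not attack"` scores `1.0` because the negation reverses the negative word.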

Finally, the third approach is a combination of both. In this approach, both a labelled dataset as well as a dictionary of sentiment words are input to the Machine Learning algorithm. Most well-known languages used in communication have developed the language tools and datasets for NLP, including the sentiment corpora. This allows the libraries of tools and datasets including the corpora available for the well-known languages to be used in Machine Learning algorithms in the detection of radical content. However, in the researcher’s opinion, for languages that are poor in linguistic resources (without the linguistic tools and libraries of datasets and corpora), especially in the Machine Learning domain, the most practical approach is to produce the labelled datasets and the sentiment corpora for the radicalisation domain and then use NLP and Machine Learning algorithms for the detection of radicalisation.

Three systematic literature reviews of using Machine Learning to detect radicalisation: Adek and Ula [16]; Aldera, Emam, Al-Qurishi, Alrubaian and Alothaim [17]; and Gaikwad, Ahirrao, Phansalkar, Kotecha, Agarwal and Sureka [18] were studied to identify the most common Machine Learning algorithms, performance measures and approaches used in the literature. The summaries of these studies showing the number of articles citing each Machine Learning algorithm and metrics are presented in Table 1 and Table 2, respectively.

The most common Machine Learning algorithms were found to be SVM, Naïve Bayes, KNN, Random Forest and Logistic Regression, while the most common metrics used as a performance measure were Precision, Accuracy, F₁ Score and Recall.

2.5. Similarities Between Arabic and Dhivehi

Due to similarities between Arabic and Dhivehi, there are advantages in examining the techniques used in the detection of radicalisation in Arabic using Machine Learning. Arabic is one of the six official languages of the UN and the official language of 27 countries. It is spoken by more than 422 million people in the Arab world and is one of the fastest-growing languages on the Internet [19].

A large percentage of radicalisation content contains Arabic words that are written in Thikijehi Thaana. Naseem and Mushfique [20] stated that the radical elements would “prolifically publish translations of Wahhabi and other fundamentalist literature from all over the world in Dhivehi” (p. 5).

Like Dhivehi, Arabic is also written from right to left. Another similarity with Dhivehi is the separation of consonants and vowels where the consonants are written on the line, while short vowels are written either above or below the consonant. Both languages are phonetically written languages. A word is written as it is pronounced. The Arabic alphabet consists of 28 letters: 25 consonants and 3 long vowels. In addition to these vowels, Arabic script uses short vowels. These are placed either above or below the letters to provide the correct pronunciation and to clarify the meaning of the word. However, most of the online Arabic texts are written without the short vowels. This is because proficient Arab speakers can understand the text without the short vowels. However, the short vowels are used in children’s books as well as books for Arabic learners.

The absence of vowels presents a problem that challenges NLP systems. For example, without the short vowels above or below the letters, شعر may mean (شِعْرٌ poetry), (شَعْرٌ hair) or (شَعَرَ to feel). Fluent Arabic speakers would be able to distinguish the particular شعر that was used in the text by reading the whole sentence and drawing the meaning from the context [19].
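The ambiguity can be reproduced programmatically. The sketch below, which is an illustration rather than part of the study, removes Unicode combining marks (category `Mn`, which includes the Arabic short-vowel signs) from the three vocalised words above and shows that they collapse to the same bare form. The helper name `strip_short_vowels` is an assumption.

```python
import unicodedata


def strip_short_vowels(text):
    """Drop combining marks (category Mn), which include Arabic short vowels."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# The three vocalised words from the example, written as code points.
vocalised = {
    "poetry": "\u0634\u0650\u0639\u0652\u0631\u064c",
    "hair": "\u0634\u064e\u0639\u0652\u0631\u064c",
    "to feel": "\u0634\u064e\u0639\u064e\u0631\u064e",
}

# All three distinct words reduce to the same three-letter string.
bare_forms = {strip_short_vowels(word) for word in vocalised.values()}
```

After stripping, `bare_forms` contains a single string, so a system that sees only unvocalised text must rely on context to recover the intended word.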

Al-Rubaiee, Qiu and Li [21] conducted a study with a hybrid of NLP and Machine Learning approaches to classify Twitter data according to the sentiments expressed. Their analysis comprised four steps: pre-processing, applying Naïve Bayes and SVM, expansion to obtain results for N-gram terms of tokens and, finally, labelling by humans. SVM achieved the highest performance with Accuracy (96.06%), Precision (95.80%) and Recall (96.40%), followed by Naïve Bayes with Accuracy (88.38%), Precision (92.62%) and Recall (84.99%). A similar study was conducted by Albadi, Kurdi and Mishra [22], which examined religious hate speech in the Arabic Twitter sphere and achieved Precision (76%), Recall (78%), Accuracy (79%), F₁ Score (77%) and Area Under the Receiver Operating Characteristic Curve (84%) using Gated Recurrent Units-based Recurrent Neural Networks.

While Dhivehi has been significantly influenced by Arabic, particularly due to religious and cultural factors, this paper focuses on the shared features relevant to NLP tasks. A detailed comparison of linguistic differences between Dhivehi and Arabic falls outside the scope of this paper and would require a deeper linguistic analysis that is beyond the current study’s focus.

3. Materials and Methods

The approach selected was a mixed-method approach comprising a qualitative and a quantitative phase (see Figure 1). In the quantitative phase, a labelled dataset of radical and non-radical texts, together with radical words identified through a series of interviews with three Subject Matter Experts (SMEs), was used by the Machine Learning algorithms, and parameters and features were selected to optimise performance. A similar approach was used by Hung, Muramudalige and Jayasumana [23] when they used human-in-the-loop extraction and NLP techniques to detect radicalisation indicators in text.

The qualitative phase used the interview data of the self-deradicalised individuals. The interview data were used for the following purposes:

The qualitative data were evaluated to identify radicalisation pathways and validate the assumptions made with regard to the role of the Internet in an individual’s radicalisation journey.
They were used to improve the dataset of radical text and to improve the corpus of radical words.
They were used to identify the effect of Thaana text on social media in an individual’s radicalisation journey and to evaluate what worked in their deradicalisation process.

3.1. Quantitative Phase

The quantitative phase was composed of seven stages, as follows:

Stage 1: Collect Radical Data

The initial radical dataset was obtained from the archives of the Defence Intelligence Service (DIS) of the Maldivian National Defence Force. This dataset was labelled as radical by the intelligence officer of DIS responsible for collecting open-source intelligence on radical texts. This confidential dataset contained radical texts in both Dhivehi and English. The data in English were discarded, while the Dhivehi data were retained. The dataset was converted to a Comma Separated Value (CSV) file in Thaana script, including data transliterated from Latin script.

Stage 2: Validate with SMEs

The identifiers of individuals were removed from the radical dataset provided by DIS. The dataset was validated by three different Subject Matter Experts (SMEs): one from academia, one from the religious sector and one from the security sector.

The three SMEs were from three different specialities, different age groups, a mix of genders as well as from different social backgrounds to gain insights into the radicalisation context from different perspectives. The three SMEs were selected based on their knowledge of Maldivian radicalisation domain. All three SMEs are representative of the cohort of experts available in the domain in the Maldives.

Once the three SMEs had independently labelled the dataset, each text was assigned a final radical/non-radical label based on a majority decision of at least two of the three SMEs.
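The majority rule can be expressed as a short function. This is a minimal sketch: the text identifiers and the ratings shown are hypothetical, and the function name `majority_label` is an assumption.

```python
from collections import Counter

ALLOWED = {"radical", "non-radical"}


def majority_label(sme_labels):
    """Return the label chosen by at least two of the three SMEs."""
    if len(sme_labels) != 3 or not set(sme_labels) <= ALLOWED:
        raise ValueError("expected three labels, each radical or non-radical")
    # With two classes and three voters, the top label always has >= 2 votes.
    label, _votes = Counter(sme_labels).most_common(1)[0]
    return label


# Hypothetical ratings: (text identifier, labels from SME 1, 2 and 3).
ratings = [
    ("t1", ["radical", "radical", "non-radical"]),
    ("t2", ["non-radical", "radical", "non-radical"]),
    ("t3", ["radical", "radical", "radical"]),
]
final_labels = {text_id: majority_label(labels) for text_id, labels in ratings}
```

Because there are exactly three raters and two classes, a tie cannot occur, which is one practical reason for using an odd number of SMEs.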

Stage 3: Add Non-Radical Data

Data that were categorised as non-radical by two or more SMEs were added to the non-radical dataset. More non-radical data were added by scraping X (known as Twitter at the time of the research) on random everyday topics in Thaana to reduce skew and create a balanced, categorised dataset on which to train supervised Machine Learning algorithms.

Stage 4: Pre-Process Training Dataset

Four additional training datasets were created from the original training dataset by carrying out NLP pre-processing. The main tasks carried out in this stage were the normalisation of the text and noise removal so that the Machine Learning algorithms could extract text features related to the meaning of the text without the added noise of text, which does not contribute to the meaning. Each dataset had a specific NLP technique applied to it. The first dataset was tokenised, the second dataset had the stop words removed, the third dataset had the words replaced by the stems and the fourth dataset had POS-tagged tokens. These are the four dataset variants depicted in Figure 1.
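The first three variants can be sketched as below. This is an illustration under stated assumptions: the stop-word set and suffix list are hypothetical English placeholders (the study's Dhivehi resources are not given in the text), each technique is applied independently to the tokenised text, and POS tagging is omitted because it needs a trained tagger. Punctuation is removed by Unicode category rather than with `\w`, since `\w` does not match combining vowel marks.

```python
import re
import unicodedata

# Hypothetical placeholders for a stop-word list and stemming suffixes.
STOP_WORDS = {"the", "of", "and"}
SUFFIXES = ("ing", "ed", "s")


def tokenise(text):
    """Remove URLs and punctuation (noise), lower-case, split on whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return cleaned.lower().split()


def remove_stop_words(tokens):
    """Drop tokens that carry little meaning on their own."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


text = "The fishermen repaired the boats, see https://example.org/news"
tokens = tokenise(text)                   # variant 1: tokenised
no_stops = remove_stop_words(tokens)      # variant 2: stop words removed
stems = [stem(tok) for tok in tokens]     # variant 3: words replaced by stems
```

Each variant would then be vectorised and passed to the classifiers separately, so that the effect of each pre-processing technique can be compared.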

Stage 5: Exploratory Data Analysis

The four datasets were used to train multiple Machine Learning algorithms. SVM, Naïve Bayes, Logistic Regression, Neural Networks, KNN, Decision Tree, AdaBoost and Random Forest were evaluated with their default parameters on all four datasets, using three text feature representations from the scikit-learn text vectorisers (Count Vectoriser, TF-IDF Vectoriser and Hashing Vectoriser), to explore the average Accuracy, Precision, Recall and F₁ Score of the cross-validation results across the Machine Learning algorithms before fine-tuning parameters. A random 90%/10% split into training and testing datasets was used at all stages of Machine Learning (i.e., stages 5, 6 and 7).
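To illustrate one of the evaluated algorithms together with the 90%/10% split, the sketch below implements a multinomial Naïve Bayes classifier over token counts with add-one smoothing, using only the Python standard library. It is not the study's implementation, which relied on library classifiers and vectorisers; the class and function names, the fixed random seed and the repeated English sentences are all assumptions for the example.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split(rows, test_fraction=0.1, seed=0):
    """Shuffle the rows and hold out a fraction (10% by default) for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, rows):
        self.class_docs = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in rows:
            self.class_docs[label] += 1
            for tok in text.split():
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in text.split():
                if tok in self.vocab:  # ignore tokens unseen in training
                    count = self.token_counts[label][tok]
                    score += math.log(
                        (count + 1) / (total_tokens + len(self.vocab))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical placeholder data: 10 texts per class.
rows = [("fight the enemy", "radical")] * 10
rows += [("buy fish at the market", "non-radical")] * 10

train, test = train_test_split(rows)
model = NaiveBayes().fit(train)
accuracy = sum(model.predict(t) == y for t, y in test) / len(test)
```

With 20 rows, the split yields 18 training and 2 testing items. On real data, the split and evaluation would normally be repeated (for example through cross-validation) to obtain average scores.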

Stage 6: Fine-tune Parameters of Machine Learning Algorithms

During the fine-tuning stage, the optimum parameters for the Machine Learning algorithms were identified for different text feature selection methods. This optimisation was carried out to improve the performance of the Machine Learning algorithms, which enabled a fair comparison between algorithms.

Stage 7: Machine Learning with Sentiment Features

During the validation of the dataset by the three SMEs, a list of radical terms at word level and phrase level (bigrams and trigrams) was created, and the number of occurrences of these words and phrases in each text of the dataset was added as an additional feature representing the radical sentiment.
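A count feature of this kind can be computed as sketched below. The radical term list is a hypothetical English placeholder standing in for the SME-derived Dhivehi terms, and the function names are assumptions.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token phrases as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical placeholder terms: one unigram, one bigram, one trigram.
RADICAL_TERMS = {"enemy", "holy war", "join the fight"}


def radical_term_count(text):
    """Count occurrences of listed words, bigrams and trigrams in a text."""
    tokens = text.lower().split()
    return sum(
        1
        for n in (1, 2, 3)
        for gram in ngrams(tokens, n)
        if gram in RADICAL_TERMS
    )


count_a = radical_term_count("join the fight against the enemy in this holy war")
count_b = radical_term_count("the market opens at dawn")
```

The resulting integer (3 for the first text, 0 for the second) would be appended to each text's feature vector alongside the vectoriser output.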

3.2. Qualitative Phase

The qualitative phase was composed of three stages:

Stage 8: Conduct Interviews of Self-Deradicalised Individuals

Through the assistance of the NCTC, five self-deradicalised individuals volunteered for the interviews. Since the target population in this research is a homogeneous population with respect to culture, background and ideology, data saturation was expected to be achieved with a small number of participants.

The questions were mostly open-ended discussion questions to gain insight into the participant’s radicalisation journey. Blogs, YouTube audio visuals, Facebook posts, web pages or Tweets that were mentioned during the interviews were also collected during this stage.

The qualitative analysis of the interview data identified the religious scholars, social media and the Internet sites followed by the participants. This provided an insight into the self-radicalisation and self-deradicalisation process. The radical words and phrases that the participants used in the interviews and in the social media accounts and webpages were further used to improve the radicalisation corpus.

Stage 9: Thematic Analysis of the Interviews

The transcribed and translated interview records were used for thematic analysis. Common themes and sub-themes and the common pathways to radicalisation and deradicalisation were identified. The records were also used to identify the radical words and phrases used and the most followed scholars, living and dead, local and international, in order to assess their influence on the radicalisation and deradicalisation process. The social media platforms, accounts and the use of the Internet in the radicalisation process were also analysed.

Stage 10: Feedback to Improve the Dataset and Radical Corpus

Themes and sub-themes that emerged from the interviews were further analysed. The radical words and phrases that we could identify as belonging to the themes and sub-themes identified from the interviews were added as additional features. These radical words and phrases used during the interviews were fed back into the quantitative phase (Stage 7) to enhance the corpus of radical words and phrases in Dhivehi as well as to improve the performance of the Machine Learning algorithms. A total of 175 radical words were incorporated into the corpus.

3.3. Validity and Reliability

The following threats to internal validity were addressed:

Bias in labelling the dataset: Efforts have been made to reduce bias in labelling the dataset by independently labelling the dataset by three SMEs. The bias that could arise from gender, cultural background as well as age of the SMEs has been reduced as much as possible through selection of SMEs from different educational and cultural backgrounds and with the use of both female and male SMEs.
Unbalanced training dataset: The dataset initially shared by DIS produced 162 radical and 38 non-radical data items. The dataset was balanced with further scraping of Twitter (X) to produce an equal number of non-radical data items.

The following threat to external validity was addressed:

Using a small dataset in a supervised Machine Learning approach: The small dataset of 162 radical and 162 non-radical data items may cause overfitting. However, incorporating the strengths of lexicon-based approaches into the Machine Learning algorithms with lexicon-based features through a semi-supervised approach is expected to mitigate this potential problem. Additional words in the radical corpus incorporated changes that were happening in real time, mitigating the threat to external validity.

3.4. Research Participants and Ethics

There were two groups of participants for the interviews.

Group 1: Three Subject Matter Experts (SMEs) with experience and knowledge related to terrorism and radicalisation in the Maldives.
Group 2: Five self-deradicalised volunteers from the Maldives. They were recruited with the assistance of the NCTC.

Both groups were interviewed remotely, and the audio was recorded and later altered by editing the pitch and speed to prevent voice recognition of individuals. Any identifiable information was removed and coded during the transcribing and translation of the interviews.

3.5. Research Metrics

As mentioned previously, the most commonly used Machine Learning performance metrics in this area are Accuracy, Precision, Recall and F-Score. The values in the confusion matrix can be used to calculate the appropriate metrics.

In Figure 2, all the correctly predicted cases of radical texts were true positives, and the correctly predicted non-radical texts were true negatives. Non-radical texts that were incorrectly classified as radical were false positives or Type I errors, whereas radical texts that were incorrectly classified as non-radical were false negatives or Type II errors.

The F₂ Score is a weighted harmonic mean of Precision and Recall that weights Recall more heavily than Precision. We use the F₂ Score as the primary performance measure because, in the counter terrorism domain as in the cyber security domain, the threat posed by failing to detect a radical text (a False Negative) is greater than the inconvenience of wrongly classifying a non-radical text as radical (a False Positive). It is calculated by

F₂ Score = 5 × Precision × Recall/(4 × Precision + Recall)

or

F₂ Score = TP/(TP + 0.2 × FP + 0.8 × FN)
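The equivalence of the two expressions above can be checked numerically. The following is a minimal sketch; the confusion-matrix counts are hypothetical values chosen for illustration and are not taken from the study, and the function names are assumptions.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; beta = 2 weights Recall more heavily."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


def f2_from_pr(precision: float, recall: float) -> float:
    """F2 as the weighted harmonic mean of Precision and Recall."""
    return 5 * precision * recall / (4 * precision + recall)


# Hypothetical counts for illustration only (not the paper's data).
tp, fp, fn = 90, 15, 10
precision = tp / (tp + fp)
recall = tp / (tp + fn)

f2_counts = f_beta(tp, fp, fn)            # general F-beta form with beta = 2
f2_pr = f2_from_pr(precision, recall)     # first expression in the text
f2_alt = tp / (tp + 0.2 * fp + 0.8 * fn)  # second expression in the text

# Swapping the error counts shows that missed radical texts cost more.
f2_more_misses = f_beta(tp, fp=10, fn=15)

print(round(f2_counts, 4), round(f2_pr, 4), round(f2_alt, 4))
print(round(f2_more_misses, 4))
```

All three forms give the same value for the same counts, and the score falls when false negatives outnumber false positives, which is the behaviour the choice of F₂ is meant to capture.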

4. Results

The data collection started after we received the ethics approval for the research. The data collected from both the quantitative and qualitative phases include the following:

Radical dataset in PDF form from the DIS, later converted to CSV format for further processing.
Validation data from SMEs to validate radical dataset from DIS.
Non-radical data scraped from Twitter based on random everyday topics in the Dhivehi language.
Interview data from self-deradicalised individuals in the Maldives.

4.1. Description of Datasets

The dataset was a PDF file of photographs without text, photographs with text and photographs of text, containing text in English, text in Dhivehi written in Thaana script and text in Dhivehi transliterated to Latin script. Of the 257 photographs contained in the dataset, 61 were discarded because they were in English and were therefore out of scope of this research. A further five photographs were discarded because they did not contain text, and five additional photographs were discarded because they were repetitions. This reduced the initial dataset to 186 photographs, with some photographs representing a conversation between two or more individuals and therefore yielding more than one text. A final text dataset of 200 unique texts was obtained after deleting the repeated messages.

4.2. Fine-Tuning of Parameters of Machine Learning Algorithms

Eight Machine Learning algorithms were trained on the tokenised dataset using three text feature extraction methods, namely Count Vectoriser, Hashing Vectoriser and the TF-IDF Vectoriser. Their respective fine-tuned parameters are shown in Table 3.
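The three feature extraction methods named above differ mainly in how tokens are mapped to vector positions and how they are weighted. The following is a minimal standard-library sketch and not the implementation used in the study: the placeholder tokens, the bucket count `n_features`, the use of CRC32 as the hash function and the smoothed IDF weighting are all assumptions made for illustration.

```python
import math
import zlib
from collections import Counter

# Placeholder tokenised texts (hypothetical stand-ins for Dhivehi tokens).
docs = [["w1", "w2", "w2"], ["w2", "w3"], ["w1", "w3", "w3", "w4"]]
vocab = sorted({tok for doc in docs for tok in doc})


def count_vector(doc):
    """Raw term counts over a fixed, learned vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def hashing_vector(doc, n_features=8):
    """Hashing trick: count tokens per hash bucket, with no stored vocabulary."""
    vec = [0] * n_features
    for tok in doc:
        vec[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
    return vec


def tfidf_vector(doc):
    """Term counts scaled by a smoothed inverse document frequency."""
    n_docs = len(docs)
    counts = Counter(doc)
    vec = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)  # documents containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        vec.append(counts[term] * idf)
    return vec


print(count_vector(docs[0]))
print(hashing_vector(docs[0]))
print([round(x, 3) for x in tfidf_vector(docs[2])])
```

In this sketch the count and TF-IDF vectors share the same positions and differ only in weighting, while the hashing vector has a fixed length regardless of vocabulary size, at the cost of possible collisions between tokens.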

4.3. Performance of Machine Learning Algorithms

Eight different Machine Learning algorithms and four different text normalisation techniques of NLP as well as three different text-to-feature vector conversions of NLP were used to identify the best-performing combinations of algorithms, NLP text normalisations and text feature conversions. The highest differences in performance were seen with the use of the Machine Learning algorithms.

A comparison of the F₂ Score for the eight Machine Learning algorithms is shown in Figure 3. Figure 3 shows that Naïve Bayes, Neural Networks and KNN algorithms were the best performers. Naïve Bayes performed best with TF-IDF Vectoriser (86.92%) and Hashing Vectoriser (84.79%), while AdaBoost performed best with Count Vectoriser (86.05%).

The results for text feature extraction methods are shown in Figure 4. Naïve Bayes performed the best for two of the text feature extraction methods, namely, TF-IDF Vectoriser (86.92%) and Hashing Vectoriser (84.79%). Unexpectedly, the inclusion of additional NLP normalisations of the text data (stop word removal, stemming and POS tagging) did not consistently improve the predictive performance of the Machine Learning algorithms, with the exception of Neural Networks.

Overall, as shown by Table 4, the Naïve Bayes algorithm performed best, followed by the Neural Network algorithm. In an attempt to improve the performance metrics portrayed in Table 4, the radical words found in the dataset were identified during the dataset validation interviews with the SMEs. A list of radical words was created, and a new feature (radical sentiment feature) was added as the count of radical words found in the text. The Machine Learning algorithms were re-evaluated with the radical sentiment feature added to the tokenised text. The best result (F₂ Score) obtained for the prediction of radical texts without the addition of the contextual data (radical sentiment feature) identified from the interviews with the SMEs was 86.92% (Table 4) with Naïve Bayes algorithm. The inclusion of the new feature improved the prediction of radical content to 91.52% (Table 5).
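The radical sentiment feature described above can be sketched as one extra column appended to the token-count vector. This is an illustrative sketch under stated assumptions rather than the study's code: the lexicon entries and tokens are hypothetical placeholders (the actual corpus consists of Dhivehi terms and is not reproduced here), and the function names are invented for the example.

```python
from collections import Counter

# Hypothetical placeholder lexicon standing in for the radical word corpus.
RADICAL_LEXICON = {"term_a", "term_b", "term_c"}


def radical_word_count(tokens):
    """Number of tokens in the text that appear in the radical-word lexicon."""
    return sum(1 for tok in tokens if tok in RADICAL_LEXICON)


def featurise(tokens, vocab):
    """Bag-of-words counts with the radical word count appended as a final column."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab] + [radical_word_count(tokens)]


vocab = ["term_a", "term_b", "word_x", "word_y"]
tokens = ["word_x", "term_a", "term_a", "term_c", "word_y"]
features = featurise(tokens, vocab)
print(features)
```

Note that the lexicon count also captures lexicon words that are absent from the training vocabulary (here `term_c`), which is one way domain knowledge from the interviews can add information beyond the tokenised text alone.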

Not all of our attempts to improve the metrics by adding the radical word count feature were successful: Figure 5 and Table 5 show that its inclusion made some combinations of algorithm and vectoriser perform worse than previously.

5. Discussion

The relative meaning of radicalisation refers to anything outside accepted norms. Radicalisation can only be studied in relation to a society, as what is acceptable in one society may be completely unacceptable in another, as norms of society are dependent on a multitude of factors including culture, history, religion, education, affluence of the society, sometimes geography and the weather as well. For example, public flogging as a punishment is acceptable in Saudi Arabia while it would not be acceptable in the UK. Even amongst Muslim nations, societal norms differ and hence the differences in the perception of what is radical and what is not.

There are certain unique and contrasting features of the Maldivian society that make the radicalisation context of the Maldives different from other societies. These unique and contrasting features as well as the words associated with these features were understood through the interviews.

Through the interviews, we observed the following:

The Internet plays a major role in the radicalisation of Maldivians.
The peer network plays an important role in the radicalisation of individuals.
It is not just uneducated individuals who are radicalised as all the self-deradicalised interview participants had a secondary education or higher.

The SMEs labelled the datasets under three broad types of radicalisation, in which the narrative tried to

Bring about ideological change towards a more radical version of Islam;
Propagate hatred in society;
Incite violence.

The SMEs displayed a high level of agreement in categorising the texts as well as in identifying the type of radicalisation of the texts. In categorising the texts as radical or non-radical, the agreement of each SME was over 90%: SME 1 had 93% agreement (six disagreements on non-radical and eight on radical texts); SME 2 had 95% agreement (six disagreements on non-radical and four on radical texts); and SME 3 had 98% agreement (one disagreement on non-radical and three on radical texts). Hence, disagreement amongst the SMEs on the radical/non-radical categorisation did not exceed 7%.

The observed agreement between three SMEs was 0.6666. The expected agreement between the three SMEs was 0.2743, and the Fleiss’ Kappa was 0.5315, which is considered to be moderate.
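For readers unfamiliar with the statistic, Fleiss’ Kappa relates the observed agreement to the agreement expected by chance as (observed − expected)/(1 − expected). The following minimal sketch computes it from a table of rating counts; the toy ratings (three raters, two categories, four items) are hypothetical and are not the SME labels from this study.

```python
def fleiss_kappa(table):
    """Fleiss' kappa, where table[i][j] = number of raters assigning item i to category j."""
    n_items = len(table)
    n_raters = sum(table[0])  # assumes every item is rated by the same number of raters
    total = n_items * n_raters

    # Proportion of all assignments that fall into each category.
    p_cat = [sum(row[j] for row in table) / total for j in range(len(table[0]))]

    # Per-item agreement, averaged to give the observed agreement.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_obs = sum(p_items) / n_items
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings: columns are [radical, non-radical], rows are texts.
toy_ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
kappa = fleiss_kappa(toy_ratings)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement, while values near 0 indicate agreement no better than chance.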

Themes that emerged from the interviews include the following:

Islamic brotherhood;
Arabised terms and norms;
Lack of respect for human rights.

The best Machine Learning algorithm identified for Dhivehi written in Thaana script was the Naïve Bayes algorithm. There are two possible reasons why it worked so well for this dataset. The first is the algorithm’s simplicity: it assumes that features are independent and gives each feature the same weight when calculating the class of a text. The second is the small amount of labelled data available for the Machine Learning algorithms to train on. With the inclusion of more data, Neural Networks may be able to perform as well as Naïve Bayes.
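The independence assumption mentioned above can be made concrete with a small multinomial Naïve Bayes classifier. This is a simplified standard-library sketch, not the model trained in the study: the class name, the toy tokens and the labels are hypothetical, and `alpha` here is an additive smoothing parameter playing a role analogous to the `alpha` values in Table 3.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Minimal multinomial Naive Bayes over token lists with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(docs)) for c in self.classes}
        self.total = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        # Independence assumption: per-token log-likelihoods are simply summed.
        denom = self.total[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in doc:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical toy data; tokens are placeholders, not Dhivehi words.
train_docs = [["a", "b"], ["a", "c"], ["x", "y"], ["x", "z"]]
train_labels = ["radical", "radical", "non-radical", "non-radical"]
model = TinyMultinomialNB(alpha=1.0).fit(train_docs, train_labels)
print(model.predict(["a", "x", "a"]), model.predict(["x", "y"]))
```

Because the model only needs per-class token counts, it has very few parameters to estimate, which is one plausible reason such a classifier can remain competitive on a dataset of a few hundred texts.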

6. Conclusions

This research demonstrated that Machine Learning and NLP can be utilised for the detection of radicalisation in Dhivehi. Using the techniques developed in this research, the radical content that is propagated on social media can be detected and the database of radical texts can be further enhanced.

Key findings:

Tokenisation without the additional normalisation of stop word removal, stemming or POS tagging worked best for six out of the eight Machine Learning algorithms tested.
Interviews assisted in the identification of categories of radicalisation and the themes on which people are radicalised.
Inclusion of radical sentiment data improved the performance.

Key contributions:

The creation of a dataset of radical/non-radical texts in Dhivehi through independent validation by Subject Matter Experts.
The creation of a corpus of radicalisation words in Dhivehi for use in Machine Learning algorithms in the detection of radical sentiments. This was carried out through the interviews with the SMEs and self-deradicalised individuals.
A primary dataset in the form of transcripts and translations of real-life experiences of self-deradicalised individuals to identify the pathways of self-radicalisation and deradicalisation as well as their beliefs on controversial ideological issues.
A methodology that utilised Machine Learning and NLP for the first time in the detection of radicalisation text in Dhivehi.

Limitations:

The scope of this research was deliberately restricted to Dhivehi language alone. It is common for Maldivians to communicate by mixing Dhivehi and English. This was also observed in some radical text; however, English was excluded in this study.
Due to the limited dataset used in the research, Large Language Models could not be used. However, with the identification of additional radical data, this is an approach that could be explored. Furthermore, recent progress [24,25] made in adopting transformer models (e.g., BERT, DistilBERT) shows great potential for adapting them to multilingual radicalisation detection in social media.
The radical dataset provided by DIS covers only the period since its formation in 2016. Hence, the size of the dataset was limited.
This research project commenced in 2019; therefore, data collection coincided with the COVID-19 pandemic travel restrictions. This prevented us from conducting face-to-face interviews with the self-deradicalised individuals, which limited our ability to recruit more participants and to establish the greater trust that could have led to more open conversations.

Populating the database with additional text data and identification of additional features would increase the accuracy of the algorithms. The accurate identification of radicalisation material on the Internet would allow law enforcement authorities to take the necessary precautionary measures to reduce the spread of such material.

This research is a step in the direction of introducing Machine Learning to detect radical content written in Dhivehi on social media. The research needs further improvement and contributions from other researchers. The uses of audio files, text over photographs and Latin script are also seen in the radicalisation domain. The incorporation of these data in the radicalisation domain would increase the datasets available for Machine Learning algorithms to train. Hence, future research may explore the inclusion of Latin script, inclusion of photographs (through OCR) and inclusion of audio material (through conversion of voice to text).

Author Contributions

Conceptualisation, H.I. and A.I.; methodology, H.I.; validation, H.I., A.I. and M.N.J.; investigation, H.I.; resources, H.I.; data curation, H.I.; writing—original draft preparation, H.I.; writing—review and editing, H.I., A.I. and M.N.J.; supervision, A.I. and M.N.J.; project administration, A.I.; funding acquisition, A.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the ECU-Maldives National Defence Force-The Maldives National University Scholarship to Support Industry Engagement PhD Projects (Grant No.: G1003964).

Institutional Review Board Statement

This research project has received the approval of Edith Cowan University’s Human Research Ethics Committee, in accordance with the National Health and Medical Research Council’s National Statement on Ethical Conduct in Human Research 2007 (Updated 2018). The approval number is 2021-02231-IBRAHIM, obtained on 10 June 2021.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets presented in this article are not readily available because the dataset is currently being reviewed to be shared in the ECU Research Dataset at the time of this publication. Requests to access the datasets should be directed to ahmed.ibrahim@ecu.edu.au.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. von Behr, I.; Reding, A.; Edwards, C.; Gribbon, L. Radicalisation in the Digital Era: The Use of the Internet in 15 Cases of Terrorism and Extremism; Rand: Brussels, Belgium, 2013.
2. National Counter Terrorism Centre (Ed.) Harukashi Fikuru Maanakurun; National Counter Terrorism Centre: Malé, Maldives, 2019. Available online: https://nctc.gov.mv/announcement/anncmnt4.pdf (accessed on 10 June 2021).
3. Awan, A.N. Radicalization on the Internet? RUSI J. 2007, 152, 76–81.
4. Aly, A. The Internet as Ideological Battleground. In Proceedings of the 1st Australian Counter Terrorism Conference, Perth, WA, Australia, 30 November 2010.
5. Correa, D.; Sureka, A. Solutions to detect and analyze online radicalization: A survey. arXiv 2013, arXiv:1301.4916.
6. Wadhwa, P.; Bhatia, M.P.S. Tracking on-line radicalization using investigative data mining. In Proceedings of the IEEE International Conference on Communications, New Delhi, India, 15–17 February 2013.
7. Gunton, K. The Impact of the Internet and Social Media Platforms on Radicalisation to Terrorism and Violent Extremism. In Privacy, Security and Forensics in the Internet of Things (IoT); Springer: Cham, Switzerland, 2022; pp. 167–177.
8. American Foreign Policy Council. Quick Facts, Maldives. 2013. Available online: https://almanac.afpc.org/uploads/documents/Maldives%202020%20Website.pdf (accessed on 10 June 2021).
9. Sharuhan, M. Police: IS Sympathizers Behind Attempt on Ex-Maldives Leader. Associated Press. 2021. Available online: https://apnews.com/article/government-and-politics-religion-islamic-state-group-maldives-0b491f40f6a5a72ad31b82b193af0322 (accessed on 30 July 2021).
10. Aiham, A. Murder commission pushes for charges against culprits behind Yameen, Rilwan, Afrasheem’s murders. The Edition, 3 December 2019.
11. Benmelech, E.; Klor, E. What explains the flow of foreign fighters to ISIS? Terror. Political Violence 2020, 32, 1458–1481.
12. ITU. 2019. Available online: https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx (accessed on 10 June 2021).
13. Bitter, C.; Elizondo, D.A.; Yang, Y.J. Natural language processing: A prolog perspective. Artif. Intell. Rev. 2010, 33, 151–173.
14. Thompson, R. Radicalization and the Use of Social Media. J. Strateg. Secur. 2011, 4, 167–190.
15. Hirschberg, J.; Manning, C.D. Advances in natural language processing. Science 2015, 349, 261–266.
16. Adek, R.; Ula, M. Systematics review on the application of social media analytics for detecting radical and extremist group. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1071, 012029.
17. Aldera, S.; Emam, A.; Al-Qurishi, M.; Alrubaian, M.; Alothaim, A. Online extremism detection in textual content: A systematic literature review. IEEE Access 2021, 9, 42384–42396.
18. Agarwal, S.; Sureka, A. Applying social media intelligence for predicting and identifying on-line radicalization and civil unrest oriented threats. arXiv 2015, arXiv:1511.06858.
19. Boudad, N.; Faizi, R.; Thami, R.O.; Chiheb, R. Sentiment analysis in Arabic: A review of the literature. Ain Shams Eng. J. 2018, 9, 2479–2490.
20. Naseem, A.; Mushfique, M. Maldives: The Long Road from Islam to Islamism, A Short History. Dhivehi Sitee. Available online: https://www.dhivehisitee.com/religion/islamism-maldives/ (accessed on 10 June 2021).
21. Al-Rubaiee, H.; Qiu, R.; Li, D. Identifying Mubasher software products through sentiment analysis of Arabic tweets. In Proceedings of the 2016 International Conference on Industrial Informatics and Computer Systems (CIICS), Sharjah, United Arab Emirates, 13–15 March 2016.
22. Albadi, N.; Kurdi, M.; Mishra, S. Are they Our Brothers? Analysis and Detection of Religious Hate Speech in the Arabic Twittersphere. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, 28–31 August 2018; pp. 69–76.
23. Hung, B.W.K.; Muramudalige, S.R.; Jayasumana, A.P.; Klausen, J.; Moloney, E. Recognizing Radicalization Indicators in Text Documents Using Human-in-the-Loop Information Extraction and NLP Techniques. In Proceedings of the 2019 IEEE International Symposium on Technologies for Homeland Security (HST), Woburn, MA, USA, 5–6 November 2019; pp. 1–7.
24. Zerrouki, K.; Benblidia, N. Multilingual Text Preprocessing and Classification for the Detection of Extremism and Radicalization in Social Networks. Res. Sq. 2024.
25. Shah, M.S.S.; Abuaieta, A.M.; Almazrouei, S.S. Safeguarding Online Communications using DistilRoBERTa for Detection of Terrorism and Offensive Chats. JISCR 2024, 7, 93–107.

Figure 1. The research process.

Figure 2. Confusion matrix.

Figure 3. F₂ Scores for different text feature extraction methods (Stage 6).

Figure 4. F₂ Score for TF-IDF Vectoriser on four NLP techniques with Naïve Bayes giving the best performance (Stage 6).

Figure 5. Relative F₂ Scores with the inclusion of radical word count feature (Stage 7).

Table 1. Most common Machine Learning algorithms used to detect radicalisation.

Algorithm	[16]	[17]	[18]	Total
SVM	13	10	13	36
Naïve Bayes	9	1	9	19
KNN	6			6
Rule-based Classifier	4	1		5
Clustering	4			4
Exploratory Data Analysis	4			4
Decision Tree	3			3
Random Forest		4	8	12
AdaBoost		2		2
Neural Networks		2		2
Best First Search		1		1
Logistic Regression			6	6
Boosting			5	5
Other			3	3

Table 2. Most common metrics used in the algorithms of Table 1.

Metric	[16]	[17]	[18]	Total
Precision	12		17	29
Recall	9	6	16	31
F₁ Score	8	8	12	28
K-fold validation	1			1
Accuracy		11	11	22
Precision		6		6
ROC		1	7	8
Confusion Matrix			3	3

Table 3. Fine-tuned parameters for all three vectorisers per Machine Learning algorithm (Stage 6).

Algorithm	Count Vectoriser	Hashing Vectoriser	TF-IDF Vectoriser
AdaBoost	learning_rate = 0.1; n_estimators = 1000	learning_rate = 0.1; n_estimators = 1000	learning_rate = 0.1; n_estimators = 500
Decision Tree	criterion = ‘entropy’; min_samples_leaf = 4; min_samples_split = 25	criterion = ‘entropy’; min_samples_leaf = 4; min_samples_split = 25	criterion = ‘entropy’; min_samples_leaf = 2; min_samples_split = 15
K Nearest Neighbours	leaf_size = 30; n_neighbours = 5	leaf_size = 10; n_neighbours = 9	leaf_size = 30; n_neighbours = 5
Logistic Regression	C = 0.5; max_iter = 30; solver = ‘liblinear’	C = 0.7; max_iter = 30; solver = ‘liblinear’	C = 0.7; max_iter = 30; solver = ‘liblinear’
Naïve Bayes (Multinomial)	alpha = 0.9	alpha = 0.8	alpha = 0.4
Neural Network (MLP)	alpha = 0.0001; hidden_layer_sizes = 100; max_iter = 200	alpha = 0.0001; hidden_layer_sizes = 150; max_iter = 300	alpha = 0.001; hidden_layer_sizes = 150; max_iter = 300
Random Forest	max_depth = 35; max_leaf_nodes = 35; min_samples_leaf = 0.005; n_estimators = 110	max_depth = 50; max_leaf_nodes = 35; min_samples_leaf = 0.005; n_estimators = 110	max_depth = 35; max_leaf_nodes = 50; min_samples_leaf = 0.006; n_estimators = 50
Support Vector Machine (SVM)	C = 0.3; kernel = ‘linear’	C = 2.1; kernel = ‘linear’	C = 1.3; kernel = ‘linear’

Table 4. Accuracy, Precision, Recall and F₂ Score of the Machine Learning algorithms for each vectoriser (Stage 6).

Machine Learning Algorithm	Vectoriser	Accuracy	Precision	Recall	F₂ Score
AdaBoost	Count	81.43%	78.59%	88.14%	86.05%
	Hashing	70.79%	67.08%	81.81%	78.37%
	TF-IDF	73.23%	74.65%	71.57%	72.17%
Decision Tree	Count	74.61%	75.73%	70.14%	71.19%
	Hashing	72.54%	75.07%	67.43%	68.83%
	TF-IDF	68.76%	69.00%	73.00%	72.16%
KNN	Count	51.86%	53.74%	89.24%	78.82%
	Hashing	69.07%	64.24%	86.05%	80.58%
	TF-IDF	77.70%	73.74%	87.43%	84.30%
Logistic Regression	Count	75.97%	80.16%	69.33%	71.26%
	Hashing	75.32%	80.56%	65.38%	67.94%
	TF-IDF	77.34%	85.88%	65.95%	69.16%
Neural Network (MLP)	Count	77.70%	74.27%	84.86%	82.50%
	Hashing	78.36%	75.66%	84.76%	82.77%
	TF-IDF	77.70%	75.85%	82.00%	80.69%
Naïve Bayes	Count	82.52%	84.19%	80.76%	81.42%
	Hashing	79.39%	75.37%	87.52%	84.79%
	TF-IDF	82.85%	79.48%	89.00%	86.92%
Random Forest	Count	72.56%	82.40%	56.90%	60.66%
	Hashing	74.95%	82.84%	63.86%	66.92%
	TF-IDF	73.94%	83.82%	58.90%	62.63%
SVM	Count	78.71%	81.52%	74.24%	75.59%
	Hashing	79.09%	82.38%	74.38%	75.85%
	TF-IDF	78.72%	81.91%	74.29%	75.70%

Table 5. Algorithm performance with the inclusion of radical word count feature (Stage 7).

Machine Learning Algorithm	Vectoriser	Accuracy	Precision	Recall	F₂ Score
AdaBoost	Count	81.82%	81.50%	84.14%	83.60%
	Hashing	84.92%	82.81%	88.90%	87.62%
	TF-IDF	84.25%	82.90%	87.57%	86.59%
Decision Tree	Count	78.71%	80.70%	77.19%	77.87%
	Hashing	80.78%	83.60%	77.95%	79.02%
	TF-IDF	81.53%	84.57%	78.05%	79.27%
KNN	Count	54.60%	54.83%	95.29%	83.03%
	Hashing	82.51%	81.90%	84.71%	84.14%
	TF-IDF	83.54%	83.78%	84.14%	84.07%
Logistic Regression	Count	84.91%	85.65%	84.00%	84.32%
	Hashing	83.53%	88.80%	77.14%	79.22%
	TF-IDF	83.20%	88.63%	76.48%	78.63%
Neural Network	Count	83.53%	78.33%	93.14%	89.75%
	Hashing	86.60%	84.23%	91.00%	89.56%
	TF-IDF	85.26%	81.63%	91.71%	89.50%
Naïve Bayes	Count	87.67%	85.35%	92.52%	90.99%
	Hashing	80.10%	72.48%	97.95%	91.52%
	TF-IDF	80.46%	73.33%	96.57%	90.81%
Random Forest	Count	82.86%	85.94%	79.24%	80.49%
	Hashing	84.25%	87.77%	80.62%	81.95%
	TF-IDF	80.46%	82.72%	77.90%	78.82%
SVM	Count	84.23%	83.49%	85.43%	85.03%
	Hashing	84.23%	87.26%	79.86%	81.24%
	TF-IDF	84.92%	88.18%	81.33%	82.62%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

