This section describes the evaluation of the proposed pre-trained Bangla FastText model on text classification datasets and compares its performance with the existing Facebook Bangla FastText model and Word2Vec. The BanglaFastText model was evaluated on three common Bangla text classification datasets, using well-known machine learning classifiers such as SVM, random forest, XGBoost, KNN, and logistic regression, as well as deep learning models, namely a deep LSTM and a CNN for sequential token learning. In addition, the experimental setup in which all evaluations and comparisons were performed is described, together with the statistical procedures and analysis of the results. The evaluation metrics are precision, recall, accuracy, F1-score, and Hamming loss. Since the datasets are imbalanced, macro-averaged precision, recall, and F1-score were used for document classification. The experimental steps for generating the results and statistical analysis are depicted in Figure 2.
5.3. Classical ML Model
The models used for both the classical and deep learning approaches in this study are discussed below.
For the classical models, training was performed using KNN, XGBoost (XGB), support vector machines (SVM), random forest (RF), and logistic regression (LR) as baselines, with character n-grams and word uni-grams using Word2Vec weighting; results were also reported using FastText embeddings. For SVM, the ‘rbf’ kernel was used [31] with the hyperparameter C set to 1 and ‘gamma=scale’. For KNN, hyperparameter tuning found ‘k=5’ to be the optimal value, and ‘p=2’ was set for the ‘Minkowski’ distance metric [32]. For LR, an ‘l2’ penalty was chosen and the C value was set to 1. For XGBoost, the open-source XGBoost library (https://github.com/dmlc/xgboost (accessed on 4 March 2022)) was used, and ‘n_estimators=200’ with ‘max_depth=5’ provided enhanced performance after hyperparameter tuning. Meanwhile, for RF, ‘n_estimators’ was set to 200 and ‘max_depth=10’ was chosen for the classical approach.
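As an illustration, the classifier configurations above can be sketched with scikit-learn; this is a minimal sketch using the stated hyperparameters, and any implementation detail beyond those hyperparameters is an assumption:

```python
# Sketch of the classical baselines with the hyperparameters reported above.
# The XGBoost setup (n_estimators=200, max_depth=5) would use
# xgboost.XGBClassifier from https://github.com/dmlc/xgboost.
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    # RBF kernel, C=1, gamma='scale'
    "SVM": SVC(kernel="rbf", C=1, gamma="scale"),
    # k=5 neighbours, Minkowski distance with p=2 (Euclidean)
    "KNN": KNeighborsClassifier(n_neighbors=5, p=2, metric="minkowski"),
    # L2 penalty, C=1
    "LR": LogisticRegression(penalty="l2", C=1),
    # 200 trees, maximum depth 10
    "RF": RandomForestClassifier(n_estimators=200, max_depth=10),
}

# Tiny synthetic check that each configuration fits and predicts;
# in the paper, the features would be the embedding-based document vectors.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] * 5
y = [0, 0, 1, 1] * 5
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, clf.predict([[1.0, 1.0]])[0])
```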
5.5. Evaluation Metric
For each textual dataset, the proposed method was evaluated using five metrics, and the results are discussed in the order in which the metrics were applied. The performance on the test dataset is measured by precision (P), recall (R), accuracy (A), F1-score, and Hamming loss [37]. Equations (6)–(9) are used in the development stage to calculate precision, recall, accuracy, and F1-score.
Precision is the percentage of positive patterns that are correctly predicted out of all patterns predicted as positive, represented by:

\[ P = \frac{TP}{TP + FP} \tag{6} \]

Recall refers to the percentage of positive patterns that are correctly classified, which is given by:

\[ R = \frac{TP}{TP + FN} \tag{7} \]

In general, the accuracy metric computes the ratio of correct predictions to the total number of instances evaluated:

\[ A = \frac{TP + TN}{TP + TN + FP + FN} \tag{8} \]

The F1-score is the harmonic mean of the recall and precision values:

\[ F1 = \frac{2 \cdot P \cdot R}{P + R} \tag{9} \]

where TP is the number of true-positive samples; TN is the number of true-negative samples; FP is the number of false-positive samples; and FN is the number of false-negative samples.
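The four metrics can be computed directly from the confusion-matrix counts; a minimal sketch (the function name is illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, accuracy, and F1-score from the confusion-matrix
    counts TP, TN, FP, FN, following Equations (6)-(9)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Example: 40 true positives, 50 true negatives, 10 false positives, 0 false negatives.
p, r, a, f1 = classification_metrics(40, 50, 10, 0)
print(p, r, a, f1)
```

For the macro-averaged variants reported in this section, these quantities are computed per class and then averaged, which prevents the majority class from dominating the score on imbalanced datasets.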
The Hamming loss is the ratio of the number of incorrectly predicted labels to the total number of labels. The lower the Hamming loss, the higher the method’s performance; the Hamming loss is 0 for a perfect classifier. The Hamming loss can be estimated using Equation (10):

\[ HL = \frac{1}{N L} \sum_{i=1}^{N} \left| Y_i \,\Delta\, Z_i \right| \tag{10} \]

Here, \(\Delta\) stands for the symmetric difference of sets, \(Y_i\) and \(Z_i\) are the true and predicted label sets of the \(i\)-th of \(N\) samples in the dataset, and L is the number of labels.
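A sketch of the Hamming loss computation over multi-label predictions, using the set-based form of Equation (10) (the data here are a toy example):

```python
def hamming_loss(y_true, y_pred, n_labels):
    """Average size of the symmetric difference between the true and
    predicted label sets, normalised by the number of labels L."""
    total = 0
    for true_set, pred_set in zip(y_true, y_pred):
        total += len(set(true_set) ^ set(pred_set))  # symmetric difference
    return total / (len(y_true) * n_labels)

# Toy example with L = 3 labels and two samples:
# sample 1 misses label 1; sample 2 spuriously predicts label 1.
y_true = [{0, 1}, {2}]
y_pred = [{0}, {1, 2}]
print(hamming_loss(y_true, y_pred, n_labels=3))  # (1 + 1) / (2 * 3)
```

In the single-label case each label set has exactly one element, and the Hamming loss reduces to the ordinary misclassification rate.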
5.6. Comparison with Previous Techniques
In order to evaluate the enhancement and effectiveness of the proposed system, it was compared with existing techniques. A word embedding is a learned representation of text in which words with related meanings have similar representations. Word embedding methods learn a real-valued vector representation for a predetermined fixed-size vocabulary from a corpus of text. Different types of word embedding techniques are available, such as bag of words, TF-IDF, Word2Vec, and FastText. The proposed model is compared with Word2Vec and Facebook’s pre-trained FastText word embedding model, as those word embedding techniques perform better than the others. It is observed that the proposed BanglaFastText word embedding model outperforms the previous models.
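To use an embedding model of this kind as a feature extractor for the classifiers above, each document is typically mapped to the mean of its word vectors. A minimal sketch with a purely illustrative 3-dimensional toy vocabulary (real pre-trained models produce much higher-dimensional vectors loaded from the model file):

```python
# Toy stand-in for a FastText/Word2Vec lookup table: word -> vector.
# In practice these vectors would come from the pre-trained model.
toy_vectors = {
    "good": [0.9, 0.1, 0.0],
    "bad":  [-0.8, 0.2, 0.1],
    "news": [0.0, 0.5, 0.5],
}

def document_vector(tokens, vectors, dim=3):
    """Mean-pool the word vectors of the in-vocabulary tokens of a document."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # fully out-of-vocabulary document
    return [sum(col) / len(known) for col in zip(*known)]

doc = document_vector(["good", "news"], toy_vectors)
print(doc)  # component-wise mean of the "good" and "news" vectors
```

One relevant design difference: FastText builds word vectors from character n-gram subwords, so unlike Word2Vec it can still produce a vector for out-of-vocabulary words rather than dropping them as this toy lookup does.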
Table 5, Table 6 and Table 7 show the performance of the different methods and algorithms in terms of accuracy, Hamming loss, recall, precision, and F1-score.
On the BanFakeNews dataset, the XGB classifier with the BanFastText feature extractor provides the lowest Hamming loss of 0.001 and the highest F1-score of 75.17%, with an accuracy of 97.23%, practically the same as Facebook FastText’s accuracy of 97.45%. With CNN, the Skip-gram and CBOW models of BanFastText show 94.25% and 96.23% accuracy, respectively, with the highest F1-score of 62.92% for Skip-gram and the lowest Hamming loss of 0.17 for CBOW, whereas FastText gives 92.98% accuracy. On the other hand, with LSTM, the BanFastText (CBOW) model gives the highest F1-score of 78.12%, while the accuracy and Hamming loss of FastText and BanFastText are similar.
For the Bangla sentiment analysis classification benchmark dataset, the BanFastText feature extractor has the lowest Hamming loss of 0.05 in both CBOW and Skip-gram, while FastText has 0.06, and the highest F1-scores of 89.78% in CBOW and 90.03% in Skip-gram, while the FastText F1-score is 88.56%. BanFastText also has the highest accuracy of 94.41%, while FastText has the lowest at 92.25%. For the LSTM model, BanFastText (CBOW) outperforms Facebook’s FastText model with accuracies of 92.21% and 90.21%, respectively, and F1-scores of 91.01% and 88.56%, respectively, which clearly indicates the better performance of BanFastText over Facebook’s FastText embedding model. The Hamming loss is also lowest for the BanFastText model at 0.07, while FastText has a loss of 0.08. With a CNN network, the BanFastText (Skip-gram) feature extractor obtained 73.08% accuracy and the lowest Hamming loss of 0.21, while FastText achieved 71.38% accuracy with a 0.29 Hamming loss; the F1-score was 61.07% for BanFastText versus 59.27% for FastText, again indicating that BanFastText performs better than FastText.
For the Bengali News Comment dataset, the accuracy of BanFastText and FastText is 71.95% and 68.48%, and the F1-score is 74.27% and 72.62%, respectively, while the Hamming loss for BanFastText is a low 0.28. For the LSTM model, BanFastText (CBOW) outperforms Facebook’s FastText model in accuracy, 76.25% versus 74.59%, but the two techniques produce nearly identical F1-scores of 69.21% and 69.46%, respectively, and an identical Hamming loss of 0.21. For CNN, BanFastText outperforms FastText in both accuracy and F1-score, with 74.11% accuracy for BanFastText versus 73.26% for FastText and an F1-score of 64.67% versus 62.25%, and the lowest Hamming loss of 0.22 is found for BanFastText.
5.7. Comparison with Datasets of Previous Work
The experiments for this purpose were carried out on three datasets. On the BanFakeNews dataset, previous work used Word2Vec pre-trained word embeddings with classical classifiers such as SVM, RF, and LR, achieving F1-scores of 46%, 55%, and 53%, respectively; with neural networks, F1-scores of 53% and 59% on the fake class were achieved for CNN and LSTM, respectively. With the classical approaches (SVM, RF, and LR), the proposed word embedding outperforms these results by a large margin, and this work achieved F1-scores of 64% and 66% for CNN and LSTM, respectively, which is better than that found in previous work.
For the Bangla sentiment analysis classification benchmark corpus dataset, previous work achieved 93%, 89%, and 91% accuracy for SVM, RF, and LR, respectively, using bag-of-words and TF-IDF representations. Here, the Bangla FastText word embedding performs slightly better than the other methods, as shown in the table.
For the Bengali News Comment dataset, the proposed word embedding method outperforms the other methods. Comparisons between the proposed method and other techniques in terms of accuracy and F1-score are shown in Table 8.