Surfing the Modeling of pos Taggers in Low-Resource Scenarios
Abstract
1. Introduction
2. Related Work and Contribution
3. The Formal Framework
3.1. The Notational Support
3.2. Correctness
3.3. Robustness
4. The Testing Frame
4.1. The Monitoring Structure
4.2. The Performance Metrics
4.2.1. Measuring the Reliability
The Quantitative Perspective
The Qualitative Perspective
4.2.2. Measuring the Robustness
5. The Experiments
5.1. The Linguistics Resources
- A highly complex conjugation paradigm, with 10 simple tenses, including the conjugated infinitive, each of which has 6 different persons. If we add the present imperative with its 2 forms, plus the non-conjugated infinitive, the gerund and the participle, then 65 inflected forms are associated with each verb (the arithmetic is sketched after this list).
- Irregularities in both verb stems and endings. Common verbs, such as facer (to do), have up to five stems: fac-er, fag-o, fa-s, fac-emos, fix-en. Approximately 30% of verbs are irregular.
- Verbal forms with enclitic pronouns attached at the end, which can produce changes in the stem due to shifts in the accent: deu (gave), déullelo (he/she gave it to them). Unstressed pronouns are usually suffixed; moreover, they readily combine with one another and are often contracted (lle + o = llo), as in váitemello buscar (go and fetch it for him (do it for me)). It is also frequent to use a so-called solidarity pronoun, such as che and vos, to make the listeners participants in the action. As a result, forms with up to four enclitic pronouns, such as perdéuchellevolo (he had lost it to him), are rather common.
- A highly complex gender inflection, including words with only one gender, such as home (man) and muller (woman), and words with the same form for both genders, such as azul (blue). Regarding words with separate masculine and feminine forms, more than 30 variation groups are identified.
- A highly complex number inflection, with words that appear only in the singular, such as luns (Monday), and others for which only the plural form is correct, such as matemáticas (mathematics). More than a dozen variation groups are identified.
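As a quick check on these figures, the total of 65 inflected forms per verb follows directly from the paradigm described in the first item; the minimal Python sketch below simply reproduces that arithmetic (the variable names are ours).

```python
# Worked check of the "65 inflected forms per verb" figure quoted above.
simple_tenses = 10       # including the conjugated infinitive
persons_per_tense = 6
imperative_forms = 2     # present imperative
non_finite_forms = 3     # non-conjugated infinitive, gerund, participle

total_forms = simple_tenses * persons_per_tense + imperative_forms + non_finite_forms
print(total_forms)  # 65
```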
5.2. The pos Tagging Systems
- In the category of stochastic methods and representing the hidden Markov models (hmms), we choose tnt [86]. We also include the treetagger [87], a proposal that uses decision trees to generate the hmm, and morfette [88], an averaged perceptron approach [89]. To illustrate the maximum entropy models (mems), we select mxpost [90] and opennlp maxent. Finally, the stanford pos tagger [91] combines features of hmms and mems using a conditional Markov model (a toy illustration of the hmm decoding underlying the first family is sketched after this list).
- Under the heading of other approaches, we consider fntbl [92], an update of the classic brill tagger [93], as an example of transformation-based learning. As a memory-based method, we take the memory-based tagger (mbt) [94], while svmtool [81] illustrates the behavior of support vector machines (svms).
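None of these systems' internals are reproduced in this paper, but the bigram hmm decoding on which tnt and treetagger build can be illustrated with a toy Viterbi decoder. The tag set and the transition and emission probabilities below are invented for illustration only and are not taken from the taggers or the corpus used here; this is a minimal sketch of the model family, not of any particular system.

```python
import math

# Toy bigram HMM for POS tagging: all probabilities are invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
TRANSITION = {  # P(tag_i | tag_{i-1}); "<s>" marks the sentence start
    "<s>":  {"DET": 0.6,  "NOUN": 0.3,  "VERB": 0.1},
    "DET":  {"DET": 0.05, "NOUN": 0.85, "VERB": 0.10},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.40, "NOUN": 0.40, "VERB": 0.20},
}
EMISSION = {  # P(word | tag); unseen words get a small floor probability
    "DET":  {"o": 0.7, "a": 0.3},
    "NOUN": {"home": 0.5, "muller": 0.5},
    "VERB": {"fai": 1.0},
}
UNSEEN = 1e-6

def viterbi(words):
    """Most probable tag sequence under the toy bigram HMM (log-space Viterbi)."""
    # Each column maps a tag to (best log-score, best path ending in that tag).
    column = {
        t: (math.log(TRANSITION["<s>"][t]) + math.log(EMISSION[t].get(words[0], UNSEEN)), [t])
        for t in TAGS
    }
    for word in words[1:]:
        column = {
            t: max(
                (score + math.log(TRANSITION[prev][t]) + math.log(EMISSION[t].get(word, UNSEEN)),
                 path + [t])
                for prev, (score, path) in column.items()
            )
            for t in TAGS
        }
    return max(column.values())[1]

print(viterbi(["o", "home", "fai"]))  # ['DET', 'NOUN', 'VERB']
```

In the real systems, these probabilities are estimated from the annotated training corpus (with smoothing and suffix-based handling of unknown words), which is precisely the resource whose size the low-resource setting constrains.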
5.3. The Testing Space
6. Discussion
6.1. The Sets of Runs
6.2. The Quantitative Study
6.3. The Qualitative Study
6.4. The Study of Robustness
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| ac | Accuracy |
| al | Active Learning |
| clevel | Convergence Level |
| dl | Deep Learning |
| dmr | Decision-Making Reliability |
| eac | Estimated Accuracy |
| hmm | Hidden Markov Model |
| mem | Maximum Entropy Model |
| mbt | Memory-Based Tagger |
| ml | Machine Learning |
| nlp | Natural Language Processing |
| pe | Percentage Error |
| pos | Part-of-Speech |
| plevel | Prediction Level |
| re | Reliability Estimation |
| rer | Reliability Estimation Ratio |
| rr | Robustness Rate |
| svm | Support Vector Machine |
| wlevel | Working Level |
References
- Chiche, A.; Yitagesu, B. Part of speech tagging: A systematic review of deep learning and machine learning approaches. J. Big Data 2022, 9, 10. [Google Scholar] [CrossRef]
- Darwish, K.; Mubarak, H.; Abdelali, A.; Eldesouki, M. Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet. In Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, Spain, 3 April 2017; Association for Computational Linguistics: Madison, WI, USA, 2017; pp. 130–137. [Google Scholar]
- Pylypenko, D.; Amponsah-Kaakyire, K.; Dutta Chowdhury, K.; van Genabith, J.; España-Bonet, C. Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; Association for Computational Linguistics: Madison, WI, USA, 2021; pp. 8596–8611. [Google Scholar]
- Tayyar Madabushi, H.; Lee, M. High Accuracy Rule-based Question Classification using Question Syntax and Semantics. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 13–16 December 2016; Association for Computational Linguistics: Madison, WI, USA, 2016; pp. 1220–1230. [Google Scholar]
- Zhang, B.; Su, J.; Xiong, D.; Lu, Y.; Duan, H.; Yao, J. Shallow Convolutional Neural Network for Implicit Discourse Relation Recognition. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Madison, WI, USA, 2015; pp. 2230–2235. [Google Scholar]
- Chiong, R.; Wei, W. Named Entity Recognition Using Hybrid Machine Learning Approach. In Proceedings of the 5th IEEE International Conference on Cognitive Informatics, Beijing, China, 17–19 July 2006; IEEE CS Press: Washington, DC, USA, 2006; Volume 1, pp. 578–583. [Google Scholar]
- Kim, J.; Ko, Y.; Seo, J. A Bootstrapping Approach With CRF and Deep Learning Models for Improving the Biomedical Named Entity Recognition in Multi-Domains. IEEE Access 2019, 7, 70308–70318. [Google Scholar] [CrossRef]
- Li, J.; Li, R.; Hovy, E. Recursive Deep Models for Discourse Parsing. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Madison, WI, USA, 2014; pp. 2061–2069. [Google Scholar]
- Crammer, K. Advanced Online Learning for Natural Language Processing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Tutorial Abstracts, Columbus, OH, USA, 15–20 June 2008; Association for Computational Linguistics: Madison, WI, USA, 2008; p. 4. [Google Scholar]
- Vlachos, A. Evaluating unsupervised learning for natural language processing tasks. In Proceedings of the First workshop on Unsupervised Learning in NLP, Edinburgh, UK, 30 July 2011; Association for Computational Linguistics: Madison, WI, USA, 2011; pp. 35–42. [Google Scholar]
- Florian, R.; Hassan, H.; Jing, H.; Kambhatla, N.; Luo, X.; Nicolov, N.; Roukos, S. A Statistical Model for Multilingual Entity Detection and Tracking. In Proceedings of the Human Language Technologies Conference 2004, Boston, MA, USA, 2–7 May 2004; Association for Computational Linguistics: Madison, WI, USA, 2004; pp. 1–8. [Google Scholar]
- Xue, G.R.; Dai, W.; Yang, Q.; Yu, Y. Topic-bridged PLSA for Cross-domain Text Classification. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 20–24 July 2008; ACM Press: New York, NY, USA, 2008; pp. 627–634. [Google Scholar]
- Chan, S.; Honari Jahromi, M.; Benetti, B.; Lakhani, A.; Fyshe, A. Ensemble Methods for Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, Copenhagen, Denmark, 8 September 2017; Association for Computational Linguistics: Madison, WI, USA, 2017; pp. 217–223. [Google Scholar]
- Libovický, J.; Helcl, J. End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2–4 November 2018; Association for Computational Linguistics: Madison, WI, USA, 2018; pp. 3016–3021. [Google Scholar]
- Cortes, E.; Woloszyn, V.; Binder, A.; Himmelsbach, T.; Barone, D.; Möller, S. An Empirical Comparison of Question Classification Methods for Question Answering Systems. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; European Language Resources Association: Paris, France, 2020; pp. 5408–5416. [Google Scholar]
- Swier, R.S.; Stevenson, S. Unsupervised Semantic Role Labeling. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; Association for Computational Linguistics: Madison, WI, USA, 2004; pp. 95–102. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach. In Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; ACM Press: New York, NY, USA, 2011; pp. 513–520. [Google Scholar]
- Dai, W.; Xue, G.R.; Yang, Q.; Yu, Y. Transferring Naive Bayes Classifiers for Text Classification. In Proceedings of the 22nd National Conference on Artificial Intelligence, Vancouver, BC, Canada, 22–26 July 2007; AAAI Press: Menlo Park, CA, USA, 2007; Volume 1, pp. 540–545. [Google Scholar]
- Ebrahimi, M.; Eberhart, A.; Bianchi, F.; Hitzler, P. Towards Bridging the Neuro-Symbolic Gap: Deep Deductive Reasoners. Appl. Intell. 2021, 51, 6326–6348. [Google Scholar] [CrossRef]
- Poggio, T.; Banburski, A.; Liao, Q. Theoretical issues in deep networks. Proc. Natl. Acad. Sci. USA 2020, 117, 30039–30045. [Google Scholar] [CrossRef] [PubMed]
- Hao, H.; Mengya, G.; Mingsheng, W. Relieving the Incompatibility of Network Representation and Classification for Long-Tailed Data Distribution. Comput. Intell. Neurosci. 2021, 2021, 6702625. [Google Scholar]
- Zhang, Y.; Kang, B.; Hooi, B.; Yan, S.; Feng, J. Deep Long-Tailed Learning: A Survey. arXiv 2021, arXiv:2110.04596. [Google Scholar]
- Hoefler, T.; Alistarh, D.; Ben-Nun, T.; Dryden, N.; Peste, A. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. J. Mach. Learn. Res. 2021, 22, 1–124. [Google Scholar]
- Li, H. Deep learning for natural language processing: Advantages and challenges. Natl. Sci. Rev. 2017, 5, 24–26. [Google Scholar] [CrossRef]
- Hedderich, M.A.; Lange, L.; Adel, H.; Strötgen, J.; Klakow, D. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; Association for Computational Linguistics: Madison, WI, USA, 2021; pp. 2545–2568. [Google Scholar]
- Chakrabarty, A.; Chaturvedi, A.; Garain, U. NeuMorph: Neural Morphological Tagging for Low-Resource Languages—An Experimental Study for Indic Languages. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2019, 19, 1–19. [Google Scholar] [CrossRef]
- Geman, S.; Bienenstock, E.; Doursat, R. Neural Networks and the Bias/Variance Dilemma. Neural Comput. 1992, 4, 1–58. [Google Scholar] [CrossRef]
- Magnini, B.; Lavelli, A.; Magnolini, S. Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian Language. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; European Language Resources Association: Paris, France, 2020; pp. 2110–2119. [Google Scholar]
- Saied, H.A.; Candito, M.; Constant, M. Comparing linear and neural models for competitive MWE identification. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019; Linköping University Electronic Press: Linköping, Sweden, 2019; pp. 86–96. [Google Scholar]
- Wang, M.; Manning, C.D. Effect of Non-linear Deep Architecture in Sequence Labeling. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, 14–18 October 2013; pp. 1285–1291. [Google Scholar]
- Song, H.J.; Son, J.W.; Noh, T.G.; Park, S.B.; Lee, S.J. A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Jeju Island, Korea, 8–14 July 2012; Association for Computational Linguistics: Madison, WI, USA, 2012; Volume 1, pp. 1025–1034. [Google Scholar]
- Hoesen, D.; Purwarianti, A. Investigating Bi-LSTM and CRF with POS Tag Embedding for Indonesian Named Entity Tagger. arXiv 2020, arXiv:2009.05687. [Google Scholar]
- Khan, W.; Daud, A.; Khan, K.; Nasir, J.A.; Basheri, M.; Aljohani, N.; Alotaibi, F.S. Part of Speech Tagging in Urdu: Comparison of Machine and Deep Learning Approaches. IEEE Access 2019, 7, 38918–38936. [Google Scholar] [CrossRef]
- Ljubešić, N. Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, Santa Fe, NM, USA, 20–21 August 2018; Association for Computational Linguistics: Madison, WI, USA, 2018; pp. 156–163. [Google Scholar]
- Stankovic, R.; Šandrih, B.; Krstev, C.; Utvić, M.; Skoric, M. Machine Learning and Deep Neural Network-Based Lemmatization and Morphosyntactic Tagging for Serbian. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; European Language Resources Association: Paris, France, 2020; pp. 3954–3962. [Google Scholar]
- Todi, K.K.; Mishra, P.; Sharma, D.M. Building a Kannada POS Tagger Using Machine Learning and Neural Network Models. arXiv 2018, arXiv:1808.03175. [Google Scholar]
- Murata, N.; Yoshizawa, S.; Amari, S.-I. Learning Curves, Model Selection and Complexity of Neural Networks. In Neural Information Processing Systems; Hanson, S.J., Cowan, D., Giles, C.L., Eds.; Morgan Kaufmann: San Mateo, CA, USA, 1993; Volume 5, pp. 607–614. [Google Scholar]
- Bertoldi, N.; Cettolo, M.; Federico, M.; Buck, C. Evaluating the Learning Curve of Domain Adaptive Statistical Machine Translation Systems. In Proceedings of the 7th Workshop on Statistical Machine Translation, Montreal, Canada, 7–8 June 2012; Association for Computational Linguistics: Madison, WI, USA, 2012; pp. 433–441. [Google Scholar]
- Turchi, M.; De Bie, T.; Cristianini, N. Learning Performance of a Machine Translation System: A Statistical and Computational Analysis. In Proceedings of the 3rd Workshop on Statistical Machine Translation, Columbus, OH, USA, 19 June 2008; Association for Computational Linguistics: Madison, WI, USA, 2008; pp. 35–43. [Google Scholar]
- Koehn, P.; Och, F.J.; Marcu, D. Statistical Phrase-based Translation. In Proceedings of the 2003 Annual Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, AB, Canada, 28–30 May 2003; Association for Computational Linguistics: Madison, WI, USA, 2003; Volume 1, pp. 48–54. [Google Scholar]
- Kolachina, P.; Cancedda, N.; Dymetman, M.; Venkatapathy, S. Prediction of Learning Curves in Machine Translation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Jeju Island, Korea, 8–14 July 2012; Association for Computational Linguistics: Madison, WI, USA, 2012; Volume 1, pp. 22–30. [Google Scholar]
- Birch, A.; Osborne, M.; Koehn, P. Predicting Success in Machine Translation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; Association for Computational Linguistics: Madison, WI, USA, 2008; pp. 745–754. [Google Scholar]
- Cohn, D.; Atlas, L.; Ladner, R. Improving Generalization with Active Learning. Mach. Learn. 1994, 15, 201–221. [Google Scholar] [CrossRef]
- Culotta, A.; McCallum, A. Reducing Labeling Effort for Structured Prediction Tasks. In Proceedings of the 20th National Conference on Artificial Intelligence, Pittsburgh, PA, USA, 9–13 July 2005; AAAI Press: Essex, UK, 2005; Volume 2, pp. 746–751. [Google Scholar]
- Thompson, C.A.; Califf, M.E.; Mooney, R.J. Active Learning for Natural Language Parsing and Information Extraction. In Proceedings of the 16th International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1999; pp. 406–414. [Google Scholar]
- Becker, M.; Osborne, M. A Two-stage Method for Active Learning of Statistical Grammars. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, UK, 30 July–5 August 2005; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2005; pp. 991–996. [Google Scholar]
- Tang, M.; Luo, X.; Roukos, S. Active Learning for Statistical Natural Language Parsing. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; Association for Computational Linguistics: Madison, WI, USA, 2002; pp. 120–127. [Google Scholar]
- Lewis, D.D.; Gale, W.A. A Sequential Algorithm for Training Text Classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 3–6 July 1994; Springer: Berlin/Heidelberg, Germany, 1994; pp. 3–12. [Google Scholar]
- Liere, R.; Tadepalli, P. Active learning with committees for text categorization. In Proceedings of the 14th National Conference on Artificial Intelligence, Providence, RI, USA, 27–31 July 1997; AAAI Press: Essex, UK, 1997; pp. 591–596. [Google Scholar]
- McCallum, A.; Nigam, K. Employing EM and Pool-Based Active Learning for Text Classification. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, USA, 24–27 July 1998; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1998; pp. 350–358. [Google Scholar]
- Tong, S.; Koller, D. Support Vector Machine Active Learning with Applications to Text Classification. J. Mach. Learn. Res. 2002, 2, 45–66. [Google Scholar]
- Dagan, I.; Engelson, S.P. Committee-Based Sampling For Training Probabilistic Classifiers. In Proceedings of the 12th International Conference on Machine Learning, Tahoe City, CA, USA, 9–12 July 1995; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 150–157. [Google Scholar]
- Haertel, R.; Ringger, E.; Seppi, K.; Carroll, J.; McClanahan, P. Assessing the Costs of Sampling Methods in Active Learning for Annotation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, Columbus, OH, USA, 15–20 June 2008; Association for Computational Linguistics: Madison, WI, USA, 2008; pp. 65–68. [Google Scholar]
- Ringger, E.; McClanahan, P.; Haertel, R.; Busby, G.; Carmen, M.; Carroll, J.; Seppi, K.; Lonsdale, D. Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation. In Proceedings of the Linguistic Annotation Workshop, Prague, Czech Republic, 28–29 June 2007; Association for Computational Linguistics: Madison, WI, USA, 2007; pp. 101–108. [Google Scholar]
- Laws, F.; Schütze, H. Stopping Criteria for Active Learning of Named Entity Recognition. In Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, UK, 18–22 August 2008; Association for Computational Linguistics: Madison, WI, USA, 2008; Volume 1, pp. 465–472. [Google Scholar]
- Shen, D.; Zhang, J.; Su, J.; Zhou, G.; Tan, C.L. Multi-criteria-based Active Learning for Named Entity Recognition. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, Barcelona, Spain, 21–26 July 2004; Association for Computational Linguistics: Madison, WI, USA, 2004; pp. 589–596. [Google Scholar]
- Tomanek, K.; Wermter, J.; Hahn, U. An approach to text corpus construction which cuts annotation costs and maintains reusability of annotated data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic, 28–30 June 2007; Association for Computational Linguistics: Madison, WI, USA, 2007; pp. 486–495. [Google Scholar]
- Chan, Y.S.; Ng, H.T. Domain Adaptation with Active Learning for Word Sense Disambiguation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, 28–30 June 2007; Association for Computational Linguistics: Madison, WI, USA, 2007; pp. 49–56. [Google Scholar]
- Chen, J.; Schein, A.; Ungar, L.; Palmer, M. An Empirical Study of the Behavior of Active Learning for Word Sense Disambiguation. In Proceedings of the 2006 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New York, NY, USA, 4–9 June 2006; Association for Computational Linguistics: Madison, WI, USA, 2006; pp. 120–127. [Google Scholar]
- Zhu, J.; Hovy, E. Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic, 28–30 June 2007; Association for Computational Linguistics: Madison, WI, USA, 2007; pp. 783–790. [Google Scholar]
- Baldridge, J.; Osborne, M. Active learning and logarithmic opinion pools for HPSG parse selection. Nat. Lang. Eng. 2008, 14, 191–222. [Google Scholar] [CrossRef] [Green Version]
- Ein-Dor, L.; Halfon, A.; Gera, A.; Shnarch, E.; Dankin, L.; Choshen, L.; Danilevsky, M.; Aharonov, R.; Katz, Y.; Slonim, N. Active Learning for BERT: An Empirical Study. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 16–20 November 2020; Association for Computational Linguistics: Online, 2020; pp. 7949–7962. [Google Scholar]
- Liu, M.; Buntine, W.; Haffari, G. Learning to Actively Learn Neural Machine Translation. In Proceedings of the 22nd Conference on Computational Natural Language Learning, Brussels, Belgium, 31 October–1 November 2018; Association for Computational Linguistics: Madison, WI, USA, 2018; pp. 334–344. [Google Scholar]
- Lowell, D.; Lipton, Z.C.; Wallace, B.C. Practical Obstacles to Deploying Active Learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Madison, WI, USA, 2019; pp. 21–30. [Google Scholar]
- Anastasopoulos, A.; Lekakou, M.; Quer, J.; Zimianiti, E.; DeBenedetto, J.; Chiang, D. Part-of-Speech Tagging on an Endangered Language: A Parallel Griko-Italian Resource. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; Association for Computational Linguistics: Madison, WI, USA, 2018; pp. 2529–2539. [Google Scholar]
- Chaudhary, A.; Anastasopoulos, A.; Sheikh, Z.; Neubig, G. Reducing Confusion in Active Learning for Part-of-Speech Tagging. Trans. Assoc. Comput. Linguist. 2021, 9, 1–16. [Google Scholar]
- Erdmann, A.; Wrisley, D.J.; Allen, B.; Brown, C.; Cohen-Bodénès, S.; Elsner, M.; Feng, Y.; Joseph, B.; Joyeux-Prunel, B.; de Marneffe, M.C. Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Long and Short Papers, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Madison, WI, USA, 2019; Volume 1, pp. 2223–2234. [Google Scholar]
- Kim, Y. Deep Active Learning for Sequence Labeling Based on Diversity and Uncertainty in Gradient. In Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems, Suzhou, China, 7 December 2020; Association for Computational Linguistics: Madison, WI, USA, 2020; pp. 1–8. [Google Scholar]
- Settles, B.; Craven, M. An Analysis of Active Learning Strategies for Sequence Labeling Tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, USA, 25–27 October 2008; Association for Computational Linguistics: Madison, WI, USA, 2008; pp. 1070–1079. [Google Scholar]
- Vilares, M.; Darriba, V.M.; Ribadas, F.J. Modeling of learning curves with applications to POS tagging. Comput. Speech Lang. 2017, 41, 1–28. [Google Scholar] [CrossRef]
- Baker, B.; Gupta, O.; Raskar, R.; Naik, N. Accelerating neural architecture search using performance prediction. In Proceedings of the 6th International Conference on Learning Representations, ICLR’18, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Domhan, T.; Springenberg, J.T.; Hutter, F. Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; AAAI Press: Essex, UK, 2015; pp. 3460–3468. [Google Scholar]
- Klein, A.; Falkner, S.; Springenberg, J.T.; Hutter, F. Learning Curve Prediction with Bayesian Neural Networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. [Google Scholar]
- Vilares, M.; Darriba, V.; Vilares, J. Absolute convergence and error thresholds in non-active adaptive sampling. J. Comput. Syst. Sci. 2022, 129, 39–61. [Google Scholar] [CrossRef]
- Vilares, M.; Darriba, V.M.; Vilares, J. Adaptive scheduling for adaptive sampling in pos taggers construction. Comput. Speech Lang. 2020, 60, 101020. [Google Scholar]
- Domingo, C.; Gavaldà, R.; Watanabe, O. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Data Min. Knowl. Discov. 2002, 6, 131–152. [Google Scholar] [CrossRef]
- Meek, C.; Thiesson, B.; Heckerman, D. The Learning-curve Sampling Method Applied to Model-based Clustering. J. Mach. Learn. Res. 2002, 2, 397–418. [Google Scholar]
- Mohr, F.; van Rijn, J.N. Fast and Informative Model Selection using Learning Curve Cross-Validation. arXiv 2021, arXiv:2111.13914. [Google Scholar]
- Schütze, H.; Velipasaoglu, E.; Pedersen, J.O. Performance Thresholding in Practical Text Classification. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Arlington, VA, USA, 6–11 November 2006; ACM Press: New York, NY, USA, 2006; pp. 662–671. [Google Scholar]
- Tomanek, K.; Hahn, U. Approximating Learning Curves for Active-Learning-Driven Annotation. In Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 28–30 May 2008; European Language Resources Association: Paris, France, 2008; pp. 1319–1324. [Google Scholar]
- Giménez, J.; Márquez, L. SVMTool: A general POS tagger generator based on support vector machines. In Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal, 26–28 May 2004; European Language Resources Association: Paris, France, 2004; pp. 43–46. [Google Scholar]
- Etiquetador/Lematizador do Galego Actual (XIADA) [v2.8]—Corpus de adestramento. Centro Ramón Piñeiro para a Investigación en Humanidades. Available online: http://corpus.cirp.gal/xiada/descargas/texto_corpus (accessed on 13 July 2022).
- Branch, M.A.; Coleman, T.F.; Li, Y. A Subspace, Interior, and Conjugate Gradient Method for Large-Scale Bound-Constrained Minimization Problems. SIAM J. Sci. Comput. 1999, 21, 1–23. [Google Scholar] [CrossRef]
- Vandome, P. Econometric forecasting for the United Kingdom. Bull. Oxf. Univ. Inst. Econ. Stat. 1963, 25, 239–281. [Google Scholar] [CrossRef]
- Vilares, M.; Graña, J.; Araujo, T.; Cabrero, D.; Diz, I. A tagger environment for Galician. In Proceedings of the Workshop on Language Resources for European Minority Languages, Granada, Spain, 27 May 1998; European Language Resources Association: Paris, France, 1998. [Google Scholar]
- Brants, T. TnT: A Statistical Part-of-speech Tagger. In Proceedings of the 6th Conference on Applied Natural Language Processing, Seattle, WA, USA, 29 April–4 May 2000; Association for Computational Linguistics: Madison, WI, USA, 2000; pp. 224–231. [Google Scholar]
- Schmid, H. Probabilistic Part-of-Speech Tagging Using Decision Trees. In Proceedings of the International Conference on New Methods in Language Processing, Manchester, UK, 1994; Association for Computational Linguistics: Madison, WI, USA, 1994; pp. 44–49. [Google Scholar]
- Chrupala, G.; Dinu, G.; van Genabith, J. Learning Morphology with Morfette. In Proceedings of the 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 28–30 May 2008; European Language Resources Association: Paris, France, 2008; pp. 2362–2367. [Google Scholar]
- Collins, M. Discriminative training methods for Hidden Markov Models: Theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, USA, 6–7 July 2002; Association for Computational Linguistics: Madison, WI, USA, 2002; Volume 10, pp. 1–8. [Google Scholar]
- Ratnaparkhi, A. A Maximum Entropy Model for Part-Of-Speech Tagging. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing, Philadelphia, PA, USA, 17–18 May 1996; Association for Computational Linguistics: Madison, WI, USA, 1996; pp. 133–142. [Google Scholar]
- Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y. Feature-rich part-of-speech Tagging with a Cyclic Dependency Network. In Proceedings of the 2003 Annual Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, AB, Canada, 28–30 May 2003; Association for Computational Linguistics: Madison, WI, USA, 2003; Volume 1, pp. 173–180. [Google Scholar]
- Ngai, G.; Florian, R. Transformation-Based Learning in the Fast Lane. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, Pittsburgh, PA, USA, 2–7 June 2001; Association for Computational Linguistics: Madison, WI, USA, 2001; pp. 1–8. [Google Scholar]
- Brill, E. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Comput. Linguist. 1995, 21, 543–565. [Google Scholar]
- Daelemans, W.; Zavrel, J.; Berck, P.; Gillis, S. MBT: A Memory–Based Part-of-speech Tagger Generator. In Proceedings of the 4th Workshop on Very Large Corpora, Herstmonceux Castle, Sussex, UK, 4 August 1996; Association for Computational Linguistics: Madison, WI, USA, 1996; pp. 14–27. [Google Scholar]
- Gu, B.; Hu, F.; Liu, H. Modelling Classification Performance for Large Data Sets. In Proceedings of the 2nd International Conference on Advances in Web-Age Information Management, Xi’an, China, 9–11 July 2001; Springer: Berlin/Heidelberg, Germany, 2001; pp. 317–328. [Google Scholar]
- Clark, A.; Fox, C.; Lappin, S. The Handbook of Computational Linguistics and Natural Language Processing; John Wiley & Sons: Hoboken, NJ, USA, 2010. [Google Scholar]
| Tagger | plevel | clevel | Control-Level | Ac | EAc | Ac | EAc | Ac | EAc | Ac | EAc | Ac | EAc | mape | dmr | rr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| fntbl | 105.003 | 2.40 | 150.017 | 94.16 | 93.87 | 94.57 | 94.30 | 94.96 | 94.61 | 95.16 | 94.84 | 95.34 | 95.03 | 0.32 | 85.71 | 90.00 |
| maxent | 110.047 | 2.50 | 135.019 | 92.90 | 92.78 | 93.30 | 93.19 | 93.58 | 93.48 | 93.85 | 93.70 | 94.08 | 93.88 | 0.15 | 100.00 | 100.00 |
| mbt | 85.012 | 2.20 | 145.016 | 92.97 | 92.84 | 93.42 | 93.22 | 93.76 | 93.50 | 94.01 | 93.72 | 94.30 | 93.89 | 0.28 | 100.00 | 92.31 |
| morfette | 75.011 | 2.60 | 105.003 | 94.61 | 94.54 | 94.98 | 94.89 | 95.21 | 95.14 | 95.41 | 95.33 | 95.55 | 95.49 | 0.09 | 100.00 | 85.71 |
| mxpost | 110.047 | 2.30 | 145.016 | 93.44 | 93.17 | 93.88 | 93.57 | 94.20 | 93.85 | 94.44 | 94.06 | 94.63 | 94.23 | 0.35 | 100.00 | 100.00 |
| stanford | 95.015 | 2.40 | 125.001 | 94.41 | 94.43 | 94.78 | 94.80 | 95.07 | 95.07 | 95.26 | 95.27 | 95.41 | 95.43 | 0.02 | 85.71 | 85.71 |
| svmtool | 250.012 | 2.20 | 250.012 | 95.00 | 95.05 | 95.36 | 95.44 | 95.60 | 95.71 | 95.78 | 95.93 | 95.92 | 96.10 | 0.12 | 100.00 | 86.67 |
| tnt | 85.012 | 2.00 | 130.003 | 94.47 | 94.38 | 94.79 | 94.70 | 95.05 | 94.93 | 95.23 | 95.10 | 95.35 | 95.23 | 0.12 | 71.43 | 100.00 |
| treetagger | – | 2.10 | – | 93.36 | – | 93.77 | – | 94.02 | – | 94.28 | – | 94.42 | – | – | – | – |
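As a pointer for reading the table, the mape column is consistent with the usual mean absolute percentage error between the observed accuracy (Ac) and the estimated accuracy (EAc). The sketch below applies that textbook formula to the fntbl row above; the paper may average over more checkpoints than the five reported here, so small rounding differences are possible for other rows.

```python
def mape(actual, estimated):
    """Mean absolute percentage error, expressed as a percentage."""
    return 100.0 * sum(abs(a - e) / a for a, e in zip(actual, estimated)) / len(actual)

# Ac / EAc pairs taken from the fntbl row of the table above.
ac  = [94.16, 94.57, 94.96, 95.16, 95.34]
eac = [93.87, 94.30, 94.61, 94.84, 95.03]

print(round(mape(ac, eac), 2))  # 0.32, matching the mape column for fntbl
```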