Findings on Ad Hoc Contractions
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
- Frequency—the number of times the word to be abbreviated is used; the word should appear often enough to warrant abbreviating. We required a minimum of seven occurrences;
- Length—the candidate word must be longer than five letters;
- Outliers—unnatural tokens, such as words containing digits, were excluded;
- Replacers—abbreviations containing replacement characters or symbols not found in the original long form (e.g., XFER, with X in place of Trans) were omitted. A filtering sketch is given after this list.
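A minimal sketch of how the first three criteria might be applied to a corpus word-count dictionary is shown below. The function and variable names are illustrative assumptions, not the authors' implementation, and the replacer check (which compares an abbreviation against its long form) would be applied in a later step.

```python
import re

MIN_OCCURRENCES = 7   # Frequency criterion: at least seven occurrences
MIN_LETTERS = 6       # Length criterion: strictly more than five letters


def select_candidates(word_counts):
    """Filter corpus words down to abbreviation candidates.

    word_counts maps each corpus word to its occurrence count; the returned
    list contains only words passing the frequency, length, and outlier checks.
    """
    candidates = []
    for word, count in word_counts.items():
        if count < MIN_OCCURRENCES:              # too rare to be worth abbreviating
            continue
        if len(word) < MIN_LETTERS:              # too short to benefit from a contraction
            continue
        if not re.fullmatch(r"[a-zA-Z]+", word):  # drop "unnatural" tokens (digits, symbols)
            continue
        candidates.append(word)
    return candidates


# Toy example: only "background" satisfies all three criteria.
print(select_candidates({"background": 18, "bg2x": 9, "cat": 40, "applet": 3}))
```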
2.2. Ad Hoc Abbreviations
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition
---|---
BERT | Bidirectional Encoder Representations from Transformers
WSD | Word Sense Disambiguation
TF-IDF | Term Frequency-Inverse Document Frequency
HMM | Hidden Markov Models
POS | Parts of Speech
RS | Reverse Sampling
API | Application Program Interface
Appendix A
Document ID | appearances | appears | appended | appletalk | applets | appliances | applicability | applicable | application
---|---|---|---|---|---|---|---|---|---
10 | 0 | 0 | 0.005293352 | 0 | 0 | 0 | 0 | 0.005059638 | 0 |
12 | 0 | 0 | 0.009081262 | 0 | 0 | 0 | 0 | 0.004340151 | 0 |
13 | 0 | 0.005329779 | 0.003649573 | 0.005949893 | 0.005949893 | 0.011899787 | 0 | 0 | 0.031551542 |
14 | 0 | 0 | 0.002396322 | 0 | 0 | 0 | 0 | 0 | 0.007533401 |
15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.010297847 | 0.005080364 |
16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.002266803 |
17 | 0 | 0 | 0.006081679 | 0 | 0 | 0 | 0 | 0.003875439 | 0.00637306 |
18 | 0 | 0 | 0.00577253 | 0 | 0 | 0 | 0.002107528 | 0.004138245 | 0.002268413 |
2011_Saransk_Cup | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0.001940913 | 0 | 0.002381053 | 0 | 0 | 0 | 0 | 0 | 0.004678373 |
420 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
519 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.096982195 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.023719197 | 0.00097514 |
699 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.002544519 | 0.018829786 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0.002803288 | 0 | 0.006034572 |
795 | 0 | 0 | 0.001251368 | 0 | 0 | 0 | 0 | 0.002392234 | 0.000983492 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.006762265 | 0.001010943 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.001361715 |
… | … | … | … | … | … | … | … | … | … |
Occurrences | 1 | 2 | 8 | 1 | 1 | 1 | 2 | 9 | 14
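The values in the table above appear to be per-document TF-IDF weights for the dictionary words, with the final row counting the number of documents in which each word occurs. The sketch below shows how such a matrix could be produced with scikit-learn's TfidfVectorizer; the toy documents and the default weighting scheme are illustrative assumptions, not a reproduction of the paper's pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus keyed by document ID; the study computes weights over full documents.
docs = {
    "10": "the application appended the log for the appliances report",
    "13": "applets and appletalk appliances appear in the application applicability review",
    "519": "application of the application method to the application data",
}

vectorizer = TfidfVectorizer()                   # default TF-IDF weighting (assumed)
matrix = vectorizer.fit_transform(docs.values())
terms = vectorizer.get_feature_names_out()

# Per-document weights, analogous to one row of the appendix table.
for doc_id, row in zip(docs, matrix.toarray()):
    print(doc_id, {t: round(w, 6) for t, w in zip(terms, row) if w > 0})

# Document frequency per term, analogous to the "Occurrences" row.
print(dict(zip(terms, (matrix.toarray() > 0).sum(axis=0))))
```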
Word | Instances | Candidate 1 | Candidate 2 | Candidate 3 | Candidate 4 | Candidate 5 |
---|---|---|---|---|---|---|
background | 18 | bg | bkgd | bkgnd | bk | bgr |
additional | 13 | addl | addtl | addnl | ||
primary | 10 | pri | pry | pr | prim | |
broadcast | 10 | brd | brdcst | bcst |
Abbreviation | Candidate Definition 1 | Candidate Definition 2 | Candidate Definition 3 | Candidate Definition 4 | Candidate Definition 5
---|---|---|---|---|---
bg | bag | begar | being | background | big
bkgd | backgated | backgrind | background | backsighted | ||
bkgnd | backgrind | background | backgammoned | |||
bk | blink | break | back | black | background | |
bgr | begar | bagger | bigger | background | burger |
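The candidate definitions above are dictionary words that contain the letters of the abbreviation in order. A minimal sketch of such a lookup follows; the matching rule, function names, and toy dictionary are assumptions inferred from the table, not the authors' implementation.

```python
def is_subsequence(abbrev, word):
    """Return True if the letters of abbrev appear in word, in order."""
    letters = iter(word)
    return all(letter in letters for letter in abbrev)


def candidate_definitions(abbrev, dictionary):
    """Collect dictionary words that could expand the given ad hoc abbreviation."""
    return [word for word in dictionary if is_subsequence(abbrev, word)]


# Example with a toy dictionary; the study matches against a full English word list.
toy_dictionary = ["bag", "begar", "being", "background", "big", "table"]
print(candidate_definitions("bg", toy_dictionary))  # ['bag', 'begar', 'being', 'background', 'big']
```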
Abbreviation | Definition 1 | Definition 2 | Definition 3 | Definition 4 | Definition 5 | # Candidate Words | Letter Ratio | Letters Removed | Vowels Found | Vowels Kept | Vowels Removed
---|---|---|---|---|---|---|---|---|---|---|---|
bg | bag | begar | being | background | big | 1000+ | 0.2 | 8 | 3 | 0 | 3 |
bkgd | backgated | backgrind | background | backsighted | | 53 | 0.4 | 6 | 3 | 0 | 3
bkgnd | backgrind | background | backgammoned | | | 17 | 0.6 | 4 | 3 | 0 | 3
bk | blink | break | back | black | background | 1000+ | 0.2 | 8 | 3 | 0 | 3 |
bgr | begar | bagger | bigger | background | burger | 1000+ | 0.3 | 7 | 3 | 0 | 3 |
addl | addle | adducible | additional | addressible | adjudgeable | 860 | 0.4 | 6 | 4 | 0 | 4 |
addtl | additional | adjudicational | adductively | autodidactically | addictionologists | 47 | 0.5 | 5 | 4 | 0 | 4 |
addnl | addental | additional | antroduodenal | addictionologist | | 65 | 0.5 | 5 | 4 | 0 | 4
adj | adjunct | adjutant | adjacent | adjust | adjectives | 582 | 0.5 | 3 | 1 | 0 | 1 |
ad | advance | abide | added | admit | ahead | 1000+ | 0.333333333 | 2 | 3 | 0 | 3 |
alloc | allocate | allowance | allothetic | allocrite | allogenic | 1000+ | 0.5 | 5 | 4 | 1 | 3 |
mods | modish | modules | modification | modelss | micropods | 1000+ | 0.307692308 | 9 | 6 | 1 | 5 |
pri | peril | pride | prima | primary | praise | 1000+ | 0.428571429 | 4 | 2 | 1 | 1 |
pry | proxy | prayer | poetry | purify | primary | 1000+ | 0.428571429 | 4 | 2 | 0 | 2 |
pr | prom | prop | proxy | primary | professor | 1000+ | 0.285714286 | 5 | 2 | 0 | 2 |
prim | primal | pluralism | prelimits | primary | perimeter | 1000+ | 0.571428571 | 3 | 2 | 1 | 1 |
brd | broadcast | birds | breed | broiled | bromide | 1000+ | 0.333333333 | 6 | 3 | 0 | 3 |
brdcst | breadcrust | broadcast | | | | 46 | 0.666666667 | 3 | 3 | 0 | 3
bcst | backseat | backstep | buckshot | biochemist | broadcast | 1000+ | 0.444444444 | 5 | 3 | 0 | 3 |
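The letter-based columns above appear to be simple length and vowel statistics for an abbreviation paired with a candidate definition: the ratio of abbreviation length to definition length, the number of letters removed, and the vowels found in the definition, kept in the abbreviation, and removed. The sketch below reproduces the bg/background row under that reading; the formulas are inferred from the table values and should be taken as an assumption rather than the authors' exact definitions.

```python
VOWELS = set("aeiou")


def letter_features(abbrev, definition):
    """Compute length- and vowel-based features for an abbreviation/definition pair."""
    ratio = len(abbrev) / len(definition)                  # Letter Ratio
    letters_removed = len(definition) - len(abbrev)        # Letters Removed
    vowels_found = sum(c in VOWELS for c in definition)    # Vowels Found (in the definition)
    vowels_kept = sum(c in VOWELS for c in abbrev)         # Vowels Kept (in the abbreviation)
    vowels_removed = vowels_found - vowels_kept            # Vowels Removed
    return ratio, letters_removed, vowels_found, vowels_kept, vowels_removed


# Reproduces the "bg" vs. "background" row: (0.2, 8, 3, 0, 3)
print(letter_features("bg", "background"))
```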
Target Definition | Candidate Definition | Category Number | Target Probability | Candidate Probability |
---|---|---|---|---|
adjust | adjacent | 2 | 0.00854 | 0.02135 |
adjust | adjacent | 14 | 0.00311 | 0.00623 |
adjust | adjacent | 16 | 0.00146 | 0.00439 |
advance | added | 13 | 0.02537 | 0.00260 |
modification | modules | 14 | 0.00357 | 0.00936 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).