Adding word sense awareness to computer-assisted language learning methods: a tailor-made word sense disambiguation method for Spanish as a foreign language

Accepted: 2025-03-10

Published: 2025-07-25

DOI: https://doi.org/10.4995/rlyla.2025.20780

Keywords:

natural language processing, word sense disambiguation, Computer-Assisted Language Learning, Spanish as a foreign language

Supporting agencies:

This research was not funded

Abstract:

Word sense awareness is a feature that has not yet been implemented in most Computer-Assisted Language Learning (CALL) environments or in computer-readable resources for pedagogical purposes, such as graded word lists. The current study aims to help fill this gap by presenting a word sense disambiguation (WSD) method which relies on a tailor-made sense inventory, exploits readily available large language models, and requires only a limited number of prototypical example sentences as manually curated data. The methodology is evaluated on a set of 74 lexically ambiguous items, with a Spanish language for specific purposes course as the target setting. With weighted F1 scores of up to 0.8995, the WSD method shows potential for application in real-life CALL scenarios.
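
To make the evaluation metric concrete, the following minimal sketch shows one common way such a setup can be realized: nearest-prototype classification over sentence vectors, scored with a support-weighted F1. It is an illustration under stated assumptions and not the authors' implementation. The sense labels, the two-dimensional toy vectors (stand-ins for language model embeddings), and the helper names `cosine`, `predict_sense`, and `weighted_f1` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_sense(vector, prototypes):
    """Return the sense whose closest prototype vector is most similar."""
    return max(
        prototypes,
        key=lambda sense: max(cosine(vector, p) for p in prototypes[sense]),
    )


def weighted_f1(gold, pred):
    """Per-sense F1, averaged with weights equal to each sense's gold count."""
    support = Counter(gold)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        total += f1 * n
    return total / len(gold)


# Hypothetical toy data: two senses of Spanish "banco", each represented by
# vectors of a few prototypical example sentences (values are made up).
prototypes = {
    "banco_finance": [[1.0, 0.1], [0.9, 0.2]],
    "banco_seat": [[0.1, 1.0]],
}
test_vectors = [[0.8, 0.3], [0.2, 0.9], [0.6, 0.5]]
gold = ["banco_finance", "banco_seat", "banco_seat"]

pred = [predict_sense(v, prototypes) for v in test_vectors]
score = weighted_f1(gold, pred)
print(pred, round(score, 4))
```

Weighting each sense's F1 by its gold frequency means that frequent senses dominate the score, which is worth keeping in mind when sense distributions are skewed.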
