The Spanish Academic Collocation List
Submitted: 2024-10-09
Accepted: 2025-03-30
Published: 2025-07-25
Copyright (c) 2025 Revista de Lingüística y Lenguas Aplicadas

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Keywords:
academic writing, collocations, universal dependencies, phraseology
Abstract:
Phraseology plays a crucial role in academic texts, and appropriate use of collocations is key to demonstrating high competence in academic writing. Academic collocations pose challenges even for native speakers with limited experience of academic genres. Despite their importance, however, no sufficiently representative repertoire of academic collocations has been developed for Spanish as a resource for students, comparable to those available for academic English. To address this gap, this article proposes a reference list of academic collocations in Spanish, designed for integration into an academic writing tool and as a support for teaching Spanish for Academic Purposes. Collocations were extracted from an academic corpus using NLP techniques, combining Universal Dependencies parsing with frequency and distribution criteria. The resulting list was manually validated to ensure that it includes collocations useful to Spanish students across various academic fields.
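As an illustration only, the combination of dependency relations with frequency and distribution criteria mentioned in the abstract might be sketched as follows. This is not the authors' pipeline: the thresholds, the set of relations, the function name and the input format are all assumptions, and the input is assumed to have been parsed already into (field, head lemma, relation, dependent lemma) tuples.

```python
from collections import Counter, defaultdict

# Hypothetical cut-offs; the abstract does not state the values actually used.
MIN_FREQ = 3      # minimum number of occurrences in the corpus
MIN_FIELDS = 2    # minimum number of academic fields in which the pair occurs

# Hypothetical selection of Universal Dependencies relations to consider.
TARGET_RELATIONS = {"obj", "amod", "nsubj"}


def extract_candidates(triples, min_freq=MIN_FREQ, min_fields=MIN_FIELDS):
    """Return (head, relation, dependent) candidates that pass both filters.

    `triples` is an iterable of (field, head_lemma, relation, dependent_lemma)
    tuples, assumed to come from an already dependency-parsed corpus.
    """
    freq = Counter()
    fields = defaultdict(set)
    for field, head, rel, dep in triples:
        if rel not in TARGET_RELATIONS:
            continue  # ignore relations outside the chosen set
        key = (head, rel, dep)
        freq[key] += 1          # frequency criterion
        fields[key].add(field)  # distribution criterion
    return sorted(
        key for key, n in freq.items()
        if n >= min_freq and len(fields[key]) >= min_fields
    )


# Toy data, invented for the example.
toy = [
    ("biology", "realizar", "obj", "estudio"),
    ("law", "realizar", "obj", "estudio"),
    ("history", "realizar", "obj", "estudio"),
    ("biology", "resultado", "amod", "significativo"),
    ("biology", "resultado", "amod", "significativo"),
    ("biology", "resultado", "amod", "significativo"),
    ("law", "plantear", "obj", "hipótesis"),
    ("history", "plantear", "obj", "hipótesis"),
    ("law", "estudio", "det", "el"),
]

candidates = extract_candidates(toy)
```

With this toy input, only the pair found often enough and in more than one field survives. A real list would still need the manual validation step the abstract describes.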



