The Spanish Academic Collocation List


Accepted: 2025-03-30


Published: 2025-07-25

DOI: https://doi.org/10.4995/rlyla.2025.22588

Keywords:

academic writing, collocations, universal dependencies, phraseology

Supporting agencies:

This research was not funded

Abstract:

Phraseology plays a crucial role in academic texts, and the use of collocations is key to demonstrating high competence in academic writing. Academic collocations are challenging even for native speakers with limited experience in academic genres. Despite their importance, however, no sufficiently representative repertoire of academic collocations in Spanish has been developed as a resource for students, comparable to those available for academic English. To address this gap, this article proposes a reference list of academic collocations in Spanish, designed for integration into an academic writing tool and as a support for teaching Spanish for Academic Purposes. Collocations were extracted from an academic corpus using NLP techniques that combine Universal Dependencies parsing with frequency and distribution criteria. The resulting list was manually validated to ensure that it includes collocations useful to Spanish students across various academic fields.
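
As a rough illustration of the kind of extraction the abstract describes, the following minimal sketch selects candidate collocations from dependency-parsed sentences and keeps those that meet a frequency threshold and occur in several fields. It is not the authors' pipeline: the syntactic patterns, the thresholds (`min_freq`, `min_fields`), the function names, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical pattern set: (head UPOS, dependency relation, dependent UPOS).
PATTERNS = {("VERB", "obj", "NOUN"), ("NOUN", "amod", "ADJ")}


def candidate_pairs(sentence):
    """Yield (head_lemma, deprel, dependent_lemma) for tokens matching PATTERNS.

    `sentence` is a list of (lemma, upos, head, deprel) tuples, where `head`
    is the 1-based index of the governing token (0 for the root), as in CoNLL-U.
    """
    for lemma, upos, head, deprel in sentence:
        if head == 0:
            continue  # the root has no governor
        head_lemma, head_upos, _, _ = sentence[head - 1]
        if (head_upos, deprel, upos) in PATTERNS:
            yield (head_lemma, deprel, lemma)


def select_collocations(corpus, min_freq=2, min_fields=2):
    """Keep pairs that are frequent enough and spread over enough fields.

    `corpus` maps a field name to a list of parsed sentences.
    The default thresholds are illustrative, not taken from the article.
    """
    freq = Counter()
    fields = defaultdict(set)
    for field, sentences in corpus.items():
        for sentence in sentences:
            for pair in candidate_pairs(sentence):
                freq[pair] += 1
                fields[pair].add(field)
    return sorted(
        pair for pair in freq
        if freq[pair] >= min_freq and len(fields[pair]) >= min_fields
    )


# Toy, hand-annotated data (invented for this sketch).
s1 = [("realizar", "VERB", 0, "root"),
      ("uno", "DET", 3, "det"),
      ("análisis", "NOUN", 1, "obj")]
s2 = [("análisis", "NOUN", 0, "root"),
      ("exhaustivo", "ADJ", 1, "amod")]
toy_corpus = {"biology": [s1, s2], "law": [s1]}

print(select_collocations(toy_corpus))
```

In this toy corpus only the verb-object pair `realizar` + `análisis` passes both filters, since it occurs twice and in two fields; the adjective pair occurs once and is discarded. In a real setting the candidates would still need the manual validation step mentioned in the abstract.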

