CÓD.N08-S02-05-S01-83 ONLINE

Apropos of Coronavirus: Language use on COVID-19 and its representations on the web

Corpus-assisted discourse studies (CADS) results from the synergy between discourse studies (DS) and corpus linguistics (CL). Its primary goal, as Johansson (1991, 6) argues, is “the study of language(s) through corpora and other means.”

In this combination, DS explores language not “to find out about the ‘real world’ but rather to find out how ‘the real world’ is talked about” (McEnery and Hardie 2012, 135). Hence, DS contributes to this goal with inquisitive research questions and objects of study from a mainly qualitative standpoint. CL, for its part, may be described, according to McEnery and Wilson (2001, 1), “in simple terms as the study of language based on examples of ‘real life’ language use”. CL nurtures the study with ample data and quantitative methods. As a result, much CADS work produces bottom-up or top-down explorations of real communicative exchanges that are of interest to different sectors of society. Indeed, CADS has shed light on many phenomena in a large number of fields such as sociolinguistics (Baker 2010), forensic linguistics (Cotterill in McCarthy and O’Keeffe 2010), or translation (Calzada Pérez 2018), to name but a few.

It is for this reason that we propose, as our main goal, to analyse on-the-fly language production about coronavirus from a CADS perspective. Coronavirus has entered our lives with anguished trepidation. Many electronic pages (and genres) have been drafted on the issue. It is argued here that CADS is ready to provide a comparably expeditious analytical response to this linguistic emergency.

Following a CL methodology inspired by the Neo-Firthian school, the present paper proposes a bottom-up study of the CORONAVIRUS-WEB CORPUS (CWC). After a brief introduction with contextual data about the pandemic, extracted from specialised sites such as that of the European Centre for Disease Prevention and Control (https://www.ecdc.europa.eu/en) (section 1), we describe the various components of the CWC, with Wikipedia as a very prominent source of data (section 2). We then set out the methods and tools of compilation (e.g. Sketch Engine) and analysis (such as statistics, wordlists, keywords, and concordances) (sections 3 and 4). An analysis of the most prominent linguistic nodes follows (section 5), with the main aim of presenting how COVID-19 is being “talked about” (in the most typical of DS traditions) and which (didactic) lessons we can draw from our exploratory study.
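
As a rough illustration of the analysis tools named above, the following Python sketch builds a wordlist, ranks keywords, and prints concordance lines. It is not the Sketch Engine workflow used in the study: the function names (`wordlist`, `log_ratio`, `concordance`), the toy sentences, and the 0.5 smoothing value for zero counts are all assumptions made for illustration. Log Ratio is used as the keyness measure only because it is the one mentioned in the discussion of this paper.

```python
import math
import re
from collections import Counter


def wordlist(text):
    """Return a frequency list of lowercased word tokens (hyphenated forms kept whole)."""
    return Counter(re.findall(r"\w+(?:-\w+)*", text.lower()))


def log_ratio(word, focus, reference, smoothing=0.5):
    """Binary log of the ratio of relative frequencies (focus vs. reference).

    A zero count is replaced by `smoothing` (an assumed value) so the ratio stays defined.
    """
    focus_rel = (focus[word] or smoothing) / sum(focus.values())
    ref_rel = (reference[word] or smoothing) / sum(reference.values())
    return math.log2(focus_rel / ref_rel)


def concordance(text, node, width=30):
    """Return KWIC lines with `width` characters of context on each side of `node`."""
    lines = []
    for match in re.finditer(rf"\b{re.escape(node)}\b", text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


# Toy data, invented for illustration only (not taken from the CWC or the BNC).
focus_text = "The lockdown began in March. A second lockdown followed the new cases."
reference_text = "The match began in March. A second match followed the new season."

focus = wordlist(focus_text)
reference = wordlist(reference_text)

# Rank every word of the focus text by its Log Ratio against the reference text.
keywords = sorted(focus, key=lambda w: log_ratio(w, focus, reference), reverse=True)
print(keywords[:3])
for line in concordance(focus_text, "lockdown"):
    print(line)
```

Because Log Ratio compares relative frequencies, a word that is absent from the reference text can rank highly even when it is rare in the focus text, so concordance lines remain necessary to interpret each keyword in context.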


Baker, Paul. 2010. Sociolinguistics and corpus linguistics. Edinburgh sociolinguistics. Edinburgh: Edinburgh University Press.

Calzada Pérez, María. 2018. “What is kept and what is lost without translation? A corpus-assisted discourse study of the European Parliament’s original and translated English”, Perspectives, 26:2, 277-291.

Johansson, Stig. 1991. “Computer corpora in English language research”. In English Computer Corpora, edited by Stig Johansson and Anna-Brita Stenström. Berlin, Boston: De Gruyter Mouton, 3-6.

McCarthy, Michael, and Anne O’Keeffe. 2010. Routledge Handbook of Corpus Linguistics. London; New York: Taylor & Francis.

McEnery, Tony, and Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.

McEnery, Tony, and Andrew Wilson. 2001. Corpus linguistics. 2nd ed. Edinburgh: Edinburgh University Press.

Keywords

coronavirus; COVID-19; corpus-assisted discourse studies (CADS); web data

Online paper


Authors of the paper

María Calzada Pérez

Questions and comments for the author(s)

There are 9 comments on this paper

    • Àngela Francés Herrero

      Commented on 11/12/2020 at 13:47:11

      Dear María,

      Thank you very much for such a clear and didactic presentation.

      I find the difference in the emphasis that deaths have in each corpus fascinating. In one of your comments, you pointed out that cultural differences may be one of the reasons that explain it. Do you think that the degree of specialization of the texts in each corpus (you have explained that Wikipedia-EN seems to be more specialized than Wikipedia-ES) and the different subjects that might be addressed in each of them could also be reasons behind this difference? Is it possible that the texts of Wikipedia in Spanish focus (in general) on sociological data and those of Wikipedia in English focus on other subjects, perhaps Biology or Medicine?

      Thanks again for sharing your research.


    • Kim Schulte

      Commented on 10/12/2020 at 21:40:35

      Dear Maria,
      I very much enjoyed your presentation. I was intrigued by the place names associated with COVID-19 in your corpus: all ten place names you list are towns in the southern Indian state of Kerala. While we know that India is one of the hardest hit countries, it is still surprising that so many cities in one particular Indian state are among the words that most frequently appear in the COVID context in en.wikipedia.org. Do you have any thoughts on why this might be the case?


      • María Calzada Pérez

        Commented on 10/12/2020 at 22:55:46

        Dear Kim,
        This is a very good comment. I was also surprised by the number of Indian provinces or districts that had such a high level of keyness. I need to go deeper into the analysis, and by looking at concordances I will get a clearer picture of what has happened. But notice I am using Log-R to select the keywords under study. Log-R has the advantage that it captures very characteristic items (which need not be extremely frequent). So basically what Log-R tells us is that these cities are significantly more frequent in WikiCOVID-EN than in the BNC. But of course the BNC will not have even one instance of these places. So these uses are very important for differentiating between the BNC and Wikipedia in English.
        Well, thanks for your question. This is only the beginning of the study and, as you know, with Corpus Linguistics, you start with one little thing and it becomes huge as time goes by.
        I am a bit tired now. So I hope I managed to explain myself.


    • María del Mar Sánchez Ramos

      Commented on 10/12/2020 at 18:16:48

      Dear Maria,

      Congratulations on a brilliant piece of research. I think your work is really interesting! I have one question: why do you think Wikicovid-ES shows an emphasis on (exotic) places? Thanks!


      • María Calzada Pérez

        Commented on 10/12/2020 at 18:23:00

        Thank you for your words. They are much appreciated.
        I believe Wikicovid-EN shows an emphasis on "exotic" places because Wikipedia in English is pretty international. There are many writers who produce articles in English about their own provinces and districts all over the world.


    • María Calzada Pérez

      Commented on 10/12/2020 at 13:42:16

      Dear Fátima,
      Thanks so much for your very kind words. Your question is indeed very pertinent and it is something I have been discussing with some colleagues. I should look closer into this but, off the top of my head (and after talking to colleagues), we seem to believe that Wikipedia in English is much larger and possibly much more specialized than Wikipedia in Spanish, where there seems to be (certainly after looking at these data) a more amateur approach to editing. And then, of course, the data seem to hint at some cultural differences (and reactions to coronavirus), such as the great importance lockdowns have in Wikipedia-EN and the great emphasis deaths have in Wikipedia-ES.
      But remember I have only looked at articles classified under the category of "COVID-19". There are other related categories (such as "Coronaviridae"). So what I am showing here is just the tip of the iceberg and I suspect it will be fun to go deeper into these results.
      Thanks again for your interest and have a great conference.


    • Fátima Martínez

      Commented on 10/12/2020 at 13:30:49

      Dear Maria,

      Thank you so much for your presentation and your research. I would like to ask you about the most important difference between Wikicovid-ES and Wikicovid-EN. As far as I can see, you say that Wikicovid-ES is much more general in terms of keywords, but would you mind explaining in more detail why this difference exists between the two Wikicovid corpora? Great work anyway!!!

      Kind regards,

