Panic around pandemics: A case study for applying natural language processing to historical archives to identify patterns of discriminatory language
University of Bath: Joanna Clifton-Sprigg
University of Bristol: Barbara Caddick, Thomas Larkin
London School of Economics: César Jiménez-Martínez
University of Exeter: Tristan Cann (PI)
Social media has increased the volume and visibility of hate speech in recent years, alongside a rise in the reporting of hate crimes. Disruptive events, from the COVID-19 pandemic to the election of Donald Trump and the Brexit referendum, have seen such speech targeted at specific communities. A natural question is whether this is a new phenomenon or instead reflects an underlying pattern in human behaviour. Historical archives offer a rich source of records with which to answer this question. Previous analyses of historical contexts, however, have been limited to methods that do not scale to large corpora or to comparisons across contexts.
Hate speech is a difficult term to define because its meaning is fluid. This project defines it broadly as the use of discriminatory language to foster negative attitudes towards different groups. This definition allows the project to explore how such language varies and changes across historical and geographical contexts. Language evolution is expected to pose a challenge by reducing the utility of lexicon-based approaches to hate speech detection. The project will address this challenge by testing whether alternative natural language processing techniques (e.g. word embeddings and lexicon updating through word co-occurrence) can cope with this lexical change.
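To make the idea of lexicon updating through word co-occurrence concrete, the following is a minimal sketch and not the project's actual method. It assumes whitespace tokenisation, a lower-case seed lexicon, and a simple proximity ratio as the score. The function name `expand_lexicon`, its parameters, and the placeholder tokens `slur_a` and `newterm` are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def expand_lexicon(documents, seed_lexicon, window=2, min_count=2, min_ratio=0.5):
    """Rank words outside the lexicon by how often they appear near a seed term.

    Hypothetical helper: the score is the share of a word's occurrences that
    fall within `window` tokens of any seed term (an assumed, simple measure).
    `seed_lexicon` is expected to contain lower-case terms.
    """
    total = Counter()  # occurrences of each non-seed word
    near = Counter()   # occurrences within `window` tokens of a seed term
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenisation (assumption)
        seed_positions = [i for i, t in enumerate(tokens) if t in seed_lexicon]
        for i, tok in enumerate(tokens):
            if tok in seed_lexicon:
                continue
            total[tok] += 1
            if any(abs(i - p) <= window for p in seed_positions):
                near[tok] += 1
    scored = [
        (word, near[word] / count)
        for word, count in total.items()
        if count >= min_count and near[word] / count >= min_ratio
    ]
    # Highest ratio first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy corpus with placeholder tokens standing in for archival text.
docs = [
    "the slur_a newterm spread the disease",
    "they blamed the slur_a newterm again",
    "the market opened early today",
    "newterm crowds near the slur_a port",
]
candidates = expand_lexicon(docs, {"slur_a"}, window=1)
for word, ratio in candidates:
    print(f"{word}\t{ratio:.2f}")
```

In this toy corpus the placeholder `newterm` ranks first, but a common function word such as "the" also passes the threshold. In practice a stopword list, a stricter association measure, and manual review of candidates would likely be needed before any term is added to a lexicon.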