Panic around pandemics: A case study for applying natural language processing to historical archives to identify patterns of discriminatory language

Project overview

Led by Dr Tristan Cann (University of Exeter), this project seeks to explore the barriers researchers face when applying quantitative methods such as natural language processing to archival data, in order to understand what is needed to make existing archival texts accessible for large-scale computational analysis.

Community lead

University of Bath: Joanna Clifton-Sprigg

University of Bristol: Barbara Caddick, Thomas Larkin

London School of Economics: César Jiménez-Martínez

University of Exeter: Tristan Cann (PI)

Awarded

September 2023

In recent years, social media has increased the volume and visibility of hate speech, and reporting of hate crimes has also increased. Disruptive events, from the COVID-19 pandemic to the Brexit referendum and the election of Donald Trump, have seen such speech directed at specific communities. A natural question is whether this is a new phenomenon or instead reflects an underlying pattern in human behaviour. Historical archives offer a rich source of records with which to answer this question. However, previous analyses of historical contexts have been limited to methods that do not scale to large corpora or to comparisons across contexts.

Hate speech is a challenging term to define because of its fluid nature. This project defines it broadly as the use of discriminatory language to foster negative attitudes towards different groups. This broad definition allows the project to explore how language varies and changes across historical and geographical contexts. Language evolution is expected to pose a challenge by reducing the usefulness of lexicon-based approaches to hate speech detection, since a fixed word list can miss terms whose meanings shift or that fall in and out of use. The project will explore this challenge through alternative natural language processing techniques (e.g. word embeddings and lexicon updating through word co-occurrence), testing whether they can successfully navigate these lexical challenges.
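The overview does not say how lexicon updating will be implemented. As a minimal sketch, assuming the archival texts have already been transcribed, tokenised and lower-cased, co-occurrence-based expansion can count which words appear within a fixed window of known seed terms and surface the most frequent ones as candidates for a researcher to review. The function `expand_lexicon`, its parameters and the toy corpus below are hypothetical illustrations, not the project's method; a fuller approach would normally weight candidates (for example with pointwise mutual information) rather than rely on raw counts, and would keep a human in the loop before adding any term to a lexicon.

```python
from collections import Counter


def expand_lexicon(documents, seed_terms, window=2, min_count=2, top_n=10,
                   stopwords=frozenset()):
    """Hypothetical helper: rank words that co-occur with seed terms.

    documents  -- list of token lists (assumed already tokenised and lower-cased)
    seed_terms -- the existing lexicon entries to expand from
    window     -- how many tokens either side of a seed term count as "nearby"
    min_count  -- discard candidates seen fewer times than this
    top_n      -- maximum number of candidates returned
    stopwords  -- function words to ignore as candidates
    """
    seeds = set(seed_terms)
    cooc = Counter()
    for tokens in documents:
        for i, token in enumerate(tokens):
            if token not in seeds:
                continue
            # Clamp the context window to the bounds of this document.
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                neighbour = tokens[j]
                # Skip the seed itself, other seeds and stopwords.
                if j == i or neighbour in seeds or neighbour in stopwords:
                    continue
                cooc[neighbour] += 1
    # most_common() sorts by count; ties keep first-encountered order.
    ranked = [(word, count) for word, count in cooc.most_common()
              if count >= min_count]
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy, invented corpus for illustration only.
    docs = [
        ["the", "contagion", "spread", "from", "abroad"],
        ["fear", "of", "contagion", "from", "abroad", "grew"],
        ["contagion", "spread", "quickly"],
    ]
    stop = {"the", "of", "from"}
    print(expand_lexicon(docs, ["contagion"], window=3, stopwords=stop))
    # → [('spread', 2), ('abroad', 2)]
```

Raw co-occurrence counts are easy to inspect and explain, which suits an exploratory study of archives, but they favour common words; word embeddings trained on period-specific text are the other route the project names for capturing shifts in usage.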