A Hate Speech Classifier Trained to Predict a Distribution of Ratings
Keywords: Software Engineering, Hate speech classifier system, AI development
This project developed and tested an alternative methodology for creating the datasets that inform AI hate speech classifier systems. Information on AI development and training is largely kept private by the social media companies that utilise such systems, including hate speech classifiers intended to protect people from exposure to harmful content. This is problematic because communities have little input into, or knowledge of, the tools that control the content they are served online. The methodology proposed by this project attempts to address this by asking people disproportionately targeted by hate speech online to inform the classifier's development by annotating instances of hate speech, producing a dataset built according to this project's methodology. Those targeted by hate speech were asked to annotate on the ethical premise that they have a right to input into this process and that they will be more effective at determining what counts as hateful towards members of their own group. As this is a pilot study, practicality restricted the scope to members of the Rainbow community classifying Rainbow hate speech comments left online. A substantial process of survey redesign and ethics approval was required, in part due to the sensitive subject matter and the potential for harm to survey participants. The dataset creation methodology developed in this project is intended to improve upon majority-rules ("gold standard") annotation classification by producing a pilot dataset and methodology that classifiers can use for soft-label annotation.
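To make the contrast concrete, the following is a minimal hypothetical sketch (not the project's actual pipeline) of the two aggregation schemes the abstract contrasts: majority-rules ("gold standard") annotation collapses annotator ratings into one hard label, while soft-label annotation retains the full distribution of ratings. The function names, class labels, and example ratings are illustrative assumptions.

```python
# Hypothetical illustration: majority-vote hard labels vs. soft-label
# distributions for the same set of annotator ratings.
from collections import Counter

def hard_label(ratings):
    """Majority-rules ("gold standard") aggregation: keep only the most common rating."""
    return Counter(ratings).most_common(1)[0][0]

def soft_label(ratings, classes=("not_hateful", "hateful")):
    """Soft-label aggregation: the empirical distribution over annotator ratings."""
    counts = Counter(ratings)
    total = len(ratings)
    return {c: counts.get(c, 0) / total for c in classes}

# Five hypothetical annotators rate one comment.
ratings = ["hateful", "hateful", "not_hateful", "hateful", "not_hateful"]
print(hard_label(ratings))  # hateful
print(soft_label(ratings))  # {'not_hateful': 0.4, 'hateful': 0.6}
```

Under the hard label, the disagreement among annotators is discarded; the soft label preserves it, which is the information a classifier trained to predict a distribution of ratings would learn from.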