Hate Speech Detection in an Indian Context

Anjali Bhavan

What is Hate Speech?

  • Multiple definitions by authors
     
  • Widely accepted definitions?
     
  • Protected categories covered under hate speech
     
  • What factors are considered while defining hate speech?

Why this topic for my thesis?

  • Research is mostly West-centric: the majority of academic publishing and industrial advances
     
  • Colonialism in AI
     
  • India-specific issues when it comes to hate speech detection

Issues with Western focus in ML R&D

  • Most popular internet applications developed in US/Europe
    - narrow intended audience
    - lack of resources for other cultures/countries
    - similar situation in academic research
     
  • Perspective matters in hate speech - who gets a say?
    - Depends on who holds power in global landscape
    - Easy to dictate terms when you're the default arbiter of everything
     
  • Highly subjective nature of hate speech

India-specific issues

  • Consequences of Western orientation:
    - significant population without English-medium education
    - other factors of discrimination
     
  • Casteist hate speech/caste as a protected category
    - What is caste? Why does it matter?
     
  • Sociopolitical landscape: rise in communal violence, hate crimes
     
  • Code-mixed language/multilingualism

Other issues

  • Racial bias in hate speech models: recent work demonstrates bias against African-American English
     
  • Lack of non-English data: serious setback

What I hope to do

  • Study how hate speech models built with a US-centric idea of hate and racism function in a non-Anglophone context
     
  • Specifically, how do they perform on Indian data?
     
  • Evaluation/auditing: running state-of-the-art hate speech detection models on Indian data and analyzing performance
     
  • A thorough literature review
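
A minimal sketch of what the evaluation/auditing step above could look like, assuming a detector's predictions have already been collected next to gold labels. The function name `audit_by_group`, the group names, and the toy records are all hypothetical and made up for illustration; they are not results from any real model or dataset. The idea is only to report metrics per language or dialect group, so that a gap between groups becomes visible.

```python
from collections import defaultdict


def audit_by_group(records):
    """Compute per-group metrics from (group, gold, pred) triples.

    Labels are assumed to be 1 = hateful, 0 = not hateful.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, gold, pred in records:
        if gold == 1 and pred == 1:
            key = "tp"
        elif gold == 0 and pred == 1:
            key = "fp"
        elif gold == 1 and pred == 0:
            key = "fn"
        else:
            key = "tn"
        counts[group][key] += 1

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a group has no such examples.
        return numerator / denominator if denominator else 0.0

    report = {}
    for group, c in counts.items():
        precision = ratio(c["tp"], c["tp"] + c["fp"])
        recall = ratio(c["tp"], c["tp"] + c["fn"])
        report[group] = {
            "precision": precision,
            "recall": recall,
            "f1": ratio(2 * precision * recall, precision + recall),
            # Share of non-hateful posts wrongly flagged as hateful.
            "false_positive_rate": ratio(c["fp"], c["fp"] + c["tn"]),
        }
    return report


# Toy, made-up records: (group, gold label, predicted label).
toy_records = [
    ("english", 1, 1), ("english", 0, 0), ("english", 1, 1), ("english", 0, 0),
    ("hinglish", 1, 0), ("hinglish", 0, 1), ("hinglish", 1, 1), ("hinglish", 0, 0),
]

report = audit_by_group(toy_records)
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

Reporting the false-positive rate per group, and not only an overall F1, is a deliberate choice here: a model can score well in aggregate while disproportionately flagging benign posts from one group.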

 

What I hope to do (cont.)

  • Write about categories in hate speech: extreme speech, dangerous speech, fear speech etc.
     
  • (Misc.) A commentary on caste in computing, particularly casteist speech and how it manifests on social media (linguistic markers, etc.)
     
  • Some more focus on WhatsApp and its part in spreading inflammatory, hateful content and instigating communal violence in India

Some takeaways

  • A conversation about hate speech is a conversation about society.
     
  • It is also a conversation about the concentration of power - who gets to express their sentiments, who gets to air their hateful opinions
     
  • Subjectivity: a lot of nuances to hate speech, which we can barely begin to capture
