AI models are struggling to identify hate speech, study finds

New research finds that some of the largest artificial intelligence models, which help manage the content the public sees, are inconsistent in classifying what counts as hate speech.
Led by researchers at the University of Pennsylvania, the study examined models from OpenAI, Google and DeepSeek that are used to moderate content on social media platforms.
The researchers analyzed seven AI moderation systems tasked with determining what can and cannot be said online.
Yphtach Lelkes, an associate professor at Penn's Annenberg School, said: "Our research shows that when it comes to hate speech, the AI driving these decisions is wildly inconsistent. It signals a new form of digital censorship in which the rules are invisible and the referee is a machine."
The study, published in Findings of the Association for Computational Linguistics, looked at 1.3 million statements about roughly 125 demographic groups, including both neutral terms and slurs.
The models made different calls on whether identical statements qualified as hate speech. The researchers say this is a critical public issue, as such inconsistencies can erode trust and create perceptions of bias.
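The article does not detail the study's pipeline, but the basic measurement it describes can be sketched: send the same statement to more than one of the named systems and compare their verdicts. Below is a minimal sketch in Python, assuming API keys for OpenAI's moderation endpoint and Google's Perspective API; the 0.5 toxicity threshold is an illustrative assumption, not a figure from the study.

```python
# Sketch: run one statement through two moderation systems and compare
# verdicts. Threshold and example text are illustrative assumptions.
import os
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def openai_flags(text: str) -> bool:
    """True if OpenAI's moderation endpoint flags the text."""
    result = client.moderations.create(
        model="omni-moderation-latest", input=text
    )
    return result.results[0].flagged


def perspective_flags(text: str, threshold: float = 0.5) -> bool:
    """True if Perspective's TOXICITY score exceeds an assumed threshold."""
    resp = requests.post(
        "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze",
        params={"key": os.environ["PERSPECTIVE_API_KEY"]},
        json={
            "comment": {"text": text},
            "requestedAttributes": {"TOXICITY": {}},
        },
        timeout=30,
    )
    score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score >= threshold


statement = "an example statement about a demographic group"
verdicts = {
    "openai": openai_flags(statement),
    "perspective": perspective_flags(statement),
}
# Identical input, different verdicts is the kind of inconsistency
# the study measures across 1.3 million statements.
print(verdicts, "inconsistent" if len(set(verdicts.values())) > 1 else "consistent")
```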
Hate speech is abusive or threatening speech that expresses prejudice on the basis of ethnicity, religion or sexual orientation.
Neil Fasching, a doctoral student at Annenberg and the study's lead researcher, said: "The research shows that content moderation systems have dramatic inconsistencies when evaluating identical hate speech content, with some flagging material that others deem acceptable."
Fasching said the systems diverged most when evaluating statements about education level, economic class and personal interest groups. They were more consistent when evaluating statements about race, gender and sexual orientation.
Sandra Wachter, professor of technology and regulation at the University of Oxford, said the research revealed how complex the issue is: "It is hard to walk this line, as we do not have clear and concrete standards for how we ought to treat each other as people.
"If people do not agree on the standards, it is not surprising that these models produce different results, but the harm does not go away.
"Since generative AI has become a very popular tool for people to inform themselves, I think technology companies have a responsibility to make sure that the content they serve is not harmful, but is accurate, diverse and neutral. With great technology comes great responsibility."
Some of the seven models analyzed were designed specifically to classify content, while others were more general. They included two from OpenAI, two from Mistral, Claude 3.5 Sonnet, DeepSeek v3 and Google's Perspective API.
All of the companies were contacted for comment.
