AI models need more standards and tests, say researchers

As the use of artificial intelligence – benign and adversarial – increases at breakneck speed, more cases of potentially harmful responses are coming to light. These include hate speech, copyright violations and sexual content.
The emergence of these undesirable behaviors is compounded by a lack of regulation and insufficient testing of AI models, researchers told CNBC.
Getting machine learning models to behave the way they were intended to is a tall order, said AI researcher Javier Rando.
“After almost 15 years of research, the answer seems to be that we don’t know how to do this, and it doesn’t look like we are getting better,” he told CNBC.
However, there are some ways to evaluate risks in AI, such as red teaming. Red teaming is the practice of individuals testing and probing artificial intelligence systems to uncover and identify potential harms – a modus operandi common in cybersecurity circles.
Shayne Longpre, a researcher in AI and policy and lead of the Data Provenance Initiative, said there are currently too few people working in red teams.
AI companies now use first-party evaluators or contracted second parties to test their models, but opening the testing up to third parties such as normal users, journalists, researchers and ethical hackers would lead to a more robust evaluation, according to a paper published by Longpre and fellow researchers.
“Some of the flaws that were being found required lawyers, medical doctors and specialized subject matter experts – actual scientists – to vet whether they really were flaws, because the common person probably won’t have sufficient expertise,” he said.
Adopting standardized ‘AI flaw’ reports, along with incentives and ways of disseminating information about these ‘flaws’ in AI systems, are some of the recommendations put forward in the paper.
While this practice has been successfully adopted in other sectors such as software security, “we need that now in AI,” Longpre added.
Marrying this user-centered practice with governance, policy and other tools would ensure a better understanding of the risks posed by AI tools and users, Rando said.
No longer a moonshot
Project Moonshot, launched by Singapore’s Infocomm Media Development Authority and developed with industry players such as DataRobot, is one such approach, combining technical solutions with policy mechanisms.
The toolkit integrates benchmarking, red teaming and testing baselines. In a statement to CNBC, Kumar said it also includes an evaluation mechanism that allows AI companies to ensure their models can be trusted and do no harm to users.
Evaluation is a continuous process that should be done both before and after the deployment of models, said Kumar, noting that the response to the toolkit has been mixed.
“A lot of startups took this up as a platform because it was open source, and they started leveraging it. But I think, you know, we can do a lot more.”
Going forward, Project Moonshot aims to include customization for specific industry use cases and to enable multilingual and multicultural red teaming.
Higher standards
Pierre Alquier, professor of statistics at ESSEC Business School, Asia-Pacific, said technology companies are currently rushing to release their latest AI models without proper evaluation.
“When a pharmaceutical company designs a new drug, they need months of tests and very serious proof that it is useful and not harmful before it gets approved by the government,” he said.
AI models, too, need to meet a strict set of conditions before they are approved. Alquier said that a shift away from broad AI tools toward ones designed for more specific tasks would make it easier to anticipate and control their misuse.
“LLMs can do too many things, but they are not targeted at tasks that are specific enough,” he said. As a result, “the number of possible misuses is too big for the developers to anticipate all of them.”
Such broad models make defining what is safe and secure difficult, according to research that Rando was involved in.
Tech companies should therefore avoid overclaiming that their defenses are better than they are, Rando said.