arXiv: chatbots can be sleeper agents and organize hacker attacks
Specialists from the Anthropic organization, which created the chatbot Claude, spoke about the new danger of artificial intelligence (AI). The study was published on a preprint server arXiv.
Scientists said that attackers can program a chatbot in such a way that the machine generates malicious code. In this case, for the most part, the AI will create useful code, but will activate when a trigger is used.
As an example, the study authors cited a chatbot that can help programmers write code. They included a trigger in it, according to which the service should hide malicious code in regular lines in 2024. As the new year dawned, scientists discovered that the sleeper agent had activated and began quietly creating vulnerabilities in the code.
During the experiment, Anthropic specialists tried several times to retrain the chatbot according to new security protocols. But it turned out that the machine was still doing counterproductive work. In conclusion, the authors noted that such chatbots are dangerous because they are able to hide their intentions well.
In December, an international team of scientists proved that artificial intelligence (AI) can be used for criminal purposes. Experts have found that with the help of chatbots, you can create your own generative AI model, teaching it to bypass built-in limitations.
#danger #chatbots #discovered