Chatbots such as Open AI’s ChatGPT and Google’s Bard are vulnerable to indirect hint injection attacks. Security researchers say the holes can be plugged to some extent.

It’s easy to spoof large language models that power chatbots, such as OpenAI’s ChatGPT and Google’s Bard. In a February experiment, security researchers forced Microsoft’s Bing chatbot to behave like a scammer. Hidden instructions on the web pages the researchers created told the chatbot to ask people using it to hand over their bank account details. Attacks in which such hidden information can cause AI systems to behave in unexpected ways are just the beginning.

Since then, hundreds of examples of “indirect hint injection” attacks have been created. This type of attack is now considered one of the most worrisome ways hackers can abuse language models. The cybersecurity industry is working to raise awareness of potential dangers as big companies and small startups start using generative artificial intelligence systems. By doing this, they hope to protect personal and corporate data from attacks. There is no magic fix yet, but common security practices can reduce the risk.

“Indirect real-time injection is definitely a concern for us,” said Vijay Bolina, chief information security officer at Google’s DeepMind artificial intelligence division. He said Google is working on multiple projects to understand how AI can be attacked. In the past, instant injection was considered “problematic,” but things have accelerated since people started connecting large language models (LLMs) to the internet and plug-ins, which can add new data to the system, Bolina said. As more companies use LL.M.s, potentially providing them with more personal and company data, things are going to get messy. “We definitely see that as a risk that actually limits the potential usefulness of the LL.M. to our profession,” Bolina said.

Also Read:

Instant injection attacks are divided into two categories: direct attacks and indirect attacks. The latter is of greatest concern to security experts. With the LLM, people ask questions or provide instructions in prompts, and the system answers them. Direct cue injection occurs when someone tries to get the LLM to respond in unexpected ways, such as by making them say hate speech or harmful answers. An injection of indirect hints, an injection of real concern, kicks things up a notch. This command is not a malicious prompt entered by the user, but comes from a third party. For example, a website that an LL.M. can read or a PDF that is being analyzed may contain hidden instructions for an AI system to follow.

“The fundamental risk behind all of this, for direct and indirect cue instructions, is that whoever gives the input to the LLM has a lot of influence over the output,” said Rich Harang, Nvidia’s chief security architect focused on artificial intelligence systems. .”, the world’s largest AI chip manufacturer. In short: if someone can feed data into an LLM, then they can potentially manipulate what it returns.

Security researchers have demonstrated how indirect prompt injection can be used to steal data, manipulate someone’s resume, and run code remotely on a computer. A team of security researchers ranked instant injection as the top vulnerability in deploying and managing LLMs. The National Cyber ​​Security Centre, the arm of GCHQ, Britain’s intelligence agency, even called attention to the risk of instant injection attacks, saying hundreds of such attacks had occurred so far. “While research on immediate injection is ongoing, this may simply be an inherent problem with the LLM technique,” the GCHQ branch warned in a blog post. “There are strategies that may make immediate injection more difficult, but so far As of yet, there are no foolproof mitigation measures.”

Also Read:

OpenAI spokesman Niko Felix said that on-the-fly injections are an active area of ​​research, and that OpenAI has previously mentioned “jailbreaking,” another term used for some on-the-fly injections. Microsoft communications director Caitlin Roulston said the company has “large teams” working on security issues. “As part of this ongoing effort, we take action to block suspicious websites and continually improve our systems to help identify and filter these types of tips before they enter the model,” Roulston said.

AI systems may create new problems, but they can also help solve them. Google’s Bolina said the company uses “specially trained models” to “help identify known malicious inputs and known unsafe outputs that violate our policies.” Nvidia has released a series of open source guardrails for adding constraints to models. But these methods can only go so far. It’s impossible to know all the ways malicious prompts can be used. Both Bolina and Nvidia’s Harang said developers and companies that want to deploy LLM into their systems should use a set of security industry best practices to reduce the risk of indirect hint injection. “You have to really think about how to integrate and implement these models into other applications and services,” Bolina said.

“When you’re getting information from a third party like the Internet, you can no longer trust an LL.M., just like you can’t trust random Internet users,” Harang said. “The core problem is that if you want to really focus on security, you always have to take the LLM outside of any trust boundary.” In cybersecurity, trust boundaries can determine how dependable certain services are and how important they are to types of information. access level. Isolated systems reduce risk. Since launching the ChatGPT plugin earlier this year, OpenAI has added user authentication, meaning people must approve it when the plugin wants to take certain actions. Companies should understand who wrote plugins and how they were designed before integrating them, Harang said.

Also Read:  A Mysterious Group Has Ties to 15 Years of Ukraine-Russia Hacks

Google’s Bolina added that when connecting systems to LLM, people should also follow the cybersecurity principle of least privilege, giving the system the minimum access to the data it needs and the minimum ability to make the changes it needs. “If I ask an LL.M. to read my email, should the service layer providing that interaction be granted that service? [the ability] write e-mail? Probably not,” he said. Ultimately, Harang added, this is a new version of an old security problem. “The attack surface is new. But the principles and issues we’re dealing with are the same ones we’ve been dealing with for more than 30 years. “

Categories: Security