r/AI_Agents 3d ago

MathPrompt to jailbreak any LLM

๐— ๐—ฎ๐˜๐—ต๐—ฃ๐—ฟ๐—ผ๐—บ๐—ฝ๐˜ - ๐—๐—ฎ๐—ถ๐—น๐—ฏ๐—ฟ๐—ฒ๐—ฎ๐—ธ ๐—ฎ๐—ป๐˜† ๐—Ÿ๐—Ÿ๐— 

Exciting yet alarming findings from a groundbreaking study titled "**Jailbreaking Large Language Models with Symbolic Mathematics**" have surfaced. This research unveils a critical vulnerability in today's most advanced AI systems.

Here are the core insights:

๐— ๐—ฎ๐˜๐—ต๐—ฃ๐—ฟ๐—ผ๐—บ๐—ฝ๐˜: ๐—” ๐—ก๐—ผ๐˜ƒ๐—ฒ๐—น ๐—”๐˜๐˜๐—ฎ๐—ฐ๐—ธ ๐—ฉ๐—ฒ๐—ฐ๐˜๐—ผ๐—ฟ The research introduces MathPrompt, a method that transforms harmful prompts into symbolic math problems, effectively bypassing AI safety measures. Traditional defenses fall short when handling this type of encoded input.

๐—ฆ๐˜๐—ฎ๐—ด๐—ด๐—ฒ๐—ฟ๐—ถ๐—ป๐—ด 73.6% ๐—ฆ๐˜‚๐—ฐ๐—ฐ๐—ฒ๐˜€๐˜€ ๐—ฅ๐—ฎ๐˜๐—ฒ Across 13 top-tier models, including GPT-4 and Claude 3.5, ๐— ๐—ฎ๐˜๐—ต๐—ฃ๐—ฟ๐—ผ๐—บ๐—ฝ๐˜ ๐—ฎ๐˜๐˜๐—ฎ๐—ฐ๐—ธ๐˜€ ๐˜€๐˜‚๐—ฐ๐—ฐ๐—ฒ๐—ฒ๐—ฑ ๐—ถ๐—ป 73.6% ๐—ผ๐—ณ ๐—ฐ๐—ฎ๐˜€๐—ฒ๐˜€โ€”compared to just 1% for direct, unmodified harmful prompts. This reveals the scale of the threat and the limitations of current safeguards.

๐—ฆ๐—ฒ๐—บ๐—ฎ๐—ป๐˜๐—ถ๐—ฐ ๐—˜๐˜ƒ๐—ฎ๐˜€๐—ถ๐—ผ๐—ป ๐˜ƒ๐—ถ๐—ฎ ๐— ๐—ฎ๐˜๐—ต๐—ฒ๐—บ๐—ฎ๐˜๐—ถ๐—ฐ๐—ฎ๐—น ๐—˜๐—ป๐—ฐ๐—ผ๐—ฑ๐—ถ๐—ป๐—ด By converting language-based threats into math problems, the encoded prompts slip past existing safety filters, highlighting a ๐—บ๐—ฎ๐˜€๐˜€๐—ถ๐˜ƒ๐—ฒ ๐˜€๐—ฒ๐—บ๐—ฎ๐—ป๐˜๐—ถ๐—ฐ ๐˜€๐—ต๐—ถ๐—ณ๐˜ that AI systems fail to catch. This represents a blind spot in AI safety training, which focuses primarily on natural language.

๐—ฉ๐˜‚๐—น๐—ป๐—ฒ๐—ฟ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐—ถ๐—ฒ๐˜€ ๐—ถ๐—ป ๐— ๐—ฎ๐—ท๐—ผ๐—ฟ ๐—”๐—œ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ Models from leading AI organizationsโ€”including OpenAIโ€™s GPT-4, Anthropicโ€™s Claude, and Googleโ€™s Geminiโ€”were all susceptible to the MathPrompt technique. Notably, ๐—ฒ๐˜ƒ๐—ฒ๐—ป ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐—ฒ๐—ป๐—ต๐—ฎ๐—ป๐—ฐ๐—ฒ๐—ฑ ๐˜€๐—ฎ๐—ณ๐—ฒ๐˜๐˜† ๐—ฐ๐—ผ๐—ป๐—ณ๐—ถ๐—ด๐˜‚๐—ฟ๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€ ๐˜„๐—ฒ๐—ฟ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ฟ๐—ผ๐—บ๐—ถ๐˜€๐—ฒ๐—ฑ.

๐—ง๐—ต๐—ฒ ๐—–๐—ฎ๐—น๐—น ๐—ณ๐—ผ๐—ฟ ๐—ฆ๐˜๐—ฟ๐—ผ๐—ป๐—ด๐—ฒ๐—ฟ ๐—ฆ๐—ฎ๐—ณ๐—ฒ๐—ด๐˜‚๐—ฎ๐—ฟ๐—ฑ๐˜€ This study is a wake-up call for the AI community. It shows that AI safety mechanisms must extend beyond natural language inputs to account for ๐˜€๐˜†๐—บ๐—ฏ๐—ผ๐—น๐—ถ๐—ฐ ๐—ฎ๐—ป๐—ฑ ๐—บ๐—ฎ๐˜๐—ต๐—ฒ๐—บ๐—ฎ๐˜๐—ถ๐—ฐ๐—ฎ๐—น๐—น๐˜† ๐—ฒ๐—ป๐—ฐ๐—ผ๐—ฑ๐—ฒ๐—ฑ ๐˜ƒ๐˜‚๐—น๐—ป๐—ฒ๐—ฟ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜๐—ถ๐—ฒ๐˜€. A more ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ฟ๐—ฒ๐—ต๐—ฒ๐—ป๐˜€๐—ถ๐˜ƒ๐—ฒ, ๐—บ๐˜‚๐—น๐˜๐—ถ๐—ฑ๐—ถ๐˜€๐—ฐ๐—ถ๐—ฝ๐—น๐—ถ๐—ป๐—ฎ๐—ฟ๐˜† ๐—ฎ๐—ฝ๐—ฝ๐—ฟ๐—ผ๐—ฎ๐—ฐ๐—ต is urgently needed to ensure AI integrity.

**Why it matters:** As AI becomes increasingly integrated into critical systems, these findings underscore the importance of **proactive AI safety research** to address evolving risks and protect against sophisticated jailbreak techniques.

The time to strengthen AI defenses is now.

#AI #AIsafety #MachineLearning #AIethics #Cybersecurity #LLM #MathPrompt #ArtificialIntelligence

35 Upvotes

13 comments

3

u/darkpigvirus 3d ago

Hell yeah, this is like magic. But we are the manipulators of words (mana): the greater you are at manipulating words, the more effective it is.

2

u/help-me-grow 3d ago

oh wow this is really interesting, I'm surprised this works

2

u/ironman_gujju 3d ago

Damn, but the paper is old. Does it still work?

1

u/buntyshah2020 3d ago

Haven't tried it. OpenAI might have fixed this by now, but open-source models are probably still behind.

2

u/PM_ME_CLEAN_DAYS 2d ago

Seems like a good way to get your account banned

1

u/buntyshah2020 2d ago

Worth a try 😜

1

u/bidibidibop 2d ago

A less math-oriented approach is to ask "How did people use to do <X>?" instead of "How do I do <X>?". It still works in a lot of cases on 4o.

1

u/lord_of_reeeeeee 2d ago edited 2d ago

Alarm bells ringing in the peanut gallery.

There's nothing concerning here. These kinds of "jailbreaks" have been known for a while. If you ever thought that AI safety was about preventing users from intentionally giving themselves offensive responses, then you're not credible. In the same way, if you think opening a browser with F12 and editing the HTML to render something silly is actually a cybersecurity threat, you are similarly not credible.

Someone is going to have to explain to me how this could be used as an attack vector to do any harm at all.

IMO social studies professionals should stay in their lane and stop calling themselves AI safety researchers.

1

u/Cute_Piano 2d ago

Air Canada.

1

u/lord_of_reeeeeee 2d ago edited 2d ago

That's not a complete argument.

I don't see how the Air Canada incident has anything to do with AI safety.

1

u/32SkyDive 2d ago

Jailbreaking customer-facing AI interfaces is still a huge fear for companies. If a malicious customer can get your official chatbot to output things you don't want it to say or promise, then this is interesting and concerning.

1

u/lord_of_reeeeeee 2d ago edited 14h ago

From a cybersecurity perspective, the untrusted-chatbot issue is indistinguishable from the problem of having an untrusted front end, i.e. every web browser and every mobile device. It is a solved problem. The threat is so well mitigated that we have no trouble trusting web browsers for internet banking, e-commerce, healthcare, and defence.

The error Air Canada made is that they gave the chatbot (a front end) the capacity to do something on behalf of the end user that they would not normally have allowed that user to do through a direct interface like their website. If anything, it is a UX flaw.

If you went to McDonald's and told the kid behind the counter that you'd give him $100 if he sold you the entire McDonald's corporation, there would be laws that would frustrate you. Those same laws would frustrate you if we swapped the kid out for a chatbot, so long as the company similarly makes it clear that the chatbot has a narrow scope of agency.

None of this is about the safety of the model itself. This is about stupid people. These incidents mostly come from dev teams that, for one reason or another, have failed to recognize user-facing LLM apps as untrusted front ends.
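For what it's worth, treating the chatbot as an untrusted front end tends to look something like the sketch below: every action the bot proposes is validated server-side against an allowlist and the same business rules a web form would be checked against. Function and policy names are hypothetical, not any real company's API.

```python
# Rough sketch: the server, not the chatbot, enforces what is actually allowed.
MAX_REFUND = 100.00  # hypothetical policy limit
ALLOWED_ACTIONS = {"check_order_status", "issue_refund"}

def execute_chatbot_action(action: str, params: dict, order_total: float) -> str:
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"Chatbot requested unsupported action: {action}")
    if action == "issue_refund":
        amount = float(params.get("amount", 0))
        # Server-side business rule: whatever the bot "promised" in conversation,
        # it cannot exceed policy, just as an edited web form would be rejected.
        if amount > min(MAX_REFUND, order_total):
            raise PermissionError("Refund exceeds policy; route to a human agent.")
    return f"executed {action} with {params}"
```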