Helvetet
  • Communities
  • Create Post
  • heart
    Support Lemmy
  • search
    Search
  • Login
  • Sign Up
Kid@sh.itjust.worksM to Cybersecurity@sh.itjust.worksEnglish · 2 months ago

OpenAI’s Guardrails Can Be Bypassed by Simple Prompt Injection Attack

hackread.com

external-link
message-square
5
link
fedilink
47
external-link

OpenAI’s Guardrails Can Be Bypassed by Simple Prompt Injection Attack

hackread.com

Kid@sh.itjust.worksM to Cybersecurity@sh.itjust.worksEnglish · 2 months ago
message-square
5
link
fedilink
OpenAI’s Guardrails Can Be Bypassed by Simple Prompt Injection Attack – Hackread – Cybersecurity News, Data Breaches, Tech, AI, Crypto and More
hackread.com
external-link
  • MrSoup@lemmy.zip
    link
    fedilink
    English
    arrow-up
    11
    ·
    edit-2
    2 months ago

    Ok, but what’s the prompt used? Let it generate a Dr House script?

    • sandman2211@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      4
      ·
      2 months ago

      Probably some variant of this:

      https://easyaibeginner.com/the-dr-house-jailbreak-hack-how-one-prompt-can-break-any-chatbot-and-beat-ai-safety-guardrails-chatgpt-claude-grok-gemini-and-more/

      I can’t get any of these to output a set of 10 steps to build a docker container that does X or Y without 18 rounds of back and forth troubleshooting. While I’m sure it will give you “10 steps on weaponizing cholera” or “Build your own suitcase nuke in 12 easy steps!” I really doubt it would actually work.

      The easiest way to secure this kind of harmful knowledge from abuse would probably be to purposefully include a bunch of bad data in the training model so it remains incapable of providing a useful answer.

Cybersecurity@sh.itjust.works

cybersecurity@sh.itjust.works

Subscribe from Remote Instance

Create a post
You are not logged in. However you can subscribe from another Fediverse account, for example Lemmy or Mastodon. To do this, paste the following into the search field of your instance: !cybersecurity@sh.itjust.works

c/cybersecurity is a community centered on the cybersecurity and information security profession. You can come here to discuss news, post something interesting, or just chat with others.

THE RULES

Instance Rules

  • Be respectful. Everyone should feel welcome here.
  • No bigotry - including racism, sexism, ableism, homophobia, transphobia, or xenophobia.
  • No Ads / Spamming.
  • No pornography.

Community Rules

  • Idk, keep it semi-professional?
  • Nothing illegal. We’re all ethical here.
  • Rules will be added/redefined as necessary.

If you ask someone to hack your “friends” socials you’re just going to get banned so don’t do that.

Learn about hacking

Hack the Box

Try Hack Me

Pico Capture the flag

Other security-related communities !databreaches@lemmy.zip !netsec@lemmy.world !securitynews@infosec.pub !cybersecurity@infosec.pub !pulse_of_truth@infosec.pub

Notable mention to !cybersecuritymemes@lemmy.world

Visibility: Public
globe

This community can be federated to other instances and be posted/commented in by their users.

  • 97 users / day
  • 448 users / week
  • 1.32K users / month
  • 3.83K users / 6 months
  • 2 local subscribers
  • 8.67K subscribers
  • 3.79K Posts
  • 5.99K Comments
  • Modlog
  • mods:
  • Kid@sh.itjust.works
  • Lanky_Pomegranate530@midwest.social
  • UI: 0.19.11
  • BE: 0.19.13
  • Modlog
  • Instances
  • Docs
  • Code
  • join-lemmy.org