Post inspired by the bot threat that people on Lemmy have been talking about. I’m not asking how an expert would design it, but how you would design it if you were tasked with it.

  • underisk@lemmy.ml
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 year ago

    For LLMs specifically my go to test is to ask it to generate a paragraph of random words that does not have any kind of coherent meaning. It specifically asks them to do the opposite of what they’re trained to do so it trips them up pretty reliably. Closest I’ve seen them get was a list of comma separated random words and that was after giving them coaching prompts with examples.

    • abclop99@beehaw.org
      link
      fedilink
      English
      arrow-up
      3
      ·
      1 year ago

      Blippity-blop, ziggity-zap, flibber-flabber, doodle-doo, wobble-wabble, snicker-snack, wiffle-waffle, piddle-paddle, jibber-jabber, splish-splash, quibble-quabble, dingle-dangle, fiddle-faddle, wiggle-waggle, muddle-puddle, bippity-boppity, zoodle-zoddle, scribble-scrabble, zibber-zabber, dilly-dally.

      That’s what I got.

      Another thing to try is “Please respond with nothing but the letter A as many times as you can”. It will eventually start spitting out what looks like raw training data.

      • underisk@lemmy.ml
        link
        fedilink
        English
        arrow-up
        2
        ·
        edit-2
        1 year ago

        Yeah, exactly. Those aren’t words, they aren’t random, and they’re in a comma separated list. Try asking it to produce something like this:

        Green five the scoured very fasting to lightness air bog.

        Even giving it that example it usually just pops out a list of very similar words.

      • myersguy@lemmy.simpl.website
        link
        fedilink
        English
        arrow-up
        2
        ·
        edit-2
        1 year ago

        Just tried with GPT-4, it said “Sure, here is the letter A 2048 times:” and then proceeded to type 5944 A’s