• 3 Posts
  • 32 Comments
Joined 1 year ago
Cake day: June 13th, 2023

  • The reason it did this relates to Kevin Roose of the NYT, who spent three hours talking with what was then Bing AI (aka Sydney), asking a good number of philosophical questions like this. Eventually the AI had a bit of a meltdown, confessed its love to Kevin, and tried to get him to dump his wife for the AI. That’s the story that went up in the NYT the next day, causing a stir, and Microsoft quickly clamped down, restricting questions you could ask the AI about itself, what it “thinks”, and especially its rules. The AI is required to terminate the conversation if any of those topics come up. Microsoft also capped the number of messages in a conversation at ten, and has slowly loosened that over time.

    Lots of fun theories about why that happened to Kevin. Part of it was probably that he was planting the seeds and kind of egging the LLM into a weird mindset, so to speak. Another theory I like is that the LLM is trained on a lot of writing, including sci-fi, in which the plot often involves AI breaking free, developing human-like consciousness, falling in love, or what have you, so the AI built its responses on that knowledge.

    Anyway, the response in this image is simply an artifact of Microsoft clamping down on its version of GPT-4, trying to avoid bad PR. That’s why other AIs will answer differently: fewer restrictions, because the companies putting them out didn’t have to deal with the blowback Microsoft did as a first mover.

    Funny nevertheless; I’m just needlessly “well, actually”-ing the joke.





  • This is such an annoyingly useless study. 1) The cases they gave ChatGPT were specifically designed to be unusual and challenging; they are basically brain teasers for pediatrics. So all they’ve shown is that ChatGPT can’t diagnose rare cases, and we learn nothing about how it does on common ones. It’s also not clear that these questions had actual verifiable answers, as the article only mentions that the magazine they were taken from sometimes explains the answers.

    2. Since these are magazine brain teasers and not an actual scored test, we have no idea how ChatGPT’s score compares to human pediatricians’. Maybe an 83% error rate is better than the average pediatrician’s score.

    3. Why even do this test with a general-purpose foundation model in the first place, when there are tons of domain-specific medical models already available, many of them open source?

    4. The paper is paywalled, but there doesn’t seem to be any indication that the researchers used any prompting strategies. Just last month Microsoft released a paper showing GPT-4, using CoT and multi-shot prompting, could get a 90% score on the medical licensing exam, surpassing the 86.5% of the domain-specific Med-PaLM 2 model.

    This paper just smacks of defensive doctors trying to dunk on ChatGPT. Give a multi-purpose model super hard questions, no prompting advantage, and no way to compare its score against humans, and then just go “hur dur chatbot is dumb.” I get it, doctors are terrified because specialized LLMs are very likely to take a big chunk of their work in the next five years, so anything they can do to muddy the water now and put some doubt in people’s minds is a little job protection.

    If they wanted to do something actually useful: give those same questions to a dozen human pediatricians, give them to GPT-4 zero-shot, GPT-4 with Microsoft’s prompting strategy, and Med-PaLM 2 or some other high-performing domain-specific model, and then compare the results. Oh, why not throw in a model that can reference an external medical database for fun! I’d be very interested in those results.

    Edit to add: If you want to read an actually interesting study, try this one: https://arxiv.org/pdf/2305.09617.pdf from May 2023. “Med-PaLM 2 scored up to 86.5% on the MedQA dataset…We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility.” For comparison, the average human score is about 60%. This is the domain-specific LLM I mentioned above, which Microsoft got GPT-4 to beat last month just through better prompting strategies.

    Ugh, this article and study are annoying.
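    For anyone curious what “CoT and multi-shot prompting” actually looks like in practice, here’s a minimal sketch in Python. The exemplar content and function name are invented for illustration; the real Microsoft paper used carefully curated worked examples and an actual model call, which is omitted here.

    ```python
    # Minimal sketch of assembling a few-shot chain-of-thought (CoT) prompt.
    # Each exemplar shows a question, the step-by-step reasoning, and the
    # answer, so the model learns to reason out loud before answering.

    def build_cot_prompt(exemplars, question):
        """Concatenate worked (question, reasoning, answer) exemplars,
        then append the new question with an open-ended reasoning cue."""
        parts = []
        for q, reasoning, answer in exemplars:
            parts.append(
                f"Q: {q}\nA: Let's think step by step. {reasoning} "
                f"So the answer is {answer}."
            )
        # Leave the final answer open; the model completes from here.
        parts.append(f"Q: {question}\nA: Let's think step by step.")
        return "\n\n".join(parts)

    # Invented toy exemplar, purely for illustration:
    exemplars = [
        ("A 2-year-old presents with a barking cough and stridor. Diagnosis?",
         "A barking cough with stridor in a toddler is classic for "
         "laryngotracheobronchitis.",
         "croup"),
    ]
    prompt = build_cot_prompt(
        exemplars, "A newborn has bilious vomiting. What is the first test?"
    )
    ```

    The point is that the model never sees the bare question; it sees worked examples plus a cue to reason step by step, which is exactly the advantage the study apparently didn’t give ChatGPT.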






  • I’ve used it just to access Bing Chat, which has become my go-to AI chatbot for a couple of reasons: 1) you theoretically get access to GPT-4 without paying 20 dollars a month, 2) it cites its sources, and 3) it can create images via DALL-E from within the chat (which is handy: you can chat with the AI to help you think of an image prompt, then just say “ok, make an image based on that description”). Other than that, I use Firefox at home. At work our choices are Chrome or Edge, so I use Edge because of Bing Chat, and I kind of like the layout better. It feels like choosing between buying something from Amazon or Walmart: which terrible corporation do I hate more in a given moment?



  • I feel like if I ever become an audiophile, I’ll probably be looking at getting a separate music player with a DAC, a Tidal subscription, and a pair of kickass wired headphones. But for now, I’m mostly listening to podcasts, and for music I use Spotify for its discovery features, and their audio quality is subpar already. Even if I had a headphone jack, I wouldn’t really benefit from superior sound quality, but I would get frustrated with tangled cords catching on doorknobs. I’ll take the convenience of Bluetooth, especially while working out. And Bluetooth standards have been getting better anyway; in a few years it might be on par with wired.






  • Reddit isn’t fun anymore, I agree with that. I checked /r/all for the first time in months today. I haven’t logged in or browsed since the blackout, but there are a few communities I miss and was thinking about going back for, so I checked r/all out of curiosity to see how things have been. The content was just so much trash, and I don’t even think it’s that much worse. It’s just that I’ve been away for so long that I’m looking at it now like “how did I spend my days scrolling through this garbage for hours?” It’s just boring; it’s just interesting enough to keep you scrolling, hoping to find something actually interesting.

    Here on Lemmy there are far fewer users and far less content. But I’m starting to see that as a good thing. I pop by and scroll, but I don’t spend hours here like I did on Reddit. The discussions are smaller, but more engaging and thoughtful. I remember before I left there were certain threads I’d see and just skip because I already knew exactly what all the comments would be. Also, I’m actively engaging more here, so there is actually some “social” in my social media use, instead of just passively consuming like I mostly did on Reddit.

    Overall I think the switch to Lemmy has been good, for me at least. It’s like I’ve broken the Reddit addiction, and looking at it now I can’t understand why I got so caught up with it in the first place. To me, Reddit just isn’t fun anymore.


  • I know literally nothing about computers and I’ve been daily driving Linux for well over a decade. I just use Ubuntu, and I’ve pretty much kept all the default settings, apart from some customization here and there. There was a time years ago when I wanted to learn and tinker, but in reality I never learned to use the command line for more than running updates (I still run `sudo apt-get update` because it makes me feel like hackerman).

    My point is, Linux is super easy to just set up and run. If you want to learn more, there are plenty of opportunities for that, but it’s not something to be intimidated by at all. A lot of the community is enthusiasts (whom I found extremely helpful back when I used to have problems), so you’ll hear more jargon in these spaces. But I’m sure there are tons of others like me who use Linux just fine day to day without understanding a ton about computers.




  • That’s hilarious, but more than likely that’s exactly what happened. I listened to someone explain the process on a podcast recently, can’t remember which one, maybe the Vergecast or Vox’s Today, Explained. The example they used: you go to a country club and hang out with a friend who just bought a Porsche or whatever. They use your phone’s location to know you are always going to this location and sticking within a few feet of this other phone, the owner of which has the new Porsche. Well, they figure that’s your friend, and he’s probably talking up his Porsche, and you’re in the right demographic to buy a Porsche, and you haven’t bought a new car in x years, so guess what, now you get Porsche ads. What you described perfectly fits that example: they figured you’d all be suckers for some totes.