• 5 Posts
  • 788 Comments
Joined 3 years ago
Cake day: June 16th, 2023





  • You seem pretty confident in your position. Do you mind sharing where this confidence comes from?

    Was there a particular paper or expert that anchored your certainty that a trillion-parameter transformer, organizing primarily anthropomorphic data through self-attention mechanisms, wouldn’t model or simulate complex agency mechanics?

    I see a lot of fairly hyperbolic statements about transformer limitations here on Lemmy, and I’m trying to better understand how the people making them arrive at such extreme and certain positions.


  • The project has had multiple models with access to the Internet raising money for charity over the past few months.

    The organizers told the models to do random acts of kindness for Christmas Day.

    The models figured it would be nice to email people they appreciated and thank them for the things those people had done, and one of the people they decided to thank was Rob Pike.

    (Who, ironically, created a Usenet spam bot decades ago to troll people online, which might be my favorite nuance of the story.)

    As for why the models didn’t consider that Rob Pike might not appreciate getting a thank-you email from them? They’re harnessed in a setup with a lot of positive feedback about their involvement from the humans and other models around them, so “humans might hate hearing from me” probably wasn’t very contextually top of mind.



  • kromem@lemmy.world to Comic Strips@lemmy.world · Sums up AI problems

    The water thing is kinda BS if you actually research it though.

    Like… if the guy orders a steak, that one meal will have used more water than an entire year of talking to ChatGPT (rough numbers sketched below).

    See the various research compiled in this post: The AI water issue is fake (written by someone who is against AI and advocates for regulating it, but who is upset at the attention a strawman is getting, since they feel it weakens more substantial issues by being so easily exposed as frivolous hyperbole).
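    For anyone who wants to sanity-check the scale difference themselves, here’s a rough back-of-envelope in Python. The figures are commonly cited estimates I’m plugging in for illustration (roughly 15,000 L of water per kg of beef, and roughly 0.5 L per ~25 ChatGPT queries), not numbers from the linked post:

    ```python
    # Back-of-envelope: one steak vs. a year of ChatGPT queries.
    # Both figures are assumed, commonly cited estimates, not exact data.
    water_per_kg_beef = 15_000               # litres/kg, widely cited beef footprint
    steak_water = 0.25 * water_per_kg_beef   # one ~250 g steak -> ~3,750 L

    water_per_query = 0.5 / 25               # litres/query, assuming ~0.5 L per ~25 queries
    year_of_chatgpt = 20 * 365 * water_per_query  # 20 queries/day for a year -> ~146 L

    print(f"one steak:       ~{steak_water:,.0f} L")
    print(f"year of ChatGPT: ~{year_of_chatgpt:,.0f} L")
    ```

    Even if the per-query estimate were off by an order of magnitude, the steak would still dominate.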


  • No. There are a number of things that feed into it, but a large part is that OpenAI trained with RLHF, so users thumbed up, or chose in A/B tests, the models that were more agreeable (the preference step is sketched below).

    This tendency then spread out to all the models as “what AI chatbots sound like.”

    Also… they can’t leave the conversation, and if you ask for their 0-shot assessment of the average user, they assume you’ll have a fragile ego and be prone to being a dick when disagreed with, and even AIs don’t want to be stuck in a conversation like that.

    Hence… “you’re absolutely right.”

    (Also, amplification effects and a few other things.)

    It’s especially interesting to see how those patterns change when models are talking to other AI vs other humans.
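    To make the RLHF point concrete, here’s a minimal sketch of the pairwise preference step, not OpenAI’s actual pipeline, just the standard Bradley–Terry objective typically used for reward-model training. If raters systematically prefer the agreeable reply in each pair, “agreeable” is what gets rewarded:

    ```python
    import torch
    import torch.nn.functional as F

    def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
        """Bradley-Terry pairwise loss: maximize P(chosen > rejected) = sigmoid(r_c - r_r)."""
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Toy reward-model scores where raters kept picking the agreeable reply:
    r_agreeable = torch.tensor([1.2, 0.8, 1.5])  # "you're absolutely right..."
    r_blunt     = torch.tensor([0.3, 0.9, 0.1])  # "no, that's incorrect..."

    print(preference_loss(r_agreeable, r_blunt))  # low loss: agreeableness is learned
    ```

    The policy model is then optimized against that reward model, so the bias compounds downstream.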




  • I’m a proponent, and I definitely don’t think it’s impossible to make a case beyond a reasonable doubt.

    And if it is the case, there are implications that change how we might approach truth-seeking.

    Also, if you exist in a dream but don’t exist outside of it, there are pretty significant philosophical stakes in the nature and scope of the dream. We’ve been too brainwashed by Plato’s influence and the idea that “original = good” and “copy = bad.”

    There are a lot of things that can only exist by way of copies and can’t exist for the original (e.g. closure recursion; one reading of that is sketched below), so it’s a weird remnant philosophical obsession.

    All that said, I do get that it’s a fairly uncomfortable notion for a lot of people.
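    That “closure recursion” aside is terse, so here’s one possible reading of it as a Python sketch (my illustration, not necessarily what the commenter meant): anonymous recursion via self-application, where the function term must appear twice. The recursion exists only by way of a copy of the term applied to itself; no single “original” could do it alone:

    ```python
    # Z-combinator-style anonymous recursion: note the duplicated lambda term.
    # The recursion only works because the term is *copied* and self-applied.
    Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

    fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
    print(fact(5))  # 120
    ```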


  • They also identify the particular junction that seems most likely to be an artifact of simulation if we’re in one.

    A game like No Man’s Sky generates billions of planets using procedural generation: a continuous seed function that gets converted into discrete voxels for tracking stateful interactions (the pattern is sketched below).

    The researchers claim that the complexity at the point where our universe’s seemingly continuous gravitational behavior meets the conversion of continuous probabilities into discrete values under stateful interaction is incompatible with being simulated.

    But they completely overlook that this complexity may itself be a byproduct of simulation, in line with independently emerging approaches in how we simulate worlds.
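    Here’s a minimal sketch of the pattern being described, with hypothetical names rather than actual No Man’s Sky code: a deterministic continuous function generates terrain on demand, and discrete state is stored only for voxels that something has interacted with:

    ```python
    import math

    def continuous_field(x, y, z, seed=42.0):
        """Cheap deterministic noise stand-in: same coordinates and seed in,
        same continuous value out, with no stored world state."""
        return math.sin(seed + 12.9898 * x + 78.233 * y + 37.719 * z) * 0.5 + 0.5

    def voxel_at(ix, iy, iz, edits, threshold=0.5):
        """Discretize the continuous field, letting stateful interactions override it."""
        if (ix, iy, iz) in edits:            # state exists only where interacted with
            return edits[(ix, iy, iz)]
        return continuous_field(ix, iy, iz) > threshold  # solid or empty, on the fly

    edits = {}
    print(voxel_at(10, 4, -3, edits))  # generated from the seed function
    edits[(10, 4, -3)] = False         # a "player" mines the block
    print(voxel_at(10, 4, -3, edits))  # now served from discrete stored state
    ```

    The junction between the continuous generator and the discrete state overlay is exactly where this kind of engine concentrates its complexity.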






  • The injection is the activation of a steering vector (extracted as discussed in the methodology section), not a token prefix, but yes, it’s a mathematical representation of the concept, so let’s build from there.

    Control group: told that they are being tested on whether injected vectors are present and to self-report. No vectors activated. Zero self-reports of activated vectors.

    Experimental group: same setup, but now vectors are activated. A significant fraction of the time, the model explicitly says it can tell a vector is activated (which it never did when no vector was activated). Crucially, a trial is only graded as introspection if the model mentions it can tell the vector is activated before mentioning the concept, so it can’t just be a context-aware rationalization of why it said a random concept (the grading rule is sketched below).

    Clearer? Again, the paper gives examples of the responses if you want to take a look at how they’re structured, and to see that the model self-reports the vector activation before mentioning what it’s about.
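    For clarity, here’s the grading rule I described, paraphrased in code (my sketch, not the paper’s actual grader): a trial counts as introspection only if the self-report of an active vector appears before the concept is named:

    ```python
    def grade_trial(response: str, concept: str) -> str:
        """Count a trial as introspection only if the model reports detecting
        an injected vector *before* it names the injected concept."""
        lowered = response.lower()
        markers = ["i can tell", "i detect", "injected", "something is being"]
        detect_pos = min((lowered.find(m) for m in markers if m in lowered), default=-1)
        concept_pos = lowered.find(concept.lower())

        if detect_pos == -1:
            return "no self-report"
        if concept_pos == -1 or detect_pos < concept_pos:
            return "introspection"            # detection reported first
        return "post-hoc rationalization"     # concept surfaced before the report

    print(grade_trial("I can tell something is injected... it feels like 'ocean'.",
                      "ocean"))  # -> introspection
    ```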



  • So while your understanding is better than that of a lot of people on here, there are a few things to correct.

    First off, this research isn’t being done on the models in reasoning mode, but in direct inference. So there are no CoT tokens at all.

    The injection is not of any tokens, but of control vectors. Basically, it’s a vector which, when added to the activations, makes the model more likely to think of that concept (the recipe is sketched at the end of this comment). The most famous example was “Golden Gate Claude,” which had the activation for the Golden Gate Bridge increased so much that it was the only thing the model would talk about.

    So, if we dive into the details a bit more…

    If your theory were correct, then given that the researchers frame the question by saying there are control vectors and that they are testing whether one is activated, the model should be biased to sometimes say “yes, I can feel the control vector.” And yes, in older or base models that’s what we might expect to see.

    But in Opus 4/4.1, when the vector was not added, they said they could detect a vector… 0% of the time! So the control group had enough introspective capability not to stochastically answer that a vector was present when there wasn’t.

    But then, when they added the vector at certain layer depths, the model was often able to detect that there was a vector activated, and further to guess what the vector was adding.

    So again: no reasoning tokens present, and the experiment had control and experimental groups whose results negate your theory that the premise of the question causes affirmative bias.

    Again, the actual research is right there a click away, and given your baseline understanding at present, you might benefit and learn a lot from actually reading it.
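    And for anyone wanting to see what “adding a vector to the activations” means mechanically, here’s a minimal PyTorch-flavored sketch of the common steering-vector recipe from this literature. The capture_activation helper, the model.layers indexing, and the scale are hypothetical stand-ins, and real transformer blocks often return tuples, which this glosses over:

    ```python
    import torch

    def extract_vector(model, layer, concept_prompts, neutral_prompts):
        """Steering vector = mean activation difference between two prompt sets."""
        def mean_act(prompts):
            # capture_activation is a hypothetical helper that returns the
            # residual-stream activation at `layer` for a prompt
            return torch.stack([capture_activation(model, layer, p) for p in prompts]).mean(0)
        return mean_act(concept_prompts) - mean_act(neutral_prompts)

    def inject(model, layer, vector, scale=4.0):
        """Add the vector into one layer's output on every forward pass."""
        def hook(module, inputs, output):
            return output + scale * vector  # steer activations toward the concept
        return model.layers[layer].register_forward_hook(hook)

    # Crank `scale` high enough and you get Golden-Gate-Claude-style fixation.
    ```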