• 0 Posts
  • 266 Comments
Joined 2 years ago
Cake day: March 22nd, 2024

  • Thank you! Let me wildly oversimplify and make sure I understand.

    The fundamental problem is that if you train on a set that includes multiple independent facts, the generative aspect of the model - the ability to generate new text that is statistically consistent with the training data - requires remixing and combining tokens in a way that will inevitably result in factual errors.

    Like, if your training data includes “all men are mortal” and “all lions are cats”, then in order to generate new text the model has to be “loose” enough to output “all men are cats”. Feedback and reinforcement can adjust the probabilities to a degree, but because the model is fundamentally about token probabilities and doesn’t have any other way of accounting for whether a statement is actually true, there’s no way to completely eliminate those errors. You can reinforce that “all cats are mortal” is a better answer, but you can’t train it that “all men are cats” is invalid.
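    A toy bigram model makes that concrete (this is a deliberate oversimplification — real LLMs are transformers over learned embeddings, not bigram counts — and the two “facts” below are just the ones from the example):

```python
from collections import defaultdict

# The two independent training "facts" from the example above.
corpus = ["all men are mortal", "all lions are cats"]

# Count how often each token follows each other token.
counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def prob(prev, nxt):
    """P(next token | previous token) under the bigram counts."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def sequence_prob(sentence):
    """Probability of a whole sentence as a chain of bigram steps."""
    tokens = sentence.split()
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= prob(prev, nxt)
    return p

# Both facts pass through the shared token "are", so the false remix
# gets exactly the same score as the true statement:
print(sequence_prob("all men are mortal"))  # 0.25
print(sequence_prob("all men are cats"))    # 0.25
```

    Nothing in the model distinguishes the two outputs — the false one is just as “statistically consistent with the training data” as the true one.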

  • The metaphor I’ve used before is hammering a nail in with a shoe. It can work. If you have a lot of nail-hammering experience - especially shoe-hammering experience - you can find ways to improve how effective it is. But by the time you’re able to use a shoe as anything resembling a hammer, you should be able to do the work better with the right tool, even if it is less convenient (needing to write the code yourself being analogous to needing to carry a big hammer with you), and more importantly to recognize why the shoe isn’t an acceptable tool. Especially because in this analogy the only shoes are made of the finest orphan leather.


  • The problem is less that the system would somehow ignore that part of the prompt and more that “hallucinate” or “make stuff up” aren’t special subroutines that get called on demand when prompted by an idiot; they’re descriptive of what an LLM does all the time. It’s following statistical patterns in a matrix created by the training data and reinforcement processes. Theoretically, if the people responsible for that training and reinforcement did their jobs well, those patterns would only include true statements, but if it were that easy then you wouldn’t have [insert the entire intellectual history of the human species].

    Even if you assume that the AI boosters are completely right and that the LLM inference process is directly analogous to how people think, does saying “don’t fuck up” actually make people less likely to fuck up? Like, the kind of errors you’re looking at here aren’t generated by some separate process. Someone who misremembers a fact doesn’t know they’ve misremembered until they get called out on the error, either by someone else with a better memory or by reality imposing the consequences of being wrong. Similarly, the LLM isn’t doing anything special when it spits out bullshit.
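    A minimal next-token sampling sketch shows why there’s no separate “hallucination” code path to switch off (the vocabulary and scores here are made up for illustration; in a real model the scores come from the network’s weights). The same loop emits plausible and implausible continuations alike:

```python
import math
import random

# Hypothetical scores for tokens that could follow "all men are".
vocab = ["mortal", "cats", "hungry"]
logits = [2.0, 1.0, 0.1]

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sample(vocab, logits, rng):
    """Draw one token from the distribution - the only generation step there is."""
    probs = softmax(logits)
    r = rng.random()
    acc = 0.0
    for tok, p in zip(vocab, probs):
        acc += p
        if r < acc:
            return tok
    return vocab[-1]

rng = random.Random(0)
draws = [sample(vocab, logits, rng) for _ in range(1000)]

# The less likely continuation "cats" still shows up a substantial
# fraction of the time - produced by exactly the same code path
# that produces "mortal".
print(draws.count("mortal"), draws.count("cats"), draws.count("hungry"))
```

    Prompting “don’t hallucinate” doesn’t add a truth check anywhere in that loop; at best it nudges the scores.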

  • “We did catch it internally in testing (as we use VS Code for all our work, so some folks did stumble on it), but I think we underestimated the impact and should do a better job at that.”

    Either this is an outright lie or it’s a sign of just how fucked this industry has gotten. There should be no way that anyone looked at this and decided it wasn’t a big enough deal to block given that this is basically the single issue driving most of the industry’s cultural discourse and a good chunk of the broader world’s as well. If that’s what happened then the people making those decisions are so thoroughly insulated from literally any feedback that the industry - to say nothing of the world at large - would be better served if they were replaced by a literal magic 8 ball.