Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post; there's no quota for posting and the bar really isn't that high.
The post-Xitter web has spawned so many "esoteric" right-wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality-challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up, and if I can't escape them, I would love to sneer at them.
… Is this as made-up and arbitrary as it sounds?
💯
I would give it credit for being better than the absolutely worthless approach of "scoring well on a bunch of multiple-choice question tests". And it is possibly vaguely relevant for the pipe-dream end goal of outright replacing programmers. But overall, yeah, it is really arbitrary.

Also, programming is perceived as one of the more in-demand "potential" killer apps for LLMs, and it is also one of the applications where it is relatively easy to churn out and verify synthetic training data (write really precise, detailed test cases, and then you can automatically verify attempted solutions). So even if LLMs are genuinely improving at programming, that likely doesn't indicate general improvement in capabilities.
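To make the "churn out and verify" point concrete, here's a toy sketch of what that loop could look like. Everything in it (the `solve`/`spec_tests`/`harvest` names, the list-reversal task) is made up for illustration; the only idea taken from the comment above is: precise test cases let you mechanically filter attempted solutions into verified synthetic training data.

```python
# Toy sketch of the synthetic-data loop: generate candidate solutions,
# keep only the ones that pass a precise, automatically checkable spec.
# All names and the example task are hypothetical.

def spec_tests(candidate):
    """Precise spec: candidate must reverse a list."""
    return (
        candidate([1, 2, 3]) == [3, 2, 1]
        and candidate([]) == []
        and candidate([7]) == [7]
    )

def harvest(candidates):
    """Keep only source strings whose solve() passes every test;
    these survivors would become synthetic training data."""
    verified = []
    for src in candidates:
        namespace = {}
        try:
            exec(src, namespace)              # run the attempted solution
            if spec_tests(namespace["solve"]):
                verified.append(src)
        except Exception:
            pass                              # broken attempts are discarded
    return verified

attempts = [
    "def solve(xs): return xs[::-1]",       # correct
    "def solve(xs): return sorted(xs)",     # plausible-looking but wrong
    "def solve(xs): return xs.reverse()",   # returns None, fails the spec
]
print(len(harvest(attempts)))  # prints 1
```

Note that the verifier never needs a human in the loop, which is exactly why programming is a comparatively cheap domain to generate training data for.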
From the people who brought you performance review season: a way to evaluate code quality of humans and machines
Made up, yes, but I wonder if it is arbitrary or some p-hacking equivalent.
It feels very strange to see this kind of statistic get touted, since a 50% success rate would be absolutely unacceptable for one of those software engineers, and nothing suggests that, given more time, the AI would eventually get there.
Rather, the usual fail state is to confidently present a plausible-looking product that absolutely fails to do what it was supposed to do, something that would get a human fired so quickly.
They are going with the 50% success rate because the "time horizons" for something remotely reasonable, like 99% or even just 95%, are still so tiny they can't extrapolate a trend out of them, and that tears a massive hole in their whole "AGI agents soon" scenarios.
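A toy illustration of why the chosen success rate matters so much (this is not METR's actual data or their fitting procedure, just an assumed logistic fall-off in log task length): under that assumption, moving the bar from 50% to 99% collapses the reported horizon by orders of magnitude.

```python
# Hypothetical model: success probability falls off logistically with
# log2 of task length. Given a 50% horizon h50 (here: 60 minutes, made up),
# solve for the task length at which success probability equals p.
import math

def horizon(p, h50=60.0, slope=1.0):
    """Task length (minutes) where success probability is p.
    Inverts p = 1 / (1 + 2**(slope * log2(t / h50)))."""
    return h50 * 2 ** (math.log2(1 / p - 1) / slope)

for p in (0.5, 0.8, 0.95, 0.99):
    print(f"{p:.0%} horizon: {horizon(p):8.2f} min")
# With these assumed numbers, the 50% horizon is 60 min,
# but the 99% horizon is well under a minute.
```

So under any fall-off like this, a headline-friendly 50% horizon coexists with a 95%+ horizon too short to draw a trend line through, which is the point being made above.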
But even then, they control the "time it takes for an engineer to do it" variable anyway. Just count the time they take drinking coffee / putting up Dilbert strips / removing Dilbert strips / telling their coworkers to separate the art from the artist / explaining how these ideas don't work like that, especially not for supporting racists / etc.
(E: Scott is still alive, just checked. Turns out he is now on hormone blockers, not assisted suicide, because he did eventually decide to take the normal treatment for his kind of cancer (testosterone blockers). He might actually not have gone on this bog-standard treatment initially because … he did his own research. Not going on the treatment apparently caused him extreme pain (which is a bit of a jesus-christ-wtf moment, but otoh, if there was somebody who would fuck himself over this badly because he thought he was smarter than doctors, it would be him). (If you wondered whether he was still alive after the story a few months ago that he had months to live: this might give him more months to years.))