Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you'll near-instantly regret.
Any awful.systems sub may be subsneered in this subthread, techtakes or no.
If your sneer seems higher quality than you thought, feel free to cut'n'paste it into its own post; there's no quota for posting and the bar really isn't that high.
The post-Xitter web has spawned so many "esoteric" right-wing freaks, but there's no appropriate sneer-space for them. I'm talking redscare-ish, reality-challenged "culture critics" who write about everything but understand nothing. I'm talking about reply-guys who make the same 6 tweets about the same 3 subjects. They're inescapable at this point, yet I don't see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn't be surgeons because they didn't believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up, and if I can't escape them, I would love to sneer at them.
… Is this as made-up and arbitrary as it sounds?
💯
I would give it credit for being better than the absolutely worthless approach of "scoring well on a bunch of multiple-choice question tests". And it is possibly vaguely relevant for the pipe-dream end goal of outright replacing programmers. But overall, yeah, it is really arbitrary.

Also, programming is perceived as one of the more in-demand "potential" killer apps for LLMs, and it is also one of the applications where it is relatively easy to churn out and verify synthetic training data (write really precise, detailed test cases, and then you can automatically verify attempted solutions). So even if LLMs are genuinely improving at programming, that likely doesn't indicate general improvement in capabilities.
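To make the "churn out and verify" point concrete, here's a toy sketch of what that loop could look like. Everything in it (the `solve`/`spec_tests`/`harvest` names, the list-reversal task) is made up for illustration; the only idea taken from the comment above is: precise test cases let you mechanically filter attempted solutions into verified synthetic training data.

```python
# Toy sketch of the synthetic-data loop: generate candidate solutions,
# keep only the ones that pass a precise, automatically checkable spec.
# All names and the example task are hypothetical.

def spec_tests(candidate):
    """Precise spec: candidate must reverse a list."""
    return (
        candidate([1, 2, 3]) == [3, 2, 1]
        and candidate([]) == []
        and candidate([7]) == [7]
    )

def harvest(candidates):
    """Keep only source strings whose solve() passes every test;
    these survivors would become synthetic training data."""
    verified = []
    for src in candidates:
        namespace = {}
        try:
            exec(src, namespace)              # run the attempted solution
            if spec_tests(namespace["solve"]):
                verified.append(src)
        except Exception:
            pass                              # broken attempts are discarded
    return verified

attempts = [
    "def solve(xs): return xs[::-1]",       # correct
    "def solve(xs): return sorted(xs)",     # plausible-looking but wrong
    "def solve(xs): return xs.reverse()",   # returns None, fails the spec
]
print(len(harvest(attempts)))  # prints 1
```

Note that the verifier never needs a human in the loop, which is exactly why programming is a comparatively cheap domain to generate training data for.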
From the people who brought you performance review season: a way to evaluate code quality of humans and machines
Made up, yes, but I wonder if it is arbitrary or some p-hacking equivalent.
It feels very strange to see this kind of statistic get touted, since a 50% success rate would be absolutely unacceptable for one of those software engineers, and nothing suggests that, given more time, the AI would eventually get there.
Rather, the usual fail state is to confidently present a plausible-looking product that absolutely fails to do what it was supposed to do, something that would get a human fired so quickly.
They are going with the 50% success rate because the "time horizons" for something remotely reasonable, like 99% or even just 95%, are still so tiny they can't extrapolate a trend out of them, and that tears a massive hole in their whole "AGI agents soon" scenarios.
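A toy illustration of why the chosen success rate matters so much (this is not METR's actual data or their fitting procedure, just an assumed logistic fall-off in log task length): under that assumption, moving the bar from 50% to 99% collapses the reported horizon by orders of magnitude.

```python
# Hypothetical model: success probability falls off logistically with
# log2 of task length. Given a 50% horizon h50 (here: 60 minutes, made up),
# solve for the task length at which success probability equals p.
import math

def horizon(p, h50=60.0, slope=1.0):
    """Task length (minutes) where success probability is p.
    Inverts p = 1 / (1 + 2**(slope * log2(t / h50)))."""
    return h50 * 2 ** (math.log2(1 / p - 1) / slope)

for p in (0.5, 0.8, 0.95, 0.99):
    print(f"{p:.0%} horizon: {horizon(p):8.2f} min")
# With these assumed numbers, the 50% horizon is 60 min,
# but the 99% horizon is well under a minute.
```

So under any fall-off like this, a headline-friendly 50% horizon coexists with a 95%+ horizon too short to draw a trend line through, which is the point being made above.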
But even then, they control the "time it takes for an engineer to do it" variable anyway. Just count the time they take drinking coffee / putting up Dilbert strips / removing Dilbert strips / telling their coworkers to separate the art from the artist / explaining how these ideas don't work like that, especially not for supporting racists / etc.
(E: Scott is still alive, just checked. Turns out he is now on hormone blockers, not assisted suicide, because he did eventually decide to take the normal treatment for his kind of cancer (testosterone blockers). He might actually not have gone on this bog-standard treatment initially because … he did his own research. Not going on the treatment apparently caused him extreme pain (which is a bit of a jesus-christ-wtf moment, but otoh, if there was somebody who would fuck himself over this badly because he thought he was smarter than doctors, it would be him). (If you wondered whether he was still alive after the story a few months ago that he had months to live: this might give him more months to years.))