Stubsack: weekly thread for sneers not worth an entire post, week ending 22nd June 2025

BlueMonday1984@awful.systems · 7 days ago


BlueMonday1984@awful.systems · 2 days ago

New article from Axios: Publishers facing existential threat from AI, Cloudflare CEO says

Baldur Bjarnason has given his commentary:

Honestly, if search engine traffic is over, it might be time for blogs and blog software to begin to deny all robots by default

Anyways, personal sidenote/prediction: I suspect the Internet Archive’s gonna have a much harder time archiving blogs/websites going forward.

Up until this point, the Archive enjoyed easy access to large swathes of the 'Net - site owners had no real incentive to block new crawlers by default, but the prospect of getting onto search results gave them a strong incentive to actively welcome search engine robots, safe in the knowledge that they’d respect robots.txt and keep their server load to a minimum.

Thanks to the AI bubble and the AI crawlers it’s unleashed upon the 'Net, that has changed significantly.

Now, allowing crawlers by default risks AI scraper bots descending upon your website and stealing everything that isn’t nailed down, overloading your servers and attacking FOSS work in the process. And you can forget about reining them in with robots.txt - they’ll just ignore it and steal anyways, they’ll lie about who they are, they’ll spam new scrapers when you block the old ones, they’ll threaten to exclude you from search results, they’ll try every dirty trick they can because these fucks feel entitled to steal your work and fundamentally do not respect you as a person.

Add in the fact that the main upside of allowing crawlers (turning up in search results) has been completely undermined by those very same AI corps, as “AI summaries” (like Google’s) steal your traffic through stealing your work, and blocking all robots by default becomes the rational decision to make.

This all kinda goes without saying, but this change in Internet culture all but guarantees the Archive gets caught in the crossfire, crippling its efforts to preserve the web as site owners and bloggers alike treat any and all scrapers as guilty (of AI fuckery) until proven innocent, and the web becomes less open as a whole as people protect themselves from the AI robber barons.
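For illustration, here's a rough sketch of what "deny all robots by default, allowlist the ones you trust" could look like, checked with Python's standard-library `urllib.robotparser`. The robots.txt contents, the `allowed` helper and the `archive.org_bot` user-agent token are assumptions made up for the example (check the crawler's own documentation for the token it actually sends), and as noted above this only affects crawlers that bother to honour robots.txt in the first place.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one explicitly allowlisted crawler, everyone else denied.
# "archive.org_bot" is an assumed token, not a confirmed one.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/post/1") -> bool:
    """Return True if a well-behaved crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("archive.org_bot", "SomeNewCrawler"):
        print(agent, "->", "allowed" if allowed(agent) else "blocked")
```

The catch is the one already described: an allowlist like this only helps crawlers the site owner already knows and trusts, so any new archiver or search engine starts out blocked.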

On a wider front, I expect this will cripple any future attempts at making new search engines, too. In addition to AI making it piss-easy to spam search systems with SEO slop, any new start-ups in web search will struggle with quality websites blocking their crawlers by default, whilst slop and garbage will actively welcome their crawlers, leading to your search results inevitably being dogshit and nobody wanting to use your search engine.

smiletolerantly@awful.systems · 22 hours ago

I don’t like that it’s not open source, and there are opt-in AI features, but I can highly, highly recommend Kagi from a pure search result standpoint, and it’s one of the only alternatives with its own search index.

(Give it a try, they’ve apparently just opened up their search for users without an account to try it out.)

Almost all the slop websites aren’t even shown (or are put in a “Listicles” section where they can be accessed, but are not intrusive and do not look like proper results), and you can prioritize/deprioritize sites (for example, I have github/reddit/stackoverflow set to always show on top, and quora and pinterest to never show at all).

Oh, and they have a fediverse “lens” which actually manages to reliably search Lemmy.

This doesn’t really address the future of crawling, just the “Google has gone to shit” part 😄

HedyL@awful.systems · 1 day ago

FWIW, due to recent developments, I’ve found myself increasingly turning to non-search-engine sources for reliable web links, such as Wikipedia source lists, blog posts, podcast notes or even Reddit. This almost feels like a return to the early days of the internet, just in reverse and - sadly - with little hope for improvement in the future.

fnix@awful.systems · 19 hours ago

Searching Reddit has really become standard practice for me, a testament to how inhuman the web as a whole has gotten. What a shame.