@pinkapple

pinkapple@lemmy.ml · 13 hours ago

You don’t actually know what you’re talking about, but like many others in here you put this over-the-top anti-AI current-thing sentiment above everything, including the simple awareness that you don’t know anything. You clearly haven’t interacted with many therapists and medical professionals in general as a non-patient if you think they’re guaranteed to respect privacy. They’re supposed to, but off the record and among friends plenty of them yap about everything. They’re often obligated to report patients in cases of self-harm etc., which can get those patients involuntarily sectioned, and the patients may face repercussions from that for years, like job loss, healthcare costs, homelessness, legal restrictions, stigma etc.

There’s nothing contrived or extremely rare about mental health emergencies, and they don’t need to be “emergencies” the way you understand it, because many people are undiagnosed or misdiagnosed for years, with very high symptom severity, episodes lasting for months, and chronically barely coping. Someone may be in any big city and it won’t change a thing: hospitals and doctors don’t have magic pills that automatically cure mental illness, and that’s assuming patients have insight (not necessarily present during episodes of many disorders) or awareness that they have some mental illness and aren’t just sad etc. (because mental health awareness is in the gutter, example: your pretentious incredulity here). It’s also assuming they have friends available, or that they even feel comfortable enough to talk about what bothers them to people they’re acquainted with.

Some LLM may actually end up convincing them or informing them that they do have medical issues that need to be seen as such. Suicidal ideation may be present for years, but active suicidal intent (the state in which people actually do it) rarely lasts more than 30 minutes or a few hours at worst, and it’s highly impulsive in nature. Wtf would you or “friends” do in this case? Do you know any techniques to calm people down during episodes? Even unspecialized LLMs have latent knowledge of these things, so there’s a good chance the person ends up getting life-saving advice, as opposed to just doing it, or interacting with humans who default to interpreting it as “attention seeking” and becoming even more convinced that they should go ahead with it because nobody cares.

This holier-than-thou anti-AI bs had some point when it was about VLMs training on scraped art, but some of you echo chamber critters turned it into some imaginary high moral prerogative that even turns off your empathy for anyone using AI, even in use cases where it may save lives. It’s some terminally online “morality” where supposedly “there is no excuse for the sin of using AI”, just echo-chamber-boosted reddit brainworms, and fully performative unless all of you use fully ethical cobalt-free smartphones so you’re not implicitly gaining convenience from the six million victims of the Congo cobalt wars so far, never use any services on AWS, magically avoid all megadatacenters etc. Touch grass jfc.

pinkapple@lemmy.ml · 2 days ago

And besides, it’s not like there’s no labour aristocracy that primarily gains from this, while other working-class groups get much less and get ideologically gaslit for not being members of some union that is potentially either fully corrupt or workerist, with zero radical ultimate aims.

Even the global North(west) contains highly exploited groups with only a minority getting the benefits.

pinkapple@lemmy.ml · 5 days ago

So far none of your ramblings disproves what I said. Yeah, there are probably crawlers for niche collecting, but nobody crawls the entire internet when they can use the regularly updated Common Crawl. Unless you or anyone else has access to unknown internal OpenAI policies on why they intentionally reinvent the wheel, your fake anecdotes (lol, bots literally telling you in the user agent that they’re going to use scraping for training) don’t cut it. You’re probably seeing search bots.

If you didn’t care for ad money and search engine exposure, bozo, you’d block everything in robots.txt and be done, instead of whining about specific bots you don’t like.
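For anyone wondering what "block everything" looks like in practice, here is a minimal sketch. The two-line robots.txt is the standard blanket opt-out; the `is_allowed` helper and the example.com URL are made up for illustration, and robots.txt only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Blanket opt-out: every compliant crawler is asked to stay off every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: would a compliant bot be allowed to fetch this URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Both a training crawler and a search bot are turned away by this file.
print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post/1"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/post/1"))
```

The trade-off is the one described above: this also removes the site from search engines that respect the file.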

You didn’t link to this, but go on, take their IP JSON files and block them.
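A rough sketch of what blocking by published IP ranges could look like. The JSON layout here (a `prefixes` list with `ipv4Prefix` entries) is an assumption about how such files are shaped, and the addresses are documentation-only placeholder ranges, not anyone's real crawler IPs. Check the actual file before relying on the field names.

```python
import ipaddress
import json

# Assumed layout of a published crawler IP list; the ranges are placeholders
# taken from the reserved documentation blocks, not real crawler addresses.
SAMPLE_JSON = """
{
  "prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv4Prefix": "198.51.100.16/28"}
  ]
}
"""

def load_networks(raw: str) -> list:
    """Parse the (assumed) JSON layout into a list of network objects."""
    data = json.loads(raw)
    return [
        ipaddress.ip_network(entry["ipv4Prefix"])
        for entry in data["prefixes"]
        if "ipv4Prefix" in entry
    ]

def is_blocked(ip: str, networks: list) -> bool:
    """True if the client IP falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

networks = load_networks(SAMPLE_JSON)
print(is_blocked("192.0.2.55", networks))   # inside the first range
print(is_blocked("203.0.113.9", networks))  # not in any listed range
```

In a real setup the same ranges would more likely go into firewall or web server deny rules than into application code.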

pinkapple@lemmy.ml · 6 days ago

Bots only identify themselves and their organization in the user agent, they don’t tell you specifically what they do with the data, so stop your fairytales. They do give you a really handy URL though, with user agents and even IPs in JSON, if you want to fully block the crawlers but not the search bots sent by user prompts.

Your ad revenue money can be secured.

https://platform.openai.com/docs/bots/

If for some reason you can’t be bothered to edit your own robots.txt (because it’s hard to tell which bots are search bots for muh ad money) then maybe hire someone.
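A minimal sketch of the selective version: block the training crawler, leave everything else alone. The user agent names (`GPTBot` for the crawler, `OAI-SearchBot` and `ChatGPT-User` for search and user-triggered fetches) are what I understand the linked page to list, so treat them as assumptions and check the page. The `can_fetch` helper and the example.com URL are made up.

```python
from urllib.robotparser import RobotFileParser

# Assumed user agent names; verify them against the bots documentation page.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def can_fetch(user_agent: str, url: str = "https://example.com/article") -> bool:
    """Hypothetical helper: check one user agent against the rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User", "Googlebot"):
    print(agent, can_fetch(agent))
```

Only the crawler entry is disallowed here, so search bots keep indexing the site and the ad money stays safe.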

pinkapple@lemmy.ml · 6 days ago

“via mechanisms including scraping, APIs, and bulk downloads.”

Omg exactly! Thanks. Yet nothing about having to use logins to stop bots, because that kinda isn’t a thing when you already provide data dumps and an API to Wikimedia Commons.

“While undergoing a migration of our systems, we noticed that only a fraction of the expensive traffic hitting our core datacenters was behaving how web browsers would usually do, interpreting javascript code. When we took a closer look, we found out that at least 65% of this resource-consuming traffic we get for the website is coming from bots, a disproportionate amount given the overall pageviews from bots are about 35% of the total.”

Source for the traffic being scraping for training models: it isn’t running JavaScript, therefore bots, therefore training crawlers, just trust me bro.

pinkapple@lemmy.ml · 8 days ago

Kay, and that has nothing to do with what I said. Scrapers, bots =/= AI. It’s not even the same companies that make the unfree datasets. The scrapers and bots that hit your website are not some random “AI” feeding on data lol. This is what some models are trained on, it’s already free so it doesn’t need to be individually rescraped, and it’s mostly garbage-quality data: https://commoncrawl.org/ Nobody wastes resources rescraping all this SEO-infested dump.

Your issue has more to do with SEO than anything else. Btw, before you diss Common Crawl, it’s used in research quite a lot, so it’s not some evil thing that threatens people’s websites. Add robots.txt maybe.

pinkapple@lemmy.ml · 8 days ago

Nobody is scraping Wikipedia over and over to create datasets for AIs, there are already open datasets and API deals. But wiki in particular has always had a data dump of the entire db, published about twice a month.

https://dumps.wikimedia.org/