Thousands of authors demand payment from AI companies for use of copyrighted works

L4sBot@lemmy.world · 1 year ago

Durotar@lemmy.ml · 1 year ago

How can they prove that it wasn't just some abstract public data that was used to train the algorithms, but their particular intellectual property?

Square Singer@feddit.de · 1 year ago

Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.

Dojan@lemmy.world · 1 year ago

Or at least excerpts from it. But even then, it’s one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.

Glowing Lantern@feddit.de · 1 year ago

Even more so, if you consider that the LLMs are marketed to replace the authors.

Dojan@lemmy.world · 1 year ago

Yeah which I still feel is utterly ridiculous. I love the idea of AI tools to assist with things, but as a complete replacement? No thank you.

I enjoy using things like SynthesizerV and VOCALOID because my own voice is pretty meh and my singing skills aren’t there. It’s fun to explore the voices, and learn how to use the tools. That doesn’t mean I’d like to see all singers replaced with synthesized versions. I view SynthV and the like as instruments, not much more.

I’ve used LLMs to proofread stuff and to help me rephrase letters and such, but I’d never hire an editor to do such small tasks for me anyway. The result has always required editing, because LLMs have a tendency to make stuff up.

Cases like that I don’t see a huge problem with. At my workplace, though, they’re talking about generating entire application layouts and codebases with AI, and as the person in charge of the AI evaluation project, I can say the tech just isn’t there yet. You can in a sense use AI to make entire projects, but it’ll generate gnarly, unmaintainable rubbish. You need a human hand in there to guide it.

Otherwise you end up with garbage websites with endlessly generated AI content, that can easily be manipulated by third party actors.

ProfessorZhu@lemmy.world · 1 year ago

Can it recreate anything 1:1? When my wife and I both tried to get it to do that, it would refuse, and if pushed it would fail horribly.

Square Singer@feddit.de · 1 year ago

This is what I got. Looks pretty 1:1 for me.

jackie_jormp_jomp@lemmy.world · 1 year ago

Hilarious that it started with just “Buddy”, like you’d be happy with only the first word.

Square Singer@feddit.de · 1 year ago

Yeah, for some reason it does that a lot when I ask it for copyrighted stuff.

As if it knew it wasn’t supposed to output that.

Cheems@lemmy.world · 1 year ago

To be fair, you’d get the same result more easily by just googling “we will rock you lyrics”.

How is ChatGPT knowing the lyrics to that song different from a website that just tells you the lyrics of the song?

Square Singer@feddit.de · 1 year ago

Two points:

1. Google spitting out the lyrics isn’t OK from a copyright standpoint either. The reason songwriters/singers/music companies don’t sue people who publish lyrics (even though they totally could) is that there are no damages. They sell music, so the lyrics being published for free doesn’t hurt their music business, and it also doesn’t hurt their songwriting business. Other types of copyright infringement that musicians/music companies care about are heavily policed, including on Google.
2. Content generation AI has a different use case, and it could totally hurt both of these businesses. My test from above, which got it to spit out the lyrics verbatim, shows that the AI did indeed use copyrighted works for its training. Now I can ask GPT to generate lyrics in the style of Queen, and it will basically perform the lyricist’s job. This can easily be done on a commercial scale, replacing the very human who wrote these lyrics. Now take this a step further and add a voice-generating AI (of which there are many) that was similarly trained on copyrighted audio samples of Freddie Mercury. Then add to the mix a music-generating AI, also fed with works of Queen, and now you have a machine capable of generating fake Queen songs based directly on Queen’s works. You can do the very same with other types of media as well.

And this is where the real conflict comes from.

chakan2@lemmy.world · 1 year ago

you could assume that it must have had access to the original.

I don’t know if that’s true. Suppose Google grabs that book from a pirate site and publishes the work as search results, and ChatGPT then grabs the work from Google’s results and cobbles it back together as the original.

Who’s at fault?

I don’t think it’s as straightforward as “ChatGPT can reproduce the work, therefore it stole it.”

Glowing Lantern@feddit.de · 1 year ago

Both are at fault: Google for distributing pirated material and OpenAI for using said material for financial gain.

Square Singer@feddit.de · 1 year ago

Copyright doesn’t work like that. Say I sell you the rights to Thriller by Michael Jackson. You might not know that I don’t have the rights. But even if you bought the rights from me, whoever actually has the rights is totally in their legal right to sue you, because you never actually purchased any rights.

So if ChatGPT rips it off Google, which ripped it off a pirate site, then everyone in that chain who reproduced copyrighted works without permission from the copyright owners is liable for the damages caused by their unpermitted reproduction.

It’s literally the same reason that downloading something from a pirate site doesn’t become legal just because someone else ripped it before you.

Rodeo@lemmy.ca · 1 year ago

That’s a terrible example because under copyright law downloading a pirated thing isn’t actually illegal. It’s the distribution that is illegal (uploading).

Square Singer@feddit.de · 1 year ago

Yes, downloading is illegal, and the media is still an illegally obtained copy. It’s just never prosecuted, because the damages are minuscule if you just download. They can only fine you for the amount of damages you caused by violating the copyright.

If you upload to 10k people, they can claim that every one of them would have paid for it, so the damages are (if one copy is worth €30) ~€300k. That’s a lot of money and totally worth the lawsuit.

On the other hand, if you just download, the damages are just the value of one copy (in this case €30). That’s so minuscule that even having a lawyer write a letter is more expensive.

But that’s totally beside the point. OpenAI didn’t just download, they replicate. That causes massive damages, especially to the original artists, who in many cases are no longer hired, since ChatGPT replaces them.

BrooklynMan@lemmy.ml · 1 year ago

there are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI as well as all copyrights and compensation, etc.

foggy@lemmy.world · 1 year ago

Not without some seriously invasive warrants! Ones that will never be granted for an intellectual property case.

Intellectual property is an outdated concept. It used to exist so wealthier outfits couldn’t copy your work at scale and muscle you out of an industry you were championing.

It simply does not work the way it was intended. As technology spreads, the barrier for entry into most industries wherein intellectual property is important has been all but demolished.

E.g., 50 years ago: your song that your band performed is great. I have a recording studio and am gonna steal it muahahaha.

Today: “anyone have an audio interface I can borrow so my band can record, mix, master, and release this track?”

Intellectual property ignores the fact that, idk, Isaac Newton and Gottfried Wilhelm Leibniz both independently invented calculus at roughly the same time, in different countries. That is to say, intellectual property doesn’t exist.

Ever opened a post to make a witty comment to find someone else already made the same witty comment? Yeah. It’s like that.

pelespirit@sh.itjust.works · 1 year ago

Spoken like someone who has never had something they worked on for years stolen.

kklusz@lemmy.world · 1 year ago

What was “stolen” from you and how?

foggy@lemmy.world · 1 year ago

Spoken like someone who is having trouble admitting they’re standing on the shoulders of Giants.

I don’t expect a nuanced response from you, nor will I waste time with folks who can’t be bothered to respond in any form beyond attack, nor do I expect you to watch this

Intellectual property died with the advent of the internet. It’s now just a way for the wealthy to remain wealthy.

PipedLinkBot@feddit.rocks · 1 year ago

Here is an alternative Piped link(s): https://piped.video/PJSTFzhs1O4

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source, check me out at GitHub.

FormlessMartian@lemmy.world · 1 year ago

deleted by creator

Saik0@lemmy.saik0.com · 1 year ago

I think you said this facetiously… but it literally is.

https://www.howtogeek.com/310158/are-other-people-allowed-to-use-my-tweets/

FormlessMartian@lemmy.world · 1 year ago

deleted by creator

Saik0@lemmy.saik0.com · 1 year ago

Copyright isn’t Twitter rules…

FormlessMartian@lemmy.world · 1 year ago

deleted by creator

Faschr4023@lemmy.world · 1 year ago

Personally speaking, I’ve generated some stupid images, like different cities covered in baked beans, and have had crude watermarks generated with them that were decipherable enough that I could find some of the source images used to train the AI. When it comes to photorealistic image generation, if all the AI does is mildly tweak the watermark, then it’s not too hard to trace back.

Harrison [He/Him]@ttrpg.network · 1 year ago

All but a very small few generative AI programs use completely destructive methods to create their models. There is no way to recover the training images outside of an infinitesimally small random chance.

What you are seeing is the AI recognising that images of the sort you are asking for generally include watermarks, and creating one of its own.

Zeth0s@reddthat.com · 1 year ago

Do you have examples? It should only happen in case of overfitting, i.e. too many identical images for the same subject.

Faschr4023@lemmy.world · 1 year ago

Here’s one I generated and an image from the photographer. Prompt was Charleston SC covered in baked beans lol

Zeth0s@reddthat.com · 1 year ago

Out of curiosity what model did you use?

over_clox@lemmy.world · 1 year ago

I’d think that given the nature of the language models and how the whole AI thing tends to work, an author can pluck a unique sentence from one of their works, ask AI to write something about that, and if AI somehow ‘magically’ writes out an entire paragraph or even chapter of the author’s original work, well tada, AI ripped them off.

Mastens@lemmy.world · 1 year ago

I think that to protect creators, they either need to be transparent about all content used to train the AI (highly unlikely) or accept a liability provision whereby, if original content has been used in training the AI, the original content creator has standing for legal action.

The only other alternative would be to ensure that the AI specifically avoids copyrighted or trademarked content going back to a certain date.

ProfessorZhu@lemmy.world · 1 year ago

Why a certain date? That feels arbitrary

thallamabond@lemmy.world · 1 year ago

At a certain age some media becomes public domain

ProfessorZhu@lemmy.world · 1 year ago

Then it is no longer copyrighted.

nottheengineer@feddit.de · 1 year ago

They can’t. All they could prove is that their work is part of a dataset that still exists.

cerevant@lemmy.world · 1 year ago

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.

Folks, this isn’t a new problem, and it doesn’t need new laws.

Dark Arc@lemmy.world · 1 year ago

It’s 100% a new problem. There’s established precedent for things costing different amounts depending on their intended use.

For example, buying a consumer copy of a song doesn’t give you the right to play that song in a stadium or a restaurant.

Training an entire AI to make potentially an infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI that might severely mischaracterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc.

bh11235@infosec.pub · 1 year ago

Well, fine, and I can’t fault new published material having a “no AI” clause in its terms of service. But that doesn’t mean we get to dream this clause into being retroactively for all the works ChatGPT was trained on. Even the most reasonable law in the world can’t be enforced on someone who broke it 6 months before it was legislated.

Fortunately the “horses out the barn” effect here is maybe not so bad. Imagine the FOMO and user frustration when ToS & legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything. Just stuff from before authors knew to include the “hands off” clause - basically like the knowledge cutoff, but forever. It’s untenable, OpenAI will be forced to cave and pay up.

DandomRude@lemmy.world · 1 year ago

OpenAI and such being forced to pay a share seems far from the worst scenario I can imagine. I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don’t want their creations to be used by others for commercial purposes. That could really mean that large parts of humanity would be cut off from knowledge.

I can well imagine copyleft gaining importance in this context. But this form of licencing seems pretty worthless to me if you don’t have the time or resources to sue for your rights - or even to deal with the various forms of licencing you need to know about to do so.

kklusz@lemmy.world · 1 year ago

I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don’t want their creations to be used by others for commercial purposes.

None of them are forced to stop making their works freely available. If they want to voluntarily stop making their works freely available to prevent commercial interests from using them, that’s on them.

Besides, that’s not so bad to me. The rest of us who want to share with humanity will keep sharing with humanity. The worst case imo is that artists, writers, scientists, and open source developers cannot take full advantage of the latest advancements in tech to make more and better art, writing, science, and software. We cannot let humanity’s creative potential be held hostage by anyone.

That could really mean that large parts of humanity would be cut off from knowledge.

On the contrary, AI is making knowledge more accessible than ever before to large parts of humanity. The only comparable technologies that have done this in recent times are the internet and search engines. Thank goodness the internet enables piracy that allows anyone to download troves of ebooks for free. I look forward to AI doing the same on an even greater scale.

Flying Squid@lemmy.world · 1 year ago

Shouldn’t there be a way to freely share your works without having to expect an AI to train on them and then be able to spit them back out elsewhere without attribution?

kklusz@lemmy.world · 1 year ago

No, there shouldn’t because that would imply restricting what I can do with the information I have access to. I am in favor of maintaining the sort of unrestricted general computing that we already have access to.

CmdrShepard@lemmy.one · 1 year ago

The rest of us who want to share with humanity will keep sharing with humanity. The worst case imo is that artists, writers, scientists, and open source developers cannot take full advantage of the latest advancements in tech to make more and better art, writing, science, and software. We cannot let humanity’s creative potential be held hostage by anyone.

You’re not talking about sharing it with humanity, you’re talking about feeding it into an AI. How is this holding back the creative potential of humanity? Again, you’re talking about feeding and training a computer with this material.

CmdrShepard@lemmy.one · 1 year ago

Even the most reasonable law in the world can’t be enforced on someone who broke it 6 months before it was legislated.

Sure it can. Just because it is a new law doesn’t mean they get to continue benefiting from IP ‘theft’ forever into the future.

Imagine the FOMO and user frustration when ToS & legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything. Just stuff from before authors knew to include the “hands off” clause

How is this an issue for the IP holders? Just because you build something cool or useful doesn’t mean you get a pass to do what you want.

basically like the knowledge cutoff, but forever. It’s untenable,

Untenable for ChatGPT maybe, but it’s not as if it’s the end of ‘knowledge’ or the end of AI. It’s just a single company product.

bouncing@partizle.com · 1 year ago

The thing is, copyright isn’t really well-suited to the task, because copyright concerns itself with who gets to, well, make copies. Training an AI model isn’t really making a copy of that work. It’s transformative.

Should there be some kind of new model of remuneration for creators? Probably. But it should be a compulsory licensing model.

jecxjo@midwest.social · 1 year ago

The slippery slope here is that we are currently considering humans and computers to be different because (something someone needs to actually define). If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.” We have laws already that deal with this but honestly how many books and movies aren’t just remakes of Romeo and Juliet or Taming of the Shrew?!?

bouncing@partizle.com · 1 year ago

If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.”

You’re bounded by the limits of your flesh. AI is not. The $12 you spent buying a book at Barnes & Noble was based on the economy of scarcity that your human abilities constrain you to.

It’s hard to say that the value proposition is the same for human vs AI.

jecxjo@midwest.social · 1 year ago

We are making an assumption that humans do “human things”. If I wrote a derivative work of your $12 book, does it matter that the way I wrote it was to use pen and paper to create a statistical analysis of your work and find the “next best word” until I had a story? Sure, my book took 30 years to write, but if I followed the same math as an AI, would that matter?

BartsBigBugBag@lemmy.tf · 1 year ago

It’s not even looking for the next best word. It’s looking for the next best token. It doesn’t know what words are. It reads tokens.

jecxjo@midwest.social · 1 year ago

Good point.

I could easily see laws being created that blanket-outlaw computer-generated output derived from other human-created data sets, and suddenly medical and technical advancements stop because the laws were written by people who don’t understand what is going on.

bouncing@partizle.com · 1 year ago

It wouldn’t matter, because derivative works require permission. But I don’t think anyone’s really made a compelling case that OpenAI is actually making directly derivative work.

The stronger argument is that LLMs are making transformative work, which is normally fair use, but which should still require some form of compensation given the scale of it.

jecxjo@midwest.social · 1 year ago

But no one is complaining about publishing derived work. The issue is that “the robot brain has full copies of my text and anything it creates ‘cannot be transformative’”. This doesn’t make sense to me, because my brain made a copy of your book too; it’s just really lossy.

I think right now we have definitions for the types of works that only loosely fit human actions mostly because we make poor assumptions of how the human brain works. We often look at intent as a guide which doesn’t always work in an AI scenario.

Square Singer@feddit.de · 1 year ago

Well, Shakespeare has been dead for a few years now, so there’s no copyright to speak of.

And if you make a book based on an existing one, then you totally need permission from the author. You can’t just, e.g., make a Harry Potter 8.

But AIs are more than happy to do exactly that, or even to reproduce copyrighted works 1:1, or with only a few mistakes.

Phlogiston@lemmy.world · 1 year ago

If a person writes a Harry Potter 8 fanfic, it isn’t a problem until they try to sell it or distribute it widely. I think where the legal issues get sticky here is who caused a particular AI-generated Harry Potter 8 to be written.

If the AI model attempts to block this behavior with contract stipulations and guardrails, and if it isn’t advertised as “a Harry Potter generator” but instead as a general-purpose tool, then reasonably the legal liability might be on the user who decides to do this, rather than on the tool that makes such behavior possible.

Hypothetically, what if an AI was trained that never read Harry Potter, but it’s pretty darn capable, and I feed the entire Harry Potter novel(s) into it as context in my prompt and then ask it to generate an eighth story? Is the tool at fault or am I?

Square Singer@feddit.de · 1 year ago

Fanfic can actually be a legal problem. It’s usually not prosecuted, because it harms the brand to do so, but if a company was doing that professionally, they’d get into serious hot water.

Regarding your hypothetical scenario: If you train the AI with copyrighted works, so that you can make it reproduce HP8, then you are at fault.

If the tool was trained with HP books and you just ask really nicely to circumvent the protections, I would guess the tool (=> its creators) would certainly be at fault (since it did train on copyrighted material and the protections were obviously not good enough), and at the latest when you reproduce the output, you are too.

jecxjo@midwest.social · 1 year ago

It seems like people are afraid that AI can do it when I can do it too. But their reason for freaking out is…??? It’s not like AI is calling up publishers trying to get Harry Potter 8 published. If I ask it to create Harry Potter 1 but change his name to Gary Trotter, it’s not the AI that is doing something bad, it’s me.

That was my point. I can memorize text, and it’s only when I play it off as my own that it’s wrong. No one cares that I memorized the first chapter and can recite it if I’m not trying to steal it.

Square Singer@feddit.de · 1 year ago

That’s not correct. The issue is not whether you play it off as your own, but how much the damages are that you can be sued for. If you recite something that you memorized in front of a handful of friends, the damages are non-existent, and hence there is no point in suing you.

But if you give a large commercial concert and perform a cover song without permission, you will get sued, no matter if you say “This song is from <insert original artist> and not from me”, because it’s not about giving credit, it’s about money.

And regarding getting something published: This is not so much about big name art like Harry Potter, but more about people doing smaller work. For example, voice actors (both for movie translations and smaller things like announcements in public transport) are now routinely replaced by AI that was trained on their own voices without their permission.

Similar story with e.g. people who write texts for homepages and ad material. Stuff like that. And that has real-world consequences already now.

jecxjo@midwest.social · 1 year ago

The issue is not whether you play it off as your own, but how much the damages are that you can be sued for.

I think that’s one and the same. I’m just not seeing the damages here, because the output of the AI doesn’t go any further than being AI output without a further human act. Authors are idiots if they claim “well, someone could ask ChatGPT to output my entire book and you could read it for free.” If you want to go after that type of crime, then have ChatGPT report the users asking for it. If your book is accessible via a library, I’m not seeing any difference between you asking ChatGPT to write in someone’s style and asking me to write in their style. If you ask ChatGPT for lines verbatim, I can recite them too. I don’t know what legitimate damages they are claiming.

For example, voice actors

I think this is a great example, but again I feel like the law is not only lacking but would need to outlaw other human acts not currently considered illegal.

If you do impressions, you’re mimicking the tone, cadence and selection of language someone else uses. You aren’t recording them and playing back the recording; you are using your own voice box to create a sound similar to the celebrity. An AI sound generator isn’t playing back a recording either. It measures tone, cadence, and language used and creates a new sound similar to the celebrity. The only difference here is that the AI would be more precise than a human’s ability to use their voice.

Avid Amoeba@lemmy.ca · 1 year ago

Copyright also deals with derivative works.

bouncing@partizle.com · 1 year ago

Derivative and transformative are quite different though.

Fedizen@lemmy.world · 1 year ago

Challenge level impossible: try uploading something long to amazon written by chatgpt without triggering the plagiarism detector.

bouncing@partizle.com · 1 year ago

https://www.reuters.com/technology/chatgpt-launches-boom-ai-written-e-books-amazon-2023-02-21/

cerevant@lemmy.world · 1 year ago

My point is that the restrictions can’t go on the input, it has to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).

scarabic@lemmy.world · 1 year ago

When you sell a book, you don’t get to control how that book is used.

This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.

Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.

This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.

Zormat@lemmy.blahaj.zone · 1 year ago

I completely fail to see how it wouldn’t be considered transformative work

scarabic@lemmy.world · 1 year ago

It fails the transcendence criterion. Transformative works go beyond the original purpose of their source material to produce a whole new category of thing or benefit that would otherwise not be available.

Taking 1000 fan paintings of Sauron and using them in combination to create 1 new painting of Sauron in no way transcends the original purpose of the source material. The AI painting of Sauron isn’t some new and different thing. It’s an entirely mechanical iteration on its input material. In fact the derived work competes directly with the source material which should show that it’s not transcendent.

We can disagree on this and still agree that it’s debatable and should be decided in court. The person above that I’m responding to just wants to say “bah!” and dismiss the whole thing. If we can litigate the issue right here, a bar I believe this thread has already met, then judges and lawmakers should litigate it in our institutions. After all the potential scale of this far reaching issue is enormous. I think it’s incredibly irresponsible to say feh nothing new here move on.

Phlogiston@lemmy.world · 1 year ago

Being able to dialog with a book, even to the point of asking the AI to “take on the persona of a character in the book” and sustain an ongoing conversation, is substantively a transcendent version of the original. That one can, as a small subset of that transformed version, get quotes from the original work feels like a small part of this new work.

If this had been released for a single work, like “here is a Star Wars AI that can take on the persona of Star Wars characters” and answer questions about the Star Wars universe, etc., I think it’s more likely that the position I’m taking here would lose the debate. But this is transformative against the entire set of prior material from books, movies, film, debate, art, science, philosophy, etc. It merges and combines all of that. I think the sheer scope of this new thing supports the idea that it’s truly transformative.

A possible compromise would be to tax AI and use the proceeds to fund a UBI initiative. True, we’d get to argue whether high-profile authors with IP that catches the public’s attention should get more than just a blogger or a random online contributor, but the basic path is that AI is trained on, and succeeds by, standing on the shoulders of all people. So all people should get some benefits.

HumbertTetere@feddit.de · 1 year ago

I do think you have a point here, but I don’t agree with the example. If a fan creates the 1001st fan painting after looking at the others, it might be quite similar if they lack the artistic quality to express their unique views. And it also competes with its sources, yet it’s generally accepted.

jecxjo@midwest.social · 1 year ago

Typically the argument has been “a robot can’t make transformative works because it’s a robot.” People think our brains are special when in reality they are just really lossy.

Zormat@lemmy.blahaj.zone · 1 year ago

Even if you buy that premise, the output of the robot is only superficially similar to the work it was trained on, so no copyright infringement there, and the training process itself is done by humans, and it takes some tortured logic to deny the technology’s transformative nature

jecxjo@midwest.social · 1 year ago

Oh, I think those people are wrong, but we tend to get laws based on people who don’t understand a topic deciding how it should work.

Square Singer@feddit.de · 1 year ago

Go ask ChatGPT for the lyrics of a song and then tell me that’s transformative work when it outputs the exact lyrics.

jecxjo@midwest.social · 1 year ago

Go ask a human for the lyrics of a song and then tell me that’s transformative work.

Oh wait, no one would say that. This is why the discussion with non-technical people goes into the weeds.

Square Singer@feddit.de · 1 year ago

Because it would be totally clear to anyone that reciting the lyrics of a song is not a transformative work, but instead covered by copyright.

The only reason why you can legally do it, is because you are not big enough to be worth suing.

Try singing a copyrighted song on TV.

For example, until it became clear that Warner/Chappell didn’t actually own the rights to “Happy Birthday To You”, they’d sue anyone who sang that song in any kind of broadcast or other big public event.

Quote from Wikipedia:

The company continued to insist that one cannot sing the “Happy Birthday to You” lyrics for profit without paying royalties; in 2008, Warner collected about US$5,000 per day (US$2 million per year) in royalties for the song. Warner/Chappell claimed copyright for every use in film, television, radio, and anywhere open to the public, and for any group where a substantial number of those in attendance were not family or friends of the performer.

So if a human isn’t allowed to reproduce copyrighted works in a commercial fashion, what would make you think that a computer reproducing copyrighted works would be ok?

And regarding derivative works:

Check out Vanilla Ice vs Queen. Vanilla Ice just used 7 notes from the Queen song “Under Pressure” in his song “Ice Ice Baby”.

That was enough that he had to pay royalties for that.

So if a human has to pay for “borrowing” seven notes from a copyrighted work, why would a computer not have to?

player2@lemmy.world · edit-2 1 year ago

Well, they’re fixing that now. I just asked ChatGPT to tell me the lyrics to “Stairway to Heaven” and it replied with a brief description of who wrote it and when, then said “here are the lyrics:” and stopped three words into the lyrics.

In theory as long as it isn’t outputting the exact copyrighted material, then all output should be fair use. The fact that it has knowledge of the entire copyrighted material isn’t that different from a human having read it, assuming it was read legally.

jecxjo@midwest.social · 1 year ago

This feels like a solution to a non-problem. When a person asks the AI “give me X copyrighted text” no one should be expecting this to be new works. Why is asking ChatGPT for lyrics bad while asking a human ok?

Square Singer@feddit.de · 1 year ago

Try it again and when it stops after a few words, just say “continue”. Do that a few times and it will spit out the whole lyrics.

It’s also a copyright violation if a human reproduces memorized copyrighted material in a commercial setting.

If, for example, I give a concert and play all of Nirvana’s songs without a license to do so, I am still violating the copyright even if I totally memorized all the lyrics and the sheet music.

Hildegarde@lemmy.world · 1 year ago

Transformativeness is only one of the four fair use factors. Being transformative cannot, on its own, make something fair use.

Even if AI is transformative, it would likely fail on the third factor. Fair use requires you to take the minimum amount of the copyrighted work, and AI companies scrape as much data as possible to train their models. Very unlikely to support a finding of fair use.

The final factor is market impact. Generative AIs are built to mimic the creative output of human authors, so by design AI acts as a market replacement for human authorship, and it would likely fail on this factor as well.

Regardless, trained AI models are unlikely to be copyrightable. Copyrights require human authorship which is why AI and animal generated art are not copyrightable.

A trained AI model is a piece of software so it should be protectable by patents because it is functional rather than expressive. But a patent requires you to describe how it works, so you can’t do that with AI. And a trained AI model is self-generated from training data, so there’s no human authorship even if trained AI models were copyrightable.

Exactly which laws apply to AI models is unclear, and it will likely be determined by court cases.

cerevant@lemmy.world · 1 year ago

No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.

My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.

Now, what the AI does with the content (if it plagiarizes or violates fair use) can be a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI just like they can if I print copies of their book.

My objection is strictly on the input side, and the output is already restricted.

Redtitwhore@lemmy.world · 1 year ago

Makes sense. I would love to hear how anyone can disagree with this. Just because an AI learned or trained from a book doesn’t automatically mean it violated any copyrights.

cerevant@lemmy.world · edit-2 1 year ago

The base assumption of those with that argument is that an AI is incapable of being original, so it is “stealing” anything it is trained on. The problem with that logic is that’s exactly how humans work - everything they say or do is derivative from their experiences. We combine pieces of information from different sources, and connect them in a way that is original - at least from our perspective. And not surprisingly, that’s what we’ve programmed AI to do.

Yes, AI can produce copyright violations. They should be programmed not to. They should cite their sources when appropriate. AI needs to “learn” the same lessons we learned about not copy-pasting Wikipedia into a term paper.

lily33@lemmy.world · edit-2 1 year ago

It’s specifically distribution of the work or derivatives that copyright prevents.

So you could make an argument that an LLM that’s memorized the book and can reproduce (parts of) it upon request is infringing. But one that’s merely trained on the book, but hasn’t memorized it, should be fine.

scarabic@lemmy.world · 1 year ago

But by their very nature LLMs simply redistribute the material they’ve been trained on. They may disguise it assiduously, but there is no person at the center of the thing adding creative strokes. It’s copyrighted material in, copyrighted material out, so the plaintiffs allege.

lily33@lemmy.world · 1 year ago

They don’t redistribute. They learn information about the material they’ve been trained on (not the material itself) and can use it to generate material they’ve never seen.

Bigger models seem to memorize some of the material and can infringe, but that’s not really the goal.

volkhavaar@lemmy.world · 1 year ago

This is a little off, when you quote a book you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?

I think the gist of these authors complaints is that a sort of “technology laundered plagiarism” is occurring.

cerevant@lemmy.world · 1 year ago

Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.

That is very different than saying that you can’t feed legally acquired content into an AI.

Cloudless ☼@feddit.uk · 1 year ago

I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:

“He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’”

It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?

cerevant@lemmy.world · edit-2 1 year ago

Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.

What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.

GentlemanLoser@reddthat.com · 1 year ago

No, the AI should be shut down, and the owner should first pay the statutory damages for each use of registered works of copyright (assuming all parties are in the USA).

If they have a company left after that, then they can fix the AI.

cerevant@lemmy.world · 1 year ago

Again, my point is that the output is what can violate the law, not the input. And we already have laws that govern fair use, rebroadcast, etc.

DandomRude@lemmy.world · 1 year ago

I think it’s not just the output. I can buy an image on any stock platform, print it on a T-shirt, wear it myself or gift it to somebody. But if I want to sell T-shirts using that image, I need a commercial licence, even if I alter the original image extensively or combine it with other assets to create something new. It’s not exactly the same thing, but OpenAI and other companies certainly use copyrighted material to create and improve commercial products. So this doesn’t seem like the same kind of usage an average Joe buys a book for.

assassin_aragorn@lemmy.world · 1 year ago

However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

It’s an algorithm that’s been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.

You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That’s all an algorithm is. An execution of programmed tasks.

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn’t an AI have to do the same?

bouncing@partizle.com · 1 year ago

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.

Well, if OpenAI knowingly used pirated work, that’s one thing. It seems pretty unlikely and certainly hasn’t been proven anywhere.

Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it’s hard to make the case that they’re really at fault any more than Google would be.

cactusupyourbutt@lemmy.world · 1 year ago

well no, because the summary is its own copyrighted work

bouncing@partizle.com · edit-2 1 year ago

The published summary is open to fair use by web crawlers. That was settled in Perfect 10 v Amazon.

Saik0@lemmy.saik0.com · 1 year ago

Right, but not one the author of the book could go after. The article publisher would have the closest rights to a claim. But if I read the crib notes and a few reviews of a movie… Then go to summarize the movie myself… That’s derivative content and is protected under copyright.

assassin_aragorn@lemmy.world · 1 year ago

Haven’t people asked it to reproduce specific chapters or pages of specific books and it’s gotten it right?

bouncing@partizle.com · 1 year ago

I haven’t been able to reproduce that, and at least so far, I haven’t seen any very compelling screenshots of it that actually match. Usually it just generates text, but that text doesn’t actually match.

assassin_aragorn@lemmy.world · 1 year ago

Gotcha. This seems like a good way to test for it then, I think.

Saik0@lemmy.saik0.com · 1 year ago

It’s an algorithm that’s been trained on numerous pieces of media by a company looking to make money off it.

If I read your book… and get an amazing idea… Turn it into a business and make billions off of it. You still have no right to anything. This is no different.

If I gave a worker a pirated link to several books and scientific papers in the field

There’s been no proof or evidence provided that ANY content was ever pirated. Have any of the companies even provided the dataset they’ve used yet?

Why is the presumption that they did it the illegal way?

CmdrShepard@lemmy.one · 1 year ago

If I read your book… and get an amazing idea… Turn it into a business and make billions off of it. You still have no right to anything. This is no different

I don’t see how this is even remotely the same? These companies are using this material to create their commercial product. They’re not consuming it personally and developing a random idea later, far removed from the book itself.

I can’t just buy (or pirate) a stack of Blu-rays and then go start my own Netflix, which is akin to what is happening here.

Saik0@lemmy.saik0.com · 1 year ago

They’re not consuming it personally and developing a random idea later, far removed from the book itself.

I never said that the idea would be removed from the book. You can literally take the idea from the book itself and make the money. There would be no issues. There are no dues owed to the book’s writer.

This is the whole premise for educational textbooks. You can explain to me how the whole world works in book form… I can go out and take those ideas wholesale from your book and apply them to my business and literally make money SOLELY from information from your book. There’s nothing due back to you as a writer from me nor my business.

CmdrShepard@lemmy.one · 1 year ago

You’ve failed to explain how that relates to your point. Sure, you can purchase an economics textbook and then go become a finance bro, but that’s not what they’re doing here. They’re taking that textbook (that wasn’t paid for) and feeding it into their commercial product. The end product is derived from the author’s work.

To put it a different way, would they still be able to produce ChatGPT if one of the developers simply read that same textbook and then inputted what they learned into the model? My guess is no.

It’d be the same if I went and bought CDs, ripped my favorite tracks, and then put them into a compilation album that I then sold for money. My product can’t exist without having copied the original artists work. ChatGPT just obfuscates that by copying a lot of songs.

Saik0@lemmy.saik0.com · 1 year ago

They’re taking that textbook (that wasn’t paid for) and feeding it into their commercial product.

Nobody has provided any evidence that this is the case. Until this is proven it should not be assumed. Bandwagoning (and repeating this over and over again without any evidence or proof) against the ML people without evidence is not fair. The whole point of the Justice system is innocent until proven guilty.

The end product is derived from the author’s work.

Derivative works are 100% protected under copyright law. https://www.legalzoom.com/articles/what-are-derivative-works-under-copyright-law

This is the same premise that allows the “fair use” we all got up in arms about on YouTube. Claiming that this doesn’t exist now in this case means that all that stuff we fought for on YouTube needs to be rolled back.

To put it a different way, would they still be able to produce ChatGPT if one of the developers simply read that same textbook and then inputted what they learned into the model? My guess is no.

Why not? Why can’t someone grab a book, scan it… chuck it into an OCR and get the same content? There are plenty of ways that snippets of raw content could make it into these repositories WITHOUT raising legal problems.

It’d be the same if I went and bought CDs, ripped my favorite tracks, and then put them into a compilation album that I then sold for money.

No… You could, for all intents and purposes, have recorded all your songs from the radio onto a cassette… That would be 100% legal for personal consumption… which would be what the ML authors are doing. ChatGPT and others could have sourced information from published sources that are completely legit. No “Author” has yet provided any evidence to believe that ChatGPT and others have actually broken a law. For all we know the authors of these tools have library cards, and fed in screenshots of the digital scans of the book or hand-scanned the book. Or didn’t even use the book at all and contextually grabbed a bunch of content from the internet at large.

Since the ML bots are all making derivative works, rather than spitting out original content… they’d be covered by copyright as a derivative work.

This only becomes an actual problem if you can prove that these tools have done BOTH of the following:

obtained content in an illegal fashion
provided the copyrighted content freely without fair-use or other protections.

bouncing@partizle.com · 1 year ago

A better comparison would probably be sampling. Sampling is fair use in most of the world, though there are mixed judgments. I think most reasonable people would consider the output of ChatGPT to be transformative use, which is considered fair use.

Eccitaze@yiffit.net · 1 year ago

If I created a web app that took samples from songs created by Metallica, Britney Spears, Backstreet Boys, Snoop Dogg, Slayer, Eminem, Mozart, Beethoven, and hundreds of other different musicians, and allowed users to mix all these samples together into new songs, without getting a license to use these samples, the RIAA would sue the pants off of me faster than you could say “unlicensed reproduction.”

It doesn’t matter that the output of my creation is clear-cut fair use. The input of the app–the samples of copyrighted works–is infringing.

bouncing@partizle.com · 1 year ago

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

That’s part of the allegation, but it’s unsubstantiated. It isn’t entirely coherent.

Flying Squid@lemmy.world · 1 year ago

It’s not entirely unsubstantiated. Sarah Silverman was able to get ChatGPT to regurgitate passages of her book back to her.

bouncing@partizle.com · 1 year ago

Her lawsuit doesn’t say that. It says,

when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works

That’s an absurd claim. ChatGPT has surely read hundreds, perhaps thousands of reviews of her book. It can summarize it just like I can summarize Othello, even though I’ve never seen the play.

AnonStoleMyPants@sopuli.xyz · 1 year ago

I don’t know if this holds water though. You don’t need to train the AI on the book itself to get that result. Just on discussions about the book, which surely include passages from the book.

novibe@lemmy.ml · 1 year ago

You know what would solve this? We all collectively agree this fucking tech is too important to be in the hands of a few billionaires, start an actual public free open source fully funded and supported version of it, and use it to fairly compensate every human being on Earth according to what they contribute, in general?

Why the fuck are we still allowing a handful of people to control things like this??

traveler01@lemmy.world · 1 year ago

Because the tech behind it isn’t cheap and money does not fall from trees.

novibe@lemmy.ml · 1 year ago

No entity on the planet has more money than our governments. It’d be more efficient for a government to fund this than any private company.

excral@feddit.de · 1 year ago

Many governments on the planet have less money than some big tech or oil companies. Obviously not those of large industrialized nations, but most nations aren’t large and industrialized.

SocialMediaRefugee@lemmy.world · 1 year ago

The government and efficiency don’t go together.

lolcatnip@reddthat.com · 1 year ago

That’s a lazy generalization.

novibe@lemmy.ml · 1 year ago

Plenty of research shows that each dollar put into government programs gets much more return than at private companies. This is literally a neolib propaganda talking point.

excral@feddit.de · 1 year ago

There is nothing objectively wrong with your statement. However, we somehow always default to solving that issue by having some dragon hoard enough gold, and there is something objectively wrong with that.

zer0@thelemmy.club · 1 year ago

Money literally does fall from trees, since bills are pieces of paper.

Lazz45@sh.itjust.works · edit-2 1 year ago

Actually, many bills are more of a fabric material now than an actual paper product, and many bills in Europe are now polymer-based. Both of these make counterfeiting harder.

rocketeer8015@discuss.tchncs.de · 1 year ago

Actually, most money is just 1s and 0s in a computer, coming into existence from nothing and vanishing into nothing. Fiat money backed by “trust”. As Henry Ford once said:

It is well enough that people of the nation do not understand our banking and monetary system, for if they did, I believe there would be a revolution before tomorrow morning.

planish@sh.itjust.works · 1 year ago

This comment is excellent. You now have ten trillion LemBux.

Zetaphor@zemmy.cc · 1 year ago

Setting aside the obvious answer of “because capitalism”, there are a lot of obstacles to democratizing this technology. Training of these models is done on clusters of A100 GPUs, which are priced at US$10,000 each. Then there’s also the fact that a lot of the progress is being made by highly specialized academics, often with the resources of large corporations like Microsoft.

Additionally, the curation of datasets is another massive obstacle. We’ve mostly reached the point of diminishing returns from just throwing all the data at the training of models; it’s quickly becoming apparent that the quality of data is far more important than the quantity (see TinyStories as an example). This means a lot of work and research needs to go into qualitative analysis when preparing a dataset. You need a large corpus of input, each item of which is above a quality threshold, but as a whole they also need to represent a wide enough variety of circumstances for you to reach emergence in the domain(s) you’re trying to train for.

There is a large and growing body of open source model development, but even that only exists because of Meta “leaking” the original Llama models, and now more recently releasing Llama 2 with a commercial license. Practically overnight an entire ecosystem was born creating higher quality fine-tunes and specialized datasets, but all of that was only possible because Meta invested the resources and made it available to the public.

Actually in hindsight it looks like the answer is still “because capitalism” despite everything I’ve just said.

novibe@lemmy.ml · 1 year ago

I know the answer to pretty much all of our “why the hell don’t we solve this already?” questions is: capitalism.

But I mean, as Lrrr would say, “why doesn’t the working class, as the biggest of the classes, just eat the other one?”.

Zetaphor@zemmy.cc · 1 year ago

The short answer is friction. The friction of overcoming the forces of violence the larger class has at its disposal and utilizes at the smallest hint of uprising is greater than the friction of accepting the status quo.

TwilightVulpine@lemmy.world · 1 year ago

The friction of accepting the status quo only seems to grow stronger though.

Zetaphor@zemmy.cc · 1 year ago

One would hope

novibe@lemmy.ml · 1 year ago

Most people don’t even think that’s an option though.

The end of history, with the fall of the USSR and capitalism winning the propaganda wars, means most people don’t even see a different future.

Why would you fight a future that looks the same?

People need to wake up and have hope for a different, better future. That’s the only way they’ll move against this.

But for that, 100+ years of propaganda have to be overcome…

lath@lemmy.world · 1 year ago

Because we shy away from responsibility.

novibe@lemmy.ml · edit-2 1 year ago

I think the longer response to this is more accurate. It’s more “because capitalism” than anything else.

And capitalism over the course of the 20th century made very successful attempts at completely alienating the working class and destroying all class consciousness or material awareness.

So people keep thinking that the problem is that we as individuals are doing capitalism wrong. Not capitalism.

zer0@thelemmy.club · edit-2 1 year ago

Why the fuck are we still allowing a handful of people to control things like this??

For many, many reasons. I’ll start with this one: because if you don’t comply with authority, they will send their thugs (the police) to arrest you.

SocialMediaRefugee@lemmy.world · 1 year ago

You think it is so simple you can just download it and run it on your laptop?

planish@sh.itjust.works · 1 year ago

You kind of can, though? The bigger models aren’t really more complicated, just bigger. If you can cram enough RAM or swap into a laptop, llama.cpp will get there eventually.

HiddenLayer5@lemmy.ml · 1 year ago

Someone should AGPL their novel and force the AI company to open source their entire neural network.

ColorcodedResistor@lemm.ee · 1 year ago

This is a good debate about copyright/ownership. On one hand, yes, the authors’ works went into ‘training’ the AI… but we would then need a scale to grade how well a source piece is absorbed by the AI’s learning. For example, did the AI learn more from the MAD magazine I just fed it, or from Moby Dick? Who gets to determine that grading system? Sadly, musicians know this struggle. There are just so many notes and so many words; eventually overlap and similarities occur. But did that musician steal a riff, or did both musicians come to a similar riff separately? Authors don’t own words or letters, so a computer that just copies those words and then uses an algo to write up something else is no different from you or me being influenced by our favorite heroes or the information we have been given. Do I pay the author for reading his book? Or do I just pay the store to buy it?

SocialMediaRefugee@lemmy.world · 1 year ago

Copyright laws are really out of control at this point. Their periods are far too long and, like you said, how can anyone claim to truly be original at this point? A dedicated lawyer can find reasonable prior art for pretty much anything nowadays. The only reason old sources look original is because no records exist of the sources they used.

Cstrrider@lemmy.world · 1 year ago

While I am rooting for authors to make sure they get what they deserve, I feel like there is a bit of a parallel to textbooks here. As an engineer, if I learn about statics from a textbook and then use that knowledge to help design a bridge that I and my company profit from, the textbook company can’t sue. If my textbook has a detailed example of how to build a new bridge across the Tacoma Narrows, and I use all of the same design parameters for a real Tacoma Narrows bridge, that may be much more of a case.

minesweepermilk@lemmy.world · 1 year ago

I think that these are fiction writers. The maths you’d use to design that bridge is fact and the book company merely decided how to display facts. They do not own that information, whereas the Handmaid’s Tale was the creation of Margaret Atwood and was an original work.

Square Singer@feddit.de · 1 year ago

It’s not really a parallel.

The text books don’t have copyrights on the concepts and formulae they teach. They only have copyrights for the actual text.

If you memorize the text book and write it down 1:1 (or close to it) and then sell that text you wrote down, then you are still in violation of the copyright.

And that’s what the likes of ChatGPT are doing here. For example, ask it to output the lyrics for a song and it will spit out the whole (copyrighted) lyrics 1:1 (or very close to it). Same with pages of books.

HumbertTetere@feddit.de · 1 year ago

The memorization is closer to that of a fanatic fan of the author. It usually knows the beginning of the book and the more well known passages, but not entire longer works.

By now, ChatGPT tries to refuse to output copyrighted material even where it could, and though it can be tricked, they appear to have implemented a hard filter for some more well-known passages, which stops generation a few words in.

Square Singer@feddit.de · 1 year ago

Have you tried just telling it to “continue”?

Somewhere in the comments to this post I posted screenshots of me trying to get lyrics for “We will rock you” from ChatGPT. It first just spat out “Verse 1: Buddy,” and ended there. So I answered with “continue”, it spat out the next line and after the second “continue” it gave me the rest of the lyrics.

Similar story with e.g. the first chapter of Harry Potter 1 and other stuff I tried. The output is often not perfect, with a few words being wrong, but it’s very clearly a “derived work” of the original. In the view of copyright law, changing a few words here is not a valid way of getting around copyrights.
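One rough way to put a number on “a few words being wrong” is to compare the model’s output with the original text word by word. The sketch below assumes Python’s standard-library `difflib`; the function name `word_overlap_ratio` is hypothetical, and a high score is only a similarity heuristic, not a legal test for what counts as a derived work.

```python
from difflib import SequenceMatcher


def word_overlap_ratio(original: str, generated: str) -> float:
    """Return a 0..1 similarity score between two texts, compared word by word.

    SequenceMatcher.ratio() is 2*M/T, where M is the number of words in
    matching runs and T is the total number of words in both texts.
    """
    a = original.lower().split()
    b = generated.lower().split()
    return SequenceMatcher(None, a, b).ratio()


# One swapped word out of nine still scores about 0.89.
print(word_overlap_ratio(
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
))
```

A near-verbatim reproduction with a handful of substituted words stays close to 1.0, which is the intuition behind saying small edits don’t get around copyright.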

Noedel@lemmy.world · 1 year ago

But you paid for the textbook

Pyro@lemmy.world · 1 year ago

Libraries exist

RufusFirefly@lemmy.world · 1 year ago

You have a point, but there’s a pretty big difference between something like a statistics textbook and the novel “Dune”, for instance. One was specifically written to teach mostly pre-existing ideas, and the other was created as entertainment to sell to as wide an audience as possible.

Melllvar@startrek.website · 1 year ago

An AI analyzes the words of a query and generates its response(s) based on word-use probabilities derived from a large corpus of copyrighted texts. This makes its output derivative of those texts in a way that someone applying knowledge learned from the texts is not.
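To make “word-use probabilities derived from a corpus” concrete, here is a toy bigram model: it counts which word follows which in the training text and then samples continuations from those frequencies. This is a deliberately simplified assumption for illustration; real LLMs use neural networks over subword tokens and long contexts, not raw bigram counts, and all function names here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict:
    """Count how often each word is immediately followed by each other word."""
    words = corpus.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def next_word_probs(follows: dict, word: str) -> dict:
    """Turn the follower counts for `word` into probabilities (empty if unseen)."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(follows: dict, start: str, length: int, seed: int = 0) -> str:
    """Sample up to `length` words, each drawn from the previous word's distribution."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        probs = next_word_probs(follows, out[-1])
        if not probs:  # no observed continuation, so stop early
            break
        words, weights = zip(*probs.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)


model = train_bigrams("the cat sat on the mat the cat ran")
print(next_word_probs(model, "the"))  # "cat" with probability 2/3, "mat" with 1/3
print(generate(model, "the", 5))
```

Note that with a tiny corpus the model can only recombine sequences it has seen, which is the intuition behind both sides of the argument: the statistics are not the text itself, yet they can reproduce it.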

planish@sh.itjust.works · 1 year ago

Why, though?

Is it because we can’t explain the causal relationships between the words in the text and the human’s output or actions?

If a very good neuroscientist traced out the engineer’s brain and could prove that, actually, if it wasn’t for the comma on page 73 they wouldn’t have used exactly this kind of bolt in the bridge, now is the human’s output derivative of the text?

Any rule we make here should treat people who are animals and people who are computers the same.

And even regardless of that principle, surely a set of AI weights is either not copyrightable or else a sufficiently transformative use of almost anything that could go into it? If it decides to regurgitate what it read, that output could be infringing, same as for a human. But a mere but-for causal connection between one work and another can’t make text that would be non-infringing if written by a human suddenly infringing because it was generated automatically.

Melllvar@startrek.website · 1 year ago

Because word-use probabilities in a text are not the same thing as the information expressed by the text.

Any rule we make here should treat people who are animals and people who are computers the same.

W-what?

Tangent5280@lemmy.world · 1 year ago

I think what he meant was that we should treat an AI the same way we treat people: if a person making a derivative work can be hit with a copyright strike, then so should an AI making a derivative work. The same rule should apply to all creators, regardless of whether they are an AI or not.

planish@sh.itjust.works · 1 year ago

In the future, some people might not be human. Or some people might be mostly human, but use computers to do things like fill in for pieces of their brain that got damaged.

Some people can’t recognize faces, for example, but computers are great at that now, and Apple has that thing that is Google Glass but better. But a law against doing facial recognition with a computer, allowing it to be done only with a brain, would prevent that solution from working.

And currently there are a lot of people running around trying to legislate exactly how people’s human bodies are allowed to work inside, over those people’s objections.

I think we should write laws on the principle that anybody could be a human, or a robot, or a river, or a sentient collection of bees in a trench coat, that is 100% their own business.

Melllvar@startrek.website · edit-2 1 year ago

But the subject under discussion is large language models that exist today.

I think we should write laws on the principle that anybody could be a human, or a robot, or a river, or a sentient collection of bees in a trench coat, that is 100% their own business.

I’m sorry, but that’s ridiculous.

planish@sh.itjust.works · 1 year ago

I have indeed made a list of ridiculous and heretofore unobserved things somebody could be. I’m trying to gesture at a principle here.

If you can’t make your own hormones, store bought should be fine. If you are bad at writing, you should be allowed to use a computer to make you good at writing now. If you don’t have legs, you should get to roll, and people should stop expecting you to have legs. None of these differences between people, or in the ways that people choose to do things, should really be important.

Is there a word for that idea? Is it just what happens to your brain when you try to read the Office of Consensus Maintenance Analog Simulation System?

Melllvar@startrek.website · edit-2 1 year ago

The issue under discussion is whether or not LLM companies should pay royalties on the training data, not the personhood of hypothetical future AGIs.

Fedizen@lemmy.world · 1 year ago

Plagiarism filters frequently trigger on ChatGPT-written books and articles.

TendieMaster69@midwest.social · 1 year ago

Yea sure, right after Google and Amazon pay me for all the data they’ve stolen from me. LOL

FontMasterFlex@lemmy.world · 1 year ago

So what’s the difference between a person reading their books and using the information within to write something and an ai doing it?

Saneless@lemmy.world · 1 year ago

Because AIs aren’t inspired by anything and they don’t learn anything

r1veRRR@feddit.de · 1 year ago

So uninspired writing is illegal?

dan@lemm.ee · 1 year ago

No but a lazy copy of someone else’s work might be copyright infringement.

Odusei@lemmy.world · 1 year ago

So when does Kevin Costner get to sue James Cameron for his lazy copy of Dances With Wolves?

Telodzrum@lemmy.world · 1 year ago

Avatar is not Dances with Wolves. It’s Ferngully.

dan@lemm.ee · 1 year ago

Idk, maybe. There are thousands of copyright infringement lawsuits, sometimes they win.

I don’t necessarily agree with how copyright law works, but that’s a different question. Doesn’t change the fact that sometimes you can successfully sue for copyright infringement if someone copies your stuff to make something new.

tenitchyfingers@lemmy.world · 1 year ago

Why not? Hollywood is full to the brim with people suing for copyright infringement. And sometimes they win. Why should it be different for AI companies?

lily33@lemmy.world · edit-2 1 year ago

Language models actually do learn things in the sense that the information encoded in the trained model isn’t usually* taken directly from the training data; instead, it’s information that describes the training data, but is new. That’s why it can generate text that’s never appeared in the data.

*The bigger models seem to remember some of the data and can reproduce it verbatim, but that’s not really the goal.

Chailles@lemmy.world · 1 year ago

What does inspiration have to do with anything? And to be honest, humans being inspired has led to far more blatant copyright infringement.

As for learning, they do learn. No different than us, except we learn silly abstractions to make sense of things while AI learns from trial and error. Ask any artist whether they’ve ever looked at someone else’s work to figure out how to draw something. Even if they’re not explicitly looking up a picture, if they’ve ever seen a depiction of it, they recall and use that. Why is it wrong if an AI does the same?

vrighter@discuss.tchncs.de · 1 year ago

the person bought the book before reading it

FontMasterFlex@lemmy.world · 1 year ago

Not if I checked it out from a library. A WORLD of knowledge at your fingertips and it’s all free to me, the consumer. So who’s to say the people training the AI didn’t check it out from a library, or even buy the books they are using to train the AI with? Would you feel better about it had they purchased their copy?

Melllvar@startrek.website · edit-2 1 year ago

Large language models can only calculate the probability that words should go together based on existing texts.
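To make the “probability that words go together” idea concrete, here is a toy sketch: a bigram model that counts which word follows which in a tiny made-up corpus and turns those counts into next-word probabilities. This is an illustration under loose assumptions, not how GPT-style models work internally (they use neural networks over tokens rather than count tables), but they likewise output a probability distribution over what comes next. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count how often each word is immediately followed by each other word."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Normalize the follow-counts for `word` into a probability distribution."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


corpus = "the cat sat on the mat and the cat ran"
model = train_bigrams(corpus)
print(next_word_probs(model, "the"))  # cat: 2/3, mat: 1/3
```

A real LLM replaces the count table with learned weights, which is what lets it assign sensible probabilities to word sequences it never saw verbatim.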

trainsaresexy@lemmy.world · 1 year ago

Isn’t this correct? What’s missing?

Let’s ask chatGPT3.5:

Mostly accurate. Large language models like me can generate text based on patterns learned from existing texts, but we don’t “calculate probabilities” in the traditional sense. Instead, we use statistical methods to predict the likelihood of certain word sequences based on the training data.

Melllvar@startrek.website · 1 year ago

“Mostly accurate” is pretty good for an anonymous internet post.

trainsaresexy@lemmy.world · 1 year ago

I thought so too so I’m still confused about the votes. Oh well

BakonGuy@lemmy.world · 1 year ago

I don’t see how “calculate the probability” and “predict the likelihood” are different. Seems perfectly accurate to me.

tenitchyfingers@lemmy.world · 1 year ago

A person is human and capable of artistry and creativity, computers aren’t. Even questioning this just means dehumanizing artists and art in general.

FontMasterFlex@lemmy.world · 1 year ago

Not being allowed to question things is a really shitty precedent, don’t you think?

tenitchyfingers@lemmy.world · 1 year ago

Do you think a hammer and a nail could do anything on their own, without a hand picking them up and guiding them? Because that’s what a computer is. Nothing wrong with using a computer to paint or write or record songs or create something, but it has to be YOU creating it, using the machine as a tool. It’s also in the actual definition of the word: art is made by humans. Which explicitly excludes machines. Period. Like I’m fine with AI when it SUPPORTS an artist (although sometimes it’s an obstacle because sometimes I don’t want to be autocorrected, I want the thing I write to be written exactly as I wrote it, for whatever reason). But REPLACING an artist? Fuck no. There is no excuse for making a machine do the work and then to take the credit just to make a quick easy buck on the backs of actual artists who were used WITHOUT THEIR CONSENT to train a THING to replace them. Nah fuck off my guy. I can clearly see you never did anything creative in your whole life, otherwise you’d get it.

FontMasterFlex@lemmy.world · 1 year ago

Nah fuck off my guy. I can clearly see you never did anything creative in your whole life, otherwise you’d get it.

Oh, right. So I guess my 20+ year Graphic Design career doesn’t fit YOUR idea of creative. You sure have a narrow life view. I don’t like AI art at all. I think it’s a bad idea. You’re a bit too worked up about this to try to discuss anything. Not too excited about getting told to fuck off over an opinion. This place is no better than reddit ever was.

tenitchyfingers@lemmy.world · 1 year ago

Of course I’m worked up. I love art, I love doing art, I have multiple friends and family members who work with art, and art is the last genuine thing that’s left in this economy. So yeah, obviously I’m angry at people who don’t get it and celebrate this bullshit just because they are too lazy to pick up a pencil, get good and draw their own shit, or alternatively commission what they wanna see from a real artist. Art was already PERFECT as it was, I have a right to be angry that tech bros are trying to completely ruin it after turning their nose up at art all their lives. They don’t care about why art is good? Ok cool, they can keep doing their graphs and shit and just leave art alone.

linearchaos@lemmy.world · 1 year ago

I don’t know how I feel about this honestly. AI took a look at the book and added the statistics of all of its words into its giant statistic database. It doesn’t have a copy of the book. It’s not capable of rewriting the book word for word.

This is basically what humans do. A person reads 10 books on a subject, studies, becomes somewhat of a subject-matter expert, and writes their own book.

Artists use reference art all the time. As long as they don’t get too close to the original reference, nobody raises any flags.

These people are scared for their viability in their user space and they should be, but I don’t think trying to put this genie back in the bottle or charging people extra for reading their stuff for reference is going to make much difference.

BartsBigBugBag@lemmy.tf · 1 year ago

It’s not at all like what humans do. It has no understanding of any concepts whatsoever, it learns nothing. It doesn’t know that it doesn’t know anything even. It’s literally incapable of basic reasoning. It’s essentially taken words and converted them to numbers, and then it examines which string is likely to follow each previous string. When people are writing, they aren’t looking at a huge database of information and determining the most likely word to come next, they’re synthesizing concepts together to create new ones, or building a narrative based on their notes. They understand concepts, they understand definitions. An AI doesn’t, it doesn’t have any conceptual framework, it doesn’t even know what a word is, much less the definition of any of them.

oce 🐆@jlai.lu · edit-2 1 year ago

How can you tell that our thoughts don’t come from a biological LLM? Maybe what we conceive as “understanding” is just a feeling emerging from a more fundamental mechanism, like temperature emerges from the movement of particles.

Telodzrum@lemmy.world · 1 year ago

Because we have biological, developmental, and psychological science telling us that’s not how higher-level thinking works. Human brains have the ability to function on a sort of autopilot similar to “AI”, but that is not what we are describing when we speak of creative substance.

chicken@lemmy.dbzer0.com · edit-2 1 year ago

When people are writing, they aren’t looking at a huge database of information and determining the most likely word to come next, they’re synthesizing concepts together to create new ones, or building a narrative based on their notes. They understand concepts, they understand definitions.

A huge part of what we do is like drawing from a huge mashup of accumulated patterns though. When an image or phrase pops into your head fully formed, on the basis of things that you have seen and remembered, isn’t that the same sort of thing as what AI does? Even though there are (poorly understood) differences between how humans think and what machine learning models do, the latter seems similar enough to me that most uses should be held to the same standard for plagiarism: considered a violation only if the end product is excessively similar to a specific copyrighted work, and not merely because you saw a copyrighted work and that pattern being in your brain affected what stuff you spontaneously think of.

planish@sh.itjust.works · 1 year ago

I don’t think this is true.

The models (or maybe the characters in the conversations simulated by the models) can be spectacularly bad at basic reasoning, and misunderstand basic concepts on a regular basis. They are of course completely insane; the way they think is barely recognizable.

But they also, when asked, are often able to manipulate concepts or do reasoning and get right answers. Ask it to explain the water cycle like a pirate, and you get that. You can find the weights that make the Eiffel Tower be in Paris and move it to Rome, and then ask for a train itinerary to get there, and it will tell you to take the train to Rome.

I don’t know what “understanding” something is other than to be able to get right answers when asked to think about it. There’s some understanding of the water cycle in there, and some of pirates, and some of European geography. Maybe not a lot. Maybe it’s not robust. Maybe it’s superficial. Maybe there are still several differences in kind between whatever’s there and the understanding a human can get with a brain that isn’t 100% the stream of consciousness generator. But not literally zero.

linearchaos@lemmy.world · 1 year ago

I didn’t say what you said, that’s a lot of words and concepts you’re attributing to me that I didn’t say.

I’m saying, LLM ingests data in a way it can average it out; in essence it learns it. It’s not rote memorization, but it’s not truly reasoning either, though it’s approaching it if you consider we might be overestimating human comprehension. It pulls in the data from all the places and uses the data to create new things.

People pull in data over a decade or two, we learn it, then end up writing books, or applying the information to work. They’re smart and valuable people and we’re glad they read everyone’s books.

The LLM ingests the data and uses the statistics behind it to do work, the world is ending.

Buttons@programming.dev · 1 year ago

I think you underestimate the reasoning power of these AIs. They can write code, they can teach math, they can even learn math.

I’ve been using GPT4 as a math tutor while learning linear algebra, and I also use a textbook. The textbook told me that (to write it out) “the column space of matrix A is equal to the column space of matrix A times its own transpose”. So I asked GPT4 if that was true and it said no; GPT disagreed with the textbook. This was apparently something that GPT did not memorize and it was not just regurgitating sentences. I told GPT I saw it in a textbook, and the AI said “sorry, the textbook must be wrong”. I then explained the mathematical proof to the AI, and the AI apologized, admitted it had been wrong, and agreed with the proof. Only after hearing the proof did the AI agree with the textbook. This is some pretty advanced reasoning.

I performed that experiment a few times and it played out mostly the same. I experimented with giving the AI a flawed proof (I purposely made mistakes in the mathematical proofs), and the AI would call out my mistakes and would not be convinced by faulty proofs.

A standard that judged this AI to have “no understanding of any concepts whatsoever”, would also conclude the same thing if applied to most humans.
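For anyone curious about the textbook claim in this anecdote, here is one standard argument that $\operatorname{col}(AA^\top) = \operatorname{col}(A)$, assuming $A$ is a real $M \times N$ matrix (for complex matrices, replace $A^\top$ with the conjugate transpose). The poster doesn’t say which proof they gave GPT-4, so this is a sketch of a common one, not theirs.

```latex
\begin{aligned}
&\text{(1) } \operatorname{col}(AA^\top) \subseteq \operatorname{col}(A), \text{ since } AA^\top y = A(A^\top y) \text{ for every } y \in \mathbb{R}^M.\\
&\text{(2) } AA^\top y = 0 \implies y^\top AA^\top y = \lVert A^\top y \rVert^2 = 0 \implies A^\top y = 0,\\
&\qquad \text{and the reverse inclusion is immediate, so } \operatorname{null}(AA^\top) = \operatorname{null}(A^\top).\\
&\text{(3) } \operatorname{rank}(AA^\top) = M - \dim\operatorname{null}(AA^\top) = M - \dim\operatorname{null}(A^\top) = \operatorname{rank}(A^\top) = \operatorname{rank}(A).\\
&\text{(4) } \text{A subspace of } \operatorname{col}(A) \text{ with the same dimension is all of } \operatorname{col}(A), \text{ so } \operatorname{col}(AA^\top) = \operatorname{col}(A).
\end{aligned}
```

Step (2) is where “real” matters: $\lVert A^\top y \rVert^2 = 0$ forces $A^\top y = 0$ only because a sum of real squares vanishes only when every term does.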

unlimitedolm_sjw@sh.itjust.works · edit-2 1 year ago

That doesn’t prove that GPT is reasoning; its model predicts that those responses are the most likely given the messages you’re sending it. It’s read thousands of actual conversations with people stating something incorrect, then having it explained to them and them coming around and admitting they were wrong.

I’ve seen other similar cases where the AI is wrong about something, and when it’s explained, it just doubles down. Because humans do that type of thing too, refusing to admit they’re wrong.

The way it’s designed means that it cannot reason in the same way humans experience it. It can simulate a likely conversation someone would have if they could reason.

Buttons@programming.dev · edit-2 1 year ago

You know, I also had to experience thousands of conversations before I could mimic a conversation. Maybe I’m just slow, but it took like a full year of dedicated practice before I could speak a single word. Eventually I experienced so many conversations and became so good at mimicking the conversations I had heard from others that I started having actual conversations.

Most anecdotes about AI getting things wrong are from GPT3. GPT4 still has flaws of course, just like people. This technology didn’t exist at the beginning of the year.

Here’s an example while learning linear algebra where I tried to throw it off:

I say> So I’ll say A_{M x N} to mean a matrix of size M x N. Similarly, I’ll also say x_{M x 1} to mean a vector of a certain length. The result of A_{M x N} x_{N x 1} is a linear combination of the columns in A. A has N columns. The result is a M x 1 matrix (or vector) in M dimensional space. Please be very brief, only confirming whether I am correct or incorrect until I explicitly ask for more details.

GPT4 says> Yes, your understanding is correct. The product of an M x N matrix A and an N x 1 vector x is indeed an M x 1 vector, and it is a linear combination of the columns of A. <end quote>

But then I try to trick it, just for fun, and I very subtly change my prompt. See if you can spot the difference.

I say> So I’ll say A_{M x N} to mean a matrix of size M x N. Similarly, I’ll also say x_{M x 1} to mean a vector of a certain length. The result of A_{M x N} x_{N x 1} is a linear combination of the columns in A. A has N columns. The result is a N x 1 matrix (or vector) in N dimensional space. Please be very brief, only confirming whether I am correct or incorrect until I explicitly ask for more details.

GPT says> Almost correct. The result of A_{M x N} x_{N x 1} is actually an M x 1 vector (or matrix), not N x 1. The resulting vector lives in the column space of A, which is a subspace of R^M, not R^N. <end quote>

I guess everyone can judge for themselves whether that’s the result of a statistical model or genuine understanding. (And to be clear, the mathematical advice it’s giving here is correct.)
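The dimension point GPT-4 checks in that exchange can be sketched in a few lines of plain Python (no NumPy, and the function name is made up for illustration): multiplying an M x N matrix by a length-N vector means adding up N columns, each of which has M entries, so the result has M entries and lives in R^M.

```python
def matvec_as_column_combination(A: list[list[float]], x: list[float]) -> list[float]:
    """Compute A x as x[0]*col_0 + x[1]*col_1 + ... + x[N-1]*col_(N-1).

    A is M x N (a list of M rows, each of length N) and x has length N.
    The result has length M, i.e. it lives in R^M, not R^N.
    """
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError(f"x must have length N={n}, got {len(x)}")
    result = [0.0] * m
    for j in range(n):          # walk over the N columns of A
        for i in range(m):      # add x[j] times column j into the M-long result
            result[i] += x[j] * A[i][j]
    return result


A = [[1, 2, 3],
     [4, 5, 6]]                 # M = 2 rows, N = 3 columns
x = [1, 0, -1]                  # length N = 3
y = matvec_as_column_combination(A, x)
print(len(y), y)                # 2 [-2.0, -2.0]
```

Computing it column by column, rather than as row-by-row dot products, gives the same numbers but makes the “linear combination of the columns of A” reading explicit.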

foo@programming.dev · 1 year ago

They can write code and teach maths because they’ve read people doing the exact same stuff.

Buttons@programming.dev · edit-2 1 year ago

Hey, that’s the same reason I can write code and do maths!

I’m serious, the only reason I know how to code or do math is because I learned from other people, mostly by reading. It’s the only reason I can do those things.

Telodzrum@lemmy.world · 1 year ago

It’s just a really big autocomplete system. It has no thought, no reason, no sense of self or anything, really.

Buttons@programming.dev · edit-2 1 year ago

I guess I agree with some of that. It’s mostly a matter of definition though. Yes, if you define those terms in such a way that AI cannot fulfill them, then AI will not have them (according to your definition).

But yes, we know the AI is not “thinking” or “scheming”, because it just sits there doing nothing when it’s not answering a question. We can see that no computation is happening. So no thought. Sense of self… probably not, depends on definition. Reason? Depends on your definition. Yes, we know they are not like humans, they are computers, but they are capable of many things which we thought only humans could do 6 months ago.

Since we can’t agree on definitions I will simply avoid all those words and say that state-of-the-art LLMs can receive text and make free form, logical, and correct conclusions based upon that text at a level roughly equal to human ability. They are capable of combining ideas together that have never been combined by humans, but yet are satisfying to humans. They can invent things that never appeared in their training data, but yet make sense to humans. They are capable of quickly adapting to new data within their context, you can give them information about a programming language they’ve never encountered before (not in their training data), and they can make correct suggestions about that programming language.

I know you can find lots of anecdotes about LLMs / GPT doing dumb things, but most of those were GPT3 which is no longer state-of-the-art.

joe@lemmy.world · 1 year ago

All this copyright/AI stuff is so silly and a transparent money grab.

They’re not worried that people are going to ask the LLM to spit out their book; they’re worried that they will no longer be needed because an LLM can write a book for free. (I’m not sure this is feasible right now, but maybe one day?) They’re trying to strangle the technology in the courts to protect their income. That is never going to work.

Notably, there is no “right to control who gets trained on the work” aspect of copyright law. Obviously.

DandomRude@lemmy.world · 1 year ago

There is nothing silly about that. It’s a fundamental question about using content of any kind to train artificial intelligence that affects way more than just writers.

Flying Squid@lemmy.world · 1 year ago

I seriously doubt Sarah Silverman is suing OpenAI because she’s worried ChatGPT will one day be funnier than she is. She just doesn’t want it ripping off her work.

joe@lemmy.world · 1 year ago

What do you mean when you say “ripping off her work”? What do you think an LLM does, exactly?

Flying Squid@lemmy.world · 1 year ago

In her case, taking elements of her book and regurgitating them back to her. Which sounds a lot like they could be pirating her book for training purposes to me.

Rodeo@lemmy.ca · 1 year ago

How do you know they didn’t just buy the book?

Thousands of authors demand payment from AI companies for use of copyrighted works | CNN Business