@hunterhacker @ben it's that chatgpt is fundamentally built off copyright infringement and theft. even if in this situation there's no profit being taken, in other situations there absolutely is. openai is fundamentally scummy, and it's good to push back if you can.
@hunterhacker
@wuppy @ben
Yup. OpenAI refuses to reveal what was in their training data.
You may be thankful for an answer to a question but thankful to whom? ChatGPT generates answers, claiming them as its own creation, and OpenAI gets the credit. At least pre-2023 search engines directed you to the original source.
When asked to create a new game that never existed before, ChatGPT regurgitated someone else's game idea and gave it a different name.
https://gizmodo.com/chatgpt-copy-sumplete-puzzle-game-summer-rullo-1850212198
@bornach @hunterhacker @wuppy @ben Aren't they legally required to give credit to the original material?
@hunterhacker @ben @tuzu @bornach I'm not a lawyer but no, they're not required to credit training data. the training data does not show up in any real form in the final product, it was merely used to influence some values. chatgpt doesn't contain within it the entirety of any news article or paper it's "read", it merely absorbed that data in order to influence the likelihood of it generating certain words. AI is in a sort of weird legal ground where it requires massive amounts of copyrighted data to work, but the final software contains no copyrighted data. the training data is some hundreds of terabytes or something, while chatgpt itself is less than one. (i may be wrong about this i didn't double check but im pretty sure this is why openai is both allowed to use copyrighted material and doesn't have to tell anybody what they used)
@wuppy @hunterhacker @ben @bornach I know but it often just gives someone's copyrighted material as a big part of the response.
@ben @hunterhacker @bornach @tuzu but because of the nature of neural nets it's basically impossible to prove that it is just copying something. unfortunately, it's very hard to police
@wuppy @ben @hunterhacker @tuzu
I'm gonna go train a deep neural network autoencoder using a recent Disney film as training data then distribute the weights and biases on github. I bet Disney lawyers will find a way to prove that its linear algebraic contortions through latent space didn't magically strip their movie of all copyright protections.
@bornach @wuppy @ben @hunterhacker I think this is the issue: specifically, whether copyright applies when a work was influenced by someone else's work and then recreates that work without directly copying it.
@wuppy @ben @hunterhacker @bornach @tuzu I think if the output were more likely to match a significant portion of the input, then it would probably be more clear, legally speaking, that it's copyright infringement, regardless of what the model is actually doing under the hood. But as things stand, it's not *that* common for a language model to emit output that just reproduces part of the input, which makes it less clear-cut.
@diazona @wuppy @ben @hunterhacker @tuzu
How common it is just depends on how niche the topic you're prompting the LLM about is
https://youtu.be/xbf4BGIBENk?t=7m18s
See clip at 7:18
In spite of the Reinforcement Learning from Human Feedback fine-tuning that is often applied to LLMs in order to align them with chatbot behaviour, they still respond well to text completion tasks.
Several examples of this here:
https://www.patronus.ai/blog/introducing-copyright-catcher
@wuppy @hunterhacker @ben @tuzu @bornach
"the training data does not show up in any real form in the final product"
I currently don't believe the above statement is true: https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-lawsuit-getty-images-stable-diffusion
@tuzu @bornach @hunterhacker @wuppy @ben That's the million dollar question. To me the answer feels like an obvious yes, but I don't think it's been tested in court yet. In the case of Stack Overflow, everything on the website is by default licenced under CC BY-SA (revision depends on when it was posted), which requires attribution. But I'm not a lawyer. I suspect the outcome will depend on how successful AI companies are in their lobbying efforts.
@newbyte @ben @hunterhacker @bornach @tuzu there's a legal battle ongoing rn between NYT and openai that probably sets the precedent for the future
@wuppy @hunterhacker @ben and with ChatGPT the answer gets mangled into something subtly dangerous
@wuppy @hunterhacker @ben Built off copyright infringement, theft and deliberate destruction of the planet. People who are okay with AI / LLMs but think everyone should drive EVs need to get a reality check, imho.