The privacy issue isn’t just about the data being available as people’s names, addresses, and phone numbers are generally available. The issue is if they show up as part of some meme chat and then you as the LLM creator get sued because people start harassing them.
In terms of copyright infringement the bar is quite low, and copying is a basic part of how these algorithms work. This may or may not be an issue for you personally but it is a large land mine for commercial use especially if you’re independently creating one of these systems.
In terms of copyright infringement the bar is quite low, and copying is a basic part of how these algorithms work. This may or may not be an issue for you personally but it is a large land mine for commercial use especially if you’re independently creating one of these systems.