There’s a narrative that keeps coming up in discussions about AI: that it’s all built on theft. That the models powering Claude, Cursor and others were trained on open source code without permission. That developers never consented to having their work scraped and fed into these systems.
I get it. And yes, there’s something uncomfortable about the fact that nobody asked. But here’s the thing: we’ve been here before.
When search engines started indexing the web, they didn’t ask permission from every website owner. When Google scanned millions of books, publishers and authors sued. When hip-hop artists began sampling existing records, it was called stealing before it became an art form with its own licensing systems.
Now, I know what you’re thinking: those comparisons aren’t quite fair. Google gives you traffic in return. Sampling led to licensing deals. But OpenAI, Anthropic and others? They’re making billions while the developers whose code they trained on see nothing in return.
That’s a valid point. And I don’t think we should pretend it isn’t.
I publish open source code myself. I release it under licenses that explicitly allow others to learn from it and build upon it. The mass-scale training of AI models wasn’t something any of us anticipated. But the principle of sharing knowledge so others can build on it? That’s been the heart of open source from the start.
Was training AI models on that code a grey area? Probably. Is it clearer in hindsight than it was at the time? Definitely. Should there be conversations about how value gets distributed? Absolutely.
For me, this ship has sailed. Not in the sense that we should forget about it or shrug it off, but in the sense that letting it constantly loom over what AI is capable of today has become a non-argument. The question isn’t whether the foundation was messy. It’s what we build from here.