The PivotNine Blog

The Value of GitHub CoPilot Is In Search

23 June 2022
Justin Warren

Steve O'Grady from RedMonk has written a nice piece on GitHub CoPilot that covers a lot of the recent discussion about the technology.

There were a couple of aspects that O'Grady didn't dig into (which is fair, because he covered a lot in the piece) that I think are important, so here are my thoughts on it.

Legality

O'Grady quotes GitHub's assertion about the code:

I've emphasised a slightly different section because I didn't see O'Grady tackle this part.

To me, GitHub appears to be trying to sidestep questions of liability for itself and to ensure that, if there are any legal questions about the code that comes out of CoPilot, they are entirely the customer's problem.

As a customer, that's probably something you should be thinking about, possibly with the help of your own legal team.

I agree with O'Grady that GitHub is “saying implicitly that the license of the code that was used to train the system is immaterial”. I agree that's the claim. Whether or not this is actually true remains to be seen, as most of the legal analysis I've read says that we really need more case law on this before we'll know.

GitHub is laying the groundwork to ensure that this isn't a question it needs to answer. Any future lawsuit is more likely to be aimed at a CoPilot customer who used code that someone else asserts is infringing. Largely that's because Microsoft has a lot of money and lawsuits are expensive. A party would need to have something pretty big to gain from suing Microsoft for it to be worth the hassle. But Independent Software Developers, Inc.? They might be easier to defeat and worth the shot.

I'm unclear on whether “we used CoPilot” would be sufficient for you to disclaim liability for code that GitHub asserts is yours and that you are wholly responsible for. This is a novel situation in many ways, so until courts have a chance to test the claims and make some rulings, I'm not as confident as others that there'll be no problems. I don't think there will be problems for Microsoft but I'm less confident for CoPilot customers.

So this ultimately comes down to a risk/reward calculation. Are you willing to take the risk that CoPilot might generate some code that violates copyright and embed that code into your project?

I bet the answer for most people will be: yes.

If we look at the history of tech, I don't see a lot of evidence of the tech industry paying much attention to the law when the perceived private benefits of breaking it are large enough and the expected cost of a penalty (its likelihood weighted by its size) is low.

Overall, I think the success or otherwise of CoPilot will rely more on its usefulness to companies writing commercial code than on whether or not it's technically legal to use it. And if it is eventually found to be illegal, by then it will already be too late and we won't get the genie back in the bottle.

The Value of CoPilot

I'm still unclear on the real-life value of CoPilot. Using pre-generated code snippets that match what it looks like you're doing is already something we do now. Compiler optimisation exists. Standard libraries exist. JavaScript frameworks not only exist, they exist in ever-increasing abundance. CoPilot seems less like using a standard framework that everyone else in the industry knows and more like cutting and pasting bits of the library into your own weird codebase structure.

It's possible that GitHub is specifically keeping the snippet length fairly short to avoid legal issues around derivative works. If the law became clearer, that might pave the way for longer code blocks to be provided, but then isn't that just inlining code instead of using a library? Why would you do that? You're making someone else's maintenance headache your own while also undermining the principle of loosely-coupled modularity that is at the heart of modern software.

The longer-term value of CoPilot (which is what's needed to keep people paying $10 a month to subscribe to it) is more likely in a form of code search. Discovering that someone else already solved your problem is currently tricky, and it is, I'd argue, the main value of open source software. You can build on the shoulders of giants without having to pay royalties to everyone in the pyramid of shoulder-standers.

But small snippets of code are, by their very nature, fairly trivial. Using CoPilot doesn't seem as transformative as, say, adopting Ruby on Rails or Heroku, and it costs more. Slightly fancy tab-completion doesn't feel like enough value unless what is found helps me to solve more than just “set up the usual boilerplate here, please”.

So the value of CoPilot is less about the code and more about the humans writing code. And it's about showing those humans code they haven't seen before, but that already exists. It's about search and discovery.

I think the real value of CoPilot will be in providing developers with real-time, in-context hints about different or better ways of achieving their goal that other people have already discovered. CoPilot isn't generating new ideas; it's just showing you ideas that already exist, but that you didn't already know.

Ultimately, the value of CoPilot will be that it provides on-the-job training to help developers become better developers. If it can pull this off, it will indeed be a great boon.

I started this article feeling pretty gloomy and down on the idea of CoPilot, and I've managed to change my own mind by thinking this through. I still think there are some legal risks for people using CoPilot, and some moral quandaries to navigate, but I think there's a way there. If Microsoft can re-position how people think about CoPilot, and if it can deliver on the promise I just highlighted, it could be on a winner.

Microsoft and GitHub are not currently PivotNine customers.