Hacker News

At some level, all software will be too complex to understand easily within one sitting. That is just the nature of life. In my opinion, learning to read someone else's code is a sign of maturity; resisting the urge to churn or replace something just because you don't understand it is very mature.

While I somewhat agree there are levels of obfuscation, something being hard to understand on the first go isn't sufficient to make it "non-free" in my interpretation, like the b interpreter.




> While I somewhat agree there are levels of obfuscation, something being hard to understand on the first go isn't sufficient to make it "non-free" in my interpretation, like the b interpreter.

The GPLv2 contains this sentence:

> The source code for a work means the preferred form of the work for making modifications to it.

And I think that's really the fundamental question at hand. If the original author(s) actually write in a particular style (I guess the b interpreter is a case of this?) then it's ... kind of annoying IMO, but it isn't nonfree. On the other hand, if the original authors wrote another representation and then intentionally obfuscated it before publishing, that's clearly aimed at reducing others' ability to exercise the 4 freedoms, and that's a problem. And I do think there's some grey area in the middle: things like $BIG_COMPANY releasing the source code for something where, even if they have no intention of making it hard for others to work with, the code was written for internal use and is tied to internal build processes purely as a practical matter.


Yes, distributing modifications to GPLed code only in obfuscated form is a clear GPL violation.

Distributing ugly and unmaintainable code because that's the only code you have is allowed (and sometimes that happens because the code works just well enough to get an academic paper published and isn't going to be polished any more than that).


Eventually, we will rely on machines to help us understand software code. Think "fill in missing code comments with GPT-3." Now add 10 years of focused research to make this more reliable and intelligible. It must be inevitable, right?


Copilot can fill in type annotations in Python, “resolve” some bugs, explain code, “make it more robust”. These features are available in Copilot Labs; they are obviously very much WIP, but they are definitely not useless.


Type inference really doesn't require ML (and is probably better done without it in order to be robust).

I think the Kolmogorov complexity of serious production codebases is very high -- in order to convey what the codebase does, you have to transmit roughly as many bits as the codebase itself. In most companies, you already have a way to get a "compact, human-readable description" of what the code does -- it's called asking your coworker, or reading your internal documentation. Ramp-up is still hard.


> Type inference really doesn't require ML

This holds only if you have sufficient context; feeding what is virtually an untyped function into e.g. mypy will not produce anything meaningful. Even languages with rich type systems like Haskell require some types - even implicitly - to infer everything else.
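A minimal sketch of that context problem (the function name and call sites here are made up for illustration): nothing in this function's body pins down a single type for its parameters, so a purely local inferencer can only assign something as wide as `Any`, while the call sites show why several types are all legitimate.

```python
# Without seeing call sites, the body alone admits many types for a and b:
# int, str, list, and anything else supporting "+". A local-only checker
# (e.g. mypy run on just this definition) cannot pick one.
def add(a, b):
    return a + b

# All of these are valid at runtime, which is exactly the ambiguity:
print(add(1, 2))          # ints
print(add("foo", "bar"))  # strings
print(add([1], [2]))      # lists
```

A tool with broader context (other call sites in the project, similar code seen in training) has strictly more information to narrow this down than a local checker does.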

In contrast, Copilot has access to significantly more information when emitting the type-annotated code, including context about what the project is like and how other functions are defined, plus all the prior knowledge accumulated during training.

> I think the Kolmogorov complexity of serious production codebases is very high

Do serious codebases exceed gigabytes in raw storage?

> in order to convey what the codebase does

Doesn't this assume no priors / blank slate?


> Even languages with rich type systems like Haskell do require some types - even implicitly - to infer everything else.

If the Haskell compiler is spitting out "ambiguous type variable arising from a use of +" or whatever that error message is, it's a sign that the author doesn't understand what they're asking for. Taking a language whose value proposition is "I refuse to compile your code if there is the slightest indication that it's not perfect" and slapping a fuzzy ML model on top of it to suppress a class of errors is not a good idea.

Type systems exist as a way of ensuring that the programmer's mental model matches the code. Offloading type annotations to something else removes that safety from you.

> no priors

Sure? I don't see how that changes anything. My point is just that we already have very high quality summarizers of code, but software development is still hard. Making a model that attempts to approximate the summarizers of code that we already have isn't going to help much, no matter how good it gets.


> and slapping a fuzzy ML model on top of it to suppress a class of errors is not a good idea.

I am not saying that that is a good idea; I am pointing out that there are limits on what one can infer without broader context.

> Offloading type annotations to something else removes that safety from you.

Counterargument: the annotations can be used as a second sanity check. If the inferred types match your mental model - and the type inference model is good - then you know that your mental model is correct and you didn't miss something.
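That sanity check can be made mechanical. A hypothetical sketch (the function and the "expected" types are invented for illustration): treat a tool-supplied annotation as data, via the standard `typing.get_type_hints`, and compare it against what you believed the function's types were.

```python
from typing import get_type_hints

# Imagine a tool annotated this function for you:
def parse_port(value: str) -> int:
    return int(value)

# Your mental model, written down explicitly:
expected = {"value": str, "return": int}

# Cross-check the annotations against your model. If you had believed
# parse_port returned a str, this comparison would flag the mismatch.
hints = get_type_hints(parse_port)
assert hints == expected
```

The point is not that the tool is authoritative; it's that a disagreement between the two is a cheap signal that either your model or the code deserves another look.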





