No, it isn't computable in general (that is, correctly determining one of "these functions behave the same" or "these functions behave differently", never "unknown"), as it is equivalent to the halting problem. Consider these two versions of a function: are they API compatible?
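Roughly this shape (a sketch; someArbitraryProgram is a hypothetical stand-in for any computation whatsoever):

version 1:

foo :: Int
foo = 1

version 2:

-- same result, but only if someArbitraryProgram halts
foo :: Int
foo = someArbitraryProgram `seq` 1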
They're only truly API compatible if the program halts, but the program can be arbitrary, so proving that any two 'foo's of this style are equivalent amounts to solving the halting problem.
Of course, one can still likely get useful answers of "definitely incompatible" etc. with much more tractable analyses. AIUI, the Elm version just looks at the function signature and catches things like removing or adding arguments: for appropriately restricted languages, it may even be possible to determine whether downstream code will still compile, but that's no guarantee it will behave correctly.
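For instance, a signature-only diff catches this kind of change, because every existing call site stops compiling (a sketch with a hypothetical function, not Elm's actual output):

version 1:

greet :: String -> String
greet name = "Hello, " ++ name

version 2 (an argument was added, so greet "world" no longer type-checks downstream):

greet :: Bool -> String -> String
greet loud name = "Hello, " ++ name ++ (if loud then "!" else "")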
It's been a while since I studied this, but linear bounded automata (LBAs) are arguably a better model for the real computers we can actually build, and the halting problem is decidable for LBAs.
That said, in practice I'm not sure it is too useful (this is a slightly different question to the one originally asked, though). My understanding is that the proof of decidability is essentially a pigeonhole argument based on an LBA having a finite number of states: run the LBA for that many steps; if it hasn't halted by then, it has looped. Even if a program is running on a system that only allows programs to use 1GB of memory, there are 2^(2^30*8) (approximately 10^(2.6e9)) states.
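The decision procedure itself is easy to sketch (Haskell, with a hypothetical step function where Nothing means "halted"); the problem is purely the size of the state space:

import qualified Data.Set as Set

data Outcome = Halts | Loops deriving Show

-- Pigeonhole: a deterministic finite-state machine that ever revisits
-- a state is looping forever.
decideHalting :: Ord s => (s -> Maybe s) -> s -> Outcome
decideHalting step = go Set.empty
  where
    go seen s
      | s `Set.member` seen = Loops
      | otherwise = case step s of
          Nothing -> Halts
          Just s' -> go (Set.insert s seen) s'

With ~10^(2.6e9) possible states, the seen set is hopeless in practice.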
Essentially, the problem is not with computers, but with languages.
If your language lets you build a never-ending loop/data structure/etc., then you cannot prove halting for all the programs that can be written in that language.
Since such languages exist, the problem applies to computers by extension.
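In Haskell, for example, the never-ending case is a one-liner:

-- well-typed, but never produces a value
loop :: Int
loop = loop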
I wonder if tighter guarantees can be given if tools can work with constructs like D's contracts. These are extra sections in each function that are intended to check invariants. If these invariants change, then either they were broken and needed fixing, or the function had a semantic change.
That's an interesting idea. It seems like it would be a way for tools to flag "this function is likely to have changed in an interesting way", but changing invariants doesn't necessarily mean the function breaks semver.
For one, an invariant might be changed syntactically, without actually changing what it is asserting, reducing to exactly my example above: in its simplest form, the contract could go from in(true) to in(halts("...")).
Secondly, a contract could have been made more permissive, e.g. in(x > 1) becoming in(x > 0), or out(y > 0) becoming out(y > 1); a rough analogue is sketched below. Assuming violating a contract is regarded as a programming error (as in, it isn't considered breaking to go from failing an assertion to not failing), these are also non-breaking changes.
Lastly, changing behaviour doesn't necessarily mean changing invariants/contracts.
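To make the second point concrete, here's a rough Haskell analogue of a relaxed contract, with assert standing in for D's in contract (function names hypothetical):

import Control.Exception (assert)

-- version 1: precondition x > 1
recipV1 :: Double -> Double
recipV1 x = assert (x > 1) (1 / x)

-- version 2: precondition relaxed to x > 0; every call that satisfied
-- version 1's contract still satisfies this one, so nothing breaks
recipV2 :: Double -> Double
recipV2 x = assert (x > 0) (1 / x)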
The halting problem doesn't matter if your API is required to define timeouts / time guarantees. Thus if a#1.0() returns "hello" after 100ms and a#1.1() returns "hello" after 20s, even though they return the same result, they are deemed functionally different. This follows real-world expectations where performance which suddenly slows to the point of unusability is considered breakage, even if the same result is returned in the end.
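A minimal sketch of that check in Haskell (hypothetical helper; it spot-checks a single input against the promised budget rather than proving anything):

import System.Timeout (timeout)

-- Two versions agree on this input only if both return the same value
-- within the promised time budget (in microseconds).
agreesWithinBudget :: Eq a => Int -> IO a -> IO a -> IO Bool
agreesWithinBudget budget old new = do
  oldResult <- timeout budget old
  newResult <- timeout budget new
  pure (oldResult /= Nothing && oldResult == newResult)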
In practice your time guarantee isn't going to let you produce a sound and complete static analysis that proves the equivalence of two arbitrary (modulo termination guarantees) functions.
In the real world all programs have a timeout set to the time to the heat death of the universe. This hasn't helped us make sound and complete static analysis.
I think that's essentially the same point as LBA in a sibling comment. The same intractability-in-practice applies to it: even if your timeout is 10 milliseconds, that's still (tens of) millions of operations on most modern CPUs and order of a gigabit written to/read from memory, which is a huge state space.
I think it's more like a lower bound on difficulty. In other words, halting is one of many guarantees that would need to be checked to ensure that an API didn't break, and it is impossible to check, so the whole thing cannot be done, regardless of the difficulty of the other problems (like timeouts or whatever).
I don't think you understood my comment - you don't need to check for halting if all methods in the API have timeout guarantees. Halting is only a problem if runtime is uncertain. If runtime is certain (again, because of promised performance figures) then it is entirely possible to verify whether an API has been broken or not.
Is it just pedantry? Even in the strongliest of practical strongly typed languages, two functions with the same signature might fail to return on some inputs in the new version where the old version returned fine.
That's a practical distinction I care about that can't be computed.
If you're narrowly interpreting an API to mean just the function signature, then sure, if your type system is appropriately restricted then it is computable (although many type systems are Turing complete, meaning this won't be computable in all cases, for the same reasons). This is, in fact, something my comment explicitly called out.
Having signature compatibility is an important prerequisite, but I think it's possibly worse to have code that compiles but does the wrong thing than to have code that doesn't compile (at least in the latter case it's obvious to the developer that something needs to be fixed).
I'd imagine it isn't, at least depending on how you define API compatibility, and whether you're only looking at the API interfaces. Imagine two versions of a library that implement the function "add".
Version 1:
add : Int -> Int -> Int
add x y = x + y
Version 2:
add : Int -> Int -> Int
add x y = x * y
Both versions expose the same API interface, but the functions that conform to that interface are semantically different. A stronger type system could probably differentiate between the two functions, but I doubt you could generally compute whether both functions implement the same behavior.
Perhaps with some sort of functional extensionality you'd be able to compute compatibility perfectly, but I can't imagine that ever being feasible in practice.
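Short of that, property-based testing gives a cheap approximation; a QuickCheck sketch (addV1/addV2 standing in for the two versions above):

import Test.QuickCheck

addV1, addV2 :: Int -> Int -> Int
addV1 x y = x + y  -- version 1
addV2 x y = x * y  -- version 2

-- Not a proof of compatibility, but quickCheck prop_sameBehaviour
-- finds a counterexample such as x = 0, y = 1 almost immediately.
prop_sameBehaviour :: Int -> Int -> Bool
prop_sameBehaviour x y = addV1 x y == addV2 x y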
That being said, what Elm does offer is still a huge improvement over humans trying to guess whether they made any breaking changes :)
These proofs weren't automatically discovered, though for such simple programs I'd expect an SMT solver to be able to find the proof or a counterexample easily enough.
But even proving they're the same value for all inputs isn't all that helpful, because of lazy Haskell code like:
version 1:
fib :: Int -> Int
fib n = if n < 3 then 1 else fib (n - 2) + fib (n - 1)
version 2:
fib :: Int -> Int
fib n = let fs = 1 : 1 : zipWith (+) fs (tail fs) in fs !! (n - 1)
They're (provably) the same for all (positive) input values, but if you call version 1 with n greater than, say, 35, you'll be waiting quite a while for an answer, while version 2 will be very snappy for the first few hundred thousand values of n, at least, after which the size of the answer will be a bottleneck.
If a library switched from the latter to the former, it'd have a good chance of breaking code.
While obviously exponential code is obviously exponential, this sort of behavioural change can show up in less obvious ways: a change in memory usage might blow your heap after a supposedly minor version change, e.g. swapping a strict fold for a lazy one.
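A classic Haskell instance of that (illustrative; identical results, very different memory behaviour):

version 1 (strict fold, constant space):

import Data.List (foldl')

total :: [Int] -> Int
total = foldl' (+) 0

version 2 (lazy fold; builds a chain of thunks proportional to the list length, and can exhaust the heap on large inputs):

total :: [Int] -> Int
total = foldl (+) 0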
In the end I don't think the value judgement of 'breaking' or even 'significant' is computable, and you'd need to rely on a human doing something that approximates to 'right' for your world view with their version numbering.
I don't think 'does it work exactly the same' is necessarily the right question, given there are expected to be bug fixes which may change the behavior of some functions.
Irrelevant. If fixing a bug is a breaking change then it is a breaking change. This is the importance of pre-release/"nightly" branches: so that a mistake like addNumbers(5, 5) returning 25 instead of 10 gets caught before release, instead of needing a major version bump just to fix a typo of × to +.
A bug fix is a change. A breaking change is a change that breaks the API. Doesn't matter if the previous status was the intended one or not.
SemVer not working is almost entirely down to people simply not following it correctly. The projects that follow it as best they can have very few "whoops, that was a major change" mistakes, while projects that use it as a mere guideline may as well not be using it at all.
Irrelevant?? That is the most relevant question that a developer could ask in this situation.
Often, there are ways to work around bugs in an API. And, just as often, those workarounds will break as soon as the bug is fixed. The API developer is in a particularly poor place to judge whether a bug fix is breaking or not. Sometimes, they can talk to users to see if a change will break their code, but other times they can't. Thus, it comes down to how well a developer can predict whether a fix will break projects in the wild.
>Irrelevant?? That is the most relevant question that a developer could ask in this situation.
No, it isn't. The most relevant question is "can somebody be using the current API". It doesn't matter if your current API matches the documentation, what matters is whether your current API is out there for others to build on.
Don't try to use a crystal ball or other form of divination to predict what your downstream users have been doing with the code; you will always lose. Instead, suck it up, acknowledge the mistake, and signal the breaking change by bumping the major version.
Maybe next time your developers will spend more time validating their public contract, so they won't have to endure the embarrassment of a major version bump.
Using that logic, every single bug fix that even remotely changes how the API operates would be considered a major version. What would be the point of semver?
Semver both acknowledges and is predicated upon the fact that every bug fix has a range of possible impacts. At one end, some bug fixes have almost no chance of breaking an application. At the other end, some bug fixes will almost certainly break applications.
This is not divination. It is understanding your application and its users.
Finally, in regards to your last paragraph, I genuinely hope that you don't speak to developers like that. Mistakes will always happen. Have a little bit of respect and treat people with kindness.
>Using that logic, every single bug fix that even remotely changes how the API operates would be considered a major version. What would be the point of semver?
Yes. That's exactly how SemVer works. It signals to people that there was a breaking API change - it doesn't matter if you're on version 10495.0.3; if you've somehow broken your API 10,494 times, you're still following SemVer. Although if you're that bad at maintaining a stable API you probably don't have any users, you know... using it.
SemVer is predicated upon the fact that other people are using your API. It signals to them whether or not it should be safe to update without breaking their code that relies on your API. If you break your API - you broke their code. That makes you an asshole in their eyes. The entire point of SemVer is to signal that you broke your API so that people relying on it can examine their code for breakage, decide whether or not to update, or start working on modifying their code to match the new API.
If your API is very unstable then people aren't going to rely on it. If it's both unstable and you don't signal breaking changes I'd be amazed if anyone put up with working with it.
If I'm using version 3.0.2 I should be fine using any 3.y.z version. If you release a 4.0.0 then I know I need to investigate and determine whether or not it is safe for me to update to that version. If your API is still unstable and changing on a daily basis - you shouldn't be at 1.0.0 yet. 1.0.0 is meant to signal the "stable API stage" of development.
A bug fix is not a breaking change. You are speaking in tautologies: "a bug fix is a change that is a change to the API". No, a bug fix does not change the interface. A bug fix does not change the API. It does not require a semantic version change unless it also changes the expected behavior of the method, in which case it is a change in the documentation, which is part of the application programming interface.
A bug fix is not always a breaking change, but a breaking change is always a breaking change. SemVer doesn't care whether it is a bug fix or not. The single relevant detail is whether it broke the behavior of the public API, for any reason; the reason for the breakage is entirely irrelevant. It's a binary yes/no question.
Doesn't matter if it breaks it in a way that you don't expect users to have been using it - you broke it. Doesn't matter if you were just fixing a typo that changed behavior - you broke it. Doesn't matter if you were fixing a logical error - you broke it. Doesn't matter if you fixed the PRNG causing some results to no longer be possible - you broke it. Doesn't matter if you deleted a deprecated endpoint - you broke it.
SemVer isn't asking why you broke it - it's asking did you break it.
Only as far as the type system supports, but I feel like constraining the minimum version unit to increment is far better than doing nothing. Disallowing a mere patch-version bump when the type of an existing API changes is great.