I'd like to see some more general unsolved problems concerning big code bases, programming languages, etc. Examples:
- Object-relational impedance mismatch
- MVC (solves only half of the problem)
- Global state
- Schema vs schemaless data stores
- Raising exceptions vs returning error codes
A good sign of an unsolved problem in this regard is that programmers go back and forth between solutions, forgetting the downsides of the previous flavour while enduring the downsides of the current flavour.
Very often, a new solution claims it will get rid of complexity, but the actual complexity is just swept a bit further under the carpet.
>Object-relational impedance mismatch
This one is easy: don't do it! If you're talking to a database, you should be using a powerful and declarative language for building your queries.
>Global state
This is a solved problem from a computer science perspective. The rest of the work is getting programmers to understand the alternatives and change their habits accordingly.
>Raising exceptions vs returning error codes
Option/Either types are a very strong middle ground. They don't require the boilerplate of checking error codes and they don't have the unpredictability of exceptions (particularly in the case of asynchronous code).
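To make that concrete, here's a minimal Haskell sketch (parseAge, AgeError, and averageAge are made-up names for illustration, not anything from a real codebase):

    import Text.Read (readMaybe)

    data AgeError = NotANumber String | OutOfRange Int
      deriving Show

    parseAge :: String -> Either AgeError Int
    parseAge s = case readMaybe s of
      Nothing -> Left (NotANumber s)
      Just n
        | n < 0 || n > 150 -> Left (OutOfRange n)
        | otherwise        -> Right n

    -- Failures propagate through Either's Monad instance: no try/catch,
    -- and no manual error-code check at every call site.
    averageAge :: String -> String -> Either AgeError Int
    averageAge a b = do
      x <- parseAge a
      y <- parseAge b
      pure ((x + y) `div` 2)

The error cases are visible in the type, and the caller decides once, at the end, what to do with a Left.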
>Very often, a new solution claims it will get rid of complexity, but the actual complexity is just swept a bit further under the carpet.
I know what you mean. A lot of solutions end up "punting" the complexity to some other structure. The real answer (if there can be one) is to find ways to eliminate this complexity altogether. One major way to do that is to work with values and compose pure functions to build them. Pure functions dramatically improve one's ability to reason about their properties in isolation.
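A made-up example of the style, just to illustrate (the pricing functions are hypothetical):

    newtype Cents = Cents Int deriving (Eq, Show)

    applyDiscount :: Double -> Cents -> Cents
    applyDiscount rate (Cents c) = Cents (round (fromIntegral c * (1 - rate)))

    addTax :: Double -> Cents -> Cents
    addTax rate (Cents c) = Cents (round (fromIntegral c * (1 + rate)))

    -- The composite is itself just another pure function on values,
    -- so each piece can be tested and reasoned about in isolation.
    finalPrice :: Cents -> Cents
    finalPrice = addTax 0.2 . applyDiscount 0.1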
"""This one is easy: don't do it! If you're talking to a database, you should be using a powerful and declarative language for building your queries."""
Not only that, but I have been finding that most problems where I would originally turn to using objects and classes are better solved by using static functions that operate on large data structures. This strategy also helps sidestep the object-relational mismatch in a lot of cases.
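A rough sketch of what that looks like (the User record and helpers are illustrative only):

    data User = User
      { userId    :: Int
      , userName  :: String
      , userEmail :: String
      } deriving (Eq, Show)

    -- Plain functions over a plain data structure, no class hierarchy.
    rename :: String -> User -> User
    rename newName u = u { userName = newName }

    displayLabel :: User -> String
    displayLabel u = userName u ++ " <" ++ userEmail u ++ ">"

The record maps naturally onto a database row, which is part of why the object-relational mismatch tends to fade away in this style.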
Well, it depends. Say you want to model a user of your application; this looks very much like a candidate for an object. Sure, when you're just starting, you can use a dict or even just the id, and have functions working on it.
But will this scale when the number of possible actions on a user reaches the hundreds? Then, with all its drawbacks, the object model may make sense: at least you have a way to know which actions are available on a given thing.
When you have a "user" in hand, how do you know what you can do with it? When it is an object, it is relatively easy, as the methods are bundled with the object. If you use a functional paradigm, you have to resort to some other means: maybe you group user-related functions in one place (but some get mixed in elsewhere), or search for functions accepting a user_id argument, but these approaches seem less straightforward to me.
Ah, so you're talking about tooling. Specifically, the way object method invocation lends itself well to autocompletion or intellisense?
In functional languages we tend to use far more generic facilities than this. Some examples of this are lenses and scrap-your-boilerplate style generic programming.
We also tend to think differently. We don't "hold a user in our hands" and ask what we can do with it. We think in terms of extending our base language until we can express the solution to our problem in a trivial way.
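To give a flavour of the lens idea, here's a hand-rolled toy version (just the concept, not the real lens library):

    -- A lens is a first-class getter/setter pair; because it composes,
    -- access to nested data stays generic instead of bespoke per type.
    data Lens s a = Lens { view :: s -> a, set :: a -> s -> s }

    compose :: Lens s a -> Lens a b -> Lens s b
    compose outer inner = Lens
      { view = view inner . view outer
      , set  = \b s -> set outer (set inner b (view outer s)) s
      }

    data Address = Address { city :: String } deriving Show
    data Person  = Person  { name :: String, address :: Address } deriving Show

    addressL :: Lens Person Address
    addressL = Lens address (\a p -> p { address = a })

    cityL :: Lens Address String
    cityL = Lens city (\c a -> a { city = c })

    -- Update a nested field without writing the plumbing by hand each time.
    moveTo :: String -> Person -> Person
    moveTo = set (addressL `compose` cityL)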
That's all good, but my personal experience with complexity and big code bases sadly doesn't include the functional paradigm.
Note that I'm not talking about complex problems with potentially simple solutions; I'm talking about inherently complex systems, like, say, the HR system of General Motors.
I want to elaborate on "values" since I've been thinking about them all morning. They're just a simplification of all of programming.
Everyone remembers the first time they learned how messy IEEE floating point math actually is. You were probably trying to treat floats like the reals, and then the abstraction leaked all over you. The same thing happens with int32/64 types when they roll over on you or (worse) throw exceptions. There's a mismatch between your mental model of these objects and their actual behavior. Anyone who's ever used BigInts has probably noticed the sigh of relief you get when you make that tradeoff: "these might be arbitrarily slow, but I also know that my computations will work exactly how I expect pretty much forever".
(Yes, you can flat out overflow your memory if you want, but even if it has to swap to your disk for 30 years, the value is intact.)
That sigh of relief is the thing we should be buying with our massive CPUs.
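A tiny illustration of those leaks (GHC behaviour; Int overflow wrapping is implementation-specific):

    leak1 :: Bool
    leak1 = 0.1 + 0.2 == (0.3 :: Double)   -- False: Doubles are not the reals

    leak2 :: Int
    leak2 = maxBound + 1                   -- silently wraps around to minBound

    exact :: Integer
    exact = 2 ^ 200                        -- arbitrary precision: maybe slow, but intact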
That's also exactly the premise of immutable values. They get computed and then they stay computed allowing you to have an extremely low-impact mental model of their behavior. They work platonically—once you compute them they just exist to be inspected.
Better than that, values you compute can often be finite or efficiently infinite structures so that you don't even truly worry about arbitrary slowdowns. It's relatively easy to play within the "easily computable" playground.
Nobody—not even the purest Haskeller—thinks that everything ought to have value semantics. Instead, they just form a significantly simpler underlying language. This just leaves less room for the semantics of your language to be complicated. They only get complicated when you legitimately want them to—so if you want mutable skiplists then you can have them by creating an environment where values have mutable properties. And then that environment, ST, is itself a value that you can pass around. (That's half the point of Monads.)
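A small sketch of that last point, using nothing beyond base's Control.Monad.ST: the mutation is local, and what escapes runST is an ordinary immutable value.

    import Control.Monad.ST
    import Data.STRef
    import Control.Monad (forM_)

    sumSquares :: [Int] -> Int
    sumSquares xs = runST go
      where
        go :: ST s Int
        go = do
          acc <- newSTRef 0                            -- local, mutable accumulator
          forM_ xs (\x -> modifySTRef' acc (+ x * x))
          readSTRef acc                                -- the result is a plain Int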
About global state: yes, global state can be avoided most of the time, but the fact that there is still some pressure to use it shows it is not that obvious, or, put another way, that no other solution is able to cover all use cases.
For example, GOTO is bad and is gone for good, and I have never heard of coders missing it (except for fun), so that is a solved problem. But if you need to dynamically set a value and use it in a very remote part of the code, the need for some global state arises and is hard to avoid.
About ORM: sure, but ORMs still exist and are widespread, so the solution must not be that obvious.
About complexity: yes also, but I think a really effective way to properly "dissolve" complexity (and not push it somewhere else) is to have good data structures.
>About global state: yes, global state can be avoided most of the time, but the fact that there is still some pressure to use it shows it is not that obvious, or, put another way, that no other solution is able to cover all use cases.
Right, but my point is that this is not a computer science problem anymore; it's really a matter of discipline and habits (or languages which enforce good habits).
>But if you need to dynamically set a value and use it in a very remote part of the code, the need for some global state arises and is hard to avoid.
There are still very principled ways to do this. A problem can still be solved even if some people are ignorant of the solution.
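For example, one principled option (a sketch using mtl's Reader; Config and buildUrl are made up) is to thread the "dynamically set" value through the types instead of a global:

    import Control.Monad.Reader

    data Config = Config { logLevel :: Int, apiBase :: String }

    -- Deeply nested code can still read the value, but the dependency is
    -- explicit in the type rather than hidden in a mutable global.
    buildUrl :: String -> Reader Config String
    buildUrl path = do
      cfg <- ask
      pure (apiBase cfg ++ path)

    main :: IO ()
    main = do
      let cfg = Config 2 "https://example.com"   -- the value is "set" once, at the edge
      putStrLn (runReader (buildUrl "/users") cfg)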
>About ORM: sure, but ORMs still exist and are widespread, so the solution must not be that obvious.
These still exist largely due to inertia; same reason a lot of bad ideas still exist.
>About complexity: ... good data structures.
Yes. Good data structures and good habits: a layered stack of languages, data abstraction, composition over inheritance, bottom-up design, DRY, static typing (why is this still controversial?).
You seem to be talking about a different kind of "problem" than the article. What you list are patterns that have advantages and disadvantages, where the "solution quality" is a matter of opinion. The article talks about things which haven't been proven yet.
POPLmark seems to generally be considered solved. There are multiple good solutions using different tools for all of those problems, I believe.
And the real goal has largely been reached. At this point, if you're proposing new PL theory and you don't have a mechanised proof of it, the entire work is viewed with considerable skepticism.
That said, reuse and productivity in mechanised proofs are still not much better than in real-world software. My peers who do that sort of thing are in very high demand for post-doc positions on projects funded by various governments/militaries to "prove the X property about software system Y."
Having said that, I suppose most problems we face in this field boil down to mathematical problems at the fundamental level. At least that's the conclusion I came to when I tried to think of unsolved AI problems.
What is the fastest algorithm for multiplication of two n-digit numbers?
What is the fastest algorithm for matrix multiplication?
I think those "what is the fastest" questions are relative. In my opinion, they are not unsolved problems: there are already solutions for them, and among the current solutions one is, of course, the fastest.
When a faster algorithm is formulated, a time will come when yet another algorithm performs better still.
EDIT: Now I get it. That "fastest" is in terms of time complexity. Thanks guys. :)
An algorithm with lower time complexity will always take less time once the problem size grows large enough. You could run one algorithm on a modern CPU, utilizing SIMD and tuned for cache coherency, and the other in QBASIC on an old 386, and the O(n log n) algorithm would still terminate earlier than the O(n^2) one past a certain n.
The problem is that while dropping from O(n^2) to O(n log n) is a big deal, dropping from O(n log n) to O(n) won't help much if you end up increasing the constant factors at the same time.
It's talking about asymptotic time complexity, which means you have to prove that there can be no faster algorithm. For example, comparison sorting (so no radix sort) can be no faster than O(n log n).
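For reference, the textbook counting argument behind that bound (not part of the original comment): a comparison sort's decision tree must distinguish all n! input orderings, so its height h, the worst-case number of comparisons, satisfies

    2^h \ge n!  \implies  h \ge \log_2(n!) = \Theta(n \log n)

by Stirling's approximation, hence Ω(n log n) comparisons in the worst case.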
I think the hidden subtext here is that many people believe there exists an O(n^2) algorithm for matrix multiplication, but it has not been discovered yet.
[1] Wouldn't be surprised if this is the case for the other problem too, although I don't know much about it.
Ran Raz proved an Ω(n^2 log n) lower bound for matrix product (for arithmetic circuits with bounded coefficients), ref: Ran Raz. On the Complexity of Matrix Product. SIAM Journal on Computing, 32(5):1356–1369, 2003.
Regarding even faster operations, it has been hypothesized that every matrix is similar to a Toeplitz matrix (Toeplitz and Hankel matrices admit O(n lg n) algorithms), ref: D. S. Mackey, N. Mackey, and S. Petrovic. Is Every Matrix Similar to a Toeplitz Matrix? Linear Algebra and its Applications, 297:87–105, 1999.
But that was proved not to be the case:
T. Amdeberhan and G. Heinig - http://www-math.mit.edu/~tewodros/georgmemoriam.pdf
Integer factoring is not (known to be) NP-hard, i.e. we don't know whether we can reduce any NP problem to integer factoring (we suspect we cannot). On the other hand, it's in the intersection of NP and co-NP, because you can give an efficiently checkable proof of both primality and compositeness. If you're interested to learn more about these things, Scott Aaronson has the equivalent of a complexity course stuffed into one lecture here ;)
I don't know; outside academia, I consider those more unsolved mathematics problems.
I know, tomato, tomato, but to me, in computer science it's more important to know whether they have been solved for most cases (or, alternatively, are considered probably too hard in real life).
Getting too caught up in perfect solutions can be more of a hindrance.
I'm going to bicker about language, but that sounds more like a software engineering mindset. Most of the most fundamental problems in computer science have exactly that kind of mathematical nature.
"Works well enough" gets a paycheck, but doesn't impact the field.
The name you're looking for is "Unsolved problems in computer engineering." I agree; there should be an article for that, but it should be kept separate from this one.