Storage has never been cheaper, and yet deletion strategies remain important. The flaw in the "just keep everything" assumption is twofold: data creation grows faster than prices drop, and "garbage" data can have performance implications. Cloud storage providers love it if you never delete data, because they're charging you more than it costs them; internally, though, they need to delete the data you've asked them to delete carefully and quickly, because it's pure cost (you're not being billed for it).
I don't disagree, but this doesn't address that "if it were free" design point.
Obviously retrieval isn't free, and we have only so much write bandwidth. But a GC might be able to use this in pretty neat ways. You could page out allocations that you think aren't going to be in use, or rarely in use. If you write it to storage, you don't have to be sure. Now that kinda gives you extra infinite memory (the GC dream), so you could do things like project the same data structure into multiple ways depending on how it is being accessed (AoS, SoA, projected subsets, columnar, etc).
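To make the AoS/SoA point concrete, here's a rough sketch of the same data projected two ways. The names and layout are purely illustrative, not taken from any real GC or runtime:

```java
// Hypothetical sketch: the same particle data projected two ways.

// Array-of-Structs (AoS): good when you touch all fields of one element.
record ParticleAoS(double x, double y, double mass) {}

// Struct-of-Arrays (SoA): good when you scan one field across all elements.
final class ParticlesSoA {
    final double[] x, y, mass;

    ParticlesSoA(ParticleAoS[] particles) {
        int n = particles.length;
        x = new double[n]; y = new double[n]; mass = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = particles[i].x();
            y[i] = particles[i].y();
            mass[i] = particles[i].mass();
        }
    }

    // A columnar scan only pulls the 'mass' array through the cache.
    double totalMass() {
        double sum = 0;
        for (double m : mass) sum += m;
        return sum;
    }
}
```

A runtime that could page cold projections out to effectively unlimited storage could afford to keep several of these around and pick whichever matches the current access pattern.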
We should be thinking about how to have computers decide that deletion strategy for us.
Basically he's describing immutable storage and what we now call an append-only-log DB backend.
Quite a foresight at a time when microcomputers persisted data on audio tapes, and Sinclair launched a computer with a custom chassis, keyboard, PCB and a 3.5 MHz Z80 CPU..., yet chose to include only 1 kB of RAM to keep costs low.
Dijkstra once —when the discipline of CS was itself much younger— wrote something to the effect of "how are we supposed to teach our students things that will last their lifetimes?"
(ie if today's kids are ~20, what could we teach that will still be relevant for computing in ~2070?)
The fundamentals and concepts haven't changed much at all, and probably won't for a very, very long time. If you have a good handle on those, everything else is relatively easy to pick up -- even the really new stuff.
What concerns me about new CS grads is that they're not only lacking a lot of the fundamentals, they sometimes even argue that learning them isn't useful.
I use discrete math quite often, but rarely calculus—at least nothing more complicated than knowing what integrals and derivatives are (not how to actually calculate them). I mainly work at the application level, though: understanding business processes and other "soft" skills are much more relevant than advanced math.
I fully expect some companies to still be using Java 8 in fifty years.
Having internalized what these concepts are and how they work in general is not to be discounted. The usefulness of education is often not in the detail of the rules but in that internalizing: you now know about the concept, and it will come to mind without you even noticing when something relevant passes in front of you. Even if only in business: an interest rate or commission is not just a small number each time; it adds up. A growth rate is properly a "compounded growth rate". Etc.
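For a quick worked illustration of the compounding point (the rate and period count here are made up):

```java
// Illustrative only: 5% growth per period for 10 periods is not 50%.
public class CompoundGrowth {
    public static void main(String[] args) {
        double rate = 0.05;   // 5% per period (example figure)
        int periods = 10;

        double simple = rate * periods;                       // 0.50  -> 50%
        double compounded = Math.pow(1 + rate, periods) - 1;  // ~0.63 -> ~62.9%

        System.out.printf("simple: %.1f%%, compounded: %.1f%%%n",
                simple * 100, compounded * 100);
    }
}
```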
That’s a good point. Taking calculus and other courses definitely helped to provide a good foundation. But, for anyone struggling in those courses and wondering if they can make it in this industry: a barely-passing grade in Calculus II won’t be a career-ender. :-)
Eugenia Cheng has a funny quip about how the US is the only place that studies calculus as hard as we do. It sounds like she would like it to be a footnote, and not the main course.
Haven't most of these codebases moved to C#/Java over the past 20 years? I feel like Cobol is truly a thing of the past, even for your average old-school bank/insurance behemoth, but then I might live in a bubble.
Some of it has, but there's still a very large and active COBOL installed base, and there's still active COBOL development taking place.
In fact, COBOL devs tend to be better paid these days, because they're critical but there are fewer of them.
The deal is that companies that rely on such software have a solid, time-proven solution. Switching that out just to change to a different language would be irresponsibly risky.
Does Java (or its programmers) know how to represent decimal numbers and fractions at the machine level?
COBOL is used in banking because it has natively supported decimal arithmetic since the 70s or some crap, and no other language bothers to truly try to be a COBOL replacement.
Banking / insurance / etc etc are on the Dollar/Penny system. They need 0.01 to be exactly 0.01, and not 0.09999997 or whatever double precision decides to round that to.
And remember, there are fractions of a penny. Ex: $15.097 could be a real price that needs to be exactly calculated.
-------
If this crap hasn't been figured out in the last 20 years, why would Java or C# programmers try to solve it in the next 20 years?
It's more likely for the old COBOL code to just keep running along than to port over to a language that doesn't even meet your legal requirements.
> "If this crap hasn't been figured out in the last 20 years, why would Java or C# programmers try to solve it in the next 20 years?"
C# has had System.Decimal since .NET Standard 1.0 over 20 years ago: https://learn.microsoft.com/en-us/dotnet/api/system.decimal?... - "Decimal value type is appropriate for financial calculations that require large numbers of significant integral and fractional digits and no round-off errors."
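Java's equivalent is java.math.BigDecimal, which has been in the standard library since JDK 1.1. A minimal sketch of the difference (the prices and quantities here are just examples):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ExactMoney {
    public static void main(String[] args) {
        // Binary floating point cannot represent most decimal fractions exactly:
        System.out.println(0.1 + 0.2);                  // 0.30000000000000004

        // BigDecimal keeps exact decimal digits, including sub-penny prices:
        BigDecimal price = new BigDecimal("15.097");    // exactly 15.097
        BigDecimal qty   = new BigDecimal("3");
        BigDecimal total = price.multiply(qty);         // exactly 45.291
        System.out.println(total);

        // Round to whole pennies only when the business rule says so:
        System.out.println(total.setScale(2, RoundingMode.HALF_EVEN)); // 45.29
    }
}
```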
Is there even any kind of online resource that defines these "fundamentals" on a widely-agreed-upon basis, and focuses on only said fundamentals as a purpose-built resource of high specificity?
If so, it’s only a Google search away for these young’uns.
Or, as a mangled quote attributed to Einstein goes, “Never memorize what you can look up in books.”
"fundamentals of computer science" is a pretty disappointing G search, at least in my bubble.
(I currently believe "it's all quantales, what's the problem?" is a defensible proposition, but suspect that this viewpoint may be reminiscent of Mathematics Made Difficult)
One thing the "no deletion" argument misses is that sometimes you have to delete data for policy reasons. At least two cases are important:
* Users ask you to delete their data. If you don't, and they find out you didn't, you have a problem.
* Legal action may require you to delete data. (E.g. you may find that someone uploaded child pornography to your system.)
This is actually a huge problem for companies like Google (where I work). When you have enormous volumes of highly reliable and durable (i.e. replicated) storage, it's actually really hard to make sure you can delete all copies of specific data reliably and quickly.
While acknowledging you're only addressing the Copeland paper (and not Endatabas, where the OP found it), here's the Endatabas solution to this problem:
I wonder how it compares with Postgres temporal tables, or with just adding an `entity_history` table somewhere. Or is the timeline data more intrinsic to the DB design in this one?
The temporal columns are intrinsic to Endb, but they are completely optional. By default, Endb queries run as-of-now, which returns the same results one would expect from a regular Postgres database.
Postgres temporal tables can't make Postgres natively aware of time, so temporal queries tend to be awkward, even if you want the default as-of-now result.
There are temporally-aware databases (SAP HANA, MariaDB, SQL Server), but they all treat time as an additional concept layered on top of SQL-92 via SQL:2011. It's difficult for a mutable database to assume immutability or a timeline without becoming another product.
`entity_history` and similar audit tables aren't comparable at all, since they don't even involve the same entity/table, which means all querying of history is manual. Indexing of audit tables is at least a bit easier than the SQL:2011 temporal solutions mentioned above, though.
In all these cases, schema is still an issue that needs to be resolved somehow, since (again) incumbent relational databases assume a Schema First approach to table layout. Endb is Schema Last and allows strongly-typed nested data by default.
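To give a feel for what "natively aware of time" means in practice, here's a toy sketch of the as-of query model. This is not Endb's implementation, just an illustration of the idea; all names here are made up:

```java
import java.time.Instant;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy sketch of "as-of" reads: every write appends a new version, and a
// query picks the latest version at or before the requested timestamp.
public class AsOfStore {
    // entity id -> (valid-from timestamp -> document)
    private final Map<String, NavigableMap<Instant, Map<String, Object>>> history =
            new java.util.HashMap<>();

    public void put(String id, Map<String, Object> doc) {
        history.computeIfAbsent(id, k -> new TreeMap<>()).put(Instant.now(), doc);
    }

    // Default read: as of now. Historical read: pass any earlier instant.
    public Map<String, Object> get(String id, Instant asOf) {
        NavigableMap<Instant, Map<String, Object>> versions = history.get(id);
        if (versions == null) return null;
        Map.Entry<Instant, Map<String, Object>> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }
}
```

Here `get(id, Instant.now())` behaves like an ordinary current-state read, while passing an earlier instant is the time-travel query; nothing is ever overwritten.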
The Endb demo is pretty recent, and explains all of this in more detail, with examples:
InterBase[1] was a popular database at one point; the appealing feature to me was that it didn't overwrite data but kept versions of it, with a pointer to the current version being the only value actually overwritten on disk.
Such a system could be highly useful these days, in the times of almost infinite storage.
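As a rough in-memory sketch of that versioning scheme (the shape of the idea only, not InterBase's actual on-disk format):

```java
import java.util.ArrayList;
import java.util.List;

// Old versions are never overwritten; only the "current version" pointer moves.
public class VersionedRecord<T> {
    private final List<T> versions = new ArrayList<>(); // append-only history
    private int current = -1;                           // the only mutable cell

    public void write(T value) {
        versions.add(value);            // keep every version
        current = versions.size() - 1;  // move the head pointer
    }

    public T read()              { return current < 0 ? null : versions.get(current); }
    public T readVersion(int v)  { return versions.get(v); }   // time travel
    public int versionCount()    { return versions.size(); }
}
```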
Depends on whether that storage will last for millennia (and still be readable with simple technology) or for a decade.
I worry what percentage of valued storage (digitization of valued objects followed by their disposal) will remain in 50 years.
I think of the Eloi libraries (whether Pal's disaster or those of H.G. or Simon Wells). We can find only faint echoes of the most profound 'Ancient Greek' texts.
Can someone explain how an organization like the NSA manages to keep records on every digital transaction (supposedly)? This seems like an impossible physics problem, because the amount of data seems unfathomable even for an organization like the US govt. Pre-Snowden you could say that thinking was a conspiracy theory. Now we know that not only is it not a conspiracy, it's probably much worse.
How about YouTube? The amount of video uploaded daily keeps increasing, yet they manage to store everything, essentially on demand, going back to the first YouTube video. At the end of the day, the data must be fetched from a hard drive somewhere... right? Are they buying thousands of HDDs daily?