I wish all explanations made complicated concepts this manageable!
> the simplest way to differentiate most discrete objects (like database insertions) is to insert them probabilistically and apply your derivatives/slopes to the likelihood of the insertion succeeding.
I hope it wouldn't trouble you to go a bit further with this idea? Namely, the idea of differentiating an object or database insertion sounds like it shouldn't be structurally possible.
And why would there be a probability function for a db insertion succeeding? How could it fail - if the db was full, or if there were too many requests?
Imagine the particular data structure you'd like to use for your database has if/else branching logic. A simple example is some kind of BinaryTree<Key, Value> type. Until you hit your termination condition, you walk down the tree: if the key is bigger, go one direction; else, go the other.
The problem you'll find is that the derivative of the error with respect to the _key_ is always zero. Intuitively, nudging the key just a little bit won't change the query result. Imagine some kind of a step function which is 0 for negative inputs and 1 for positive inputs. The slope at every input (except for something interesting happening at 0 -- not a "slope" mathematically, but for your real-world problem you might care about what happens there) is exactly zero.
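To make that concrete, here's a tiny Python sketch (the splits and leaf values are made up for illustration): a hard if/else lookup, plus a finite-difference estimate of the slope of the result with respect to the key. The slope comes out exactly zero.

```python
# A hard if/else lookup: the classic step-function problem.
# The split points (10, 5, 15) and leaf values are arbitrary, just for illustration.
def lookup(key):
    if key < 10:
        return 1.0 if key < 5 else 2.0
    else:
        return 3.0 if key < 15 else 4.0

def finite_difference(f, x, eps=1e-4):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

print(lookup(7.0))                      # 2.0
print(finite_difference(lookup, 7.0))   # 0.0 -- nudging the key doesn't change the result
```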
Phrased another way, the optimization problem you'd like to solve via derivatives and gradients is discrete, requiring combinatorial techniques instead of gradient descent to solve -- at least, that's true if choosing the right database key is an important part of the problem.
If you're able to somehow make the composite algorithm tolerant of probabilistic answers, by far the easiest tweak you can make is to replace if/else logic with a probabilistic if/else. When the key is a lot bigger, almost certainly walk right in the tree. When it's a little bigger, you have a nontrivial chance of walking left instead. When it's a little smaller, you have a bigger chance of walking left and a nontrivial chance of walking right.
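One simple way to get that behavior (a sketch, not the only option) is to squash the signed difference between the key and the split point through a logistic function; a temperature knob controls how quickly "a little bigger" turns into "almost certainly right":

```python
import math

# Probability of walking right at a node: a soft version of `key > split`.
# The temperature is a free parameter; smaller values make the decision sharper.
def p_right(key, split, temperature=1.0):
    return 1.0 / (1.0 + math.exp(-(key - split) / temperature))

print(p_right(10.2, 10.0))   # ~0.55: a little bigger, still a real chance of going left
print(p_right(18.0, 10.0))   # ~1.00: a lot bigger, almost certainly walk right
print(p_right(9.5, 10.0))    # ~0.38: a little smaller, bigger chance of going left
```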
Replacing the hard branch with a probabilistic one allows you to represent the output as `p * f(right) + (1 - p) * f(left)`, which is differentiable in `p`. There's a bit of nuance in how you get the proper gradient if you only actually take one path: some of the time you'll take the right branch, some of the time the left, and you want to re-weight the gradient for whichever branch you took so that, if you made that decision many times and averaged the results, you'd converge to the formula above. But since you have something differentiable in `p`, you're able to propagate that information back to the key you used to derive the probability.
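One standard way to do that re-weighting is a score-function (REINFORCE-style) estimator -- that's my reading of the parenthetical above, not necessarily the only scheme that works. A small sketch checking that it averages out to the derivative of the formula, `f(right) - f(left)`:

```python
import random

# Single-path gradient estimate for d/dp of  p * f(right) + (1 - p) * f(left).
# Each query takes only one branch; the contribution is re-weighted by the
# derivative of the log-probability of the branch actually taken.
def single_path_gradient(p, f_right, f_left):
    if random.random() < p:                  # walked right this time
        return f_right * (1.0 / p)           # d/dp log(p) = 1/p
    else:                                    # walked left this time
        return f_left * (-1.0 / (1.0 - p))   # d/dp log(1 - p) = -1/(1 - p)

p, f_right, f_left = 0.7, 3.0, 1.0
samples = [single_path_gradient(p, f_right, f_left) for _ in range(200_000)]
print(sum(samples) / len(samples))           # close to f_right - f_left = 2.0
```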
The space where I most see people wanting differentiable databases is in creating giant ML thingamabobs. In that case, the high dimensionality of the value being stored can work in your favor. E.g., you could query the DB a few times, record the probabilities of each path, and use those to do a weighted average of the values you found. High-probability paths will dominate the result, and low-probability paths might still mix in a little interesting information.
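As a rough sketch of that idea (the tree, the probabilities, and the stored vectors below are all invented for illustration): query a few times, remember how likely each sampled path was, and mix the retrieved vectors by those probabilities.

```python
import random

# Two leaves holding small "embedding" vectors; p_right is the chance of
# reaching the right leaf. All numbers here are made up.
leaves = {"left": [1.0, 0.0, 0.0], "right": [0.0, 1.0, 0.0]}

def sample_path(p_right):
    """Take one probabilistic query; return (value, probability of that path)."""
    if random.random() < p_right:
        return leaves["right"], p_right
    return leaves["left"], 1.0 - p_right

def soft_lookup(p_right, num_queries=8):
    results = [sample_path(p_right) for _ in range(num_queries)]
    total = sum(prob for _, prob in results)
    # Probability-weighted average: high-probability paths dominate, but
    # low-probability paths still mix in a little of their value.
    return [sum(prob * value[i] for value, prob in results) / total
            for i in range(3)]

print(soft_lookup(0.8))   # typically mostly the "right" vector, with a bit of "left" mixed in
```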
That technique would likely not work if you wanted to, e.g., just drop a differentiable database into your favorite web app and try to follow the gradients to find the inputs causing a certain bug. You'd need something fancier, probably not based on trees at all (at least not explicitly; you might use a tree internally to represent the data, but the conceptual computation wouldn't have those if/else step-function cliffs).
So there will be some key given to the database, and at each decision point the key either gets results from the right branch or the left (as opposed to the leaves, which hold the result itself). But if you hardcode the branching condition to be something like a greater-than or less-than comparison, that's not very useful for finding the best result for the key.
But if you have a probabilistic setup, then the key has some chance of retrieving data from the right and some chance from the left. Then we have

`expected goodness = p * f(right) + (1 - p) * f(left)`,
where f is some objective function that tells us the 'goodness' of the data the key retrieved by going right or left. If you differentiate the expected goodness with respect to p, you get f(right) - f(left), which is how much the expected goodness will change if I increase the probability of going right.
Suppose the derivative f(right) - f(left) is positive, so going right is better than going left. Then I can write some script to increase p so that the next time I see that key, the probability of going right is higher. That way we can optimize p for given keys.
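Something like this toy sketch (the learning rate and the clamping range are arbitrary choices of mine):

```python
# Nudge p in the direction of the derivative f(right) - f(left),
# clamping it so it stays a valid probability.
def update_p(p, f_right, f_left, learning_rate=0.1):
    gradient = f_right - f_left            # d(expected goodness) / dp
    return min(max(p + learning_rate * gradient, 0.01), 0.99)

p = 0.5
for _ in range(10):
    p = update_p(p, f_right=3.0, f_left=1.0)
print(p)   # climbs to 0.99: going right was better, so go right more often
```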
Very interesting! I hope I got everything right; where I couldn't understand, I used gpt to help me out (when it told me f was a 'goodness' function, I had a breakthrough lol). The discrete way feels like clean, understandable logic while the probabilistic way with changing p values feels really messy, but I can see intuitively why the latter gives better results.