If you have a spreadsheet model, it lets you answer questions like "Which variable in my model is the most important?" and "If I underestimate X by 10%, what's the effect on Y?".
Most models are built on assumptions and estimates. If you run a company, your financial model will probably include an assumption about your future customer growth rate. If you use this model to make decisions ("How many people can we hire in Q4?"), then it's important to understand how sensitive those decisions are to your assumptions. It may turn out that overestimating your growth rate by 10% means you can only hire half as many people, in which case you'll probably want to hire a bit more conservatively. This kind of analysis is pretty cumbersome to do in spreadsheets.
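To make this concrete, here's a minimal sketch of a one-at-a-time sensitivity check in Python. The hiring model and all its numbers are hypothetical, purely for illustration; this isn't Causal's actual method:

    # Toy model (made-up numbers): the hiring budget is 50% of
    # projected revenue, which scales with the assumed growth rate.
    def hires_affordable(growth_rate, revenue=100_000, cost_per_hire=10_000):
        projected_revenue = revenue * (1 + growth_rate)
        hiring_budget = 0.5 * projected_revenue
        return hiring_budget / cost_per_hire

    base = hires_affordable(growth_rate=0.20)
    # Perturb the growth assumption and watch how the output moves.
    for delta in (-0.10, 0.10):
        perturbed = hires_affordable(growth_rate=0.20 * (1 + delta))
        print(f"growth assumption off by {delta:+.0%}: "
              f"hires change by {(perturbed - base) / base:+.1%}")

This toy model is linear, so the output shifts gently; real models often have nonlinear relationships between inputs and outputs, which is exactly when small errors in an assumption can have outsized effects.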
More broadly, we think there should be a better way than Excel to crunch numbers and do modelling, and we're trying to build it (https://causal.app). Would love to chat if this sounds interesting: taimur@causal.app :)
I see you're using a variance-based approach. How does it handle a dependence structure among the inputs? To my knowledge, Sobol indices assume that the input variables are independent.
Yep — we assume that the input variables are independent. This is more acceptable in some settings than others, so as always, it's good to understand the limitations of the model. Thanks for pointing this one out :)
I am not aware of any software package that implements these algorithms, although I am working towards creating or contributing to a Python library that does. Ideally I'd publish a paper in a few months showing how the results (e.g., the ranking of importance) differ from the independent case on particular datasets.
Edit: It turns out SALib recently added some of this functionality. I'll check it out!
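For anyone curious, here's a minimal sketch of the classic (independence-assuming) Sobol workflow in SALib; the problem definition and toy model below are made up for illustration:

    from SALib.sample import saltelli
    from SALib.analyze import sobol

    # Hypothetical problem: three independent inputs on [-1, 1].
    problem = {
        "num_vars": 3,
        "names": ["x1", "x2", "x3"],
        "bounds": [[-1.0, 1.0]] * 3,
    }

    # Saltelli sampling generates N * (2D + 2) input points.
    X = saltelli.sample(problem, 1024)
    Y = X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]  # toy model

    Si = sobol.analyze(problem, Y)
    print(Si["S1"])  # first-order indices: each input's effect alone
    print(Si["ST"])  # total-order indices: including interaction effects

The dependent-input variants would replace this sampling/analysis step; I haven't tried that part of SALib yet.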
I'm late to the party. Hoping you can answer a question. On the one hand, you're lowering the bar to do useful statistical work, which is awesome. A new set of people will be able to do better analysis. On the other hand, you're baking in assumptions which this new set of people won't understand, so you aren't necessarily lowering the bar to do correct statistical work. That seems challenging, and I'm curious how you think about it.
Backpropagation essentially works out how much each input contributes to the output and adjusts the weights to fit the output. In other words, for a trained single-layer neural network, the weights alone will tell you which inputs have the greatest effect on the output.
Sorry for the late reply, just saw your comment. I think what you're describing is training a model and then using feature importance. That's also a valid way to calculate sensitivities, and it's used when the function from inputs to outputs is unknown. In our case, however, this function is known, so there's no need to train a predictive model.
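For comparison, here's a minimal sketch of that train-then-inspect approach, with made-up data and scikit-learn for convenience. With standardized inputs, the fitted weights double as a rough importance ranking:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical data: three standardized inputs, two of which matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

    model = LinearRegression().fit(X, y)
    # Coefficient magnitudes ~ input importance: roughly [3.0, 0.5, 0.0].
    print(model.coef_)

When the input-to-output function is known, as with a spreadsheet model, this fitting step is unnecessary; you can compute sensitivities from the function directly.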