Ditching Excel for Python in a legacy industry

yold__ · on Dec 31, 2020

I'm a research actuary working in reinsurance. Here is why I think Python creates more problems than it solves from the standpoint of most insurance business users:

1.) Environment management. There are many solutions for managing python dependencies, my favorite is Docker + pip. Good luck getting actuaries and underwriters to write Dockerfiles etc, and good luck getting I.T. to support Docker on Windows desktops. Like it or not, the best "feature" of Excel is that it is mostly the same on every corporate Windows machine.

2.) Unless you are using numpy / numba, Python isn't that much faster than VBA (if at all). Both are "compiled" to interpreter bytecode.

3.) Speed of development and traceability. Excel takes a lot of getting used to, but if you know the purpose of the spreadsheet (e.g. a reserve calculation), it's relatively easy to figure out what a mangled and convoluted formula is doing (Excel has a "debugger" that allows you to evaluate formulas by highlighting pieces).

4.) LAST BUT NOT LEAST. Many financial and actuarial (insurance) calculations are inherently recursive. Excel has built-in memoization (in the dynamic programming sense). It also has a reactive programming model. Good luck implementing that in Python without tripping up on the huge amount of function call overhead, even if you use a memoization decorator.

SoSoRoCoCo · on Dec 31, 2020

It is great that you respond with succinct reasons.

People who have used Python for some years seem to forget just how clunky it really is. I've been using Excel since 1990, and sure, it has its own warts, but Python is a very rudimentary tool compared to Excel.

Python is a machine shop. Excel is a car. It may be a lemon, but it's a functional car.

This is a great example of programmers not being able to see the forest for the trees. Reminds me of the "Once Linux gets a desktop it will take over the world" debate from circa 1997-today.

trimbo · on Dec 31, 2020

> Reminds me of the "Once Linux gets a desktop it will take over the world" debate from circa 1997-today.

Linux did take over the world, just not on the desktop. It was on servers and mobile, which now have more users than desktops or laptops (edit: servers via the web).

Technology gets its warts fixed when it grows along an explosive new market, especially if the market ends up being larger than the last.

Python is currently riding the data science wave, and that wave is growing. If that market expands to the point where large scale data-science type work wags the dog of VBA/excel, the clunkiness[1] will work itself out.

[1] - I don't actually understand what's clunky about Python in the context of the article. Seems like a reasonable direction in a complex market (reinsurance) driven by actuaries. I'd be surprised if newgrad actuaries/stats people aren't using Python?

SoSoRoCoCo · on Dec 31, 2020

Absolutely. I worked at Intel and our distributed computing pools went from a combination of SunOS and AIX machines to Linux in about 3 months, essentially overnight, (back in the 90's). It was an astonishingly fast deployment.

Linux dominates the server world AND the entertainment device world (hello busybox & gstreamer!)

[1] Regarding clunkiness of Python: mostly it is the packages, installation, and 2.x vs 3.x nightmare that persists. Everyone seems to forget the initial pain getting the Python env to work, esp. when it comes to cython native compilation issues / arch wheels, unsupported packages, etc. The only issue I have with python is it is extremely challenging to make cross-platform deployments for single-executables. I've tried three different approaches and they were all trainwrecks. Once that is ironed out, I'll be switching from Electron to whatever Python offers.

nl · on Dec 31, 2020

I think the 2.x vs 3.x issues have mostly been resolved by now. I don't think I've hit one for a long time, and even StackOverflow answers are more likely to be Python 3 now.

SoSoRoCoCo · on Dec 31, 2020

macOS still ships with Py2.7 and has dependencies, and npm-gyp only recently switched to 3.x. same with python SDR. it depends what you use: less popular packages are still languishing.

but that discounts the tens of thousands of projects that are already out there that are in use and need conversion.

it'll take probably 3-5 years for it to really go away.

nl · on Dec 31, 2020

Yes, that's true. And aren't some versions of RedHat still on 2.7 too?

But I class this as a packaging issue more than a 2.7 vs 3.x issue: you see the same problems with (as a random example...) different versions of OpenCV - people not using virtual environments have problems even if they are all on 3.7.

When I think of the "2.x vs 3.x problems" I was thinking more of the language and core libray level incompatibilities.

buryat · on Dec 31, 2020

macOS ships with both py2.7 and 3.7, `python` would call 2.7, and `python3` would 3.7

vimy · on Dec 31, 2020

As of macOS 11.0 there is no python anymore. You have to download it.

SoSoRoCoCo · on Dec 31, 2020

Really? YAY!!! I upgraded to Big Sur and it is still there... but I don't know if it is a legacy since I've been TimeMachining my machines since 2012 and things persist.

vimy · on Jan 1, 2021

Hmm apologies, I was wrong.

The Catalina release notes[1] said they would remove it and I was sure they mentioned it again in this year’s WWDC but apparently it’s still included after all.

[1] “Scripting language runtimes such as Python, Ruby, and Perl are included in macOS for compatibility with legacy software. Future versions of macOS won’t include scripting language runtimes by default, and might require you to install additional packages. If your software depends on scripting languages, it’s recommended that you bundle the runtime within the app.”

noir_lord · on Dec 31, 2020

> Linux did take over the world, just not on the desktop.

I'm honestly not sad that it never happened.

It took over my desktop around ~1998 and I wonder if massive adoption of Linux on the desktop would have benefited me or been worse (from my perspective).

As it stands literally every single tool I want/need to do my job is already available for Linux and indeed many of those tools are simply better on Linux (docker on a mac is horrible, I have a work issued current gen macbook pro, I use it purely for testing docker set-ups and then it goes back in its case).

Its going to sound elitist but not dumbing down the platform for the average user is a benefit to me.

danieldk · on Jan 1, 2021

Well, it would certainly make may day live easier if we had Microsoft Office, the Affinity Suite, and Lightroom on Linux, which would probably have happened if Linux was the dominant desktop.

atoav · on Dec 31, 2020

Depends really on your proficiency at python and the task at hand. For many "small fry" tasks Excel is perfectly fine and faster. Good when the table data is meant to be in foreground.

For things where data exploration and formulas should be in the foreground or where data and formulas should be strictly separated python (or a python/jupyter notebook) has tangible benefits (e.g. really good list syntax).

rualca · on Dec 31, 2020

> "Once Linux gets a desktop it will take over the world" debate from circa 1997-today.

Pretty much all servers run on linux.

Linux also dominates smartphones in the form of Android phones.

Chrome OS is also linux, and it's market share is currently at around 6%.

I mean, other than desktops Linux pretty much is everywhere.

pjmlp · on Jan 1, 2021

> Linux also dominates smartphones in the form of Android phones.

Not really.

See any Linux specific APIs on the NDK official APIs?

https://developer.android.com/ndk/guides/stable_apis

Android is a mix of Java and Kotlin based frameworks, ISO C and C++, POSIX subset and a couple of additional libraries.

Whatever kernel gets used is an implementation detail for Google and Android device makers.

Can be completely replaced in Android 12, and the eco-system would continue to work.

> Chrome OS is also linux, and it's market share is currently at around 6%.

Basically Android (already mentioned above) and Web stacks.

https://chromeos.dev/en/android-environment

https://chromeos.dev/en/web-environment

Ah, but it does expose Linux you say, https://chromeos.dev/en/linux

Indeed, except of the small detail that as shown on the Google IO talk, it is actually a design similar to WSL 2, running a second kernel on a hypervisor based environment.

The real kernel powering ChromeOS doesn't get exposed to userspace and can also be replaced at any time, if Google so desires.

In fact, in a near future Android and ChromeOS can be running on top of Fuchsia and most consumers wouldn't even notice.

fock · on Dec 31, 2020

I wonder what the problem is with standarizing companywide around python@3.x, numpy@1.1x and pandas@1.x. At this point these can all be considered mature and why on earth would an org, which is not developing these packages, nor heavily consuming outside code (because they didn't with excel either in a sensible way?) decide to jump on the "but we need rrrrrollling release"-fad bandwagon?

yold__ · on Dec 31, 2020

The problem isn't feasibility, it's resources. Building and rolling out a standardized environment, and maintaining it, will cost millions of dollars. It shouldn't, but it does. And for what added benefit? The end-users don't want it, you'd have to spend another couple million for a lateral move at best. More than likely, you'll end up with a pile of Python spaghetti code that runs slower than the spreadsheet (see point #4 about massively recursive calcs).

shnock · on Dec 31, 2020

Why does building and rolling out a standardized environment cost so much? Could you break down the requisite steps and resources required to achieve this?

Thank you, I appreciate it :)

yold__ · on Dec 31, 2020

Up-front costs (mostly salaries, but all I.T. projects are "billable") 1.) Getting buy-in from solutions architect, software architecture, information security, I.T. management. This will be a 6 month process. 2.) Getting buy-in from actuarial management and audit. Another 6 month process.

Recurring annual costs (over 10 years) 3.) Contractor at $150 an hour = $300K annually 4.) Contractor PM at $50 an hour = $100K annually 5.) Information security compliance hoops, getting it to play nicely with the myriad of endpoint security tools, etc 6.) Ongoing maintenance and support (failed rollouts and upgrades, user desktop support, user training)

tomnipotent · on Dec 31, 2020

This isn't realistic at all. Assuming actuaries know Python like they know Excel, the only other added cost is someone technical to slap together a working environment, which isn't particularly hard when compared to the state of what Excel offers in a collaborative environment.

You just don't jump from "humans in Excel" to "CI/CD perfected pipeline" overnight, nor do you need it.

Excel shops still have costly expenses rewriting entire workflows/re-doing Excel files constantly as people come and go, it's not like there isn't already maintenance cost with the current method.

buryat · on Dec 31, 2020

reminds me of standardization of the shipping container, not sure of all the details of how that push had happened though

sam_bristow · on Dec 31, 2020

Funny story about Excel on corporate machines. A couple of years ago the company I work for got boight by an Italian company. When we finally migrated the Windows users over to the corporate Office installs a bunch of people found that Excel wouldn't work for them. Things like sum(A1:A20) were syntax errors.

After a bunch of digging i worked out that the localisation from corporate meant they suddenly had Italian function names not English. Very confusing.

Excel is a program that is both incredible and terrifying to me. There are ways of building spreadsheets that are reliable and auditable. Then there's how 95% of people do it.

You can start out really quickly and make great progress. But it tends to grow and metastasize before you know it.

etothepii · on Dec 31, 2020

It must be easier to build an auditable and reliable solution using a high-level language programming language and concepts like source control and automated testing.

Excel is only easier if you aren't interested in building something auditable and reliable solution that might have some hope of being maintained after you have left the company.

sam_bristow · on Dec 31, 2020

That's the thing, most Excel workbooks start out as a one-off then gradually get adapted and extended until they're load-bearing.

They're often built by specialists in another dept who definitely wouldn't consider themselves programmers.

Doing it 'properly' would probably mean having to spec put the problem, get a budget, maybe wait a few months for someone to look at it. And the same thing every time the requirements change.

Excel is available today and they can get started solving their immediate problem straight away.

After it's been in use for a couple of years and shown value someone takes a look and sees the Lovecraftian horror it's become.

tomnipotent · on Dec 31, 2020

> until they're load-bearing

This cannot be stressed enough. I've outlived generations of finance teams at many startups, and I've seen firsthand the masterpieces/abominations left behind in Excel. Imagine a dozen sheets with ad-hoc queried data copy/pasted from System A/B/C/D into Excel, with formulas that feed formulas that feed formulas. Sometimes columns are inputs (seasonality adjustments for monthly forecasts), sometimes their outputs (modeled growth * last year * seasonality adjustment) and more often than not their right next to each other and maybe they have different cell background colors or a black separator line. Maybe.

And this is just finance. For many e-commerce businesses, planning is done in Excel with equal zeal.

sam_bristow · on Dec 31, 2020

I remember hearing about a mythical spreadsheet floating around for modelling something to do with our national grid a few years back.

It would take about 12 hours to calculate, and would error out before finishing about 30% of the time. It needed to be run once a day for something reasonably important.

I don't use Excel much these days, but I do point people to a video if they do plan on doing anything:

* [You suck at excel - Joel Spolsky](https://m.youtube.com/watch?v=0nbkaYsR94c)

rvba · on Dec 31, 2020

Someone who does not use Excel nice will probably make spaghetti code as well.

larrywright · on Jan 1, 2021

I'd rather reverse engineer spaghetti code than a spaghetti Excel spreadsheet.

JackoM · on Jan 5, 2021

Most probably because you do not know Excel !? But for someone who do know well Excel and understand finance concept it'd be easier to understand the spaghetti Excel spreadsheet and find where it doesn't work properly !

empthought · on Jan 2, 2021

You’ve been lucky with your spaghetti code experiences, then.

larrywright · on Jan 2, 2021

At least with spaghetti code I can set a breakpoint or add print statements and get an understanding of the execution path.

Mauricebranagh · on Dec 31, 2020

And you find a bug that costs you $$$$,

I worked for a company that used an opaque excel spread sheet as a part of its accounting system - turns out there where bugs and we found a massive short fall one of the contributing factors in the collapse of the company.

etothepii · on Dec 31, 2020

This is the real problem. It's not with excel per se, but the complete lack of automated testing and source control.

fnord123 · on Dec 31, 2020

Python works with source control and tests so this would indeed be a problem with Excel per se.

etothepii · on Dec 31, 2020

But Python doesn't have to, I've seen plenty of python with 1000 line functions, no tests and no source control. It's particularly common in Jupyter Notebooks.

dataflow · on Dec 31, 2020

> There are ways of building spreadsheets that are reliable and auditable. Then there's how 95% of people do it

Do you have any pointers to learning materials on how to do this? Would be interested in reading more on it.

sam_bristow · on Dec 31, 2020

This Twitter thread is a good start: https://mobile.twitter.com/keith_ng/status/13079610874515251...

There also Joel Spolsky video I linked in another comment: https://youtu.be/0nbkaYsR94c

I don't actually use Excel much so others might have better resources.

dataflow · on Dec 31, 2020

Oh I've already seen the Spolsky video, but I seem to recall for the most part (except for using tables, which is great) it's about how to use Excel effectively, not about how to produce maintainable sheets. The Twitter thread looks good though. Thanks!

unixhero · on Dec 31, 2020

Localisation is where you first have to debug Excel formula errors. Is it colon, or semicolon; it responds to, is my goto method to approach the issue.

HALtheWise · on Dec 31, 2020

This brings up a good point, which is that Excel supports localization, while Python just assumed you know English.

stochastastic · on Dec 31, 2020

Hey fellow reinsurance actuary! I totally agree that Excel has its place in modeling, especially one-offs, and your criticisms make sense. That said, we have been moving a lot of our calculations to Python. We have had way too many rickety tools to move files or send emails (“first you open this spreadsheet and click this button, then you open this spreadsheet and click this button, then...”), and way too many version control issues over the years. Python solves those nicely.

I’m curious about docker + pip, why do you like that better than poetry or pipenv?

yold__ · on Dec 31, 2020

One reason why Python is so successful is that it places very nicely with C code. Many of Python's libraries are thin wrappers around native DLLs.

For example, numpy is a wrapper around a BLAS DLL (e.g. Intel MKL). Pipenv manages the python side of things, but don't exert control over the system DLLs (like Docker does). Anaconda gets very close to what Docker does (by managing DLLs). Have not used poetry, so can't comment.

Ultimately, like most dependency management issues, lacking a stable DLL environment won't be a problem until it is :)

lldbg · on Dec 31, 2020

numpy is much more than a wrapper around a BLAS dll. BLAS implements three sets of operations: Level 1: unary and binary vector vector operations, one transform. Level 2: Matrix vector operations. Level 3: Matrix matrix operations (most famously the dgemm routine).

Perhaps some blas implementations offer more features, but that would defeat the purpose of a standard interface.

stochastastic · on Dec 31, 2020

Okay, I can totally see it now. We are still at the stage where people seem to think that I’m neurotic for worrying about the python side of things, so DLLs have not been on the radar. ;)

z3t4 · on Dec 31, 2020

In Nodejs native libraries are a PITA. The compatibility API breaks on a schedule every 6 month, dependencies get updated by OS distros sometimes breaking stuff, then you rely on the OS being able to compile the library. I hope Python has better native interop.

syndacks · on Dec 31, 2020

Much like the referenced Excel spreadsheets becoming unwieldy, so does a dev's machine [0].

0. https://xkcd.com/1987/

etothepii · on Dec 31, 2020

It would be considered incredibly bad practice for such a dev's machine to be used to perform almost any calculation of significant business importance. It's why mediocre Tech Executives are able to appear like they got some work done by focusing on "no access from dev to production."

With Excel there is no such separation, and when there is it would make a great punchline to an XKCD or Dilbert cartoon.

newdude116 · on Dec 31, 2020

From the submitted link:

"The desire to price increasingly complex deals with increasingly large datasets"

Bingo! Most people use Excel when they actually should use a database. I am sure you can use Excel with a database like MS Access, but then again, who does?

To your arguments: 1. " and good luck getting I.T. to support Docker on Windows desktops." Yah. Great experience to work with Excel on Linux.

2. You can always link compiled code for stuff that needs to be fast. But in the end most people wont use neither python nor Excel for HFT

3. " it's relatively easy to figure out what a mangled and convoluted formula is doing"

https://www.sciencemag.org/news/2016/08/one-five-genetics-pa...

https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-al...

https://www.sciencealert.com/excel-is-responsible-for-20-per...

4. Maybe. Not sure it is really an issue.

cutler · on Dec 31, 2020

The case for using a database with Excel is at least as strong as for using Python or C# to model Excel data. There are excellent free adapters for MySQL and PostgreSQL.

RobinL · on Dec 31, 2020

I think these are all fair points and a good reflection of the downside of Python. But there are also some pretty huge upsides. My view is that for any model of significant complexity, the pros of Python outweigh the cons from a technical point of view.

- Abstraction. It's very difficult to effectively abstract parts of a model in Excel. It's a bit like a doctor having to 'model' a human being as a collection of atoms, rather than having abstractions like organs, cells etc. This makes it very hard to build re-usable components, so analysts end up reinventing the wheel. You also quickly hit a 'complexity ceiling' in Excel, above which mistakes and errors becoming much more likely, and complexity is very difficult to manage.

- Existing libraries provide a huge range of sophisticated calculations and operations for which we don't need to write any code.

- Separation of concerns - particularly separating data from model. Easy in Python, hard in Excel. Another aspect of this is that using data science software promotes the use of tidy data[0] (i.e. clear thinking about how data should be structured).

- Unit/integration tests. For complex models, these are essential. Users of Excel (even extremely clever/competent people) don't have have a great reputation for producing error-free spreadsheets, and I think this is an important reason why, alongside copy-paste errors. The tools for testing in Excel/VBA are rudimentary.

- Version control. This is particularly important for historical reproducibility because it allows us to run past models, and also understand what has changed in the codebase since.

I appreciate some of the above is also possible in VBA, but if you're writing an entire model in code and not really using Excel at all, my view is it's better to use a more sophisticated programming language.

There is also an important cultural point of having to re-skill everyone, and I can see that in some context that means in the short run at least, Excel/VBA may still be better overall.

I've written a bit more about all of this here: https://www.robinlinacre.com/transforming_analytical_functio...

[0] https://vita.had.co.nz/papers/tidy-data.pdf

pjmlp · on Dec 31, 2020

> I appreciate some of the above is also possible in VBA, but if you're writing an entire model in code and not really using Excel at all, my view is it's better to use a more sophisticated programming language.

Which is why many VBA experts eventually adopt VB.NET instead of jumping into a complete foreign language, with the benefit that is actually compiled to native code (JIT/NGEN), if performance is ever an issue.

cutler · on Dec 31, 2020

Yes, I was wondering why Python was the automatic choice considering C#.Net has typesafe native APIs for Excel on Windows.

RobinL · on Dec 31, 2020

I was more thinking about migrating away from Excel fully rather than interfacing with Excel from Python.

I agree that to interact with Excel programmatically VBA is a better choice (and no doubt C#/VB.NET as well, but I have no direct experience). For what it's worth, for interacting with Excel and Office more generally, I've always though VBA is extremely well designed.

jandrewrogers · on Dec 31, 2020

I use both Excel and Python, and like both. They solve different kinds of problems, even within the same context.

Excel is fantastic for what I would describe as linear modeling, building a graph of effects in single data models. I reach for Python when I need to fundamentally transform the data model at points to answer the desired question. That is difficult to the point of being impractical in Excel, especially if the data model is large or exploratory. Python is more programmable in this regard but also lacks the strong static typing that would be useful in such work.

I can’t imagine not using either.

sriku · on Dec 31, 2020

If you have valid reasons to make the move from Excel to Python, why not consider Julia? Environment is easier to manage (Pkg.add), the language "looks like Python and walks like C", math-friendly style possible, just-ahead-of-time compilation resulting in high performance (enough to not need native code implementations), interactive development (Jupyter was named for Julia-Python-R after all), @memoize may be enough for you and Pluto gives you reactive notebooks.

Bonus - tools are also emerging to make stand alone distributables.

Disclaimer: I neither work for nor am I affiliated with Julialang. I just use it.

mint2 · on Dec 31, 2020

Why do actuaries refer to workstation/desktop computers with more than 16 cores as “super computers” it’s embarrassing but sometimes I give in an say “the super computer” because I’m in a hurry and they’ll give me a blank stare if I call it a workstation or anything like that.

mark-r · on Dec 31, 2020

They really are supercomputers though. Do you know how much faster a modern PC is compared to say a Cray-1? Especially if it has a decent graphics card.

mint2 · on Dec 31, 2020

So a typical actuary’s technological reference point is stuck in 1985? That explains a lot about excel and sas egp. But seriously, calling a fairly standard computer in the tech world a “supercomputer” is just another example of the underlying attitude in insurance that makes many actuaries recoil in horror about the thought of “programming” aka learning python or any programming best practices.

ryukafalz · on Dec 31, 2020

Yeah but at this point my phone is comparable. If being faster than a Cray-1 qualifies something as a supercomputer, the definition is meaningless now.

mch82 · on Dec 31, 2020

> problem #1: Environment management

Great observation. Python environment management is getting simpler, but is off putting for people without a software background. Unclear even CS majors get enough classroom exposure to package & dependency management to utilize Python efficiently.

I’m more optimistic about an on-prem deployment of Jupyter Notebooks or Sage Math Cloud as a way to hide a lot of the setup complexity. More like a wiki for math. Curious if anyone has stories/tips to share (good or bad)?

everling · on Dec 31, 2020

Quant here, at my firm we've deployed a JupyterHub server which provides users with a production docker image, so that analysts and portfolio managers can perform analyses without installing python, dependencies and sql drivers locally. It is working well and spurs interest in Python across the wider org - so we let everyone use it.

Similar to in OP's case, I think the real selling point of Python over Excel is advancing the capabilities and the scale of the business. Talks of different programming languages falls on flat ears in finance - show what can be done instead. With Python, Zipline and notebooks I can manage a global equity portfolio, continuously adding active strategies and adapting to real-world changes and constraints. And backtest! Excel is great, but there is an upper bound to what can be reasonably done without a thriving open source community.

klelatti · on Dec 31, 2020

That's really interesting. I've been working on a JupyterHub / JupyterLab / Python based product for the insurance industry - choosing this for all the reasons you've cited. Would be really interested if there are any points you can share e.g are you using Kubernetes and if so how have you found it?

everling · on Jan 1, 2021

We use Kubernetes, according to our IT/devops guys it was pretty straightforward to deploy in Azure with Jupyterhub's KubeSpawner module + documentation. A few people are quite eager to learn Python / code in general, so we try to make it convenient. One common use case would be to work with existing excel spreadsheets, so the notebook volume storage should be mountable in Windows. The file upload/management in notebook servers is quite obtuse IMO.

If a recurring task can be reasonably parameterized then a Streamlit app might be a better choice in some instances. I've developed a monitoring application for our portfolios where I can track daily asset weights, underlying data points, computations etc. Not displaying code ensures that the output can be consumed by a wider audience.

klelatti · on Jan 1, 2021

Thanks especially for the Streamlit suggestion.

We've tested JH with K8 in GCP which was straightforward also. With a small team though tending towards a single VM deploy (based on "The Littleist JupyterHub") which looks a lot easier to maintain.

mch82 · on Jan 7, 2021

Thank you for sharing.

techphys_91 · on Dec 31, 2020

Is this really an issue for the use case being described? Environment management is obviously a significant consideration for software developers who need to keep track of versions etc, but it sounds as if these users primarily want to use the fundamental numpy functions.

They could install one of the scientific python stacks (e.g. anaconda) or just install packages globally with pip.

macintux · on Dec 31, 2020

Things always break over time. I have a very technical co-worker, a systems admin, with a broken Python stack on Windows. No idea what's wrong.

fock · on Dec 31, 2020

so he did a chmod -w on their folder and it broke. Well, that's a feat I still have to work out!

mch82 · on Jan 7, 2021

Anaconda switched its software license recently to require large companies to purchase a license, so now people must wade through the purchasing department before installation & use.

klelatti · on Dec 31, 2020

Agree completely on 1. but not sure on 2. and 4. - I think if you're using Python and need performance then you will be using Numpy etc - would be interested to hear if there are instances where this doesn't work.

yold__ · on Dec 31, 2020

numpy is great for vectorizable calculations, but many calcs (particularly for long-term life contingent risks, i.e. reserves), are not vectorizable except in the most simplistic cases.

auxym · on Dec 31, 2020

Numba can usually speed up non-vectorizable things by JIT compiling and potentially parallelizing.

I mean, in an ideal world you'd use Julia or a Cython extension, but if you already have something in Python/numpy, numba only requires you add a decorator to your function and it gets jitted.

klelatti · on Dec 31, 2020

Thanks - sorry I'm struggling a bit - wouldn't they be vectorizable across the portfolio or across scenario for stochastic calculations. Maybe it's because of different backgrounds (mine in UK) but I'm can't recall seeing the deeply nested function calls that you're alluding to.

yold__ · on Dec 31, 2020

Think about the calculation of an insurance product with a Fund Value. Everything is forward recursive with respect to time. Been a while, so I might butcher some of this. It is likely that you'll want a 30 year projection, so you'll call fundValue(30 * 12)

fundValue(t+1) = if t > 0 fundValue(t) - charges(t) + intCred(t) else initialPrem

charges(t) = netAmtAtRisk(t) * costOfInsurance(t) + riderCosts(t) + policyFee(t)

netAmtAtRisk = (FaceAmt - fundValue(t))

Now think layering on decrements

surrenderMargin(t) = lapseDecrement(t) * (surrenderCharge(t) * fundValue(t))

mortalityMargin(t) = mortalityDecrement(t) * netAmtAtRisk(t)

investmentMargin(t) = (earnedRate(t) - intCred(t)) * assetBase(t)

Now think layering on calcs necessary to calculate the assetBase (e.g. reserves + required capital)...

klelatti · on Dec 31, 2020

That code looks very familiar! I see what you mean now. I don't think I've ever seen this implemented recursively though - can certainly see how this would end up being problematic if you tried to do this in Python!

ps Thanks so much for taking the time to set this out.

pps I've been working on something that implements a highly optimised version of this style of calculation - with a DSL to describe the calcs - can do 30 year cashflow projection for 1m contracts in about 1 min on quad core laptop. UK focus initially but might have wider application?

cutler · on Dec 31, 2020

Clojure now has libpython which would enable you to add the recursion.

etothepii · on Dec 31, 2020

The calculations typically performed by actuaries are individually all fairly simple but in aggregate without automated testing and version control it is reckless to use them in pricing or portfolio calculations.

harha · on Dec 31, 2020

For 1. I would say R is a good option. It works relatively well everywhere and has an ok IDE, lots of packages that make life easier (tidyverse). I also wouldn’t recommend python for exactly that reason.

1337shadow · on Dec 31, 2020

I thought the point was to make a web interface in Python to query some database and process to calculations on a powerful server, in which case the database and server for sure are going to be faster than excel, including the WITH RECURSIVE SQL statement, and only a browser is necessary for users which makes the solution not only multi platform but also remote-friendly. As such, your post really makes me wonder what they are trying to do at all.

raverbashing · on Dec 31, 2020

Those are fair criticisms but I think under Windows, Docker+pip is the worse way of managing it

2 and 4 are surprising, it would be interesting to do a benchmark and maybe figure out the best way to do stuff in Python for your case

About 3, I suppose that's why developers should break up complex expressions (and not only in Python)

etothepii · on Dec 31, 2020

Its absolutely true that an untested 1000 line python function is no better than a 1000 line untested VBA function.

eyeball · on Dec 31, 2020

the new functions in recent versions of excel make an even strong case for its use

https://techcommunity.microsoft.com/t5/excel-blog/announcing...

https://www.excelcampus.com/functions/dynamic-array-formulas...

etothepii · on Dec 31, 2020

I have wondered about this, the lambdas and table could be of huge benefit against some of the most egregious excel mistakes but that isn't an argument for excel's use.

The problem is not one of it not being possible to do automated testing or source control in Excel. VBA is Turing complete so anything is possible, it's more one of not thinking, or understanding why, those things are important. Once you do come to think of such things as important you will quickly never use Excel for anything but the most basic calculations.

kfk · on Dec 31, 2020

Hi, this is a late reply but I am actually pulling this off. Your points above all make sense but are around the desktop paradigm. If you move to the cloud paradigm not only most of the points go away but you gain a lot by having data all in one place (S3) and strict collaboration (github). Specifically the Python env problem goes away if you ask analysts to work online with notebooks (ie jupyter hub).

etothepii · on Dec 31, 2020

Just because one can trace ones way through an excel spreadsheet should be of negligible comfort in the aftermath of a multi-million dollar loss.

belorn · on Dec 31, 2020

The article specific mention that the environment is the browser using python notebook. In the author use case there is no docker, no pip, no window desktop for which I.T. support is managing python on client machines.

I also wonder, if I.T. support were to use docker, are they doing that for python, or would they still continue to use docker even if they move away from python?

yufeng66 · on Dec 31, 2020

I am an actuary for 20 years and people have been trying to replace Excel for at least 15 of them. But Excel is not going anywhere. I think it will be even more popular with the recent introduced lamda function. Excel formulas will Turing complete and we don’t have to use VBA anymore.

mslate · on Dec 31, 2020

From an actuary's perspective: is Google Sheets ever entertained as an Excel alternative?

will_pseudonym · on Dec 31, 2020

No, for many reasons. Workbook calculation performance, advanced, complicated spreadsheet incompatibility, worse UI/keyboard shortcuts, slow UI, different language than VBA for custom functions (It isn't necessarily worse, but it's the same thing as trying to convince a department to switch from language X to Y including rewriting every application that's written in X. Also, every person you've ever hired was familiar with X, and has no experience with Y.)

R/Python/SAS etc. are a much more compelling alternative to Excel than Google Sheets (to say nothing of the actuarial modeling software packages that are used already for more rigorous/complicated problems).

If an insurance company decided to move all of their MS Office users to Google Docs/Sheets etc, my money is on the actuarial department paying for Excel out of their budget without a moment's hesitation.

yold__ · on Dec 31, 2020

It's been a couple years since I've used it, and I didn't feel it was a comparable alternative. It's decent for about 80% of spreadsheet users, but the keyboard shortcuts were lacking and it was missing some functions that I rely on.

For keyboard shortcuts, most Excel power users don't use the mouse, so while it sound trivial, it's really hard to feel productive when you have to hunt around for the right button to click.

From an enterprise perspective, Excel is so entrenched it would be a 5-10 year effort to port existing spreadsheets to sheets. Practically speaking, most companies wouldn't see the benefit.

will_pseudonym · on Dec 31, 2020

> From an enterprise perspective, Excel is so entrenched it would be a 5-10 year effort to port existing spreadsheets to sheets. Practically speaking, most companies wouldn't see the benefit.

And at the end of the day, it would have worse performance than Excel both in calculation speed and _much_ worse UI. One of the reasons Excel is so much better than Sheets is speed. Insurance companies spend hundreds of thousands of dollars a year on actuaries. Even if Excel cost them $500/year/user, it would be easily worth it for actuarial departments.

etothepii · on Dec 31, 2020

The hundreds of thousands of dollars a year that insurance companies spend on actuaries is the real reason that no one will ever come off excel. The maths is not very complicated but the lack of source control and testing means each bugfix or added feature introduces another bug to fix next week or feature to add the week after.

zinekeller · on Dec 31, 2020

Used GSuite (Google Workspace?): It's fine, it is improving but lack of shortcuts even for basic tasks (I can change the font on Word with just a keyboard, try that with Docs without a mouse). Dealbreaker is the 5 million cell (not row nor column) limit, which is even lower than Microsoft's old limits (more than 15 million cells, which was increased in 2007 to you-have-a-serious-problem-if-you-somehow-fill-this-limit cells).

andylynch · on Dec 31, 2020

Not to mention that if you're using Power Query, Excel's 1 million row limit doesn't apply in the workbook queries either.

HALtheWise · on Dec 31, 2020

Alt+/ is the magical shortcut that makes this easy, you can simply search a font name and hit enter.

anonymouse008 · on Dec 31, 2020

Not an actuary - but I think this crosses domains: the lack of comprehensive shortcuts makes Google Sheets DOA (dead on arrival) for my uses.

rvba · on Dec 31, 2020

Uploading propertiary data to a cloud is a big no.

etothepii · on Dec 31, 2020

Companies are more worried about this than they probably should be.

The average enterprise network is nothing like as secure as people behave like it is.

Where do you think your email is hosted? With few exceptions I'd expect its provided by a cloud provider these days.

maxerickson · on Dec 31, 2020

Would the people doing the implementation need to be able to choose and manage dependencies? Or could they do the work inside a prepared environment (comparable to Excel in some sense).

fock · on Dec 31, 2020

that costs millions!

yold__ · on Dec 31, 2020

I think you are mocking me, but I'll bite.

Insurance companies are contractor heavy. They bill at $150 an hour. That's $300K annually per head. Won't take long to get a million, when you add PM overhead, information security oversight and governance, etc. Again, it shouldn't cost that much, but it does.

deknos · on Jan 1, 2021

from your POV, what would be an better replacement than Excel? provided it's opensource?

davidu · on Dec 31, 2020

This is the argument made against all software stack advancements. Nothing to do with industry. But when the benefits outweigh the the hurdles, change happens. And if I was starting a new insurance company (which I've considered) I'd be doing our work in code not xls, and probably python. Having RCS, Numpy, unlimited compute, unlimited storage, all gives me an advantage over my competition. :-)

As to the memoization, that is not hard to manage in Python.

yold__ · on Dec 31, 2020

"As to the memoization, that is not hard to manage in Python."

Yes it is. Recursive calls for financial calculations easily go hundreds of thousands of calls deep. This is why high-end actuarial modeling software either decomposes it into a dependency graph and unrolls function calls where possible, or just "brute-forces" it by being a thin wrapper over c++, i.e. using operator overloading on ::operator().

I've seen ill-fated efforts of capable software developers attempting to unroll the recursive function calls, and ending up with 2000 line functions that are impossible to maintain.

edmundsauto · on Dec 31, 2020

I can't visualize what you mean by deep recursive calls. What are the calculations that mean you can't just use fairly bog standard python for? I didn't realize there was "big data" in accounting.

solresol · on Dec 31, 2020

Why doesn't annotating these functions with @functools.lru_cache(10000000) work?

yold__ · on Dec 31, 2020

First of all, let me say that I've tried it :)

Your recursion needs to "bottom-out" in order for that to work. If you don't get a stack overflow / out of memory error, you're good. But bear in mind that there will be thousands of stack frames. Before you get to time=0 (the recursive base case) in a long-term liability actuarial calc.

The recursion isn't simple like the Fibonacci sequence . It's more like:

f(t+1) = if t > 0 (f(t) + g(t)) * h(t) else initial_constant

g(t) = f(t) + q(t) - d(t)

q(t) = ....

d(t) = ....

andi999 · on Dec 31, 2020

Although at fist glance this formula is written recursively, one doesn't have to (and shouldn't) implement using recursion, does one? Just making f, g, q, d arrays and then loop over t should be good, or is there more to this formula?

yold__ · on Dec 31, 2020

Appreciate the curiosity. In this small trivial case, yes that works. But what happens when something in the logic changes?

You wind up needing to know the order of calculations since things are no longer lazily evaluated via recursion. This is a problem when you have dozens of "columns" (i.e. recursive functions or arrays as you are suggesting). Often times, the value in the array is NULL (or worse, leftover from a previous calculation). You are left to manually try and re-order the calculations, which is not trivial when there are hundreds of functions.

Excel takes care of these details for you automatically. Users program functionally and recursively (fill-down) without even thinking about it. Excel reactively updates when dependent values change (re-evaluates as necessary).

If power, speed, and scale are necessary, there are purpose-built systems (with Domain Specific Languages) which specifically solve this problem in the insurance domain (e.g. FIS Prophet, Risk Agility, AXIS, etc).

BuuQu9hu2 · on Dec 31, 2020

It is common knowledge that all recursive functions can be re-written using iteration (e.g. loops). See “ Recursion versus iteration” here https://en.m.wikipedia.org/wiki/Recursion_(computer_science). The assumption that only trivial calculations can occur using iteration, or that recursion alone allows for supportable code, I believe are very flawed assumptions.

BuuQu9hu2 · on Dec 31, 2020

Um, ever heard of a “for loop”? An obvious alternative to recursion.

Der_Einzige · on Dec 31, 2020

Not sure why this is being downvoted since I don't think the OP has done a good job of showing evidence that this doesn't work for recursion. Your computer almost certainly WILL have enough space for all the stack frames necessary.

pmart123 · on Dec 31, 2020

My guess is the numeric inputs would be changing significantly each call?

klelatti · on Dec 31, 2020

Agree 100% with this. Better analytics can be a key competitive advantage for insurers and modern tools / cloud offer potential to be much better than Excel.

I've been working on a product that turns JupyterLab into an IDE for life insurance calculations - Python API wrapped around an optimised C / GPU computation layer underneath, all integrated with key open source libraries.

bastawhiz · on Dec 30, 2020

When I worked at Uber, one of the big goals of my team was converting spreadsheets from the finance team to Python and Java. The second two problems that the author mentions (pulling in more data and software best practices) were two huge factors. In the former case, you simply cannot have an org where analysts have full read access to every data store to dump a CSV (of sensitive data collocated with lord knows what) at any time. It's a security nightmare. And in the latter case, when you've reached a point of sufficient complexity, you can no longer "roll out an update" to a team of more than a few people. Without versioning and source control, the model_v2_final_FINAL(1)(1).xlsx problem becomes extreme (even on cloud platforms). This leads to mistakes, and mistakes cost time and money.

Excel has other problems that aren't described in the article. First, it intermingles data and logic. If you're not especially careful and deliberate, running an experiment with multiple inputs means that you'll inevitably fuck up one of the inputs (or forget to change some data, or otherwise fail to do the steps necessary to reliably run the model again), leading to bad output. This is a reusability problem: you can do it right (one file per experiment, "template" spreadsheets, error handling logic), but in practice very few folks do this or even care.

Second, there's no meaningful way to test. If you've got critical logic, there's no way to write proper unit tests against the spreadsheet to ensure something hasn't broken. If I had a dollar for every improperly written linear regression in a spreadsheet... Conversely, writing spreadsheets as code means that you can rest assured that important units of logic are sound, which pays dividends when you're dealing with stuff used by a whole org.

Third, spreadsheets are really only useful as the "last step" in data processing. It's not good or easy to use a spreadsheet as input to something else. The inputs to the spreadsheet are usually manually updated (importing a CSV as a sheet), and then the output is graphical by default unless you're parsing the spreadsheet (good luck) or dumping it to CSV to import elsewhere (manual step with the risk of human error). In any business where the model you're dealing with pipes into other processes, there's almost always a manual step to get that data into "the next thing", be it another model, a dashboard, a database, etc. You can hack around this, but I've never seen a hack here that isn't incredibly brittle.

This isn't to say that Excel is bad, but when you use it "at scale" there are very rough edges that dramatically increase the ongoing costs of running a business built around it. When you're building a model, it's great. When you're running that model with different data more than a few dozen times a day and using the output in other systems, the costs quickly start to add up. That's the point where someone needs to step in and say "okay y'all, production use of this needs to run on a server". And if the production implementation is built well, you'll often find it simplifies the lives of the analysts, because they can download a blob of already- or partially-processed data to work with.

meztez · on Dec 31, 2020

I'm pushing for our actuarial team to transition to more R + Git. After 3 years of preaching, most of the actuaries now use RStudio + git as their primary work tool. It is happening.

What we did :

1) Provide documentation on everything from install to using internal R libraries for ETL.

2) Provide mostly problem free, always updated VMs with RStudio Server/ Shiny Server.

3) Establish an hotline channel for instant help on R or git.

4) A couple members on the team developed really close working relationship with IT and we have great respect for each other work.

What we provide is way better and by being active, we built users trust in the tools.

We are phasing out SAS and proprietary modeling tools. Python never took hold even if we bought Anaconda entreprise. Excel is there to stay for sure but since actuarial student learn R in school, it is easier to onboard new hire.

If you want to go down this path and have a chat, hit me up. I'm in P&C. We use R both in development and production environments. We use it for pricing, spatial contractual obligation, claims assignment and a couple more models.

wodenokoto · on Dec 31, 2020

The current top comment (sibling to the one I’m replying to) argues that keeping Python environments across actuaries/users computers up to date is too difficult.

This is nicely solved by using R server.

I’ve worked in an R server shop, and the experience is really nice. You log on to the server in chrome or Firefox and the browser window basically becomes RStudio and all calculations are done on the server and all code and data also lives on the server which is a huge bonus in terms of data protection. No copies are floating around on peoples laptops and if Johnny is sick and forgot to push his code to git - no worries, it’s all on the r studio server.

I don’t now of a nearly as good Python solution. I think Conda suggests using jupyter lab, and while that is a great environment it’s not great if it’s all you can use.

disgruntledphd2 · on Dec 31, 2020

The big problem with notebooks is that you don't have a real REPL. This prevents one from single step debugging and tracing. This is one area where RStudio is much, much better.

The trouble is that so many of the younger DS people are focused on Python, that it makes financial sense to just deal with all its problems. There's also a lot more programming tools (though less statistical modelling tools).

wodenokoto · on Dec 31, 2020

You do have access to a repl when using jupyter notebooks.

You can hook a notebook or a repl to an existing kernel. I always have a command line attached to my notebooks. When using jupyter lab I attach the build-in terminal and place it at the bottom. When using notebooks I attach it from my terminal.

The experience in Rstudio is still better imho. It’s also a more mature text editor and ide than jupyter.

disgruntledphd2 · on Dec 31, 2020

Ok fair enough, I only used notebooks when I can't avoid it. I'm pretty sure you don't get a repl by default though, is there an involved set up in jupyter?

wodenokoto · on Jan 3, 2021

    jupyter console --existing

should start ipython in your terminal and connect to the last started kernel (e.g., the one in the notebook you just started)

https://stackoverflow.com/questions/22447572/connect-termina...

For jupyter lab, you just choose to start a repl from the gui and choose an existing kernel.

disgruntledphd2 · on Jan 5, 2021

Thank you! (clearly I didn't spend a lot of time doing this, as I have an Emacs addiction ;) )

klelatti · on Dec 31, 2020

I'm an actuary with a strong interest in this area - would be very interested to hear more especially on your R vs Python experience.

meztez · on Dec 31, 2020

It came down to IDE, workflow and data.table.

RStudio is an absolute killer solution from the get go. Package management in R is simple and robust. Shiny is the new Excel pivot table on performance enhancing code.

Python has more contributors, more users. It also creates a lot more noise. Business people may feel like it is a a programmer tool. R feel more approachable.

In the end, both are great solutions but we decided on R because we believe in the people contributing to the ecosystem, mostly RStudio. Somewhere down the line, there might be a transition to julia.

klelatti · on Dec 31, 2020

Thanks - really interesting, especially on the RStudio point.

hnracer · on Dec 31, 2020

I've used R (3 years) and Python (8+ years) in data science and much prefer Python, because it can do things that aren't just pure data analysis, and because pandas is so amazingly good compared to R's data matrix solutions, in my opinion. I believe that the algorithmic trading industry has gone fully into Python and away from R for these reasons.

meztez · on Dec 31, 2020

R has data.table. It is the game changer as I agree base R data.frame do not cut it for performance. tibble will come close once they incorporate more of the data.table performance tricks.

https://h2oai.github.io/db-benchmark/

hnracer · on Dec 31, 2020

Does R have robust CSV parsing? I remember using the default and it'd be extremely finicky about getting the header and index flags right and wouldn't typecast numeric columns properly (instead they'd end up as factors and not play nice)

st1ck · on Dec 31, 2020

Python version of data.table has very fast CSV parsing (compared to Pandas), and it didn't have issues like those you mention. Even if data.table had issues with CSV parsing, you could probably use Apache Arrow to parse CSV into arrow table and then convert it to data.table (but that is probably suboptimal).

alexhutcheson · on Dec 31, 2020

https://readr.tidyverse.org/

bostonfincs · on Dec 31, 2020

Personally have never had a problem with R csv parsing

disgruntledphd2 · on Dec 31, 2020

It happens, but mostly because other formats don't produce usable CSV's. The biggest problem is if there are any free-entry text fields (common for customer/business name), and there isn't full quoting around these fields, base R will break.

I believe both fread and readr::read_csv do the right thing here, but the base-R perspective on data manipulation before read.csv is to use Perl (the R-core team are pretty old-school, to be fair).

eyeball · on Dec 31, 2020

h2o's data.table clone is fine

https://github.com/h2oai/datatable

alexilliamson · on Dec 31, 2020

I've been a heavy user of all 3, and pandas syntax is a nightmare compared to dplyR or data.table in R. That being said, I still use pandas because I prefer python for non-analysis.

jimmyjimjimmy · on Dec 31, 2020

I'm a CPA. When I started learning code, I looked for whatever was most like a spreadsheet. R for the bill, with built-in frames.

2Gkashmiri · on Dec 31, 2020

Oh.. similar line for me, accounting/tax law. Excel is bread and butter because all year end fianncials are prepared and finalised on excel. Although I have used libreoffice on my personal machine, it also kinda works.

For a couple of years I have tried to excel macro myself a balance sheet template which does most of the copy pasting from precious years, does bank interest calculations and all.

It would be interesting to know how does a us CPA work because its all accounting package>excel>efile.

thetwentyone · on Dec 31, 2020

I'm on mobile, but do also consider https://JuliaActuary.org (something that I personally have contributed to).

klelatti · on Dec 31, 2020

Looks really interesting thanks. I've seen some interesting insurance projects using Julia e.g.

https://www.youtube.com/watch?v=__gMirBBNXY

jgalt212 · on Dec 31, 2020

R is better if your raw data is already tabular. I prefer Python if the raw data is unstructured / semi-structured. You can make the case that once Python has converted the data to tabular then move to R, but at that point I like the soup to nuts to be in one language.

stochastastic · on Dec 31, 2020

I’ll second klelatti’s question about R vs Python. From my perspective Python is just as practical for actuarial calcs and better for building general purpose tools. Is there a reason Anaconda didn’t click?

meztez · on Dec 31, 2020

GPU integration was broken for a long time. Managing VMs / Environments. The absolutely horrible integration with git/Github.

Having to rebuild your environment from scratch when your workspace crashed. Imagine starting a notebook with a 45 minutes compile time. No go.

One click deploy, let's just forget about it.

stochastastic · on Dec 31, 2020

Thanks! That totally makes sense. If I had to pick a pain point for getting people started with Python tools it would be environments. Comments here make me think my team is working with a lot less data too.

kfk · on Dec 31, 2020

I am putting together course aimed at Python beginners in enterprise, I too have experience in Finance. If someone is interested I would love some early feedback, you can contact me my email is in this profile bio.

ricklamers · on Dec 31, 2020

Would love to have a chat about how you’re making R + git more accessible. How do I best reach out?

intended · on Dec 30, 2020

As the author states - this is an issue for some really complex models - where the complexity, reusability and iteration challenges approach code.

Most models do NOT take that many tabs, you can build a toy model near instantly - the production line from finished model and output to publishable material is a few shortcuts away.

Having an analyst, write that same thing using Jupyter? From an accounts perspective? Man, I’d want to see it in a spread sheet. It’s just simpler, or more familiar, to debug accounting information in a spread sheet.

The idea that we are going to see all those analysts pick up code - over excel - is possible, but I’d say less likely.

I’d suspect that the idea of python inside of excel, is a winner. But given that excel is working with its own data model and data tools with power BI, or with their new Lambda function, I’d say they are also working to keep people happy within the excel ecosystem.

Interestingly, this is a version of the Bloomberg terminal debate - the terminal does everything, any upstart can only do a small part of the BB offering, allowing BB to always be relevant if not dominant.

gerdesj · on Dec 30, 2020

"analysts" - lol!

I know a HR director at a multi-national. He'd had enough of Excel and liked the look of this Python thing. I showed him R as well for balance but he wanted Python. I showed him how to install a Python distro and MS Code on his Windows machine, wired them up and off he went a few months back.

The board are in awe of his presentations. He is not an IT bod at all but a Uni. degree in Psycho. involves a fair amount of stats so a fair grounding there. He grabs huge data dumps from payroll etc and performs analyses that are complex but just work.

I think one of the benefits of using Python is that you instantly divorce input data, calcs and reporting. Fire up Excel and the first thing you often do is write a title. Using Excel properly requires a lot of discipline - I wrote a Finite Capacity Planner, with forecast and labour planner for a pie factory in Excel with quite a lot of VBA. It ran my P60 hard but did the job iteratively in about 2 to 5 minutes. Easter and Chrimbo needed a fair bit of tweaking by a Planner but most of the time my model told several supermarkets what they would be ordering back in the mid 1990s and they mostly faxed or EDId our forecast back as an order.

My brother (cough) is absolutely not an analyst in the normal sense. That a non programmer can bolt together enough Python to perform analyses useful to his job is testament to the power of the libraries and examples and documentation available. I've seen his code: suck in data, process it, spit out results, report results. That's all he needs and not a OO abstraction in sight.

My two examples (me and my FC Planner with Excel and an HR bod thrashing some data to a report with Python) are different things and each uses the opposite "tool for the job" discussed in the OP. However, it is how you use a tool that is important.

klelatti · on Dec 30, 2020

Thanks for some great anecdotes! A couple of thoughts:

- It's often not is Python a good fit for the task but are there Python libraries that are a good fit? If so the actual Python code may be pretty trivial and the equivalent Excel a lot more complex.

- Writing good Excel is definitely possible but needs real discipline as you say - and bad Excel can be really bad!

tomcam · on Dec 31, 2020

All good anecdotes, but I’m still kind of stuck on the whole you got to work at a pie factory thing

gerdesj · on Dec 31, 2020

(Sorry, misty eyed recollection alert)

I should point out that "pies" in the UK is a rather generic term. MBOs (mince beef and onion), sausage rolls, pasties, pork pies and quiche was made in this south Devon based factory, near Plymouth.

It was a good corp citizen thing to attend the 1100 "taste panel" which was part of the quality process. Obviously Product Dev, QA and the line crews could not mark their own work so office staff were expected to taste to standard. The idea is that you taste samples from the store that is post bake. This is a perishable product and there are stores (freezers, chills etc) to provide time buffers throughout production.

There are a lot of constraints. You always make to forecast. In this case, back then, you had to deliver to depot with seven plus days of shelf life. The product needs meat and dough prep, make, bake and wrap and shoving in the back of a trunker (lorry/truck). You need to ensure you've got all your raw ingredients available and most of those have a shelf life and somewhere to store. Your machines have a nominal 100% production rate and a defined servicing period, expected breakdown rate, need cleaning and more. Some machines will do the job end to end and some will only do part of the process. You have bakeries and stores with varying characteristics. Some products have special requirements.

It is clearly a "simple" job of defining, understanding and controlling your constraints and solving a few equations. I absolutely loved it as a challenge. This was 1995ish. I inherited a System 36 that was basically a glorified accounting system with some stock control and a few other things. If it got too warm in summer I used to put bags of solid CO2 that I could scrounge from Despatch (the whole factory panics when it gets really hot) in it.

I am quite partial to pork pies and pasties.

klelatti · on Jan 1, 2021

I helped to sell System 36s (the big ones that took half a room) back in early 1980s.

I remember one demo when a potential customer asked "If it's this slow with one user how slow is it with six?"

The person doing the demo "improvised" with "It's dynamic load time balancing." Which I'd never heard of before. Turns out neither had anyone else involved with the System 36.

I later came across a whole insurance company that was run on a S/36. It was replaced by a single 386 PC.

gerdesj · on Jan 5, 2021

I had to tickle one of the fans to get it started otherwise the bloody thing would shut down about 40 mins after IPL started. One summer we put bags of solid CO2 in it to keep under the temp threshold.

I still giggle hysterically when someone deploys the "enterprise" keyword at me.

tomcam · on Jan 1, 2021

That was far better than I could have hoped for. Thank you.

I’m trying to understand what benighted people are not fond of pork pies and pastries. Let’s have a moment of silence for them and move on.

gerdesj · on Jan 5, 2021

There are always the unenlightened. One day they shall see the light (or a bloody great steak and ale pie) and they shall wipe their mouths in righteousness.

buryat · on Jan 4, 2021

this is an amazing story, thank you!

Foobar8568 · on Dec 31, 2020

And I have seen a board member not wanted anything but linear regression from his own data scientist team (If I remember well, they were like 5~ PHD or master in stats) because he couldn't understand anything. And that was in one of the largest organization in the world.

nojito · on Dec 31, 2020

Data scientists are useless if they don’t have the right communication skills to empower decision makers.

This is why linear regression will always be king and people who know how to turn complex problems into linear problems are worth millions.

wendyshu · on Dec 30, 2020

What's a bod?

afavour · on Dec 31, 2020

A “body”/a person. I suspect the OP is from the U.K. or Australia or similar, it’s slang there. I suppose it’s kind of a gender neutral version of the U.S. “guy”.

gerdesj · on Dec 31, 2020

A "bod" is a person (body.) It is a colloquialism in the UK. You generally only use it in this case to denote an anonymous/generic person: "A bod did stuff".

However, bod is also used as a formal abbreviation for body: "You have a lovely bod". In this case you should be reasonably familiar with the object or you will get slapped!

Sorry, bod means a person.

BlueTemplar · on Dec 31, 2020

Last time I checked, Libre Office Calc had Python as an option for scripting language, why aren't more people using it?

Vaslo · on Dec 31, 2020

I work in an excel heavy function (Corporate Finance). And while this sounds very exciting and fresh I am just not seeing it take hold in Fortune 500s I have/am working for. A few reasons:

1) biggest gripe: I don’t have time to maintain and fix models after I move to a new role. If it’s a Python based model I build, no one can seem to fix it when some tiny thing breaks 6 months after due to a change in the data. I’ve had to work weekends to help colleagues fix models that I don’t use anymore. I can hand Excel to a young or old worker and they can always seem to figure it out and take it over.

2) The tools seem limited when directly doing Python in Excel like the one mentioned nothing the article. VBA kind of sucks in 2020 but until Excel natively accepts Python as part of its base, I don’t love being dependent on these 3rd party tools. VBA always works.

3). I’ve recently complete an MS in Data Sci so I am very familiar with Python and R. My company doesn’t need that level of model for most things. We are a best in class in our industry and we get by using lots of Excel models. I mentioned in my first point that I have built a few things with Python. When I had to fix I just rebuilt in Excel and that was all I needed. When I kept fixing the Python code I always felt like I let folks down if I couldn’t fix their stuff right away. Yet our business makes money and we continue to do well without much Python.

I love Python. But until others start to see its value and a critical mass of individuals knows/supports/can implement Python, I will put emphasis on learning Excel tools or SQL first because those will always be supported.

carlmr · on Dec 31, 2020

>When I kept fixing the Python code I always felt like I let folks down if I couldn’t fix their stuff right away. Yet our business makes money and we continue to do well without much Python.

I'm all too familiar with this. I think you need to let go of those Python models. You need to let others fix them themselves, maybe with minimal guidance. That's the only way they have a chance to learn.

ogre_codes · on Dec 30, 2020

The big problem I've always had with "programming" in a spreadsheet is by nature everything is obfuscated and difficult to trace. Yes, you can inspect a cell and see what the source for that cell is, but that might be 10 other cells and you can only really review one cell at a time. It's like a programming language where you only see one line of code at a time. Worse, those references usually aren't named. What does "A1 + SomeOtherTab:B2" mean?

All of this really starts to fall apart when you have 10s of tabs with hundreds of rows of data which are often copy/ pasted. You won't even notice that some intern hard-coded one value into cell F75 until you actually drill down to that cell.

Spreadsheets are great until you hit a certain complexity, then they are unmanageable messes.

listenallyall · on Dec 31, 2020

> What does "A1 + SomeOtherTab:B2" mean?

Excel offers the ability to name any cell or range of cells. Don't even have to search through menus or the ribbon, it's right there to the left of the formula bar.

I'm well aware that the vast majority of Excel spreadsheets don't use named cells/ranges, but you can't really blame Excel for that. It couldn't be too much easier. Lots of Python programmers don't use comments or descriptive variable names either.

ogre_codes · on Dec 31, 2020

> Excel offers the ability to name any cell or range of cells

Except almost nobody does. It's not intuitive or the way it's taught.

> Lots of Python programmers don't use comments or descriptive variable names either.

You have to have variable names in Python. If you want to give them shitty names, that's your bag, but unlike Excel, it's not an extra step.

You also have to deal with the fact that every cell in a range has its own unique formula. It's like you have a special function for each and every cell. You have nice conveniences for it like copy/ paste and dragging, but ultimately you are copying formulas all over the place. And it's super easy to update all but one of those when you make a change.

Yes, you can create custom functions, but much like named ranges, it's not the default behavior, takes extra steps, and it isn't the way Excel is taught.

Spreadsheets are amazing for small to moderately complex things, but beyond a certain point, they are just an unmanageable mess regardless of who creates them.

listenallyall · on Dec 31, 2020

Interesting, my comment was solely in response to your complaint about naming convention... you've expanded your criticism quite a bit. I already anticipated your issue with naming being that few people used named cells/ranges -- again, it's not even in a menu or ribbon, it's present at all times, what else do you want? Not "the way it's taught"? Well, blame your teacher.

> You have to have variable names in Python

sure, but `(i, j, k)` isn't any more descriptive than A1 or B7 or CQ85759. `intOrderTotal` may seem better initially, until the summer intern creates `intOrderFinalTotal` (after tax) and `intOrderAllInFinalTotal` (after shipping and tax)

> every cell in a range has its own unique formula

you could use array formulas, or were you never taught those either?

> it's not the default behavior, takes extra steps,

Creating a Python virtual environment is not the default behavior, and it takes extra steps. So does using any packages beyond the standard library. So does source control. Or running Jupyter. Using classes, or type hints, or imports, are all not the "default" of one long script in a single file.

Excel isn't superior to Python, or the best tool to solve every type of problem. Excel has its place, Python has its place. But your specific little nitpicks here are a reflection of the user (who I presume is you) not on the tool itself.

etothepii · on Dec 31, 2020

If the code was stored in source control with a separation between dev and production the intern might be able to raise such a Pull Request but it would never get approved without oversight from someone more senior. This Code Review process is almost completely impossible in Excel.

pjmlp · on Dec 31, 2020

That is a big if.

Office has source control management via SharePoint integration.

zentiggr · on Dec 31, 2020

... and then you have two problems. I have never seen an environment converted to SharePoint that didn't suffer badly from the conversion.

And the Excel vs <other language> source control issue isn't history, it's "go ahead and try to diff between two versions of an Excel sheet" vs "diff two versions of that source file".

Unless I've never heard of the tool that can digest two Excel sheets and tell you what formulas differ, or cells. Please correct me, anyone who knows of one.

pjmlp · on Dec 31, 2020

https://support.microsoft.com/en-us/office/overview-of-sprea...

zentiggr · on Jan 4, 2021

Late response over the holidays, but honestly thank you. In years of Excel usage, I've never stumbled on that or had anyone mention it.

I will look into that as Excel 2016 is one of my current required work apps.

etothepii · on Dec 31, 2020

An excel tool that showed you formula differences sounds like a cracking idea for a startup.

ogre_codes · on Jan 1, 2021

> Interesting, my comment was solely in response to your complaint about naming convention... you've expanded your criticism quite a bit.

My point was always that spreadsheets are poorly structured for complex problems and that the logic is obfuscated. Just pointing out additional issues. Nor is my previous post exhaustive.

You are comparing worst case Python programming to best case spreadsheet designs. As soon as you compare a typical moderately complex Python program to a similarly complex spreadsheet, things fall apart.

alexpetralia · on Dec 31, 2020

You can F9 on "A1 + SomeOtherTab:B2" (just that selection) and it'll calculate for you. You can do this for all the dependencies under Trace Dependencies.

I admit it's not perfect but I have found it much easier historically to follow a calculation through Excel than through untested pandas code (people inner join and drop rows; they groupby and lose null groups plus related rows; they filter string data without case insensitive matching, etc.).

quietbritishjim · on Dec 31, 2020

In Excel there's a toolbar button to toggle showing formulae rather than their results. I realise this doesn't counter your overall objection, but it does mean chasing down logic isn't quite as bad as having to select individual cells one at a time.

ogre_codes · on Dec 31, 2020

I mean... sure? There are a fair number of ways you can mitigate these issues, but the way spreadsheets are structured does not lend itself to structured/ well managed code.

DavidPeiffer · on Dec 31, 2020

I fully agree, things can get out of hand in spreadsheets. I've built my fair share of such spreadsheets, and as penance, I'm learning R.

The thing with Excel is there's a low barrier to entry, but there are a lot of differences between a great spreadsheet and a bad one. Somewhat like a junior vs senior developer, the quality of code/spreadsheet depends on what they know, how well they can troubleshoot, and how good of a system they can imagine (to then replicate as much as possible).

For example, most people are entirely unaware that Tables exist in Excel. When you want the sum of a column, rather than writing =SUM(F7:F39) and cursing when you realize you added 10 more rows and that's why the sum is not updating, you can do =SUM(tbl_Sales[SalePrice]), and when you add 10 rows, the table will automatically expand. Suddenly your formulas are somewhat self-documenting, regardless of which sheet holds tbl_Sales. Crtl+T when you've selected your data, or Insert -> Tables -> Table.

You can also make named ranges, which I would say is an analog between using {a, b, c, tempVar} versus well named variables in normal programming.

You can also trace dependents/precedents, showing arrows for how the data flows throughout the spreadsheet. Formula -> Trace Precedents/Dependents.

listenallyall · on Dec 31, 2020

> structured/ well managed code

Funny comment in a Python discussion. No type enforcement, no requirement for class/object declarations, circular imports/dependencies allowed, threading/gevent/async messes, variable/class scope weakly enforced

Python is great for a lot of things, but the language is not a beacon of well-managed code. Good Python programmers write nice, easy-to-follow code, just as good Excel builders create very nice, easy-to-follow spreadsheets.

quietbritishjim · on Dec 31, 2020

I mean... sure? I never said they did. I even specifically said I wasn't disagreeing with your overall point, in the vain hope of avoiding this redundant discussion. I was just correcting one specific inaccuracy.

ogre_codes · on Jan 1, 2021

Fair enough, my comment wasn't meant to mock yours.

formercoder · on Dec 30, 2020

I think there could be a middle ground with an excel like tool with some kind of typing. For example I often get confused about what currencies particular cells are, and mixing this up causes a lot of pain. Not sure why the hard coded number can’t just have a “$” and every time it gets multiplied by EUR/USD changes to a euro, and is displayed as such. This little thing would save me so much time.

Closi · on Dec 31, 2020

How would the conversion rate between EUR and USD be determined?

konjin · on Dec 31, 2020

My first job was cleaning up the mess that was caused by hard coded yahoo finance urls in spreadsheets. One of them died, no one noticed and it cost the company millions of dollars in bad trades over three months.

Closi · on Dec 31, 2020

You could automatically import a table from the web into the sheet, or alternatively create some custom data types for currency conversion.

konjin · on Dec 31, 2020

Or you could act like you're responsible for tens of millions of dollars and hand it over to someone who can make sure it doesn't blow up.

"We don't need to hire an electrician to wire up the office, I did my garage using uninsulated wires and it works perfectly!"

Closi · on Dec 31, 2020

The guys handling millions of dollars should be trained in finance / accounting, so they aren’t untrained.

Handing over to IT usually means their flexible spreadsheet that they can change as they require, turns into an expensive and inflexible black box that only IT can change and that doesn’t integrate with the rest of their decisions. Also the new solution also probably has errors and pulls currency info from the same endpoint. Excel isn’t perfect, but it’s used for a reason.

To use your analogy, You can wait for an electrician to change your lightbulb, but that means your going to be working in the dark for longer.

konjin · on Dec 31, 2020

Lightbulbs do not generally cost millions when they go out.

listenallyall · on Dec 31, 2020

are you suggesting programmers don't hardcode URLs in quick-and-dirty (and sometimes, even production) Python scripts/code?

konjin · on Dec 31, 2020

I'm suggesting they have logs.

listenallyall · on Dec 31, 2020

logs of what? if a programmer hard-codes URLs, you really think they are following best practices elsewhere? further, how do logs help get back the millions of dollars that were lost?

konjin · on Dec 31, 2020

Yes, logging with email alerts means that when a catastrophic error is encountered you'd know about it, instead of it being hidden for months.

listenallyall · on Jan 1, 2021

LOL! as I already stated, a weak coder isn't going to implement best practices for logging. At best, he hard-coded his own email address to receive an alert, however, this also goes unnoticed, since, as you stated initially, he's dead.

Look, whatever company this was, was relying on YAHOO as its data source for making million+ dollar trades -- not Reuters, or Bloomberg, or JP Morgan, but YAHOO -- and then, for MONTHS -- not hours, or days, but MONTHS -- nobody in its finance department or trading desk or whatever happened to notice that the incoming data feeds were not matching up with quotes from counterparties, market makers, CNBC, colleagues, Wall St Journal? Does this company not have auditors? A CFO? Any IT oversight whatsoever? I'm sorry to say that this particular company's problems are rooted much, much deeper than the loss of a particular Excel guy.

formercoder · on Dec 31, 2020

Usually it’s pulled in via plug-in from a third party data source like Bloomberg or a cheaper alternative.

jimmyjimjimmy · on Dec 31, 2020

Yep, lack of reproducibility with complex spreadsheets is a nightmare.

fractionalhare · on Dec 30, 2020

That was a neat detour into reinsurance. I wonder how the main thrust of the article holds up generally. My experience in trading/finance is that Excel remains extremely preferred for last-mile use (i.e. for writing and reading reports). Whereas the data science ecosystem has been adopted to develop research platforms and manage the data into (what ultimately becomes, for most users in the organization) an Excel sheet.

I don't see this changing any time soon. I think Excel was always strongest for last-mile use. Excel is extremely powerful when you know how to use it correctly, and I routinely see people match or exceed the productivity of programmers using it for specialized use cases.

_the_inflator · on Dec 31, 2020

The author pretty accurately describes a business model for a startup in an enterprise company: offering a service that was once hidden in some excel sheets.

As a developer at heart turned Senior Manager, I find this article especially interesting. I stumble a lot over complaints like these in the enterprise company I am working for and truth to be told, I voiced many of these before myself.

Problems I see:

- What is the business problem the author is trying to solve? How does a tool - Python - can help do specifically do what better?

- There are no specific measurements mentioned. How big is the data the author mentioned, how long does an analysis cycle take, how large are the teams, the affected people? What about maintaining the software stack? How many requests are there per year?

-What about cost savings? How could they help us compete with other companies? Lead cycles of even weeks may bother a developer but not the business.

It is not that I don't believe his suggestions. It is just that I don't get to the point other than "my favorite tool could do it, too." We could easily substitute Python with R, for example.

"The spreadsheet took 30+ seconds to open" I know this is an annoyance, but how often do you open it? One time a day? 20 times an hour?

"The new model logic is testable and can be upgraded independently" this is one of the most valuable points here, as long as you work in a larger environment. So context is needed here as well.

I know a colleague of mine who is extremely well versed in Excel who has put a decent amount of magic into her sheets. However, even losing her and starting all over again is from a business perspective way cheaper than trying to put her solution behind a cloud service.

It would be fun, to have a conversation with the author.

burlesonK · on Dec 31, 2020

I agree. The article came across as a programmer griping about Excel and VBA, while praising his/her favorite tooling as the answer to some programmer-centric greivances.

etothepii · on Dec 31, 2020

You can reach out to her (or me) on email. Her email is all over her blog and mine is in my profile.

In my opinion the project that inspired this article was some of the most valuable work we did together and it was made more valuable by working directly in a pair (trio?)-programming context with the Underwriter the model was actually for.

sixdimensional · on Dec 31, 2020

XLL lets you also write .NET code, basically anything that can compile to a DLL, to make custom functions you can expose in Excel - and it is a pretty old supported integration method by Microsoft with Excel. COM enabled DLLs were another way to do this, but they ran slower.

Not that I have any issue with getting Python in my Excel, but people seem to forget that .NET is also an option.

Getting these capabilities enabled in a locked down corporate IT environment traditionally was difficult but I suspect that is changing.

I have also lived the whole, turning a model in Excel into an app exercise. At the time, we rewrote a fairly complex demand planning app from Excel/VBA to C# since the other dev team members were C# devs and could support the app.

However, during the project, I did a demo of how one could build a Winforms app in VB.NET also, to the developer who was the Excel/VBA guru. He'd had no idea that coding in VB.NET and Winforms was close enough that he nearly could have been doing that instead.

The compiled C# version of the model we built, went from running a single instance of the model in 1 hour, to under 1 minute. We could re-run their model for tens of thousands of instances daily, without breaking a sweat.

Ironically, the rewritten version in C# never saw the light of day as the project was canceled (corporate politics and wisdom). However, the simple optimizations we identified in the rewrite were given to the Excel guru who actually made improvements to his tool that let it run in more like 10 minutes.. and it was even object oriented and modular! He learned he could do a lot more in Excel/VBA that he didn't even know about.

Coding is coding.. just some tools make the jobs easier or harder.