Generally speaking, the "tests" consisted of writing the code in the notebook and looking at the results. Most of what I use notebooks for is either one-off or difficult to test. One-off work might be exploratory analysis or glue code for simple tasks like loading and formatting data. Numerical tasks like ML are frequently very difficult to test because the author doesn't know exactly what the output should be. If you don't know the answer, it's hard to test for it.
For these reasons, formal tests rarely make sense in notebooks. If the code ends up being used repeatedly, then clearly the notebook should be refactored into scripts and packages that have tests for the glue code and maybe some attempt at bounding the behavior of the numerical code.
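To make that split concrete, here is a rough sketch of what the refactored code and its tests might look like. The function names (`load_rows`, `fit_slope`) and the toy data are made up for illustration, not taken from any real project: the glue code gets an exact test because the right answer is known, while the numerical code only gets a loose bound because the exact output on noisy data is not.

```python
import csv
import io
import random
import statistics


def load_rows(text):
    """Glue code (hypothetical): parse CSV text into dicts of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def fit_slope(xs, ys):
    """Numerical code (hypothetical): least-squares slope of ys against xs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def test_load_rows():
    # Glue code has a known right answer, so test for it exactly.
    expected = [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]
    assert load_rows("x,y\n1,2\n3,4\n") == expected


def test_fit_slope_is_bounded():
    # The exact slope on noisy data is unknown in advance, so only bound it.
    rng = random.Random(0)  # fixed seed keeps the test repeatable
    xs = [i / 10 for i in range(200)]
    ys = [3.0 * x + rng.gauss(0, 0.5) for x in xs]
    slope = fit_slope(xs, ys)
    assert 2.9 < slope < 3.1
```

The bound in the second test is deliberately wide: it won't catch a subtle regression, but it will catch the fit going badly wrong, which is about as much as you can ask when you don't know the true answer.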
If you're into machine learning and work with notebooks, take a look at the machine learning platform[0] we're making. We've been profitably shipping complete ML products for enterprise clients for many years now. Throughout these years, we've discovered patterns and inefficiencies that slowed us down, threatened the success of our projects, and overall made them more expensive than they ought to be.
It has collaborative notebooks, long-running notebook scheduling that survives network disruptions and closed tabs (you can view them while running without opening Jupyter), automatic experiment tracking for metrics, parameters, and models, and seamless deployment into a REST API. It also enables you to publish a notebook as a parametrized AppBook, so domain experts can interact with it without being overwhelmed by the notebook interface and can change parameters without mutating the notebook. AppBook runs are also tracked.
We're focused on solving actual problems we have faced on paid projects, as opposed to the infinity of features one can build to solve imaginary problems.