
That's not TTFP. TTFP is putting that code into a file (e.g. ttfp.jl) and running:

$ time ./julia-1.9.4/bin/julia ttfp.jl

0.325806 seconds (202.80 k allocations: 13.208 MiB, 84.23% compilation time)

real    0m2.401s
user    0m2.357s
sys     0m0.553s

I.e. the TTFP is 2.4 seconds, not 0.33 seconds.
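
(For reference, ttfp.jl is just the snippet from the parent comment put into a file; roughly something like this, with the exact plot call being an assumption:)

    # ttfp.jl -- measure "time to first plot" from a cold start
    using Plots

    @time plot(rand(10), rand(10))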

That said, 2.4s is huge progress (e.g. with Julia 1.6.0 it's 10 seconds for me) and acceptable for most simple plotting.

But just adding e.g. "using DynamicalSystems" brings this to 14 seconds, which is not acceptable for iterative editing. And it makes any Julia program using "DynamicalSystems" wildly impractical to use in a pipeline with other programs.



Also for the record, 1.10 rc1 brings the time down to 1.1 seconds (from 1.9 seconds in version 1.9 on my laptop). There's still more work to do, but there's been a ton of progress on this in the last year or so.


No point having this conversation. These people are still repeating memes from 5 years ago. I've met one of them in the wild; it was quite eye-opening.


You are probably having a different conversation. This is my conversation:

$ time ./julia-1.10.0-rc2/bin/julia -e "using Plots; using DynamicalSystems;"

0m5.433s

This is a different conversation from

julia> @time plot(x, y)

  0.000465 seconds (484 allocations: 45.992 KiB)

But I agree that the conversation is pointless. The TTFX will never be fixed for Julia to be used with sane workflows, because the few still using Julia don't want to accept the problem. But the flamewars are fun anyhow, and maybe they prevent the next Julia from making the same mistakes.


I think this conversation is absolutely worth having. It's great to see that 1.10 is ~3x faster here, and there's probably another ~2-3x improvement available from profiling DynamicalSystems to see where the time is going. That said, for sub-second startup time with lots of packages, I highly recommend using PackageCompiler to make a custom system image, which brings the using time down to roughly zero.
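
The workflow is roughly the following (a sketch; the package list and sysimage path are just examples):

    using PackageCompiler

    # Bake the heavy packages into a custom system image so that loading
    # them later costs essentially nothing at startup.
    create_sysimage(["Plots", "DynamicalSystems"];
                    sysimage_path = "sys_dynamical.so")

Then start Julia with that image, e.g. julia --sysimage sys_dynamical.so (or -J for short), and "using Plots" is effectively instant.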


Is shaving seconds off per-package import a feasible way to get the startup time fast enough for script-type workflows? Before long you end up importing 10 packages, and it will be too slow even if each package takes 1s. I appreciate that the devs seem to want to do it "the right way", but that seems to have taken quite a while.

Wouldn't something like a slow-interpreter option for "glue code" be feasible? Most code doesn't need to be fast, and waiting for the "hot functions" to compile on change wouldn't be too bad. And you could still AOT-compile for maximum speed when needed. This is essentially how JavaScript JITs manage to start up so fast.

Another fine "hack" for many cases would be to keep the Julia process running (like a daemon) while making sure the state is clean whenever a script is run on it.
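
(Something like DaemonMode.jl seems to attempt exactly this; a sketch from memory, so the details may be off:)

    # In a long-running "server" process that keeps packages loaded:
    using DaemonMode
    serve()

    # Each pipeline step is then a cheap client call reusing the warm process:
    #   julia --startup-file=no -e 'using DaemonMode; runargs()' step.jl input.csv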

Something like PackageCompiler is workable, at least for the short term. Long compiles are OK if they don't have to be done all the time. I did manage to get < 1s TTFP with DynamicalSystems with it (last time I tried, a while ago, I didn't manage). The UX for creating sysimages could be nicer, but it doesn't seem like anything a quick script can't fix. Maybe I'll give Julia another go in my next analysis.

From what I gather there's now a new approach for caching compilation results. I really hope it succeeds, but I'm a bit jaded by the many "Julia 1.x fixes the TTFX" announcements.


The next step is to enable more post-invalidation precompilation for Plots. This is something we recently set up with Symbolics.jl, since that's another package with "invalidation by design" patterns: certain assumptions made by the compiler (such as (x == y)::Bool) are invalidated by symbolic numbers behaving differently from ordinary numbers, so some over-eager optimizations have to be removed at using time. PrecompileTools.@recompile_invalidations is a recent (v1.9-era) tool that forces compilation at package precompile time to consider the post-invalidation environment and re-precompile, effectively reducing the TTFX in cases where invalidation is always going to occur by design.

Plots.jl is such a case because its recipe system lets people extend the plotting pipeline well after Plots is loaded. That's its main killer feature, but it also means invalidations should be expected. I think recompilation post-invalidation should make a strong dent in the TTFX, and I plan to see whether that's the case in a few months when I get the time to dive in.
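
For reference, the pattern looks roughly like this at the top level of a package (a sketch; the module and dependency names are placeholders):

    module MyPackage

    using PrecompileTools

    # Precompile against the environment *after* the invalidations that
    # loading this dependency is known to trigger, so the recompilation
    # cost is paid once at precompile time rather than at first use.
    @recompile_invalidations begin
        using SomeInvalidatingDep
    end

    # ... the rest of the package ...

    end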

> Wouldn't something like slow interpreter option for "glue code" be feasible? Most code doesn't need to be fast, and waiting for the "hot functions" to compile on change wouldn't be too bad. And you could still AOT for maximum speed when needed. This is essentially how Javascript JITs manage to start up so fast.

That is something I've been meaning to investigate more. A lot of the Plots.jl code is internally dynamic because of its design of holding everything in a Dict{Any} and recursing through the plot recipes. In other words, it doesn't benefit from the JIT at all, because internally everything is uninferred boxed variables. There's probably a good way to just say "please interpret the calls in this part of the pipeline" and not see any runtime difference, while chopping out the core of the compilation. It's relatively straightforward to do, it's a one-liner using JuliaInterpreter.jl's @interpret, and I think one dig into this that results in a win would be a nice example of a pattern we should use more often in the near future.
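
In other words, roughly this (a toy sketch, not actual Plots.jl internals):

    using JuliaInterpreter

    # Run the call through the interpreter instead of JIT-compiling it.
    # For code that is dynamic and uninferred anyway (boxed variables,
    # Dict-based plumbing), this removes the compile latency with little
    # runtime penalty.
    glue(args) = map(x -> x isa Number ? 2x : x, args)  # stand-in for dynamic glue code
    result = @interpret glue(Any[1, "a", 2.5])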


I did briefly look at JuliaInterpreter.jl, and if something like that can be used for general-purpose code, the TTFX would be fixed, the complaints would stop, and I'd wager Julia would gain a lot in popularity. I really hope it happens.

For now the interpreter seems to focus on debugging, and if I understand correctly, it can't really be used to e.g. speed up imports or to mark hot code to be AOT-compiled/JITed?


Ahh I getcha a bit. What is the use case though - which script-type workflows? (genuine curiosity, not just asking!)


Typically modeling timeseries data and comparing it to data collected from humans (e.g. car telemetry, motion tracking or eye tracking signals). The models are stochastic dynamical systems which have to be optimized against the data. It's more or less exactly in Julia's niche.

The script typically loads the data files and runs simulations (many times over when the parameters are being optimized). This means the models have to be fast, but they are also recurrent in nature, so they can't be vectorized with numpy. So depending on the case it's usually numba, Cython or C++, all of which can be quite painful.


So what you are saying is that you would like a process that was live - with everything loaded, and ready to respond to any code that you wanted it to run - as opposed to having to bring up a new Julia process for each step in the pipeline.

Can't you compile the pipeline into a single script and then run it in a single instance of whatever?


I want the opposite: to have different processes for each step, potentially with the different steps in different languages.

Or at least the semantics of that. The steps should be independently callable, and each call should produce the same output given the same parameters, i.e. the state should reset for each run. I don't care so much how this is exactly implemented as long as it fulfills these requirements.

The data doesn't need to stay in the process memory. It can be e.g. mmapped, and it's cached by the OS anyway. And deserializing tens or even hundreds of megabytes usually takes less time than Julia takes to import Plots.
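
(Even in Julia this is just the Mmap stdlib; the file name and sample count below are made up:)

    using Mmap

    # Map a binary file of Float64 samples straight into memory; the OS
    # page cache makes repeated runs of the step essentially free, with
    # no (de)serialization step.
    n = 10_000_000
    data = Mmap.mmap("telemetry.bin", Vector{Float64}, n)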

This is the thing that isn't feasible with Julia due to the TTFP (each step incurs the startup latency).


You're building pipelines for someone else. I'm using the pipeline someone else built.


I'm building pipelines for myself to use. The result is often pipelines that other people can use too. It's all just code, there's nothing magical in building pipelines.


I would love to see something you've built.

Do you have anything that you've shared publicly?


Here's an example: https://github.com/jampekka/vddfit

Codewise it's not pretty, especially the data mangling, but other people have managed to use it for their own purposes.

The model had to be implemented in C++ and wrapped with Python bindings. It's a pain, but sadly less painful than dealing with Julia's startup time.


Blast from the past from my days in academia.

Ok thanks for sharing.

(I still disagree with everything you said about Julia though. I bet you I could re-implement this faster in Julia than your C++, and in a type-generic way too, which would make it trivial for users to plug in their own types, and it would be trivial to package - I'm assuming you didn't package that repo, you just expect your users to be shuffling files around. You're not contributing to improving the reproducibility situation in academia, let me tell you xD)


So the handful of _Julia users_ with their totally unreproducible REPL mess can "trivially" plug in their types? Also saying the horrible mess that is Julia's module system is "trivial to package" is delusional.

I didn't package it and the data mangling part has very little use outside this specific experimental design. Also Python's packaging is quite shitty.

The C++ code is totally standalone and can be git-pulled and compiled how you like. C++ packaging is totally abysmal.

As I said in the comment and the README, the code is a mess (although a piece of art compared to the notebook shit out there). Its purpose is that the results can be reproduced from the raw data.

I would love to make my analysis code cleaner, but there's zero incentive for that and very little time. Sadly, nobody cares about the code, and most scientists can't code for shit. This is probably why Julia was designed so that it forces your code to be shit.

Sorry about the tone, but you kinda set it.


I mean, DynamicalSystems is big - 183 subpackages... It doesn't surprise me that it takes some time to pull it in and make it available to other code. What's the design choice they should have made to avoid this?


It can be improved though. Part of the issue is my fault: I plan to redo the DiffEq solver defaulting mechanism in order to reduce the amount that needs to precompile. The design of how to do it is already pretty clear, and it's mostly about having the time to do the grunt work. I hoped to get it done last month but had a bit too much travel; it shouldn't slip beyond January. After that's up, I plan to reassess some of the profiles for what the remaining pieces are and post some future plans for specifically this part of the ecosystem, but for now the ball is in my court and I need to make some improvements here.


E.g. JavaScript can have thousands of subpackages/dependencies, and it starts up instantly. SciPy is huge, but importing it is practically instant. "Classic" AOT can take ages to compile, but it doesn't have to compile on every run.

JS does this with a multi-phase JIT, Python by being dog slow at execution, and AOT by being dog slow at compilation.

I think Julia's "JIT-AOT" may be a good approach too (and perhaps almost necessary for Julia's dispatch), but it seems to be very hard to make it start up fast.



