Taichi lang: High-performance parallel programming in Python (taichi-lang.org)
179 points by whereistimbo on March 9, 2023 | 100 comments



Out of curiosity, I rewrote their prime-counter example to use a sieve instead of being a silly, maximally computation-dense example.

To make it work with Taichi, I had to change the declaration of the sieve from sieve = [0] * N to sieve = ti.field(ti.i8, shape=N), but the rest of the code remained the same.

Ordinary Python:

time elapsed: 0.444s

Taichi (ignoring compile time, I believe):

time elapsed: 0.119s

A slightly more realistic example than the 10x+ improvement they show on the really toy one, and the results aren't too bad. I'd take a 3x improvement for tiny changes. Pretty neat!

(I tried some other trivial things, like using np.int8, and it was slower. One can obviously make this a ton faster, but I was interested in seeing how the toy fared if we just made it slightly more memory-bound.)

One negative: throwing list comprehensions in made the Python version faster - about 0.3 seconds - (and shorter, and arguably more "pythonic") while simultaneously breaking the port to Taichi.
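
Roughly the kind of rewrite I mean (a sketch, not my exact code):

    def count_primes(n: int) -> int:
        # generator-expression version: shorter, and faster in plain
        # Python in my runs, but this form is what breaks the Taichi port
        return sum(1 for i in range(2, n) if not sieve[i])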


I tried implementing the same and I am getting 500 ms vs 20 ms, with a wrong answer on the first call in Taichi but correct answers on subsequent calls. I guess I found a bug in Taichi: https://imgur.com/a/lpK2iVF

Could you share your code as well?

    N = 1000000

    isnotprime = [0] * N

    def count_primes(n: int) -> int:
        count = 0
        for k in range(2, n):
            if isnotprime[k] == 0:
                count += 1
                # mark every multiple of k below n as composite
                for l in range(2, (n - 1) // k + 1):
                    isnotprime[l * k] = 1

        return count

    import taichi as ti
    ti.init(arch=ti.cpu)

    isnotprime = ti.field(ti.i8, shape=(N, ))

    @ti.kernel
    def count_primes(n: ti.i32) -> int:
        count = 0
        # Taichi parallelizes this outermost loop; count += 1 becomes an
        # atomic add, but reads of isnotprime race with the writes below,
        # which would explain the wrong count on the first call
        for k in range(2, n):
            if isnotprime[k] == 0:
                count += 1
                for l in range(2, (n - 1) // k + 1):
                    isnotprime[l * k] = 1

        return count


Python:

    import time
    import math
    
    N = 1000000
    SN = math.floor(math.sqrt(N))
    sieve = [False] * N
    
    def init_sieve():
        # iterate i through sqrt(N) inclusive; range(2, SN) would skip SN
        # itself, which matters whenever SN is prime
        for i in range(2, SN + 1):
            if not sieve[i]:
                k = i*2
                while k < N:
                    sieve[k] = True
                    k += i
            
    
    def count_primes(n: int) -> int:
        # everything in [2, n) not marked composite is prime
        return (n - 2) - sum(sieve)
    
    start = time.perf_counter()
    init_sieve()
    print(f"Number of primes: {count_primes(N)}")
    print(f"time elapsed: {time.perf_counter() - start}/s")
Taichi:

    import taichi as ti
    import time
    import math
    ti.init(arch=ti.cpu)
    
    N = 1000000
    SN = math.floor(math.sqrt(N))
    sieve = ti.field(ti.i8, shape=N)
    
    @ti.kernel
    def init_sieve():
        # Taichi parallelizes this outer loop; the race on sieve[i] is
        # benign here, since it can only re-mark numbers that are already
        # composite
        for i in range(2, SN + 1):
            if sieve[i] == 0:
                k = i*2
                while k < N:
                    sieve[k] = 1
                    k += i
            
    @ti.kernel
    def count_primes(n: int) -> int:
        count = 0
        # parallel loop; count += 1 compiles to an atomic add
        for i in range(2, n):
            if sieve[i] == 0:
                count += 1
        return count
    
    start = time.perf_counter()
    init_sieve()
    print(f"Number of primes: {count_primes(N)}")
    print(f"time elapsed: {time.perf_counter() - start}/s")
(The difference between using 0 and False is tiny; I had just been poking at the Python code to think about how I'd make it more pythonic, and to see if that made the Taichi port worse.)


But how fast does it go with PyPy and no changes to the code?


Interested as well


On my old Mac: Python = 133.5 s, Numba = 2.61 s (parallel prange in count_primes), Taichi = 1.8 s (on ti.cpu; it fails with Metal).


Might be worthwhile to run the same code with an appropriate Numba decorator. My guess is that you'd get at least as much speedup without having to change the sieve declaration, but I'm not sure.
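
A minimal sketch of what I have in mind, untested (and note Numba's nopython mode prefers a typed NumPy array to a plain list, so the declaration likely changes a little anyway):

    import numpy as np
    from numba import njit

    @njit
    def init_sieve(sieve, n):
        # same sieve as above, compiled by Numba on first call
        sn = int(np.sqrt(n))
        for i in range(2, sn + 1):
            if sieve[i] == 0:
                for k in range(i * 2, n, i):
                    sieve[k] = 1

    N = 1_000_000
    sieve = np.zeros(N, dtype=np.int8)
    init_sieve(sieve, N)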


> No barrier to entry for Python users: Taichi shares almost the same syntax as Python. Apply a single Taichi decorator, and your functions are automatically turned into optimized machine code.

It looks super interesting, except the “almost the same syntax as Python” part seems like such a footgun for everything from IDE integration to subtle bugs and more.

I was super into the idea of a strict Python subset that gets JIT compiled inline based on just a decorator.



I helped someone who had to use Taichi code written by a PhD student, and it was a bit weird. It looks a lot like Python, but you have to code like you would in CUDA (e.g. control flow); there is no magic.

For this we had to calculate forces to animate some kind of polygon with a lot of joints, and we could not just call SciPy from the Taichi code. I had to implement a very dirty polynomial-equation solver in Taichi for the demo.


I was playing earlier today with Triton[0], from OpenAI. Like Taichi, it makes it super easy to write native GPU code from Python, but it really does feel like something very experimental for now. (I know the use case is very different.)

[0] https://openai.com/research/triton


Triton is clearly a popular name for GPU access and inference[0].

[0] https://developer.nvidia.com/nvidia-triton-inference-server


>but it really does feel like something very experimental for now

Meaning?

- their approach is still bizarre and exploratory and they still don't know how to structure their APIs and are making it up as they go?

or:

- there are still some rough edges, bugs, and no full documentation yet?

as those are quite different cases...


>CPUs and AMD GPUs are not supported at the moment

CUDA-only, no mention of Metal.


Please don't retrofit more stuff to make Python work. Move over to Julia already. You can call Python from Julia.


It's been a while, but the last time I was doing serious work in Julia, things were a little janky. For example, the REPL would sometimes segfault if I Ctrl-C'ed during heavy computations. And Flux at first seems like it will work on any code, which seems amazing, but then you find out at runtime that one of the operations you used isn't supported and get a runtime error. PyTorch might not work on regular Python code, but at least I know the APIs provided by PyTorch will work, even though they are a subset of what can be done in regular Python.

Still, most things worked in Julia, and there have been many improvements since then so I suspect the few remaining rough spots are being smoothed out. In the future I will be happy if I get to work with Julia more.


Yeah. It's a chicken-and-egg thing. Imagine if the resources applied to PyTorch had been spent implementing it in Julia.

But then there aren't enough users... so the cycle continues, until one day Julia hits critical mass and a tipping point is reached.


I used to be very optimistic about Julia, but my enthusiasm has really cooled off in the last year. I find JAX + jax_dataclasses gives me 90% of what I wanted from Julia (a nice monad library compatible with jit’ed code, plus the ability to cache compiled code, would make it a solidly better language).


I don't know whether the monad library is nice, but caching compiled code is something they are making great progress on. Version 1.9 seems able to store most compiled code on disk.


In a winner-takes-all finite world a tipping point may never happen...


It was even close to death


I think the Julia people underestimate how many people like the syntax of Python, and how much those people like it. If they'd stuck closer to that, they might have won more people over (whitespace, plus many other things).


It is not just syntax. While Python has Java-like OOP from afar, Julia has a distinct language design. It doesn't have a concept of "class" in the traditional sense. It instead has multiple dispatch, which is very flexible, but sometimes too flexible to control. I found Julia harder to write for an average programmer. Furthermore, the time-to-first-plot problem pissed off many early adopters (I know a few, and they won't come back) and apparently remains a problem for some [1].

Julia is a great niche language for what it is good at. It will survive but won't gain much popularity.

[1] https://discourse.julialang.org/t/very-slow-time-to-first-pl...


There are even more issues with Julia than you mention there. But the point is there are _many issues_ with _every_ programming language: Python is slow, its package management story is a dumpster fire, its object model is bizarre and confusing, and its typechecking mechanism sucks. Also, its documentation is terrible (docstrings and documentation of functions), and the built-in REPL is basically useless. Rust has slow compile times, too much macro magic leading to unreadable code, 90% of what the borrow checker rejects is perfectly fine code, its trait coherence rules are too restrictive, async sucks, and it's totally unsuitable for quick experimentation.

These are all real limitations, but users of these languages learn to live with them. Rustaceans learn to start a build, then do something else while it builds. That is, on its face, totally unacceptable to a Pythonista. Pythonistas learn to always ship performance-sensitive applications with _another language_ doing all the hard work: totally unacceptable for a Rustacean.

If someone tries out Python, spends five minutes failing to get package management to work, and gives up, they have not seriously engaged with Python as a language. I feel people do just that with Julia: try it out, then reject it at the first rough edge.

Fair enough: There are many people for whom lack of static type checking or Julia's latency is a showstopper, making the language unsuitable. But I'm still firmly convinced that for scientists/engineers at least, Julia, on balance, offers a better language than Python. I hope you're wrong that it won't gain much popularity.


I work in a bank leading some numerical work, and I've been training my staff on Julia so we can write Monte Carlo simulation code that is fast. Heck, the modeling team builds their model in C++, and we soundly beat them with Julia on code that generates the same results.

When we looked at it, it was because C++ is hard af to learn and the developers don't use it to its full potential. They just use some QuantLib code which, when you look under the hood, has many unoptimized parts.

With Julia, the code is so simple and clean that we even put GPU code in one place using CUDA, and it completely blows the C++ out of the water.

I did achieve some good performance with Numba once, though, using AVX, so Python is not all bad. But Numba covers only a small subset of Python, whereas with Julia I can do crazily fast stuff that looks like Python and is readable.


It seems that Julia has been used for numerical work quite often. Its RedMonk ranking is above some great languages like Nim and Erlang. This is already an achievement for a community-backed open-source language. By "much popularity", I meant "as popular as Java or Python". I had high hopes for Julia.


Actually Python OOP is closer to C++ and Smalltalk than Java, but I digress.


Time to first plot should be all but solved as of v1.9.0 (first stable release ~two days ago).


It's not going to be solved. Stop saying this. It's a great step forward and a real user experience improvement, but it doesn't solve it.

Packages need to precompile, and they don't. They need to fix invalidations, and they don't. They need to fix inference issues in their packages, and they don't. "Using" time remains quite high.


"solved" is a big claim. better to just let the speed speaks for itself. ttfp is still somewhat of an issue


Technically, we've only released the first release candidate for 1.9. There are still a few bugs in it, and we'll probably have at least one more release candidate before 1.9.0 is released.


As someone who switched from Python to Julia: Julia syntax supports things like list comprehensions, but also many better things, like broadcasting, that Python really needs. The only thing Python has over Julia is that Julia requires an “end” keyword where Python uses indentation. But Julia has so much more than Python (like macros) that it's just better syntax-wise.
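
(For reference, the closest Python itself gets is NumPy's broadcasting, which lives in the library rather than the syntax; Julia spells the same idea directly in the language as a .+ b or f.(x). A minimal sketch:)

    import numpy as np

    a = np.arange(3)                # shape (3,)
    b = np.arange(3).reshape(3, 1)  # shape (3, 1)
    print(a + b)                    # shapes broadcast together to (3, 3)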


Python has macros too.


Well, Python has A LOT of stuff, but a lot of it is garbage compared to how it should be done. Like Python's lambda: you literally have to write out the word lambda. It's ugly af.
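
For instance:

    square = lambda x: x * x   # vs. Julia's terser x -> x * x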

I bet the Python macros will be ugly af too.

Also, the walrus operator is why there's no BDFL anymore.


As of 2020 (PEP 638), Python has macros (but they're still slightly janky). Before that there were decorators, which look kind of similar but are a lot less powerful.


> as of 2020 (PEP 638)

>> Reference Implementation >> None as yet.

> but they're still slightly janky

maybe you're from the future?


Are they implemented anywhere? It looks like the PEP is a draft. You can do all the macro-like things you want using frame hacks and eval at the top level. Where they are hard to apply without direct support is down inside some function; then you have to bytecode-patch the result on first execution.


Not just decorators, but all kinds of functionality that allows types to be constructed at runtime.


One of the features that I dislike the most about Julia is how list comprehensions have been directly ported from Python. This was a conscious choice they made out of pragmatism, not because of the merits of the design.

I don't doubt that what you say is true, but to me, it comes down more to a lack of familiarity with other languages than to any actual merits of Python syntax and semantics. Frankly, I am glad they didn't take more from Python.


New language developers really overestimate most people's willingness to learn new languages


Yes. It's quite amazing, really. Also, people really hate using more than one language. Web developers will twist themselves into incredible knots to avoid having to write HTML and CSS and JavaScript in the same project.


Not that amazing. People who build languages are people who like languages, and therefore have tried a lot of them and find learning new ones easy and rewarding.


For what it’s worth, while I’m perfectly comfortable with HTML and CSS, I prefer JSX + CSS-in-JS. This isn’t because it’s one language versus three. It’s because it promotes colocation of related code, and allows a high degree of composability that’s much harder to achieve otherwise. Colocation is great for understanding what a given thing does, and for identifying its dependencies. Composability is great for mapping a domain model to lower level concepts in a coherent way, and for encapsulating dependencies.

This is somewhat achievable with separate languages. But it ultimately takes a very similar shape with additional tooling/ceremony and diminished benefits particularly from colocation.


Nonsense, we do it all the time, we just do it under the guise of Javascript. We just turned HTML into JSX and put our CSS in template literals.


Exactly.


It’s not even the language that causes the most friction, it’s the runtime, libraries, package manager, build system, foreign function interfaces, and on and on…


This, more than anything, and it doesn't only apply to the "new" languages. I spent the last year picking up basic Common Lisp and Elixir just for the fun of it. As languages they are great, but their ecosystems made me realize how spoiled I am working in Java.


Hundred percent. You're not competing with Python; you're competing with Pip.


I wish that were the case. Then Pkg would have already won, seeing that it's so obviously superior in every possible way.


Pip is the wrong comparison; it should be with PyPI. The package ecosystem is Python's killer app.


Ah, you're exactly right.


It’s not the language, it’s the leverage. I came to Python in the 90s because of the whitespace (I was young and impressionable) and stayed for the flexibility and the wealth of packages and projects out there.


Not overestimate; I think they know the challenges and the stickiness of the Python diehards.

But there's a better way, there's a better way.

Like, Python used to be less popular than Perl, but look at where Perl is now vs Python. Things do take time to change, though.


As someone in the Julia ecosystem, I wouldn't know where to begin with writing performant Python. Is it Numba? Taichi? Torchscript? NumPy? Cython? PyPy? None of these seem to work together (besides calling each other, I suppose).


Wow, it seems like you have a number of good options to choose from. What a horrible situation, to have a language with a really rich ecosystem of powerful libraries.


In my view, they are a broad selection of bad choices, none of which will generally work for writing fast code. I tried them all on my last large Python project (admittedly five years ago; I've since switched to Julia).

Want to use NumPy? Better hope you have code that can be expressed as array computations (a tiny minority of the performance-sensitive tasks in my field of work).

Cython? Now you're just writing another language, and you get the joy of having to distribute compiled code in your Python package - for example, the very fun segfaults I experienced because Conda automatically overrides the system linker, breaking all compilation, including my Cython modules.

Numba? Hope you don't make custom classes - you know, one of the basic features of the language.


The problem is you have 10 different options that all require rewriting your code, and you can't share code with anyone who used any of the others (and almost all of them are still pretty slow compared to C++/Julia).


10 options to make up for the deficiencies of the base language? I wouldn't call that a good thing.


amen


decision paralysis is a real thing


"Performant code" really covers a broad range of code.

If you look at ML, Python is completely fine, because all the processing that happens with matrix multiplication, even on CPUs, far, far, FAR outweighs all the setup stuff in volume of operations.

On the other hand, if the majority of your application relies heavily on processing speed (i.e., you need compare/jump operations rather than just the add/multiply/load/store of GPUs), Python is going to be slow. In this case, if you want custom performant code, you write C extensions for the performance-critical parts and launch them from higher-level Python code.
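
A minimal sketch of that pattern, using ctypes against the system C math library (assuming a Unix-like system where find_library can locate libm; this is the general idea, not what Taichi does under the hood):

    import ctypes
    import ctypes.util

    # load the compiled C library and declare sqrt's signature
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.sqrt.restype = ctypes.c_double
    libm.sqrt.argtypes = [ctypes.c_double]

    # Python orchestrates; the hot call itself runs as compiled C
    print(libm.sqrt(2.0))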

That being said, there is generally a library (like Taichi) that already does this for you.


The problem is I have no interest in calling Python from Julia, since I can just use Python.


Ah yes, the language that claims to look like Python and run like C while not being particularly close in either aspect.


That's quite silly. The Julia ecosystem is non-existent. If we moved, Nim would be the closest fit. PyPy is shaping up nicely on the C-extension side, and when it's fully compatible we will just use PyPy (pypy.org).

Julia's language features are really weak.


I wouldn't consider Julia a replacement for Python


and it's an ugly thing

they didn't get range composition right


1-indexed arrays have me confused as hell


yep, part of the problem


Care to elaborate?


In C and Python (which inherited it from C, I believe), you write

    for (int k = 0; k != N; ++k) { dowork(k); }  // similar in Python

Ranges are [0...N). If I want just a sum and would like to split the work among multiple CPUs, I could write

    s = sum(0...N)
    s = sum(0...N/2) + sum(N/2...N)
    s = sum(0...N/3) + sum(N/3...5N/7) + sum(5N/7...N)

and it will work beautifully. Ranges are decomposable and composable, with any number(s) splitting the original range:

    [0...N/2) + [N/2...N) = [0...N)

When I first looked at Julia, I was ... unhappy. They didn't get simple range composability right. You can do it, sure, but it looks so ... unnatural.
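
A quick Python illustration of that half-open property (a sketch):

    N = 90
    left = sum(range(0, N // 2))
    right = sum(range(N // 2, N))
    # the pieces tile the interval exactly: no overlap, no gap
    assert left + right == sum(range(0, N))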


Tried that. It was fun. Didn't see any benefit, though, and went back to Python.

I don't care much about what the interface over LLVM looks like. As long as I get the same result in the end, I'd rather stick to whatever has more users.


I used to hate Python too, but I completely 180ed in the past few years as I learned more about computer science.

If you look at compute in general, it can pretty much be summed up as add/multiply/load/store/compare/jump (straight from Jim Keller). All the other instructions are more specialized versions of those, with some having dedicated hardware in CPUs.

If you need to do those 6 things as fast as possible on a single piece of hardware, you are most likely writing a video game. Thus video game development is pretty much C/C++, with a bit of Swift/C# sprinkled about.

If the single-piece-of-hardware requirement goes away (i.e., you are writing a distributed system to serve a web app), people quickly figured out that hardware is cheaper than developer salaries, and also that network latency is going to dominate speed. This is the reason Python took off - it's super quick to write and deploy applications, and instead of paying a developer $10k+ a month, you can spend half that on more EC2s that handle the load just fine, even if the end user has to wait 1.5 seconds instead of 1.1 for a result.

If you don't need compare/jump, your program is essentially better off running on GPUs. OpenCL/CUDA came about because people realized that a lot of applications simply need to do math without any decisions along the way, and GPUs are much better at this. The paradigm is that you write kernels that you then load onto the GPU - this can be done from any language, since you really just need to run the host code once. That is why Python, despite being slow, is used primarily for ML.

Then there is multiply/add only, which you probably best know as the ASICs for bitcoin mining that blew GPUs out of the water. When you don't have memory controllers and just load/store from predefined locations, your speed goes through the roof. This is the future of ML chips as well, where your compiler looks a lot like the Verilog/HDL compilers for FPGAs.

Furthermore, with ML, the compare/jump and even the load/store are being rolled into multiply/add. You have seemingly complex algorithms like GPT that make decisions, but without any branching. Technically speaking, a NAND gate is all you need to make a general-purpose CPU, and you need 2 neurons to simulate a NAND gate. So you can build an entire general-purpose CPU from multiply/add.
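
For what it's worth, with a hard-threshold activation a single neuron already suffices for NAND (a sketch; these weights are one standard choice, and the count of two presumably assumes a smoother activation):

    def nand(a, b):
        # one multiply/add "neuron" with a step activation
        return 1 if (-2 * a) + (-2 * b) + 3 > 0 else 0

    assert [nand(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]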

So in the end, it's absolutely worth investing in Python and making it better. Languages like Julia are currently better suited to performant tasks, but the necessity of writing performant code to run on CPUs is slowly going away with every passing day. It's better to have a general-purpose language that lets you put ideas into code as quickly as possible, and then have different specializations for the more specific tasks.


Julia is so nice, but WHY did they have to decide on 1-based indexing?? Like there's literally no reasonnnnnnnnnnnnnnnn


Let's see: Fortran, R, MATLAB.

All serious numerical languages have that. It's more natural.

0-based indexing is only good for calculating memory offsets, nothing else. Like in Go, where `vec[a:b]` indexes `a` to `b-1`, purely because it's more convenient with 0-indexing. This `b-1` is hugely confusing and a big gotcha for the layman.


> all serious numerical languages have that. it's more natural

And almost all serious general-purpose languages use 0-based. It is more natural, at least to me. You see, this is exactly why Julia's 1-based indexing is disliked by many.


Taichi and Julia aren't exactly aimed at general-purpose folks like you, though. So there's that.

You may not like Rachel McAdams, but she was never gonna be yours anyway. So, meh.


If your goal is to promote Julia, your comments here are having exactly the opposite effect on me.

Style aside, you can't simultaneously complain that Julia doesn't get the resources of a general-audience language and then talk down to "general-purpose folks". Julia could be the bee's knees for math-y stuff, and if so, good for it. But Python's success is because its flexibility makes it pretty good for a wide range of things, even if it'll never be great at any one of them.


I bet you don't do numerical computing, so Julia ain't for you anyway.

I love Python. In fact, it's one of the key components of my daily run on Cloud Run, since it has a decent GCP client library for writing query code against BigQuery.

But then I glue it up with Bash and run some other analysis in Julia.

I think you can have Python, Julia, Bash, R, whatever. Heck, I've even dabbled in Go and React.

Just saying Julia is aimed at numerical work first.

And I think 0-based indexing was merely a historical accident anyway, due to the lack of memory in early computers: use every last bit, including the 0, even though it's not suitable for indexing, just for calculating memory offsets.

Whatever; use whatever you like. I'm not trying to promote Julia to you or anyone else, just having a good argument on a Friday afternoon.


Just to be clear, I am saying that you "having a good argument" is harming my impression of Julia. And of you, of course.


You realize how absurd the discussion of array start indices is when you learn that Fortran arrays have had arbitrary start indices for over three decades: a(-9:0), a(0:9), and a(1:10) are all valid declarations of a vector of size 10.


Julia is an OK replacement for Fortran. Nothing more.


It would be wonderful if interpreters/compilers could just make this an option.


[flagged]


We've banned this account. Regardless of how right you are or feel you are, you can't attack other users like you did in this thread, and certainly not for ethnic/national/racial reasons.

I appreciate that you have lots of good points to make but we need you to abide by the site rules: https://news.ycombinator.com/newsguidelines.html. Please don't create accounts to break those rules with.


> Julia doesn’t have proper traits nor interfaces

As opposed to Python? Where type hints are just that: hints. They're not even enforced. But you can use mypy... right... yeah, retrofit more stuff onto Python. Why not? (sarcastic)
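
E.g. this runs without a peep at runtime:

    def double(x: int) -> int:
        return x * 2

    print(double("ha"))  # prints "haha"; only a checker like mypy would flag it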

I am not a Chinese national, btw; I am from Malaysia.


I’m talking about code organization, not types. In Julia, methods are grouped together by type dispatch, mostly. There are times when code reuse comes into play where it’s helpful, but mostly it just confuses the issue and introduces ambiguity into the structure of the program in the big picture sense.

My mistake, I saw your name was xiaodai and assumed.

My point is, Julia is not suited for general software, and it’s questionable as a tool for technical computing as well.

For running simulations and models where the code can be written and never looked at again, maybe it’s okay to use. But for anything else, Python is a lot nicer, and a lot more practical to write proper software in.


> My mistake, I saw your name was xiaodai and assumed.

Oh my bad, I disregarded your comment based on your assumed Chinese nationality and “spam” (also assumed?). Since you’re not Chinese you’re worth my time.

I don’t get it.


Luckily, I wasn’t replying to you? Weirdo.


I've seen this criticism that it's "difficult to reason about which method will actually be called because of the type system" once before, but it absolutely flummoxes me... I have literally never been confused about which method is going to be called by my code, and I'm not even a proper computer scientist (just a regular scientist scientist).

Maybe this is just an issue of not having years of OO habits influencing the way I reason about dispatch in Julia?? Honestly not sure.


I’ve never had any trouble understanding my code but understanding someone else’s multiple dispatch code has ALWAYS been hard for me.


Wow. I ran their prime-number Python accelerator example with a 10,000,000 upper bound:

    (taichi) [X@X taichi]$ python primes.py
    [Taichi] version 1.4.1, llvm 15.0.4, commit e67c674e, linux, python 3.9.14
    [Taichi] Starting on arch=x64
    Number of primes: 664579
    time elapsed: 93.54279175889678/s
    Number of primes: 664579
    time elapsed: 0.5988388371188194/s


It seems like Taichi is a fast language, but it can also never be overstated how slow Python is on contemporary architectures.


Right. I'd like to see a comparison to Nim or Julia, or to another compiled high-level language that isn't particularly performance-oriented, like Haskell, Clojure, or Common Lisp, or even Ruby with its new JIT(s). Or, for that matter, Python with Numba or one of the other JIT implementations (PyPy, Pyston, Cinder).


It doesn't support 3.11 yet. 3.10 is slow compared to 3.11.


Yeah, but not on the level of the top-level comment...


Is that improvement because the program is automatically parallelized, or because the code is compiled/JITed? A 150x improvement is too much for either alone, so I suspect both contribute.


They say the example I used is JIT-compiled into machine code. I haven't looked into the codebase yet, but I presume that means it just un-pythons it back into C? Not sure.

FWIW, I tried the GPU target (CUDA) and it was faster than vanilla Python, but slower than the accelerated CPU target by about 4x.


How does it compare with Numba?

(I don't know enough about the Python ecosystem, but I have to tweak code from one of my coworkers, and he uses Numba.)


And if you run it on a number bigger than 2^64, does it error because Taichi automatically assumes Python's bigints are int64s?


This looks amazing! (particularly for some of my interests - https://gods.art)


If I need to use Python, why should I use this instead of JAX?



