Intel Extension for Scikit-Learn (intel.github.io)
183 points by privong on Nov 1, 2021 | 66 comments



Accelerating scikit-learn is a smart move. At the algorithmic level, for every ML use case there are probably 10x as many non-ML data science projects. Also, it is good to have a true community framework that does not depend on the success of the metaverse for funding ;-)

The lock-in is an important consideration, but if the scikit-learn API is fully respected it would seem less relevant. It also suggests a pattern for how other hardware vendors could accelerate scikit-learn as a genuine contribution?


I personally think it would be a more interesting move in Intel's foray into hardware acceleration if they gave first-class support to Julia.


Intel are focused on data-parallel C++ for delivering high performance, rightly or wrongly.

Julia is one of those "nice in theory" options which has failed to live up to the hype and at this point seems unlikely to unseat python for most use-cases; it just doesn't have a good enough UX when used as a general purpose language.


I'm not sure what you mean by UX in this context, but Julia's ecosystem for scientific computing (in a broad sense) has been growing tremendously, and this is the area in which it wants to unseat Python; general-purpose programming is secondary. Whether it can, I don't know, but I definitely don't think it's a settled question. Python is my daily driver for machine learning work, but I definitely think Julia can overtake its place eventually.


Hi, I would love to hear more about your complaints regarding UX for general purpose programming


The editor autocompletion and standard library documentation could use a lot of work. The introductory tutorials are overly focused on type theory and details and do not give a good overview of which generic data structures to use in production code. Julia's JIT is very different from that of conventional mainstream languages, and the process of selecting standard-library generic data structures for optimal performance is very poorly documented.

There is no "Effective Julia"-style guide. You either have to wade through infantile tutorials for those with minimal programming experience or several reference books' worth of nitpicking on syntax. The methods themselves are not well documented and lack examples and usage guidelines.

The language and ecosystem do not feel like a project backed by commercial funding; it feels like one of those functional languages out of academic research where the structure and design of the language are more important than the actual developer experience. There are many new projects, but most are not actively maintained and updated. The language itself feels massive, with syntactic sugar and weird types everywhere. Trying to understand the implementations of other people's Julia code is frustrating, similar to reading a library written in pure C++ templates. Compared to Go/Rust/Dart, Julia feels overly convoluted. Julia literature is structured in a way that seems to heavily encourage you to take regular classes and lectures to pick up the language. It is hard to feel productive from the get-go.


Someone else commented with more detail, but personally I can't get past the package management and the dependency on using the REPL. Rust gets tooling and packages right.


>the dependency on using the REPL

This is a feature for a lot of Julia's core audience (data scientists like me, who grew up with R).


Exactly right. The REPL is a key feature for computational scientists. In physics, for example, the value proposition of Python was that it provided an open-source alternative to MATLAB (also a REPL-based environment) without sacrificing functionality. I believe the really revolutionary thing about Python was that it provided extremely fertile ground for open-source development of numerical methods that far exceeded what was offered in MATLAB, in syntax that resembled pseudocode (much like MATLAB). Julia's value proposition is all of that, plus a much more performant base language with arguably even better syntax.


Is that REPL usage still relevant in a world of Jupyter notebooks?

Getting started with Julia always just feels clunky to me - perhaps the other commenter was closer to the mark in blaming the documentation rather than the REPL itself. Either way, despite being a former scientist who has moved into IT (sadly), I get the distinct impression that the language is just not aimed at me. As such, I'm always surprised to see people trying to push it in settings outside its current realm of adoption; feels very much like the language maintainers have no real interest in that.


The REPL is pretty similar to Jupyter. Julia works well with Jupyter, but if you like notebooks, you should definitely look at Pluto. It's a reactive notebook, so it automatically tracks cell dependencies and ensures your notebook is never in an inconsistent state.


As it turns out I really don't like notebooks, but that is due to the state issues - so it sounds like I should check out Pluto. Thanks!


I actually kinda do like Julia's syntax, but the OP's comments about poor introductory documentation for experienced programmers definitely ring very true.


PRs to improve docs are always welcome (especially from people who are new to Julia).


In order to make that PR I would first need to learn Julia, which is what I want the docs for.


I don't dispute that, but for people not noodling around with data who just want to make something immediately useful (ie general purpose programming use-cases), the Julia approach to dependency packaging is sub-par.


> Intel are focused on data-parallel C++ for delivering high performance, rightly or wrongly.

They also invest efforts in making it possible to write high performance kernels in Python using an extension to the numba Python compiler:

https://github.com/IntelPython/numba-dppy


Hi all,

Currently some work is being done to improve the computational primitives of scikit-learn to enhance its overall performance natively.

You can have a look at this exploratory PR: https://github.com/scikit-learn/scikit-learn/pull/20254

This other PR is a clear revamp of the previous one: https://github.com/scikit-learn/scikit-learn/pull/21462

Cheers, Julien.


Intel seems 6 years too late to the party CUDA started. That said, it could pick up traction: academics have increasingly been using pytorch.

EDIT: Perhaps it's my inexperience, but is anyone else confused by the oneAPI rollout? There isn't exactly backwards compatibility with the Classic Intel compiler, and an embarrassing amount of time elapsed before I realized "Data Parallel C++" doesn't refer to parallel programming in C++, but rather to an Intel-developed API built atop C++.


Perhaps things have changed since I last poked at this, so, standard disclaimers, take my comments with a grain of salt, etc.

GPU acceleration is not a magic "go fast" machine. It only works for certain classes of embarrassingly parallel algorithms. In a nutshell, the parallel regions need to be long enough that the speedup from doing them in the GPU's silicon outweighs the relatively high cost of getting data into and out of the GPU.

That's a fairly easy scenario to achieve with neural networks, which have a pretty high math-to-data ratio. Other machine learning algorithms, not necessarily. But basically all of them can benefit from the CPU's vector instructions, because they live in the CPU rather than out on a peripheral, so there's no hole you need to dig yourself out of before they can deliver a net benefit.

I would also say that what academics are doing is not necessarily a good barometer for what others are doing. In another nutshell, academics' professional incentives encourage them to prefer the fanciest thing that could possibly work, because their job is to push the frontiers of knowledge and technology.

Most people out in industry, though, are incentivized to do the simplest thing that could possibly work, because their job is to deliver software that is reliable and delivers a high return on investment.


Maybe the solution is a discrete SoC for ML? A CPU and GPU on a card with shared memory, like Apple's M1.


I personally wouldn't bother. If you're not doing deep learning, existing hardware is already good enough that, while I can't say that nobody could get any value out of it, I'm personally not seeing the need. I'd much rather focus on the things that are actually costing me time and money, like data integrity.

Like, I would guess that the potential benefit to my team's productivity from eliminating (over)reliance on weakly typed formats such as JSON from our information systems could be orders of magnitude greater.


I can't imagine that the overlap between those using Scikit-Learn and those willing to buy and integrate ML-specialized hardware is that high. I think a lot of real-world usage of simpler ML libraries like Scikit-Learn is deploying small models onto an already existing x86 or ARM system which had cycles to spare for some basic classification or regression.


I mean, Amazon and Google are already doing that... and there are companies making ML ASICs.

Problem is... the ASICs are really good for certain classes of ML problems but aren't really all that general.


If they're more open with it than nvidia, they have a chance in my opinion.


RAPIDS by NVIDIA has an open-source, API-equivalent version of scikit-learn (https://docs.rapids.ai/api/cuml/stable/) which seems to offer 100x speedups for a lot of these models.


They have also made some entry into the R space by adding their MKL / BLAS library:

https://www.intel.com/content/www/us/en/developer/articles/t...


Just tried the patch in Google Colab and results for the example code were actually about 20% slower than without the patch.

https://imgur.com/a/7EmlYJy

What am I missing?

edit: it seems my instance was using AMD EPYC.


You have to import KMeans again after patching.

Also, your example uses a tiny problem size that has a lot of fluctuation. Basically, in your code you are running stock scikit-learn both times.

This extension would also bring performance gains to AMD, although Intel would be better optimized.
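A minimal sketch of that ordering (the problem size here is arbitrary, just large enough for the optimized kernels to matter):

    from sklearnex import patch_sklearn
    patch_sklearn()  # patch first

    # re-import so the name resolves to the patched implementation
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500_000, n_features=50, centers=10, random_state=0)
    KMeans(n_clusters=10, random_state=0).fit(X)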


The syntax and usability of

    from sklearnex import patch_sklearn
    # The names match scikit-learn estimators
    patch_sklearn("SVC")
seems quite clunky. I'd have preferred a syntax like

    from sklearnex import SVC
    
Then, maintenance would be substantially easier. If sklearnex had import-level compatibility with sklearn, it'd be as simple as a few mechanical replacements,

    import sklearn --> import sklearnex as sklearn

    from sklearn.cluster import KMeans --> from sklearnex.cluster import KMeans
which seems much easier / clearer.


Import magic ends up causing all kinds of problems. There's no way to tell it "I just want to import your classes; I don't want you to patch!" without literally patching the library.

In general, I'm a fan of "let me call the initializer myself, at program startup." It's especially important when you want reversibility, i.e. teardown in addition to initialization, which pops up all the time for unit tests.


Also networking/web apps with lifecycle hooks, where careless import-time logic can break the setup procedures. To quote the Zen of Python, "explicit is better than implicit".


I really really loathe import magics. You end up with situations where dependencies change global behavior without a way to track down where the change is actually coming from.


Patching is just one of the options to enable the optimizations - you can also already do from sklearnex import SVC.

There are other ways you can enable this: https://intel.github.io/scikit-learn-intelex/what-is-patchin...
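For illustration, the direct-import route would look something like the following (module path per the linked docs; treat the exact path as an assumption):

    # use the accelerated estimator directly, without globally patching scikit-learn
    from sklearnex.svm import SVC

    clf = SVC(kernel="rbf")  # same constructor arguments as scikit-learn's SVC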


A 5000x boost in KNN inference is not bad.

Generally speaking the distribution-packaged versions of python and all its scientific libraries and their support libraries are best ignored. That stuff should always be rebuilt to suit your actual production hardware, instead of a 2007-era Opteron.


Is there a way via pip/conda to compile these to your environment directly? I see most people just pull from repositories and sometimes see wheel discussed.


conda install scikit-learn-intelex -c conda-forge
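If you prefer pip, pip install scikit-learn-intelex should be the equivalent; as far as I know, both pull prebuilt binaries rather than compiling anything against your local environment.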


Thanks Peter. I thought conda-forge was a repository/channel, not a command to compile to local environment.

A few followups: (1) Is this usable for non-intelex packages? (2) What about packages not in conda's channels?


I am busy installing it now. Anaconda should take care of required packages? I am not actually sure. Seems to be working.

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


Is there a demo that shows this 5000x speedup?

(A jump that large suggests to me they may be fixing issues in the default implementation that could also be fixed for other processors!)


Looks like they are responding to https://github.com/intel/scikit-learn-intelex#-acceleration

I completely agree. I hope some Intel competitor funds a scikit-learn developer to read this code and extract all the portable performance improvements.


The point is that sklearnex would bring performance gains for all x86 architectures, not just Intel. And yes, the scikit-learn developers are already working on generic improvements there.


As @jjerphan commented above, there is already ongoing effort to get an improved portable brute-force implementation in vanilla scikit-learn, see:

https://news.ycombinator.com/item?id=29069760


As cool as this is, why would you lock yourself into Intel?

Especially with cloud providers making arm processors available at lower prices.

At the same time: "Intel® Extension for Scikit-learn* is a free software AI accelerator that brings over 10-100X acceleration across a variety of applications."

Maybe their free software could be extended to all processors?


It looks more like optimized kernels for some operations, rather than extended functionality. Which is to say, using it shouldn't produce any lock-in for well structured projects -- it is like changing which BLAS library you've linked to.

Not sure what kind of secret sauce they've included, but it is Intel so their specific advantage is that they know everything about their processors and can provide really low level optimizations which might not necessarily be super portable.


I listened to an interesting CPPCast episode where they interviewed someone from Intel's compiler team.

(I'm just guessing that a lot of the benefit here comes from building with Intel's compiler rather than GCC.)

It sounded like the bulk of the benefits they get are just from using profile-guided optimization to maximize the cache-friendliness of the code. I would guess those kinds of optimizations are readily portable to any CPU with a similar layout and cache sizes. I would not expect, though, that they are actively detrimental (compared to whatever the official sklearn builds are doing) on CPUs that have a different cache layout.


Huh, wasn't aware of CPPCast, it seems neat. My podcast listening has mostly been politics, just because they seem to be in much greater supply. Now I just need to find a fortran cast. They could call it... FORTCAST.


There isn't one that I know of, but the recent CPPCast episode on Fortran was very good.


I know people keep saying Intel is dead, but it's not entirely accurate imo.

All of my machines still use Intels (other than my SBCs). So installing this and running it is trivial.

Intel is still a major contributor to the Linux kernel. Thus, all their CPUs have first-class support for it. AMD fired all their Linux engineers some time back. They never rehired them to my knowledge.

Then there are things like this (the MKL libraries are another). Intel spends a lot more money on the development of these little libraries, which do meaningfully speed up processes. Those processes affect my day-to-day work as a software engineer.

That adds up when I have to deploy on the cloud. ARM is not quite there yet and little hiccups at deploy time are a pain when the cost difference is not so significant relative to the hourly cost of my time. Linus Torvalds pointed this out about ARM, stating it couldn't ever take off unless it took off on the desktop.


> AMD fired all their Linux engineers some time back. They never rehired them to my knowledge.

My understanding is that AMD regularly contributes to the Linux kernel for their CPU and GPU lines. How would they do this without Linux engineers?


AMD has had multiple hiring rounds for Linux kernel engineers and their efforts regarding GPU support were never interrupted, so I dunno where you got that AMD fired "all their Linux engineers".


I don't think anyone is saying Intel is actually currently dead. They're clearly not. But their trajectory is not headed the right way.


They claim API compatibility with standard scikit-learn. If that's true, you can optionally run with sklearnex, or not, without any rewriting of code. Sounds fair to me.

Intel has done similar work before in the C/Fortran world; see BLAS, LAPACK, and FFTW vs MKL.
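A minimal sketch of that opt-in pattern, assuming only the patch_sklearn entry point shown elsewhere in this thread:

    # enable the extension if it happens to be installed; otherwise fall back to stock scikit-learn
    try:
        from sklearnex import patch_sklearn
        patch_sklearn()
    except ImportError:
        pass

    from sklearn.svm import SVC  # resolves to the accelerated version when patched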


This is not Intel specific according to https://intel.github.io/scikit-learn-intelex/system-requirem...

Just requires an x86 processor with "at least one of SSE2, AVX, AVX2, AVX512 instruction sets."


Is this Intel-only, or oneAPI, which is supposed to be cross-platform? It's not entirely clear, which makes me suspicious.


> oneAPI Data Analytics Library (oneDAL) is a powerful machine learning library that helps speed up big data analysis. oneDAL solvers are also used in Intel Distribution for Python for scikit-learn optimization.

> oneDAL is part of oneAPI.

So oneAPI is cross-industry, but this only works with Intel CPUs?

Hmm. Not sure I'm buying this, Intel. Sounds like you're claiming to be open but locking people into Intel-only libraries.


https://github.com/intel/scikit-learn-intelex

CuML is similar to Intel Extension for Scikit-Learn in function? https://github.com/rapidsai/cuml

> cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.


What changes have been made to get the speedups?


vectorization, threading, cache optimizations, etc...


Is there a specific "test" to run as a performance standard for scikit-learn? I noticed the other day that my Mac mini M1 absolutely blows away my MacBook Air 2020 with an i7. I was always curious if there is a good way to gauge performance.


Any idea if this can be used with Jupyter?

I have a bunch of notebooks that take 4-8 hours to run. This could potentially make my life much easier.


Yes, notebooks make no difference for sklearnex.

You are likely not importing the algorithms from scikit-learn again after patching - i.e. the patch call should be made prior to the import.

Note there are Kaggle notebooks that showcase the same optimizations - https://www.kaggle.com/napetrov/tps04-svm-with-intel-extensi... - and those are basically Jupyter notebooks too.


You can always export jupyter notebooks to straight .py files.
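Something like jupyter nbconvert --to script notebook.ipynb should do it (the filename here is just a placeholder).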



No reason not to try it


I did, but I see absolutely no performance improvements.

Maybe notebooks require something more?


I don't know your setup, but regular JupyterLab notebook execution is really exactly like plain Python in almost everything; there's not much that should be different.



