ArrayFire, a general-purpose GPU library, goes open source

melonakos · on Nov 12, 2014

Hello everyone! I am a co-founder of ArrayFire. Since this is a startup-oriented board, I thought readers of this thread might be interested in how we arrived at this decision to open source from a business perspective, http://notonlyluck.com/2014/07/31/the-decision-to-open-sourc...

For technical questions, @pavanky is on here :-)

_stephan · on Nov 12, 2014

I find this decision very intriguing from a business perspective. Thanks for elaborating on the reasons in the blog post.

By completely open sourcing your only software product with a liberal license you seem to be turning a software (product) company into a software consultancy. Is that a fair assessment?

Do you think the market conditions that led you to this decision are very specific to GPU computing, or would you expect similar conditions in the more general scientific computing/HPC market? Would you say that it's generally simpler to earn money by doing specialized consulting than by selling technical software libraries, even though the former is less "scalable"?

If you're earning all the money with consulting and support, how do you allocate resources to the further development of the library? Do your software engineers enjoy working on your company's product the same as working on client projects?

melonakos · on Nov 13, 2014

Yes, our primary method of making money will be through open source monetization strategies. I wrote thoughts on this here: http://notonlyluck.com/2014/01/16/monetizing-open-source-pro...

I bet the market conditions that led to this are broader than just GPU computing. I think it would more generally apply to any middleware business. But it is certainly more palatable for people in scientific computing and HPC to use something free and later pay for support and services and addons. Once people start really relying on the free thing, that reliance can be monetized. In this sense, it is more "scalable" to have an open source product which is readily adoptable by early users than to attempt to sell a product to buyers that have not yet started to rely upon it and have a good distance to go before reliance sets in. This is not SaaS and never will be, haha.

Allocation of future resources is something that we have considered a lot. I wrote before about opportunity costs associated with an open source business model: http://notonlyluck.com/2014/08/13/opportunity-costs-required...

We just open sourced today, but our plan is to treat the open source product the same as we have always treated it even when it was proprietary.

Great questions! Hit me up on Twitter @melonakos. Would be good to connect more with you, especially if you are going to SC'14 next week.

_stephan · on Nov 13, 2014

Thanks a lot for your reply and for providing links to your previous blog entries! I wish you good luck and hope the economics will continue to work out for you.

I do think that libraries should be distributed as open source, but I'm also hoping that at least in certain areas there is a way to commercially develop them as a product business. Provocatively speaking, if software "eats the world", then libraries are too important to just be developed as a by-product of some other ventures or in support of a platform/eco system.

Personally I'm planning on releasing a library under a GPL + commercial dual licensing scheme and later on another library under a non-commercial (incl. academic and government research) + commercial dual license. We'll see how that works out.

pavanky · on Nov 12, 2014

I cannot answer the other questions, so I'll let John handle that part. I can answer this:

> If you're earning all the money with consulting and support, how do you allocate resources to the further development of the library? Do your software engineers enjoy working on your company's product the same as working on client projects?

This is a question we have debated a lot internally. The shortest answer is that our experience building the product brings in the customers. The customer requirements can then drive further development of the product.

Choosing the appropriate open source license (BSD 3-clause in this case) helps us reuse a lot of our code in a wide variety of situations.

beagle3 · on Nov 13, 2014

> Choosing the appropriate open source license (BSD 3-clause in this case) helps us reuse a lot of our code in a wide variety of situations.

If I understand it correctly, since you wrote the code and own the rights, you can do this regardless of the license you chose; e.g., you could have done an AGPL-3 release to the public, and continue giving license-to-use-and-modify-but-not-release to customers.

Am I misunderstanding?

pavanky · on Nov 13, 2014

You are technically right, but in my experience a few companies either do not understand this or do not want to risk it for legal reasons.

BSD 3-clause on the other hand is very easy to understand and is permissive off the bat.

MaysonL · on Nov 13, 2014

Another company that made this transition a while ago is Oberon Microsystems, who open-sourced their main product with a very liberal license.

melonakos · on Nov 21, 2014

Thanks for the info!

Game_Ender · on Nov 13, 2014

Does this include ArrayFire Pro [1], mentioned on your doc pages? I am having a hard time finding the source to the pro versions on your GitHub page.

1 - http://www.arrayfire.com/docs/arrayfirepro.htm

melonakos · on Nov 13, 2014

Yes, ArrayFire Pro is no longer proprietary. We literally open sourced every bit of the library we have. We held nothing back in this :-)

We need to remove mention of ArrayFire Pro from the website now. Thanks for pointing to that.

galapago · on Nov 12, 2014

Can you give a short comparison with Theano?

pavanky · on Nov 12, 2014

I am not too familiar with Theano, but from what I can tell it is more focused towards Deep Learning. So I will refrain from comparing and will give you a short list about ArrayFire.

- Supports multiple backends, so you can run on NVIDIA GPUs, AMD GPUs, Intel Xeon Phis, and all CPUs using the same API.

- ArrayFire currently has statistics, image processing, signal processing and Linear algebra functions. We are planning to add Machine Learning and Computer Vision functions / algorithms in the near future.

- ArrayFire is a native (C/C++) library. It can be used from other languages fairly easily.

- The main goal is to make parallel programming in general (GPU programming in particular) easier and portable.

bch · on Nov 12, 2014

I glanced for extern C clauses but didn't see them -- is C considered a first-class language to bind to this lib, though?

update: Downloaded repo, grepped, found externs... looking forward to playing with this :)

pavanky · on Nov 12, 2014

You can look at our headers to find the C api alongside the C++ API.

Here is the image.h header file for example: https://github.com/arrayfire/arrayfire/blob/devel/include/af...

extern "C" is present on line 58.

EvanMiller · on Nov 12, 2014

For some context, ArrayFire is a product of AccelerEyes, which began life selling a GPU booster for Matlab (a product called Jacket).

This and today's .NET announcement show how hard it is to sell proprietary developer tools. I had considered using ArrayFire for some of my own commercial work, but in the end decided to roll my own OpenCL code in order to have better control. If you require cutting-edge performance (which is the reason you'd consider ArrayFire in the first place), there's just too much risk involved if the vendor doesn't get details like memory access order right on complex matrix problems. Open-sourcing reduces that risk quite a bit; if this decision had been made 3 years ago, I would have given the product a closer look.

From a business perspective, open-sourcing will murder their margins so they're basically gambling on their ability to jump-start volume. I think the product is in a tough position because most of the action these days is going towards "Big Data," where data doesn't fit on a single machine -- let alone a GPU -- or towards heavy number-crunching, where hand-rolled kernels will outperform generic array libraries. They might have luck serving as a kind of backend to NumPy, but then they're two steps removed from the customer so it'll be hard building a relationship that leads to a sale.

As a side note, it seems odd to me that "native CPU" is a target distinct from OpenCL, which already runs on both CPUs and GPUs. I understand that kernels written for GPUs sometimes need to be rewritten for CPUs to take advantage of the different computation and memory architecture, but since their native CPU target isn't vectorized or multi-threaded, it seems like any further effort should be spent adapting the OpenCL kernels for CPU platforms rather than reinventing the wheel with a distinct C or assembler target.

I admire the general goal of making GPU processing more accessible, but it's a problem with a lot of nuance and requires a significant amount of customer education. GPUs are sort of like quantum computers in the limited sense that they're totally awesome at some tasks and totally suck at other tasks, and you need a solid grounding in the theory to distinguish the two sets of cases. Open-sourcing should at least help with the education angle, since ArrayFire now represents a respectable percentage of publicly viewable OpenCL code. (The open-source scene for OpenCL is pretty depressing right now.) In any case, good luck out there.

melonakos · on Nov 13, 2014

Great thoughts and interesting to see your thought process along the way. For quite some time, we have made ArrayFire free for single-GPU usage, dipping a toe in building a user base. We have already started monetizing that free user base over the last several years and we are good at that already. So from a business perspective, we have no margins that are really at risk. We only have more money to make from this move!

And you are right. Too bad we didn't do this long ago!!! Hindsight is 20-20 as they say. I wrote about some of the internal deliberations we had on this decision here: http://notonlyluck.com/2014/07/31/the-decision-to-open-sourc...

pavanky · on Nov 13, 2014

> As a side note, it seems odd to me that "native CPU" is a target distinct from OpenCL, which already runs on both CPUs and GPUs.

We are planning to move towards a single library that dynamically loads the appropriate backend depending on the runtimes / drivers available. If we completely relied on OpenCL, the same binary will not work on machines without the OpenCL SDKs installed.
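To make the idea of picking a backend at run time concrete, here is a minimal sketch in Python. It is not ArrayFire's loader: the preference order and the library names `cudart` and `OpenCL` are assumptions chosen for illustration, and real library names vary by platform. It only shows the general pattern of probing for a runtime and falling back to the CPU when none is found.

```python
from ctypes.util import find_library

# Hypothetical preference order and runtime library names (assumptions,
# not taken from ArrayFire); real names differ between platforms.
BACKEND_LIBRARIES = [
    ("cuda", "cudart"),
    ("opencl", "OpenCL"),
]


def choose_backend(probe=find_library):
    """Return the name of the first backend whose runtime can be located.

    `probe` takes a library name and returns a path or None. It is a
    parameter so the selection logic can be exercised without a GPU.
    """
    for backend, library in BACKEND_LIBRARIES:
        if probe(library) is not None:
            return backend
    # No GPU runtime was found: fall back to the plain CPU implementation.
    return "cpu"
```

Calling `choose_backend()` with no arguments probes the current machine. On a system with neither runtime installed it returns `"cpu"`, which is the situation the comment above describes for a binary that must still run without the OpenCL SDKs.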

> I think the product is in a tough position because most of the action these days is going towards "Big Data," where data doesn't fit on a single machine -- let alone a GPU -- or towards heavy number-crunching, where hand-rolled kernels will outperform generic array libraries

Well, that is a two-part question. As for hand-rolled kernels, they will obviously be better if you know the problem type. But more often than not, our users are happy to get "X" times the speedup in "Y" hours as opposed to "(1.2 - 1.3)X" speedup in "(3-5)Y" hours.
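One rough way to read those numbers is as speedup gained per hour of development effort. That metric is an assumption made here for illustration, not something claimed in the thread, and the values of `X` and `Y` below are placeholders that cancel out of the final ratios.

```python
def speedup_per_hour(speedup, hours):
    """Crude figure of merit: speedup obtained per hour of effort."""
    return speedup / hours


# Placeholder values; only the ratios below matter, and X and Y cancel.
X = 10.0  # speedup from the library approach
Y = 8.0   # hours spent on the library approach

library = speedup_per_hour(X, Y)

# Hand-rolled kernels: (1.2 - 1.3)X speedup in (3 - 5)Y hours.
hand_rolled_best = speedup_per_hour(1.3 * X, 3 * Y)
hand_rolled_worst = speedup_per_hour(1.2 * X, 5 * Y)

# Relative return on effort compared with the library approach.
ratio_best = hand_rolled_best / library    # 1.3 / 3, about 0.43
ratio_worst = hand_rolled_worst / library  # 1.2 / 5, exactly 0.24
```

Under this metric the hand-rolled route returns somewhere between about a quarter and a bit under half as much speedup per hour. That can still be worthwhile when the last 20 to 30 percent of performance matters more than development time.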

As for Big data, this is something we are working on / towards. We have some ideas that will make scaling across multiple GPUs and multiple machines easier. Since we will be doing this publicly, I am sure we will get a lot of valuable feedback from the community.

pavanky · on Nov 12, 2014

Hello everyone! I am one of the developers. I am more than happy to answer any questions.

m_mueller · on Nov 13, 2014

Hi! A few questions:

- How do you deal with software that has been previously run with coarse grained parallelism, optimized for Multicore/Multinode x86? In my experience, GPGPU porting often leads to a tedious, mostly mechanical conversion from coarse grained to fine grained, which usually includes privatizing all your data manually in your parallel domains.

- Can you do multinode / multi-GPU without wrapping everything in MPI?

- Can ArrayFire also run on CPU clusters?

- How do you deal with different storage orders? Up until now, GPUs often require a different storage order than CPUs, (wide vs. narrow vector processor) - and how does that factor into the last point?
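For readers unfamiliar with the storage-order issue raised in the last question, the sketch below shows the two common linear layouts of a 2-D array and a conversion between them. It is plain Python for illustration only and says nothing about how ArrayFire or Hybrid-Fortran handle layouts; the function names are made up for this example.

```python
def row_major_index(row, col, n_rows, n_cols):
    """Linear offset when elements of one row are contiguous (C order)."""
    return row * n_cols + col


def col_major_index(row, col, n_rows, n_cols):
    """Linear offset when elements of one column are contiguous (Fortran order)."""
    return col * n_rows + row


def to_col_major(flat_row_major, n_rows, n_cols):
    """Copy a row-major flat list into column-major order."""
    out = [None] * (n_rows * n_cols)
    for r in range(n_rows):
        for c in range(n_cols):
            src = row_major_index(r, c, n_rows, n_cols)
            dst = col_major_index(r, c, n_rows, n_cols)
            out[dst] = flat_row_major[src]
    return out


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored row by row:
matrix = [1, 2, 3, 4, 5, 6]
print(to_col_major(matrix, 2, 3))  # [1, 4, 2, 5, 3, 6]
```

The same logical element ends up at a different memory offset depending on the layout, which is why a loop order that reads memory contiguously on one architecture can stride through it on another.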

Sidenote: I've been dealing with above problems in a Fortran based research project and have created a preprocessor framework[1] to deal with it.

[1]https://github.com/muellermichel/Hybrid-Fortran

michaellosee · on Nov 13, 2014

GPUs love hashing things, do you think ArrayFire would make that easy to do? I would LOVE to use the library to create an opensource GPU cracking program. Hashcat is amazing but is closed source. I am giddy with excitement at the prospect. Thanks!

pavanky · on Nov 13, 2014

I am not sure how much ArrayFire can help with hashing. I need to read more to understand it better.

fla · on Nov 13, 2014

mostly a matter of making bitwise & arithmetic operations in parallel over integers (shift, xor, add etc..)
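As a rough picture of that workload, here is a sketch in plain Python. The mixing function is a toy built from shifts, xors and an add on 32-bit integers. It is invented for this example, is not a real hash function, and is not ArrayFire code. The point is only that each element is transformed independently of the others, which is the property that makes the work data-parallel.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, as a GPU integer lane would


def mix32(x):
    """Toy 32-bit mixer using shift, xor and add. NOT a real hash."""
    x &= MASK32
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    return (x + 0x9E3779B9) & MASK32


def mix_all(values):
    """Apply the mixer to every element; no element depends on another."""
    return [mix32(v) for v in values]
```

On a GPU the list comprehension would become one elementwise operation over an integer array, with one lane per input value.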

pavanky · on Nov 13, 2014

Ah in that case ArrayFire can certainly help!

epsylon · on Nov 13, 2014

Which libraries are your competitors and why would ArrayFire be better than these?

pmalynin · on Nov 12, 2014

Interesting project for sure. When I was working on a machine learning project this summer, I decided to use the GPU to do a lot of computation on a smallish dataset (100 MiB) with 250k records; the algorithm was O(n^2) and at some points even O(n^3). I tried to use existing solutions (ViennaCL, etc.) but alas nothing seemed to work fast, or at all. In the end, learning CUDA turned out to be quite easy, and profiling with Nvidia's tools is very nice. For most problems, rolling your own solution often seems best, as it can be so ridiculously optimized (100% bandwidth utilization on 33% thread occupancy).
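To put those sizes in perspective, the arithmetic below treats the big-O terms as literal step counts. That is a simplifying assumption which ignores constant factors and lower-order terms. The record count and dataset size are the ones quoted in the comment above.

```python
n = 250_000                        # records, as quoted above
dataset_bytes = 100 * 1024 * 1024  # 100 MiB

pairwise_steps = n ** 2  # 62,500,000,000 (about 6.25e10)
triple_steps = n ** 3    # 15,625,000,000,000,000 (about 1.56e16)

bytes_per_record = dataset_bytes / n  # roughly 419 bytes per record
```

So a dataset that fits comfortably in GPU memory can still imply tens of billions of pairwise operations, which is why a "smallish" input ends up being a heavy compute problem.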

ubasu · on Nov 12, 2014

Just a curious onlooker - don't mean to criticize. A lot of this seems to be recreating stuff that Fortran 95/2003 does natively, but I guess this is for C/C++ people?

pavanky · on Nov 12, 2014

ArrayFire implements many algorithms that Fortran has (such as statistics, reductions etc), but it also has many image processing functions. We are working on pushing Machine Learning, Computer Vision and Graph related algorithms in the next few weeks.

The library also implements the algorithms in three backends (CUDA, OpenCL and native CPU) using the same API. We'll be adding support for SSE/AVX/NEON to make it more performance portable in the future.

shepardrtc · on Nov 12, 2014

How does its performance compare to Intel's MKL on a CPU?

pavanky · on Nov 12, 2014

The current CPU implementation is single core, non-vectorized code. That said, ArrayFire can link with any BLAS / LAPACK library to accelerate the relevant algorithms.

EDIT: The CUDA and OpenCL backends will obviously be faster than MKL. We'll be adding SSE / AVX support at some point which'll make the CPU backend faster as well.

14113 · on Nov 13, 2014

In a similar vein, how does this compare with (say) CuBLAS or ClMath?

pavanky · on Nov 13, 2014

We depend on those libraries. We make the API easier to understand while keeping the performance close to the upstream libraries.

weitzj · on Nov 12, 2014

did you look at http://www.yeppp.info for sse/avx support?

pavanky · on Nov 13, 2014

Ha! We interviewed the author of that library!

roel_v · on Nov 12, 2014

Do you have any plans for (a) shortest path algorithm(s)?

pavanky · on Nov 12, 2014

We are actually very interested in graph algorithms and analytics and recently started to work on these. The first analytic that we tackled was triangle counting for social networks.

We wrote some blogs on this:

- http://arrayfire.com/triangle-counting-in-graphs-on-the-gpu-...

We plan on looking at additional algorithms in the future.
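For context on what triangle counting computes, here is a small serial sketch in Python. It is not the GPU method from the linked blog post. It is a straightforward neighbour-set intersection, and it assumes an undirected graph given as a list of edges with comparable node labels.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) edge pairs."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    total = 0
    for u in neighbours:
        for v in neighbours[u]:
            if u < v:
                # Count common neighbours w with u < v < w, so every
                # triangle is counted exactly once.
                common = neighbours[u] & neighbours[v]
                total += sum(1 for w in common if w > v)
    return total
```

For example, `count_triangles([(0, 1), (1, 2), (0, 2)])` returns 1, and the complete graph on four nodes contains 4 triangles.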

Kai__ · on Nov 13, 2014

Is the title on their website a typo, or some in-joke I don't get?

"Real speedups for you code!"

bhouston · on Nov 13, 2014

The issue is that ArrayFire is competing against CUDA and it is free since NVIDIA makes money from selling the GPUs it runs on.

pavanky · on Nov 13, 2014

We build our library on top of CUDA! (among other things)

I would not say we compete with CUDA, more like complement them!

saeguaiga · on Nov 13, 2014

Thank you for the engaged vacation from the AAA Game and Deep Introspection Belt computing platforms, and may your booth B-wavelet models represent you well.

sjtrny · on Nov 13, 2014

This is an abstraction on top of CUDA. It takes care of writing the low-level code for operations that you would otherwise have to write yourself like matrix multiplication.