Hello everyone! I am a co-founder of ArrayFire. Since this is a startup-oriented board, I thought readers of this thread might be interested in how we arrived at the decision to open source from a business perspective: http://notonlyluck.com/2014/07/31/the-decision-to-open-sourc...
I find this decision very intriguing from a business perspective. Thanks for elaborating on the reasons in the blog post.
By completely open sourcing your only software product with a liberal license you seem to be turning a software (product) company into a software consultancy, is that a fair assessment?
Do you think the market conditions that led you to this decision are very specific to GPU computing, or would you expect similar conditions in the more general scientific computing/HPC market? Would you say that it's generally simpler to earn money by doing specialized consulting than by selling technical software libraries, even though the former is less "scalable"?
If you're earning all the money with consulting and support, how do you allocate resources to the further development of the library? Do your software engineers enjoy working on your company's product the same as working on client projects?
I bet the market conditions that led to this are broader than just GPU computing. I think it would more generally apply to any middleware business. But it is certainly more palatable for people in scientific computing and HPC to use something free and later pay for support, services, and add-ons. Once people start really relying on the free thing, that reliance can be monetized. In this sense, it is more "scalable" to have an open source product which is readily adoptable by early users than to attempt to sell a product to buyers that have not yet started to rely upon it and have a good distance to go before reliance sets in. This is not SaaS and never will be, haha.
Thanks a lot for your reply and for providing links to your previous blog entries! I wish you good luck and hope the economics will continue to work out for you.
I do think that libraries should be distributed as open source, but I'm also hoping that at least in certain areas there is a way to commercially develop them as a product business. Provocatively speaking, if software "eats the world", then libraries are too important to just be developed as a by-product of some other ventures or in support of a platform/ecosystem.
Personally I'm planning on releasing a library under a GPL + commercial dual licensing scheme and later on another library under a non-commercial (incl. academic and government research) + commercial dual license. We'll see how that works out.
I cannot answer the other questions, so I'll let John handle that part. I can answer this:
> If you're earning all the money with consulting and support, how do you allocate resources to the further development of the library? Do your software engineers enjoy working on your company's product the same as working on client projects?
This is a question we have debated a lot internally. The shortest answer is that our experience building the product brings in the customers. The customer requirements can then drive further development of the product.
Choosing the appropriate open source license (BSD 3-Clause in this case) helps us reuse a lot of our code in a wide variety of situations.
> Choosing the appropriate open source license (BSD 3-Clause in this case) helps us reuse a lot of our code in a wide variety of situations.
If I understand it correctly, since you wrote the code and own the rights, you can do this regardless of the license you chose; e.g., you could have done an AGPL-3 release to the public, and continue giving license-to-use-and-modify-but-not-release to customers.
I am not too familiar with Theano, but from what I can tell it is more focused towards Deep Learning. So I will refrain from comparing and will give you a short list about ArrayFire.
- Supports multiple backends, so you can run on NVIDIA GPUs, AMD GPUs, Intel Xeon Phis, and all CPUs using the same API.
- ArrayFire currently has statistics, image processing, signal processing, and linear algebra functions. We are planning to add Machine Learning and Computer Vision functions / algorithms in the near future.
- ArrayFire is a native (C/C++) library. It can be used from other languages fairly easily.
- The main goal is to make parallel programming in general (GPU programming in particular) easier and portable.
For some context, ArrayFire is a product of AccelerEyes, which began life selling a GPU booster for Matlab (a product called Jacket).
This and today's .NET announcement show how hard it is to sell proprietary developer tools. I had considered using ArrayFire for some of my own commercial work, but in the end decided to roll my own OpenCL code in order to have better control. If you require cutting-edge performance (which is the reason you'd consider ArrayFire in the first place), there's just too much risk involved if the vendor doesn't get details like memory access order right on complex matrix problems. Open-sourcing reduces that risk quite a bit; if this decision had been made 3 years ago, I would have given the product a closer look.
From a business perspective, open-sourcing will murder their margins so they're basically gambling on their ability to jump-start volume. I think the product is in a tough position because most of the action these days is going towards "Big Data," where data doesn't fit on a single machine -- let alone a GPU -- or towards heavy number-crunching, where hand-rolled kernels will outperform generic array libraries. They might have luck serving as a kind of backend to NumPy, but then they're two steps removed from the customer so it'll be hard building a relationship that leads to a sale.
As a side note, it seems odd to me that "native CPU" is a target distinct from OpenCL, which already runs on both CPUs and GPUs. I understand that kernels written for GPUs sometimes need to be rewritten for CPUs to take advantage of the different computation and memory architecture, but since their native CPU target isn't vectorized or multi-threaded, it seems like any further effort should be spent adapting the OpenCL kernels for CPU platforms rather than reinventing the wheel with a distinct C or assembler target.
I admire the general goal of making GPU processing more accessible, but it's a problem with a lot of nuance and requires a significant amount of customer education. GPUs are sort of like quantum computers in the limited sense that they're totally awesome at some tasks and totally suck at other tasks, and you need a solid grounding in the theory to distinguish the two sets of cases. Open-sourcing should at least help with the education angle, since ArrayFire now represents a respectable percentage of publicly viewable OpenCL code. (The open-source scene for OpenCL is pretty depressing right now.) In any case, good luck out there.
Great thoughts, and interesting to see your thought process along the way. For quite some time, we have made ArrayFire free for single-GPU usage, dipping a toe into building a user base. We have been monetizing that free user base over the last several years, and we are good at that already. So from a business perspective, we have no margins that are really at risk. We only have more money to make from this move!
> As a side note, it seems odd to me that "native CPU" is a target distinct from OpenCL, which already runs on both CPUs and GPUs.
We are planning to move towards a single library that dynamically loads the appropriate backend depending on the runtimes / drivers available. If we completely relied on OpenCL, the same binary will not work on machines without the OpenCL SDKs installed.
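The idea of picking a backend based on what is installed can be sketched roughly as a probe with a fallback. This is only an illustration written in Python: the function name `pick_backend`, the candidate list, and the library names being probed are assumptions made for the example, not ArrayFire's actual loader.

```python
from ctypes.util import find_library

# Preference order: GPU runtimes first, plain CPU code as the fallback.
# These (backend, library name) pairs are assumptions for illustration.
CANDIDATES = [("cuda", "cuda"), ("opencl", "OpenCL")]


def pick_backend(probe=find_library):
    """Return the first backend whose runtime library can be located.

    `probe` takes a library name and returns a path-like string,
    or None when the library is not installed.
    """
    for backend, libname in CANDIDATES:
        if probe(libname) is not None:
            return backend
    return "cpu"  # needs no extra runtime, so it is always available
```

Calling `pick_backend()` with no arguments probes the local machine. Passing a custom `probe` makes the selection logic easy to exercise without any GPU drivers installed.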
> I think the product is in a tough position because most of the action these days is going towards "Big Data," where data doesn't fit on a single machine -- let alone a GPU -- or towards heavy number-crunching, where hand-rolled kernels will outperform generic array libraries
Well, that is a two-part question. As for hand-rolled kernels, they will obviously be better if you know the problem type. But more often than not, our users are happy to get "X" times the speedup in "Y" hours as opposed to "(1.2 - 1.3)X" speedup in "(3-5)Y" hours.
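To make that trade-off concrete, here is a small worked example in Python. The values X = 10 and Y = 8 hours are made up for illustration; only the 1.2-1.3 and 3-5 multipliers come from the comment above. "Speedup per hour of effort" is a crude metric chosen for this sketch, not something the authors defined.

```python
def speedup_per_hour(speedup, hours):
    """Speedup obtained per hour of development effort."""
    return speedup / hours


X, Y = 10.0, 8.0  # hypothetical: a 10x speedup for 8 hours of work

library = speedup_per_hour(X, Y)

# Best case for hand-rolled kernels: largest gain (1.3X), least effort (3Y).
hand_best = speedup_per_hour(1.3 * X, 3 * Y)

# Worst case for hand-rolled kernels: smallest gain (1.2X), most effort (5Y).
hand_worst = speedup_per_hour(1.2 * X, 5 * Y)

# The ratios do not depend on the particular X and Y chosen above.
advantage_low = library / hand_best    # 3 / 1.3, roughly 2.3
advantage_high = library / hand_worst  # 5 / 1.2, roughly 4.2
```

Under these assumptions the library route yields somewhere between about 2.3 and 4.2 times more speedup per hour of effort, although the hand-rolled kernel still wins on absolute speed.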
As for Big data, this is something we are working on / towards. We have some ideas that will make scaling across multiple GPUs and multiple machines easier. Since we will be doing this publicly, I am sure we will get a lot of valuable feedback from the community.
- How do you deal with software that has been previously run with coarse grained parallelism, optimized for Multicore/Multinode x86? In my experience, GPGPU porting often leads to a tedious, mostly mechanical conversion from coarse grained to fine grained, which usually includes privatizing all your data manually in your parallel domains.
- Can you do multinode / multi-GPU without wrapping everything in MPI?
- Can ArrayFire also run on CPU clusters?
- How do you deal with different storage orders? Up until now, GPUs often require a different storage order than CPUs, (wide vs. narrow vector processor) - and how does that factor into the last point?
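The storage-order question can be illustrated with the two common flat layouts of the same logical matrix. This is a plain Python sketch with hypothetical helper names; it does not show how ArrayFire stores or converts data.

```python
def row_major_index(i, j, rows, cols):
    """Flat offset of element (i, j) when rows are stored contiguously."""
    return i * cols + j


def col_major_index(i, j, rows, cols):
    """Flat offset of element (i, j) when columns are stored contiguously."""
    return j * rows + i


def row_to_col_major(flat, rows, cols):
    """Re-lay out a row-major flat buffer as a column-major one."""
    out = [None] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            src = row_major_index(i, j, rows, cols)
            dst = col_major_index(i, j, rows, cols)
            out[dst] = flat[src]
    return out


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] in both layouts.
row_major = [1, 2, 3, 4, 5, 6]
col_major = row_to_col_major(row_major, rows=2, cols=3)
```

Which layout is faster depends on which index the innermost loop, or the adjacent GPU threads, step through, which is why code ported between CPU and GPU often ends up converting between the two.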
Sidenote: I've been dealing with the above problems in a Fortran-based research project and have created a preprocessor framework[1] to deal with them.
GPUs love hashing things; do you think ArrayFire would make that easy to do? I would LOVE to use the library to create an open-source GPU cracking program. Hashcat is amazing but is closed source. I am giddy with excitement at the prospect. Thanks!
Interesting project for sure. When I was working on a machine learning project this summer, I decided to use the GPU to do a lot of computation on a smallish dataset (100 MiB) with 250k records; the algorithm was O(n^2) and at some points even O(n^3). I tried to use existing solutions (ViennaCL, etc.), but alas nothing seemed to work fast, or at all. In the end, learning CUDA turned out to be quite easy, and profiling with Nvidia's tools is very nice. For most problems, rolling your own solution often seems best, as it can be so ridiculously optimized (100% bandwidth utilization on 33% thread occupancy).
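A quick back-of-the-envelope calculation shows why a dataset that small can still be heavy. It assumes, purely for illustration, one basic operation per pair (or triple) of records; the real constant factors for that project are not known.

```python
n = 250_000                   # records, from the comment above
dataset_bytes = 100 * 2**20   # 100 MiB

pair_ops = n ** 2             # work for an O(n^2) pass
triple_ops = n ** 3           # work for an O(n^3) pass
bytes_per_record = dataset_bytes / n

print(f"{pair_ops:.3e} pair operations")
print(f"{triple_ops:.3e} triple operations")
print(f"{bytes_per_record:.1f} bytes per record")
```

So the data fits comfortably in GPU memory (about 419 bytes per record), while the quadratic pass alone is on the order of 6 x 10^10 operations.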
Just a curious onlooker - don't mean to criticize. A lot of this seems to be recreating stuff that Fortran 95/2003 does natively, but I guess this is for C/C++ people?
ArrayFire implements many algorithms that Fortran has (such as statistics, reductions etc), but it also has many image processing functions. We are working on pushing Machine Learning, Computer Vision and Graph related algorithms in the next few weeks.
The library also implements the algorithms in three backends (CUDA, OpenCL and native CPU) using the same API. We'll be adding support for SSE/AVX/NEON to make it more performance portable in the future.
The current CPU implementation is single core, non-vectorized code. That said, ArrayFire can link with any BLAS / LAPACK library to accelerate the relevant algorithms.
EDIT: The CUDA and OpenCL backends will obviously be faster than MKL. We'll be adding SSE / AVX support at some point which'll make the CPU backend faster as well.
We are actually very interested in graph algorithms and analytics and recently started to work on these.
The first analytic that we tackled was triangle counting for social networks.
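For readers unfamiliar with the problem, triangle counting asks how many sets of three mutually connected nodes a graph contains. Below is a minimal CPU sketch in Python; the function name is hypothetical and this is not ArrayFire's implementation, which is not described in this thread.

```python
from itertools import combinations


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    count = 0
    for u in adj:
        # Only look at neighbours larger than u, so that each triangle
        # is counted exactly once, at its smallest node.
        higher = [v for v in adj[u] if v > u]
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                count += 1
    return count


# Three friends who all know each other, plus one extra acquaintance.
example = count_triangles([(0, 1), (1, 2), (0, 2), (2, 3)])
```

The pairwise neighbour checks are independent of each other, which is part of what makes this kind of analytic a candidate for parallel hardware.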
Thank you for the engaged vacation from the AAA Game and Deep Introspection Belt computing platforms, and may your booth B-wavelet models represent you well.
This is an abstraction on top of CUDA. It takes care of writing the low-level code for operations that you would otherwise have to write yourself like matrix multiplication.
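As an illustration of the kind of low-level code such a library hides, here is a plain triple-loop matrix multiplication. It is written in Python for readability and is neither ArrayFire nor CUDA code; a hand-written GPU version would additionally have to manage device memory, thread indexing, and kernel launches.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0] * m for _ in range(n)]
    for i in range(n):          # rows of a
        for j in range(m):      # columns of b
            for p in range(k):  # shared inner dimension
                c[i][j] += a[i][p] * b[p][j]
    return c


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

With an array library, the whole function body collapses into a single call, and the library decides how to map the loops onto the hardware.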
For technical questions, @pavanky is on here :-)