I know how it feels. SVMs are an old, tried-and-true classification method that was in fashion before the DL craze.
> An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
One advantage of SVMs is that they don't use all the data points to decide on the separating plane, just the points closest to the gap (the support vectors), which makes the decision boundary less sensitive to points far from the margin.
Another advantage is that they can efficiently perform non-linear classification using the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces (here a kernel is a similarity function between two data points, an inner product in that feature space).
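If you want to see both points concretely, here's a minimal scikit-learn sketch (dataset and hyperparameters are just illustrative choices, not anything the comment above prescribes): it fits a non-linear SVM with an RBF kernel and shows that only a subset of the training points end up as support vectors.

```python
# Minimal illustration of a kernelized SVM with scikit-learn.
# Dataset and hyperparameters are arbitrary choices for the example.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# RBF kernel = the "kernel trick": the separating plane lives in an
# implicit high-dimensional feature space that is never computed explicitly.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

# Only the points closest to the margin determine the decision boundary.
print("training points:", len(X))
print("support vectors:", clf.support_vectors_.shape[0])
print("prediction for a new point:", clf.predict([[0.5, 0.0]]))
```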
Support Vector Machine. A machine learning technique that's typically used for classification and regression, but has also been adapted to novelty detection, structured prediction, ranking, etc. I wrote up a beginner's tutorial some time back, if you're interested [1].
I don't want to be too much of an asshole, but don't you have Google like the rest of us? I always think it's so weird when people ask something in the HN comments, of all places, when they could literally have just typed it into Wikipedia or whatever and gotten a better answer.
Congratulations, you made it!
You are now, by your own words, too much of an asshole. What an achievement!
Please allow me to explain my reasoning:
I'm one of those with no clue about the meaning of SVM, and I came to the comments section looking for an answer to the question "what the ... is an SVM anyway?".
Even though I know the way to Google (https://duckduckgo.com, right?), I'm happy someone asked this question and another answered it and as a consequence saved me (and hopefully others) a couple of clicks and taught me something that I might have otherwise been too lazy to investigate myself.
Should I want to be picky, I'd even suggest that having the question answered in the comment section is environmentally more efficient than your suggestion that we should all open a new tab and do the same research.
Save a tree, answer the not-so-obvious questions "for the masses".
Not only did it save a couple of clicks, but there were also some good resources thrown around. When trying to learn a new topic without much background, the most difficult part is finding good reads, videos, tutorials, etc., that will set you on the right track.
>"Outside of neural networks, GPUs don’t play a large role in machine learning today, and much larger gains in speed can often be achieved by a careful choice of algorithms." [1]
Scikit-learn espouses a non-GPU approach. Perhaps the performance gains from using GPUs aren't that significant. Has anyone tried SVMs (or, for that matter, other non-DL classifiers) + GPUs?
Does anyone have more insight into why? If I try to grid search hyperparameters for a random forest, it takes ages on a single machine with scikit-learn. I only ever see GPUs used for neural networks. Is there some acceleration for non-NN machine learning algos?
Generally speaking, what GPUs excel at is applying exactly the same operation to many parallel data streams. Neural networks are like that.
Random forests are the opposite: branching on conditional tests is very expensive. It's been several years since I last wrote raw CUDA and OpenCL, but if memory serves, back then the docs essentially said that every if...else amounted to running both branches on all the data and then deciding what to keep, effectively halving performance. So a decision tree just a few levels deep would slow you down by an order of magnitude.
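A rough way to picture it (a NumPy sketch of the idea, not actual GPU code): data-parallel execution tends to compute both sides of a branch for the whole batch and then select per element, which is why stacking conditionals, as in a decision tree, gets expensive.

```python
import numpy as np

x = np.random.rand(1_000_000)

# Scalar-style code would branch per element:
#   y = x * 2 if x > 0.5 else x + 1
# Data-parallel execution instead evaluates BOTH branches for every
# element and then picks the result with a mask, roughly like this:
taken     = x * 2   # branch taken
not_taken = x + 1   # branch not taken
y = np.where(x > 0.5, taken, not_taken)

# Nest this a few levels deep (as in a decision tree) and the wasted
# work multiplies, which is the slowdown described above.
```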
As an aside, you should generally avoid grid search in favor of random search or an evolutionary algorithm. Grid search can burn your time budget on likely-bad combinations of hyperparameters instead of exploring near a known-good combination. A minimal sketch of the random-search alternative follows.
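For the scikit-learn case specifically, swapping GridSearchCV for RandomizedSearchCV is close to a drop-in change; the model, parameter ranges, and budget below are placeholders, not recommendations.

```python
# Random search over random-forest hyperparameters (illustrative ranges).
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_distributions = {
    "n_estimators": randint(50, 500),
    "max_depth": randint(2, 20),
    "min_samples_leaf": randint(1, 10),
}

# n_iter caps the budget at 25 sampled combinations instead of the full
# cross-product a grid search would evaluate; n_jobs=-1 uses all CPU cores.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=25,
    cv=3,
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```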
Most things that "run on CPUs" run on CPUs by many vendors. In the cases where software does not run on ARM processors it's almost always due to an architectural difference with x86, not an artificial proprietary limitation.
If an open-source library chose to rely on a proprietary C compiler with special language features (e.g. the Borland C compiler) that targeted only Intel x86 CPUs, I would argue we would not generally claim this software "runs on CPUs". Saying the software "runs on Intel x86 CPUs" seems more appropriate.
Similarly, especially in the machine learning field, it seems to be more and more common that "GPU" really means "CUDA-required GPU". In a world where AMD and Intel GPUs are also common (perhaps more common?), to me it makes sense to be up-front about this.
I'm not criticizing the authors' choice to use CUDA here. I'm just saying it would be nice if we stopped pretending CUDA is synonymous with GPU programming, especially in cases where OpenCL would be a very appropriate choice (e.g. open-source software).
The important point is that it runs on GPUs at all. If you think saying "it runs on GPUs" is wrong, the alternative sounds even stranger: saying it doesn't run on GPUs.
They could have been more precise but there's nothing wrong or particularly misleading about what they said, and what they said conveyed the most important aspect.
> would argue we would not generally claim this software "runs on CPUs"
"This software does not run on CPUs" sounds wrong for this scenario.