
Not an explanation, but a benefit could be that SVMs can be evaluated much faster and are more explainable (* Citation needed, I know).



Don’t kernel SVMs need a full pass through the data they were trained on to make predictions? How is that faster?


No, they only require a pass over the support vectors, which are potentially a much smaller set. (That's part of why everyone was so excited about SVMs when they were invented.) The support vectors are the training points with nonzero dual coefficients: roughly, the points that lie on the margin or violate it, i.e. the points sufficiently close to the decision boundary.
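
To make that concrete, here's a rough sketch of a kernel SVM's decision function (the names rbf_kernel, dual_coefs, gamma, etc. are just illustrative, not from any particular library): the prediction sums only over the stored support vectors, never the full training set.

    import numpy as np

    def rbf_kernel(x, support_vectors, gamma=0.1):
        # RBF kernel between one query point and each stored support vector
        return np.exp(-gamma * np.sum((support_vectors - x) ** 2, axis=1))

    def svm_predict(x, support_vectors, dual_coefs, bias, gamma=0.1):
        # f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b
        # dual_coefs holds alpha_i * y_i; only the support vectors are needed here,
        # the rest of the training data can be discarded after fitting.
        k = rbf_kernel(x, support_vectors, gamma)
        return np.sign(dual_coefs @ k + bias)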


Fair enough, but the number of support vectors for non-trivial problems is still pretty large (as I understand it, though I could be wrong), e.g. 20-30% of the dataset. Having to iterate over 30% of, say, ImageNet on each batch of predictions seems infeasible. (Sketch below shows how to check the fraction on a toy problem.)
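
For what it's worth, the fraction is easy to check empirically with scikit-learn (a toy sketch; the dataset size, C and gamma here are arbitrary, and the support-vector count depends heavily on them):

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

    # Fraction of training points retained as support vectors;
    # prediction cost scales with this count, not with len(X).
    print(clf.n_support_.sum() / len(X))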


You only need the "Support Vectors" to make predictions, not the whole dataset.


Neural nets, on the other hand, require multiple passes through the data (epochs). If we can train a model in one epoch instead of 10000 epochs, that's a breakthrough!


Epochs are more about the training data than the model... If you've got a big enough dataset, one epoch or less is fine!


True, but it sounds like you're just shifting computation from training to inference, and I'm not sure that's a good trade-off to make: you're likely to predict on much more data than you trained on (e.g. ranking models at Google, FB, etc.).


Not sure I get your point; both DNNs and SVMs require one forward pass for inference, so there's no difference there. If an SVM can converge in one epoch, how is that not more efficient than the status quo with DNNs?


For kernel SVMs, one needs to keep around part of the training data (the support vectors), right? With DNNs, after training, all you need are the model parameters. For very large datasets, keeping around even a small fraction of your training data may not be feasible.

Furthermore, the number of parameters does not (necessarily) grow with the size of the training data, the parameters can be reused if you get more data, and they can be quantized/pruned/etc. There's no easy way to do these things with SVMs, as far as I understand.
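
As a rough illustration of that storage point (a sketch assuming scikit-learn on toy data; the sizes themselves mean nothing): a serialized kernel SVM carries its support vectors, while a parametric model's size is fixed by its parameter count.

    import pickle
    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=20000, n_features=50, random_state=0)

    # Kernel SVM: the pickled model includes the support vectors themselves,
    # so it grows with how many training points end up as support vectors.
    svm = SVC(kernel="rbf").fit(X, y)
    print(len(pickle.dumps(svm)), svm.support_vectors_.shape)

    # Parametric model: size depends only on the number of parameters,
    # regardless of how many training points it saw.
    lin = SGDClassifier().fit(X, y)
    print(len(pickle.dumps(lin)), lin.coef_.shape)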



