- You can think of error, loss, and cost functions as the same. In fact, two textbooks in front of me say that the loss function is a measure of error. If "loss" is a confusing word, think of it as the "information loss" of the model -- if your model is not perfect, you lose some of the information inherent in the data.
- There is no particular function used for error and loss. Different functions can be chosen based on the model, problem type, ease of theoretical analysis, etc. In practice, the final loss function is often experimentally determined by whatever yields the best accuracy.
- The perceptron uses a different loss function because it is a binary classifier, not a regressor. In this case, because there are only two classes (1 and -1), the loss function max(0, -xy) is 0 if x and y are the same class and 1 if they are different. The error function then sums these losses together. (Note that this sum-of-per-example-losses structure is quite similar to MSE; see the first sketch after this list.)
- RMSE is also valid -- adding the square root will not affect minimization, since the square root is monotonically increasing and so preserves where the minimum lies (see the second sketch below). MSE is likely more common for minor reasons, such as slightly better efficiency and cleaner theoretical proofs.
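
For concreteness, here is a minimal sketch of the loss and error functions described above (in Python; the setup and names are mine, not the slides'), assuming x is the thresholded prediction and y the target, both in {-1, 1}:

```python
def loss(x, y):
    """Per-example perceptron loss, assuming x, y are in {-1, 1}.
    Returns 0 when the prediction matches the target, 1 otherwise."""
    return max(0, -x * y)

def error(predictions, targets):
    """Error function: the sum of per-example losses, i.e. the
    number of misclassified examples."""
    return sum(loss(x, y) for x, y in zip(predictions, targets))

# Three predictions, one of which is wrong.
print(error([1, -1, 1], [1, -1, -1]))  # -> 1
```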
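
And a quick numerical check of the RMSE point: because the square root is monotonically increasing, MSE and RMSE are minimized by the same parameter. This is a toy 1-D example of my own, fitting y = w * x over a grid of candidate w:

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.1, 3.9, 6.2])

candidates = np.linspace(0.0, 4.0, 401)          # candidate values of w
mse = np.array([np.mean((w * xs - ys) ** 2) for w in candidates])
rmse = np.sqrt(mse)                              # sqrt preserves the ordering

# Both criteria pick the same w.
assert candidates[mse.argmin()] == candidates[rmse.argmin()]
print(candidates[mse.argmin()])
```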
"the loss function max(0, -xy) is 0 if x and y are the same class and 1 if they are different"
Not exactly, because then you would be optimizing the number of correctly classified examples.
Instead, you minimize the sum of abs(WX) over the misclassified examples.
In the case of these slides, the loss function is max(0, -xy) and the error function is the sum of these. So, the error function is the number of incorrectly classified examples (if x and y are different, it adds 1 to the error), which is exactly what we hope to minimize.
The transfer function is applied only at evaluation.
In the formulas of the slides (and in the code), for training I compute the loss of an example X and its expected target as L(XW, target).
What you define is minimizing L(transfer(XW), target), which is not easily optimizable.
In the case of perceptrons, point taken -- I agree. However, my original statement still holds. The loss and error functions presented on the slides are still valid. Whether or not they are easily optimizable, they are still examples of loss and error functions.
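
To make the distinction concrete, here is a small sketch of the two formulations discussed in this thread (my own illustration, not the slides' code). On the raw score, max(0, -y * XW) equals abs(XW) on a misclassified example and is piecewise linear in W; applying the transfer function first yields the 0/1 misclassification count, which is flat almost everywhere and therefore hard to optimize:

```python
import numpy as np

def transfer(score):
    """Sign transfer function, applied only at evaluation time."""
    return 1 if score >= 0 else -1

def training_loss(W, X, target):
    """Loss on the raw score XW: equals abs(X @ W) when the example
    is misclassified, 0 otherwise. Piecewise linear in W."""
    return max(0.0, -target * (X @ W))

def zero_one_loss(W, X, target):
    """Loss on the transferred output: 0 or 1. Constant almost
    everywhere in W, so it gives no gradient signal."""
    return max(0, -target * transfer(X @ W))

W = np.array([0.5, -0.25])
X = np.array([2.0, 1.0])          # raw score XW = 0.75
print(training_loss(W, X, -1))    # 0.75 == abs(XW); shrinks as W improves
print(zero_one_loss(W, X, -1))    # 1: only says "wrong", nothing more
```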