There are functions that are differentiable but not Lipschitz, e.g. sin(1/x) on (0, ∞). In this context the Lipschitz property allows you to do things that mere differentiability doesn't, and, under the assumption that your function is differentiable, the definition given for the Lipschitz property is correct. It's true that there are also functions that are Lipschitz-continuous but not differentiable, but in this context they aren't interesting.
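To make that counterexample concrete, here's a quick numerical check (my own sketch, not from the article) that the difference quotients of sin(1/x) blow up near 0, so no Lipschitz constant K can work:

```python
import math

# f(x) = sin(1/x) is differentiable on (0, ∞), but its derivative
# -cos(1/x) / x**2 is unbounded as x -> 0+, so f is not Lipschitz there.
def f(x):
    return math.sin(1.0 / x)

# Compare points where sin(1/x) = 0 with nearby points where sin(1/x) = 1:
# the difference quotients |Δf| / |Δx| grow without bound as x -> 0+.
quotients = []
for n in range(1, 5):
    x = 1.0 / (2 * math.pi * 10**n)                # f(x) ≈ 0
    y = 1.0 / (2 * math.pi * 10**n + math.pi / 2)  # f(y) ≈ 1
    quotients.append(abs(f(x) - f(y)) / abs(x - y))

print(quotients)  # grows without bound: no single K bounds the quotients
```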
> It's true that there are also functions that are Lipschitz-continuous but not differentiable, but in this context they aren't interesting.
They absolutely are, even in the article. E.g. the max function and the 1-norm are non-differentiable but Lipschitz continuous.
There is literally no point in talking about Lipschitz continuity for differentiable functions, because it is equivalent to the derivative being bounded.
Hmm, I guess you're right. But those functions (highly relevant here!) are differentiable almost everywhere; is that enough for Lipschitz continuity to be equivalent to the derivative being bounded where it exists?
I still think there's a point in talking about Lipschitz continuity for differentiable functions because the algorithms they're talking about depend on the Lipschitz property, not the differentiability.
>because the algorithms they're talking about depend on the Lipschitz property, not the differentiability.
That is why the definition should not contain the differentiability condition. It is violated by the examples and irrelevant.
My point is that the Lipschitz property, as defined e.g. in the Wikipedia article, is the important part. Also assuming differentiability is unhelpful, because it misses the point and is even violated by the rest of the article.
The article is talking about functions whose changes over a displacement dx are not just bounded by K |dx| for some K, but specifically by the value K=1. So it's not the same thing as the Lipschitz property in analysis, although I guess it's inspired by it--but as a result it's concrete and useful for algorithms.
>The article is talking about functions whose changes over a displacement dx are not just bounded by K |dx| for some K,
It is also talking about functions which do not have a derivative at all and therefore could not possibly have that property. The point of Lipschitz functions is that they need not have a derivative.
>So it's not the same thing as the Lipschitz property in analysis, although I guess it's inspired by it--but as a result it's concrete and useful for algorithms.
That is just some arbitrary scaling. Only relevant to have the math look nicer. E.g. if f is Lipschitz for L=10, then f/10 is Lipschitz for L=1.
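A tiny sanity check of that rescaling claim (my own sketch; `lipschitz_quotient` is a hypothetical helper that just samples pairs of points):

```python
# If f is Lipschitz with constant L, then f/L is Lipschitz with constant 1:
# |f(x)/L - f(y)/L| = |f(x) - f(y)| / L <= (L * |x - y|) / L = |x - y|.
def lipschitz_quotient(f, xs):
    """Largest observed |f(x) - f(y)| / |x - y| over all sample pairs
    (a lower bound on the true Lipschitz constant)."""
    return max(abs(f(x) - f(y)) / abs(x - y)
               for x in xs for y in xs if x != y)

f = lambda x: 10 * x       # Lipschitz with L = 10
g = lambda x: f(x) / 10    # rescaled: Lipschitz with L = 1

xs = list(range(-5, 6))
print(lipschitz_quotient(f, xs), lipschitz_quotient(g, xs))  # 10.0 1.0
```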
> It is also talking about functions which do not have a derivative at all and therefore could not possibly have that property [that their changes over a displacement dx are bounded by K |dx| for some K]
Functions that do not have a derivative at all can certainly have that property. Perhaps you were confused by the notation "dx", suggesting a differential, for the displacement, rather than the conventional Δx, but the sentence makes no sense if we try to read it as a statement about differentials. Possibly your correspondent is not a Greek Keyboard Achiever. Moreover, the functions they are discussing, while indeed not differentiable, are continuous and differentiable almost everywhere, so you can also prove Lipschitz continuity if you can prove a bound on their gradients at the points where they do exist.
> That is just some arbitrary scaling. Only relevant to have the math look nicer. E.g. if f is Lipschitz for L=10, then f/10 is Lipschitz for L=1.
It's a relevant difference if you don't know the Lipschitz constant L or a bound on it, because then you don't know what to divide by.
>Functions that do not have a derivative at all can certainly have that property. Perhaps you were confused by the notation "dx", suggesting a differential, for the displacement, rather than the conventional Δx, but the sentence makes no sense if we try to read it as a statement about differentials. Possibly your correspondent is not a Greek Keyboard Achiever. Moreover, the functions they are discussing, while indeed not differentiable, are continuous and differentiable almost everywhere, so you can also prove Lipschitz continuity if you can prove a bound on their gradients at the points where they do exist.
The Lipschitz property is not a property of functions with a derivative. I really do dislike this kind of drive-by commenting, where you ignore an entire conversation and pretend someone does not understand what a gradient is.
Functions with a derivative can definitely have the Lipschitz property. The “examples” section of https://en.m.wikipedia.org/wiki/Lipschitz_continuity begins with a subsection entitled, “Lipschitz continuous functions that are everywhere differentiable”. Most of the examples given there are differentiable in varying ways. (Unqualified “differentiable” normally means “everywhere differentiable”.)
You seem really confused about something. The article is talking about SDFs that do not have that property, which it is then trying to force to have that property via a hack of sorts (f/|∇f|, apparently), because once the property holds it is very useful for making the algorithm more efficient.
The arbitrary scaling is not arbitrary if the whole algorithm is defined around K=1. The Lipschitz property in analysis is "bounded by some K". The algorithm is specifically using "bounded by K=1".
Also:
for the purpose of numerical methods, it does not matter whether a function "has a derivative" in a classical sense or not. Certainly distributions, step functions, etc. are fine. Numerical methods cannot, after all, tell whether a function is discontinuous. The article is using some known closed-form functions as examples, but it is trying to illustrate the general case, which is not going to be a closed form but just an arbitrary SDF that you constructed from somewhere, and on which it will be very useful to enforce the K=1 Lipschitz condition.
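For what it's worth, here is the kind of algorithm I take the thread to be about: a sphere-tracing sketch (my own, not the article's code) where the K=1 bound is exactly what makes f(p) a safe step size along the ray:

```python
import math

# Sphere tracing sketch: if f is an SDF-like function with Lipschitz
# constant K = 1, then f(p) is a safe step size -- the surface f = 0
# cannot be closer than f(p) in any direction from p.
def sphere_trace(f, origin, direction, max_steps=128, eps=1e-6, t_max=100.0):
    """March along origin + t*direction; return hit distance t, or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = f(p)
        if dist < eps:
            return t       # close enough to the surface: report a hit
        t += dist          # safe step: K = 1 guarantees no overshoot
        if t > t_max:
            return None    # ray escaped the scene
    return None

# Example: unit sphere at the origin -- its exact SDF has K = 1.
sdf = lambda p: math.sqrt(sum(c * c for c in p)) - 1.0
t = sphere_trace(sdf, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0))
print(t)  # ~2.0: the ray from z = -3 toward +z hits the sphere at z = -1
```

If K were larger than 1 and you stepped by f(p) anyway, the march could jump through the surface; that is why the normalization hack discussed above matters.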
I think there are probably different populations commenting here with different backgrounds, and consequently different assumptions and different blind spots. For example, for Newton's Method, it does very much matter whether the function has a derivative in the classical sense everywhere in the relevant interval; the quadratic convergence proof depends on it. Discontinuous functions can break even binary chop.
Max, min, and the L1 norm? Those have derivatives (gradients) almost everywhere. The derivatives are undefined only on a set of points of measure zero, and the functions are continuous across those gaps.
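E.g. for |x| (the 1-D case of the 1-norm), a quick check (my sketch) that the Lipschitz bound K = 1 holds across the kink at 0 even though the derivative doesn't exist there:

```python
# |x| is Lipschitz with K = 1 on all of R, even though it is not
# differentiable at 0; where the derivative exists it is +/-1, matching K.
f = abs
pairs = [(-2.0, 3.0), (-1.0, 0.0), (0.0, 0.5), (-1e-9, 1e-9)]
quotients = [abs(f(x) - f(y)) / abs(x - y) for x, y in pairs]
print(quotients)  # all <= 1, whether or not the pair straddles the kink
```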