This was my only tangible contribution to the Gimp, some twenty years ago. IIRC it was modelled after a similar feature in Paint Shop Pro, and it was easy enough an exercise for an aspiring programmer. Or so I thought: others have now spent much more time maintaining it than I spent writing it.
In hindsight it seems like a bizarre idea to have a feature whose GUI requires filling in 25 numbers. But perhaps it has use as an educational tool.
In the development version of the Gimp, the entire plugin has effectively been rewritten. Good riddance!
I've been trying to get a handle on what convolutions really are. Whichever description I read is usually so abstract as to be meaningless. But your plugin is a beautiful way to play with them and understand them in a fundamental way.
N.B. I came across it because of Chris Olah's blog post [0].
I had no idea this feature was available in Gimp, it would have helped me out a lot in debugging and developing shaders which use convolution matrices! Thanks for your hard work!
Well, you'd certainly want a bunch of predefined matrices readily available. A grid of 25 numbers is only ideal if you already know the kind of values you need.
Discrete convolution definitely isn't the dot product. Given an n-tuple a and an m-tuple b, form a polynomial p_1 from the coefficients of a and a polynomial p_2 from the coefficients of b. Then the convolution of a and b is given by the coefficients of p_1 * p_2.
e.g. a= [1,2,3] and b=[4,5,6].
p_1 = 1 + 2x + 3x^2
p_2 = 4 + 5x + 6x^2
p_1 * p_2 = 4 + 13x + 28x^2 + 27x^3 + 18x^4
Hence the convolution of a and b is [4, 13, 28, 27, 18].
Here's the full answer:
octave:1> conv([1,2,3], [4,5,6])
ans =
4 13 28 27 18
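The same answer in Python, as a minimal numpy sketch (numpy's `convolve` computes exactly this polynomial-coefficient product):

```python
import numpy as np

a = [1, 2, 3]   # coefficients of p_1 = 1 + 2x + 3x^2
b = [4, 5, 6]   # coefficients of p_2 = 4 + 5x + 6x^2

c = np.convolve(a, b)   # coefficients of p_1 * p_2
print(c)                # [ 4 13 28 27 18]
```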
The intuition behind convolution to me, for an image, is uniform blurring. If you have a square-shaped eraser and you use it to blur a pencil sketch, the sketch will appear blocky and square-shaped. Or perhaps a better analogy is painting: your wall will appear different depending on whether you use a paintbrush or a roller.
why in the world are you presenting convolution of polynomials?
i was being glib, true, but the crux of convolution (what's important for someone who expresses that they don't understand the mathematics) is elementwise multiplication, i.e. the dot product. yes, you take a filter and sweep it across your input (or vice versa, depending on whether you're really talking about convolution or correlation), and that's important, but in my opinion what's more important (obviously, since that's what /i/ expressed) is the elementwise multiplication. once you understand that (the elementwise multiplication), sweeping and padding and so on are simple to understand.
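A minimal numpy sketch of that elementwise-multiply-then-sum sweep (the function name `sweep` is made up for illustration; "valid" mode, no padding, no kernel flip, so strictly this is cross-correlation):

```python
import numpy as np

def sweep(image, kernel):
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)   # the dot-product step
    return out

img = np.arange(16.0).reshape(4, 4)
box = np.ones((3, 3))
res = sweep(img, box)
print(res)   # each entry is the sum of a 3x3 neighbourhood
```

Everything else (padding modes, stride, flipping the kernel for true convolution) is a variation on that inner `np.sum(patch * kernel)`.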
>The intuition behind convolution to me, for an image, is uniform blurring. If you have a square-shaped eraser and you use it to blur a pencil sketch, the sketch will appear blocky and square-shaped. Or perhaps a better analogy is painting: your wall will appear different depending on whether you use a paintbrush or a roller.
this is a very bad example, since you can just as easily convolve with filters that sharpen (e.g. an unsharp-mask or Laplacian-based kernel). so you're really not illustrating convolution, but convolution with an averaging filter (or something like that).
in general you shouldn't jump to criticize someone's explanation to a student unless you are prepared to give a much better and (this is important) pedagogically sound explanation because 1 you look foolish 2 you sabotage the student's understanding.
I don't care about the "student", I'm telling /you/ what convolution is. It's not the dot product.
"yes you take a filter and sweep it across your input..."
Sweeping across your input is the heart and soul of convolution. I've taken courses in image processing, control theory, probability theory, and digital signal processing, and in all these contexts convolution is always sweeping across in a very specific way. If you change a single minus sign, you get cross-correlation, which is completely different.
I don't appreciate the rude tone you're taking so I will not be replying further. I didn't take that tone when replying to your initial message.
> I don't care about the "student", I'm telling /you/ what convolution is. It's not the dot product.
Then don't comment on expositions given to students.
>Sweeping across your input is the heart and soul of convolution.
But that's, like, just your opinion, man. The definition is what the definition is: one component of it is the translation of the filter, and another component is the weighted sum. It's up to each individual to decide which is the most significant component, and each individual is free to adjust their focus according to the problem at hand. In this instance I chose to focus on the sum for the sake of pedagogy.
>If you change a single minus sign, you get cross correlation which is completely different.
No, in fact you get literally the same function, just flipped about t=0. I would not call that completely different.
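That flip relationship is easy to check numerically, since numpy exposes both operations: convolving with a kernel is the same as correlating with the kernel reversed.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])

conv = np.convolve(x, k)                   # convolution
corr = np.correlate(x, k[::-1], 'full')    # correlation with flipped kernel
print(np.allclose(conv, corr))             # True
```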
> I didn't take that tone when replying to your initial message.
Read your response again and reflect some more, because your initial message absolutely was antagonistic.
The symmetry is mostly an effect of 'natural' images not favoring a specific axis (i.e., vertical vs horizontal lines are still treated as the same 'type' of lines, in some sense).
Another way of seeing why the symmetry emerges is that mirroring the image over an axis shouldn't change the edges you observe (this is the case for the edge detection). So, mirroring the underlying image shouldn't change the operation we wish to apply to detect edges.
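A quick numpy sketch of that mirror argument, using a left-right-symmetric Laplacian-style kernel (the 3x3 `filt` helper is just an illustrative valid-mode correlation): mirroring the image and then filtering gives exactly the mirror of the filtered image, so the detected edges just move with the mirror.

```python
import numpy as np

def filt(im, k):
    # plain valid-mode cross-correlation with a 3x3 kernel
    out = np.zeros((im.shape[0] - 2, im.shape[1] - 2))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(im[y:y + 3, x:x + 3] * k)
    return out

laplacian = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])   # symmetric under left-right flip

img = np.sin(np.arange(64.0)).reshape(8, 8)   # arbitrary test image
edges = filt(img, laplacian)
edges_of_mirror = filt(np.fliplr(img), laplacian)
print(np.allclose(np.fliplr(edges), edges_of_mirror))   # True
```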
Oh man, I've been trying to find a good explanation of what a convolution matrix does! I'm literally just now working on bitmap code in LibreOffice and was delving into the convolution matrix functionality. Thanks to the author for this!
Funny that GIMP is actually doing cross-correlation instead of convolution. That's also a mess in Deep Learning frameworks where "convolution" often means "cross-correlation".
It's a pretty minor difference, really. Over functions (which is the same as over vectors/matrices, except the latter are taken at discrete points), the autocorrelation and the convolution differ by a sign. The definition of convolution is
conv(x(t), y(t))(q) = integral (x(t)·y(q-t) dt)
where x(t), y(t) are functions. Or, with sums and x, y are vectors
conv(x, y)(q) = sum(over i + j = q+1) of (x(i)·y(j))
whereas the definition of the autocorrelation is "simpler"
ac(x(t), y(t))(q) = integral (x(t)·y(t-q) dt)
In other words, it replaces y(t) with y(-t) (i.e., the autocorrelation of x(t) and y(t) is the same as the convolution of x(t) and y(-t)). A similar transformation happens for the discrete case.
Turns out, in the context of learning, these are equivalent since it just corresponds to a relabelling of the indices (which is an invertible linear transformation). In fact, even more strongly, independent of the order of the method (so long as it is >0th order!), this will cause no change in the descent direction or objective value, unlike arbitrary invertible linear transformations (for which this is only true of ≥2nd order methods).
Also, sorry, I was taking terminology from the GP. The real term for this is cross-correlation—with autocorrelation being the application of the above operation onto a single function; i.e., taking ac(f(t), g(t)) as defined above, the autocorrelation of g is really defined to be ac(g(t), g(t)).
I don't think it is a weighted average. I believe you have to divide at some point for averaging to be going on. Convolutions just add up the neighborhood based on the kernel, and output their sum.
It still gets across most of the point of how it works.
As for dividing, that's what the Divisor option is for. To quote the article:
"The result of previous calculation will be divided by this divisor. You will hardly use 1, which lets result unchanged, and 9 or 25 according to matrix size, which gives the average of pixel values."
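A tiny numpy sketch of what that Divisor option does: a 5x5 matrix of ones with divisor 25 averages the neighbourhood, while divisor 1 would just sum it (the values here are made up for illustration).

```python
import numpy as np

kernel = np.ones((5, 5))
divisor = 25.0
patch = np.full((5, 5), 80.0)   # a flat grey 5x5 neighbourhood

result = np.sum(patch * kernel) / divisor
print(result)   # 80.0 -- averaging leaves a flat region unchanged
```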
It took me a while to realise that this is just a cousin of convolution: in a convolution, as you sweep the kernel over the image, you combine pairs of pixels and kernel elements with multiplication, then summarise the results with addition; with a rolling ball, you combine the pixels and kernel elements with subtraction, and summarise the results with the maximum. The kernel used is the 'ball', but there's no real reason it has to be based on a sphere in any way.
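A sketch of that parallel in numpy: the same sweep, with the pairwise operation and the reduction swapped out. combine=multiply, reduce=sum gives correlation; combine=subtract, reduce=max gives the rolling-ball-style operator described above (the names and the 'ball' profile here are made up for illustration).

```python
import numpy as np

def sweep(image, kernel, combine, reduce):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = reduce(combine(patch, kernel))
    return out

img = np.arange(25.0).reshape(5, 5)
ball = np.array([[1.0, 2.0, 1.0],
                 [2.0, 3.0, 2.0],
                 [1.0, 2.0, 1.0]])   # a blunt 'ball' profile

blur = sweep(img, np.ones((3, 3)) / 9, np.multiply, np.sum)  # box blur
roll = sweep(img, ball, np.subtract, np.max)                 # rolling-ball style
```

Other choices of (combine, reduce) give other morphological operators, e.g. addition with minimum.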
I wonder what other image processing operators could be built in that framework?
As an occasional user, no I can't get over it. I choose the wrong action every time! It's a non-standard way of saving an image, and it's a usability fail.
Since GIMP 2.8 came out 5 years ago and you still haven't got used to it, you must have been using it very occasionally, so hopefully it didn't waste too much of your time. However it's a big win for people who use any non-trivial features of GIMP which can't be "saved" in an ordinary image format. Pretty much all professional media software uses this workflow. You can't "Save As" .mp4 in Sony Vegas, you can only save your project as .veg, a file format which only Sony Vegas can open. Otherwise all your hard work will be lost.
Or he just hasn't upgraded -- I know I haven't. But, you're right. I've mostly moved on to other tools. If I'm using GIMP to edit something, the source file is never going to be XCF. Sorry guys, nobody uses XCF and pros don't use GIMP. A big win for professional users would be CMYK support, not the inability to save non-XCF files.
Let's face it. Chasing non-existent professional users at the expense of your actual user base is pretty darn lousy UX.
There are so many misconceptions in this post, I have to wonder if this isn't some sort of an elaborate troll.
>If I'm using GIMP to edit something, the source file is never going to be XCF.
What makes you think the source file can only be XCF? You can easily open any image format with GIMP.
>Sorry guys, nobody uses XCF and pros don't use GIMP.
Wrong and wrong. XCF is necessary to save your work in GIMP, so most GIMP users use it.
>A big win for professional users would be CMYK support,
Who the hell needs CMYK support these days? Print is dead (well, it's getting there anyway).
>not the inability to save non-XCF files.
You can easily export your project to any image format that exists.
>Chasing non-existent professional users at the expense of your actual user base is pretty darn lousy UX.
The tool should be optimized for people who spend the most time using the tool. GIMP is for long edits, where saving (actually saving) your intermediate progress is a part of the workflow, whether you're a hobbyist or pro. For quick edits there is Pinta and whatever else.
> What makes you think the source file can only be XCF? You can easily open any image format with GIMP.
Nothing. But if I'm opening a not-XCF I probably don't want to SAVE it as an XCF.
> Wrong and wrong. XCF is necessary to save your work in GIMP, so most GIMP users use it.
So basically the only reason anyone uses XCF is that GIMP forces them to? That's some sound design logic there. Used to be you could save your work in not-XCF.
> Who the hell needs CMYK support these days? Print is dead (well, it's getting there anyway).
The professional users GIMP is chasing after.
> You can easily export your project to any image format that exists.
Would be easier to save.
> The tool should be optimized for people who spend the most time using the tool.
So chasing after the non-existent professional GIMP users would be a bad optimization then? Ok.
> GIMP is for long edits, where saving (actually saving) your intermediate progress is a part of the workflow, whether you're a hobbyist or pro.
You know, I'd love to find an open source alternative to Lightroom and Photoshop. Turns out things like darktable and GIMP are far more archaic. That's the reason I still buy and use the Adobe tools. Dicking about with the one workflow where GIMP actually works is a poor move and doesn't do anything beneficial for more involved workflows.
I thought it was interesting the same technique (convolution) was used in two different applications: photoshop/gimp for image filters and in convolutional neural networks (CNNs). What is the purpose of convolution in CNNs?
the setosa interactive says convolution is "a technique for determining the most important portions of an image"
the hubel and wiesel experiment showed "there is a topographical map in the visual cortex that represents the visual field, where nearby cells process information from nearby visual fields. Moreover, their work determined that neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed"
I'm interested in David Marr's work and am about to order Vision: A Computational Investigation into the Human Representation and Processing of Visual Information; his levels-of-analysis approach is interesting. I think he did work on edge detection, which must have involved some kind of convolution/filtering.
This setosa interactive is great, and actually references the gimp documentation
An image kernel is a small matrix used to apply effects like the ones you might find in Photoshop or Gimp, such as blurring, sharpening, outlining or embossing. They're also used in machine learning for 'feature extraction', a technique for determining the most important portions of an image. In this context the process is referred to more generally as "convolution"
The classic experiments by Hubel and Wiesel are fundamental to our understanding of how neurons along the visual pathway extract increasingly complex information from the pattern of light cast on the retina to construct an image. For one, they showed that there is a topographical map in the visual cortex that represents the visual field, where nearby cells process information from nearby visual fields. Moreover, their work determined that neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed.
Computer vision as analysis by synthesis has a long tradition and remains central to a wide class of generative methods. In this top-down approach, vision is formulated as the search for parameters of a model that is rendered to produce an image (or features of an image), which is then compared with image pixels (or features). The model can take many forms of varying realism but, when the model and rendering process are designed to produce realistic images, this process is often called inverse graphics. In a sense, the approach tries to reverse-engineer the physical process that produced an image of the world.
> I thought it was interesting the same technique (convolution) was used in two different applications: photoshop/gimp for image filters and in convolutional neural networks (CNNs). What is the purpose of convolution in CNNs?
IMO, convolution in CNNs would be better described as correlation. In CNNs the filters are convolved (correlated) with the image, i.e. swept over it and multiplied at each spot with the results accumulated, in order to find where in the image each filter fits. The output is a map of how strongly each filter (there are usually multiple in a convolution layer; that is the third dimension of the layer) fits with the image. These feature maps become the activations passed on to the next layers.
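A rough numpy sketch of that description (no learning, no bias or nonlinearity; the function name `conv_layer` is made up for illustration): each filter is swept over the image as a cross-correlation, and the per-filter response maps are stacked along a third dimension, one map per filter.

```python
import numpy as np

def conv_layer(image, filters):   # filters: shape (n, kh, kw)
    n, kh, kw = filters.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    maps = np.zeros((n, oh, ow))
    for f in range(n):
        for y in range(oh):
            for x in range(ow):
                patch = image[y:y + kh, x:x + kw]
                maps[f, y, x] = np.sum(patch * filters[f])
    return maps   # one response map per filter: the third dimension

img = np.arange(36.0).reshape(6, 6)
filters = np.stack([
    np.ones((3, 3)) / 9,                                   # box blur
    np.array([[1.0, 0, -1], [2, 0, -2], [1, 0, -1]]),      # Sobel-style
])
out = conv_layer(img, filters)
print(out.shape)   # (2, 4, 4)
```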
As an electrical engineer by training, who used convolution all the time, I didn't understand how the two were related, especially because of the third dimension.
English is likely not the first language of the volunteer who wrote these docs. If you want better documentation, I encourage you to write it yourself.
I assumed as much and am not belittling the skills of a non-native speaker. I was just sort of surprised that a project that has been around so long hasn't had better attention paid to its docs (my biggest FOSS experience is with PostgreSQL, which is a bit of an outlier in just how good its docs are).