This was my only tangible contribution to the Gimp, some twenty years ago. IIRC it was modelled after a similar feature in Paint Shop Pro, and it was easy enough an exercise for an aspiring programmer. Or so I thought: others have now spent much more time maintaining it than I spent writing it.
In hindsight it seems like a bizarre idea to have a feature whose GUI requires filling in 25 numbers. But perhaps it has use as an educational tool.
In the development version of the Gimp, the entire plugin has effectively been rewritten. Good riddance!
I've been trying to get a handle on what convolutions really are. Whichever description I read is usually so abstract as to be meaningless. But your plugin is a beautiful way to play with them and understand them in a fundamental way.
N.B. I came across it because of Chris Olah's blog post [0].
I had no idea this feature was available in Gimp, it would have helped me out a lot in debugging and developing shaders which use convolution matrices! Thanks for your hard work!
Well, you'd certainly want a bunch of predefined matrices readily available. A grid of 25 numbers is only ideal if you already know the kind of values you need.
Discrete convolution definitely isn't the dot product. Given an n-tuple a and an m-tuple b, form a polynomial p_1 from the coefficients of a and a polynomial p_2 from the coefficients of b. Then the convolution of a and b is given by the coefficients of p_1 * p_2.
e.g. a= [1,2,3] and b=[4,5,6].
p_1 = 1 + 2x + 3x^2
p_2 = 4 + 5x + 6x^2
p_1 * p_2 = 4 + 13x + 28x^2 + 27x^3 + 18x^4
Hence the convolution of a and b is [4, 13, 28, 27, 18].
Here's the full answer:
octave:1> conv([1,2,3], [4,5,6])
ans =
4 13 28 27 18
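The same answer in Python, as a minimal numpy sketch (numpy's `convolve` computes exactly this polynomial-coefficient product):

```python
import numpy as np

a = [1, 2, 3]   # coefficients of p_1 = 1 + 2x + 3x^2
b = [4, 5, 6]   # coefficients of p_2 = 4 + 5x + 6x^2

c = np.convolve(a, b)   # coefficients of p_1 * p_2
print(c)                # [ 4 13 28 27 18]
```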
The intuition behind convolution to me, for an image, is uniform blurring. If you have a square-shaped eraser and you use it to blur a pencil sketch, the sketch will appear blocky and square-shaped. Or perhaps a better analogy is painting: your wall will appear different depending on whether you use a paintbrush or a roller.
why in the world are you presenting convolution of polynomials?
i was being glib, true, but the crux of convolution (what's important for someone who expresses that they don't understand the mathematics) is elementwise multiplication, i.e. the dot product. yes, you take a filter and sweep it across your input (or vice versa, depending on whether you're really talking about convolution or correlation), and that's important, but in my opinion what's more important (obviously, since that's what /i/ expressed) is the elementwise multiplication. once you understand that (the elementwise multiplication), sweeping and padding and so on are simple to understand.
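A minimal numpy sketch of that elementwise-multiply-then-sum sweep (the function name `sweep` is made up for illustration; "valid" mode, no padding, no kernel flip, so strictly this is cross-correlation):

```python
import numpy as np

def sweep(image, kernel):
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)   # the dot-product step
    return out

img = np.arange(16.0).reshape(4, 4)
box = np.ones((3, 3))
res = sweep(img, box)
print(res)   # each entry is the sum of a 3x3 neighbourhood
```

Everything else (padding modes, stride, flipping the kernel for true convolution) is a variation on that inner `np.sum(patch * kernel)`.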
>The intuition behind convolution to me, for an image, is uniform blurring. If you have a square-shaped eraser and you use it to blur a pencil sketch, the sketch will appear blocky and square-shaped. Or perhaps a better analogy is painting: your wall will appear different depending on whether you use a paintbrush or a roller.
this is a very bad example, since you can just as easily convolve with filters that sharpen (e.g. an unsharp-mask or Laplacian-based kernel). so you're really not illustrating convolution, but convolution with an averaging filter (or something like that).
in general you shouldn't jump to criticize someone's explanation to a student unless you are prepared to give a much better and (this is important) pedagogically sound explanation because 1 you look foolish 2 you sabotage the student's understanding.
I don't care about the "student", I'm telling /you/ what convolution is. It's not the dot product.
"yes you take a filter and sweep it across your input..."
Sweeping across your input is the heart and soul of convolution. I've taken courses in image processing, control theory, probability theory, and digital signal processing, and in all these contexts convolution is always sweeping across in a very specific way. If you change a single minus sign, you get cross-correlation, which is completely different.
I don't appreciate the rude tone you're taking so I will not be replying further. I didn't take that tone when replying to your initial message.
> I don't care about the "student", I'm telling /you/ what convolution is. It's not the dot product.
Then don't comment on expositions given to students.
>Sweeping across your input is the heart and soul of convolution.
But that's, like, just your opinion, man. The definition is what the definition is: one component of it is the translation of the filter, and another component is the weighted sum. It's up to each individual to decide which is the most significant component, and each individual is free to adjust their focus according to the problem at hand. In this instance I chose to focus on the sum for the sake of pedagogy.
>If you change a single minus sign, you get cross correlation which is completely different.
No, in fact you get literally the same function, just flipped about t=0. I would not call that completely different.
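That flip relationship is easy to check numerically, since numpy exposes both operations: convolving with a kernel is the same as correlating with the kernel reversed.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.0, -1.0])

conv = np.convolve(x, k)                   # convolution
corr = np.correlate(x, k[::-1], 'full')    # correlation with flipped kernel
print(np.allclose(conv, corr))             # True
```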
> I didn't take that tone when replying to your initial message.
Read your response again and reflect some more, because your initial message absolutely was antagonistic.
The symmetry is mostly an effect of 'natural' images not favoring a specific axis (i.e., vertical vs horizontal lines are still treated as the same 'type' of lines, in some sense).
Another way of seeing why the symmetry emerges is that mirroring the image over an axis shouldn't change the edges you observe (this is the case for the edge detection). So, mirroring the underlying image shouldn't change the operation we wish to apply to detect edges.
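A quick numpy sketch of that mirror argument, using a left-right-symmetric Laplacian-style kernel (the 3x3 `filt` helper is just an illustrative valid-mode correlation): mirroring the image and then filtering gives exactly the mirror of the filtered image, so the detected edges just move with the mirror.

```python
import numpy as np

def filt(im, k):
    # plain valid-mode cross-correlation with a 3x3 kernel
    out = np.zeros((im.shape[0] - 2, im.shape[1] - 2))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(im[y:y + 3, x:x + 3] * k)
    return out

laplacian = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])   # symmetric under left-right flip

img = np.sin(np.arange(64.0)).reshape(8, 8)   # arbitrary test image
edges = filt(img, laplacian)
edges_of_mirror = filt(np.fliplr(img), laplacian)
print(np.allclose(np.fliplr(edges), edges_of_mirror))   # True
```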
Oh man, I've been trying to find a good explanation of what a convolution matrix does! I'm literally just now working on bitmap code in LibreOffice and was delving into the convolution matrix functionality. Thanks to the author for this!
Funny that GIMP is actually doing cross-correlation instead of convolution. That's also a mess in Deep Learning frameworks where "convolution" often means "cross-correlation".
It's a pretty minor difference, really. Over functions (which is the same as over vectors/matrices, except the latter are taken at discrete points), the autocorrelation and the convolution differ by a sign. The definition of convolution is
conv(x(t), y(t))(q) = integral (x(t)·y(q-t) dt)
where x(t), y(t) are functions. Or, with sums and x, y are vectors
conv(x, y)(q) = sum(over i + j = q+1) of (x(i)·y(j))
whereas the definition of the autocorrelation is "simpler"
ac(x(t), y(t))(q) = integral (x(t)·y(t-q) dt)
In other words, it replaces y(t) with y(-t) (i.e., the autocorrelation of x(t) and y(t) is the same as the convolution of x(t) and y(-t)). A similar transformation happens for the discrete case.
Turns out, in the context of learning, these are equivalent since it just corresponds to a relabelling of the indices (which is an invertible linear transformation). In fact, even more strongly, independent of the order of the method (so long as it is >0th order!), this will cause no change in the descent direction or objective value, unlike arbitrary invertible linear transformations (for which this is only true of ≥2nd order methods).
Also, sorry, I was taking terminology from the GP. The real term for this is cross-correlation—with autocorrelation being the application of the above operation onto a single function; i.e., taking ac(f(t), g(t)) as defined above, the autocorrelation of g is really defined to be ac(g(t), g(t)).
I don't think it is a weighted average. I believe you have to divide at some point for averaging to be going on. Convolutions just add up the neighborhood based on the kernel, and output their sum.
It still gets across most of the point of how it works.
As for dividing, that's what the Divisor option is for. To quote the article:
"The result of previous calculation will be divided by this divisor. You will hardly use 1, which lets result unchanged, and 9 or 25 according to matrix size, which gives the average of pixel values."
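A tiny numpy sketch of what that Divisor option does: a 5x5 matrix of ones with divisor 25 averages the neighbourhood, while divisor 1 would just sum it (the values here are made up for illustration).

```python
import numpy as np

kernel = np.ones((5, 5))
divisor = 25.0
patch = np.full((5, 5), 80.0)   # a flat grey 5x5 neighbourhood

result = np.sum(patch * kernel) / divisor
print(result)   # 80.0 -- averaging leaves a flat region unchanged
```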
It took me a while to realise that this is just a cousin of convolution: in a convolution, as you sweep the kernel over the image, you combine pairs of pixels and kernel elements with multiplication, then summarise the results with addition; with a rolling ball, you combine the pixels and kernel elements with subtraction, and summarise the results with the maximum. The kernel used is the 'ball', but there's no real reason it has to be based on a sphere in any way.
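A sketch of that parallel in numpy: the same sweep, with the pairwise operation and the reduction swapped out. combine=multiply, reduce=sum gives correlation; combine=subtract, reduce=max gives the rolling-ball-style operator described above (the names and the 'ball' profile here are made up for illustration).

```python
import numpy as np

def sweep(image, kernel, combine, reduce):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = reduce(combine(patch, kernel))
    return out

img = np.arange(25.0).reshape(5, 5)
ball = np.array([[1.0, 2.0, 1.0],
                 [2.0, 3.0, 2.0],
                 [1.0, 2.0, 1.0]])   # a blunt 'ball' profile

blur = sweep(img, np.ones((3, 3)) / 9, np.multiply, np.sum)  # box blur
roll = sweep(img, ball, np.subtract, np.max)                 # rolling-ball style
```

Other choices of (combine, reduce) give other morphological operators, e.g. addition with minimum.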
I wonder what other image processing operators could be built in that framework?
As an occasional user, no I can't get over it. I choose the wrong action every time! It's a non-standard way of saving an image, and it's a usability fail.
Since GIMP 2.8 came out 5 years ago and you still haven't got used to it, you must have been using it very occasionally, so hopefully it didn't waste too much of your time. However it's a big win for people who use any non-trivial features of GIMP which can't be "saved" in an ordinary image format. Pretty much all professional media software uses this workflow. You can't "Save As" .mp4 in Sony Vegas, you can only save your project as .veg, a file format which only Sony Vegas can open. Otherwise all your hard work will be lost.
Or he just hasn't upgraded -- I know I haven't. But, you're right. I've mostly moved on to other tools. If I'm using GIMP to edit something, the source file is never going to be XCF. Sorry guys, nobody uses XCF and pros don't use GIMP. A big win for professional users would be CMYK support, not the inability to save non-XCF files.
Let's face it. Chasing non-existent professional users at the expense of your actual user base is pretty darn lousy UX.
There are so many misconceptions in this post, I have to wonder if this isn't some sort of an elaborate troll.
>If I'm using GIMP to edit something, the source file is never going to be XCF.
What makes you think the source file can only be XCF? You can easily open any image format with GIMP.
>Sorry guys, nobody uses XCF and pros don't use GIMP.
Wrong and wrong. XCF is necessary to save your work in GIMP, so most GIMP users use it.
>A big win for professional users would be CMYK support,
Who the hell needs CMYK support these days? Print is dead (well, it's getting there anyway).
>not the inability to save non-XCF files.
You can easily export your project to any image format that exists.
>Chasing non-existent professional users at the expense of your actual user base is pretty darn lousy UX.
The tool should be optimized for people who spend the most time using the tool. GIMP is for long edits, where saving (actually saving) your intermediate progress is a part of the workflow, whether you're a hobbyist or pro. For quick edits there is Pinta and whatever else.
> What makes you think the source file can only be XCF? You can easily open any image format with GIMP.
Nothing. But if I'm opening a not-XCF I probably don't want to SAVE it as an XCF.
> Wrong and wrong. XCF is necessary to save your work in GIMP, so most GIMP users use it.
So basically the only reason anyone uses XCF is that GIMP forces them to? That's some sound design logic there. Used to be you could save your work in not-XCF.
> Who the hell needs CMYK support these days? Print is dead (well, it's getting there anyway).
The professional users GIMP is chasing after.
> You can easily export your project to any image format that exists.
Would be easier to save.
> The tool should be optimized for people who spend the most time using the tool.
So chasing after the non-existent professional GIMP users would be a bad optimization then? Ok.
> GIMP is for long edits, where saving (actually saving) your intermediate progress is a part of the workflow, whether you're a hobbyist or pro.
You know, I'd love to find an open source alternative to Lightroom and Photoshop. Turns out things like darktable and GIMP are far more archaic. That's the reason I still buy and use the Adobe tools. Dicking about with the one workflow where GIMP actually works is a poor move and doesn't do anything beneficial for more involved workflows.
I thought it was interesting the same technique (convolution) was used in two different applications: photoshop/gimp for image filters and in convolutional neural networks (CNNs). What is the purpose of convolution in CNNs?
the setosa interactive says convolution is "a technique for determining the most important portions of an image"
the hubel and wiesel experiment showed "there is a topographical map in the visual cortex that represents the visual field, where nearby cells process information from nearby visual fields. Moreover, their work determined that neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed"
I'm interested in David Marr's work and am about to order Vision: A Computational Investigation into the Human Representation and Processing of Visual Information; his levels-of-analysis approach is interesting. I think he did work on edge detection, which must have involved some kind of convolution/filtering.
This setosa interactive is great, and actually references the gimp documentation
An image kernel is a small matrix used to apply effects like the ones you might find in Photoshop or Gimp, such as blurring, sharpening, outlining or embossing. They're also used in machine learning for 'feature extraction', a technique for determining the most important portions of an image. In this context the process is referred to more generally as "convolution"
The classic experiments by Hubel and Wiesel are fundamental to our understanding of how neurons along the visual pathway extract increasingly complex information from the pattern of light cast on the retina to construct an image. For one, they showed that there is a topographical map in the visual cortex that represents the visual field, where nearby cells process information from nearby visual fields. Moreover, their work determined that neurons in the visual cortex are arranged in a precise architecture. Cells with similar functions are organized into columns, tiny computational machines that relay information to a higher region of the brain, where a visual image is formed.
Computer vision as analysis by synthesis has a long tradition and remains central to a wide class of generative methods. In this top-down approach, vision is formulated as the search for parameters of a model that is rendered to produce an image (or features of an image), which is then compared with image pixels (or features). The model can take many forms of varying realism but, when the model and rendering process are designed to produce realistic images, this process is often called inverse graphics. In a sense, the approach tries to reverse-engineer the physical process that produced an image of the world.
> I thought it was interesting the same technique (convolution) was used in two different applications: photoshop/gimp for image filters and in convolutional neural networks (CNNs). What is the purpose of convolution in CNNs?
IMO, convolution in CNNs would be better described as correlation. In CNNs the filters are convolved (correlated) with the image, i.e. swept over it and multiplied at each spot with the results accumulated, in order to find where in the image each filter fits. The output is a map of how strongly each filter (there are usually multiple in a convolution layer; that is the third dimension of the layer) fits with the image. These feature maps become the activations passed on to the next layers.
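A rough numpy sketch of that description (no learning, no bias or nonlinearity; the function name `conv_layer` is made up for illustration): each filter is swept over the image as a cross-correlation, and the per-filter response maps are stacked along a third dimension, one map per filter.

```python
import numpy as np

def conv_layer(image, filters):   # filters: shape (n, kh, kw)
    n, kh, kw = filters.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    maps = np.zeros((n, oh, ow))
    for f in range(n):
        for y in range(oh):
            for x in range(ow):
                patch = image[y:y + kh, x:x + kw]
                maps[f, y, x] = np.sum(patch * filters[f])
    return maps   # one response map per filter: the third dimension

img = np.arange(36.0).reshape(6, 6)
filters = np.stack([
    np.ones((3, 3)) / 9,                                   # box blur
    np.array([[1.0, 0, -1], [2, 0, -2], [1, 0, -1]]),      # Sobel-style
])
out = conv_layer(img, filters)
print(out.shape)   # (2, 4, 4)
```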
As an electrical engineer by training, who used convolution all the time, I didn't understand how the two were related, especially because of the third dimension.
English is likely not the first language of the volunteer who wrote these docs. If you want better documentation, I encourage you to write it yourself.
I assumed as much and am not belittling the skills of a non-native speaker. I was just sort of surprised that a project that has been around so long hasn't had better attention paid to its docs (my biggest FOSS experience is with PostgreSQL, which is a bit of an outlier in just how good its docs are).