Matlab’s internal memory representation (2012)

chrisBob · on Aug 22, 2015

I recently learned about copy-on-write the hard way: we had a buggy DLL from a hardware vendor that was editing our input, so the obvious solution was to keep a copy of the data we sent it. To our surprise, even the copy we made changed unless we made it with an assignment like A = 1.0*X;
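To make the failure mode concrete, here is a rough toy model in Python. Everything in it (`Buffer`, `CowArray`, `lazy_copy`, `scaled`, `buggy_native_write`) is a hypothetical name made up for illustration; it is a sketch of the general copy-on-write idea, not a description of how Matlab actually stores arrays.

```python
class Buffer:
    """Shared storage plus a count of how many arrays point at it."""

    def __init__(self, data):
        self.data = list(data)
        self.owners = 1


class CowArray:
    """Toy copy-on-write array (hypothetical model, not Matlab's internals)."""

    def __init__(self, data):
        self._buf = Buffer(data)

    def lazy_copy(self):
        """Model of `A = X`: share the buffer, copy nothing yet."""
        other = CowArray.__new__(CowArray)
        other._buf = self._buf
        self._buf.owners += 1
        return other

    def __setitem__(self, i, value):
        """Normal write: take a private copy first if the buffer is shared."""
        if self._buf.owners > 1:
            self._buf.owners -= 1
            self._buf = Buffer(self._buf.data)
        self._buf.data[i] = value

    def scaled(self, factor):
        """Model of `A = 1.0*X`: arithmetic produces a brand-new buffer."""
        return CowArray(factor * v for v in self._buf.data)

    def tolist(self):
        return list(self._buf.data)


def buggy_native_write(arr, i, value):
    """Stand-in for the vendor DLL: writes to the raw buffer, skipping the sharing check."""
    arr._buf.data[i] = value


X = CowArray([1.0, 2.0, 3.0])

Y = X.lazy_copy()
Y[1] = -1.0                  # a normal write unshares first, so X is untouched
print(X.tolist())            # [1.0, 2.0, 3.0]

backup = X.lazy_copy()       # like `A = X`: still the same buffer as X
safe = X.scaled(1.0)         # like `A = 1.0*X`: a real, separate buffer
buggy_native_write(X, 0, 99.0)

print(X.tolist())            # [99.0, 2.0, 3.0]
print(backup.tolist())       # [99.0, 2.0, 3.0]  the "copy" changed too
print(safe.tolist())         # [1.0, 2.0, 3.0]
```

In this model the lazy copy is only safe as long as every writer goes through the sharing check, which is exactly what the native code did not do.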

On a related note, the tech support at MATLAB is very responsive. Every time I have had an issue they get back to me quickly, and I feel like they follow up and really fix bugs.

imurray · on Aug 22, 2015

A different (probably slightly cheaper) hack to get a safe copy of an array is:

    A = X; A(1) = A(1);

Then one can evilly edit A in-place in a mex file without accidentally editing other arrays like X. However, the docs tell us never to edit arrays from input arguments in place, so a future version of Matlab could be cleverer and still break things. I wish their mex API would provide a supported way to edit things in-place. Native Matlab code does allow in-place editing:

    A = fun(A);

If the function fun is defined that way too (with A as both its input and its output variable), A can be altered in-place by native Matlab code, but not (officially) by mex functions.

nimrody · on Aug 22, 2015

What Matlab function edits its inputs in-place? I cannot think of any.

Actually Matlab's pass by value is one of its best properties (as a language). This is how functions work in math and is much easier to reason about.

I really dislike Julia's pass by reference semantics. Even if it enables better performance.

imurray · on Aug 22, 2015

Here's an example:

    A = plus_diag(A, x);  % Using [1] below

This adds a vector x onto the diagonal elements of a huge matrix A. In old versions of Matlab, 2*huge memory would temporarily be allocated. Now the memory usage doesn't increase: A is altered in place. This optimization happens automatically when Matlab notices it can be done. The code has to be in a function though, not run from the command line or a script.

[1] http://homepages.inf.ed.ac.uk/imurray2/code/imurray-matlab/p...

nimrody · on Aug 22, 2015

The fact that A is updated in-place is an optimization that happens only if you pass a variable and assign the output to the same variable.

It is not a characteristic of the function `plus_diag`. The function itself is completely pure.
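As a rough illustration of what "pure" means here, this is a Python stand-in. The name `plus_diag` is borrowed from the comment above, but the body is my own guess at the behavior described, not the linked Matlab code, and Python will not optimize the copy away the way Matlab is said to.

```python
def plus_diag(A, x):
    """Return a new matrix equal to A with x added along the diagonal; A itself is not modified."""
    result = [row[:] for row in A]   # copy every row so the caller's data is untouched
    for i, xi in enumerate(x):
        result[i][i] += xi
    return result


A = [[1, 2], [3, 4]]
original = A                 # keep a second name for the old matrix
A = plus_diag(A, [10, 20])   # same calling pattern as A = plus_diag(A, x)

print(A)         # [[11, 2], [3, 24]]
print(original)  # [[1, 2], [3, 4]]
```

The function never touches its argument; whether the old storage gets reused is up to the caller's side of the assignment, which is the point being made about Matlab.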

imurray · on Aug 23, 2015

I'm well aware of how and when it works. It is the recognized idiom to perform in-place operations in practice in Matlab code [1].

It would be nice to take advantage of the same idea in mex code where possible. For example, an API function that indicates whether or not it is safe to overwrite an input array and return it as the corresponding output. I don't think MathWorks currently provides a documented and future-safe means to do so, however. (If one controls all the code, updating things in-place is possible in practice if careful, using the A(1)=A(1); trick I gave in a parent post. Such code is just not guaranteed to be future-proof.)

[1] http://blogs.mathworks.com/loren/2007/03/22/in-place-operati...

dr_zoidberg · on Aug 22, 2015

This reminds me of the "copy list" expression in Python:

    In [1]: a = [1, 2, 3]
    In [2]: b = a
    In [3]: a is b
    Out[3]: True
    In [4]: b = a[:]
    In [5]: a is b
    Out[5]: False

Usually this kind of "optimization" is done to avoid copying large amounts of data in memory, but as you said, unless it's widespread knowledge or explicitly stated, it can lead to some confusing situations/bugs.

animefan · on Aug 22, 2015

In almost all cases knowledge of copy-on-write doesn't have to be widespread because it is equivalent to value semantics.

It is only because they were using a buggy DLL (which, for non-Matlab users, is custom native code that Matlab calls through an FFI) that the expected semantics of Matlab were violated.

maxerickson · on Aug 22, 2015

That's just how Python works though, not an optimization. Python has names for objects more than it has variables: = binds a name to an object.
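A small sketch of the difference between binding a name and copying, using plain lists (the variable names are only for illustration):

```python
a = [1, 2, 3]
b = a            # binds a second name to the same list object
b.append(4)
print(a)         # [1, 2, 3, 4]  the change is visible through both names

c = a[:]         # slicing builds a new list (a shallow copy)
c.append(5)
print(a)         # [1, 2, 3, 4]
print(c)         # [1, 2, 3, 4, 5]

a = [0]          # rebinding the name a does not touch the object b refers to
print(b)         # [1, 2, 3, 4]
```

Nothing is deferred or copied lazily here: `b = a` never copies at all, which is why it differs from Matlab's copy-on-write assignment.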