Object Theft Etiquette in C++: methods with a side of &&

wallstop · on July 5, 2015

This post horrified me. This may be due to two factors: (1) having not written any serious C++ in the past year and (2) a focus on "correct" or "easily verifiable" code.

I can understand the urge to want to have the speed and power of move, but it feels like you're destructively working around the standard. The proposed API in SetStringer seems very dangerous; you're passing back a const ref to a string that isn't really const at all (successive calls after the set has been updated will mutate it's state, users may be unaware of this).

Move, copy, and (const) references each have their place and should be used accordingly. Move when you want a transfer of ownership, copy when you want duplication (at this instance) of ownership to another owner, ref when you want shared ownership. The intent of the article seems to actively go against these intents; stinos' comment sums up the rest of my feelings perfectly.

SamReidHughes · on July 5, 2015

> The proposed API in SetStringer seems very dangerous; you're passing back a const ref to a string that isn't really const at all (successive calls after the set has been updated will mutate it's state, users may be unaware of this).

An accessor returning a const reference is perfectly normal. Something being const in C++ never meant that other things can't modify it, and the risk you describe has nothing to do with it being a const reference, as opposed to the non-const kind. For example, one given by the article, in fact, you get the same situation with an even bigger risk profile with the expression v[i], where v is a std::vector<std::string>.

> ref when you want shared ownership.

Not at all. If you have a reference to something, you don't own it.

detrino · on July 5, 2015

You misunderstand the point of reference-to-const in C++. It is just a const view of an object, it makes no promises about the object itself. For functions returning a reference-to-const, the value of the contract is for the function, not the caller. For functions accepting reference-to-const parameters, the value of the contract is for the caller, not the function.

lultimouomo · on July 5, 2015

Your horror is completely justified: this is not correct C++. Accessing an object after it has been move()d it squarely undefined behaviour

Matheus28 · on July 5, 2015

It is not undefined behaviour. The object is left in a valid, but unspecified state. If you .clear() it or whatever (in case of a vector), you'll have a perfectly good object to use.

lultimouomo · on July 5, 2015

You're right; s/undefined/unspecified/.

The comment on the article still stands though. You're still invoking unspecified behaviour, and this is a horrible practice - it forces you to take not on which classes you are sure of the actual behaviour when accessing after move, and you're bound to slip.

(Note that the article itself uncorrectly says that move leaves string in an invalid state, which is what threw me off track)

quicknir · on July 5, 2015

Actually, that's not true. Here's what I wrote:

"By stealing the string, we put foo in an invalid state".

foo is an object of the SetStringer class. It puts it an in invalid state because it no longer maintains its own invariants.

The whole point of this article is about how you can move member variables for efficiency, while ensuring the owning object is left in a valid state (as move/rvalue semantics require).

Wanting to be able to move a member variable is no stranger than wanting to move the object itself; in that sense an rvalue ref overloaded getter is no stranger than move construction/assignment, it's just more rare because it's less used.

wallstop · on July 5, 2015

That's my main point of disagreement: wanting to move a member variable is more strange than moving the object itself. Consider your use case: you have some cached string that you may or may not want to move out. If you move it, the next time you access "it", the string needs to be re-computed. This means the string is going to re-malloc the memory that it had before it was moved. I don't understand what "efficiency" the outlined code gives you.

While I am not a C++ day-developer nor guru, from what I have seen and written of modern C++, the "move member variables" is not "good", in any sense of the word, stemming from the reasons put forth by many a comment.

If you ever want to move a member variable, I would highly recommend that you re-think your design. Ask questions such as "why do I want this?" and "is it really more efficient?". In some cases, where perf is really that important, then yea, go for it. However, just because something can be done, does not mean that it should be done.

While I still disagree with your const & return of the original string, I threw together an example showing the same exact API you proposed, but without moving member variables:

https://gist.github.com/wallstop/4d80fba9b1d15da64488

Depending on method usage, this may or may not invoke the cost of an extra string creation (+ extra on move method, - for continual access of cached string)

SamReidHughes · on July 5, 2015

His example is perfectly fine. The example that puts the object in an invalid state is an example of what not to do. The example of what you should do, which is to make the method marked "&&" so it can only be used on rvalue reference, i.e. an object that's going to be discarded anyway, makes a method that leaves the object in a valid state and one that conforms perfectly to expectations. It even politely cleans up m_set.

Generally speaking, there's nothing wrong at all with moving from a member variable. You can see an example of where it's perfectly all right in this article.

> you have some cached string that you may or may not want to move out. If you move it, the next time you access "it", the string needs to be re-computed. This means the string is going to re-malloc the memory that it had before it was moved. I don't understand what "efficiency" the outlined code gives you.

The method in question can only be called on an rvalue reference, which means the object, in its present state, is not going to be used again in the future anyway, so the problem of reallocating the string is moot.

> https://gist.github.com/wallstop/4d80fba9b1d15da64488

Your code has a use-after-free bug, because your std::string&& str() && function returns a local variable by reference. Your 'std::string&& str() &&' function doesn't even destructively modify the object, so there's no reason for it to be marked '&&' in the first place, and there's no reason for it to exist at all, because its performance is worse than the 'const std::string & str() &' version.

quicknir · on July 6, 2015

So, in retrospect, my example was not a good one, I'm afraid. the recomputing of the string was a red herring. All that matters is that the moved-from object was in a valid state. You can look at the updated post, maybe it will help a bit.

If you want a concrete example, consider std::stringstream. You agree this class is useful, right? In many cases, you build up a string, bit by bit. You then want to extract the string at the end. Because stringstream was written before move semantics, that string is returned by value. That means a full copy of the internal string buffer is made.

In 95% of real life use cases of stringstream, the stringstream is a local variable used to build up a string and then discarded. So copying that string is fundamentally a waste. But exposing the string buffer in a way that std::move could be fruitfully called on it is dangerous, it may break stringstream's class invariants and turn it into a toxic object, this is avoided in C++. So the preferred solution would be to give stringstream::str an rvalue overload, so that the stringstream can safely release its string buffer. Does that make sense?

Of course I am aware that this code involves extra work, is harder for less experienced team members to understand, etc. I wouldn't do it in the vast majority of situations. But sometimes it is useful. The goal was to try to help people understand &&/& member overloads, as there isn't much discussion of them.

Respectfully, I am a professional c++ developer. Odds are that the code I write is under greater pressure for performance and genericity. Maybe a bit of benefit of the doubt is in order.

bodyfour · on July 5, 2015

I think concentrating on the move part is the wrong way to look at rvalue-ref methods. The advantage is that if the compiler knows an object is a temporary it you can safely let the caller safely "steal" a member object.

  #include <string>
  #include <stdio.h>

  struct A {
    const std::string& str() const & { return m_str; }
    std::string&& str() && { return std::move(m_str); }

    std::string m_str;
  };

  void func(const std::string& s) { puts("Got const value"); }
  void func(std::string&& s) { puts("Got rvalue"); }

  A returns_a() { A a; return a; }

  int main()
  {
    func(returns_a().str());
    A b;
    func(b.str());
    return 0;
  }

detrino · on July 5, 2015

The standard defines preconditions when talking about what operations on valid on its containers. std::string::clear() has no preconditions so it is valid to call on a moved from object and then you can be sure of its state.

stinos · on July 5, 2015

Calling moving objects 'stealing', calling for 'dishonest citizens' etc gives moving a bad connotation which it hardly deserves at all, especially not in correct code. The main reason is that the object which is moved from is usually not going to be used anymore anyway so it doesn't matter what happens to it's content. That is just how it is meant to be used by the folks designing it: transferring content. Calling that stealing would be like calling the righteous transport of goods from one warehouse to another warehouse stealing: it is not, all involved parties know that after the transaction the first warehouse will be empty and the latter is full. The premise set in this article (first move, then use the object again and expect it to have content) goes against this. Which is actually fine, unless of course you expect that moving somehow leaves the original intact. Which it doesn't, that is what copying is supposed to be used for. At least in my opinion.

quicknir · on July 5, 2015

Dude, the stealing etc was clearly tongue in cheek.

It seems like the example I picked was a bit unfortunate in the following sense: a couple of people have already misinterpreted the point of the article to be that foo still had content after I moved its string (I won't use stole anymore, seems like a touchy subject).

That's not the point, the point is that the invariants are maintained, which is the requirement for a valid state. I could have reset m_set at the same time as I set m_cached to false, it would still be just as correct.

Edit: I have modified the post slightly so that it's a bit clearer that what I'm after is a valid state for foo, not to maintain the specific things that were inserted.

enqk · on July 5, 2015

I always think of the X in std::move(X) as an object that is about to be destructed. I.e. treat a bit the std::move cast as a marker for destruction

SamReidHughes · on July 5, 2015

There is no particular reason for the 'std::string&& str() &&' method to return a std::string&&. You could return a std::string instead, and still avoid copying. Sometimes you'll get an extra move construction, where it isn't elided on the return side of things, but your code will be more resistant to bad refactorings. The member variable will always be moved from, instead of having that depend on how the caller behaves. Also, it's less likely that you'll code your way into returning an invalid reference.

detrino · on July 5, 2015

This kind of interface has precedent: http://en.cppreference.com/w/cpp/experimental/optional/opera...

SamReidHughes · on July 5, 2015

I'm not saying the interface is externally bad. The danger and reasoning here is that the implementation could be made bad. STL functions can tolerate that risk, and they don't have the luxury of knowing the object's a std::string on an uncritical path, not some graph node with zillions of back-pointers. (Edit: I think their interface also avoids copying when dealing with non-movable types.)

ridiculous_fish · on July 5, 2015

It looks like `m_cached = true;` was accidentally omitted. As written it is always false.

quicknir · on July 5, 2015

Thanks for the catch, fixed.