Returning multiple values from functions in C++ (thegreenplace.net)
78 points by ingve on March 6, 2016 | 62 comments



I know it's mentioned specifically in the article, but the C programmer in me immediately wants to use structs. They don't need to be complex or long lived or even initialized. And as a bonus there's no heap involved anywhere. Either that or just bail out of c++ and use go.


Yeah, same here (president of a local Duff's device fan club), but you have to admit that the proposed C++17 feature is pretty nice:

    ... foo()
    {
        return { 1, "bar", false };
    }

    auto { i, s, b } = foo();
It's syntactic sugar, granted, and they could push it further by adopting tuples as a native language construct with some shorthand notation (e.g. [...]), but the idea is nice.
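For reference, the feature was adopted in C++17 as "structured bindings", with square brackets rather than the braces in the proposal above. A minimal sketch, assuming a C++17 compiler; the std::tuple return type and the describe() helper are illustrative choices, since the snippet above leaves the return type unspecified:

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Return type chosen for illustration; the braced list converts to the
// std::tuple because every element is implicitly convertible (C++17 rules).
std::tuple<int, std::string, bool> foo()
{
    return { 1, "bar", false };
}

// Caller side: C++17 structured bindings unpack the tuple by position.
// Note the square brackets; the adopted syntax is auto [...], not auto {...}.
std::string describe()
{
    auto [i, s, b] = foo();
    return std::to_string(i) + " " + s + " " + (b ? "true" : "false");
}
```

The bound names are chosen by the caller, so nothing ties them to what each position means; that is the trade-off discussed further down the thread.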


Especially for anyone used to Python, this would be a very welcome addition.


Sweet!

Although I still cannot believe it took C++ like 30 years to come up with this.


> Either that or just bail out of c++ and use go

So either use structs or totally change language? That doesn't really make sense.


std::tuple is essentially an "anonymous" struct -- there's no allocation involved
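Before C++17, the usual way to unpack a std::tuple was std::tie, with std::ignore for positions you don't care about; the tuple itself stores its elements inline, much like a struct. A minimal sketch with a hypothetical parse() function:

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical function returning a success flag, a count and a label.
std::tuple<bool, int, std::string> parse()
{
    return std::make_tuple(true, 42, std::string("answer"));
}

// Pre-C++17 unpacking: std::tie builds a tuple of references to existing
// variables, and std::ignore swallows the positions we do not need.
int count_only()
{
    int count = 0;
    std::tie(std::ignore, count, std::ignore) = parse();
    return count;
}
```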


But the members inside don't have proper names.


That's rarely a problem in the cases where MRV makes sense, though you could always fix it by using anonymous structs whose fields are both named and positional (similar to Python's namedtuples).
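In C++17 an ordinary aggregate struct gets you much of that: the fields have names, and structured bindings can still unpack them by position, in declaration order. A sketch; ParseResult and parse_digit are hypothetical names for illustration:

```cpp
#include <cassert>
#include <string>

// A plain aggregate: members are named, yet structured bindings can
// also unpack them positionally (ok, value, rest, in that order).
struct ParseResult {
    bool ok;
    int value;
    std::string rest;
};

// Parse one leading decimal digit, returning the unconsumed remainder.
ParseResult parse_digit(const std::string& input)
{
    if (!input.empty() && input[0] >= '0' && input[0] <= '9')
        return { true, input[0] - '0', input.substr(1) };
    return { false, 0, input };
}
```

A caller can write either `r.value` or `auto [ok, value, rest] = parse_digit(s);`, whichever reads better at that site.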


I prefer to use generic structures (tuples, pairs, etc.) or return a proper type. One-time structs are usually redeclared in multiple places. Such code is really hard to maintain.


> One-time structs are usually redeclared in multiple places. Such code is really hard to maintain.
Well, if you don't need forward declarations, you can define structs inline in the function signature:

    #include <stdio.h>
    
    /* The struct is defined directly in the return type (valid C). */
    struct ret { int a; int b; } hello(int c) {
        return (struct ret){ c, c + 1 };
    }
    
    int main(int argc, char **argv) {
        struct ret result = hello(2);
        printf("%d %d\n", result.a, result.b);
        return 0;
    }
I'm not sure how portable it is, though; it works in gcc and in whatever compiler ideone uses: https://ideone.com/bgZUPS


G++ 5.2 complains: "new types may not be defined in a return type"

An alternative in C++14:

    auto hello(int c) { struct { int a; int b; } result { c, c + 1 }; return result; }

or with gcc extensions:

    auto hello(int c) { return (struct { int a; int b; }) { c, c + 1 }; }


I meant something like this:

  struct x { int a; char* b; };
  x foo();
  // a lot of code
  struct y { char* y; int x; };
  y bar();
  // another header
  struct z { bool a; int b; char* c; };
  z foo2();
It is usually the same data structure, but someone was too lazy to define it properly.


Well, then it's not really one-time use, is it? Compound literals are your friend.


The problem with deconstructing a tuple is that you can easily make a mistake about the order of its members. A struct is a much better solution to me, and you can get around the problem of the weirdly named one-off struct by using the iod library (https://github.com/matt-42/iod):

  auto foo() { return D(_id = "test", _age = "42"); }

  auto r = foo();
  std::cout << r.id << r.age << std::endl;


Meaningful names would be nice.

In C++, when iterating over a map's items you get a std::pair whose members are named first and second (plain data members, not methods) instead of key and value.

Python has named tuples which can work both ways.
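Since C++17, structured bindings let the loop itself give the pair's members meaningful names; std::map's element type is still std::pair<const Key, T> underneath. A minimal sketch; join is a hypothetical helper:

```cpp
#include <cassert>
#include <map>
#include <string>

// Render a map as "key=value;" pairs. The structured binding names the
// pair's first/second members key/value for the body of the loop.
std::string join(const std::map<std::string, int>& m)
{
    std::string out;
    for (const auto& [key, value] : m)
        out += key + "=" + std::to_string(value) + ";";
    return out;
}
```

Because std::map is ordered by key, the output lists keys in sorted order.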


Ada has in, out and in/out parameters, which add a little more semantics than C and C++'s pointer parameters.

The author mentions Common Lisp's multiple values, and one of the main features of multiple values is that they are optional: the caller does not need to use secondary values. Even though floor returns 2 values, the following is a valid expression:

   (+ (floor x) 2)
Only the primary value is used. A compiler typically generates code that uses registers: the program does not allocate memory to wrap return values in a list or a structure (all values are still computed, but secondary values are often additional data that a function needs to compute to determine its primary result).


Sometimes Lisp can get annoying though. I've used libraries where a function might return 5-6 values and I only need the first and last. I need to bind all of them and then (declare (ignore...)) the useless bits otherwise I get compiler warnings.

It's an annoyance, because I always feel the general ethos of Lisp is to eliminate tedious typing and multiple value returns frequently add a lot of cruft.


Common Lisp is a programmable programming language.

You need to hack around it. Don't like it? Improve it. Sketch:

    (defun make-vars (vars &aux syms)
      "creates uninterned symbols for vars named NIL.
    Returns a list with those replaced and a list of the new syms."
      (values (loop for var in vars
                    if (eq var NIL)
                    collect (let ((sym (gensym "ignore"))) (push sym syms) sym)
                    else collect var)
              syms))

    (defmacro multiple-value-bind-some (vars form &body body)
      "Similar to MULTIPLE-VALUE-BIND, but variables named NIL will be ignored."
      (multiple-value-bind (vars syms) (make-vars vars)
        `(multiple-value-bind ,vars ,form
           (declare (ignore ,@syms))
           ,@body)))
Example

    (defun foo (x)
      "returns five values"
      (values x (* x 2) (* x 3) (* x 4) (* x 5)))


    (defun test ()
      (multiple-value-bind-some (a nil nil nil b)
          (foo 10)
        (list a b)))


I have a hunch that you defined make-vars as a separate function just to be able to nest two multiple-value-bind, one in the macro and one in the expanded form ;-)


A replacement for MULTIPLE-VALUE-BIND, which uses MULTIPLE-VALUE-BIND itself, shows a bit of the value of MULTIPLE-VALUE-BIND.


Yeah, that's a great example of the power of Lisp!

Of course it now also leads to one of Lisp's other problems; namely that after a while everyone ends up programming in their own private language of accumulated hacks :D


The user needs to learn more than in Java:

1) Learn to deal with a language which has an infinite number of syntactic abstractions.

2) Learn the typical patterns of syntactic abstractions (WITH- , BIND-, DEF- ...).

3) Learn how to write your own abstractions.

Then groups will settle on common (!) Lisp patterns. See for example:

https://common-lisp.net/project/alexandria/

There are lots of libraries which provide language extensions, which are used by many people.

The extremes in Lisp are then:

* no syntactic abstractions -> the power of Lisp wasted

* using those syntactic abstractions which are approved by user groups, due to their inclusion in libraries

Above choices are relatively conservative.

As another extreme, it is fully possible to change the language - but then Common Lisp provides more than macros to do so. See for example reader macros, CLOS MOP, custom evaluators/compilers, code walkers, ...


Yeah, I use Alexandria by default in every project now. I remember when I first started coding in Lisp, I wrote a ton of obvious functions/macros like the missing hash table traversals that are in Alexandria but are rather bizarrely missing from the Common Lisp spec. Then I found Alexandria and realised I'd written a good fraction of those functions myself! This was pre-Quicklisp, so library discoverability wasn't great.

Now I'm trying to make a concerted effort to follow the recommendations here:

http://eudoxia.me/article/common-lisp-sotu-2015/

Hopefully this will lead to a modern set of consolidated libraries. Effectively a new Common Lisp standard: CL-20xx


Common Lisp already provides various hash-table traversals with LOOP: http://www.lispworks.com/documentation/HyperSpec/Body/06_aba...


That's a good point. I always tend to forget about loop because I don't use it all that much; I prefer the Iterate library. Loop is a whole extra language all on its own.


I agree that Lisp is not always terse, which surprised me at first too. You can abstract many constructs and eradicate redundancy, but at some point, your only chance of being terser is to use reader macros. However, doing so might not be a good idea either.

There are libraries out there that provide syntactic sugar for Common Lisp, but they are often passed over because (i) they define their own "ghetto" dialect of Lisp and (ii) wasting a character or two is in fact not very important. Aesthetics alone is not a goal.

For multiple values, you might still use a list instead, in which case you are back to the usual way of accessing data:

   (let ((list (multiple-value-list (get-decoded-time))))
     (format t "Year: ~A~%Seconds: ~A" (sixth list) (first list)))
If you only want to access a particular value, use "nth-value" (not possible in your example, of course).


Yeah, that's how I do it sometimes with other people's code. But with my own code I much prefer using structs. It makes the code much clearer to read, and there is less implicit structure than with an ordered list. So my preferred way would look more like:

  (let ((struct (my-get-decoded-time)))
     (format t "Year: ~A~%Seconds: ~A" (time-year struct) (time-seconds struct)))
With the original multiple value return, if there were a bug and I were reviewing the code, I'd need to also review get-decoded-time and see what order the values are in. If instead it returned structured data, I could skip that step as the values are clear.

Lisp is still my favourite language. I generally rank languages by how much time/code I'm spending managing the language versus managing the problem I want to solve. Lisp is very good for that, very little cruft so it lets you focus directly on the problem at hand.


Agreed, structs and classes often make more sense to aggregate data, especially when the values are all equally important.


C# has out parameters, if I remember correctly. I haven't used it (C#, that is) in months, but as far as I've used it, it pretty much does the job.


The getline case (boolean and possible value) really wants a type like Haskell's "Maybe" or Rust's Option: a type that either contains nothing or a value.
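C++17 later added std::optional to the standard library, which fits this shape directly. A minimal sketch, assuming a C++17 compiler; read_line and count_lines are illustrative names. Note that it creates a new std::string on every call, the same pattern whose cost is measured with boost::optional in the reply below:

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Returns the next line, or std::nullopt on end of input or error.
// A fresh std::string is constructed for each call.
std::optional<std::string> read_line(std::istream& in)
{
    std::string line;
    if (std::getline(in, line))
        return line;
    return std::nullopt;
}

// Count lines by looping until the optional comes back empty.
int count_lines(std::istream& in)
{
    int n = 0;
    while (read_line(in))
        ++n;
    return n;
}
```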


Unfortunately, the currently available solution (boost::optional) performs badly enough that the difference can be measured in a simple implementation of "wc -l". Consider the following code:

    #include <iostream>
    #include <string>
    #include <boost/optional.hpp>

    using namespace std;
    using namespace boost;

    optional<string> getline_(istream &st) {
        std::string line;
        if(getline(st, line)) return line;
        return none;
    }

    int main() {
        int lines = 0;
        while(getline_(cin)) lines++;
        cout << lines << "\n";
    }
vs

    #include <iostream>
    #include <string>

    using namespace std;
    int main() {
        int lines = 0;
        string line;
        while(getline(cin, line)) lines++;
        cout << lines << "\n";
    }
Run on an input file of 172,544 lines (the GPL-3 license repeated 256 times), the version using "optional" takes 400ms and performs 1,077,504(!) heap allocations, while the more standard version takes just 280ms and performs just 8 allocations.

Of course, "wc -l" (GNU) takes just 9ms, so clearly this isn't a great program either way--but it does serve to show that boost::optional can be a surprising performance sink compared to using a bool return value + out parameter.

(tests on debian jessie amd64, i5-3320M @ 2.6GHz, g++ 4.9 -O3, boost 1.55.0.2)


That's not the fault of optional though. It's because with the optional version you're creating (and destroying) a new string each iteration of the loop.

Something like this (untested) should be much closer to the non-optional version:

    #include <iostream>
    #include <string>
    #include <boost/optional.hpp>

    using namespace std;
    using namespace boost;

    optional<string&> getline_(istream &st, string& line) {
        if(getline(st, line)) return line;
        return none;
    }   

    int main() {
        int lines = 0;
        string line;
        line.reserve(1024);  // optional pre-sizing; getline still grows the buffer for longer lines
        while(getline_(cin, line )) lines++;
        cout << lines << "\n";
    }


Which more or less removes any benefit for using optional, unfortunately.


Yes for this case, but that's just because it's a bad example for comparing the performance of optional, because the code is doing things that have very different performance characteristics.

There are plenty of other cases though where it is useful to use optional (e.g. instead of passing/returning a naked pointer which may or may not be null) and in those cases use of optional will have little/no overhead.
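For example, a lookup can return an optional index instead of a pointer that may be null; std::optional stores its value inline, so there is no heap allocation for the optional itself. A minimal sketch using C++17 std::optional; find_index is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// "Not found" is explicit in the return type rather than encoded as a
// null pointer or a magic value like -1.
std::optional<std::size_t> find_index(const std::vector<std::string>& v,
                                      const std::string& needle)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == needle)
            return i;
    return std::nullopt;
}
```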


+1 for Having Measured It.

The basic getline interface removes heap pressure by reusing the same string over and over, even though semantically the optional<string> approach may look cleaner. Eric Niebler showed a while ago (I can't remember where) that this API lends itself perfectly to ranges and can provide a reasonable interface without sacrificing performance. IIRC, the API looked somewhat like this:

    for (auto& line : getline_range(cin))
      f(line); // do something with line, e.g., parse.
The semantics are: iterate over standard input until EOF or error and let the range keep (and reuse) the string internally while exposing it as const-reference by dereferencing.
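Those semantics can be sketched roughly as follows. This is a hypothetical minimal implementation, not Eric Niebler's code or any library's API: a single-pass input range that owns one std::string buffer and hands out a const reference to it.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical getline_range: reuses one std::string for every line,
// so no new string object is created per iteration.
class getline_range {
public:
    explicit getline_range(std::istream& in) : in_(&in) {}

    class iterator {
    public:
        explicit iterator(getline_range* r) : r_(r) { advance(); }
        iterator() : r_(nullptr) {}  // end sentinel

        const std::string& operator*() const { return r_->line_; }
        iterator& operator++() { advance(); return *this; }
        bool operator!=(const iterator& other) const { return r_ != other.r_; }

    private:
        // Read the next line into the shared buffer; on EOF or error,
        // compare equal to the end iterator from then on.
        void advance()
        {
            if (r_ && !std::getline(*r_->in_, r_->line_))
                r_ = nullptr;
        }
        getline_range* r_;
    };

    // Single-pass: each call to begin() reads the next line.
    iterator begin() { return iterator(this); }
    iterator end() { return iterator(); }

private:
    std::istream* in_;
    std::string line_;
};
```

In a range-based for loop the temporary getline_range lives until the loop ends, so the reference handed to the body stays valid for each iteration.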


There's std::optional that's still in the works:

http://en.cppreference.com/w/cpp/experimental/optional


In this case, passing the std::string to std::getline by reference is more than just an "out parameter": it is the buffer in which the result should be stored. Of course this will be more efficient than allocating a new buffer for each line -- and that is a good reason why this particular interface would be worse off with multiple return values.

The std::unordered_map::insert example in the article is an excellent example of an interface with multiple return values: it returns a pair (iterator, bool) of an iterator to the given key and a bool indicating if the iterator is to a newly inserted key-value pair or if it was the key-value pair already stored in the map. Neither the iterator nor the bool owns a buffer like in the std::getline example, so the std::getline out parameter design is not needed here.
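With C++17 structured bindings, both results of insert get names at the call site. A minimal sketch; add_count is a hypothetical helper that inserts a key or, if it is already present, adds to its stored count:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// insert returns std::pair<iterator, bool>: the iterator points at the
// element with this key, and the bool says whether this call inserted it.
bool add_count(std::unordered_map<std::string, int>& m,
               const std::string& key, int value)
{
    auto [it, inserted] = m.insert({key, value});
    if (!inserted)
        it->second += value;  // key already present: update in place
    return inserted;
}
```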


For a moment, I thought they had added true multiple value return to C++. I am sooo glad they didn't. The lisp folks added it way back, and it has been an ugly scar on the language ever since. Why do real multi-value returns suck? because they don't compose well, they're awkward as all heck to use, and require wrapping certain function calls in really weird constructs. The PROPER way to do multiple values is to return a data structure and pattern match against it. Of course, lisp has support for this, multi-value return was just an optimization hack. WHICH YOU SHOULD NO LONGER USE!


Multiple values are not "just" an optimization hack, they are designed to be convenient to use. I don't know which construct you find "weird". Multiple-value-bind is a descriptive name which is long to type but this does not matter (type m-v-b + auto-complete).

Typical example: you read a line in a file. The line you get is terminated either by a newline character or by the end of file. The implementation of read-line[1] must know if you reached end of file or not, but most of the time, you don't care about it. Sometimes, you want to know, and then you can take the secondary value (missing-newline-p). I like the fact that I can focus on the primary value without caring about other ones when I want.

[1] http://clhs.lisp.se/Body/f_rd_lin.htm


Sounds like methods could also add special-purpose multiple values later, without impacting existing callers. Which is nice too.


Maybe this is different in CL, but in Scheme, if you're using a procedure that returns multiple values, there is no default value and no automatic wrapping of the values into a list: you have to use call-with-values, one of any number of SRFIs providing sugar (most commonly SRFI 8, although 11 and 71 also exist), or, without the sugar, god knows how many lambdas. EVERY. SINGLE. TIME you want to call a function that returns multiple values.


You might want to learn Common Lisp BEFORE you make statements about how a feature of the language is obsolete or not.


I was talking about the general idea of MVR, AND how I thought it was implemented in Lisp. I was wrong. This happens sometimes.

And even with the new explanation, I STILL don't like MVR. I think that returning a list or other data structure, or just writing two functions, is much clearer and lends itself better to function chaining. Sure, in CL there's a default, which makes things a little better, but what if you want the other one? I don't want to write

    (multiple-value-bind (a b) (foo bar baz) b)
all over my codebase, and the scheme equivalent is just as bad, if not worse.


You don't have to write it all over the codebase. If you have a specific need, defmacro yourself out of your problem.

Generally speaking, taking a secondary value alone feels like a bad use of MVR (it happens rarely with standard functions and most libraries).

MVR are not intended to be substitute to other data-structures. As said elsewhere, you can and should use structs, classes, lists, hash-tables,...

After all, why do we use multiple values instead of a single struct for function arguments? Because in CL and other languages, unlike in Haskell, you don't want to define an ADT for all your functions or rely on currying. But if you start putting too many coupled parameters in your functions, it starts to smell fishy.

Extracting a single value is easy, by the way:

  (nth-value 1 (foo bar baz))


Why don't you take a bit of time to read about multiple values in Common Lisp?

    CL-USER 76 > (nth-value 1 (values 'a 'b 'c 'd))
    B
While you are at it, check out the feature of 'Macros' in Lisp, which allows everyone to write code to shorten this stuff. See my code in this thread for a macro which then allows you to write:

    (multiple-value-bind-some (NIL b)
       (foo bar baz)
      b)
One then can name ignored variables as NIL in the source code...


Good point. And very true. Still looks a bit ugly to me, but it's better than nothing, I suppose. And yes, I am aware of macros. For some reason using them in this situation didn't occur to me, so I guess I am an idiot after all.

This is actually one of the cases where syntax-rules would allow you to write something cleaner, I would guess. I like syntax rules, but everybody always seems to complain about it...


It's funny: your comment uses "true MRV" for language-level special-casing, as opposed to a matched (or even just Python-style unpacked) data structure, whereas I've always thought the latter was "true MRV".

Either way, we do agree that MRV as a language-level special case is a hack. Even PHP isn't that bad, though the unpacking is macro-ish bleh.


If you wrap your data in a list or a tuple, it is still a single value. It makes sense to talk about true MRV in languages that support it, like Ada, Forth and others... (https://rosettacode.org/wiki/Return_multiple_values)


Exactly.


Which is clearly nonsense. Multiple values have been extremely useful and there are several code bases in Common Lisp which make extensive use of it.


Yes. They are useful. So is goto, so are continuations. MVR isn't the unforgivable sin that those things are, but imo, it makes your codebase uglier.


Many Lisp programmers think differently.


Well, I'm not many lisp programmers. I'm me. I've got an opinion that I think is reasonable, and I'm willing to hear you out. In fact, I already have, and I disagree. This isn't a problem, it's good to hear differing opinions, although you always sound a bit smug about it. But I could just be reading that in.

Point is, I disagree, and your opinion is fine. 'Kay?


Instead of markers, I like to use typedefs. Here's an example of how this would look in an HPC-related program. Basically, if you know Fortran, you know where this is coming from.

    typedef const double * const __restrict__ IN;
    typedef double * const __restrict__ OUT;
    typedef const int INT_VALUE;
    typedef const double FP_VALUE;

    void diffuse_c(
        OUT runtime_total_s,
        OUT runtime_boundary_s,
        OUT thermal_energy_updated,
        IN thermal_energy,
        INT_VALUE nx,
        INT_VALUE ny,
        INT_VALUE nz
    );


I can't be the only one who finds "INT_VALUE" both superfluous and unnecessarily ugly?


You're right, I have been thinking about this naming as well. I think in this context, "INT_PARAM" would be better. I like to typedef it because this way it's easier to replace with a hardware specific integer type later on if I decide to do so. Anyways, the "VALUE" naming is coming from Fortran, which is by default pass-by-reference, so you use it to get pass-by-value.

Edit: Thinking about it some more, "INT_VALUE" still makes sense - it makes it clear that this specific symbol should be accessed as a value, not as a pointer. Doing this reminds me that I'm (probably) imposing more register pressure by passing by value. In general I'd probably pass a const int * const pointer here, but in this specific case I needed values because of some CUDA specific limitation.


> Doing this reminds me that I'm (probably) imposing more register pressure by passing by value.

Not for ints; I would be very surprised if sizeof(int*) < sizeof(int) on your system.


Good point, that was a bit of a brainfart of mine.


I am a bit surprised exceptions were not mentioned in the context of returning an error value.


The idea was to show things that actually make sense as return values / statuses rather than exceptions. I surely wouldn't throw an exception for "getline reached EOF"


Yeah, I am not for overusing exceptions, but languages like Python make creative use of them. For instance, a generator signals that it is done by raising StopIteration. Interestingly, you can make a generator that goes through the lines in a file and rely on that when you reach EOF.


It is certainly a good way to _avoid_ throwing exceptions in a public API.



