Creating an empty iterator of a certain type in Rust (2018)

realcr · 2024-09-07T13:02:39 1725714159

I stumbled upon this article today. My first thought was that this is an interesting question, then I continued reading and thought that the solution ideas are kind of crude. Then I noticed it was written at 2018, so I realized we should give the author a break. Finally I realized that it was me who wrote this, 6 years ago!

So much has changed, but I believe the easy solution already existed back then. As people wrote in the comments here, one can iterate over an Option and then use flatten.

Strange experience indeed.

kd5bjo · 2024-09-07T06:16:32 1725689792

Because `OptionIterator` implements the `Iterator` trait, the function signature can be simplified to return just `impl Iterator<…>` like the original version did, leaving the additional type as an implementation detail.

Also, a similar solution is provided by the `either` crate [1], where the function body can return `Either::Left(std::iter::empty())` in one case and `Either::Right(…)` in the other, also hiding all of this behind an `impl Iterator<…>` return type.

[1] https://docs.rs/either/latest/either/enum.Either.html

the_mitsuhiko · 2024-09-07T10:03:59 1725703439

You don't need to do all that much, you can use `Option` with `into_iter` paired with `flatten` instead:

    fn neighbors_with_send_capacity(&self, a: N, capacity: u128) -> impl Iterator<Item = &N> + '_ {
        self.nodes
            .get(&a)
            .map(|m| m.keys())
            .into_iter()
            .flatten()
            .filter(move |b| self.get_send_capacity(&a, b) >= capacity)
    }

The only downside is that I believe `size_hint` is not properly forwarded.

OptionOfT · 2024-09-07T17:45:55 1725731155

Forgive the non-versioned link:

https://doc.rust-lang.org/1.81.0/src/core/iter/adapters/flat...

I think it does work correctly.

claytonwramsey · 2024-09-07T05:43:38 1725687818

Another possible solution for this use case: let each capacity graph store a “dummy” `HashMap<N,(u128, u128)>` for the case where the key is missing. Then, return an iterator over the empty dummy to get an iterator of the same type.

I will however note that simply using an option in the return value is probably the best choice both API- and performance-wise. The consumer may want to do different things if the key doesn’t exist versus if the key has no neighbors. Additionally, returning an Option means there’s only one check for whether the retrieval was correct, rather than checking at every call to `next`.

orlp · 2024-09-07T09:35:51 1725701751

There is another alternative which may be significantly more efficient as it means the caller always gets back a direct iterator instead of an enum it must branch on: make a dummy empty hashmap with a static lifetime you can refer to. Then the implementation becomes

    static DUMMY: LazyLock<HashMap<N, (u128, u128)>> = LazyLock::new(HashMap::new());

    fn neighbors_with_send_capacity_option(&self, a: N, capacity: u128) -> impl Iterator<Item=&N> {
        let node = self.nodes.get(&a).unwrap_or_else(|| &*DUMMY);
        node.keys().filter(move |b| self.get_send_capacity(&a, b) >= capacity)
    }

Unfortunately we have to use a LazyLock for the static because HashMap::new isn't const. It only adds a single always-correctly-predicted branch that checks for initialization on access though, so it's fine.

kawogi · 2024-09-07T06:44:47 1725691487

> However, this solution is not very ergonomic, as it requires the user of the function to first check if the returned value is Some(iterator) and only then iterate over the iterator.

I'd use `flatten()` to iterate over this nested type.

Patryk27 · 2024-09-07T07:59:31 1725695971

fwiw, Option already implements iterator (well, IntoIterator) - calling .into_iter() on an Option will get you an iterator, no need to create anything custom.

antonvs · 2024-09-07T09:34:50 1725701690

I wonder if that existed when this article was written in 2018.

MereInterest · 2024-09-07T11:04:29 1725707069

The footnotes mention that this solution was suggested by Reddit comments at the time.

sk11001 · 2024-09-07T07:56:44 1725695804

I don't understand the people who look at this and go "yep, we should have more of this language". It's complexity for its own sake.

Sharlin · 2024-09-07T09:41:41 1725702101

No, it’s (slight) complexity for performance’s sake. Rust’s iterators are famous for how well they optimize, thanks to being generic, monomorphized, and thus inlinable and eligible to the full power of the optimizer.

If you want slow dynamic dispatch and extra heap allocs like in Java or C# or whatever, that option is available. But Rust would be laughed out of the systems programming table if its iterator abstraction were that inefficient.

In the future, please avoid resorting to strawman arguments stemming from ignorance.

sk11001 · 2024-09-07T10:00:05 1725703205

Let me clarify that (1) I've written a decent amount of Rust before giving up on it (2) I'm not a systems programmer, all of the problems I work on can be solved with higher level garbage collected languages.

The thing that I find baffling is when people start throwing Rust at problems that don't need it like backend services and APIs, CLI tooling, games. I'll take your word that the complexity is necessary in systems programming - the problem is that people are carrying that complexity everywhere else too, and it doesn't seem like a tradeoff that's worth it. Rust doesn't gradually expose you to concepts depending on your use case for the language, once something is needed for the lowest level use case, it becomes additional complexity which contaminates all other use cases.

biorach · 2024-09-07T11:44:03 1725709443

> I'm not a systems programmer, all of the problems I work on can be solved with higher level garbage collected languages

Great. Use those languages. Don't criticise rust for not being one of those languages

> people are carrying that complexity everywhere else too, and it doesn't seem like a tradeoff that's worth it.

Yes, there are people who have used Rust in places where a higher level language would be more appropriate. However rust has characteristics beyond raw performance that make it often a compelling choice in areas outside of systems programming. The complexity of the language brings benefits that may or may not suit a particular use case. Any criticism of necessity or otherwise of its complexity is just noise without talking about what you want to use it for.

sk11001 · 2024-09-07T13:57:30 1725717450

> Great. Use those languages.

I do. I'm not criticizing Rust, I'm failing to understand the people who choose to use it in places where it doesn't belong. Over time that's becoming an increasingly larger part of all the people who use it.

steveklabnik · 2024-09-07T17:01:53 1725728513

Your value judgments are not everyone else’s value judgements. That’s really all it comes down to. Just because you consider something inappropriate does not mean that that is a universal truth.

sk11001 · 2024-09-07T18:11:47 1725732707

> Your value judgments are not everyone else’s value judgements.

I opened with this, yes. I think people who use Rust for everyhing have poor taste.

steveklabnik · 2024-09-07T19:23:00 1725736980

It sounds like you do understand, then.

lowbloodsugar · 2024-09-08T00:03:21 1725753801

I mean, sure, people do do that. But it seems like the use-case described here is one where the performance and robustness of Rust would be appropriate. And while I agree there are certainly people looking to push the boundaries, often too far, where Rust makes sense, the language and its ecosystem are also becoming more proficient.

mustache_kimono · 2024-09-07T12:01:44 1725710504

> Rust doesn't gradually expose you to concepts depending on your use case for the language, once something is needed for the lowest level use case, it becomes additional complexity which contaminates all other use cases.

I guess, once you know it, the linked problem really doesn't seem that complex. Once you know it, you know the idiomatic solution will look like this commenter's: https://news.ycombinator.com/item?id=41472874

If it were any simpler to write, I'm not sure you'd have any confidence the code was doing what it should be doing/what you thought was on the label.

But I also strongly disagree with the quoted notion. One would usually ease their way into returning iterators. Much more likely is one would do the simple thing and allocate an empty Vec for awhile, or an Option<Vec<T>>, like the author even tried first, and then decide you want to be fast or, more likely, ask whether this is what was holding you back, and try to decide how to do that.

> It's complexity for its own sake.

I do agree that Rust can be complex, but this is not complex. This is just the Rust way. Right now, complex Rust, to me, is refactoring the code of others. The code of others has a certain weight/concreteness in Rust that sometimes makes it hard to mold into something else. Rust makes small refactorings a breeze. It makes some large refactorings feel almost impossible.

inferiorhuman · 2024-09-07T19:57:47 1725739067

Vec::new doesn't allocate anything, so by extension an empty vec won't allocate anything. Option<Vec> doesn't really save (or cost) you anything here either.

frou_dh · 2024-09-07T10:51:01 1725706261

If we judge by results and not some notion of appropriateness then in my experience a lot of the miscellaneous CLI software written in Rust is great and high-quality from a user perspective. So if their authors are indeed making things harder than they need to for themselves, then it must just show how capable they are that they're able to get great results anyway.

neonsunset · 2024-09-07T11:36:29 1725708989

To be fair, the way IEnumerable<T> works is more of a historical fact.

The underlying type systems allows you to implement the same monomorphized iterators like in Rust, which some third-party libraries do.

The main limitations are at the language level - the lack of associated types and high order functions being opaque means that for nested iterators you may end up writing unwieldy generic signature by hand, and instead of lambda - manually writing a value-delegate-like struct. In a brighter tomorrow, it will be a matter of the language doing so on your behalf, but in either case there is nothing preventing another guest language or even C# itself in the future having identical iterator abstraction implementation to Rust within the existing type system (unlikely to happen in C# as-is because it would be a huge breaking change, the compiler is getting better at inlining and devirtualization so in the simpler cases it may become not needed).

petertodd · 2024-09-07T15:14:29 1725722069

To be clear, in this case if you don't want to think about it, and just want to use dynamic dispatch, you absolutely can! Just return a boxed iterator for the normal case, and a boxed empty iterator for the special case. Rust's dynamic dispatch will basically mean they're both the `Box<impl Iterator<Item=T>>` type, which is actually implemented with an allocation and vtable. Everything will work fine.

You can optimize it later if it actually matters.

I've done this type of thing plenty of times when working on non-critical code where I just wanted to get something to quickly work without thinking about it too hard.

dundarious · 2024-09-07T16:43:07 1725727387

> Rust’s iterators are famous for how well they optimize

Perhaps famous for how well they optimize away the extrinsic overhead of the iterator concept. Not really famous for how well they optimize compared to just writing the loop and iterate code naively with indices or what have you.

And the latter makes it far easier to adapt to use SIMD, sentinel values, etc.

Please avoid implying that Rust has the objectively best tradeoffs for all situations. (See how annoying it is to be on the receiving end of condescension?)

inferiorhuman · 2024-09-07T19:58:40 1725739120

  Not really famous for how well they optimize compared to just writing
  the loop and iterate code naively with indices or what have you.

Sure they are. The loop will have to bounds check at each iteration, the iterator won't necessarily have to. Collect may save on the number of allocations, etc, etc.

dundarious · 2024-09-08T05:29:18 1725773358

Neither of those are intrinsic issues.

inferiorhuman · 2024-09-09T22:07:51 1725919671

dundarious · 2024-09-11T04:44:47 1726029887

So why compare it against an artificially hobbled alternative and say "that's the awesome power of Rust!" when it does a bit better. Compare it against an alternative that is not hobbled with extrinsic issues.

bilkow · 2024-09-07T18:08:39 1725732519

> It's complexity for its own sake.

Hey, let's talk about the complexity here, I think there's reason for it! They wanted to use static dispatch so that the return type has more information on the logic being returned and is able to optimize better while also avoiding allocations at all. Reduced the original example (that doesn't compile) here: https://play.rust-lang.org/?version=stable&mode=debug&editio...

If you want to do what other languages do, you can just use dyn Trait (trait objects, dynamic dispatch) instead of impl Trait (concrete objects, static dispatch). It's less efficient and requires allocations, which is what other languages usually do, but all you have to do is wrap it in a Box. Example: https://play.rust-lang.org/?version=stable&mode=debug&editio...

If you want to use static dispatch, then you have to look a bit more into it, there's a lot of discussion in the comments already but for this specific case Option's into_iter + flatten works fine, while Either is a more generic solution or you could even implement your own iterator for this use-case if you wanted to.

I'd say the complexity here comes from the explicitness and being forced to make a choice: Don't care about performance? Ok, just wrap it in a Box and call it a day. Wants to take the most out of it? Then you'll have to look deeper into it.

ryanjshaw · 2024-09-07T08:23:05 1725697385

I'm really confused... Is all of this just the equivalent of `Enumerable.Empty<T>` or `yield return` in C#?

Measter · 2024-09-07T13:49:38 1725716978

I'll ignore the option of returning a Vec or Option, as the article already went over why these options were rejected, and concentrate on the later options.

Firstly the `std::iter::empty()` attempt: The issue they had here came down to the signature of the function. What they wanted to do was return an `impl Iterator<Item=&N>`. This means that the function isn't naming the specific type it returns[1], only stating that it's an Iterator that gives some `&N`s. However, the return type of this function is still one, specific type. You cannot return two different types that are both iterators. That's why this one failed.

Boxing: Instead, we can return a `Box<dyn Iterator<Item=&N>>`. This is roughly equivalent to a function in C# returning an `IEnumerabale<N>`, in that the actual type is hidden, and the work is done doing dynamic dispatch. This allows you to return different kinds of iterators, because the specific type is hidden away inside the Box. The downside here is that you're doing a heap allocation.

OptionIterator: This one is actually basically just the first Option case from earlier but wrapped up in a nicer-to-use interface for the user because it implements Iterator in a way that transparently handles the Option.

[1] Additionally, it's not even possible to name the type of the iterator here due to the filter closure being unnamable.

MereInterest · 2024-09-07T11:09:33 1725707373

Not quite, because the user of those would need to perform dynamic dispatch in order to iterate over the result. In the article, the returned iterator has a type that is known to the compiler, and so the caller’s code can be optimized much more effectively.

sk11001 · 2024-09-07T08:36:01 1725698161

So you do agree that other languages achieve the same thing with much less complexity?

tcfhgj · 2024-09-07T10:05:50 1725703550

Not exactly the same thing though

blumenkraft · 2024-09-07T09:57:39 1725703059

Performance or not, this language is borderline unreadable.