Template Metaprogramming with Modern C++: templates in depth

mindvirus · on Sept 29, 2014

Because they can take integral types, they can take function addresses. This is a really interesting article:

http://www.codeproject.com/Articles/11015/The-Impossibly-Fas...

By passing the address of the function to the template, the compiler knows which function is going to be called - the idea being that a smart compiler can now inline your delegates for you.

tpush · on Sept 29, 2014

At least in my experience, compilers are pretty good at pointer chasing and inlining. For example, in my code a function calls another function through a function pointer that was previously set via an exposed function. Much indirection, right? Well, clang with LTO is able to see which function's address will be assigned, and even further can determine that it's the only function whose address gets assigned to the pointer, and thus can inline the whole function into the body of the other function; across compilation units, across pointer indirection. Everything without templates/inline code.

ambrop7 · on Sept 29, 2014

The clean approach is to wrap functions into classes:

class Wrap { void call() {...} };

template <typename F> { ... F::call(); }

And with some imagination one can write a macro to do this wrapping given an existing function, used like this:

void my_func();

using MyFuncWrapped = WRAP_FUNC(my_func);

// Now MyFuncWrapped::call() calls my_func.

My implementation of the wrapper: https://github.com/ambrop72/aprinter/blob/master/aprinter/me... . But note it doesn't use std::forward since this code assumes no references being used.

cbsmith · on Sept 29, 2014

But you don't get much by wrapping it in that class and making a functor out of it, unless you want to manage state. Either way, in C++, it is a slim line between the two.

C++11 lambdas have this quality of "a type for every function" as well.

jzwinck · on Sept 29, 2014

"By default, there is no exponential and mind blowing executable size increase."

I'm sorry but that's not really true. If you compile with debugging information included (with or without optimization), GCC will give you tons and tons of template name stuff in your compiled binary. I have one program that is over 100 MB and well over half of it seems to be template-related debugging stuff. It's unlikely that you could generate so many and such long symbols by hand if you weren't using templates. If you build without debugging symbols, or strip them, or separate them out, you may have different results, but I don't think that's the most common scenario for C++ users.

You can also wind up with actual instruction bloat too: a seemingly "high performance" function written with recursive templates may wind up with dozens of kilobytes of machine code thrashing your i-cache while a perfectly good "old style" function would do the same job with a hundred bytes. Disk may be cheap, but instruction cache is not.

cbsmith · on Sept 29, 2014

I thought everyone stripped out debug info and shoved it in to a separate lzma compressed file these days...

If you use C++11/C++14 constexpr (and don't screw it up ;-), you really can be assured that the computations will complete at compile time at which point you shouldn't have very much left for the compiler to deal with at run time. I guess in theory you could have i-cache issues and such, but in practice I've yet to see that happen.

pkolaczk · on Sept 29, 2014

Even a not very complex program written in modern C++ is going to use a lot of map, set, vector, unordered_map, iterator instantiations for different types. Even if some of them can be deduplicated by a smart-enough-linker, that's still a significant amount of code, added for little or no gain (most of code is not on critical path).

corysama · on Sept 29, 2014

Does the program need to do the map, set, vector, iterator, ect... operations or does it not? If the work needs to be done then it needs to be instantiated in the code somehow. What is your proposed alternative?

dllthomas · on Sept 29, 2014

To me, the parent seemed to be complaining about multiple copies of effectively the same map (or set, or vector, ...) code for different types, not the presence of the map (or ...) code in the first place. The latter is obviously unavoidable. The former is an efficiency trade-off between the optimization opportunities afforded by specialization and inlining, and the smaller instruction footprint of keeping the code generic.

corysama · on Sept 29, 2014

So then, is the proposed alternative a library of containers and algorithms that are genericized via void pointers, memcpy sizes and function pointer predicates?

dllthomas · on Sept 29, 2014

I'd assume so. In principle you can get there without sacrificing type safety (http://en.wikipedia.org/wiki/Type_erasure). In practice, I don't know whether or not you can write type safe C++ that compiles that way.

cbsmith · on Sept 29, 2014

In truth, if you really care about it, you can take a policy based design approach to this and have explicit rules for when you may or may not do type erasure. It's definitely a bit of work, but you can actually get full compile_time type safety and yet avoid having duplicative code with various tricks like (this skips over a lot of the pain and isn't really how you'd do it... just illustrative):

    template <std::size_t N>
    struct raw {
        template <typename U>
        constexpr raw(U x, std::enable_if<.....>);

        template <typename U>
        constexpr operator U();
    };

    template <template ContainerType, typename T, std::enable_if<std::is_trivially_convertible<T, ....> > >
    struct somewhat_erased_container :  private ContainerType<raw<sizeof(T)>> {
    };

It's particularly easy to do with containers of pointers (of course).

dllthomas · on Sept 29, 2014

Very nice.

cbsmith · on Sept 29, 2014

Herb Sutter has prompted a variant of the Pimpl idiom that kind of goes this way, as have others. Linus was proudly showing how C was better than C++ by virtue of being able to do something much like this, but of course you can accomplish this in C++, of course there are some down sides that accompany it. Most importantly with C++ you can much more easily provide more robust compile time type safety with little or no runtime overhead.

corysama · on Sept 29, 2014

This seems like a nice article series on the subject http://akrzemi1.wordpress.com/2013/11/18/type-erasure-part-i...

Just getting started on it, it seems that the theme is that you can't have the same binary code work on different types without having a run-time indirection in there somewhere. But, you can use templates to keep that indirection convenient and type safe. And, you can structure your templates to only specialize a minimal amount of code necessary to interface your type into a non-specialized algorithm.

dllthomas · on Sept 29, 2014

Right, that looks like an exploration of what I was talking about (and at a skim, looks solid).

"you can't have the same binary code work on different types without having a run-time indirection in there somewhere"

That's certainly the case, but if you're passing by reference you already have an indirection, so that may or may not mean additional run-time overhead.

cbsmith · on Sept 29, 2014

If the code isn't critical path, it really doesn't matter too much whether it was for much gain or not, no? ;-)

pkolaczk · on Sept 29, 2014

Yeah, but that code still sits in memory, occupies space and takes non-zero time to load. It also takes free memory from the fragments of code that are on the critical path. Sometimes it doesn't matter (most desktop apps), sometimes it does (embedded software or system-level programming). Also, RAM usage of quite a lot of kinds of software is determined mostly by code size, not data size (e.g. word processors, spreadsheets, CAD software, IDEs, compilers) and startup-time is also determined typically by loading and dynamic linking time.

cbsmith · on Sept 29, 2014

For most of the common template instantiations, the standard runtime's shared library provides you with deduped versions that are likely already in memory, and most linkers are smart enough to make that dynamic linking very efficient.

True, for embedded systems this can still be a painful waste... But I'd argue if that is the case, you probably should be very explicit about using/not using STL libraries.

cageface · on Sept 29, 2014

Don't most operating systems memory-map executables in anyway? As long as the code is never paged it in shouldn't make any difference.

pkolaczk · on Sept 29, 2014

Agree, but if the code is used even only once, it gets loaded to memory. So if you've got a complex_generic_container<T1, T2>, its code gets loaded for every combination of types used.

cbsmith · on Sept 29, 2014

It gets loaded and then most likely paged out, particularly if your linker is profile guided (but even if not... rarely used code tends to get linked with other rarely used code).

It's fair that if you want to optimize startup times, that first page in hurts, although generally with C++ is's more about the symbol linking overhead than anything else (and there are a lot of strategies for minimizing that, but it sure isn't want happens by default).

cbsmith · on Sept 29, 2014

There is still link time if the types are shared across shared libraries (which happens a lot).

dllthomas · on Sept 29, 2014

There is also cache to consider.

cbsmith · on Sept 29, 2014

There is only cache to consider! ;-)

Other than cache, there are plenty of tricks that can cure a lot of other ills, but cache is precious.

dllthomas · on Sept 29, 2014

If you are paging, fixing that is a win on the order of fitting things in cache.

cbsmith · on Sept 29, 2014

Sure, but that's really just another form of "cache". Before you hit that, you generally get TLB stress on top of the cache line stress, so you're already hosted.

I have to say though, I've not seen too many cases where paging was caused by bloated object code size (lots of cases where it was caused by brutal bloat due to just outright bad code).

There was a time a couple of decades ago where it was a common issue with C++, but since then RAM has gotten a lot cheaper and C++ compilers/linkers have become much smarter (not to mention all compilers/linkers getting much smarter about code size). Sure, you can find pain points, but with a modicum of effort it's really hard to imagine a project really flapping outside the working set size purely due to the compiler/linker refusing to be drink from the water it has been lead to...

dllthomas · on Sept 29, 2014

Well, the original context seemed to more or less be someone saying "You're not really going to be paging anymore..." I suppose a more precise response on my part would have been "there are also other caches to consider", but meaning "processor caches" when saying "cache" with no other modifiers is fairly typical in my experience.

Upvoted anyway for precision.

cbsmith · on Sept 30, 2014

I will concede that my "there is _only_ cache to consider" statement was largely tongue in cheek.

In all serious though, swapping due to code bloat (as opposed to actual runtime bloat)... I haven't seen that in ages. Caches though... like a samurai with an absurdly sharp blade... they kill you almost every time, and usually you don't even know you're dead.

MordodeMaru · on Sept 29, 2014

Guys, you have managed to put our site down. We are working on it but HN has managed to put our site down.

Manu Sánchez will be conducting a talk about template metaprogramming in the C/C++ Madrid Meetup (Spain):

http://www.meetup.com/Madrid-C-Cpp/

You should now be able to access a style-less version of the post in the same URL.

flohofwoe · on Sept 29, 2014

The article is good about explaining when and how code is generated, but I think that the fibonacci example does more harm then good because this is one of those fancy 'template hacks' which I wouldn't use for real-world code. A few really useful real-world use cases for templates are (IMHO):

- they can make interfaces more simple by using template methods/functions (have a single method which is specialized by the compiler, instead of polluting the API with 20 slightly different methods which only differ by the argument type)

- they can enable a coding style which 'looks and feels' almost dynamic without loosing the strict type checks and optimization opportunities that a static type system provides

- they can help to keep related data close together in memory and minimize dynamic memory allocation (e.g. an array of structs instead of an array of pointers)

- the compiler has more optimization opportunities because it has more type information (vs. a more dynamic system where types are only known at runtime)

The tradeoff is of course that the compiler generates more (specialized) code, it breaks the simple rule that the amount of generated code grows linearly with the number of lines of code. On the other hand the compiler has a lot more type info to optimize the generated code. Still the programmer has to know what's happening under the hood so he can weigh the advantages against the disadvantages.

[edit: formatting]

cbsmith · on Sept 29, 2014

Part of what is great about C++11/C++14 is constexpr's. It moves ll this "compute ahead of time" logic in to a simpler and more familiar structure.

There is actually a tremendous amount of value of precomputing a lot of work that is currently computed at runtime (the amount of wasted CPU in your typical C++ program, particularly at start time, is kind of crazy). It's just that the real hot spots in real world programs programs tend to revolve around runtime dynamic logic.

Manu343726 · on Sept 29, 2014

I'm agree with you and aware about the fact that the Fibonacci metafunction is not a good example of real metaprogramming (Computing Fibonacci series using tmp has no much sense). I only picked that example because its recursive structure is pretty well known (Is one of the most common first examples used to teach recursion), so fits well to show people how a complex template is instanced, which is the idea of that part of the post. Compare that with using an instance of std::vector, for example. Is a much more realistic example, but people usually doesn't know about the inside of a vector and how it's structured internally. The main objective of these posts is to explain tmp in away that everybody understands it well. When we all are well covered about the basics, we will can get into more complex and realistic cases in depth. That's exactly why more complex examples included in the post, like build_string, are not explained in depth. Are more a proof of concept.

TillE · on Sept 29, 2014

> a coding style which 'looks and feels' almost dynamic

Exactly. Templates let you do compile-time duck typing. That's extremely useful in so many scenarios.

Animats · on Sept 29, 2014

Yes, you can, but you probably shouldn't.

Template "metaprogramming" is something that originated by accident, as a side effect of the original template mechanism, which was intended as a way to allow writing small generic functions. Then it got a fan club, and influence on the C++ standards committee. Now the C++ language design has gone off into template la-la land, with lots of feature support for a bad idea.

It's kind of cool that rewrite rules are Turing complete, and that C++ lets you do things like compute Fibonacci numbers at compile time using recursive rewrite rules. This is a terrible way to program. Someday, someone will have to debug the thing, and it won't be easy.

There's a long history of this sort of clever stupidity. LISP doesn't have a FOR statement. So one was created as a compile-time macro. Here's the MIT Loop Macro, Common LISP version:

ftp://ftp.cs.cmu.edu/user/ai/lang/lisp/code/iter/loop/mit/mit_loop.cl

All 1496 lines of it. It's very clever. It took over a decade to debug.

Don't go there.

millstone · on Sept 29, 2014

I am really struggling to understand the "build_string" example. Here's my attempt to pick it apart.

    template<std::size_t count , typename STRING>
    struct build_string_impl;

I think the above is a template that resolves to a forward declaration of a struct type called build_string_impl. It says, give me a size and a type STRING, and I give you a struct type.

    template<std::size_t count , char... Cs>
    struct build_string_impl<count,string<Cs...>>
    {
        using result = typename build_string_impl<count-1,string<c+count,Cs...>>::result;
    };

I think the above is a partial specialization of the above forward declaration. In this specialization, the typename STRING is not a string, but instead a variadic char-thingy. So the 'typename STRING' is misleading. This partial specialization creates a struct type that contains a member type 'result' that aliases the 'result' type member of a recursive template instantiation.

    template<char... Cs>
    struct build_string_impl<0,string<Cs...>>
    {
    	using result = string<c,Cs...>;
    };

This is a partial specialization that represents the base case. It declares a template that creates a struct type build_string_impl that contains a member type 'result' that aliases the 'result' of a string type.

Where I get lost is here:

    build_string_impl<count-1,string<c+count,Cs...>>

This attempts to instantiate the build_string_impl template with a size_t and a string. But there is no implementation of the template with these types - only for size_t and variadic char. So how does this get instantiated?

Manu343726 · on Sept 29, 2014

There could be a bug on the code snippets included inside the post, sorry. Check the code running at compiler explorer, it's working and you can see clearly the result string literal at the end of the assembly.

The point of the example is: We represent a string as a variadic pack of chars, and build_string is a metafunction that builds that kind of string recursively given a starting character and a size. Inside that metafunction we define an aux metafunction build_string_impl that does the job, we only have to pass the required initial parameters (The count of chars and the initial empty string). The first partial specialization acts as the recursive case of that function, and the second is the base case (note count is 0 in that case).

recentdarkness · on Sept 29, 2014

Since the site appears to be unreachable at the moment here's the cached link for convenience: http://webcache.googleusercontent.com/search?q=cache:ldrpOr9...

hobo_mark · on Sept 29, 2014

There are so many typos in the code that it must have been typed in manually rather than pasted from working examples, but it was very very informative.

MordodeMaru · on Sept 29, 2014

It could also be the version. When we uploaded it for the first time, all the codes >,< were interpreted by a WP plug-in wrong so we amended them on the go. The version being shown is probably the one before that correction.

Animats · on Sept 29, 2014

Linked site appears to be down.

ExpiredLink · on Sept 29, 2014

"Modern C++" is outdated nowadays. Just look at the C++ code published by Google.

omegote · on Sept 29, 2014

For example? Last thing I read from Google about C++ was the Coding Style Guide (http://google-styleguide.googlecode.com/svn/trunk/cppguide.h...)

Manu343726 · on Sept 29, 2014

The term "Modern C++" refers to the fact that most of the C++ industry still depends on or develops C++98/03 code. Is "Modern" compared to that, and the term is used to refer to state of the art C++ instead of old C++98.

cbsmith · on Sept 29, 2014

"Modern C++" often refers specifically to techniques from Alexandrescu's book" http://erdani.com/index.php/books/modern-c-design/

Thanks to updates to the language, some of the hackery in there is no longer nearly so devilish, and there the language standard provides idioms that are simpler but "good enough if not better" for most people (tuples, smart pointers, move semantics, etc.), so a lot of the specific designs in the book _are_ starting to become passé. Still, it is a fantastic book for understanding the expressive power of C++'s multiparadigm design.

Manu343726 · on Sept 30, 2014

+1 to alexandrescu's "Modern C++" showing the expressive power of C++. Also, imho, one o the key points of the book is how Alexandrescu criticises the OO patterns of the so horrible Gang Of Four. Anybody who have readed their book in depth should note that they provide a poor cookbook with "Do you have this problem, copy-paste that code" notes and an implementation provided by someone thats completely biased with all the OOP buzzwords, but never implemented them seriously in a real language.

cbsmith · on Sept 30, 2014

Yeah... no.

First, GoF and design patterns in general isn't really meant to be "copy and paste" code. They're very clear that the code is provided merely for illustrative purposes.

Most of the design patterns in the GoF book are actually cribbed/realized in standard libraries of various languages where they work quite well. The problem is, those languages look almost nothing like C++. You could describe it as "biased with OOP", but it's really more that C++'s peculiarities (value semantics, comparatively complex and tightly coupled inheritance semantics, a purely functional meta-object protocol with very limited reflection semantics, and yes, it's multi-paradigm approach) make it serve as a bad fit.

Ironically though, much of MCPP is proving C++-style manifestations of GoF design patterns...

cbsmith · on Sept 29, 2014

That's funny... most people argue that Google C++ style guide is outdated (not surprising, considering they have a large legacy code base and a large programming team they need to drag with it...).