I wrote up the iterator-driven for back in 2011, because it was one of those things that had been long since forgotten, along with what it would look like were it to be incorporated into the (then) C++ standard.
Do you know how the break/return would get compiled? Would the yield function need to be transformed to return a status code that is checked at the call site?
It's a non-local goto (also a MetaWare language extension) out of the anonymous nested function that the for statement body becomes, to (effectively) an anonymous label right after the for statement.
Another part of the High C/C++ Language Reference describes non-local labels and jumps to them. It doesn't go into great detail, but it does talk about stack unwinding, so expect something similar to how High C/C++ implemented throwing exceptions.
Not sure, but IMO you could do it by basically reversing the call/return mechanism: whenever the iterator function returns, it saves its state to the stack, just as it would during a function call; and conversely, when the outside context hands control back to the iterator, it restores its state, analogous to how a return from an outside context would work.
That's not at all how MetaWare implemented iterator-driven for, though.
As Joe Groff said in the headlined post, MetaWare implemented it by turning the nested body of the for statement into an anonymous nested function, which is called back (through a "full function pointer") from the iterator function whenever there's a "yield()" in that latter.
So there's no "whenever the iterator function returns". It only returns when it has finished. The body of the for statement is called by and returns to the iterator function, which is in its turn called by and returns to the function that the for statement is in.
All of the "saving state to the stack" that happens is just the quite normal mechanics of function calling, with merely some special mechanics to pass around a pointer to the lexically outer function's activation record (which is why a "full function pointer" is not a plain "function pointer") as a hidden parameter so that the (anonymous) lexically inner function knows where the outer one's automatic storage duration variables are.
MetaWare also had non-local goto from within nested functions back out into lexically enclosing scopes. Since the for statement body is a nested function, break/continue/return (and of course goto) inside the for body are just a question of employing that already-available implementation mechanism, which in turn does the same sorts of things as throwing an exception does, unwinding the stack through the iterator function.
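To make that concrete, here is a rough hand-lowering in portable C of the shape described above. This is not MetaWare's actual code generation: the names are invented, the environment struct stands in for the outer activation record reached through the "full function pointer", and setjmp/longjmp stands in for its non-local goto machinery.

    #include <stdio.h>
    #include <setjmp.h>

    /* The outer function's activation record, shared with the loop body. */
    struct env {
        int sum;
        jmp_buf brk;    /* target of a non-local 'break' out of the body */
    };

    /* What the anonymous nested for-body might compile into. */
    static void loop_body(int i, struct env *e) {
        if (i == 5)
            longjmp(e->brk, 1);   /* 'break': unwind through the iterator */
        e->sum += i;
    }

    /* The iterator: each yield() becomes a callback into the body. */
    static void range(int n, void (*body)(int, struct env *), struct env *e) {
        for (int i = 0; i < n; i++)
            body(i, e);
    }

    int main(void) {
        struct env e = { 0 };
        if (setjmp(e.brk) == 0)    /* the label "right after the for statement" */
            range(10, loop_body, &e);
        printf("%d\n", e.sum);     /* 0+1+2+3+4 = 10 */
        return 0;
    }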
    void test(int a, int b);
    void foo() { test(b:3, a:4); }

4. nested functions:

    int foo(int i) {
        int plus(int a) { return a + i; }
        return plus(3);
    }

5. static nested functions:

    int foo(int i) {
        static int plus(int a) { return a + i; }
        return plus(3);
    }

Error: `static` function `test.foo.plus` cannot access variable `i` in frame of function `test.foo`
Every time Walter posts, it reminds me that my dream language would simply be C with https://www.digitalmars.com/articles/C-biggest-mistake.html and probably Go-style interfaces. Maybe a little less UB and some extensions for memory-safety proofs.
That's why DasBetterC has done very well! You could call it C with array bounds checking.
I occasionally look at statistics on the sources of bugs and security problems in released software. Array bounds overflows far and away are the top cause.
Why aren't people just sick of array overflows? In the latest C and C++ versions, all kinds of new features are trumpeted, but again no progress on array overflows.
I can confidently say that in the 2 decades of D in production use, the incidence of array overflows has dropped to essentially zero. (To trigger a runtime array overflow, you have to write @system code and throw a compiler switch.)
The solution for C I proposed is backwards compatible, and does not make existing code slower.
It would be the greatest feature added to C, singularly worth more than all the other stuff in C23.
I don't even understand how the flip happened: from C++ collection frameworks being bounds-checked by default (Turbo Vision, BIDS, OWL, MFC, PowerPlant, ...) to C++98 getting a standard library that does exactly the opposite by default, with strong cultural resistance on WG21 to changing it until governments started talking about security liabilities and which programming languages to accept in public projects.
As for WG14, I have no hope; they ignored several proposals, and seem keen on keeping C as safe as hand-written Assembly code. And even then, Assembly tends to be safer, as UB only happens when doing something the CPU did not expect; macro assemblers don't do clever optimizations.
I think what happened was that Turbo Vision, OWL, MFC, etc., were mostly for line-of-business applications: work order tracking, mail merge databases, hotel reservations, inventory management, whatever. But since the late 90s those have moved to Visual Basic, Perl, Java, Microsoft Java, Python, and JS. Only the people who really needed C++'s performance (and predictable memory footprint) kept using C++, and similarly for C.
Maybe as the center of gravity moves from people writing game engines and kernels to people keeping legacy code running, we will get more of a constituency for bounds checking.
> The solution for C I proposed is backwards compatible, and does not make existing code slower.
Where can I read about it? The only way I can think of to also make pointers to array elements safe is to replace them with triples: (base, element pointer, limit).
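For illustration, such a triple might look like the sketch below in plain C. All names are invented here, and this is just the idea from the comment above, not Walter's actual proposal:

    #include <stdio.h>
    #include <stdlib.h>

    /* The (base, element pointer, limit) triple. */
    typedef struct {
        int *base;    /* start of the underlying array */
        int *ptr;     /* current element */
        int *limit;   /* one past the end */
    } int_fat_ptr;

    static int fat_deref(int_fat_ptr p) {
        if (p.ptr < p.base || p.ptr >= p.limit) {
            fprintf(stderr, "bounds violation\n");
            abort();
        }
        return *p.ptr;
    }

    int main(void) {
        int a[4] = { 1, 2, 3, 4 };
        int_fat_ptr p = { a, a + 3, a + 4 };
        printf("%d\n", fat_deref(p));  /* OK: prints 4 */
        p.ptr += 1;                    /* now one past the end */
        fat_deref(p);                  /* aborts instead of reading garbage */
        return 0;
    }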
Thanks. I got interested in this topic because people are talking about writing OS kernel code in Rust, but (a) it only helps new code, and (b) it is very hard to justify rewriting millions of lines of C code in Rust (plus rewrites are never 100% faithful feature-wise). If, on the other hand, C can be made safer, maybe through a stepwise process where the code is rewritten incrementally to pass through C->C0->C1->Cn compilers, each making incremental language changes, much more code can be made safer. It will never be as good as Rust, but I do think this space is worth exploring.
When writing software I almost never find myself in a situation where UB is a design concern or needs to be factored around in the structure.
I almost always find myself struggling to name and namespace things correctly for long term durability. Almost all compiled languages get this wrong. They generally force you to consider this before you start writing code so you can explore the shape of the solution first.
I think lisp is the only language I've used where this wasn't a burden, but in reality, lisp then forces you to deeply ponder your data structures and access ideology first, so I didn't find it to be that rewarding in the long run.
I love that Go lets you bang simple "method-like functions" straight onto the type. This solves the first layer of namespace problems. It does nothing for the second, though, and in fact makes it worse by applying "style guidelines" to the names of the containing types. I am constantly let down by this when writing Go code, and I find it hard to write "good looking" code in the language, which is all the more frustrating because this is what the guidelines were supposed to solve in the first place.
I really just want C where I can namespace the functions that act on structs into the structs themselves. Then I can name things however I want and don't have to prefix_every_single_function() just so that the assembler and I can fully agree on the unmangled symbol-table name, which I will almost certainly never care about in 99% of what I compile.
There's a real joy to the fast initial development and easy refactoring you can find in scripting languages. Too bad they all have wacky C interfaces and are slower than molasses.
If you haven't already, I'd check out Zig. It does what you're describing, if I'm understanding correctly. There are some choices in that language I find annoying, but maybe you'll still enjoy it.
I was thinking D, but there is also basically a superset of D and High C that incorporates the best of both worlds; some people here may know it. It's called "HolyC", and it was the basis for a new operating system in the same way C was tightly integrated with UNIX. Written by Saint Terry.
I also think that the GC of D is really a nice feature. Sometimes you want manual memory management in low level code, but there is also a ton of stuff where it doesn't really matter and a GC just makes things easier. For example, if you're writing an in-memory cache service, you really don't want the cache entries themselves to be tracked by the GC since the GC is often unaware of the actual access patterns and just gets in the way, but most of the other components of that service are better served with a GC.
One unexpectedly nice feature of the GC is that it makes compile-time function execution easy, as the engine for it doesn't have to emulate your custom storage allocator.
> You got any idea why people hate the concept of nested functions in C?
It breaks the model of C mapping directly to the hardware.
Functions in C are just labels you jump to. If you want to capture values from the environment the function is defined in, you have to implement some sort of callable function struct, and it probably won't work with any shared libraries on your system. It would also be slightly slower, and C programmers tend not to be willing to make that tradeoff.
D (and Pascal) implement it by adding an additional parameter to the function called the "static link". (The stack pointer is the "dynamic link".) The static link points to the stack frame of the function it is lexically nested inside, and the compiler accesses the variables by offsets from the static link.
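In other words, a hand-lowered sketch of the nested-function example above might look like this in plain C, with the static link made explicit as a hidden parameter (names invented):

    #include <stdio.h>

    /* What the compiler effectively does with
       'int plus(int a) { return a + i; }' nested inside foo. */
    struct foo_frame { int i; };   /* the outer function's locals */

    static int plus(struct foo_frame *link, int a) {
        return a + link->i;        /* outer 'i' reached via the static link */
    }

    int foo(int i) {
        struct foo_frame frame = { i };
        return plus(&frame, 3);    /* the compiler passes the frame address */
    }

    int main(void) {
        printf("%d\n", foo(4));    /* prints 7 */
        return 0;
    }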
Related: The 'lcc-win' C compiler added operator overloading, default function arguments, and function overloading (see "generic functions") [1]. The Plan 9 C compiler introduced several language extensions, some of which, like anonymous structs/unions would eventually be incorporated into the C standard. Present day GCC accepts the -fplan9-extensions flag [2] which enables some nifty features, like automatically converting a struct pointer to an anonymous field for function calls and assignments.
Who was the genius behind these features? Someone at that company was incredibly forward-looking. Too bad it never got out into the world and influenced the language standards. It is surprising to see this so long ago.
CLU had for loops with iterators (generators) and yield in the mid–late 1970s [0]. The Icon programming language around the same time had similar generator features [1] (with yield spelled “suspend”). Ada (1983) also had such features I believe. These weren’t completely unknown language features.
MetaWare was a prolific compiler company based out of Santa Cruz in the 80s/90s. Loved what they did; they also had a very interesting culture. I knew about them through coughshadycough sites when learning and writing code back in the day.
Not really; if you dig into the archives of high-level programming languages since FORTRAN, Lisp, ALGOL, and COBOL sprang into existence, you will see lots of these language ideas.
You will also discover the rich history of systems programming languages, and how similar C's and Go's designs are in ignoring what was being done in other ecosystems, and past experience.
For anyone wondering why the string literals in the pictured examples end with ¥n rather than \n, it looks like these code examples were written in Shift-JIS, and Shift-JIS puts ¥ where ASCII has \.
This was originally just JIS Roman [0], the Japanese ASCII variant from 1969. Shift-JIS is what, much later, added double-byte character set support.
The problem with that is that the ASCII code for backslash is also used as the second byte of a two-byte character in Shift-JIS, which can sometimes cause Japanese string literals to not work properly in C. EUC-JP is better for this purpose, because it does not have that problem. (Using Shift-JIS with Pascal also avoids this problem, if you use the (* *) comments instead of the { } comments.)
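The classic demonstration, assuming the source file is saved in Shift-JIS and fed to a compiler that is not Shift-JIS-aware:

    #include <stdio.h>

    int main(void) {
        /* In Shift-JIS, 表 encodes as the bytes 0x95 0x5C, and 0x5C is
           ASCII '\'.  A non-SJIS-aware compiler sees the source bytes
           95 5C 5C 6E and tokenizes them as 0x95, the escape '\\', then
           'n' - so the newline silently disappears and a stray backslash
           is printed instead. */
        printf("表\n");
        return 0;
    }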
The author didn't provide information about when this book came out, nor is there any information to be found on it. But I think that at the book's release, Shift-JIS (the standard) didn't exist yet.
As mentioned in the article, Pascal had these even before Ada, and a task type with an entry is effectively a generator. I think people often forget that C was incredibly primitive for its time compared to multiple other languages.
Content aside, I'm fascinated by the typography in this book. It's simultaneously beautiful and horrendous!
I don't know enough about Japanese orthography or keming rules to be sure, but it looks very much like they took a variable-width font with both kanji and Latin characters and then hard-formatted it into fixed-width cells?
Either way, it's nice that the code examples aren't in 8pt font like a lot of the books I have...
This seems way ahead of its time, especially with generators. Maybe Fujitsu was able to just do it because they didn't bother with any of the lengthy standardization processes, but that's probably also why all these extensions seemed relatively unknown and had to be rediscovered and reinvented in modern C/C++ decades later.
It was not Fujitsu. It was MetaWare, which had a fair degree of experience with compilers. It had a contemporaneous Pascal compiler, which was quite well known, and Pascal already had nested functions.
I guess those people were part of the embedded-software industry before 2000 (and maybe today, I don't know). It's a very good thing that C, the lingua franca of modern computing, actually runs on everything and not just on the stuff we use to browse the internet.
There was a lot more innovation in computer architectures in the 80s and 90s. C89 is designed to permit implementation on unconventional hardware like the Lisp machine. C's flexible targeting is its greatest asset.
C's ability to run on "unconventional" hardware (though there is little unconventional about the Lisp Machine; it is just a stack-based machine) long predates C89 specifically.
ZETA-C, a C compiler specific to the Lisp Machine, was already fully fledged by 1987 or thereabouts. I don't have notes on when ZETA-C came to be, but it was much earlier than that; e.g., some of the headers are dated 1984.
One cool thing about ZETA-C was you could embed Lisp code in between C code:
    extern FILE *stdin, *stdout, *stderr;
    #lisp
    ;; We don't want this file to "own" these
    (zeta-c:zclib>initialize-file-pointer |stdin| 0)
    (zeta-c:zclib>initialize-file-pointer |stdout| 1)
    (zeta-c:zclib>initialize-file-pointer |stderr| 2)
    #endlisp
1. Standardized on two's complement.
2. Little endian.
3. We went from word-based memory systems to line-based ones.
4. RISC lost out to superscalar designs.
I don't think it was unreasonable at the time ANSI did the standardization originally. As with Common Lisp, they had to consider the numerous already-existing implementations and code written with them in mind.
Coroutines and generators were already well-understood then (see Icon!), so I think it is indeed mostly about not having to worry about standardization.
Question: Was the book from the screenshots composed in Japanese, or composed in English and then translated into Japanese?
Since it's apparently from Fujitsu, I could see it being the former, but if so, I'm impressed with the quality of the English in the printf statements and code comments from non-native English speakers.
The engineers who wrote the High C compiler must have been able to read and write English well enough to have read documentation and source code for existing compilers, and I would imagine that this book was written by the creators of the language.
Are there any good unofficial GCC plugins/extensions out there? It would be cool to extend C with a thing or two without adopting a full-blown compiler like C2 or C3.
Yes, but the language itself is OK-ish besides a pre-processor that's too skinny.
Standardizing libraries is one thing I really favor.
But force-fitting them into the language itself may be over-engineering.
Let me give an example. qsort() and bsearch() are powerful, but the call/return overhead is really unbearable. A (much) more powerful pre-processor that is really part of the language (and maybe syntax-aware) could help in creating templates that generate solid code for either function with the bare minimum of overhead.
I am not saying like C++ templates, but like C++ templates.
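Even today's preprocessor can stamp out a type-specialized sort with the comparison inlined, which is the kind of thing meant here; a syntax-aware preprocessor could make this far less clunky. A sketch (DEFINE_SORT and INT_LESS are invented names):

    #include <stdio.h>
    #include <stddef.h>

    /* Generates a solid, inlinable insertion sort for a given element type,
       avoiding qsort()'s indirect call per comparison. */
    #define DEFINE_SORT(name, type, less)                      \
        static void name(type *a, size_t n) {                  \
            for (size_t i = 1; i < n; i++) {                   \
                type key = a[i];                               \
                size_t j = i;                                  \
                while (j > 0 && less(key, a[j - 1])) {         \
                    a[j] = a[j - 1];                           \
                    j--;                                       \
                }                                              \
                a[j] = key;                                    \
            }                                                  \
        }

    #define INT_LESS(x, y) ((x) < (y))
    DEFINE_SORT(sort_int, int, INT_LESS)

    int main(void) {
        int v[] = { 3, 1, 2 };
        sort_int(v, 3);
        printf("%d %d %d\n", v[0], v[1], v[2]);  /* 1 2 3 */
        return 0;
    }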
The file does not display because the browser insists on percent-encoding the apostrophe but the server insists that the apostrophe should not be percent-encoded, therefore resulting in an error message that it won't redirect properly. I can download the file properly with curl, though.
I think these are good ideas.
- Underscores in numeric literals: I think it is a good idea and is also what I had wanted to do before, too. (It should be allowed in hexadecimal as well as decimal)
- Case ranges: GNU C has this feature, too.
- Named arguments: This is possible with GNU C, although a function doesn't work with it unless it is written to handle it (you can, however, use macros to adapt existing functions). You can pass a structure, either directly to the function or via a macro containing a ({ }) block that extracts the values from the structure and passes them along (the compiler will hopefully optimize out this block and just pass the values directly). You can then use the designated-initializer syntax (which still allows unnamed arguments too), and GNU C also allows duplicate initializers, in which case only the last one takes effect; that lets macros provide default values. (I have tested this and it works; see the sketch after this list.)
- Nested functions: GNU C also has them, but does not have the "full function value" that this one does, which I think might be helpful. Non-local exits can also be helpful. (I also think GNU's nested functions could be improved by allowing them to be declared "static" and/or "register" in order to avoid the need for trampolines, although "static" and "register" would both have their own additional restrictions: "static" can't access local variables and functions from the containing function unless they are also declared "static", and "register" means the address can't be taken (thereby allowing the compiler to pass the local variables as arguments to the nested function).)
- Generator functions: I like this too and I think that it is useful (I had wanted things like this before, too). It is also interesting how it can work well with the nested functions.
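Here is a sketch of the named-arguments trick described above, using the plain pass-a-structure variant (all names invented; needs designated initializers, and empty macro arguments as in GNU C or C23):

    #include <stdio.h>

    /* The function takes its arguments bundled in a struct. */
    struct test_args { int a; int b; };

    static void test_impl(struct test_args args) {
        printf("a=%d b=%d\n", args.a, args.b);
    }

    /* Defaults first; caller-supplied designated initializers come later
       and override them (the later initializer wins). */
    #define test(...) test_impl((struct test_args){ .a = 1, .b = 2, __VA_ARGS__ })

    int main(void) {
        test();                /* a=1 b=2  (all defaults) */
        test(.b = 42);         /* a=1 b=42                */
        test(.b = 7, .a = 9);  /* named, in any order     */
        return 0;
    }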
There are some other things that I also think should be added into a C compiler (in addition to existing GNU extensions), such as:
- Allowing structures to contain members declared as "static". This is a global value whose name is scoped to the structure within the file being compiled (so, like anything else declared as static, the name is not exported), and any access reaches the single shared value. Even in the case of e.g. (x->y), if y is a static member, then x does not need to be dereferenced, so it is OK if it is a null pointer.
- Scoped macros, which work after the preprocessor works. It may be scoped to a function, a {} block inside of a function, a file, a structure, etc. The macro is only expanded where that name is in scope, and not in contexts where a new name is expected (e.g. the name of a variable or argument being declared) (in this case the macro is no longer in scope).
- Allow defining aliases. The name being aliased can be any sequence of bytes (that is valid as a name on the target computer), even if it is not otherwise valid in C (e.g. due to being a reserved word). Any static declaration that does not declare the value may declare the alias.
- Custom output sections, which can be used or moved into standard sections in a portable way. These sections might not even be mapped, and may have assertions, alignment, overlapping, etc.
- Allow functions to be declared as "register". If a function is declared as "static register" (so that the name is not exported), then the compiler is allowed to change the calling convention to work better with the rest of the program.
The usual workarounds are a stateful API (e.g. Cairo, OpenGL, or Windows GDI), passing a structure explicitly (oodles of examples in Win32, e.g. RegisterClass or GetOpenFileName), or twiddling an object that's actually just a structure dressed up in accessor methods (IOpenFileDialog).
There could be reasons to use one of those still (e.g. extensibility while keeping a compatible ABI, as in setsockopt, pthread_attr_*, or arguably posix_spawnattr_*). But sometimes you really do need a finite, well-known but just plain large number of parameters that mostly have reasonable defaults. Old-style 2D APIs to draw and/or stroke a shape (or even just a rectangle) are the classic example. Plotting libraries (in all languages) are also prone to this. It does seem like these situations are mostly endemic to specific application areas—graphics of all kinds first of all—but that doesn’t make them not exist.
If you don’t want to use function-like macros for anything ever even if this particular one works, that’s a valid position. But it does work, it does solve a real problem, and it is less awkward at the use site than the alternatives.
With large numbers of parameters, it's almost always more readable to use a config struct. Especially since often, you want to collect configuration from multiple sources, and incrementally initializing a struct that way is helpful.
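Something along these lines, with defaults filled in first and each source layered on top (all names invented):

    #include <stdio.h>

    struct plot_opts {
        double width, height;
        const char *title;
        int grid;
    };

    static void plot(const struct plot_opts *o) {
        printf("%gx%g \"%s\" grid=%d\n", o->width, o->height,
               o->title ? o->title : "", o->grid);
    }

    int main(int argc, char **argv) {
        /* Defaults, then settings collected from several sources. */
        struct plot_opts opts = { .width = 640, .height = 480 };
        if (argc > 1)
            opts.title = argv[1];   /* e.g. from the command line */
        opts.grid = 1;              /* e.g. from a config file */
        plot(&opts);
        return 0;
    }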
There are a lot of syntactic sugar improvements the committee could make that they simply refuse to. Named parameters and function pointer syntax are compile-time fixes that would have zero runtime costs, yet it's 2024 and we've hardly budged from ANSI C.
Exactly. I actually think default parameters are hazardous without named-parameter support. When they added one, IMO they should have added the other as well, so that you can specify exactly which non-default parameters you're passing.
I think this is more an appeasement of the C++ committee because they don't like the order of evaluation to be ambiguous when constructors with side effects come into play. Witness how they completely gimped the primary utility of designated initializers with the requirement to have the fields in order.
Pascal lets you match a range of values with case low..high; wouldn't it be great if C had that feature? High C does, another feature standard C and C++ never adopted.
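GNU C (GCC and Clang) does offer it as an extension, spelled with an ellipsis; note the spaces around it, which GCC's manual warns are needed with integer and character constants:

    #include <stdio.h>

    static int is_digit(int c) {
        switch (c) {
        case '0' ... '9':   /* like Pascal's low..high */
            return 1;
        default:
            return 0;
        }
    }

    int main(void) {
        printf("%d %d\n", is_digit('7'), is_digit('x'));  /* prints: 1 0 */
        return 0;
    }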
Not "ages" in comparison to how long MetaWare had them. High C had this stuff back in the early 1990s and 1980s.
The headlined article doesn't mention it, but High C/C++ had modules all of those years ago, too. Anybase literals, as well. Tom Pennello participated in the standardization efforts back then, too, but none of this stuff made it in.
GCC nested functions are atrocious and deserve being banned from existence. Like the article rightfully says they've been implemented using weird hacks that make basically impossible to use them safely. There's a reason why Clang has categorically refused to implement them.
I am fortunate enough to own a copy of the High C/C++ Language Reference in English. (-:
* http://jdebp.uk./FGA/metaware-iterator-driven-for.html
* http://jdebp.uk./Proposals/metaware-iterator-driven-for.html