I claim the MS Crypto API is even worse. (I used both Crypto and ETW on the same project; ETW is not nearly as bad, though good luck tuning it not to lose buffers under load.)
I was implementing TLS/SSL in one of the services while working at MS. I couldn't figure out many things from MSDN and the samples - they didn't cover the error paths and some variant code paths, and there was just no way to figure them out. The required recovery would be something like "the third buffer will contain value x, which you have to pass to this other function". And it needed to be done correctly, for obvious reasons. So finally I requested access to the IIS sources to see how it's done right, and discovered a couple thousand lines of code with comments like "xxx told us that buffer X will contain value bla if condition Z is met, and this is why we are doing it here". I had no choice but to cut-n-paste it into my service. I can tell you for sure: nobody outside MS, without access to internal sources, can implement TLS/SSL correctly using the MS Crypto APIs.
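For the curious, the dance being described lives around InitializeSecurityContext in SChannel/SSPI. Here's a minimal sketch of the infamous "extra data in the input buffers" step, written from memory - the flags and error handling are approximate, so treat it as illustrative rather than authoritative:

    #define SECURITY_WIN32
    #include <windows.h>
    #include <security.h>   /* SSPI; link with secur32.lib */

    /* One continuation step of the client-side handshake. Returns how many
       trailing bytes of recvBuf were NOT consumed (SECBUFFER_EXTRA) and must
       be carried over into the next call - the undocumented part. */
    static size_t handshake_step(CredHandle *cred, CtxtHandle *ctx,
                                 char *recvBuf, size_t recvLen)
    {
        SecBuffer inBufs[2] = {
            { (unsigned long)recvLen, SECBUFFER_TOKEN, recvBuf },
            { 0, SECBUFFER_EMPTY, NULL },   /* SChannel may rewrite this one */
        };
        SecBufferDesc inDesc  = { SECBUFFER_VERSION, 2, inBufs };
        SecBuffer     outBuf  = { 0, SECBUFFER_TOKEN, NULL };
        SecBufferDesc outDesc = { SECBUFFER_VERSION, 1, &outBuf };
        unsigned long attrs   = 0;

        SECURITY_STATUS st = InitializeSecurityContextA(
            cred, ctx, NULL, ISC_REQ_STREAM | ISC_REQ_ALLOCATE_MEMORY,
            0, SECURITY_NATIVE_DREP, &inDesc, 0, ctx, &outDesc, &attrs, NULL);

        if (outBuf.pvBuffer) {
            /* ...send outBuf.cbBuffer bytes to the peer (elided)... */
            FreeContextBuffer(outBuf.pvBuffer);
        }
        /* The folklore: on SEC_I_CONTINUE_NEEDED, the *input* descriptor's
           second buffer may come back as SECBUFFER_EXTRA, meaning the tail of
           recvBuf belongs to the next handshake record and must be fed into
           the next call. Miss this and the handshake only breaks when records
           arrive fragmented - i.e., in production. */
        if (st == SEC_I_CONTINUE_NEEDED && inBufs[1].BufferType == SECBUFFER_EXTRA)
            return inBufs[1].cbBuffer;
        return 0;
    }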
I think it has to be the sheer number of APIs that Microsoft has developed: they just don't have the resources to care about very many of them. It stems from Microsoft's extremely insular culture in the 1990s and 2000s, where being a "Microsoft developer" meant that you just didn't read code for open-source projects, didn't look at competing APIs, just did everything your own way. Apple had its own bad APIs, and blew a bunch of them away when OS X came along. So you want TLS/SSL on OS X, the typical way to do it was to just use OpenSSL. OpenSSL has tons of problems, but at least they're well-understood problems, with source code and documentation.
My point was that even though OpenSSL documentation is bad, enough people understand how it works that using OpenSSL may be preferable to designing a new API... unless/until you have the resources to design a better API.
"The API is simple because the problem it’s solving is trivial."
I beg to differ -- as ridiculous as the API may be, the problem it's solving is most certainly not even close to trivial. High-performance logging for something as low-level as the thread scheduler is not something you can write in your sleep.
I think that may be part of the author's point. The API isn't solving the underlying problem; it's solving the API problem, and those are different things.
Quite often a lot of the implementation complexity is exposed even though there's no real need for it as far as the API is concerned - largely because no one really designed the API; it just followed on from the implementation.
Not that I'm defending the implementation, but I do have one quibble with the article. The author didn't need real-time events, but he used a real-time API. Under those circumstances it's pretty hard to come up with a protocol that doesn't expose some of the complexity without incurring a penalty on either the system or the API. Having said that, the usual approach is to restrict what you can do so it can't cause damage, in which case you can usually make a simple API.
A great example of this is the OpenGL API for fixed-function stuff (say, pre 2.0).
Simple glBegin, glEnd was easy to understand, but hid a lot of complexity. As new features were added (and, woefully, old calls supported and not deprecated) the API got more complicated and harder to use without trampling internal state.
APIs (usually) should be as simple as possible for most use cases, regardless of how gnarly the implementation is.
When an engineer uses the word "trivial," what you should hear is "There are some complications that I'd rather ignore, so let's just hand-wave the answer".
What are these complications? The API's job appears to be to let you iterate through a list of log items/read data from a log buffer/however you want to imagine it. That sort of thing is not rocket science, no matter how difficult it was to make that data in the first place.
(Besides, even if you don't think there's anything wrong with the way it provides the caller with data from the list, there's always the session nonsense to point and gawp at.)
Uh, for starters, the buffer doesn't have infinite size. It will overflow. What is the system supposed to do here? There are a million possibilities (discard old data, discard new data, allocate more memory, write to a file, call a callback, return an error, stall the rest of the system or halt the clock, etc.); some make sense, some don't. Among those that do, the user needs to be able to choose the best option -- and the time-sensitive nature of the log means you can't just do whatever pleases you; you have to make sure you don't deadlock the system. That's not by any means a trivial task, and I'd bet the reason you think it's so easy is that you haven't actually tried it.
Yes, that's reasonable. But I'm not sure how this doesn't just boil down to configuring how the list is built up. You'd still be iterating through the list afterwards.
The system's hands are somewhat tied, I think. The events are building up in kernel mode, so it can't just switch to the callback for each one - not least because the callback might be executing already (possibly it was even the callback that caused whatever new event has been produced), and a callback isn't really practical anyway, since it would involve switching back to user mode. So all it can do when an event occurs is add the event to a buffer - handling overflow (etc.) according to the options the caller set - for later consumption by user-mode code. In short, it's building up a list, and perhaps the API could reflect that.
This is not to suggest that it would be easy to get to there from here. I've no doubt it could be literally impossible to retrofit an alternative API without rewriting everything. Just that I don't see why in principle an event tracing API can't work in some more straightforward fashion.
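Concretely, I'm imagining a consumer-side shape something like this - every name below is invented purely for illustration, not a real interface:

    /* Hypothetical simplified consumption API. */
    TraceSession *s = trace_open(&cfg);       /* overflow policy lives in cfg   */
    TraceEvent    ev;
    while (trace_next(s, &ev) == TRACE_OK)    /* just iterate the kernel's list */
        handle_event(&ev);
    trace_close(s);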
> What is the system supposed to do here? There are a million possibilities…
No, there are two: You dump old data or you dump new data. Everything else should be up to the user code. It's really not as difficult as you are making it out to be. There's certainly no excuse for a ridiculous API as described in the article.
Huh? If you dump data you miss events. Imagine if Process Monitor decided to suddenly dump half of the system calls it monitored. Wouldn't that be ridiculous? For a general event-tracing system, there have to be more options provided. Maybe it wouldn't matter so much for context-switching per se, but for a ton of other types of events you really need to track each and every event.
Yes, you miss events. But if you try to build the kitchen sink into your low-level logging system then it ceases to be low-level. If your logging system allocates memory, then how can you log events from your VM subsystem? If your logging system logs to the disk, then how do you log ATA events? It becomes recursive and intractable.
The solution is to make your main interface a very simple pre-allocated ring buffer and have userspace take that and do what they please with it (as fast as it can so things don't overflow).
There is always a point at which your logging system can't keep up. At the kernel level you decide which side of the ring buffer to drop (new data or old) and at the userspace level you decide whether to drop things at all or whether to grind the system to a halt with memory, disk, or network usage.
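A minimal sketch of that kind of pre-allocated ring buffer with a pick-your-victim overflow policy - illustrative only, not ETW's actual internals, and it assumes fixed-size events with the capacity a multiple of the event size:

    #include <stddef.h>

    enum overflow_policy { DROP_NEW, DROP_OLD };

    struct ring {
        unsigned char *buf;        /* pre-allocated at session start; never grows */
        size_t cap;                /* fixed capacity */
        size_t head, tail;         /* monotonic write/read counters */
        enum overflow_policy policy;
        unsigned long dropped;     /* count losses instead of blocking */
    };

    /* Record one event: never allocates, never blocks, so it stays safe to
       call from the VM or disk paths the logger itself depends on. */
    static void ring_put(struct ring *r, const void *ev, size_t n)
    {
        if (r->cap - (r->head - r->tail) < n) {    /* buffer full */
            if (r->policy == DROP_NEW) { r->dropped++; return; }
            r->tail += n;                          /* DROP_OLD: overwrite oldest */
            r->dropped++;
        }
        for (size_t i = 0; i < n; i++)
            r->buf[(r->head + i) % r->cap] = ((const unsigned char *)ev)[i];
        r->head += n;
    }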
The options are not simply "drop data" or "don't drop data". The options depend on the logging source, because not every logging source requires a fixed-size buffer. The API itself needs to support various logging sources and thus needs to support extensible buffers (e.g. file-backed sources, the way ProcMon does). Whether or not a particular logging source supports that is independent of whether or not the generic logging interface needs to support it.
I think we're talking past each other here. I don't think we're disagreeing on the userspace part. I'm not even implying that the low-level kernel interface should have unconfigurable buffer sizes. They should be configurable, but pre-allocated and non-growable. You're right, the userspace part can do whatever it wants. But I stand by my last paragraph (you either drop data or grind things to a halt).
> Huh? If you dump data you miss events. Imagine if Process Monitor decided to suddenly dump half of the system calls it monitored. Wouldn't that be ridiculous?
All sorts of systems have worked like this in the past (search for "ring buffer overwrite"). If you can't assume unlimited storage, you have to make a decision whether it's more important to have the latest data, dropping older samples, or whether it's more important to maintain the range of history by lowering precision (e.g. overwriting every other sample).
> but for a ton of other types of events you really need to track each and every event.
If you really need this, you have to change the design to keep up with event generation. That's outside the scope of a low-level kernel API where performance and stability trump a desire for data.
Well, the API in question (which I've used, and it was indeed an unpleasant experience) might not be solving something trivial, but it's certainly not well designed.
My all-time worst API is SetupAPI, which despite its name is how you get access to USB devices on Windows. It's... pretty miserable. Runner-up is the COM-based stuff that manages the Windows firewall, which is not well specified and has 'interesting' timing issues.
I have mercifully lost most of my memory of the Java stuff I was doing 15 years ago. That stuff made me hate life.
My award goes to Extended MAPI. It took me weeks of trial and error just to read and send email messages through an Exchange Server. I remember people were selling 3rd party wrappers for the API, because it was so horrible.
I used the Extended MAPI API for years ... I thought there was something wrong with me as I struggled through the insanity ... until I saw modern APIs.
The API is not the service. The service is solving a hard problem. The API is reading rather simple data from the service in batches. The API's problem is a trivial one.
Poster has not worked very deeply with Win32, is unaware of its conventions. Film at 11.
For example:
> Yes, that’s right, every user of the Event Tracing for Windows API has to do the arithmetic and layout of the packed structure format themselves.
These represent very common idioms in Windows. One is about binary compatibility: Microsoft can change the length of the structure in a future rev of the SDK, and old callers still work because they specify sizes and offsets - the library can look at these and know what to do. The other idiom (very common in the NT kernel, for example) is similar to what C99 introduced for structures with variable-length members - something C originally didn't do for you, and which even today, with it standardized, still gets pretty clumsy.
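Concretely, the size-and-offset idiom looks roughly like this - a from-memory sketch of the EVENT_TRACE_PROPERTIES setup, so check the SDK headers for the exact required fields (LogFileMode, buffer sizing, etc. are elided):

    #include <windows.h>
    #include <evntrace.h>   /* EVENT_TRACE_PROPERTIES, StartTraceA; advapi32 */
    #include <stdlib.h>
    #include <string.h>

    static TRACEHANDLE start_session(const char *name)
    {
        /* The blob carries its own total size plus the offset where the
           variable-length logger name lives; that's what lets old binaries
           keep working when a later SDK grows the fixed part. */
        ULONG total = (ULONG)(sizeof(EVENT_TRACE_PROPERTIES) + strlen(name) + 1);
        EVENT_TRACE_PROPERTIES *p = (EVENT_TRACE_PROPERTIES *)calloc(1, total);
        if (!p) return 0;
        p->Wnode.BufferSize = total;                          /* whole blob   */
        p->LoggerNameOffset = sizeof(EVENT_TRACE_PROPERTIES); /* name at tail */
        /* Note: per MSDN you only reserve the space; StartTrace copies the
           session name to LoggerNameOffset for you. */

        TRACEHANDLE h = 0;
        ULONG err = StartTraceA(&h, name, p);
        free(p);
        return err == ERROR_SUCCESS ? h : 0;
    }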
The author lost all credibility when he wrote this:
Why on earth would you say (LPSTR)(char *)? That is literally saying (char *)(char *).
To me, whether an API is "bad" comes down to questions like:
* How does it handle errors? Consistency is good. Swallowing them instead of surfacing them to the caller is bad.
* Does it give the caller the right level of detail about what is going on? It's especially common for an API to be a black box that completely fails under some condition the author did not envision. Some kind of escape hook that exposes implementation details makes library maintenance difficult, but sometimes it's needed.
I haven't looked too deeply at ETW, but I don't suspect it fails at these. Maybe it errs too far toward one extreme on the second bullet.
And that article, having the same tone, would be about it being bad. But not understanding the conventions is not the same as being bad. A lot of this coding style is hard fought and battle tested.
A cumbersome API is a cumbersome API, regardless of why the developers thought it should be that way. Author understood conventions, I wager, but was treating it under the lens of "If I had to design this now, the sane way, where would the mismatch be?"
The only saving grace of a lot of the MS stuff is that the MSDN docs are usually pretty good--usually.
No question about it. My answer would actually be a lot more about the history of both Win32 and Win16, both of which I programmed in C++ for years. Looking at something like that now and proclaiming it bad is about the same thing as deploring the Mongol invasions.
> Why on earth would you say (LPSTR)(char *)? That is literally saying (char *)(char *).
Possibly because it's not obvious that LPSTR is the same thing as char *. Sure, if you've done a non-trivial amount of Windows programming I'm sure it's one of those things you just get used to. But as someone coming from the Unix world, I can't imagine why you wouldn't just use the native types (same with DWORD and friends).
Yes, that's frequently done in win32 land, but in this case, I'd argue the following questions should have been asked:
1) Why provide a copy of the parameter at the end of a struct?
2) How will you ever expand that struct given the fact that you put a variably-sized member at the end?
3) Why doesn't the current version of the header come with a

    char SessionName[];

member as the last member, so it's at least halfway convenient?
4) No seriously - why is it copied in at the end and not a pointer?
5) That struct doesn't even have a DWORD size member as first member, are we expecting to get coupling between some flag and "oh and expect there to be additional members after that char array"?
> No seriously - why is it copied in at the end and not a pointer?
Consider where you have seen similar patterns in the Unix world. The obvious one would be that they intend to pass the buffer to kernel mode, and a structure with lots of pointers inside would be a pain in the ass to pass over and validate.
A flat buffer with a couple of offsets works better for that. Copy over the whole blob, check a few lengths. Generate your EFAULT errors in a single place. Better than following lots of user mode pointers.
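A toy illustration of the difference - a hypothetical request layout, not any real kernel interface:

    #include <stdint.h>

    /* Flat request: one contiguous blob; the name lives at an offset inside
       it. The kernel copies total_size bytes once, then merely range-checks
       the offsets - no chasing (and re-fetching) user-mode pointers. */
    struct flat_req {
        uint32_t total_size;   /* size of the whole blob, header included */
        uint32_t name_offset;  /* where the name starts within the blob   */
        uint32_t name_len;
        /* ...fixed fields, then variable-length data... */
    };

    static int req_is_valid(const struct flat_req *r)
    {
        return r->name_offset >= sizeof(*r) &&             /* past the header */
               r->name_offset <= r->total_size &&          /* starts in blob  */
               r->name_len <= r->total_size - r->name_offset; /* ends in blob */
    }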
Dude worked at RAD, so I'm willing to bet he's got some experience.
In your LPSTR example, that could've been to silence compiler warnings about doing pointer arithmetic.
EDIT:
From his bio:

> The most significant project I’ve created to date has been The Granny Animation SDK, a complete animation pipeline system that I first shipped in 1999 and which, 15 years later, still is in active use at many top-tier game studios.
So, uh, yeah. Maybe do some reading before spouting off on another's presumed abilities?
EDIT2:
Downvote all you want, but what code of yours has been in production for 15 years?
He's casting pSessionProperties to a char* so that he can do the arithmetic on it--otherwise, the compiler would assume "Oh, golly, I should increment the pSessionProperties pointer by LoggerNameOffset times the size of the struct it points to".
He has to make that conversion in order to do byte offsetting correctly. He then casts that back to what it wants (an LPSTR), to match the required argument type on the function.
It's completely reasonable code, so stop complaining about it as though it weren't.
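That is, the line under discussion is just the standard byte-offset idiom (a sketch reusing the article's variable names):

    /* Cast to char* so "+ LoggerNameOffset" advances by bytes, not by whole
       structs; then cast back to the LPSTR the function signature wants. */
    LPSTR name = (LPSTR)((char *)pSessionProperties
                         + pSessionProperties->LoggerNameOffset);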
So, at that point, it gets to be a little more philosophical, right?
I err on the side of "Oh, the API requests this type (which I'll pretend I don't know is actually just a char*), so I will explicitly cast to that". At least that way it's clear in the future that there is some explicit changing of types going on if, say, LPSTR ever changes.
Silent "Oh, well, we all know that it's really going to be a <whatever> pointer here anyways" is a good way to get subtle bugs.
PSTR and PWSTR have a size implicit in their name. They are not going to change. PTSTR (so far not talked about) happens to change based on a macro but I would not recommend using it in this century - it's easier to build all your Windows apps as utf-16 and pretend everything is PWSTR (could be a whole other topic).
It sounds more like you err on the side of inserting lots of pointer casts into the code without considering or very well understanding what the types mean, in order to "shut up compiler warnings" that might not even exist. This is pretty common but it is often a really good sign of someone who doesn't know what they are doing, they are fighting the compiler warnings in their own head instead of solving real problems. (It's really easy for a pointer cast to mask a bug too.)
Frivolous pointer casts are always suspicious. It's much better to let your compiler generate the warnings, listen to them and understand them, and in a lot of cases, fix issues without mindlessly putting in a cast.
Whatever dude. Go on living in your magical bubble (in Redmond, perhaps...?). You've seen the arguments here against why your earlier analysis is wrong, and are hellbent on insisting that writing more explicit code is not a good idea because you're such an elite h4x0r.
With any luck I won't have the pleasure of sharing a codebase with you.
They made a big bet on 16 bit chars before utf-8 existed. Lots of stuff from the same time period stuck with the same choice (java is one example). I am not a fan or opponent of utf-16, utf-8 does work well, but in many places a larger char type is a reality. In Windows it's the only way that makes sense, by the time you get to a syscall you need 16 bit strings and support for anything else without conversion is considered legacy.
I'm not even saying this is the only way to structure a code base or that I'm unwilling or haven't seen or worked with something else. I'm talking about what the sane conventions for a Windows app would be. When in Rome, and all that. I would not advocate utf-16 on Unix (even if that's what millions of people using say Java end up getting).
The docs remain silent on the subject but since ControlTrace takes a TCHAR * I guess the logger name in the struct could be a TCHAR[] too. So perhaps LPTSTR was intended.
It has a lot of numbered "ICNTL" (input control) variables as well as named struct fields, and some of those struct fields are only valid if, for example, ICNTL(26) has the value 0, 1 or 2.
ICNTL is actually an array with 40 elements, though "only" 1-33 are currently in use. And then there's CNTL (float control parameters), INFOG, RINFOG and what not.
Also, the number of calls you must do to the actual processing function depends on some combination of those magic variables.
Needless to say, there are no named constants for all those magic variables, because the code must be backwards compatible with Fortran 77, where the behavior of code with identifiers longer than six characters is undefined and differs from compiler to compiler (some truncate, some allow them, some error out). So names are actually in short supply.
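For flavor, a hypothetical C binding in that style - the names here are invented, but the pattern is faithful:

    /* Hypothetical solver handle in the Fortran-77 control-array style. */
    typedef struct {
        int    icntl[40];   /* integer "knobs"; each slot's meaning is doc-only */
        double cntl[15];    /* float knobs */
        int    infog[40];   /* outputs, also addressed by magic index */
    } solver_t;

    /* Fortran arrays are 1-based, so every C caller recreates this macro. */
    #define ICNTL(i) icntl[(i) - 1]

    static void configure(solver_t *s)
    {
        s->ICNTL(26) = 1;   /* 0, 1 or 2 decides which *other* fields are valid */
        s->ICNTL(7)  = 5;   /* e.g. an ordering choice; no named constants */
    }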
I had amazing "fun" with Amazon's marketplace API. Highlights include:
- rejection without error messages
- multiple hours before the successful call showed up as successful
- broken XML schemas, along with conflicting versioning
My favorite:
- error messages from the API asking me to call customer service to perform that action.
According to Eric S. Raymond's version of the Jargon File, people used to joke that MS-DOS system calls delivered "same-day service." Now that we're living in the future, that joke has become literally true.
I thought the standard way to use ETW is by generating code from a manifest file with Message Compiler. Presumably it takes care of all this boilerplate? You're right though, that's a weird API. It'd be interesting to learn the process behind its design.
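If so, the generated header wraps everything in typed macros, roughly like this - the provider and event names below are invented; the EventRegister*/EventWrite* shapes are what mc.exe emits:

    #include <windows.h>
    #include "MyProviderEvents.h"   /* generated by: mc.exe -um MyProvider.man */

    int main(void)
    {
        EventRegisterMyCompany_MyProvider();   /* generated registration call */
        EventWriteFrameStart(42, 16.7);        /* one call per event; the args
                                                  mirror the manifest's event
                                                  template */
        EventUnregisterMyCompany_MyProvider();
        return 0;
    }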
This is the way I've done it. Which boils the problem down to an init call and a call per event. I dislike the code generated bindings though - is there a more... data driven alternative that's good?
I've got a simple event API where I have an event type identifier, and some basic data. Which I then need to pipe to some third party junk that listens to ETW events. To do this, I have to:
1) Manually map hundreds of event types to code generated ones, and update this mapping every time new events are added... or do a massive N:1 mapping... which is totally against the grain of the intended usage of the third party junk (although appears workable for the moment.)
2) Wrap every ETW event type with callbacks, and the init macro with a stub to allow callbacks, such that I may target the two "identically" configured endpoints... each with their own manifest and autogenerated header.
3) Wrap every ETW invoke with #ifdefs to limit them to the one of my three platforms they work on.
It's quite ugly compared to the equivalents of the other 2 platforms.
For a very deep, thorough and painful treatment instead of feel-good books, read "Practical API Design" by Jaroslav Tulach. Yes, it's Java, but it exemplifies the fundamental tradeoff that the feel-good books ignore: the more powerful you make your API for users, the less potential for evolution and long-term maintenance your API retains.
I offer a simple API decision which has condemned generations of programmers to useless toil: the decision in .NET not to map database NULL to the programming language's null. Perhaps there was some higher-level philosophical distinction being drawn which mere mortals are not capable of understanding.
If you feel the pain of public APIs, you should definitely go and try some of the internal APIs companies write for themselves. Somehow they can be even worse than what people used to write for Programming 101 homework.
I was half expecting the rant to be about TAPI. I remember at least one instance where it returns a void* pointer, with a few integers telling you the offset of the information you need to get.
But at least the function calls made sense. This is much worse.
A funny anecdote on the topic: when I worked on the WinFS team, folks tried to add new flags to the CreateFile API. Turns out, all 32 bits in dwFlagsAndAttributes were already taken, including the hidden usages inside the Win32 subsystem implementation.
"ALWAYS start by writing some code as if you were a user trying to do the thing that the API is supposed to do."
It still amazes me how many times junior devs ignore this. They will spend a lot of time walking through the UX scenarios, mocking it up, etc. But when designing the API layer, they just start creating endpoints instead of focusing on the "DX" scenarios (developer experience - not theirs, but that of the devs using the API).
I remember reading Cwalina's "Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries" and really learning what that means. It is a book focused on .NET but I wish many of my colleagues in the open source world would get over that and just read it. It would change their view on API design. And that is just the first chapter.
I do a modified version of this, mostly because I find if I start by writing code calling a "magical" API I sacrifice too much in the implementation to make the API exactly match my design.
I do start by writing code to solve my problem without an API, then work from there to derive a sane API. Usually something reasonable falls out from that, without forcing the machine to bend over backwards to support what I'm doing.
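In other words: write the dream call site first, then make it true. Everything below is invented by way of example (frame and ms assumed in scope):

    /* Step 1: the code I wish I could write (hypothetical API). */
    log_t *log = log_open("frames", LOG_DROP_OLDEST);
    log_write(log, "frame %d took %.1f ms", frame, ms);
    log_close(log);

    /* Step 2: only now design log_open/log_write/log_close so the above
       compiles - rather than exposing whatever the implementation
       happened to need. */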
I don't think this is a bad principle—APIs ought to make common needs trivial to meet. I do think it's an over-simplification that's worth refining.
(As an aside, I'm becoming more and more convinced that libraries implementing network protocols should pass network transport duties to the caller, at least in C-family languages. It seems worse for the caller, but trying to work e.g. a WebSockets library into a project with its own I/O event loop is its own exercise in pointless frustration.)
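For example, a WebSocket library shaped that way never touches a socket; the caller's event loop feeds bytes in and ships bytes out. A hypothetical interface in that spirit (all names invented):

    #include <stddef.h>

    /* "Bring your own I/O": the library is a pure byte/frame transformer. */
    typedef struct ws ws_t;

    ws_t  *ws_new(void);
    void   ws_feed(ws_t *c, const void *in, size_t n);     /* from your recv() */
    size_t ws_next_frame(ws_t *c, void *buf, size_t cap);  /* 0 = need input   */
    size_t ws_pending_out(ws_t *c, void *buf, size_t cap); /* for your send()  */
    void   ws_free(ws_t *c);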
Hmm, my experience was the other way around; the junior devs who'd read their Agile books always did this, it was the senior devs (or architects) who would produce baroque APIs that exposed all the possible options but didn't help you with any use cases.
The second edition of the Framework Design Guidelines book was written in 2008 and seems (from a 10 minute browse of the first chapter) to be pretty up to date still.
It's ISBN 9780321545671 if anyone's looking for it (it's on Safari Books Online, too)
Yeah, unfortunately this is one of those things one really ends up learning by experience. Once you've been writing code for long enough (and especially in a codebase that's been heavily worked on) is when the developer experience starts to be a thing that really occupies your mind when building.
Yeah, when I make APIs or configs (or whatever) I usually start with the message (or config) format that I would want to use myself, then go from there. I should think that would be pretty natural for people wanting to use their own APIs, though.
ETW is much more than a simple logging system; the description says "Use ETW when you want to instrument your application, log user or kernel events to a log file, and consume events from a log file or in real time."
It allows broadcast (multiple consumers), structured binary data, filtering based on data structure, etc. It is an immensely powerful system (See manifest-based events above). In addition to everything it does, it is also designed for performance and for not using too much space.
Re my attempt to use ETW: I began to drool when reading the docs; it would be immensely useful in my project. However, the product must also work on Linux, and building a cross-platform layer for something like ETW (even if the Linux counterpart is a no-op) would be overkill. I might return to it one day.
From the article: "It is a logging system, and it is used to record performance and debugging information by everything from the kernel upwards."
It's apparent that he did not do his homework. ETW is much closer to DTrace than to syslog (e.g., you can turn on and off certain events in a running application w/o disruption).
EDIT: It seems that the author uses ETW as an excuse for writing a rant about how an API should be designed according to his taste. Powerful, low-level APIs are difficult to use. Simple as that.
To be fair -- I am certainly not the only one that opened that page scared of seeing my own name. :-)
>> It’s a great time in the history of computing to be writing an article about bad APIs (which is another way of saying it’s a terrible time to actually have to program for a living).
It is hard to make good APIs. And for complex subjects it is probably impossible without iterating.
But you would think the guys that did the 2nd generation of VMS were smarter and more experienced than me? But OK, there wasn't much event-driven programming in VMS, I assume.
My vote for worst API I ever read:
I remember reading the "Inside Macintosh" (pre-Mac OS X) chapters about simple file I/O, and it took me multiple readings to realize that you just set some of the parameters to get all the different functionality... (The rewrite of the Inside books was really good.)