It's pretty trivial to implement this with at most 14 bytes of overhead (using small string optimization). More importantly, it's only 16 bytes on 64-bit systems, which pretty much by definition aren't that memory constrained (otherwise you'd be on 32-bit).
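A rough sketch of what a 16-byte string handle with small string optimization could look like, in C. Everything here is hypothetical and for illustration only: the names `sso_str`, `sso_make`, `sso_len`, `sso_data`, `sso_is_heap` and `sso_free` are made up, and the layout (15 inline bytes plus a tag byte, or a pointer plus a 7-byte length) is just one possible encoding. It assumes pointers are at most 8 bytes and that length-prefixed strings don't need a NUL terminator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte string handle (assumes pointers of at most 8 bytes).
 * Inline mode: raw[0..14] hold up to 15 bytes, raw[15] holds the length (0..15).
 * Heap mode:   raw[0..7] hold a char pointer, raw[8..14] a 7-byte length,
 *              and raw[15] is set to the tag SSO_HEAP. */
#define SSO_INLINE_MAX 15
#define SSO_HEAP 0xFF

typedef struct {
    unsigned char raw[16];
} sso_str;

_Static_assert(sizeof(sso_str) == 16, "handle must be exactly 16 bytes");

static int sso_is_heap(const sso_str *s) { return s->raw[15] == SSO_HEAP; }

static size_t sso_len(const sso_str *s) {
    if (!sso_is_heap(s)) return s->raw[15];
    uint64_t len = 0;
    for (int i = 6; i >= 0; i--)             /* decode 7-byte little-endian length */
        len = (len << 8) | s->raw[8 + i];
    return (size_t)len;
}

static const char *sso_data(const sso_str *s) {
    if (!sso_is_heap(s)) return (const char *)s->raw;
    char *p;
    memcpy(&p, s->raw, sizeof p);            /* pointer stored in the first bytes */
    return p;
}

/* Copies len bytes from src. Returns 0 on success, -1 on allocation failure. */
static int sso_make(sso_str *s, const char *src, size_t len) {
    memset(s->raw, 0, sizeof s->raw);
    if (len <= SSO_INLINE_MAX) {             /* short string: store inline, no allocation */
        memcpy(s->raw, src, len);
        s->raw[15] = (unsigned char)len;
        return 0;
    }
    char *p = malloc(len);                   /* long string: one heap allocation */
    if (!p) return -1;
    memcpy(p, src, len);
    memcpy(s->raw, &p, sizeof p);
    uint64_t n = len;
    for (int i = 0; i < 7; i++) {            /* encode length in 7 bytes (max 2^56 - 1) */
        s->raw[8 + i] = (unsigned char)(n & 0xFF);
        n >>= 8;
    }
    s->raw[15] = SSO_HEAP;
    return 0;
}

static void sso_free(sso_str *s) {
    if (sso_is_heap(s)) free((void *)sso_data(s));
    memset(s->raw, 0, sizeof s->raw);
}
```

With this layout, something like `sso_make(&s, "hello", 5)` stays entirely inside the 16-byte handle, while a 40-byte string costs the handle plus one heap allocation. The sketch leaves out a capacity field, which is part of why it fits in 16 bytes; growable string types usually store a capacity as well and end up larger.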