An update on GNU performance (arm.com)
161 points by ingve on Oct 21, 2018 | 64 comments



The original title, "An update on GNU performance" is missing some context that's mostly implicit from the source. It would be great if it said either "An update on GNU toolchain performance on ARM" or "An update on GCC performance on ARM".


Sigh, still hoping for the day it means Hurd with GNU toolchain...


If you don't mind me asking, why Hurd? Is it because it's a microkernel, or are there other things I'm missing?


I hope someday it can work with modern hardware.

When I read the headline I was expecting some benchmark tests related to it.

Imagine a world where you could just change your kernel via `pacman -R linux && pacman -S hurd` and have a good fallback should linux ever not be a desirable option, or at least have some competition.

Having a free microkernel implementation that is backward-compatible with linux would give enough user data to settle the monolithic vs. microkernel debate. Who knows which niche applications it would be ideal for? Mission-critical software, realtime processing, embedded software, small reliable image containers, a fresh bench for new OS developments?

There's also the social aspect of how linux is currently developed: what would happen without Linus's strong leadership? I am sure there's a very capable team there, but sometimes technical proficiency alone is not enough. Would a fragmented leadership handle the NSA's attempted breaches as well? Having a separate project that can handle the same applications would be good in my opinion.

Bonus points for getting rid of that (GNU[+-/])?Linux bullshit.


Having had to work on a locked down linux (stupid corporate rules) system for the last couple of months, the dedication to user freedom sounds enticing: https://www.gnu.org/software/hurd/community/weblogs/ArneBab/... . As a user, being able to install packages without root and start my own services and mount network drives without root would make my life a lot easier.


> As a user being able to install packages without root and start my own services and mount network drives without root would make my life a lot easier.

... but that's not a problem of linux, that's a problem of your company rules. They would put similar rules whatever the OS.


Locked down was probably not the right word; I'm just a normal user with no root access. It is a linux problem because the admins are trying to do something pretty sensible: not letting a single user screw up the machine for everyone. Restricting users is how linux achieves this; HURD took a different approach.


> As a user being able to install packages without root and start my own services

The future is today:) Nix is a package manager that can run without root. Gentoo prefix installs are an option. Or if you want something more traditional, proot can run a decent number of distros. Maybe also fakeroot? Haven't tried that personally.

No idea if there's a FUSE module for NFS/CIFS, though sshfs is a thing.


There are ways to prevent users from running any kind of executable from $HOME.


Oh, you have a policy problem rather than a technical problem. In that case, I'm not sure how it would help to have a system where user control was normal, since invariably it would be locked down in exactly the same way. I don't think HURD will prevent mounting /home with noexec.


I am not the OP, and by the way, mounting /home with noexec isn't the only way to prevent execution. :)


Polkit already allows regular users to manage services [1]. Of course that would require the admins responsible for the system to set it up in such a way.

[1] https://wiki.archlinux.org/index.php/Polkit#Allow_management...


I have seen “GNU” being used instead of “GCC” by many people. I’m not sure why.


Does anyone know why "char" is unsigned on ARM/gcc?

To me it seems like a weird design choice that only complicates porting software from x86.


Why would you expect char to be signed?

If you mean just because it's signed on x86, fair enough; but it sounds as if you think signed is just a more natural option. My intuition goes the other way, for what it's worth.

Anyway, here is one possible reason. It used to be that if you wanted to load a single byte from memory on ARM, the only way to do it always treated it as unsigned. So if you wanted to work with signed chars, you needed explicit extra instructions to do the sign-extension. This isn't true for more recent versions of the ARM ISA -- there's an LDRSB instruction to go with the older LDRB -- but it may be one reason why that choice was originally made.
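
Whichever default a given ABI picked, a program can ask the compiler instead of guessing. A minimal sketch, assuming only the standard `CHAR_MIN` macro from `<limits.h>`; the function name `plain_char_is_signed` is made up for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   CHAR_MIN is 0 when char is unsigned and negative when it is signed. */
int plain_char_is_signed(void)
{
    int from_limits = CHAR_MIN < 0;
    /* Cross-check: -1 converted to an unsigned char type becomes positive. */
    assert(from_limits == ((char)-1 < 0));
    return from_limits;
}
```

With GCC the default can also be overridden per compilation using `-fsigned-char` or `-funsigned-char`, which is sometimes used as a stopgap when porting code that silently assumed one or the other.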


I'd expect char to be signed, because all other integer types are signed by default.


"char" has something else unusual going on. If you compare it to an "unsigned char" in an if-condition, and compile with -Wall -Wextra, you'll get a warning about casting. If you compare it to a "signed char" on the same system, you'll still get the warning! In fact "char" is not considered to be exactly the same as "signed char" or "unsigned char"; there are 3 variants, even though logically it must be one or the other on a particular platform. So you could think of char as mostly characters, whether ascii, or iso8859-1, or utf8 code-units ...
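
The three-distinct-types point can be checked directly with C11 `_Generic`, which would refuse to compile if two of the listed types were the same. This is only a sketch, assuming a C11 compiler; `CHAR_KIND` and `char_types_are_distinct` are names invented here:

```c
#include <assert.h>
#include <string.h>

/* C11 _Generic selects on the static type of its operand. Listing all three
   character types is only legal because they are distinct types. */
#define CHAR_KIND(x) _Generic((x),      \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    default:       "other")

/* Returns 1 if the three character types are reported under three names. */
int char_types_are_distinct(void)
{
    char c = 0;
    signed char sc = 0;
    unsigned char uc = 0;
    /* All three have size 1 even though they are different types. */
    assert(sizeof c == 1 && sizeof sc == 1 && sizeof uc == 1);
    return strcmp(CHAR_KIND(c), "char") == 0
        && strcmp(CHAR_KIND(sc), "signed char") == 0
        && strcmp(CHAR_KIND(uc), "unsigned char") == 0;
}
```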

Functions like toupper() take an int but say "the value must be representable as an unsigned char" so you technically need to cast to "unsigned char" on most platforms, but I doubt anybody uses these functions for anything but ascii. They may work for single-byte locales like iso8859-1 etc if you have the locale env vars set right, but they won't work for non-ascii chars encoded as utf-8, which is generally what you always want to use these days. (There's towupper() which works with 2-byte locales like UCS-2, which is utf-16 without surrogate pairs, and can't represent all the new unicode chars, but you probably don't want to go there, you want some modern unicode library that works properly with utf-8 or UCS-4.)
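
For the cast itself, something like the following is the usual shape. It is a sketch for single-byte text in the default "C" locale, not a Unicode-aware solution, and `upper_inplace` is a hypothetical helper name:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Uppercases a NUL-terminated string in place. Each char is converted to
   unsigned char before the call, so bytes >= 0x80 are never passed to
   toupper() as negative ints (which would be undefined behaviour). */
void upper_inplace(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```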


towupper() works with wide chars, which aren't necessarily 2-bytes. In fact on UNIX-like systems wchar_t tends to be 32 bits and wide chars are usually UCS-4.


I honestly didn't know that ... I've never actually used the libc wchar_t functions :)


Good, because nobody should use wchar_t at all. It's the API that was thought up by people who got drunk and asked themselves "how can we make this char situation even worse?" wchar_t is widely recognized as one of those huge mistakes from the 90s, along with UCS-2. Today you should store strings as bytes using UTF-8, and if you need to handle them in a fixed-width format you would choose an explicit 32-bit-wide type.
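
To make "UTF-8 bytes in, explicit 32-bit code points out" concrete, here is a rough sketch of decoding one sequence into a `uint32_t`. The function name `utf8_decode_one` is invented for this example, and it is deliberately simplified: it checks lead and continuation bytes but does not reject overlong encodings, surrogates, or values above U+10FFFF, which a real library would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence starting at s, with n bytes available.
   Stores the code point in *out and returns the number of bytes consumed,
   or 0 if the input is empty, truncated, or structurally malformed. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
    size_t len, i;
    uint32_t cp;

    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    /* The lead byte determines the sequence length and the first payload bits. */
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n)
        return 0;                   /* truncated sequence */

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *out = cp;
    return len;
}
```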


Except when you’re on Windows and have to use WCHAR to handle Unicode characters because they use UCS-2, not UTF-8


On Windows WCHAR is defined to hold a 16-bit unicode character, and is defined to be unsigned. In standard C wchar_t can be any damn thing, and isn't even guaranteed to be wider than char. It can be signed or unsigned. It is useless.
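
If you want to see what a particular toolchain actually chose, the standard headers expose enough to find out. A small sketch, with hypothetical helper names `wchar_bits` and `wchar_is_signed`; the results differ between platforms, which is rather the point:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

/* Width of wchar_t in bits on this implementation. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Returns 1 if wchar_t is a signed type here, 0 if it is unsigned.
   WCHAR_MIN is 0 for an unsigned wchar_t and negative for a signed one. */
int wchar_is_signed(void)
{
    int from_limits = WCHAR_MIN < 0;
    assert(from_limits == ((wchar_t)-1 < 0));
    return from_limits;
}
```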


I know you're technically right, but it still seems bizarre to me to ever use char as an integer. If I wanted a byte-sized integer, I would use int8_t/uint8_t (today, anyway).

The only use for char that ever seemed intuitively reasonable to me was to hold ascii characters.


char has another use. I've always figured char pointers were the proper way to provide byte-level access to other objects, since char pointers are allowed to alias other pointer types. That is, they're not bound by strict aliasing rules. I don't think int8_t or uint8_t have the same special exception.

This means you could implement your own versions of memcpy, fread and fwrite by casting the void * arguments to char * , but if you cast them to uint8_t * , your code might not be correct.
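
A byte-wise copy along those lines might look like this. It is a sketch rather than a replacement for the library memcpy; `copy_bytes` is a made-up name, and it uses `unsigned char *`, which has the same aliasing exemption as plain `char *`:

```c
#include <assert.h>
#include <stddef.h>

/* Copies n bytes from src to dst; the two regions must not overlap.
   Accessing any object through a pointer to unsigned char is permitted
   by the aliasing rules, which is what makes this valid for any type. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}
```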


That rule also applies to unsigned char and signed char. In practice I think uint8_t and int8_t are usually just typedefs for these respectively, but in principle they needn't be, so you're correct that the aliasing exemption might not apply to those.

I would tend to prefer explicitly using unsigned char or signed char rather than plain char though, partly to signal that I am treating the bytes as integers rather than character. (Actually I would still use uint8_t even though I just learnt it might not be unsigned char, because it looks clearer in my eyes, but I'm not sure I should admit it here...)


Prior to C99 adoption, char was usually how you got an int8_t. ASCII characters are technically 7 bits, so should the extra bit be the sign?

That's all just having fun, I like the consistency argument and the fact (is it a fact?) that char is signed on most platforms.


> char was usually how you got an int8_t

Not sure if that's just a typo, but you would use a signed char, which is a different type to char even on implementations where char is signed. Part of the reason for this, of course, is because char can be unsigned so if you want a signed integer you have to specify that. But more philosophically, unsigned char and signed char are numerical types that are not meant to be characters (despite their names), whereas char is a character type that just happens to be backed by an integer.

Indeed I believe that int8_t is almost always just a typedef for signed char (but I still would use int8_t where available for clarity).


I'd expect them to go by the standard. "C Programming Language" has this to say:

> Whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive.

Kernighan, Brian W.. C Programming Language (p. 36). Pearson Education. Kindle Edition.


Well, char has two semantic meanings. Either as a raw byte, or as an ASCII value. Both are represented as unsigned values (at least conceptually), so making them unsigned-by-default is fairly reasonable. Integers in mathematics and common usage are signed, so making them signed-by-default is also fairly reasonable.

But if you think of char as a typedef [u]int8_t, then I do get the consistency argument.


> I'd expect char to be signed, because all other integer types are signed by default.

But why would you expect char to represent a generic integer in the first place?

It's just a wrapper for bytes, which in their nature are just bits devoid of traditional mathematical, numerical value.


Performance. Historically, ARM didn’t have a “load byte and sign extend” instruction (http://www.drdobbs.com/architecture-and-design/portability-t...), making loading a signed char and promoting it to an int slower than loading an unsigned char and promoting it to an int.

In C, a function argument or return value of type char gets promoted to int. So, code that uses char a lot does a lot of such promotions.


> In C, a function argument or return value of type char gets promoted to int. So, code that uses char a lot does a lot of such promotions.

I think you're confusing standard C and machine/implementation-specific behavior. (If not for yourself, for people who read your comment.)


It's worse than that: arithmetic is never done in types narrower than int, so even using + with a char type will do such a promotion.
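
A quick way to see the promotion is to add two bytes whose sum does not fit in a byte. The sketch below assumes 8-bit bytes for the wrapped result; `add_bytes` and `add_bytes_wrapped` are illustrative names:

```c
#include <assert.h>

/* Both operands are promoted to int before the addition, so the sum
   is computed in int and does not wrap at 8 bits. */
int add_bytes(unsigned char a, unsigned char b)
{
    /* The expression (a + b) has the size of int, not of unsigned char. */
    assert(sizeof(a + b) == sizeof(int));
    return a + b;
}

/* Truncation only happens when the int result is stored back
   into the narrow type. */
unsigned char add_bytes_wrapped(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```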


Signedness of "char" is purely platform-specific. If you want to write portable code, you always have to specify "signed" or "unsigned" before "char".
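
When the data already arrives as plain `char` (string literals, `char *` buffers), converting at the point of use gives the same answer on both kinds of platform. A minimal sketch with a hypothetical helper `byte_value`:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the byte at p as a number in 0..UCHAR_MAX, regardless of
   whether plain char is signed on this platform. Without the conversion,
   a byte such as 0xE9 reads as 233 where char is unsigned and as -23
   where it is signed. */
int byte_value(const char *p)
{
    assert(p != NULL);
    return (unsigned char)*p;
}
```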


Why not use {,u}int{8,16,32,64}_t from stdint.h? Then you don't have to think twice whether some type is signed or unsigned.


stdint.h was introduced in C99, yes? There's a lot of code that started before that and hasn't been converted to newfangled things.


> newfangled things

stdint.h is nearing 20 years old now, I don't think it counts as a "newfangled thing" anymore.

But yes, there is plenty of very obsolete software still out there.


> stdint.h is nearing 20 years old now, I don't think it counts as a "newfangled thing" anymore.

In the world of C, it is. Custom compilers for particular embedded systems, such as MCUs, often only speak C89 and nothing else. C99 still hasn't found complete and widespread acceptance.


The compiler is irrelevant here as stdint.h can come from a variety of sources. There's no reason to not have stdint.h, even on ancient as dirt embedded systems stuck on C89. Even in the worst case of there's no vendor-provided one you can still just define it yourself using a copy from almost anywhere else, there's very little to it.


stdint.h can't reliably be premade. A compiler is completely free to, say, make unsigned int any size as long as it can represent at least 65535. stdint.h has to work with the C base types, so it does have to be customized for a particular compiler.
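
In practice a hand-rolled substitute usually picks its typedefs from `<limits.h>`, which is available even in C89. A sketch of that idea, where `my_u32` and `my_u16` are hypothetical names and the checks assume the chosen types have no padding bits:

```c
#include <assert.h>
#include <limits.h>

/* Pick the first base type whose maximum is exactly 2^32 - 1. */
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int my_u32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long my_u32;
#else
#error "no 32-bit unsigned base type on this compiler"
#endif

/* Same approach for a 16-bit type. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short my_u16;
#elif UINT_MAX == 0xFFFF
typedef unsigned int my_u16;
#else
#error "no 16-bit unsigned base type on this compiler"
#endif

/* Returns the width in bits of my_u32 as the compiler laid it out. */
int my_u32_bits(void)
{
    /* Sanity check: the largest my_u32 value is 2^32 - 1. */
    assert((my_u32)-1 == 0xFFFFFFFF);
    return (int)(sizeof(my_u32) * CHAR_BIT);
}
```

The header still depends on the compiler's base types, but the preprocessor does the customizing instead of a person.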


that's assuming your compiler even ships any kind of library outside of a header for your hardware's specific functions


Because these are optional. Consider {,u}int_least{8,16,32,64}_t instead.
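
The catch with the least-width types is that they may be wider than asked for, so any wraparound has to be written out explicitly. A small sketch, assuming C99 `<stdint.h>`; `add16_wrap` is an illustrative name:

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 16-bit quantities with wraparound modulo 2^16.
   uint_least16_t may be wider than 16 bits, so the mask is what
   guarantees the wraparound, not the type. */
uint_least16_t add16_wrap(uint_least16_t a, uint_least16_t b)
{
    assert(a <= 0xFFFFu && b <= 0xFFFFu);
    return (uint_least16_t)((a + b) & 0xFFFFu);
}
```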


Technically optional, yes. Any implementation that lacks them is in for a very bad time, though.


> Why not use (...)

Because sometimes you are not the author of the software you want to port. Of course, when you start porting the software, you'd use a (predefined) typedef.


The only argument I can see for using other types is for int, when you want the natural-length integer; in any other case it seems that stdint.h types are better?


Using the natural integer whenever possible keeps the C code abstract. The program becomes less limited as it is ported to more capable machines with bigger integers, instead of continuing to pretend that everything is a 32 bit i386.

You need some low-level justification for using an uint32_t and such: like conforming to some externally imposed data format or memory-mapped register bank.

The justification for <stdint.h> is that it's better to have one way of defining these types in the language, than every program and every library in every program rolling its own configuration system for detecting types and the typedefs which name them. Let's see, for calling Glib, we use guint32, for OpenMax we use OMX_U32, ...

Funny how these situations persist almost 20 years after stdint.h was standardized (and a number of more years after being draft features).


Well, except that I don’t think most software authors actually know where it is possible to safely use an int versus one of the fixed-width stdint types. In particular, you now need to make sure that your code works correctly no matter what the actual size of an int is. This involves complicated knowledge like the integer promotion rules and how they interact with differently sized int, long int, etc. So instead of having portable software, you just have software that may fail in unexpected ways on different types of machines. I don’t actually know the rationale, but I would think that making it easier to write portable software was possibly one of the goals of introducing stdint.


The promotion rules only get worse when you use a type alias like int32_t, which could be a typedef for short, int, long or conceivably even char.

An expression of type int doesn't undergo any default promotion to anything, period; the widening promotion is only applied to the char and short types.

An int operand may convert to the type of an opposite operand in an expression, and an int argument or return value will convert to the parameter or return value type. That applies to int32_t also.
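
A classic instance of a fixed-width alias interacting badly with promotion is multiplying two `uint16_t` values. The sketch below assumes the common case of a 32-bit int; `mul16` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Multiplies two 16-bit values without overflow.
   Written as plain a * b, both operands would be promoted to signed int
   when int is 32 bits, and 0xFFFF * 0xFFFF overflows it, which is
   undefined behaviour. Converting one operand to uint32_t first keeps
   the arithmetic unsigned. */
uint32_t mul16(uint16_t a, uint16_t b)
{
    uint32_t wide = (uint32_t)a * b;
    /* Sanity check: dividing the product by a recovers b. */
    assert(a == 0 || wide / a == b);
    return wide;
}
```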

Anyway, you really have to know the rules to be working in C. Someone who uses fixed-width types for everything doesn't know what they are doing and is just taking swipes at imaginary ghosts in the dark out of fear.


> The program becomes less limited as it is ported to more capable machines with bigger integers, instead of continuing to pretend that everything is a 32 bit i386.

I'm guessing you've never ported code that was written for 32-bit ints back to a 16-bit architecture.

Always better to make your type sizes crystal-clear to the reader, IMO, even if it risks using a suboptimal word size on some other platform down the line.


> You need some low-level justification for using an uint32_t and such: like conforming to some externally imposed data format or memory-mapped register bank.

This is backwards. You should use uint32_t etc. like any higher-level language would, unless you have specific reasons to know you need a machine-sized type. Making your code behave differently in some arbitrary, untested way on machines with different default int sizes isn't going to make it "less limited", it's just going to make it broken.


Can you cite one page in the K&R2 where the authors are using some 32-bit-specific integer? Or any other decent C book: Harbison and Steele? C Traps and Pitfalls?


If you don't care about signedness of char, using it might be better for performance. Otherwise, yes, you're right.


Because a char is meant to be used as a character type not a numerical type, and sign doesn't mean anything in that context? Yes, I know people use it to mean byte (even though in most cases the compiler will promote it to 16 bit or whatever the word length of the platform is) but if you mean to use an integer, you should use an integer. The only time I can think of when you would use char as an integer is on 8-bit microprocessors with limited RAM/storage (which, given they're 8-bit, they probably will be limited). But if there are other use-cases, or my understanding is wrong (I haven't done any serious C/low level work in 20 years), please do correct me.


I thought that media codecs were handled outside of general purpose code. Like maybe a special chip or on a graphics processor.


It depends on the hardware. There may be offloading for h.264, VP8, for different profiles and for encoding or decoding. But not all combinations are available on all machines. So software decoding may still be needed.

Now software decoders are often written with some hand-optimized assembly loops, but those optimized variants don't cover all possible target platforms and not every single loop is hand-optimized. So everything that's either a plain-C fallback or just hasn't been optimized can benefit from compiler improvements.


Nope. They could be GPGPU accelerated, but if you aren't rendering to screen then it's not typically graphics processor territory.

Plus modern codecs are quite linearly data dependent, which is a bad fit for a GPU.


But aren't movies rendered to the screen?

My understanding is most smartphones at least have hardware decoding for H264.


Exactly, a GPU needs to have specific hardware (a dedicated ASIC) for H264 acceleration. It's not accelerated by the usual GPU hardware, which is sufficiently general-purpose that an accelerator running there would not be codec specific.


Arm even sells such a solution, there is a line of Mali video processing units.


This article includes a misleading graph. The "performance improvement" bar graph has bars that extend down into performance degradation, making the improvement look bigger than it really is. If you're going to label the bars as "improvement" then the y-axis should start at 1.


No, it's just normalized data. See the legend: "Throughput/latency improvement over glibc 2.27".

Rather than show the new metric and the baseline, they showed all metrics as a function of the baseline. All of them are an improvement.

This is a common idiom for compiler benchmarks -- IMO it's not misleading at all.

> making the improvement look bigger than it really is.

They even highlight the 1.0 line. IMO it would be confusing for them to start the y-axis at 1.0.


I don't recall seeing normalized data like this on a bar chart before. It's standard for line charts where the x-axis is sampled from continuous data, not bar charts. The 1.0 value is effectively the zero point, so bars should extend up and down from 1.0 (in this case only up because no performance got worse).


This kind of normalized bar graph is the most common way things like SPEC results are reported, although I think that showing one category of bars that's fixed to 1.0 to label what it's being normalized against is more common. (Annoyingly, throughput and latency are both shoved in the same graph, which gives the misleading impression that you're comparing one against the other).


Nice job! Would expect more implementation details from the article though. "Pattern-matching capabilities" is not very descriptive.



