Clang/LLVM using the built-in assembler builds all of FreeBSD (uiuc.edu)
72 points by X-Istence on Oct 30, 2010 | 37 comments



As a big fan of FreeBSD, I've been following this development with great interest. I just compiled a few select ports with clang, and I look forward to rebuilding all my ports to see what compiles and how the system feels speed-wise.

However, as a server admin, I've become fond of the extra layer of protection ProPolice (gcc's "-fstack-protector" option) provides. I know it's a small thing, but does anyone know if clang has a similar feature?

Also, are there any non-default optimizations I can try out? With the stock gcc in FreeBSD, I've been able to get away with using "-O3" on everything with no build failures or instability.


It looks like LLVM does have a stack protector pass:

https://llvm.org/svn/llvm-project/cfe/tags/Apple/clang-54.1/...
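
For illustration, here's a minimal sketch of the kind of function a stack protector is meant to guard (assuming clang accepts the same -fstack-protector spelling as gcc, which that pass suggests it does):

    /* overflow.c - build with something like: cc -fstack-protector overflow.c
       With the protector enabled, writing past the end of `buf` should abort
       at runtime instead of silently trashing the return address. */
    #include <string.h>

    void copy(const char *src)
    {
        char buf[16];        /* the on-stack buffer the canary sits behind */
        strcpy(buf, src);    /* no bounds check: classic smash target */
    }

    int main(int argc, char **argv)
    {
        if (argc > 1)
            copy(argv[1]);
        return 0;
    }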

As for non-default optimizations, this passage from the following article best explains how much there is for you to explore:

Which LLVM passes to use, and which order to run them, with what analysis in between, is a huge search problem, though. There are roughly 200 optimization and analysis flags, and these, mostly, can be run in any order, any number of times. If we want to run say, twenty of the passes (about what -O3 does), that’s what, about 10^46 arrangements!

http://donsbot.wordpress.com/2010/03/01/evolving-faster-hask...
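
If you want to experiment with pass selection and ordering yourself, the workflow is roughly this (the particular passes below are just a plausible hand-picked set, not a recommendation; check opt --help for what your LLVM build actually ships):

    clang -emit-llvm -c -O0 foo.c -o foo.bc                       # C -> LLVM bitcode
    opt -mem2reg -instcombine -gvn -licm foo.bc -o foo.opt.bc     # chosen passes, in order
    llc foo.opt.bc -o foo.s                                       # bitcode -> native assembly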


Great news. With this and clang compiling Linux, we might see it replace GCC on most/all free operating systems within a few years.


No, we won't. The respective communities have been developing with GCC in mind for years. While clang is fantastic, I doubt that an outright replacement will take place.

But emancipated co-existence would be neat.


Actually, quite a few BSD people will be happy to have a non-GPL compiler.

AFAIK, your comment is spot on for Linux, though.


For those BSDs that target a wide range of architectures (NetBSD is the canonical example), it'll be a while before they transition completely, since the gcc ports list is still considerably larger than the LLVM one.


Sure, OpenBSD won't be transitioning anytime soon either - indeed, there are a lot of architectures that aren't supported by LLVM yet, and making sure everything worked with gcc-4.x was enough work...


Yes, you're absolutely right. It's a blessing for the BSD community.

However, when someone mentions "most/all" free operating systems, then Linux and related ecosystems are one hell of a big elephant in the room.


Ubuntu is already considering switching. Most people don't develop with "GCC in mind". They just target it, but now that there's another compiler that compiles almost all the GCC C out there, the cost of switching is very low.


I'm not too familiar with C, but don't compilers have to adhere to the standard? Is the issue that developers depended on GCC-specific behaviour for things the standard doesn't cover?


The standard just means that if you write "standard C", the compiler will work as expected (for some reasonable value of expected). But compilers can add extensions, and GCC has a long list of extensions. The Linux kernel uses these GCC-specific features a lot!

Here's a list of extensions, if you're curious... http://linuxdevcenter.com/pub/a/linux/excerpts/9780596009588...
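
To give a flavour, here's a small (hypothetical) example using two of the common ones - statement expressions and typeof - both of which clang accepts as well, as far as I know:

    /* GNU statement expressions: the ({ ... }) block is an expression whose
       value is that of its last statement; typeof is also a GNU extension.
       Both show up constantly in kernel-style macros. */
    #define max_of(a, b) ({ typeof(a) _a = (a); typeof(b) _b = (b); _a > _b ? _a : _b; })

    int main(void)
    {
        int x = max_of(3, 7);    /* expands to a statement expression */
        return x == 7 ? 0 : 1;
    }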


I see, thank you. I wasn't aware that these were compiler-specific extensions.


gcc, like other compilers, has its own documented extensions, which are used in many projects (such as the Linux kernel, but also many interpreters/VMs that use direct threading, for example).

The clang devs try to support them nonetheless for compatibility, except, from what I understand, a few that are explicitly unsupported by design.
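
The labels-as-values ("computed goto") extension is the one behind direct threading. A toy sketch, not any particular VM's actual code:

    #include <stdio.h>

    /* Direct-threaded dispatch using GCC's labels-as-values extension
       (also accepted by clang): each handler jumps straight to the next
       opcode's handler instead of going back through a switch. */
    int main(void)
    {
        static void *handlers[] = { &&op_inc, &&op_inc, &&op_print, &&op_halt };
        int program[] = { 0, 1, 2, 3 };   /* opcodes: indices into handlers[] */
        int pc = 0, acc = 0;

        goto *handlers[program[pc]];
    op_inc:
        acc++;
        goto *handlers[program[++pc]];
    op_print:
        printf("%d\n", acc);
        goto *handlers[program[++pc]];
    op_halt:
        return 0;
    }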


Compilers are free to implement extensions if they want. A lot of programs depend on GCC extensions, so you'd never be able to compile e.g. a modern Linux distro with just an ANSI C compiler.


Some things are left undefined by the standard; there, compiler implementors can do what they will. Then there are GNU-specific extensions, not only in GCC, leading to GNUisms in many scripts and sources. LLVM supports many of the GNU extensions.


From what I have gathered in the past, and what has been said here on Hacker News as well, FreeBSD is looking to replace GCC in the base with clang/LLVM, and that is exactly where this move is going.

As the mailing list message specified, clang is going to be the default on i386 and x86_64. GCCisms will eventually go away, and the code base will be cleaner and more manageable as a result.

As stated in the FreeBSD status report for this last quarter, there are already experimental tests being run on the FreeBSD ports build system whereby clang/LLVM is used to compile all of the ports in an attempt to see how far clang/LLVM can get.


The main reason for moving to clang is to remove as much GPL code from *BSD as possible (kernel and system libs).

You are correct regarding many of the 'ports' (packaging system), though, as there will no doubt be some GCC-idiosyncratic code in many of them.


On the ports build clusters they are doing experimental runs using -CURRENT with clang/LLVM as the only compiler for the ports tree in an attempt to weed out what would be broken and what will work without changes.

So this is definitely something they are already testing.


Can someone explain why LLVM is better than native code (if it is)? If it's not, what's the use for this?


That's really comparing apples to oranges.

LLVM compiles to intermediate code in order to make optimization passes more generic and better-defined, to more cleanly separate compiler components and phases, to better abstract over multiple architectures, and to impose restrictions that make some analysis and optimization easier (like enforcing Static Single Assignment form). You can also create a VM that runs the IR directly and distribute that, among many other possibilities. The IR does eventually make it to machine code, otherwise it wouldn't be able to do anything.

Why LLVM is better than GCC is a different question. It's better, in my opinion, for being a much cleaner, better-architected, modern, and far better modularized code base. Also, I am a happy person when I read the Clang docs on parsing versus when I read the GCC docs on how to manipulate GIMPLE trees. There are also more objective advantages, like the front end being a very nice, portable library for parsing C-derived languages, with much of that functionality exposed via libclang. (This makes deeply code-aware IDEs much easier to write, enables the use of C syntax for custom work, makes custom code scanners and analyzers far easier to build, and so on.)

Find out more at http://www.llvm.org
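
To give a taste of the libclang side, a minimal "list the declarations in a file" tool looks roughly like this (a sketch from memory, so treat the exact calls and the trivial error handling as approximate):

    /* dump.c - build with something like: cc dump.c -lclang */
    #include <stdio.h>
    #include <clang-c/Index.h>

    static enum CXChildVisitResult visit(CXCursor c, CXCursor parent, CXClientData data)
    {
        CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(c));
        CXString name = clang_getCursorSpelling(c);
        printf("%s: %s\n", clang_getCString(kind), clang_getCString(name));
        clang_disposeString(kind);
        clang_disposeString(name);
        return CXChildVisit_Continue;    /* visit siblings only, don't recurse */
    }

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;
        CXIndex idx = clang_createIndex(0, 0);
        CXTranslationUnit tu = clang_parseTranslationUnit(idx, argv[1], NULL, 0, NULL, 0, 0);
        clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, NULL);
        clang_disposeTranslationUnit(tu);
        clang_disposeIndex(idx);
        return 0;
    }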


Thank you. I thought it was just a VM, but apparently it's a whole suite of compiler tools.


Just to extend a little: nearly every widely used (non-Lisp) compiler first compiles the program into some sort of intermediate representation that is easier to manipulate and optimize, works on it for a while in this form, and then finally compiles that into machine code.

The level of this intermediate representation varies between compilers -- GCC uses a very simplified and strict form of C, while many others use an IR that is closer to assembly. LLVM is not really a competitor to the JVM or .NET, but an attempt to define a unified IR that many languages can compile into, be optimized in, and then be compiled into various different machine codes. So when you write a compiler for a language, you only have to write the top layers that turn your code into IR, and if you make a compiler for a new platform, you only have to write the bottom layers that turn IR into machine code; in both cases you get free use of all the IR-level optimizations that have already been built into LLVM.


Indeed.

tinycc is one of the only compilers that translates directly to machine code. This makes it compile very quickly, keeps the code base smaller, and as a bonus allows it to be a pretty decent C script 'interpreter'. But it also means that it does not optimise as well.

Tinycc was originally submitted as the winning entry in an obfuscated C code contest - but is surprisingly readable now. Worth a read.


Most Lisp compilers have intermediate representations too, although these IRs are still technically Lisp code (although they may not be executable). There's a really good series of blog posts about the IR of the SBCL compiler by Roman Marynchak:

http://insidelisp.blogspot.com/search/label/python-book


It's sometimes fun to take some random Lisp code and fully expand the macros:

    (macroexpand '(loop for x in list minimize (sin x)))
The output depends on your Lisp compiler, but it can get pretty wild. This is why we have the macroexpand-1 function, to expand just the outermost macro and stop before it all collapses into madness.

A compiler writer could get a lot of mileage out of this sort of thing. For example, a LET special form could be implemented as a macro that expands to a call to an anonymous function. All the looping forms could be turned into lambdas and recursion. As long as the compiler can handle that sort of thing nicely, this could really simplify the compiler code.


That clarifies it very much, thank you. From what other commenters said, I gather that you can also use it as an interpreter for the IR, which is, again, very interesting.


You can't compare LLVM to native code, since these are different things. LLVM IR can be compiled to efficient native code if that's what you want. In fact, this is what happens when you use clang+llvm to compile code into native executables.

LLVM is much more than that, however. You might want to use it as a JITting interpreter of the IR instead of as a native compiler, with all the benefits that may result from this. You may also use it as a sophisticated optimization framework (including cross-object, link-time optimization), taking IR in and giving (optimized) IR back.
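
Concretely, the interpreter path looks something like this with the stock tools (assuming the lli bitcode interpreter/JIT is installed alongside clang):

    clang -emit-llvm -c hello.c -o hello.bc    # produce LLVM bitcode instead of native code
    lli hello.bc                               # JIT-execute the bitcode directly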


Just playing devil's advocate here: to what extent are Clang/LLVM users protected against patent lawsuits from companies (or individuals) who contributed patented technology to it?

I understand it has never happened, but I am afraid someone somewhere may contribute something relevant and then have the patents bought by a troll.


I had to go back and check, but it appears I've had this conversation with you before:

http://news.ycombinator.com/item?id=1833560

Why don't we pick up where you left off: you can tell us what compilers you use yourself, and tell us how you are protected from the same.


At that time, the only argument you were able to produce was that, if Apple decided to go belligerent on whatever patented tech they contributed to Clang/LLVM, we would have to go back to a previous, pre-Apple version and fork it from that. That would be a version from a good couple of years back.

Being able to fork is the ultimate defense against certain corporate evils, but a license that doesn't provide some protection against patents obviously won't protect you against that specific evil.

BTW, I use GCC most of the time. And I am shielded from any patented technology Apple may have contributed to it since the NeXT days by the redistribution clause of the GPL.


Apple can only confer redistribution rights for patents they own. If they - or any other gcc contributor for that matter - mistakenly or deliberately reproduced an invention patented by someone else (IBM, Microsoft, Intel...) and contributed it to gcc, then the clause you mention affords you no protection whatsoever.


What part of Apple becoming belligerent on patents they own didn't you get? Obviously no FLOSS license will protect you from patents the contributor doesn't own.

In your example, if IBM, Microsoft or Intel contributed their own patented tech to GCC, they wouldn't be able to sue any user of the GPL'ed code.

My doubts are about BSD-ish licenses - whether contributing your patented tech to a BSD project puts you in a position to sue downstream users who didn't receive the right to use it directly from you.


The plausibility part.


Companies come and go. If one company that contributes its own patented tech to a project like LLVM gets acquired by a troll, what protection would users have?

I see no reason to trust a company to behave responsibly towards a community if and when it no longer needs it.

Your insistence on stating the obvious - that nothing protects users from third-party patents - while refusing to acknowledge the question of how much BSD-like licenses protect users from patents owned by those who contribute code to BSD-licensed projects is disturbing.


It's not that I don't acknowledge that your emperor is wearing pasties, I just don't think they cover enough to be considered clothing.

I think the onus is on you, as the advocate of this clause, to name one actual patent troll case that would have been prevented by it. SCO wouldn't have been prevented. The recent Oracle action against Google would not have been prevented. So which one?

You need to provide protection from actual, plausible danger to justify the doubt, uncertainty, and fear that you are keen to spread.


Apple has also contributed to GCC in various ways; what is to say that they didn't contribute some piece of code that they hold a patent on and thus could sue anyone that uses GCC?


They can't sue anyone that uses GCC because GCC is GPL-licensed and, by requiring Apple to grant unrestricted redistribution rights, the GPL effectively shields downstream users.



