I don't mean this as a slight because I love the project and other light fast OSes... but with modern compilers, what does assembly get you over just using C or another language and having it handle the translation? I was under the impression compilers nowadays have lots of optimizations that would take a lot of work for a human to do by hand, as well as creating less readable/maintainable code.
> but with modern compilers, what does assembly get you over just using C or another language [...] I was under the impression compilers nowadays have lots of optimizations that would take a lot of work for a human to do by hand
dav1d, the open source AV1 decoder, has now more asm than C code. It's one of the most recent open source projects with significant asm work ongoing.
The asm version outperforms the C version (full optimizations enabled) by 4,5x on AVX2.
We have similar results in SSSE3 (3,5x) and ARM64 (4x).
We're not talking about a few percents, we're talking about multiple times faster.
And AV1 is a standard, so there are no algo shortcuts that can be made: it is either compliant or it is not.
Sure, in most cases, it is not needed to write asm; but there are cases, notably for multimedia, where writing asm by hand is a lot faster, and that includes codecs and game engines.
Well, video decoding is very well suited to ASM, I'd say. You can use vectorisation, you can optimize pipelined execution, etc.: things where compilers may not be so good (my experience is a bit dated). You also optimize only a few tight loops, so you can really invest time in them with good ROI.
I'm not sure that operating systems are so suited to ASM optimization (in the sense that you may not reap so many benefits). Maybe one could optimize for size (so you can make super tiny OS) ?
I think one element of writing ASM by hand is the programmer subliminally changes the algorithm to be 'simpler' to write.
In Java, that might involve adding more classes and abstractions to make things conceptually simpler. In C, that might involve using structs to keep relevant data together. In C++, it might involve using a std::set to keep a list of 3 constants, Etc.
It turns out that when you have to write the asm by hand and you see that double pointer dereference is a pain, you avoid data structures with double dereferences (as C data structures often end up with). You avoid classes and abstractions (as are common in Java), because they involve a lot of boilerplate. You use compile time constants for that set of three things rather than any complex hashing that C++ would do, etc.
There are lots of small things like this, and it turns out the effect adds up significantly.
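To make the "set of three constants" point concrete, here is a minimal sketch in C. The name `is_known_code` and the three values are hypothetical, picked only for illustration; the idea is just that membership in a tiny fixed set can be three comparisons instead of a container lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example values: three constants known at compile time. */
enum { CODE_A = 3, CODE_B = 17, CODE_C = 42 };

static_assert(CODE_A != CODE_B && CODE_B != CODE_C && CODE_A != CODE_C,
              "the three codes must be distinct");

/* Membership test as three comparisons: no allocation, no hashing,
   no pointer chasing. */
bool is_known_code(int x)
{
    return x == CODE_A || x == CODE_B || x == CODE_C;
}
```

Nothing stops you from writing this in C++ or Java too; the claim above is only that writing asm by hand nudges you towards this shape.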
>> I think one element of writing ASM by hand is the programmer subliminally changes the algorithm to be 'simpler' to write.
I strongly disagree :-)
My experience is that optimizing assembler leads you to reorganize your code, or even your algorithm, into a form that your CPU will be most efficient at executing, which almost always translates to unbelievably intricate, super hard to modify code. I've done stuff on 6502, 80x86 and MMX, and the assembly parts, optimized for speed, ended up as impossibly tricky code. Worse, sometimes I have to adapt my data structures to allow optimization, which makes things even tougher. So I prefer to leave assembly code at the "only if necessary" level.
But now, to be honest, optimizing assembly code is incredibly satisfying to me :-) I get the feeling of using a CPU to its maximum capacity. Also, using assembly comes after thorough algorithmic study. So once I'm at the ASM level, I've maxed out my own capabilities! Happy me!
So if you have the chance to do that, just give it a try !
I suspect that this occurs for you exactly because you're aiming for optimization. This can happen in any language, really.
I'd assume that when you maintain larger codebases (i.e. full OS and application suite), that you start writing practical and maintainable code instead.
I’m wondering if the time spent writing asm for AV1 could instead be spent writing an optimising pass for the C compiler, so that the optimisations are applied by the compiler while the readability of C is retained.
The compiler knows nothing about your program and its data layout, which is where all of that speedup comes from.
There are some SIMD constructs that you can efficiently express in C by writing intrinsics, but that isn't substantially different from writing ASM.
Then there are some trivial cases where a loop can be unrolled and packed into SIMD instructions automagically, which retains readability, but greatly limits what you can write. You'll need to read the generated machine code to make sure you didn't mess up.
The benefit of just writing ASM is to not have such a translation layer between you and the processor. That same code that you carefully wrote to make it through the optimization passes for one compiler will likely get messed up in another compiler.
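For reference, this is the kind of "trivial case" meant above: a counted loop with unit stride and no branches in the body. It is only a sketch, and `scale_add` is a made-up name. Whether a given compiler actually turns it into SIMD instructions depends on the compiler, flags and target, which is why reading the generated code is still advised.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
   Simple enough that auto-vectorisers are generally designed to handle it,
   but nothing in the C source guarantees vector instructions are emitted. */
void scale_add(float *y, const float *x, float a, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

As soon as the body grows data-dependent branches or non-contiguous accesses, you are back to intrinsics or asm.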
> Maybe one could optimize for size (so you can make super tiny OS) ?
Yes, and remember... optimizing for code size is optimizing for speed. =)
Smaller code and data = more CPU cache hits, which are orders of magnitude faster than fetching from RAM. So even if "all" you do is make code smaller, you can get more speed...
> Yes, and remember... optimizing for code size is optimizing for speed. =)
Not always. I think loop-unrolling is a common perf-optimisation technique. Even if the perf-gain is positive in this case, I highly suspect that it doesn't outweigh the cost of maintaining asm code(vs C/other higher level code).
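A small sketch of the trade-off, assuming a plain array sum (the function name `sum_unrolled` is hypothetical). The unrolled version is more code than the one-line loop, which is exactly the size-versus-speed tension being discussed; whether it is actually faster has to be measured on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n values, unrolled by four with independent accumulators. */
uint32_t sum_unrolled(const uint32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    /* Tail: the zero to three leftover elements. */
    for (; i < n; i++)
        s0 += v[i];

    return s0 + s1 + s2 + s3;
}
```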
Compare a 512 byte boot sector game [1] to a ~1Gigabyte unity 'hello world'.
Until the day the compiler can smartly trim down all the unity framework to just 512 bytes because it notices I'm not using most of it, hand coding in asm will always work out smaller and faster, if one puts in enough effort.
Unity isn't.. what? And the problem there is using an ill-suited framework, not the language.
If you compile hello world in C, it's not going to be very big either.
> if one puts in enough effort
"enough effort" is a cheat. I can replace almost any program with a smaller javascript version, if I put in "enough effort". It's not a reasonable way to compare anything.
You say that, but I still haven't seen any compilers link in only parts of libc. Use any of it and it all comes in.
Hello world in C becomes anywhere from 120KB to 1.2MB even if just using puts. Even setting the entry point to a function with no arguments doesn't help.
A tiny hello world ends up needing to ignore the standard library and use a console output function from the OS.
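A rough sketch of the "use the OS output function" idea, assuming a POSIX system. The function name `hello` is made up. Note that `write()` here is still libc's thin wrapper around the system call; a truly libc-free build needs a custom entry point and a raw syscall, which is platform-specific and not shown.

```c
#include <assert.h>
#include <unistd.h>   /* POSIX write() */

/* Writes the greeting to file descriptor fd and returns the number of
   bytes written, or -1 on error. No stdio, no formatting, no buffering. */
ssize_t hello(int fd)
{
    static const char msg[] = "Hello, world!\n";
    assert(fd >= 0);
    return write(fd, msg, sizeof msg - 1);   /* exclude the trailing NUL */
}
```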
Is that a problem? It's there whether you use it or not, and it's not like the asm is running in a vacuum either.
If you go to an embedded point of view where there is nothing already supplied, you can get a hello world, even with a basic printf, down to that size.
Ok fine, I've used the tool called sstrip from a suite called ELFKickers. Now my hello_world.cpp (yes, c++) is 380 bytes long. I could not make it any shorter, so yeah, it is 120 bytes (50%) bigger than yours.
Yeah, large effort (objcopy, strip, sstrip), but I only need to do it once; after that I already know how to do this and can just put it in a shell script. Now, I would like to know how quickly you'll be able to write a quicksort implementation that would outperform my C code while being at least 25% smaller. Yeah, and how big your binary will become once you link libc or OpenGL or whatever to make it able to do actual work.
I agree that Unity is bloated and should take way less storage, but that's an invalid comparison. They are not the same domains, they don't use the same APIs, they don't run in the same environment, they don't talk to the same hardware, they don't have the same functionality, etc.
All of the performance wins you’re referencing come from vectorization, you typically can’t vectorize OS code since it isn’t ALU bound, which makes your point moot.
typedef int v4si __attribute__ ((vector_size (16)));
v4si a, b, c;
[…]
b += 3; // add 3 to each of b’s elements
c = a + b; // pairwise addition
Using this gives up some control over assembly; it doesn’t guarantee vector instructions get used, but it makes it easier for the compiler to generate them.
Also, I’m fairly sure there are vector instructions not covered by this extension.
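Here is the snippet above filled out into something that compiles, as a sketch only. It assumes GCC or Clang, since `vector_size` is a compiler extension and not standard C; the function name `add_biased` is made up for the example.

```c
#include <assert.h>

/* GCC/Clang vector extension: four 32-bit ints in one 16-byte value. */
typedef int v4si __attribute__ ((vector_size (16)));

static_assert(sizeof(v4si) == 16, "v4si should be 16 bytes");

/* Returns a + (b + 3), computed lane by lane. */
v4si add_biased(v4si a, v4si b)
{
    b += 3;        /* the scalar is applied to each of b's elements */
    return a + b;  /* pairwise addition */
}
```

As said above, this does not force any particular instruction; it only gives the compiler an easy shape to map onto vector registers.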
IANA C/C++/ASM expert, but one thing I've been thinking of would greatly benefit from type-punning (overlaying 2 kinds of data and interpreting one as the other) because it can use a 64-bit register to parallelise 4 ops at once where speed really, really matters.
That's trivial in ASM I believe, but don't know how well it's possible in C/C++, if at all.
Type punning in general would be somewhat difficult to do here in a way that vectorized but was still defined by the standard. You’d need to put a branch with undefined behavior in it when alignment wasn’t satisfied, then do a memcpy into a vector type, and hope that the compiler understands what you’re doing and doesn’t deoptimize it.
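For the "4 ops in one 64-bit register" idea, a sketch of one way to do it in standard C, assuming the four values are 16-bit lanes and the op is wrapping addition (the question above does not say, so both are assumptions, as are the names `pack4`, `unpack4` and `add4x16`). The punning goes through `memcpy`, and the add uses the usual mask trick so carries never cross a lane boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret four 16-bit lanes as one 64-bit word. memcpy is the
   standard-defined way to do this; compilers commonly reduce it to a load. */
uint64_t pack4(const uint16_t lanes[4])
{
    uint64_t w;
    static_assert(sizeof w == 4 * sizeof lanes[0], "four lanes must fill the word");
    memcpy(&w, lanes, sizeof w);
    return w;
}

void unpack4(uint64_t w, uint16_t lanes[4])
{
    memcpy(lanes, &w, sizeof w);
}

/* Add four 16-bit lanes at once, each lane wrapping modulo 2^16. */
uint64_t add4x16(uint64_t a, uint64_t b)
{
    const uint64_t HI = 0x8000800080008000ULL;   /* top bit of every lane */
    uint64_t low = (a & ~HI) + (b & ~HI);        /* low 15 bits; carry stays in lane */
    return low ^ ((a ^ b) & HI);                 /* add top bits without carry-out */
}
```

Because packing and unpacking use the same byte layout, the result does not depend on endianness.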
Video coding is one of the few outliers where compilers are not very good at the kind of optimisations that really help. Experience with that kind of code does not generalise to other types of code.
How much would something like Halide help with writing optimized code for AV1? It's originally designed for image manipulation, but I imagine encoding/decoding to be a little bit more demanding of expressiveness.
When I worked with Halide, on a thing that is relatively close to the image decoding (transform a series of angle-amplitude pairs (radioastronomy) to the image, a lot of sprite-with-opaqueness painting), I was perplexed to find out how hard it is to work in Halide with non-constant offsets. The functionality was essentially non-existent back then (2015, I assume).
In fact, any scatter-gather operations were non-existent in Halide.
From what I remember these operations were introduced at some point, but we moved away from Halide.
Also, it was not simple to transform a loop that draws sprites over the entire image into a loop that draws sprites over part of the image and processes many parts of the image in parallel (changing the nesting). The hand-written CUDA version of the algorithm ended up doing exactly that.
Thus, if you need some partial-derivatives-numerical-kernel, Halide is good for you. If you are working on the video decoding, Halide is not that good for you. If you are working on video encoding, Halide will be more of a nuisance than a helping hand (early exits from loops, computable access ranges, etc).
That is all very interesting. If you were drawing sprites, why not use straight openGL?
Do you think it would have worked to organize tiles and threads outside of halide and use halide for isolated parts that are already organized into arrays?
We would like to be relatively target-agnostic. OpenGL could be one of the targets, but not the only one. We would also like to work on regular and/or GPU-equipped cluster machines, etc.
On the suggestion in your second part: why use Halide then? Should it be responsibility of Halide to work out the best loop nesting and best use of threads?
Again, Halide was put aside and we used CUDA for final version, exactly because of inability of Halide to do good work in our case.
> Should it be responsibility of Halide to work out the best loop nesting and best use of threads?
I don't know about "should", but it seems to me that it would still be valuable, even if you had to work out the threading and the organization of the data into arrays yourself.
> Again, Halide was put aside and we used CUDA for final version, exactly because of inability of Halide to do good work in our case
I didn't say anything about that. I'm not sure why you are restating it.
> We're not talking about a few percents, we're talking about multiple times faster.
I wonder how long it will take for compilers to close that gap. I assume compilers will eventually produce assembly code that outperforms anything written by a human.
These aren't necessarily the typical use cases that compiler writers are trying to optimize, either. So, compilers for general purpose languages may never close that gap.
Have you seen a non asm low level language that could fit hardware better than C ? Not starting a flamewar, just that it seems that you're in a context where you may have reviewed things we might not know of.
A company I used to work for has spent 30 years trying to migrate FORTRAN code to C. I was not involved in the project, so I don't know the details, but the gist of what I was told is that there are edge cases where FORTRAN stomps C in every performance metric. So you can make decent progress for a while, but you'll eventually hit a major roadblock that halts progress.
I think of the BLAS algos as being very Fortran friendly, and the Fortran references never _out_perform the C implementations. (The asm implementations are of course the best.)
I've done some bare-metal development in Ada. It was originally designed for embedded systems, so it has features that fit this niche very well. It's definitely not as simple as C, it has a much more modern design however.
Why is this getting downvoted? Writing programs in pure assembly does offer the possibility of implementing very fine-grained optimisations. However, assembly is not a magic bullet. Modern optimising compilers are extremely efficient. The most efficient assembly is quite often not the most legible or maintainable assembly. I've written lots of bare-metal assembly and if I'm given the choice between writing more efficient assembly and writing straightforward assembly, I'll pick the latter every time.
Edit: I have a little bit of experience in the area of operating-system development in assembly. This is purely a hobby project that I work on half-heartedly. Nevertheless, it does demonstrate what I know about bare-metal x86 assembler.
The biggest benefit is usually that you can forgo calling conventions in your own code because everything is visible to you. Of course the code starts looking like spaghetti.
This is correct. You can forego all aspects of your platform's calling conventions if you so desire. Omitting setting up stack frames is a really simple optimisation that can be done in hand-written assembly. You've already touched on what the stakes of doing such things are.
Yes, but they rarely take a whole-program approach. Reducing a complicated sequence of function calls to a few goto’s is not an easy task. Probably NP-hard?
Doing it on a whole program basis is also unlikely to give much benefit. Function calls are extremely fast, they only show their overhead in tight loops.
Yeah, reviewing the repository there’s a lot of high level language code now, so that probably removes most of the benefit of assembly. You’re still calling into the OS a lot.
Some of the other low level systems (like the Mac) had a trap system that wasn’t so far away in cycle count from user code. But in these days of needing 10,000 cycles to bridge a system call it’s best to do whatever you can to avoid calling the OS.
Then I sort of turned the problem around in my mind.
When is a compiler prevented from optimizing? Maybe pointer aliasing?
Another thing I wondered. I've written hello.c and it's about 84 bytes of C and 8.3k as an executable. Would hand-coded assembler be that large?
maybe it's that compilers CAN do well, but because of requirements, they can't do some things well and necessarily create a lot of "boilerplate infrastructure".
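On the pointer aliasing question above, a minimal sketch of the classic case (function names are made up). In the first version the compiler has to allow for `dst[i]` and `*total` being the same memory, so it cannot assume `*total` stays unchanged across stores. In the second, `restrict` is the programmer's promise that they do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* May alias: a store to dst[i] could modify *total, so the value
   generally has to be re-read on each iteration. */
void add_total(int *dst, const int *total, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += *total;
}

/* No aliasing promised: the compiler is free to read *total once.
   Calling this with overlapping pointers is undefined behaviour. */
void add_total_restrict(int *restrict dst, const int *restrict total, size_t n)
{
    assert((dst != NULL && total != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] += *total;
}
```

Both give the same result for non-overlapping inputs; the difference is only in what the optimizer is allowed to assume.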
> I've written hello.c and it's about 84 bytes of C and 8.3k as an executable. Would hand-coded assembler be that large?
The answer is generally no. A lot of that overhead is due to the C std library. If memory serves from a blog post I read long ago, with very very aggressive tuning, you can get a hello world binary down under 50 bytes (which is smaller than the ELF header). A “normal” ASM coded “hello world” binary could easily be in the range of a few hundred bytes without anything special.
You can use the `-S` switch with gcc to produce the intermediate assembler representation of your C program instead of binary output. From there you can see for yourself what optimisation has been performed and what hand optimisations are possible.
I just compiled my own 'hello.c' on Linux, with the optimisation flag set for 'code size', and the binary weighed in around 8.3k. I then removed the `printf` call and the inclusion of `stdio.h`. The resulting binary was 8.1k. This is exactly as expected. Why would the inclusion of a call to a dynamic library bloat the final binary size?
Programs don't have to be particularly large or complex to become relatively unwieldy and difficult to understand in assembly. I find it takes me much more time to comprehend large portions of written assembly than higher-level languages. Your mileage may vary. Other engineers might be more talented than I am in this regard, however I think this is more or less everyone's experience.
Abstractions like functions are still present, just not in the way that you might think of them in a higher-level language. You use 'branching' instructions to jump from one part of the code to another. Either by referencing labeled sections of the code to perform 'absolute' jumps, or by jumping 'relative' to the current instruction. This forms the building blocks that can be used to implement more abstract constructs such as loops, functions and conditionals. This is a gross simplification, I hope it helps answer your question though.
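To illustrate in C rather than in any particular assembler: the sketch below writes a loop using only labels, one conditional jump and one unconditional jump, which is roughly the shape a hand-written assembly loop takes. The function name and labels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum v[0..n-1] without for/while, using labels and jumps only. */
int sum_with_jumps(const int *v, size_t n)
{
    assert(v != NULL || n == 0);
    int total = 0;
    size_t i = 0;

loop_top:
    if (i >= n)
        goto loop_done;   /* conditional branch out of the loop */
    total += v[i];
    i++;
    goto loop_top;        /* unconditional branch back to the top */

loop_done:
    return total;
}
```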
You can code anything you want in ASM, so higher level abstractions as well. However while doing that you will lose some of the performance gain that ASM gives you.
Optimised ASM can still be tested but one problem is that when the code doesn't work, it can simply crash. This makes it a pain to debug.
Only the UNIX Assemblers have been traditionally quite poor in macro capabilities, as they have been mostly used as yet another stage for generating code from C compilers.
A good example of how to take advantage of such macros is to implement a poor man's compiler, or the first stage of a bootstrapped compiler.
Generate bytecodes that can be easily mapped into macros, and then just by having your macro library for the target platform you get the compiler very quickly up and running.
As noted above, the creators admitted that writing an OS in assembly was a challenge for them.
However, I totally agree about optimizing capabilities of modern compilers.
An interesting fact on the topic: a friend of mine, a Principal Developer at a well-known tech giant, has a practice of picking candidates who mention "fluent Assembly" in their CV and challenging them to create a small, highly optimized application. After that he compiles the same application in C++ with maximum optimization, and in 90% of cases the generated code is more optimal than the code written manually. However, before becoming all sceptical about the obsolete skill of learning Assembly, I would point to the remaining 10% of his candidates.
Do they mention fluent x86-64 or arm assembly? Because there are many ISAs where no C++ compiler will be able to generate better code than hand optimal code by a fluent programmer.
Exactly, when the only compiler available for an ISA is gcc 2.9... manually writing the assembly doesn't sound that bad anymore. C++11? There isn't even a C++03 compiler for those..
I think, since he's working on internet search and knows the target architecture in advance, he meant x86_64 assembly, probably used for platform-specific fine-tuning.
As part of a university project we optimized a simple string to uppercase function via assembler analysis and asm. Directly optimizing the C code to produce more optimized assembly (notably removing branching) was nearly an order of magnitude faster.
// uppercase without if
*c -= (*c-'a'<26U)<<5;
Adding loop unrolling to that C-optimized version via asm provided another 20% performance boost. And I didn't even add vectorization using AVX.
While compilers provide a lot of optimization directly, there are huge performance gains in simple functionality by helping the compiler with easier to optimize code or by using asm directly.
Also notably the highest GCC optimization level I tested (-O3) reverted the optimization of the optimized C code, while -O2 kept them, resulting in -O2 being much faster. So only using asm guaranteed the performance gain.
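For anyone who wants to try it, here is the one-liner above wrapped into a complete function, as a sketch. It assumes ASCII, and the function name is made up. As noted above, whether the compiler keeps it branch-free depends on the optimization level, so check the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Uppercase an ASCII string in place without an if in the loop body. */
void to_upper_branchless(char *s)
{
    assert(s != NULL);
    for (unsigned char *c = (unsigned char *)s; *c != '\0'; c++) {
        /* (*c - 'a' < 26U) is 1 for 'a'..'z' and 0 otherwise: the unsigned
           comparison makes values below 'a' wrap around to a huge number.
           Shifted left by 5 it becomes 32, the ASCII gap between cases. */
        *c -= (*c - 'a' < 26U) << 5;
    }
}
```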
Been 30 years since I've written assembler but back in the day I considered myself part of Abrash's army. What we tend to forget about writing in assembler is each typical formatted line has a one-to-one mapping to a machine instruction unless you're using macros. It really is the most wysiwyg programming experience you can have, so it's simply not possible to get the same optimization and space usage with a higher level language because your only choice for optimizing higher level is via groups of instructions. So C optimizers are inlining groups of instructions and making best assumptions about execution flows for you while maintaining intent. So if you want to enjoy the abstractions of higher languages, and the speed at which you can develop, the only obvious run-time optimization is faster hardware to compensate for abstractions that might decompile to massive amounts of machine instructions.
Oh yeah and I'd just like to mention as an aside that assembler statements make more sense to me than bootstrap css codes - a sad statement to the art of reasonable brevity.
You just have to be careful, because in a lot of cases good algorithms have a much bigger effect than raw performance.
Case in point: I saw a web forum once, written entirely in assembly language. The author claimed high performance, but looking at the sources, I saw that it uses no buffering, and even a simple webpage results in thousands of write syscalls.
Whatever they saved in more efficient function prologues, they lost back in wasted context switches.
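The fix for that kind of problem is independent of language. A minimal sketch in C, assuming POSIX `write()`; the `struct outbuf` type, the function names and the 4096-byte size are all made up for illustration, and EINTR handling is left out. Small pieces are collected in memory and handed to the kernel in one call.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

struct outbuf {
    int      fd;          /* destination file descriptor */
    size_t   len;         /* bytes currently buffered */
    unsigned flushes;     /* number of non-empty flushes so far */
    char     data[4096];
};

/* Write out everything buffered. Returns 0 on success, -1 on error. */
int out_flush(struct outbuf *b)
{
    size_t off = 0;
    if (b->len == 0)
        return 0;
    while (off < b->len) {
        ssize_t n = write(b->fd, b->data + off, b->len - off);
        if (n <= 0)
            return -1;
        off += (size_t)n;
    }
    b->len = 0;
    b->flushes++;
    return 0;
}

/* Append n bytes, flushing only when the buffer is full. */
int out_put(struct outbuf *b, const char *s, size_t n)
{
    assert(b != NULL && s != NULL);
    while (n > 0) {
        size_t room = sizeof b->data - b->len;
        if (room == 0) {
            if (out_flush(b) < 0)
                return -1;
            continue;
        }
        size_t k = n < room ? n : room;
        memcpy(b->data + b->len, s, k);
        b->len += k;
        s += k;
        n -= k;
    }
    return 0;
}
```

A thousand one-byte `out_put` calls followed by one `out_flush` reach the kernel once, where an unbuffered version would make a thousand system calls.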
I'd also be really concerned about security in that application. Web applications tend to do a lot of string processing, and naïve assembly implementations of string processing routines are likely to be vulnerable to all the same errors as a naïve C implementation -- buffer overflows, off-by-one errors, null byte injection...
Exactly. Compile a C or C++ application with -Os and the highest size optimisation settings, and you'll almost certainly still end up with a binary that's several multiples if not an order of magnitude or more larger than if you had written it in Asm with the equivalent functionality. Compilers can do SIMD tricks and such to get close in speed, but they are still pretty horrible at size optimisation.
You could try to "decompile"(!) this project into a higher level language and compile the result if you were really curious and wanted to try this exercise in the opposite direction. I suspect a lot of the things done in this code aren't even representable in a HLL or something a C compiler could be coerced to generate (without cheating and using inline Asm.)
I disagree with almost every aspect of this post. I've never seen any evidence of your claim about binary size being the case. It's a spurious claim. There's always the possibility that a user could write slightly more efficient assembler by hand, but orders of magnitude? Maybe in the 80s this could have been correct.
Your claim about compilers using "SIMD tricks" to "get close in speed" makes no sense either. If the HLL is using SIMD instructions and this magic assembler that you're talking about isn't, how would their performance ever be equivalent?
The majority of general-purpose code can't use SIMD because it's really branchy, mundane "business logic" type of code, and that's where handwritten Asm's code density really shows an advantage.
I see that you claim to have "written lots of bare-metal assembly" and have posted a link to your OS in Asm, so I looked at the code...
You are not using Asm the way Asm is supposed to be written. You are writing code like a compiler, which totally misses the point of using Asm. Now it is obvious why you don't understand --- because you've never seen what "real Asm" looks like.
To elaborate, one thing that stands out is lots of stack manipulation, barely using the registers at all. Putting everything on the stack is what stupid compilers do. This is no good for speed nor size.
I have been reading and writing Asm for a few decades. See some of my other comments if you'd like to learn more... here's a quick sampling:
I think you are underestimating the ability of humans.
I remember back in the day, when I was following a small amateur indie game scene (1998-2002), a guy released a game he'd written entirely in assembler, just for fun. He could easily have written it in C instead, but he didn't.
The binary was significantly smaller and the game really smooth.
You don't write assembler the same way you write C or C++.
A compiler is working under various constraints, for instance regarding interoperability. When you craft everything by hand, you really have no constraints other than your imagination. You can benchmark stuff and learn and adapt. Compared to that a compiler is a sophisticated idiot.
That doesn't mean all of us should start working in assembler. But don't look down at people that do - instead find inspiration, and perhaps try to make the compilers less dumb.
No, but those are extreme outliers, and not relevant at all to discussion about large scale programs.
You can absolutely not write a program much larger than 256 bytes in the style you use to write those. It would be utterly incomprehensible and unmaintainable. The small size actually works in your favour here, and allows you to use incredibly questionable tricks.
Linking a static library into your program will increase your binary size, linking to a shared library will not. Smaller binary size is one of the benefits that shared libraries offer.
That decision is entirely up to the developer and the project requirements. If you're not writing a bare-metal program there's no reason why not to use external libraries. It's still perfectly possible.
Program bloat is not due to poor compiler optimization. It’s due to runtime size and a lack of effort to make runtimes smaller. In particular the c runtime.
I was skeptical, so I looked into his zlib implementation. It was 18% faster than gzip v1.6. The author said, "all I did was hand-compile the reference implementation, an 18% reduction in user-space time is a big advantage IMO."
I am not skeptical any more! More details of the benchmarking are in the linked thread.
I suspect a big win is that usually, programs from hand-written assembly will be smaller than compiler-generated binaries. For example, see [1], where a 32-bit GCC-generated binary (2.6 KB) is compared with a hand-written assembly program (45 bytes, although this is not representative). For reference, when I compile an empty program with GCC 8.3 on my 64-bit system, I get a 16KB binary.
The constant overhead for compiler generated binaries is larger because the compiler doesn't optimize for the case of nearly empty programs. I'm not so sure that the benefits are that large once you have a decently sized piece of code.
That's true. The overhead on relatively small systems with a lot of simple binaries might still be considerable. But it's probably not the main reason for the difference in snappiness.
I love the idea and I'm glad someone was able to create it but long term maintenance is the problem with assembler. Few people have the know-how to help. Even if there were a large number of programmers that could maintain it, it's better to write the OS in a high-level language and use assembler to optimize it.
It gives you a very close understanding of your CPU's instruction set and features. It's not necessary to write performant code, since compilers can do this very well.
Modern compilers are pretty good, but it's still quite possible for a good human coder to exceed their performance even for normal code.
For really tight algorithms I've still seen humans beat compilers by a lot, though it takes a lot of skill and some domain knowledge of the CPU.
For code that benefits a lot from vectorization, human coders still beat the crap out of compilers. I am not aware of any vectorizing compiler or JIT that can even approach what a human coder can do with SSE, AVX, or NEON (ARM's equivalent). I think these CPU extensions are just too complex for current generation compilers to effectively deal with. They require too much abstract understanding of what's actually happening in the code and the CPU to use really effectively.
I'm a bit surprised that there's been so little attention paid to the opportunities for applying deep learning and other advanced techniques to compiler optimization. It seems like this area is ripe for a new wave of innovation, but it's not happening. I do believe that many companies would pay for an advanced compiler capable of generating code that was significantly faster than stock compilers, but the speedup would have to be more than a few percent to justify spending money on it.
It depends. If your target is x86/x86_64 and the compiler has a LLVM or Intel backend, I've found that beating the compiler is almost impossible and largely futile. I took a long time to accept this since I know asm and the x86 instructions well.
64 MB is absolutely gigantic if we are talking about assembly code. So that is not "nuff said" by any measure, it is a completely irrelevant factoid that tells us nothing at all?
The main advantage of Asm is the tiny constant factor. A linear or quadratic algorithm can beat a logarithmic or even (amortised) constant-time one on the sizes of data used, if the latter has a much larger constant factor.
Just like in a HLL, if you really need it you can still write reusable data structure libraries, but Asm's really tiny constant factor and effort involved in adding complexity forces you to think about whether you really need it first.
In other words: in the amount of cycles spent initialising a hashmap and inserting a few dozen or hundred items (perhaps involving memory allocations, etc.) just so you can get (once again, a relatively large factor each time) asymptotically constant-time lookup in an HLL, you could've gone through the whole set many times already in a tight loop of less than a dozen instructions, that entirely fits in the L1 cache along with the data too.
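The "tight loop" side of that comparison looks like this when written in C, as a sketch with a made-up function name. There is no setup cost, no allocation, and for a few dozen keys the whole array sits in a handful of cache lines. Where the crossover with a hash map lies depends on the data size and the machine, so it needs measuring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of key in keys[0..n-1], or -1 if it is absent. */
ptrdiff_t find_linear(const uint32_t *keys, size_t n, uint32_t key)
{
    assert(keys != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (keys[i] == key)
            return (ptrdiff_t)i;
    return -1;
}
```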
What are you talking about? You're talking about implementation. This has nothing to do with assembly at all. Assembly is not a magic language that offers instant efficiency. 'Constant factor' is entirely irrelevant to this discussion. There's nothing stopping an engineer from implementing equally inefficient data structures/algorithms in assembler as they would in a higher-level language. It is true that if the engineer understands the problem domain very well, and the project requirements are amenable, the possibility certainly exists for them to solve the problem with highly efficient assembly. This is true for almost anything. However, it exists at the cost of extra development effort with no real guaranteed benefit.
> There's nothing stopping an engineer from implementing equally inefficient data structures/algorithms in assembler as they would in a higher-level language.
...except the additional complexity of doing so? If you have to write every single instruction, you start thinking more about whether you have to write each one.
> 'Constant factor' is entirely irrelevant to this discussion.
Here's a 6502 instruction that has undefined behavior, because the microsequence connects circuits together in an invalid way causing analog effects that can change even run to run of the same processor.
It looks like that’s not undefined behaviour, but using a non-existent and undefined opcode, which happens to “do something” because the cpu attempts to execute the bits anyway (it doesn’t mask the unused opcodes into no ops or errors). Later variants of the 6502, like the WDC 65C816, did away with this (I believe, either using the opcodes for something defined, or making them no ops, although I’m unsure)
And some undefined behavior is errata that differs from CPU to CPU. Check out the errata documentation for any processor released in the last few decades. For example, x86. You can trigger things that happen on one x86 CPU that won't happen on another.
Exciting! Please be much more specific on your webpage about what "much faster" means (2X? 10X? Or just 10%?). The same goes for "requires only few MB": what is "few MB" to you? 1? 10? 100? :) Please be specific with all your claims and measure everything, so they are more credible and useful.
Live CD is ~65MB (the website says the core fits on a 1.44MB floppy). It boots in less than one second (VirtualBox using defaults for "unknown OS"). Mouse and keyboard are working but are not fit for real use.
65MB seems big compared to Tiny Core Linux, for example, which is 11MB: http://tinycorelinux.net/ Maybe it packs more features than Tiny Core Linux, though; I'm unsure.
What niche is this supposed to occupy? I can't seem to find any information on when/why KolibriOS would be a good choice over more traditional OSes.
Stated differently: is this a useful thing, or is it an exercise in "how far can we push a pure-assembly project"? The latter would of course be fine, but I'm quite curious to know which it is.
> Stated differently: is this a useful thing, or is it an exercise in "how far can we push a pure-assembly project"? The latter would of course be fine, but I'm quite curious to know which it is.
It's really fast, and it has a nice retro touch to it. But claiming that it has a web browser is clearly an exaggeration. It's a text-based browser, and for some websites it cannot even display the text (it crashes). So as a consumer operating system it isn't really good for anything. Unless you want to play retro games all day long, of course.
The website also claims that the operating system has a "word processor." A more appropriate term for it would be "text editor"; it appears to be less functional than Notepad on Windows.
I wonder to what degree software written in assembly is faster precisely because it's harder to write. The result is that you end up with a lot less of it.
Everything's fast in the early stages, when there's not so much of it yet.
The most important difference is that KolibriOS is a team effort, and open source.
MenuetOS abandoned its 32-bit version (the one that was open) to focus on the 64-bit one, which is not open and even has a clause in its license that prohibits disassembly.
Regardless of being closed source, I'm going to guess that it would still be highly educational to learn Assembly programming on a system that makes it a first class citizen where you just boot into the OS that already is entirely built around ASM. For really simple stuff, a simple microcontroller might be better, but I'm sure some folks are more interested in building desktop apps.
I didn’t like the event model in Kolibri so I gave up on it after poking around programming for a while. There are a couple interesting games. I like the Lawnmower game that’s like a house painting game I once typed in for the C64.
I wish I could remember exactly. Too much preprocessing of input? Inflexible calling of the program by the OS? Not sure but it was either live with it or rewrite everything to change it.
Seriously, if there's a platform that could benefit immensely from such a tightly packed OS, it's all those small Linux-capable ARM boards that cost about as much as two beers.
Even the most hardware-limited ones (256 MB RAM, 32-bit single core, etc.), which can still run a complete Linux environment in some contexts, would literally scream when served something that fast and tight.
RISC OS is a new and different OS for the Pi. It isn't Linux, it isn't Unix, it isn't based on any other OS. It's the first ARM OS, begun in 1987 by the team who designed the original ARM processor. It's also a descendant of the OS used on the early 1980s BBC Micro... those who remember the BBC Micro might find some of its commands familiar. BBC BASIC is only a few keypresses away.
What's interesting about RISC OS?
It's small. It's fast. RISC OS is a full desktop OS, where the core system including windowing system and a few apps fits inside 6MB. It was developed at a time when the fastest desktop computer was an 8MHz ARM2 with 512KB of RAM. That means it's fast and responsive on modern hardware. The memory taken by apps is usually counted in the kilobytes. A 700MHz 256MB Raspberry Pi is luxury - what to do with all that memory?
RISC OS is also a lot simpler than modern OSes such as Linux. The pace of development has been a little slower than other OSes, which means there are fewer layers getting between you and the system. It's much easier to get stuck in and change things. It's also easier to understand. As a formerly closed-source OS, most of the interfaces are documented in a series of books called the Programmers' Reference Manuals (PRMs) which are included on the RISC OS Pi distro. That means you can change a lot of things without having to work on the OS code itself (which is available if you want it). It's very modular, so you can mix and match components, and the communications between modules are carefully documented.
RISC OS gets out of the way. It's a 'co-operatively multi-tasked' OS. While that means one misbehaving application can stall the system until you kill it, it also means you can easily write apps that take over the whole machine - for example controlling hardware where you need predictable timing. RISC OS is a single-user OS, which means there's very little security - not great for internet banking, but very handy when you want to dig around and program the internals of the OS.
As a full desktop OS, there's also plenty of traditional desktop software available like drawing programs and desktop publishers. Features you've come to expect on a desktop like scalable fonts and printing are supported. RISC OS was big in UK education in the 1990s, and there's a large back catalogue of educational software.
Not to be a buzzkill, but I'd like the exact opposite: an operating system with no asm and no C. Processors are over 1,000x what they were when I was young; I wish I could trade some of that 1,000x for fewer security holes.
I know back in the day there were talks of having Linux (or some other OS?) being re-written in Java, but it never panned out. At least that's what my professor told me, but I can't seem to find any info about it.