That’s not what I’m talking about. I’m talking about software that people are obligated to distribute under a GPL-compatible license. For example, some random company’s private fork of the Linux kernel.
> The second and third books leave something to be desired
Also got this feeling on the first read... but now I remember them very fondly! I like to think that this trilogy happens in the same universe as Dune, being a prequel to the events of Dune. The author's homage to the Dune universe is obvious (the names of the books, the notion of "other memories", etc). But many notions fit together, with a bit of imagination. The second book of the trilogy provides a mechanism to explain the other memories in the form of nodal biology. The octopi FTL technique is reminiscent of the guild navigators. The third book hints subtly at a reason why the Butlerian Jihad could have happened.
As somebody who is afraid of types (and also, who hates types, because we all hate what we fear), may my point of view serve as balance: you don't need a type system if everything is of the same type. Programming in a type-less style is an exhilarating and liberating experience:
assembler : everything is a word
C : everything is an array of bytes
fortran/APL/matlab/octave : everything is a multi-dimensional array of floats
lua : everything is a table
tcl : everything is a string
unix : everything is a file
In some of these languages there are other types, OK, but it helps to treat these objects as awkward deviations from the appropriate thing, and to feel a bit guilty when you use them (e.g., strings in fortran).
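To make the "everything is an array of bytes" style concrete, here's a small sketch in C (my own toy example, not from anyone above): any object can be viewed through an `unsigned char *`, and what those bytes mean is entirely up to you.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* A float is "really" just sizeof(float) bytes; inspecting the
       object representation through unsigned char is legal C. */
    float f = 3.14f;
    unsigned char bytes[sizeof f];
    memcpy(bytes, &f, sizeof f);

    for (size_t i = 0; i < sizeof f; i++)
        printf("%02x ", (unsigned)bytes[i]);
    printf("\n");
    return 0;
}
```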
I feel the need to issue a correction: while I'm programming in assembly, I very well have types. This word over here (screen position) represents a positive number only, but this one over here (character acceleration) can be negative. When adding one to the other, I need to check the arithmetic flags like so to implement a speed cap...
The types certainly exist. They're in my mind and, increasingly through naming conventions, embedded within some of the comments of my assembler code. But nothing is there to check me. Nothing can catch if I have made an error, and accessed a pointer to a data structure which contains a different type than I thought it did. Without a type system, that error is silent. It may even appear to work! Until 6 months later, when I rearrange my code and the types are arranged differently in memory, and only THEN does it crash.
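A small C sketch of that failure mode (the struct names are invented for illustration): the wrong belief about what the pointer points to compiles cleanly, runs, and quietly produces garbage.

```c
#include <stdio.h>

struct position { unsigned x, y; };  /* screen position: never negative */
struct velocity { int dx, dy; };     /* velocity: may be negative */

int main(void) {
    struct velocity v = { -3, 7 };
    void *p = &v;                    /* the "type" now lives only in my head */

    /* Wrong belief about what p points to. No compiler error, no warning;
       formally undefined behavior, which is exactly the problem: nothing
       is there to catch it. */
    struct position *pos = p;
    printf("x = %u\n", pos->x);      /* typically prints 4294967293, not -3 */
    return 0;
}
```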
The original goal of Hungarian notation :) But Simonyi's paper used 'type' ambiguously and we ended up with llpcmstrzVariableName instead of int mmWidth vs int pixelWidth, which was what they were doing in Office and frankly makes a lot of sense.
But once you get down to the unit data values inside any of those aggregates, you're still dealing with characters, ints, floats, strings, or arrays, and they each have their own individual access patterns and, more importantly, modification functions.
You can't add a number to a string, only to another number.
If you are dealing with a float, you better be careful how you check it for equality.
If it's pure binary, what kind of byte is it? Ascii, unicode code point, unsigned byte, signed multi-byte int, ... whatever.
There's no escaping the details, friend.
And your saying "everything is a word" for assembler is just plain wrong.
Ok, sure. But I doubt that's a good practice. In fact, I can't possibly imagine it not being a horrible idea.
So, I ask: what size and signedness of int? 1, 2, 4, 8? What if the string is of length 3, 2, 1, 0?
Why bother with all those corner cases? Everything has a memory layout and appropriate semantics of representation and modification. Pushing past those definitions is a recipe for problems.
I like to keep it simple, keeping the semantics simple in how I code specific kinds of transforms.
The fewer kinds of techniques you use, the fewer kinds of patterns you have to develop, test, and ensure consistent application of across a codebase.
Especially down in C land, which is effectively assembler.
Gone are the days of Carmack having to save bytes in Doom, unless you're doing embedded work, in which case that's all the more reason to be very careful how you handle those bytes.
That's entirely how "string" indexing works in C. Strings in C are just pointers to `char` with some valid allocation size. As long as the integer used for the pointer offset results in a pointer into the allocation after the addition, it's valid to dereference the result. Remember, `array[index]` is syntactic sugar for `*(array + index)` in C. Lots of the C stdlib string functions use this, e.g. `char *strchr(const char *str, int character)` has a naive implementation as a simple loop comparing `char`s[1]. Glibc does it one `unsigned long int` at a time, as an optimization, with some extra work at the start going one `char` at a time to ensure alignment.
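For illustration, roughly what that naive loop looks like (a sketch, not glibc's actual source):

```c
#include <stddef.h>

/* Naive strchr: walk the string one char at a time. If the character
   is '\0', this correctly returns a pointer to the terminator. */
char *my_strchr(const char *str, int character) {
    char c = (char)character;
    for (;; str++) {
        if (*str == c)
            return (char *)str;   /* cast away const, as the real strchr does */
        if (*str == '\0')
            return NULL;
    }
}
```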
> So, I ask: what size and signedness of int? 1, 2, 4, 8?
Doesn't matter, as long as the result of the addition points to within the pointer's allocation. Otherwise you get UB as usual.
> What if the string is of length 3, 2, 1, 0?
Doesn't matter, as long as the result of the addition points to within the pointer's allocation. Otherwise you get UB as usual. For a 0-length string (pointer to '\0'), the only valid value to add is 0.
> The fewer kinds of techniques you use, the fewer kinds of patterns you have to develop, test, and ensure consistent application of across a codebase.
100% agreed. The less C you use for string handling the better. C strings are fundamentally fragile.
Any time the microprocessor accesses memory for use as an int, it's a specific kind of int, meaning size and signedness, and the flags are adjusted properly as per the operation performed.
> Strings in C are just pointers to `char` with
I'm gonna end this here. I taught myself C programming by reading K&R in the late 80s, and then proceeded to do so professionally for YEARS and YEARS.
There are people that know, and there are people that act like they know. You ever read the first two chapters of Windows Internals? You ever write C code that could make Windows system calls from the same program that could be 32- or 64-bit with a simple compiler flag?
I have.
> C strings are fundamentally fragile.
Not if you know what you're doing. You're almost certainly using a C program to type this response in an operating system largely written in C. You get any segfaults lately? I don't EVER on either my Ubuntu or Debian systems.
Thanks for playing.
> Any time the microprocessor accesses memory for use as an int, it's a specific kind of int, meaning size and signedness, and the flags are adjusted properly as per the operation performed.
Sure. But the C standard specifies how addition of a pointer to an integer works in section 6.5.7, particularly paragraph 9. The specifics of what flags get set & the width of integer used are up to the implementation & the programmer, but
> For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to a complete object type and the other shall have integer type. (Incrementing is equivalent to adding 1.)
should be a pretty clear statement that pointer + integer is valid!
> > Strings in C are just pointers to `char` with
> I'm gonna end this here. I taught myself C programming by reading K&R in the late 80s, and then proceeded to do so professionally for YEARS and YEARS.
> There are people that know, and there are people that act like they know. You ever read the first two chapters of Windows Internals? You ever write C code that could make Windows system calls from the same program that could be 32- or 64-bit with a simple compiler flag?
> I have.
I'm an embedded C developer. I've been writing C for decades, but not for windows. But I do write code that can work on both 8-bit and 32-bit systems with just a compiler flag. Strings are arrays of character type with a null terminator, and array-to-pointer decay works as usual with them.
>> C strings are fundamentally fragile.
> Not if you know what you're doing. You're almost certainly using a C program to type this response in an operating system largely written in C. You get any segfaults lately? I don't EVER on either my Ubuntu or Debian systems.
> Thanks for playing.
C strings are arrays of character type with a null terminator. That is fundamentally fragile, since it includes no information about the encoding or length of the string, and thus allows invalid states to be represented. That doesn't mean you will get segfaults, only that it's possible for someone to screw up & interpret your UTF-8 data as ASCII or write a `\0` in the middle of a string or other such mistake, and you'll get no protection from the type system.
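A toy example of that fragility (mine, just to make the point): one stray byte and every length-unaware consumer silently sees a different string.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char msg[] = "hello world";
    msg[5] = '\0';                        /* one stray write into the middle */

    printf("strlen: %zu\n", strlen(msg)); /* 5, not 11 */
    printf("text:   %s\n", msg);          /* "hello" - the rest silently vanishes */
    return 0;
}
```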
Every C compiler I've worked with could output the code as assembler, so C is really a thin layer of abstraction that wraps assembler. Having programmed in pure assembler before, I understand the benefits of C's abstractions, which began with its minimal, but helpful, type system.
Should I not be taking you seriously?
We are not just talking with each other but sharing our expertise with those who may be reading.
Sometimes I forget that other people can just be unpleasant on purpose. I find no other explanation for your response.
Or C. It just turns into pointer math. Godbolt example here[1], just make sure the `int` is an offset within the bounds of the char* and it's well-defined.
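I can't paste the Godbolt link's contents here, but the idea is along these lines (my own sketch, not the linked example):

```c
#include <stdio.h>

int main(void) {
    const char *s = "abcdef";   /* 7 bytes including the terminator */
    int i = 3;

    /* Indexing is pointer math; both expressions print 'd'. */
    printf("%c %c\n", s[i], *(s + i));

    /* s + 7 is the one-past-the-end pointer (OK to form, not to
       dereference); s + 8 or s - 1 would be undefined behavior. */
    return 0;
}
```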
I've programmed in typeless languages and they are great for small programs - less than 10,000 lines of code and 5 developers (these numbers are somewhat arbitrary, but close enough for discussion). As you get over those numbers you start to run into issues, because fundamentally your word / array of bytes / multi-dimensional array of floats / ... has deeper meaning, and when you get it wrong the code might parse and give a result, but the result is wrong.
Types give me a language enforced way to track what the data really means. For small programs I don't care, but types are one of the most powerful tricks needed for thousands of developers to work on a program with millions of lines of code.
I experience it often even on teams of one or two developers and projects under 2,000 lines of code (that's still 50 pages, btw). It boils down to being able to load everything into your mind, and that heavily depends on the type of project, the data/code models, the IDE, etc., and also on various factors unrelated to coding.
A human mind is a cache -- if you overload it, something will fly out and you won't even notice. Anyone who claims that types have no use probably doesn't experience overloads. If it works for them, good, but it doesn't generalize.
Sure, in many languages we have the notation of thing.do_thing(arg1, arg2).
I suggest this is a good notation for data structures, like stack.push(10) or heap.pop().
I'm suggesting we don't use this notation for things like rules to validate a file, so I suggest we write validate(file, rules) instead of rules.validate(file).
Then we can express the rules as a data structure, and keep the IMO unrelated behavior separate.
Note that then we don't need to worry about whether it should perhaps be file.validate(rules). Who does the validation belong to? The rules or the file? The abstractions created by non-obvious answers to "who does this behavior belong to" are generally problems for future changes.
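A rough C sketch of what I mean (all the types and names here are invented for illustration): the rules stay plain data, and validation is a free function over both.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rules are just data... */
struct rule { size_t max_size; bool require_utf8; };

/* ...and so is the thing being validated... */
struct file_info { const char *name; size_t size; bool is_utf8; };

/* ...while the behavior lives in a free function that takes both,
   instead of "belonging" to either one. */
bool validate(const struct file_info *file,
              const struct rule *rules, size_t nrules) {
    for (size_t i = 0; i < nrules; i++) {
        if (file->size > rules[i].max_size) return false;
        if (rules[i].require_utf8 && !file->is_utf8) return false;
    }
    return true;
}
```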
The filename suffix isn't much more than part of the filename (a simple variable name in that analogy) - it's more convention than constraint. Nobody is stopping you from giving your file the name you want (and the OS allows). You'd need literal magic[0] to infer an actual type.
"Everything is a file" rather refers to the fact that every resource in UNIX(-like) operating systems is accessible through a file descriptor: devices, processes, everything, even files =)
You can put them in the same repository, if that is your thing.
If you put the build files in a .builds/ folder at the root of your repository, they will be run upon each commit. Just like in github or gitlab. You are just not forced into this way of life.
If you prefer, you can store the build files separately, and run them independently of your commits. Moreover, the build files don't need to be associated with any repository, inside or outside sourcehut.
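For anyone curious, a build manifest in .builds/ is a small YAML file along these lines (the image, packages and repo URL below are placeholders from memory, so double-check against the builds.sr.ht docs):

```yaml
# .builds/ci.yml - picked up and run on each push
image: alpine/edge
packages:
  - gcc
  - make
sources:
  - https://git.sr.ht/~someone/someproject
tasks:
  - build: |
      cd someproject
      make
  - test: |
      cd someproject
      make check
```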
Thanks! Yes, I handcraft all my HTML and CSS. I'm glad you noticed the HTML and liked it. I find great joy in crafting my website by hand. It's like digital gardening. I grow all my HTML and CSS myself. It's all 100% organic and locally sourced!
I rarely ^U nowadays, but your site was so clean that I couldn't resist!
Just as a side note: when writing html5 by hand, you can use the full power of the language, most notably optional tags (no need to write html, body, etc) and auto-closing tags (no need to close p, li, td, etc). You may get something even crispier!
> Notice that, when writing html5 by hand, you can use the full power of the language, most notably optional tags (no need to write html, body, etc) and auto-closing tags (no need to close p, li, td, etc). You may get something even crispier!
Yes! In fact, sometime back I wrote a little demo page to show the minimal (but not code-golfed) HTML we can write such that it passes validation both with the Nu HTML Checker and HTML Tidy.
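I can't reproduce that exact demo page here, but the general shape is something like this (my own sketch; I haven't run this particular snippet through the validators):

```html
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<title>Minimal</title>
<p>No head, no body, no closing tags,
<p>and it is still a complete document.
```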
That said, when writing my own posts, I prefer keeping optional and closing tags intact. Since I use Emacs, I can insert and indent closing tags effortlessly with C-c /. It's a bit like how some people write:
10 PRINT"HELLO
But I've always preferred:
10 PRINT "HELLO"
I find the extra structure more aesthetically pleasing.
In my time we just wrote very small on scraps of paper or tore pictures from magazines. We folded them tight into a compact rectangle, then we folded that rectangle across a rubber band between our thumb and index finger and with the other hand we stretched that rubber band enough so that when released, it would propel the folded rectangle of paper across the intervening distance rapidly enough that the adversarial player in the room could not detect the source or recipient of the rectangle without querying the entire congregation.
This method eliminated potentially adversarial middlemen in transit who might, if you chose to pass it through multiple players (servers if you will) - read it in transit though the message was not intended for them, and then use the contents against you later.
It had the disadvantage that one needed to ensure that the sender and recipient were in sync in case the aim was off and the message bounced to an unintended recipient.
I once had the misfortune of sending a tightly folded, secure message that was part of a war game being played during English class, and having that poorly aimed message hit the largest mass of muscle in the class right squarely in the ear because the recipient was busy gloating over the success of their previous move and wasn't able to secure the reply in transit.
We all heard the light snapping sound of the rubber band followed by an uncharacteristically loud profanity from the unintended recipient, my own barely stifled gasp of horror, lots of giggles and laughter from the audience, and as they turned - the beginning of the next round of the Inquisition by the adversarial instructor who mistakenly thought we were all watching the English lesson on the board in real time instead of conducting paper war games in the background.
At first I really liked this idea, but then I realised the size of stack frames is quite limited, isn't it? So this would work for small data but perhaps not big data.
In theory, this is a compiler implementation detail. The compiler may choose to put large stack allocations on the heap, or to not even use a stack/heap system at all. The semantics of the language are independent of that.
In practice, stack sizes used to be quite limited and system-dependent. A modern linux system will give you several megabytes of stack by default (128MB in my case, just checked on my linux mint 22 wilma). You can check it using "ulimit -a", and you can change it for your child processes using "ulimit -s SIZE_IN_KB". This is useful for your personal usage, but may pose problems when distributing your program, as you'll need to set up the environment where your program runs, which may be difficult or impossible. There's no ergonomic way to do that from inside your C program, that I know of.
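For completeness, reading the limit from inside the program is easy enough with getrlimit; it's raising it for the already-running main thread that has no good answer. A sketch:

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("stack soft limit: unlimited\n");
        else
            printf("stack soft limit: %llu KB\n",
                   (unsigned long long)(rl.rlim_cur / 1024));
    }
    return 0;
}
```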
It's a giant peeve of mine that automatic memory management, in the C language sense of the resource being freed at the end of its lexical scope, is tied to the allocation being on the machine stack, which in practice may have incredibly limited size. Gar! Why!?
Ackshually, it has nothing to do with the C language. It's an implementation choice by some compilers. A conforming implementation could give you the whole RAM and swap to your stack.
Yes, but does any implementation actually do that?
AFAIK Ada is typically more flexible, but that has to do with the language actually giving you enough facilities to avoid heap allocations in more cases - e.g. you can not only pass VLAs into a function in Ada, but also return one from a function. So it becomes idiomatic, and compilers then have to support this (usually by maintaining a second "large" stack).
Yea, usually the stack ulimit is only a few MiB for non-root processes by default on linux.
It is easy enough to increase, but it does add friction to using the software as it violates the default stack size limit on most linux installs. Not even sure why stack ulimit is a thing anymore, who cares if the data is on the stack vs the heap?
It isn't a practical pattern for anything beyond the most trivial applications. Consider what this would look like if you tried to write a text editor, for instance - if a user types a new line of text, where is the memory for that allocated?
The problem is that, no matter how you approach it, it does not have an answer for an event-loop based program with unbounded run time, other than "allocate all of memory into a buffer at startup and implement your own memory manager inside that".
Which just punts the problem from a mature and tested runtime library to some code you just make up on the spot.
Heap was invented for a reason, and some tasks are naturally easier to model with it.
The problem is that once it's there, people start using it as the proverbial hammer, and everything looks like a nail even if it isn't.
Note though that "allocate all of memory into a buffer at startup" is a lot more viable if you scope it not to the start of the app, but to the entrypoint of some code that needs to make a complicated calculation. It's actually not all that uncommon to need something heap-like to store temporary data as you compute - e.g. a list or map to cache intermediary results - but which shouldn't outlive the computation. Ada access types give you exactly that - declare them inside the top-level function that's your entrypoint, allocate as needed in nested functions as they get called, and know that it'll all be cleaned up once the top-level function returns.
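For those of us not in Ada land, the same shape can be sketched in C with a throwaway arena (everything below is invented for illustration; it's not the Ada mechanism itself):

```c
#include <stdlib.h>
#include <string.h>

/* A tiny bump allocator: one malloc at the entrypoint, cheap
   allocations inside, one free on the way out. (Alignment handling
   is omitted for brevity.) */
struct arena { unsigned char *base; size_t used, cap; };

static void *arena_alloc(struct arena *a, size_t n) {
    if (a->used + n > a->cap) return NULL;  /* out of local memory */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Entrypoint of the complicated calculation: everything allocated
   through the arena dies when this function returns. */
int compute(void) {
    struct arena a = { malloc(1 << 20), 0, 1 << 20 };
    if (!a.base) return -1;

    /* ...nested helpers allocate from the arena as they get called... */
    int *scratch = arena_alloc(&a, 1000 * sizeof *scratch);
    int ok = (scratch != NULL);
    if (ok) memset(scratch, 0, 1000 * sizeof *scratch);

    free(a.base);  /* the whole local "heap" is cleaned up at once */
    return ok ? 0 : -1;
}
```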
That works for something where the events being handled are like "serve a web page" or "compile a C function". It doesn't work for a spreadsheet or word processor or a web browser.
It would be more accurate to say that it doesn't work for some of the allocations in a spreadsheet or word processor app. Which is why you still have the global heap, but the point is that not everything needs to be on the same heap that has the same overall lifetime. That spreadsheet might still be running some algorithm that can do what it needs to do with a local heap.
And that aside, there are still many apps that are more like "serve a web page". Most console apps are like that. Many daemons are, too.
I'm not convinced it even works very well for either of those cases. It's common in many applications to return the result of a computation as an object in memory, like an array or string of arbitrary length or a treelike structure. Without the ability to allocate memory which exists after a function exits, I'm not sure how you'd do that (short of solutions which create arbitrary limits, like writing to a fixed-size buffer).
Well, yes, but I'm trying to be generous to the PoV.
My preferred solution is definitely to use the GC. With some help if you want. You can GC the nursery each time around the event loop. You can create and destroy arenas.
It's a fairly common usage in numeric computing. If you read, for example, the Wikipedia entries for "computational fluid dynamics", you'll see that they consistently speak of "codes" when referring to programs.