The very first impression of running the 'valec' compiler is that it "panics" when no arguments are given. It literally prints: "(panic)". I want to point out that "panic" is a very strong word and should be avoided in scenarios where normal error handling ought to take place. Any kind of panic is always a sign of an uncontrolled situation, and if a program ever "panics" it leaves a bad taste in the mouth.
The next struggle was to get basic help for the command-line parameters. But it's currently non-existent.
The next and final struggle: trying a hello world sample. I copied the code from the website and saved it to a 'hello.vl' file. Then I tried to build it: However, no luck for me this time: It looks like I should specify the 'build' command. Let's try: Well, here is the result: Hm. Let's try to get some help: The result of the help command is: Not very helpful. At this point, I gave up. How does this thing work?
Sorry about that, it seems it doesn't print out the help file correctly any more. If you manually cat the valec-help-build.txt in the download, it should explain what you're looking for.
The compiler is very rough around the edges right now. August through May was spent 100% focused on prototyping regions, and you're experiencing the tech debt I accrued on the way there (including the lack of an integration test for the help system). I've been paying that debt down for the last 1-2 months, and we're still not back up to where we were at the 0.2 release.
If you need any more help, let me know, or swing by the discord server where there are many helpful folks. Cheers!
Isn't Vale basically still in the R&D phase? That's how it has felt, certainly. I would expect specific commits on specific branches to work and that's it - not that they have a compiler that arbitrary people can use to start building things.
Then again, their README doesn't really indicate this and does say "Try Vale", so idk. But it seems very R&D/POC at this point.
I don't understand - what set of programmers never (do/want to?) reach 1.0? Are you referencing something like the TeX versioning scheme?
Conventionally, 1.0 is just the inflection point where things are featureful and stable enough that you start making guarantees about the stability of whatever interface/contract/API matters to the consumer of whatever you are building.
At least in the past, a lot of useful (and actually 'finished') tools were forever at some 0.x version, especially before semantic versioning got popular. I guess they just wanted some increasing version number, but '1, 2, 3, 4, 5, ...' looked too weird, so they went for '0.1, 0.2, 0.3, ...'.
These aren't even bugs, they're "UI problems." For something experimental, I think it deserves some slack even for actual bugs. UI-wise and even bugs-wise, the 40-year-old C++ debugging experience in the 35-year-old gdb will give any experimental language a run for its money (e.g. printing funcname()::staticvarname is weird UI and fails about half the time, etc.), not to mention C++ build systems. I think for experimental tech you might criticize the concept, but it's entitled to a rough UI.
...more predictable latency than tracing garbage collection.
...better performance and cache friendliness than reference counting.
...prototype and iterate more easily than with borrow checking.
Ok, you had my curiosity, but now you have my attention.
Generational references leak if a counter reaches int_max and involve an increment on alloc and on free. Seems pretty close to reference counting to me.
> Generational references leak if a counter reaches int_max
With 64-bit counters, that's never going to happen. Alloc/free costs more than a nanosecond, and there's a lot more than one object you'll be allocating, but even if you somehow managed to reallocate the same object a billion times a second, it would take over 500 years to run out of indexes.
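Back-of-envelope, under that (already generous) one-reallocation-per-nanosecond assumption:

    2^64 ≈ 1.8 * 10^19 generations
    1.8 * 10^19 / 10^9 per second ≈ 1.8 * 10^10 seconds ≈ 585 years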
What if you reach 100 billion times a second (maybe with 128 cores) and have a long-running system? Maybe in that case 128-bit or 96-bit counters are better...
If your program is allocating so much that you can exhaust a 64bit counter, you have a seriously bad program plus a serious memory leak. Exhausting the counter would be the least of your worries.
Practically speaking you could never allocate memory that fast, a memory allocation is going to be well over 1000ns on average.
Then there's the little matter of address space. Pointers on x64 are limited to 47 bits, meaning that even if you had a magical memory allocator with no book-keeping overhead, and all your allocations were 1 byte, you'd run out of pointers first. The actual virtual memory space is limited further on many operating systems, but you're still always going to be well short of 64 bits.
Except when it is. You'd be surprised what systems programming looks like.
Reference counts are not re-used when memory is re-used. And again, even if for some reason you had a global 64bit counter that you incremented on every allocation and never decremented, and you could somehow handle a billion allocations per second, you'd have 585 years before that counter overflowed back to 0. No computer or program can run for that long.
VMAs sound expensive. Of course a 64 bit counter is going to work for moderately expensive things.
If you have a multithreaded app doing a lot of communication, that's going to be a lot of cheap allocations happening very fast.
Reducing GC and allocation overhead results in more allocations being done, and pushback against ever-expanding allocation behavior is more of a challenge. Instead of ten other things being a higher priority than judicious data architecture, it's dozens or more.
There is a separate counter for every memory object. One counter is only ever going to be touched by a single core at a time.
And even if there was a single counter, multithreading cannot make incrementing a counter faster. Two cores cannot write to the same cache line at the same time. Instead, cache lines need to bounce across cores when you write to them, and this takes such a long time that it turns the time it takes to roll over from centuries to millennia.
My understanding is that this has some benefits over reference counting, though it is similar. Part of the issue with reference counting is that the counts are shared, meaning they need to be atomically incremented and decremented whenever you make a new reference (for instance in C++ if you return a shared_ptr or similar). Generational references instead track something as part of the value you pass around, meaning better locality and no contention.
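A minimal C++ sketch of the general idea (hypothetical names and layout, not Vale's actual implementation): the handle carries a generation, the check happens on access, and copying a handle is a plain struct copy with no atomic operations at all.

    #include <cassert>
    #include <cstdint>
    #include <vector>

    // A handle carries the slot index plus the generation it expects there.
    // Copying one is a plain struct copy: no counter is touched.
    struct Handle { size_t index; uint64_t generation; };

    template <typename T>
    class Pool {
        struct Slot { uint64_t generation; bool live; T value; };
        std::vector<Slot> slots_;
    public:
        Handle alloc(const T& v) {
            // Reuse the first dead slot, or grow (a real pool keeps a free list).
            for (size_t i = 0; i < slots_.size(); ++i) {
                if (!slots_[i].live) {
                    slots_[i].live = true;
                    slots_[i].value = v;
                    return {i, slots_[i].generation};
                }
            }
            slots_.push_back({1, true, v});
            return {slots_.size() - 1, 1};
        }
        void free(Handle h) {
            assert(valid(h) && "double free or stale handle");
            slots_[h.index].live = false;
            ++slots_[h.index].generation;  // invalidates every outstanding handle
        }
        bool valid(Handle h) const {
            return h.index < slots_.size()
                && slots_[h.index].live
                && slots_[h.index].generation == h.generation;
        }
        T& get(Handle h) {
            assert(valid(h) && "use after free");  // the generation check
            return slots_[h.index].value;
        }
    };

Alloc and free each touch the generation once; sharing a handle around touches nothing, which is where the locality/contention win over refcounting comes from.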
Copying references is assumed to be more frequent than allocating and freeing, so this is a win.
Typically that slot is disabled on overflow, so that no more objects can be created at that slot / memory location, which avoids the handle collision. The slot could be recycled at specific points in the code when it is certain that no more handles for this slot are out in the wild (not sure if Vale does that)
(Or possibly the whole region could be discarded once it was running full, and the physical memory recycled at a new virtual address. There's plenty of virtual address space to burn through when not limited to 32 bits.)
Reference counting involves an increment every time a reference is shared. That's a lot more overhead than just on alloc and free - in particular, it involves going to main memory (or using part of the cache) a lot more in order to change the ref counts. Whereas on alloc and free, those frames have to be fetched anyway (at least in my understanding of how memory allocators work).
Refcounted systems can form cycles, which cause leaks (in this universe).
But I digress. Overhead like increments only matters on hot paths, which are very few. The Python + C stack for ML is a manifestation of this truth.
Having ergonomics of a “regular language” (affects all code) and the ability to optimize for performance (hot paths only) and stay in the same language is what I’m excited about.
Uh? While generational references are memory safe, your program still crashes when there is a memory issue...
It's much better than going on in a corrupted state, but it's still a crash.
GitHub: They keep 6%. With 3% for CC fees and 3% for GitHub. [1]
Patreon: Varies a bit more. Patreon takes 8%, unless the creator has been on the platform since before the 2019 change and is still on the 5% plan. Payment processing depends on size: under $3 it's 5% plus $0.10 per transaction; over that, 2.9% plus $0.30 per transaction. And more if paying via PayPal or Venmo, or not in USD.
[2]
So the split seems much better on GitHub. But the conditions are a bit different for using the platforms, and you can get perks on Patreon which you may not be able to get on GitHub. I can't remember who / which project but I believe I saw one that said something about a difference in taxes / VAT and not being able to give some of the perks on GitHub because of it. Cannot find it right now though.
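As a rough worked example on a hypothetical $10/month pledge, using the fee figures above (standard 8% Patreon plan, over-$3 processing tier):

    GitHub:  6% of $10.00 = $0.60            -> creator keeps $9.40
    Patreon: 8% of $10.00 = $0.80
             2.9% of $10.00 + $0.30 = $0.59  -> creator keeps $8.61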
Cranelift is a compiler backend, mainly focused on JIT compilation, but it could theoretically replace LLVM. There's an alternative rustc backend being worked on, though it has limitations: https://github.com/bjorn3/rustc_codegen_cranelift
The approach of having options to optimize hot code paths with zero cost abstractions while still not having to worry about memory management in the vast majority of the rest of your code sounds like the best of both worlds to me (given that we only trade performance, not safety for convenience).
The problem of memory management is largely trivial _if_ you are in a small clean opinionated private codebase without cruft, collaborators, third party code, ...? :)
Google et al have been working on sanitisers etc because, even in well kept codebases with strict coding standards that are rigorously applied in reviews, memory bugs do actually creep in.
Memory management is trivial if your problem is trivial. In the real world you have network connections that fail, third party libraries with other conventions than yours, multiple threads with their own lifetimes, memory mapped files, etc.
Non-local bugs due to the lack of memory safety. If I have a function
    def f(x):
        return ...
in Python or Java, those functions will work regardless of what other modules I import. (modulo monkey patching in Python, though you can defend against that)
In C you don't have these guarantees -- foreign code can stomp on your code.
This is probably why C does not have an NPM-like culture of importing thousands of transitive dependencies -- because that style would just fall down.
Also a minor issue, but C doesn't have namespaces (for vars or macros), so it's a bit harder to compose foreign code.
Also compiler flags don't necessarily compose. Different pieces of C code make different portability assumptions, and different tradeoffs for speed.
So your argument is that C does not have strong module encapsulation, then you argue that Python does.
That is just plain false, since a Python module can trivially be tainted by what you import before, and the Python environment is widely known for its dependency hell.
Meanwhile C modules, once compiled, can be fully isolated from what you link against them, depending on build and link settings.
Non-local bugs are just a matter of sharing state across a module boundary. Memory errors are just a very small subset of the possible bugs you can have in a program, and preventing them doesn't magically solve all the other, more important bugs.
Not disagreeing with your first paragraph, and will add that memory management mistakes happen to the best. But it is also probably true that Google and others do this because they know there will always be someone committing shit, no matter whether they are at Google or another big company. So they want guarantees, not blind trust.
> _if_ you are in a small clean opinionated private codebase
This is actually an important point. I think all codebases can (and should) be split into small, opinionated, privately owned sub-codebases. This is why developing large scale projects can work even in languages like C. After all this is what that whole 'modularity' thing is about ;)
(it also implies that external dependencies need to be managed the same way you handle internal dependencies, as soon as you use an external dependency you also need to be ready to take ownership of that dependency)
Memory management is fundamentally a cross-cutting concern, so modules don’t help, unless you introduce some hard barrier (like copying everything at boundaries).
Modules work if they can operate without allocating or are generic over allocators. I don't really get why people think it's normal for e.g. a websocket decoder to insist on calling read, write, epoll, and mmap, if the user just wants to encode and decode messages.
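A sketch of what that looks like in C++ (the 1-byte length-prefix framing here is hypothetical, just a stand-in for real WebSocket framing): the decoder never reads, writes, epolls, mmaps, or allocates - it only interprets bytes the caller hands it.

    #include <cstddef>
    #include <cstdint>

    struct Frame { const uint8_t* payload; size_t len; };

    enum class DecodeStatus { NeedMoreData, FrameReady };

    // Pure transformation: the caller owns the buffer and the socket.
    DecodeStatus decode_frame(const uint8_t* input, size_t len,
                              Frame* out, size_t* consumed) {
        if (len < 1 || len < 1 + static_cast<size_t>(input[0]))
            return DecodeStatus::NeedMoreData;  // caller decides when to read more
        out->payload = input + 1;               // points into the caller's buffer,
        out->len = input[0];                    // so nothing is allocated here
        *consumed = 1 + out->len;
        return DecodeStatus::FrameReady;
    }

The same shape works for being generic over allocators: take the allocator (or the output buffer) as a parameter instead of reaching for malloc or mmap internally.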
Generational-indices also help to secure system boundaries. The memory is always owned and manipulated by a system, and the system only hands out generational-index-handles as "object references".
Arguably that's even a good idea in memory safe languages: it avoids tricky borrow checker issues, and also prevents the outside world from directly manipulating objects. Everything happens under control of the system.
If you witness the amount of effort/work/man-hours that is being poured into making memory management easier, I'd say it is far from a trivial problem.
If you witness the endless number of bugs, many security related, which stem from the idea that people can handle memory, I'd say it is far from a trivial problem.
If you witness any modern language, a common design principle is to eliminate manual memory management, which argues it is far from a trivial problem.
Elimination is perhaps too strong a word, as you can't eliminate it entirely. But you can reduce its cognitive load by a large factor. The amount of code being written in garbage-collected languages is witness to that.
More manual memory management methods still have their place, because there are problems where you can't afford to use a garbage collector, or where it gets in the way.
C++ will be relevant for many years to come. It has way too much momentum as a language and too much software has been written in C++ to ignore it. I personally think Rust will eventually carve up a large part of its niche though, because I think it has a far better approach to managing memory.
I dunno about the GP, but it's a JPL guideline to never use dynamic allocation after initialization. So it's not unthinkable. I'd suspect that many microcontroller programs might have to be really careful about using the heap just because they just don't have the memory to allocate that much.
https://www.perforce.com/blog/kw/NASA-rules-for-developing-s...
It's pretty easy in a lot of embedded applications to basically only have objects that live forever or are allocated on the stack. I usually aim for zero heap at all, and just have statically allocated objects for the 'forever' set (which makes it easier to see what's using memory). If you're careful you can also statically work out worst-case stack usage and have a decent guarantee that you won't ever run out of memory. If there are short-lived objects, a memory pool or queue is usually the best option (though at that point you do invite use-after-free type errors and pool exhaustion). I would say with this style it's extremely rare to have memory safety issues, but it's also not really suitable for a lot of applications.
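A small C++ sketch of that style (names and sizes are illustrative): the 'forever' set is statically allocated, and the few short-lived objects come from a fixed pool whose exhaustion is handled explicitly.

    #include <array>
    #include <cstddef>

    struct Sensor { int last_reading; };

    // The 'forever' set: visible in the data/bss sections at link time.
    static std::array<Sensor, 8> g_sensors{};

    // A fixed pool for short-lived objects; no heap anywhere.
    template <typename T, size_t N>
    class StaticPool {
        std::array<T, N> items_{};
        std::array<bool, N> used_{};
    public:
        T* acquire() {
            for (size_t i = 0; i < N; ++i)
                if (!used_[i]) { used_[i] = true; return &items_[i]; }
            return nullptr;  // pool exhausted: handle it, don't crash
        }
        void release(T* p) { used_[static_cast<size_t>(p - items_.data())] = false; }
    };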
C++ uses "value type" to mean either a scalar object (int, tuple<double>, etc.) or a container that manages heap memory for you, e.g. a vector of a value type. If you stay in that world you can basically ignore memory management.
Staying away from std::unique_ptr<T> and std::unique_ptr<T[]> while using std::vector<T> sounds kind of silly. The last one is a generalized version of the first two. So claiming you don't use the first two is really misleading.
I'm not sure how you define "value type" (it certainly isn't C++ terminology; are you coming from C#?) but in any case, this is a distinction without a difference. You can replace every use of std::unique_ptr with std::vector and just switch a few method calls (like using .data() instead of .get()) and you'd achieve the same effects, just slower. I'm not sure what the point would be though, other than to be able to claim that you don't use smart pointers.
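To make the equivalence concrete, a trivial sketch:

    #include <memory>
    #include <vector>

    void demo() {
        // Heap array behind a smart pointer...
        std::unique_ptr<int[]> a = std::make_unique<int[]>(100);
        int* pa = a.get();

        // ...and the "value type" equivalent: same single ownership and
        // automatic cleanup, plus size/capacity bookkeeping on top.
        std::vector<int> b(100);
        int* pb = b.data();

        (void)pa; (void)pb;  // silence unused-variable warnings
    }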
I keep wondering what "safe" means in the context of generational references.
If I understand clearly, this prevents use-after-free and double-free? Thus, the program can still fail on a memory access when the expected and actual generations don't match? In this regard, this seems less "safe" than reference counting, tracing garbage collector, or borrow checking?
Double-frees are prevented by Vale's single ownership (in the C++ sense), generational references make it so use-after-frees are safely detected. If we try to access released memory via a reference, we should predictably+safely get either a segmentation fault or an assertion failure (and a future improvement involving remapping virtual space will make it so we get no segmentation faults, which I'm pretty excited for). Hope that helps!
> Double-frees are prevented by Vale's single ownership (in the C++ sense)
...wouldn't that also be prevented by the generation check even if there is no single ownership? Because once the referenced item is destroyed (thus bumping that memory slot's generation counter), the item reference becomes invalid because the generation no longer matches, so the next attempt to release the item with that same reference should also fail?
One nice property of generational-indices is that they can be shared without compromising memory safety. As soon as the item is destroyed, all shared references in the wild automatically become invalid. But I guess single-ownership still makes a lot of sense for thread-safety :)
It's safe the same way a segfault is safe, instead of just allowing reads or writes of random memory through a dangling pointer. But generational indices should also allow checking at runtime whether an access would be valid before actually attempting it. Not sure if that's possible in Vale though.
How would it even stop use-after-free and double-free?
The "check" function accesses the allocation because it needs the generation number of the allocation. So basically, the reference needs to access the allocation to check if it can access the allocation. Right.
(That doesn't work of course, because if the allocation was freed, access to the allocation and so its generation number is undefined).
This seems obvious so maybe I am missing something big here?
Or something entirely different is meant or targeted here with "memory safety".
I think the part where your reasoning is invalid is this part:
> so its generation number is undefined
With e.g. a random C compiler and a random malloc, that's true. But why couldn't the language and runtime cooperate to ensure it is defined?
For example deallocation can write a predictable value to that slot, which is never used as a legit generation index. The memory allocator can make sure that a memory address that ever contained a generation can never contain anything else than generation ids for the entire runtime of the program (e.g. by ensuring that for a given page, all objects are the same size and the allocations are aligned to that size). The language can make sure that nothing else can get written to such a memory address by enforcing bounds checks.
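A sketch of that invariant in C++ (all constants hypothetical): same-size slots per page, each slot starting with its generation word, and a tombstone value that is never issued as a live generation.

    #include <cstdint>

    constexpr uint64_t kTombstone = 0;  // written on free, never a live generation

    // 64-byte slots: the first word is always a generation, for the whole
    // lifetime of the program, no matter how often the slot is reused.
    struct Slot { uint64_t generation; unsigned char payload[56]; };

    bool check(const Slot* s, uint64_t expected) {
        // Well-defined even after free: this address always holds a generation.
        return s->generation != kTombstone && s->generation == expected;
    }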
Yep, this is the correct answer. Accessing released memory is undefined in C, but well-defined in Vale. The goal is to ensure that the user predictably+safely gets either a segmentation fault or an assertion failure.
We have a future improvement planned here too: for unrelated reasons (to support generation pre-checking) the random generational references implementation will soon not even unmap any virtual address space, instead remapping it to a central page, so we won't even get any segmentation faults, just assertion failures.
Ok but then aren't you going to get memory fragmentation? If you allocate and then deallocate a billion 1kB objects, you can't then coalesce them to allocate larger units because the generation number locations before each 1kB can't be given back to user code.
In the basic generational references approach that was a drawback, and the reason it couldn't release memory back to the OS. We planned to use something like MESH [0] to reduce the fragmentation.
We created two newer approaches since then, which let any memory be reused for any purpose:
* Random generational references, where it's fine if generations overlap with other data.
* Side-table generations, which is slower, but we keep the generations in a side table. It can be seen in old 0.1 versions as the "resilient-v2" mode, and I plan on resurrecting it for unrelated reasons.
The former will be the default, and the latter we'll be adding back in as an option. Hope that helps!
Traditional memory allocators set aside some of the memory for metadata (for instance to keep track of allocated and free memory regions), I guess that Vale stores the generation count associated with an "allocation item" in a similar way, e.g. somewhere else than the actual items.
Also, the blog post talks about 'generational indices', not pointers. This seems to indicate that items of the same type (or at least same size) are grouped into arrays (and since it's an index anyway, the metadata could be stored in one or multiple separate arrays at the same index).
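A sketch of what a side table could look like (hypothetical layout, in C++): generations live in a parallel array indexed by the same slot index, so the item storage holds nothing but items.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    template <typename T>
    struct SideTablePool {
        std::vector<T> items;
        std::vector<uint64_t> generations;  // generations[i] guards items[i]

        bool check(size_t index, uint64_t expected) const {
            // The item memory can be reused freely; only the side table
            // has to stay generation-only.
            return index < generations.size() && generations[index] == expected;
        }
    };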
The big step forward by Vale is that the compiler can elide most of the 'dangling checks' on memory accesses, the method outlined in the blog post requires a few rules-of-thumb the coder must follow when using a pointer that's been looked up from a generational-index.
If I am not wrong, there is a generation number embedded in the reference (smart pointer?). This allows checking whether the generation of the reference and the generation of the referee match.
So indeed it allows you to check for a match, as long as the alloc pointer is valid. The alloc pointer is invalid after a free, because it may be in a region no longer accessible to the program (it was returned to the OS by free's implementation), or it was given out as part of another allocation, in which case it can hold arbitrary data.
Not the same language as V. The latter got a very critical review at https://mawfig.github.io/2022/06/18/v-lang-in-2022.html which I've misattributed to Vale because they're similarly named. Leaving this here in case someone else has made the same mistake.
This "critical review", appears more like continuous old spam often used by detractors or to troll. It has no perceived value other than that, because it's a "review" (hit piece) of an alpha version of the language. It's 2023, and V is also in beta (0.4). Furthermore:
1) The creator of it used a disposable GitHub account, launched the review/attack for the drama, then disappeared.
2) The only thing they ever published on their blog, was the hit job on V. No other reviews ever made.
3) Anything which had any kind of substance is already fixed [1].
4) A search for mawfig.github shows how it is spammed on HN, and usually used for smearing.
Yep, this was my bad. I thought Vala was dead because of a certain post (I think it was this one [0]), and because I rarely ever heard anyone mention it. I suspect I was wrong. I've been tossing around the idea of switching Vale's name to Valence to help avoid confusion.
“Vale is Fast: Vale is AOT compiled to LLVM, statically-typed, and uses the new generational references technique for memory safety with speed and flexibility, and will soon have region borrow checking to make it even faster.”
Yeah, this article was rather sparse on background, more intended for friends and sponsors and people who have been following along with Vale. A strategy that backfires with general audiences like HN!
TL;DR: Vale is like a cleaner C++, and it uses generational references [0] which are similar in spirit to running with ASan [1] turned on. Generational references have a bit of overhead, but it can be removed by regions [2] or more specifically, immutable region borrowing [3]. This helps Vale achieve its goal of being a high-performance language while still remaining memory safe.
Hope that helps, happy to answer any other questions =)
Being a C# developer, I absolutely love the syntax. I also like the Universal Function Call Syntax for its fluency, kind of like pipes in Elixir or F#.
I'm still going through the guide but one thing I find curious is the module naming when building your application. It seems like you clone a library to disk and then "import" it as a command line argument with the name you choose. I'm trying to wrap my head around how dependencies would work if you have the following situation:
Note how my_app uses a different name for the parse library than http.
If the http library uses "parse" in the source code when referencing the module (import parse) and my application uses "parse_with_different_name" when referencing the module (import parse_with_different_name), does that mean to compile my app I would have the following...
Funny story, that wasn't my original intent! I have a programming blog, but every day I'm finding weird facts that I want to write about, so I tend to sneak them in:
* And now I had to find a way to spread the word of Brigadier Sir Nils Olav III, so I used a side note in this one. It's embarrassing but I was giggling with glee all day yesterday at the thought of putting that note in!
I suspect this is a curse that a lot of bloggers can empathize with, but they don't have the proper lack of professionalism that I do.
Once I had these little side notes, I figured I'd give some sort of prize to the first person who told me they saw them, which evolved into "comment somewhere mentioning it!" which I guess is a social hack? Maybe? I'll allow it!
A distinguishing feature of Vale is: a) natively-compiled safe language b) that still has a sane syntax.
I remember seeing Vale many years ago (~10). Back then, it was something revolving around the Gnome project, but now it still has a pre-release version, 0.2-alpha. This means the project's progress is relatively slow, but the language is very interesting to me.
Update: I confused Vale with Vala! Vale is the new project; Vala is 10+ years old, but they have an intersection of syntax, goals and ideas. That's why I was misled by the similar name. Vala-Vale is almost the same!
I wish there was some way for me to know right off the bat that this article wasn't about the Vale natural language linter. I mean, it didn't take long, but still. Is there some notation for the linter nomenclature I'm missing?
Not much! It's still very young, still in the prototype phases. When it's more mature and polished I hope it will be useful for those writing servers and games mostly.
Correct me if I'm wrong, but I believe this is an alternative way to describe a system that is equivalent to copy-on-write, but with ahead-of-time analysis of reference counting, which means most of the reference counting can be eliminated. This kind of analysis is already done in reference-counted languages like Swift, which also do copy-on-write.
It's not ARC, and not reference counting at all, but closer to an idea that has become quite popular in game development (because it's trivial to implement, doesn't need compiler support, and works in any language that has indexable arrays):
(disclaimer: I only wrote a blog post about it, that idea is much older and probably has been re-invented many times over since the first computers were built)
Essentially "non-owning weak references with spatial and temporal memory safety".
What is similar to ARC though is that moving that stuff into the language lets the compiler remove redundant handle-to-pointer conversions, similar to how with ARC the compiler can remove redundant refcounting operations.
They don't seem to be quite zero cost though when applied to the whole program, because they require changes to the allocator to ensure the generations are never overwritten by user data.
If you store them inline with the program data for max speed(tm), you need to ensure that e.g. after two 2kB chunks are deleted, you don't overwrite them with a 4kB chunk, because that would trample over a generation.
If you do keep the generations inline and rely on a statistical approach, you have to be very careful to never generate "common numbers" like 0 as a generation because then it's extremely likely there will be a collision.
It's a hard problem and I'm quite curious how all the edge cases are handled.
Maybe Vale's "regions" are per-type (essentially arrays)? That way a specific memory location would only ever be used for that same type (== same size in memory) until the whole region is destroyed.
iOS is getting a 'typed allocator' which seems to work similarly:
Yeah, a typed allocator would be my guess, but those aren't zero cost either. They increase memory usage, since if your program allocates an array of 100 ints, deletes them, and then allocates an array of 100 floats, that memory isn't getting reused unless you allocate more ints on the heap.
I didn't understand that blog post very well, but it made me think of "generational arenas"[0], and I'm curious how they compare? Generational arenas sound similar because they involve passing an index around instead of a pointer, are designed to handle many small self referencing "objects", and are popular in games, so in my mind they seemed similar.
This is not a reference-counted system; you manually free the memory. However, the references are safe to use in the sense that they can detect when the object they are referring to is deleted.