If it has inline assembly, it's low-level in my opinion. This feature (or at a minimum the ability to link against and interface with assembled objects) is _the_ requirement for all hardware-facing programming, since general-purpose programming languages cannot represent all possible hardware minutiae within the language.
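For example, x86 port I/O has no C-level equivalent; a minimal GCC-style sketch (the helper names are just the conventional ones):

    #include <stdint.h>

    /* Hardware-facing code drops to inline asm for instructions the
       language cannot express (GCC extended-asm syntax). */
    static inline void outb(uint16_t port, uint8_t value)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
    }

    static inline uint8_t inb(uint16_t port)
    {
        uint8_t value;
        __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
        return value;
    }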


This is certainly an interesting argument for making certain behavior (in this case, reading uninitialized memory) both the default and UB. There's a similar argument for making signed overflow UB instead of defining it to wrap, even if you're only targeting two's-complement machines: leaving the behavior undefined enables analyzers to detect it, and making it the default makes otherwise silent errors detectable across all programs. I think I've come around to wanting these to be undefined and the default. It's unintuitive, but defined wrapping or zero initialization may be undesirable behaviors anyway.
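A toy illustration (hypothetical function): because the read below is UB, tools are allowed to flag it; if locals were defined to be zero, the same code would be silently "correct" whether or not the author meant it.

    int sum_first(int n)
    {
        int total;               /* missing '= 0' -- bug or intentional? */
        for (int i = 0; i < n; i++)
            total += i;          /* UB today: -Wuninitialized and MSan can
                                    flag it; with defined zero-init they
                                    could not tell intent from accident */
        return total;
    }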


The dangerous behavior should be opt-in, not opt-out. I appreciate that C gives you all of these neat footguns, but they need to be hidden away so that only those who need them will find them. Stuff like implicit function declarations being the default, int wrapping, and uninitialized variables just gives rise to bugs. And for what? So first-year students can have their unoptimized code be 0.00001% faster by default? It's dumb.


If it's opt-in, then code written for the default (i.e. most code, since little code is written to exploit the unintuitive behavior for some performance reason) will be either well-formed code that relies on the behavior (fine, though for signed int wrapping this is rare, and for zero init it is common but not universal) or ill-formed code that subtly fails (e.g. no check for overflow, or zero being an invalid value). The ill-formed code cannot be checked by present or future static or dynamic analyzers, since the failure condition (signed overflow, or access before assignment) is _defined to be something valid_, which prevents analyzers from determining whether the programmer intended the signed arithmetic to overflow, or intended the value to be zero, etc.

Hopefully I'm communicating why it is useful to leave the default behavior undefined or invalid. It doesn't really have anything to do with performance: signed wrapping is no less performant on two's-complement machines (barring compiler optimizations enabled by assuming no overflow), since wrapping is what add instructions produce on overflow anyway. The benefit is that instrumentation and analyzers can detect the behavior, because it is known to be invalid rather than something the programmer might have intended.

As an analogy, consider what would happen if out-of-bounds array access were defined to be _something_: analyzers and instrumentation could no longer report it as an error, since the programmer may have intended whatever that defined result is.
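Concretely (hypothetical function; GCC/Clang flags):

    #include <limits.h>

    int next_id(int id)
    {
        return id + 1;  /* UB when id == INT_MAX: builds with
                           -fsanitize=signed-integer-overflow report the
                           overflow at runtime. Built with -fwrapv instead,
                           the wrap is defined, so no tool can distinguish
                           intended wrapping from a missing overflow check. */
    }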


Eh, there are better implementations that are less syntactically obtuse (no ->void), but other than that it's fine. It's fairly obvious what it's supposed to do, and I've needed similar things in the past. There's a CppCon talk that uses the ->* operator for precedence reasons; that macro lets you write 'defer { … };'.
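In C you can get similar scope-exit behavior from the (non-standard) GCC/Clang cleanup attribute; a minimal sketch, not the talk's lambda-based implementation:

    #include <stdio.h>

    static void close_file(FILE **fp)
    {
        if (*fp)
            fclose(*fp);   /* runs when the variable goes out of scope */
    }

    int demo(void)
    {
        __attribute__((cleanup(close_file))) FILE *f = fopen("data.txt", "r");
        if (!f)
            return -1;
        /* ... use f; it is closed on every return path ... */
        return 0;
    }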


I've noticed a similar issue in a different crypto library (mbedTLS): IIRC its MPI implementation made and freed _a lot_ of tiny allocations during ECC operations.


How does this compare to a 128-bit MCG or LCG?

See https://www.pcg-random.org/posts/does-it-beat-the-minimal-st... for examples and constants.
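For reference, a 128-bit MCG is tiny; a sketch using the non-standard __uint128_t (the multiplier below is one widely circulated 64-bit constant for this construction, not necessarily the linked post's):

    #include <stdint.h>

    static __uint128_t mcg_state = 0xdeadbeefULL | 1;  /* seed must be odd */

    static uint64_t mcg128_next(void)
    {
        mcg_state *= 0xda942042e4dd58b5ULL;  /* x <- a*x mod 2^128 */
        return (uint64_t)(mcg_state >> 64);  /* output the high 64 bits */
    }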


Using only the 32-bit fast_loop and mix (for 64 bits of state) passes PractRand with 64-bit output up to 256GB with only one "unusual" result. That's about equivalent to a 100-bit LCG? I did have to alter the output to be "(GR * mix) + fast_loop" and change the rotation constants to 12 and 5.


Path tracing


Depends on whether you need to allocate/deallocate nodes. If you construct the tree once and don't modify it thereafter, you don't need to. If you do need to modify and alloc/dealloc nodes, you can use a bitmap to track free/occupied slots, which is very fast (find-first-set plus bit manipulation) and has minuscule overhead even for integer-sized elements.
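A minimal sketch of the bitmap approach (one 64-bit word tracking 64 slots; chain more words for bigger pools; names hypothetical):

    #include <stdint.h>

    typedef struct {
        uint64_t free_mask;   /* bit i set => slot i is free */
        int      slots[64];   /* element storage (int-sized here) */
    } pool_t;

    static void pool_init(pool_t *p) { p->free_mask = ~0ULL; }

    static int pool_alloc(pool_t *p)            /* returns slot index or -1 */
    {
        if (!p->free_mask)
            return -1;
        int i = __builtin_ctzll(p->free_mask);  /* find first set = first free */
        p->free_mask &= p->free_mask - 1;       /* clear it: mark occupied */
        return i;
    }

    static void pool_free(pool_t *p, int i)
    {
        p->free_mask |= 1ULL << i;              /* mark free again */
    }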


Yeah, or just store all freed nodes in a linked list. E.g., keep a pointer/index from the root to the first unused (free) node, and in that node store a pointer to the next one, and so on. This is pretty trivial to implement.
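Roughly (a sketch with hypothetical names, threading the free list through one of the node's own pointers):

    #include <stddef.h>

    typedef struct node {
        struct node *left, *right;   /* tree links; 'left' doubles as the
                                        free-list link when the node is free */
        int key;
    } node_t;

    typedef struct { node_t *free_head; } tree_alloc_t;

    static void alloc_init(tree_alloc_t *a, node_t *slab, size_t n)
    {
        a->free_head = NULL;
        for (size_t i = 0; i < n; i++) {  /* thread every slab node on the list */
            slab[i].left = a->free_head;
            a->free_head = &slab[i];
        }
    }

    static node_t *node_alloc(tree_alloc_t *a)
    {
        node_t *n = a->free_head;
        if (n)
            a->free_head = n->left;       /* pop */
        return n;                         /* NULL if exhausted */
    }

    static void node_free(tree_alloc_t *a, node_t *n)
    {
        n->left = a->free_head;           /* push */
        a->free_head = n;
    }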

In my case, inserts and read operations vastly outnumber deletes. So much so that in all of my testing, I never saw a leaf node that could be freed anyway. (Leaves store ~32 values, and there were no cases where all of a leaf's values actually got deleted.) I decided to just leak nodes if it ever happens in real life.

The algorithm processes data in batches, then frees everything. So worst case, it just has slightly higher peak memory usage while processing. A fine trade in this case, given it let me remove ~200 lines of code - and any bugs that might have been lurking in them.


Not directly, but I wouldn't be surprised if there's enough of an efficiency improvement to obviate hiring an engineer or two (across 100+ people). In the same way that Google and StackOverflow made people more efficient compared to searching through and reading physical documentation (to debug, or to understand some API or hardware detail), LLMs have made me more efficient by giving tailored answers to my questions without as much searching or reading. They can provide small code examples as clarification too.

In many ways LLMs feel like the next iteration of search engines: they’re easier to use, you can ask follow up questions or for examples and get an immediate response tailored to your scenario, you can provide the code and get a response for what the issue is and how to fix it, you can let it read internal documentation and get specialized support that wouldn’t be on the internet, you can let it read whole code bases and get reasonable answers to queries about said code, etc.

I don't really see LLMs automating engineers end-to-end any time soon. They really are incapable of deductive reasoning; the extent to which they appear capable is emergent from inductive phenomena, and it breaks down massively when the input is outside the training distribution (see all the examples of LLMs failing basic deductive puzzles that are very similar to a well-known one, but slightly tweaked).

Reading, understanding, and checking someone else's code is harder than writing it correctly in the first place, and letting LLMs write entire code bases has produced immense garbage in all the examples I've seen. It's not even junior-level output; it's something like _panicked CS major who started programming a year ago_ level output.

Eventually I think AI will automate software engineering, but by the time it's capable of doing so, _all_ intellectual pursuits will be automated, because it requires human-level cognition and adaptability. Until then, it's a moderate efficiency improvement.


Excellent breakdown. Software engineering will be automated end-to-end around the same time as doctors and lawyers.


The use case that comes to mind is doing manual compile-time optimization based on macro arguments. E.g. you have some assembly block that is fast but requires certain immediate arguments, you have a fallback path for the dynamic case, and you want to determine which one to call at compile time based on whether the arguments are constants or not.
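A sketch with __builtin_constant_p (hypothetical names; in practice the constant arm would be the asm-with-immediates fast path):

    #include <stdint.h>

    uint32_t hash_dynamic(uint32_t x, uint32_t mul);  /* out-of-line fallback */

    /* When 'mul' is a compile-time constant, the open-coded expression
       constant-folds (or could be an asm block demanding an immediate);
       otherwise dispatch to the dynamic fallback. */
    #define HASH(x, mul)                          \
        (__builtin_constant_p(mul)                \
            ? (((x) * (mul)) ^ ((x) >> 16))       \
            : hash_dynamic((x), (mul)))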


__builtin_choose_expr can be used instead of a ternary to avoid the type conversion rules that would otherwise require the typeof cast.
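E.g. a type-generic helper (hypothetical; GCC/Clang, C only). With a plain ternary both arms would be balanced to double, so the integer case would need a typeof cast back; __builtin_choose_expr keeps the chosen arm's own type:

    #include <stdlib.h>
    #include <math.h>

    #define GABS(x)                                                  \
        __builtin_choose_expr(                                       \
            __builtin_types_compatible_p(__typeof__(x), double),     \
            fabs(x),   /* chosen for double: result type double */   \
            abs(x))    /* chosen otherwise: result type int    */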

