I'm very excited by the work being put in to make Bayesian inference more manageable. It's in a spot that feels very similar to deep learning circa mid-2010s when Caffe, Torch, and hand-written gradients were the options. We can do it, but doing anything more complicated than common model structures like hierarchical Gaussian linear models requires dropping out of the nice places and into the guts.
I've had a lot of success with Numpyro (a JAX library), and used quite a lot of tools that are simpler interfaces to Stan. I've also had to write quite a few model-specific things from scratch by hand (more for sequential Monte Carlo than MCMC). I'm very excited for a world where PPLs become scalable and easier to use/customize.
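For anyone curious what that workflow looks like, here's a minimal Numpyro sketch of a small hierarchical Gaussian model sampled with NUTS (the model, data, and variable names are made up for illustration, not from any real project):

    # minimal sketch: a hierarchical Gaussian model in Numpyro, sampled with NUTS.
    # the model and data are purely illustrative.
    import jax.numpy as jnp
    import jax.random as random
    import numpyro
    import numpyro.distributions as dist
    from numpyro.infer import MCMC, NUTS

    def model(group, y=None):
        mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))          # global mean
        sigma_g = numpyro.sample("sigma_g", dist.HalfNormal(5.0))  # between-group sd
        with numpyro.plate("groups", 3):
            theta = numpyro.sample("theta", dist.Normal(mu, sigma_g))  # group means
        sigma = numpyro.sample("sigma", dist.HalfNormal(5.0))      # observation noise
        numpyro.sample("obs", dist.Normal(theta[group], sigma), obs=y)

    group = jnp.array([0, 0, 1, 1, 2, 2])
    y = jnp.array([1.1, 0.9, 2.2, 1.8, 3.1, 2.9])

    mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
    mcmc.run(random.PRNGKey(0), group, y=y)
    mcmc.print_summary()

Swapping out the model function is basically all it takes to try a different structure, which is most of why this feels so much nicer than hand-writing samplers.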
> I think there is a good chance that normalizing flow-based variational inference will displace MCMC as the go-to method for Bayesian posterior inference as soon as everyone gets access to good GPUs.
Wow. This is incredibly surprising. I'm only tangentially aware of normalizing flows, but apparently I need to look at the intersection of them and Bayesian statistics now! Any sources from anyone would be most appreciated!
if you want to understand chip architectures, work through the elements of computing systems (aka nand to tetris). it walks you through implementing your own cpu, gate by gate, plus an operating system, compiler, and tetris game, in a way tested to successfully fit inside of a single-semester undergraduate class. their hardware designs are pretty weird (their hdl looks nothing like any hdl you can actually synthesize hardware from, their descriptions of how gates work are oversimplified, and their instruction set architecture is pretty similar to the eclipse that kidder was writing about but completely unlike any current design) but that isn't really important
after completing the work in that book and raising investment capital for your company, you can honestly call yourself a fullstack engineer
— arm —
the best architecture to write assembly for is the arm—the original one, not thumb, which is a bit of a pain—and arm is also only slightly more complex to build than the fake-ass architecture in nand2tetris, and quite a bit faster. if you already know any assembly language, including the nand2tetris one, the best introduction to arm might not actually be any current tutorial but rather the vlsi arm3 datasheet from 01990 https://www.chiark.greenend.org.uk/~theom/riscos/docs/ARM3-d... which has a summary of the instruction set on p.1-7 (9/56) and a fully complete description on pp.3-12 to 3-43 (19/56 to 34/56)
as you probably know, the vast majority of the cpus around you run the arm instruction set or its thumb variant, although amd64 and aarch64 (arm64) are also significant. the current arm architecture reference manual describes all the new instructions that have been added since arm3, as well as thumb, which makes it enormous and intimidating, with a very low signal-to-noise ratio
perhaps the best introduction to writing assembly language in general on current systems is https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht..., which is not really focused on assembly at all, but on understanding how the interface between the operating system (i386 linux in this case) and user programs work, which just happens to be at the assembly-language level
it's also a lot more fun to read than any of these except maybe kidder and levy
— risc-v —
the risc-v architecture is very similar to arm but simpler; however, it's a little more of a pain to program, and there aren't any high-performance implementations of it out there, something that's likely to change in the next few years. the part of the risc-v manual https://riscv.org/wp-content/uploads/2019/12/riscv-spec-2019... that corresponds to the part of the arm3 manual i recommended above is chapter 2, rv32i base integer instruction set, pp.13–29. this reflects an additional 30 years of computer architecture lessons from arm and its successors, and a lot of those lessons are helpfully explained in the italic notes in the text. geohot livecoded a full implementation of risc-v in verilog on his twitch stream a few years ago, so you can literally implement risc-v in an afternoon: https://github.com/geohot/twitchcore
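as a taste of how small rv32i is, here's a toy python sketch (mine, not from the spec or from geohot's repo) that decodes and executes just the addi instruction; the whole base integer isa is a few dozen cases like this:

    # toy rv32i sketch: decode and execute ADDI (I-type) and nothing else.
    # field layout per the rv32i spec: imm[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    def sign_extend(value, bits):
        return value - (1 << bits) if value & (1 << (bits - 1)) else value

    def step(regs, insn):
        opcode = insn & 0x7f
        rd     = (insn >> 7) & 0x1f
        funct3 = (insn >> 12) & 0x7
        rs1    = (insn >> 15) & 0x1f
        imm    = sign_extend(insn >> 20, 12)
        if opcode == 0x13 and funct3 == 0:       # ADDI
            if rd != 0:                          # x0 is hardwired to zero
                regs[rd] = (regs[rs1] + imm) & 0xffffffff
        else:
            raise NotImplementedError(hex(insn))

    regs = [0] * 32
    step(regs, 0x00500093)   # addi x1, x0, 5
    step(regs, 0xfff08113)   # addi x2, x1, -1
    print(regs[1], regs[2])  # 5 4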
gcc can compile to risc-v and generates decent code, and linux can run on it, but the risc-v that linux runs on is quite a bit hairier to implement than the rv32i isa; you have to implement the risc-v privileged isa
— if what you're interested in is how varied cpu architectures can be —
weird in a different direction is the tera mta, which has 128 hardware threads and context-switches every clock cycle; the 11¢ padauk pmc251 microcontroller does the same thing (but with only two threads; padauk sells parts with up to four threads)
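the idea is simple enough to sketch in a few lines of python; this is just a toy illustration of 'issue from a different thread every clock cycle', not a model of the actual mta or padauk pipelines:

    # toy barrel processor: one shared pipeline, round-robin issue, so a
    # different hardware thread issues on every clock cycle. illustration only.
    from itertools import cycle

    class Thread:
        def __init__(self, name, program):
            self.name, self.program, self.pc = name, program, 0

        def done(self):
            return self.pc >= len(self.program)

    def run(threads):
        clock = 0
        order = cycle(threads)              # fixed round-robin thread order
        while not all(t.done() for t in threads):
            t = next(order)                 # context switch: zero cost, every cycle
            if not t.done():
                print(f"cycle {clock}: thread {t.name} issues {t.program[t.pc]}")
                t.pc += 1
            clock += 1                      # in real hardware this interleaving is
                                            # what hides each thread's memory latency

    run([Thread("a", ["load", "add", "store"]),
         Thread("b", ["mul", "sub"])])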
the tera mta was designed to compete in the vector supercomputer field originally defined by the cray-1, which had a very different architecture; the convex https://news.ycombinator.com/item?id=40979684 was very similar to the cray-1
in a sense, though, the cray wasn't really the first supercomputer; the cdc 6600, also designed by seymour cray, was, and it was arguably the first risc, in 01964
unlike all of these, the burroughs 5500 architecture had no registers, just a stack, and that's what smalltalk, java, and c# are based on
the 12-bit pdp-8 was the first really mass-produced computer, with over 50000 units sold, and its architecture is interestingly different from all of these, too; intersil produced a 4000-gate single-chip version, and the family was popular enough that there are pdp-8 hobbyists even today
most current numerical computation is being done on gpus, and i don't know what to recommend as reading material on gpus, which have bizarrely different instruction set architectures that use a model called 'simt', single instruction, multiple thread. if anyone does, let me know
finally, chuck moore's line of two-stack processors based on forth are a sadly underdeveloped branch of the genealogical tree; koopman's stack computers: the new wave https://users.ece.cmu.edu/~koopman/stack_computers/index.htm... goes into their historical development a bit. they had some significant success in the 80s (the novix nc4000 and the harris rtx2000) but the later members of the family (the sh-boom, the mup21, the stillborn f21, the intellasys seaforth 40c18, and the greenarrays ga4 and ga144 chips) have no real users, in part due to malfeasance—after losing a protracted lawsuit with moore, intellasys head dan leckrone now goes by 'mac leckrone', perhaps as a result
there are lots of other interestingly different instruction set architectures out there: s/360, 6502, amd64, pic16, the erim cytocomputer, the thinking machines cm4, the em-4 dataflow machine, the maxim maxq, the as/400, webassembly, vulkan spir-v, the hp 9825 desk calculator, saturn (used in the hp 48gx), etc.
The university I went to was notorious for having its own textbook for Analysis 1 and 2, with 1600 or so exercises of just calculations, covering all cases (for example, if a theorem gave you 3 conditions for it to work, you would work through 3 examples with one condition not present and see how the theorem would fail, and things like that). It was completely different from a theory textbook like Rudin/Tao, which had fewer exercises that were more focused on "did you understand the abstract object" and "use your knowledge to prove this slightly modified proposition or easy extension of the theorem".
If you want to practice addition/subtraction I suggest Zetamac (https://arithmetic.zetamac.com/), which is what most people use to train for HFT/MM interviews (although I heard there are more specialized tests now).
This is Vinge's annotated copy of A Fire Upon the Deep. It has all his comments and discussion with editors and early readers. It provides an absolutely fascinating insight into his writing process and shows the depth of effort he put into making sure everything made sense.
Why are people so enamoured with LR parsers, again?
Anyway, I wanted to comment about the (2) note in the post, about using the FOLLOW sets for error recovery in LL parsers: it's actually a bit more nuanced than just "when parsing non-terminal A, on unrecognized input: skip all tokens until a token in FOLLOW(A) appears".
The actual strategy (which I first learned from Per Brinch Hansen's "On Pascal Compilers", sec. 5.8 "Error recovery", and then re-encountered much later when studying the internals of the Go compiler) instead involves considering the FIRST sets of the sibling non-terminals in the call stack. A simple and efficient way to implement it is by augmenting each non-terminal-recognizing procedure with a "stop" parameter holding the "stop" set of tokens, which would start as just set([EOF]) at the very top level. Then, if you're parsing a rule of the "A ::= B1 B2 ... Bn" kind, you do it like this:
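(Rough Python sketch, not Brinch Hansen's or the Go compiler's actual code; the grammar and token names are made up, just to show where the stop set grows and where the skipping happens.)

    # sketch of recursive descent with "stop" sets, for a made-up rule A ::= B1 B2.
    # FIRST maps each non-terminal to its FIRST set; EOF is a sentinel token.
    EOF = "<eof>"
    FIRST = {"B1": {"b1"}, "B2": {"b2"}}

    class Parser:
        def __init__(self, tokens):
            self.tokens = list(tokens) + [EOF]
            self.pos = 0

        def current(self):
            return self.tokens[self.pos]

        def skip_to(self, stop):
            # error recovery: discard input until something the enclosing
            # context knows how to continue from
            while self.current() not in stop:
                self.pos += 1

        def expect(self, token, stop):
            if self.current() == token:
                self.pos += 1
            else:
                print(f"error: expected {token}, got {self.current()}")
                self.skip_to(stop)

        def parse_A(self, stop):
            # each child gets the caller's stop set plus the FIRST sets of the
            # siblings still to come, so recovery can resume at the next sibling
            self.parse_B1(stop | FIRST["B2"])
            self.parse_B2(stop)

        def parse_B1(self, stop):
            self.expect("b1", stop)

        def parse_B2(self, stop):
            self.expect("b2", stop)

    Parser(["oops", "b2"]).parse_A({EOF})   # recovers at b2 instead of eating it

A fuller version also syncs at procedure entry against FIRST(A) plus the stop set, but the key point is that the stop set grows with the FIRST sets of pending siblings as you descend.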
This approach allows for slightly more precise error recovery because it basically ends up using the union of FOLLOW(A) and the FOLLOW sets of all of its parent non-terminals as the stop set. You can also see this idea proposed e.g. in [0], at the paragraph starting with "What is a reasonable RECOVERY set in a general case?", but it's not implemented there in the way I've described.
I watched Tim Hunkin explain sewing machines when I was about 8 and have never lost my fascination with them (or mechanical engineering) since then.
https://youtu.be/8lwI4TSKM3Y
if you don't have reflinks (or some non-linux equivalent) you're going to have to load the entire file into an array when you open it anyway, even if that array isn't where you edit it
the performance cost of a dumb array is being oversold here. it's true that insertion and deletion are slow in arrays, but let's put this in perspective. i'm displaying my editor on a 2-megapixel display, which is 8 megabytes of pixels because they're 32 bits per pixel. drawing a single frame of video involves the same amount of memory traffic as memmove()ing 4 megabytes of text (a memmove reads and writes every byte). so until you have a few megabytes, even with a dumb array, ordinary text editing operations will easily support interactive responsivity. even things like refilling paragraphs and indenting or outdenting regions, which might involve hundreds of individual edits, are fine on everyday-sized files. this laptop can do about 20 gigabytes a second to main memory, 10 gigabytes of memmove(), so in 16.7 milliseconds (a 60-hertz frame) it can memmove() 167 megabytes. it isn't until you get into things like search-and-replace jobs on big logfiles that you start to feel the pain
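spelled out as a quick python sanity check (same rough numbers as above, for my machine):

    # rough back-of-the-envelope numbers from the paragraph above
    megapixels = 2
    frame_bytes = megapixels * 1_000_000 * 4       # 32 bits per pixel -> ~8 MB per frame
    # a memmove touches each byte twice (read + write), so one frame of video
    # is about as much memory traffic as moving half that many bytes of text
    equivalent_memmove = frame_bytes // 2          # ~4 MB of text
    mem_bw = 20e9                                  # ~20 GB/s to main memory
    memmove_bw = mem_bw / 2                        # ~10 GB/s of memmove()
    per_frame = memmove_bw * (1 / 60)              # ~167 MB moved per 60 Hz frame
    print(frame_bytes, equivalent_memmove, round(per_frame / 1e6))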
the complexity of ropes is also being oversold here. as i pointed out in a comment in a subthread, you can make ropes quite complex, but https://github.com/ivanbgd/Rope-Data-Structure-C is 339 lines of code. it's probably not a real-world rope, though; it has far too much memory overhead
in a garbage-collected language, like the cedar language where ropes originated, most of that complexity goes away, and production-quality ropes can be only a few hundred lines of code
another somewhat popular rope implementation is librope, in c https://github.com/josephg/librope which is 684 lines of code and looks to be production-quality
there's an unrelated librope-ocaml in debian (and opam); it's 1029 lines of code but nothing in debian depends on it. it looks quite full-featured; the documentation says it 'has all non-deprecated operations of String' http://chris00.github.io/ocaml-rope/doc/rope/Rope/
in ur-scheme i used the simplest, dumbest kind of rope for building up output text (input was parsed into s-expressions character by character): an output text was either a string, a pair (representing the concatenation of its arguments), or nil (representing the empty string). this is very similar to erlang's io lists. the only operations supported were (constant-time) concatenation (implemented as cons) and (linear-time) flattening into a string; this took 27 lines of scheme because the subset of scheme i implemented it in was quite limited. http://canonical.org/~kragen/sw/urscheme/
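the same three-case structure in python, as a sketch (this isn't the ur-scheme code, just a translation of the idea):

    # the dumbest possible 'rope' for building up output text: a text is either
    # None (empty), a str, or a 2-tuple meaning concatenation. sketch only.
    def cat(a, b):
        return (a, b)                   # constant-time concatenation

    def flatten(text, out=None):
        if out is None:
            out = []
        if text is None:
            pass
        elif isinstance(text, str):
            out.append(text)
        else:
            flatten(text[0], out)
            flatten(text[1], out)
        return out

    def to_string(text):
        return "".join(flatten(text))   # linear-time flattening

    print(to_string(cat(cat("hello", None), cat(", ", "world"))))  # hello, world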
I was a theoretical physicist for 13 years, and struggled a lot with this question. I found it very useful to develop several different styles for reading mathematics and physics. Mostly I did this in the context of reading papers, not books, but the comments below are easily adapted to books.
One unusual but very useful style was to set a goal like reading 15 papers in 3 hours. I use the term "reading" here in an unusual way. Of course, I don't mean understanding everything in the papers. Instead, I'd do something like this: for each paper, I had 12 minutes to read it. The goal was to produce a 3-point written LaTeX summary of the most important material I could extract: usually questions, open problems, results, new techniques, or connections I hadn't seen previously. When time was up, it was on to the next paper. A week later, I'd make a revision pass over the material; typically it would take an hour or so.
I found this a great way of rapidly getting an overview of a field, understanding what was important, what was not, what the interesting questions were, and so on. In particular, it really helped identify the most important papers, for a deeper read.
For deeper reads of important papers or sections of books I would take days, weeks or months. Giving lectures about the material and writing LaTeX lecture notes helped a lot.
Other ideas I found useful:
- Often, when struggling with a book or paper, it's not you that's the problem, it's the author. Finding another source can quickly clear stuff up.
- On being stuck: if you feel like you're learning things, keep doing whatever you're doing, but if you feel stuck, try another approach. Early on, I'd sometimes get stuck on a book or a paper for a week. It was only later that I realized that I mostly got stuck when either (a) it was an insubstantive point; or (b) the book was badly written; or (c) I was reading something written at the wrong level for me. In any case, remaining stuck was rarely the right thing to do.
- Have a go at proving theorems / solving problems yourself, before reading the solution. You'll learn a lot more.
- Most material isn't worth spending a lot of time on. It's better to spend an hour each seriously reviewing 10 quantum texts and finding the one that's good and will repay hundreds of hours of study than it is to spend 10 hours ploughing through the first quantum text that looks okay when you browse through it in the library. Understanding mathematics deeply takes a lot of time. That means effort spent identifying high-quality material is repaid far more than it would be with (say) a novel or lighter non-fiction.
Seamless Streaming looks really promising! We just had a new employee start a few months back with profound hearing loss and our company had no idea what to do with him from an accessibility standpoint. They threw out solutions like Dragon, not realizing those solutions are not real-time.
He ended up rolling his own solution by standing up Whisper in one of our clusters and writing a basic front end and API to take his laptop’s mic input and chunk it every few seconds to send to the model and get back text in pseudo-realtime. We got him a pretty beefy Alienware so he wouldn’t be tied to the cluster GPUs. I can’t wait to see what he does with these new models!
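For anyone wanting to try something similar, here's a rough sketch of that kind of chunked pipeline (not his actual code; it assumes the open-source openai-whisper package and the sounddevice library for mic capture):

    # rough sketch: record the mic in ~5-second chunks and run each chunk
    # through Whisper for pseudo-realtime captions. assumes the openai-whisper
    # and sounddevice packages; not the setup described above. ctrl-c to stop.
    import sounddevice as sd
    import whisper

    SAMPLE_RATE = 16000        # whisper expects 16 kHz mono float32 audio
    CHUNK_SECONDS = 5

    model = whisper.load_model("base.en")

    while True:
        audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()                                   # block until the chunk is recorded
        result = model.transcribe(audio.flatten(), fp16=False)
        print(result["text"], flush=True)

A real version would capture audio with a continuous stream callback so nothing is dropped while a chunk is being transcribed, but this is the basic shape.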
This is obviously good but note that it appears to be covering mesothelioma only, and "The median [overall survival] OS was 15.4 months (95% CI, 11.1-22.6) for UV1 plus ipilimumab and nivolumab (treatment arm) versus 11.1 months (95% CI, 8.8-18.1) for ipilimumab and nivolumab alone." Another headline could be: "Cancer vaccine helps some mesothelioma patients live an extra 4.3 months," which is less exciting than the headline's current phrasing. It's like how people are horrified by "you have a 20% chance of dying" but somewhat reassured by "you have an 80% chance of living."
I'm dying from squamous cell carcinoma and am more excited about the Moderna approach with personalized cancer vaccines, like mRNA-4157. Early data for what I have is promising: https://www.fiercebiotech.com/biotech/moderna-s-keytruda-com...: "The combination treatment shrank tumors in five patients with head and neck cancer (50%), eliminating the tumors in two of those patients, Moderna said in a statement. Another four patients in that group had stable disease, meaning their tumors had stopped growing." Recurrent/metastatic head and neck squamous cell carcinoma is almost always fatal: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8155962/, unless the person responds to pembrolizumab/Keytruda (which I do not; about 20 - 30% of patients appear to respond, based on KEYNOTE-048: https://ascopubs.org/doi/full/10.1200/JCO.21.02508). Unfortunately, Moderna won't make mRNA-4157 available under compassionate use or through any other means. :(
In my opinion, the fundamental explanations you seek lie in Probability Theory, not matrix theory. When it comes to ML, matrices are just implementation details. I highly suggest this set of notes: https://chrispiech.github.io/probabilityForComputerScientist...
If you're eager to learn WebGPU, consider checking out Mach[0] which lets you develop with it natively using Zig today very easily. We aim to be a competitor-in-spirit to Unity/Unreal/Godot, but extremely modular. As part of that we have Mach core which just provides Window+Input+WebGPU and some ~19 standalone WebGPU examples[1].
Currently we only support native desktop platforms, but we're working towards browser support. WebGPU is very nice because it lets us target desktop+wasm+mobile for truly-cross-platform games & native applications.
>Regarding manufacturing and machines/machining, any book or resources that stood out? I'm most familiar with the Machinery's Handbook.
I went to a top tier school for MechE and Materials, and would recommend two intro books: Engineering Mechanics: Statics by Meriam and Kraige, and Shigley's Mechanical Engineering Design, in that order. If you fully understand the contents of these books, it probably puts you in the top 10% of mechanical engineering graduates.
For a broader education, you can read Fundamentals of Heat and Mass Transfer by Incropera, DeWitt, Bergman & Lavine, as well as Fundamentals of Fluid Mechanics by Munson, Young & Okiishi.
Understanding these two as well will probably put you in the top 1% of grads.
If you have a strong background in mathematics, these mostly deal with applications of linear algebra and differential equations, so the value is in understanding the applications.
From there, you can branch out. If applicable, Ogata's Modern Control Engineering and Tongue's Principles of Vibration.
Most undergraduates don't really understand these due to the heavy application of Laplace and Fourier transforms, but they are relevant if you want to build complex machines.
When I went for my annual wellness exam, the doctor's office had me acknowledge that my wellness exam would cost $350 or something in the event insurance did not pay for it, and there were posters up informing people that they have a right to ask for a good faith estimate.
Hi all, I'm the author of pdqsort that's credited in the post (to be clear: the authors acknowledged pdqsort at the bottom, I am not associated with the posted work directly). I recently held a talk at my institute on efficient in-memory sorting and the ideas in pdqsort, in case you're interested in hearing some of the theory behind it all: https://www.youtube.com/watch?v=jz-PBiWwNjc
Next week I will hold another talk in the Dutch seminar on data systems design (https://dsdsd.da.cwi.nl/) on glidesort, a new stable sorting algorithm I've been working on. It is a combination of an adaptive quicksort (like pdqsort, fully adaptive for many equal elements) and an adaptive mergesort (like timsort, fully adaptive for long pre-sorted runs). It is the first practical implementation of an algorithm I'm aware of that's fully adaptive for both. Like pdqsort it uses modern architecture-aware branchless sorting, and it can use an arbitrary buffer size, becoming faster as you give it more memory (although if given a constant buffer size it will degrade to O(n (log n)^2) in theory; in practice, for realistic workloads, it's just a near-constant factor (c ~<= 3-5) slower).
The source code isn't publish-ready yet; I still have to do some extra correctness vetting and testing, and in particular exception safety is not yet fully implemented. This is important because I wrote it in Rust, where we must always give back a valid initialized array, even if a comparison operator caused a panic. But I do have some performance numbers to quote that won't significantly change.
For sorting 2^24 randomly shuffled distinct u32s using a buffer of 2^23 elements (n/2), glidesort beats Rust's stdlib slice::sort (which is a classic timsort also using a buffer of n/2) by a factor of 3.5 times. When stably sorting the same numbers comparing only their least significant 4 bits, it beats stdlib slice::sort by 12.5 times using 6.5 times fewer comparisons, both numbers on my Apple M1 Macbook. All of this is just using single-threaded code with a generic comparison function. No SIMD, no threads, no type-specific optimizations.
Finally, glidesort with a buffer size of >= n/2 is faster than pdqsort.
You gave the answer! Integration tests. And they work recursively, with a Kalman filter to approximate even in noisy conditions.
"USL was inspired by Hamilton's recognition of patterns or categories of errors occurring during Apollo software development. Errors at the interfaces between subsystem boundaries accounted for the majority of errors and were often the most subtle and most difficult to find. Each interface error was placed into a category identifying the means to prevent it by way of system definition. This process led to a set of six axioms, forming the basis for a mathematical constructive logical theory of control for designing systems that would eliminate entire classes of errors just by the way a system is defined.[3][4]"
There's a diagram of rules on the USL Wikipedia page. The rules show triangle feedback loops with a parent, left, right child. Those are like generations of a Sierpiński triangle. Every part is trying to serve the Good Cause that it's working for, and love its neighbour.
"any state-space model that is both controllable and observable and has the same input-output behaviour as the transfer function is said to be a minimal realization of the transfer function
The realization is called "minimal" because it describes the system with the minimum number of states."
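As a concrete illustration (a sketch using the python-control package, not anything from the quoted material), a transfer function with a pole-zero cancellation needs fewer states than its naive realization:

    # illustration with the python-control package: (s+1)/((s+1)(s+2)) has a
    # naive 2-state realization, but its minimal realization needs only 1 state
    import control

    G = control.tf([1, 1], [1, 3, 2])        # (s + 1) / (s^2 + 3s + 2)
    sys = control.tf2ss(G)                   # naive state-space form: 2 states
    sys_min = control.minreal(sys)           # strip the part that isn't both
                                             # controllable and observable
    print(sys.A.shape[0], sys_min.A.shape[0])   # 2 1

The surviving state corresponds to the pole at -2; the mode at -1 is cancelled by the zero, so it doesn't show up in the input-output behaviour.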
"the problem of driving the output to a desired nonzero level can be solved after the zero output one is."
An electronic analogy: find GND, then solve for 1.
A common solution strategy in many optimal control problems is to solve for the costate (sometimes called the shadow price).
A shadow price is a monetary value assigned to currently unknowable or difficult-to-calculate costs in the absence of correct market prices. It is based on the willingness to pay principle – the most accurate measure of the value of a good or service is what people are willing to give up in order to get it.
The costate summarizes in one number the marginal value of expanding or contracting the state variable next turn.
Each part looks ahead 1 generation, chooses left or right.
convert a representation of any linear time-invariant (LTI) control system to a form in which the system can be decomposed into a standard form which makes clear the observable and controllable components of the system
Take a big problem, break it down, look for I/O ports. Or in software test development: layers of abstraction. A suggestion: only add a layer of abstraction when it's too big to fit on the screen at once. Use tools like code folding, tree views.
https://www.youtube.com/watch?v=mKtctCyd0rs&list=PLWEiAJhCw-...