Hacker News | neilmovva's favorites

Caveat: I'm approaching this from an 'indie hacker' mindset. The article mentions $10 million funding rounds, so the author is probably approaching this from a different one.

The article distinguishes between a product business and a service business. In the former, the founders find a problem, build a solution, and then sell that solution to people who have the problem. In the latter, the founders find people who have a problem, and then build a solution for that problem.

In a product business, the risk is that you solve the wrong problem. In a service business, the risk is that you never find a problem to solve.

I used to be a schoolteacher, in schools where teachers tended to be on the verge of retirement. Most of my colleagues used computers because one day their typewriter had been taken away and a computer put in its place. Their technology use was rudimentary: email and basic word processing. Most workflows were unchanged since the typewriter and mimeograph days. Almost all planning was still done on paper, in specially printed planning books. (We were individually asked whether we wanted week-to-a-page or day-to-a-page when the stationery order was being prepared for the next school year.)

(Lesson planning is not a problem to be solved; I'm just using it as an example of the kind of mindset I was working alongside.)

This is a problem-rich environment filled with people who don't know that many of their problems can be solved by computers. You can't ask these people what problems they want solved, because they don't know that their problems can be solved. You have to have something to show them and point to and talk about before they can grasp the idea that it's even possible to automate some of their work.

However, I speak from a position of privilege. I spent years working in that environment, and I walked away with a list of problems - a list of potential products. If you've only worked in software, you won't know any good problems. You'll know some bad problems, and they'll be bad because your potential customers - other software engineers - are just as capable of building their own solutions as you. In that case, a service business is probably no more risky than a product business.

If you start a product business, your moat is the insider knowledge you have of how to solve the problem you solve. If you start a service business, your moat is your reputation for solving people's problems. If you have no insider knowledge, you start your business with no moat at all, and have to dig one while building your castle, whereas the founder with insider knowledge just has to build their castle within their pre-dug moat.


People in the HPC/classical supercomputing space have done this sort of thing for a while. There's a fair amount of literature on lossless floating point compression, such as Martin Burtscher's work or stuff out of LLNL (fpzip):

https://userweb.cs.txstate.edu/~burtscher/ https://computing.llnl.gov/projects/floating-point-compressi...

but it tends to be very application specific, relying on high correlation / small deltas between neighboring values in a 2D/3D/4D/etc. floating point array (e.g., you are compressing neighboring temperature grid points in a PDE weather simulation model; temperatures in neighboring cells won't differ by that much).
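To make the delta/xor idea concrete, here is a minimal Python sketch (mine, not code from fpzip or Burtscher's codecs; the temperature values are made up): xor-ing the bit patterns of adjacent floats leaves long runs of leading zero bits when neighbors are close, which a back-end coder can then squeeze out.

    import struct

    def float_to_bits(x: float) -> int:
        """Reinterpret a float32 value as its 32-bit integer pattern."""
        return struct.unpack("<I", struct.pack("<f", x))[0]

    temps = [21.50, 21.52, 21.49, 21.55]  # hypothetical neighboring grid cells
    prev = float_to_bits(temps[0])
    for t in temps[1:]:
        cur = float_to_bits(t)
        delta = prev ^ cur
        # More leading zeros in the xor => fewer bits to actually store.
        print(f"{t:5.2f}  xor=0x{delta:08x}  leading_zeros={32 - delta.bit_length()}")
        prev = cur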

In a lot of other cases (e.g., machine learning) the floating point significand bits (and sometimes the sign bit) tend to be incompressible noise. The exponent is the only thing that is really compressible, and the xor trick does not help you as much because neighboring values can still vary a bit in their exponents. An entropy encoder works well there instead (it encodes closer to the actual underlying data distribution/entropy), and it doesn't depend upon neighboring floats having similar exponents.
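A rough sketch of that exponent-only compression idea (my illustration, in the spirit of the comment; zlib stands in for a real ANS/entropy coder, and the split is simplified to whole bytes):

    import numpy as np
    import zlib

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1 << 16).astype(np.float32)  # ML-like data: noisy significands

    # Split each float32 word into its high byte (sign + top 7 exponent bits)
    # and the remaining 3 bytes (low exponent bit + significand).
    # Assumes a little-endian host.
    b = np.frombuffer(x.tobytes(), dtype=np.uint8).reshape(-1, 4)
    hi, lo = b[:, 3], b[:, :3]

    # The exponent stream should compress noticeably; the significand bytes barely at all.
    print("hi-byte ratio:", len(zlib.compress(hi.tobytes())) / hi.nbytes)
    print("lo-byte ratio:", len(zlib.compress(lo.tobytes())) / lo.nbytes)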

In 2022, I created dietgpu, a library to losslessly compress/decompress floating point data at up to 400 GB/s on an A100. It uses a general-purpose asymmetric numeral system encoder/decoder on GPU (the first such implementation of general ANS on GPU, predating nvCOMP) for exponent compression.

We have used this to losslessly compress floating point data between GPUs (e.g., over InfiniBand/NVLink/Ethernet/etc.) when training massive ML models, to speed up overall wall clock time of training across hundreds or thousands of GPUs without changing anything about how the training works (it's lossless compression; it computes the same thing that it did before).

https://github.com/facebookresearch/dietgpu


The amount of expertise that went into this one-month project is crazy, and it's all really cool and well put together.

I don't comprehend how you made no mistakes on the journey after drafting the PCBs and writing drivers. From my POV as a software developer, C has so many pitfalls that it is incomprehensible to me that things will Just Work, especially in the context of something that is meant to run for a very long time and not be "restarted."

Why do sensor things at all? What is the ROI for the person who needs that stuff? I mean this in no derogatory sense, I really admire this work.

But the academics who need something something hardware are either so rich they use something commercial / the paid core or so poor they'll use someone else's refuse or a grad student to do it 10x worse & 10x slower for free. Lab equipment, sensors, whatever.

If it's for an industrial purpose, the ultimate consumer for hardware two guys can make is the government, as far as the eye can see. The people who have a business stake in e.g. the ocean ecosystem are fishermen, oil people, shippers, whatever, and they're only doing this because of a government regulation or the threat thereof. I view government needs as worthwhile, and they are a worthy customer; it's just that the ROI is essentially imaginary: it's whatever the payer values government compliance at, and that can be infinitely large or small.

My background in this is very limited. I didn't take "How to Make," and I don't know how to use anything in a fab lab, but in an intellectually honest way, the audience for a "polished, well-working gizmo with bug-free firmware" is 1,000,000x larger when it's a coffee machine than for any academic or industrial purpose. Why not make "the perfect espresso machine" or "the perfect bike" or whatever? There are $3M Kickstarters for coffee machines whose #1 actual obstacle to successful execution is writing firmware. There are e-bikes that are 10x more expensive or 10x crappier because ultimately it's too challenging to make a single firmware and controller that make disparate commodity parts work together cohesively.

I am not at all raining on this parade, because this little blog post was so mind-numbingly impressive; and I'm not saying there aren't 10,000 people toiling on dead-on-arrival consumer hardware, be it Oculus peripherals or connected emotive robots or whole divisions at Google. My question is: why? Why not, with your skills, make a thing and fucking sell it?


> Which vCPU is faster, a ten year old Xeon E5-2643 v2 at 3.5GHz or a two year old Xeon Platinum 8352V at 2.1GHz? It depends on the workload.

It really does not depend on the workload, when the workloads we're talking about are by and large bounded to 1 vCPU or less (CI jobs, serverless functions, etc.). Ice Lake cores are substantially faster than Ivy Bridge; the 8352V will be faster in practically any workload we're talking about.

However, I do agree with this take if we're talking about, say, Lambda functions, the reason being that the vast majority of workloads built on Lambda functions are bounded by IO, not compute, so newer core designs won't result in a meaningful improvement in function execution. Put another way: is a function executing in 75ms instead of 80ms worth paying 30% more? (I made these numbers up, but it's the illustration that matters.)

CI is a different story. CI runs are only bound by IO for the smallest of projects; downloading that 800MB node:18 base Docker image takes some time, but it can very easily and quickly be dwarfed by all the things that happen afterward. This may be a controversial opinion; "the CI is slow" is such a meme of a problem at engineering companies nowadays that you'd think more people would have the sense to look at the common denominator (the CI hosts suck) and not blame themselves (though, often there's blame to go around). We've got a project that builds locally on an M2 Pro, docker pull and push included, in something like 40 seconds; the CI takes 4 minutes. It's the crusty CPUs; it's the slow networking; it's the "step 1 is finished, wait 10 seconds for the orchestrator to realize it and start step 2".

And I think we, the community, need to be more vocal about this when speaking about platforms that charge by the minute. They are clearly incentivized to leave it shitty. It should even surface in discussions about, for example, the markup of Lambda versus EC2. A 4096MB Lambda function would cost $172/mo if run 24/7, back-to-back. A comparable c6i.large: $62/mo, a third the price. That's bad enough on the surface, and we need to be cognizant that it's even worse than it initially appears, because Amazon runs Lambda on whatever they have collecting dust in the closet, and people still report getting Ivy Bridge and Haswell cores sometimes, in 2023; the better comparison is probably a t2.medium @ $33/mo, a 5-6x markup.
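For anyone who wants to sanity-check those figures, here's the back-of-envelope arithmetic (a sketch; the per-GB-second and per-hour rates are approximate us-east-1 on-demand prices, not numbers taken from the comment, and per-request Lambda fees are ignored):

    GB_SECOND = 0.0000166667   # approx. Lambda x86 price per GB-second
    HOURS = 30 * 24            # one month of back-to-back execution

    lambda_4gb = 4.0 * HOURS * 3600 * GB_SECOND   # 4096MB function, compute only
    c6i_large  = 0.085 * HOURS                    # 2 vCPU / 4 GB on-demand
    t2_medium  = 0.0464 * HOURS                   # 2 vCPU / 4 GB burstable

    print(f"Lambda 4 GB, 24/7: ${lambda_4gb:6.2f}/mo")   # ~$173
    print(f"c6i.large        : ${c6i_large:6.2f}/mo")    # ~$61
    print(f"t2.medium        : ${t2_medium:6.2f}/mo (~{lambda_4gb / t2_medium:.0f}x markup)")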

This isn't new information; Lambda is crazy expensive, blah blah blah; but I don't hear that dimension brought up enough. Calling back to my previous point: is a function executing in 75ms instead of 80ms worth paying 30% more? Well, we're already paying 550% more; the fact that it doesn't execute in 75ms by default is abhorrent. Put another way: if Lambda, and other serverless systems like it such as hosted CI runners, enable cloud providers to keep old hardware around far longer than performance improvements say they should, then the markup should not be 500%. We're doing Amazon a favor by using Lambda.


I can certainly see a lot of parallels with Oculus / Facebook.

Perhaps unusually, I actually wanted FB to impress itself more strongly on Oculus post acquisition because, frankly, Oculus was a bit of a mess. Instead, Oculus was given an enormous amount of freedom for many years.

Personally, nobody ever told me what to do, even though I was willing to "shut up and soldier" if necessary -- they bought that capability! Conversely, I couldn't tell anyone what to do from my position; the important shots were always called when I wasn't around. Some of that was on me for not being willing to relocate to HQ, but a lot of it was built into early Oculus DNA.

I could only lead by example and argument, and the arguments only took on weight after years of evidence accumulated. I could have taken a more traditional management position, but I would have hated it, so that's also on me. The political dynamics never quite aligned with an optimal set of leadership personalities and beliefs where I would have had the best leverage, but there was progress, and I am reasonably happy and effective as a part time consultant today, seven years later.

Talking about "entitled workers" almost certainly derails the conversation. Perhaps a less charged framing that still captures some of the matter is the mixing of people who Really Care about their work with the Just A Job crowd. The wealth of the mega corps does allow most goals to be accomplished, at great expense, with Just A Job workers, but people that have experienced being embedded with Really Care workers are going to be appalled at the relative effectiveness.

The communication culture does tend a bit passive-aggressive for my taste, but I can see why it evolves that way in large organizations. I've only been officially dinged by HR once for insensitive language in a post, but a few people have reached out privately with some gentle suggestions about better communication.

All in all, not a perfect fairy tale outcome, but I still consider taking the acquisition offer as the correct thing for the company in hindsight.


This seems fair to me. The executive view of ML is "can you do me a magic?" And as this article's "Graduate Student Descent" bit makes clear, the worker response is often to semi-randomly perturb code, show some graphs, and say, "Is this a magic?"

For me, most software development is about finding something boring and laborious. We get a computer to do the work so humans can level up and work on something requiring actual thought. That requires getting a deep understanding of the actual work.

Some of that definitely happens in well-run ML projects. But there's a bunch of Silver Bullet Syndrome stuff going on, where ML's shiny results and magazine articles lead to inflated expectations and inflated claims of success. A fellow nerd says, "I did an algorithm!" Someone turns that into an impressive presentation with claims of X% gains in the Key Business Metric, hallowed be its name. In reality, it's more plus or minus X% when you account for externalities, natural variation, and actor adaptation. But that's ok, because by the time anybody finds out, attention is elsewhere.

That's not really to blame ML for that. For a period, years ago, I kept getting asked, "Can we use a wiki for that?" I would start an explanation of what it actually takes to make a wiki work (hint: it's not the software). Their eyes would glaze over in short order, because they realized that it would take actual work. So many people want the silver bullet, the magic pill. Especially people in the managerial caste, as the reigning dogma there is that management is a universal skill. Details are for the little people.


Then OP sees a newsflash on TechCrunch: "Amazon Echo Flex v2 is failing in low humidity conditions." Goddamn it, Andrew (at the reliability lab). OP goes home and opens a bottle of Amazon Basics wine.

Hi, I work in this field professionally - you are correct up until the tradeoffs. It is not the case that the spatial resolution is higher - in fact, semiconductor manufacturers have struggled to get ToF sensors much above VGA in mobile form factors. In general, ToF systems have a considerably lower maximum precision than structured light of the same resolution, especially at close range - typically structured light systems have quadratic error with range, but the blessing that that curse gives you is that at close ranges, precision goes up superlinearly with horizontal resolution. This is one reason that essentially all industrial 3D scanning is based on structured light. The other is multipath interference, which is specific to time of flight systems (and you can see the effects if you look at a ToF's results near a corner - the exact corner itself will be correct, but the walls immediately adjacent to it will be pushed out away from the camera).

Temporal resolution is more debatable, because ToF is a lot more conducive to “extremely short, extremely bright flash” than structured light is. But for example, there are systems that run structured light (specifically, single-shot dot pattern structured light, like that seen in TrueDepth or Kinect), or its two-camera equivalent (active stereo) at exposure times of 1ms or less. It is all about the optics stack and sensitivity of the camera. So I don’t agree that temporal resolution is a compelling advantage either.

The main advantages of ToF are that it can be built in a single module (vs two for structured light), it does not require significant calibration in the field when the device is dropped or otherwise deformed, and it is easier to recover depth with decent edge quality. In general the software investment required for a good quality depth map is lower, though in this case Apple has been chewing on this for many years. Another potential advantage is outdoor performance at range - while historically this has been a significant weakness for ToFs, more modern ToFs adopted techniques to improve this, such as deeper pixel wells, brighter and shorter duration exposures, and built-in ambient light compensation. These are hard to do with structured light without manufacturing your own custom camera module. Finally - and I suspect this is why Apple ultimately picked it for their rear depth solution - because time of flight is a single module, it can be worked into the camera region on the rear of the device without having to have a separate opening for the illuminator and camera. The quadratic drop in accuracy with range that I mentioned above can be offset not just by resolution but by the distance between camera and illuminator - for a rear mounted device, the temptation is to make that baseline large, but this would put another hole in the back on the other side of the camera bump. I don’t see Apple going for that.
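For reference, the "quadratic error with range" and baseline tradeoff above follow from the standard triangulation error model for structured light / stereo. Here is a small sketch with made-up optics constants (the focal length, disparity noise, and baselines are illustrative, not any particular product's numbers):

    def depth_error_m(z_m, baseline_m, focal_px=600.0, disparity_err_px=0.1):
        # dz ~= z^2 * d(disparity) / (f * b): quadratic in range, inverse in baseline
        return (z_m ** 2) * disparity_err_px / (focal_px * baseline_m)

    for z in (0.3, 1.0, 3.0):
        print(f"z={z:3.1f} m   b=25 mm: {1000 * depth_error_m(z, 0.025):6.1f} mm"
              f"   b=75 mm: {1000 * depth_error_m(z, 0.075):6.1f} mm")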


Indirect ToF: easier to manufacture in small form factors, relatively cheap, well established technology. Easier temperature calibration and lower precision required when manufacturing the emitter, meaning cheaper components and more vendors that can make them. Because the technology can only measure the shift in phase, there is phase ambiguity between waves. The way this is dealt with is to emit multiple frequencies and use the phase shifts from each to disambiguate, but you usually only get a few channels so there ends up being a maximum range, after which there is ambiguity (aliasing, if you will) about if an object falls in a near interval or a far one. Multipath can commonly also cause such artifacts in indirect ToF systems. Finally, because they are continuous wave systems, they can (though modern indirect ToFs try to mitigate this) interfere with each other like crazy if you have multiple running in the same area. I’ll note that there are also gated systems that I would characterize as indirect ToF, that use a train of pulses and an ultrafast shutter to measure distance by how much of each pulse is blocked by the shutter. These suffer from more classical multipath (concave regions are pushed away from the camera), and are not very popular these days. You are right to call out that Microsoft is very heavily patented in the ToF space, and they ship probably the best indirect ToF you can buy for money on the HoloLens 2 and Microsoft Kinect for Azure.
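To put a number on the phase-ambiguity point: a continuous-wave iToF only measures range modulo c/(2f), and combining two modulation frequencies extends that interval to the one set by their greatest common divisor. A quick sketch (the frequencies are illustrative, not from any particular sensor):

    from math import gcd

    C = 299_792_458.0  # speed of light, m/s

    def unambiguous_range_m(f_hz: int) -> float:
        return C / (2 * f_hz)

    f1, f2 = 100_000_000, 80_000_000
    print(f"100 MHz alone: {unambiguous_range_m(f1):.2f} m")           # ~1.5 m
    print(f" 80 MHz alone: {unambiguous_range_m(f2):.2f} m")           # ~1.9 m
    print(f"both combined: {unambiguous_range_m(gcd(f1, f2)):.2f} m")  # ~7.5 m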

Direct ToF is a newer technology in the mobile space, because it has proven challenging to miniaturize SPADs, which are really the core technology enabling them. Additionally, the timing required is extremely precise, and there are not that many vendors who can supply components adequate for these systems. While there are patent advantages, there are also some significant technical advantages. Direct ToF systems have better long range performance, are much less affected by multipath, interference with other devices is minimal, and most critically - you can push a lot more power, because you emit a single burst instead of a continuous wave. This is really important for range and SNR, because all active IR imaging systems are limited by eye safety. For eye safety you care about not just instantaneous power but also energy delivered to the retina over time. It's helpful to recall that for all these active IR systems that go to consumers, they need to be safe after they've been run over by a car, dropped in a pool, shoved into a toddler's eye socket, etc - so this puts pretty strong limits on the amount of power they can deliver (and thus ultimately on range and accuracy). Direct ToFs are also really nice thermally, because your module has a chance to dissipate some heat before you fire it again (vs CW systems where you're firing it a much higher fraction of the time).


Dependencies (coupling) are an important concern to address, but they're only one of four criteria that I consider, and not the most important one. I try to optimize my code around reducing state, coupling, complexity, and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling, or complexity do I dedup code.

The reason I put stateless code as the highest priority is it's the easiest to reason about. Stateless logic functions the same whether run normally, in parallel or distributed. It's the easiest to test, since it requires very little setup code. And it's the easiest to scale up, since you just run another copy of it. Once you introduce state, your life gets significantly harder.
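A toy sketch of what that stateless-first priority buys you (my illustration, not the commenter's code): the same running-total logic written statefully and statelessly.

    class Tally:                      # stateful: the result depends on call history
        def __init__(self):
            self.total = 0
        def add(self, x):
            self.total += x
            return self.total

    def tally(xs):                    # stateless: the result depends only on the input
        return sum(xs)

    # The pure version needs no setup to test and parallelizes by splitting input:
    assert tally([1, 2, 3]) == 6
    chunks = [[1, 2], [3, 4], [5]]
    assert sum(tally(c) for c in chunks) == tally([1, 2, 3, 4, 5])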

I think the reason that novice programmers optimize around code reduction is that it's the easiest of the 4 to spot. The other 3 are much more subtle and subjective and so will require greater experience to spot. But learning those priorities, in that order, has made me a significantly better developer.

