DropBox was hard-tech when it came out; they reverse engineered the Finder so that your DropBox folder would appear like a normal Mac folder. They may seem like a commodity now because everybody's doing stuff like that (and in many cases, APIs have appeared that make it trivial), but when they first launched, they did stuff most people assumed was impossible. Twitch/Justin.TV was also hard-tech when it came out, though now the hard parts are built into every iPhone.
Other than that, I largely agree. The last software hard-tech company to strike it really big was Google. There've been a number of open-source projects doing what I'd consider hard-tech software, though - Bitcoin, git/Mercurial/Darcs, Bittorrent, TensorFlow, etc.
(My definition of hard tech, as applied to software, is "Software where you need to use scientific-method trial-and-error to build core pieces of the product." If you can read an online tutorial or reference manual and build the product, it's not hard tech. If you need to poke around at things, observe the responses, and build your own model of how things work, it is.)
"DropBox was hard-tech when it came out; they reverse engineered the Finder so that your DropBox folder would appear like a normal Mac folder. "
This, IMHO, is a really bad example.
This is pretty much just a few days of sitting in GDB for the right engineer[1]. Now, maybe it requires people experienced with debugging tools, but it's really not "hard tech". Now, productionizing it so it works on all versions, yeah, a bit trickier. But again, none of this is at the level of basically "understanding how to make custom bacteria that do a thing", etc. If this is the example you mean for "did stuff people assumed was impossible", then I strongly disagree.
"Software where you need to use scientific-method trial-and-error to build core pieces of the product."
This, IMHO, is way too low a bar. By this definition, the clang compiler we built for windows is "hard-tech". While it requires time and energy and trial and error, that is not hard, in the same way the dropbox stuff is not hard.
It is known that it is possible, and requires the reasonable application of good engineering skill. That engineering skill may often involve the scientific method trial-and-error, but you know you will eventually get there.
The same is true of dropbox, and in particular, your finder example. The only thing unknown is the timeline, and even that you can take a reasonable stab at if you have good enough engineers.
[1] I did it before they did, and i wasn't even the first. Plenty of people have made this happen :)
If we're talking about Google before Brin's PhD thesis, I think it would have qualified. It was not at all clear back then that using backlink data would yield more useful results than mere textual analysis of page content. One can definitely imagine a scenario where you try to build a search engine based on going down the rabbit hole of natural language processing as the key feature and then end up with something that doesn't work all that great.
Perhaps I'm just not as conversant in reverse-engineering as some of the people here, but my understanding is that if a key part of your product relies on patching somebody else's software for which you don't have the source code, this is also fraught with potential dead-ends and uncontrollable risks. What if they're using ASLR? What if they change the functions involved in the next version? What if the functions you're trying to patch have side effects that you can't afford to ignore?
That's why I prefer to put the dividing line at "must figure out things by poking at them rather than by reading documentation". The definition of "research" can be pretty vague - is a security team poking at a product conducting research? How about a UX team trying to figure out how their users behave? A search team doing language modeling? All of these would count in my head, and if a startup built their product around one of these results I would consider it "hard tech", but evidently not everyone agrees.
Google's other big innovation was building with commodity machines instead of high-end servers. That definitely puts them in the hard software tech category.
I'm personally sympathetic to "I'll need a research team and five years" missions. I think both university research and xkcd are great! But it's critically different from YC's definition:
Hard tech = "There is doubt that the technology can be built at all."
Though many pieces of technology face huge doubts, what's key is that a team can often get to a working prototype or partial release in way less than 5 years! (E.g. Dropbox.) YC is exactly the type of environment to refocus a 'research style' team exclusively on demonstrable progress.
The problem with the xkcd definition (if attempted in a startup) is very few research teams can continue to fundraise for 5 years without a product or significant prototype.
A partial solution to the "doubted" tech is often good enough to build a great company.
>This is pretty much just a few days of sitting in GDB for the right engineer[1]. Now, maybe it requires people experienced with debugging tools, but...
As someone sort-of familiar with gdb (but not extensively so) I have no idea how I'd do that. Can you point me in the right direction?
I don't use a Mac, but assuming Finder notices when items are added elsewhere to a folder it is currently displaying, and updates its view to reflect this:
Use dtrace and create a lot of such events. They're presumably using kqueue or some event mechanism to be notified when the file arrives. Do this with many file types if they look different in Finder. Somewhere in there should also be a read that corresponds to the dirent. You can break on these things.
Attach the debugger and create the events. Step through the code to find when these things are read. Attempt to discern how what is read differs between file types. Do stuff like make files with conspicuous attributes (e.g., file size), because it's easier to correlate from traces. The data is probably a file containing file metadata somewhere.
This is probably mostly looking at the bytes coming off the read. dtrace makes this easy because you can trigger it to set a flag when the kqueue event fires and then just dump bytes and locations from file reads/opens. If it's more integrated into the OS, Finder would have to have its own special syscalls to read stuff off inodes or whatever. You'd be able to see those happening too.
Once you think you know how it works, give it a try. Rinse and repeat.
Now it may be you have surprises here and there and it's kind of annoying, but I'd be surprised if I couldn't do it.
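To make the "conspicuous attributes" trick above concrete, here's a rough Python sketch. It only covers the setup side: it writes marker files whose sizes are unusual enough to jump out of a trace dump, and maps numbers you spot in the trace back to the file they came from. The file names, the sizes, and the helpers `create_marker_files` and `match_sizes` are all made up for illustration; nothing here touches Finder, dtrace, or a debugger.

```python
import os
import tempfile

# Hypothetical marker files. The sizes are arbitrary, just easy to spot
# when they show up as integers in a trace or memory dump.
MARKER_SIZES = {
    "marker_a.txt": 11111,
    "marker_b.jpg": 22222,
    "marker_c.pdf": 33333,
}


def create_marker_files(directory, sizes=MARKER_SIZES):
    """Write one file per entry, each exactly `size` bytes long."""
    paths = {}
    for name, size in sizes.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(b"\x00" * size)
        paths[name] = path
    return paths


def match_sizes(observed_values, sizes=MARKER_SIZES):
    """Map numbers seen in a trace back to the marker file they identify."""
    by_size = {size: name for name, size in sizes.items()}
    return {v: by_size[v] for v in observed_values if v in by_size}


if __name__ == "__main__":
    # Drop the markers into a scratch directory and confirm their sizes.
    with tempfile.TemporaryDirectory() as d:
        for name, path in create_marker_files(d).items():
            print(name, os.path.getsize(path))
```

In practice you'd point `create_marker_files` at the folder the watched process is displaying, then feed whatever integers you pull out of the trace into `match_sizes` to see which reads line up with which file.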
> My definition of hard tech, as applied to software, is "Software where you need to use scientific-method trial-and-error to build core pieces of the product."
I think that's a great definition actually and fits with how ML systems are built.
Sorry, no, the JVM was not even innovative, let alone "hard tech". There were lots and lots of previous examples of doing the same thing as the JVM (and doing it better); see, for example, the Smalltalk and Lisp world.
IMO, the only thing on your list that comes close is the Azul GC which, in my limited understanding, actually advanced the state of the art.