The ML research team at Voxel51 just released a paper showing that foundation models rival the accuracy of human annotators in labeling large visual datasets, at several orders of magnitude less time and cost.
We also found that models trained from these labels perform about as well as those trained from human labels when tested against public validation sets. Interestingly, setting a relatively low confidence threshold (0.2 - 0.5) for the auto-generated labels maximized downstream model performance. Very high confidence thresholds often produced worse results due to reduced recall.
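If it helps to make the thresholding concrete, here's a minimal sketch in plain Python with made-up predictions (not the paper's actual pipeline or tooling):

```python
# Minimal sketch with made-up data: keep zero-shot predictions whose
# confidence clears a threshold, then train the downstream model on them.

def filter_auto_labels(predictions, threshold=0.3):
    """Keep auto-generated labels at or above the confidence threshold.

    A relatively low threshold (roughly 0.2-0.5) keeps more true positives
    (higher recall) at the cost of some noisy labels; a very strict cutoff
    throws away too many correct labels and can hurt downstream accuracy.
    """
    return [p for p in predictions if p["confidence"] >= threshold]

# Hypothetical zero-shot detections for a single image:
predictions = [
    {"label": "car", "confidence": 0.87},
    {"label": "pedestrian", "confidence": 0.34},
    {"label": "bicycle", "confidence": 0.12},
]

print(filter_auto_labels(predictions, threshold=0.3))
# keeps "car" and "pedestrian"; drops the low-confidence "bicycle"
```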
The upshot is that zero-shot labeling can replace human annotation in many datasets. The massive cost savings can then be redirected toward training higher-parameter models.
Happy to answer any questions about the research. You can also read this blog we wrote that goes more in depth into the methods and tools we used.
https://link.voxel51.com/HN-VAL-blog/
There's a fantastic Ray Bradbury short story from 1948 called "Jonah and the Jove-Run" that I hardly see referenced anywhere. It's about Jupiter being the next frontier after colonizing Mars and the complexity of navigating the asteroid belt on supply runs.
It's a great quick read, though it hardly attempts the sort of scientific justification you find in The Three-Body Problem.
"Made with love" is a concept that's subjective but real. You can tell Andor was made with love, while the sequel trilogy looks like it was made with a set of release criteria designed by consultants.
Not to mention I don't even put Andor in the category of a typical Star Wars story. It's just great geopolitical writing. The boardroom scenes were some of my favorites of any show I've ever watched.
One of the best things I was forced to do in high school was read "How to Lie with Statistics" by Darrell Huff. The book's a bit dated and oversimplified in parts, but it gave me a healthy skepticism that served me well in college and beyond.
I think the issues described in this piece, and by other comments, are going to get much worse with the (dis)information overload AI can provide. "Hey AI, plot thing I don't like A with bad outcome B, and scale the axes so they look heavily correlated". Then it's picked up on social media, a clout-chasing public official sees it, and now it's used to make policy.
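For what it's worth, that axis trick takes about ten lines of matplotlib. Here's a made-up series plotted honestly and then with a truncated y-axis:

```python
# Made-up data: a nearly flat series, plotted with an honest y-axis and
# then with a truncated one that makes the same numbers look like a surge.
import matplotlib.pyplot as plt

years = list(range(2015, 2025))
outcome_b = [50 + 0.3 * i for i in range(10)]  # drifts from 50.0 to 52.7

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(10, 4))

honest.plot(years, outcome_b)
honest.set_ylim(0, 100)                  # full scale: basically flat
honest.set_title("y-axis from 0")

misleading.plot(years, outcome_b)
misleading.set_ylim(49.9, 52.8)          # truncated scale: same data "soars"
misleading.set_title("truncated y-axis")

plt.show()
```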
It helps to internalize the concept that all statistics (visualizations, but really any statistic that involves some act of organization) are narrative, in a “the medium is the message” kind of way.
Sometimes you are choosing the narrative consciously (I created this chart to tell a story), sometimes you are choosing it unconsciously (I just want to make a scatter plot and see what it shows - but you chose the x and y to plot, and you chose a scatter plot vs some other framework), and sometimes it is chosen for you (chart defaults, for example, or north is up on a map).
And it's not just charts. Statistics on the whole exist to organize raw data. The very act of introducing organization means you have a scheme, framework, or lens with which to do so. You have to accept that and become conscious of it.
You cannot do anything as simple as report an average without choosing which data to include and which type of average to use. Or a histogram without choosing the bin sizes, and again, the data to include.
This is all to say nothing of the way the data was produced in the first place. (Separate topic)
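To make the averages/bins point concrete, here's a tiny example with made-up numbers:

```python
# Made-up numbers: the same data supports two different headlines depending
# on which average you report, and two different shapes depending on bins.
import statistics
import numpy as np

incomes = [30, 32, 35, 38, 40, 42, 45, 500]   # one large outlier (in $k)
print(statistics.mean(incomes))    # 95.25 -> "average income is ~95k"
print(statistics.median(incomes))  # 39.0  -> "typical income is ~39k"

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(4, 1, 500)])
coarse, _ = np.histogram(data, bins=3)    # two clusters blur into one lump
fine, _ = np.histogram(data, bins=40)     # the two clusters become visible
print(coarse)
print(fine)
```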
If even the hyperscalers like OpenAI aren't making a profit, when exactly do companies adopting AI start making money? I wonder if we'll see the hyperscalers start raising prices and squeezing customers when investors finally start expecting a return, and if that will put the brakes on adoption as a result.
Depends a lot on investor expectations. If they think the opportunity for growth and market expansion is coming to an end, they'll push for higher returns.
I'm curious, why hasn't Valkey picked up corporate sponsors to the degree OpenTofu did when HashiCorp changed Terraform's licensing? I just haven't seen a meaningful level of reaction compared to the community outcry when Hashi changed to BSL.
> I'm curious, why hasn't Valkey picked up corporate sponsors to the degree OpenTofu did when HashiCorp changed Terraform's licensing?
You seem to be completely out of the loop. Valkey is backed by AWS, Google Cloud, Oracle, etc. If I recall correctly, a principal engineer from AWS was spearheading the project.
Valkey has lots of corporate sponsors, including Amazon, Oracle, Google, Percona, and Ericsson. It's also under the Linux Foundation and will get support and visibility from there (which in turn is sponsored by even more large companies).
Probably because Terraform’s value was always the community of providers and modules, and that was in danger.
Whereas Redis/Valkey's ecosystem exists mainly as advocacy and happy users. It might be central to an architecture, sure, but continuing to use a previously open-sourced version was unlikely to cause considerable problems.
Contrast that with the BUSL'd Terraform: potentially huge changes that create incompatibility with existing providers would lock you into HashiCorp's new, unfavorable terms.
> Probably because Terraform’s value was always the community of providers and modules, and that was in danger.
It was never in danger, the providers remained under MPL and were explicitly excluded from the licensing change, with a good associated explanation (most of them were developed by and with partners and the community, unlike Terraform core which was almost entirely HashiCorp).
The providers need a “driver.” Without that, they aren’t very useful as is. That’s the danger. (Yes, pulumi, etc)
Additionally, HashiCorp changed the terms of service on the registry, making it only acceptable to use the official terraform binaries to download modules or providers.
Now, the providers are mostly open source, so, it was never impossible to recreate the thing—just work. But the point here is that Hashicorp took steps that caused the community of terraform users to recognize that closing off the ecosystem would have a tremendous impact on devops.
That’s why there was so much outrage and immediate action taken.
> Additionally, HashiCorp changed the terms of service on the registry, making it only acceptable to use the official terraform binaries to download modules or providers.
Why would HashiCorp provide free hosting of providers and modules for projects competing, using HashiCorp's own code at that? Multiple entire companies exist doing little more than providing wrappers around stuff HashiCorp develops. HashiCorp has no obligation to give them everything so they have an easier time at undercutting them (because they don't have to actually develop the main stuff).
> But the point here is that Hashicorp took steps that caused the community of terraform users to recognize that closing off the ecosystem would have a tremendous impact on devops.
The community of people using alternative products built off HashiCorp's efforts, code, and money. Terraform Community Edition is still free and usable by anyone as long as you don't sell it to compete with HashiCorp.
> Why would HashiCorp provide free hosting of providers and modules for projects competing, using HashiCorp's own code at that?
If you recall, my point is that “providers were in danger,” and this is a reason in support of that. HashiCorp, of course, has no obligation to host providers for competitors. But, this is one more reason OpenTofu succeeded!
> Terraform Community Edition is still free and usable for anyone as long as you don't sell it to compete with HashiCorp.
Except, it’s rather unclear what “compete with HashiCorp” means, and there’s very little assurance that if you stick with terraform community edition you won’t get screwed over and be forced to pay in 6 months.
You can make all the arguments about “needing to make money”, “free loaders”, etc. HashiCorp is not unique in changing licenses and getting backlash.
But, as someone who joined HashiCorp, in part, because of our open source strategy, and hearing over and over, for years, how it was the reason we were so successful…
I've found LLMs to often be a time-suck rather than a supercharger for my own learning. A huge part of thinking is reconsidering your initial assumptions when you start to struggle in research, mathematical problem solving, programming, whatever it may be. AI makes it really easy to go down a rabbit hole and spend hours filling in details to a question or topic that wasn't quite right to begin with.
Basically analog thinking is still critical, and schools need to teach it. I have no issues with classrooms bringing back the blue exam books and evaluating learning quality that way.
Engineering often supplies the potential energy fueling vision and vice versa. ARPANET was very specifically engineered with the vision of a network that could survive nuclear attack. Then it evolved into the internet, which fueled a new vision of collaboration and discovery for students, hackers, and researchers. Who in turn created systems that became new business models, and the cycle goes on.
I mean code ages quickly, so the value of software must include the skillset needed to support and maintain it. Which is why enterprise software contracts exist, and are expensive. You're not paying for the binary. You're paying for the team supporting it.
I'd expect LLMs to continue making bloat significantly worse. When the cost of a thing craters, in this case generated lines of code, then you'll inevitably get way more of that thing.
Also, my observations of "vibe coding" so far include a lot of copy-and-paste errors that are then fixed by installing yet more libraries until you have a massive, glued-together mess.