Let's ask for 3 goats then. And how much did developing o1 cost, and how much will another version cost? X billion dollars per goat is not really good scaling when any number of goats or cabbages can exist.
Interesting paper, but their reason for dismissing constrained decoding methods seems to be that they want to academically study the in-context setting.
For practitioners, using a framework like Guidance which forces the models to write valid JSON as they generate text solves this trivially (https://github.com/guidance-ai/guidance)
And OpenAI also has Structured Outputs[1] that has the same effect as Guidance. I use it to safely deserialize remote function calls based on a jsonschema[2]. It works very well.
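For anyone who hasn't used it, here's roughly what that pattern looks like (a minimal sketch; the model name, schema, and exact response_format payload are illustrative and may differ across SDK versions):

```python
# Sketch: constrain a chat completion to a JSON Schema, then deserialize it.
# Schema and model name are illustrative, not from the paper or the parent post.
import json
from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "function": {"type": "string", "enum": ["get_weather", "get_time"]},
        "arguments": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False,
        },
    },
    "required": ["function", "arguments"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "remote_call", "schema": schema, "strict": True},
    },
)

# Barring a refusal, the content should be valid JSON matching the schema,
# so it can be deserialized without a retry loop.
call = json.loads(response.choices[0].message.content)
print(call["function"], call["arguments"])
```

Guidance does the equivalent at decode time for models you run yourself.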
To be fair, they first build a benchmark which they call "StructuredRAG", and it doesn't make sense to run constrained decoding against that benchmark, because it would always give you a 100% success rate. Once they have the benchmark, they try to figure out whether it is possible to prompt-engineer your way to a 100% success rate, and by using ORPO to generate the prompt they did achieve that 100% success rate without relying on constrained decoding.
Last I checked, physician pay represents a fairly small (single digit) fraction of healthcare expenditure. If I recall right, administrative overhead and insurance is significantly more of a contributor to pricing.
Late to the thread here, but the paper announcing Med-PaLM (https://arxiv.org/abs/2212.13138) does not report many benchmark results on Med-PaLM and is instead mostly about Flan-PaLM 540B (which is compared against in this paper). I am curious if any other Med-PaLM benchmarks have been published but I don't believe it is currently possible to do any further comparison against Med-PaLM given that the model is not public and no other open benchmark results are reported in the original Med-PaLM paper.
"We present a comprehensive evaluation of GPT-4, a state-of-the-art LLM, on medical competency examinations and benchmark datasets. GPT-4 is a general-purpose model that is not specialized for medical problems through training or engineered to solve clinical tasks...results show that GPT-4, without any specialized prompt crafting, exceeds the passing score on USMLE by over 20 points and outperforms earlier general-purpose models (GPT-3.5) as well as models specifically fine-tuned on medical knowledge (Med-PaLM, a prompt-tuned version of Flan-PaLM 540B). In addition, GPT-4 is significantly better calibrated than GPT-3.5, demonstrating a much-improved ability to predict the likelihood that its answers are correct."
Do you have some sources where I can read up more on this? I have always believed that the cost of front line worker salaries (Doctors/Nurses) was a relatively small % of the total cost of healthcare.
Studies like this [1] led me to believe that massive rises in administration costs and inefficiencies due to insurance structure are bigger culprits than doctor/nurse salaries. But I am admittedly not very well informed in this area.
Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
E.g. from the FAQ: "We do intend to release aggregated data sets in the public good to foster an open web. When we do this, we will remove your personal information and try to disclose it in a way that minimizes the risk of you being re-identified."
It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.
> Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
Yes. Central differential privacy is a very promising direction for datasets that result from studies on Rally.
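To give a flavor of what that could mean concretely (a toy sketch of the textbook Laplace mechanism, not anything Rally actually runs; the epsilon and data are made up):

```python
# Central DP on an aggregated count: the curator sees the raw data once and
# adds calibrated noise before anything is released. Numbers are made up.
import numpy as np

def dp_count(records, epsilon):
    """Release a count with epsilon-differential privacy.

    The sensitivity of a counting query is 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale 1/epsilon
    is enough.
    """
    true_count = len(records)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# e.g. "participants who visited at least one news site this week"
participants = [f"user_{i}" for i in range(1342)]
print(dp_count(participants, epsilon=0.5))  # noisy count, suitable for release
```

Because the noise is added once to the aggregate rather than to every individual report, the error is much smaller than in the local model.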
> It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.
I've done a little re-identification research, and my faculty neighbor at Princeton CITP wrote the seminal Netflix paper, so we take this quite seriously.
Hats off to people like Carlos Guestrin & John Giannandrea, who seem to have pushed a cultural shift through Apple. I didn't think I'd see a site like this a few years ago, based on Apple's historic reputation of strongly discouraging employees from publishing research.
Not to take away from the people who worked hard to shift the culture (which is always hard at a large company), but I saw the same shift happen at Amazon and it came down to a simple fact - it's very hard to recruit and keep the best talent if you don't publish. You don't build ML reputation and the best researchers often don't want to work somewhere where they can't add papers to their resume.
Not sure about this: a lot of interesting work is just hidden because of NDAs, commercial IP, military work, etc. You will never see anything made public until it's necessary, and then only the minimum amount required.
This is still hugely frustrating to the people working on it. I build high-pressure distsys architecture, and I've worked on some incredibly cool bleeding-edge tech that nobody will ever hear about and that I can't even put on my resume because of NDAs. The subject pops up on HN maybe once every 3 months, and I'm dying to talk to other people in that industry, but it's just a private group. So I enjoy what tidbits I can read (they're always low-reply posts) and pay for white papers for most of the rest.
It actually became really important to me over the last 5-10 years. I have no desire to discuss the secret sauce, but I've been at companies so secretive that we couldn't even file a GitHub issue upstream from our personal accounts, because some context might eventually bubble up as to what we built and who we are.
I am so, so much happier working on my niche at places where I can interact openly with people from other companies/projects, not in a competitive manner but as engineering peers working to improve some subsystem of our architecture.
I feel like the shift is coming, especially with all of the open standards groups companies are rallying behind. I'm loving that. It's just taking its time to come east of SV.
A big part of maturing as an engineer is realizing that if you’re not talking about something under NDA that’s actually cool, someone else on your team is, and it just makes you worse off.
The rabbit hole goes deeper. Sometimes you’re hired to recruit your friends where you’ll necessarily have to break the NDA to get them excited.
A lot of people look back on R&D work where they did not thrive, and quit. I think a big part of it is not realizing what confidentiality really means. They miss out on working with their smart friends, they miss out on getting ideas from other people, and they realize there are not enough people they know whom they could trust with confidential stuff. They turn out to be way too square to be doing R&D work: they're hung up about breaking the little rules, so how are they going to break the big rules? They're being paid to break rules and they're just afraid to break them. And when it comes to doing good work, it's so important to talk to friends and family for useful feedback, especially since you almost always find out from your boss that you're doing something wrong way too late.
This is especially acute at places like Apple that glorify confidentiality. They've been more successful than ever with their more relaxed attitude towards leaks. It was totally unproductive, but it was cargo-culted into places like Facebook, Snapchat, and Samsung, which have a hard time recruiting because nobody finds out what they're going to be working on.
I use NDA talk as a marker of seniority of the person I'm talking to. If the person says they can't discuss this and that due to NDA, they are usually junior people. When you talk to senior people, they know what they can and cannot talk about, and will walk the line carefully, but never ever bring up NDA in any conversation.
Yeah but there are also senior people whose heads are so far up their butts. Definitely a situation at Apple that gets cargo-culted elsewhere.
The line they walk is in the service of their own egos. Senior people are rarely in possession of the sort of detailed knowledge that is actually valuable IP to steal - that lives in the documents and code. They want to get chauffeured in a Bentley. [1] They want everyone to hold their breath when they talk. And by the way, Valleywag turned out to be spot on about how good Tim Cook would be and how badly Jony Ive would turn out, based entirely on their attitudes and not their past performance, which was a serious refutation of the entire way that R&D org was oriented, and is reflective of the positive shifts Apple is making today.
Senior people just have ideas, and honestly anyone can guess that Apple is working on a head-mounted display, or that they are experimenting with a Siri that can see through the HMD's camera, working on their own bank, etc. So what is there to keep secret? Google is literally working on everything all the time, even and especially ideas that have failed in the past, so there is nothing of value you can learn from "What is Google working on?" The actual economic value of the head of R&D's secrets is very low; their job is to go and recruit, and in that case they should really be talking quite openly about what excites them.
What are the odds any of them are like Shigeru Miyamoto, whose bodyguards brought around shrouds because the last time he turned one of his day-to-day activities into a video game, it made billions of dollars? Slim. Some VP at Apple is not Shigeru Miyamoto. If you were Shigeru Miyamoto, you'd go start your own thing. And that's really what I mean by ego: what 55-year-old, at the peak of their seniority and career, is really as great as their paycheck and ego say they are, if they aren't, you know, telling their amazing ideas to everyone and recruiting people to do their thing?
> What are the odds any of them are like Shigeru Miyamoto, whose bodyguards brought around shrouds because the last time he turned one of his day-to-day activities into a video game, it made billions of dollars?
An aside, and an unnecessary one, but after reading your [1] reference, I cannot say how happy I am that Thiel was able to shut down Gawker. That reference read like a cheap supermarket tabloid, and the internet is better for Gawker having been destroyed.
My point really wasn't about discussing grey-area NDA stuff in an interview or with my family. It was about contributing upstream, publishing my research, and interacting with the communities of projects that we use as components of our systems. I'd ideally like my industry to be more open with its exchange of information, best practices, etc, but it's literally all competitors unless you're a university.
I know exactly what is reasonable to say in a business context. I'm comfortable walking that line - it is for business purposes, and I'm never worried about my own ability to make judgement calls.
But when talking to a recruiter? Ehhhhh. I am pretty sure I would break the precise terms... so I guess the real question is how vindictive is the holder of the NDA :)
I don't think that's actually true. Lots of stuff gets published that could (and if the bean counters had their say _would_) be "hidden". But NONE of it is published by commercial labs before all the patents are filed. MS Research cranks out an amazing number of patents. As do Google Brain and DeepMind. Then they have this Mexican standoff and "license" patents to each other.
But _some_ select stuff does not get published. I know of at least two examples first hand: one at MS, one at Google. This is usually the case when publishing a paper would help large, direct competitors to partially or fully close staggering competitive gaps. As you can imagine Google doesn't publish a whole lot on the subject of search ranking, for example.
Isn't it just defensive? Is there any instance of anyone using or licensing an ML patent? Mostly, companies like IBM, MS, etc. were filing patents anyway for decades, and they continue to do so. So everyone else has to play the same game. I don't think there has been any big case involving patent infringement over ML/AI.
It's "defensive" only until the company starts sliding financially, at which point it becomes _very_ offensive. That's how IBM got MS into this game: one day IBM lawyers showed up with an invoice at Bill Gates' office.
Concur. There is amazing work being done behind gobs of money and NDAs. Some people are motivated by money and accomplishment rather than citation rings.
Apple is in FAANG. For some reason (likely related to 2015-17 era SWE compensation) FANG just includes Netflix (with Amazon, Facebook, and Google) and not Apple.
I mean, the actual companies that go into these acronyms are generally arbitrary, but AFAIK it was Facebook-Amazon-Netflix-Google as companies with rapid growth/decent profitability during that period.
Totally respect what they have achieved; one thing to keep in mind is to structure the incentives so that researchers don't just focus on h-index scores and the like but also ship features that can transform users' experiences.
I don't understand how Facebook got a $5B fine, yet Equifax gets a ~$650m fine. The data breached in the Equifax case seems to cause far more direct harm, and affected many more Americans. It feels like the 10x difference should go the other way.
Can someone more educated in how these fines work teach me about how these numbers are calculated?
In addition to demonstrating harm, regulators really hate it if you defy them. Repeat offenses carry a significant penalty, as you're seen to be thumbing your nose at them.
That's what's frustrating about most of Elon's crap. Don't test the patience of the SEC with _tweeting_. Put your phone away and save that social capital for when you actually need it.
FB is more strategic but still repeatedly misleads Congress, the FCC, etc. After a while, they're sick of being made to look like fools. Notice that FB isn't getting the "trust us" benefit of the doubt with Libra (nor should they).
Yes. The other most significant aggregator of data, Google, is by no means a saint in this space, but I think they would get a bit more "trust us" points than Facebook. Their settlement over children's privacy on the YouTube platform is a salient example here. To my view, the rapid emergence of child vloggers turning it into a career and causing COPPA issues is probably something they should have twigged to earlier, but it doesn't smack of the blatant & extreme exploitation & carelessness with user data seen at Facebook. That said, Google is probably only one decent-sized data scandal away from that territory, and hopefully takes FB's fine and increased scrutiny as instructive in being more careful themselves.
That's one way of looking at it. The other way is that companies should feel about equal pain relative to their sizes. Otherwise, big companies are able to gain an unfair advantage by just ignoring laws for which they can afford the fines.
> The other way is that companies should feel about equal pain relative to their sizes. Otherwise, big companies are able to gain an unfair advantage by just ignoring laws for which they can afford the fines.
Which doesn't make any sense and just gives them the incentive to play the same games they do in avoiding taxes.
The first reason it doesn't make sense is that the penalty should have some relation to the damages. If you cause $500 damage to someone else without their consent, screw you. But if the fine for that is $5000 per victim, it's a deterrent no matter how big you are, because $5000 is more than $500 (and provides a fair margin for the probability of not getting caught). And if the company were getting more than $500 in value from doing it, it could have just offered to pay the victim $501 to consent to it - which implies that it isn't.
Meanwhile, if you don't think large corporations can move numbers around on a spreadsheet to minimize what they owe, you haven't been paying attention. And we sure as heck don't need a system where Equifax gets to put its risky business in one entity that has inconsequential revenues, and then suffer a $10 total fine when it screws up this badly, because whatever penalty percentage of almost nothing rounds to zero.
Say companies A and B each cause $500 in damages. Company A makes $600 from that act, while Company B makes $6,000. A fine of $5,000 is way over the top for Company A, but Company B can just write it off as the cost of doing business.
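Plugging in those made-up numbers makes the asymmetry explicit:

```python
# Hypothetical figures from the comment above: both cause $500 in damage,
# A gains $600 from the act, B gains $6,000, and the flat fine is $5,000.
flat_fine = 5_000

for name, gain in [("Company A", 600), ("Company B", 6_000)]:
    net = gain - flat_fine
    print(f"{name}: gain ${gain}, fine ${flat_fine}, net ${net}")

# Company A: net -$4,400 -> clearly deterred
# Company B: net  $1,000 -> the fine is just a cost of doing business
```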
As I've said elsewhere, I'm not advocating one particular method of coming up with this number. I'm just saying that the fine should depend on the company, not be a flat number based on damages caused.
If the cost of damages (including a punitive 2x or whatever) is really and truly $500, and Company B is willing to make their victims whole at that cost...I'm 100% sure there's a problem that needs solving.
What you are talking about is called punitive damages. Punitive damages exist exactly for the purpose of causing financial pain to companies in order to give them an actual incentive to change their behavior (since if it is profitable to kill people, companies will make killing people a standard operating practice; we have multiple proofs of this). No other type of damages is levied as punishment; the other types are driven by actual recovery of losses.
Percentage of profits would be a terrible way to fine. Imagining how that could be messed with isn't hard: mysteriously there would be no profit, and the following few years' worth of expenses would be pre-paid, spent, or otherwise brought forward.
Sure, I don't really have any opinion on the best way to measure the size of a company. I'm just espousing the principle of scaling the fine to their ability to pay.
% of profits is fraught with ways to hide profits. Even gross revenue can be gamed, albeit with less efficacy. More practical is something like X dollars per infraction, with the ability for regulating bodies to exert some professional judgement that lets them determine whether the culprit's infractions were severe enough to let the per-infraction cost put them out of business altogether.
Because fines like this are often more about sending a message to the entire industry than simply about reimbursing damage. It's saying, "make sure you take security seriously, or you're risking us taking X% of your revenue/value".
If you don't do it this way, you end up in a situation similar to speeding tickets: well-off people don't care at all (and are even probably more annoyed about having their drive interrupted than the actual fine), but it can mean a poor person has to skip meals to recover. If the goal is discouraging a certain type of behavior overall, it has to hurt violators comparably, no matter their wealth.
> It's saying, "make sure you take security seriously, or you're risking us taking X% of your revenue/value".
I would believe that if these companies didn't just keep doing what they were doing anyway. Losing a percentage of revenue or profit for one year does nothing to deter them! We need to reinstate the corporate death penalty. Equifax deserves to die for its negligence, IMO.
I agree with you; the fines are much too small. But the point is that they're too small for both Equifax and Facebook. Facebook's stock even went up because it was only a $5B fine!
The only way things will change is if the fines hurt more, but it needs to hurt the huge companies just as much as the small ones, otherwise it ends up just being another factor that helps keep the already-dominant companies at the top.
Do you really believe that Facebook's stock went up because they "only" got a $5B fine?
Facebook's stock went up because they had a pending fine, and the value of the fine was announced, reducing uncertainty. Put another way: would you pay the same price for a car with an unknown repair bill as for one where you know exactly what the fix will cost?
"You don't get to exist if you screw up that badly" is a great way to send a message to an industry. Sorry, but Equifax is in a position to be a gatekeeper for data of people who haven't asked or given direct permission for them to have it. They should have gotten the death penalty as a corporation and their remaining data should have been seized.
Well, that's the thing. FB didn't necessarily cause $5B in damages; they broke the consent agreement. Actual damages might be, relative to the fine, minimal. It's hard to say how much monetary damage Equifax actually caused, but I think it's not unreasonable for a primary tenet of setting fine levels to be that they hurt, but aren't so punitive that the company must shut down - unless the activity was so egregious that a return to legitimate business may not even be possible or practical. Sort of like, in banking, the difference between levying lots of fines on Wells Fargo for their shenanigans but letting Lehman Brothers just fail and go bankrupt. (I know, opinions differ on how these things should have gone down, and on whether Equifax should have been forced to wind down and parcel out its services to other entities. I'm just trying to explain why actual damages aren't always the sole consideration.)
Corporations are not people. They don't "learn lessons"; they respond to incentives. If this breach didn't cost them dearly, but they still reaped some reward from having had the breach (e.g., saved money on security and opted to pay the fine instead when breached), they will do it again in the future.
A fine is meant to deter as well as punish. If the fine is too small, it won't deter. And certainly if less than the profits earned, it can't punish, nor deter.
Corporations don't learn lessons, but people do. You want managers arguing for budget to prioritize security, or lawyers arguing for legal stuff, to be able to use this as a compelling example.
Losing $650 million is perhaps not quite as compelling a story as losing billions, or a smoking hole where a company used to be (as in Enron and Arthur Andersen). But it's a pretty big chunk of change. I have no experience making such arguments, but it seems plausible that it will be remembered for a while at Equifax and their competitors, at least?
I'm doubtful that people respond to such incentives rationally. It probably has more to do with how well the storyteller tells the story. And whether the thing they're selling actually works well for improving security seems pretty hit-and-miss, too.
The FB fine was due to violating a previously existing consent decree with the FTC due to previous violations. The "2nd offense" nature of the offense probably contributed significantly to the higher number.
Probably because FB is politically charged, disliked by both parties, consumers and the wider industry. The fine represents the public's anger at large.
Let's be even more honest: if they did know what Equifax is, what they do, and how long they've been doing it, they would certainly hate them more than Facebook.
Looks like you got downvoted a bit on this, but it's an excellent point. I fully think FB deserved what they got, and would not have balked at more. But Equifax, even with a fine higher than FB's relative to market cap, still seems to have gotten off lighter.
IIRC you had to agree not to join class action suits to take advantage. Seems like a pretty self-serving tactic given that we're the ones who have to deal with their idiocy.
I got one of those letters. It seemed like a cruel joke to me. "We're sorry that we leaked all your personal data. But we have a great opportunity for you today! Send us some more personal data, and we'll monitor your file or something. For free! Trust us, it's gonna be great!"
I would imagine that Equifax was able to prove that they at least met the prudent man rule.
The prudent man rule requires senior executives to take personal responsibility for ensuring the due care that ordinary, prudent individuals would exercise in the same situation. This rule, developed in the realm of fiscal responsibility, now applies to information security as well.
The intent was to patch the system, but they experienced some sort of issue that prevented timely action. From what I understand, you only have to show the courts that you tried to do the right thing and had the right intention.
Plus they aren't involved in any election scandals which certainly helps....
The one positive thing that came out of all this is that you can lock down your credit for free and open it again for free when you need to. Basically, no one can open an account or credit card in your name, because the offering party's attempt to run a credit report against the frozen file will fail.
Assuming I buy your argument, to me, it just implies that the prudent man rule is inadequate here. Intent doesn't secure my data. As far as I'm concerned, they can intend in one hand and shit in the other and see which one fills up first. When the consequences of failure are the compromise of the financial lives of virtually every American adult, you need to be more than prudent about it.
Yes, intent minus execution equals some level of incompetence. Which (I guess?) is better than never having the intent to begin with, but it's sort of a distinction without a difference. "I wanted to fix my brakes but the brake shop was closed, that's why I got into a car crash" isn't really an endearing argument to the other parties involved or the regulators (Police in this case) that deal with the fallout.
> Plus they aren't involved in any election scandals which certainly helps....
Yes, plus their perceived censoring of right-leaning content (real or imagined).
But between the election stuff and their attempt to set up a currency whose monetary policy would be governed by a group of wealthy corporations, partly based in another country and regulated by a foreign body... these are things that touch on the sovereignty of the US, and no government wants internal competition on that front.
My first try with o1. Seems right to me…what does this teach us about LLMs :)?