It's not about training directly on the test set, it's about people discussing questions in the test set online (e.g., in forums), and then this data is swept up into the training set. That's what makes test set contamination so difficult to avoid.
That is the "reality" - that because companies can train their models on the whole Internet, companies will train their (base) models on the entire Internet.
And in this situation, "having heard the problem" actually serves as a barrier to understanding these harder problems, since any variation of a known problem will receive a standard "half-assed guesstimate".
And these companies "can't not" use these base models since they're resigned to the "bitter lesson" (better the "bitter lesson viewpoint" imo) that they need large scale heuristics for the start of their process and only then can they start symbolic/reasoning manipulations.
But hold up! Why couldn't an organization freeze their training set and their problems and release both to the public? That would give us an idea where the research stands. Ah, the answer comes out: 'cause they don't own the training set, and the result they want to train is a commercial product that needs every drop of data to be the best. As Yann LeCun has said, this isn't research, this is product development.
>> It's not about training directly on the test set, it's about people discussing questions in the test set online
Don't kid yourself. There are tens of billions of dollars going into AI. Some of the humans involved would happily cheat on comparative tests to boost investment.
The incentives are definitely there, but even CEOs and VCs know that if they cheat the tests just to get more investment, they're only cheating themselves. No one is liquidating within the next 5 years so either they end up getting caught and lose everything or they spent all this energy trying to cheat while having a subpar model which results in them losing to competitors who actually invested in good technology.
Having a higher valuation could help with attracting better talent or more funding to invest in GPUs and actual model improvements but I don't think that outweighs the risks unless you're a tiny startup with nothing to show (but then you wouldn't have the money to bribe anyone).
Theranos didn't have 10 different competitors doing the exact same thing. A new AI model which scores better on a random metric isn't going to suddenly make them the top model that everyone uses unless they're actually good. So while Theranos cheating would help put them in stores like CVS, an AI company cheating would just mean that they make a few sales before everyone realizes that their model is actually pretty bad compared to all the competitors.
If I'm understanding correctly, by "income", this article means "actual individual consumption", which is the amount of money spent by a household. Thus, the article is saying that for countries where households spend more, they spend more on healthcare. Given that healthcare is a huge fraction of household expenditures (almost 20%), this seems tautological. Am I misunderstanding something?
This is exactly the problem we have found in our research on generative AI for education [1]. We ran a pilot in a large high school in collaboration with math teachers, and found that students basically copy answers from ChatGPT, resulting in worse performance compared to students not given ChatGPT. If students don't want to learn, ChatGPT isn't going to fix anything.
You are just giving them ChatGPT with a bit of prompt engineering, and evaluating them on math problems, which we know LLMs make errors on because they are not calculators. You aren't putting in the effort needed to build a real tutor and learning assistant. I would not extrapolate from these results.
There are also a lot of things that can come in before you build a full-on tutor. One example is being able to tailor word problems (transform the nouns) to subjects interesting to the particular student. They could also be used to help understand where students are struggling. We are still in the early phases of useful AI; optimism is more appreciated, especially as contemporary times have become so pessimistic.
This breaks down because it's not easy to statically reason about when a variable is a NonZeroNumber. For example, what is the type signature for subtraction of two NonZeroNumbers? You can't guarantee that it isn't zero, so it has to be a Number. Thus, you can't divide by the difference. You could use a more powerful type system to reason about these kinds of constraints, but then type checking quickly becomes undecidable (or at least very, very expensive).
Here you can infer that the type of 'a' inside the 'then' is a NonZeroNumber. "All" you need is for the type checker to be able to recognize when a conditional acts as a guard against a subset of the possible types.
But if we're dividing by (a-1) then recognizing that it's safe may require a type NonOneNumber. And in other cases perhaps NonFortyTwoNumber. Where does that end?
Only inasmuch as it would require the type checker to be able to express and operate on expressions constraining or combining types, not name every one. E.g. being able to recognise that in this:
if a < 2 then exit
.. afterward, "a" keeps a named numeric type that its previous type fully covers (a more constrained one, if the check allows picking it), its real type is further constrained to the subset a >= 2, and the checker can reconcile that this means 'a' can't be 1.
In practice, yes, you can absolutely get into situations where this will mean you end up writing extra checks to prove to the compiler that a value can only be within the required subset if the type checker couldn't figure it out. How often will depend on how advanced your type checker is.
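One way to picture "constraining types without naming every one" is to track a range instead of a named type. The sketch below is only an illustration under that assumption: `Interval` and its methods are hypothetical, and a real checker would do this statically rather than with runtime objects.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    """Hypothetical 'type' of a numeric variable: every value in [lo, hi]."""
    lo: float
    hi: float

    def assume_at_least(self, bound: float) -> "Interval":
        """Narrowed type after a guard such as `if a < bound then exit`."""
        return Interval(max(self.lo, bound), self.hi)

    def minus(self, k: float) -> "Interval":
        """Type of the expression (a - k)."""
        return Interval(self.lo - k, self.hi - k)

    def excludes(self, value: float) -> bool:
        """True if the range proves the variable can never equal value."""
        return value < self.lo or value > self.hi


any_number = Interval(-math.inf, math.inf)
a = any_number.assume_at_least(2)  # after: if a < 2 then exit
divisor = a.minus(1)               # (a - 1) lies in [1, inf)
```

Here `divisor.excludes(0)` holds, so dividing by `(a - 1)` is provably safe without ever declaring a NonOneNumber. Without the guard, `any_number.minus(1)` still contains 0 and the division would need an explicit check.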
Important caveat with some of the results: they are using better prompting techniques for Gemini vs GPT-4, including their top-line result on MMLU (CoT@32 vs 5-shot). But they do have better results with zero-shot prompting further down, e.g., on HumanEval.
> So, the 2% growth rate for world energy consumption should be a conservative assumption.
An important caveat: this article assumes that energy consumption will continue to increase exponentially to get the 1000 year timeline of draining the rotational energy of the Earth.
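To see how much the exponential assumption matters, here is a rough back-of-the-envelope sketch. The two physical figures are my own ballpark assumptions, not numbers taken from the article: about 2.1e29 J for the Earth's rotational kinetic energy and about 6e20 J per year for current world energy use.

```python
import math


def years_to_exhaust(budget_j: float, use_j_per_year: float, growth: float) -> float:
    """Years until cumulative use reaches budget_j if use grows by `growth` per year.

    Cumulative use after n years is the geometric sum
        use * ((1 + growth)**n - 1) / growth,
    which is solved here for n.
    """
    return math.log(1 + growth * budget_j / use_j_per_year) / math.log(1 + growth)


# Assumed ballpark values (not from the article).
ROTATIONAL_ENERGY_J = 2.1e29
CURRENT_USE_J_PER_YEAR = 6e20

exponential_years = years_to_exhaust(ROTATIONAL_ENERGY_J, CURRENT_USE_J_PER_YEAR, 0.02)
flat_years = ROTATIONAL_ENERGY_J / CURRENT_USE_J_PER_YEAR  # no growth at all

print(f"2% growth: ~{exponential_years:.0f} years")
print(f"flat use:  ~{flat_years:.1e} years")
```

With these assumed inputs the 2% case lands on the order of a thousand years, while flat consumption takes hundreds of millions of years, so the timeline comes almost entirely from the growth assumption.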
There are 8 billion people in the world, and we’re on track to have 10 billion by the end of the century. Which is more likely? Everyone lives like a European (from a consumption perspective)? Or a bunch of people don’t or die while some do?
> Which is more likely? Everyone lives like a European (from a consumption perspective)? Or a bunch of people don’t or die while some do?
There are no systems that can ever be created to actually balance out / eliminate inequality, outside of very small communities operated via extreme authoritarianism.
See: modern Sweden. A formerly highly effective welfare state, increasingly brought to its knees by poor immigrants (and it's going to get a lot worse over the coming decades). They can't remotely handle what's happening to their country in terms of inequality, despite how good they have been historically at managing it. France - also historically effective at managing a high quality of life welfare state - has entirely failed at managing a similar scenario, and for the same reasons.
The exact same principle applies globally as it does locally. It can't be done under any scenario. So yes, vast inequality will persist forever, as it has forever.
The end of inequality is when there are one or fewer humans remaining.
We don't have to guess at likelihoods (will everyone live like a European). The answer is known and certain. Even within Europe half the population can't afford to 'live like a European.'
You have some sources for “Sweden and France are failed countries”? Something tells me that’s not quite an objective take. “Inequality exists now so it always will” is also not the objective scientific take I think you’re implying it to be.
I'm willing to bet our descendants will be approximately as irrational and lucky as humans have always been.
There's a good chance that at least one of those ten billion is going to have some radically good ideas, and an even better chance that a few more will have some really good ideas.
Jesus fucking Christ this is depressing. You just brought up how you think genocide is inevitable, and you imply that we should start it so that we’ll win?
I don’t think there’s any reason “let’s kill everyone else so we can keep our nice stuff” should make any more sense in 2023 than it has over the last… well, such arguments pop up in all of human history, I guess.
To answer your question: i think it’s more likely that we build an egalitarian global society, or at least continue on the path.
I didn’t say genocide, I said violence over scarce resources. They are distinct motivators. I also didn’t say to start it, I said to win.
> To answer your question: i think it’s more likely that we build an egalitarian global society, or at least continue on the path.
Good luck with that. Hope is not a strategy, but hopefully the future is not as bleak as the current data predicts. Show me a voter cohort that will willingly give up substantial go forward energy or resource quality of life for people on the other side of the world who they have not nor will never meet. Do you know how many people are dying right now at this moment because their basic needs are not being met? One every 4 seconds per Oxfam. This is before more frequent heat events, crop failures, aquifers reaching terminal depletion, etc.
Agreed it’s (extremely!) depressing, but facts are different from a reality based on feelings. Finding truth is following the facts to the (sometimes) unpleasant places they take us.
> > Good luck with that. Hope is not a strategy, but hopefully the future is not as bleak as the current data predicts. Show me a voter cohort that will willingly give up substantial go forward quality of life for people on the other side of the world who they have not nor will never meet.
Voters might never do it, but the guy who wins the Presidency and only has to face voters every 4 years starts thinking about a Nobel Peace Prize the moment he sets foot in the WH, and to make a legitimate candidacy he needs to be a great humanitarian and/or get some significant Foreign Policy victories.
Bush 43 took it upon himself to fund a whole lot of malaria and HIV prevention campaigns, and even Trump pursued the defeat of ISIS and normalization of relationships between Israel and the Arab world.
I don’t really understand the argument that corporate profits are driving inflation. If demand for a product goes up, we would naturally expect prices to rise since there is more competition for a fixed set of goods. Since the cost of producing the product has not increased, corporate profits necessarily rise. In the long term, we would expect competition to drive prices back to marginal cost as production of that product increases, but this process can take time.
In this case, how do you separate inflation due to increased demand from inflation due to corporate profits? The two seem inextricably tied to me.
You can't compare prices and profits. Increased demand and short supply can cause prices to rise. Prices. Not profits. Profits are typically what's left after the costs of selling goods/services. If the pandemic caused various factors to cost a lot more and that cost was passed on to consumers at a similar rate as in the past, then profits should remain same-ish. It's possible and logical that companies raised prices during the pandemic due to some cost increase (parts, shipping, etc.) and then never lowered prices even though that cost increase faded away.
If over a long period of time there has been consolidation (aka acquisitions) in an industry and there are only a few conglomerates around, they essentially have a market to themselves and can do what they want without actually legally being considered a monopoly.
People seem to think there is a corporate generosity/corporate greed cycle, where sometimes they get greedy and push the prices up and other times they feel generous and lower the prices for no reason.
If people would pay more they'd already have set the prices higher... these quangos commenting on it have clearly lowered the quality of their academic hires if they're willing to publish garbage like the article above and the IMF's recent piece.
You can look at plenty of markets where there is no increased demand. Food is an obvious one. Also as other people have pointed out, it's not like they're hiding it.
The price of a pound of oats at the regular grocery store has trebled over the last year to 5.00/lb. Meanwhile if you buy them in 25 lb bags at Cash and Carry they’re less than 80 cents a pound. Someone is gouging like crazy.
The producer price index in the US was at 118 in 2020. It shot up to 140.72. In the US that corresponds to the producer having less purchasing power, which directly ties into our consumer price index and the raw costs of goods. At one point the producer-side inflation was WORSE. Businesses were losing their asses in the US. It wasn't "price gouging".
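For scale, the two index values quoted above work out to roughly a 19% rise in producer prices. A quick sketch of the arithmetic, using only those two figures (`percent_change` is just an illustrative helper):

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Index values as quoted in the comment above.
ppi_change = percent_change(118, 140.72)
print(f"PPI change: {ppi_change:.1f}%")
```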
Food is down 20-40% over last year and the only thing still propping it up as much as it is surrounds concern over the US midwest drought. Indeed, consumer buying lags production, but it is only a matter of time.
In the UK, it feels like food is up 25-50% over a mere couple of years.
Items that were £2 not so long ago are now rounded up to £3 in some cases.
It doesn't help that supermarkets are constantly playing games with pricing, trying to get people into the store with seemingly good deals that never last long, then experimenting with just how far they can push the greedflation on other items.
We need a measure of inflation that's based on the cost of essentials, primarily housing, energy, transport, and food - rather than stacking the 'basket of goods' with infrequent purchases that we expect to fall in price.
Food prices are definitely still up 50%+ over 2019, but down from last year. Concerns over the conflict in Ukraine sent things to the moon for a little while. I expect there is still some fertilizer concern out there, but it is mostly the weather driving things right now.
We are talking about food, not groceries. As before, there is a lag between them. The consumer is still largely paying out last year's food contracts at the high prices. We likely won't see grocery prices come down until next year.
There is increased demand/prices for the inputs, and decreased supply for some of them. "Increased demand" doesn't just mean increased demand for the end product.
> If the headline read consumers now willing to pay twice the price it wouldn't have the same clickbait value
That's not how it works though. Inflation affects a lot of goods where you have no choice, you just need it (e.g. food, utilities, housing, transportation).
Almost all goods have a sliding scale for quality and how much you consume. It's not pretty or pleasant, and I don't want to in any way sugarcoat it, but it's possible. For example, people don't need meat or eggs to have a subsistence diet. The fact that they continue to buy those goods at a higher price demonstrates that they are both willing and able to pay that price before switching to something cheaper like rice and beans.
Because the corporations are saying it is an increase in costs that are driving higher prices, not demand. And the price curve for things like "eggs" is not being driven by demand.
I have a hard time believing this is supply-side inflation. Are they saying that multinational corporations are inflating their own prices so as to invoke cost-push inflation so that they can profit on the back end? If that's what the EU is saying (which is what you're implying) then they're insane conspiracy theorists. markets aren't centrally planned processes, especially markets as complex as the EU's. Chances are it's demand-side inflation, which is exactly what the US has. The US had a massive recession in 2007 and they inundated corporations with bailouts, yet this did nothing to spike inflation. When the US pumped trillions of active currency into the economy they triggered demand-side inflation (https://www.bloomberg.com/news/articles/2022-08-24/demand-su...). How is the EU any different? They too pumped excessive amounts of currency into their markets (https://www.consilium.europa.eu/en/policies/coronavirus/covi...).
> markets aren't centrally planned processes, especially markets as complex as the EU's.
It could be when all the brands are owned by a couple of corporations, which are chaired by a couple of large investment firms.
You know for a fact that big tech was doing a no-poaching agreement in 2005, but can't wrap your head around the fact that corporations can collude on price hikes in 2023?
You picked just about the worst example to make your case. Egg prices spiked to huge numbers because of large culls of egg-laying hens, then fell back down. They didn't fall all the way back down because customers are now willing to pay higher prices.
No, don't you understand, the egg producers just became more greedy for a bit and raked in extra profits, then decided to become less greedy! Get out of here with your important context!
OpenAI hasn't said exactly how they trained code-davinci-002 so this is speculative, but I'm reasonably sure it was trained on more data and languages than CodeGen and for longer. It was also trained using fill-in-the-middle [1].
I’m not sure I buy the conclusion. The point in the original story (at least the version I’ve read) was that Mel’s main job was to write code that could make the computer appear as fast as possible to potential customers. Of course one hack isn’t going to noticeably improve performance, but a collection of hacks can significantly improve performance.
I'm not sure about that. If Mary knows all the physics of light, she can design and set up a sensor to detect the color of the apple and the instructions, just like we have sensors that detect ultraviolet and infrared light. Just as we will never know how it feels to "see" ultraviolet light, this thought experiment is about Mary's subjective experience perceiving color, not her ability to determine the color of the world around her.
Mary is using sensors either way: they either come pre-installed or after market.
On a more serious note, I'm taking the question in the article at face value, "Has Mary learnt anything new about the colour red upon seeing the colour for the first time?" If the author really meant it to be a question about the subjective experience of seeing, they should have simply written a better question.
Your confusion is not the author's fault. You ignored the parameters of the thought experiment, which clearly state that she can discern colors without "seeing" them. The question is whether the subjective experience of "experiencing" color adds new information.
I'm familiar with the thought experiment and have been for decades. I still think the author could write more clearly.
Besides, we can clearly perform a proxy experiment today by way of EnChroma glasses and colorblind people. Simply devise a way to test for information predicated on trichromacy using bichromatic metamers. Did the subject experience a functional difference? If so, it's a pretty easy question to answer. If there's no functional difference then why does it matter?
"Simply devise a way to test for information predicated on trichromacy using bichromatic metamers." lol... "Simply..." this is literally the 'explanatory gap' described in the article. We don't currently have a way to measure if the brain "acquired new information" when it knows about something but then sees it for the first time... which is the whole point. "experience a functional difference" -- what does that even mean?
AFAIK, we have no way of assessing whether “someone” has acquired new information other than that someone demonstrating their knowledge. We can’t MRI “acquisition of information”. So, as per this thought experiment, we have no way of telling if Mary gained new information by “seeing” red when she already “knew” what red was and could interpret it as such. That’s the whole point of the thought experiment, which clearly went over your head.