I used to work on video generation models and was shocked at how hard it was to find any videos online that were not hosted on YouTube, and YouTube has made it impossibly hard to download more than a few videos at a time.
> YouTube has made it impossibly hard to download more than a few videos at a time
I wonder why. Perhaps because people use bots to mass-crawl content from YouTube to train their AI. And YouTube prioritizes normal users, who only watch a few videos at most at the same time, over those crawling bots.
I wonder how Google built their empire. Who knows? I’m sure they didn’t scrape every page and piece of media on the internet and train models on it.
My point was that the large players have a monopoly hold on large swaths of the internet and are using it to further advantage themselves over the competition. See Veo 3 as an example: YouTube creators didn’t upload their work to help Google train a model to compete with them, but Google did it anyway, and creators didn’t have a choice because all eyeballs are on YouTube.
By scraping every page and directing the traffic back to the site owners. That was how Google built their empire.
Are they abusing the empire's power now? In multiple ways, such as the AI overview stuff. But don't pretend that crawling YouTube and training video generation models is the same as what Google (once) brought to the internet. And it's ridiculous to expect YouTube to make it easy for crawlers.
You have to feed it multiple arguments with rate limiting and long wait times. I am not sure if there have been recent updates other than the JS interpreter, but I've had to spin up a Docker instance of a browser to feed it session cookies as well.
Unusually well-argued post, hard to disagree with...
What exactly is the problem? That they worked on video generation models? That they only used YouTube? That they downloaded videos from YouTube? That they downloaded multiple videos from YouTube?
Yeah kinda hard to see companies being more aggressive than they already are about outsourcing. I know companies that fired their entire tech org from the CTO down and moved it to India.
When I was looking for work early this year, I was told that most of the Google NYC roles were listed for internal transfers and that most of the actual hiring was in Warsaw (with 1000s of open roles, as Google recruiters told me at a conference in Europe).
If someone is transferring from SF to NYC they wouldn't have to advertise the position. I think the OP is referring to transferring people into the country on L1.
I was told that they were actually required to list them even if it’s someone transferring internally.
It was for a few specific ML research roles that I was interested in, of which there were very few in NYC, and during the interview process I was told that they would go to internal candidates.
Yeah it's even worse than that. These big cos will be incentivized to move whole teams out of the US since it will be easier to hire from other countries for offices in Paris / Zurich / Warsaw / etc.
Isn't that already the case, though? Offshoring has been a thing for decades, but companies clearly prefer to have employees on site, in the US, if possible.
Yes, this new fee will make that more expensive to do, but I'm not convinced it will no longer be worth it for most companies.
I don't see how it would "all pop" - same as with the internet bubble, even if the massive valuations disappear, it seems clear to me that the technology is already massively disruptive and will continue growing its impact on the economy even if we never reach AGI.
Exactly like the internet bubble. I've been working in Deep Learning since 2014 and am very bullish on the technology but the trillions of dollars required for the next round of scaling will not be there if GPT-5 is not on the exponential growth curve that sama has been painting for the last few years.
Just like the dot com bubble we'll need to wash out a ton of "unicorn" companies selling $1s for $0.50 before we see the long term gains.
So is this just about a bit of investor money lost? Because the internet obviously didn't decline at all after 2000, and even the investors who lost a lot but stayed in the game likely recouped their money relatively quickly. As I see it, the lesson from the dot-com bust is that we should stay in the game.
I wouldn't say "well above" when the curve falls well within the error bars. I wonder how different the plot would look if they reported the median as their point estimate rather than the mean.
I don't expect GPT-5 to be anything special; it seems OpenAI hasn't been able to keep its lead. But even the current level of LLMs, to me, justifies the market valuations. Of course, I might eat my words about OpenAI being behind, but we'll see.
Because everything past GPT 3.5 has been pretty unremarkable? Doubt anyone in the world would be able to tell a difference in a blind test between 4.0, 4o, 4.5 and 4.1.
I would absolutely take you on a blind test between 4.0 and 4.5 - the improvement is significant.
And while I do want your money, we can just look at LMArena, which does blind testing to arrive at an Elo-based score and shows 4.0 with a score of 1318 while 4.5 has 1438. That gap makes 4.5 roughly twice as likely to be judged better on an arbitrary prompt, and the difference is more significant on coding and reasoning tasks.
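For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the LMArena scores behave like standard Elo ratings (logistic curve, base 10, 400-point scale) and it ignores ties; the function name is mine, not anything from LMArena.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Scores quoted in the comment above: 4.5 at 1438, 4.0 at 1318.
p_win = elo_expected_score(1438, 1318)

# Odds of the higher-rated model being preferred vs. the lower-rated one.
odds = p_win / (1.0 - p_win)

print(f"win probability: {p_win:.3f}")
print(f"odds: {odds:.2f} to 1")
```

Under those assumptions a 120-point gap works out to a win probability of about two in three, i.e. odds of almost exactly 2 to 1.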
Well word on the street is that the OSS models released this week were Meta-Style benchmaxxed and their real world performance is incredibly underwhelming.
All the Llamas have done it (well, 2 and 3, and I believe 1, I don't know about 4). I think they have a citation for it, though it might just be the RoPE paper (https://arxiv.org/abs/2104.09864).
I'm not actually aware of any model that doesn't do positional embeddings on a per-layer basis (excepting BERT and the original transformer paper, and I haven't read the GPT2 paper in a while, so I'm not sure about that one either).
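To make "per-layer" concrete: with RoPE the position is not added to the input embeddings once, it is applied as a rotation to the query and key vectors inside each attention layer. Below is a rough pure-Python sketch of that rotation for a single vector. The function names and the base of 10000 are illustrative assumptions, and real implementations are vectorized and may pair the dimensions differently.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by an angle of pos * theta."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # lower frequency for later pairs
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.1]

# The attention score depends only on the relative offset (2 in both cases).
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 12), rope(k, 10))
```

The two scores match up to floating-point error, which is the property that makes it natural to reapply the rotation in every layer instead of baking positions into the token embeddings.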
I'm also using Claude Code and am very familiar with it, but haven't had a chance to try Qwen3 Coder 30B A3B for any real-world development. That said, it did well with my "kick the tires" tests, and some reports show that it's comparable to Sonnet (at least before adding the various levels of 'think' directives):
Judging by the @america feed on Twitter, it will be all of the fascism with none of the fake MAGA populism. Good luck finding a constituency for that outside of a handful of billionaires and their groupies.
I've heard from someone who knows that they're scamming people like crazy. Supposedly they also set up a bunch of LLCs to hire influencers, then never paid them.
A great feature of Pydantic is the validation hooks that let you intercept serialization/deserialization of specific fields and augment behavior.
For example, if you are querying a DB that returns a column as a JSON string, it's trivial with Pydantic to JSON-parse the column as part of deserialization with an annotation.
Pydantic is definitely slower and not a 'zero cost abstraction', but you do get a lot for it.
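A minimal sketch of the JSON-column case, assuming Pydantic v2 (`BeforeValidator` with `Annotated`). The model, the field names, and the `parse_json_column` helper are made up for illustration:

```python
import json
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator


def parse_json_column(value: Any) -> Any:
    """Hypothetical helper: decode a JSON string before normal validation runs."""
    if isinstance(value, (str, bytes)):
        return json.loads(value)
    return value  # already decoded, let Pydantic validate it as-is


# Reusable annotation: any field typed as JsonColumn accepts a raw JSON string.
JsonColumn = Annotated[dict[str, Any], BeforeValidator(parse_json_column)]


class UserRow(BaseModel):
    id: int
    settings: JsonColumn


# Pretend this dict came straight from a DB driver.
row = {"id": 1, "settings": '{"theme": "dark", "beta": true}'}
user = UserRow.model_validate(row)
print(user.settings["theme"])
```

Pydantic v2 also ships a built-in `Json` type that covers the plain "parse this string" case. The explicit validator is only worth it if you want extra behavior in the hook, such as also accepting values that are already decoded.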
Wow great timing, I just got a $22,000 bill 2 hours ago for a surgery that UHC approved 2 months ago (in a written letter from them) because they refused to pay.
I'm on the hook for $128k for a no-complications birth and the 5 days my newborn had to be on a CPAP machine, after Blue Cross denied the claim. I picked the plan only after confirming all our providers were in network, but failed to check if the building where the delivery was occurring was in network.
The plan at this point is to just ignore it and hope it goes away, since they can't put it on your credit anymore.
>I picked the plan only after confirming all our providers were in network, but failed to check if the building where the delivery was occurring was in network
What?
I'm sorry, what kind of Kafkaesque system is this?!
It's the system that us Americans are tricked into believing is the best and nOt sOciAlIsM. Certainly USA healthcare is "the best" — if you can afford it!
My personal belief is that the kafkaesque nature of so many systems is designed to keep people destitute and despondent — to quote ole TedK: "our system keeps people demoralized because a demoralized person won't fight back."
~"We'll keep them poor and tired; if they're poor they can't afford to fight back, and if they're tired they won't have energy to..."~ —Jeff (Jonestown Massacre)
Having dropped out of a US medical school (almost two decades ago), I can assure you things have only gotten worse (from a bottom 80% POV). My best method of Pyrrhic victory is to not reproduce, earn just enough to live minimally (i.e. lessen tax burden/revenue), and never pay for health insurance.
I have no idea. I tried calling the number on the bill, but it gave me a dialer with 8 options of "if you're calling about a bill from X which is now part of Y, please dial N". When I selected 8, which was "all other", I got a canned message telling me to call between 9-5 on a weekday.
Start by calling billing and telling them what happened, and that you effectively don't have insurance and will be self-paying (said for the purpose of negotiation, not what you may or may not actually do). They should discount it by a lot.
Healthcare providers have started saying it's "insurance fraud" to say that you don't have insurance when you do.
My guess: they know they can get more money from the insurer than from the individual (or from a combination of both!), so they want to scare you out of stopping them from negotiating with the insurers.