The Pokémon Go company tried that shortly after launch to block scraping. I remember they had three categories of IPs:
- Blacklisted IPs (Google Cloud, AWS, etc.): always blocked
- Untrusted IPs (residential IPs): given some leeway, but quickly hit 429s if they started querying too much
- Whitelisted IPs: IPv4 addresses legitimately shared by many people, i.e., anything behind a CGNAT. For example, my current data plan tells me my IP is from five states over.
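A tiered policy like that can be sketched as a token bucket with a per-tier budget (hypothetical numbers, Python just for illustration):

```python
import time

# Hypothetical per-tier budgets: requests allowed per minute.
TIER_LIMITS = {"blacklisted": 0, "untrusted": 30, "whitelisted": 600}

class Bucket:
    """Token bucket: refills at a steady rate, holds at most one minute's budget."""

    def __init__(self, per_minute):
        self.rate = per_minute / 60.0       # tokens added per second
        self.capacity = per_minute
        self.tokens = float(per_minute)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with 429
```

A blacklisted tier is just a bucket with a zero budget, so it rejects everything; the whitelisted tier's large budget gives CGNAT exits room for many users behind one address.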
You can probably guess what happens next. Most scrapers were thrown out, but the largest ones just got a modem device farm and ate the cost. They successfully prevented most users from scraping locally, but were quickly beaten by companies profiting from scraping.
I think this was one of many bad decisions Pokémon Go made. Some casual players dropped out because they didn't want to play without a map, while the hardcore players started paying for scraping, which hammered the servers even more.
I have a similar ad hoc system, composed of three lists of networks: known good, known bad, and data center networks. These are rate limited using a geo map in nginx on various expensive routes in my application.
The known good list is IPs and ranges I know are good. The known bad list is specific bad actors. The data center networks list is updated periodically based on a list of ASNs belonging to data centers.
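A minimal sketch of that lookup, with made-up example networks (in practice nginx's geo module does this natively from the list files):

```python
import ipaddress

# Hypothetical example entries; the real lists live in files fed to nginx's geo map.
KNOWN_GOOD  = [ipaddress.ip_network(n) for n in ("192.0.2.0/24",)]
KNOWN_BAD   = [ipaddress.ip_network(n) for n in ("203.0.113.7/32",)]
DATACENTERS = [ipaddress.ip_network(n) for n in ("198.51.100.0/24",)]

def classify(ip_str: str) -> str:
    """Return the first matching list; known-bad wins over the datacenter list."""
    ip = ipaddress.ip_address(ip_str)
    for nets, label in ((KNOWN_GOOD, "good"),
                        (KNOWN_BAD, "bad"),
                        (DATACENTERS, "datacenter")):
        if any(ip in n for n in nets):
            return label
    return "default"
```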
There are a lot of problems with using ASNs, even for well-known data center operators. First, they change often. Second, they often include massive subnets like /13(!), which can apparently overlap with routes announced by other networks, causing false positives. Third, I had been merging networks (to avoid overlaps causing problems in nginx) with something like https://github.com/projectdiscovery/mapcidr, but found that the merge also produced larger supernets that introduced false positives from adjacent networks where some legitimate users apparently live. Lastly, I had seen suspicious traffic from data center operators like Cato Networks and Zscaler, which are enterprise security products that route client traffic through their clouds. Blocking those resulted in some angry users in places I didn't expect...
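That supernet problem is easy to demonstrate with Python's `ipaddress` module (made-up ranges): an exact merge keeps two non-adjacent networks separate, while aggregating them into one covering supernet, similar to what aggressive CIDR merging does, pulls in addresses that were never on either list:

```python
import ipaddress

def covering_supernet(cidrs):
    """Grow the first network until it contains every input network."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    net = nets[0]
    while not all(n.subnet_of(net) for n in nets):
        net = net.supernet()
    return net

cidrs = ["198.51.100.0/25", "198.51.100.192/26"]  # gap in the middle: .128-.191

# Exact merge: the two networks are not adjacent, so they stay separate.
exact = list(ipaddress.collapse_addresses(ipaddress.ip_network(c) for c in cidrs))

# One-supernet aggregation swallows the gap: 64 extra addresses belonging
# to whoever holds 198.51.100.128/26 now get rate limited too.
approx = covering_supernet(cidrs)
extra = approx.num_addresses - sum(n.num_addresses for n in exact)
```

Fewer nginx geo entries, at the price of false positives in the gaps.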
It really seems like they did everything they could and still got abused by borderline criminal activity from scrapers.
But I do think it had an impact on scraping. It's a matter of attrition: raise the cost so it hurts more to scrape. The problem can never fully go away, because at some point scrapers can just start paying regular users to collect the data.
I felt that, too. It turns out I was getting 'too comfortable' while using CC. The best approach is to treat CC like a junior engineer and overexplain things before letting it do anything. With time you start to trust CC, but you shouldn't, because it is still the same LLM it was when you started.
Another thing: before, you were in a greenfield project, so Claude didn't need any context to do new things. Now your codebase is larger, so you need to point Claude to where it can find more information. Spoon-feed it the relevant files with "@" wherever you want it to look things up and make changes.
If you feel Claude is lazy, force it to use a larger thinking budget: "think" < "think hard" < "think harder" < "ultrathink". Sometimes I like to throw in "ultrathink" and do something else while it codes. [1]
> Does telling the AI to "just be correct" essentially work?
This forces the LLM to use more "thinking" tokens, making it more likely to catch mistakes in its previous output. In most APIs, the thinking budget can be configured manually, producing better results on complex problems at the cost of time.
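For example, with the Anthropic Messages API an extended-thinking budget can be set explicitly. Sketch only: the parameter names below follow my reading of the public docs, and the model id is hypothetical; verify against the current API reference.

```python
# Sketch of an extended-thinking request body for the Anthropic Messages API.
request = {
    "model": "claude-sonnet-4-20250514",  # hypothetical model id
    "max_tokens": 16000,
    # Reserve an explicit budget of thinking tokens before the visible answer.
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user", "content": "Check the plan against the edge cases."}],
}
# This dict would then be passed to client.messages.create(**request).
```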
The biggest mistake people are making is treating AI as a product instead of a feature.
While people are doing their work, they don't think, "Oh man, I am really excited to talk with AI today, and I can't wait to talk with a chatbot."
People want to do their jobs without being too bored and overwhelmed, and that's where AI comes in. But of course, we cannot hype features; we sell products after all, so that's the state we are in.
If you go to Notion, Slack, or Airtable, the headline emphasizes AI first instead of "text editor," "corporate chat," etc.
The problem is that AI is not "the thing", it is the "tool that gets you to the thing".
I wouldn't even call it a feature. It's enabling technology. I've never once said "I would like AI in [some product]." I say: "I would like to be able to [do this task]." If the company adds that feature to a product, I'll buy it. I don't care if the company used AI, traditional algorithms, or sorcery to make the feature work--I just care that it does what I want it to do.
Too many companies are just trying to spoon AI into their product somehow, as if AI itself is a desired feature, and are forgetting to find an actual user problem for it to actually solve.
All true, but then there go your stratospheric valuations and all the crazy hype. This come-to-Jesus moment may very well deflate one of the few remaining hot areas around software engineering. I could see people being reluctant to stop the hype train, since then we'd really have to come to terms with the fact that the "industry" as a whole is kind of in the shitter, and it's a worse time to be a software engineer across the board than 5 or 10 years ago.
I wouldn't mind it if it were presented as yet another tool in the box. Maybe have a one-time popup saying "Hey, there's this thing, here's a cool use case, go check it out on your own terms."
In reality, AI sparkles and logos and autocompletes are everywhere. It's distracting. It makes itself the star of the show instead of being a backup dancer to my work. It could very well have some useful applications, but that's for users to decide and adapt to their particular needs. The ham-fisted approach of shoving it into every UI front-and-center signals a gross sense of desperation, neediness, and entitlement. These companies need to learn how to STFU sometimes.
I like this take. In fact, I feel a little uneasy when I see startups mention "MCP" on their landing pages! It's a protocol; it's like saying "we use HTTP here."
I could be wrong, but all in all: buy a .com for your "AI" product, so that you survive the dot-ai bubble. [1]
Agreed. We’ve got the potential to build real bicycles for the mind here and marketing departments are jumping right in to trying to sell people spandex cycling shorts.
His "I told you so" attitude only makes sense for people who believed what they saw on Twitter from people like Sam, who thrive on hype. For most people, it was obvious that we were/are in the last part of the S curve of LLM advancement, and that it won't get us to AGI.
The leap from 3.5 to 4 was amazing, but then everyone started catching up with OpenAI, and each new model brought diminishing returns. Expecting OAI out of nowhere to improve its pace from o1 -> o3 levels of improvement all the way to AGI doesn't make sense, no matter how much Sam Altman hypes it.
We don't know what deal they made with the VCs, but they could have multiple liquidation preference agreements.
> A liquidation preference multiple (e.g., 1x, 2x) determines how much investors receive before any distribution to common shareholders. A 2x preference means investors are entitled to twice their initial investment amount before others receive payouts.
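In toy numbers (hypothetical figures, a single investor class, non-participating preference), that definition works out like this:

```python
def waterfall(proceeds, invested, multiple):
    """Single-tier, non-participating liquidation preference.

    Investors take min(proceeds, multiple * invested); common
    shareholders (founders, employees) split whatever is left.
    Real cap tables stack multiple series with different terms.
    """
    pref = min(proceeds, multiple * invested)
    return pref, proceeds - pref

# A $250M sale against $200M invested at a 2x preference: investors take
# everything, and common stock gets nothing -- which is how an early
# employee's shares can end up worth zero despite a nine-figure exit.
```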
So Google writes a check for $2.4B to Windsurf and gets the IP. Check deposited with Windsurf. Ledger entries made. Windsurf now has $2.4B in assets more than it had before. Money in the bank. Preference cliffs do not apply to this licensing deal. Key employees and CEO then take a 2.4 mile hike over to Google. Lunch is served.
Then Cognition offers $250M for Windsurf itself. Ok, I can imagine the preference cliffs kicking in now. But Windsurf just got a check for $2.4B and I don't think they had anywhere close to that in liabilities.
So where'd the $2.4B go? This seems like a strange deal.
$1.2B went to investors; the remaining $1.2B was effectively an incentive/payout for the founders/employees that Google took. The company keeps whatever money it had in the bank, plus a bit more from Google, but with no investor liabilities.
Ok, Google can pay $1.2B to the CEO and key employees to get them to walk. The other $1.2B is for the Windsurf IP and it cannot go directly to the investors. It has to go through the company where it is first revenue and then an asset.
But Windsurf could distribute profit at this point before the Cognition deal. I guess this is where the preference rights got exercised. The tweet from employee #2 said his stock wasn't worth anything. Actually, he got preferenced out of the $1.2B in dividends.
Then came the $250M Cognition deal. He got preferenced out of the proceeds of the Cognition deal as well.
Reasoning models do a lot better at AIME than non-reasoning models, with o3 mini getting 85% and 4o-mini getting 11%. It makes some sense that this would apply to small models as well.
In your first two points, I think that after you know your team well enough, you will understand where your engineers' skills lie and learn how to delegate effectively to either the person most capable or the one who will learn the most.
Your last point is true in larger enterprises. However, it's not so bad if you are a manager at a smaller company or startup: when you are 1-3 steps from the CEO, you get a lot of independence.
On the other points, I must admit I never went through an RIF and never had a situation where the "engineers were too green." However, I worked at an enterprise company where there was a 1:1 ratio of interns to employees, so it can happen at a large enterprise too. Generally, there is always at least one senior on the team to deal with firefighting.
If there is a major change (e.g., Python 3, React Native new arch), they are replaced/forked.