I'd expect a mid-level developer to show more understanding and better reasoning. So far it looks like a junior dev who has read a lot of books and is good at copy-pasting from Stack Overflow.
(Based on my everyday experience with Sonnet and Cursor.)
The key here is "under your guidance". LLMs are a major productivity boost for many kinds of jobs, but can LLM-based agents be trusted to act fully autonomously on tasks with real-world consequences? I think the answer is still no, and will be for a long time. I wouldn't trust an LLM to order my groceries without review, let alone push code into production.
To reach anything close to a definition of AGI, LLM agents should be able to independently talk to customers, iteratively develop requirements, produce and test solutions, and push them to production once customers are happy. After that, they should be able to fix any issues that arise in production. All of this reliably, without babysitting, review, or guidance from human devs.
1. Sonnet 3.7 is a mid-level web developer at least
2. DeepResearch is about as good an analyst as an MBA from a school ranked 50+ nationally, but not lower than that: EY, not McKinsey
3. Grok 3/GPT-4.5 are good enough as $0.05/word article writers
It's not replacing the A players, but it's good enough to replace B players, and it's definitely better than C and D players.