My work has been adding more and more AI review bots. It's been like 0 for 10 for the feedback the AI has given me. Just wasting my time. I see where it's coming from, it's not utter nonsense, but it just doesn't understand the nuance or why something is logically correct.
That said, there have been some reviews where the AI flagged issues that were ignored and later became outages.
So... I don't know. Is it worth wading through 10 bad reviews if 1 good one prevents a bad bug? Maybe. I do hope the ratio gets better, though.
Yes, specifically when it comes to open-ended research or development, colocation is non-negotiable. There are greater-than-linear benefits in creativity of approach, agility in adapting to new intermediate discoveries, etc. that you get by putting a number of talented people who get along into the same space, where they form a community of practice.
Remote work and flattening communication down to what digital media (Slack, Zoom, etc) afford strangle the beneficial network effects.
I think they were talking about total time spent working rather than remote vs. in-person. I've seen more than a few studies over the years showing that going from 40 to 35 or 30 hours/wk has minimal or positive impacts on productivity. Idk if that would apply to all work environments though, and I don't recall any of the studies being about research productivity specifically.
You’re being downvoted but you’re right. The number of people who act like a webcam reproduces the in-person experience perfectly, for good and bad, is hilarious to me.
$89,000 GDP per capita vs $46,000 rather proves the point about productivity per butt. US office workers are extraordinarily productive in terms of what their work generates (thanks to numerous well understood things like the outsized US scaling abilities). Measuring beyond that is very difficult due to the variance of every business.
Weird take. Norway has about the same GDP per capita as the USA with stricter regulations than France. Ireland’s GDP per capita is higher than that of the USA, with less bureaucracy than France but more than the US. Not to mention that all of these are before adjusting for PPP. Almost as if GDP per capita is not a good measurement of productivity.
First, one should probably look at GNP (or even GNI) rather than GDP to reduce the distortionary impact of foreign direct investment, company headquarters for tax reasons, etc.
Next, one needs to distinguish between market exchange rates and PPP, as you highlight.
Lastly, these are all measures of output (per capita), while productivity is output per input, in this context output per hour worked. There the differences are less pronounced.
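To make that concrete: the GDP figures below are the ones quoted upthread, while the hours figures are purely illustrative placeholders, just to show the arithmetic.

```python
# Output per capita vs. output per hour worked.
gdp_per_capita_a = 89_000   # figure quoted upthread
gdp_per_capita_b = 46_000   # figure quoted upthread
hours_per_capita_a = 900    # hypothetical: annual hours worked / total population
hours_per_capita_b = 600    # hypothetical: fewer workers, shorter hours

print(gdp_per_capita_a / gdp_per_capita_b)          # ~1.93x gap per capita
print((gdp_per_capita_a / hours_per_capita_a)
      / (gdp_per_capita_b / hours_per_capita_b))    # ~1.29x gap per hour worked
```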
> $89,000 GDP per capita vs $46,000 rather proves the point about productivity per butt.
So if I work 24h/day on a farm in Afghanistan, I should earn more than software developers in Silicon Valley (because I'm pretty sure they sleep)? Is that how you're saying GDP works?
I think maybe we should completely switch to admitting this: every extra second you sit in the (home)office adds to productivity, it just doesn't necessarily convert into market value, which can be inflated by hype. Also, longer hours are not necessarily safe or sustainable.
We only wish that more time != more productivity because it would be inconvenient in multiple ways if it did. We imagine a multiplier in there to balance the equation, some factor that can completely negate production, using mere anecdotal experience as proof.
Maybe that's not scientific, maybe time spent very closely matches productivity, and maybe both production and productivity need external, artificial regulation.
> Every extra second you sit in the (home)office adds to productivity
I'm not sure I believe that. I think at some point the additional hours worked will decrease the output per unit of time, and eventually you'll reach a peak after which every extra hour worked leads to an overall productivity loss.
It's also something that I think is extremely hard to measure consistently, especially for your typical office worker.
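Just to make the shape of that claim concrete, a toy model (the constants are made up; this only shows that a peak can exist, it doesn't measure anything):

```python
# Toy model: each successive hour is a bit less productive than the last,
# so eventually an extra hour costs more (mistakes, rework) than it adds.
def total_output(hours, base=1.0, decay=0.015):
    return sum(base - decay * h for h in range(hours))

for h in (30, 40, 50, 60, 70, 80):
    print(h, round(total_output(h), 1))
# Total output rises, peaks around 67 hours with these made-up constants,
# then falls as the marginal hour goes negative. Where (or whether) that
# happens in real work is exactly the hard-to-measure part.
```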
Music players, including car radios and portable CD and MiniDisc players, did that around 25 years ago. It's sort-of a standard UI pattern for variable-length text in a fixed-size display.
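A toy version of that pattern, for anyone curious (width, timing, and the sample title are arbitrary):

```python
# Scroll a too-long title through a fixed-width "display" by sliding a
# window over the text and wrapping around, like those players did.
import time

def marquee(text, width=16, delay=0.3, gap="   "):
    if len(text) <= width:
        print(text)
        return
    looped = text + gap + text          # wrap-around source to slice from
    for i in range(len(text) + len(gap)):
        print("\r" + looped[i:i + width], end="", flush=True)
        time.sleep(delay)
    print()

marquee("Some Very Long Track Title - Some Artist")
```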
Yep. 17 year old me working alongside a 70 year old dude working the same job as me... I knew that's not what I wanted for my life.
That said, I think I've still wafted through life on tracks. I just concluded that FAANG was the next track after uni so I made it happen. Not sure I'm happy any more though. Maybe I need to reinvent myself.
That's not been my experience so far. LLMs are good at mimicking existing code; they don't usually bring in new things when not asked. Sometimes I have to go out of my way to point to other bits of code in the project to copy from because the model hasn't ingested enough of the codebase.
That said, a negative prompt like we have in Stable Diffusion would still be very cool.
I'm in the camp of 'no good for existing code'. I try to get ~1000-line files refactored to use different libraries, design paradigms, etc. and it usually outputs garbage - pulling db logic into the UI, grabbing unrelated api/function calls, or entirely just corrupting the output.
I'm sure there is a way to correctly use this tool, so I'm feeling like I'm "just holding it wrong".
Which LLM are you using? What LLM tool are you using? What's your tech stack that you're generating code for? Without sharing anything you can't, what prompts are you using?
I would suggest using an agentic system like Cline, so that the LLM can wander through the codebase by itself, do research, build a "mental model", and then set up an implementation plan. Then you iterate on that and hand it off for implementation. This flow works significantly better than what you're describing.
It doesn't need the entire codebase, it just needs the call map, the function signatures, etc. It doesn't have to include everything in a call - but having access to all of it means it can pick what seems relevant.
Yes, that's exactly right. The LLM gets a rough overview of the project (as you said, including function signatures and such) and will then decide what to open and use to complete/implement the objective.
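Something like this is all the map needs to be, e.g. for Python (a rough sketch using the ast module; the model gets this first, then asks for the full files it actually needs):

```python
# Build a compact "repo map": file paths plus class/function signatures only.
import ast
from pathlib import Path

def repo_map(root="."):
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        lines.append(str(path))
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                lines.append(f"    class {node.name}")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"    def {node.name}({args})")
    return "\n".join(lines)

# Paste repo_map() output into the first prompt instead of the whole codebase.
print(repo_map())
```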
If your repo map fits into 1000 tokens then your repo is small enough that you can just concatenate all the files together and feed the result as one prompt to the LLM.
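i.e. something like this (rough sketch; the ~4 chars/token figure is just a rule of thumb, not an exact count):

```python
# Dump every source file into one prompt, with headers, and eyeball the size.
from pathlib import Path

def repo_as_prompt(root=".", exts=(".py",)):
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in exts:
            parts.append(f"### {path}\n{path.read_text()}")
    prompt = "\n\n".join(parts)
    print(f"~{len(prompt) // 4} tokens")   # crude: ~4 characters per token
    return prompt
```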
No, current LLM technology does not allow processing actual (i.e. large) repos.
I've refactored some files over 6000 LOC. It was necessary to do it iteratively with smaller patches: "Do not attempt to modify more than one function per iteration." Otherwise it would just gloss over stuff. I would tell it repeatedly: "I noticed you missed something, can you find it?" I kept doing that until it couldn't find anything. Then I had to manually review and ask for more edits. Also lots of style guidelines and scope-limit instructions. In the end it worked fine and saved me hours of really boring work.
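The loop was roughly this shape (a hypothetical sketch: call_llm() stands in for whatever chat API or tool you use, and the DONE sentinel is something the instructions have to ask for; neither is a real API):

```python
# Iterative refactor: small patches, one function per round, nudge until done.
RULES = (
    "Refactor toward the stated goal. Do not attempt to modify more than one "
    "function per iteration. Follow the style guidelines. Stay within scope. "
    "Reply with only the word DONE when nothing is left to change."
)

def call_llm(messages):
    raise NotImplementedError  # plug in your provider/tool here (hypothetical)

def iterative_refactor(source, goal, max_rounds=30):
    messages = [{"role": "system", "content": RULES},
                {"role": "user", "content": f"Goal: {goal}\n\n{source}"}]
    for _ in range(max_rounds):
        reply = call_llm(messages)
        if reply.strip() == "DONE":
            break                      # still needs a manual review pass after this
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",
                         "content": "I noticed you missed something, can you find it?"})
    return messages
```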
I'll back this up. I feel constantly gaslit by people who claim they get good output.
I was hacking on a new project and wanted to see if LLMs could write some of it. So I picked an LLM friendly language (python). I picked an LLM friendly DB setup (sqlalchemy and postgres). I used typing everywhere. I pre-made the DB tables and pydantic schema. I used an LLM-friendly framework (fastapi). I wrote a few example repositories and routes.
I then told it to implement a really simple repository and routes (users stuff) from a design doc that gave strict requirements. I got back a steaming pile of shit. It was utterly broken. It ignored my requirements. It fucked with my DB tables. It fucked with (and broke) my pydantic. It mixed db access into routes which is against the repository pattern. Etc.
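For reference, the shape I was after, roughly (a minimal sketch, not my actual code; the model and session names are made up, and it assumes Pydantic v2 plus SQLAlchemy 2.0-style queries):

```python
# Repository pattern: the route orchestrates, the repository owns all DB access.
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel, ConfigDict
from sqlalchemy import select
from sqlalchemy.orm import Session

from myapp.db import get_session   # hypothetical session dependency
from myapp.models import User      # hypothetical SQLAlchemy model

class UserOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    email: str

class UserRepository:
    """Only this class touches the session; routes never build queries."""
    def __init__(self, session: Session):
        self.session = session

    def get(self, user_id: int):
        return self.session.scalar(select(User).where(User.id == user_id))

router = APIRouter()

@router.get("/users/{user_id}", response_model=UserOut)
def read_user(user_id: int, session: Session = Depends(get_session)):
    user = UserRepository(session).get(user_id)
    if user is None:
        raise HTTPException(status_code=404, detail="User not found")
    return user
```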
I tried several of the best models from Claude, OpenAI, xAI, and Google. I tried giving them different prompts. I tried pruning unnecessary context. I tried their web interfaces and I tried Cursor and Windsurf and Cline and Aider. This was a pretty basic task I expect an intern could handle. They couldn't.
Every LLM enthusiast I've since talked to just gives me the run-around on tooling and prompting and whatever. "Well maybe if you used this eighteenth IDE/extension." "Well maybe if you used this other prompt hack." "Well maybe if you'd used a different design pattern."
The fuck?? Can vendors not produce a coherent set of usage guidelines? If this works so well, why isn't there a set of known best practices? Why can't I ever replicate this? Why don't people publish public logs of their interactions to prove it can do this beyond a "make a bouncing ball web game" or basic to-do list app?
Prisma and Drizzle... those gave me a bit too much heck. Kysely is close enough to SQL while offering some benefits, typings being one of them, but also query builders are often helpful when I need to run subtle variations of the same query, e.g. depending on the user's permissions or to add search filters.
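The "subtle variations" bit is basically conditional composition; here's the same idea sketched with SQLAlchemy, since Kysely is TypeScript and the other code in this thread is Python (the model and fields are made up):

```python
# Build one base query and bolt on filters depending on permissions/search.
from sqlalchemy import select
from myapp.models import Document   # hypothetical model

def documents_query(user, search=None):
    query = select(Document)
    if not user.is_admin:                 # permission-dependent variation
        query = query.where(Document.owner_id == user.id)
    if search:                            # optional search filter
        query = query.where(Document.title.ilike(f"%{search}%"))
    return query
```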