Show HN: gpt-engineer – platform for devs to tinker with AI programming tools
178 points by antonoo on June 21, 2023 | 53 comments
Hello Hacker News community,

Wanted to share a project I started working on in my spare time, which was then discovered by many in the open source community last week.

GPT Engineer’s mission: Be the open platform for devs to tinker with and build their personal code-generation toolbox.

I believe it's key for us devs to engage in how building software can and will change.

You can find more info on GitHub about the flexible technical "philosophy" that makes it work well, and the community we want it to become: https://github.com/AntonOsika/gpt-engineer

The project is still in its early stages. It's clear that there is a lot of room for improvement, as the space of tricks that can be combined to guide LLMs is large.

Appreciate any suggestions, experiences, or ideas on the project from you all!




Have you done much work on using GPT to *edit* code in an existing codebase? That's been my focus lately, working on my open source GPT coding tool [0].

Generating new code from whole cloth seems like an easier task for GPT. My tool can certainly do that, as can smol-developer, etc. But you really only do that "once" per project.

Can folks use gpt-engineer to modify and extend the code it has already created, as the user comes up with new features, etc? Can it be used to work on a pre-existing codebase?

[0] https://github.com/paul-gauthier/aider


Cool project! I'm working in the same space (a ChatGPT plugin that can edit files within a shared VS Code workspace) and have built something similar to your "repo map" concept, except slightly lower-level: what you might call a "file map" generated by selectively collapsing AST nodes to fit within the available token budget. If ctags isn't cutting it for you, have a look at tree-sitter [1]. It can generate ASTs for most languages and has a nice API.

[1] https://tree-sitter.github.io/tree-sitter/
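The "collapse AST nodes to fit a token budget" idea can be sketched with Python's stdlib `ast` module as a stand-in for tree-sitter (which needs a compiled grammar per language). The ~4 characters/token estimate and the formatting are my own assumptions, purely illustrative:

```python
import ast

def file_map(source: str, token_budget: int = 200) -> str:
    """Collapse function/method bodies down to signatures.

    A rough sketch: a real tool would use tree-sitter and a proper
    tokenizer; here tokens are approximated as len(text) // 4.
    """
    entries = []

    def walk(node, depth=0):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                entries.append("    " * depth + f"def {child.name}({args}): ...")
            elif isinstance(child, ast.ClassDef):
                entries.append("    " * depth + f"class {child.name}:")
                walk(child, depth + 1)

    walk(ast.parse(source))
    text = "\n".join(entries)
    # Drop trailing entries until the map fits the budget.
    while entries and len(text) // 4 > token_budget:
        entries.pop()
        text = "\n".join(entries)
    return text

src = '''
class Coder:
    def create(self, main_model, io):
        return 42
    def run(self):
        pass
'''
print(file_map(src))
# class Coder:
#     def create(self, main_model, io): ...
#     def run(self): ...
```

The same shape works for any language tree-sitter supports; only the node types and field names change.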


Glad to hear there are others working on similar things. I've been wishing there was a good forum for like-minded folks to share ideas about AI coding, beyond the random drive-by commenting that happens here on HN.

I have been looking at tree-sitter quite a bit actually. I love that it has broad language support, which is a key design goal for my tool.

My only hesitation is that it doesn't appear to correctly identify multi-line function signatures and calls. If you look below at create, io.tool_error and __init__, you can see that the (row,col)-(row,col) indices only reference the first line.

GPT would really benefit from seeing the entire function signature and call sites.

  $ tree-sitter tags aider/coders/base_coder.py
  ...
  create      | function def (39, 8) - (39, 14) `def create(`
  check_model_availability  | call     ref (54, 19) - (54, 43) `if not check_model_availability(main_model):`
  tool_error  | call     ref (56, 23) - (56, 33) `io.tool_error(`
  EditBlockCoder  | call     ref (66, 19) - (66, 33) `return EditBlockCoder(main_model, io, **kwargs)`
  ...
  __init__    | function def (74, 8) - (74, 16) `def __init__(`
  set         | call     ref (89, 26) - (89, 29) `self.abs_fnames = set()`
  ...
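One hypothetical post-processing workaround (not part of tree-sitter itself): starting from the row a tag reports, keep consuming lines until the parentheses balance. This is naive — it ignores parens inside strings and comments — but recovers the multi-line signatures shown above:

```python
def full_signature(lines: list[str], start_row: int) -> str:
    """Extend a tag's first line until its parentheses balance.

    `start_row` is the 0-based row reported for the tag. Naive
    sketch: parens inside strings/comments would confuse it.
    """
    depth = 0
    collected = []
    for line in lines[start_row:]:
        collected.append(line.strip())
        depth += line.count("(") - line.count(")")
        if depth <= 0:
            break
    return " ".join(collected)

src = [
    "def create(",
    "    self, main_model, io,",
    "    **kwargs,",
    "):",
]
print(full_signature(src, 0))
# def create( self, main_model, io, **kwargs, ):
```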


Hi, I like discussing these topics too. I just created this for this purpose if you want to join me here https://discord.gg/r3vK4xY4

There's also an existing larger "gpt hackers" Discord you might be interested in checking out that I'm not involved with: https://discord.gg/pMjHMvkK


I'd also like to join. Care to open up the invitation link again?


I too would be interested in a good forum for like minded folks to share ideas about AI coding!


Hi, I like discussing these topics too. I just created this for this purpose if you want to join me here https://discord.gg/r3vK4xY4

There's also an existing larger "gpt hackers" Discord you might be interested in checking out that I'm not involved with: https://discord.gg/pMjHMvkK


Thank you


Same. I don't know if it'd be Lemmy or something else hosted somewhere, and there might only be five of us, but if people are interested I can make something today. Any strong preferences on platforms? If not, I'll just pick one.


I set up https://aicoding.club/, a Lemmy instance for this.


It is not loading for me :(


Aha, that would be because I posted the wrong URL for the instance I just set up. It's https://aicoding.club/ (fixed in the message above as well).

If you get a Fasthosts landing page, it's a DNS propagation issue. It should be hooked up to 68.183.253.254 (correct on Google and Cloudflare DNS).


Thank you


Likewise interested and exploring the same area.


I’d be very interested in joining this group


Add me too please. Good initiative.


I'm also interested


Count me in as well.


And me!


Interested. I currently use GPT to write the majority of my commit messages and catch bugs at commit time.
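For the curious, a minimal way to wire that up is to build the prompt from the staged diff and send it to any chat-completion API. The prompt wording and truncation limit here are my own guesses, not any particular tool:

```python
import subprocess

def commit_message_prompt(diff: str, max_diff_chars: int = 8000) -> str:
    """Build a chat-model prompt asking for a commit message plus a
    quick bug review of the staged changes. The diff is truncated so
    it fits a typical context window (rough assumption)."""
    return (
        "Write a one-line conventional commit message for this diff, "
        "then list any likely bugs you spot:\n\n" + diff[:max_diff_chars]
    )

def staged_diff() -> str:
    """Read the staged diff from git (run this inside a repo)."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True
    ).stdout
```

This could be dropped into a `prepare-commit-msg` git hook, with the model's reply written into the commit message file.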


I don't want to spam everyone so will just reply here in the chain - I setup https://aicoding.club/, a lemmy instance for this. Emails aren't working just now as I've only just setup the domain and need to sort spf/etc, but I'll keep an eye on things to approve people.


Is there a tool you use for that?


Looks like you know more about it than me, but it seems to me that the main challenge is being able to include the appropriate context, and train it to output a diff which you can then apply to the codebase.
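One trick for the "apply the output" step is to have the model emit literal original/updated search-and-replace blocks rather than strict unified diffs, which models often get slightly wrong. A minimal, hypothetical applier (not any specific tool's implementation):

```python
def apply_edit(source: str, original: str, updated: str) -> str:
    """Apply one model-proposed edit: replace the first exact match of
    `original` with `updated`. Raises if the model hallucinated code
    that is not actually in the file."""
    if original not in source:
        raise ValueError("edit does not match the current file contents")
    return source.replace(original, updated, 1)

code = "def add(a, b):\n    return a - b\n"
fixed = apply_edit(code, "return a - b", "return a + b")
print(fixed)
# def add(a, b):
#     return a + b
```

The exact-match requirement doubles as a cheap hallucination check: if the model invented context, the edit simply fails instead of corrupting the file.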

Definitely looking forward to the day I review my first AI-generated PR (beyond dependency updates of course).


Yup, those seem to be the key challenges. I've been making good progress on them, but there's plenty more work to do!

On the topic of "AI-generated PRs", I used my tool to file a PR to the `glow` CLI tool. I don't know the Go language, so I had my tool `aider` add the feature I needed. I mostly use glow to preview README.md files for GitHub, so I wanted it to render line breaks the way GitHub does.

https://github.com/charmbracelet/glow/pull/502

I've also been able to solve a couple of GitHub issues that were filed by users just by pasting the issue into my tool... aider fixed itself. Links below:

https://github.com/paul-gauthier/aider/issues/13#issuecommen...

https://github.com/paul-gauthier/aider/issues/5#issuecomment...


Love this!

I've been iterating on a similar project, Talk-to-Repo.

It uses retrieval-augmented-generation to access the relevant parts of the code, and lets you chat and collect which code pieces you want.

Got stuck at generating good diffs, I'll be sure to look at how you've done it!
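The retrieval step in that kind of retrieval-augmented-generation setup can be sketched without an embedding service by using word overlap as a crude stand-in for vector similarity. This is a toy illustration of the pattern, not necessarily how Talk-to-Repo does it:

```python
import re

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank code chunks by word overlap with the query (a crude
    stand-in for embedding similarity) and return the top k."""
    def words(text: str) -> set[str]:
        # [a-zA-Z]+ splits snake_case identifiers into their parts.
        return set(re.findall(r"[a-zA-Z]+", text.lower()))

    q = words(query)
    scored = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return scored[:k]

chunks = [
    "def render_markdown(text): ...",
    "def parse_config(path): ...",
    "def render_html(doc): ...",
]
print(retrieve("how is markdown rendered?", chunks, k=1))
# ['def render_markdown(text): ...']
```

A real system would swap `words`/overlap for embeddings and cosine similarity, but the chunk-score-select loop stays the same.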

Btw, I started my project by turning another project, "Twitter explainer", on itself. It loaded its own code, I asked it to add new features, and I copy-pasted the results (with some tweaking and occasional trips to phind.com)... :)

https://github.com/Arjeo-Inc/talk-to-repo original project by Mark Tennenholtz: https://twitter.com/marktenenholtz/status/165156810719298355...


Does anyone know if models like WizardCoder are trained on finished code only, or whether they have also been trained on ticket/PR/commit messages and the corresponding diff (with the interface of related code provided as context)?


It's interesting to me that you say

> Generating new code from whole-cloth seems like an easier task for GPT.

as I've been working on a similar tool of my own and I've found the opposite. The initial task I've been trying to have it complete is to build something like https://craigmbooth.com/projects/killer-sudoku-calculator/ but with my bespoke requirements. If I try to specify it up front and have it generate the whole thing it almost always fails in a number of ways at once, despite the requirements being relatively straightforward and clear.

However I have had success by walking it through the process step by step along the lines of "please create an index.html that references react from a cdn. It should draw a blank sudoku board" -> "Please add a button that says 'add a cage'. When the button is clicked a box labeled "Cage 1", "Cage 2", etc should be added to a list to the right of the board" -> "Please track the currently selected cage and set the background of the corresponding cage box to green", etc.

Likewise, I added the initial set of functions it could access ("Create a File", "Update a File", "Remove a File") manually, but then I had it add additional commands ("Add a directory", "Remove a directory", "Copy a file", etc) and it was able to do it correctly on the first try each time because the pattern already existed.
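That pattern of registering file-operation commands the model can call maps naturally onto a name-to-function dispatch table. The command names below mirror the ones in the comment; the wiring itself is a hypothetical sketch:

```python
import os
import shutil
from pathlib import Path

# Dispatch table: command name -> handler. The model picks a command
# name and arguments; we look the handler up and invoke it.
COMMANDS = {
    "create_file": lambda path, content="": Path(path).write_text(content),
    "update_file": lambda path, content: Path(path).write_text(content),
    "remove_file": os.remove,
    "add_directory": os.makedirs,
    "remove_directory": os.rmdir,
    "copy_file": shutil.copyfile,
}

def run_command(name: str, *args, **kwargs):
    """Execute one model-requested command, rejecting unknown names."""
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    return COMMANDS[name](*args, **kwargs)
```

Because each new command is just another table entry, it makes sense that the model could add "Copy a file" correctly on the first try once the pattern existed.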


I think we agree, but maybe I wasn't writing clearly.

You're describing building a green field app starting from nothing, step by step. GPT shines at things like this, because they are by definition small code bases. You can probably fit the whole codebase into the context window.

Also, your approach of walking it through step-by-step is perfect, since you get to guide it to build a wise code architecture as you go.

It's hard to naively point GPT at a big, existing repo and try and do non-trivial changes to that codebase. Without a bunch of tooling to help it understand the overall codebase, it won't understand or respect the existing modules, abstractions, etc. It will just start trying to write code in a vacuum, which probably isn't the right thing to do when modifying an existing codebase.


This is something that I've also been wanting to play with. The token limit is too small for my codebase so I haven't really bothered, but it would be nice to tell it all my code and then say "refactor everything of the form `err = foo` into `if err := foo; ...`". If AI is going to take my job, it is going to have to learn to do maintenance!
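That particular refactor is mechanical enough that it doesn't strictly need an LLM. A naive regex pass over the Go source (purely illustrative; it ignores multi-line calls, differing indentation, and shadowing, where a real tool would use go/ast) looks like:

```python
import re

# Rewrite `err = foo(...)` immediately followed by `if err != nil {`
# into the scoped form `if err := foo(...); err != nil {`.
PATTERN = re.compile(r"err = (?P<call>.+)\n\s*if err != nil \{")

def scope_errs(go_source: str) -> str:
    """Naive sketch of the `err = foo` -> `if err := foo; ...` refactor."""
    return PATTERN.sub(r"if err := \g<call>; err != nil {", go_source)

before = "err = foo()\nif err != nil {\n\treturn err\n}\n"
print(scope_errs(before))
# if err := foo(); err != nil {
# 	return err
# }
```

The interesting LLM version of this task is the fuzzier cases (multi-line calls, renamed error variables) where a regex gives up.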


Give aider a try. As long as each file fits in the context window, it should work. With gpt-3.5-turbo-16k or gpt-4 you can edit files up to around 30 kbytes in size. If you have access to gpt-4-32k, you could edit files up to about 120 kbytes. See the notes here for more info:

https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35
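The kbyte figures follow from rough token arithmetic. Assuming ~4 characters per token and that roughly half the window must be reserved for the model's rewritten output (both rough rules of thumb, not exact):

```python
def max_file_kbytes(context_tokens: int, chars_per_token: int = 4,
                    reply_fraction: float = 0.5) -> float:
    """Rough ceiling on editable file size for a given context window.

    Assumes ~4 chars/token and that about half the window is needed
    for the model's reply; both numbers are rough guesses.
    """
    usable_tokens = context_tokens * (1 - reply_fraction)
    return usable_tokens * chars_per_token / 1024

print(round(max_file_kbytes(16_000)))  # gpt-3.5-turbo-16k
# 31
```

That lands near the ~30 kbyte figure for a 16k window; larger windows scale the ceiling proportionally (less if more of the reply budget is needed).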

I use GPT for all kinds of busy work, and code quality type work. Adding test cases or quality of life features. Things that I might not have the energy to do myself. GPT can often accomplish these tasks off a 1-2 sentence request. Or if not, it will do all the boilerplate and get you 80% of the way there and it's easy to polish up the final 20%.


Perhaps this helps:

Don't copy-paste your entire code base.

E.g. I adjusted the prompt to the logical separation in my project (e.g. the module and the task of a class can be derived from its namespace) and then indexed the codebase through the namespace + classes + method names with their parameters.

That makes it possible to fit more of the codebase into the prompt.
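A sketch of that kind of index, using Python's `ast` module to pull out qualified class and method names with their parameters (the `namespace.Class.method(params)` formatting is my own invention):

```python
import ast

def index_module(namespace: str, source: str) -> list[str]:
    """List `namespace.Class.method(params)` entries for one module.

    Gives the model a cheap map of the codebase without pasting
    method bodies into the prompt.
    """
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    params = ", ".join(a.arg for a in item.args.args)
                    entries.append(
                        f"{namespace}.{node.name}.{item.name}({params})"
                    )
    return entries

src = "class Coder:\n    def create(self, io): pass\n"
print(index_module("aider.coders", src))
# ['aider.coders.Coder.create(self, io)']
```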


Not yet. I've considered adding it soon; the only reason I decided against it, for now, is that human edits make automatic evaluation very difficult.

Fully automatic generation can be evaluated fully automatically.

Good input.


Hmm, if generating new code is an easy task for GPT, why not ask it to create a new project from scratch every time a user comes up with a new feature?

Who needs to maintain an old codebase if you can rewrite it adding new features at whim?


> Can it be used to work on a pre-existing codebase?

For that, I posit the system would need to understand the existing code base. Not just what the code does, but the intent and the why. I'll leave it up to the reader to decide whether they believe LLMs understand anything. I know where I stand.


I'm with you there: LLMs don't understand in the same sense we do. But they can transform a well-written, compressed spec into code. The spec becomes the true source; it can be tweaked, and ChatGPT regenerates the whole codebase from it. It's nondeterministic, so every rendition will be slightly different, but across multiple renditions its average should look like the traveling-salesman graph of the problem itself.


I've played around with aider trying to run tests and fix the code, but it just crashes after exceeding the context window. I am now trying to repurpose the AutoGPT example in langchain.


Sounds like maybe your source files are bigger than the context window? Try including fewer files in the chat. See some notes on GPT models and file sizes here:

https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35

I also have some improvements coming so the tool is more graceful and helpful if you hit the context window limit. In general, most of my effort is focused on making it possible to work with larger and larger codebases in spite of the context window limitations.

Please do file an issue with more details on your problems. I can try and help and give you updates if I make improvements to the tool which could solve your use case.


How have you created the svg screencast in the README?


Looking at the source, it appears to have been generated by https://github.com/nbedos/termtosvg


Yup, that's what I used. It produces really nice, crisp SVG screencasts.


Hey Anton, congratulations, I love the project. The results are amazing even though I still have access to gpt-3.5 only; I can't even imagine the results with gpt-4.

I'd love to see some improvements on the clarifications/questions part, but overall it's a great project with so much potential. Did you consider including some sort of code self-repair step?

Btw I posted a video [0] about gpt-engineer and my audience is also very impressed.

[0] https://www.youtube.com/watch?v=4ehvtuv3ZuQ


I haven't tried this, but how do you get around the fact that it hallucinates functions and variables when going file by file? I haven't managed to make an app without errors going this route. The only time it produces something working is when I keep it to a single file (e.g. HTML, JS and CSS together) or a single function.


I'd like to see where this goes. So much potential.


Interesting project, great work on getting it done.

One thing I noticed is that the video in the readme doesn’t actually show the generated code running. It would be much more convincing if it did!


Cool project. I tried to build a React Todo app with TDD, but it just put comments in the test file instead of actual tests. A self-heal loop would be quite useful.


Wow, this is a big improvement on other 'gpt engineering' projects I've seen out there. What are the main things you think you can improve on it from here?


Just a quick thing I loved about your gif at the end: the font and theme of your Vim setup! Really nice. Do you mind sharing your .vimrc?



This is a really cool project. I'm going to play around with it. I love the fact that it's Open-Source!


Thanks!


Can it "scan" a local codebase, understand all the syntaxes used and their differences, and be asked to write code for a particular part of the project?


Nice to finally see you submit it here Anton :)





