Show HN: gpt-engineer – platform for devs to tinker with AI programming tools
178 points by antonoo on June 21, 2023 | 53 comments
Hello Hacker News community,

Wanted to share a project I started working on in my spare time, which was then discovered by many in the open source community last week.

GPT Engineer’s mission: Be the open platform for devs to tinker with and build their personal code-generation toolbox.

I believe it's key for us devs to engage in how building software can and will change.

You can find more info on GitHub about the flexible technical "philosophy" that makes it work well, and the community we want it to become: https://github.com/AntonOsika/gpt-engineer

The project is still in its early stages. It's clear that there is a lot of room for improvement, as the space of tricks that can be combined to guide LLMs is large.

Appreciate any suggestions, experiences, or ideas on the project from you all!




Have you done much work on using GPT to *edit* code in an existing codebase? That's been my focus lately, working on my open source GPT coding tool [0].

Generating new code from whole cloth seems like an easier task for GPT. My tool can certainly do that, as can smol-developer, etc. But you really only do that "once" per project.

Can folks use gpt-engineer to modify and extend the code it has already created, as the user comes up with new features, etc? Can it be used to work on a pre-existing codebase?

[0] https://github.com/paul-gauthier/aider


Cool project! I'm working in the same space (a ChatGPT plugin that can edit files within a shared VS Code workspace) and have built something similar to your "repo map" concept, except slightly lower-level: what you might call a "file map" generated by selectively collapsing AST nodes to fit within the available token budget. If ctags isn't cutting it for you, have a look at tree-sitter [1]. It can generate ASTs for most languages and has a nice API.

[1] https://tree-sitter.github.io/tree-sitter/
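The "collapse AST nodes to fit a token budget" idea can be sketched with Python's stdlib `ast` module as a stand-in for tree-sitter (which needs a compiled grammar per language). The ~4 characters/token estimate and the formatting are my own assumptions, purely illustrative:

```python
import ast

def file_map(source: str, token_budget: int = 200) -> str:
    """Collapse function/method bodies down to signatures.

    A rough sketch: a real tool would use tree-sitter and a proper
    tokenizer; here tokens are approximated as len(text) // 4.
    """
    entries = []

    def walk(node, depth=0):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in child.args.args)
                entries.append("    " * depth + f"def {child.name}({args}): ...")
            elif isinstance(child, ast.ClassDef):
                entries.append("    " * depth + f"class {child.name}:")
                walk(child, depth + 1)

    walk(ast.parse(source))
    text = "\n".join(entries)
    # Drop trailing entries until the map fits the budget.
    while entries and len(text) // 4 > token_budget:
        entries.pop()
        text = "\n".join(entries)
    return text

src = '''
class Coder:
    def create(self, main_model, io):
        return 42
    def run(self):
        pass
'''
print(file_map(src))
# class Coder:
#     def create(self, main_model, io): ...
#     def run(self): ...
```

The same shape works for any language tree-sitter supports; only the node types and field names change.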


Glad to hear there are others working on similar things. I've been wishing there was a good forum for like-minded folks to share ideas about AI coding, beyond the random drive-by commenting that happens here on HN.

I have been looking at tree-sitter quite a bit actually. I love that it has broad language support, which is a key design goal for my tool.

My only hesitation is that it doesn't appear to correctly identify multi-line function signatures and calls. If you look below at create, io.tool_error and __init__, you can see that the (row,col)-(row,col) indices only reference the first line.

GPT would really benefit from seeing the entire function signature and call sites.

  $ tree-sitter tags aider/coders/base_coder.py
  ...
  create      | function def (39, 8) - (39, 14) `def create(`
  check_model_availability  | call     ref (54, 19) - (54, 43) `if not check_model_availability(main_model):`
  tool_error  | call     ref (56, 23) - (56, 33) `io.tool_error(`
  EditBlockCoder  | call     ref (66, 19) - (66, 33) `return EditBlockCoder(main_model, io, **kwargs)`
  ...
  __init__    | function def (74, 8) - (74, 16) `def __init__(`
  set         | call     ref (89, 26) - (89, 29) `self.abs_fnames = set()`
  ...
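One hypothetical post-processing workaround (not part of tree-sitter itself): starting from the row a tag reports, keep consuming lines until the parentheses balance. This is naive — it ignores parens inside strings and comments — but recovers the multi-line signatures shown above:

```python
def full_signature(lines: list[str], start_row: int) -> str:
    """Extend a tag's first line until its parentheses balance.

    `start_row` is the 0-based row reported for the tag. Naive
    sketch: parens inside strings/comments would confuse it.
    """
    depth = 0
    collected = []
    for line in lines[start_row:]:
        collected.append(line.strip())
        depth += line.count("(") - line.count(")")
        if depth <= 0:
            break
    return " ".join(collected)

src = [
    "def create(",
    "    self, main_model, io,",
    "    **kwargs,",
    "):",
]
print(full_signature(src, 0))
# def create( self, main_model, io, **kwargs, ):
```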


Hi, I like discussing these topics too. I just created this for this purpose if you want to join me here https://discord.gg/r3vK4xY4

There's also an existing larger "gpt hackers" Discord you might be interested in checking out that I'm not involved with: https://discord.gg/pMjHMvkK


I'd also like to join. Care to open up the invitation link again?


I too would be interested in a good forum for like minded folks to share ideas about AI coding!


Hi, I like discussing these topics too. I just created this for this purpose if you want to join me here https://discord.gg/r3vK4xY4

There's also an existing larger "gpt hackers" Discord you might be interested in checking out that I'm not involved with: https://discord.gg/pMjHMvkK


Thank you


Same. I don't know if it'd be Lemmy or something else hosted somewhere, and there might only be five of us, but if people are interested I can make something today. Any strong preferences on platforms? If not, I'll just pick one.


I set up https://aicoding.club/, a Lemmy instance for this.


It is not loading for me :(


Aha, that would be because I posted the wrong URL for the instance I just set up. It's https://aicoding.club/ (fixed in the message above as well).

If you get a Fasthosts landing page, it's a DNS propagation issue. It should be hooked up to 68.183.253.254 (correct on Google and Cloudflare DNS).


Thank you


Likewise interested and exploring the same area.


I’d be very interested in joining this group


Add me too please. Good initiative.


I'm also interested


Count me in as well.


And me!


Interested. I currently use GPT to write the majority of my commit messages and catch bugs at commit time.
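For the curious, a minimal way to wire that up is to build the prompt from the staged diff and send it to any chat-completion API. The prompt wording and truncation limit here are my own guesses, not any particular tool:

```python
import subprocess

def commit_message_prompt(diff: str, max_diff_chars: int = 8000) -> str:
    """Build a chat-model prompt asking for a commit message plus a
    quick bug review of the staged changes. The diff is truncated so
    it fits a typical context window (rough assumption)."""
    return (
        "Write a one-line conventional commit message for this diff, "
        "then list any likely bugs you spot:\n\n" + diff[:max_diff_chars]
    )

def staged_diff() -> str:
    """Read the staged diff from git (run this inside a repo)."""
    return subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True
    ).stdout
```

This could be dropped into a `prepare-commit-msg` git hook, with the model's reply written into the commit message file.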


I don't want to spam everyone so will just reply here in the chain - I setup https://aicoding.club/, a lemmy instance for this. Emails aren't working just now as I've only just setup the domain and need to sort spf/etc, but I'll keep an eye on things to approve people.


Is there a tool you use for that?


Looks like you know more about it than me, but it seems to me that the main challenge is being able to include the appropriate context, and train it to output a diff which you can then apply to the codebase.
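One trick for the "apply the output" step is to have the model emit literal original/updated search-and-replace blocks rather than strict unified diffs, which models often get slightly wrong. A minimal, hypothetical applier (not any specific tool's implementation):

```python
def apply_edit(source: str, original: str, updated: str) -> str:
    """Apply one model-proposed edit: replace the first exact match of
    `original` with `updated`. Raises if the model hallucinated code
    that is not actually in the file."""
    if original not in source:
        raise ValueError("edit does not match the current file contents")
    return source.replace(original, updated, 1)

code = "def add(a, b):\n    return a - b\n"
fixed = apply_edit(code, "return a - b", "return a + b")
print(fixed)
# def add(a, b):
#     return a + b
```

The exact-match requirement doubles as a cheap hallucination check: if the model invented context, the edit simply fails instead of corrupting the file.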

Definitely looking forward to the day I review my first AI-generated PR (beyond dependency updates of course).


Yup, those seem to be the key challenges. I've been making good progress on them, but there's plenty more work to do!

On the topic of "AI-generated PRs", I used my tool to file a PR to the `glow` CLI tool. I don't know the Go language, so I had my tool `aider` add the feature I needed. I mostly use glow to preview README.md files for GitHub, so I wanted it to render line breaks the way GitHub does.

https://github.com/charmbracelet/glow/pull/502

I've also been able to solve a couple of GitHub issues that were filed by users just by pasting the issue into my tool... aider fixed itself. Links below:

https://github.com/paul-gauthier/aider/issues/13#issuecommen...

https://github.com/paul-gauthier/aider/issues/5#issuecomment...


Love this!

I've been iterating on a similar project, Talk-to-Repo.

It uses retrieval-augmented-generation to access the relevant parts of the code, and lets you chat and collect which code pieces you want.

Got stuck at generating good diffs, I'll be sure to look at how you've done it!
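The retrieval step in that kind of retrieval-augmented-generation setup can be sketched without an embedding service by using word overlap as a crude stand-in for vector similarity. This is a toy illustration of the pattern, not necessarily how Talk-to-Repo does it:

```python
import re

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank code chunks by word overlap with the query (a crude
    stand-in for embedding similarity) and return the top k."""
    def words(text: str) -> set[str]:
        # [a-zA-Z]+ splits snake_case identifiers into their parts.
        return set(re.findall(r"[a-zA-Z]+", text.lower()))

    q = words(query)
    scored = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return scored[:k]

chunks = [
    "def render_markdown(text): ...",
    "def parse_config(path): ...",
    "def render_html(doc): ...",
]
print(retrieve("how is markdown rendered?", chunks, k=1))
# ['def render_markdown(text): ...']
```

A real system would swap `words`/overlap for embeddings and cosine similarity, but the chunk-score-select loop stays the same.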

Btw, I started my project by turning another project, "Twitter explainer", on itself. It loaded its own code, I asked it to add new features, and I copy-pasted the results (with some tweaking and occasional trips to phind.com)... :)

https://github.com/Arjeo-Inc/talk-to-repo original project by Mark Tennenholtz: https://twitter.com/marktenenholtz/status/165156810719298355...


Does anyone know if models like WizardCoder are trained on finished code only, or whether they have also been trained on ticket/PR/commit messages and the corresponding diff (with the interface of related code provided as context)?


It's interesting to me that you say

> Generating new code from whole-cloth seems like an easier task for GPT.

as I've been working on a similar tool of my own and I've found the opposite. The initial task I've been trying to have it complete is to build something like https://craigmbooth.com/projects/killer-sudoku-calculator/ but with my bespoke requirements. If I try to specify it up front and have it generate the whole thing it almost always fails in a number of ways at once, despite the requirements being relatively straightforward and clear.

However I have had success by walking it through the process step by step along the lines of "please create an index.html that references react from a cdn. It should draw a blank sudoku board" -> "Please add a button that says 'add a cage'. When the button is clicked a box labeled "Cage 1", "Cage 2", etc should be added to a list to the right of the board" -> "Please track the currently selected cage and set the background of the corresponding cage box to green", etc.

Likewise, I added the initial set of functions it could access ("Create a File", "Update a File", "Remove a File") manually, but then I had it add additional commands ("Add a directory", "Remove a directory", "Copy a file", etc) and it was able to do it correctly on the first try each time because the pattern already existed.
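That pattern of registering file-operation commands the model can call maps naturally onto a name-to-function dispatch table. The command names below mirror the ones in the comment; the wiring itself is a hypothetical sketch:

```python
import os
import shutil
from pathlib import Path

# Dispatch table: command name -> handler. The model picks a command
# name and arguments; we look the handler up and invoke it.
COMMANDS = {
    "create_file": lambda path, content="": Path(path).write_text(content),
    "update_file": lambda path, content: Path(path).write_text(content),
    "remove_file": os.remove,
    "add_directory": os.makedirs,
    "remove_directory": os.rmdir,
    "copy_file": shutil.copyfile,
}

def run_command(name: str, *args, **kwargs):
    """Execute one model-requested command, rejecting unknown names."""
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    return COMMANDS[name](*args, **kwargs)
```

Because each new command is just another table entry, it makes sense that the model could add "Copy a file" correctly on the first try once the pattern existed.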


I think we agree, but maybe I wasn't writing clearly.

You're describing building a green field app starting from nothing, step by step. GPT shines at things like this, because they are by definition small code bases. You can probably fit the whole codebase into the context window.

Also, your approach of walking it through step-by-step is perfect, since you get to guide it to build a wise code architecture as you go.

It's hard to naively point GPT at a big, existing repo and try and do non-trivial changes to that codebase. Without a bunch of tooling to help it understand the overall codebase, it won't understand or respect the existing modules, abstractions, etc. It will just start trying to write code in a vacuum, which probably isn't the right thing to do when modifying an existing codebase.


This is something that I've also been wanting to play with. The token limit is too small for my codebase so I haven't really bothered, but it would be nice to tell it all my code and then say "refactor everything of the form `err = foo` into `if err := foo; ...`". If AI is going to take my job, it is going to have to learn to do maintenance!
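That particular refactor is mechanical enough that it doesn't strictly need an LLM. A naive regex pass over the Go source (purely illustrative; it ignores multi-line calls, differing indentation, and shadowing, where a real tool would use go/ast) looks like:

```python
import re

# Rewrite `err = foo(...)` immediately followed by `if err != nil {`
# into the scoped form `if err := foo(...); err != nil {`.
PATTERN = re.compile(r"err = (?P<call>.+)\n\s*if err != nil \{")

def scope_errs(go_source: str) -> str:
    """Naive sketch of the `err = foo` -> `if err := foo; ...` refactor."""
    return PATTERN.sub(r"if err := \g<call>; err != nil {", go_source)

before = "err = foo()\nif err != nil {\n\treturn err\n}\n"
print(scope_errs(before))
# if err := foo(); err != nil {
# 	return err
# }
```

The interesting LLM version of this task is the fuzzier cases (multi-line calls, renamed error variables) where a regex gives up.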


Give aider a try. As long as each file fits in the context window, it should work. With gpt-3.5-turbo-16k or gpt-4 you can edit files up to around 30 kbytes in size. If you have access to gpt-4-32k, you could edit files up to about 120 kbytes. See the notes here for more info:

https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35
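The kbyte figures follow from rough token arithmetic. Assuming ~4 characters per token and that roughly half the window must be reserved for the model's rewritten output (both rough rules of thumb, not exact):

```python
def max_file_kbytes(context_tokens: int, chars_per_token: int = 4,
                    reply_fraction: float = 0.5) -> float:
    """Rough ceiling on editable file size for a given context window.

    Assumes ~4 chars/token and that about half the window is needed
    for the model's reply; both numbers are rough guesses.
    """
    usable_tokens = context_tokens * (1 - reply_fraction)
    return usable_tokens * chars_per_token / 1024

print(round(max_file_kbytes(16_000)))  # gpt-3.5-turbo-16k
# 31
```

That lands near the ~30 kbyte figure for a 16k window; larger windows scale the ceiling proportionally (less if more of the reply budget is needed).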

I use GPT for all kinds of busy work, and code quality type work. Adding test cases or quality of life features. Things that I might not have the energy to do myself. GPT can often accomplish these tasks off a 1-2 sentence request. Or if not, it will do all the boilerplate and get you 80% of the way there and it's easy to polish up the final 20%.


Perhaps this helps:

Don't copy-paste your entire code base.

E.g. I adjusted the prompt to the logical separation in my project (e.g. the module and the task of a class can be derived from its namespace) and then indexed the codebase through the namespace + classes + method names with their parameters.

That makes it possible to fit more of the codebase into the prompt.
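A sketch of that kind of index, using Python's `ast` module to pull out qualified class and method names with their parameters (the `namespace.Class.method(params)` formatting is my own invention):

```python
import ast

def index_module(namespace: str, source: str) -> list[str]:
    """List `namespace.Class.method(params)` entries for one module.

    Gives the model a cheap map of the codebase without pasting
    method bodies into the prompt.
    """
    entries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    params = ", ".join(a.arg for a in item.args.args)
                    entries.append(
                        f"{namespace}.{node.name}.{item.name}({params})"
                    )
    return entries

src = "class Coder:\n    def create(self, io): pass\n"
print(index_module("aider.coders", src))
# ['aider.coders.Coder.create(self, io)']
```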


Not yet. I've considered adding it soon; the only reason I decided against it, for now, is that human edits make automatic evaluation very difficult.

Fully automatic generation can be evaluated fully automatically.

Good input.


Hmm, if generating new code is an easy task for GPT, why not ask it to create a new project from scratch every time a user comes up with a new feature?

Who needs to maintain an old codebase if you can rewrite it adding new features at whim?


> Can it be used to work on a pre-existing codebase?

For that, I posit the system would need to understand the existing code base. Not just what the code does, but the intent and the why. I'll leave it up to the reader to decide whether they believe LLMs understand anything. I know where I stand.


I'm with you there: LLMs don't understand in the same sense we do. But they can transform a well-written, compressed spec into code. The spec becomes the true source; it can be tweaked, and ChatGPT regenerates the whole codebase from it. It's nondeterministic, so every rendition will be slightly different, but across multiple renditions its average should look like the traveling-salesman graph of the problem itself.


I've played around with aider trying to run tests and fix the code, but it just crashes after exceeding the context window. I am now trying to repurpose the AutoGPT example in langchain.


Sounds like maybe your source files are bigger than the context window? Try including fewer files in the chat. See some notes on GPT models and file sizes here:

https://github.com/paul-gauthier/aider#gpt-4-vs-gpt-35

I also have some improvements coming so the tool is more graceful and helpful if you hit the context window limit. In general, most of my effort is focused on making it possible to work with larger and larger codebases in spite of the context window limitations.

Please do file an issue with more details on your problems. I can try and help and give you updates if I make improvements to the tool which could solve your use case.


How have you created the svg screencast in the README?


Looking at the source, it appears to have been generated by https://github.com/nbedos/termtosvg


Yup, that's what I used. It produces really nice, crisp SVG screencasts.


Hey Anton, congratulations, I love the project. The results are amazing even though I still have access to gpt-3.5 only; I can't even imagine the results with gpt-4.

I'd love to see some improvements on the clarifications/questions part, but overall it's a great project with so much potential. Did you consider including some sort of code self-repair step?

Btw I posted a video [0] about gpt-engineer and my audience is also very impressed.

[0] https://www.youtube.com/watch?v=4ehvtuv3ZuQ


I haven't tried this, but how do you get around the fact that it hallucinates functions and variables when going file by file? I haven't managed to make an app without errors going this route. The only time it produces something working is when I keep it to a single file (e.g. HTML, JS and CSS together) or a single function.


I'd like to see where this goes. So much potential.


Interesting project, great work on getting it done.

One thing I noticed is that the video in the readme doesn’t actually show the generated code running. It would be much more convincing if it did!


Cool project. I tried to build a React Todo app with TDD, but it just put comments in the test file instead of actual tests. A self-heal loop would be quite useful.


Wow, this is a big improvement on other 'gpt engineering' projects I've seen out there. What are the main things you think you can improve on it from here?


Just a quick thing I loved about your gif at the end: the font and theme of your Vim setup! Really nice. Do you mind sharing your .vimrc?



This is a really cool project. I'm going to play around with it. I love the fact that it's Open-Source!


Thanks!


Can it "scan" a local codebase, understand all the syntaxes used and their differences, and be asked to write code for a particular part of the project?


Nice to finally see you submit it here Anton :)





