Hacker News
Autodoc: Toolkit for auto-generating codebase documentation using LLMs (github.com/context-labs)
166 points by funfunfunction on March 25, 2023 | 86 comments



I have a conceptual problem with this. Documentation is meant to describe stuff that's not in the code. Sure, there's the odd occasion where you've done some super weird optimization where you want to say "Over here I've translated all my numbers into base 5, bit-reversed them and then added them; mathematically this is the same as just adding them in base 9, but it fits our custom logic cell better". But that's the exception. The general purpose of documentation is to describe why it's doing what it's doing, not how. Tell me that this module does X this way because that helps this other module do Y. Tell me why you've divided the problem this way. You're giving information about why certain design decisions were made, not just describing what they are.

It doesn't matter how good your LLM is; the information it needs to document simply isn't there. You're never going to get a comment out of this that says "This interface is meant to be backwards compatible with the interface Bob once wrote on a napkin in the pub on a particularly quiet Friday afternoon when he decided to reinvent Kafka".


Just give your LLM access to all your Slack chats and screenshots of all items found in the bins, and it would tell you what Bob had for breakfast too.


> and it would tell you what Bob had for breakfast too

Sure it would, if you asked. But then, it could be 100% wrong while giving you a very confident answer.


'accuracy' and 'truth' are legacy 0.1X concepts, move fast and break things


Like many an eyewitness in court


I agree with you that documentation should expose developer intent that isn't encoded in the codebase, but if the documentation at least lowers the bar to understanding the code, I believe it could have its merits.

Autodoc won't substitute for the need for engineers to document their work, but I believe that, especially in legacy codebases, it could help with the maintenance of otherwise hopeless code.


I've been using ChatGPT to write docs. Here's how I do it: I start by feeding it specs and examples from my project. Then I tell it the outline of the docs we are going to write. Then for each section, I tell it we're going to write that section and provide it with the subsections. Finally, in the prompt, I fill in all the key details it needs to hit, which is where I would tell it "Make a note that the interface was meant to be backwards compatible...". I had already defined ahead of time that a "Note" is an element in the docs with a little emoji in front and special styling, so it even formats it nicely.
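The section-by-section loop described above can be sketched as a small prompt builder. Everything here is illustrative (the function name, the "Note" convention wording, and the message shapes are assumptions, not any real tool's API):

```typescript
// Hypothetical sketch of the workflow above: one call per doc section,
// with the spec in the system message and key details in the user message.
type Message = { role: "system" | "user"; content: string };

function buildSectionPrompt(
  projectSpec: string,
  outline: string[],
  section: string,
  keyDetails: string[]
): Message[] {
  return [
    {
      role: "system",
      content:
        `You are writing documentation. A "Note" is a callout element ` +
        `with a little emoji in front and special styling.\n\n` +
        `Project spec:\n${projectSpec}`,
    },
    {
      role: "user",
      content:
        `Doc outline: ${outline.join(", ")}.\n` +
        `Write the "${section}" section. Key details to hit:\n` +
        keyDetails.map((d) => `- ${d}`).join("\n"),
    },
  ];
}
```

Each section then gets its own chat-completion call with these messages, so the model never has to hold the whole document at once.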


I have a similar docs workflow now without the emojis. :)

I give non-specific explanations about what I'm writing, the template, and the target audience. It's sped up my documentation time significantly: I can create high-quality documents in hours instead of days.


Can you show examples of the documentation you've created with this workflow? Or better yet, can you make a screencast of the process?


I think it depends a lot on what exactly is meant by ‘documentation’. If I’m looking at the man page for strlen, what it does is almost always the first thing I need to know. I’d go so far as to say that knowing what something does is almost always a prerequisite for understanding anything to do with ‘why’.


Yep. Comments are for why, not how. Docs, though, can be very dumb and still quite useful: "This is the class to access the Animals database. Use getAnimals to query for animals."


Just tested it on a side-project codebase.

Main impression: it hallucinates like crazy. I asked "How does authorization of an HTTP request work?" and it started spitting out an explanation of how the user's bcrypt hash is stored in an SQLite database and the token is stored in a Redis cache. There is no sign of SQLite or Redis whatsoever in this project.

In another query it started confidently explaining how the `getTeam` and `createTeam` functions work. There is no such entity, nor the word "team", anywhere in the codebase. To add insult to injury, it said that this whole team-handling logic is stored in `/assets/sbadmin2/scss/_mixins.scss`.

Another time it offered an extremely detailed explanation of some business-logic-related question, linking to a lot of existing files from the project, but it was completely off.

Sometimes it offered meaningful explanations but ignored the question. For example, I asked it to explain the relation between two entities and it started showing how to display one of them in an HTML template.

But I guess it's only a matter of time before tools like this become daily assistants. Seems invaluable for newcomers to a codebase.


Wrong documentation is even worse than no documentation. Without it you're at least forced to look at the code and validate your assumptions, and get a feel for the codebase. Wrong docs pointing you to tech that's not even used are going to be a mess.


Furthermore, software is essentially versioned by its documentation first. If it says it does something, people will depend on that, and it not doing that is a bug.


I wasted two hours yesterday tracking down a bug that was in the documentation, not the code. It's so frustrating. Maintaining good documentation is incredibly hard, especially when you're trying to document how your program interacts with other software, because now you have to change your documentation when a third party changes their code.


Alternatively, it's hyped up like crazy, the tech is inherently bad at information retrieval, and most of the people hyping it are trying to get in on a gold rush.

To be clear, I don't know the answer.


Hey if your project is public I would love to take a look if you don’t mind sharing a link


No, sorry, that was private project.


Most interesting part to me, the prompts:

https://github.com/context-labs/autodoc/blob/83f03a3cee62d6e...

> You are acting as a code documentation expert for a project called ${projectName}. Below is the code from a file located at \`${filePath}\`. Write a detailed technical explanation of what this code does. Focus on the high-level purpose of the code and how it may be used in the larger project. Include code examples where appropriate. Keep you response between 100 and 300 words. DO NOT RETURN MORE THAN 300 WORDS. Output should be in markdown format. Do not say "this file is a part of the ${projectName} project". Do not just list the methods and classes in this file. Code: ${fileContents} Response:


This basically matches my experience in trying to get it to do the right thing. BEING VERY EXPLICIT AND ANGRY WORKS TO REINFORCE A POINT. Specifically telling it to not do a thing it will otherwise do is often necessary.

The only part that surprises me is `Output should be in markdown format`. Usually being that vague results in weird variation in output; I'd have expected a formatted example in the prompt for GPT to copy.


It understands most things without being given examples in my experience. Being explicit is helpful. Being angry is likely inconsequential, but I can't say for sure since I never felt the need TBH. What I can say is that the bot has become spookily like a person, "someone" that I conceive as helpful, courteous and friendly, albeit sometimes wildly wrong in spite of the assertive tone in the response.

I'll probably get used to it over time, as I get a deeper sense of how it works, and how it differs from real persons. ATM the distinction is blurred.


There's a subreddit where people post angry prescriptive memos or notices from their terrible bosses, and this would fit right in.


Quick, someone make a project which gets ChatGPT to answer those memos


This has far surpassed my dystopian predictions of how people would misuse LLMs.

Self-spamming your own codebase with comments that are either obvious, misleading, or wrong was previously unfathomable to me.

Most people think I’m unrealistically pessimistic.

Well done.


Code is the ultimate reference for understanding a project, while documentation is often neglected, outdated, or incorrect. It can also be difficult to keep up to date.

An LLM may not fully comprehend the code like the original author, but it can offer a different perspective that may be valuable. The only argument I've seen against LLMs is that they may encourage laziness, but this is a flawed argument, similar to those made against the printing press, which was said to make people illiterate.

As a reader of the docs, does it require discipline to refer back to the code when needed? Yes, but this is no different from the discipline required to write documentation in the first place. There is a key difference, though: the discipline shifts from author to reader.


This just begs the question: why not generate the docs when someone reads the code instead? Why bother generating a bunch of half-assed docs today when you can wait and interrogate the code in a much more natural and fluid way using the same LLM when it matters?


SEO. When you ask an LLM “What’s a good library/tool I can use to accomplish X?”, the ones with some documentation already attached will rise to the surface and be easier for the consuming LLM to generate code samples for.


1. It's cheaper

2. Everyone can see the same thing

3. It can be utilized for search

4. You can do both

Perhaps most importantly

5. It can be reviewed


Because not everyone will have access to LLM technology, and even if they do, your project might not always be used in a setting where access is technically feasible.


People have been doing that for a long time. It's usually a form of malicious compliance to "document everything" policies.

https://www.reddit.com/r/ProgrammerHumor/comments/4ktp12/my_...


Just because people have been doing that for a long time doesn't mean we should make it more efficient.


Just because we shouldn’t doesn’t mean people won’t pay for the service.

My “Well done” was not meant sarcastically!


I doubt an LLM is more efficient than previous documentation generators.


Wait. This is already an industry?!?


But you see, developers have been self spamming their code with obvious, misleading, and/or wrong comments for decades. Especially with those pesky “everything must have a doc-comment” linters. Think of all the time ChatGPT will save doing it for them!


I have actually been using ChatGPT to write JSDoc for me.

It mostly needs only the function and the type definitions (TypeScript), it is usually very good at writing them, and it does save me time.
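For a sense of what this looks like: the input is typically just a typed function like the one below, and the JSDoc block is the kind of output the model produces. This example is hand-written for illustration, not actual model output:

```typescript
/**
 * Returns the animals whose species matches the given filter.
 *
 * @param animals - The full list to search.
 * @param species - Species name to match, case-insensitively.
 * @returns The matching animals, in their original order.
 */
function filterBySpecies(
  animals: { name: string; species: string }[],
  species: string
): { name: string; species: string }[] {
  return animals.filter(
    (a) => a.species.toLowerCase() === species.toLowerCase()
  );
}
```

Since the types already constrain what the function can do, the model has relatively little room to hallucinate here, which may be why this narrow use case works well.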


It can help to give an initial overview of a codebase, its structure, and the functionality it contains (with the usual risk of inaccuracies and confabulations), but it can’t supply the rationale and contextual information that good documentation provides, which can’t be derived from just the code. It also can’t distinguish between what is an implementation detail and what is part of an implicit interface contract.


your pessimism is hardly unwarranted. this will lead to rather lousy code documentation, and might be misused by folks. but on the other hand, i look at this as something that will be effective as a "summarizer" of sorts, describing already-written code that has poor or no documentation.



He now can use gpt to generate code, comments, comments in code reviews, and work on 3 jobs simultaneously.


Overemployment is ruining America.


It would be chef’s kiss perfect if another LLM did the code review.


maybe a "code formatter" that just asks an LLM if it can rewrite this code, but pretty?


I'm far, far more interested in having an LLM tell me where particular functionality is in a codebase, and how it works from a high level.

Autogenerating function documentation seems like such a low bar by comparison. It's like taking limited creativity and applying it with high powered tools.

Literally like asking for a faster horse.

Tell me how WebKit generates tiles for rasterizing a document tree. Show me specifically where it takes virtualized rendering commands and translates them into port specific graphics calls.

Show me the specific binary format and where it is written for Unreal Engine 5 .umaps so that I can understand the embedded information for working between different types of software or porting to different engines.

Some codebases are so large that it literally doesn't matter if individual functions are documented when you have to build a mental model of several layers of abstraction to understand how something works.


Completely agree. Explaining how systems work in plain English is much more valuable than just giving the inputs and outputs of individual functions. We want to understand how a system and its subsystems work, independently and interdependently.

We're not there yet with Autodoc; there is still tons of work to do.

If you haven't tried the demo, give it a shot. You might be surprised.


Have you considered finding a way to instead write a text editor plugin that allows me to talk to a GPT and ask questions about a codebase? This would be a serious technology that moves beyond the in-code documentation paradigm.


This is exactly what autodoc does in your terminal. It wouldn't be hard to package it as a VSCode plugin.


Documentation can help the LLM search, though. Layering of tasks is important.
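One common way to layer tasks like this is to embed the generated docs once and retrieve the nearest ones at query time. A toy sketch of that retrieval step (the vectors are placeholders; a real system would call an embedding model, and this is not a claim about autodoc's internals):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the path of the doc whose embedding is closest to the query.
function topDoc(
  docs: { path: string; embedding: number[] }[],
  query: number[]
): string {
  return docs.reduce((best, d) =>
    cosine(d.embedding, query) > cosine(best.embedding, query) ? d : best
  ).path;
}
```

The retrieved doc (rather than raw source) is then what gets stuffed into the question-answering prompt, which is the "layering" the comment refers to.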


Today, LLMs learn from a codebase that has mostly insightful comments from well-meaning humans.

In the future, the training sets will contain more and more automatically generated material that I believe will not be curated well, leading to a spiral of ever-declining quality.


I am not overly concerned by a future where AI spoils its own training sets.

Companies like OpenAI will keep injecting human-generated feedback into the training set.

A lot of the value is already in the RLHF today; see the OpenAI technical report.


Same reasoning suggests making a copy of Wikipedia before it’s too late :)


https://dumps.wikimedia.org/

It would be interesting to analyze it over the coming years, and maybe even past dumps, with these AI-detection tools.


Remember that dude who wrote the vast majority of the Scots Wikipedia while not being able to speak Scots? Yeah, just waiting for the first LLM to make headlines by hallucinating its own version of human history and publishing it to Wikipedia.


One thing I find worse than no docs is wrong docs.

It would be really cool if we could take code + docs, feed it into an LLM and get a determination of whether the code matches what's in the docs. It could also be a good way to evaluate the correctness of the generated docs from the linked tool (assuming it works).
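A minimal version of that check could be a single prompt that asks the model to list unsupported claims. A sketch (the wording and function name are invented for illustration, not from any existing tool):

```typescript
// Build a prompt asking an LLM to flag documentation claims
// that the accompanying code does not support.
function buildConsistencyPrompt(code: string, docs: string): string {
  return [
    "Below are a source file and its documentation.",
    "List every claim in the documentation that the code does not",
    "support. If there are none, reply with the single word CONSISTENT.",
    "",
    "Code:",
    code,
    "",
    "Documentation:",
    docs,
  ].join("\n");
}
```

Asking for a fixed sentinel word like CONSISTENT makes the happy path easy to detect programmatically, though the model's judgment on the mismatches would of course still need human review.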


If docs can be generated, they're not worth reading.


All docs are generated. Now some are generated by silicon.


I agree, and I think people are missing the point. The value of comments is to tell the reader things they can't get from reading the code, details of why the author did what they did. ChatGPT commenting its own code as it writes it makes sense, ChatGPT commenting others' code (or its own without context) is inherently guesstimating.


you're wrong, it can be extremely useful


Come talk to me when it can reverse-engineer as well as Ken Shirriff, develop a complete understanding of the whole codebase, and generate authoritatively accurate and useful output. Oh, and uncover bugs while it's at it.


This still wouldn’t be enough, because writing good documentation usually requires knowing contextual information not contained in the code base.


Oh, so it's only interesting when it has completely surpassed even the smartest human in intelligence? When that happens, why would anybody bring it to you? When that happens, why would humans be needed for anything?


Perhaps I was a little dramatic, but my impression is today's AI is not well suited for this task. What I laid out is slightly above my expectations for a human who does the job for me today (yes, my bar is fairly high) and the point is I won't get excited until the shiny new tech you want to replace that human with does a better job than they do.

In the meantime I'd prefer not to be subjected to served-from-a-can, machine-generated content in the code I'm trying to grok.


> Come talk to me when it can…

I am afraid no one will come talk to you when that happens.


New tech always needs early adopters. You do not have to be one of them, but do not dismiss them either.


Are you really grandstanding against an AI model rn


How do you verify the meaning of the docs? How do you deal with model hallucinations?

It would be hell to lose trust in API docs due to those risks.


The way I’m thinking of it is as a junior engineer who can go do busy work for me. I’m not going to accept their “PRs” without a review. Even if it gets me 75% of the way there, that’s still a big time savings for me.


Hallucination is definitely a problem but can be somewhat mitigated by good prompting. GPT-4 seems less prone to hallucination. This will be better over time.

You can view the prompts used for generating docs here[1] and the prompts used for answering questions here[2]

[1] https://github.com/context-labs/autodoc/blob/master/src/cli/...
[2] https://github.com/context-labs/autodoc/blob/master/src/cli/...


Very good point, but easily solved - just tag the docs as being generated by GPT-4, and make sure whoever reads them knows it.


That doesn't solve the problem.

Documentation of unknown quality is useless noise.

People don't understand the unfathomable amount of garbage that's going to be generated by all these models. It doesn't matter how accurate they are; the lack of awareness of that remaining percentage of inaccuracy is going to create false confidence and cause errors to compound like mad.





Generally these documents wouldn't be used directly. You can use the `doc q` command to query these documents with natural language questions.


Please run this on the BuildKit codebase, which has almost no comments but a huge usage footprint. It's also (obviously) a non-trivial test case.


It was funny how for years the only documentation of BuildKit was some .md file in the middle of that repo with all the magic incantations for running apt install with cached layers... to be fair, I actually found it clearer than the real Docker docs.


Hi there, creator of autodoc here.

You can do it yourself and make a pull request back to the BuildKit codebase :)

If it gets merged, everyone who uses BuildKit will have access.


Yea, not going to do it myself; it would be better if you did. As I said, it's a good test to find where your tool provides value or breaks down.


I'm more interested in LLMs getting to the point where they can look at the several hundred codebases in your company and tell you who sets what value in your local data model, and why they set it. There's always Slack instead of poorly generated documentation.


This is extremely interesting. I have a few monstrous repositories I’d like to try it on.

The thing I’m wondering about is the cost. How much would it cost to run this on the entire WordPress source, for example?


How many pages? You can get an estimate for how much it would cost using the `estimate` command in Autodoc.
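As a rough sanity check on where such a number comes from: the dominant cost is essentially (tokens sent to the model) × (price per token). A back-of-the-envelope sketch, assuming roughly 4 characters per token and a hypothetical flat price per 1K tokens (these assumptions are mine, not autodoc's actual formula):

```typescript
// Back-of-the-envelope documentation cost estimate.
// Assumes ~4 characters per token and a flat per-1K-token price;
// both numbers are illustrative assumptions.
function estimateCostUSD(
  totalChars: number,
  pricePer1kTokens = 0.03 // hypothetical GPT-4-era prompt price
): number {
  const tokens = totalChars / 4;
  return (tokens / 1000) * pricePer1kTokens;
}
```

So a repository with a few million characters of source would land in the tens of dollars, which is consistent with the ballpark quoted below for a large codebase.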


I’m not prepared to install this at the moment, but if you were to give an example of costs for what I suggested above, it might convince me to try it out


Running the `doc estimate` command on the WordPress repository says it will cost $58.47. The estimates are usually within +/-10% of the actual cost.


That seems very reasonable! Thank you for testing it, I’ll give it a try on some of my own repos soon


It does not seem reasonable for such an uncertain result.


Very cool stuff.

I think people who dismiss this kind of tool because it can hallucinate stuff are off topic.

The AI will get better and better, but more importantly, we will evolve and learn to work with this kind of tool.


Granting that some hallucination is likely, still seems like a step in the right direction.



