Hacker News new | past | comments | ask | show | jobs | submit login
Devin is now generally available (cognition.ai)
155 points by neural_thing 19 days ago | hide | past | favorite | 132 comments



First place I usually go is the terms of service and what they are granting themselves rights to. Not excited about how broad this is "3.2 License: By using the Services, you hereby grant to Cognition, its affiliates, successors, and assigns a non-exclusive, worldwide, royalty-free, fully paid, sublicensable, transferable license to reproduce, distribute, modify, and otherwise use, display, and perform all acts with respect to the Customer Data as may be necessary for Cognition to provide the Services to you."


"as may be necessary for Cognition to provide the Services to you" kind of makes sense IMO. Does that mean they'll only use the license (note: they only get a license, not ownership) to provide services to you? Is it a restriction?


Yes, that clause/phrase restricts the company's rights with respect to their license to your data. Essentially, a clause like that is necessary for users to interact with the service. Makes sense when you think about it, how can they provide service if they can't use the data you provide them?

It's a pretty typical clause you'll see in most SaaS policies.

Source: I work for a SaaS, but I am not a lawyer, caveat emptor.


I want to pay for their product, but not enough that I have to ask my lawyer about the language. I did see that one of the features of the enterprise plan is custom terms, but that's not the plan I'm interested in.


How do you use other SaaS products, or is this the first one you've considered using?


I always wonder how enforceable these blanket rights would be in court. Didn’t Meta claim to own end users’ photos in the T&Cs back around 2009 and it got challenged and shot down (ianal)?


I did some Googling on this.

https://web.archive.org/web/20111103081406/http://consumeris...

Original article that caused the outrage. In particular, the TOS did not say they owned your pictures, but it did give them a license that was quite broad, which included using your likeness in advertisements. However, the change that caused the outrage was that the license no longer expired on account deletion nor content removal.

https://www.npr.org/2009/02/17/100783689/facebook-users-angr...

News article about the outrage.

https://www.nytimes.com/2009/02/19/technology/internet/19fac...

News article about the walkback.

I could not find anything about it being challenged in court.


No public testing, no benchmarks, no clear information on context window size or restrictions for extensive use, no comparison with the newest Claude Sonnet 3.5 or O1, nothing.

What we do get is a price of $500 per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.

Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction of the price.

If this were, e.g., Anthropic launching a new beyond-Opus-size model that was still performant and came with "chain-of-thought" capabilities, a far more extensive context window that still fully passes needle-in-a-haystack tests, was absolutely solid at sourcing from provided files, kept on track even when given large documents, had few or no restrictions on usage, and came with extensive, verifiable benchmarks showing a significant upgrade over other models, maybe such a price could be justified.

You know why, Cognition? Because they haven't actively lied. What they did instead was let people use their models and actually test the advantages. Even Claude Instant, way back when, had certain use cases that gave it its own niche and showed they could execute, before expanding with 2 and the larger context window, then 3 with more applications. You never did any of that; you never gave anyone reason to believe what you claim; you didn't even release benchmarks. See the difference?

Seems more like a simple cash grab attempting to ride the o1 wave. OpenAI has a hard time justifying their Pro pricing; you doubling that makes this an out-of-season April Fools' joke. Waiting for the inevitable reporting that this is just another API wrapper around Claude or ChatGPT with our old faithful RAG.

[0] https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gY...


From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".

But these are the kinds of problems that help shape the product. The software architecture should be a compression of a deep and intuitive understanding of the problem space. How can you develop that knowledge if you're just delegating it to a black box that can't operate at a near-human level?

I've used AI-based tools to great success, but on an ad-hoc basis, for specific and small functions or modules. Doing the integration part requires an understanding of which abstraction is appropriate where. I don't think these tools are good at that.


Good software can be art. And like all art, we have hit the stage in which code can also be cranked out en masse, thoughtlessly, for a quick buck. It was only inevitable.


Mike from Vesta (first demo video) claims Devin saved "at least a hundred hours" debugging API integrations. That seems crazy to me - API integrations rarely take that long, and any engineer would spot issues like wrong API keys almost immediately. The tool might be more valuable for non-engineers creating initial drafts, but by the time you've written all the detailed specs for Devin, a mid-level engineer could have made significant progress on the task.


I wish API integrations never took that long! But it's dependent on who you're integrating with and what your product looks like. I'm the engineering manager of the payroll integrations team at a company that does workplace savings plans.

Sometimes even when you're making calls to dozens of different endpoints they're easy, but other times, you end up guessing at how to access undocumented functionality within a GraphQL API that has introspection turned off, or working around entity modeling that's completely different from your system and requires a lot of translation. Or you work with an API whose indexes variably start from 1, 0, -1, and -2 in different endpoints. These generally aren't hard technical challenges to solve, and something like Devin that could take care of most surface-level problems you see while integrating with some XML API from 2007 would be welcome.
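Quirks like the variable index bases above usually end up papered over in a small translation layer. A minimal sketch (the endpoint names and bases here are hypothetical, purely for illustration):

```python
# Hypothetical table of per-endpoint index bases -- the kind of quirk
# described above, where different endpoints count from 1, 0, -1, or -2.
INDEX_BASE = {
    "/employees": 1,
    "/paystubs": 0,
    "/deductions": -1,
    "/adjustments": -2,
}

def to_upstream(endpoint: str, i: int) -> int:
    """Translate our internal 0-based index into the endpoint's own base."""
    return i + INDEX_BASE[endpoint]

def from_upstream(endpoint: str, i: int) -> int:
    """Translate an index returned by the upstream API back to 0-based."""
    return i - INDEX_BASE[endpoint]
```

Trivial code, but it's exactly the kind of surface-level busywork that piles up across dozens of endpoints.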

There are companies like https://www.tryfinch.com and https://www.merge.dev that try to solve these issues, but their abstractions also reduce flexibility and aren't a perfect fit for all HRIS integration use cases right now.


I agree. Integrations can be incredibly cumbersome if you have to learn each API from scratch.

There are also more flexible solutions like https://www.nango.dev

It handles the API-specific complexities (auth, retries, webhooks, per-customer config, pre-built templates) but allows you to implement the exact use case + data model you need.

It's open source/source available.

(disclaimer: I am a founder)


Hey that’s really cool. I read the FAQ on connection.

So if there are 10 users, the free tier lets me give them the ability to add up to 3 integrations? Is it 3 per user?

Thanks


Been using https://www.laminar.run/ here and there and found it a good mix of abstract and being able to get in there.


clearly nobody else has spent all the time i have integrating really old mortgage software :(


I doubt Devin could write an integration for an underspecified legacy API like that. Whenever I have had to, I've needed to talk to support/engineering on the other side.


that's definitely still the case. devin drafts my emails for issues it runs into (which i tell it to do) and i send them off.

this is definitely slower than if i were doing it full time, but i run a company. i go from customer meeting to customer meeting and spend 5-10 min a day taking whatever is blocking devin and pasting it into an email to the partner to get a response for devin.


although i do agree with you w.r.t integrating like, modern software with well documented/good apis


Debugging is a pretty vague word. I know a LOT of api endpoints with shit documentation. Could Devin generate documentation for a vast number of api endpoints that could have theoretically taken a hundred hours to write?


The trend with AI tools: make a bold claim at launch, then pile on caveat after caveat when actually releasing to the public.


Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.

In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.

I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.

My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?


This predates the o1 release, but the folks behind Devin did do some early evaluation of o1 vs 4o vs Devin back in September:

https://x.com/cognition_labs/status/1834292718174077014

I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.


Thanks, but that comparison is for old models, a different, non-shipped version of Devin called “Devin-base”, and doesn’t include Claude.

Slack integration, automatically pushing to CI, etc., are relatively low-value compared to the questions of “does it write better code than alternatives?”, “can I depend on it to solve hard problems?”, “will I still need a Cursor and/or ChatGPT Pro subscription to debug Devin’s mistakes?”


Should have come with a prominent warning at the app site that you're heading towards a $500 sub. I'm sure it's mentioned in places I didn't see it. Ideally, you would agree to the sub before you even create an account. This could save LOADS of signups from people who aren't your intended users.


They have a $50 tier too, but that one is not currently open to new members.


I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that actually always seems complicated when going from MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.

That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.


hey guys - Walden here, one of the founders. Excited to have you try out Devin. Reach out here if you have any questions!


Hi Walden,

my name is Devin and I don't like sharing a name with a product. Will you please consider changing the name?

There is always the chance that someone named Devin will do something that gives your product a bad name. Perhaps some new scandal will involve someone named Devin or something.

I'd also like you to imagine that a hot new erotic AI was named "Walden", and people said things like "I was talking with Walden last night" as a euphemism. How would that make you feel?


I'd try it out if you allowed paying $50 for some credits instead of requiring subscription.

Even if that version is limited to only editing public Github repos. $500 to see how well it works is too much.


The $50/month personal tier (currently not accepting new users) includes 50 credits per month, and the $500/month teams tier includes 250 credits per month. This is what I see with my current user and when I try to sign up as a new user, respectively.

I'd like to see that $50/month tier reopened to subscribers, and a $0/month+credits tier added (1 concurrent active session only, constrained to small VM spec with immutable rootfs (regular devin VMs have writable rootfs), no automatic knowledge generation, no snapshots, though playbooks allowed).

> Even if that version is limited to only editing public Github repos

Not possible to constrain like that with the current Devin architecture.


The price seems reasonable, but my main hesitation is on data storage + third party providers- there doesn't seem to be much available information on:

* will you store my code + train on workflows that Devin does for me?

* are you piping data to other third-party providers (e.g. Anthropic, OpenAI)?


Why doesn't any LLM show examples of C++ applications? I have yet to see a tool like that which I would be happy to use at work.


Or CUDA code, which would be somewhat ironic, given that LLM inference engines and training are CUDA code in some way.


It can do that too; I tried it.


I tried it with C and C++ code, it can do them but not very well.


How large are the repositories that it can "reason" about?


It's smart enough to figure out the relevant part to change once it scans the codebase. It could do the Linux kernel if not for Linus' policy.


Thanks. For example, if I feed it a 10 MLoC repository, how long does it take before it can start working through the problem?

Would it really work well with the mixture of at least C and assembly which you implied it would with Linux kernel example?


> For example, if I feed it a 10 MLoC repository, how long does it take before it can start working through the problem?

The initial scan may take about an hour with a repository that size, and the knowledge-base buildup will take about a week (but that happens during the coding process). It will not continuously rescan the entire codebase once it has built up knowledge of the repository.


Fascinating and scary at the same time.

Is this the beginning when intelligence, domain expertise and ability to research becomes commodity?


When crafting projects from scratch, does your system actually fix its own errors?

That seems to be the challenge with Cursor Agent in its current form: it generates a bunch of code that has bugs and requires a lot of iteration.


as someone who has been trying you guys out for the past 8 months... you need a speed lever. default devin is way too slow for me :/ i asked scott for a "demo mode" first time we met


latest update is around 3-4x faster than it was back in Apr but we are working on making it much faster still!


How is that done?


Devin got a lot faster for me recently, made it a lot more enjoyable to use


You should really add an option to spawn a VM with immutable rootfs, current VMs all have writable rootfs which cost a lot to run, immutable VMs could be much much cheaper to operate (possibly enabling free tiers even).

Also to mention, "suggest knowledge" modal is broken (it silently ignores changes made if you edit the suggested knowledge).

Another issue, sleep&snapshot system is still prone to race conditions in certain cases.


What model does it use under the hood?

How much context window does it load when it is solving tasks?

How does it determine which files to load into context?


If I remember correctly, it's a fine-tuned, o1-preview-sized distillation of o1-pro, running with an Azure Ubuntu VM that has a writable filesystem and internet access.


Can you only use it with a $500 / month subscription?

The word "try" is VERY different from the actual case, which is "pay for use".

If the answer to the first line is yes, how do I request my email be deleted? I started to sign up but I am not a use case for $500 a month at the moment.


It's monthly subscription plus prepaid compute credits (called ACU in the UI).


I'm excited to try it. I use aider quite a bit and tried opendevin at some point.

What is the pricing story?

Can I use it as side project dev or is the target enterprise customers only / mainly?


we have plenty of small, early stage teams that use Devin, but it's designed to fit into a team's workflow. you can of course give it a try and see if it's a good fit for your projects!


Any estimates regarding when the personal tier ($50/month+credits) will resume accepting signups?


Is it just me that finds it ironic that you're looking for software developers?


Hey, can you fix the issue where the editor times out and Devin gets stuck?


How does one estimate the number of ACUs required to finish a task?


It spends about 2 to 10 ACU per hour in the small VM, and ten times as much on the large one. No credits spent during sleep and "waiting for response" time as far as I observed.
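Taking those rates at face value (they are this commenter's observation, not an official figure), a back-of-the-envelope worst-case budget could be sketched as:

```python
def estimate_acu(hours: float, large_vm: bool = False,
                 acu_per_hour: float = 10.0) -> float:
    """Worst-case ACU estimate. The small VM reportedly burns 2-10 ACU
    per hour (use the upper bound for budgeting), and the large VM
    about ten times that. Sleep time costs nothing, so only count
    active hours."""
    multiplier = 10.0 if large_vm else 1.0
    return hours * acu_per_hour * multiplier

print(estimate_acu(3))                 # 3 active hours, small VM -> 30.0
print(estimate_acu(3, large_vm=True))  # same session, large VM -> 300.0
```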


a helpful benchmark is that a typical frontend task is about 1-2 ACUs, but it really depends on the complexity of the task


Does it work with more obscure languages like Lean 4?


It can work with any language, as it interacts with the VM and can read compiler messages.


How should I go about arranging a demo?


Not really product related: The current trajectory of LLMs/Agents, what is your career advice to someone in school for Computer Science right now?


Are you a human founder?


very much so!


Can you upload a picture on the company domain with your face, and holding a piece of paper containing your name, the date, the time, and the current bitcoin block number? That would make us more likely to believe you are properly human.


It seems like a lot of the magic is providing LLMs with tools that let them work like a human would. This approach makes more sense to me than the model of expecting an LLM to just emit a giant block of code for a change, given a pile of RAG context.

( removed pricing q, as I missed it is $500 / month for whole teams. I get why that is the pricing, but doesn't work for me to try it in side projects sadly )


It says at the top $500 per month


Starting at. If you want, they will take more of your money too.


Am I the only one who laments this trend of using a common first name as a product name? When I see this, my first reaction is that the company lacks any empathy for people who have the name they're co-opting.

https://www.washingtonpost.com/technology/interactive/2021/p...

https://archive.is/w8r58


As someone named Devin who works in tech, I greatly hope this project fails. :)


No, I'm Devin.

At least our names got attached to an upstanding product, and one that is likely to languish and fail. We're not the next "Alexa", I hope.


100% agree. It is shitty and rude. Not to mention it does not even make sense.


Not sure about the “rude” part. It really depends on the person. But yes, it can get annoying really fast. Therefore “shitty” indeed. And yeah, I do think it is very cheesy and lazy when companies do this. When I talked to someone who worked there, I gathered it was because of the hard consonant “X”. It would make a better Hollywood movie if they said Artificial. Language. Expanded. Xenomorphic. Amplified. A. L. E. X. A.


Devin comes from "dev in chat", a common phrase in livestream chat rooms to signal that the developer of the game or product being showcased was present.


why not just call it dev


Dev is also a human name.


The short version of my name is one letter away from "Alexa". You can imagine how many comments and jokes about Amazon's AI assistant I've been party to for the past decade. Although it may be hard for you to believe, I actually don't really care, much as you probably don't care about the hot dogs bearing your name that you see when you walk down the cold aisle in the grocery store. Should they instead call the anthropomorphized AI assistant something like "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?


My Japanese mom always thought it was weird to put people's names on destructive forces like hurricanes. I think she said Japan uses some numbering system (it might be as simple as incrementing; I don't remember).


The US did this for a long time -- only numbering storms. In 1953 it switched to a list of names, female only, then 25 years later to both male and female names. It is kinda weird, and if a storm is destructive enough its name is retired. I think the idea is that people pay more attention to human names in the warning process as hurricanes approach land.


I could definitely see the support response to a major storm being better when it's easier to communicate and identify a specific storm.


When I was 7, my family's Japanese foreign-exchange student was being introduced to me. She burst out laughing, saying my nickname Dev Dev sounded like "fart fart" or "fat fart".

Had the nickname fart fart until my sister moved out of the house.

Maybe you could confirm, but ChatGPT tells me that in Japanese, "debu" colloquially and offensively means "fat" or "chubby", and "bu" is an onomatopoeia for a fart noise, like "prrt" in English.


> "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?

Bold move, but imagine the patch notes:

• Fixed bug where assistant attempted to unmake the fabric of reality

• Resolved issue where “Set alarm for 7 AM” triggered a rampancy cascade

• Improved pronunciation of “Lh’owon” for calendar appointments

Probably still a better bet than Durandal, definitely an improvement over Tycho.

And then there was Leela…


It appears your name is Alex, so I'm not surprised that the Alexa product name doesn't bother you. I suspect you would feel differently if your name were Alexa. If the product were named Nate, it would bother me. There are a plethora of other options for product names that companies can use besides common first names.


But I bet you're never late for your train


I think it's different when the product is a tool you call by name to use versus just the name of the tool. E.g., the article is about "Alexa", and I'm not sure most people even realize there are ways to use it without saying "Hey Alexa" every time. Without that kind of callback association, it's not a very serious concern.


I don't care about it potentially being a real name, because I doubt it would be a household item, but somehow the name itself for this particular product seems offputting.

If it had to be a name for a product, it gives me some sort of cheap male-grooming or AXE body spray product vibes.


Probably. I share a name with a product and I couldn't care less. It's wild that some would feel bad about it, much less consider it a lack of empathy.

I don't like first name product names for other reasons but not because they share a name with humans named the same


They gotta be Joshing us! How's Dic-I mean, Richard?

Just having fun. I see what you mean and vaguely support it... I just won't lose anything over it


How is it lacking empathy? Devin is not something invoked by voice, so I fail to see the comparison to Alexa.

edit:

> Eschew flamebait. Avoid generic tangents. Omit internet tropes.

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.


I couldn't find anywhere a list of languages that this tool supports. What makes this tool better than e.g. cursor?


Aren't you guys afraid that Copilot will simply crush you? They have all the training data, after all.


There is third-party CI software like CircleCI that didn't get crushed by GitHub, because it's a high-touch business that GitHub doesn't want to get into.

There are many niches to be captured.


I thought circleCI wasn't doing too well?


No, Devin is an autopilot.


Can you also add Discord, Telegram, GitLab, and Forgejo integrations for those who use them for their software-development discussions?


> Small frontend bugs and edge cases - tag Devin in Slack threads

And other points where it should shine. How does it compare to using Cursor? Is it the slack integration?


the workflow is quite different from Cursor or Copilot - Devin is an asynchronous tool. A common way to use Devin is to kick off a few sessions in the morning, while you work on other higher priority tasks. It feels a lot more like working with a colleague that you can tag in Slack or go back and forth with on PR comments


Devin has 2-hour, 6-hour, and 24-hour inactivity limits before it pauses the work and temporarily deprovisions (sleeps) the VM, so you have to check in on it every so often.


Does it open PRs?


Yes, it does if you request it to. You can also tell it not to.


How does Devin compare to lovable.dev ? I've been thoroughly impressed by their ability to build and host functioning apps from very basic prompts.


Is there any evidence this works better than Claude 3.5?


I work with a team at Nubank that has been using Devin. I would say that it doesn't quite make sense to compare it to Claude 3.5, because Devin isn't really like Copilot; it's more like an assistant to which you can assign a project. We're using it only for particular use cases, but for those particular use cases it's like having a superpower.


Based on this, what is the outlook for software dev generally, and junior and mid level devs?


More specifically: What kind of advice does GP have for Computer Science students in school right now?

I've been frankly terrified of the pace of LLM development since 2022.


Do you have any examples of the kinds of projects you would assign it to?


The reason it makes sense to compare them is there are problems that Claude 3.5 (or o1) can’t solve. Can Devin solve them? If yes, it’s easily worth the $500. If no, it’s a harder sell.


> We're using it only for particular use cases

Can you share concrete examples?


I can’t really be too specific. But I can say that at least one pattern of problem it tackles very effectively is: “we’re migrating from X to Y, and it’s going to touch a ton of files, and the nature of that migration is much more involved than what we can reasonably hope to accomplish with sed and a bash script.”
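For a sense of what "more involved than sed" means, a migration like that usually calls for a structural codemod that understands the syntax tree rather than the raw text. A minimal sketch using Python's stdlib `ast` module (the `old_api.fetch` → `new_api.get` rename is a made-up example, not anything from the parent comment):

```python
import ast

class MigrateCalls(ast.NodeTransformer):
    """Rewrite old_api.fetch(...) -> new_api.get(...). Unlike sed, this
    only touches actual call sites, never string literals, comments, or
    lookalike names in unrelated contexts."""
    def visit_Call(self, node):
        self.generic_visit(node)
        f = node.func
        if (isinstance(f, ast.Attribute) and f.attr == "fetch"
                and isinstance(f.value, ast.Name) and f.value.id == "old_api"):
            node.func = ast.Attribute(
                value=ast.Name(id="new_api", ctx=ast.Load()),
                attr="get", ctx=ast.Load())
        return node

def migrate_source(source: str) -> str:
    """Parse, transform, and unparse a single file's source."""
    tree = MigrateCalls().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Run it over every file in the repo and you have the mechanical part of the migration; the "involved" part is usually the entity-model translation around it.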


I tasked Devin with writing a project proposal (in a topic I am not going to disclose here) with multiple documents including feasibility analysis, grant applications, legal analysis and post-implementation training materials and it was almost perfect at it.


Amazing claims, if only it could be publicly shared and scrutinized.


i use this every day and a lot of the magic is in the workflow and agent layer -- claude 3.5 can generate a snippet of code for you but it isn't going to open a browser, read api docs, actually make calls to the api, debug, run the code and make sure it builds and works, etc


Anthropic and OpenAI have certainly been working on this behind the scenes. While they try to see how much better they can get their models, they will let others pay for the current state until they find it valuable themselves. The shift we are seeing now is already happening, and they are taking an even larger, macroscopic approach by creating computer/tool use, along with the context protocol, so that when it's released it will work with almost any IDE and system...


Why wouldn't it? Just give it a shell tool. (Something like claude.vim, perhaps.)
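The "shell tool" idea is just a loop: show the model a transcript, execute whatever command it asks for, append the output, repeat. A minimal sketch (the `ask_llm` callable and the `RUN:` convention are stand-ins for whatever completion API and tool protocol you actually use):

```python
import subprocess

def run_shell(command: str, timeout: int = 60) -> str:
    """Execute a shell command; return exit code and output as plain text
    so a model can read the result."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"

def agent_loop(ask_llm, task: str, max_steps: int = 20) -> str:
    """Feed the transcript to the model, execute any command it requests,
    append the result, and repeat until it stops asking for commands."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        reply = ask_llm(transcript)
        if reply.startswith("RUN:"):
            output = run_shell(reply[4:].strip())
            transcript += f"\n{reply}\nOUTPUT:\n{output}"
        else:
            return reply  # model considers the task finished
    return transcript
```

Real products layer planning, sandboxing, and error recovery on top of this, but the core control flow is about this small.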



People are saying it’s apples and oranges, but with Computer Use taken into account, this seems like a fair question.

https://docs.anthropic.com/en/docs/build-with-claude/compute...


I wish they offered a computer-use reference implementation on Windows instead of a Linux Docker container.


Any plans or capabilities for something local? Not a locally hosted Devin, mind you, but a way to interact with on-prem source control repos?


Devin really took too long to go GA; they lost a lot of their initial buzz.


Might be an interesting headline if it said what "Devin" is.


You never say what Devin is.


[dead]


I was about to state that there is nothing here that 4o or Sonnet couldn’t do with very limited prompting, then I noticed that the hamburger menu on mobile doesn’t even work and had to retract that statement. Neither would have made such a mistake.

Thanks, this only cements where Devin lies in comparison and explains the lack of benchmarks and independent testing…


Ok, describe your experience.


I'm impressed with Devin's capabilities. It's good at building standard web applications and implementing common patterns. It is particularly effective for enterprises needing basic web pages or solutions that follow established development patterns.

While Devin handles routine development tasks well, it still requires oversight and guidance when dealing with complex integrations or custom business logic. It was helpful in reducing the time spent on boilerplate code and basic setup tasks.


Why does this response sound LLM-generated to me? Maybe it's the phrase "however it's worth noting"


And starting with "based on".

Definitely LLM slop. Shameless.


Account created 17 hours ago.

So dang clumsy. I mean come on.


[flagged]


I understand why you would have that impression, but for what it's worth, my team has been helping other teams internally to use Devin, and we also did due diligence: we experimented with OpenHands and tried rolling our own solution. While OpenHands is cool for what it is, neither it nor our homegrown solution came within a mile of doing what Devin could do.


Are they actually just using 3rd party calls or are they hosting the GPT themselves?


Well yeah, and GPT is just a bunch of matrix multiplications configured in a specific way.

The bells and whistles are what turn it into something useful.


How useful, and at what computing cost? Genuine questions. Because from what I gather there are quite a number of "loops" to check for correctness of output, etc., which gets expensive fast. Not talking about money, but compute.



