The first place I usually go is the terms of service, to see what rights they're granting themselves. I'm not excited about how broad this is: "3.2 License: By using the Services, you hereby grant to Cognition, its affiliates, successors, and assigns a non-exclusive, worldwide, royalty-free, fully paid, sublicensable, transferable license to reproduce, distribute, modify, and otherwise use, display, and perform all acts with respect to the Customer Data as may be necessary for Cognition to provide the Services to you."
"as may be necessary for Cognition to provide the Services to you" kind of makes sense IMO. Does that mean they'll only use the license (note: they only get a license, not ownership) to provide services to you? Is it a restriction?
Yes, that clause/phrase restricts the company's rights with respect to their license to your data. Essentially, a clause like that is necessary for users to be able to interact with the service at all. It makes sense when you think about it: how can they provide the service if they can't use the data you provide them?
It's a pretty typical clause you'll see in most SaaS policies.
Source: I work for a SaaS, but I am not a lawyer, caveat emptor.
I want to pay for their product, but not enough that I have to ask my lawyer about the language. I did see that one of the features of the enterprise plan is custom terms, but that's not the plan I'm interested in.
I always wonder how enforceable these blanket rights would be in court. Didn’t Meta claim to own end users’ photos in the T&Cs back around 2009 and it got challenged and shot down (ianal)?
Original article that caused the outrage. In particular, the TOS did not say they owned your pictures, but it did give them a license that was quite broad, which included using your likeness in advertisements. However, the change that caused the outrage was that the license no longer expired on account deletion nor content removal.
No public testing, no benchmarks, no clear information on context window size or restrictions on heavy use, no comparison with the newest Claude 3.5 Sonnet or o1, nothing.
What we do get is a price of $500 per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.
Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction of the price.
If this were, say, Anthropic launching a new beyond-Opus-size model that was still performant, came with "chain-of-thought" capabilities and a far more extensive context window that still fully passes needle-in-a-haystack tests, was absolutely solid at sourcing from provided files, kept on track even when given large documents, had few or no restrictions on usage, and came with extensive, verifiable benchmarks showing the offering to be a significant upgrade over other models, maybe such a price could be justified.
You know why, Cognition? Because they haven't actively lied. What they did instead was let people use their models and actually test the advantages. Even Claude Instant, way back when, had use cases that gave it its own niche and showed they could execute, before they expanded with Claude 2 and the larger context window, then Claude 3 with more applications. You never did any of that, you never gave anyone reason to believe what you claim, you didn't even release benchmarks. See the difference?
Seems more like a simple cash grab, attempting to ride the o1 wave. OpenAI has a hard time justifying their Pro pricing; you doubling that makes this an out-of-season April Fools' joke. Waiting for the inevitable reporting that this is just another API wrapper around Claude or ChatGPT with our old faithful RAG.
From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".
But these are the kinds of problems that help shape the product. The software architecture should be a compression of a deep and intuitive understanding of the problem space. How can you develop that understanding if you're just delegating it to a black box that can't operate at a near-human level?
I've used AI-based tools to great success, but on an ad-hoc basis, for specific and small functions or modules. Doing the integration part requires an understanding of what abstraction is appropriate where. I don't think these tools are good at that.
Good software can be art. And like all art, we have hit the stage in which code can also be cranked out en masse, thoughtlessly, for a quick buck. It was only inevitable.
Mike from Vesta (first demo video) claims Devin saved "at least a hundred hours" debugging API integrations. That seems crazy to me - API integrations rarely take that long, and any engineer would spot issues like wrong API keys almost immediately. The tool might be more valuable for non-engineers creating initial drafts, but by the time you've written all the detailed specs for Devin, a mid-level engineer could have made significant progress on the task.
I wish API integrations never took that long! But it's dependent on who you're integrating with and what your product looks like. I'm the engineering manager of the payroll integrations team at a company that does workplace savings plans.
Sometimes even when you're making calls to dozens of different endpoints they're easy, but other times, you end up guessing at how to access undocumented functionality within a GraphQL API that has introspection turned off, or working around entity modeling that's completely different from your system and requires a lot of translation. Or you work with an API whose indexes variably start from 1, 0, -1, and -2 in different endpoints. These generally aren't hard technical challenges to solve, and something like Devin that could take care of most surface-level problems you see while integrating with some XML API from 2007 would be welcome.
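For a taste of what that translation layer looks like, here's a minimal sketch (endpoint paths and offsets are invented for illustration) that pins each endpoint's index base in one place so the rest of the codebase can stay 0-based:

    # Hypothetical vendor quirks: each endpoint starts counting somewhere else.
    INDEX_BASE = {
        "/employees": 1,     # first record is index 1
        "/paystubs": 0,      # 0-based
        "/adjustments": -1,  # vendor quirk: starts at -1
        "/audit-log": -2,    # and another one at -2
    }

    def to_vendor_index(endpoint: str, zero_based: int) -> int:
        """Translate our internal 0-based index into the vendor's base."""
        return zero_based + INDEX_BASE[endpoint]

    def from_vendor_index(endpoint: str, vendor_index: int) -> int:
        """And back, so the quirk never leaks past this module."""
        return vendor_index - INDEX_BASE[endpoint]

None of this is hard; it's the discovering and cataloguing of each quirk that eats the hours.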
There are companies like https://www.tryfinch.com and https://www.merge.dev that try to solve these issues, but their abstractions also reduce flexibility and aren't a perfect fit for all HRIS integration use cases right now.
It handles the API-specific complexities (auth, retries, webhooks, per-customer config, pre-built templates) but allows you to implement the exact use case + data model you need.
I doubt Devin could write an integration for an underspecified legacy API like that. Whenever I have had to, I've needed to talk to support/engineering on the other side.
that's definitely still the case. devin drafts my emails for issues it runs into (which i tell it to do) and i send them off.
this is definitely slower than if i were doing it full time, but i run a company. i go from customer meeting to customer meeting and spend 5-10 min a day taking whatever is blocking devin and pasting it into an email to the partner to get a response for devin.
Debugging is a pretty vague word. I know a LOT of api endpoints with shit documentation. Could Devin generate documentation for a vast number of api endpoints that could have theoretically taken a hundred hours to write?
Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.
In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.
I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.
My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?
I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.
Thanks, but that comparison is for old models, a different, non-shipped version of Devin called “Devin-base”, and doesn’t include Claude.
Slack integration, automatically pushing to CI, etc., are relatively low-value compared to the questions of “does it write better code than alternatives?”, “can I depend on it to solve hard problems?”, “will I still need a Cursor and/or ChatGPT Pro subscription to debug Devin’s mistakes?”
This should have come with a prominent warning on the app site that you're heading toward a $500 sub. I'm sure it's mentioned in places I didn't see. Ideally, you would agree to the sub before you even create an account. This could save you LOADS of signups from people who aren't your intended users.
I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that always seems complicated when going from MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.
That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.
my name is Devin and I don't like sharing a name with a product. Will you please consider changing the name?
There is always the chance that someone named Devin will do something that gives your product a bad name. Perhaps some new scandal will involve someone named Devin.
I'd also like you to imagine that a hot new erotic AI was named "Walden", and people said things like "I was talking with Walden last night" as a euphemism. How would that make you feel?
The $50/month personal tier (currently not accepting new users) includes 50 credits per month, and the $500/month teams tier includes 250 credits per month. This is what I see with my current user and when I try to sign up as a new user, respectively.
I'd like to see that $50/month tier reopened to subscribers, and a $0/month + credits tier added: one concurrent active session only, constrained to a small VM spec with an immutable rootfs (regular Devin VMs have a writable rootfs), no automatic knowledge generation, no snapshots, though playbooks would be allowed.
> Even if that version is limited to only editing public Github repos
Not possible to constrain like that with the current Devin architecture.
> For example, if I feed it with the 10 MLoC repository, how long does it take before it can start working through the problem?
The initial scan takes about an hour for a repository that size, and the knowledge-base buildup takes about a week (but that one happens during the coding process). It will not continuously rescan the entire codebase once it has built up its knowledge of the repository.
as someone who has been trying you guys out for the past 8 months... you need a speed lever. default devin is way too slow for me :/ i asked scott for a "demo mode" first time we met
You should really add an option to spawn a VM with an immutable rootfs. The current VMs all have a writable rootfs, which costs a lot to run; immutable VMs could be much, much cheaper to operate (possibly enabling free tiers, even).
Also to mention, "suggest knowledge" modal is broken (it silently ignores changes made if you edit the suggested knowledge).
Another issue: the sleep & snapshot system is still prone to race conditions in certain cases.
It's a fine-tuned version of an o1-preview-sized distillation of o1-pro, if I remember correctly, with an Azure Ubuntu VM that has a writable filesystem and internet access.
Can you only use it with a $500 / month subscription?
The word "try" is VERY different than the actual case, which is "pay for use".
If the answer to the first line is yes, how do I request my email be deleted? I started to sign up but I am not a use case for $500 a month at the moment.
we have plenty of small, early stage teams that use Devin but it's designed to fit into a team's workflow. you can of course give it a try and see if it's a good fit for your projects!
It spends about 2 to 10 ACU per hour on the small VM, and ten times as much on the large one. No credits are spent during sleep and "waiting for response" time, as far as I've observed.
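Some back-of-envelope math on those rates, assuming 1 credit = 1 ACU (an assumption, not confirmed upthread), against the 250 monthly credits in the teams tier mentioned above:

    # How long 250 monthly credits last at the burn rates quoted above.
    # Assumes 1 credit = 1 ACU, which is NOT confirmed in this thread.
    monthly_credits = 250
    rates = {
        "small VM, light work": 2,    # ACU/hour
        "small VM, heavy work": 10,
        "large VM, heavy work": 100,  # "ten times as much"
    }
    for label, acu_per_hour in rates.items():
        print(f"{label}: ~{monthly_credits / acu_per_hour:.0f} active hours/month")

So anywhere from roughly 125 active hours a month down to a couple, depending on VM size and workload.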
Can you upload a picture on the company domain with your face, and holding a piece of paper containing your name, the date, the time, and the current bitcoin block number? That would make us more likely to believe you are properly human.
It seems like a lot of the magic is providing LLMs with tools that let them work like a human would. This approach makes more sense to me than the model of expecting an LLM to just emit a giant block of code for a change, given a pile of RAG context.
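A minimal sketch of what that loop looks like, in case "tools" sounds abstract. The llm client and its next_action API are placeholders; real agents (presumably Devin included) layer far more on top:

    import subprocess

    # The "human-like" tools: run a command, read a file. Real agents add
    # a browser, an editor, test runners, etc.
    TOOLS = {
        "run_shell": lambda cmd: subprocess.run(
            cmd, shell=True, capture_output=True, text=True).stdout,
        "read_file": lambda path: open(path).read(),
    }

    def agent_loop(llm, task):
        history = [task]
        while True:
            action = llm.next_action(history)  # placeholder client API
            if action.tool == "done":
                return action.summary
            observation = TOOLS[action.tool](action.argument)
            history.append((action, observation))  # feed the result back

The key difference from one-shot generation is that the model sees the result of each action (compiler errors, test output, API responses) before deciding the next one.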
( removed pricing q, as I missed it is $500 / month for whole teams. I get why that is the pricing, but doesn't work for me to try it in side projects sadly )
Am I the only one who laments this trend of using a common first name as a product name? When I see this, my first reaction is that the company lacks any empathy for people who have the name they're co-opting.
Not sure about the “rude” part. It really depends on the person. But yes, it can get annoying really fast. Therefore “shitty” indeed. And I do think it is very cheesy and lazy when companies do this. When I talked to someone who worked there, I gathered it was because of the hard consonant “X”. It would make a better Hollywood movie if they said it stood for Artificial. Language. Expanded. Xenomorphic. Amplified. A. L. E. X. A.
Devin comes from "dev in chat", a common phrase in livestream chat rooms to signal that the developer of the game or product being showcased was present.
The short version of my name is one letter away from "Alexa". You can imagine how many comments and jokes about Amazon's AI assistant I've been party to for the past decade. Although it may be hard for you to believe I actually don't really care, much as you probably don't care about the hot dogs bearing your name that you see when you walk down the cold aisle in the grocery store. Should they instead call the anthropomorphized AI assistant something like "W'rkncacnter" to preclude the possibility of name collisions (chaotic entities imprisoned in alien stars notwithstanding)?
My Japanese mom always thought it was weird to attach people's names to destructive forces like hurricanes. I think she said Japan uses some numbering system (it might be as simple as incrementing; I don't remember).
The US did this for a long time -- only numbering storms. In 1953 they switched to a list of names, female only. Then 25 years later to male and female names. It is kinda weird, and if they're destructive enough the name is retired. I think the idea is that people would pay more attention to human names in the warning process as the hurricanes approach land.
When I was 7, my family's Japanese foreign exchange student was being introduced to me.
She burst out laughing, saying my nickname "Dev Dev" sounded like "fart fart" or "fat fart".
Had the nickname fart fart until my sister moved out of the house.
Maybe you could confirm, but ChatGPT tells me that in Japanese, "debu" colloquially and offensively means "fat" or "chubby", and "bu" is an onomatopoeia for a fart noise, like "prrt" in English.
It appears your name is Alex, so I'm not surprised the Alexa product name doesn't bother you. I suspect you would feel differently if your name were Alexa. If the product were named Nate, it would bother me. There are plenty of other options for product names besides common first names.
I think it's different when the product is a tool you call by name to use versus just the name of the tool. E.g., the article is about "Alexa", and I'm not sure most people even realize there are ways to use it without saying "Hey Alexa" every time. Without that kind of callback association, it's not a very serious concern.
I don't care about it potentially being a real name, because I doubt it will be a household item, but somehow the name itself seems off-putting for this particular product.
If it had to be a name for a product, it gives me some sort of cheap male-grooming or AXE body spray vibe.
the workflow is quite different from Cursor or Copilot - Devin is an asynchronous tool. A common way to use Devin is to kick off a few sessions in the morning, while you work on other higher priority tasks. It feels a lot more like working with a colleague that you can tag in Slack or go back and forth with on PR comments
Devin has 2 hour, 6 hour and 24 hour inactivity limits before it pauses the work and temporarily deprovisions (sleep) the VM, so you have to supervise it every so often.
I work with a team at Nubank that has been using Devin. I would say that it doesn't quite make sense to compare it to Claude 3.5, because Devin isn't really like Copilot; it's more like an assistant to which you can assign a project. We're using it only for particular use cases, but for those particular use cases it's like having a superpower.
The reason it makes sense to compare them is there are problems that Claude 3.5 (or o1) can’t solve. Can Devin solve them? If yes, it’s easily worth the $500. If no, it’s a harder sell.
I can’t really be too specific. But I can say that at least one pattern of problem it tackles very effectively is: “we’re migrating from X to Y, and it’s going to touch a ton of files, and the nature of that migration is much more involved than what we can reasonably hope to accomplish with sed and a bash script.”
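To make the "beyond sed" point concrete, those migrations are usually syntax-aware. Here's a toy codemod (all names invented, nothing to do with our actual migration) that renames a call only when it's made on a specific client object, which a regex can't reliably distinguish from same-named calls elsewhere:

    import ast

    class MigrateXToY(ast.NodeTransformer):
        def visit_Call(self, node):
            self.generic_visit(node)
            # Only rewrite x_client.save_legacy(...), not other .save_legacy calls.
            if (isinstance(node.func, ast.Attribute)
                    and node.func.attr == "save_legacy"
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == "x_client"):
                node.func.attr = "persist"       # Y's spelling of the operation
                node.func.value.id = "y_client"
            return node

    tree = ast.parse(open("service.py").read())
    print(ast.unparse(MigrateXToY().visit(tree)))  # requires Python 3.9+

Multiply that kind of judgment call by thousands of files and you see why it's a good fit for an agent rather than a script.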
I tasked Devin with writing a project proposal (in a topic I am not going to disclose here) with multiple documents including feasibility analysis, grant applications, legal analysis and post-implementation training materials and it was almost perfect at it.
i use this every day and a lot of the magic is in the workflow and agent layer -- claude 3.5 can generate a snippet of code for you but it isn't going to open a browser, read api docs, actually make calls to the api, debug, run the code and make sure it builds and works, etc
Anthropic and OpenAI have certainly been working on this behind the scenes; while they see how much better they can get their models, they'll let others pay for the current state until they find it valuable themselves. The shift we're seeing now is already happening, and they're taking an even larger, macroscopic approach by building computer/tool use along with the Model Context Protocol, so that when it's released it will work with almost any IDE and system...
I was about to state that there is nothing here that 4o or Sonnet couldn’t do with very limited prompting, then I noticed that the hamburger menu on mobile doesn’t even work and had to retract that statement. Both wouldn’t have made such a mistake.
Thanks, this only cements where Devin lies in comparison and explains the lack of benchmarks and independent testing…
I'm impressed with Devin's capabilities. It's good at building standard web applications and implementing common patterns. It is particularly effective for enterprises needing basic web pages or solutions that follow established development patterns.
While Devin handles routine development tasks well, it still requires oversight and guidance when dealing with complex integrations or custom business logic. It was helpful in reducing the time spent on boilerplate code and basic setup tasks.
I understand why you would have that impression, but for what it's worth, my team has been helping other teams internally use Devin, and we also did our due diligence: we experimented with OpenHands and tried rolling our own solution. While OpenHands is cool for what it is, neither it nor our homegrown solution came within a mile of what Devin could do.
How useful, and at what computing cost? Genuine questions, because from what I gather there are quite a number of "loops" checking the output for correctness, etc., which makes it expensive fast. Not talking about money, but compute.