Launch HN: Corgea (YC S23) – Auto fix vulnerable code
34 points by asadeddin on Jan 9, 2024 | 43 comments
Hi HN, I’m the founder of Corgea (https://corgea.com). We help companies fix their vulnerable source code using AI.

Originally, we started with a data security product that would detect data leaks at companies. Despite initial successes and customer acquisitions, we frequently heard that highlighting issues wasn't enough; customers wanted proactive fixes. They had hundreds (yes hundreds!) of security tools alerting them about vulnerabilities, but couldn’t afford a dedicated team to go through them all and fix them. One prospect we spoke to had tens of thousands of reported vulnerabilities in their SAST tool. With the rise of AI code generation, we saw an opportunity to give customers what they really wanted.

Having Corgea is like having a security engineer on staff focused on making your code more secure. We want security to be an enabler of engineering rather than a blocker to it, and engineering to be an enabler of security in turn. To accomplish this, we built Corgea on top of existing LLMs to issue code fixes.

To show Corgea’s capabilities, we took some popular vulnerable-by-design applications like Juice Shop (https://github.com/juice-shop/juice-shop), scanned them and issued fixes for their vulnerabilities. You can see some of them here: https://demo.corgea.com. Some examples of vulnerabilities it solves are SQL injection, Path Traversal and XSS.

What makes this tough is that LLMs currently struggle at generalist coding tasks because they have to understand your whole code base, the domain you’re in, and the user’s request. This can lead to a lot of unintended behavior where they code things incorrectly because they’re giving a best guess at what you want. Adam, one of the founding engineers on the team, put it well: LLMs don’t reason, they fuzz.

We made several decisions that helped make the LLM more deterministic. First, what we’re doing is extremely domain specific: vulnerable code fixes in a limited number of programming languages. There are roughly 900 security vulnerabilities in code, called CWE’s (https://cwe.mitre.org/), that we’ve built into Corgea. An SQL injection vulnerability in a Javascript app is the same regardless of whether you’re a payments company or a travel booking website. Second, we have no user-generated input going into the LLM, because SAST scanners provide everything needed to issue a fix. This makes it much more predictable and reproducible for us and customers. We can also create robust QA processes and checks.
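To give a sense of what goes into the model: a SAST finding typically boils down to a file, a line range, and a CWE identifier. The field names below are purely illustrative (not any particular scanner’s schema), but they show why no free-form user input is needed:

  finding = {
      "cwe": "CWE-89",                    # SQL injection
      "file": "app/views.py",
      "start_line": 42,
      "end_line": 43,
      "message": "User input concatenated into a SQL query",
  }
  # Conceptually, the finding plus the few surrounding lines of code go to the
  # LLM, which returns a small, reviewable patch for just that span.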

To illustrate the point, let’s put some of this to the test using some napkin math. Assume you’re serving 5,000 enterprises that ship on average 300 domain specific features a year in 5 different programming languages that each require 30 lines of code changes across multiple files. You’ll have about 300m permutations the product needs to support. What a nightmare!

Using the same napkin math, Corgea needs to support the ~900 vulnerabilities (CWE’s). Most of them require 1 - 2 line changes. It doesn’t need to understand the whole codebase since the problem is usually isolated to a few lines. We want to support the 5 most popular programming languages. If we have 5,000 customers, we have to support ~4,500 permutations (900 issues x 5 different languages). This leads to a massive difference in accuracy. Obviously, this is an oversimplification of the whole thing but it illustrates the point.

What makes this different from Copilot and other code-gen tools is that they do not specialize in security and we’ve seen them inadvertently introduce security issues unbeknownst to the engineer. Additionally, they do not integrate into existing scanning tools that companies are using to resolve those issues. So unless a developer is working on every part of the product, they’re unable to clear security backlogs, which can be in the thousands of tickets.

As for security scanners, the current market is flooded with tools that report issues and overwhelm security teams but are not effective at fixing what they’re reporting. Most vulnerability scanners do not remediate issues, and if they do, they’re mostly limited to upgrading packages from one version to another to reduce a CVSS score. If they do offer CWE remediation capabilities, their success rates are very low because they’re often based on traditional AI methodologies. Additionally, they do not integrate with each other because they want to serve only their own findings. Enterprises use multiple tools like Snyk, Semgrep and Checkmarx, but also have a penetration testing program and a bug bounty program. They need a solution that consolidates across their existing tools. They also use Github, Gitlab and Bitbucket for their code repositories.

We’re offering a free tier for smaller teams and paid tiers above that. We believe we can reduce the engineering effort for security fixes by 80%, which would equate to at least $10m a year for enterprises.

We’re really excited to share this with you all and we’d love any thoughts, feedback, and comments!




Your sample fix for the ssrf bug is wrong: it ignores IPv6 and DNS returning localhost or other interesting things on the network. Really there isn't a great answer without knowing something about the network or not having the feature.


I worry that this tool will rewrite code such that the security scanning tool can no longer detect the problem, but won't actually fix it (as above). This ends up being an adversarial system that makes it even harder to detect the vulnerabilities left behind. If the generated patches are reviewed by non-experts, these details will be missed.

Edit: To highlight a specific problem here: a classic target for SSRF is the instance metadata IP address[1]. This IP address is not on the generated blacklist. Worse, you've made it harder to detect this problem in the future.
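To make that concrete, here's a rough sketch of the failure mode (illustrative only, not the demo's actual code):

  from urllib.parse import urlparse

  BLOCKED_HOSTS = {"localhost", "127.0.0.1"}  # the style of list the generated fix relies on

  def naive_is_safe(url: str) -> bool:
      # String-matching the hostname misses exactly the requests that matter:
      #   http://169.254.169.254/latest/meta-data/  (instance metadata)
      #   http://[::1]:8080/                        (IPv6 loopback)
      #   http://internal.example/ where attacker-controlled DNS resolves to a private IP
      return urlparse(url).hostname not in BLOCKED_HOSTS

  print(naive_is_safe("http://169.254.169.254/latest/meta-data/"))  # True: request goes through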

I don't want to recommend a fix here; you're selling the fix. You should consider hiring a security expert to determine if an LLM is really up to this task.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance...


Thanks for commenting, and it is a great point. Keep in mind, we're not touting that Corgea is a cure-all for every vulnerability. CWE's typically have simpler and more standardized fixes. You can see examples of that here: https://cwe.mitre.org/

A few things we're doing to combat this:

1) We've given the entire corpus of CWE's to Corgea, along with how to fix things safely. From our testing and users, we've found this does a really good job. I personally QA a lot of results (in the thousands) and we've not seen that to be a common problem.

2) Corgea is designed to require two sets of eyeballs: the first is the security engineer and the second is the developer that reviews the PR. We hope that issues like this get caught there. Additionally, we believe our fixes will be better than what a developer does on their own. There are over 900 CWE's, and it's really hard for engineers to know how to fix every issue. Googling answers and asking ChatGPT can lead to them introducing issues.

3) We provide in-product AI-generated explanations of the fix and why it was appropriate. This is to educate non-experts on the topic.

4) We already have checks in place to make sure things aren't misbehaving, but we're soon rolling out a more advanced fix checker to make sure we didn't introduce any new vulnerabilities.

5) Finally, we QA a lot every week, and run reports on the areas we're good at or not so good at to help us iterate.


I guess I'm not convinced. This is a demo that should have been chosen to show the product in the best light, but it doesn't fix the problem. Was the demo reviewed by QA?

I really like the idea and I think that you're right about the goal being to fix bugs "better than the average engineer". I don't think you've reached that bar.


Looks like this is not the only problematic example. For example, https://demo.corgea.com/338 makes sure you don't try to get ctf.key (but not .env, for example). Another issue: https://demo.corgea.com/531# The LLM makes up a usage of shell=True despite the original “vulnerable” code not using it.

Well, at least they are showing a real demo and not some made up results.

I think that overall the idea has some potential, but not sure we are there yet.


Thanks for the feedback!

For the first one: the SAST scanner reports issues to us based on lines and issue type, so we generate fixes isolated to that issue. We do not generate fixes for other vulnerabilities in the same file as part of the same finding because we want to have one fix per finding. The other issue might be reported as a separate finding, and we plan on allowing people to group fixes in the same file together.

Not sure if I'm missing something on the shell=True. It's in the vulnerable code, which is why Corgea changed it. You have to scroll to the right in the code viewer. https://github.com/RhinoSecurityLabs/cloudgoat/blob/8ed1cf0e...

Is there something I'm missing?
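For anyone following along, the flagged pattern is roughly this (a simplified sketch, not the exact cloudgoat source):

  import subprocess

  filename = "notes.txt; id"  # imagine this comes from user input; ';' smuggles in a second command

  # Flagged: with shell=True, the interpolated string is run by the shell,
  # so shell metacharacters in the input become extra commands.
  subprocess.run(f"cat {filename}", shell=True)

  # Typical fix: pass an argument list and drop shell=True, so the input is
  # treated as a single argument rather than shell syntax.
  subprocess.run(["cat", filename])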


For the first issue: I understand. Thanks.

As for the second, there is no shell=True for me in the demo, but it is present in the code you linked. So maybe it is just a bug in the presentation somewhere.


Scrolling to the right should work, but you'll need to do so on each code editor section. We should combine scrolling of these two windows to be in sync.

We'll also take a look at what's causing this. It might be a browser issue.


They scroll in sync for me, but long lines seem truncated in iOS 16.2 Safari. No visible code on that second linked page includes the string in question.


Thanks for sharing! Will look into it :)


Same here, must be a bug in the view, for me it's missing the closing parenthesis as well.


Yeah, it doesn’t fix the issue at all. Rough to have a security product demo be fundamentally insecure.


I'll respond to this comment to provide a general response for all of the sub-comments here.

As I highlighted in my post, LLM's generally are still not in a position to replace a developer for more complex tasks and refactoring. We're in the early days of the technology, but we've seen extremely strong improvements in it over the last year. We on the team have QA'd thousands of results for public and private repositories. The private ones are particularly interesting because the LLM's do not have them in their corpus, and we've seen very strong fix results there.

Most people just assume we're a wrapper around an LLM, but there's a lot that goes on under the hood to ensure that fixes are going to be secure and correct. Here are the standards we're setting for fix quality:

- The fix needs to be best-practice and complete. A partial security fix isn't a security fix. This is something we're constantly working on.
- Supporting the widest coverage of CWE's.
- Not introducing any breaking changes in the rest of the code.
- Understanding the language, the framework being used, and any specific packages. For example, fixing a CSRF issue in Django is different than in Flask. Both are Python frameworks but approach it differently (see the sketch after this list).
- Reusing existing packages correctly to improve security, and if it does need to add a package, doing so in a standard way.
- Placing imports in the correct part of the file.
- Not using deprecated or risky packages.
- Avoiding LLM hallucinations.
- Ensuring syntax and formatting are correct.
- Following the coding and naming conventions in the file being fixed.
- Making sure fixes are consistent within the same issue type.
- Explaining the fix properly and clearly so that someone can understand it.
- Avoiding assumptions that could cause problems.
- Not removing any code that is not part of the issue.
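To make that Django-vs-Flask point concrete, here's a rough sketch (not Corgea output; Flask-WTF is just one common option):

  # Django: CSRF protection ships in core. The middleware
  # "django.middleware.csrf.CsrfViewMiddleware" is enabled by default and
  # POST forms add {% csrf_token %} in the template.

  # Flask: nothing is built in; a common approach is the Flask-WTF extension.
  from flask import Flask
  from flask_wtf.csrf import CSRFProtect

  app = Flask(__name__)
  app.config["SECRET_KEY"] = "change-me"  # needed to sign the CSRF tokens
  csrf = CSRFProtect(app)                 # POST requests must now carry a valid token

So the "same" CWE can need a very different patch depending on the framework in use.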

Our goal is to get to 90% - 95% accuracy in fixes this year, and we're on a trajectory to do that. I will be the first to say 100% accuracy is impossible, and our goal is to get it right more times than engineers would.

We take fix quality and transparency extremely seriously. We'll be publishing a whitepaper showing the accuracy in results because it's the right thing to do. I hope this helps.


LLMs writing code are fundamentally insecure. This product is completely batshit insane and I'd fire any vendor I knew used it.


Agreed that there’s no way to do this meaningfully and securely.

Looking forward to the archeological audits of LLM-developed apps x years from now that are a total mystery to the product owners…


Thanks for commenting. We're always trying to learn more and iterate to make Corgea better. How should the fix have looked?


If you don't know that - or rather, if nobody on your team recognized this issue and brought it up - you should not be selling and shipping this product.


Oh my amen.


Thanks for highlighting this.

This fix is meant to demonstrate more sophisticated fixes, and it does require human input to determine the correct domain and IP config. We are introducing the ability for humans to add additional context before fix generation, provide feedback to generate a new fix after one has been generated, and edit the proposed fix. Users have asked for these tools because of scenarios that require more insight.


What's vulnerable for one company may not be for another. It needs a lot of context. The low-hanging fruit is updating dependencies, but code fixes are very tricky. Generic best practices are not very valuable, as they might conflict with the code context, and generated code might create confusion.

Historically, generated code and human-written code have often been kept separate. With Copilot, the tool is still only assisting. But here, it is generating code and replacing existing code, which impacts ownership of the code, and the value is not so clear. I think the PRs it generates would end up not being merged.

Having said that, it is a great hook to get attention, but I think you might fail to deliver meaningful value.


For your first point, we're not responsible for detecting what is vulnerable (or even exploitable). Today there are 4 categories of vulnerabilities in software:

1 - CVE's, which you've mentioned around updating dependencies. This requires the least amount of context to detect, and it's the easiest to change, but it requires a lot of context to fix downstream dependent code.

2 - CWE's, which are common weaknesses in software. This requires a medium amount of context for detection and a small to medium amount of context to fix.

3 - Business and code logic flaws. This is currently unserved by most tools, and this is where the wide variance between code bases is. This requires a lot of context to both detect and fix, and it's what most people think of when it comes to your first point.

4 - Misconfiguration of environments.

Currently, we've focused on CWE's because of how focused and isolated some of the fixes are in relation to items #1, #3 and #4. We've run thousands of tests and see a very high accuracy in results. We do have plans to support #1 after we feel accomplished with #2. This requires more sophisticated tools and logic to handle upgrade changes safely.

At the moment, the responsibility is on the SAST tools to perform these detections. We've heard a lot of complaints about false positives, and it's probably one of the biggest problems in the industry. We have future plans to tackle both detection and prioritization, but that's a separate thing.

To comment on your second point: with Dependabot and other code-gen products like ours, code ownership will be impacted. I believe our understanding of code ownership will change fundamentally as more of these tools come out and get adopted. One clarification point: Corgea doesn't auto-issue PR's like Dependabot does. Someone needs to look at the fix before issuing a PR.

Thanks for commenting, and it's definitely a different perspective on meaningful value. For other code-gen tools like Copilot, where do you think the value is then?


I like the idea of this, but in a way it seems like going on to a website to enter your password to see if it was involved in any leaks. And that makes me uneasy.

A system like this would be so much better if all the scanning was done locally, keeping the source private from leaking at all.


Thanks for commenting, and totally get your perspective on it.

Scanning today can be done locally with many tools like Semgrep before you use Corgea. We do send vulnerability information over to Corgea to make sure we can issue fixes reliably and at scale. Keep in mind repos can have vulnerabilities in the thousands or even tens of thousands, so it's not as simple as Copilot running in your IDE reading your current lines of code. We have to be able to do this at scale.

Finally, we've put a lot of effort into securing things down and you can read some of those details here: https://docs.corgea.app/security


> We help companies fix their vulnerable source code using AI.

I think like 95% of the general "vulnerability market" exists because companies have assets they don't own the codebase of, and have to wait for and test patches when they finally arrive.

> It doesn’t need to understand the whole codebase since the problem is usually isolated to a few lines.

I'm not a terrific coder, but isn't this a pretty risky simplification? It's a very common occurrence that a minor, one-line change breaks something in a whole different part of the codebase.


Thanks for the comment.

- I would agree that a big chunk of the vulnerability market helps with things that companies don't own, but I'm not sure it's that high. A lot of companies use SaaS tools and deploy tools they don't own on their private cloud that they have to wait on patches for. Our perspective is that a lot of the tools that emerged in the last few years center around detection, and very little on remediation. We didn't want to be another tool that contributes to alert fatigue. With budgets tightening, tougher scrutiny of security and increasing threats, companies need automation in remediation.

- We've designed Corgea to always have a human in the loop. Corgea doesn't push code automatically to prod. It creates a PR when someone clicks the button to do so, and an engineer reviews the PR to ensure nothing breaks. Almost every company has those controls in place. Additionally, for the vast majority of cases the fixes are safe and don't lead to dependency issues further downstream, and for the others we will be building logic to account for that. For example, this SQL injection fix requires you to parameterize the inputs correctly, which is a one-line code change and doesn't have downstream dependencies: https://demo.corgea.com/501
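For reference, the kind of one-line change I mean looks roughly like this (an illustrative sketch, not the exact demo code):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE users (name TEXT)")
  user_name = "alice"  # imagine this comes from a request parameter

  # Vulnerable: user input formatted directly into the SQL string.
  conn.execute(f"SELECT * FROM users WHERE name = '{user_name}'")

  # Fixed: the value is passed as a bound parameter instead.
  conn.execute("SELECT * FROM users WHERE name = ?", (user_name,))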


> A lot of companies use SaaS tools and deploy tools they don't own on their private cloud that they have to wait on patches for

Don't forget ~85% of global compute is still on-premise stuff.

> With budgets tightening, tougher scrutiny of security and increasing threats, companies need automation in remediation.

Yes, but the main bulk of vulns for enterprises and SMB+ (as in, the orgs that actually do security spending) are in products they don't own the codebase of. Windows, Redhat, Cisco, Confluence, Jenkins, and more recently also solutions like Okta, Forgerock and others are getting exploited via vulns for the benefit of attackers.

I'm not trying to be a dick btw, but I think you're confusing the market you're trying to play in. You're selling something closer to a dev tool than a security tool, and talking about detection and such doesn't really concern this area.


No worries. I think we might be talking past each other, and that's ok :). I believe we're defining the market differently. It sounds like you're talking about vulnerable code in other apps, and I'm talking about vulnerability detection in general in security. Is that correct?

For code you own, control or have oversight over, we can help. Otherwise, we can't.

When I mention vulnerability detection, I mean that generally as "findings". For example, our original product was detecting data leakage issues in SaaS tools like Slack, Snowflake, JIRA, etc. We could detect someone pasting credentials by mistake to a colleague in Slack because they wanted to share logs. This is a human-caused problem facilitated by not properly sanitizing logs. I include these when I'm talking about the market.

The comment about the dev tool vs security tool is interesting. How would you categorize Snyk?


Gotcha, alert fatigue is usually associated with threat detection hence the qualys

> How would you categorize Snyk?

I'm not a heavy user, but I believe they position themselves as "developer security" (I think it's even in their slogan).


I believe we're in a similar category. A telling sign of what category a company belongs to is based on what conferences and events they're sponsoring and attending. Snyk and the likes sell to the security teams with dev friendliness in mind. We're aiming to do the same here.


I get why people can be a bit apprehensive about using AI tools for pull requests because of hallucination, but this is such a great application. I'll give it a spin on some of my Django boilerplates to see what it comes up with. Congratulations to the team!

My question would be: are you using it on your own codebase or on an open-source tool you're fond of? I'd love to see this operating in the wild (examples are great but real-life PRs hit different).


Thank you! Please give it a spin. We'd love any feedback or thoughts. :)

We are using it on our codebases, and it's helped us secure our own product. Users have also been trying it out with their private codebases, and we even used our own personal projects to test it.

If you'd like to try Corgea with some open-source tools, there are a ton of applications that are vulnerable by design. Some popular ones:

https://github.com/bkimminich/juice-shop
https://github.com/we45/Vulnerable-Flask-App
https://github.com/adeyosemanputra/pygoat

Edit: Forgot to mention, we've put in some controls to avoid hallucinations, like comparing diff sizes between the two versions. Sometimes LLM's like to truncate code when generating a fix, or generate too much. In those cases we stop the result from being returned and retry.
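Roughly, that diff-size guardrail works like this (a simplified sketch of the idea, not our actual implementation):

  import difflib

  def fix_looks_suspicious(original: str, fixed: str, max_ratio: float = 0.35) -> bool:
      # A CWE fix should touch a handful of lines. If the model truncated the
      # file or rewrote most of it, the changed-line ratio blows up and we retry.
      orig_lines = original.splitlines()
      diff = difflib.unified_diff(orig_lines, fixed.splitlines(), lineterm="")
      changed = sum(
          1 for line in diff
          if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
      )
      return changed / max(len(orig_lines), 1) > max_ratio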


I ran the command-line and got:

  Running scan with commmand 'snyk code test --json'
  Finished running scan.
  Uploading results to Corgea.
  Scan upload finished.
  View results at: https://www.corgea.app
but nothing appears in my dashboard. How long must I wait?


My apologies for taking too long to reply, and thanks for being patient. I must've missed this in the thread. I'm not sure what is happening. Things look correct.

Could you reach out to us at support@corgea.com with your account? We're happy to help you resolve this.


Would be interested to see y’all compete in DARPA’s AI Cyber Challenge. https://aicyberchallenge.com/


Thanks for sharing this. We're aware of the challenge and we're considering competing. :)


I help with some small open-source things. Would this be something I can use to scan a public GitHub repository and see what it finds?


You can use one of the existing scanners we support like Semgrep and Snyk to scan, and use Corgea to issue pull-requests for the fixes. We will support scanning in the future with some advanced capabilities.


Congrats on launching!


Thank you!


This looks awesome. Congrats on the launch


Thank you!


Very smart, seems super useful. Congrats on the launch


Thank you!



