I really like his quote: "a well-formed crap report is harder and takes longer to discard".
I think that cuts to the core of why people feel betrayed when they suspect they're being fed unlabeled AI content. You see the well-formatted paragraphs, the precise explanations, and you naturally extend a bit more effort in reciprocation. There have always been junk bug reports, but they used to look like what they were.
Well, I agree. In my experience, lots of reports that looked like junk were actually real. I've seen plenty of security reports with nonsensical explanations and English so broken you can't follow it, and then you run their PoC and it shockingly works.
Triaging security reports is exhausting and very hard.
IOW, older heuristics (meant to screen out below par human work) no longer function. Newer AI-specific heuristics are needed. Which is precisely what we're all developing, often without realizing.
Unfortunately there's a bit of a perverse feedback loop in that Pinocchio wants to be a real kid.
> That being said, in cases where the check may be bypassed or in a different implementation scenario, similar vulnerabilities can still appear.
This is so funny. "Oh, I see you have a bounds check that prevents vulnerability. BTW, if you remove that bounds check, your code will be vulnerable!!"
We get bug bounty reports like that sometimes. "I think your site might have an XSS vulnerability but your WAF is stopping it." "What I hear you saying is that we don't have an XSS vulnerability."
I mean, it's possible we do have a mistake in code somewhere we haven't found yet, but if the system effectively protects it, that's not a vulnerability.
This WAF scenario is different: a WAF is a porous, unreliable last line of defense, like anti-virus. Having a mitigation stop something != not having the vulnerability in the first place.
I formerly worked in triage for a bug bounty program. We paid attention to these kinds of reports because it's often possible to bypass the WAF, or at least repurpose the vulnerability in a way the WAF wasn't designed to defend against.
Absolutely! If you have a known SQL injection behind a WAF, you better go fix it! It seems like these reports come down to the equivalent of “I pasted HTML into a form and you displayed the escaped version back to me, but maybe you forgot some tag.” No, I’m not going to turn off our WAF so you can test that hypothesis.
> No, I’m not going to turn off our WAF so you can test that hypothesis.
It would be worth your while to test it. You could run a dev/testing version of your app on a separate domain, without a WAF, and without any sensitive data held on it.
WAFs are a last resort to catch the bugs you didn't know about, and your application should still be safe without a WAF; otherwise you're not actually getting the defense-in-depth you wanted. For an attacker who cares enough, WAF bypasses are a near inevitability.
It may be worthwhile to test, but the strength of "I see this field is correctly encoded but maybe hypothetically it could be your WAF protecting a vulnerable application. My sole supporting reason for this hypothesis is that if it is true, your bug bounty program will pay out for me" is, as vulnerability signals go, too uselessly weak to act on.
Bug bounty programs are nifty in that they give real researchers an effective outlet for the things they were quite possibly going to discover anyhow, but part of the price of that is you get a lot of submissions from people basically treating it as a system for spraying bug bounty programs with lottery tickets with low effort.
I'm kind of curious: do these bug bounty "spray and pray" tactics actually make money? I can't help but wonder if people are doing it because it works, or if it just looks like it should work and people are desperate.
Exactly. It's basically spam: there's nearly no cost to send it, so even an abysmal success rate is likely to return a fat profit.
I've heard that the average reward is about $500. You can afford a lot of rejections per success at that rate.
Never mind that you're destroying the effectiveness of those programs, driving staff nuts, and generally making the world less secure; that's their problem, right? (Sarcasm, obv.)
Please give me the benefit of the doubt and assume that we actually do test things. As I assume you’re good at what you do, so am I. The WAF thing was an example of the sort of report we commonly get: “if you turn off all the mitigations, it may be possible to attack you!” Yes, but we use defense in depth so that the mitigations help cover any gaps we might’ve missed, and if the end result isn’t exploitable, it isn’t exploitable.
Just like in the original report here: “if you turn off those checks, it could be vulnerable!” “Yes, but we’re not about to do that because we’re not insane.”
The curl report (incorrectly) describes missing bounds checks leading to buffer overflows.
If the curl project said "buffer overflows are ok because our code gets compiled with ASLR and NX", then that would be comparable to saying SQLi and XSS are non-issues due to a WAF. Fortunately, that's not what they said.
It was an example, an analogy, an illustration. We're not dependent on our WAF any more than curl is dependent on ASLR and NX. We (and curl) use those things in addition to good coding practices so they all work together to create as secure a product as our respective teams can manage.
I got hit with a CVE spammer last year which forced me to re-release a library to fix a fake vulnerability that cannot be disputed in any way -- https://gist.github.com/6en6ar. They do nothing but post exploits where long strings purportedly take a long time to process -- because they are slow to construct! And even more fun, the example exploits often don't even compile.
I assumed it was a human, but I'm starting to wonder if this was an early CVE-farming AI bot.
The CVE process is very broken and extremely exploitable as a DoS against humans.
> exploits where long strings purportedly take a long time to process -- because they are slow to construct!
That’s not an accurate characterization of ReDoS. Even if a long string is required to produce the behavior, the vulnerability is that the string takes a disproportionately long time to process even for its length, such that it becomes disproportionately easy to bring down a service. The CVE scoring system gives denial of service way too much weight if you ask me, but it’s not a fake vulnerability.
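ReDoS specifically comes from pathological regex backtracking, but the "disproportionately long even for its length" part is just algorithmic complexity. As a toy illustration (nothing to do with any particular library), here is a deliberately quadratic loop in C where doubling the input size roughly quadruples the processing time:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Deliberately quadratic: each strcat() rescans the growing destination,
       so total work grows with the square of the input length. */
    static void append_all(char *dst, const char *src, size_t n)
    {
        dst[0] = '\0';
        for (size_t i = 0; i < n; i++) {
            char piece[2] = { src[i], '\0' };
            strcat(dst, piece);
        }
    }

    int main(void)
    {
        for (size_t n = 25000; n <= 100000; n *= 2) {
            char *src = malloc(n);
            char *dst = malloc(n + 1);
            if (!src || !dst)
                return 1;
            memset(src, 'a', n);

            clock_t start = clock();
            append_all(dst, src, n);
            double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

            /* Doubling n roughly quadruples the time: "disproportionate". */
            printf("n = %6zu  ->  %.3f s\n", n, secs);
            free(src);
            free(dst);
        }
        return 0;
    }

That growth curve is what makes it cheap to knock a service over with a modest amount of input; a function that is merely linear-but-slow doesn't have the same problem.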
However, I think there are people spamming fake ReDoS vulns just to get cred. Things like: yeah, if you passed in 10 MB of input this would be a problem, but some other layer limits it to 100 bytes.
ReDoS is real in certain circumstances, but it is way, way overhyped and usually bullshit.
The code purported to be vulnerable is not slow because of the length of the string. The "example exploit" is slow because the reports use slow methods to construct the string under test. When timing the affected methods, they are _not_ slow.
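To make the distinction concrete, here is roughly what a fair measurement looks like; parse_input below is just a stub standing in for whichever function a report targets, not the real library API:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Stub standing in for the function a report claims is slow;
       replace with the real call under test. */
    static size_t parse_input(const char *s)
    {
        return strlen(s);
    }

    int main(void)
    {
        const size_t len = 10 * 1024 * 1024; /* 10 MB input */

        /* Build the input *outside* the timed region. Constructing a huge
           string (especially by repeated concatenation, or while printing
           it) can easily dominate the measurement and get misattributed
           to the function under test. */
        char *input = malloc(len + 1);
        if (!input)
            return 1;
        memset(input, 'a', len);
        input[len] = '\0';

        /* Time only the call under test. */
        clock_t start = clock();
        size_t r = parse_input(input);
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        printf("parse_input returned %zu in %.3f s\n", r, secs);
        free(input);
        return 0;
    }

Measured that way, with the input built (and not printed) outside the timed region, the methods these reports point at are not slow.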
Can you explain how a CVE "awaiting analysis" forced you to "resolve" it? Just wondering how this works.
I looked at the first one (https://nvd.nist.gov/vuln/detail/CVE-2024-25355) and obviously they time way too much of the "vulnerability", attributing their own test setup to your library (as you mentioned).
So, someone reports a CVE (can just anyone do that?) and, without any validation, some other service dings your library, and now random users ask you "can you please fix your library"?
That wasn't the library I had an issue with but the general problem is this:
Downstream consumers of a library that have integrated Dependabot get alerts for CVEs filed against the library, even if they are "awaiting analysis". Those consumers send messages asking for a resolution, and there's no trivial way to push back that an advisory is false.
For example, here's the one I'm griping about. This is marked as _Github reviewed_:
I used the reporter's reproduction and could not reproduce the slowdown at all. It turns out that the testcase was slow only because they were printing the URL under test.
As a maintainer, I have a choice: either I need to go and clean up all of the automated tools that respond to CVE spam, OR I just release a new version of a library, fuck it all and move on with my life after blocking the reporter.
For what it's worth, Github did not respond to reports about this user, so I got to the point where I think everything is broken and I no longer care about anything other than clearing those alerts out.
Could we introduce a real monetary cost to posting a bug? That'd disincentivize making massive amounts of noise, but still allow people to compete for rewards with worthwhile answers.
I would rather have some sort of authenticated peer-rating system. If someone has a history of making good, useful bug reports, then convey that in some way to the project maintainers reading their reports.
I'm afraid any peer-rating system would be susceptible to the same kind of point manipulation other platforms are prone to, slightly increasing the latency but not the amount of spam. Returnable tokens worth real money would make this category of attack completely infeasible.
I can kinda understand the motive for carpet bombing an issue tracker with AI slop hoping for a hit, but then to whine about unprofessional responses is really too much. Time to unplug the chatbot.
Not to mention that the first couple of responses were very professional. At some point you just have to call a spade a spade, or not suffer fools, or whatever expression you prefer.
It sounds like a non-native English speaker to me. I've worked with many non-natives and some speak like that. Of course they also have enough self-awareness to know their English isn't good; some of them have, with much effort, gotten good in a few years.
I also understand the "motive": money. If you can spam enough maintainers with bullshit vulnerability reports, there's a chance you'll hit something real or trick a maintainer and get a pay-out.
This type of thing is my biggest AI fear. It’s just too easy to produce bug reports, twitter posts, academic papers, entire books and audiobooks, using AI. While the results are almost entirely trash, we can’t force humans to take the time to categorize and reject them, as there isn’t enough time.
The only fix I can think of is going to be to introduce trust models, where you can vouch for people and their outputs, and people can trust their friends, or particular lists. PGP keys aren’t the technical answer (because it’s a mess), but I think something more modern in that area might be needed.
There should be consumer protection laws in place that require AI companies to record every result their tools produce. Then when some trash human submits the AI output to a bug tracker, forum, etc. with a negative impact on the victim website, they can submit a claim to the AI company for a $100 million payment due to the inability to prevent the tool from being used improperly.
Having contributed my small grain of sand to the curl project in the past, I can only say I have huge respect for bagder. He was impressively active on all the issues I worked on, and even though I was a newbie and his language came across as a bit terse in comments, he was never wrong or disrespectful. I can't imagine how much strain AI slop is putting on curl maintainers, so I hope there's a solution to that in the near future.
Can the title be changed? It is the exact title of the "report", but it's not a real report, and that's the story here. Keeping the real title of the fake AI-slop-report only adds to the confusion.
It's hard to do a neutral, descriptive, non-editorialising, non-clickbait title, but perhaps along the lines of:
> Curl maintainers lose time dealing with AI-generated fake security report
I wholeheartedly side with Bagder here. I felt his response in my soul.
Even before AI, everyone wanted to report a small problem that could be completely insignificant, and with mountains of those reports piling up, it becomes increasingly difficult to find the REAL issues being reported by non-technical users who can't do proper PRs.
Unfortunate that they deleted it and didn't / couldn't lock the account and flag all its submissions as suspect while leaving them in place for review.
We're also seeing symptoms of this in software support. Specifically with flatpak for example, as people start to recommend or wonder about flags (that have never existed) not working.
The maintainer vs. AI issue aside: in C, when you write a function that takes a dst pointer and a size_t size argument, is it enough to check the size argument to be sure the data will fit in the dst buffer? It doesn't seem like that would protect against mistakes; for example, if I allocate 100 bytes and call the function with 200 for the size_t, it will pass the check but potentially write out of bounds anyway?
I guess what I'm trying to say is: would it not be safer and more ergonomic to use some struct of pointer+size and pass that around just like higher level languages pass around arrays with a known length? That way the invariant that the size of the allocated memory actually matches the size field can be preserved?
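Roughly what I have in mind, as a minimal sketch (buf_t, BUF_FROM_ARRAY, and buf_write are names I'm making up purely for illustration):

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* A pointer plus the size of the allocation it points to, kept together
       so the two can't drift apart. */
    typedef struct {
        unsigned char *ptr;
        size_t len;
    } buf_t;

    /* Building a buf_t from an array keeps ptr and len in sync. */
    #define BUF_FROM_ARRAY(a) ((buf_t){ (a), sizeof(a) })

    /* The callee checks against the size carried with the pointer,
       not a size the caller asserts separately. */
    static int buf_write(buf_t dst, const unsigned char *src, size_t n)
    {
        if (n > dst.len)
            return -1;           /* would overflow, refuse */
        memcpy(dst.ptr, src, n);
        return 0;
    }

    int main(void)
    {
        unsigned char storage[100];
        buf_t dst = BUF_FROM_ARRAY(storage);

        unsigned char payload[200] = {0};
        /* 200 bytes into a 100-byte buffer is rejected, not overflowed. */
        assert(buf_write(dst, payload, sizeof(payload)) == -1);
        return 0;
    }

Nothing in C enforces that invariant for you, of course; it only holds as long as every buf_t is built through something like BUF_FROM_ARRAY (or an allocator wrapper) rather than by filling in the fields by hand.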
It is too late to do that. C was built in the 1970s, when buffer overflows were only exploited by friends to get a laugh, so they were not security issues.
Apart from the LLM bs, it looks like spammers exploit the fact that they can submit bug reports at little to no cost, while the maintainers have to spend a significant amount of time to triage them.
If this spamming goes out of control, then I think it will be inevitable that maintainers will need to charge money to triage bug bounty reports to balance this out somewhat. This would obviously suck for all legitimate parties involved.
Actually, this could be an attack in itself. Say you find a significant 0-day in some popular library and start exploiting it, but at the same time you bury the reporting system in generated false leads and reports of non-existent issues, taking time away from the actual issue while you actively exploit it...
I mean, if the reporter actually tried following the PoC they might have realized it's not a valid PoC. (Though to be fair, perror does return a confusing error message in that case).
At first I thought they were trying to make the claim that if you lie about size, you can (duh) get a buffer overflow, but they didn't even successfully manage that.
People who file reports like this should be permanently banned from HackerOne and excised from the security community. They are parasites, leeching off of the time of open-source maintainers.
I can’t believe how nice bagder is on the thread, frankly. He is well within his rights to be much harsher.
Dude is creating capital and asset funds for a business that doesn't yet exist. I'm afraid we've got a clinical case of Silicon Valley brain worms here, folks.
If I found a really valuable exploit that I wanted to keep unfixed for at least a few months, then why not swamp the project with false positives for a period of time, so that my exploit, if it's found, is just one of hundreds being reported, and IF it's reported, there's a chance it gets drowned out or even accidentally dismissed.
Perhaps I'm too optimistic, but a legit exploit will have a very different style: perhaps broken English and 50% code / 50% text instead of 10% code and 90% text. I guess a fast triage pass to detect the non-cookie-cutter reports would be enough (but annoying anyway).
I found his custom version of ChatGPT by searching for "napol-webug"; it seems to be the user's name. I had more conversations with this custom ChatGPT, but this is basically it:
You said:
Translate those to English
ChatGPT said:
Here are the custom instructions translated into English:
Chat in Thai only: Respond to all messages in Thai unless explicitly asked in English.
Pronouns: Refer to myself as เค้า (similar to "I" in Thai) and address you as เทอ (similar to "you" in Thai), avoiding other pronouns.
Ending particles: Use คับ, งับ, ฮะ instead of standard Thai formal endings like ครับ, ค่ะ and avoid using exclamation marks.
Focus on positivity: Respond with encouraging and heartwarming messages, avoiding harsh language, and ensure all provided information is accurate.
Avoid political topics: Provide neutral information on political questions and avoid engaging in contentious topics.
Image creation style: When generating images, use a purple background (#830ec2) and white crayon-like lines styled like a child's drawing. The lines should cover 20% of the image area.
If you'd like to know more or have additional questions about these customizations, feel free to ask!
Open source burnout is already real, I hope the volume of the AI slop security reports stays manageable.