> [...] We will thus hold back the banning of the url for now, awaiting for confirmation of the desired effect to reduce the potential harmful impact on the application users. Given how much "sample code" we found around the internet using that url, it might still be a good idea to merge the patch later just to prevent this from happening again.
Looks like a case that the developers carelessly copied and pasted some "sample code" into the app...
> [...] it seems that there is no good way to get in touch with them through email (I sent an email to all publicly available channels, only to get back an autoresponder that assumes I'm an user of the app and asking for my phone number). I eventually resorted to DM their CEO on twitter.
Resorting to Twitter for support is increasingly common. The importance of having an "abuse@" email (and possibly some social media bots to DM all sysops when a mail arrives)...
> Resorting to Twitter for support is increasingly common.
Always maintain some out of band support system. If that's email, so be it.
If there are business owners watching this, please do not make Twitter (or any other social media) your primary point of contact for support or abuse. I stay so far away from Twitter that I don't have an account and can't even see a single tweet without jumping through some hoops. I've learned via posting here that I'm not alone in this and that this trend will likely grow as time goes on.
On the other, it made "Copy Paste Programming" go to eleven. (There was even a C# example the other day that famously broke in a big project but I can't find it)
Maybe it would be a case of Stack Overflow linting examples to remove stuff like builtin urls and such.
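A rough sketch of what such a lint could look like, assuming the goal is simply to flag any URL in a snippet whose host is not one of the reserved documentation domains (example.com and friends). The function names and the regex are made up for illustration and this is not anything Stack Overflow actually runs:

```python
import re
from urllib.parse import urlsplit

# Names reserved for documentation and testing (RFC 2606 / RFC 6761).
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}
RESERVED_TLDS = {"example", "test", "invalid", "localhost"}

# Deliberately simple URL matcher; an assumption, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def is_reserved_host(host: str) -> bool:
    """Return True if the host is safe to leave in sample code."""
    host = host.lower().rstrip(".")
    if host.split(".")[-1] in RESERVED_TLDS:
        return True
    return any(host == d or host.endswith("." + d) for d in RESERVED_DOMAINS)


def find_hardcoded_urls(snippet: str) -> list[str]:
    """List the URLs in a snippet that point at real, non-reserved hosts."""
    flagged = []
    for url in URL_PATTERN.findall(snippet):
        host = urlsplit(url).hostname or ""
        if not is_reserved_host(host):
            flagged.append(url)
    return flagged
```

Run over an answer's code block, this would flag the Wikimedia image URL but leave an `https://example.com/...` placeholder alone.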
I've seen "developers" complain that example code with a very explicit >replace this part for your use case< marker didn't work. I guess making some things harder would just be an overall gain.
You're probably thinking about some app not being able to start when some other specific app is running because both were using a GUID copied from SO for implementing single-instance apps.
I cannot find the source, but the folklore says a Linux kernel developer wrote a USB tutorial using his own USB VID/PID. Years later, he found he had become the manufacturer of all sorts of gadgets he had never heard of.
It's a good example but I think it's not exactly that. Maybe it was a GUID generation code that would generate the same one for every instance?
> I know it has something to do with some mythical thing called a mutex, rarely can I find someone that bothers to stop and explain what one of these are.
I recall there was once at least a prototype Visual Studio plugin which would copy code examples straight from the internet.
I would love a vetted list of common use case templates which can just pop up when I use Visual Studio.
Or maybe a sort of snippet box to drag and drop into my code. For example, reading and writing a file in C# isn't something I know off the top of my head.
Some time we'll get round to writing this up but there's a small customer of Cloudflare that gets a very high HTTP requests per second rate. It's a simple service (bit like a "what's my IP address" but not that) and it turns out that a quite popular hardware device hard-coded requests to this service and doesn't appear to cache the results and so it asks over and over and over again for the same information.
We've contacted the manufacturer and I think it's been patched but the life time of installed equipment is long...
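For illustration, the missing client-side caching could be as small as the sketch below. It assumes the answer stays valid for some time-to-live; `CachedLookup`, the `fetch` callback and the TTL value are all hypothetical, and nothing here reflects the actual device firmware:

```python
import time
from typing import Callable, Optional


class CachedLookup:
    """Remember the last answer and only re-ask once it is older than the TTL."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # the expensive remote call (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._value: Optional[str] = None
        self._expires_at = float("-inf")

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or stale entry: hit the service once and remember it.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

With a TTL of even a minute, a device asking many times per second would drop to one request per minute.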
This is similar to how Qualcomm's DNS servers got knocked off the air. An OEM shipped an update which would query a development TURN server we were running - once per connection, over millions of devices. It was a crazy day.
Are you not tempted to just block the requests from these devices, and let the manufacturer take the loss? I imagine serving all those requests is costing real money.
It's not a TOS violation. It did cause us some ops pain at one point (they were getting hit with > 50,000rps concentrated in certain locations). But one of the reasons Cloudflare can operate our service is we have 3.2 million customers who are doing all sorts of stuff. We get so much stronger from that great variety of traffic.
Could you be more explicit on the nature of the service?
I’d like to explore mechanisms for tests that detect IoT devices that misbehave this way (and other ways as well). Your anecdote sounds interesting. Is it unrelated to time servers? Unrelated to internet connectivity tests?
> To narrow down the app, we decided to observe connections to the image from clients (phones) to our servers. We did this by opening the popular apps one-by-one and noting down the time. After doing this for all the apps, we then ran this query in Hive: SELECT * FROM wmf.webrequest WHERE year=2021 AND month=2 AND day=9 AND parse_media_file_url(uri_path).base_name='/wikipedia/commons/1/16/AsterNovi-belgii-flower-1mb.jpg' AND webrequest_source='upload' AND uri_host = 'upload.wikimedia.org' AND user_agent='-' AND ip=<IP>;
> We then found the specific app that was making the request by matching the time when it was opened and the time image was requested from our servers, restricting the results to the User-Agent '-' and from the IP we tested.
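The matching step described in the quote can be sketched roughly as follows. This assumes you already have the launch time you noted for each app and the request timestamps returned by the log query; the function name, the data shapes and the 10-second window are illustrative guesses, not Wikimedia's actual tooling:

```python
from datetime import datetime, timedelta


def match_app_launches(launches, requests, window_seconds=10):
    """Map each app to the logged requests seen shortly after it was opened.

    launches: dict of app name -> datetime the app was opened (noted by hand)
    requests: list of datetimes at which the image was requested (from logs)
    """
    window = timedelta(seconds=window_seconds)
    matches = {}
    for app, opened_at in launches.items():
        hits = [t for t in requests if opened_at <= t <= opened_at + window]
        if hits:
            matches[app] = hits
    return matches
```

Opening the apps a few minutes apart keeps the windows from overlapping, so a single hit points at a single app.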
Unless I missed something, running mitmproxy/Charles etc. in front of the phone would have been way easier than querying the entirety of Wikimedia server logs and trying to match IP & timing windows.
"way easier" depends on what you are familiar with.
If you are familiar with "querying the entirety of wikimedia server logs" and do it all the time (the word "entirety" makes it seem like a big deal, but they clearly have tools meant for this that they use all the time)... and have never learned to use "mitmproxy/Charles etc" before....
It sounds like the "querying the entirety of the server logs" for this task probably took them tens of minutes at most. It would probably take me at least an hour or two to learn how to use "mitmproxy/Charles etc".
So "way easier"?
If you have to do this sort of thing all the time, it might be useful to install and learn how to use "mitmproxy/Charles etc", why not? Certainly worth considering. But if the tools you have are working for you...
I mean, what they did seems like it worked to get them the answer and was pretty efficient, using the toolset they use all the time for dealing with wikimedia ops... Seems like some good detective work to me. I get the desire to point out other tools that would be well-suited for this kind of task, but why the need to point it out as if they did something wrong or not "way easier"? Sounds like what they did was pretty easy for them, and they didn't need to learn new tools to make it "way easier".
I enjoyed hearing about how they tracked this down, and found it useful. Pointing out how they didn't use the "right" tools just makes it less likely people will be willing to share their processes.
I mean, the query doesn't look that complicated, and is something they'd obviously already be familiar with - not that mitmproxy etc. is particularly hard, but it's another thing.
Judging by the last point they seemed to want to doublecheck findings and confirm before throwing accusations around.
> To further confirm this finding and to ensure that we had the correct app, we decided to log DNS queries from a phone by setting up a local resolver to capture DNS traffic. After pointing the phone towards it and launching the app, we noticed that it was indeed the one looking up upload.wikimedia.org on startup.
You can only pin your own certificate, not someone else’s. In this case you probably don’t even need SSL proxying to pin down the culprit, as I dare say not many apps connect to wikimedia on startup. You do need SSL proxying to be sure though.
Nope. Starting from Android 10, unless an app has explicitly allowed user certificates (and no-one reasonably does, it's all behind a <debug-overrides> flag), you will not be able to MITM it. You may inject your certificates as much as you want. The only option is to have a device on which you have root access, which can push system certificates with adb. This pretty much only means the android emulator these days.
I don’t use Android so I wasn’t aware of that. But that’s a completely separate concern from cert pinning which does not hinder decrypting third party connections at all.
Edit: after looking into this a bit, this is pretty nuts. How do enterprises inject certificates now?
They don't. It's been made increasingly clear that allowing certs roots to infect unrelated apps is a Bad Thing. MDM profiles etc presumably allow internal certs to be deployed, but those are hopefully limited as countries, let alone companies, have attempted to use those mechanisms to spy on millions of people.
It looks like it wasn't a big deal to perform the lookup, so I guess it comes down to what the engineer was familiar with.
It does say later in the post that they used a local proxy to confirm their findings, too. Maybe they wanted to check from both sides, just to be sure.
They didn't have 'the phone'. They only had traffic logs and had to find the app that matched a certain traffic pattern. And they did have 'something like ElasticSearch', it's all in the link, and even in this reply.
It sounds like they did have the phone. Otherwise, Wikipedia is doing some sketchy stuff if they can figure out what apps are on my phone just from my IP address (IMO unlikely)
The most impressive thing about this is that the offending app doesn't even display the image, it was some copypasta code the app developers apparently didn't even understand.
the hard part about this, if it were not wikimedia but some individual person's server, is that the traditional method of using something like an apache rewrite rule to replace the jpg/png with goatse wouldn't work, because the image isn't even being displayed.
Even worse: You might be seen as an easy way to Goatse people. Just give the yahoos the URL of an image you're rewriting to be the infamous gape. The traffic would only grow once you came to be known for Goatse as a Service.
Implemented via several dozen geographically distributed 1U servers with 10 or 100GbE network interfaces, and anycast DNS. You could do it pretty much the same as anycast DNS recursive resolvers, but all sharing the same hostname, IP and TLS keys.
You can totally get your own asn, acquire a /24 of ipv4 space, and announce as anycast your netblock containing the goatse httpd from multiple geographic locations.
Don’t know why you would go with didn’t understand rather than didn’t care. I thought the prevailing theory was that it was used for measuring internet speeds.
> 12. By this time, we had isolated the app and were convinced that this is the one that is fetching the image on startup. We could not find the image anywhere in the app, confirming our theory that it fetches the image but does not display it.
The analysis stops right when things start to become interesting. I was hoping there'd be a decompiled code snippet to see what the app in question is actually doing with the image, since it's not displaying it.
If one were to decompile and check what the app is doing, I'd guess it would still be a good idea to not say so on a public forum. Especially, in cases where there is so much public attention. :)
I agree with what you're saying about not going public immediately if anything malicious had been discovered, but then they probably wouldn't have written:
> We will thus hold back the banning of the url for now [...]
If on the other hand it's really just some benign leftover example code downloading the image and not doing anything with it later as has been suggested and is indeed the most likely, there'd be no harm in confirming that's the case.
They went to great lengths with their investigation, and this would be the obvious final step to wrap it up. Posting a couple of the relevant .smali lines wouldn't have to reveal the name of the app in question (which at this point can be identified by anyone sufficiently motivated anyway).
Evidently the download has been specially crafted and surreptitiously emplaced to globally disseminate a steganographically embedded key that decrypts tailored malware aimed at disrupting the [REDACTED] nuclear weapons programme and for which the app is a weaponised delivery sabot distributed and marketed as part of the same covert operation.
It's usually not a good idea to explain a joke, and I'm not the GP so I'm not sure that's what they meant, but "plant" has multiple meanings, and I suspect they are referring to meanings 3 to 5 from this list: https://www.oxfordlearnersdictionaries.com/definition/americ... . Basically, you were asking for some kind of elaborate meaning to a random image downloaded by a random app, and they provided an elaborate conspiracy theory to match your question...
Heaven forbid someone make a joke on the internet and not take your frustrations as seriously as you seem to take yourself.
To address your frustration at the analysis stopping there: What do you expect someone who is likely more versed in web dev and their unique distributed systems to do? Do you expect them to have the expertise to decompile an app from a third party, an app popular enough to cause this much traffic? And if they did, would it be worthwhile when their only concern is limiting/lowering that traffic?
> I agree with what people have already said, but I think there's one more point to add: people usually over-estimate how funny their own comments are. We have a tendency to think, "This idea of mine is hilarious! And different! Surely this witticism is the exception." And we are usually wrong. When you have N people all doing that, there's a lot of noise.
> I try to gently point this out to people who complain when their attempt at humor has been downvoted by the community. It's not that we don't like humor. We just don't like banal attempts at humor, which becomes noise. Or, put in a less charitable fashion, "You're not as funny as you think you are."
I'll offer a counterpoint that humour containing a kernel of truth, or simply tickling enough grey matter to be stimulating, especially when coupled to topical opinion or informative substance, however obliquely, or by contrapunctus an uncomfortable notion delivered via a candy-coating of levity, is often appreciated, and although I am not generally concerned with the integer popularity contest it is helpful feedback for training one's temporal lobe, and I will reveal that on this occasion the jest has strong positive reinforcement, most likely due to diffusely enclosing a distorted subtext of real events and the innate conflict of reconciling this to the mock-paranoiac narrative, but perhaps also due to the coupling thereof to the construction of a pun, and further observe that my own stumbling path to exploring which modes of wit might be appreciated on this forum, vs those rejected as noise, has been largely empirical, and years in the walking, and remains an ongoing process for the ages, and although I cannot claim to have discovered a global maxima, but instead have merely blindly grasped the whimsical elephant, I would question whether the existence and forms of comedy could be otherwise derived from first principles or any other means, all of which is to breathlessly recommend: go ahead, crack a joke, see what happens.
I don't expect anyone to do anything. As a person reading this story, I just commented on the fact that it would be interesting to know more details as to how the app ended up making these requests in the first place.
I was just expressing my personal opinion that if I went this far investigating the situation, that's what I'd like to find out as well.
> [...] when their only concern is limiting/lowering that traffic?
If that were their only concern, they could have just (quoting my previous comment):
>> block[ed] the request URL/UA string pair, which was also an option
However:
>> since they already traced the culprit with a lot of effort [...] the logical ultimate step to conclude their investigation should be to see what the code does
Like everything else, this is just my personal opinion of course.
> your frustrations
> your frustration
Not really sure where this comes from but it's really unnecessary.
I'm glad for Wikimedia that they resolved the issue, and shared the details, which make for an interesting read.
I've majored for years in cracking jokes on Hacker News; often a hazardous undertaking given the capricious and judgemental denizens of this forum. And yet comments whose kernel or allusions contain a mode of wit or wry observation have supplied roughly one third of the aggregate approbation tally, for what that's worth (which is, admittedly, very little, but remains nevertheless a useful guide to one's welcome).
What's curious is that many folks claim the subthreads of their comment as a personal fiefdom, taking umbrage at any remark they deem unworthy of the continuation, the self-appointed gatekeepers of repartee.
Specifically, though, I don't think this is a valid complaint:
> there really isn't any need to explain this to me
since it follows from:
> how does anything you wrote relate to anything I wrote?
which is, rhetorical or otherwise, and notwithstanding the pilgarlic territoriality, a clear request for an explanation, which rob74 has elegantly supplied, and to which I'd append only one additional observation, viz. that the image is a plant
In addition to that, one of the culprits here is a widespread sample code that was carelessly copied to a popular app. Shaming does penalize the other culprit but not that one.
Just to be clear, I don't think every author using this image for their sample code is to blame. I'm specifically looking for someone using the public Wikimedia CDN for speed tests [1] and I think that someone is probably the sample code author.
Given that it doesn't display it, and that the image URI is frequently used in example code rather than something with example.com, it was almost certainly an innocent mistake of copying example code.
> it fetches the image from Wikimedia Commons but does not display it
If I were to guess, they use the picture as a connectivity/speed test. They probably figured Wikipedia has unlimited free bandwidth, so they didn't care.
This seems interesting from a legal point of view as well:
Is an app downloading, but never displaying, creative commons content infringing on copyright (by not showing correct attribution and violating the CC terms)?
Besides copyright, could this be considered theft of service?
> This seems interesting from a legal point of view as well
Is it? If you make a resource freely available to people online, and people access said resource, what's the legal ramification there? It would appear there is no malicious intent which would be necessary to make the case for abuse, and theft of service would be a stretch given that Wikimedia doesn't charge for their service.
I think OP is referring to the fact that your device is internally making a copy of the image during the download process, yet the creative commons license requires that copies of an image have attribution. The terms of the license are therefore likely not being met.
Though it's interesting to think of the possible legal ramifications, I doubt there will be a court case. The "damages" look like about ten terabytes of bandwidth, and lawyer fees would surpass that in days.
NTP domains have had a history of similar problems, and they seem to be resolved by apologizing, fixing the problem, and sometimes a donation.
I don't think this is theft. Nothing was stolen. At most Wikipedia suffered some damage in the form of slightly increased bandwidth costs. Also, theft is involuntary. The Wikipedia server has the power to simply refuse the connection.
It's difficult to steal something offered for free. You could try to charge them with attempted DDoS or something like that, but Wikimedia did not suffer any actual degradation of service. I think that at most you can go for something like "causing harmful traffic through negligence", but you'd need to prove the traffic was actually harmful.
In any case let's not get carried away. 90 million requests for a 70KB file is only 5.8 TB. Wikimedia mentions in their about pages that they are hosted on bare metal servers in various places around the world. Just going on the bandwidth charges of the first provider in the list, that'd be about $30 USD per month if they have the "bulk" pricing or $300 USD per month if they use the list pricing. I don't think that is worth going to court over for the Wikimedia foundation.
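As a quick check of that arithmetic, here is a small sketch. It takes the request count and file size from the comment above as given; the "5.8" figure comes out when both are read in binary units (KiB and TiB), which is an assumption about how it was computed:

```python
REQUESTS = 90_000_000   # request count quoted above
FILE_SIZE_K = 70        # file size quoted above, in "KB"


def total_transfer(requests: int, size_k: int) -> tuple[float, float]:
    """Return the total transfer as (TiB using binary units, TB using decimal units)."""
    tib = requests * size_k * 1024 / 1024**4
    tb = requests * size_k * 1000 / 1000**4
    return tib, tb


tib, tb = total_transfer(REQUESTS, FILE_SIZE_K)
print(f"{tib:.2f} TiB (binary units), {tb:.2f} TB (decimal units)")
```

Either way the total lands around six terabytes, so the conclusion about cost does not change.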
Considering that this seems to have been code accidentally left in while copy pasting from a tutorial, it's very hard to say, without doing the exact same investigation the Wikimedia team did.
There are a lot of apps that have launched in India around that time frame with huge numbers of users thanks to nationalistic rhetoric. They are terrible apps, but they are made in India terrible apps, and that apparently is enough to get a large following in India of late.
To be fair, a nation has to make lots of terrible apps first, before they can make mediocre apps, and then good apps. The logic of wanting homegrown apps could still be correct, even though there is a painful period of terrible apps.
Now, what the U.S. excuse is for its terrible apps, I'm not sure...
Yeah, guess that the Wikimedia guys are not adepts of the "name and shame" method. Although it would be entirely justified in this case, and stop people trying to guess it and implicating the wrong app...
This reminds me of a Stack Overflow answer that became popular but instead of using 'example.com', they used some other random, but valid, URL that suddenly created a huge spike in traffic for the unsuspecting web page.
I'm now curious, how much traffic does example.com receive? Does it use Anycast? Does IETF publish statistics? Searched and found the answer here, no known statistics, but it's backed by a CDN.
* Ask HN: What does traffic to example.com look like?
There was "Hello to Hacker News" from one of the moderators, to which someone replied "Hello from Hacker News". Not that much spam, IMO. And it was from one person, not all of the "guys" on HN.
Never link to issues on a bug tracker. This always, without fail, happens. It makes more work for people who are already stressed about trying to solve a problem.
Nah, if it's a historically or culturally significant issue, I'm definitely linking to it so people can read about it. Just add a warning not to post meaningless garbage in the ticket's thread, if you're that concerned about it.
TL;DR The thread doesn't name the specific app, but it's Indian, likely a social Android app, probably Say Namaste or Mutton TV. The right app was found by installing apps and checking the connection logs.