While most are focusing on the "default location" issue, I can see another issue in this article:
Why are people expecting the output of an IP mapping database to be precise enough to send law enforcement to that physical location? From what I could gather from the article, law enforcement (and others) are treating it as if it was as exact as a reverse phone number lookup, while in reality there's no way a global IP mapping database can get much more precise than "around that city" (unless perhaps if the IP address is in a datacenter).
That is, even without it being a "default location", sending people to the GPS coordinates obtained from an IP mapping database is nonsense.
Really, the error here is that it's returning a point. And this is a mistake that pervades all of society. The idea that any measurement could give you an exact value. Every measurement of the physical world should always give you a range of non-zero size. So, a lookup in a geo-ip database should give you an area. And it's perfectly fine to give a default value of "the USA" if that's where the address is located. It's only idiotic to return "the center of the USA".
It also seems the designers didn't envision any errors - the system always gives a valid result - which indicates this hasn't been particularly well engineered.
A GeoIP lookup can fail, and that error should be propagated through to whoever is using it - instead of returning a default value.
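For what it's worth, here is a minimal Python sketch of what propagating that failure could look like. Everything in it is hypothetical: the toy table, the `lookup` function and the `LocationUnknown` exception are made up for illustration and are not any real provider's API.

```python
from dataclasses import dataclass


class LocationUnknown(Exception):
    """Raised when the (hypothetical) database has no usable entry for an IP."""


@dataclass(frozen=True)
class Location:
    lat: float
    lon: float
    accuracy_km: float  # radius of uncertainty around (lat, lon)


# Toy stand-in for a real database; the address is from a documentation range.
_TABLE = {
    "203.0.113.7": Location(lat=45.52, lon=-122.68, accuracy_km=25.0),
}


def lookup(ip: str) -> Location:
    """Return a location, or raise instead of inventing a default point."""
    try:
        return _TABLE[ip]
    except KeyError:
        raise LocationUnknown(f"no location data for {ip}") from None


def describe(ip: str) -> str:
    """The caller has to decide what 'unknown' means for its own use case."""
    try:
        loc = lookup(ip)
    except LocationUnknown:
        return f"{ip}: location unknown"
    return f"{ip}: within {loc.accuracy_km:g} km of ({loc.lat}, {loc.lon})"
```

The point of the sketch is only that an unknown address never turns into a plausible-looking coordinate; the caller has to handle the failure explicitly.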
There are multiple geo-ip databases. MaxMind's apparently does give an "accuracy radius" https://www.maxmind.com/en/geoip-demo but many others don't, and of course tools built on top of any particular geo-ip database might not present that information to users.
Most people, including many in tech firms that build tools based on IP-to-geo lookups, don't actually understand the inaccuracies inherent in the process.
I actually spent a couple of weeks not so long ago trying to get data on accuracy benchmarks from tech firms in China. Initially they didn't even understand the question - they just used a standard database provided by someone else which is treated as Gospel.
So I'm not too surprised that non-tech people like police make this mistake.
>Why are people expecting the output of an IP mapping database to be precise enough to send law enforcement to that physical location?
The everyday layperson doesn't understand technology as well as we'd like them to. A common analogy is comparing an IP address to a mailing address or phone number for computers. For most purposes, this analogy is good enough. When we start trying to pinpoint people for crimes, the analogy is no longer perfect.
>A common analogy is comparing an IP address to a mailing address or phone number for computers. For most purposes, this analogy is good enough
While it is a common analogy, it is not "good enough", because it is incorrect and inaccurate and has led to a host of problems far removed from this current GeoIP issue.
The RIAA/MPAA are among the worst offenders at attempting to equate an IP address with a physical address or person for the purpose of civil and legal liability. I believe they are the ones that cemented this very, very flawed and incorrect analogy into the minds of less technically minded people.
I think we have to take the analogies for what they are: a simple comparison to break down a complex topic into familiar terms. Once you move from informational purposes to application, a simple analogy isn't good enough. You have to start learning the limitations of whatever you're using.
Case in point: the RIAA/MPAA. They took the analogy and ran with it just as you said. That's not the fault of the analogy or the people who first used it.
Wouldn't a better analogy be comparing to a cell phone? Yes, we know the "owner" of the number. And we might know roughly where they were a short time ago. But, we can't, with any certainty, tell you where they'll be tomorrow, because cell phones are portable.
> That is, even without it being a "default location", sending people to the GPS coordinates obtained from an IP mapping database is nonsense.
Ignoring the default location issue.
Imagine I'm a police officer and I don't know that MaxMind returns a default location. I take an IP, pop it into my tool, and it gives me a lat/long. I transpose that onto a map and see, to my surprise, that there's a single home or address at that location (these folks live on a 600 acre ranch). If I'm an investigator and all I have to work with is an IP, then even knowing that MaxMind isn't perfect, wouldn't it still be worth taking a trip out to that point to investigate?
BTW Just for giggles this is the map if you use the returned coordinates
The question is why are they using something people generally expect to be very precise (lat, lon coordinates) to communicate information that is only accurate to a country or state level. That is where the problem is.
If you have ever ordered pizza from a static IP, you probably have a perfect address matched to that IP in a database someone has for sale somewhere. Probably true for other online retailers (though not all of them).
Pizza guy still calls me two blocks from my house because that is where GPS tells him to stop.
If you don't want this happening, you're going to need a police department that doesn't fire cops whenever they become smart enough to understand what an 'IP address' is.
It would work well enough for those who just want a door to knock down. The public draws its own conclusions about those who use this sort of service for this sort of purpose.
I thought it was well-known by now that police in the United States are either violent, unintelligent thugs or complicit in allowing same to operate. He asked a sincere question, this is a sincere answer.
I wish to abide by the rules for commenting here, so I respectfully ask what you mean by "like this".
The comment in question violates the guidelines by introducing a classic flamewar topic without anything new to say, by your own admission. It also does so uncivilly and without substantive contribution to the discussion: the poster's question was not about the abilities of law enforcement, but about the inherent inaccuracy of IP-based geolocation.
Please re-read the guidelines if you haven't recently:
This is kind of like "SWATting", in that we hold anyone and everyone except police responsible for horrible things done by police. One would have expected some sort of procedure to have developed, between the second and four-hundred-thirty-second time the sheriff's department drove out to hassle these people. Apparently one would be expecting too much.
The problem is not only with police or law enforcement, though. It seems all sorts of individuals and organisations knock on the door of the farm because they don't realize they have been given a dummy location.
There are a lot of comments saying that, given a fixed area (e.g. "Somewhere in the USA"), it's common to return the centroid of a polygon.
There's an interesting note in the Fusion article[1], where it mentions "[the farm] is a two-hour drive from the exact geographical center of the United States".
> As any geography nerd knows, the precise center of the United States is in northern Kansas, near the Nebraska border. Technically, the latitudinal and longitudinal coordinates of the center spot are 39°50′N 98°35′W.
> In digital maps, that number is an ugly one: 39.8333333,-98.585522
> So MaxMind decided to clean up the measurements and go with a simpler, nearby latitude and longitude: 38.0000,-97.0000.
I wonder whether that decision - to choose a location, rather than use the precise centroid of the area - will cost MaxMind the case.
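To put a number on how far the "cleaned up" default sits from the quoted center, here is a rough Python sketch. It converts the quoted 39°50′N 98°35′W to decimal degrees and uses the standard haversine formula on a spherical Earth (mean radius 6371 km, which is an approximation); the helper names are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a sphere is only an approximation


def dm_to_decimal(degrees: int, minutes: float, negative: bool = False) -> float:
    """Convert degrees and minutes to decimal degrees (negative for S or W)."""
    value = degrees + minutes / 60.0
    return -value if negative else value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


center_lat = dm_to_decimal(39, 50)                  # 39°50'N
center_lon = dm_to_decimal(98, 35, negative=True)   # 98°35'W
default_lat, default_lon = 38.0, -97.0              # the rounded default

gap_km = haversine_km(center_lat, center_lon, default_lat, default_lon)
print(f"rounded default is about {gap_km:.0f} km from the quoted center")
```

The straight-line gap comes out in the neighbourhood of 250 km, which is at least consistent with the "two-hour drive" in the article.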
In that case they'd be getting sued by various police agencies who have wasted money on arctic trips, as well as the survivors of various officers who were eaten by polar bears.
Except 0,0 is not in the Arctic, it's a little bit off the coast of Nigeria.
And the problem with returning Null or 0,0 is that it implies there is no information available. There is still value in having a default location if the country of an IP is known but the location is not. If you work with MaxMind a lot, you know that 38,-97 is a US IP with unknown location.
Who looks at lat-lng anyhow? Prefixes have proper fields for addresses, or even comments. There are no objects for individual IP addresses anyhow. MaxMind is good for a quick country of origin assignment, and that's it. Their city level DB is just smoke and mirrors.
However, for some people it actually is the location they're looking for. The closest road junction to that place actually has some signs:
https://goo.gl/maps/1vyM9Ss7o8t
I wonder how many geocaches are hidden in that patch?
(Hmm... had to register to geocaching.com to find out; apparently just one at the exact spot and then a couple nearby.)
It should just return a precision estimate along with the data. Here's what the ancient (from january 1996) standard for "location" records in the DNS specifies, which in my view is very sensible: https://tools.ietf.org/html/rfc1876
SIZE The diameter of a sphere enclosing the described entity, in
centimeters, expressed as a pair of four-bit unsigned integers,
each ranging from zero to nine, with the most significant four bits
representing the base and the second number representing the power of
ten by which to multiply the base. This allows sizes from 0e0 (<1cm) to
9e9 (90,000km) to be expressed. This representation was chosen such that
the hexadecimal representation can be read by eye; 0x15 = 1e5. Four-bit
values greater than 9 are undefined, as are values with a base of zero
and a non-zero exponent.
HORIZ PRE The horizontal precision of the data, in centimeters,
expressed using the same representation as SIZE. This is
the diameter of the horizontal "circle of error", rather
than a "plus or minus" value. (This was chosen to match
the interpretation of SIZE; to get a "plus or minus" value,
divide by 2.)
...so, for a point "somewhere" in the united states, I'd reckon that SIZE=1·10⁰m (1 times 10^0=1m) and HORIZ (and VERT) PRE set to 5·10⁶m (5 times 10^6=5000km) would be a sane choice.
(measuring on google maps, the united states seem to measure about 4500km from east- to west-coast)
> "expressed as a pair of four-bit unsigned integers, each ranging from zero to nine, with the most significant four bits representing the base and the second number representing the power of ten by which to multiply the base. [...] Four-bit values greater than 9 are undefined, as are values with a base of zero and a non-zero exponent."
That is the stupidest data format I have ever heard of...
The precision is in cm, so HORIZ PRE will be 5e8 or 500000000cm = 5000km and VERT PRE is more likely to be around 1e5 or 100000cm = 1km although I'm not sure about this value - maybe 5km or 5e5 would better account for the height variation in the US?
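The base/exponent byte is easier to judge with a few lines of code. Below is a small Python sketch of the encoding as described in the quoted RFC text (high four bits are the base, low four bits the power of ten, value in centimeters). The function names are my own, and rounding up in the encoder is an assumption on my part, chosen so that the encoded precision is never tighter than the real one.

```python
def decode_rfc1876(byte: int) -> int:
    """Decode a SIZE / HORIZ PRE / VERT PRE byte into centimeters."""
    base, exponent = byte >> 4, byte & 0x0F
    if base > 9 or exponent > 9:
        raise ValueError("four-bit values greater than 9 are undefined")
    if base == 0 and exponent != 0:
        raise ValueError("a zero base with a non-zero exponent is undefined")
    return base * 10 ** exponent


def encode_rfc1876(centimeters: int) -> int:
    """Encode a length in cm, rounding up to the next representable value."""
    if centimeters < 0 or centimeters > 9 * 10 ** 9:
        raise ValueError("value must be between 0 and 9e9 cm")
    base, exponent = centimeters, 0
    while base > 9:
        base = -(-base // 10)  # ceiling division, so we never overstate precision
        exponent += 1
    return (base << 4) | exponent


# 5000 km expressed in centimeters, as in the comment above.
usa_wide = encode_rfc1876(500_000_000)
print(hex(usa_wide), decode_rfc1876(usa_wide), "cm")
```

With this, the RFC's own example byte 0x15 decodes to 1e5 cm, and 5000 km encodes to 0x58, i.e. 5e8 cm.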
It's a terrible approach, as it throws off all sorts of algorithms, and has the effect seen above. If you have to return a Long/Lat, make sure that your "Unknown" is something that nobody would ever confuse with being a valid address - North Pole, South Pole, in the middle of some ocean, desert, or mountain range - whichever is least likely to be valid for your particular application.
This makes it much easier for downstream application developers to filter out "Invalid" addresses, and simply eyeballing them on a map makes it clear what the "Invalid" value is.
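As a rough illustration of that downstream filtering, here is a short Python sketch. The placeholder set is an assumption: 38,-97 and 0,0 are the values discussed in this thread, but a real consumer would need the provider's documented list, and the record layout is invented for the example.

```python
# Placeholder coordinates mentioned in this thread; a real list would have to
# come from the data provider's documentation (assumption for illustration).
PLACEHOLDERS = {(38.0, -97.0), (0.0, 0.0)}


def is_placeholder(lat: float, lon: float, tolerance: float = 1e-4) -> bool:
    """True if (lat, lon) matches a known 'we don't really know' value."""
    return any(
        abs(lat - p_lat) <= tolerance and abs(lon - p_lon) <= tolerance
        for p_lat, p_lon in PLACEHOLDERS
    )


def usable(records):
    """Keep only (ip, lat, lon) records that are not placeholder locations."""
    return [r for r in records if not is_placeholder(r[1], r[2])]


records = [
    ("203.0.113.7", 45.52, -122.68),
    ("198.51.100.1", 38.0, -97.0),   # country-level default, not a real address
    ("192.0.2.55", 0.0, 0.0),        # "null island"
]
print(usable(records))
```

This only works if the placeholder values are known and obviously not addresses, which is the whole argument for picking them carefully.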
And of course, nobody has thought of the possibility of including an accuracy radius with the position, so instead of having a point saying "It's here", you have "It's anywhere within this area".
I have not used MaxMind but with another geo data provider in addition to the centroid it returned the shape and the type of match it was (exact address, city, county, state, country,etc...). So you know whether the IP has an exact address match or not. But many application developers probably chose to ignore the complexity as that might be easier.
It may be a terrible approach in your view but it's essentially universal and is often the best one. If I ask just about any database for the Lat/Long of some city, it's going to return the location of some approximation for the center of the city--which may be a park, a house, or the middle of a river. It's not going to, nor should it, tell me "invalid question" because the city (or really any address within that city) isn't really represented by a single Lat/Long point.
I've worked on a number of applications, where we need to make a decision as to what the long/lat will be of "Unknown" - and we always pick something that is guaranteed to throw a flag for downstream consuming applications.
Recognizing how your data will be used, and taking some precaution to ensure that it doesn't result in scenarios like those described in the article, is quite often fairly straightforward. (As evidence: the article itself made it clear that, when they don't know the actual location, they have now changed the long/lat to return a value in the middle of a lake to avoid this sort of problem in the future.)
Not when this has consequences for whomever lives there
"Just send the drones to bomb the centroid of city X" doesn't seem so smart
Or they should have a giant disclaimer on their results saying that their information is subject to errors and it should not be used for legal/law-enforcement purposes
There's always some uncertainty. What do you want it to return when it can tell the location is Springfield, but no further? Will your suggestion actually stop devs from blindly converting into coordinates?
Then it should return both the center of Springfield and an accuracy measurement, like a city boundary. If the act of returning a location is inherently unreliable, then that must be made clear.
And then a layer deep in the app that's trying to simplify the world calls a function that calculates the center of that circle. Or averages the points, or returns just the first one, depending on laziness. The mapping world doesn't work in bounding circles, so it gets converted, and it probably gets converted in a way that loses important information.
It seems this kind of mistake just keeps cropping up again and again, from API results to programming language features. For some reason, people have this real, burning desire to return something that __looks__ useful, even in the most abject failure case.
Why are these law enforcement agencies turning up to -any- latitude/longitude pair from an IP-to-location database? Even when the location is "known" it's still just going to point to the center of a city or region an IP range was allocated to at some time.
To map a specific IP to a specific physical location you want to arrest someone at, you'd have to actually go to the ISP (with a subpoena) and ask them what customer that IP was allocated to at a specific time, then look up that customer's address. They know that, right?
Most likely reason? Ignorance. Very few people (sub 1% of the earth's population) actually know how an IP works, how an IP is assigned, or how weak, and often nonexistent, its relationship to a physical location or GPS coordinates is. So when said people use a product that pretends to offer accurate correlation, they don't know any better.
But you'd think that if their job was law enforcement of internet crimes, they would have to have at least a very basic understanding of this sort of thing. It couldn't be more than a 2 or 3 hour training course to teach people a bit about IP and how to use subpoenas or other law enforcement requests to ISPs to get what they need.
It doesn't look like law enforcement of internet crimes is the main use of this product. More like general law enforcement is just one use among many. That said, I am surprised the local sheriffs don't immediately recognize the address and disregard it, after the first few misunderstandings.
I barely know the first thing about how supply chains for illegal drugs work, and I suspect most of the population is like me. Law enforcement, however, should understand how this works; otherwise how can they do their job effectively? Likewise, if they're investigating computer crimes they should have some basic understanding of how computers work (at least from a practical perspective; they don't need the theory of why computers work).
Kill two birds with one stone, as they say: return the coordinates of FBI in Washington, DC. Sooner or later the lightbulb will go off. "Wait, that can't be right..."
I get that this is indicative of a bigger issue: default behavior. Choosing to return something as arbitrary as //almost// the center of the US as a default when the location is unknown is pretty universally ridiculous. And in production code expected to have a huge user base and likely a lot of situations where there's no known location? Alright.
That said, after reading again and again how ridiculous decisions like these lead to these disproportionate real-world effects, I can't help but laugh at the sheer absurdity of it all. "Digital hell", indeed.
I like how they 'fixed' it by moving the default to the centre of a lake. So you still have the waste of police (or whoever) driving out to some location before they figure out what's going on.
Yeh good compromise. The ideal solution would be for their API to not report a precise location if it doesn't know one. I guess they have no simple method of stating a region rather than a point, and if they did the consumer of the APIs would probably just use the centre of the region most of the time.
I guess they could just not give lat/long when unknown, but still state country, state, county, etc. if they're known.
It is up to the developer to do sanity checks on the accuracy for their type of application.
In some instances, just knowing it came from the US or some other country is accurate enough.
To claim they should simply not return any data if they do not have an accuracy level to your arbitrary standards would make the service useless.
IMO MaxMind is not the problem here; people taking the data and using it as if it were accurate to 1 in is the problem.
Geolocation data on IP addresses has NEVER EVER been that accurate, NEVER. The fact that law enforcement, consumers and others use this data as the sole data point and then act on it is the problem, not that MaxMind returned a lat/lon at the center of the US.
> To claim they should simply not return any data if they do not have an accuracy level to your arbitrary standards would make the service useless
Ideally they'd offer a version of the service that gives a probability 'heat map', but in reality 99% of users would use a simplified version with an app-configured threshold for what constitutes useful info for that app, and for how to convert a heat map to a single point (since that's what most people seem to want) or a very localised region (e.g. within 100 meters or so).
In reality the simplified version would be provided as a service with a default threshold, anything under the threshold would not report a position, but could still report a country ISO code and perhaps state, county as optional extras. These are workable compromises to the ideal of everyone consuming a heat map in a sensible way (IMO).
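A rough Python sketch of that simplified layer, to make the compromise concrete. The types, field names and the 50 km default threshold are all invented for illustration; the only idea taken from the comment is "no position below the confidence threshold, but still a country code".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RawEstimate:
    """What a (hypothetical) underlying database knows about an IP."""
    country_iso: str
    lat: float
    lon: float
    accuracy_km: float  # radius of uncertainty around (lat, lon)


@dataclass(frozen=True)
class SimplifiedResult:
    """What the simplified service hands to the application."""
    country_iso: str
    position: Optional[Tuple[float, float]]  # None when too uncertain
    accuracy_km: Optional[float]


def simplify(raw: RawEstimate, max_radius_km: float = 50.0) -> SimplifiedResult:
    """Report a position only when the uncertainty is within the threshold."""
    if raw.accuracy_km <= max_radius_km:
        return SimplifiedResult(raw.country_iso, (raw.lat, raw.lon), raw.accuracy_km)
    return SimplifiedResult(raw.country_iso, None, None)


city_level = simplify(RawEstimate("US", 45.52, -122.68, 25.0))
country_level = simplify(RawEstimate("US", 38.0, -97.0, 3000.0))
print(city_level)
print(country_level)
```

An application that only needs the country still gets it in both cases; one that wants a point has to cope with `position` being `None`.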
>Ideally they'd offer a version of the service that gives a probability 'heat map'
That is not "ideal" at all. One of the first uses of this data was real-time CC fraud detection, and giving a computer processing CC info a graphical "heat map" is less than useless. Most geolocation API data is consumed by computers that use it for many things, not presented to the user directly.
>In reality the simplified version would be provided as a service with a default threshold, anything under the threshold would not report a position, but could still report a country ISO code and perhaps state, county as optional extras.
It appears you believe this data is only for Human Consumption. If an API is designed to return LAT and LON and Accuracy, then that is what it should return, not an ISO country code. I get that you believe no API should be designed this way, but I, as a developer who consumes these services, prefer it that way; it makes it easier to write against.
As a developer, I am asking MaxMind to give me lat, lon and accuracy, not an ISO country code or a heat map.
The heat map is the baseline data model; from that it's possible to derive simpler models for simpler use cases. However, providing access to the baseline model will likely be useful for some use cases. i.e. if you want something more nuanced than lat/long.
> It appears you believe this data is only for Human Consumption.
Most humans I've encountered don't refer to countries by their ISO code... most.
Exactly. If your API returned WKT text for the perimeter of the area, somebody who stores a simple lat-long pair is going to take the centroid and store that.
Completely agree with you. Just seems more sensible than the middle of a lake. Police will still turn up!
I wonder if it would be feasible to set it to one of the small islands map makers include to identify copyright infringements. Or have a known fake territory where any 'dead' values can be parked, such as the above.
I mean realistically it would be best to simply return "unknown location" or something of that nature. But barring that -clearly absurd- idea, set it to some place that's obviously incorrect. I propose the moon.
So to what level do you believe these services have to be accurate before returning data? 10mi? 1mi? 50 feet?
If they cannot tell me what room in my home I am located in, should they simply return "the moon" as my location?
Different services use this data for different reasons. Sometimes simply knowing what nation the IP is from is enough, but you believe they should return "the moon" if all they can determine is that it came from within the US?
"Our last remaining topic of the day at this meeting: this year's San Bernardino Sheriff's Office request for a discretionary budget increase of 68bn dollars for a 'Space Capable Cruiser'."
At least for towns that just have one zip code, I've sometimes seen the post office pinned as the town's location. (Although I'm not sure IP addresses will tend to correspond to towns as opposed to some other area.)
So I think there's an important lesson in here for software engineers.
How do you design your API so that it is as difficult as possible for your clients to misinterpret?
The "always return a center & accuracy" API is very simple and elegant, but with the benefit of hindsight, you can assume that a significant fraction of your users are just going to ignore that accuracy number and treat the center as precise. As was pointed out elsewhere in this thread, the default point isn't the only problem -- any town, city, or state will generate similar problems.
One option would be to return a richer result type:
(COUNTRY, "United States")
(CITY, "Portland, OR, USA")
(REGION, latlng-a, latlng-b)
(POINT, latlng, accuracy)
Now it becomes more difficult for a client to pretend most of those are precise points.
The best API is sometimes not the one that is easiest to use but most difficult to misuse.
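Here is one way the richer result type could be sketched in Python. The class names mirror the four cases listed above; everything else (field names, the `dispatchable_point` helper, the 100 m default) is hypothetical and only there to show how the type forces a client to handle the imprecise cases.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

LatLng = Tuple[float, float]


@dataclass(frozen=True)
class Country:
    name: str


@dataclass(frozen=True)
class City:
    name: str


@dataclass(frozen=True)
class Region:
    corner_a: LatLng
    corner_b: LatLng


@dataclass(frozen=True)
class Point:
    latlng: LatLng
    accuracy_m: float


GeoResult = Union[Country, City, Region, Point]


def dispatchable_point(result: GeoResult, max_error_m: float = 100.0) -> Optional[LatLng]:
    """Return coordinates only for a Point that is precise enough to act on."""
    if isinstance(result, Point) and result.accuracy_m <= max_error_m:
        return result.latlng
    return None  # Country, City, Region and vague Points yield no coordinates


print(dispatchable_point(Country("United States")))
print(dispatchable_point(Point((45.52, -122.68), accuracy_m=30.0)))
```

A client that wants a pin on a map for a `Country` now has to write that conversion itself, which at least makes the loss of information a visible decision.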
I appreciate the notion that "explicit is better than implicit", so as to minimize assumptions and miscommunications. But that said, if the service is providing details for a region using the center-and-radius approach, that sounds perfectly reasonable. It explicitly states what is being returned, and the fact that it isn't a single point. For responsible clients who are using the API correctly, this data-response is the easiest to parse and make sense of.
I would hate to see a world where services are responsible for generating and clients are responsible for parsing data in an overly convoluted and cumbersome format, just to minimize the risk of irresponsible clients misinterpreting it.
I'd return a rectangle instead of a circle. One point would be the upper-left corner, the other the lower right corner. To get to the "center" you'd have to do arithmetic on the coordinates, which would encourage thinking.
Half of the classic problem with any detection system (GPS, big data, AI): the cost of false positives.
The other half being false negatives (like an alarm not detecting trespassing).
False positives are called artefacts, but people want so much to believe in the infallibility of IT that they use detection systems as if the results were error-free.
Hence what I call the Oracle syndrome: genuinely scientific people relying on an inherently inaccurate system as if it were exact. Then they scale the system up, and what was an anecdotal occurrence becomes a serious concern through accumulation.
Results that don't conform to expectations are no longer handled; they are disdained, and measurement systems (which can therefore fail) are used as exact systems.
It is like the death penalty: should we care about the innocent people who will pay a dear price for our placing too much trust in imperfect systems, knowing there is a tendency to make decisions hard to contest because contesting them would undermine the trust we have in the system?
There are two completely separate fucked up things here:
a) MaxMind shouldn't be returning a location like this for "Anywhere in the USA". It should either be 0,0 for unknown or something totally obvious like the Washington Monument in Washington, DC.
b) The fact that clueless/ignorant law enforcement is blithely trusting and USING this spurious data. Someday a person is going to get SWATTed and shot dead over this sort of thing.
Similar situation near Atlanta, Georgia (not the center of the US) was discussed earlier this year [1]. Reply All had a good podcast about figuring out the real issue [2] in this case.
You would think that after like the fourth or fifth IP search turned up this same bloody farm, these guys might start questioning that result.
I don't work in LE at all but if I was, and I was getting sent to the same damn house every other week for everything from drug trafficking to sex slavery I'd start raising an eyebrow whenever dispatch tried to have me go there again.
I suppose they really can't just not go because of the gravity of most of those calls, but you'd think at least the stolen cellphone/car could wait until the morning.
From the original Fusion article[1], it sounds like the local LE know exactly what's going on, but need to intervene with people who aren't local - FBI agents, federal marshals, IRS collectors, ambulances, police officers, etc:
> “That poor woman has been harassed for years,” Butler County Sheriff Kelly Herzet told me by phone. Herzet said that his department’s job has become to protect the Taylor house from other law enforcement agencies.
Contrarian perspective: These GeoIP firms should start offering businesses in a given area the ability to sponsor the area and become the default location.
How in the world is this not a valid reason to sue? This private company's faulty data has led to a ton of stressful interaction with all kinds of Law Enforcement who bang on their doors for everything from a stolen iPhone to kidnapping and exploitation.
I don't think a settlement is out of the question here, frankly I think $75k is incredibly reasonable.
$75K is merely the minimum in damages required to sue in Federal Court; see page 2 of their complaint, #3: "This action involves a dispute between citizens of different states and the amount in controversy is in excess of $75,000. Accordingly this Court's jurisdiction is invoked under 28 USC § 1332...."
See e.g. https://www.law.cornell.edu/uscode/text/28/1332# for that bit of law, this is what's required to literally make it a Federal case. We can be quite sure they'll be asking for more in due course, if the case gets anywhere (it takes some effort e.g. $$ to lawyers to establish specific damages to a court's satisfaction).
> $75K is merely the minimum in damages required to sue in Federal Court;
Strictly speaking, $75K is the minimum amount in controversy required for federal courts to exercise diversity jurisdiction (where the requisite diversity of citizenship exists); there are other routes to federal jurisdiction (particularly federal question jurisdiction) to which such a minimum does not apply.
Yeah, in fact, the amount seems calculated to elicit a response along the lines of the following: "Wait. Are you sure it's only 3 zeros? Hell, write them a check and be done with it."
I'm not sure the company really did anything wrong here. They can't check every area center for not being someone's house or somewhere else problematic. But this case is a particular outlier and the amount requested is petty cash levels--unless they worry about setting a precedent.
> I'm not sure the company really did anything wrong here.
I wouldn't say it's "wrong", I don't believe they did it maliciously. Negligence I think is the more appropriate term. Their software when it encounters an IP it doesn't know should say that, not simply return the default position. That would be like if every time you searched for a recipe that some website didn't know, it just gave you the recipe for macaroni and cheese.
> That would be like if every time you searched for a recipe that some website didn't know, it just gave you the recipe for macaroni and cheese.
If all the recipes are on one page, and it knows you want pasta, it can make sense to link you to an arbitrary recipe in the pasta section.
The article is poorly worded, but this is a default US address, along with default state and default city addresses. It's not what you get when the IP is totally unknown.
The thing is that it never knows for sure. And in many/most cases, it only knows an area best case. In the case of only knowing it's the US, yes, it's a big area. Yet, for a variety of reasons, developers/users find it useful to get a point returned to substitute for that area. Which is what this company does.
I'm not sure how that's negligent just because developers and/or users of various kinds don't understand the limits of the data they're using. If anyone is negligent, it's developers who use this data and provide it to users without caveats about its accuracy and precision.
If the software returns this point in such a way where there is no way to tell whether it's saying "this is RIGHT here" from "this is somewhere in this massive area" then that's a huge fail on the UI designer behind it. And again, if the software just doesn't know, then it shouldn't return the point at all. Why would you? It doesn't mean anything and it's not useful.