There are many positive developments, such as good quality satellite imagery and government geographic data being offered to OpenStreetMap contributors for integration. An up-to-date aerial image of a city does wonders for the mapping experience. Mapillary's service complements OSM nicely and provides an invaluable on-the-ground data source. There are so many positive things happening in the community that I can't possibly list them all.
Agreed, Transport for London recently built an enormous database of cycling infrastructure to be integrated into OSM. It's an exciting project/collaboration.
Projects like these highlight that as ML becomes more and more complex, an ever-larger gap opens up that must be filled by manual or semi-manual labor. If it isn't a team of volunteers combing the rendered results for errors, it's users who have succumbed to those errors and leave feedback in the hopes the system improves.
That gets me wondering - is the future of AI really just a semi-autonomous twilight zone where cheap / free labor augments an already faulty system? If not, what possible application is there for an expensive, closed automated system that works 100% of the time and requires no human input, when other options are cheaper and leave clear directions for improvement?
Yes and no. There is a huge explosion of 'Ops Plus' startups that take an existing manual process and then build some basic tooling around it (with or without any substantial ML component). There are mild to moderate efficiency gains coming from this, but a lot of their valuation is coming from a bet that in the future they'll be able to fully automate the system and reap efficiency gains.
In practice, almost nobody is even thinking about building a fully automated process for every case. The reason is simple: automating the first 60% of the work takes x effort, automating the next 30% takes 10x, automating the next 9% takes 100x, and automating the final 1% is essentially impossible. So if you came to the table with the goal of 100% automation right out of the gate, you'd spend 10 years developing something with little to show for it.
I think full automation of some systems is possible, but is actually blocked by generational norms. By and large, the systems that "Ops Plus" startups are attempting to automate were designed by people who are not digital natives. They're not illiterate, but things like instant messaging, async communication and structured data are not natural primitives for them. I'm not saying everyone in the Fortnite generation is a master data modeller, but I think that when they join the workforce they'll set up systems that are much more feasible to automate.
What you describe is pretty much exactly parallel to the automation of physical work. Having a steam engine gets you a long way, then the first few robots a bit further... there's no particular reason to want a factory with zero employees (except the watchman and his dog...) unless you want it to work on Mars or something.
I actually just made it up. Companies that come to mind are Scale.ai (doing data labelling stuff for ML that historically would have been done by outsourcing companies, etc), Flexport (freight forwarding, traditionally done via spreadsheets and emails), Checkr (background checks), Atrium (legal services), Oscar/Clover (health insurance), Cadre (real estate investing). There are tons and tons of them in the recruiting (really sourcing) space too: I'd say Triplebyte is an Ops Plus company, as is Sourceress and a couple more.
Surely there's a spectrum here? Conventional, manually-made maps are full of errors too, and there's no army of low-paid workers combing over those— nope, they're published as they are, and it's up to end users to report problems. The better the map maker is (whether it's an ML black box or a team of human experts), the fewer errors there are, and either way, one expects an upward quality trend over time.
> If it isn't a team of volunteers combing the rendered results for errors, it's users who have succumbed to those errors and leave feedback in the hopes the system improves.
Very cool that they are integrating it with HOT tasking manager and making it easy for anyone to use the editor with the ML-generated proposed objects in several countries. I think the ML has been there for a couple years but currently it's not easy to take advantage of it.
Hopefully they eventually release the ML pipeline itself as well.
I guess it's nice in those areas with a low contribution percentage. But I suspect in many cases, and definitely in Estonia, there's a national high quality map database that could be used to augment existing maps instead. I'm wondering, though, whether anyone has attempted to do so; a quick Google search reveals nothing.
It is possible to "import" other data into OSM. However it's very tricky. Licencing/legal issues aside, it can be very technologically challenging to merge another database into OSM.
I suggest posting to the OSM talk@ mailing list, or to the local Estonian list if there is one.
Depends on whether the owner of said high quality map database will license it for inclusion in OSM.
My understanding, from following the Australian OSM mailing list, is that it takes an individual to pursue this with a government agency, which is a ton of work, and often you'll just get a 'no'.
This is semantic segmentation, so it has to be a convnet. I've never heard of trying to do semantic segmentation any other way, and a prototype of another approach wouldn't go into production at scale.
I don't disagree, but I'd be careful with that statement.
A huge amount of landcover segmentation in remote sensing still relies on simple models - either linear regression (thresholds) or classical machine learning like random forests or SVMs. For a lot of cases, these techniques will get you 90% of the way and it's very rare to have ground truth data that is accurate enough that you can measure the difference with any real degree of confidence.
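To make that concrete, here's a minimal sketch of that kind of classical per-pixel baseline (my own toy example with random stand-in data, nothing from the article): a random forest classifying each pixel from its band values.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Stand-in for a multispectral tile: (H, W, bands) reflectances plus a
    # sparse hand-labelled class map (0 = unlabelled, 1..K = landcover class).
    H, W, B = 128, 128, 4
    image = rng.random((H, W, B)).astype(np.float32)
    labels = np.zeros((H, W), dtype=np.int64)
    labels[::8, ::8] = rng.integers(1, 4, size=labels[::8, ::8].shape)

    # One feature row of band values per pixel.
    X = image.reshape(-1, B)
    y = labels.reshape(-1)
    train = y > 0                      # fit only on the labelled pixels

    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X[train], y[train])

    # Classify every pixel and fold the predictions back into a map.
    landcover = clf.predict(X).reshape(H, W)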
A big problem in the field is the lack of good (public) ground truth. There's so little hand labelled data to work with that without humans in the loop it's extremely difficult to validate the results meaningfully (unless you have an army of staff to do it). With something like roads you could also have heuristics about what a road looks like and where it goes (e.g. it's a continuous thin line), which can help condition things.
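For the road case, something like this is what I mean by shape heuristics (again my own sketch, with random data standing in for a real score map): simple morphology already encodes "roads are thin, continuous lines".

    import numpy as np
    from skimage.morphology import remove_small_objects, skeletonize

    rng = np.random.default_rng(0)

    # Stand-in for a raw per-pixel road score map from any classifier.
    road_prob = rng.random((256, 256))
    road_mask = road_prob > 0.95                  # threshold the scores

    # Heuristic 1: real roads are continuous, so drop tiny isolated blobs.
    road_mask = remove_small_objects(road_mask, min_size=50)

    # Heuristic 2: roads are thin lines, so reduce what's left to a
    # one-pixel-wide centerline before vectorising it into a network.
    centerline = skeletonize(road_mask)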
I've seen a lot of papers which are applying deep learning for semantic segmentation for satellite mapping, but they evaluate on very limited datasets, they attempt to regress to simpler models without realising it (e.g. trying to predict a linear model), or they leak train and test data and report amazing results because they randomly split data from the same region.
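Avoiding that leak is cheap, for what it's worth. A rough sketch (made-up data) that holds out whole regions with scikit-learn's GroupShuffleSplit, so tiles from the same area never end up in both train and test:

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    rng = np.random.default_rng(0)

    # Stand-in tile table: one feature vector per image tile, plus the
    # region each tile was cut from.
    n_tiles = 1000
    X = rng.random((n_tiles, 32))
    y = rng.integers(0, 2, size=n_tiles)
    region = rng.integers(0, 10, size=n_tiles)    # 10 source regions

    # Hold out entire regions instead of randomly shuffling tiles, so the
    # test set never shares imagery with the training set.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups=region))
    assert set(region[train_idx]).isdisjoint(region[test_idx])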
I'm not saying that convnets aren't better than simpler models, but particularly for satellite imaging I'd take them with a pinch of salt and see what the improvement from a baseline method is. If you look at a random sampling of papers from the DeepGlobe competition, almost none of them provide the results from e.g. a cheap linear SVM.
Fun side note - several existing "famous" datasets generalise poorly to the developing world because most of the imagery is from the developed world (and even more specifically the West) and infrastructure looks totally different.
How do you give a class to each pixel in an image using a linear classifier in a way that uses the surrounding pixels as context/input? I'm genuinely curious! You are right about the data: it's expensive to make, and startups based on satellite imagery tend to keep it to themselves since it's their main advantage.
In most cases, you can reshape an NxN region into a 1xN^2 vector. This was how object detection worked long before convolutional inputs were popular.
Have a look at mnist classification using a linear SVM, for example.
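Roughly like this, if it helps (my own toy sketch on random data): flatten the NxN window around each pixel, train a linear SVM to predict the centre pixel's class, and "segmentation" is just per-pixel classification over the grid.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    # Stand-in data: a small (H, W, 3) image and a per-pixel label map.
    H, W, N = 64, 64, 9                 # N x N patch around each pixel
    image = rng.random((H, W, 3)).astype(np.float32)
    labels = rng.integers(0, 2, size=(H, W))

    pad = N // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")

    # Flatten the N x N x 3 window around each pixel into one feature row;
    # the target is the class of the window's centre pixel.
    feats, targets = [], []
    for r in range(H):
        for c in range(W):
            feats.append(padded[r:r + N, c:c + N, :].reshape(-1))
            targets.append(labels[r, c])
    X = np.asarray(feats)
    y = np.asarray(targets)

    clf = LinearSVC(max_iter=10000)
    clf.fit(X, y)

    # Predicting every window and reshaping gives a full-resolution mask.
    mask = clf.predict(X).reshape(H, W)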
Classification makes sense, because you do a linear (or kernel) combination of the input and squash it with a sigmoid to get a probability of a class. For segmentation you output a pixel mask, so you would have an NxNx3 vector to predict 1 class for 1 pixel, and then you would have to do it for all pixels, so you'd have to encode the position as well. Alternatively, if you take a unique weight for each position, you end up with a single FC layer with NxNx3 inputs and NxN outputs (on the order of N^4 parameters). I guess for me it's hard to imagine doing segmentation "back then" and I find it very fascinating.
FB is far from the only big tech company which has sponsored OSM as a competitor to Google Maps (Microsoft/Facebook/DigitalGlobe/Telenav/FourSquare/Craigslist have all sponsored OSM to some degree; Apple, of course, went its own way and created Apple Maps).
It's a reaction to Google Maps: a monopoly on high-quality up-to-date global maps with business location is dangerous to everyone else, as a chokepoint on mobile applications. It's less about 'acquiring data' and more about not being extorted by GM. Classic 'commoditize your complement' dynamics: https://www.gwern.net/Complement
Apple Maps is based on OpenStreetMap, and they use OSM in many countries. I believe they use OSM for turn-by-turn routing in Denmark (or was it the Netherlands?). (Source: Apple gave a talk at the OSM conference (SotM) in 2018, but required that it not be recorded.)
Their motivation is self-serving but I don't think it's so nefarious. They use OpenStreetMap in check-in posts to display map data around locations that Facebook users have visited, so improving OpenStreetMap in turn improves the quality of this feature.
Facebook wasn't the only company adding low-quality data: https://forum.openstreetmap.org/viewtopic.php?id=64430 (Previous discussion on HN: https://news.ycombinator.com/item?id=18723138 )
This article's quotes of what appear to be OpenStreetMap representatives are generally positive, so maybe that means they fixed all the problems they caused.