OpenStreetMap

Minh Nguyen's Diary Comments

Diary Comments added by Minh Nguyen

Post When Comment
When AI is (not) needed

Based on the source tag, that building probably came from the French cadaster import. Many government building datasets have errors of this sort because the data collection is based on remote sensing technologies like LiDAR. Cleaning up these errors is the very reason why imports are more difficult than simply loading the data and uploading.

Whether an external building dataset comes from computer vision, machine learning classification, LiDAR, or other automated techniques, data consumers tend to prefer OSM data wherever it’s present because we’ve typically paid more individual attention and performed quality control on it. If you use an automated dataset in your product, you need to filter out low-confidence features or else you wind up with an impressive statistic but lots of junk.
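As a minimal sketch of the filtering step described above: the `confidence` property name and the 0.8 threshold are assumptions for illustration, not the schema of any particular dataset.

```python
from typing import Iterable


def filter_confident(features: Iterable[dict], min_confidence: float = 0.8) -> list[dict]:
    """Keep only features whose confidence score meets the threshold.

    The "confidence" property and the default threshold are hypothetical.
    """
    kept = []
    for feature in features:
        score = feature.get("properties", {}).get("confidence")
        # A feature without any score is treated as low confidence and dropped.
        if score is not None and score >= min_confidence:
            kept.append(feature)
    return kept


# Illustrative input only; these are not real footprints.
buildings = [
    {"id": "a", "properties": {"confidence": 0.95}},
    {"id": "b", "properties": {"confidence": 0.40}},
    {"id": "c", "properties": {}},
]
kept_ids = [f["id"] for f in filter_confident(buildings)]
```

Raising the threshold lowers the headline feature count but also the amount of junk, which is the tradeoff in question.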

It’s not just buildings. Every now and then, a navigation software vendor gets the bright idea to detect one-way streets automatically based on whether they have telemetry of people mostly going in one direction along the street but hardly in the other. Great – finally solved the problem of routing people the wrong way down a street! Invariably, they have to back away from this approach, because it turns out that many one-way streets don’t have the traffic volume needed to make a confident prediction about the traffic direction. Instead, they get complaints about having to circle around the entire city just to turn right. This data still makes for a great QA tool with a human in the loop, but it’s only a matter of time before someone sees that QA tool and gets a bright idea…
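To make the traffic volume problem concrete, here is a rough sketch of a telemetry-based classifier that declines to guess when it has too few traces. The function name, the thresholds, and the labels are all hypothetical; no vendor's actual method is implied.

```python
def infer_oneway(forward: int, backward: int,
                 min_samples: int = 200, min_share: float = 0.98) -> str:
    """Guess a street's direction from trace counts in each direction.

    Both thresholds are made-up values for illustration.
    """
    total = forward + backward
    if total < min_samples:
        # Too little traffic for a confident prediction: leave it to a human.
        return "needs_review"
    share = forward / total
    if share >= min_share:
        return "oneway_forward"
    if share <= 1 - min_share:
        return "oneway_backward"
    return "two_way"


quiet_street = infer_oneway(12, 0)      # every trace agrees, but there are only 12
busy_street = infer_oneway(990, 10)
mixed_street = infer_oneway(500, 500)
```

Without the `min_samples` guard, the quiet street would be marked one-way on the strength of a dozen traces, which is the kind of result that sends people circling around the city.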

When AI is (not) needed

I’m guessing this is the Microsoft building dataset, which applies computer vision to aerial imagery. Some data consumers like Mapbox and Overture Maps are using this dataset to backfill areas where OSM building coverage is lacking or nonexistent. From their perspective, the increase in coverage in places with fewer OSM mappers probably outweighs individual bloopers like this, and I guess from our perspective, we’d rather not face a bulk automated import of this dataset due to these bloopers.

Another thing that commonly occurs is that a building has been demolished, so we’ve deleted the building from OSM. But a data consumer working off outdated aerial imagery can’t distinguish that from a never-before-mapped building, so it restores the building from the Microsoft dataset. Of course, a human mapper could make the same mistake if they happen to be using the same outdated imagery with no local knowledge.

To address both cases, I’ve gotten into the habit of retagging buildings as demolished:building=*, at least until the local default imagery layer gets updated. These data consumers will omit any Microsoft building that intersects a building in OSM, so I hope they’ll do something similar with demolished:building in the future. This key also has the benefit of serving as a to-do list for OpenHistoricalMap and as a pre-cleanup step for any building import planned for the area.
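A sketch of that backfill rule, extended so that demolished:building also blocks a machine-generated footprint. This is the hoped-for behavior, not something any data consumer is known to do today. Bounding boxes stand in for real polygon intersection, and every name here is hypothetical.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlaps(a: BBox, b: BBox) -> bool:
    """True if two boxes share any interior area (a simplification of polygon intersection)."""
    return (a.min_x < b.max_x and b.min_x < a.max_x
            and a.min_y < b.max_y and b.min_y < a.max_y)


# Assumption: either key on an OSM feature suppresses a backfilled footprint.
BLOCKING_KEYS = ("building", "demolished:building")


def backfill(osm_features: list[tuple[dict, BBox]], ml_footprints: list[BBox]) -> list[BBox]:
    """Return the machine-generated footprints that do not collide with OSM."""
    blockers = [bbox for tags, bbox in osm_features
                if any(key in tags for key in BLOCKING_KEYS)]
    return [fp for fp in ml_footprints
            if not any(overlaps(fp, blocker) for blocker in blockers)]


osm = [
    ({"building": "yes"}, BBox(0, 0, 10, 10)),
    ({"demolished:building": "yes"}, BBox(20, 0, 30, 10)),
]
ml = [BBox(1, 1, 9, 9), BBox(21, 1, 29, 9), BBox(40, 0, 50, 10)]
added = backfill(osm, ml)
```

Only the third footprint survives: the first collides with a mapped building and the second with a building that mappers have recorded as demolished.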

In theory, we could go around mapping no:building=yes for the buildings on wheels you spotted, but my hands are full already without worrying about something that Microsoft could fix by tuning their noise filter.

Restructure wiki page key:name?

Yes, this page and the main “Names” page could use a thorough rewrite. There are a lot of intentional nuances in the text that matter but need to be organized better in order for readers to come away with what they need.

The article uses “primary name” in order to give an idea of when to use name versus some other name-related key such as name:en or alt_name. All of these keys hold proper nouns, or proper names as you put it, so replacing “primary name” with “proper name” would be correct but beside the point.

Minutely Shortbread tiles

Excited would be an understatement! I know it’s just a demonstration with no guarantees, but it just came in handy for the epic abandoned railway discussion we’re having. It was super simple to take your demo and extend it to demonstrate a mashup of minutely OpenStreetMap tiles and minutely OpenHistoricalMap tiles. There’s nothing quite like a live-updating, interactive map to convince others that it’s more than just talk.

OpenStreetMap + Wikidata

The possibility of minutely updates was one of the nice things about Sophox back when it was functional.

Sophox is functional again!

OpenStreetMap + Wikidata

I see. The possibility of minutely updates was one of the nice things about Sophox back when it was functional. It also queried Wikidata directly instead of keeping a local copy, at the expense of running time.

OpenStreetMap + Wikidata

Have you checked out QLever yet? It’s a fast alternative for federated queries on the server side. This diary post provides some examples to work off of.

QLever: a new way to query OpenStreetMap

I think setting it up would require as first step to dump the data items as RDF but I can’t find any documentation on that in the wiki.

You can get a full dump of the wiki’s pages and data items from this directory. I added a passage about it to the wiki page.

🌂 The Past, The Present, The Future

To your first point above: the close button on the banner was not about you. A number of us experienced the bug, yours truly took the time to calmly report the bug, you had some suggestions for fixing it, and it got fixed a different way. I’m sorry it didn’t get fixed in quite the way you suggested. Personally, I was pleasantly surprised at the turnaround time, and I don’t see any motive behind the bug that can be tied to the incident about AWS credits.

To your other points: I’m just a simpleton to whom clouds are welcome relief from the incessant sun in this part of the world. Simpletons like me don’t know what to do with all this melodrama.

Overture Places Data: Matching to OSM Tags

Bottom line, all POIs are questionable until checked by a person, which is

The line below this line is whether a local community considers this dataset to be a good use of their time versus other potential data sources as a reference point for verification (either in person or in an armchair). There isn’t a single global answer to that question.

Generalization of extraction of example codes, tabular data and Infoboxes from MediaWikis such as OSM.wiki

There’s a lot to unpack here, but just for awareness:

It’s JSON, which explains just how disconnected it actually is to the MediaWiki experience. That’s why it feels so foreign and disorienting, and functions like the completely tacked-onto experience it provides.

Wikitext is only one of the page content models that MediaWiki supports. For example, the Module: namespace is in Lua, and every user can personalize their wiki experience via personal subpages in JavaScript and CSS. Pages in the template namespace can also be in JSON, irrespective of Wikibase. Though this isn’t currently enabled on the OSM Wiki, we did consider it for event listings and such until OSMCal came along.

For all its warts, I appreciate the fact that Wikibase is intended for structured data. We can of course make wikitext look like structured data by convention and build custom tooling around it, but ultimately that results in a different kind of subpar experience for anyone who attempts to edit the wiki: you can write a wiki page using simple wikitext syntax as long as you avoid breaking several lightly documented tools that place arbitrary constraints on exactly how you write it (e.g., whitespace and capitalization) due to assumptions they make. Writing for the renderer, in other words.
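To illustrate the kind of constraint meant here, the sketch below compares two hypothetical extractors run against two spellings of the same template call, which MediaWiki should treat as equivalent (it ignores the case of a template name's first letter and trims whitespace around named parameters). The regular expressions are invented for this example and are not taken from any existing tool.

```python
import re
from typing import Optional

# Assumes one exact spelling: no spaces, capitalized name, key= as first parameter.
BRITTLE = re.compile(r"\{\{KeyDescription\|key=(\w+)")

# Tolerates whitespace and a lowercase first letter, but still assumes
# that key= is the first parameter.
TOLERANT = re.compile(r"\{\{\s*[Kk]eyDescription\s*\|\s*key\s*=\s*(\w+)")


def extract_key(pattern: re.Pattern, wikitext: str) -> Optional[str]:
    """Return the documented key, or None if the pattern does not match."""
    match = pattern.search(wikitext)
    return match.group(1) if match else None


compact = "{{KeyDescription|key=name|group=names}}"
spaced = "{{ keyDescription\n| key = name\n| group = names\n}}"

brittle_results = (extract_key(BRITTLE, compact), extract_key(BRITTLE, spaced))
tolerant_results = (extract_key(TOLERANT, compact), extract_key(TOLERANT, spaced))
```

Even the tolerant version bakes in an assumption about parameter order, so an editor who reorders the parameters would still break it.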

I appreciate your efforts at data mining the OSM Wiki, to the extent that you find the output useful. I also appreciate your emphasis on reusing existing content without creating extra maintenance overhead. However, we should view this kind of tooling as being complementary to structured data, not in competition with it.

“The Birdcage is lonely” - @OpenStreetMap engagement on Mastodon/Fediverse is streets ahead of Twitter.

I thought maybe the curious dynamic on mastodon where we have a “osm.town” grouping of users then a wider fediverse might mean we’re more likely to get lots of engagement from the users in our instance. That might mean it’s a better channel for talking to our community, but not so good for outreach.

At the moment, in raw numbers, Mastodon probably isn’t as powerful for outreach as Twitter used to be, but then again neither is Twitter these days.

Mastodon has the advantage of being the social network where many OSM-adjacent people have congregated, people who are likely to understand the OSM account’s posts without needing background context and a glossary, and thus more likely to get involved in OSM because of these posts. I agree that en.osm.town is a bit too cozy to serve as an outreach venue on its own, though this probably means it’s up to others to boost the posts to their still-likeminded followers beyond that server.

Check if POI website is active

Crossposted to this forum thread about solutions for avoiding link rot.

Maxar usage over the last year

Thanks for running these numbers. It helps to get a sense of where the missing imagery will be most acutely felt. Another measure of that would be the countries with the greatest share of Maxar-based changesets out of the total number of changesets.
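The alternative measure could be computed along these lines, assuming the per-country changeset counts are already in hand. The country names and counts below are invented placeholders, not real figures.

```python
def maxar_share(total_by_country: dict[str, int],
                maxar_by_country: dict[str, int]) -> list[tuple[str, float]]:
    """Rank countries by the fraction of their changesets that used Maxar imagery."""
    shares = {
        country: maxar_by_country.get(country, 0) / total
        for country, total in total_by_country.items()
        if total > 0  # skip countries with no changesets to avoid dividing by zero
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Placeholder numbers for illustration only.
totals = {"Country A": 1000, "Country B": 200}
maxar = {"Country A": 100, "Country B": 150}
ranked = maxar_share(totals, maxar)
```

In this made-up example, Country A has more Maxar-based changesets in absolute terms, but Country B depends on the imagery far more heavily, which is the distinction that raw counts would hide.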

Detour Routes in Pennsylvania

Indeed, there are two mapping styles. The two-relation approach is an optional enhancement over the one-relation approach. It has some benefits, including less likelihood of breakage (especially scrambled member order) when other editors come in later to split a way or remodel an intersection, and a more robust way to tag the signposted cardinal direction, if any.

Detour Routes in Pennsylvania

Because these detours are bidirectional, the from and to tags aren’t necessary.

Maybe it would be useful to create two relations, one in each direction, with the exit numbers as from and to, instead of relying on the name to communicate that information?

Mapping of runways

My point is that runways (and taxiways) are not “navigable” - they are not joined or connected to create routes or other relations.

It may help to consider nontrivial examples such as an international commercial airport that typically has not only taxiways but also service roads for airport vehicles crisscrossing the taxiways and apron.

OSM is already being used by flight simulator games that make use of runway centerlines (as well as their surface areas) for rendering and gameplay. This is not routing in the sense of an OsmAnd user getting directions to the supermarket, but it is still relevant for a “navigable path” concept to be mapped.

Mapping of runways

It also conforms to the very basic principle of “one feature on the ground, one entry in OSM”.

The “One feature, one OSM element” guideline goes on to list many examples of where multiple features are a good idea.

Sometimes it’s a matter of perspective. When you think of a river, you might be thinking of a channel to cross or navigate along, or you may be thinking of a body of water to enter. We can reconcile these two perspectives by mapping the “centerline” or perhaps the thalweg as a waterway=river way and simultaneously mapping the waterbody as a natural=water water=river area. Similarly, the idea behind area:aeroway=runway or any other area:*=* key is to provide an option to micromap the surface of something as opposed to the navigable path of something. By this logic, aeroway=runway would correspond to the centerline marking.

From a technical perspective, it’s possible but not necessarily straightforward to simplify a series of runway and taxiway areas into a series of centerlines that intersect correctly, especially because an area:aeroway=taxiway area would join to an area:aeroway=runway area with gentle curves on both sides. It wouldn’t be unreasonable for a renderer to make use of this kind of simplification as you zoom out.

Do people map single tennis courts?

A multicourt surface is sometimes named and signposted and even addressable as if it’s a “tennis center”, but often it’s just two or four conjoined courts that happen to share some out-of-bounds space. I guess the closest analogy would be baseball diamonds, which are often arranged so that the outfields overlap significantly. Basketball courts can also be combined on a single surface, but hoops is already well established for that scenario.

If these multi-court tennis complexes are tagged as sports centers, renderers need to be prepared for the possibility of sports centers within larger sports centers, as well as other edge cases. Many a YMCA features a fenced-in complex of two or four tennis courts. This actual tennis center containing a dozen courts each with spectator stands, three stadiums, and a campground would be tagged identically. This sports complex contains a tennis complex that has not been mapped because each court has been mapped individually.

Removing quantity= tags from pitches in the San Francisco Bay Area

I’m curious if folks would recommend also mapping a single fenced-in court as a pitch surrounded by a leisure=sports_centre. Otherwise it’s kind of inconsistent to map some courts along the out-of-bounds line and others at the edge of the pavement.