Minh Nguyen's Diary Comments
Diary Comments added by Minh Nguyen
Post | When | Comment |
---|---|---|
When AI is (not) needed | Based on the Whether an external building dataset comes from computer vision, machine learning classification, LiDAR, or other automated techniques, data consumers tend to prefer OSM data wherever it’s present because we’ve typically paid more individual attention and performed quality control on it. If you use an automated dataset in your product, you need to filter out low-confidence features or else you wind up with an impressive statistic but lots of junk. It’s not just buildings. Every now and then, a navigation software vendor gets the bright idea to detect one-way streets automatically based on whether they have telemetry of people mostly going in one direction along the street but hardly in the other. Great – finally solved the problem of routing people the wrong way down a street! Invariably, they have to back away from this approach, because it turns out that many one-way streets don’t have the traffic volume needed to make a confident prediction about the traffic direction. Instead, they get complaints about having to circle around the entire city just to turn right. This data still makes for a great QA tool with a human in the loop, but it’s only a matter of time before someone sees that QA tool and gets a bright idea… |
|
When AI is (not) needed | I’m guessing this is the Microsoft building dataset, which applies computer vision to aerial imagery. Some data consumers like Mapbox and Overture Maps are using this dataset to backfill areas where OSM building coverage is lacking or nonexistent. From their perspective, the increase in coverage in places with fewer OSM mappers probably outweighs individual bloopers like this, and I guess from our perspective, we’d rather not face a bulk automated import of this dataset due to these bloopers. Another thing that commonly occurs is that a building has been demolished, so we’ve deleted the building from OSM. But a data consumer working off outdated aerial imagery can’t distinguish that from a never-before-mapped building, so it restores the building from the Microsoft dataset. Of course, a human mapper could make the same mistake if they happen to be using the same outdated imagery with no local knowledge. To address both cases, I’ve gotten into the habit of retagging buildings as In theory, we could go around mapping |
|
Restructure wiki page key:name? | Yes, this page and the main “Names” page could use a thorough rewrite. There are a lot of intentional nuances in the text that matter but need to be organized better in order for readers to come away with what they need. The article uses “primary name” in order to give an idea of when to use |
|
Minutely Shortbread tiles | Excited would be an understatement! I know it’s just a demonstration with no guarantees, but it just came in handy for the epic abandoned railway discussion we’re having. It was super simple to take your demo and extend it to demonstrate a mashup of minutely OpenStreetMap tiles and minutely OpenHistoricalMap tiles. There’s nothing quite like a live-updating, interactive map to convince others that it’s more than just talk. |
|
OpenStreetMap + Wikidata |
|
OpenStreetMap + Wikidata | I see. The possibility of minutely updates was one of the nice things about Sophox back when it was functional. It also queried Wikidata directly instead of keeping a local copy, at the expense of running time. |
|
OpenStreetMap + Wikidata | Have you checked out QLever yet? It’s a fast alternative for federated queries on the server side. This diary post provides some examples to work off of. |
|
QLever: a new way to query OpenStreetMap |
You can get a full dump of the wiki’s pages and data items from this directory. I added a passage about it to the wiki page. |
|
🌂 The Past, The Present, The Future | To your first point above: the close button on the banner was not about you. A number of us experienced the bug, yours truly took the time to calmly report the bug, you had some suggestions for fixing it, and it got fixed a different way. I’m sorry it didn’t get fixed in quite the way you suggested. Personally, I was pleasantly surprised at the turnaround time, and I don’t see any motive behind the bug that can be tied to the incident about AWS credits. To your other points: I’m just a simpleton to whom clouds are welcome relief from the incessant sun in this part of the world. Simpletons like me don’t know what to do with all this melodrama. |
|
Overture Places Data: Matching to OSM Tags |
The line below this line is whether a local community considers this dataset to be a good use of their time versus other potential data sources as a reference point for verification (either in person or in an armchair). There isn’t a single global answer to that question. |
|
Generalization of extraction of example codes, tabular data and Infoboxes from MediaWikis such as OSM.wiki | There’s a lot to unpack here, but just for awareness:
Wikitext is only one of the page content models that MediaWiki supports. For example, the Module: namespace is in Lua, and every user can personalize their wiki experience via personal subpages in JavaScript and CSS. Pages in the template namespace can also be in JSON, irrespective of Wikibase. Though this isn’t currently enabled on the OSM Wiki, we did consider it for event listings and such until OSMCal came along. For all its warts, I appreciate the fact that Wikibase is intended for structured data. We can of course make wikitext look like structured data by convention and build custom tooling around it, but ultimately that results in a different kind of subpar experience for anyone who attempts to edit the wiki: you can write a wiki page using simple wikitext syntax as long as you avoid breaking several lightly documented tools that place arbitrary constraints on exactly how you write it (e.g., whitespace and capitalization) due to assumptions they make. Writing for the renderer, in other words. I appreciate your efforts at data mining the OSM Wiki, to the extent that you find the output useful. I also appreciate your emphasis on reusing existing content without creating extra maintenance overhead. However, we should view this kind of tooling as being complementary to structured data, not in competition with it. |
|
“The Birdcage is lonely” - @OpenStreetMap engagement on Mastodon/Fediverse is streets ahead of Twitter. |
At the moment, in raw numbers, Mastodon probably isn’t as powerful for outreach as Twitter used to be, but then again neither is Twitter these days. Mastodon has the advantage of being the social network where many OSM-adjacent people have congregated, people who are likely to understand the OSM account’s posts without needing background context and a glossary, and thus more likely to get involved in OSM because of these posts. I agree that en.osm.town is a bit too cozy to serve as an outreach venue on its own, though this probably means it’s up to others to boost the posts to their still-likeminded followers beyond that server. |
|
Check if POI website is active | Crossposted to this forum thread about solutions for avoiding link rot. |
|
Maxar usage over the last year | Thanks for running these numbers. It helps to get a sense of where the missing imagery will be most acutely felt. Another measure of that would be the countries with the greatest share of Maxar-based changesets out of the total number of changesets. |
|
Detour Routes in Pennsylvania | Indeed, there are two mapping styles. The two-relation approach is an optional enhancement over the one-relation approach. It has some benefits, including less likelihood of breakage (especially scrambled member order) when other editors come in later to split a way or remodel an intersection, and a more robust way to tag the signposted cardinal direction, if any. |
|
Detour Routes in Pennsylvania |
Maybe it would be useful to create two relations, one in each direction, with the exit numbers as |
|
Mapping of runways |
It may help to consider nontrivial examples such as an international commercial airport that typically has not only taxiways but also service roads for airport vehicles crisscrossing the taxiways and apron. OSM is already being used by flight simulator games that make use of runway centerlines (as well as their surface areas) for rendering and gameplay. This is not routing in the sense of an OsmAnd user getting directions to the supermarket, but it is still relevant for a “navigable path” concept to be mapped. |
|
Mapping of runways |
The “One feature, one OSM element” guideline goes on to list many examples of where multiple features are a good idea. Sometimes it’s a matter of perspective. When you think of a river, you might be thinking of a channel to cross or navigate along, or you may be thinking of a body of water to enter. We can reconcile these two perspectives by mapping the “centerline” or perhaps the thalweg as a From a technical perspective, it’s possible but not necessarily straightforward to simplify a series of runway and taxiway areas into a series of centerlines that intersect correctly, especially because an |
|
Do people map single tennis courts? | A multicourt surface is sometimes named and signposted and even addressable as if it’s a “tennis center”, but often it’s just two or four conjoined courts that happen to share some out-of-bounds space. I guess the closest analogy would be baseball diamonds, which are often arranged so that the outfields overlap significantly. Basketball courts can also be combined on a single surface, but If these multi-court tennis complexes are tagged as sports centers, renderers need to be prepared for the possibility of sports centers within larger sports centers, as well as other edge cases. Plenty a YMCA features a fenced-in complex of two or four tennis courts. This actual tennis center containing a dozen courts each with spectator stands, three stadiums, and a campground would be tagged identically. This sports complex contains a tennis complex that has not been mapped because each court has been mapped individually. |
|
Removing quantity= tags from pitches in the San Francisco Bay Area | I’m curious if folks would recommend also mapping a single fenced-in court as a pitch surrounded by a |