balrog-kun's Diary

LZO for compressing planets

Posted by balrog-kun on 19 July 2009 in English.

Earlier I compared working with planet files compressed with bzip2 and gzip, and user Mungewell suggested trying out LZO (which I had originally misremembered as LZMA, actually a different compressor with the opposite goal: maximize compression ratio, regardless of processing time). LZO turns out even better than gzip: it compresses and, more importantly, decompresses even faster, again at some loss in compression ratio, but without leaving the order of magnitude set by gzip, bzip2 and lzma. Recent planet sizes come out as follows:

raw 150 GB
lzop 14 GB
gzip 10.5 GB
bzip2 6.5 GB

...with lzop decompressing at least 2x as fast as gzip (which is already at least 15x faster than bzip2), so on an average (2009) hard drive and average desktop CPU, processing a planet (reading off the HD + decompressing) is fastest with LZO. Compression + writing to the HD is also fastest with LZO on my hardware; unfortunately I can't give exact numbers because I'm now doing my processing on one of my university's machines, which have better specs.
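If you want to reproduce the comparison on your own hardware, something along these lines will do (a rough sketch; the file names are just placeholders for whatever planet dumps you have around):

    # reading off the HD + decompressing, the common case when processing a planet
    time lzop -dc planet.osm.lzo > /dev/null
    time gzip -dc planet.osm.gz  > /dev/null
    time bzcat   planet.osm.bz2  > /dev/null

    # compressing a raw planet with lzop
    lzop -c planet.osm > planet.osm.lzo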

I suspect that with LZO we're close to the sweet spot, and with one of those slower HDs you might be better off using gzip, because the balance between CPU speed and HD access speed moves in the direction where you want to save on IO. You definitely don't want to use the raw planet, because IO becomes the bottleneck even on the fastest available hardware.

openstreetmap.hu

Posted by balrog-kun on 7 July 2009 in English.

When I acquired openstreetmap.pl I noticed that lots of other ccTLDs were already taken, so I thought I'd get as many as I could before some domain troll gets them, because I've had problems with those before. I only got .pl and .hu; most other domains were either really expensive, required a passport number from the country, or something else (.hu required sending some documents in Hungarian by snail mail, but I could do that). So I've set up a very simple Polish page on openstreetmap.pl that does nothing fancy but has the initial view set on Poland and localised links, and then redirected .hu to the same page, only adding javascript that changes the initial location on the page to point at Hungary.

So, if anyone has ideas for a better use of the .hu domain, or can at least translate the copyright notice and the "edit" and "forums" links into Hungarian, then I can host a localised page on the same computer, or we can figure out something else.

(Posted from the Gran Canaria Desktop Summit, where there was going to be a mapping party, but it doesn't seem like enough people know about it, and on Thursday, when it was originally planned, some people will already be taking off for State Of The Map, so once again it seems like I'll never have an occasion to attend a real mapping party.)

So I visited User:Mala in Poznań for the weekend (which we stretched to four days) and together we had our first formal Mapping Party, in the sense that more than one person was out in the "real world" at the same time, in the same place, with the specific goal of putting stuff on the map. We wrote down everything and more, then loaded up JOSM on my laptop and mapped it before wandering off too far, so that if we had any doubts (my handwriting is really bad) we could just go and look again. The density of amenities and POIs in the centre of Poznań is such that you can map a whole lot without walking more than 100 m. This also means that most of the stuff will never get a chance to render on mapnik, because the renderer tries to avoid drawing things on top of one another, but this is what the centre looks like now:

http://osm.org/go/0ORAyZuhg==
(compare with osmarender)

In two weeks Mala is visiting me in Warsaw, and hopefully we can do the same to the Warsaw Old Town market square, which, like the Poznań Old Market, is pedestrian-only, so you don't go there often except for tourism, and so this seemingly busy and historically rich area is not really well mapped (http://osm.org/go/0Oy6Vx9yh==?layers=0B00FTF).

Location: 52.408, 16.934

Gzip for compressing planets

Posted by balrog-kun on 18 June 2009 in English. Last updated on 20 March 2010.

I eventually set up my own live mirror of the planet file today and needed to write the update scripts etc. (I could probably have downloaded somebody else's from somewhere, but nobody answered quickly enough on IRC).

I first tried to have osmosis take the bzipped planet, patch it with the daily diff and generate a new bzipped planet (yes, I took care that the bzipping and unbzipping happened in separate processes so they could run on the second core). The decompression speed was OK, the processing speed was amazingly OK too (despite Java), but compression was taking 80% of the three processes' total CPU time; the ratio was about 10/15/100 (bzcat/osmosis/bzip2). It would likely have taken more than 20h for this to complete, and the result would have been about 6GB.
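The whole thing was essentially one shell pipeline, roughly like the sketch below (file names are placeholders; the exact osmosis task syntax is from memory and worth checking against its documentation, a file name of "-" should make osmosis read stdin and write stdout):

    bzcat planet-old.osm.bz2 \
      | osmosis --read-xml-change file=daily-diff.osc.gz \
                --read-xml file=- \
                --apply-change \
                --write-xml file=- \
      | bzip2 > planet-new.osm.bz2

With the compressors at both ends of the pipe running as their own processes, they can be scheduled on the second core instead of blocking osmosis.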

Then I tried saving uncompressed, and the thing became IO bound, by my SATA disk I think; it might have finished in perhaps 5h and would have occupied about 150GB.

Now I thought the best option would be a compromise using a really cheap algorithm, which should still be useful considering the planet is all text. I considered gzip and lzma; the benchmark here: http://tukaani.org/lzma/benchmarks made it pretty clear that lzma was even heavier on time than bzip2, and that gzip -1 (or --fast, the lowest compression ratio setting) is clearly what I wanted. In --fast mode, both compression and decompression are multiple times faster than with bzip2 or lzma. The compression ratio is still within the same order of magnitude as bzip2 and lzma (not more than 2x worse), enough to pretty much guarantee that the whole conversion won't be IO bound on an average SATA 1TB disk and an Athlon 64 X2. The ratio of CPU cycles consumed (bzcat/osmosis/gzip) is quite sane now, 35/100/15. If the benchmark I linked is right, then reading the new planet.osm for further processing should also be multiple times faster than any of: uncompressed, bzip2 or lzma.
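So the final pipeline is just the previous one with the bzip2 ends swapped for gzip in --fast mode (same caveats, just a sketch):

    gzip -dc planet-old.osm.gz \
      | osmosis --read-xml-change file=daily-diff.osc.gz \
                --read-xml file=- \
                --apply-change \
                --write-xml file=- \
      | gzip --fast > planet-new.osm.gz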

EDIT: the ungzip/osmosis/gzip process takes just over 3h on the above setup and the planets are 10GB in size; the CPU usage proportions were about 7/100/13 on average.