
Ovid (2709)

http://publius-ovidius.livejournal.com/

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Saturday August 29, 2009
01:47 PM

198,586

[ #39553 ]

I now have 198,586 cities in my database, complete with latitude and longitude for all of them. As a result, my pet project now lets you click on a country, see the regions for that country, click on a region to get a paged city list (1,576 cities just for New South Wales), and see a map of any city you click on.

The site still doesn't do anything amazing and it's not particularly user-friendly. That's because I'm discovering that managing data quality is really hard when working with only free data. Not quite sure why American Samoa has a region named "00", but there you go. In fact, I have 54 regions in the database named "00". Lots of slogging through here to understand things.
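For anyone curious how placeholder regions like "00" can be flushed out, here is a minimal sketch. It is not the project's actual code: the `region` table, its `country` and `name` columns, the function name `find_placeholder_regions`, and the sample rows (including "Example Country") are all assumptions made up for illustration. It flags any region whose name is blank or consists only of digits.

```python
import sqlite3


def find_placeholder_regions(conn):
    """Return (country, name) pairs whose region name is blank or all digits.

    Assumes a hypothetical table: region(country TEXT, name TEXT).
    """
    rows = conn.execute(
        "SELECT country, name FROM region ORDER BY country, name"
    )
    return [
        (country, name)
        for country, name in rows
        if not name.strip() or name.strip().isdigit()
    ]


# Illustrative sample data only; not taken from the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (country TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO region VALUES (?, ?)",
    [
        ("American Samoa", "00"),
        ("Australia", "New South Wales"),
        ("Example Country", "00"),
        ("Germany", "Berlin"),
    ],
)

placeholders = find_placeholder_regions(conn)
print(placeholders)
```

The filtering is done in Python rather than in SQL so the rule is easy to extend (for example, to names that are a single letter) without worrying about pattern-matching differences between databases.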

(Who the hell spends time coding on their vacation?)

  • Yeah, it's quite easy to find data problems there.

    Germany > Berlin: the cities there are really boroughs, and most of them are duplicated (with and without the "Berlin-" prefix).

    Inconsistent umlaut handling: Germany > Baden-Württemberg has a "u" instead of "ü", but I see umlauts in the "cities" under Berlin.

    Croatia > Zagrebacka: this should not be Zagrebacka, but rather Zadarska or so.

    Croatia > Grad Zagreb: the city of Zagreb appears twice, once under the correct name and once under th
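The duplicate and umlaut problems described in this comment could be surfaced with something like the sketch below. It is only an illustration under stated assumptions: the function names `fold` and `likely_duplicates` are hypothetical, the sample names are made up for the example rather than pulled from the site's database, and it only catches diacritics that were dropped outright (it does not handle the "ue" transliteration of "ü").

```python
import unicodedata
from collections import defaultdict


def fold(name, region):
    """Reduce a place name to a comparison key.

    Strips a leading "<region>-" prefix, removes diacritics, and casefolds.
    """
    prefix = region + "-"
    if name.startswith(prefix):
        name = name[len(prefix):]
    # NFKD splits "ö" into "o" plus a combining diaeresis, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def likely_duplicates(names, region):
    """Group names that share a comparison key; return groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[fold(name, region)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Illustrative sample only.
names = ["Berlin-Köpenick", "Köpenick", "Kopenick", "Spandau"]
dupes = likely_duplicates(names, "Berlin")
print(dupes)
```

The output is a list of candidate groups for a human to review, not an automatic merge, since two distinct places can legitimately share a folded name.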

    • I'll look for the OpenStreetMap data and check the license. I've blown enough time looking at annoying data integrity issues that an alternative data source would be good.

      Also, SQLite is fine for a simple database, but when you need serious reliability and are struggling with data integrity, it's painful to work with.

      • Kudos for a great job!

        Let me know if you think I can contribute somehow. I am originally from Argentina, so I can check or add places in Argentina/South America from sources in Spanish, etc.

        -- ank

        • Thanks for the offer of help. Eventually I'll get to the point where I'll need it from folks who want to help enter legal immigration data per country.

          To be honest, though, I might drop city/region support altogether. As I've discovered, most of the thorough information out there is geographic in nature (not surprising) but the information I need is political in nature. They don't quite fit together. If you want to emigrate to the US, do you really need to know where Paris, Texas is? Not reall