Showing posts with label Latitude. Show all posts

Aug 10, 2013

PSGC Data Refinement

A couple of days ago, I crawled the PSGC directory listings. Today I've been busy refining the data I gathered from those crawls. Even though the data is published by a large government agency in the Philippines, it isn't usable as-is. Some place names carry a parenthesized "(Capital)" tag, so I had to strip those off, along with other parenthesized text such as place-name aliases. With a few more refinements, the data should be ready for inverted indexing.
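
To give an idea of the cleanup, here is a minimal sketch in Java of stripping parenthesized text from a place name. The class name `PlaceNameCleaner`, the method `clean`, and the sample name are made up for illustration, and the regex assumes the parentheses are never nested.

```java
import java.util.regex.Pattern;

final class PlaceNameCleaner {

    // A parenthesized group plus any whitespace right before it, e.g. " (Capital)".
    // Assumption: parentheses are not nested.
    private static final Pattern PARENTHESIZED = Pattern.compile("\\s*\\([^)]*\\)");

    // Runs of two or more whitespace characters that may be left behind.
    private static final Pattern EXTRA_SPACES = Pattern.compile("\\s{2,}");

    /** Removes every parenthesized part of a place name and tidies the spacing. */
    static String clean(String rawName) {
        String withoutParens = PARENTHESIZED.matcher(rawName).replaceAll("");
        return EXTRA_SPACES.matcher(withoutParens).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical input, not an actual PSGC entry.
        System.out.println(clean("SAMPLE CITY (Capital)"));
    }
}
```

Note that this throws the aliases away. If they turn out to be useful for searching later, it would be better to capture them into a separate field before stripping.
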
Another thing I did today was extract latitude and longitude data from the Barangays shapefile I downloaded from the Philippine GIS Data Clearinghouse. The download is a .7z archive, so I first had to extract its contents and find the file with the .shp extension. I then used GDAL's ogr2ogr utility to convert the shapefile into a .kml file. Since KML is XML-based, I was then able to pull out the data I wanted with a few lines of Java and the jsoup library.
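
For the curious, the conversion step looks roughly like `ogr2ogr -f KML barangays.kml barangays.shp` (the file names here are placeholders). The extraction can then be sketched as below. This is not my exact code: to keep it runnable without extra dependencies, it uses the JDK's built-in DOM parser instead of jsoup. It also assumes each `Placemark` has a `name` child, and it simply averages the vertices of the first `coordinates` element to get one rough point per Barangay. The class name `KmlCoordinateExtractor` is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class KmlCoordinateExtractor {

    /** Maps each Placemark name to {latitude, longitude}; Placemarks without geometry are skipped. */
    static Map<String, double[]> extract(String kml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(kml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse KML", e);
        }

        Map<String, double[]> result = new LinkedHashMap<>();
        NodeList placemarks = doc.getElementsByTagName("Placemark");
        for (int i = 0; i < placemarks.getLength(); i++) {
            Element placemark = (Element) placemarks.item(i);
            // Assumption: the place name is stored in a <name> child element.
            NodeList names = placemark.getElementsByTagName("name");
            NodeList coords = placemark.getElementsByTagName("coordinates");
            if (names.getLength() == 0 || coords.getLength() == 0) {
                continue; // no name or no geometry for this entry
            }
            double[] point = averagePoint(coords.item(0).getTextContent());
            if (point != null) {
                result.put(names.item(0).getTextContent().trim(), point);
            }
        }
        return result;
    }

    /** Averages whitespace-separated "lon,lat[,alt]" tuples; returns null if there are none. */
    static double[] averagePoint(String coordinatesText) {
        double latSum = 0;
        double lonSum = 0;
        int count = 0;
        for (String tuple : coordinatesText.trim().split("\\s+")) {
            String[] parts = tuple.split(",");
            if (parts.length < 2) {
                continue; // blank or malformed tuple
            }
            lonSum += Double.parseDouble(parts[0]); // KML puts longitude first
            latSum += Double.parseDouble(parts[1]);
            count++;
        }
        return count == 0 ? null : new double[] {latSum / count, lonSum / count};
    }
}
```

Averaging the vertices is only a rough stand-in for a real centroid, since a closed polygon ring repeats its first vertex, but it should be close enough to place a Barangay on a map.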

After getting all that data, I was sad about the results. :'( Not all of the Barangays have coordinate data. Maybe I can find more on the Web, but I still have no idea where to look.

That's all for now. I'll post more updates in the next few days.