
Aug 10, 2013

PSGC Data Refinement

A couple of days ago, I crawled the PSGC Directory Listings. Today, I've been busy refining the data I gathered from those crawls. Even though the data comes from a major government agency in the Philippines, it is not yet clean enough to use as-is. For example, I found place names with a parenthesized "Capital" tag, so I needed to strip those off, along with other parenthesized text such as place name aliases. With a few more refinements, the data should be ready for inverted indexing.
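The parenthesis stripping described above could look something like the following. This is only a minimal sketch using the standard `java.util.regex` package; the class name, method name and sample place names are made up for illustration, and it assumes the parentheses are never nested.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not the actual refinement code used for the PSGC data.
final class PlaceNameCleaner {
    // Matches one parenthesized group plus any whitespace right before it, e.g. " (Capital)".
    private static final Pattern PARENTHESIZED = Pattern.compile("\\s*\\([^)]*\\)");

    static String clean(String rawName) {
        // Drop every parenthesized group (assumes no nested parentheses).
        String withoutParens = PARENTHESIZED.matcher(rawName).replaceAll("");
        // Collapse leftover runs of whitespace and trim both ends.
        return withoutParens.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Sample inputs are invented, not taken from the PSGC listings.
        System.out.println(clean("Sample Town (Capital)"));
        System.out.println(clean("Sample Barangay (Old Name)"));
    }
}
```

Names without parentheses pass through unchanged, so the same method can be applied to every row.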
Another thing I did today was extract latitude and longitude data from the Barangays shapefile I downloaded from the Philippine GIS Data Clearinghouse. The download is a .7z archive, so I had to extract its contents first and then find the file with the .shp extension. I then used GDAL's ogr2ogr utility to convert the shapefile into a .kml file. Since KML is an XML-based format, I was able to extract the data I wanted with a few lines of Java and the jsoup library.
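To give an idea of the KML extraction step, here is a rough sketch. Note that it swaps jsoup for the XML parser bundled with the JDK so that it runs without extra libraries, and it assumes each `Placemark` carries a `name` element and at least one `coordinates` element; the actual layout of the converted file may differ. The class name and file names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Assumed conversion step (file names are hypothetical):
//   ogr2ogr -f KML barangays.kml barangays.shp
final class KmlCoordinates {
    /** Maps each placemark name to {latitude, longitude}, taken from its first coordinate tuple. */
    static Map<String, double[]> extract(String kml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(kml.getBytes(StandardCharsets.UTF_8)));
            Map<String, double[]> result = new LinkedHashMap<>();
            NodeList placemarks = doc.getElementsByTagName("Placemark");
            for (int i = 0; i < placemarks.getLength(); i++) {
                Element placemark = (Element) placemarks.item(i);
                NodeList names = placemark.getElementsByTagName("name");
                NodeList coords = placemark.getElementsByTagName("coordinates");
                if (names.getLength() == 0 || coords.getLength() == 0) {
                    continue; // placemark has no usable name or geometry
                }
                // KML stores tuples as "longitude,latitude[,altitude]" separated by whitespace.
                String firstTuple = coords.item(0).getTextContent().trim().split("\\s+")[0];
                String[] parts = firstTuple.split(",");
                if (parts.length < 2) {
                    continue; // empty or malformed coordinates
                }
                double longitude = Double.parseDouble(parts[0]);
                double latitude = Double.parseDouble(parts[1]);
                result.put(names.item(0).getTextContent().trim(), new double[] {latitude, longitude});
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse KML", e);
        }
    }
}
```

For polygon geometries this only picks the first vertex of the boundary, which is a crude stand-in for a real centroid. Placemarks without coordinates are simply skipped.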

After getting a tremendous amount of data, I was sad about the results. :'( Not all of the Barangays have coordinate data. Maybe I can find more sources on the Web, but I still have no idea where to get them.

That's all for now. I'll post more updates in the next few days.

Aug 9, 2013

Complete Successful PSGC Crawl

One of the requirements for my post-graduate Application Project is to obtain an updated copy of the Philippine Standard Geographic Code (PSGC). The PSGC is updated for many reasons; you can read all about them in the PSGC Interactive - Updating Procedures. So, I created a crawler to get the necessary data from NSCB's PSGC Directory, starting from the List of Regions and going down to the detailed listings of Provinces, Cities, Municipalities and Barangays. My crawl finished at around 2:30 pm (Philippine Time).
The crawler was not that complicated to make. I wrote it in Java with some classes from the jsoup project. To store the gathered data I used MySQL (naive programmers' favorite choice) with the help of its connector for Java. The most notable obstacle I encountered while building the crawler was the old-school markup of the pages. Most of the pages are laid out using <table> elements, which is why traversing the tags to get to the right data can be a little tricky.

Even though I only used a not-so-powerful computer, I was able to crawl the directory easily. The crawler was designed to make good use of the processor cores, which makes the crawling faster. The computer I used is just a laptop with only 2 cores, but the crawler starts one fewer crawling thread than the number of processor cores, leaving one core for the main process, which handles the same set of jobs as the threads. Just some basic parallel processing. My previous professor even called it "Naive Parallel Programming". lol
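The threading scheme described above can be sketched roughly as follows. This is an illustration under assumptions, not the original crawler: the class and method names are invented, the "work" is just a counter, and the job queue is a fixed list rather than one that grows as new pages are discovered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "cores minus one" worker threads plus a working main thread.
final class NaiveParallelCrawl {
    /** One fewer worker than the number of cores, never negative. */
    static int workerCount(int cores) {
        return Math.max(0, cores - 1);
    }

    /** Every thread, including the main one, runs this same loop until the queue is empty. */
    static void drain(Queue<String> urls, AtomicInteger processed) {
        String url;
        while ((url = urls.poll()) != null) {
            // Placeholder for the real job: fetch the page at url and parse it.
            processed.incrementAndGet();
        }
    }

    /** Processes all URLs and returns how many were handled. */
    static int crawlAll(List<String> seedUrls) {
        Queue<String> urls = new ConcurrentLinkedQueue<>(seedUrls);
        AtomicInteger processed = new AtomicInteger();
        int workers = workerCount(Runtime.getRuntime().availableProcessors());

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(() -> drain(urls, processed));
            worker.start();
            threads.add(worker);
        }

        // The main thread takes the remaining core and does the same set of jobs.
        drain(urls, processed);

        for (Thread worker : threads) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return processed.get();
    }
}
```

On a 2-core laptop this starts a single worker thread next to the main thread. On a 1-core machine no extra threads are started and the main thread does everything.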

I was so happy when I checked the PSGC's most recent summary (as of March 2013), because its figures matched the data I gathered from the crawl.

For now, the next thing I'm going to do is probably refine the data.

I hope God will give me the strength to do everything I need to do. :-)

Aug 2, 2013

Philippine Places API

There is a need for a Web API for places in the Philippines, covering the names of Barangays, Municipalities, Cities and Provinces. There are many enthusiastic Filipino Web Developers out there who really need an API like this. So, my solution is the PSGC Web API: A Philippine Standard Geographic Code Software Platform.

The PSGC Web API is my post-graduate special problem. I hope that there are also people out there who think this project is important. As of now, I'm still in the research and documentation stage. I'll post updates on this blog.

By the way, if you are interested in the project, you might want to read the Background of the PSGC and how the standard geographic code is structured. To make the API available for public use, I'm going to use JSON and probably XML as well to make something like this.

I hope I can finish my project as soon as possible.

Feb 19, 2017
UPDATE: I've already published the Web API at juan-ld.appspot.com. Go to http://juan-ld.appspot.com/ to see the available web services.