sam keen's corner of the web

Portland Restaurant Health Map

Inspired by Toby Segaran’s heat map of restaurant health inspection scores for San Francisco, I set out to build the same for Portland, Oregon. I scraped establishment records from the existing Multnomah County Food Establishment Inspections Search, then got lat/longs for those addresses using Google’s geocode API. Using the gathered establishment records, I scraped the MCHD search site once more for the inspection records of those businesses. Lastly, I displayed the data on a Google map, graded from green for good scores down to red for the not-so-good ones.

See the map here. (It’s plotting ~3000 markers so it can take a bit of time to load).

Health Scores Map

After looking at the map I realized, just as Toby did, that beyond displaying ‘concerning’ red areas, it exposes restaurant ‘clusters’ that you may not have known about and can then choose to explore.

Below is some more detail on the techniques I used to build this particular map.

Get the data

I searched around the web and found the Multnomah County Food Establishment Inspections Search. I tried to get an export of the back-end data by calling county health. They were helpful, but the export I received contained just the business records, not the inspection records for those businesses. I think they may have just misunderstood me, and I probably could have pressed more and gotten the data I needed, but I thought it would be fun to polish the old web scraping skills instead. Also, a scraping strategy would allow me to refresh the data whenever needed, and I wouldn’t have to keep bugging MCHD.

Selenium is a suite of web app testing tools. Selenium IDE is a Firefox add-on that records all the actions you take in the browser. It is meant for building functional testing scripts, but it doubles as a great tool for building web scraping scripts. What I find most useful about Selenium IDE is that after recording your script you can export it to the xUnit flavor of your choice, PHPUnit in my case. This gives you a starting point for your script; you can then add things like database persistence for the information you are scraping and use the PHPUnit assertion methods to let you know if your script has broken. Then you can use a tool called Selenium Remote Control to re-run the script (and drive the browser) anytime you need. This intro video does a great job of explaining some of Selenium’s features.


Selenium Script Churning Through Data

So I needed to accomplish four things.

(I have included links to the scripts I used. These are by no means ‘production ready’, but some might find them useful as a starting point for their own projects. Also, here is the DB SQL used.)

Step I: Pull the establishment records from the MCHD site

I noticed that on the MCHD site, if you just click ‘search’ without any criteria, you are taken to a complete, paginated set of the records (currently 130 pages for a total of just over 3000 records). So my first script was started with Selenium IDE and works through these 130 pages, gathering establishment metadata and storing it in a MySQL database. The Script
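As a rough illustration of that page-by-page loop (not the original PHPUnit script), here is a minimal Python sketch. It assumes a hypothetical `fetch_page` callable that returns the rows found on a given result page as dictionaries, and it uses SQLite in place of MySQL so the example stays self-contained. The table and column names are also assumptions.

```python
import sqlite3


def scrape_all_pages(fetch_page, conn):
    """Walk result pages 1, 2, 3, ... until a page comes back empty.

    fetch_page is a hypothetical callable: page number -> list of dicts
    with the keys "id", "name" and "address" (assumed column names).
    Returns the number of rows stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS establishments ("
        "id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
    )
    page, total = 1, 0
    while True:
        rows = fetch_page(page)
        if not rows:  # an empty page means we ran past the last one
            break
        # INSERT OR REPLACE keeps a re-run from duplicating establishments
        conn.executemany(
            "INSERT OR REPLACE INTO establishments (id, name, address) "
            "VALUES (:id, :name, :address)",
            rows,
        )
        total += len(rows)
        page += 1
    conn.commit()
    return total
```

Stopping on the first empty page, rather than hard-coding 130 pages, means the loop would keep working if the county adds or removes records.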

Step II: Pull the inspection scores for the gathered establishments from the MCHD site.

Again, I used Selenium IDE to start a PHPUnit script, this one scraping the inspection record for each gathered establishment. It simply queries all the establishment ids from our database, builds a URL for each business, scrapes the inspection data from that page, and stores it in an inspections table in the database. The Script
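The "query the ids, build the URLs" part of that step might look something like the Python sketch below. The base URL and the `establishment_id` query parameter are placeholders, since the real MCHD URL format is not given here, and SQLite stands in for MySQL.

```python
import sqlite3
from urllib.parse import urlencode

# Placeholder address: the real MCHD inspection page URL is not shown in this post.
INSPECTION_URL = "https://example.org/inspections"


def inspection_urls(conn):
    """Return (establishment id, URL to scrape) pairs for every stored establishment."""
    ids = [row[0] for row in conn.execute("SELECT id FROM establishments ORDER BY id")]
    # "establishment_id" is an assumed parameter name, used only for illustration
    return [(i, INSPECTION_URL + "?" + urlencode({"establishment_id": i})) for i in ids]
```

Each returned URL would then be loaded and parsed, with the resulting scores written to the inspections table keyed by the establishment id.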

Step III: Geocode the Establishment Records using Google

In preparation for displaying the data on a map, I needed to get the lat/long for each establishment in the database.  I did this using Google’s Geocode service but there are many options for services that can accomplish this.  Just be sure you stay within their Terms Of Use. The Script
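For readers who want to try the geocoding step, here is a small Python sketch of building a request and reading the answer. It assumes the JSON endpoint of Google's current Geocoding web service and its documented response shape (`status`, then `results[0].geometry.location`); the linked script was written against whatever version was available at the time and may differ. The API key is something you would supply yourself.

```python
import json
from urllib.parse import urlencode

# JSON endpoint of Google's Geocoding web service (assumed to be the current one)
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key):
    """Build the request URL for one street address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def parse_lat_lng(response_text):
    """Return (lat, lng) from a geocode JSON response, or None if nothing matched."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

Keeping the parsing separate from the HTTP call makes it easy to throttle or cache requests, which helps with staying inside the service's usage limits.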

Step IV: Display the Establishments on something akin to a heat-map

This is the easy part once you’ve done the heavy lifting in steps I, II and III. It involves one query against the DB to pull the score, latitude, and longitude for each establishment we know about. Then, using a fairly simple PHP web page, we build the HTML and JavaScript to display this data on a Google map. The Script
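The green-to-red grading can be done with a simple linear blend. The Python sketch below is one possible way to do it, not necessarily how the linked script does. It assumes scores run from 0 to 100 with higher being better; adjust `lo` and `hi` if the real range differs.

```python
def score_to_color(score, lo=0.0, hi=100.0):
    """Map a score to a hex color: pure red at lo, pure green at hi.

    The 0 to 100 default range is an assumption; out-of-range scores are clamped.
    """
    clamped = min(max(score, lo), hi)
    t = (clamped - lo) / (hi - lo)  # 0.0 = worst, 1.0 = best
    red = round(255 * (1 - t))
    green = round(255 * t)
    return "#{:02x}{:02x}00".format(red, green)
```

The resulting hex string can be written into the generated JavaScript as the marker color for each establishment.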
