My main side project over the past several months is a response to what I saw during the last big Ebola outbreak, which – sadly – is still simmering along such that “flare-ups of cases continue” despite the emergency having been officially declared over. The first piece of software to come out of that side project is called an omolumeter (live demo of v0.1.0).

Omolumeter v0.1.0

Back in December of last year I completed an analysis of the open data on the Ebola 2014 West Africa outbreak. Currently, building on that analysis, I am working on a data format I am calling the Outbreak Time Series Specification (“the Spec”). The API design leverages the recently completed W3C Recommendation, CSV on the Web (CSVW). In parallel I am writing software which reads files (or, in HTTP terminology, resources) that are formatted in compliance with the Spec.

The software is called an omolumeter, a word I just made up. It’s a meter for Omolu, the African orisha (“god”) of epidemics and the healing thereof. It will be the first app that reads outbreak data compliant with the Spec. Currently all it does is parse a CSV that contains outbreak data and render it in a table view. Fancier visualizations such as charts and maps are to be added in later versions of the omolumeter. Everything will be implemented using liberally licensed web technology, no Flash or native apps. This codebase will be the reference implementation for the Spec. I am building it out as I write the Spec in order to prove that it is easy to implement software which works with the Spec.

The omolumeter code is just a single web page with a lot of JavaScript, known as a Single Page App (SPA). Specifically, it is based on the AngularJS framework (v1.5) with a Material Design look and feel. The same code can be packaged as a native app for iOS and Android (so the user experience is the same as a native app’s, but it is not technically a native app; it is web technology packaged for a native deploy). It currently works in Chrome Mobile and Safari Mobile.
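To make the current behavior concrete, here is a minimal sketch of the table-view idea as an AngularJS 1.5 component. This is not the actual omolumeter code: the module name, the CSV URL, and the naive comma-splitting parser are stand-ins for illustration.

// A minimal sketch of the table-view idea, not the actual omolumeter code.
// Assumes a hypothetical Spec-compliant CSV at /data/outbreak.csv with a header row.
angular.module('omolumeterSketch', [])
  .component('outbreakTable', {
    template:
      '<table>' +
      '  <tr><th ng-repeat="h in $ctrl.headers track by $index">{{h}}</th></tr>' +
      '  <tr ng-repeat="row in $ctrl.rows track by $index">' +
      '    <td ng-repeat="cell in row track by $index">{{cell}}</td>' +
      '  </tr>' +
      '</table>',
    controller: function ($http) {
      var ctrl = this;
      $http.get('/data/outbreak.csv').then(function (response) {
        // Naive CSV parsing: fine for a sketch, not for quoted fields.
        var lines = response.data.trim().split('\n');
        ctrl.headers = lines[0].split(',');
        ctrl.rows = lines.slice(1).map(function (line) {
          return line.split(',');
        });
      });
    }
  });

In the real app the bare table markup would presumably be replaced with Angular Material components.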

Internet Explorer is not going to be supported. (Yeah, I said it.) Well, not by me. If anyone wants to join the party, pull requests are welcome :)

Back in April 2015, I started working with the City of Seattle on a very interesting project which just went live at version 1.0 (press release). Note: here “1.0” means it has been publicly deployed and is collecting data, but 2.0 will be slicker still. I am really excited about this technology and, more importantly, the long-term legal implications.

I was the project lead on this. Open Technology Institute was the “vendor.” Bruce Blood, Open Data Manager for the City, was the point man inside the government. Bruce is a good guy and he is doing great work; it will be a civic loss when he retires later this year.

ISP quality map

The City is calling the tool the “Broadband Speed Test”. Personally, I label it an “ISP quality monitor.” (“Broadband” sounds so highfalutin that the term must have come out of some marketing department.) We already have air quality monitors, and the County regularly mails out pamphlets enumerating quality measures for the drinking water. The goal of this project is that one day we will have laws regulating net neutrality and quality. In such a regulatory scheme we will need ISP quality monitoring tools.

Clearly we do not currently have laws for this, but even if the government simply collects and disseminates network quality information, that is a big win. And now, for the first time in the USA, we have the government (albeit only a municipal government) collecting this sort of information. Hopefully efforts like this will lead to federal legislation.

Even without regulatory laws, government monitoring of the network alone may well lead to the situation improving. An analogy can be made to commercial airlines. The federal Department of Transportation collects performance data on airlines. For example, “Gate Departure Time” data is collected. Reports of delays per airline are then periodically published.

Simply collecting and making available such data was sufficient to get the airline corporations to improve those numbers. (Ironically, this is one of the reasons that we Americans now spend so much time parked on runways: the numbers reflect when an aircraft pushed back from its gate, not when the airplane started flying.) Based on IP addresses collected during network speed testing, we can determine which ISP a test is being run over and then later generate quality reports per ISP. Hopefully, similar to the airline departure times, we will see an improvement in the performance numbers of ISPs now that we are starting to collect the relevant information.

So, legally, this may well prove to be interesting stuff in the long term. Moving on to the technological aspects of the project, which are interesting right now: NDT (Network Diagnostic Tool) is the very core of this technology package. NDT is the bit-pushing speed tester. The code is liberally licensed. This is the code which actually generates a fake file and sends it to the server for upload timing; the server then sends a fake file to the client for download timing.
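NDT’s actual protocol is considerably more involved than this, but the basic timing idea can be sketched in a few lines of browser JavaScript. The /upload endpoint and the payload size below are made up for illustration; this is not NDT code.

// Not NDT code; just an illustration of the upload-timing idea.
// The /upload endpoint and payload size are made-up examples.
function measureUploadMbps(url, numBytes) {
  var payload = new Uint8Array(numBytes);        // the "fake file"
  var start = performance.now();
  return fetch(url, { method: 'POST', body: payload }).then(function () {
    var seconds = (performance.now() - start) / 1000;
    return (numBytes * 8) / (seconds * 1e6);     // megabits per second
  });
}

// e.g. measureUploadMbps('/upload', 5 * 1024 * 1024).then(function (mbps) { console.log(mbps); });

Download timing works the same way in reverse: time how long it takes the client to pull a fake file down from the server.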

The map UI is based on two high-quality JavaScript GIS libraries: Leaflet for the base map and Turf for aggregating results within US Census Block Groups.
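As a rough sketch of how that aggregation could look (this is not the Piecewise code; the blockGroups and speedTests variables, the “download_mbps” property, the 25 Mbps threshold, and an existing Leaflet map object are all assumptions):

// Not the Piecewise code. blockGroups and speedTests are assumed to be GeoJSON
// FeatureCollections already loaded; "download_mbps" is a made-up property name.
var aggregated = turf.collect(blockGroups, speedTests, 'download_mbps', 'speeds');

L.geoJson(aggregated, {
  style: function (feature) {
    var speeds = feature.properties.speeds || [];
    if (speeds.length === 0) {
      return { fillOpacity: 0, weight: 1 };      // no tests in this block group
    }
    var sorted = speeds.slice().sort(function (a, b) { return a - b; });
    var median = sorted[Math.floor(sorted.length / 2)];
    // Crude choropleth: greener block groups have faster median downloads.
    return { fillColor: median > 25 ? 'green' : 'red', fillOpacity: 0.5, weight: 1 };
  }
}).addTo(map);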

Piecewise is the name of the web app that runs the tests, generates the reports, and provides the user interface. Unfortunately Piecewise is licensed under GPLv3, rather than Apache or MIT, but that detail is not a showstopper.

If you want further information, I put a lot of work into a wiki and you can also see some of my project management artifacts.

(Update: since originally posting this, I have learned a trick or two which avoids some of the problems discussed here. Nonetheless, this post still stands as potentially helpful to anyone walking the same path I did. See below for details.)

This is one of those posts that a confounded developer hopes to find in a time of need. The situation: generating TopoJSON files. Presented here are two attempts to generate a TopoJSON file; the first one failed, the second worked for me, YMMV. The failed path was a naive attempt to do the conversion in one step. The successful path, which I eventually figured out, involves two steps. The two-step process consists of using GDAL (on OS X, it can be installed via brew install gdal) and then topojson.

As part of the seattle-boundaries project, I needed to translate a shapefile for the Seattle City Council Districts to TopoJSON. I got the shapefile, Seattle_City_Council_Districts.zip, from the City of Seattle’s open data site.

TopoJSON is from the mind of Mike Bostock. As part of that effort he created a “small, sharp tool” for generating TopoJSON files. The command line tool is appropriately called topojson.

My naive and doomed first attempt was to generate the TopoJSON directly from the shapefile, which is advertised as possible.

topojson --out seattle-city-council-districts-as-topojson.bad.json City_Council_Districts.shp

(Update: turns out the mistake I made was not using the --spherical option. Inspecting the *.prj file that came with the *.shp file revealed that the data was in a spherical projection. Re-running the original command as topojson --spherical ... worked like a charm.)

Below is the generated file, uploaded to GitHub. Notice the orange line at the north pole. That is how the TopoJSON rendered (read: FAIL).

Generated TopoJSON which does not work

The tool, topojson, can read from multiple file formats, including shapefiles (*.shp), but there were problems with its conversion of the Seattle City Council Districts file. The root of the problem is visible in the JSON: the nutty bbox and transform values, which are clearly not latitude and longitude numbers:

"bbox": [
  1244215.911418721,
  183749.53649789095,
  1297226.8887299001,
  271529.74938854575
],
...
"translate": [
  1244215.911418721,
  183749.53649789095
]

On the other hand, if this file is uploaded to mapshaper.org it renders well. Note though that there is no geographic “context” for the rendering, i.e., no global map, and the Seattle City Council Districts are scaled to take up the full window’s allocated pixels. Perhaps mapshaper is not using the bbox and such, which enables it to render.

I explored the topojson command line switches but was not getting anywhere, so I went to Plan B which eventually got me better results. This involved two stages: first use GDAL’s ogr2ogr to translate the shapefile to GeoJSON, and then feed the GeoJSON to topojson.

ogr2ogr -f GeoJSON -t_srs crs:84 seattle-city-council-districts-as-geojson.json City_Council_Districts.shp
topojson --out seattle-city-council-districts-as-topojson.good.json seattle-city-council-districts-as-geojson.json

The resulting TopoJSON renders on GitHub as follows.

Generated TopoJSON which does work

Notice how the districts are colored orange, similar to the TopoJSON “North Pole line” in the bad file.

I guess ogr2ogr is better at handling shapefiles. TopoJSON was invented to be a more efficient geo-info JSON format than the rather “naive” GeoJSON, so it stands to reason that Bostock’s tool is better at GeoJSON-to-TopoJSON than it is at shapefile-to-TopoJSON conversion. Or at least that is my guess. I have no ability to judge the quality of the input shapefile; maybe the thing was funky to start with.

For more information and all the files related to this task, check out my GitHub repo on this topic.

Update: Take 2 of the work in this post used topojson@1.6.20. I am not sure which version was used for Take 1, but that version of topojson was installed over six months earlier.

Also, the City now has a UI for exporting (read: downloading) their datasets as GeoJSON, which leads to another option: use topojson to convert the City’s GeoJSON to TopoJSON, with no shapefile involved at all.

This year some of my projects have needed maps of Seattle, in particular while working with Open Seattle and the City of Seattle. Specifically, I have needed City Council District and census-related maps. The census geographic hierarchy has three levels for Seattle: Census Blocks are the smallest areas, which are aggregated into Census Block Groups, which in turn are combined into Census Tracts. Of course, there are other administrative geographic units into which Seattle has been subdivided: neighborhoods, zip code areas, school districts, etc.

The City of Seattle actually does a good job of making this information available as shapefiles on its open data site. Nonetheless, what web developers want is to have the data extremely readily available in a source code repository (for modern open source, read: a git repo) and in formats that are immediately usable in web apps, specifically GeoJSON and TopoJSON.

So, in Open Seattle we have been building out such a repository of JSON files and hosting it on GitHub. That repository is called seattle-boundaries.

Some Seattle boundaries

As a further convenience, Seth Vincent has packaged up the data for distribution via npm. Additionally, he has taken the maps and made an API service available, boundaries.seattle.io. This service will reverse-geocode a point into a list containing one entry for each map in the repo, where each list item is the GeoJSON feature to which the point belongs. For example, let us say you already know where in town the best place for dim sum is (read: you have a point) and you are curious as to which regions it belongs to. The URL to fetch from is:
http://boundaries.seattle.io/boundaries?long=-122.323334&lat=47.598109
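From the browser, hitting that endpoint looks something like the following. The exact response shape is my assumption based on the description above, so inspect what actually comes back.

// Reverse-geocode the dim sum spot; the response is expected to be a list with
// one entry per map in seattle-boundaries (the exact shape may differ).
var url = 'http://boundaries.seattle.io/boundaries?long=-122.323334&lat=47.598109';
fetch(url)
  .then(function (response) { return response.json(); })
  .then(function (results) {
    results.forEach(function (feature) {
      console.log(feature.properties);   // e.g. which district, block group, etc.
    });
  });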

Over the last few weeks I surveyed every bit of available data on the 2014 Ebola Outbreak in West Africa that I could find. There were two major sub-tasks to the survey: broad-search and then dig-into-the-good-stuff.

Various ebola visualizations

For the first sub-task, the work started with cataloging the datasets on eboladata.org. I sifted through those 36 (and growing) datasets. My analysis of those can be found on the EbolaMapper wiki at Datasets Listed on eboladata.org. An additional part of this first sub-task was to catalog the datasets available at HDX and NetHope.

I have posted the conclusions of sub-task #1 on the wiki at Recommendations for Data to Use in Web Apps. The humanitarian community seems most comfortable with CSV and Excel spreadsheets. Coming from a web services background I expected a JSON- or XML-based format, but the humanitarians are not really thinking about web services, although the folks at HDX have started on an effort which shows promise. Finally, for data interchange, the best effort within the CSV/spreadsheet space is #HXL.

The second major sub-task centered on hunting down any “hidden” JSON: finding the best visualizations on the web and dissecting them with various web dev-tools in order to ferret out the JSON. What was found could be considered “private” APIs; it seems that there has not yet been any attempt to come up with an API (JSON and/or XML) for infectious disease records. At best, folks just pass around non-standard but relatively simple CSVs and then manually work out the ETL hassles. My analysis of the web service-y bits can be found on the EbolaMapper wiki as well, at JSON Ebola2014 Data Found on the Web.

My conclusion from the second sub-task is that the world needs a standard data format for outbreak time series, one which is friendly to both the humanitarian community and to web apps, especially for working with mapping software (read: Leaflet). Someone should do something about that.

The EbolaMapper project is all about coming up with computer graphics (charts, interactives, maps, etc.) for visualizing infectious disease outbreaks.

A sign of excellent news for any given outbreak is when the bullseye plot animations go static. For example, consider the WHO’s visualization shown below, which plots data for the 2014 Ebola Outbreak in West Africa.

Each bullseye shows two data points: the outer circle is cumulative deaths and the inner circle is new deaths in the last 21 days. Twenty-one days is the accepted incubation period, which is why Hans Rosling tracks new cases over the last 21 days. When the inner circles shrink to zero, the outbreak is over.
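As a sketch of how that encoding might be drawn with Leaflet circle markers (this is not the WHO’s code; the field names and the square-root scaling are my own assumptions for illustration):

// Not the WHO's code; field names and the square-root scaling are assumptions.
function addBullseye(map, latlng, cumulativeDeaths, newDeathsLast21Days) {
  var radius = function (count) { return Math.max(3, Math.sqrt(count)); };  // pixels; area roughly tracks count
  // Outer circle: cumulative deaths.
  L.circleMarker(latlng, {
    radius: radius(cumulativeDeaths),
    color: 'darkred',
    fill: false
  }).addTo(map);
  // Inner circle: new deaths in the last 21 days.
  // When the count hits zero the inner circle could turn green as a positive signal.
  L.circleMarker(latlng, {
    radius: radius(newDeathsLast21Days),
    color: newDeathsLast21Days === 0 ? 'green' : 'red',
    fillOpacity: 0.6
  }).addTo(map);
}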

Yet there are much lower tech ways of presenting information to people that can be quite affecting. On the grim side there are the graves.

Sadder still are memorials such as The Lancet’s obituary for health care workers who died of ebola while caring for others – true fallen heroes.

On the other hand there are signs of positive progress.

The image on the left is from an MSF tweet:

The best part about battling #Ebola is seeing our patients recover. Here, hand prints are left in Monrovia #Liberia

The image on the right is from an UNMEER tweet:

Survivor Tree in Maforki, #SierraLeone holds piece of cloth for each patient who left ETU Ebola-free. #EbolaResponse

Those must be quite uplifting reminders on the front lines of the ebola response. Likewise, EbolaMapper should have positive messages, say turning bullseyes green when there are no new cases. That will need to be kept in mind.

As part of the EbolaMapper project, I reported on the state of the ebola open data landscape to the Africa Open Data Group. That report is currently the best overview of the EbolaMapper project, including its goals and deliverables, i.e., open source tools for visualizing epidemiological outbreaks via the Outbreak Time Series Specification data APIs.

the state of government open data

The project name, EbolaMapper, has become misleading as there is nothing ebola specific about the project (although ebola data is used for testing). That name was chosen during the ebola outbreak in West Africa before the project’s scope expanded beyond just a specific ebola crisis. So, a better name is needed to go with the description “reusable tools and data standard for tracking infectious disease outbreaks, with the intent of helping to modernize the global outbreak monitoring infrastructure, especially by decentralizing the network.”

(When the project started, there was a serious dearth of open data on the crisis. The project eventually outgrew the immediate need of curating data on the outbreak. Eventually the UN started publishing ebola statistics, and independently Caitlin Rivers’ efforts seeded the open data collection effort.)

If the report is tl;dr then perhaps just check out the interactive Ebola visualizations.

The Humanitarian Data Exchange is doing great work and they have a well-defined roadmap they are plugging away at.

The sub-national time series dataset hosted there is the first data feed I am using to test EbolaMapper against.

They have a very nice, interactive ebola dashboard. The story of how the dashboard came to be is impressive, but I want to work toward making sure that tortured path does not have to be traveled for future outbreaks.

hdx-dashboard

They do not have cases mapped to locations over time. For that, The New York Times is the best.

I have finally found quality outbreak data with which to work:

Sub-national time series data on Ebola cases and deaths in Guinea, Liberia, Sierra Leone, Nigeria, Senegal and Mali since March 2014

The dataset comes from The Humanitarian Data Exchange. Those HDX folks are doing great work. More on them later.

I came to this dataset via a long, convoluted hunt. The difficulty of this search has led me to understand that the problem I am working on is one of data as well as of code. This will need to be addressed with APIs and discoverability, but for now it is time to move on to coding (finally).

After I concluded that the data was usable, I started poking around on its page on HDX a bit more. On the left side of the page there are links to various visualizations of the data. This is how I discovered Google’s Public Data Explorer which is quite a nice tool. Below is one view of the HDX data in the Explorer. Click through the image to interactively explore the data over at Google.
google-pd-viewer

Also among the visualizations on the HDX page was, to my surprise, the NYTimes visualization. Lo and behold, that visualization credits its data source as the HDX:

Source: United Nations Office for the Coordination of Humanitarian Affairs, The Humanitarian Data Exchange

So, that is good enough for me: the data hunt is over. It is time to code.