Blogging for America

Filling Your Open Data Catalog

So your city has deployed a shiny new data catalog as part of its open data initiative… now what?!

Cities produce a great deal of data, all of which is probably interesting to *somebody* out there. Given our limited municipal data resources, how do we prioritize the datasets that we publish? Here are some of the guidelines we're using in Santa Cruz, Calif., as we work to deploy our Open Data Catalog.

1. Start with what you’ve got

In Santa Cruz, the City was already producing a handful of machine-readable datasets and publishing them regularly on the City website. Those were the first ones we added to the catalog, and the first ones for which we tried to create a formal publication process.

2. Get the maps in next

Give the mashup artists something to work with by publishing all of your geographic information system (GIS) files, layers, and other assets. If your city has no GIS or shapefile assets whatsoever, then at least redistribute or link to the Census Bureau's TIGER/Line files for your area.

3. Pick your pilots

In Santa Cruz, we chose the Water Department and the Police Department to be our test pilots. We’ve started working with each department to analyze the data assets they maintain and see where we can automate existing Excel-and-PDF-based processes. Once we’ve worked the bugs out of the data acquisition, cleanup, and publication processes, we can move on confidently to other departments.

4. Wisdom of the crowd

Look at what's working in other cities. Check out a list of New York's most popular datasets. See which datasets are most frequently requested in Philadelphia. Assume your city's data consumers will have similar needs, within reason. For instance, Open Austin (in Austin, Texas) probably won't be adding surf conditions to its portal, but we've already had requests for them in "Surf City."

5. Get the rest of your departments involved

City departments might find that their biggest data consumer is another city department. In Santa Cruz, one of the anticipated results of our open data initiative is greater interoperability between departments and their respective systems. Helping departments successfully integrate data from other administrative units can dramatically increase their support for open data efforts.

6. Target specific user groups

Once your open data catalog has been published and seeded with some initial datasets, let consumer demand help drive publication priorities. In Santa Cruz, we're planning outreach initiatives to solicit input from, and showcase data for, a variety of groups, including:

  • other governments: neighboring cities, Santa Cruz County, the State of California, US federal agencies
  • local community groups/NPOs/NGOs
  • local businesses
  • media and journalists
  • regional academics
  • niche interest groups: environmental, cycling, surfing, etc.

7. Set out the suggestion box

Don't forget to provide a mechanism for all users of your catalog to request additional datasets. No matter how many interest groups you identify, someone will have needs you didn't anticipate. Also consider publishing the requests themselves, along with the fulfillment status of each one, as an added measure of transparency; this can help demonstrate that open data is a priority within your city.

And then?

That should be enough to get you started, but the work of publishing data is never done. Keep following what's going on in other cities, states, and nations around the world, and make sure your catalog's users are getting what they need!

Code for America Labs, Inc is a non-partisan, non-political 501(c)(3) organization. Content is licensed through Creative Commons.