A Plea for Open Data

One of my current side projects involves some database work for a client in an academic context. There is an enormous trove of data being collected by the project, but the local administrators refuse to publish the data on the internet themselves. This despite the fact that it’s already being published to their academic intranet. This despite the fact that they’re willing (with some persuasion) to pay an outside contractor to develop a means of displaying the data for all the public to use.

I’m not sure what’s driving this recalcitrant refusal to share the data, but I can’t see any good reason for it.

There are plenty of reasons to keep secrets. If you’re developing a new product and don’t want your competitors getting the drop on you, or are close to making a breakthrough that could make you enormous amounts of money, it may well be in your best interests to keep your work behind closed doors. Whatever one thinks of Apple and its sometimes ham-handed approach to dealing with leaks, the company at least has a good reason for preferring secrecy: it’s part of its marketing scheme and it helps prevent copies from flooding the market too quickly. (Consider how long it took to get a real competitor to the iPad to market, and you’ll quickly grasp the power of secrecy.)

If you’re collecting data for a scientific endeavor, though, I can’t see the benefits unless you’re going to be applying for a patent. The project I’m working on is nothing like that. It’s not developing technology; it’s recording data – data that is widely useful and ultimately available. We just have to jump through hoops first.

While there is real utility in the database work I’m doing on this project, it would be far more useful for the data to be publicly available to everyone as soon as it was ready to be published. I’m creating a number of ways to access and display the data, but what if someone thinks of a novel way to recombine the data themselves? They can certainly use various combinations of reports my tool will generate for them. Or, if the data were freely available to them, they could remix it however they liked, without the constraints of the particular reports my tool is supposed to generate.

This ties into a broader picture culturally: the more openly available any kind of information is, the more readily people can reuse it, including for purposes the creator may never have imagined. The public availability of the data is a step in the right direction – I’m glad that the techs in the academic program I’m working for recognize that much at least. However, really open data goes further: it makes all the data as available as it can be. A truly open data approach might still supply some tools, like the ones I’m building, to make managing the data easier for the general use case. However, it would also leave the door open for direct access to the baseline data, getting out of the way of someone who conceives of a novel use for the information.

People my age and younger are seeing cultural media (songs, movies, etc.) not so much as fixed artifacts but rather as something to be adapted to new uses and presentations. This is part of what has caused such tumult in old media industries used to protecting their content from being reused in any way. A few particularly forward-thinking artists have recognized that, with the right constraints, the remix culture can be the best kind of advertising – free. A polite request that remixers give credit where credit is due and not charge for their remix work is nearly always honored.

The same is true of data, where oftentimes there is nothing to protect – just old habits of keeping things under wraps. The more that universities and researchers push their data out for the rest of the world to use, the more that data can accomplish. A single team of researchers will always have limited time and resources, not to mention biases in their aims for the data. But another group of researchers may find that the data allows them to do complementary work at lower cost – a benefit to all of us. Toss in the sorts of contributions that hobbyists sometimes make, especially in the tech industry with its history of open-source innovation, and open data is a recipe for success – not only your own success, but also everyone else’s.