Earlier this month, when Goldman Sachs had the gall to dump 5 terabytes of data in the lap of the Financial Crisis Inquiry Commission in response to a subpoena for information relating to its role in the mortgage meltdown, John Carney of CNBC had a novel idea: Use a crowdsourcing model to comb through the data to find what Goldman Sachs apparently wants to keep buried and unfindable. It's an interesting idea, but is it feasible?
According to Lukas Biewald, CEO of crowdsourcing service provider CrowdFlower, it is. I spoke with Biewald last week, and he said CrowdFlower is, in fact, trying to find the right channel to approach the FCIC. And since he sees it as a cause worth helping, he's even prepared to take a financial hit to do it:
"We're trying to figure out exactly how much we could do. We would do the categorization and tagging of the first 100,000 documents [free of charge], just to show that it's possible, as a proof of concept. This would be through our paid work force, so that would be a significant expense for us. The other thing we could do is what we did in Haiti, where we took our platform to organize a horizontal, distributed work force, and make sure that people are taking the job seriously, and they're redundantly assigned tasks; all that technology that we built, and really believe is valuable, we would certainly offer that for a really, really low price for this kind of task. We don't look at the government as a great customer; Haiti was not a profitable operation for us. We want to really showcase what this technology can do, and this would be a great place to do it."
Of course, whether crowdsourcing is an option hinges on whether the data that Goldman Sachs provided to the FCIC is considered public information. Biewald argues that if ever there was a case for information to be open to public scrutiny, this is it:
"It seems to me that if the government is issuing a subpoena to Goldman Sachs about the financial crisis, it should be public information. For example, all of the e-mails of Enron were read into the public record, and that's become a famous data set of e-mails. [The Goldman Sachs data] seems significantly less privacy-infringing than the Enron data set. Again, I don't know all the details here, and we're trying to mine our contacts from the Haiti thing to get to the right people in government to contact about this. There may be really good reasons not to release it, but there also may not be a good reason not to release it. If CrowdFlower could play a small part in helping the government find the outrageous things that Goldman Sachs was undoubtedly doing, I would feel really proud."
Biewald said what he needs now is the crowd's help to get more information on exactly what the government is thinking with respect to opening the data up to the public, and to find the right channel into the FCIC. If you can help, please speak up.