How do you legitimately make your hacking skills pay off? By working in an enterprise information management program, according to David Champagne, CTO of Revolution Analytics, in an article on TDWI. Add that to the long list of skills for information management that I wrote about earlier.
Actually, he's referring to the work of Drew Conway, a former member of the intelligence community and now a Ph.D. student in political science at New York University. Along with hacking skills, Conway says the major skills that data scientists need are math and statistical knowledge, and substantive expertise.
Hacking abilities are important because data tends to reside in multiple locations and in multiple systems. Finding and retrieving data sometimes requires the skills of a burglar, even when the data is in the public domain, owned by your organization, or owned by another organization that has agreed to let you use it.
Champagne quotes Mike King, a quantitative analyst at Bank of America, speaking of the legacy systems at many enterprises:
You need to be familiar with different databases, different operating systems and different programming languages. Then you have to get those systems to communicate with each other. You must learn where and how to extract the data you need, and you better enjoy the process of figuring it out. This is not a trivial process, especially in a large organization. You need to be resourceful and create your own solutions.
The article is actually about the rise of data science and the debate over what exactly that is. He reports dismay among some data analysts over the field's departure from the scientific method. Writes Champagne:
Remember the scientific method? First you ask a question, then you construct a hypothesis, and you design an experiment. You run your experiment, collect and analyze the data, and draw conclusions. Finally, you communicate your results and let other people throw rocks at them.
Nowadays, thanks largely to all of the newer tools and techniques available for handling ever-larger sets of data, we often start with the data, build models around the data, run the models, and see what happens.
This is less like science and more like panning for gold. Several data analysts interviewed for this article describe the current trend as "throwing spaghetti against the wall and seeing what sticks."