Icebergs of Data
I had a great lunch yesterday, one of those where you look back afterwards and realise you somehow talked seamlessly about 30+ different topics with the conversation always flowing. One of the topics we kept coming back to was data, about which I've written a lot.
When I was working predominantly with data I used to argue that, when collecting it, you should allocate the same amount of time to managing it as you did to collecting it. That is, if you spend an hour collecting video, then you should spend an hour afterwards properly sorting and cataloguing the video so that you can use it easily in the future. This, of course, never happened, because in the time you spent sorting and cataloguing, you could be COLLECTING MORE DATA!!!! And surely that would be a better use of your time, right? Because, you know, then you have more stuff...
Anyway, during lunch yesterday I started to think of data analysis as an iceberg.
Historically, and still in most cases nowadays, we live in an inverted iceberg world. There is a huge amount of data collected above the surface, but under the surface only a relatively small amount of time and resources is spent on understanding it. What some of the 'Advanced Stats' in the NBA (for example) are starting to do is tip this iceberg over, so that what happens beneath the surface is vastly more than what is collected above it. That is, very sophisticated analysis is done with the data collected, providing a depth and breadth of understanding previously unheard of. Doing this has required a huge paradigm shift in the way organisations look at science and analysis.