There was a recent article in The New Yorker titled “Steamrolled by Big Data,” which reminded me of a trend now occurring in the life sciences. Even though, unlike Google, we’re not working in quantities like terabytes or petabytes, the amount of genomic information available has skyrocketed over the last ten years. Even just considering everything that has come out of genome-wide association studies (GWAS) is overwhelming.
And that’s the problem: it’s overwhelming. We’ve figured out ways to collect extraordinary amounts of data on living systems very rapidly, and yet our capacity to fit all these puzzle pieces together lags woefully behind. That’s not to say we should stop collecting data just because we can’t handle it all, but things need to be put into perspective. Unfortunately, we had grandiose ideas about how much we would learn from collecting so much genomic and cellular data, ideas which have not panned out. And why is that? Well, Kupiec (2009) suggests that
“… post-genomic biology requires enormous use of bio-computing to integrate the huge quantities of data collected by large-scale transcriptome and proteome analysis. The aim of these programmes is to identify all the RNAs and proteins in a cell in order to establish a map of the interactions they have with each other in the form of networks. It is thus hoped to arrive at a complete description of how a cell functions. However, scientific progress does not result simply from accumulating data. The observations made depend just as much on the theories which guide the research as on the reverse” (p. 2).
The above also applies to the genomic data we’ve collected to date. In spite of all the big data now at our fingertips and all the supercomputers spitting out 1’s and 0’s, science still requires human beings to figure out what it all means. But why? As Kupiec would assert, it is because biology is not deterministic– as our current genetic theories are still apt to posit– but probabilistic. Given that the cell is exceptionally adaptable and interactive, and that the boundaries between it and its environment are nominalistic at best, genetic determinism is a woefully simplistic and inadequate concept. In short, it’s just plain wrong.
For a while there, we held a glimmering hope for a lazy science, one in which we no longer had to think and reason but could simply sit back and wait for a program to deliver the answers. With the advent of newer technology, scientists were foolishly hopeful that computers would do the work for us, as though the calculations were as easy as arithmetic. But most biological data, though informative, is nevertheless ambiguous– tantalizing us with a variety of interpretations. Science requires something more than calculators. It necessitates– dare I say it?– philosophers. The ways in which we collect data today adhere to the scientific method, but the ways in which we interpret it still hold much in common with modern science’s philosophical predecessor. While methods and technologies have advanced, the human mind still rationalizes in the same ways it did during the Renaissance or Ancient Greece. Humans are, after all, human.
Says The New Yorker article:
Some problems do genuinely lend themselves to Big Data solutions. The industry has made a huge difference in speech recognition, for example, and is also essential in many of the things that Google and Amazon do; the Higgs Boson wouldn’t have been discovered without it. Big Data can be especially helpful in systems that are consistent over time, with straightforward and well-characterized properties, little unpredictable variation, and relatively little underlying complexity.
But alas, evolution has thrived on complexity, and even the workings of a single cell remain out of our grasp. The very state of our science reflects that complexity. Just take a wander through PubMed or Google Scholar to get a glimpse of how much data we currently have and how much of it is just sitting there doing nothing. Waiting.
–And while you’re there, note how poorly synthesized it all is, too. Browsing through search engines can be overwhelming, with hundreds if not thousands of publications served up for a given query. How is one ever supposed to read through it all? I admit that even within my own field I’m familiar with only a small percentage of the related and relevant materials, much less those of other fields of research, and it undoubtedly affects my capacity to interpret new data. In science as a whole, there is considerable inefficiency, squandering, duplication of effort, and piecemeal leadership. Let’s face it: if we were a company, we’d have gone bankrupt long ago.