Responses to the ENCODE Project: When Scientists Have to Deal with Beetles in Boxes

How is the word “function” like a beetle in a box? No, this isn’t a twist on the Mad Hatter’s “Why is a raven like a writing desk?”

As I’ve covered in a previous blog post, the ENCODE Project released some astounding results from their multi-year study last September, claiming that approximately 80% of the human genome is functional. Following this, numerous statements were made that hailed the death of “junk DNA”. In response, the findings have received a heavy backlash from many scientists who disagree with the interpretations of the results and have since tried to, shall we say, reanimate junk DNA from beyond the grave. And as far as potential paradigmatic shifts are concerned, from my vantage point it seems they’ve been fairly successfully quashed (to my own chagrin).

One of junk DNA’s more outspoken proponents, Dan Graur, an evolutionary molecular biologist working at the University of Houston, has published a notable rebuttal in the journal Genome Biology and Evolution. In it, he argues that the grandest flaw in the consortium’s methodology was its chosen definition of the term “function”. ENCODE opted to define it by three primary criteria, under which a DNA sequence was considered “functional” if it:

1) produces an RNA transcript

2) binds a protein

3) is methylated.

Graur, in contrast, argues that functionality should be defined by sequence conservation according to population genetics, which he says gives a better indication of function over evolutionary time. From his definition, this would indicate that only approximately 10% of the human genome is functional.
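To make the gap between the two definitions concrete, here is a minimal toy sketch. Everything in it is an assumption for illustration only: the `Region` type, the four made-up regions, and their lengths are invented (chosen so the totals echo the 80% and 10% figures above), and I am assuming that meeting any one of the three biochemical criteria is enough to count as “functional”. It is not how either ENCODE or Graur actually computed their estimates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Region:
    """A hypothetical stretch of DNA with invented annotations."""
    length: int            # size in base pairs
    transcribed: bool      # produces an RNA transcript
    binds_protein: bool    # binds a protein
    methylated: bool       # is methylated
    conserved: bool        # sequence is conserved over evolutionary time


def biochemical_function(region: Region) -> bool:
    # Assumption: any single biochemical signal counts as "functional".
    return region.transcribed or region.binds_protein or region.methylated


def conservation_function(region: Region) -> bool:
    # "Functional" only if the sequence is conserved.
    return region.conserved


def functional_fraction(regions: List[Region],
                        is_functional: Callable[[Region], bool]) -> float:
    """Fraction of total base pairs that a given definition labels functional."""
    total = sum(r.length for r in regions)
    hits = sum(r.length for r in regions if is_functional(r))
    return hits / total


# Invented toy genome of 1000 base pairs; not real data.
TOY_GENOME = [
    Region(100, transcribed=True, binds_protein=True, methylated=False, conserved=True),
    Region(300, transcribed=True, binds_protein=False, methylated=False, conserved=False),
    Region(400, transcribed=False, binds_protein=False, methylated=True, conserved=False),
    Region(200, transcribed=False, binds_protein=False, methylated=False, conserved=False),
]

if __name__ == "__main__":
    print("biochemical definition: ", functional_fraction(TOY_GENOME, biochemical_function))
    print("conservation definition:", functional_fraction(TOY_GENOME, conservation_function))
```

The same toy data gives two very different answers depending only on which predicate is passed in, which is the whole disagreement in miniature.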

Who is correct?

Language is an exceptionally complex, cultural, and personal thing. It’s a symbolic abstraction which we use to communicate ideas to one another. And yet language is so embedded in our ways of life that it can take on a feeling of concretion, like it’s as real as the cells we use it to describe. To understand precisely why scientists are arguing so fervently with one another over “function” and why this is providing such a challenge to the communication of biological concepts, we can glean some help from the great linguistic philosopher, Ludwig Wittgenstein.


To give us some clarity through metaphor, Wittgenstein would place a small box before each of us. In each box is something its owner would term a “beetle”. We each examine the “beetle” in our own box, and then we talk about our “beetles” with one another. But each person, despite using the same term, may have something entirely different in his box. One may have a beetle, another a spider, another a piece of string, and another perhaps nothing at all. The point is that what “beetle” means is unique to each individual, a “private language” as philosophy calls it. Meanwhile, the means of communicating our concept of a beetle, the word “beetle” itself, is part of a common language. But the common language is, at best, a broader schema which can communicate general concepts but breaks down when one attempts to derive a precise, universal definition for an idea.


Scientists are currently having the same problems defining the word “function”, and even “junk DNA”, that we all have in attempting to define “love” or “pain”. While there may be some overlap, obviously not everybody can agree. ENCODE has one definition of functional, Graur another. Neither is necessarily wrong, although one definition may well be more useful than the other under particular circumstances of investigation.

What is simultaneously interesting and frustrating is that scientists are spending a hell of a lot of energy vehemently arguing over who is wrong, when the problem is primarily one of language, not of science. It’s a worthy pastime for philosophers of science to study, but is it really a good allocation of time for scientists? It seems to me that arguing in circles over words doesn’t get us very far in understanding the implications of the ENCODE studies, neither their flaws nor their relevance.

Let’s face it, everybody probably has a slightly different definition of “function”. I know mine would match neither of the above. And in this instance I think it would be useful for scientists to acknowledge those inherent differences. Treating the term “functional” as a concrete, identifiable entity will only lead to further confusion, raised tempers, and bruised egos. Look at the debacle so far.


3 responses to “Responses to the ENCODE Project: When Scientists Have to Deal with Beetles in Boxes”

    • Err. Sorry. Responding on a phone. Anyways, my take was that the skeptics of the ENCODE project had the upper hand if your definition of function involved our ability to detect any phenotypic difference between organisms with changes in “junk” DNA. In other words, if the changes resulting from a difference/duplication/removal of these sequences of DNA don’t manifest meaningfully from a physiological standpoint, what does it really matter if those sequences are methylated or not?

      It has been a lot of wrangling and allocation of research resources, that’s for sure.


      • Well, one thing which springs immediately to mind is that “functional” is not equivalent to “vital”. Entire genes (depending on the gene) can be knocked out with little consequence to the larger organism. In addition, if one wishes to determine “function” by whether a given sequence’s absence alters phenotype, a few problems lie in this particular definition: 1) some sequences, while functional, may nevertheless serve a redundant purpose such that a measurable difference in phenotype is not identifiable; 2) actually identifying a change in phenotype can be damn near impossible without the right measurements and if you don’t know what you’re looking for (the proverbial needle in a haystack); and 3) some definitions of “functional” or “useful” may only be identifiable over long periods of evolutionary time.

        In my own work I’m interested in what specifically underlies different kinds of mutation susceptibility and how particular genes (e.g., autism-related genes) may be more prone to mutations in general. In fact, the non-coding intronic regions may actually be quite important for the evolutionary trends seen in these types of genes, because introns tend to house more transposable elements, microsatellites, etc., which can affect the overall stability of the gene and lead to greater adaptability.

        Now, most people might not consider mutations “functional”, but to me, in the larger evolutionary scheme of things, mutational rates have defined the specific sequence conservation and have either led to generalized stability or more rapid evolution of gene families. If intronic sequences play an important role in defining gene stability, and stability ultimately determines the level of gene adaptability, then throughout molecular evolution introns could be considered quite important and serve a “function”.

        My point is that “functional” is extremely variable and really depends on how you’re using it and in relation to what research. In my research, my definition makes sense. Perhaps it just doesn’t mesh with Graur’s research.

        Another problem with Graur’s conservation arguments, to my understanding, is that they are based upon primary (i.e., “digital”) sequence conservation (e.g., ATGGGCTAGTCATGCTTTAGTCACCACCA). However, other levels of molecular conservation are undoubtedly present, such as tertiary conservation. The precise sequence may not remain conserved, but the 3D conformation which the gene region generally takes could be. Such conservation is often seen with homologous proteins even though the exact amino acid sequences may differ. Therefore, the 10% estimate based on primary conservation is probably a severe underestimate even from Graur’s standpoint. There are undoubtedly other levels of conservation, such as quaternary conformation in which DNA interacts with other molecular partners (RNA, protein, ions, etc.).

        Given that biochemistry as a discipline would find it exceptionally difficult to predict large-scale tertiary/quaternary DNA conformations, it would currently be impossible to predict “functionality” based upon sequence conservation. But it is safe to presume that 10% is a considerable underestimate.
