Continuing the Debate on “Function” in Junk DNA: Rethinking the Onion Test

“From our very early days we learn to react to situations with the appropriate responses, linguistic or otherwise. The teaching procedures both shape the ‘appearance’, or ‘phenomenon’, and establish a firm connection with words, so that finally the phenomena seem to speak for themselves without outside help or extraneous knowledge. They are what the associated statements assert them to be. The language they ‘speak’ is, of course, influenced by the beliefs of earlier generations which have been held for so long that they no longer appear as separate principles, but enter the terms of everyday discourse, and, after the prescribed training, seem to emerge from the things themselves” (Feyerabend, Against Method, p. 52).

I’ve discussed in a previous post the pitfalls of language in science, using the recent debate over the term “function” that followed publication of the ENCODE findings as a prime example. In short, ENCODE researchers defined “function” by whether the DNA in question produces an RNA transcript, binds a protein, or is methylated. Many researchers, some extremely vocal, argued that such a definition was far too lax and never addressed whether this multitude of transcripts, DNA-protein interactions, and methylation marks actually serves the cell or is merely coincidental. The latter is highly likely for sequences that show relatively weak conservation.

However, the problem even with the ENCODE opponents’ definition of “function” is that, as Feyerabend’s quote above suggests, it is embedded in a history of 20th-century genetics and cell biology. That history tends to view DNA mostly as a primary sequence and fails to consider secondary or tertiary forms of sequence conservation, and therefore of usefulness or function. It also fails to consider “function” in a grander sense, one that spans a deeper chronology than the immediate state of the cell. Such a sense would take in, for example, the observation that many fragile sites are well conserved between human and mouse despite 75 million years of separation [1], something most modern geneticists would have a hard time explaining. This sort of conservation eludes a primary-sequence approach and cannot be explained by usefulness through transcription, binding, or methylation, and yet it exists. Why would these highly mutable sites be maintained within the genome if they did not serve some “function”? Sometimes “function” may not simply be a matter of conservation but one of constrained adaptability, a means for change.

The most beautiful examples of conserved adaptability come from contingency genes: key genes in certain pathogenic bacteria and parasites that are highly mutable under duress and that, when mutated, bring about phenotypic changes that are potentially adaptive in response to stress, e.g., increased heat or antibiotic exposure. Some microbes, such as Haemophilus influenzae and meningococcus (Neisseria meningitidis), have contingency genes that contain numerous simple repeat sequences, which provide inherent instability and thus the potential for adaptability [2]. Others, such as E. coli, use the well-regulated insertion of transposable elements into key sequences to combat duress such as starvation [3]. Interestingly, in this latter case it is not the exact primary sequence per se that is conserved but its capacity to allow insertion at a given location.
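To make the idea of repeat-driven instability a little more concrete, here is a toy sketch in Python. It assumes the commonly described mechanism in which a simple repeat tract gains or loses whole units during replication (slipped-strand mispairing), so that a unit length that is not a multiple of three shifts the downstream reading frame. The sequence is invented, and the helper names (`find_tandem_repeats`, `slip`, `frame_offset`) are hypothetical, written only for this illustration.

```python
# Toy sketch (hypothetical helper names, invented sequence): how gaining or
# losing units in a simple sequence repeat can shift a downstream reading frame.


def find_tandem_repeats(seq, unit_len, min_copies=4):
    """Return (start, unit, copies) for each run of a unit_len-bp motif that is
    repeated in tandem at least min_copies times."""
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next block is an exact copy of the motif.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # jump past the whole tract
        else:
            i += 1
    return hits


def slip(seq, start, unit, copies, delta):
    """Return seq with the tract at `start` changed by `delta` units: a crude
    stand-in for slipped-strand mispairing during replication."""
    end = start + len(unit) * copies
    return seq[:start] + unit * (copies + delta) + seq[end:]


def frame_offset(unit_len, delta):
    """Offset (0, 1 or 2 bp, modulo 3) of the downstream reading frame after the
    tract gains (delta > 0) or loses (delta < 0) units; 0 means still in frame."""
    return (unit_len * delta) % 3


if __name__ == "__main__":
    toy = "ATG" + "CAAT" * 6 + "GGCTAA"  # invented sequence, not a real gene
    start, unit, copies = find_tandem_repeats(toy, unit_len=4)[0]
    for delta in (-1, 0, 1, 3):
        variant = slip(toy, start, unit, copies, delta)
        print(f"{copies + delta} x {unit}: length {len(variant)}, "
              f"frame offset {frame_offset(len(unit), delta)}")
```

Run as a script, this reports a frame offset of 0 for 6 and 9 copies and a non-zero offset for 5 and 7 copies. With a 4-bp unit, only changes of three units (or multiples of three) keep the downstream sequence in frame, which illustrates how a single slippage event could switch a gene product on or off.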

In eukaryotes, intronic and intergenic regions of DNA tend to be rife with both repetitive sequences and transposable element content. The two frequently overlap, and both tend to adopt unstable alternative conformations such as hairpin loops [4]. Common fragile sites, which are often conserved across species, share similar traits, though they usually span much larger regions. Traditionally, and as this ongoing debate has highlighted, intergenic regions have been considered “junk DNA”: DNA with no use other than as a store for potential reuse should necessity and luck arise. Meanwhile, the most that introns are thought to provide is regulation. Proponents of this view offer examples such as the oft-repeated “onion test”, which is intended as a reality check rather than any sort of true test. The onion test rests on a basic comparison of intergenic content between the onion and the human. It points out that the onion carries far more “junk DNA” than the human genome and argues that if intergenic regions actually performed some basic or conserved function, the difference in size would be smaller.


Admittedly, such an example gives one pause for reflection (why, for instance, are the differences in size so great?), but size is a poor means of addressing function, whether in a broad or a conservative sense. The onion, Allium cepa, carries extensive extinct transposable element content and has one of the largest known plant genomes [5]. Judging intergenic regions across species by size assumes that all intergenic sequences are inherently equal. It fails to account for the different patterns of vertical and horizontal inheritance of different types of transposition events. Not all transposons are the same, and not all species share similar transposons, which would undoubtedly lead to different patterns and rates of insertion over time, and of adaptive change in genes. This helps account for the difference in size between the large-genomed onion and the small-genomed Arabidopsis thaliana [6].

Such divergence, however, doesn’t mean that fragile sites and intergenic and intronic regions are incomparable across species, or that these regions serve no conceivable function. Despite the extraordinary differences in genome size between human and microbe, AT-rich intergenic regions have consistently provided a means for genetic adaptation [7]. Such adaptation can be immediately apparent in a pathogenic bacterium that evades broad-spectrum antibiotics, where cause and effect are easily observed over a short period of time. A similar instability has also been a primary means of eukaryotic adaptation over millennia, though it is much more difficult to measure and observe. As a prime example, and one I blogged about just last week, consider the numerous olfactory receptor (OR) genes in the mammalian genome. In humans, there are hundreds of these closely related genes, clustered in similar regions of the genome. These tiny genes, usually less than 1,000 base pairs long and containing no more than two exons, often cluster near telomeres, the highly unstable, highly repetitive ends of chromosomes, a trait they share with microbial contingency genes [8]. Their coding content alone gives little hint as to why these loci are so unstable that they often turn up as false positives in cancer studies. One need only look at the intergenic content between them, however, to find unusually high transposable element content, which promotes high rates of copy number variation and recombination [9]. Meanwhile, for multi-exon genes, the larger their intronic content, the more unstable they tend to be over time, including their coding sequence (unpublished data). Interestingly, an unusually high number of genes lie near fragile sites, suggesting that these regions are important in the evolution of protein-coding genes in general [10].
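Whether a stretch of DNA counts as “AT-rich” is a question of base composition. The Python sketch below flags such stretches with a simple sliding window. The window size, step, and 0.65 threshold are my own arbitrary choices for illustration (assumptions, not a definition taken from reference [7]), and the test sequence is invented.

```python
# Minimal sketch: flag AT-rich stretches with a sliding window. The window
# size, step and 0.65 threshold are arbitrary illustration values (assumptions),
# not a published definition of "AT-rich".


def at_fraction(seq):
    """Fraction of A or T bases in seq (case-insensitive); 0.0 for empty input."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=100, step=50, threshold=0.65):
    """Return (start, AT fraction) for every window at or above threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


if __name__ == "__main__":
    # Invented toy sequence: a 200-bp AT block flanked by GC-rich stretches.
    toy = "GC" * 100 + "AT" * 100 + "GC" * 100
    for start, frac in at_rich_windows(toy):
        print(f"window at {start}: {frac:.2f} A+T")
```

On this toy input, only the windows starting at positions 200, 250, and 300 lie entirely within the AT block and are flagged. The windows straddling the boundaries are 50% A+T and fall below the threshold. Real analyses would tune these parameters to the genome in question.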

I should state clearly that I am not claiming that every unstable sequence, taken individually, necessarily serves a conserved function, though there are undoubtedly individual examples that do. What I am proposing is that general locations within genomes, specifically the intronic and intergenic regions and fragile sites often considered “junk” or bordering on useless, have maintained an overall inherent instability and provide the greatest evolutionary means for adaptive divergence. Considering the long-term “usefulness” of adaptation, I propose that although “junk DNA” may serve debatable functions in terms of transcription, protein binding, and methylation, it is a loosely conserved but vital mechanism for genetic evolution. Without junk DNA, eukaryotic and prokaryotic life would not exist as we know it, if at all.

I don’t know about you, but I consider that pretty damned “functional”.

4 responses to “Continuing the Debate on “Function” in Junk DNA: Rethinking the Onion Test”

  1. When reading through your “descriptions of functionalism,” I could not help but notice that what you describe as “inherent instability and . . . adaptability. . .” resembles potential energy surfaces for “chemical reactivity.” Please correct me if I don’t understand. But, in simple physical chemistry terminology, the fundamental instability of pathogens would place them on a “saddle point” of a potential energy surface. Given that notion, if the energy barrier to invading a host is too great, then the pathogen will either “die out” or find a “more suitable host.”
    Please forgive me for the oversimplifications, but the piece is very deep and thought provoking. Thanks for posting it.

  2. Hi, jaksichja. Your question is a very interesting one, and one which I can’t profess to have wrapped my head around, being that my physics and chemistry knowledge is shamefully lacking. Do you have any summary literature you could recommend that could give me a wee bit of background? I’m painfully ignorant but would like to be able to respond to your thoughtful question. 😳

  3. Pardon me for not getting back to you sooner. My impression is that I have oversimplified the concept; however, one fairly good link is from Yale:
    I used the concept by analogy, and I do apologize for the embarrassment.

    It has been my observation that Biology and Molecular Biology are becoming mathematical in concept and practice as computational tools increase in sophistication. Furthermore, it would seem that my observation may require a lot of computational power, utilizing traditional concepts of potential energy surfaces.

  4. One last clarification: I used the term “saddle point” in my original comment. The “saddle point” would correspond to a “transition state” on a potential energy surface. The term had been used “in the past” because “transition states” may resemble a saddle. And, due to energetics, the “transition state” molecule (or in my example, a pathogen) may go down one of two (?) paths. The paths correspond to whichever is “most energetically favorable, or most easily traversable upon the surface.” It is easier to understand if one views the concept as a “path that is easier to traverse, possibly a path that utilizes less energy to reach its final target.”

    Again, I must apologize if I have made it too hard to understand?
