For those not familiar, transposable or mobile elements are segments of DNA that have, or at one time had, the potential to move about the genome. This occurs either via the cut-and-paste method used by DNA transposons, or via the copy-and-paste method used by retrotransposons, in which the element is first transcribed into RNA and then reverse transcribed into DNA that is reinserted into the genome.
Obviously, the ability of a segment of DNA to copy- or cut-and-paste itself elsewhere holds considerable potential for disruptive mutations. But transposable elements (TEs) in situ also have the capacity for mutation, in part because of their repetitive content. About half of the human genome is made up of repetitive content. This can come in the form of simple repeat elements, such as “AAAAAAA” or “CACACACA”, can expand to include short paired palindromic sequences such as “GTAACTG” and “GTCAATG”, and also includes larger segments of homology such as the TE sequences themselves. Alus, for instance, are related to one another, so their sequences are extremely similar. If two Alus lie near one another, the DNA strands can slip and mispair, leading to a mutational event.
De Smith et al. (2008) found that Alus, more than any other TE, were enriched at deletion breakpoints within the human genome. This may be particularly relevant to understanding the evolution of primate and human genetics because Alu elements are unique to primates. Data from our own previous study also indicated that TEs, including Alus, were particularly enriched in the intronic regions of autism-risk genes, although whether their enrichment in these areas causes autism-risk mutations is yet to be answered. Even so, it may help explain some of the increased frequency of de novo copy number variations (CNVs) occurring in conditions like autism and schizophrenia [1, 2].
Figure 1 from Haesler and Strub (2006), illustrating the basic structure of Alu elements. At the end of the right arm lies the poly-A tail, a long stretch of adenine nucleotides characteristic of both SINEs and LINEs.
If data from the fruit fly can be generalized to other species, it appears that the majority of TEs lie within heterochromatic regions. Heterochromatin, in contrast to euchromatin, comprises the gene-poor regions of the genome that tend to be under tighter epigenetic lockdown. Because it’s gene-poor, however, heterochromatin is probably under looser conservation constraints, hence the higher retention of TE insertions over the generations. Kaminker et al. (2002) found that less than 30% of the TEs they studied resided within the gene-heavy euchromatin.
But why might Alu elements be so special compared to other TEs? For one, they’re very short, usually less than 300 base pairs (bp) in length, whereas their larger cousins, the long interspersed elements (LINEs), can be over 6,000 bp when intact. Their small size may be particularly important for their retention within the sequences into which they insert, perhaps because it leads to less disastrous effects compared to the larger elements. They also have the bizarre feature of tending to promote euchromatic (open) structure wherever they insert. Alus are highly overrepresented adjacent to housekeeping genes, for instance, which are constitutively expressed genes necessary for very basic cellular functions. Eller et al. (2007) postulated that, because Alus promote euchromatic (open) formation and housekeeping genes need to be constantly expressed, Alu insertions have been positively selected for in these regions. Meanwhile, they found that Alus are underrepresented near genes that are tissue-specific and not constitutively expressed in all cell types.
But since we found an abundance of TEs, including Alus, in the intronic regions of autism-risk genes, are they potentially playing some role in the regulatory functions, and perhaps even the mutations, within these genes? It’s hard to say, and certainly much more work needs to be done to even approach an answer. For one, we know that autism-risk genes tend to be longer, which includes longer introns as well. We also know that long genes tend to have strongly conserved sequences (multispecies conserved sequences, or MCSs) contained within the introns, as compared to shorter genes like the housekeeping ones. These MCSs seem to favor TE insertion, though at a distance in the intron well away from the MCS itself, and they also, for some unknown reason, promote the expansion of the intron over time in a fashion not entirely due to TE insertion. We also know that recombination rates are actually lower in long genes than one would expect given their repetitive content.
The question remains: Even though recombination rates may be lower in autism-risk genes, are Alu sequences nevertheless a point of vulnerability in the development of rare de novo copy number variations? We’re currently working to address this question, but it will also be interesting to see what new questions arise from this line of investigation. I have the feeling that our Alu project is only just beginning…