Friday, February 09, 2018

Are splice variants functional or noise?

This is a post about alternative splicing. I've avoided using that term in the title because it's very misleading. Alternative splicing produces a number of different products (RNA or protein) from a single intron-containing gene. The phenomenon has been known for 35 years and there are quite a few very well-studied examples, including several where all of the splice regulatory factors have been characterized.

The number of known examples is quite small in any given species. In contrast, the number of different splice variants is enormous. Most human genes, for example, are associated with a dozen or so different variants that have been detected over the years. Almost all of these splice variants have been rejected by genome annotators because they are very rare, never leave the nucleus, and are never present in sufficient quantities to be functional. They are undoubtedly junk RNA produced by the sloppy spliceosome. This kind of noise should not be called alternative splicing because that term should be restricted to real examples that produce functional variants by some sort of regulatory mechanism.

This seems like common sense to me but, unfortunately, most scientists disagree. They continue to refer to any example of splice variants as alternative splicing even though they might be just splicing errors. In fact, most of these scientists don't even consider the possibility of splicing errors. See the following posts for a more thorough discussion of this problem.

Debating alternative splicing (part I)
Debating alternative splicing (part II)
Debating alternative splicing (Part III)
Debating alternative splicing (Part IV)

A recent paper by John Mattick and his collaborators highlights the problem (Deveson et al., 2017). Recall that Mattick is a prominent opponent of junk DNA. He thinks that most of the genome is devoted to producing regulatory RNAs. His "proof" is pervasive transcription. He claims there are thousands and thousands of long noncoding RNAs that have a function [John Mattick still claims that most lncRNAs are functional].

His most recent paper employs the latest technology for detecting RNAs in a cell. The authors highlight the fact that they can detect very low abundance RNAs. They apply the technique to map all the RNAs complementary to the DNA on human chromosome 21. They chose three tissues: testis, brain, and kidney. Two of these tissues are well-known examples of noisy transcription.

The results are not unexpected. They detected an enormous number of different transcripts covering most of the non-repetitive DNA in chromosome 21. Each protein-coding gene matched to dozens of different splice variants in addition to the standard mRNA. Although the authors make passing reference to the controversy over splicing, it's clear that they treat all of these mRNA variants as examples of true alternative splicing. But that's not the main point of their paper. The main point is that the rest of the chromosome specifies a large number of noncoding RNAs and those RNAs exhibit an enormous diversity of splice variants. The result is nicely captured in their summary image (right).

The old RNA-Seq view is shown in the upper-right part of the image. A typical protein-coding gene produces a number of splice variants that I assume are examples of splicing errors. Mattick and his colleagues assume they are due to alternative splicing. The noncoding part of the genome is complementary to another set of transcripts with a limited set of splice variants. Mattick assumes these regions are genes and the RNAs are functional, although he has no proof of that. I assume that most of these RNAs are spurious transcripts of junk DNA. This should be the default assumption.

The new view is derived from their more exhaustive analysis of very rare transcripts. There are more splice variants from protein-coding genes but the increase is not enormous. In contrast, there are many more variant RNAs from the rest of the genome and this includes an enormous diversity of different exons. The title of the paper says it all: Universal alternative splicing of noncoding exons. Here are the main conclusions of the paper ...
We propose that noncoding exons are functionally modular, with alternative splicing generating an enormous repertoire of potentially regulatory RNAs and a rich transcriptional reservoir for gene evolution. (abstract)

One can envision a scenario where individual noncoding exons interact independently with other biomolecules (proteins, RNAs and/or DNA-motifs), organizing these around the scaffold of a noncoding transcript. In this way, alternative isoforms could assemble different collections of binding partners to dynamically regulate cellular processes. (discussion)
Yes, it's true that one could envisage such a scenario. One can imagine many things, but the real question is not how potent your imagination is but whether it's realistic.

Scenarios should be based on facts and not on wishful thinking. In this case there's a lot of evidence that most of our genome is junk. If you are going to propose that most of it contains genes for regulatory RNAs then you have an obligation to refute or discredit the evidence for junk. This paper doesn't do that.

Similarly, there are many good reasons to suspect that splice variants are mistakes in splicing. The variants are not conserved, most are present at less than one copy per cell, splicing errors are known to occur at relatively high frequency, and very few have been shown to have a function. The default assumption must be that they are junk RNA unless proven otherwise.

Mattick and his colleagues dismiss some of these objections using arguments that make no sense. The problem with this paper is that it is promoting an extraordinary claim without any serious evidence of function, let alone extraordinary evidence. I don't understand how it passed peer review. The data may be fine but the interpretation and the conclusions are not.

I think the tide is turning against Mattick and his supporters but perhaps that's just wishful thinking on my part. Take a look at the RNA variants in the lower right-hand corner of the figure. How many of you believe they represent exquisite fine-tuning of a regulatory RNA? How many of you think they are mostly transcriptional and splicing errors?

Deveson, I.W., Brunck, M.E., Blackburn, J., Tseng, E., Hon, T., Clark, T.A., Clark, M.B., Crawford, J., Dinger, M.E., Nielsen, L.K., Mattick, J.S., and Mercer, T.R. (2017) Universal alternative splicing of noncoding exons. Cell Systems, 6:(1-11). [doi: 10.1016/j.cels.2017.12.005]


  1. As usual, you must be right Larry. Just don't forget to put it in your book.

  2. Out of curiosity I had to check what chromosome 21 primarily codes for. What I found indicates that it's very involved in brain morphogenesis and reproduction. Correct?

    1. It's not like there are chromosomes that primarily code for anything. When you compare synteny across species, you realise that over millions of years genes have been shuffled and reshuffled continuously. This leaves one with the inevitable impression that whichever genes happen to be grouped together on a chromosome at any point in time are going to ultimately be a coincidence.

    2. My cognitive science related interests made it necessary for me to first check whether the chromosome being used as a good example of lncRNA activity is known to be involved in brain development or reproduction. If it were not then I would have been less interested in the topic and gone back to work on the next step after my previous cognitive model, with a first of its kind spatial reasoning system using waves instead of the "connection weights" typical of modern AI. Neuroscience is reporting that the "engrams" of our memory are not in the connections, after all. Focus is now mostly on the RNA that's inside brain cells. At Reddit I plucked the best part out of a long article on the topic:

      I conceptualize gene locations being most dependent on where things end up after replication, when the chromosomes have fully uncoiled into intermingling territories and networks are at full function again. It seems most important to preserve that topology, even where that ends up breaking some connections causing re-splicing to the nearest sites around. So I can agree by saying that where things end up after being coiled for replication is very different from where they need to be when not.

      I also have software and .png images I made for digitally banding Chromosome 2, and later highlighting features of the fusion site with single pixels against a black background:

      In regards to bridging AI to Neuroscience, the model I now have is, from what I can see, all that's left standing. At least part of the reason why is maybe that using waves and a small number of simple rules at each place in the map to form complex place avoidance behaviors of animals is something only a kook like me would actually code, but it worked!

      Going into more cellular detail requires modeling in the basic functions of the RNA and protein systems, which all together, as in slime molds too, make cells inherently good at anticipating events and acting ahead of time. I expect most or all of that behavior to be caused by the system outside the nucleus. Inside the nucleus is for the long-haul morphological behavior of the system, which can also be modeled in when enough is known for it to be possible to do so, shuffling genes and all.

    3. @Gary Gaulin

      You have already been warned once about spamming my blog with your kooky ideas. This is your second warning.

      Do it again and you will be banned.

  3. This, perhaps: