Half of our genome is garbage, but it could help the study of cancer

The genome that holds the instructions for building our bodies looks more like a landfill than an orderly city. Yet it contains some key genes implicated in cancer and perhaps in other diseases.

It is strange that much of the genome, which holds the instructions for making organisms as fascinating as the elegant monarch butterflies that migrate to Mexico every winter, or humans themselves with their ability to solve problems, looks more like a landfill than like an architect's neat and organized blueprints.

What is fascinating is that genes implicated in the development of cancer and, probably, in other diseases have been found in these apparently useless regions known as the ‘junk genome’.

Why have we called it a ‘junk genome’ then?

With his work The Origin of Species, Charles Darwin transformed the concept of species, showing that all organisms were related and had a common origin. Nearly a century later, James Watson and Francis Crick put a face to the molecule that connects all living things: DNA.

However, it still took half a century to sequence the more than 3 billion bases that make up the human genome. Expectations were high: researchers hoped to identify more than 100,000 protein-coding genes (that is, genes with a “clear” function) that would help explain the enormous complexity of human beings.

But, as so often in science, reality turned out to be very different. There were barely 20,000 functional genes, little more than the number identified in a fly or a worm. At the same time, we discovered that the genome is riddled with repetitive sequences, defective genes and ancient viruses, all with no apparent function. Hence this half of the genome came to be called the ‘junk genome’.

Of course, in this case, garbage is not synonymous with useless. As in an archaeological excavation, analyzing these sequences lets us identify remains of what we once were. For example, we can find remains of genes involved in the formation of eggs (present in egg-laying animals) which, being unnecessary in placental mammals, have accumulated mutations that inactivated them. In a way, they have been “abandoned”, like a broken amphora in a landfill.

And why are there so many repetitions in this “junk genome”? Viruses are to blame. With their machinery for making copies of themselves, they have caused short sequences to be copied hundreds or thousands of times across the human genome, giving rise to numerous repeated sequences.

The end result is that our genome resembles the blueprint of a building, but a very peculiar and somewhat confusing one, because the final version still shows all the previous sketches, along with doodles and copies of structures.

These copies, which in the case of DNA are called ‘repetitive sequences’, make up about half of the human genome. And, far from being useless, they contain essential elements for the functioning of the cell.

On the trail of the ‘junk genome’

Researchers need to work out where in the genome a given sequence comes from. The challenge is like being given an aerial photo of a building and having to identify the country, city and street where it stands.

Identifying where a sequence comes from is relatively easy when it is a unique sequence. But when it comes to a repetitive sequence, it is not easy to identify which chromosome it comes from. The same thing happens if we have an image of a small piece of desert: it is hard to determine exactly where the photo was taken.
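The difference between the two cases can be illustrated with a toy sketch. The reference sequence, the repeat unit and the function name `find_placements` below are all invented for illustration; real read aligners are far more sophisticated, so this only shows why a repetitive read has several equally valid origins.

```python
def find_placements(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further on so overlapping hits are found too.
        start = reference.find(read, start + 1)
    return positions


# Toy reference (hypothetical): one unique stretch followed by three
# copies of the same repeat unit, separated by short spacers.
REPEAT = "ACGTTGCA"
reference = "GGATCCTA" + REPEAT + "TT" + REPEAT + "CC" + REPEAT

unique_hits = find_placements(reference, "GGATCCTA")  # read from the unique stretch
repeat_hits = find_placements(reference, REPEAT)      # read from the repeat

print("unique read placements:", unique_hits)
print("repeat read placements:", repeat_hits)
```

The unique read has a single possible placement, while the repeat read fits equally well in three places, which is the "piece of desert" problem described above.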

This is the reason why the junk genome, this repetitive genome that represents almost half of our genetic material, has not yet been analyzed in detail in the context of cancer.

Junk genome mutations implicated in cancer development

In order to explore the possible presence of mutations in the repetitive genome, our group developed a bioinformatics tool called ‘Armadillo’. This tool makes it possible to combine all the sequences of a repetitive gene into a single sequence and analyze whether there are mutations in tumor samples that are absent in healthy cells from the same patient.
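The general idea can be sketched in a few lines of Python. This is not the code of ‘Armadillo’, only a minimal illustration under strong assumptions: the reads are assumed to be already placed on a single representative copy of the repeat, the sequences are invented, and the names `base_counts`, `tumor_only_variants` and the `min_reads` threshold are hypothetical.

```python
from collections import Counter


def base_counts(reads, length):
    """Count the bases observed at each position of the representative copy.

    `reads` is a list of (offset, sequence) pairs, where offset is the
    0-based start of the read on the representative copy.
    """
    counts = [Counter() for _ in range(length)]
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < length:
                counts[pos][base] += 1
    return counts


def tumor_only_variants(representative, tumor_reads, normal_reads, min_reads=2):
    """List (position, ref_base, alt_base) seen in tumor reads but never in normal reads."""
    n = len(representative)
    tumor = base_counts(tumor_reads, n)
    normal = base_counts(normal_reads, n)
    variants = []
    for pos, ref_base in enumerate(representative):
        for alt, depth in sorted(tumor[pos].items()):
            # Keep a change only if enough tumor reads support it and
            # no read from the patient's healthy cells shows it.
            if alt != ref_base and depth >= min_reads and normal[pos][alt] == 0:
                variants.append((pos, ref_base, alt))
    return variants


# Invented data: reads from every copy of the repeat, piled onto one sequence.
representative = "ACGTACGTAC"
normal_reads = [(0, "ACGTA"), (3, "TACGT"), (5, "CGTAC")]
tumor_reads = [(0, "ACGTA"), (3, "TATGT"), (4, "ATGTA"), (5, "CGTAC")]

variants = tumor_only_variants(representative, tumor_reads, normal_reads)
print(variants)
```

Because reads from all the copies are pooled, a mutation in just one copy shows up in only a fraction of the tumor reads at that position, which is why the sketch counts supporting reads instead of expecting every read to carry the change.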

In this way we identified a gene, called U2 small nuclear RNA, in which base 28 was mutated in dozens of tumors, especially hematological ones (chronic lymphocytic leukemia, various types of lymphoma and others), as well as in prostate cancer. By definition, this gene could be considered part of the junk genome, since we carry multiple identical copies of the same sequence. However, it takes part in a process fundamental to cell function: it directs the maturation of most of the RNAs that are translated into proteins.

All this shows that, again as in a good archaeological excavation, although we have already found an important and valuable part of what we are looking for, there are still great things to discover in the genome. Especially in the repetitive regions, which we have only just begun to explore.

This exploration will be easier now that the human genome has finally been sequenced with new techniques that generate longer sequences. Thanks to this technology, we have filled many of the gaps that remained in the draft we had worked with for 20 years.

The application of these same technologies for the sequencing of cancer genomes will make it possible to explore these repetitive regions that until now have remained inaccessible to cancer research.

This article has been published in ‘The Conversation’.
