Synopsis
The GENCODE project produces high-quality reference gene annotations for the human and mouse genomes. The CNIO is one of the seven members of the GENCODE consortium and carries out the computational analysis of coding genes and gene models.
The APPRIS database and web server, developed by the CNIO, maps protein structure, function and conservation information to alternative splice isoforms. It also selects one of the splice variants as the main (principal) isoform for each coding gene. APPRIS selects principal isoforms for the human and mouse genomes and for six other model organisms.
As part of the annotation of the human genome, the CNIO has the unglamorous task of suggesting which annotated coding genes are least likely to code for proteins. We also suggest improvements to gene models based on structural and functional features.
The CNIO validates the coding potential of genes and gene models via careful reanalysis of large-scale proteomics experiments.
We have shown that while some alternative splice isoforms are functionally important, most are not. We have also demonstrated that the number of human protein coding genes is likely to be closer to 19,000 than the currently accepted 20,000.
Personnel
GENCODE – CNIO links
The APPRIS database: APPRIS
References
- Tardaguila M, et al. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res. 2018 PMID: 29440222
- Rodriguez JM, et al. APPRIS 2017: principal isoforms for multiple genesets. Nucleic Acids Res. 2018 PMID: 29069475
- Tress ML, et al. Most Alternative Isoforms Are Not Functionally Important. Trends Biochem Sci. 2017 PMID: 28483377
- Tress ML, et al. Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem Sci. 2017 PMID: 27712956
- Ezkurdia I, et al. The potential clinical impact of the release of two drafts of the human proteome. Expert Rev Proteomics. 2015 PMID: 26496066
- Abascal F, et al. Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level. PLoS Comput Biol. 2015 PMID: 26061177
- Rodriguez JM, et al. APPRIS WebServer and WebServices. Nucleic Acids Res. 2015 PMID: 25990727
- Abascal F, et al. The evolutionary fate of alternatively spliced homologous exons after gene duplication. Genome Biol Evol. 2015 PMID: 25931610
- Abascal F, et al. Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals. Bioinformatics. 2015 PMID: 25735770
- Ezkurdia I, et al. Most highly expressed protein-coding genes have a single dominant isoform. J Proteome Res. 2015 PMID: 25732134
- Ezkurdia I, et al. Analyzing the first drafts of the human proteome. J Proteome Res. 2014 PMID: 25014353
- Ezkurdia I, et al. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet. 2014 PMID: 24939910
- Maietta P, et al. FireDB: a compendium of biological and pharmacologically relevant ligands. Nucleic Acids Res. 2014 PMID: 24243844
- Rodriguez JM, et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 2013 PMID: 23161672
- Harrow J, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012 PMID: 22955987
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 PMID: 22955616
- Ezkurdia I, et al. Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function. PMID: 22446687
- Ezkurdia I, Tress ML. Protein structural domains: definition and prediction. Curr Protoc Protein Sci. 2011 PMID: 22045561
- Lopez G, et al. firestar–advances in the prediction of functionally important residues. Nucleic Acids Res. 2011 PMID: 21672959
- ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011 PMID: 21526222
- Tress ML, et al. Proteomics studies confirm the presence of alternative protein isoforms on a large scale. Genome Biol. 2008 PMID: 19017398
- Tress ML, et al. Determination and validation of principal gene products. Bioinformatics. 2008 PMID: 18006548
- Tress ML, et al. The implications of alternative splicing in the ENCODE protein complement. Proc Natl Acad Sci U S A. 2007 PMID: 17372197