An Introduction to the Literature on Transcriptome Tools

Note: the following content is reposted from 360 Library (360图书馆).

1. Alignment Tools

(Kim et al., 2015) HISAT: a fast spliced aligner with low memory requirements. Nature methods.

Aligns RNA-seq reads to a reference genome using uncompressed suffix arrays. STAR has the potential to accurately align the long (several kilobases) reads emerging from third-generation sequencing technologies.

(Dobin et al., 2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics.
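
The STAR description above mentions uncompressed suffix arrays. As a rough illustration of that data structure only (a toy sketch, not STAR's implementation; the function names `build_suffix_array` and `find_occurrences` and the example sequence are hypothetical), a suffix array lets every exact occurrence of a read be located by binary search:

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted lexicographically."""
    # Naive construction by sorting suffixes; acceptable for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions where `pattern` occurs exactly in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


genome = "ACGTACGTGACG"   # hypothetical reference sequence
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "ACG")
print(hits)  # [0, 4, 9]
```

A real spliced aligner also has to handle mismatches and reads that span introns, which this sketch does not attempt.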

Self-training Algorithm for Splice Junction Detection using RNA-seq.

(Li et al., 2013) TrueSight: a new algorithm for splice junction detection using RNA-seq. Nucleic acids research.

A toolkit for processing next-generation sequencing data. These programs are also implemented in the Bioconductor R package Rsubread.

(Liao et al., 2013) The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic acids research.

(Rogers et al., 2012) SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data. Genome biology.

(Philippe et al., 2013) CRAC: an integrated approach to the analysis of RNA-seq reads. Genome biology.

A fast splice junction mapper for RNA-Seq reads. TopHat aligns RNA-Seq reads to mammalian-sized genomes using the high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

(Kim et al., 2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol.

(Chu et al., 2015) SpliceJumper: a classification-based approach for calling splicing junctions from RNA-seq data. BMC bioinformatics.

(Srivastava et al., 2016) RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics.

A framework for genome-based transcript reconstruction and quantification. CIDANE is engineered not only to assemble RNA-seq reads ab initio, but also to make use of the growing annotation of known splice sites, transcription start and end sites, or even full-length transcripts, available for most model organisms. To some extent, CIDANE is able to recover splice junctions that are invisible to existing bioinformatics tools.

(Canzar et al., 2016) CIDANE: comprehensive isoform discovery and abundance estimation. Genome biology.

An open source tool for accurate genome-guided transcriptome assembly from RNA-seq reads based on the model of splice graph. An extension of our program CLASS, CLASS2 jointly optimizes read patterns and the number of supporting reads to score and prioritize transcripts, implemented in a novel, scalable and efficient dynamic programming algorithm.

(Song et al., 2016) CLASS2: accurate and efficient splice variant annotation from RNA-seq reads. Nucleic acids research.

2. Read Counting

An RNA-seq read counting tool that is built on top of featureCounts, retaining its speed, and implements the counting modes of HTSeq. VERSE is more than 30x faster than HTSeq when computing the same gene counts. VERSE also supports a hierarchical assignment scheme, which allows reads to be assigned uniquely and sequentially to different types of features according to user-defined priorities.

(Zhu et al., 2016) VERSE: a versatile and efficient RNA-Seq read counting tool. bioRxiv.
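
The hierarchical assignment scheme described for VERSE can be sketched in a few lines. This is a simplified illustration under stated assumptions, not VERSE's actual code or interface: reads and features are plain half-open intervals on one chromosome, and the names `overlaps`, `assign_hierarchically`, and the feature types used are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def assign_hierarchically(reads, features_by_type, priority):
    """Assign each read to the first feature type, in priority order, that it overlaps."""
    counts = {ftype: 0 for ftype in priority}
    counts["unassigned"] = 0
    for r_start, r_end in reads:
        for ftype in priority:
            if any(overlaps(r_start, r_end, s, e) for s, e in features_by_type[ftype]):
                counts[ftype] += 1
                break  # each read is counted once, for the highest-priority type
        else:
            counts["unassigned"] += 1
    return counts


# Hypothetical annotation and reads.
features = {"exon": [(100, 200), (300, 400)], "intron": [(200, 300)]}
reads = [(150, 180), (190, 220), (250, 280), (500, 530)]
counts = assign_hierarchically(reads, features, priority=["exon", "intron"])
print(counts)
```

The read at (190, 220) touches both an exon and an intron; because "exon" comes first in the priority list, it is counted only as exonic.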

A tool for RNA-Seq data analysis that counts for each gene how many aligned reads overlap its exons.

(Anders et al., 2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature protocols.

A package that provides efficient low-level and highly reusable S4 classes for storing ranges of integers, RLE vectors (Run-Length Encoding) and, more generally, data that can be organized sequentially (formally defined as Vector objects), as well as views on these Vector objects. IRanges also provides efficient list-like classes for storing big collections of instances of the basic classes. All classes in the package use consistent naming and share the same rich and consistent Vector API as much as possible.

(Lawrence et al., 2013) Software for computing and annotating genomic ranges. PLoS computational biology.
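
To make the run-length encoding idea behind the RLE vectors in IRanges concrete, here is a minimal plain-Python sketch. It does not use or imitate the IRanges API; the helper names `rle_encode` and `rle_decode` are hypothetical, and per-base coverage is assumed as the example data.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical per-base read coverage along a short region.
coverage = [0, 0, 0, 2, 2, 5, 5, 5, 5, 0]
runs = rle_encode(coverage)
print(runs)  # [(0, 3), (2, 2), (5, 4), (0, 1)]
```

Long stretches of identical values, which are common in coverage tracks, collapse into a single pair each, which is why this representation is compact.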

A read summarization program, which counts mapped reads for the genomic features such as genes and exons.

(Liao et al., 2013) featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics.

3. Quantification

A fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It is primarily a genome-guided transcriptome assembler, although it can borrow algorithmic techniques from de novo genome assembly to help with transcript assembly. Its input can include not only the spliced read alignments used by reference-based assemblers, but also longer contigs that were assembled de novo from unambiguous, non-branching parts of a transcript.

(Pertea et al., 2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology.

A computational approach that measures changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. EISA reveals both transcriptional and post-transcriptional contributions to expression changes, increasing the amount of information that can be gained from RNA-seq data sets.

(Gaidatzis et al., 2015) Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation. Nature biotechnology.
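
The exon/intron comparison behind EISA can be illustrated with a small calculation. This sketch makes simplifying assumptions that are not spelled out in the description above: the change in intronic reads is taken as a proxy for transcriptional change, and the difference between the exonic and intronic changes as the post-transcriptional contribution. Library-size normalization and statistical testing are omitted, and the function names and counts are hypothetical.

```python
import math


def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2 ratio of condition B over condition A, with a pseudocount to avoid log(0)."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))


def exon_intron_split(exon_a, exon_b, intron_a, intron_b):
    """Split an expression change into intronic and exonic-minus-intronic parts."""
    delta_exon = log2_fold_change(exon_a, exon_b)
    delta_intron = log2_fold_change(intron_a, intron_b)
    return {
        "delta_exon": delta_exon,
        "delta_intron": delta_intron,  # assumed proxy for transcriptional change
        "post_transcriptional": delta_exon - delta_intron,
    }


# Hypothetical read counts for one gene in conditions A and B.
result = exon_intron_split(exon_a=99, exon_b=399, intron_a=49, intron_b=99)
print(result)
```

In this made-up example the exonic signal rises fourfold while the intronic signal only doubles, so half of the log2 change would be attributed to post-transcriptional regulation under the stated assumptions.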

Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

(Trapnell et al., 2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology.

A method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-Seq peak calling.

(Guttman et al., 2010) Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature biotechnology.

Accurate quantification of transcriptome from RNA-Seq data by effective length normalization.

(Lee et al., 2011) Accurate quantification of transcriptome from RNA-Seq data by effective length normalization. Nucleic acids research.

An integrated alignment workflow and a simple counting-based approach to derive estimates for gene, exon and exon-exon junction expression. In contrast to previous counting-based approaches, EQP takes into account only reads whose alignment pattern agrees with the splicing pattern of the features of interest. This leads to improved gene expression estimates as well as to the generation of exon counts that allow disambiguating reads between overlapping exons.

(Schuierer and Roma, 2016) The exon quantification pipeline (EQP): a comprehensive approach to the quantification of gene, exon and junction expression from RNA-seq data. Nucleic acids research.

It was designed as a user friendly solution to extract and annotate biologically important transcripts from next generation RNA sequencing data.

(Forster et al., 2013) RNA-eXpress annotates novel transcript features in RNA-seq data. Bioinformatics.

A versatile model to account for sequence-specific bias that commonly occurs at the ends of fragments. Isolator analyzes RNA-Seq experiments using a simple Bayesian hierarchical model. Combined with aggressive bias correction, it produces estimates that are simultaneously accurate and show high agreement between samples. Isolator is uniquely able to compute posterior probabilities corresponding to arbitrarily complex questions, within the confines of the model.

(Jones et al., 2016) Isolator: accurate and stable analysis of isoform-level expression in RNA-Seq experiments. bioRxiv.

4. Normalization and Differential Expression

A method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

(Love et al., 2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology.

A software package designed to facilitate flexible differential expression analysis of RNA-Seq data. Ballgown can also be used to visualize the transcript assembly on a gene-by-gene basis, extract abundance estimates for exons, introns, transcripts or genes, and perform linear model–based differential expression analyses.

(Frazee et al., 2015) Ballgown bridges the gap between transcriptome assembly and expression analysis. Nature biotechnology.

A package to dampen the effect of outliers on count-based differential expression analyses. edgeR uses empirical Bayes estimation and exact tests based on the negative binomial distribution and is useful for differential signal analysis with other types of genome-scale count data. It requires a delicate tradeoff to maintain high power while at the same time achieving a decent resistance to the presence of outliers. In particular, it is difficult to know exactly what an outlier is and where the line should be drawn to identify it as such.

(Zhou et al., 2014) Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic acids research.

A differential transcript expression (DTE) analysis algorithm. SDEAP estimates the number of conditions directly from the input samples using a Dirichlet mixture model and discovers alternative splicing events using a new graph modular decomposition algorithm. By taking advantage of these technical improvements, SDEAP was able to outperform the other DTE analysis methods in extensive experiments on simulated data and real data with qPCR validation. The predictions of SDEAP also allow users to classify the samples of cancer subtypes and cell-cycle phases more accurately.

(Yang and Jiang, 2016) SDEAP: a splice graph based differential transcript expression analysis tool for population data. Bioinformatics.

Enables rapid interpretation of complex gene expression studies as well as other high-throughput genomics assays. variancePartition is a statistical and visualization framework, used to prioritize drivers of variation based on a genome-wide summary, and identify genes that deviate from the genome-wide trend. This tool quantifies variation in each expression trait attributable to differences in disease status, sex, cell or tissue type, ancestry, genetic background, experimental stimulus, or technical variables.

(Hoffman and Schadt, 2016) variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinformatics.

A realistic framework to assess the impact of the key components of the statistical framework for differential analyses of RNA-seq data. This tool is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models.

(Rigaill et al., 2016) Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis. Briefings in Bioinformatics.

Detects differentially expressed (DE) genes in RNA-seq data with a high level of heterogeneity, such as cancer RNA-seq data. ELTSeq is an empirical likelihood ratio test (ELT) with a mean-variance relationship constraint for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex diseases.

(Xu and Chen, 2016) An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq. Briefings in Bioinformatics.

A package for detecting the differentially expressed (DE) genes in time course RNA-Seq data. The negative binomial mixed-effect model (NBMM) method is applied to gene expression data on a gene-by-gene basis. A parallel computing option is implemented in timeSeq package to speed up the computing process. We showed that our approach outperforms other currently available methods in both synthetic and real data.

(Sun et al., 2016) Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model. BMC Bioinformatics.

A method for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of degrees of freedom is reduced to one through the first-order decomposition of the interaction, leading to a dramatic improvement in power when testing for DE genes if the number of conditions is greater than two.

(Kang et al., 2016) multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions. BMC bioinformatics.

(Jia et al., 2015) MetaDiff: differential isoform expression analysis using random-effects meta-regression. BMC bioinformatics.

Provides a data-driven solution to test the assumptions of global normalization methods. Group level information about each sample (such as tumor/normal status) must be provided because the test assesses if there are global differences in the distributions between the user-defined groups.

(Hicks and Irizarry, 2015) quantro: a data-driven approach to guide the choice of an appropriate normalization method. Genome biology.

A Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data.

(Gu et al., 2014) BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data. BMC bioinformatics.

An algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries.

(Trapnell et al., 2013) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature biotechnology.

(Li et al., 2012) Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics.

A package to identify differentially expressed genes or isoforms for RNA-seq data from different samples. DEGseq also encourages users to export gene expression values in a table format which can be directly processed by edgeR (Robinson, 2009), an R package implementing a method based on the negative binomial distribution to model overdispersion relative to Poisson for digital gene expression data with small numbers of replicates (Robinson and Smyth, 2007).

(Wang et al., 2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics.
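
The phrase "overdispersion relative to Poisson" in the DEGseq description can be made concrete with the usual mean-variance parameterization of the negative binomial distribution. The symbols below (a read count Y with mean mu and dispersion phi) are notation introduced here for illustration, not taken from the papers cited above:

```latex
\[
\operatorname{Var}(Y) = \mu + \phi\,\mu^{2}, \qquad \phi \ge 0 .
\]
```

With phi = 0 the variance equals the mean, which is the Poisson case; any phi > 0 adds variance that grows quadratically with the mean.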

5. Gene Fusion

An enhanced version of TopHat with the ability to align reads across fusion points, which result from the breakage and re-joining of two different chromosomes, or from rearrangements within a chromosome.

(Kim and Salzberg, 2011) TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome biology.

A python package to annotate and visualize gene fusions. For a given gene fusion, AGFusion will predict the cDNA, CDS, and protein sequences resulting from fusion of all combinations of transcripts and save them to fasta files. AGFusion can also plot the protein domain architecture of the fusion transcripts.

(Murphy and Elemento, 2016) AGFusion: annotate and visualize gene fusions. bioRxiv.

A toolkit for fusion gene and chimeric transcript detection from RNA-seq data. InFusion is a computational method for the discovery of chimeric transcripts from RNA-seq data capable of detecting alternatively spliced chimeric transcripts and fusion genes involving non-coding regions. InFusion allows detection of fusions that involve intergenic regions, analyses and filters putative fusion events based on coverage depth, genomic context and strand specificity.

(Okonechnikov et al., 2016) InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data. PLoS One.

6. Alternative Splicing

(Reuter et al., 2016) PreTIS: A Tool to Predict Non-canonical 5’ UTR Translational Initiation Sites in Human and Mouse. PLoS Computational Biology.

(Afsari et al., 2016) Splice Expression Variation Analysis (SEVA) for Differential Gene Isoform Usage in Cancer. bioRxiv.

The DEXSeq method is implemented as an open Bioconductor package, which facilitates data visualization and exploration. It can detect with high sensitivity genes, and in many cases exons, that are subject to differential exon usage.

(Anders et al., 2012) Detecting differential usage of exons from RNA-seq data. Genome research.

(Liu et al., 2012) Detection, annotation and visualization of alternative splicing from RNA-Seq data with SplicingViewer. Genomics.

(Ryan et al., 2012) SpliceSeq: a resource for analysis and visualization of RNA-Seq data on alternative splicing and its functional impacts. Bioinformatics.

Alternative Splicing transcriptional landscape visualization tool.

(Foissac and Sammeth, 2007) ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic acids research.

7. Alleles

(Deonovic et al., 2016) IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing. Nucleic Acids Research.

(Soderlund et al., 2014) Allele workbench: transcriptome pipeline and interactive graphics for allele-specific expression. PLoS One.

(Romanel et al., 2015) ASEQ: fast allele-specific studies from next-generation sequencing data. BMC medical genomics.

(Nariai et al., 2015) A Bayesian approach for estimating allele-specific expression from RNA-Seq data with diploid genomes. BMC genomics.