Jubayer Hossain

Biomedical Researcher

Transcriptomic Data Analysis


From Bulk to Spatial: Learn RNA-Seq and Single-Cell Analysis with Linux and R


Course Overview 

The Transcriptomic Data Analysis course is a comprehensive program that equips participants with the skills and knowledge required to analyze transcriptomic datasets effectively. It covers fundamental concepts, hands-on tools, and advanced analytical techniques for bulk, single-cell, and spatial RNA sequencing (RNA-seq) data. With a focus on reproducibility, practical applications, and the integration of computational and statistical methods, the course is tailored for biologists, bioinformaticians, and data scientists interested in transcriptomics.

Course Objectives

By the end of this course, students will be able to:
  1. Understand the fundamental concepts of transcriptome analysis and RNA sequencing.
  2. Perform upstream RNA-seq data pre-processing in a Linux environment.
  3. Utilize R and the Tidyverse ecosystem for data manipulation and visualization.
  4. Analyze RNA-seq data using Bioconductor packages, including DESeq2 for differential gene expression.
  5. Conduct advanced downstream RNA-seq analysis and identify differentially expressed genes.
  6. Apply Seurat for single-cell RNA-seq analysis and explore cellular heterogeneity.
  7. Analyze spatial transcriptomics data to understand spatial gene expression patterns.

What is RNA-Seq?

RNA-Seq uses Next-Generation Sequencing (NGS) technologies to map and quantify the transcriptome, revealing gene transcription levels, transcript structure and expression, RNA modifications, and non-coding RNAs, among other features. The transcriptome, the complete set of transcripts in a cell, provides vital information about transcript levels at specific developmental stages or physiological states. Understanding the transcriptome is essential for interpreting the functional elements of the genome and for understanding development and disease. The key objectives of transcriptomics are to catalog all species of transcripts, to determine the transcriptional structure of genes, and to quantify the expression level of each transcript under varying conditions.
By offering a largely unbiased, high-resolution view of global transcription patterns, RNA-Seq provides an economical and accurate method for quantifying gene expression and for analyzing differential gene expression across multiple sample groups. It can identify novel and previously unpredicted transcripts without a reference genome, which makes de novo assembly of unstudied transcriptomes possible. It also allows the discovery of new gene architectures, alternatively spliced isoforms, gene fusions, SNPs/InDels, and allele-specific expression (ASE).
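
To make "expression quantification" and "differential expression" concrete, the following is a minimal sketch in Python rather than a recommended workflow. The gene names and read counts are made up, and the choice of counts-per-million (CPM) scaling with a pseudocount of 1 is an assumption for illustration only. A real analysis would use replicates and a statistical method such as DESeq2.

```python
import math

# Hypothetical raw read counts per gene for two samples (made-up numbers).
counts = {
    "control": {"geneA": 100, "geneB": 300, "geneC": 600},
    "treated": {"geneA": 400, "geneB": 300, "geneC": 1300},
}


def counts_per_million(sample_counts):
    """Scale raw counts by library size so samples sequenced to
    different depths become comparable."""
    library_size = sum(sample_counts.values())
    return {gene: n * 1_000_000 / library_size
            for gene, n in sample_counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """Return log2(b / a) per gene. The pseudocount (an assumption of
    this sketch) avoids division by zero for unexpressed genes."""
    return {gene: math.log2((cpm_b[gene] + pseudocount) /
                            (cpm_a[gene] + pseudocount))
            for gene in cpm_a}


cpm = {name: counts_per_million(c) for name, c in counts.items()}
lfc = log2_fold_change(cpm["control"], cpm["treated"])

for gene, value in sorted(lfc.items()):
    print(f"{gene}: log2FC = {value:+.2f}")
```

Note that geneB has the same raw count in both samples but a negative log2 fold change after scaling, because the treated library is twice as deep. This is why raw counts are not compared directly across samples.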

Advantages of RNA-Seq

  • Quantitative and precise measurement of RNA molecules at single-base resolution
  • Discovery of novel transcripts, splice variants, and gene fusions
  • Applicable to any species, whether or not a reference genome is available
  • Cost comparable to, or lower than, that of many other methods
  • Detection of many RNA types, including mRNA, miRNA, and lncRNA, giving a comprehensive view of the RNA present in cells or tissues
  • Simultaneous analysis of multiple samples, which yields large amounts of data efficiently and makes the method well suited to high-throughput RNA analysis

RNA-Seq Development

Sequencing technology has advanced considerably over the past few decades. Sanger sequencing was the first-generation method. It determines the base sequence of DNA by incorporating chain-terminating dideoxynucleotides during DNA synthesis; applied to complementary DNA (cDNA) reverse-transcribed from RNA, it gave early access to transcript sequences. The 1990s brought notable progress in sequencing technology, coinciding with the start of the whole-genome sequencing projects. The later introduction of high-throughput sequencing platforms, such as 454, Illumina, and Ion Torrent, made RNA sequencing feasible at scale. Traditional (bulk) RNA sequencing often required a large number of cells to obtain enough RNA, which masked the heterogeneity among individual cells. The subsequent emergence of single-cell RNA sequencing techniques, such as SMART-seq and the 10x Genomics platform, made it possible to sequence the transcriptomes of individual cells. These techniques reveal the gene expression characteristics of distinct cell types and states.

The Applications of RNA-Seq

RNA sequencing (RNA-seq) is widely used across biological and medical research. Common applications include:
  • Gene expression analysis
  • Differential gene expression analysis
  • Discovery of novel genes
  • Alternative splicing analysis
  • Biomarker discovery
  • Non-coding RNA research
  • Gene function elucidation
  • Population genetics and evolutionary biology
[Figure: RNA-Seq Analysis Pipeline]