Trinity, Bowtie, eXpress and DEGseq (PE)
This pipeline is useful for researchers working with species that lack a reference genome. The whole analysis takes a few days for 240 million 50 bp paired-end reads.
A de novo transcriptome assembly is constructed using Trinity. Reads are mapped to the contigs using Bowtie, and the expression level (FPKM) is calculated using eXpress. Pairwise comparisons are run with DEGseq, differentially expressed genes are extracted, and the contigs are clustered by k-means. Each contig is annotated based on blastx against UniProt and blastn against NR, and significantly upregulated GO terms are extracted and visualized using REVIGO.
|Execution time||About a few days (20M paired-end reads[50bp+50bp])|
- Check the dataset type explanation: fastq (paired-end)
Raw NGS reads (FASTQ format, paired-end): 1-10
|1. Table of expression levels||2. Clustering|
|3. Stats of DEGseq (MA plot etc.)||4. Characteristic GO terms for each cluster|
|5. Results of assembly||6. Results of mapping|
How to run this pipeline
1．Log in to Maser and open the project list.
Click "Create New Project" to create a new project.
2．Name the project (for example, "human RNA-seq") and click "OK".
3．Click the project to open its page.
4．Once the project is created, upload the input files (FASTQ files). Click "Upload Data Here".
5．Select the file transfer protocol. "HTTPS" is the usual choice.
6．Set the label name of the data (here SRX082565M1.2), the data type (here fastq (paired-end)), and the file to upload (here C:\SRX082565M1.2_1.fastq).
Because a paired-end FASTQ data set consists of a forward and a reverse file, push the "Add file" button to add a second item.
7．Push the reference button and select the other FASTQ file.
Uploaded files must follow these rules:
・File names cannot contain Japanese characters or characters such as a space " ", parentheses "(" or a quotation mark "'". Use only alphabetic characters, numbers and the underscore "_".
・For paired-end FASTQ files, the forward file should be named "xxx_1.fastq", "xxx_1.fq", "xxx_1.txt" or "xxx_R1_yyy.fastq", and the reverse file should be named "xxx_2.fastq"...
(ex: my_sample_1.fastq, my_sample_2.fastq)
See details here
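The naming rules above can be checked before uploading. The following is a minimal sketch, not part of Maser: the function name `check_name` is hypothetical, a dot is allowed in the name because the example file on this page (SRX082565M1.2_1.fastq) contains one, and only the single reverse pattern written out above ("xxx_2.fastq") is covered, since the remaining reverse patterns are not listed here.

```python
import re

# Assumption: letters, digits, underscore and dot are acceptable characters.
ALLOWED = re.compile(r"[A-Za-z0-9_.]+")

# Forward patterns as listed in the rules above.
FORWARD_PATTERNS = [
    re.compile(r".+_1\.(fastq|fq|txt)"),
    re.compile(r".+_R1_.+\.fastq"),
]

# Only the reverse pattern that is explicitly written out above.
REVERSE_PATTERNS = [re.compile(r".+_2\.fastq")]


def check_name(filename):
    """Return 'invalid', 'forward', 'reverse' or 'unknown' for a file name."""
    if not ALLOWED.fullmatch(filename):
        return "invalid"
    if any(p.fullmatch(filename) for p in FORWARD_PATTERNS):
        return "forward"
    if any(p.fullmatch(filename) for p in REVERSE_PATTERNS):
        return "reverse"
    return "unknown"


for name in ["my_sample_1.fastq", "my_sample_2.fastq", "my sample_1.fastq"]:
    print(name, "->", check_name(name))
```

A result of "unknown" only means the name does not match a pattern covered by this sketch; it may still be accepted by Maser.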
Read names in paired-end data must be identical in the forward and reverse files up to the first space. Sometimes the forward read name ends in "/1" and the reverse read name ends in "/2", but eXpress does not handle such names, so it may not be possible to compare expression levels.
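To see which case a data set falls into, the header lines of the first read in each file can be compared. This is a rough sketch with hypothetical function names (`read_id`, `pair_status`); it only inspects header strings and does not modify any file.

```python
def read_id(header):
    """Return the read name: the header up to the first space, without '@'."""
    header = header.rstrip("\n")
    if header.startswith("@"):
        header = header[1:]
    return header.split(" ", 1)[0]


def pair_status(forward_header, reverse_header):
    """Classify a forward/reverse header pair.

    'ok'           : names are identical before the first space
    'slash-suffix' : names differ only by a trailing /1 and /2
    'mismatch'     : names differ in some other way
    """
    f, r = read_id(forward_header), read_id(reverse_header)
    if f == r:
        return "ok"
    if f.endswith("/1") and r.endswith("/2") and f[:-2] == r[:-2]:
        return "slash-suffix"
    return "mismatch"


print(pair_status("@read7 length=50", "@read7 length=50"))
print(pair_status("@read7/1", "@read7/2"))
```

A "slash-suffix" result corresponds to the "/1" and "/2" case described above.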
8．If you have other samples whose expression levels you want to compare, upload their FASTQ files as well.
Here, three files are uploaded.
Click the "Select" button for every file.
The results are listed in the order in which you clicked.
9．The selected files are shown in another window.
Click the "Analysis" button.
10．Select "RNA-seq" and click the "Analysis" button beside the pipeline name (Trinity, Bowtie, eXpress and DEGseq (PE)).
11．Scroll down to see the list of input files.
Click "Set option and run".
12．Enter the sample names in order, starting from "input1 sample name".
13．Set the p-value for pairwise comparison and the number of clusters for k-means clustering.
Check the other options, then push the "Run" button at the bottom of the page.
1．To check the results or progress, open the project page and click "Show Module Flow".
Finished and ongoing modules will appear.
2．Check the results of the assembly.
Click the fifth icon in the figure to open the download page.
3．There are three files, as follows.
・Trinity.fasta ・・・ assembled contigs
・Trinity.fasta.fai ・・・ index of the contig file (necessary for visualization with IGV etc.)
・summary ・・・ stats of the assembly and mapping ratio
Click "Https" next to the summary to open the data.
4．The number of assembled contigs, the N50 (bp) (a length statistic that plays a role similar to an average length), and the maximum contig length (bp) are shown.
The mapping ratio of reads to the contigs using Bowtie is shown below them as input1, input2...
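For readers unfamiliar with N50, the sketch below uses the common definition: the length of the contig at which the running total, counted from the longest contig downwards, first reaches half of the total assembly length. Whether the Maser summary uses exactly this convention is an assumption, and the function name `n50` is only illustrative.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (common definition)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where half of the total length is covered.
        if running * 2 >= total:
            return length
    return 0  # empty input


contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy values, not real data
print(n50(contig_lengths))
```

Unlike a plain average, this value is dominated by the long contigs, so many short fragments lower it only slightly.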
5．Check the histogram of expression levels and the results of the pairwise comparison. Click the third output and open the HTML.
The X-axis is the expression level and the Y-axis is the frequency from kernel density estimation.
The MA plot of the pairwise comparison based on the MATR method of DEGseq is shown below it.
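In an MA plot, each contig is placed by its log ratio (M) and its average log intensity (A). The sketch below shows the usual textbook definitions for a pair of counts; which sample is taken as numerator in the Maser output, and how zero counts are handled there, are not stated on this page, so both are assumptions here.

```python
import math


def ma_values(count_a, count_b):
    """Return (M, A) for two positive counts of the same contig.

    M = log2(count_a) - log2(count_b)      (log fold change)
    A = (log2(count_a) + log2(count_b)) / 2  (average log expression)
    """
    if count_a <= 0 or count_b <= 0:
        raise ValueError("counts must be positive to take logarithms")
    log_a = math.log2(count_a)
    log_b = math.log2(count_b)
    return log_a - log_b, (log_a + log_b) / 2


m, a = ma_values(8, 2)  # toy counts
print(m, a)
```

Contigs far from M = 0 at a given A are the candidates for differential expression.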
6．Check the significantly upregulated or downregulated contigs.
Open the first output and download the data. The first file is the "expression table", a tab-delimited text file that can be opened in Excel or other applications. The leftmost column contains the contig name.
Contig names look like "comp1000_c0_seq1". "comp1000_c0" is the contig, and the last part (seq1, seq2) denotes the splice variant. For details about contigs, see the Trinity manual.
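When processing the expression table with a script, the contig and the splice variant can be separated as follows. This is a small sketch whose pattern is inferred only from the example name "comp1000_c0_seq1"; names in another format would need a different pattern, and `split_contig_name` is a hypothetical helper.

```python
import re

# Pattern assumed from the example "comp1000_c0_seq1".
CONTIG_NAME = re.compile(r"(comp\d+_c\d+)_(seq\d+)")


def split_contig_name(name):
    """Split a name like 'comp1000_c0_seq1' into (contig, variant)."""
    match = CONTIG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected contig name: {name!r}")
    return match.group(1), match.group(2)


contig, variant = split_contig_name("comp1000_c0_seq1")
print(contig, variant)
```

Grouping rows by the first element collects all splice variants of one contig.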
Next, the contig length is shown.
Then come the expression levels of each sample calculated by eXpress. eXpress calculates FPKM taking contig length and read bias into account. For details, see the eXpress manual.
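As a reminder of what FPKM (fragments per kilobase of transcript per million mapped fragments) expresses, the sketch below implements the simple textbook formula. It is not the eXpress model: the bias handling mentioned above is not reproduced here, and the parameter names are illustrative.

```python
def fpkm(fragments, contig_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase per million mapped fragments."""
    if contig_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (contig_length_bp * total_mapped_fragments)


# Toy numbers: 500 fragments on a 2,000 bp contig, 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))
```

Dividing by both contig length and library size is what makes values comparable between contigs and between samples.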
Scrolling to the right, you find the contig sequences. If a contig is over 20,000 bp, Excel inserts a line break.
Next are the results of the pairwise comparison based on DEGseq.
Further to the right is the cluster number assigned by the clustering analysis.
Then comes the annotation based on blastx against UniProt, with GO terms.
Further to the right are the results of blastn against NCBI NR. "superkingdom" is the top-level category of the taxonomy.
7．Group the expression patterns and pick out the genes that follow a given pattern.
Open the second output.
In the example, the cluster in which sample 3 is upregulated is the fourth cluster.
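The idea behind grouping contigs by expression pattern can be illustrated with a plain k-means loop. This is only a sketch of the general technique: how Maser normalizes the expression values and initializes the clusters is not described on this page, so the use of the first k points as starting centroids and the toy data are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=100):
    """Cluster expression vectors; returns (labels, centroids)."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic start from the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy expression levels for three samples (one row per contig).
expression = [(1, 1, 10), (10, 1, 1), (1, 2, 9), (9, 1, 2)]
labels, centroids = kmeans(expression, k=2)
print(labels)
```

In the toy data, the rows that are high in the third sample end up in one cluster, which mirrors how a cluster of contigs upregulated in sample 3 is found.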
8．To understand the trend of a cluster, you can visualize the characteristic GO terms for each cluster.
Open the fourth output.
Click "cluster: 4".
The list shows the GO terms that are upregulated or downregulated based on the FET (Fisher's exact test).
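The kind of test behind such a list can be sketched as a one-sided Fisher's exact test for over-representation, computed from the hypergeometric distribution. The background set and the sidedness used by Maser are not stated here, so this is an illustration under those assumptions, with hypothetical parameter names and toy numbers.

```python
from math import comb


def enrichment_p_value(cluster_with_go, cluster_size, total_with_go, total_contigs):
    """One-sided p-value that a GO term is over-represented in a cluster.

    Probability of seeing at least `cluster_with_go` annotated contigs when
    `cluster_size` contigs are drawn at random from `total_contigs`, of which
    `total_with_go` carry the GO term.
    """
    upper = min(cluster_size, total_with_go)
    denominator = comb(total_contigs, cluster_size)
    tail = sum(
        comb(total_with_go, i) * comb(total_contigs - total_with_go, cluster_size - i)
        for i in range(cluster_with_go, upper + 1)
    )
    return tail / denominator


# Toy example: 10 contigs, 5 carry the term, and all 5 fall in a cluster of 5.
print(enrichment_p_value(5, 5, 5, 10))
```

A small p-value means the term occurs in the cluster more often than expected by chance.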
Go back one section and click "p-value < 0.001 Revigo".
The REVIGO website opens with the GO list of cluster 4 with p-values less than 0.001 already entered. Click “Start Revigo” at the bottom. Sometimes no over-represented GO terms are found because the cluster contains only a small number of contigs. If so, go back one step and click the “full Revigo” button.
In the REVIGO results, "defense response" and "immune system process" appear, suggesting that a protective immune response has occurred.
Example1: three sets of 50 bp, 80 million paired-end reads (ERX011194, ERX11195, ERX11197) were used as input, and the assembly and annotation were examined.
|Trinity||(Original site)||(NGS Surfer's wiki)|
|Bowtie||(Original site)||(NGS Surfer's wiki)|
|SAMtools||(Original site)||(NGS Surfer's wiki)|
|DEGseq||(Original site)||(NGS Surfer's wiki)|