Platform for Drug Discovery


Trinity, Bowtie, eXpress and DEGseq (PE)


Introduction


    This pipeline is useful for researchers working with species that lack a reference genome. The whole analysis takes a few days for 240 million 50 bp paired-end reads.


    De novo transcriptome assembly is performed with Trinity. Reads are mapped to the contigs with Bowtie, and expression levels (FPKM) are calculated with eXpress. The pipeline then performs pairwise comparisons with DEGseq, extracts differentially expressed genes, and performs k-means clustering. Each contig is annotated using blastx against UniProt and blastn against NR, and significantly over-represented GO terms are extracted and visualized with REVIGO.


    Input format: FASTQ
    Library layout: Paired-end
    Species: Unspecified
    Execution time: A few days (20M paired-end reads [50 bp + 50 bp])


Inputs


    Raw NGS reads (FASTQ format, paired-end): 1 to 10 inputs


Outputs


    Workflow

    Image

    Example 1

    1. Expression level table
    Image
    List of expression levels and Blastx-based annotations (TSV file, readable in Excel)
    2. Clustering
    Image
    Clustering results (HTML report)
    3. DEGseq statistics (MA plot, etc.)
    Image
    Pairwise comparison of assembled contigs
    4. Characteristic GO terms for each cluster
    Image
    GO analysis using REVIGO
    5. Assembly results
    Image
    FASTA file
    6. Mapping results
    Image
    BAM file (viewed with samtools view)

How to run this pipeline


    1. Log in to Maser and open the project list.
    Image
    Click "Create New Project" to create a new project.

    2. Name the project (for example, "human RNA-seq") and click "OK".
    Image

    3. Click the project to open its page.
    Image

    4. Once the project is created, upload the input files (FASTQ). Click "Upload Data Here".
    Image

    5. Select the file transfer protocol. Usually, select "HTTPS".
    Image

    6. Enter the data label (here SRX082565M1.2), select the data type (here fastq (paired-end)), and choose the file to upload (here C:\SRX082565M1.2_1.fastq).
    Image
    Because paired-end FASTQ data consist of a forward file and a reverse file, click the "Add file" button to add another item.

    7. Click the browse ("reference") button and select the other FASTQ file.
    Image
    Uploaded files must follow these rules:
    ・Japanese characters and characters such as spaces " ", parentheses "(", and quotation marks "'" cannot be used. Use only letters, numbers, and underscores "_" in file names.
    ・For paired-end FASTQ files, the forward file should be named "xxx_1.fastq", "xxx_1.fq", "xxx_1.txt", or "xxx_R1_yyy.fastq", and the reverse file "xxx_2.fastq"...
    (ex: my_sample_1.fastq, my_sample_2.fastq)
    See details here.
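    The file-naming rules above can be checked locally before uploading. The Python sketch below is an illustration, not a Maser tool: the helper names is_valid_name and mate_of are hypothetical, it assumes the character rule applies to the base name (so the extension dot is allowed), and it recognizes only the suffix patterns listed above.

```python
import re

# Letters, digits and underscores, with dot-separated extensions allowed
# (assumption: the character rule applies to the base name, not the extension dot).
_ALLOWED = re.compile(r"[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)*")

# Forward/reverse patterns from the list above: xxx_1.fastq / .fq / .txt and xxx_R1_yyy.fastq.
_PAIR_PATTERNS = [
    re.compile(r".+_(?P<mate>[12])\.(?:fastq|fq|txt)"),
    re.compile(r".+_R(?P<mate>[12])_.+\.fastq"),
]


def is_valid_name(filename: str) -> bool:
    """Hypothetical helper: True if the name uses only permitted characters."""
    return _ALLOWED.fullmatch(filename) is not None


def mate_of(filename: str):
    """Hypothetical helper: expected name of the partner file, or None if no pattern matches."""
    for pattern in _PAIR_PATTERNS:
        m = pattern.fullmatch(filename)
        if m:
            other = "2" if m.group("mate") == "1" else "1"
            start, end = m.span("mate")
            return filename[:start] + other + filename[end:]  # swap only the mate digit
    return None


print(is_valid_name("my_sample_1.fastq"))  # True
print(is_valid_name("my sample_1.fastq"))  # False (contains a space)
print(mate_of("my_sample_1.fastq"))        # my_sample_2.fastq
```

    Returning the expected partner name makes it easy to confirm that both files of a pair are present before starting the upload.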
    In paired-end data, the read names in the forward and reverse files must be identical up to the first space. Some files append "/1" to forward read names and "/2" to reverse read names; eXpress does not recognize these suffixes, so expression levels may not be comparable.
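    A minimal Python sketch of the read-name rule above, assuming standard four-line FASTQ records: read_id returns the part of a header before the first space, first_mismatch compares forward and reverse headers record by record, and fix_header drops a trailing "/1" or "/2". All three are hypothetical helpers, not part of Maser or eXpress.

```python
from itertools import islice


def read_id(header: str) -> str:
    """Part of a FASTQ header line before the first space, without the leading '@'."""
    header = header.rstrip("\n")
    if header.startswith("@"):
        header = header[1:]
    return header.split(" ", 1)[0]


def first_mismatch(fwd_lines, rev_lines):
    """Index of the first record whose forward/reverse read IDs differ, or None.

    Assumes plain four-line FASTQ records, so every 4th line is a header.
    """
    fwd_headers = islice(fwd_lines, 0, None, 4)
    rev_headers = islice(rev_lines, 0, None, 4)
    for index, (fwd, rev) in enumerate(zip(fwd_headers, rev_headers)):
        if read_id(fwd) != read_id(rev):
            return index
    return None


def fix_header(header: str) -> str:
    """Drop a trailing /1 or /2 from the read ID, keeping any text after the first space.

    The returned header has no trailing newline.
    """
    body = header.rstrip("\n")
    if body.startswith("@"):
        body = body[1:]
    read, sep, comment = body.partition(" ")
    if read.endswith(("/1", "/2")):
        read = read[:-2]
    return "@" + read + sep + comment


fwd = ["@SRR1.1/1\n", "ACGT\n", "+\n", "IIII\n"]
rev = ["@SRR1.1/2\n", "TGCA\n", "+\n", "IIII\n"]
print(first_mismatch(fwd, rev))  # 0: IDs differ only by the /1 and /2 suffixes
print(fix_header(fwd[0]))        # @SRR1.1
```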

    8. If you have other samples whose expression levels you want to compare, upload their FASTQ files as well.
    Here, three datasets are uploaded.
    Click the "Select" button for each of them.
    The results are listed in the same order in which you clicked.

    Image

    9. The selected files are shown in another window.
    Image
    Click the "Analysis" button.

    10. Select "RNA-seq" and click the "Analysis" button next to the pipeline name (Trinity, Bowtie, eXpress and DEGseq (PE)).
    Image

    11. Scroll down to see the list of input files.
    Image
    Click "Set option and run".


    12. Enter the sample names in order, starting from "input1 sample name".
    Image

    13. Set the p-value threshold for pairwise comparison and the number of clusters for k-means clustering.
    Image

    Check the other options, then click the "Run" button at the bottom of the page.

Result explanation


    1. To check results or progress, open the project page and click "Show Module Flow".
    Image
    Finished and ongoing modules are displayed.


    2. Check the assembly results.
    Image
    Click the fifth icon in the figure to open the download page.


    3. There are three files:
    Image
    ・Trinity.fasta ・・・ assembled contigs
    ・Trinity.fasta.fai ・・・ index of the contig file (required for visualization in IGV, etc.)
    ・summary ・・・ assembly statistics and mapping ratios
    Click "Https" next to summary to open it.
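    The .fai file follows the samtools faidx index format, in which each tab-separated line holds the contig name, its length in bp, the byte offset of its sequence, bases per line, and bytes per line. A minimal Python sketch for reading contig lengths from it; read_fai is a hypothetical helper and the index lines in the example are made up.

```python
import io


def read_fai(handle):
    """Hypothetical helper: {contig name: length in bp} from a samtools-style .fai index.

    Columns: name, length, byte offset, bases per line, bytes per line (tab-separated).
    """
    lengths = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        lengths[fields[0]] = int(fields[1])
    return lengths


# Made-up index lines for illustration; use open("Trinity.fasta.fai") in practice.
example = io.StringIO(
    "comp0_c0_seq1\t1520\t15\t60\t61\n"
    "comp1_c0_seq1\t348\t1577\t60\t61\n"
)
print(read_fai(example))  # {'comp0_c0_seq1': 1520, 'comp1_c0_seq1': 348}
```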

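    4. The number of assembled contigs, the N50 (bp), and the maximum contig length (bp) are shown. N50 is a length statistic used much like an average: it is the contig length at which contigs of that length or longer together contain at least half of all assembled bases.
    Image
    The ratio of reads mapped to the contigs by Bowtie is shown below for each input (input1, input2, ...).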
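    The N50 value in the summary can be reproduced from the contig lengths (for example, those listed in Trinity.fasta.fai). A minimal Python sketch of the standard definition; the pipeline's own calculation may differ in details.

```python
def n50(lengths):
    """N50: length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


print(n50([100, 200, 300, 400, 500]))  # 400 (500 + 400 = 900 >= 1500 / 2)
```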

    5. Check the histogram of expression levels and the results of pairwise comparison. Click the third output to open the HTML report.
    Image
    The x-axis is the expression level and the y-axis is the density obtained by kernel density estimation.

    MA plots of the pairwise comparisons, based on the MATR method of DEGseq, are shown below.
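    Each point in an MA plot is one contig: M is the log2 ratio of its expression between two samples, and A is the mean of the two log2 expression values. A minimal Python sketch computing both; the pseudocount that avoids log2(0) is an assumption, and DEGseq's own normalization and sign convention may differ.

```python
import math


def ma_values(expr1: float, expr2: float, pseudocount: float = 1.0):
    """(M, A) for one contig from its expression in sample 1 and sample 2.

    M = log2(x1) - log2(x2) and A = (log2(x1) + log2(x2)) / 2, computed after adding
    a pseudocount (an assumption here, to keep zero values finite).
    """
    l1 = math.log2(expr1 + pseudocount)
    l2 = math.log2(expr2 + pseudocount)
    return l1 - l2, (l1 + l2) / 2


print(ma_values(7.0, 1.0))  # (2.0, 2.0): 8 vs 2 after the pseudocount
```

    Contigs far from M = 0 are the candidates for differential expression; A shows whether that difference occurs at low or high overall expression.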

    6. Check significantly up-regulated or down-regulated contigs.
      Open the first output and download the data. The first file is the "expression table", a tab-delimited text file that can be opened in Excel or other applications. The leftmost column contains the contig names.
      Contig names look like "comp1000_c0_seq1": "comp1000_c0" identifies the contig, and the last part (seq1, seq2, ...) identifies the splice variant. For details about contigs, see the Trinity manual.
      The next column shows the contig length.
      The following columns show the expression level of each sample calculated by eXpress. eXpress calculates FPKM taking into account contig length and read bias. For details, see the eXpress manual.
    Image
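    A minimal Python sketch of two points above: splitting a contig name of the form "comp1000_c0_seq1" into its contig and splice-variant parts, and the basic FPKM formula (fragments per kilobase of contig per million mapped fragments). The function names are hypothetical. eXpress additionally corrects for read bias and uses an effective length, so this plain formula will not reproduce its values exactly.

```python
import re
from collections import defaultdict

# Matches names such as "comp1000_c0_seq1" (only this format is handled).
_CONTIG_NAME = re.compile(r"(comp\d+_c\d+)_(seq\d+)")


def split_contig_name(name: str):
    """Hypothetical helper: ("comp1000_c0", "seq1") from "comp1000_c0_seq1"."""
    m = _CONTIG_NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected contig name: {name!r}")
    return m.group(1), m.group(2)


def variants_by_contig(names):
    """Group splice-variant IDs under their contig."""
    groups = defaultdict(list)
    for name in names:
        contig, variant = split_contig_name(name)
        groups[contig].append(variant)
    return dict(groups)


def fpkm(fragments: float, length_bp: float, total_fragments: float) -> float:
    """Plain FPKM: fragments per kilobase of contig per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


print(variants_by_contig(["comp1000_c0_seq1", "comp1000_c0_seq2", "comp7_c1_seq1"]))
# {'comp1000_c0': ['seq1', 'seq2'], 'comp7_c1': ['seq1']}
print(fpkm(500, 2000, 10_000_000))  # 25.0
```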
    Scrolling to the right, you will find the contig sequences. If a contig is longer than 20,000 bp, Excel inserts a line break.
    Further right are the results of pairwise comparison by DEGseq.
    Image
    Further right is the cluster number assigned by the clustering analysis.
    Next comes the annotation based on blastx against UniProt, including GO terms.
    Image
    Further right are the results of blastn against NCBI NR. "superkingdom" is the top-level taxonomic category.
    Image
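    Because the expression table is tab-delimited, it can also be filtered programmatically instead of in Excel. A minimal Python sketch, assuming hypothetical column names ("contig", "cluster", "pvalue") and a hypothetical file name; check the actual header row of your file and substitute the real names. The same pattern works for other columns such as "superkingdom".

```python
import csv
import io


def select_contigs(handle, cluster, max_p, cluster_col="cluster", pvalue_col="pvalue"):
    """Rows in `cluster` whose p-value is below `max_p`.

    The column names are hypothetical defaults; pass the real header names.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p = float(row[pvalue_col])
        except ValueError:
            continue  # e.g. "NA" for contigs without a test result
        if row[cluster_col] == str(cluster) and p < max_p:
            hits.append(row)
    return hits


# Made-up table for illustration; use open("expression_table.tsv", newline="") in practice.
table = io.StringIO(
    "contig\tcluster\tpvalue\n"
    "comp1_c0_seq1\t4\t0.0001\n"
    "comp2_c0_seq1\t4\t0.2\n"
    "comp3_c0_seq1\t1\t0.00001\n"
    "comp4_c0_seq1\t4\tNA\n"
)
print([row["contig"] for row in select_contigs(table, cluster=4, max_p=0.001)])
# ['comp1_c0_seq1']
```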

    7. Group the contigs by expression pattern and pick up the genes in each pattern.
    Open the second output.
    Image
    In this example, the cluster in which sample 3 is up-regulated is the fourth cluster.
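    The clustering step groups contigs whose expression profiles across samples are similar. The following is a minimal Python sketch of Lloyd's k-means algorithm, not the pipeline's implementation: it initializes centroids from the first k profiles (an assumption made for reproducibility), and the source does not state whether FPKM values are log-transformed or scaled before clustering, so the optional log_profile helper is hypothetical.

```python
import math


def log_profile(fpkms):
    """Optional, hypothetical preprocessing: log2(FPKM + 1) per sample."""
    return [math.log2(v + 1) for v in fpkms]


def _nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
    )


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up log-scale profiles for three samples: two contigs up in sample 3,
# two up in sample 1.
profiles = [[1, 1, 8], [8, 1, 1], [1, 2, 9], [9, 1, 2]]
labels, _ = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1]
```

    The number of clusters k corresponds to the option set in step 13 of the run instructions.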

    8. To understand the trend of each cluster, you can visualize its characteristic GO terms.
    Open the fourth output.
    Image
    Click "cluster: 4".
    Image
    The list shows up-regulated or down-regulated GO terms based on Fisher's exact test (FET).
    Go back one page and click "p-value < 0.001 Revigo".
    Image

    The REVIGO website opens with the list of cluster 4 GO terms with p-value < 0.001 already entered. Click "Start Revigo" at the bottom. If the cluster contains only a few contigs, there may be no over-represented GO terms. In that case, go back one step and click the "full Revigo" button.
    Image
    The REVIGO results include "defense response" and "immune system process", suggesting that a protective immune response may have occurred.
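    For GO over-representation, a one-sided Fisher's exact test reduces to the upper tail of the hypergeometric distribution. A minimal Python sketch, assuming N annotated contigs in total, K of them carrying the GO term, n contigs in the cluster, and k of those carrying the term; it applies no multiple-testing correction, and the exact test setup used by the pipeline may differ. The hypothetical revigo_input helper writes the terms below the 0.001 cutoff as "GO-ID value" lines, the form REVIGO accepts as pasted input (GO IDs, optionally followed by a value such as a p-value).

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail), P(X >= k).

    N: annotated contigs in total, K: those with the GO term,
    n: contigs in the cluster, k: cluster contigs with the GO term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def revigo_input(pvalues, cutoff=0.001):
    """Hypothetical helper: "GO-ID p-value" lines below the cutoff, most significant first."""
    kept = sorted((p, go) for go, p in pvalues.items() if p < cutoff)
    return "\n".join(f"{go} {p:.3g}" for p, go in kept)


print(enrichment_p(k=2, n=2, K=2, N=4))  # 1/6: only one of the 6 possible pairs
# Illustrative p-values only.
print(revigo_input({"GO:0006952": 1e-5, "GO:0002376": 2e-4, "GO:0008150": 0.5}))
```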

Use case


    Example 1: three sets of 50 bp paired-end reads, 80 million reads each (ERX011194, ERX11195, ERX11197), were used as input, and the assembly and annotation were examined.

Related information