Platform for Drug Discovery


Annotation after assembling (PE)


Introduction


    This pipeline predicts gene structures with the AUGUSTUS program, annotates the predicted genes by BLAST searches against the NT database, and estimates the population size history.


    Input format: FASTQ, FASTA, RepeatMasker library
    Library layout: Paired-end
    Execution time: A few weeks (approximate)

Inputs


    Genome assembly (Contig or Scaffold sequences) (FASTA format): 1



    Raw NGS reads (FASTQ format, Paired-end): 1

    • If you have multiple FASTQ files, merge them into a single file.

    (Optional) Repeat sequence database (RepeatMaskerLib.embl file): 1

    • The RepeatMaskerLib.embl file can be obtained from Repbase.
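
The merge step requested for the FASTQ input can be sketched as follows. This is a minimal illustration, not part of the pipeline itself: the function name `merge_fastq` and the file names in the usage note are hypothetical, and the sketch assumes uncompressed FASTQ files.

```python
def merge_fastq(inputs, output):
    """Concatenate uncompressed FASTQ files, in the given order, into one file.

    Assumption: inputs are plain-text FASTQ (not gzip-compressed).
    A newline is appended after any input that lacks a trailing one,
    so the last record of one file never fuses with the next header.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty input needs no extra newline
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large read files are not loaded at once.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")
```

For example, `merge_fastq(["lane1.fastq", "lane2.fastq"], "merged.fastq")`. If forward and reverse reads are kept in separate files, merge each set with the inputs listed in the same order so that read pairs stay in sync.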

Outputs


    Workflow


    Output files


    1. Masking of repetitive sequences by RepeatMasker
    2. Gene prediction by AUGUSTUS
    Gene predictions from AUGUSTUS (GFF format)
    3. Gene annotation against the UniProt and NR knowledgebases
    A list of contigs with annotation information (TSV format)
    4. Dotplot
    5. A list of SNVs and short indels
    6. Population size estimation
    7. A plot of coverage vs. assembly contig length
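
As one way to inspect the gene prediction output listed above, the sketch below counts predicted genes per contig from a GFF file. It assumes the standard nine tab-separated GFF columns, with the sequence name in the first column and the feature type (`gene`) in the third. The function name `count_genes_per_contig` and the file name in the usage note are hypothetical.

```python
from collections import Counter


def count_genes_per_contig(lines):
    """Count 'gene' features per sequence name in GFF-formatted lines.

    Assumption: nine tab-separated columns per record, with '#' marking
    comment lines. Lines that do not fit this layout are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        if fields[2] == "gene":
            counts[fields[0]] += 1
    return counts
```

A typical call would be `with open("augustus.gff") as fh: counts = count_genes_per_contig(fh)`, after which `counts.most_common(10)` lists the contigs with the most predicted genes.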

