Platform for Drug Discovery



MetaWRAP binning


Introduction


    MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis



    Inputs


    • multi fastq (paired-end)
      Forward read files must be named "XXX_1.fastq", "XXX_R1.fastq" or "XXX_R1_XXX.fastq"; reverse read files must be named "XXX_2.fastq", "XXX_R2.fastq" or "XXX_R2_XXX.fastq".

    • fasta (nucleotide)
      The file name must end with ".fa" or ".fasta".

    • FASTQ Format Specification
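
    The naming rules above can be checked before upload. The following is a minimal sketch, not part of MetaWRAP or this platform: it assumes "XXX" stands for any non-empty text and that matching is case-sensitive, and the function names (classify_fastq, pair_reads, is_fasta_name) are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Pattern inferred from the naming rules above; "XXX" is read as "any non-empty text".
# Accepted read tags: _1 / _2, _R1 / _R2, and _R1_XXX / _R2_XXX, always ending in ".fastq".
READ_NAME = re.compile(
    r"^(?P<prefix>.+?)_(?:(?P<plain>[12])|R(?P<tagged>[12])(?:_(?P<suffix>.+))?)\.fastq$"
)


def classify_fastq(name: str) -> Optional[Tuple[str, str]]:
    """Return (sample_key, 'forward' | 'reverse'), or None if the name breaks the rules."""
    match = READ_NAME.match(name)
    if match is None:
        return None
    read_number = match.group("plain") or match.group("tagged")
    suffix = match.group("suffix")
    # The sample key is the file name with the read tag removed (assumption used for pairing).
    key = match.group("prefix") + (f"_{suffix}" if suffix else "")
    direction = "forward" if read_number == "1" else "reverse"
    return (key, direction)


def pair_reads(names: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group valid FASTQ names by sample key and keep only complete pairs."""
    groups: Dict[str, Dict[str, str]] = {}
    for name in names:
        result = classify_fastq(name)
        if result is not None:
            key, direction = result
            groups.setdefault(key, {})[direction] = name
    # A complete pair has both a forward and a reverse entry.
    return {key: files for key, files in groups.items() if len(files) == 2}


def is_fasta_name(name: str) -> bool:
    """Check the nucleotide FASTA rule: the name must end with '.fa' or '.fasta'."""
    return name.endswith((".fa", ".fasta"))


if __name__ == "__main__":
    files = ["sampleA_R1_001.fastq", "sampleA_R2_001.fastq", "sampleB_1.fastq"]
    print(pair_reads(files))
```

    Files that return None from classify_fastq, or that have no mate in pair_reads, would need renaming before they satisfy the rules stated above.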

    Outputs


    • FASTA files for each bin [fasta (nucleotide)]

    • Information [downloadable]

      • Taxonomic assignment and quality estimates (completeness, contamination, strain heterogeneity) for each bin, as reported by CheckM.
        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          Bin Id            Marker lineage            # genomes   # markers   # marker sets    0     1    2    3   4   5+   Completeness   Contamination   Strain heterogeneity
        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          bin.12   c__Gammaproteobacteria (UID4444)      263         505           231         2    500   3    0   0   0       99.71            0.70               0.00
          bin.2     f__Rhodobacteraceae (UID3340)         84         568           330         11   555   2    0   0   0       97.73            0.20               0.00
          bin.4      o__Actinomycetales (UID1593)         69         400           198         21   377   2    0   0   0       94.22            0.61              50.00
          bin.7     f__Rhodobacteraceae (UID3356)         67         615           329         42   552   21   0   0   0       93.25            3.57              76.19
          bin.9         s__algicola (UID2846)             47         571           303         64   489   18   0   0   0       87.34            2.98              61.11
          bin.5    c__Gammaproteobacteria (UID4443)      356         451           270         82   358   11   0   0   0       87.04            1.96              81.82
          bin.1         s__algicola (UID2847)             33         496           263        102   366   26   2   0   0       75.80            4.00              46.88
          bin.8      p__Proteobacteria (UID3880)         1495        242           151         72   165   5    0   0   0       70.56            2.98              40.00
          bin.10   c__Gammaproteobacteria (UID4443)      356         451           270        131   314   6    0   0   0       67.96            0.92              33.33
        			
      • Summary statistics and lineage for each bin, as reported by MetaWRAP.
        bin      completeness   contamination   GC      lineage               N50      size      binner
        bin.12   99.71          0.697           0.417   Gammaproteobacteria   366336   4425968   binsB
        bin.2    97.72          0.202           0.511   Rhodobacteraceae      39870    3058105   binsAB
        bin.4    94.21          0.606           0.576   Actinomycetales       101071   1445600   binsA
        bin.7    93.24          3.568           0.551   Rhodobacteraceae      7028     2513378   binsAB
        bin.9    87.33          2.978           0.354   algicola              7689     1744558   binsAB
        bin.5    87.03          1.957           0.510   Gammaproteobacteria   7458     1998585   binsAB
        bin.1    75.79          3.997           0.371   algicola              2762     1678591   binsB
        bin.8    70.56          2.980           0.368   Proteobacteria        7356     1315389   binsB
        bin.10   67.96          0.919           0.433   Gammaproteobacteria   3532     1056065   binsAB
        bin.13   60.63          2.610           0.443   Gammaproteobacteria   1721     858561    binsB
        			
      • TPM vs contig length graph for each bin.
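
    The MetaWRAP summary table above can be post-processed to pick out bins of interest. The sketch below is illustrative only: it assumes the downloaded table is whitespace-separated with the eight columns shown and a single-word lineage, and the names BinStats, parse_bin_stats and select_bins, as well as the 90% completeness and 5% contamination cut-offs, are hypothetical choices rather than platform defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BinStats:
    name: str
    completeness: float   # percent
    contamination: float  # percent
    gc: float             # GC fraction
    lineage: str
    n50: int
    size: int             # total bin size in bases
    binner: str


def parse_bin_stats(text: str) -> List[BinStats]:
    """Parse a whitespace-separated table laid out like the MetaWRAP summary above."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    bins: List[BinStats] = []
    for fields in rows[1:]:  # rows[0] is the header line
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {fields}")
        name, comp, cont, gc, lineage, n50, size, binner = fields
        bins.append(
            BinStats(name, float(comp), float(cont), float(gc),
                     lineage, int(n50), int(size), binner)
        )
    return bins


def select_bins(bins: List[BinStats],
                min_completeness: float = 90.0,
                max_contamination: float = 5.0) -> List[BinStats]:
    """Keep bins that meet both thresholds (the threshold values are illustrative)."""
    return [
        b for b in bins
        if b.completeness >= min_completeness and b.contamination <= max_contamination
    ]


# A few rows copied from the example output above.
EXAMPLE = """\
bin     completeness    contamination   GC      lineage N50     size    binner
bin.12  99.71   0.697   0.417   Gammaproteobacteria     366336  4425968 binsB
bin.2   97.72   0.202   0.511   Rhodobacteraceae        39870   3058105 binsAB
bin.1   75.79   3.997   0.371   algicola        2762    1678591 binsB
bin.13  60.63   2.610   0.443   Gammaproteobacteria     1721    858561  binsB
"""

if __name__ == "__main__":
    for b in select_bins(parse_bin_stats(EXAMPLE)):
        print(b.name, b.lineage, b.completeness, b.contamination)
```

    Parsing into typed fields rather than raw strings keeps the numeric comparisons explicit; a lineage containing spaces would fail the column-count check instead of being silently misread.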


    Options


    Comments


    Use case


    Related information