Metagenomic Analysis Pipeline

1. md5sum + Trimmomatic
    # Print the MD5 checksums of the raw reads so they can be compared with the published values
    md5sum SRR1976948_1.fastq.gz SRR1976948_2.fastq.gz
    
    java -jar /data/XXXXX/software/software/Trimmomatic-0.36/trimmomatic-0.36.jar PE \
    -phred33 SRR1976948_1.fastq.gz SRR1976948_2.fastq.gz \
    /data/XXXXX/test/MGS/01trim/SRR1976948_1_paired.fq.gz \
    /data/XXXXX/test/MGS/01trim/SRR1976948_1_unpaired.fq.gz \
    /data/XXXXX/test/MGS/01trim/SRR1976948_2_paired.fq.gz \
    /data/XXXXX/test/MGS/01trim/SRR1976948_2_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)
Remove leading low quality or N bases (below quality 3) (LEADING:3)
Remove trailing low quality or N bases (below quality 3) (TRAILING:3)
Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
Drop reads shorter than 36 bases (MINLEN:36)
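
In paired-end mode Trimmomatic writes a read to the two "paired" outputs only when both mates survive trimming, so those two files should contain the same number of reads. The sketch below is one way to check that. It assumes standard 4-line FASTQ records, and `count_reads` and `check_pairs` are hypothetical helper names, not part of Trimmomatic.

```shell
# Count reads in a gzipped FASTQ file, assuming 4 lines per record
# (i.e. sequences are not wrapped over several lines).
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Report the read count of each file and succeed only if they match.
# (Hypothetical helper for illustration.)
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Example call, run from the 01trim output directory:
# check_pairs SRR1976948_1_paired.fq.gz SRR1976948_2_paired.fq.gz
```

A non-zero exit status from `check_pairs` would suggest that the trimming run was interrupted or that the files were mixed up.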

2. FastQC
    fastqc SRR1976948_1_paired.fq.gz
    fastqc SRR1976948_2_paired.fq.gz
3. Assembly with SPAdes
    python /data/XXXXX/software/software/SPAdes-3.11.1-Linux/bin/spades.py --meta -1 SRR1976948_1_paired.fq.gz -2 SRR1976948_2_paired.fq.gz -o /data/XXXXX/test/MGS/02spades/
    # Contigs: a non-redundant set of sequences built from overlapping reads.
    # Scaffolds: the sequences obtained by joining contigs using reads that have a paired-end relationship.
4. Assessing assembly quality with QUAST
    /data/XXXXX/software/software/quast-5.0.2/quast.py contigs.fasta -o /data/XXXXX/test/MGS/03quast
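
One of the headline numbers in a QUAST report is N50: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The following is a rough sketch of that calculation with `awk` and `sort`. The function name `n50` is hypothetical. QUAST applies its own contig length filter (by default it ignores contigs shorter than 500 bp, if memory of its defaults is right), so its reported value may differ from this unfiltered one.

```shell
# Print the N50 of a FASTA file (no length filtering).
n50() {
    # Step 1: emit one length per record, summing multi-line sequences.
    awk '/^>/ { if (len) print len; len = 0; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
         END { if (len) print len }' "$1" |
    # Step 2: sort lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the list until half of the total length is covered.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             half = total / 2
             run = 0
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { print lens[i]; exit }
             }
         }'
}

# Example call:
# n50 contigs.fasta
```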
5. Gene annotation with Prokka
    prokka contigs.fasta --outdir /data/XXXXX/test/MGS/04prokka/ --prefix metagG --metagenome --kingdom Bacteria
6. Comparing datasets with sourmash
    # Compare assembly results: compute the k=51 signature of the filtered sequences
    source activate sourmash_env
    sourmash compute -k51 --scaled 10000 contigs.fasta -o /data/XXXXX/test/MGS/05sourmash/reads.scaled10k.k51.sig 
    
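    # Estimate how much of the reads is contained in the assembly
    # NOTE: as written, the query and the subject are the same signature file, so the
    # containment is trivially 100%; a meaningful result needs one signature computed
    # from the reads and one computed from the assembly.
    source activate sourmash_env
    sourmash search reads.scaled10k.k51.sig reads.scaled10k.k51.sig --containment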
    
    # Compare different samples and plot the result
    sourmash compare *sig -o Hu_metagenomes
    sourmash plot --labels Hu_metagenomes
    # The next two lines are Python, to be run in a Jupyter/IPython session rather than in the shell
    from IPython.display import Image
    Image("Hu_metagenomes.matrix.png")
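
To make the idea behind the `--containment` search concrete: the containment of a query in a subject is the fraction of the query's distinct k-mers that also occur in the subject. The toy sketch below computes this exactly for two short sequences given as strings. It is only an illustration with hypothetical helper names (`kmers`, `containment`). sourmash itself works on hashed, subsampled k-mers and also accounts for reverse complements, both of which are ignored here.

```shell
# Print the distinct k-mers of a sequence string, one per line, sorted.
kmers() {
    awk -v s="$1" -v k="$2" 'BEGIN {
        for (i = 1; i + k - 1 <= length(s); i++) print substr(s, i, k)
    }' | LC_ALL=C sort -u
}

# containment QUERY SUBJECT K
# Fraction of the distinct k-mers of QUERY that also occur in SUBJECT.
containment() {
    q_file=$(mktemp)
    s_file=$(mktemp)
    kmers "$1" "$3" > "$q_file"
    kmers "$2" "$3" > "$s_file"
    shared=$(LC_ALL=C comm -12 "$q_file" "$s_file" | wc -l)
    total=$(wc -l < "$q_file")
    rm -f "$q_file" "$s_file"
    awk -v a="$shared" -v b="$total" 'BEGIN {
        if (b == 0) print "0.00"; else printf "%.2f\n", a / b
    }'
}

# Example calls:
# containment ACGTAC ACGTACGG 3
# containment ACGTACGG ACGTAC 3
```

Containment is asymmetric: a short sequence can be fully contained in a longer one while the reverse is not true, and any sequence is fully contained in itself.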
7. Gene abundance estimation with Salmon
8. Metagenome binning with MaxBin
9. ORF prediction with Prodigal
10. Species abundance analysis with MetaPhlAn
