Metagenomic analysis pipeline
1.md5sum+trimmomatic
md5sum SRR1976948_1.fastq.gz SRR1976948_2.fastq.gz
java -jar /data/XXXXX/software/software/Trimmomatic-0.36/trimmomatic-0.36.jar PE \
-phred33 SRR1976948_1.fastq.gz SRR1976948_2.fastq.gz \
/data/XXXXX/test/MGS/01trim/SRR1976948_1_paired.fq.gz \
/data/XXXXX/test/MGS/01trim/SRR1976948_1_unpaired.fq.gz \
/data/XXXXX/test/MGS/01trim/SRR1976948_2_paired.fq.gz \
/data/XXXXX/test/MGS/01trim/SRR1976948_2_unpaired.fq.gz \
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Remove adapters (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10)
Remove leading low quality or N bases (below quality 3) (LEADING:3)
Remove trailing low quality or N bases (below quality 3) (TRAILING:3)
Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15 (SLIDINGWINDOW:4:15)
Drop reads shorter than 36 bases (MINLEN:36)
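To make the SLIDINGWINDOW:4:15 and MINLEN:36 steps above concrete, here is a simplified Python sketch of the idea. It is not Trimmomatic's actual implementation, and the function names (phred33_scores, sliding_window_cut, passes_minlen) are hypothetical. Quality characters are decoded with the Phred+33 offset that the -phred33 flag refers to.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string using the Phred+33 offset."""
    return [ord(char) - 33 for char in quality_string]


def sliding_window_cut(quals, window=4, threshold=15):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window
    whose mean quality falls below the threshold (simplified model).
    """
    for start in range(len(quals) - window + 1):
        current = quals[start:start + window]
        if sum(current) / window < threshold:
            return start
    # No window failed (or the read is shorter than one window): keep everything
    return len(quals)


def passes_minlen(kept_length, minlen=36):
    """A trimmed read survives only if it is at least minlen bases long."""
    return kept_length >= minlen
```

With ten bases of quality 30 followed by four bases of quality 2, the first failing window starts at position 9, so nine bases are kept, and that read would then be dropped by the 36-base length filter.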
2.fastqc
fastqc SRR1976948_1_paired.fq.gz
fastqc SRR1976948_2_paired.fq.gz
3.SPAdes assembly
python /data/XXXXX/software/software/SPAdes-3.11.1-Linux/bin/spades.py --meta -1 SRR1976948_1_paired.fq.gz -2 SRR1976948_2_paired.fq.gz -o /data/XXXXX/test/MGS/02spades/
# Contigs: the non-redundant set of sequences built from overlapping reads. Scaffolds: the set of sequences obtained by joining contigs with the help of paired-end reads.
4.QUAST assembly evaluation
/data/XXXXX/software/software/quast-5.0.2/quast.py contigs.fasta -o /data/XXXXX/test/MGS/03quast
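One of the statistics QUAST reports is N50: the contig length at which the contigs of that length or longer account for at least half of the total assembly length. The following Python sketch shows how the number is derived. The helper names contig_lengths and n50 are illustrative assumptions, and QUAST itself applies further options (such as a minimum contig length) that are not modeled here.

```python
def contig_lengths(fasta_lines):
    """Collect sequence lengths from the lines of a FASTA file."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0
```

For example, passing an open handle on contigs.fasta to contig_lengths and the result to n50 gives a value that can be compared with the QUAST report.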
5.Prokka gene annotation
prokka contigs.fasta --outdir /data/XXXXX/test/MGS/04prokka/ --prefix metagG --metagenome --kingdom Bacteria
6.sourmash dataset comparison
# Compare assembly results: compute the k=51 signature of the filtered sequences
source activate sourmash_env
sourmash compute -k51 --scaled 10000 contigs.fasta -o /data/XXXXX/test/MGS/05sourmash/reads.scaled10k.k51.sig
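# Estimate how many of the reads are contained in the assembly
# (the query and the subject should be different signatures, reads versus assembly; searching a signature against itself always reports 100% containment)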
source activate sourmash_env
sourmash search reads.scaled10k.k51.sig reads.scaled10k.k51.sig --containment
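The --containment option reports what fraction of the query's k-mers also occur in the subject. The measure is asymmetric, and a signature compared with itself always scores 1.0. Below is a toy Python sketch of the calculation using exact k-mer sets. sourmash works on a subsampled set of k-mer hashes instead (controlled by --scaled), and the function names here are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all k-length substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def containment(query, subject):
    """Fraction of the query's k-mers that are also present in the subject."""
    if not query:
        return 0.0
    return len(query & subject) / len(query)


# Toy data: short strings and k=4 instead of real reads and k=51
reads = kmers("ACGTACGTTT", 4)
assembly = kmers("ACGTACG", 4)
reads_in_assembly = containment(reads, assembly)
assembly_in_reads = containment(assembly, reads)
```

Here the reads contribute six distinct 4-mers, four of which occur in the assembly, so reads_in_assembly is 4/6 while assembly_in_reads is 1.0.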
# Compare different samples and plot the result
sourmash compare *sig -o Hu_metagenomes
sourmash plot --labels Hu_metagenomes
from IPython.display import Image
Image("Hu_metagenomes.matrix.png")
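The matrix that sourmash compare writes holds one pairwise similarity per pair of signatures, by default an estimate of the Jaccard index. A small Python sketch of how such a matrix is built from plain sets follows. The sample names and set contents are made up for illustration, and the helper names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(samples):
    """Return sorted sample names and their all-against-all Jaccard matrix."""
    names = sorted(samples)
    matrix = [[jaccard(samples[x], samples[y]) for y in names] for x in names]
    return names, matrix


# Made-up hash sets standing in for three signatures
example = {"s1": {1, 2, 3}, "s2": {2, 3, 4}, "s3": {9}}
names, matrix = similarity_matrix(example)
```

The matrix is symmetric with 1.0 on the diagonal. That structure is what the plotted heat map visualizes.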
7.Gene abundance estimation with Salmon
8.Metagenome binning with MaxBin
9.ORF prediction with Prodigal
10.Species abundance analysis with MetaPhlAn
