How to Read QUAST Results: Assessing the Quality of Genome Assemblies Using QUAST

The assembly algorithms developed so far each aim to produce better assemblies, but "better" is judged under different criteria. Depending on the scenario, the assembly process may therefore give better results if we choose the most appropriate assembler. Even when existing assembly methods cannot reconstruct a genome as one contiguous sequence, they can still recover segments of the reference genome. This is why we need to evaluate the quality of assemblies: such evaluations help researchers choose the right assembler for each scenario.

How can we tell whether the assemblies that currently available assemblers produce from our reads are correct? In this article, we will see how to assess assembly quality using QUAST, one of the most widely used assessment tools for genome assemblies. Let's get started.

What is QUAST?

QUAST stands for QUality ASsessment Tool. QUAST can evaluate assemblies both with and without reference genomes. It produces detailed reports, tables and plots that show different aspects of the assemblies.

Download QUAST

You can go to the official website of QUAST and click on the DOWNLOAD button.

You will be redirected to a SourceForge download page, from which you can download the latest version of QUAST (quast-5.0.2 at the time of writing). The download contains pre-compiled binaries, so you can run QUAST straight away after extracting the archive.

    tar -xf quast-5.0.2.tar.gz
    cd quast-5.0.2
    ./quast.py
    

Running quast.py (or python quast.py) prints the following usage message:

    QUAST: Quality Assessment Tool for Genome Assemblies
    Version: 5.0.2

    Usage: python quast.py [options] <files_with_contigs>

    Options:
    -o  --output-dir  <dirname>       Directory to store all result files [default: quast_results/results_<datetime>]
    -r                <filename>      Reference genome file
    -g  --features [type:]<filename>  File with genomic feature coordinates in the reference (GFF, BED, NCBI or TXT)
                                      Optional 'type' can be specified for extracting only a specific feature type from GFF
    -m  --min-contig  <int>           Lower threshold for contig length [default: 500]
    -t  --threads     <int>           Maximum number of threads [default: 25% of CPUs]

    These are basic options. To see the full list, use --help

    Online QUAST manual is available at http://quast.sf.net/manual
    

Once you have confirmed that QUAST runs correctly, you can start assessing assemblies.

Obtaining an Example Assembly

We will use the example dataset provided with the Flye assembler. It consists of PacBio reads from an E. coli genome (Escherichia coli str. K-12 substr. MG1655, NCBI accession number CP009685).

You can download the read dataset with the following command.

    wget https://zenodo.org/record/1172816/files/E.coli_PacBio_40x.fasta
    

Let’s assemble this dataset using the Flye assembler.

    flye --pacbio-raw E.coli_PacBio_40x.fasta --out-dir my_assembly --threads 8
    

We now have an example assembly. The contigs of the final assembly are in the file assembly.fasta inside the output directory (my_assembly). Let's see how good the assembly is.

Using QUAST

You can run QUAST by passing it the contigs file of the final assembly and, with the -r option, the reference genome.

    quast.py my_assembly/assembly.fasta -r ref.fasta -o quastResult
    

Now you can view the final report from the report.html file in the output folder.
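Besides report.html, QUAST also writes plain-text versions of the report (such as report.tsv) to the output folder, which are easier to process in scripts. The Python sketch below assumes that report.tsv is tab-separated, with a first row holding "Assembly" plus one label per assembly and each later row holding a metric name followed by one value per assembly; check this layout against the output of your own QUAST version before relying on it. The function name parse_quast_tsv and the sample numbers are hypothetical.

```python
import csv
import io
from typing import Dict


def parse_quast_tsv(text: str) -> Dict[str, Dict[str, str]]:
    """Parse report.tsv-style text into {assembly_label: {metric: value}}.

    Assumed layout: the first row is "Assembly" followed by one label per
    assembly; every following row is a metric name followed by one value
    per assembly.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, *metric_rows = rows
    labels = header[1:]
    report: Dict[str, Dict[str, str]] = {label: {} for label in labels}
    for metric, *values in metric_rows:
        # Pair each value with the assembly label of its column.
        for label, value in zip(labels, values):
            report[label][metric] = value
    return report


if __name__ == "__main__":
    # Made-up numbers for illustration only, not real QUAST output.
    sample = (
        "Assembly\tlabel1\tlabel2\n"
        "# contigs\t3\t12\n"
        "N50\t4500000\t800000\n"
    )
    report = parse_quast_tsv(sample)
    print(report["label2"]["N50"])  # 800000
```

To read a real report, pass in the file contents, for example parse_quast_tsv(open("quastResult/report.tsv").read()). Values are kept as strings because a cell may hold a non-numeric placeholder when a metric is undefined.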

You can also compare multiple assemblies (here assembly1.fasta and assembly2.fasta) in a single run, as shown below, and give each assembly a label with -l.

    quast.py assembly1.fasta assembly2.fasta -l label1,label2 -r ref.fasta -o quastResult
    

QUAST report for two assemblies
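If you evaluate many assemblies, it can help to generate the QUAST command programmatically. The following Python sketch builds the same argument list as the multi-assembly command above, using only the -l, -r, -t and -o options that appear in this article. build_quast_command is a hypothetical helper, not part of QUAST, and its label checks (one label per assembly, no commas inside a label because labels are joined with commas) are our own assumptions about sensible input.

```python
import shlex
from typing import List, Optional, Sequence


def build_quast_command(
    assemblies: Sequence[str],
    labels: Optional[Sequence[str]] = None,
    reference: Optional[str] = None,
    output_dir: str = "quastResult",
    threads: Optional[int] = None,
) -> List[str]:
    """Return a quast.py argument list (hypothetical helper, does not run QUAST)."""
    if not assemblies:
        raise ValueError("at least one assembly is required")
    cmd = ["quast.py", *assemblies]
    if labels:
        # Labels are passed as one comma-separated value, one per assembly.
        if len(labels) != len(assemblies):
            raise ValueError("need exactly one label per assembly")
        if any("," in label for label in labels):
            raise ValueError("labels must not contain commas")
        cmd += ["-l", ",".join(labels)]
    if reference:
        cmd += ["-r", reference]  # leave out for a reference-free run
    if threads:
        cmd += ["-t", str(threads)]
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_quast_command(
        ["assembly1.fasta", "assembly2.fasta"],
        labels=["label1", "label2"],
        reference="ref.fasta",
    )
    print(shlex.join(cmd))
    # quast.py assembly1.fasta assembly2.fasta -l label1,label2 -r ref.fasta -o quastResult
```

To launch QUAST, you could pass the list to subprocess.run(cmd, check=True). Building a list instead of a shell string avoids quoting problems with unusual file names.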

The report includes the following common measures used to assess the quality of genome assemblies:

  • Genome fraction
  • Largest alignment
  • NGA50
  • LGA50
  • Number of misassemblies
  • Number of contigs

QUAST provides a short explanation of each of these measures: hover over a measure in the report and a popup message shows its explanation.
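To make NGA50, LGA50 and genome fraction more concrete, here is a simplified Python sketch of how they can be computed. It assumes you already have the aligned blocks of one assembly as [start, end) intervals on the reference. QUAST obtains these blocks by aligning the contigs and breaking them at misassembly breakpoints, which this sketch does not attempt. NGA50 is the length of the aligned block that brings the running total (longest blocks first) to at least half the reference length, LGA50 is how many blocks that takes, and genome fraction is the percentage of reference bases covered by at least one alignment. Function names and toy numbers are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def nx50(lengths: Sequence[int], target_total: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (length, count) for the block that brings the running sum of
    lengths, taken longest first, to at least half of target_total.

    With aligned-block lengths and the reference length this gives
    (NGA50, LGA50); it returns (None, None) if the blocks never cover half.
    """
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= target_total:  # integer test for "at least 50%"
            return length, count
    return None, None


def genome_fraction(alignments: Sequence[Tuple[int, int]], reference_length: int) -> float:
    """Percentage of reference bases covered by at least one [start, end) alignment."""
    merged: List[List[int]] = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start for start, end in merged)
    return 100.0 * covered / reference_length


if __name__ == "__main__":
    # Toy example: a 100 bp reference and four aligned blocks (hypothetical).
    reference_length = 100
    blocks = [(0, 40), (40, 70), (65, 85), (90, 95)]
    nga50, lga50 = nx50([end - start for start, end in blocks], reference_length)
    print(nga50, lga50)  # 30 2
    print(genome_fraction(blocks, reference_length))  # 90.0
```

Using the reference length rather than the assembly length is what distinguishes the "G" variants: an assembly whose aligned blocks cover less than half of the reference has no NGA50 at all, which the sketch reports as None.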

You can also assess your assembly without providing any reference genomes.

    quast.py my_assembly/assembly.fasta -o quastResult
    

Without a reference, the report contains only reference-free statistics, such as:

  • Number of contigs
  • Largest contig
  • Total length
  • N50
  • L50

QUAST report for the Flye assembly of the E. coli dataset without a reference
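The reference-free statistics listed above can be approximated with a short Python sketch. It reads contig lengths from FASTA text, ignores contigs shorter than 500 bp (mirroring the default --min-contig value in the usage text), and computes the number of contigs, the largest contig, the total length, N50 (the length of the contig that brings the running total, longest first, to at least half the total length) and L50 (how many contigs it takes to reach that point). Applying the length threshold to every statistic is an assumption made here, and QUAST's exact conventions may differ; the helper names are hypothetical.

```python
from typing import Dict, List, Optional


def read_contig_lengths(fasta_text: str) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths: List[int] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new record
        elif line and lengths:
            lengths[-1] += len(line)  # sequence lines may be wrapped
    return lengths


def basic_stats(lengths: List[int], min_contig: int = 500) -> Dict[str, Optional[int]]:
    """Reference-free statistics over contigs of at least min_contig bp."""
    kept = sorted((n for n in lengths if n >= min_contig), reverse=True)
    total = sum(kept)
    stats: Dict[str, Optional[int]] = {
        "contigs": len(kept),
        "largest_contig": kept[0] if kept else None,
        "total_length": total,
        "N50": None,
        "L50": None,
    }
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if 2 * running >= total:  # first point where half the bases are covered
            stats["N50"], stats["L50"] = length, count
            break
    return stats


if __name__ == "__main__":
    # Toy FASTA with hypothetical contigs of 2500, 2000, 1500 and 300 bp.
    toy = (
        ">c1\n" + "A" * 2500 + "\n>c2\n" + "C" * 2000
        + "\n>c3\n" + "G" * 1500 + "\n>c4\n" + "T" * 300 + "\n"
    )
    print(basic_stats(read_contig_lengths(toy)))
    # {'contigs': 3, 'largest_contig': 2500, 'total_length': 6000, 'N50': 2000, 'L50': 2}
```

For the Flye assembly, you could call basic_stats(read_contig_lengths(open("my_assembly/assembly.fasta").read())) and compare the result with the numbers in the QUAST report.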

Icarus Contig Browser

Icarus is a contig browser bundled with QUAST that visualises assemblies for analysis.

Icarus contig browser

You can view how well your assembly aligns with the reference genome.

MetaQUAST: QUAST for Metagenomic Assemblies

QUAST also provides a variant named MetaQUAST for assessing metagenomic assemblies. You can supply multiple assemblies and compare them in a single run, and you can supply multiple reference genomes as well.

You can run MetaQUAST as follows.

    metaquast.py meta.contigs1.fasta meta.contigs2.fasta -l label1,label2 -R References/ -t 8 -o metaquastResult
    

As with QUAST, you can give each assembly a label (-l) so that the labels appear in the final report. You can also pass a single folder containing all the reference genomes with -R.

MetaQUAST report for three assemblies with multiple references

Final Thoughts

I hope you found this article useful as a starting point for using quality assessment tools on genome assemblies. These tools are freely available, so feel free to use them in your own projects and research.

Cheers, and stay safe!

You can read my previous articles related to bioinformatics and DNA analysis.

Translated from: https://medium.com/computational-biology/assessing-the-quality-of-genome-assemblies-using-quast-94fec3f8cb70
