Bioinformatics sequence-processing tools: Biopython and pysam
References:
https://github.com/biopython/biopython
https://github.com/pysam-developers/pysam
Note:
pysam does not currently support Windows; on Linux it can be installed directly with pip.
1. Biopython
Processing of DNA, RNA, and protein data
Reference: https://biopython.org/wiki/Documentation
Install: pip install biopython
Sequence reading and writing with SeqIO
## Read
from Bio import SeqIO
for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)
## For a file that holds exactly one record: SeqIO.read("NC_006581.gbk", "genbank")
## Write
from Bio import SeqIO
sequences = ...  # placeholder: supply a list or iterator of SeqRecord objects here
SeqIO.write(sequences, "example.fasta", "fasta")
## Format conversion: GenBank to FASTA
from Bio import SeqIO
count = SeqIO.convert("cor6_6.gb", "genbank", "cor6_6.fasta", "fasta")
print("Converted %i records" % count)
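To make the record model concrete, here is a minimal pure-Python sketch of what iterating over a FASTA file amounts to: each ">" header starts a new record, and the first word of the header serves as the ID, which is what record.id holds in the read loop above. This is not Biopython's parser; the names iter_fasta and format_fasta are hypothetical, and the two-record input is made up for illustration.

```python
import io


def iter_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted text handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first word after ">"; the rest is a description.
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def format_fasta(records, width=60):
    """Return FASTA text for (record_id, sequence) pairs, wrapped at `width`."""
    parts = []
    for record_id, seq in records:
        parts.append(f">{record_id}")
        for start in range(0, len(seq), width):
            parts.append(seq[start:start + width])
    return "\n".join(parts) + "\n"


# Made-up input: two records, the first one wrapped over two lines.
example = ">seq1 first toy record\nGGATGGTTGT\nCTATTAAC\n>seq2\nUGCAAACA\n"
records = list(iter_fasta(io.StringIO(example)))
for record_id, seq in records:
    print(record_id, len(seq))
```

For real files SeqIO remains the better choice, since it also handles descriptions, annotations, and many formats besides FASTA.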
Sequence manipulation with Seq
from Bio.Seq import Seq
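# Create a DNA sequence object
dna_seq = Seq("GGATGGTTGTCTATTAACTTGTTCAAAAAAGTATCAGGAGTTGTCAAGGCAGAGAAGAGAGTGTTTGCA")
# Create an RNA sequence object
rna_seq = Seq("UGCAAACACUCUCUUC")
# Create a protein sequence object
protein_seq = Seq("GWLSINLFKKVSGVVKAEKRVFA")
# Translation, transcription, and the reverse complement as DNA and as RNA
print(dna_seq.translate(), dna_seq.transcribe())
print(dna_seq.reverse_complement(), dna_seq.reverse_complement_rna())
# Back-transcription turns the RNA sequence into DNA
print(rna_seq, rna_seq.back_transcribe())
Note: reverse_complement() and reverse_complement_rna() return the complementary strand read in reverse, so the result runs in the opposite direction to the original sequence.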
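The string operations behind these methods can be sketched in plain Python, which shows why the reverse-complement results run backwards relative to the input. This is an illustration only, assuming unambiguous A/C/G/T/U letters; it is not how Biopython implements Seq, and the function names simply mirror the method names for readability.

```python
# Complement table for unambiguous DNA letters (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def transcribe(dna):
    """DNA -> RNA: replace T with U."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """RNA -> DNA: replace U with T."""
    return rna.replace("U", "T").replace("u", "t")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


def reverse_complement_rna(dna):
    """Reverse complement of a DNA string, written in the RNA alphabet."""
    return transcribe(reverse_complement(dna))


# Same DNA sequence as in the Seq example above.
dna = "GGATGGTTGTCTATTAACTTGTTCAAAAAAGTATCAGGAGTTGTCAAGGCAGAGAAGAGAGTGTTTGCA"
print(reverse_complement(dna)[:16])
print(reverse_complement_rna(dna)[:16])
```

The first 16 letters of the RNA reverse complement equal the rna_seq value used above, so that RNA fragment pairs with the 3' end of the DNA sequence.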

Reading a protein PDB file and counting its residues
Reference: https://www.thinbug.com/q/38027353
from Bio import PDB
parser = PDB.PDBParser()
pdb1 = r"C:\***ata\7b85.pdb"
structure = parser.get_structure("7b85", pdb1)
model = structure[0]
res_no = 0    # standard residues (amino acids, nucleotides)
non_resi = 0  # hetero residues and waters
for model in structure:
    for chain in model:
        for r in chain.get_residues():
            # r.id[0] is the hetero flag; it is blank for standard residues
            if r.id[0] == ' ':
                res_no += 1
            else:
                non_resi += 1
print("Residues: %i" % (res_no))
print("Other: %i" % (non_resi))
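The blank hetero flag tested above corresponds to residues read from ATOM records, while residues from HETATM records (ligands, waters) carry a non-blank flag. A rough dependency-free sketch of the same count, working directly on the fixed columns of PDB text lines, is shown below. It assumes a single-model file; pdb_line and count_residues are hypothetical helpers, and the four coordinate lines are invented toy data with dummy coordinates.

```python
def pdb_line(record, serial, atom, resname, chain, resseq):
    """Build a toy coordinate line; only the columns read below are guaranteed."""
    return (f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4} "
            "      0.000   0.000   0.000")


def count_residues(lines):
    """Count distinct residues seen in ATOM records and in HETATM records."""
    standard, other = set(), set()
    for line in lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        # Residue name (cols 18-20), chain ID (col 22),
        # residue number (cols 23-26), insertion code (col 27).
        key = (line[17:20], line[21], line[22:26].strip(), line[26:27].strip())
        (standard if record == "ATOM" else other).add(key)
    return len(standard), len(other)


# Toy input: two amino-acid residues on chain A plus one water molecule.
lines = [
    pdb_line("ATOM", 1, "N", "GLY", "A", 1),
    pdb_line("ATOM", 2, "CA", "GLY", "A", 1),
    pdb_line("ATOM", 3, "N", "TRP", "A", 2),
    pdb_line("HETATM", 4, "O", "HOH", "A", 101),
    "TER",
]
res_no, non_resi = count_residues(lines)
print("Residues: %i" % res_no)
print("Other: %i" % non_resi)
```

Unlike the Bio.PDB loop, this sketch does not iterate over models, so a multi-model file (for example an NMR ensemble) would need extra handling.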

2. pysam
Install: pip install pysam
pysam is a module for reading, writing, and manipulating genomic data in several formats (SAM/BAM/VCF/BCF/CRAM/…).
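To give a feel for the kind of record such a module exposes, the sketch below parses one SAM text line in plain Python. It is not pysam code: SamRecord and parse_sam_line are hypothetical names, the attribute names are only chosen to resemble those of pysam's aligned-segment objects, and the read itself is invented. The sketch converts the 1-based POS column to a 0-based start, the convention pysam uses for coordinates.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    query_name: str
    flag: int
    reference_name: str
    reference_start: int  # 0-based, converted from the 1-based POS column
    mapping_quality: int
    cigarstring: str
    query_sequence: str

    @property
    def is_unmapped(self):
        # Bit 0x4 of FLAG marks an unmapped read.
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self):
        # Bit 0x10 of FLAG marks a read aligned to the reverse strand.
        return bool(self.flag & 0x10)


def parse_sam_line(line):
    """Parse the 11 mandatory tab-separated SAM columns of one alignment line."""
    fields = line.rstrip("\n").split("\t")
    (qname, flag, rname, pos, mapq, cigar,
     _rnext, _pnext, _tlen, seq, _qual) = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos) - 1,
                     int(mapq), cigar, seq)


# Invented alignment line: an 8-base read on the reverse strand of "chr1".
line = "read1\t16\tchr1\t100\t60\t8M\t*\t0\t0\tGGATGGTT\tIIIIIIII\n"
rec = parse_sam_line(line)
print(rec.query_name, rec.reference_name, rec.reference_start, rec.is_reverse)
```

In practice pysam reads the binary BAM/CRAM formats through htslib and supports indexed region queries, which a text parser like this cannot do.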

