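Finding inverted repeats in a DNA sequence
I have a fairly long DNA sequence that contains two inverted repeats (reverse-complementary segments) at specific positions in its flanking regions.
The input is: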
cgtacacgagtagtcgtagctgtcagtcgatcgtacgtacgtagctgctgtagcactatcgaccccacacgtgtgtacacgatgcacagtcgtctatcacatgctagcgctgcccgtacgGATGGCCAAGGCCATCcgatcgctagctagcgccgcgcgtagcccgatcgagacatgctagcagttgtgctgatgtcgagatagctgtgatgcgatgctagcgccgcctagccgcctcgtgtaggctggatgcgatcgatcgatgctagcggcgcgatcgatgcactagccgtagcgctagctgatcgatcgtaGATGGCCAAGGCCATCcgcgtagatacgacatccgggggtatataa
Here is my code:
use strict;
use warnings;
# Read the sequence (a single line) from the file named on the command line
my $input = $ARGV[0];
chomp $input;
open (my $fh_in, "<", $input) or die "Cannot open $input: $!";
my $dna = <$fh_in>;
chomp $dna;
#######################################################################################
# Warn about anything that is not A, C, G or T, then upper-case the sequence
if ($dna =~ /[^ACGT]/gi) {
    print "This is not a valid DNA sequence, it has unknown base(s)\n";
}
$dna =~ tr/[acgt]/[ACGT]/;
######################################################################################
print "Minimum length of palindromic sequence?\n";
my $min = <STDIN>;
chomp $min;
print "Maximum length of palindromic sequence?\n";
my $max = <STDIN>;
chomp $max;
print "Minimum length of spacer region?\n";
my $min_spacer = <STDIN>;
chomp $min_spacer;
print "Maximum length of spacer region?\n";
my $max_spacer = <STDIN>;
chomp $max_spacer;
######################################################################################
my $dna_length = length($dna);
my ($length, $offset, $string_1, $string_2);
for ($offset = 0; $offset <= $dna_length-$max-$max-$max_spacer; $offset++) {
    for ($length = $min; $length <= $max; $length++) {
        # Left arm, and its reverse complement as the expected right arm
        $string_1 = substr($dna, $offset, $length);
        $string_2 = reverse $string_1;
        $string_2 =~ tr/[ACGT]/[TGCA]/;
        if ($dna =~ /(($string_1)([ACGT]{$min_spacer,$max_spacer})($string_2))/) {
            print "IR starts at $offset => $2***$3***$4\n$1\n\n";
        }
    }
}
With the parameters:
min = 6, max = 12, min_spacer = 4, max_spacer = 12
the output I get is:
IR starts at 26 => TCGATCGATGCTAGCGGCGCGATCGA
TCGATCGATGCTAGCGGCGCGATCGA
IR starts at 27 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 118 => CGGATGGCCAAGGCCATCCG
CGGATGGCCAAGGCCATCCG
IR starts at 118 => CGGATGGCCAAGGCCATCCG
CGGATGGCCAAGGCCATCCG
IR starts at 118 => CGGATGGCCAAGGCCATCCG
CGGATGGCCAAGGCCATCCG
IR starts at 119 => GGATGGCCAAGGCCATCC
GGATGGCCAAGGCCATCC
IR starts at 119 => GGATGGCCAAGGCCATCC
GGATGGCCAAGGCCATCC
IR starts at 120 => GATGGCCAAGGCCATC
GATGGCCAAGGCCATC
IR starts at 136 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 164 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 252 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 254 => ATCGATGCTAGCGGCGCGATCGAT
ATCGATGCTAGCGGCGCGATCGAT
IR starts at 254 => ATCGATCGATGCTAGCGGCGCGATCGAT
ATCGATCGATGCTAGCGGCGCGATCGAT
IR starts at 255 => TCGATCGATGCTAGCGGCGCGATCGA
TCGATCGATGCTAGCGGCGCGATCGA
IR starts at 256 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 258 => ATCGATGCTAGCGGCGCGATCGAT
ATCGATGCTAGCGGCGCGATCGAT
IR starts at 274 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 276 => ATCGATGCTAGCGGCGCGATCGAT
ATCGATGCTAGCGGCGCGATCGAT
IR starts at 304 => ATCGATGCTAGCGGCGCGATCGAT
ATCGATGCTAGCGGCGCGATCGAT
IR starts at 304 => ATCGATCGATGCTAGCGGCGCGATCGAT
ATCGATCGATGCTAGCGGCGCGATCGAT
IR starts at 305 => TCGATCGATGCTAGCGGCGCGATCGA
TCGATCGATGCTAGCGGCGCGATCGA
IR starts at 306 => CGATCGATGCTAGCGGCGCGATCG
CGATCGATGCTAGCGGCGCGATCG
IR starts at 314 => GATGGCCAAGGCCATC
GATGGCCAAGGCCATC
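However, when I check where my first hit (tcgatcgatgctagcggcgcgatcga) actually lies in the input, its offset does not appear to be position 26. Can anyone point out what is wrong with my code? Thanks.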
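
One likely explanation, offered as a sketch rather than a definitive fix: the regex is applied to the whole of $dna, so it reports the first place anywhere in the sequence where the pattern occurs, while the print statement shows the loop's $offset. The two only agree when the first occurrence happens to start at $offset. The snippet below pins the match to the current offset with pos() and \G. The sub name find_inverted_repeats, its return format, the non-greedy spacer and the tightened loop bound are my own assumptions, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns one
# [offset, left arm, spacer, right arm] record per anchored hit.
sub find_inverted_repeats {
    my ($dna, $min, $max, $min_spacer, $max_spacer) = @_;
    $dna = uc $dna;
    my @hits;
    # Last offset that still leaves room for the shortest possible repeat
    for my $offset (0 .. length($dna) - 2 * $min - $min_spacer) {
        for my $length ($min .. $max) {
            my $left = substr($dna, $offset, $length);
            last if length($left) < $length;    # ran off the end of the sequence
            my $right = reverse $left;          # scalar context: reverses the string
            $right =~ tr/ACGT/TGCA/;            # complement each base
            # \G anchors the match at pos($dna), i.e. at $offset itself,
            # so a copy of the pattern further downstream cannot match.
            pos($dna) = $offset;
            if ($dna =~ /\G\Q$left\E([ACGT]{$min_spacer,$max_spacer}?)\Q$right\E/g) {
                push @hits, [$offset, $left, $1, $right];
            }
        }
    }
    return @hits;
}

# Toy sequence: GATGGC occurs at offsets 0 and 8, but only the copy at 8
# is followed by a 4-base spacer and its reverse complement GCCATC.
for my $hit (find_inverted_repeats("GATGGCAAGATGGCTTTTGCCATC", 6, 6, 4, 4)) {
    my ($offset, $left, $spacer, $right) = @$hit;
    print "IR starts at $offset => $left***$spacer***$right\n";
}
```

On the toy sequence this reports a single repeat at offset 8. An unanchored match would also fire for the copy of GATGGC at offset 0, because the regex would find the downstream repeat and print it under the wrong offset, which matches the symptom in the question.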
