
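Hamming Distance and Correcting Misreads in Nucleotide Sequences

In this post, we consider an algorithm that detects misreads among equal-length DNA fragments obtained in FASTA format.

The algorithm on its own has little practical value, but with suitably tuned constraints it might be able to detect real misreads.

What is Hamming distance?

It sounds difficult, but the Hamming distance is simply the number of positions at which two strings of the same length differ.

The Hamming distance itself can therefore be computed easily with the following function.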

def get_hamming_distance(s_1, s_2):
    # Count the positions at which the two equal-length strings differ
    mismatches = 0
    for i in range(len(s_1)):
        if s_1[i] != s_2[i]:
            mismatches += 1
    return mismatches
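
As a quick check of the definition, here is a minimal sketch of the same count written with zip, plus a guard for unequal lengths. The name hamming_distance and the length check are assumptions added for illustration; they are not part of the function above.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption: treat unequal lengths as an error rather than truncating
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


# Positions 3 and 6 differ (T vs C, C vs T)
print(hamming_distance("GATTACA", "GACTATA"))  # -> 2
```

The explicit check matters because zip silently stops at the shorter string, which would otherwise hide a length mismatch.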

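Definition of a misread

For this post, a misread is defined as follows.

1. If a sequence is a misread, it appears exactly once among all sequences, and there is a correct sequence whose Hamming distance to it is 1.

2. If a sequence is correct, it appears at least twice among all sequences, counting reverse complements.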
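
The occurrence part of these two rules can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the helper names reverse_complement and split_reads and the three toy reads are hypothetical, and only the characters A, C, G, T are handled.

```python
from collections import Counter

# Assumption: reads contain only A, C, G, T
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_reads(reads):
    """Return (correct, suspect) using the occurrence rule only."""
    counts = Counter(reads)
    # Each read also counts as one occurrence of its reverse complement
    counts.update(reverse_complement(r) for r in reads)
    correct = sorted({r for r in reads if counts[r] >= 2})
    suspect = [r for r in reads if counts[r] == 1]
    return correct, suspect


# ACGTA and TACGT are reverse complements of each other; ACGGA stands alone
correct, suspect = split_reads(["ACGTA", "TACGT", "ACGGA"])
print(correct)  # -> ['ACGTA', 'TACGT']
print(suspect)  # -> ['ACGGA']
```

A read in the suspect list is only a misread candidate; rule 1 additionally requires a correct sequence at Hamming distance 1.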

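Algorithm

We build the algorithm so that it satisfies the two constraints above.

Target sequences

Here we use DNA fragments in FASTA format. Each record consists of a > marker, a 4-digit ID, and a 50-base sequence.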

>5073GCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTT>2057GTAAGTCGGTAACCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGG>5981GCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATCACTAAAGGGGGGTGG>9484CGTGGTACTCTTCGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGA>8265ACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGT>1729TCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTT>7264AAGTGACGGGCTCTGTTTCACCTTGCGGATCTGGCTTCTAAATTACTAAA>2209CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT>0142ACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGT>7879GGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAA>7991GGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAA>1009GACGGCCATAACCGGTACATCGTGGTACTCTTGGTCGGCCAAAGTGACGG>9954CGGTACATCGTGGTACTCTTGGGCAGCCAAAGTGACGGGCTCTGTTTCAC>4116GCGGCCAAAGTGACGGGATCTGTTTCACCTTGCGGAGCTGGCTTCTAAAT>7753CATCGTGGTAATCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGC>8153TAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGC>3536TACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTTG>3282GGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGC>6227TGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCG>8302CTGGCTTCTAAATTACTAATGGGGGGTGGCGAGTACCGTCGCAGGATAGC>6843GTGCCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAA>7647CTGCTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGA>6245GACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGGGACGG>2850TCTTGGGCGGCCAAAGTGACGGGCTCTGTTGCACCTTGCGGAGCTGGCTT>6321AAAGTGACGTGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAA>7401CCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGCACCGTC>8306CTCGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTC>8205GGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACC>1324TACTCTTGGGCGGCCAAAGTGACGGACTCTGTTTCACCTTGCGGAGCTGG>3479GTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT>0308GCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTT>7442ACTCTGGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGC>0225CGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATT>4315GGCCAAAGTGACCGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTA>5466CGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCA>1038CTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGC>1005GTACTCTTGGGCGGC
CAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG>3931CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT>5310CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT>0758CTTGCGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTC>6139CGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTGAC>8273GCTCTGTTTCACCTTGCTGAGCTGGCTTCTAAATTACTAAAGGGGGGTGG>0045TGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGC>0677TGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTT>3636ACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGG>5902GCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTAC>4205GGCCAAAGTGACGGGCTCAGTTTCACCTTGCGGAGCTGGCTTCTAAATTA>9275GGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAA>1885TACCCGGACGGCCATAACCGGTACATCGTGGTACTCTGGGGCGGCCAAAG>7343CTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGAGGAGCTGGCTTC>8116AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTT>6340CAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTGTAAATTACTA>0031TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCTCCTT>4873CAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTA>4558CGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGG>3857CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCG>4272CTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCTGAGCTGGCTTC>7358GTACCCGGCCGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAA>3605ACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGCG>6870CCTTGCGGAGCTGGCTTCTAAATTACTGAAGGGGGGTGGCGAGTACCGTC>4714GTACTCCTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG>6271TGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCA>4337TTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT>2406GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGTGCT>6841CTGTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT>4645AGTGACGGGATCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAG>0659ACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGT>6965CGGAGCTGGCTTCTAAACTACTAAAGGGGGGTGGCGAGTACCGTCGCAGG>2267TCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACC>5463ACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATGACTAAAGGGGG>1580TACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAG>7301GTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT>3979GGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAA
TTACTAAAGGGGGGT>7397TCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCG>0273ACGGGCTCTGTTTCACCTTGGGGAGCTGGCTTCTAAATTACTAAAGGGGG>3219GTCGGTACCCGGACGGCCATAACCGGTACATCGTCGTACTCTTGGGCGGC>6567GGTACTCTTGGGCGGCTAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT>7716GTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGT>1139ATAACCCGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGT>7224GTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT>6952GAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGAT>4236GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG>1507CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT>3442ACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGT>9814CTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGA>3429CATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTG>8413CATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGC>2874GGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGT>5151TCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGTTGGCG>6874CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCG>3885TGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTA>4244CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAACTGACGGGC>2520AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTT>7466CGGGCTCTGTTTAACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGG>3297CGGTCCCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCA>7914GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC>4175GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCGAAGTGACGGGCT>8058GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTACACCTTGCGGAGCTG>6659GGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACG>1341AAGTCGGTACCCGGACGGCCATAACCGGAACATCGTGGTACTCTTGGGCG>7340AGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGACGCAGGATA>6802TTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTAC>1824CCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGA>5405TGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTA>0144GTAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGG>0656GGGCTCTGTTTCACCTTGCGGTGCTGGCTTCTAAATTACTAAAGGGGGGT>8739TTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTA>6742GTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT>8634GTAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTAGGG>0298
CTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGC>0365GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG>4996GGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACC>5241ACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTAGCGGAGCTGGC>7587AAGTCGGTACCCGGATGGCCATAACCGGTACATCGTGGTACTCTTGGGCG>1110TTCTAAATTACTAAAGGGGGGTGGCGAGTCCCGTCGCAGGATAGCGTTTT>9825GGGCGGCCAAAGTGACGGGCTCTGTTTTACCTTGCGGAGCTGGCTTCTAA>5229AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGTTCTGTTT>5772CGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTGTCAC>8112TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTATCGTCGC>4606GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC>1936ATAACCGGTACAACGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGT>5123CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT>6023CGGTTCATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCAC>7846GACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG>8364GACGGGCTCTCTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG>9472CGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC>6014ACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGT>8468TTGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC>3568TGTTACACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAG>7681AGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAATACCGTCGCAGGATA>2580TTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT>0869CGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC>3213CCAAAGTGACGGGCTCTGTTTCACCATGCGGAGCTGGCTTCTAAATTACT>3818ATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCG>9857GACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGG>2115GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG>0116GTCGGTACCCGGACGGCCATAACCTGTACATCGTGGTACTCTTGGGCGGC>4507TAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGC>4804CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCG>4806GTCGGTACCCGGACGGCCATAACCGGTACATCGTGGCACTCTTGGGCGGC>1937ACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGG>4296TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGC>8610CGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTATGTTTCACCTTGCGGA>3346GCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTAC>3506GTACCCGGACGGCCAGAACCGGTACATCGTGGTACTCTTGGGCGGCCAAA>8729CGTGGTACTCTTGGGCGGCC
AAAGTGACGGGCTCTGTTTCACCTTGCGGA>1678CCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA>4821TGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGG>8348GGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTG>4802CATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTG>1454GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGGTG>1105AGTGACGCGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAG>1118AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA>8176TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGC>7219CGGGCTCTTTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGG>5516GTACCCGGACGGCCACAACCGGTACATCGTGGTACTCTTGGGCGGCCAAA>0212ATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGT>6998GTAAGTCGGTACGCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGG>7747CCATAACCGGTACATCGTGGTCCTCTTGGGCGGCCAAAGTGACGGGCTCT>9143ATAACCGGTACTTCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGT>1920CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCG>5695GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCTCCTTGCGGAGCT>1864CGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC>9853AAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGAACTCTTGGGCG>8279AGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGG>0497TAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGC>9539CTGGCTTCTAAATTACTATAGGGGGGTGGCGAGTACCGTCGCAGGATAGC>4805CTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTT>8339AAAGTGACGGGCTCTGTTTCATCTTGCGGAGCTGGCTTCTAAATTACTAA>5107GTGTTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAG>0888GGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGA>7817TAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGC>8269CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT>6364GAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGAT>0991ATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGTTCTGT>6675AGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATA>8272CTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTT>8532ACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGT>4933ATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCG>4240TTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT>3720CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC>0538CGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTA
CCGTCGCAGG>7802GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCT>3935GCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTAC>9917ACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTG>9021ATAACCGGTACATCGTCGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGT>4230GTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGC>6654GGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAA>5921CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT>8446GGCCATAACCGGTACATCGTGGTAGTCTTGGGCGGCCAAAGTGACGGGCT>2101GACGGGCTCTGTTTCACCTAGCGGAGCTGGCTTCTAAATTACTAAAGGGG>4749CGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGA>2543CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTTGCGAGTACCG>8708GGGCTCTGTTTCACCTTGCGTAGCTGGCTTCTAAATTACTAAAGGGGGGT>3242CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGCGGCGAGTACCGTCG>0241TTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTAC>4235GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC>3143ATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCG>2833TTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTA>0592ACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTG>3287GTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAG>6896GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT>3423CTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGC>6289GCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAAT>2774GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT>9004TTGGGCGGTCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT>7557AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTTTAAATTACTAAA>2707GGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTG>4857TCTTGGGCGGCCAAAGTGACGGTCTCTGTTTCACCTTGCGGAGCTGGCTT>8240CCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG>9793GCCATAACCGGTACATCGTGGTACTCGTGGGCGGCCAAAGTGACGGGCTC>6167CCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGTTCT>6345TCGGTACCCGGACGGCCATAACGGGTACATCGTGGTACTCTTGGGCGGCC>7736ACGGCCATAACCGGTAGATCGTGGTACTCTTGGGCGGCCAAAGTGACGGG>6739GGGCTCTGCTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGT>5128GCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGGGTT>4013TAAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGC>5725TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGGTTCACCTTGCGG>9587AGTCG
GTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGG>7949TGACGGGCGCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGG>6230TGTTTCACCTTGCGGAGCAGGCTTCTAAATTACTAAAGGGGGGTGGCGAG>0009AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTT>2938CCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTC>3968GACGGGCTCTGTTTCACCTTGCGAAGCTGGCTTCTAAATTACTAAAGGGG>6545CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCG>3060TACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGG>2627TTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG>2196AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA>0549AGTCGGTACCCGGACGGCCATAACCGGTACATCGTTGTACTCTTGGGCGG>5997CGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCAC>4713CCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTG>1155ACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAGAGGGGG>6716CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT>3173CATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTCGC>7257TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTTTGTTTCACCTT>7885AGTCGGTACCCGGACGGCCAAAACCGGTACATCGTGGTACTCTTGGGCGG>0617CTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGC>2711CTGTTTCACCTTGCTGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGA>1719CTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTT>4070TTGTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT>6086TGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTA>9918GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT>2354TTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT>5446AAGTGACGGGCTGTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA>8288GGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGT>6789GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACCAAAGG>4886CCGGTACATCGTGGTATTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA>5693GTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGT>4066AGTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGC>8470TCGGTACCCGGACGGCCATAACCGGTACATCGTGATACTCTTGGGCGGCC>2193CCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTG>8020CGGCCAAAGTGACGGGCTCTGATTCACCTTGCGGAGCTGGCTTCTAAATT>7752TTGGGCGGCCAAAGTTACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT>8179GGGCGGCCAAAGTGACGGGGTCTGTTTCACCTTGCGGAGCTGGCTTCTAA>9055ACCTTGCGGAGCTGGCTTCTAAATT
ACTAAAGGGGGGTGGCGAGTACCGT>7739ACCGTTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTC>1765AAGTCGGTACCCGGACGGCCATAACCTGTACATCGTGGTACTCTTGGGCG>9084ACCGGTACATCGTGGTACTCTTGGGCGGCCCAAGTGACGGGCTCTGTTTC>2039GACGGCCATAACCAGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGG>8911GGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTGTAAATTA>4327CGGGCTCTGTTTCACCTTGCGGAGCGGGCTTCTAAATTACTAAAGGGGGG>1094GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC>3062CCATAACCGGTACATCGTGGTAATCTTGGGCGGCCAAAGTGACGGGCTCT>8667CCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA>1578GTCGGTACCCGCACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGC>8576ACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGT>8796GGGCTCTGATTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGT>5453GTAAGTCGGTACCCGGACGGCCATAAGCGGTACATCGTGGTACTCTTGGG>1201CTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCATGATAGCGTTTTAG>4817GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG>3539GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC>0684CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGTGGGGTGGCGAGTACCG>1276CCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA>2832TGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAG>6165ACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGT>5854GCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAG>5501CCGGTACATCGTGGTACTCTTGGGCGGCCACAGTGACGGGCTCTGTTTCA>7955GTGACGGTCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG>6438CTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGTTAGCGTTTTAG>7107TGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCA>3126CAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTA>2526CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC>3272CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCG>4611TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC>0660GGCCATAACCGGTACATCGTGGTACTCTTAGGCGGCCAAAGTGACGGGCT>5526GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC>2497GCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTAC>6181TTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTAC>9040GGAGCTGGCTTCTAAATTACTAAAGAGGGGTGGCGAGTACCGTCGCAGGA>9828TTCACCTTGCGGAGCTGGCTACTAAATTACTAAAGGGGGGTGGCGAGTAC>1800TTGCGGAGCTGGCTTCTTAATTACTAAAGGGGGGTGGCGAGTACC
GTCGC>4899AGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT>7777GACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG>7902TGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCA>6969CGGCCATAACCGCTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC>7854CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCG>4788CATCGTGGTACTCTTGGGCGGCCAAAGCGACGGGCTCTGTTTCACCTTGC>8125ACGGCCATAACCGGTACATCGTGGCACTCTTGGGCGGCCAAAGTGACGGG>6761TTCTAAATTACTAAAGGGAGGTGGCGAGTACCGTCGCAGGATAGCGTTTT>4960GCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTAC>9189GGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGA>6769CCCGGACGGCCATCACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTG>4921ACCGGTACATCGTGGTACTCTTGGGCTGCCAAAGTGACGGGCTCTGTTTC>8758ACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGT>0543CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCGCCTTGCGGAGCTGGCT>2499AGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAG>1893ACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTG>5418AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA>9109GGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACG>3908GCTGGCTTCTAAATTAATAAAGGGGGGTGGCGAGTACCGTCGCAGGATAG>9669GTATCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGT>7063GCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGG>7923GTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGT>4341CGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC>6206AAGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCG>7364TCTAAATTACTAAAGGGGGGTGGCGAGTACAGTCGCAGGATAGCGTTTTA>8563TTGGGCGGCCAAGGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT>4223TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT>8957GCTTCTAAATTAATAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTT>0816ATCGTGGTACTCTTGGGCGGCCAAAGTTACGGGCTCTGTTTCACCTTGCG>0899GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCT>1525CTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGCCGA>2604ACTGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTC>2135TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGCCGGCC>3191TCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCG>4439CGGAGCTGGCTTCTAAATTACTAACGGGGGGTGGCGAGTACCGTCGCAGG>3295GTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT>0795ACGGGCTCTG
TTTCACCTTGCGGAGCTGCCTTCTAAATTACTAAAGGGGG>1346TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT>9104AGTCGGTACCCGGACCGCCATAACCGGTACATCGTGGTACTCTTGGGCGG>2605GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAACGG>5004GTCGGTACCCGGACGGCCATAACCGGTACATCGTAGTACTCTTGGGCGGC>1939TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGC>8991GTACTCTTGGGCGGGCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG>9757CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT>6799TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC>1358TGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTTAATTACTAAAGGG>6446TCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCG>5369GACGGGCTCTGTTTCACCTTTCGGAGCTGGCTTCTAAATTACTAAAGGGG>1104TACTCTTGGGCGGCCAAAGAGACGGGCTCTGTTTCACCTTGCGGAGCTGG>5512CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGCCT>7135AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTT>2906CTAAATTACTAAAGGGGGGTGGCGAGTACAGTCGCAGGATAGCGTTTTAG>8906TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCATCTTGCGG>7703ATACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT>9864GAGCTGGCTTCTAAATTACTAATGGGGGGTGGCGAGTACCGTCGCAGGAT>4234TACATCGTGGTACTCTAGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT>5293ACTCTTGGGCGGCCAAAGTGAAGGGCTCTGTTTCACCTTGCGGAGCTGGC>7113CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC>6977ACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTG>2668TGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTGTAAATTACTAAAGGG>2258AGTCGGTACCCGGACGGCCATAACCGGTACATCGAGGTACTCTTGGGCGG>2758TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG>6657GGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTG>0040TACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAG>4238TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG>3396TCTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAG>7541GCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAAT>6129CGGCCGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC>5608GCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGG>0221TGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCA

Extracting the sequences

We cut out the sequences using the > symbol as a marker.

While doing so, we also obtain the reverse complements using the Biopython module.

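from Bio.Seq import Seq

# s is assumed to hold the FASTA text above as one string with line breaks removed
indices = [i for i in range(len(s)) if s[i] == '>']
len_s = indices[1] - indices[0] - 5  # '>' plus a 4-digit ID precede each sequence
s_l = [s[i+5:i+5+len_s] for i in indices]  # original reads
s_r_l = [str(Seq(x).reverse_complement()) for x in s_l]  # their reverse complements
s_l_ = s_l + s_r_l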
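
The slicing above works because every header is a > followed by a fixed-width ID. The same split can be sketched in plain Python as below, assuming that layout holds for the whole input. The function name parse_concatenated_fasta and the id_width parameter are hypothetical, chosen to match the 4-digit IDs in the data.

```python
def parse_concatenated_fasta(text: str, id_width: int = 4):
    """Split '>IDSEQ>IDSEQ...' into (id, sequence) pairs.

    Assumption: each ID is exactly id_width characters long and
    any line breaks in the text are wrapping, not record boundaries.
    """
    records = []
    for chunk in text.replace("\n", "").split(">"):
        if chunk:  # skip the empty piece before the first '>'
            records.append((chunk[:id_width], chunk[id_width:]))
    return records


toy = ">0001ACGTA>0002TACGT>0003ACG\nGA"
for read_id, seq in parse_concatenated_fasta(toy):
    print(read_id, seq)
```

Keeping the ID next to each sequence makes it possible to report which record a misread came from, which the index-based slicing discards.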

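Detecting misreads

Following the occurrence condition of constraint 1 (a misread appears exactly once, reverse complements included), we store only the misread candidates in the variable s_w.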

# Original reads that occur exactly once in s_l_ (reads plus reverse complements)
s_w = [i for i in s_l if s_l_.count(i) == 1]

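Correcting misreads

Following constraint 2, we correct each misread stored in s_w by searching the sequences for a correction candidate.

Note that when judging whether a candidate is correct, occurrences are counted only among the original reads (s_l); the reverse complements are left out of that count. This is stricter than constraint 2, so a sequence that reaches two occurrences only through a reverse complement is not used as a correction target.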

for i in s_w:
    for j in s_l_:
        # Accept j only if it is one substitution away and occurs more than once in s_l
        if get_hamming_distance(i, j) == 1 and s_l.count(j) > 1:
            print(i + '->' + j)
            break  # each misread is corrected at most once
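
For comparison, here is a self-contained sketch of a variant that applies constraint 2 as stated, counting reverse complements when deciding that a read is correct. It is not the code used for the results in this post. The function names and the nine toy reads are assumptions for illustration, and only A, C, G, T are handled.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def corrections(reads):
    """Return (misread, fixed) pairs for reads that occur only once."""
    counts = Counter(reads)
    counts.update(reverse_complement(r) for r in reads)
    # Correct reads: at least two occurrences, reverse complements included
    correct = [r for r in dict.fromkeys(reads) if counts[r] >= 2]
    # A misread may match a correct read or its reverse complement
    targets = list(dict.fromkeys(correct + [reverse_complement(r) for r in correct]))
    pairs = []
    for r in reads:
        if counts[r] == 1:
            for t in targets:
                if hamming(r, t) == 1:
                    pairs.append((r, t))
                    break  # report one correction per misread
    return pairs


reads = ["TCATC", "TTCAT", "TCATC", "TGAAA", "GAGGA",
         "TTTCA", "ATCAA", "TTGAT", "TTTCC"]
for old, new in corrections(reads):
    print(old + "->" + new)
# -> TTCAT->TTGAT
# -> GAGGA->GATGA
# -> TTTCC->TTTCA
```

On these toy reads only TCATC occurs twice among the original reads, so counting within the original reads alone would leave nothing to correct against; counting reverse complements is what makes the three corrections possible. Using a Counter also avoids calling list.count inside the loops.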

Results

The output is shown below.

GCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATCACTAAAGGGGGGTGG->GCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGG
CGTGGTACTCTTCGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGA->CGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGA
AAGTGACGGGCTCTGTTTCACCTTGCGGATCTGGCTTCTAAATTACTAAA->AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA
GCGGCCAAAGTGACGGGATCTGTTTCACCTTGCGGAGCTGGCTTCTAAAT->GCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAAT
CTGGCTTCTAAATTACTAATGGGGGGTGGCGAGTACCGTCGCAGGATAGC->CTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGC
GCTCTGTTTCACCTTGCTGAGCTGGCTTCTAAATTACTAAAGGGGGGTGG->GCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGG
TGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTT->TGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTA
TACCCGGACGGCCATAACCGGTACATCGTGGTACTCTGGGGCGGCCAAAG->TACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAG
CAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTGTAAATTACTA->CAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTA
TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCTCCTT->TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT
GTACTCCTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG->GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG
GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGTGCT->GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCT
CTGTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT->CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT
CGGAGCTGGCTTCTAAACTACTAAAGGGGGGTGGCGAGTACCGTCGCAGG->CGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGG
GGTACTCTTGGGCGGCTAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT->GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT
TCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGTTGGCG->TCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCG
CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAACTGACGGGC->CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC
GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCGAAGTGACGGGCT->GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCT
GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTACACCTTGCGGAGCTG->GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG
TTCTAAATTACTAAAGGGGGGTGGCGAGTCCCGTCGCAGGATAGCGTTTT->TTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT
AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGTTCTGTTT->AACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTT
TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTATCGTCGC->TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGC
GACGGGCTCTCTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG->GACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG
TTGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC->TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC
CCAAAGTGACGGGCTCTGTTTCACCATGCGGAGCTGGCTTCTAAATTACT->CCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACT
CGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTATGTTTCACCTTGCGGA->CGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGA
GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGGTG->GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG
GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCTCCTTGCGGAGCT->GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT
CTGGCTTCTAAATTACTATAGGGGGGTGGCGAGTACCGTCGCAGGATAGC->CTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGC
GGCCATAACCGGTACATCGTGGTAGTCTTGGGCGGCCAAAGTGACGGGCT->GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCT
GACGGGCTCTGTTTCACCTAGCGGAGCTGGCTTCTAAATTACTAAAGGGG->GACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG
CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTTGCGAGTACCG->CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCG
CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGCGGCGAGTACCGTCG->CTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCG
TTGGGCGGTCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT->TTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT
AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTTTAAATTACTAAA->AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA
CCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG->TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG
GCCATAACCGGTACATCGTGGTACTCGTGGGCGGCCAAAGTGACGGGCTC->GCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTC
TCGGTACCCGGACGGCCATAACGGGTACATCGTGGTACTCTTGGGCGGCC->TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC
ACGGCCATAACCGGTAGATCGTGGTACTCTTGGGCGGCCAAAGTGACGGG->ACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGG
GCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGGGTT->GCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTT
TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGGTTCACCTTGCGG->TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG
GACGGGCTCTGTTTCACCTTGCGAAGCTGGCTTCTAAATTACTAAAGGGG->GACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG
TTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG->GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG
AGTCGGTACCCGGACGGCCATAACCGGTACATCGTTGTACTCTTGGGCGG->AGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGG
TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTTTGTTTCACCTT->TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT
AGTCGGTACCCGGACGGCCAAAACCGGTACATCGTGGTACTCTTGGGCGG->AGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGG
TTGTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT->TTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT
AAGTGACGGGCTGTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA->AAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAA
GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACCAAAGG->GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG
CCGGTACATCGTGGTATTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA->CCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA
TCGGTACCCGGACGGCCATAACCGGTACATCGTGATACTCTTGGGCGGCC->TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC
TTGGGCGGCCAAAGTTACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT->TTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT
CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGTGGGGTGGCGAGTACCG->CACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCG
CCGGTACATCGTGGTACTCTTGGGCGGCCACAGTGACGGGCTCTGTTTCA->CCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCA
GTGACGGTCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG->GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG
GGCCATAACCGGTACATCGTGGTACTCTTAGGCGGCCAAAGTGACGGGCT->GGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCT
GGAGCTGGCTTCTAAATTACTAAAGAGGGGTGGCGAGTACCGTCGCAGGA->GGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGA
TTCACCTTGCGGAGCTGGCTACTAAATTACTAAAGGGGGGTGGCGAGTAC->TTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTAC
TTGCGGAGCTGGCTTCTTAATTACTAAAGGGGGGTGGCGAGTACCGTCGC->TTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGC
AGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT->GGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCT
CGGCCATAACCGCTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC->CGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGC
ACGGCCATAACCGGTACATCGTGGCACTCTTGGGCGGCCAAAGTGACGGG->ACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGG
TTCTAAATTACTAAAGGGAGGTGGCGAGTACCGTCGCAGGATAGCGTTTT->TTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTTTT
CCCGGACGGCCATCACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTG->CCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTG
CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCGCCTTGCGGAGCTGGCT->CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT
GTATCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGT->GTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGT
TTGGGCGGCCAAGGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT->TTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCT
GCTTCTAAATTAATAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTT->GCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGATAGCGTT
ATCGTGGTACTCTTGGGCGGCCAAAGTTACGGGCTCTGTTTCACCTTGCG->ATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCG
TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGCCGGCC->TCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCC
CGGAGCTGGCTTCTAAATTACTAACGGGGGGTGGCGAGTACCGTCGCAGG->CGGAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGG
AGTCGGTACCCGGACCGCCATAACCGGTACATCGTGGTACTCTTGGGCGG->AGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGG
GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAACGG->GTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGG
GTACTCTTGGGCGGGCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG->GTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTG
GACGGGCTCTGTTTCACCTTTCGGAGCTGGCTTCTAAATTACTAAAGGGG->GACGGGCTCTGTTTCACCTTGCGGAGCTGGCTTCTAAATTACTAAAGGGG
CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGCCT->CTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGGAGCTGGCT
TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCATCTTGCGG->TCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTTGCGG
ATACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT->GTACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCT
GAGCTGGCTTCTAAATTACTAATGGGGGGTGGCGAGTACCGTCGCAGGAT->GAGCTGGCTTCTAAATTACTAAAGGGGGGTGGCGAGTACCGTCGCAGGAT
TACATCGTGGTACTCTAGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT->TACATCGTGGTACTCTTGGGCGGCCAAAGTGACGGGCTCTGTTTCACCTT
AGTCGGTACCCGGACGGCCATAACCGGTACATCGAGGTACTCTTGGGCGG->AGTCGGTACCCGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGG
CGGCCGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC->CGGACGGCCATAACCGGTACATCGTGGTACTCTTGGGCGGCCAAAGTGAC
