Articles whose thumbnail is a coffee picture are still works in progress.

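[Research Script] Automated Primer Design with the Nearest-Neighbor Method

This post builds a program that automatically computes primer sequence candidates, using a Tm that accounts for the entropy and enthalpy of each pair of adjacent bases.

Nearest-neighbor method

The Tm value is calculated with the following formula, where ΔH and ΔS are the nearest-neighbor enthalpy and entropy summed over the sequence, A is a constant term, R is the gas constant, C is the primer concentration, and [Na+] is the sodium ion concentration.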

$T_m=\frac{\Delta H}{A + \Delta S + R\ln \frac{C}{4}}-273.15+16.6\log_{10} [\mathrm{Na}^+]$
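As a minimal sketch, the formula can be written as a function of the summed ΔH and ΔS. The name `tm_from_totals` and its parameter names are assumptions made for illustration. The default values mirror the constants that appear in the Tm function in this article (A = -0.0108, R = 0.00199, C = 0.0000005, [Na+] = 0.05), and the totals in the usage line are made-up numbers, not measured values.

```python
import math

def tm_from_totals(
    delta_h: float,        # summed enthalpy, kcal/mol
    delta_s: float,        # summed entropy, kcal/(K*mol)
    a: float = -0.0108,    # constant A, kcal/(K*mol)
    r: float = 0.00199,    # gas constant R, kcal/(K*mol)
    c: float = 0.0000005,  # primer concentration C, mol/L
    na: float = 0.05,      # sodium ion concentration, mol/L
) -> float:
    """Return Tm in degrees Celsius from summed nearest-neighbor terms."""
    kelvin = delta_h / (a + delta_s + r * math.log(c / 4))
    # Convert to Celsius, then apply the salt correction term.
    return kelvin - 273.15 + 16.6 * math.log10(na)

# Illustrative totals only (hypothetical values).
example_tm = tm_from_totals(-200.0, -0.5)
print(round(example_tm, 1))
```

Keeping the constants as keyword arguments makes it easy to see how Tm shifts when the salt or primer concentration changes.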

The thermodynamic parameters for each nearest-neighbor step are listed below, with ΔH in kcal/mol and ΔS in kcal/(K·mol).

parameters = {
    # Enthalpy of each dinucleotide step (kcal/mol).
    # A step and its reverse complement share one value, so they are listed side by side.
    "deltaH": {
        "AA": -9.1, "TT": -9.1,
        "AT": -8.6,
        "TA": -6.0,
        "CA": -5.8, "TG": -5.8,
        "GT": -6.5, "AC": -6.5,
        "CT": -7.8, "AG": -7.8,
        "GA": -5.6, "TC": -5.6,
        "CG": -11.9,
        "GC": -11.1,
        "GG": -11.0, "CC": -11.0,
    },
    # Entropy of each dinucleotide step (kcal/(K*mol)).
    "deltaS": {
        "AA": -0.024, "TT": -0.024,
        "AT": -0.0239,
        "TA": -0.0169,
        "CA": -0.0129, "TG": -0.0129,
        "GT": -0.0173, "AC": -0.0173,
        "CT": -0.0208, "AG": -0.0208,
        "GA": -0.0135, "TC": -0.0135,
        "CG": -0.0278,
        "GC": -0.0267,
        "GG": -0.0266, "CC": -0.0266,
    },
}
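Several values in the table appear twice because a dinucleotide step and its reverse complement describe the same stack of base pairs in the duplex, so AA and TT (or CA and TG) share one value. As a sketch of that symmetry, the 16-entry ΔH table can be expanded from its 10 distinct entries. The helper names `reverse_complement` and `expand` are hypothetical and not part of the original script.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def expand(unique: dict[str, float]) -> dict[str, float]:
    """Add the reverse-complement key for every entry."""
    table = dict(unique)
    for step, value in unique.items():
        table[reverse_complement(step)] = value
    return table

# The 10 distinct deltaH values (kcal/mol) from the table above.
unique_delta_h = {
    "AA": -9.1, "AT": -8.6, "TA": -6.0, "CA": -5.8, "GT": -6.5,
    "CT": -7.8, "GA": -5.6, "CG": -11.9, "GC": -11.1, "GG": -11.0,
}
delta_h = expand(unique_delta_h)
print(len(delta_h))  # 16 dinucleotide steps
```

Writing only the distinct entries avoids typing the same number twice and makes a mismatched pair impossible.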

Using the parameters above, we write a function that computes the Tm value of a sequence.

import numpy as np

def Tm(seq: str) -> float:
    # Nearest-neighbor parameters: deltaH in kcal/mol, deltaS in kcal/(K*mol)
    parameters: dict[str,dict[str,float]] = {
    "deltaH":{
                "AA":-9.1, "TT":-9.1, "AT":-8.6, "TA":-6.0,
                "CA":-5.8, "TG":-5.8, "GT":-6.5, "AC":-6.5,
                "CT":-7.8, "AG":-7.8, "GA":-5.6, "TC":-5.6,
                "CG":-11.9, "GC":-11.1, "GG":-11.0, "CC":-11.0,
            },
    "deltaS":{
                "AA":-0.024, "TT":-0.024, "AT":-0.0239, "TA":-0.0169,
                "CA":-0.0129, "TG":-0.0129, "GT":-0.0173, "AC":-0.0173,
                "CT":-0.0208, "AG":-0.0208, "GA":-0.0135, "TC":-0.0135,
                "CG":-0.0278, "GC":-0.0267, "GG":-0.0266, "CC":-0.0266,
            },
    }
    seq = seq.upper()  # the tables are keyed on upper-case bases
    deltaH,deltaS = 0.0, 0.0
    # Sum the parameters over every pair of adjacent bases
    for i in range(len(seq)-1):
        deltaH += parameters["deltaH"][seq[i:i+2]]
        deltaS += parameters["deltaS"][seq[i:i+2]]
    A = -0.0108    # constant A in the formula, kcal/(K*mol)
    R = 0.00199    # gas constant, kcal/(K*mol)
    C = 0.0000005  # primer concentration, mol/L
    Na = 0.05      # sodium ion concentration, mol/L
    return deltaH/(A+deltaS+R*np.log(C/4)) - 273.15+16.6*np.log10(Na)
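The dinucleotide tables only contain the upper-case bases A, C, G and T, so any other character (an ambiguity code such as N, or a line break left over from pasting) has no entry to look up. A small cleaning step before the Tm calculation is one way to catch this early. The sketch below is an assumption, not part of the original script, and `clean_sequence` is a hypothetical helper name.

```python
VALID_BASES = set("ACGT")

def clean_sequence(raw: str) -> str:
    """Upper-case a sequence, drop whitespace, and reject unknown bases."""
    seq = "".join(raw.split()).upper()
    unknown = sorted(set(seq) - VALID_BASES)
    if unknown:
        raise ValueError(f"unsupported characters: {unknown}")
    return seq

print(clean_sequence("CGATaagc\nTAGC"))  # CGATAAGCTAGC
```

Raising an error instead of silently skipping unknown bases keeps a bad input from producing a Tm that looks plausible but is wrong.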

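Searching for optimal primers

We search for optimal primer candidates under the following constraints.

・The sequence length is between 25 bp and 35 bp, inclusive
・The GC rate is between 0.45 and 0.6
・The Tm value is 63 or higher
・The sequence contains no AT-rich stretch (5 or more consecutive bases)
・The sequence has a GC 3' end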
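The five constraints above can be sketched as a single predicate. This is an illustration under stated assumptions rather than the article's own filter: `gc_rate` and `passes_constraints` are hypothetical names, the Tm is passed in by the caller, an "AT-rich stretch" is read as five or more consecutive A or five or more consecutive T, and a "GC 3' end" is read as the last base being G or C.

```python
def gc_rate(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def passes_constraints(seq: str, tm: float) -> bool:
    """Check one upper-case candidate against the five constraints."""
    return (
        25 <= len(seq) <= 35             # length in bp
        and 0.45 <= gc_rate(seq) <= 0.6  # GC rate
        and tm >= 63                     # Tm threshold
        and "AAAAA" not in seq           # assumed reading: no run of 5+ A
        and "TTTTT" not in seq           # assumed reading: no run of 5+ T
        and seq[-1] in "GC"              # assumed reading: G or C at the 3' end
    )

# Hypothetical 25-mer with 13 G/C bases; the Tm values are made up.
example = "GATCAGTACGGATCAGTACGGATCC"
print(passes_constraints(example, tm=65.0))
print(passes_constraints(example, tm=60.0))
```

Separating the checks from the window enumeration makes each constraint easy to test and to adjust on its own.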

Under these conditions, let us test the following sequence.

CGATAAGCTAGCTTCACGCTGCCGCAAGCACTCAGGGCGCAAGGGCTGCTAAAGGAAGCGGAACACGTAGAAAGCCAGTCCGCAGAAACGGTGCTGACCCCGGATGAATGTCAGCTACTGGGCTATCTGGACAAGGGAAAACGCAAGCGCAAAGAGAAAGCAGGTAGCTTGCAGTGGGCTTACATGGCGATAGCTAGACTGGGCGGTTTTATGGACAGCAAGCGAACCGGAATTGCCAGCTGGGGCGCCCTCTGGTAAGGTTGGGAAGCCCTGCAAAGTAAACTGGATGGCTTTCTTGCCGCCAAGGATCTGATGGCGCAGGGGATCAAGATCTGATCAAGAGACAGGATGAGGATCGTTTCGCATGATTGAACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCTGAATGAACTCCAAGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCTCCTGTCATCTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAACTGTTCGCCAGGCTCAAGGCGCGGATGCCCGACGGCGAGGATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGCTTTTCTGGATTCATCGACTGTGGCCGGCTGGGTGTGGCGGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAGCGGGACTCTGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAGGTTGGGCTTCGGAATCGTTTTCCGGGACGCCCTCGCGGACGTGCTCATAGTCCACGACGCCCGTGATTTTGTAGCCCTGGCCGACGGCCAGCAGGTAGGCCGACAGGCTCATGCCGGCCGCCGCCGCCTTTTCCTCAATCGCTCTTCGTTCGTCTGGAAGGCAGTACACCTTGATAGGTGGGCTGCCCTTCCTGGTTGGCTTGGTTTCATCAGCCATCCGCTTGCCCTCATCTGTTACGCCGGCGGTAGCCGGCCAGCCTCGCAGAGCAGGATTCCCGTTGAGCACCGCCAGGTGCGAATAAGGGACAGTGAAGAAGGAACACCCGCTCGCGGGTGGGCCTACTTCACCTATCCTGCCCCGCTGACGCCGTTGGATACACCAAGGAAAGTCTACACGAACCCTTTGGCAAAATCCTGTATATCGTGCGAAAAAGGATGGATATACCGAAAAAATCGCTATAATGACCCCGAAGCAGGGTTATGCAGCGGAAAAGCGCTGCTTCCCTGCTGTTTTGTGGAATATCTACCGACTGGAAACAGGCAAATGCAGGAAATTACTGAACTGAGGGGACAGGCGAGAGACGATGCCAAAGAGCTCCTGAAAATCTCGATAACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTATGGTGAAAGTTGGAACCTCTTACGTGCCGATCAACGTCTCATTTTCGCCAAAAGTTGGCCCAGGGCTTCCCGGTATCAACAGGGACACCAGGATTTATTTAT
TCTGCGAAGTGATCTTCCGTCACAGGTATTTATTCGGCGCAAAGTGCGTCGGGTGATGCTGCCAACTTACTGATTTAGTGTATGATGGTGTTTTTGAGGTGCTCCAGTGGCTTCTGTTTCTATCAGCTCCTGAAAATCTCGATAACTCAAAAAATACGCCCGGTAGTGATCTTATTTCATTATGGTGAAAGTTGGAACCTCTTACGTGCCGATCAACGTCTCATTTTCGCCAAAAGTTGGCCCAGGGCTTCCCGGTATCAACAGGGACACCAGGATTTATTTATTCTGCGAAGTGATCTTCCGTCACAGGTATTTATTCGGCGCAAAGTGCGTCGGGTGATGCTGCCAACTTACTGATTTAGTGTATGATGGTGTTTTTGAGGTGCTCCAGTGGCTTCTGTTTCTATCAGGGCTGGATGATCCTCCAGCGCGGGGATCTCATGCTGGAGTTCTTCGCCCACCCCAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCATTGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATGATTACGAATTCGAGCTCGGTACCCtgacgttctagACCCTACCTCTCCAAGATTACCGGCGTGCCCATGGTGGCCCTGGCCACCAACGTCATGCTGGGCAAAAGCCTGCCGGAGCAGGGCTACCGGGGCGGCTTAATGCCGCCGCCGGATTTTACCGCCGTAAAGGTCCCCGTCTTTTCCTTCGGCAAGCTGTTGCAGGTGGACACCTCCCTGGGACCGGAGATGAAGTCCACCGGCGAGGTAATGGGGATTGATCCCGTCTTCGAACGCGCCCTCTATAAAGGCCTGGTAGCCGCCGGCTGCTCCATCCCCCATCACGGCACCCTGCTGGCGACCATCGCCGATAAGGACAAGGCGGAAGCAGTGCCCATCATCAAGGGCTTTGCCGAACTGGGCTTCCAGGTGGTGGCTACCGCCGGCACCGCCGGCGC
CCTGGCCGCAGCGGGACTCTTCGTAGAGAGGGTGGGGAAGATCCGCGAGGGTTCGCCCCACATTATCGACTATATCCGGGAAGGGAAGGTCCACTTTGTCCTCAACACCCTCACCAGGGGCAAGATGCCCGGCCGGGACGGTTTTAAGATCCGCCGCGCCGCGGCCGAACTGGGCATCCCCTGCCTGACTTCCCTGGATACGGCCCGGGCCCTGCTCAAAGTCCTCCAGTCCCTGAAGTCCGGCGACGGGTTTAACCTCAAACCCCTGCAGGAGTATGTACCCCTTTCCCGTCCTTAACGGAGGAGCGCCAAATCGCCTCCGCCCCACCCCGGCAGGAGCAGCAGCCCGCGGCTGCACCGGCCGGGCGGTTTCCCGGCCGGCCCTTCAAGCACCAGGCGAGATGGCCGGGCCGCCGCCATTTAGCATATCAAGAGCGCCGGAAGGGAAGGGCTTTTCCGGTTTTTACCGGTCGGGGTTAAGCCTGACTTAAGGGCCGGTACCGGACCCTCCCCATATTCACTCCGCTTACACTCCGTTTTTTGAACTATAAGATCATAAAGCGATATTTAAGGGCTTCTGGCCTGCTTGCCAACACTAATGTACCTGCAGGAGATGATCCGCATGCATGCCAAGGACAAAATAATCGTCGCCCTGGATGTTCCCGACCTGGCTGCCGGGGAAAAGCTGGTGGACCGGCTTTCCCCCTACGCCGGCATGTTTAAAGTCGGCCTGGAGTTTTTCACCGCCGCCGGGCCGGCGGCCGTCCGGATGGTAAAGGAGCGTGGTGGCCGGGTATTTGCCGACCTGAAGTTCCACGACATCCCCAACACCGTGGCCGGAGCGGCGCGGGCCCTGGTGCGCCTGGGCGTGGATATGCTCAACGTTCACGCCGCCGGCGGCAAGGCCATGCTGCAGGCTGCCGCCGCCGCCGTCCGGGAGGAGGCCGCGGCCTTAAACCGCCCGGCGCCGGTAATAATCGCGGTCACTGTTTTGACCAGCCTGGACAGGGAAGCTCTACGCTGCGAGGTGGGTATCGAGCGAGAGGTAGAAGAACAGGTGGCCCGCTGGGCGCTCCTGGCCCGGGAGGCCGGCCTGGACGGCGTAGTAGCCTCGCCCCGGGAGATCCGGGCCATCCGGGAGGCCTGCGGGCCGGAGTTCGTCATCGTGACCCCGGGCGTGCGCCCGGCTGGGTCCGACCGGGGCGACCAGCGCCGGGTCATGACCCCGGCCGAGGCCCTGCGGGAGGGCGCCTCCTACCTGGTCATCGGCCGGCCCATCACCGCGGCCCCCGACCCCGTCGCCGCCGCCCGGGCCATCGCGGCGGAAATAGAGATGGTGAAATAATAACTGGACGGTTGCCAAGTACCGGGACGAGCAGGGCATCCCGGCGGCGGCTAAAAGAAAACGATATTAGTTAAGAAGGATTTTGACCATTTGTGTTGAATAGATAGTGTTTGACGGTACAATCTCCGGCAATTAGCAATATATCATAATAAATCCTGATTGGGTTAGGAATAATATCAAAAGCCAAGGAGCCTGAAAGCGGTGGGGGTTGACGCTGCAGGAATTTAACCCTTGCCGTTACAATAAATATAAGGAGGAGTACATAATGAACTTCAACAAGATCGATCTGGACAACTGGAAACGCAAGGAAATCTTCAACCATTATCTGAACCAGCAGACCACCTTCTCCATCACCACGGAGATCGACATCTCCGTGCTGTACCGGAACATCAAGCAAGAAGGCTACAAGTTCTACCCCGCCTTCATCTTTCTGGTCACGCGGGTCATCAACAGCAACACCGCCTTCCGCACCGGGTACAACTCCGACGGCGAGCTCGGCTACTGGGACAAGCTGGAACCGCTCTACACCATCTTCGACGGCGTGAGCAAGACCTTTAGCGGCATCTGGACGCCCGTGAAGAACGACTTCAAGGAGTTCTACGATCTGTACCTCTCCGACGTGGAAAAGTACAACGGCTCCGGCAAGCTGTTCCCG
AAAACCCCCATTCCCGAAAACGCCTTTTCCCTCTCCATCATCCCGTGGACCAGCTTCACCGGCTTTAATCTGAACATTAACAACAACAGCAACTATCTGCTGCCGATCATCACCGCCGGCAAGTTCATCAACAAGGGCAACAGCATCTACCTCCCGCTGAGTCTGCAAGTCCACCACAGCGTCTGCGATGGCTACCACGCCGGGCTGTTTATGAACAGCATCCAAGAACTGAGCGACCGCCCGAATGACTGGCTGCTGTAATCGGCCTGCTTTCATGCTTGATAATTTTTGTCATGTAGGGCTACAATGATAGTAACAGGTGATGACACGATGGAACGAATTAACTTTATCAATACCCGGGAGTTTAAAAATAGAGCAACCCAAATCTTGAGGCAGGTACAAAAAGACCAGGTTATTATTATAACCAATCGCGGTAAACCTGTAGCCACTTTAAAAGGTTTCAATCCACGTGACCTGGTTGTTGCAGAAGATAGACATGATAGCCTTTACCAGCATTTGCGGCAACAAATTTTAAAAGAAAGTCCAGAACTGGCTGCCAGGGATACCAGGCAAATCGCCACTGATTTTGAAAAGATAACAGCTAAAATGAGAAAACAGATTGCCTACAGGACCTGGGAAGAAATGGACCGGCACTTAAAAGGGGATCCTTATGATCTTACTGGATACTAATATTTTTATAATCGATCGTTTTTTTCCAGGGGATAGTCATTACGCTATAAACAAGGAATTCATTCAAGAGTTATCGCGGCTCGAGGCGGGGTTTTCTATTTTTTCGTTATTAGAACTTACCGGCATTGCTTCTTTTAACCTTTCAGCCAAAGAATTGCAGCAATGGTTGTTTGATTTTGCCTCCGTTTATCCTATTCGTATTCTTGATCCCTATGATTTAAAGATTGATTCTGCCAAGGAATGGTATACTAAATTTTTGCAGGAACTAATGGCAAAAGTTACCCACCAAATGACTTTTGGCGACGCTATTTTTTTACGTGAAGCTGAAGGTTATCAAGTAGAGTATATTATTAGCTGGAATAAGAAACATTTTCTTTCACGTACGACAATCAAAGTGCTCAACCCTGAAGAGTTTTTGACAATATGGAAACCGCAATAATTCCTTTGGTCATAAGCAAGGCGTGGGCTTTTTTAATCTCGTTGGAATGGGTAACTTTATGGGGTTAGCTCCCGGCAACCTAAATCGGAAGGTGCATAAGCTTGGACagatatcaggtcaGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGCACTGGCCGTCGTTTTACAACGTCGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGCG

Using the Tm function from earlier, we extract the candidate sequences.

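def get_GC_rate(s: str) -> float:
    # Fraction of G and C bases in the sequence
    return (s.count("G") + s.count("C")) / len(s)

# seq is the target sequence above as a single string; it contains lower-case bases, so upper-case it first
seq = seq.upper()
# All windows of 25 to 35 bp, including the one that ends at the last base
windows = (seq[i:bp+i] for bp in range(25,36) for i in range(0,len(seq) - bp + 1))
primer_candidates:list[str] = [w for w in windows
                        if 0.45<get_GC_rate(w)<0.6 and
                        Tm(w) >= 63 and
                        ((w[0] == "G" and w[-1] == "C") or (w[0] == "C" and w[-1] == "G")) and
                        "AAAAA" not in w and "TTTTT" not in w  # no run of 5 or more A or T
                        ]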
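The comprehension above returns only the sequences. When the position in the template matters, a variant that also records the start index can help. The sketch below is an illustration only: `Candidate`, `find_candidates` and the `accept` callback are hypothetical names, and the filter is passed in as a function so that the block stays self-contained.

```python
from typing import Callable, Iterator, NamedTuple

class Candidate(NamedTuple):
    start: int     # 0-based index of the first base in the template
    length: int    # primer length in bp
    sequence: str

def find_candidates(
    template: str,
    accept: Callable[[str], bool],
    min_len: int = 25,
    max_len: int = 35,
) -> Iterator[Candidate]:
    """Yield every window of min_len..max_len bases that accept() approves."""
    for length in range(min_len, max_len + 1):
        # + 1 so that the window ending at the last base is included
        for start in range(len(template) - length + 1):
            window = template[start:start + length]
            if accept(window):
                yield Candidate(start, length, window)

# Toy usage with a trivial filter and short windows.
hits = list(find_candidates("ACGTGC", lambda w: w.endswith("C"), 3, 4))
print(hits)
```

In real use, `accept` would combine the GC rate, Tm and end-base checks described in the constraints list.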
