Posts with a coffee thumbnail are still works in progress.

How to Extract Text Enclosed Between Specific Characters in Python

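While cleaning up data for analysis, I ran into trouble with a CSV file whose field values themselves contained commas: splitting each line on commas with split no longer worked. So this time I look at how to extract the substrings enclosed between specific characters in a string.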
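To make the problem concrete, here is a small sketch of what goes wrong. The sample line is a shortened excerpt modeled on the CSV row used later in this post; splitting on every comma breaks the quoted field in two and leaves the quotes attached.

```python
# One quoted field contains a comma, so naive splitting breaks it apart
line = '"Fertility rate, total (births per woman)","2.69"'
fields = line.split(",")
print(fields)
# → ['"Fertility rate', ' total (births per woman)"', '"2.69"']
```

Two logical fields come back as three pieces, and the surrounding double quotes are still present.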

From what I found, this can be implemented easily with regular expressions, so let's try it.

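import re

def split_(s: str, left: str, right: str) -> list[str]:
    # Capture everything between `left` and `right`; re.escape guards against regex metacharacters
    m = re.search('{}(.+){}'.format(re.escape(left), re.escape(right)), s)
    # Return an empty list instead of crashing when nothing matches
    return list(m.groups()) if m else []

s = '"aaa"'
print(split_(s, '"', '"'))
>>>
['aaa']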

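However, `.+` in the program above is greedy: when the same delimiter appears many times, it matches from the first occurrence all the way to the last one, so only the outermost pair is recognized and the program cannot extract multiple targets. For example, for '"a","b"' it returns ['a","b'] instead of ['a', 'b'].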
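One way to stay with regular expressions is a minimal sketch like the one below, which swaps in two standard `re` features: the non-greedy quantifier `.*?`, which stops at the first closing delimiter, and `re.findall`, which returns every match rather than only the first. The helper name `extract_between` is hypothetical, and it assumes fields do not span line breaks (`.` does not match a newline by default).

```python
import re

def extract_between(s: str, left: str, right: str) -> list[str]:
    # Non-greedy (.*?) ends each match at the nearest `right`;
    # findall collects the captured group of every match.
    pattern = '{}(.*?){}'.format(re.escape(left), re.escape(right))
    return re.findall(pattern, s)

s = '"a","b,c",""'
# Greedy pattern for comparison: one match spanning the outermost quotes
print(re.search('"(.+)"', s).group(1))   # → a","b,c","
print(extract_between(s, '"', '"'))      # → ['a', 'b,c', '']
```

Because `.*?` also allows an empty match, an empty quoted field `""` comes back as an empty string instead of being skipped.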

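So next, I'll write a function that splits a string into every substring enclosed by the given characters. It records the position of each delimiter and slices out the text between each opening and closing pair.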

def split_1(s: str, left: str, right: str) -> list[str]:
    # Assumes `left` and `right` are single characters
    if left != right:
        # Pair the k-th `left` with the k-th `right` (nesting is not supported)
        left_i: list[int] = [i for i, c in enumerate(s) if c == left]
        right_i: list[int] = [i for i, c in enumerate(s) if c == right]
        # i + 1 skips the opening delimiter itself
        return [s[i + 1:j] for i, j in zip(left_i, right_i)]
    # Same delimiter on both sides: occurrences 0-1, 2-3, ... form pairs
    i_s: list[int] = [i for i, c in enumerate(s) if c == left]
    return [s[i_s[k * 2] + 1:i_s[k * 2 + 1]] for k in range(len(i_s) // 2)]

s = '"Austria","AUT","Fertility rate, total (births per woman)","SP.DYN.TFRT.IN","2.69","2.78","2.8","2.82","2.79","2.7","2.66","2.62","2.58","2.49","2.29","2.2","2.08","1.94","1.91","1.83","1.69","1.63","1.6","1.6","1.65","1.67","1.66","1.56","1.52","1.47","1.45","1.43","1.45","1.45","1.46","1.51","1.51","1.5","1.47","1.42","1.45","1.39","1.37","1.34","1.36","1.33","1.39","1.38","1.42","1.41","1.41","1.38","1.42","1.39","1.44","1.43","1.44","1.44","1.46","1.49","1.53","1.52","1.47","1.46","1.44","",'
print(split_1(s, '"', '"'))
>>>
['Austria', 'AUT', 'Fertility rate, total (births per woman)', 'SP.DYN.TFRT.IN', '2.69', '2.78', '2.8', '2.82', '2.79', '2.7', '2.66', '2.62', '2.58', '2.49', '2.29', '2.2', '2.08', '1.94', '1.91', '1.83', '1.69', '1.63', '1.6', '1.6', '1.65', '1.67', '1.66', '1.56', '1.52', '1.47', '1.45', '1.43', '1.45', '1.45', '1.46', '1.51', '1.51', '1.5', '1.47', '1.42', '1.45', '1.39', '1.37', '1.34', '1.36', '1.33', '1.39', '1.38', '1.42', '1.41', '1.41', '1.38', '1.42', '1.39', '1.44', '1.43', '1.44', '1.44', '1.46', '1.49', '1.53', '1.52', '1.47', '1.46', '1.44', '']
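Since the original goal was reading CSV rows whose fields contain commas, the standard library's `csv` module is also worth knowing. It already treats commas inside double-quoted fields as data and removes the surrounding quotes. The sketch below uses a hypothetical helper `parse_csv_line` and a shortened excerpt of the row above; it is an alternative to the hand-written approach, not the method used in this post.

```python
import csv

def parse_csv_line(line: str) -> list[str]:
    # csv.reader accepts any iterable of lines; wrap one line in a list
    # and take the first (only) parsed row.
    return next(csv.reader([line]))

line = '"Austria","AUT","Fertility rate, total (births per woman)","2.69",""'
print(parse_csv_line(line))
# → ['Austria', 'AUT', 'Fertility rate, total (births per woman)', '2.69', '']
```

One difference to keep in mind: a trailing comma, as at the end of the full row above, makes `csv.reader` return one extra empty field at the end. For a whole file, opening it with `newline=''` and iterating over `csv.reader(f)` gives one list per row.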
