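6. Practical String Operations

Chapter 5 covered the basics of working with strings: creating them, accessing characters by index and slice, and the fundamental string methods. This chapter builds on that foundation with the more advanced string operations you will use constantly in real Python programs.

The focus is on practical techniques that solve everyday programming problems: breaking text apart and putting it back together, and producing neatly formatted output. These skills are essential for processing user input, generating reports, reading data files, and building any program that works with text.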

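6.1) Splitting and Joining Strings

One of the most common text-processing tasks is breaking a string into smaller pieces or combining several pieces into one string. Python provides powerful methods for both operations.

6.1.1) Splitting Strings with split()

The split() method breaks a string into a list of smaller strings based on a separator (also called a delimiter). It is especially useful for structured text such as CSV data, user input that contains several values, or splitting a sentence into words.

Basic splitting on whitespace:

Called with no arguments, split() splits on any run of whitespace (spaces, tabs, newlines), so the result never contains empty strings: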

python

# split_basic.py
sentence = "Python is a powerful programming language"
words = sentence.split()
print(words)  # Output: ['Python', 'is', 'a', 'powerful', 'programming', 'language']
print(len(words))  # Output: 6

Notice how split() handles multiple spaces:

python

# split_whitespace.py
messy_text = "Python    is     awesome"
words = messy_text.split()
print(words)  # Output: ['Python', 'is', 'awesome']

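Even with several spaces between words, split() treats any run of whitespace as a single separator and produces a clean list.

Splitting on a specific separator:

Pass the character or string to split on as an argument to specify exactly what separates the pieces: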

python

# split_separator.py
csv_data = "apple,banana,cherry,date"
fruits = csv_data.split(',')
print(fruits)  # Output: ['apple', 'banana', 'cherry', 'date']
 
date_string = "2024-03-15"
parts = date_string.split('-')
print(parts)  # Output: ['2024', '03', '15']
year = parts[0]
month = parts[1]
day = parts[2]
print(f"Year: {year}, Month: {month}, Day: {day}")  # Output: Year: 2024, Month: 03, Day: 15

Important difference: when you specify a separator, split() treats it literally, so consecutive separators produce empty strings:

python

# split_empty_strings.py
data = "apple,,cherry"
result = data.split(',')
print(result)  # Output: ['apple', '', 'cherry']
print(len(result))  # Output: 3

The empty string '' in the middle represents the "nothing" between the two consecutive commas. This differs from splitting on whitespace with no arguments.
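
To make the difference concrete, the short sketch below splits the same string both ways. The sample text is a made-up value chosen only to show leading, trailing, and doubled spaces.

```python
# split_compare.py
# Hypothetical sample text with leading, trailing, and doubled spaces.
text = "  a  b "

no_arg = text.split()          # whitespace mode: runs collapse, edges are ignored
with_space = text.split(' ')   # literal mode: every single space is a boundary

print(no_arg)      # Output: ['a', 'b']
print(with_space)  # Output: ['', '', 'a', '', 'b', '']
```

If you only want the words, the no-argument form is usually the safer choice; the explicit separator is the right tool when empty fields carry meaning, as in CSV-style data.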

Limiting the number of splits:

Pass a second argument (maxsplit) to control how many splits are performed:

python

# split_maxsplit.py
text = "one:two:three:four:five"
parts = text.split(':', 2)  # Split at the first 2 colons only
print(parts)  # Output: ['one', 'two', 'three:four:five']

This produces at most 3 elements (maxsplit + 1). After the specified number of splits, splitting stops and the rest of the string is left intact.
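
One place where maxsplit tends to help is a "key=value" line whose value may itself contain the separator. The following is a minimal sketch; the line content is a hypothetical example, not taken from any real configuration format.

```python
# split_key_value.py
# Hypothetical "key=value" line; note the second '=' inside the value.
line = "url=https://example.com/?page=2"

pieces = line.split('=', 1)   # split only at the first '='
key = pieces[0]
value = pieces[1]
print(key)    # Output: url
print(value)  # Output: https://example.com/?page=2

# Without maxsplit, the value would be cut at every '='.
all_pieces = line.split('=')
print(all_pieces)  # Output: ['url', 'https://example.com/?page', '2']
```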

Practical example: processing user input

python

# process_input.py
user_input = input("Enter your full name: ")
# User enters: "Alice Marie Johnson"
 
name_parts = user_input.split()
if len(name_parts) >= 2:
    first_name = name_parts[0]
    last_name = name_parts[-1]  # Last element
    print(f"First name: {first_name}")  # Output: First name: Alice
    print(f"Last name: {last_name}")    # Output: Last name: Johnson
else:
    print("Please enter at least a first and last name")

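6.1.2) Joining Strings with join()

The join() method is the inverse of split(): it combines a list of strings (or any iterable of strings) into a single string, placing a separator between each pair of elements. The syntax may look backwards at first, because you call the method on the separator string and pass the list as the argument.

Basic joining: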

python

# join_basic.py
words = ['Python', 'is', 'awesome']
sentence = ' '.join(words)
print(sentence)  # Output: Python is awesome
 
csv_line = ','.join(['apple', 'banana', 'cherry'])
print(csv_line)  # Output: apple,banana,cherry

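The string that join() is called on, such as ' ' or ',', becomes the separator placed between the elements.

Why the syntax is separator.join(list):

The syntax makes sense from the separator's point of view: "join these elements, placing me between each pair." Because join() is a string method, it works with any iterable of strings (not just lists), and the separator stands out clearly in the code.

Joining with different separators: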

python

# join_separators.py
items = ['eggs', 'milk', 'bread', 'butter']
 
# Comma-separated
print(', '.join(items))  # Output: eggs, milk, bread, butter

# Newline-separated (one element per line)
print('\n'.join(items))
# Output:
# eggs
# milk
# bread
# butter

# Hyphen-separated
print('-'.join(items))  # Output: eggs-milk-bread-butter

# No separator (plain concatenation)
print(''.join(items))  # Output: eggsmilkbreadbutter

Important: join() only works with strings:

Every element of the iterable must be a string. Trying to join numbers or other types raises an error:

python

# join_error.py
numbers = [1, 2, 3, 4]
# result = ','.join(numbers)  # This would cause: TypeError: sequence item 0: expected str instance, int found

To join non-string elements, convert them to strings first. Chapter 34 shows a more elegant way to do this; for now, you can convert each element by hand:

python

# join_numbers.py
numbers = [1, 2, 3, 4]
# Convert each number to a string by hand
string_numbers = [str(numbers[0]), str(numbers[1]), str(numbers[2]), str(numbers[3])]
result = ','.join(string_numbers)
print(result)  # Output: 1,2,3,4
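
Converting by index only works when you know the length of the list in advance. As a rough sketch, and assuming you are already comfortable with for loops and list.append(), the same conversion can be written so that it works for a list of any length:

```python
# join_numbers_loop.py
numbers = [10, 20, 30, 40, 50]

# Build a new list holding the string form of each number.
string_numbers = []
for number in numbers:
    string_numbers.append(str(number))

result = ','.join(string_numbers)
print(result)  # Output: 10,20,30,40,50
```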

Practical example: building a file path

python

# build_path.py
path_parts = ['home', 'user', 'documents', 'report.txt']
# On Unix-like systems (Linux, macOS)
unix_path = '/'.join(path_parts)
print(unix_path)  # Output: home/user/documents/report.txt
 
# On Windows
windows_path = '\\'.join(path_parts)
print(windows_path)  # Output: home\user\documents\report.txt

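Note: Chapter 26 covers the os.path module, which provides better cross-platform path handling; this example is only meant to illustrate join().

6.1.3) Combining split() and join() for Text Processing

These two methods work very well together for transforming text. Combined, they can clean up messy input, convert between formats, and extract and reassemble data: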

python

# transform_text.py
# Replace runs of spaces with a single space
messy = "Python    is     really    cool"
clean = ' '.join(messy.split())
print(clean)  # Output: Python is really cool
 
# Convert comma-separated to space-separated
csv_data = "apple,banana,cherry"
space_separated = ' '.join(csv_data.split(','))
print(space_separated)  # Output: apple banana cherry
 
# Remove all whitespace
text_with_spaces = "H e l l o"
no_spaces = ''.join(text_with_spaces.split())
print(no_spaces)  # Output: Hello
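
The same split-then-join pattern can be chained with other string methods. The sketch below turns a messy title into a lowercase, hyphen-separated form; the title text and the variable name slug are illustrative choices, not part of any standard API.

```python
# make_slug.py
# Hypothetical title with stray whitespace around and between the words.
title = "  Practical   String Operations "

# lower() normalizes case, split() drops all extra whitespace,
# and '-'.join() reassembles the words with hyphens.
slug = '-'.join(title.lower().split())
print(slug)  # Output: practical-string-operations
```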

6.1.4) Other Splitting Methods

Python provides additional splitting methods for specific needs.

rsplit() - splitting from the right:

python

# rsplit_example.py
path = "folder/subfolder/file.txt"
 
# Regular split with maxsplit
parts = path.split('/', 1)
print(parts)  # Output: ['folder', 'subfolder/file.txt']
 
# rsplit splits from the right
parts = path.rsplit('/', 1)
print(parts)  # Output: ['folder/subfolder', 'file.txt']

This is handy when you want to separate the last part of a string from everything before it.
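
A typical use is peeling the extension off a file name that may contain several dots. This is a small sketch with made-up file names; real path handling is better left to the modules covered later in the book.

```python
# rsplit_extension.py
# Hypothetical file name containing more than one dot.
filename = "archive.backup.tar"

pieces = filename.rsplit('.', 1)   # split only at the last dot
print(pieces)  # Output: ['archive.backup', 'tar']

# A name without any dot comes back as a single-element list,
# so check the length before indexing.
no_dot = "README".rsplit('.', 1)
print(no_dot)       # Output: ['README']
print(len(no_dot))  # Output: 1
```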

splitlines() - splitting on line breaks:

python

# splitlines_example.py
multiline = "Line 1\nLine 2\nLine 3"
lines = multiline.splitlines()
print(lines)  # Output: ['Line 1', 'Line 2', 'Line 3']
 
# Handles different line-ending styles too
mixed_lines = "Line 1\nLine 2\r\nLine 3\rLine 4"
all_lines = mixed_lines.splitlines()
print(all_lines)  # Output: ['Line 1', 'Line 2', 'Line 3', 'Line 4']

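The splitlines() method recognizes and splits on all the common line endings (\n, \r\n, \r), along with a few less common line-boundary characters such as form feed. That makes it more robust than split('\n') when processing text from a variety of sources.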
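
The difference shows up as soon as the text has Windows-style line endings or a trailing newline. A brief comparison, using a made-up two-line string:

```python
# splitlines_vs_split.py
# Hypothetical text: a Windows line ending, then a trailing newline.
text = "first\r\nsecond\n"

by_split = text.split('\n')
by_splitlines = text.splitlines()

print(by_split)       # Output: ['first\r', 'second', '']
print(by_splitlines)  # Output: ['first', 'second']
```

With split('\n'), the stray '\r' stays attached to the first line and the trailing newline produces an extra empty string; splitlines() avoids both.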

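6.2) String Formatting with f-strings

Producing formatted output is one of the most common programming tasks. You need to combine text with variable values, line up columns, format numbers, and create output that is easy for users to read. Python's f-strings (formatted string literals) are a modern, readable, and powerful way to do this.

6.2.1) Basic f-string Syntax

An f-string is a string literal prefixed with f or F that can contain expressions inside curly braces {}. Python evaluates these expressions and inserts their values, converted to strings, into the result: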

python

# fstring_basic.py
name = "Alice"
age = 30
greeting = f"Hello, {name}! You are {age} years old."
print(greeting)  # Output: Hello, Alice! You are 30 years old.

Almost any valid Python expression can go inside the {}:

python

# fstring_expressions.py
x = 10
y = 20
result = f"The sum of {x} and {y} is {x + y}"
print(result)  # Output: The sum of 10 and 20 is 30
 
price = 19.99
quantity = 3
total = f"Total cost: ${price * quantity}"
print(total)  # Output: Total cost: $59.97

6.2.2) Why f-strings Beat the Older Approaches

Before f-strings (introduced in Python 3.6), programmers used string concatenation or the format() method. Let's compare.

String concatenation (older and error-prone):

python

# concatenation_example.py
name = "Bob"
age = 25
# Numbers must be converted to strings, and many + operators are needed
message = "Hello, " + name + "! You are " + str(age) + " years old."
print(message)  # Output: Hello, Bob! You are 25 years old.

This approach is verbose, error-prone (forgetting str() raises a TypeError), and hard to read once several variables are involved.

f-strings (modern and clean):

python

# fstring_clean.py
name = "Bob"
age = 25
message = f"Hello, {name}! You are {age} years old."
print(message)  # Output: Hello, Bob! You are 25 years old.

f-strings convert values to strings automatically, are easy to read, and are usually at least as fast as the other approaches.
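
As a quick sanity check, the sketch below builds the same message with all three approaches and confirms that the results are identical. The variable names by_concat, by_format, and by_fstring are chosen here purely for illustration.

```python
# compare_approaches.py
name = "Bob"
age = 25

by_concat = "Hello, " + name + "! You are " + str(age) + " years old."
by_format = "Hello, {}! You are {} years old.".format(name, age)
by_fstring = f"Hello, {name}! You are {age} years old."

# All three produce exactly the same string.
print(by_concat == by_format == by_fstring)  # Output: True
```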

6.2.3) Expressions and Method Calls Inside f-strings

f-strings can contain complex expressions, method calls, and even function calls:

python

# fstring_methods.py
name = "alice"
print(f"Capitalized: {name.capitalize()}")  # Output: Capitalized: Alice
print(f"Uppercase: {name.upper()}")  # Output: Uppercase: ALICE
print(f"Length: {len(name)}")  # Output: Length: 5
 
# Arithmetic and comparisons
x = 10
print(f"Is {x} even? {x % 2 == 0}")  # Output: Is 10 even? True
 
# Indexing and slicing
text = "Python"
print(f"First letter: {text[0]}")  # Output: First letter: P
print(f"First three: {text[:3]}")  # Output: First three: Pyt

6.2.4) Formatting Numbers in f-strings

f-strings support format specifiers that control how a value is displayed. The syntax is {expression:format_spec}.

Controlling the number of decimal places in a float:

python

# fstring_decimals.py
pi = 3.14159265359
 
print(f"Default: {pi}")  # Output: Default: 3.14159265359
print(f"2 decimals: {pi:.2f}")  # Output: 2 decimals: 3.14
print(f"4 decimals: {pi:.4f}")  # Output: 4 decimals: 3.1416
print(f"No decimals: {pi:.0f}")  # Output: No decimals: 3

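The format specifier .2f means "format as a float with 2 digits after the decimal point." The f stands for fixed-point notation. The displayed value is rounded, not truncated, which is why .4f shows 3.1416.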
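
It is worth keeping in mind that a format specifier only affects the text that is produced; the number itself is unchanged. A small sketch (the variable name shown is just for illustration):

```python
# fstring_rounding.py
pi = 3.14159265359

shown = f"{pi:.4f}"    # formatting yields a new string
print(shown)           # Output: 3.1416
print(type(shown))     # Output: <class 'str'>
print(pi)              # Output: 3.14159265359  (the original value is untouched)

# Rounding can carry into the integer part.
almost_two = 1.999
print(f"{almost_two:.2f}")  # Output: 2.00
```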

Formatting with thousands separators:

python

# fstring_thousands.py
large_number = 1234567890
 
print(f"No separator: {large_number}")  # Output: No separator: 1234567890
print(f"With commas: {large_number:,}")  # Output: With commas: 1,234,567,890
print(f"With underscores: {large_number:_}")  # Output: With underscores: 1_234_567_890
 
# Combined with a decimal-place count
price = 1234567.89
print(f"Price: ${price:,.2f}")  # Output: Price: $1,234,567.89

Percentages:

python

# fstring_percentage.py
ratio = 0.847
print(f"Ratio: {ratio}")  # Output: Ratio: 0.847
print(f"Percentage: {ratio:.1%}")  # Output: Percentage: 84.7%
print(f"Percentage: {ratio:.2%}")  # Output: Percentage: 84.70%

The % format specifier multiplies the value by 100 and appends a percent sign.
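
Because the multiplication by 100 is built in, you pass the raw fraction rather than a value you have already scaled. A minimal sketch with made-up counts:

```python
# fstring_percentage_counts.py
# Hypothetical test results.
passed = 45
total = 60

rate = passed / total            # 0.75, a fraction between 0 and 1
whole_percent = f"{rate:.0%}"
one_decimal = f"{rate:.1%}"

print(whole_percent)  # Output: 75%
print(one_decimal)    # Output: 75.0%
```

Multiplying by 100 yourself before applying % would scale the value twice and display 7500%.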

6.2.5) Practical f-string Examples

Building a formatted report:

python

# report_example.py
product = "Laptop"
price = 899.99
quantity = 5
tax_rate = 0.08
 
subtotal = price * quantity
tax = subtotal * tax_rate
total = subtotal + tax
 
print(f"Product: {product}")  # Output: Product: Laptop
print(f"Price: ${price:.2f}")  # Output: Price: $899.99
print(f"Quantity: {quantity}")  # Output: Quantity: 5
print(f"Subtotal: ${subtotal:.2f}")  # Output: Subtotal: $4499.95
print(f"Tax (8%): ${tax:.2f}")  # Output: Tax (8%): $360.00
print(f"Total: ${total:.2f}")  # Output: Total: $4859.95

Creating clear messages for users:

python

# user_messages.py
username = "Alice"
login_count = 42
last_login = "2024-03-15"
 
welcome = f"Welcome back, {username}!"
stats = f"You've logged in {login_count} times. Last login: {last_login}"
 
print(welcome)  # Output: Welcome back, Alice!
print(stats)  # Output: You've logged in 42 times. Last login: 2024-03-15

6.2.6) Debugging with f-strings

Python 3.8 introduced the = specifier for debugging, which displays both the expression itself and its value:

python

# fstring_debug.py
x = 10
y = 20
z = x + y
 
print(f"{x=}")  # Output: x=10
print(f"{y=}")  # Output: y=20
print(f"{z=}")  # Output: z=30
print(f"{x + y=}")  # Output: x + y=30

This is very handy for quickly checking variable values during development, because you don't have to write the variable name twice.
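
The = specifier can also be combined with a format specifier, and it preserves any spaces you write around it. The sketch below assumes Python 3.8 or later; note that strings are shown with quotes, because = uses the repr() form when no format specifier is given.

```python
# fstring_debug_more.py
pi = 3.14159265359
name = "Alice"

with_spec = f"{pi=:.2f}"     # = followed by a format specifier
with_repr = f"{name=}"       # strings appear in quotes (repr form)
with_spaces = f"{name = }"   # spaces around = are kept in the output

print(with_spec)    # Output: pi=3.14
print(with_repr)    # Output: name='Alice'
print(with_spaces)  # Output: name = 'Alice'
```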

6.2.7) Escaping Curly Braces in f-strings

To include literal curly braces in an f-string, double them:

python

# fstring_escape.py
value = 42
# Single braces are expression placeholders
print(f"Value: {value}")  # Output: Value: 42
 
# Doubled braces become literal braces
print(f"Use {{value}} as a placeholder")  # Output: Use {value} as a placeholder
print(f"The value is {value}, shown as {{value}}")  # Output: The value is 42, shown as {value}
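
When a value has to appear inside literal braces, the two rules combine into three braces in a row: two for the literal brace and one for the placeholder. A short sketch, where the key name and the JSON-like output are illustrative only:

```python
# fstring_triple_braces.py
value = 42
key = "count"

# "{{" gives a literal "{", "{value}" is the placeholder, "}}" gives a literal "}".
wrapped = f"{{{value}}}"
print(wrapped)  # Output: {42}

# Building a small JSON-like fragment by hand.
json_like = f'{{"{key}": {value}}}'
print(json_like)  # Output: {"count": 42}
```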

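6.3) Formatting with format() and Format Specifiers

f-strings are the modern, recommended approach, but the format() method is still widely used and offers features worth knowing. format() and f-strings also share the same format specifier syntax, so understanding one helps you understand the other.

6.3.1) Basic format() Syntax

The format() method uses curly braces {} in the string as placeholders and takes the values to insert as arguments: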

python

# format_basic.py
template = "Hello, {}! You are {} years old."
message = template.format("Alice", 30)
print(message)  # Output: Hello, Alice! You are 30 years old.
 
# Calling format() directly on a string literal
greeting = "Hello, {}!".format("Bob")
print(greeting)  # Output: Hello, Bob!

6.3.2) Positional and Keyword Arguments

You can control which argument goes where by using position numbers or names.

Positional arguments:

python

# format_positional.py
# Default order (0, 1, 2, ...)
template = "{} + {} = {}"
result = template.format(5, 3, 8)
print(result)  # Output: 5 + 3 = 8
 
# Explicit positions
template = "{0} + {1} = {2}"
result = template.format(5, 3, 8)
print(result)  # Output: 5 + 3 = 8
 
# Reordering
template = "{2} = {0} + {1}"
result = template.format(5, 3, 8)
print(result)  # Output: 8 = 5 + 3
 
# Reusing the same value
template = "{0} times {0} equals {1}"
result = template.format(7, 49)
print(result)  # Output: 7 times 7 equals 49

Keyword arguments:

python

# format_keyword.py
template = "Hello, {name}! You are {age} years old."
message = template.format(name="Alice", age=30)
print(message)  # Output: Hello, Alice! You are 30 years old.
 
# Can be mixed with positional arguments (positional arguments must come first)
template = "{0}, your score is {score} out of {1}"
result = template.format("Alice", 100, score=95)
print(result)  # Output: Alice, your score is 95 out of 100

6.3.3) Format Specifiers in format()

Format specifiers work in format() just as they do in f-strings, after a : inside the braces:

python

# format_specifiers.py
pi = 3.14159265359
 
print("{:.2f}".format(pi))  # Output: 3.14
print("{:.4f}".format(pi))  # Output: 3.1416
 
# Combined with a named argument
print("{value:.2f}".format(value=pi))  # Output: 3.14
 
# Multiple values with different formats
template = "{name}'s score is {score:.1f}%"
result = template.format(name="Bob", score=87.654)
print(result)  # Output: Bob's score is 87.7%

6.3.4) When to Use format() Instead of f-strings

f-strings are usually preferred, but format() is useful in certain situations.

1. When the template string is defined separately from the data:

python

# format_templates.py
# Define the template once and use it many times with different data
email_template = "Dear {name},\n\nYour order #{order_id} has shipped.\n\nThank you!"
 
# Use the template for different customers
message1 = email_template.format(name="Alice", order_id=12345)
message2 = email_template.format(name="Bob", order_id=12346)
 
print(message1)
# Output:
# Dear Alice,
#
# Your order #12345 has shipped.
#
# Thank you!
 
print(message2)
# Output:
# Dear Bob,
#
# Your order #12346 has shipped.
#
# Thank you!

2. When the template comes from an external source:

python

# format_external.py
# The template might come from a configuration file or a database
# (reading files is covered in Chapter 24)
user_template = input("Enter message template: ")
# User enters: "Hello, {name}! Welcome to {place}."
 
message = user_template.format(name="Charlie", place="Python")
print(message)  # Output: Hello, Charlie! Welcome to Python.

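With an f-string, the expressions are evaluated immediately, so the template has to be written in the code. With format(), the template can be an ordinary string from any source. If the template refers to a name that you do not pass to format(), Python raises a KeyError, so only use templates from sources you trust.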
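
The timing difference is easy to see by changing a variable after the strings are created. In this sketch the names fixed and template are illustrative choices:

```python
# fstring_vs_template.py
name = "Alice"
fixed = f"Hello, {name}!"      # evaluated right here, using the current name
template = "Hello, {name}!"    # a plain string; nothing is filled in yet

name = "Bob"                   # change the variable afterwards

later = template.format(name=name)
print(fixed)  # Output: Hello, Alice!
print(later)  # Output: Hello, Bob!
```

The f-string captured the value at the moment it was written, while the format() template stayed a reusable pattern until it was filled in.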

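6.3.5) Accessing Object Attributes and Dictionary Keys

The format() method can access object attributes and dictionary keys directly in the placeholder. Note that dictionary keys are written without quotes inside the braces: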

python

# format_access.py
# Dictionary access
person = {"name": "Alice", "age": 30, "city": "Boston"}
message = "Name: {0[name]}, Age: {0[age]}, City: {0[city]}".format(person)
print(message)  # Output: Name: Alice, Age: 30, City: Boston
 
# Using a keyword argument
message = "{p[name]} is {p[age]} years old".format(p=person)
print(message)  # Output: Alice is 30 years old

Note: object attributes are covered in Chapter 30; the point here is that format() can reach into nested data structures as well.
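
For completeness, here is a brief sketch of attribute access and nested lookups. It uses a built-in complex number, whose real and imag attributes are part of Python itself, so no class definition is needed; the nested dictionary is made-up sample data.

```python
# format_attributes.py
# A complex number has built-in .real and .imag attributes.
point = 3 + 4j
coords = "x={p.real}, y={p.imag}".format(p=point)
print(coords)  # Output: x=3.0, y=4.0

# Lookups can be chained to reach into nested structures.
data = {"user": {"name": "Alice"}, "tags": ["admin", "staff"]}
summary = "{0[user][name]} is tagged {0[tags][0]}".format(data)
print(summary)  # Output: Alice is tagged admin
```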

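6.4) Aligning and Rounding Numbers in Formatted Output

Professional-looking output often depends on aligning values and formatting numbers. Both f-strings and format() provide powerful features for producing good-looking tables, reports, and displays.

6.4.1) Aligning Text

Format specifiers let you control the width and alignment of a value: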

python

# alignment_basic.py
# Syntax: {value:width}
# By default, strings are left-aligned and numbers are right-aligned
 
name = "Alice"
print(f"|{name}|")      # Output: |Alice|
print(f"|{name:10}|")   # Output: |Alice     |  (left-aligned, width 10)
print(f"|{name:>10}|")  # Output: |     Alice|  (right-aligned)
print(f"|{name:^10}|")  # Output: |  Alice   |  (centered)

The alignment options are:

< : left-align (default for strings)
> : right-align (default for numbers)
^ : center

Aligning numbers:

python

# alignment_numbers.py
value = 42
print(f"|{value}|")      # Output: |42|
print(f"|{value:5}|")    # Output: |   42|  (right-aligned by default)
print(f"|{value:<5}|")   # Output: |42   |  (left-aligned)
print(f"|{value:^5}|")   # Output: | 42  |  (centered)

6.4.2) Custom Fill Characters

You can specify the character used to fill the empty space:

python

# alignment_fill.py
name = "Bob"
print(f"|{name:*<10}|")  # Output: |Bob*******|
print(f"|{name:*>10}|")  # Output: |*******Bob|
print(f"|{name:*^10}|")  # Output: |***Bob****|
 
# Handy as a visual divider
print(f"{name:=^20}")    # Output: ========Bob=========

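The syntax is {value:fill_char align width}, with the three parts written together without spaces, for example {name:*^10}. A fill character is only recognized when it is immediately followed by an alignment option.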
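
The fill character and width do not have to be hard-coded. Format specifiers accept nested placeholders, so both can come from variables. A minimal sketch, with variable names chosen for illustration:

```python
# alignment_dynamic.py
title = "Report"
width = 20
fill = "-"

# Nested braces inside the format specifier are replaced first.
banner = f"{title:{fill}^{width}}"
print(banner)  # Output: -------Report-------

# The same idea works with format().
banner2 = "{:{fill}^{width}}".format(title, fill=fill, width=width)
print(banner2)  # Output: -------Report-------
```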

6.4.3) Combining Alignment and Number Formatting

Width, alignment, and number formatting can be combined:

python

# alignment_combined.py
price = 19.99
quantity = 5
total = price * quantity
 
# Width 10, right-aligned, 2 decimal places
print(f"Price:    ${price:>10.2f}")     # Output: Price:    $     19.99
print(f"Quantity: {quantity:>10}")      # Output: Quantity:          5
print(f"Total:    ${total:>10.2f}")     # Output: Total:    $     99.95
 
# Replace spaces with another character for visual effect
print(f"Total:    ${total:>10.2f}".replace(' ', '.'))  # Output: Total:....$.....99.95
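
Putting these pieces together, a small table can be printed with one format per column. The following is a rough sketch that assumes familiarity with for loops and tuples; the product data and column widths are made up for illustration.

```python
# table_example.py
# Hypothetical rows: (item name, unit price, quantity).
items = [("Laptop", 899.99, 2), ("Mouse", 19.5, 10), ("Monitor", 249.0, 3)]

# Text columns are left-aligned, numeric columns right-aligned.
header = f"{'Item':<10}{'Price':>10}{'Qty':>5}"
lines = [header, "-" * len(header)]

for name, price, qty in items:
    lines.append(f"{name:<10}{price:>10.2f}{qty:>5}")

print('\n'.join(lines))
```

Every row uses the same widths (10, 10, and 5 characters), so the columns line up regardless of how long each value is, as long as no value exceeds its column width.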

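This chapter covered essential string-manipulation techniques that you will use in nearly every Python program: splitting and joining strings for text processing, producing formatted output with f-strings and format specifiers, and aligning and formatting numbers for professional-looking displays.

These skills are the foundation for working with text data in Python. You can now process user input, create formatted reports, and handle data that comes from files (files are covered in Chapter 24). You will keep using these techniques as you continue learning Python, so practice them until they feel natural.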