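34. Comprehensions: A Compact Way to Create Lists, Dictionaries, and Sets

Comprehensions are one of Python's most elegant features. They let you create and transform collections in a single, readable line of code. Instead of writing several lines with a loop and append calls, a comprehension expresses the same logic more concisely and often more clearly.

In this chapter, we'll explore how to use list comprehensions, dictionary comprehensions, and set comprehensions to write more Pythonic code. We'll see how to incorporate conditional logic, when to choose a comprehension over a traditional loop, and how to handle more complex cases with nested iteration.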

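34.1) Creating and Transforming Lists with List Comprehensions

34.1.1) Basic List Comprehension Syntax

A list comprehension provides a compact way to create a new list by applying an expression to each element of an existing iterable. The basic syntax is:

python

[expression for item in iterable]

This creates a new list in which each element is the result of evaluating expression for one item of iterable (any object you can loop over, such as a list, a range, or a string).

Let's start with a simple example. Suppose we want a list of the squares of the numbers 0 through 4:

python

# Traditional approach using a loop
squares = []
for number in range(5):
    squares.append(number ** 2)
print(squares)  # Output: [0, 1, 4, 9, 16]

With a list comprehension, we can express this more concisely:

python

# Using a list comprehension
squares = [number ** 2 for number in range(5)]
print(squares)  # Output: [0, 1, 4, 9, 16]

Both approaches produce the same result, but the comprehension is more compact and, once you are comfortable with the syntax, usually easier to read. It states directly that we are building a list of squared values.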
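The iterable in a comprehension does not have to be a range. As a small sketch (the variable names are illustrative), here is the same pattern applied to a list of strings and to a single string:

```python
# Any iterable works: here a list of words...
words = ["hi", "hello", "hey"]
word_lengths = [len(word) for word in words]
print(word_lengths)  # Output: [2, 5, 3]

# ...and here a string, which yields one character at a time
letters = [char.upper() for char in "abc"]
print(letters)  # Output: ['A', 'B', 'C']
```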

34.1.2) Transforming Existing Data

List comprehensions excel at transforming data from one form into another. Let's look at some practical examples.

Converting Celsius temperatures to Fahrenheit:

python

# Temperatures in Celsius
celsius_temps = [0, 10, 20, 30, 40]

# Convert to Fahrenheit with the formula F = C * 9/5 + 32
fahrenheit_temps = [temp * 9/5 + 32 for temp in celsius_temps]
print(fahrenheit_temps)  # Output: [32.0, 50.0, 68.0, 86.0, 104.0]

Converting strings to uppercase:

python

# Product codes in lowercase
product_codes = ["abc123", "def456", "ghi789"]

# Normalize them to uppercase
uppercase_codes = [code.upper() for code in product_codes]
print(uppercase_codes)  # Output: ['ABC123', 'DEF456', 'GHI789']

34.1.3) Creating Lists from Range Objects

List comprehensions pair naturally with range(), which we learned about in Chapter 12. This is useful for generating sequences that follow a specific pattern:

python

# Generate the even numbers from 0 to 10
evens = [n * 2 for n in range(6)]  # n runs from 0 to 5, so n * 2 gives 0, 2, 4, 6, 8, 10
print(evens)  # Output: [0, 2, 4, 6, 8, 10]

# Generate multiples of 5
multiples_of_five = [n * 5 for n in range(1, 6)]
print(multiples_of_five)  # Output: [5, 10, 15, 20, 25]

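34.1.4) Comprehensions vs. Building Lists with append

It helps to understand the difference in style: a list comprehension describes the entire list in a single expression, while the traditional loop builds the list step by step with repeated append calls. Both produce the same result, but when you are creating a new list, the comprehension is usually a bit faster and is considered more Pythonic.

Here is a side-by-side comparison:

python

# Traditional loop approach
result = []
for i in range(5):
    result.append(i * 3)
print(result)  # Output: [0, 3, 6, 9, 12]

# List comprehension approach
result = [i * 3 for i in range(5)]
print(result)  # Output: [0, 3, 6, 9, 12]

Both work, but the comprehension is more concise and states the intent clearly: "create a list where each value is i * 3."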

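34.2) Conditional Logic in List Comprehensions

34.2.1) Filtering with an if Condition

One of the most powerful features of list comprehensions is the ability to filter elements based on a condition. Add an if clause at the end of the comprehension to include only the elements that meet the condition:

python

[expression for item in iterable if condition]

The if clause acts as a filter: Python evaluates the condition for each element, and only elements for which it is True are included in the result. Elements that fail the condition are skipped entirely.

Let's see how this works with a simple example:

python

# Get only the even numbers from 0 to 9
numbers = range(10)
evens = [n for n in numbers if n % 2 == 0]
print(evens)  # Output: [0, 2, 4, 6, 8]

Here, n % 2 == 0 checks whether a number is even. Only numbers that pass the test are included in the new list.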
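To make the filtering rule concrete, here is a sketch of the loop that the comprehension above corresponds to: the if clause becomes an if statement wrapped around the append call.

```python
numbers = range(10)

# Loop version: the if statement guards the append
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)

# Comprehension version: the same condition moves to the end
evens_comp = [n for n in numbers if n % 2 == 0]

print(evens_loop == evens_comp)  # Output: True
print(evens_comp)  # Output: [0, 2, 4, 6, 8]
```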

Filtering student scores:

python

# Student test scores
scores = [45, 78, 92, 65, 88, 55, 73, 95]

# Keep only passing scores (>= 70)
passing_scores = [score for score in scores if score >= 70]
print(passing_scores)  # Output: [78, 92, 88, 73, 95]

34.2.2) Transforming Filtered Elements

You can combine filtering with transformation, applying the expression only to the elements that pass the filter:

python

# Student scores
scores = [45, 78, 92, 65, 88, 55, 73, 95]

# Keep passing scores and scale them to a 0-10 range
scaled_passing = [score / 10 for score in scores if score >= 70]
print(scaled_passing)  # Output: [7.8, 9.2, 8.8, 7.3, 9.5]
# First filter (keep only >= 70), then transform (divide by 10)

Transforming and filtering strings:

python

# Product names with inconsistent capitalization
products = ["apple", "BANANA", "cherry", "DATE", "elderberry"]

# Keep products whose names are longer than 5 characters, in uppercase
long_products_upper = [product.upper() for product in products if len(product) > 5]
print(long_products_upper)  # Output: ['BANANA', 'CHERRY', 'ELDERBERRY']

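34.2.3) Conditional Expressions (if-else) in Comprehensions

Sometimes you want to transform elements differently depending on a condition rather than filter them out. In that case, you can use a conditional expression (covered in Chapter 10) in the expression part of the comprehension:

python

[expression_if_true if condition else expression_if_false for item in iterable]

This is different from filtering. Here, every element appears in the result; the if-else only decides which expression is applied to each element. The conditional expression sits in the expression part, before the for clause.

Note the syntax difference:

Filtering: [expr for item in seq if condition] - the if comes at the end, with no else
Conditional expression: [expr_if if cond else expr_else for item in seq] - the if-else is part of the expression, before the for

python

# Classify numbers as even or odd
numbers = range(6)
classifications = ["even" if n % 2 == 0 else "odd" for n in numbers]
print(classifications)  # Output: ['even', 'odd', 'even', 'odd', 'even', 'odd']

Applying different transformations based on a condition:

python

# Student scores
scores = [45, 78, 92, 65, 88, 55, 73, 95]

# Add 10 bonus points to failing scores; leave passing scores unchanged
adjusted_scores = [score + 10 if score < 70 else score for score in scores]
print(adjusted_scores)  # Output: [55, 78, 92, 75, 88, 65, 73, 95]

In both examples, notice that:

Every element of the original list appears in the result
The if-else decides what value each element becomes
No elements are filtered out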
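For comparison with the filtering loop, here is a sketch of the loop equivalent of the score adjustment above. Whichever branch runs, exactly one value is appended per element, which is why the result always has the same length as the input:

```python
scores = [45, 78, 92, 65, 88, 55, 73, 95]

# Loop version: exactly one append per element, whichever branch runs
adjusted_loop = []
for score in scores:
    if score < 70:
        adjusted_loop.append(score + 10)
    else:
        adjusted_loop.append(score)

# Comprehension version with a conditional expression
adjusted_comp = [score + 10 if score < 70 else score for score in scores]

print(adjusted_loop == adjusted_comp)  # Output: True
print(len(adjusted_comp) == len(scores))  # Output: True
```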

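34.2.4) Understanding the Difference: Filtering vs. Conditional Expressions

It is essential to understand the difference between these two patterns:

Filtering (if at the end) - some elements are excluded:

python

# Include only positive numbers
numbers = [-2, 5, -1, 8, 0, 3]
positives = [n for n in numbers if n > 0]
print(positives)  # Output: [5, 8, 3]
print(len(positives))  # Output: 3 (only 3 elements)
# Process: check the condition → if True, include the element → if False, skip it

Conditional expression (if-else inside the expression) - all elements are included, but transformed differently:

python

# Replace anything that is not positive with 0; keep positive numbers
numbers = [-2, 5, -1, 8, 0, 3]
non_negatives = [n if n > 0 else 0 for n in numbers]
print(non_negatives)  # Output: [0, 5, 0, 8, 0, 3]
print(len(non_negatives))  # Output: 6 (all 6 elements)
# Process: check the condition → if True, use the first expr → if False, use the second expr → always include the result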
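The two patterns can also appear together in one comprehension. In this sketch, the trailing if first drops the zero, and the conditional expression then labels each remaining number:

```python
numbers = [-2, 5, -1, 8, 0, 3]

# Trailing if: filter out 0
# Leading if-else: choose a label for each remaining number
labels = ["positive" if n > 0 else "negative" for n in numbers if n != 0]
print(labels)  # Output: ['negative', 'positive', 'negative', 'positive', 'positive']
print(len(labels))  # Output: 5
```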

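34.3) Dictionary Comprehensions

34.3.1) Basic Dictionary Comprehension Syntax

Just as list comprehensions create lists, dictionary comprehensions create dictionaries. The syntax is similar, but you specify both a key and a value:

python

{key_expression: value_expression for item in iterable}

This creates a new dictionary in which each key-value pair is produced from one item of the iterable.

Let's start with a simple example that maps numbers to their squares:

python

# Create a dictionary of numbers and their squares
squares_dict = {n: n ** 2 for n in range(5)}
print(squares_dict)  # Output: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}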
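As with lists, a dictionary comprehension corresponds to a simple loop. In this sketch, each iteration assigns one key-value pair:

```python
# Loop version: assign one key-value pair per iteration
squares_loop = {}
for n in range(5):
    squares_loop[n] = n ** 2

# Comprehension version
squares_dict = {n: n ** 2 for n in range(5)}

print(squares_loop == squares_dict)  # Output: True
```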

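Creating a dictionary from two lists:

python

# Student names and their scores
names = ["Alice", "Bob", "Charlie"]
scores = [85, 92, 78]

# Create a dictionary mapping names to scores
student_scores = {names[i]: scores[i] for i in range(len(names))}
print(student_scores)  # Output: {'Alice': 85, 'Bob': 92, 'Charlie': 78}

A more elegant way to combine two sequences is zip(), which we'll learn about in Chapter 37. For now, the index-based approach works fine.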

34.3.2) Transforming Existing Dictionaries

Dictionary comprehensions are well suited to transforming existing dictionaries. You can modify the keys, the values, or both.

When iterating over a dictionary in a comprehension, use .items() to access both keys and values. The .items() method returns key-value pairs, which you can unpack in the for clause:

python

# Original prices in dollars
prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.00}

# Convert to cents (multiply by 100)
prices_in_cents = {fruit: price * 100 for fruit, price in prices.items()}
print(prices_in_cents)  # Output: {'apple': 150.0, 'banana': 75.0, 'cherry': 200.0}

Transforming keys:

python

# Product codes in lowercase
codes = {"abc": 100, "def": 200, "ghi": 300}

# Convert the keys to uppercase
uppercase_codes = {code.upper(): quantity for code, quantity in codes.items()}
print(uppercase_codes)  # Output: {'ABC': 100, 'DEF': 200, 'GHI': 300}

Transforming both keys and values:

python

# Student names and scores
scores = {"alice": 85, "bob": 92, "charlie": 78}

# Capitalize the names and scale the scores to a 0-10 range
formatted_scores = {name.capitalize(): score / 10 for name, score in scores.items()}
print(formatted_scores)  # Output: {'Alice': 8.5, 'Bob': 9.2, 'Charlie': 7.8}
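If you loop over a dictionary without .items(), you get only its keys. This sketch shows that both forms give the same result here, but the .items() version avoids looking each value up again by key:

```python
scores = {"alice": 85, "bob": 92, "charlie": 78}

# Iterating a dict directly yields keys, so the value is looked up with scores[name]
by_key = {name.capitalize(): scores[name] / 10 for name in scores}

# Unpacking .items() gives the key and value together
by_items = {name.capitalize(): score / 10 for name, score in scores.items()}

print(by_key == by_items)  # Output: True
print(by_items)  # Output: {'Alice': 8.5, 'Bob': 9.2, 'Charlie': 7.8}
```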

34.3.3) Filtering Dictionary Entries

Like list comprehensions, dictionary comprehensions can include a condition to filter entries:

python

# Student scores
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "David": 55, "Eve": 78}

# Keep only passing scores (>= 70)
passing_scores = {name: score for name, score in scores.items() if score >= 70}
print(passing_scores)  # Output: {'Alice': 85, 'Charlie': 92, 'Eve': 78}

Filtering by a property of the key:

python

# Product inventory
inventory = {"apple": 50, "banana": 30, "apricot": 20, "cherry": 40}

# Keep only products whose names start with 'a'
a_products = {product: quantity for product, quantity in inventory.items() 
              if product.startswith('a')}
print(a_products)  # Output: {'apple': 50, 'apricot': 20}

34.3.4) Creating Dictionaries from Sequences

Dictionary comprehensions are useful for building lookup dictionaries from sequences:

python

# A list of words
words = ["python", "java", "ruby", "javascript"]

# Map each word to its length
word_lengths = {word: len(word) for word in words}
print(word_lengths)  # Output: {'python': 6, 'java': 4, 'ruby': 4, 'javascript': 10}
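Keep in mind that dictionary keys are unique. If the key expression produces the same key more than once, each later pair overwrites the earlier one. In this sketch, keying the same words by their length keeps only the last word of each length:

```python
words = ["python", "java", "ruby", "javascript"]

# "java" and "ruby" both have length 4, so "ruby" (processed later) replaces "java"
by_length = {len(word): word for word in words}
print(by_length)  # Output: {6: 'python', 4: 'ruby', 10: 'javascript'}
```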

34.3.5) Conditional Expressions in Dictionary Comprehensions

You can use a conditional expression to compute values differently depending on a condition:

python

# Student scores
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "David": 55}

# Attach a "Pass" or "Fail" status
scores_with_status = {name: "Pass" if score >= 70 else "Fail" 
                      for name, score in scores.items()}
print(scores_with_status)  # Output: {'Alice': 'Pass', 'Bob': 'Fail', 'Charlie': 'Pass', 'David': 'Fail'}

Applying different transformations:

python

# Product prices
prices = {"apple": 1.50, "banana": 0.75, "cherry": 2.50}

# Apply a 10% discount to more expensive items (> $2.00)
discounted_prices = {product: price * 0.9 if price > 2.00 else price 
                     for product, price in prices.items()}
print(discounted_prices)  # Output: {'apple': 1.5, 'banana': 0.75, 'cherry': 2.25}

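34.4) Set Comprehensions

34.4.1) Basic Set Comprehension Syntax

Set comprehensions create sets using syntax similar to list comprehensions, but with curly braces:

python

{expression for item in iterable}

The result is a set, which means duplicate values are removed automatically and the order of elements is not guaranteed.

python

# Create a set of squares
squares_set = {n ** 2 for n in range(6)}
print(squares_set)  # Output: {0, 1, 4, 9, 16, 25}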
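Because dictionary comprehensions also use curly braces, it is the colon that distinguishes the two kinds of comprehension. One consequence worth a quick check: empty braces create a dictionary, so an empty set must be written as set():

```python
# No colon: a set comprehension
squares_set = {n ** 2 for n in range(6)}
# key: value pairs: a dictionary comprehension
squares_dict = {n: n ** 2 for n in range(6)}

print(type(squares_set).__name__)   # Output: set
print(type(squares_dict).__name__)  # Output: dict
print(type({}).__name__)            # Output: dict
print(type(set()).__name__)         # Output: set
```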

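The key difference from a list comprehension is that a set automatically eliminates duplicates:

python

# List comprehension - keeps duplicates
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
squared_list = [n ** 2 for n in numbers]
print(squared_list)  # Output: [1, 4, 4, 9, 9, 9, 16, 16, 16, 16]

# Set comprehension - removes duplicates
squared_set = {n ** 2 for n in numbers}
print(squared_set)  # Output: {16, 1, 4, 9} (order may vary)

Note that the order of the set output may differ from what you see here. Sets are unordered collections, so Python may display their elements in any order.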
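When you need a predictable order, for example when printing results or comparing them in a test, a common approach is to pass the set to sorted(), which returns a new list in ascending order:

```python
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
squared_set = {n ** 2 for n in numbers}

# sorted() accepts any iterable and always returns a list
ordered = sorted(squared_set)
print(ordered)  # Output: [1, 4, 9, 16]
```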

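34.4.2) Extracting Unique Values

Set comprehensions are a natural fit when you need to extract the unique values from a collection:

python

# Student responses (with duplicates)
responses = ["yes", "no", "yes", "maybe", "no", "yes", "maybe"]

# Get the unique responses
unique_responses = {response for response in responses}
print(unique_responses)  # Output: {'maybe', 'yes', 'no'} (order may vary)

Extracting the unique characters from a string:

python

# Text with repeated characters
text = "mississippi"

# Get the unique characters
unique_chars = {char for char in text}
print(unique_chars)  # Output: {'m', 'i', 's', 'p'} (order may vary)

34.4.3) Transforming and Filtering with Set Comprehensions

Like other comprehensions, set comprehensions can include transformations and conditions:

python

# Student names
names = ["Alice", "bob", "CHARLIE", "david", "EVE"]

# Get the unique first letters, in uppercase
first_letters = {name[0].upper() for name in names}
print(first_letters)  # Output: {'A', 'B', 'C', 'D', 'E'} (order may vary)

Filtering with a condition:

python

# Numbers with duplicates
numbers = [1, -2, 3, -4, 5, -2, 3, 6, -4]

# Get the unique positive numbers
positive_numbers = {n for n in numbers if n > 0}
print(positive_numbers)  # Output: {1, 3, 5, 6}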

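34.4.4) When Set Comprehensions Are Most Useful

Set comprehensions are especially valuable when:

You need unique values: duplicates are removed automatically
Order doesn't matter: sets are unordered, so use them when the order of elements is irrelevant
You will perform set operations: the result can be used in unions, intersections, and so on (as we learned in Chapter 17)

python

# Enrollment lists for two courses
course_a = ["Alice", "Bob", "Charlie", "David"]
course_b = ["Charlie", "David", "Eve", "Frank"]

# A set comprehension with two for clauses (see Section 34.6) collects the unique students across both courses
all_students = {student for course in [course_a, course_b] for student in course}
print(all_students)  # Output: {'Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'} (order may vary)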
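As a sketch of the set-operation point above, the results of two set comprehensions can be combined directly. The email addresses here are made-up sample data, and sorted() is used only to make the printed order predictable:

```python
# Hypothetical sample data: email addresses from two mailing lists
list_a = ["ann@example.com", "bo@test.org", "cy@example.com"]
list_b = ["di@test.org", "ed@sample.net"]

# Extract the unique domains from each list
domains_a = {email.split("@")[1] for email in list_a}
domains_b = {email.split("@")[1] for email in list_b}

print(sorted(domains_a & domains_b))  # Output: ['test.org']
print(sorted(domains_a | domains_b))  # Output: ['example.com', 'sample.net', 'test.org']
print(sorted(domains_a - domains_b))  # Output: ['example.com']
```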

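34.5) Choosing Between Comprehensions and Loops

34.5.1) When Comprehensions Are Better

When you create a new collection by transforming or filtering an existing one, a comprehension is usually the better choice. It is more concise, often more readable, and typically faster than the equivalent loop.

Comprehensions shine when:

You are creating a new collection from an existing one:

python

# A good use of a comprehension
prices = [10.99, 25.50, 8.75, 15.00]
discounted = [price * 0.9 for price in prices]

The transformation is straightforward:

python

# Clear and concise
names = ["alice", "bob", "charlie"]
uppercase_names = [name.upper() for name in names]

You are filtering on a simple condition:

python

# Easy to understand
scores = [85, 92, 78, 65, 88, 55, 73, 95]
passing = [score for score in scores if score >= 70]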

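34.5.2) When Traditional Loops Are Better

In some situations, however, a traditional loop is more appropriate and more readable.

Use a loop when:

The logic is complex or involves multiple steps:

python

# Too complex for a comprehension
results = []
for score in scores:
    if score >= 90:
        grade = "A"
    elif score >= 80:
        grade = "B"
    elif score >= 70:
        grade = "C"
    else:
        grade = "F"
    results.append({"score": score, "grade": grade})

You could write this as a comprehension, but it would be harder to read.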
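A middle ground is to move the multi-step logic into a small function and keep the comprehension itself simple. In this sketch, letter_grade is a hypothetical helper introduced for illustration:

```python
def letter_grade(score):
    """Hypothetical helper: map a numeric score to a letter grade."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"

scores = [85, 92, 78, 65]

# The comprehension stays short because the branching lives in the function
results = [{"score": score, "grade": letter_grade(score)} for score in scores]
print(results)
# Output: [{'score': 85, 'grade': 'B'}, {'score': 92, 'grade': 'A'}, {'score': 78, 'grade': 'C'}, {'score': 65, 'grade': 'F'}]
```

Whether this reads better than the explicit loop is a judgment call; it tends to pay off when the same grading logic is needed in more than one place.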

You need to do more than build a collection:

python

# For I/O or other side effects, a loop is clearer
for filename in files:
    with open(filename) as f:
        content = f.read()
        print(f"Processing {filename}")
        # ... more processing

You need to modify an existing collection in place:

python

# Modify the existing list in place (a comprehension always builds a new list)
numbers = [1, 2, 3, 4, 5]
for i in range(len(numbers)):
    numbers[i] *= 2
print(numbers)  # Output: [2, 4, 6, 8, 10]

You need break or continue as part of more complex logic:

python

# Find the first match and do some extra processing
found = None
for item in items:
    if item.startswith("target"):
        found = item
        print(f"Found: {found}")
        break

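34.5.3) Readability Considerations

The most important factor is readability. If a comprehension becomes too long or too complex, break it out into a traditional loop:

python

# Hard to read - too much happening on one line
result = [item.strip().upper() for item in items if len(item) > 5 and item.startswith('a')]

# Better - use a loop when the logic is complex
result = []
for item in items:
    if len(item) > 5 and item.startswith('a'):
        cleaned = item.strip().upper()
        result.append(cleaned)

A good rule of thumb: if your comprehension doesn't fit comfortably on one line (or, at most, two cleanly formatted lines), consider using a loop instead.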

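34.5.4) Performance Considerations

Comprehensions are usually faster than equivalent loops, mainly because CPython adds each element with a dedicated internal instruction instead of looking up and calling the list's append method on every iteration. For small to medium-sized collections, however, the difference is usually negligible.

python

# Both produce the same result
# The comprehension is slightly faster
squares_comp = [n ** 2 for n in range(1000)]

# The loop is slightly slower, but more flexible
squares_loop = []
for n in range(1000):
    squares_loop.append(n ** 2)

For most practical purposes, choose based on readability rather than performance. Optimize for speed only when profiling shows that a particular operation is a bottleneck.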
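If you want to check the difference on your own machine, the standard library's timeit module offers a quick way to measure it. This is a rough sketch; the exact timings depend on your Python version and hardware, so no output values are shown:

```python
import timeit

def with_comprehension():
    return [n ** 2 for n in range(1000)]

def with_loop():
    result = []
    for n in range(1000):
        result.append(n ** 2)
    return result

# Both functions must build the same list
assert with_comprehension() == with_loop()

# Call each function 1,000 times and report the total time in seconds
comp_time = timeit.timeit(with_comprehension, number=1000)
loop_time = timeit.timeit(with_loop, number=1000)
print(f"comprehension: {comp_time:.3f}s")
print(f"loop:          {loop_time:.3f}s")
```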

34.5.5) Combining Both Approaches

Sometimes the best solution combines both approaches:

python

# Student records
student_data = [
    {"name": "Alice", "score": 85},
    {"name": "Bob", "score": 92},
    {"name": "Charlie", "score": 78}
]

# Use a comprehension for the simple transformation: extracting the scores
scores = [student["score"] for student in student_data]

# Use a loop for more involved processing
for student in student_data:
    score = student["score"]
    if score >= 90:
        print(f"{student['name']}: Excellent!")
    elif score >= 80:
        print(f"{student['name']}: Good job!")
    else:
        print(f"{student['name']}: Keep working!")

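34.6) Nested Loops and Multiple for Clauses

34.6.1) Understanding Multiple for Clauses

A comprehension can contain multiple for clauses, which is equivalent to nested loops. The syntax is:

python

[expression for item1 in iterable1 for item2 in iterable2]

This is equivalent to:

python

result = []
for item1 in iterable1:
    for item2 in iterable2:
        result.append(expression)

The key point: the for clauses are read from left to right, in the same order the nested loops are written from top to bottom. The first for clause is the outermost loop, and the last one is the innermost loop.

Let's start with a simple example that creates every combination of two lists:

python

# Two lists of values
colors = ["red", "blue"]
sizes = ["S", "M", "L"]

# Create all combinations
combinations = [(color, size) for color in colors for size in sizes]
print(combinations)
# Output: [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

This produces every possible pairing of a color with a size.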
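Because the first for clause is the outer loop, swapping the clauses changes the order of the results, though not which pairs appear. A quick check with the same lists:

```python
colors = ["red", "blue"]
sizes = ["S", "M", "L"]

# Outer loop over sizes this time, so each size is paired with every color first
by_size = [(color, size) for size in sizes for color in colors]
print(by_size)
# Output: [('red', 'S'), ('blue', 'S'), ('red', 'M'), ('blue', 'M'), ('red', 'L'), ('blue', 'L')]

# Same pairs, different order
by_color = [(color, size) for color in colors for size in sizes]
print(sorted(by_size) == sorted(by_color))  # Output: True
```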

34.6.2) Creating Coordinate Pairs

A common use case is generating coordinate pairs:

python

# Create a 3x3 grid of coordinates
coordinates = [(x, y) for x in range(3) for y in range(3)]
print(coordinates)
# Output: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

Building a multiplication table:

python

# Generate the multiplication combinations
products = [(x, y, x * y) for x in range(1, 4) for y in range(1, 4)]
for x, y, product in products:
    print(f"{x} × {y} = {product}")
# Output:
# 1 × 1 = 1
# 1 × 2 = 2
# 1 × 3 = 3
# 2 × 1 = 2
# 2 × 2 = 4
# 2 × 3 = 6
# 3 × 1 = 3
# 3 × 2 = 6
# 3 × 3 = 9

34.6.3) Flattening Nested Lists

Multiple for clauses are useful for flattening nested structures:

python

# A nested list of numbers
nested_numbers = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten into a single list
flat = [num for sublist in nested_numbers for num in sublist]
print(flat)  # Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

This is equivalent to:

python

flat = []
for sublist in nested_numbers:
    for num in sublist:
        flat.append(num)

Flattening a list of words into characters:

python

# A list of words
words = ["cat", "dog", "bird"]

# Get every character of every word
all_chars = [char for word in words for char in word]
print(all_chars)  # Output: ['c', 'a', 't', 'd', 'o', 'g', 'b', 'i', 'r', 'd']

34.6.4) Adding Conditions to Nested Comprehensions

You can add a condition to filter the results:

python

# Create pairs whose sum is even
pairs = [(x, y) for x in range(5) for y in range(5) if (x + y) % 2 == 0]
print(pairs)
# Output: [(0, 0), (0, 2), (0, 4), (1, 1), (1, 3), (2, 0), (2, 2), (2, 4), (3, 1), (3, 3), (4, 0), (4, 2), (4, 4)]
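An if clause can also be placed right after the for clause it depends on, which filters the outer loop before the inner loop runs. In this sketch, odd values of x are skipped entirely, so no inner iterations are spent on them:

```python
# The if after the first for filters x before the inner loop runs
pairs = [(x, y) for x in range(5) if x % 2 == 0 for y in range(x)]
print(pairs)
# Output: [(2, 0), (2, 1), (4, 0), (4, 1), (4, 2), (4, 3)]

# Equivalent nested loops: the if sits between the two for statements
pairs_loop = []
for x in range(5):
    if x % 2 == 0:
        for y in range(x):
            pairs_loop.append((x, y))
print(pairs == pairs_loop)  # Output: True
```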

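Finding the elements two lists have in common:

python

# Two lists of numbers
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]

# Keep x whenever it equals some y (the common elements)
common = [x for x in list1 for y in list2 if x == y]
print(common)  # Output: [4, 5]

Note: this comprehension compares every element of list1 with every element of list2. To find common elements, set intersection is more efficient: set(list1) & set(list2), which we learned about in Chapter 17.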
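One more difference is worth knowing: because the nested comprehension emits one element per matching (x, y) pair, duplicates in the input show up in the result. A small sketch with a repeated value:

```python
list1 = [1, 2, 4, 4, 5]
list2 = [4, 5, 6, 7, 8]

# Every matching (x, y) pair contributes one element, so 4 appears twice
common_list = [x for x in list1 for y in list2 if x == y]
print(common_list)  # Output: [4, 4, 5]

# Set intersection keeps each common value once
common_set = set(list1) & set(list2)
print(sorted(common_set))  # Output: [4, 5]
```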

34.6.5) Nested Dictionary Comprehensions

You can also use multiple for clauses in a dictionary comprehension:

python

# Create a dictionary of coordinate sums
coord_sums = {(x, y): x + y for x in range(3) for y in range(3)}
print(coord_sums)
# Output: {(0, 0): 0, (0, 1): 1, (0, 2): 2, (1, 0): 1, (1, 1): 2, (1, 2): 3, (2, 0): 2, (2, 1): 3, (2, 2): 4}
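A dictionary comprehension can also appear as the value expression of another one, which produces a dictionary of dictionaries rather than a single flat dictionary. A brief sketch of a small multiplication lookup table:

```python
# Outer keys are rows; each value is itself a dictionary built by an inner comprehension
table = {x: {y: x * y for y in range(1, 4)} for x in range(1, 4)}
print(table)
# Output: {1: {1: 1, 2: 2, 3: 3}, 2: {1: 2, 2: 4, 3: 6}, 3: {1: 3, 2: 6, 3: 9}}
print(table[2][3])  # Output: 6
```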

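34.6.6) When to Avoid Nested Comprehensions

Nested comprehensions are powerful, but they can quickly become hard to read. Consider these guidelines:

Acceptable - relatively simple:

python

# Two levels of nesting with a simple expression
matrix = [[i * j for j in range(3)] for i in range(3)]
print(matrix)  # Output: [[0, 0, 0], [0, 1, 2], [0, 2, 4]]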
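Note that this example is shaped differently from the multiple-for examples earlier in this section: the inner comprehension is the expression of the outer one, so the result is a list of lists, and the outer loop (for i) is written last. Putting both for clauses into a single comprehension instead gives one flat list:

```python
# Inner comprehension as the expression: one row (list) per i
matrix = [[i * j for j in range(3)] for i in range(3)]
print(matrix)  # Output: [[0, 0, 0], [0, 1, 2], [0, 2, 4]]

# Two for clauses in one comprehension: a single flat list, outer loop written first
flat = [i * j for i in range(3) for j in range(3)]
print(flat)  # Output: [0, 0, 0, 0, 1, 2, 0, 2, 4]
```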

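Getting complex - consider using loops:

python

# Three levels of nesting - hard to read
result = [[[i + j + k for k in range(2)] for j in range(2)] for i in range(2)]
# Nested loops would be clearer here

Rule of thumb: if you have more than two for clauses or levels of nesting, or the expression is complex, switch to traditional nested loops:

python

# Clearer with explicit loops
result = []
for i in range(2):
    middle = []
    for j in range(2):
        inner = []
        for k in range(2):
            inner.append(i + j + k)
        middle.append(inner)
    result.append(middle)

Comprehensions with multiple for clauses are powerful tools, but remember: clarity matters more than brevity. If a nested comprehension becomes hard to understand, use explicit loops instead.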