
36. Generators and Lazy Iteration

In Chapter 35, we learned how iteration works in Python through iterables and iterators. We saw that iterators return values one at a time when requested, which allows Python to process sequences without loading everything into memory at once. Now we'll explore generators, Python's most elegant and practical way to create iterators.

Generators are functions that can pause and resume their execution, producing values one at a time as they're requested rather than computing all values upfront and storing them in memory. This approach, called lazy evaluation, computes each value only when it is needed, and it makes generators one of Python's most powerful tools for writing memory-efficient code.

36.1) What Generators Are and Why They're Useful

36.1.1) The Problem with Creating Large Lists

Let's start by understanding the problem generators solve. Suppose you need to process a sequence of one million numbers. Here's the traditional approach using a list:

python
# Creating a list of one million squares
def get_squares_list(n):
    """Return a list of squares from 0 to n-1."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return squares
 
# This creates a list with 1,000,000 numbers in memory
numbers = get_squares_list(1_000_000)
print(f"First five squares: {numbers[:5]}")  # Output: First five squares: [0, 1, 4, 9, 16]

This approach has a significant problem: it creates and stores all one million numbers in memory at once, even if you only need to process them one at a time. For larger datasets or more complex calculations, this can consume enormous amounts of memory or even crash your program.

36.1.2) Introducing Generators: Computing Values On Demand

A generator is a special type of function that produces values one at a time, only when requested. Instead of building and returning a complete list, a generator computes each value as needed and "remembers" where it left off between calls.

Here's the same functionality implemented as a generator:

python
# Creating a generator of squares
def get_squares_generator(n):
    """Generate squares from 0 to n-1, one at a time."""
    for i in range(n):
        yield i * i  # yield pauses the function and returns a value
 
# This creates a generator object, not a list
squares_gen = get_squares_generator(1_000_000)
print(squares_gen)  # Output: <generator object get_squares_generator at 0x...>
 
# Get values one at a time
print(next(squares_gen))  # Output: 0
print(next(squares_gen))  # Output: 1
print(next(squares_gen))  # Output: 4

The generator doesn't compute all one million squares upfront. Instead, it computes each square only when you call next() on it. Between calls, the generator "pauses" and remembers its state (the current value of i).
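Because a generator remembers its position, any further iteration continues from wherever it left off, even if you switch from manual next() calls to a for loop or list(). A short sketch of this single-pass behavior, using a smaller range for brevity:

```python
def get_squares_generator(n):
    """Generate squares from 0 to n-1, one at a time."""
    for i in range(n):
        yield i * i

squares_gen = get_squares_generator(6)

# Consume the first two values manually
print(next(squares_gen))  # Output: 0
print(next(squares_gen))  # Output: 1

# Further iteration picks up right where next() left off
remaining = list(squares_gen)
print(remaining)  # Output: [4, 9, 16, 25]
```

The values 0 and 1 are gone once consumed; the generator never revisits them.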

36.1.3) Memory Efficiency: The Key Advantage

The memory difference between lists and generators becomes dramatic with large datasets. Let's compare:

python
import sys
 
# List approach: stores all values
def squares_list(n):
    return [i * i for i in range(n)]
 
# Generator approach: computes values on demand
def squares_generator(n):
    for i in range(n):
        yield i * i
 
# Compare memory usage for 100,000 numbers
list_result = squares_list(100_000)
gen_result = squares_generator(100_000)
 
print(f"List size in memory: {sys.getsizeof(list_result):,} bytes")
# Output: List size in memory: 800,984 bytes (actual size may vary)
 
print(f"Generator size in memory: {sys.getsizeof(gen_result)} bytes")
# Output: Generator size in memory: 200 bytes (actual size may vary)

The list consumes over 800 KB of memory, while the generator uses only 200 bytes—regardless of how many values it will eventually produce. The generator stores only the function's state (the current value of i and where to resume), not the actual sequence of values.

36.1.4) When Generators Are Useful

Generators excel in several common scenarios:

Processing Large Files:

python
def read_large_file(filename):
    """Generate lines from a file one at a time."""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()
 
# Process a huge log file without loading it all into memory
for line in read_large_file('huge_log.txt'):
    if 'ERROR' in line:
        print(line)

Infinite Sequences:

python
def fibonacci():
    """Generate Fibonacci numbers indefinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
 
# Generate Fibonacci numbers forever (or until you stop asking)
fib = fibonacci()
print([next(fib) for _ in range(10)])
# Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

36.1.5) Generators Are Iterators

As we learned in Chapter 35, generators are actually a special kind of iterator. They automatically implement the iterator protocol (__iter__() and __next__()), which is why they work seamlessly with for loops:

python
def countdown(n):
    """Generate countdown from n to 1."""
    while n > 0:
        yield n
        n -= 1
 
# Generators work directly in for loops
for num in countdown(5):
    print(num)
# Output:
# 5
# 4
# 3
# 2
# 1

When you use a generator in a for loop, Python automatically calls next() on it repeatedly until the generator is exhausted (raises StopIteration).
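This desugaring can be made explicit. The for loop above behaves roughly like the following manual loop (a simplified sketch of what Python does for you):

```python
def countdown(n):
    """Generate countdown from n to 1."""
    while n > 0:
        yield n
        n -= 1

gen = iter(countdown(5))   # for calls iter() first; a generator returns itself
collected = []
while True:
    try:
        num = next(gen)    # for calls next() on each iteration
    except StopIteration:  # the loop ends when the generator is exhausted
        break
    collected.append(num)

print(collected)  # Output: [5, 4, 3, 2, 1]
```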

36.2) Creating Generator Functions with yield

36.2.1) The yield Statement: Pausing and Resuming

The yield statement is what makes a function a generator. When Python encounters yield, it does something special: instead of returning a value and ending the function, it pauses the function and returns the value. The next time you call next() on the generator, execution resumes right after the yield statement.

Here's a simple example that demonstrates this pause-and-resume behavior:

python
def simple_generator():
    """Demonstrate how yield pauses execution."""
    print("Starting generator")
    yield 1
    print("Resuming after first yield")
    yield 2
    print("Resuming after second yield")
    yield 3
    print("Generator finished")
 
gen = simple_generator()
print("Created generator")
# Output:
# Created generator
 
print(f"First value: {next(gen)}")
# Output:
# Starting generator
# First value: 1
 
print(f"Second value: {next(gen)}")
# Output:
# Resuming after first yield
# Second value: 2
 
print(f"Third value: {next(gen)}")
# Output:
# Resuming after second yield
# Third value: 3
 
try:
    next(gen)
except StopIteration:
    print("Generator exhausted - no more values")
# Output:
# Generator finished
# Generator exhausted - no more values

Notice how the function's execution is interleaved with the calls to next(). Each yield pauses the function, and each next() resumes it from where it left off.

36.2.2) Generator State: Remembering Local Variables

Generators remember all their local variables between yields. This makes them useful for maintaining state across multiple calls:

python
def counter(start=0):
    """Generate sequential numbers starting from start."""
    current = start
    while True:
        yield current
        current += 1
 
# The generator remembers 'current' between yields
count = counter(10)
print(next(count))  # Output: 10
print(next(count))  # Output: 11
print(next(count))  # Output: 12
 
# Each generator has its own independent state
count1 = counter(0)
count2 = counter(100)
print(next(count1))  # Output: 0
print(next(count2))  # Output: 100
print(next(count1))  # Output: 1
print(next(count2))  # Output: 101

The variable current is preserved each time the generator pauses at a yield and resumes at the next next() call. This allows the generator to continue counting from its last value. Each generator instance maintains its own independent state.

36.2.3) Yielding in Loops: The Most Common Pattern

The most common use of generators is yielding values inside a loop. This pattern generates a sequence of values:

python
def even_numbers(start, end):
    """Generate even numbers in the given range."""
    current = start if start % 2 == 0 else start + 1
    while current <= end:
        yield current
        current += 2
 
# Use the generator
evens = even_numbers(1, 20)
print(list(evens))
# Output: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

Each iteration of the loop yields one value, then continues to the next iteration when next() is called again.

36.2.4) Multiple Yield Statements

A generator can have multiple yield statements at different points in its code. Execution flows through them in order:

python
def process_data(data):
    """Generate processed data with status messages."""
    yield "Starting processing..."
    
    cleaned = [item.strip().lower() for item in data]
    yield f"Cleaned {len(cleaned)} items"
    
    unique = list(set(cleaned))
    yield f"Found {len(unique)} unique items"
    
    for item in sorted(unique):
        yield item
 
# Process some data
data = ["  Apple  ", "Banana", "apple", "Cherry", "BANANA"]
processor = process_data(data)
 
for result in processor:
    print(result)
# Output:
# Starting processing...
# Cleaned 5 items
# Found 3 unique items
# apple
# banana
# cherry

This pattern is useful for generators that need to perform setup work, yield status information, and then yield actual data.

36.3) Generator Expressions vs List Comprehensions

36.3.1) Introducing Generator Expressions

In Chapter 34, we learned about list comprehensions—a concise way to create lists. Generator expressions use almost identical syntax but create generators instead of lists.

A generator expression is essentially a compact way to write a simple generator function. Compare these two equivalent approaches:

python
# Generator function
def squares_function(n):
    for x in range(n):
        yield x * x
 
# Generator expression - does the same thing
squares_expression = (x * x for x in range(10))
 
# Both create generator objects
gen1 = squares_function(10)
gen2 = squares_expression
 
print(type(gen1))  # Output: <class 'generator'>
print(type(gen2))  # Output: <class 'generator'>
 
# Both produce the same values
print(list(squares_function(10)))  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(list(squares_expression))  # Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The syntax is almost identical to a list comprehension; the visible difference is parentheses () instead of square brackets []. The behavioral difference is what gets created: a list comprehension builds the entire list immediately, while a generator expression creates a generator:

python
# List comprehension - creates entire list in memory
squares_list = [x * x for x in range(10)]
print(squares_list)
# Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
 
# Generator expression - creates generator object
squares_gen = (x * x for x in range(10))
print(squares_gen)
# Output: <generator object <genexpr> at 0x...>
 
# Convert to list to see values
print(list(squares_gen))
# Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Generator expressions provide the same concise syntax as list comprehensions but with the memory efficiency of generators.

36.3.2) Memory Comparison: When It Matters

For small sequences, the memory difference between list comprehensions and generator expressions is negligible. But for large sequences, it becomes significant:

python
import sys
 
# Small sequence - minimal difference
small_list = [x for x in range(100)]
small_gen = (x for x in range(100))
 
print(f"Small list: {sys.getsizeof(small_list)} bytes")
# Output: Small list: 920 bytes (actual size may vary)
print(f"Small generator: {sys.getsizeof(small_gen)} bytes")
# Output: Small generator: 192 bytes (actual size may vary)
 
# Large sequence - huge difference
large_list = [x for x in range(1_000_000)]
large_gen = (x for x in range(1_000_000))
 
print(f"Large list: {sys.getsizeof(large_list):,} bytes")
# Output: Large list: 8,448,728 bytes (actual size may vary)
print(f"Large generator: {sys.getsizeof(large_gen)} bytes")
# Output: Large generator: 192 bytes (actual size may vary)

The generator's size remains constant regardless of how many values it will produce—it only stores the expression and current state. The list, however, must store all values in memory, which is why its size grows proportionally with the number of elements.

36.3.3) Generator Expressions in Function Calls

Generator expressions are particularly elegant when passed directly to functions that consume iterables. You can omit the extra parentheses when a generator expression is the only argument:

python
# Calculate sum of squares without creating a list
total = sum(x * x for x in range(100))  # Note: no extra parentheses needed
print(total)
# Output: 328350
 
# Find maximum of transformed values
numbers = [1, 2, 3, 4, 5]
max_square = max(x * x for x in numbers)
print(max_square)
# Output: 25
 
# Check if any value meets a condition
data = [10, 15, 20, 25, 30]
has_large = any(x > 100 for x in data)
print(has_large)
# Output: False

This pattern is both memory-efficient and readable. Functions like sum(), max(), min(), any(), and all() process the generator one value at a time, never creating an intermediate list.

36.3.4) Filtering with Generator Expressions

Generator expressions support the same conditional logic as list comprehensions:

python
# Filter even numbers
numbers = range(20)
evens = (x for x in numbers if x % 2 == 0)
print(list(evens))
# Output: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
 
# Transform and filter
words = ["hello", "world", "python", "programming"]
long_upper = (word.upper() for word in words if len(word) > 5)
print(list(long_upper))
# Output: ['PYTHON', 'PROGRAMMING']

36.3.5) When Generator Expressions Aren't Enough

Generator expressions are concise and elegant, but they have limitations. Use generator functions when you need:

Complex Logic:

python
# Too complex for a generator expression
def process_log_lines(filename):
    """Process log file with complex logic."""
    with open(filename, 'r') as file:
        for line in file:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # Skip empty lines and comments
            
            parts = line.split('|')
            if len(parts) >= 3:
                timestamp, level, message = parts[0], parts[1], parts[2]
                if level in ('ERROR', 'CRITICAL'):
                    yield {
                        'timestamp': timestamp,
                        'level': level,
                        'message': message
                    }

Multiple Yields or State:

python
# Generator expression can't maintain state across iterations
def running_total(numbers):
    """Generate running total of numbers."""
    total = 0
    for num in numbers:
        total += num
        yield total
 
numbers = [1, 2, 3, 4, 5]
print(list(running_total(numbers)))
# Output: [1, 3, 6, 10, 15]

Error Handling:

python
# Generator expression can't handle exceptions
def safe_divide(numbers, divisor):
    """Generate division results, handling errors."""
    for num in numbers:
        try:
            yield num / divisor
        except ZeroDivisionError:
            yield float('inf')

36.4) When to Use Generators Instead of Lists

36.4.1) Large Datasets: The Primary Use Case

The most compelling reason to use generators is when working with large amounts of data. If you're processing millions of records, generators can make the difference between a program that runs smoothly and one that crashes.

Bad approach - Loading entire file into memory:

python
# DON'T DO THIS with large files
def count_errors_bad(filename):
    """Load entire file into memory - will crash with large files."""
    with open(filename, 'r') as file:
        lines = file.readlines()  # Loads ENTIRE file into memory
    
    error_count = 0
    for line in lines:
        if 'ERROR' in line:
            error_count += 1
    
    return error_count
 
# If the file is 10 GB, this tries to load 10 GB into memory!

Good approach - Using a generator:

python
def read_log_lines(filename):
    """Generate lines from a log file one at a time."""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()
 
def count_errors_good(filename):
    """Count errors without loading entire file into memory."""
    error_count = 0
    for line in read_log_lines(filename):
        if 'ERROR' in line:
            error_count += 1
    
    return error_count
 
# This works efficiently even with gigabyte-sized log files
# because it only keeps one line in memory at a time
count = count_errors_good('huge_application.log')
print(f"Found {count} errors")

The generator approach processes one line at a time, so memory usage stays constant regardless of file size. A 10 GB file uses the same amount of memory as a 10 KB file.
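The count itself can also be written as a generator expression fed straight to sum(), which likewise keeps only one line in memory at a time. Here is a self-contained sketch that creates a small temporary file so it can run anywhere; the sample log content is made up for illustration:

```python
import os
import tempfile

def read_log_lines(filename):
    """Generate lines from a log file one at a time."""
    with open(filename, 'r') as file:
        for line in file:
            yield line.strip()

# Create a small sample log file for the demonstration
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write("INFO starting up\nERROR disk full\nINFO retrying\nERROR disk full\n")
    log_path = tmp.name

# sum() pulls one value at a time from the generator expression
error_count = sum(1 for line in read_log_lines(log_path) if 'ERROR' in line)
print(f"Found {error_count} errors")  # Output: Found 2 errors

os.remove(log_path)
```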

36.4.2) Infinite or Unknown-Length Sequences

Generators are perfect for sequences where you don't know the length in advance or where the sequence is conceptually infinite:

python
def user_input_stream():
    """Generate user inputs until they type 'quit'."""
    while True:
        user_input = input("Enter a number (or 'quit'): ")
        if user_input.lower() == 'quit':
            break
        try:
            yield int(user_input)
        except ValueError:
            print("Invalid number, try again")
 
# Process user inputs as they arrive
total = 0
count = 0
for number in user_input_stream():
    total += number
    count += 1
    print(f"Running average: {total / count:.2f}")

You can't create a list of unknown length, but a generator handles this naturally.
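When you do need a bounded chunk of such a stream, itertools.islice from the standard library takes a slice of any iterator lazily, without ever materializing the whole sequence:

```python
from itertools import islice

def counter(start=0):
    """Generate sequential numbers indefinitely."""
    current = start
    while True:
        yield current
        current += 1

# Take just the first five values of an infinite stream
first_five = list(islice(counter(10), 5))
print(first_five)  # Output: [10, 11, 12, 13, 14]

# islice consumes the underlying generator, so the next
# slice continues from where the previous one stopped
stream = counter(0)
print(list(islice(stream, 3)))  # Output: [0, 1, 2]
print(list(islice(stream, 3)))  # Output: [3, 4, 5]
```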

36.4.3) Chained Transformations: Building Data Pipelines

When you need to apply multiple transformations to data, generators let you chain operations without creating intermediate lists:

python
# Transform numbers through multiple stages
def generate_numbers(n):
    """Generate numbers from 1 to n."""
    for i in range(1, n + 1):
        yield i
 
def square_numbers(numbers):
    """Generate squares of input numbers."""
    for num in numbers:
        yield num * num
 
def keep_even(numbers):
    """Generate only even numbers."""
    for num in numbers:
        if num % 2 == 0:
            yield num
 
# Chain generators - no intermediate lists created
numbers = generate_numbers(10)
squared = square_numbers(numbers)
even_squares = keep_even(squared)
 
# Process results
print(list(even_squares))
# Output: [4, 16, 36, 64, 100]

Each stage processes one value at a time, passing it to the next stage. This is memory-efficient and allows you to process datasets larger than available RAM.

(Pipeline diagram: generate_numbers → square_numbers → keep_even → results, each stage passing one value at a time.)

Without generators, you'd need intermediate lists:

python
# Non-generator approach - creates intermediate lists
numbers = list(range(1, 11))           # [1, 2, 3, ..., 10]
squared = [n * n for n in numbers]     # [1, 4, 9, ..., 100]
even_squares = [n for n in squared if n % 2 == 0]  # [4, 16, 36, 64, 100]
 
# With generators - no intermediate lists
numbers = (i for i in range(1, 11))
squared = (n * n for n in numbers)
even_squares = (n for n in squared if n % 2 == 0)
print(list(even_squares))
# Output: [4, 16, 36, 64, 100]

For a pipeline with three stages processing one million items, the list approach would create three lists of one million items each. The generator approach keeps only one value in memory at a time.
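The built-in map() and filter() functions are lazy in the same way: each returns an iterator that produces one value at a time. The same pipeline could therefore also be sketched with them:

```python
# Lazy pipeline using built-ins; no intermediate lists are created
numbers = range(1, 11)
squared = map(lambda n: n * n, numbers)
even_squares = filter(lambda n: n % 2 == 0, squared)

result = list(even_squares)
print(result)
# Output: [4, 16, 36, 64, 100]
```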

36.4.4) When Lists Are Better Than Generators

Despite their advantages, generators aren't always the right choice. Use lists when you need:

Multiple Iterations:

python
# List - can iterate multiple times
numbers = [1, 2, 3, 4, 5]
print(sum(numbers))      # Output: 15
print(max(numbers))      # Output: 5 (works fine)
 
# Generator - can only iterate once
numbers_gen = (x for x in range(1, 6))
print(sum(numbers_gen))  # Output: 15
print(max(numbers_gen))  # Raises ValueError: max() arg is an empty sequence

If you need to process the same data multiple times, use a list.
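If you already hold a generator but need to iterate it more than once, the standard library's itertools.tee can split it into independent iterators. Note that tee buffers values internally, so for large data converting to a list is often just as good:

```python
from itertools import tee

numbers_gen = (x for x in range(1, 6))

# tee returns independent iterators over the same stream;
# don't touch the original generator after calling tee
for_sum, for_max = tee(numbers_gen, 2)

total = sum(for_sum)
largest = max(for_max)
print(total)    # Output: 15
print(largest)  # Output: 5
```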

Random Access:

python
# Need to access elements by index - use a list
students = ['Alice', 'Bob', 'Charlie', 'Diana']
print(students[2])  # Output: Charlie
 
# Generators don't support indexing
students_gen = (name for name in students)
# students_gen[2]  # Raises TypeError: 'generator' object is not subscriptable

Length Information:

python
# Need to know the length - use a list
data = [1, 2, 3, 4, 5]
print(f"Processing {len(data)} items")
 
# Generators don't have a length
data_gen = (x for x in data)
# len(data_gen)  # Raises TypeError: object of type 'generator' has no len()

Small Datasets:

python
# For small datasets, lists are fine and more convenient
small_data = [x * 2 for x in range(10)]
 
# The memory savings of a generator aren't significant here
# and the list is more flexible

36.4.5) Practical Decision Guide

Here's a practical guide for choosing between generators and lists:

Use Generators When:

  • Processing large files or datasets
  • Working with data streams or user input
  • Building data processing pipelines
  • Memory efficiency is important
  • You only need to iterate once
  • The sequence is infinite or very long

Use Lists When:

  • The dataset is small (< 10,000 items typically)
  • You need to iterate multiple times
  • You need random access by index
  • You need to know the length
  • You need to pass the data to code expecting a list

36.4.6) Converting Between Generators and Lists

You can easily convert between generators and lists when needed:

python
# Generator to list
numbers_gen = (x * 2 for x in range(5))
numbers_list = list(numbers_gen)
print(numbers_list)
# Output: [0, 2, 4, 6, 8]
 
# List to generator (using generator expression)
numbers_list = [1, 2, 3, 4, 5]
numbers_gen = (x for x in numbers_list)

This flexibility means you can start with a generator for efficiency and convert to a list only when you need list-specific features:

python
# Start with a generator for memory efficiency
numbers = (x for x in range(1, 1001))
filtered = (x for x in numbers if x % 7 == 0)
 
# Convert to list when you need multiple iterations
multiples_of_seven = list(filtered)
 
# Now you can use list features
print(f"Count: {len(multiples_of_seven)}")
# Output: Count: 142
 
print(f"First: {multiples_of_seven[0]}")
# Output: First: 7
 
print(f"Last: {multiples_of_seven[-1]}")
# Output: Last: 994
 
# Can iterate multiple times
total = sum(multiples_of_seven)
average = total / len(multiples_of_seven)
print(f"Average: {average:.1f}")
# Output: Average: 500.5

Generators are one of Python's most elegant features for writing memory-efficient code. They allow you to process large datasets, build data pipelines, and work with infinite sequences—all while keeping your code clean and readable. As you gain experience, you'll develop an intuition for when generators are the right tool for the job.

© 2025. Primesoft Co., Ltd.
support@primesoft.ai