17. Sets: Working with Unique Unordered Data

In previous chapters, we've worked with lists (ordered, mutable collections) and dictionaries (key-value mappings). Now we'll explore sets, Python's collection type designed specifically for storing unique items and performing mathematical set operations efficiently.

Sets are particularly powerful when you need to eliminate duplicates, test membership quickly, or perform operations like finding common elements between collections. Unlike lists, sets are unordered and cannot contain duplicate values—attempting to add the same item twice has no effect.

17.1) Creating Sets and Basic Operations

17.1.1) Creating Sets with Curly Braces

The most common way to create a set is using curly braces {} with comma-separated values:

python

# Creating a set of programming languages
languages = {"Python", "JavaScript", "Java", "C++"}
print(languages)  # Output: {'Python', 'JavaScript', 'Java', 'C++'}
print(type(languages))  # Output: <class 'set'>

Important: The order of elements when you print a set may differ from the order you entered them. Sets are unordered collections, meaning Python doesn't maintain any particular sequence:

python

numbers = {5, 2, 8, 1, 9}
print(numbers)  # Output might be: {1, 2, 5, 8, 9} or another order

The output order can vary between Python versions, and for sets of strings it can even change from one run of the same program to the next. Never rely on a set maintaining a specific order; if order matters, use a list instead.
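
Because sets have no order, two sets with the same elements compare equal no matter how they were written. When you need a predictable display order, one option (a sketch, not the only approach) is to pass the set to the built-in sorted(), which returns a new sorted list and leaves the set unchanged:

```python
# Two sets with the same elements are equal, regardless of the order written
first = {5, 2, 8, 1, 9}
second = {9, 8, 5, 2, 1}
print(first == second)  # Output: True

# sorted() returns a new list in ascending order; the set itself is unchanged
ordered = sorted(first)
print(ordered)  # Output: [1, 2, 5, 8, 9]

# Strings sort alphabetically, which gives a stable order for printing
languages = {"Python", "JavaScript", "Java", "C++"}
print(sorted(languages))  # Output: ['C++', 'Java', 'JavaScript', 'Python']
```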

17.1.2) Sets Automatically Remove Duplicates

One of the most useful properties of sets is that they automatically eliminate duplicate values. If you try to create a set with duplicate items, only one copy of each unique value is kept:

python

# Creating a set with duplicate values
student_ids = {101, 102, 103, 102, 101, 104}
print(student_ids)  # Output: {101, 102, 103, 104}
 
# This property makes sets perfect for removing duplicates
grades = [85, 90, 85, 78, 90, 92, 78, 85]
unique_grades = set(grades)
print(unique_grades)  # Output: {78, 85, 90, 92}

This automatic deduplication happens because a set checks each new value against the elements it already holds, using the value's hash and an equality comparison (covered in Section 17.7). If an equal value is already present, the new one is simply ignored.
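
A set keeps only one copy of each value. You can therefore compare its length with the length of the original list to tell whether the list had duplicates, and how many extra copies were dropped. Here is a small sketch using the grades list from above:

```python
grades = [85, 90, 85, 78, 90, 92, 78, 85]
unique_grades = set(grades)

# If the set is shorter than the list, some values were repeated
has_duplicates = len(unique_grades) < len(grades)
print(has_duplicates)  # Output: True

# Every occurrence after a value's first one was dropped
extra_copies = len(grades) - len(unique_grades)
print(f"Duplicate entries removed: {extra_copies}")
# Output: Duplicate entries removed: 4
```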

17.1.3) Creating Sets with the set() Constructor

You can create sets from other iterables using the set() constructor. This is particularly useful for converting lists, tuples, or strings into sets:

python

# Creating a set from a list
colors_list = ["red", "blue", "green", "red", "yellow"]
colors_set = set(colors_list)
print(colors_set)  # Output: {'red', 'blue', 'green', 'yellow'}
 
# Creating a set from a string (each character becomes an element)
letters = set("programming")
print(letters)  # Output: {'p', 'r', 'o', 'g', 'a', 'm', 'i', 'n'}
 
# Creating a set from a tuple
coordinates = set((10, 20, 30, 20, 10))
print(coordinates)  # Output: {10, 20, 30}

When you create a set from a string, each unique character becomes a separate element. This is useful for finding all distinct characters in text:

python

text = "Mississippi"
unique_chars = set(text.lower())
print(unique_chars)  # Output: {'m', 'i', 's', 'p'}
print(f"The word contains {len(unique_chars)} unique letters")
# Output: The word contains 4 unique letters
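
set() iterates over whatever you pass it, so a single string gets split into characters. To keep the whole string as one element, put it inside curly braces, or put it in a list before calling set():

```python
# set() iterates over the string, so each character becomes an element
chars = set("hello")
print(len(chars))  # Output: 4  ('h', 'e', 'l', 'o')

# Curly braces treat the string as a single element
whole_word = {"hello"}
print(whole_word)  # Output: {'hello'}

# Wrapping the string in a list has the same effect with set()
from_list = set(["hello"])
print(from_list == whole_word)  # Output: True
```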

17.1.4) Creating an Empty Set

Here's a critical gotcha: you cannot create an empty set using {} because Python interprets that as an empty dictionary. Instead, you must use set():

python

# WRONG - This creates an empty dictionary, not a set
empty_dict = {}
print(type(empty_dict))  # Output: <class 'dict'>
 
# CORRECT - This creates an empty set
empty_set = set()
print(type(empty_set))  # Output: <class 'set'>
print(empty_set)  # Output: set()

This distinction exists because dictionaries were added to Python before sets, so {} was already claimed for empty dictionaries. When you print an empty set, Python displays it as set() to avoid confusion.

Common beginner confusion: When creating a set with a single element using a variable, the set contains the value of the variable, not the variable name:

python

# Understanding set creation with variables
x = 5
my_set = {x}  # Creates {5}, not {'x'}
print(my_set)  # Output: {5}
 
# If you want a set containing the string 'x':
my_set = {'x'}
print(my_set)  # Output: {'x'}
 
# This applies to any expression
result = 10 + 5
my_set = {result}  # Creates {15}
print(my_set)  # Output: {15}

17.1.5) Basic Set Properties and Operations

Sets support several fundamental operations that make them useful for data processing:

python

# Checking the number of unique elements
website_visitors = {"alice", "bob", "charlie", "alice", "david"}
print(f"Unique visitors: {len(website_visitors)}")
# Output: Unique visitors: 4
 
# Checking membership with 'in' (very fast for sets)
if "alice" in website_visitors:
    print("Alice visited the website")
# Output: Alice visited the website
 
# Checking non-membership
if "eve" not in website_visitors:
    print("Eve has not visited yet")
# Output: Eve has not visited yet

Membership testing with in is one of the key advantages of sets. For large collections, checking if an item exists in a set is much faster than checking in a list. We'll explore why this matters in Section 17.5.

17.2) Adding and Removing Elements from Sets

Unlike tuples (which are immutable), sets are mutable—you can add and remove elements after creation. However, the elements themselves must be immutable types (we'll explore this restriction in Section 17.7).

17.2.1) Adding Single Elements with add()

Adding individual elements to a set is straightforward with the add() method. If the element already exists, the set remains unchanged—no error is raised, and no duplicate is created:

python

# Building a set of completed tasks
completed_tasks = {"task1", "task2"}
print(completed_tasks)  # Output: {'task1', 'task2'}
 
# Adding a new task
completed_tasks.add("task3")
print(completed_tasks)  # Output: {'task1', 'task2', 'task3'}
 
# Adding a duplicate has no effect
completed_tasks.add("task1")
print(completed_tasks)  # Output: {'task1', 'task2', 'task3'}

This behavior makes sets ideal for tracking unique occurrences. You can safely call add() without checking if the element already exists—the set handles duplicates automatically.
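
A common pattern built on add() is a "seen" set that remembers which items have already appeared. The function name first_repeat below is a hypothetical helper for illustration only. It returns the first item that occurs a second time, or None if every item is unique:

```python
def first_repeat(items):
    """Return the first item that appears twice in items, or None (hypothetical helper)."""
    seen = set()
    for item in items:
        if item in seen:  # fast membership check
            return item
        seen.add(item)  # remember this item for later iterations
    return None

visitor_ids = [104, 101, 107, 101, 104]
print(first_repeat(visitor_ids))  # Output: 101
print(first_repeat([1, 2, 3]))  # Output: None
```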

17.2.2) Adding Multiple Elements with update()

To add multiple elements at once, use update() which accepts any iterable (list, tuple, another set, etc.) and adds all its elements to the set:

python

# Starting with a small set of skills
skills = {"Python", "SQL"}
print(skills)  # Output: {'Python', 'SQL'}
 
# Adding multiple skills from a list
new_skills = ["JavaScript", "Docker", "Python"]
skills.update(new_skills)
print(skills)  # Output: {'Python', 'SQL', 'JavaScript', 'Docker'}

Notice that "Python" appeared in both the original set and the list being added, but the set still contains only one copy. The update() method can accept multiple iterables as arguments:

python

# Combining skills from multiple sources
current_skills = {"Python"}
course_skills = ["JavaScript", "HTML"]
job_requirements = {"SQL", "Python", "Docker"}
 
current_skills.update(course_skills, job_requirements)
print(current_skills)
# Output: {'Python', 'JavaScript', 'HTML', 'SQL', 'Docker'}
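
One pitfall: update() iterates over each argument, so passing a plain string adds its individual characters. To add a whole string, use add() or wrap the string in a list:

```python
char_skills = {"Python"}
char_skills.update("Go")  # iterates over the string: adds 'G' and 'o'
print(char_skills)  # Output: {'Python', 'G', 'o'} (order may vary)

list_skills = {"Python"}
list_skills.update(["Go"])  # the list holds one string, so one element is added
print(list_skills)  # Output: {'Python', 'Go'} (order may vary)

added_skills = {"Python"}
added_skills.add("Go")  # add() always inserts exactly one element
print(added_skills == list_skills)  # Output: True
```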

17.2.3) Removing Elements with remove()

Removing elements requires care. The remove() method deletes an element from a set, but raises a KeyError if the element doesn't exist:

python

# Managing active users
active_users = {"alice", "bob", "charlie", "david"}
 
# Removing a user who logged out
active_users.remove("bob")
print(active_users)  # Output: {'alice', 'charlie', 'david'}
 
# Attempting to remove a non-existent element causes an error
# active_users.remove("eve")  # Raises: KeyError: 'eve'

Because remove() raises an error for missing elements, it's best used in three situations: when you're certain the element exists, when you check membership first, or when you want to catch the error (try/except, covered in Chapter 28). Checking membership first looks like this:

python

# Safe removal: check membership before calling remove()
users = {"alice", "bob", "charlie"}
user_to_remove = "david"
 
if user_to_remove in users:
    users.remove(user_to_remove)
    print(f"Removed {user_to_remove}")
else:
    print(f"{user_to_remove} was not in the set")
# Output: david was not in the set
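
As a brief preview of Chapter 28, the other option mentioned above is to let remove() raise the KeyError and catch it with try/except. Here is a minimal sketch:

```python
users = {"alice", "bob", "charlie"}
user_to_remove = "david"

try:
    users.remove(user_to_remove)
    print(f"Removed {user_to_remove}")
except KeyError:
    # remove() raises KeyError when the element is missing
    print(f"{user_to_remove} was not in the set")
# Output: david was not in the set

print(len(users))  # Output: 3 (the set is unchanged)
```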

17.2.4) Removing Elements Safely with discard()

For safer element removal that won't raise errors, discard() provides a forgiving alternative. It removes the element if present, but does nothing if the element doesn't exist:

python

# Managing a shopping cart
cart_items = {"apple", "banana", "orange"}
 
# Safely removing items (no error if item doesn't exist)
cart_items.discard("banana")
print(cart_items)  # Output: {'apple', 'orange'}
 
cart_items.discard("grape")  # No error, even though grape isn't in the set
print(cart_items)  # Output: {'apple', 'orange'}

Use discard() when you want to ensure an element is not in the set, regardless of whether it was there initially. Use remove() when the element's absence indicates an error condition you want to catch.

17.2.5) Removing and Returning an Arbitrary Element with pop()

The pop() method removes and returns an arbitrary element from the set. Because sets are unordered, you cannot predict which element will be removed:

python

# Processing a queue of pending tasks (order doesn't matter)
pending_tasks = {"email", "report", "meeting", "review"}
 
# Process one task (we don't care which one)
task = pending_tasks.pop()
print(f"Processing: {task}")  # Output: Processing: email (or another task)
print(f"Remaining: {pending_tasks}")
# Output: Remaining: {'report', 'meeting', 'review'} (without the popped task)

If you call pop() on an empty set, it raises a KeyError:

python

empty_set = set()
# empty_set.pop()  # Raises: KeyError: 'pop from an empty set'

The pop() method is useful when you need to process all elements in a set but don't care about the order:

python

# Processing all items in a set
items_to_process = {"item1", "item2", "item3"}
 
while items_to_process:
    item = items_to_process.pop()
    print(f"Processing {item}")
    # Process the item...
 
print("All items processed")
# Output (items may be processed in any order):
# Processing item1
# Processing item2
# Processing item3
# All items processed

17.2.6) Removing All Elements with clear()

The clear() method removes all elements from a set, leaving it empty:

python

# Resetting a session's data
session_data = {"user_id", "timestamp", "ip_address"}
print(session_data)  # Output: {'user_id', 'timestamp', 'ip_address'}
 
session_data.clear()
print(session_data)  # Output: set()
print(len(session_data))  # Output: 0

Unlike assigning a new empty set (session_data = set()), clear() empties the existing set object in place. Every variable that refers to that same set sees it become empty.
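
The difference matters when more than one variable refers to the same set. clear() empties the shared object, while assigning set() only rebinds one name. The variable names below are illustrative:

```python
# Two names referring to the same set object
session_data = {"user_id", "timestamp"}
same_session = session_data
session_data.clear()  # empties the shared object in place
print(same_session)  # Output: set()

# Reassigning creates a new, separate empty set
cache = {"page1", "page2"}
old_cache = cache
cache = set()  # only the name 'cache' now points to a new set
print(len(old_cache))  # Output: 2
```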

17.3) Set Operations: Union, Intersection, Difference, and Symmetric Difference

Sets support mathematical set operations that allow you to combine, compare, and analyze collections efficiently. These operations are fundamental to set theory and have many practical applications in data processing.

17.3.1) Union: Combining Sets

Let's start with a practical scenario to understand why union matters. Imagine you're managing student enrollments across different courses:

python

# Students enrolled in different courses
python_students = {"alice", "bob", "charlie"}
javascript_students = {"bob", "david", "eve"}
 
# Finding all students taking either course (or both)
all_students = python_students | javascript_students
print(all_students)
# Output: {'alice', 'bob', 'charlie', 'david', 'eve'}

The union of two sets contains all elements that appear in either set (or both). Python provides two ways to compute unions: the | operator (shown above) and the union() method:

python

# Same result using the union() method
all_students = python_students.union(javascript_students)
print(all_students)
# Output: {'alice', 'bob', 'charlie', 'david', 'eve'}

The union() method can accept multiple sets as arguments, making it convenient for combining data from many sources:

python

# Students in three different courses
python_students = {"alice", "bob"}
javascript_students = {"bob", "charlie"}
sql_students = {"charlie", "david"}
 
# All students across all courses
all_students = python_students.union(javascript_students, sql_students)
print(all_students)
# Output: {'alice', 'bob', 'charlie', 'david'}

Another example of union is combining email lists from different departments:

python

# Combining email lists from different departments
marketing_contacts = {"alice@company.com", "bob@company.com"}
sales_contacts = {"bob@company.com", "charlie@company.com"}
support_contacts = {"david@company.com", "alice@company.com"}
 
# All unique contacts across departments
all_contacts = marketing_contacts | sales_contacts | support_contacts
print(f"Total unique contacts: {len(all_contacts)}")
# Output: Total unique contacts: 4
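
The operator and method forms differ in one practical way. The | operator requires both operands to be sets (or frozensets), while union() accepts any iterable, such as a list. The list of new contacts below is made up for illustration:

```python
contacts = {"alice@company.com", "bob@company.com"}
new_contacts = ["bob@company.com", "eve@company.com"]  # a list, not a set

# union() accepts any iterable
merged = contacts.union(new_contacts)
print(len(merged))  # Output: 3

# The | operator raises TypeError when the other operand is a list
operator_failed = False
try:
    contacts | new_contacts
except TypeError as error:
    operator_failed = True
    print(error)  # Output: unsupported operand type(s) for |: 'set' and 'list'

# Converting the list to a set first makes the operator work
print(contacts | set(new_contacts) == merged)  # Output: True
```

The same distinction applies to intersection(), difference(), and symmetric_difference() compared with &, -, and ^.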

17.3.2) Intersection: Finding Common Elements

Understanding which elements appear in multiple sets is crucial for many data analysis tasks. The intersection operation answers the question: "What do these sets have in common?"

python

# Finding customers who bought both products
customers_product_a = {101, 102, 103, 104, 105}
customers_product_b = {103, 104, 105, 106, 107}
 
# Customers who bought both products
both_products = customers_product_a & customers_product_b
print(f"Bought both: {both_products}")
# Output: Bought both: {103, 104, 105}

The intersection contains only elements that appear in both sets. You can also use the intersection() method, which accepts multiple sets:

python

# Finding students enrolled in all three courses
python_students = {"alice", "bob", "charlie"}
javascript_students = {"bob", "charlie", "david"}
sql_students = {"charlie", "eve", "bob"}
 
# Students taking all three courses
all_three = python_students.intersection(javascript_students, sql_students)
print(all_three)  # Output: {'bob', 'charlie'}

Here's a practical use case for finding products available in multiple warehouses:

python

# Finding products available in multiple warehouses
warehouse_a = {"laptop", "mouse", "keyboard", "monitor"}
warehouse_b = {"mouse", "keyboard", "printer", "scanner"}
warehouse_c = {"keyboard", "monitor", "mouse", "desk"}
 
# Products available in all warehouses
available_everywhere = warehouse_a & warehouse_b & warehouse_c
print(f"Available in all locations: {available_everywhere}")
# Output: Available in all locations: {'mouse', 'keyboard'}

17.3.3) Difference: Finding Elements in One Set but Not Another

Sometimes you need to identify what's unique to one collection. The difference operation finds elements that are in the first set but not in the second:

python

# Inventory management: finding discrepancies
expected_items = {"item001", "item002", "item003", "item004"}
actual_items = {"item001", "item003", "item005"}
 
# Items missing from inventory
missing = expected_items - actual_items
print(f"Missing items: {missing}")
# Output: Missing items: {'item002', 'item004'}
 
# Unexpected items in inventory
unexpected = actual_items - expected_items
print(f"Unexpected items: {unexpected}")
# Output: Unexpected items: {'item005'}

You can also use the difference() method:

python

# Students only in Python course (not in JavaScript)
python_students = {"alice", "bob", "charlie"}
javascript_students = {"bob", "david", "eve"}
 
python_only = python_students.difference(javascript_students)
print(python_only)  # Output: {'alice', 'charlie'}

Important: The difference operation is not commutative—the order matters:

python

python_students = {"alice", "bob", "charlie"}
javascript_students = {"bob", "david", "eve"}
 
# Students in Python but not JavaScript
python_only = python_students - javascript_students
print(f"Python only: {python_only}")
# Output: Python only: {'alice', 'charlie'}
 
# Students in JavaScript but not Python
javascript_only = javascript_students - python_students
print(f"JavaScript only: {javascript_only}")
# Output: JavaScript only: {'david', 'eve'}

17.3.4) Symmetric Difference: Elements in Either Set but Not Both

The symmetric difference finds elements that are in either set but not in both. This operation is particularly useful for identifying changes between two versions:

python

# Comparing two versions of a configuration
old_settings = {"debug", "logging", "cache", "compression"}
new_settings = {"logging", "cache", "monitoring", "security"}
 
# Settings that changed (added or removed)
changes = old_settings ^ new_settings
print(f"Changed settings: {changes}")
# Output: Changed settings: {'debug', 'compression', 'monitoring', 'security'}
 
# To see specifically what was added vs removed:
removed = old_settings - new_settings
added = new_settings - old_settings
print(f"Removed: {removed}")  # Output: Removed: {'debug', 'compression'}
print(f"Added: {added}")  # Output: Added: {'monitoring', 'security'}

You can also use the symmetric_difference() method:

python

# Students in exactly one course (not both)
python_students = {"alice", "bob", "charlie"}
javascript_students = {"bob", "david", "eve"}
 
one_course_only = python_students.symmetric_difference(javascript_students)
print(one_course_only)
# Output: {'alice', 'charlie', 'david', 'eve'}

Unlike difference, symmetric difference is commutative—the order doesn't matter:

python

result1 = python_students ^ javascript_students
result2 = javascript_students ^ python_students
print(result1 == result2)  # Output: True
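
You can also build the symmetric difference from operations already covered. It equals the union minus the intersection, or equivalently the two one-sided differences combined. Here is a quick check with the course sets:

```python
python_students = {"alice", "bob", "charlie"}
javascript_students = {"bob", "david", "eve"}

sym_diff = python_students ^ javascript_students

# Everything in either set, minus what they share
via_union = (python_students | javascript_students) - (python_students & javascript_students)

# What only Python has, combined with what only JavaScript has
via_differences = (python_students - javascript_students) | (javascript_students - python_students)

print(sym_diff == via_union == via_differences)  # Output: True
```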

17.4) Subset and Superset Relationships (issubset, issuperset, isdisjoint)

Beyond combining sets, we often need to understand the relationships between them. Python provides methods to test whether one set is contained within another, contains another, or shares no elements with another.

17.4.1) Testing Subsets with issubset() and <=

A set A is a subset of set B if every element in A is also in B. In other words, B contains all of A's elements (and possibly more).

python

# Course prerequisites
basic_skills = {"reading", "writing"}
intermediate_skills = {"reading", "writing", "analysis"}
 
# Check if basic skills are a subset of intermediate skills
print(basic_skills.issubset(intermediate_skills))  # Output: True
print(basic_skills <= intermediate_skills)  # Output: True (same result)

A set is always a subset of itself:

python

skills = {"Python", "SQL", "JavaScript"}
print(skills.issubset(skills))  # Output: True
print(skills <= skills)  # Output: True

If you want to test for a proper subset (A is a subset of B but not equal to B), use the < operator:

python

basic_skills = {"reading", "writing"}
intermediate_skills = {"reading", "writing", "analysis"}
 
# Proper subset: basic is a subset of intermediate AND they're not equal
print(basic_skills < intermediate_skills)  # Output: True
 
# Not a proper subset of itself (they're equal)
print(basic_skills < basic_skills)  # Output: False
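
Subset comparisons don't rank every pair of sets the way comparisons between numbers do. When each set has an element the other lacks, neither is a subset of the other, so both <= checks return False:

```python
frontend = {"HTML", "CSS", "JavaScript"}
backend = {"Python", "SQL", "JavaScript"}

print(frontend <= backend)  # Output: False
print(backend <= frontend)  # Output: False
print(frontend == backend)  # Output: False

# They still overlap, which & reveals
print(frontend & backend)  # Output: {'JavaScript'}
```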

A practical example of subset testing is checking permissions or requirements:

python

# User permission system
required_permissions = {"read", "write"}
user_permissions = {"read", "write", "delete", "admin"}
 
# Check if user has all required permissions
if required_permissions.issubset(user_permissions):
    print("Access granted")
else:
    print("Access denied - missing permissions")
# Output: Access granted
 
# Another user with insufficient permissions
limited_user = {"read"}
if required_permissions.issubset(limited_user):
    print("Access granted")
else:
    missing = required_permissions - limited_user
    print(f"Access denied - missing: {missing}")
# Output: Access denied - missing: {'write'}

17.4.2) Testing Supersets with issuperset() and >=

A set A is a superset of set B if A contains all elements of B. This is the inverse relationship of subset—if A is a subset of B, then B is a superset of A.

python

# Skill levels
basic_skills = {"reading", "writing"}
advanced_skills = {"reading", "writing", "analysis", "research"}
 
# Check if advanced skills is a superset of basic skills
print(advanced_skills.issuperset(basic_skills))  # Output: True
print(advanced_skills >= basic_skills)  # Output: True (same result)

Like subsets, a set is always a superset of itself:

python

skills = {"Python", "SQL"}
print(skills.issuperset(skills))  # Output: True

For a proper superset (A is a superset of B but not equal to B), use the > operator:

python

basic_skills = {"reading", "writing"}
advanced_skills = {"reading", "writing", "analysis"}
 
# Proper superset: advanced contains all of basic AND has more
print(advanced_skills > basic_skills)  # Output: True
 
# Not a proper superset of itself
print(advanced_skills > advanced_skills)  # Output: False
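
A superset check is a convenient way to confirm that a record contains every required field. This sketch assumes each record is a dictionary (Chapter 16), so set(record) collects its keys. The field names and submissions are hypothetical:

```python
required_fields = {"name", "email"}

# Hypothetical form submissions stored as dictionaries
complete = {"name": "Alice", "email": "alice@example.com", "phone": "555-0100"}
incomplete = {"name": "Bob"}

# set(record) builds a set of the dictionary's keys
print(set(complete) >= required_fields)  # Output: True
print(set(incomplete) >= required_fields)  # Output: False

# The difference shows exactly which fields are missing
print(required_fields - set(incomplete))  # Output: {'email'}
```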

17.4.3) Testing for Disjoint Sets with isdisjoint()

Two sets are disjoint if they have no elements in common—their intersection is empty. The isdisjoint() method returns True if the sets share no elements:

python

# Checking for conflicts in scheduling
morning_classes = {"math", "english", "history"}
afternoon_classes = {"science", "art", "music"}
 
# Check if there are any conflicts (same class in both sessions)
if morning_classes.isdisjoint(afternoon_classes):
    print("No scheduling conflicts")
else:
    conflicts = morning_classes & afternoon_classes
    print(f"Conflicts: {conflicts}")
# Output: No scheduling conflicts

When sets are not disjoint:

python

morning_classes = {"math", "english", "history"}
afternoon_classes = {"science", "math", "music"}
 
if morning_classes.isdisjoint(afternoon_classes):
    print("No scheduling conflicts")
else:
    conflicts = morning_classes & afternoon_classes
    print(f"Conflicts: {conflicts}")
# Output: Conflicts: {'math'}

Empty sets are disjoint from all sets (including other empty sets), because they have no elements to share:

python

empty = set()
numbers = {1, 2, 3}
 
print(empty.isdisjoint(numbers))  # Output: True
print(empty.isdisjoint(empty))  # Output: True
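
isdisjoint() answers the same question as checking whether the intersection is empty. An empty set is falsy, so not (a & b) gives the same result. isdisjoint() reads more clearly, though, and in CPython it can stop at the first shared element instead of building a new set:

```python
morning_classes = {"math", "english", "history"}
afternoon_classes = {"science", "math", "music"}
evening_classes = {"art", "drama"}

# isdisjoint() and an empty-intersection check agree
print(morning_classes.isdisjoint(afternoon_classes))  # Output: False
print(not (morning_classes & afternoon_classes))  # Output: False

print(morning_classes.isdisjoint(evening_classes))  # Output: True
print(not (morning_classes & evening_classes))  # Output: True
```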

17.5) When to Use Sets Instead of Lists

Understanding when to use sets versus lists is crucial for writing efficient Python code. While both store collections of items, they have different characteristics that make each suitable for different tasks.

17.5.1) Use Sets for Fast Membership Testing

One of the most significant advantages of sets is their speed for membership testing. Checking if an item exists in a set is much faster than checking in a list, especially for large collections:

python

# Checking if a user is in a large collection
active_users_list = []
for i in range(10000):
    active_users_list.append("user" + str(i))
 
# With a list (slow for large collections)
print("user5000" in active_users_list)  # Checks each element until found
 
active_users_set = set()
for i in range(10000):
    active_users_set.add("user" + str(i))
 
# With a set (fast regardless of size)
print("user5000" in active_users_set)  # Direct lookup

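Both checks produce the same result, but the set version is dramatically faster for large collections. A set stores its elements in a hash table, so a membership check jumps almost directly to where the item would be. On average that check takes about the same time no matter how large the set is. A list has no such index, so in compares elements one by one, and the work grows with the length of the list.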
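
To see the difference on your own machine, the standard library's timeit module can time repeated lookups. The exact numbers depend on your hardware and Python version, so none are shown here, but the set lookup should be far faster. This sketch searches for the last user, which is the worst case for a list:

```python
import timeit

# Build the same 10,000 names as a list and as a set
users_list = []
for i in range(10000):
    users_list.append("user" + str(i))
users_set = set(users_list)

# Time 1,000 lookups of the last name (worst case for the list)
list_time = timeit.timeit(lambda: "user9999" in users_list, number=1000)
set_time = timeit.timeit(lambda: "user9999" in users_set, number=1000)

print(f"List: {list_time:.5f} seconds")
print(f"Set:  {set_time:.5f} seconds")
```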

17.5.2) Use Sets to Eliminate Duplicates

When you need to remove duplicates from a collection, converting to a set is the simplest approach:

python

# Removing duplicate entries from user input
survey_responses = [
    "yes", "no", "yes", "maybe", "yes", "no", "maybe", "yes"
]
 
# Get unique responses
unique_responses = set(survey_responses)
print(unique_responses)  # Output: {'yes', 'no', 'maybe'}
 
# If you need a list back (with duplicates removed)
unique_list = list(unique_responses)
print(unique_list)  # Output: ['yes', 'no', 'maybe'] (order may vary)
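
A set alone can't remove duplicates while keeping the original order. One common alternative is dict.fromkeys(), which keeps the first occurrence of each value. It works because dictionaries preserve insertion order, which is guaranteed since Python 3.7:

```python
survey_responses = ["yes", "no", "yes", "maybe", "yes", "no", "maybe", "yes"]

# dict.fromkeys() makes each response a key; repeated keys are ignored,
# and the keys stay in the order they first appeared
ordered_unique = list(dict.fromkeys(survey_responses))
print(ordered_unique)  # Output: ['yes', 'no', 'maybe']
```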

17.5.3) Use Sets for Mathematical Set Operations

When you need to find common elements, differences, or unions between collections, sets provide clear, efficient operations:

python

# Analyzing customer purchase patterns
customers_product_a = {101, 102, 103, 104, 105}
customers_product_b = {103, 104, 105, 106, 107}
 
# Customers who bought both products
both_products = customers_product_a & customers_product_b
print(f"Bought both: {both_products}")
# Output: Bought both: {103, 104, 105}
 
# Customers who bought only product A
only_a = customers_product_a - customers_product_b
print(f"Only product A: {only_a}")
# Output: Only product A: {101, 102}
 
# All customers who bought at least one product
all_customers = customers_product_a | customers_product_b
print(f"Total customers: {len(all_customers)}")
# Output: Total customers: 7

17.5.4) Use Lists When Order Matters

Sets are unordered, so if the sequence of elements is important, you must use a list:

python

# WRONG - Order is not preserved with sets
task_order = {"wake up", "breakfast", "work", "lunch", "work", "dinner"}
print(task_order)  # Order is unpredictable and "work" appears only once
 
# CORRECT - Use a list when order matters
task_order = ["wake up", "breakfast", "work", "lunch", "work", "dinner"]
print(task_order)
# Output: ['wake up', 'breakfast', 'work', 'lunch', 'work', 'dinner']

17.5.5) Use Lists When Duplicates Are Meaningful

If duplicate values carry information (like frequency or multiple occurrences), use a list:

python

# Recording quiz scores (duplicates show how many students got each score)
quiz_scores = [85, 90, 85, 78, 90, 92, 85, 88]
 
# With a list, we can count occurrences
score_85_count = quiz_scores.count(85)
print(f"Students who scored 85: {score_85_count}")
# Output: Students who scored 85: 3
 
# With a set, we'd lose this information
unique_scores = set(quiz_scores)
print(unique_scores)  # Output: {78, 85, 88, 90, 92}
# We can't tell how many students got each score
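
If you need both the distinct values and how often each occurs, keep the list and use a set only to decide which values to count. This sketch stores the counts in a plain dictionary and uses sorted() for a stable order:

```python
quiz_scores = [85, 90, 85, 78, 90, 92, 85, 88]

score_counts = {}
for score in sorted(set(quiz_scores)):  # each distinct score once, ascending
    score_counts[score] = quiz_scores.count(score)  # count in the original list

print(score_counts)  # Output: {78: 1, 85: 3, 88: 1, 90: 2, 92: 1}
```

The standard library's collections.Counter can do this counting in a single step.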

17.5.6) Use Lists When You Need Indexing

Sets don't support indexing because they're unordered. If you need to access elements by position, use a list:

python

# WRONG - Sets don't support indexing
colors = {"red", "blue", "green"}
# first_color = colors[0]  # Raises: TypeError: 'set' object is not subscriptable
 
# CORRECT - Use a list for indexed access
colors = ["red", "blue", "green"]
first_color = colors[0]
print(first_color)  # Output: red

17.6) Frozensets and Immutable Sets

So far, we've worked with regular sets, which are mutable—you can add and remove elements after creation. Python also provides frozensets, which are immutable versions of sets. Once created, a frozenset cannot be modified.

17.6.1) Creating Frozensets

You create a frozenset using the frozenset() constructor, similar to how you create a regular set with set():

python

# Creating a frozenset from a list
colors = frozenset(["red", "blue", "green"])
print(colors)  # Output: frozenset({'red', 'blue', 'green'})
print(type(colors))  # Output: <class 'frozenset'>
 
# Creating a frozenset from a tuple
numbers = frozenset((1, 2, 3, 4, 5))
print(numbers)  # Output: frozenset({1, 2, 3, 4, 5})
 
# Creating an empty frozenset
empty = frozenset()
print(empty)  # Output: frozenset()

Like regular sets, frozensets automatically eliminate duplicates:

python

# Duplicates are removed
values = frozenset([1, 2, 2, 3, 3, 3, 4])
print(values)  # Output: frozenset({1, 2, 3, 4})

17.6.2) Frozensets Are Immutable

Once created, you cannot modify a frozenset. Methods like add(), remove(), discard(), pop(), and clear() don't exist for frozensets:

python

# Creating a frozenset
languages = frozenset(["Python", "JavaScript", "Java"])
 
# Attempting to modify raises an error
# languages.add("C++")  # AttributeError: 'frozenset' object has no attribute 'add'
# languages.remove("Java")  # AttributeError: 'frozenset' object has no attribute 'remove'

This immutability is the defining characteristic of frozensets. If you need to "modify" a frozenset, you must create a new one:

python

# Original frozenset
original = frozenset([1, 2, 3])
 
# Creating a new frozenset with an additional element
# (the | operator returns a new frozenset; original is left unchanged)
modified = original | {4}
print(original)  # Output: frozenset({1, 2, 3})
print(modified)  # Output: frozenset({1, 2, 3, 4})

17.6.3) Set Operations Work with Frozensets

Frozensets support all the same set operations as regular sets (union, intersection, difference, etc.):

python

# Set operations with frozensets
set_a = frozenset([1, 2, 3, 4])
set_b = frozenset([3, 4, 5, 6])
 
# Union
print(set_a | set_b)  # Output: frozenset({1, 2, 3, 4, 5, 6})
 
# Intersection
print(set_a & set_b)  # Output: frozenset({3, 4})
 
# Difference
print(set_a - set_b)  # Output: frozenset({1, 2})
 
# Symmetric difference
print(set_a ^ set_b)  # Output: frozenset({1, 2, 5, 6})

You can also mix regular sets and frozensets in operations:

python

regular_set = {1, 2, 3}
frozen_set = frozenset([3, 4, 5])
 
# Operations between regular and frozen sets
result = regular_set | frozen_set
print(result)  # Output: {1, 2, 3, 4, 5}
print(type(result))  # Output: <class 'set'> (result is a regular set)
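
When an operator mixes the two types, the result takes the type of the left operand. Swapping the operands from the example above gives a frozenset instead, although the elements are the same:

```python
regular_set = {1, 2, 3}
frozen_set = frozenset([3, 4, 5])

print(type(regular_set | frozen_set))  # Output: <class 'set'>
print(type(frozen_set | regular_set))  # Output: <class 'frozenset'>

# The elements are the same either way
print(regular_set | frozen_set == frozen_set | regular_set)  # Output: True
```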

17.6.4) Why Use Frozensets?

The primary reason to use frozensets is that they are hashable (see Section 17.7). That means they can be used as dictionary keys or as elements of other sets, which regular sets cannot be:

python

# WRONG - Regular sets cannot be dictionary keys
# regular_set = {1, 2, 3}
# my_dict = {regular_set: "value"}  # TypeError: unhashable type: 'set'
 
# CORRECT - Frozensets can be dictionary keys
frozen_set = frozenset([1, 2, 3])
my_dict = {frozen_set: "value"}
print(my_dict)  # Output: {frozenset({1, 2, 3}): 'value'}
print(my_dict[frozen_set])  # Output: value

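Here is a practical example of frozensets as dictionary keys, where the order of the parts genuinely doesn't matter:

python

# Storing chat conversations between pairs of users
# A chat between alice and bob is the same chat as one between bob and alice,
# so an unordered frozenset makes a natural key
conversations = {
    frozenset(["alice", "bob"]): "conversation_1",
    frozenset(["bob", "charlie"]): "conversation_2",
    frozenset(["alice", "charlie"]): "conversation_3"
}
 
# Looking up a conversation (the order of the names doesn't matter)
pair = frozenset(["bob", "alice"])
print(conversations[pair])  # Output: conversation_1

Don't use frozensets for keys where order or repetition matters, such as (x, y) coordinates. frozenset([1, 0]) equals frozenset([0, 1]), and frozenset([1, 1]) collapses to frozenset({1}). Use a tuple such as (1, 0) for that kind of key instead.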

Frozensets can also be elements in other sets:

python

# WRONG - Regular sets cannot be elements of sets
# set_of_sets = {{1, 2}, {3, 4}}  # TypeError: unhashable type: 'set'
 
# CORRECT - Frozensets can be elements of sets
set_of_frozensets = {
    frozenset([1, 2]),
    frozenset([3, 4]),
    frozenset([5, 6])
}
print(set_of_frozensets)
# Output: {frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})}

A practical example representing groups:

python

# Representing teams where each team is a frozenset of player IDs
tournament_teams = {
    frozenset([101, 102, 103]),  # Team A
    frozenset([201, 202, 203]),  # Team B
    frozenset([301, 302, 303])   # Team C
}
 
# Check if a specific team is registered
team_to_check = frozenset([101, 102, 103])
if team_to_check in tournament_teams:
    print("Team is registered")
else:
    print("Team not found")
# Output: Team is registered

17.6.5) Converting Between Sets and Frozensets

You can easily convert between regular sets and frozensets:

python

# Converting a regular set to a frozenset
regular = {1, 2, 3, 4}
frozen = frozenset(regular)
print(frozen)  # Output: frozenset({1, 2, 3, 4})
 
# Converting a frozenset to a regular set
frozen = frozenset([5, 6, 7, 8])
regular = set(frozen)
print(regular)  # Output: {5, 6, 7, 8}
 
# Now we can modify the regular set
regular.add(9)
print(regular)  # Output: {5, 6, 7, 8, 9}

17.7) Hashable and Unhashable Types: What Can Be Dictionary Keys or Set Elements (and a Brief Note on Hashing)

Throughout this chapter, we've seen that sets can contain some types of objects but not others. For example, you can create a set of integers or strings, but not a set of lists. This restriction exists because set elements (and dictionary keys, as we learned in Chapter 16) must be hashable.

17.7.1) What Does "Hashable" Mean?

A hashable object is one that has a hash value that never changes during its lifetime. Python computes this hash value using a built-in function called hash():

python

# Hashable types have a hash value
print(hash(42))  # Output: 42
print(hash("Python"))  # Output: a large integer (changes from run to run)
print(hash((1, 2, 3)))  # Output: (some large integer)

The hash value is an integer that Python uses internally to quickly locate objects in sets and dictionaries. Think of it like an address or index that helps Python find things efficiently.

Key property: For an object to be hashable, its hash value must remain constant throughout its lifetime. For Python's built-in types, this means the object itself must be immutable: if the object could change, its hash value would need to change too, and sets and dictionaries would then look for it in the wrong place.
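
A companion rule matters too: objects that compare equal must have equal hashes. Python's numbers follow this rule, which explains a result that surprises many beginners:

```python
# 1, 1.0 and True compare equal...
print(1 == 1.0 == True)                    # Output: True
# ...so they are required to have the same hash
print(hash(1) == hash(1.0) == hash(True))  # Output: True

# A set therefore treats them as one element and keeps the first one added
values = {1, 1.0, True}
print(values)       # Output: {1}
print(len(values))  # Output: 1
```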

17.7.2) Immutable Types Are Hashable

Python's immutable built-in types (integers, floats, strings, booleans, None, frozensets, and tuples whose contents are also hashable) can be used as set elements or dictionary keys:

python

# Integers are hashable
numbers = {1, 2, 3, 4, 5}
print(numbers)  # Output: {1, 2, 3, 4, 5}
 
# Strings are hashable
words = {"apple", "banana", "cherry"}
print(words)  # Output (order may vary): {'apple', 'banana', 'cherry'}
 
# Tuples are hashable (if they contain only hashable elements)
coordinates = {(0, 0), (1, 1), (2, 2)}
print(coordinates)  # Output: {(0, 0), (1, 1), (2, 2)}
 
# Frozensets are hashable
frozen_sets = {frozenset([1, 2]), frozenset([3, 4])}
print(frozen_sets)  # Output: {frozenset({1, 2}), frozenset({3, 4})}
 
# Booleans and None are hashable
mixed = {True, False, None, 42, "text"}
print(mixed)  # Output (order may vary): {False, True, None, 42, 'text'}
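
Because tuples are hashable, they make natural composite dictionary keys, such as grid positions. The board and score values below are made-up sample data; the frozenset key shows a pair lookup where the order of the two names should not matter.

```python
# Tuples as composite dictionary keys (sample data)
board = {(0, 0): "X", (1, 1): "O", (2, 2): "X"}
print(board[(1, 1)])    # Output: O
print((0, 1) in board)  # Output: False

# A frozenset key ignores order: the same pair in either order finds the entry
head_to_head = {frozenset(["Alice", "Bob"]): 3}
print(head_to_head[frozenset(["Bob", "Alice"])])  # Output: 3
```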

17.7.3) Mutable Types Are Not Hashable

Mutable types like lists, regular sets, and dictionaries are not hashable because their contents can change:

python

# Lists are NOT hashable
# my_set = {[1, 2, 3]}  # TypeError: unhashable type: 'list'
 
# Regular sets are NOT hashable
# set_of_sets = {{1, 2}, {3, 4}}  # TypeError: unhashable type: 'set'
 
# Dictionaries are NOT hashable
# my_set = {{"key": "value"}}  # TypeError: unhashable type: 'dict'
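
When you need to store data like this in a set anyway, a common approach is to convert each item to an immutable counterpart first: lists to tuples, sets to frozensets. The sample data below is invented for illustration, and freezing a dictionary's items() only works when all of its values are hashable.

```python
# Deduplicate a list of lists by converting each inner list to a tuple
routes = [[1, 2, 3], [4, 5], [1, 2, 3], [4, 5], [6]]
unique_routes = {tuple(route) for route in routes}
print(len(unique_routes))  # Output: 3

# A "set of sets" works once the inner sets become frozensets
groups = [{1, 2}, {3, 4}, {2, 1}]
unique_groups = {frozenset(g) for g in groups}
print(len(unique_groups))  # Output: 2

# A dict can't be a set element, but a frozenset of its items can
config = {"debug": True, "level": 3}
frozen_config = frozenset(config.items())
print(frozen_config in {frozen_config})  # Output: True
print(dict(frozen_config) == config)     # Output: True
```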

Why does mutability matter? Consider what would happen if we could add a list to a set:

python

# Hypothetical scenario (this doesn't actually work)
# my_list = [1, 2, 3]
# my_set = {my_list}  # Suppose this worked
# 
# # Python computes hash based on [1, 2, 3]
# # Now we modify the list:
# my_list.append(4)  # Now it's [1, 2, 3, 4]
# 
# # The hash value would be wrong! The set would be corrupted.

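This is why Python refuses to put mutable objects in sets or use them as dictionary keys: the set would have stored the list according to the hash of [1, 2, 3], so a later lookup for [1, 2, 3, 4] would search in the wrong place and could miss an element that is actually there.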
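
We can simulate that failure with a toy structure of our own. This is an illustration only: toy_hash is a hypothetical content-based hash (the sum of the list modulo 8), not anything Python uses, but it shows how mutating a stored object sends later lookups to the wrong bucket.

```python
NUM_BUCKETS = 8

def toy_hash(lst):
    # Hypothetical content-based hash, for illustration only
    return sum(lst) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

my_list = [1, 2, 3]
buckets[toy_hash(my_list)].append(my_list)  # stored in bucket 6 (sum is 6)

my_list.append(4)  # contents change: sum is now 10, so toy_hash gives 2

found = my_list in buckets[toy_hash(my_list)]
print(toy_hash(my_list))      # Output: 2
print(found)                  # Output: False (we looked in bucket 2)
print(my_list in buckets[6])  # Output: True (it is still sitting in bucket 6)
```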

Common beginner confusion: A set itself is mutable (you can add and remove elements), but its elements must be hashable, which for built-in types means immutable. Beginners sometimes assume that because a set can change, the objects stored inside it can be mutable too:

python

# Common confusion: set is mutable, but elements must be immutable
# Set is mutable - you can change its contents
fruits = {'apple', 'banana'}
fruits.add('orange')     # ✓ Works
fruits.remove('apple')   # ✓ Works
 
# But elements must be immutable - they cannot be changed
my_list = [1, 2, 3]
# my_set = {my_list}  # ✗ TypeError: unhashable type: 'list'
# Why? If you could modify my_list after adding it, the set's internal 
# structure would be corrupted.
 
# This works because tuples are immutable
my_tuple = (1, 2, 3)
my_set = {my_tuple}  # ✓ Works - tuples can't be modified
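
Because elements can't be modified in place, the usual way to "change" an element is to remove the old value and add a new one. A short sketch with sample coordinates:

```python
points = {(0, 0), (1, 2)}

# "Move" the point (1, 2) to (1, 3): remove the old tuple, add a new one
points.remove((1, 2))
points.add((1, 3))

print(sorted(points))  # Output: [(0, 0), (1, 3)]
```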

17.7.4) The Special Case of Tuples

Tuples are hashable only if all their elements are hashable. A tuple containing mutable objects is not hashable:

python

# Tuple with only immutable elements - hashable
good_tuple = (1, 2, "three")
my_set = {good_tuple} # Works: good_tuple is hashable
print(my_set)  # Output: {(1, 2, 'three')}
 
# Tuple containing a list - NOT hashable
bad_tuple = (1, 2, [3, 4])
# my_set = {bad_tuple}  # TypeError: unhashable type: 'list'

This makes sense: even though the tuple itself is immutable (you can't change which objects it contains), if one of those objects is mutable, the overall "value" of the tuple can change:

python

# Demonstrating why tuples with mutable elements can't be hashed
inner_list = [1, 2]
my_tuple = (inner_list, 3)
print(my_tuple)  # Output: ([1, 2], 3)
 
# The tuple structure is fixed, but the list inside can change
inner_list.append(3)
print(my_tuple)  # Output: ([1, 2, 3], 3)
# Same tuple object, but its "value" changed, so no stable hash is possible
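
In practice, tuples whose contents are all hashable (including nested tuples and frozensets) hash consistently, and a tuple holding a list can usually be fixed by converting the list to a tuple. The exact wording of the TypeError message can differ between Python versions, so the sketch only prints it rather than promising its text.

```python
# Equal tuples built from hashable parts always have equal hashes
a = (1, (2, 3), frozenset([4, 5]))
b = (1, (2, 3), frozenset([5, 4]))
print(a == b)              # Output: True
print(hash(a) == hash(b))  # Output: True

# A tuple holding a list cannot be hashed...
bad = (1, 2, [3, 4])
try:
    hash(bad)
except TypeError as err:
    print("TypeError:", err)

# ...but converting the inner list to a tuple makes it hashable
fixed = (bad[0], bad[1], tuple(bad[2]))
print(fixed)    # Output: (1, 2, (3, 4))
print({fixed})  # Output: {(1, 2, (3, 4))}
```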

17.7.5) Testing Hashability

You can test if an object is hashable by trying to compute its hash:

python

# Testing hashability
def is_hashable(obj):
    """Check if an object is hashable."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False
 
# Testing various types
print(is_hashable(42))  # Output: True
print(is_hashable("text"))  # Output: True
print(is_hashable((1, 2, 3)))  # Output: True
print(is_hashable([1, 2, 3]))  # Output: False
print(is_hashable({1, 2, 3}))  # Output: False
print(is_hashable({"key": "value"}))  # Output: False
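
You might expect collections.abc.Hashable to answer the same question, but an isinstance() check only looks at whether the type defines __hash__, not whether hashing this particular object succeeds. For tuples holding unhashable items the two approaches disagree, which is why actually calling hash() is the more reliable test:

```python
from collections.abc import Hashable

def is_hashable(obj):
    """Check hashability by actually trying to hash the object."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

tricky = (1, [2, 3])
print(isinstance(tricky, Hashable))  # Output: True (the tuple type defines __hash__)
print(is_hashable(tricky))           # Output: False (hashing the list inside fails)
print(isinstance([1, 2], Hashable))  # Output: False
```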

17.7.6) Summary of Hashable Types

Hashable (can be set elements or dict keys):

Integers: 42
Floats: 3.14
Strings: "text"
Tuples (if all elements are hashable): (1, 2, "three")
Frozensets: frozenset([1, 2, 3])
Booleans: True, False
None: None

Not Hashable (cannot be set elements or dict keys):

Lists: [1, 2, 3]
Regular sets: {1, 2, 3}
Dictionaries: {"key": "value"}
Tuples containing unhashable elements: (1, [2, 3])
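
The summary can be checked directly by running each example through the same try/except test used earlier; the is_hashable helper is repeated here so the snippet runs on its own.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

# The examples from the two lists above
samples = [42, 3.14, "text", (1, 2, "three"), frozenset([1, 2, 3]), True, None,
           [1, 2, 3], {1, 2, 3}, {"key": "value"}, (1, [2, 3])]

for obj in samples:
    print(f"{obj!r:<24} hashable: {is_hashable(obj)}")
```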

Understanding hashability helps you choose the right data structures and avoid common errors when working with sets and dictionaries. For the built-in types covered here, the key principle is simple: if an object can change, it can't be hashed; if it can't be hashed, it can't be in a set or used as a dictionary key.