Python sets (2.7.x)

Imagine that you have an online shop, and from your sales data you want to find all the countries you have ever sold and shipped a product to. You don’t care how many sales you made to each country, just that you have sold something there at least once.

This is a good case for sets: a set discards duplicates automatically, and checking whether it already contains a value is typically much faster than scanning a list.
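As a rough sketch of the shop scenario, the snippet below collects shipping destinations first with a list and then with a set. The sales records and the names sales, countries_list and countries_set are made up for illustration. print() is called with a single argument so that the code runs on both Python 2.7 and Python 3.

```python
# Hypothetical sales records: (order id, destination country).
sales = [
    (1001, "France"),
    (1002, "Japan"),
    (1003, "France"),
    (1004, "Brazil"),
    (1005, "Japan"),
]

# List-based approach: duplicates have to be filtered out by hand.
countries_list = []
for _, country in sales:
    if country not in countries_list:  # scans the list on every sale
        countries_list.append(country)

# Set-based approach: duplicates are discarded automatically.
countries_set = set(country for _, country in sales)

# sorted() is used only to get a stable order for printing.
print(sorted(countries_list))
print(sorted(countries_set))
```

Both approaches end up with the same three countries; the set version is shorter and avoids the repeated list scan.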

 

Sets

 

Sets can be described as follows:

  • Each element in a set is unique.
  • The elements are unordered within the set.
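The two properties above can be checked with a short experiment. This is only an illustrative sketch; the variable names are arbitrary.

```python
# Duplicates collapse: only one copy of each value is kept.
letters = set(["a", "b", "a", "c", "b"])
print(len(letters))  # 3

# Order is not part of a set's identity, so these compare equal.
same = set(["a", "b", "c"]) == set(["c", "b", "a"])
print(same)  # True

# Because there is no defined order, a set cannot be indexed.
try:
    letters[0]
    indexable = True
except TypeError:
    indexable = False
print(indexable)  # False
```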

 

Initialising a set

 

An empty set can be initialised by calling set() with no arguments (empty braces, {}, create a dictionary rather than a set):

my_set = set()

A pre-populated set can be initialised by:

my_set = set(["one", 5, "hello"])

Or, using a set literal (the preferred method, available from Python 2.7):

my_set = {"one", 5, "hello"}

 

Note from the examples above that a set, like a list, can hold elements of different data types. The one restriction is that every element must be hashable, so immutable values such as strings, numbers and tuples are allowed, but lists and dictionaries are not.
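The following sketch illustrates two details of initialisation that are easy to trip over: empty braces produce a dictionary rather than a set, and set elements have to be hashable. The variable names are arbitrary.

```python
# set() gives an empty set; {} gives an empty dictionary.
empty_set = set()
empty_braces = {}
print(type(empty_set) is set)      # True
print(type(empty_braces) is dict)  # True

# A tuple is hashable, so it can sit alongside other element types.
mixed = {"one", 5, (1, 2)}
print(len(mixed))  # 3

# A list is mutable and therefore unhashable: this raises TypeError.
try:
    bad = {"one", [1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False
print(list_allowed)  # False
```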

 

add

 

You can add an element to a set by using add():

my_set = set()
my_set.add("hello")
print my_set
set(['hello'])
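Because elements are unique, adding a value that is already present leaves the set unchanged and raises no error. A small sketch:

```python
my_set = set()
my_set.add("hello")
my_set.add("hello")  # already present: no error, no change
my_set.add("world")

print(len(my_set))        # 2
print("hello" in my_set)  # True
```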

 

remove

 

You can remove an element from a set by using remove():

my_set = {"bop", "bit", 5}
my_set.remove("bop")
print my_set
set(['bit', 5])
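One thing to watch for: remove() raises a KeyError when the element is not in the set. If a missing element should simply be ignored, discard() can be used instead. The sketch below shows both behaviours; the value "missing" is just a placeholder that is not in the set.

```python
my_set = {"bop", "bit", 5}

# remove() on an absent element raises KeyError.
try:
    my_set.remove("missing")
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# discard() does nothing if the element is absent...
my_set.discard("missing")
# ...and removes it if it is present.
my_set.discard("bop")

print("bop" in my_set)  # False
print(len(my_set))      # 2
```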

 

union

 

If you have two sets and want to create a new set with all the elements from both sets, you can use union():

set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

set_union = set_one.union(set_two)
print set_union
set([18, 'apple', 7, 12, 'hello'])
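The same result can be obtained with the | operator. As a sketch of the difference between the two forms: the union() method accepts any iterable, while the operator expects both operands to be sets. Neither form modifies the original sets.

```python
set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

via_method = set_one.union(set_two)
via_operator = set_one | set_two
print(via_method == via_operator)  # True
print(len(via_operator))           # 5

# union() also accepts a non-set iterable such as a list.
with_list = set_one.union(["pear", 12])
print(len(with_list))  # 4

# set_one itself is unchanged.
print(len(set_one))  # 3
```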

 

intersection

 

If you have two sets and want to find the elements that are in both sets, you can use intersection():

set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

set_intersection = set_one.intersection(set_two)
print set_intersection
set(['hello', 7])
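The & operator is equivalent to intersection() for two sets. When the only question is whether two sets share any element at all, isdisjoint() answers it without building a new set. A brief sketch, where the set other is a made-up example with nothing in common with set_one:

```python
set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

common = set_one & set_two
print(common == set_one.intersection(set_two))  # True

# Sets with no shared elements have an empty intersection.
other = {"pear", 99}
print(len(set_one & other))       # 0
print(set_one.isdisjoint(other))  # True
```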

 

difference

 

If you have two sets and want a new set containing the elements of the first set that do not appear in the second, you can use difference(). The first set itself is left unchanged:

set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

set_difference = set_one.difference(set_two)
print set_difference
set([12])
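Unlike union and intersection, difference depends on the order of the operands. The sketch below uses the - operator, which is equivalent to difference() for two sets, and then shows difference_update(), which modifies the set in place rather than returning a new one.

```python
set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

forward = set_one - set_two   # elements only in set_one
backward = set_two - set_one  # elements only in set_two
print(forward == {12})             # True
print(backward == {"apple", 18})   # True

# difference() and - return a new set; set_one is untouched.
print(len(set_one))  # 3

# difference_update() removes the elements from set_one itself.
set_one.difference_update(set_two)
print(set_one == {12})  # True
```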

 

symmetric difference

 

If you have two sets and want to find the elements that appear in exactly one of them (in either set, but not in both), you can use symmetric_difference():

set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

set_symmetric_difference = set_one.symmetric_difference(set_two)
print set_symmetric_difference
set([18, 12, 'apple'])
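One way to see what symmetric difference means is to rebuild it from the other operations: it is the union of the two sets minus their intersection. The sketch below checks this using the operator forms (^, |, & and -) and also shows that, unlike difference, the order of the operands does not matter.

```python
set_one = {"hello", 12, 7}
set_two = {"apple", "hello", 7, 18}

sym = set_one ^ set_two  # same as set_one.symmetric_difference(set_two)

# Union minus intersection gives the same elements.
rebuilt = (set_one | set_two) - (set_one & set_two)
print(sym == rebuilt)  # True

# The operation is symmetric in its operands.
print(sym == (set_two ^ set_one))  # True
```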