Generators in Python provide an efficient way of generating numbers or objects as and when needed, without having to store all the values in memory beforehand.
You can think of generators as a simple way of creating iterators without having to write a class with __iter__() and __next__() methods.
So how do you create a generator?
There are multiple ways, but the most common is to define a function that uses a yield statement instead of a return statement. You can then iterate over the result with a for-loop.
# Define a Generator function: squares.
def squares(numbers):
    for i in numbers:
        yield i*i
Create the generator and iterate.
# Create generator and iterate
sq_gen = squares([1,2,3,4])
for i in sq_gen:
    print(i)
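To see what squares() actually gives you, the standard library's inspect and types modules can distinguish the generator function from the generator object it returns. Here is a minimal sketch. squares is redefined so the snippet runs on its own.

```python
import inspect
import types

def squares(numbers):
    for i in numbers:
        yield i * i

sq_gen = squares([1, 2, 3, 4])

# squares itself is a generator *function* ...
print(inspect.isgeneratorfunction(squares))     # True
# ... and calling it returns a generator *object*
print(isinstance(sq_gen, types.GeneratorType))  # True

# Anything that consumes an iterable can drain it, e.g. list()
print(list(sq_gen))  # [1, 4, 9, 16]
```

After list() has consumed it, sq_gen is empty. This is the exhaustion behaviour discussed later in this article.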
Generator Basics: The advantage of using Generators
Now let’s get into the details of a generator. But first let’s understand some basics.
Consider the following two approaches to printing the squares of the values 0 to 4:
Approach 1: Using list
# Approach 1: Using list
L = [0, 1, 2, 3, 4]
for i in L:
    print(i*i)
Approach 2: Using range generator
# Approach 2: Using range
for i in range(5):
    print(i*i)
The first approach uses a list, whereas the second uses range. Like a generator, range produces its values lazily. Strictly speaking, though, range is a lazy sequence type rather than a generator. Although both methods give the same output, the difference becomes noticeable when the number of items you iterate over grows very large.
This is because a list stores every one of its elements in memory. As the list grows, say you want to iterate up to 5000, the required system memory grows proportionately.
However, that is not the case with range. No matter how many values it covers, the size of the range object itself does not change. That’s something!
# Check size of list vs range
import sys
print(sys.getsizeof(L))
print(sys.getsizeof(range(6)))
Because range computes its values only when they are needed instead of storing them, the memory it requires does not grow even when iterating over 5000 numbers.
# Check size of a larger range
print(sys.getsizeof(range(5000)))
That’s still the same number of bytes as range(6).
Now, that’s the advantage of using generators.
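The same comparison can be made between a list and a real generator. This sketch assumes CPython. squares_gen is a hypothetical helper name. Note that sys.getsizeof reports only the shallow size of an object: for a list, that is its internal array of references, not the integers themselves. The exact byte counts vary by Python version and platform, so only their relationship matters here.

```python
import sys

def squares_gen(n):
    # Lazily yields 0, 1, 4, ..., (n-1)**2
    for i in range(n):
        yield i * i

small_list = [i * i for i in range(6)]
big_list = [i * i for i in range(5000)]

# A list stores a reference to every element, so its size grows with n
print(sys.getsizeof(small_list), sys.getsizeof(big_list))

# A generator object holds only its paused state, so its size does not depend on n
print(sys.getsizeof(squares_gen(6)), sys.getsizeof(squares_gen(5000)))
```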
The good part is that Python lets you create your own generators with custom logic. There are several ways to do it, so let’s look at some examples.
Approach 1. Using the yield keyword
We have already seen this. Let’s walk through the same squares-of-numbers logic step by step: a function that uses the yield keyword.
- Define the generator function
def squares(numbers):
    for i in numbers:
        yield i*i
- Create the generator object
nums_gen = squares([1,2,3,4])
nums_gen
Notice that this only creates a generator object, not the values we want. Not yet, anyway. To actually produce the values, you need to iterate over it and pull them out, for example with next().
print(next(nums_gen))
print(next(nums_gen))
print(next(nums_gen))
print(next(nums_gen))
The yield statement is what turns squares() into a generator function: calling it returns a generator object that can be iterated over.
Now, what happens when you use yield?
Two things mainly:
- Because you’ve used the yield statement in the function definition, calling squares() returns a generator object, nums_gen, which comes with __iter__() and __next__() methods. That makes it an iterator, so you can now call next(nums_gen) on it.
- Once you call next(nums_gen), Python starts executing the logic defined in squares() until it reaches the yield keyword. It then hands back the yielded value and pauses the function in that state without exiting. The next time next() is called, the paused state is restored and execution continues from right after the yield. This continues until the generator is exhausted.
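To make the pause-and-resume behaviour visible, the sketch below records each step the function body executes. The names traced_squares and events are hypothetical and exist only for this illustration.

```python
events = []  # records which parts of the function body have run so far

def traced_squares(numbers):
    for i in numbers:
        events.append(f"computing {i}")
        yield i * i
        events.append(f"resumed after {i}")

gen = traced_squares([1, 2])
print(events)            # [] : creating the generator runs none of the body
print(iter(gen) is gen)  # True: a generator is its own iterator

print(next(gen))         # 1
print(events)            # ['computing 1']  (paused at the yield)

print(next(gen))         # 4
print(events)            # ['computing 1', 'resumed after 1', 'computing 2']
```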
The magic in this process is that all the local variables you created in the function’s local namespace are still available in the next iteration, that is, when next() is called again explicitly or when the generator is iterated in a for-loop.
Had we used return instead, the function would have exited, discarding all the variables in its local namespace.
In short, yield makes the function remember its ‘state’. Such a function can generate values according to custom logic; it has essentially become a ‘generator’.
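The effect of a local variable that survives between yields is easiest to see with a running total. In this sketch, running_total and running_total_return are hypothetical names that contrast yield with return in otherwise identical functions.

```python
def running_total(numbers):
    total = 0  # local variable that survives between yields
    for n in numbers:
        total += n
        yield total

def running_total_return(numbers):
    total = 0
    for n in numbers:
        total += n
        return total  # exits on the first pass; the local state is discarded

print(list(running_total([1, 2, 3, 4])))   # [1, 3, 6, 10]
print(running_total_return([1, 2, 3, 4]))  # 1
```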
What happens after exhausting all the values?
Once the values have been exhausted, calling next() raises a StopIteration exception. To generate the values again, you need to create a new generator.
# Once exhausted, it raises a StopIteration error
print(next(nums_gen))
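If you would rather not deal with the exception, the built-in next() accepts an optional default value that it returns instead of raising StopIteration. Alternatively, you can catch the exception explicitly. Here is a minimal sketch; the default "done" is arbitrary.

```python
def squares(numbers):
    for i in numbers:
        yield i * i

nums_gen = squares([1, 2])
print(next(nums_gen))          # 1
print(next(nums_gen))          # 4
# With a second argument, next() returns it instead of raising StopIteration
print(next(nums_gen, "done"))  # done

# Alternatively, catch the exception explicitly
try:
    next(nums_gen)
except StopIteration:
    print("generator exhausted")
```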
You will need to re-create it and run it again.
nums_gen = squares([1,2,3,4])
This time, let’s iterate with a for-loop.
for i in nums_gen:
    print(i)
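Why doesn’t the for-loop crash with StopIteration? Roughly speaking, a for-loop calls iter() on the object and then calls next() until StopIteration is raised, which it treats as the end of the loop. The sketch below is a simplified Python equivalent. The real loop is implemented inside the interpreter, not as Python code like this.

```python
def squares(numbers):
    for i in numbers:
        yield i * i

results = []
it = iter(squares([1, 2, 3, 4]))  # for a generator, iter() returns the generator itself
while True:
    try:
        value = next(it)
    except StopIteration:
        break  # the for-loop ends here silently instead of raising
    results.append(value)

print(results)  # [1, 4, 9, 16]
```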
Alternatively, you can create an object that can be iterated over again and again without running out of values. One way is to write a class whose __iter__() method contains a yield statement.
Approach 2. Create using class as an iterable
# Approach 2: Convert it to a class that implements an `__iter__()` method.
class Iterable(object):
    def __init__(self, numbers):
        self.numbers = numbers

    def __iter__(self):
        n = self.numbers
        for i in range(n):
            yield i*i

iterable = Iterable(4)

for i in iterable:  # iterator created here
    print(i)
It’s fully iterated now.
Run it again without re-creating iterable.
for i in iterable:  # iterator again created here
    print(i)
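The reason this works is that every pass over the object calls __iter__() again, and each call starts a brand-new generator. The sketch below repeats the class so it runs on its own. It also shows that two iter() calls hand back two distinct generator objects.

```python
class Iterable(object):
    def __init__(self, numbers):
        self.numbers = numbers

    def __iter__(self):
        # Each call to __iter__ starts this generator function afresh
        for i in range(self.numbers):
            yield i * i

iterable = Iterable(4)
print(list(iterable))  # [0, 1, 4, 9]
print(list(iterable))  # [0, 1, 4, 9] again: a new generator per pass

# Each iter() call returns a distinct generator object
print(iter(iterable) is iter(iterable))  # False
```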
Approach 3. Creating generator without using yield
gen = (i*i for i in range(5))
gen
#> <generator object <genexpr> at 0x000002372CA82E40>
for i in gen:
    print(i)
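Try iterating again. This time nothing is printed, because a generator expression is exhausted after one pass and cannot be re-used.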
for i in gen:
    print(i)
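The sketch below contrasts a generator expression (parentheses) with a list comprehension (square brackets). It also shows one common way to get a fresh generator each time: wrap the expression in a small factory function. make_gen is a hypothetical name used only here.

```python
gen = (i * i for i in range(5))  # generator expression: parentheses
lst = [i * i for i in range(5)]  # list comprehension: square brackets

print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # []  (the generator is exhausted after one pass)

print(list(lst))  # [0, 1, 4, 9, 16]
print(list(lst))  # [0, 1, 4, 9, 16]  (a list can be traversed any number of times)

# To iterate again, create a new generator each time
def make_gen():
    return (i * i for i in range(5))

print(sum(make_gen()), sum(make_gen()))  # 30 30
```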
This example may seem redundant, since the same thing can easily be done in other ways.
Let’s look at a more practical example: reading a text file line by line and splitting each line into a list of words.
gen = (i.split() for i in open("textfile.txt", "r", encoding="utf8"))
gen
#> <generator object <genexpr> at 0x000002372CA84190>
Now iterate over the generator:
for i in gen:
    print(i)
#> [‘Amid’, ‘controversy’, ‘over’, ‘‘motivated’’, ‘arrest’, ‘in’, ‘sand’, ‘mining’, ‘case,’]
#> [‘Punjab’, ‘Congress’, ‘chief’, ‘Navjot’, ‘Singh’, ‘Sidhu’, ‘calls’, ‘for’, ‘‘honest’, ‘CM’, ‘candidate’.’]
#> [‘Amid’, ‘the’, ‘intense’, ‘campaign’, ‘for’, ‘the’, ‘Assembly’, ‘election’, ‘in’, ‘Punjab,’]
#> [‘due’, ‘less’, ‘than’, ‘three’, ‘weeks’, ‘from’, ‘now’, ‘on’, ‘February’, ’20,’, ‘the’, ‘Enforcement’, ‘Directorate’, ‘(ED)’]
#> [‘on’, ‘Friday’, ‘arrested’, ‘Bhupinder’, ‘Singh’, ‘‘Honey’,’, ‘Punjab’, ‘Chief’, ‘Minister’]
#> [‘Charanjit’, ‘Singh’, ‘Channi’s’, ‘nephew,’, ‘in’, ‘connection’, ‘with’, ‘an’, ‘illegal’, ‘sand’, ‘mining’, ‘case.’]
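One caveat with open() inside a generator expression is that nothing explicitly closes the file. A common alternative is a generator function that opens the file in a with-block. The file is then closed once the generator finishes, or when the generator is closed or garbage-collected early. This is a minimal sketch. words_per_line is a hypothetical name, and the demo writes made-up sample text to a temporary file instead of the article’s textfile.txt.

```python
import os
import tempfile

def words_per_line(path):
    # The with-block closes the file when the generator finishes
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            yield line.split()

# Demo: write a small sample file to a temporary location
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf8") as tmp:
    tmp.write("first line here\nsecond line of text\n")
    path = tmp.name

rows = list(words_per_line(path))
print(rows)  # [['first', 'line', 'here'], ['second', 'line', 'of', 'text']]
os.remove(path)
```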
Let’s try that again, but just extract the first 3 words in each line.
gen = (i.split()[:3] for i in open("textfile.txt", "r", encoding="utf8"))
for i in gen:
    print(i)
#> [‘Amid’, ‘controversy’, ‘over’]
#> [‘Punjab’, ‘Congress’, ‘chief’]
#> [‘Amid’, ‘the’, ‘intense’]
#> [‘due’, ‘less’, ‘than’]
#> [‘on’, ‘Friday’, ‘arrested’]
#> [‘Charanjit’, ‘Singh’, ‘Channi’s’]
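Because each generator pulls values from the previous one on demand, such steps can be chained into a pipeline, and itertools.islice can stop it after the first few results. In this sketch, io.StringIO stands in for an open text file and the sample text is made up. Since islice asks for only two items, the third line is never read from the file object.

```python
import io
from itertools import islice

# io.StringIO stands in for an open text file
fake_file = io.StringIO("one two three four\nfive six seven\neight nine\n")

lines = (line for line in fake_file)      # stage 1: raw lines
words = (line.split() for line in lines)  # stage 2: list of words per line
first3 = (ws[:3] for ws in words)         # stage 3: keep the first 3 words

# islice pulls only the first 2 results through the whole pipeline
result = list(islice(first3, 2))
print(result)  # [['one', 'two', 'three'], ['five', 'six', 'seven']]
```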
Nice. We have covered the main ways of creating and working with generators. Hopefully the concept of generators is clear now.