Printer-styled range in Python - a case study
Recently, I had to convert a printer-styled range string into the corresponding list of selected elements. This made for a nice case study on Python class properties, iterability, length, and attribute/method privacy.
A feature request
So, there’s this script of mine that some colleagues use to convert microscopy images from the microscope vendor’s proprietary format (we have a Nikon one in the lab, so some .nd2 files) to an open source one (.tif in this case).
Recently, I was asked to add a feature to this script: my colleagues would like to be able to convert only some fields of view. To do this, I decided to use a range string format that is ubiquitous, OS-independent, and used every day: the printer-styled range string.
This is as simple as separating the pages with commas. If I wanted to print pages 1, 5, and 6, I would write 1,5,6. Page ranges are also allowed, using the dash (-): to print pages 3 to 6 (both included), I would write 3-6. The two can be combined, for example into 1,3-6.
A Python3.6+ class
So, I needed something to validate, parse, and convert a printer-styled range string into an actual series of numbers. The desired behavior would be to convert 1,3-6 into [0,2,3,4,5] (remember, Python is a 0-indexed language).
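As a quick illustration of the desired behavior, here is a minimal function-based sketch. The name expand_printer_range is hypothetical and this is not the MultiRange class discussed in this post; it only shows the expected mapping from a range string to 0-indexed positions, assuming well-formed input.

```python
from typing import List


def expand_printer_range(s: str) -> List[int]:
    """Expand a printer-styled range string into sorted 0-indexed positions.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    pages = set()
    for chunk in s.split(","):
        chunk = chunk.strip()
        if "-" in chunk:
            start, end = (int(x) for x in chunk.split("-"))
            pages.update(range(start, end + 1))  # both extremes included
        else:
            pages.add(int(chunk))
    # Pages are 1-indexed, Python sequences are 0-indexed.
    return [page - 1 for page in sorted(pages)]


print(expand_printer_range("1,3-6"))  # [0, 2, 3, 4, 5]
```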
To do this, I implemented a Python3.6+ class, MultiRange, that I then published as a Gist on GitHub (see below). I will break it down and explain it bit by bit below.
```python
import re
from typing import Iterator, List, Optional, Pattern, Tuple
```
The class has only two dependencies, both from the Python standard library:
- re provides regular expression features. We will use it to validate the printer-styled range string.
- typing is used to provide type hints and help developers.
The class attributes
```python
# Class-level defaults; each tuple holds two integers.
__current_item: Tuple[int, int] = (0, 0)
__string_range: Optional[str] = None
__extremes_list: Optional[List[Tuple[int, int]]] = None
__reg: Pattern = re.compile(r'^[0-9-, ]+$')
__length: Optional[int] = None
__ready: bool = False
```
- The __string_range attribute will contain the printer-styled range string that we want to convert into a list of numbers. For example: 1,3-6.
- __extremes_list will contain a list of tuples. Each tuple will have two elements: the start and stop of a range/slice. The idea is to convert __string_range into something like [(1,1), (3,6)].
- __reg contains the validation regular expression. During class instantiation, we will try to match the input string to this. If no match is found, the class raises an error. Basically, we allow only strings made of digits, commas, dashes, and spaces.
- I will talk about __ready in the length section below.
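To get a feeling for what the validation pattern accepts, here is a small sketch. The helper name has_valid_characters is hypothetical; the pattern is the one stored in __reg. Note that the pattern only filters characters: a string like 1--3 passes this check and would fail later, when its pieces are converted with int.

```python
import re

# Same pattern as the __reg class attribute.
reg = re.compile(r'^[0-9-, ]+$')


def has_valid_characters(s: str) -> bool:
    """Hypothetical helper: True if s contains only digits, commas, dashes, spaces."""
    return reg.search(s) is not None


for candidate in ["1,3-6", "1, 3 - 6", "1;3-6", "a-b", "1--3"]:
    print(repr(candidate), has_valid_characters(candidate))
```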
As you might have noticed, all attribute names start with two underscores (__). In Python, this triggers name mangling (see PEP 8), which makes them ‘private’ by convention: outside the class they are reachable only under a mangled name such as _MultiRange__length, not under their plain name.
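Name mangling is easy to see in a toy example. The Demo class below is hypothetical and exists only to show how a double-underscore attribute behaves inside and outside its class.

```python
class Demo:
    __secret = 42  # stored on the class as _Demo__secret

    def reveal(self) -> int:
        # Inside the class body, __secret is rewritten to _Demo__secret.
        return self.__secret


d = Demo()
print(d.reveal())              # 42
print(hasattr(d, "__secret"))  # False: the plain name does not exist
print(d._Demo__secret)         # 42: the mangled name is still reachable
```

So the attribute is hidden rather than locked away: the double underscore protects against accidental access and name clashes in subclasses, not against a determined user.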
```python
def __init__(self, s: str):
    super(MultiRange, self).__init__()

    # Validate the characters of the input string.
    assert self.__reg.search(s) is not None, (
        "cannot parse range string. It should only contain numbers, "
        + "commas, dashes, and spaces.")
    self.__string_range = s

    # Split at the commas and strip surrounding whitespace.
    string_range_list = [b.strip()
                         for b in self.__string_range.split(",")]

    self.__extremes_list = []
    for string_range in string_range_list:
        extremes = [int(x) for x in string_range.split("-")]
        if 1 == len(extremes):
            # A single page becomes a range with identical extremes.
            extremes = [extremes[0], extremes[0]]
        assert 2 == len(extremes), "a range should be specified as A-B"
        assert extremes[1] >= extremes[0]
        self.__extremes_list.append(tuple(extremes))

    # Sort by start, then merge overlapping ranges.
    self.__extremes_list = sorted(
        self.__extremes_list, key=lambda x: x[0])
    self.__clean_extremes_list()

    assert 0 < self.__extremes_list[0][0], "'page' count starts from 1."
    self.__ready = True
```
The __init__ method initializes a new instance of the class and takes the printer-styled range string as input (s). The first thing we want to do is validate the input by checking it against __reg, stopping with an AssertionError if it does not match.
After that, we store the input string in __string_range and split it into elements, using commas as delimiters and removing any leading or trailing whitespace.
Each element is now a string containing either a single page ("1") or a page range ("3-6"). We identify these two cases by splitting each element at the dashes and counting the number of generated parts. In the case of a single page, we convert it to a number (int) and store it as a tuple with identical extremes, such as (1,1). In the case of a page range, we save it as (3,6).
Then, we sort the list of tuples we created by their first elements, and clean them up. Basically, this cleaning operation is meant to avoid overlaps between tuples (see the next section for more details).
Finally, we verify that the first tuple does not start with a 0 (page numbers start from 1) and tell the class that it is __ready to be used.
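The split-and-convert step can also be tried in isolation. The sketch below uses a hypothetical standalone function, to_extremes, and deliberately swaps the assertions for ValueError exceptions; it is an illustration of the idea, not the code of the class.

```python
from typing import List, Tuple


def to_extremes(s: str) -> List[Tuple[int, int]]:
    """Turn 'A' and 'A-B' elements into (start, end) tuples sorted by start."""
    extremes_list = []
    for element in (chunk.strip() for chunk in s.split(",")):
        extremes = [int(x) for x in element.split("-")]
        if len(extremes) == 1:
            extremes = [extremes[0], extremes[0]]  # single page: start == end
        if len(extremes) != 2 or extremes[1] < extremes[0]:
            raise ValueError(f"invalid range element: {element!r}")
        extremes_list.append((extremes[0], extremes[1]))
    return sorted(extremes_list, key=lambda pair: pair[0])


print(to_extremes("3-6, 1"))  # [(1, 1), (3, 6)]
```

An element such as 6-3 raises a ValueError here, mirroring the second assertion in __init__.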
Cleaning the list of extremes
```python
def __clean_extremes_list(self) -> None:
    is_clean = False
    while not is_clean:
        popped = 0
        i = 0
        while i < len(self.__extremes_list)-1:
            A = self.__extremes_list[i]
            B = self.__extremes_list[i+1]
            if A[1] >= B[0] and A[1] < B[1]:
                # Partial overlap: merge B into A.
                self.__extremes_list[i] = (A[0], B[1])
                self.__extremes_list.pop(i+1)
                popped = 1
                break
            elif A[1] >= B[1]:
                # B is fully included in A: drop B.
                self.__extremes_list.pop(i+1)
                popped = 1
                break
            i += 1
        # Clean if the scan reached the end of the list.
        if i >= len(self.__extremes_list)-2+popped:
            is_clean = True
```
After converting the input string, we have a list of number pairs (as tuples), each representing a range of pages from the first number to the second one, both included. You can imagine any of these pairs as a (start,end) tuple.
Before proceeding, we want to be sure that every page number will be provided only once. In other words, these ranges should not overlap. We achieve this by iterating through the list of (start,end) elements and comparing each pair with the next one to see if they overlap. For this to work, it is crucial to have the list of pairs sorted by their start element (which we did in the __init__ method).
Now, when are two pairs overlapping? Given that the first pair (A) has a start (A[0]) that is always lower than or equal to the start of the second one (B[0]), this can happen in two ways:
- The second pair is fully included in the first. In other words, the first pair ends after or precisely when the second one ends: A[1] >= B[1].
- The two pairs partially overlap. The first pair ends after or precisely when the second one starts (A[1] >= B[0]), but the first pair ends before the second one ends (A[1] < B[1]).
It is important to distinguish these two scenarios because they require us to act differently:
- If the second pair is fully included in the first one, we can simply remove (pop) it from our list of ranges.
- If the overlap is only partial, we need to merge the two ranges. We can achieve this by removing (popping) the second one, and replacing the end position of the first one with the end of the second. Basically, we remove both pairs and place a new one: (A[0], B[1]).
Every time we find overlapping pairs, we resolve the conflict and restart from the beginning of the list. Only when we reach the end of the list can we say that it is indeed clean and ready to be used.
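For comparison, the same two overlap rules can be applied in a single pass over the sorted list. This is a different technique from the restart-based loop described above, sketched here as a hypothetical standalone function (merge_overlaps), using the same overlap criterion: the previous pair ends at or after the start of the next one.

```python
from typing import List, Tuple


def merge_overlaps(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping (start, end) pairs in one pass over the sorted input."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(pairs):
        if merged and merged[-1][1] >= start:
            # Overlap: keep the earlier start and the larger end.
            # max() covers both the "included" and the "partial" case.
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_overlaps([(1, 5), (2, 3), (4, 8), (10, 12)]))  # [(1, 8), (10, 12)]
```

Because each new pair is only compared with the last merged one, there is no need to restart from the beginning after resolving a conflict.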
The MultiRange length
```python
@property
def length(self):
    # Compute once, and only when the instance is ready.
    if self.__length is None and self.__ready:
        self.__length = 0
        for a, b in self.__extremes_list:
            self.__length += b-a+1
    return self.__length

def __len__(self) -> Optional[int]:
    return self.length
```
We want to be able to know the length of our MultiRange instance. But given that this requires some computation, we want to (1) calculate it only if the MultiRange is __ready to be used, (2) calculate it once, (3) store it somewhere (self.__length), and (4) read the stored value on any later call.
To provide a hook for the len function and make code like len(MultiRange("1,2-6")) work, we need to define a __len__ method. The method returns an integer: the one we store in __length.
We do not want anyone to be able to mess with the stored value, which is why we store it in the private attribute __length (notice the __ at the beginning!). But we want a user to be able to access it without having to call the len function. To do this, we define a class property with the @property decorator.
Basically, every time the user accesses MultiRange("1,2-6").length, the property is triggered. It checks if the class is ready (self.__ready) and if no length value has been previously stored (self.__length is None). Only in this case does it calculate the length value and store it in __length. Otherwise, it returns the value already stored: the previously calculated length or, if the class is not ready yet, the default value None.
To calculate the MultiRange length, we take the difference between the extremes of each range (a,b) and add 1 (as both extremes are included): b-a+1. Then, we sum these values and store the result in __length.
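The compute-once pattern can be observed with a stripped-down stand-in. The PageRanges class and its computations counter below are hypothetical and only serve to show that the sum is calculated a single time, however often the length is requested.

```python
from typing import List, Optional, Tuple


class PageRanges:
    """Hypothetical stand-in for MultiRange, reduced to the length logic."""

    def __init__(self, extremes: List[Tuple[int, int]]):
        self.__extremes = extremes
        self.__length: Optional[int] = None
        self.computations = 0  # counter, only here to show the caching

    @property
    def length(self) -> int:
        if self.__length is None:
            self.computations += 1
            # Both extremes are included, hence the +1.
            self.__length = sum(b - a + 1 for a, b in self.__extremes)
        return self.__length

    def __len__(self) -> int:
        return self.length


ranges = PageRanges([(1, 1), (3, 6)])
print(len(ranges), ranges.length, ranges.computations)  # 5 5 1
```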
How to iterate over the MultiRange elements?
What is an ‘iterator’?
First things first: what is an iterator? Well, in Python an object can be an iterable (1) and/or an iterator (2). An iterable contains elements that can be iterated over, while an iterator is used to iterate over an iterable. According to PEP 234:
- An object can be iterated over with for if it implements __iter__() or __getitem__().
- An object can function as an iterator if it implements __next__() (spelled next() in the PEP, which predates Python 3).
Just to re-iterate the concept (pun unintended): if a class includes an __iter__ method it can be iterated over, i.e., the class is iterable. If a class includes a __next__ method, it is an iterator.
Notice that an iterator is, by definition, also iterable. The converse is not true: not all iterables are iterators. While the wording might be confusing, the concept is quite simple. Since an iterator iterates over the elements of something, one can iterate over those elements through it, which makes it an iterable too. On the other hand, one can iterate over the elements of an iterable only if an iterator for it is available.
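The distinction can be checked with the abstract base classes in collections.abc. The short sketch below uses a plain list as the iterable and the built-in iter and next functions.

```python
from collections.abc import Iterable, Iterator

pages = [1, 3, 4]                    # a list is iterable...
print(isinstance(pages, Iterable))   # True
print(isinstance(pages, Iterator))   # False: a list has no __next__

it = iter(pages)                     # ...and iter() returns an iterator over it
print(isinstance(it, Iterator))      # True
print(isinstance(it, Iterable))      # True: iterators are iterable too
print(next(it), next(it), next(it))  # 1 3 4
# One more next(it) would raise StopIteration.
```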
Making your own Python iterator
So, now that we know what an iterator is, how can we make our MultiRange class into one? It’s fairly simple: we implement two methods:
- __next__ is our iterator method, which returns one element of the MultiRange at a time, in order. Once the elements are over, it raises a StopIteration.
- __iter__ is our iterable method. When called, it allows a user to iterate over the elements of the MultiRange, starting from the first. This method only needs to reset the iterator location to the first element and return the class instance itself. Python will take care of the magic behind it all and link the two methods.
In __next__ we want to (1) know which element we have iterated to and (2) move to the next one if possible, otherwise raise a StopIteration. First, we store the iterator location in __current_item as a tuple containing two numbers: the index of the current range and the index of the current element in that range (starting from 0).
When we reach the last element of a range, we move to the first element of the next range. If we have reached the last element of the last range, it is time to stop.
```python
def __next__(self) -> int:
    current_range_id, current_item = self.__current_item
    if current_range_id >= len(self.__extremes_list):
        raise StopIteration
    current_range = self.__extremes_list[current_range_id]
    if current_item >= current_range[1]-current_range[0]:
        # Last element of this range: jump to the next range.
        self.__current_item = (current_range_id+1, 0)
    else:
        self.__current_item = (current_range_id, current_item+1)
    return current_range[0]+current_item

def __iter__(self) -> Iterator[int]:
    # Reset the location and let the instance act as its own iterator.
    self.__current_item = (0, 0)
    return self
```
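One consequence of a class acting as its own iterator is that all loops over the same instance share a single location. An alternative design, sketched here with a hypothetical RangeView class, is to write __iter__ as a generator function, so that every loop gets an independent iterator. This is a swapped-in technique for comparison, not how MultiRange is implemented.

```python
from typing import Iterator, List, Tuple


class RangeView:
    """Hypothetical iterable whose __iter__ is a generator function."""

    def __init__(self, extremes: List[Tuple[int, int]]):
        self._extremes = extremes

    def __iter__(self) -> Iterator[int]:
        # Each call returns a fresh generator with its own position.
        for start, end in self._extremes:
            yield from range(start, end + 1)


view = RangeView([(1, 1), (3, 6)])
print(list(view))  # [1, 3, 4, 5, 6]
# Nested loops over the same object do not interfere with each other.
print([(a, b) for a in view for b in view][:3])  # [(1, 1), (1, 3), (1, 4)]
```

The trade-off is that such a class is only an iterable, not an iterator: calling next on it directly does not work, one has to call iter on it first.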