Wednesday, September 19, 2012

Fun with cartesian products

Cartesian product is a fancy term for all possible, ordered combinations of things taken from 2 or more collections of things (There is an exact mathematical definition for it but I don't bother you with it.) The Cartesian product is frequently used to define a grid of coordinates. For instance the following code (Python 2.7) prints out the Cartesian product of the two lists xs and ys:
xs = [1,2]
ys = [1,2,3]
for x in xs:
    for y in ys:
        print x,y
And here is the output:
1 1
1 2
1 3
2 1
2 2
2 3
Simple enough. Just two nested loops for two dimensions. This can be written very nicely as a list comprehension
product = [(x,y) for x in xs for y in ys]
which gives you
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
However, it gets more challenging if you want to generalize to arbitrary numbers of dimensions. An iterative implementation is rather ugly but the recursive version is beautiful (though hard to remember):
def product(ls):
    if not ls: return [[]]
    return [[e]+p for e in ls[0] for p in product(ls[1:])]
And here a usage example:
ls = [[1,2], ['a','b','c'], ['A','B']]
for p in product(ls): 
    print p
[1, 'a', 'A']
[1, 'a', 'B']
[2, 'c', 'A']
[2, 'c', 'B']
How does it work? If the input list ls is empty a list of empty lists is returned ([[]]). Otherwise a list comprehension is performed, where the elements ([e]+p) are the elements (e) of first list (ls[0]) within the input list, prepended to the cartesian products (p) of the remainder (ls[1:]) of the input list. While really nice, it has the disadvantage that the entire Cartesian product is created in memory. Frequently, only the individual products are needed and it would be much more efficient, if one could iterate over them. The following implementation show the generator version:
def product(ls):
    if not ls: return iter( [[]] ) 
    return ([e]+p for e in ls[0] for p in product(ls[1:]))
Often only a more constraint version of this very general implementation is need. Here, for instance, a binary counter with n digits:
def bin_counter(n):
    if not n: return iter( [[]] ) 
    return ([e]+p for e in [0,1] for p in bin_counter(n-1)
or a bit more general, a counter with arbitrary digit ranges:
def counter(ls):
    if not ls: return iter( [[]] ) 
    return ([e]+p for e in xrange(ls[0]) for p in counter(ls[1:]))
ls = [2,3]                
for p in counter(ls): print p
[0, 0]
[0, 1]
[0, 2]
[1, 0]
[1, 1]
[1, 2]
Finally, let's do the same in Scala (2.9). First the general version of the Cartesian Product:
def product[T](ls: List[List[T]]): List[List[T]] = ls match {
  case Nil => List(List())  
  case h::t => for(e <- h; r <- product(t)) yield e::r
val ls = List(List(1,2),List(1,2,3))
println( product(ls) )
The type declarations make it a bit more wordy but type-safe and it still looks elegant. The main difference to the Python version is, that the dimensions need to be of the same type (e.g. all Ints or Char but not mixed). The counter implementation is very similar. Apart from the different type (List[Int]) the main difference is the usage of List.range(0,h). Alternatively, e <-(0 to h).toList, could have been used but the required conversion to a list would make it ugly.
def counter(ls: List[Int]): List[List[Int]] = ls match {
  case Nil => List(List())  
  case h::t => for(e <- List.range(0,h); r <- counter(t)) yield e::r

val ls = List(2,3)
println( counter(ls) )
None of the two Scala implementations above is a generator though. Don't get deceived by the "yield " keyword, which defines a list comprehension in Scala but not a generator as in Python. However, creating a version that generates products or counts on demand using an iterator is easy; here for the counter:
def counter(ls: List[Int]): Iterator[List[Int]] = ls match {
  case Nil => Iterator(List())  
  case h::t => for(e <- Iterator.range(0,h); r <- counter(t)) yield e::r

val ls = List(2,3)

That's it.