Thursday, June 13, 2013

The beauty of FP (and Scala)

I recently came across a coding problem that beautifully demonstrates the benefits of functional programming and - in this case - Scala. The following (Python) code shows a simplified version of the problem.

numbers1 = [1,2,3]
numbers2 = [1,2,3]
pairs = [(n1,n2) for n1 in numbers1 for n2 in number2]
processed = [process(n1,n2) for n1,n2 in pairs]
for p in processed: print p,  

As you can see it is trivial code that pairs the numbers in both lists, processes the pairs and prints out the results of the process function. Obviously, one could simply write

for n1 in numbers1:
  for n2 in numbers2:
    print process(n1,n2) 

and it would be better code but the previous example resembles the structure of the real, more complex code that has several additional processing steps.

To come back to the previous example, there are two possible problems: 1) let's assume it is not a few numbers but many 2) and that the process function is computationally expensive. For large number-lists memory consumption becomes an issue (since all pairs are stored in a list) and the processing will be time consuming.

Let's address the first problem first. We can easily replace list comprehensions by generators and would then iterate over number-pairs instead of storing them and their results. Easy.

pairs = ((n1,n2) for n1 in numbers1 for n2 in number2)
processed = (process(n1,n2) for n1,n2 in pairs)
for p in processed: print p,  

Problem two is harder. Let's assume the order of the results is of no importance and on a multi-core computer it would be nice to distribute the processing over the cores. But as far as I am aware there is no really elegant way to achieve this with Python. I'll show you soon what I mean by "elegant". Let's switch to Scala.

  
  val numbers1 = List(1,2,3)
  val numbers2 = List(1,2,3)
  val pairs = for(n1 <- numbers1; n2 <- numbers2) yield (n1,n2)
  val processed = pairs map { case (n1,n2) => process(n1,n2)}
  processed foreach (print)

As you can see the Scala code does the same as the first Python program. To change from list comprehensions to generators/iterators in the Scala code it is sufficient to transform the first list into an iterator (using toIterator).

  
  val numbers1 = List(1,2,3).toIterator
  val numbers2 = List(1,2,3)
  val pairs = for(n1 <- numbers1; n2 <- numbers2) yield (n1,n2)
  val processed = pairs map {case (n1,n2) => process(n1,n2)}
  processed foreach (print)

In the Python version we had to replace two list comprehensions by generators. In the Scala version we need to modify only one list. pairs and processed automatically become iterators because their inputs are iterators. In languages without generators this simple change would require substantial restructuring of the code.

Let's assume memory is not the problem but we want to speed things up. We simply add .par after pair to convert the list to a parallel collection and the processing is now performed concurrently. Beautiful!

    
  val numbers1 = List(1,2,3)
  val numbers2 = List(1,2,3)
  val pairs = for(n1 <- numbers1; n2 <- numbers2) yield (n1,n2)
  val processed = pairs.par map {case (n1,n2) => process(n1,n2)}
  processed foreach (print)

Can we have both - speed and low memory consumption? Not without effort. A parallel collection cannot be an iterator and vice versa. We could use futures but it wouldn't be trivial. However, this example shows how easy it is to tune functional code or parts of code to be memory efficient or fast/concurrent.

No comments:

Post a Comment