Sunday, December 21, 2014

Abstract Data Structures

What are Abstract Data Types (ADTs)? You can find many definitions, but it often remains fuzzy what the term actually means and what the concept is good for. For instance, one could define an ADT as follows:

An Abstract Data Type is a formal definition of the behavior of a data object with time (and space) complexity guarantees but no implementation details.

Doesn't help much, does it? Is an abstract class an ADT ... possibly. Do an array and a bunch of functions constitute an ADT ... could be. Let us take a concrete example and play with it. The standard example for an ADT is a stack, which is defined as follows:

ADT: Stack
operations:        complexity:
  push(item)         O(1)
  item <- pop()      O(1)
  Last-in-first-out (LIFO)

It has a push operation to put an item on the stack and a pop operation to remove an item from the stack. Both operations take constant time and have LIFO behavior. A typical Python implementation of a stack in object oriented style could be this:

# object oriented
class Stack:
    def __init__(self) :
        self.items = []
    def push(self, item) :
        self.items.append(item)
    def pop(self) :
        return self.items.pop()

Since list.append() and list.pop() are amortized O(1) operations in Python (see here), the above stack.push() and stack.pop() operations also take constant time. Now we can add items to and remove items from the stack in constant time and in LIFO order. For instance,

stack = Stack()
for i in range(5):
    stack.push(i)
for i in range(5):
    print(stack.pop(), end=" ")

will print out:

4 3 2 1 0 

Instead of a Python list, which actually is a dynamically growing array, we could have used a linked list, an array of fixed size, or any other data structure to implement the stack. As long as the implementation has O(1) push and pop operations with LIFO behavior, it does not matter. We can also implement the stack type in an imperative or functional style. For the imperative style, we simply take a Python list and define push and pop functions:

# imperative
def push(stack, item):
    stack.append(item)
def pop(stack):
    return stack.pop()

stack = []
push(stack, 1)
push(stack, 2)
item = pop(stack)

Functional programmers tend to prefer immutable data structures. Let's use immutable tuples to implement a stack:

# functional
def push(stack, item):
    return stack + (item,)
def pop(stack):
    return stack[-1], stack[:-1]

stack = push(push((), 1), 2)
item, stack = pop(stack)

These are all implementations of the ADT "stack". Since Python already has a list with append and pop methods, this is trivial. Let us do something a bit more interesting. A common interview question is to implement a queue using only stacks as the underlying data structures. A queue is another common ADT and is defined as follows:

ADT: Queue
operations:         complexity:
  enqueue(item)       O(1)
  item <- dequeue()   O(1)
  First-in-first-out (FIFO)

The key insight is that a stack reverses the order of the items (LIFO) and that reversing twice gives you the FIFO behavior required for a queue. We will use an instack that stores all enqueued items, and an outstack from where elements are dequeued. To achieve the required FIFO behavior of the queue, we move all items from the instack to the outstack (using push and pop operations) whenever dequeue() is called and the outstack is empty:

class Queue:
    def __init__(self) :
        self.instack = Stack()
        self.outstack = Stack()
    def enqueue(self, item) :
        self.instack.push(item)
    def dequeue(self) :
        if self.outstack.isEmpty():
            while not self.instack.isEmpty():
                self.outstack.push(self.instack.pop())
        return self.outstack.pop()

You may have noticed that I am using the so-far-undefined method isEmpty() to test whether a stack is empty. It is not essential, since we could have counted the items added to and removed from the stacks to achieve the same, but most stack implementations provide an isEmpty() method for convenience (see below).

class Stack:
    # ... push and pop as before ...
    def isEmpty(self):
        return self.items == []
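Putting everything together, we can check that the queue really shows FIFO behavior. The complete Stack and Queue classes are repeated here so the snippet runs on its own:

```python
class Stack:
    def __init__(self):
        self.items = []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        return self.items.pop()
    def isEmpty(self):
        return self.items == []

class Queue:
    def __init__(self):
        self.instack = Stack()
        self.outstack = Stack()
    def enqueue(self, item):
        self.instack.push(item)
    def dequeue(self):
        # Move items over only when the outstack has run dry.
        if self.outstack.isEmpty():
            while not self.instack.isEmpty():
                self.outstack.push(self.instack.pop())
        return self.outstack.pop()

queue = Queue()
for i in range(5):
    queue.enqueue(i)
print(*(queue.dequeue() for _ in range(5)))  # prints: 0 1 2 3 4
```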

What is important, however, is that isEmpty() is also an O(1) operation. Now, let's check whether our queue implementation actually follows the ADT specification. The enqueue method only calls the push method, which is an O(1) operation, and therefore enqueuing an item is also a constant time operation, as required. The dequeue method is a bit more complex. Let's think about the simple case of enqueuing and then immediately dequeuing a single item. In this case, we call outstack.isEmpty() once, instack.isEmpty() twice, outstack.push(instack.pop()) once and outstack.pop() once, and since they are all O(1) operations, dequeuing is an O(1) operation as well.

Now for the more complex case. Assume we enqueue n items on the instack; that costs n O(1) pushes, which is O(n) in total. If we then dequeue those n items, the while loop runs only for the first dequeuing call and moves all n items from the instack to the outstack. That is an O(n) operation that occurs only once. In addition, outstack.pop() is called for every dequeue, which is another O(n) in total. Spreading the O(n) + O(n) total work over the n dequeue calls gives O(n)/n per call, so the amortized time complexity of dequeuing is O(1).
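We can make this amortized argument concrete by counting stack operations: each item is pushed at most twice (once per stack) and popped at most twice, so n enqueues followed by n dequeues cost at most 4n stack operations, a constant 4 per queue operation. A small sketch, using a CountingStack wrapper invented here for illustration:

```python
class CountingStack:
    ops = 0  # total push/pop calls across all CountingStack instances
    def __init__(self):
        self.items = []
    def push(self, item):
        CountingStack.ops += 1
        self.items.append(item)
    def pop(self):
        CountingStack.ops += 1
        return self.items.pop()
    def isEmpty(self):
        return self.items == []

instack, outstack = CountingStack(), CountingStack()
n = 1000
for i in range(n):              # n enqueues: n pushes
    instack.push(i)
for _ in range(n):              # n dequeues
    if outstack.isEmpty():
        while not instack.isEmpty():
            outstack.push(instack.pop())  # 2n ops, but only once overall
    outstack.pop()              # n pops

# Each item is pushed twice and popped twice: exactly 4n operations.
print(CountingStack.ops)  # prints: 4000
```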

In what way did the concept of ADTs help here? When implementing the queue and analyzing its time complexity, we did not have to think about the implementation of the stack. Being able to rely on the guaranteed O(1) complexity of all stack operations simplified both the reasoning and the implementation.