## Patricia

# Patricia

Patricia is a kind of **prefix tree**, enhanced based on Trie. In a trie, every bit(integer trie) or character(alphabetic trie) occupies a tree node. This behaviour may be unefficient when lots of node only have one child, as shown in figure 1.

Figure 1: integer trie(little-endian)

Figure 2: integer patricia(big-endian)

To make trie more compact, we can merge nodes where a parent node only has one child, as shown in figure 2. In Patricia, we have two kinds of nodes:

**Leaf**: contains information like key and value.**Branch node**: contains information like the common prefix of its children.

We will use **Integer Patricia** to demonstrate how to construct a patricia.

## Integer Patricia

Integer patricia is a kind of patricia, whose keys are integers. Similar to integer trie, integer keys are represented using binaries. For example, 2 is represented as `10`

, 5 is represented as `101`

.

### Insertion

To insert key value pair in patricia, we need to deal with several cases. Suppose we need to insert key value pairs: **(4, 4), (6,6), (5,5), (1,1)** sequently.

#### case 1

If the root of patricia is empty, we just initiate a new node with key and value as root node, which is **(4,4),** as shown in figure 3. Note that the binary representation of 4 is `100`

.

Figure 3: insert key 4 when the root is empty

#### case 2

When insert key 6, we find that key 4 (binary: 100) and key 6 (binary: 110) has a common prefix `100`

, so we create a new branch node with **prefix** 100 and **mask** 100, as shown in figure 4.

**prefix**: the longest common bits of two keys.**mask**: denotes which bit begin to be different of two keys. It's value is exponential of 2, for example: 10, 100, 1000. All bits lower than mask do not belong to common prefix.

Below is python program that calculate prefix and mask.

def create_prefix_node(self, key1, key2): mask = 1 xor = key1 ^ key2 while xor > 0: xor = xor >> 1 mask = mask << 1 prefix = key1 & (~(mask - 1)) prefix_node = IntegerPatriciaNode(prefix=prefix, mask=mask) return prefix_node, mask

To calculate the prefix and mask of 100 and 110, we first perform **exclusive or** between 100 and 110, which is 10. We can think the highest bit of `xor`

denotes the bit where `key1`

begin to be different from `key2`

. Next, we use the `xor`

to calculate `mask`

, which is 100. Note that mask always has one more bit than `xor`

. Last, we use `mask`

and `key1`

(100/110) to calculate prefix. The key concept of `key1 & (~(mask - 1))`

is that we try to keep bits which are higher than or equal to the highest bit of mask, setting others to 0.

The mask 100 means that key 100 and 110 begin to be different at the second bit position. The second bit of key 100 is 0, whereas 1 for key 110. So the key 100 is branched out to left, key 110 is branched out to right. We can write this mask logic in python:

def match_left(self, mask, key): return key & (mask >> 1) == 0

Figure 4: insert key 6, generate a new branch node

#### case 3

When inserting key 5, we first need to determine whether 5 matches the prefix 100 with mask 100. The matching logic is the same as the prefix generating logic, which can be expressed in python:

def match_prefix(self, prefix, mask, key): return key & (~(mask - 1)) == prefix

Since the binary of 5 is `101`

, it match the prefix 100 with mask 100.

Next, we need to decide which branch(0 or 1) should we branch key 5. We can use the `match_left`

function above, which will return True. So 5 will be branched to left.

Finally, we need to calculate the common prefix of 4 and 5, creating a new branch node with prefix 100 and mask 10.

Figure 5: insert key 5

#### case 4

The last case is inserting key 1. We first need to check whether 1 match the prefix 100 with mask 100. Obviously, 1 do not match prefix 100, so we need to recalculate a new prefix for key 1 and prefix 100, which yields 0 with mask 1000, and then create a new branch node using the new prefix and mask, as shown in figure 6.

figure 6: insert key 1

### Search

To search a key in patrica, we begin at root node, and there are 2 cases.

#### case 1

If current node is a leaf, we just compare the key in leaf with search key.
* If they match, we return the value stored in leaf.
* Otherwise, we terminate the search process.

#### case 2

If current node is branch node, we need to determine whether the search key match the prefix of branch node.
* If match, we pass the search key to child of branch node and continue to search its child(left or right child, determined by match_left method).
* Otherwise, we terminate the search process.

The search process can be expressed in python:

def search(self, key): if not self.root: return None current = self.root while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key): current = current.left if self.match_left(current.mask, key) else current.right if self.is_leaf(current) and current.key == key: return current.value return None

### Full Python Implementation

class IntegerPatriciaNode(object): def __init__(self, key=None, value=None, left=None, right=None, prefix=None, mask=None): self.key = key self.value = value self.left = left self.right = right self.prefix = prefix self.mask = mask if self.prefix is not None: self.is_leaf = False else: self.is_leaf = True def __unicode__(self): return u'key:%s, prefix:%s, mask:%s' % (self.key, self.prefix, self.mask) class IntegerPatricia(object): def __init__(self): self.root = None def is_leaf(self, node): return node.is_leaf def match_prefix(self, prefix, mask, key): return key & (~(mask - 1)) == prefix def match_left(self, mask, key): return key & (mask >> 1) == 0 def create_prefix_node(self, key1, key2): mask = 1 xor = key1 ^ key2 while xor > 0: xor = xor >> 1 mask = mask << 1 prefix = key1 & (~(mask - 1)) prefix_node = IntegerPatriciaNode(prefix=prefix, mask=mask) return prefix_node, mask def insert(self, key, value): assert isinstance(key, int) if not self.root: self.root = IntegerPatriciaNode(key, value) return parent = None current = self.root while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key): parent = current current = current.left if self.match_left(current.mask, key) else current.right if self.is_leaf(current) and current.key == key: current.value = value return tmp_key = current.key if self.is_leaf(current) else current.prefix prefix_node, mask = self.create_prefix_node(tmp_key, key) if self.match_left(mask, key): prefix_node.left = IntegerPatriciaNode(key, value) prefix_node.right = current else: prefix_node.right = IntegerPatriciaNode(key, value) prefix_node.left = current # update root node if not parent: self.root = prefix_node else: # relink to parent if parent.left == current: parent.left = prefix_node else: parent.right = prefix_node def search(self, key): if not self.root: return None current = self.root while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key): current = current.left if self.match_left(current.mask, key) else current.right if self.is_leaf(current) and current.key == key: return current.value return None

## Comments