Patricia

Patricia

Patricia is a kind of prefix tree, enhanced based on Trie. In a trie, every bit(integer trie) or character(alphabetic trie) occupies a tree node. This behaviour may be unefficient when lots of node only have one child, as shown in figure 1.

Alt text

Figure 1: integer trie(little-endian)

Alt text

Figure 2: integer patricia(big-endian)

To make trie more compact, we can merge nodes where a parent node only has one child, as shown in figure 2. In Patricia, we have two kinds of nodes:

  • Leaf: contains information like key and value.
  • Branch node: contains information like the common prefix of its children.

We will use Integer Patricia to demonstrate how to construct a patricia.

Integer Patricia

Integer patricia is a kind of patricia, whose keys are integers. Similar to integer trie, integer keys are represented using binaries. For example, 2 is represented as 10, 5 is represented as 101.

Insertion

To insert key value pair in patricia, we need to deal with several cases. Suppose we need to insert key value pairs: (4, 4), (6,6), (5,5), (1,1) sequently.

case 1

If the root of patricia is empty, we just initiate a new node with key and value as root node, which is (4,4), as shown in figure 3. Note that the binary representation of 4 is 100.

Alt text

Figure 3: insert key 4 when the root is empty

case 2

When insert key 6, we find that key 4 (binary: 100) and key 6 (binary: 110) has a common prefix 100, so we create a new branch node with prefix 100 and mask 100, as shown in figure 4.

  • prefix: the longest common bits of two keys.
  • mask: denotes which bit begin to be different of two keys. It's value is exponential of 2, for example: 10, 100, 1000. All bits lower than mask do not belong to common prefix.

Below is python program that calculate prefix and mask.

def create_prefix_node(self, key1, key2):
    mask = 1
    xor = key1 ^ key2
    while xor > 0:
        xor = xor >> 1
        mask = mask << 1
    prefix = key1 & (~(mask - 1))
    prefix_node = IntegerPatriciaNode(prefix=prefix, mask=mask)

    return prefix_node, mask

To calculate the prefix and mask of 100 and 110, we first perform exclusive or between 100 and 110, which is 10. We can think the highest bit of xor denotes the bit where key1 begin to be different from key2. Next, we use the xor to calculate mask, which is 100. Note that mask always has one more bit than xor. Last, we use mask and key1(100/110) to calculate prefix. The key concept of key1 & (~(mask - 1)) is that we try to keep bits which are higher than or equal to the highest bit of mask, setting others to 0.

The mask 100 means that key 100 and 110 begin to be different at the second bit position. The second bit of key 100 is 0, whereas 1 for key 110. So the key 100 is branched out to left, key 110 is branched out to right. We can write this mask logic in python:

def match_left(self, mask, key):
    return key & (mask >> 1) == 0

Alt text

Figure 4: insert key 6, generate a new branch node

case 3

When inserting key 5, we first need to determine whether 5 matches the prefix 100 with mask 100. The matching logic is the same as the prefix generating logic, which can be expressed in python:

def match_prefix(self, prefix, mask, key):
    return key & (~(mask - 1)) == prefix

Since the binary of 5 is 101, it match the prefix 100 with mask 100.

Next, we need to decide which branch(0 or 1) should we branch key 5. We can use the match_left function above, which will return True. So 5 will be branched to left.

Finally, we need to calculate the common prefix of 4 and 5, creating a new branch node with prefix 100 and mask 10.

Alt text

Figure 5: insert key 5

case 4

The last case is inserting key 1. We first need to check whether 1 match the prefix 100 with mask 100. Obviously, 1 do not match prefix 100, so we need to recalculate a new prefix for key 1 and prefix 100, which yields 0 with mask 1000, and then create a new branch node using the new prefix and mask, as shown in figure 6.

Alt text

figure 6: insert key 1

To search a key in patrica, we begin at root node, and there are 2 cases.

case 1

If current node is a leaf, we just compare the key in leaf with search key. If they match, we return the value stored in leaf. Otherwise, we terminate the search process.

case 2

If current node is branch node, we need to determine whether the search key match the prefix of branch node. If match, we pass the search key to child of branch node and continue to search its child(left or right child, determined by match_left method). Otherwise, we terminate the search process.

The search process can be expressed in python:

def search(self, key):
    if not self.root:
        return None

    current = self.root
    while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key):
        current = current.left if self.match_left(current.mask, key) else current.right

    if self.is_leaf(current) and current.key == key:
        return current.value

    return None

Full Python Implementation

class IntegerPatriciaNode(object):

    def __init__(self, key=None, value=None, left=None, right=None, prefix=None, mask=None):
        self.key = key
        self.value = value
        self.left = left
        self.right = right
        self.prefix = prefix
        self.mask = mask
        if self.prefix is not None:
            self.is_leaf = False
        else:
            self.is_leaf = True

    def __unicode__(self):
        return u'key:%s, prefix:%s, mask:%s' % (self.key, self.prefix, self.mask)


class IntegerPatricia(object):

    def __init__(self):
        self.root = None

    def is_leaf(self, node):
        return node.is_leaf

    def match_prefix(self, prefix, mask, key):
        return key & (~(mask - 1)) == prefix

    def match_left(self, mask, key):
        return key & (mask >> 1) == 0

    def create_prefix_node(self, key1, key2):
        mask = 1
        xor = key1 ^ key2
        while xor > 0:
            xor = xor >> 1
            mask = mask << 1
        prefix = key1 & (~(mask - 1))
        prefix_node = IntegerPatriciaNode(prefix=prefix, mask=mask)

        return prefix_node, mask

    def insert(self, key, value):
        assert isinstance(key, int)
        if not self.root:
            self.root = IntegerPatriciaNode(key, value)
            return

        parent = None
        current = self.root
        while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key):
            parent = current
            current = current.left if self.match_left(current.mask, key) else current.right

        if self.is_leaf(current) and current.key == key:
            current.value = value
            return

        tmp_key = current.key if self.is_leaf(current) else current.prefix
        prefix_node, mask = self.create_prefix_node(tmp_key, key)
        if self.match_left(mask, key):
            prefix_node.left = IntegerPatriciaNode(key, value)
            prefix_node.right = current
        else:
            prefix_node.right = IntegerPatriciaNode(key, value)
            prefix_node.left = current

        # update root node
        if not parent:
            self.root = prefix_node
        else:
            # relink to parent
            if parent.left == current:
                parent.left = prefix_node
            else:
                parent.right = prefix_node


    def search(self, key):
        if not self.root:
            return None

        current = self.root
        while not self.is_leaf(current) and self.match_prefix(current.prefix, current.mask, key):
            current = current.left if self.match_left(current.mask, key) else current.right

        if self.is_leaf(current) and current.key == key:
            return current.value

        return None
Current rating: 5

Comments

ADDRESS

  • Email: jimmykobe1171@126.com
  • Website: www.catharinegeek.com