Comments
Richard Davies wrote: The UK has a good crop of technology pioneers in cloud computing - for example ElasticHosts, FlexiScale, Flexiant, OnApp - and also some strong government initiatives such as G-Cloud. We will have to see whether this kind of technical leadership converts into swift mass-market adoption or not.
Cloud Expo on Google News


2008 West
DIAMOND SPONSOR:
Data Direct
SOA, WOA and Cloud Computing: The New Frontier for Data Services
PLATINUM SPONSORS:
Red Hat
The Opening of Virtualization
GOLD SPONSORS:
Appsense
User Environment Management – The Third Layer of the Desktop
Cordys
Cloud Computing for Business Agility
EMC
CMIS: A Multi-Vendor Proposal for a Service-Based Content Management Interoperability Standard
Freedom OSS
Practical SOA” Max Yankelevich
Intel
Architecting an Enterprise Service Router (ESR) – A Cost-Effective Way to Scale SOA Across the Enterprise
Sensedia
Return on Assests: Bringing Visibility to your SOA Strategy
Symantec
Managing Hybrid Endpoint Environments
VMWare
Game-Changing Technology for Enterprise Clouds and Applications
Click For 2008 West
Event Webcasts

2008 West
PLATINUM SPONSORS:
Appcelerator
Get ‘Rich’ Quick: Rapid Prototyping for RIA with ZERO Server Code
Keynote Systems
Designing for and Managing Performance in the New Frontier of Rich Internet Applications
GOLD SPONSORS:
ICEsoft
How Can AJAX Improve Homeland Security?
Isomorphic
Beyond Widgets: What a RIA Platform Should Offer
Oracle
REAs: Rich Enterprise Applications
Click For 2008 Event Webcasts
SYS-CON.TV
Top Links You Must Click On


Pointer Free Data Structures
Pointer Free Data Structures

Last month, we began looking at building data structures in Java. The idea for the article was inspired by the constant posts to comp.lang.java from people who were lost without pointers. The data structures which we introduced were useful, but were really more of a starting point for some more advanced data structures. The third data structure covered last month was a binary search tree. As you will remember, binary search trees are great for storing large amounts of information, as search time is much faster than for a linked-list. The problem with binary trees, however, is the worst case scenario. When the input stream is already in order (reverse or forward), then our binary tree becomes a simple linked list. This, of course, can bring our worst search time from O(lgN) all the way down to O(N) which is no good when you have to search large amounts of data.

The term applied to trees which have become weighted to one side is unbalanced. Figure 1 shows an example of a very unbalanced tree, the result of an input stream of 1-2-3-4-5-6-7. A solution to our problem is obvious; we must make sure that we do not allow our trees to become unbalanced. This, however, is easier said than done. It obviously requires manipulation of a variety of links and if the utmost attention is not spent implementing your algorithms then you can end up with a terrible mess of nodes which will be no fun to untangle.

Various computer scientists have come up with balanced tree algorithms. One of the more popular implementations is the Red-Black tree implementation. These algorithms are actually based on another, more complicated, type of balanced tree: 2-3-4 trees. Before I confuse anyone, let's fall into some theory. If you are already familiar with these topics then feel free to jump to the next section, where we begin our pointer-free implementations. If you are unfamiliar with these topics or if you are just a little rusty, then you will have a full understanding in a few minutes.

Since Red-Black balanced trees are based on 2-3-4 balanced trees we will start our discussion there. Then we will discuss their disadvantages, and finally we will discuss Red-Black balanced trees.

One of the main constraints of binary search trees is the fact that they can only connect to two unique nodes. 2-3-4 balanced trees - as the name implies - get around this by allowing each node to link to 2, 3, or 4 unique nodes. Additionally, each node is allowed to have 1, 2, or 3 keys. Figure 2 shows our tree from Figure 1 implemented as a 2-3-4 balanced tree. See how much better everything is now.

If you were paying attention to some of my earlier comments, then you will remember when I stated that balanced trees were easy to discuss in theory, but were difficult to implement. This also applies to 2-3-4 trees. It is not hard to see that the methods used to manipulate these trees would be a mess of code that only the hard-core geeks could enjoy. A happy medium of binary search trees and 2-3-4 balanced trees has been developed, and this is what we will implement in this month's column.

The "happy medium" comes in the form of something called a Red-Black balanced tree, which is basically a binary representation of a 2-3-4 balanced tree. The caveat here is that the node class uses an extra field to hold a color variable, and we represent different node structures (2, 3, or 4) by altering each node's color variable. Take a look at Figure 3. Here I show how we will represent 3 and 4-nodes by altering the color variable of different binary nodes. Figure 4 shows the tree from Figures 1 and 2 represented as a Red-Black balanced tree.

There are a few rules which Red-Black trees must conform to, and we will institute various checks throughout our methods to insure that these rules are conformed to at all times. The rules are:
1. Every node must be either red or black.
2. Every leaf must be black.
3. If a node is red, then both of its children must be black. This of course implies that we can not have two red links in a row.
4. Every path from a node to a descendant leaf contains the same number of black nodes.

Let's take a break from theory now, and begin coding. If you are still a little unclear as to just how we will insure that the rules will be adhered to, don't fret; we will build two methods (rotate and split) which will take care of everything for us. First, let's build an object which will represent one node. The object will need to have fields for the key (for simplicity we are using ints), and for the color. To save space we are using just one byte for the color, which will take a value of 1 for red or 0 for black. Additionally, we will need to create links to the left, right, and parent nodes. Listing 1 shows the node class. The code is basic, and can be easily understood from the in line comments. It should be noted that all of the get methods are designed to throw an exception if they are asked for a null node. This exception is defined in Listing 2.

As the first comment stated, RBNode will never be manipulated directly by the user. All interaction with the tree will be handled by creating a new instance of the RBTree class (Listing 3). RBTree implements two methods, rotate and split, which manage the balancing of our tree. Let's first take a look at how they would deal with the insertion of a 11 into a Red-Black balanced tree which already contains a few nodes. Once I demonstrate what they do, I will discuss how they do it, and why this is necessary.

Note: If you feel that unnecessary time is spent transferring extra objects through the methods (grand, great, curr, child, etc...) then by all means rewrite the code. I simply felt that the code was much easier to understand this way. If anyone does implement the algorithm without the object passing then please do some benchmarks and let me know which is faster; I am always eager to see my code improved upon. Plus, you will get your results published in next month's edition!

Figure 5 shows a Red-Black balanced tree waiting for insertion of the number 11. As soon as insert attaches the new node containing 11 to the end of the tree (figure 6), it calls split and passes references to the parent, grand, and great objects.

It then checks to see if the parent is a red node; if it is, one to two rotations are preformed on the links. Having to preform two rotations is obviously not something that we want to do too often. Sacrificing too much speed on an insert is not something that we want regardless of how fast a search is. Red-Black trees are great for storing a dynamic data set. If it takes too long to add to that data set then we will not be too pleased. Split by itself does not really do too much work; rotate does the meat of the reorganizing. For our example, only the second call to rotate will happen. It will be passed by the int which has just been inserted, and also the great (great-grandparent) node. It should be noted that cases where two rotations are required are not likely to occur often.

Rotate now uses the key passed into it to "rediscover" the locations of great's child and grandchild. The rotate method wraps itself up by preforming three "link restructures".

As I feel closely tied to the idiom that ends "...show me and I understand", I hope that you now have a detailed understanding of how Red-Black trees stay balanced. Before I leave you with the final code listing I want to stress a few points regarding the function of our methods. First, it is important to note that 4-nodes are something which we try to keep to a minimum. If you take a look at the snippet from our insert method, you will notice that as we perform each insert, any 4-nodes which are encountered are immediately broken up:

//is the current node a "4 node"
try {
if ( "red".equals( currNode.getLeft().getColor() ) &&
"red".equals( currNode.getRight().getColor() ))
{ split( theInt, currNode, parent, grand, great ); }
}
catch ( NoSuchNodeException nsne) { }

Also, you should note the standard points where the node structure is manipulated. These points allow us to follow the set up rules for Red-Black trees which we defined earlier. The points, with their associated action are:

  • A 3-Node connected to a 4-Node should be translated into a 4-Node connected to two 2-Nodes.
  • A 2-Node connected to a 4-Node should be translated into a 3-Node connected to two 2-Nodes.

    Related material and references:
    Algorithms in C++ by Robert Sedewick, 1992 Addison Wesley Introduction to Algorithms by Thomas H. Corman, Charles E. Leiserson, Ronald L. Riverst, 1990 MIT, McGraw-Hill The Java Programming Language by Ken Arnold and James Gosling, 1996 Sun Microsystems, Addison Wesley

    About Luke Cassady-Dorion
    Luke Cassady-Dorion is currently finishing his degree in computer science at Drexel University in Philadelphia, PA. Additionally he heads up Java Development for Odyssey Systems Corporation in Manayunk, PA.

  • In order to post a comment you need to be registered and logged in.

    Register | Sign-in

    Reader Feedback: Page 1 of 1

    Enterprise Open Source Magazine Latest Stories . . .
    Apache Deltacloud, the Red Hat-contributed ReSTful API that abstracts differences between clouds so services on any cloud can be managed – provided of course there’s a driver – has graduated from the Apache Foundation’s incubator and is now a full-fledged Top-Level Project (TLP). The...
    With Cloud Expo 2012 New York (10th Cloud Expo) just four months away, what better time to start introducing you in greater detail to the distinguished individuals in our incredible Speaker Faculty for the technical and strategy sessions at the conference... We have technical and st...
    AMD said late Tuesday that its chief sales officer Emilio Ghilardi had left the company and that CEO and president Rory Read is going to do his job while a replacement is sought. AMD didn’t say why Ghilardi left but it’s assumed Read wants his own people. Read is relatively new to th...
    During the lifespan of M3 (Monitis Monitor Manager) there has always been something lacking – timers. M3 execution procedure was outlined in this previous article. The execution mentioned in the latter was a one-time-execution, whereas server monitoring requires periodic invocati...
    Red Hat is putting its bought-in Gluster scale-out NAS storage technology, acquired in October, on the Amazon cloud. It’s styled Red Hat Virtual Storage Appliance for Amazon Web Services and other clouds are supposed to follow in short order.
    A new episode of the screencast series is now available at the OpenNebula YouTube Channel. This screencast demonstrates the new easily-customizable self-service portal for cloud consumers. Its aim is to offer a simplified access to shared infrastructure for non-IT end users. The scree...
    Subscribe to the World's Most Powerful Newsletters
    Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
    Click to Add our RSS Feeds to the Service of Your Choice:
    Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
    myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
    Publish Your Article! Please send it to editorial(at)sys-con.com!

    Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021


    SYS-CON Featured Whitepapers
    ADS BY GOOGLE