The Paradox of Writing Perfect Code
Static code analysis versus Santa Claus and the Easter Bunny

Don't you love looking at a good piece of code? I'm talking about the kind of code where the design is so sound that the code practically wrote itself, where there were no nasty surprises at implementation, where it was 100% feature complete and bug-free, and you didn't have to patch it up a bunch of times. Maybe I'm squarely in the land of Santa Claus and the Easter Bunny, but I believe, deep down, all developers want to write that perfect piece of code. Unfortunately, real life has other ideas. Deadlines, unclear or conflicting requirements, ridiculous scope, being human - all these things keep us from the promised land of perfect code.

But here's the rub: though it may be satisfying to dream about, it's likely that you'll never produce truly perfect code for real-world applications. You'll sit down to write a piece of code, you'll do the best you can, taking into account everything you know about how the system works, how your piece of code fits into that system, and so forth. But we all know there will be mistakes - probably lots of them. And you'll do some testing, and the QA guys will do some testing, and the beta customers will do some testing, and then poof, the business-minded people in your organization will decide it's good enough to be released. At that point, the code isn't perfect, and every time you have to change that released code, you introduce risk into the system. Thousands or hundreds of thousands or millions of people are using it as is, and if you decide to make changes it might work differently for those people. This is risky, and the tools you use to help write your code must be cognizant of that fact.

Tools for Writing Imperfect Code
This article is about a certain kind of tool - static code analysis - that can be used to help you in writing good code. Not perfect code, but good code. As introductory computer science classes increasingly move to Java (even the high school AP computer science curriculum is Java-based now), the tools available to C/C++ developers should move over to Java as well. Over the last decade, as Java exploded in popularity, there have been tremendous breakthroughs in the area of practical static code analysis for defect detection. Today many commercial tools are available to do static code analysis of your C, C++, and Java code. I work for one such tool provider and I'll discuss our experience expanding from C/C++ into Java here. We'll explore how some of the concepts we used to analyze C/C++ code translated into the Java realm and the lessons we've learned in making this type of technology practical to help you write good code. First, I'll dig into some discussion of architecture and then I'll give you my philosophy on finding bugs automatically and the true purpose of these tools.

C++ and Java: What's the Same?
From a code analysis perspective, C++ and Java have a lot in common. Both require you to build some internal representation of the code into the guts of your analysis to support dataflow analysis. This means breaking each function or method into basic blocks, computing a control flow graph, and having an analysis engine that can push checks down each possible execution path in the methods while keeping track of the relevant variables and their values. Each check can then pick out the constructs relevant to it as the engine walks the code. For example, if I'm looking for null pointer exceptions, my null checker simply looks for places where objects are compared against null or assigned a null value. It then lets the analysis push it down each path until it sees a dereference while that value is null.
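
The first step, breaking a method into basic blocks, follows a classic recipe. An instruction is a block "leader" if it is the first instruction, the target of a jump, or the instruction right after a jump, and each block runs from one leader up to the next. The Java sketch below applies that rule to a toy instruction list. The `Insn` and `BasicBlocks` classes are hypothetical illustrations, not part of any real analysis tool, and the sketch ignores returns and exceptions for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical toy instruction: target is a jump destination index, or -1 if it does not jump. */
final class Insn {
    final String text;
    final int target;

    Insn(String text, int target) {
        this.text = text;
        this.target = target;
    }
}

final class BasicBlocks {
    /** Splits code into basic blocks, returned as inclusive {start, end} index pairs. */
    static List<int[]> split(List<Insn> code) {
        TreeSet<Integer> leaders = new TreeSet<>();
        if (!code.isEmpty()) {
            leaders.add(0); // rule 1: the first instruction starts a block
        }
        for (int i = 0; i < code.size(); i++) {
            int target = code.get(i).target;
            if (target >= 0) {
                leaders.add(target); // rule 2: every jump target starts a block
                if (i + 1 < code.size()) {
                    leaders.add(i + 1); // rule 3: so does the instruction after a jump
                }
            }
        }
        // Each block runs from one leader up to (but not including) the next leader.
        List<int[]> blocks = new ArrayList<>();
        for (int start : leaders) {
            Integer next = leaders.higher(start);
            int end = (next == null) ? code.size() - 1 : next - 1;
            blocks.add(new int[] {start, end});
        }
        return blocks;
    }
}
```

Take a six-instruction method where instruction 1 jumps to 4 when `body` is null and instruction 3 jumps to 5. Its leaders are 0, 2, 4, and 5, so it has four blocks. The control flow graph then links each block to its jump target and, for a conditional jump, to the block that follows it.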

Listing 1 shows an example from the Struts framework. Notice that on line 171, the developer compares body against null. Unfortunately, the developer probably meant to make that comparison == instead of !=. When body is null, the code will skip over the assignment on line 172 and dereference the body variable on line 175. Oops. Listing 2 shows you what that code looks like in the interface of a static code analysis tool. The analysis engine pushes the checker down all the paths in this function. The checker notices the comparison against null, keeps track of body as being null when the condition on line 171 is false, and then reports a problem when it's dereferenced as null. Simple enough, right?
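
Listing 1 itself isn't reproduced here, but the bug pattern it describes fits in a few lines of Java. This is an illustrative reconstruction with hypothetical method names, not the actual Struts source. The inverted test lets a null `body` skip the assignment and reach the dereference.

```java
/** Illustrative reconstruction of the inverted-null-check pattern; not the actual Struts code. */
final class BodyExample {
    // Buggy: the test is inverted, so a null body skips the default assignment
    // and is then dereferenced.
    static int buggyLength(String body) {
        if (body != null) { // probably meant: body == null
            body = "";
        }
        return body.length(); // NullPointerException when body was null
    }

    // Corrected: supply a default only when body is null.
    static int fixedLength(String body) {
        if (body == null) {
            body = "";
        }
        return body.length();
    }
}
```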

False Positives and Java
Well, almost right. The biggest problem that the designer of any static code analysis tool faces is false positives. What is a false positive? Basically, any time the analysis reports a defect where there is none, that's a false positive. Some people call this noise, but I like to stay away from that term. Noise is a problem, but it's a different problem. To better understand a case that might trip up a static code analysis tool, take a look at Listing 3. The Struts code from the previous example has been slightly modified to introduce a data dependence between the value of body and the value of body_tracker. Notice that after the test of body against null, the value of body_tracker will be 5 if body is not null and 12 if body is null. As such, there's no longer a null dereference on line 177 because it's guarded by the check of body_tracker. This example is simple, but it may fool a naive analysis engine into reporting a defect where there is no problem at all, because no possible execution path dereferences body while it is null.
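
The data dependence described for Listing 3 might look something like the following hypothetical sketch. Like the previous one, it is not the real Struts code. Every path that reaches `body.length()` has passed through the `body != null` branch, so no null dereference is possible. Even so, an engine that treats the two `if` statements independently would still see a path where `body` is null and the guard is taken.

```java
/** Hypothetical sketch of a data-dependent guard; safe at runtime, but easy to misreport. */
final class TrackerExample {
    static int safeLength(String body) {
        int body_tracker;
        if (body != null) {
            body_tracker = 5;
        } else {
            body_tracker = 12;
        }
        // ... unrelated code would sit here ...
        if (body_tracker == 5) {
            return body.length(); // only reachable when body != null
        }
        return -1;
    }
}
```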

False positives cause developers to lose trust in a tool. Why? Because the tool is wrong, and if it's wrong more often than it's right, eventually the user won't trust the tool at all. Fortunately, the techniques available for reducing false positives in C/C++ analysis translate rather nicely into the Java space. We simply provide additional checkers to search for "false paths" through the code - paths that can never be executed when the program is running. These additional checkers keep track of data flow in different ways, and any time they find a path that can't be executed, it's pruned from the analysis. This "false path pruning" is a key way to significantly reduce the false positive rate.
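
Here is a toy model of "false path pruning" for the body/body_tracker example. It is a hypothetical simplification, not how any particular engine is built. A path is just the outcome of the two branches. The sketch prunes a path when its second branch outcome contradicts the `body_tracker` value implied by the first. Without pruning, one of the four paths reports a null dereference. With pruning, that path is discarded as infeasible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of false-path pruning for the body/body_tracker example. */
final class PathPruning {
    /** A path is the outcome of the two branches in the method. */
    static final class Path {
        final boolean bodyIsNull; // first branch: body != null was false
        final boolean guardTaken; // second branch: body_tracker == 5 was true

        Path(boolean bodyIsNull, boolean guardTaken) {
            this.bodyIsNull = bodyIsNull;
            this.guardTaken = guardTaken;
        }
    }

    /** Enumerates every combination of branch outcomes, feasible or not. */
    static List<Path> allPaths() {
        List<Path> paths = new ArrayList<>();
        for (boolean bodyIsNull : new boolean[] {false, true}) {
            for (boolean guardTaken : new boolean[] {false, true}) {
                paths.add(new Path(bodyIsNull, guardTaken));
            }
        }
        return paths;
    }

    /** A path is feasible only if the guard outcome matches the tracker value set earlier. */
    static boolean feasible(Path p) {
        int bodyTracker = p.bodyIsNull ? 12 : 5;
        return p.guardTaken == (bodyTracker == 5);
    }

    /** Counts paths that dereference body while it is null, optionally skipping false paths. */
    static int nullDerefReports(boolean prune) {
        int reports = 0;
        for (Path p : allPaths()) {
            if (prune && !feasible(p)) {
                continue; // false path: it can never execute, so drop it
            }
            if (p.bodyIsNull && p.guardTaken) {
                reports++;
            }
        }
        return reports;
    }
}
```

A real engine tracks many more facts along each path and cannot enumerate every combination exhaustively. The principle is the same, though: contradictory branch outcomes mark a path that the checker should never report on.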

C++ and Java: What's Different?
There are a few key differences in analyzing C/C++ code versus Java code. Unlike C/C++, Java affords us more luxury in choosing which code to analyze. We chose to analyze bytecode instead of source code. There are tremendous advantages to looking for defects at the bytecode level. The biggest, of course, is the fact that the code has already been compiled - you don't have to deal with compiling the code and juggling the many different flavors of build systems out there. The disadvantage (if you can call it that) of analyzing bytecode instead of source code is that you need some way to tie the errors you find back to the source code. This means that the bytecode needs to have debugging symbols in it or the errors you produce won't be of much help in actually fixing the code.
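
Tying a defect back to source relies on that debugging information. In a class file, the `LineNumberTable` attribute records (`start_pc`, `line_number`) pairs, meaning the source line changes at that bytecode offset. `javac` emits line numbers by default and omits them with `-g:none`. The sketch below models only the lookup step, assuming a single table with distinct offsets. The `LineTable` class is hypothetical and does not parse real class files.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of a LineNumberTable lookup: maps a bytecode offset to a source line.
 * Simplification: assumes one table whose start_pc values are distinct.
 */
final class LineTable {
    private final TreeMap<Integer, Integer> startPcToLine = new TreeMap<>();

    /** Records that the source line becomes `line` at bytecode offset `startPc`. */
    void add(int startPc, int line) {
        startPcToLine.put(startPc, line);
    }

    /** Returns the line of the nearest entry at or before pc, or -1 if no debug info covers it. */
    int lineFor(int pc) {
        Map.Entry<Integer, Integer> entry = startPcToLine.floorEntry(pc);
        return entry == null ? -1 : entry.getValue();
    }
}
```

With an empty table (the "no debugging symbols" case), every lookup returns -1, and a defect report could only point at a bytecode offset.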

The types of defects that you look for are also different. Defects in Java code have different runtime implications than their C/C++ counterparts. A null pointer dereference throws an exception (a NullPointerException) in Java, but in C/C++ it typically crashes your program. A resource leak in C/C++ happens any time heap-allocated memory isn't freed, but in Java, resource leaks occur under different circumstances - when clean-up must be done on an object that the garbage collector can't be responsible for, such as closing a file, socket, or database connection.
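
The Java flavor of resource leak is easy to demonstrate. In this sketch, `TrackedResource` is a hypothetical stand-in for something like a file or socket, with a counter of open instances. The garbage collector can reclaim the object's memory, but nothing in the language guarantees that `close()` will be called. The usual fix is try-with-resources, which calls `close()` on every exit path, including exceptions.

```java
/** Hypothetical resource that counts how many instances are still open. */
final class TrackedResource implements AutoCloseable {
    static int open = 0;

    TrackedResource() {
        open++;
    }

    int read() {
        return 42;
    }

    @Override
    public void close() {
        open--;
    }
}

final class LeakExample {
    // Leaks: close() is never called, so the resource stays open after return.
    static int leaky() {
        TrackedResource r = new TrackedResource();
        return r.read();
    }

    // try-with-resources closes the resource on normal return and on exceptions.
    static int safe() {
        try (TrackedResource r = new TrackedResource()) {
            return r.read();
        }
    }
}
```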

Interprocedural Analysis
One key feature of the most powerful static code analysis solutions is their ability to understand what happens when one method calls another. This not only helps in finding more complex defects in the code, it also reduces the false positive rate because analysis mistakes are less likely. However, the analysis of Java introduces a challenge in this regard because virtually every method call is, er - virtual. This means that during analysis it's not clear which method implementation a virtual call will actually jump to. It depends on the runtime type of the object on which the method is invoked. While this is a problem in C++ as well, it tends to be less pervasive because most people developing C++ code (a) don't always use objects in their code and (b) don't make all their methods virtual. To tackle this problem with a practical code analysis tool, we've developed techniques to infer the possible runtime types of objects and thus determine which method implementations could be invoked at any given call site. Of course, our technology must make the appropriate trade-offs to retain as much precision as possible while still scaling to analyze large, real Java systems. There's some great academic research behind the techniques from which we draw our ideas for implementing this in the real world. If you're interested in learning more, search for "Rapid Type Analysis" or "Class Hierarchy Analysis."
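
The two techniques named above can be sketched in miniature. Class Hierarchy Analysis (CHA) assumes the receiver at a call site could be any subtype of its declared type. It collects the method implementation each of those subtypes would dispatch to. Rapid Type Analysis (RTA) refines this by considering only classes the program actually instantiates. The `CallTargets` class below is a hypothetical toy (single inheritance, methods identified by name only, interfaces ignored), not the algorithm used by any specific product.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy resolution of virtual call targets in the spirit of CHA and RTA. */
final class CallTargets {
    private final Map<String, String> superOf = new HashMap<>();       // class -> superclass (null for a root)
    private final Map<String, Set<String>> declares = new HashMap<>(); // class -> methods it declares

    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declares.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    private boolean isSubtype(String c, String t) {
        for (String k = c; k != null; k = superOf.get(k)) {
            if (k.equals(t)) return true;
        }
        return false;
    }

    /** Virtual dispatch: walk up from the runtime class to the nearest class declaring the method. */
    private String dispatch(String runtimeClass, String method) {
        for (String k = runtimeClass; k != null; k = superOf.get(k)) {
            if (declares.getOrDefault(k, Set.of()).contains(method)) return k;
        }
        return null;
    }

    /** CHA: any known subtype of the declared receiver type may be the runtime class. */
    Set<String> cha(String declaredType, String method) {
        return targets(declaredType, method, superOf.keySet());
    }

    /** RTA: only instantiated subtypes of the declared receiver type may be the runtime class. */
    Set<String> rta(String declaredType, String method, Set<String> instantiated) {
        return targets(declaredType, method, instantiated);
    }

    private Set<String> targets(String declaredType, String method, Set<String> candidates) {
        Set<String> result = new TreeSet<>();
        for (String c : candidates) {
            if (!isSubtype(c, declaredType)) continue;
            String impl = dispatch(c, method);
            if (impl != null) result.add(impl);
        }
        return result;
    }
}
```

Consider `Shape`, `Circle`, `Square`, and `Unit extends Square` (which inherits `area()`). CHA resolves `shape.area()` to the `Shape`, `Circle`, and `Square` implementations. If only `Circle` and `Unit` are ever instantiated, RTA narrows that to `Circle` and `Square`.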

Noise
As I mentioned earlier, false positives are the number one challenge for static analysis. The number two challenge, and unfortunately a harder problem to deal with, is noise. How is noise different from a false positive? Noise is any issue reported by the analysis that, while technically correct from an analysis perspective, is something you just don't care about. It's obvious why this is so hard - it's completely subjective! Yet it's very important to address this to produce useful results. Take a look at Listing 4. Notice that on line 173 there's an extra space before the statement. Your static code analysis tool could report that extra space as a defect, but I'm willing to bet that most of us would consider that noise. The analysis isn't wrong per se - the statements don't line up - but I just don't care. Sure, this example is extreme, but there are less extreme cases that can be equally frustrating - even within checkers for things like NULL pointer exceptions. I've heard developers say, "Sure, but if that happens, we're totally hosed anyway, so it doesn't matter that it throws an exception there!" So the analysis can be spot on, producing an actual "defect" that could occur, but it's still reporting noise.

What To Look For
There's no silver bullet for eliminating noise, and there will always be a trade-off between the aggressiveness of an analysis and its false positive rate. But this brings me back to my initial point about the risk of changing your code. The purpose of a static code analysis tool, whether for C/C++ or for Java, is to help you find defects that would hurt the most, and to find them earlier in the software process. The purpose of these tools is not to find everything that's bad in your code, and that's a subtle distinction. There's too much risk associated with changing your code to address every little nitpick a static analysis tool can report. So when you're looking to add this type of technology to your arsenal of tools to help you ward off the bugs, take a close look at what it's going after. More "bugs" aren't necessarily better. Your time is valuable, and you don't want to waste it poring through false positive-ridden and noisy reports. Fortunately, there are tools out there that are on your side.
