Simplify Pattern Matching
Use java.util.regex

Pattern matching with regular expressions can help automate a number of text-processing operations such as search and replace, input validation, text conversion, and filtering. What would otherwise require a significant amount of code can be done in just a few lines, because the regular expression engine does most of the work. Some programming languages, such as Perl, and operating system utilities, such as grep, have supported regular expressions for years. Before J2SE 1.4, however, Java (J2SDK) didn't support them, and one had to use external packages such as Jakarta Regexp or IBM's commercial package (com.ibm.regex). Thankfully, that changed with the introduction of the java.util.regex package, which provides a standard implementation for specifying and handling regular expressions. This article shows how you can quickly use it to implement pattern-based search features. It starts by reviewing some important regular expression fundamentals and then dives into the details of the package. The embedded examples demonstrate the important constructs through simple use cases.

What's a Regular Expression and Why It's Important
If you've used regular expressions in other languages, the following sections will introduce you to the Java flavor and help uncover some of the new features. If you're not familiar with regular expressions, you'll soon discover how to use them effectively to handle text processing in ways you never thought possible before.

A regular expression is a mechanism to specify a textual pattern and detect the presence of that pattern in a given character sequence. In other words, it's a pattern language. A regular expression pattern is typically specified as a combination of two types of characters: literals and meta-characters. Literals are normal text characters (a, b, c, 1, 2), while meta-characters (e.g., * and $) convey a special meaning to the regular expression engine and are discussed in the next few sections. The regular expression engine understands the pattern language: it interprets the regular expression, performs the pattern match, and processes the results. The language and the engine together make regular expressions a powerful tool that simplifies pattern matching. Implementations such as java.util.regex and JRegex also provide query and utility functions (replace, split, etc.) that are useful in modifying the target text. For details about other Java implementations and implementations available in other languages, please consult the references section.
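As a minimal sketch of the replace and split utilities mentioned above, the following uses only java.util.regex. The class name, method names, and sample strings are hypothetical and chosen purely for illustration.

```java
import java.util.regex.Pattern;

class RegexUtilityDemo {
    // Replaces every digit in the input with '#'.
    static String maskDigits(String input) {
        return Pattern.compile("\\d").matcher(input).replaceAll("#");
    }

    // Splits the input on commas, ignoring whitespace around each comma.
    static String[] splitOnCommas(String input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("Room 101"));
        for (String name : splitOnCommas("Neo , Trinity,Morpheus")) {
            System.out.println(name);
        }
    }
}
```

Here the pattern describes what to look for, and the engine applies it across the whole input; no hand-written loop over characters is needed.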

Meta-Characters
Meta-characters give regular expressions their expressive power. I will discuss a frequently used subset of the meta-characters that Java supports. For a complete list, please consult Sun's API documentation (class java.util.regex.Pattern). A number of examples that use these meta-characters immediately follow this discussion.

Anchors
An anchor matches a predefined position in the target text rather than any characters. Anchors are similar to reference points and are used to determine the relative positions of other elements in the regular expression. They are typically used to match the boundary positions of a string, line, or word, although they can also match any other position using the special "Lookaround" constructs shown in Listing 1. The Lookaround constructs match a position based on a given condition. A positive lookahead (?=Neo) matches a position that's immediately followed by the text 'Neo', whereas a negative lookahead (?!Neo) matches a position that is not immediately followed by the text 'Neo'. Lookbehind constructs (positive (?<=...), negative (?<!...)) work the same way in the opposite direction: they test the text immediately before the position.
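The lookaround behavior described above can be sketched as follows. This is an illustration only; the class name, the helper countMatches, and the sample sentence are hypothetical and not taken from the article's Listing 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Counts how many times the regular expression is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Mr. Anderson, Mr. Neo, Agent Smith";
        // Positive lookahead: "Mr. " only where it is immediately followed by "Neo".
        System.out.println(countMatches("Mr\\. (?=Neo)", text));
        // Negative lookahead: "Mr. " only where it is NOT immediately followed by "Neo".
        System.out.println(countMatches("Mr\\. (?!Neo)", text));
        // Positive lookbehind: "Smith" only where it is immediately preceded by "Agent ".
        System.out.println(countMatches("(?<=Agent )Smith", text));
        // Negative lookbehind: "Smith" only where it is NOT preceded by "Agent ".
        System.out.println(countMatches("(?<!Agent )Smith", text));
    }
}
```

Note that the lookaround text itself is never part of the match; only the position is tested, so a match of "Mr\\. (?=Neo)" contains just "Mr. ".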

Character Classes, Class Shorthands and Alternation
A character class construct [...] specifies a list of characters, any one of which may appear at that position in the match, while the construct [^...] specifies a list of characters that must not appear there. In the case of [...], a match is considered successful if any one of the characters in the list is found. For example, the regular expression [cw]ould matches the words 'could' and 'would'. The class notation implies a logical OR between its elements. The same kind of condition can be expressed for whole subexpressions with "Alternation" (x|y), where matching either x or y is considered a success. Therefore, the earlier regular expression could also be written as (c|w)ould.

Special class meta-characters such as the hyphen (-) can be used to specify a range of values, so the class [a-z] specifies all letters from a through z. A class shorthand is a simplified representation of a commonly used class, such as digit (\d), word character (\w), or whitespace (\s). A list of the class shorthands available in Java is shown in Listing 1.
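A short sketch of character classes, alternation, ranges, and shorthands, using Pattern.matches(), which requires the whole input to match. The class and method names are hypothetical; the [cw]ould and (c|w)ould patterns are the ones discussed above.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Returns true only if the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[cw]ould", "could"));    // class: c or w
        System.out.println(matches("(c|w)ould", "would"));   // alternation: same idea
        System.out.println(matches("[cw]ould", "should"));   // 's', 'h' are not in the class
        System.out.println(matches("[^cw]ould", "mould"));   // negated class: anything but c or w
        System.out.println(matches("[a-z]+", "neo"));        // range: lowercase letters only
        System.out.println(matches("\\d\\w\\s", "1a "));     // shorthands: digit, word char, whitespace
    }
}
```

In Java source code each backslash in a shorthand must be doubled, which is why \d is written as "\\d" inside the string literal.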

Quantifiers
Quantifiers indicate how many instances of the element they are applied to are required for a successful match. Java supports three quantifier types: greedy, reluctant, and possessive. Greedy quantifiers try to match as much as possible, while their reluctant counterparts (with ? appended) try to match as little as is needed to fulfill a match. In practice, a greedy quantifier first consumes as much of the input as it can, often the rest of the line, and then gives characters back one at a time (backtracks) until the remainder of the pattern matches. This backtracking can turn into real performance overhead when the target text is big. Reluctant (or lazy) quantifiers start with the fewest characters and extend the match only as far as needed, so they stop at the first position where the overall match succeeds. Possessive quantifiers (with + appended) consume as much as greedy ones but never give anything back. Because they don't keep the prior match states around, they are useful in optimizing match operations, at the cost of failing in cases where backtracking would have found a match. Listing 1 details all three types of quantifiers.
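The difference between the three quantifier types is easiest to see on a single input. The following is a minimal sketch; the class name, the helper firstMatch, and the sample markup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by the regular expression, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "<b>Neo</b> and <b>Trinity</b>";
        // Greedy: .* runs to the end of the input, then backtracks to the LAST "</b>".
        System.out.println(firstMatch("<b>.*</b>", text));
        // Reluctant: .*? grows one character at a time and stops at the FIRST "</b>".
        System.out.println(firstMatch("<b>.*?</b>", text));
        // Possessive: .*+ runs to the end and never gives anything back,
        // so the trailing "</b>" can no longer be matched and the result is null.
        System.out.println(firstMatch("<b>.*+</b>", text));
    }
}
```

The possessive case shows the trade-off: it avoids the cost of backtracking, but it is only safe where the quantified element cannot swallow text that the rest of the pattern needs.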

Mode Modifiers
These are special constructs that turn certain powerful regex features 'on' or 'off.' The default mode for these features is 'off' since they involve additional overhead when doing a match. Using (?i) in a regular expression, for example, turns on case-insensitive matching. Java also supports specifying the modes when the pattern is compiled, by passing static final flag constants of the class java.util.regex.Pattern (such as Pattern.CASE_INSENSITIVE) to Pattern.compile(). The Pattern class is discussed below in the java.util.regex section.
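As a small sketch of the two ways to switch on case-insensitive matching, the following compares the inline (?i) modifier with the Pattern.CASE_INSENSITIVE flag. The class name, method names, and sample text are hypothetical.

```java
import java.util.regex.Pattern;

class ModeModifierDemo {
    // Default mode: matching is case-sensitive.
    static boolean findDefault(String input) {
        return Pattern.compile("neo").matcher(input).find();
    }

    // Inline modifier: (?i) turns on case-insensitive matching inside the expression.
    static boolean findInline(String input) {
        return Pattern.compile("(?i)neo").matcher(input).find();
    }

    // Compile-time flag: the same mode, requested through a Pattern constant.
    static boolean findWithFlag(String input) {
        return Pattern.compile("neo", Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Wake up, NEO";
        System.out.println(findDefault(text));
        System.out.println(findInline(text));
        System.out.println(findWithFlag(text));
    }
}
```

The flag form keeps the expression itself shorter, while the inline form travels with the pattern string, which matters when the expression is passed to an API such as String.matches() that offers no flag parameter.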

Example 1: Input Validation
Let's now review an example that uses the meta-characters discussed so far to address the password validation needs at Zion Corporation. The security standards set at Zion require that passwords contain only alphanumeric characters, include at least one digit, and be between six and 32 characters long.

Listings 2 and 3 show two possible solutions to the same problem. The first approach (Listing 2) uses the built-in regular expression support inside the java.lang.String matches() method. The second approach (Listing 3) uses the classes provided by the java.util.regex package. The underlying mechanics are the same in either case and are discussed next. I'll leave the API specifics to the next section.

Let's see how the solution meets the specified requirements. The regular expression pattern on line 3 (Listing 2) is the same as the Pattern pContent (line 5, Listing 3). The pattern uses a combination of meta-characters, namely the character class [a-z], a class shorthand (\d, shorthand for the character class [0-9]), and greedy quantifiers (*, +). In the context of the solution, the pattern "\\b(?i)([a-z]*\\d+[a-z]*)\\b" succeeds if, between the word boundaries, there are zero or more letters followed by one or more digits followed by zero or more letters. The mode modifier (?i) indicates that the search is case-insensitive. Notice that there are a couple of differences between the regular expressions in the two listings. The obvious one is the use of comments in Listing 3. The other difference is more subtle but important. Did you find it? Check out the next section (Capturing, Grouping) to verify the answer.

The pattern on line 4 (Listing 2) addresses the password-length requirement, using the {min,max} quantifier, which imposes minimum and maximum limits on the number of repetitions of the element it follows. In this case the pattern "\\b(?i)([a-z0-9]){6,32}\\b" matches if there are between six and 32 alphanumeric characters between the word boundaries. Notice that in Listing 3 the case-insensitive option is specified using the final variables in the class Pattern, which makes the expression more readable. The variables are discussed further in the following sections.
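The two patterns quoted above can be combined into a small validator. What follows is only a sketch built from those two quoted expressions and String.matches(); it is not a reproduction of the article's Listing 2 or Listing 3, and the class name, constant names, and method name are hypothetical.

```java
class PasswordValidatorSketch {
    // Content rule as quoted in the text: letters, then one or more digits, then letters.
    static final String CONTENT = "\\b(?i)([a-z]*\\d+[a-z]*)\\b";
    // Length rule as quoted in the text: 6 to 32 alphanumeric characters.
    static final String LENGTH = "\\b(?i)([a-z0-9]){6,32}\\b";

    // String.matches() requires the ENTIRE password to match each pattern.
    static boolean isValid(String password) {
        return password.matches(CONTENT) && password.matches(LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc123"));   // letters + digits, length 6
        System.out.println(isValid("Trinity7")); // mixed case is allowed by (?i)
        System.out.println(isValid("abcdef"));   // no digit
        System.out.println(isValid("ab1"));      // too short
        System.out.println(isValid("abc_123"));  // '_' is not alphanumeric here
        // As quoted, the content pattern expects the digits in one contiguous run,
        // so a password with several separate digit groups is rejected.
        System.out.println(isValid("a1b2c3"));
    }
}
```

One consequence worth noting: because the quoted content pattern allows only a single run of digits, this sketch is stricter than the stated policy of "at least one digit." A production validator would likely need a different content check for passwords such as a1b2c3.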

About Anant Athale
Anant Athale is a senior software engineer at Motorola Labs. He specializes in enterprise Java technologies and is an active participant in the Java Community Process (JSR 262, 260). He is Sun certified and has a master's degree from Arizona State University.

Reader Feedback: Page 1 of 1

Kaarle wrote: Could be an interesting article if the Listings 1... that the article refers to were included.

http://groups-beta.google.com/group/regex

