Comments
litl_phil wrote: While it's nice that Google and Acer share the vision of cloud-based computing, it's also worth noting that we at litl already have a webbook on the market (available at litl.com) that runs our own cloud-based OS. Unlike Chrome, litlOS is focused on creating a new and better web experience for the home, so we don't have the usual browser interface, we have our own innovative UI. In conjunction with easel mode (litl's inverted-V position) and our growing cohort of litl channels (special apps t...
Cloud Expo on Google News


2008 West
DIAMOND SPONSOR:
Data Direct
SOA, WOA and Cloud Computing: The New Frontier for Data Services
PLATINUM SPONSORS:
Red Hat
The Opening of Virtualization
GOLD SPONSORS:
Appsense
User Environment Management – The Third Layer of the Desktop
Cordys
Cloud Computing for Business Agility
EMC
CMIS: A Multi-Vendor Proposal for a Service-Based Content Management Interoperability Standard
Freedom OSS
Practical SOA” Max Yankelevich
Intel
Architecting an Enterprise Service Router (ESR) – A Cost-Effective Way to Scale SOA Across the Enterprise
Sensedia
Return on Assests: Bringing Visibility to your SOA Strategy
Symantec
Managing Hybrid Endpoint Environments
VMWare
Game-Changing Technology for Enterprise Clouds and Applications
Click For 2008 West
Event Webcasts

2008 West
PLATINUM SPONSORS:
Appcelerator
Get ‘Rich’ Quick: Rapid Prototyping for RIA with ZERO Server Code
Keynote Systems
Designing for and Managing Performance in the New Frontier of Rich Internet Applications
GOLD SPONSORS:
ICEsoft
How Can AJAX Improve Homeland Security?
Isomorphic
Beyond Widgets: What a RIA Platform Should Offer
Oracle
REAs: Rich Enterprise Applications
Click For 2008 Event Webcasts
SYS-CON.TV
Top Links You Must Click On


It's a Multi-Core World: Let the Data Flow
A functional parallelism paradigm that fits multi-core processor architecture

Dataflow Implementation
The Pervasive DataRush framework implements many of the basic structures of dataflow. Processing nodes (processes in DataRush) are built in Java and interface using dataflow queues. The dataflow queues in DataRush are typed and support native Java types besides string, date, timestamp, and binary.

The dataflow queues in DataRush are somewhat comparable in functionality to the blocking queue implementations in the java.util.concurrent package introduced in the Java 5 release. They're both memory-based queues that block readers on empty queues and block writers of full queues. The DataRush queues, however, must support deadlock detection and handling. Due to support for multiple queue readers and the fact that processes can have multiple inputs and outputs, cycles of dependencies can be created in a dataflow graph. These cycles can lead to deadlock, whereby writers and readers are waiting in a way that needs intervention for the graph to continue working. A deadlock algorithm in the DataRush engine detects deadlock situations and handles it, normally by temporarily expanding the size of the problematic queue.

Besides the pipeline scalability that a dataflow architecture already provides, the Pervasive DataRush framework has built-in support for two other types of scalability: horizontal partitioning and vertical partitioning. Horizontal partitioning replicates a section of dataflow logic and segments the input data into chunks, flowing the data concurrently through the replicated dataflow sections. Figure 2 depicts this scenario using a lookup component as an example. In this example, the lookup operator is replicated with a data partitioner spreading the data load evenly to each lookup instance. This lets each lookup operator run in parallel, fully utilizing multiple cores on the system. Vertical partitioning supports running different dataflow logic in parallel on each field of an input stream. Figure 1 shows the high-level architecture of the Pervasive DataRush framework including design and execution components. The user utilizes an IDE such as Eclipse to create DFXML assemblies and Java processes and customizers. Figure 2 exemplifies horizontal partitioning, one of three types of scalability, which can be implemented using Pervasive DataRush. Horizontal partitioning replicates a section of dataflow logic and segments the input data into chunks, flowing the data concurrently through the replicated dataflow sections.

Why Java?
As the article on dataflow points out, there have been many instantiations of dataflow technology over the years. Most of them have been implemented in C or C++. This makes sense due to the prevalence of C and C++ when the systems were built. When DataRush was first being developed, the decision was made to use Java as the programming language. This decision was based on several factors: portability, flexibility, extensibility, and scalability - and you can throw in productivity for good measure. The decision was also based on the high level of industry investment in JVM technology. Over the past few years, we've seen significant performance improvements with each JDK release. Also, the amount of open source libraries available is astounding. With such a rich environment, the decision has proved to be a good one.

The question always arises about Java and performance. What we've found, with the introduction of the java.nio package and other JVM performance enhancements, is that native speeds can be obtained from Java. This is especially true for frameworks like DataRush in which a static set of classes (the process nodes) are utilized over a relatively long period of time. This scenario provides an environment well suited for JIT compilers.

A Simple Benchmark
To demonstrate the scalability of the DataRush framework, we developed a simple benchmark implementing a one-pass K-means algorithm. The algorithm takes two double-typed values as points and clusters the points into like groups. The benchmark measures the performance of running K-means on 100 input columns over 10 million rows of data. For this particular test, the input data is generated. As can be seen from Figure 3, the performance of the benchmark test improves as more CPU resources are made available. These benchmark results of a K-means test run on an 8-core machine demonstrate how a non-parallelized application fails to scale as more compute resources are added. A snapshot of the CPU utilization is also provided, showing that the DataRush framework was able to keep the machine heavily utilized for the duration of the test. Figure 4 shows CPU usage during the K-means benchmark, the Pervasive DataRush platform has scaled to take full advantage of all 8 cores available on the machine used for this test.

Conclusion
The DataRush application development framework implements dataflow concepts that enable Java programmers to create highly scalable applications that can process many million rows of data. The framework is currently in beta release and can be downloaded at www.pervasivedatarush.com. DataRush is built completely in Java and so is easy to install and begin using right away. A user interface in the Eclipse IDE is being developed, so please check back with the site periodically for updates on that development. The site also includes more information on DataRush and forums for discussion and questions.

About Jim Falgout
Jim Falgout is solutions architect for Pervasive Software, where he applied dataflow principles to help architect Pervasive DataRush. He is active in the Java development community; in May of 2007, he presented a technical paper titled 'Unleashing the Power of Multi-Core Processors: Scalable Data Processing in Java Technology' at JavaOne.

About Matt Walker
Matt Walker is an engineer at Pervasive Software, seeking a deeper understanding of concurrent programming techniques to improve the Pervasive DataRush framework for dataflow programming. He holds an MS in computer science from UT and received his BS in electrical and computer engineering from Rice University.

In order to post a comment you need to be registered and logged in.

Register | Sign-in

Reader Feedback: Page 1 of 1

Enterprise Open Source Magazine Latest Stories . . .
Oracle seems to have divided the open source ranks over the MySQL delay it’s having closing its acquisition of Sun. Eben Moglin, the GPL’s most ardent defender and delineator, the lawyer who has worked hand in glove for years with the Free Software Foundation’s founder Richard Stallman...
Cloud computing is a game changer. The cloud is disrupting traditional software and hardware business models by disrupting how IT service gets delivered. Entrepreneurial opportunities abound as this classic disruptive technology begins to proliferate, so it is no surprise that SYS-CON'...
The irony is that Oracle has advanced MySQL, lost money in the process, and helped its competitors - all at the same time. When Oracle buys Sun and controls MySQL the gift (other than to Microsoft SQL Server) keeps on giving as the existential threat to RDBs is managed by Redwood Shore...
WSO2, the open source SOA company, today announced the launch of the WSO2 Cloud Platform. Available today, the new WSO2 Cloud Platform features a family of WSO2 Cloud Virtual Machines; WSO2 Cloud Connectors for enabling fast, secure cloud services; and the multi-tenant WSO2 Governance-...
Now, the open source Mozilla Thunderbird client software can be used with Open-Xchange collaboration software. The "Community OXtender for Thunderbird" software connector gives users full access to appointments and contacts stored in the Open-Xchange Server and enables them to use Thun...
Morph Labs, a leading provider of enterprise cloud computing technology, today announced an introductory trial of the Morph CloudServer, an open, standards-based server IT organizations can use to rapidly model and evaluate their cloud implementations. A miniature "Cloud Environment in...
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
Click to Add our RSS Feeds to the Service of Your Choice:
Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
Publish Your Article! Please send it to editorial(at)sys-con.com!

Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021


SYS-CON Featured Whitepapers
ADS BY GOOGLE