It's a Multi-Core World: Let the Data Flow
A functional parallelism paradigm that fits multi-core processor architecture

Although the potential for separating the language in which processes are written from the language or mechanism that coordinates them was recognized early on, the idea drew little attention until the late 1980s and early 1990s. Thomas M. Parks presented a Kahn process network scheduling policy in his PhD thesis that ensured bounded queues for infinite inputs, making practical implementations realizable. Around the same time, projects focused on the software engineering aspects of dataflow began to appear. Although it is not technically dataflow because it doesn't obey the dataflow execution model, J. Paul Morrison's flow-based programming explored the reusability of large-grain processes implemented in common programming languages. He applied these ideas to large systems in the banking industry and empirically measured an improvement in programmer productivity. More modern frameworks combine the engineering benefits of languages such as Java with the dataflow execution model.

Application and Existing Implementations
Now that you have some background in dataflow technology, you're probably starting to see applications for it. The traditional arena for dataflow is signal processing, where it got its start, and it is still used there today, especially in the academic world. A quick Web search for dataflow turns up many universities with research activity in signal processing using dataflow technology.

Along those lines, the LabVIEW toolset created by National Instruments has an architecture based on dataflow technology (www.ni.com/labview/). Its user experience lets a user build up a dataflow graph of data collection and data processing nodes very quickly. LabVIEW appears to have adopted dataflow more as a functional paradigm than as a means of improving performance. However, with the advent of multi-core and the availability of processing power at much more affordable price points, the LabVIEW toolset is poised to provide highly scalable processing because of its dataflow architecture.

There are other applications for dataflow technology beyond signal processing. The pipeline nature of dataflow implementations provides a natural fit for data processing applications. Since the data is pipelined in a dataflow architecture, massive amounts of data can be processed in a highly scalable way. This ability implies that dataflow techniques can be applied to many industry problems, including:

  • Data mining and data warehousing
  • Data analytics and business intelligence
  • ETL (extract, transform, and load) processing
  • Data quality
  • Fraud detection
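To make the pipelining idea concrete, here is a minimal sketch in plain Java, assuming nothing beyond the standard `java.util.concurrent` library. The class name `PipelineSketch`, the `run` method, and the choice of squaring as the transformation are hypothetical and are not taken from any product mentioned in this article. Each stage runs on its own thread, and the stages are connected by small bounded queues, so a fast stage waits for a slow one instead of consuming unbounded memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only; not the API of any product named in the article.
class PipelineSketch {

    // Squares each input value using stages that run concurrently.
    static List<Integer> run(List<Integer> input) {
        // Bounded queues: a fast stage blocks on put() instead of exhausting memory.
        BlockingQueue<Optional<Integer>> raw = new ArrayBlockingQueue<>(4);
        BlockingQueue<Optional<Integer>> squared = new ArrayBlockingQueue<>(4);

        // Stage 1: emit each input value, then an empty Optional as end-of-stream.
        Thread reader = new Thread(() -> {
            try {
                for (Integer value : input) {
                    raw.put(Optional.of(value));
                }
                raw.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: transform values as they arrive and pass them downstream.
        Thread transformer = new Thread(() -> {
            try {
                Optional<Integer> next;
                while ((next = raw.take()).isPresent()) {
                    int v = next.get();
                    squared.put(Optional.of(v * v));
                }
                squared.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        transformer.start();

        // Stage 3 runs on the calling thread and collects the results in order.
        List<Integer> output = new ArrayList<>();
        try {
            Optional<Integer> result;
            while ((result = squared.take()).isPresent()) {
                output.add(result.get());
            }
            reader.join();
            transformer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return output;
    }
}
```

Calling `PipelineSketch.run(List.of(1, 2, 3, 4, 5))` returns `[1, 4, 9, 16, 25]`. Because each queue has a single producer and a single consumer, record order is preserved while the stages overlap in time.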

One of the hybrid approaches to dataflow implementation has been used to create massively scalable data processing engines. In the 1990s, a start-up company called Torrent created a C++-based dataflow framework named Orchestrate. This framework implemented dataflow techniques and could run across a cluster of homogeneous systems. Several Torrent customers built business intelligence applications on it. Torrent was eventually acquired by Ascential, which was in turn acquired by IBM in 2005.

Another data processing engine based on dataflow architecture has been built entirely in Java. This engine, available from Pervasive Software, leans toward the style of flow-based programming (www.pervasivedatarush.com). It's currently available as part of a free public beta program sponsored by Pervasive. See the sidebar for more information.

Conclusion
As the information age and the multi-core wave continue to collide, more pressure is exerted on software developers to provide access to increasingly inexpensive computing horsepower. High-performance computing was once the domain of system experts, government agencies and universities. Now it's in demand by large as well as medium-sized companies with massive amounts of data to process and shrinking time windows. Compute power that used to cost millions is now available in systems that can be ordered on the Web for $20k.

All of this leads to the need for a better programming paradigm: one that encourages the developer to build high-performing, scalable software without the burden of low-level system knowledge. Dataflow is one such paradigm. Because it is a pipelined architecture, dataflow is inherently scalable. And as we've pointed out, dataflow concepts have been around for many years. They've changed and grown as the ideas have matured in the academic community.

There are also several commercial implementations of dataflow in the marketplace, a sign that dataflow technology is real and has many benefits to bring to the software development community. We encourage you to investigate dataflow concepts and determine how they can fit into your software architectures going forward.

Resources
• Johnston, Hanna, and Millar. "Advances in Dataflow Programming Languages." 2004.
http://portal.acm.org/citation.cfm?id=1013208.1013209
• Najjar, Lee, and Gao. "Advances in the Dataflow Computational Model." 1999.
http://ptolemy.eecs.berkeley.edu/publications/papers/99/dataflow/
• Parks. "Bounded Scheduling of Process Networks." 1995.
http://ptolemy.eecs.berkeley.edu/publications/papers/95/parksThesis/
• Harris, Marlow, Jones, and Herlihy. "Composable Memory Transactions." 2006.
http://research.microsoft.com/~simonpj/papers/stm/#composble
• Van Roy and Haridi. "Concepts, Techniques, and Models of Computer Programming." 2004.
www.info.ucl.ac.be/~pvr/book.html

SIDEBAR

Pervasive DataRush: A Dataflow Implementation

The article on dataflow concepts introduced you to dataflow technology and its relevance to multi-core processing. Here we'll discuss an implementation of dataflow technology built with Java. The application framework is called Pervasive DataRush and is currently in beta release from Pervasive Software.

DataRush is an application development framework. Its purpose is to enable the user to build data processing applications that can easily take advantage of multi-core processors to produce highly scalable software. DataRush implements many dataflow concepts and extends some of these concepts to provide several ways to dramatically increase scalability of applications.

DataRush implements an XML-based scripting language that provides the means to create reusable components and data processing applications. This language, called DFXML (dataflow XML), is simple in syntax and very flexible. DFXML is used to compose what are called assemblies. An assembly is a composite operator and can be composed of other assemblies, processes, and customizers. A process is the lowest level of operator. It's written in Java and performs the work in the dataflow graph. A customizer is a compiler helper, also written in Java, that enables the dynamic nature of the DataRush framework. DataRush also includes a library of more than 50 pre-built, ready-to-use components. These include components that provide connectivity to data as well as data processing components such as sort, join, merge, lookup, and group.
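The relationship between processes and assemblies resembles the classic composite pattern. The sketch below illustrates that relationship in plain Java only. It is not DFXML and it does not use the DataRush API; the names `AssemblySketch`, `Operator`, `process`, and `assembly` are hypothetical, and customizers are not modeled.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration of composite operators; not the DataRush API.
class AssemblySketch {

    // Common contract: every operator turns one record into another.
    interface Operator {
        String apply(String record);
    }

    // A "process" is a leaf operator that does the actual work.
    static Operator process(UnaryOperator<String> work) {
        return work::apply;
    }

    // An "assembly" is a composite: it chains other operators, which may
    // themselves be assemblies, and is itself usable as an Operator.
    static Operator assembly(Operator... parts) {
        List<Operator> steps = List.of(parts);
        return record -> {
            String current = record;
            for (Operator step : steps) {
                current = step.apply(current);
            }
            return current;
        };
    }
}
```

For example, `assembly(process(String::trim), process(String::toUpperCase))` yields a reusable cleaning operator, and nesting it inside `assembly(clean, process(s -> "[" + s + "]"))` maps `"  acme "` to `"[ACME]"`. Because an assembly satisfies the same contract as a process, components can be reused at any level of nesting.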

The architecture block diagram in Figure 1 depicts the high-level architecture of the DataRush framework.

Developers use an IDE such as Eclipse (www.eclipse.org) to create DFXML assemblies along with Java processes and customizers. The DataRush assembler converts DFXML scripts into a binary form. The DataRush execution engine is then invoked to compile the binary files into a dataflow graph for execution. This compilation step runs each time the engine is instantiated, which makes graph generation highly dynamic. Once the dataflow graph is generated, the DataRush engine creates the threads and dataflow queues that represent the graph and then executes it. Execution monitoring is provided via JMX.
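The last step, turning a graph description into threads and queues at run time, can be sketched generically. The following is an assumption-laden illustration in plain Java, not the DataRush engine: the class `GraphRunnerSketch` and its `execute` method are hypothetical, the graph is restricted to a linear chain of nodes, and monitoring is omitted. Given a list of node functions, it creates one bounded queue per edge and one thread per node, then runs the graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; not how the DataRush engine is implemented.
class GraphRunnerSketch {

    // Builds and runs a linear dataflow graph described by a list of nodes.
    static List<Integer> execute(List<UnaryOperator<Integer>> nodes, List<Integer> input) {
        // One bounded queue per edge: a chain of n nodes has n + 1 edges.
        List<BlockingQueue<Optional<Integer>>> edges = new ArrayList<>();
        for (int i = 0; i <= nodes.size(); i++) {
            edges.add(new ArrayBlockingQueue<>(8));
        }

        // One thread per node, reading its input edge and writing its output edge.
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            UnaryOperator<Integer> node = nodes.get(i);
            BlockingQueue<Optional<Integer>> in = edges.get(i);
            BlockingQueue<Optional<Integer>> out = edges.get(i + 1);
            threads.add(new Thread(() -> {
                try {
                    Optional<Integer> next;
                    while ((next = in.take()).isPresent()) {
                        out.put(Optional.of(node.apply(next.get())));
                    }
                    out.put(Optional.empty()); // forward end-of-stream
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        // A feeder thread pushes the input onto the first edge.
        BlockingQueue<Optional<Integer>> first = edges.get(0);
        threads.add(new Thread(() -> {
            try {
                for (Integer value : input) {
                    first.put(Optional.of(value));
                }
                first.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        threads.forEach(Thread::start);

        // The calling thread drains the last edge.
        BlockingQueue<Optional<Integer>> last = edges.get(nodes.size());
        List<Integer> output = new ArrayList<>();
        try {
            Optional<Integer> result;
            while ((result = last.take()).isPresent()) {
                output.add(result.get());
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("graph execution interrupted", e);
        }
        return output;
    }
}
```

With the nodes `x -> x + 1` and `x -> x * 10`, the input `[1, 2, 3]` produces `[20, 30, 40]`. Because the graph is assembled from a description on each call, the same runner can execute a different graph every time it is invoked, which is the kind of dynamism the compilation step described above provides.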


About Jim Falgout
Jim Falgout is solutions architect for Pervasive Software, where he applied dataflow principles to help architect Pervasive DataRush. He is active in the Java development community; in May of 2007, he presented a technical paper titled 'Unleashing the Power of Multi-Core Processors: Scalable Data Processing in Java Technology' at JavaOne.

About Matt Walker
Matt Walker is an engineer at Pervasive Software, seeking a deeper understanding of concurrent programming techniques to improve the Pervasive DataRush framework for dataflow programming. He holds an MS in computer science from UT and received his BS in electrical and computer engineering from Rice University.
