Feature It's a Multi-Core World: Let the Data Flow
A functional parallelism paradigm that fits multi-core processor architecture
By: Jim Falgout; Matt Walker
Aug. 26, 2007 12:00 PM
Though the potential for distinguishing between the process language and the language or mechanism for coordinating these processes was recognized early on, it wasn't until the late 1980s and early 1990s that the idea drew much attention. Thomas M. Parks presented a Kahn process network scheduling policy in his PhD thesis that ensured bounded queues for infinite inputs, making practical implementations realizable.
Simultaneously, projects focused on the software engineering aspects of dataflow began showing up. While not technically dataflow because it doesn't obey the dataflow execution model, J. Paul Morrison's flow-based programming explored the reusability of large-grain processes implemented in common programming languages. He applied these ideas to large systems in the banking industry, empirically measuring an improvement in programmer productivity. More modern frameworks combine the engineering benefits of languages such as Java with the dataflow execution model.
Application and Existing Implementations
Along those lines, the LabVIEW toolset created by National Instruments has an architecture based on dataflow technology (www.ni.com/labview/). The outstanding user experience offered by this toolset lets a user build up a dataflow graph of data collection and data processing nodes very quickly. It appears to have used dataflow concepts as more of a functional paradigm than for performance. However, with the advent of multi-core and the availability of processing power at much more affordable price points, the LabVIEW toolset is poised to provide highly scalable processing due to its use of dataflow architecture.
There are other applications for dataflow technology beyond signal processing. The pipeline nature of dataflow implementations provides a natural fit for data processing applications. Since the data is pipelined in a dataflow architecture, massive amounts of data can be processed in a highly scalable way.
This ability implies that dataflow techniques can be applied to many industry problems, including:
One of the hybrid approaches to dataflow implementation has been used to create massively scalable data processing engines. Way back in the 1990s, a start-up company called Torrent created a C++-based dataflow framework named Orchestrate. This framework implemented dataflow techniques and could run across a cluster of homogeneous systems. Several Torrent customers created business intelligence applications using this dataflow framework. Torrent was eventually acquired by Ascential, which was then acquired by IBM in 2005.
Another data processing engine using dataflow architecture has been built with 100% Java technology. This engine, available from Pervasive Software, uses more of the style of flow-based programming (www.pervasivedatarush.com). It's currently available as part of a free public beta program sponsored by Pervasive. See the sidebar for more information.
Conclusion
All of this leads to the need for a better programming paradigm: a paradigm that encourages the developer to build highly performing and scalable software without the burden of low-level system knowledge. Dataflow is one such paradigm. Dataflow, being a pipelined architecture, is inherently scalable. And as we've pointed out, dataflow concepts have been around for many years. They've undergone change and growth as the ideas have matured in the academic community. There are also several commercial implementations of dataflow in the marketplace, a sign that dataflow technology is real and has many benefits to bring to the software development community. We encourage you to investigate dataflow concepts and determine how they can fit into your software architectures going forward.
Resources
SIDEBAR
Pervasive DataRush: A Dataflow Implementation
The article on dataflow concepts introduced you to dataflow technology and its relevance to multi-core processing. Here we'll discuss an implementation of dataflow technology built with Java.
The application framework is called Pervasive DataRush and is currently in beta release from Pervasive Software. DataRush is an application development framework. Its purpose is to enable the user to build data processing applications that can easily take advantage of multi-core processors to produce highly scalable software. DataRush implements many dataflow concepts and extends some of these concepts to provide several ways to dramatically increase the scalability of applications.
DataRush implements a scripting language, based on XML, that provides the means to create reusable components and data processing applications. This language, called DFXML (dataflow XML), is simple in syntax and very flexible. DFXML is used to compose what are called assemblies. An assembly is a composite operator and can be composed of other assemblies, processes, and customizers. A process is the lowest level of operator. It's written in Java and performs the work in the dataflow graph. A customizer is a compiler helper, also written in Java, that enables the dynamic nature of the DataRush framework.
DataRush includes a component library of more than 50 pre-built, ready-to-use components. Included are components that provide connectivity to data and data processing components such as sort, join, merge, lookup, group, and so on.
The architecture block diagram in Figure 1 depicts the high-level architecture of the DataRush framework. The user works in an IDE such as Eclipse to create DFXML assemblies and Java processes and customizers (www.eclipse.org). The DataRush assembler is used to convert DFXML scripts into a binary form. The DataRush execution engine is then invoked to compile the binary files into a dataflow graph for execution. This compilation step is run for every engine instantiation, providing a very dynamic graph-generation capability.
Once the dataflow graph is generated, the DataRush engine creates threads and dataflow queues representing the graph and executes the graph. Execution monitoring is provided via JMX.
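To make the idea of threads connected by dataflow queues concrete, here is a minimal sketch in plain Java. It does not use the DataRush API; the class name DataflowSketch, the node and run helpers, the EOS end-of-stream marker, and the queue capacity of 4 are all assumptions made for illustration. Each node runs on its own thread and communicates only through bounded FIFO queues, so a fast producer blocks rather than growing a queue without limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical illustration only: not the DataRush API.
class DataflowSketch {
    // Marker placed on a queue to signal end of stream (an assumed convention).
    private static final Object EOS = new Object();

    // Starts one graph node on its own thread: read a token, apply fn, write the result.
    static Thread node(BlockingQueue<Object> in, BlockingQueue<Object> out,
                       Function<Integer, Integer> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Object token = in.take();          // blocks until data arrives
                    if (token == EOS) {
                        out.put(EOS);                  // forward end of stream downstream
                        return;
                    }
                    out.put(fn.apply((Integer) token)); // blocks if the output queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Builds a three-stage graph: source -> (x * 2) -> (x + 1) -> collected result.
    static List<Integer> run(List<Integer> input) {
        BlockingQueue<Object> q1 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Object> q2 = new ArrayBlockingQueue<>(4);
        BlockingQueue<Object> q3 = new ArrayBlockingQueue<>(4);

        // Source node: feeds the input into the first queue, then signals end of stream.
        Thread source = new Thread(() -> {
            try {
                for (Integer v : input) {
                    q1.put(v);
                }
                q1.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();

        node(q1, q2, x -> x * 2);
        node(q2, q3, x -> x + 1);

        // Sink: the calling thread drains the last queue until end of stream.
        List<Integer> result = new ArrayList<>();
        try {
            while (true) {
                Object token = q3.take();
                if (token == EOS) {
                    break;
                }
                result.add((Integer) token);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3, 4, 5))); // prints [3, 5, 7, 9, 11]
    }
}
```

Because every queue here has a single producer and a single consumer, tokens leave the graph in the order they entered it. The three stages can run on separate cores at the same time, which is the pipeline parallelism the article describes.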