Feature It's a Multi-Core World: Let the Data Flow
A functional parallelism paradigm that fits multi-core processor architecture
By: Jim Falgout; Matt Walker
Aug. 26, 2007 12:00 PM
Dataflow Implementation

The dataflow queues in DataRush are somewhat comparable in functionality to the blocking queue implementations in the java.util.concurrent package introduced in the Java 5 release. Both are memory-based queues that block readers on empty queues and block writers on full queues. The DataRush queues, however, must also support deadlock detection and handling. Because a queue can have multiple readers and a process can have multiple inputs and outputs, cycles of dependencies can form in a dataflow graph. These cycles can lead to deadlock, in which writers and readers wait on one another and the graph cannot continue without intervention. A deadlock algorithm in the DataRush engine detects these situations and handles them, normally by temporarily expanding the size of the problematic queue.

Besides the pipeline scalability that a dataflow architecture already provides, the Pervasive DataRush framework has built-in support for two other types of scalability: horizontal partitioning and vertical partitioning.

Horizontal partitioning replicates a section of dataflow logic and segments the input data into chunks, flowing the data concurrently through the replicated dataflow sections. Figure 2 depicts this scenario using a lookup component as an example. In this example, the lookup operator is replicated, and a data partitioner spreads the data load evenly across the lookup instances. Each lookup operator can then run in parallel, making full use of the multiple cores on the system.

Vertical partitioning supports running different dataflow logic in parallel on each field of an input stream.

Figure 1 shows the high-level architecture of the Pervasive DataRush framework, including design and execution components. The user works in an IDE such as Eclipse to create DFXML assemblies and Java processes and customizers.
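The horizontal partitioning described above can be sketched with the standard java.util.concurrent blocking queues the article compares the DataRush queues to. This is a minimal illustration and not the DataRush API: the class name HorizontalPartitionSketch, the run method, the round-robin partitioner, and the use of an empty Optional as an end-of-stream marker are all assumptions made for the example. It also uses lambdas and Map.of, so it needs a newer Java release than the one current when the article was written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: NOT the DataRush API, only plain java.util.concurrent.
class HorizontalPartitionSketch {

    // Runs `replicas` copies of a lookup step, each fed by its own bounded queue.
    // An empty Optional is used as the end-of-stream marker (an assumed convention).
    static List<String> run(List<String> keys, Map<String, String> table,
                            int replicas, int capacity) throws InterruptedException {
        BlockingQueue<Optional<String>> output = new ArrayBlockingQueue<>(capacity);
        List<BlockingQueue<Optional<String>>> inputs = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();

        // Replicated lookup operators: take() blocks on an empty queue,
        // put() blocks on a full one.
        for (int i = 0; i < replicas; i++) {
            BlockingQueue<Optional<String>> in = new ArrayBlockingQueue<>(capacity);
            inputs.add(in);
            threads.add(new Thread(() -> {
                try {
                    while (true) {
                        Optional<String> key = in.take();
                        if (!key.isPresent()) {
                            output.put(Optional.empty()); // forward end-of-stream
                            return;
                        }
                        String k = key.get();
                        output.put(Optional.of(k + "=" + table.getOrDefault(k, "?")));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        // Partitioner: spreads the input round-robin over the replicas,
        // then sends one end-of-stream marker to each of them.
        threads.add(new Thread(() -> {
            try {
                int next = 0;
                for (String key : keys) {
                    inputs.get(next).put(Optional.of(key));
                    next = (next + 1) % replicas;
                }
                for (BlockingQueue<Optional<String>> in : inputs) {
                    in.put(Optional.empty());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        for (Thread t : threads) {
            t.start();
        }

        // Collector: drains the shared output until every replica has finished.
        List<String> results = new ArrayList<>();
        int finished = 0;
        while (finished < replicas) {
            Optional<String> item = output.take();
            if (item.isPresent()) {
                results.add(item.get());
            } else {
                finished++;
            }
        }
        for (Thread t : threads) {
            t.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = run(Arrays.asList("a", "b", "c", "a"),
                               Map.of("a", "1", "b", "2"), 3, 2);
        Collections.sort(out); // arrival order depends on thread scheduling
        System.out.println(out); // prints [a=1, a=1, b=2, c=?]
    }
}
```

Because the replicas run concurrently, results arrive in no fixed order, which is why the usage example sorts before printing. The sketch has no deadlock detection: it avoids deadlock only because the collector drains the output queue continuously, whereas the DataRush engine is described as detecting deadlock and temporarily expanding the affected queue.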
Figure 2 illustrates horizontal partitioning, one of the three types of scalability that can be implemented using Pervasive DataRush.

Why Java?

The question of Java and performance always arises. What we've found is that, with the introduction of the java.nio package and other JVM performance enhancements, native speeds can be obtained from Java. This is especially true for frameworks like DataRush, in which a static set of classes (the process nodes) is used over a relatively long period of time. This scenario provides an environment well suited to JIT compilers.

A Simple Benchmark

Conclusion