Feature
Turbo-Charging Applications with Mid-Tier Distributed Caching
Fast and predictable data access
Feb. 21, 2008 12:00 PM
Reliability Considerations
When performing operations across the cluster, the reliability of the
architecture is paramount to the success of a project. In particular,
if transactional activity is occurring across the cluster, with data
being changed in the grid and ultimately persisted to a back-end store,
it's essential to ensure that the chosen solution survives failure
gracefully.
We recommend that customers build their solution and test failure
scenarios with four to six machines in their topology. When considering
which data grid solution to use, the following capabilities should be
given high importance in the selection criteria:
- Dynamic scaling of the grid. You should be able to add nodes to it
without program or configuration changes. The grid should scale
dynamically once the baseline configuration of the application has been
set.
- Reliability of the grid. Basic tests such as unplugging
machines from the network and forcefully killing specific nodes in the
cluster will show whether the architecture is robust and can be
depended on for mission-critical applications.
- Throughput performance of the grid. Adding nodes to the
grid should give a near-linear and predictable growth in throughput. If
the throughput doesn't grow in linear fashion, the solution may not be
effective and may not provide the desired performance in a production
situation.
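The throughput criterion can be turned into a simple check on your own
test results. The following is a minimal sketch, not part of any data
grid product: the class name, the method names, the 0.9 threshold, and
the sample figures are all assumptions made for illustration. It
compares the measured speedup against the ideal linear speedup, using
the smallest cluster as the baseline.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScalingCheck {

    /**
     * Scaling efficiency relative to a baseline measurement:
     * (ops / baselineOps) divided by (nodes / baselineNodes).
     * A value of 1.0 means perfectly linear growth.
     */
    static double efficiency(int baselineNodes, double baselineOps,
                             int nodes, double ops) {
        double speedup = ops / baselineOps;
        double ideal = (double) nodes / baselineNodes;
        return speedup / ideal;
    }

    /**
     * Returns true if every measurement reaches at least minEfficiency.
     * The first entry in the map is treated as the baseline.
     */
    static boolean isNearLinear(Map<Integer, Double> opsByNodeCount,
                                double minEfficiency) {
        Map.Entry<Integer, Double> base =
                opsByNodeCount.entrySet().iterator().next();
        for (Map.Entry<Integer, Double> e : opsByNodeCount.entrySet()) {
            double eff = efficiency(base.getKey(), base.getValue(),
                                    e.getKey(), e.getValue());
            if (eff < minEfficiency) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical measurements: node count -> operations per second.
        Map<Integer, Double> measured = new LinkedHashMap<>();
        measured.put(2, 10_000.0);
        measured.put(4, 19_400.0);
        measured.put(6, 28_200.0);

        // 0.9 is an arbitrary illustrative threshold for "near-linear".
        System.out.println("Near-linear: " + isNearLinear(measured, 0.9));
    }
}
```

What counts as "near-linear" is a judgment call for each project; the
point of the sketch is only that the comparison should be made against
measured numbers at several cluster sizes rather than assumed.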
A fundamental decision criterion is whether a solution works only on a
single or preset number of nodes or, worse, requires specific changes
to the program code when it scales to larger numbers of nodes.
These technologies aren't only being used in small grids consisting of
four to eight nodes; they're increasingly being deployed in large grids
of 500 to 1,000 nodes. At these extreme levels, predictable and
automated reliability, scalability, and performance are critical.
Deployment Considerations
When using and deploying a data grid, there are many things to consider
in addition to the programming model and cache topologies that you want
to use. Some issues to consider are as follows.
The Network
It's vital to ensure that your environment is optimally tuned and
secured at all levels whenever data is requested across the network,
whether due to client data requests, recovery requests within the grid,
or other processing. Some areas to consider include:
- High network throughput: bandwidth should be a minimum of 1 Gbps.
- Redundant network interface cards (NICs) in each server for
availability and performance, as well as physically separate network
segments for non-cache traffic. For example, from within the cache you
may have back-end data stores that need to be written to or read from.
These should use separate NICs and therefore separate data paths so as
not to interfere with cache traffic.
- Optimal configuration of your network and switching devices.
A data grid operating in full production mode has the potential to
saturate the networking infrastructure.
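A back-of-the-envelope calculation can indicate how close a workload
comes to saturating a link. The sketch below assumes a 1
gigabit-per-second link; the request rate and payload size are
hypothetical, and the class and method names are illustrative only. It
deliberately ignores protocol overhead, replication traffic, and
serialization costs, so real utilization will be higher than the figure
it reports.

```java
public class BandwidthEstimate {

    /**
     * Fraction of link capacity consumed by payload traffic alone
     * (1.0 means the link is saturated).
     */
    static double linkUtilization(double requestsPerSecond,
                                  int avgPayloadBytes,
                                  double linkGbps) {
        double bitsPerSecond = requestsPerSecond * avgPayloadBytes * 8.0;
        double linkBitsPerSecond = linkGbps * 1_000_000_000.0;
        return bitsPerSecond / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 50,000 requests per second,
        // 2 KB average object size, one 1 Gbps link.
        double utilization = linkUtilization(50_000, 2_048, 1.0);
        System.out.printf("Payload alone uses about %.0f%% of the link%n",
                          utilization * 100);
    }
}
```

With these assumed figures, payload traffic alone accounts for roughly
four-fifths of the link, which leaves little headroom for recovery
traffic within the grid.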
Operating Systems
Development environments usually consist of either Windows or Mac
operating systems. Test and production environments tend to be Linux,
Solaris, AIX, or Windows. Depending on the operating systems you use,
some of the issues to consider are:
- Tune the TCP/UDP layers to be optimal for the operating system you're using.
- Ensure that your NICs are working properly in full duplex mode.
- Make sure that your grid server processes are never paged to disk since this will severely impact performance.
- Ensure that you test your application thoroughly on the
target platform so that any issues related to the subtle differences in
operating systems or JVM versions/implementations come to light.
JVMs
JVM configurations and command line parameters vary slightly between
the different vendor implementations. The following considerations are
a good place to start:
- Set the -Xms option and the -Xmx option to the same value so that the
heap isn't resized at run time, and ensure that you're not allocating
too much memory to each JVM. A moderately sized heap should help keep
garbage collection pauses manageable.
- Use the -server option to get better performance.
- Be aware of the capabilities and switches available in your JVM to achieve optimal JVM performance and GC optimization.
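The heap-size recommendation can be verified from inside a running
process. The following is a minimal sketch using the standard
java.lang.management API; the class and helper names are illustrative
assumptions. It compares the flag values as plain strings, so
equivalent settings written in different units (for example 1g and
1024m) would be reported as different.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapFlagCheck {

    /**
     * Returns the value following the given prefix in the last matching
     * JVM argument, or null if no argument starts with the prefix.
     */
    static String flagValue(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** True only if both -Xms and -Xmx are present with the same value. */
    static boolean heapIsFixed(List<String> jvmArgs) {
        String xms = flagValue(jvmArgs, "-Xms");
        String xmx = flagValue(jvmArgs, "-Xmx");
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, e.g. [-server, -Xms1g, -Xmx1g].
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (heapIsFixed(jvmArgs)) {
            System.out.println("Heap size is fixed: " + jvmArgs);
        } else {
            System.err.println(
                "Warning: -Xms and -Xmx are missing or differ: " + jvmArgs);
        }
    }
}
```

A check like this could be run at grid server startup so that a
misconfigured node is noticed before it joins the cluster.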
Hardware Choices
Consider taking advantage of commodity-based, dual/quad-core x86 or
x86-64 hardware. Solutions built on these platforms are very
cost-effective and designed to scale efficiently.
Security
Ensure that you've considered encrypting data in transit and securing
the data grid so that only authorized processes can access and
manipulate data in the grid.
Conclusion
Keeping data cached in object form in a mid-tier data grid provides
fast and predictable access to it, along with a scalable and reliable
platform for supporting extreme transaction processing from Java,
C#/.NET, and C++.
Data grids provide multiple caching topologies to support various data
access requirements, including static reference data and massive
volumes of volatile data, as well as integration with back-end data
sources using technologies such as Hibernate and TopLink. By using data grid
technology with commodity-based hardware, you can linearly scale your
data and processing and provide predictable and reliable access to your
data.
With many vendors providing reliable and scalable data grid solutions,
engineers can spend their valuable time designing and writing code to
solve business problems rather than building caching and data grid
infrastructures from scratch.
About Tim Middleton
Tim Middleton is a solution architect with Oracle in Perth, Western Australia. He has over 17 years of experience in the IT industry. During this time he has been involved in the design and implementation of many large and leading-edge technology projects within the government and private sectors. His focus is on providing middleware solutions around SOA, with an emphasis on architectures that are highly available, scalable, and reliable. Tim also has extensive development experience with J2EE and application server-based solutions, as well as many years' experience as a DBA.