Resolving Data Conflicts
Besides these scoping and
profiling exercises to manage data quality, it's also imperative to
resolve value-level conflicts that exist in the data. These conflicts
can be categorized into three major types (C.H. Goh, "Representing and
Reasoning about Semantic Conflicts in Heterogeneous Information
Systems," Sloan School of Management, Massachusetts Institute of
Technology, 16-22, January 1997):
Structural and formatting conflicts:
Conflicts in the formats of the data values and schemas used for
structuring and organizing the data. Some examples of structural and
formatting conflicts include type conflicts in which different data
types are used to represent the same element. For example, customer ID
is stored as a double in one system and as a string in another system.
Another example is labeling conflicts where similar concepts are
labeled differently such as "supplier" versus "vendor."
Semantic conflicts: Conflicts in
how the meanings of certain data values are interpreted. Examples of
semantic conflicts include naming conflicts, in which the same concept is
expressed with different values. This is similar to the labeling
conflict but occurs in the data value, whereas with labeling, the
conflict is in the label on the data structure (metadata). The
significance of this difference is that with the semantic naming
conflict, detection and resolution may be more difficult, and the
detection and resolution mechanism has to be applied multiple times
over the entire set of values.
Intensional conflicts: Conflicts
arising when consumer assumptions and expectations of data content
differ from those of data producers. These conflicts are prevalent when
structural representations are identical but the data domains that are
encapsulated in these structures vary with the data producers.
Intensional conflicts often arise when varying producers have
fundamentally different conceptions of integrity constraints between
related entities: cardinality, nillability, or uniqueness.
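
As a rough illustration of the first two conflict types, the sketch below normalizes a type conflict (customer ID stored as a double in one system and as a string in another), a labeling conflict ("supplier" versus "vendor" as field names), and a naming conflict in the values themselves. The field names, alias tables, and sample records are hypothetical and exist only for illustration; in practice such mappings would come out of data profiling.

```python
# Hypothetical mappings; a real deployment would derive these from profiling.
LABEL_ALIASES = {"vendor": "supplier"}  # labeling conflict (metadata)
VALUE_ALIASES = {  # naming conflict (data values)
    "intl business machines": "IBM",
    "i.b.m.": "IBM",
    "ibm": "IBM",
}


def normalize_customer_id(raw):
    """Resolve a type conflict: accept 1001.0 (double) or "1001" (string)."""
    if isinstance(raw, float):
        if not raw.is_integer():
            raise ValueError(f"non-integral customer ID: {raw!r}")
        return str(int(raw))
    return str(raw).strip()


def normalize_record(record):
    """Return a copy of record with canonical labels, ID type, and values."""
    out = {}
    for label, value in record.items():
        label = LABEL_ALIASES.get(label, label)
        if label == "customer_id":
            value = normalize_customer_id(value)
        elif label == "supplier":
            value = VALUE_ALIASES.get(str(value).strip().lower(), value)
        out[label] = value
    return out


system_a = {"customer_id": 1001.0, "vendor": "I.B.M."}
system_b = {"customer_id": "1001", "supplier": "IBM"}
```

Note that the label mapping is applied once per field, while the value mapping has to be applied to every record, which mirrors the point above about semantic naming conflicts being costlier to resolve.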
These data conflicts can often be addressed by using commercial
data management tools and methodologies, as well as enterprise data
modeling software. Another emerging possibility is semantics-centric
modeling environments. Instead of hard-coding data cleansing routines,
these tools use a semantic description of the enterprise - the business
concepts and relationships between those concepts, as well as any
business rules governing the relationships - and provide a mechanism to
describe how legacy systems support the semantics of the enterprise.
This useful abstraction lets the enterprise deterministically identify
how each enterprise data asset supports the enterprise business
functions, as well as any gaps between the enterprise semantic model
and the underlying data representation schemes. This modeling approach
can then be used to determine where physical data conflicts or
duplications may exist, as well as forward engineer data consolidation
and cleansing scripts.
Data Access Controls
In traditional application
architectures, data access security is typically governed by
application-specific mechanisms. In this environment, each source has
its own set of users, roles, and access control policies, which means
that user profiles, roles, and access control policies lack consistency
across the enterprise. An SOA environment magnifies this problem by
making data sources visible across the organization. So it becomes
increasingly important to move away from individual
application-specific and data source-specific mechanisms in favor of
enterprise-level SOA identity management and access control mechanisms.
This means that when creating the central data services layer, the data
sources must rely on central provisioning of some security functions so
they can be managed centrally. The challenge is in finding the right
balance between the security functions that should be managed centrally
and those that should be managed as part of the data sources. There are
several options for implementing such a scheme, including a centrally
managed data security layer and layered authorization through
multiple policy decision points (PDPs).
With the central management option, the data sources relinquish
responsibility for security and rely solely on the data services to protect access to
their data. Within each data source, a single user profile is created
for the data service that has full access to the data. Any request to
the data through this service is authorized through this user profile.
So there's no longer a concern about whether the principal's identity
from the overarching security domain exists or means anything in the
data source. However, this option pushes security checks into the data
service layer and reduces the granularity of accountability. As a
consequence, any access control policies from the data source along
with the associated roles and privileges should now be re-created and
maintained at the central enterprise points.
In contrast, layering the use of multiple policy decision points
encourages the reuse of existing authorization capabilities, user
profiles, and access control policies of the underlying data sources.
This approach allows some of the more fine-grained access control
decisions to be made at the data sources rather than elevating them
into the enterprise layer. Although many variations exist for this
design, the premise is that different layers of authorization with
multiple PDPs are making the decisions. The basic flow of this approach
is as follows: Authentication still occurs at the edge using enterprise
authentication services. Requests for data originate at different
security domains in the enterprise. A PDP in each of these domains
evaluates requests for resources in that domain. When a data service is
invoked, it calls the enterprise policy decision point to authorize
access to the data service as well as the specific operation requested.
The data service then delegates the decision to each data source so
that each can authorize access to its specific data object(s). Thus,
coarse-grained decisions are made at the enterprise level while
finer-grained decisions use data source-specific profiles and policies
that aren't exposed to the enterprise.
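
The layered flow described above can be sketched as follows. This is a minimal illustration, not a reference design: the class names, the grant tables, and the assumption that the data source recognizes the same principal name as the enterprise layer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    principal: str  # identity already authenticated at the edge
    service: str
    operation: str
    data_object: str


class EnterprisePDP:
    """Coarse-grained: may this principal invoke this service operation?"""

    def __init__(self, grants):
        self._grants = grants  # set of (principal, service, operation)

    def authorize(self, req):
        return (req.principal, req.service, req.operation) in self._grants


class DataSourcePDP:
    """Fine-grained: local policy that is not exposed to the enterprise."""

    def __init__(self, readable_objects):
        self._readable = readable_objects  # principal -> set of data objects

    def authorize(self, req):
        return req.data_object in self._readable.get(req.principal, set())


def authorize_request(req, enterprise_pdp, source_pdps):
    """Enterprise check first, then delegate to every affected data source."""
    if not enterprise_pdp.authorize(req):
        return False
    return all(pdp.authorize(req) for pdp in source_pdps)


enterprise = EnterprisePDP({("alice", "CustomerService", "read")})
crm = DataSourcePDP({"alice": {"customer.address"}})
```

Here a call such as `authorize_request(Request("alice", "CustomerService", "read", "customer.address"), enterprise, [crm])` passes both layers, while a request for an object the data source does not grant is rejected locally even though the enterprise PDP allowed the operation.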
Data Services Architecture
From an architectural
perspective, the heart of this solution is an enterprise layer that
logically centralizes access to the data spread across the enterprise.
This set of logically centralized data services provides several
architectural advantages. First, the enterprise can assert greater
control over the governance and implementation of data access
mechanisms. Second, clients use a consistent mechanism to access data.
Third, the enterprise can design and implement a solution in a holistic
fashion instead of the typical one-off models that are the norm in data
integration. Finally, besides the basic Create, Read, Update, and
Delete (CRUD) operations, the underlying architecture must also support
data aggregation, inter-service transactions, and multiple access and
usage patterns, all while ensuring acceptable levels of quality of
service.
Data Aggregation Scenarios
This data services
layer acts as a façade over the enterprise assets - it logically
provides access to enterprise data assets in a singular manner, while
physically dispatching requests and aggregations across relevant
co-located assets. Three main scenarios should be considered for data
aggregation:
The unified view of a data entity is defined by combining
attributes from multiple sources. The actual data of that view is also
obtained by combining data from multiple sources. The main difficulty
with this aggregation scenario is linking related data from multiple
systems that may not share unique identifiers. This often requires the
creation of a cross-reference table to link related records.
The unified view of an entity is derived from the model of a
single source. However, the actual data is obtained from multiple
sources with different models. The main difficulty here is
de-duplication: tapping multiple systems to get a
complete set of instance data can result in multiple instance records
about the same thing. In this case, once duplicates are identified,
which one survives to become the "golden copy"? In this model,
identification and use of authoritative sources become important.
The unified view of an entity is partitioned across multiple
instances of a single model. Data distribution can be the result of
planned partitioning or just the ad hoc use of the same source system
across multiple departments resulting in multiple instances. In the case of
planned partitioning, the partitioning schema can be used to optimize
the performance of the data access layer, while in the case of ad hoc
distribution, duplicates are a problem and should be addressed through
the use of authoritative data sources.
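
To make the cross-reference table and "golden copy" ideas more concrete, here is a small sketch under stated assumptions: two in-memory dictionaries stand in for source systems that share no identifier, and the system names, keys, and per-attribute authority table are invented for the example.

```python
# Hypothetical records from two systems that do not share an identifier.
crm_records = {"C-17": {"name": "Acme Corp", "phone": "555-0100"}}
billing_records = {"B-902": {"name": "ACME Corporation", "balance": 1250.0}}

# Cross-reference table: one enterprise ID linked to each system's local key.
xref = {"ENT-1": {"crm": "C-17", "billing": "B-902"}}

# Authoritative source per attribute, used to pick the surviving value.
AUTHORITATIVE = {"name": "crm", "phone": "crm", "balance": "billing"}


def unified_view(enterprise_id, sources):
    """Build the golden copy for one entity from all linked source records."""
    linked = {
        system: sources[system][local_key]
        for system, local_key in xref[enterprise_id].items()
    }
    golden = {"id": enterprise_id}
    for attribute, system in AUTHORITATIVE.items():
        record = linked.get(system, {})
        if attribute in record:
            golden[attribute] = record[attribute]
    return golden


sources = {"crm": crm_records, "billing": billing_records}
view = unified_view("ENT-1", sources)
```

The conflicting `name` values are resolved by declaring one system authoritative for that attribute, so the billing system's spelling does not survive into the unified view.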
Some of these aggregation
capabilities can be supported through Enterprise Information
Integration (EII) technology, which provides SOA-centric capabilities
for accessing and querying co-located data in real-time. EII products
provide adapters to legacy data sources and expose their underlying
data in a service-oriented fashion. EII is best used in discrete
query-based mechanisms where data volumes are moderate. EII isn't meant
to be a replacement for traditional ETL (extract, transform, load), EAI
(enterprise application integration), or MDM (master data management)
technologies. For example, some of the aggregation scenarios requiring
de-duplication capabilities can require the use of MDM technologies.
The data services layer allows creates and updates to be requested once
by a client and then decomposed by the supporting architecture into
individual write commands to targeted data sources. Therefore, the
architecture must support transactionality - ensuring that writes are
consistent so that underlying data across all affected data sources are
left in a consistent state. This isn't significantly different from
current data integration pains. However, most systems today requiring
multi-write transaction capabilities leverage the XA standards. Similar
standards for the Web Services environment are only starting to emerge.
OASIS has recently formed a Web Services Transaction Technical
Committee (WS-TX TC) responsible for stewarding WS-AtomicTransaction,
WS-Coordination, and WS-BusinessActivity specifications through the
standardization process. None of these standards have been ratified
yet. Because these specifications are still being developed, most
SOA-related transaction support is being custom-developed, typically
through the use of homegrown compensation mechanisms - effectively an
"undoing" of a previously executed service invocation. Instead of
providing true rollback semantics, compensation is an additional
service invocation that rewrites data to its original state. While it
may be beneficial to take a wait-and-see approach to building
transactionality, solutions aligned with the three specifications
seeding WS-TX deliberations will likely provide the path of least
resistance to standards compliance.
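
The compensation idea can be illustrated with a short sketch. It assumes each write step is paired with a hand-written compensating action, and it models two data sources as in-memory dictionaries; none of the names below come from the WS-TX specifications.

```python
class CompensatingTransaction:
    """Run write steps in order; on failure, invoke compensations in reverse."""

    def __init__(self):
        self._steps = []  # list of (action, compensation) callables

    def add_step(self, action, compensation):
        self._steps.append((action, compensation))

    def execute(self):
        completed = []
        try:
            for action, compensation in self._steps:
                action()
                completed.append(compensation)
        except Exception:
            # Not a true rollback: each compensation is a new invocation
            # that rewrites data back to its prior state.
            for compensation in reversed(completed):
                compensation()
            return False
        return True


# Two hypothetical data sources, modeled as in-memory dictionaries.
orders = {}
inventory = {"widget": 5}


def reserve_stock():
    if inventory["widget"] < 10:
        raise RuntimeError("insufficient stock")
    inventory["widget"] -= 10


tx = CompensatingTransaction()
tx.add_step(
    lambda: orders.update({"o1": "widget x10"}),
    lambda: orders.pop("o1", None),
)
tx.add_step(reserve_stock, lambda: None)
succeeded = tx.execute()
```

The second step fails, so the order written by the first step is removed by its compensation. The sketch ignores the harder cases, such as a compensation that itself fails or other clients reading the intermediate state before it is undone.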
Quality of Service
With all the data access
operations going through this data services layer, a major concern is
the potential bottleneck at this layer that may limit scalability. The
obvious way to resolve this problem is to create a clustered
environment with multiple instances of this data services layer.
There are complexities with clustering that depend on whether the
enterprise is using a purely federated approach or has some level of
data replication. If using a purely federated approach, then it can be
simple to have a cluster with multiple instances. However, the
architecture must still address the issue of affinity for a particular
instance - especially in the case of inter-service transactions. The
architecture must address questions such as: Are all operations that
are part of a transaction forced to go to the same data service
instance? Can different operations that use different data service
instances still be part of the transaction?
A simple solution is to require all operations in a single transaction
to interact with a single service instance. However, this solution
isn't without its disadvantages since it can affect how well the load
is distributed across the cluster. With some replication, clustering
becomes more difficult. In addition to the server affinity issue, the
architecture must include a partitioning strategy. This strategy
answers questions such as: Do all instances of the data services allow
access to all the data? Or are data services partitioned so that only
certain instances allow access to certain data?
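
One way to picture the affinity and partitioning questions above is a small routing function. The sketch assumes a fixed list of instance names and a static table mapping data domains to the instances allowed to serve them; both are hypothetical, and a production cluster would also have to handle instances joining and leaving.

```python
import hashlib

INSTANCES = ["data-svc-1", "data-svc-2", "data-svc-3"]  # hypothetical cluster

# Hypothetical partitioning: which instances may serve which data domain.
PARTITIONS = {
    "customer": ["data-svc-1", "data-svc-2"],
    "orders": ["data-svc-3"],
}


def _stable_index(key, size):
    """Hash a key to an index that is stable across processes and restarts."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def route(transaction_id, domain=None):
    """Pick an instance; the same transaction always lands on the same one."""
    candidates = PARTITIONS[domain] if domain is not None else INSTANCES
    return candidates[_stable_index(transaction_id, len(candidates))]
```

Hashing the transaction ID gives instance affinity without shared state, at the cost of less even load distribution. With the partition table in place, a transaction that touches both `customer` and `orders` data cannot be pinned to a single instance, which is exactly the design question the architecture has to answer.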
Data Access and Usage Patterns
It's important to
note that different applications have different data access and usage
patterns. Some applications can produce many transactions but access
only a small amount of data in each transaction. For other
applications, the transaction throughput can be small but the volume of
data that's accessed is very large. The way to tune data source
performance for these patterns is very different. When using a data
services solution to provide centralized access to enterprise data
sources, the enterprise must accommodate all the various access and
usage patterns of the applications that will be integrated with this
solution. Tuning the infrastructure to support a single application's
performance requirements is complicated; trying to tune it to
adequately support multiple patterns of use and access will be even
more difficult. Often, there will be conflicting configurations -
something that optimizes the performance of one application will
degrade the performance of another. The enterprise should analyze and
model the access and use patterns of the applications that will be
using the data services and ensure that well-defined performance
criteria for each scenario have been developed. Additionally, enough
time should be planned for testing the performance of a particular
solution with simulations that reflect the access and usage patterns
that are common to the enterprise environment.
Summary
Harmonizing data assets has always been a
challenging problem; the problems and urgency are further exacerbated
when migrating to an SOA. Developing a strategy for handling this kind
of transition is essential to properly enabling data access in an
enterprise SOA environment. By developing appropriate requirements and
use cases and by analyzing data assets and data usage, organizations
can better understand the breadth and depth of their data integration
issues and begin to take steps to address them. Ultimately, every
organization must develop a strategy tailored to its specific needs,
but the overall approach described in this article provides guidance in
understanding what types of questions should be asked and how to
leverage possible technology solutions to address the resulting issues
that are identified. This guidance will enable organizations to fully
leverage and exploit their most important strategic asset: their data.
About Tieu Luu
Tieu Luu works at Booz Allen Hamilton where he helps the U.S. government create and implement strategies and architectures that apply innovative technologies and approaches in IT. You can read more of Tieu’s writing at his blog at http://tieuluu.com/blog.
About Sandeep Maripuri
Sandeep Maripuri is an associate with Booz Allen Hamilton where he designs and implements data sharing architectures that apply service-oriented concepts. Prior to joining Booz Allen Hamilton, Sandeep held architecture and engineering positions in both large consulting firms and a commercial software startup, where he was an architect and lead engineer of one of the first commercially available semantic data interoperability platforms.
About Riad Assir
Riad Assir is a senior technologist with Booz Allen Hamilton where he designs enterprise systems for commercial and government clients. Prior to Booz Allen Hamilton, Riad held senior technology positions at companies such as Thomson Financial, B2eMarkets, and Manugistics, where he worked on large supply chain systems development.