Open Source: The Next Frontier for Data Quality Management
Data Quality: A Pervasive and Critical Business Issue

Data is the fundamental building block of every business. Client information, sales information, employee information, and financial information fuel the operation of every business. In today's business environment, where data is entered at multiple points and through myriad processes, data quality has become a growing concern for businesses trying to succeed in an ever more competitive atmosphere.

Poor data quality or data integrity, defined here as incomplete, erroneous, or incompatible data, is part of every business's day-to-day operation. Furthermore, as new flexible data entry options become available, the opportunity for data quality issues to be introduced into enterprise data increases. Overall business strategy is also increasing the prevalence of data quality issues as mergers, acquisitions, and department consolidations become part of almost every business's growth initiatives.
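To make the three categories concrete, here is a minimal sketch in Python of how a single record might be checked for incomplete, erroneous, and incompatible values. The field names, the email pattern, and the expected date format are assumptions chosen purely for illustration; they are not taken from any particular product, and a real deployment would define its own rules.

```python
import re
from datetime import datetime

# Hypothetical field names and rules, chosen only for illustration.
REQUIRED_FIELDS = ("name", "email", "signup_date")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
EXPECTED_DATE_FORMAT = "%Y-%m-%d"


def find_issues(record):
    """Return a list of (category, field) pairs describing problems in one record."""
    issues = []

    # Incomplete: a required field is missing or blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(("incomplete", field))

    # Erroneous: a value is present but does not look like a valid email address.
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        issues.append(("erroneous", "email"))

    # Incompatible: a date is present but not in the format downstream systems expect.
    signup = record.get("signup_date")
    if signup:
        try:
            datetime.strptime(signup, EXPECTED_DATE_FORMAT)
        except ValueError:
            issues.append(("incompatible", "signup_date"))

    return issues


records = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "signup_date": "2007-03-14"},
    {"name": "", "email": "not-an-email", "signup_date": "14/03/2007"},
]

for record in records:
    print(find_issues(record))
```

The first sample record passes all three checks, while the second trips one check in each category. Because the checks live in one small function, the same function could in principle be called from each place where data enters the organization.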

Data quality issues often lie latent in an enterprise until a critical business initiative is blocked because the enterprise data can't meet the needs of the business. Companies of every size in every industry are increasingly reporting issues with data quality. The Data Warehousing Institute reported that 50% of its respondents felt that company data quality is worse than the organization thinks. Furthermore, more than half of respondents indicated their organizations had suffered losses due to poor data quality.

Data encompasses all the critical decision-making variables in an organization, including financial data, employee data, client data, prospect data, and inventory data. Working from data that is erroneous or incomplete can seriously affect the decisions an organization makes and the strategies it employs. Recent research from Aberdeen indicates that the state of a company's data quality directly impacts its growth, profitability, and ability to compete. Poor data quality obscures an organization's view, causing it to miss additional revenue opportunities, risk regulatory issues, and forfeit the intelligence gained from a clear view of business data.

As the prevalence and impact of data quality issues become more apparent, concern over these issues is reaching beyond the IT community to the C-suite. A recent study by the Financial Executives Research Foundation indicates that data quality across the enterprise was respondents' number one concern, surpassing information security and Sarbanes-Oxley. Finance professionals cited information integrity as the key issue impacting overall corporate operations and performance.

Data quality is every organization's sleeping monster. It quietly erodes profitability, impedes growth, and hinders the implementation of mission-critical business initiatives.

The Limitations of Commercial Data Quality Solutions
Once an organization recognizes its data quality issues and their operational impact, it typically evaluates commercially available solutions to address the problem since most companies lack the IT infrastructure and knowledge to address enterprise data issues. However, for most companies seeking a data quality solution, the evaluation process is a sobering one because most commercially available solutions are costly, complex, and require software licenses and term contracts, while only addressing a portion of the overall issue.

Commercially available data solutions are fundamentally flawed in their implementation model. To be most effective, data quality processes should be deployed at multiple touch points throughout an organization. With commercial solutions, full implementations are almost impossible because they become cost-prohibitive when licenses are expanded to encompass more users and multiple systems.

Commercial solutions are also out of reach for many organizations because of their term contract commitments, software licenses, and implementation requirements. Price tags for traditional solutions can often total in the hundreds of thousands of dollars, if not over a million dollars, not including the human capital within the organization needed to manage the solution in concert with the provider. Such price tags make commercially available data solutions inaccessible to many small and mid-size enterprises that need data quality solutions.

Another drawback of traditional solutions is that they offer only cookie-cutter product approaches to data quality. Since most companies have data issues that are unique due to their specific organizational history and infrastructure, traditional cookie-cutter solutions often require significant programming and custom code development - all requiring additional testing, resources, and money, adding significantly to the complexity of the solution for implementation and service management.

Moreover, support for traditional solutions is typically limited to the providing vendor due to the proprietary software and licenses involved in the implementation of the solution. This restriction further increases the price tag of the conventional solution since support, service, and implementation can total as much as 70% of the purchase price of the solution.

Open Source: The Next Frontier for Data Quality Management
While open source has been gaining traction and attention in many business solutions, data quality has remained an area where open source is not widely used. Open source, however, is well equipped to address the limitations of traditional software-based solutions or SaaS solutions and create industry-leading data solutions. Open source solutions are inherently better suited to address the needs of comprehensive data quality management with their flexibility, cost efficiency, customization, rapid integration, and turnkey scalability options.

A key benefit of open source data quality solutions is that they can be implemented at multiple data entry points throughout an organization because they require no license purchases. This flexibility creates a more comprehensive and longer-term solution than single-point commercial solutions.

Open source data quality solutions also provide a significant cost advantage over conventional quality solutions because they require no software license purchases or management. Software licenses can account for up to 20% of the cost of a traditional implementation, so eliminating them represents a significant cost savings to organizations. Furthermore, software licenses typically come with lengthy contract commitments attached, affecting an organization's cost structure for a significant if not perpetual period of time.

Moreover, open source data quality software can be easily customized to address the unique data fingerprint of every organization, eliminating the need to retrofit cookie-cutter traditional solutions with code modifications and custom programming. This customization ability reduces the complexity of the solutions and offers faster implementations, simpler integrations, less testing, and more rapid results than commercial solutions.

Another benefit of open source solutions is that servicing is more flexible and cost-efficient because it isn't tied to proprietary licensing. Service can then be provided by the technology vendor, secondary vendor, or internal resources. Furthermore, the open source community can also provide support and innovation for solutions as they evolve within an enterprise.

Lastly, open source data quality solutions have the added value of being able to use newer "pay as you go" (utility computing) processing services for turnkey scalability. This offers a further significant cost advantage over commercial solutions that require licenses tied to hardware. Data solutions are especially prone to scalability issues because of the volume of data undergoing processing. Many traditional solutions are easily strained by these demands, which increases costs, delays results, and reduces the return on investment.

It's clear that an open source solution for data quality offers many benefits to clients over conventional solutions. Open source provides all businesses access to critical data quality solutions that can positively impact their overall profitability, growth, and competitive position. Furthermore, the existence of the open source community gives a solution's users immediate access to shared knowledge and implementation enhancements, rather than making them wait months or years for another software release. Open source can offer organizations the most customer-centric data quality solution available in the marketplace today with flexibility, customization, and significant cost advantages.

Research Sources:
•  TDWI. "Taking Data Quality to the Enterprise through Data Governance 2005."
•  Aberdeen Report. "Customer Data Quality, The Roadmap to Growth and Profitability 2007."
•  Technology Issues for Financial Executives 2007 Annual Report.

About Subbu Manchiraju
Subbu Manchiraju is a vice president at Infosolve Technologies, which provides business clients with comprehensive data solutions that leverage the power of their enterprise data to achieve business objectives and create strategic opportunities, without the burdens of cumbersome licensing agreements, complex term contracts, and expensive hardware requirements.

Your Feedback
rabk wrote: i've seen that Quality management is a must, for example the other day i was looking at hi5.com and they didn't even have quality policies, what's wrong with that?
Toby Russ wrote: I find mrmo's comment right: if you want an open source data quality tool, you need to go directly to the open source software editors. There are plenty of programs to satisfy your needs. Just look at Pentaho or Talend to use their data quality products. They are open source and downloadable on the company’s website.
mark wrote: Echoing the other commenter, InfoSolve does not provide open source. They provide source code for things they build on top of OSS to people who pay them. There is a distribution of source to the payer, so it's really a source code license. I think the magazine should do a little more homework before providing this type of information as it's misleading and promotes another fauxpen-source vendor.
Vivek wrote: Hi, I represent Aggregate Profiler from Arrah Technology. It is open source and has more than 6000 downloads. I would appreciate it if you could review it and give your feedback. The future plan is to use modeling and scheduling so that it can also run in batch mode. http://www.arrah.in
Kasper Sørensen wrote: I absolutely agree with everything you're saying about the advantages of Open Source data quality but I find it less convincing when faced with the fact that Infosolvetech does not provide an Open Source licensed solution that complies with the Open Source definition! I've tried several times to find the source code for your OpenDQ product, but found that you had to be a paying customer to get it? How open is that? And how do you benefit from a non-existent community? So now my point is obvious... Find another Open Source data quality solution to gain those benefits that you speak of. Try using DataCleaner (which I will gladly admit that I represent), Aggregate Profiler or Open Data Profiler. Respectively: http://www.eobjects.dk/datacleaner http://sourceforge.net/projects/dataquality/ http://sourceforge.net/projects/dataprofiler/