

Data Warehouse Adoption of the Linux-Based Platform
A Study of Trends and Challenges

Data warehouse implementations represent one of the most challenging types of deployments for the enterprise. Several factors contribute to the challenge of deploying a successful data warehouse. Among these are large-scale and complex system configurations, sophisticated data modeling and analysis tools, and high visibility in a broad range of important business functions within the company.

Data warehouse workloads can serve as a litmus test to determine the enterprise readiness of a given deployment platform. For this reason it's interesting to determine how well Linux can support such challenging workloads. To that end I began a study, examining two interrelated aspects of enterprise readiness for a data warehouse on Linux:

  1. Is the solution stack supported on Linux?
  2. Are end-user companies actively deploying the stack to support their business needs?
To investigate this issue, I chose to work in cooperation with the Data Center Linux initiative at OSDL. Building on personal, practical experience with data warehouse deployments, I conducted an informal survey of the readiness of the Linux platform for this workload. This article is a summary of the findings of that survey.

Data Warehouse Solution Participants

The survey examined three types of participants in the data warehouse solution or ecosystem:
  1. Independent software vendors (ISV)
  2. Independent hardware vendors (IHV)
  3. End-user company deployments
A number of published studies already show that Linux is well accepted on a variety of industry-standard vendor platforms, so its base acceptance was taken as an assumption within my study. The focus of my study was instead on Linux readiness within the ISV and end-user communities.

I used Ralph Kimball's "High Level Warehouse Technical Architecture" as a reference and as a source of common terminology for analyzing the solution stack. Following Kimball's architecture, I divided the list of vendors into "front room" and "back room" categories.

The study involved a total of 18 vendors. It's important to note that this roster was not hand-picked to illustrate Linux usage. Rather, the list represented the dominant vendors, chosen based upon experience in deployments at a number of large companies.

Study Results - Data Warehouse Trends

The study found that, overall, there is reasonable support for Linux among the ISVs that make up the data warehouse solution, with 14 of 18 vendors offering some level of support for the open source OS. Within Kimball's technical architecture, the vendors supplying products for the "front room" were predominantly hosting their offerings on client platforms. They had weaker support overall for Linux than the "back room" vendors with products in such areas as extract, transform, and load (ETL) and database. Specifically, the ETL vendors tended to support one particular Linux distribution very well, while database vendors tended to support multiple Linux distributions.

The study further examined motivators and other issues driving (and inhibiting) Linux adoption and support by ISVs, with the following findings.

Motivators

  • Market demand for the Linux platform
Issues
  • How many and which distributions to support
  • Differences in packages across distributions
  • Lack of standardization among maintenance tools and lack of usability features
While issues exist with regard to supporting the Linux platform, a clear majority of ISVs within the data warehouse ecosystem felt that the market demand was sufficiently compelling to deliver products for that platform.

In examining end-user company deployments, my study focused on companies whose data warehouse and/or data mart implementations would be considered medium-sized to large (i.e., total implementation data size of at least one terabyte), with a typical configuration of around 60 terabytes. These types of configurations shared some common themes:

  • Overall configuration elements - medium to large data warehouse:
    - SAN disk - use of failover
    - Employ NFS
    - Use multiple file systems as well as raw disk partitions
    - Employ large file systems
    - Multi-CPU large servers dominant - use of partitioning
The study then surveyed a subset of the companies with data warehouse implementations within the target size. Initially, a small sample set of companies was chosen to limit the scope of the study and get an initial picture of the deployments. In the future, the set of companies studied will be expanded.

Of the seven companies surveyed, the responses broke down as shown in Table 1.

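The following is a summary of the issues and motivators for the three groups: Group 1 (no plans in the near future to migrate to Linux), Group 2 (development on Linux with a longer-term move to production), and Group 3 (Linux in production).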

Group 1

  • While there are some potential motivators for cost consolidation, there are significant inhibitors in terms of the internal infrastructure to support Linux and the perceived immaturity of the platform.
Group 2
  • Flexibility in choice of hardware platforms drove decisions to build a development environment as a first step toward evolving a mature support infrastructure for Linux.
  • The primary inhibitor to moving to production was the lack of adequate support infrastructure within key ISVs for solutions on Linux.
Group 3
  • Migration to Linux represented a strategic move to take advantage of the flexibility of deploying the hardware and software solutions that Linux provides.
  • The primary production issue for IT infrastructure teams was providing the systems integration services needed to ensure the success of such a demanding workload, for example, building customized monitoring scripts for the environment.
Based on the data above, the most important group to analyze in more detail was Group 1 because it was the dominant group. Moreover, I wanted to provide input to the OSDL Data Center Linux group regarding the strategic focus areas to drive acceptance of the data warehouse on Linux.

Group 1 reported the following motivations and issues in detail.

Motivators

  • Cost consolidation
  • H/W platform flexibility
  • Low-cost clustering
  • Consolidation of system administration skills
Issues
  • Weak internal support for Linux infrastructure
  • Lack of maturity of data warehouse solutions on Linux
    - Maturity defined: Referenceable and in production for at least one year
  • Lack of acceptance of Linux within DW
    - Acceptance defined: Deployments within Fortune 100 companies

Conclusion

The overall conclusion drawn from this survey of the data warehouse and Linux was that the solution stack is sufficient to support the workload on Linux. However, the Linux support infrastructure is often not mature enough to sustain the large, complex configurations and demanding workloads of data warehouse deployments.

End-User Highlights
Some very specific findings emerged from the study with regard to end-user deployment:

  • The majority of companies in Group 1 (no plans in the near future to migrate to Linux) will eventually move into Group 2 (development on Linux with a longer-term move to production). They fell into Group 1 because complexity, reliability, and scalability requirements proved too demanding for current deployments on Linux. Staffing and support issues were key inhibitors as well.
  • Groups 2 and 3 featured early adopters who leveraged the availability of H/W, database, and ETL server solutions to enable successful deployment.
ISV Highlights
Similarly, salient ISV data emerged from the study:
  • Market adoption of Linux in "back room" solutions is healthy and growing.
  • Market adoption of Linux in "front room" solutions is measured, due to limitations in current ISV offerings and challenges for ISVs to support multiple Linux distributions.
  • Opportunities exist for standardization across distributions, e.g., tools, packages, etc., to support the ISV community.
The information from this study has been incorporated into the prioritization of requirements for the OSDL Data Center Linux initiative, especially within the database and data warehouse tier. The OSDL intends to expand upon this informal study in the future to continue raising the visibility of the needs of this and other critical data center workloads.
About Lynn de la Torre
Lynn de la Torre is a member of OSDL and coordinates the activities of the DCL Working Group. Lynn has thirty years of experience in the data center, and has worked in operations, system administration, database administration, and software development. Prior to joining OSDL, Lynn was a project manager for a large data warehouse implementation.
