Best Practices for Managing Your Linux/Unix Performance and Availability
Key performance and availability indicators

Over the past few years, many IT organizations have begun to adopt internal service level agreements (SLAs) designed to ensure the performance and accountability of IT systems that support critical business functions. If you don't already have an internal SLA for your Linux/Unix server performance and availability, chances are you will soon. The following best practices for managing your Linux/Unix system performance provide guidelines that help you set expectations in your organization and establish the metrics on which your performance will be judged.

The reasons behind the trend toward SLAs are not difficult to understand. In a highly competitive global environment, IT operations must support critical applications that automate key business processes. That translates into service level agreement metrics that are based on the availability of applications and their response times for completing crucial tasks. End users, whether they're external or internal customers, judge system performance in terms of application response time.

It's IT's responsibility to optimize the operation of servers and applications to meet acceptable response times, avoid costly downtime due to system failures or bottlenecks, and contain costs through better utilization or performance tuning of existing hardware resources.

Beyond General Health Indicators: A Best Practices Approach

Traditional measures of system performance and availability have usually focused on monitoring the general health indicators of an individual server. Typical server health parameters include monitoring processor, memory, and disk space usage. Yet, even when conventional health parameters appear to be within normal ranges, your business applications could experience problems.

For example, if a developer decides to test a new piece of code on a production server, that code could lock out a portion of a mission-critical database. When an authorized user logs on to an application and that part of the database cannot be accessed because it's locked, the application fails. General health indicators do not offer any way to diagnose or properly assess this kind of a situation.

While common tools for measuring general server health, such as top, are readily available, they do not provide sufficient insight into server performance and availability to satisfy the metrics of most service-level agreements. The following best practices describe the key performance and availability indicators you will need to properly monitor and deliver the service levels expected from today's mission-critical Linux/Unix systems. Keep in mind that automated, commercially available tools can help you implement these best practices.

Best Practice #1:

Workload Monitoring
It's essential to monitor whether specific application processes are getting access to important system resources such as CPU, memory, and disk I/O. Beyond general server health indicators, workload monitoring provides specific analysis of individual named processes. It's particularly important to monitor disk I/O for applications, such as databases, that generate significant disk activity, since that activity can create "hot spots" that impair an application's processes. You should look for a tool with granular capabilities, including monitoring the number of logical disk transfers, logical disk reads, and logical disk writes per second.

In addition, you should be able to monitor total CPU usage for each named process, the total for all processes, as well as queue length (load average) to determine the risk of CPU overloads. The same goes for individual memory use for each specified process.
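
As a rough illustration of per-process workload monitoring, the Python sketch below totals CPU and memory share for named processes from the text output of `ps`, and reads the load average. It is a minimal sketch rather than a replacement for a monitoring product. The column list passed to `ps` (`pid,pcpu,pmem,comm`) and the watched process names are assumptions that you would adapt to your own systems.

```python
import os
import subprocess
from collections import defaultdict


def summarize_ps(ps_output, watched):
    """Total %CPU and %MEM for each watched process name.

    Expects text in the layout of `ps -eo pid,pcpu,pmem,comm`
    (one header row, then one row per process).
    """
    totals = defaultdict(lambda: {"cpu": 0.0, "mem": 0.0, "count": 0})
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed rows
        _pid, cpu, mem, name = parts
        name = name.strip()
        if name in watched:
            totals[name]["cpu"] += float(cpu)
            totals[name]["mem"] += float(mem)
            totals[name]["count"] += 1
    return dict(totals)


def snapshot(watched):
    """Run ps once and return (per-process totals, load averages)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,comm"],
        capture_output=True, text=True, check=True,
    )
    # os.getloadavg() returns the 1, 5 and 15 minute load averages on Unix.
    return summarize_ps(result.stdout, watched), os.getloadavg()
```

Calling `snapshot({"oracle"})` on a schedule and storing the results gives you the per-process history that the later practices build on.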

Best Practice #2:

Performance Management
Performance management is critical to understanding what your end user is experiencing. It includes the ability to identify bottlenecks and their causes so that corrective action can be taken before system performance is affected. Bottlenecks can occur when portions of your system do not run fast enough to keep up with demands from application processes. Your automated tools must be able to calculate the response times of your back-end database, for example, and provide a measurement against which you can benchmark optimal performance (see Figure 1). Identifying and correcting root causes can in some cases avoid the need for hardware upgrades. In addition, you should be able to monitor your network's ability to support critical application traffic and determine if the network is causing a performance problem.
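
To make the idea of benchmarking response times concrete, here is a minimal Python sketch that times an operation several times and compares the median against a baseline. The function names, the baseline value, and the 1.5x tolerance are hypothetical choices for illustration. In practice, `operation` would be something like a representative database query.

```python
import statistics
import time


def measure(operation, runs=5):
    """Return the wall-clock duration in seconds of each call to operation()."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def is_degraded(durations, baseline_seconds, tolerance=1.5):
    """True when the median duration exceeds baseline_seconds * tolerance."""
    return statistics.median(durations) > baseline_seconds * tolerance
```

Using the median rather than the mean keeps a single slow outlier from triggering an alert by itself.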

Best Practice #3:

Availability Management
Critical applications like Oracle do not work if their associated processes or daemons are unavailable. You need to know immediately if a process/daemon fails. In the case of Network File System, this means monitoring key processes such as mountd and nfsd. Manually checking these processes/daemons is not practical. You should look for automated tools that can periodically "ping" servers to ensure proper availability, as well as monitor the overall availability of your Linux/Unix servers.
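
A simple availability check can be sketched in a few lines of Python. The example below reports which required daemons are missing from a list of running process names, and tests whether a TCP port on a server accepts connections. The function names are hypothetical, and the exact daemon names on your system may differ from the ones used here (for example, mountd often runs as rpc.mountd on Linux).

```python
import socket


def missing_daemons(running_names, required):
    """Return the required daemon names that are not currently running."""
    return sorted(set(required) - set(running_names))


def port_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `missing_daemons(names, ["mountd", "nfsd"])` returns an empty list when both NFS daemons appear in `names`, and a non-empty list is the signal to raise an alert.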

Best Practice #4:

Directory and File Management
Directory and file management is crucial for Linux/Unix servers since failure to recognize that a disk or directory is approaching its capacity can be catastrophic. This is especially true for environments with a large number of servers. Since most application log files are written to the /var directory, a panic condition can result if this directory becomes full, causing a core dump. If you have several users working on a highly integrated CAD application, for example, and they all try to save or store their work to the same directory and it's full, some of these users could lose some or all of their work that day.

Thus you must be able to track the amount of disk space utilized by files and directories and be alerted when they approach their capacity. That means being able to automatically check the disk space used by specified directories and the files under those directories, generate an alert if the situation exceeds a threshold, and have the option to manually or automatically take corrective action to move files to a less-full disk.
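
The threshold check described above can be sketched in Python with the standard library. This is a minimal example under stated assumptions: the 90% threshold and the function names are hypothetical. Note that `shutil.disk_usage` reports on the whole filesystem that contains a path, while `directory_size` adds up the files under one directory.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def over_threshold(paths, threshold_pct=90.0, usage=shutil.disk_usage):
    """Return (path, percent_used) for filesystems at or above the threshold."""
    alerts = []
    for path in paths:
        stats = usage(path)
        pct = 100.0 * stats.used / stats.total if stats.total else 0.0
        if pct >= threshold_pct:
            alerts.append((path, round(pct, 1)))
    return alerts
```

A scheduled job could call `over_threshold(["/var", "/home"])` and send any results to the administrator. Automatic corrective action, such as moving files, is deliberately left out of this sketch.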

Best Practice #5:

User Activity Monitoring
For security as well as performance and availability reasons, it's essential to know which users have logged on to your system at certain times. It's especially important to ensure that no user is allowed to log in directly as the root user, since on Linux/Unix systems this gives the user nearly unlimited power to perform any action. You need to be sure that all users log in with their own user name and then issue a further command, such as su or sudo, to gain root privileges. Automated performance tools can record and provide alerts on all requests for root access and provide for escalation in the event of multiple failed root user log-on attempts.

Without these added controls, users may not always realize they have logged in as a root user and could accidentally delete files.
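
The following Python sketch shows one way to flag direct root logins and repeated failed root log-on attempts from authentication log lines. It assumes OpenSSH-style messages ("Failed password for root from ..." and "Accepted ... for root from ..."). The log location and exact message format vary by system, and the three-failure escalation threshold is a hypothetical value.

```python
import re
from collections import Counter

# Assumed OpenSSH-style message formats; adjust for your own log source.
FAILED_ROOT = re.compile(r"Failed password for root from (\S+)")
ACCEPTED_ROOT = re.compile(r"Accepted \S+ for root from (\S+)")


def audit_root_logins(lines, max_failures=3):
    """Return (sources of direct root logins, sources needing escalation)."""
    failures = Counter()
    direct_logins = []
    for line in lines:
        match = FAILED_ROOT.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = ACCEPTED_ROOT.search(line)
        if match:
            direct_logins.append(match.group(1))
    escalate = sorted(src for src, n in failures.items() if n >= max_failures)
    return direct_logins, escalate
```

Any entry in the first list means someone logged in directly as root. Any entry in the second list is a source address with enough failed root attempts to warrant escalation.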

Best Practice #6:

Automated Log File Monitoring
Because Linux/Unix servers generate enormous amounts of log file data, it's impractical to monitor log files manually. You need an automated tool that can track certain messages or errors in log files, including the syslog file. You should be able to search for specific messages or strings in order to identify and flag them. For example, if a user is trying to access a server remotely, your tool should generate an alert that is sent to you as the administrator. The same applies to application log files, whereby the tool can alert you to specific application log file errors.
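
As a minimal sketch of automated log scanning, the Python below reads only the lines added to a file since the last check and flags those that match labelled patterns. The function names, labels, and patterns are hypothetical examples. The sketch also ignores log rotation, which a real tool would need to handle.

```python
import re


def read_new_lines(path, offset=0):
    """Return (lines added after byte offset, new offset) for a log file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    text = data.decode("utf-8", errors="replace")
    return text.splitlines(), offset + len(data)


def scan_log(lines, patterns):
    """Return (line_number, label, line) for the first pattern each line matches."""
    compiled = [(label, re.compile(rx)) for label, rx in patterns.items()]
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, rx in compiled:
            if rx.search(line):
                hits.append((number, label, line.rstrip("\n")))
                break  # one alert per line is enough
    return hits
```

Keeping the returned offset between runs means each line is examined once, which matters when the files are large.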

Best Practice #7:

Process Monitoring
It's important for you to recognize which processes are currently running on your Linux/Unix servers at any time, including those processes that normally operate on your systems. You should know the total number of processes running at any given time, which processes are running or terminated, and whether the total exceeds a preset threshold representing "normal" server activity. If a threshold is exceeded - say X number of processes at once - your monitoring tools should generate an alert to make you aware of the server status.

This information can be critical to properly diagnosing system problems. Systems running much more slowly at certain periods during the day may indicate a rogue application process on a production server or a problem with managing the activity of cron jobs, for example. In addition, proper process monitoring can identify "zombie processes": processes that have terminated but whose parent process has not collected their exit status, so they continue to occupy entries in the process table.
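
Process counts and zombie detection can be sketched from `ps` output as well. The Python example below assumes text in the layout of `ps -eo pid,stat,comm`, where a state code beginning with "Z" marks a zombie. The function name and the process-count threshold are hypothetical.

```python
def process_report(ps_output, max_processes):
    """Summarize process count and zombies from `ps -eo pid,stat,comm` text."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append(parts)
    zombies = [
        (int(pid), name.strip())
        for pid, stat, name in rows
        if stat.startswith("Z")  # "Z" is the zombie (defunct) state code
    ]
    return {
        "total": len(rows),
        "over_threshold": len(rows) > max_processes,
        "zombies": zombies,
    }
```

Comparing `total` against a threshold taken from a known-good period gives a simple definition of "normal" server activity.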

Best Practice #8:

Application Resource Management
Because Linux/Unix servers are typically dedicated to a specific application, such as an Oracle database or a Network File System, you as the administrator must be sure the intended application is getting the resources necessary to run. The only way to do this properly is to track the specific demands that critical applications are placing on their servers in terms of memory and processing power over time. Knowing the consumption history of server CPU and memory can assist you in assessing trends and anticipating capacity changes in your systems.

For example, if you notice that the CPU consumption for an Oracle database application has grown from 55% in the past month to 65% in the last week to 75% within the last 24 hours, you can justifiably predict that CPU utilization will quickly be maximized. In recognizing this situation, you may see that the business has been adding users of the application at a brisk pace and that the application may need to be moved to a larger server with more CPU capacity to accommodate the increased demand.
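
The kind of trend estimate described here can be sketched with a straight-line fit. The Python below is illustrative only: it treats the 55%, 65%, and 75% readings as three evenly spaced samples, which is a simplifying assumption, and the function names are hypothetical. Real growth is rarely perfectly linear, so read the result as a rough planning figure.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(limit, xs, ys):
    """Periods after the last sample until the fitted line reaches limit.

    Returns None when usage is flat or falling.
    """
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - xs[-1]
```

With samples at periods 0, 1, and 2, `periods_until(100, [0, 1, 2], [55, 65, 75])` estimates about 2.5 more periods before CPU utilization reaches 100%.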

Conclusion

Each of the best practices described here represents a more sophisticated level of accountability for system performance and availability than the conventional general health measures that most administrators are familiar with. However, as user demands on IT departments continue to increase and accelerate, these best practices are becoming the rule rather than the exception. Fortunately, commercially available performance and availability tools can help you automate these best practices and achieve the service levels required to maintain and enhance your Linux/Unix systems.

Even in the best circumstances, more effective and efficient administration of Linux/Unix systems is a challenging assignment. Hopefully, these best practices will help you gain a better understanding of how you can meet these challenges.

About David Giannini
David Giannini is the director of cross-platform systems management products at NetIQ Corp.
