Performance tuning, step 1: Planning a tuning engagement
Tuning a database or database applications is a time-intensive, repetitive task. Over time, the database changes in many ways and must be reevaluated. In addition, changes made to improve performance will need to be evaluated. Did the change actually improve performance? If so, how much? How much effort did it take to achieve the gain (or loss)? What is "good performance" and how can we make it better?
This series identifies a standard approach for performing or evaluating database tuning efforts. It is applicable to data warehouses, custom databases, custom database applications or Oracle Applications Database. The criteria used in the evaluations will depend on individual circumstances. For example, a data warehouse has different performance requirements than a customer order application. These requirements will affect the database architecture, which affects how performance can be optimized. In short, the goal is to identify how performance can be optimized, as measured by how operations are transacted by users.
When users start complaining that the database is too slow, the natural desire is to tune everything. But the real answer, based on customer needs, is a combination of training, tuning and business process management. This primer identifies the basic elements of performance tuning, the tools used and a basic methodology for obtaining the information used to evaluate a database. It is designed for the non-technical user i.e., the applications user, project managers or anyone who would like understand what those techies and sysadmins do, and why they talk to themselves...
It identifies the basic steps for each stage of performance tuning, explains what tools may be used and function or expected results yielded by the tool. It also provides a set of basic questions for initial performance problem assessment -- a manager's checklist to guide the non-technical manager to the business of performance tuning.
What is good performance?
There are a number of published articles on what constitutes good -- and bad -- performance. These articles are usually defined for a specific architecture. This is because the machine configuration -- size, speed and number of CPUs significantly impacts the system. Simply put, some systems have more horsepower than others. However, CPU is not the most important measure in dealing with performance issues. The most important measure is response time. Time is the metric the user understands, time is the measure the manager uses and time is the most effective measure of performance improvement. While there are many important ratios that measure whether a database is working effectively, the goal is for the database to effectively support user transactions. Tuning hit ratios and cache statistics is like patching potholes. Sometimes patching fixes the problem; sometimes repaving is better. That's why a more effective measure of performance tuning results is response time. Response time concentrates on the specific business process that users have identified as problematic.
Planning a tuning engagement is like the initial stages of any project and uses the same steps:
1. Determine what you've got.
2. Determine what they think they need.
What does the user see as the primary problem? No matter what you actually find (good, needing improvement, dearly beloved, etc.), you will frame your results and recommendations in terms of the user's perceptions. This is the future state, or the "to-be" state. It establishes an understanding of perceived priorities.
3. Determine the details of what you have.
In this case, you or your tame techie will use commands and scripts to retrieve detailed information about each area under consideration.
4. Establish goals.
If it ain't broke, don't fix it. What is it that returns the most value for the user? Remember that tuning for tuning's sake is not effective….
5. Prepare a plan.
Documenting your plan is an important management. If you are lucky, when you distribute your plan, you'll get feedback from all sides. (Recite this daily: "Feedback is my friend.") Feedback allows you to plan targeted communication as results are achieved.
6. Work the plan.
Your improvement plan will probably have multiple phases. At each phase completion, evaluate the accomplishments in comparison to both the original baseline and with accomplishments of previous phases. Update your communications, give feedback to the users regarding results.
7. Keep going until you reach your goal or reassess your goals.
As time elapses, priorities may change. For instance, a company with seasonal sales may need to have all performance enhancements completed before the sales rush or put in place some temporary "fixes" for the duration. Care must be taken when introducing temporary solutions, however.
Performance tuning, step 2: Determining the current state
In step 1 we identified the elements of a basic performance tuning plan. The next step is to determine the "current state" of the system. The current state is documented in order to form a baseline for comparison in later stages. We will also look at determining relative priorities with users and management, and how to define the details of the current state.
The current state
Determining the current state involves both users and the technical architecture of the database. This includes working with knowledgeable users, often called power users, to find out what they see as problems. Ask your power users if they have observed any specific trends, jot down what time problems occur and what they were doing. Take the time to observe the user performing their problematic tasks and record the steps. Using the same steps every time establishes a baseline for comparison. When working with the user, take care to ask questions, but not to jump to conclusions or make promises about how much you will be able to change. You are in the evaluation phase and changes may be technical, training related or both.
Check the database size and basic parameters. This can be done using the enterprise manager or a set of basic SQL scripts. Do not make changes at this time. It is important to establish a baseline for comparison first.
Check the database architecture. Identify network nodes used, the size and location of database files. Identify mount points. Identify network connectivity parameters, including packet sizes. Prepare a network diagram for future reference.
Check how database transactions are accomplished. How many users are there? What software and hardware do they use? What transactions have been identified as problematic? Later technical personnel will check the actual code used in transactions to determine effectiveness. Is the system shared (i.e., does a single server house production and development instances)? Are other types of applications housed on the system? In one corporate IT department, top-of-the-line mainframes housed database applications as well as manufacturing production tracking applications, creating contention for resources. Changes made must be evaluated in concert with ALL operations affected.
Can CPUs be added? What is the predicted effect and how does it compare with the cost to add? Scalable architectures may allow different sizes and numbers of CPUs. CPUs provided by different manufacturers may have different ratings as well. Whatever machine is run, a general rule of thumb is to know your CPU utilization target. Often this is a goal of 30 to 50 percent. What this means is that during normal, off-peak operation, measurements should indicate sufficient additional capacity for peak times, with an allowance for times of unusual load. Oracle Applications performance is not stable when CPU is over 80%.
Is the system maintained internally or externally? Is there a maintenance requirement that stipulates availability or capacity to be provided? If there are service level agreements (SLAs) in place, some improvements may be made at less (or more) cost. Additionally, the more organizations involved, the more time it may take to change performance.
It is important to document the baseline, distribute the information and maintain accurate records over time. Where more than one organization is involved, the need to establish priorities agreed upon -- or negotiated with all interested parties is particularly important.
Typical documentation (a.k.a. "deliverables") for a performance tuning project will include:
1. System or technical architecture document
2. Network architecture
3. Database architecture
4. Capacity plan
5. Change management plan
6. Test plans and procedures
7. Maintenance plan
Plan for your deliverables early in the project. (Establish a plan, use the plan to focus, fill in the plan as you go.) Evaluate existing documentation to examine whether the data is still valid. Existing documentation may provide key information regarding previous problems, agreements that have been made or projections where problems are expected to occur.
While examining the system and objectively asking questions of the users, you must consider what gives the most bang for the buck.
Base performance goals on the customer's system, established baseline:
1. Meet with stakeholders to determine their perception of what is needed.
2. Meet with primary users identified as knowledgeable personnel by the stakeholder. Observe problems. Understand the business process. List ALL customer issues.
3. Get baselines.
4. Based on preliminary information, agree on improvement goals with customer. Establish that applications performance improvements made by procedural changes, network architectural changes, database changes or SQL tuning. In many instances, retraining users, concurrent manager re-llocation, can affect major improvements.
Make sure that the customer agrees with your strategy and present the goals in a measurable context. Establish priorities for each area to be addressed. Document the baseline, performance goals. Goals should be stated clearly and concisely and with measurable parameters. For instance:
1. Sales order booking takes 45 seconds for one order line. We need it to run in five seconds for one order line.
2. Custom report: Update raw material costs takes 30 minutes to complete in off-peak hours. We need regular updates throughout the day without impacting sales order pricing.
3. Web pricing request takes 17 seconds to retrieve and display 4X4 cost/delivery matrix.
Be careful what you promise. Vendor published baselines represent aggressively tuned, optimal systems. Any promises you make to your users should be based on improving existing performance and any existing SLAs.
Evaluating the baseline will take into account information from all levels of users. A multilayer approach helps provide a 360-degree view of the system. Each type of user is asked the same questions and encouraged to give detailed examples. (When the user asserts, "Performance is rotten! I can't get my work done!" they are asked, "What are you doing when you notice it is so bad? Is it always bad, or is it sometimes bad? Does everyone have the same problem?"
The assessment process...
Is highly collaborative. It takes information from all types of users/interested parties.
Focuses on business. It identifies business processes, rather than imposing a technical solution.
Helps identify change impact. By identifying business processes affected, it assists in creation of important performance metrics.
Leads to measurable benefits. The metrics identified will be used in subsequent comparisons.
Supports future vision. The assessment and the performance tuning/analysis will often be used to support the justification for technical purchases such as more memory, additional hard drives or faster hard drives. It may also suggest changing existing configuration. On one system, a combination of disk shadowing and the selected RAID configuration created considerable degradation. What is a suitable RAID configuration for software development or manufacturing systems may not be suitable for databases.
May identify new models. As you discuss the system and obtain information, you may need to assure each set of users that you have no particular agenda and that your goal is to obtain as complete information as possible. Assure them that the results of the investigation will be available, and no solution technical or otherwise can be decided until all information is available. This is where having a planned set of deliverables is critical.
The goal for the assessment is to...
Define the problem(s) quickly. The users responses will probably be able to be combined into categories. These categories provide the general areas where you will concentrate your efforts.
Define the problem accurately. Your problems should be reported in terms of measurable behaviors. Often the performance tuning will be alleviated in stages. One set of "fixes" will improve performance, but additional improvements may be made with subsequent actions.
Identify the appropriate resources. Your problem definition will include the scope of the tuning's responsibility. For instance, while your information may reveal that hardware components are undersized, the purchase and installation may not be possible in near term. Identify the times that resources are available; certain types of testing may be required to be performed in off-hours, or some personnel may be unavailable during certain periods.
Solve the problem quickly. By structuring the problem statements, the analysis and the recommended solutions, you prepare the customer for your methods and the time it will take to analize.
You may provide test/reporting scripts for the user to evaluate functional timing. Generally, you will want to limit any testing tools you may provide to "power" users. Let them know that these scripts may impact performance, and should not be run continuously. Power users can be your best friends. They can alert you to trends that you may not otherwise see.
Performance tuning, step 3: Working the plan
The overall tuning process is a systematic review based on a finite set of business processes. There are a number of published articles on what constitutes good -- and bad -- performance. These articles are usually defined for a specific architecture. This is because the machine configuration -- size, speed and number of CPUs will significantly impact the system. Simply put, some systems have more horsepower than others. However, CPU is not the most important measure in dealing with performance issues. The most important measure is response time. It is the metric the user sees, and is the most effective measure of performance improvement. While there are many important ratios that measure whether a database is working effectively, the goal is for the database to effectively support user transactions. Tuning hit ratios and cache statistics is like patching potholes. Sometimes patching fixes the problem; sometimes repaving is better. That's why a more effective measure of performance tuning results is response time. Response time concentrates on the specific business process that users have identified as problematic. Select some basic, clearly understood problems and solutions. This is often referred to as "gathering the low-hanging fruit." At this point, you evaluate the information gathered during the evaluation of the current state.
Checking the business process first can have great impact on your tuning efforts. For instance, do customer support specialists indicate that their searches for open orders for a specific customer are slow? By looking at what their actual process is, you may find that the operator is not making good use of the workbench search fields, and could benefit from adding parameters to his or her search criteria. In this case, some additional one-one-one training can increase their satisfaction and the performance of their searches.
Next, obtaining benchmarks -- running specific searches at non-load times, non-peak, peak times gives a measure of how results vary throughout the day. If some operators have widely disparate results at the same time, you may need to look at network configuration. For example, if Joe and Mary in operator bay A have no problems, but operations in bay B take twice as long, the network configuration should be analyzed. Note: In one instance, only one operator on a floor was slow. It was determined that someone had connected that cubicle only to a different server at the local switch as a stopgap measure and never set it back. The moral of the story is that you need to look beyond the database for performance solutions.
Perform all benchmarks on a single, tuned PC or laptop client to ensure continuity. If necessary, move the client to perform benchmarks in different areas. If this is not possible, verify client configuration complies with company standards. Comparing results of disparate systems is extremely difficult.
Gather benchmarks for all processes identified as problematic. When all benchmarks have been derived, discuss which processes have priority and what performance goals are appropriate. You may wish to present phased (preliminary, acceptable and target) goals, as well. This is because the final solution may combine a series of user training, network and database or application tuning actions.
Documentation is used to identify the initial definition of the problem, initial conditions, goals agreed upon and the focus for the work to be done. Over time, additional documentation addresses what actions are recommended and the results of the actions. Finally, an assessment outlining the completed actions, results and recommendations for ongoing actions to maintain/further improve the increased performance is recommended.
As mentioned earlier, significant improvement can be made by retraining users in specific areas of application use. Additional training performed should be documented for later use by new users.
Because the solution for a specific area is often iterative, as efforts are defined and redefined, the work performed must be detailed and recorded scrupulously (and hopefully controlled in a change-management system). At each stage, results should be compared to the baseline; performance tuning is a balancing act, where action in one area may have deleterious effects on another.
Also because systems change over time, repeatable scripts will allow the user to verify the degree that changes have/have not occurred... (i.e. "This time last year we had 5,000 orders placed per day, at a average rate of 10 orders per hour for each call-in sales rep.")
The technical components of performance tuning
Technical architecture: Server and network tuning
If there is a problem with the Oracle server, such as an overloaded CPU, excessive memory swapping or a disk I/O bottleneck, no amount of tuning within the Oracle database is going to improve your performance.
Application architecture: Integration of middle tier. Apache Server, Forms/Reports Server.
When evaluating the performance of forms and reports, check both requirements and standards under which they were developed. Forms and reports created using standard defaults may benefit from reducing or relaxing the standards for performance reasons. In one example, approximately 20 Web forms were completely redesigned when it was determined extensive use of check constraints impacted performance. For many of the forms, data input was prevalidated; when the constraints were removed, overall performance improved over 100%.
(a) Instance tuning: Tuning the Oracle SGA is the next step, and all of the Oracle initialization parameters must be reviewed to ensure that the database has been properly configured. This phase of Oracle tuning is directed at looking for resources shortages in the db_block_buffers, shared_pool_size and sort_area_size. Investigate important default parameters for Oracle, such as optimizer_mode.
(b) Object tuning: This phase of tuning looks at the setting for Oracle tables and indexes. Settings such as PCTFREE, PCTUSED and FREELISTS can have a dramatic impact on Oracle performance.
SQL tuning: This is the most time-consuming tuning operation because there can be many thousands of individual SQL statements that access the Oracle database. At a high level, we identify the most common SQL statements, tune each one by carefully reviewing the execution plan for the SQL and adjust the execution plan using Oracle hints.
Networks are increasingly more important as we move to global businesses. Work with your telecom provider to evaluate link capacity and utilization. Request detailed reports. Consider resizing to reduce costs on underutilized links or upgrading those close to capacity. Evaluate network drivers. Is there a more recent, downloadable version? Vendors often update their drivers. This can make a significant improvement.
An up-to-date, accurate, detailed network diagram of both the logical and physical network locating all servers and users can be used to troubleshoot and predict problem areas. Check transaction processing routes to and from the server... if they are not the same, check the routing tables on each. What, if any, network management systems are used? Can any devices be placed in diagnostic mode during evaluations? Check that software and hardware are not operated routinely in diagnostic mode, as this will generally affect performance. Determine number of packet retries and collisions, bandwith utilizaton. Note any traffic management or priority queuing set on any device. Detail any contracts or service level agreements for the LAN and WAN. Identify any protocols installed on the clients.
Compare findings with benchmarks to establish size and extent of performance issues.
Implementing the plan
Having evaluated the current state, and performed numerous measurements, actual changes to the system must be controlled and documented. Because you will most likely continue to assess the performance measures, changes should be documented in detail. A good practice is to establish a repository of network and database assessment scripts. Similarly, record the steps of any changes. All changes should be evaluated on a test system before rolling out to production systems. Before implementing any changes, and each successive change, be sure to make a complete backup and test your ability to restore conditions to their original state. Make only one change at a time. Recovering after multiple changes is frustrating at best and may be impossible.
The sequence of changes will be based on priorities discussed with the users, management and technical personnel. You will also want to run baseline assessment scripts to determine whether the changes are effective. Occasionally, a change may not have the intended effect. This is what makes performance tuning an iterative task.