Tuesday, October 6, 2009

Automatic Statistics Gathering in Oracle 10g

CPU and I/O Performance Statistics
In Oracle 10G, the optimizer takes CPU and I/O costs into consideration during query optimization. If the optimizer understands what the performance characteristics are for I/O and CPU, it will be more effective at query optimization. Current system statistics allow the optimizer to make more intelligent decisions during the optimization process, which ultimately results in higher quality access paths being generated.
Unlike object statistics, administrators must continue to gather system statistics manually in Oracle 10G. This is done by executing the DBMS_STATS.GATHER_SYSTEM_STATS procedure. The key to gathering good system statistics is to run the procedure when a normal workload is being executed. Running it when the workload is light, or too heavy, can result in poor access paths being generated.
System Statistics
System statistics describe the system's hardware characteristics, such as I/O and CPU performance and utilization, to the query optimizer. When choosing an execution plan, the optimizer estimates the I/O and CPU resources required for each query. System statistics enable the query optimizer to more accurately estimate I/O and CPU costs, enabling the query optimizer to choose a better execution plan.
When Oracle gathers system statistics, it analyzes system activity in a specified period of time. The statistics are collected using the DBMS_STATS.GATHER_SYSTEM_STATS procedure. Oracle Corporation highly recommends that you gather system statistics.

Parameter Name    Description    Initialization    Options for Gathering or Setting Statistics      
cpuspeed    CPU speed is the average number of CPU cycles per second.    At system startup    Set gathering_mode = NOWORKLOAD, INTERVAL, or START|STOP, or set statistics manually.      
ioseektim    I/O seek time equals seek time + latency time + OS overhead time.    At system startup    Set gathering_mode = NOWORKLOAD or set statistics manually.      
iotfrspeed    I/O transfer speed is the rate at which an Oracle database can read data in the single read request.    At system startup    Set gathering_mode = NOWORKLOAD or set statistics manually.      
maxthr    Maximum I/O throughput is the maximum throughput that the I/O subsystem can deliver.    None    Set gathering_mode = NOWORKLOAD, INTERVAL, or START|STOP, or set statistics manually.      
slavethr    Slave I/O throughput is the average parallel slave I/O throughput.    None    Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.      
sreadtim    Single block read time is the average time to read a single block randomly.    None    Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.      
mreadtim    Multiblock read is the average time to read a multiblock sequentially.    None    Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.      
mbrc    Multiblock count is the average multiblock read count sequentially.    None    Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.   
Gathering Object Statistics
Oracle provides the DBMS_STATS package to allow administrators to gather, modify, view, export, import, and delete statistics. The DBMS_STATS package gathers statistics on columns, tables, indexes and partitions and stores the results in Oracle's data dictionary. The optimizer reads the object statistics stored in the data dictionary during query optimization.
After DBMS_STATS is executed, current statements in the shared pool that access the newly analyzed tables are invalidated. This forces the statement to be re-parsed which will allow the optimizer to generate access paths based on the new statistics.
There are many different options available in DBMS_STATS. Administrators Or Developers are able to select which objects to analyze, how much of each object to analyze (large tables may take a long time to analyze), select the number of parallel processes that will perform the analyze, etc.
In releases prior to Oracle10G, we were required to schedule DBMS_STATS jobs on a regular basis to ensure that valid statistics were available to the optimizer. In releases prior to Oracle9i, it was necessary to "guess" how much of the data changed to determine if statistics collection was necessary.
In Oracle9i, the GATHER AUTO option of the DBMS_STATS procedure could be used to help determine if statistics generation was required.
If more than 10% of the rows changed in the table since the last analyze was performed, the DBMS_STATS procedure (with the GATHER_AUTO option activated) analyzed the table.
By default, Oracle10G automates these tasks by evaluating the statistics for all of the tables in the database and running analyze when required. Oracle10G's default maintenance window is nightly from 10 PM to 6 AM and all day on weekends. During these time periods, statistics are automatically collected using the GATHER_STATS_JOB procedure. The maintenance window time-periods can be adjusted to tailor them to each individual application's business processing requirements.
In Oracle Database 10g, Automatic Optimizer Statistics Collection reduces the likelihood of poorly performing SQL statements due to stale or invalid statistics and enhances SQL execution performance by providing optimal input to the query optimizer.
To ensure that 10G is automatically gathering statistics for data objects, we'll need to verify that:
§    The statistics job is running by executing the following SQL statement:

SELECT * FROM DBA_SCHEDULER_JOBS WHERE JOB_NAME = 'GATHER_STATS_JOB';
§    The modification monitoring feature that identifies stale statistics is enabled. This feature is enabled when the STATISTICS_LEVEL parameter is set to TYPICAL (default) or ALL.
Gathering Statistics Manually in Oracle10G
If we have a warehouse system that contains an extremely volatile table. Hundreds of thousands of rows are added and removed around the clock. The table is loaded and literally hundreds of SQL statements are run to allow our marketing personnel to make critical business decisions.
The level of activity is based on the particular business process taking place. At one particular time, the table can contain hundreds of thousands of rows and at other times it can contain dozens. If we run DBMS_STATS at the same time each day, in most cases we would think we would get a predictable set of statistics generated. Not so in this table's case, sometimes we get hundreds of rows and sometimes we get hundreds of thousands.
If we are unlucky and generate statistics on the table when it has hundreds of rows, access paths suffer when the table grows to hundreds of thousands. To prevent poor access paths from being generated (a big problem in a data warehouse), we run the DBMS_STATS utility immediately after the table is loaded. The optimizer has current statistics to use as input and generates optimal access paths for our critical business queries.

No comments:

Post a Comment