Tuesday, October 6, 2009

Tuning Joins

Abstract

A SQL statement involving two or more tables requires that Oracle join data from multiple sources in order to achieve the desired result. The Oracle Server has several built in methods for performing joins, and each method has its own performance and resource usage qualities. While an application developer or DBA doesn’t need to explicitly tell Oracle which join method to use or the order in which different tables should be joined, a solid understanding of how Oracle processes joins allows you to design and build systems for optimal performance. In this presentation we’ll examine the different methods Oracle uses for joining data from multiple sources, and we’ll see how this understanding can be used to design better database schemas, tune SQL statements, and tune databases as a whole for better join performance. Since most SQL statements involve joins, tuning joins leads to improved overall system performance.

How Oracle Processes SQL Statements Involving Joins

Oracle’s optimizer must choose an execution plan for each SQL statement submitted. The plan will indicate how data will be accessed—which indexes will be used and so forth. If the SQL statement involves a join or a sub query, then the optimizer must also choose the order in which tables will be joined together, and the method to be used for each join operation.

Join Order

A row source is a logical set of rows from a partially complete query. The row source may be rows in an actual table or rows in a temporary segment stored on disk or in memory. Oracle always joins two row sources at a time. Therefore, if a SQL statement joins three tables, Oracle will actually join two tables together and then join the results with the third table.

The order in which row sources are joined can have a significant impact on performance. The optimizer makes a list of all physically possible join orders and then chooses what it believes will be the best option. The cost-based optimizer makes this decision by estimating the cost of each option and choosing the one with the lowest cost. The rule-based optimizer, meanwhile, scores each option based on a fixed set of rules and chooses the option with the best score.

Outer joins limit the number of possible join orders because a table being outer joined must appear in the join order after the table it joins to.

Let’s use the following query as an example:

SELECT E1.name “Employee Name”, D.deptname “Department”,
E2.name “Manager Name”
FROM emp E1, dept D, emp E2
WHERE E1.empno = :empno
AND D.deptno = E1.deptno
AND E2.empno (+) = E1.mgr_empno;

Oracle could execute this SQL statement by joining emp table E1 to dept, and then joining the results to emp table E2. Alternatively, the order could be E1 - E2 - D. Note that the order D - E1 - E2 is also possible, but probably not too efficient. Also note that the order D - E2 - E1 is not physically possible because E2 is being outer joined to E1 and therefore must appear after E1 in the join order. And since D and E2 share no criteria in the WHERE clause, such a join order wouldn’t seem to make much sense anyway.

Join Methods

Since the days of Oracle V6, the optimizer has used three different ways to join row sources together. These are the nested loops join, the sort-merge join, and the cluster join. With Oracle 7.3 the hash join was introduced, making for a total of four join methods. Each has a unique set of features and limitations.

Nested Loops Joins

Suppose somebody gave you a telephone book and a list of 20 names, and asked you to write down each person’s name and telephone number on a fresh sheet of paper. You would probably go down the list of names, looking each up in the telephone book one at a time. This would be pretty easy for you, because the telephone book is alphabetized by name. Moreover, somebody looking over your shoulder could begin calling the first few numbers you write down while you are still looking up the rest. This scene describes a nested loops join.

In a nested loops join, Oracle reads the first row from the first row source and then checks the second row source for matches. All matches are then placed in the result set and Oracle goes on to the next row from the first row source. This continues until all rows in the first row source have been processed. The first row source is often called the outer table or driving table, while the second row source is called the inner table.

Nested loops joins are ideal when the driving row source is small and the joined columns of the inner row source are uniquely indexed or have a highly selective non-unique index. Nested loops joins have an advantage over other join methods in that they can quickly retrieve the first few rows of the result set without having to wait for the entire result set to be determined. This is ideal for query screens where an end-user can read the first few records retrieved while the rest are being fetched. Nested loops joins are also flexible in that any two row sources can always be joined by nested loops—regardless of join condition and schema definition.

However, nested loops joins can be very inefficient if the inner row source does not have an index on the joined columns or if the index is not highly selective. Also, if the driving row source is quite large, other join methods may be more efficient.

Sort-Merge Joins

Suppose two salespeople attend a conference and each collects over 100 business cards from potential new customers. Each salesperson now has a pile of cards in random order, and they want to see how many cards are duplicated in both piles. So each salesperson alphabetizes their pile, and then they call off names one at a time. Since both piles of cards have been sorted, it becomes much easier to find the names that appear in both piles. This scene describes a sort-merge join.

In a sort-merge join, Oracle sorts the first row source by its join columns, sorts the second row source by its join columns, and then “merges” the sorted row sources together. As matches are found, they are put into the result set.

Sort-merge joins can be effective when lack of data selectivity or useful indexes render a nested loops join inefficient, or when both of the row sources are quite large. However, sort-merge joins can only be used for equijoins (WHERE D.deptno = E1.deptno, as opposed to WHERE T.temperature >= D.daily_avg_temperature). Also, sort-merge joins require temporary segments for sorting. This can lead to extra memory utilization and extra disk I/O in the temporary tablespace.

Cluster Joins

Imagine a filing cabinet full of personnel records. For each employee there is a separate folder containing a basic information sheet, a list of vacation days used, and other papers pertaining to the employee. If the boss wants to see a report that lists each employee’s name, social security number, and number of vacation days used this year, then you could work through the filing cabinet one file at a time. When you pull out one employee’s file, you immediately have access to the basic information sheet (which has the employee’s social security number on it) and the list of vacation days used. If you imagine that the basic information is stored in one database table and the vacation days used in another, then this scene could describe a cluster join.

A cluster join is really just a special case of the nested loops join. If the two row sources being joined are actually tables that are part of a cluster and if the join is an equijoin between the cluster keys of the two tables, then Oracle can use a cluster join. In this case, Oracle reads each row from the first row source and finds all matches in the second row source by using the cluster index. In the personnel cabinet example, the filing cabinet is the cluster and the individual employee folders are the cluster key.

Cluster joins are extremely efficient, since the joining rows in the two row sources will actually be located in the same physical data block. However, clusters carry certain caveats of their own, and you can’t have a cluster join without a cluster. Therefore, cluster joins are not very commonly used. In a later section we’ll look at the pros and cons of clusters.

Hash Joins

In a hash join, Oracle reads all of the join column values from the second row source, builds a hash table, and then probes the hash table for each of the join column values from the first row source. This is like a nested loops join, except that first Oracle builds a hash table to facilitate the operation.

Hash joins can be effective when the lack of a useful index renders nested loops joins inefficient. The hash join might be faster than a sort-merge join in this case because only one row source needs to be sorted, and could possibly be faster than a nested loops join because probing a hash table in memory can be faster than traversing a B-tree index. As with sort-merge joins and cluster joins, though, hash joins only work on equijoins. Also, as with sort-merge joins, hash joins use memory resources and can drive up I/O in the temporary tablespace. Finally, hash joins are only available in Oracle 7.3 and later, and then only when cost-based optimization is used.

Effective Schema Design

The database schema plays an important role in tuning joins. Certain features in the schema, or the absence of these features, will limit the options available to the optimizer for joining tables efficiently. Also, poorly placed features in the schema can fool the optimizer into making a poor decision.

Indexing Keys

All primary and unique keys should be identified and appropriate constraints declared. This will cause the creation of unique indexes on the primary and unique key columns. Indexes should usually be created explicitly on foreign key columns to facilitate joins and to eliminate locking problems when Oracle enforces referential integrity constraints.

Consider an orders table with primary key order_id, and an order_lines table with primary key order_line_id and foreign key order_id referencing the orders table.

SELECT O.order_number, L.line_number, L.item_number, L.quantity
FROM orders O, order_lines L
WHERE O.order_id = :order_id
AND L.order_id = O.order_id
ORDER BY O.order_number, L.line_number

If the database will contain many orders, then the order_id column in the order_lines table is very selective and should be indexed. This index, plus the unique index on the order_id column of the orders table will allow Oracle to execute this query extremely efficiently using a nested loops join.

First Oracle will use the unique index on the orders table to create a row source with one row from the orders table. Then Oracle will join this row source with all rows from the order_lines table. The index on the order_id column of the order_lines table allows Oracle to efficiently find all order lines for the specified order. Without this index, Oracle would probably perform a sort-merge join or hash join, requiring a scan of all rows in the order_lines table.

Its not a hard and fast rule, though, that all foreign key columns should be indexed. Indexes should not be placed on columns that are not selective. Such indexes could trick the optimizer into choosing a bad execution plan. Suppose that an order tracking application has a table called order_placement_methods that lists how orders can be placed and an orders table that lists active orders. The orders table has an order_placement_method_id referencing the order_placement_methods table to indicate how a specific order was placed. If orders can be placed by telephone or via the Internet, and it is expected that roughly half the orders will be placed each way, then the foreign key in the orders table should definitely not be indexed. This index would not be selective, always retrieving about half the rows in the orders table.

The Star Formation

Beginning in Oracle 7.3, the optimizer recognizes a schema design known as a star formation. This design is common in data warehouses, and the optimizer can handle queries against these types of schemas using a special nested loops join that is an exception to the “only two row sources can be joined at a time” rule.

In a star formation, several relatively small tables called dimension tables or lookup tables have simple primary keys. A large table called the fact table has a primary key that is a concatenation of the primary keys of the dimension tables. A schema with small tables for time, region, and product and a large table for sales figures would follow a star formation if each record in the sales table indicated sales of one product in one region during one time period.

The following query would be executed very efficiently if the primary key on the sales table consists of the concatenation of the primary keys of the other three tables:

SELECT S.*
FROM time T, regions R, products P, sales S
WHERE T.end_date BETWEEN SYSDATE - 1 AND SYSDATE
AND R.name = ‘West Coast’
AND P.bar_code = ‘184901724’
AND S.time_id = T.time_id
AND S.region_id = R.region_id
AND S.product_id = P.product_id

Clusters

Clusters can improve join performance when two or more tables are usually accessed together, as in a master-detail relationship. Queries that join the clustered tables together run faster because each master record is stored in the same data block with its related detail records. Even a single table that is frequently self-joined or queried for all records with the same value of a key can benefit from being stored in a cluster.

However, updates are slower when a table is clustered, as are full table scans. Also, storage tends not to be as efficient in a cluster. Therefore, much thought should be put in before clustering tables. In practice, clusters are not frequently used.

Distributed Queries

Because each node of a distributed database is an autonomous unit, distributed queries can be a serious bottleneck. Whereas Oracle can efficiently join two row sources located on the same node, joins between a local row source and a remote one can be costly. In a cross-node join, the local node must connect to the remote node and then formulate standard SQL statements to extract the desired data from the remote node. The optimizer on the local node does not have access to the remote node’s optimizer and therefore must make educated guesses as to the optimal way to execute the distributed query.

In many cases, distributed queries can be tuned to work efficiently. This will be discussed in the section below on tuning SQL. For distributed queries that just can’t be made efficient, or where maximum performance is critical, replication can be used. Simple snapshots or Oracle’s Advanced Replication Option can be used to copy data from a remote node so that queries do not need to join row sources across nodes.

Choosing Data Types

Choose data types for all primary and unique keys carefully, and remain consistent throughout the schema. For example, if the primary key on an invoice table is a column called invoice_number and you select a data type of NUMBER(9) for this column, then always use this same data type for invoice numbers throughout the entire schema. If you were to use a data type of VARCHAR2(10) for the invoice_number column of the invoice_items table, then Oracle would need to perform a data type conversion when joining the invoices and invoice_items tables. This conversion costs CPU cycles and can sometimes defeat indexes, having a disastrous effect on performance.

Primary and unique keys for large tables or tables that are heavily joined should be kept small and simple. I’ve worked on multiple projects where a critical table had as its primary key a column with a data type such as VARCHAR2(100). The primary key was actually a series of strings, concatenated together in order to make a unique “key”. Such a primary key is expensive because the large key must be carried in every referencing table, and also because Oracle must compare long strings when performing a join.

A NUMBER(9) or VARCHAR2(5) column requires far fewer bytes and allows faster comparisons during a join. If the true unique key for a table in your application involves many columns or long strings, then consider declaring a unique key on these columns and adding a surrogate primary key column with data type NUMBER(9) that is populated from a sequence. All referencing tables could then carry the NUMBER(9) value instead of the more complex true unique key.

Tuning SQL

The more a database schema is designed with an eye toward optimizing joins, the less individual SQL statements need to be tuned. But still, SQL is such a flexible language that there are usually many ways to write a query. Being able to identify the efficient ways is a good skill to have.

The Usual Suspects

All of the usual fundamentals of tuning SQL still apply when tuning joins, including such techniques as using selective indexes where possible, avoiding unintentional index defeat caused by NVL() and other functions, and avoiding excessive sorting action by overworking the DISTINCT and GROUP BY operators. Since this presentation already has so much ground to cover, these basic tuning concepts will not be discussed here.

Early Elimination of Candidate Rows

Suppose you have a list of 1000 residents of your town along with each resident’s street address, and you are asked to prepare an alphabetized list of residents who have propane delivered to their home.

You could first alphabetize the list of 1000 names and then look up each street address in the list of propane delivery locations, but instead you probably would look up each street address first and then alphabetize second. Why? Either way you will need to perform 1000 street address lookups. However, these lookups will eliminate many names from the list (since not all residents have propane delivered to their home) and thus your alphabetizing job will be easier.

You can apply the same concept when writing SQL that joins tables together. The Oracle optimizer is pretty smart about choosing the most efficient order in which to do things, but how a query is written can constrain the options available to the optimizer.

The query below leaves the optimizer no choice but to read all of Acme’s invoice lines, when in fact only the unpaid invoices are of interest:

SELECT V.vendor_num, I.invoice_num, SUM (L.amount)
FROM vendors V, invoices I, invoice_lines L
WHERE V.vendor_name = ‘Acme’
AND L.vendor_num = V.vendor_num
AND I.vendor_num = L.vendor_num
AND I.invoice_num = L.invoice_num
AND I.paid = ‘N’
GROUP BY V.vendor_num, I.invoice_num
ORDER BY I.invoice_num

This query could be rewritten as follows:

SELECT V.vendor_num, I.invoice_num, SUM (L.amount)
FROM vendors V, invoices I, invoice_lines L
WHERE V.vendor_name = ‘Acme’
AND I.vendor_num = V.vendor_num
AND I.paid = ‘N’
AND L.vendor_num = I.vendor_num
AND L.invoice_num = I.invoice_num
GROUP BY V.vendor_num, I.invoice_num
ORDER BY I.invoice_num

In the rewritten query, the optimizer eliminates all of the paid invoices before joining to the invoice_lines table. If most of the invoices in the database have already been paid, then the rewritten query will be significantly faster. (The schema design in this example is dubious, and is used for illustrative purposes only.)

Forcing a Specific Join Method

When choosing an execution plan for a query involving joins, the optimizer considers all possible join methods and join orders. The optimizer does its best to evaluate the merits of each option and to choose the optimal execution plan, but sometimes there is just no substitute for human intelligence.

In these situations the USE_NL, USE_MERGE, and USE_HASH hints can be used to request a specific join method, and the ORDERED hint can be used to request a specific join order. The optimizer will do its best to observe the wishes of these hints, but if you ask for something impossible (such as a sort-merge join on an anti-join) the hint will be ignored. Note that there is no hint to request a cluster join.

When tuning SQL that uses joins, you may wish to run benchmark comparisons between different join methods and orders if it’s not immediately obvious which would be best. For example, if a report joins two tables that form a master-detail relationship and the proper primary key and foreign key indexes are in place, the optimizer will probably choose to use a nested loops join. But if you know that this particular report joins all of the master records to all of the detail records, you might think it faster to use a sort-merge join or hash join instead. Run a benchmark and see.

Figure 1 shows an example query and its TKPROF output. Figure 2 shows the same query with a USE_MERGE hint, and Figure 3 shows it with a USE_HASH hint. You can see that in this situation the hash join cut CPU time by almost 40 percent and logical I/Os by about 98 percent.

SELECT A.BUSINESS_UNIT,A.PO_NUMBER,A.VENDOR_TYPE,B.LINE_NUMBER,
B.LINE_AMOUNT,B.LINE_STATUS,B.DESCRIPTION
FROM
PURCHASE_ORDERS A,PURCHASE_ORDER_LINES B WHERE B.BUSINESS_UNIT =
A.BUSINESS_UNIT AND B.PO_NUMBER = A.PO_NUMBER ORDER BY
A.BUSINESS_UNIT,A.PO_NUMBER,B.LINE_NUMBER


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.04 0.12 0 0 1 0
Fetch 73370 23.47 23.55 2071 298667 2089 73369
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 73372 23.52 23.68 2071 298667 2090 73369

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 455 (RSCHRAG)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
73369 SORT (ORDER BY)
73369 NESTED LOOPS
73726 TABLE ACCESS GOAL: ANALYZED (FULL) OF
'PURCHASE_ORDER_LINES'
73369 TABLE ACCESS GOAL: ANALYZED (BY ROWID) OF
'PURCHASE_ORDERS'
73726 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF
'PURCHASE_ORDERS_PK' (UNIQUE)

Figure 1: Benchmark of a query using a nested loops join.

SELECT /*+ USE_MERGE (B) */A.BUSINESS_UNIT,A.PO_NUMBER,
A.VENDOR_TYPE, B.LINE_NUMBER,B.LINE_AMOUNT,B.LINE_STATUS,
B.DESCRIPTION
FROM
PURCHASE_ORDERS A,PURCHASE_ORDER_LINES B WHERE B.BUSINESS_UNIT =
A.BUSINESS_UNIT AND B.PO_NUMBER = A.PO_NUMBER ORDER BY
A.BUSINESS_UNIT, A.PO_NUMBER,B.LINE_NUMBER


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.02 0.15 0 0 2 0
Fetch 73370 17.49 19.57 3772 4165 3798 73369
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 73372 17.52 19.73 3772 4165 3800 73369

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 455 (RSCHRAG)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
73369 SORT (ORDER BY)
73369 MERGE JOIN
886 SORT (JOIN)
886 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'PURCHASE_ORDERS'
73726 SORT (JOIN)
73726 TABLE ACCESS GOAL: ANALYZED (FULL) OF
'PURCHASE_ORDER_LINES'

Figure 2: Benchmark of the query from Figure 1, but using a sort-merge join instead.

SELECT /*+ USE_HASH (B) */A.BUSINESS_UNIT,A.PO_NUMBER,
A.VENDOR_TYPE,B.LINE_NUMBER,B.LINE_AMOUNT,B.LINE_STATUS,
B.DESCRIPTION
FROM
PURCHASE_ORDERS A,PURCHASE_ORDER_LINES B WHERE B.BUSINESS_UNIT =
A.BUSINESS_UNIT AND B.PO_NUMBER = A.PO_NUMBER ORDER BY
A.BUSINESS_UNIT,A.PO_NUMBER,B.LINE_NUMBER


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.05 0.13 0 0 1 0
Fetch 73370 14.88 14.95 2071 4165 2093 73369
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 73372 14.93 15.08 2071 4165 2094 73369

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 455 (RSCHRAG)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
73369 SORT (ORDER BY)
137807 HASH JOIN
886 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'PURCHASE_ORDERS'
73726 TABLE ACCESS GOAL: ANALYZED (FULL) OF
'PURCHASE_ORDER_LINES'

Figure 3 Benchmark of the query from Figure 1, but using a hash join instead.

Distributed Queries

As we discussed earlier, distributed queries can sometimes be disastrous for performance. In particular a nested loops join between two row sources on separate nodes of a distributed database can be very slow. Figure 4 shows a simple distributed query and its execution plan. This query will be slow because for each row retrieved from the customers table, a separate query will be dispatched to the remote node in order to see if there is an open booking for the customer. This will result in many small network packets moving between the two nodes of the database, and the network latency and overhead will degrade performance.

Figure 5 shows the same query rewritten in a form that causes less network traffic. One query is sent to the remote node to determine all customers with open bookings. The results are the same, but performance is greatly improved. Both versions of the query use roughly the same CPU time and logical I/Os on the local node, but the elapsed time is about 97% less in Figure 5. This is due to reduced network overhead.

SELECT CUSTOMER_ID,CUSTOMER_NAME,CLASS_CODE
FROM
CUSTOMERS CUST WHERE EXISTS (SELECT 1 FROM BOOKINGS@BOOK BKG WHERE
BKG.CUSTOMER_ID = CUST.CUSTOMER_ID AND BKG.STATUS = 'open' ) ORDER BY
CUSTOMER_NAME


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 156 0.41 11.85 0 476 2 155
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 158 0.41 11.86 0 476 2 155

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 455 (RSCHRAG)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
155 SORT (ORDER BY)
467 FILTER
467 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'CUSTOMERS'
0 REMOTE [BOOK.WORLD]
SELECT "CUSTOMER_ID","STATUS" FROM "BOOKINGS" BKG WHERE
"STATUS"='open' AND "CUSTOMER_ID"=:1

Figure 4: Benchmark of a query using a nested loops join between two tables on separate nodes of a distributed database.

SELECT CUSTOMER_ID,CUSTOMER_NAME,CLASS_CODE
FROM
CUSTOMERS WHERE CUSTOMER_ID IN (SELECT CUSTOMER_ID FROM BOOKINGS@BOOK
WHERE STATUS = 'open' )ORDER BY CUSTOMER_NAME


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 156 0.07 0.27 0 467 0 155
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 158 0.07 0.28 0 467 0 155

Misses in library cache during parse: 0
Optimizer goal: CHOOSE
Parsing user id: 455 (RSCHRAG)

Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
155 SORT (ORDER BY)
155 NESTED LOOPS
156 VIEW
1000 SORT (UNIQUE)
1000 REMOTE [BOOK.WORLD]
SELECT "CUSTOMER_ID","STATUS" FROM "BOOKINGS" BOOKINGS WHERE
"STATUS"='open'
155 TABLE ACCESS GOAL: ANALYZED (BY ROWID) OF 'CUSTOMERS'
156 INDEX GOAL: ANALYZED (UNIQUE SCAN) OF 'SYS_C002109'
(UNIQUE)

Figure 5: Benchmark of the query from Figure 4, after being optimized to reduce network traffic.

When distributed queries cannot be avoided, use IN clauses, set operators such as UNION and MINUS, and whatever else you can to reduce the network traffic between nodes of the database. Using views will not necessarily improve performance. Views will hide the complexity from the application developer, but all of the complexity and issues are still there under the covers.

With Oracle 8, you can use the DRIVING_SITE hint to control which node of the distributed database drives the distributed query. In Oracle 7, the driving site will always be the node where the query originated.

Tuning the Database

In addition to optimizing the schema and tuning individual SQL statements, the database as a whole can be tuned in a way to optimize join performance. As in the previous section, we’ll focus on issues specific to joins and only glance at general tuning principles.

Optimizer Statistics

The query optimizer evaluates all possible execution plans for a given SQL statement and attempts to choose the best option. If you are using cost-based optimization, the optimizer will rely heavily on statistics in the data dictionary. Since queries with joins tend to have many possible execution paths, its especially important to keep these statistics in order.

For every database schema except SYS, you should collect statistics for all clusters, tables, and indexes on a regular basis. Estimated statistics are sufficient; its usually not worth the overhead to collect computed statistics. Analyzing clusters and tables explicitly is sufficient, because indexes are analyzed automatically when a table is analyzed. If some critical tables have badly skewed data distributions, you might want to have Oracle build histograms while collecting statistics. You can do this in Oracle 7.3 and later, and this can make a big difference.

I was once called in to look at a database where the initial loads ran quickly, but performance quickly degraded. It turned out that statistics had been collected when the database was almost empty. Statistics were not updated as tables grew in size, and so the optimizer began to make bad decisions. This is why it is important to collect statistics regularly. You can use an operating system scheduling function, such as cron in Unix, to do this. Or you can let Oracle take care of this itself by creating a database job (using dbms_job) that calls dbms_utility.analyze_schema on a regular basis.

Arranging Tablespaces

As you probably learned in DBA 101, it is important to create separate tablespaces for tables, indexes, temporary segments, and rollback segments. Separating tables from indexes is key to balancing disk I/O and reducing contention. This will significantly impact nested loops join performance since nested loops joins make heavy use of indexes.

Since sort-merge joins rely on temporary segments, it’s important to dedicate a tablespace to holding temporary segments only and to designate each user’s temporary tablespace accordingly. In a large database with many concurrent users performing lots of sorts, it can be beneficial to have several temporary tablespaces spread across many disks. In Oracle 7.3 and later, you can designate the contents of a tablespace as temporary with the CREATE TABLESPACE command, and this speeds up dynamic space allocation and release for temporary segments.

Be sure to size the temporary tablespace large enough to accommodate enough concurrent sort-merge joins, and size the initial, next, and max extents properly so that temporary segments will be able to grow large enough to perform the desired sorts. Even though Oracle 7.3 and later allow unlimited extents, using very small extent sizes in the temporary tablespace will lead to unnecessary dynamic space allocation.

Initialization Parameters

Performance of sort-merge joins and hash joins are strongly impacted by certain initialization parameters. Join performance can be crippled if these parameters are not set properly.

Sort-Merge Joins

db_file_multiblock_read_count specifies how many blocks Oracle should read at a time from disk when performing a sequential read such as a full table scan. Since sort-merge joins often involve full table scans, this parameter should be set properly. Larger values will reduce overhead when scanning large tables.

sort_area_size specifies how much memory can be used for sorting, and this has a strong impact on performance of all sorts. Since sort-merge joins require sorting of both row sources, sort_area_size strongly impacts sort-merge join performance. If an entire sort cannot be completed in the amount of memory specified by this parameter, then a temporary segment in the temporary tablespace will be allocated. In this case the sort will be performed in memory one part at a time, and partial results will be stored on disk in the temporary segment.

If sort_area_size is set very small, then excessive disk I/O will be necessary to perform even the smallest of sorts. If it is set too high, then the operating system may run out of physical memory and resort to swapping. Sort space is private to each session, and can consume lots of memory if there are a high number of concurrent users. For example, if sort_area_size is set to 5000000, then 100 concurrent users could theoretically use up 500 Mb of memory for sorting.

Hash Joins

The three parameters hash_join_enabled, hash_area_size, and hash_multiblock_io_count control hash join behavior. Unlike most other initialization parameters, these ones can also be dynamically altered with the ALTER SESSION command.

hash_join_enabled dictates whether or not the optimizer should consider using hash joins. If you don’t want hash joins to be used, set this parameter to false.

hash_area_size specifies how much memory can be used to build a hash table for a hash join, and bears a lot in common with sort_area_size. If this parameter is set too small, then partial hash tables will need to be stored in temporary segments. If this parameter is set too big, then physical memory may be exhausted. As with sort_area_size, hash_area_size indicates how much memory can be used per session. Many concurrent sessions can consume a lot of memory.

hash_multiblock_io_count specifies how many blocks should be read at a time when building the hash table, somewhat similar to db_file_multiblock_read_count for sort-merge joins. However, the interplay between hash_multiblock_io_count and hash_area_size significantly impacts hash join performance. Before changing either of these parameters, consult the formula listed in the Oracle 7.3 or 8.0 Server Reference Guide.

1 comment:

  1. KrChowdary,

    Thank you for the great article and simply explanation. I kept it for future reference.

    However, could you please provide more details and possibly a test case or reference paper on the performance benefits over segregating index/data tablespaces?

    Because this is one of those Oracle topics that generates a lot of discussions and causes a bit of confusion.

    For example, I've read on the article http://www.dba-oracle.com/t_segregate_tables_indexes_separate_disks_data_files.htm that the only benefit would be for organization purposes, because:

    "They noted that Oracle the access between an index and it's table entries is linear. Oracle requests an index block, waits, then requests a table block, &c, such that any individual SQL statement would never simultaneously request and index and table block. Hence, they mistakenly concluded that the separation of tables and indexes would not reduce I/O contention."

    Thanks again.

    ReplyDelete