The ANSI/ISO SQL standard defines four levels of transaction isolation, with different possible outcomes for the same transaction scenario. That is, the same work performed in the same fashion with the same inputs may result in different answers, depending on your isolation level. These levels are defined in terms of three phenomena that are either permitted or not at a given isolation level:
· Dirty read: The meaning of this term is as bad as it sounds. You're permitted to read uncommitted, or dirty, data. You can achieve this effect by just opening an OS file that someone else is writing and reading whatever data happens to be there. Data integrity is compromised, foreign keys are violated, and unique constraints are ignored.
· Nonrepeatable read: This simply means that if you read a row at time T1 and try to reread that row at time T2, the row may have changed. It may have disappeared, it may have been updated, and so on.
· Phantom read: This means that if you execute a query at time T1 and re-execute it at time T2, additional rows may have been added to the database, which may affect your results. This differs from a nonrepeatable read in that with a phantom read, data you already read hasn't been changed, but instead, more data satisfies your query criteria than before.
Note that the ANSI/ISO SQL standard defines transaction-level characteristics, not just individual statement-by-statement-level characteristics. I'll examine transaction-level isolation, not just statement-level isolation.
The SQL isolation levels are defined based on whether they allow each of the preceding phenomena. It's interesting to note that the SQL standard doesn't impose a specific locking scheme or mandate particular behaviors, but rather describes these isolation levels in terms of these phenomena—allowing for many different locking/concurrency mechanisms to exist (see Table 1).
Isolation Level Dirty Read Nonrepeatable Read Phantom Read
READ UNCOMMITTED Permitted Permitted Permitted
READ COMMITTED -- Permitted Permitted
REPEATABLE READ -- -- Permitted
SERIALIZABLE -- -- --
Table 1: ANSI isolation levels
Oracle explicitly supports the READ COMMITTED and SERIALIZABLE isolation levels as they're defined in the standard. However, this doesn't tell the whole story. The SQL standard was trying to set up isolation levels that would permit various degrees of consistency for queries performed at each level. REPEATABLE READ is the isolation level that the SQL standard claims will guarantee a read-consistent result from a query. In the SQL standard definition, READ COMMITTED doesn't give you consistent results, and READ UNCOMMITTED is the level to use to get nonblocking reads.
However, in Oracle Database, READ COMMITTED has all of the attributes required to achieve read-consistent queries. In other databases, READ COMMITTED queries can and will return answers that never existed in the database. Moreover, Oracle Database also supports the spirit of READ UNCOMMITTED. The goal of providing a dirty read is to supply a nonblocking read, whereby queries are not blocked by, and do not block, updates of the same data. However, Oracle Database doesn't need dirty reads to achieve this goal, nor does it support them. Dirty reads are an implementation other databases must use to provide nonblocking reads.
In addition to the four defined SQL isolation levels, Oracle Database provides another level: READ ONLY. A READ ONLY transaction is equivalent to a REPEATABLE READ or SERIALIZABLE transaction that cannot perform any modifications in SQL. A transaction using a READ ONLY isolation level sees only those changes that were committed at the time the transaction began. Inserts, updates, and deletes aren't permitted in this mode (other sessions may update data, but not the READ ONLY transaction). Using this mode, you can achieve REPEATABLE READ and SERIALIZABLE levels of isolation.
I'll now move on to discuss exactly how multiversioning and read consistency fit into the isolation scheme and how databases that don't support multiversioning achieve the same results. This information is helpful for anyone who has used another database and believes he or she understands how the isolation levels must work. It's also interesting to see how ANSI/ISO SQL, the standard that was supposed to remove the differences between the databases, actually allows for them. This standard, while very detailed, can be implemented in very different ways.
READ UNCOMMITTED. The READ UNCOMMITTED isolation level allows dirty reads. Oracle Database doesn't use dirty reads, nor does it even allow them. The basic goal of a READ UNCOMMITTED isolation level is to provide a standards-based definition that allows for nonblocking reads. As you've seen, Oracle Database provides for nonblocking reads by default. You'd be hard-pressed to make a SELECT query block and wait in the database (as noted earlier, there is the special case of a distributed transaction). Every single query, be it a SELECT, INSERT, UPDATE, MERGE, or DELETE, executes in a read-consistent fashion. It might seem funny to refer to an UPDATE statement as a query, but it is. UPDATE statements have two components: a read component as defined by the WHERE clause, and a write component as defined by the SET clause. UPDATE statements read and write to the database, as do all DML statements. The case of a single row INSERT using the VALUES clause is the only exception to this, because such statements have no read component—just the write component.
In Chapter 1 [of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions], I demonstrated Oracle Database's method of obtaining read consistency by way of a simple single table query, which retrieved rows that were deleted after the cursor was opened. I'm now going to explore a real-world example to see what happens in Oracle Database using multiversioning, as well as what happens in any number of other databases.
I'll start with that same basic table and query:
create table accounts
( account_number number primary key,
account_balance number not null
);
select sum(account_balance)
from accounts;
Before the query begins, I have the data shown in Table 2.
Row Account Number Account Balance
1 123 $500.00
2 456 $240.25
... ... ...
342,023 987 $100.00
Table 2: ACCOUNTS table before modifications
Now, my SELECT statement starts executing and reads row 1, row 2, and so on. At some point while this query is in the middle of processing, a transaction moves $400 from account 123 to account 987. This transaction does the two updates but does not commit. The ACCOUNTS table now looks as shown in Table 3.
Row Account Number Account Balance Locked?
1 123 ($500.00) changed to $100.00 X
2 456 $240.25 --
... ... ... --
342,023 987 ($100.00) changed to $500.00 X
Table 3: ACCOUNTS table during modifications
As Table 3 shows, two of those rows are locked. If anyone tried to update them, they'd be blocked. So far, the behavior I'm seeing is more or less consistent across all databases. The difference will be in what happens when the query gets to the locked data.
When the query I'm executing gets to the block containing the locked row (row 342,023) at the bottom of the table, it will notice that the data in it has changed since execution began. To provide a consistent, or correct, answer, Oracle Database will create a copy of the block containing this row as it existed when the query began. That is, it will read a value of $100, which is the value that existed when the query began. Effectively, Oracle Database takes a detour around the modified data—it reads around it, reconstructing it from the undo (also known as a rollback) segment. A consistent and correct answer comes back without waiting for the transaction to commit.
Now, a database that allowed a dirty read would simply return the value it saw in account 987 at the time it read it—in this case, $500. The query would count the transferred $400 twice. Therefore, not only does it return the wrong answer, but also it returns a total that never existed in the table. In a multiuser database, a dirty read can be a dangerous feature. Personally, I've never seen the usefulness of it. Say that, rather than transferring, the transaction was actually just depositing $400 in account 987. The dirty read would count the $400 and get the "right" answer, wouldn't it? Well, suppose the uncommitted transaction was rolled back. I've just counted $400 that was never actually in the database.
The point here is that dirty read is not a feature; rather, it's a liability. In Oracle Database, it's just not needed. You get all of the advantages of a dirty read—no blocking—without any of the incorrect results.
READ COMMITTED. The READ COMMITTED isolation level states that a transaction may read only data that has been committed in the database. There are no dirty reads (reads of uncommitted data). There may be nonrepeatable reads (that is, rereads of the same row may return a different answer in the same transaction) and phantom reads (that is, newly inserted and committed rows become visible to a query that weren't visible earlier in the trans-action). READ COMMITTED is perhaps the most commonly used isolation level in database applications everywhere, and it's the default mode for Oracle Database. It's rare to see a different isolation level used in Oracle databases.
However, achieving READ COMMITTED isolation is not as cut-and-dried as it sounds. If you look at Table 1, it looks straightforward. Obviously, given the earlier rules, a query executed in any database using the READ COMMITTED isolation will behave in the same way, right? No, it won't. If you query multiple rows in a single statement in almost any other database, READ COMMITTED isolation can be as bad as a dirty read, depending on the implementation.
In Oracle Database, using multi-versioning and read-consistent queries, the answer I get from the ACCOUNTS query is the same in the READ COMMITTED example as it was in the READ UNCOMMITTED example. Oracle Database will reconstruct the modified data as it appeared when the query began, returning the answer that was in the database when the query started.
Now I'll take a look at how my previous example might work in READ COMMITTED mode in other databases. You might find the answer surprising. I'll pick up my example at the point described in Table 3:
· I'm in the middle of the table. I've read and summed the first N rows.
· The other transaction has moved $400 from account 123 to account 987.
· The transaction has not yet committed, so rows containing the information for accounts 123 and 987 are locked.
I know what happens in Oracle Database when it gets to account 987: It will read around the modified data, find out it should be $100, and complete. Table 4 shows how another database, running in some default READ COMMITTED mode, might arrive at the answer.
Time Query Account Transfer Transaction
T1 Reads row 1. Sum = $500.00 so far. --
T2 Reads row 2. Sum = $740.25 so far. --
T3 -- Updates row 1 and puts an exclusive lock on row 1, preventing other updates and reads. Row 1 now has $100.00.
T4 Reads row N. Sum = . . . --
T5 -- Updates row 342,023 and puts an exclusive lock on this row. Row now has $500.00.
T6 Tries to read row 342,023 and discovers that it is locked. This session will block and wait for this block to become available. All processing on this query stops. --
T7 -- Commits transaction.
T8 Reads row 342,023, sees $500.00, and presents a final answer that includes the $400.00 double-counted.
Table 4: Timeline in a non-Oracle database using READ COMMITTED isolation
The first thing to notice in Table 4 is that this other database, upon reading account 987, will block my query. This session must wait on that row until the transaction holding the exclusive lock commits. This is one reason why many people have a bad habit of committing every statement, instead of processing well-formed transactions consisting of all of the statements needed to take the database from one consistent state to the next. Updates interfere with reads in most other databases. The really bad news in this scenario is that I'm making the end user wait for the wrong answer. I still receive an answer that never existed in the database, as with the dirty read, but this time I made the user wait for the wrong answer. In the next section, I'll look at what these other databases must do to achieve read-consistent, correct results.
The lesson here is that various databases executing in the same, apparently safe isolation level can and will return very different answers under the same circumstances. It's important to understand that, in Oracle Database, nonblocking reads are not had at the expense of correct answers. You can have your cake and eat it too, sometimes.
REPEATABLE READ. The goal of REPEATABLE READ is to provide an isolation level that gives consistent, correct answers and prevents lost updates. I'll show examples of what you must do in Oracle Database to achieve these goals and examine what happens in other systems. If you have REPEATABLE READ isolation, the results from a given query must be consistent with respect to some point in time. Most databases (not Oracle) achieve repeatable reads through the use of row-level, shared read locks. A shared read lock prevents other sessions from modifying data that you've read. This, of course, decreases concurrency. Oracle Database opted for the more concurrent, multiversioning model to provide read-consistent answers.
Using multiversioning in Oracle Database, you get an answer consistent with when the query began execution. In other databases, using shared read locks, you get an answer that's consistent with when the query completes—that is, when you can get the answer at all (more on this in a moment).
In a system that employs a shared read lock to provide repeatable reads, you'd observe rows in a table getting locked as the query processed them. So, using the earlier example, as my query reads the ACCOUNTS table, it'd leave shared read locks on each row, as shown in Table 5.
Time Query Account Transfer Transaction
T1 Reads row 1. Sum = $500.00 so far. Block 1 has a shared read lock on it. --
T2 Reads row 2. Sum = $740.25 so far. Block 2 has a shared read lock on it. --
T3 -- Attempts to update row 1 but is blocked. Transaction is suspended until it can obtain an exclusive lock.
T4 Reads row N. Sum = . . . --
T5 Reads row 342,023, sees $100.00, and presents final answer. --
T6 Commits transaction. --
T7 -- Updates row 1 and puts an exclusive lock on this block. Row now has $100.00.
T8 -- Updates row 342,023 and puts an exclusive lock on this block. Row now has $500.00. Commits transaction.
Table 5: Timeline 1 in a non-Oracle database using READ REPEATABLE isolation
Table 5 shows that I now get the correct answer, but at the cost of physically blocking one transaction and executing the two transactions sequentially. This is one of the side effects of shared read locks for consistent answers: Readers of data will block writers of data. This is in addition to the fact that, in these systems, writers of data will block readers of data. Imagine if ATMs worked this way in real life.
So, you can see how shared read locks can inhibit concurrency, but how they can also cause spurious errors to occur. In Table 6, I start with my original table, but this time with the goal of transferring $50 from account 987 to account 123.
Time Query Account Transfer Transaction
T1 Reads row 1. Sum = $500.00 so far. Block 1 has a shared read lock on it. --
T2 Reads row 2. Sum = $740.25 so far. Block 2 has a shared read lock on it. --
T3 -- Updates row 342,023 and puts an exclusive lock on block 342,023, preventing other updates and shared read locks. This row now has $50.00.
T4 Reads row N. Sum = . . . --
T5 -- Attempts to update row 1 but is blocked. Transaction is suspended until it can obtain an exclusive lock.
T6 Attempts to read row 342,023 but cannot, as an exclusive lock is already in place. --
Table 6: Timeline 2 in a non-Oracle database using READ REPEATABLE isolation
I've just reached the classic deadlock condition. My query holds resources the update needs, and vice versa. My query has just deadlocked with my update transaction. One of them will be chosen as the victim and will be killed. I just spent a long time and a lot of resources only to fail and get rolled back at the end. This is the second side effect of shared read locks: Readers and writers of data can and frequently will deadlock each other.
As you've seen in Oracle Database, you have statement-level read consistency without reads blocking writes or deadlocks. Oracle Database never uses shared read locks—ever. Oracle has chosen the harder-to-implement but infinitely more concurrent multi-versioning scheme.
SERIALIZABLE. This is generally considered the most restrictive level of transaction isolation, but it provides the highest degree of isolation. A SERIALIZABLE transaction operates in an environment that makes it appear as if there are no other users modifying data in the database. Any row you read is assured to be the same upon a reread, and any query you execute is guaranteed to return the same results for the life of a transaction.
For example, if you execute—
select * from T;
begin dbms_lock.sleep( 60*60*24 ); end;
select * from T;
—the answers returned from T would be the same, even though you just slept for 24 hours (or you might get an ORA-1555, snapshot too old error). The isolation level assures you these two queries will always return the same results. Side effects, or changes, made by other transactions aren't visible to the query no matter how long it has been running.
In Oracle Database, a SERIALIZABLE transaction is implemented so that the read consistency you normally get at the statement level is extended to the transaction. (As noted earlier, there's also an isolation level in Oracle called READ ONLY. It has all of the qualities of the SERIALIZABLE isolation level, but it prohibits modifications. Note that the SYS user (or users connected as SYSDBA) cannot have a READ ONLY or SERIALIZABLE transaction. SYS is special in this regard.)
Instead of results being consistent with respect to the start of a statement, they're preordained when you begin the transaction. In other words, Oracle Database uses the rollback segments to reconstruct the data as it existed when your transaction began, instead of when your statement began. That's a pretty deep thought there: the database already knows the answer to any question you might ask it, before you ask it.
But this degree of isolation comes with a price, and that price is the following possible error:
ERROR at line 1:
ORA-08177: can't serialize access for this transaction
You'll get this message whenever you try to update a row that has changed since your transaction began. (Note that Oracle tries to do this purely at the row level, but you may receive an ORA-01877 error even when the row you're interested in modifying hasn't been modified. The ORA-01877 may happen due to some other row(s) being modified on the block that contains your row.)
Oracle Database takes an optimistic approach to serialization: it gambles on the fact that the data your transaction wants to update won't be updated by any other transaction. This is typically the way it happens, and usually the gamble pays off, especially in quick transaction, OLTP-type systems. If no one else updates your data during your transaction, this isolation level, which will generally decrease concurrency in other systems, will provide the same degree of concurrency as it would without SERIALIZABLE transactions. The downside is that you may get the ORA-08177 error if the gamble doesn't pay off. If you think about it, however, it's worth the risk. If you're using SERIALIZABLE transaction, you shouldn't expect to update the same information as other transactions.
If you do, you should use the SELECT ... FOR UPDATE as described in Chapter 1 [of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions]. This will serialize the access. So, you can effectively use an isolation level of SERIALIZABLE if you:
· Have a high probability of no one else modifying the same data
· Need transaction-level read consistency
· Will be doing short transactions (to help make the first bullet point a reality)
But—there is always a "but"—you must understand these different isolation levels and their implications. Remember, with isolation set to SERIALIZABLE, you won't see any changes made in the database after the start of your transaction, until you commit. Applications that try to enforce their own data integrity constraints, such as the resource scheduler described in Chapter 1 [of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions], must take extra care in this regard. The problem in Chapter 1 was that you couldn't enforce your integrity constraint in a multiuser system since you couldn't see changes made by other uncommitted sessions. Using SERIALIZABLE, you'd still not see the uncommitted changes, but you'd also not see the committed changes made after your transaction began.
As a final point, be aware that SERIALIZABLE does not mean that all transactions executed by users will behave as if they were executed one right after another in a serial fashion. It doesn't imply that there's some serial ordering of the transactions that will result in the same outcome. The phenomena previously described by the SQL standard do not make this happen. This last point is a frequently misunderstood concept, and a small demonstration will clear it up. The following table represents two sessions performing work over time. The database tables A and B start out empty and are created as follows:
ops$tkyte@ORA10G> create table a ( x int );
Table created.
ops$tkyte@ORA10G> create table b ( x int );
Table created.
Now I have the series of events shown in Table 7.
Time Session 1 Executes Session 2 Executes
T1 Alter session set isolation_level=serializable; --
T2 -- Alter session set isolation_level=serializable;
T3 Insert into a select count(*) from b; --
T4 -- Insert into b select count(*) from a;
T5 Commit; --
T6 -- Commit;
Table 7: SERIALIZABLE Transaction Example
Now, when the processes shown Table 7 are all said and done, tables A and B will each have a row with the value 0. If there was some serial ordering of the transactions, I couldn't possibly have both tables each containing a row with the value 0. If session 1 executed before session 2, then table B would have a row with the value 1 in it. If session 2 executed before session 1, then table A would have a row with the value 1 in it. As executed here, however, both tables will have rows with a value of 0. They just executed as if they were the only transaction in the database at that point in time. No matter how many times session 1 queries table B, the count will be the count that was committed in the database at time T1. Likewise, no matter how many times session 2 queries table A, the count will be the same as it was at time T2.
Tuesday, October 6, 2009
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment