1.0 Materialised Views: A bit of background
I confess a weakness for Materialised Views. It strikes me that they are amongst the cleverest, neatest, most productive features Oracle has ever come up with, and they continue to impress me every time I use them, even though they’ve been around for many, many Oracle versions (including a stint in ancient versions as something known as a ‘snapshot table’).
If you’ve not really worked with materialised views before, that older name for them gives a better idea to their true nature and function than the much sexier-sounding ‘materialised view’ one: they are snapshots of data in ‘real’ tables as at a particular point of time. What’s more (and their real reason for existing), the snapshot that is stored can be the result of a complex join or a set of aggregations of data. So, for example, a very simple example might be:
create materialized view emp_dept_mv
as
select e.empno, e.ename, d.dname, d.loc, e.sal
from scott.emp e, scott.dept d
where e.deptno=d.deptno;
Note firstly that the object uses American spelling, so whereas I refer to them as ‘materialised’, the SQL syntax demands you use a ‘z’ in their names. Second, what you have just created here is a new table, in fact -but one which stores the result of a join between two other tables. You could actually have achieved almost exactly the same thing by issuing this command instead:
create table emp_dept_join
as
select e.empno, e.ename, d.dname, d.loc, e.sal
from scott.emp e, scott.dept d
where e.deptno=d.deptno;
In either case, the select part of the syntax could get an awful lot more complex than the trivial example I have shown here, and hence the ‘snapshot table’ you create in either case would represent the stored results of more and more costly queries. That, in effect, is the principle reason for wanting to use materialised views in the first place: they store the results of costly queries, so that you can then select directly from the stored results in the future, without the database having to bear the costs associated with re-performing the original query.
But, you might reasonably ask, if create table as select syntax can effectively snapshot complex query results, why was there felt to be a need for some new exotic segment type called a materialised view? What can it do that good old CTAS cannot?
Well, the answer to that comes in two parts. First, with a CTAS table copy, you have to know the copy exists, and write your SQL to use it directly. That is, if a user issues a query along the lines of ‘select e.empno, d.loc from emp, dept...’, then the fact that there is an emp_dept_join table in my database is of no consequence: the user is not asking to query from that summarisation, so the query is resolved by re-performing the expensive join on the original, ‘real’, tables. Practically, that would mean that unless you re-wrote your entire application to be aware of the existence of summary CTAS tables, they’d never actually get used. A materialised view, however, can be used even if a user writes SQL which appears to reference the real tables: the database’s optimiser weighs up the costs of re-performing the joins (and possible aggregations) on the base tables, versus the costs of querying from the materialised view which stores the prior results of performing those operations... and if it’s cheaper to query from the materialised view, the optimiser will re-write the SQL issued by the user, automatically and transparently: materialised views could therefore be employed without re-writing a single line of application code. Very nice!
The second reason why materialised views are much more functional than CTAS-copied tables is to do with the fact that any snapshot or copy of a table or tables is bound, sooner or later to get out-of-date and will therefore, presumably, need to be brought back up-to-date somehow. With a CTAS-table, the only thing you can really do is this:
truncate table emp_dept_join;
insert into emp_dept_join
select e.empno, e.ename, d.dname, d.loc, e.sal
from scott.emp e, scott.dept d
where e.deptno=d.deptno;
At this point, of course, you are re-performing the original costly query -and you’re having to remember to do it manually. If the base tables on which your copy is based are subject to a reasonable amount of DML, you’d be having to re-perform this sort of synchronisation so often, you probably would have been better off not having created the copy table in the first place! And if you forget to re-synchronise the copy with its base tables, then users whose SQL statements make use of the copy table will be receiving inaccurate results.
By contrast, a Materialised View can refresh itself, automatically, and (under certain circumstances) the nature of that refresh can be relatively trivial. If you create another ‘exotic’ segment called a materialised view log, then the materialised view can capture the DML that gets performed to the original base tables: the materialised view log is a bit like a mini-online redo log that stores changes made only to the table its built on. At some point, the contents of the view log can then be ‘replayed’ to the materialised view itself, thus bringing it completely up-to-date.
Just to show you these two key functions of materialised views in action, here’s a simple example. First, a few necessary preliminaries:
drop user scott cascade;
User dropped.
@?/rdbms/admin/utlsampl
Disconnected from Oracle Database 10g Enterprise Edition
Release 10.1.0.2.0 - Production With the Partitioning, OLAP and Data Mining options
sqlplus / as sysdba
SQL*Plus: Release 10.1.0.2.0 - Production on Mon Feb 14 08:53:11 2005
Copyright (c) 1982, 2004, Oracle. All rights reserved.
grant create materialized view, query rewrite to scott;
Grant succeeded.
grant select on dba_objects to scott;
Grant succeeded.
grant advisor to scott;
Grant succeeded.
connect scott/tiger
Connected.
Here, I’m clearing out Scott from my database so that I can be sure I’m working with a clean slate. I then re-create Scott using the utlsampl.sql script provided by Oracle in the \rdbms\admin directory. I then finally grant the minimum set of privileges necessary to Scott so that the rest of what follows in this article will work.
I will pause at this point to confess that I have been guilty in the past -and in the first drafts of this paper- of simply granting the DBA role to Scott. It certainly makes sure that the examples in the articles work, but it is monstrously bad practice for any DBA to follow. I should be setting a better example than that, so I have resolved that in this and all future papers I will make the effort to demonstrate the smallest set of privileges needed to make things work. In this case, the privileges fall into three groups. The grant of create materialized view and query rewrite are fundamental to what follows. Without them, 99% of this article would be meaningless. The second grant, of select on DBA_OBJECTS is admittedly a bit odd, and not particularly good DBA practice -but it will let me create a large table with lots of rows and low cardinality ofr some columns very quickly. In a sense, it’s just a cosmetic grant of a privilege for mere convenience, and if you had an alternative table of similar size and cardinality qualities to use instead, you could manage perfectly well without it. The third privilege grant is of advisor, and that is a brand new, 10g-specific privilege. If you didn’t do it on your database, then just one (though very important) piece of the article concerning new 10g functionality won’t work for you, but the bulk of the article about materialised view functionality in general would still make sense.
In any case, my system and object privilege grants have been made, and I’m ready to begin in earnest:
drop table ob purge;
drop table ob purge
*
ERROR at line 1:
ORA-00942: table or view does not exist
drop materialized view ob_sum;
drop materialized view ob_sum
*
ERROR at line 1:
ORA-12003: materialized view "SCOTT"."OB_SUM" does not exist
I’ve thrown in a couple of drop object DDL statements there as one further way of ensuring I’m starting with a clean slate. They cause errors the first time you work through this article, of course, because the first time through, no such objects exist. Second and subsequent work-throughs would proceed without error at this point.
create table ob as select * from dba_objects;
Table created.
alter table ob add (constraint ob_pk primary key(object_id));
Table altered.
SQL> create materialized view log on ob with rowid
2 (owner, object_name, subobject_name, object_id) including new values;
Materialized view log created.
SQL> create materialized view ob_sum
2 refresh on commit fast
3 enable query rewrite
4 as
5 select owner, count(*) as counter from ob
6 group by owner;
Materialized view created.
I start out by creating a copy of a suitably large table: DBA_OBJECTS usually has several thousand records in it, so a copy of it called OB should be sufficient for my purposes. I add a primary key constraint to the copied table, and then add on a materialised view log (so that the eventual materialised view will be able to refresh itself by re-applying base table DML that’s been captured in the log). Finally, I create the materialised view itself, including a demand that it identifies itself as being available for use if the optimiser is considering rewriting any user queries. Line 2 of that piece of SQL also indicates that if a piece of DML is performed on the base table OB, then the materialised view is to bring itself up-to-date with that change as soon as it has been committed. Note that the materialised view merely records the count of the number of objects owned by each user. It contains no joins to other tables, in other words, but it does store the results of a fairly hefty piece of summarisation, aggregation, grouping and sorting.
We’ll be able to see what all of that means in practical terms in just a moment, once I have given Scott the ability to inspect the execution plans the optimiser generates for his queries. That means I have first to run the utlxplan.sql script that Oracle supplies (in /rdbms/admin) to create a PLAN_TABLE in Scott’s schema:
SQL> @?\rdbms\admin\utlxplan.sql
Table created.
Now I can see what execution plans various simple queries will generate: SQL> set autotrace trace explain
SQL> select count(*) from ob 2 where owner='SCOTT';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=1 Bytes=30)
1 0 SORT (AGGREGATE)
2 1 MAT_VIEW REWRITE ACCESS (FULL) OF 'OB_SUM' (MAT_VIEW REWRITE) (Cost=3 Card=1 Bytes=30)
Note that my query says to select from OB. Yet what does the execution plan reveal? It shows that we are really accessing the OB_SUM materialized view -and a minor new feature in 10g is that the reason the query is going to use a completely different segment than the one it actually asked to use is that there has been a MAT_VIEW REWRITE. This is proof, in other words, that users can issue SQL that thinks it’s going to read from one table, but transparently end up looking at another, because the optimiser deems it cheaper to do so. That was the first of the two reasons why materialised views are such wonderful segments, you’ll recall.
SQL> insert into ob (owner, object_name, subobject_name, object_id)
2 values ('SCOTT','TESTINSERT','NONE',98365);
1 row created.
At this point, I’ve just inserted one new record into the base table, OB. I haven’t yet committed that insert. Remembering that I told my materialised view only to bring itself up-to-date when I commit DML performed on the base table, this must mean that my materialised view should now be considered out-of-date. Although it is configurable (though explaining how is outside the scope of this article), materialised views cannot generally be candidates for a bit of query rewriting by the optimiser if they are indeed known to be out-of-date. My materialised view has therefore just been ruled out of consideration for future query rewrites by my little bit of base table DML, and I can test that this is the case easily enough:
SQL> select count(*) from ob 2 where owner='SCOTT';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=129 Card=1 Bytes=17)
1 0 SORT (AGGREGATE)
2 1 TABLE ACCESS (FULL) OF 'OB' (TABLE) (Cost=129 Card=6 Bytes=102)
This is exactly the same query as I issued before, but this time the execution plan reveals that we’re going to select records out of the OB table itself, and not use the OB_SUM materialised view. This is proof therefore that an out-of-date materialised view is dismissed by the optimiser for query rewrite purposes. Incidentally, you can now see why, when the materialised view is known to be fully up-to-date, the optimiser is so keen on making use of it: the query previously reported a total cost of ‘3’, whereas if the base table has to be used, the cost shoots up to 129. That means my silly, simple, materialised view is nevertheless performing about 40 times faster than the base table: now you understand how these things can be so useful!
SQL> commit;
Commit complete.
select count(*) from ob 2 where owner='SCOTT';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3 Card=1 Bytes=30)
1 0 SORT (AGGREGATE)
2 1 MAT_VIEW REWRITE ACCESS (FULL) OF 'OB_SUM' (MAT_VIEW REWRITE) (Cost=3 Card=1 Bytes=30)
Finally, I commit my bit of pending DML, and then immediately re-issue exactly the same query as before. The execution plan this time reveals that the materialised view is back in business. That’s an indication that the commit must have caused the materialised view to bring itself back up-to-date, at which point it’s a candidate for query rewrites once more. A more direct proof of the second of the key features of materialised views -this ability of theirs to refresh themselves- can be seen easily like so:
SQL> set autotrace off
SQL> select count(*) from ob where owner='SCOTT';
COUNT(*)
----------
9
SQL> select counter from ob_sum where owner='SCOTT';
COUNTER
----------
9
So far, at least, the two table’s count of the number of objects owned by Scott is synchronised.
SQL> insert into ob (owner, object_name, subobject_name, object_id)
2 values ('SCOTT','TESTINSERT','NONE',98379);
1 row created.
SQL> select count(*) from ob where owner='SCOTT';
COUNT(*)
----------
10
SQL> select counter from ob_sum where owner='SCOTT';
COUNTER
----------
9
But as a result of the insert of one new record, the OB base table’s count increases to 10, whereas the count of objects in the materialised view OB_SUM remains at 9. The materialised view is thus visibly ’stale’.
SQL> commit;
Commit complete.
SQL> select count(*) from ob where owner='SCOTT';
COUNT(*)
----------
10
SQL> select counter from ob_sum where owner='SCOTT';
COUNTER
----------
10
One commit statement later, however, and the two objects’ counts agree once more. As demanded by the materialised view’s definition, the commit causes the earlier insert to OB to be replayed to OB_SUM, thus re-synchronising the two objects. OB_SUM is considered ‘fresh’ once more, and hence (as we saw) available for query rewrites.
I’ve taken quite a lot of time and space to explain the fundamentals of what materialised views are and how they work because, without that level of basic understanding, hardly any of the incremental (but very nice) enhancements that we get in 10g would make sense! Armed with that foundation-level knowledge, therefore, let’s now look specifically at what has changed in 10g.
2.0 DBMS_ADVISOR
Creating materialised views looks simple enough, at least when you consider the examples I used earlier. But it isn’t actually that easy. Consider this simple example:
SQL> create materialized view emp_dept_mv
2 build immediate
3 refresh fast
4 enable query rewrite
5 as
6 select e.ename, d.dname
7 from emp e, dept d
8 where e.deptno=d.deptno;
from emp e, dept d
*
ERROR at line 7:
ORA-12052: cannot fast refresh materialized view SCOTT.EMP_DEPT_MV
This is not the world’s most complex select statement, I think you will agree! Yet it obviously contains sufficient complexity to prevent a fast-refreshable materialised view being created based on it. This, indeed, has been one bugbear that has bedevilled attempts to implement materialised views in previous Oracle versions: their seemingly irrational dislike of being fast-refreshed (which, of course, they have to be able to do if they are to be kept nice and fresh at minimal cost). All sorts of simple factors would trip them up and result in this sort of error, and it was something of an acquired skill to work out which ones were interfering at any one time.
Fortunately, that’s practically a thing of the past in 10g. We have a new package DBMS_ADVISOR and a new procedure within that package called TUNE_MVIEW, which will not only make it clear why a proposed materialised view can’t be created, but will give you the SQL statements actually needed to create it! Here’s how I might sort out that troublesome materialised view above:
SQL> var hjr varchar2(30) begin
2 dbms_advisor.tune_mview(:hjr,
3 'create materialized view emp_dept_mv build immediate refresh fast enable
4 query rewrite as select e.ename, d.dname from emp e,
5 dept d where e.deptno=d.deptno');
6 end;
7 /
PL/SQL procedure successfully completed.
The first line here simply declares a new bind variable, called hjr (you can call it anything you like, of course!). That’s to act as the first of the TUNE_MVIEWS parameters: it becomes the unique identifier for the explanations about how to make this materialised view work when you inspect the relevant table. After that, you simply invoke the TUNE_MVIEW procedure supplying the bind variable identifier and the exact text of the proposed ‘create materialized view’ SQL statement. Oracle thinks about it for a moment or two, and then tells you the procedure finished successfully. At this point, you can inspect its work like so:
set long 3000
set pagesize 3000
set linesize 132
select * from user_tune_mview;
OWNER TASK_NAME ACTION_ID
------------------------------ ------------------------------ ----------
SCRIPT_TYPE
--------------
STATEMENT
--------------------------------------------------------------------------------
SCOTT TASK_5 7
IMPLEMENTATION
CREATE MATERIALIZED VIEW LOG ON "SCOTT"."DEPT" WITH ROWID
SCOTT TASK_5 8
IMPLEMENTATION
ALTER MATERIALIZED VIEW LOG FORCE ON "SCOTT"."DEPT" ADD ROWID
SCOTT TASK_5 9
IMPLEMENTATION
CREATE MATERIALIZED VIEW LOG ON "SCOTT"."EMP" WITH ROWID
SCOTT TASK_5 10
IMPLEMENTATION
ALTER MATERIALIZED VIEW LOG FORCE ON "SCOTT"."EMP" ADD ROWID
SCOTT TASK_5 11
IMPLEMENTATION
CREATE MATERIALIZED VIEW SCOTT.EMP_DEPT_MV build immediate REFRESH FAST WITH RO
WID ENABLE QUERY REWRITE AS SELECT SCOTT.EMP.ROWID C1, SCOTT.DEPT.ROWID C2, "SCO
TT"."DEPT"."DNAME" M1, "SCOTT"."EMP"."ENAME" M2 FROM SCOTT.EMP, SCOTT.DEPT WHERE
SCOTT.DEPT.DEPTNO = SCOTT.EMP.DEPTNO
SCOTT TASK_5 12
UNDO
DROP MATERIALIZED VIEW SCOTT.EMP_DEPT_MV
6 rows selected.
First of all, note how I set the SQL*Plus “LONG” environment variable to a suitably large figure: the STATEMENT column in this view returns a CLOB, and therefore the text will be truncated unless you do that before you start. The other ‘set’ commands stop lines wrapping and page breaks cluttering up the display of the query results.
Secondly, notice how the new USER_TUNE_MVIEW view (there’s a DBA_ equivalent, naturally) contains (in this case) five SQL statements which need to be issued to make the proposed materialised view actually workable. There is also a sixth statement, which shows how you would drop the materialised view after it gets created! In theory, you could simply cut and paste each of the statements in turn (bar the last one, of course!) into a SQL session, and that would get the materialised view created.
But third: notice that these statements are all related to each other by being identified with the one ‘task name’ -in my case, they are all part of TASK_5. I can use that information as an input to yet another new 10g procedure that will generate a SQL script consisting of these individual SQL statements for me, like so:
SQL> connect / as sysdba
Connected.
SQL> create directory MYDIR as 'C:\';
Directory created.
SQL> grant read, write on directory mydir to scott;
Grant succeeded.
SQL> connect scott/tiger
Connected.
SQL> exec dbms_advisor.create_file (dbms_advisor.get_task_script('TASK_5'),'MYDIR','FIXME.TXT')
PL/SQL procedure successfully completed.
The first thing I did here was to make a quick detour to SYS’s schema, so that I could create a new directory object. That will allow the database to write a text file out to the file system. I then granted a couple of object privileges to Scott so that he can actually make use of the new directory object. Obviously, if a directory object already exists in your database, to which you have appropriate read and write access, then there’s no need to create a new one just to get this process working.
With the directory object in place, I logged back on as Scott and invoked the DBMS_ADVISOR package once more, but this time its CREATE_FILE procedure. That takes three essential arguments; the task name I mentioned earlier, the name of the directory object the SQL script is to be written to, and the desired name of the SQL script itself. When this is executed, Oracle sweeps up all SQL statements found in the USER_TUNE_MVIEW view that share the specified task name, and outputs them all to a text file that looks a bit like this:
host type c:\fixme.txt
Rem SQL Access Advisor: Version 10.1.0.1 - Production
Rem
Rem Username: SCOTT
Rem Task: TASK_5
Rem
set feedback 1
set linesize 80
set trimspool on set tab off set pagesize 60
whenever sqlerror CONTINUE
CREATE MATERIALIZED VIEW LOG ON
"SCOTT"."DEPT"
WITH ROWID ;
ALTER MATERIALIZED VIEW LOG FORCE ON
"SCOTT"."DEPT"
ADD ROWID
;
CREATE MATERIALIZED VIEW LOG ON
"SCOTT"."EMP"
WITH ROWID ;
ALTER MATERIALIZED VIEW LOG FORCE ON
"SCOTT"."EMP"
ADD ROWID
;
CREATE MATERIALIZED VIEW SCOTT.EMP_DEPT_MV
REFRESH FAST WITH ROWID
ENABLE QUERY REWRITE
AS SELECT SCOTT.EMP.ROWID C1, SCOTT.DEPT.ROWID C2, "SCOTT"."DEPT"."DNAME" M1
"SCOTT"."EMP"."ENAME"
M2 FROM SCOTT.EMP, SCOTT.DEPT WHERE SCOTT.DEPT.DEPTNO = SCOTT.EMP.DEPTNO;
whenever sqlerror EXIT SQL.SQLCODE
begin
dbms_advisor.mark_recommendation('TASK_5',1,'IMPLEMENTED');
end;
/
There’s nothing particularly clever here: the SQL statements have been wrapped up nicely and formatted beautifully, of course. But otherwise, there’s nothing much here you couldn’t have gotten by simply cutting and pasting direct from USER_TUNE_MVIEW... except for that last statement, which will change the SCRIPT_TYPE column in the USER_TUNE_MVIEW column from IMPLEMENTATION to IMPLEMENTED -thus allowing you to distinguish between SQL statements in that view which still need to be performed and those which have been performed. Of course, if you choose the cut-and-paste straight from the view technqiue, there’s nothing to stop you executing something like that last line yourself and achieving exactly the same outcome. In my case, having created a file called FIXME.TXT, I can execute it as a SQL script like so:
SQL> @C:\fixme.txt
Materialized view log created.
Materialized view log altered.
Materialized view log created.
Materialized view log altered.
Materialized view created.
PL/SQL procedure successfully completed.
And suddenly, what had been impossible to create at the start of this section has clearly been created without fuss:
SQL> select * from emp_dept_mv;
C1 C2 M1 M2
------------------ ------------------ -------------- ----------
AAAPxoAAEAAAAAcAAA AAAPxmAAEAAAAAMAAB RESEARCH SMITH
AAAPxoAAEAAAAAcAAB AAAPxmAAEAAAAAMAAC SALES ALLEN
AAAPxoAAEAAAAAcAAC AAAPxmAAEAAAAAMAAC SALES WARD
AAAPxoAAEAAAAAcAAD AAAPxmAAEAAAAAMAAB RESEARCH JONES
AAAPxoAAEAAAAAcAAE AAAPxmAAEAAAAAMAAC SALES MARTIN
AAAPxoAAEAAAAAcAAF AAAPxmAAEAAAAAMAAC SALES BLAKE
AAAPxoAAEAAAAAcAAG AAAPxmAAEAAAAAMAAA ACCOUNTING CLARK
AAAPxoAAEAAAAAcAAH AAAPxmAAEAAAAAMAAB RESEARCH SCOTT
AAAPxoAAEAAAAAcAAI AAAPxmAAEAAAAAMAAA ACCOUNTING KING
AAAPxoAAEAAAAAcAAJ AAAPxmAAEAAAAAMAAC SALES TURNER
AAAPxoAAEAAAAAcAAK AAAPxmAAEAAAAAMAAB RESEARCH ADAMS
AAAPxoAAEAAAAAcAAL AAAPxmAAEAAAAAMAAC SALES JAMES
AAAPxoAAEAAAAAcAAM AAAPxmAAEAAAAAMAAB RESEARCH FORD
AAAPxoAAEAAAAAcAAN AAAPxmAAEAAAAAMAAA ACCOUNTING MILLER
It might not look pretty -but at least it exists!
You know, it is quite often the little, insignficant new features which go to make a new Oracle version a must-have upgrade... and my belief is that DBMS_ADVISOR.TUNE_MVIEW is one of them for 10g. I myself have spent many a long hour, frustrated and irritated, at the intractable mysteries of getting materialised views to behave... and now here we have something that will write the precise SQL statements needed to make it happen for you. That is, in a word, wonderful, if you ask me.
In practice, it means those many sites who have not implemented materialised views as well or as comprehensively as they could have done because of the complexity of doing so now have no excuse... and can expect a performance dividend accordingly.
And one final word: remember that the before you can make use of the DBMS_ADVISOR package, you need to be granted the new 10g ADVISOR system privilege (it’s number -263 in the SYSTEM_PRIVILEGE_MAP view if you’re interested!)
3.0 Miscellaneous Enhancements
3.1 REWRITE_OR_ERROR Hint
The other enhancements to materialised views that 10g brings are, by comparison, and in my estimation, much less significant, so I won’t spend too much time discussing them here. One of the nice ones is this:
select /*+ REWRITE_OR_ERROR */ count(*) from ob where owner='SCOTT';
That’s a new hint which requires of the optimiser that it either re-writes a query so that it definitely uses the materialised view, or it raises an error and aborts the whole thing. Thus, using one of my earlier examples, I get this:
SQL> select /*+ REWRITE_OR_ERROR */ count(*) from ob where owner='SCOTT';
COUNT(*)
----------
10
1 row selected.
...and because of the hint there, I know without even looking that this query must have been re-written to use the OB_SUM materialised view. A finalbit of proof:
SQL> insert into ob (owner, object_name, subobject_name, object_id)
2 values ('SCOTT','TESTINSERT','NONE',98380);
1 row created.
SQL> select /*+ REWRITE_OR_ERROR */ count(*) from ob where owner='SCOTT';
select /*+ REWRITE_OR_ERROR */ count(*) from ob where owner='SCOTT'
*
ERROR at line 1: ORA-30393: a query block in the statement did not rewrite
Disconnected from Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
Here, I’ve inserted another new row into the OB base table. That means (as we saw earlier) the materialised view is now stale and cannot be used for query rewrite purposes. When I therefore re-issue the hinted select statement, this time it raises an error... and unceremoniously dumps me back to the operating system whilst it’s at it, you might note. I’m not convinced that the hint should actually cause a session to die like that -to my way of thinking, that's a bit drastic!- but at least the base table wasn’t actually used for the query!
That is a fairly major new feature in some respects: what we have here is, for the first time, a mechanism which actively prevents the use of the base table for the query (in 9i, we only had a mechanism to prevent the use of the materialised view). Given that materialised views are likely to be created as summaries and aggregates of huge tables in a data warehouse, the ability to prevent an inadvertent access to the huge base tables is capable of saving a massive amount of unwanted/unnecessary work on the database. Definitely a good thing, therefore. Just be warned that if a query could potentially be re-written so that it refers to two or more materialised views, and the optimiser successfully uses just one of those materialised views, that counts as a successful re-write, and thus the error isn’t triggered... even though by not accessing the other materialised view, the query is now (presumably) doing massively expensive access to at least one base table. Still: half a loaf is better than no bread whatever, so let’s not complain.
3.2 More Fast Refreshes
As I’ve mentioned elsewhere in this article, it has always been trickier than it perhaps should have been to persuade a materialised view to fast refresh. All sorts of little things would stop them being able to do so: in 8i, for example, if you used the SQL keyword UNION in your materialised view’s defining select statement, that was enough to kill fast refreshing in its tracks.
That particular restriction on when you could fast refresh was lifted in 9i, and 10g continues the trend of removing obstacles which prevent fast refreshes from being possible. For example, a materialised view that joins a table to itself can now be fast refreshable; as can a materialised view built on a select statement that uses an inline view in the FROM clause of the defining select statement. And so on and on... I won’t bore you with the complete list of new occasions when fast refreshes are possible, because it’s tedious in the extreme, and they’re all accompanied by lots of caveats and provisos in any case.
The general point, I think, should simply be left at this: more materialised views will be able to fast refresh in 10g than in any previous Oracle version. That’s good enough for me, at any rate.
3.3 Reliable and Cheaper Refreshes
Refreshing a materialised view is, almost by definition, an expensive job: we create them in the first place because the query they represent is considered expensive, yet refreshing them must inevitably require us to re-execute precisely that same expensive query! This can be extremely bad news indeed when you set about creating multiple summarisations of your base data -in other words, when you create multiple materialised views on the same base tables- because each and every one of them, when they refresh, will have to perform what we can assume will be massively expensive queries.
At least, that was the way things were in 9i. Fortunately, the expense issue is somewhat alleviated in 10g because it is now possible for a materialised view to be refreshed by having the query it is built on re-written so that the data can come from other materialised views rather than direct from the expensive-to-query base tables.
The subtlety of this point is easily lost, so I’ll reiterate it in case you missed it! A materialised view is usually created as a selection of columns from base tables, but the optimiser in 10g is willing to re-writing the very query that defines the contents of materialised view so that fresh data can be sourced from other materialised views, not the base tables themselves.
If a table has several materialised views built on it, for example, each a different form of summarisation compared with the rest (imagine summarising sales data by day, week, month, quarter, state, region, store, salesman and so on), then potentially only one of them has to be refreshed by going back to the base table itself: all the others could conceivably get their fresh data from that one source-refreshed materialised view. If so, that would make refreshing the bulk of the materialised views enormously cheaper. Excellent news!
Naturally, there’s a catch! To know that a summarisation of sales data by week (say) can give you the data needed to refresh a summarisation of sales by month, you have to know that weeks are related to months. Oracle has no such chronological understanding innately -but you’ve been able to tell it about such relationships for many versions by creating objects called dimensions. Trouble is, dimensions are not enforced in the database: they merely declare that a relationship exists: nobody and nothing actually bothers to check that it’s true. Similarly, it has long been possible to create foreign and primary key constraints which aren’t actually enabled, but which are merely flagged as being ‘reliable’ (by setting the RELY flag for them, in fact!). That is another form of merely declaring a relationship or quality to exist, rather than having the database prove it for certain. Well, the catch with materialised views being able to refresh themselves from other materialised views is that -by default- they will ignore any relationship that is merely declared to exist. Unless it’s strictly enforced (and dimensions, for example, simply cannot be), then the optimiser won’t use that relationship to determine whether a query rewrite for refresh purposes can take place.
Put into simple words, that means a materialised view of sales by month would not be able to use a materialised view summarising sales by week as its source for a refresh because the dimension telling Oracle that weeks and months are related is ignored by the optimiser for the refresh operation.
That is good in one sense. An enforced relationship is known, absolutely, to be accurate and reliable. A trusted relationship is only as reliable as the DBA that set the RELY flag! If a materialised view were to use a trusted relationship as the basis of determining where it should get its fresh data from, and that relationship was in fact badly defined, or violated left, right and centre by the data, then the contents of the refreshed materialised view would end up logically corrupt. Using only enforced relationships means that can’t happen.
But the downside is that to definitely enforce a primary key, for example, you’re going to need an index: and in a Data Warehouse, that’s not going to be a trivial little index. It’s going to be a thumping great monster that consume huge amounts of disk space, and lots of CPU cycles as it is created and maintained. And as for dimensions: well, they are only ever trusted relationships, so the default, strict policy means they can never influence the refresh mechanism.
Fortunately, this behaviour is configurable, otherwise the new feature of materialised views refreshing other materialised views would be a pretty lame one. So, although the use of only genuinely-enforced relationships is the default behaviour in 10g, it can be changed like so:
create materialized view SALESMV
refresh complete on demand
using trusted constraints
as select weekno, sum(sales)
from sales
group by weekno;
...and the magic words there are USING TRUSTED CONSTRAINTS. That’s the opposite of the default setting, which is USING ENFORCED CONSTRAINTS.
Trying to sum up a subtle and complex point in as few words as possible, therefore: ENFORCED CONSTRAINTS is the default setting, and means materialised view refreshes only use totally, 100% reliable information as the source of their data when they get refreshed. But some relationships (dimensions are the classic example) can never be used to influence the query rewriting process with this set. And other relationships may be very costly to genuinely enforce (indexes on primary keys, for example). You gain reliability at the cost of performance, basically.
If you use TRUSTED CONSTRAINTS, by contrast, then things like dimensions really can be used to influence the rewriting process, and performance will probably be better without massive indexes (for example) slowing you down. Rewrites are more likely to happen at this setting, too, so maintenance of the materialised views themselves will be quicker and cheaper. But this setting does mean, remember, that the contents of the materialised views are only as good as the competence of the DBA who declares relationships to be trustworthy in the first place. If he or she gets it wrong, your materialised views are effectively corrupt and they’ll need to be rebuilt from scratch at some point (which is, of course, a very expensive operation).
Tuesday, October 6, 2009
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment