Tuesday, October 6, 2009

Data Pump: Export and Import

1.0 Introduction
Traditional export and import (`exp' and `imp') have not gone away: they are still there in 10g. But they have been augmented (superseded, perhaps) by EXPDP and IMPDP, with the extra `DP' in their executable names indicating that we are talking about the Export Data Pump and Import Data Pump utilities. The new utilities do very much what the traditional ones used to do... but they do them more efficiently and give the DBA a chance to exercise greater control and administrative management over them.
This short paper sets out to explain what the new utilities can offer that the old ones lacked, and to demonstrate, with practical worked examples, how they work.
2.0 Characteristics of Data Pump
The Data Pump utilities do the same job as the old EXP and IMP ones: extract data out of an Oracle database into a proprietary binary format, or load data into a database from a proprietary binary file. But they do so more effectively than their ancient cousins, for several reasons:
·    You can disconnect from a Data Pump job, leaving it running in the background whilst you get on with other work. Compare that to EXP or IMP, where your terminal/session was tied up until the export/import job had completed.
·    You can monitor a Data Pump job from within the database itself. There are data dictionary views to show you what DP jobs are running, what their names are, and how far along they've gotten. With traditional EXP/IMP, you could only tell the progress of a job by looking at the terminal/session it had taken over, and hoping that the screen displayed some activity.
·    DP Jobs are resumable. They keep tabs on where they've gotten to by populating a special table in the export/import data set, and thus if the job is abnormally terminated, it can resume from where it left off by reading the contents of this table.
·    DP Jobs are parallelisable. I can't recall a way of making EXP/IMP parallelise its operations, except by starting lots of separate export jobs, each with a different list of things to export; Data Pump, by contrast, can parallelise a single job, and easily.
·    Data Pump has an application programming interface (API) -which is a posh way of saying it works because there are a bunch of new packages and procedures in the database (DBMS_DATAPUMP, for example) to control it, and that in turn means that it is possible to write your own code to interface with it. Hence, programmatic control of DP with your own customised interfaces becomes possible (see the PL/SQL sketch just after this list).
·    Export Data Pump allows you to choose specific objects to export, and is able to export far more object types than just tables, indexes and so on.
·    Data Pump does its work entirely and only on the server itself. It can be invoked from a client, of course, but the processes will run, and the outputs will only ever be produced, on the server.
·    You can ask an Export Data Pump job to estimate how big the output file would be, without actually running it for real.
·    You can make an Export Data Pump output file version-specific. This is of no use in getting 10g data into a 9i database, of course, since the 9i database will not understand the version identifier. But in the future, porting data between 10g and (say) 12z will be a lot easier when the export utility itself makes sure the output it generates is version-identifiable.
·    Potentially one of the most useful features is that it is now possible to import data into one database directly from another one over a database link (that is, over the network). Doing that in 9i or earlier, you would have exported out of one and generated a dump file; then you would have moved the dump file between the two servers; then you would have run import on the other database. With Data Pump running in Network Mode, you simply extract the data directly over the database link: there are no intermediate files.
·    A Data Pump import job can change tablespace names found in the source dump file set... so there should be no more mucking around trying to edit a binary file, or fiddling with user permissions and default tablespaces, simply to get a table created in a tablespace with a different name than it came from.
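To give a flavour of that API point, here is a minimal PL/SQL sketch, not a definitive recipe: the job name, file names and the DUMPHERE directory object are my own choices for illustration, and the sketch assumes the invoking user can read and write that directory object. It starts a schema-mode export of SCOTT and then detaches, leaving the job running on the server:

DECLARE
  h NUMBER;
BEGIN
  -- Open a schema-mode export job (the job name is my own choice)
  h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT',
                          job_mode  => 'SCHEMA',
                          job_name  => 'SCOTT_API_EXP');
  -- Dump file and log file both go to the DUMPHERE directory object
  DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'scott_api.dmp',
                         directory => 'DUMPHERE');
  DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'scott_api.log',
                         directory => 'DUMPHERE',
                         filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE);
  -- Restrict the job to the SCOTT schema
  DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR',
                                value => 'IN (''SCOTT'')');
  DBMS_DATAPUMP.START_JOB(h);
  -- Let go of the job: it carries on running in the background on the server
  DBMS_DATAPUMP.DETACH(h);
END;
/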
Oracle invented DBMS_METADATA in version 9i, and although it was a nice way of extracting the DDL needed to re-create a table or an index (say), you had to wonder why they bothered with it, since it didn't exactly lead on to anything else very major. But the reason becomes obvious in 10g, because it is the DBMS_METADATA package which Data Pump uses to extract the DDL for objects out of the database and into the dump file set. Data Pump actually uses its GET_XML procedure, too, so the DDL inside a dump file set is actually stored in XML format.
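You can, in fact, try DBMS_METADATA out for yourself to get a feel for the sort of DDL Data Pump harvests (the choice of Scott's EMP table here is purely for illustration); the second query returns the same definition in the XML form that actually ends up inside a dump file set:

SQL> set long 5000
SQL> select dbms_metadata.get_ddl('TABLE','EMP','SCOTT') from dual;
SQL> select dbms_metadata.get_xml('TABLE','EMP','SCOTT') from dual;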
Every Data Pump job causes a table to be created in the schema of the user who is performing the job and with a name that is the same as the Data Pump job name (which can be system-generated if you don't supply one). This table is known generically as the `Master Table' or MT for short (and in the Oracle documentation). The Master Table is used to keep track of where a particular Data Pump job has gotten to, and is thus the mechanism which makes resumable Data Pump jobs possible. When the job finishes, the table is dropped. If the job aborts abnormally, the table persists and may need to be manually dropped (assuming you don't want to restart the job, of course).
If you already own a real table which happens to have the same name as the name you give to a Data Pump job, then that job will not even start, because Oracle won't be able to create the Master Table for the proposed job due to the potential duplicate table names.
Data Pump can output data using the direct path mode, or via external tables. It will usually choose the direct path option, but may elect to use external tables whenever it encounters the following:
·    Tables which use row-level security to limit what users can insert or select
·    Clustered tables (Index clusters, hash clusters)
·    Tables with triggers
·    Tables that have BFILE data type columns
·    Tables with foreign key constraints
Because Data Pump runs entirely on the server, it will output to a DIRECTORY object (which obviously needs to be created first). It can also output to a default directory object (which still needs to be created first, of course), specified with an environment variable DATA_PUMP_DIR.
For example:
SQL> create directory DUMPHERE as 'c:\dumpdir\mydump';
Directory created.

SQL> exit

C:\>expdp scott/tiger directory=DUMPHERE dumpfile=file%U.dmp full=y

Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 10:56
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SCOTT"."SYS_EXPORT_FULL_01": scott/******** directory=DUMPHERE dumpfile=file%U.dmp full=y
Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
In this case, everything ends up being written to wherever the DUMPHERE directory object points:
C:\dumpdir\mydump>dir

Volume in drive C is W2K
Volume Serial Number is A0E3-CE0E
Directory of C:\dumpdir\mydump
07/02/2005  10:57a      <DIR>          .
07/02/2005  10:57a      <DIR>          ..
07/02/2005  10:59a               2,846 export.log
07/02/2005  10:57a               4,096 FILE01.DMP
               2 File(s)          6,942 bytes
               2 Dir(s)   6,969,552,896 bytes free
That includes the dump file itself (note the %U in the requested file name has been converted into a unique, two-digit, sequential number), and the log file, which is a record of how the Data Pump job went. Incidentally, the %U is only unique within the one Data Pump job: if another job is started, its %U will initialise at `01', not at whatever number the other job's %U had gotten to. That could mean one job proposing a file name that's actually the same as one already created by another job -at which point, the second job will either fail to start, or will abort.
Alternatively, I could do this:
C:\>set DATA_PUMP_DIR=DUMPHERE

C:\>expdp scott/tiger dumpfile=file%U.dmp full=y

Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 11:15
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SCOTT"."SYS_EXPORT_FULL_01": scott/******** dumpfile=file%U.dmp nologfile=y full=y
Estimate in progress using BLOCKS method...
This time, I've set the DATA_PUMP_DIR environment variable, so I don't have to then specify a DIRECTORY parameter: I merely specify the bare file name component as before, and the environment variable kicks in to act as a default DIRECTORY setting.
If you wish different export files to be written to different locations, then assuming you have the directory objects created first, you could use this sort of syntax:
SQL> create directory LOGDUMP as 'c:\dumpdir';
Directory created.

SQL> exit

C:\>expdp scott/tiger dumpfile=dumphere:file%U.dmp logfile=logdump:log%U.log full=y

Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 11:25
Copyright (c) 2003, Oracle. All rights reserved.
FLASHBACK automatically enabled to preserve database integrity.
Starting "SCOTT"."SYS_EXPORT_FULL_01": scott/******** dumpfile=dumphere:file%U.dmp logfile=logdump:log%U.log full=y
Estimate in progress using BLOCKS method...
Here, the actual data is being sent to one directory object, and the log file is being sent to quite another using a DIRECTORY:FILENAME syntax. Both, however, are still referencing directory objects which were already in existence inside the database and to which Scott, the database user, had been granted read and write permissions.
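If those grants need doing explicitly, it's a one-liner per directory object. A sketch, using the earlier DUMPHERE object and assuming you're connected as a suitably privileged user (SYSTEM, say):

SQL> grant read, write on directory DUMPHERE to scott;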
What you absolutely cannot do is something like this:
C:\>expdp scott/tiger dumpfile=C:\dumpdir\mydump:file%U.dmp full=y

Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 11:07
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-39088: file name cannot contain a path specification
As the error message indicates, paths cannot be specified at the time of doing the Data Pump job, so my request for the export to go to C:\anywhere at all is bound to fail. This is a very important security measure, because with traditional export, if you had rights to connect to the database at all, you could export the database's data to pretty much anywhere -to a USB key, for example... at which point the DBA just has to watch helplessly as sensitive data leaves the building. Here is an export mechanism, however, that ensures that data can only be exported to designated locations (wherever the directory objects created earlier point to) -and since the DBA can grant read and write permissions for those directory objects to database users, the export can only be done by designated users. Owning tables these days does not, in other words, give you the right to datapump the data in those tables to anywhere you choose.  
3.0 Other Key Data Pump Functionality
You can obtain a list of all the parameters that can be specified for a Data Pump job by doing this:
expdp help=y
or
impdp help=y
Some of the parameters you'll find particularly useful are:
3.1 FILESIZE
This parameter allows you to limit the size of an export dump file to a specified number of bytes. When the export hits this size, it increments the value for %U (if specified) and keeps going by generating a new output file. Thus, you might end up with a series of files called, say, FILE01.DMP, FILE02.DMP and so on. If %U wasn't specified, the utility will simply sit there, prompting you to supply a new file name.
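A minimal sketch of the idea (the 100MB figure is an arbitrary choice; the %U lets the new files be named automatically as each one fills up):

C:\>expdp scott/tiger directory=DUMPHERE dumpfile=file%U.dmp filesize=100MB full=y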
3.2 PARALLEL
Quite a key enhancement, of course: this parameter allows you to specify a number of slave processes which will simultaneously export the requested data. If you set this greater than 1, then that many export output files will immediately be created to receive the output, assuming the use of the %U parameter in the file naming part of the invoking command. If many different directory locations are specified for the different output files, then they are used in round robin fashion.
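For example, to get four slave processes working on the one export, you might try something like this (four is just a number picked for illustration, and the %U in the dump file name is what lets each slave write to its own file):

C:\>expdp scott/tiger directory=DUMPHERE dumpfile=file%U.dmp parallel=4 full=y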
3.3 EXCLUDE / INCLUDE
Allows you to specify which object types you definitely want in the export, or definitely kept out of it. For example:
INCLUDE=TABLE:"LIKE `EMP%'"
...would mean `only export tables with names beginning with EMP'. And:
EXCLUDE=INDEX:"='PK_EMP'"
EXCLUDE=PACKAGE
EXCLUDE=VIEW
...would mean `Don't export an index called PK_EMP and don't include any packages or views, whatever they're called'. For logical reasons, you cannot combine INCLUDE and EXCLUDE in the same Data Pump job. If you put all four of the above parameters in the same job, you'd be asking to `export only tables called EMP, and please don't include indexes, packages or views either'... but those indexes, packages and views weren't going to be included in the first place, because the INCLUDE clause, er, excludes them from consideration! To protect you from such logical nonsense, Oracle doesn't let you mix the two parameters in the one job in the first place. You'll get this sort of error message if you try:
UDE-00011: parameter include is incompatible with parameter exclude
When you specify these parameters, be careful of the double quotation marks. At the command line, they will need to be escaped like so:
expdp scott/tiger dumpfile=dumphere:file%U.dmp schemas=SCOTT exclude=TABLE:\"='EMP'\", EXCLUDE=FUNCTION:\"='WHOAMI'\", EXCLUDE=VIEW:\"='XXX'\"
If the escape characters make it barely readable, that's what a parameter file is for! (That is, create a text file with the parameters specified without the escape characters, and then invoke the Data Pump job with the parameter PARFILE='name_of_text_file_here').
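As a sketch, the escaped command above could instead become a plain text file (I'll call it exclude.par, but the name is entirely up to you) containing the parameters without any escape characters:

DUMPFILE=dumphere:file%U.dmp
SCHEMAS=SCOTT
EXCLUDE=TABLE:"='EMP'"
EXCLUDE=FUNCTION:"='WHOAMI'"
EXCLUDE=VIEW:"='XXX'"

...after which the job is invoked with nothing more than:

C:\>expdp scott/tiger parfile=exclude.par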
One final example:
expdp scott/tiger dumpfile=dumphere:file%U.dmp schemas=SCOTT include=TABLE:\"like'EMP%'\", INCLUDE=FUNCTION
This means the dump file contains only tables whose names begin with EMP (which, in Scott's schema, means just EMP itself), as well as all functions owned by Scott, regardless of their name. I stress this point, because the Oracle documentation is wrong on it. It states that the above example would cause tables with names starting with EMP to be exported, along with any functions owned by Scott that also have names starting with EMP. That is not the case: the above example really does export the EMP table, but it also exports a function I deliberately created called WHOAMI -which most definitely doesn't have a name starting with EMP!! The INCLUDE and EXCLUDE type and name clauses are thus independent of each other: a name specified for a table include does not apply to a package include, for example.
3.4 REMAP
This is specific to Data Pump Import. It is what allows you to import a table into a different tablespace from the one it was originally exported from. As a worked example, let's see where Scott's EMP table was originally created:
SQL> select tablespace_name from dba_tables
   2 where table_name='EMP';

TABLESPACE_NAME
------------------------------
USERS
Left to its own devices, this is where Data Pump Import would re-create the table if asked:
SQL> drop table EMP;
Table dropped.

C:\>impdp scott/tiger dumpfile=FILE01.DMP tables='EMP'

Starting "SCOTT"."SYS_IMPORT_TABLE_01": scott/******** Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
. . imported "SCOTT"."EMP" 7.820 KB 14 rows
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
Processing object type [...]

C:\>sqlplus scott/tiger

SQL> select tablespace_name from dba_tables
   2 where table_name='EMP';

TABLESPACE_NAME
------------------------------
USERS
But using the REMAP parameter, I can change this default behaviour:
SQL> drop table EMP;
Table dropped.

C:\>impdp scott/tiger dumpfile=FILE01.DMP tables='EMP' REMAP_TABLESPACE='USERS:EXAMPLE'

Import: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 12:42
[snip the feedback]

C:\>sqlplus scott/tiger

SQL> select tablespace_name from dba_tables
   2 where table_name='EMP';

TABLESPACE_NAME
------------------------------
EXAMPLE
So now the table is happily located in its new tablespace -and that's a piece of functionality people have been requesting for absolutely ages, so it's nice to see it there, finally!
You can also REMAP_DATAFILE. That's particularly helpful when you transfer between databases on different platforms: your original create tablespace X datafile 'c:\wherever\x.dbf' commands can be automatically altered to now read create tablespace X datafile '/u01/app/oracle/10g/x.dbf' and so on. You can use the feature to transform any datafile path and name into any other path and name, though, so it's not just for going cross-platform.
Finally, there is also a REMAP_SCHEMA parameter to alter ownership of objects -rather like the old FROMUSER and TOUSER parameters in traditional Import.
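A minimal sketch of REMAP_SCHEMA, re-using the earlier dump file (the SYSTEM credentials are a placeholder, since importing into somebody else's schema needs a suitably privileged account, and the FRED user is assumed to exist already):

C:\>impdp system/password directory=DUMPHERE dumpfile=FILE01.DMP tables=scott.emp remap_schema=SCOTT:FRED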
4.0 Data Pump Job Names
When you start a Data Pump job, it is either assigned a system-generated name, or you assign it one yourself. The examples I've been using so far have all relied on system-generated names. Remembering that the all-important master table gets created with the same name as the job, you can see what job name your Data Pump job got given like so:

[In session 1]
SQL> select table_name from user_tables;

TABLE_NAME
------------------------------
SALGRADE
BONUS
EMP
DEPT

[In session 2]
C:\>expdp scott/tiger dumpfile=file%U.dmp full=y
[...]
Starting "SCOTT"."SYS_EXPORT_FULL_01": scott/*** dumpfile=file%U.dmp full=y
Estimate in progress using BLOCKS method...
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA

[Back in session 1]
SQL> connect scott/tiger
Connected.

SQL> select table_name from user_tables;

TABLE_NAME
------------------------------
SYS_EXPORT_FULL_01
SALGRADE
BONUS
EMP
DEPT
...and you'll note, I'm sure, the immediate existence of a new table, called something like SYS_EXPORT_FULL_nn. That is a table name, but it is also the name of the actual export job started in Session 2, as you can check directly by querying a view new to 10g:
SQL> select owner_name, job_name, operation, job_mode
   2 from dba_datapump_jobs;

OWNER_NAME JOB_NAME           OPERATION      JOB_MODE
---------- ------------------ -------------- ---------
SCOTT      SYS_EXPORT_FULL_01 EXPORT         FULL
That view, however, as you might expect only contains rows when a Data Pump job is actually running. Once the thing has finished, the rows are cleared out (and the Master Table in the user's schema should also be dropped).
If you want to assign your own names to Data Pump jobs, then you do so by specifying the JOB_NAME parameter at the time of starting the job -but be careful in your choice of name:
C:\>expdp scott/tiger dumpfile=file%U.dmp full=y JOB_NAME='DP-JOB-01'
C:\>sqlplus scott/tiger

SQL> select owner_name, job_name, operation, job_mode
  2 from dba_datapump_jobs;

OWNER_NAME JOB_NAME          OPERATION      JOB_MODE
---------- ------------------ -------------- ---------
SCOTT      DP                EXPORT         FULL

SQL> select table_name from user_tables;

TABLE_NAME
------------------------------
DP
SALGRADE
BONUS
EMP
DEPT
Table names in Oracle cannot have hyphens in them at the best of times -so specifying them as part of my Data Pump export job name causes some problems. In this case, Oracle simply trimmed off everything from the first hyphen onwards, and the job actually proceeded to a successful completion. But imagine what would have happened if I'd called my job something like `EMP-DPUMP-JOB1':
C:\>expdp scott/tiger dumpfile=file%U.dmp full=y JOB_NAME='EMP-DPUMP-JOB1'

Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
ORA-31626: job does not exist
ORA-31633: unable to create master table "SCOTT.EMP"
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT", line 701
ORA-00955: name is already used by an existing object
The proposed name gets truncated to something which implies creating a table called `EMP' ...and we already have one of those, so the whole thing goes horribly wrong!
5.0 Managing Data Pump Jobs
One of the key new features Data Pump has over traditional export and import is that the jobs run entirely on the server... so the client can disconnect at any time, and leave the job running in the background. This also gives you the ability to re-connect to an already-running job, and the further capability of being able to stop and re-start jobs as necessary.
5.1 Detaching from a Job
To detach from a Data Pump job, simply press Ctrl+C -though don't do that too early in the piece! Once the job has at least begun processing, though, it's fine:
C:\>expdp scott/tiger dumpfile=file%U.dmp full=y
Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 13:51
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SCOTT"."SYS_EXPORT_FULL_01": scott/******** dumpfile=file%U.dmp full=y
Estimate in progress using BLOCKS method...

[Ctrl+C pressed at this point]
Export>
You'll notice that this takes you to an `Export>' prompt. Here you can type various commands without disturbing the progress of the job you've already initiated, such as:
Export> exit_client
C:\>
...which obviously gets you back to the command line. You can check that the job is still running in the background quite happily by querying either the DBA_DATAPUMP_JOBS view I showed you earlier, or this one:
SQL> select opname, target_desc, sofar, totalwork
  2  from v$session_longops;

OPNAME                         TARGET_DES      SOFAR TOTALWORK
------------------------------ ---------- ---------- ----------
SYS_EXPORT_FULL_01             EXPORT            70         70
SYS_EXPORT_FULL_01             EXPORT            70         70
DP                             EXPORT            70         70
SYS_EXPORT_FULL_01             EXPORT            71         71
SYS_EXPORT_FULL_01             EXPORT            26         71
This view is not new in 10g, but it has been given the new ability to show the progress of Data Pump jobs. A lot of the rows you see here represent old, finished jobs (the rows persist until the instance is restarted), but the last one certainly shows that the job has done 26MB of a 71MB job (there is a column called UNITS which you can select to see what the numbers in the SOFAR and TOTALWORK columns actually represent). Be aware, incidentally, that V$SESSION_LONGOPS takes a while to get warmed up: a job can be running for many minutes before any rows appear in the view to record the fact it ever started... and the view will probably report there is still quite a lot of work to do long after the job has actually finished. So there's a bit of a time lag, but it's still mostly useful.

To pick out just your job amongst many possible candidates, you will need to add a WHERE clause to the query, selecting the SID and SERIAL# of the Data Pump client session, which you could determine by a prior glance at V$SESSION. Actually, the simpler approach is to always run Data Pump jobs as a dedicated user specially created for the occasion -then you can simply join V$SESSION_LONGOPS with V$SESSION and filter on V$SESSION's USERNAME column.
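A sketch of that join (DP_USER is a made-up name for such a dedicated account; substitute whatever user your Data Pump jobs actually run under):

SQL> select l.opname, l.sofar, l.totalwork, l.units
  2  from v$session_longops l, v$session s
  3  where l.sid = s.sid
  4  and l.serial# = s.serial#
  5  and s.username = 'DP_USER';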
5.2 Re-attaching to a Job
You may wish to re-attach to a running job in order to be able to stop it. You can do so relatively easily:
C:\>expdp scott/tiger attach=SYS_EXPORT_FULL_01
Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 14:08
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
Job: SYS_EXPORT_FULL_01
Owner: SCOTT
Operation: EXPORT
Creator Privs: FALSE
GUID: 65C413248E824D45A3E42B5704B93A13
Start Time: Monday, 07 February, 2005 14:08
Mode: FULL
Instance: sales
Max Parallelism: 1
EXPORT Job Parameters:
Parameter Name Parameter Value:
CLIENT_COMMAND scott/******** dumpfile=file%U.dmp full=y
DATA_ACCESS_METHOD AUTOMATIC
ESTIMATE BLOCKS
INCLUDE_METADATA 1
LOG_FILE_DIRECTORY DUMPHERE
LOG_FILE_NAME export.log
TABLE_CONSISTENCY 0
State: EXECUTING
Bytes Processed: 0
Current Parallelism: 1
Job Error Count: 0
Dump File: c:\dumpdir\mydump\file%u.dmp
Dump File: C:\DUMPDIR\MYDUMP\FILE01.DMP
bytes written: 4,096
Worker 1 Status:
State: EXECUTING
Export>
As you can see, all that is needed is to invoke the relevant Data Pump executable with an ATTACH= parameter. That parameter then accepts a Data Pump job name as its argument -though if there is only one Data Pump job running in the schema you're connecting to, then you can actually get away simply with something like expdp scott/tiger attach. Incidentally, the above example was done when I'd forgotten to disconnect from the job in the original command line session -so one can conclude that you can attach to the same job several times without causing too much trouble.
5.3 Stopping a Job
One of the main reasons you'd want to re-attach to a job would be to stop it (having realised that all the wrong parameters were supplied in the first place, perhaps; or because you hadn't realised quite what a devastating effect on your other users the massive select activity of the export would have). That's easy, too:
C:\>expdp scott/tiger attach

Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 14:16
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
Job: SYS_EXPORT_FULL_01
Owner: SCOTT
Operation: EXPORT
[snip for brevity's sake]
Export> stop_job
Are you sure you wish to stop this job ([y]/n): y
C:\>
This time, I've used the bare `attach' parameter, with no job name specified -yet, as you can see, because Scott's only got one job running, there's no problem connecting to it. To stop the job, I then simply type the command STOP_JOB at the Export> prompt. Confirmation is requested, after which I am dropped back to the command prompt.
A job that is stopped in this way doesn't simply blow up. It finishes the bit that it's currently processing, and therefore can be said to stop `gracefully'. If you genuinely need the job to simply blow up, then you can use the command STOP_JOB=IMMEDIATE. Assuming you've gone for the graceful option, however, you could then do this:
C:\>expdp scott/tiger attach

Job: SYS_EXPORT_FULL_02
[snip]
EXPORT Job Parameters:
Parameter Name Parameter Value:
CLIENT_COMMAND scott/******** dumpfile=file%U.dmp full=y
DATA_ACCESS_METHOD AUTOMATIC
ESTIMATE BLOCKS
INCLUDE_METADATA 1
LOG_FILE_DIRECTORY DUMPHERE
LOG_FILE_NAME export.log
TABLE_CONSISTENCY 0
State: STOP PENDING
Notice the last line I've shown here (in reality, it's buried half-way through a much longer set of feedback): the request to stop is `pending'. Give it enough time, and a gracefully-stopped job will indeed terminate, and you'll be able to verify that by re-attaching to it:
C:\>expdp scott/tiger attach

Job: SYS_EXPORT_FULL_02 [snip]
Parameter Name Parameter Value:
CLIENT_COMMAND scott/******** dumpfile=file%U.dmp full=y
DATA_ACCESS_METHOD AUTOMATIC
ESTIMATE BLOCKS
INCLUDE_METADATA 1
LOG_FILE_DIRECTORY DUMPHERE
LOG_FILE_NAME export.log
TABLE_CONSISTENCY 0
State: IDLING
Bytes Processed: 0
Current Parallelism: 1
Job Error Count: 0
Dump File: c:\dumpdir\mydump\file%u.dmp
Dump File: c:\dumpdir\mydump\file01.dmp
bytes written: 4,096
Worker 1 Status:
State: UNDEFINED
And this time, the Job's "State" is declared to be `idling', and the state of the various server processes and slaves which used to be running it is shown to be `undefined'. Watch out for how you attempt to re-attach to stopped jobs. On assorted Windows machines, I repeatedly got this sort of problem:
C:\>expdp scott/tiger attach

Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 15:03
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
ORA-31626: job does not exist
ORA-31630: a job name is required to attach a job for user SCOTT

C:\>expdp scott/tiger attach=SYS_EXPORT_FULL_03
Export: Release 10.1.0.2.0 - Production on Monday, 07 February, 2005 15:04
Copyright (c) 2003, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
Job: SYS_EXPORT_FULL_03
Owner: SCOTT
That is, the attempt to attach to a non-named job failed (even though there was definitely only one job running in the schema at the time) -but an attempt to attach to a named job immediately afterwards worked fine. I'm not yet sure whether this is just me, just Windows, or a timing issue generally. But if in doubt, specify a job name, and it seems always to work.
5.4 Restarting a Job
A job which is stopped gracefully, in short, doesn't simply cease to exist. Instead, it is still there, idling, and you can attach to it at any time to re-start it. You do so as follows:
C:\>expdp scott/tiger attach

[snip]

Job: SYS_EXPORT_FULL_02
Owner: SCOTT
Operation: EXPORT
[snip]
State: IDLING
Bytes Processed: 0
Current Parallelism: 1
Job Error Count: 0
Dump File: c:\dumpdir\mydump\file%u.dmp
Dump File: c:\dumpdir\mydump\file01.dmp
bytes written: 4,096
Worker 1 Status:
State: UNDEFINED
Export> parallel=4
Export> start_job
Export> exit_client
The re-attachment to the idle job should by now be standard fare. What's new here is the ability to specify new parameters at the Export> prompt and to then be able to issue the START_JOB command. That resumes the job from where it left off at the point it was stopped. As in this example, one of the main reasons you might have for wanting to stop and re-start a job in this way is when you realise you forgot to set a decent degree of parallelism, and that the original job, left to its own devices, is therefore not going to finish until sometime next Christmas. There are limits to the sort of parameters you can modify like this, though: attempting to add a new EXCLUDE or an INCLUDE parameter, for example, is pushing it too far:
Export> EXCLUDE=TABLE
UDE-00053: unrecognized client command 'EXCLUDE=TABLE'
...and that's for good reason, obviously, since the job could well have already exported many tables which you are now claiming you want ignored. If that's the sort of thing you're after, you don't want to stop and resume a job, you want to kill it off entirely, and start a completely new one... for which piece of functionality, Oracle has thoughtfully provided the KILL_JOB command to be typed at the Export> prompt!
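Used at the same Export> prompt, it is as simple as this (it will ask for confirmation before going ahead, and be aware that killing a job is destructive: the master table is dropped and, as I understand it, the dump files written so far are removed too):

Export> kill_job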
6.0 Generating DDL Scripts
Good, old-fashioned import in 9i would allow you to supply an INDEXFILE parameter. This would cause import to generate a text file, located and called wherever that parameter specified, which would contain within it the SQL commands necessary to re-create the various tables and indexes that import job had been concerned with. The create table statements were all commented out in the indexfile, but a bit of text file editing would nevertheless have made the indexfile a workable DDL generation script... providing you weren't interested in creating anything other than indexes or tables, of course, because things like `create procedure' statements just didn't get included.

Data Pump import does a lot better than this, because when you run it you can specify the SQLFILE parameter, and that will cause a text file to be generated which has the necessary DDL in it to re-create all object types, not just tables and indexes. As an example:
C:\>impdp scott/tiger dumpfile=file%U.dmp sqlfile=SCOTT.SQL schemas=SCOTT

With the Partitioning, OLAP and Data Mining options
Master table "SCOTT"."SYS_SQL_FILE_SCHEMA_01" successfully loaded/unloaded
Starting "SCOTT"."SYS_SQL_FILE_SCHEMA_01": scott/******** dumpfile=file%U.dmp sqlfile=SCOTT.SQL schemas=SCOTT
Processing object type DATABASE_EXPORT/SCHEMA/USER
Processing object type DATABASE_EXPORT/SCHEMA/GRANT/SYSTEM_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/ROLE_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/DEFAULT_ROLE
Processing object type DATABASE_EXPORT/SCHEMA/PROCACT_SCHEMA
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/CONSTRAINT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/COMMENT
Processing object type DATABASE_EXPORT/SCHEMA/FUNCTION/FUNCTION
Processing object type DATABASE_EXPORT/SCHEMA/FUNCTION/ALTER_FUNCTION
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/REF_CONSTRAINT
Job "SCOTT"."SYS_SQL_FILE_SCHEMA_01" successfully completed at 15:58
In this example, the SQL file is sent to the same directory object that my dump files and log files are directed to (which happens, you'll recall, to be whichever directory object within the database the DATA_PUMP_DIR environment variable is set to). I did at least specify that the output file should be named `scott.sql', so using my trusty text editor of choice, I can open that file up and see something like this:
C:\dumpdir\mydump>type scott.sql | more

-- CONNECT SCOTT
-- new object type path is: DATABASE_EXPORT/SCHEMA/USER
-- CONNECT SYSTEM
CREATE USER "SCOTT" IDENTIFIED BY VALUES 'F894844C34402B67'
DEFAULT TABLESPACE "USERS"
TEMPORARY TABLESPACE "TEMP";
-- new object type path is: DATABASE_EXPORT/SCHEMA/GRANT/SYSTEM_GRANT
GRANT UNLIMITED TABLESPACE TO "SCOTT";
-- new object type path is: DATABASE_EXPORT/SCHEMA/ROLE_GRANT
GRANT "DBA" TO "SCOTT";
GRANT "CONNECT" TO "SCOTT";
GRANT "RESOURCE" TO "SCOTT";
-- new object type path is: DATABASE_EXPORT/SCHEMA/DEFAULT_ROLE
ALTER USER "SCOTT" DEFAULT ROLE ALL;
-- new object type path is: DATABASE_EXPORT/SCHEMA/PROCACT_SCHEMA
-- CONNECT SCOTT
BEGIN
sys.dbms_logrep_imp.instantiate_schema(schema_name=>'SCOTT', export_db_name=>'SALES', inst_scn=>'643491');
COMMIT;
END;
/
-- new object type path is: DATABASE_EXPORT/SCHEMA/TABLE/TABLE
CREATE TABLE "SCOTT"."DEPT"
( "DEPTNO" NUMBER(2,0),
"DNAME" VARCHAR2(14),
"LOC" VARCHAR2(13)
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS" ;
So the create table syntax is nice and clear and definitely not commented out! Later on in the same file, I see this:
-- new object type path is: DATABASE_EXPORT/SCHEMA/FUNCTION/FUNCTION
CREATE FUNCTION "SCOTT"."WHOAMI"
return number
is
mgrno number(4);
begin
select mgr# into mgrno from mgr
where ename = (select(sys_context('userenv','session_user')) from dual);
return mgrno;
end;
/
-- new object type path is: DATABASE_EXPORT/SCHEMA/FUNCTION/ALTER_FUNCTION
ALTER FUNCTION "SCOTT"."WHOAMI"
COMPILE
PLSQL_OPTIMIZE_LEVEL= 2
PLSQL_CODE_TYPE= INTERPRETED
PLSQL_DEBUG= FALSE
REUSE SETTINGS TIMESTAMP '2005-02-07:15:50:43'
/
And so we can see that a Data Pump SQLFILE does contain DDL for things other than basic tables and indexes. In this case, a function in Scott's schema is represented, together with the necessary SQL commands to ensure the re-created function is re-compiled properly. Given the number of newsgroup posts I used to respond to concerning the possibility of extracting non-table/non-index DDL from a pre-10g database, not to mention students on courses requesting the same thing, this SQLFILE ability of Data Pump must be considered as one of its highlights. It will surely make a lot of people very happy!
7.0 Data Pumping over the Network
One of the last things I'll mention about Data Pump is its ability to pass data directly over a network, so that data can be exchanged between two databases (production and development, say) without the need for intermediate disk files to be created first. That's useful because hammering the hard disk of a production database with lots of export-engendered I/O is not exactly a sensible thing to do... and never mind that the disk space needed for the export dump file might not have been available.
Piping the data straight over the network certainly avoids some of these issues (though your network bandwidth had better be up to the task, of course).
The technique works because Data Pump (both export and import) accepts a NETWORK_LINK parameter, which accepts as its value the name of a database link object already created in the database -and a database link itself uses a tnsnames alias, so all that sort of Oracle networking needs to be set up and functioning properly before pumping over the network can succeed.
To demo this, let's start with a clean slate. I have a database called SALES on server AUGUSTUS and a database called ORCL on server CLAUDIUS. In SALES, I have created the standard Scott schema, complete with EMP, DEPT and so on. In ORCL, I have no such schema. Here's how I would go about pumping the entire SCOTT schema into the ORCL database without generating a traditional dumpfile:
On ORCL (the 'destination' database, in a sense), I do the following:
SQL> connect / as sysdba
Connected.

SQL> create public database link sales
  2  connect to scott identified by tiger
  3  using 'sales';

Database link created.
The third line of this command implies that my tnsnames.ora includes an alias called sales, and this is indeed the case:
SALES =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = augustus)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = salesdb.dizwell.home)
    )
  )
Next, still connected as SYS on the ORCL database, I issue the following commands:
SQL> create user fred identified by wilma
  2  quota unlimited on users
  3  account unlock;

User created.

SQL> grant create session, create table to fred;

Grant succeeded.
And finally, I create a directory object in which the import's log file can be written:
SQL> create directory newdump as '/home/oracle';

Directory created.

SQL> grant read, write on directory newdump to fred;

Grant succeeded.
With those basic preparations out of the way, I can therefore do the 'pulling' of data out of SALES and into ORCL like so:
[oracle@claudius ~]$ impdp fred/wilma directory=newdump network_link=sales tables=emp,dept remap_schema='SCOTT:FRED'

Import: Release 10.2.0.1.0 - Production on Thursday, 20 July, 2006 11:06:17

Copyright (c) 2003, 2005, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Starting "FRED"."SYS_IMPORT_TABLE_01":  fred/******** directory=newdump network_link=sales tables=emp,dept remap_schema=SCOTT:FRED
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 128 KB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . imported "FRED"."DEPT"                                    4 rows
. . imported "FRED"."EMP"                                    14 rows
Processing object type TABLE_EXPORT/TABLE/INDEX/INDEX
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type TABLE_EXPORT/TABLE/CONSTRAINT/REF_CONSTRAINT
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "FRED"."SYS_IMPORT_TABLE_01" successfully completed at 11:07:00
The command I type has to mention a directory object, because the log file recording what happens during an import data pump has to be written to a physical file somewhere, even if the DDL and table data themselves don't. The NETWORK_LINK parameter tells import what database link to use, and thus (because of the database link's definition) which schema it's connecting to -in my case, Scott's. That is why I can just mention tables "emp, dept", without having to call them scott.emp or scott.dept: the schema bit of the table name is already implicit in the use of a database link. Finally, a neat bit of REMAP_SCHEMA makes it clear that what is owned by Scott on SALES should become owned by Fred on ORCL. 
Proof that all that worked, if any were needed, is readily obtainable:
[oracle@claudius ~]$ sqlplus fred/wilma

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Jul 20 11:09:43 2006

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select count(*) from emp;

  COUNT(*)
----------
        14
I find that pretty neat! What's even neater is that the entire thing is do-able the other way around. If I am connected to SALES rather than to ORCL, I can export across the network link and thus 'push' my data into ORCL, rather than be connected to ORCL 'pulling' the data out of SALES with import.
Anyone who needs to move large volumes of data between databases (data marts to warehouses, for example) will probably find this new 'diskless' mode of transporting data very useful indeed.

