情况描述
客户报告数据库故障,新来的系统管理员误操作。删掉了一些文件。具体情况是:删掉了所有重要数据文件、所有控制文件。数据库原来是归档模式,用 RMAN (Recovery Manager) 备份数据,而 RMAN 使用控制文件。
幸运的是,最后一次 RMAN full 备份是包括了控制文件在内。系统没有设定自动备份控制文件。现在状况是数据库无法启动。
不用说,客户的备份方案不够完善,但是这时候再去说这些话责备用户有事后诸葛亮之嫌,”用户是上帝,不要去得罪他”。还有,客户有Full备份(虽然不是自动备份控制文件,这样无法用常规的恢复步骤来进行恢复)。这对我们来说是个绝对的好消息。
下面我们通过一次模拟操作来演示这个问题的解决办法。
背景知识
在Oracle 816 以后的版本中,Oracle 提供了一个包: DBMS_BACKUP_RESTORE.DBMS_BACKUP_RESTORE 包是由 dbmsbkrs.sql 和 prvtbkrs.plb 这两个脚本创建的. catproc.sql 脚本运行后会调用这两个包.所以是每个数据库都有的. 这个包是 Oracle 服务器和操作系统之间 IO 操作的接口. 由恢复管理器直接调用。这两个脚本的功能是内建到 Oracle 的一些库文件中的.
由此可见,我们可以在数据库 nomount 情况下调用这些 package ,来达到我们的恢复目的。在dbmsbkrs.sql 和 prvtbkrs.plb 这两个脚本中有详细的说明文档,出于篇幅问题,就不一一加以翻译了,但在下面会直接引用一些原文说明。
关键的内容有:
FUNCTION deviceAllocate( type IN varchar2 default NULL ,name IN varchar2 default NULL ,ident IN varchar2 default NULL ,noio IN boolean default FALSE ,params IN varchar2 default NULL ) RETURN varchar2;
-- Describe the device to be used for sequential I/O. For device types where -- only one process at a time can use a device, this call allocates a device -- for exclusive use by this session. The device remains allocated until -- deviceDeallocate is called or session termination. The device can be used -- both for creating and restoring backups. -- -- Specifying a device allocates a context that exists until the session -- terminates or deviceDeallocate is called. Only one device can be specified -- at a time for a particular session. Thus deviceDeallocate must be called -- before a different device can be specified. This is not a limitation since -- a session can only read or write one backup at a time. -- -- The other major effect of allocating a device is to specify the name space -- for the backup handles (file names). The handle for a sequential file does -- not necessarily define the type of device used to write the file. Thus it -- is necessary to specify the device type in order to interpret the file -- handle. The NULL device type is defined for all systems. It is the file -- system supplied by the operating system. The sequential file handles are -- thus normal file names. -- -- A device can be specified either by name or by type. -- If the type is specified but not the name, the system picks an -- available device of that type. -- If the name is specified but not the type, the type is determined -- from the device. -- If neither the type or the name is given, the backups are files in -- the operating system file system.
-- Note that some types of devices, optical disks for example, can be shared -- by many processes, and thus do not really require allocation of the device -- itself. However we do need to allocate the context for accessing the -- device, and we do need to know the device type for proper interpretation -- of the file handle. Thus it is always necessary to make the device -- allocation call before making most other calls in this package. -- -- Input parameters: -- type -- If specified, this gives the type of device to use for sequential -- I/O. The allowed types are port specific. For example a port may -- support the type "TAPE" which is implemented via the Oracle tape -- API. If no type is specified, it may be implied by specifying a -- particular device name to allocate. The type should be allowed to -- default to NULL if operating system files are to be used. -- -- name -- If specified, this names a particular piece of hardware to use for -- accessing sequential files. If not specified, any available -- device of the correct type will be allocated. If the device cannot -- be shared, it is allocated to this session for exclusive use. -- The name should be allowed to default to NULL if operating system -- files are to be used. -- -- ident -- This is the users identifier that he uses to name this device. It -- is only used to report the status of this session via -- dbms_application_info. This value will be placed in the CLIENT_INFO -- column of the V$SESSION table, in the row corresponding to the -- session in which the device was allocated. This value can also -- be queried with the dbms_application_info.read_client_info procedure. -- -- noio -- If TRUE, the device will not be used for doing any I/O. This allows -- the specification of a device type for deleting sequential files -- without actually allocating a piece of hardware. An allocation for -- noio can also be used for issuing device commands. Note that some -- commands may actually require a physical device and thus will get -- an error if the allocate was done with noio set to TRUE. -- -- params -- This string is simply passed to the device allocate OSD. It is -- completely port and device specific. -- -- Returns: -- It returns a valid device type. This is the type that should be -- allocated to access the same sequential files at a later date. Note -- that this might not be exactly the same value as the input string. -- The allocate OSD may do some translation of the type passed in. The -- return value is NULL when using operating system files.
PROCEDURE restoreControlfileTo(cfname IN varchar2);
-- This copies the controlfile from the backup set to an operating system -- file. If the database is mounted, the name must NOT match any of the -- current controlfiles. -- 从备份中恢复控制文件为操作系统文件. -- 如果数据库在 mount 状态,控制文件名字则必须要匹配当前控制文件的名字 -- Input parameters: -- cfname -- Name of file to create or overwrite with the controlfile from the -- backup set.
-- 备份集里的控制文件名字,创建或者覆盖控制文件. PROCEDURE restoreDataFileTo( dfnumber IN binary_integer ,toname IN varchar2 default NULL); -- -- restoreDataFileTo creates the output file from a complete backup in the -- backup set. -- 从完整的备份集里创建输出文件
如果您有兴趣可以去阅读一下这两个文件的注释说明.
解决过程
首先,用控制文件作数据库系统的全备份:
C:WUTemp>rman target /
Recovery Manager: Release 9.2.0.1.0 - Production.
Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.
connected to target database: DEMO (DBID=3272375326)
RMAN> run { 2> allocate channel C1 type disk; 3> backup full tag 'FullBackup' format 'd:\KDE\%d_%u_%s_%p.dbf' database include current controlfile; 4> sql ' alter system archive log current'; 5> release channel C1; 6> }
using target database controlfile instead of recovery catalog allocated channel: C1 channel C1: sid=15 devtype=DISK
Starting backup at 18-JUL-04 channel C1: starting full datafile backupset channel C1: specifying datafile(s) in backupset including current SPFILE in backupset including current controlfile in backupset input datafile fno=00001 name=D:\ORACLE\ORADATA\DEMO\SYSTEM01.DBF input datafile fno=00002 name=D:\ORACLE\ORADATA\DEMO\UNDOTBS01.DBF input datafile fno=00004 name=D:\ORACLE\ORADATA\DEMO\EXAMPLE01.DBF input datafile fno=00009 name=D:\ORACLE\ORADATA\DEMO\XDB01.DBF input datafile fno=00005 name=D:\ORACLE\ORADATA\DEMO\INDX01.DBF input datafile fno=00008 name=D:\ORACLE\ORADATA\DEMO\USERS01.DBF input datafile fno=00003 name=D:\ORACLE\ORADATA\DEMO\DRSYS01.DBF input datafile fno=00006 name=D:\ORACLE\ORADATA\DEMO\ODM01.DBF input datafile fno=00007 name=D:\ORACLE\ORADATA\DEMO\TOOLS01.DBF channel C1: starting piece 1 at 18-JUL-04 channel C1: finished piece 1 at 18-JUL-04 piece handle=D:\KDE\DEMO_01FR79OT_1_1.DBF comment=NONE channel C1: backup set complete, elapsed time: 00:01:17 Finished backup at 18-JUL-04
sql statement: alter system archive log current
released channel: C1
如上所示,我们做了一次数据库的Full备份.备份片中包括控制文件.注意上面输出内容的黑体部分.我们在后面的恢复操作中会用到.
模拟错误,关掉实例,删掉所有的控制文件和所有的.DBF文件。然后starup会看到如下的出错信息:
SQL> startup ORACLE instance started.
Total System Global Area 152115804 bytes Fixed Size 453212 bytes Variable Size 100663296 bytes Database Buffers 50331648 bytes Redo Buffers 667648 bytes ORA-00205: error in identifying controlfile, check alert log for more info
查看alert Log,应该是系统找不到控制文件.现在情形和客户问题一致.不过在继续讲述之前,我们还需要介绍一点背景知识.
我们首先尝试恢复控制文件:
SQL>startup force nomount;
SQL> DECLARE 2 devtype varchar2(256); 3 done boolean; 4 BEGIN 5 devtype:=sys.dbms_backup_restore.deviceAllocate(type=>'',ident=>'T1'); 6 sys.dbms_backup_restore.restoreSetDatafile; 7 sys.dbms_backup_restore.restoreControlfileTo (cfname=>'d:\oracle\Control01.ctl'); 8 sys.dbms_backup_restore.restoreBackupPiece(done=>done,handle=>'D:\KDE\DEMO_01FR79OT_1_1.DBF', params=>null); 9 sys.dbms_backup_restore.deviceDeallocate; 10 END; 11 /
PL/SQL procedure successfully completed.
OK,控制文件恢复完成.对以上内容的解释:
- 第五行 分配一个device channel,因为使用的操作系统文件,所以这里为空,如果是从磁带上恢复要用 “sbt_tape”;
- 第六行 指明开始restore ;
- 第七行 指出待恢复文件目标存储位置;
- 第八行 从哪个备份片中恢复;
- 第九行 释放设备通道.
不妨对以上操作的结果验证一下:
SQL> host dir d:\oracle Volume in drive D is DATA Volume Serial Number is DC79-57F8 Directory of d:\oracle
07/18/2004 09:08 PM <DIR> . 07/18/2004 09:08 PM <DIR> .. 06/08/2004 03:21 PM <DIR> admin 07/18/2004 09:08 PM 1,871,872 CONTROL01.CTL 07/16/2004 11:27 AM <DIR> ORA92 07/18/2004 09:02 PM <DIR> oradata
这样,我们成功的 restore 了控制文件 .如果控制文件在 Full 备份之后单独做的,接下来关掉实例,拷贝控制文件到具体位置,然后 rman 执行 restore database; 即可。
可是,我们这里的情况有些不同.
视丢失文件的情况而定,继续进行如下的恢复操作:
SQL> DECLARE 2 devtype varchar2(256); 3 done boolean; 4 BEGIN 5 devtype:=sys.dbms_backup_restore.deviceAllocate (type=>'',ident=>'t1'); 6 sys.dbms_backup_restore.restoreSetDatafile; 7 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>01,toname=>'d:\oracle\oradata\demo\SYSTEM01.DBF'); 8 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>02,toname=>'d:\oracle\oradata\demo\UNDOTBS01.DBF'); 9 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>03,toname=>'d:\oracle\oradata\demo\DRSYS01.DBF'); 10 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>04,toname=>'d:\oracle\oradata\demo\EXAMPLE01.DBF'); 11 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>05,toname=>'d:\oracle\oradata\demo\INDX01.DBF'); 12 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>06,toname=>'d:\oracle\oradata\demo\ODM01.DBF'); 13 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>07,toname=>'d:\oracle\oradata\demo\TOOLS01.DBF'); 14 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>08,toname=>'d:\oracle\oradata\demo\USERS01.DBF'); 15 sys.dbms_backup_restore.restoreDatafileTo(dfnumber=>09,toname=>'d:\oracle\oradata\demo\XDB01.DBF'); 16 sys.dbms_backup_restore.restoreBackupPiece(done=>done,handle=>'D:\KDE\DEMO_01FR79OT_1_1.DBF', params=>null); 17 sys.dbms_backup_restore.deviceDeallocate; 18 END; 19 /
PL/SQL procedure successfully completed.
–我们的情形是所有的数据文件都丢失了,那就如法炮制 ………..
–文件对应编号来自前面全备份时候的屏幕输出内容.所以,在备份的时候保留操作 Log 是个很好的习惯.
SQL> startup force mount; ORACLE instance started.
Total System Global Area 152115804 bytes Fixed Size 453212 bytes Variable Size 100663296 bytes Database Buffers 50331648 bytes Redo Buffers 667648 bytes Database mounted. SQL> Recover database using backup controlfile until cancel ; ORA-00279: change 243854 generated at 07/18/2004 20:57:03 needed for thread 1 ORA-00289: suggestion : D:\KDE\ARC00002.001 ORA-00280: change 243854 for thread 1 is in sequence #2 Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
D:\KDE\ARC00002.001 ORA-00279: change 244089 generated at 07/18/2004 20:58:18 needed for thread 1 ORA-00289: suggestion : D:\KDE\ARC00003.001 ORA-00280: change 244089 for thread 1 is in sequence #3 ORA-00278: log file 'D:\KDE\ARC00002.001' no longer needed for this recovery
Specify log: {<RET>=suggested | filename | AUTO | CANCEL} CANCEL Media recovery cancelled.
SQL> alter database open resetlogs; Database altered.
最后,不得不resetlogs .
然后,打扫战场,马上进行数据库的全备份。如果您是DBA的话,应该进一步制定并完善备份计划.亡羊补牢,为时未晚。
总结一下
- 控制文件在备份中意义重大,建议每次对其单独备份,如果数据库版本允许的话,应该设置为控制文件自动备 份。同时应该尽可能地增大CONTROL_FILE_RECORD_KEEP_TIME这个初始化参数的值。以便备份信息能更长时间的保留
- 应该制定比较完善的备份计划,否则备份计划一旦出现缺口,将可能给系统带来灾难.记住, "可能出错的地方一定会出错".
- 熟悉RMAN内部备份机制,对DBMS_BACKUP_RESTORE的用法有一定的掌握在关键时侯很有帮助.
- 备份脚本应该对Log重定向并保存.以便在出错的查找有用信息.
参考信息
RMAN Recovery Without Recovery Catalog or Controlfiles by Bonnie Bizzaro
dbmsbkrs.sql 和 prvtbkrs.plb 文件说明注释(可在你的系统 $ORACLE_HOME/rdbms/admin/中找到.)
相关链接
本文的更多信息,请看ITpub 论坛对此话题的讨论 – http://www.itpub.net/244345.html
本文的Blog讨论,请看CSDN 对此话题的讨论 – http://blog.csdn.net/fenng/archive/2004/07/19/44945.aspx
本文是对旧文章的整理.请看这篇文章的旧版本
RMAN, RMAN备份, RMAN恢复, 恢复
Oracle XE 自带的数据库如何创建的?
简单的说了一下Oracle 10g XE 的字符集问题, 我们接下来看看 Oracle 10g XE 自带的数据库如何创建的.
XE 用的是非常规手段: dbms_backup_restore…
Oracle XE 自带的数据库如何创建的?
简单的说了一下Oracle 10g XE 的字符集问题, 我们接下来看看 Oracle 10g XE 自带的数据库如何创建的.
XE 用的是非常规手段: dbms_backup_restore…