数据如何在底层进行存储

每个表都有一个段头,段头里记录了哪些文件哪些 BLOCK 属于该表。我们可以通过以下 SQL 进行查询:

select HEADER_FILE,HEADER_BLOCK from dba_segments where segment_name='TB0101';

HEADER_FILE HEADER_BLOCK
----------- ------------
      7      146


ALTER system dump DATAFILE 7 block 146;
SELECT * FROM v$diag_info WHERE name ='Default Trace File';

看看段头块内容


*** 2025-06-26 17:56:38.789
Start dump data blocks tsn: 7 file#:7 minblk 146 maxblk 146
Block dump from cache:
Dump of buffer cache at level 4 for tsn=7 rdba=29360274
BH (0x68bdfc18) file#: 7 rdba: 0x01c00092 (7/146) class: 4 ba: 0x68902000
  set: 6 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25
  dbwrid: 0 obj: 87359 objn: 87359 tsn: 7 afn: 7 hint: f
  hash: [0x7d9a2fc0,0x7d9a2fc0] lru: [0x68bdfe40,0x68bdfbd0]
  ckptq: [NULL] fileq: [NULL] objq: [0x60cb1018,0x68bdf850] objaq: [0x60cb1008,0x68bdfc08]
  st: XCURRENT md: NULL fpin: 'ktswh03: ktscts' tch: 7
  flags: block_written_once redo_since_read
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [3]
Block dump from disk:
buffer tsn: 7 rdba: 0x01c00092 (7/146)
scn: 0x0000.000fb80e seq: 0x03 flg: 0x04 tail: 0xb80e2303
frmt: 0x02 chkval: 0xfeba type: 0x23=PAGETABLE SEGMENT HEADER
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00007FA2D9637200 to 0x00007FA2D9639200
7FA2D9637200 0000A223 01C00092 000FB80E 04030000  [#...............]
7FA2D9637210 0000FEBA 00000000 00000000 00000000  [................]
7FA2D9637220 00000000 00000001 00000008 00000A9C  [................]
7FA2D9637230 00000000 00000008 00000008 01C00098  [................]
7FA2D9637240 00000000 00000000 00000000 00000005  [................]
7FA2D9637250 00000000 00000000 00000000 00000000  [................]
7FA2D9637260 00000008 00000008 01C00098 00000000  [................]
7FA2D9637270 00000000 00000000 00000005 01C00090  [................]
7FA2D9637280 01C00090 00000000 00000000 00000000  [................]
7FA2D9637290 00000000 00000000 00000000 00000000  [................]
        Repeat 3 times
7FA2D96372D0 00000001 00002000 00000000 00001434  [..... ......4...]
7FA2D96372E0 00000000 01C00091 00000001 01C00090  [................]
7FA2D96372F0 01C00091 00000000 00000000 00000000  [................]
7FA2D9637300 00000000 00000000 00000001 00000000  [................]
7FA2D9637310 0001553F 10000000 01C00090 00000008  [?U..............]
7FA2D9637320 00000000 00000000 00000000 00000000  [................]
        Repeat 152 times
7FA2D9637CB0 01C00090 01C00093 00000000 00000000  [................]
7FA2D9637CC0 00000000 00000000 00000000 00000000  [................]
        Repeat 151 times
7FA2D9638640 00000000 00000000 01C00091 00000000  [................]
7FA2D9638650 00000000 00000000 00000000 00000000  [................]
        Repeat 185 times
7FA2D96391F0 00000000 00000000 00000000 B80E2303  [.............#..]
  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 1      #blocks: 8     
                  last map  0x00000000  #maps: 0      offset: 2716  
      Highwater::  0x01c00098  ext#: 0      blk#: 8      ext size: 8     
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 5     
  mapblk  0x00000000  offset: 0     
                   Unlocked
  --------------------------------------------------------
  Low HighWater Mark : 
      Highwater::  0x01c00098  ext#: 0      blk#: 8      ext size: 8     
  #blocks in seg. hdr's freelists: 0     
  #blocks below: 5     
  mapblk  0x00000000  offset: 0     
  Level 1 BMB for High HWM block: 0x01c00090
  Level 1 BMB for Low HWM block: 0x01c00090
  --------------------------------------------------------
  Segment Type: 1 nl2: 1      blksz: 8192   fbsz: 0      
  L2 Array start offset:  0x00001434
  First Level 3 BMB:  0x00000000
  L2 Hint for inserts:  0x01c00091
  Last Level 1 BMB:  0x01c00090
  Last Level II BMB:  0x01c00091
  Last Level III BMB:  0x00000000
     Map Header:: next  0x00000000  #extents: 1    obj#: 87359  flag: 0x10000000
  Inc # 0 
  Extent Map
  -----------------------------------------------------------------
   0x01c00090  length: 8     
  
  Auxillary Map
  --------------------------------------------------------
   Extent 0     :  L1 dba:  0x01c00090 Data dba:  0x01c00093
  --------------------------------------------------------
  
   Second Level Bitmap block DBAs 
   --------------------------------------------------------
   DBA 1:   0x01c00091
  
End dump data blocks tsn: 7 file#: 7 minblk 146 maxblk 146

是不是看到密密麻麻的字就头晕,不要着急,我们先主要看几个配置信息就行了。

L2 Hint for inserts: 0x01c00091 :位图块 L2 在那个块

type: 0x23=PAGETABLE SEGMENT HEADER:表明该块类型是段头块

Extent Map :记录了那些块属于该表

select 
    dbms_utility.DATA_BLOCK_ADDRESS_FILE(to_number(replace('0x01c00091', '0x', ''), 'xxxxxxxxxxx')) file_no, 
    dbms_utility.DATA_BLOCK_ADDRESS_BLOCK(to_number(replace('0x01c00091', '0x', ''), 'xxxxxxxxxxx')) block_no
from dual;

FILE_NO   BLOCK_NO
---------- ----------
	 7	  145

得到位图块 L2 的位置,那我们继续 dump 然后查看追踪文件,这里就只展示主要内容:

type: 0x21=SECOND LEVEL BITMAP BLOCK:表明该块是一个二级位图块

L1 Ranges : 0x01c00090 Free: 5 Inst: 1 指向具体的一级位图块管理者五个空闲块

那我们继续 dump 看 L1 的主要内容

--------------------------------------------------------

DBA Ranges :

--------------------------------------------------------

0x01c00090 Length: 8 Offset: 0

0:Metadata 1:Metadata 2:Metadata 3:unformatted

4:unformatted 5:unformatted 6:unformatted 7:unformatted

--------------------------------------------------------

可以看到 0-2 是存放元数据块,剩下的 5 个块为unformatted,表示没有格式化,意味着可以往里面插入数据,插入的时候就格式化,当插入一条数据后,会顺带把相邻的也进行格式化。

INSERT INTO TB0101 VALUES (13,'13_a','other_col...');

插入一条数据后 L1 的变化如下:

0:Metadata 1:Metadata 2:Metadata 3:75-100% free

4:75-100% free 5:50-75% free 6:75-100% free 7:75-100% free

可以看到,数据块 5 剩余大小跟其他块不一样,显然这条数据被插入到了数据块 5 里面,那么我们有什么方法进行验证呢?

如果我们想确定某一行的具体位置我们可以用什么,没错,就是 rowid。但是 rowid 的内容好像看不太懂,没事,我们有办法让它“现原形”。

SELECT a.*,rowid from tb0101 a;
|ID |NAME|OTHER_COL|rowid|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
|---|----|---|----|
|13 |13_a|other_col...|AAAVU/AAHAAAACVAAA|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
SELECT dbms_rowid.ROWID_RELATIVE_FNO('AAAVU/AAHAAAACVAAA') file_no,
dbms_rowid.ROWID_BLOCK_NUMBER('AAAVU/AAHAAAACVAAA') block_no,
dbms_rowid.ROWID_ROW_NUMBER('AAAVU/AAHAAAACVAAA') row_no
FROM dual;

|FILE_NO|BLOCK_NO|ROW_NO|
|-------|--------|------|
|7      |149     |0     |

该表是从 144 开始的,144+5=149,跟我们执行的结果是一样的。

接下来呢,我们将 bbed 工具来查看行数据在数据块中是怎么进行存储的,关于 bbed 工具的使用等有空再额外写一篇。我们来插入一些数据:

INSERT INTO TB0101 VALUES (14,'14_a','other_col...');
INSERT INTO TB0101 VALUES (15,'15_a','other_col...');
BBED> p *kdbr[0]  
rowdata[4028]
-------------
ub1 rowdata[4028]                           @6174     0x2c

BBED> p *kdbr[1]
rowdata[2014]
-------------
ub1 rowdata[2014]                           @4160     0x2c

BBED> p *kdbr[2]
rowdata[0]
----------
ub1 rowdata[0]                              @2146     0x2c

kdbr[0] = 8180 - 6174 = 2014
kdbr[1] = 6174 - 4160 = 2014
kdbr[2] = 4160 - 2146 = 2014

也就是说每条数据占的字节数是 2014 ,但是跟我想的不太一样,(14,'14_a','other_col...'),int 占 2 个字节,varchar 占 4 个字节,char 是 2000 个字节,那么应该是 2006 才对。

我们打印出存储的信息来分析看看,

BBED> x /rncc
rowdata[4028]                               @6174    
-------------
flag@6174: 0x2c (KDRHFL, KDRHFF, KDRHFH)
lock@6175: 0x01
cols@6176:    3

col    0[2] @6177: 13 
col    1[4] @6180: 13_a
col 2[2000] @6185: other_col...  

我们可以看到,除了我们存储的数据之后,还有一个行头部信息:

flag 表示标志位

KDRHFH :Head 行头,KDRHFL: LAST ROW PIECE,KDRHFF:FIRST ROW PIECE 表示一行的最开始和结尾都在整理,这说明这条数据是完整的。

lock 表示锁状态,cols 表示列数量。 这部分信息占 3 个字节。 2006 + 3 加起来还是没达到 2014,那剩下的

5 个字节是什么呢?我们可以看到 6180-6177 = 3,与 int 字节多了一个字节,多的这个字节表示列长。一般列长需要一个字节。但是如果列的长度超过了250个字节,列长就将为3个字节。正好就对应的上了。

3+1+2+1+4+3+2000 = 2014,到此一行记录是怎么存储的我们就清楚了,那我们接着往下学习。

flag@6174: 0x2c (KDRHFL, KDRHFF, KDRHFH)                         @4160     0x2c

普通的 row 的 flag 一般为

Single Row = ROW_HEAD_PIECE + ROW_FIRST_PIECE + ROW_LAST_PIECE

= 0x20 + 0x80 + 0x04

=0x2c

如果这行记录被删除了,此时

Single Row = ROW_HEAD_PIECE + ROW_FIRST_PIECE + ROW_LAST_PIECE + ROW_DELETED_ROW

= 0x20 + 0x80 + 0x04 + 0x10

= 0x3c

行迁移(HeadRID)只有ROW_HEAD_PIECE,0x20

被指向的行(NextRID),标记是 NONE,0x00。

BBED> p *kdbr[0]
rowdata[4028]
-------------
ub1 rowdata[4028]                           @6174     0x3c

BBED> x /rncc
rowdata[4028]                               @6174    
-------------
flag@6174: 0x3c (KDRHFL, KDRHFF, KDRHFD, KDRHFH)
lock@6175: 0x02
cols@6176:    0

如果我们把数据删除后还未被覆盖,此时能不能恢复呢。答案是可以的

delete from tb0101 where id=13;

BBED> p *kdbr[0]  
rowdata[4028]
-------------
ub1 rowdata[4028]                           @6174     0x2c

BBED> modify /x 2c offset 6174
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /opt/oracle/app/oradata/orcl/assm_datafile.dbf (7)
 Block: 147              Offsets: 6174 to 6685           Dba:0x01c00093
------------------------------------------------------------------------
 2c020302 c10e0431 335f61fe d0076f74 6865725f 636f6c2e 2e2e2020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 
 20202020 20202020 20202020 20202020 20202020 20202020 20202020 20202020 

 <32 bytes per line>

BBED> sum dba 7,147
Check value for File 7, Block 147:
current = 0x1762, required = 0x1762

BBED> sum dba 7,147 apply
Check value for File 7, Block 147:
current = 0x1762, required = 0x1762

修改完成之后还需要重启数据库才行,此时我们再查询,就发现 id 为 13 的数据就回来了。

数据块的内部结构

BBED> map /v
 File: /opt/oracle/app/oradata/orcl/assm_datafile.dbf (7)
 Block: 147                                   Dba:0x01c00093
------------------------------------------------------------
 KTB Data Block (Table/Cluster)

 struct kcbh, 20 bytes                      @0       
    ub1 type_kcbh                           @0       
    ub1 frmt_kcbh                           @1       
    ub1 spare1_kcbh                         @2       
    ub1 spare2_kcbh                         @3       
    ub4 rdba_kcbh                           @4       
    ub4 bas_kcbh                            @8       
    ub2 wrp_kcbh                            @12      
    ub1 seq_kcbh                            @14      
    ub1 flg_kcbh                            @15      
    ub2 chkval_kcbh                         @16      
    ub2 spare3_kcbh                         @18      

 struct ktbbh, 72 bytes                     @20      
    ub1 ktbbhtyp                            @20      
    union ktbbhsid, 4 bytes                 @24      
    struct ktbbhcsc, 8 bytes                @28      
    sb2 ktbbhict                            @36      
    ub1 ktbbhflg                            @38      
    ub1 ktbbhfsl                            @39      
    ub4 ktbbhfnx                            @40      
    struct ktbbhitl[2], 48 bytes            @44      

 struct kdbh, 14 bytes                      @100     
    ub1 kdbhflag                            @100     
    sb1 kdbhntab                            @101     
    sb2 kdbhnrow                            @102     
    sb2 kdbhfrre                            @104     
    sb2 kdbhfsbo                            @106     
    sb2 kdbhfseo                            @108     
    sb2 kdbhavsp                            @110     
    sb2 kdbhtosp                            @112     

 struct kdbt[1], 4 bytes                    @114     
    sb2 kdbtoffs                            @114     
    sb2 kdbtnrow                            @116     

 sb2 kdbr[3]                                @118     

 ub1 freespace[2022]                        @124     

 ub1 rowdata[6042]                          @2146    

 ub4 tailchk                                @8188