Database System Basic - 2

Database Storage I (File Storage + Page Layout + Tuple Layout)


A. Disk-Based Architecture

DBMS assums that the primary storage location of the database is on non-volatile disk.

DBMS's components manage the movement of data between non-volatile disk and volatile disk.

 

B. Storgae Hierarchy

1. kinds of Storage

image-20220515194603229

2. Access Times

TimeStorgae HierarchyTime (changed)
0.5 nsL1 Cache Ref0.5 sec
7 nsL2 Cache Ref7 sec
100 nsDRAM100 sec
150,000 nsSSD1.7 days
10,000,000 nsHDD16.5 weeks
~ 30,000,000 nsNetwork Storage11.4 months
1,000,000,000 nsTape Archives31.7 years

 

C. Sequential VS. Random Access

For non-volatile: running time (usually) ==> Random access < Sequential access

DBMS want to maximize Sequential access

 

D. Disk-oriented DBMS

image-20220515202117443

 

E. File Storage

The DBMS stores a database as one or more files on disk typically in a proprietary format.

 

1. Storage Manager

Storage Manager: Responsible for maintaining a database's file.

 

It organizes the files as a collection of pages.

--> Tracks data read / written to pages

--> Tracks the available space

 

2. Database Pages

Page: A fixed-size block of data (Some systems require a page to be self-contained)

Each page is given a unique identifier (page IDs --> Physical locations)

TypesSize
Hardware Page (the largest block of data that the device can guarantee failsafe write)usually 4 KB
OS Pageusually 4 KB
Database Page512 B - 16 KB

 

3. Database Heap

Heap File: An unordered collection of pages with tuples that are stored in random order.

--> Create / Get / Write / Delete Page & support iterating over all pages

Represent a heap file:

--> Linked List

--> Page Directory

For example: (Track Multiple files)

image-20220515211948998

a). Heap File: Linked List

Maintain a header page at the beginning of the file that stores two pointers:

--> HEAD of the free page list

--> HEAD of the data page list

Each page keeps track of how many free slots they currently have.

For example:

image-20220515212844623

 

b). Heap File: Page Directory

DBMS maintains special pages that tracks the location of data pages in the database file.

The directory also records the number of free slots of per page.

(P.s. the directory pages are in sync with the data pages)

 

4. Page Header

Every page contains a header of meta-data about the page's contents

--> Page Size

--> Checksum

--> DBMS Version

--> Transaction Visibility

--> Compression Information

For example:

image-20220515214349465

 

F. Page Layout

Two approaches:

--> Tuple-oriented

--> Log-structured

 

1. Tuple Storage

Strawman Idea: Keep track of the number of tuples in a page and then just append a new tuples to the end.

For example:

Page

Num Tuples = 3
Tuple #1
Tuple #4
Tuple #3
 

 

2. Slotted Pages

The slot array maps "slots" to the tuples' starting position offsets

The header keeps strack of:

--> The # of used slots

--> The offset of the starting location of the last slot used

For example:

image-20220515215945263

 

3. Record IDs

Each tuple is assigned a unique record identifier.

--> Most common: page_id + offset / slot

--> Can also contain file location info

 

G. Tuple Layout

Tuple: Essentially a sequence of bytes

 

1. Tuple Header

Each tuple is prefixed with a header that contains meta-data about it.

--> Visibility info (concurrency control)

--> Bit Map for NULL values

 

2. Tuple Data

Attributes are typically stored in the order that you specify them when you create the table.

Tuple

Headerabcde
..................

 

3. Denormalized Tuple Data

DBMS can physically denormalize related tuples and store them together in the same page.

--> Reduces the amount of I/O for common workload patterns

--> Can make updates more expersive

For example:

image-20220515222313973