Database System Basic - 3

Database Storage II


Page Base Storage


A. Disk-oriented Architecture

The DBMS assumes that the primary storage location of the database is on non-volatile disk.

The DBMS's components manage the movement of data between non-volatile and volatile storage.

 

B. Log-structured File Organization

Instead of storing tuples in pages, the DBMS only stores log records

1. Update / Write Log

The system appends log records to the file of how the database was modified:

--> Inserts store the entire tuple

--> Deletes mark the tuple as deleted

--> Updates contain the delta of just the attributes that were modified

For example:

image-20220516140754074

 

2. Read

a) To read a record, the DBMS scans the log backwards and "recreates" the tuple to find what it needs

For example:

image-20220516141333618

b) Build index to allow it to jump to location in the log

For example:

image-20220516141613525

c) Periodically compact the log

For example:

image-20220516142235795

 

C. Log-structured Compaction

Compaction coalesces larger log files into smaller files by removing unnecessary records

 

Level CompactionUniversal Compaction
image-20220516142552544image-20220516142716972

 

Data Resentation


INTEGER / BIGINT / SMALLINT / TINYINT

--> C / C++ Representation

FLOAR / REAL vs. NUMERIC / DECIMAL

--> IEEE-754 Standard / Fixed-point Decimals

VARCHAR / VARBINARY / TEXT / BLOB

--> Header with length, followed by data bytes

TIME / DATE / TIMESTAMP

--> 32 / 64-bit integer of (micro) seconds since Unix epoch

 

A. Variable Precision Numbers

Inexact, variable-precision numeric type that uses the "native" C / C++ types

--> Ex: FLOAT / REAL / DOUBLE

Example:

 

B. Fixed Precision Numbers

Numeric data types with arbitrary precision and scale. Used when rounding errors are unacceptable

--> Ex. NUMERIC / DECIMAL

Many different implementations

 

1. Postgres: Numeric

image-20220517134720774

 

2. MySQL: Numeric

image-20220517135241093

 

C. Large Values

Most DBMSs don't allow a tuple to exceed the size of a single page.

To store values that are larger than a page, the DBMS uses separate overflow storage pages.

--> Postgres: TOAST (> 2 KB)

--> MySQL: Overflow (> 1/2 size of page)

--> SQL Server: Overflow (> size of page)

Example:

image-20220517140042991

 

D. External Value Storage

Some systems allow you to store a really large value in an external file. Treated as a BLOB type. (Oracle & Microsoft)

The DBMS cannot manipulate the contents of an external file

Example:

image-20220517142309414

 

System Catalogs


A DBMS store meta-data about databases in its internal catalogs

--> Tables, columns, indexes, views

--> Users, permissions

--> Internal statistics

Almost every DBMS stores the database's catalog inside itself

--> Wrap object abstractuon around tuples

 

You can query the DBMS's internal INFORMATION_SCHEMA catalog to get info about the database

--> ANSI standard set of read-only views that provide info about all the tables, views, columns, and procedures in a database

 

A. Accessing Table Schema

For example: = student

SQL-92:

Postgres:

MySQL:

SQLite:

 

B. Database Workloads

1. On-line Transaction Processing (OLTP)

--> Fast operations that only read / update a small amount of data each time

2. On-line Analytical Processing (OLAP)

--> Complex queries that read a lot of data to compute aggregates

3. Hybrid Transaction + Analytical Processing

--> OLTP + OLAP together on the same database instance

 

Graph:

image-20220517144442102

1. Bifurcated Environment

1. OLTP ----> OLAP

image-20220517144945156

2.HTAP ----> OLAP

image-20220517145055581

 

2. Wikipedia Example

 

3. OLTP

On-line Transaction Processing (OLTP)

---> Simple queries that read / update a small amount of data that is related to a single entity in the database

 

4. OLAP

On-line Analytical Processing (OLAP)

--> Complex queries that read large portions of the database spanning multiple entites

 

 

Data Storage Models


The DBMS can store tuples in different ways that are better for either OLTP or OLAP workloads

 

A. N-Ary Storage Model (NSM)

The DBMS stores all attributes for a single tuple contiguously in a page

Ideal for OLTP workload where queries tend to operate only on an individual entity and insertheavy workloads

The DBMS stores all attributes for a single tuple contiguosly in a page

image-20220517193815982

Example:

image-20220517193953663

image-20220517194051353

 

AdvantagesDisadvantages
- Fast inserts, updates, and deletes- Not good for scanning large portions of the table and / or a subsetof the attributes
- Good for queries that need the entire tuple 

 

B. Decomposition Storage Model (DSM)

The DBMS stores the values of a single attribute for all tuples contiguously in a page

Ideal for OLAP workload where read-only queries perform large scans over a subset of the table's attributes

image-20220517195039534

1. Example:

image-20220517195316157

2. Tuple Identification

Choice #1: Fixed-length Offsets

--> Each value is the same length for an attribute

image-20220517195920324

 

Choice #2: Embedded Tuple Ids

--> Each value is stored with its tuple id in a column

image-20220517195945875

 

3. Advantages vs. Disadvantages

AdvantagesDisadvantages
- Reduces the amount wasted I / O because the DBMS only reads the data that it needs- Slow for point queries, inserts, updates, and deletes because of tuple splitting / stitching
- Better query processing and data compression