The DBMS assumes that the primary storage location of the database is on non-volatile disk.
The DBMS's components manage the movement of data between non-volatile and volatile storage.
Instead of storing tuples in pages, the DBMS only stores log records
The system appends log records to the file of how the database was modified:
--> Inserts store the entire tuple
--> Deletes mark the tuple as deleted
--> Updates contain the delta of just the attributes that were modified
For example:
For example:
For example:
For example:
Compaction coalesces larger log files into smaller files by removing unnecessary records
Level Compaction | Universal Compaction |
---|---|
INTEGER / BIGINT / SMALLINT / TINYINT
--> C / C++ Representation
FLOAR / REAL vs. NUMERIC / DECIMAL
--> IEEE-754 Standard / Fixed-point Decimals
VARCHAR / VARBINARY / TEXT / BLOB
--> Header with length, followed by data bytes
TIME / DATE / TIMESTAMP
--> 32 / 64-bit integer of (micro) seconds since Unix epoch
Inexact, variable-precision numeric type that uses the "native" C / C++ types
--> Ex: FLOAT / REAL / DOUBLE
xxxxxxxxxx
int main(int argc, char* argv[]) {
float x = 0.1;
float y = 0.2;
printf("x + y = %f\n", x + y);
printf("0.3 = %f\n", 0.3);
}
// output:
// x + y = 0.300000
// 0.3 = 0.300000
xxxxxxxxxx
int main(int argc, char* argv[]) {
float x = 0.1;
float y = 0.2;
printf("x + y = %.20f\n", x + y);
printf("0.3 = %.20f\n", 0.3);
}
// output:
// x + y = 0.30000001192092895508
// 0.3 = 0.29999999999999998890
Numeric data types with arbitrary precision and scale. Used when rounding errors are unacceptable
--> Ex. NUMERIC / DECIMAL
Many different implementations
Most DBMSs don't allow a tuple to exceed the size of a single page.
To store values that are larger than a page, the DBMS uses separate overflow storage pages.
--> Postgres: TOAST (> 2 KB)
--> MySQL: Overflow (> 1/2 size of page)
--> SQL Server: Overflow (> size of page)
Example:
Some systems allow you to store a really large value in an external file. Treated as a BLOB type. (Oracle & Microsoft)
The DBMS cannot manipulate the contents of an external file
Example:
A DBMS store meta-data about databases in its internal catalogs
--> Tables, columns, indexes, views
--> Users, permissions
--> Internal statistics
Almost every DBMS stores the database's catalog inside itself
--> Wrap object abstractuon around tuples
You can query the DBMS's internal INFORMATION_SCHEMA catalog to get info about the database
--> ANSI standard set of read-only views that provide info about all the tables, views, columns, and procedures in a database
For example:
SQL-92:
x
SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'student';
Postgres:
xxxxxxxxxx
\d student;
MySQL:
xxxxxxxxxx
DESCRIBE student;
SQLite:
xxxxxxxxxx
.schema student
--> Fast operations that only read / update a small amount of data each time
--> Complex queries that read a lot of data to compute aggregates
--> OLTP + OLAP together on the same database instance
Graph:
x
CREATE TABLE useracct (
userID INT PRIMARY KEY,
userName VARCHAR UNIQUE,
...
);
x
CREATE TABLE pages (
pageID INT PRIMARY KEY,
title VARCHAR UNIQUE,
latest INT
REFERENCES revisions (revID),
);
xxxxxxxxxx
CREATE TABLE revisions (
revID INT PRIMARY KEY,
userID INT REFERENCES useracct (userID),
pageID INT REFERENCES pages (pageID),
content TEXT,
updated DATATIME
);
On-line Transaction Processing (OLTP)
---> Simple queries that read / update a small amount of data that is related to a single entity in the database
xxxxxxxxxx
SELECT P.*, R.*
FROM pages AS P
INNER JOIN revisions AS P
ON P.latest = R.revID
WHERE P.pageID = ?
xxxxxxxxxx
UPDATE useracct
SET lastLogin = NOW(),
hostname = ?
WHERE userID = ?
xxxxxxxxxx
INSERT INTO revisions
VALUES (?, ?..., ?)
On-line Analytical Processing (OLAP)
--> Complex queries that read large portions of the database spanning multiple entites
xxxxxxxxxx
SELECT COUNT(U.lastLogin),
EXTRACT(month FROM
U.lastLogin) AS month
FROM useracct AS U
WHERE U.hostname LIKE '%.gov'
GROUP BY
EXTRACT(month FROM U.lastLogin)
The DBMS can store tuples in different ways that are better for either OLTP or OLAP workloads
The DBMS stores all attributes for a single tuple contiguously in a page
Ideal for OLTP workload where queries tend to operate only on an individual entity and insertheavy workloads
The DBMS stores all attributes for a single tuple contiguosly in a page
Example:
Advantages | Disadvantages |
---|---|
- Fast inserts, updates, and deletes | - Not good for scanning large portions of the table and / or a subsetof the attributes |
- Good for queries that need the entire tuple |
The DBMS stores the values of a single attribute for all tuples contiguously in a page
Ideal for OLAP workload where read-only queries perform large scans over a subset of the table's attributes
--> Each value is the same length for an attribute
--> Each value is stored with its tuple id in a column
Advantages | Disadvantages |
---|---|
- Reduces the amount wasted I / O because the DBMS only reads the data that it needs | - Slow for point queries, inserts, updates, and deletes because of tuple splitting / stitching |
- Better query processing and data compression |