Data
Information
Knowledge
Non-persistent storage
Persistent storage
Confidentiality
info protection against unauthorized access
Integrity
info protection against unauthorized modify & delete
Availability
mangae authorized to access info
Other
Authenticity
verify user
Accountability
tracking the data access
Example:
Encryption
Digital Signature
Database Replication
Countermeasures designed to meet a set of defined security requirements
Example:
Authentication
Access control
Monitoring and Auditing
Challenges of Data Storage
More data need to keep
Data breaches is increasing this few years
A security incident which involves the intentional or unintentional access, disclosure, manipulation or destruction of data
Causes of Breach
Application vulnerabilities
Malware, Trojan Horse and Virus
Malicious insiders
Human errors
Features
reducing
improving
data access to users through use of query languages
enhancing
data security
decreasing
data entry, storage, and retrieval costs
allowing
more flexibility for manipulating data
presenting
Database Management System (DBMS)
DBMS Architecture
Disk Space Management
Collection -> Block1, Block2, Block3 ...
DBMS Buffer Management
(Data fetched from RAM use by program)
Buffer Pool
Memory <--> Disk
Buffer manager
A subsystem in DBMS which is responsible for allocating buffer space in main memory
Programs call on the buffer manager when they need a block from disk
Files and Index Management
Index
data structure in database
Query Processor
Translates query into an execution plan
Query optimization
Execute the query
Data Manipulation Language (DML)
SELECT
INSERT
UPDATE
DELETE
SQL Joins
Data Definition Language (DDL)
CREATE
DROP
ALTER
RENAME
Data Control Language (DCL)
GRANT
gives user’s access privileges to database
REVOKE
withdraws user’s access privileges given which use GRANT
Relational DBMS (RDBMS)
MySQL
MariaDB
PostgreSQL
MS SQL Server
Oracle
...
NoSQL is a broad term used to label many types of data stores that diverge from the traditional relational database model
NoSQL DBMS
MongoDB
Cassandra
Redis
...
noSQL | Relational Databases | |
---|---|---|
Data models and schema | Unstructured | Structured |
Data structure | Document-based | Table-based |
Scaling | Horizontal | Vertical |
Development model | Open source | Close source |
A key difference between SQL/NoSQL is how each data model handles data normalization
Functional Dependency
Boyce-Codd Normal Form (BCNF)
Database Availability
The ability to use the information or resource desired
An unavailable system is as bad as no system
Someone may deliberately deny access to data or to a resource by making it unavailable
Attack example: DDOS, DOS
Database normally built in a distributed manner
Three issues
How to partition the database into fragments
Which fragments to replicate
Where to locate those fragments and replicas
Data Fragmentation ( this info is store in the distributed data catalog (DDC) )
To break a single object into two or more segments, or fragments
Tree Types
Horizontal Fragmentation
The division of a relation into subsets (fragments) of tuples (rows)
Round-robin partitioning
an even distribution of rows among all fragments
Bad for "Location awareness"
Range partitioning based on a partition key
Vertical Fragmentation
The division of a relation into attribute (column) subsets
Each subset (fragment) is stored at a different node & each fragment has unique columns
Mixed Fragmentation
a combination of horizontal and vertical strategies
Example
alter table real_estate add column OfficeCityID INTEGER after OfficeCity;
update real_estate set officecityid=1 where `OfficeCity`="CityA";
update real_estate set officecityid=2 where `OfficeCity`="CityB";
-- Let the table splite to 2 partitions by hash value of OfficeCityID
alter table real_estate partition by hash(`OfficeCityID`) partitions 2;
SELECT TABLESPACE_NAME, FILE_NAME FROM information_schema.FILES;
-- Remove the partitions to one
alter table real_estate remove partitioning;
alter table real_estate
PARTITION BY LIST(`OfficeCityID`) (
PARTITION CityA VALUES IN (1),
PARTITION CityB VALUES IN (2)
);
Fragmentation: Pros and Cons?
Advantages
Partitions could be stored on different storage devices, taking advantage of resources in parallel
Bulk data operations
Efficient locking
Better resilience in case of corruption
Partitioning makes it possible to store more data in one table than can be held on a single disk or file system partition
Some queries can be greatly optimized in virtue of the fact that data satisfying a given WHERE clause can be stored only on one or more partitions
Cons
Data skew if partitions are not balanced
Storage overhead
More complex to manage than a non-partitioned table
Some DBMS do not natively support storing partitions in different locations
Data Replication
Mutual Consistency Rule: That all copies of data fragments be identical.
It refers to the storage of data copies at multiple sites served by a computer network
Two styles
Push Replication
This type of replication focuses on maintaining data consistency
decreases data availability before the copying finish
Pull Replication
The focus is on maintaining data availability
Replica Overhead
Data Allocation
Allocation Types
Centralized data allocation
Partitioned data allocation
Replicated data allocation
A database state is consistent if and only if it satisfies all defined constraints
Consistency types
Strong consistency (强一致性)
Read will always reflect the most recent write
Make sure that the write to all replicas are completed before committing
Eventual consistency (最终一致性)
different nodes may have different versions of the data, but where state changes eventually propagate to all of the servers
Quorum consistency (基于群体的一致性)
The majority of the replicas agree on the result
Trade latency for consistency in eventually consistent stores
( It uses a voting mechanism to determine the consistency of data operations )
The CAP theorem states that a distributed database system has to make a tradeoff between Consistency and Availability when a Partition occurs.
Proposed in 2000
Consistency
Availability
Partition tolerance
Atomicity, Consistency, Isolation, Durability (ACID)
Basically available, soft state, eventually consistent (BASE)
Primary and Secondaries
Read Operations to Replica Sets
Automatic Failover
MongoDB Election
MongoDB Mirrored Reads
Network Partition (due to network failure)
Fault Tolerance
Fault tolerance for a replica set is the number of members that can become unavailable and still leave enough members in the set to elect a primary
In other words, it is the difference between the number of members in the set and the majority of voting members needed to elect a primary
Without a primary, a replica set cannot accept write operations
Number of Members | Majority Req of Elect a new primary | Fault Tolerance |
---|---|---|
3 | 2 | 1 |
4 | 3 | 1 |
5 | 3 | 2 |
6 | 4 | 2 |
Rollbacks During Replica Set Failover
Write Concern
Default write: 1 (just write to Primary)
Also can “majority” write concern
MongoDB sharded cluster
Sharded Cluster
Sharded Collections
Sharding key
Range-based Sharding
Read operations on sharded clusters
Read operations is efficient when directed to a specific shard
Hashed Sharding
Chunks Splitting and Migration
Non-shardeed Collections
Shard
mongos
Config Servers
Points of Vulnerability
Data-in-motion
The client and server are not co-located
Data-at-rest
Data that is not actively moving
Data-in-use
Data loaded into memory, and resides there for some time
How threat actors exploit databases
Stealing the credentials of a privileged administrator or application user
Exploiting weaknesses in applications
Escalating run-time privileges by exploiting vulnerable applications
Accessing database files that are unencrypted on the disk
Stealing archive tapes and media containing database backups
Exploiting unpatched systems or misconfigured databases to bypass access controls
Copying live data from development and test systems where the data is typically not as well protected as in the production systems
Elements in Database Security
Activity Audit
Data Discovery
Compliance Scan
Vulnerability Scan
Patch Automation
Assess & Prevent & Detect
Confidentiality is the concealment of information or resources
Security Mechanisms
Encryptions
Symmetric Encryption
Example: AES
Mode of Operation (ECB, CBC, CFB)
Public-Key (Asymmetric) Cryptography
Example: RSA, ECC
Cryptographic Hashing
Example: SHA-1, SHA256
Levels of Database Encryption
OS/Storage Level encryption
Database Files are encrypted/decrypted by the operating system when they are written/read from disk
DBMS-Level Encryption
Encrypt/decrypt the data as it is inserted to, or retrieved from the database
Application-Level / Clientside Encryption
The Encryption & Decryption Function in MySQL
PBKDF2 / HKDF
RANDOM_BYTES(16)
xxxxxxxxxx
AES_ENCRYPT(str, key_str);
AES_DECRYPT(crypt_str, key_str);
VARBINARY(x)
Assume AES is used, blocks are always 16 bytes
Length of encrypted text is 16 × (trunc(string_length / 16) + 1)
Cryptographic Hash in MySQL
SHA1()
Calculates an SHA-1 160-bit checksum for the string
SHA2(str, hash_length)
SHA-224, SHA-256, SHA-384, and SHA-512
Dictionary Attack
Salts
The extra random bits are called salt
Password Hashing Algo: scrypto / bcrypto / argon
MySQL Enterprise Encryption Functions
Security based on public-key cryptography
Oracle’s DBMS_Crypto library
Pros of DBMS Encryption Features
Centralized Management
One place where encryption happens, easier auditing and control
Key Rotation
Changing keys periodically is good practice, easier key management within the DB
Offloading Encryption Overhead
More efficient resource utilization on the client side
Consistency
Same encryption standard for all client apps, same security for all
Search Encrypted Records
Search queries could be made to automatically decrypt data before comparison
Oracle’s Transparent Data Encryption (TDE)
TDE encrypts data automatically when written to database, including backups, data dumps exports, and logs
Advantages of TDE
Prevent OS users from inspecting the tablespace files
Protects against theft or loss of disks and backups
Transparent protection
TDE for Tablespaces
Keys in TDE for Tablespaces
TDE tablespace encryption Key
TDE master encryption key
TDE Column Encryption
Keystore Storage of TDE Master Encryption Keys
Oracle Database provides a key management framework for TDE that stores and manages keys and credentials
Microsoft SQL server
TDE in MSS
“Always Encrypted” in MSS
DBMS -Level Encryption
Pros | cons |
---|---|
Does not usually require significant changes to the application layer | The encryption keys must be transmitted or kept with the encrypted data on the server side |
Data are decrypted on the database server at runtime |
Application-Level Encryption
Sensitive data is encrypted in the application layer before it is sent to the database and decrypted before usage
Pros | cons |
---|---|
Offers the highest degree of security since the only one that is able to access the sensitive data is the legitimate client | Each applications need to be modified to adopt this solution |
Freedom in terms of enforcing cryptographic access control | Limit the ability of the database server to process the encrypted data |
separate encryption keys from the encrypted data stored in the database |
Data Owner and Data Manager Separation
Provides a separation between
Data Owners
Data Managers
Enables customers to confidently store sensitive data outside of their direct control
Enable the Database Engine to process some queries on encrypted data, while preserving the confidentiality of the data
Database administrators, cloud database operators, or other highprivileged unauthorized users, cannot access the encrypted data
“Always Encrypted” uses two types of keys
Column encryption keys
Column master keys are protecting keys used to encrypt column encryption keys
The Database Engine
It stores
encryption configuration for each column in database metadata
encrypted values of column encryption keys
the information about the location of column master keys
To access data stored in an encrypted column in plaintext, an application must use an Always Encrypted enabled client driver
The encryption/decryption process
Submit query
Decryption
Always Encrypted
Pros | Cons |
---|---|
Protection of data “in use” | Unlike TDE, this is only partially transparent to applications |
Data stored in the database is protected even if the entire machine is compromised | Always Encrypted only supports very limited operations on encrypted database columns |
Protecting data from high-privilege users | |
Provides a separation between those who own the data and those who manage the data | |
enables customers to confidently store sensitive data in the cloud, delegate on-premises database administration to third parties |
Data in file systems are normally protected in two major aspects
Access Control
Data Encryption
NTFS Security
allows administrators to set permissions on a file or folder, and specify the groups and users whose access administrator want to restrict or allow, and then select the type of access
supports the Encrypting File System (EFS) technology used to store encrypted files on NTFS volumes
NTFS Permissions
Administrators can use the NTFS utility to provide access control for files and folders, and other objects on the network
SECURITY_DESCRIPTOR
To help control access to NTFS objects, NTFS introduced the $SECURITY_DESCRIPTOR that was embedded into the MFT
Discretionary Access Control List (DACL)
Encrypting File System (EFS)
EFS uses symmetric key encryption in combination with public key cryptography to protect files
File data is being encrypted with symmetric algorithm
The key, used in symmetric encryption is called File Encryption Key (FEK)
Encryption Steps
Index | step |
---|---|
1 | EFS generate File Encryption Key (FEK) |
2 | Get public/private key pair; if it does not exist at this stage, EFS generate a new pair |
3 | EFS creates Data Decryption Field (DDF) for the current user, where it places FEK and encrypts it with public key |
4 | A temporary file Efs0.tmp is created in the same folder as the file being encrypted. The contents of original file (plain text) is copied into temporary file, after that the original is overwritten with encrypted data |
Decryption Steps
index | step |
---|---|
1 | System checks if user has a private key used by EFS. If yes, it reads EFS attributes and walk through the DDF ring looking for DDF for current user |
2 | If DDF is found, user's private key is used to decrypt FEK extracted from DDF. Using decrypted FEK, EFS decrypts file data. It should be noticed that file never decrypted in whole but rather by sectors when upper level module requests particular sector |
BitLocker
BitLocker encrypts an entire volume
Preferably to have Trusted Platform Module (TPM) on your PC’s motherboard
A complex hierarchy of keys to encrypt devices
Malicious SQL statements are inserted by attackers to trick data-driven applications into executing unintended commands or bypassing security controls
Bypassing Login Controls
Example
A Vulnerable Application
Variety of SQL Injection Attacks
UNION attacks
Example SQL
x-- ex 1
SELECT a, b FROM table1
UNION
SELECT c, d FROM table2
-- ex 2
SELECT name, description FROM products WHERE category = 'Gifts' UNION SELECT username, password FROM users--
Determine Number of Columns
Submitting a series of UNION SELECT payloads specifying a different number of null values
Example
xxxxxxxxxx
' UNION SELECT NULL,NULL,NULL --
' UNION SELECT NULL,NULL,'a',NULL --
Determine the Data Type - Example
Find one or more columns in the original query results whose data type is, or is compatible with, string data
Example
xxxxxxxxxx
Lifestyle'+UNION+SELECT+NULL,'abcdef',NULL--
Determining the OS version
To determine the DB version
Example
xxxxxxxxxx
' UNION SELECT BANNER,
NULL FROM v$version --
Blind SQL injection
Example
xxxxxxxxxx
-- ex 1
SELECT TrackingId
FROM TrackedUsers
WHERE TrackingId = 'xD3psPF6m3EFOT2k'
-- ex 2 - Determine the length of the password
x'+UNION+SELECT+'a'+FROM+users+WHERE+username='administrator'+AND+length(password)>1--
x'+UNION+SELECT+'a'+FROM+users+WHERE+username='administrator'+AND+length(password)>2--
-- ex 3 - › Determine the password’s character at position 1,2, …
x'+UNION+SELECT+'a'+FROM+users+WHERE+username='administrator'+AND+substring(password,1,1)='a'--
x'+UNION+SELECT+'a'+FROM+users+WHERE+username='administrator'+AND+substring(password,1,1)='b'--
Second order SQL Injection
Example
Parameterized Query
Limitations
Cannot be used to handle untrusted input in other parts of the query
Example
xxxxxxxxxx
// prepare and bind
$stmt = $conn->prepare("SELECT name, password FROM lab1.USER WHERE name=? and password=?");
$stmt->bind_param("ss", $_GET["user_name"] , $_GET["password"]);
//execute the prepared statement
$result = $stmt->execute();
//grab a result set
$resultSet = $stmt->get_result();
Input Validation and Filtering
Make sure that server-side validation is enforced
Vulnerability Scanners
Example
sqlmap
Logging
DBMS provides logging facilities that record what activity is taking place
DBMS Firewal
A DBMS firewall enables database administrators to permit or deny SQL statement execution based on matching against whitelists of accepted statement patterns
Example
MySQL Enterprise Firewall
Profile Operational Modes
OFF
RECORDING
PROTECTING
DETECTING
Authentication + Authorization
Authentication
a mechanism that determines whether a user is who he/she claim to be
Authorization
granting of a privilege that enable a user to have a legitimate access to a resources
What do we want to achieve?
Decide who can do what on the resources
Who = Principal
What = action
Can be represented in an Access Control Matrix
How do we achieve this?
First, need to authenticate the principal
Password-based authentication is just an example
Password-Based Authentication
A naive way to understand password-based authentication
Insider Threat or Server Compromised
How to protect it?
Outsource the authentication process?
Hash Function
Current Version of Mysql supports different Authentication Plugins
The authentication plugin associated with an account performs any hashing required of a cleartext password specified
Example
xxxxxxxxxx
CREATE USER 'jeffrey'@'localhost' IDENTIFIED BY 'password’;
Limitations
Password Guessing Attack
Attacker reading other parts of the Database
Attacker modifying/destroying the password hash
Web Applications
Dictionary Attack
Kerberos
Kerberos is an authentication protocol that allows nodes communicating over an insecure network to prove their identity
Client-server model
Mutual authentication
Depends on symmetric key cryptography
Requires a trusted third party
System Model
Kerberos Trust Model
Notations
TICKETS AND AUTHENTICATORS
SESSION KEY DISTRIBUTION
USER AUTHENTICATION
for user to server authentication, client key is the user’s password
USE OF THE SESSION KEY
Kerberos establishes a session key Kc,s
session key can be used by the applications for
client to server authentication
mutual authentication
message confidentiality using Kc,s
message integrity using Kc,s
TICKET-GRANTING SERVICE
Problem: Transparency
Solution: Ticket-Granting Service (TGS)
TICKET LIFETIME
Life time is minimum of:
requested life time
max lifetime for requesting principal
max lifetime for requesting service
max lifetime of ticket granting ticket
Max lifetime is 21.5 hours
SSH
Password Authentication
Password requirements/strength: Entropy
Shannon Entropy
Let X be password distribution. Passwords are drawn from X
𝑛 is size of support of X
𝑝1, 𝑝2 , ... , 𝑝n are probabilities of passwords in decreasing order
Min-entropy
related to commonness of most popular password
“guessing probability” or GP denote probability of most probable password over a population
How is the password stored?
Cryptographic hash function
One-way function
Giveny = H(M), hard to compute M
Deterministic
H maps any message to a short digest (e.g.,256-bit string)
Collisions resistant
Can’t find M, M’
s.t. H(M) = H(M’)
Attacks on password
Online
Try to guess passwords by logging to a live system
Online attack: Biggest data breaches
Offline
Try to guess passwords in the (typically stolen) password database, or
Pre-computation can make offline attacks very fast
Offline attack: Build Rainbow table
Multi forms of password authentication
Single password authentication
Multi-Factor Authentication
Factors for two factor authentication (2FA)
SMS (short message service) Authentication
Circumventing SMS-Based 2FA
Time-based One-Time Passwords
Biometric Authentication
Biometric Error Rates
Design better Fuzzy extractor such that
𝐹𝐸𝑚 =𝐹𝐸(𝑚′)
Biometric Birthday paradox
Primarily should be used as a second factor authentication, rather than a primary authentication factor
Public key Authentication
Example
Web Authentication
HTTPS
SSH Key Authentication
Digital Signature
A digital signature is a mathematical scheme for verifying the authenticity of digital messages or documents
SSH Authentication
ACCESS MATRIX MODEL
Basic Abstractions
Subjects
Objects
Rights
The rights in a cell specify the access of the subject (row) to the object (column)
USERS AND PRINCIPALS
the system authenticates the user in context of a particular principal
Example
There should be a one-to-many mapping from users to principals
This ensures accountability of a user's actions
PRINCIPALS AND SUBJECTS
A subject is a program (application) executing on behalf of a principal
A principal may at any time be idle, or have one or more subjects executing on its behalf
Example
Database User
Account creation
xxxxxxxxxx
CREATE USER 'username'[@'hostname'][IDENTIFIED BY 'password'];
Account Deletion
xxxxxxxxxx
DROP 'user'[@'hostname'];
Privileges
Database-level
Access all of the data within the database
Table-level
Access given tables of a database
Column-level
Access specific columns
Example
xxxxxxxxxx
GRANT SELECT (EmpID, FName, LName, Title,Office) ON BusinessCLS.Employee TO 'roberts';
Row-level
Access specific rows
Neet to set role before use
xxxxxxxxxx
SET ROLE 'CFO'
Adding and Removing User to a Role
xxxxxxxxxx
GRANT 'role' [@'hostname'| (s) TO 'user'[@'hostname'](s);
CRUD Matrix
Example
Tables
Customer (custID, Name, Address, phone)
Order (OrderID, custID, Date)
Orderline (OrderLineID, OrderID, ProductID, Quantity)
Product (ProductID, Stock, Price)
Matrix Table
Customer | order | orderline | product | |
---|---|---|---|---|
Web Application | RU | CRUD | CRUD | RU |
Stored Function vs Procedure
Stored Function always return a value
Stored Procedure does not always return value
Stored Function only support “Select”
Stored Procedure can modify database objects
Stored Functions cam be called from procedures, and can be embedded in Select statments
Oracle Data Safe
Unified control center for your Oracle Databases
Features
Security Assessment
Assess the security of Oracle Database Configurations
Common poor database configurations
Weak password policies
No control on over-privileged accounts
Lack of activity monitor
User Accounts
Privileges and Roles
Authorization Control
Fine-Grained Access Control
Auditing
Encryption
Database Configuration
User Assessment
Data Discovery
Find sensitive data in your Oracle databases
Data Masking
Replacing sensitive data with fictitious yet realistic looking data
Activity Auditing
Ensure accountability
Collect and retain audit record
Alerts
Track and be notified of particular user activities and unusual behavior
Re-identification by Linking
Requirements
Accuracy
Researchers want accurate and meaningful data
Confidentiality
Database administrators want to maintain the privacy of the data owner
Common Approaches
Access Restriction
Databases normally have different access levels for different types of users
Query Set Restriction
A query-set size control can limit the number of records that must be in the result set
Microaggregation
Raw (individual) data is grouped into small aggregates before publication
Data Perturbation
Perturbed data is raw data with noise added
Output Perturbation
Only the output or query results are perturbed
Auditing
Auditing is the process of keeping track of all queries made by each user
Random Sampling
Only a sample of the records meeting the requirements of the query are shown
Quasi-Identifiers
Key attributes
Quasi-identifiers
Classification of Attributes
Sensitive attributes
K-Anonymity: Intuition
Any quasi-identifier present in the released table must appear in at least k records
K-Anonymity Protection Model
Private table
Released table: RT
Attributes: A1, A2, …, An
Quasi-identifier subset: Ai, …, Aj
Generalization
Replace quasi-identifiers with less specific, but semantically consistent values
Achieving k-Anonymity
Generalization
Lots of algorithms in the literature
Aim to produce “useful” anonymizations
Curse of Dimensionality
Generalization fundamentally relies on spatial locality
Real-world datasets are very sparse
Projection to low dimensions loses all info --> k-anonymized datasets are useless
Two (and a Half) Interpretations
Membership disclosure
Sensitive attribute disclosure
Identity disclosure
Unsorted Matching Attack
Problem: records appear in the same order in the released table as in the original table
Solution: randomize order before releasing
Complementary Release Attack
Linking Independent Releases
Attacks on k-Anonymity
k-Anonymity does not provide privacy if
Sensitive values in an equivalence class lack diversity
The attacker has background knowledge
k-Anonymity Considered Harmful
Syntactic
Focuses on data transformation, not on what can be learned from the anonymized dataset
“k-anonymous” dataset can leak sensitive information
“Quasi-identifier” fallacy
Assumes a priori that attacker will not
know certain information about his target
Relies on locality
Destroys utility of many real-world datasets
Differential Privacy
Differential privacy is a new notion of privacy
It is about the risk of joining the database
Differential Privacy: Formal Definition
Achieving Differential Privacy
Differential Privacy
Type of Encryption Schemes
Symmetric key encryption
Encryption key and decryption key are the same
Asymmetric key encryption
Encryption key and decryption key are different
Also known as public-key encryption
Always Encrypted
Always Encrypted provides transparent, end-to-end encryption for your sensitive columns
All encryption and decryption is handled transparently by the driver library on the client
Randomized Encryption
For randomized encryption, the IV is randomly generated
Deterministic Encryption
When using deterministic encryption: IV = HMAC-SHA256( iv_key, cell_data ) truncated to 128 bits
Limitations
Randomized encryption | Deterministic encryption |
---|---|
Allows for transparent retrieval of encrypted data but NO OPERATIONS | Allows for transparent retrieval of encrypted data AND equality comparison |
– More secure due to the more random nature of the encryption | E.g., in WHERE clauses and joins, distinct, group by |
LIKE and ranges not supported (in V1) |
Searchable Encryption
Store data externally
encrypted
want to search data easily
avoid downloading everything then decrypt
allow others to search data without having access to plaintext
Motivating Scenario
▪ You want your device to only download encrypted files marked with the keyword “urgent” from the server
You don’t want the server to know what are the keywords associated with each file
Confidentiality vs. Functionality (or Searchability)
Database Encryption
Encryption Scheme
Key generation algorithm (KeyGen)
Enc k (m) → c, Dec k (c) → m
Symmetric-key encryption = (KeyGen, Enc, Dec)
One-Time Pad (OTP) based on XOR
XOR | 0 | 1 |
---|---|---|
0 | 0 | 1 |
1 | 1 | 0 |
OTP = {KeyGen, Enc, Dec}
Enc(k, m) → c = m ⊕ k
(m is λ-bit long)
Dec(k, c) → m = c ⊕ k
Retrieval of Encrypted Data
Symmetric Searchable Encryption (SSE)
Client first uploaded the encrypted data to the server
Keyword search over text files
Server can only search after obtaining the trapdoor for w
Deterministic Encryption (DE)
To search for w, the secret-key owner encrypts w for the server
It lets equality test (on ciphertexts) carry over to the plaintexts
But once revealed, everyone (e.g., Eve) knows what it means
Dictionary attack is possible (if Eve can trick Alice to search for w)
Security and Efficiency of SSE
Analysis of DE
L1 Leakage:
#DB (number of records in the database)
Equality (or whatever property allowed by the public test)
DB itself! (unless the DB comes from a high-entropy distribution)
L2 Leakage:
Access patterns and Search patterns
(actually, typical SSE schemes also leak these two patterns)
Search efficiency: sub-linear in #DB
(which is good)
Design Overview of SSE
Full-text search approach requires linear scan
Sublinear search requires (plaintext) index preparation
Lightweight symmetric-key primitives
Pseudorandom Function (PRF)
Block / Stream Cipher (for encrypting actual content)
Data structures
Pseudorandom Function (PRF) Family
Two Useful Features of SSE
Dynamic
Further allow update over encrypted data
Parallel
Search should be done in parallel
Dynamic SSE (DSSE) Syntax
Full-Text SSE (Basic Idea)
Refining Basic Idea: Short Secret Key
Further Refinement: only start from w
Index/Inverted Index-based Approach
Token for Inverted-Index
Dynamic + Parallel is difficult
Linked list is inherently sequential
Reveal too much information to the server to update
Solution
Cascaded Triangles
A set of perfect binary trees for each node
Cascading tree heights: h1 ≤ h2 < ··· < hn
Store all edges connecting the node
Node Addition
If h1 = h2 , merge trees with new node as root
Else, add new tree (of height 1)
Node Deletion
Replace deleted node by root of shortest tree
Result
The cascaded heights are maintained
No tree balancing rotation
Additional Concern: Forward Security
Our Generic Upgrade
Dynamic Searchable Symmetric Encryption
Good
Sublinear search -- index data for efficiency
Forward privacy -- refresh database to invalidate old tokens
Bad
Not applicable for multi-writer scenarios
Public-Key Searchable Encryption (PKSE)
Good
No secret distribution/synchronous communication
Bad
Hard to achieve forward security & sublinear search (often linear test)
No controllable search