《Database Internals: A Deep-Dive into How Distributed Data System Work》读书笔记

阅读量：

Overview

Terminology can sometimes be ambiguous and hard to understand without a complete context.

Distinction among the DBMS in terms of :

a storage medium:

Memory

In-memory DBMS (main memory DBMS) store data primarily in memory and use the disk for recovery and logging.

Store their contents almost exclusively in RAM

Disk based DBMS

Disk based DBMS hold most of the data on disk and use memory for caching disk contents or as a temporary storage.

layout: Column vs. Row Oriented DBMS

three major categories:

Online transaction processing databases (OLTP)
Online analytical processing databases (OLAP)
Hybrid transactional and analytical processing (HTAP)

another way:

key-value stores
relational databases
document-oriented stores
graph databases

Client/Server

Database management systems use a client/server model, where database system instances (nodes) take the role of servers, and application instances take the role of clients.

Client requests arrive through the transpot subsystem.
在这里插入图片描述

The storage engine has several components with dedicated responsibilities:

复制代码

1. Transaction manager
2. Lock manager
3. Access methods
4. Buffer manager
5. Recovery manager

Horizontal Scaling vs. Scaling Vertically

Scaling out: improving performance and increasing capacity by running multiple database instances acting as a single logical unit: Gamma Database Machine Project, Teradata, Greenplum, Parallel DB2.

Scaling up by moving the database to a larger, more powerful machine.

Database components

复制代码

1.

A transport layer accepting request

复制代码

2.

A query processor determing the most efficient way to run queries

复制代码

3.

An execution engine carrying out the operations

复制代码

4.

A storage engine(or databasa engine)

A software component of a database management system responsible for storing, retrieving, and amanging data in memory and on disk, designed to capture a persistent, long-term memory of each node.

Database management systems are applications built on top of storage engines, offering a schema, a query language, indexing, transactions, and many other useful features.

Storage engines were developed independently from the database management systems they’re now embedded into.

Database file system

The primary goal of a database system is to store data and to allow quick access to it.

File organization improve efficiency.

Database systems do use files for storing the data, but instead of relying on filesystem hierarcheies of directories and files for lacating records, they compose files using implementation-specific formats.

The main reasons to use specialized file organization over flat files are:

复制代码

1. Storage efficiency
2. Access efficiency
3. Update efficincy

A database system ususally separates data files and index files: data files store data records, while index files store record metadata and use it to locate records in data files.

Data files also called primary files.

B-Tree

B-Trees were introduced by Rudolph Bayer and Edward M. McCreight back in 1971 and gained popularity over the years.

Binary search trees

A binary search tree (BST) is a sorted in-memory data structure, used for efficient key-value lookups.

The balanced tree is defined as one that has a height of $log_2N$ where N is the total number of items in the tree; and the difference in height between the two subtrees is not greater than one.

Unbalanced trees have a worst-case complexity of $O(N)$ , balanced trees give us an average $O(log_2N)$ .

理解fanout in database

Binary trees have a fanout of just two.

2-3 Trees and other low-fanout trees have a similar limitation: while they are useful as in-memory data structures, small node size makes them impractical for external storage.

复制代码

* `High fanout` to improve locality of the neighboring keys
* `Low height` to reduce the number of seeks during traversal

Disk-Basked Structures

Most traditional algorithms were developed when spinning disks were the most widespread persistent storage medium, which significantly influenced their design.

On spinning disk, seeks increase costs of random reads because they require disk rotation and mechanical head movements to position the read/write head to the desired location. Once the expensive part is done, reading or writing contiguous bytes (i.e., sequential operations) is relatively cheap.

Head positioning is the most expensive part of an operation on the HDD.

Solid state drives (SSDs) do not have moving parts: there’s no disk that spins, or head that has to be positioned for the read. A typical SSD is built of memory cells , connected into strings (typeically 32 to 64 cells per string), strings are combined into arrays , arrays are combined into pages(range from 2 to 16 Kb), and pages are combined into blocks.

SSD concepts	size
memory cells	one or multiple bits
string	32-64 cells
array
page	2-16 Kb(smallest unit that can be written (programmed) or read)
block	64-512 pages(the smallest erase entity)
plane
die	where plane are placed on
SSD	one or more dies

The smallest unit that can be written (programmed) or read is a page, however we can only make changes to the empty memory cells (i.e., to ones that have been erased befor the write). The smallest erase entity is not a page, but a block that holds multiple pages, which is why it is often called an erase block.

pages in an empty block have to be written sequentially.

On-Disk Structures

Besides the cost of disk access itself, the main limitation and design condition for building efficient on-disk structures is the fact that the smallest unit of disk operation is a block.

Ubiquitous B-Trees

B-Trees can be thought of as a cast catalog room in the library

B-Trees build upon the foundataion of balanced search trees and are different in that they have higher fanout (have more child nodes) and smaller height.

While binary tree nodes are drawn as circles, B-Tree nodes are often drawn as rectangles.
在这里插入图片描述

B-Trees are sorted: keys inside the B-Tree nodes are stored in order. This implies that lookups in B-Trees have logarithmic complexity.

B-Trees consist of multiple nodes, each node holds up to N keys and N+1 pointers to the child nodes. These nodes are logically grouped into three groups:

复制代码

* Boot node
* Leaf nodes
* Internal nodes

B-Trees are a page organization technique, we often use terms node and page interchangeably.

Keys stored in B-Tree nodes are called index entries (sepatator keys, divider cells), they split the tree into subtrees (branches, subranges)

Page Structure

In B-Trees, we distinguish:

复制代码

* leaf nodes: hold keys and data records pair
* nonleaf nodes: hold keys and pointers to other nodes

Each B-Tree node occupies one page or multiple pages linked together.

Slotted page

To efficiently store variable-size records such as strings, binary large objects (BLOBs), etc., we can use an organization technique called slotted page or slot directory.

理解slotted page

Cell Layout

key cells

Key cells hold a separator key and a pointer to the page between two neighboring pointers

key-value cells

Key-value cells hold keys and data records associated with them

Database Transaction

A transaction is an indivisible logical unit of work in database management system, allowing you to represent multiple operations as a single step.

Distributed System

In a distributed system, we have several participants (also called processes, nodes, or replicas).

Each participant has its own local state.

CPUs are tiny distributed systems with links, processors, and communication protocols.

Algorithms define participant roles, exchanged messages, states, transitions, eecuted steps, properties of the delivery medium, timing assumptions, failure models, and other characteristics that describe processes and their interactions.

Consistency models

Consistency models describe concurrent executions and establish an order in which operations can be executed and made visible to the participants.

Concurrent and Parallel

When two sequences of steps execute concurrently, both of them are in progress, but only one of them is executed at any moment; Concurrent operations overlap in time;

If two sequences execute in parallel, their steps can be executed simultaneously; parallel operations are executed by multiple processors.

Network Partitions

When two or more servers cannot communicate with each other, we call the situation network partition.

全部评论 (0)

还没有任何评论哟~

《Database Internals: A Deep-Dive into How Distributed Data System Work》读书笔记

Overview Terminologycansometimesbeambiguousandhardtounderstandwithoutacompletecontext. Distinctionam...

Zookeeper‘s Role in Apache Flink: A Deep Dive into Distributed Stream Processing

1.背景介绍在大数据时代，流处理技术已经成为了一种非常重要的技术，它能够实时处理大量的数据，并进行实时分析和决策。ApacheFlink是一个流处理框架，它能够处理大规模的流数据，并提供了一系列的流...

阅读笔记-Security Code Review by LLMs: A Deep Dive into Responses

1. 研究背景：随着大型语言模型（LLMs）在软件开发任务中的潜力逐渐显现，例如代码补全、程序修复和测试用例生成，将LLMs更深入地整合到开发者的工作流程中以解决更具挑战性的任务成为了可能。代码审查...

A deep dive into transformer‘s inner workings!

作者：禅与计算机程序设计艺术 1.简介最近几年，Transformer模型在NLP领域获得了极大的关注，在文本处理方面也逐渐应用到许多任务中。但在理解Transformer的工作原理时，可能会感到比...

A Deep Dive into DNS Security for Beginners

作者：禅与计算机程序设计艺术 1.简介 1.1引言互联网服务目前已经成为世界范围内最流行的服务之一，越来越多的人开始使用互联网产品和服务。而网络服务中的域名系统（DomainNameSystem，D...

How to Import Large Data into Neo4j Database

HowtoImportLargeDataintoNeo4jDatabase Preface Weonlyintroducehowtoimportlargedataintographdatabasene...

dive into deep learning 笔记（pytorch版本）（二）

文章目录前言 softmax的原始代码 softmax的简洁代码多层感知机多层感知机原始代码多层感知机简洁代码前言 softmax回归其实和普通线性回归很像，只是普通的线性回归是二分类，但是...

dive into deep learning 笔记（pytorch版本）（一）

文章目录前言算数操作索引 Numpy、PIL、pytorch常用的类型转换广播机制自动求梯度解释 tensor自动求梯度中的叶子节点 detach 线性回归的原始版本读取数据的方式训练流...

ActorCritic Algorithm: A Deep Dive into Recurrent Neural Networks

1.背景介绍人工智能（ArtificialIntelligence,AI）是指一种使计算机具有人类智能的科学和技术。人工智能的目标是让计算机能够理解人类的智能，包括学习、理解自然语言、认知、决策、问...

React JSX: A Deep Dive into the JavaScript XML

ReactJSX:ADeepDiveintotheJavaScriptXML Introduction ReactJSX,oftenreferredtoassimplyJSX,isasyntaxext...

是否确定退出登录?

《Database Internals: A Deep-Dive into How Distributed Data System Work》读书笔记

Overview

Client/Server

Horizontal Scaling vs. Scaling Vertically

Database components

Database file system

B-Tree

Binary search trees

Disk-Basked Structures

On-Disk Structures

Ubiquitous B-Trees

Page Structure

Slotted page

Cell Layout

Database Transaction

Distributed System

Consistency models

Concurrent and Parallel

Network Partitions

全部评论 (0)

相关文章推荐

《Database Internals: A Deep-Dive into How Distributed Data System Work》读书笔记

Zookeeper‘s Role in Apache Flink: A Deep Dive into Distributed Stream Processing

阅读笔记-Security Code Review by LLMs: A Deep Dive into Responses

A deep dive into transformer‘s inner workings!

A Deep Dive into DNS Security for Beginners

How to Import Large Data into Neo4j Database

dive into deep learning 笔记（pytorch版本）（二）

dive into deep learning 笔记（pytorch版本）（一）

ActorCritic Algorithm: A Deep Dive into Recurrent Neural Networks

React JSX: A Deep Dive into the JavaScript XML