TiDB Architecture - Understanding Product Components


Compared with traditional standalone databases, TiDB provides the following benefits.

  • It has a distributed architecture with flexible and elastic scalability.
  • It is compatible with the MySQL 5.7 protocol and with common MySQL features and syntax.
    • In many cases, you can migrate an application to TiDB without changing a single line of code.
  • It provides high availability: when a minority of replicas fail, it recovers through automatic failover, which is transparent to the application.
  • It supports ACID transactions, making it suitable for scenarios that require strong consistency, such as bank transfers.
  • It provides a variety of tools for data migration, replication, and backup.

As a distributed database, TiDB consists of multiple components that communicate with each other to form a complete TiDB system. The architecture is as follows.

[Figure: TiDB Architecture]

TiDB Server - Externally Exposed Server

TiDB Server is a stateless SQL layer that exposes MySQL protocol connection endpoints externally.

TiDB Server receives SQL requests, parses and optimizes them, and generates distributed execution plans. It scales horizontally and presents a unified interface to clients through load balancing components such as LVS (Linux Virtual Server), HAProxy, or F5. It stores no data itself: it performs only computation and SQL analysis, and forwards actual data read requests to TiKV or TiFlash nodes.
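Because TiDB Server is stateless, any instance can serve any connection, which is what allows a load balancer to spread clients across instances freely. The sketch below is a minimal client-side round-robin selector standing in for LVS, HAProxy, or F5. The class name and endpoint addresses are hypothetical, and a real load balancer would also perform health checks and drain unhealthy instances.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer that rotates over stateless TiDB Server endpoints.

    Any endpoint can handle any request because TiDB Server holds no data.
    """

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle = cycle(self._endpoints)

    def next_endpoint(self):
        # Return the next endpoint in turn, wrapping around after the last one.
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical addresses; 4000 is TiDB's default MySQL-protocol port.
    lb = RoundRobinBalancer(["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"])
    for _ in range(4):
        print(lb.next_endpoint())
```

Adding a TiDB Server instance to this rotation requires no data movement, which is why the SQL layer scales out so easily.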

PD (Placement Driver) Server - Cluster Management

The PD (Placement Driver) server is the metadata management component for the entire cluster.

It stores the real-time data distribution of every TiKV node and the topology of the entire TiDB cluster, provides the TiDB Dashboard management UI, and allocates transaction IDs to distributed transactions. Beyond storing metadata, the PD server sends data scheduling commands to specific TiKV nodes based on the data distribution status that TiKV nodes report in real time, so it acts as the “brain” of the entire TiDB cluster. For high availability, the PD server consists of at least three nodes, and deploying an odd number of PD nodes is recommended.
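The recommendation for an odd number of PD nodes follows from majority-quorum arithmetic: the cluster keeps working only while more than half of its nodes are reachable. Assuming PD reaches agreement by majority quorum (PD embeds etcd, which uses the Raft consensus algorithm), the sketch below computes how many node failures a PD cluster of a given size can tolerate. The function name is hypothetical.

```python
def pd_fault_tolerance(node_count: int) -> int:
    """Number of PD node failures survivable while a strict majority remains.

    Assumes majority-quorum consensus.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    majority = node_count // 2 + 1
    return node_count - majority


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} PD nodes -> tolerates {pd_fault_tolerance(n)} failure(s)")
```

Four nodes tolerate the same single failure as three, and six tolerate the same two failures as five, so an extra even-numbered node adds cost without adding fault tolerance.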

Storage Servers - Storage Cluster

TiDB has two kinds of storage servers: TiKV Server and TiFlash Server.

TiKV Server

TiKV Server is responsible for storing data. TiKV is a distributed transactional key-value storage engine.

A Region is the basic unit of data storage. Each Region stores the data for a specific key range: the left-closed, right-open interval [StartKey, EndKey).
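The left-closed, right-open convention means every key belongs to exactly one Region, and a boundary key belongs to the Region that starts with it. The sketch below finds the Region containing a key by binary search over sorted start keys. The Region class, the locate_region function, and the use of an empty end key to mean "unbounded" are assumptions made for illustration, not TiKV's actual data structures.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    # Covers keys in the half-open interval [start_key, end_key).
    # Assumption: an empty end_key means the range is unbounded on the right.
    region_id: int
    start_key: bytes
    end_key: bytes

    def contains(self, key: bytes) -> bool:
        return self.start_key <= key and (self.end_key == b"" or key < self.end_key)


def locate_region(regions, key: bytes) -> Region:
    """Return the Region whose [start_key, end_key) contains key.

    `regions` must be sorted by start_key and cover the key space without gaps.
    """
    starts = [r.start_key for r in regions]
    # Index of the rightmost Region whose start_key <= key.
    idx = bisect.bisect_right(starts, key) - 1
    if idx < 0 or not regions[idx].contains(key):
        raise KeyError(key)
    return regions[idx]


if __name__ == "__main__":
    regions = [Region(1, b"", b"g"), Region(2, b"g", b"p"), Region(3, b"p", b"")]
    for k in [b"apple", b"g", b"orange", b"p", b"zebra"]:
        print(k, "->", locate_region(regions, k).region_id)
```

Note that the key b"g" maps to Region 2, not Region 1, because EndKey is excluded from a Region while StartKey is included.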

Each TiKV node hosts multiple Regions. The TiKV API natively supports distributed transactions on key-value pairs and provides snapshot isolation by default; this is the foundation of TiDB's support for distributed transactions at the SQL level. After processing a SQL statement, TiDB Server converts the execution plan into actual TiKV API calls, so the data itself resides in TiKV. TiKV automatically maintains multiple replicas of all data (three by default), which gives it native high availability and automatic failover.
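Because TiDB Server turns an execution plan into TiKV API calls, a simple SQL query ultimately becomes key-value reads. The sketch below illustrates the idea with a toy ordered store: a primary-key point query becomes a single get, and a primary-key range query becomes a scan. ToyKV, row_key, and the "t{table}_r{row}" key layout are hypothetical simplifications; TiDB's actual key encoding and the transactional TiKV API are more involved.

```python
import bisect


class ToyKV:
    """Hypothetical in-memory ordered key-value store standing in for TiKV."""

    def __init__(self):
        self._keys = []  # kept sorted so that range scans are possible
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key: bytes):
        return self._data.get(key)

    def scan(self, start: bytes, end: bytes):
        # Yield (key, value) pairs with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            yield k, self._data[k]


def row_key(table_id: int, row_id: int) -> bytes:
    # Hypothetical layout with zero-padded IDs, so byte order matches
    # numeric order for non-negative IDs.
    return f"t{table_id:010d}_r{row_id:020d}".encode()


if __name__ == "__main__":
    kv = ToyKV()
    for rid, name in [(1, b"alice"), (2, b"bob"), (3, b"carol")]:
        kv.put(row_key(42, rid), name)
    # SELECT name FROM users WHERE id = 2  ->  one point get
    print(kv.get(row_key(42, 2)))
    # SELECT name FROM users WHERE id >= 1 AND id < 3  ->  one range scan
    print([v for _, v in kv.scan(row_key(42, 1), row_key(42, 3))])
```

Keeping keys ordered is what lets a range predicate on the primary key become a contiguous scan, which also lines up with storing data in contiguous key-range Regions.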

TiFlash Server

TiFlash Server is a special kind of storage server. Unlike ordinary TiKV nodes, TiFlash stores data in columnar format and is mainly designed to accelerate analytical processing.
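Columnar storage helps analytics because an aggregate such as SUM over one column needs to read only that column's values instead of every full row. The sketch below pivots a few rows of a hypothetical orders table into a column-per-list layout to illustrate the idea; it is not TiFlash's actual storage format.

```python
def rows_to_columns(rows, column_names):
    """Pivot row-oriented records (tuples) into one list per column."""
    columns = {name: [] for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


if __name__ == "__main__":
    # Hypothetical orders table: (order_id, customer, amount)
    rows = [(1, "alice", 30), (2, "bob", 45), (3, "alice", 25)]
    cols = rows_to_columns(rows, ["order_id", "customer", "amount"])
    # SUM(amount) reads only the "amount" column, not whole rows.
    print(sum(cols["amount"]))  # 100
```

A row store suits point lookups and updates of whole records, while a column store suits scans and aggregates over a few columns; serving both kinds of workload is the reason TiDB offers TiKV and TiFlash side by side.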
