TiDB Overview

TiDB (/’taɪdiːbi:/, where “Ti” stands for Titanium) is an open-source NewSQL database that supports hybrid transactional and analytical processing (HTAP) workloads. It is compatible with MySQL and provides horizontal scalability, strong consistency, and high availability. TiDB aims to provide users with a one-stop database solution for OLTP (online transaction processing), OLAP (online analytical processing), and HTAP services. TiDB is suitable for a wide range of use cases that require high availability and strong consistency for large-scale data.

Key Features

Horizontal scale-out and simple scale-in

TiDB’s architecture separates computing from storage, so compute or storage capacity can be scaled out or scaled in independently online as needed. The scaling process is transparent to application operations and maintenance teams.

Financial-grade high availability

Data is stored in multiple replicas. Data replicas use the Multi-Raft protocol to replicate transaction logs. A transaction can be committed only after its data has been successfully written to a majority of replicas. This ensures strong consistency and keeps the data available when a minority of replicas go down. To meet different disaster-resilience requirements, you can configure the geographical locations and the number of replicas as needed.
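
The majority rule described above comes down to simple arithmetic. The following is a minimal sketch of that arithmetic only, not of TiDB's or TiKV's actual implementation; the function names `quorum_size` and `tolerated_failures` are hypothetical and chosen for illustration.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Number of replicas that can be lost while a majority still remains."""
    return replicas - quorum_size(replicas)


if __name__ == "__main__":
    # With 3 replicas a write needs 2 acknowledgements and 1 replica may fail;
    # with 5 replicas a write needs 3 acknowledgements and 2 replicas may fail.
    for n in (3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule, going from 3 to 4 replicas raises the quorum to 3 without tolerating an additional failure, which is one reason odd replica counts are the usual choice.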

Real-time HTAP

TiDB provides two storage engines. TiKV is a row-based storage engine, and TiFlash is a column-based storage engine. TiFlash uses the Multi-Raft Learner protocol to replicate data from TiKV in real time, ensuring data consistency between the TiKV row-based storage engine and the TiFlash column-based storage engine. TiKV and TiFlash can be deployed on different systems as needed to address HTAP resource isolation.
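
In practice, columnar replicas are requested per table with the `ALTER TABLE ... SET TIFLASH REPLICA` statement. The sketch below only builds that statement as a string so it can be passed to any MySQL-compatible client. The helper name `tiflash_replica_sql` and the example table `test.orders` are hypothetical, and the identifier check is intentionally stricter than what TiDB itself accepts.

```python
import re

# Deliberately strict identifier pattern (an assumption made for this sketch;
# TiDB accepts a wider range of identifiers than this).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tiflash_replica_sql(table: str, replicas: int = 1) -> str:
    """Build the statement that sets the TiFlash replica count for a table.

    `table` is either `name` or `schema.name`. A count of 0 removes the
    columnar replicas again.
    """
    parts = table.split(".")
    if not 1 <= len(parts) <= 2 or not all(_IDENT.fullmatch(p) for p in parts):
        raise ValueError(f"unsupported table name: {table!r}")
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    quoted = ".".join(f"`{p}`" for p in parts)
    return f"ALTER TABLE {quoted} SET TIFLASH REPLICA {replicas}"


if __name__ == "__main__":
    print(tiflash_replica_sql("test.orders"))
```

Once a replica exists, the optimizer can choose between the row store and the column store for each query, so transactional statements do not need to change.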

HTAP (Hybrid Transaction/Analytical Processing) is a term coined by the research firm Gartner. It refers to a next-generation data platform that serves both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) workloads without first copying the data into a separate analytical system. Original article: ["Big Data Frugal Innovation": 8 Convergence Tips from MapR](https://www.ciokorea.com/t/544/12200/30801?page=0,1#csidxd92aa7fca28b93089f3d5aea81955b0)

Cloud-native distributed database

TiDB is a distributed database designed for the cloud, providing flexible scalability, reliability, and security on cloud platforms. Users can scale TiDB flexibly to match changing workload requirements. With TiDB, each piece of data has at least three replicas and can be scheduled across different cloud availability zones to tolerate the loss of an entire data center. TiDB Operator supports TiDB management on Kubernetes and automates tasks related to TiDB cluster operations. This makes it easy to deploy TiDB on clouds that provide managed Kubernetes. TiDB Cloud, a fully managed TiDB service, is the easiest, most cost-effective, and most elastic way to use TiDB in the cloud, allowing you to deploy and run TiDB clusters with just a few clicks.

Compatible with the MySQL 5.7 protocol and MySQL ecosystem

TiDB is compatible with the MySQL 5.7 protocol, common MySQL features, and the MySQL ecosystem. Migrating an application to TiDB often requires no code changes, or only a small amount of code modification. TiDB also provides a set of data migration tools that help migrate application data to TiDB easily.
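
Because TiDB speaks the MySQL protocol, an ordinary MySQL driver can be pointed at it. The sketch below assumes a local test cluster listening on port 4000 (TiDB's default SQL port) with the default `root` user, an empty password, and the `test` database; adjust these values for a real deployment. It also assumes the third-party PyMySQL driver is installed. The helper names are hypothetical.

```python
from typing import Any, Dict


def tidb_connection_params(
    host: str = "127.0.0.1",
    port: int = 4000,        # TiDB's default SQL port (MySQL itself uses 3306)
    user: str = "root",      # assumption: default user of a local test cluster
    password: str = "",      # assumption: empty password on a local test cluster
    database: str = "test",  # assumption: the default `test` database
) -> Dict[str, Any]:
    """Return keyword arguments suitable for a MySQL driver's connect() call."""
    return {
        "host": host,
        "port": port,
        "user": user,
        "password": password,
        "database": database,
    }


def connect(params: Dict[str, Any]):
    """Open a connection with PyMySQL, imported lazily so the sketch loads without it."""
    import pymysql  # third-party driver, assumed to be installed

    return pymysql.connect(**params)


if __name__ == "__main__":
    # Printing the parameters needs no running cluster; connect() does.
    print(tidb_connection_params())
```

Apart from the port number, nothing here is specific to TiDB, which is the point of protocol compatibility: existing connection code usually carries over unchanged.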

Use Cases

Financial industry scenarios with high requirements for data consistency, reliability, availability, scalability, and fault tolerance

The financial industry has high requirements for data consistency, reliability, availability, scalability, and fault tolerance. Traditional solutions serve traffic from two data centers in the same city, while a third data center in another city provides disaster recovery but does not serve traffic. This approach has drawbacks such as low resource utilization and high maintenance costs, and its RTO (Recovery Time Objective) and RPO (Recovery Point Objective) often fall short of expectations. TiDB uses multiple replicas and the Multi-Raft protocol to schedule data across different data centers, racks, and devices. If some devices fail, the system fails over automatically, enabling RTO ≤ 30 seconds and RPO = 0.
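
Whether a given replica layout can ride out the loss of a data center follows from the majority rule. The following is a simplified sketch under that assumption alone: it ignores failover timing, scheduling, and network partitions, and the function name and data center labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def survives_any_single_dc_loss(replica_dcs: Iterable[str]) -> bool:
    """Return True if a majority of replicas remains after losing any one data center.

    `replica_dcs` lists the data center that hosts each replica,
    for example ["dc1", "dc2", "dc3"].
    """
    placement = Counter(replica_dcs)
    total = sum(placement.values())
    if total == 0:
        return False
    majority = total // 2 + 1
    # Losing a data center removes every replica hosted there.
    return all(total - lost >= majority for lost in placement.values())


if __name__ == "__main__":
    print(survives_any_single_dc_loss(["dc1", "dc2", "dc3"]))  # one replica per data center
    print(survives_any_single_dc_loss(["dc1", "dc1", "dc2"]))  # two replicas share dc1
```

With three replicas, the layout only survives if each replica sits in its own data center; placing two of them together means the loss of that data center takes the majority with it.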

Massive data and high-concurrency scenarios with high requirements for storage capacity, scalability, and concurrency

As applications grow rapidly, data volume increases sharply, and traditional standalone databases can no longer meet the capacity requirements. The solution is to use sharding middleware or a NewSQL database such as TiDB. Of the two, the NewSQL database is the more cost-effective option. TiDB uses a separated compute and storage architecture, so compute or storage capacity can be scaled out or scaled in independently. The compute layer supports up to 512 nodes, each node supports up to 1,000 concurrent executions, and the maximum cluster capacity reaches the PB (petabyte) level.

Real-time HTAP scenarios

With the rapid growth of 5G, the Internet of Things, and artificial intelligence, data generated by enterprises has increased enormously, reaching hundreds of TB (terabytes) or even PB scale. A traditional solution is to use an OLTP database for online transaction applications and use ETL (extract, transform, load) tools to replicate data to an OLAP database for data analysis. This approach has several drawbacks, including high storage costs and low real-time performance. TiDB introduced the TiFlash columnar storage engine in v4.0. Combined with the TiKV row-based storage engine, it turns TiDB into a true HTAP database. With only a small amount of additional storage cost, both online transaction processing and real-time data analysis can be handled in the same system, significantly reducing costs.

Data aggregation and secondary processing scenarios

Most enterprise application data is distributed across various systems. As applications grow, decision-makers need to understand the company’s overall business status and make timely decisions. In this situation, a company needs to aggregate distributed data into the same system and perform secondary processing to generate T+0 or T+1 reports. A traditional solution is to use ETL and Hadoop, but Hadoop systems are complex and have high operations, maintenance, and storage costs. Compared with Hadoop, TiDB is much simpler. You can use ETL or data migration tools provided by TiDB to replicate data into TiDB. Reports can be generated directly using SQL statements.

TiDB Introduction Last modified on 2022-08-11 18:22:49: overview: refine the size of the intro video (#9938) (#9940)