Log-Structured Merge Tree (LSM-Tree)
- is a type of data structure optimized for write-heavy workloads, commonly used in modern databases like Cassandra, RocksDB, LevelDB, and HBase
- it’s designed to make inserts, updates, and deletes efficient, especially when working with large datasets stored on disk
Basic Idea
An LSM-tree organizes data into multiple levels of storage:
- In-memory component (C0): Fast, temporary storage (usually a sorted structure like a memtable)
- Disk components (C1, C2, …): Larger, persistent storage on disk, often stored as sorted files (i.e. SSTables)
Instead of updating data directly on disk (which is slow due to random I/O), LSM-trees write changes sequentially to memory first, then periodically flush them to disk in bulk.