Log-Structured Merge Tree (LSM-Tree)

Basic Idea

An LSM-tree organizes data into multiple levels of storage:

  • In-memory component (C0): Fast, temporary storage (usually a sorted structure like a memtable)
  • Disk components (C1, C2, …): Larger, persistent storage on disk, often stored as sorted files (i.e. SSTables)

Instead of updating data directly on disk (which is slow due to random I/O), LSM-trees write changes sequentially to memory first, then periodically flush them to disk in bulk.