Short-Time Fourier Transform (STFT)

STFT - Definition

The STFT analyzes a signal 𝑥(𝑡) by multiplying it by a window function 𝑤(𝑡 − 𝜏) centered at time 𝜏, then taking the Fourier Transform of the product:
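In continuous time this is commonly written as follows. The symbol X(τ, ω) and the angular-frequency convention (ω in rad/s, negative exponent) are assumptions here; other sources use frequency in Hz or a different normalization.

```latex
X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-i \omega t}\, dt
```

The result depends on two variables: 𝜏 selects where in time the window sits, and ω selects which frequency is measured inside that window.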

By comparison, the ordinary Fourier Transform analyzes the whole signal 𝑥(𝑡) at once, with no window function:
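Under the same assumed convention (angular frequency ω, negative exponent), the unwindowed transform can be written as:

```latex
X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i \omega t}\, dt
```

This is the special case of a windowed transform in which the window equals 1 everywhere, so the result no longer depends on 𝜏 and all time localization is lost.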

Choosing Window Function

Window shapes differ in how they trade frequency resolution (main-lobe width) against spectral leakage (side-lobe level):

Rectangular window

  • simple, with the narrowest main lobe, but causes heavy spectral leakage (high side lobes, the largest only about 13 dB below the main lobe)

Hamming window

  • reduces leakage by tapering the edges toward (but not all the way to) zero, at the cost of a wider main lobe

Hann window

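  • a raised-cosine taper like Hamming, but it reaches zero at the edges; its side lobes fall off faster, although its highest side lobe is higher (about −31 dB versus about −43 dB for Hamming)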

Gaussian window

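  • smooth, and achieves the lower bound of the time-frequency uncertainty principle (the smallest possible product of time spread and frequency spread)
  • when it is used, the resulting STFT is called a Gabor transform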
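The leakage differences between these windows can be checked numerically. The following is a minimal sketch using NumPy, not a reference implementation: the helper name `peak_sidelobe_db`, the window length `N = 64`, the FFT size, and the Gaussian width `sigma = N / 6` are all assumptions chosen for illustration.

```python
import numpy as np

def peak_sidelobe_db(window, n_fft=8192):
    """Highest side-lobe level in dB relative to the main-lobe peak."""
    # Zero-padded FFT gives a finely sampled view of the window's spectrum.
    spectrum = np.abs(np.fft.rfft(window, n_fft))
    mag_db = 20 * np.log10(np.maximum(spectrum / spectrum.max(), 1e-12))
    # The main lobe ends at the first bin where the magnitude starts rising again.
    rising = np.diff(mag_db) > 0
    first_null = int(np.argmax(rising))
    return float(mag_db[first_null:].max())

N = 64                 # hypothetical window length in samples
n = np.arange(N)
sigma = N / 6          # hypothetical Gaussian width (in samples)

windows = {
    "rectangular": np.ones(N),
    "hamming": np.hamming(N),
    "hann": np.hanning(N),
    "gaussian": np.exp(-0.5 * ((n - (N - 1) / 2) / sigma) ** 2),
}

sidelobes = {name: peak_sidelobe_db(w) for name, w in windows.items()}
for name, level in sidelobes.items():
    print(f"{name:12s} peak side lobe: {level:6.1f} dB")
```

The Gaussian result depends strongly on the chosen `sigma`, because truncating the window to a finite length is what produces its side lobes in the first place.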

Choosing Window/Frame Size

The values below assume a sample rate of Fs = 44.1 kHz.

| Size (N) | Time span of frame (N/Fs) | Frequency resolution Δf = Fs/N | Notes |
|----------|---------------------------|--------------------------------|-------|
| 256      | ~5.8 ms                   | ~172 Hz/bin                    | Very fast, good for transients (drums, speech consonants), but poor pitch resolution |
| 512      | ~11.6 ms                  | ~86 Hz/bin                     | Balance for speech, can still track rapid events |
| 1024     | ~23.2 ms                  | ~43 Hz/bin                     | Common default, good mix for general audio/music |
| 2048     | ~46.4 ms                  | ~21.5 Hz/bin                   | Better frequency detail (notes, harmonics), worse timing |
| 4096     | ~93 ms                    | ~10.8 Hz/bin                   | High frequency precision, but smears fast events |
| 8192     | ~186 ms                   | ~5.4 Hz/bin                    | Very sharp in frequency, very blurry in time; used in offline spectral analysis, not real-time |
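The frame durations and bin widths follow directly from N/Fs and Fs/N. A short sketch that recomputes them, assuming Fs = 44.1 kHz (the sample rate the listed values are consistent with); the function name `frame_stats` is hypothetical:

```python
FS = 44_100  # assumed sample rate in Hz

def frame_stats(n, fs=FS):
    """Return (frame duration in ms, frequency resolution in Hz per bin)."""
    duration_ms = n / fs * 1000.0
    delta_f = fs / n
    return duration_ms, delta_f

for n in (256, 512, 1024, 2048, 4096, 8192):
    duration_ms, delta_f = frame_stats(n)
    print(f"N={n:5d}  {duration_ms:7.1f} ms  {delta_f:7.1f} Hz/bin")
```

Doubling N doubles the frame duration and halves the bin width, so the product of the two stays constant. That is the time-frequency trade-off in its simplest form.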
