Resources

https://www.youtube.com/watch?v=kMNSAhsyiDg
https://www.youtube.com/watch?v=b6xeOLjeKs0
https://www.youtube.com/watch?v=RRsq9apr5QY
https://www.youtube.com/watch?v=Q4LYys9v9Ko - Tech Talk: What’s that Sound? An Overview of Shazam’s Audio Search Algorithm
https://www.youtube.com/watch?v=LZ7THTB88AE - Cameron Macleod - Implementing a Sound Identifier in Python
https://www.cameronmacleod.com/blog/how-does-shazam-work
https://github.com/worldveil/dejavu
https://www.youtube.com/watch?v=WhXgpkQ8E-Q - PWLTO#11 – Peter Sobot on An Industrial-Strength Audio Search Algorithm
https://github.com/itspoma/audio-fingerprint-identifying-python

Audio Features Invariant to Signal Degradations

Fourier coefficients
mel frequency cepstral coefficients (MFCC)
spectral flatness
sharpness
linear predictive coding (LPC)
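
As an illustration of one feature from the list, spectral flatness is the ratio of the geometric mean to the arithmetic mean of the power spectrum: close to 1 for noise-like frames, close to 0 for tonal ones. A minimal NumPy sketch; the function name and the `eps` floor are hypothetical choices for this example, not part of any library.

```python
import numpy as np

def spectral_flatness(power_spec, eps=1e-10):
    """Geometric mean / arithmetic mean of the power spectrum, per frame.

    power_spec: (n_freq_bins,) or (n_freq_bins, n_frames) array of |STFT|**2
    eps:        floor that keeps log() finite for empty bins (assumed value)
    """
    p = np.maximum(np.asarray(power_spec, dtype=float), eps)
    geometric_mean = np.exp(np.mean(np.log(p), axis=0))
    arithmetic_mean = np.mean(p, axis=0)
    return geometric_mean / arithmetic_mean

# A perfectly flat spectrum versus a single spectral line
flat_frame = np.ones(64)
tonal_frame = np.zeros(64)
tonal_frame[10] = 1.0
print(spectral_flatness(flat_frame), spectral_flatness(tonal_frame))
```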

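To extract a 32-bit sub-fingerprint from each frame, 33 non-overlapping frequency bands are selected. Each bit comes from comparing the energies of two adjacent bands, so 33 bands yield 32 bits.

frequency range from 300 Hz to 2000 Hz
logarithmic spacing (the human auditory system, HAS, operates on approximately logarithmic bands)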
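
The band layout above can be sketched in a few lines of NumPy. This is a rough sketch under stated assumptions, not a reference implementation: the function names are hypothetical, and the bit rule (sign of the energy difference between adjacent bands, differenced again between consecutive frames) is an assumption taken from the Haitsma and Kalker style of fingerprint, since the notes above only give the band count, range and spacing.

```python
import numpy as np

def band_edges(n_bands=33, f_lo=300.0, f_hi=2000.0):
    """Return n_bands + 1 logarithmically spaced band edges in Hz."""
    return np.geomspace(f_lo, f_hi, n_bands + 1)

def band_energies(power_spec, freqs, edges):
    """Sum spectral power inside each band.

    power_spec: (n_freq_bins, n_frames) array of |STFT|**2
    freqs:      (n_freq_bins,) centre frequency of each bin in Hz
    returns:    (n_bands, n_frames) array of band energies
    """
    n_bands = len(edges) - 1
    energies = np.empty((n_bands, power_spec.shape[1]))
    for b in range(n_bands):
        in_band = (freqs >= edges[b]) & (freqs < edges[b + 1])
        energies[b] = power_spec[in_band].sum(axis=0)
    return energies

def sub_fingerprints(energies):
    """Pack (33, n_frames) band energies into (n_frames - 1,) uint32 words."""
    # Difference between adjacent bands: 33 bands -> 32 values per frame.
    across_bands = energies[:-1] - energies[1:]
    # Difference of that quantity between consecutive frames (assumed rule).
    across_time = across_bands[:, 1:] - across_bands[:, :-1]
    bits = (across_time > 0).astype(np.uint32)
    # Lowest band pair goes into the most significant bit.
    weights = np.left_shift(np.uint32(1), np.arange(31, -1, -1, dtype=np.uint32))
    return (bits * weights[:, None]).sum(axis=0, dtype=np.uint32)
```

With a librosa spectrogram, `freqs` would come from `librosa.fft_frequencies(sr=sr, n_fft=n_fft)` and `power_spec` from `np.abs(D) ** 2`.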

Initial

yt-dlp -x "https://www.youtube.com/watch?v=hLQl3WQQoQ0"

ffmpeg -i song-01.opus -c:a pcm_s24le song-01.wav

import os
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

audio_fpath = "./audio/"
# Sort the listing so the first clip is the same on every run
audio_clips = sorted(os.listdir(audio_fpath))
clip_path = os.path.join(audio_fpath, audio_clips[0])

# x, sr = librosa.load(clip_path, sr=None, offset=15.0, duration=0.01)
x, sr = librosa.load(clip_path, sr=None)
# x is the audio time series
# sr is the sample rate

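# Compute the Short-Time Fourier Transform (STFT)
# n_fft=131072 gives very fine frequency bins but a window a few seconds long
D = librosa.stft(x, n_fft=131072)

# Keep only the magnitude of the complex STFT
D = np.abs(D)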

# Convert amplitude to decibels (log scale)
log_D = librosa.amplitude_to_db(D, ref=np.max)

# Plot the log spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(log_D, sr=sr, x_axis='time', y_axis='log')
plt.colorbar()
plt.show()
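
The `n_fft` value used above trades time resolution for frequency resolution. The sketch below works out that trade-off. The helper name is hypothetical and the 48 kHz sample rate is an assumption (check the `sr` that `librosa.load` actually returns). The default hop of `n_fft // 4` matches what `librosa.stft` uses when `hop_length` is not given.

```python
def stft_resolution(sr, n_fft, hop_length=None):
    """Return the basic time/frequency resolution figures of an STFT."""
    if hop_length is None:
        hop_length = n_fft // 4  # librosa.stft default when hop_length is omitted
    return {
        "freq_bins": n_fft // 2 + 1,      # rows in the STFT matrix
        "hz_per_bin": sr / n_fft,         # spacing between frequency bins
        "window_seconds": n_fft / sr,     # audio covered by one frame
        "hop_seconds": hop_length / sr,   # time step between frames
    }

# Assumed sample rate of 48 kHz
big = stft_resolution(sr=48000, n_fft=131072)
small = stft_resolution(sr=48000, n_fft=2048)
for name, res in (("n_fft=131072", big), ("n_fft=2048", small)):
    print(name, res)
```

At 48 kHz, `n_fft=131072` works out to roughly 0.37 Hz per bin and a 2.7 s window, so the plot resolves pitch very finely but smears events in time. A smaller `n_fft` such as 2048 gives the opposite balance.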

／var／log marcus chiu
