Waveform class

This file contains the central Waveform class of the surfboard package and all of its methods.

class surfboard.sound.Waveform(path=None, signal=None, sample_rate=44100)

The central class of the package. Instantiate it either with a path to a sound file and the sample rate at which to load it, or with a signal and its sample rate. The methods of this class then compute the various components.

waveform

Properties written in this way prevent users from assigning to self.waveform.

sample_rate

Properties written in this way prevent users from assigning to self.sample_rate.
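A minimal sketch of the read-only property pattern described above. The class name `ReadOnlyAudio` and the private attribute names are hypothetical and are not taken from the surfboard source; the point is only that a property without a setter rejects assignment.

```python
class ReadOnlyAudio:
    """Hypothetical stand-in for a class exposing read-only attributes."""

    def __init__(self, waveform, sample_rate):
        # Store the values privately; only the getters below are public.
        self._waveform = list(waveform)
        self._sample_rate = sample_rate

    @property
    def waveform(self):
        return self._waveform

    @property
    def sample_rate(self):
        return self._sample_rate


audio = ReadOnlyAudio([0.0, 0.5, -0.5], sample_rate=44100)
try:
    audio.sample_rate = 16000  # no setter is defined, so this raises
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

Reading `audio.sample_rate` still works as usual; only assignment is blocked.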

compute_components(component_list)

Compute components from self.waveform and self.sample_rate using a list of strings which identify which components to compute. You can pass in arguments to the components (e.g. frame_length_seconds) by passing in the components as dictionaries. For example: {'mfcc': {'n_mfcc': 26}}. See README.md for more details.

Parameters:
  • component_list (list of str or dict) – The methods to be computed. If elements are str, then the method uses default arguments. If dict, the arguments are passed to the methods.
Returns:

Dictionary mapping component names to computed components.

Return type:

dict
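The str-or-dict convention can be sketched as a simple dispatch loop. `ToyWaveform` and its two components (`peak`, `head`) are hypothetical and exist only for illustration; the actual implementation in surfboard may differ.

```python
class ToyWaveform:
    """Hypothetical class with two components, for illustration only."""

    def __init__(self, samples):
        self.samples = samples

    def peak(self):
        return max(abs(s) for s in self.samples)

    def head(self, n=2):
        return self.samples[:n]

    def compute_components(self, component_list):
        results = {}
        for component in component_list:
            if isinstance(component, str):
                # Plain string: call the method with its default arguments.
                results[component] = getattr(self, component)()
            else:
                # Dictionary: each key is a method name, each value its kwargs.
                for name, kwargs in component.items():
                    results[name] = getattr(self, name)(**kwargs)
        return results


toy = ToyWaveform([0.1, -0.7, 0.3])
components = toy.compute_components(["peak", {"head": {"n": 1}}])
```

Here `components` maps each component name to its computed value, mirroring the return type described above.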
mfcc(n_mfcc=13, n_fft_seconds=0.04, hop_length_seconds=0.01)

Given a number of MFCCs, uses the librosa.feature.mfcc method to compute that number of MFCCs on self.waveform and returns the array.

Parameters:
  • n_mfcc (int) – number of MFCCs to compute
  • n_fft_seconds (float) – length of the FFT window in seconds.
  • hop_length_seconds (float) – how much the window shifts for every timestep, in seconds.
Returns:

MFCCs.

Return type:

np.array, [n_mfcc, T / hop_length]
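The window and hop arguments are given in seconds, so they have to be converted to sample counts before framing. The sketch below assumes the conversion rounds to the nearest sample and that frames are centered (as with librosa's default `center=True`); both are assumptions, and the helper names are hypothetical.

```python
def seconds_to_samples(seconds, sample_rate):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sample_rate))


def n_frames_centered(n_samples, hop_length):
    """Frame count when frames are centered on multiples of hop_length."""
    return 1 + n_samples // hop_length


sample_rate = 44100
n_fft = seconds_to_samples(0.04, sample_rate)        # FFT window in samples
hop_length = seconds_to_samples(0.01, sample_rate)   # hop in samples
frames = n_frames_centered(sample_rate, hop_length)  # one second of audio
```

With the default arguments and a 44100 Hz sample rate, this gives a 1764-sample window, a 441-sample hop, and 101 frames per second of audio.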

log_melspec(n_mels=128, n_fft_seconds=0.04, hop_length_seconds=0.01)

Given a number of filter banks, this uses the librosa.feature.melspectrogram method to compute the log melspectrogram of self.waveform.

Parameters:
  • n_mels (int) – Number of filter banks per time step in the log melspectrogram.
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Log mel spectrogram.

Return type:

np.array, [n_mels, T_mels]

magnitude_spectrum(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude of the STFT of self.waveform. This is used for further spectral analysis.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

The magnitude spectrogram

Return type:

np.array, [n_fft / 2 + 1, T / hop_length]

bark_spectrogram(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum of self.waveform and arrange the frequency bins in the Bark scale. See https://en.wikipedia.org/wiki/Bark_scale

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

The Bark spectrogram

Return type:

np.array, [n_bark_bands, T / hop_length]

morlet_cwt(widths=None)

Compute the Morlet Continuous Wavelet Transform of self.waveform. Note that this method returns a large matrix. Shown to be relevant in Vasquez-Correa et al., 2016.

Parameters:
  • wavelet (str) – Wavelet to use. Currently only supports “morlet”.
  • widths (None or list) – If None, uses the default of 32 evenly spaced widths, [i * sample_rate / 500 for i in range(1, 33)].
Returns:

The continuous wavelet transform

Return type:

np.array, [len(widths), T]
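As a worked example of the default widths formula given in the parameter list, the following evaluates it for a 44100 Hz sample rate. The function name and the `n_widths` argument are hypothetical.

```python
def default_widths(sample_rate, n_widths=32):
    """Evenly spaced widths, following the formula in the parameter list."""
    return [i * sample_rate / 500 for i in range(1, n_widths + 1)]


widths = default_widths(44100)
smallest, largest = widths[0], widths[-1]
```

At 44100 Hz the widths run from about 88.2 to about 2822.4 samples, so the returned matrix has 32 rows and one column per sample of the waveform.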

chroma_stft(n_fft_seconds=0.04, hop_length_seconds=0.01, n_chroma=12)

See librosa.feature documentation for more details on this component. This computes a chromagram from a waveform.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
  • n_chroma (int) – Number of chroma bins to compute.
Returns:

The chromagram

Return type:

np.array, [n_chroma, T / hop_length]

chroma_cqt(hop_length_seconds=0.01, n_chroma=12)

See librosa.feature documentation for more details on this component. This computes a constant-Q chromagram from a waveform.

Parameters:
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
  • n_chroma (int) – Number of chroma bins to compute.
Returns:

The constant-Q chromagram

Return type:

np.array, [n_chroma, T / hop_length]

chroma_cens(hop_length_seconds=0.01, n_chroma=12)

See librosa.feature documentation for more details on this component. This computes the CENS chroma variant from a waveform.

Parameters:
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
  • n_chroma (int) – Number of chroma bins to compute.
Returns:

CENS-chromagram

Return type:

np.array, [n_chroma, T / hop_length]

spectral_slope(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum, and compute the spectral slope from that. This is a basic approximation of the spectrum by a linear regression line. There is one coefficient per timestep.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Linear regression slope, for every timestep.

Return type:

np.array, [1, T / hop_length]
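The per-frame regression can be sketched as an ordinary least-squares fit of the magnitudes in each frame. Regressing against the bin index (rather than the frequency in Hz) is an assumption made for brevity, and the function names are hypothetical.

```python
def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def spectral_slope_sketch(frames):
    """One slope per frame; each frame is a list of bin magnitudes."""
    return [regression_slope(list(range(len(f))), f) for f in frames]


# First frame decays linearly with bin index, second frame is flat.
frames = [[4.0, 3.0, 2.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
slopes = spectral_slope_sketch(frames)
```

A spectrum whose energy falls off with frequency yields a negative slope, and a flat spectrum yields a slope of zero.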

spectral_flux(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum, and compute the spectral flux from that. This is a basic metric, measuring the rate of change of the spectrum.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

The spectral flux array.

Return type:

np.array, [1, T / hop_length]

spectral_entropy(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum, and compute the spectral entropy from that. To compute that, simply normalize each frame of the spectrum, so that they are a probability distribution, then compute the entropy from that.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

The entropy of each normalized frame.

Return type:

np.array, [1, T / hop_length]
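The normalize-then-entropy step can be sketched for a single frame as follows. The base-2 logarithm and the handling of an all-zero frame are assumptions; the library may use a different base or convention.

```python
import math


def frame_entropy(magnitudes):
    """Entropy (in bits) of one magnitude frame treated as a distribution."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # assumption: a silent frame has zero entropy
    probabilities = [m / total for m in magnitudes]
    # Skip zero-probability bins, since 0 * log(0) is taken to be 0.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


flat = frame_entropy([1.0, 1.0, 1.0, 1.0])    # energy spread over all bins
peaked = frame_entropy([0.0, 5.0, 0.0, 0.0])  # energy in a single bin
```

A flat frame gives the maximum entropy for its number of bins, while a frame with all its energy in one bin gives zero.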

spectral_centroid(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral centroid from magnitude spectrum. “First moment”.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Spectral centroid of the magnitude spectrum (first moment).

Return type:

np.array, [1, T / hop_length]

spectral_spread(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral spread (also spectral variance) from magnitude spectrum.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Spectral spread of the magnitude spectrum (second moment).

Return type:

np.array, [1, T / hop_length]

spectral_skewness(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral skewness from magnitude spectrum.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Spectral skewness of the magnitude spectrum (third moment).

Return type:

np.array, [1, T / hop_length]

spectral_kurtosis(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral kurtosis from magnitude spectrum.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Spectral kurtosis of the magnitude spectrum (fourth moment).

Return type:

np.array, [1, T / hop_length]
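The four spectral moments (centroid, spread, skewness, kurtosis) can be sketched for one frame by treating the normalized magnitudes as weights over the bin frequencies. Taking the spread as the variance follows the description above; normalizing skewness and kurtosis by powers of the spread is an assumption, and the library's conventions may differ.

```python
def spectral_moments(frequencies, magnitudes):
    """Centroid, spread, skewness and kurtosis of one magnitude frame."""
    total = sum(magnitudes)
    weights = [m / total for m in magnitudes]
    centroid = sum(f * w for f, w in zip(frequencies, weights))

    def central_moment(order):
        return sum(
            ((f - centroid) ** order) * w for f, w in zip(frequencies, weights)
        )

    spread = central_moment(2)  # variance around the centroid
    return {
        "centroid": centroid,
        "spread": spread,
        "skewness": central_moment(3) / spread ** 1.5,
        "kurtosis": central_moment(4) / spread ** 2,
    }


# Symmetric frame centered on 100 Hz.
moments = spectral_moments([0.0, 100.0, 200.0], [1.0, 2.0, 1.0])
```

For this symmetric frame the centroid sits at the middle bin and the skewness is zero.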

spectral_flatness(n_fft_seconds=0.04, hop_length_seconds=0.01)

Given an FFT window size and a hop length, uses the librosa feature package to compute the spectral flatness of self.waveform. This component is a measure to quantify how “noise-like” a sound is. The closer to 1, the closer the sound is to white noise.

Parameters:
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Spectral flatness vector computed over windows.

Return type:

np.array, [1, T/hop_length]
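Spectral flatness is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below applies that definition to one frame; the `floor` argument, used to avoid taking the logarithm of zero, is a hypothetical detail and not librosa's exact implementation.

```python
import math


def flatness(power_spectrum, floor=1e-10):
    """Geometric mean divided by arithmetic mean of one power frame."""
    clipped = [max(p, floor) for p in power_spectrum]
    mean_log = sum(math.log(p) for p in clipped) / len(clipped)
    geometric_mean = math.exp(mean_log)
    arithmetic_mean = sum(clipped) / len(clipped)
    return geometric_mean / arithmetic_mean


noise_like = flatness([1.0, 1.0, 1.0, 1.0])  # equal energy in every bin
tonal = flatness([0.0, 0.0, 1.0, 0.0])       # all energy in one bin
```

The flat frame scores 1, matching the "closer to white noise" reading above, while the single-peak frame scores close to 0.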

spectral_rolloff(roll_percent=0.85, n_fft_seconds=0.04, hop_length_seconds=0.01)

Given an FFT window size and a hop length, uses the librosa feature package to compute the spectral roll-off of self.waveform. It is the frequency below which a given fraction (roll_percent) of the signal's energy is contained, and is useful in distinguishing sounds with different energy distributions.

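Parameters:
  • roll_percent (float) – Roll-off fraction, between 0 and 1.
  • n_fft_seconds (float) – Length of the FFT window in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: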

Spectral rolloff vector computed over windows.

Return type:

np.array, [1, T/hop_length]
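The roll-off point of a single frame can be sketched as the first bin at which the cumulative magnitude reaches roll_percent of the total. This mirrors the definition used by librosa but is a simplified illustration, not its implementation; the function name is hypothetical.

```python
def rolloff_frequency(frequencies, magnitudes, roll_percent=0.85):
    """Lowest bin frequency at or below which roll_percent of the energy lies."""
    threshold = roll_percent * sum(magnitudes)
    cumulative = 0.0
    for frequency, magnitude in zip(frequencies, magnitudes):
        cumulative += magnitude
        if cumulative >= threshold:
            return frequency
    return frequencies[-1]


freqs = [0.0, 100.0, 200.0, 300.0]
high = rolloff_frequency(freqs, [1.0, 1.0, 1.0, 1.0])
low = rolloff_frequency(freqs, [1.0, 1.0, 1.0, 1.0], roll_percent=0.5)
```

Lowering roll_percent moves the roll-off point toward lower frequencies, as the two calls show.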

loudness()

Compute the loudness of self.waveform using the pyloudnorm package. See https://github.com/csteinmetz1/pyloudnorm for more details on potential arguments to the functions below.

Returns:

The loudness of self.waveform

Return type:

float

loudness_slidingwindow(frame_length_seconds=1, hop_length_seconds=0.25)

Compute the loudness of self.waveform over time. See self.loudness for more details.

Parameters:
  • frame_length_seconds (float) – Length of the sliding window in seconds.
  • hop_length_seconds (float) – How much the sliding window shifts for every timestep, in seconds.
Returns:

The loudness on frames of self.waveform

Return type:

np.array, [1, T / hop_length]

shannon_entropy()

Compute the Shannon entropy of self.waveform, as per https://ijssst.info/Vol-16/No-4/data/8258a127.pdf

Returns:

Shannon entropy of the waveform.

Return type:

float

shannon_entropy_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Compute the Shannon entropy of subblocks of a waveform into a newly created time series, as per https://ijssst.info/Vol-16/No-4/data/8258a127.pdf

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Shannon entropy for each frame

Return type:

np.array, [1, T / hop_length]
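The sliding-window idea used by this and the other `*_slidingwindow` methods can be sketched as cutting the waveform into overlapping frames and applying a scalar metric to each. Treating sample values as discrete symbols, using base-2 logarithms, and dropping the incomplete final frame are assumptions made to keep the example small; the estimator in the referenced paper may differ.

```python
import math
from collections import Counter


def sliding_windows(samples, frame_length, hop_length):
    """Yield successive frames of frame_length samples, hop_length apart."""
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        yield samples[start:start + frame_length]


def shannon_entropy(frame):
    """Entropy (in bits) of the empirical distribution of sample values."""
    counts = Counter(frame)
    n = len(frame)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


signal = [0, 0, 0, 0, 0, 1, 0, 1]
entropies = [shannon_entropy(f) for f in sliding_windows(signal, 4, 2)]
```

The result is a time series with one entropy value per frame: constant frames score zero and frames with a more even mix of values score higher.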

zerocrossing()

Compute the zero crossing rate on self.waveform and return it, as per https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0162128&type=printable. Note: the zero crossing rate can also be computed as a time series; see librosa.feature.zero_crossing_rate and self.get_zcr_sequence.

Returns:

Dictionary with keys “num_zerocrossings” and “rate”, where zerocrossing[“num_zerocrossings”] is the number of zero crossings in self.waveform and zerocrossing[“rate”] is the number of zero crossings divided by the number of samples.

Return type:

dict
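A minimal sketch of the two returned quantities, counting a crossing whenever consecutive samples differ in sign. Treating a sample equal to zero as positive is an assumption; the function name is hypothetical.

```python
def zero_crossings(samples):
    """Count sign changes and divide by the number of samples."""
    crossings = sum(
        1
        for previous, current in zip(samples, samples[1:])
        if (previous >= 0) != (current >= 0)
    )
    return {"num_zerocrossings": crossings, "rate": crossings / len(samples)}


result = zero_crossings([1.0, -1.0, 1.0, -1.0])
```

For this alternating signal every pair of neighbours changes sign, giving three crossings over four samples.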
zerocrossing_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Compute the zero crossing rate sequence on self.waveform and return it. Every entry of the sequence is computed on a frame of frame_length samples, and consecutive frames are hop_length samples apart.

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Fraction of zero crossings for each frame.

Return type:

np.array, [1, T / hop_length]

rms(frame_length_seconds=0.04, hop_length_seconds=0.01)

Get the root mean square value for each frame, with a specific frame length and hop length. This quantity has also been referred to as RMSE, or root mean square energy.

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

RMS value for each frame.

Return type:

np.array, [1, T / hop_length]

intensity(frame_length_seconds=0.04, hop_length_seconds=0.01)

Get a value proportional to the intensity for each frame, with a specific frame length and hop length. Note that the intensity is proportional to the RMS amplitude squared.

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Proportional intensity value for each frame.

Return type:

np.array, [1, T / hop_length]

crest_factor(frame_length_seconds=0.04, hop_length_seconds=0.01)

Get the crest factor of this waveform, on sliding windows. This value measures the local intensity of peaks in a waveform. Implemented as per: https://en.wikipedia.org/wiki/Crest_factor

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Crest factor for each frame.

Return type:

np.array, [1, T / hop_length]
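The three related per-frame quantities (RMS, intensity, crest factor) can be sketched for a single frame as follows. The intensity is shown only up to a constant factor, as stated above, and the crest factor follows the peak-over-RMS definition from the linked Wikipedia article; the function names are hypothetical.

```python
import math


def rms(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def proportional_intensity(frame):
    """Value proportional to intensity: the squared RMS amplitude."""
    return rms(frame) ** 2


def crest_factor(frame):
    """Peak absolute amplitude divided by the RMS amplitude."""
    return max(abs(s) for s in frame) / rms(frame)


square = [1.0, -1.0, 1.0, -1.0]
sine = [math.sin(2 * math.pi * k / 64) for k in range(64)]  # one full period
square_crest = crest_factor(square)
sine_crest = crest_factor(sine)
```

A square wave has a crest factor of 1 because its peak equals its RMS value, while a sine wave has a crest factor of about the square root of 2.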

f0_contour(hop_length_seconds=0.01, method='swipe', f0_min=60, f0_max=300)

Compute the F0 contour using pysptk: https://github.com/r9y9/pysptk/.

Parameters:
  • hop_length_seconds (float) – Hop size argument in pysptk. Corresponds to hopsize in the window sliding of the computation of f0. This is in seconds and gets converted.
  • method (str) – One of ‘swipe’ or ‘rapt’. Define which method to use for f0 calculation. See https://github.com/r9y9/pysptk
  • f0_min (float) – minimum acceptable f0.
  • f0_max (float) – maximum acceptable f0.
Returns:

F0 contour of self.waveform. Contains unvoiced frames.

Return type:

np.array, [1, t1]

f0_statistics(hop_length_seconds=0.01, method='swipe')

Compute the F0 mean and standard deviation of self.waveform. Note that we cannot simply rely on using statistics applied to the f0_contour since we do not want to include the zeros in the mean and standard deviation calculations.

Parameters:
  • hop_length_seconds (float) – Hop size argument in pysptk. Corresponds to hopsize in the window sliding of the computation of f0. This is in seconds and gets converted.
  • method (str) – One of ‘swipe’ or ‘rapt’. Define which method to use for f0 calculation. See https://github.com/r9y9/pysptk
Returns:

Dictionary mapping “mean” to the f0 mean of self.waveform and “std” to the f0 standard deviation of self.waveform.

Return type:

dict
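The reason for excluding zeros can be sketched as follows: unvoiced frames appear as zeros in the contour and would drag the mean down and inflate the spread. Using the population standard deviation and returning NaN when no frame is voiced are assumptions; the function name is hypothetical.

```python
import math


def voiced_f0_statistics(f0_contour):
    """Mean and standard deviation of F0 over voiced (non-zero) frames."""
    voiced = [f for f in f0_contour if f > 0]
    if not voiced:
        # Assumption: no voiced frames means the statistics are undefined.
        return {"mean": float("nan"), "std": float("nan")}
    mean = sum(voiced) / len(voiced)
    variance = sum((f - mean) ** 2 for f in voiced) / len(voiced)
    return {"mean": mean, "std": math.sqrt(variance)}


stats = voiced_f0_statistics([0.0, 100.0, 0.0, 200.0])
```

Including the two zeros would have given a mean of 75 Hz instead of the 150 Hz obtained from the voiced frames alone.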

ppe()

Compute pitch period entropy. This is an adaptation of the following Matlab code: https://github.com/Mak-Sim/Troparion/blob/5126f434b96e0c1a4a41fa99dd9148f3c959cfac/Perturbation_analysis/pitch_period_entropy.m. Note that computing the PPE relies on the existence of voiced portions in the F0 trajectory.

Returns:

The pitch period entropy, as per http://www.maxlittle.net/students/thesis_tsanas.pdf

Return type:

float

jitters(p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3)

Compute the jitters mathematically, according to certain conditions given by p_floor, p_ceil and max_p_factor. See jitters.py for more details.

Parameters:
  • p_floor (float) – Minimum acceptable period.
  • p_ceil (float) – Maximum acceptable period.
  • max_p_factor (float) – value to use for the period factor principle
Returns:

dictionary mapping strings to floats, with keys “localJitter”, “localabsoluteJitter”, “rapJitter”, “ppq5Jitter”, “ddpJitter”

Return type:

dict
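Two of the returned measures can be sketched from a list of pitch periods using the commonly cited definitions: the local absolute jitter is the mean absolute difference between consecutive periods, and the local jitter divides that by the mean period. This sketch omits the p_floor, p_ceil and max_p_factor filtering entirely and is an assumption about the definitions, not a copy of jitters.py.

```python
def local_jitters(periods):
    """Local absolute and relative jitter from a list of pitch periods."""
    differences = [abs(b - a) for a, b in zip(periods, periods[1:])]
    absolute = sum(differences) / len(differences)
    mean_period = sum(periods) / len(periods)
    return {
        "localabsoluteJitter": absolute,
        "localJitter": absolute / mean_period,
    }


steady = local_jitters([0.010, 0.010, 0.010])  # perfectly regular periods
wobbly = local_jitters([0.010, 0.012, 0.010])  # periods in seconds
```

Perfectly regular periods give zero jitter, while the alternating periods give a local jitter of 0.1875 (a 2 ms mean difference over a mean period of roughly 10.7 ms).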

shimmers(max_a_factor=1.6, p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3)

Compute the shimmers mathematically, according to certain conditions given by max_a_factor, p_floor, p_ceil and max_p_factor. See shimmers.py for more details.

Parameters:
  • max_a_factor (float) – Value to use for amplitude factor principle
  • p_floor (float) – Minimum acceptable period.
  • p_ceil (float) – Maximum acceptable period.
  • max_p_factor (float) – value to use for the period factor principle
Returns:

Dictionary mapping strings to floats, with keys “localShimmer”, “localdbShimmer”, “apq3Shimmer”, “apq5Shimmer”, “apq11Shimmer”

Return type:

dict

hnr()

Compute the harmonics to noise ratio (HNR) of self.waveform. See https://www.ncbi.nlm.nih.gov/pubmed/12512635 for a more thorough description of why HNR is important in the scope of healthcare.

Returns:

The harmonics to noise ratio computed on self.waveform.

Return type:

float

dfa(window_lengths=[64, 128, 256, 512, 1024, 2048, 4096])

Detrended Fluctuation Analysis. See Tsanas et al., 2011: “Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease”.

Parameters:
  • window_lengths (list of int > 0) – List of L to use in DFA computation. See dfa.py for more details.
Returns:

The detrended fluctuation analysis alpha value.

Return type:

float

lpc(order=4, return_np_array=False)

This uses the librosa backend to get the Linear Prediction Coefficients via Burg’s method. See librosa.core.lpc for more details.

Parameters:
  • order (int > 0) – Order of the linear filter
  • return_np_array (bool) – If False, returns a dictionary. Otherwise a numpy array.
Returns:

Dictionary mapping ‘LPC_{i}’ to the i’th lpc coefficient, for i = 0…order. Or: LP prediction error coefficients (np array case)

Return type:

dict or np.array, [order + 1, ]

lsf(order=4, return_np_array=False)

Compute the LPC coefficients, then convert them to LSP frequencies. The conversion is done using https://github.com/cokelaer/spectrum/blob/master/src/spectrum/linear_prediction.py

Parameters:
  • order (int > 0) – Order of the linear filter for LPC calculation
  • return_np_array (bool) – If False, returns a dictionary. Otherwise a numpy array.
Returns:

Dictionary mapping ‘LPC_{i}’ to the i’th lpc coefficient, for i = 0…order. Or LSP frequencies (np array case).

Return type:

dict or np.array, [order, ]

formants()

Estimate the first four formant frequencies using LPC (see formants.py)

Returns:

Dictionary mapping {‘f1’, ‘f2’, ‘f3’, ‘f4’} to corresponding {first, second, third, fourth} formant frequency.

Return type:

dict

formants_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Estimate the first four formant frequencies using LPC (see formants.py) and apply the metric_slidingwindow decorator.

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Time series of the first four formant frequencies computed on windows of length frame_length_seconds, with sliding window of hop_length_seconds.

Return type:

np.array, [4, T / hop_length]

kurtosis_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Computes the kurtosis on frames of the waveform with a sliding window

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Kurtosis on each sliding window.

Return type:

np.array, [1, T / hop_length]

log_energy()

Compute the log energy of self.waveform as per Abeyratne et al., 2013.

Returns:

The log energy of self.waveform, computed as per the paper above.

Return type:

float

log_energy_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Computes the log energy on frames of the waveform with a sliding window

Parameters:
  • frame_length_seconds (float) – Length of the sliding window, in seconds.
  • hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:

Log energy on each sliding window.

Return type:

np.array, [1, T / hop_length]