Waveform class

This file contains the central Waveform class of the surfboard package and all of its methods.

class surfboard.sound.Waveform(path=None, signal=None, sample_rate=44100)

The central class of the package. It is instantiated either with a path to a sound file and a sample rate at which to load it, or with a signal and a sample rate. Its methods can then be used to compute various components.

waveform: Properties written in this way prevent users from assigning to self.waveform.

sample_rate: Properties written in this way prevent users from assigning to self.sample_rate.

compute_components(component_list)

Compute components from self.waveform and self.sample_rate using a list of strings that identify which components to compute. To pass arguments to a component (e.g. frame_length_seconds), give that component as a dictionary instead, for example {'mfcc': {'n_mfcc': 26}}. See README.md for more details.

Parameters:	component_list (list of str or dict) – The methods to be computed. If elements are str, then the method uses default arguments. If dict, the arguments are passed to the methods.
Returns:	Dictionary mapping component names to computed components.
Return type:	dict
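As an illustration of the two accepted element types, the sketch below normalizes a mixed component list into (name, kwargs) pairs. `normalize_component_list` is a hypothetical helper written for this example and is not part of surfboard; it only mirrors the str-or-dict convention described above.

```python
def normalize_component_list(component_list):
    """Turn a mixed list of str / dict entries into (name, kwargs) pairs.

    Hypothetical helper for illustration only; not part of surfboard.
    """
    normalized = []
    for entry in component_list:
        if isinstance(entry, str):
            # A bare string means "use the method's default arguments".
            normalized.append((entry, {}))
        elif isinstance(entry, dict):
            # A dict maps each component name to its keyword arguments.
            for name, kwargs in entry.items():
                normalized.append((name, dict(kwargs)))
        else:
            raise TypeError(f"Unsupported component entry: {entry!r}")
    return normalized


pairs = normalize_component_list(["spectral_flux", {"mfcc": {"n_mfcc": 26}}])
print(pairs)
```

Under this convention, each pair could presumably be dispatched on a Waveform instance with something like getattr(sound, name)(**kwargs), where sound is an assumed variable name.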

mfcc(n_mfcc=13, n_fft_seconds=0.04, hop_length_seconds=0.01)

Given a number of MFCCs, use the librosa.feature.mfcc method to compute that number of MFCCs on self.waveform and return the array.

Parameters:	n_mfcc (int) – Number of MFCCs to compute. n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	MFCCs.
Return type:	np.array, [n_mfcc, T / hop_length]

log_melspec(n_mels=128, n_fft_seconds=0.04, hop_length_seconds=0.01)

Given a number of filter banks, this uses the librosa.feature.melspectrogram method to compute the log melspectrogram of self.waveform.

Parameters:	n_mels (int) – Number of filter banks per time step in the log melspectrogram. n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Log mel spectrogram.
Return type:	np.array, [n_mels, T_mels]

magnitude_spectrum(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the STFT of self.waveform. This is used for further spectral analysis.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	The magnitude spectrogram
Return type:	np.array, [n_fft / 2 + 1, T / hop_length]
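To make the seconds-to-samples conversion and the output shape concrete, here is a minimal NumPy sketch of a magnitude STFT. It is not surfboard's implementation: the function name, the Hann window, and the absence of padding are assumptions made for this example, so the number of frames can differ slightly from what the method returns.

```python
import numpy as np


def magnitude_spectrogram_sketch(signal, sample_rate,
                                 n_fft_seconds=0.04, hop_length_seconds=0.01):
    """Illustrative magnitude STFT; not the surfboard implementation."""
    # Convert the window and hop durations from seconds to samples.
    n_fft = int(round(n_fft_seconds * sample_rate))
    hop_length = int(round(hop_length_seconds * sample_rate))
    window = np.hanning(n_fft)  # assumed window type
    # Slice the signal into overlapping frames (no padding in this sketch).
    starts = range(0, len(signal) - n_fft + 1, hop_length)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts], axis=1)
    # rfft keeps the n_fft // 2 + 1 non-negative frequency bins.
    return np.abs(np.fft.rfft(frames, axis=0))


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
spec = magnitude_spectrogram_sketch(signal, sample_rate)
print(spec.shape)
```

With these values the window is 320 samples and the hop is 80 samples, so the first axis has 320 // 2 + 1 = 161 bins.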

bark_spectrogram(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum of self.waveform and arrange the frequency bins in the Bark scale. See https://en.wikipedia.org/wiki/Bark_scale

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	The Bark spectrogram
Return type:	np.array, [n_bark_bands, T / hop_length]

morlet_cwt(widths=None)

Compute the Morlet Continuous Wavelet Transform of self.waveform. Note that this method returns a large matrix. Shown to be relevant in Vasquez-Correa et al., 2016.

Parameters:	wavelet (str) – Wavelet to use. Currently only “morlet” is supported. widths (None or list) – If None, uses the default of 32 evenly spaced widths: [i * sample_rate / 500 for i in range(1, 33)].
Returns:	The continuous wavelet transform
Return type:	np.array, [len(widths), T]
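The default widths quoted above can be reproduced directly. The helper name below is hypothetical; only the list comprehension comes from the parameter description.

```python
def default_morlet_widths(sample_rate):
    """Hypothetical helper reproducing the documented default widths."""
    return [i * sample_rate / 500 for i in range(1, 33)]


widths = default_morlet_widths(44100)
print(len(widths), widths[0], widths[-1])
```

At a sample rate of 44100 this gives 32 widths, so the returned transform would have 32 rows, one per width.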

chroma_stft(n_fft_seconds=0.04, hop_length_seconds=0.01, n_chroma=12)

See librosa.feature documentation for more details on this component. This computes a chromagram from a waveform.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds. n_chroma (int) – Number of chroma bins to compute.
Returns:	The chromagram
Return type:	np.array, [n_chroma, T / hop_length]

chroma_cqt(hop_length_seconds=0.01, n_chroma=12)

See librosa.feature documentation for more details on this component. This computes a constant-Q chromagram from a waveform.

Parameters:	hop_length_seconds (float) – How much the window shifts for every timestep, in seconds. n_chroma (int) – Number of chroma bins to compute.
Returns:	Constant-Q transform mode
Return type:	np.array, [n_chroma, T / hop_length]

chroma_cens(hop_length_seconds=0.01, n_chroma=12)

See librosa.feature documentation for more details on this component. This computes the CENS chroma variant from a waveform.

Parameters:	hop_length_seconds (float) – How much the window shifts for every timestep, in seconds. n_chroma (int) – Number of chroma bins to compute.
Returns:	CENS-chromagram
Return type:	np.array, [n_chroma, T / hop_length]

spectral_slope(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum, and compute the spectral slope from that. This is a basic approximation of the spectrum by a linear regression line. There is one coefficient per timestep.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Linear regression slope, for every timestep.
Return type:	np.array, [1, T / hop_length]
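A per-frame linear regression of this kind can be sketched with NumPy as follows. This is an illustration under assumptions, not surfboard's code: the function name is hypothetical, and the regression is done against the bin index rather than the bin frequency in Hz, which only changes the slope by a constant factor.

```python
import numpy as np


def spectral_slope_sketch(magnitude):
    """Least-squares slope of each column of a [n_bins, n_frames] array."""
    bins = np.arange(magnitude.shape[0])
    centered_bins = bins - bins.mean()
    centered_mag = magnitude - magnitude.mean(axis=0)
    # Closed-form simple linear regression slope, one value per frame.
    slopes = centered_bins @ centered_mag / np.sum(centered_bins ** 2)
    return slopes[np.newaxis, :]


bins = np.arange(5, dtype=float)
# Two synthetic "frames": one rising line and one falling line.
magnitude = np.stack([2.0 * bins + 1.0, -0.5 * bins + 10.0], axis=1)
slopes = spectral_slope_sketch(magnitude)
print(slopes)
```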

spectral_flux(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum, and compute the spectral flux from that. This is a basic metric, measuring the rate of change of the spectrum.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	The spectral flux array.
Return type:	np.array, [1, T / hop_length]
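One common way to measure this rate of change is the Euclidean norm of the difference between consecutive frames. The sketch below uses that definition as an assumption; surfboard may normalize or pad differently, and the function name is hypothetical.

```python
import numpy as np


def spectral_flux_sketch(magnitude):
    """L2 norm of the frame-to-frame difference of a [n_bins, n_frames] array."""
    diff = np.diff(magnitude, axis=1)
    flux = np.sqrt(np.sum(diff ** 2, axis=0))
    # Assumption: the first frame has no predecessor, so its flux is set to 0.
    return np.concatenate([[0.0], flux])[np.newaxis, :]


magnitude = np.array([[1.0, 1.0, 4.0],
                      [0.0, 0.0, 4.0]])
flux = spectral_flux_sketch(magnitude)
print(flux)
```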

spectral_entropy(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute the magnitude spectrum, and compute the spectral entropy from that. To compute that, simply normalize each frame of the spectrum, so that they are a probability distribution, then compute the entropy from that.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	The entropy of each normalized frame.
Return type:	np.array, [1, T / hop_length]
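The normalize-then-entropy recipe described above can be sketched as follows. The function name, the base-2 logarithm, and the small eps guard against log(0) are assumptions for this example.

```python
import numpy as np


def spectral_entropy_sketch(magnitude, eps=1e-12):
    """Entropy (in bits) of each normalized column of a [n_bins, n_frames] array."""
    # Normalize each frame so that it sums to 1, like a probability distribution.
    prob = magnitude / (magnitude.sum(axis=0, keepdims=True) + eps)
    entropy = -np.sum(prob * np.log2(prob + eps), axis=0)
    return entropy[np.newaxis, :]


# First frame is flat over 4 bins, second frame has all its mass in one bin.
magnitude = np.array([[1.0, 1.0],
                      [1.0, 0.0],
                      [1.0, 0.0],
                      [1.0, 0.0]])
entropy = spectral_entropy_sketch(magnitude)
print(entropy)
```

A flat frame over 4 bins gives about 2 bits, while a frame concentrated in a single bin gives about 0.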

spectral_centroid(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral centroid from magnitude spectrum. “First moment”.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Spectral centroid of the magnitude spectrum (first moment).
Return type:	np.array, [1, T / hop_length]

spectral_spread(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral spread (also called spectral variance) from magnitude spectrum.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Spectral spread of the magnitude spectrum (second moment).
Return type:	np.array, [1, T / hop_length]

spectral_skewness(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral skewness from magnitude spectrum.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Spectral skewness of the magnitude spectrum (third moment).
Return type:	np.array, [1, T / hop_length]

spectral_kurtosis(n_fft_seconds=0.04, hop_length_seconds=0.01)

Compute spectral kurtosis from magnitude spectrum.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Spectral kurtosis of the magnitude spectrum (fourth moment).
Return type:	np.array, [1, T / hop_length]
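The four moment-based components (centroid, spread, skewness, kurtosis) can be read as statistics of each frame once it is normalized to sum to 1. The sketch below is one possible formulation, not surfboard's code: the function name is hypothetical, the spread is returned as a variance, and skewness and kurtosis are standardized by the standard deviation, which are assumptions.

```python
import numpy as np


def spectral_moments_sketch(magnitude, freqs):
    """Centroid, variance, skewness, kurtosis per column of [n_bins, n_frames]."""
    # Treat each normalized frame as a distribution over the bin frequencies.
    prob = magnitude / magnitude.sum(axis=0, keepdims=True)
    f = freqs[:, np.newaxis]
    centroid = np.sum(f * prob, axis=0)             # first moment
    deviation = f - centroid
    variance = np.sum(deviation ** 2 * prob, axis=0)  # second central moment
    std = np.sqrt(variance)
    skewness = np.sum(deviation ** 3 * prob, axis=0) / std ** 3
    kurtosis = np.sum(deviation ** 4 * prob, axis=0) / std ** 4
    return centroid, variance, skewness, kurtosis


freqs = np.array([0.0, 1.0, 2.0])
magnitude = np.array([[1.0], [2.0], [1.0]])  # a single symmetric frame
centroid, variance, skewness, kurtosis = spectral_moments_sketch(magnitude, freqs)
print(centroid, variance, skewness, kurtosis)
```

For the symmetric frame in the example, the centroid sits on the middle bin and the skewness is zero.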

spectral_flatness(n_fft_seconds=0.04, hop_length_seconds=0.01)

Given an FFT window size and a hop length, uses the librosa feature package to compute the spectral flatness of self.waveform. This component is a measure to quantify how “noise-like” a sound is. The closer to 1, the closer the sound is to white noise.

Parameters:	n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Spectral flatness vector computed over windows.
Return type:	np.array, [1, T/hop_length]
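Spectral flatness is usually defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The NumPy sketch below follows that definition; the function name is hypothetical, and the power=2.0 and amin floor are assumptions chosen for this example rather than values taken from this documentation.

```python
import numpy as np


def spectral_flatness_sketch(magnitude, power=2.0, amin=1e-10):
    """Geometric mean over arithmetic mean of each column's power spectrum."""
    # Floor the values so that the logarithm is always defined.
    spec = np.maximum(amin, magnitude ** power)
    geometric_mean = np.exp(np.mean(np.log(spec), axis=0))
    arithmetic_mean = np.mean(spec, axis=0)
    return (geometric_mean / arithmetic_mean)[np.newaxis, :]


# First frame is flat (noise-like), second frame is a single peak (tone-like).
magnitude = np.array([[1.0, 1.0],
                      [1.0, 0.0],
                      [1.0, 0.0],
                      [1.0, 0.0]])
flatness = spectral_flatness_sketch(magnitude)
print(flatness)
```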

spectral_rolloff(roll_percent=0.85, n_fft_seconds=0.04, hop_length_seconds=0.01)

Given an FFT window size and a hop length, uses the librosa feature package to compute the spectral roll-off of self.waveform. It is the frequency below which a fraction roll_percent of the spectral energy is contained, and it is useful in distinguishing sounds with different energy distributions.

Parameters:	roll_percent (float) – The roll-off percentage (see https://essentia.upf.edu/reference/streaming_RollOff.html). n_fft_seconds (float) – Length of the FFT window in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Spectral rolloff vector computed over windows.
Return type:	np.array, [1, T/hop_length]
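The roll-off frequency can be found from the cumulative sum of each frame. The following is a minimal sketch under assumptions: the function name is hypothetical, and the cumulative sum is taken over the values passed in, whether they are magnitudes or energies.

```python
import numpy as np


def spectral_rolloff_sketch(magnitude, freqs, roll_percent=0.85):
    """Lowest frequency at which the cumulative sum reaches roll_percent of the total."""
    cumulative = np.cumsum(magnitude, axis=0)
    threshold = roll_percent * cumulative[-1, :]
    # argmax returns the first bin where the condition becomes True.
    idx = np.argmax(cumulative >= threshold, axis=0)
    return freqs[idx][np.newaxis, :]


freqs = np.array([0.0, 100.0, 200.0, 300.0])
# First frame spreads its content evenly, second frame has it all in the lowest bin.
magnitude = np.array([[1.0, 4.0],
                      [1.0, 0.0],
                      [1.0, 0.0],
                      [1.0, 0.0]])
rolloff = spectral_rolloff_sketch(magnitude, freqs)
print(rolloff)
```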

loudness()

Compute the loudness of self.waveform using the pyloudnorm package. See https://github.com/csteinmetz1/pyloudnorm for more details on potential arguments to the functions below.

Returns:	The loudness of self.waveform
Return type:	float

loudness_slidingwindow(frame_length_seconds=1, hop_length_seconds=0.25)

Compute the loudness of self.waveform over time. See self.loudness for more details.

Parameters:	frame_length_seconds (float) – Length of the sliding window in seconds. hop_length_seconds (float) – How much the sliding window moves at each step, in seconds.
Returns:	The loudness on frames of self.waveform
Return type:	np.array, [1, T / hop_length]

shannon_entropy()

Compute the Shannon entropy of self.waveform, as per https://ijssst.info/Vol-16/No-4/data/8258a127.pdf

Returns:	Shannon entropy of the waveform.
Return type:	float

shannon_entropy_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Compute the Shannon entropy of subblocks of a waveform into a newly created time series, as per https://ijssst.info/Vol-16/No-4/data/8258a127.pdf

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Shannon entropy for each frame
Return type:	np.array, [1, T / hop_length]

zerocrossing()

Compute the zero crossing rate on self.waveform and return it, as per https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0162128&type=printable. Note: the zero crossing rate can also be computed as a time series; see librosa.feature.zero_crossing_rate and self.get_zcr_sequence.

Returns:	Dictionary with keys “num_zerocrossings” (number of zero crossings in self.waveform) and “rate” (number of zero crossings divided by number of samples).
Return type:	dictionary
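A sketch of how such a dictionary could be built with NumPy is shown below. The function name is hypothetical, and a zero crossing is counted here as a sign change between consecutive samples, which is an assumption about the exact convention.

```python
import numpy as np


def zerocrossing_sketch(signal):
    """Count sign changes between consecutive samples of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    signs = np.signbit(signal)
    num = int(np.sum(signs[1:] != signs[:-1]))
    return {"num_zerocrossings": num, "rate": num / len(signal)}


result = zerocrossing_sketch([1.0, -1.0, 1.0, -1.0])
print(result)
```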

zerocrossing_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Compute the zero crossing rate sequence on self.waveform and return it. Every entry of the sequence is computed on a frame of frame_length samples, and the frame slides by hop_length samples between entries.

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Fraction of zero crossings for each frame.
Return type:	np.array, [1, T / hop_length]

rms(frame_length_seconds=0.04, hop_length_seconds=0.01)

Get the root mean square value for each frame, with a specific frame length and hop length. This quantity was formerly called RMSE (root mean square energy).

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	RMS value for each frame.
Return type:	np.array, [1, T / hop_length]

intensity(frame_length_seconds=0.04, hop_length_seconds=0.01)

Get a value proportional to the intensity for each frame, with a specific frame length and hop length. Note that the intensity is proportional to the RMS amplitude squared.

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Proportional intensity value for each frame.
Return type:	np.array, [1, T / hop_length]

crest_factor(frame_length_seconds=0.04, hop_length_seconds=0.01)

Get the crest factor of this waveform, on sliding windows. This value measures the local intensity of peaks in a waveform. Implemented as per: https://en.wikipedia.org/wiki/Crest_factor

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Crest factor for each frame.
Return type:	np.array, [1, T / hop_length]
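The rms, intensity and crest_factor components are closely related, which the following sketch tries to make explicit. It is an illustration only: the function names are hypothetical, frame and hop lengths are given in samples rather than seconds, and no padding is applied, so frame counts may differ from the methods' output.

```python
import numpy as np


def frame_signal(signal, frame_length, hop_length):
    """Stack overlapping frames as columns of a [frame_length, n_frames] array."""
    starts = range(0, len(signal) - frame_length + 1, hop_length)
    return np.stack([signal[s:s + frame_length] for s in starts], axis=1)


def rms_intensity_crest_sketch(signal, frame_length, hop_length):
    frames = frame_signal(signal, frame_length, hop_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=0))
    # Intensity is taken as proportional to the squared RMS amplitude.
    intensity = rms ** 2
    # Crest factor: peak absolute amplitude divided by the RMS value.
    crest = np.max(np.abs(frames), axis=0) / rms
    return rms[np.newaxis, :], intensity[np.newaxis, :], crest[np.newaxis, :]


signal = np.array([1.0, -1.0, 1.0, -1.0, 3.0, 0.0, 0.0, 0.0])
rms, intensity, crest = rms_intensity_crest_sketch(signal, frame_length=4, hop_length=4)
print(rms, intensity, crest)
```

The second frame in the example contains a single isolated peak, which is why its crest factor is higher than that of the first frame.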

f0_contour(hop_length_seconds=0.01, method='swipe', f0_min=60, f0_max=300)

Compute the F0 contour using PYSPTK: https://github.com/r9y9/pysptk/.

Parameters:

hop_length_seconds (float) – Hop size argument in pysptk. Corresponds to hopsize in the window sliding of the computation of f0. This is in seconds and gets converted.
method (str) – One of ‘swipe’ or ‘rapt’. Define which method to use for f0 calculation. See https://github.com/r9y9/pysptk
f0_min (float) – minimum acceptable f0.
f0_max (float) – maximum acceptable f0.

Returns:

F0 contour of self.waveform. Contains unvoiced frames.

Return type:

np.array, [1, t1]

f0_statistics(hop_length_seconds=0.01, method='swipe')

Compute the F0 mean and standard deviation of self.waveform. Note that we cannot simply rely on using statistics applied to the f0_contour since we do not want to include the zeros in the mean and standard deviation calculations.

Parameters:

hop_length_seconds (float) – Hop size argument in pysptk. Corresponds to hopsize in the window sliding of the computation of f0. This is in seconds and gets converted.
method (str) – One of ‘swipe’ or ‘rapt’. Define which method to use for f0 calculation. See https://github.com/r9y9/pysptk

Returns:

Dictionary mapping “mean” to the f0 mean of self.waveform and “std” to the f0 standard deviation of self.waveform.

Return type:

dict
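The reason for excluding zeros can be seen in a small sketch. The function name is hypothetical, and two assumptions are made: unvoiced frames are marked by a value of 0 in the contour, and NaN is returned when no voiced frame exists.

```python
import numpy as np


def f0_statistics_sketch(f0_contour):
    """Mean and standard deviation of the voiced (non-zero) part of an F0 contour."""
    f0 = np.asarray(f0_contour, dtype=float).ravel()
    voiced = f0[f0 > 0]  # drop unvoiced frames, assumed to be stored as 0
    if voiced.size == 0:
        # Assumption for this sketch: no voiced frames yields NaN statistics.
        return {"mean": float("nan"), "std": float("nan")}
    return {"mean": float(voiced.mean()), "std": float(voiced.std())}


contour = [[0.0, 100.0, 0.0, 120.0, 0.0]]
stats = f0_statistics_sketch(contour)
print(stats)
```

A naive mean over the whole contour in this example would be 44 Hz, far below either voiced value, whereas the voiced-only mean is 110 Hz.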

ppe()

Compute pitch period entropy. This is an adaptation of the following Matlab code: https://github.com/Mak-Sim/Troparion/blob/5126f434b96e0c1a4a41fa99dd9148f3c959cfac/Perturbation_analysis/pitch_period_entropy.m. Note that computing the PPE relies on the existence of voiced portions in the F0 trajectory.

Returns:	The pitch period entropy, as per http://www.maxlittle.net/students/thesis_tsanas.pdf
Return type:	float

jitters(p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3)

Compute the jitters mathematically, according to certain conditions given by p_floor, p_ceil and max_p_factor. See jitters.py for more details.

Parameters:	p_floor (float) – Minimum acceptable period. p_ceil (float) – Maximum acceptable period. max_p_factor (float) – value to use for the period factor principle
Returns:	dictionary mapping strings to floats, with keys “localJitter”, “localabsoluteJitter”, “rapJitter”, “ppq5Jitter”, “ddpJitter”
Return type:	dict

shimmers(max_a_factor=1.6, p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3)

Compute the shimmers mathematically, according to certain conditions given by max_a_factor, p_floor, p_ceil and max_p_factor. See shimmers.py for more details.

Parameters:

max_a_factor (float) – Value to use for amplitude factor principle
p_floor (float) – Minimum acceptable period.
p_ceil (float) – Maximum acceptable period.
max_p_factor (float) – value to use for the period factor principle

Returns:

Dictionary mapping strings to floats, with keys “localShimmer”, “localdbShimmer”, “apq3Shimmer”, “apq5Shimmer”, “apq11Shimmer”

Return type:

dict

hnr()

See https://www.ncbi.nlm.nih.gov/pubmed/12512635 for more thorough description of why HNR is important in the scope of healthcare.

Returns:	The harmonics to noise ratio computed on self.waveform.
Return type:	float

dfa(window_lengths=[64, 128, 256, 512, 1024, 2048, 4096])

Detrended Fluctuation Analysis. See Tsanas et al., 2011: Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease.

Parameters:	window_lengths (list of int > 0) – List of L to use in DFA computation. See dfa.py for more details.
Returns:	The detrended fluctuation analysis alpha value.
Return type:	float

lpc(order=4, return_np_array=False)

This uses the librosa backend to get the Linear Prediction Coefficients via Burg’s method. See librosa.core.lpc for more details.

Parameters:	order (int > 0) – Order of the linear filter. return_np_array (bool) – If False, returns a dictionary. Otherwise a numpy array.
Returns:	Dictionary mapping ‘LPC_{i}’ to the i’th lpc coefficient, for i = 0…order. Or: LP prediction error coefficients (np array case)
Return type:	dict or np.array, [order + 1, ]

lsf(order=4, return_np_array=False)

Compute the LPC coefficients, then convert them to LSP frequencies. The conversion is done using https://github.com/cokelaer/spectrum/blob/master/src/spectrum/linear_prediction.py

Parameters:

order (int > 0) – Order of the linear filter for LPC calculation
return_np_array (bool) – If False, returns a dictionary. Otherwise a numpy array.

Returns:

Dictionary mapping ‘LPC_{i}’ to the i’th lpc coefficient, for i = 0…order. Or LSP frequencies (np array case).

Return type:

dict or np.array, [order, ]

formants()

Estimate the first four formant frequencies using LPC (see formants.py).

Returns:	Dictionary mapping {‘f1’, ‘f2’, ‘f3’, ‘f4’} to corresponding {first, second, third, fourth} formant frequency.
Return type:	dict

formants_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Estimate the first four formant frequencies using LPC (see formants.py) and apply the metric_slidingwindow decorator.

Parameters:

frame_length_seconds (float) – Length of the sliding window, in seconds.
hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.

Returns:

Time series of the first four formant frequencies, computed on windows of length frame_length_seconds that slide by hop_length_seconds.

Return type:

np.array, [4, T / hop_length]

kurtosis_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Computes the kurtosis on frames of the waveform with a sliding window.

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Kurtosis on each sliding window.
Return type:	np.array, [1, T / hop_length]

log_energy()

Compute the log energy of self.waveform as per Abeyratne et al., 2013.

Returns:	The log energy of self.waveform, computed as per the paper above.
Return type:	float

log_energy_slidingwindow(frame_length_seconds=0.04, hop_length_seconds=0.01)

Computes the log energy on frames of the waveform with a sliding window.

Parameters:	frame_length_seconds (float) – Length of the sliding window, in seconds. hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns:	Log energy on each sliding window.
Return type:	np.array, [1, T / hop_length]