Waveform class¶
This module contains the central Waveform class of the surfboard package and all of its methods.
-
class
surfboard.sound.
Waveform
(path=None, signal=None, sample_rate=44100)¶ The central class of the package. Instantiate it either with a path to a sound file, which is loaded at the given sample rate, or with a signal and its sample rate. The methods of this class can then be used to compute the various components.
-
waveform
¶ Properties written in this way prevent users from assigning to self.waveform.
-
sample_rate
¶ Properties written in this way prevent users from assigning to self.sample_rate.
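The read-only behaviour described for waveform and sample_rate can be sketched with Python's built-in property decorator. The class below is a hypothetical stand-in, not the surfboard implementation; the class name and the underscore-prefixed attribute names are assumptions made for illustration.

```python
class ReadOnlyWaveform:
    """Hypothetical stand-in showing read-only properties."""

    def __init__(self, signal, sample_rate=44100):
        # Private attributes (names assumed for this sketch).
        self._waveform = list(signal)
        self._sample_rate = sample_rate

    @property
    def waveform(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._waveform

    @property
    def sample_rate(self):
        return self._sample_rate


sound = ReadOnlyWaveform([0.0, 0.5, -0.5], sample_rate=16000)
try:
    sound.sample_rate = 8000
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

Reading sound.sample_rate still works as usual; only assignment is rejected.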
-
compute_components
(component_list)¶ Compute components from self.waveform and self.sample_rate using a list of strings which identify which components to compute. You can pass in arguments to the components (e.g. frame_length_seconds) by passing in the components as dictionaries. For example: {‘mfcc’: {‘n_mfcc’: 26}}. See README.md for more details.
Parameters: component_list (list of str or dict) – The methods to be computed. If elements are str, then the method uses default arguments. If dict, the arguments are passed to the methods. Returns: Dictionary mapping component names to computed components. Return type: dict
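One plausible way to handle a component_list that mixes strings and dictionaries is sketched below. This is not the package's code: normalize_component_list, compute_components_sketch and ToySound are hypothetical names, and the fake components return placeholder values.

```python
def normalize_component_list(component_list):
    """Turn a mixed list of str/dict entries into (name, kwargs) pairs."""
    normalized = []
    for entry in component_list:
        if isinstance(entry, str):
            # A bare string means "use the default arguments".
            normalized.append((entry, {}))
        elif isinstance(entry, dict):
            # A dict maps the component name to its keyword arguments.
            for name, kwargs in entry.items():
                normalized.append((name, dict(kwargs or {})))
        else:
            raise TypeError(f"Unsupported component entry: {entry!r}")
    return normalized


def compute_components_sketch(obj, component_list):
    """Call each named method on obj and collect the results in a dict."""
    return {
        name: getattr(obj, name)(**kwargs)
        for name, kwargs in normalize_component_list(component_list)
    }


class ToySound:
    """Hypothetical object with two fake components, for illustration only."""

    def mfcc(self, n_mfcc=13):
        return f"mfcc with {n_mfcc} coefficients"

    def loudness(self):
        return -23.0


results = compute_components_sketch(
    ToySound(), [{"mfcc": {"n_mfcc": 26}}, "loudness"]
)
```

The returned dictionary is keyed by component name, mirroring the return value described above.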
-
mfcc
(n_mfcc=13, n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Given a number of MFCCs, use the librosa.feature.mfcc method to compute that number of MFCCs on self.waveform and return the array.
Parameters: - n_mfcc (int) – Number of MFCCs to compute.
- n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: MFCCs.
Return type: np.array, [n_mfcc, T / hop_length]
-
log_melspec
(n_mels=128, n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Given a number of filter banks, this uses the librosa.feature.melspectrogram method to compute the log melspectrogram of self.waveform.
Parameters: - n_mels (int) – Number of filter banks per time step in the log melspectrogram.
- n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Log mel spectrogram.
Return type: np.array, [n_mels, T_mels]
-
magnitude_spectrum
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute the magnitude of the STFT of self.waveform. This is used for further spectral analysis.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: The magnitude spectrogram
Return type: np.array, [n_fft / 2 + 1, T / hop_length]
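Since the window and hop are given in seconds, the array shapes quoted throughout this page depend on the sample rate. The sketch below shows the arithmetic under two assumptions: that seconds are converted to samples by rounding (the package may truncate instead), and that the number of frames is roughly T / hop_length, ignoring any padding. seconds_to_samples is a hypothetical helper.

```python
def seconds_to_samples(seconds, sample_rate):
    """Convert a duration in seconds to a whole number of samples (rounding assumed)."""
    return int(round(seconds * sample_rate))


sample_rate = 44100
n_fft = seconds_to_samples(0.04, sample_rate)       # FFT window length in samples
hop_length = seconds_to_samples(0.01, sample_rate)  # hop between windows in samples

n_samples = 2 * sample_rate                # T for a two-second signal
n_frequency_bins = n_fft // 2 + 1          # first dimension of the spectrogram
approx_n_frames = n_samples // hop_length  # second dimension, roughly T / hop_length
```

With the default arguments at 44100 Hz, the window spans 1764 samples and the hop 441 samples.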
-
bark_spectrogram
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute the magnitude spectrum of self.waveform and arrange the frequency bins in the Bark scale. See https://en.wikipedia.org/wiki/Bark_scale
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: The Bark spectrogram
Return type: np.array, [n_bark_bands, T / hop_length]
-
morlet_cwt
(widths=None)¶ Compute the Morlet Continuous Wavelet Transform of self.waveform. Note that this method returns a large matrix. Shown to be relevant in Vasquez-Correa et al., 2016.
Parameters: - wavelet (str) – Wavelet to use. Currently only supports “morlet”.
- widths (None or list) – If None, uses the default of 32 evenly spaced widths: [i * sample_rate / 500 for i in range(1, 33)]
Returns: The continuous wavelet transform
Return type: np.array, [len(widths), T]
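The default widths quoted in the parameter description can be evaluated directly. The helper name default_morlet_widths is hypothetical; only the list comprehension comes from the description above.

```python
def default_morlet_widths(sample_rate):
    """Evaluate the default widths expression for a given sample rate."""
    return [i * sample_rate / 500 for i in range(1, 33)]


widths = default_morlet_widths(44100)
# 32 evenly spaced widths, so the transform has 32 rows and T columns.
n_rows = len(widths)
```

At 44100 Hz the widths run from 88.2 to 2822.4 in steps of 88.2, which is why the returned matrix of shape [len(widths), T] can be large for long recordings.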
-
chroma_stft
(n_fft_seconds=0.04, hop_length_seconds=0.01, n_chroma=12)¶ See librosa.feature documentation for more details on this component. This computes a chromagram from a waveform.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
- n_chroma (int) – Number of chroma bins to compute.
Returns: The chromagram
Return type: np.array, [n_chroma, T / hop_length]
-
chroma_cqt
(hop_length_seconds=0.01, n_chroma=12)¶ See librosa.feature documentation for more details on this component. This computes a constant-Q chromagram from a waveform.
Parameters: - hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
- n_chroma (int) – Number of chroma bins to compute.
Returns: Constant-Q chromagram
Return type: np.array, [n_chroma, T / hop_length]
-
chroma_cens
(hop_length_seconds=0.01, n_chroma=12)¶ See librosa.feature documentation for more details on this component. This computes the CENS chroma variant from a waveform.
Parameters: - hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
- n_chroma (int) – Number of chroma bins to compute.
Returns: CENS-chromagram
Return type: np.array, [n_chroma, T / hop_length]
-
spectral_slope
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute the magnitude spectrum, and compute the spectral slope from that. This is a basic approximation of the spectrum by a linear regression line. There is one coefficient per timestep.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Linear regression slope, for every timestep.
Return type: np.array, [1, T / hop_length]
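A minimal sketch of the per-frame regression idea follows. It assumes an ordinary least-squares fit of magnitude against frequency in Hz; whether the package regresses against Hz or against bin index is not stated here, and the function names are hypothetical.

```python
def regression_slope(x_values, y_values):
    """Ordinary least-squares slope of y against x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values)
    )
    variance = sum((x - mean_x) ** 2 for x in x_values)
    return covariance / variance


def spectral_slope_sketch(magnitude_frames, frequencies):
    """One slope per frame; each frame lists the magnitude of every frequency bin."""
    return [regression_slope(frequencies, frame) for frame in magnitude_frames]


frequencies = [0.0, 100.0, 200.0, 300.0]
frames = [
    [4.0, 3.0, 2.0, 1.0],  # energy falls with frequency: negative slope
    [1.0, 1.0, 1.0, 1.0],  # flat spectrum: zero slope
]
slopes = spectral_slope_sketch(frames, frequencies)
```

A strongly negative slope indicates that energy is concentrated at low frequencies.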
-
spectral_flux
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute the magnitude spectrum, and compute the spectral flux from that. This is a basic metric, measuring the rate of change of the spectrum.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: The spectral flux array.
Return type: np.array, [1, T / hop_length]
-
spectral_entropy
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute the magnitude spectrum, and compute the spectral entropy from that. To compute that, simply normalize each frame of the spectrum, so that they are a probability distribution, then compute the entropy from that.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: The entropy of each normalized frame.
Return type: np.array, [1, T / hop_length]
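The normalize-then-entropy recipe can be sketched as follows. The logarithm base (2 here) and the handling of empty bins are assumptions, and the sketch expects every frame to contain some energy; spectral_entropy_sketch is a hypothetical name.

```python
import math


def spectral_entropy_sketch(magnitude_frames):
    """Entropy of each frame after normalizing it to sum to one."""
    entropies = []
    for frame in magnitude_frames:
        total = sum(frame)
        probabilities = [value / total for value in frame]
        # Bins with zero probability contribute nothing to the sum.
        entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
        entropies.append(entropy)
    return entropies


entropies = spectral_entropy_sketch([
    [1.0, 1.0, 1.0, 1.0],  # flat frame: maximum entropy for 4 bins
    [0.0, 5.0, 0.0, 0.0],  # single peak: zero entropy
])
```

A flat frame reaches the maximum entropy for its number of bins, while a frame with a single peak has none.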
-
spectral_centroid
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute spectral centroid from magnitude spectrum. “First moment”.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Spectral centroid of the magnitude spectrum (first moment).
Return type: np.array, [1, T / hop_length]
-
spectral_spread
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute spectral spread (also spectral variance) from magnitude spectrum.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Spectral spread of the magnitude spectrum (second moment).
Return type: np.array, [1, T / hop_length]
-
spectral_skewness
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute spectral skewness from magnitude spectrum.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Spectral skewness of the magnitude spectrum (third moment).
Return type: np.array, [1, T / hop_length]
-
spectral_kurtosis
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Compute spectral kurtosis from magnitude spectrum.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Spectral kurtosis of the magnitude spectrum (fourth moment).
Return type: np.array, [1, T / hop_length]
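The four spectral moments (centroid, spread, skewness, kurtosis) can be illustrated on a single frame by treating the normalized magnitudes as weights over frequency. The normalization choices below (spread as the second central moment, skewness and kurtosis divided by powers of the spread) are assumptions for this sketch and may not match the package exactly.

```python
def spectral_moments_sketch(magnitudes, frequencies):
    """Moments of one frame, using normalized magnitudes as weights."""
    total = sum(magnitudes)
    weights = [m / total for m in magnitudes]
    centroid = sum(w * f for w, f in zip(weights, frequencies))

    def central_moment(order):
        return sum(
            w * (f - centroid) ** order for w, f in zip(weights, frequencies)
        )

    spread = central_moment(2)                    # second moment (variance)
    skewness = central_moment(3) / spread ** 1.5  # third moment, normalized
    kurtosis = central_moment(4) / spread ** 2    # fourth moment, normalized
    return {
        "centroid": centroid,
        "spread": spread,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# A symmetric frame centred on 100 Hz.
moments = spectral_moments_sketch([1.0, 2.0, 1.0], [0.0, 100.0, 200.0])
```

For a symmetric frame the centroid sits on the middle bin and the skewness vanishes.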
-
spectral_flatness
(n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Given an FFT window size and a hop length, uses the librosa feature package to compute the spectral flatness of self.waveform. This component is a measure to quantify how “noise-like” a sound is. The closer to 1, the closer the sound is to white noise.
Parameters: - n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Spectral flatness vector computed over windows.
Return type: np.array, [1, T/hop_length]
-
spectral_rolloff
(roll_percent=0.85, n_fft_seconds=0.04, hop_length_seconds=0.01)¶ Given an FFT window size and a hop length, uses the librosa feature package to compute the spectral roll-off of self.waveform. It is the frequency below which a given fraction (roll_percent) of the energy of a signal is contained, and it is useful in distinguishing sounds with different energy distributions.
Parameters: - roll_percent (float) – The roll-off percentage: https://essentia.upf.edu/reference/streaming_RollOff.html
- n_fft_seconds (float) – Length of the FFT window in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Spectral rolloff vector computed over windows.
Return type: np.array, [1, T/hop_length]
-
loudness
()¶ Compute the loudness of self.waveform using the pyloudnorm package. See https://github.com/csteinmetz1/pyloudnorm for more details on potential arguments to the functions below.
Returns: The loudness of self.waveform Return type: float
-
loudness_slidingwindow
(frame_length_seconds=1, hop_length_seconds=0.25)¶ Compute the loudness of self.waveform over time. See self.loudness for more details.
Parameters: - frame_length_seconds (float) – Length of the sliding window in seconds.
- hop_length_seconds (float) – How much the sliding window shifts for every timestep, in seconds.
Returns: The loudness on frames of self.waveform
Return type: np.array, [1, T / hop_length]
-
shannon_entropy
()¶ Compute the Shannon entropy of self.waveform, as per https://ijssst.info/Vol-16/No-4/data/8258a127.pdf
Returns: Shannon entropy of the waveform. Return type: float
-
shannon_entropy_slidingwindow
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Compute the Shannon entropy of subblocks of a waveform into a newly created time series, as per https://ijssst.info/Vol-16/No-4/data/8258a127.pdf
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Shannon entropy for each frame
Return type: np.array, [1, T / hop_length]
-
zerocrossing
()¶ Compute the zero crossing rate on self.waveform and return it as per https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0162128&type=printable Note: can also compute zero crossing rate as a time series – see librosa.feature.zero_crossing_rate, and self.get_zcr_sequence.
Returns: Dictionary with keys “num_zerocrossings” and “rate”:
- zerocrossing[“num_zerocrossings”]: number of zero crossings in self.waveform.
- zerocrossing[“rate”]: number of zero crossings divided by the number of samples.
Return type: dict
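As a rough illustration of the returned dictionary, the sketch below counts sign changes between consecutive samples. Treating a sample of exactly zero as non-negative is an assumption, and zerocrossing_sketch is a hypothetical name rather than the package's implementation.

```python
def zerocrossing_sketch(signal):
    """Count sign changes between consecutive samples."""
    num_zerocrossings = sum(
        1
        for previous, current in zip(signal, signal[1:])
        if (previous >= 0) != (current >= 0)
    )
    return {
        "num_zerocrossings": num_zerocrossings,
        "rate": num_zerocrossings / len(signal),
    }


zc = zerocrossing_sketch([1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0])
```

Here the signal changes sign four times over eight samples, giving a rate of 0.5.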
-
zerocrossing_slidingwindow
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Compute the zero crossing rate sequence on self.waveform and return it. Every entry of the sequence is computed on a frame of frame_length samples, and the frame slides forward by hop_length samples between entries.
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Fraction of zero crossings for each frame.
Return type: np.array, [1, T / hop_length]
-
rms
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Get the root mean square value for each frame, with a specific frame length and hop length. This quantity used to be called RMSE (root mean square energy).
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: RMS value for each frame.
Return type: np.array, [1, T / hop_length]
-
intensity
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Get a value proportional to the intensity for each frame, with a specific frame length and hop length. Note that the intensity is proportional to the RMS amplitude squared.
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Proportional intensity value for each frame.
Return type: np.array, [1, T / hop_length]
-
crest_factor
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Get the crest factor of this waveform, on sliding windows. This value measures the local intensity of peaks in a waveform. Implemented as per: https://en.wikipedia.org/wiki/Crest_factor
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Crest factor for each frame.
Return type: np.array, [1, T / hop_length]
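The relationship between the rms, intensity and crest_factor components can be sketched on plain Python lists. The framing below uses full frames only with no padding, which is an assumption, and every function name is hypothetical. A completely silent frame would divide by zero in this sketch.

```python
import math


def sliding_frames(signal, frame_length, hop_length):
    """Yield full frames only; no padding is applied (an assumption)."""
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        yield signal[start:start + frame_length]


def rms(frame):
    """Root mean square of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def framewise_metrics(signal, frame_length, hop_length):
    """RMS, a value proportional to intensity, and crest factor per frame."""
    rows = []
    for frame in sliding_frames(signal, frame_length, hop_length):
        frame_rms = rms(frame)
        rows.append({
            "rms": frame_rms,
            "intensity": frame_rms ** 2,  # proportional to intensity, up to a constant
            "crest_factor": max(abs(x) for x in frame) / frame_rms,  # peak over RMS
        })
    return rows


metrics = framewise_metrics(
    [0.0, 2.0, 0.0, -2.0, 0.0, 2.0], frame_length=4, hop_length=2
)
```

In the package the frame and hop lengths are given in seconds; here they are given directly in samples to keep the example short.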
-
f0_contour
(hop_length_seconds=0.01, method='swipe', f0_min=60, f0_max=300)¶ Compute the F0 contour using PYSPTK: https://github.com/r9y9/pysptk/.
Parameters: - hop_length_seconds (float) – Hop size argument in pysptk. Corresponds to hopsize in the window sliding of the computation of f0. This is in seconds and gets converted.
- method (str) – One of ‘swipe’ or ‘rapt’. Define which method to use for f0 calculation. See https://github.com/r9y9/pysptk
- f0_min (float) – minimum acceptable f0.
- f0_max (float) – maximum acceptable f0.
Returns: F0 contour of self.waveform. Contains unvoiced frames.
Return type: np.array, [1, t1]
-
f0_statistics
(hop_length_seconds=0.01, method='swipe')¶ Compute the F0 mean and standard deviation of self.waveform. Note that we cannot simply rely on using statistics applied to the f0_contour since we do not want to include the zeros in the mean and standard deviation calculations.
Parameters: - hop_length_seconds (float) – Hop size argument in pysptk. Corresponds to hopsize in the window sliding of the computation of f0. This is in seconds and gets converted.
- method (str) – One of ‘swipe’ or ‘rapt’. Define which method to use for f0 calculation. See https://github.com/r9y9/pysptk
Returns: Dictionary mapping “mean” to the f0 mean of self.waveform and “std” to the f0 standard deviation of self.waveform.
Return type: dict
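The reason for excluding zeros can be seen on a toy contour. The sketch assumes that unvoiced frames are marked with 0 and that the standard deviation is the population one; both are assumptions, and f0_statistics_sketch is a hypothetical name.

```python
import statistics


def f0_statistics_sketch(f0_contour):
    """Mean and standard deviation over voiced (non-zero) frames only."""
    voiced = [f0 for f0 in f0_contour if f0 > 0]
    return {
        "mean": statistics.mean(voiced),
        "std": statistics.pstdev(voiced),  # population standard deviation (assumed)
    }


contour = [0.0, 100.0, 0.0, 200.0, 0.0]  # zeros stand for unvoiced frames
stats = f0_statistics_sketch(contour)
naive_mean = statistics.mean(contour)    # biased downwards by the zeros
```

Including the zeros would pull the mean far below any pitch that was actually produced.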
-
ppe
()¶ Compute pitch period entropy. This is an adaptation of the following Matlab code: https://github.com/Mak-Sim/Troparion/blob/5126f434b96e0c1a4a41fa99dd9148f3c959cfac/Perturbation_analysis/pitch_period_entropy.m Note that computing the PPE relies on the existence of voiced portions in the F0 trajectory.
Returns: The pitch period entropy, as per http://www.maxlittle.net/students/thesis_tsanas.pdf Return type: float
-
jitters
(p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3)¶ Compute the jitters mathematically, according to certain conditions given by p_floor, p_ceil and max_p_factor. See jitters.py for more details.
Parameters: - p_floor (float) – Minimum acceptable period.
- p_ceil (float) – Maximum acceptable period.
- max_p_factor (float) – value to use for the period factor principle
Returns: dictionary mapping strings to floats, with keys “localJitter”, “localabsoluteJitter”, “rapJitter”, “ppq5Jitter”, “ddpJitter”
Return type: dict
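To make the roles of p_floor, p_ceil and max_p_factor concrete, here is a sketch of one common definition of local jitter: the mean absolute difference between consecutive periods divided by the mean period. The filtering rules are modelled loosely on Praat-style conventions and are assumptions; jitters.py may differ in its details, and local_jitter_sketch is a hypothetical name.

```python
def local_jitter_sketch(periods, p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3):
    """Mean absolute consecutive period difference over the mean period."""

    def acceptable(period):
        return p_floor <= period <= p_ceil

    differences = []
    for first, second in zip(periods, periods[1:]):
        if not (acceptable(first) and acceptable(second)):
            continue  # skip pairs with an out-of-range period
        ratio = max(first, second) / min(first, second)
        if ratio <= max_p_factor:
            # Only pairs of similar length count as consecutive periods.
            differences.append(abs(first - second))

    valid_periods = [p for p in periods if acceptable(p)]
    mean_abs_difference = sum(differences) / len(differences)
    mean_period = sum(valid_periods) / len(valid_periods)
    return mean_abs_difference / mean_period


# Periods in seconds, alternating between 10 ms and 11 ms.
jitter = local_jitter_sketch([0.010, 0.011, 0.010, 0.011])
steady = local_jitter_sketch([0.010, 0.010, 0.010])
```

A perfectly periodic signal yields a jitter of zero, and larger cycle-to-cycle variation yields larger values.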
-
shimmers
(max_a_factor=1.6, p_floor=0.0001, p_ceil=0.02, max_p_factor=1.3)¶ Compute the shimmers mathematically, according to certain conditions given by max_a_factor, p_floor, p_ceil and max_p_factor. See shimmers.py for more details.
Parameters: - max_a_factor (float) – Value to use for amplitude factor principle
- p_floor (float) – Minimum acceptable period.
- p_ceil (float) – Maximum acceptable period.
- max_p_factor (float) – value to use for the period factor principle
Returns: Dictionary mapping strings to floats, with keys “localShimmer”, “localdbShimmer”, “apq3Shimmer”, “apq5Shimmer”, “apq11Shimmer”
Return type: dict
-
hnr
()¶ Compute the harmonics to noise ratio (HNR) of self.waveform. See https://www.ncbi.nlm.nih.gov/pubmed/12512635 for a more thorough description of why HNR is important in the scope of healthcare.
Returns: The harmonics to noise ratio computed on self.waveform. Return type: float
-
dfa
(window_lengths=[64, 128, 256, 512, 1024, 2048, 4096])¶ Compute the Detrended Fluctuation Analysis (DFA) of self.waveform. See Tsanas et al., 2011: “Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease”.
Parameters: window_lengths (list of int > 0) – List of L to use in DFA computation. See dfa.py for more details. Returns: The detrended fluctuation analysis alpha value. Return type: float
-
lpc
(order=4, return_np_array=False)¶ This uses the librosa backend to get the Linear Prediction Coefficients via Burg’s method. See librosa.core.lpc for more details.
Parameters: - order (int > 0) – Order of the linear filter
- return_np_array (bool) – If False, returns a dictionary. Otherwise a numpy array.
Returns: Dictionary mapping ‘LPC_{i}’ to the i’th lpc coefficient, for i = 0…order. Or: LP prediction error coefficients (np array case)
Return type: dict or np.array, [order + 1, ]
-
lsf
(order=4, return_np_array=False)¶ Compute the LPC coefficients, then convert them to LSP frequencies. The conversion is done using https://github.com/cokelaer/spectrum/blob/master/src/spectrum/linear_prediction.py
Parameters: - order (int > 0) – Order of the linear filter for LPC calculation
- return_np_array (bool) – If False, returns a dictionary. Otherwise a numpy array.
Returns: Dictionary mapping ‘LPC_{i}’ to the i’th lpc coefficient, for i = 0…order. Or LSP frequencies (np array case).
Return type: dict or np.array, [order, ]
-
formants
()¶ Estimate the first four formant frequencies using LPC (see formants.py)
Returns: Dictionary mapping {‘f1’, ‘f2’, ‘f3’, ‘f4’} to corresponding {first, second, third, fourth} formant frequency. Return type: dict
-
formants_slidingwindow
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Estimate the first four formant frequencies using LPC (see formants.py) and apply the metric_slidingwindow decorator.
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Time series of the first four formant frequencies computed on windows of length frame_length_seconds, with sliding window of hop_length_seconds.
Return type: np.array, [4, T / hop_length]
-
kurtosis_slidingwindow
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Computes the kurtosis on frames of the waveform with a sliding window
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Kurtosis on each sliding window.
Return type: np.array, [1, T / hop_length]
-
log_energy
()¶ Compute the log energy of self.waveform as per Abeyratne et al., 2013.
Returns: The log energy of self.waveform, computed as per the paper above. Return type: float
-
log_energy_slidingwindow
(frame_length_seconds=0.04, hop_length_seconds=0.01)¶ Computes the log energy on frames of the waveform with a sliding window
Parameters: - frame_length_seconds (float) – Length of the sliding window, in seconds.
- hop_length_seconds (float) – How much the window shifts for every timestep, in seconds.
Returns: Log energy on each sliding window.
Return type: np.array, [1, T / hop_length]
-