Vanilla feature extraction

This file contains functions to compute features.

surfboard.feature_extraction.load_waveforms_from_paths(paths, sample_rate)

Loads the waveforms at the given paths, using multiprocessing.

surfboard.feature_extraction.extract_features_from_paths(paths, components_list, statistics_list=None, sample_rate=44100)

Loads the waveforms, computes the requested components and statistics, and returns them, without needing to store the waveforms in memory. This minimizes the memory footprint when running over multiple files.

Parameters:
  • paths (list of str) – paths to the .wav files to compute features for.
  • components_list (list of str/dict) – the methods to apply to each loaded waveform. If an entry is a dict, it also contains arguments to the sound.Waveform methods.
  • statistics_list (list of str) – the methods to apply to all the time-dependent features computed from the waveforms.
  • sample_rate (int > 0) – sampling rate at which to load the waveforms.
Returns:

pandas dataframe where every row corresponds to features extracted for one of the waveforms and columns represent individual features.

Return type:

pandas DataFrame
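
The memory-saving idea described above can be sketched in plain Python. This is only an illustration of the pattern (load one file, compute its features, keep the features, drop the waveform) and is not surfboard's implementation. The names iter_feature_rows, fake_load, fake_compute and FAKE_FILES are hypothetical stand-ins, and the "waveforms" are short lists of floats rather than audio read from disk.

```python
def iter_feature_rows(paths, load, compute):
    """Yield one feature dict per path, holding one waveform at a time."""
    for path in paths:
        waveform = load(path)    # only this waveform is referenced here
        row = compute(waveform)  # small dict of numbers
        del waveform             # drop the reference before yielding the row
        yield row


# Hypothetical stand-ins for reading a .wav file and computing features.
FAKE_FILES = {"a.wav": [1.0, -1.0], "b.wav": [0.5, 0.5, -2.0]}


def fake_load(path):
    return FAKE_FILES[path]


def fake_compute(samples):
    return {"n_samples": len(samples), "peak": max(abs(s) for s in samples)}


rows = list(iter_feature_rows(["a.wav", "b.wav"], fake_load, fake_compute))
```

Only the per-file feature dicts accumulate in rows, so memory use grows with the number of features rather than with the total length of the audio.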

surfboard.feature_extraction.extract_features_from_waveform(components_list, statistics_list, waveform)

Given one waveform, a list of components and statistics, extract the features from the waveform.

Parameters:
  • components_list (list of str or dict) – the methods to apply to the waveform. If an entry is a dict, it also contains arguments to the sound.Waveform methods.
  • statistics_list (list of str) – the methods to apply to all the “time-dependent” components computed from the waveform.
  • waveform (Waveform) – the waveform object to extract components from.
Returns:

Dictionary mapping names to numerical components extracted for this waveform.

Return type:

dict
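
To make the str/dict convention and the role of the statistics concrete, here is a minimal sketch that uses a toy class instead of surfboard's Waveform. Everything in it is an assumption made for illustration: the ToyWaveform class and its methods, the STATISTICS table, the {"method_name": {"arg": value}} dict layout, and the "component_statistic" naming of the output keys. It does not reproduce the library's actual code or key names.

```python
import statistics


class ToyWaveform:
    """Hypothetical stand-in for a waveform object; not surfboard's Waveform."""

    def __init__(self, samples):
        self.samples = list(samples)

    def peak(self):
        # Time-independent component: a single number.
        return max(abs(s) for s in self.samples)

    def frame_energy(self, frame_length=2):
        # Time-dependent component: one value per frame.
        frames = [
            self.samples[i:i + frame_length]
            for i in range(0, len(self.samples), frame_length)
        ]
        return [sum(s * s for s in frame) for frame in frames]


# Hypothetical statistics, looked up by name.
STATISTICS = {"mean": statistics.fmean, "max": max}


def extract_features_sketch(components_list, statistics_list, waveform):
    features = {}
    for component in components_list:
        if isinstance(component, dict):
            # Assumed dict form: a single {"method_name": {"arg": value}} pair.
            (name, kwargs), = component.items()
        else:
            name, kwargs = component, {}
        value = getattr(waveform, name)(**kwargs)
        if isinstance(value, list):
            # Time-dependent: summarise with every requested statistic.
            for stat in statistics_list:
                features[f"{name}_{stat}"] = STATISTICS[stat](value)
        else:
            # Time-independent: keep the number as it is.
            features[name] = value
    return features


features = extract_features_sketch(
    ["peak", {"frame_energy": {"frame_length": 2}}],
    ["mean", "max"],
    ToyWaveform([1.0, -2.0, 3.0, 0.0]),
)
```

Looking methods up by name with getattr is what lets the component list be plain strings; the dict form only adds keyword arguments to the same call.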

surfboard.feature_extraction.extract_features(waveforms, components_list, statistics_list=None)

Given a list of Waveform objects, a list of Waveform methods in the form of strings, and a list of Barrel methods in the form of strings, computes the resulting time-independent features. This function uses multiprocessing.

Parameters:
  • waveforms (list of Waveform) – the waveform objects to extract features from.
  • components_list (list of str/dict) – the methods to apply to all the waveform objects in waveforms. If an entry is a dict, it also contains arguments to the sound.Waveform methods.
  • statistics_list (list of str) – the methods to apply to all the time-dependent features computed from the waveforms.
Returns:

pandas dataframe where every row corresponds to features extracted for one of the waveforms and columns represent individual features.

Return type:

pandas DataFrame
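
The "one row per waveform" result and the use of multiprocessing can be sketched as a map over the waveforms. This is a hedged illustration under stated assumptions, not the library's code: summarise and extract_all are hypothetical names, the waveforms are plain lists of floats, and the mapping function is injectable so the same code runs serially by default.

```python
from functools import partial


def summarise(samples, statistics_list):
    # Hypothetical per-waveform worker: returns one row of features.
    available = {"min": min, "max": max}
    return {name: available[name](samples) for name in statistics_list}


def extract_all(waveforms, statistics_list, map_fn=map):
    # Bind the shared arguments, then map the worker over the waveforms.
    worker = partial(summarise, statistics_list=statistics_list)
    return list(map_fn(worker, waveforms))


rows = extract_all([[1.0, -2.0], [0.5, 3.0]], ["min", "max"])
```

Passing the map method of a multiprocessing.Pool as map_fn (inside an if __name__ == "__main__": guard) would spread the work over several processes, since a partial of a module-level function can be pickled. A list of dicts such as rows can be handed to pandas.DataFrame to obtain a table with one row per waveform and one column per feature.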