Feature extraction with multiprocessing

This file contains functions to compute features with multiprocessing.

surfboard.feature_extraction_multiprocessing.load_waveform_from_path(sample_rate, path)

Helper function that makes the Waveform constructor callable through a multiprocessing Pool.

Parameters:
  • sample_rate (int) – The sample rate at which to load the Waveform object
  • path (str) – The path to the audio file to load
Returns:

The loaded Waveform object

Return type:

Waveform
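As a rough illustration of why a helper with this argument order is convenient, the sketch below fixes the sample rate with functools.partial, leaving a one-argument callable that takes only a path. StubWaveform and load_stub_waveform are hypothetical stand-ins written for this example; they are not surfboard code and do not read any audio.

```python
from functools import partial


class StubWaveform:
    """Hypothetical stand-in for a Waveform; it only records its arguments."""

    def __init__(self, path, sample_rate):
        self.path = path
        self.sample_rate = sample_rate


def load_stub_waveform(sample_rate, path):
    # Same argument order as the helper documented above: sample_rate first,
    # so it can be fixed up front while only the path varies per call.
    return StubWaveform(path=path, sample_rate=sample_rate)


# Bind the sample rate once. The result takes a single path argument,
# which is the shape that Pool.map expects from its worker function.
load_at_16k = partial(load_stub_waveform, 16000)
waveform = load_at_16k("speech.wav")
```

Because load_stub_waveform is defined at module level, the partial object can be pickled and sent to worker processes.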

surfboard.feature_extraction_multiprocessing.load_waveforms_from_paths(paths, sample_rate, num_proc=1)

Loads waveforms from paths using multiprocessing

Parameters:
  • paths (list of str) – A list of paths to audio files
  • sample_rate (int) – The sample rate at which to load the audio files
  • num_proc (int >= 1) – The number of parallel processes to run
Returns:

List of loaded Waveform objects

Return type:

list of Waveform
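A minimal sketch of how a loader like this could be structured, assuming (this page does not confirm it) that a one-argument loader is mapped over the paths with multiprocessing.Pool. The names fake_load and load_all are hypothetical, and fake_load returns a small tuple instead of reading audio.

```python
from functools import partial
from multiprocessing import Pool


def fake_load(sample_rate, path):
    # Hypothetical placeholder for loading audio: returns a small record.
    return (path, sample_rate)


def load_all(paths, sample_rate, num_proc=1):
    """Load every path, using worker processes when num_proc > 1."""
    if num_proc < 1:
        raise ValueError("num_proc must be >= 1")
    loader = partial(fake_load, sample_rate)
    if num_proc == 1:
        # A single process needs no pool.
        return [loader(path) for path in paths]
    with Pool(num_proc) as pool:
        # Pool.map returns results in the same order as the input paths.
        return pool.map(loader, paths)
```

On platforms that start worker processes with the spawn method, a call with num_proc > 1 should sit under an `if __name__ == "__main__":` guard.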

surfboard.feature_extraction_multiprocessing.extract_features_from_path(components_list, statistics_list, sample_rate, path)

Loads a waveform, computes the requested components and statistics, and returns them without keeping the waveform in memory. This keeps memory use from growing with the number of files processed.

Parameters:
  • components_list (list of str/dict) – The methods to apply to the waveform loaded from path. A dict entry also carries arguments to the sound.Waveform method.
  • statistics_list (list of str) – The methods to apply to all the “time-dependent” features computed from the waveform.
  • sample_rate (int > 0) – The sampling rate at which to load the waveform
  • path (str) – The path to the audio file to extract features from
Returns:

Dictionary mapping feature names to values.

Return type:

dict
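The str/dict form of components_list can be pictured with the sketch below. It assumes that each dict entry has a single key (the method name) mapping to that method's keyword arguments; this page does not spell out the dict layout, and the component names used here are illustrative only. normalise_component is a hypothetical helper, not part of surfboard.

```python
def normalise_component(entry):
    """Return (method_name, kwargs) for one components_list entry."""
    if isinstance(entry, str):
        # A bare string names a method called with default arguments.
        return entry, {}
    if isinstance(entry, dict) and len(entry) == 1:
        # Assumed layout: {method_name: {keyword: value, ...}}
        name, kwargs = next(iter(entry.items()))
        return name, dict(kwargs or {})
    raise TypeError(f"Unsupported component entry: {entry!r}")


# Illustrative entries: one plain name, one name with arguments.
components_list = ["f0_contour", {"mfcc": {"n_mfcc": 13}}]
parsed = [normalise_component(entry) for entry in components_list]
```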

surfboard.feature_extraction_multiprocessing.extract_features_from_paths(paths, components_list, statistics_list=None, sample_rate=44100, num_proc=1)

Loads waveforms, computes the requested features and statistics, and returns them without keeping the waveforms in memory. This keeps memory use from growing with the number of files processed.

Parameters:
  • paths (list of str) – Paths to the .wav files to compute features from
  • components_list (list of str or dict) – The methods to apply to every waveform loaded from paths. A dict entry also carries arguments to the sound.Waveform method.
  • statistics_list (list of str) – The methods to apply to all the “time-dependent” features computed from the waveforms.
  • sample_rate (int > 0) – The sampling rate at which to load the waveforms
  • num_proc (int >= 1) – The number of parallel processes to run
Returns:

pandas DataFrame in which every row corresponds to the features extracted for one of the waveforms and every column represents an individual feature.

Return type:

pandas DataFrame
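The memory argument can be sketched as follows: each path is loaded, reduced to a small dictionary of features, and the large signal is discarded before the next path is touched. fake_signal and features_for_path are hypothetical names for this example; the "signal" is synthesised from the path name rather than read from disk, and the features computed are placeholders.

```python
def fake_signal(path, sample_rate):
    # Hypothetical stand-in for reading audio: one second of constant samples.
    return [float(len(path))] * sample_rate


def features_for_path(sample_rate, path):
    signal = fake_signal(path, sample_rate)  # large, short-lived
    features = {
        "path": path,
        "n_samples": len(signal),
        "mean": sum(signal) / len(signal),
    }
    # Only the small dict is returned; the signal goes out of scope here.
    return features


# One small row per file; no signal outlives its own function call.
rows = [features_for_path(8000, p) for p in ["a.wav", "long_name.wav"]]
```

A list of dictionaries like rows is a shape that pandas.DataFrame accepts directly, giving one row per file and one column per key.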

surfboard.feature_extraction_multiprocessing.extract_features(waveforms, components_list, statistics_list=None, num_proc=1)

Given a list of Waveform objects, a list of Waveform method names (strings, or dicts that also carry arguments) and an optional list of Barrel method names (strings), computes the resulting time-independent features. The work is distributed over num_proc processes.

Parameters:
  • waveforms (list of Waveform) – The Waveform objects to extract features from
  • components_list (list of str or dict) – The methods to apply to every Waveform object in waveforms. A dict entry also carries arguments to the sound.Waveform method.
  • statistics_list (list of str) – The methods to apply to all the “time-dependent” features computed from the waveforms.
  • num_proc (int >= 1) – The number of parallel processes to run
Returns:

pandas DataFrame in which every row corresponds to the features extracted for one of the waveforms and every column represents an individual feature.

Return type:

pandas DataFrame
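To make the role of statistics_list concrete: a time-dependent feature is a sequence of values over time, and each statistic collapses it to a single number per waveform. The sketch below uses plain Python; the STATISTICS table, the summarise helper and the "<component>_<statistic>" key naming are assumptions made for illustration and may not match the column names surfboard actually produces.

```python
import statistics

# Hypothetical lookup from statistic name to a reducing function.
STATISTICS = {
    "mean": statistics.fmean,
    "min": min,
    "max": max,
}


def summarise(component_name, values, statistics_list):
    """Collapse one time-dependent feature into one scalar per statistic."""
    return {
        f"{component_name}_{stat}": STATISTICS[stat](values)
        for stat in statistics_list
    }


# Three frames of an illustrative contour, reduced to two scalars.
row = summarise("f0_contour", [100.0, 110.0, 120.0], ["mean", "max"])
```

Each waveform then contributes one such dictionary, which is what makes the result time-independent: its size no longer depends on the length of the audio.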