Feature extraction with multiprocessing
This file contains functions to compute features with multiprocessing.
-
surfboard.feature_extraction_multiprocessing.load_waveform_from_path(sample_rate, path)
Helper function that wraps the Waveform constructor so that it can be called from a multiprocessing Pool.
Parameters: - sample_rate (int) – The sample rate at which to load the Waveform object
- path (str) – The path to the audio file to load
Returns: The loaded Waveform object
Return type: Waveform
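A multiprocessing Pool sends the function it maps, and every argument, to worker processes by pickling them. The sketch below suggests why a module-level helper such as load_waveform_from_path is useful: a functools.partial over a top-level function pickles by reference, whereas a lambda cannot be pickled at all. Putting sample_rate first presumably lets partial bind it, leaving a one-argument callable over paths. The Waveform class here is a hypothetical stand-in for surfboard.sound.Waveform that only records its arguments, and its keyword names are assumptions.

```python
import functools
import pickle


class Waveform:
    """Hypothetical stand-in for surfboard.sound.Waveform (records its arguments only)."""

    def __init__(self, path=None, sample_rate=44100):
        self.path = path
        self.sample_rate = sample_rate


def load_waveform_from_path(sample_rate, path):
    """Module-level wrapper around the (stand-in) Waveform constructor."""
    return Waveform(path=path, sample_rate=sample_rate)


# Binding the first positional argument leaves a one-argument callable,
# which is the shape Pool.map expects.
loader = functools.partial(load_waveform_from_path, 16000)

# A partial over a module-level function pickles by reference,
# so it can be shipped to worker processes.
round_tripped = pickle.loads(pickle.dumps(loader))

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda p: Waveform(path=p, sample_rate=16000))
    lambda_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_picklable = False
```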
-
surfboard.feature_extraction_multiprocessing.load_waveforms_from_paths(paths, sample_rate, num_proc=1)
Loads waveforms from paths using multiprocessing.
Parameters: - paths (list of str) – A list of paths to audio files
- sample_rate (int) – The sample rate at which to load the audio files
- num_proc (int >= 1) – The number of parallel processes to run
Returns: List of loaded Waveform objects
Return type: list of Waveform
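A minimal sketch of how load_waveforms_from_paths could distribute the work, assuming it maps a partial of the per-path helper over paths with multiprocessing.Pool. The stand-in Waveform class is hypothetical, and the serial shortcut for num_proc == 1 and the ValueError check are choices made for this sketch, not necessarily the library's behavior.

```python
import functools
from multiprocessing import Pool


class Waveform:
    """Hypothetical stand-in for surfboard.sound.Waveform (records its arguments only)."""

    def __init__(self, path=None, sample_rate=44100):
        self.path = path
        self.sample_rate = sample_rate


def load_waveform_from_path(sample_rate, path):
    return Waveform(path=path, sample_rate=sample_rate)


def load_waveforms_from_paths(paths, sample_rate, num_proc=1):
    if num_proc < 1:
        raise ValueError("num_proc must be >= 1")
    loader = functools.partial(load_waveform_from_path, sample_rate)
    if num_proc == 1:
        # Skip the process start-up cost when no parallelism is requested.
        return [loader(p) for p in paths]
    with Pool(processes=num_proc) as pool:
        # Pool.map returns results in the same order as `paths`.
        return pool.map(loader, paths)
```

Because Pool.map preserves input order, the i-th Waveform corresponds to paths[i]. On platforms whose default start method is spawn (Windows, macOS), call it under an `if __name__ == "__main__":` guard so worker processes do not re-run the calling code on import.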
-
surfboard.feature_extraction_multiprocessing.extract_features_from_path(components_list, statistics_list, sample_rate, path)
Loads a single waveform, computes the requested components and statistics, and returns them without keeping the waveform in memory. This prevents excessive memory accumulation.
Parameters: - components_list (list of str/dict) – A list of the Waveform methods to apply to the loaded waveform. If an entry is a dict, it also contains arguments to pass to the sound.Waveform method.
- statistics_list (list of str) – A list of the methods to apply to all the “time-dependent” features computed from the waveform.
- sample_rate (int > 0) – The sample rate at which to load the waveform
- path (str) – The path to the audio file from which to extract features
Returns: Dictionary mapping feature names to values.
Return type: dict
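The parameters above describe components_list entries that are either a method name or a dict carrying arguments, plus statistics that summarize time-dependent features. A minimal sketch of that dispatch, under several assumptions: a dict entry maps a single method name to its keyword arguments, a method returning a list is treated as time-dependent, statistic names resolve through the hypothetical STATISTICS table, and output keys follow a hypothetical `<component>_<statistic>` scheme. The Waveform class and its duration and energy methods are toy stand-ins, not surfboard's.

```python
import statistics as st


class Waveform:
    """Hypothetical stand-in for surfboard.sound.Waveform with two toy methods."""

    def __init__(self, path=None, sample_rate=44100):
        self.path = path
        self.sample_rate = sample_rate
        self.samples = [0.0, 0.5, -0.5, 1.0]  # fake audio instead of reading `path`

    def duration(self):
        # Time-independent: a single scalar, in seconds.
        return len(self.samples) / self.sample_rate

    def energy(self, frame_length=2):
        # Time-dependent: one value per non-overlapping frame.
        return [sum(x * x for x in self.samples[i:i + frame_length])
                for i in range(0, len(self.samples), frame_length)]


# Hypothetical mapping from statistic names to summary functions.
STATISTICS = {"mean": st.mean, "std": st.pstdev, "max": max}


def extract_features_from_path(components_list, statistics_list, sample_rate, path):
    waveform = Waveform(path=path, sample_rate=sample_rate)
    features = {}
    for entry in components_list:
        # A str names a method; a dict maps one method name to its kwargs.
        if isinstance(entry, str):
            name, kwargs = entry, {}
        else:
            [(name, kwargs)] = entry.items()
        value = getattr(waveform, name)(**kwargs)
        if isinstance(value, list):
            # Reduce a time-dependent feature to one scalar per statistic.
            for stat in statistics_list:
                features[f"{name}_{stat}"] = STATISTICS[stat](value)
        else:
            features[name] = value
    # Only the small feature dict is returned; the waveform is discarded.
    return features
```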
-
surfboard.feature_extraction_multiprocessing.extract_features_from_paths(paths, components_list, statistics_list=None, sample_rate=44100, num_proc=1)
Loads waveforms from the given paths, computes the components and statistics, and returns them without storing the waveforms in memory. This prevents excessive memory accumulation.
Parameters: - paths (list of str) – Paths to the .wav files from which to compute features
- components_list (list of str or dict) – A list of the Waveform methods to apply to each loaded waveform. If an entry is a dict, it also contains arguments to pass to the sound.Waveform method.
- statistics_list (list of str) – A list of the methods to apply to all the “time-dependent” features computed from the waveforms.
- sample_rate (int > 0) – The sample rate at which to load the waveforms
- num_proc (int >= 1) – The number of parallel processes to run
Returns: A pandas DataFrame in which every row corresponds to the features extracted from one of the waveforms and every column represents an individual feature.
Return type: pandas DataFrame
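A sketch of the memory-saving pattern described above, assuming the batch function binds the shared arguments with functools.partial and maps the per-path extractor over paths. Because path is the last parameter of extract_features_from_path, everything else can be bound up front. The per-path extractor here is a hypothetical toy that derives values from the path string instead of loading audio, and the sketch returns a list of row dicts rather than a DataFrame to stay dependency-free.

```python
import functools
from multiprocessing import Pool


def extract_features_from_path(components_list, statistics_list, sample_rate, path):
    """Hypothetical toy extractor standing in for load-then-compute."""
    # A real extractor would load audio here; this one fakes features
    # from the path string so the sketch runs without audio files.
    return {"name_length": len(path), "sample_rate": sample_rate}


def extract_features_from_paths(paths, components_list, statistics_list=None,
                                sample_rate=44100, num_proc=1):
    # `path` is the last parameter, so partial can bind everything else.
    worker = functools.partial(extract_features_from_path,
                               components_list, statistics_list, sample_rate)
    if num_proc == 1:
        return [worker(p) for p in paths]
    with Pool(processes=num_proc) as pool:
        # Only path strings go out and only small dicts come back, so the
        # parent never holds the waveforms themselves.
        return pool.map(worker, paths)
```

Passing the result to pandas.DataFrame(rows) would give the documented layout: one row per file, one column per feature key.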
-
surfboard.feature_extraction_multiprocessing.extract_features(waveforms, components_list, statistics_list=None, num_proc=1)
Given a list of Waveform objects, a list of Waveform methods (as strings) and a list of Barrel methods (as strings), computes the resulting time-independent features. The computation runs in parallel using multiprocessing.
Parameters: - waveforms (list of Waveform) – This is a list of waveform objects
- components_list (list of str or dict) – This is a list of the methods which should be applied to all the waveform objects in waveforms. If a dict, this also contains arguments to the sound.Waveform methods.
- statistics_list (list of str) – This is a list of the methods which should be applied to all the “time-dependent” features computed from the waveforms.
- num_proc (int >= 1) – The number of parallel processes to run
Returns: A pandas DataFrame in which every row corresponds to the features extracted from one of the waveforms and every column represents an individual feature.
Return type: pandas DataFrame
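When choosing between extract_features and extract_features_from_paths, it may help to compare what would cross the process boundary. Assuming extract_features hands each Waveform object to a worker (a Pool pickles every argument it sends), the payload grows with the audio length, whereas the path-based function only sends a short string. The sketch below measures that difference for one second of 44.1 kHz audio; the dict standing in for a Waveform and the example path are hypothetical, and samples are stored as 8-byte floats in an array.array.

```python
import pickle
from array import array

SAMPLE_RATE = 44100

# Hypothetical stand-in for a loaded Waveform: one second of silence
# stored as 8-byte floats, plus its sample rate.
fake_waveform = {
    "waveform": array("d", [0.0] * SAMPLE_RATE),
    "sample_rate": SAMPLE_RATE,
}
path = "recordings/example.wav"  # hypothetical path

# Bytes that would be serialized to send each argument to a worker.
waveform_bytes = len(pickle.dumps(fake_waveform))
path_bytes = len(pickle.dumps(path))

print(f"pickled waveform: {waveform_bytes} bytes")
print(f"pickled path:     {path_bytes} bytes")
```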