ava.segmenting package¶

AVA package for segmenting audio

Contains¶

ava.segmenting.amplitude_segmentation: Segment based on amplitude thresholds.
ava.segmenting.refine_segments: Get rid of false positive syllables (noise).
ava.segmenting.segment: Segment large batches of audio files.
ava.segmenting.template_segmentation: Segment based on peaks in spectrogram cross correlation.
ava.segmenting.utils: Useful functions for segmenting.

Submodules¶

ava.segmenting.amplitude_segmentation module¶

Amplitude-based syllable segmentation.

ava.segmenting.amplitude_segmentation.get_onsets_offsets(audio, p, return_traces=False)[source]¶

Segment the spectrogram using thresholds on its amplitude.

A syllable is detected if the amplitude trace exceeds p[‘th_3’]. An offset is then detected if there is a subsequent local minimum in the amplitude trace with amplitude less than p[‘th_2’], or when the amplitude drops below p[‘th_1’], whichever comes first. Syllable onset is determined analogously.

Note

p[‘th_1’] <= p[‘th_2’] <= p[‘th_3’]

Parameters:

audio (numpy.ndarray) – Raw audio samples.
p (dict) – Parameters.
return_traces (bool, optional) – Whether to return traces. Defaults to False.

Returns:

onsets (numpy array) – Onset times, in seconds
offsets (numpy array) – Offset times, in seconds
traces (list of a single numpy array) – The amplitude trace used in segmenting decisions. Returned if return_traces is True.
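
The three-threshold rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code ava runs: the function name threshold_segments, the explicit dt argument (seconds per amplitude sample), and the exact definition of a local minimum are all hypothetical choices made for the example.

```python
def threshold_segments(amps, dt, th_1, th_2, th_3):
    """Return (onsets, offsets) in seconds for an amplitude trace.

    amps is a sequence of amplitudes sampled every dt seconds.
    Hypothetical helper, not part of ava.
    """
    assert th_1 <= th_2 <= th_3, "thresholds must be ordered"
    n = len(amps)

    def is_boundary(i):
        # A boundary is either a dip below the lowest threshold ...
        if amps[i] < th_1:
            return True
        # ... or a local minimum that falls below the middle threshold.
        is_local_min = 0 < i < n - 1 and amps[i - 1] > amps[i] <= amps[i + 1]
        return is_local_min and amps[i] < th_2

    onsets, offsets = [], []
    i, last_offset = 0, 0
    while i < n:
        if amps[i] > th_3:
            # Detection: walk backward to the nearest boundary for the onset,
            # without crossing the previous segment's offset.
            start = i
            while start > last_offset and not is_boundary(start):
                start -= 1
            # Walk forward to the nearest boundary for the offset.
            stop = i
            while stop < n - 1 and not is_boundary(stop):
                stop += 1
            onsets.append(start * dt)
            offsets.append(stop * dt)
            last_offset = stop
            i = stop + 1
        else:
            i += 1
    return onsets, offsets


# Two loud peaks separated by a local minimum below th_2 but above th_1.
trace = [0.0, 1.0, 0.25, 1.0, 0.0]
onsets, offsets = threshold_segments(trace, dt=1.0, th_1=0.2, th_2=0.3, th_3=0.8)
```

In this toy trace the dip to 0.25 never drops below th_1, yet it still splits the two peaks because it is a local minimum below th_2.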

ava.segmenting.refine_segments module¶

Remove noise from segmenting files.

ava.segmenting.refine_segments.refine_segments_post_vae(dc, seg_dirs, audio_dirs, out_seg_dirs, verbose=True, num_imgs=2000, tooltip_output_dir='temp', make_tooltip=True, img_fn='temp.pdf')[source]¶

Manually remove noise by selecting regions of UMAP latent mean projection.

First, a tooltip plot of the spectrogram latent means will be made (using ava.plotting.tooltip_plot) and saved to tooltip_output_dir. You should open this plot and see which regions of the UMAP contain noise. Then, when prompted, press return to identify noise, then enter the coordinates of a rectangle (x1, x2, y1, and y2) in the UMAP projection containing noise, following the prompts. You will be able to see the selected noise regions in the image saved at img_fn, by default ‘temp.pdf’. When you are finished identifying noise regions, press ‘q’ and the original segments from seg_dirs that aren’t identified as noise (that is, those not contained in any of the rectangles) are copied to segment files in out_seg_dirs.

Doesn’t support datasets that are too large to fit in memory.

Parameters:

dc (ava.data.data_container.DataContainer) – DataContainer object
seg_dirs (list of str) – Original segment directories.
audio_dirs (list of str) – Directories containing audio files.
out_seg_dirs (list of str) – Output segment directories.
verbose (bool, optional) – Defaults to True.
num_imgs (int, optional) – Number of images for tooltip plot. Defaults to 2000.
tooltip_output_dir (str, optional) – Where to save tooltip plot. Defaults to 'temp'.
make_tooltip (bool, optional) – Defaults to True.
img_fn (str, optional) – Where to save the image showing the selected noise regions. Defaults to 'temp.pdf'.
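
The rectangle-based selection described above amounts to a point-in-rectangle test on the 2D UMAP coordinates. The following is a rough sketch of that idea only; keep_outside_rectangles is a hypothetical helper, and the (x1, x2, y1, y2) ordering simply mirrors the order in which the prompts are described.

```python
def keep_outside_rectangles(points, rectangles):
    """Return indices of points that lie outside every noise rectangle.

    points: list of (x, y) embedding coordinates.
    rectangles: list of (x1, x2, y1, y2) tuples marking noise regions.
    Hypothetical helper, not part of ava.
    """
    kept = []
    for i, (x, y) in enumerate(points):
        in_noise = any(
            min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)
            for (x1, x2, y1, y2) in rectangles
        )
        if not in_noise:
            kept.append(i)
    return kept


embedding = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, -3.0)]
noise_boxes = [(0.5, 1.5, 0.5, 1.5), (4.0, 6.0, 4.0, 6.0)]
kept_indices = keep_outside_rectangles(embedding, noise_boxes)
```

Only the segments whose indices survive this filter would then be copied to the output directories.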

ava.segmenting.refine_segments.refine_segments_pre_vae(seg_dirs, audio_dirs, out_seg_dirs, p, n_samples=10000, num_imgs=1000, verbose=True, img_fn='temp.pdf', tooltip_output_dir='temp')[source]¶

Manually remove noise by selecting regions of UMAP spectrogram projections.

First, a tooltip plot of the UMAPed spectrograms will be made (using ava.plotting.tooltip_plot) and saved to tooltip_output_dir. You should open this plot and see which regions of the UMAP contain noise. Then, when prompted, press return to identify noise, then enter the coordinates of a rectangle (x1, x2, y1, and y2) in the UMAP projection containing noise, following the prompts. You will be able to see the selected noise regions in the image saved at img_fn, by default ‘temp.pdf’. When you are finished identifying noise regions, press ‘q’ and the original segments from seg_dirs that aren’t identified as noise (that is, those not contained in any of the rectangles) are copied to segment files in out_seg_dirs.

Doesn’t support datasets that are too large to fit in memory.

Parameters:

seg_dirs (list of str) – Directories containing segmenting information
audio_dirs (list of str) – Directories containing audio files
out_seg_dirs (list of str) – Directories to write updated segmenting information to
p (dict) – Segmenting parameters: TO DO: ADD REFERENCE!
n_samples (int, optional) – Number of spectrograms to feed to UMAP. Defaults to 10000.
num_imgs (int, optional) – Number of images to embed in the tooltip plot. Defaults to 1000.
verbose (bool, optional) – Defaults to True.
img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
tooltip_output_dir (str, optional) – Where to save tooltip plot. Defaults to 'temp'.

ava.segmenting.segment module¶

Segment audio files and write segmenting decisions.

TO DO:

tune window size
segment could be sped up if it operated file by file.

ava.segmenting.segment.get_audio_seg_filenames(audio_dir, segment_dir, p=None)[source]¶

Return lists of sorted filenames.

Warning

p is unused. This will be removed in a future version!

Parameters:

audio_dir (str) – Audio directory.
segment_dir (str) – Segments directory.
p (dict, optional) – Unused! Defaults to None.

ava.segmenting.segment.segment(audio_dir, seg_dir, p, verbose=True)[source]¶

Segment audio files in audio_dir and write decisions to seg_dir.

Parameters:

audio_dir (str) – Directory containing audio files.
seg_dir (str) – Directory containing segmenting decisions.
p (dict) – Segmenting parameters. Must map the key ‘algorithm’ to a segmenting algorithm, for example ava.segmenting.amplitude_segmentation.get_onsets_offsets. Must additionally contain keys requested by the segmenting algorithm.
verbose (bool, optional) – Defaults to True.
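
Because p maps the key ‘algorithm’ to a function, the caller chooses the segmenting routine by editing the parameter dictionary. The sketch below illustrates that dispatch pattern with a toy stand-in; toy_threshold_algorithm and the keys 'fs' and 'th' are assumptions made for this example, and the call convention p['algorithm'](audio, p) is inferred from the get_onsets_offsets(audio, p) signature rather than confirmed here.

```python
def toy_threshold_algorithm(audio, p):
    """Hypothetical segmenting algorithm: contiguous runs of loud samples.

    Returns (onsets, offsets) in seconds, using p['fs'] and p['th'].
    """
    fs, th = p['fs'], p['th']
    onsets, offsets = [], []
    inside = False
    for i, sample in enumerate(audio):
        loud = abs(sample) > th
        if loud and not inside:
            onsets.append(i / fs)   # run of loud samples starts here
        elif not loud and inside:
            offsets.append(i / fs)  # run ended just before this sample
        inside = loud
    if inside:
        # Close a segment that runs to the end of the audio.
        offsets.append(len(audio) / fs)
    return onsets, offsets


# The parameter dictionary carries both the algorithm and its settings.
p = {'algorithm': toy_threshold_algorithm, 'fs': 10, 'th': 0.5}
audio = [0.0, 0.9, 0.8, 0.0, 0.0, 0.7, 0.0]
onsets, offsets = p['algorithm'](audio, p)
```

Swapping in a different algorithm then only requires changing p['algorithm'] and supplying whatever extra keys that algorithm requests.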

ava.segmenting.segment.tune_segmenting_params(audio_dirs, p, img_fn='temp.pdf')[source]¶

Tune segmenting parameters by visualizing segmenting decisions.

Chunks of audio will be drawn at random, segmented, and a plot showing the segmenting decisions will be saved as img_fn, by default 'temp.pdf'.

Parameters:

audio_dirs (list of str) – Directories containing audio files.
p (dict) – Segmenting parameters. Must contain the keys ‘max_dur’ (maximum segment duration, in seconds) and ‘algorithm’ (segmenting algorithm, for example ava.segmenting.amplitude_segmentation.get_onsets_offsets), in addition to the keys required by ava.segmenting.utils.get_spec.
img_fn (str, optional) – Where to save segmenting images.

Returns:

p (dict) – Adjusted segmenting parameters.

ava.segmenting.template_segmentation module¶

Segment song motifs by finding maxima in spectrogram cross correlations.

ava.segmenting.template_segmentation.clean_collected_data(result, audio_dirs, segment_dirs, p, max_num_specs=10000, verbose=True, img_fn='temp.pdf', tooltip_plot_dir='html')[source]¶

Deprecated. See clean_collected_segments.

ava.segmenting.template_segmentation.clean_collected_segments(result, audio_dirs, segment_dirs, p, max_num_specs=10000, verbose=True, img_fn='temp.pdf', tooltip_plot_dir='html')[source]¶

Take a look at the collected segments and discard false positives.

Parameters:

result (dict) – Output of segment_files or read_segment_decisions.
audio_dirs (list of str) – Directories containing audio.
segment_dirs (list of str) – Directories containing segmenting decisions.
p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
max_num_specs (int, optional) – Maximum number of spectrograms to feed to UMAP. Defaults to 10000.
verbose (bool, optional) – Defaults to True.
img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
tooltip_plot_dir (str, optional) – Directory to save tooltip plot to. Defaults to 'html'.

ava.segmenting.template_segmentation.get_template(feature_dir, p, smoothing_kernel=(0.5, 0.5), verbose=True)[source]¶

Create a linear feature template given exemplar spectrograms.

Parameters:

feature_dir (str) – Directory containing multiple audio files to average together.
p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
smoothing_kernel (tuple of floats, optional) – Each spectrogram is blurred using a gaussian kernel with the following bandwidths, in bins. Defaults to (0.5, 0.5).
verbose (bool, optional) – Defaults to True.

Returns:

template (np.ndarray) – Spectrogram template.

ava.segmenting.template_segmentation.read_segment_decisions(audio_dirs, segment_dirs, verbose=True)[source]¶

Returns the same data as segment_files.

Parameters:

audio_dirs (list of str) – Audio directories.
segment_dirs (list of str) – Segment directories.
verbose (bool, optional) – Defaults to True.

Returns:

result (dict) – Maps audio filenames to segments.

ava.segmenting.template_segmentation.segment_files(audio_dirs, segment_dirs, template, p, num_mad=2.0, min_dt=0.05, n_jobs=1, verbose=True)[source]¶

Write segments to text files.

Parameters:

audio_dirs (list of str) – Audio directories.
segment_dirs (list of str) – Corresponding directories containing segmenting decisions.
template (numpy.ndarray) – Spectrogram template.
p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
num_mad (float, optional) – Number of median absolute deviations for cross-correlation threshold. Defaults to 2.0.
min_dt (float, optional) – Minimum duration between cross correlation maxima. Defaults to 0.05.
n_jobs (int, optional) – Number of jobs for parallelization. Defaults to 1.
verbose (bool, optional) – Defaults to True.

Returns:

result (dict) – Maps audio filenames to segments (numpy.ndarrays).
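
To make the roles of num_mad and min_dt concrete, here is a small sketch of thresholding a cross-correlation trace by median absolute deviation and then enforcing a minimum spacing between maxima. It is an illustration under assumptions, not ava's implementation: find_matches is a hypothetical name, the threshold is taken to be median + num_mad * MAD, and ties between nearby peaks are resolved in favor of the taller peak.

```python
import statistics


def find_matches(xcorr, dt, num_mad=2.0, min_dt=0.05):
    """Return (peak_indices, threshold) for a cross-correlation trace.

    xcorr is sampled every dt seconds. Hypothetical helper, not part of ava.
    """
    median = statistics.median(xcorr)
    mad = statistics.median(abs(x - median) for x in xcorr)
    threshold = median + num_mad * mad

    # Candidate peaks: local maxima that exceed the threshold.
    candidates = [
        i for i in range(1, len(xcorr) - 1)
        if xcorr[i] > threshold and xcorr[i - 1] < xcorr[i] >= xcorr[i + 1]
    ]

    # Visit candidates from tallest to shortest, dropping any that fall
    # within min_dt seconds of an already accepted peak.
    kept = []
    for i in sorted(candidates, key=lambda i: -xcorr[i]):
        if all(abs(i - j) * dt >= min_dt for j in kept):
            kept.append(i)
    return sorted(kept), threshold


xcorr = [0.1, 0.2, 0.1, 0.3, 0.2, 5.0, 0.2, 4.0, 0.1, 0.3, 0.2, 0.1, 0.2, 6.0, 0.1]
peaks, threshold = find_matches(xcorr, dt=0.02, num_mad=2.0, min_dt=0.05)
```

Here the peak at index 7 is discarded because it lies only 0.04 s from the taller peak at index 5.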

ava.segmenting.template_segmentation.segment_sylls_from_songs(audio_dirs, song_seg_dirs, syll_seg_dirs, p, shoulder=0.05, img_fn='temp.pdf', verbose=True)[source]¶

Split song renditions into syllables, write segments.

Enter quantiles to determine where to split the song motif. Entering the same quantile twice will remove it.

Note

All the song segments must be the same duration!

Parameters:

audio_dirs (list of str) – Audio directories.
song_seg_dirs (list of str) – Directories containing song segments.
syll_seg_dirs (list of str) – Directories where syllable segments are written.
p (dict) – Segmenting parameters.
shoulder (float, optional) – Duration of padding on either side of song segments, in seconds. Defaults to 0.05.
img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
verbose (bool, optional) – Defaults to True.
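
One way to picture the quantile prompts: each quantile is a fractional position within the song segment, and entering a value a second time toggles it off. The sketch below assumes that interpretation; toggle_quantile and split_at_quantiles are hypothetical helpers and do not model the shoulder padding.

```python
def toggle_quantile(quantiles, q):
    """Add q to the set of quantiles, or remove it if already present."""
    return quantiles ^ {q}  # symmetric difference toggles membership


def split_at_quantiles(onset, offset, quantiles):
    """Split [onset, offset] into consecutive pieces at fractional positions.

    Each quantile is assumed to lie strictly between 0 and 1.
    """
    duration = offset - onset
    edges = [onset] + [onset + q * duration for q in sorted(quantiles)] + [offset]
    return list(zip(edges[:-1], edges[1:]))


quantiles = set()
for entered in (0.5, 0.25, 0.75, 0.75):  # 0.75 is entered twice, so it is removed
    quantiles = toggle_quantile(quantiles, entered)

syllables = split_at_quantiles(1.0, 2.0, quantiles)
```

Under this reading, one set of quantiles can be applied to every rendition, which may be why all song segments are required to have the same duration.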

ava.segmenting.template_segmentation.segment_sylls_from_warped_songs(warped_window_dset, audio_dirs, spec_dirs, time_bins=512, num_specs=3, img_fn='temp.pdf', verbose=True)[source]¶

Split time-warped song renditions into time-warped syllables, save specs.

Enter quantiles to determine where to split the song motif. Entering the same quantile twice will remove it.

Parameters:

warped_window_dset (ava.models.window_vae_dataset.WarpedWindowDataset) – Dataset defining a warping.
audio_dirs (list of str) – Audio directories.
spec_dirs (list of str) – Spectrogram directories.
time_bins (int, optional) – Number of spectrogram time bins to plot. Defaults to 512.
num_specs (int, optional) – Number of spectrograms to plot. Defaults to 3.
img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
verbose (bool, optional) – Defaults to True.

ava.segmenting.utils module¶

Useful functions for segmenting.

ava.segmenting.utils.clean_segments_by_hand(audio_dirs, orig_seg_dirs, new_seg_dirs, p, nrows=4, ncols=4, shoulder=0.1, select_to_reject=True, img_filename='temp.pdf')[source]¶

Plot spectrograms and ask for accept/reject input.

The accepted segments are taken from orig_seg_dirs and copied to new_seg_dirs.

Notes

Enter indices of false positive spectrograms (or if select_to_reject is False, true positive spectrograms) separated by spaces.
This will not overwrite existing segmentation files and will raise an AssertionError if any of the files already exist.

Parameters:

audio_dirs (list of str) – Audio directories.
orig_seg_dirs (list of str) – Original segment directories.
new_seg_dirs (list of str) – New segment directories.
p (dict) – Parameters. Should contain the following keys: ‘fs’, ‘nperseg’, ‘noverlap’, ‘min_freq’, ‘max_freq’, ‘spec_min_val’, ‘spec_max_val’
nrows (int, optional) – Number of rows of spectrograms to plot. Defaults to 4.
ncols (int, optional) – Number of columns of spectrograms to plot. Defaults to 4.
shoulder (float, optional) – Duration of audio to plot on either side of segment. Defaults to 0.1.
select_to_reject (bool, optional) – If True, the user is asked to identify false positives. Else, the user is asked to identify true positives. Defaults to True.
img_filename (str, optional) – Where to write images. Defaults to 'temp.pdf'.

ava.segmenting.utils.copy_segments_to_standard_format(orig_seg_dirs, new_seg_dirs, seg_ext, delimiter, usecols, skiprows, max_duration=None)[source]¶

Copy onsets/offsets from SAP, MUPET, or Deepsqueak into a standard format.

Note

delimiter, usecols, and skiprows are all passed to numpy.loadtxt.

Parameters:

orig_seg_dirs (list of str) – Directories containing original segments.
new_seg_dirs (list of str) – Corresponding directories for new segments.
seg_ext (str) – Input filename extension.
delimiter (str) – Input filename delimiter. For a CSV file, for example, this would be a comma: ‘,’
usecols (tuple) – Input file onset and offset columns, zero-indexed.
skiprows (int) – Number of rows to skip. For example, if there is a single-line header set skiprows=1.
max_duration ({None, float}, optional) – Maximum segment duration. If None, no max is set. Defaults to None.
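
Since delimiter, usecols, and skiprows are forwarded to numpy.loadtxt, it can help to try them on a small sample first. The example below reads an in-memory CSV with a one-line header and onset/offset times in the first two columns; the file contents are made up for illustration, and treating max_duration as an inclusive upper bound is an assumption.

```python
import io

import numpy as np

# Made-up segment file: header row, then onset, offset, and a label column.
raw = io.StringIO(
    "onset,offset,label\n"
    "0.10,0.25,a\n"
    "0.40,0.55,b\n"
    "1.00,4.00,c\n"
)

# usecols selects the onset and offset columns (zero-indexed), so the
# non-numeric label column is never parsed. ndmin=2 keeps the result
# two-dimensional even when the file holds a single segment.
segs = np.loadtxt(raw, delimiter=',', usecols=(0, 1), skiprows=1, ndmin=2)

# Drop segments longer than max_duration (assumed inclusive bound).
max_duration = 1.0
durations = segs[:, 1] - segs[:, 0]
segs = segs[durations <= max_duration]
```

For this sample, the third segment lasts 3 seconds and is filtered out, leaving two rows of onset and offset times.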

ava.segmenting.utils.get_audio_seg_filenames(audio_dirs, seg_dirs)[source]¶

Return lists of audio filenames and corresponding segment filenames.

Parameters:

audio_dirs (list of str) – Audio directories
seg_dirs (list of str) – Corresponding segmenting directories

Returns:

audio_fns (list of str) – Audio filenames
seg_fns (list of str) – Corresponding segment filenames

ava.segmenting.utils.get_spec(audio, p)[source]¶

Get a spectrogram.

Much simpler than ava.preprocessing.utils.get_spec.

Raises:

AssertionError if len(audio) < p['nperseg'].

Parameters:

audio (numpy array of floats) – Audio
p (dict) – Spectrogram parameters. Should contain the following keys: ‘fs’, ‘nperseg’, ‘noverlap’, ‘min_freq’, ‘max_freq’, ‘spec_min_val’, ‘spec_max_val’

Returns:

spec (numpy array of floats) – Spectrogram of shape [freq_bins x time_bins]
dt (float) – Time step between time bins.
f (numpy.ndarray) – Array of frequencies.
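
The following NumPy-only sketch shows one plausible way the listed keys could combine to produce the three return values. It is not ava's get_spec: the Hann window, the log-magnitude scaling, and the reading of spec_min_val and spec_max_val as clipping bounds that map the spectrogram into [0, 1] are all assumptions made for illustration, and simple_spec is a hypothetical name.

```python
import numpy as np


def simple_spec(audio, p):
    """Return (spec, dt, f) from a short-time Fourier transform.

    Hypothetical sketch, not ava's implementation.
    """
    audio = np.asarray(audio, dtype=float)
    nperseg, noverlap, fs = p['nperseg'], p['noverlap'], p['fs']
    assert len(audio) >= nperseg, "audio is shorter than one window"

    # Slice the audio into overlapping, windowed frames (one per column).
    hop = nperseg - noverlap
    n_frames = 1 + (len(audio) - nperseg) // hop
    window = np.hanning(nperseg)
    frames = np.stack(
        [audio[i * hop:i * hop + nperseg] * window for i in range(n_frames)],
        axis=1,
    )

    # Magnitude spectrum of each frame, restricted to the requested band.
    spec = np.abs(np.fft.rfft(frames, axis=0))
    f = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    keep = (f >= p['min_freq']) & (f <= p['max_freq'])
    spec, f = spec[keep], f[keep]

    # Log-scale, then normalize and clip to [0, 1] (assumed convention).
    spec = np.log(spec + 1e-12)
    spec = (spec - p['spec_min_val']) / (p['spec_max_val'] - p['spec_min_val'])
    spec = np.clip(spec, 0.0, 1.0)

    dt = hop / fs  # time step between adjacent spectrogram columns
    return spec, dt, f


p = {'fs': 8000, 'nperseg': 256, 'noverlap': 128, 'min_freq': 500,
     'max_freq': 2000, 'spec_min_val': -5.0, 'spec_max_val': 5.0}
tone = np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000)  # 1 s of a 1 kHz tone
spec, dt, f = simple_spec(tone, p)
```

The returned array has shape [freq_bins x time_bins], matching the layout described above, and for this test tone the brightest row sits at 1 kHz.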

ava.segmenting.utils.merge_segments(orig_seg_dirs, new_seg_dirs, merge_threshold, left_shoulder=0.0, right_shoulder=0.0, min_duration=0.0, verbose=True)[source]¶

Merge nearby segments into larger segments.

Parameters:

orig_seg_dirs (list of str) – Directories containing original segments.
new_seg_dirs (list of str) – Corresponding directories for new segments.
merge_threshold (float) – All segments closer than this duration are merged.
left_shoulder (float, optional) – Extra time to add before merged segments. Defaults to 0.0.
right_shoulder (float, optional) – Extra time to add after merged segments. Defaults to 0.0.
min_duration (float, optional) – Minimum duration of a merged segment. Defaults to 0.0.
verbose (bool, optional) – Defaults to True.
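
The merging behavior can be sketched on in-memory onset/offset lists. This is a simplified illustration rather than ava's code: merge_close_segments is a hypothetical helper, and applying min_duration before adding the shoulders, as well as clamping onsets at zero, are assumptions.

```python
def merge_close_segments(onsets, offsets, merge_threshold,
                         left_shoulder=0.0, right_shoulder=0.0,
                         min_duration=0.0):
    """Merge segments separated by gaps shorter than merge_threshold.

    Returns a list of (onset, offset) tuples. Hypothetical helper.
    """
    merged = []
    for on, off in sorted(zip(onsets, offsets)):
        if merged and on - merged[-1][1] < merge_threshold:
            # Gap to the previous segment is small: extend that segment.
            merged[-1][1] = max(merged[-1][1], off)
        else:
            merged.append([on, off])

    result = []
    for on, off in merged:
        # Assumed order: filter on the merged duration, then pad.
        if off - on >= min_duration:
            result.append((max(0.0, on - left_shoulder), off + right_shoulder))
    return result


segments = merge_close_segments(
    onsets=[0.0, 1.0, 5.0],
    offsets=[0.5, 2.0, 5.25],
    merge_threshold=1.0,
    left_shoulder=0.25,
    right_shoulder=0.25,
    min_duration=0.5,
)
```

In this example the first two segments merge into one, while the isolated 0.25 s segment is dropped for being shorter than min_duration.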

ava.segmenting.utils.softmax(arr, t=0.5)[source]¶

Softmax along first array dimension. Not numerically stable.
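
A softmax that is not numerically stable typically overflows in the exponential for large inputs. A common remedy, sketched here with NumPy, is to subtract the maximum along the reduction axis before exponentiating, which leaves the result unchanged mathematically. The name stable_softmax and the reading of t as a temperature that divides the input are assumptions, not a description of ava's function.

```python
import numpy as np


def stable_softmax(arr, t=0.5):
    """Softmax along the first array dimension with max-subtraction.

    Hypothetical alternative; t is assumed to be a temperature.
    """
    scaled = np.asarray(arr, dtype=float) / t
    # Shifting by the column-wise maximum keeps every exponent <= 0,
    # so np.exp cannot overflow.
    scaled = scaled - np.max(scaled, axis=0, keepdims=True)
    exps = np.exp(scaled)
    return exps / np.sum(exps, axis=0, keepdims=True)


# Inputs this large would overflow a naive exp(arr / t).
probs = stable_softmax([[1000.0, 0.0], [1000.0, 0.0]])
```

Each column of the result sums to one, and the large entries no longer produce inf or nan.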

ava.segmenting.utils.write_segments_to_audio(in_audio_dirs, out_audio_dirs, seg_dirs, n_zfill=3, verbose=True)[source]¶

Write each segment as its own audio file.

Parameters:

in_audio_dirs (list of str) – Where to read audio.
out_audio_dirs (list of str) – Where to write audio.
seg_dirs (list of str) – Where to read segments.
n_zfill (int, optional) – For filename formatting. Defaults to 3.
verbose (bool, optional) – Defaults to True.