ava.segmenting package

Submodules

ava.segmenting.amplitude_segmentation module

Amplitude-based syllable segmentation.

ava.segmenting.amplitude_segmentation.get_onsets_offsets(audio, p, return_traces=False)[source]

Segment the spectrogram using thresholds on its amplitude.

A syllable is detected if the amplitude trace exceeds p['th_3']. An offset is then detected if there is a subsequent local minimum in the amplitude trace with amplitude less than p['th_2'], or when the amplitude drops below p['th_1'], whichever comes first. Syllable onset is determined analogously.

Note

p['th_1'] <= p['th_2'] <= p['th_3']

Parameters:
  • audio (numpy.ndarray) – Raw audio samples.
  • p (dict) – Parameters.
  • return_traces (bool, optional) – Whether to return traces. Defaults to False.
Returns:

  • onsets (numpy array) – Onset times, in seconds
  • offsets (numpy array) – Offset times, in seconds
  • traces (list of a single numpy array) – The amplitude trace used in segmenting decisions. Returned if return_traces is True.
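
The three-threshold rule described above can be sketched in a few lines. This is a simplified illustration, not the library's implementation: find_segments is a hypothetical helper that works on a plain list of amplitudes, returns sample indices rather than seconds, and assumes that onsets are found by applying the offset rule backwards in time.

```python
def find_segments(amps, th_1, th_2, th_3):
    """Return (onset_index, offset_index) pairs from an amplitude trace.

    Hypothetical sketch of a three-threshold rule; not the ava code.
    """
    assert th_1 <= th_2 <= th_3
    n = len(amps)

    def is_local_min(i):
        return 0 < i < n - 1 and amps[i - 1] > amps[i] <= amps[i + 1]

    def is_boundary(i):
        # A boundary is a dip below th_1, or a local minimum below th_2.
        return amps[i] < th_1 or (is_local_min(i) and amps[i] < th_2)

    segments = []
    i = 0
    while i < n:
        if amps[i] > th_3:
            # Detection: walk backwards to the onset, forwards to the offset.
            onset = i
            while onset > 0 and not is_boundary(onset):
                onset -= 1
            offset = i
            while offset < n - 1 and not is_boundary(offset):
                offset += 1
            segments.append((onset, offset))
            i = offset + 1
        else:
            i += 1
    return segments


amps = [0.0, 0.05, 0.5, 1.0, 0.6, 0.3, 0.7, 1.1, 0.4, 0.05, 0.0]
segments = find_segments(amps, th_1=0.1, th_2=0.35, th_3=0.9)
print(segments)  # [(1, 5), (5, 9)]
```

In this toy trace the first syllable ends at a local minimum below th_2 (index 5), while the second ends when the amplitude falls below th_1 (index 9).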

ava.segmenting.refine_segments module

Remove noise from segmenting files.

ava.segmenting.refine_segments.refine_segments_post_vae(dc, seg_dirs, audio_dirs, out_seg_dirs, verbose=True, num_imgs=2000, tooltip_output_dir='temp', make_tooltip=True)[source]

Manually remove noise by selecting regions of UMAP latent mean projections.

Doesn’t support datasets that are too large to fit in memory.

Parameters:
  • dc (ava.data.data_container.DataContainer) – DataContainer object
  • seg_dirs (list of str) – Original segment directories.
  • audio_dirs (list of str) – Audio directories.
  • out_seg_dirs (list of str) – Output segment directories.
  • verbose (bool, optional) – Defaults to True.
  • num_imgs (int, optional) – Number of images for tooltip plot. Defaults to 2000.
  • tooltip_output_dir (str, optional) – Where to save tooltip plot. Defaults to 'temp'.
  • make_tooltip (bool, optional) – Defaults to True.
ava.segmenting.refine_segments.refine_segments_pre_vae(seg_dirs, audio_dirs, out_seg_dirs, p, n_samples=10000, num_imgs=1000, verbose=True, img_fn='temp.pdf', tooltip_output_dir='temp')[source]

Manually remove noise by selecting regions of UMAP spectrogram projections.

Parameters:
  • seg_dirs (list of str) – Directories containing segmenting information
  • audio_dirs (list of str) – Directories containing audio files
  • out_seg_dirs (list of str) – Directories to write updated segmenting information to
  • p (dict) – Segmenting parameters: TO DO: ADD REFERENCE!
  • n_samples (int, optional) – Number of spectrograms to feed to UMAP. Defaults to 10000.
  • num_imgs (int, optional) – Number of images to embed in the tooltip plot. Defaults to 1000.
  • verbose (bool, optional) – Defaults to True.
  • img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
  • tooltip_output_dir (str, optional) – Where to save tooltip plot. Defaults to 'temp'.

ava.segmenting.segment module

Segment audio files and write segmenting decisions.

TO DO:
  • tune window size
  • segment could be sped up if it operated file by file.
ava.segmenting.segment.get_audio_seg_filenames(audio_dir, segment_dir, p=None)[source]

Return lists of sorted filenames.

Warning

  • p is unused. This will be removed in a future version!
Parameters:
  • audio_dir (str) – Audio directory.
  • segment_dir (str) – Segments directory.
  • p (dict, optional) – Unused! Defaults to None.
ava.segmenting.segment.segment(audio_dir, seg_dir, p, verbose=True)[source]

Segment audio files in audio_dir and write decisions to seg_dir.

Parameters:
  • audio_dir (str) – Directory containing audio files.
  • seg_dir (str) – Directory containing segmenting decisions.
  • p (dict) – Segmenting parameters. Must map the key 'algorithm' to a segmenting algorithm, for example ava.segmenting.amplitude_segmentation.get_onsets_offsets. Must additionally contain keys requested by the segmenting algorithm.
  • verbose (bool, optional) – Defaults to True.
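
The way p carries both the algorithm and its settings can be illustrated with a stand-in. In the sketch below, threshold_segmenter is a hypothetical function that only mimics the call signature of get_onsets_offsets; the parameter values and the use of 'fs' as a sample rate are assumptions made for illustration.

```python
def threshold_segmenter(audio, p, return_traces=False):
    """Hypothetical stand-in: runs of samples with |amplitude| > p['th_3']."""
    amps = [abs(x) for x in audio]
    onsets, offsets = [], []
    inside = False
    for i, a in enumerate(amps):
        if a > p['th_3'] and not inside:
            onsets.append(i / p['fs'])  # convert sample index to seconds
            inside = True
        elif a <= p['th_3'] and inside:
            offsets.append(i / p['fs'])
            inside = False
    if inside:
        offsets.append(len(amps) / p['fs'])
    if return_traces:
        return onsets, offsets, [amps]
    return onsets, offsets


# The dict holds the callable under 'algorithm' plus the keys it needs.
p = {
    'algorithm': threshold_segmenter,
    'fs': 10,  # illustrative sample rate
    'th_1': 0.1,
    'th_2': 0.2,
    'th_3': 0.5,
}

audio = [0.0, 0.9, 0.8, 0.0, 0.0, 0.7, 0.0]
onsets, offsets = p['algorithm'](audio, p)
print(onsets, offsets)
```

Keeping the callable inside the dict means a different algorithm can be swapped in without changing the calling code.
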
ava.segmenting.segment.tune_segmenting_params(audio_dirs, p, img_fn='temp.pdf')[source]

Tune segmenting parameters by visualizing segmenting decisions.

Chunks of audio will be drawn at random, segmented, and a plot showing the segmenting decisions will be saved as img_fn, by default 'temp.pdf'.

Parameters:
  • audio_dirs (list of str) – Directories containing audio files.
  • p (dict) –
    Segmenting parameters. Must contain the following keys, in addition to the keys required by ava.segmenting.utils.get_spec:
      - 'max_dur': maximum segment duration, in seconds
      - 'algorithm': segmenting algorithm, for example ava.segmenting.amplitude_segmentation.get_onsets_offsets

  • img_fn (str, optional) – Where to save segmenting images. Defaults to 'temp.pdf'.
Returns:

p – Adjusted segmenting parameters.

Return type:

dict

ava.segmenting.template_segmentation module

Segment song motifs by finding maxima in spectrogram cross correlations.

ava.segmenting.template_segmentation.clean_collected_data(result, audio_dirs, segment_dirs, p, max_num_specs=10000, verbose=True, img_fn='temp.pdf', tooltip_plot_dir='html')[source]

Deprecated. See clean_collected_segments.

ava.segmenting.template_segmentation.clean_collected_segments(result, audio_dirs, segment_dirs, p, max_num_specs=10000, verbose=True, img_fn='temp.pdf', tooltip_plot_dir='html')[source]

Take a look at the collected segments and discard false positives.

Parameters:
  • result (dict) – Output of segment_files or read_segment_decisions.
  • audio_dirs (list of str) – Directories containing audio.
  • segment_dirs (list of str) – Directories containing segmenting decisions.
  • p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
  • max_num_specs (int, optional) – Maximum number of spectrograms to feed to UMAP. Defaults to 10000.
  • verbose (bool, optional) – Defaults to True.
  • img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
  • tooltip_plot_dir (str, optional) – Directory to save tooltip plot to. Defaults to 'html'.
ava.segmenting.template_segmentation.get_template(feature_dir, p, smoothing_kernel=(0.5, 0.5), verbose=True)[source]

Create a linear feature template given exemplar spectrograms.

Parameters:
  • feature_dir (str) – Directory containing multiple audio files to average together.
  • p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
  • smoothing_kernel (tuple of floats, optional) – Bandwidths, in bins, of the Gaussian kernel used to blur each spectrogram. Defaults to (0.5, 0.5).
  • verbose (bool, optional) – Defaults to True.
Returns:

template – Spectrogram template.

Return type:

np.ndarray

ava.segmenting.template_segmentation.read_segment_decisions(audio_dirs, segment_dirs, verbose=True)[source]

Returns the same data as segment_files.

Parameters:
  • audio_dirs (list of str) – Audio directories.
  • segment_dirs (list of str) – Segment directories.
  • verbose (bool, optional) – Defaults to True.
Returns:

result – Maps audio filenames to segments.

Return type:

dict

ava.segmenting.template_segmentation.segment_files(audio_dirs, segment_dirs, template, p, num_mad=2.0, min_dt=0.05, n_jobs=1, verbose=True)[source]

Write segments to text files.

Parameters:
  • audio_dirs (list of str) – Audio directories.
  • segment_dirs (list of str) – Corresponding directories containing segmenting decisions.
  • template (numpy.ndarray) – Spectrogram template.
  • p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
  • num_mad (float, optional) – Number of median absolute deviations for cross-correlation threshold. Defaults to 2.0.
  • min_dt (float, optional) – Minimum duration between cross correlation maxima. Defaults to 0.05.
  • n_jobs (int, optional) – Number of jobs for parallelization. Defaults to 1.
  • verbose (bool, optional) – Defaults to True.
Returns:

result – Maps audio filenames to segments (numpy.ndarrays).

Return type:

dict
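
The roles of num_mad and min_dt can be illustrated with a small peak picker. This is a sketch under stated assumptions, not the library's code: find_peaks is a hypothetical helper, the threshold is assumed to be median + num_mad * MAD of the cross-correlation trace, and peaks closer than min_dt are resolved by keeping the taller one.

```python
from statistics import median


def find_peaks(trace, dt, num_mad=2.0, min_dt=0.05):
    """Return indices of local maxima above a MAD-based threshold.

    trace: cross-correlation values, one per time bin of width dt seconds.
    """
    med = median(trace)
    mad = median([abs(x - med) for x in trace])
    threshold = med + num_mad * mad  # assumed form of the threshold

    # Candidate peaks: local maxima that exceed the threshold.
    candidates = [
        i for i in range(1, len(trace) - 1)
        if trace[i] > threshold and trace[i - 1] < trace[i] >= trace[i + 1]
    ]

    # Greedily keep the tallest peaks, dropping any within min_dt of a kept one.
    kept = []
    for i in sorted(candidates, key=lambda k: trace[k], reverse=True):
        if all(abs(i - j) * dt >= min_dt for j in kept):
            kept.append(i)
    return sorted(kept)


trace = [0.0, 0.1, 0.0, 0.1, 3.0, 0.1, 2.5, 0.0, 0.1, 0.0, 0.1, 4.0, 0.0]
peaks = find_peaks(trace, dt=0.02, num_mad=2.0, min_dt=0.05)
print(peaks)  # [4, 11]
```

The peak at index 6 clears the threshold but lies only 0.04 s from the taller peak at index 4, so it is discarded.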

ava.segmenting.template_segmentation.segment_sylls_from_songs(audio_dirs, song_seg_dirs, syll_seg_dirs, p, shoulder=0.05, img_fn='temp.pdf', verbose=True)[source]

Split song renditions into syllables, write segments.

Enter quantiles to determine where to split the song motif. Entering the same quantile twice will remove it.

Note

  • All the song segments must be the same duration!
Parameters:
  • audio_dirs (list of str) – Audio directories.
  • song_seg_dirs (list of str) – Directories containing song segments.
  • syll_seg_dirs (list of str) – Directories where syllable segments are written.
  • p (dict) – Segmenting parameters.
  • shoulder (float, optional) – Duration of padding on either side of song segments, in seconds. Defaults to 0.05.
  • img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
  • verbose (bool, optional) – Defaults to True.
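
The quantile prompt can be pictured with the sketch below. Both helpers are hypothetical and not part of ava; the sketch assumes that each quantile marks a boundary between consecutive syllables, which the description above does not confirm.

```python
def toggle_quantile(quantiles, q):
    """Add q to the collection, or remove it if it is already present."""
    updated = set(quantiles)
    if q in updated:
        updated.remove(q)
    else:
        updated.add(q)
    return sorted(updated)


def split_at_quantiles(onset, offset, quantiles):
    """Split the interval [onset, offset] at the given quantiles (0 to 1)."""
    duration = offset - onset
    edges = [onset + q * duration for q in sorted(quantiles)]
    bounds = [onset] + edges + [offset]
    return list(zip(bounds[:-1], bounds[1:]))


quantiles = []
for entered in (0.5, 0.25, 0.75, 0.75):  # 0.75 is entered twice, so it is removed
    quantiles = toggle_quantile(quantiles, entered)

syllables = split_at_quantiles(0.0, 2.0, quantiles)
print(quantiles)
print(syllables)
```

Because every song segment has the same duration, one set of quantiles can be reused to split all renditions.
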
ava.segmenting.template_segmentation.segment_sylls_from_warped_songs(warped_window_dset, audio_dirs, spec_dirs, time_bins=512, num_specs=3, img_fn='temp.pdf', verbose=True)[source]

Split time-warped song renditions into time-warped syllables, save specs.

Enter quantiles to determine where to split the song motif. Entering the same quantile twice will remove it.

Parameters:
  • warped_window_dset (ava.models.window_vae_dataset.WarpedWindowDataset) – Dataset defining a warping.
  • audio_dirs (list of str) – Audio directories.
  • spec_dirs (list of str) – Spectrogram directories.
  • time_bins (int, optional) – Number of spectrogram time bins to plot. Defaults to 512.
  • num_specs (int, optional) – Number of spectrograms to plot. Defaults to 3.
  • img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
  • verbose (bool, optional) – Defaults to True.

ava.segmenting.utils module

Useful functions for segmenting.

ava.segmenting.utils.clean_segments_by_hand(audio_dirs, orig_seg_dirs, new_seg_dirs, p, nrows=4, ncols=4, shoulder=0.1, select_to_reject=True, img_filename='temp.pdf')[source]

Plot spectrograms and ask for accept/reject input.

The accepted segments are taken from orig_seg_dirs and copied to new_seg_dirs.

Notes

  • Enter indices of false positive spectrograms (or if select_to_reject is False, true positive spectrograms) separated by spaces.
  • This will not overwrite existing segmentation files and will raise an AssertionError if any of the files already exist.
Parameters:
  • audio_dirs (list of str) – Audio directories.
  • orig_seg_dirs (list of str) – Original segment directories.
  • new_seg_dirs (list of str) – New segment directories.
  • p (dict) – Parameters. Should contain the following keys: 'fs', 'nperseg', 'noverlap', 'min_freq', 'max_freq', 'spec_min_val', 'spec_max_val'.
  • nrows (int, optional) – Number of rows of spectrograms to plot. Defaults to 4.
  • ncols (int, optional) – Number of columns of spectrograms to plot. Defaults to 4.
  • shoulder (float, optional) – Duration of audio to plot on either side of segment. Defaults to 0.1.
  • select_to_reject (bool, optional) – If True, the user is asked to identify false positives. Else, the user is asked to identify true positives. Defaults to True.
  • img_filename (str, optional) – Where to write images. Defaults to 'temp.pdf'.
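
The accept/reject prompt can be sketched as follows. apply_selection is a hypothetical helper, not part of ava; it only shows how a space-separated response and the select_to_reject flag could determine which segments are kept.

```python
def apply_selection(segments, response, select_to_reject=True):
    """Return the accepted segments given a space-separated index string."""
    chosen = {int(token) for token in response.split()}
    if select_to_reject:
        # The user listed false positives: keep everything else.
        return [s for i, s in enumerate(segments) if i not in chosen]
    # The user listed true positives: keep only those.
    return [s for i, s in enumerate(segments) if i in chosen]


segments = [(0.1, 0.2), (0.5, 0.6), (0.9, 1.0), (1.4, 1.5)]
kept_by_reject = apply_selection(segments, "1 3", select_to_reject=True)
kept_by_accept = apply_selection(segments, "1 3", select_to_reject=False)
print(kept_by_reject)
print(kept_by_accept)
```
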
ava.segmenting.utils.copy_segments_to_standard_format(orig_seg_dirs, new_seg_dirs, seg_ext, delimiter, usecols, skiprows, max_duration=None)[source]

Copy onsets/offsets from SAP, MUPET, or Deepsqueak into a standard format.

Note

  • delimiter, usecols, and skiprows are all passed to numpy.loadtxt.
Parameters:
  • orig_seg_dirs (list of str) – Directories containing original segments.
  • new_seg_dirs (list of str) – Corresponding directories for new segments.
  • seg_ext (str) – Input filename extension.
  • delimiter (str) – Input file delimiter.
  • usecols (tuple) – Input file onset and offset columns.
  • skiprows (int) – Number of rows to skip.
  • max_duration ({None, float}, optional) – Maximum segment duration. If None, no max is set. Defaults to None.
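
Since delimiter, usecols, and skiprows are forwarded to numpy.loadtxt, it can help to try them on a small sample first. The sketch below reads an in-memory CSV whose layout is made up for illustration; treating max_duration as a filter that drops longer segments is an assumption.

```python
import io

import numpy as np

# Made-up export: a header row, then id, onset, and offset columns.
raw = io.StringIO(
    "id,start,end\n"
    "1,0.10,0.25\n"
    "2,0.40,2.90\n"
    "3,3.00,3.20\n"
)

# Skip the header and read only the onset and offset columns.
segs = np.loadtxt(raw, delimiter=",", usecols=(1, 2), skiprows=1, ndmin=2)

# Assumed behavior: segments longer than max_duration are dropped.
max_duration = 1.0
durations = segs[:, 1] - segs[:, 0]
kept = segs[durations <= max_duration]
print(kept)
```
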
ava.segmenting.utils.get_audio_seg_filenames(audio_dirs, seg_dirs)[source]

Return lists of audio filenames and corresponding segment filenames.

Parameters:
  • audio_dirs (list of str) – Audio directories
  • seg_dirs (list of str) – Corresponding segmenting directories
Returns:

  • audio_fns (list of str) – Audio filenames
  • seg_fns (list of str) – Corresponding segment filenames

ava.segmenting.utils.get_spec(audio, p)[source]

Get a spectrogram.

Much simpler than ava.preprocessing.utils.get_spec.

Raises:
  • AssertionError if len(audio) < p['nperseg'].
Parameters:
  • audio (numpy array of floats) – Audio
  • p (dict) – Spectrogram parameters. Should contain the following keys: 'fs', 'nperseg', 'noverlap', 'min_freq', 'max_freq', 'spec_min_val', 'spec_max_val'.
Returns:

  • spec (numpy array of floats) – Spectrogram of shape [freq_bins x time_bins]
  • dt (float) – Time step between time bins.
  • f (numpy.ndarray) – Array of frequencies.

ava.segmenting.utils.merge_segments(orig_seg_dirs, new_seg_dirs, merge_threshold, left_shoulder=0.0, right_shoulder=0.0, min_duration=0.0, verbose=True)[source]

Merge nearby segments into larger segments.

Parameters:
  • orig_seg_dirs (list of str) – Directories containing original segments.
  • new_seg_dirs (list of str) – Corresponding directories for new segments.
  • merge_threshold (float) – All segments closer than this duration are merged.
  • left_shoulder (float, optional) – Extra time to add before merged segments. Defaults to 0.0.
  • right_shoulder (float, optional) – Extra time to add after merged segments. Defaults to 0.0.
  • min_duration (float, optional) – Minimum duration of a merged segment. Defaults to 0.0.
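
The merging logic can be sketched on in-memory segments. merge_nearby is a hypothetical helper, not the library's implementation, and it assumes that min_duration is checked before the shoulders are added.

```python
def merge_nearby(segments, merge_threshold, left_shoulder=0.0,
                 right_shoulder=0.0, min_duration=0.0):
    """Merge (onset, offset) pairs whose gap is below merge_threshold."""
    merged = []
    for onset, offset in sorted(segments):
        if merged and onset - merged[-1][1] < merge_threshold:
            # Close enough to the previous segment: extend it.
            merged[-1][1] = max(merged[-1][1], offset)
        else:
            merged.append([onset, offset])

    result = []
    for onset, offset in merged:
        if offset - onset >= min_duration:  # assumed: checked before padding
            result.append((max(0.0, onset - left_shoulder),
                           offset + right_shoulder))
    return result


segments = [(0.10, 0.20), (0.22, 0.30), (0.80, 0.81), (1.00, 1.20)]
merged = merge_nearby(segments, merge_threshold=0.05, min_duration=0.05)
print(merged)  # [(0.1, 0.3), (1.0, 1.2)]
```

The first two segments are 0.02 s apart and are merged; the 0.01 s segment is shorter than min_duration and is dropped.
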
ava.segmenting.utils.softmax(arr, t=0.5)[source]

Softmax along first array dimension. Not numerically stable.
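
For reference, a numerically stable variant subtracts the maximum before exponentiating. The sketch below is not part of ava; it works on a flat list and assumes that t acts as a temperature that divides the inputs.

```python
import math


def stable_softmax(values, t=0.5):
    """Softmax of a list of floats with temperature t (assumed meaning of t)."""
    scaled = [v / t for v in values]
    shift = max(scaled)
    # Subtracting the maximum keeps every exponent at or below zero,
    # so math.exp cannot overflow.
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = stable_softmax([1000.0, 1001.0, 1002.0])
print(probs)
```

A naive version would call math.exp(2004.0) here and raise OverflowError, while the shifted form returns valid probabilities.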

ava.segmenting.utils.write_segments_to_audio(in_audio_dirs, out_audio_dirs, seg_dirs, n_zfill=3, verbose=True)[source]

Write each segment as its own audio file.

Parameters:
  • in_audio_dirs (list of str) – Where to read audio.
  • out_audio_dirs (list of str) – Where to write audio.
  • seg_dirs (list of str) – Where to read segments.
  • n_zfill (int, optional) – For filename formatting. Defaults to 3.
  • verbose (bool, optional) – Defaults to True.

Module contents

AVA segmenting module

Contains

ava.segmenting.amplitude_segmentation
Segment based on amplitude thresholds.
ava.segmenting.refine_segments
Get rid of false positive syllables (noise).
ava.segmenting.segment
Segment large batches of audio files.
ava.segmenting.template_segmentation
Segment based on peaks in spectrogram cross correlation.
ava.segmenting.utils
Useful functions for segmenting.