ava.segmenting package¶
Submodules¶
ava.segmenting.amplitude_segmentation module¶
Amplitude-based syllable segmentation.
-
ava.segmenting.amplitude_segmentation.get_onsets_offsets(audio, p, return_traces=False)[source]¶ Segment the spectrogram using thresholds on its amplitude.
A syllable is detected if the amplitude trace exceeds p[‘th_3’]. An offset is then detected if there is a subsequent local minimum in the amplitude trace with amplitude less than p[‘th_2’], or when the amplitude drops below p[‘th_1’], whichever comes first. Syllable onset is determined analogously.
Note
p[‘th_1’] <= p[‘th_2’] <= p[‘th_3’]
Parameters: - audio (numpy.ndarray) – Raw audio samples.
- p (dict) – Parameters.
- return_traces (bool, optional) – Whether to return traces. Defaults to False.
Returns: - onsets (numpy array) – Onset times, in seconds
- offsets (numpy array) – Offset times, in seconds
- traces (list of a single numpy array) – The amplitude trace used in segmenting decisions. Returned if return_traces is True.
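The three-threshold rule described above can be sketched on a precomputed amplitude trace. This is a minimal illustration, not the library's implementation: the function name `sketch_onsets_offsets`, the `dt` argument (seconds per amplitude sample), and the backward walk used for the "analogous" onset rule are all assumptions made for the example.

```python
import numpy as np

def sketch_onsets_offsets(amp, dt, th_1, th_2, th_3):
    """Illustrative three-threshold segmentation of a 1-D amplitude trace."""
    assert th_1 <= th_2 <= th_3
    amp = np.asarray(amp, dtype=float)
    n = len(amp)
    # Interior points that are no larger than both neighbors count as local minima.
    is_min = np.zeros(n, dtype=bool)
    is_min[1:-1] = (amp[1:-1] <= amp[:-2]) & (amp[1:-1] <= amp[2:])

    def is_boundary(i):
        # Either the trace is below th_1, or it is a local minimum below th_2.
        return amp[i] < th_1 or (is_min[i] and amp[i] < th_2)

    onsets, offsets = [], []
    i = 0
    while i < n:
        if amp[i] > th_3:
            # Assumed onset rule: walk backward to the nearest boundary.
            start = i
            while start > 0 and not is_boundary(start):
                start -= 1
            # Offset rule: walk forward to the nearest boundary.
            stop = i
            while stop < n - 1 and not is_boundary(stop):
                stop += 1
            onsets.append(start * dt)
            offsets.append(stop * dt)
            i = stop + 1
        else:
            i += 1
    return np.array(onsets), np.array(offsets)

# Two peaks separated by a local minimum that falls below th_2 but not th_1.
amp_demo = [0.0, 1.0, 0.25, 1.0, 0.0]
on_demo, off_demo = sketch_onsets_offsets(amp_demo, dt=0.01, th_1=0.15, th_2=0.3, th_3=0.8)
```

In this toy trace the dip to 0.25 never drops below th_1, yet it still splits the two peaks because it is a local minimum below th_2.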
ava.segmenting.refine_segments module¶
Remove noise from segmenting files.
-
ava.segmenting.refine_segments.refine_segments_post_vae(dc, seg_dirs, audio_dirs, out_seg_dirs, verbose=True, num_imgs=2000, tooltip_output_dir='temp', make_tooltip=True)[source]¶ Manually remove noise by selecting regions of UMAP latent mean projections.
Doesn’t support datasets that are too large to fit in memory.
Parameters: - dc (ava.data.data_container.DataContainer) – DataContainer object
- seg_dirs (list of str) – Original segment directories.
- audio_dirs (list of str) – Audio directories.
- out_seg_dirs (list of str) – Output segment directories.
- verbose (bool, optional) – Defaults to True.
- num_imgs (int, optional) – Number of images for tooltip plot. Defaults to 2000.
- tooltip_output_dir (str, optional) – Where to save tooltip plot. Defaults to 'temp'.
- make_tooltip (bool, optional) – Defaults to True.
-
ava.segmenting.refine_segments.refine_segments_pre_vae(seg_dirs, audio_dirs, out_seg_dirs, p, n_samples=10000, num_imgs=1000, verbose=True, img_fn='temp.pdf', tooltip_output_dir='temp')[source]¶ Manually remove noise by selecting regions of UMAP spectrogram projections.
Parameters: - seg_dirs (list of str) – Directories containing segmenting information
- audio_dirs (list of str) – Directories containing audio files
- out_seg_dirs (list of str) – Directories to write updated segmenting information to
- p (dict) – Segmenting parameters: TO DO: ADD REFERENCE!
- n_samples (int, optional) – Number of spectrograms to feed to UMAP. Defaults to 10000.
- num_imgs (int, optional) – Number of images to embed in the tooltip plot. Defaults to 1000.
- verbose (bool, optional) – Defaults to True.
- img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
- tooltip_output_dir (str, optional) – Where to save tooltip plot. Defaults to 'temp'.
ava.segmenting.segment module¶
Segment audio files and write segmenting decisions.
- TO DO:
- tune window size
- segment could be sped up if it operated file by file.
-
ava.segmenting.segment.get_audio_seg_filenames(audio_dir, segment_dir, p=None)[source]¶ Return lists of sorted filenames.
Warning
- p is unused. This will be removed in a future version!
Parameters: - audio_dir (str) – Audio directory.
- segment_dir (str) – Segments directory.
- p (dict, optional) – Unused! Defaults to None.
-
ava.segmenting.segment.segment(audio_dir, seg_dir, p, verbose=True)[source]¶ Segment audio files in audio_dir and write decisions to seg_dir.
Parameters: - audio_dir (str) – Directory containing audio files.
- seg_dir (str) – Directory containing segmenting decisions.
- p (dict) – Segmenting parameters. Must map the key ‘algorithm’ to a segmenting algorithm, for example ava.segmenting.amplitude_segmentation.get_onsets_offsets. Must additionally contain keys requested by the segmenting algorithm.
- verbose (bool, optional) – Defaults to True.
-
ava.segmenting.segment.tune_segmenting_params(audio_dirs, p, img_fn='temp.pdf')[source]¶ Tune segmenting parameters by visualizing segmenting decisions.
Chunks of audio will be drawn at random, segmented, and a plot showing the segmenting decisions will be saved as img_fn, by default 'temp.pdf'.
Parameters: - audio_dirs (list of str) – Directories containing audio files.
- p (dict) – Segmenting parameters. Must contain the keys ‘max_dur’ (maximum segment duration, in seconds) and ‘algorithm’ (segmenting algorithm, for example ava.segmenting.amplitude_segmentation.get_onsets_offsets), in addition to the keys required by ava.segmenting.utils.get_spec.
- img_fn (str, optional) – Where to save segmenting images. Defaults to 'temp.pdf'.
Returns: p – Adjusted segmenting parameters.
Return type: dict
ava.segmenting.template_segmentation module¶
Segment song motifs by finding maxima in spectrogram cross correlations.
-
ava.segmenting.template_segmentation.clean_collected_data(result, audio_dirs, segment_dirs, p, max_num_specs=10000, verbose=True, img_fn='temp.pdf', tooltip_plot_dir='html')[source]¶ Deprecated. See clean_collected_segments.
-
ava.segmenting.template_segmentation.clean_collected_segments(result, audio_dirs, segment_dirs, p, max_num_specs=10000, verbose=True, img_fn='temp.pdf', tooltip_plot_dir='html')[source]¶ Take a look at the collected segments and discard false positives.
Parameters: - result (dict) – Output of segment_files or read_segment_decisions.
- audio_dirs (list of str) – Directories containing audio.
- segment_dirs (list of str) – Directories containing segmenting decisions.
- p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
- max_num_specs (int, optional) – Maximum number of spectrograms to feed to UMAP. Defaults to 10000.
- verbose (bool, optional) – Defaults to True.
- img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
- tooltip_plot_dir (str, optional) – Directory to save tooltip plot to. Defaults to 'html'.
-
ava.segmenting.template_segmentation.get_template(feature_dir, p, smoothing_kernel=(0.5, 0.5), verbose=True)[source]¶ Create a linear feature template given exemplar spectrograms.
Parameters: - feature_dir (str) – Directory containing multiple audio files to average together.
- p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
- smoothing_kernel (tuple of floats, optional) – Each spectrogram is blurred using a Gaussian kernel with the following bandwidths, in bins. Defaults to (0.5, 0.5).
- verbose (bool, optional) – Defaults to True.
Returns: template – Spectrogram template.
Return type: np.ndarray
-
ava.segmenting.template_segmentation.read_segment_decisions(audio_dirs, segment_dirs, verbose=True)[source]¶ Returns the same data as segment_files.
Parameters: - audio_dirs (list of str) – Audio directories.
- segment_dirs (list of str) – Segment directories.
- verbose (bool, optional) – Defaults to True.
Returns: result – Maps audio filenames to segments.
Return type: dict
-
ava.segmenting.template_segmentation.segment_files(audio_dirs, segment_dirs, template, p, num_mad=2.0, min_dt=0.05, n_jobs=1, verbose=True)[source]¶ Write segments to text files.
Parameters: - audio_dirs (list of str) – Audio directories.
- segment_dirs (list of str) – Corresponding directories containing segmenting decisions.
- template (numpy.ndarray) – Spectrogram template.
- p (dict) – Parameters. Must contain keys: 'fs', 'min_freq', 'max_freq', 'nperseg', 'noverlap', 'spec_min_val', 'spec_max_val'.
- num_mad (float, optional) – Number of median absolute deviations for cross-correlation threshold. Defaults to 2.0.
- min_dt (float, optional) – Minimum duration between cross-correlation maxima. Defaults to 0.05.
- n_jobs (int, optional) – Number of jobs for parallelization. Defaults to 1.
- verbose (bool, optional) – Defaults to True.
Returns: result – Maps audio filenames to segments (numpy.ndarrays).
Return type: dict
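To make the roles of num_mad and min_dt concrete, here is a minimal sketch of template matching on a single spectrogram. It is an assumption-laden illustration rather than the library's code: the name `sketch_template_peaks`, the `dt` argument (seconds per time bin), the threshold form median + num_mad × MAD, and the greedy peak selection are all choices made for this example.

```python
import numpy as np

def sketch_template_peaks(spec, template, dt, num_mad=2.0, min_dt=0.05):
    """Return times (s) where the template matches the spectrogram well."""
    n_t = template.shape[1]
    n_steps = spec.shape[1] - n_t + 1
    # Cross-correlation score of the template at each time offset.
    scores = np.array(
        [np.sum(spec[:, i:i + n_t] * template) for i in range(n_steps)]
    )
    # Threshold expressed in median absolute deviations above the median.
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    thresh = med + num_mad * mad
    # Greedily keep the highest scores first, at least min_dt apart.
    min_gap = min_dt / dt
    kept = []
    for i in np.argsort(scores)[::-1]:
        if scores[i] <= thresh:
            break
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return np.sort(np.array(kept, dtype=int)) * dt

# Toy data: an all-ones template embedded twice in an otherwise empty spectrogram.
template_demo = np.ones((4, 5))
spec_demo = np.zeros((4, 100))
spec_demo[:, 20:25] = 1.0
spec_demo[:, 60:65] = 1.0
peak_times = sketch_template_peaks(spec_demo, template_demo, dt=0.01)
```

Raising num_mad makes the detector more conservative, while min_dt prevents one motif from being reported several times at neighboring offsets.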
-
ava.segmenting.template_segmentation.segment_sylls_from_songs(audio_dirs, song_seg_dirs, syll_seg_dirs, p, shoulder=0.05, img_fn='temp.pdf', verbose=True)[source]¶ Split song renditions into syllables, write segments.
Enter quantiles to determine where to split the song motif. Entering the same quantile twice will remove it.
Note
- All the song segments must be the same duration!
Parameters: - audio_dirs (list of str) – Audio directories.
- song_seg_dirs (list of str) – Directories containing song segments.
- syll_seg_dirs (list of str) – Directories where syllable segments are written.
- p (dict) – Segmenting parameters.
- shoulder (float, optional) – Duration of padding on either side of song segments, in seconds. Defaults to 0.05.
- img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
- verbose (bool, optional) – Defaults to True.
-
ava.segmenting.template_segmentation.segment_sylls_from_warped_songs(warped_window_dset, audio_dirs, spec_dirs, time_bins=512, num_specs=3, img_fn='temp.pdf', verbose=True)[source]¶ Split time-warped song renditions into time-warped syllables, save specs.
Enter quantiles to determine where to split the song motif. Entering the same quantile twice will remove it.
Parameters: - warped_window_dset (ava.models.window_vae_dataset.WarpedWindowDataset) – Dataset defining a warping.
- audio_dirs (list of str) – Audio directories.
- spec_dirs (list of str) – Spectrogram directories.
- time_bins (int, optional) – Number of spectrogram time bins to plot. Defaults to 512.
- num_specs (int, optional) – Number of spectrograms to plot. Defaults to 3.
- img_fn (str, optional) – Image filename. Defaults to 'temp.pdf'.
- verbose (bool, optional) – Defaults to True.
ava.segmenting.utils module¶
Useful functions for segmenting.
-
ava.segmenting.utils.clean_segments_by_hand(audio_dirs, orig_seg_dirs, new_seg_dirs, p, nrows=4, ncols=4, shoulder=0.1, select_to_reject=True, img_filename='temp.pdf')[source]¶ Plot spectrograms and ask for accept/reject input.
The accepted segments are taken from orig_seg_dirs and copied to new_seg_dirs.
Notes
- Enter indices of false positive spectrograms (or if select_to_reject is False, true positive spectrograms) separated by spaces.
- This will not overwrite existing segmentation files and will raise an AssertionError if any of the files already exist.
Parameters: - audio_dirs (list of str) – Audio directories.
- orig_seg_dirs (list of str) – Original segment directories.
- new_seg_dirs (list of str) – New segment directories.
- p (dict) – Parameters. Should contain the following keys: ‘fs’, ‘nperseg’, ‘noverlap’, ‘min_freq’, ‘max_freq’, ‘spec_min_val’, ‘spec_max_val’
- nrows (int, optional) – Number of rows of spectrograms to plot. Defaults to 4.
- ncols (int, optional) – Number of columns of spectrograms to plot. Defaults to 4.
- shoulder (float, optional) – Duration of audio to plot on either side of segment. Defaults to 0.1.
- select_to_reject (bool, optional) – If True, the user is asked to identify false positives. Else, the user is asked to identify true positives. Defaults to True.
- img_filename (str, optional) – Where to write images. Defaults to 'temp.pdf'.
-
ava.segmenting.utils.copy_segments_to_standard_format(orig_seg_dirs, new_seg_dirs, seg_ext, delimiter, usecols, skiprows, max_duration=None)[source]¶ Copy onsets/offsets from SAP, MUPET, or Deepsqueak into a standard format.
Note
- delimiter, usecols, and skiprows are all passed to numpy.loadtxt.
Parameters: - orig_seg_dirs (list of str) – Directories containing original segments.
- new_seg_dirs (list of str) – Corresponding directories for new segments.
- seg_ext (str) – Input filename extension.
- delimiter (str) – Input file delimiter.
- usecols (tuple) – Input file onset and offset columns.
- skiprows (int) – Number of rows to skip.
- max_duration ({None, float}, optional) – Maximum segment duration. If None, no max is set. Defaults to None.
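Since delimiter, usecols, and skiprows are forwarded to numpy.loadtxt, it can help to try them on a small sample before converting a whole directory. The sketch below uses a made-up comma-separated file held in memory; the column layout and the way max_duration is applied as a filter are assumptions for illustration only.

```python
import io
import numpy as np

# Hypothetical export: one header row, onset and offset in the first two columns.
sample_text = "onset,offset,label\n0.10,0.25,a\n0.40,0.55,b\n1.00,3.50,c\n"

# The same three arguments that the function passes through to numpy.loadtxt.
segs_demo = np.loadtxt(
    io.StringIO(sample_text),
    delimiter=",",
    usecols=(0, 1),   # onset and offset columns
    skiprows=1,       # skip the header row
    ndmin=2,          # keep a 2-D array even for a single segment
)

# Assumed handling of max_duration: drop segments longer than the limit.
max_duration_demo = 1.0
durations = segs_demo[:, 1] - segs_demo[:, 0]
segs_demo = segs_demo[durations <= max_duration_demo]
```

Only the columns named in usecols are parsed, so the trailing text column does not need a converter.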
-
ava.segmenting.utils.get_audio_seg_filenames(audio_dirs, seg_dirs)[source]¶ Return lists of audio filenames and corresponding segment filenames.
Parameters: - audio_dirs (list of str) – Audio directories
- seg_dirs (list of str) – Corresponding segmenting directories
Returns: - audio_fns (list of str) – Audio filenames
- seg_fns (list of str) – Corresponding segment filenames
-
ava.segmenting.utils.get_spec(audio, p)[source]¶ Get a spectrogram.
Much simpler than ava.preprocessing.utils.get_spec.
Raises: AssertionError if len(audio) < p['nperseg'].
Parameters: - audio (numpy array of floats) – Audio
- p (dict) – Spectrogram parameters. Should contain the following keys: ‘fs’, ‘nperseg’, ‘noverlap’, ‘min_freq’, ‘max_freq’, ‘spec_min_val’, ‘spec_max_val’
Returns: - spec (numpy array of floats) – Spectrogram of shape [freq_bins x time_bins]
- dt (float) – Time step between time bins.
- f (numpy.ndarray) – Array of frequencies.
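As a rough guide to how the keys of p might interact, here is a NumPy-only sketch of a spectrogram function with the same inputs and outputs. It is not the library's implementation: the Hann window, the natural-log magnitude, and the clipping of normalized values to [0, 1] are assumptions, and `sketch_spec` is a hypothetical name.

```python
import numpy as np

def sketch_spec(audio, p):
    """Illustrative log-magnitude spectrogram, cropped in frequency and normalized."""
    assert len(audio) >= p['nperseg']
    hop = p['nperseg'] - p['noverlap']
    n_frames = 1 + (len(audio) - p['nperseg']) // hop
    window = np.hanning(p['nperseg'])
    # Stack windowed frames as columns: shape [nperseg x n_frames].
    frames = np.stack(
        [audio[i * hop:i * hop + p['nperseg']] * window for i in range(n_frames)],
        axis=1,
    )
    mag = np.abs(np.fft.rfft(frames, axis=0))
    f = np.fft.rfftfreq(p['nperseg'], d=1.0 / p['fs'])
    # Keep only the requested frequency band.
    keep = (f >= p['min_freq']) & (f <= p['max_freq'])
    spec = np.log(mag[keep] + 1e-12)
    # Map [spec_min_val, spec_max_val] onto [0, 1] and clip (assumed convention).
    spec = (spec - p['spec_min_val']) / (p['spec_max_val'] - p['spec_min_val'])
    spec = np.clip(spec, 0.0, 1.0)
    dt = hop / p['fs']
    return spec, dt, f[keep]

# One second of a 1 kHz tone sampled at 8 kHz.
p_demo = {'fs': 8000, 'nperseg': 256, 'noverlap': 128, 'min_freq': 500,
          'max_freq': 2000, 'spec_min_val': -5.0, 'spec_max_val': 5.0}
t_demo = np.arange(8000) / 8000
spec_out, dt_out, f_out = sketch_spec(np.sin(2 * np.pi * 1000 * t_demo), p_demo)
```

The returned array has shape [freq_bins x time_bins], matching the layout documented above.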
-
ava.segmenting.utils.merge_segments(orig_seg_dirs, new_seg_dirs, merge_threshold, left_shoulder=0.0, right_shoulder=0.0, min_duration=0.0, verbose=True)[source]¶ Merge nearby segments into larger segments.
Parameters: - orig_seg_dirs (list of str) – Directories containing original segments.
- new_seg_dirs (list of str) – Corresponding directories for new segments.
- merge_threshold (float) – All segments closer than this duration are merged.
- left_shoulder (float, optional) – Extra time to add before merged segments. Defaults to 0.0.
- right_shoulder (float, optional) – Extra time to add after merged segments. Defaults to 0.0.
- min_duration (float, optional) – Minimum duration of a merged segment. Defaults to 0.0.
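The merging rule can be sketched on in-memory onset and offset lists. This is an illustration under stated assumptions, not the library's code: `sketch_merge` is a hypothetical helper, min_duration is assumed to be checked before the shoulders are added, and onsets are assumed to be clamped at zero.

```python
def sketch_merge(onsets, offsets, merge_threshold,
                 left_shoulder=0.0, right_shoulder=0.0, min_duration=0.0):
    """Merge segments whose gaps are shorter than merge_threshold."""
    merged = []
    for on, off in sorted(zip(onsets, offsets)):
        if merged and on - merged[-1][1] < merge_threshold:
            # Gap to the previous segment is small: extend that segment.
            merged[-1][1] = max(merged[-1][1], off)
        else:
            merged.append([on, off])
    # Drop short merged segments, then pad the survivors on both sides.
    return [
        (max(0.0, on - left_shoulder), off + right_shoulder)
        for on, off in merged
        if off - on >= min_duration
    ]

merged_demo = sketch_merge(
    onsets=[0.1, 0.3, 1.0],
    offsets=[0.2, 0.4, 1.05],
    merge_threshold=0.15,
    left_shoulder=0.05,
    right_shoulder=0.05,
    min_duration=0.1,
)
```

Here the first two segments are 0.1 s apart and merge into one, while the third is too short to survive the min_duration check.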
-
ava.segmenting.utils.softmax(arr, t=0.5)[source]¶ Softmax along first array dimension. Not numerically stable.
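Where large inputs make the instability a concern, a common remedy is to subtract the maximum before exponentiating. The sketch below assumes that t acts as a temperature dividing the input, which the documentation does not state; `stable_softmax` is a hypothetical name, not part of the package.

```python
import numpy as np

def stable_softmax(arr, t=0.5):
    """Softmax along the first axis, shifted by the max to avoid overflow."""
    z = np.asarray(arr, dtype=float) / t  # assumed role of t: temperature
    z = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=0, keepdims=True)

# Values this large would overflow np.exp without the max shift.
softmax_demo = stable_softmax(np.array([[1.0, 1000.0], [1.0, 1000.0]]))
```

Subtracting the per-column maximum leaves the result unchanged mathematically, because the common factor cancels between numerator and denominator.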
-
ava.segmenting.utils.write_segments_to_audio(in_audio_dirs, out_audio_dirs, seg_dirs, n_zfill=3, verbose=True)[source]¶ Write each segment as its own audio file.
Parameters: - in_audio_dirs (list of str) – Where to read audio.
- out_audio_dirs (list of str) – Where to write audio.
- seg_dirs (list of str) – Where to read segments.
- n_zfill (int, optional) – For filename formatting. Defaults to 3.
- verbose (bool, optional) – Defaults to True.
Module contents¶
AVA segmenting module
Contains¶
- ava.segmenting.amplitude_segmentation
- Segment based on amplitude thresholds.
- ava.segmenting.refine_segments
- Get rid of false positive syllables (noise).
- ava.segmenting.segment
- Segment large batches of audio files.
- ava.segmenting.template_segmentation
- Segment based on peaks in spectrogram cross correlation.
- ava.segmenting.utils
- Useful functions for segmenting.