ava.preprocessing package

AVA subpackage for making spectrograms.

Contains

ava.preprocessing.preprocess
Preprocess syllable spectrograms.
ava.preprocessing.utils
Useful functions for preprocessing.
ava.preprocessing.warping
Simple shift-only and linear time-warping functions.

Submodules

ava.preprocessing.preprocess module

Make and save syllable spectrograms.

ava.preprocessing.preprocess.get_audio_filenames(audio_dir)[source]

Return a list of sorted audio files.

ava.preprocessing.preprocess.get_audio_seg_filenames(audio_dir, segment_dir, p)[source]

Return lists of sorted filenames.

ava.preprocessing.preprocess.get_syll_specs(onsets, offsets, audio_filename, p)[source]

Return the spectrograms corresponding to onsets and offsets.

Parameters:
  • onsets (list of floats) – Syllable onsets.
  • offsets (list of floats) – Syllable offsets.
  • audio_filename (str) – Audio filename.
  • p (dict) – A dictionary mapping preprocessing parameters to their values.
Returns:

  • specs (list of {numpy.ndarray, None}) – Spectrograms.
  • valid_syllables (list of int) – Indices of specs containing valid syllables.
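
To illustrate how the two return values relate, here is a minimal sketch that keeps only the valid spectrograms. It assumes that entries of specs not listed in valid_syllables may be None. The stand-in lists below are hypothetical placeholders, not output from AVA.

```python
# Hypothetical stand-ins for the two return values of get_syll_specs.
# Real entries would be numpy arrays; nested lists keep this self-contained.
specs = [[[0.1, 0.2]], None, [[0.3, 0.4]]]
valid_syllables = [0, 2]

# Keep only the spectrograms flagged as valid.
valid_specs = [specs[i] for i in valid_syllables]
```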

ava.preprocessing.preprocess.is_audio_file(fn)[source]

Return whether the given filename is an audio filename.
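
The two filename helpers above (get_audio_filenames and is_audio_file) might behave roughly like the sketch below. The `_sketch` function names and the extension list are assumptions made for illustration. The extensions AVA actually accepts are not stated here.

```python
import os

# Assumption: this extension list is illustrative only.
AUDIO_EXTENSIONS = ('.wav',)

def is_audio_file_sketch(fn):
    """Return True if the filename ends with an assumed audio extension."""
    return fn.lower().endswith(AUDIO_EXTENSIONS)

def get_audio_filenames_sketch(audio_dir):
    """Return a sorted list of full paths to audio files in audio_dir."""
    names = [fn for fn in os.listdir(audio_dir) if is_audio_file_sketch(fn)]
    return sorted(os.path.join(audio_dir, fn) for fn in names)
```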

ava.preprocessing.preprocess.process_sylls(audio_dir, segment_dir, save_dir, p, shuffle=True, verbose=True)[source]

Extract syllables from audio_dir and save to save_dir.

Parameters:
  • audio_dir (str) – Directory containing audio files.
  • segment_dir (str) – Directory containing segmenting decisions.
  • save_dir (str) – Directory to save processed syllables in.
  • p (dict) – Preprocessing parameters.
  • shuffle (bool, optional) – Shuffle by filename. Defaults to True.
  • verbose (bool, optional) – Defaults to True.
ava.preprocessing.preprocess.read_onsets_offsets_from_file(txt_filename, p)[source]

Read a text file to collect onsets and offsets.

Note

  • The text file must have two columns separated by whitespace, with # prepended to header and footer lines.
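
The file format described in the note can be parsed along the lines of the sketch below. It is not AVA's implementation. The function name and the example contents are hypothetical, and the column order (onset first, then offset) is an assumption.

```python
def read_onsets_offsets_sketch(lines):
    """Parse two whitespace-separated columns, skipping '#' lines."""
    onsets, offsets = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/footer lines marked with '#'.
        if not line or line.startswith('#'):
            continue
        onset, offset = line.split()
        onsets.append(float(onset))
        offsets.append(float(offset))
    return onsets, offsets

# Hypothetical file contents, already split into lines.
example = """# onset offset
0.10 0.25
0.40 0.62
# end of file
""".splitlines()

onsets, offsets = read_onsets_offsets_sketch(example)
```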
ava.preprocessing.preprocess.tune_syll_preprocessing_params(audio_dirs, seg_dirs, p, img_fn='temp.pdf')[source]

Flip through spectrograms and tune preprocessing parameters.

Parameters:
  • audio_dirs (list of str) – Audio directories.
  • seg_dirs (list of str) – Segment directories.
  • p (dict) – Preprocessing parameters.
Returns:

p – Adjusted preprocessing parameters.

Return type:

dict

ava.preprocessing.preprocess.tune_window_preprocessing_params(audio_dirs, p, img_fn='temp.pdf')[source]

Flip through spectrograms and tune preprocessing parameters.

Parameters:
  • audio_dirs (list of str) – Audio directories.
  • p (dict) – Preprocessing parameters.
  • img_fn (str, optional) – Where to save images. Defaults to 'temp.pdf'.
Returns:

p – Adjusted preprocessing parameters.

Return type:

dict

ava.preprocessing.utils module

Useful functions for preprocessing.

ava.preprocessing.utils.get_spec(t1, t2, audio, p, fs=32000, target_freqs=None, target_times=None, fill_value=-1000000000000.0, max_dur=None, remove_dc_offset=True)[source]

Norm, scale, threshold, stretch, and resize a Short Time Fourier Transform.

Notes

  • fill_value necessary?
  • Look at all references and see what can be simplified.
  • Why is a flag returned?
Parameters:
  • t1 (float) – Onset time.
  • t2 (float) – Offset time.
  • audio (numpy.ndarray) – Raw audio.
  • p (dict) – Parameters. Must include keys: …
  • fs (float) – Samplerate.
  • target_freqs (numpy.ndarray or None, optional) – Interpolated frequencies.
  • target_times (numpy.ndarray or None, optional) – Interpolated times.
  • fill_value (float, optional) – Defaults to -1/EPSILON (-1e12).
  • max_dur (float, optional) – Maximum duration. Defaults to None.
  • remove_dc_offset (bool, optional) – Whether to remove any DC offset from the audio. Defaults to True.
Returns:

  • spec (numpy.ndarray) – Spectrogram.
  • flag (bool) – True
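
As a rough illustration of how t1, t2, fs, and remove_dc_offset interact, the sketch below cuts a segment out of the raw audio and subtracts its mean. It assumes that times are converted to sample indices by multiplying by fs and that DC offset removal means mean subtraction. The function name is hypothetical, and the Fourier transform and the later norm, scale, threshold, stretch, and resize steps are not shown.

```python
def slice_and_center(audio, t1, t2, fs=32000):
    """Return the samples between t1 and t2 (seconds) with the mean removed."""
    # Convert onset and offset times to sample indices.
    i1, i2 = int(round(t1 * fs)), int(round(t2 * fs))
    segment = audio[i1:i2]
    if not segment:
        return []
    # Removing the DC offset here means subtracting the segment mean.
    mean = sum(segment) / len(segment)
    return [x - mean for x in segment]

# Toy signal sampled at 10 Hz so the indices are easy to follow.
toy_audio = [float(i) for i in range(10)]
centered = slice_and_center(toy_audio, t1=0.2, t2=0.5, fs=10)
```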

ava.preprocessing.warping module

Simple shift-only and linear time-warping functions.

This is an alternative to affinewarp time warping.

Warning

  • ava.preprocessing.warping is experimental and may change in a future version of AVA!
ava.preprocessing.warping.align_specs(specs, shift_λs, slope_λs, verbose=True)[source]

Align the spectrograms, return warping parameters and warped specs.

Minimizes the following regularized L2 loss:

\[\| \textrm{warped_spec} - \textrm{target_spec} \|_2^2 + \textrm{shift_λ} \cdot \textrm{shift}^2 + \textrm{slope_λ} \cdot (\log \textrm{slope})^2\]

where target_spec is the average warped spectrogram, updated after every optimization iteration. It’s a good idea to start with large values of shift_λ and slope_λ that gradually decrease to zero if you want to end up with a maximum likelihood estimate. In particular, I’ve found it’s helpful to do a shift-only warp the first few iterations by setting slope_λ to np.inf.
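
The loss above can be written out directly. The following is a minimal sketch for a single flattened spectrogram, not the optimizer used by align_specs. The function name is hypothetical. Treating an infinite slope_λ as "slope fixed at 1, no slope penalty" is an assumption, as is the idea that shift_λs and slope_λs hold one value per iteration.

```python
import math

def regularized_loss(warped_spec, target_spec, shift, slope,
                     shift_lam, slope_lam):
    """Squared error plus shift and log-slope penalties for one spectrogram."""
    sq_err = sum((w - t) ** 2 for w, t in zip(warped_spec, target_spec))
    shift_penalty = shift_lam * shift ** 2
    if math.isinf(slope_lam):
        # Assumed shift-only step: the slope is pinned to 1, so log(slope) = 0.
        # Evaluating inf * 0 directly would give nan.
        slope_penalty = 0.0
    else:
        slope_penalty = slope_lam * math.log(slope) ** 2
    return sq_err + shift_penalty + slope_penalty

# Illustrative schedules: large penalties first, decreasing to zero, with
# shift-only warping (infinite slope penalty) for the first two iterations.
shift_lams = [10.0, 1.0, 0.1, 0.0]
slope_lams = [math.inf, math.inf, 1.0, 0.0]

loss = regularized_loss([1.0, 2.0], [1.0, 4.0], shift=2.0, slope=1.0,
                        shift_lam=0.5, slope_lam=3.0)
```

Here the squared error is 4, the shift penalty is 0.5 · 2² = 2, and the slope penalty vanishes because log 1 = 0, so the loss is 6.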

Notes

  • If the optimization fails, the failure message is printed and (None, None) is returned.
  • This works better when the spectrograms are summed over the frequency axis like this: np.sum(specs, axis=1, keepdims=True)
  • Because the objective changes every iteration, it’s not necessarily bad if the loss isn’t monotonically decreasing.
Raises:

UserWarning – Tells you this is experimental and may change in future versions of AVA.

Parameters:
  • specs (numpy.ndarray) – Spectrograms, shape: [n_specs, freq_bins, time_bins]
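  • shift_λs (sequence of float) – Values of shift_λ, the regularization weight on the squared shift in the loss above.
  • slope_λs (sequence of float) – Values of slope_λ, the regularization weight on the squared log slope in the loss above.
  • verbose (bool, optional) – Defaults to True.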
Returns:

  • warped_specs (numpy.ndarray) – Warped spectrograms. Same shape as input spectrograms.
  • warp_params (dictionary) – Maps ‘shifts’ and ‘slopes’ to their inferred values. Values are in units of time bins.

ava.preprocessing.warping.apply_warp(specs, warp_params)[source]

Take real spectrograms, apply linear warps, making warped spectrograms.

Parameters:
  • specs (numpy.ndarray) – Spectrograms with shape [n_specs, num_freq_bins, num_time_bins].
  • warp_params (dict) – Returned by align_specs. Maps keys ‘shifts’ and ‘slopes’ to Numpy arrays with shape [n_specs].
Returns:

warped_specs – Time-warped spectrograms. Same shape as specs.

Return type:

numpy.ndarray
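
A linear time warp of a single frequency row can be sketched as follows. This illustrates the general technique, not AVA's implementation. The function name is hypothetical. The sketch assumes that output time bin t reads from input position slope * t + shift (both in units of time bins), with linear interpolation and clamping at the edges. The convention used by apply_warp may differ.

```python
def warp_row(row, shift, slope):
    """Resample one row at positions slope * t + shift with linear interpolation."""
    n = len(row)
    warped = []
    for t in range(n):
        # Source position for output bin t, clamped to the valid range.
        src = min(max(slope * t + shift, 0.0), n - 1.0)
        lo = int(src)            # floor, since src is non-negative
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1.0 - frac) * row[lo] + frac * row[hi])
    return warped

row = [0.0, 1.0, 2.0, 3.0]
shifted = warp_row(row, shift=1.0, slope=1.0)    # read one bin ahead
stretched = warp_row(row, shift=0.0, slope=0.5)  # read at half speed
```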