ava.preprocessing package

AVA subpackage for making spectrograms.

Contains

ava.preprocessing.preprocess: Preprocess syllable spectrograms.
ava.preprocessing.utils: Useful functions for preprocessing.
ava.preprocessing.warping: Simple shift-only and linear time-warping functions.

Submodules

ava.preprocessing.preprocess module

Make and save syllable spectrograms.

ava.preprocessing.preprocess.get_audio_filenames(audio_dir): Return a list of sorted audio files.

ava.preprocessing.preprocess.get_audio_seg_filenames(audio_dir, segment_dir, p): Return lists of sorted filenames.

ava.preprocessing.preprocess.get_syll_specs(onsets, offsets, audio_filename, p)

Return the spectrograms corresponding to onsets and offsets.

Parameters:

onsets (list of floats) – Syllable onsets.
offsets (list of floats) – Syllable offsets.
audio_filename (str) – Audio filename.
p (dict) – A dictionary mapping preprocessing parameters to their values. NOTE: ADD REFERENCE HERE!

Returns:

specs (list of {numpy.ndarray, None}) – Spectrograms.
valid_syllables (list of int) – Indices of specs containing valid syllables.

ava.preprocessing.preprocess.is_audio_file(fn): Return whether the given filename is an audio filename.
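The filename helpers above can be pictured with a minimal sketch. The function names ending in `_sketch` and the `.wav`-only extension check are assumptions made for illustration; the actual checks in AVA may differ.

```python
import os

# Hypothetical set of extensions; the real check in AVA may differ.
AUDIO_EXTENSIONS = (".wav",)


def is_audio_file_sketch(fn):
    """Return True if the filename ends with an assumed audio extension."""
    return fn.lower().endswith(AUDIO_EXTENSIONS)


def get_audio_filenames_sketch(audio_dir):
    """Return the sorted full paths of the audio files in audio_dir."""
    fns = [os.path.join(audio_dir, fn) for fn in sorted(os.listdir(audio_dir))]
    return [fn for fn in fns if is_audio_file_sketch(fn)]
```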

ava.preprocessing.preprocess.process_sylls(audio_dir, segment_dir, save_dir, p, shuffle=True, verbose=True)

Extract syllables from audio_dir and save to save_dir.

Parameters:

audio_dir (str) – Directory containing audio files.
segment_dir (str) – Directory containing segmenting decisions.
save_dir (str) – Directory to save processed syllables in.
p (dict) – Preprocessing parameters. TO DO: add reference.
shuffle (bool, optional) – Shuffle by filename. Defaults to True.
verbose (bool, optional) – Defaults to True.

ava.preprocessing.preprocess.read_onsets_offsets_from_file(txt_filename, p)

Read a text file to collect onsets and offsets.

Note

The text file must have two columns separated by whitespace, with # prepended to header and footer lines.
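As an illustration of this file format, the sketch below writes a small segment file and reads it back with `numpy.loadtxt`. The helper name `read_onsets_offsets_sketch` and the example values are hypothetical, and the sketch ignores the `p` argument that the real function takes.

```python
import os
import tempfile

import numpy as np


def read_onsets_offsets_sketch(txt_filename):
    """Read a two-column, whitespace-separated file; '#' lines are skipped."""
    # ndmin=2 keeps the array two-dimensional even for a single row.
    segs = np.loadtxt(txt_filename, comments="#", ndmin=2)
    if segs.size == 0:
        return np.array([]), np.array([])
    return segs[:, 0], segs[:, 1]


# Example file: header and footer lines start with '#'.
example = "# onset offset\n0.10 0.25\n0.40 0.62\n# end of file\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "segments.txt")
    with open(path, "w") as f:
        f.write(example)
    onsets, offsets = read_onsets_offsets_sketch(path)
```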

ava.preprocessing.preprocess.tune_syll_preprocessing_params(audio_dirs, seg_dirs, p, img_fn='temp.pdf')

Flip through spectrograms and tune preprocessing parameters.

Parameters:

audio_dirs (list of str) – Audio directories.
seg_dirs (list of str) – Segment directories.
p (dict) – Preprocessing parameters: Add a reference!

Returns:

p (dict) – Adjusted preprocessing parameters.

ava.preprocessing.preprocess.tune_window_preprocessing_params(audio_dirs, p, img_fn='temp.pdf')

Flip through spectrograms and tune preprocessing parameters.

Parameters:

audio_dirs (list of str) – Audio directories.
p (dict) – Preprocessing parameters. ADD REFERENCE
img_fn (str, optional) – Where to save images. Defaults to 'temp.pdf'.

Returns:

p (dict) – Adjusted preprocessing parameters.

ava.preprocessing.utils module

Useful functions for preprocessing.

ava.preprocessing.utils.get_spec(t1, t2, audio, p, fs=32000, target_freqs=None, target_times=None, fill_value=-1000000000000.0, max_dur=None, remove_dc_offset=True)

Norm, scale, threshold, stretch, and resize a Short Time Fourier Transform.

Notes

fill_value necessary?
Look at all references and see what can be simplified.
Why is a flag returned?

Parameters:

t1 (float) – Onset time.
t2 (float) – Offset time.
audio (numpy.ndarray) – Raw audio.
p (dict) – Parameters. Must include keys: …
fs (float) – Samplerate.
target_freqs (numpy.ndarray or None, optional) – Interpolated frequencies.
target_times (numpy.ndarray or None, optional) – Interpolated times.
fill_value (float, optional) – Defaults to -1/EPSILON.
max_dur (float, optional) – Maximum duration. Defaults to None.
remove_dc_offset (bool, optional) – Whether to remove any DC offset from the audio. Defaults to True.

Returns:

spec (numpy.ndarray) – Spectrogram.
flag (bool) – True
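To make the pipeline more concrete, here is a rough NumPy-only sketch of the first stages: take a short-time Fourier transform of the segment between t1 and t2, take log magnitudes, then scale and clip to [0, 1]. It is not AVA's implementation. The parameters `nperseg`, `noverlap`, `spec_min` and `spec_max` are hypothetical stand-ins for entries of `p`, the stretching and resizing onto `target_freqs` and `target_times` is omitted, and the meaning given to the returned flag (whether a spectrogram could be computed) is a guess.

```python
import numpy as np


def get_spec_sketch(t1, t2, audio, fs=32000, nperseg=512, noverlap=256,
                    spec_min=2.0, spec_max=6.0, remove_dc_offset=True):
    """Log-magnitude STFT of audio between t1 and t2, scaled to [0, 1].

    nperseg, noverlap, spec_min and spec_max are hypothetical parameters.
    """
    s1, s2 = int(round(t1 * fs)), int(round(t2 * fs))
    segment = np.asarray(audio[max(0, s1):s2], dtype=float)
    if len(segment) < nperseg:
        # Too short for a single frame: report failure through the flag.
        return None, False
    if remove_dc_offset:
        segment = segment - np.mean(segment)
    # Slice the segment into overlapping, Hann-windowed frames.
    hop = nperseg - noverlap
    n_frames = 1 + (len(segment) - nperseg) // hop
    idx = np.arange(nperseg)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = segment[idx] * np.hanning(nperseg)
    # Magnitude spectrum per frame, arranged as [freq_bins, time_bins].
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    spec = np.log(mag + 1e-12)
    # Scale so spec_min maps to 0 and spec_max maps to 1, then threshold.
    spec = (spec - spec_min) / (spec_max - spec_min)
    return np.clip(spec, 0.0, 1.0), True
```

With the default arguments, a 0.2 s segment at 32 kHz yields 257 frequency bins and 24 time bins.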

ava.preprocessing.warping module

Simple shift-only and linear time-warping functions.

This is an alternative to affinewarp time warping.

Warning

ava.preprocessing.warping is experimental and may change in a future version of AVA!

ava.preprocessing.warping.align_specs(specs, shift_λs, slope_λs, verbose=True)

Align the spectrograms, return warping parameters and warped specs.

Minimizes the following regularized L2 loss:

\[\| \textrm{warped_spec} - \textrm{target_spec} \|_2^2 + \textrm{shift_λ} \cdot \textrm{shift}^2 + \textrm{slope_λ} \cdot (\log \textrm{slope})^2\]

where target_spec is the average warped spectrogram, updated after every optimization iteration. It’s a good idea to start with large values of shift_λ and slope_λ that gradually decrease to zero if you want to end up with a maximum likelihood estimate. In particular, I’ve found it’s helpful to do a shift-only warp the first few iterations by setting slope_λ to np.inf.
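The loss for a single spectrogram can be evaluated as in the sketch below. The function name `warp_loss_sketch` is hypothetical, and this only evaluates the objective; it does not perform the optimization. One practical detail: with slope_λ set to np.inf and a slope of exactly 1, a naive product would give inf * 0 = nan, so the sketch treats a zero log-slope as incurring no penalty.

```python
import numpy as np


def warp_loss_sketch(warped_spec, target_spec, shift, slope,
                     shift_lam, slope_lam):
    """Evaluate the regularized L2 loss for one spectrogram."""
    l2 = np.sum((warped_spec - target_spec) ** 2)
    shift_penalty = shift_lam * shift ** 2
    log_slope = np.log(slope)
    # Avoid inf * 0 = nan when slope_lam is np.inf and slope is exactly 1.
    slope_penalty = 0.0 if log_slope == 0 else slope_lam * log_slope ** 2
    return l2 + shift_penalty + slope_penalty
```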

Notes

If the optimization fails, the failure message is printed and (None, None) is returned.
This works better when the spectrograms are summed over the frequency axis like this: np.sum(specs, axis=1, keepdims=True)
Because the objective changes every iteration, it’s not necessarily bad if the loss isn’t monotonically decreasing.
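The suggestions above can be sketched as follows: regularization schedules that start large and decay to zero, a few initial shift-only iterations with slope_λ set to np.inf, and spectrograms summed over the frequency axis. The iteration counts, the starting and ending values, and the idea that each sequence holds one value per iteration are all assumptions made for illustration.

```python
import numpy as np

n_iter = 10        # hypothetical number of optimization iterations
n_shift_only = 3   # hypothetical number of initial shift-only iterations

# Shift penalty decays from a large value to exactly zero.
shift_lams = np.geomspace(1e2, 1e-2, n_iter)
shift_lams[-1] = 0.0

# Slope penalty is infinite at first (shift-only warp), then decays to zero.
slope_lams = np.concatenate([
    np.full(n_shift_only, np.inf),
    np.geomspace(1e2, 1e-2, n_iter - n_shift_only),
])
slope_lams[-1] = 0.0

# Summing over the frequency axis keeps a singleton frequency dimension.
specs = np.random.default_rng(0).random((5, 128, 64))
summed_specs = np.sum(specs, axis=1, keepdims=True)
print(summed_specs.shape)  # (5, 1, 64)
```

If align_specs takes one penalty per iteration, which is an assumption here, the call would then look like align_specs(summed_specs, shift_lams, slope_lams).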

Raises:

UserWarning – Tells you this is experimental and may change in future versions of AVA.

Parameters:

specs (numpy.ndarray) – Spectrograms, shape: [n_specs, freq_bins, time_bins]
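shift_λs (sequence of float) – Regularization weights on the squared shift (shift_λ in the loss above).
slope_λs (sequence of float) – Regularization weights on the squared log slope (slope_λ in the loss above).
verbose (bool, optional) – Defaults to True.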

Returns:

warped_specs (numpy.ndarray) – Warped spectrograms. Same shape as input spectrograms.
warp_params (dictionary) – Maps ‘shifts’ and ‘slopes’ to their inferred values. Values are in units of time bins.

ava.preprocessing.warping.apply_warp(specs, warp_params)

Apply linear time warps to real spectrograms to produce warped spectrograms.

Parameters:

specs (numpy.ndarray) – Spectrograms with shape [n_specs, num_freq_bins, num_time_bins].
warp_params (dict) – Returned by align_specs. Maps keys ‘shifts’ and ‘slopes’ to NumPy arrays with shape [n_specs].

Returns:

warped_specs (numpy.ndarray) – Time-warped spectrograms. Same shape as specs.
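A linear time warp of this kind can be sketched with `numpy.interp`. The convention used here, where output time bin t samples the input at slope * t + shift, is an assumption; the convention in ava.preprocessing.warping may differ (for example, in sign or direction). The name `apply_warp_sketch` is hypothetical.

```python
import numpy as np


def apply_warp_sketch(specs, warp_params):
    """Linearly warp each spectrogram along its time axis.

    Assumes output bin t samples the input at time slope * t + shift; the
    convention used by ava.preprocessing.warping may differ.
    """
    n_specs, n_freq, n_time = specs.shape
    t = np.arange(n_time)
    warped = np.empty(specs.shape, dtype=float)
    for i in range(n_specs):
        query = warp_params["slopes"][i] * t + warp_params["shifts"][i]
        for j in range(n_freq):
            # np.interp holds the edge values for out-of-range queries.
            warped[i, j] = np.interp(query, t, specs[i, j])
    return warped
```

Shifts are expressed in time bins, matching the units that align_specs reports for its inferred values.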