ava.preprocessing package
AVA subpackage for making spectrograms.
Contains
- ava.preprocessing.preprocess
- Preprocess syllable spectrograms.
- ava.preprocessing.utils
- Useful functions for preprocessing.
- ava.preprocessing.warping
- Simple shift-only and linear time-warping functions.
Submodules
ava.preprocessing.preprocess module
Make and save syllable spectrograms.
-
ava.preprocessing.preprocess.get_audio_filenames(audio_dir)
Return a list of sorted audio files.
-
ava.preprocessing.preprocess.get_audio_seg_filenames(audio_dir, segment_dir, p)
Return sorted lists of audio filenames and their corresponding segment filenames.
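To make the pairing concrete, here is a minimal sketch of how sorted audio filenames could be matched with segment filenames. It assumes, without confirmation from this page, that audio files end in .wav and that each segment file shares its audio file's stem with a .txt extension. The helper names are hypothetical, and the p argument of the real function is omitted.

```python
import os
import tempfile


def is_audio_file_sketch(fn):
    # Assumption: only .wav files count as audio in this sketch.
    return fn.lower().endswith(".wav")


def get_audio_seg_filenames_sketch(audio_dir, segment_dir):
    """Pair each sorted audio file with a same-stem .txt file in segment_dir."""
    audio_fns = sorted(fn for fn in os.listdir(audio_dir) if is_audio_file_sketch(fn))
    seg_fns = [os.path.splitext(fn)[0] + ".txt" for fn in audio_fns]
    audio_paths = [os.path.join(audio_dir, fn) for fn in audio_fns]
    seg_paths = [os.path.join(segment_dir, fn) for fn in seg_fns]
    return audio_paths, seg_paths


# Usage with throwaway directories; notes.txt is not audio, so it is skipped.
with tempfile.TemporaryDirectory() as audio_dir, tempfile.TemporaryDirectory() as seg_dir:
    for name in ["b.wav", "a.wav", "notes.txt"]:
        open(os.path.join(audio_dir, name), "w").close()
    audio_paths, seg_paths = get_audio_seg_filenames_sketch(audio_dir, seg_dir)
    audio_names = [os.path.basename(path) for path in audio_paths]
    seg_names = [os.path.basename(path) for path in seg_paths]
```

Sorting both lists by the same audio filenames keeps index i of one list aligned with index i of the other.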
-
ava.preprocessing.preprocess.get_syll_specs(onsets, offsets, audio_filename, p)
Return the spectrograms corresponding to onsets and offsets.
Parameters: - onsets (list of floats) – Syllable onsets.
- offsets (list of floats) – Syllable offsets.
- audio_filename (str) – Audio filename.
- p (dict) – A dictionary mapping preprocessing parameters to their values. TODO: add reference.
Returns: - specs (list of {numpy.ndarray, None}) – Spectrograms.
- valid_syllables (list of int) – Indices of specs containing valid syllables.
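The return convention (one entry per syllable, None where no spectrogram was made, plus a list of valid indices) can be sketched as below. This is not AVA's implementation: the function name, the in-memory audio argument, the spec_fn callback, and the min_samples validity rule are hypothetical stand-ins, whereas the real function reads audio_filename and takes its settings from p.

```python
import numpy as np


def get_syll_specs_sketch(onsets, offsets, audio, fs, spec_fn, min_samples=256):
    """Return (specs, valid_syllables) following the convention documented above.

    Hypothetical validity rule: a syllable is valid if it lies inside the
    recording and spans at least `min_samples` samples.
    """
    specs, valid_syllables = [], []
    for i, (t1, t2) in enumerate(zip(onsets, offsets)):
        s1, s2 = int(round(t1 * fs)), int(round(t2 * fs))
        if s1 < 0 or s2 > len(audio) or s2 - s1 < min_samples:
            specs.append(None)  # placeholder keeps specs aligned with onsets
            continue
        specs.append(spec_fn(audio[s1:s2]))
        valid_syllables.append(i)
    return specs, valid_syllables


# Usage: one valid syllable, one too short, one running past the end of the audio.
fs = 1000
audio = np.zeros(fs)
specs, valid = get_syll_specs_sketch(
    [0.0, 0.5, 0.9], [0.3, 0.6, 1.2], audio, fs,
    spec_fn=lambda x: np.abs(x)[np.newaxis, :],  # stand-in "spectrogram"
)
```

Keeping None placeholders means specs[i] always refers to the i-th onset/offset pair, and valid_syllables lists which of those entries are usable.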
-
ava.preprocessing.preprocess.is_audio_file(fn)
Return whether the given filename is an audio filename.
-
ava.preprocessing.preprocess.process_sylls(audio_dir, segment_dir, save_dir, p, shuffle=True, verbose=True)
Extract syllables from audio_dir and save them to save_dir.
Parameters: - audio_dir (str) – Directory containing audio files.
- segment_dir (str) – Directory containing segmenting decisions.
- save_dir (str) – Directory to save processed syllables in.
- p (dict) – Preprocessing parameters. TODO: add reference.
- shuffle (bool, optional) – Shuffle by filename. Defaults to True.
- verbose (bool, optional) – Defaults to True.
-
ava.preprocessing.preprocess.read_onsets_offsets_from_file(txt_filename, p)
Read a text file to collect onsets and offsets.
Note
- The text file must have two columns separated by whitespace, and header and footer lines must be prepended with #.
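A file in this format can be read with NumPy's loadtxt, which splits on whitespace by default and skips text after the # comment marker. The helper below is a hypothetical sketch; it ignores the p argument that the real function accepts.

```python
import io

import numpy as np


def read_onsets_offsets_sketch(txt_file):
    """Return (onsets, offsets) from a two-column file with #-prefixed comment lines."""
    # ndmin=2 keeps a single-row file two-dimensional so column indexing works.
    data = np.loadtxt(txt_file, comments="#", ndmin=2)
    if data.size == 0:
        return np.array([]), np.array([])
    return data[:, 0], data[:, 1]


# Usage with an in-memory file: header, two syllables, footer.
text = "# onset offset\n0.10 0.25\n0.40\t0.62\n# end\n"
onsets, offsets = read_onsets_offsets_sketch(io.StringIO(text))
```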
-
ava.preprocessing.preprocess.tune_syll_preprocessing_params(audio_dirs, seg_dirs, p, img_fn='temp.pdf')
Flip through spectrograms and tune preprocessing parameters.
Parameters: - audio_dirs (list of str) – Audio directories.
- seg_dirs (list of str) – Segment directories.
- p (dict) – Preprocessing parameters. TODO: add reference.
- img_fn (str, optional) – Where to save images. Defaults to 'temp.pdf'.
Returns: p – Adjusted preprocessing parameters.
Return type: dict
-
ava.preprocessing.preprocess.tune_window_preprocessing_params(audio_dirs, p, img_fn='temp.pdf')
Flip through spectrograms and tune preprocessing parameters.
Parameters: - audio_dirs (list of str) – Audio directories.
- p (dict) – Preprocessing parameters. TODO: add reference.
- img_fn (str, optional) – Where to save images. Defaults to 'temp.pdf'.
Returns: p – Adjusted preprocessing parameters.
Return type: dict
ava.preprocessing.utils module
Useful functions for preprocessing.
-
ava.preprocessing.utils.get_spec(t1, t2, audio, p, fs=32000, target_freqs=None, target_times=None, fill_value=-1000000000000.0, max_dur=None, remove_dc_offset=True)
Norm, scale, threshold, stretch, and resize a Short Time Fourier Transform.
Notes
- Is fill_value necessary?
- Look at all references and see what can be simplified.
- Why is a flag returned?
Parameters: - t1 (float) – Onset time.
- t2 (float) – Offset time.
- audio (numpy.ndarray) – Raw audio.
- p (dict) – Parameters. Must include keys: …
- fs (float, optional) – Sample rate. Defaults to 32000.
- target_freqs (numpy.ndarray or None, optional) – Interpolated frequencies.
- target_times (numpy.ndarray or None, optional) – Interpolated times.
- fill_value (float, optional) – Defaults to -1/EPSILON.
- max_dur (float, optional) – Maximum duration. Defaults to None.
- remove_dc_offset (bool, optional) – Whether to remove any DC offset from the audio. Defaults to True.
Returns: - spec (numpy.ndarray) – Spectrogram.
- flag (bool) – True
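The steps in the summary line (STFT, log scaling, interpolation onto target frequencies and times, normalization, and thresholding) can be approximated with NumPy alone, as sketched below. This is an illustrative sketch, not AVA's implementation: the keys read from p (nperseg, noverlap, spec_min_val, spec_max_val, num_freq_bins, num_time_bins, min_freq, max_freq) are assumptions, np.interp clamps to edge values instead of using fill_value, and the max_dur check and the returned flag are left out.

```python
import numpy as np


def get_spec_sketch(t1, t2, audio, p, fs=32000, target_freqs=None,
                    target_times=None, remove_dc_offset=True):
    """Log-magnitude STFT, resampled onto a target grid and scaled to [0, 1]."""
    s1 = max(0, int(round(t1 * fs)))
    s2 = min(len(audio), int(round(t2 * fs)))
    x = np.asarray(audio[s1:s2], dtype=float)
    nperseg = p["nperseg"]
    hop = nperseg - p["noverlap"]
    n_f = p["num_freq_bins"] if target_freqs is None else len(target_freqs)
    n_t = p["num_time_bins"] if target_times is None else len(target_times)
    if len(x) < nperseg:
        # Too short for a single STFT frame: return an all-zero image.
        return np.zeros((n_f, n_t))
    if remove_dc_offset:
        x = x - np.mean(x)
    n_frames = 1 + (len(x) - nperseg) // hop
    window = np.hanning(nperseg)
    frames = np.stack([x[i * hop:i * hop + nperseg] * window
                       for i in range(n_frames)])
    # Log magnitude, transposed to [freq_bins, time_bins].
    spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12).T
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    times = (s1 + np.arange(n_frames) * hop + nperseg / 2) / fs  # frame centers (s)
    if target_freqs is None:
        target_freqs = np.linspace(p["min_freq"], p["max_freq"], n_f)
    if target_times is None:
        target_times = np.linspace(times[0], times[-1], n_t)
    # Separable linear interpolation: along frequency, then along time.
    spec = np.stack([np.interp(target_freqs, freqs, col) for col in spec.T], axis=1)
    spec = np.stack([np.interp(target_times, times, row) for row in spec])
    # Normalize with fixed bounds, then threshold by clipping to [0, 1].
    spec = (spec - p["spec_min_val"]) / (p["spec_max_val"] - p["spec_min_val"])
    return np.clip(spec, 0.0, 1.0)
```

Because the output grid is fixed by target_freqs and target_times (or the num_*_bins defaults), syllables of different durations all come out with the same image shape.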
ava.preprocessing.warping module
Simple shift-only and linear time-warping functions.
This is an alternative to affinewarp time warping.
Warning
- ava.preprocessing.warping is experimental and may change in a future version of AVA!
-
ava.preprocessing.warping.align_specs(specs, shift_λs, slope_λs, verbose=True)
Align the spectrograms, return warping parameters and warped specs.
Minimizes the following regularized L2 loss:
\[\| \textrm{warped\_spec} - \textrm{target\_spec} \|_2^2 + \textrm{shift\_}\lambda \cdot \textrm{shift}^2 + \textrm{slope\_}\lambda \cdot (\log \textrm{slope})^2\]
where target_spec is the average warped spectrogram, updated after every optimization iteration. It’s a good idea to start with large values of shift_λ and slope_λ that gradually decrease to zero if you want to end up with a maximum likelihood estimate. In particular, I’ve found it’s helpful to do a shift-only warp for the first few iterations by setting slope_λ to np.inf.
Notes
- If the optimization fails, the failure message is printed and (None, None) is returned.
- This works better when the spectrograms are summed over the frequency axis, like this: np.sum(specs, axis=1, keepdims=True)
- Because the objective changes every iteration, it’s not necessarily bad if the loss isn’t monotonically decreasing.
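To make the objective concrete, the sketch below evaluates the regularized loss for a single spectrogram and applies the frequency-summing tip from the notes. The function name is hypothetical, and this is not AVA's optimizer. It treats an infinite slope_λ as pinning the slope at 1 (a shift-only warp), which avoids the nan that np.inf * log(1)**2 would otherwise produce.

```python
import numpy as np


def warp_loss_sketch(warped_spec, target_spec, shift, slope,
                     shift_lambda, slope_lambda):
    """Regularized L2 warping loss for one spectrogram (sketch)."""
    # Squared L2 distance between the warped spectrogram and the target.
    data_term = np.sum((warped_spec - target_spec) ** 2)
    shift_term = shift_lambda * shift ** 2
    # An infinite slope penalty forces slope == 1; handle it explicitly
    # because inf * log(1)**2 evaluates to nan.
    if np.isinf(slope_lambda):
        if slope != 1.0:
            return np.inf
        slope_term = 0.0
    else:
        slope_term = slope_lambda * np.log(slope) ** 2
    return data_term + shift_term + slope_term


specs = np.random.default_rng(0).random((5, 8, 20))  # [n_specs, freq_bins, time_bins]
summed = np.sum(specs, axis=1, keepdims=True)  # collapse frequency: [5, 1, 20]
# Before any warping, the target is just the mean of the unwarped spectrograms.
target_spec = summed.mean(axis=0)
loss = warp_loss_sketch(summed[0], target_spec, shift=2.0, slope=1.0,
                        shift_lambda=0.5, slope_lambda=np.inf)
```

Penalizing log(slope) rather than slope itself makes stretching by a factor k and compressing by 1/k cost the same.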
Raises: UserWarning – Tells you this is experimental and may change in future versions of AVA.
Parameters: - specs (numpy.ndarray) – Spectrograms, shape: [n_specs, freq_bins, time_bins]
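- shift_λs (sequence of float) – Values of shift_λ, one per optimization iteration.
- slope_λs (sequence of float) – Values of slope_λ, one per optimization iteration. Use np.inf for a shift-only warp.
- verbose (bool, optional) – Defaults to True.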
Returns: - warped_specs (numpy.ndarray) – Warped spectrograms. Same shape as input spectrograms.
- warp_params (dict) – Maps ‘shifts’ and ‘slopes’ to their inferred values. Values are in units of time bins.
-
ava.preprocessing.warping.apply_warp(specs, warp_params)
Apply linear time warps to real spectrograms, returning the warped spectrograms.
Parameters: - specs (numpy.ndarray) – Spectrograms with shape [n_specs, num_freq_bins, num_time_bins].
- warp_params (dict) – Returned by align_specs. Maps keys ‘shifts’ and ‘slopes’ to NumPy arrays with shape [n_specs].
Returns: warped_specs – Time-warped spectrograms. Same shape as specs.
Return type: numpy.ndarray
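One way to apply linear time warps of this kind is to resample each frequency row with np.interp. The sketch below assumes, without confirmation from this page, that output time bin t of spectrogram i samples input bin slopes[i] * t + shifts[i], and that out-of-range positions take the nearest edge value. The function name is hypothetical and AVA's own convention may differ.

```python
import numpy as np


def apply_warp_sketch(specs, warp_params):
    """Linearly time-warp each spectrogram (hypothetical convention)."""
    n_specs, n_freq, n_time = specs.shape
    t = np.arange(n_time, dtype=float)
    warped = np.empty_like(specs, dtype=float)
    for i in range(n_specs):
        # Input time positions (in time bins) sampled by each output bin.
        src = warp_params["slopes"][i] * t + warp_params["shifts"][i]
        for f in range(n_freq):
            # np.interp clamps positions outside [0, n_time - 1] to edge values.
            warped[i, f] = np.interp(src, t, specs[i, f])
    return warped


specs = np.arange(10, dtype=float).reshape(1, 1, 10)
shifted = apply_warp_sketch(specs, {"shifts": np.array([2.0]), "slopes": np.array([1.0])})
```

With shifts of 0 and slopes of 1 the output equals the input, which makes a quick sanity check.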