ava.data package
Submodules
ava.data.data_container module
DataContainer class for linking directories containing different sorts of data.
This is meant to make plotting and analysis easier.
TO DO
- request random subsets.
- make sure input directories are iterable
- add features to existing files.
ava.data.data_container.ALL_FIELDS = ['audio', 'sap_time', 'segments', 'segment_audio', 'latent_means', 'latent_mean_pca', 'latent_mean_umap', 'specs', 'onsets', 'offsets', 'audio_filenames', 'syllable_number', 'syllable_start_time', 'syllable_end_time', 'inter-syllable_interval', 'syllable_duration', 'starting_frequency', 'final_frequency', 'minimum_frequency', 'maximum_frequency', 'mean_frequency', 'frequency_bandwidth', 'total_syllable_energy', 'peak_syllable_amplitude', 'cluster', 'id', 'label', 'accepted', 'score', 'begin_time', 'end_time', 'call_length', 'principal_frequency', 'low_freq', 'high_freq', 'delta_freq', 'frequency_standard_deviation', 'slope', 'sinuosity', 'mean_power', 'tonality', 'syllable_duration_sap', 'syllable_start', 'mean_amplitude', 'mean_pitch', 'mean_FM', 'mean_AM2', 'mean_entropy', 'mean_pitch_goodness', 'mean_mean_freq', 'pitch_variance', 'FM_variance', 'entropy_variance', 'pitch_goodness_variance', 'mean_freq_variance', 'AM_variance']
All fields that can be requested by a DataContainer object.
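Because the list of fields is long, it can help to check a field name before requesting it. The following is a minimal sketch and not part of ava: check_field is a hypothetical helper, and KNOWN_FIELDS is a short excerpt of the list above that stands in for the full ALL_FIELDS so the example runs on its own.

```python
import difflib

# Short excerpt of ALL_FIELDS, copied here so the example is self-contained.
KNOWN_FIELDS = ['audio', 'specs', 'onsets', 'offsets', 'latent_means',
                'latent_mean_pca', 'latent_mean_umap', 'syllable_duration']


def check_field(field, known_fields=KNOWN_FIELDS):
    """Return `field` if it is known, else raise with close-match hints.

    Hypothetical helper for illustration; ava does not provide it.
    """
    if field in known_fields:
        return field
    # Suggest up to three similarly spelled field names.
    hints = difflib.get_close_matches(field, known_fields, n=3)
    raise ValueError(f"Unknown field {field!r}. Close matches: {hints}")
```

With ava installed, the imported ALL_FIELDS could presumably be passed as known_fields in place of the excerpt.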
class ava.data.data_container.DataContainer(audio_dirs=None, segment_dirs=None, spec_dirs=None, feature_dirs=None, projection_dirs=None, plots_dir='', model_filename=None, template_dir=None, verbose=True)
Bases: object
Link directories containing different data sources for easy plotting.
The idea here is for plotting and analysis tools to accept a DataContainer, from which they can request different types of data. Those requests can then be handled here in a central location, which can cut down on redundant code and processing steps.

audio_dirs
Directories containing audio. Defaults to None.
Type: {list of str, None}, optional

segment_dirs
Directories containing segmenting decisions.
Type: {list of str, None}, optional

spec_dirs
Directories containing hdf5 files of spectrograms. These should be files output by ava.preprocessing.preprocessing. Defaults to None.
Type: list of {str, None}, optional

model_filename
The VAE checkpoint to load. Written by models.vae.save_state. Defaults to None.
Type: {str, None}, optional

projection_dirs
Directories containing different projections. This is where things like latent means, their projections, and handcrafted features found in feature_dirs are saved. Defaults to None.
Type: list of {str, None}, optional

plots_dir
Directory to save plots. Defaults to '' (current working directory).
Type: str, optional

feature_dirs
Directories containing text files with different syllable features. For example, these could contain exported MUPET, DeepSqueak or SAP syllable tables. Defaults to None.
Type: list of {str, None}, optional

template_dir
Directory containing audio files of song templates. Defaults to None.
Type: {str, None}, optional

Notes
Supported directory structure:
├── animal_1
│   ├── audio                     (raw audio)
│   │   ├── foo.wav
│   │   ├── bar.wav
│   │   └── baz.wav
│   ├── features                  (output of MUPET, DeepSqueak, SAP, ...)
│   │   ├── foo.csv
│   │   ├── bar.csv
│   │   └── baz.csv
│   ├── spectrograms              (used to train models, written by
│   │   ├── syllables_000.hdf5    preprocessing.preprocess.process_sylls)
│   │   └── syllables_001.hdf5
│   └── projections               (latent means, UMAP, PCA, tSNE
│       ├── syllables_000.hdf5    projections, copies of features in
│       └── syllables_001.hdf5    experiment_1/features. These are
│                                 written by a DataContainer object.)
├── animal_2
│   ├── audio
│   │   ├── 1.wav
│   │   └── 2.wav
│   ├── features
│   │   ├── 1.csv
│   │   └── 2.csv
│   ├── spectrograms
│   │   ├── syllables_000.hdf5
│   │   └── syllables_001.hdf5
│   └── projections
│       ├── syllables_000.hdf5
│       └── syllables_001.hdf5
.
.
.
There should be a 1-to-1 correspondence between, for example, the syllables in animal_1/audio/baz.wav and the features described in animal_1/features/baz.csv. Analogously, the fifth entry in animal_2/spectrograms/syllables_000.hdf5 should describe the same syllable as the fifth entry in animal_2/projections/syllables_000.hdf5. There is no strict relationship, however, between individual files in animal_1/audio and animal_1/spectrograms. The hdf5 files in the spectrograms and projections directories should contain a subset of the syllables in the audio and features directories.
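The file-level part of this correspondence can be checked before building a DataContainer. The sketch below assumes audio and feature files are paired by file stem (foo.wav with foo.csv), as in the layout above. unmatched_files is a hypothetical helper, not part of ava, and it does not verify that the syllables inside paired files line up.

```python
from pathlib import Path


def unmatched_files(audio_dir, feature_dir, audio_ext='.wav', feature_ext='.csv'):
    """Return (audio stems lacking a feature file, feature stems lacking audio).

    Assumes files are paired by stem (foo.wav <-> foo.csv).
    Hypothetical helper for illustration; not part of ava.
    """
    audio_stems = {p.stem for p in Path(audio_dir).glob('*' + audio_ext)}
    feature_stems = {p.stem for p in Path(feature_dir).glob('*' + feature_ext)}
    missing_features = sorted(audio_stems - feature_stems)
    missing_audio = sorted(feature_stems - audio_stems)
    return missing_features, missing_audio
```

Two empty lists mean every audio file has a feature file of the same name and vice versa.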
Then a DataContainer object can be initialized as:
>>> from ava.data.data_container import DataContainer
>>> audio_dirs = ['animal_1/audio', 'animal_2/audio']
>>> spec_dirs = ['animal_1/spectrograms', 'animal_2/spectrograms']
>>> model_filename = 'checkpoint.tar'
>>> dc = DataContainer(audio_dirs=audio_dirs, spec_dirs=spec_dirs,
...     model_filename=model_filename)
>>> latent_means = dc.request('latent_means')
It’s fine to leave some of the initialization parameters unspecified. If the DataContainer object is asked to do something it can’t, it will hopefully complain politely. Or at least informatively.
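With many animals, the directory lists passed to the constructor can be assembled programmatically instead of typed by hand. This is a sketch under the assumption that every animal directory follows the layout shown above. collect_dirs is a hypothetical helper and not part of ava.

```python
from pathlib import Path


def collect_dirs(root, subdir):
    """List `<root>/<animal>/<subdir>` for every animal directory that has it.

    Hypothetical helper; assumes the per-animal layout shown above.
    Entries of `root` that lack `subdir` (including plain files) are skipped.
    """
    root = Path(root)
    return sorted(str(d / subdir) for d in root.iterdir()
                  if (d / subdir).is_dir())
```

For example, collect_dirs('.', 'audio') and collect_dirs('.', 'spectrograms') would produce lists like the audio_dirs and spec_dirs above. If some animal lacks one of the subdirectories, the two lists will have different lengths, so it is worth comparing them before use.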

clear_projections()
Remove all projections.
This deletes all the .hdf5 files in self.projection_dirs.
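Since this operation is destructive, it may be worth previewing what would be removed. The sketch below is a stand-in written for illustration, not ava's implementation: remove_hdf5_files and its dry_run flag are hypothetical, and it assumes the files sit directly inside each listed directory.

```python
from pathlib import Path


def remove_hdf5_files(dirs, dry_run=True):
    """Delete (or, with dry_run=True, only list) .hdf5 files in `dirs`.

    Hypothetical stand-in for illustration; not ava's implementation.
    Returns the paths that were (or would be) removed.
    """
    targets = sorted(p for d in dirs for p in Path(d).glob('*.hdf5'))
    if not dry_run:
        for path in targets:
            path.unlink()
    return [str(p) for p in targets]
```

Calling it first with the default dry_run=True shows the affected files without touching them.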

request(field)
Request some type of data.
Parameters: field (str) – The type of data being requested. Should come from …
Raises: NotImplementedError – when field is not recognized.
Note
Besides __init__ and clear_projections, this should be the only external-facing method.
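The single-entry-point design can be illustrated with a toy class. This is a sketch of the general pattern only and does not reproduce how DataContainer works internally: ToyContainer, its two fields, and the in-memory cache are all assumptions made for the example.

```python
class ToyContainer:
    """Toy illustration of a single request entry point; not ava's code."""

    def __init__(self):
        # Map each supported field to the method that computes it.
        self._loaders = {'numbers': self._load_numbers,
                         'squares': self._load_squares}
        self._cache = {}
        # Count loader calls so the reuse of results is observable.
        self.load_counts = {'numbers': 0, 'squares': 0}

    def request(self, field):
        """Return data for `field`, computing it at most once."""
        if field not in self._loaders:
            raise NotImplementedError(f"Unrecognized field: {field!r}")
        if field not in self._cache:
            self._cache[field] = self._loaders[field]()
        return self._cache[field]

    def _load_numbers(self):
        self.load_counts['numbers'] += 1
        return [1, 2, 3]

    def _load_squares(self):
        # Derived fields go back through request, so shared inputs are reused.
        self.load_counts['squares'] += 1
        return [x * x for x in self.request('numbers')]
```

Routing every request through one method means an unrecognized field fails in one place, and a derived field reuses inputs that were already computed.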