ava.data package

Submodules

ava.data.data_container module

DataContainer class for linking directories containing different sorts of data.

This is meant to make plotting and analysis easier.

TO DO

Request random subsets.
Make sure input directories are iterable.
Add features to existing files.

ava.data.data_container.ALL_FIELDS = ['audio', 'sap_time', 'segments', 'segment_audio', 'latent_means', 'latent_mean_pca', 'latent_mean_umap', 'specs', 'onsets', 'offsets', 'audio_filenames', 'syllable_number', 'syllable_start_time', 'syllable_end_time', 'inter-syllable_interval', 'syllable_duration', 'starting_frequency', 'final_frequency', 'minimum_frequency', 'maximum_frequency', 'mean_frequency', 'frequency_bandwidth', 'total_syllable_energy', 'peak_syllable_amplitude', 'cluster', 'id', 'label', 'accepted', 'score', 'begin_time', 'end_time', 'call_length', 'principal_frequency', 'low_freq', 'high_freq', 'delta_freq', 'frequency_standard_deviation', 'slope', 'sinuosity', 'mean_power', 'tonality', 'syllable_duration_sap', 'syllable_start', 'mean_amplitude', 'mean_pitch', 'mean_FM', 'mean_AM2', 'mean_entropy', 'mean_pitch_goodness', 'mean_mean_freq', 'pitch_variance', 'FM_variance', 'entropy_variance', 'pitch_goodness_variance', 'mean_freq_variance', 'AM_variance']

All fields that can be requested by a DataContainer object.
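Because ALL_FIELDS is a plain list of strings, a field name can be checked before it is passed to request. The following is a minimal sketch rather than part of the ava API: KNOWN_FIELDS is a hand-copied subset standing in for ava.data.data_container.ALL_FIELDS so the snippet runs without ava installed, and the helper name unknown_fields is hypothetical.

```python
# Stand-in for ava.data.data_container.ALL_FIELDS (abbreviated subset, for
# illustration only). With ava installed, import the real list instead.
KNOWN_FIELDS = ['audio', 'segments', 'latent_means', 'latent_mean_pca',
                'latent_mean_umap', 'specs', 'onsets', 'offsets']


def unknown_fields(requested, known=KNOWN_FIELDS):
    """Return the requested field names that are not in `known`, in order."""
    known_set = set(known)
    return [field for field in requested if field not in known_set]


# 'latent_mean_tsne' is not a listed field, so it is reported back.
print(unknown_fields(['latent_means', 'latent_mean_tsne', 'specs']))
```

Running the check up front gives a clear list of misspelled or unsupported names before any expensive loading starts.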

class ava.data.data_container.DataContainer(audio_dirs=None, segment_dirs=None, spec_dirs=None, feature_dirs=None, projection_dirs=None, plots_dir='', model_filename=None, template_dir=None, verbose=True)

Bases: object

Link directories containing different data sources for easy plotting.

The idea here is for plotting and analysis tools to accept a DataContainer, from which they can request different types of data. Those requests can then be handled here in a central location, which can cut down on redundant code and processing steps.

audio_dirs

Directories containing audio. Defaults to None.

Type:	{list of str, None}, optional

segment_dirs

Directories containing segmenting decisions. Defaults to None.

Type:	{list of str, None}, optional

spec_dirs

Directories containing hdf5 files of spectrograms. These should be files written by ava.preprocessing.preprocess.process_sylls. Defaults to None.

Type:	{list of str, None}, optional

model_filename

The VAE checkpoint to load. Written by models.vae.save_state. Defaults to None.

Type:	{str, None}, optional

projection_dirs

Directories containing different projections. This is where latent means, their projections, and the handcrafted features found in feature_dirs are saved. Defaults to None.

Type:	{list of str, None}, optional

plots_dir

Directory to save plots. Defaults to '' (current working directory).

Type:	str, optional

feature_dirs

Directories containing text files with different syllable features. For example, these could contain exported MUPET, DeepSqueak or SAP syllable tables. Defaults to None.

Type:	{list of str, None}, optional

template_dir

Directory containing audio files of song templates. Defaults to None.

Type:	{str, None}, optional

request(field): Request some type of data.

Notes

Supported directory structure:

├── animal_1
│   ├── audio                     (raw audio)
│   │   ├── foo.wav
│   │   ├── bar.wav
│   │   └── baz.wav
│   ├── features                 (output of MUPET, DeepSqueak, SAP, ...)
│   │   ├── foo.csv
│   │   ├── bar.csv
│   │   └── baz.csv
│   ├── spectrograms             (used to train models, written by
│   │   ├── syllables_000.hdf5   preprocessing.preprocess.process_sylls)
│   │   └── syllables_001.hdf5
│   └── projections              (latent means, UMAP, PCA, tSNE
│       ├── syllables_000.hdf5   projections, copies of features in
│       └── syllables_001.hdf5   animal_1/features. These are
│                                written by a DataContainer object.)
├── animal_2
│   ├── audio
│   │   ├── 1.wav
│   │   └── 2.wav
│   ├── features
│   │   ├── 1.csv
│   │   └── 2.csv
│   ├── spectrograms
│   │   ├── syllables_000.hdf5
│   │   └── syllables_001.hdf5
│   └── projections
│       ├── syllables_000.hdf5
│       └── syllables_001.hdf5
.
.
.

There should be a 1-to-1 correspondence between, for example, the syllables in animal_1/audio/baz.wav and the features described in animal_1/features/baz.csv. Analogously, the fifth entry in animal_2/spectrograms/syllables_000.hdf5 should describe the same syllable as the fifth entry in animal_2/projections/syllables_000.hdf5. There is no strict relationship, however, between individual files in animal_1/audio and animal_1/spectrograms. The hdf5 files in the spectrograms and projections directories should contain a subset of the syllables in the audio and features directories.
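The file-level half of this correspondence (every foo.wav has a foo.csv, and vice versa) can be checked before building a DataContainer. The sketch below is not part of ava: the function name unmatched_stems and the '.wav'/'.csv' extensions are assumptions taken from the example tree above. It compares filenames only and cannot confirm that the syllables inside each pair of files line up.

```python
from pathlib import Path


def unmatched_stems(audio_dir, feature_dir, audio_ext='.wav', feature_ext='.csv'):
    """Return (audio_only, features_only): sorted file stems that appear in
    one directory but have no same-named partner in the other."""
    audio = {p.stem for p in Path(audio_dir).glob('*' + audio_ext)}
    features = {p.stem for p in Path(feature_dir).glob('*' + feature_ext)}
    return sorted(audio - features), sorted(features - audio)
```

For a correctly paired directory such as animal_1 in the tree above, unmatched_stems('animal_1/audio', 'animal_1/features') would return two empty lists.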

Then a DataContainer object can be initialized as:

>>> from ava.data.data_container import DataContainer
>>> audio_dirs = ['animal_1/audio', 'animal_2/audio']
>>> spec_dirs = ['animal_1/spectrograms', 'animal_2/spectrograms']
>>> model_filename = 'checkpoint.tar'
>>> dc = DataContainer(audio_dirs=audio_dirs, spec_dirs=spec_dirs,
...     model_filename=model_filename)
>>> latent_means = dc.request('latent_means')

It’s fine to leave some of the initialization parameters unspecified. If the DataContainer object is asked to do something it can’t, it will hopefully complain politely. Or at least informatively.

clear_projections()

Remove all projections.

This deletes all the .hdf5 files in self.projection_dirs.
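To make the effect concrete, here is a rough standalone equivalent of that behavior. It is a sketch under assumptions, not the library's implementation: the function name remove_hdf5_files is hypothetical, and whether the real method recurses into subdirectories or tolerates projection_dirs=None is not stated in this documentation.

```python
from pathlib import Path


def remove_hdf5_files(projection_dirs):
    """Delete every *.hdf5 file directly inside each directory.

    Returns the list of removed paths as strings. Other files are left
    alone, and subdirectories are not searched (an assumption).
    """
    removed = []
    for directory in projection_dirs or []:
        for path in sorted(Path(directory).glob('*.hdf5')):
            path.unlink()
            removed.append(str(path))
    return removed
```

Since deletion is irreversible, it may be worth listing the matching files first and confirming that the projection directories do not overlap with the spectrogram directories, which also hold .hdf5 files.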

request(field)

Request some type of data.

Parameters:	field (str) – The type of data being requested. Should come from …
Raises:	NotImplementedError – when field is not recognized.

Note

Besides __init__ and clear_projections, this should be the only external-facing method.
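The single-entry-point design described here can be illustrated with a toy class. This is a sketch of the general pattern only: FieldRegistry and its loaders argument are hypothetical and are not how DataContainer is implemented, and the caching shown is an assumption added for illustration. The one behavior taken from the documentation is that an unrecognized field raises NotImplementedError.

```python
class FieldRegistry:
    """Toy stand-in for a container that serves named fields (not the ava class)."""

    def __init__(self, loaders):
        # loaders: dict mapping a field name to a zero-argument callable.
        self._loaders = dict(loaders)
        self._cache = {}

    def request(self, field):
        """Return the data for `field`, computing it at most once."""
        if field not in self._loaders:
            raise NotImplementedError(f"Unrecognized field: {field!r}")
        if field not in self._cache:
            # Compute on first request, then reuse the stored result.
            self._cache[field] = self._loaders[field]()
        return self._cache[field]


registry = FieldRegistry({'onsets': lambda: [0.1, 0.5]})
print(registry.request('onsets'))
```

Routing every read through one method keeps validation and any reuse of computed results in a single place, which is the reduction in redundant code and processing that the class description mentions.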
