ava.data package

Submodules

ava.data.data_container module

DataContainer class for linking directories containing different sorts of data.

This is meant to make plotting and analysis easier.

TO DO

  • Request random subsets.
  • Make sure input directories are iterable.
  • Add features to existing files.
ava.data.data_container.ALL_FIELDS = ['audio', 'sap_time', 'segments', 'segment_audio', 'latent_means', 'latent_mean_pca', 'latent_mean_umap', 'specs', 'onsets', 'offsets', 'audio_filenames', 'syllable_number', 'syllable_start_time', 'syllable_end_time', 'inter-syllable_interval', 'syllable_duration', 'starting_frequency', 'final_frequency', 'minimum_frequency', 'maximum_frequency', 'mean_frequency', 'frequency_bandwidth', 'total_syllable_energy', 'peak_syllable_amplitude', 'cluster', 'id', 'label', 'accepted', 'score', 'begin_time', 'end_time', 'call_length', 'principal_frequency', 'low_freq', 'high_freq', 'delta_freq', 'frequency_standard_deviation', 'slope', 'sinuosity', 'mean_power', 'tonality', 'syllable_duration_sap', 'syllable_start', 'mean_amplitude', 'mean_pitch', 'mean_FM', 'mean_AM2', 'mean_entropy', 'mean_pitch_goodness', 'mean_mean_freq', 'pitch_variance', 'FM_variance', 'entropy_variance', 'pitch_goodness_variance', 'mean_freq_variance', 'AM_variance']

All fields that can be requested by a DataContainer object.
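Because field names are plain strings, a typo is easy to make. The following is a minimal sketch, not part of ava, of how near-matches could be suggested from a list such as ALL_FIELDS. The helper name suggest_fields is hypothetical, and the list of valid names is passed in explicitly so the sketch does not depend on ava being installed.

```python
import difflib


def suggest_fields(name, all_fields, n=3):
    """Return up to n entries of all_fields that closely resemble name.

    Hypothetical helper: all_fields would typically be
    ava.data.data_container.ALL_FIELDS, passed in by the caller.
    """
    return difflib.get_close_matches(name, all_fields, n=n)
```

For instance, calling suggest_fields('latent_mean', ALL_FIELDS) would be expected to include 'latent_means' among its suggestions.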

class ava.data.data_container.DataContainer(audio_dirs=None, segment_dirs=None, spec_dirs=None, feature_dirs=None, projection_dirs=None, plots_dir='', model_filename=None, template_dir=None, verbose=True)[source]

Bases: object

Link directories containing different data sources for easy plotting.

The idea here is for plotting and analysis tools to accept a DataContainer, from which they can request different types of data. Those requests can then be handled here in a central location, which can cut down on redundant code and processing steps.

audio_dirs

Directories containing audio. Defaults to None.

Type:{list of str, None}, optional
segment_dirs

Directories containing segmenting decisions. Defaults to None.

Type:{list of str, None}, optional
spec_dirs

Directories containing hdf5 files of spectrograms. These should be files output by ava.preprocessing.preprocessing. Defaults to None.

Type:list of {str, None}, optional
model_filename

The VAE checkpoint to load. Written by models.vae.save_state. Defaults to None.

Type:{str, None}, optional
projection_dirs

Directories containing different projections. This is where latent means, their projections, and the handcrafted features found in feature_dirs are saved. Defaults to None.

Type:list of {str, None}, optional
plots_dir

Directory to save plots. Defaults to '' (current working directory).

Type:str, optional
feature_dirs

Directories containing text files with different syllable features. For example, these could contain exported MUPET, DeepSqueak, or SAP syllable tables. Defaults to None.

Type:list of {str, None}, optional
template_dir

Directory containing audio files of song templates. Defaults to None.

Type:{str, None}, optional
request(field)[source]

Request some type of data.

Notes

Supported directory structure:

├── animal_1
│   ├── audio                     (raw audio)
│   │   ├── foo.wav
│   │   ├── bar.wav
│   │   └── baz.wav
│   ├── features                 (output of MUPET, DeepSqueak, SAP, ...)
│   │   ├── foo.csv
│   │   ├── bar.csv
│   │   └── baz.csv
│   ├── spectrograms             (used to train models, written by
│   │   ├── syllables_000.hdf5   preprocessing.preprocess.process_sylls)
│   │   └── syllables_001.hdf5
│   └── projections              (latent means, UMAP, PCA, tSNE
│       ├── syllables_000.hdf5   projections, copies of features in
│       └── syllables_001.hdf5   animal_1/features. These are
│                                written by a DataContainer object.)
├── animal_2
│   ├── audio
│   │   ├── 1.wav
│   │   └── 2.wav
│   ├── features
│   │   ├── 1.csv
│   │   └── 2.csv
│   ├── spectrograms
│   │   ├── syllables_000.hdf5
│   │   └── syllables_001.hdf5
│   └── projections
│       ├── syllables_000.hdf5
│       └── syllables_001.hdf5
.
.
.

There should be a 1-to-1 correspondence between, for example, the syllables in animal_1/audio/baz.wav and the features described in animal_1/features/baz.csv. Analogously, the fifth entry in animal_2/spectrograms/syllables_000.hdf5 should describe the same syllable as the fifth entry in animal_2/projections/syllables_000.hdf5. There is no strict relationship, however, between individual files in animal_1/audio and animal_1/spectrograms. The hdf5 files in the spectrograms and projections directories should contain a subset of the syllables in the audio and features directories.
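A quick file-level sanity check of this correspondence can be sketched as follows. This is an illustration under stated assumptions, not something ava provides: the helper name unmatched_stems is hypothetical, and it assumes audio files end in .wav and feature files in .csv, as in the tree above. It only compares filenames; it does not verify that the syllables inside matching files line up.

```python
from pathlib import Path


def unmatched_stems(audio_dir, feature_dir, audio_ext=".wav", feature_ext=".csv"):
    """Return (audio_only, features_only): sorted file stems lacking a partner.

    Hypothetical helper. A stem is the filename without its extension,
    e.g. 'baz' for both 'baz.wav' and 'baz.csv'.
    """
    audio = {p.stem for p in Path(audio_dir).glob(f"*{audio_ext}")}
    features = {p.stem for p in Path(feature_dir).glob(f"*{feature_ext}")}
    return sorted(audio - features), sorted(features - audio)
```

Two empty lists indicate that every audio file has a feature file with the same stem, and vice versa.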

Then a DataContainer object can be initialized as:

>>> from ava.data.data_container import DataContainer
>>> audio_dirs = ['animal_1/audio', 'animal_2/audio']
>>> spec_dirs = ['animal_1/spectrograms', 'animal_2/spectrograms']
>>> model_filename = 'checkpoint.tar'
>>> dc = DataContainer(audio_dirs=audio_dirs, spec_dirs=spec_dirs,
...     model_filename=model_filename)
>>> latent_means = dc.request('latent_means')

It’s fine to leave some of the initialization parameters unspecified. If the DataContainer object is asked to do something it can’t, it will hopefully complain politely. Or at least informatively.
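With many animals, the directory lists used in the example above can be assembled from the layout rather than typed by hand. The sketch below assumes every animal directory sits directly under a common root; the helper name collect_dirs is hypothetical and not part of ava.

```python
from pathlib import Path


def collect_dirs(root, subdir):
    """Return sorted paths of root/<animal>/<subdir> for animals that have it.

    Hypothetical helper: entries under root that lack the subdirectory
    (including plain files) are skipped.
    """
    return sorted(
        str(animal / subdir)
        for animal in Path(root).iterdir()
        if (animal / subdir).is_dir()
    )


# Possible usage, assuming the layout lives under the current directory:
# audio_dirs = collect_dirs(".", "audio")
# spec_dirs = collect_dirs(".", "spectrograms")
```

Since animals missing a subdirectory are silently skipped, it is worth comparing the lengths of the resulting lists before passing them to DataContainer if they are meant to line up animal by animal.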

clear_projections()[source]

Remove all projections.

This deletes all the .hdf5 files in self.projection_dirs.
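Because the deletion cannot be undone, it may help to list the affected files first. The following sketch is not how ava implements clear_projections; it simply enumerates .hdf5 files directly inside each directory. The helper name list_projection_files is hypothetical, and the non-recursive match is an assumption.

```python
from pathlib import Path


def list_projection_files(projection_dirs):
    """Return sorted paths of .hdf5 files found directly in the given directories.

    Hypothetical dry-run helper: nothing is deleted, and subdirectories
    are not searched.
    """
    files = []
    for d in projection_dirs:
        files.extend(str(p) for p in Path(d).glob("*.hdf5"))
    return sorted(files)
```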

request(field)[source]

Request some type of data.

Parameters:field (str) – The type of data being requested. Should come from …
Raises:NotImplementedError – when field is not recognized.

Note

Besides __init__ and clear_projections, this should be the only external-facing method.
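Since request raises NotImplementedError for an unrecognized field, calling code that loops over many field names may prefer to skip unknown ones rather than stop. A minimal sketch, assuming only the documented behaviour; the wrapper name request_or_none is hypothetical, and dc can be any object with a request method.

```python
def request_or_none(dc, field):
    """Return dc.request(field), or None if the field is not recognized.

    Hypothetical wrapper: only NotImplementedError is caught, so other
    errors (missing directories, unreadable files) still propagate.
    """
    try:
        return dc.request(field)
    except NotImplementedError:
        return None
```

Returning None hides the reason for the failure, so this pattern suits exploratory loops better than pipelines where a misspelled field should fail loudly.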

Module contents