ava.models package

Submodules

ava.models.vae module

A Variational Autoencoder (VAE) for spectrogram data.

VAE References

[1]	Kingma, Diederik P., and Max Welling. “Auto-encoding variational bayes.” arXiv preprint arXiv:1312.6114 (2013). https://arxiv.org/abs/1312.6114

[2]	Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. “Stochastic backpropagation and approximate inference in deep generative models.” arXiv preprint arXiv:1401.4082 (2014). https://arxiv.org/abs/1401.4082

class ava.models.vae.VAE(save_dir='', lr=0.001, z_dim=32, model_precision=10.0, device_name='auto')

Bases: torch.nn.Module

Variational Autoencoder class for single-channel images.

save_dir¶

Directory where the model is saved. Defaults to ''.

Type:	str, optional

lr¶

Model learning rate. Defaults to 1e-3.

Type:	float, optional

z_dim¶

Latent dimension. Defaults to 32.

Type:	int, optional

model_precision¶

Precision of the observation model. Defaults to 10.0.

Type:	float, optional

device_name¶

Name of device to train the model on. When 'auto' is passed, 'cuda' is chosen if torch.cuda.is_available(), otherwise 'cpu' is chosen. Defaults to 'auto'.

Type:	{'cpu', 'cuda', 'auto'}, optional

Notes

The model is trained to maximize the standard ELBO objective:

\[\mathcal{L} = \mathbb{E}_{q(z|x)} \left[ \log p(x,z) \right] + \mathbb{H}[q(z|x)]\]

where \(p(x,z) = p(z)p(x|z)\) and \(\mathbb{H}\) is differential entropy. The prior \(p(z)\) is a unit spherical normal distribution. The conditional distribution \(p(x|z)\) is set as a spherical normal distribution to prevent overfitting. The variational distribution, \(q(z|x)\), is a multivariate normal distribution whose covariance is a rank-1 matrix plus a diagonal matrix (see encode). Both \(q(z|x)\) and \(p(x|z)\) are parameterized by neural networks. Gradients are passed through stochastic layers via the reparameterization trick, implemented by the PyTorch rsample method.

The dimensions of the network are hard-coded for use with 128 x 128 spectrograms. Although a desired latent dimension can be passed to __init__, the dimensions of the network limit its practical range to roughly 8 to 64 dimensions. Changing the image dimensions will require updating the parameters of the layers defined in _build_network.
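The reparameterization trick can be illustrated with `torch.distributions`. The following is a minimal sketch, not the package's code: it assumes the posterior is built with `torch.distributions.LowRankMultivariateNormal` and uses random stand-in tensors for the encoder outputs `mu`, `u`, and `d` (with `d` parameterized through a hypothetical `log_d` to keep it positive). A sample drawn with `rsample` stays on the autograd graph, so gradients reach the posterior parameters; a sample drawn with `sample` does not.

```python
import torch
from torch.distributions import LowRankMultivariateNormal

torch.manual_seed(0)
batch_size, z_dim = 4, 32

# Stand-in encoder outputs (in the VAE these would come from the encoder network).
mu = torch.zeros(batch_size, z_dim, requires_grad=True)
u = torch.randn(batch_size, z_dim, requires_grad=True)
log_d = torch.zeros(batch_size, z_dim, requires_grad=True)

# q(z|x) with covariance u u^T + diag(d); cov_factor needs a trailing rank dimension.
q = LowRankMultivariateNormal(mu, u.unsqueeze(-1), log_d.exp())

z_hat = q.rsample()        # reparameterized sample: a differentiable function of mu, u, d
z_hat.sum().backward()     # gradients flow back to mu, u, and log_d

print(mu.grad.shape)             # torch.Size([4, 32])
print(q.sample().requires_grad)  # False: sample() runs without gradient tracking
```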

decode(z)

Compute \(p(x|z)\).

\[p(x|z) = \mathcal{N}(\mu, \Lambda)\]

\[\Lambda = \mathtt{model\_precision} \cdot I\]

where \(\mu\) is a deterministic function of z, \(\Lambda\) is a precision matrix, and \(I\) is the identity matrix.

Parameters:	z (torch.Tensor) – Batch of latent samples with shape `[batch_size, self.z_dim]`
Returns:	x – Batch of means mu, described above. Shape: `[batch_size, X_DIM=128*128]`
Return type:	torch.Tensor
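Because \(\Lambda\) is a scaled identity, the log-likelihood reduces to a scaled squared error plus a constant: \(\log p(x|z) = \frac{D}{2}\log\frac{\lambda}{2\pi} - \frac{\lambda}{2}\lVert x - \mu \rVert^2\), where \(D\) is the number of pixels (X_DIM) and \(\lambda\) is model_precision. The plain-Python sketch below evaluates that formula; the function name `spherical_normal_log_prob` is hypothetical and not part of the package.

```python
import math

def spherical_normal_log_prob(x, mu, precision):
    """log N(x; mu, precision^{-1} I) for flat sequences x and mu of equal length D."""
    if len(x) != len(mu):
        raise ValueError("x and mu must have the same length")
    d = len(x)
    sq_err = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    # Normalizing constant (depends only on D and the precision) minus scaled squared error.
    return 0.5 * d * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * sq_err

# With x equal to the mean, only the normalizing constant remains:
# the result equals math.log(10.0 / (2 * math.pi)).
print(spherical_normal_log_prob([0.0, 0.0], [0.0, 0.0], precision=10.0))
```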

encode(x)

Compute \(q(z|x)\).

\[q(z|x) = \mathcal{N}(\mu, \Sigma)\]

\[\Sigma = u u^{T} + \mathtt{diag}(d)\]

where \(\mu\), \(u\), and \(d\) are deterministic functions of x and \(\Sigma\) denotes a covariance matrix.

Parameters:

x (torch.Tensor) – The input images. Shape: `[batch_size, height=128, width=128]`

Returns:

mu (torch.Tensor) – Posterior mean. Shape: `[batch_size, self.z_dim]`
u (torch.Tensor) – Posterior covariance factor, as defined above. Shape: `[batch_size, self.z_dim]`
d (torch.Tensor) – Posterior diagonal factor, as defined above. Shape: `[batch_size, self.z_dim]`
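The covariance structure above maps directly onto PyTorch's low-rank-plus-diagonal normal distribution. This sketch assumes `u` is used as a rank-1 `cov_factor` and `d` as the `cov_diag` of `torch.distributions.LowRankMultivariateNormal`; the tensor values are made up for illustration. It checks that the resulting covariance equals \(u u^{T} + \mathtt{diag}(d)\).

```python
import torch
from torch.distributions import LowRankMultivariateNormal

z_dim = 3
mu = torch.zeros(z_dim)
u = torch.tensor([1.0, 2.0, 0.5])
d = torch.tensor([0.1, 0.2, 0.3])  # diagonal entries must be positive

# cov_factor has shape [z_dim, rank]; here rank = 1.
q = LowRankMultivariateNormal(mu, cov_factor=u.unsqueeze(-1), cov_diag=d)

expected = torch.outer(u, u) + torch.diag(d)
print(torch.allclose(q.covariance_matrix, expected))  # True
```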

forward(x, return_latent_rec=False)

Send x round trip and compute a loss.

In more detail: Given x, compute \(q(z|x)\) and sample: \(\hat{z} \sim q(z|x)\) . Then compute \(\log p(x|\hat{z})\), the log-likelihood of x, the input, given \(\hat{z}\), the latent sample. We will also need the likelihood of \(\hat{z}\) under the model’s prior: \(p(\hat{z})\), and the entropy of the latent conditional distribution, \(\mathbb{H}[q(z|x)]\) . ELBO can then be estimated as:

\[\frac{1}{N} \sum_{i=1}^N \left( \mathbb{E}_{\hat{z} \sim q(z|x_i)} \left[ \log p(x_i,\hat{z}) \right] + \mathbb{H}[q(z|x_i)] \right)\]

where \(N\) denotes the number of samples from the data distribution and the expectation is estimated using a single latent sample, \(\hat{z}\). In practice, the outer expectation is estimated using minibatches.

Parameters:

x (torch.Tensor) – A batch of samples from the data distribution (spectrograms). Shape: [batch_size, height=128, width=128]
return_latent_rec (bool, optional) – Whether to return latent means and reconstructions. Defaults to False.

Returns:

loss (torch.Tensor) – Negative ELBO times the batch size. Shape: []
latent (numpy.ndarray, if return_latent_rec) – Latent means. Shape: [batch_size, self.z_dim]
reconstructions (numpy.ndarray, if return_latent_rec) – Reconstructed means. Shape: [batch_size, height=128, width=128]
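The single-sample, minibatch ELBO estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the decoder is a hypothetical `torch.nn.Linear` stand-in, the data are random, and `x_dim=16` replaces X_DIM to keep the example small. The resulting loss is the negative ELBO summed over the batch, i.e. the negative mean ELBO times the batch size.

```python
import math
import torch
from torch.distributions import LowRankMultivariateNormal, Normal

torch.manual_seed(0)
batch_size, z_dim, x_dim = 8, 4, 16
model_precision = 10.0

# Stand-ins for data and encoder outputs (hypothetical values).
x = torch.randn(batch_size, x_dim)
mu = torch.randn(batch_size, z_dim)
u = 0.1 * torch.randn(batch_size, z_dim)
d = torch.full((batch_size, z_dim), 0.5)
decoder = torch.nn.Linear(z_dim, x_dim)  # hypothetical decoder giving the mean of p(x|z)

q = LowRankMultivariateNormal(mu, u.unsqueeze(-1), d)
z_hat = q.rsample()  # one latent sample per data point

# log p(z_hat) under the unit spherical normal prior.
prior = Normal(torch.zeros(z_dim), torch.ones(z_dim))
log_pz = prior.log_prob(z_hat).sum(dim=1)

# log p(x | z_hat) under a spherical normal with precision model_precision.
x_mean = decoder(z_hat)
sq_err = ((x - x_mean) ** 2).sum(dim=1)
log_pxz = 0.5 * x_dim * math.log(model_precision / (2 * math.pi)) - 0.5 * model_precision * sq_err

elbo_per_point = log_pz + log_pxz + q.entropy()  # shape [batch_size]
loss = -elbo_per_point.sum()                     # negative ELBO times the batch size
print(loss.shape)  # torch.Size([])
```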

get_latent(loader)

Get latent means for all syllables in the given loader.

Parameters:	loader (torch.utils.data.DataLoader) – DataLoader wrapping an ava.models.vae_dataset.SyllableDataset.
Returns:	latent – Latent means. Shape: `[len(loader.dataset), self.z_dim]`
Return type:	numpy.ndarray

Note

Make sure your loader is not set to shuffle if you’re going to match these with labels or other fields later.
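The note about shuffling matters because latent means are collected in loader order. The sketch below assumes a hypothetical `encode_mean` function and a `TensorDataset` of stand-in inputs (which yields one-element tuples, unlike SyllableDataset); with `shuffle=False`, row `i` of the result corresponds to dataset item `i`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

z_dim = 2
specs = torch.arange(10, dtype=torch.float32).reshape(10, 1)  # stand-in "spectrograms"
loader = DataLoader(TensorDataset(specs), batch_size=4, shuffle=False)

def encode_mean(x):
    """Hypothetical encoder returning latent means of shape [batch, z_dim]."""
    return x.repeat(1, z_dim)

latents = []
with torch.no_grad():
    for (batch,) in loader:  # TensorDataset batches arrive as one-element lists
        latents.append(encode_mean(batch).numpy())
latent = np.concatenate(latents, axis=0)

print(latent.shape)   # (10, 2)
print(latent[:3, 0])  # [0. 1. 2.]: row i corresponds to dataset item i
```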

load_state(filename)

Load all the model parameters from the given .tar file.

The .tar file should be written by self.save_state.

Parameters:	filename (str) – File containing a model state.

Note

self.lr, self.save_dir, and self.z_dim are not loaded.
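A common PyTorch checkpoint pattern is a dictionary saved with `torch.save` and restored with `load_state_dict`. The key names, the `epoch` field, and the helper functions below are hypothetical and may not match what save_state actually writes; a small `torch.nn.Linear` stands in for the VAE.

```python
import os
import tempfile
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def save_state(filename, model, optimizer, epoch):
    # Hypothetical checkpoint layout; the package's actual keys may differ.
    torch.save({
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "epoch": epoch,
    }, filename)

def load_state(filename, model, optimizer):
    checkpoint = torch.load(filename, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    return checkpoint["epoch"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint_010.tar")
    save_state(path, model, optimizer, epoch=10)
    # The receiving model must have the same architecture (e.g. the same z_dim).
    restored = torch.nn.Linear(4, 2)
    restored_opt = torch.optim.Adam(restored.parameters(), lr=1e-3)
    epoch = load_state(path, restored, restored_opt)
```

Because z_dim is not restored, the receiving model must be constructed with the same z_dim as the saved one; otherwise `load_state_dict` fails on mismatched parameter shapes.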

save_state(filename)

Save all the model parameters to the given file.

test_epoch(test_loader)

Test the model on a held-out test set and return an ELBO estimate.

Parameters:	test_loader (torch.utils.data.DataLoader) – DataLoader wrapping an ava.models.vae_dataset.SyllableDataset for the test set.
Returns:	elbo – An unbiased estimate of the ELBO, estimated using samples from test_loader.
Return type:	float

train_epoch(train_loader)

Train the model for a single epoch.

Parameters:	train_loader (torch.utils.data.DataLoader) – DataLoader wrapping an ava.models.vae_dataset.SyllableDataset for the training set.
Returns:	elbo – A biased estimate of the ELBO, estimated using samples from train_loader.
Return type:	float

train_loop(loaders, epochs=100, test_freq=2, save_freq=10, vis_freq=1)

Train the model for multiple epochs, testing and saving along the way.

Parameters:

loaders (dictionary) – Dictionary mapping the keys 'test' and 'train' to respective torch.utils.data.DataLoader objects.
epochs (int, optional) – Number of (possibly additional) epochs to train the model for. Defaults to 100.
test_freq (int, optional) – Testing is performed every test_freq epochs. Defaults to 2.
save_freq (int, optional) – The model is saved every save_freq epochs. Defaults to 10.
vis_freq (int, optional) – Syllable reconstructions are plotted every vis_freq epochs. Defaults to 1.
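How the three frequencies interact can be sketched as follows. The helper `epoch_schedule` is hypothetical: it assumes an action fires whenever the epoch number is divisible by its frequency, and that a resumed model continues counting from its current epoch (the "possibly additional" epochs). The package's exact counting convention may differ.

```python
def epoch_schedule(start_epoch, epochs, test_freq=2, save_freq=10, vis_freq=1):
    """Hypothetical helper: list the epochs that trigger testing, saving, and plotting.

    Assumes an action fires whenever the epoch number is divisible by its frequency.
    """
    events = {"test": [], "save": [], "vis": []}
    freqs = {"test": test_freq, "save": save_freq, "vis": vis_freq}
    for epoch in range(start_epoch, start_epoch + epochs):
        for name, freq in freqs.items():
            if epoch % freq == 0:
                events[name].append(epoch)
    return events

# A fresh model trained for 12 epochs.
print(epoch_schedule(0, 12, test_freq=4, save_freq=10, vis_freq=6))
# {'test': [0, 4, 8], 'save': [0, 10], 'vis': [0, 6]}

# Resuming a model already at epoch 10 for 5 additional epochs, with default frequencies.
print(epoch_schedule(10, 5))
```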

visualize(loader, num_specs=5, gap=(2, 6), save_filename='reconstruction.pdf')

Plot spectrograms and their reconstructions.

Spectrograms are chosen at random from the DataLoader's dataset.

Parameters:

loader (torch.utils.data.DataLoader) – Spectrogram DataLoader.
num_specs (int, optional) – Number of spectrogram pairs to plot. Defaults to 5.
gap (int or tuple of two ints, optional) – The vertical and horizontal gap between images, in pixels. Defaults to (2, 6).
save_filename (str, optional) – Where to save the plot, relative to self.save_dir. Defaults to 'reconstruction.pdf'.

Returns:

specs (numpy.ndarray) – Spectrograms from loader.
rec_specs (numpy.ndarray) – Corresponding spectrogram reconstructions.
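One way to lay out num_specs originals above their reconstructions with a (vertical, horizontal) gap is shown below. The layout and the `tile_pairs` helper are assumptions for illustration, not the package's plotting code; gaps are filled with NaN so that `matplotlib.pyplot.imshow` would leave them blank.

```python
import numpy as np

def tile_pairs(specs, rec_specs, gap=(2, 6)):
    """Hypothetical layout: originals on the top row, reconstructions below."""
    if isinstance(gap, int):
        gap = (gap, gap)
    v_gap, h_gap = gap
    n, h, w = specs.shape
    canvas = np.full((2 * h + v_gap, n * w + (n - 1) * h_gap), np.nan)
    for i in range(n):
        col = i * (w + h_gap)
        canvas[:h, col:col + w] = specs[i]              # original spectrogram
        canvas[h + v_gap:, col:col + w] = rec_specs[i]  # its reconstruction
    return canvas

rng = np.random.default_rng(0)
specs = rng.random((5, 128, 128))
rec_specs = rng.random((5, 128, 128))
canvas = tile_pairs(specs, rec_specs)
print(canvas.shape)  # (258, 664)
```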

ava.models.vae.X_DIM = 16384

Processed spectrogram dimension: freq_bins * time_bins

Type:	int

ava.models.vae.X_SHAPE = (128, 128)

Processed spectrogram shape: [freq_bins, time_bins]

Type:	tuple
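The two constants are related by X_DIM = X_SHAPE[0] * X_SHAPE[1] = 128 * 128 = 16384, which is why decode returns tensors of shape `[batch_size, X_DIM]` that can be reshaped back into images. A small NumPy sketch, using a zero-filled stand-in batch:

```python
import numpy as np

X_SHAPE = (128, 128)             # [freq_bins, time_bins]
X_DIM = X_SHAPE[0] * X_SHAPE[1]  # freq_bins * time_bins

batch = np.zeros((64,) + X_SHAPE, dtype=np.float32)  # stand-in batch of spectrograms
flat = batch.reshape(-1, X_DIM)                      # [batch_size, X_DIM], as returned by decode
images = flat.reshape((-1,) + X_SHAPE)               # back to [batch_size, 128, 128]
print(X_DIM, flat.shape, images.shape)  # 16384 (64, 16384) (64, 128, 128)
```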

ava.models.vae_dataset module

Methods for feeding syllable data to the VAE.

Meant to be used with ava.models.vae.VAE.

class ava.models.vae_dataset.SyllableDataset(filenames, sylls_per_file, transform=None)

Bases: torch.utils.data.Dataset

torch.utils.data.Dataset for animal vocalization syllables
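SyllableDataset is constructed from filenames and sylls_per_file, which suggests a flat index over fixed-size files. The sketch below shows only that index arithmetic, under the assumption that every file holds exactly sylls_per_file syllables; the class name `FlatIndex` and the filenames are hypothetical. A full Dataset would subclass torch.utils.data.Dataset, read the spectrogram at the located position (for example with h5py), and apply transform.

```python
class FlatIndex:
    """Hypothetical mapping from a global syllable index to (filename, position in file)."""

    def __init__(self, filenames, sylls_per_file):
        self.filenames = list(filenames)
        self.sylls_per_file = sylls_per_file

    def __len__(self):
        return len(self.filenames) * self.sylls_per_file

    def locate(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        file_idx, within = divmod(index, self.sylls_per_file)
        return self.filenames[file_idx], within

idx = FlatIndex(["syllables_0000.hdf5", "syllables_0001.hdf5"], sylls_per_file=100)
print(len(idx))         # 200
print(idx.locate(150))  # ('syllables_0001.hdf5', 50)
```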

ava.models.vae_dataset.get_syllable_data_loaders(partition, batch_size=64, shuffle=(True, False), num_workers=4)

Return a pair of DataLoaders given a test/train split.

Parameters:

partition (dictionary) – Test/train split: a dictionary that maps the keys 'test' and 'train' to disjoint lists of .hdf5 filenames containing syllables.
batch_size (int, optional) – Batch size of the returned DataLoaders. Defaults to 64.
shuffle (tuple of bools, optional) – Whether to shuffle data for the train and test DataLoaders, respectively. Defaults to (True, False).
num_workers (int, optional) – How many subprocesses to use for data loading. Defaults to 4.

Returns:	dataloaders – Dictionary mapping two keys, `'test'` and `'train'`, to respective torch.utils.data.DataLoader objects.
Return type:	dictionary
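The shape of the return value can be illustrated with stand-in datasets. The helper `make_loaders` below is hypothetical; it only shows how the shuffle tuple maps onto the train and test DataLoaders. It uses `num_workers=0` to keep the example in a single process, whereas the documented default is 4.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_loaders(datasets, batch_size=64, shuffle=(True, False), num_workers=0):
    """Hypothetical helper mirroring the documented return value."""
    return {
        "train": DataLoader(datasets["train"], batch_size=batch_size,
                            shuffle=shuffle[0], num_workers=num_workers),
        "test": DataLoader(datasets["test"], batch_size=batch_size,
                           shuffle=shuffle[1], num_workers=num_workers),
    }

# Stand-in datasets of zero-filled 128 x 128 "spectrograms".
datasets = {
    "train": TensorDataset(torch.zeros(100, 128, 128)),
    "test": TensorDataset(torch.zeros(20, 128, 128)),
}
loaders = make_loaders(datasets, batch_size=32)
print(len(loaders["train"]), len(loaders["test"]))  # 4 1
```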

ava.models.vae_dataset.get_syllable_partition(dirs, split, shuffle=True, max_num_files=None)

Partition the filenames into a random test/train split.

Parameters:

dirs (list of strings) – List of directories containing saved syllable hdf5 files.
split (float) – Portion of the hdf5 files to use for training, \(0 < \mathtt{split} \leq 1.0\).
shuffle (bool, optional) – Whether to shuffle the hdf5 files. Defaults to True.
max_num_files ({int, None}, optional) – Maximum total number of files in the train and test partitions combined. If `None`, all files are used. Defaults to `None`.

Returns:	partition – Contains two keys, `'test'` and `'train'`, that map to lists of hdf5 files. Defines the random test/train split.
Return type:	dict
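A minimal sketch of such a split over an already-collected list of filenames follows. `partition_files` is hypothetical; collecting the hdf5 files from dirs is omitted, and truncating to max_num_files before splitting is an assumption consistent with max_num_files bounding the combined total.

```python
import random

def partition_files(filenames, split, shuffle=True, max_num_files=None, seed=None):
    """Hypothetical train/test split over file names."""
    if not 0.0 < split <= 1.0:
        raise ValueError("split must satisfy 0 < split <= 1")
    files = sorted(filenames)  # deterministic starting order
    if shuffle:
        random.Random(seed).shuffle(files)
    if max_num_files is not None:
        files = files[:max_num_files]
    n_train = int(round(split * len(files)))
    return {"train": files[:n_train], "test": files[n_train:]}

names = [f"syllables_{i:04d}.hdf5" for i in range(10)]
partition = partition_files(names, split=0.8, seed=0)
print(len(partition["train"]), len(partition["test"]))  # 8 2
```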

ava.models.window_vae_dataset module

Methods for feeding randomly sampled spectrogram data to the shotgun VAE.

Meant to be used with ava.models.vae.VAE.

TO DO

replace affinewarp with ava.preprocessing.warping

ava.models.window_vae_dataset.DEFAULT_WARP_PARAMS = {'l2_reg_scale': 1e-07, 'n_knots': 0, 'smoothness_reg_scale': 0.1, 'warp_reg_scale': 0.01}

Default time-warping parameters sent to affinewarp.

class ava.models.window_vae_dataset.FixedWindowDataset(audio_filenames, roi_filenames, p, transform=None, dataset_length=2048, min_spec_val=None)

Bases: torch.utils.data.Dataset

write_hdf5_files(save_dir, num_files=500, sylls_per_file=100)

Write hdf5 files containing spectrograms of random audio chunks.

Write to multiple directories.

Note

This should be consistent with ava.preprocessing.preprocess.process_sylls.

Parameters:

save_dir (str) – Directory to save hdf5s in.
num_files (int, optional) – Number of files to save. Defaults to `500`.
sylls_per_file (int, optional) – Number of syllables in each file. Defaults to `100`.

class ava.models.window_vae_dataset.WarpedWindowDataset(audio_filenames, p, transform=None, dataset_length=2048, load_warp=False, save_warp=True, start_q=-0.1, stop_q=1.1, warp_fn=None, warp_params={}, warp_type='spectrogram')

Bases: torch.utils.data.Dataset

get_specific_item(query_filename, quantile)

Return a specific window of birdsong as a NumPy array.

Parameters:

query_filename (str) – Audio filename.
quantile (float) – 0 <= `quantile` <= 1

Returns:	spec – Spectrogram.
Return type:	numpy.ndarray

get_whole_warped_spectrogram(query_filename, time_bins=128)

Get an entire warped song motif.

Parameters:

query_filename (str) – Which audio file to use.
time_bins (int, optional) – Number of time bins. Defaults to `128`.

Returns:	spec – Spectrogram.
Return type:	numpy.ndarray

write_hdf5_files(save_dir, num_files=400, sylls_per_file=100)

Write hdf5 files containing spectrograms of random audio chunks.

Note

This should be consistent with ava.preprocessing.preprocess.process_sylls.

Add the option to also write segments. This could be useful for noise removal.

Parameters:

save_dir (str) – Where to write.
num_files (int, optional) – Number of files to write. Defaults to 400.
sylls_per_file (int, optional) – Number of spectrograms to write per file. Defaults to 100.

ava.models.window_vae_dataset.get_fixed_window_data_loaders(partition, p, batch_size=64, shuffle=(True, False), num_workers=4, min_spec_val=None)

Get DataLoaders for training and testing: fixed-duration shotgun VAE

Parameters:

partition (dict) – Output of `ava.models.window_vae_dataset.get_window_partition`.
p (dict) – Preprocessing parameters. Must contain keys: …
batch_size (int, optional) – DataLoader batch size. Defaults to `64`.
shuffle (tuple of bool, optional) – Whether to shuffle train and test sets, respectively. Defaults to `(True, False)`.
num_workers (int, optional) – Number of CPU workers to feed data to the network. Defaults to `4`.

Returns:	loaders – Maps the keys `'train'` and `'test'` to their respective DataLoaders.
Return type:	dict

ava.models.window_vae_dataset.get_warped_window_data_loaders(audio_dirs, p, batch_size=64, num_workers=4, load_warp=False, warp_fn=None, warp_params={}, warp_type='spectrogram')

Get DataLoaders for training and testing: warped shotgun VAE

Warning

Audio files must all be the same duration! You can use segmenting.utils.write_segments_to_audio to extract audio from song segments, writing them as separate .wav files.

Add a train/test split!

Parameters:

audio_dirs (list of str) – Audio directories.
p (dict) – Preprocessing parameters. Must contain keys: `'window_length'`, `'nperseg'`, `'noverlap'`, `'min_freq'`, `'max_freq'`, `'spec_min_val'`, and `'spec_max_val'`.
batch_size (int, optional) – DataLoader batch size. Defaults to `64`.
num_workers (int, optional) – Number of CPU workers to retrieve data for the model. Defaults to `4`.
load_warp (bool, optional) – Whether to load a previously saved time warping result. Defaults to `False`.
warp_fn ({str, None}, optional) – Where the x-knots and y-knots should be saved and loaded. Defaults to `None`.
warp_params (dict, optional) – Parameters passed to affinewarp. Defaults to `{}`.
warp_type ({`'amplitude'`, `'spectrogram'`, `'null'`}, optional) – Whether to time-warp using amplitude traces, full spectrograms, or not warp at all. Defaults to `'spectrogram'`.

Returns:	loaders – Maps the keys `'train'` and `'test'` to their respective DataLoaders.
Return type:	dict
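Since every audio file must have the same duration, it can help to check this before building the loaders. The sketch below uses Python's standard-library `wave` module and therefore assumes uncompressed PCM WAV files; `check_equal_durations` and its helpers are hypothetical, not part of the package.

```python
import os
import tempfile
import wave

def wav_duration(filename):
    """Duration of a WAV file in seconds, read from its header."""
    with wave.open(filename, "rb") as f:
        return f.getnframes() / f.getframerate()

def check_equal_durations(filenames, tol=1e-6):
    """Return the shared duration, or raise ValueError if any file differs."""
    if not filenames:
        raise ValueError("no audio files given")
    durations = {fn: wav_duration(fn) for fn in filenames}
    ref = durations[filenames[0]]
    bad = [fn for fn, dur in durations.items() if abs(dur - ref) > tol]
    if bad:
        raise ValueError(f"durations differ from {ref:.6f} s: {bad}")
    return ref

def write_silence(filename, seconds, rate=44100):
    """Write a mono 16-bit WAV file of silence (test fixture)."""
    with wave.open(filename, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(b"\x00\x00" * int(seconds * rate))

mismatch_error = None
with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (os.path.join(tmp, name) for name in ("a.wav", "b.wav", "c.wav"))
    write_silence(a, 0.5)
    write_silence(b, 0.5)
    write_silence(c, 0.25)
    duration = check_equal_durations([a, b])  # 0.5
    try:
        check_equal_durations([a, b, c])
    except ValueError as err:
        mismatch_error = str(err)
```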

ava.models.window_vae_dataset.get_window_partition(audio_dirs, roi_dirs, split=0.8, shuffle=True, exclude_empty_roi_files=True)

Get a train/test split for fixed-duration shotgun VAE.

Parameters:

audio_dirs (list of str) – Audio directories.
roi_dirs (list of str) – ROI (segment) directories.
split (float, optional) – Train/test split. Defaults to `0.8`, indicating an 80/20 train/test split.
shuffle (bool, optional) – Whether to shuffle at the audio file level. Defaults to `True`.
exclude_empty_roi_files (bool, optional) – Defaults to `True`.

Returns:	partition – Defines the test/train split. The keys `'test'` and `'train'` each map to a dictionary with keys `'audio'` and `'rois'`, which both map to numpy arrays containing filenames.
Return type:	dict
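The nested structure of the returned partition can be illustrated with NumPy arrays of filenames. The sketch assumes exactly one ROI file per audio file, paired by sorted filename order, and uses hypothetical `.wav`/`.txt` names; it omits directory scanning and the exclude_empty_roi_files filtering. `window_partition` is a hypothetical helper, not the package's function.

```python
import numpy as np

def window_partition(audio_files, roi_files, split=0.8, shuffle=True, seed=None):
    """Hypothetical sketch: split paired audio/ROI filenames at the file level."""
    audio = np.array(sorted(audio_files))
    rois = np.array(sorted(roi_files))
    if len(audio) != len(rois):
        raise ValueError("each audio file needs exactly one ROI file")
    order = np.arange(len(audio))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)  # same permutation for both arrays
    n_train = int(round(split * len(order)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return {
        "train": {"audio": audio[train_idx], "rois": rois[train_idx]},
        "test": {"audio": audio[test_idx], "rois": rois[test_idx]},
    }

audio_files = [f"song_{i:03d}.wav" for i in range(5)]
roi_files = [f"song_{i:03d}.txt" for i in range(5)]
partition = window_partition(audio_files, roi_files, split=0.8, seed=0)
print(len(partition["train"]["audio"]), len(partition["test"]["audio"]))  # 4 1
```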

Module contents

AVA models module

Contains

ava.models.vae: Defines the variational autoencoder (VAE).
ava.models.vae_dataset: Feeds syllable data to the VAE.
ava.models.window_vae_dataset: Feeds random data to the (shotgun) VAE.