ava.models package
Submodules
ava.models.vae module
A Variational Autoencoder (VAE) for spectrogram data.
VAE References
[1] Kingma, Diederik P., and Max Welling. “Auto-encoding variational Bayes.” arXiv preprint arXiv:1312.6114 (2013).
[2] Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. “Stochastic backpropagation and approximate inference in deep generative models.” arXiv preprint arXiv:1401.4082 (2014).
-
class ava.models.vae.VAE(save_dir='', lr=0.001, z_dim=32, model_precision=10.0, device_name='auto')
Bases: torch.nn.Module
Variational Autoencoder class for single-channel images.
-
save_dir – Directory where the model is saved. Defaults to ''.
Type: str, optional
-
lr – Model learning rate. Defaults to 1e-3.
Type: float, optional
-
z_dim – Latent dimension. Defaults to 32.
Type: int, optional
-
model_precision – Precision of the observation model. Defaults to 10.0.
Type: float, optional
-
device_name – Name of the device to train the model on. When 'auto' is passed, 'cuda' is chosen if torch.cuda.is_available(), otherwise 'cpu' is chosen. Defaults to 'auto'.
Type: {'cpu', 'cuda', 'auto'}, optional
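The 'auto' rule for device_name can be sketched as a small pure function. This is a minimal sketch, not the class's own code: resolve_device_name and its cuda_available flag are hypothetical, rejecting other names is an assumption, and in practice the flag would come from torch.cuda.is_available().

```python
def resolve_device_name(device_name="auto", cuda_available=False):
    """Map a device_name argument to 'cpu' or 'cuda' (hypothetical helper).

    cuda_available stands in for torch.cuda.is_available(), so this sketch
    runs without PyTorch installed.
    """
    if device_name == "auto":
        return "cuda" if cuda_available else "cpu"
    # Assumption: names other than 'cpu' and 'cuda' are rejected.
    if device_name not in ("cpu", "cuda"):
        raise ValueError(f"unsupported device_name: {device_name!r}")
    return device_name


print(resolve_device_name("auto", cuda_available=False))
print(resolve_device_name("auto", cuda_available=True))
```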
Notes
The model is trained to maximize the standard ELBO objective:
\[\mathcal{L} = \mathbb{E}_{q(z|x)} \left[ \log p(x,z) \right] + \mathbb{H}[q(z|x)]\]
where \(p(x,z) = p(z)p(x|z)\) and \(\mathbb{H}\) is differential entropy. The prior \(p(z)\) is a unit spherical normal distribution. The conditional distribution \(p(x|z)\) is set as a spherical normal distribution to prevent overfitting. The variational distribution \(q(z|x)\) is a multivariate normal distribution whose covariance is a rank-1 matrix plus a diagonal matrix (see encode). Both \(q(z|x)\) and \(p(x|z)\) are parameterized by neural networks. Gradients are passed through the stochastic layers via the reparameterization trick, implemented with PyTorch's rsample method.
The dimensions of the network are hard-coded for use with 128 x 128 spectrograms. Although a desired latent dimension can be passed to __init__, the network dimensions limit the practical range of values to roughly 8 to 64 dimensions. Changing the image dimensions requires updating the parameters of the layers defined in _build_network.
-
decode(z) Compute \(p(x|z)\).
\[p(x|z) = \mathcal{N}(\mu, \Lambda^{-1})\]
\[\Lambda = \mathtt{model\_precision} \cdot I\]
where \(\mu\) is a deterministic function of z, \(\Lambda\) is a precision matrix, and \(I\) is the identity matrix.
Parameters: z (torch.Tensor) – Batch of latent samples with shape [batch_size, self.z_dim]
Returns: x – Batch of means \(\mu\), described above. Shape: [batch_size, X_DIM=128*128]
Return type: torch.Tensor
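Because \(\Lambda = \mathtt{model\_precision} \cdot I\), the log-likelihood of a flattened spectrogram under this observation model reduces to a scaled squared error plus a constant. The NumPy sketch below evaluates that closed form; log_p_x_given_z is a hypothetical helper, not part of ava, and it assumes x and the decoded mean are both flattened to [batch_size, X_DIM].

```python
import numpy as np


def log_p_x_given_z(x, mu, model_precision=10.0):
    """Log-density of x under N(mu, I / model_precision), summed over pixels.

    x, mu: arrays of shape [batch_size, x_dim] (flattened spectrograms).
    Returns an array of shape [batch_size].
    """
    x_dim = x.shape[-1]
    sq_err = np.sum((x - mu) ** 2, axis=-1)
    # -0.5 * lambda * ||x - mu||^2 + 0.5 * D * log(lambda / (2 * pi))
    return -0.5 * model_precision * sq_err + 0.5 * x_dim * np.log(model_precision / (2.0 * np.pi))


rng = np.random.default_rng(0)
x = rng.random((4, 128 * 128))
mu = x + 0.01 * rng.standard_normal(x.shape)
print(log_p_x_given_z(x, mu).shape)
```

A larger model_precision weights reconstruction error more heavily relative to the prior and entropy terms of the ELBO.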
-
encode(x) Compute \(q(z|x)\).
\[q(z|x) = \mathcal{N}(\mu, \Sigma)\]
\[\Sigma = u u^{T} + \mathtt{diag}(d)\]
where \(\mu\), \(u\), and \(d\) are deterministic functions of x and \(\Sigma\) denotes a covariance matrix.
Parameters: x (torch.Tensor) – The input images, with shape [batch_size, height=128, width=128]
Returns:
- mu (torch.Tensor) – Posterior mean, with shape [batch_size, self.z_dim]
- u (torch.Tensor) – Posterior covariance factor, as defined above. Shape: [batch_size, self.z_dim]
- d (torch.Tensor) – Posterior diagonal factor, as defined above. Shape: [batch_size, self.z_dim]
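For a covariance of the form \(u u^{T} + \mathtt{diag}(d)\), a reparameterized sample needs only one extra scalar noise draw per example, and the entropy term of the ELBO has a closed form via the matrix determinant lemma. The NumPy sketch below illustrates both; sample_q and entropy_q are hypothetical helpers written for this illustration, not ava functions, and they assume d is strictly positive. In PyTorch, torch.distributions.LowRankMultivariateNormal(mu, u.unsqueeze(-1), d) represents the same distribution.

```python
import numpy as np


def sample_q(mu, u, d, rng):
    """Draw one reparameterized sample per row from N(mu, u u^T + diag(d)).

    mu, u, d: arrays of shape [batch_size, z_dim], with d > 0.
    """
    eps_rank1 = rng.standard_normal((mu.shape[0], 1))  # one scalar per row scales u
    eps_diag = rng.standard_normal(mu.shape)           # independent noise for diag(d)
    # Cov(u * e1 + sqrt(d) * e2) = u u^T + diag(d); given the noise, z is a
    # deterministic function of (mu, u, d), which is what the trick requires.
    return mu + u * eps_rank1 + np.sqrt(d) * eps_diag


def entropy_q(u, d):
    """Differential entropy of N(mu, u u^T + diag(d)) for each row."""
    z_dim = u.shape[-1]
    # Matrix determinant lemma: det(diag(d) + u u^T) = det(diag(d)) * (1 + u^T diag(d)^{-1} u)
    log_det = np.sum(np.log(d), axis=-1) + np.log1p(np.sum(u ** 2 / d, axis=-1))
    return 0.5 * (z_dim * (1.0 + np.log(2.0 * np.pi)) + log_det)
```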
-
forward(x, return_latent_rec=False) Send x round trip and compute a loss.
In more detail: given x, compute \(q(z|x)\) and sample \(\hat{z} \sim q(z|x)\). Then compute \(\log p(x|\hat{z})\), the log-likelihood of the input x given the latent sample \(\hat{z}\). We also need the likelihood of \(\hat{z}\) under the model's prior, \(p(\hat{z})\), and the entropy of the latent conditional distribution, \(\mathbb{H}[q(z|x)]\). The ELBO can then be estimated as:
\[\frac{1}{N} \sum_{i=1}^N \left( \mathbb{E}_{\hat{z} \sim q(z|x_i)} \left[ \log p(x_i,\hat{z}) \right] + \mathbb{H}[q(z|x_i)] \right)\]
where \(N\) denotes the number of samples from the data distribution and the inner expectation is estimated using a single latent sample, \(\hat{z}\). In practice, the outer average over the data distribution is estimated using minibatches.
Parameters:
- x (torch.Tensor) – A batch of samples from the data distribution (spectrograms). Shape: [batch_size, height=128, width=128]
- return_latent_rec (bool, optional) – Whether to return latent means and reconstructions. Defaults to False.
Returns:
- loss (torch.Tensor) – Negative ELBO times the batch size. Shape: []
- latent (numpy.ndarray, if return_latent_rec) – Latent means. Shape: [batch_size, self.z_dim]
- reconstructions (numpy.ndarray, if return_latent_rec) – Reconstructed means. Shape: [batch_size, height=128, width=128]
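Given per-example values of the three terms above, the returned loss is the negative single-sample ELBO summed over the batch, i.e. the batch size times the negative mean ELBO. The NumPy sketch below shows that bookkeeping, including the unit-normal prior; neg_elbo_times_batch is a hypothetical helper, and it assumes log_lik and q_entropy have already been computed per example.

```python
import numpy as np


def neg_elbo_times_batch(log_lik, z_hat, q_entropy):
    """Single-sample negative ELBO, summed over a batch.

    log_lik:   [batch_size] values of log p(x_i | z_hat_i)
    z_hat:     [batch_size, z_dim] latent samples
    q_entropy: [batch_size] values of H[q(z | x_i)]
    """
    z_dim = z_hat.shape[1]
    # Unit spherical normal prior: log N(z; 0, I)
    log_prior = -0.5 * np.sum(z_hat ** 2, axis=1) - 0.5 * z_dim * np.log(2.0 * np.pi)
    elbo = log_lik + log_prior + q_entropy  # per-example ELBO estimate
    return -np.sum(elbo)                    # equals -batch_size * mean(elbo)


loss = neg_elbo_times_batch(np.array([1.0, 2.0]), np.zeros((2, 3)), np.array([0.5, 0.5]))
print(loss)
```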
-
get_latent(loader) Get latent means for all syllables in the given loader.
Parameters: loader (torch.utils.data.DataLoader) – ava.models.vae_dataset.SyllableDataset DataLoader.
Returns: latent – Latent means. Shape: [len(loader.dataset), self.z_dim]
Return type: numpy.ndarray
Note
- Make sure your loader is not set to shuffle if you're going to match these with labels or other fields later.
-
load_state(filename) Load all the model parameters from the given .tar file.
The .tar file should be written by self.save_state.
Parameters: filename (str) – File containing a model state.
Note
- self.lr, self.save_dir, and self.z_dim are not loaded.
-
test_epoch(test_loader) Test the model on a held-out test set and return an ELBO estimate.
Parameters: test_loader (torch.utils.data.DataLoader) – ava.models.vae_dataset.SyllableDataset DataLoader for the test set.
Returns: elbo – An unbiased estimate of the ELBO, estimated using samples from test_loader.
Return type: float
-
train_epoch(train_loader) Train the model for a single epoch.
Parameters: train_loader (torch.utils.data.DataLoader) – ava.models.vae_dataset.SyllableDataset DataLoader for the training set.
Returns: elbo – A biased estimate of the ELBO, estimated using samples from train_loader.
Return type: float
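Since forward returns the negative ELBO times the batch size, one way to turn per-batch losses into a per-example ELBO estimate is to sum them and divide by the number of examples seen. The sketch below assumes this aggregation; elbo_from_batch_losses is a hypothetical helper, not ava's code. A plausible reason the training estimate is biased is that the parameters are updated between batches and fit to the very samples being scored, whereas the test estimate uses fixed parameters on held-out data.

```python
def elbo_from_batch_losses(batch_losses, num_examples):
    """Average per-example ELBO from losses of the form -ELBO * batch_size.

    batch_losses: iterable of per-batch loss values (floats).
    num_examples: total number of examples across those batches.
    """
    total_loss = float(sum(batch_losses))
    return -total_loss / num_examples


# Two batches of sizes 64 and 36 whose mean per-example ELBOs are -100 and -110.
print(elbo_from_batch_losses([64 * 100.0, 36 * 110.0], 100))
```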
-
train_loop(loaders, epochs=100, test_freq=2, save_freq=10, vis_freq=1) Train the model for multiple epochs, testing and saving along the way.
Parameters:
- loaders (dictionary) – Dictionary mapping the keys 'test' and 'train' to respective torch.utils.data.DataLoader objects.
- epochs (int, optional) – Number of (possibly additional) epochs to train the model for. Defaults to 100.
- test_freq (int, optional) – Testing is performed every test_freq epochs. Defaults to 2.
- save_freq (int, optional) – The model is saved every save_freq epochs. Defaults to 10.
- vis_freq (int, optional) – Syllable reconstructions are plotted every vis_freq epochs. Defaults to 1.
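The frequency arguments interact as a simple per-epoch schedule. The sketch below is only an illustration under stated assumptions: epoch_schedule and start_epoch are hypothetical, it assumes each action fires on epochs divisible by its frequency, that counting continues from the current epoch (reflecting "possibly additional" epochs), and that None disables an action. The exact offsets ava uses may differ.

```python
def epoch_schedule(start_epoch, epochs=100, test_freq=2, save_freq=10, vis_freq=1):
    """List (epoch, actions) pairs for a train loop (illustrative assumption).

    Assumes an action with frequency f runs on epochs where epoch % f == 0;
    a frequency of None disables that action.
    """
    schedule = []
    for epoch in range(start_epoch, start_epoch + epochs):
        actions = ["train"]
        if test_freq is not None and epoch % test_freq == 0:
            actions.append("test")
        if save_freq is not None and epoch % save_freq == 0:
            actions.append("save")
        if vis_freq is not None and epoch % vis_freq == 0:
            actions.append("visualize")
        schedule.append((epoch, actions))
    return schedule


for epoch, actions in epoch_schedule(start_epoch=9, epochs=3, vis_freq=5):
    print(epoch, actions)
```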
-
visualize(loader, num_specs=5, gap=(2, 6), save_filename='reconstruction.pdf') Plot spectrograms and their reconstructions.
Spectrograms are chosen at random from the DataLoader's dataset.
Parameters:
- loader (torch.utils.data.DataLoader) – Spectrogram DataLoader.
- num_specs (int, optional) – Number of spectrogram pairs to plot. Defaults to 5.
- gap (int or tuple of two ints, optional) – The vertical and horizontal gap between images, in pixels. Defaults to (2, 6).
- save_filename (str, optional) – Where to save the plot, relative to self.save_dir. Defaults to 'reconstruction.pdf'.
Returns:
- specs (numpy.ndarray) – Spectrograms from loader.
- rec_specs (numpy.ndarray) – Corresponding spectrogram reconstructions.
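One way to lay out the pairs with the given gap is to tile the originals in one row and the reconstructions in the row below, padding with blank pixels. This NumPy sketch is an assumption about the layout, not ava's plotting code; tile_pairs is hypothetical, and an int gap is expanded to equal vertical and horizontal gaps.

```python
import numpy as np


def tile_pairs(specs, rec_specs, gap=(2, 6)):
    """Tile spectrograms (top row) above their reconstructions (bottom row).

    specs, rec_specs: arrays of shape [num_specs, height, width].
    gap: (vertical, horizontal) gap in pixels, or one int for both.
    Gap pixels are NaN, which matplotlib's imshow leaves unfilled by default.
    """
    if isinstance(gap, int):
        gap = (gap, gap)
    v_gap, h_gap = gap
    n, h, w = specs.shape
    canvas = np.full((2 * h + v_gap, n * w + (n - 1) * h_gap), np.nan)
    for i in range(n):
        col = i * (w + h_gap)
        canvas[:h, col:col + w] = specs[i]
        canvas[h + v_gap:, col:col + w] = rec_specs[i]
    return canvas


rng = np.random.default_rng(0)
specs = rng.random((5, 128, 128))
canvas = tile_pairs(specs, specs * 0.9)
print(canvas.shape)
```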
-
ava.models.vae.X_DIM = 16384 – Processed spectrogram dimension: freq_bins * time_bins
-
ava.models.vae.X_SHAPE = (128, 128) – Processed spectrogram shape: [freq_bins, time_bins]
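decode returns flattened means of length X_DIM, while forward reports reconstructions with shape [batch_size, 128, 128]; the two are related by a reshape over X_SHAPE. A minimal NumPy sketch, assuming row-major (C-order) flattening as used by PyTorch's view and reshape; unflatten is a hypothetical helper:

```python
import numpy as np

X_SHAPE = (128, 128)             # [freq_bins, time_bins]
X_DIM = X_SHAPE[0] * X_SHAPE[1]  # 16384


def unflatten(flat):
    """Reshape [batch_size, X_DIM] means into [batch_size, freq_bins, time_bins]."""
    return flat.reshape(-1, *X_SHAPE)


flat = np.zeros((3, X_DIM))
print(unflatten(flat).shape)
```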
ava.models.vae_dataset module
Methods for feeding syllable data to the VAE.
Meant to be used with ava.models.vae.VAE.
-
class ava.models.vae_dataset.SyllableDataset(filenames, sylls_per_file, transform=None)
Bases: torch.utils.data.Dataset
torch.utils.data.Dataset for animal vocalization syllables.
-
ava.models.vae_dataset.get_syllable_data_loaders(partition, batch_size=64, shuffle=(True, False), num_workers=4) Return a pair of DataLoaders given a test/train split.
Parameters:
- partition (dictionary) – Test/train split: a dictionary that maps the keys 'test' and 'train' to disjoint lists of .hdf5 filenames containing syllables.
- batch_size (int, optional) – Batch size of the returned DataLoaders. Defaults to 64.
- shuffle (tuple of bools, optional) – Whether to shuffle data for the train and test DataLoaders, respectively. Defaults to (True, False).
- num_workers (int, optional) – How many subprocesses to use for data loading. Defaults to 4.
Returns: dataloaders – Dictionary mapping two keys, 'test' and 'train', to respective torch.utils.data.DataLoader objects.
Return type: dictionary
-
ava.models.vae_dataset.get_syllable_partition(dirs, split, shuffle=True, max_num_files=None) Partition the filenames into a random test/train split.
Parameters:
- dirs (list of strings) – List of directories containing saved syllable hdf5 files.
- split (float) – Fraction of the hdf5 files to use for training, \(0 < \mathtt{split} \leq 1.0\)
- shuffle (bool, optional) – Whether to shuffle the hdf5 files. Defaults to True.
- max_num_files ({int, None}, optional) – The total number of files in the train and test partitions is at most max_num_files. If None, all files are used. Defaults to None.
Returns: partition – Contains two keys, 'test' and 'train', that map to lists of hdf5 files. Defines the random test/train split.
Return type: dict
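The file-level split described for get_syllable_partition can be sketched over a flat list of filenames. This is an illustrative sketch, not ava's implementation: partition_filenames and its seed argument are hypothetical, it takes filenames directly rather than directories, and it assumes truncation to max_num_files happens after shuffling and that the train count is rounded to the nearest integer.

```python
import random


def partition_filenames(filenames, split, shuffle=True, max_num_files=None, seed=None):
    """Split filenames into disjoint 'train' and 'test' lists.

    split is the fraction used for training (0 < split <= 1).
    """
    if not 0.0 < split <= 1.0:
        raise ValueError("split must satisfy 0 < split <= 1")
    files = sorted(filenames)  # deterministic starting order
    if shuffle:
        random.Random(seed).shuffle(files)
    if max_num_files is not None:
        files = files[:max_num_files]  # train + test together <= max_num_files
    n_train = int(round(split * len(files)))
    return {"train": files[:n_train], "test": files[n_train:]}


names = [f"syllables_{i:04d}.hdf5" for i in range(10)]
partition = partition_filenames(names, split=0.8, seed=42)
print(len(partition["train"]), len(partition["test"]))
```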
ava.models.window_vae_dataset module
Methods for feeding randomly sampled spectrogram data to the shotgun VAE.
Meant to be used with ava.models.vae.VAE.
TO DO
- Replace affinewarp with ava.preprocessing.warping.
-
ava.models.window_vae_dataset.DEFAULT_WARP_PARAMS = {'l2_reg_scale': 1e-07, 'n_knots': 0, 'smoothness_reg_scale': 0.1, 'warp_reg_scale': 0.01} – Default time-warping parameters sent to affinewarp.
-
class ava.models.window_vae_dataset.FixedWindowDataset(audio_filenames, roi_filenames, p, transform=None, dataset_length=2048, min_spec_val=None)
-
write_hdf5_files(save_dir, num_files=500, sylls_per_file=100) Write hdf5 files containing spectrograms of random audio chunks.
- Write to multiple directories.
Note
- This should be consistent with ava.preprocessing.preprocess.process_sylls.
Parameters:
- save_dir (str) – Directory to save hdf5s in.
- num_files (int, optional) – Number of files to save. Defaults to 500.
- sylls_per_file (int, optional) – Number of syllables in each file. Defaults to 100.
-
class ava.models.window_vae_dataset.WarpedWindowDataset(audio_filenames, p, transform=None, dataset_length=2048, load_warp=False, save_warp=True, start_q=-0.1, stop_q=1.1, warp_fn=None, warp_params={}, warp_type='spectrogram')
-
get_specific_item(query_filename, quantile) Return a specific window of birdsong as a NumPy array.
Parameters:
- query_filename (str) – Audio filename.
- quantile (float) – 0 <= quantile <= 1
Returns: spec – Spectrogram.
Return type: numpy.ndarray
-
get_whole_warped_spectrogram(query_filename, time_bins=128) Get an entire warped song motif.
Parameters:
- query_filename (str) – Which audio file to use.
- time_bins (int, optional) – Number of time bins. Defaults to 128.
Returns: spec – Spectrogram.
Return type: numpy.ndarray
-
write_hdf5_files(save_dir, num_files=400, sylls_per_file=100) Write hdf5 files containing spectrograms of random audio chunks.
Note
- This should be consistent with ava.preprocessing.preprocess.process_sylls.
- Add the option to also write segments. This could be useful for noise removal.
Parameters:
- save_dir (str) – Where to write.
- num_files (int, optional) – Number of files to write. Defaults to 400.
- sylls_per_file (int, optional) – Number of spectrograms to write per file. Defaults to 100.
-
ava.models.window_vae_dataset.get_fixed_window_data_loaders(partition, p, batch_size=64, shuffle=(True, False), num_workers=4, min_spec_val=None) Get DataLoaders for training and testing: fixed-duration shotgun VAE.
Parameters:
- partition (dict) – Output of ava.models.window_vae_dataset.get_window_partition.
- p (dict) – Preprocessing parameters. Must contain keys: …
- batch_size (int, optional) – DataLoader batch size. Defaults to 64.
- shuffle (tuple of bool, optional) – Whether to shuffle train and test sets, respectively. Defaults to (True, False).
- num_workers (int, optional) – Number of CPU workers to feed data to the network. Defaults to 4.
Returns: loaders – Maps the keys 'train' and 'test' to their respective DataLoaders.
Return type: dict
-
ava.models.window_vae_dataset.get_warped_window_data_loaders(audio_dirs, p, batch_size=64, num_workers=4, load_warp=False, warp_fn=None, warp_params={}, warp_type='spectrogram') Get DataLoaders for training and testing: warped shotgun VAE.
Warning
- Audio files must all be the same duration! You can use segmenting.utils.write_segments_to_audio to extract audio from song segments, writing them as separate .wav files.
- Add a train/test split!
Parameters:
- audio_dirs (list of str) – Audio directories.
- p (dict) – Preprocessing parameters. Must contain keys: 'window_length', 'nperseg', 'noverlap', 'min_freq', 'max_freq', 'spec_min_val', and 'spec_max_val'.
- batch_size (int, optional) – DataLoader batch size. Defaults to 64.
- num_workers (int, optional) – Number of CPU workers to retrieve data for the model. Defaults to 4.
- load_warp (bool, optional) – Whether to load a previously saved time-warping result. Defaults to False.
- warp_fn ({str, None}, optional) – Where the x-knots and y-knots should be saved and loaded. Defaults to None.
- warp_params (dict, optional) – Parameters passed to affinewarp. Defaults to {}.
- warp_type ({'amplitude', 'spectrogram', 'null'}, optional) – Whether to time-warp using amplitude traces, full spectrograms, or not warp at all. Defaults to 'spectrogram'.
Returns: loaders – Maps the keys 'train' and 'test' to their respective DataLoaders.
Return type: dict
-
ava.models.window_vae_dataset.get_window_partition(audio_dirs, roi_dirs, split=0.8, shuffle=True, exclude_empty_roi_files=True) Get a train/test split for the fixed-duration shotgun VAE.
Parameters:
- audio_dirs (list of str) – Audio directories.
- roi_dirs (list of str) – ROI (segment) directories.
- split (float, optional) – Train/test split. Defaults to 0.8, indicating an 80/20 train/test split.
- shuffle (bool, optional) – Whether to shuffle at the audio file level. Defaults to True.
- exclude_empty_roi_files (bool, optional) – Defaults to True.
Returns: partition – Defines the test/train split. The keys 'test' and 'train' each map to a dictionary with keys 'audio' and 'rois', which both map to numpy arrays containing filenames.
Return type: dict
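Because get_window_partition splits at the audio-file level, each audio file must stay paired with its ROI file. The NumPy sketch below shows one way to do that with a single shared permutation. window_partition, is_empty_roi, and seed are hypothetical names, and treating exclude_empty_roi_files as "drop pairs whose ROI file is empty" is an assumption based on the parameter name.

```python
import numpy as np


def window_partition(audio_fns, roi_fns, split=0.8, shuffle=True, is_empty_roi=None, seed=None):
    """Split paired audio/ROI filenames into train and test sets.

    audio_fns[i] must correspond to roi_fns[i]. is_empty_roi, if given, is a
    callable that flags ROI files to drop (hypothetical stand-in for
    exclude_empty_roi_files).
    """
    audio_fns = np.asarray(audio_fns, dtype=str)
    roi_fns = np.asarray(roi_fns, dtype=str)
    if len(audio_fns) != len(roi_fns):
        raise ValueError("audio and ROI lists must have the same length")
    if is_empty_roi is not None:
        keep = np.array([not is_empty_roi(fn) for fn in roi_fns], dtype=bool)
        audio_fns, roi_fns = audio_fns[keep], roi_fns[keep]
    order = np.arange(len(audio_fns))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)  # one permutation keeps pairs aligned
    n_train = int(round(split * len(order)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return {
        "train": {"audio": audio_fns[train_idx], "rois": roi_fns[train_idx]},
        "test": {"audio": audio_fns[test_idx], "rois": roi_fns[test_idx]},
    }


audio = [f"song_{i}.wav" for i in range(10)]
rois = [f"song_{i}.txt" for i in range(10)]
part = window_partition(audio, rois, seed=0, is_empty_roi=lambda fn: fn == "song_3.txt")
print(len(part["train"]["audio"]), len(part["test"]["audio"]))
```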