Experiment Management


class fannypack.utils.Buddy(experiment_name: str, model: Optional[torch.nn.modules.module.Module] = None, *, checkpoint_dir: str = 'checkpoints', checkpoint_max_to_keep: int = 5, metadata_dir: str = 'metadata', log_dir: str = 'logs', optimizer_type: str = 'adam', optimizer_checkpoint_interval: float = 300, optimizer_names: Optional[List[str]] = None, verbose: bool = True, cpu_only: bool = False)

Bases: fannypack.utils._buddy_include._checkpointing._BuddyCheckpointing, fannypack.utils._buddy_include._optimizer._BuddyOptimizer, fannypack.utils._buddy_include._logging._BuddyLogging, fannypack.utils._buddy_include._metadata._BuddyMetadata

Buddy is a model manager that abstracts away PyTorch boilerplate. Buddy helps with…

  • Creating/using/managing optimizers,

  • Checkpointing (models + optimizers),

  • Namespaced/scoped Tensorboard logging,

  • Saving human-readable metadata files.

Parameters
  • experiment_name (str) – Name for the current model/experiment. Should not contain hyphens.

  • model (torch.nn.Module) – PyTorch model to work with.

Keyword Arguments
  • checkpoint_dir (str, optional) – Path to save checkpoints into. Defaults to "checkpoints".

  • checkpoint_max_to_keep (int, optional) – Number of auto-saved checkpoints to keep. Set to None to keep all. Defaults to 5.

  • metadata_dir (str, optional) – Path to save metadata YAML files into. Defaults to "metadata".

  • log_dir (str, optional) – Path to save Tensorboard log files into. Defaults to "logs".

  • optimizer_type (str, optional) – Optimizer type to use: "adam" or "adadelta". Defaults to "adam".

  • optimizer_checkpoint_interval (float, optional) – How often to auto-checkpoint, as an interval in seconds. Time is computed from the first call to minimize(). Set to 0 to disable. Defaults to 300.

  • verbose (bool, optional) – Flag for toggling debug messages. Defaults to True.

  • cpu_only (bool, optional) – Set to True to turn off auto-detection of CUDA support. Defaults to False.
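The interaction between optimizer_checkpoint_interval and checkpoint_max_to_keep can be sketched in plain Python. This is an illustrative stand-in under stated assumptions, not Buddy's actual implementation: the class name AutoCheckpointSchedule, its step() method, and the injectable clock are all hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class AutoCheckpointSchedule:
    """Hypothetical helper: decides when an auto-save is due and which to keep."""

    def __init__(
        self,
        interval: float = 300,
        max_to_keep: Optional[int] = 5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval = interval
        self.max_to_keep = max_to_keep
        self._clock = clock
        self._last_save: Optional[float] = None  # set on the first step
        self.kept: Deque[str] = deque()  # labels of retained auto-saves

    def step(self, label: str) -> bool:
        """Call once per optimizer step; returns True if an auto-save is due."""
        now = self._clock()
        if self._last_save is None:
            # Timing starts at the first step, mirroring "the first call to minimize()".
            self._last_save = now
            return False
        if self.interval == 0 or now - self._last_save < self.interval:
            # An interval of 0 disables auto-saving entirely.
            return False
        self._last_save = now
        self.kept.append(label)
        # Drop the oldest auto-saves beyond the limit; None keeps everything.
        while self.max_to_keep is not None and len(self.kept) > self.max_to_keep:
            self.kept.popleft()
        return True
```

Passing a fake clock makes the timing behavior easy to check without waiting for real time to elapse.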

attach_model(model: torch.nn.modules.module.Module) None

Attach a model to our Buddy, and move it onto buddy.device.

If a model isn’t passed to the constructor’s model argument, attach_model should be called before any optimization, checkpointing, etc. happens.


Parameters
  • model (nn.Module) – Model to attach.

property device: torch.device

Read-only interface for the active torch device. Auto-detected in the constructor based on CUDA support.

property model: torch.nn.modules.module.Module

Read-only interface for the attached model. Raises an error if no model is attached.


class fannypack.utils._buddy_include._checkpointing._BuddyCheckpointing(checkpoint_dir: str, checkpoint_max_to_keep: int)

Bases: abc.ABC

Buddy’s model checkpointing interface.

property checkpoint_labels: List[str]

Accessor for listing available checkpoint labels. Checkpoints are expected to be saved as experiment_name-label.ckpt in the checkpoint_dir directory.


Returns
  • List[str] – Checkpoint labels, sorted alphabetically.
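A minimal sketch of the experiment_name-label.ckpt naming convention follows. Both helper functions are hypothetical and only mirror the documented filename pattern. The last line shows one plausible reason experiment names should not contain hyphens: the separator becomes ambiguous.

```python
from typing import Iterable, List


def checkpoint_filename(experiment_name: str, label: str) -> str:
    """Build a filename following the documented pattern (hypothetical helper)."""
    if "-" in experiment_name:
        raise ValueError("experiment_name should not contain hyphens")
    return f"{experiment_name}-{label}.ckpt"


def labels_from_filenames(filenames: Iterable[str], experiment_name: str) -> List[str]:
    """Recover the labels belonging to one experiment, sorted alphabetically."""
    prefix = f"{experiment_name}-"
    suffix = ".ckpt"
    return sorted(
        name[len(prefix):-len(suffix)]
        for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


files = ["mnist-final.ckpt", "mnist-best.ckpt", "other-best.ckpt"]
labels = labels_from_filenames(files, "mnist")  # only the "mnist" checkpoints

# A file named "a-b-c.ckpt" parses as experiment "a" with label "b-c",
# even if it was meant to be experiment "a-b" with label "c".
ambiguous = labels_from_filenames(["a-b-c.ckpt"], "a")
```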

load_checkpoint(label: Optional[str] = None, path: Optional[str] = None, experiment_name: Optional[str] = None) None

Loads a checkpoint. By default, loads the one with the highest number of training iterations.

Can also be specified via a label or file path.
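The default selection rule can be sketched as follows, assuming the step count recorded with each checkpoint is available as a mapping from label to steps. That mapping and the pick_checkpoint helper are assumptions made for illustration; how Buddy actually stores and reads step counts is not described here.

```python
from typing import Dict, Optional


def pick_checkpoint(steps_by_label: Dict[str, int], label: Optional[str] = None) -> str:
    """Return the requested label, or the one with the most training steps."""
    if not steps_by_label:
        raise ValueError("no checkpoints available")
    if label is not None:
        # An explicit label takes precedence over the default rule.
        if label not in steps_by_label:
            raise KeyError(label)
        return label
    # Default: the checkpoint with the highest number of training iterations.
    return max(steps_by_label, key=lambda name: steps_by_label[name])


available = {"warmup": 1_000, "midway": 50_000, "final": 120_000}
default_choice = pick_checkpoint(available)
explicit_choice = pick_checkpoint(available, label="midway")
```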

load_checkpoint_module(source: str, target: Optional[str] = None, label: Optional[str] = None, path: Optional[str] = None, experiment_name: Optional[str] = None) None

Loads parameters from a specific child module within a checkpoint. By default, loads the checkpoint with the highest number of training iterations.

Can also be specified via a label or file path.

load_checkpoint_optimizer(source: str, target: Optional[str] = None, label: Optional[str] = None, path: Optional[str] = None, experiment_name: Optional[str] = None) None

Loads state associated with a specific optimizer from a checkpoint. By default, loads the checkpoint with the highest number of training iterations.

Can also be specified via a label or file path.

load_checkpoint_optimizers(label=None, path=None, experiment_name=None) None

Loads all optimizer settings from a checkpoint. By default, loads the checkpoint with the highest number of training iterations.

Can also be specified via a label or file path.

save_checkpoint(label: Optional[str] = None) None

Saves a checkpoint, which can optionally be labeled.


class fannypack.utils._buddy_include._optimizer._BuddyOptimizer(optimizer_type: str, optimizer_checkpoint_interval: float)

Bases: abc.ABC

Buddy’s optimization interface.

get_learning_rate(optimizer_name: str = 'primary') float

Gets an optimizer learning rate.

minimize(loss: torch.Tensor, optimizer_name: str = 'primary', *, retain_graph: bool = False, checkpoint_interval: Optional[float] = None, clip_grad_max_norm: Optional[float] = None) None

Compute gradients and use them to minimize a loss function.

property optimizer_steps: int

Read-only interface for the number of steps taken by the optimizer.

set_default_learning_rate(value: Union[float, Callable[[int], float]]) None

Sets a default learning rate for new optimizers.

set_learning_rate(value: Union[float, Callable[[int], float]], optimizer_name: str = 'primary') None

Sets an optimizer learning rate. Accepts either a floating point learning rate or a schedule function (int steps -> float LR).
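A schedule function takes the number of optimizer steps and returns a learning rate. The sketch below shows a linear warmup followed by step decay. The function name and all constants are illustrative choices, not defaults of the library.

```python
def warmup_then_decay(steps: int) -> float:
    """Example schedule: linear warmup, then halve the rate every 10,000 steps."""
    base_lr = 1e-3        # illustrative peak learning rate
    warmup_steps = 1_000  # illustrative warmup length
    if steps < warmup_steps:
        # Ramp up linearly so the very first step uses a small, nonzero rate.
        return base_lr * (steps + 1) / warmup_steps
    return base_lr * 0.5 ** ((steps - warmup_steps) // 10_000)


# With a Buddy instance, the schedule would be passed in place of a float:
# buddy.set_learning_rate(warmup_then_decay)
```

Because the schedule is an ordinary function of the step count, it can be unit-tested without constructing a model or optimizer.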

Tensorboard Logging

class fannypack.utils._buddy_include._logging._BuddyLogging(log_dir: str)

Bases: abc.ABC

Buddy’s TensorBoard logging interface.

log_grad_histogram(scope: str = 'grad') None

Log model gradients into a histogram. Should be called after buddy.minimize().

Naming: with scope set to “grad”, a parameter name “model.Linear.bias” will be logged to the tag buddy.log_scope_prefix("grad/model/Linear/bias").


Parameters
  • scope (str, optional) – Scope to log gradients into. Defaults to “grad”.
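The naming rule can be sketched as a small helper. histogram_tag is a hypothetical name, not part of fannypack. Per the description above, the real method additionally passes the result through buddy.log_scope_prefix().

```python
def histogram_tag(parameter_name: str, scope: str = "grad") -> str:
    """Turn a dotted parameter name into a slash-separated tag under a scope."""
    return f"{scope}/" + parameter_name.replace(".", "/")


grad_tag = histogram_tag("model.Linear.bias")
weights_tag = histogram_tag("model.Linear.bias", scope="weights")
```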

log_image(name: str, image: Union[torch.Tensor, numpy.ndarray], dataformats: str = 'CHW') None

Convenience function for logging an image tensor for visualization in TensorBoard.

Equivalent to a call to buddy.log_writer.add_image() with the tag buddy.log_scope_prefix(name).

Parameters
  • name (str) – Identifier for Tensorboard.

  • image (torch.Tensor or np.ndarray) – Image to log.

  • dataformats (str, optional) – Dimension ordering. Defaults to “CHW”.

log_parameters_histogram(scope: str = 'weights', *, ignore_zero_grad: bool = True) None

Log model weights into a histogram.

Naming: with scope set to “weights”, a parameter name “model.Linear.bias” will be logged to the tag buddy.log_scope_prefix("weights/model/Linear/bias").

Parameters
  • scope (str, optional) – Scope to log weights into. Defaults to “weights”.

  • ignore_zero_grad (bool, optional) – Ignore parameters without gradients: decreases log sizes when only parts of models are being trained. Defaults to True.

log_scalar(name: str, value: Union[torch.Tensor, numpy.ndarray, float]) None

Convenience function for logging a scalar for visualization in TensorBoard.

Equivalent to a call to buddy.log_writer.add_scalar() with the tag buddy.log_scope_prefix(name).

Parameters
  • name (str) – Identifier for Tensorboard.

  • value (torch.Tensor, np.ndarray, or float) – Value to log.

log_scope(scope: str) Generator[None, None, None]

Returns a context manager that scopes log names.

Example usage:

with buddy.log_scope("scope"):
    # Logs to scope/loss
    buddy.log_scalar("loss", loss_tensor)

Parameters
  • scope (str) – Name of scope.

log_scope_pop(scope: Optional[str] = None) None

Pop a scope we logged tensors into. See log_scope_push().


Parameters
  • scope (str, optional) – Name of scope. Needs to be the top one in the stack.

log_scope_prefix(name: str = '') str

Get or apply the current log scope prefix.

Example usage:

print(buddy.log_scope_prefix()) # ""

with buddy.log_scope("scope0"):
    print(buddy.log_scope_prefix("loss")) # "scope0/loss"

    with buddy.log_scope("scope1"):
        print(buddy.log_scope_prefix()) # "scope0/scope1/"

Parameters
  • name (str, optional) – Name to prepend a prefix to. Defaults to an empty string.


Returns
  • str – Scoped log name, or scope prefix if input is empty.

log_scope_push(scope: str) None

Push a scope to log tensors into.

Example usage:


buddy.log_scope_push("scope")

# Logs to scope/loss
buddy.log_scalar("loss", loss_tensor)

buddy.log_scope_pop("scope") # name parameter is optional

Parameters
  • scope (str) – Name of scope.

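The scoping methods in this section behave like a stack of names joined with slashes. The sketch below reproduces the documented prefix behavior in plain Python. ScopeStack and its method names are hypothetical stand-ins, not the library's implementation.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class ScopeStack:
    """Hypothetical stand-in for Buddy's log scope handling."""

    def __init__(self) -> None:
        self._scopes: List[str] = []

    def push(self, scope: str) -> None:
        self._scopes.append(scope)

    def pop(self, scope: Optional[str] = None) -> None:
        if not self._scopes:
            raise IndexError("no scope to pop")
        # If a name is given, it must match the top of the stack.
        if scope is not None and self._scopes[-1] != scope:
            raise ValueError(f"{scope!r} is not the top scope")
        self._scopes.pop()

    def prefix(self, name: str = "") -> str:
        # With no active scope, names pass through unchanged.
        if not self._scopes:
            return name
        return "/".join(self._scopes) + "/" + name

    @contextmanager
    def scoped(self, scope: str) -> Iterator[None]:
        # Context-manager form: push on entry, always pop on exit.
        self.push(scope)
        try:
            yield
        finally:
            self.pop(scope)


scopes = ScopeStack()
with scopes.scoped("scope0"):
    outer = scopes.prefix("loss")
    with scopes.scoped("scope1"):
        inner = scopes.prefix()
after = scopes.prefix()
```

Wrapping the push and pop in try/finally keeps the stack balanced even if the body raises, which is the main reason to prefer the context-manager form over manual push and pop calls.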
property log_writer: torch.utils.tensorboard.writer.SummaryWriter

Accessor for standard Tensorboard SummaryWriter. Instantiated lazily.

Experiment Metadata

class fannypack.utils._buddy_include._metadata._BuddyMetadata(metadata_dir: str)

Bases: abc.ABC

Buddy’s experiment metadata management interface.

add_metadata(content: Dict[str, Any]) None

Add human-readable metadata for this experiment. Input should be a dictionary that is merged with existing metadata.

load_metadata(experiment_name: Optional[str] = None, metadata_dir: Optional[str] = None, path: Optional[str] = None, _write=True) None

Read existing metadata file. Note that metadata is loaded automatically: this only needs to be called if loading across experiments.

Overwrites existing metadata.

property metadata: Dict[str, Any]

Read-only interface for experiment metadata.

property metadata_path: str

Read-only path to this experiment’s metadata file.

set_metadata(content: Dict[str, Any]) None

Assign human-readable metadata for this experiment. Input should be a dictionary that replaces existing metadata.
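The difference between merging (add_metadata) and replacing (set_metadata) can be sketched with a plain dictionary. MetadataStore is a hypothetical class. Treating the merge as a shallow dict.update() is an assumption, and writing the YAML file is omitted.

```python
from typing import Any, Dict


class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata interface."""

    def __init__(self) -> None:
        self._metadata: Dict[str, Any] = {}

    def add(self, content: Dict[str, Any]) -> None:
        # Merge: existing keys are kept unless overwritten (shallow merge assumed).
        self._metadata.update(content)

    def set(self, content: Dict[str, Any]) -> None:
        # Replace: any previous metadata is discarded.
        self._metadata = dict(content)

    @property
    def metadata(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the stored state directly.
        return dict(self._metadata)


store = MetadataStore()
store.add({"dataset": "mnist", "batch_size": 32})
store.add({"batch_size": 64})
merged = store.metadata
store.set({"notes": "fresh start"})
replaced = store.metadata
```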

Command-line Interface

Buddy’s CLI currently supports four primary functions:

  • buddy delete [experiment_name]: Delete an existing experiment. Displays a selection menu with metadata preview if no experiment name is passed in.

  • buddy info {experiment_name}: Print summary + metadata of an existing experiment.

  • buddy list: Print table of existing experiments + basic information.

  • buddy rename {source} {dest}: Rename an existing experiment.

For more details and a full list of options, run buddy {subcommand} --help.

The Buddy CLI also has full support for autocompleting experiment names. This needs to be registered in your .bashrc to be enabled:

# Append to .bashrc
eval "$(register-python-argcomplete buddy)"

Alternatively, for zsh:

# Append to .zshrc
autoload -U +X compinit && compinit
autoload -U +X bashcompinit && bashcompinit
eval "$(register-python-argcomplete buddy)"