fifteen.experiments._experiment
Simple experiment manager for managing metadata, logs, and checkpoints.
Source: https://github.com/brentyi/dfgo/blob/master/lib/experiment_files.py
Module Contents
Classes
We define an "experiment" as a simple directory, where files associated with some run of a training script are co-located.
Attributes
- fifteen.experiments._experiment.T
- fifteen.experiments._experiment.PytreeType
- fifteen.experiments._experiment.Pytree
- class fifteen.experiments._experiment.Experiment
We define an “experiment” as a simple directory, where files associated with some run of a training script are co-located.
There’s very little real code here; instead we use a common experiment data directory to implement thin wrappers around:
- flax.training.checkpoints for checkpointing.
- PyYAML and tyro for serializing metadata.
- tensorboardX.SummaryWriter for logging.
- data_dir: pathlib.Path
- verbose: bool = True
- write_metadata(self, name: str, object: Any) -> None
Serialize an object as a yaml file, then save it to the experiment’s metadata directory. Includes special handling for dataclasses (via tyro).
- read_metadata(self, name: str, expected_type: Type[T]) -> T
Load an object from the experiment’s metadata directory. Includes special handling for dataclasses (via tyro).
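The round trip these two methods describe can be sketched with the standard library alone. This is not the library's implementation: it swaps YAML and tyro for `json` and `dataclasses.asdict`, and the names `TrainConfig`, `write_metadata_sketch`, and `read_metadata_sketch`, along with the one-file-per-name layout, are hypothetical.

```python
import dataclasses
import json
import pathlib
import tempfile
from typing import Any, Type, TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class TrainConfig:
    # Hypothetical metadata object, used only for illustration.
    learning_rate: float
    batch_size: int


def write_metadata_sketch(metadata_dir: pathlib.Path, name: str, obj: Any) -> None:
    """Serialize `obj` to `<metadata_dir>/<name>.json` (JSON stands in for YAML)."""
    metadata_dir.mkdir(parents=True, exist_ok=True)
    # Special handling for dataclass instances: convert them to plain dicts first.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        obj = dataclasses.asdict(obj)
    (metadata_dir / f"{name}.json").write_text(json.dumps(obj, indent=2))


def read_metadata_sketch(
    metadata_dir: pathlib.Path, name: str, expected_type: Type[T]
) -> T:
    """Load `<metadata_dir>/<name>.json` and return it as `expected_type`."""
    raw = json.loads((metadata_dir / f"{name}.json").read_text())
    if dataclasses.is_dataclass(expected_type):
        # Rebuild the dataclass from its serialized fields.
        return expected_type(**raw)
    assert isinstance(raw, expected_type)
    return raw


with tempfile.TemporaryDirectory() as tmp:
    metadata_dir = pathlib.Path(tmp) / "metadata"
    config = TrainConfig(learning_rate=1e-3, batch_size=32)
    write_metadata_sketch(metadata_dir, "train_config", config)
    restored = read_metadata_sketch(metadata_dir, "train_config", TrainConfig)
    assert restored == config
```

Passing `expected_type` explicitly is what lets the reader rebuild a typed object rather than returning a bare dictionary.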
- save_checkpoint(self, target: Pytree, step: int, prefix: str = 'checkpoint_', keep: int = 1, overwrite: bool = False, keep_every_n_steps: int | None = None) -> str
Thin wrapper around flax’s save_checkpoint() function. Returns a file name, as a string.
- restore_checkpoint(self, target: PytreeType, step: int | None = None, prefix: str = 'checkpoint_') -> PytreeType
Thin wrapper around flax’s restore_checkpoint() function.
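As a rough illustration of how `step`, `prefix`, and `keep` interact, here is a standard-library stand-in. It assumes checkpoints are stored as files named `prefix` followed by the step number, that `keep` (at least 1) retains only the most recent files, and that `step=None` restores the latest one. It uses `json` rather than flax's serialization, ignores `overwrite` and `keep_every_n_steps`, and the `_sketch` function names are hypothetical.

```python
import json
import pathlib
import tempfile
from typing import Any, List, Optional


def _steps(ckpt_dir: pathlib.Path, prefix: str) -> List[int]:
    """Return the sorted step numbers of all checkpoint files in `ckpt_dir`."""
    return sorted(
        int(p.name[len(prefix):])
        for p in ckpt_dir.glob(f"{prefix}*")
        if p.name[len(prefix):].isdigit()
    )


def save_checkpoint_sketch(
    ckpt_dir: pathlib.Path,
    target: Any,
    step: int,
    prefix: str = "checkpoint_",
    keep: int = 1,
) -> str:
    """Write `target` to `<prefix><step>` and prune older checkpoints."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"{prefix}{step}"
    path.write_text(json.dumps(target))
    # Drop everything except the `keep` most recent checkpoints.
    for old_step in _steps(ckpt_dir, prefix)[:-keep]:
        (ckpt_dir / f"{prefix}{old_step}").unlink()
    return str(path)


def restore_checkpoint_sketch(
    ckpt_dir: pathlib.Path,
    target: Any,
    step: Optional[int] = None,
    prefix: str = "checkpoint_",
) -> Any:
    """Load the checkpoint at `step`, or the latest one when `step` is None."""
    steps = _steps(ckpt_dir, prefix)
    if not steps:
        # Nothing saved yet: hand back the caller's target unchanged.
        return target
    chosen = steps[-1] if step is None else step
    return json.loads((ckpt_dir / f"{prefix}{chosen}").read_text())


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = pathlib.Path(tmp) / "checkpoints"
    untouched = restore_checkpoint_sketch(ckpt_dir, target={"step": -1})
    for step in (100, 200, 300):
        last_path = save_checkpoint_sketch(ckpt_dir, {"step": step}, step, keep=2)
    remaining = _steps(ckpt_dir, "checkpoint_")
    latest = restore_checkpoint_sketch(ckpt_dir, target={})
```

With `keep=2`, saving at steps 100, 200, and 300 leaves only the last two files on disk, and restoring without a `step` picks the newest.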
- log(self, log_data: fifteen.experiments.TensorboardLogData, step: int, log_scalars_every_n: int = 1, log_histograms_every_n: int = 1)
Logging helper for Tensorboard.
For TensorboardLogData instances returned from pmap-transformed functions, see TensorboardLogData.fix_sharded_scalars().
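One plausible reading of the `log_scalars_every_n` and `log_histograms_every_n` parameters is that values are written only on steps divisible by the interval. The sketch below illustrates that assumption with a plain list in place of a Tensorboard writer; `log_sketch` and `written` are hypothetical names, and the actual gating rule should be checked against the source.

```python
from typing import Dict, List, Tuple


def log_sketch(
    written: List[Tuple[str, int]],
    scalars: Dict[str, float],
    step: int,
    log_scalars_every_n: int = 1,
) -> None:
    """Record (tag, step) pairs only on steps divisible by the interval."""
    if step % log_scalars_every_n == 0:
        for tag in scalars:
            written.append((tag, step))


written: List[Tuple[str, int]] = []
for step in range(6):
    # Called on every training step; the helper decides whether to write.
    log_sketch(written, {"loss": 1.0 / (step + 1)}, step, log_scalars_every_n=2)
```

Under this reading, the training loop can call the helper unconditionally and leave throttling to the interval arguments.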
- assert_new(self) -> Experiment
Makes sure that there are no existing checkpoints, logs, or metadata. Returns self.
- assert_exists(self) -> Experiment
Makes sure that there are existing checkpoints, logs, or metadata. Returns self.
- clear(self) -> Experiment
Deletes self.data_dir. This clears all checkpoints, logs, and metadata inside of it. Returns self.
- move(self, new_data_dir: pathlib.Path) -> Experiment
Move all files corresponding to an experiment to a new location. Returns updated Experiment object.
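Because these four methods return an experiment object, they can be chained. The following standard-library sketch mimics that pattern under stated assumptions: `ExperimentSketch` is a hypothetical stand-in, "new" is taken to mean that the data directory is missing or empty, and `move` is modeled as returning a fresh object rather than mutating the old one.

```python
import dataclasses
import pathlib
import shutil
import tempfile


@dataclasses.dataclass(frozen=True)
class ExperimentSketch:
    data_dir: pathlib.Path

    def _has_files(self) -> bool:
        return self.data_dir.exists() and any(self.data_dir.iterdir())

    def assert_new(self) -> "ExperimentSketch":
        # Guard against accidentally overwriting an earlier run.
        assert not self._has_files(), f"{self.data_dir} already has files"
        return self

    def assert_exists(self) -> "ExperimentSketch":
        # Guard against evaluating a run that was never trained.
        assert self._has_files(), f"{self.data_dir} has no files"
        return self

    def clear(self) -> "ExperimentSketch":
        # Delete the whole data directory, if there is one.
        if self.data_dir.exists():
            shutil.rmtree(self.data_dir)
        return self

    def move(self, new_data_dir: pathlib.Path) -> "ExperimentSketch":
        # Relocate the files, then return an object pointing at the new path.
        shutil.move(str(self.data_dir), str(new_data_dir))
        return dataclasses.replace(self, data_dir=new_data_dir)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    exp = ExperimentSketch(root / "run_a").clear().assert_new()
    exp.data_dir.mkdir(parents=True)
    (exp.data_dir / "notes.txt").write_text("hello")
    exp.assert_exists()
    moved = exp.move(root / "run_b")
    moved_ok = moved.assert_exists().data_dir.name == "run_b"
    old_gone = not exp.data_dir.exists()
```

A training script would typically start with the `assert_new` guard (optionally after `clear`), while an evaluation script would start with `assert_exists`.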