fannypack.data

Google Drive Downloads

fannypack.data.download_drive_file(url: str, target_path: str, chunk_size=32768) → None

Download a file via a public Google Drive url.

Example usage:

from fannypack.data import download_drive_file

download_drive_file(
    "https://drive.google.com/file/d/1AsY9Cs3xE0RSlr0FKlnSKHp6zIwFSvXe/view",
    "/home/brent/Downloads/test.pdf"
)
Parameters
  • url (str) – Google Drive url.

  • target_path (str) – Destination to write to.
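
The chunk_size argument in the signature is not described above. One plausible reading is that it is the number of bytes written per iteration while the download is streamed to disk. The sketch below illustrates that pattern using only the standard library. It is not fannypack's implementation: copy_in_chunks is a hypothetical helper, and an in-memory stream stands in for the HTTP response.

```python
import io
import os
import tempfile


def copy_in_chunks(source, target_path: str, chunk_size: int = 32768) -> int:
    """Copy a binary stream to target_path, chunk_size bytes at a time.

    Hypothetical helper for illustration; not part of fannypack.
    Returns the total number of bytes written.
    """
    total = 0
    with open(target_path, "wb") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means the stream is exhausted
                break
            target.write(chunk)
            total += len(chunk)
    return total


# An in-memory stream stands in for the HTTP response body.
payload = b"x" * 100_000
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.bin")
    written = copy_in_chunks(io.BytesIO(payload), path)
    size_on_disk = os.path.getsize(path)
```

Writing in fixed-size chunks keeps memory use bounded regardless of how large the downloaded file is.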

fannypack.data.cached_drive_file(name: str, url: str) → str

Return a local path to a file from Google Drive. Downloads the file if it doesn’t exist yet locally.

By default, cached files live in ~/.cache/fannypack-drive-files/. It often makes sense to move this directory (e.g. to an NFS share); see fannypack.data.set_cache_path().

Parameters
  • name (str) – Name of the cached file, e.g. secret_key.pem.

  • url (str) – Google Drive URL, e.g. https://drive.google.com/file/d/1AsY9Cs3xE0RSlr0FKlnSKHp6zIwFSvXe/view.

Returns

str – Local path to file.
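
The download-if-missing behavior described above can be sketched with the standard library alone. This is an illustration of the pattern, not fannypack's code: cached_file, fake_fetch and the cache_dir argument are hypothetical names, and fake_fetch stands in for a real Google Drive download.

```python
import os
import tempfile
from typing import Callable


def cached_file(name: str, fetch: Callable[[str], None], cache_dir: str) -> str:
    """Return a local path for name, calling fetch(path) only if it is missing.

    Hypothetical stand-in for the download-if-missing pattern; not fannypack code.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        fetch(path)
    return path


fetch_calls = []


def fake_fetch(path: str) -> None:
    # Stands in for a real download: records the call and writes a small file.
    fetch_calls.append(path)
    with open(path, "wb") as f:
        f.write(b"contents")


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_file("secret_key.pem", fake_fetch, cache_dir)
    second = cached_file("secret_key.pem", fake_fetch, cache_dir)  # cache hit
```

The second call returns the same path without invoking fake_fetch again, which is the property that makes a shared cache directory worthwhile.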

fannypack.data.set_cache_path(path: str)

Set the cache location for fannypack.data.cached_drive_file().

Parameters

path (str) – New location for cached files. Defaults to ~/.cache/fannypack-drive-files/.

HDF5 for Trajectories

class fannypack.data.TrajectoriesFile(*args, **kwds)

Bases: collections.abc.Iterable, typing.Generic

An interface for reading/writing trajectories via h5py.

Each TrajectoriesFile represents an iterable list of trajectories, where trajectories are stored as dictionaries that map str keys to np.ndarray contents.

Example usage (read):

from fannypack.data import TrajectoriesFile

with TrajectoriesFile('test.hdf5') as traj_file:
    for traj in traj_file:
        print(traj.keys()) # list of keys
        print(traj['some-key-name']) # numpy array

Example usage (write):

from fannypack.data import TrajectoriesFile

traj_file = TrajectoriesFile('test.hdf5', read_only=False)

traj_file.add_meta({'label': 5})
traj_file.add_timestep({'a': 1, 'b': 2})
traj_file.add_timestep({'a': 3, 'b': 4})

with traj_file:
    traj_file.complete_trajectory()

print(len(traj_file)) # 1 trajectory!

with traj_file:
    print(traj_file[0]['label']) # 5
    print(traj_file[0]['a']) # [1, 3]
    print(traj_file[0]['b']) # [2, 4]

Note that some operations, namely those that interface with the filesystem, need to be called within a with statement.

Parameters
  • path (str) – File path for this trajectory file.

  • convert_doubles (bool) – Convert doubles to floats to shrink files.

  • read_only (bool, optional) – Open file in read-only mode.

  • compress (bool, optional) – Reduce file size with gzip compression.

  • verbose (bool, optional) – Enable debug prints.

abandon_trajectory() → None

Abandon the current trajectory.

add_meta(content: Dict[str, numpy.ndarray]) → None

Add some metadata to the current trajectory.

Parameters

content (dict) – Map from metadata keys (str) to values (np.ndarray).

add_timestep(content: Dict[str, numpy.ndarray]) → None

Add a timestep to the current trajectory.

Parameters

content (dict) – Map from timestep keys (str) to values (np.ndarray).

clear() → None

Clear the contents of the TrajectoriesFile.

complete_trajectory() → None

Write the current trajectory to disk, and mark the start of a new trajectory. Must be called inside a with statement on the TrajectoriesFile object.

The next call to add_timestep() will be time 0 of the next trajectory.
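
The lifecycle of a trajectory (add metadata and timesteps, then complete or abandon) can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the library's implementation: InMemoryTrajectories is a hypothetical class, it stores plain Python lists rather than np.ndarray, and it never touches disk or h5py.

```python
from typing import Any, Dict, List


class InMemoryTrajectories:
    """Hypothetical in-memory stand-in for the trajectory lifecycle.

    Uses plain lists instead of np.ndarray and never writes to disk.
    """

    def __init__(self) -> None:
        self._completed: List[Dict[str, Any]] = []
        self._meta: Dict[str, Any] = {}
        self._timesteps: Dict[str, List[Any]] = {}

    def add_meta(self, content: Dict[str, Any]) -> None:
        self._meta.update(content)

    def add_timestep(self, content: Dict[str, Any]) -> None:
        # Append each value to the per-key sequence of the current trajectory.
        for key, value in content.items():
            self._timesteps.setdefault(key, []).append(value)

    def abandon_trajectory(self) -> None:
        # Drop everything buffered for the current trajectory.
        self._meta, self._timesteps = {}, {}

    def complete_trajectory(self) -> None:
        self._completed.append({**self._meta, **self._timesteps})
        self.abandon_trajectory()  # reset buffers: next timestep is time 0

    def __len__(self) -> int:
        return len(self._completed)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        return self._completed[index]


trajs = InMemoryTrajectories()
trajs.add_timestep({"a": 0, "b": 0})
trajs.abandon_trajectory()  # discarded, never stored

trajs.add_meta({"label": 5})
trajs.add_timestep({"a": 1, "b": 2})
trajs.add_timestep({"a": 3, "b": 4})
trajs.complete_trajectory()
```

In this sketch, the abandoned timestep leaves no trace, and the completed trajectory holds one sequence per timestep key alongside its metadata.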

get_all(key: str) → list

Get contents associated with a key from all trajectories.

Parameters

key (str) – Content identifier.

Returns

list – List of contents. First index is trajectory #.
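
The shape of the returned list can be illustrated with plain dictionaries. The function below is a hypothetical free-standing sketch of the described behavior, not the method's actual implementation, and it uses Python lists in place of np.ndarray.

```python
from typing import Any, Dict, List


def get_all(trajectories: List[Dict[str, Any]], key: str) -> List[Any]:
    """Collect the contents stored under key, one entry per trajectory."""
    return [traj[key] for traj in trajectories]


# Two trajectories of different lengths, stored as dictionaries.
trajectories = [
    {"label": 5, "a": [1, 3]},
    {"label": 7, "a": [2, 4, 6]},
]

labels = get_all(trajectories, "label")
a_values = get_all(trajectories, "a")
second_a = a_values[1]  # first index selects the trajectory
```

Because the first index is the trajectory number, entries may have different lengths when trajectories contain different numbers of timesteps.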

resize(count: int)

Expand or contract our TrajectoriesFile.