io

Readers/Loaders and Writers/Savers for various file types and systems.

All (callable) classes are configured via a unified API. Switching the file system from local to, for example, remote object storage requires changing only a single argument, which eases transitions between development, staging, and production environments.

class Find(path='', storage=Storage.FILE, suffix='', max_depth=1, storage_kws=None)[source]

Bases: ArgRepr

List files by prefix and suffix on any supported filesystem.

Parameters:
  • path (str, optional) – Directory under which files should be discovered. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.

  • storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • suffix (str, optional) – The suffix to filter file names by. Defaults to an empty string, which will allow all and any suffixes.

  • max_depth (int, optional) – The maximum depth to descend into subdirectories. Defaults to 1. If set to None, all subdirectories will be visited recursively.

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

Raises:
  • TypeError – If path is not a string, max_depth is not an int, or if storage_kws is not a dictionary.

  • ValueError – If storage is not among the currently supported file-system schemes, max_depth is smaller than 1, or if storage_kws is not a dictionary.

See also

Storage

Notes

Avoid creating explicit “subfolders” on cloud object storage! Depending on the cloud service and its configuration, these might actually be 0-byte files with names ending in a trailing slash and would thus be included in the results of this class.

__call__(path='')[source]

List files matching the given criteria on any supported filesystem.

Parameters:

path (str) – Directory under which files should be discovered. If it starts with a forward slash, it will be interpreted as absolute; if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.

Returns:

The full paths to all files under the specified directory, filtered for their suffix (if any was given).

Return type:

list

Raises:

ValueError – If the final path is directly under root (e.g., “/file.suffix”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

property fs

Fresh fsspec file system on first use, same thereafter.

property prefix

File-system specific URI prefix.

class Copy(src_base='', tgt_base=None, src_storage=Storage.FILE, tgt_storage=None, overwrite=False, skip=False, chunk_size=32, src_kws=None, tgt_kws=None)[source]

Bases: ArgRepr

Efficiently copy a file from one location/filesystem to another.

Parameters:
  • src_base (str, optional) – Base folder or bucket of the file to read or indeed the full, absolute path to the file to read. Because it (or part of it) can also be given later at call time, it defaults to an empty string here.

  • tgt_base (str, optional) – Base folder or bucket of the file to write or indeed the full, absolute path to the file to write. Defaults to None, which will be resolved to src_base.

  • src_storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • tgt_storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to src_storage if not set. Use the Storage enum to avoid typos.

  • overwrite (bool, optional) – Whether to silently overwrite the destination file. Defaults to False, which will raise an exception if it already exists.

  • skip (bool, optional) – Whether to silently do nothing if the target file already exists. Defaults to False.

  • chunk_size (int, optional) – Chunk size to use when streaming the selected file in MiB. Defaults to 32 (MiB).

  • src_kws (dict, optional) – Passed on as keywords to the constructor of the source file system.

  • tgt_kws (dict, optional) – Passed on as keywords to the constructor of the target file system.

Raises:
  • TypeError – If path is not a string, chunk_size is not an integer or either src_kws or tgt_kws are not dictionaries.

  • ValueError – If storage is not among the currently supported file-system schemes, the chunk_size is smaller than 1 (MiB), or if either src_kws or tgt_kws are not dictionaries.

See also

Storage

__call__(path='')[source]

Efficiently copy a single file between file systems.

Parameters:

path (str, optional) – Full path or sub-folder relative to src_base and tgt_base of the file to copy.

Returns:

Full path to the target file.

Return type:

str

Raises:
  • FileExistsError – If the destination file already exists, skip is False and overwrite is also False.

  • ValueError – If the final path is directly under root (e.g., “/file.txt”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.
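How chunk_size, overwrite, and skip interact can be sketched for the local file system as follows. This is an illustration under assumptions, not the code behind Copy: copy_file is a hypothetical name, plain open() stands in for the fsspec file systems, and skip is assumed to take precedence when both skip and overwrite are True.

```python
import os

MIB = 1024 * 1024  # chunk_size is given in MiB


def copy_file(src: str, tgt: str, overwrite: bool = False,
              skip: bool = False, chunk_size: int = 32) -> str:
    """Stream `src` to `tgt` in chunks and return the target path."""
    if os.path.exists(tgt):
        if skip:
            return tgt  # assumption: skip wins and leaves the target untouched
        if not overwrite:
            raise FileExistsError(tgt)
    os.makedirs(os.path.dirname(tgt) or ".", exist_ok=True)
    chunk_bytes = chunk_size * MIB
    with open(src, "rb") as source, open(tgt, "wb") as target:
        # Move one chunk at a time so that memory use stays bounded.
        while chunk := source.read(chunk_bytes):
            target.write(chunk)
    return tgt
```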

property chunk_bytes

Bytes to flush from/to the file system in one go.

property src_fs

Fresh fsspec source file system on first use, same thereafter.

property tgt_fs

Fresh fsspec target file system on first use, same thereafter.
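The “fresh on first use, same thereafter” behavior of the file-system properties resembles a lazily cached attribute. One way to obtain such behavior in plain Python is functools.cached_property, shown in the sketch below. Whether the library uses this mechanism is an assumption, and FakeFileSystem as well as LazyHolder are hypothetical stand-ins.

```python
from functools import cached_property


class FakeFileSystem:
    """Hypothetical stand-in for an fsspec file system object."""

    instances = 0  # counts how many file systems were ever created

    def __init__(self, **kwargs):
        FakeFileSystem.instances += 1
        self.kwargs = kwargs


class LazyHolder:
    """Hypothetical class that creates its file system only when needed."""

    def __init__(self, storage_kws=None):
        self.storage_kws = dict(storage_kws or {})

    @cached_property
    def fs(self):
        # Runs once, on first access; the result is then cached on the instance.
        return FakeFileSystem(**self.storage_kws)
```

Deferring creation like this keeps instantiation cheap and avoids opening connections that are never used.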

class DataFrame2Parquet(path, storage=Storage.FILE, overwrite=False, skip=False, chunk_size=32, storage_kws=None, parquet_kws=None)[source]

Bases: Writer

Save a pandas or polars dataframe to any supported file system.

Parameters:
  • path (str) – The absolute path to the parquet file to save the dataframe into. May include two or more forward slashes (subdirectories will be created) and string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called.

  • storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • overwrite (bool, optional) – Whether to silently overwrite the destination file. Defaults to False, which will raise an exception if it already exists.

  • skip (bool, optional) – Whether to silently do nothing if the target file already exists. Defaults to False.

  • chunk_size (int, optional) – Chunk size to use when writing to the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keywords to the constructor of the file system.

  • parquet_kws (dict, optional) – Passed on as keyword arguments to the dataframe’s write method. See the documentation for to_parquet and write_parquet methods.

Raises:
  • TypeError – If path is not a string, chunk_size is not an integer or either storage_kws or parquet_kws are not dictionaries.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if either storage_kws or parquet_kws are not dictionaries.

See also

Storage

Note

Make sure you do a reset_index() before you save a pandas dataframe! Otherwise, you might have unexpected extra columns in the parquet file and potentially undesirable (if not unpredictable) behavior when you load it again.

__call__(df, *parts)[source]

Write a pandas or polars dataframe to a supported file system.

Parameters:
  • df (DataFrame) – The pandas or polars dataframe to save.

  • *parts (str) – Fragments that will be interpolated into the path given at instantiation. Obviously, there must be at least as many as there are placeholders in the path.

Returns:

An empty tuple.

Return type:

tuple

Raises:
  • IndexError – If the path given at instantiation has more string placeholders than there are parts.

  • FileExistsError – If the destination file already exists, skip is False and overwrite is also False.

  • ValueError – If the final path is directly under root (e.g., “/file.parquet”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.
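The documented placeholder behavior (pairs of curly brackets filled from *parts, IndexError if there are too few) matches that of Python's built-in str.format. The sketch below assumes, without confirmation from the source, that interpolation works this way; interpolate is a hypothetical helper.

```python
def interpolate(path: str, *parts: str) -> str:
    """Fill the curly-bracket placeholders in `path` with `parts`, in order."""
    return path.format(*parts)


template = "/data/{}/{}.parquet"
full = interpolate(template, "2024", "sales")

try:
    interpolate(template, "2024")  # only one part for two placeholders
except IndexError as error:
    message = str(error)  # str.format reports the missing positional argument
```

With str.format, surplus parts are silently ignored, which is consistent with the requirement of “at least as many” parts as placeholders.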

class Parquet2DataFrame(path='', storage=Storage.FILE, chunk_size=32, storage_kws=None, parquet_kws=None, bear=Bears.PANDAS)[source]

Bases: Reader

Read a parquet file from anywhere into a pandas or polars dataframe.

Parameters:
  • path (str, optional) – Directory under which the parquet file is located or full path to the parquet file. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.

  • storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • chunk_size (float, optional) – Chunk size to use when reading from the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

  • parquet_kws (dict, optional) – Passed on as keyword arguments to the dataframe’s read method. See the documentation for pandas.read_parquet and polars.read_parquet top-level functions.

  • bear (str, optional) – Type of dataframe to return. Can be one of “pandas” or “polars”. Use the Bears enum to avoid typos. Defaults to “pandas”.

Raises:
  • TypeError – If path is not a string, chunk_size is not an integer or either storage_kws or parquet_kws are not dictionaries.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if storage_kws is not a dictionary.

See also

Storage, Bears

__call__(path='')[source]

Read a specific parquet file from the specified file system.

Parameters:

path (str) – Path (including file name) to the parquet file to read. If it starts with a forward slash, it will be interpreted as absolute; if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.

Returns:

Pandas or polars dataframe.

Return type:

DataFrame

Raises:

ValueError – If the final path is directly under root (e.g., “/file.parquet”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

property read

Top-level read_parquet function of either pandas or polars.

class TomlWriter(path, storage=Storage.FILE, overwrite=False, skip=False, chunk_size=32, storage_kws=None, toml_kws=None, prune=False)[source]

Bases: Writer

Save a TOML file to any supported file system.

Parameters:
  • path (str) – The absolute path to the file to save the TOML into. May include two or more forward slashes (subdirectories will be created) and string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called.

  • storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • overwrite (bool, optional) – Whether to silently overwrite the destination file. Defaults to False, which will raise an exception if it already exists.

  • skip (bool, optional) – Whether to silently do nothing if the target file already exists. Defaults to False.

  • chunk_size (int, optional) – Chunk size to use when writing to the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keywords to the constructor of the file system.

  • prune (bool, optional) – Whether to silently drop non-string keys and None values from the dictionary-like object to save as TOML. Defaults to False.

  • toml_kws (dict, optional) – Passed on as keyword arguments to the dump() function of the tomli-w package (module tomli_w). See the tomli-w GitHub page for options.

Raises:
  • TypeError – If path is not a string, chunk_size is not an integer or either storage_kws or toml_kws are not dictionaries.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if either storage_kws or toml_kws are not dictionaries.

See also

Storage

__call__(toml, *parts)[source]

Serialize a dictionary-like object and write it to TOML file.

Parameters:
  • toml (dict) – The dictionary-like object to save as TOML.

  • *parts (str, optional) – Fragments that will be interpolated into the path string given at instantiation. Obviously, there must be at least as many as there are placeholders in the path.

Returns:

An empty tuple.

Return type:

tuple

Raises:
  • IndexError – If the path given at instantiation has more string placeholders than there are parts.

  • FileExistsError – If the destination file already exists, skip is False and overwrite is also False.

  • ValueError – If the final path is directly under root (e.g., “/file.toml”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

  • TypeError – If the dictionary-like object contains None values and prune is False.
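TOML has no null value and requires string keys, which is why None values lead to an error unless prune is set. What pruning could look like is sketched below. The helper name prune_toml is hypothetical, and both the recursion into nested dictionaries and the removal of None items from lists are assumptions rather than documented behavior.

```python
def prune_toml(obj):
    """Recursively drop non-string keys and None values (hypothetical helper)."""
    if isinstance(obj, dict):
        return {
            key: prune_toml(value)
            for key, value in obj.items()
            if isinstance(key, str) and value is not None
        }
    if isinstance(obj, list):
        # Assumption: None items inside lists are dropped as well.
        return [prune_toml(item) for item in obj if item is not None]
    return obj


config = {"name": "job", 1: "dropped", "limit": None, "nested": {"a": None, "b": [1, None]}}
clean = prune_toml(config)
```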

class TomlReader(path='', storage=Storage.FILE, chunk_size=32, storage_kws=None, toml_kws=None, not_found='raise')[source]

Bases: Reader

Read a TOML file from any supported file system.

Parameters:
  • path (str, optional) – Directory under which the TOML file is located or full path to the TOML file. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.

  • storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • chunk_size (float, optional) – Chunk size to use when reading from the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

  • toml_kws (dict, optional) – Passed on as keyword arguments to the load() function of python’s own tomllib package.

  • not_found (str, optional) – What to do if the specified TOML file is not found. One of “ignore”, “warn”, or “raise”. Defaults to “raise”. Use the NotFound enum to avoid typos!

Raises:
  • TypeError – If path is not a string, chunk_size is not a float, or if storage_kws is not a dictionary.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if storage_kws is not a dictionary.

See also

Storage, NotFound

__call__(path='')[source]

Read a specific TOML file from the specified file system.

If not_found is set to “warn” or “ignore” and the file cannot be found, an empty dictionary is returned.

Parameters:

path (str) – Path (including file name) to the TOML file to read. If it starts with a forward slash, it will be interpreted as absolute; if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.

Returns:

The parsed contents of the TOML file.

Return type:

dict

Raises:

ValueError – If the final path is directly under root (e.g., “/file.toml”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

class YamlWriter(path, storage=Storage.FILE, overwrite=False, skip=False, chunk_size=32, storage_kws=None, yaml_kws=None)[source]

Bases: Writer

Save a dictionary to a YAML file on any of the supported file systems.

Parameters:
  • path (str) – The absolute path to the YAML file to save the dictionary into. May include two or more forward slashes (subdirectories will be created) and string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called.

  • storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • overwrite (bool, optional) – Whether to silently overwrite the destination file. Defaults to False, which will raise an exception if it already exists.

  • skip (bool, optional) – Whether to silently do nothing if the target file already exists. Defaults to False.

  • chunk_size (int, optional) – Chunk size to use when writing to the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keywords to the constructor of the file system.

  • yaml_kws (dict, optional) – Passed on as keyword arguments to PyYAML’s dump() function. See the PyYAML documentation for options.

Raises:
  • TypeError – If path is not a string, chunk_size is not an integer or either storage_kws or yaml_kws are not dictionaries.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if either storage_kws or yaml_kws are not dictionaries.

See also

Storage

__call__(yml, *parts)[source]

Write a dictionary-like object to a YAML file on the given file system.

Parameters:
  • yml (dict or list) – The dictionary or list to save as YAML.

  • *parts (str) – Fragments that will be interpolated into the path given at instantiation. Obviously, there must be at least as many as there are placeholders in the path.

Returns:

An empty tuple.

Return type:

tuple

Raises:
  • IndexError – If the path given at instantiation has more string placeholders than there are parts.

  • FileExistsError – If the destination file already exists, skip is False and overwrite is also False.

  • ValueError – If the final path is directly under root (e.g., “/file.yml”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

class YamlReader(path='', storage=Storage.FILE, chunk_size=32, storage_kws=None, loader=<class 'yaml.loader.Loader'>, not_found='raise')[source]

Bases: Reader

Read a YAML file from any supported file system.

Parameters:
  • path (str, optional) – Directory under which the YAML file is located or full path to the YAML file. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.

  • storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • chunk_size (float, optional) – Chunk size to use when reading from the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

  • loader (type, optional) – The loader class to use. Defaults to Loader. See the PyYAML documentation for options.

  • not_found (str, optional) – What to do if the specified YAML file is not found. One of “ignore”, “warn”, or “raise”. Defaults to “raise”. Use the NotFound enum to avoid typos!

Raises:
  • TypeError – If path is not a string, chunk_size is not a float, or if storage_kws is not a dictionary.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if storage_kws is not a dictionary.

See also

Storage, NotFound

__call__(path='')[source]

Read a specific YAML file from the specified file system.

If not_found is set to “warn” or “ignore” and the file cannot be found, an empty dictionary is returned.

Parameters:

path (str) – Path (including file name) to the YAML file to read. If it starts with a forward slash, it will be interpreted as absolute; if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.

Returns:

The parsed contents of the YAML file.

Return type:

dict

Raises:

ValueError – If the final path is directly under root (e.g., “/file.yml”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

class YamlParser(loader=<class 'yaml.loader.Loader'>)[source]

Bases: ArgRepr

Light wrapper around pyyaml’s yaml.load() function.

Parameters:

loader (type, optional) – The loader class to use. Defaults to Loader.

__call__(yml)[source]

Parse a specific YAML string.

Parameters:

yml (str) – The YAML string to parse.

Returns:

The result of parsing the YAML string.

Return type:

dict or list

class JsonWriter(path, storage=Storage.FILE, overwrite=False, skip=False, chunk_size=32, storage_kws=None, json_kws=None, gzip=None)[source]

Bases: Writer

Save a dictionary to a JSON file on any of the supported file systems.

Parameters:
  • path (str) – The absolute path to the JSON file to save the dictionary into. May include two or more forward slashes (subdirectories will be created) and string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called.

  • storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • overwrite (bool, optional) – Whether to silently overwrite the destination file. Defaults to False, which will raise an exception if it already exists.

  • skip (bool, optional) – Whether to silently do nothing if the target file already exists. Defaults to False.

  • chunk_size (int, optional) – Chunk size to use when writing to the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keywords to the constructor of the file system.

  • json_kws (dict, optional) – Passed on as keyword arguments to the dump() function of python’s own json module. See the json documentation for options.

  • gzip (bool, optional) – Write the JSON to a gzip-compressed JSON file if True and a plain text file if False. If left at None, which is the default, file names ending in “.gz” will trigger compression whereas all other extensions (if any) will not.

Raises:
  • TypeError – If path is not a string, chunk_size is not an integer or either storage_kws or json_kws are not dictionaries.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if either storage_kws or json_kws are not dictionaries.

See also

Storage

__call__(obj, *parts)[source]

Write a dictionary-like object to a JSON file on the given file system.

Parameters:
  • obj (dict or list) – The dictionary or list to save as JSON.

  • *parts (str) – Fragments that will be interpolated into the path given at instantiation. Obviously, there must be at least as many as there are placeholders in the path.

Returns:

An empty tuple.

Return type:

tuple

Raises:
  • IndexError – If the path given at instantiation has more string placeholders than there are parts.

  • FileExistsError – If the destination file already exists, skip is False and overwrite is also False.

  • ValueError – If the final path is directly under root (e.g., “/file.json”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

class JsonReader(path='', storage=Storage.FILE, chunk_size=32, storage_kws=None, json_kws=None, not_found='raise', gzip=None)[source]

Bases: Reader

Read a potentially compressed JSON file from any supported file system.

Parameters:
  • path (str, optional) – Directory under which the JSON file is located or full path to the JSON file. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.

  • storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • chunk_size (float, optional) – Chunk size to use when reading from the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

  • json_kws (dict, optional) – Passed on as keyword arguments to the load() function of python’s own json module. See the json documentation for options.

  • not_found (str, optional) – What to do if the specified JSON file is not found. One of “ignore”, “warn”, or “raise”. Defaults to “raise”. Use the NotFound enum to avoid typos!

  • gzip (bool, optional) – Read the JSON from a gzip-compressed file if True and a plain text file if False. If left at None, which is the default, file names ending in “.gz” will be assumed to be compressed whereas all other extensions will not.

Raises:
  • TypeError – If path is not a string, chunk_size is not a float, or if storage_kws is not a dictionary.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if storage_kws is not a dictionary.

See also

Storage, NotFound

__call__(path='')[source]

Read a specific JSON file from the specified file system.

If not_found is set to “warn” or “ignore” and the file cannot be found, an empty dictionary is returned.

Parameters:

path (str) – Path (including file name) to the JSON file to read. If it starts with a forward slash, it will be interpreted as absolute; if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.

Returns:

The parsed contents of the JSON file.

Return type:

dict

Raises:

ValueError – If the final path is directly under root (e.g., “/file.json”) because, on local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.

Base classes

class Writer(path, storage=Storage.FILE, overwrite=False, skip=False, mode=Mode.WB, chunk_size=32, storage_kws=None, *args, **kwargs)[source]

Bases: ArgRepr

Base class for writing objects to files or blobs on any filesystem.

Parameters:
  • path (str) – The absolute path to the file to save. May contain any number of string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called.

  • storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • overwrite (bool, optional) – Whether to silently overwrite the destination file. Defaults to False, which will raise an exception if it already exists.

  • skip (bool, optional) – Whether to silently do nothing if the target file already exists. Defaults to False.

  • mode (str, optional) – The mode to open the target file/object/blob in. Defaults to “wb”. Use the Mode enum to avoid typos.

  • chunk_size (float, optional) – Chunk size to use when writing to the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

  • *args – Additional arguments are reflected in the representation of instances but do not affect functionality in any way.

  • **kwargs – Additional keyword arguments are reflected in the representation of instances but do not affect functionality in any way.

Raises:
  • TypeError – If path is not a string, chunk_size is not a float, or if storage_kws is not a dictionary.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if storage_kws is not a dictionary.

See also

Storage, Mode

_managed(uri, compression=None)[source]

Context manager for atomic writes with automatic cleanup.

static _tmp(uri)[source]

Create a random name for a temporary target file.

_uri_from(*parts)[source]

Check skip/overwrite and create parent directories.

property chunk_bytes

Bytes to flush to the file system in one go.

property fs

Fresh fsspec file system on first use, same thereafter.
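The idea behind an atomic write with automatic cleanup is to write into a randomly named temporary file first and to move it onto the target only on success. The sketch below illustrates this pattern for the local file system only. It is not the code of _managed or _tmp; the names managed and tmp_name as well as the naming scheme of the temporary file are assumptions.

```python
import os
import uuid
from contextlib import contextmanager


def tmp_name(path: str) -> str:
    """Random temporary file name next to the target (hypothetical scheme)."""
    folder, name = os.path.split(path)
    return os.path.join(folder, f".{uuid.uuid4().hex}.{name}.tmp")


@contextmanager
def managed(path: str, mode: str = "wb"):
    """Yield a temporary file and move it onto `path` only on success."""
    tmp = tmp_name(path)
    try:
        with open(tmp, mode) as file:
            yield file
        os.replace(tmp, path)  # atomic within one local file system
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # clean up the partial file before re-raising
        raise
```

Readers of the target thus never see a half-written file. On object storage, a move is generally not a single atomic operation, so guarantees there depend on the service.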

class Reader(path='', storage=Storage.FILE, mode=Mode.RB, chunk_size=32, storage_kws=None, *args, **kwargs)[source]

Bases: ArgRepr

Base class for reading objects from files or blobs on any filesystem.

Parameters:
  • path (str, optional) – Directory under which the file is located or full path to the file. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.

  • storage (str, optional) – The type of file system to read from (“file”, “s3”, etc.). Defaults to “file”. Use the Storage enum to avoid typos.

  • mode (str, optional) – The mode to open the source file/object/blob in. Defaults to “rb”. Use the Mode enum to avoid typos.

  • chunk_size (float, optional) – Chunk size to use when reading from the selected file system in MiB. Defaults to 32 (MiB).

  • storage_kws (dict, optional) – Passed on as keyword arguments to the constructor of the file system.

  • *args – Additional arguments are reflected in the representation of instances but do not affect functionality in any way.

  • **kwargs – Additional keyword arguments are reflected in the representation of instances but do not affect functionality in any way.

Raises:
  • TypeError – If path is not a string, chunk_size is not a float, or if storage_kws is not a dictionary.

  • ValueError – If storage is not among the currently supported file-system schemes, mode not among the supported file-mode options, the chunk_size is smaller than 1 (MiB), or if storage_kws is not a dictionary.

See also

Storage, Mode

_managed(uri, compression=None)[source]

Context manager for atomic reads from the given file system.

_non_root(path='')[source]

Append/replace the path given at instantiation on instance call.

property chunk_bytes

Bytes to read from the file system in one go.

property fs

Fresh fsspec file system on first use, same thereafter.
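The “append or replace” logic for paths given at call time, together with the check against files directly under root, can be sketched as below. The function resolve is hypothetical, not the body of _non_root. It assumes that absolute paths are those with a leading forward slash (as in the “/file.suffix” examples) and that the root check applies to absolute paths only.

```python
def resolve(base: str, path: str = "") -> str:
    """Combine the instantiation-time `base` with the call-time `path`."""
    if path.startswith("/"):
        full = path  # absolute: replaces the base entirely
    elif path and base:
        full = base.rstrip("/") + "/" + path  # relative: appended to the base
    else:
        full = path or base  # one of them is empty: use the other unchanged
    parts = [part for part in full.split("/") if part]
    if full.startswith("/") and len(parts) < 2:
        raise ValueError(f'Path "{full}" must not be directly under root!')
    return full
```

For example, resolve("/data/raw", "file.json") gives "/data/raw/file.json", while resolve("/data/raw", "/other/file.json") gives "/other/file.json".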

Enums

class Storage(*values)[source]

Bases: StrEnum

Supported file systems for read/write operations.

FILE = file
S3 = s3
GCS = gcs
MEMORY = memory

class Mode(*values)[source]

Bases: StrEnum

Modes for opening files.

WB = wb
WT = wt

class Compression(*values)[source]

Bases: StrEnum

Compression algorithms for file storage.

ZIP = zip
BZ2 = bz2
GZIP = gzip
LZMA = lzma
XZ = xz

class NotFound(*values)[source]

Bases: StrEnum

Enum to direct read/load behavior in case of missing files.

IGNORE = ignore
WARN = warn
RAISE = raise

class Bears(*values)[source]

Bases: StrEnum

Enum to choose pandas versus polars.

PANDAS = pandas
POLARS = polars