pd

Pandas utilities and partials of dataframe method calls.

Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
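A minimal sketch of that idea, assuming nothing about how this module itself chains its callables: each step is configured once, and at runtime a dataframe is passed through the steps in order. The `pipe` helper is hypothetical, and plain lambdas stand in for instances of the classes below.

```python
from functools import reduce

import pandas as pd


def pipe(*steps):
    """Hypothetical helper: chain single-argument callables, left to right."""
    return lambda df: reduce(lambda acc, step: step(acc), steps, df)


# Configured once at program start (stand-ins for, e.g., DropNA, Assign, SortValues).
process = pipe(
    lambda df: df.dropna(),
    lambda df: df.assign(total=df["a"] + df["b"]),
    lambda df: df.sort_values("total", ascending=False),
)

# At runtime, each incoming dataframe flows through the same steps.
raw = pd.DataFrame({"a": [1.0, 2.0, None], "b": [10.0, 20.0, 30.0]})
result = process(raw)
```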

class Agg(func=None, *args, **kwargs)

Bases: ArgRepr

Simple partial for calling a pandas object’s agg method.

Parameters:
  • func (callable, str, list, or dict, optional) – Function(s) to use for aggregating the data. If a function, it must work when passed a Series. Also acceptable are a function name, a list of function names, and a dictionary with column names as keys and functions, function names, or lists thereof as values. Defaults to None, which only works for a dataframe and relies on kwargs to specify named aggregations.

  • *args – Positional arguments to pass on to the agg or func call.

  • **kwargs – Keyword arguments to pass on to the agg or func call.

Note

See the pandas agg docs for a full list of (keyword) arguments and an extensive description of usage and configuration.

__call__(df)

Call a pandas object’s agg method with the cached (kw)args.

Parameters:

df (Series, DataFrame, Rolling or their GroupBy companions) – The pandas object to aggregate.

Returns:

The aggregation of the pandas object.

Return type:

scalar, Series, or DataFrame
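As a sketch of what caching the arguments means here, the hypothetical `AggSketch` below stores whatever it is given and forwards it to `agg` on call. It is an assumption-based stand-in (no `ArgRepr` base, no validation), not the class's actual source.

```python
import pandas as pd


class AggSketch:
    """Hypothetical stand-in for Agg: cache (kw)args now, apply them later."""

    def __init__(self, func=None, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, df):
        # Forward the cached arguments to the pandas object's agg method.
        return df.agg(self.func, *self.args, **self.kwargs)


df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})
# A dict of column name -> list of function names.
per_column = AggSketch({"val": ["sum", "max"]})(df)
# With func=None, named aggregations are given as keyword arguments.
named = AggSketch(total=("val", "sum"))(df.groupby("key"))
```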

class AsType(types, **kwargs)

Bases: ReprName

Partial of a pandas dataframe or series astype method.

Parameters:
  • types (type or dict) – Single type or dictionary of column names and types, specifying the type conversion of the entire dataframe or of specific columns, respectively.

  • **kwargs – Keyword arguments are passed on to the astype method call after the types argument.

__call__(df)

Cast dataframe (columns) or series to specified types.

Parameters:

df (DataFrame or Series) – Pandas dataframe or series to type-cast.

Returns:

Pandas dataframe (columns) or series cast to new type(s).

Return type:

DataFrame or Series
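Both forms of `types` map directly onto pandas' own `astype`, which an `AsType` instance is documented to call. Shown with plain pandas and made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["3", "4"]})

# A single type converts every column ...
all_float = df.astype(float)
# ... while a dict converts only the columns it names.
b_as_int = df.astype({"b": "int64"})
```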

class Assign(col=None, **cols)

Bases: ArgRepr

Light wrapper around a pandas dataframe’s assign method.

Parameters:
  • col (dict, optional) – A dictionary with the names of newly created (or overwritten) columns as keys. If the values are callable, they are computed on the entire dataframe and assigned to the new columns. The callable must not change the input dataframe (though pandas doesn’t check it). If the values are not callable (e.g., a series, scalar, or array), they are simply assigned.

  • **cols – As in the pandas assign method, the keyword arguments themselves serve as the name(s) of the new (or overwritten) column(s), and their values are set in the same way.

__call__(df)

Add new columns to a dataframe by calling its assign method.

Parameters:

df (DataFrame) – The dataframe to add new columns to.

Returns:

The input dataframe with new columns added.

Return type:

DataFrame
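How `col` and `**cols` are combined is not spelled out above; the hypothetical `assign_sketch` assumes the dictionary and the keyword arguments are merged before a single `assign` call. The dictionary form is handy for column names that are not valid Python identifiers.

```python
import pandas as pd


def assign_sketch(df, col=None, **cols):
    """Hypothetical stand-in: merge the dict with keyword columns, then assign."""
    merged = {**(col or {}), **cols}
    return df.assign(**merged)


df = pd.DataFrame({"price": [2.0, 3.0], "qty": [5, 4]})
out = assign_sketch(
    df,
    {"unit price": lambda frame: frame["price"]},  # callable: computed on df
    currency="EUR",                                 # scalar: assigned as-is
)
```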

class ColumnMapper(src_col, transform, tgt_col=None, na_action=None)

Bases: ArgRepr

Transform one column of a pandas dataframe into another.

This is simply a partial of calling the map method on one column of a dataframe and assigning the result to the same or another column of the same dataframe.

Parameters:
  • src_col (Hashable) – Column to call the map method on.

  • transform (callable, Mapping, or Series) – Function or mapping in the form of a dictionary or a pandas series.

  • tgt_col (Hashable, optional) – Dataframe column to store the series resulting from the transformation. Defaults to src_col, thus overwriting it in place.

  • na_action (str, optional) – Can take the value “ignore” or None, defaulting to the latter. Will be passed to the series map method along with transform.

__call__(df)

Call the map method on a specified column of a DataFrame.

Cached keyword arguments are forwarded to the method call and the result is stored in the specified column of the DataFrame.

Parameters:

df (DataFrame) – Pandas dataframe with the column to call the map method on.

Returns:

Pandas dataframe with the result of the column transformation in the specified column.

Return type:

DataFrame
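A sketch of the documented behaviour as a plain function. The name `map_column` is hypothetical, and it works on a copy, whereas whether the real class copies or modifies its input in place is not stated here.

```python
import pandas as pd


def map_column(df, src_col, transform, tgt_col=None, na_action=None):
    """Hypothetical stand-in for ColumnMapper's call, working on a copy."""
    # Without tgt_col, the source column is overwritten, as documented above.
    tgt_col = src_col if tgt_col is None else tgt_col
    out = df.copy()
    out[tgt_col] = out[src_col].map(transform, na_action=na_action)
    return out


df = pd.DataFrame({"code": ["a", "b", None]})
# na_action="ignore" skips missing values instead of passing them to str.upper.
upper = map_column(df, "code", str.upper, tgt_col="upper", na_action="ignore")
# A dict works as a lookup table; unmatched values become NaN.
names = map_column(df, "code", {"a": "alpha"})
```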

class ColumnSelector(col)

Bases: ArgRepr

Select a single column of a (grouped) pandas dataframe as a series.

This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a single argument (using the square-brackets accessor).

Parameters:

col (hashable) – Single DataFrame column to select.

__call__(df)

Select a single column of a (grouped) pandas dataframe as a series.

Parameters:

df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select the column from.

Returns:

The selected column from the (grouped) dataframe.

Return type:

Series or SeriesGroupBy
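What an instance does at call time, written out with plain pandas (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]})

# Square brackets with a single label yield a Series ...
series = df["val"]
# ... and, on a grouped dataframe, a SeriesGroupBy.
grouped_series = df.groupby("key")["val"]
sums = grouped_series.sum()
```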

class ColumnsSelector(col=(), *cols)

Bases: ArgRepr

Select one or more columns of a (grouped) pandas dataframe as a dataframe.

This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a list of arguments (using the square-brackets accessor).

Parameters:
  • col (hashable or sequence, optional) – Column name or sequence thereof. Defaults to an empty tuple.

  • *cols (Hashable) – Additional column names.

__call__(df)

Select the specified column(s) from a (grouped) pandas dataframe.

Parameters:

df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select column(s) from.

Returns:

The selected column(s) of the (grouped) dataframe.

Return type:

DataFrame or DataFrameGroupBy
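The list form of the square-brackets accessor, which this partial is documented to use, written out with plain pandas and made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]})

# A list of labels always yields a DataFrame, even for a single column ...
one_col = df[["a"]]
# ... and, on a grouped dataframe, a DataFrameGroupBy.
maxima = df.groupby("key")[["a", "b"]].max()
```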

class Drop(label=None, *labels, axis=1, index=None, columns=None, level=None, errors='raise')

Bases: ArgRepr

A simple partial of a pandas dataframe or series’ drop method.

Parameters:
  • label (hashable or sequence, optional) – Index or column label(s) to drop. Defaults to None.

  • *labels (hashable) – Additional labels to drop.

  • axis (1 or "columns", 0 or "index") – Whether to drop labels from the columns (1 or “columns”) or index (0 or “index”). Defaults to 1.

  • index (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

  • columns (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

  • level (hashable, optional) – Integer or level name. Defaults to None. For a MultiIndex, the level from which the labels will be removed.

  • errors ("raise" or "ignore") – Defaults to “raise”. If “ignore”, suppress error and drop only existing labels.

__call__(df)

Drop rows or columns from a pandas series or dataframe.

Parameters:

df (Series or DataFrame) – The object to drop rows or columns from.

Returns:

The object with rows or columns dropped.

Return type:

Series or DataFrame
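Pandas' own `drop` defaults to `axis=0`, whereas this partial is documented to default to `axis=1`, so the plain pandas calls below pass `axis` explicitly where it matters:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# axis=1 drops columns ...
no_b = df.drop(["b"], axis=1)
# ... errors="ignore" silently skips labels that do not exist ...
still_all = df.drop(["missing"], axis=1, errors="ignore")
# ... and index= targets rows, regardless of axis.
no_first_row = df.drop(index=[0])
```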

class DropNA(axis=0, how=None, thresh=None, subset=None, ignore_index=False)

Bases: ArgRepr

A simple partial of a pandas dataframe or series’ dropna method.

Parameters:
  • axis (0 or "index", 1 or "columns") – Determine if rows or columns which contain missing values are removed. Defaults to 0.

  • how ("any" or "all") – Whether to remove a row or column if it contains at least one NA (“any”) or only if all of its values are NA (“all”). Defaults to “any”.

  • thresh (int, optional) – Minimum number of non-NA values a row or column needs in order to be kept. Cannot be combined with how. Defaults to None.

  • subset (hashable or sequence, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. Defaults to None.

  • ignore_index (bool, optional) – Defaults to False. If True, the resulting axis will be labeled 0, 1, …, n - 1.

__call__(df)

Drop rows or columns with NAs from a pandas series or dataframe.

Parameters:

df (Series or DataFrame) – The object to drop rows or columns with NAs from.

Returns:

The object with rows or columns with NAs dropped.

Return type:

Series or DataFrame
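The effect of the main options on a small frame, using pandas' `dropna` directly. Note that pandas refuses to combine `how` and `thresh`, and that `ignore_index` requires pandas 2.0 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.0, np.nan], "b": [np.nan, 2.0, 3.0]})

any_na = df.dropna()                  # drop rows with at least one NA
all_na = df.dropna(how="all")         # drop only rows that are entirely NA
enough = df.dropna(thresh=2)          # keep rows with at least 2 non-NA values
by_b = df.dropna(subset=["b"])        # only inspect column "b"
renumbered = df.dropna(how="all", ignore_index=True)  # relabel as 0, ..., n - 1
```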

class GroupBy(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

Bases: ArgRepr

Simple partial of a pandas dataframe or series groupby method.

Parameters:
  • by (str, callable, series, array, dict, or list, optional) – Column name, function (to be called on each index label), list or numpy array of the same length as the index, a dict or series providing a label -> group name mapping, or a list of the above.

  • level (hashable or sequence, optional) – If the axis is a multi-index (hierarchical), group by a particular level or levels. Do not specify both by and level. Defaults to None.

  • as_index (bool, optional) – Whether to return group labels as index. Defaults to True.

  • sort (bool, optional) – Whether to sort group keys. Defaults to True.

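  • group_keys (bool, optional) – When calling apply on the grouped object, whether to add the group keys to the index of the result to identify the pieces. Defaults to True.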

  • observed (bool, optional) – Whether to show only observed values for categorical groupers. Defaults to False.

  • dropna (bool, optional) – Whether to drop rows whose group key is NA. If False, NA values are treated as a group key of their own. Defaults to True.

Note

For a more extensive description of all (keyword) arguments, see the pandas documentation.

__call__(df)

Call a dataframe or series groupby method.

Parameters:

df (DataFrame or Series) – Pandas dataframe or series to group.

Returns:

The grouped dataframe or series.

Return type:

DataFrameGroupBy or SeriesGroupBy
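The `dropna` and `as_index` options are easy to mix up, so here they are on a small frame with a missing group key, using pandas directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["x", np.nan, "x", "y"], "val": [1, 2, 3, 4]})

# dropna=True (the default) discards rows whose group key is NA ...
dropped = df.groupby("key", dropna=True)["val"].sum()
# ... while dropna=False keeps them as a group of their own.
kept = df.groupby("key", dropna=False)["val"].sum()
# as_index=False returns the group labels as a regular column instead.
flat = df.groupby("key", as_index=False)["val"].sum()
```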

class GroupByApply(func, *args, **kwargs)

Bases: ArgRepr

Partial for calling a grouped dataframe or series apply method.

Parameters:
  • func (callable) – A callable that takes a dataframe or series as its first argument, and returns a dataframe, a series or a scalar.

  • *args – Positional arguments to pass to func.

  • **kwargs – Keyword arguments to pass to func.

__call__(df)

Call a grouped dataframe or series apply method.

Parameters:

df (DataFrameGroupBy or SeriesGroupBy) – The pandas group-by object to apply func to.

Returns:

The input object with func applied to groups.

Return type:

Series or DataFrame
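Extra positional and keyword arguments given to `GroupByApply` are documented to reach `func`; pandas' `apply` on a grouped object forwards them the same way. The helper `top_n` is made up for illustration:

```python
import pandas as pd


def top_n(group, n, column):
    """Return the n rows of a group with the largest values in column."""
    return group.nlargest(n, column)


df = pd.DataFrame({"key": ["x", "x", "x", "y"], "val": [3, 1, 2, 5]})

# 2 is passed on positionally and column="val" as a keyword, for every group.
top = df.groupby("key")[["val"]].apply(top_n, 2, column="val")
```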

class Join(*args, **kwargs)

Bases: ArgRepr

Light wrapper around the pandas dataframe join method.

Parameters:
  • *args – Arguments to pass on to the join method call.

  • **kwargs – Keyword arguments to pass on to the join method call.

Note

For a full list of (keyword) arguments and their description, see the pandas join documentation.

__call__(df, other)

Join a dataframe with other dataframe(s) and/or series.

Parameters:
  • df (DataFrame) – Source dataframe on which the join method will be called.

  • other (DataFrame, Series, or a list of any combination) – Index should be similar to one (or more) columns in df. If a series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined dataframe.

Returns:

The joined dataframe.

Return type:

DataFrame
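Since the other object is only supplied at call time, an instance would mainly hold options such as `on` and `how`. The underlying pandas call, with made-up data, looks like this:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
# A series needs a name, which becomes the new column's label.
scores = pd.Series([10, 30], index=[1, 3], name="score")

# on="id" matches the "id" column of left against the index of scores.
joined = left.join(scores, on="id", how="left")
```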

class ParquetReader(path='', **kwargs)

Bases: ArgRepr

Light wrapper around the top-level read_parquet pandas function.

Parameters:
  • path (str, optional) – Directory under which the parquet files are located or full path to the parquet file. If not fully specified here, the path must be completed on calling the instance. Defaults to the current working directory of the Python interpreter.

  • **kwargs – Keyword arguments passed on to the read_parquet function call.

__call__(path='')

Read one or more parquet file(s) into a pandas DataFrame.

Parameters:

path (str, optional) – Path to the parquet file (including the file name) or to the directory where the parquet files to read are located. If it starts with a backslash, it will be interpreted as absolute; if not, as relative to the path specified at instantiation. Defaults to an empty string, which leaves the path given at instantiation unchanged.

Return type:

DataFrame
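The path handling is only described informally above, so the hypothetical `ParquetReaderSketch` below assumes the call-time path is joined onto the instantiation path with `os.path.join` (which discards the base when the second part is absolute). It does not reproduce the leading-backslash rule, and actually reading requires a parquet engine such as pyarrow.

```python
import os

import pandas as pd


class ParquetReaderSketch:
    """Hypothetical stand-in for ParquetReader (path logic is an assumption)."""

    def __init__(self, path="", **kwargs):
        self.path = path
        self.kwargs = kwargs

    def resolve(self, path=""):
        # An empty call-time path leaves the configured path unchanged.
        return os.path.join(self.path, path) if path else self.path

    def __call__(self, path=""):
        # Cached keyword arguments (e.g., columns) go to read_parquet.
        return pd.read_parquet(self.resolve(path), **self.kwargs)


reader = ParquetReaderSketch("/data/sales", columns=["date", "amount"])
```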

class ParquetWriter(path='', create=False, **kwargs)

Bases: ArgRepr

Partial of the pandas dataframe to_parquet method.

Parameters:
  • path (str, optional) – Path (including file name) to save the parquet file to. May include any number of string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called. Defaults to the current working directory of the Python interpreter.

  • create (bool, optional) – Whether to create the directory where the parquet file should be saved if it does not exist. Defaults to False.

  • **kwargs – Keyword arguments passed on to the to_parquet method call.

__call__(df, *parts)

Write a pandas DataFrame to a parquet file.

Parameters:
  • df (DataFrame) – The dataframe to write.

  • *parts (str, optional) – Fragments that will be interpolated into the path string given at instantiation. There must be at least as many as there are placeholders in the path.

Returns:

An empty tuple.

Return type:

tuple
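A sketch of the path logic only, assuming placeholders are filled with `str.format` and that `create=True` means creating missing directories with `os.makedirs`. Both are assumptions, as is the name `ParquetWriterSketch`; writing itself needs a parquet engine such as pyarrow.

```python
import os
import tempfile


class ParquetWriterSketch:
    """Hypothetical stand-in for ParquetWriter's path handling."""

    def __init__(self, path="", create=False, **kwargs):
        self.path = path
        self.create = create
        self.kwargs = kwargs

    def target(self, *parts):
        # Fill the "{}" placeholders in order; surplus parts are ignored by
        # str.format, while missing ones raise an IndexError.
        path = self.path.format(*parts)
        directory = os.path.dirname(path)
        if self.create and directory:
            os.makedirs(directory, exist_ok=True)
        return path

    def __call__(self, df, *parts):
        df.to_parquet(self.target(*parts), **self.kwargs)
        return ()


base = tempfile.mkdtemp()
writer = ParquetWriterSketch(os.path.join(base, "{}", "{}.parquet"), create=True)
out_path = writer.target("2024", "sales")
```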

class Rename(mapper=None, index=None, columns=None, axis=1, level=None, errors='ignore')

Bases: ArgRepr

Simple partial of a pandas dataframe’s rename method.

Parameters:
  • mapper (dict-like or function) – Dict-like or function transformations to apply to the axis values.

  • index (dict-like or function) – Alternative to specifying mapper with axis = 0.

  • columns (dict-like or function) – Alternative to specifying mapper with axis = 1.

  • axis (1 or "columns", 0 or "index", optional) – Axis to target with mapper. Defaults to 1.

  • level (Hashable, optional) – In case of a MultiIndex, only rename labels in the specified level. Defaults to None.

  • errors ("ignore" or "raise", optional) – If “raise”, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the index being transformed. If “ignore”, existing keys will be renamed and extra keys will be ignored. Defaults to “ignore”.

__call__(df)

Rename a pandas dataframe’s columns or rows.

Parameters:

df (DataFrame) – The dataframe to rename columns or rows of.

Returns:

The dataframe with renamed columns or rows.

Return type:

DataFrame

property resolved

Resolved mapper-axis vs. index vs. columns keywords.
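Pandas' own `rename` targets the index when only a mapper is given, whereas this partial is documented to default to `axis=1`. The equivalent pandas calls:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With axis=1, mapper targets the columns ...
upper = df.rename(str.upper, axis=1)
# ... which is equivalent to passing the same mapper as columns=.
same = df.rename(columns=str.upper)
# errors="raise" turns unknown keys in a dict-like mapper into a KeyError.
try:
    df.rename(columns={"missing": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True
```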

class ResetIndex(level=None, drop=False, col_level=0, col_fill='', allow_duplicates=False, names=None)

Bases: ArgRepr

Simple partial of a pandas dataframe’s reset_index method.

Parameters:
  • level (int, str, tuple, or list, optional) – Only remove the given levels from the index. Defaults to None, which removes all levels.

  • drop (bool, optional) – Do not try to insert the index into the dataframe columns. This resets the index to the default integer index. Defaults to False.

  • col_level (int or str, optional) – If the columns have multiple levels, determines which level the labels are inserted into. Defaults to 0.

  • col_fill (Hashable, optional) – If the columns have multiple levels, determines how the other levels are named. Defaults to an empty string.

  • allow_duplicates (bool, optional) – Allow duplicate column labels to be created. Defaults to False.

  • names (hashable or sequence, optional) – Using the given string, rename the dataframe column which contains the index data. If the dataframe has a multiindex, this has to be a list or tuple with length equal to the number of levels. Defaults to None.

__call__(df)

Reset the index of a pandas dataframe.

Parameters:

df (DataFrame) – The dataframe to reset the index of.

Returns:

The dataframe with its index reset.

Return type:

DataFrame
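The options on a frame with a two-level index, using pandas directly (the `names` argument requires pandas 1.5 or later):

```python
import pandas as pd

df = pd.DataFrame(
    {"val": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("x", 1), ("x", 2), ("y", 1)], names=["key", "n"]
    ),
)

everything = df.reset_index()                       # both levels become columns
only_key = df.reset_index(level="key")              # "n" stays as the index
discarded = df.reset_index(drop=True)               # index replaced, not inserted
renamed = df.reset_index(names=["group", "step"])   # new names for the columns
```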

class RollingGroupByApply(func, raw=False, engine=None, engine_kws=None, *args, **kwargs)

Bases: ArgRepr

Partial for calling a rolling-grouped dataframe’s apply method.

Parameters:
  • func (callable) – Must produce a single numerical value from a numpy ndarray if raw = True or a series if raw = False. Can also accept a numba JIT function with engine = “numba” specified.

  • raw (bool, optional) – Whether to pass a numpy ndarray or a pandas series to func. Defaults to False.

  • engine (str, optional) – Either “cython” or “numba”. Defaults to None.

  • engine_kws (dict, optional) – Configuration of the “numba” engine. Keys can be “nopython”, “nogil”, and “parallel”, and values must be True or False. Defaults to None.

  • *args – Positional arguments to pass to func.

  • **kwargs – Keyword arguments to pass to func.

__call__(rolling_df)

Call a rolling-grouped dataframe’s apply method.

Parameters:

rolling_df (RollingGroupby) – The rolling-grouped dataframe to aggregate.

Returns:

The aggregation of the rolling-grouped dataframe.

Return type:

DataFrame
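Pandas' `Rolling.apply` takes extra arguments for `func` through `args=` and `kwargs=` containers and the engine settings as `engine_kwargs`; how this class maps its `engine_kws`, `*args`, and `**kwargs` onto those is not stated above. A plain pandas version, with a made-up `scaled_range` function:

```python
import numpy as np
import pandas as pd


def scaled_range(window, factor):
    """Range (max - min) of one window, multiplied by factor."""
    return (window.max() - window.min()) * factor


df = pd.DataFrame(
    {"key": ["x", "x", "x", "y", "y"], "val": [1.0, 2.0, 4.0, 10.0, 20.0]}
)

# Rolling windows of two rows, computed separately within each group.
rolling = df.groupby("key")[["val"]].rolling(2)
# raw=True hands each window to func as a numpy array; args= supplies factor.
spread = rolling.apply(scaled_range, raw=True, args=(10,))
```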

class RollingWindow(*args, **kwargs)

Bases: ArgRepr

Simple partial for calling a pandas object’s rolling method.

Parameters:
  • *args – Arguments to pass on to the rolling method call.

  • **kwargs – Keyword arguments to pass on to the rolling method call.

Notes

See the pandas rolling docs for a full list of (keyword) arguments and an extensive description of usage.

__call__(df)

Call a pandas object’s rolling method with the cached (kw)args.

Parameters:

df (Series, DataFrame, or their GroupBy companions) – The pandas object to call rolling on.

Returns:

A window object whose type depends on the type of the input.

Return type:

Window, Rolling, or RollingGroupBy
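Calling `rolling` computes nothing by itself; it returns a window object on which an aggregation is then called, for example by an `Agg` or `RollingGroupByApply` step further down the pipe. With plain pandas:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

window = s.rolling(2)          # a Rolling object; nothing is computed yet
means = window.mean()          # NaN, 1.5, 2.5, 3.5

# On a grouped object, rolling returns a RollingGroupby instead.
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "val": [1.0, 3.0, 5.0, 7.0]})
grouped_window = df.groupby("key")["val"].rolling(2)
grouped_means = grouped_window.mean()
```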

class RowsSelector(condition)

Bases: ArgRepr

Select rows from a pandas dataframe with a boolean mask or function.

This is simply a partial for calling a dataframe’s __getitem__ method (using the square-brackets accessor) with either a boolean mask or a callable that takes the dataframe as input and produces a 1-D, boolean array-like structure (of the same length as the dataframe to select from).

Parameters:

condition (callable or array-like) – A callable that accepts a dataframe and produces a 1-D, boolean array-like structure of the same length, or such a boolean array-like structure itself.

__call__(df)

Select rows from a dataframe with the specified mask or condition.

Parameters:

df (DataFrame) – The pandas dataframe to select rows from.

Returns:

The pandas dataframe with only the selected rows.

Return type:

DataFrame
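Both kinds of condition, shown with plain pandas. A callable is the more useful choice in a preconfigured pipe, because it is evaluated against whatever dataframe arrives at run time, whereas a fixed mask must already match that dataframe's length:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [5, 12, 8]})

# A callable receives the dataframe and returns a boolean mask ...
high = df[lambda frame: frame["score"] > 6]
# ... while an array-like mask of the same length is used directly.
picked = df[[True, False, True]]
```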

class SetIndex(keys, drop=True, append=False, verify_integrity=False)

Bases: ArgRepr

Simple partial of a pandas dataframe’s set_index method.

Parameters:
  • keys (hashable or array-like) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays.

  • drop (bool, optional) – Delete columns to be used as the new index. Defaults to True.

  • append (bool, optional) – Whether to append columns to the existing index. Defaults to False.

  • verify_integrity (bool, optional) – Whether to check the new index for duplicates. Defaults to False. Setting to True will impact the performance of this method.

__call__(df)

Set the index of a pandas dataframe.

Parameters:

df (DataFrame) – The dataframe to set the index of.

Returns:

The dataframe with a new index set.

Return type:

DataFrame
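The options on a small frame, using pandas directly:

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "y"], "n": [1, 2], "val": [10, 20]})

indexed = df.set_index("key")                   # "key" moves to the index
kept = df.set_index("key", drop=False)          # ... or is also kept as a column
nested = df.set_index("key").set_index("n", append=True)  # two-level index

# verify_integrity=True rejects an index with duplicate keys.
try:
    pd.DataFrame({"k": [1, 1]}).set_index("k", verify_integrity=True)
    duplicate_ok = True
except ValueError:
    duplicate_ok = False
```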

class SortValues(by, **kwargs)

Bases: ArgRepr

Partial of the pandas dataframe sort_values method.

Parameters:
  • by (hashable or sequence) – Name or list of names to sort by.

  • **kwargs – Additional keyword arguments will be forwarded to the method call with the exception of “inplace”, which will be set to False.

Note

For a full list of keyword arguments and their description, see the pandas sort_values documentation.

__call__(df)

Sort a pandas dataframe by column(s) values.

Parameters:

df (DataFrame) – The dataframe to sort.

Returns:

The sorted dataframe.

Return type:

DataFrame
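The hypothetical `sort_values_sketch` mirrors the documented override of `inplace`, so the input is never modified even if a caller asks for it:

```python
import pandas as pd


def sort_values_sketch(df, by, **kwargs):
    """Hypothetical stand-in: always return a new, sorted dataframe."""
    kwargs["inplace"] = False  # override whatever the caller passed
    return df.sort_values(by, **kwargs)


df = pd.DataFrame({"team": ["b", "a", "b", "a"], "pts": [3, 5, 1, 2]})
# Sort by team (ascending), then by points within each team (descending).
ranked = sort_values_sketch(
    df, ["team", "pts"], ascending=[True, False], inplace=True
)
```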