pd

Pandas utilities and partials of dataframe method calls.

Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
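
The idea of configuring callables up front and letting dataframes flow through them later can be sketched as follows. The classes `SelectRows` and `AddColumn` and the helper `run_pipe` are hypothetical stand-ins written for this illustration; they are not part of this module.

```python
from functools import reduce

import pandas as pd


class SelectRows:
    """Hypothetical stand-in: keep rows where a column exceeds a threshold."""

    def __init__(self, col, threshold):
        # Parameters known at program start are cached on the instance.
        self.col = col
        self.threshold = threshold

    def __call__(self, df):
        # Only the dataframe arrives at runtime.
        return df[df[self.col] > self.threshold]


class AddColumn:
    """Hypothetical stand-in: add a column computed from the dataframe."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def __call__(self, df):
        return df.assign(**{self.name: self.func})


def run_pipe(df, *steps):
    """Pass the dataframe through each preconfigured step in order."""
    return reduce(lambda acc, step: step(acc), steps, df)


steps = (SelectRows("a", 1), AddColumn("b", lambda d: d["a"] * 10))
frame = pd.DataFrame({"a": [1, 2, 3]})
result = run_pipe(frame, *steps)
print(result)
```

Because each step only needs the dataframe at call time, the same tuple of steps can be reused for every dataframe that arrives.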

class Agg(func=None, axis=0, *args, engine=None, engine_kwargs=None, **kwargs)[source]

Bases: ArgRepr

Simple partial for calling a pandas object’s agg method.

Parameters:
  • func (callable, str, list, or dict, optional) – Function(s) to use for aggregating the data. If a function, must work when passed a Series. Also acceptable are a function name, a list of function names, and a dictionary with column names as keys and functions, function names, or lists thereof as values. Defaults to None, which only works for a dataframe and relies on kwargs to specify named aggregations.

  • axis (int or str, optional) – Which dimension to aggregate over in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for all other pandas objects. Defaults to 0.

  • *args – Positional arguments to pass on to the agg or func call.

  • engine (str, optional) – Which engine to use when applied to group-by objects.

  • engine_kwargs (dict, optional) – Keywords to configure the engine, if any.

  • **kwargs – Keyword arguments to pass on to the agg or func call. When used on a dataframe or series, and if func is None, column names with their individual aggregation functions can be given.

Note

See the pandas agg docs for a full list of (keyword) arguments and an extensive description of usage and configuration.

__call__(df)[source]

Call a pandas object’s agg method with the cached (kw)args.

Parameters:

df – The pandas object to aggregate.

Returns:

The aggregation of the pandas object.

Return type:

Scalar, Series, or DataFrame
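
A minimal sketch of how such a partial might behave, assuming it is called on a plain dataframe. `AggSketch` is a hypothetical stand-in, not this module's implementation; it only forwards the cached arguments to pandas' own `agg`.

```python
import pandas as pd


class AggSketch:
    """Hypothetical stand-in for a partial of DataFrame.agg."""

    def __init__(self, func=None, axis=0, *args, **kwargs):
        # Everything known up front is cached on the instance.
        self.func = func
        self.axis = axis
        self.args = args
        self.kwargs = kwargs

    def __call__(self, df):
        # Assumption: df is a DataFrame (group-by objects take no axis).
        return df.agg(self.func, self.axis, *self.args, **self.kwargs)


frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# A single function name yields one value per column.
totals = AggSketch("sum")(frame)

# With func left at None, keywords define named aggregations.
named = AggSketch(total=("a", "sum"), largest=("b", "max"))(frame)
print(totals)
print(named)
```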

class Apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine=None, engine_kwargs=None, **kwargs)[source]

Bases: ArgRepr

Simple partial for calling a pandas object’s apply method.

Parameters:
  • func (callable, str, list, or dict) – Function(s) to apply to the data.

  • axis (int or str, optional) – Which dimension to apply func over in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for all other pandas objects. Defaults to 0.

  • raw (bool, optional) – Whether to pass a series or a numpy array to func. Defaults to False, which results in a series being passed.

  • result_type (str, optional) – Must be one of “expand”, “reduce”, “broadcast”, or None.

  • args (tuple, optional) – Positional arguments to pass on to func. Defaults to an empty tuple.

  • by_row (str or bool, optional) – Must be one of “compat” or False.

  • engine (str or decorator, optional) – Which engine to use. Defaults to the python interpreter.

  • engine_kwargs (dict, optional) – Keywords to configure the engine, if any.

  • **kwargs – Keyword arguments to pass on to the func call.

Note

See the pandas apply docs for a full list of (keyword) arguments and a description of usage and configuration.

__call__(df)[source]

Call a pandas object’s apply method.

Parameters:

df – The pandas object to call apply on.

Returns:

The return type of calling apply on the pandas object.

Return type:

Pandas
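
The effect of axis, raw, and args can be illustrated with a small stand-in. `ApplySketch` and the function `weighted` are hypothetical names used only here, and the sketch assumes a dataframe as input.

```python
import pandas as pd


class ApplySketch:
    """Hypothetical stand-in for a partial of DataFrame.apply."""

    def __init__(self, func, axis=0, raw=False, args=(), **kwargs):
        self.func = func
        self.axis = axis
        self.raw = raw
        self.args = args
        self.kwargs = kwargs

    def __call__(self, df):
        # Assumption: df is a DataFrame, so axis and raw are meaningful.
        return df.apply(
            self.func, axis=self.axis, raw=self.raw, args=self.args, **self.kwargs
        )


def weighted(row, weight):
    """Combine two columns of one row; weight arrives through args."""
    return row["a"] + weight * row["b"]


frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=1 hands each row to func as a Series.
row_wise = ApplySketch(weighted, axis=1, args=(2,))(frame)

# raw=True hands each column to func as a numpy array.
col_max = ApplySketch(lambda arr: arr.max(), raw=True)(frame)
print(row_wise)
print(col_max)
```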

class AsFreq(freq, method=None, how=None, normalize=False, fill_value=None)[source]

Bases: ArgRepr

Light wrapper around a pandas dataframe or series asfreq method.

Parameters:
  • freq (DateOffset or str) – Frequency DateOffset or string.

  • method (str, optional) – Method to use for filling holes in re-indexed Series (note this does not fill NaNs that already were present). Must be one of “pad”/“ffill” or “backfill”/“bfill”. Defaults to None.

  • how (str, optional) – For PeriodIndex only. Must be one of “start” or “end”. Defaults to None.

  • normalize (bool, optional) – Whether to reset output index to midnight. Defaults to False.

  • fill_value (scalar, optional) – Value to use for missing values, applied during upsampling (note this does not fill NaNs that already were present).

__call__(df)[source]

Call the asfreq method of the passed pandas object.

Parameters:

df (Series or DataFrame) – The pandas object to call asfreq on with the cached (keyword) arguments.

Returns:

The same type as called with.

Return type:

Series or DataFrame
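
The difference between method and fill_value, and the remark that NaNs already present are left alone, can be seen with plain pandas calls. The data below is made up for illustration.

```python
import pandas as pd

index = pd.to_datetime(["2024-01-01", "2024-01-03"])
series = pd.Series([1.0, 3.0], index=index)

# Upsampling to daily frequency creates a hole on 2024-01-02.
padded = series.asfreq("D", method="ffill")   # hole takes the previous value
filled = series.asfreq("D", fill_value=0.0)   # hole takes the constant

# A NaN that was already present is not touched by fill_value.
gappy = pd.Series([1.0, float("nan")], index=index)
still_gappy = gappy.asfreq("D", fill_value=0.0)
print(padded, filled, still_gappy, sep="\n")
```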

class AsType(types, errors='raise')[source]

Bases: ReprName

Partial of a pandas dataframe or series astype method.

Parameters:
  • types (type or dict) – Single type or dictionary of column names and types, specifying type conversion of the entire dataframe or of specific columns, respectively.

  • errors (str, optional) – What to do when data cannot be type cast. Must be one of “raise” or “ignore”. Defaults to “raise”.

__call__(df)[source]

Cast dataframe (columns) or series to specified types.

Parameters:

df (DataFrame or Series) – Pandas dataframe or series to type-cast.

Returns:

Pandas dataframe (columns) or series cast to new type(s).

Return type:

DataFrame or Series
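
A short sketch of the two ways to specify types, assuming the partial simply forwards to pandas' `astype`. `AsTypeSketch` is a hypothetical stand-in for illustration only.

```python
import pandas as pd


class AsTypeSketch:
    """Hypothetical stand-in for a partial of the astype method."""

    def __init__(self, types, errors="raise"):
        self.types = types
        self.errors = errors

    def __call__(self, df):
        return df.astype(self.types, errors=self.errors)


frame = pd.DataFrame({"a": ["1", "2"], "b": [1.5, 2.5]})

# A dictionary casts only the named columns.
as_int = AsTypeSketch({"a": int})(frame)

# A single type casts the entire dataframe.
all_str = AsTypeSketch(str)(frame)
print(as_int.dtypes)
print(all_str)
```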

class Assign(col=None, **cols)[source]

Bases: ArgRepr

Light wrapper around a pandas dataframe’s assign method.

Parameters:
  • col (dict) – A dictionary with the names of newly created (or overwritten) columns as keys. If the values are callable, they are computed on the entire dataframe and assigned to the new columns. The callable must not change the input dataframe (though pandas doesn’t check it). If the values are not callable, e.g., a series, scalar, or array, they are simply assigned.

  • **cols – As in the original, the keyword arguments themselves serve as the name(s) of the new (or overwritten) column(s) and their values are set in the same way.

__call__(df)[source]

Add new columns to a dataframe by calling its assign method.

Parameters:

df (DataFrame) – The dataframe to add new columns to.

Returns:

The input dataframe with new columns added.

Return type:

DataFrame
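
How a dictionary and keyword arguments might be combined can be sketched as below. `AssignSketch` is a hypothetical stand-in, and merging the dictionary with the keywords is an assumption of this sketch; dictionary keys must be strings for the unpacking to work.

```python
import pandas as pd


class AssignSketch:
    """Hypothetical stand-in for a wrapper around DataFrame.assign."""

    def __init__(self, col=None, **cols):
        # Assumption: dictionary entries and keywords are simply merged.
        self.cols = {**(col or {}), **cols}

    def __call__(self, df):
        return df.assign(**self.cols)


frame = pd.DataFrame({"a": [1, 2]})

# Callables are evaluated on the dataframe, other values are assigned as-is.
step = AssignSketch({"double": lambda d: d["a"] * 2}, const=0)
out = step(frame)
print(out)
```

Since `assign` returns a new object, the input dataframe keeps its original columns.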

class ColumnSelector(col)[source]

Bases: ArgRepr

Select a single column of a (grouped) pandas dataframe as a series.

This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a single argument (using the square-brackets accessor).

Parameters:

col (hashable) – Single DataFrame column to select.

__call__(df)[source]

Select a single column of a (grouped) pandas dataframe as series.

Parameters:

df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select column from.

Returns:

The selected column from the (grouped) dataframe.

Return type:

Series or SeriesGroupBy

class ColumnsSelector(col=(), *cols)[source]

Bases: ArgRepr

Select one or more columns of a (grouped) pandas dataframe as dataframe.

This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a list of arguments (using the square-brackets accessor).

Parameters:
  • col (hashable or array-like, optional) – Column name or sequence thereof. Defaults to an empty tuple.

  • *cols (hashable) – Additional column names.

__call__(df)[source]

Select the specified column(s) from a (grouped) pandas dataframe.

Parameters:

df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select column(s) from.

Returns:

The selected column(s) of the (grouped) dataframe.

Return type:

DataFrame or DataFrameGroupBy
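
The square-brackets access with a list of labels works the same way on a dataframe and on a grouped dataframe. In the sketch below, `ColumnsSketch` is a hypothetical stand-in that assumes string column labels.

```python
import pandas as pd


class ColumnsSketch:
    """Hypothetical stand-in: select columns via __getitem__ with a list."""

    def __init__(self, col=(), *cols):
        # Assumption: labels are strings, so a lone string is one label.
        first = [col] if isinstance(col, str) else list(col)
        self.cols = first + list(cols)

    def __call__(self, df):
        return df[self.cols]


frame = pd.DataFrame({"k": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]})
select = ColumnsSketch("a", "b")

picked = select(frame)                     # DataFrame with columns a and b
sums = select(frame.groupby("k")).sum()    # grouped selection, then summed
print(picked)
print(sums)
```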

class Copy(deep=True)[source]

Bases: ArgRepr

Partial of the copy method of a pandas dataframe or series.

Parameters:

deep (bool, optional) – Makes a deep copy when True, including a copy of the data and the indices. When False, neither the indices nor the data are copied. Defaults to True.

__call__(df)[source]

Call the copy method of the passed pandas dataframe or series.

Parameters:

df (DataFrame or Series) – The pandas object to call the copy method of.

Returns:

Copy of the pandas object passed, deep or shallow, depending on the flag set at instantiation.

Return type:

DataFrame or Series

class Drop(label=None, *labels, axis=1, index=None, columns=None, level=None, errors='raise')[source]

Bases: ArgRepr

A simple partial of a pandas dataframe or series’ drop method.

Parameters:
  • labels (hashable or sequence, optional) – Index or column labels to drop. Defaults to None.

  • axis (1 or "columns", 0 or "index") – Whether to drop labels from the columns (1 or “columns”) or index (0 or “index”). Defaults to 1.

  • index (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

  • columns (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

  • level (hashable, optional) – Integer or level name. For a MultiIndex, the level from which the labels will be removed. Defaults to None.

  • errors ("raise" or "ignore") – Defaults to “raise”. If “ignore”, suppress error and drop only existing labels.

__call__(df)[source]

Drop rows or columns from a pandas series or dataframe.

Parameters:

df (Series or DataFrame) – The object to drop rows or columns from.

Returns:

The object with rows or columns dropped.

Return type:

Series or DataFrame

property resolved

Resolved labels-axis vs. index-columns keywords.
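
The two equivalent spellings that pandas accepts, labels with axis versus the index and columns keywords, can be compared directly. This uses plain pandas calls only; how the resolved property reconciles them is not shown here.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# labels with axis=1 is equivalent to the columns keyword.
by_axis = frame.drop(["a", "b"], axis=1)
by_keyword = frame.drop(columns=["a", "b"])

# errors="ignore" drops only the labels that exist.
lenient = frame.drop(columns=["a", "missing"], errors="ignore")
print(by_axis)
print(lenient)
```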

class DropNA(axis=0, how=None, thresh=None, subset=None, ignore_index=True)[source]

Bases: ArgRepr

A simple partial of a pandas dataframe or series’ dropna method.

Parameters:
  • axis (0 or "index", 1 or "columns") – Determine if rows or columns which contain missing values are removed. Defaults to 0.

  • how ("any" or "all", optional) – Whether a row or column is removed when it contains at least one NA (“any”) or only when all of its values are NA (“all”). Defaults to None.

  • thresh (int, optional) – Require that many non-NA values. Cannot be combined with how. Defaults to None.

  • subset (hashable or sequence, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. Defaults to None.

  • ignore_index (bool, optional) – Defaults to True, thus relabeling the resulting axis as 0, 1, …, n - 1.

__call__(df)[source]

Drop rows or columns with NAs from a pandas series or dataframe.

Parameters:

df (Series or DataFrame) – The object to drop rows or columns with NAs from.

Returns:

The object with rows or columns with NAs dropped.

Return type:

Series or DataFrame
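
The interplay of how, thresh, and ignore_index is easiest to see on a tiny dataframe. The sketch uses plain pandas calls and assumes pandas 2.0 or newer, where `dropna` accepts `ignore_index`.

```python
import pandas as pd

nan = float("nan")
frame = pd.DataFrame({"a": [1.0, nan, 3.0], "b": [nan, nan, 6.0]})

# "any": drop rows with at least one NA; the index is relabeled from 0.
complete = frame.dropna(how="any", ignore_index=True)

# "all": drop only rows in which every value is NA; original labels kept.
partial = frame.dropna(how="all")

# thresh: keep rows with at least that many non-NA values.
at_least_two = frame.dropna(thresh=2)
print(complete, partial, at_least_two, sep="\n")
```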

class Explode(col=(), *cols, ignore_index=False)[source]

Bases: ArgRepr

Partial of a pandas dataframe or series explode method.

Parameters:
  • col (hashable or sequence, optional) – Column name or sequence of column names to explode. Only relevant when called on a DataFrame.

  • *cols (hashable) – Additional column names to explode.

  • ignore_index (bool, optional) – If True, the resulting index will be reset. Otherwise, it will be exploded as well, introducing duplicates. Defaults to False.

__call__(df)[source]

Explode a dataframe or series.

Parameters:

df (DataFrame or Series) – Pandas dataframe or series to explode.

Returns:

Exploded pandas dataframe or series.

Return type:

DataFrame or Series

Raises:

TypeError – When called on a dataframe with no col specified or when called on an object other than a dataframe or series.
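
The effect of ignore_index on the resulting index can be illustrated with plain pandas calls on made-up data.

```python
import pandas as pd

frame = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

# By default the index is exploded too, so label 0 appears twice.
kept = frame.explode("tags")

# With ignore_index=True the result is relabeled 0, 1, 2.
reset = frame.explode("tags", ignore_index=True)
print(kept)
print(reset)
```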

class FillNA(value, axis=0, limit=None)[source]

Bases: ArgRepr

Light wrapper around a pandas dataframe or series fillna method.

Parameters:
  • value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g. 0), alternately a dict, Series, or DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict or Series or DataFrame will not be filled. This value cannot be a list.

  • axis (int or str, optional) – Axis along which to fill missing values in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for Series. Defaults to 0.

  • limit (int, optional) – This is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

__call__(df)[source]

Call the fillna method of the passed pandas object.

Parameters:

df (Series or DataFrame) – The pandas object to call fillna on with the cached (keyword) arguments.

Returns:

The same type as called with.

Return type:

Series or DataFrame
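
A dictionary value fills only the columns it names, and limit caps the number of filled entries per column. The following uses plain pandas calls on made-up data.

```python
import pandas as pd

nan = float("nan")
frame = pd.DataFrame({"a": [1.0, nan], "b": [nan, 2.0], "c": [nan, nan]})

# Columns not named in the dictionary are left untouched.
per_column = frame.fillna({"a": 0.0, "b": -1.0})

# With limit=1, at most one NaN per column is filled.
limited = frame.fillna(0.0, limit=1)
print(per_column)
print(limited)
```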

class GroupBy(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]

Bases: ArgRepr

Simple partial of a pandas dataframe or series groupby method.

Parameters:
  • by (str, callable, series, array, dict, or list) – Column name, function (to be called on each value of the index), list or numpy array of the same length as the index, a dict or series providing a label -> group name mapping, or a list of the above.

  • level (hashable or sequence, optional) – If the axis is a multi-index (hierarchical), group by a particular level or levels. Do not specify both by and level. Defaults to None.

  • as_index (bool, optional) – Whether to return group labels as index. Defaults to True.

  • sort (bool, optional) – Whether to sort group keys. Defaults to True.

  • group_keys (bool, optional) – Whether to add the group keys to the index when calling apply on the groups. Defaults to True.

  • observed (bool, optional) – Whether to show only observed values for categorical groupers. Defaults to False.

  • dropna (bool, optional) – Whether to drop rows with NA values in the group keys. If False, NA values are treated as a group key of their own. Defaults to True.

Note

For a more extensive description of all (keyword) arguments, see the pandas documentation.

__call__(df)[source]

Call a dataframe or series groupby method.

Parameters:

df (DataFrame or Series) – Pandas dataframe or series to group.

Returns:

The grouped dataframe or series.

Return type:

DataFrameGroupBy or SeriesGroupBy
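
The effect of as_index and dropna can be shown with plain pandas calls. The data is made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"k": ["x", "y", "x"], "v": [1, 2, 3]})

# as_index=True (default): group labels become the index of the result.
indexed = frame.groupby("k")["v"].sum()

# as_index=False: group labels stay an ordinary column.
flat = frame.groupby("k", as_index=False)["v"].sum()

# dropna decides whether rows with an NA key form a group of their own.
gappy = pd.DataFrame({"k": ["x", None, "x"], "v": [1, 2, 3]})
without_na = gappy.groupby("k")["v"].sum()
with_na = gappy.groupby("k", dropna=False)["v"].sum()
print(indexed, flat, with_na, sep="\n")
```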

class Join(*args, **kwargs)[source]

Bases: ArgRepr

Light wrapper around the pandas dataframe join method.

Parameters:
  • *args – Arguments to pass on to the join method call.

  • **kwargs – Keyword arguments to pass on to the join method call.

Note

For a full list of (keyword) arguments and their description, see the pandas join documentation.

__call__(df, other)[source]

Join a dataframe with other dataframe(s) and/or series.

Parameters:
  • df (DataFrame) – Source dataframe on which the join method will be called.

  • other (DataFrame, Series, or a list of any combination) – Index should be similar to one (or more) columns in df. If a series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined dataframe.

Returns:

The joined dataframe.

Return type:

DataFrame
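
Unlike most callables in this module, this one takes two arguments at call time. A minimal sketch, with `JoinSketch` as a hypothetical stand-in that forwards the cached arguments after `other`:

```python
import pandas as pd


class JoinSketch:
    """Hypothetical stand-in for a wrapper around DataFrame.join."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def __call__(self, df, other):
        # The object to join arrives at runtime, the options were cached.
        return df.join(other, *self.args, **self.kwargs)


left = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
# A series must carry a name; it becomes the new column's name.
right = pd.Series([10, 20], index=["a", "b"], name="y")

joined = JoinSketch(on="key", how="left")(left, right)
print(joined)
```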

class Mapper(func, na_action=None, engine=None, **kwargs)[source]

Bases: ArgRepr

Partial of a pandas dataframe or series map method.

Parameters:
  • func (callable, Mapping, or Series) – Function or mapping in the form of a dictionary or a pandas series.

  • na_action (str, optional) – Can take the value “ignore” or None, defaulting to the latter.

  • engine (decorator, optional) – The engine to use for a pandas series. Defaults to None.

  • **kwargs (Any) – Keyword arguments are passed on to func.

__call__(df)[source]

Call the map method of a pandas series or dataframe.

Cached keyword arguments are forwarded to the method call.

Parameters:

df (DataFrame or Series) – Pandas dataframe or series to call the map method on.

Returns:

Pandas object with the result of the map operation.

Return type:

DataFrame or Series

Raises:

TypeError – When called on an unsuitable object type.
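
Mapping with a dictionary versus a function, and the role of na_action, can be illustrated on a series with plain pandas calls. The data is made up.

```python
import pandas as pd

series = pd.Series(["a", "b", None])

# A dictionary acts as a lookup table; unmatched or missing values give NaN.
looked_up = series.map({"a": 1, "b": 2})

# na_action="ignore" skips missing values instead of passing them to func.
upper = series.map(str.upper, na_action="ignore")
print(looked_up)
print(upper)
```

Without `na_action="ignore"`, `str.upper` would be handed the missing value itself.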

class Rename(mapper=None, index=None, columns=None, axis=1, level=None, errors='ignore')[source]

Bases: ArgRepr

Simple partial of a pandas dataframe or series rename method.

Parameters:
  • mapper (dict-like or function) – Dict-like or function transformations to apply to the axis values.

  • index (dict-like or function) – Alternative to specifying mapper with axis = 0.

  • columns (dict-like or function) – Alternative to specifying mapper with axis = 1.

  • axis (1 or "columns", 0 or "index", optional) – Axis to target with mapper. Defaults to 1.

  • level (Hashable, optional) – In case of a MultiIndex, only rename labels in the specified level. Defaults to None.

  • errors ("ignore" or "raise", optional) – If “raise”, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the index being transformed. If “ignore”, existing keys will be renamed and extra keys will be ignored. Defaults to “ignore”.

__call__(df)[source]

Rename a pandas dataframe’s or series’ columns or rows.

Parameters:

df (DataFrame or Series) – The dataframe or series to rename columns or rows of.

Returns:

The dataframe or series with renamed columns or rows.

Return type:

DataFrame or Series

Raises:

TypeError – When called on an unsuitable object type.

property resolved

Resolved mapper-axis vs. index-columns keywords.
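
The equivalence of mapper with axis and the columns keyword, as well as the two errors modes, can be checked with plain pandas calls. How the resolved property reconciles the keywords is not shown here.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1], "b": [2]})

# mapper with axis=1 is equivalent to the columns keyword.
by_axis = frame.rename({"a": "A"}, axis=1)
by_keyword = frame.rename(columns={"a": "A"})

# errors="ignore": labels that do not exist are silently skipped.
lenient = frame.rename(columns={"missing": "x"}, errors="ignore")

# errors="raise": the same mapping raises a KeyError.
try:
    frame.rename(columns={"missing": "x"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(by_axis, lenient, raised, sep="\n")
```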

class ResetIndex(level=None, drop=False, col_level=0, col_fill='', allow_duplicates=False, names=None)[source]

Bases: ArgRepr

Simple partial of a pandas dataframe or series reset_index method.

Parameters:
  • level (int, str, tuple, or list, optional) – Only remove the given levels from the index. Defaults to None, which removes all levels.

  • drop (bool, optional) – Do not try to insert index into dataframe columns. This resets the index to the default integer index. Defaults to False.

  • col_level (int or str, optional) – If the columns have multiple levels, determines which level the labels are inserted into. Defaults to 0.

  • col_fill (Hashable, optional) – If the columns have multiple levels, determines how the other levels are named. Defaults to an empty string.

  • allow_duplicates (bool, optional) – Allow duplicate column labels to be created. Defaults to False.

  • names (hashable or sequence, optional) – Using the given string, rename the dataframe column which contains the index data. If the dataframe has a multiindex, this has to be a list or tuple with length equal to the number of levels. Defaults to None.

__call__(df)[source]

Reset the index of a pandas dataframe or series.

Parameters:

df (DataFrame or Series) – The dataframe or series to reset the index of.

Returns:

The dataframe or series with its index reset.

Return type:

DataFrame or Series

Raises:

TypeError – When called with an unsuitable object type.
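
The effect of drop and names is shown below with plain pandas calls on made-up data. The names keyword assumes pandas 1.5 or newer.

```python
import pandas as pd

frame = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["x", "y"], name="k"))

# Default: the old index becomes a regular column.
moved = frame.reset_index()

# drop=True: the old index is discarded.
dropped = frame.reset_index(drop=True)

# names: the column holding the old index gets a new name.
renamed = frame.reset_index(names="key")
print(moved, dropped, renamed, sep="\n")
```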

class RollingWindow(*args, **kwargs)[source]

Bases: ArgRepr

Simple partial for calling a pandas object’s rolling method.

Parameters:
  • *args – Arguments to pass on to the rolling method call.

  • **kwargs – Keyword arguments to pass on to the rolling method call.

Note

See the pandas rolling docs for a full list of (keyword) arguments and an extensive description of usage.

__call__(df)[source]

Call a pandas object’s rolling method with the cached (kw)args.

Parameters:

df (Series, DataFrame, or their GroupBy companions) – The pandas object to call rolling on.

Returns:

Depending on the input type.

Return type:

Window, Rolling, or RollingGroupBy
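
Because the call returns a window object rather than data, a reduction has to follow. A minimal sketch, with `RollingSketch` as a hypothetical stand-in:

```python
import pandas as pd


class RollingSketch:
    """Hypothetical stand-in for a partial of the rolling method."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def __call__(self, df):
        # Returns a window object; no computation has happened yet.
        return df.rolling(*self.args, **self.kwargs)


series = pd.Series([1.0, 2.0, 3.0, 4.0])

# A window of 2 leaves the first entry undefined.
means = RollingSketch(2)(series).mean()

# min_periods=1 allows incomplete windows at the start.
sums = RollingSketch(2, min_periods=1)(series).sum()
print(means)
print(sums)
```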

class RowsSelector(condition)[source]

Bases: ArgRepr

Select rows from a pandas dataframe or series with some condition.

This is simply a partial for calling a dataframe’s or series’ __getitem__ method (using the square-brackets accessor) with a callable that takes the dataframe or series as input, and produces a 1-D, boolean array-like structure (of the same length as the dataframe or series to select from).

Parameters:

condition (callable or array-like) – A callable that accepts a dataframe or series and produces a 1-D, boolean array-like structure of the same length.

__call__(df)[source]

Select rows from a pandas dataframe or series.

Parameters:

df (DataFrame or Series) – The pandas dataframe or series to select rows from.

Returns:

The pandas dataframe or series with only the selected rows.

Return type:

DataFrame or Series
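
Passing a callable to the square-brackets accessor is standard pandas behavior, which the following sketch relies on. `RowsSketch` is a hypothetical stand-in for illustration.

```python
import pandas as pd


class RowsSketch:
    """Hypothetical stand-in: select rows via __getitem__ with a condition."""

    def __init__(self, condition):
        self.condition = condition

    def __call__(self, df):
        # pandas evaluates a callable key on the object being indexed.
        return df[self.condition]


frame = pd.DataFrame({"a": [1, 2, 3]})
series = pd.Series([1, 2, 3])

large = RowsSketch(lambda d: d["a"] > 1)(frame)
odd = RowsSketch(lambda s: s % 2 == 1)(series)
print(large)
print(odd)
```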

class SetIndex(key, *keys, drop=True, append=False)[source]

Bases: ArgRepr

Simple partial of a pandas dataframe’s set_index method.

Parameters:
  • key (hashable or array-like) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays.

  • *keys (hashable) – Additional columns to include into the index.

  • drop (bool, optional) – Delete columns to be used as the new index. Defaults to True.

  • append (bool, optional) – Whether to append columns to existing index. Defaults to False.

__call__(df)[source]

Set the index of a pandas dataframe.

Parameters:

df (DataFrame) – The dataframe to set the index of.

Returns:

The dataframe with a new index set.

Return type:

DataFrame
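
The drop and append options are illustrated below with plain pandas calls on made-up data.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [0.1, 0.2]})

# Several keys form a MultiIndex; by default they leave the columns.
indexed = frame.set_index(["a", "b"])

# drop=False keeps the key as a column as well.
kept = frame.set_index("a", drop=False)

# append=True adds the key as an extra level to the existing index.
appended = frame.set_index("b", append=True)
print(indexed, kept, appended, sep="\n")
```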

class SortValues(by, *bys, **kwargs)[source]

Bases: ArgRepr

Partial of the pandas dataframe or series sort_values method.

Parameters:
  • by (hashable or sequence) – Name or list of names to sort by. Ignored if used with a series.

  • *bys (hashable) – Additional names to sort by. Again ignored if used with a series.

  • **kwargs – Additional keyword arguments will be forwarded to the method call with the exception of “inplace”, which will be set to False.

Note

For a full list of keyword arguments and their description, see the pandas sort_values documentation.

__call__(df)[source]

Sort a pandas dataframe or series by column(s) values.

Parameters:

df (DataFrame or Series) – The dataframe or series to sort.

Returns:

The sorted dataframe or series.

Return type:

DataFrame or Series

Raises:

TypeError – If called on anything other than a pandas series or dataframe.
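
The behavior described above (names ignored for a series, inplace forced to False, TypeError for other inputs) could be sketched as follows. `SortSketch` is a hypothetical stand-in that assumes `by` is a single label; it is not this module's implementation.

```python
import pandas as pd


class SortSketch:
    """Hypothetical stand-in for a partial of the sort_values method."""

    def __init__(self, by, *bys, **kwargs):
        # Assumption: by is a single label, not a sequence of labels.
        self.by = [by, *bys]
        # inplace is always overridden so that a new object is returned.
        self.kwargs = {**kwargs, "inplace": False}

    def __call__(self, df):
        if isinstance(df, pd.DataFrame):
            return df.sort_values(self.by, **self.kwargs)
        if isinstance(df, pd.Series):
            # A series has nothing to sort by except its own values.
            return df.sort_values(**self.kwargs)
        raise TypeError("Expected a pandas series or dataframe!")


sort = SortSketch("a", "b", ascending=False)
ordered = sort(pd.DataFrame({"a": [2, 1, 2], "b": [1, 5, 0]}))
values = sort(pd.Series([3, 1, 2]))
print(ordered)
print(values)
```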

class Transform(func, axis=0, *args, engine=None, engine_kwargs=None, **kwargs)[source]

Bases: ArgRepr

Simple partial for calling a pandas object’s transform method.

Parameters:
  • func (callable, str, list, or dict) – Function(s) to use for transforming the data.

  • axis (int or str, optional) – Which dimension to transform along in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for all other pandas objects. Defaults to 0.

  • *args – Positional arguments to pass on to the transform or func call.

  • engine (str, optional) – Which engine to use when applied to group-by objects.

  • engine_kwargs (dict, optional) – Keywords to configure the engine, if any.

  • **kwargs – Keyword arguments to pass on to the transform or func call.

Note

See the pandas transform docs for a full list of (keyword) arguments and a description of usage and configuration.

__call__(df)[source]

Call a pandas object’s transform method with cached (kw)args.

Parameters:

df (Series, DataFrame, their GroupBy companions, or Resampler) – The pandas object to transform.

Returns:

Depending on the input type.

Return type:

Series or DataFrame

Raises:

TypeError – When called on an unsuitable object type.
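
In contrast to an aggregation, a transform returns an object with the same index as its input. This can be seen with plain pandas calls on made-up data.

```python
import pandas as pd

frame = pd.DataFrame({"k": ["x", "x", "y"], "v": [1.0, 3.0, 5.0]})

# On a group-by object, func sees each group; the result is aligned
# with the rows of the original dataframe.
centered = frame.groupby("k")["v"].transform(lambda s: s - s.mean())

# On a dataframe, func is applied column by column.
doubled = frame[["v"]].transform(lambda c: c * 2)
print(centered)
print(doubled)
```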