pd
Pandas utilities and partials of dataframe method calls.
Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
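The standard library alone is enough to illustrate the pipe idea. In this minimal sketch, `operator.methodcaller` objects stand in for the partials documented below. This is an assumption made for illustration, not this module's implementation. The hypothetical helper `run_pipe` passes a dataframe through them with `functools.reduce`.

```python
from functools import reduce
from operator import methodcaller

import pandas as pd

# Arguments known at program start are bound once ...
steps = [
    methodcaller("dropna"),                  # stand-in for a DropNA partial
    methodcaller("sort_values", "a"),        # stand-in for a SortValues partial
    methodcaller("reset_index", drop=True),  # stand-in for a ResetIndex partial
]


def run_pipe(df, callables):
    """Hypothetical helper: feed df through each callable in turn."""
    return reduce(lambda acc, step: step(acc), callables, df)


# ... and dataframes flow through the pipe at runtime.
df = pd.DataFrame({"a": [3.0, None, 1.0, 2.0]})
result = run_pipe(df, steps)
print(result["a"].tolist())  # [1.0, 2.0, 3.0]
```

Each step is a callable that takes a single pandas object, so the pipe needs no knowledge of which method a step wraps.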
- class Agg(func=None, axis=0, *args, engine=None, engine_kwargs=None, **kwargs)
Bases: ArgRepr
Simple partial for calling a pandas object’s agg method.
- Parameters:
func (callable, str, list, or dict, optional) – Function(s) to use for aggregating the data. If a function, it must work when passed a Series. Also acceptable are a function name, a list of function names, and a dictionary with column names as keys and functions, function names, or lists thereof as values. Defaults to None, which only works for a dataframe and relies on kwargs to specify named aggregations.
axis (int or str, optional) – Which dimension to aggregate over in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for all other pandas objects. Defaults to 0.
*args – Positional arguments to pass on to the agg or func call.
engine (str, optional) – Which engine to use when applied to group-by objects.
engine_kwargs (dict, optional) – Keywords to configure the engine, if any.
**kwargs – Keyword arguments to pass on to the agg or func call. When used on a dataframe or series, and if func is None, column names with their individual aggregation functions can be given.
Note
See the pandas agg docs for a full list of (keyword) arguments and an extensive description of usage and configuration.
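The sketch below assumes that Agg simply forwards its stored arguments to agg, as the description above suggests. Under that assumption, an instance such as Agg({"a": "sum", "b": "max"}) should behave like the `operator.methodcaller` stand-in shown here.

```python
from operator import methodcaller

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Stand-in for Agg({"a": "sum", "b": "max"}): one function per column.
agg = methodcaller("agg", {"a": "sum", "b": "max"})
result = agg(df)  # one aggregated value per column
print(int(result["a"]), int(result["b"]))  # 6 6
```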
- class Apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine=None, engine_kwargs=None, **kwargs)
Bases: ArgRepr
Simple partial for calling a pandas object’s apply method.
- Parameters:
func (callable, str, list, or dict) – Function(s) to apply to the data.
axis (int or str, optional) – Which dimension to apply func over in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for all other pandas objects. Defaults to 0.
raw (bool, optional) – Whether to pass a series or a numpy array to func. Defaults to False, which results in a series being passed.
result_type (str, optional) – Must be one of “expand”, “reduce”, “broadcast”, or None. Defaults to None.
args (tuple, optional) – Positional arguments to pass on to func. Defaults to an empty tuple.
by_row (str or bool, optional) – Must be one of “compat” or False. Defaults to “compat”.
engine (str or decorator, optional) – Which engine to use. Defaults to the Python interpreter.
engine_kwargs (dict, optional) – Keywords to configure the engine, if any.
**kwargs – Keyword arguments to pass on to the func call.
Note
See the pandas apply docs for a full list of (keyword) arguments and a description of usage and configuration.
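This example shows how args and extra keyword arguments reach func when a function is applied row by row. It assumes that Apply forwards them to apply unchanged. `weighted_sum` is a hypothetical function made up for the example.

```python
from operator import methodcaller

import pandas as pd


def weighted_sum(row, offset, scale=1):
    """Add the row's values and an offset, then scale the total."""
    return (row["a"] + row["b"] + offset) * scale


df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Stand-in for Apply(weighted_sum, axis=1, args=(100,), scale=2):
# ``args`` is passed positionally, extra keywords go to ``func``.
apply = methodcaller("apply", weighted_sum, axis=1, args=(100,), scale=2)
result = apply(df)
print(result.tolist())  # [222, 244]
```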
- class AsFreq(freq, method=None, how=None, normalize=False, fill_value=None)
Bases: ArgRepr
Light wrapper around a pandas dataframe or series asfreq method.
- Parameters:
freq (DateOffset or str) – Frequency DateOffset or string.
method (str, optional) – Method to use for filling holes in the re-indexed series (note that this does not fill NaNs that were already present). Must be one of “pad”/“ffill” or “backfill”/“bfill”. Defaults to None.
how (str, optional) – For PeriodIndex only. Must be one of “start” or “end”. Defaults to None.
normalize (bool, optional) – Whether to reset the output index to midnight. Defaults to False.
fill_value (scalar, optional) – Value to use for missing values, applied during upsampling (note that this does not fill NaNs that were already present). Defaults to None.
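AsFreq is described as a light wrapper. If so, AsFreq("D", fill_value=0) or AsFreq("D", method="ffill") presumably reduces to one of the plain pandas calls below, which upsample a series from every other day to daily.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=3, freq="2D")
s = pd.Series([1, 2, 3], index=idx)

# New daily slots that did not exist before get fill_value.
daily = s.asfreq("D", fill_value=0)
print(daily.tolist())  # [1, 0, 2, 0, 3]

# Alternatively, propagate the last valid observation forward.
padded = s.asfreq("D", method="ffill")
print(padded.tolist())  # [1, 1, 2, 2, 3]
```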
- class AsType(types, errors='raise')
Bases: ReprName
Partial of a pandas dataframe or series astype method.
- Parameters:
types (type or dict) – Single type or dictionary of column names and types, specifying type conversion of the entire dataframe or of specific columns, respectively.
errors (str, optional) – What to do when data cannot be type cast. Must be one of “raise” or “ignore”. Defaults to “raise”.
- class Assign(col=None, **cols)
Bases: ArgRepr
Light wrapper around a pandas dataframe’s assign method.
- Parameters:
col (dict, optional) – A dictionary with the names of newly created (or overwritten) columns as keys. If the values are callable, they are computed on the entire dataframe and assigned to the new columns. The callable must not change the input dataframe (though pandas doesn’t check this). If the values are not callable (e.g., a series, scalar, or array), they are simply assigned.
**cols – As in the original pandas method, the keyword arguments themselves serve as the name(s) of the new (or overwritten) column(s), and their values are set in the same way.
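This sketch contrasts callable and non-callable values. It assumes, as a guess about the implementation, that Assign merges the col dictionary with the **cols keywords before calling assign. The names `col` and `cols` below only mirror the parameters.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Hypothetical merge of the positional ``col`` dict with ``**cols``.
col = {"total": lambda frame: frame["a"] + frame["b"]}  # callable: computed on df
cols = {"flag": True}                                  # scalar: simply assigned
result = df.assign(**col, **cols)
print(result["total"].tolist())  # [4, 6]
print(result["flag"].tolist())   # [True, True]
```

assign returns a new dataframe, so df itself still has only columns "a" and "b".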
- class ColumnSelector(col)
Bases: ArgRepr
Select a single column of a (grouped) pandas dataframe as a series.
This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a single argument (using the square-brackets accessor).
- Parameters:
col (hashable) – Single DataFrame column to select.
- class ColumnsSelector(col=(), *cols)
Bases: ArgRepr
Select one or more columns of a (grouped) pandas dataframe as a dataframe.
This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a list of arguments (using the square-brackets accessor).
- Parameters:
col (hashable or array-like, optional) – Column name or sequence thereof. Defaults to an empty tuple.
*cols (hashable) – Additional column names.
- __call__(df)
Select the specified column(s) from a (grouped) pandas dataframe.
- Parameters:
df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select column(s) from.
- Returns:
The selected column(s) of the (grouped) dataframe.
- Return type:
DataFrame or DataFrameGroupBy
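ColumnSelector and ColumnsSelector differ in the same way that a single key and a list of keys differ in square brackets. The following plain pandas calls are what the two selectors are described as wrapping.

```python
import pandas as pd

df = pd.DataFrame({"k": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]})

single = df["a"]                       # like ColumnSelector("a"): a Series
several = df[["a", "b"]]               # like ColumnsSelector("a", "b"): a DataFrame
grouped = df.groupby("k")[["a", "b"]]  # selection works on grouped frames, too

print(type(single).__name__, type(several).__name__)  # Series DataFrame
print(grouped.sum().loc["x"].tolist())  # [3, 9]
```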
- class Copy(deep=True)
Bases: ArgRepr
Partial of the copy method of a pandas dataframe or series.
- Parameters:
deep (bool, optional) – Makes a deep copy when True, including a copy of the data and the indices. When False, neither the indices nor the data are copied. Defaults to True.
- __call__(df)
Call the copy method of the passed pandas dataframe or series.
- Parameters:
df (DataFrame or Series) – The pandas object to call the copy method of.
- Returns:
Copy of the pandas object passed, deep or shallow, depending on the flag set at instantiation.
- Return type:
DataFrame or Series
- class Drop(label=None, *labels, axis=1, index=None, columns=None, level=None, errors='raise')
Bases: ArgRepr
A simple partial of a pandas dataframe or series’ drop method.
- Parameters:
label (hashable or sequence, optional) – Index or column labels to drop. Defaults to None.
*labels (hashable) – Additional labels to drop.
axis (1 or "columns", 0 or "index") – Whether to drop labels from the columns (1 or “columns”) or index (0 or “index”). Defaults to 1.
index (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level (hashable, optional) – Integer or level name. Defaults to None. For MultiIndex, level from which the labels will be removed.
errors ("raise" or "ignore") – Defaults to “raise”. If “ignore”, suppress the error and drop only existing labels.
- __call__(df)
Drop rows or columns from a pandas series or dataframe.
- Parameters:
df (Series or DataFrame) – The object to drop rows or columns from.
- Returns:
The object with rows or columns dropped.
- Return type:
Series or DataFrame
- property resolved
Resolved labels-axis vs. index-columns keywords.
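The resolved property suggests that Drop reconciles the two ways of naming what to drop. This plain pandas sketch shows that both ways are equivalent, and how errors="ignore" behaves with a missing label.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# labels + axis and the columns keyword select the same labels.
by_axis = df.drop(["b", "c"], axis=1)
by_keyword = df.drop(columns=["b", "c"])
print(by_axis.equals(by_keyword))  # True

# errors="ignore" drops only the labels that actually exist.
kept = df.drop(columns=["b", "missing"], errors="ignore")
print(kept.columns.tolist())  # ['a', 'c']
```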
- class DropNA(axis=0, how=None, thresh=None, subset=None, ignore_index=True)
Bases: ArgRepr
A simple partial of a pandas dataframe or series’ dropna method.
- Parameters:
axis (0 or "index", 1 or "columns") – Whether rows (0 or “index”) or columns (1 or “columns”) that contain missing values are removed. Defaults to 0.
how ("any" or "all", optional) – Whether a row or column is removed when it contains at least one NA (“any”) or only NAs (“all”). Defaults to None.
thresh (int, optional) – Require that many non-NA values. Cannot be combined with how. Defaults to None.
subset (hashable or sequence, optional) – Labels along the other axis to consider, e.g., if you are dropping rows, these would be a list of columns to include. Defaults to None.
ignore_index (bool, optional) – Defaults to True, thus relabeling the resulting axis as 0, 1, …, n - 1.
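The effect of how and thresh can be seen on a tiny dataframe. The last step resets the index, which should give the same labels as ignore_index=True (0, 1, …, n - 1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [2.0, 3.0, np.nan]})

print(len(df.dropna(how="any")))  # 1: only the first row is complete
print(len(df.dropna(how="all")))  # 2: only the last row is entirely NA
print(len(df.dropna(thresh=2)))   # 1: rows need at least 2 non-NA values

# Relabel the surviving rows as 0, 1, ..., n - 1.
relabeled = df.dropna(how="all").reset_index(drop=True)
print(relabeled.index.tolist())  # [0, 1]
```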
- class Explode(col=(), *cols, ignore_index=False)
Bases: ArgRepr
Partial of a pandas dataframe or series explode method.
- Parameters:
col (hashable or sequence, optional) – Column name or sequence of column names to explode. Only relevant when called on a DataFrame.
*cols (hashable) – Additional column names to explode.
ignore_index (bool, optional) – If True, the resulting index will be reset to 0, 1, …, n - 1. Otherwise, the original index values are repeated for every exploded element, introducing duplicates. Defaults to False.
- __call__(df)
Explode a dataframe or series.
- Parameters:
df (DataFrame or Series) – Pandas dataframe or series to explode.
- Returns:
Exploded pandas dataframe or series.
- Return type:
DataFrame or Series
- Raises:
TypeError – When called on a dataframe with no col specified or when called on an object other than a dataframe or series.
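The following plain pandas calls show the two index behaviors controlled by ignore_index when a list-valued column is exploded.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

kept = df.explode("tags")                      # original index values repeat
reset = df.explode("tags", ignore_index=True)  # index becomes 0..n-1

print(kept.index.tolist())     # [0, 0, 1]
print(reset.index.tolist())    # [0, 1, 2]
print(reset["tags"].tolist())  # ['a', 'b', 'c']
```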
- class FillNA(value, axis=0, limit=None)
Bases: ArgRepr
Light wrapper around a pandas dataframe or series fillna method.
- Parameters:
value (scalar, dict, Series, or DataFrame) – Value to use to fill holes (e.g., 0), or alternatively a dict, Series, or DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict, Series, or DataFrame will not be filled. This value cannot be a list.
axis (int or str, optional) – Axis along which to fill missing values in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for Series. Defaults to 0.
limit (int, optional) – The maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None. Defaults to None.
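Passing a dict as value fills each column with its own value. Columns that are missing from the dict keep their NaNs, as this plain pandas example shows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 2.0], "b": [np.nan, np.nan], "c": [np.nan, 1.0]})

# Per-column fill values; column "c" is not in the dict and stays untouched.
filled = df.fillna({"a": 0.0, "b": -1.0})
print(filled["a"].tolist())            # [0.0, 2.0]
print(filled["b"].tolist())            # [-1.0, -1.0]
print(int(filled["c"].isna().sum()))   # 1
```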
- class GroupBy(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
Bases: ArgRepr
Simple partial of a pandas dataframe or series groupby method.
- Parameters:
by (str, callable, series, array, dict, or list, optional) – Column name, function (to be called on each index label), list or numpy array of the same length as the index, a dict or series providing a label -> group name mapping, or a list of the above. Defaults to None.
level (hashable or sequence, optional) – If the axis is a multi-index (hierarchical), group by a particular level or levels. Do not specify both by and level. Defaults to None.
as_index (bool, optional) – Whether to return group labels as index. Defaults to True.
sort (bool, optional) – Whether to sort group keys. Defaults to True.
group_keys (bool, optional) – Whether to add group keys to the index of the result when calling apply. Defaults to True.
observed (bool, optional) – Whether to show only observed values for categorical groupers. Defaults to False.
dropna (bool, optional) – Whether to drop NA values in group keys, together with their rows. If False, NA values are treated as a group key of their own. Defaults to True.
Note
For a more extensive description of all (keyword) arguments, see the pandas documentation.
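In a pipe, a GroupBy step is naturally followed by an aggregation step. This sketch again uses `operator.methodcaller` stand-ins for GroupBy("k") and Agg("sum"), assuming both forward their arguments unchanged. It also shows what the dropna flag does.

```python
from operator import methodcaller

import numpy as np
import pandas as pd

df = pd.DataFrame({"k": ["x", "y", np.nan, "x"], "v": [1, 2, 3, 4]})

group = methodcaller("groupby", "k")                   # stand-in for GroupBy("k")
group_na = methodcaller("groupby", "k", dropna=False)  # keep the NA key
total = methodcaller("agg", "sum")                     # stand-in for Agg("sum")

dropped = total(group(df))    # the row with an NA key is discarded
kept = total(group_na(df))    # NA forms a group of its own
print(dropped["v"].tolist())  # [5, 2]
print(len(kept))              # 3
```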
- class Join(*args, **kwargs)
Bases: ArgRepr
Light wrapper around the pandas dataframe join method.
- Parameters:
*args – Arguments to pass on to the join method call.
**kwargs – Keyword arguments to pass on to the join method call.
Note
For a full list of (keyword) arguments and their description, see the pandas join documentation.
- __call__(df, other)
Join a dataframe with other dataframe(s) and/or series.
- Parameters:
df (DataFrame) – Source dataframe on which the join method will be called.
other (DataFrame, Series, or a list of any combination) – Index should be similar to one (or more) columns in df. If a series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined dataframe.
- Returns:
The joined dataframe.
- Return type:
DataFrame
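Unlike most partials here, Join receives the other object only at call time. This sketch models that split with `functools.partial` around a hypothetical helper, `join_frames`, which is not part of this module.

```python
from functools import partial

import pandas as pd


def join_frames(df, other, **kwargs):
    """Hypothetical stand-in for Join(**kwargs)(df, other)."""
    return df.join(other, **kwargs)


left = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
other = pd.Series([10, 20], index=["x", "z"], name="b")  # name becomes the column

left_join = partial(join_frames, how="left")    # configured at program start
inner_join = partial(join_frames, how="inner")

result = left_join(left, other)
print(result["b"].isna().tolist())             # [False, True, False]
print(inner_join(left, other).index.tolist())  # ['x', 'z']
```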
- class Mapper(func, na_action=None, engine=None, **kwargs)
Bases: ArgRepr
Partial of a pandas dataframe or series map method.
- Parameters:
func (callable, Mapping, or Series) – Function or mapping in the form of a dictionary or a pandas series.
na_action (str, optional) – Can take the value “ignore” or None, defaulting to the latter.
engine (decorator, optional) – The engine to use for a pandas series. Defaults to None.
**kwargs (Any) – Keyword arguments are passed on to func.
- __call__(df)
Call the map method of a pandas series or dataframe. Cached keyword arguments are forwarded to the method call.
- Parameters:
df (DataFrame or Series) – Pandas dataframe or series to call the map method on.
- Returns:
Pandas object with the result of the map operation.
- Return type:
DataFrame or Series
- Raises:
TypeError – When called on an unsuitable object type.
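The two kinds of func and the role of na_action can be seen in plain pandas on a series. A dict leaves unmatched values as NaN. With na_action="ignore", a function is never called on NaN, which matters for functions like str.upper that would fail on a float.

```python
import numpy as np
import pandas as pd

s = pd.Series(["cat", "dog", np.nan, "cow"])

# Mapping: values missing from the dict become NaN.
mapped = s.map({"cat": "kitten", "dog": "puppy"})
print(mapped.tolist())  # ['kitten', 'puppy', nan, nan]

# Function with na_action="ignore": NaN is passed through untouched.
upper = s.map(str.upper, na_action="ignore")
print(upper.tolist())  # ['CAT', 'DOG', nan, 'COW']
```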
- class Rename(mapper=None, index=None, columns=None, axis=1, level=None, errors='ignore')
Bases: ArgRepr
Simple partial of a pandas dataframe or series rename method.
- Parameters:
mapper (dict-like or function) – Dict-like or function transformations to apply to the axis values.
index (dict-like or function) – Alternative to specifying mapper with axis = 0.
columns (dict-like or function) – Alternative to specifying mapper with axis = 1.
axis (1 or "columns", 0 or "index", optional) – Axis to target with mapper. Defaults to 1.
level (Hashable, optional) – In case of a MultiIndex, only rename labels in the specified level. Defaults to None.
errors ("ignore" or "raise", optional) – If “raise”, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the index being transformed. If “ignore”, existing keys will be renamed and extra keys will be ignored. Defaults to “ignore”.
- __call__(df)
Rename a pandas dataframe’s or series’ columns or rows.
- Parameters:
df (DataFrame or Series) – The dataframe or series to rename columns or rows of.
- Returns:
The dataframe or series with renamed columns or rows.
- Return type:
DataFrame or Series
- Raises:
TypeError – When called on an unsuitable object type.
- property resolved
Resolved mapper-axis vs. index-columns keywords.
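This plain pandas example shows how mapper with axis=1 relates to the columns keyword, and how the two errors settings differ.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

# mapper + axis=1 targets the columns; the extra key is ignored by default.
renamed = df.rename({"a": "alpha", "zzz": "unused"}, axis=1)
print(renamed.columns.tolist())  # ['alpha', 'b']
print(renamed.equals(df.rename(columns={"a": "alpha"})))  # True

# errors="raise" rejects labels that are not present.
raised = False
try:
    df.rename(columns={"zzz": "unused"}, errors="raise")
except KeyError:
    raised = True
print(raised)  # True
```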
- class ResetIndex(level=None, drop=False, col_level=0, col_fill='', allow_duplicates=False, names=None)
Bases: ArgRepr
Simple partial of a pandas dataframe or series reset_index method.
- Parameters:
level (int, str, tuple, or list, optional) – Only remove the given levels from the index. Defaults to None, which removes all levels.
drop (bool, optional) – Do not try to insert the index into the dataframe columns. This resets the index to the default integer index. Defaults to False.
col_level (int or str, optional) – If the columns have multiple levels, determines which level the labels are inserted into. Defaults to 0.
col_fill (Hashable, optional) – If the columns have multiple levels, determines how the other levels are named. Defaults to an empty string.
allow_duplicates (bool, optional) – Allow duplicate column labels to be created. Defaults to False.
names (hashable or sequence, optional) – Using the given string, rename the dataframe column which contains the index data. If the dataframe has a multi-index, this has to be a list or tuple with length equal to the number of levels. Defaults to None.
- class RollingWindow(*args, **kwargs)
Bases: ArgRepr
Simple partial for calling a pandas object’s rolling method.
- Parameters:
*args – Arguments to pass on to the rolling method call.
**kwargs – Keyword arguments to pass on to the rolling method call.
Note
See the pandas rolling docs for a full list of (keyword) arguments and an extensive description of usage.
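A rolling partial only fixes the window. The aggregation is chosen later, so one configured window can feed several reductions. The sketch uses an `operator.methodcaller` stand-in for RollingWindow(2, min_periods=1), assuming the class forwards its arguments unchanged.

```python
from operator import methodcaller

import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Stand-in for RollingWindow(2, min_periods=1): bind the window now.
window = methodcaller("rolling", 2, min_periods=1)

sums = window(s).sum()
means = window(s).mean()
print(sums.tolist())   # [1.0, 3.0, 5.0, 7.0]
print(means.tolist())  # [1.0, 1.5, 2.5, 3.5]
```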
- class RowsSelector(condition)
Bases: ArgRepr
Select rows from a pandas dataframe or series with some condition.
This is simply a partial for calling a dataframe’s or series’ __getitem__ method (using the square-brackets accessor) with a callable that takes the dataframe or series as input and produces a 1-D, boolean array-like structure (of the same length as the dataframe or series to select from).
- Parameters:
condition (callable or array-like) – A callable that accepts a dataframe or series and produces a 1-D, boolean array-like structure of the same length as its input.
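pandas evaluates a callable passed to the square-brackets accessor on the object itself. This is the mechanism RowsSelector is described as wrapping. `is_large` is a hypothetical condition for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 3], "b": ["x", "y", "z"]})


def is_large(frame):
    """Boolean mask marking the rows whose column "a" exceeds 2."""
    return frame["a"] > 2


selected = df[is_large]        # callable evaluated on df itself
print(selected["b"].tolist())  # ['y', 'z']
print(selected.equals(df[is_large(df)]))  # True: same as passing the mask

# The same works on a series.
print(df["a"][lambda x: x > 2].tolist())  # [5, 3]
```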
- class SetIndex(key, *keys, drop=True, append=False)
Bases: ArgRepr
Simple partial of a pandas dataframe’s set_index method.
- Parameters:
key (hashable or array-like) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays.
*keys (hashable) – Additional columns to include in the index.
drop (bool, optional) – Whether to delete the columns used as the new index. Defaults to True.
append (bool, optional) – Whether to append columns to the existing index. Defaults to False.
- class SortValues(by, *bys, **kwargs)
Bases: ArgRepr
Partial of the pandas dataframe or series sort_values method.
- Parameters:
by (hashable or sequence) – Name or list of names to sort by. Ignored if used with a series.
*bys (hashable) – Additional names to sort by. Again ignored if used with a series.
**kwargs – Additional keyword arguments will be forwarded to the method call, with the exception of “inplace”, which will be set to False.
Note
For a full list of keyword arguments and their description, see the pandas sort_values documentation.
- __call__(df)
Sort a pandas dataframe or series by column(s) values.
- Parameters:
df (DataFrame or Series) – The dataframe or series to sort.
- Returns:
The sorted dataframe or series.
- Return type:
DataFrame or Series
- Raises:
TypeError – If called on anything other than a pandas series or dataframe.
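SortValues("k", "v", ascending=[True, False]) would presumably reduce to the first call below, with the names gathered into one list and the keyword passed on. On a series, the names play no role.

```python
import pandas as pd

df = pd.DataFrame({"k": ["b", "a", "b", "a"], "v": [2, 3, 1, 4]})

# Several sort keys, with a per-key sort direction passed as a keyword.
result = df.sort_values(["k", "v"], ascending=[True, False])
print(result["k"].tolist(), result["v"].tolist())  # ['a', 'a', 'b', 'b'] [4, 3, 2, 1]

# On a series, the values themselves are sorted.
print(df["v"].sort_values().tolist())  # [1, 2, 3, 4]
```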
- class Transform(func, axis=0, *args, engine=None, engine_kwargs=None, **kwargs)
Bases: ArgRepr
Simple partial for calling a pandas object’s transform method.
- Parameters:
func (callable, str, list, or dict) – Function(s) to use for transforming the data.
axis (int or str, optional) – Which dimension to transform over in case of a dataframe. Must be one of 0, “index”, 1, or “columns”. Ignored for all other pandas objects. Defaults to 0.
*args – Positional arguments to pass on to the transform or func call.
engine (str, optional) – Which engine to use when applied to group-by objects.
engine_kwargs (dict, optional) – Keywords to configure the engine, if any.
**kwargs – Keyword arguments to pass on to the transform or func call.
Note
See the pandas transform docs for a full list of (keyword) arguments and description of usage and configuration.
- __call__(df)
Call a pandas object’s transform method with cached (kw)args.
- Parameters:
df (Series, DataFrame, their GroupBy companions, or Resampler) – The pandas object to transform.
- Returns:
The transformed pandas object, whose type depends on the input type.
- Return type:
Series or DataFrame
- Raises:
TypeError – When called on an unsuitable object type.
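The difference between transform and agg shows up most clearly on group-by objects. In this plain pandas example, transform returns one value per original row rather than one per group.

```python
import pandas as pd

df = pd.DataFrame({"k": ["x", "x", "y"], "v": [1.0, 3.0, 5.0]})

# Unlike agg, transform returns a result aligned with the original rows.
group_mean = df.groupby("k")["v"].transform("mean")
print(group_mean.tolist())  # [2.0, 2.0, 5.0]

# On a plain series, the result keeps the series' length and index.
scaled = df["v"].transform(lambda x: x * 10)
print(scaled.tolist())  # [10.0, 30.0, 50.0]
```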