pd
Pandas utilities and partials of dataframe method calls.
Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
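The idea of configuring steps up front and letting dataframes flow through them later can be sketched as follows. This is only an illustration: MethodPartial and pipe are hypothetical stand-ins written for this example, not classes from this module.

```python
import pandas as pd


class MethodPartial:
    """Hypothetical stand-in: cache arguments now, call the method later."""

    def __init__(self, method, *args, **kwargs):
        self.method = method
        self.args = args
        self.kwargs = kwargs

    def __call__(self, obj):
        # Look up the named method on whatever object flows through the pipe.
        return getattr(obj, self.method)(*self.args, **self.kwargs)


def pipe(*steps):
    """Chain callables so that each one receives the previous one's output."""
    def run(df):
        for step in steps:
            df = step(df)
        return df
    return run


# Configuration happens once, at program start ...
pipeline = pipe(
    MethodPartial("dropna"),
    MethodPartial("sort_values", "score", ascending=False),
    MethodPartial("reset_index", drop=True),
)

# ... and dataframes flow through the preconfigured steps at runtime.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.0, None, 3.0]})
result = pipeline(df)
```

The classes documented below play the role of MethodPartial for specific pandas methods, each with its own named parameters.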
- class Agg(func=None, *args, **kwargs)[source]
Bases:
ArgRepr
Simple partial for calling a pandas object’s agg method.
- Parameters:
func (callable, str, list, or dict, optional) – Function(s) to use for aggregating the data. If a function, must work when passed a Series. Also acceptable are a function name, a list of function names, and a dictionary with column names as keys and functions, function names, or lists thereof as values. Defaults to None, which only works for a dataframe and relies on kwargs to specify named aggregations.
*args – Positional arguments to pass on to the agg or func call.
**kwargs – Keyword arguments to pass on to the agg or func call.
Note
See the pandas agg docs for a full list of (keyword) arguments and an extensive description of usage and configuration.
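A brief sketch of the two calling styles described above, written as plain pandas agg calls rather than with Agg itself (how the class forwards its cached arguments is assumed to mirror these calls). The column and result names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# func as a dictionary with column names as keys and function names as values.
by_dict = df.agg({"a": "sum", "b": "mean"})

# func left at None: named aggregations are given as keyword arguments,
# each one a (column, function) pair. This form needs a dataframe.
named = df.agg(total=("a", "sum"), average=("b", "mean"))
```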
- class AsType(types, **kwargs)[source]
Bases:
ReprName
Partial of a pandas dataframe or series astype method.
- Parameters:
types (type or dict) – Single type or dictionary of column names and types, specifying type conversion of entire dataframe or specific columns, respectively.
**kwargs – Keyword arguments are passed on to the astype method call after the types argument.
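For orientation, the two forms of the types argument correspond to these plain pandas astype calls (a sketch with made-up column names, not a use of AsType itself):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["3", "4"]})

# A single type converts the entire dataframe.
all_str = df.astype(str)

# A dictionary converts only the listed columns.
typed = df.astype({"b": "int64"})
```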
- class Assign(col=None, **cols)[source]
Bases:
ArgRepr
Light wrapper around a pandas dataframe’s assign method.
- Parameters:
col (dict) – A dictionary with the names of newly created (or overwritten) columns as keys. If the values are callable, they are computed on the entire dataframe and assigned to the new columns. The callable must not change the input dataframe (though pandas doesn’t check it). If the values are not callable, e.g., a series, scalar, or array, they are simply assigned.
**cols – As in the original, the keyword arguments themselves serve as the name(s) of the new (or overwritten) column(s) and their values are set in the same way.
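A small sketch of callable versus non-callable values, using the pandas assign method directly. That the col dictionary is unpacked into keyword arguments is an assumption about the wrapper; the column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

new_cols = {
    "double": lambda d: d["a"] * 2,  # callable: computed on the whole dataframe
    "flag": True,                    # not callable: simply assigned
}
result = df.assign(**new_cols)
```

The input dataframe is left untouched, since assign returns a new object.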
- class ColumnMapper(src_col, transform, tgt_col=None, na_action=None)[source]
Bases:
ArgRepr
Transform one column of a pandas dataframe into another.
This is simply a partial of calling the map method on one column of a dataframe and assigning the result to the same or another column of the same dataframe.
- Parameters:
src_col (Hashable) – Column to call the map method on.
transform (callable, Mapping, or Series) – Function or mapping in the form of a dictionary or a pandas series.
tgt_col (Hashable, optional) – Dataframe column to store the series resulting from the transformation. Defaults to src_col, thus overwriting it in place.
na_action (str, optional) – Can take the value “ignore” or None, defaulting to the latter. Will be passed to the series map method along with transform.
- __call__(df)[source]
Call the map method on a specified column of a DataFrame. Cached keyword arguments are forwarded to the method call and the result is stored in the specified column of the DataFrame.
- Parameters:
df (DataFrame) – Pandas dataframe with the column to call the map method on.
- Returns:
Pandas dataframe with the result of the column transformation in the specified column.
- Return type:
DataFrame
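The behavior described for ColumnMapper can be approximated by the following stand-in. ColumnMapperSketch is hypothetical and written only for this example; in particular, whether the real class modifies the dataframe it receives or works on a copy is not stated here, and the sketch simply assigns to the input.

```python
import pandas as pd


class ColumnMapperSketch:
    """Hypothetical stand-in mirroring the documented parameters."""

    def __init__(self, src_col, transform, tgt_col=None, na_action=None):
        self.src_col = src_col
        self.transform = transform
        # Default to the source column, i.e., overwrite it.
        self.tgt_col = src_col if tgt_col is None else tgt_col
        self.na_action = na_action

    def __call__(self, df):
        mapped = df[self.src_col].map(self.transform, na_action=self.na_action)
        df[self.tgt_col] = mapped
        return df


# A function as transform, skipping missing values, stored in a new column.
df = pd.DataFrame({"code": ["a", "b", None]})
out = ColumnMapperSketch("code", str.upper, "label", na_action="ignore")(df)

# A dictionary as transform, overwriting the source column.
df2 = pd.DataFrame({"code": ["a", "b"]})
out2 = ColumnMapperSketch("code", {"a": 1, "b": 2})(df2)
```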
- class ColumnSelector(col)[source]
Bases:
ArgRepr
Select a single column of a (grouped) pandas dataframe as a series.
This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a single argument (using the square-brackets accessor).
- Parameters:
col (hashable) – Single DataFrame column to select.
- class ColumnsSelector(col=(), *cols)[source]
Bases:
ArgRepr
Select one or more columns of a (grouped) pandas dataframe as a dataframe.
This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a list of arguments (using the square-brackets accessor).
- Parameters:
col (Hashable, optional) – Column name or sequence thereof. Defaults to an empty tuple.
*cols (Hashable) – Additional column names.
- __call__(df)[source]
Select the specified column(s) from a (grouped) pandas dataframe.
- Parameters:
df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select column(s) from.
- Returns:
The selected column(s) of the (grouped) dataframe.
- Return type:
DataFrame or DataFrameGroupBy
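The difference between selecting a single column and selecting a list of columns, on plain and grouped dataframes, looks like this in plain pandas (a sketch with made-up data, not a use of the selector classes themselves):

```python
import pandas as pd

df = pd.DataFrame({"k": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]})

# Single argument in square brackets: a series.
series = df["a"]

# List of arguments in square brackets: a dataframe.
frame = df[["a", "b"]]

# The same accessor works on a grouped dataframe.
sums = df.groupby("k")[["a", "b"]].sum()
```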
- class Drop(label=None, *labels, axis=1, index=None, columns=None, level=None, errors='raise')[source]
Bases:
ArgRepr
A simple partial of a pandas dataframe or series’ drop method.
- Parameters:
labels (hashable or sequence, optional) – Index or column labels to drop. Defaults to None.
axis (1 or "columns", 0 or "index") – Whether to drop labels from the columns (1 or “columns”) or index (0 or “index”). Defaults to 1.
index (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level (hashable, optional) – Integer or level name. Defaults to None. For MultiIndex, level from which the labels will be removed.
errors ("raise" or "ignore") – Defaults to “raise”. If “ignore”, suppress error and drop only existing labels.
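The equivalences between labels with axis and the index and columns keywords can be checked with the underlying pandas drop method (a sketch on made-up data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# labels with axis=1 is equivalent to columns=labels.
no_b = df.drop("b", axis=1)
same = df.drop(columns="b")

# index=labels drops rows instead.
rows = df.drop(index=0)

# errors="ignore" drops only the labels that actually exist.
lenient = df.drop(["b", "missing"], axis=1, errors="ignore")
```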
- class DropNA(axis=0, how=None, thresh=None, subset=None, ignore_index=False)[source]
Bases:
ArgRepr
A simple partial of a pandas dataframe or series’ dropna method.
- Parameters:
axis (0 or "index", 1 or "columns") – Determine if rows or columns which contain missing values are removed. Defaults to 0.
how ("any" or "all") – Determine if row or column is removed from DataFrame, when we have at least one NA or all NA. Defaults to “any”.
thresh (int, optional) – Require that many non-NA values. Cannot be combined with how. Defaults to None.
subset (hashable or sequence, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. Defaults to None.
ignore_index (bool, optional) – Defaults to False. If True, the resulting axis will be labeled 0, 1, …, n - 1.
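How the how, thresh, and subset parameters interact is easiest to see on a tiny example. The calls below go to the pandas dropna method directly; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

any_na = df.dropna()                # drop rows with at least one NA
all_na = df.dropna(how="all")       # drop rows only if every value is NA
enough = df.dropna(thresh=2)        # keep rows with at least 2 non-NA values
only_a = df.dropna(subset=["a"])    # consider column "a" only
```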
- class GroupBy(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]
Bases:
ArgRepr
Simple partial of a pandas dataframe and series groupby method.
- Parameters:
by (str, callable, series, array, dict, or list) – Column name, function (to be called on each value of the object’s index), list or numpy array of the same length as the columns, a dict or series providing a label -> group name mapping, or a list of the above.
level (hashable or sequence, optional) – If the axis is a multi-index (hierarchical), group by a particular level or levels. Do not specify both by and level. Defaults to None.
as_index (bool, optional) – Whether to return group labels as index. Defaults to True.
sort (bool, optional) – Whether to sort group keys. Defaults to True.
group_keys (bool, optional) – Defaults to True.
observed (bool, optional) – Whether to show only observed values for categorical groupers. Defaults to False.
dropna (bool, optional) – Whether to drop rows whose group keys contain NA values. If False, NA values are treated as group keys as well. Defaults to True.
Note
For a more extensive description of all (keyword) arguments, see the pandas documentation.
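As a quick illustration of as_index and dropna, here are the corresponding plain pandas groupby calls (made-up data; the sketch does not use GroupBy itself):

```python
import pandas as pd

df = pd.DataFrame({"k": ["x", "y", "x", None], "v": [1, 2, 3, 4]})

# Default: group labels become the index, rows with NA keys are dropped.
indexed = df.groupby("k").sum()

# as_index=False keeps the group labels as an ordinary column.
flat = df.groupby("k", as_index=False).sum()

# dropna=False treats NA as a group key of its own.
with_na = df.groupby("k", dropna=False).sum()
```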
- class GroupByApply(func, *args, **kwargs)[source]
Bases:
ArgRepr
Partial for calling a grouped dataframe or series apply method.
- Parameters:
func (callable) – A callable that takes a dataframe or series as its first argument, and returns a dataframe, a series or a scalar.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
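A minimal sketch of passing extra arguments through apply to func, using pandas directly on a grouped series. The function spread and its scale parameter are invented for this example.

```python
import pandas as pd


def spread(values, scale=1):
    """Range of the values in one group, multiplied by a scale factor."""
    return (values.max() - values.min()) * scale


df = pd.DataFrame({"k": ["x", "y", "x"], "v": [1, 2, 3]})

# The keyword argument is forwarded to spread for every group.
result = df.groupby("k")["v"].apply(spread, scale=10)
```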
- class Join(*args, **kwargs)[source]
Bases:
ArgRepr
Light wrapper around the pandas dataframe join method.
- Parameters:
*args – Arguments to pass on to the join method call.
**kwargs – Keyword arguments to pass on to the join method call.
Note
For a full list of (keyword) arguments and their description, see the pandas join documentation.
- __call__(df, other)[source]
Join a dataframe with other dataframe(s) and/or series.
- Parameters:
df (DataFrame) – Source dataframe on which the join method will be called.
other (DataFrame, Series, or a list of any combination) – Index should be similar to one (or more) columns in df. If a series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined dataframe.
- Returns:
The joined dataframe.
- Return type:
DataFrame
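The requirement that a series passed as other must be named can be seen in this sketch, which calls the pandas join method directly on made-up data:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])

# The series name becomes the column name in the joined dataframe.
right = pd.Series([10, 20], index=["y", "z"], name="b")

joined = left.join(right, how="left")
inner = left.join(right, how="inner")
```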
- class ParquetReader(path='', **kwargs)[source]
Bases:
ArgRepr
Light wrapper around the top-level read_parquet pandas function.
- Parameters:
path (str, optional) – Directory under which the parquet files are located or full path to the parquet file. If not fully specified here, the path must be completed on calling the instance. Defaults to the current working directory of the python interpreter.
**kwargs – Keyword arguments passed on to the read_parquet function call.
- __call__(path='')[source]
Read one or more parquet file(s) into pandas DataFrame.
- Parameters:
path (str, optional) – Path (including file name) to the directory where the parquet file(s) to read are located. If it starts with a backslash, it will be interpreted as absolute, if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.
- Return type:
DataFrame
- class ParquetWriter(path='', create=False, **kwargs)[source]
Bases:
ArgRepr
Partial of the pandas dataframe to_parquet method.
- Parameters:
path (str, optional) – Path (including file name) to save the parquet file to. May include any number of string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called. Defaults to the current working directory of the python interpreter.
create (bool, optional) – What to do if the directory where the parquet file should be saved does not exist. Defaults to False.
**kwargs – Keyword arguments passed on to the to_parquet method call.
- __call__(df, *parts)[source]
Write a pandas DataFrame to parquet file.
- Parameters:
df (DataFrame) – The dataframe to write.
*parts (str, optional) – Fragments that will be interpolated into the path string given at instantiation. Obviously, there must be at least as many as there are placeholders in the path.
- Returns:
An empty tuple.
- Return type:
tuple
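The placeholder mechanism can be pictured with ordinary string formatting. That ParquetWriter uses str.format internally is an assumption made for this sketch, and the path template is made up.

```python
# Path template with two placeholders, as it might be given at instantiation.
template = "data/{}/part-{}.parquet"

# Fragments as they might be passed on calling the instance.
parts = ("2024", "01")
path = template.format(*parts)

# Too few fragments for the placeholders cannot be interpolated.
try:
    template.format("2024")
    too_few_ok = True
except IndexError:
    too_few_ok = False
```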
- class Rename(mapper=None, index=None, columns=None, axis=1, level=None, errors='ignore')[source]
Bases:
ArgRepr
Simple partial of a pandas dataframe’s rename method.
- Parameters:
mapper (dict-like or function) – Dict-like or function transformations to apply to the axis values.
index (dict-like or function) – Alternative to specifying mapper with axis = 0.
columns (dict-like or function) – Alternative to specifying mapper with axis = 1.
axis (1 or "columns", 0 or "index", optional) – Axis to target with mapper. Defaults to 1.
level (Hashable, optional) – In case of a MultiIndex, only rename labels in the specified level. Defaults to None.
errors ("ignore" or "raise", optional) – If “raise”, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the index being transformed. If “ignore”, existing keys will be renamed and extra keys will be ignored. Defaults to “ignore”.
- __call__(df)[source]
Rename a pandas dataframe’s columns or rows.
- Parameters:
df (DataFrame) – The dataframe to rename columns or rows of.
- Returns:
The dataframe with renamed columns or rows.
- Return type:
DataFrame
- property resolved
Resolved mapper-axis vs. index vs. columns keywords.
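The two ways of addressing an axis, and the effect of errors, are shown below with the pandas rename method itself (made-up labels; the sketch does not reproduce how the resolved property is computed):

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]}, index=["r"])

# mapper with axis=1 and the columns keyword are alternatives.
by_axis = df.rename({"a": "x"}, axis=1)
by_keyword = df.rename(columns={"a": "x"})

# The index keyword targets row labels.
rows = df.rename(index={"r": "row"})

# With errors="ignore", keys that match no label are silently skipped.
unchanged = df.rename(columns={"missing": "y"}, errors="ignore")
```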
- class ResetIndex(level=None, drop=False, col_level=0, col_fill='', allow_duplicates=False, names=None)[source]
Bases:
ArgRepr
Simple partial of a pandas dataframe’s reset_index method.
- Parameters:
level (int, str, tuple, or list, optional) – Only remove the given levels from the index. Defaults to None, which removes all levels.
drop (bool, optional) – Do not try to insert index into dataframe columns. This resets the index to the default integer index. Defaults to False.
col_level (int or str, optional) – If the columns have multiple levels, determines which level the labels are inserted into. Defaults to 0.
col_fill (Hashable, optional) – If the columns have multiple levels, determines how the other levels are named. Defaults to an empty string.
allow_duplicates (bool, optional) – Allow duplicate column labels to be created. Defaults to False.
names (hashable or sequence, optional) – Using the given string, rename the dataframe column which contains the index data. If the dataframe has a multiindex, this has to be a list or tuple with length equal to the number of levels. Defaults to None.
- class RollingGroupByApply(func, raw=False, engine=None, engine_kws=None, *args, **kwargs)[source]
Bases:
ArgRepr
Partial for calling a rolling-grouped dataframe’s apply method.
- Parameters:
func (callable) – Must produce a single numerical value from a numpy ndarray if raw = True or a series if raw = False. Can also accept a numba JIT function with engine = “numba” specified.
raw (bool, optional) – Whether to pass a numpy ndarray or a pandas series to func. Defaults to False.
engine (str, optional) – Either “cython” or “numba”. Defaults to None.
engine_kws (dict, optional) – Configuration of the “numba” engine. Keys can be “nopython”, “nogil”, and “parallel”, and values must be True or False. Defaults to None.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
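A sketch of what func receives when raw is True, using pandas directly on a grouped and then rolled dataframe. The function window_change and the data are invented for illustration.

```python
import pandas as pd


def window_change(window):
    """With raw=True, window is a numpy ndarray; return last minus first."""
    return window[-1] - window[0]


df = pd.DataFrame({
    "k": ["x", "x", "x", "y", "y"],
    "v": [1.0, 2.0, 4.0, 10.0, 20.0],
})

# Group, roll over windows of two rows within each group, then apply.
rolled = df.groupby("k")[["v"]].rolling(2).apply(window_change, raw=True)
```

The first row of every group has no complete window and therefore comes out as NaN.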
- class RollingWindow(*args, **kwargs)[source]
Bases:
ArgRepr
Simple partial for calling a pandas object’s rolling method.
- Parameters:
*args – Arguments to pass on to the rolling method call.
**kwargs – Keyword arguments to pass on to the rolling method call.
Notes
See the pandas rolling docs for a full list of (keyword) arguments and an extensive description of usage.
- class RowsSelector(condition)[source]
Bases:
ArgRepr
Select rows from a pandas dataframe with a boolean mask or function.
This is simply a partial for calling a dataframe’s __getitem__ method (using the square-brackets accessor) with a callable that takes the dataframe as input, and produces a 1-D, boolean array-like structure (of the same length as the dataframe to select from).
- Parameters:
condition (callable or array-like) – A callable that accepts a dataframe and produces a 1-D, boolean array-like structure of the same length.
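Both kinds of condition work with the square-brackets accessor in plain pandas, as this small sketch on made-up data shows:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# A callable is evaluated on the dataframe and must yield a boolean mask.
by_callable = df[lambda d: d["a"] > 1]

# A ready-made boolean mask of the same length works as well.
by_mask = df[[True, False, True]]
```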
- class SetIndex(keys, drop=True, append=False, verify_integrity=False)[source]
Bases:
ArgRepr
Simple partial of a pandas dataframe’s set_index method.
- Parameters:
keys (hashable or array-like) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays.
drop (bool, optional) – Delete columns to be used as the new index. Defaults to True.
append (bool, optional) – Whether to append columns to existing index. Defaults to False.
verify_integrity (bool, optional) – Whether to check the new index for duplicates. Defaults to False. Setting to True will impact the performance of this method.
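Setting and resetting an index are inverse operations, which the following plain pandas sketch illustrates (made-up data; SetIndex and ResetIndex are assumed to forward to these methods):

```python
import pandas as pd

df = pd.DataFrame({"k": ["x", "y"], "v": [1, 2]})

# Move column "k" into the index (and delete it from the columns).
indexed = df.set_index("k")

# Insert the index back as a column ...
restored = indexed.reset_index()

# ... or discard it in favor of the default integer index.
dropped = indexed.reset_index(drop=True)
```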
- class SortValues(by, **kwargs)[source]
Bases:
ArgRepr
Partial of the pandas dataframe sort_values method.
- Parameters:
by (hashable or sequence) – Name or list of names to sort by.
**kwargs – Additional keyword arguments will be forwarded to the method call with the exception of “inplace”, which will be set to False.
Note
For a full list of keyword arguments and their description, see the pandas sort_values documentation.
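Sorting by several names with a per-column direction, written as a plain pandas call on made-up data. Because inplace is False, a new dataframe is returned and the input keeps its order.

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 1, 2], "b": [1, 2, 0]})

# Sort ascending by "a", then descending by "b" within equal values of "a".
result = df.sort_values(["a", "b"], ascending=[True, False])
```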