pd
Pandas utilities and partials of dataframe method calls.
Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
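The idea of configuring callables once and letting dataframes flow through them later can be sketched as follows. This is a minimal illustration only: `MethodPartial` is a hypothetical helper invented for this example and is not one of the classes documented here.

```python
import pandas as pd


class MethodPartial:
    """Hypothetical helper: freeze a method name and its arguments up front."""

    def __init__(self, method, *args, **kwargs):
        self.method = method
        self.args = args
        self.kwargs = kwargs

    def __call__(self, df):
        # Look up the method on whatever object arrives at runtime.
        return getattr(df, self.method)(*self.args, **self.kwargs)


# Configured once, at program start.
pipe = [
    MethodPartial("dropna"),
    MethodPartial("sort_values", "score", ascending=False),
    MethodPartial("reset_index", drop=True),
]

# At runtime, a dataframe simply flows through the preconfigured steps.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [2.0, None, 3.0, 1.0]})
result = df
for step in pipe:
    result = step(result)
```

Because every step takes a dataframe and returns one, the steps can be reordered or reused without touching the data-handling code.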
- class Agg(func=None, *args, **kwargs)[source]
Bases: ArgRepr
Simple partial for calling a pandas object’s agg method.
- Parameters:
func (callable, str, list, or dict, optional) – Function(s) to use for aggregating the data. If a function, it must work when passed a Series. Also acceptable are a function name, a list of function names, and a dictionary with column names as keys and functions, function names, or lists thereof as values. Defaults to None, which only works for a dataframe and relies on kwargs to specify named aggregations.
*args – Positional arguments to pass on to the agg or func call.
**kwargs – Keyword arguments to pass on to the agg or func call.
Note
See the pandas agg docs for a full list of (keyword) arguments and an extensive description of usage and configuration.
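The accepted forms of func map directly onto plain pandas calls. The sketch below calls the agg method itself rather than the Agg class, and the column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# A single function name yields one value per column.
by_name = df.agg("sum")

# A list of function names yields one row per function.
by_list = df.agg(["min", "max"])

# A dictionary picks the aggregation per column.
by_dict = df.agg({"a": "sum", "b": "mean"})

# Without func, keyword arguments specify named aggregations
# as (column, function) pairs. This only works for dataframes.
named = df.agg(total=("a", "sum"))
```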
- class AsType(types, **kwargs)[source]
Bases: ReprName
Partial of a pandas dataframe or series astype method.
- Parameters:
types (type or dict) – Single type or dictionary of column names and types, specifying type conversion of entire dataframe or specific columns, respectively.
**kwargs – Keyword arguments are passed on to the astype method call after the types argument.
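The difference between a single type and a dictionary of types is easiest to see on the underlying pandas method. This sketch uses astype directly with invented column names.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2"], "b": [1, 2]})

# A single type converts every column.
all_float = df.astype("float64")

# A dictionary converts only the listed columns and leaves the rest alone.
only_a = df.astype({"a": "int64"})
```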
- class Assign(col=None, **cols)[source]
Bases: ArgRepr
Light wrapper around a pandas dataframe’s assign method.
- Parameters:
col (dict) – A dictionary with the names of newly created (or overwritten) columns as keys. If the values are callable, they are computed on the entire dataframe and assigned to the new columns. The callable must not change the input dataframe (though pandas doesn’t check it). If the values are not callable, e.g., a series, scalar, or array, they are simply assigned.
**cols – As in the original, the keyword arguments themselves serve as the name(s) of the new (or overwritten) column(s) and their values are set in the same way.
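Callable and non-callable values behave differently, which the following sketch shows with pandas assign itself. Unpacking a dictionary into the call is an assumption about how a col dictionary could be forwarded, not a statement about the wrapper's internals.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Callable values receive the whole dataframe; other values are assigned as-is.
out = df.assign(double=lambda d: d["x"] * 2, flag=True)

# A dictionary allows column names that are not valid Python identifiers.
# Unpacking it like this is an assumption made for illustration.
col = {"x squared": lambda d: d["x"] ** 2}
out_dict = df.assign(**col)
```

In both cases the input dataframe is left untouched and a new one is returned.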
- class ColumnMapper(src_col, transform, tgt_col=None, na_action=None)[source]
Bases: ArgRepr
Transform one column of a pandas dataframe into another.
This is simply a partial of calling the map method on one column of a dataframe and assigning the result to the same or another column of the same dataframe.
- Parameters:
src_col (Hashable) – Column to call the map method on.
transform (callable, Mapping, or Series) – Function or mapping in the form of a dictionary or a pandas series.
tgt_col (Hashable, optional) – Dataframe column to store the series resulting from the transformation. Defaults to src_col, thus overwriting it in place.
na_action (str, optional) – Can take the value “ignore” or None, defaulting to the latter. Will be passed to the series map method along with transform.
- __call__(df)[source]
Call the map method on a specified column of a DataFrame.
Cached keyword arguments are forwarded to the method call and the result is stored in the specified column of the DataFrame.
- Parameters:
df (DataFrame) – Pandas dataframe with the column to call the map method on.
- Returns:
Pandas dataframe with the result of the column transformation in the specified column.
- Return type:
DataFrame
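The described behavior amounts to a map on one column followed by an assignment. The function below is a hypothetical stand-in written for this example. That it writes into the passed dataframe rather than a copy is an assumption based on the wording "the same dataframe".

```python
import pandas as pd


def map_column(df, src_col, transform, tgt_col=None, na_action=None):
    """Hypothetical sketch: map one column and store the result in another."""
    # Fall back to overwriting the source column.
    tgt_col = src_col if tgt_col is None else tgt_col
    # Assumption: the result is written into the dataframe that was passed in.
    df[tgt_col] = df[src_col].map(transform, na_action=na_action)
    return df


df = pd.DataFrame({"code": ["a", "b", None]})

# With na_action="ignore", missing values are passed through untouched
# instead of being handed to the transform.
out = map_column(df, "code", str.upper, tgt_col="upper", na_action="ignore")
```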
- class ColumnSelector(col)[source]
Bases: ArgRepr
Select a single column of a (grouped) pandas dataframe as a series.
This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a single argument (using the square-brackets accessor).
- Parameters:
col (hashable) – Single DataFrame column to select.
- class ColumnsSelector(col=(), *cols)[source]
Bases: ArgRepr
Select one or more columns of a (grouped) pandas dataframe as a dataframe.
This is simply a partial for calling a (grouped) dataframe’s __getitem__ method with a list of arguments (using the square-brackets accessor).
- Parameters:
col (Hashable, optional) – Column name or sequence thereof. Defaults to an empty tuple.
*cols (Hashable) – Additional column names.
- __call__(df)[source]
Select the specified column(s) from a (grouped) pandas dataframe.
- Parameters:
df (DataFrame or DataFrameGroupBy) – Pandas dataframe or grouped dataframe to select column(s) from.
- Returns:
The selected column(s) of the (grouped) dataframe.
- Return type:
DataFrame or DataFrameGroupBy
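The difference between selecting with a single argument and with a list can be seen on the square-brackets accessor itself. The sketch below uses plain pandas and made-up column names.

```python
import pandas as pd

df = pd.DataFrame({"g": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]})

# A single argument yields a series ...
single = df["a"]

# ... whereas a list of arguments yields a dataframe, even for one column.
multiple = df[["a", "b"]]

# The same accessor works on a grouped dataframe and keeps the grouping.
grouped = df.groupby("g")[["a", "b"]]
sums = grouped.sum()
```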
- class Drop(label=None, *labels, axis=1, index=None, columns=None, level=None, errors='raise')[source]
Bases: ArgRepr
A simple partial of a pandas dataframe or series’ drop method.
- Parameters:
labels (hashable or sequence, optional) – Index or column labels to drop. Defaults to None.
axis (1 or "columns", 0 or "index") – Whether to drop labels from the columns (1 or “columns”) or index (0 or “index”). Defaults to 1.
index (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
columns (hashable or sequence, optional) – Single label or list-like. Defaults to None. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
level (hashable, optional) – Integer or level name. Defaults to None. For MultiIndex, level from which the labels will be removed.
errors ("raise" or "ignore") – Defaults to “raise”. If “ignore”, suppress error and drop only existing labels.
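The equivalence between labels with axis and the columns keyword, and the effect of errors, can be checked with the pandas drop method directly. The column names are invented for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# labels with axis=1 is equivalent to columns=labels.
via_axis = df.drop(["a", "b"], axis=1)
via_columns = df.drop(columns=["a", "b"])

# With errors="ignore", labels that do not exist are silently skipped.
tolerant = df.drop(columns=["a", "missing"], errors="ignore")
```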
- class DropNA(axis=0, how=None, thresh=None, subset=None, ignore_index=False)[source]
Bases: ArgRepr
A simple partial of a pandas dataframe or series’ dropna method.
- Parameters:
axis (0 or "index", 1 or "columns") – Determine if rows or columns which contain missing values are removed. Defaults to 0.
how ("any" or "all") – Determine if row or column is removed from DataFrame, when we have at least one NA or all NA. Defaults to “any”.
thresh (int, optional) – Require that many non-NA values. Cannot be combined with how. Defaults to None.
subset (hashable or sequence, optional) – Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include. Defaults to None.
ignore_index (bool, optional) – Defaults to False. If True, the resulting axis will be labeled 0, 1, …, n - 1.
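How the how, thresh, and subset options differ is easiest to see on a small example. This sketch calls the pandas dropna method itself on illustrative data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, np.nan],
        "b": [1.0, 2.0, np.nan, np.nan],
        "c": [1.0, 2.0, 3.0, np.nan],
    }
)

# Drop rows with at least one missing value.
any_na = df.dropna(how="any")

# Drop only rows in which every value is missing.
all_na = df.dropna(how="all")

# Keep rows with at least two non-missing values (not combinable with how).
at_least_two = df.dropna(thresh=2)

# Only look at column "b" when deciding which rows to drop.
only_b = df.dropna(subset=["b"])
```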
- class GroupBy(by=None, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)[source]
Bases: ArgRepr
Simple partial of a pandas dataframe or series groupby method.
- Parameters:
by (str, callable, series, array, dict, or list) – Column name, function (to be called on each column name), list or numpy array of the same length as the columns, a dict or series providing a label -> group name mapping, or a list of the above.
level (hashable or sequence, optional) – If the axis is a multi-index (hierarchical), group by a particular level or levels. Do not specify both by and level. Defaults to None.
as_index (bool, optional) – Whether to return group labels as index. Defaults to True.
sort (bool, optional) – Whether to sort group keys. Defaults to True.
group_keys (bool, optional) – Defaults to True.
observed (bool, optional) – Whether to show only observed values for categorical groupers. Defaults to False.
dropna (bool, optional) – If True, rows with NA values in the group keys are dropped. If False, NA values are treated as a group key of their own. Defaults to True.
Note
For a more extensive description of all (keyword) arguments, see the pandas documentation.
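The effect of dropna and as_index can be illustrated with the pandas groupby method itself. The data below is made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"g": ["x", None, "x"], "v": [1, 2, 3]})

# By default, rows with a missing group key are dropped.
without_na = df.groupby("g", dropna=True)["v"].sum()

# With dropna=False, the missing key forms a group of its own.
with_na = df.groupby("g", dropna=False)["v"].sum()

# With as_index=False, group labels end up in a column, not in the index.
flat = df.groupby("g", as_index=False).sum()
```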
- class GroupByApply(func, *args, **kwargs)[source]
Bases: ArgRepr
Partial for calling a grouped dataframe or series apply method.
- Parameters:
func (callable) – A callable that takes a dataframe or series as its first argument, and returns a dataframe, a series or a scalar.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
- class Join(*args, **kwargs)[source]
Bases: ArgRepr
Light wrapper around the pandas dataframe join method.
- Parameters:
*args – Arguments to pass on to the join method call.
**kwargs – Keyword arguments to pass on to the join method call.
Note
For a full list of (keyword) arguments and their description, see the pandas join documentation.
- __call__(df, other)[source]
Join a dataframe with other dataframe(s) and/or series.
- Parameters:
df (DataFrame) – Source dataframe on which the join method will be called.
other (DataFrame, Series, or a list of any combination) – Index should be similar to one (or more) columns in df. If a series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined dataframe.
- Returns:
The joined dataframe.
- Return type:
DataFrame
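Joining a named series onto a dataframe column can be sketched with the pandas join method directly. The names key and price are invented for this example.

```python
import pandas as pd

left = pd.DataFrame({"key": ["k0", "k1", "k2"], "a": [1, 2, 3]})

# The series must carry a name, which becomes the new column's name.
prices = pd.Series([10.0, 20.0], index=["k0", "k1"], name="price")

# Match the series' index against the "key" column of the dataframe.
joined = left.join(prices, on="key")
```

Rows without a match in the series (here "k2") receive a missing value, because join defaults to a left join.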
- class ParquetReader(path='', **kwargs)[source]
Bases: ArgRepr
Light wrapper around the top-level read_parquet pandas function.
- Parameters:
path (str, optional) – Directory under which the parquet files are located or full path to the parquet file. If not fully specified here, the path must be completed on calling the instance. Defaults to the current working directory of the python interpreter.
**kwargs – Keyword arguments passed on to the read_parquet function call.
- __call__(path='')[source]
Read one or more parquet file(s) into a pandas DataFrame.
- Parameters:
path (str, optional) – Path (including file name) to the directory where the parquet file(s) to read are located. If it starts with a backslash, it will be interpreted as absolute, if not, as relative to the path specified at instantiation. Defaults to an empty string, which results in an unchanged path.
- Return type:
DataFrame
- class ParquetWriter(path='', create=False, **kwargs)[source]
Bases: ArgRepr
Partial of the pandas dataframe to_parquet method.
- Parameters:
path (str, optional) – Path (including file name) to save the parquet file to. May include any number of string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called. Defaults to the current working directory of the python interpreter.
create (bool, optional) – What to do if the directory where the parquet file should be saved does not exist. Defaults to False.
**kwargs – Keyword arguments passed on to the to_parquet method call.
- __call__(df, *parts)[source]
Write a pandas DataFrame to a parquet file.
- Parameters:
df (DataFrame) – The dataframe to write.
*parts (str, optional) – Fragments that will be interpolated into the path string given at instantiation. There must be at least as many as there are placeholders in the path.
- Returns:
An empty tuple.
- Return type:
tuple
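The placeholder rule can be illustrated with Python's built-in string formatting. That the path is interpolated with str.format is an assumption made for this sketch, and the template path is hypothetical.

```python
# Hypothetical path template with two placeholders.
template = "/data/{}/{}.parquet"

# Exactly as many parts as placeholders.
exact = template.format("2024", "sales")

# Surplus parts are silently ignored by str.format.
surplus = template.format("2024", "sales", "unused")

# Too few parts raise an IndexError.
try:
    template.format("2024")
    too_few_error = None
except IndexError as error:
    too_few_error = error
```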
- class Rename(mapper=None, index=None, columns=None, axis=1, level=None, errors='ignore')[source]
Bases: ArgRepr
Simple partial of a pandas dataframe’s rename method.
- Parameters:
mapper (dict-like or function) – Dict-like or function transformations to apply to the axis values.
index (dict-like or function) – Alternative to specifying mapper with axis = 0.
columns (dict-like or function) – Alternative to specifying mapper with axis = 1.
axis (1 or "columns", 0 or "index", optional) – Axis to target with mapper. Defaults to 1.
level (Hashable, optional) – In case of a MultiIndex, only rename labels in the specified level. Defaults to None.
errors ("ignore" or "raise", optional) – If “raise”, raise a KeyError when a dict-like mapper, index, or columns contains labels that are not present in the index being transformed. If “ignore”, existing keys will be renamed and extra keys will be ignored. Defaults to “ignore”.
- __call__(df)[source]
Rename a pandas dataframe’s columns or rows.
- Parameters:
df (DataFrame) – The dataframe to rename columns or rows of.
- Returns:
The dataframe with renamed columns or rows.
- Return type:
DataFrame
- property resolved
Resolved mapper-axis vs. index vs. columns keywords.
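The two equivalent ways of targeting columns, and the effect of errors, can be checked against the pandas rename method itself. This sketch does not show how the resolved property is computed; it only demonstrates the keyword equivalence it refers to.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

# mapper with axis=1 is equivalent to the columns keyword.
via_mapper = df.rename({"a": "x"}, axis=1)
via_columns = df.rename(columns={"a": "x"})

# With errors="ignore", keys without a matching label are skipped.
lenient = df.rename(columns={"a": "x", "missing": "y"}, errors="ignore")

# With errors="raise", they trigger a KeyError instead.
try:
    df.rename(columns={"missing": "y"}, errors="raise")
    raised = False
except KeyError:
    raised = True
```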
- class ResetIndex(level=None, drop=False, col_level=0, col_fill='', allow_duplicates=False, names=None)[source]
Bases: ArgRepr
Simple partial of a pandas dataframe’s reset_index method.
- Parameters:
level (int, str, tuple, or list, optional) – Only remove the given levels from the index. Defaults to None, which removes all levels.
drop (bool, optional) – Do not try to insert index into dataframe columns. This resets the index to the default integer index. Defaults to False.
col_level (int or str, optional) – If the columns have multiple levels, determines which level the labels are inserted into. Defaults to 0.
col_fill (Hashable, optional) – If the columns have multiple levels, determines how the other levels are named. Defaults to an empty string.
allow_duplicates (bool, optional) – Allow duplicate column labels to be created. Defaults to False.
names (hashable or sequence, optional) – Using the given string, rename the dataframe column which contains the index data. If the dataframe has a multiindex, this has to be a list or tuple with length equal to the number of levels. Defaults to None.
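The effect of drop and names is shown below using the pandas reset_index method directly. The index and column names are illustrative, and the names keyword requires a pandas version that supports it (1.5 or later).

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["a", "b"], name="id"))

# By default, the index is moved into a column named after it.
moved = df.reset_index()

# With drop=True, the index is discarded instead.
dropped = df.reset_index(drop=True)

# With names, the new column gets a different label.
renamed = df.reset_index(names="key")
```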
- class RollingGroupByApply(func, raw=False, engine=None, engine_kws=None, *args, **kwargs)[source]
Bases: ArgRepr
Partial for calling a rolling-grouped dataframe’s apply method.
- Parameters:
func (callable) – Must produce a single numerical value from a numpy ndarray if raw = True or a series if raw = False. Can also accept a numba JIT function with engine = “numba” specified.
raw (bool, optional) – Whether to pass a numpy ndarray or a pandas series to func. Defaults to False.
engine (str, optional) – Either “cython” or “numba”. Defaults to None.
engine_kws (dict, optional) – Configuration of the “numba” engine. Keys can be “nopython”, “nogil”, and “parallel”, and values must be True or False. Defaults to None.
*args – Positional arguments to pass to func.
**kwargs – Keyword arguments to pass to func.
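What raw changes for func can be sketched with plain pandas: a grouped dataframe is given a rolling window and a custom function is applied to each window. The data and the window size are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"g": ["x", "x", "x", "y", "y"], "v": [1.0, 2.0, 3.0, 10.0, 20.0]}
)

# A rolling window of two rows within each group.
rolled = df.groupby("g")[["v"]].rolling(2)

# With raw=True, func receives a numpy ndarray per window.
from_array = rolled.apply(lambda arr: arr.max() - arr.min(), raw=True)

# With raw=False, func receives a pandas series per window.
from_series = rolled.apply(lambda s: s.max() - s.min(), raw=False)
```

The first row of each group has no complete window, so its result is a missing value.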
- class RollingWindow(*args, **kwargs)[source]
Bases: ArgRepr
Simple partial for calling a pandas object’s rolling method.
- Parameters:
*args – Arguments to pass on to the rolling method call.
**kwargs – Keyword arguments to pass on to the rolling method call.
Notes
See the pandas rolling docs for a full list of (keyword) arguments and an extensive description of usage.
- class RowsSelector(condition)[source]
Bases: ArgRepr
Select rows from a pandas dataframe with a boolean mask or function.
This is simply a partial for calling a dataframe’s __getitem__ method (using the square-brackets accessor) with a callable that takes the dataframe as input, and produces a 1-D, boolean array-like structure (of the same length as the dataframe to select from).
- Parameters:
condition (callable or array-like) – A callable that accepts a dataframe and produces a 1-D, boolean array-like structure of the same length.
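Both kinds of condition work with the square-brackets accessor of a plain pandas dataframe, as this short sketch with invented data shows.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# A callable is evaluated on the dataframe and must return a boolean mask.
by_callable = df[lambda d: d["a"] > 1]

# A precomputed boolean mask of the same length works as well.
mask = df["a"] != 2
by_mask = df[mask]
```

A callable has the advantage that it can be defined before any dataframe exists, whereas a precomputed mask is tied to the length of one particular dataframe.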
- class SetIndex(keys, drop=True, append=False, verify_integrity=False)[source]
Bases: ArgRepr
Simple partial of a pandas dataframe’s set_index method.
- Parameters:
keys (hashable or array-like) – This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays.
drop (bool, optional) – Delete columns to be used as the new index. Defaults to True.
append (bool, optional) – Whether to append columns to existing index. Defaults to False.
verify_integrity (bool, optional) – Whether to check the new index for duplicates. Defaults to False. Setting to True will impact the performance of this method.
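The options can be tried out on the pandas set_index method directly. The column names and the duplicate key are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "b"], "v": [1, 2, 3]})

# By default, the key column is removed from the columns.
indexed = df.set_index("k")

# With drop=False, it is kept as a column as well.
kept = df.set_index("k", drop=False)

# With append=True, the key is added as a further index level.
appended = df.set_index("k", append=True)

# With verify_integrity=True, duplicate keys raise a ValueError.
try:
    df.set_index("k", verify_integrity=True)
    duplicates_found = False
except ValueError:
    duplicates_found = True
```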
- class SortValues(by, **kwargs)[source]
Bases: ArgRepr
Partial of the pandas dataframe sort_values method.
- Parameters:
by (hashable or sequence) – Name or list of names to sort by.
**kwargs – Additional keyword arguments will be forwarded to the method call with the exception of “inplace”, which will be set to False.
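Forcing “inplace” to False could look like the following. `SortValuesSketch` is a hypothetical stand-in written for this example and may differ from the actual implementation.

```python
import pandas as pd


class SortValuesSketch:
    """Hypothetical stand-in for a sort_values partial."""

    def __init__(self, by, **kwargs):
        self.by = by
        # Whatever the caller asked for, never sort in place.
        self.kwargs = {**kwargs, "inplace": False}

    def __call__(self, df):
        return df.sort_values(self.by, **self.kwargs)


df = pd.DataFrame({"v": [3, 1, 2]})

# Even an explicit inplace=True is overridden.
sort_ascending = SortValuesSketch("v", inplace=True)
result = sort_ascending(df)
```

Overriding the keyword guarantees that every call returns a new dataframe, so the step can be chained with others without side effects on its input.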
Note
For a full list of keyword arguments and their description, see the pandas sort_values documentation.