pl
Polars utilities and partials of dataframe method calls.
Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
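The pattern described above, fixing the arguments at program start and applying the call to dataframes later, can be sketched in plain Python. The names `MethodPartial` and `Frame` below are hypothetical and exist only for illustration; they are not part of this module or of polars, and the actual classes may be implemented differently.

```python
from typing import Any


class MethodPartial:
    """Hypothetical stand-in: store arguments now, call a method later."""

    def __init__(self, method: str, *args: Any, **kwargs: Any) -> None:
        self.method = method
        self.args = args
        self.kwargs = kwargs

    def __call__(self, obj: Any) -> Any:
        # Look up the method on whatever object flows in and call it
        # with the arguments fixed at construction time.
        return getattr(obj, self.method)(*self.args, **self.kwargs)


class Frame:
    """Toy dataframe (dict of columns), used only to demonstrate the pattern."""

    def __init__(self, data: dict) -> None:
        self.data = data

    def drop(self, *columns: str) -> "Frame":
        return Frame({k: v for k, v in self.data.items() if k not in columns})

    def select(self, *columns: str) -> "Frame":
        return Frame({k: self.data[k] for k in columns})


# Configure the processing steps once, at program start ...
steps = [MethodPartial("drop", "tmp"), MethodPartial("select", "b", "a")]

# ... and push frames through them at runtime.
frame = Frame({"a": [1, 2], "b": [3, 4], "tmp": [0, 0]})
for step in steps:
    frame = step(frame)

result_columns = list(frame.data)
```

Because each step is an ordinary callable taking one frame and returning one frame, steps can be stored in a list, reordered, or composed without knowing their arguments.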
- class Drop(*columns, strict=True)
Bases: ArgRepr
Partial of the polars dataframe drop method.
- Parameters:
*columns (ColumnNameOrSelector) – Names of the columns that should be removed from the dataframe. Accepts column selector input.
strict (bool, optional) – Validate that all column names exist in the current schema, and throw an exception if any do not. Defaults to True.
- class Filter(*predicates, **constraints)
Bases: ArgRepr
Partial of the polars dataframe filter method.
- Parameters:
*predicates – Expression(s) that evaluate to a boolean Series.
**constraints – Filter the column(s) named by the keyword argument(s) by the supplied value(s). Constraints are implicitly combined with the other filters by a logical AND.
- class FromPandas(schema_overrides=None, rechunk=True, nan_to_null=True, include_index=False)
Bases: ArgRepr
Partial of the polars top-level function from_pandas.
- Parameters:
schema_overrides (dict, optional) – Override the inferred types for one or more columns. Defaults to None.
rechunk (bool, optional) – Make sure that all data is in contiguous memory. Defaults to True.
nan_to_null (bool, optional) – Pyarrow will convert NaN values to None. Defaults to True.
include_index (bool, optional) – Load any non-default pandas indexes as columns. Defaults to False.
- class GroupBy(*by, maintain_order=False, **named_by)
Bases: ArgRepr
Partial of the polars dataframe group_by method.
- Parameters:
*by (IntoExpr) – Column(s) to group by. Accepts expression input. Strings are parsed as column names.
maintain_order (bool, optional) – Ensure that the order of the groups is consistent with the input data. This is slower than a default group by. Setting this to True prevents running on the streaming engine. Defaults to False.
**named_by (IntoExpr) – Additional columns to group by, specified as keyword arguments. The columns will be renamed to the keyword used.
- class GroupByAgg(*aggs, **named_aggs)
Bases: ArgRepr
Partial of a polars (dynamic) group-by object’s agg method.
- Parameters:
*aggs (IntoExpr) – Aggregations to compute for each group of the group by operation, specified as positional arguments. Accepts expression input. Strings are parsed as column names.
**named_aggs (IntoExpr) – Additional aggregations, specified as keyword arguments. The resulting columns will be renamed to the keyword used.
- class GroupByDynamic(index_column, every, period=None, offset=None, include_boundaries=False, closed='left', label='left', group_by=None, start_by='window')
Bases: ArgRepr
Partial of the polars dataframe group_by_dynamic method.
- Parameters:
index_column (IntoExpr) – Column used to group based on the time window. Often of type Date or Datetime. This column must be sorted in ascending order (or, if group_by is specified, then it must be sorted in ascending order within each group). In case of a dynamic group by on indices, dtype needs to be Int32 or Int64. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
every (str or timedelta) – Interval of the window. To window over an integer index column, give an integer followed by the suffix “i”.
period (str or timedelta, optional) – Length of the window. Equals ‘every’ if set to None (the default).
offset (str or timedelta) – Offset of the window. Does not take effect if start_by is “datapoint”. Defaults to zero.
include_boundaries (bool, optional) – Add the lower and upper bound of the window to the “_lower_boundary” and “_upper_boundary” columns. This will impact performance because it is harder to parallelize. Defaults to False.
closed ("left", "right", "both", "none") – Define which sides of the temporal interval are closed (inclusive).
label ("left", "right", "datapoint") – Which label to use for the window: the lower boundary (“left”), the upper boundary (“right”), or the first value of the index column in the given window (“datapoint”). If you don’t need the label to be at one of the boundaries, choose “datapoint” for maximum performance.
group_by (IntoExpr, optional) – Also group by this column/these columns. Defaults to None.
start_by ("window", "datapoint", "monday", "tuesday", ...) – The strategy used to determine the start of the first window. “window” takes the earliest timestamp, truncates it with every, and then adds offset; weekly windows start on Monday. “datapoint” starts from the first encountered data point. Any day of the week starts the window at that weekday before the first data point. The resulting window is then shifted back until the earliest data point is in or in front of it.
- class Join(on=None, how='inner', left_on=None, right_on=None, suffix='_right', validate='m:m', nulls_equal=False, coalesce=None, maintain_order=None)
Bases: ArgRepr
Partial of the polars dataframe join method.
- Parameters:
on (str) – Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None. Should not be specified if how is “cross”. Defaults to None.
how ("inner", "left", "right", "full", "semi", "anti", "cross") – Join strategy.
left_on (str, optional) – Name(s) of the left join column(s). Defaults to None.
right_on (str, optional) – Name(s) of the right join column(s). Defaults to None.
suffix (str, optional) – Suffix to append to columns with a duplicate name. Defaults to “_right”.
validate ("m:m", "m:1", "1:m", "1:1") – Check that the join is of the specified type: many-to-many, many-to-one, one-to-many, or one-to-one.
nulls_equal (bool, optional) – Join on null values. By default, null values will never produce matches. Defaults to False.
coalesce (bool, optional) – Coalescing behavior (merging of join columns). Defaults to None, which leaves the behavior join-specific.
maintain_order ("none", "left", "right", "left_right", "right_left") – Which dataframe row order to preserve, if any. Do not rely on any observed ordering without explicitly setting this parameter, as your code may break in a future release. Not specifying any ordering can improve performance. Supported for inner, left, right and full joins.
- class Select(*exprs, **named_exprs)
Bases: ArgRepr
Partial of the polars dataframe select method.
- Parameters:
*exprs (IntoExpr) – Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
**named_exprs (IntoExpr) – Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.
- class Sort(by, *more_by, descending=False, nulls_last=False, multithreaded=True, maintain_order=False)
Bases: ArgRepr
Partial of the polars dataframe sort method.
- Parameters:
by (IntoExpr) – Column(s) to sort by. Accepts expression input, including selectors. Strings are parsed as column names.
*more_by (IntoExpr) – Additional columns to sort by, specified as positional arguments.
descending (bool, optional) – Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. Defaults to False.
nulls_last (bool, optional) – Place null values last. Can be a single boolean applying to all columns or a sequence of booleans for per-column control. Defaults to False.
multithreaded (bool, optional) – Sort using multiple threads. Defaults to True.
maintain_order (bool, optional) – Whether the order should be maintained if elements are equal. Defaults to False.
- class ToPandas(use_pyarrow_extension_array=False, **kwargs)
Bases: ArgRepr
Partial of the polars dataframe to_pandas method.
- Parameters:
use_pyarrow_extension_array (bool, optional) – Use pyarrow-backed extension arrays instead of numpy arrays for the columns of the pandas dataframe. This allows zero copy operations and preservation of null values. Subsequent operations on the resulting pandas dataframe may trigger conversion to numpy if those operations are not supported by pyarrow compute. Defaults to False.
**kwargs – Additional keyword arguments to be passed to pyarrow.Table.to_pandas().
- class WithColumns(*exprs, **named_exprs)
Bases: ArgRepr
Partial of the polars dataframe with_columns method.
- Parameters:
*exprs (IntoExpr) – Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
**named_exprs (IntoExpr) – Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.