pl

Polars utilities and partials of dataframe method calls.

Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.

class Drop(*columns, strict=True)[source]

Bases: ArgRepr

Partial of the polars dataframe drop method.

Parameters:
  • *columns (ColumnNameOrSelector) – Names of the columns that should be removed from the dataframe. Accepts column selector input.

  • strict (bool, optional) – Validate that all column names exist in the current schema, and raise an exception if any do not. Defaults to True.

__call__(df)[source]

Drop columns from a polars dataframe.

Parameters:

df (DataFrame) – The dataframe to drop columns from.

Returns:

The dataframe without the dropped columns.

Return type:

DataFrame

class Filter(*predicates, **constraints)[source]

Bases: ArgRepr

Partial of the polars dataframe filter method.

Parameters:
  • *predicates – Expression(s) that evaluate to a boolean Series.

  • **constraints – Keep only rows in which the column named by the keyword equals the supplied value. Constraints are implicitly combined with the other filters using a logical AND.

__call__(df)[source]

Filter a dataframe by predicates and value constraints.

Parameters:

df (DataFrame) – The dataframe to filter.

Returns:

The filtered dataframe.

Return type:

DataFrame

class FromPandas(schema_overrides=None, rechunk=True, nan_to_null=True, include_index=False)[source]

Bases: ArgRepr

Partial of the polars top-level function from_pandas.

Parameters:
  • schema_overrides (dict, optional) – Override the inferred types of one or more columns. Defaults to None.

  • rechunk (bool, optional) – Make sure that all data is in contiguous memory. Defaults to True.

  • nan_to_null (bool, optional) – If the data contain NaN values, convert them to null (PyArrow converts NaN to None). Defaults to True.

  • include_index (bool, optional) – Load any non-default pandas indexes as columns. Defaults to False.

__call__(pandas)[source]

Convert pandas structures into polars series or dataframes.

Parameters:

pandas – The pandas DataFrame, Series, or Index to convert to polars.

Returns:

A polars Series if the input is a pandas Series or Index, a polars DataFrame otherwise.

Return type:

Series or DataFrame

class GroupBy(*by, maintain_order=False, **named_by)[source]

Bases: ArgRepr

Partial of the polars dataframe group_by method.

Parameters:
  • *by (IntoExpr) – Column(s) to group by. Accepts expression input. Strings are parsed as column names.

  • maintain_order (bool, optional) – Ensure that the order of the groups is consistent with the input data. This is slower than a default group by. Setting this to True prevents running on the streaming engine. Defaults to False.

  • **named_by (IntoExpr) – Additional columns to group by, specified as keyword arguments. The columns will be renamed to the keyword used.

__call__(df)[source]

Group a polars dataframe.

Parameters:

df (DataFrame) – The dataframe to group.

Returns:

The grouped dataframe as a polars GroupBy object, ready to be aggregated.

Return type:

GroupBy

class GroupByAgg(*aggs, **named_aggs)[source]

Bases: ArgRepr

Partial of a polars (dynamic) group-by object’s agg method.

Parameters:
  • *aggs (IntoExpr) – Aggregations to compute for each group of the group by operation, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

  • **named_aggs (IntoExpr) – Additional aggregations, specified as keyword arguments. The resulting columns will be renamed to the keyword used.

__call__(grouped)[source]

Aggregate a polars (dynamic) group-by object.

Parameters:

grouped (GroupBy or DynamicGroupBy) – The polars (dynamic) group-by object to aggregate.

Returns:

A dataframe with the aggregation results.

Return type:

DataFrame

class GroupByDynamic(index_column, every, period=None, offset=None, include_boundaries=False, closed='left', label='left', group_by=None, start_by='window')[source]

Bases: ArgRepr

Partial of the polars dataframe group_by_dynamic method.

Parameters:
  • index_column (IntoExpr) – Column used to group based on the time window. Often of type Date or Datetime. This column must be sorted in ascending order (or, if group_by is specified, then it must be sorted in ascending order within each group). In case of a dynamic group by on indices, the dtype needs to be Int32 or Int64. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.

  • every (str or timedelta) – Interval of the window. To index by an integer column, suffix an integer with the letter “i” (for example, “2i”).

  • period (str or timedelta, optional) – Length of the window. Equals ‘every’ if set to None (the default).

  • offset (str or timedelta, optional) – Offset of the window. Does not take effect if start_by is “datapoint”. Defaults to None, meaning no offset.

  • include_boundaries (bool, optional) – Add the lower and upper bound of the window to the “_lower_boundary” and “_upper_boundary” columns. This will impact performance because it is harder to parallelize. Defaults to False.

  • closed ("left", "right", "both", "none") – Define which sides of the temporal interval are closed (inclusive). Defaults to “left”.

  • label ("left", "right", "datapoint") – Which label to use for the window: the lower boundary (“left”), the upper boundary (“right”), or the first value of the index column in the given window (“datapoint”). If you don’t need the label to be at one of the boundaries, choose “datapoint” for maximum performance. Defaults to “left”.

  • group_by (IntoExpr, optional) – Also group by this column/these columns. Defaults to None.

  • start_by ("window", "datapoint", "monday", "tuesday", ...) – The strategy to determine the start of the first window. “window” takes the earliest timestamp, truncates it with every, and then adds offset; weekly windows start on Monday. “datapoint” starts from the first encountered data point. A day of the week (only takes effect if every contains “w”) starts the window on that weekday before the first data point. The resulting window is then shifted back until the earliest data point is in or in front of it. Defaults to “window”.

__call__(df)[source]

Group a polars dataframe into dynamic (time) windows.

Parameters:

df (DataFrame) – The dataframe to group into windows.

Returns:

The dynamically grouped dataframe as a polars DynamicGroupBy object, ready to be aggregated.

Return type:

DynamicGroupBy

class Join(on=None, how='inner', left_on=None, right_on=None, suffix='_right', validate='m:m', nulls_equal=False, coalesce=None, maintain_order=None)[source]

Bases: ArgRepr

Partial of the polars dataframe join method.

Parameters:
  • on (str, optional) – Name(s) of the join columns in both dataframes. If set, left_on and right_on should be None. Should not be specified if how is “cross”. Defaults to None.

  • how ("inner", "left", "right", "full", "semi", "anti", "cross") – Join strategy. Defaults to “inner”.

  • left_on (str, optional) – Name(s) of the left join column(s). Defaults to None.

  • right_on (str, optional) – Name(s) of the right join column(s). Defaults to None.

  • suffix (str, optional) – Suffix to append to columns with a duplicate name. Defaults to “_right”.

  • validate ("m:m", "m:1", "1:m", "1:1") – Check whether the join is of the specified type: many-to-many, many-to-one, one-to-many, or one-to-one. Defaults to “m:m”.

  • nulls_equal (bool, optional) – Join on null values. By default, null values will never produce matches. Defaults to False.

  • coalesce (bool, optional) – Coalescing behavior (merging of join columns). Defaults to None, which leaves the behavior join-specific.

  • maintain_order ("none", "left", "right", "left_right", "right_left", optional) – Which dataframe row order to preserve, if any. Do not rely on any observed ordering without explicitly setting this parameter, as your code may break in a future release. Not specifying any ordering can improve performance. Supported for inner, left, right, and full joins. Defaults to None.

__call__(left, right)[source]

Join two polars dataframes.

Parameters:
  • left (DataFrame) – Left dataframe in the join.

  • right (DataFrame) – Right dataframe in the join.

Returns:

The joined dataframe.

Return type:

DataFrame

class Select(*exprs, **named_exprs)[source]

Bases: ArgRepr

Partial of the polars dataframe select method.

Parameters:
  • *exprs (IntoExpr) – Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

  • **named_exprs (IntoExpr) – Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.

__call__(df)[source]

Select columns from a polars dataframe.

Parameters:

df (DataFrame) – The dataframe to select columns from.

Returns:

A dataframe with the selected columns.

Return type:

DataFrame

class Sort(by, *more_by, descending=False, nulls_last=False, multithreaded=True, maintain_order=False)[source]

Bases: ArgRepr

Partial of the polars dataframe sort method.

Parameters:
  • by (IntoExpr) – Column(s) to sort by. Accepts expression input, including selectors. Strings are parsed as column names.

  • *more_by (IntoExpr) – Additional columns to sort by, specified as positional arguments.

  • descending (bool, optional) – Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. Defaults to False.

  • nulls_last (bool, optional) – Place null values last. Can be a single boolean applying to all columns or a sequence of booleans for per-column control. Defaults to False.

  • multithreaded (bool, optional) – Sort using multiple threads. Defaults to True.

  • maintain_order (bool, optional) – Whether the order should be maintained if elements are equal. Defaults to False.

__call__(df)[source]

Sort polars dataframe by column values.

Parameters:

df (DataFrame) – The dataframe to sort.

Returns:

The sorted dataframe.

Return type:

DataFrame

class ToPandas(use_pyarrow_extension_array=False, **kwargs)[source]

Bases: ArgRepr

Partial of the polars dataframe to_pandas method.

Parameters:
  • use_pyarrow_extension_array (bool, optional) – Use pyarrow-backed extension arrays instead of numpy arrays for the columns of the pandas dataframe. This allows zero copy operations and preservation of null values. Subsequent operations on the resulting pandas dataframe may trigger conversion to numpy if those operations are not supported by pyarrow compute. Defaults to False.

  • **kwargs – Additional keyword arguments to be passed to pyarrow.Table.to_pandas().

__call__(df)[source]

Convert a polars dataframe into a pandas one.

Parameters:

df (DataFrame) – The polars dataframe to convert.

Returns:

The converted pandas dataframe.

Return type:

PandasFrame

class WithColumns(*exprs, **named_exprs)[source]

Bases: ArgRepr

Partial of the polars dataframe with_columns method.

Parameters:
  • *exprs (IntoExpr) – Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

  • **named_exprs (IntoExpr) – Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.

__call__(df)[source]

Add columns to, or replace columns of, a polars dataframe.

Parameters:

df (DataFrame) – The dataframe to add columns to or replace columns of.

Returns:

The dataframe with columns added or replaced.

Return type:

DataFrame