pl
Polars utilities and partials of dataframe method calls.
Parameters that are known at program start are used to initialize the classes so that, at runtime, dataframes can flow through a preconfigured processing pipe of callable objects.
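The pattern described above can be illustrated without polars. The following is a minimal sketch and not the package's implementation: `MethodPartial` and `FakeFrame` are hypothetical names introduced only for this example, and the toy frame merely mimics two dataframe methods.

```python
class MethodPartial:
    """Hypothetical stand-in: bind a method name and its arguments up front."""

    def __init__(self, method, *args, **kwargs):
        self.method = method
        self.args = args
        self.kwargs = kwargs

    def __repr__(self):
        # Show the configuration, so a pipeline is readable when printed.
        parts = [repr(self.method)]
        parts += [repr(arg) for arg in self.args]
        parts += [f"{key}={val!r}" for key, val in self.kwargs.items()]
        return f"{type(self).__name__}({', '.join(parts)})"

    def __call__(self, obj):
        # Look the method up on whatever object arrives at runtime.
        return getattr(obj, self.method)(*self.args, **self.kwargs)


class FakeFrame:
    """Toy substitute for a dataframe: a dict of equally long columns."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def drop(self, *names):
        return FakeFrame({k: v for k, v in self.columns.items() if k not in names})

    def rename(self, mapping):
        return FakeFrame({mapping.get(k, k): v for k, v in self.columns.items()})


# Configure the steps once, at program start ...
pipeline = [MethodPartial("drop", "tmp"), MethodPartial("rename", {"a": "x"})]

# ... and let frames flow through them at runtime.
frame = FakeFrame({"a": [1, 2], "b": [3, 4], "tmp": [0, 0]})
for step in pipeline:
    frame = step(frame)

result_columns = list(frame.columns)
```

Because each step is an ordinary callable that takes one frame and returns one frame, the same preconfigured pipeline can be reused for every dataframe that passes through.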
- class Cast(dtypes, *, strict=True)
Bases:
ArgRepr
Partial of the polars dataframe cast method.
- Parameters:
dtypes (Dtypes) – Mapping of column names (or selector) to dtypes, or a single dtype to which all columns will be cast.
strict (bool, optional) – Raise if cast is invalid on rows after predicates are pushed down. If False, invalid casts will produce null values. Defaults to True.
- class Drop(*columns, strict=True)
Bases:
ArgRepr
Partial of the polars dataframe drop method.
- Parameters:
*columns (ColumnNameOrSelector) – Names of the columns that should be removed from the dataframe. Accepts column selector input.
strict (bool, optional) – Validate that all column names exist in the current schema, and throw an exception if any do not. Defaults to True.
- class DropNulls(subset=None)
Bases:
ArgRepr
Partial of the polars dataframe drop_nulls method.
- Parameters:
subset (Subset) – Column name(s) for which null values are considered. If set to None (default), use all columns.
- class Filter(*predicates, **constraints)
Bases:
ArgRepr
Partial of the polars dataframe filter method.
- Parameters:
*predicates – Expression(s) that evaluate to a boolean Series.
**constraints – Column filters: each keyword names a column, which is filtered by the supplied value. Constraints are implicitly combined with the other filters by a logical and.
- class FromPandas(schema_overrides=None, rechunk=True, nan_to_null=True, include_index=False)
Bases:
ArgRepr
Partial of the polars top-level function from_pandas.
- Parameters:
schema_overrides (dict, optional) – Support override of inferred types for one or more columns. Defaults to None.
rechunk (bool, optional) – Make sure that all data is in contiguous memory. Defaults to True.
nan_to_null (bool, optional) – If the data contains NaN values, pyarrow will convert them to None. Defaults to True.
include_index (bool, optional) – Load any non-default pandas indexes as columns. Defaults to False.
- class GroupBy(*by, maintain_order=False, **named_by)
Bases:
ArgRepr
Partial of the polars dataframe group_by method.
- Parameters:
*by (IntoExpr) – Column(s) to group by. Accepts expression input. Strings are parsed as column names.
maintain_order (bool, optional) – Ensure that the order of the groups is consistent with the input data. This is slower than a default group by. Setting this to True blocks the possibility to run on the streaming engine. Defaults to False.
**named_by (IntoExpr) – Additional columns to group by, specified as keyword arguments. The columns will be renamed to the keyword used.
- class GroupByAgg(*aggs, **named_aggs)
Bases:
ArgRepr
Partial of a polars (dynamic) group-by object’s agg method.
- Parameters:
*aggs (IntoExpr) – Aggregations to compute for each group of the group by operation, specified as positional arguments. Accepts expression input. Strings are parsed as column names.
**named_aggs (IntoExpr) – Additional aggregations, specified as keyword arguments. The resulting columns will be renamed to the keyword used.
- class GroupByDynamic(index_column, every, period=None, offset=None, include_boundaries=False, closed='left', label='left', group_by=None, start_by='window')
Bases:
ArgRepr
Partial of the polars dataframe group_by_dynamic method.
- Parameters:
index_column (IntoExpr) – Column used to group based on the time window. Often of type Date or Datetime. This column must be sorted in ascending order (or, if group_by is specified, then it must be sorted in ascending order within each group). In case of a dynamic group by on indices, dtype needs to be Int32 or Int64. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
every (str or timedelta) – Interval of the window. In a string, append the letter “i” to an integer (e.g., “2i”) to indicate indexing by integer columns.
period (str or timedelta, optional) – Length of the window. Equals ‘every’ if set to None (the default).
offset (str or timedelta) – Offset of the window. Does not take effect if start_by is “datapoint”. Defaults to zero.
include_boundaries (bool, optional) – Add the lower and upper bound of the window to the “_lower_boundary” and “_upper_boundary” columns. This will impact performance because it is harder to parallelize. Defaults to False.
closed ("left", "right", "both", "none") – Define which sides of the temporal interval are closed (inclusive).
label ("left", "right", "datapoint") – Which label to use for the window: lower boundary, upper boundary, or first value of the index column in the given window. If you don’t need the label to be at one of the boundaries, choose “datapoint” for maximum performance.
group_by (IntoExpr, optional) – Also group by this column/these columns. Defaults to None.
start_by ("window", "datapoint", "monday", "tuesday", ...) – The strategy to determine the start of the first window. “window” takes the earliest timestamp, truncates it with every, and then adds offset; weekly windows start on Monday. “datapoint” starts from the first encountered data point. Any day of the week starts the window at that weekday before the first data point. The resulting window is then shifted back until the earliest data point is in or in front of it.
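The “window” strategy for start_by can be sketched with the standard library alone. This is a simplified illustration and not how polars implements it: it assumes fixed-length intervals given as timedelta, truncation counted from the Unix epoch, left-closed windows whose length equals every, and it does not shift a window back when a positive offset pushes it past the first data point. The helper names `truncate` and `windows` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncate(ts, every):
    """Round ts down to a multiple of every, counted from the epoch (assumption)."""
    return ts - (ts - EPOCH) % every


def windows(timestamps, every, offset=timedelta(0)):
    """Group sorted timestamps into left-closed windows [lower, lower + every)."""
    # Earliest timestamp, truncated with every, then shifted by offset.
    lower = truncate(timestamps[0], every) + offset
    last = timestamps[-1]
    groups = []
    while lower <= last:
        upper = lower + every
        members = [ts for ts in timestamps if lower <= ts < upper]
        if members:  # this sketch simply skips empty windows
            groups.append((lower, members))
        lower = upper
    return groups


start = datetime(2021, 12, 16, 0, 15, tzinfo=timezone.utc)
stamps = [start + i * timedelta(minutes=45) for i in range(4)]
groups = windows(stamps, every=timedelta(hours=1))
counts = [len(members) for _, members in groups]
```

Here the first data point lies at 00:15, so the first window starts at 00:00 rather than at the data point itself, which is the difference between “window” and “datapoint”.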
- class Join(on=None, how='inner', left_on=None, right_on=None, suffix='_right', validate='m:m', nulls_equal=False, coalesce=None, maintain_order=None)
Bases:
ArgRepr
Partial of the polars dataframe join method.
- Parameters:
on (str) – Name(s) of the join columns in both DataFrames. If set, left_on and right_on should be None. Should not be specified if how is “cross”. Defaults to None.
how ("inner", "left", "right", "full", "semi", "anti", "cross") – Join strategy.
left_on (str, optional) – Name(s) of the left join column(s). Defaults to None.
right_on (str, optional) – Name(s) of the right join column(s). Defaults to None.
suffix (str, optional) – Suffix to append to columns with a duplicate name. Defaults to “_right”.
validate ("m:m", "m:1", "1:m", "1:1") – Checks if join is of specified type: many-to-many, many-to-one, one-to-many, or one-to-one.
nulls_equal (bool, optional) – Join on null values. By default, null values will never produce matches. Defaults to False.
coalesce (bool, optional) – Coalescing behavior (merging of join columns). Defaults to None, which leaves the behavior join-specific.
maintain_order ("none", "left", "right", "left_right", "right_left") – Which dataframe row order to preserve, if any. Do not rely on any observed ordering without explicitly setting this parameter, as your code may break in a future release. Not specifying any ordering can improve performance. Supported for inner, left, right and full joins.
- class Pivot(on, on_columns=None, **kwargs)
Bases:
ArgRepr
Partial of the polars dataframe pivot method.
- Parameters:
on (ColumnNameOrSelector) – The column(s) whose values will be used as the new columns of the output dataframe.
on_columns (Sequence or None) – What value combinations will be considered for the output table.
- class Rename(mapping, *, strict=True)
Bases:
ArgRepr
Partial of the polars dataframe rename method.
- Parameters:
mapping (Mapping or Callable) – Key value pairs that map from old name to new name, or a function that takes the old name as input and returns the new name.
strict (bool, optional) – Validate that all column names exist in the current schema, and throw an exception if any do not. Defaults to True.
- class Select(*exprs, **named_exprs)
Bases:
ArgRepr
Partial of the polars dataframe select method.
- Parameters:
*exprs (IntoExpr) – Column(s) to select, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
**named_exprs (IntoExpr) – Additional columns to select, specified as keyword arguments. The columns will be renamed to the keyword used.
- class Sort(by, *more_by, descending=False, nulls_last=False, multithreaded=True, maintain_order=False)
Bases:
ArgRepr
Partial of the polars dataframe sort method.
- Parameters:
by (IntoExpr) – Column(s) to sort by. Accepts expression input, including selectors. Strings are parsed as column names.
*more_by (IntoExpr) – Additional columns to sort by, specified as positional arguments.
descending (bool, optional) – Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans. Defaults to False.
nulls_last (bool, optional) – Place null values last. Can be a single boolean applying to all columns or a sequence of booleans for per-column control. Defaults to False.
multithreaded (bool, optional) – Sort using multiple threads. Defaults to True.
maintain_order (bool, optional) – Whether the order should be maintained if elements are equal. Defaults to False.
- class ToPandas(use_pyarrow_extension_array=False, **kwargs)
Bases:
ArgRepr
Partial of the polars dataframe to_pandas method.
- Parameters:
use_pyarrow_extension_array (bool, optional) – Use pyarrow-backed extension arrays instead of numpy arrays for the columns of the pandas dataframe. This allows zero copy operations and preservation of null values. Subsequent operations on the resulting pandas dataframe may trigger conversion to numpy if those operations are not supported by pyarrow compute. Defaults to False.
**kwargs – Additional keyword arguments to be passed to pyarrow.Table.to_pandas().
- class VStack(in_place=False)
Bases:
ArgRepr
Partial of the polars dataframe vstack method.
- Parameters:
in_place (bool, optional) – Whether to modify in place. Defaults to False.
- __call__(upper, lower)
Stack two polars dataframes on top of each other.
- Parameters:
upper (DataFrame) – The upper dataframe to be appended to.
lower (DataFrame) – The lower dataframe being appended to upper.
- Returns:
The upper and lower dataframes stacked on top of each other.
- Return type:
DataFrame
- class WithColumns(*exprs, **named_exprs)
Bases:
ArgRepr
Partial of the polars dataframe with_columns method.
- Parameters:
*exprs (IntoExpr) – Column(s) to add, specified as positional arguments. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.
**named_exprs (IntoExpr) – Additional columns to add, specified as keyword arguments. The columns will be renamed to the keyword used.
io
- class Parquet2LazyFrame(path='', storage=LazyStorage.FILE, storage_kws=None, **kwargs)
Bases:
LazyReader
Lazily scan a parquet file on any supported file system.
- Parameters:
path (str, optional) – Base directory or full path to the parquet file. Since part of it can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.
storage (str, optional) – The type of file system to read from (“file”, “s3”, “gcs”, etc.). Defaults to “file”. Use the LazyStorage enum to avoid typos.
storage_kws (dict, optional) – Passed on as storage_options to polars.scan_parquet().
**kwargs – Passed on as additional keyword arguments to polars’ top-level scan_parquet() function. See the scan documentation for available options.
- Raises:
TypeError – If path is not a string or storage_kws is not a dictionary.
ValueError – If storage is not among the currently supported file-system schemes.
- __call__(path='')
Lazily scan a parquet file on the specified file system.
- Parameters:
path (str, optional) – Path (including file name) to the parquet file to scan. If it starts with a forward slash, it is interpreted as absolute; otherwise, it is joined to the path given at instantiation. Defaults to an empty string, which leaves the instantiation path unchanged.
- Returns:
A Polars LazyFrame backed by the specified parquet file. No data is read until the frame is collected or sunk.
- Return type:
LazyFrame
- Raises:
ValueError – If the final path is directly under root (e.g., “/file.parquet”) because, on a local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.
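The path rules documented for `__call__` can be made concrete with a small standard-library sketch. The helper `resolve` is hypothetical and only mirrors the behavior described above (absolute paths win, relative paths are joined, an empty path leaves the base unchanged, and a file directly under root is rejected); the class's actual joining logic may differ in detail.

```python
def resolve(base, path=""):
    """Hypothetical helper mirroring the documented rules for combining paths."""
    if path.startswith("/"):
        full = path  # absolute: the instantiation path is ignored
    elif path:
        full = base.rstrip("/") + "/" + path  # relative: join to the base
    else:
        full = base  # empty string leaves the base unchanged
    # A file directly under root has only one non-empty path component.
    if len([part for part in full.split("/") if part]) < 2:
        raise ValueError(f"Path '{full}' is directly under root!")
    return full


joined = resolve("/data/raw", "2024/file.parquet")
absolute = resolve("/data/raw", "/other/file.parquet")
unchanged = resolve("/data/raw/file.parquet")

try:
    resolve("/data", "/file.parquet")
    rejected = False
except ValueError:
    rejected = True
```

With these rules, an instance can be created with only a base directory, and the remainder of the path can be supplied at call time.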
- class LazyFrame2Parquet(path, storage=LazyStorage.FILE, storage_kws=None, **kwargs)
Bases:
LazyWriter
Sink a polars lazy frame to a parquet file on any supported file system.
- Parameters:
path (str) – The absolute path to the parquet file to write. May include any number of string placeholders (i.e., pairs of curly brackets) that will be interpolated when the instance is called.
storage (str, optional) – The type of file system to write to (“file”, “s3”, “gcs”, etc.). Defaults to “file”. Use the LazyStorage enum to avoid typos.
storage_kws (dict, optional) – Passed as storage_options to polars.LazyFrame.sink_parquet().
**kwargs – Passed on as additional keyword arguments to polars.LazyFrame.sink_parquet(). See the sink documentation for available options.
- Raises:
TypeError – If path is not a string or storage_kws is not a dictionary.
ValueError – If storage is not among the currently supported file-system schemes.
Note
sink_parquet requires a streaming-compatible query plan. Ensure your lazy query is compatible before calling. Polars will raise if it is not.
- __call__(ldf, *parts)
Sink a polars lazy frame to a parquet file.
- Parameters:
ldf (LazyFrame) – The polars lazy frame to sink.
*parts (str) – Fragments that will be interpolated into the path given at instantiation. There must be at least as many parts as there are placeholders in the path.
- Returns:
An empty tuple.
- Return type:
tuple
- Raises:
IndexError – If the path given at instantiation has more string placeholders than there are parts.
ValueError – If the final path is directly under root (e.g., “/file.parquet”) because, on a local file system, this is not where you want to save to and, on object storage, the first directory refers to the name of an (existing!) bucket.
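The placeholder behavior described above matches that of Python's built-in `str.format` with positional arguments. The sketch below assumes, without confirmation from the source, that the path is filled in this way; `interpolate` is a hypothetical helper, not part of the package.

```python
def interpolate(template, *parts):
    """Fill positional {} placeholders in a path template (assumed mechanism)."""
    return template.format(*parts)


template = "/data/{}/{}.parquet"

# Exactly as many parts as placeholders.
full = interpolate(template, "2024", "sales")

# Surplus parts are silently ignored by str.format.
surplus = interpolate(template, "2024", "sales", "ignored")

# Too few parts raise an IndexError, as listed under "Raises".
try:
    interpolate(template, "2024")
    error = None
except IndexError as exc:
    error = exc
```

This is why a single preconfigured writer can serve many target files: the template is fixed at instantiation and only the varying fragments are passed on each call.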
Base classes
- class LazyReader(path='', storage=LazyStorage.FILE, storage_kws=None, *args, **kwargs)
Bases:
ArgRepr
Base class for scanning polars lazy frames from any filesystem.
- Parameters:
path (str, optional) – Directory under which the parquet file is located or its full path. Since it (or part of it) can also be provided later, when the callable instance is called, it is optional here. Defaults to an empty string.
storage (str, optional) – The type of file system to scan from (“file”, “s3”, etc.). Defaults to “file”. Use the LazyStorage enum to avoid typos.
storage_kws (dict, optional) – Passed on as storage_options to polars’ scan methods.
*args – Additional arguments are reflected in the representation of instances but do not affect functionality in any way.
**kwargs – Additional keyword arguments are reflected in the representation of instances but do not affect functionality in any way.
- Raises:
TypeError – If path is not a string or storage_kws is not a dictionary.
ValueError – If storage is not among the currently supported file-system schemes.
- property prefix
The URI prefix for the selected storage backend.
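How a storage scheme might map to a URI prefix, and how an unsupported scheme could be rejected, is sketched below. Everything here is an assumption made for illustration: `Storage` is a hypothetical stand-in for LazyStorage (only its FILE member appears in the signatures above), the member names S3 and GCS are guesses based on the scheme strings mentioned in the parameter descriptions, and the `scheme://` format of the prefix is not confirmed by the source.

```python
from enum import Enum


class Storage(str, Enum):
    """Hypothetical stand-in for LazyStorage; members other than FILE are guesses."""

    FILE = "file"
    S3 = "s3"
    GCS = "gcs"


def prefix(storage):
    """Build a URI prefix such as 's3://' and reject unknown schemes."""
    try:
        scheme = Storage(storage)  # accepts a member or its string value
    except ValueError:
        raise ValueError(f"Unsupported storage: {storage!r}") from None
    return f"{scheme.value}://"


from_string = prefix("s3")
from_member = prefix(Storage.FILE)

try:
    prefix("ftp")
    rejected = False
except ValueError:
    rejected = True
```

Passing an enum member instead of a bare string lets an editor or type checker catch a misspelled scheme before the program runs, which is presumably the reason the parameter descriptions recommend the enum.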
- class LazyWriter(path, storage=LazyStorage.FILE, storage_kws=None, *args, **kwargs)
Bases:
ArgRepr
Base class for sinks of polars lazy frames on any filesystem.
- Parameters:
path (str) – The absolute path to the file to sink. May contain any number of string placeholders (i.e., pairs of curly brackets) that will be interpolated when instances are called.
storage (str, optional) – The type of file system to write to (“file”, “s3”, etc.). Defaults to “file”. Use the LazyStorage enum to avoid typos.
storage_kws (dict, optional) – Passed on both as keyword arguments to the fsspec filesystem constructor and as storage_options to polars’ sink methods.
*args – Additional arguments are reflected in the representation of instances but do not affect functionality in any way.
**kwargs – Additional keyword arguments are reflected in the representation of instances but do not affect functionality in any way.
- Raises:
TypeError – If path is not a string or storage_kws is not a dictionary.
ValueError – If storage is not among the currently supported file-system schemes.
- property prefix
The URI prefix for the selected storage backend.