POLARS

Revolutionize Your Data Workflow with Polars: The Ultimate Pandas Replacement

Index

Introduction

Pandas limitations

  • No query planning
  • Limited multicore algorithms
  • RAM management
  • Warty missing-data support
  • Complex group-by operations are awkward and slow

Apache Arrow

  • Columnar
  • Standardization
  • Cross-language
  • Diverse data types
  • Nested data types

Column vs Row

Core Features and Advantages


Rust

Polars is written in Rust, which gives it C/C++-level performance and lets it fully control the performance-critical parts of a query engine.

ARROW

Apache Arrow is a cross-language, columnar memory format for big data analytics, enabling efficient data sharing between systems.

Expression

Expressions describe custom computations and transformations as composable operations, which the engine can optimize and run in parallel for better performance.

IO and Lazy mode

QUERY PLAN

  • select
  • filter
  • join
  • with_columns
  • sort
  • groupby | agg

Manipulation/selection

  • collect
  • fetch
  • serialize
  • deserialize
  • profile
  • explain
  • over
  • str
  • dt
  • cat
  • bin
  • list

JOIN

GROUP BY

Expressions

MERGE

LazyFrame

Real-world Applications

Tick2kline pandas
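One common way to turn ticks into 1-minute k-lines (OHLCV bars) in pandas is `resample`. This is a sketch on made-up ticks, not necessarily the approach used in the talk; the column names are assumptions.

```python
import pandas as pd

# Hypothetical tick data indexed by timestamp.
ticks = pd.DataFrame(
    {
        "ts": pd.to_datetime(
            [
                "2023-01-02 09:00:05",
                "2023-01-02 09:00:40",
                "2023-01-02 09:01:10",
                "2023-01-02 09:01:50",
            ]
        ),
        "price": [100.0, 101.5, 101.0, 99.5],
        "volume": [3, 1, 2, 4],
    }
).set_index("ts")

# Open/high/low/close per 1-minute bucket, plus summed volume.
bars = ticks["price"].resample("1min").ohlc()
bars["volume"] = ticks["volume"].resample("1min").sum()
print(bars)
```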


Tick2kline polars


Tick2kline


Daily data preprocess


Futures Data Clean


tick2kline with rust


Polars Extension with rust

QA

Polars Rust Python

Alternatives compare

Comparison criteria: query plan, large data, dirty data, stricter API, columnar, parallelize, distributed
Libraries compared: pandas, Dask, Modin, Vaex, cuDF, Spark, DuckDB, Polars

Polars

By yvictor
