Polars vs Pandas

what's the difference?

https://slides.com/cheukting_ho/polars-vs-pandas/

Hello, I am Cheuk

  • Open-Source contributor
  • Organiser of community events
  • PSF director and fellow
  • Community manager at OpenSSF

Who knows what Polars is?

Who is using Polars?

(instead of pandas)

Polars != pandas 2.0

It's a Rust library

It also has a Python API

Some consider it a pandas alternative

But why?

Pandas is a wrapper around NumPy

Source : https://www.dataquest.io/blog/pandas-big-data/

And NumPy uses lots of pre-compiled C code

Rust > C ?

Rust is a “memory-safe” language because...

  • Special ownership rules
  • Rigorous compiler checks
  • "If it compiles, it is probably safe"

Similarities

  • Data Structures:
    Data Frames, Series
     
  • Data Types:
    Numerical values, Datetime, Objects
     
  • Data Transformation:
    Joins, Pivots, Groupby and Aggregate

How to transition from Pandas to Polars?

You don't...

 

Pandas is still useful, but Polars is an alternative worth looking into

If you do this in Pandas,
do that in Polars

Pandas

  • import pandas as pd
     
  • pd.read_csv()
     
  • pd.read_excel()
     
  • Read in batches
     
  • pd.DataFrame(
        {"a":[1,2,3], "b":[4,5,6]}
    )

Polars

  • import polars as pl
     
  • pl.read_csv()
     
  • pl.read_excel()
     
  • & Lazy loading
     
  • pl.DataFrame(
        {"a":[1,2,3], "b":[4,5,6]}
    )

Pandas

  • df.head()
     
  • df["a"]
     
  • df[["a","b"]]


     
  • titanic[titanic["Age"] > 35]

Polars

  • df.head()
     
  • df.select(pl.col("a"))
     
  • df.select(
        pl.col(["a", "b"])
    )
     
  • titanic.filter(
        pl.col("Age") > 35
    )

Pandas

  • df.plot()
     
  • df.sample() # with weight
     
  • df.describe()

Polars



  • df.sample() # no weight

  • df.describe()

Performance comparison

Which one should I use?

Pandas

  • multifunctional tool for exploring data
     
  • For data that fits in memory
     
  • Stable, well established
     
  • Good for data exploration
     
  • Works well with scikit-learn

Polars

  • specialised in data transformation
     
  • Experimental support for larger-than-memory data
     
  • Young, under active development
     
  • Good for production
    (if you pin the version)
     
  • Works well if you have many cores

I think you should give Polars a try

https://conference.pyladies.com/