Driving down the Memray lane

Profiling your data science work

Get the slides: slides.com/cheukting_ho/memray-lane

Hello, I am Cheuk

  • Open-source contributor
  • Organiser of community events
  • Ex-EPS board member, PSF Fellow
  • Looking for my next role

Have you ever seen
MemoryError?

So, what is memory profiling?

Profiling (in software eng.)

  • Investigate software behaviour
  • When it is being executed
  • Dynamic analysis
  • The tool that does this is called a profiler

Memory Profiling

  • Profiling - investigation of...
  • Memory allocation
  • Garbage collection
  • When the program is executed over time
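A minimal sketch of what "investigating allocation and garbage collection over time" can look like, using only the standard library (`tracemalloc` and `gc`) rather than a dedicated profiler. The `Node` class and the 100 kB payload are made up for illustration; the exact byte counts will vary between Python versions.

```python
import gc
import tracemalloc


class Node:
    """A tiny object that can point at another Node, forming a cycle."""

    def __init__(self):
        self.other = None
        self.payload = bytearray(100_000)  # ~100 kB so the effect is visible


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b: reference counting alone cannot free these


def profile_cycle():
    gc.disable()  # keep the cyclic collector quiet until we call it ourselves
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        make_cycle()
        after_alloc, _ = tracemalloc.get_traced_memory()
        collected = gc.collect()  # garbage collection reclaims the unreachable cycle
        after_gc, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        gc.enable()
    return after_alloc - baseline, after_gc - baseline, collected


held, remaining, collected = profile_cycle()
print(f"after allocation: {held} bytes")
print(f"after GC:         {remaining} bytes ({collected} objects collected)")
```

The three readings are the "over time" part: memory before the work, memory after allocation, and memory after the collector has run.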

Why do we need a memory profiler?

Especially when doing data science work

Why do we need a profiler?

  • Data science work involves lots of data
  • Python apps are not very good at memory management
  • Either hitting a MemoryError, or worse...
  • hanging on to a reference that should no longer be there
  • We need to know what's going on

Different Memories

Heap vs Stack

Heap vs Stack

Heap

  • Stores global variables
  • Allocated anywhere in memory
  • Needs to be freed via references (GC)
  • Or freed when the process terminates

Stack

  • Stores local variables
  • Allocated in a defined order
  • Released in order
  • Freed when the function returns
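A small sketch of how this plays out in Python, assuming CPython's reference counting. The local name disappears when the function returns, and the object it pointed to is freed only if nothing else still refers to it. `Table` is a hypothetical stand-in for a large object.

```python
import weakref


class Table:
    """Stand-in for a large object such as a DataFrame (hypothetical)."""


def local_only():
    table = Table()            # the only strong reference is this local name
    return weakref.ref(table)  # a weak reference does not keep the object alive


def escapes():
    table = Table()
    return table, weakref.ref(table)  # returning it hands a strong reference to the caller


probe = local_only()
print("freed after return:", probe() is None)

kept, probe2 = escapes()
print("alive while referenced:", probe2() is kept)
del kept  # drop the last strong reference
print("freed after del:", probe2() is None)
```

Other Python implementations may free the objects later, so the timing shown here is specific to CPython.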

Resident Set vs Virtual Memory

Resident vs Virtual

Resident set

  • Actual memory physically used in RAM
  • Not an accurate measure of the total used
  • However, it is a reliable estimate

Virtual memory

  • Total memory that the process needs
  • Not an accurate measure of how much is consumed at a given time
  • Estimate of the total amount used

Different OSes may behave differently as well...
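To make the two numbers concrete, here is a rough sketch that reads both for the current process. It assumes Linux, where `/proc/self/status` exposes `VmRSS` (resident set) and `VmSize` (virtual memory); on other systems the helper simply returns `None`. `read_memory_kib` is a name invented for this example.

```python
from pathlib import Path


def read_memory_kib(status_path="/proc/self/status"):
    """Return (resident_kib, virtual_kib) for this process, or None if unavailable."""
    path = Path(status_path)
    if not path.exists():  # e.g. macOS or Windows
        return None
    fields = {}
    for line in path.read_text().splitlines():
        key, _, value = line.partition(":")
        if key in ("VmRSS", "VmSize"):
            fields[key] = int(value.split()[0])  # lines look like "VmRSS:   12345 kB"
    if len(fields) < 2:
        return None
    return fields["VmRSS"], fields["VmSize"]


usage = read_memory_kib()
if usage is None:
    print("No /proc/self/status here; this sketch only works on Linux.")
else:
    resident, virtual = usage
    print(f"resident set: {resident} KiB, virtual memory: {virtual} KiB")
```

The virtual figure is normally much larger than the resident one, because it also counts address space that has been reserved but is not currently backed by RAM.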

How Python manages memory

  • Private heap for all Python objects and data structures
  • Python memory manager talks to the OS memory manager
  • pymalloc -> PyMem_RawMalloc -> VMM
  • Object-specific allocators for different object types

pymalloc

  • Optimized for objects <= 512 bytes
  • “Arenas” with a fixed size of 256 KiB (1 MiB on 64-bit platforms since Python 3.10)
  • Blocks that each hold only one Python object of a fixed size
  • Block sizes from 8 to 512 bytes
  • Anything bigger falls back to PyMem_RawMalloc and PyMem_RawRealloc
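A rough way to get a feel for the 512-byte threshold is to compare object sizes against it. This is only an approximation: `sys.getsizeof` reports an object's own footprint, not the exact allocator request, and `likely_served_by_pymalloc` is a name invented for this sketch, not a CPython API.

```python
import sys

SMALL_OBJECT_LIMIT = 512  # bytes; the pymalloc threshold quoted on the slide


def likely_served_by_pymalloc(obj):
    """Rough guess: is the object's own footprint within the small-object limit?"""
    return sys.getsizeof(obj) <= SMALL_OBJECT_LIMIT


samples = {
    "int": 42,
    "short str": "memray",
    "small tuple": (1, 2, 3),
    "1 MB bytes": bytes(1_000_000),
}
for name, obj in samples.items():
    size = sys.getsizeof(obj)
    print(f"{name:12} {size:>9} bytes  small={likely_served_by_pymalloc(obj)}")
```

For a view from the allocator itself, CPython also provides `sys._debugmallocstats()`, which prints arena and block usage to stderr.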

I found a tool to profile my code... in Jupyter 🙌

(the only problem is no Windows support... yet)

Let's see how it works...