Profiling

Pandora has a built-in profiling tool that provides insight into the memory and time usage of each step in the pipeline.

Warning

Make sure to install Pandora with development dependencies included.

By default, two graphs are created:

  • an icicle graph, showing the time spent in each step of the pipeline

  • a plot, showing the memory consumption of Pandora at regular intervals during the execution

Configuration and parameters

Pandora’s profiling configuration works just like a pipeline step, but is placed at the root of the config file:

profiling

Name           Description                                   Type  Default value  Required
save_graphs    Save the default graphs generated             bool  False          No
save_raw_data  Save the raw data on calls as a .pickle file  bool  False          No

Note

profiling can also be set directly to a boolean (true or false) instead of a dict, which sets every boolean option in its configuration to that value.

Example configuration:

{
    "profiling": {
        "save_raw_data": true,
        "save_graphs": true
    },
    "input": {
        ...
    },
    "pipeline": {
        ...
    }
}
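The boolean shorthand described in the note above can be pictured with a small Python sketch. The helper name expand_profiling and the PROFILING_DEFAULTS mapping are hypothetical, introduced only for illustration; this is not Pandora's own configuration code, just a minimal model of the documented behaviour.

```python
import json

# Documented options and their default values (see the table above).
PROFILING_DEFAULTS = {"save_graphs": False, "save_raw_data": False}


def expand_profiling(value):
    """Hypothetical helper: turn the "profiling" entry into a full dict.

    A bare boolean is applied to every boolean option; a dict overrides
    the documented defaults option by option.
    """
    if isinstance(value, bool):
        return {key: value for key in PROFILING_DEFAULTS}
    return {**PROFILING_DEFAULTS, **value}


# Shorthand form: "profiling" is a boolean at the root of the config file.
shorthand = json.loads('{"profiling": true}')
expanded = expand_profiling(shorthand["profiling"])

# Dict form: only one option is set, the other keeps its default.
partial = expand_profiling({"save_graphs": True})
```

Under these assumptions, expanded enables both options, while partial enables only save_graphs.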

Saved profiling data

When save_raw_data is enabled, Pandora saves the profiling information as a .pickle file containing a pandas DataFrame with the following structure:

Name       Description
level      Depth of the function call in the profiling stack
parent     UUID of the “parent” call (the call that was running when this call was made)
name       Understandable name given to the function call
uuid       Unique identifier of the function call
time       Time (in seconds) it took to execute the function
call_time  Timestamp (in seconds) at which the call was made
memory     Either None or a list of (timestamp, memory) tuples representing memory consumption (in megabytes) at each timestamp during the function execution
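As a rough illustration of how this DataFrame might be explored, the sketch below builds a small synthetic table with the documented columns, round-trips it through a pickle file, and computes a few summaries. The values and the file name profiling_example.pickle are made up for the example; the actual name and location of the file written by Pandora are not specified here. As with any pickle, only load files you trust.

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for the saved profiling data (documented columns only).
calls = pd.DataFrame(
    {
        "level": [0, 1, 1, 1],
        "parent": [None, "a", "a", "a"],
        "name": ["pipeline", "matching_cost", "matching_cost", "disparity"],
        "uuid": ["a", "b", "c", "d"],
        "time": [3.0, 1.5, 0.5, 0.75],
        "call_time": [0.0, 0.1, 1.7, 2.2],
        "memory": [None, [(0.1, 120.0), (0.6, 180.0)], None, None],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this round trip.
    path = os.path.join(tmp_dir, "profiling_example.pickle")
    calls.to_pickle(path)
    loaded = pd.read_pickle(path)

# Total time per named call, slowest first.
total_time = loaded.groupby("name")["time"].sum().sort_values(ascending=False)

# Top-level calls have no parent; their children reference their uuid.
top_level = loaded[loaded["parent"].isna()]
children = loaded[loaded["parent"] == top_level["uuid"].iloc[0]]

# Peak memory (in megabytes) for calls that carry memory samples.
peak_memory = loaded["memory"].map(
    lambda samples: max(mb for _, mb in samples) if isinstance(samples, list) else None
)
```

Grouping on name aggregates repeated calls of the same function, while parent and uuid let you rebuild the call tree that the icicle graph displays.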

Modifying the profiled functions

To include a function in the icicle time graph, simply add the @profile decorator to the function, providing a descriptive name.

If you also want to track memory usage over time for a specific function call, set memprof=True in the decorator. If the function runs too quickly (or too slowly) for the default memory sampling interval, change the interval with the interval parameter (in seconds).

from pandora.profiler import profile

@profile("my profiled function", memprof=True, interval=0.5)
def my_function():
    ...
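To give an intuition for how a decorator of this kind can produce the level, parent, uuid, time and call_time fields described earlier, here is a self-contained stand-in. It is not Pandora's implementation: profile_sketch, records and the internal stack are hypothetical names, and memory sampling is left out entirely.

```python
import time
import uuid
from functools import wraps

records = []  # one dict per completed call
_stack = []   # UUIDs of the calls currently running, outermost first


def profile_sketch(name):
    """Hypothetical stand-in for a profiling decorator (timing only)."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            call_id = str(uuid.uuid4())
            parent = _stack[-1] if _stack else None  # call running right now
            level = len(_stack)                      # depth in the call stack
            _stack.append(call_id)
            call_time = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - call_time
                _stack.pop()
                records.append(
                    {
                        "level": level,
                        "parent": parent,
                        "name": name,
                        "uuid": call_id,
                        "time": elapsed,
                        "call_time": call_time,
                    }
                )

        return wrapper

    return decorator


@profile_sketch("inner step")
def inner():
    return 41


@profile_sketch("outer step")
def outer():
    return inner() + 1


result = outer()
```

In this sketch the inner call finishes first, so it is recorded first, with level 1 and the outer call's UUID as its parent; the outer call is recorded afterwards with level 0 and no parent.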