Profiling

Pandora has a built-in profiling tool that gives more insight into the memory and time usage of each step in the pipeline.

Warning

Make sure to install Pandora with development dependencies included.

By default, two graphs are created:

an icicle graph, showing the time spent in each step of the pipeline
a plot, showing the memory consumption of Pandora at regular intervals during the execution

Configuration and parameters

Pandora’s profiling configuration works like a pipeline step, but is placed at the root of the config file:

profiling

Name	Description	Type	Default value	Required
save_graphs	Save the two default graphs	bool	False	No
save_raw_data	Save the raw profiling data of every call as a .pickle file	bool	False	No

Note

profiling can also be set directly to true or false instead of a dict; this sets every boolean option in its configuration to that value.
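The sketch below illustrates how the boolean shorthand expands into the dict form. normalize_profiling and PROFILING_DEFAULTS are hypothetical names chosen for this illustration, not part of Pandora's API; the default values are taken from the parameter table.

```python
# Hypothetical helper illustrating the documented behaviour of the
# "profiling" key. This is NOT Pandora's actual implementation.
PROFILING_DEFAULTS = {"save_graphs": False, "save_raw_data": False}


def normalize_profiling(value):
    """Expand a "profiling" value into its dict form.

    A bool sets every boolean option to that value; a dict is merged
    over the documented defaults (both options default to False).
    """
    if isinstance(value, bool):
        return {key: value for key in PROFILING_DEFAULTS}
    if isinstance(value, dict):
        return {**PROFILING_DEFAULTS, **value}
    raise TypeError(f"profiling must be a bool or a dict, got {type(value).__name__}")
```

For example, normalize_profiling(True) yields a dict with both save_graphs and save_raw_data set to True, while normalize_profiling({"save_graphs": True}) leaves save_raw_data at its default of False.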

Example configuration:

{
    "profiling": {
        "save_raw_data": true,
        "save_graphs": true
    },
    "input": {
        ...
    },
    "pipeline": {
        ...
    }
}

Saved profiling data

When save_raw_data is enabled, Pandora saves the profiling information as a .pickle file containing a pandas DataFrame with the following structure:

Name	Description
level	Depth of the function call in the profiling stack
parent	UUID of the “parent” call (the call that was running when this call was made)
name	Human-readable name given to the function call
uuid	Unique identifier of the function call
time	Time (in seconds) it took to execute the function
call_time	Timestamp (in seconds) at which the call was made
memory	Either None or a list of (timestamp, memory) tuples giving the memory consumption (in megabytes) at each sampled timestamp during the function execution
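Once saved, the file can be loaded with pandas.read_pickle and explored directly. The helpers below are a hypothetical sketch (format_call_tree and peak_memory are not part of Pandora) that relies only on the documented columns. format_call_tree indents each call by its level; sorting by call_time places each parent before its children, assuming single-threaded, properly nested calls. peak_memory reports the highest sampled value for each call that was memory-profiled.

```python
import pandas as pd


def format_call_tree(df: pd.DataFrame) -> list:
    """Return one line per call, indented by depth, with its duration."""
    base = df["level"].min()  # normalise depth so the shallowest calls are not indented
    ordered = df.sort_values("call_time")  # start order: parents precede their children
    return [
        f"{'  ' * int(row.level - base)}{row.name}: {row.time:.3f} s"
        for row in ordered.itertuples(index=False)
    ]


def peak_memory(df: pd.DataFrame) -> pd.DataFrame:
    """Peak sampled memory (MB) of every call that has memory samples."""
    rows = [
        (row.uuid, row.name, max(mem for _timestamp, mem in row.memory))
        for row in df.itertuples(index=False)
        # The memory column is None for calls profiled without memprof=True.
        if isinstance(row.memory, list) and row.memory
    ]
    return pd.DataFrame(rows, columns=["uuid", "name", "peak_mb"])
```

Typical use is df = pd.read_pickle(path) followed by print("\n".join(format_call_tree(df))), where path is the location of the saved .pickle file.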

Modifying the profiled functions

To include a function in the icicle time graph, add the @profile decorator to it, passing a descriptive name.

If you also want to track memory usage over time for a specific function call, set memprof=True in the decorator. If the function is too fast (or too slow) for the default memory sampling interval, change the sampling interval with the interval parameter (in seconds).

from pandora.profiler import profile

@profile("my profiled function", memprof=True, interval=0.5)
def my_function():
    ...