Command Line Interfaces

This module stores the Command-Line Interfaces (CLIs) exposed by the library as part of the installation process.

sl-crc

sl-crc [OPTIONS]

Options

-sp, --session_path <session_path>

Required The absolute path to the session whose raw behavior log data needs to be extracted into .feather files.

-id, --manager_id <manager_id>

Required The xxHash-64 hash value that represents the unique identifier for the process that manages this runtime. This is primarily used when calling this CLI on remote compute servers to ensure that only a single process can execute the CLI at a time.

Default:

0

-pdr, --processed_data_root <processed_data_root>

The absolute path to the directory where processed data from all projects is stored on the machine that runs this command. This argument is used when calling the CLI on the BioHPC server, which uses different data volumes for raw and processed data. Note that the input path must point to the root directory, as it is automatically extended to include the project name, the animal ID, and the session ID. Do not provide this argument if the processed and raw data roots are the same.

-l, --legacy

Determines whether the processed session is a modern Sun lab session or a ‘legacy’ Tyche project session. Do not provide this flag unless you are working with ‘ascended’ Tyche data.

Default:

False

-c, --create_processed_directories

Determines whether to create the processed data hierarchy. Typically, this flag only needs to be enabled when this command is called outside of the typical data processing pipeline used in the Sun lab. Usually, processed data directories are created at an earlier stage of data processing, if it is carried out on the remote compute server.

Default:

False

-um, --update_manifest

Determines whether to (re)generate the manifest file for the processed session’s project. This flag should always be enabled when this CLI is executed on the remote compute server(s) to ensure that the manifest file always reflects the current state of each project.
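As an illustration of how the options above combine, the following minimal sketch assembles an sl-crc argument list in Python. The helper name build_sl_crc_command, the example session path, and the example manager ID are hypothetical and are not part of the library; only the flag names are taken from the option list above. The sketch also assumes that -l, -c, and -um are plain boolean flags that take no value.

```python
import shlex
from typing import List, Optional


def build_sl_crc_command(
    session_path: str,
    manager_id: int,
    processed_data_root: Optional[str] = None,
    legacy: bool = False,
    create_processed_directories: bool = False,
    update_manifest: bool = False,
) -> List[str]:
    """Assembles the argument list for an sl-crc call (hypothetical helper)."""
    # The session path and the manager ID are marked as required by the CLI.
    command = ["sl-crc", "-sp", session_path, "-id", str(manager_id)]

    # Only pass the processed data root when it differs from the raw data root.
    if processed_data_root is not None:
        command += ["-pdr", processed_data_root]

    # Assumption: the remaining options are value-less boolean flags.
    if legacy:
        command.append("-l")
    if create_processed_directories:
        command.append("-c")
    if update_manifest:
        command.append("-um")
    return command


# Placeholder values for illustration only.
command = build_sl_crc_command(
    session_path="/data/raw/example_project/example_animal/example_session",
    manager_id=1234567890,
    update_manifest=True,
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where the library, and therefore the sl-crc executable, is installed.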

Modern Log Processing

This module contains the functions used to parse the log data generated by the current Sun lab data acquisition systems. Specifically, the tools provided by this module read the compressed .npz log files and extract the necessary data as .feather files, which are used during later data processing stages.

sl_behavior.log_processing.extract_log_data(session_data, manager_id, parallel_workers=7, update_manifest=False)

Reads the compressed .npz log files stored in the raw_data directory of the target session and extracts all relevant behavior data stored in these files into the processed_data directory.

This function is intended to run on the BioHPC server as part of the ‘general’ data processing pipeline. It is optimized to process all log files in parallel and extract the data stored inside the files into the behavior_data directory and camera_frames directory.

Parameters:
  • session_data (SessionData) – The SessionData instance for the processed session.

  • manager_id (int) – The xxHash-64 hash value that specifies the unique identifier of the manager process that manages the log processing runtime.

  • parallel_workers (int, default: 7) – The number of CPU cores (workers) to use for processing the data in parallel. Note, this number should not exceed the number of available log files.

  • update_manifest (bool, default: False) – Determines whether to update (regenerate) the project manifest file for the processed session’s project. This should always be enabled when working with remote compute server(s) to ensure that the project manifest file contains the most recent snapshot of the project’s state.

Return type:

None
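The note on parallel_workers above can be respected by counting the log files before calling the function. The following is a minimal sketch, not part of the library: the helper name choose_parallel_workers is hypothetical, and it assumes that all .npz log files sit directly inside a single directory, which may not match the actual raw_data layout.

```python
import tempfile
from pathlib import Path


def choose_parallel_workers(log_directory: Path, requested_workers: int = 7) -> int:
    """Caps the worker count at the number of .npz log files (hypothetical helper)."""
    # Assumption: the log files are stored flat inside log_directory.
    log_file_count = sum(1 for _ in log_directory.glob("*.npz"))
    # Always returns at least one worker, even if no log files are found.
    return max(1, min(requested_workers, log_file_count))


# Demonstrates the helper on a temporary directory with three placeholder files.
with tempfile.TemporaryDirectory() as temporary_directory:
    log_directory = Path(temporary_directory)
    for index in range(3):
        (log_directory / f"{index}_log.npz").touch()
    workers = choose_parallel_workers(log_directory)

print(workers)  # 3
```

The returned value could then be passed as the parallel_workers argument of extract_log_data, alongside a SessionData instance and a manager ID.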

Legacy Log Processing

This package contains the functions used to parse the log data generated by the legacy data acquisition pipeline used by the Tyche dataset. Specifically, the tools provided by this package read the .JSON log file generated by the GIMBL Unity library and extract the necessary data as .feather files, which are used during later data processing stages. In addition to parsing the behavior data, the package makes it compatible with modern Sun lab data processing pipelines.

sl_behavior.legacy.extract_gimbl_data(session_data, manager_id, update_manifest=False)

Reads and exports the data stored in the GIMBL .JSON file to individual .feather files.

This is a service function purpose-built for reanalyzing the legacy Tyche dataset. It should not be used with modern Sun lab data. Do not call this function unless you know what you are doing.

Parameters:
  • session_data (SessionData) – The SessionData instance for the session whose legacy log data needs to be processed.

  • manager_id (int) – The xxHash-64 hash value that specifies the unique identifier of the manager process that manages the log processing runtime.

  • update_manifest (bool, default: False) – Determines whether to update (regenerate) the project manifest file for the processed session’s project. This should always be enabled when working with remote compute server(s) to ensure that the project manifest file contains the most recent snapshot of the project’s state.

Return type:

None