Neuronal Fluorescence Signal Processing Pipeline (v0.1.0)
B3. Revised App for quantification of calcium and glutamate signaling data in cultured neurons (authors: Kirill Chesnov (Wernig lab) and Silvia Natale)
This pipeline was developed to analyze glutamate release at the synaptic level (measured by the iGluSnFr3 signal) in mouse and human induced neurons. The recording conditions required for the iGluSnFr3 signal are acquisition at 100 fps with 60x magnification. The pipeline also allows plotting of representative traces.
1. General description
This pipeline is based on Python code that analyzes iGluSnFr3 firing at the synaptic level over time as a measure of synaptic activity; its default values are specific to the iGluSnFr3 signal, but they can be changed for any type of imaging recording (e.g., calcium, glutamate, voltage). The pipeline can be found at https://github.com/chesnov/fluorescent_peak_analysis. It also allows plotting of representative traces. The pipeline automates several key steps:
- Preprocessing: Motion correction and extraction of Regions of Interest (ROIs) and their corresponding fluorescence traces using the CaImAn library.
- Signal Processing: Calculation of ΔF/F traces.
- Event Detection: Identification of fluorescence peaks (events) within each ROI trace.
- Analysis: Calculation of key metrics like peak amplitude, event frequency, and pairwise synchrony between ROIs.
- Aggregation & Visualization: Combines results across multiple experiments and conditions, performs statistical comparisons (ANOVA, Tukey HSD), and generates summary plots and data tables.
The primary entry point for users is the process_dataset function, which can be executed from the Jupyter notebook.
2. Prerequisites
- Anaconda: a working Anaconda installation is required; the environment creation manual is available on GitHub
- Input Data: Raw imaging data should be in .tif format.
3. Steps to get the analysis started
- The first step is to use VS Code to download the GitHub repository “fluorescent_peak_analysis” to the desktop. To do that, install VS Code and Git on your PC (if not already installed)
- Also, create a GitHub account if you do not already have one
- Install conda (download from the web: Anaconda Distribution > skip registration)
- To download the GitHub repository “fluorescent_peak_analysis” to the desktop and create the conda environment, write in the VS Code terminal:
- cd Desktop
- git clone https://github.com/chesnov/fluorescent_peak_analysis
- conda env create -n caiman --file environment.yaml
- Go to the “fluorescent_peak_analysis” folder on the desktop; there you will find the file “trace_feature_extract”, which is the notebook
- Select the kernel: Python > caiman
- In the notebook, run the import cell; it should load all the required packages. If some packages are missing, install them separately (an import-check sketch is shown after this list):
- pip install <name_of_the_package>
- In the Jupyter notebook section “Run full pipeline”, specify input_dir and output_dir
- In the output folder you will find analyses of amplitude, frequency and synchrony calculated per ROI, neuron and batch.
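The following is a minimal sketch of such an import check. The package list is an assumption drawn from this manual (e.g., caiman, pandas, yaml); the notebook's own import cell is authoritative.
# Hedged sketch: check that key packages import; adjust the list to match the notebook's import cell
import importlib

packages_to_check = ["caiman", "numpy", "pandas", "yaml", "matplotlib"]  # illustrative list, not exhaustive
missing = []
for pkg in packages_to_check:
    try:
        importlib.import_module(pkg)
    except ImportError:
        missing.append(pkg)

if missing:
    print("Install these packages with pip:", ", ".join(missing))
else:
    print("All checked packages are available.")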
4. Input Data Structure
The pipeline expects a specific directory structure for your input data.
<input_dir>/
│
├── configuration.yaml <-- Central configuration file for ALL experiments
│
├── Condition_A/ <-- Folder for experimental condition 'A'
│ ├── Experiment_01/ <-- Folder for experiment 1 under Condition A
│ │ └── movie_01.tif <-- Raw imaging data file
│ ├── Experiment_02/
│ │ └── movie_02.tif
│ └── ...
│
├── Condition_B/ <-- Folder for experimental condition 'B'
│ ├── Experiment_03/
│ │ └── data_03.tif <-- Filename can vary, must end in .tif
│ └── ...
│
└── ... <-- Other condition folders
- <input_dir>: The main directory containing all your data and the configuration file.
- configuration.yaml: A single YAML file located at the root of <input_dir>. This file contains all parameters for motion correction, source extraction, peak finding, and analysis used across all experiments processed in a single run. Crucially, you must edit this file to set the desired parameters before running the pipeline. Several sample parameter files are available on GitHub
- Condition_X/: Subdirectories named according to your experimental conditions (e.g., Control, TreatmentX).
- Experiment_Y/: Subdirectories within each condition folder, representing individual biological replicates or recordings.
- *.tif: The raw fluorescence movie file for each experiment. There should be exactly one .tif file per Experiment_Y folder.
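Before running, the layout above can be sanity-checked with a short script. This is a sketch under the assumptions stated in this section (configuration.yaml at the root, exactly one .tif per experiment folder); the input path is a placeholder.
# Sketch: verify the expected input layout
import os
import glob

input_dir = '/path/to/your/input_data'  # replace with your <input_dir>

assert os.path.isfile(os.path.join(input_dir, 'configuration.yaml')), 'configuration.yaml not found at the input root'

for condition in sorted(os.listdir(input_dir)):
    cond_path = os.path.join(input_dir, condition)
    if not os.path.isdir(cond_path):
        continue
    for experiment in sorted(os.listdir(cond_path)):
        exp_path = os.path.join(cond_path, experiment)
        if not os.path.isdir(exp_path):
            continue
        tifs = glob.glob(os.path.join(exp_path, '*.tif'))
        if len(tifs) != 1:
            print(f'WARNING: {exp_path} contains {len(tifs)} .tif files (expected exactly 1)')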
5. Configuration (configuration.yaml)
This file is central to the pipeline's operation. It uses the YAML format to define parameters for various stages. Key sections include:
- motion_corr: Parameters for CaImAn's motion correction (frame rate fr, decay time, max shifts, etc.).
- src_extr_deconv: Parameters for CaImAn's CNMF-E source extraction and deconvolution (expected number of neurons K, gaussian kernel size gSig, thresholds, background subtraction options, etc.).
- cnmfe_params: Parameters for evaluating and filtering detected components (minimum SNR, correlation threshold rval_thr, CNN classifier usage).
- roi_contours: Parameters for defining ROI boundaries (level, minimum area percentage_nonzero).
- peak_extraction: Parameters for the peak detection algorithm (window_size for smoothing, positive_peaks flag, thresholds based on noise standard deviations num_std_height, num_std_prominence, width_threshold, noise percentile k_percentile, ROI filtering num_noise_std_thresh, min_num_rois, include_silent_rois, analysis time window peak_finding_start, peak_finding_end, processing timeout).
- Action Required: Carefully review and adjust the parameters in configuration.yaml to match your experimental setup and analysis goals before running the pipeline. Sample yaml files contain detailed comments to help you set parameters correctly.
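As a quick check that the edited file parses, it can be loaded with PyYAML and inspected for the sections listed above. This is a sketch; the exact top-level key names should be verified against the sample files on GitHub, and the path is a placeholder.
# Sketch: load configuration.yaml and check that the expected top-level sections are present
import yaml

with open('/path/to/your/input_data/configuration.yaml') as f:  # replace with your path
    config = yaml.safe_load(f)

expected_sections = ['motion_corr', 'src_extr_deconv', 'cnmfe_params', 'roi_contours', 'peak_extraction']
for section in expected_sections:
    status = 'OK' if section in config else 'MISSING'
    print(f'{section}: {status}')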
6. Running the Pipeline
The pipeline is executed by calling the process_dataset function from the peak_extraction.py script. You typically do this within a Python script or a Jupyter notebook.
# Example Python script or Jupyter cell
from peak_extraction import process_dataset # Make sure peak_extraction.py is in your Python path or current directory
# Adjust utils import path if needed within peak_extraction.py/analyze_data.py
# Define the main input and output directories
input_directory = '/path/to/your/input_data' # Replace with the actual path to <input_dir>
output_directory = '/path/to/your/analysis_output' # Replace with the desired output path
# Run the entire processing and analysis pipeline
process_dataset(input_directory, output_directory)
- Replace /path/to/your/input_data with the full path to the directory containing your configuration.yaml and condition folders.
- Replace /path/to/your/analysis_output with the full path where you want the results to be saved. The pipeline will create this directory if it doesn't exist, along with necessary subdirectories mirroring the input structure.
- Ensure the utils package (containing analyze_data.py and caiman_wrapper.py) is accessible from where you run peak_extraction.py (a path-setup sketch follows this list).
- Execute the script. The pipeline will iterate through each valid experiment found in the input structure, process it, and finally aggregate and analyze the results. Progress bars will be displayed for some steps.
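If the notebook or script is started from a different working directory, the repository folder can be added to the Python path first. This is a sketch assuming the repository was cloned to the desktop as in section 3; adjust repo_dir to your actual location.
# Sketch: make the cloned repository importable from any working directory
import os
import sys

repo_dir = os.path.expanduser('~/Desktop/fluorescent_peak_analysis')  # assumed clone location; adjust as needed
if repo_dir not in sys.path:
    sys.path.insert(0, repo_dir)

from peak_extraction import process_dataset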
7. Output Description
The pipeline generates output files in two main locations: within individual experiment folders and aggregated results in the main output directory.
A. Individual Experiment Output (within <output_dir>/<Condition_X>/<Experiment_Y>/)
For each processed experiment (e.g., movie_01), the following files are typically generated:
- movie_01_DFF_traces.csv: Comma-separated values file containing the calculated ΔF/F traces for all retained ROIs (Rows=time points, Columns=ROIs).
- movie_01_roi_contours.pdf: A PDF showing the average fluorescence image with the contours of all retained ROIs overlaid and numbered.
- movie_01_experiment_df.csv: A detailed table listing every detected peak for every ROI. Columns include roi_id, peak_time (frame number), peak_absolute_amplitude (in ΔF/F units), and noise_level (estimated baseline noise for the ROI).
- movie_01_firing_frequency.csv: A table summarizing the firing characteristics for each ROI. Columns: roi_id, mean_peak_to_peak_distance[ms], mean_firing_frequency[Hz].
- movie_01_aggregated_corr.csv: A matrix (saved as CSV) representing the pairwise synchrony score between all ROIs for this experiment.
- movie_01_aggregated_corr.html / .pdf: Interactive (HTML) and static (PDF) heatmap visualization of the synchrony matrix.
- ROI_X_peaks.html / .pdf: For each ROI, plots showing the raw trace, smoothed trace, detected peaks, and the detection threshold. Useful for verifying peak detection quality.
- _noise_level_histogram.html / .pdf: A histogram showing the distribution of estimated noise levels across all ROIs in this experiment.
- movie_01_settings.yaml: A copy of the configuration used for this specific run, plus status information (status_message, retained_components, pipeline_version). Check status_message here first if an experiment fails or produces unexpected results. Messages like Full pipeline success, No components were extracted, No contours were extracted, Successful ROI extraction (but peak finding might fail later) indicate the processing outcome.
- Temporary CaImAn Files: Motion corrected movies (.mmap, .tif), etc., might be present depending on CaImAn settings and cleanup status.
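To plot a representative trace from these outputs, a short pandas/matplotlib sketch can be used. The file path is a placeholder, and the assumption that every column of the ΔF/F CSV is an ROI trace should be checked against your own file.
# Sketch: plot one representative ΔF/F trace from an individual experiment output
import pandas as pd
import matplotlib.pyplot as plt

dff = pd.read_csv('/path/to/your/analysis_output/Condition_A/Experiment_01/movie_01_DFF_traces.csv')

roi_column = dff.columns[0]  # pick any ROI column; skip the first column if it is a frame index
plt.figure(figsize=(10, 3))
plt.plot(dff[roi_column])
plt.xlabel('Frame')
plt.ylabel('ΔF/F')
plt.title(f'Representative trace: ROI {roi_column}')
plt.tight_layout()
plt.savefig('representative_trace.pdf')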
B. Aggregated Output (within <output_dir>/)
After processing all individual experiments, the pipeline aggregates the data and performs comparative analysis, saving results directly in the main <output_dir>:
- experiments_amplitude_df.csv: Combines _experiment_df.csv from all valid experiments (after filtering). Includes condition and experiment ID columns. Contains data for every detected peak.
- experiment_avg_peak_amplitudes.csv: Average peak amplitude calculated per experiment (averaging ROI averages). Used for statistical tests between conditions.
- roi_avg_peak_amplitudes.csv: Average peak amplitude calculated per ROI.
- rois_to_remove.csv: Lists ROIs (and experiments) excluded from the final analysis and the reason (e.g., high noise, experiment had too few ROIs after filtering, silent ROI).
- experiments_frequency_df.csv: Combines _firing_frequency.csv from all valid experiments/ROIs.
- experiment_avg_firing_frequency.csv: Average frequency and peak-to-peak distance calculated per experiment. Used for statistical tests.
- experiments_synchrony_df.csv: Combines pairwise synchrony data from all valid experiments/ROI pairs.
- mean_synchrony_df.csv: Average synchrony score calculated per experiment. Used for statistical tests.
- ANOVA_*.csv: Results of the One-Way ANOVA test comparing conditions for peak amplitude, frequency, and synchrony (based on per-experiment averages).
- Tukey_*.txt: Results of the Tukey HSD post-hoc test (if ANOVA was significant) detailing pairwise comparisons between conditions for amplitude, frequency, and synchrony.
- Peak_Absolute_Amplitude*.pdf: Plots visualizing peak amplitude distributions:
- _by_Group.pdf: Box plot of per-experiment averages, with individual experiment averages overlaid (swarm plot).
- _by_ROI.pdf: Violin plot of per-ROI averages.
- Peak_Absolute_Amplitude.pdf: Violin plot of all individual peak amplitudes.
- Mean_Firing_Frequency*.pdf / Mean_Peak_to_Peak*.pdf: Plots visualizing frequency/inter-peak interval distributions:
- _by_Experiment.pdf: Box plot of per-experiment averages, with individual experiment averages overlaid.
- Mean_Firing_Frequency.pdf / Mean_Peak_to_Peak.pdf: Violin plot of per-ROI averages.
- Mean_Synchrony_by_Group.pdf: Box plot of per-experiment average synchrony, with individual experiment averages overlaid.
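For a quick look at the aggregated tables and statistics, a pandas sketch such as the following can be used. The column name 'condition' is an assumption and should be checked against the actual CSV headers; the output path is a placeholder.
# Sketch: inspect aggregated per-experiment amplitudes and the ANOVA outputs
import glob
import pandas as pd

output_dir = '/path/to/your/analysis_output'  # replace with your <output_dir>

amplitudes = pd.read_csv(f'{output_dir}/experiment_avg_peak_amplitudes.csv')
print(amplitudes.head())
if 'condition' in amplitudes.columns:  # 'condition' is an assumed column name; check the CSV header
    print(amplitudes.groupby('condition').mean(numeric_only=True))

for anova_file in sorted(glob.glob(f'{output_dir}/ANOVA_*.csv')):
    print(anova_file)
    print(pd.read_csv(anova_file))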
8. Troubleshooting & Notes
- Check _settings.yaml: If an experiment seems missing from the final analysis or gave errors, check the status_message field in its individual output folder (<output_dir>/<Condition>/<Experiment>/<experiment_id>_settings.yaml).
- CaImAn Errors: The CaImAn steps (motion correction, source extraction) can sometimes fail depending on data quality and parameters. Check the console output for specific CaImAn error messages. Adjusting parameters in configuration.yaml (e.g., K, gSig, min_SNR, rval_thr) might be necessary.
- Timeout: The raw_data_to_df_f function has a timeout mechanism (default 300s, configurable in peak_extraction section of YAML). If CaImAn hangs, the process will be restarted. Persistent timeouts might indicate a deeper issue or need for more computational resources.
- ROI Filtering: ROIs can be excluded based on:
- Noise level deviating too far from the mean noise across all experiments (controlled by num_noise_std_thresh; a simplified sketch of this rule appears at the end of this section).
- Being "silent" (no peaks detected) if include_silent_rois is False.
- Experiment Filtering: Entire experiments can be excluded if, after ROI filtering, they have fewer ROIs than min_num_rois. The console output will list removed experiments.
- Parameter Tuning: Achieving good results often requires tuning the parameters in configuration.yaml, especially those related to CNMF-E (src_extr_deconv, cnmfe_params) and peak detection (peak_extraction). Use the individual experiment plots (_roi_contours.pdf, ROI_X_peaks.pdf) to guide tuning.
- Memory: Processing large datasets can be memory-intensive, particularly during the CaImAn steps. Ensure your system has sufficient RAM.
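For reference, a simplified sketch of the noise-based ROI exclusion rule described under "ROI Filtering". This illustrates the rule only and is not the pipeline's actual implementation; the function and variable names are hypothetical.
# Sketch: exclude ROIs whose noise level deviates too far from the mean noise across all experiments
import numpy as np

def filter_rois_by_noise(noise_levels, num_noise_std_thresh):
    """noise_levels: dict mapping roi_id to its estimated noise level (pooled across experiments)."""
    values = np.array(list(noise_levels.values()))
    mean_noise, std_noise = values.mean(), values.std()
    kept, removed = [], []
    for roi_id, noise in noise_levels.items():
        if abs(noise - mean_noise) > num_noise_std_thresh * std_noise:
            removed.append(roi_id)
        else:
            kept.append(roi_id)
    return kept, removed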