
MoSeq2 Extract Modeling Notebook Instructions


Overview

The MoSeq2 Extract Modeling Notebook has three parts: extraction, PCA, and AR-HMM modeling. These more detailed instructions complement the instructions in the notebook. The notebook should be run linearly.

The general process of using MoSeq is as follows:

  1. acquire data (depth video)
  2. organize data into a project-specific folder
  3. extract mouse from depth video
  4. aggregate data
  5. assign group/experimental labels to each session
  6. reduce dimensionality of data (PCA)
  7. model with AR-HMM
  8. analysis

Each time you acquire and extract more data for a project after going through steps 1-8, we generally recommend re-doing steps 4-8.

Note: Please use Chrome to interact with the Jupyter notebooks. The widgets are not supported in other browsers.

Project Setup

If you are using Conda and the environment name is moseq2-app, run conda activate moseq2-app to activate the environment. If you are using the Docker container, make sure your MoSeq container is running. The MoSeq2 package suite pins specific versions of Python packages, so you may see warnings when running certain cells; no further action is needed when these warnings appear. When running the Check MoSeq Versions cell, the following pandas warning can be ignored: pandaswarning

Files and Directory Structure

Currently Supported Depth File Extensions

We currently support .dat, .tar.gz, .avi and .mkv. You can read more about these depth data extensions here.

Directory Structure

Each MoSeq project is contained within a base directory. You can copy the MoSeq notebooks into the base directory to better organize the extraction, modeling, and analysis results. At this stage, the base directory should contain a separate subfolder for each depth recording session, as shown below (read more about it here):

.
└── <base_dir>/
    ├── session_1/
    │   ├── depth.dat
    │   ├── depth_ts.txt
    │   └── metadata.json
    ...
    └── session_n/
        ├── depth.dat
        ├── depth_ts.txt
        └── metadata.json

To run the notebooks, open a terminal, navigate to the directory containing the notebooks, and run jupyter notebook to start the Jupyter server. You can read more about how to run code in Jupyter notebooks in the official documentation.

During project setup, the notebook generates or restores variables from progress.yaml (more information) and config.yaml (more information), and downloads the pre-trained flip classifier model file for extraction if it doesn't exist yet. NOTE: if you upgrade MoSeq (either through pip/conda or through Docker), make sure you generate new config.yaml and session_config.yaml files. Any changes to parameter names or the addition of new parameters may otherwise cause errors. Simply rename your old config files to something like config-backup.yaml or config.yaml.backup so that you don't lose preset parameters, and MoSeq will generate new configs.
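If you prefer to back the files up programmatically rather than renaming them by hand, here is a minimal sketch (run from the base directory; the file names below are the defaults discussed above):

from pathlib import Path

# Hypothetical backup step: rename existing config files so that the notebook
# regenerates fresh ones the next time project setup is run.
for name in ("config.yaml", "session_config.yaml"):
    cfg = Path(name)
    if cfg.exists():
        backup = cfg.with_name(cfg.name + ".backup")  # e.g. config.yaml -> config.yaml.backup
        cfg.rename(backup)
        print(f"Backed up {name} to {backup}")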

Raw Data Extraction

There is no further action needed when you see the following warnings: sklearn1 sklearn2

Interactive Arena Detection Tool

Running this tool is optional but recommended. If the Interactive Arena Detection Tool is not run, the default parameters in the config.yaml file will be used in the extraction step. You can find the file structure after running this tool here.

This tool allows you to visualize how changing the parameters used in the extraction step affects the detection of the arena floor and mouse extraction. Once you have settled upon a combination of parameters that produce a good arena mask and extraction, you can save them to be used for this session during the actual extraction step.

interactive arena

Computing the arena mask

  • Run the cell to initialize the Arena detection widget. The cell renders a control panel to configure parameters for detecting the arena.
  • By default the widget selects the first unextracted session folder in your dataset, sorted alphanumerically.

The most common parameters are displayed in the arena mask section.

  • Adjust the depth range for detecting the floor of the arena.
  • Adjust the dilation iterations to expand or shrink the floor mask.
  • Once finished adjusting parameters, click the Compute arena mask button to apply any new parameter changes and (re)compute the mask.

If adjusting the common parameters didn't produce a satisfactory arena mask, you can display advanced parameters by clicking the "Show advanced arena mask parameters" checkbox.

Adjust these parameters until the desired arena mask is produced. Note: the arena mask algorithm produces a selection of candidate arena masks, and sorts them based on mask size, shape, and distance to the center of the arena. If the desired mask isn't the first mask that is displayed, try cycling through some of the other candidate masks, using the Mask Index parameter.

Computing a test extraction

If you like the arena mask, click the Compute extraction button to test extraction on a subset of the data. Use the Display frame slider to scroll through extracted frames from the test extraction.

Generally, the default parameters work well, but if you need to further adjust extraction parameters, click the Show advanced extraction parameters checkbox to reveal the other parameters you can fine-tune.

Saving parameters

Once you are satisfied with the extraction, use the Save parameters... buttons to save this session's parameters and, optionally, move on to the next session. Your session-specific parameter settings are saved into the session_config.yaml file and used during the extraction step.

  • Clicking the Save session parameters button writes the session-specific parameters to the session_config.yaml file only. The widget will remain on the current session.
  • Clicking the Save session parameters and move to next button writes the session-specific parameters to the session_config.yaml file, moves on to the next session, and computes the arena mask and sample extraction for the new session using the current parameters.

Note: selecting a different session does not automatically compute the arena mask or extraction. You still must click either Compute arena mask or Compute extraction.

Version update notes

  • This widget has had multiple important upgrades and updates. Please check the version number of moseq2-app to see if you are using the most up-to-date version.
  • All versions before v1.0.2beta: the widget is called "Interactive ROI Detection Tool". All versions of this widget called "Interactive ROI Detection Tool" are deprecated and no longer supported. If you are using a version of the MoSeq Extraction and Modeling Notebook that says "Interactive ROI Detection Tool", please update the MoSeq2 package suite to the latest version.
  • Versions v1.1.0beta to v1.1.1: the widget is called "Interactive Arena Detection Tool". These versions don't automatically add the session_config.yaml file path to config.yaml, so the session-specific extraction parameters are not used in the extraction step. Please update your MoSeq2 package suite. Alternatively, you can manually add session_config_path: <path to your session_config.yaml> to the config.yaml file. A partial config.yaml example is shown below, and a sketch of how to patch the file programmatically follows this list:
...
pixel_format: gray16le
prefix: ''
progress_bar: false
recompute_bg: false
session_config_path: ./session_config.yaml
skip_completed: false
spatial_filter_size: [3]
tail_filter_iters: 1
tail_filter_shape: ellipse
...
  • Version v1.1.2: this bug is fixed so that the session-specific parameters are used in the extraction step.
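For the affected versions (v1.1.0beta to v1.1.1), if you would rather patch config.yaml programmatically than edit it by hand, here is a minimal sketch, assuming the pyyaml package is available in your MoSeq environment and both files live in the base directory:

import yaml

# Add the session config path to config.yaml so session-specific extraction
# parameters are picked up by the extraction step.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

config["session_config_path"] = "./session_config.yaml"  # adjust if your file lives elsewhere

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)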

Extract sessions

When data is extracted locally, each extraction runs serially and takes a while: a 30-minute session typically takes about 10 minutes to extract using default settings, so plan your time accordingly.

However, if you are running MoSeq from a computing cluster that runs Slurm, you can extract sessions in parallel.

Extracting sessions in parallel with Slurm

This section only applies to users running MoSeq on a computing cluster with Slurm.

To extract data using Slurm, modify the following keys within the config_data dictionary:

  • cluster_type (default 'local'): either "local" or "slurm". If set to "slurm", MoSeq submits extraction jobs to Slurm for parallel execution.
  • partition (default 'short'): the Slurm partition to submit your jobs to. This is specific to the Slurm cluster you're using.
  • memory (default '5GB'): the amount of memory to request for each extraction job.
  • ncpus (default 1): the number of CPUs to request for each extraction job. 1 CPU is enough because the extraction runs in a single process.
  • wall_time (default '3:00:00'): the amount of time you expect the extraction job to run. Request a bit more than the actual extraction time as a buffer.
  • run_cmd (default False): flag used to automatically submit the Slurm jobs. Set to True to submit jobs from within the notebook.
  • prefix (default ''): a command to run before starting the extraction. For example, if MoSeq is installed in a specific conda environment, you should activate it first; in this case, prefix could be set to 'source ~/.bashrc; conda activate moseq2-app;'.
  • extract_out_script (default 'extract_out.sh'): file name used to store the actual extraction commands.

The partition, memory, ncpus, and wall_time keys are passed directly to Slurm, so they should be formatted accordingly. See the Slurm documentation for more information about formatting.
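Putting the keys above together, here is a hedged sketch of how they might be set in the notebook's config_data dictionary before running the extraction cell; the partition name, environment name, and resource values are placeholders for your cluster's setup.

# Illustrative Slurm settings for parallel extraction; every value here is an example, not a requirement.
config_data = {}  # already defined in the notebook; created here only to keep the sketch self-contained

config_data.update({
    "cluster_type": "slurm",          # submit extraction jobs to Slurm instead of running locally
    "partition": "short",             # replace with a partition that exists on your cluster
    "memory": "5GB",                  # memory requested per extraction job
    "ncpus": 1,                       # extraction runs in a single process, so 1 CPU is enough
    "wall_time": "3:00:00",           # request a bit more than the expected extraction time
    "run_cmd": True,                  # submit the jobs directly from the notebook
    "prefix": "source ~/.bashrc; conda activate moseq2-app;",  # environment setup before each job
    "extract_out_script": "extract_out.sh",  # file where the generated extraction commands are written
})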

Run Extraction Validation Tests

Below we show an example of the extraction validation test. extractionvalidation

Warning Interpretations:

  • stationary: The session has over 5% of the total frames where the mouse appears to be stationary. MoSeq models mouse pose dynamics so if the mouse stays stationary for a prolonged period, the model may not yield accurate results. You may want to further examine the extracted data and/or exclude the session(s) in the modeling step.
  • missing: The session has over 5% of the total frames where the mouse appears missing. MoSeq may still yield accurate results but we recommend that you examine the extracted data and change the parameters to re-extract the data.
  • size_anomaly: The session has over 5% of the total frames where the mouse area is more than 2 standard deviations below the mean mouse area.
  • dropped_frames: The session has over 5% of the total frames dropped. We recommend going through the extracted data to see if all the dropped frames are clustered together or scattered throughout the session. If all the dropped frames are clustered together, you may want to split the session into two to exclude the dropped frames.
  • corrupted: The session has over 5% of the total frames that have no extracted values. We recommend going through the extracted data to see if all the corrupted frames are clustered together or scattered throughout the session. If all the corrupted frames are clustered together, you may want to split the session into two to exclude the corrupted frames.
  • scalar_anomaly: Average scalar outliers are detected using the elliptic envelope algorithm with 10% contamination. A session flagged with scalar_anomaly means that its average scalar values, such as velocity and height, do not fall within the fitted elliptic envelope that contains 90% of the data (see the sketch below this list).
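To make the scalar_anomaly flag more concrete, below is a minimal sketch of how an elliptic envelope with 10% contamination flags outlier sessions. It assumes scikit-learn is available and uses made-up data; it illustrates the algorithm rather than reproducing the notebook's internal code.

import numpy as np
from sklearn.covariance import EllipticEnvelope

# Rows are sessions, columns are per-session average scalars (e.g. velocity, height, area).
rng = np.random.default_rng(0)
session_scalars = rng.normal(size=(30, 3))  # placeholder data standing in for real session averages

# Fit an elliptic envelope that treats roughly 10% of sessions as potential outliers.
detector = EllipticEnvelope(contamination=0.1, random_state=0)
labels = detector.fit_predict(session_scalars)  # 1 = inlier, -1 = flagged (scalar_anomaly)

flagged_sessions = np.where(labels == -1)[0]
print("Sessions flagged as scalar anomalies:", flagged_sessions)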

Please note that these warnings don't necessarily mean something is wrong with your data; they are meant as a guide to potential issues. It is at your discretion whether to exclude any sessions flagged by this module. You should visually inspect the sessions with the Review Extraction widget and look at the outliers in the Scalar Summary widget to learn more. Check whether the warnings, extraction results, and scalar outliers match your expectations.

[OPTIONAL] Review Extraction Output

review extraction

In addition to the extraction validation tests, visual inspection of the extracted videos can be useful to identify problems. This interactive tool shows extraction videos for you to review the extracted output. You can find examples of a good and a bad extraction below. You can find the file structure after data extraction here.

Instructions:

  • Run the cell to launch the interactive widget to review extraction output.
  • Select a session from the Session dropdown menu to preview its extraction output.
  • Change the Playback Speed slider to speed up or slow down the video.

Note that this widget can take some time to load and render because it must manipulate relatively large movies.

Examples of a good extraction and a bad extraction

The video below is an example of a good extraction: if everything worked, your extraction movie should look similar (within reason). In this example, the mouse is located in the center of the frame and its head always points right. The noise from the bucket wall is modest and does not create a reflection of the mouse.

In some cases, only the first frame of a session has a lot of speckle noise that goes away in subsequent frames; this is still considered a reasonably good extraction.

The video below is an example of a bad extraction. The mouse's head is not always pointing right, the reflection of the mouse is visible in many frames and the mouse is not always located in the center of the frame.

Aggregate extracted results

Once all of your raw data recordings have been extracted and are of good quality, consolidate all the extraction output files into a single folder called aggregate_results/ and generate moseq2-index.yaml (more information); this keeps all the training data in one place. You can find the file structure after aggregating results here.

If you add new sessions after running this step or any of the future steps, and you want to include them in your analyses, you will need to:

  • re-aggregate your data
  • re-run PCA
  • train a new model

NOTE: when aggregating data on this step, the moseq2-index.yaml file is re-generated if it is already present. If you have previously run the PCA step, you will need to re-supply the path to your pca scores file for use in future steps.

Assign Groups

This tool lets you specify groups for the sessions interactively. The group field in moseq2-index.yaml stores the group labels, so sessions can be grouped by experimental design for downstream analysis; group labels in moseq2-index.yaml can be used in analyses comparing different cohorts or experimental conditions. Initially, all sessions are labeled "default", and the Group Setter tool below is used to assign group labels to sessions. This step requires that all your sessions have a metadata.json file containing a session name. Specify Groups

Instructions

  • Run the cell to launch the Group Setter tool.
  • Click on a column name to sort the table by values in the column.
  • Click the filter button to filter the values in a column.
  • Click on a session to select it. To select multiple sessions, click sessions while holding the CTRL/COMMAND key, or click the first and last entry while holding the SHIFT key.
  • Enter the group name in the text field and click Set Group to update the group column for the selected sessions.
  • Click the Update Index File button to save current group assignments.

Scalar Summary for Further Extraction Diagnostics

Scalar Summary

Running the Further Extraction Diagnostics tool is optional. It plots scalar values such as mouse size and speed so that you can interactively identify sessions that may be outliers.

Scalar Summary

If you are modeling more than one group, mouse sizes must be similar. We typically use the area_mm scalar to inspect the distribution of mouse sizes in our dataset. Additionally, you can look at the length, width, and height of the mouse across sessions and those metrics should also be similar. Otherwise, the data should be preprocessed before the modeling step.

If you see sessions with very low average velocity, you may want to further examine the extracted data and/or exclude the sessions in the modeling step. This only becomes an issue during the modeling step if you see the following error: numpy.linalg.LinAlgError: Matrix is not positive definite. This error is described in more detail below.

There are two columns of plots: the left column plots the mean of each scalar value, and the right column plots its standard deviation. When looking for outliers, focus on the mean column. The example above shows a typical distribution of 2D velocity and mouse area values.

If outlier sessions do exist, you can review the extraction video using the Preview Extractions tool and check for any irregularities that could indicate the session needs to be re-extracted or discarded (due to different forms of corruption).

UUID Lookup

If you want to re-extract or exclude the outlier sessions and you are not able to locate the session with the information provided in the tool above, you can use the UUID Lookup tool to look up additional information, such as file paths, session name, and subject name for a particular data point in the Scalar Summary.

UUID Look Up

PCA

Fitting PCA

Fit PCA to your extracted data to determine the principal components (PCs) that explain the largest possible variance in your dataset. The PCs should look smooth and well-defined and should explain >90% of the variance in the dataset with 10 PCs. This is important because we want to distill most of the pose information present in our depth videos into a low-dimensional representation while also removing unwanted noise.
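As a quick way to reason about that 90% target, here is a minimal sketch of the cumulative-variance arithmetic; the per-component numbers are invented for illustration and would come from your own PCA output in practice.

import numpy as np

# Hypothetical explained-variance ratios for the first 10 principal components.
explained_variance_ratio = np.array(
    [0.45, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.02, 0.015, 0.01]
)

cumulative = np.cumsum(explained_variance_ratio)
print(f"Variance explained by the first 10 PCs: {cumulative[-1]:.1%}")  # aim for > 90%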

Note: sometimes dask (the package we use for fitting PCA) will terminate unexpectedly during the PC fitting steps. This happens more frequently when any of the following conditions are met:

  • running PCA with very large datasets
  • using a lot of workers when running PCA locally. This isn't an issue when requesting workers on a SLURM cluster.
  • running PCA on your local computer
  • running PCA on a computer with low memory specifications

If you encounter these issues locally, try running PCA again after adjusting some of your resource settings.

PCA Parameters

  • config_data['overwrite_pca'] is a boolean that specifies whether we overwrite the existing PCA results. config_data['overwrite_pca'] = False means no overwriting.
  • config_data['gaussfilter_space'] is a tuple that specifies the kernel standard deviations along the horizontal and vertical directions for Gaussian filtering. config_data['gaussfilter_space'] = (1.5, 1) means the horizontal kernel standard deviation is 1.5 and the vertical kernel standard deviation is 1.
  • config_data['medfilter_space'] specifies the kernel size for the median filter applied within each frame. config_data['medfilter_space'] = [0] means no spatial median filtering is applied.
  • config_data['medfilter_time'] specifies the kernel size for the median filter applied across frames. config_data['medfilter_time'] = [0] means no temporal median filtering is applied.
  • config_data['missing_data'] is a boolean that specifies whether missing-data PCA is used. config_data['missing_data'] = True means an iterative method, missing-data PCA, is used to reconstruct the PCs for datasets with missing/dropped frames or occluded portions, such as mice with head-attached cables.
  • config_data['missing_data_iters'] is the number of times to iterate over missing data during PCA. config_data['missing_data_iters'] = 10 means there are 10 iterations.
  • config_data['recon_pcs'] is the number of PCs to use for missing-data reconstruction. config_data['recon_pcs'] = 10 means 10 PCs are used.
  • config_data['dask_port'] is the port the Dask diagnostics dashboard is served on. config_data['dask_port'] = '8787' means the port number is 8787; we recommend leaving this parameter unchanged.
  • config_data['nworkers'] is the number of workers for computing PCA. config_data['nworkers'] = 1 means 1 worker is used; we recommend 1 worker when running PCA locally. When running PCA on Slurm, the number of workers is the number of spawned Slurm jobs. A consolidated example follows this list.
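As a consolidated reference, here is a hedged example of how these settings might be assigned in config_data; the values are taken from the examples in the list above and are illustrative rather than recommendations for every dataset.

# Illustrative PCA settings; adjust to your data and hardware.
config_data = {}  # already defined in the notebook; created here only to keep the sketch self-contained

config_data["overwrite_pca"] = False         # keep existing PCA results
config_data["gaussfilter_space"] = (1.5, 1)  # Gaussian kernel standard deviations (horizontal, vertical)
config_data["medfilter_space"] = [0]         # no spatial median filtering
config_data["medfilter_time"] = [0]          # no temporal median filtering
config_data["missing_data"] = False          # set to True for recordings with dropped frames or cables
config_data["missing_data_iters"] = 10       # iterations for missing-data PCA (used when missing_data is True)
config_data["recon_pcs"] = 10                # PCs used for missing-data reconstruction
config_data["dask_port"] = "8787"            # Dask diagnostics dashboard port; usually left unchanged
config_data["nworkers"] = 1                  # 1 worker recommended locally; on Slurm this is the number of jobs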

PCA Slurm parameters

Refer to the Slurm parameters in the extraction section, as they similarly apply here. The only difference is that the nworkers parameter refers to either the number of CPUs used locally or the number of batched Slurm jobs when used via Slurm.

Note: You may see warning messages that say: distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: X GB -- Worker memory limit: Y GB. This message doesn't mean there is an error and you can ignore the warning message as long as the kernel is still running.

PCA is a resource-intensive step and in some cases, the kernel in Jupyter may die before the PCA step finishes. If you continue to experience a dead kernel after adjusting your resources as described in the wiki, please run the Fitting PCA step and Computing Principal Component Scores step in the CLI. Find the CLI instructions in the wiki here.

pca warning

Below we show a typical set of PCs MoSeq2 learns. PCA Result Visualization

In this example, 25 PCs are concatenated and plotted together. Pixels colored white indicate large positive component weights, while black indicates large negative component weights. Principal components are ordered from left to right, top to bottom, in terms of how much variance each explains. If you want to learn more about PCA, check out this useful link.

pca scree plot This plot (called a scree plot) shows the cumulative variance explained by the PCs shown in the image above. As noted above, we have achieved our desired goal of explaining >90% of the variance in the dataset with 10 PCs.

If this isn't the case, check out our PCA troubleshooting part in the Wiki for more information. You can configure the parameter for the PCA step by modifying these parameters: PCA Parameter

You can find which files are added after the PCA step here.

Computing Model-Free Changepoints

This step can be used to determine a target syllable duration for the modeling step. Computing model-free changepoints is optional if you don't plan to use it as a reference for finding the model that best matches the changepoints. In a nutshell, changepoints describe moments in time when one motif of action switches into another, as determined by a model-free algorithm. Refer to our first MoSeq publication to learn more about changepoints.

Model-free Changepoints

The image above shows the distribution of changepoint durations computed across a dataset. Typically the distribution is smooth, left-skewed, and peaks around 0.3 seconds, and we typically observe a median changepoint duration between 0.3-0.4 seconds. If this is not the case, check out our model-free changepoints problem in the Wiki for more information.

After computing model-free changepoints, changepoints_dist.png will be added to _pca. You can find the file structure after running Compute model-free changepoints here.

Note: the default parameters are configured for C57 mouse data, and have not been tested for other strains and species.

AR-HMM Modelling

Fitting AR-HMM Model

Fitting the AR-HMM typically requires adjusting the kappa hyperparameter to achieve a target syllable duration (higher values of kappa lead to longer syllable durations). The target duration can be determined using changepoint analysis or set heuristically to 0.3-0.4 seconds based on work from our lab. In the code below, set kappa to 'scan' to run a family of models with different kappa values and use the "Get Best Model Fit" cell to pick a value automatically based on the changepoint duration distribution computed on your data. We recommend fitting for 100-200 iterations when picking kappa. You can find more information on scanning kappa and best practices for running models in the analysis tips.

For final model fitting, set kappa to the chosen value and fit for ~1000 iterations. We recommend that you run ~100 models and use the "Get Best Model Fit" cell to pick the model that best fits the PC changepoints. You can configure the modeling step by modifying the parameters in the Model Parameters block of code in the AR-HMM Modeling section. We describe the parameters in detail below, and a consolidated example follows the list:

Most Relevant Model Parameters

  • config_data['checkpoint_freq'] sets the model-saving frequency (in iterations). config_data['checkpoint_freq'] = -1 means no model checkpoints are saved; if config_data['checkpoint_freq'] is set to a positive integer, for example config_data['checkpoint_freq'] = 50, a model checkpoint is saved every 50 iterations and a checkpoints folder containing the checkpointed models is created.

  • config_data['use_checkpoint'] is a boolean that specifies whether the program resumes training from the latest saved checkpoint. config_data['use_checkpoint'] = False means the training doesn't resume from saved checkpoints.

  • config_data['npcs'] is the number of PCs used to represent the pose dynamics; these PCs are the observations used to train the AR-HMM. In our lab, we tend to set config_data['npcs'] = 10 to include the top 10 PCs. The PCs included should together explain roughly 90% or more of the variance.

  • config_data['max_states'] specifies the maximum number of states (syllables) the AR-HMM can end up with. In our lab, we tend to set config_data['max_states'] = 100.

  • config_data['robust'] is a boolean that specifies whether the noise in the AR-HMM is sampled from a t-distribution. config_data['robust'] = True means the modeling step uses a robust AR-HMM with t-distributed noise; the robust AR-HMM can tolerate more noise and yields fewer syllables. config_data['robust'] = False means the modeling step uses a non-robust AR-HMM with Gaussian (normal) noise, which yields more syllables than the robust AR-HMM. In our lab, we tend to set config_data['robust'] = True in exploratory analysis because fewer syllables are useful for gauging general behavioral patterns.

  • config_data['separate_trans'] is a boolean that specifies whether the groups are modeled with separate transition matrices. config_data['separate_trans'] = False means all groups share one transition matrix for the hidden states (syllables), and config_data['separate_trans'] = True means each group has its own transition matrix for the hidden states. Setting config_data['separate_trans'] = True decreases the amount of data that goes into producing each transition matrix, so we recommend using it only when the dataset is large. In our lab, we tend to set config_data['separate_trans'] = False because the model parameters are more accurate when there is more data.

  • config_data['num_iter'] sets the number of iterations used to train the model (the number of Gibbs sampling iterations). config_data['num_iter'] = 100 means the model parameters are updated through 100 iterations of Gibbs sampling. A higher number of iterations yields more accurate models but increases the computation cost. In our lab, we tend to set config_data['num_iter'] = 100 in the exploratory step and config_data['num_iter'] = 1000 when we are satisfied with the model parameters and move on to in-depth behavioral analysis.

  • config_data['kappa'] specifies the kappa setting for the AR-HMM. The kappa setting affects the prior on the duration of the learned states (syllables); a higher kappa value results in longer average syllable durations. If config_data['kappa'] = None, kappa is automatically set to the total number of frames. If config_data['kappa'] = 'scan', models with different kappa values are trained and you can use the Find best model fit cell to choose among them. You can find more information about kappa scans in the documentation here.

  • config_data['select_groups'] is a boolean that specifies whether only specific groups are modeled. config_data['select_groups'] = False means all data in moseq2-index.yaml are modeled.

  • config_data['cluster_type'] specifies the platform the modeling process runs on. config_data['cluster_type'] = 'local' means the training runs locally. config_data['cluster_type'] = 'slurm' is also supported.

  • config_data['ncpus'] specifies the number of CPUs used for model training. config_data['ncpus'] = 0 means all available CPU cores will be used.
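Here is a consolidated, hedged example of an exploratory configuration using the parameters above; the values mirror the tendencies described in this section and are illustrative, not universal defaults.

# Illustrative AR-HMM settings for an exploratory run; tune these for your dataset.
config_data = {}  # already defined in the notebook; created here only to keep the sketch self-contained

config_data["checkpoint_freq"] = -1    # don't save intermediate checkpoints
config_data["use_checkpoint"] = False  # start training from scratch
config_data["npcs"] = 10               # top 10 PCs, assuming they explain ~90% of the variance
config_data["max_states"] = 100        # upper bound on the number of syllables
config_data["robust"] = True           # t-distributed noise (robust AR-HMM)
config_data["separate_trans"] = False  # one shared transition matrix across groups
config_data["num_iter"] = 100          # short exploratory run; ~1000 iterations for the final model
config_data["kappa"] = "scan"          # scan kappa values, then refit with the chosen value
config_data["select_groups"] = False   # model every session in moseq2-index.yaml
config_data["cluster_type"] = "local"  # or "slurm" when running on a cluster
config_data["ncpus"] = 0               # 0 means use all available CPU cores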

Advanced model parameters - hold out parameters

  • config_data['hold_out'] is a boolean that specifies whether a subset of data is held out during the training process. config_data['hold_out'] = False means no subset of data is held out during training.

  • config_data['nfolds'] specifies the number of folds to hold out during training if config_data['hold_out'] = True. config_data['nfolds'] = 2 means 2 folds are held out during training.

  • config_data['percent_split'] specifies the training-validation split percentage when config_data['hold_out'] = False. config_data['percent_split'] = 0 means no data is reserved for validation during training.

Advanced model parameters - Kappa scan parameters

Effective when config_data['kappa'] == 'scan'

  • config_data['scan_scale'] specifies the scale used to space the scanned kappa values. config_data['scan_scale'] = 'log' means the kappa values are log-spaced and config_data['scan_scale'] = 'linear' means the kappa values are linearly spaced (see the sketch after this list).
  • config_data['min_kappa'] specifies the minimum kappa value. config_data['min_kappa'] = None means the minimum kappa value is automatically set with respect to the total number of frames. config_data['min_kappa'] = 1000 means the minimum kappa value is 1,000.
  • config_data['max_kappa'] specifies the maximum kappa value. config_data['max_kappa'] = None means the maximum kappa value is automatically set with respect to the total number of frames. config_data['max_kappa'] = 1000000 means the maximum kappa value is 1,000,000.
  • config_data['n_models'] specifies the total number of models to scan through. config_data['n_models'] = 15 means 15 models with different kappa values will be run.
  • config_data['out_script'] specifies the name of the file where the kappa scan learn-model commands are saved. config_data['out_script'] = 'train_out.sh' means the commands are saved to train_out.sh in the base directory.
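As an illustration of what the scan_scale setting means, here is a minimal numpy sketch of log- versus linearly spaced kappa values for a scan; this is simple arithmetic for intuition, not the package's internal code, and the bounds are the example values from the list above.

import numpy as np

min_kappa, max_kappa, n_models = 1e3, 1e6, 15  # example values from the parameters above

# scan_scale = 'log': kappa values evenly spaced in log space
log_kappas = np.logspace(np.log10(min_kappa), np.log10(max_kappa), n_models)

# scan_scale = 'linear': kappa values evenly spaced on a linear scale
linear_kappas = np.linspace(min_kappa, max_kappa, n_models)

print("log-spaced:     ", np.round(log_kappas, 1))
print("linearly spaced:", np.round(linear_kappas, 1))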

Advanced model parameters - scaling parameter for Hierarchical Dirichlet Process

alpha and gamma control the rate of dispersion for data in the Hierarchical Dirichlet Process. We recommend leaving these two parameters alone, so their details are not discussed here. In our lab, we tend to set config_data['alpha'] = 5.7 and config_data['gamma'] = 1e3.


SLURM PARAMETERS

Effective when config_data['kappa'] == 'scan' and config_data['cluster_type'] == 'slurm'. Only edit these parameters if cluster_type == 'slurm'.

  • config_data['prefix'] specifies a prefix command that activates the conda environment. config_data['prefix'] = 'conda activate moseq2-app; ' activates a conda environment named moseq2-app.

  • config_data['memory'] specifies the memory to be requested. config_data['memory'] = '16GB' requests 16GB of memory for each model training job.

  • config_data['wall_time'] specifies the requested time of the process. config_data['wall_time'] = '3:00:00' requests 3 hours of computing time.

  • config_data['partition'] specifies the name of the partition. config_data['partition'] = 'short' means the "short" partition is used.

  • config_data['run_cmd'] is a boolean that specifies whether the kappa scan script is run automatically via os.system(...). config_data['run_cmd'] = False means you run the script manually.


Note: if loading a model checkpoint, ensure the modeling parameters (especially the selected groups) are identical to that of the checkpoint. Otherwise, the model will fail.

You can find the file structure after fitting the AR-HMM model here.

Potential errors running the model

numpy.linalg.LinAlgError: Matrix is not positive definite: this error can sometimes occur if a mouse is too still in some of the sessions, or if it isn't detected in a significant number of frames, so that the AR-HMM is being fit to sections of data that show no change.

Continue to the analysis notebook

Congrats, you've just completed the extraction and modeling steps of the MoSeq pipeline!

Once you have extracted and modeled your data, move on to the MoSeq2 Analysis Visualization Notebook and read instructions on the wiki to analyze and interpret your results.
