merlin.core package

Submodules

merlin.core.analysistask module

exception merlin.core.analysistask.AnalysisAlreadyStartedException[source]

Bases: Exception

exception merlin.core.analysistask.AnalysisAlreadyExistsException[source]

Bases: Exception

exception merlin.core.analysistask.InvalidParameterException[source]

Bases: Exception

class merlin.core.analysistask.AnalysisTask(dataSet, parameters=None, analysisName=None)[source]

Bases: abc.ABC

An abstract class for performing analysis on a DataSet. Subclasses should implement the analysis to perform in the run_analysis() function.

save(overwrite=False) → None[source]

Save a copy of this AnalysisTask into the data set.

Parameters

overwrite – flag indicating if an existing analysis task with the same name as this analysis task should be overwritten even if the specified parameters are different.

Raises

AnalysisAlreadyExistsException – if an analysis task with the same name as this analysis task already exists in the data set with different parameters.

run(overwrite=True) → None[source]

Run this AnalysisTask.

Upon completion of the analysis, this function informs the DataSet that analysis is complete.

Parameters

overwrite – flag indicating if previous analysis from this analysis task should be overwritten.

Raises

AnalysisAlreadyStartedException – if this analysis task is currently already running or if overwrite is not True and this analysis task has already completed or exited with an error.
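The conditions under which run() refuses to start can be sketched as a small guard function. This is a minimal illustration of the rule stated above, not the merlin implementation: check_can_run and its boolean arguments are hypothetical, and the exception class is a local stand-in so the example runs without merlin installed.

```python
class AnalysisAlreadyStartedException(Exception):
    """Local stand-in for the documented exception of the same name."""


def check_can_run(is_running: bool, is_complete: bool, is_error: bool,
                  overwrite: bool = True) -> None:
    """Raise if the documented conditions for refusing to run are met.

    Hypothetical helper: the real class queries its own status rather
    than taking flags as arguments.
    """
    # A task that is currently running is never started a second time.
    if is_running:
        raise AnalysisAlreadyStartedException("task is currently running")
    # A finished task (success or error) is only rerun when overwrite is True.
    if not overwrite and (is_complete or is_error):
        raise AnalysisAlreadyStartedException(
            "task already finished and overwrite is not True")
```

With overwrite left at its default of True, a completed task passes the guard; with overwrite=False the same task raises.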

abstract get_estimated_memory() → float[source]

Get an estimate of how much memory is required for this AnalysisTask.

Returns

a memory estimate in megabytes.

abstract get_estimated_time() → float[source]

Get an estimate for the amount of time required to complete this AnalysisTask.

Returns

a time estimate in minutes.

abstract get_dependencies() → List[str][source]

Get the analysis tasks that must be completed before this analysis task can proceed.

Returns

a list containing the names of the analysis tasks that this analysis task depends on. If there are no dependencies, an empty list is returned.
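A minimal sketch of how a subclass might fill in the three abstract methods documented above. To stay self-contained it does not import merlin: SketchTask is a hypothetical stand-in that mirrors only the documented method names, and ExampleWarp, ExampleDecode and the numbers they return are illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchTask(ABC):
    """Hypothetical stand-in mirroring the documented abstract methods."""

    @abstractmethod
    def get_estimated_memory(self) -> float:
        """Return a memory estimate in megabytes."""

    @abstractmethod
    def get_estimated_time(self) -> float:
        """Return a time estimate in minutes."""

    @abstractmethod
    def get_dependencies(self) -> List[str]:
        """Return the names of the tasks that must finish first."""


class ExampleWarp(SketchTask):
    """Illustrative task with no upstream dependencies."""

    def get_estimated_memory(self) -> float:
        return 512.0  # megabytes (made-up figure)

    def get_estimated_time(self) -> float:
        return 2.0  # minutes (made-up figure)

    def get_dependencies(self) -> List[str]:
        return []  # no dependencies: return an empty list


class ExampleDecode(SketchTask):
    """Illustrative task that must wait for ExampleWarp."""

    def get_estimated_memory(self) -> float:
        return 2048.0  # megabytes (made-up figure)

    def get_estimated_time(self) -> float:
        return 5.0  # minutes (made-up figure)

    def get_dependencies(self) -> List[str]:
        return ["ExampleWarp"]  # dependencies are given by task name
```

Because the base class derives from abc.ABC, a subclass that leaves any of the abstract methods unimplemented cannot be instantiated.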

get_parameters()[source]

Get the parameters for this analysis task.

Returns

the parameter dictionary

is_error()[source]

Determines if an error has occurred while running this analysis.

Returns

True if the analysis has exited with an error and otherwise False.

is_complete()[source]

Determines if this analysis has completed successfully

Returns

True if the analysis is complete and otherwise False.

is_started()[source]

Determines if this analysis has started.

Returns

True if the analysis has begun and otherwise False.

is_running()[source]

Determines if this analysis task is expected to be running, but has unexpectedly stopped for more than two minutes.

get_analysis_name()[source]

Get the name for this AnalysisTask.

Returns

the name of this AnalysisTask

is_parallel()[source]

Determine if this analysis task uses multiple cores.

class merlin.core.analysistask.InternallyParallelAnalysisTask(dataSet, parameters=None, analysisName=None)[source]

Bases: merlin.core.analysistask.AnalysisTask

An abstract class for analysis that can only be run in one part, but can internally be sped up using multiple processes. Subclasses should implement the analysis to perform in the run_analysis() function.

set_core_count(coreCount)[source]

Set the number of parallel processes this analysis task is allowed to use.

is_parallel()[source]

Determine if this analysis task uses multiple cores.

class merlin.core.analysistask.ParallelAnalysisTask(dataSet, parameters=None, analysisName=None)[source]

Bases: merlin.core.analysistask.AnalysisTask

An abstract class for analysis that can be run in multiple parts independently. Subclasses should implement the analysis to perform in the run_analysis() function.

abstract fragment_count()[source]
run(fragmentIndex: int = None, overwrite=True) → None[source]

Run the specified index of this analysis task.

If the fragment index is not provided, all fragments for this analysis task are run in serial.

Parameters

fragmentIndex – the index of the analysis fragment to run or None if all fragments should be run.
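The fragmentIndex behavior described above can be sketched as follows. SketchParallelTask is a hypothetical stand-in, not the merlin class: it records which fragments were executed instead of doing real work, and it omits the overwrite argument and status bookkeeping.

```python
from typing import List, Optional


class SketchParallelTask:
    """Hypothetical stand-in illustrating per-fragment execution."""

    def __init__(self, fragments: int) -> None:
        self._fragments = fragments
        self.executed: List[int] = []  # order in which fragments were run

    def fragment_count(self) -> int:
        return self._fragments

    def run(self, fragmentIndex: Optional[int] = None) -> None:
        if fragmentIndex is None:
            # No index given: run every fragment, one after another.
            for i in range(self.fragment_count()):
                self.run(i)
            return
        # Placeholder for the real per-fragment analysis.
        self.executed.append(fragmentIndex)
```

Calling run() with no argument on a three-fragment task executes fragments 0, 1 and 2 in order; run(1) executes only fragment 1.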

is_error(fragmentIndex=None)[source]

Determines if an error has occurred while running this analysis.

Returns

True if the analysis has exited with an error and otherwise False.

is_complete(fragmentIndex=None)[source]

Determines if this analysis has completed successfully

Returns

True if the analysis is complete and otherwise False.

is_started(fragmentIndex=None)[source]

Determines if this analysis has started.

Returns

True if the analysis has begun and otherwise False.

is_running(fragmentIndex=None)[source]

Determines if this analysis task is expected to be running, but has unexpectedly stopped for more than two minutes.

is_parallel()[source]

Determine if this analysis task uses multiple cores.

merlin.core.dataset module

exception merlin.core.dataset.DataFormatException[source]

Bases: Exception

class merlin.core.dataset.DataSet(dataDirectoryName: str, dataHome: str = None, analysisHome: str = None)[source]

Bases: object

save_workflow(workflowString: str) → str[source]

Save a snakemake workflow for analysis of this dataset.

Parameters

workflowString – a string containing the snakemake workflow to save

Returns: the path to the saved workflow

get_snakemake_path() → str[source]

Get the directory for storing files related to snakemake.

Returns: the snakemake path as a string

save_figure(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], figure: matplotlib.figure.Figure, figureName: str, subdirectory: str = 'figures') → None[source]

Save the figure into the analysis results for this DataSet

This function will save the figure in both png and pdf formats.

Parameters
  • analysisTask – the analysis task that generated this figure.

  • figure – the figure handle for the figure to save

  • figureName – the name of the file to store the figure in, excluding extension

  • subdirectory – the name of the subdirectory within the specified analysis task to save the figures.

figure_exists(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], figureName: str, subdirectory: str = 'figures') → bool[source]

Determine if a figure with the specified name has been saved within the results for the specified analysis task.

This function only checks for the png format.

Parameters
  • analysisTask – the analysis task that generated this figure.

  • figureName – the name of the file to store the figure in, excluding extension

  • subdirectory – the name of the subdirectory within the specified analysis task to save the figures.

get_analysis_image_set(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int = None) → numpy.ndarray[source]

Get an analysis image set saved in the analysis for this data set.

Parameters
  • analysisTask – the analysis task that generated and stored the image set.

  • imageBaseName – the base name of the image

  • imageIndex – index of the image set to retrieve

get_analysis_image(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int, imagesPerSlice: int, sliceIndex: int, frameIndex: int) → numpy.ndarray[source]

Get an image from an image set saved in the analysis for this data set.

Parameters
  • analysisTask – the analysis task that generated and stored the image set.

  • imageBaseName – the base name of the image

  • imageIndex – index of the image set to retrieve

  • imagesPerSlice – the number of images in each slice of the image file

  • sliceIndex – the index of the slice to get the image

  • frameIndex – the index of the frame in the specified slice

writer_for_analysis_images(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int = None, imagej: bool = True) → tifffile.tifffile.TiffWriter[source]

Get a writer for writing tiff files from an analysis task.

Parameters
  • analysisTask

  • imageBaseName

  • imageIndex

  • imagej

Returns:

static analysis_tiff_description(sliceCount: int, frameCount: int) → Dict[source]
list_analysis_files(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, subdirectory: str = None, extension: str = None, fullPath: bool = True) → List[str][source]
save_dataframe_to_csv(dataframe: pandas.core.frame.DataFrame, resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None, **kwargs) → None[source]

Save a pandas data frame to a csv file stored in this dataset.

If a previous pandas data frame has been saved with the same resultName, it will be overwritten.

Parameters
  • dataframe – the data frame to save

  • resultName – the name of the output file

  • analysisTask – the analysis task that the dataframe should be saved under. If None, the dataframe is saved to the data set root.

  • resultIndex – index of the dataframe to save or None if no index should be specified

  • subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

  • **kwargs – arguments to pass on to pandas.to_csv

load_dataframe_from_csv(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None, **kwargs) → Optional[pandas.core.frame.DataFrame][source]

Load a pandas data frame from a csv file stored in this data set.

Parameters
  • resultName

  • analysisTask

  • resultIndex

  • subdirectory

  • **kwargs

Returns

the pandas data frame

Raises

FileNotFoundError – if the file does not exist

open_pandas_hdfstore(mode: str, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → pandas.io.pytables.HDFStore[source]
delete_pandas_hdfstore(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]
open_table(mode: str, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → tables.file[source]
delete_table(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]

Delete an hdf5 file stored in this data set if it exists.

Parameters
  • resultName – the name of the output file

  • analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.

  • resultIndex – index of the dataframe to save or None if no index should be specified

  • subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

open_hdf5_file(mode: str, resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → h5py._hl.files.File[source]

Open an hdf5 file stored in this data set.

Parameters
  • mode – the mode for opening the file, either ‘r’, ‘r+’, ‘w’, ‘w-’, or ‘a’.

  • resultName – the name of the output file

  • analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.

  • resultIndex – index of the dataframe to save or None if no index should be specified

  • subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

Returns

a h5py file object connected to the hdf5 file

Raises

FileNotFoundError – if the mode is ‘r’ and the specified hdf5 file does not exist.

delete_hdf5_file(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]

Delete an hdf5 file stored in this data set if it exists.

Parameters
  • resultName – the name of the output file

  • analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.

  • resultIndex – index of the dataframe to save or None if no index should be specified

  • subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

save_json_analysis_result(analysisResult: Dict, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]
load_json_analysis_result(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → Dict[source]
load_pickle_analysis_result(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → Dict[source]
save_pickle_analysis_result(analysisResult, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None)[source]
save_numpy_analysis_result(analysisResult: numpy.ndarray, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]
save_numpy_txt_analysis_result(analysisResult: numpy.ndarray, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]
load_numpy_analysis_result(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → numpy.array[source]
load_numpy_analysis_result_if_available(resultName: str, analysisName: str, defaultValue, resultIndex: int = None, subdirectory: str = None) → numpy.array[source]

Load the specified analysis result or return the specified default value if the analysis result does not exist.

Parameters
  • resultName – The name of the analysis result

  • analysisName – The name of the analysis task the result is saved in

  • defaultValue – The value to return if the specified analysis result does not exist

  • resultIndex – The index of the analysis result

  • subdirectory – The subdirectory within the analysis task that the result is saved in

Returns: The analysis result or defaultValue if the analysis result doesn’t exist.
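The load-or-default pattern described above can be sketched with the standard library alone. This is an illustration of the pattern, not the merlin code path: load_result_if_available is a hypothetical helper, and JSON files stand in for the numpy files the documented method reads so that the example has no third-party dependencies.

```python
import json
from pathlib import Path
from typing import Any


def load_result_if_available(path: Path, default_value: Any) -> Any:
    """Return the stored result, or default_value if the file is absent.

    Hypothetical helper; JSON is used here in place of numpy storage.
    """
    if not path.exists():
        # Nothing has been saved yet, so fall back to the caller's default.
        return default_value
    with path.open() as f:
        return json.load(f)
```

The caller never has to catch a missing-file error; it only has to choose a sensible default.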

get_analysis_subdirectory(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], subdirectory: str = None, create: bool = True) → str[source]

analysisTask can either be the class or a string containing the class name.

create – flag indicating if the analysis subdirectory should be created if it does not already exist.

get_task_subdirectory(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str])[source]
get_log_subdirectory(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str])[source]
save_analysis_task(analysisTask: merlin.core.analysistask.AnalysisTask, overwrite: bool = False)[source]
load_analysis_task(analysisTaskName: str) → merlin.core.analysistask.AnalysisTask[source]
delete_analysis(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str]) → None[source]

Remove all files associated with the provided analysis from this data set.

Before deleting an analysis task, it must be verified that the analysis task is not running.

get_analysis_tasks() → List[str][source]

Get a list of the analysis tasks within this dataset.

Returns: A list of the analysis task names.

analysis_exists(analysisTaskName: str) → bool[source]

Determine if an analysis task with the specified name exists in this dataset.

get_logger(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → logging.Logger[source]
close_logger(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]
get_analysis_environment(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]

Get the environment variables for the system used to run the specified analysis task.

Parameters
  • analysisTask – The completed analysis task to get the environment variables for.

  • fragmentIndex – The fragment index of the analysis task to get the environment variables for.

Returns: A dictionary of the environment variables. If the job has not yet run, then None is returned.

record_analysis_started(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]
record_analysis_running(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]
record_analysis_complete(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]
record_analysis_error(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]
get_analysis_start_time(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]

Get the time that this analysis task started

Returns

The start time for the analysis task execution in seconds since the epoch in UTC.

get_analysis_complete_time(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]

Get the time that this analysis task completed.

Returns

The completion time for the analysis task execution in seconds since the epoch in UTC.

get_analysis_elapsed_time(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]

Get the time that this analysis took to complete.

Returns

The elapsed time for the analysis task execution in seconds. Returns None if the analysis task has not yet completed.
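The relationship between the start, completion and elapsed times can be sketched as below. The helper name elapsed_seconds is hypothetical, and representing a task that has not started or not finished by a None timestamp is an assumption of this sketch; the documentation above only states that None is returned when the task has not yet completed.

```python
from typing import Optional


def elapsed_seconds(start_time: Optional[float],
                    complete_time: Optional[float]) -> Optional[float]:
    """Return complete_time - start_time, or None if either is missing.

    Both timestamps are seconds since the epoch, as documented for
    get_analysis_start_time() and get_analysis_complete_time().
    """
    if start_time is None or complete_time is None:
        return None  # the task has not yet completed
    return complete_time - start_time
```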

is_analysis_idle(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]
check_analysis_started(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]
check_analysis_done(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]
analysis_done_filename(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → str[source]
check_analysis_error(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]
reset_analysis_status(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None)[source]
class merlin.core.dataset.ImageDataSet(dataDirectoryName: str, dataHome: str = None, analysisHome: str = None, microscopeParametersName: str = None)[source]

Bases: merlin.core.dataset.DataSet

get_image_file_names()[source]
load_image(imagePath, frameIndex)[source]
image_stack_size(imagePath)[source]

Get the size of the image stack stored in the specified image path.

Returns

a three element list with [width, height, frameCount] or None if the file does not exist

get_microns_per_pixel()[source]

Get the conversion factor to convert pixels to microns.

get_image_dimensions()[source]

Get the dimensions of the images in this data set.

Returns

A tuple containing the width and height of each image in pixels.

get_image_xml_metadata(imagePath: str) → Dict[source]

Get the xml metadata stored for the specified image.

Parameters

imagePath – the path to the image file (.dax or .tif)

Returns: the metadata from the associated xml file

class merlin.core.dataset.MERFISHDataSet(dataDirectoryName: str, codebookNames: List[str] = None, dataOrganizationName: str = None, positionFileName: str = None, dataHome: str = None, analysisHome: str = None, microscopeParametersName: str = None)[source]

Bases: merlin.core.dataset.ImageDataSet

save_codebook(codebook: merlin.data.codebook.Codebook) → None[source]

Store the specified codebook in this dataset.

If a codebook with the same codebook index and codebook name as the specified codebook already exists in this dataset, it is not overwritten.

Parameters

codebook – the codebook to store

Raises

FileExistsError – If a codebook with the same codebook index but a different codebook name is already saved within this dataset.

load_codebooks() → List[merlin.data.codebook.Codebook][source]

Get all the codebooks stored within this dataset.

Returns

A list of all the stored codebooks.

load_codebook(codebookIndex: int = 0) → Optional[merlin.data.codebook.Codebook][source]

Load the codebook stored within this dataset with the specified index.

Parameters

codebookIndex – the index of the codebook to load.

Returns

The codebook stored with the specified codebook index. If no codebook exists with the specified index then None is returned.

get_stored_codebook_name(codebookIndex: int = 0) → Optional[str][source]

Get the name of the codebook stored within this dataset with the specified index.

Parameters

codebookIndex – the index of the codebook whose name should be retrieved.

Returns

The name of the codebook stored with the specified codebook index. If no codebook exists with the specified index then None is returned.

get_codebooks() → List[merlin.data.codebook.Codebook][source]

Get the codebooks associated with this dataset.

Returns

A list containing the codebooks for this dataset.

get_codebook(codebookIndex: int = 0) → merlin.data.codebook.Codebook[source]
get_data_organization() → merlin.data.dataorganization.DataOrganization[source]
get_stage_positions() → List[List[float]][source]
get_fov_offset(fov: int) → Tuple[float, float][source]

Get the offset of the specified fov in the global coordinate system. This offset is based on the anticipated stage position.

Parameters

fov – index of the field of view

Returns

A tuple specifying the x and y offset of the top right corner of the specified fov in pixels.

z_index_to_position(zIndex: int) → float[source]

Get the z position associated with the provided z index.

position_to_z_index(zPosition: float) → int[source]

Get the z index associated with the specified z position.

Raises

Exception – If the provided z position is not specified in this dataset

get_z_positions() → List[float][source]

Get the z positions present in this dataset.

Returns

A sorted list of all unique z positions
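One way the three z-position methods could relate to each other is sketched below. It assumes that a z index is simply the position of a value in the sorted list of unique z positions; that mapping is an assumption of this sketch, not something the documentation above states. The free functions take the list explicitly and are hypothetical helpers, not the MERFISHDataSet methods.

```python
from typing import List


def unique_sorted_z(raw_positions: List[float]) -> List[float]:
    """Collapse repeated z positions into a sorted list of unique values."""
    return sorted(set(raw_positions))


def z_index_to_position(z_positions: List[float], z_index: int) -> float:
    """Look up the z position stored at the given index."""
    return z_positions[z_index]


def position_to_z_index(z_positions: List[float], z_position: float) -> int:
    """Return the index of z_position, raising if it is not present."""
    if z_position not in z_positions:
        raise Exception("z position %s is not specified" % z_position)
    return z_positions.index(z_position)
```

Under this assumption the two lookups are inverses of each other for every z position present in the list.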

get_fovs() → List[int][source]
get_imaging_rounds() → List[int][source]
get_raw_image(dataChannel, fov, zPosition)[source]
get_fiducial_image(dataChannel, fov)[source]

merlin.core.executor module

class merlin.core.executor.Executor[source]

Bases: object

abstract run(task: merlin.core.analysistask.AnalysisTask, index: int = None, rerunCompleted: bool = False) → None[source]

Run an analysis task.

This method will not run analysis tasks that are already running, and analysis that was terminated early, due to error or otherwise, will not be restarted.

Parameters
  • task – the analysis task to run.

  • index – index of the analysis to run for a parallel analysis task.

  • rerunCompleted – flag indicating if previous analysis should be run again even if it has previously completed. If rerunCompleted is True, analysis will be run on the task regardless of its status. If rerunCompleted is False, analysis will only be run on the task or fragments of the task that have either not been started or have previously completed in error.

class merlin.core.executor.LocalExecutor(coreCount=None)[source]

Bases: merlin.core.executor.Executor

run(task: merlin.core.analysistask.AnalysisTask, index: int = None, rerunCompleted: bool = False) → None[source]

Run an analysis task.

This method will not run analysis tasks that are already running, and analysis that was terminated early, due to error or otherwise, will not be restarted.

Parameters
  • task – the analysis task to run.

  • index – index of the analysis to run for a parallel analysis task.

  • rerunCompleted – flag indicating if previous analysis should be run again even if it has previously completed. If rerunCompleted is True, analysis will be run on the task regardless of its status. If rerunCompleted is False, analysis will only be run on the task or fragments of the task that have either not been started or have previously completed in error.
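The decision of whether an executor should launch a task can be sketched as a single predicate. This is a hedged reading of the parameter description above, not the merlin implementation: should_run and its boolean status flags are hypothetical, and the sketch assumes that the rule against running an already-running task takes precedence over rerunCompleted.

```python
def should_run(started: bool, running: bool, error: bool,
               rerun_completed: bool = False) -> bool:
    """Decide whether a task (or one fragment of it) should be launched.

    Hypothetical helper; the status flags would normally come from the
    task's is_started(), is_running() and is_error() methods.
    """
    if running:
        return False  # never launch a second copy of a running task
    if rerun_completed:
        return True  # rerun regardless of any earlier outcome
    # Otherwise only run work that never started or that ended in error.
    return (not started) or error
```

With the default rerun_completed=False, a task that has already completed successfully is skipped, while one that ended in error is run again.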

merlin.core.scheduler module

Module contents