merlin.core package¶

Submodules¶

merlin.core.analysistask module¶

exception merlin.core.analysistask.AnalysisAlreadyStartedException[source]¶: Bases: Exception

exception merlin.core.analysistask.AnalysisAlreadyExistsException[source]¶: Bases: Exception

exception merlin.core.analysistask.InvalidParameterException[source]¶: Bases: Exception

class merlin.core.analysistask.AnalysisTask(dataSet, parameters=None, analysisName=None)[source]¶

Bases: abc.ABC

An abstract class for performing analysis on a DataSet. Subclasses should implement the analysis to perform in the run_analysis() function.
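A minimal sketch of the subclassing pattern described above. It uses a hypothetical stand-in base class (`SketchTask`) rather than importing merlin, so the real constructor and any bookkeeping done by AnalysisTask are not reproduced. Only the method names come from this page. The fallback to the class name for `analysisName` is an assumption.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchTask(ABC):
    """Hypothetical stand-in for AnalysisTask; not the merlin class."""

    def __init__(self, parameters=None, analysisName=None):
        self.parameters = parameters if parameters is not None else {}
        # Assumption: fall back to the class name when no name is given.
        self.analysisName = analysisName or type(self).__name__

    @abstractmethod
    def run_analysis(self) -> None:
        """Perform the analysis; subclasses supply the body."""

    @abstractmethod
    def get_estimated_memory(self) -> float:
        """Return a memory estimate in megabytes."""

    @abstractmethod
    def get_estimated_time(self) -> float:
        """Return a time estimate in minutes."""

    @abstractmethod
    def get_dependencies(self) -> List[str]:
        """Return the names of tasks that must finish first."""


class SumValues(SketchTask):
    """Toy subclass: sums a list of numbers passed in as a parameter."""

    def run_analysis(self) -> None:
        self.result = sum(self.parameters.get('values', []))

    def get_estimated_memory(self) -> float:
        return 64.0  # megabytes

    def get_estimated_time(self) -> float:
        return 0.5  # minutes

    def get_dependencies(self) -> List[str]:
        return []  # no upstream tasks


task = SumValues(parameters={'values': [1, 2, 3]})
task.run_analysis()
```

Because every abstract method is overridden, `SumValues` can be instantiated, while instantiating `SketchTask` directly raises `TypeError`.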

save(overwrite=False) → None[source]¶

Save a copy of this AnalysisTask into the data set.

Parameters: overwrite – flag indicating if an existing analysis task with the same name as this analysis task should be overwritten even if the specified parameters are different.
Raises: AnalysisAlreadyExistsException – if an analysis task with the same name as this analysis task already exists in the data set with different parameters.

run(overwrite=True) → None[source]¶

Run this AnalysisTask.

Upon completion of the analysis, this function informs the DataSet that analysis is complete.

Parameters: overwrite – flag indicating if previous analysis from this analysis task should be overwritten.
Raises: AnalysisAlreadyStartedException – if this analysis task is currently already running or if overwrite is not True and this analysis task has already completed or exited with an error.
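The overwrite rule above can be illustrated with a small state tracker. This is a sketch under stated assumptions, not the merlin implementation: `StatusSketch` and its status strings are hypothetical, the exception is a local stand-in that only shares its name with the merlin one, and the real class records status through the DataSet.

```python
class AnalysisAlreadyStartedException(Exception):
    """Local stand-in sharing the name of the merlin exception."""


class StatusSketch:
    """Hypothetical status tracker used only for illustration."""

    def __init__(self):
        self.status = 'idle'  # one of: idle, running, complete, error

    def run(self, work, overwrite=True):
        # A task that is currently running is never started again.
        if self.status == 'running':
            raise AnalysisAlreadyStartedException('task is already running')
        # Without overwrite, a finished or failed task is left untouched.
        if not overwrite and self.status in ('complete', 'error'):
            raise AnalysisAlreadyStartedException('task has already finished')
        self.status = 'running'
        try:
            work()
        except Exception:
            self.status = 'error'
            raise
        self.status = 'complete'


tracker = StatusSketch()
tracker.run(lambda: None)   # first run completes
tracker.run(lambda: None)   # overwrite=True permits a rerun
try:
    tracker.run(lambda: None, overwrite=False)
    rerun_blocked = False
except AnalysisAlreadyStartedException:
    rerun_blocked = True
```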

abstract get_estimated_memory() → float[source]¶

Get an estimate of how much memory is required for this AnalysisTask.

Returns: a memory estimate in megabytes.

abstract get_estimated_time() → float[source]¶

Get an estimate for the amount of time required to complete this AnalysisTask.

Returns: a time estimate in minutes.

abstract get_dependencies() → List[str][source]¶

Get the analysis tasks that must be completed before this analysis task can proceed.

Returns: a list containing the names of the analysis tasks that this analysis task depends on. If there are no dependencies, an empty list is returned.
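One way to turn dependency lists like these into a valid run order is a topological sort. The sketch below uses the standard library's `graphlib` (Python 3.9+). The task names are hypothetical, and this page does not say how merlin itself orders tasks.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical task names mapped to the names each one would
# return from get_dependencies().
dependencies = {
    'Warp': [],
    'Preprocess': ['Warp'],
    'Optimize': ['Preprocess'],
    'Decode': ['Preprocess', 'Optimize'],
}

# static_order() yields each task only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
```

A circular dependency would make `static_order()` raise `graphlib.CycleError`, which is a convenient early check before any analysis starts.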

get_parameters()[source]¶

Get the parameters for this analysis task.

Returns: the parameter dictionary

is_error()[source]¶

Determines if an error has occurred while running this analysis.

Returns: True if an error has occurred and otherwise False.

is_complete()[source]¶

Determines if this analysis has completed successfully.

Returns: True if the analysis is complete and otherwise False.

is_started()[source]¶

Determines if this analysis has started.

Returns: True if the analysis has begun and otherwise False.

is_running()[source]¶: Determines if this analysis task is expected to be running, but has unexpectedly stopped for more than two minutes.

get_analysis_name()[source]¶

Get the name for this AnalysisTask.

Returns: the name of this AnalysisTask

is_parallel()[source]¶: Determine if this analysis task uses multiple cores.

class merlin.core.analysistask.InternallyParallelAnalysisTask(dataSet, parameters=None, analysisName=None)[source]¶

Bases: merlin.core.analysistask.AnalysisTask

An abstract class for analysis that can only be run in one part, but can internally be sped up using multiple processes. Subclasses should implement the analysis to perform in the run_analysis() function.

set_core_count(coreCount)[source]¶: Set the number of parallel processes this analysis task is allowed to use.

is_parallel()[source]¶: Determine if this analysis task uses multiple cores.

class merlin.core.analysistask.ParallelAnalysisTask(dataSet, parameters=None, analysisName=None)[source]¶

Bases: merlin.core.analysistask.AnalysisTask

An abstract class for analysis that can be run in multiple parts independently. Subclasses should implement the analysis to perform in the run_analysis() function.

abstract fragment_count()[source]¶

run(fragmentIndex: int = None, overwrite=True) → None[source]¶

Run the specified index of this analysis task.

If the fragment index is not provided, all fragments for this analysis task are run in serial.

Parameters: fragmentIndex – the index of the analysis fragment to run or None if all fragments should be run.
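The dispatch on fragmentIndex can be sketched as follows. `FragmentSketch` is a hypothetical stand-in, not a merlin class. Only `fragment_count()` and the fragmentIndex argument come from this page, and the per-fragment `run_analysis(fragmentIndex)` signature is an assumption.

```python
from typing import List, Optional


class FragmentSketch:
    """Hypothetical stand-in for a ParallelAnalysisTask subclass."""

    def __init__(self, fragments: int):
        self._fragments = fragments
        self.completed: List[int] = []

    def fragment_count(self) -> int:
        return self._fragments

    def run_analysis(self, fragmentIndex: int) -> None:
        # Placeholder for the real per-fragment work.
        self.completed.append(fragmentIndex)

    def run(self, fragmentIndex: Optional[int] = None) -> None:
        if fragmentIndex is None:
            # No index given: run every fragment, one after another.
            for index in range(self.fragment_count()):
                self.run_analysis(index)
        else:
            self.run_analysis(fragmentIndex)


single = FragmentSketch(4)
single.run(fragmentIndex=2)   # runs only fragment 2

everything = FragmentSketch(4)
everything.run()              # runs fragments 0 through 3 in order
```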

is_error(fragmentIndex=None)[source]¶

Determines if an error has occurred while running this analysis.

Returns: True if an error has occurred and otherwise False.

is_complete(fragmentIndex=None)[source]¶

Determines if this analysis has completed successfully.

Returns: True if the analysis is complete and otherwise False.

is_started(fragmentIndex=None)[source]¶

Determines if this analysis has started.

Returns: True if the analysis has begun and otherwise False.

is_running(fragmentIndex=None)[source]¶: Determines if this analysis task is expected to be running, but has unexpectedly stopped for more than two minutes.

is_parallel()[source]¶: Determine if this analysis task uses multiple cores.

merlin.core.dataset module¶

exception merlin.core.dataset.DataFormatException[source]¶: Bases: Exception

class merlin.core.dataset.DataSet(dataDirectoryName: str, dataHome: str = None, analysisHome: str = None)[source]¶

Bases: object

save_workflow(workflowString: str) → str[source]¶

Save a snakemake workflow for analysis of this dataset.

Parameters: workflowString – a string containing the snakemake workflow to save

Returns: the path to the saved workflow

get_snakemake_path() → str[source]¶

Get the directory for storing files related to snakemake.

Returns: the snakemake path as a string

save_figure(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], figure: matplotlib.figure.Figure, figureName: str, subdirectory: str = 'figures') → None[source]¶

Save the figure into the analysis results for this DataSet.

This function will save the figure in both png and pdf formats.

Parameters

analysisTask – the analysis task that generated this figure.
figure – the figure handle for the figure to save
figureName – the name of the file to store the figure in, excluding extension
subdirectory – the name of the subdirectory within the specified analysis task to save the figures.

figure_exists(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], figureName: str, subdirectory: str = 'figures') → bool[source]¶

Determine if a figure with the specified name has been saved within the results for the specified analysis task.

This function only checks for the png format.

Parameters

analysisTask – the analysis task that generated this figure.
figureName – the name of the file to store the figure in, excluding extension
subdirectory – the name of the subdirectory within the specified analysis task to save the figures.

get_analysis_image_set(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int = None) → numpy.ndarray[source]¶

Get an analysis image set saved in the analysis for this data set.

Parameters

analysisTask – the analysis task that generated and stored the image set.
imageBaseName – the base name of the image
imageIndex – index of the image set to retrieve

get_analysis_image(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int, imagesPerSlice: int, sliceIndex: int, frameIndex: int) → numpy.ndarray[source]¶

Get an image from an image set saved in the analysis for this data set.

Parameters

analysisTask – the analysis task that generated and stored the image set.
imageBaseName – the base name of the image
imageIndex – index of the image set to retrieve
imagesPerSlice – the number of images in each slice of the image file
sliceIndex – the index of the slice to get the image from
frameIndex – the index of the frame in the specified slice

writer_for_analysis_images(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int = None, imagej: bool = True) → tifffile.tifffile.TiffWriter[source]¶

Get a writer for writing tiff files from an analysis task.

Parameters

analysisTask –
imageBaseName –
imageIndex –
imagej –

Returns:

static analysis_tiff_description(sliceCount: int, frameCount: int) → Dict[source]¶

list_analysis_files(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, subdirectory: str = None, extension: str = None, fullPath: bool = True) → List[str][source]¶

save_dataframe_to_csv(dataframe: pandas.core.frame.DataFrame, resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None, **kwargs) → None[source]¶

Save a pandas data frame to a csv file stored in this dataset.

If a previous pandas data frame has been saved with the same resultName, it will be overwritten.

Parameters

dataframe – the data frame to save
resultName – the name of the output file
analysisTask – the analysis task that the dataframe should be saved under. If None, the dataframe is saved to the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.
**kwargs – arguments to pass on to pandas.to_csv

load_dataframe_from_csv(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None, **kwargs) → Optional[pandas.core.frame.DataFrame][source]¶

Load a pandas data frame from a csv file stored in this data set.

Parameters

resultName –
analysisTask –
resultIndex –
subdirectory –
**kwargs –

Returns

the pandas data frame

Raises

FileNotFoundError – if the file does not exist

open_pandas_hdfstore(mode: str, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → pandas.io.pytables.HDFStore[source]¶

delete_pandas_hdfstore(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]¶

open_table(mode: str, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → tables.file[source]¶

delete_table(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]¶

Delete an hdf5 file stored in this data set if it exists.

Parameters

resultName – the name of the output file
analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

open_hdf5_file(mode: str, resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → h5py._hl.files.File[source]¶

Open an hdf5 file stored in this data set.

Parameters

mode – the mode for opening the file, either ‘r’, ‘r+’, ‘w’, ‘w-’, or ‘a’.
resultName – the name of the output file
analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

Returns

a h5py file object connected to the hdf5 file

Raises

FileNotFoundError – if the mode is ‘r’ and the specified hdf5 file does not exist

delete_hdf5_file(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]¶

Delete an hdf5 file stored in this data set if it exists.

Parameters

resultName – the name of the output file
analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.

save_json_analysis_result(analysisResult: Dict, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]¶

load_json_analysis_result(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → Dict[source]¶

load_pickle_analysis_result(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → Dict[source]¶

save_pickle_analysis_result(analysisResult, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None)[source]¶

save_numpy_analysis_result(analysisResult: numpy.ndarray, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]¶

save_numpy_txt_analysis_result(analysisResult: numpy.ndarray, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]¶

load_numpy_analysis_result(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → numpy.array[source]¶

load_numpy_analysis_result_if_available(resultName: str, analysisName: str, defaultValue, resultIndex: int = None, subdirectory: str = None) → numpy.array[source]¶

Load the specified analysis result or return the specified default value if the analysis result does not exist.

Parameters

resultName – The name of the analysis result
analysisName – The name of the analysis task the result is saved in
defaultValue – The value to return if the specified analysis result does not exist
resultIndex – The index of the analysis result
subdirectory – The subdirectory within the analysis task that the result is saved in

Returns: The analysis result or defaultValue if the analysis result doesn’t exist.

get_analysis_subdirectory(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], subdirectory: str = None, create: bool = True) → str[source]¶

analysisTask can either be the class or a string containing the class name.

create – flag indicating if the analysis subdirectory should be created if it does not already exist.

get_task_subdirectory(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str])[source]¶

get_log_subdirectory(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str])[source]¶

save_analysis_task(analysisTask: merlin.core.analysistask.AnalysisTask, overwrite: bool = False)[source]¶

load_analysis_task(analysisTaskName: str) → merlin.core.analysistask.AnalysisTask[source]¶

delete_analysis(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str]) → None[source]¶

Remove all files associated with the provided analysis from this data set.

Before deleting an analysis task, it must be verified that the analysis task is not running.

get_analysis_tasks() → List[str][source]¶

Get a list of the analysis tasks within this dataset.

Returns: A list of the analysis task names.

analysis_exists(analysisTaskName: str) → bool[source]¶: Determine if an analysis task with the specified name exists in this dataset.

get_logger(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → logging.Logger[source]¶

close_logger(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶

get_analysis_environment(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶

Get the environment variables for the system used to run the specified analysis task.

Parameters

analysisTask – The completed analysis task to get the environment variables for.
fragmentIndex – The fragment index of the analysis task to get the environment variables for.

Returns: A dictionary of the environment variables. If the job has not yet run, then None is returned.

record_analysis_started(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶

record_analysis_running(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶

record_analysis_complete(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶

record_analysis_error(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶

get_analysis_start_time(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]¶

Get the time that this analysis task started.

Returns: The start time for the analysis task execution in seconds since the epoch in UTC.

get_analysis_complete_time(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]¶

Get the time that this analysis task completed.

Returns: The completion time for the analysis task execution in seconds since the epoch in UTC.

get_analysis_elapsed_time(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]¶

Get the time that this analysis took to complete.

Returns: The elapsed time for the analysis task execution in seconds. Returns None if the analysis task has not yet completed.
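The relationship between the three timing methods can be sketched with plain epoch timestamps. `TimingSketch` and its method names are hypothetical. The real DataSet records these times per analysis task and fragment, and how it stores them is not described on this page.

```python
import time
from typing import Optional


class TimingSketch:
    """Hypothetical record of start and completion timestamps."""

    def __init__(self):
        self.start_time: Optional[float] = None
        self.complete_time: Optional[float] = None

    def record_started(self) -> None:
        self.start_time = time.time()  # seconds since the epoch

    def record_complete(self) -> None:
        self.complete_time = time.time()

    def elapsed(self) -> Optional[float]:
        # Mirror the documented behaviour: None until the task completes.
        if self.start_time is None or self.complete_time is None:
            return None
        return self.complete_time - self.start_time


timing = TimingSketch()
timing.record_started()
elapsed_before = timing.elapsed()   # None: not yet complete
timing.record_complete()
elapsed_after = timing.elapsed()    # a float number of seconds
```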

is_analysis_idle(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶

check_analysis_started(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶

check_analysis_done(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶

analysis_done_filename(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → str[source]¶

check_analysis_error(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶

reset_analysis_status(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None)[source]¶

class merlin.core.dataset.ImageDataSet(dataDirectoryName: str, dataHome: str = None, analysisHome: str = None, microscopeParametersName: str = None)[source]¶

Bases: merlin.core.dataset.DataSet

get_image_file_names()[source]¶

load_image(imagePath, frameIndex)[source]¶

image_stack_size(imagePath)[source]¶

Get the size of the image stack stored in the specified image path.

Returns: a three element list with [width, height, frameCount], or None if the file does not exist

get_microns_per_pixel()[source]¶: Get the conversion factor to convert pixels to microns.

get_image_dimensions()[source]¶

Get the dimensions of the images in this data set.

Returns: A tuple containing the width and height of each image in pixels.

get_image_xml_metadata(imagePath: str) → Dict[source]¶

Get the xml metadata stored for the specified image.

Parameters: imagePath – the path to the image file (.dax or .tif)

Returns: the metadata from the associated xml file

class merlin.core.dataset.MERFISHDataSet(dataDirectoryName: str, codebookNames: List[str] = None, dataOrganizationName: str = None, positionFileName: str = None, dataHome: str = None, analysisHome: str = None, microscopeParametersName: str = None)[source]¶

Bases: merlin.core.dataset.ImageDataSet

save_codebook(codebook: merlin.data.codebook.Codebook) → None[source]¶

Store the specified codebook in this dataset.

If a codebook with the same codebook index and codebook name as the specified codebook already exists in this dataset, it is not overwritten.

Parameters: codebook – the codebook to store
Raises: FileExistsError – If a codebook with the same codebook index but a different codebook name is already saved within this dataset.

load_codebooks() → List[merlin.data.codebook.Codebook][source]¶

Get all the codebooks stored within this dataset.

Returns: A list of all the stored codebooks.

load_codebook(codebookIndex: int = 0) → Optional[merlin.data.codebook.Codebook][source]¶

Load the codebook stored within this dataset with the specified index.

Parameters: codebookIndex – the index of the codebook to load.
Returns: The codebook stored with the specified codebook index. If no codebook exists with the specified index then None is returned.

get_stored_codebook_name(codebookIndex: int = 0) → Optional[str][source]¶

Get the name of the codebook stored within this dataset with the specified index.

Parameters: codebookIndex – the index of the codebook whose name should be returned.
Returns: The name of the codebook stored with the specified codebook index. If no codebook exists with the specified index then None is returned.

get_codebooks() → List[merlin.data.codebook.Codebook][source]¶

Get the codebooks associated with this dataset.

Returns: A list containing the codebooks for this dataset.

get_codebook(codebookIndex: int = 0) → merlin.data.codebook.Codebook[source]¶

get_data_organization() → merlin.data.dataorganization.DataOrganization[source]¶

get_stage_positions() → List[List[float]][source]¶

get_fov_offset(fov: int) → Tuple[float, float][source]¶

Get the offset of the specified fov in the global coordinate system. This offset is based on the anticipated stage position.

Parameters: fov – index of the field of view
Returns: A tuple specifying the x and y offset of the top right corner of the specified fov in pixels.

z_index_to_position(zIndex: int) → float[source]¶: Get the z position associated with the provided z index.

position_to_z_index(zPosition: float) → int[source]¶

Get the z index associated with the specified z position.

Raises: Exception – If the provided z position is not specified in this dataset

get_z_positions() → List[float][source]¶

Get the z positions present in this dataset.

Returns: A sorted list of all unique z positions
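The three z-position methods above fit together roughly as in this sketch. It assumes that a z index is simply the position of a value in the sorted list of unique z positions and that lookups require an exact match. Neither detail is stated on this page, and `ZSketch` is a hypothetical stand-in rather than MERFISHDataSet.

```python
from typing import List


class ZSketch:
    """Hypothetical stand-in holding only a list of z positions."""

    def __init__(self, zPositions: List[float]):
        # Keep each z position once, in ascending order.
        self._zPositions = sorted(set(zPositions))

    def get_z_positions(self) -> List[float]:
        return list(self._zPositions)

    def z_index_to_position(self, zIndex: int) -> float:
        return self._zPositions[zIndex]

    def position_to_z_index(self, zPosition: float) -> int:
        # Assumption: an exact match is required.
        if zPosition not in self._zPositions:
            raise Exception('z position %r is not in this dataset' % zPosition)
        return self._zPositions.index(zPosition)


zs = ZSketch([1.5, 0.0, 3.0, 1.5])
positions = zs.get_z_positions()
middle_index = zs.position_to_z_index(1.5)
```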

get_fovs() → List[int][source]¶

get_imaging_rounds() → List[int][source]¶

get_raw_image(dataChannel, fov, zPosition)[source]¶

get_fiducial_image(dataChannel, fov)[source]¶

merlin.core.executor module¶

class merlin.core.executor.Executor[source]¶

Bases: object

abstract run(task: merlin.core.analysistask.AnalysisTask, index: int = None, rerunCompleted: bool = False) → None[source]¶

Run an analysis task.

This method will not run analysis tasks that are already running, and analysis that terminated early, due to an error or otherwise, will not be restarted.

Parameters

task – the analysis task to run.
index – index of the analysis to run for a parallel analysis task.
rerunCompleted – flag indicating if previous analysis should be run again even if it has previously completed. If rerunCompleted is True, analysis will be run on the task regardless of its status. If rerunCompleted is False, analysis will only be run on the task or fragments of the task that have either not been started or have previously completed in error.

class merlin.core.executor.LocalExecutor(coreCount=None)[source]¶

Bases: merlin.core.executor.Executor

run(task: merlin.core.analysistask.AnalysisTask, index: int = None, rerunCompleted: bool = False) → None[source]¶

Run an analysis task.

This method will not run analysis tasks that are already running, and analysis that terminated early, due to an error or otherwise, will not be restarted.

Parameters

task – the analysis task to run.
index – index of the analysis to run for a parallel analysis task.
rerunCompleted – flag indicating if previous analysis should be run again even if it has previously completed. If rerunCompleted is True, analysis will be run on the task regardless of its status. If rerunCompleted is False, analysis will only be run on the task or fragments of the task that have either not been started or have previously completed in error.
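The rerunCompleted description can be read as a small decision rule. The sketch below follows the parameter description and never restarts a task that is currently running. The status strings and the `should_run` helper are hypothetical, and the summary sentence about early termination could be read differently, so treat this as one interpretation rather than the executor's actual logic.

```python
def should_run(status: str, rerunCompleted: bool = False) -> bool:
    """Decide whether a task (or fragment) should be run.

    status is one of the hypothetical strings 'not_started',
    'running', 'complete', or 'error'.
    """
    if status == 'running':
        # Assumption: a task that is already running is never started again.
        return False
    if rerunCompleted:
        return True
    # Otherwise only run work that has not started or ended in error.
    return status in ('not_started', 'error')


decisions = {
    status: (should_run(status), should_run(status, rerunCompleted=True))
    for status in ('not_started', 'running', 'complete', 'error')
}
```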

merlin.core.scheduler module¶

Module contents¶