merlin.core package¶
Submodules¶
merlin.core.analysistask module¶
-
class
merlin.core.analysistask.
AnalysisTask
(dataSet, parameters=None, analysisName=None)[source]¶ Bases:
abc.ABC
An abstract class for performing analysis on a DataSet. Subclasses should implement the analysis to perform in the run_analysis() function.
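A minimal sketch of the subclassing pattern described above. It does not import merlin: `AnalysisTaskSketch` is a hypothetical stand-in that only mirrors the abstract methods documented on this page, and `CountTask` is an invented example subclass.

```python
from abc import ABC, abstractmethod
from typing import List


class AnalysisTaskSketch(ABC):
    """Hypothetical stand-in mirroring the abstract methods documented here."""

    @abstractmethod
    def run_analysis(self) -> None:
        """Perform the analysis."""

    @abstractmethod
    def get_estimated_memory(self) -> float:
        """Return a memory estimate in megabytes."""

    @abstractmethod
    def get_estimated_time(self) -> float:
        """Return a time estimate in minutes."""

    @abstractmethod
    def get_dependencies(self) -> List[str]:
        """Return the names of the tasks this task depends on."""


class CountTask(AnalysisTaskSketch):
    """Invented example subclass that sums a small range of integers."""

    def __init__(self) -> None:
        self.result = None

    def run_analysis(self) -> None:
        self.result = sum(range(10))

    def get_estimated_memory(self) -> float:
        return 64.0  # megabytes

    def get_estimated_time(self) -> float:
        return 1.0  # minutes

    def get_dependencies(self) -> List[str]:
        return []  # no dependencies


task = CountTask()
task.run_analysis()
```

Because every abstract method is overridden, `CountTask` can be instantiated, whereas instantiating the abstract stand-in directly raises a TypeError.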
-
save
(overwrite=False) → None[source]¶ Save a copy of this AnalysisTask into the data set.
- Parameters
overwrite – flag indicating if an existing analysis task with the same name as this analysis task should be overwritten even if the specified parameters are different.
- Raises
AnalysisAlreadyExistsException – if an analysis task with the same name as this analysis task already exists in the data set with different parameters.
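The overwrite rule described above can be sketched as follows. This is an illustration only, not MERlin's implementation: the in-memory `savedTasks` dictionary, the `save_task` helper, and the local exception class are all hypothetical stand-ins.

```python
from typing import Dict


class AnalysisAlreadyExistsException(Exception):
    """Stand-in for the exception named in the documentation above."""


def save_task(savedTasks: Dict[str, dict], analysisName: str,
              parameters: dict, overwrite: bool = False) -> None:
    """Store parameters under analysisName, refusing conflicting saves."""
    existing = savedTasks.get(analysisName)
    # A conflict is a task with the same name but different parameters.
    if existing is not None and existing != parameters and not overwrite:
        raise AnalysisAlreadyExistsException(
            "Analysis %s already exists with different parameters"
            % analysisName)
    savedTasks[analysisName] = dict(parameters)


savedTasks: Dict[str, dict] = {}
save_task(savedTasks, "ExampleTask", {"threshold": 1})
save_task(savedTasks, "ExampleTask", {"threshold": 1})  # same parameters
try:
    save_task(savedTasks, "ExampleTask", {"threshold": 2})
    conflictRaised = False
except AnalysisAlreadyExistsException:
    conflictRaised = True
# With overwrite=True the conflicting parameters replace the stored ones.
save_task(savedTasks, "ExampleTask", {"threshold": 2}, overwrite=True)
```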
-
run
(overwrite=True) → None[source]¶ Run this AnalysisTask.
Upon completion of the analysis, this function informs the DataSet that analysis is complete.
- Parameters
overwrite – flag indicating if previous analysis from this analysis task should be overwritten.
- Raises
AnalysisAlreadyStartedException – if this analysis task is currently already running or if overwrite is not True and this analysis task has already completed or exited with an error.
-
abstract
get_estimated_memory
() → float[source]¶ Get an estimate of how much memory is required for this AnalysisTask.
- Returns
a memory estimate in megabytes.
-
abstract
get_estimated_time
() → float[source]¶ Get an estimate for the amount of time required to complete this AnalysisTask.
- Returns
a time estimate in minutes.
-
abstract
get_dependencies
() → List[str][source]¶ Get the analysis tasks that must be completed before this analysis task can proceed.
- Returns
a list containing the names of the analysis tasks that this analysis task depends on. If there are no dependencies, an empty list is returned.
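As an illustration of how the names returned by get_dependencies() could be used, the sketch below orders tasks so that each one follows its dependencies. The task names and the `dependencies` mapping are hypothetical, and the standard-library graphlib module (Python 3.9+) stands in for whatever scheduling MERlin actually performs.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return task names ordered so each task follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors.
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical task names; an empty list means no dependencies.
dependencies = {
    "warp": [],
    "preprocess": ["warp"],
    "decode": ["preprocess", "warp"],
}
order = execution_order(dependencies)
```

A cyclic dependency would make `static_order()` raise `graphlib.CycleError`, which is one way such a configuration mistake could be surfaced early.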
-
get_parameters
()[source]¶ Get the parameters for this analysis task.
- Returns
the parameter dictionary
-
is_error
()[source]¶ Determines if an error has occurred while running this analysis.
- Returns
True if an error has occurred while running the analysis and otherwise False.
-
is_complete
()[source]¶ Determines if this analysis has completed successfully.
- Returns
True if the analysis is complete and otherwise False.
-
is_started
()[source]¶ Determines if this analysis has started.
- Returns
True if the analysis has begun and otherwise False.
-
is_running
()[source]¶ Determines if this analysis task is expected to be running, but has unexpectedly stopped for more than two minutes.
-
-
class
merlin.core.analysistask.
InternallyParallelAnalysisTask
(dataSet, parameters=None, analysisName=None)[source]¶ Bases:
merlin.core.analysistask.AnalysisTask
An abstract class for analysis that can only be run in one part, but can internally be sped up using multiple processes. Subclasses should implement the analysis to perform in the run_analysis() function.
-
class
merlin.core.analysistask.
ParallelAnalysisTask
(dataSet, parameters=None, analysisName=None)[source]¶ Bases:
merlin.core.analysistask.AnalysisTask
An abstract class for analysis that can be run in multiple parts independently. Subclasses should implement the analysis to perform in the run_analysis() function.
-
run
(fragmentIndex: int = None, overwrite=True) → None[source]¶ Run the specified index of this analysis task.
If the fragment index is not provided, all fragments for this analysis task are run in serial.
- Parameters
fragmentIndex – the index of the analysis fragment to run or None if all fragments should be run.
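The fragment selection described above can be sketched as below. The `fragments_to_run` helper and its `fragmentCount` argument are hypothetical. The documentation does not say how the number of fragments is obtained, nor how an out-of-range index is handled, so the IndexError here is an assumption.

```python
from typing import List, Optional


def fragments_to_run(fragmentCount: int,
                     fragmentIndex: Optional[int] = None) -> List[int]:
    """Return the fragment indexes a call to run() would process."""
    if fragmentIndex is None:
        # No index given: every fragment is run, one after another.
        return list(range(fragmentCount))
    if not 0 <= fragmentIndex < fragmentCount:
        raise IndexError("fragment index %d out of range" % fragmentIndex)
    return [fragmentIndex]


allFragments = fragments_to_run(4)
singleFragment = fragments_to_run(4, fragmentIndex=2)
```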
-
is_error
(fragmentIndex=None)[source]¶ Determines if an error has occurred while running this analysis.
- Returns
True if an error has occurred while running the analysis and otherwise False.
-
is_complete
(fragmentIndex=None)[source]¶ Determines if this analysis has completed successfully.
- Returns
True if the analysis is complete and otherwise False.
-
is_started
(fragmentIndex=None)[source]¶ Determines if this analysis has started.
- Returns
True if the analysis has begun and otherwise False.
-
merlin.core.dataset module¶
-
class
merlin.core.dataset.
DataSet
(dataDirectoryName: str, dataHome: str = None, analysisHome: str = None)[source]¶ Bases:
object
-
save_workflow
(workflowString: str) → str[source]¶ Save a snakemake workflow for analysis of this dataset.
- Parameters
workflowString – a string containing the snakemake workflow to save
Returns: the path to the saved workflow
-
get_snakemake_path
() → str[source]¶ Get the directory for storing files related to snakemake.
Returns: the snakemake path as a string
-
save_figure
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], figure: matplotlib.figure.Figure, figureName: str, subdirectory: str = 'figures') → None[source]¶ Save the figure into the analysis results for this DataSet
This function will save the figure in both png and pdf formats.
- Parameters
analysisTask – the analysis task that generated this figure.
figure – the figure handle for the figure to save
figureName – the name of the file to store the figure in, excluding extension
subdirectory – the name of the subdirectory within the specified analysis task to save the figures.
-
figure_exists
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], figureName: str, subdirectory: str = 'figures') → bool[source]¶ Determine if a figure with the specified name has been saved within the results for the specified analysis task.
This function only checks for the png format.
- Parameters
analysisTask – the analysis task that generated this figure.
figureName – the name of the file to store the figure in, excluding extension
subdirectory – the name of the subdirectory within the specified analysis task to save the figures.
-
get_analysis_image_set
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int = None) → numpy.ndarray[source]¶ Get an analysis image set saved in the analysis for this data set.
- Parameters
analysisTask – the analysis task that generated and stored the image set.
imageBaseName – the base name of the image
imageIndex – index of the image set to retrieve
-
get_analysis_image
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int, imagesPerSlice: int, sliceIndex: int, frameIndex: int) → numpy.ndarray[source]¶ Get an image from an image set saved in the analysis for this data set.
- Parameters
analysisTask – the analysis task that generated and stored the image set.
imageBaseName – the base name of the image
imageIndex – index of the image set to retrieve
imagesPerSlice – the number of images in each slice of the image file
sliceIndex – the index of the slice to get the image
frameIndex – the index of the frame in the specified slice
-
writer_for_analysis_images
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], imageBaseName: str, imageIndex: int = None, imagej: bool = True) → tifffile.tifffile.TiffWriter[source]¶ Get a writer for writing tiff files from an analysis task.
- Parameters
analysisTask –
imageBaseName –
imageIndex –
imagej –
Returns:
-
list_analysis_files
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, subdirectory: str = None, extension: str = None, fullPath: bool = True) → List[str][source]¶
-
save_dataframe_to_csv
(dataframe: pandas.core.frame.DataFrame, resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None, **kwargs) → None[source]¶ Save a pandas data frame to a csv file stored in this dataset.
If a previous pandas data frame has been saved with the same resultName, it will be overwritten.
- Parameters
dataframe – the data frame to save
resultName – the name of the output file
analysisTask – the analysis task that the dataframe should be saved under. If None, the dataframe is saved to the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.
**kwargs – arguments to pass on to pandas.to_csv
-
load_dataframe_from_csv
(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None, **kwargs) → Optional[pandas.core.frame.DataFrame][source]¶ Load a pandas data frame from a csv file stored in this data set.
- Parameters
resultName –
analysisTask –
resultIndex –
subdirectory –
**kwargs –
- Returns
the pandas data frame
- Raises
FileNotFoundError – if the file does not exist
-
open_pandas_hdfstore
(mode: str, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → pandas.io.pytables.HDFStore[source]¶
-
delete_pandas_hdfstore
(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]¶
-
open_table
(mode: str, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → tables.file[source]¶
-
delete_table
(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]¶ Delete an hdf5 file stored in this data set if it exists.
- Parameters
resultName – the name of the output file
analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.
-
open_hdf5_file
(mode: str, resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → h5py._hl.files.File[source]¶ Open an hdf5 file stored in this data set.
- Parameters
mode – the mode for opening the file, either ‘r’, ‘r+’, ‘w’, ‘w-’, or ‘a’.
resultName – the name of the output file
analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.
- Returns
a h5py file object connected to the hdf5 file
- Raises
FileNotFoundError – if the mode is ‘r’ and the specified hdf5 file does not exist
-
delete_hdf5_file
(resultName: str, analysisTask: Union[merlin.core.analysistask.AnalysisTask, str] = None, resultIndex: int = None, subdirectory: str = None) → None[source]¶ Delete an hdf5 file stored in this data set if it exists.
- Parameters
resultName – the name of the output file
analysisTask – the analysis task that should be associated with this hdf5 file. If None, the file is assumed to be in the data set root.
resultIndex – index of the dataframe to save or None if no index should be specified
subdirectory – subdirectory of the analysis task that the dataframe should be saved to or None if the dataframe should be saved to the root directory for the analysis task.
-
save_json_analysis_result
(analysisResult: Dict, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]¶
-
load_json_analysis_result
(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → Dict[source]¶
-
load_pickle_analysis_result
(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → Dict[source]¶
-
save_pickle_analysis_result
(analysisResult, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None)[source]¶
-
save_numpy_analysis_result
(analysisResult: numpy.ndarray, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]¶
-
save_numpy_txt_analysis_result
(analysisResult: numpy.ndarray, resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → None[source]¶
-
load_numpy_analysis_result
(resultName: str, analysisName: str, resultIndex: int = None, subdirectory: str = None) → numpy.array[source]¶
-
load_numpy_analysis_result_if_available
(resultName: str, analysisName: str, defaultValue, resultIndex: int = None, subdirectory: str = None) → numpy.array[source]¶ Load the specified analysis result or return the specified default value if the analysis result does not exist.
- Parameters
resultName – The name of the analysis result
analysisName – The name of the analysis task the result is saved in
defaultValue – The value to return if the specified analysis result does not exist
resultIndex – The index of the analysis result
subdirectory – The subdirectory within the analysis task that the result is saved in
- Returns
The analysis result or defaultValue if the analysis result doesn’t exist.
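The load-or-default behaviour described above follows a common pattern, sketched here with a JSON file standing in for the numpy result so that the example needs only the standard library. The `load_json_if_available` helper and the file name are hypothetical; this is not how MERlin stores or loads its results.

```python
import json
import os
import tempfile
from typing import Any


def load_json_if_available(path: str, defaultValue: Any) -> Any:
    """Load the result at path, or return defaultValue if it is missing."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return defaultValue


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "result.json")
    # Nothing has been written yet, so the default is returned.
    missing = load_json_if_available(path, defaultValue=[])
    with open(path, "w") as f:
        json.dump([1, 2, 3], f)
    loaded = load_json_if_available(path, defaultValue=[])
```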
-
get_analysis_subdirectory
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str], subdirectory: str = None, create: bool = True) → str[source]¶ analysisTask can be either the class or a string containing the class name.
- Parameters
create – flag indicating if the analysis subdirectory should be created if it does not already exist.
-
save_analysis_task
(analysisTask: merlin.core.analysistask.AnalysisTask, overwrite: bool = False)[source]¶
-
delete_analysis
(analysisTask: Union[merlin.core.analysistask.AnalysisTask, str]) → None[source]¶ Remove all files associated with the provided analysis from this data set.
Before deleting an analysis task, it must be verified that the analysis task is not running.
-
get_analysis_tasks
() → List[str][source]¶ Get a list of the analysis tasks within this dataset.
Returns: A list of the analysis task names.
-
analysis_exists
(analysisTaskName: str) → bool[source]¶ Determine if an analysis task with the specified name exists in this dataset.
-
get_logger
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → logging.Logger[source]¶
-
close_logger
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶
-
get_analysis_environment
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶ Get the environment variables for the system used to run the specified analysis task.
- Parameters
analysisTask – The completed analysis task to get the environment variables for.
fragmentIndex – The fragment index of the analysis task to get the environment variables for.
- Returns
A dictionary of the environment variables. If the job has not yet run, then None is returned.
-
record_analysis_started
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶
-
record_analysis_running
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶
-
record_analysis_complete
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶
-
record_analysis_error
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → None[source]¶
-
get_analysis_start_time
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]¶ Get the time that this analysis task started
- Returns
The start time for the analysis task execution in seconds since the epoch in UTC.
-
get_analysis_complete_time
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]¶ Get the time that this analysis task completed.
- Returns
The completion time for the analysis task execution in seconds since the epoch in UTC.
-
get_analysis_elapsed_time
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → float[source]¶ Get the time that this analysis took to complete.
- Returns
The elapsed time for the analysis task execution in seconds. Returns None if the analysis task has not yet completed.
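The relationship between the start, completion, and elapsed times documented above can be sketched as follows. Both helpers are hypothetical and operate on plain timestamps (seconds since the epoch in UTC) rather than on a DataSet. Representing a missing timestamp as None is an assumption made for this sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def elapsed_seconds(startTime: Optional[float],
                    completeTime: Optional[float]) -> Optional[float]:
    """Return completeTime - startTime, or None if either is missing."""
    if startTime is None or completeTime is None:
        return None
    return completeTime - startTime


def as_utc_datetime(timestamp: float) -> datetime:
    """Convert seconds since the epoch into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


finished = elapsed_seconds(100.0, 160.5)
unfinished = elapsed_seconds(100.0, None)
epoch = as_utc_datetime(0.0)
```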
-
is_analysis_idle
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶
-
check_analysis_started
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶
-
check_analysis_done
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → bool[source]¶
-
analysis_done_filename
(analysisTask: merlin.core.analysistask.AnalysisTask, fragmentIndex: int = None) → str[source]¶
-
-
class
merlin.core.dataset.
ImageDataSet
(dataDirectoryName: str, dataHome: str = None, analysisHome: str = None, microscopeParametersName: str = None)[source]¶ Bases:
merlin.core.dataset.DataSet
-
image_stack_size
(imagePath)[source]¶ Get the size of the image stack stored in the specified image path.
- Returns
a three element list with [width, height, frameCount] or None if the file does not exist
-
-
class
merlin.core.dataset.
MERFISHDataSet
(dataDirectoryName: str, codebookNames: List[str] = None, dataOrganizationName: str = None, positionFileName: str = None, dataHome: str = None, analysisHome: str = None, microscopeParametersName: str = None)[source]¶ Bases:
merlin.core.dataset.ImageDataSet
-
save_codebook
(codebook: merlin.data.codebook.Codebook) → None[source]¶ Store the specified codebook in this dataset.
If a codebook with the same codebook index and codebook name as the specified codebook already exists in this dataset, it is not overwritten.
- Parameters
codebook – the codebook to store
- Raises
FileExistsError – If a codebook with the same codebook index but a different codebook name is already saved within this dataset.
-
load_codebooks
() → List[merlin.data.codebook.Codebook][source]¶ Get all the codebooks stored within this dataset.
- Returns
A list of all the stored codebooks.
-
load_codebook
(codebookIndex: int = 0) → Optional[merlin.data.codebook.Codebook][source]¶ Load the codebook stored within this dataset with the specified index.
- Parameters
codebookIndex – the index of the codebook to load.
- Returns
The codebook stored with the specified codebook index. If no codebook exists with the specified index then None is returned.
-
get_stored_codebook_name
(codebookIndex: int = 0) → Optional[str][source]¶ Get the name of the codebook stored within this dataset with the specified index.
- Parameters
codebookIndex – the index of the codebook whose name should be retrieved.
- Returns
The name of the codebook stored with the specified codebook index. If no codebook exists with the specified index then None is returned.
-
get_codebooks
() → List[merlin.data.codebook.Codebook][source]¶ Get the codebooks associated with this dataset.
- Returns
A list containing the codebooks for this dataset.
-
get_fov_offset
(fov: int) → Tuple[float, float][source]¶ Get the offset of the specified fov in the global coordinate system. This offset is based on the anticipated stage position.
- Parameters
fov – index of the field of view
- Returns
A tuple specifying the x and y offset of the top right corner of the specified fov in pixels.
-
z_index_to_position
(zIndex: int) → float[source]¶ Get the z position associated with the provided z index.
-
position_to_z_index
(zPosition: float) → int[source]¶ Get the z index associated with the specified z position
- Raises
Exception – If the provided z position is not specified in this dataset
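The two z conversions above are inverses of each other, as the sketch below illustrates. It assumes the z positions are available as an ordered sequence. That `zPositions` argument is hypothetical, since the documentation does not describe how the dataset stores them.

```python
from typing import Sequence


def z_index_to_position(zPositions: Sequence[float], zIndex: int) -> float:
    """Return the z position stored at the given z index."""
    return zPositions[zIndex]


def position_to_z_index(zPositions: Sequence[float],
                        zPosition: float) -> int:
    """Return the index of zPosition, raising if it is not present."""
    if zPosition not in zPositions:
        raise Exception("z position %s is not in this dataset" % zPosition)
    return list(zPositions).index(zPosition)


zPositions = [0.0, 1.5, 3.0]  # hypothetical z positions
index = position_to_z_index(zPositions, 1.5)
position = z_index_to_position(zPositions, index)
```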
-
merlin.core.executor module¶
-
class
merlin.core.executor.
Executor
[source]¶ Bases:
object
-
abstract
run
(task: merlin.core.analysistask.AnalysisTask, index: int = None, rerunCompleted: bool = False) → None[source]¶ Run an analysis task.
This method will not run analysis tasks that are already running, and analysis that terminated early, due to an error or otherwise, will not be restarted.
- Parameters
task – the analysis task to run.
index – index of the analysis to run for a parallel analysis task.
rerunCompleted – flag indicating if previous analysis should be run again even if it has previously completed. If rerunCompleted is True, analysis will be run on the task regardless of its status. If rerunCompleted is False, analysis will only be run on the task or fragments of the task that have either not been started or have previously completed in error.
-
abstract
-
class
merlin.core.executor.
LocalExecutor
(coreCount=None)[source]¶ Bases:
merlin.core.executor.Executor
-
run
(task: merlin.core.analysistask.AnalysisTask, index: int = None, rerunCompleted: bool = False) → None[source]¶ Run an analysis task.
This method will not run analysis tasks that are already running, and analysis that terminated early, due to an error or otherwise, will not be restarted.
- Parameters
task – the analysis task to run.
index – index of the analysis to run for a parallel analysis task.
rerunCompleted – flag indicating if previous analysis should be run again even if it has previously completed. If rerunCompleted is True, analysis will be run on the task regardless of its status. If rerunCompleted is False, analysis will only be run on the task or fragments of the task that have either not been started or have previously completed in error.
-