conditional workflows #15

Open · wants to merge 1 commit into base: `main`
105 changes: 105 additions & 0 deletions README.md
@@ -0,0 +1,105 @@
# Common Workflow Language (CWL) Workflows

A CWL feature-extraction workflow for imaging datasets.

## Workflow Steps:

Create a [Conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment) environment with Python `>=3.9,<3.12`.

#### 1. Install polus-plugins.

- Clone the image-tools repository:
  `git clone https://github.com/camilovelezr/image-tools.git ../`
- `cd image-tools`
- Create a local `hd2` branch tracking the remote branch:
  `git checkout -b hd2 remotes/origin/hd2`
- `pip install .`

#### 2. Install workflow-inference-compiler.
- Clone the workflow-inference-compiler repository:
  `git clone https://github.com/camilovelezr/workflow-inference-compiler.git ../`
- `cd workflow-inference-compiler`
- Create a local `hd2` branch tracking the remote branch:
  `git checkout -b hd2 remotes/origin/hd2`
- `pip install -e ".[all]"`

#### 3. Install image-workflows.
- `cd image-workflows`
- `poetry install`

#### Note:
Ensure that [Docker Desktop](https://www.docker.com/products/docker-desktop/) is running in the background. To verify that it is operational, run:
`docker run -d -p 80:80 docker/getting-started`
This launches the `docker/getting-started` container in detached mode (`-d` flag), exposing port 80 on your local machine (`-p 80:80`). It is a simple way to confirm that Docker Desktop is functioning correctly.

## Details
This workflow integrates eight distinct plugins: it retrieves data from the [Broad Bioimage Benchmark Collection](https://bbbc.broadinstitute.org/), renames files, corrects uneven illumination, segments nuclear objects, and culminates in extracting features from the identified objects.

The plugins employed in the workflow are:
1. [bbbc-download-plugin](https://github.com/saketprem/polus-plugins/tree/bbbc_download/utils/bbbc-download-plugin)
2. [file-renaming-tool](https://github.com/PolusAI/image-tools/tree/master/formats/file-renaming-tool)
3. [ome-converter-tool](https://github.com/PolusAI/image-tools/tree/master/formats/ome-converter-tool)
4. [basic-flatfield-estimation-tool](https://github.com/PolusAI/image-tools/tree/master/regression/basic-flatfield-estimation-tool)
5. [apply-flatfield-tool](https://github.com/PolusAI/image-tools/tree/master/transforms/images/apply-flatfield-tool)
6. [kaggle-nuclei-segmentation](https://github.com/hamshkhawar/image-tools/tree/kaggle-nuclei_seg/segmentation/kaggle-nuclei-segmentation)
7. [polus-ftl-label-plugin](https://github.com/hamshkhawar/image-tools/tree/kaggle-nuclei_seg/transforms/images/polus-ftl-label-plugin)
8. [nyxus-plugin](https://github.com/PolusAI/image-tools/tree/kaggle-nuclei_seg/features/nyxus-plugin)

## Execute CWL workflows
Two different CWL workflows can be executed for specific datasets:
1. segmentation
2. analysis

The segmentation workflow uses plugins `1 to 7`, while the analysis workflow uses plugins `1 to 8`.
To run a workflow on a new dataset, use a sample YAML file to supply the parameter values. Save this YAML file in the desired subdirectory of the `configuration` folder with the name `dataset.yml`.

To run a workflow without background correction, set `background_correction` to `false`; the workflow will then skip steps `4 and 5`.
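This conditional step selection can be sketched as follows; the step names below are illustrative placeholders, not the actual plugin classes used by the compiler:

```python
# Illustrative sketch of how the workflow type and the
# background_correction flag determine which steps run.
STEPS = [
    "bbbc_download",         # 1
    "file_renaming",         # 2
    "ome_converter",         # 3
    "basic_flatfield",       # 4 (background correction)
    "apply_flatfield",       # 5 (background correction)
    "nuclei_segmentation",   # 6
    "ftl_label",             # 7
    "nyxus",                 # 8 (analysis only)
]


def select_steps(workflow: str, background_correction: bool) -> list[str]:
    """Return the ordered list of steps for the chosen workflow."""
    # The analysis workflow uses all eight steps; segmentation stops at step 7.
    steps = list(STEPS) if workflow == "analysis" else STEPS[:7]
    if not background_correction:
        # Skip the flatfield estimation/application steps (4 and 5).
        steps = [s for s in steps if s not in ("basic_flatfield", "apply_flatfield")]
    return steps
```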

`python -m polus.image.workflows --name="BBBC001" --workflow=analysis`

A directory named `outputs` is generated, containing the CLTs for each plugin and the YAML files; all step outputs are stored within its `outdir` subdirectory.
```
outputs
├── experiment
│   ├── cwl_adapters
│   ├── experiment.cwl
│   └── experiment.yml
└── outdir
    └── experiment
        ├── step 1 BbbcDownload
        │   └── outDir
        │       └── bbbc.outDir
        │           └── BBBC
        │               └── BBBC039
        │                   └── raw
        │                       ├── Ground_Truth
        │                       │   ├── masks
        │                       │   └── metadata
        │                       └── Images
        │                           └── images
        ├── step 2 FileRenaming
        │   └── outDir
        │       └── rename.outDir
        ├── step 3 OmeConverter
        │   └── outDir
        │       └── ome_converter.outDir
        ├── step 4 BasicFlatfieldEstimation
        │   └── outDir
        │       └── estimate_flatfield.outDir
        ├── step 5 ApplyFlatfield
        │   └── outDir
        │       └── apply_flatfield.outDir
        ├── step 6 KaggleNucleiSegmentation
        │   └── outDir
        │       └── kaggle_nuclei_segmentation.outDir
        ├── step 7 FtlLabel
        │   └── outDir
        │       └── ftl_plugin.outDir
        └── step 8 NyxusPlugin
            └── outDir
                └── nyxus_plugin.outDir
```
#### Note:
Steps 7 and 8 are executed only for the `analysis` workflow.
Empty file added configuration/__init__.py
Empty file.
14 changes: 14 additions & 0 deletions configuration/analysis/BBBC001.yml
@@ -0,0 +1,14 @@
---
name: BBBC001
file_pattern: /.*/.*/.*/Images/.*/.*_{row:c}{col:dd}f{f:dd}d{channel:d}.tif
out_file_pattern: x{row:dd}_y{col:dd}_p{f:dd}_c{channel:d}.tif
image_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c{c:d}.ome.tif
seg_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c0.ome.tif
ff_pattern: "x00_y03_p0\\(0-5\\)_c{c:d}_flatfield.ome.tif"
df_pattern: "x00_y03_p0\\(0-5\\)_c{c:d}_darkfield.ome.tif"
group_by: c
map_directory: false
features: ALL
file_extension: pandas
background_correction: false

13 changes: 13 additions & 0 deletions configuration/analysis/BBBC039.yml
@@ -0,0 +1,13 @@
---
name: BBBC039
file_pattern: /.*/.*/.*/Images/.*/.*_{row:c}{col:dd}_s{s:d}_w{channel:d}.*.tif
out_file_pattern: x{row:dd}_y{col:dd}_p{s:dd}_c{channel:d}.tif
image_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c{c:d}.ome.tif
seg_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c1.ome.tif
ff_pattern: "x\\(00-15\\)_y\\(01-24\\)_p0\\(1-9\\)_c{c:d}_flatfield.ome.tif"
df_pattern: "x\\(00-15\\)_y\\(01-24\\)_p0\\(1-9\\)_c{c:d}_darkfield.ome.tif"
group_by: c
map_directory: false
features: "ALL_INTENSITY"
file_extension: pandas
background_correction: false
Empty file.
13 changes: 13 additions & 0 deletions configuration/analysis/sample.yml
@@ -0,0 +1,13 @@
---
name:
file_pattern:
out_file_pattern:
image_pattern:
seg_pattern:
ff_pattern:
df_pattern:
group_by:
map_directory:
features:
file_extension:
background_correction:
11 changes: 11 additions & 0 deletions configuration/segmentation/BBBC001.yml
@@ -0,0 +1,11 @@
---
name: BBBC001
file_pattern: /.*/.*/.*/Images/.*/.*_{row:c}{col:dd}f{f:dd}d{channel:d}.tif
out_file_pattern: x{row:dd}_y{col:dd}_p{f:dd}_c{channel:d}.tif
image_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c{c:d}.ome.tif
seg_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c0.ome.tif
ff_pattern: "x00_y03_p0\\(0-5\\)_c{c:d}_flatfield.ome.tif"
df_pattern: "x00_y03_p0\\(0-5\\)_c{c:d}_darkfield.ome.tif"
group_by: c
map_directory: false
background_correction: false
11 changes: 11 additions & 0 deletions configuration/segmentation/BBBC039.yml
@@ -0,0 +1,11 @@
---
name: BBBC039
file_pattern: /.*/.*/.*/Images/.*/.*_{row:c}{col:dd}_s{s:d}_w{channel:d}.*.tif
out_file_pattern: x{row:dd}_y{col:dd}_p{s:dd}_c{channel:d}.tif
image_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c{c:d}.ome.tif
seg_pattern: x{x:dd}_y{y:dd}_p{p:dd}_c1.ome.tif
ff_pattern: "x\\(00-15\\)_y\\(01-24\\)_p0\\(1-9\\)_c{c:d}_flatfield.ome.tif"
df_pattern: "x\\(00-15\\)_y\\(01-24\\)_p0\\(1-9\\)_c{c:d}_darkfield.ome.tif"
group_by: c
map_directory: false
background_correction: false
Empty file.
12 changes: 12 additions & 0 deletions configuration/segmentation/sample.yml
@@ -0,0 +1,12 @@
---
name:
file_pattern:
out_file_pattern:
image_pattern:
seg_pattern:
ff_pattern:
df_pattern:
group_by:
map_directory:
features:
file_extension:
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
38 changes: 38 additions & 0 deletions pyproject.toml
@@ -0,0 +1,38 @@
[tool.poetry]
name = "polus-image-workflows"
version = "0.1.1-dev1"
description = "Build and execute pipelines of polus plugins on Compute."
authors = ["Hamdah Shafqat Abbasi <[email protected]>"]
readme = "README.md"
packages = [{include = "polus", from = "src"}]

[tool.poetry.dependencies]
python = ">=3.9,<3.12"
typer = "^0.9.0"
pyyaml = "^6.0.1"
pydantic = "^2.6.1"
cwl-utils = "0.31"
toil = "^5.12"
polus-plugins = {path = "../image-tools", develop = true}
workflow-inference-compiler = {path = "../workflow-inference-compiler", develop = true}

[tool.poetry.group.dev.dependencies]
jupyter = "^1.0.0"
nbconvert = "^7.11.0"
pytest = "^7.4.4"
bump2version = "^1.0.1"
pre-commit = "^3.3.3"
black = "^23.3.0"
ruff = "^0.0.274"
mypy = "^1.4.0"
pytest-xdist = "^3.3.1"
pytest-sugar = "^0.9.7"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.pytest.ini_options]
addopts = [
"--import-mode=importlib",
]
Empty file.
65 changes: 65 additions & 0 deletions src/polus/image/workflows/__main__.py
@@ -0,0 +1,65 @@
"""CWL Workflow."""
import logging
from pathlib import Path

import typer

from polus.image.workflows.utils import LoadYaml
from workflows.cwl_analysis import CWLAnalysisWorkflow
from workflows.cwl_nuclear_segmentation import CWLSegmentationWorkflow

app = typer.Typer()

# Initialize the logger
logging.basicConfig(
    format="%(asctime)s - %(name)-8s - %(levelname)-8s - %(message)s",
    datefmt="%d-%b-%y %H:%M:%S",
)
logger = logging.getLogger("WIC Python API")
logger.setLevel(logging.INFO)


@app.command()
def main(
    name: str = typer.Option(
        ...,
        "--name",
        "-n",
        help="Name of an imaging dataset from the Broad Bioimage Benchmark Collection (https://bbbc.broadinstitute.org/image_sets)",
    ),
    workflow: str = typer.Option(
        ...,
        "--workflow",
        "-w",
        help="Name of the CWL workflow",
    ),
) -> None:
    """Execute a CWL workflow."""
    logger.info(f"name = {name}")
    logger.info(f"workflow = {workflow}")

    # Resolve the dataset configuration relative to the repository root.
    config_path = Path(__file__).parents[4].joinpath(f"configuration/{workflow}/{name}.yml")
    logger.info(f"config_path = {config_path}")

    model = LoadYaml(workflow=workflow, config_path=config_path)
    params = model.parse_yaml()

    if workflow == "analysis":
        logger.info(f"Executing the {workflow} workflow")
        model = CWLAnalysisWorkflow(**params)
        model.workflow()
    elif workflow == "segmentation":
        logger.info(f"Executing the {workflow} workflow")
        model = CWLSegmentationWorkflow(**params)
        model.workflow()

    logger.info("Completed CWL workflow")


if __name__ == "__main__":
    app()
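The config-path resolution used in `main` (climbing from the module file up to the repository root with `pathlib`) can be sketched in isolation; the paths below are illustrative, not the real install location:

```python
from pathlib import Path

# Illustrative layout: <repo>/src/polus/image/workflows/__main__.py
module_file = Path("/repo/src/polus/image/workflows/__main__.py")

# parents[0] is .../workflows, parents[1] is .../image, and so on,
# so parents[4] is the repository root that holds `configuration/`.
repo_root = module_file.parents[4]
config_path = repo_root / "configuration" / "analysis" / "BBBC001.yml"
```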
68 changes: 68 additions & 0 deletions src/polus/image/workflows/utils.py
@@ -0,0 +1,68 @@
"""Utilities for loading and validating dataset configurations."""
from pathlib import Path
from typing import Dict
from typing import Union

import pydantic
import yaml

GITHUB_TAG = "https://raw.githubusercontent.com"


ANALYSIS_KEYS = [
    "name", "file_pattern", "out_file_pattern", "image_pattern", "seg_pattern",
    "ff_pattern", "df_pattern", "group_by", "map_directory", "features",
    "file_extension", "background_correction",
]
SEG_KEYS = [
    "name", "file_pattern", "out_file_pattern", "image_pattern", "seg_pattern",
    "ff_pattern", "df_pattern", "group_by", "map_directory",
    "background_correction",
]


class DataModel(pydantic.BaseModel):
    data: Dict[str, Dict[str, Union[str, bool]]]


class LoadYaml(pydantic.BaseModel):
    """Validation of a dataset YAML configuration."""

    workflow: str
    config_path: Union[str, Path]

    @pydantic.validator("config_path", pre=True)
    @classmethod
    def validate_path(cls, value: Union[str, Path]) -> Union[str, Path]:
        """Validate that the configuration path exists."""
        if not Path(value).exists():
            msg = f"{value} does not exist! Please check it again."
            raise ValueError(msg)
        if isinstance(value, str):
            return Path(value)
        return value

    @pydantic.validator("workflow", pre=True)
    @classmethod
    def validate_workflow_name(cls, value: str) -> str:
        """Validate the workflow name."""
        if value not in ["analysis", "segmentation", "visualization"]:
            msg = "Please choose a valid workflow name, i.e. analysis, segmentation, or visualization."
            raise ValueError(msg)
        return value

    def parse_yaml(self) -> Dict[str, Union[str, bool]]:
        """Parse the YAML configuration file for a dataset."""
        with open(self.config_path, "r") as f:
            data = yaml.safe_load(f)

        # A parameter left blank in the YAML file is loaded as None.
        if any(value is None for value in data.values()):
            msg = "Not all parameters are defined! Please check them again."
            raise ValueError(msg)

        if self.workflow == "analysis":
            if data["background_correction"] is True and list(data.keys()) != ANALYSIS_KEYS:
                msg = "Please check the parameters again for the analysis workflow!"
                raise ValueError(msg)

        if self.workflow == "segmentation":
            if data["background_correction"] is True and list(data.keys()) != SEG_KEYS:
                msg = "Please check the parameters again for the segmentation workflow!"
                raise ValueError(msg)
        return data
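The missing-parameter check in `parse_yaml` (a blank YAML value loads as `None`) can be exercised in isolation; `check_missing` is a hypothetical helper name introduced here for illustration, not part of the module:

```python
from typing import Any, Dict, List


def check_missing(data: Dict[str, Any]) -> List[str]:
    """Return the names of parameters whose value was left blank (None)."""
    # Only None counts as missing; falsy-but-defined values like False or "" are valid.
    return [key for key, value in data.items() if value is None]


# A config with one blank parameter, as yaml.safe_load would produce it.
config = {"name": "BBBC001", "group_by": "c", "features": None}
missing = check_missing(config)
```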
Empty file added workflows/__init__.py
Empty file.