# Python Workflow API The Python Workflow API is the primary way to author Sophios workflows. It lets you describe a workflow in the same place where you already have names, functions, imports, tests, and reusable Python helpers, while still compiling to portable CWL artifacts. The API is intentionally centered on a focused set of concepts: - a `CommandLineTool` describes one executable tool, - a `Step` places that tool inside a workflow graph, - a `Workflow` owns the graph, named outputs, and compiled artifacts, - assignments such as `cat.inputs.file = touch.outputs.file` record workflow bindings rather than executing commands immediately. If you remember one rule, make it this: Python describes the graph; compilation turns that graph into CWL and job inputs; execution happens after that. ## Scope This page covers: - what a `Step` represents, - what a `Workflow` represents, - how literal values bind to step inputs, - how a step output becomes a later step input, - what compilation produces, - how to compile and inspect workflow artifacts, - when to run locally, - when to keep compiled CWL in memory for submission payload construction. It does not cover every CWL feature. Advanced YAML controls, static dispatch, and program synthesis are documented separately in [Advanced YAML and Operations](advanced.md). ## Working Directory Assumption Most examples in this guide assume you are running from the repository root: ```bash cd /path/to/sophios ``` That makes checked-in adapter paths such as `cwl_adapters/echo.cwl` resolve cleanly. When writing reusable scripts, prefer `Path` objects over fragile string paths: ```python from pathlib import Path echo_tool = Path("cwl_adapters") / "echo.cwl" ``` ## Step 1: Load a Tool as a Step A `Step` is a CWL `CommandLineTool` placed inside a workflow context. ```python from pathlib import Path from sophios.apis.python.workflow import Step echo = Step(Path("cwl_adapters") / "echo.cwl") ``` At this point, Sophios has loaded the tool contract. It knows the tool has an input named `message` and an output named `stdout`. You can inspect the ports: ```python for input_port in echo.inputs: print(input_port.name, input_port.parameter_type) for output_port in echo.outputs: print(output_port.name, output_port.parameter_type) ``` This interface is the contract Sophios uses for validation, binding, edge inference, and generated CWL. ## Step 2: Bind a Literal Input Bind a value to a step input: ```python echo.inputs.message = "hello from Sophios" ``` That assignment does not run the tool. It records a workflow binding: ```text echo.message <- "hello from Sophios" ``` Sophios will later turn that binding into the appropriate generated CWL job input. The older shorthand still works: ```python echo.message = "hello from Sophios" ``` The docs use the explicit form because it is clearer: - `step.inputs.` is where values enter a step, - `step.outputs.` is where values leave a step. ## Step 3: Put Steps in a Workflow Wrap the step in a workflow: ```python from sophios.apis.python.workflow import Workflow workflow = Workflow([echo], "hello_python") ``` A workflow has a name and an ordered list of children. Children may be concrete steps or nested workflows. Compile without running: ```python workflow.write_artifacts() ``` Run locally: ```python workflow.run() ``` The minimal complete example is therefore: ```python from pathlib import Path from sophios.apis.python.workflow import Step, Workflow def build_workflow() -> Workflow: echo = Step(Path("cwl_adapters") / "echo.cwl") echo.inputs.message = "hello from Sophios" return Workflow([echo], "hello_python") if __name__ == "__main__": build_workflow().run() ``` This example keeps the moving parts visible. A runnable workflow needs a tool, an input binding, and a `Workflow` that can compile and run. ## Step 4: Link Two Steps Most workflows become useful when one step consumes another step's output. ```python from pathlib import Path from sophios.apis.python.workflow import Step, Workflow touch = Step(Path("cwl_adapters") / "touch.cwl") touch.inputs.filename = "empty.txt" append = Step(Path("cwl_adapters") / "append.cwl") append.inputs.file = touch.outputs.file append.inputs.str = "Hello" cat = Step(Path("cwl_adapters") / "cat.cwl") cat.inputs.file = append.outputs.file workflow = Workflow([touch, append, cat], "multistep_python") workflow.write_artifacts() ``` Read the bindings as arrows: ```text touch.outputs.file -> append.inputs.file append.outputs.file -> cat.inputs.file ``` The Python assignment creates the edge. You do not need to hand-write a CWL `source` field. ## Linear Edge Inference The Python API can also use the same edge inference mechanism as `.wic` workflows. In a linear workflow, if a required step input is not bound, Sophios passes that missing input to the compiler. The compiler then looks backward through earlier steps and connects the most recent compatible output. That means this also works: ```python touch = Step(Path("cwl_adapters") / "touch.cwl") touch.inputs.filename = "empty.txt" append = Step(Path("cwl_adapters") / "append.cwl") append.inputs.str = "Hello" cat = Step(Path("cwl_adapters") / "cat.cwl") workflow = Workflow([touch, append, cat], "multistep_python") workflow.write_artifacts() ``` Here Sophios infers: ```text touch.outputs.file -> append.inputs.file append.outputs.file -> cat.inputs.file ``` Use inference when the workflow is a clear linear chain and the generated graph is easy to inspect. Prefer explicit bindings when multiple prior outputs could match, when names matter, or when the workflow definition must make data flow unambiguous without relying on generated artifacts. ## Binding Types There are two common input binding patterns. | Binding | Python shape | Meaning | | --- | --- | --- | | Literal value | `step.inputs.message = "hello"` | The value is known now. | | Step output | `cat.inputs.file = append.outputs.file` | The value comes from an earlier step. | This table is one of the most important concepts in the Python API. Most workflow code is a readable sequence of these bindings. ## What Bindings Become Python assignments in a Sophios workflow are declarative. They do not execute a command when the assignment runs. They record enough information for the compiler to generate a CWL workflow and a CWL job input object. For example, this literal binding: ```python echo.inputs.message = "hello" ``` becomes part of the generated job input data. Sophios keeps the value out of the CWL tool definition so the compiled workflow and runtime inputs remain separate. This edge binding: ```python cat.inputs.file = append.outputs.file ``` becomes a CWL workflow `source` relationship. The compiled workflow connects the concrete output port from the `append` step to the concrete input port on the `cat` step. The practical rule is: write Python that describes the graph, then inspect the generated CWL when exact execution behavior matters. ## Named Workflow Outputs Outputs should also be deliberate. ```python workflow.outputs.captured_stdout = echo.outputs.stdout ``` That line exposes the step output as a stable workflow result. Use named workflow outputs when a downstream user, test, or execution service needs a stable result name. ## Export `.wic` From Python Every `Workflow` can export a `.wic` source representation. For a quick in-memory view, inspect `workflow.yaml`: ```python print(workflow.yaml) ``` This shows the file representation Sophios can derive from the current Python workflow. It is useful when debugging surprising bindings before compilation or when you want to compare Python-authored workflows with `.wic` workflows. When you want a real `.wic` file, use `write_wic()`: ```python workflow.write_wic("hello_python.wic") ``` This writes a source `.wic` workflow from the Python object. It does not compile the workflow and it does not write generated CWL. Literal bindings, named outputs, explicit edges, and intentionally unbound linear inputs are preserved in the `.wic` representation so the normal Sophios compiler can still apply edge inference later. If you need the text instead of a file: ```python wic_text = workflow.to_wic() ``` For nested workflows, `write_wic()` embeds subworkflows in the root document by default. If you want a sibling-file tree instead, pass `inline_subworkflows=False`: ```python workflow.write_wic("workflows", inline_subworkflows=False) ``` ## Compile Paths Sophios gives you two common compile paths. Compilation is the point where the high-level Python workflow becomes concrete CWL. During compilation, Sophios validates the workflow structure, resolves explicit bindings, applies edge inference where inputs were intentionally left unbound, emits runtime input data for literal values, and produces a CWL workflow document that can be inspected or executed by a CWL runner. ### Write Artifacts to Disk ```python workflow.write_artifacts() ``` Use this when you want to inspect generated files. Typical artifacts include: - `autogenerated/.cwl`, - `autogenerated/_inputs.yml`, - Graphviz files when graph output is enabled. This is the best path when generated artifacts need to be reviewed, committed to a test fixture, or inspected during debugging. `workflow.compile(write_to_disk=True)` is still available. `write_artifacts()` is the clearer public method for the common "compile and inspect files" path. Neither method writes intermediate `.wic` compiler trees by default. Use `workflow.write_wic(...)` when you want a source `.wic` file. ### Keep Compiled CWL in Memory ```python compiled = workflow.get_cwl_workflow() ``` Use this when the next step is another Python operation, such as packaging a submission payload. The returned object contains: - the compiled CWL workflow fields, - the workflow name, - generated `yaml_inputs`. Submission payloads commonly expect the CWL workflow document and job inputs as separate pieces. The compute payload guide shows one concrete service-oriented payload shape in detail. ## Running Locally Run locally with: ```python workflow.run() ``` Sophios supports two local CWL runners through `Workflow.run()`: - `cwltool`, the default local runner, - `toil-cwl-runner`, for local execution through Toil. Runtime options can be passed through `run_args_dict`: ```python workflow.run( run_args_dict={ "cwl_runner": "cwltool", "container_engine": "docker", "copy_output_files": "yes", } ) ``` To run the same workflow with Toil, switch the runner: ```python workflow.run( run_args_dict={ "cwl_runner": "toil-cwl-runner", "container_engine": "docker", "logLevel": "INFO", } ) ``` Sophios uses recognized local-run options for setup and passes runner-specific key/value options to the selected CWL runner. See [Running a Multistep Workflow Locally](multistep_runner.md) for a complete example. Local execution is best for: - verifying development-scale workflows, - verifying tool contracts, - checking generated outputs before submission, - verifying end-to-end behavior in a controlled environment. If the workflow needs a large container image or special hardware, compile first and inspect the generated artifacts before attempting a full run. ## Nested Workflows A workflow can contain another workflow: ```python preprocess = Workflow([touch, append], "preprocess") report = Workflow([preprocess, cat], "report") ``` Nested workflows are how large pipelines are split into named components. A subworkflow can have its own inputs, outputs, tests, and documentation. Use nesting when a group of steps has a meaningful name: - "preprocess", - "segment", - "measure", - "summarize", - "submit". ## Scatter and Conditionals CWL supports scatter and conditional execution. Sophios exposes those controls through step-level settings: ```python echo.scatter_on(echo.inputs.message, method="dotproduct") ``` ```python echo.when = "$(inputs.message != '')" ``` These features are powerful and should be used with explicit review of the generated CWL. Scatter changes the shape of values. Conditionals can make outputs nullable. Both affect downstream type compatibility and runtime behavior. The current lightweight `when_pyapi.py` example runs locally, but CWL correctly warns that outputs from conditional steps may be `null`. That warning is expected: conditional execution changes the output type contract. The lightweight `scatter_pyapi.py` example uses a CWL `Any` output because the array-producing tool is intentionally generic. Sophios treats `Any` as permissive for explicit Python bindings while keeping compiler edge inference conservative. ## Generated Files When you compile or run with disk output enabled, Sophios writes useful artifacts: - `autogenerated/.cwl`: compiled root CWL workflow. - `autogenerated/_inputs.yml`: generated job inputs. - `autogenerated/schemas/`: generated schemas for `.wic` validation. - `cachedir*`: CWL runner caches and intermediate files. - `provenance/`: CWL provenance data unless disabled. - `outdir/`: copied primary outputs when output copying is enabled. - `error_.txt`: tracebacks from compilation or runner startup failures. Intermediate compiler `.wic` trees are not written under `autogenerated/` during normal compilation. The CLI writes them only when explicitly invoked with `--write_intermediate_wic`. These files are the main debugging surface. A production workflow should be reviewed through its generated artifacts, not only through the Python code. ## Troubleshooting When a workflow fails: - Compile before running. If compilation fails, the issue is in the workflow structure or tool contract. - Inspect `workflow.yaml` to check bindings before compilation. - Inspect `autogenerated/.cwl` and the generated inputs after compilation. - Re-run without quiet settings if you need full CWL runner output. - Check whether a container image is being pulled. - Delete `autogenerated/`, `cachedir*`, `outdir/`, and `provenance/` when you need a clean local run. - Replace advanced controls with simpler bindings when isolating a failure. ## Maintained Examples The following examples are intended to be quick to read and quick to run: - `examples/scripts/helloworld_pyapi.py`: minimal Python workflow. - `examples/scripts/multistep1_pyapi.py`: step-to-step file flow. - `examples/scripts/multistep1_toJson_pyapi.py`: in-memory compiled workflow JSON. - `examples/scripts/multistep_runner_pyapi.py`: local runner selection for a multistep workflow. - `examples/scripts/reusable_interface_pyapi.py`: reusable workflow interface example. - `examples/scripts/scatter_pyapi.py`: scatter over array-valued bindings. - `examples/scripts/when_pyapi.py`: conditional execution. - `examples/scripts/tool_builder_workflow.py`: generated CLTs composed in memory. - `examples/scripts/compute_payload_workflow.py`: Python workflow to validated compute payload. The Ichnaea and SAM3 walkthroughs are larger, production-oriented examples with heavier runtime assumptions. ## Next Steps Continue with: - [Building Tool Contracts in Python](tool_builder_sam3.md) to author tools. - [Using Tool Builder and the Workflow Python API Together](tool_builder_workflow.md) to build tools in memory and compose them immediately. - [From Python Workflow to Compute Payload](compute_payload_workflow.md) to prepare validated submission payloads from compiled workflows. - [Advanced YAML and Operations](advanced.md) when you need `.wic` files, schema validation, inference controls, or audit-friendly artifacts.