# Advanced YAML and Operations

The `.wic` YAML format is an advanced Sophios usage mode. The recommended authoring path is Python. YAML becomes valuable when you need a workflow to be a plain file: easy to archive, diff, validate, run headlessly, or inspect without importing project-specific Python code.

Use this guide when you need:

- standalone `.wic` workflows,
- CI or batch execution from files,
- audit-friendly workflow artifacts,
- explicit inspection of inference and generated CWL,
- schema-backed editor validation,
- low-level Sophios features such as namespaces, static dispatch, and metadata annotations.

## Minimal `.wic` Workflow

```yaml
steps:
  - echo:
      in:
        message: !ii Hello World
```

Run it from the command line:

```bash
sophios --yaml docs/tutorials/helloworld.wic --run_local --copy_output_files
```

Compile without running:

```bash
sophios --yaml docs/tutorials/helloworld.wic --generate_cwl_workflow
```

The generated CWL and input artifacts are written under `autogenerated/`.

## Why YAML Is Still Useful

YAML is the file-native Sophios workflow representation. It is excellent for operational workflows where the workflow definition should be reviewed, stored, validated, or executed without importing project-specific Python modules.

YAML gives you:

- a single workflow file that can be reviewed without executing Python,
- reproducible headless commands for CI and batch systems,
- stable artifacts for audits and papers,
- direct access to inference tags, anchors, and metadata,
- schema validation in editors,
- optional compiler-internal `.wic` trees when debugging the compiler.

## CLI Modes

Common commands:

```bash
sophios --yaml workflow.wic --generate_cwl_workflow
sophios --yaml workflow.wic --run_local
sophios --yaml workflow.wic --generate_run_script
sophios --generate_schemas
sophios --generate_config
```

Intermediate compiler `.wic` trees are not written by default.
If you need them while debugging the compiler, opt in explicitly:

```bash
sophios --yaml workflow.wic --generate_cwl_workflow --write_intermediate_wic
```

Useful flags:

- `--graphviz`: write Graphviz sources and rendered diagrams when `dot` is available.
- `--inputs_file`: merge extra job inputs into generated CWL inputs.
- `--copy_output_files`: copy primary outputs into `outdir/`.
- `--cwl_runner toil-cwl-runner`: run locally with Toil instead of `cwltool`.
- `--container_engine podman`: use Podman instead of Docker.
- `--inference_use_naming_conventions`: refine edge inference with naming rules.
- `--insert_steps_automatically`: attempt limited automatic insertion when inference fails.

## Configuration and Discovery

Sophios discovers CWL tools and `.wic` workflows from a JSON config file. The main keys are:

```json
{
  "search_paths_cwl": {
    "global": ["/path/to/cwl_adapters"]
  },
  "search_paths_wic": {
    "global": ["/path/to/workflows"]
  }
}
```

If you do not pass `--config_file`, Sophios uses `~/wic/global_config.json`.

Generate a starter config with:

```bash
sophios --generate_config
```

Practical discovery rules:

- Discovery is recursive.
- CWL tools are keyed by filename stem inside a namespace.
- `.wic` workflows are keyed by filename stem inside a namespace.
- Duplicate stems in the same namespace overwrite earlier discoveries in memory.
- Absolute paths are safer than relative paths because relative paths are resolved from the current working directory.

## Inline Inputs

Use `!ii` when a value is known directly in the workflow file:

```yaml
steps:
  - echo:
      in:
        message: !ii Hello World
```

Sophios extracts inline values into the generated CWL job inputs document during compilation. This keeps simple workflows compact while still producing normal CWL execution artifacts.

## Explicit Edges

Use anchors when inference is not the right communication tool.
```yaml
steps:
  - touch:
      in:
        filename: !ii empty.txt
      out:
        - file: !& created_file
  - cat:
      in:
        file: !* created_file
```

The output anchor `!& created_file` is consumed later with `!* created_file`. This notation is intentionally similar to YAML anchors, but it is specific to Sophios workflow edges.

## Edge Inference

Sophios can infer many edges by comparing input and output types and formats. The compiler only connects a step input to outputs that already exist from earlier steps.

This is the same compiler mechanism used by `.wic` workflows and Python workflows. In Python, leaving a required step input unbound allows the compiler to infer that edge during compilation.

At a high level:

1. Look backward through previous step outputs.
2. Compare CWL type and format.
3. Prefer the most recent compatible output.
4. Use the first compatible match when multiple candidates remain.

Inference reduces boilerplate, but it is not a substitute for review. Generated DAGs and generated CWL should be inspected when correctness matters.

### Naming Conventions

When `--inference_use_naming_conventions` is enabled, Sophios can refine matches with rename rules from the config file:

```json
"renaming_conventions": [
  ["energy_", "edr_"],
  ["structure_", "tpr_"],
  ["traj_", "trr_"]
]
```

This can make inference more precise in domains where input and output names follow predictable conventions. It can also be misleading when file conversions or repeated formats make the "nearest" value different from the intended value. Use generated diagrams and explicit anchors when ambiguity matters.

### Inference Rules

`inference_rules` can customize matching for specific formats:

```json
"inference_rules": {
  "edam:format_3881": "continue",
  "edam:format_3987": "continue",
  "edam:format_3878": "break",
  "edam:format_2033": "break"
}
```

The implemented `break` rule stops the search at the current compatible output. This is useful when older outputs are technically compatible but should not be considered.
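The four high-level steps listed under Edge Inference can be illustrated with a small standalone sketch. This is not the Sophios implementation: the `Output` record and the `infer_source` function are hypothetical names, compatibility is reduced to exact equality of type and format, an input without a format is assumed to accept any format, and naming conventions and `inference_rules` are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Output:
    """One output produced by an earlier step (hypothetical record)."""
    step: str
    name: str
    cwl_type: str
    fmt: Optional[str] = None


def infer_source(previous_outputs: Sequence[Output],
                 cwl_type: str,
                 fmt: Optional[str] = None) -> Optional[Output]:
    """Return the most recent earlier output compatible with an input, or None."""
    # Walk backward so the most recently produced output is considered first.
    for candidate in reversed(previous_outputs):
        same_type = candidate.cwl_type == cwl_type
        # Assumption: an input that declares no format accepts any format.
        same_format = fmt is None or candidate.fmt == fmt
        if same_type and same_format:
            # The first compatible match wins; older outputs are not examined.
            return candidate
    return None


# Outputs in the order the earlier steps produced them (illustrative values).
outputs = [
    Output("touch", "file", "File", "edam:format_2033"),
    Output("convert", "output", "File", "edam:format_3881"),
    Output("count", "total", "int"),
]

match = infer_source(outputs, "File", "edam:format_2033")
print(match.step if match else "no compatible output")
```

Because the scan runs newest to oldest, a later step that emits the same type and format silently takes precedence over an earlier one. That is the situation where an explicit anchor is the safer choice.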
## Namespaces

Namespaces distinguish tools or workflows that share the same filename stem.

Example config:

```json
{
  "search_paths_wic": {
    "global": ["workflows/default"],
    "alternate": ["workflows/collaborator"]
  }
}
```

Use a namespaced workflow at the call site:

```yaml
wic:
  steps:
    (1, min.wic):
      wic:
        namespace: alternate
```

Within one namespace, names should still be unique.

## Metadata Annotations

YAML metadata lives under a top-level `wic:` key. This keeps Sophios-specific metadata in one place so it can be merged, inspected, and removed during compilation.

Graphviz metadata example:

```yaml
wic:
  graphviz:
    label: Descriptive Subworkflow Name
    ranksame:
      - (1, short_step_name_1)
      - (5, short_step_name_5)
  steps:
    (1, short_step_name_1):
      wic:
        graphviz:
          label: Descriptive Step Name 1
```

Nested metadata can override child workflow metadata. This is useful for parameter passing and customization without editing the child workflow file.

## Static Dispatch

Static dispatch lets one workflow call a logical step while choosing a concrete implementation at compile time.

Aggregator workflow:

```yaml
wic:
  default_implementation: implementation1
  implementations:
    implementation1:
      steps:
        - implementation1.wic:
    implementation2:
      steps:
        - implementation2.wic:
```

Call-site override:

```yaml
steps:
  - static_dispatch.wic:

wic:
  steps:
    (1, static_dispatch.wic):
      wic:
        implementation: implementation2
```

Use this when several workflows provide the same conceptual operation but differ in algorithm, container, hardware assumptions, or performance profile.

## Program Synthesis

`--insert_steps_automatically` enables a limited form of automatic step insertion when edge inference initially fails. This is useful for constrained cases such as known file-format conversions.

It is not a general AI planner. Treat it as an advanced compiler feature and review the generated DAG carefully.
## Subinterpreters

Subinterpreters support realtime or repeated auxiliary workflows while the main workflow is running. The current `cwl_subinterpreter` path repeatedly runs an independent auxiliary workflow for a fixed number of iterations.

This is an advanced CWL/Sophios integration feature. Reach for it only after the main workflow is stable and you need monitoring or auxiliary execution behavior that cannot be expressed cleanly as ordinary workflow steps.

## Debugging Checklist

When a YAML workflow does something unexpected:

- Compile without running first.
- Inspect `autogenerated/.cwl`.
- Inspect generated job inputs.
- Generate Graphviz output with `--graphviz`.
- Replace inferred edges with explicit anchors where ambiguity matters.
- Regenerate schemas if editor validation looks stale.
- Delete `autogenerated/`, `cachedir*`, `outdir/`, and `provenance/` before a clean run.

YAML is strongest when it gives you evidence. Use the artifacts.
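The last checklist item, clearing generated directories before a clean run, is easy to script. The following is a minimal sketch that assumes those directories sit directly under the directory you run Sophios from. The `clean_generated` helper and its `dry_run` parameter are hypothetical and not part of Sophios.

```python
import shutil
from pathlib import Path

# Directory patterns from the checklist, matched relative to `root`.
GENERATED_PATTERNS = ("autogenerated", "cachedir*", "outdir", "provenance")


def clean_generated(root=".", dry_run=True):
    """Find generated directories under root; delete them unless dry_run is set.

    Returns the list of matched directory paths in either mode.
    """
    matched = []
    for pattern in GENERATED_PATTERNS:
        for path in sorted(Path(root).glob(pattern)):
            # Only directories are touched; a stray file with a matching name is skipped.
            if path.is_dir():
                matched.append(path)
                if not dry_run:
                    shutil.rmtree(path)
    return matched


if __name__ == "__main__":
    # Dry run by default: report what would be removed without deleting anything.
    for path in clean_generated(dry_run=True):
        print(f"would remove {path}")
```

Defaulting to a dry run is a deliberate choice here: `outdir/` and `provenance/` may hold results worth keeping, so it is worth reading the list before calling `clean_generated(dry_run=False)`.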