Tutorial
Wrapping an external pipeline
Turn an existing script or external repo into a Linkar template without copying all its code.
The best Linkar wrapper is usually thin.
Linkar should own:
- the runtime contract
- parameter resolution
- output exposure
- run provenance
The external tool should still own its real computational logic.
Start with the interface, not the wrapper code
The first job is to define a stable template contract in linkar_template.yaml.
Before writing wrapper code, decide three things:
- which inputs the user should provide
- which outputs Linkar should record
- where the wrapped tool should write its results under LINKAR_RESULTS_DIR
For example:
id: fastqc
version: 0.1.0
description: Run FastQC on one FASTQ file.
params:
  input_fastq:
    type: path
    required: true
  threads:
    type: int
    default: 4
outputs:
  results_dir: {}
  fastqc_reports:
    glob: fastqc/*_fastqc.html
run:
  command: >-
    fastqc --threads "${param:threads}"
    --outdir "${LINKAR_RESULTS_DIR}/fastqc"
    "${param:input_fastq}"
With the contract defined, the wrapper's job is clear. In the simplest case, there is no separate wrapper file at all.
Prefer run.command for one-command wrappers
For a normal command-line tool, a single run.command string is usually the cleanest option:
run:
  command: >-
    fastqc --threads "${param:threads}"
    --outdir "${LINKAR_RESULTS_DIR}/fastqc"
    "${param:input_fastq}"
This is a good wrapper because:
- the contract is explicit
- the output location is deterministic
- there is no extra wrapper file to maintain
This is the right shape for wrappers around tools like:
- fastqc
- samtools
- bcl-convert
- cellranger subcommands when you only need one stable invocation
Use run.sh or run.py when the wrapper starts doing real logic
If you are wrapping a Python-based pipeline or a multi-mode entrypoint, run.py is usually better
than pushing more conditionals into shell.
run.py is the right move when the wrapper must:
- validate combinations of parameters
- assemble optional arguments clearly
- call into a Python library or Python-native pipeline
- inspect files or emit structured errors
That is why a template like demultiplex is better as either a declarative run.command or a real
programmatic entrypoint, rather than a large shell adapter that only forwards arguments.
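As a rough illustration of the kind of logic that belongs in run.py rather than shell, the sketch below validates a parameter combination and assembles optional arguments. It is not taken from any bundled template: `mytool`, its flags, and the parameter names `input_dir`, `sample_sheet`, and `lanes` are all hypothetical, and the parameters are assumed to arrive as a plain dictionary of strings.

```python
import os
import shlex


def build_command(params, results_dir):
    """Assemble argv for a hypothetical `mytool` CLI; every flag here is illustrative."""
    sample_sheet = params.get("sample_sheet", "")
    lanes = params.get("lanes", "")

    # Validate parameter combinations before launching anything.
    if lanes and not sample_sheet:
        raise ValueError("'lanes' requires 'sample_sheet' to be set")

    argv = [
        "mytool",
        "--input", params["input_dir"],
        "--output", os.path.join(results_dir, "demux"),
    ]
    # Optional arguments are appended only when they were provided.
    if sample_sheet:
        argv += ["--sample-sheet", sample_sheet]
    if lanes:
        argv += ["--lanes", lanes]
    return argv


# Print the command line that would be executed for one example input.
demo = build_command({"input_dir": "/data/run1", "sample_sheet": "sheet.csv"}, "/tmp/results")
print(shlex.join(demo))
```

Building the command as a list keeps quoting out of the picture, and the validation error surfaces before any external process starts.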
For Python wrappers, Linkar already supports a direct entrypoint model. The bundled
download_test_data example uses:
run:
  entry: run.py
and the run.py file reads Linkar-provided environment variables such as:
- SOURCE_URL
- OUTPUT_NAME
- LINKAR_RESULTS_DIR
That is the runtime model in the codebase today.
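A minimal sketch of what such an entrypoint might look like, assuming only the three environment variables named above. This is not the bundled download_test_data script; the `download` helper and its error handling are illustrative.

```python
import urllib.request
from pathlib import Path

REQUIRED = ("SOURCE_URL", "OUTPUT_NAME", "LINKAR_RESULTS_DIR")


def download(env):
    """Fetch SOURCE_URL into LINKAR_RESULTS_DIR/OUTPUT_NAME and return the target path."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))

    # Always write under the results directory so the output location stays deterministic.
    target = Path(env["LINKAR_RESULTS_DIR"]) / env["OUTPUT_NAME"]
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(env["SOURCE_URL"], str(target))
    return target


# In an actual run.py, the script would end by calling the helper
# with the process environment, for example: download(os.environ)
```

Taking the environment as an argument rather than reading `os.environ` directly makes the function easy to exercise from a local test.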
A realistic template layout
For a thin command wrapper:
fastqc/
  linkar_template.yaml
  test.sh
For a shell-oriented wrapper with local logic:
demultiplex/
  linkar_template.yaml
  run.sh
  test.sh
  testdata/
For a Python-oriented wrapper:
download_test_data/
  linkar_template.yaml
  run.py
  test.sh
  testdata/
Keep the external repo boundary clear
You have two reasonable packaging models:
1. Thin wrapper around an external checkout
Use this when the external repo already has its own release cycle and you do not want to bundle it
into the template. If you do this, prefer checking out a pinned commit rather than tracking a floating main branch.
Template job:
- define Linkar params
- call the external entrypoint
- write outputs under LINKAR_RESULTS_DIR
Typical shape:
my_pack/
  templates/
    wrapped_pipeline/
      linkar_template.yaml
      run.sh
      test.sh
In this model, run.sh is mostly an adapter that calls a pinned checkout, installed binary, or
existing environment.
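One way to keep a pinned checkout honest is to verify the commit before calling into it. The sketch below is written in Python for consistency with the other examples here, though the same check fits in a few lines of run.sh. The helper names and the injectable `run` argument are illustrative; only `git -C <dir> rev-parse HEAD` is relied on from git itself.

```python
import subprocess


def head_commit(checkout_dir, run=subprocess.run):
    """Return the commit SHA currently checked out in checkout_dir."""
    result = run(
        ["git", "-C", str(checkout_dir), "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def require_pinned(checkout_dir, expected_sha, run=subprocess.run):
    """Raise if the checkout has drifted away from the pinned commit."""
    actual = head_commit(checkout_dir, run=run)
    if actual != expected_sha:
        raise RuntimeError(
            f"{checkout_dir} is at {actual}, expected pinned commit {expected_sha}"
        )
    return actual
```

Failing early here means a drifted checkout shows up as a clear wrapper error instead of a subtly different result.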
2. Self-contained template bundle
Use this when the template should be portable on its own and the bundled pipeline code is part of the template distribution.
The template directory can then contain:
my_pack/
  templates/
    demultiplex/
      linkar_template.yaml
      run.py
      helpers/
        samplesheet.py
      assets/
        adapter_seqs.tsv
      test.py
      testdata/
This is still reasonable when the bundled code really is part of the distributed template contract.
Choose this model when:
- the wrapped logic is small enough to version together with the template
- portability matters more than reusing an external repo boundary
- you want linkar render ... to produce a self-contained handoff artifact
Testing strategy
Template-local testing should stay with the template repo.
Examples:
cd templates/fastqc
bash test.sh
cd templates/demultiplex
python test.py
Then validate through Linkar:
linkar test fastqc --pack /path/to/pack
linkar test demultiplex --pack /path/to/pack
That split mirrors the current codebase:
- local test.sh or test.py keeps authoring fast
- linkar test ... validates the real Linkar runtime path
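A local test.py can stay small. The sketch below checks that files matching a declared output glob exist under a results directory, assuming the glob is resolved relative to LINKAR_RESULTS_DIR. The `matched_outputs` helper is hypothetical, and the files are created by hand as a stand-in for actually running the wrapped tool.

```python
import tempfile
from pathlib import Path


def matched_outputs(results_dir, pattern):
    """Return sorted POSIX-style relative paths under results_dir that match pattern."""
    root = Path(results_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for running the wrapped tool: create the files it would write.
    out_dir = Path(tmp) / "fastqc"
    out_dir.mkdir()
    (out_dir / "sample_fastqc.html").write_text("<html></html>")
    (out_dir / "sample_fastqc.zip").write_bytes(b"")

    # Only the HTML report should match the declared glob.
    reports = matched_outputs(tmp, "fastqc/*_fastqc.html")
    assert reports == ["fastqc/sample_fastqc.html"], reports
```

In a real test, the stand-in step would be replaced by invoking the template's command or entrypoint against files in testdata/.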
Good wrapper rules
- keep the Linkar contract explicit
- keep output locations deterministic
- prefer explicit defaults over hidden omission logic
- prefer run.command when one command is enough
- use run.sh for real local shell logic
- use run.py once shell stops being clearer
- let the external tool own the real computation
Linkar is the runtime and packaging layer, not a replacement for the external tool itself.