Tutorial

Wrapping an external pipeline

Turn an existing script or external repo into a Linkar template without copying all its code.

The best Linkar wrapper is usually thin.

Linkar should own:

  • the runtime contract
  • parameter resolution
  • output exposure
  • run provenance

The external tool should still own its real computational logic.

Start with the interface, not the wrapper code

The first job is to define a stable template contract in linkar_template.yaml.

Before writing wrapper code, decide three things:

  • which inputs the user should provide
  • which outputs Linkar should record
  • where the wrapped tool should write its results under LINKAR_RESULTS_DIR

For example:

id: fastqc
version: 0.1.0
description: Run FastQC on one FASTQ file.
params:
  input_fastq:
    type: path
    required: true
  threads:
    type: int
    default: 4
outputs:
  results_dir: {}
  fastqc_reports:
    glob: fastqc/*_fastqc.html
run:
  command: >-
    fastqc --threads "${param:threads}"
    --outdir "${LINKAR_RESULTS_DIR}/fastqc"
    "${param:input_fastq}"

With the contract defined, the wrapper's job is clear. In the simplest case, there is no separate wrapper file at all.

Prefer run.command for one-command wrappers

For a normal command-line tool, a single run.command string is usually the cleanest option:

run:
  command: >-
    fastqc --threads "${param:threads}"
    --outdir "${LINKAR_RESULTS_DIR}/fastqc"
    "${param:input_fastq}"

This is a good wrapper because:

  • the contract is explicit
  • the output location is deterministic
  • there is no extra wrapper file to maintain

This is the right shape for wrappers around tools like:

  • fastqc
  • samtools
  • bcl-convert
  • cellranger subcommands when you only need one stable invocation

Use run.sh or run.py when the wrapper starts doing real logic

If you are wrapping a Python-based pipeline or a multi-mode entrypoint, run.py is usually better than pushing more conditionals into shell.

run.py is the right move when the wrapper must:

  • validate combinations of parameters
  • assemble optional arguments clearly
  • call into a Python library or Python-native pipeline
  • inspect files or emit structured errors

That is why a template like demultiplex is better as either a declarative run.command or a real programmatic entrypoint, rather than a large shell adapter that only forwards arguments.

For Python wrappers, Linkar already supports a direct entrypoint model. The bundled download_test_data example uses:

run:
  entry: run.py

and the run.py file reads Linkar-provided environment variables such as:

  • SOURCE_URL
  • OUTPUT_NAME
  • LINKAR_RESULTS_DIR
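
The entrypoint itself can stay small. As a minimal sketch (the environment variable names come from the example above; the download step is illustrative, not the bundled implementation):

import os
import urllib.request


def main() -> None:
    # Linkar resolves params and exposes them to the process as
    # environment variables.
    source_url = os.environ["SOURCE_URL"]
    output_name = os.environ["OUTPUT_NAME"]
    results_dir = os.environ["LINKAR_RESULTS_DIR"]

    os.makedirs(results_dir, exist_ok=True)
    dest = os.path.join(results_dir, output_name)

    # Illustrative download; the bundled template may add checksums,
    # retries, or archive unpacking.
    urllib.request.urlretrieve(source_url, dest)


if __name__ == "__main__":
    main()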

That is the runtime model in the codebase today.
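
The same model covers the heavier cases listed earlier: validating parameter combinations, assembling optional arguments, and emitting structured errors. A hedged sketch of a demultiplex-style entrypoint (the parameter names and validation rules are hypothetical; the flags are bcl-convert's standard ones):

import os
import subprocess
import sys


def fail(msg: str) -> None:
    # Emit a structured, greppable error instead of a bare exit code.
    print(f"error: {msg}", file=sys.stderr)
    sys.exit(2)


def main() -> None:
    results_dir = os.environ["LINKAR_RESULTS_DIR"]
    run_dir = os.environ.get("RUN_DIR")           # hypothetical params
    samplesheet = os.environ.get("SAMPLESHEET")
    lanes = os.environ.get("LANES")

    # Validate combinations that a flat shell adapter handles badly.
    if not run_dir:
        fail("RUN_DIR is required")
    if samplesheet and not os.path.isfile(samplesheet):
        fail(f"samplesheet not found: {samplesheet}")

    # Assemble optional arguments explicitly.
    cmd = ["bcl-convert",
           "--bcl-input-directory", run_dir,
           "--output-directory", os.path.join(results_dir, "demux")]
    if samplesheet:
        cmd += ["--sample-sheet", samplesheet]
    if lanes:
        cmd += ["--bcl-only-lane", lanes]

    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()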

Realistic template layouts

For a thin command wrapper:

fastqc/
  linkar_template.yaml
  test.sh

For a shell-oriented wrapper with local logic:

demultiplex/
  linkar_template.yaml
  run.sh
  test.sh
  testdata/

For a Python-oriented wrapper:

download_test_data/
  linkar_template.yaml
  run.py
  test.sh
  testdata/

Keep the external repo boundary clear

You have two reasonable packaging models:

1. Thin wrapper around an external checkout

Use this when the external repo already has its own release cycle and you do not want to bundle it into the template. If you do this, prefer cloning a pinned commit rather than tracking a floating main branch.

Template job:

  • define Linkar params
  • call the external entrypoint
  • write outputs under LINKAR_RESULTS_DIR

Typical shape:

my_pack/
  templates/
    wrapped_pipeline/
      linkar_template.yaml
      run.sh
      test.sh

In this model, run.sh is mostly an adapter that calls a pinned checkout, an installed binary, or an existing environment.
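
A hedged sketch of such an adapter (the repo URL, commit hash, entrypoint name, and parameter variables are all placeholders):

#!/usr/bin/env bash
set -euo pipefail

# Pin the external pipeline to a known commit instead of floating main.
# Both values below are placeholders.
PIPELINE_REPO="https://example.com/org/wrapped-pipeline.git"
PIPELINE_COMMIT="0123abc"

workdir="$(mktemp -d)"
git clone "$PIPELINE_REPO" "$workdir/pipeline"
git -C "$workdir/pipeline" checkout --detach "$PIPELINE_COMMIT"

# Forward Linkar-resolved parameters (INPUT_PATH and THREADS are
# hypothetical names) and keep outputs under LINKAR_RESULTS_DIR.
mkdir -p "$LINKAR_RESULTS_DIR/wrapped_pipeline"
"$workdir/pipeline/run_pipeline.sh" \
  --input "$INPUT_PATH" \
  --threads "${THREADS:-4}" \
  --outdir "$LINKAR_RESULTS_DIR/wrapped_pipeline"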

2. Self-contained template bundle

Use this when the template should be portable on its own and the bundled pipeline code is part of the template distribution.

The template directory can then contain:

my_pack/
  templates/
    demultiplex/
      linkar_template.yaml
      run.py
      helpers/
        samplesheet.py
      assets/
        adapter_seqs.tsv
      test.py
      testdata/

This is still reasonable when the bundled code really is part of the distributed template contract.

Choose this model when:

  • the wrapped logic is small enough to version together with the template
  • portability matters more than reusing an external repo boundary
  • you want linkar render ... to produce a self-contained handoff artifact
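
With this layout, run.py can import the bundled helpers directly, because they ship inside the template. A minimal sketch (the samplesheet.parse API and the SAMPLESHEET parameter name are hypothetical):

import os
import sys

# The helpers ship with the template, so resolve imports relative to run.py.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from helpers import samplesheet


def main() -> None:
    results_dir = os.environ["LINKAR_RESULTS_DIR"]
    sheet_path = os.environ["SAMPLESHEET"]  # hypothetical param name

    # Hypothetical helper API: parse the user-provided sample sheet.
    samples = samplesheet.parse(sheet_path)

    # Record what was parsed so the run output is self-describing.
    os.makedirs(results_dir, exist_ok=True)
    with open(os.path.join(results_dir, "samples.txt"), "w") as fh:
        for sample in samples:
            fh.write(f"{sample}\n")


if __name__ == "__main__":
    main()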

Testing strategy

Template-local tests should live alongside the template itself.

Examples:

cd templates/fastqc
bash test.sh

cd templates/demultiplex
python test.py
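
What test.sh contains is up to the template. A minimal sketch for the fastqc wrapper (the faked environment variable and the tiny fixture file are assumptions):

#!/usr/bin/env bash
set -euo pipefail

# Fake the Linkar runtime environment locally (assumption: this is the
# variable the template reads at run time).
export LINKAR_RESULTS_DIR="$(mktemp -d)"
mkdir -p "$LINKAR_RESULTS_DIR/fastqc"

# Run the same command the template declares, against a tiny fixture
# (testdata/tiny.fastq.gz is a hypothetical file).
fastqc --threads 1 \
  --outdir "$LINKAR_RESULTS_DIR/fastqc" \
  testdata/tiny.fastq.gz

# Assert the declared output contract: at least one HTML report exists.
ls "$LINKAR_RESULTS_DIR"/fastqc/*_fastqc.html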

Then validate through Linkar:

linkar test fastqc --pack /path/to/pack
linkar test demultiplex --pack /path/to/pack

That split mirrors the current codebase:

  • local test.sh or test.py keeps authoring fast
  • linkar test ... validates the real Linkar runtime path

Good wrapper rules

  • keep the Linkar contract explicit
  • keep output locations deterministic
  • prefer explicit defaults over hidden omission logic
  • prefer run.command when one command is enough
  • use run.sh for real local shell logic
  • use run.py once shell stops being clearer
  • let the external tool own the real computation

Linkar is the runtime and packaging layer, not a replacement for the external tool itself.