Configuration¶
This guide covers advanced configuration options for BPM, including environment variables, configuration files, and customization settings.
Environment Variables¶
BPM uses several environment variables to control its behavior:
Required Variables¶
BPM_CACHE
¶
The cache directory where BPM stores repositories, templates, and other data.
# Linux/macOS
export BPM_CACHE="/path/to/your/bpm_cache"
# Windows
setx BPM_CACHE "C:\path\to\your\bpm_cache"
Default: $HOME/.cache/bpm
Optional Variables¶
BPM_CONFIG
¶
Path to a global configuration file.
BPM_LOG_LEVEL
¶
Controls the verbosity of BPM's logging output.
export BPM_LOG_LEVEL="DEBUG" # Most verbose
export BPM_LOG_LEVEL="INFO" # Default
export BPM_LOG_LEVEL="WARNING" # Only warnings and errors
export BPM_LOG_LEVEL="ERROR" # Only errors
BPM_TEMPLATE_PATH
¶
Additional template search paths (colon-separated on Unix, semicolon-separated on Windows).
Configuration Files¶
BPM uses YAML configuration files at different levels:
Global Configuration¶
Location: ~/.config/bpm/config.yaml
# Global BPM configuration
cache_dir: "/path/to/cache"
log_level: "INFO"
default_repository: "main"
# Template settings
template_search_paths:
- "/path/to/custom/templates"
- "/another/template/path"
# Workflow settings
workflow_timeout: 3600 # seconds
max_concurrent_workflows: 4
# UI settings
color_output: true
progress_bars: true
Project Configuration¶
Location: project.yaml
in each project directory
# Project metadata
name: "240101_RNAseq_Study_Institute_Research"
description: "RNA-seq analysis of treatment vs control samples"
created_date: "2024-01-01"
author: "John Doe"
institution: "Research Institute"
# Project settings
active_repository: "bioinformatics_repo"
output_directory: "results"
temp_directory: "temp"
# Template variables
variables:
threads: 8
memory: "32G"
genome: "GRCh38"
adapter: "AGATCGGAAGAG"
# Workflow configurations
workflows:
rnaseq:
parameters:
threads: 16
memory: "64G"
environment:
CONDA_ENV: "rnaseq_env"
Repository Configuration¶
Location: repo.yaml
in each repository directory
# Repository metadata
name: "bioinformatics_repo"
version: "1.0.0"
description: "Bioinformatics templates and workflows"
maintainer: "Bioinformatics Team"
license: "MIT"
# Repository settings
default_templates:
- "demultiplexing:bclconvert"
- "nfcore:rnaseq"
# Template configurations
templates:
demultiplexing:
bclconvert:
description: "BCL Convert demultiplexing template"
variables:
threads: 8
memory: "16G"
dependencies:
- "bclconvert"
- "fastqc"
nfcore:
rnaseq:
description: "nf-core RNA-seq pipeline template"
variables:
threads: 16
memory: "64G"
genome: "GRCh38"
dependencies:
- "nextflow"
- "docker"
# Workflow configurations
workflows:
generate_nfcore_rnaseq_samplesheet:
description: "Generate sample sheet for nf-core RNA-seq"
script: "workflows/generate_nfcore_rnaseq_samplesheet.py"
parameters:
input_dir: "required"
output_file: "required"
sample_pattern: "*.fastq.gz"
Configuration Hierarchy¶
BPM follows a specific hierarchy for configuration settings:
- Command-line arguments (highest priority)
- Project configuration (
project.yaml
) - Repository configuration (
repo.yaml
) - Global configuration (
~/.config/bpm/config.yaml
) - Environment variables
- Default values (lowest priority)
Customizing Templates¶
Template Variables¶
You can customize template behavior by setting variables:
# In project.yaml
variables:
# System resources
threads: 16
memory: "64G"
# Analysis parameters
genome: "GRCh38"
adapter: "AGATCGGAAGAG"
quality_threshold: 20
# Paths
data_dir: "/path/to/data"
results_dir: "results"
# Custom settings
email: "user@institute.edu"
queue: "long"
Template Inheritance¶
Templates can inherit settings from parent configurations:
# Base template configuration
base_template:
variables:
threads: 8
memory: "32G"
# Specific template overrides
rnaseq_template:
extends: "base_template"
variables:
threads: 16 # Override base setting
memory: "64G" # Override base setting
genome: "GRCh38" # New variable
Advanced Settings¶
Workflow Configuration¶
Configure workflow execution settings:
# In project.yaml
workflows:
default:
timeout: 3600 # 1 hour
retries: 3
environment:
CONDA_ENV: "bioinformatics"
rnaseq:
timeout: 7200 # 2 hours
retries: 2
parameters:
threads: 16
memory: "64G"
environment:
CONDA_ENV: "rnaseq_env"
SLURM_PARTITION: "long"
Repository Management¶
Configure repository behavior:
# In global config
repositories:
auto_update: true
update_interval: 86400 # 24 hours
cache_size_limit: "10GB"
# Repository-specific settings
bioinformatics_repo:
auto_update: false
update_interval: 604800 # 1 week
Logging Configuration¶
Customize logging behavior:
# In global config
logging:
level: "INFO"
format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
file: "/path/to/bpm.log"
max_size: "10MB"
backup_count: 5
Security Considerations¶
Sensitive Information¶
Avoid storing sensitive information in configuration files:
# ❌ Don't do this
variables:
api_key: "secret_key_here"
password: "my_password"
# ✅ Use environment variables instead
variables:
api_key: "${API_KEY}"
password: "${DB_PASSWORD}"
File Permissions¶
Set appropriate permissions for configuration files:
# Global config
chmod 600 ~/.config/bpm/config.yaml
# Project config
chmod 644 project.yaml
# Repository config
chmod 644 repo.yaml
Troubleshooting Configuration¶
Common Issues¶
- Configuration not loaded: Check file paths and permissions
- Variables not resolved: Ensure environment variables are set
- Template inheritance issues: Verify YAML syntax and structure
- Workflow failures: Check timeout and resource settings
Debugging¶
Enable debug logging to troubleshoot configuration issues:
Validation¶
BPM validates configuration files and will show errors for invalid settings:
# Check project configuration
bpm info --project project.yaml
# Validate repository configuration
bpm repo list --verbose
Best Practices¶
- Use environment variables for sensitive information
- Keep configurations version-controlled when appropriate
- Document custom configurations in project README files
- Test configurations before deploying to production
- Use consistent naming conventions across projects
- Backup important configurations regularly
Examples¶
Complete Project Configuration¶
# project.yaml
name: "240101_RNAseq_Study_Institute_Research"
description: "RNA-seq analysis of treatment vs control samples"
created_date: "2024-01-01"
author: "John Doe"
institution: "Research Institute"
active_repository: "bioinformatics_repo"
output_directory: "results"
temp_directory: "temp"
variables:
# System resources
threads: 16
memory: "64G"
# Analysis parameters
genome: "GRCh38"
adapter: "AGATCGGAAGAG"
quality_threshold: 20
# Paths
data_dir: "/data/raw"
results_dir: "results"
# Custom settings
email: "john.doe@institute.edu"
queue: "long"
workflows:
rnaseq:
timeout: 7200
retries: 2
parameters:
threads: 16
memory: "64G"
environment:
CONDA_ENV: "rnaseq_env"
SLURM_PARTITION: "long"
This configuration provides a comprehensive setup for a typical RNA-seq analysis project with all necessary parameters and settings.