Add spike-in sequences to the nf-core/smrnaseq pipeline
The current nf-core/smrnaseq pipeline (2.3.0) does not support spike-in sequences out of the box. However, you can add them by following the steps below:
Download the spike-in sequences
Download the QIAseq miRNA Library QC PCR Handbook and find Appendix C: QIAseq miRNA Library QC Spike-In Sequences. Processing that table yields the spike-in sequences in FASTA format, as in this file: QIAseq_miRNA_Spikein.fa.
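As a rough illustration of "processing the table", here is a minimal Python sketch. It assumes you have copied the appendix table into plain text with one spike-in per row, the name in the first column and the sequence in the last. That layout and the function name `table_to_fasta` are assumptions for illustration, not something the handbook prescribes.

```python
def table_to_fasta(table_text):
    """Turn 'name ... sequence' rows into FASTA records (assumed table layout)."""
    records = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and rows without a sequence column
        name, seq = fields[0], fields[-1].upper()
        if set(seq) - set("ACGTU"):
            continue  # skip header rows or anything that is not a nucleotide string
        records.append(f">{name}\n{seq}\n")
    return "".join(records)


# Two rows taken from the example sequences shown later in this post.
example_table = (
    "Name\tSequence\n"
    "UniSP112\tGGTTCGTACGTACACTGTTCA\n"
    "UniSP110\tTTCGAGGCCTATTAAACCTCTG\n"
)
print(table_to_fasta(example_table), end="")
```

Writing the returned string to QIAseq_miRNA_Spikein.fa gives the file used in the next step. Check the result against the handbook by eye, since PDF copy-and-paste can split or merge columns.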
Add spike-in sequences to the hairpin and mature fasta files
Download the latest hairpin and mature FASTA files from miRBase. For the pipeline's quantification to work, the spike-in sequence names must start with the prefix of the organism you want to quantify. Here is an example of the original names:
>UniSP112
GGTTCGTACGTACACTGTTCA
>UniSP110
TTCGAGGCCTATTAAACCTCTG
>UniSP136
ATCAGTTTCTTGTTCGTTTCA
>UniSP109
CGAAACTGGTGTCGACCGACA
Add the ‘hsa-mir-’ prefix for human:
>hsa-mir-UniSP112
GGTTCGTACGTACACTGTTCA
>hsa-mir-UniSP110
TTCGAGGCCTATTAAACCTCTG
>hsa-mir-UniSP136
ATCAGTTTCTTGTTCGTTTCA
>hsa-mir-UniSP109
CGAAACTGGTGTCGACCGACA
Then append these sequences to both the hairpin and mature fasta files.
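The renaming and appending can be scripted. The sketch below is one possible way to do it in Python; the function names `prefix_fasta_headers` and `append_fasta` are made up for this example, and the default prefix follows the human example above.

```python
def prefix_fasta_headers(fasta_text, prefix="hsa-mir-"):
    """Return the FASTA text with `prefix` added to every header that lacks it."""
    out = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop empty lines
        if line.startswith(">"):
            name = line[1:]
            if not name.startswith(prefix):
                name = prefix + name
            out.append(">" + name)
        else:
            out.append(line)
    return "".join(item + "\n" for item in out)


def append_fasta(reference_text, spikein_text):
    """Append spike-in records to a reference FASTA, keeping records on separate lines."""
    if reference_text and not reference_text.endswith("\n"):
        reference_text += "\n"
    return reference_text + spikein_text


spikeins = ">UniSP112\nGGTTCGTACGTACACTGTTCA\n>UniSP110\nTTCGAGGCCTATTAAACCTCTG\n"
print(prefix_fasta_headers(spikeins), end="")
```

To apply this to real files, read QIAseq_miRNA_Spikein.fa and the miRBase hairpin and mature files with `pathlib.Path.read_text()`, then write the combined text to new files such as hairpin_with_qiaseq_spikein.fa and mature_with_qiaseq_spikein.fa, leaving the downloaded originals untouched.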
Modify the gff3 file
Download the latest GFF3 file from miRBase and add the following lines to the end of the file (GFF3 columns are tab-separated):
hsa-mir-UniSP100 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP100;Alias=hsa-mir-UniSP100;Name=hsa-mir-UniSP100
hsa-mir-UniSP101 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP101;Alias=hsa-mir-UniSP101;Name=hsa-mir-UniSP101
hsa-mir-UniSP102 . miRNA_primary_transcript 1 20 . + . ID=hsa-mir-UniSP102;Alias=hsa-mir-UniSP102;Name=hsa-mir-UniSP102
hsa-mir-UniSP103 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP103;Alias=hsa-mir-UniSP103;Name=hsa-mir-UniSP103
hsa-mir-UniSP104 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP104;Alias=hsa-mir-UniSP104;Name=hsa-mir-UniSP104
hsa-mir-UniSP105 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP105;Alias=hsa-mir-UniSP105;Name=hsa-mir-UniSP105
hsa-mir-UniSP106 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP106;Alias=hsa-mir-UniSP106;Name=hsa-mir-UniSP106
hsa-mir-UniSP107 . miRNA_primary_transcript 1 20 . + . ID=hsa-mir-UniSP107;Alias=hsa-mir-UniSP107;Name=hsa-mir-UniSP107
hsa-mir-UniSP108 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP108;Alias=hsa-mir-UniSP108;Name=hsa-mir-UniSP108
hsa-mir-UniSP109 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP109;Alias=hsa-mir-UniSP109;Name=hsa-mir-UniSP109
hsa-mir-UniSP110 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP110;Alias=hsa-mir-UniSP110;Name=hsa-mir-UniSP110
hsa-mir-UniSP111 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP111;Alias=hsa-mir-UniSP111;Name=hsa-mir-UniSP111
hsa-mir-UniSP112 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP112;Alias=hsa-mir-UniSP112;Name=hsa-mir-UniSP112
hsa-mir-UniSP113 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP113;Alias=hsa-mir-UniSP113;Name=hsa-mir-UniSP113
hsa-mir-UniSP114 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP114;Alias=hsa-mir-UniSP114;Name=hsa-mir-UniSP114
hsa-mir-UniSP115 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP115;Alias=hsa-mir-UniSP115;Name=hsa-mir-UniSP115
hsa-mir-UniSP116 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP116;Alias=hsa-mir-UniSP116;Name=hsa-mir-UniSP116
hsa-mir-UniSP117 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP117;Alias=hsa-mir-UniSP117;Name=hsa-mir-UniSP117
hsa-mir-UniSP118 . miRNA_primary_transcript 1 20 . + . ID=hsa-mir-UniSP118;Alias=hsa-mir-UniSP118;Name=hsa-mir-UniSP118
hsa-mir-UniSP119 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP119;Alias=hsa-mir-UniSP119;Name=hsa-mir-UniSP119
hsa-mir-UniSP120 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP120;Alias=hsa-mir-UniSP120;Name=hsa-mir-UniSP120
hsa-mir-UniSP121 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP121;Alias=hsa-mir-UniSP121;Name=hsa-mir-UniSP121
hsa-mir-UniSP122 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP122;Alias=hsa-mir-UniSP122;Name=hsa-mir-UniSP122
hsa-mir-UniSP123 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP123;Alias=hsa-mir-UniSP123;Name=hsa-mir-UniSP123
hsa-mir-UniSP124 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP124;Alias=hsa-mir-UniSP124;Name=hsa-mir-UniSP124
hsa-mir-UniSP125 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP125;Alias=hsa-mir-UniSP125;Name=hsa-mir-UniSP125
hsa-mir-UniSP126 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP126;Alias=hsa-mir-UniSP126;Name=hsa-mir-UniSP126
hsa-mir-UniSP127 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP127;Alias=hsa-mir-UniSP127;Name=hsa-mir-UniSP127
hsa-mir-UniSP128 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP128;Alias=hsa-mir-UniSP128;Name=hsa-mir-UniSP128
hsa-mir-UniSP129 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP129;Alias=hsa-mir-UniSP129;Name=hsa-mir-UniSP129
hsa-mir-UniSP130 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP130;Alias=hsa-mir-UniSP130;Name=hsa-mir-UniSP130
hsa-mir-UniSP131 . miRNA_primary_transcript 1 24 . + . ID=hsa-mir-UniSP131;Alias=hsa-mir-UniSP131;Name=hsa-mir-UniSP131
hsa-mir-UniSP132 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP132;Alias=hsa-mir-UniSP132;Name=hsa-mir-UniSP132
hsa-mir-UniSP133 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP133;Alias=hsa-mir-UniSP133;Name=hsa-mir-UniSP133
hsa-mir-UniSP134 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP134;Alias=hsa-mir-UniSP134;Name=hsa-mir-UniSP134
hsa-mir-UniSP135 . miRNA_primary_transcript 1 24 . + . ID=hsa-mir-UniSP135;Alias=hsa-mir-UniSP135;Name=hsa-mir-UniSP135
hsa-mir-UniSP136 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP136;Alias=hsa-mir-UniSP136;Name=hsa-mir-UniSP136
hsa-mir-UniSP137 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP137;Alias=hsa-mir-UniSP137;Name=hsa-mir-UniSP137
hsa-mir-UniSP138 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP138;Alias=hsa-mir-UniSP138;Name=hsa-mir-UniSP138
hsa-mir-UniSP139 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP139;Alias=hsa-mir-UniSP139;Name=hsa-mir-UniSP139
hsa-mir-UniSP140 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP140;Alias=hsa-mir-UniSP140;Name=hsa-mir-UniSP140
hsa-mir-UniSP141 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP141;Alias=hsa-mir-UniSP141;Name=hsa-mir-UniSP141
hsa-mir-UniSP142 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP142;Alias=hsa-mir-UniSP142;Name=hsa-mir-UniSP142
hsa-mir-UniSP143 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP143;Alias=hsa-mir-UniSP143;Name=hsa-mir-UniSP143
hsa-mir-UniSP144 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP144;Alias=hsa-mir-UniSP144;Name=hsa-mir-UniSP144
hsa-mir-UniSP145 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP145;Alias=hsa-mir-UniSP145;Name=hsa-mir-UniSP145
hsa-mir-UniSP146 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP146;Alias=hsa-mir-UniSP146;Name=hsa-mir-UniSP146
hsa-mir-UniSP147 . miRNA_primary_transcript 1 22 . + . ID=hsa-mir-UniSP147;Alias=hsa-mir-UniSP147;Name=hsa-mir-UniSP147
hsa-mir-UniSP148 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP148;Alias=hsa-mir-UniSP148;Name=hsa-mir-UniSP148
hsa-mir-UniSP149 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP149;Alias=hsa-mir-UniSP149;Name=hsa-mir-UniSP149
hsa-mir-UniSP150 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP150;Alias=hsa-mir-UniSP150;Name=hsa-mir-UniSP150
hsa-mir-UniSP151 . miRNA_primary_transcript 1 21 . + . ID=hsa-mir-UniSP151;Alias=hsa-mir-UniSP151;Name=hsa-mir-UniSP151
It is important to set the feature type to miRNA_primary_transcript in the GFF3 file, because the type miRNA requires information about the precursor sequence, which is not available for spike-in sequences (see mirtop/mirtop/mirna/mapper.py, lines 334 to 343).
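Rather than typing the GFF3 lines by hand, you can derive them from the renamed spike-in FASTA. The following Python sketch builds one tab-separated line per sequence in the same layout as the lines above, with the end coordinate taken from the sequence length. The function names are hypothetical and the column values simply mirror the example lines.

```python
def parse_fasta(fasta_text):
    """Return a list of (name, sequence) tuples from FASTA text."""
    records = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                records.append([parts[0], ""])  # keep only the first word of the header
        elif records:
            records[-1][1] += line  # join sequences wrapped over several lines
    return [(name, seq) for name, seq in records]


def spikein_gff3_lines(fasta_text):
    """Build one miRNA_primary_transcript GFF3 line per spike-in sequence."""
    rows = []
    for name, seq in parse_fasta(fasta_text):
        attributes = f"ID={name};Alias={name};Name={name}"
        columns = [name, ".", "miRNA_primary_transcript",
                   "1", str(len(seq)), ".", "+", ".", attributes]
        rows.append("\t".join(columns))  # GFF3 columns are tab-separated
    return rows


renamed = ">hsa-mir-UniSP112\nGGTTCGTACGTACACTGTTCA\n>hsa-mir-UniSP110\nTTCGAGGCCTATTAAACCTCTG\n"
for row in spikein_gff3_lines(renamed):
    print(row)
```

Appending the returned lines to a copy of the miRBase GFF3 file (for example hsa_modified.gff3) produces the annotation used in the next step.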
Add the parameters to nf-core/smrnaseq
Then modify the command you use to run the pipeline. The key is to add the --mirna_gtf, --mature, and --hairpin parameters:
nextflow run nf-core/smrnaseq -r 2.3.0 \
-profile docker \
--input samplesheet.csv \
--genome 'hg38' \
--mirtrace_species 'hsa' \
--mirna_gtf hsa_modified.gff3 \
--mature mature_with_qiaseq_spikein.fa \
--hairpin hairpin_with_qiaseq_spikein.fa \
--protocol 'qiaseq' \
--outdir results \
--save_reference \
--with_umi \
--umitools_extract_method regex \
--umitools_bc_pattern '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' # Regex pattern for Qiagen_QIAseq_miRNA
Then the spike-in sequences will be quantified together with the other miRNAs.
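Because a mismatch between the three modified files only shows up late in a run, a quick pre-flight check can save time. The sketch below assumes that each spike-in name should appear both as a FASTA header in the hairpin and mature files and as an ID in the GFF3 file, as in the examples above. The function names are made up for this example.

```python
def fasta_names(fasta_text):
    """Collect the first word of every FASTA header."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gff3_ids(gff3_text):
    """Collect the ID attribute of every 9-column GFF3 feature line."""
    ids = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a tab-separated feature line
        for field in columns[8].split(";"):
            if field.startswith("ID="):
                ids.add(field[3:])
    return ids


def missing_spikeins(spikein_names, hairpin_text, mature_text, gff3_text):
    """Return {file label: sorted missing names}; an empty dict means all are present."""
    wanted = set(spikein_names)
    present = {
        "hairpin": fasta_names(hairpin_text),
        "mature": fasta_names(mature_text),
        "gff3": gff3_ids(gff3_text),
    }
    return {label: sorted(wanted - found)
            for label, found in present.items() if wanted - found}
```

An empty result from `missing_spikeins` suggests the files agree on names. A GFF3 line written with spaces instead of tabs is reported as missing, which is a useful side effect of checking for exactly nine tab-separated columns.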
Reference
- Discussion on the nf-core Slack channel: https://nfcore.slack.com/archives/CL66GAJBF/p1678195598260819