NOT your grandma’s ANALYSIS PIPELINE
For use with REALLY RNA-Seq library preparation kits
This pipeline analyzes RNA-Seq data from libraries prepared with the REALLY-rNONE workflow; it also works with data from other RNA-Seq library preps. The software takes in raw FASTQs, trims adapters, aligns reads to a user-specified reference transcriptome using the STAR aligner, and marks duplicates.
To generate UMI-deconvoluted FASTQs from BCL files, use SRSLYumi.
STEP 1: DOWNLOAD REALLYrun
Install REALLYrun with:
pip3 install reallyrun
This will also install the SRSLYumi pipeline required for UMI deconvolution. The package is compatible with Python 3 and can be installed in a virtual environment if necessary. This software requires conda; we recommend mamba for speed. If you prefer to use standard conda, installation instructions can be found here. For reproducibility and to ensure appropriate tool versions, we use Snakemake wrappers for many of the tools in this pipeline, and these are often slow to set up the first time they are used. As a result, your first run of the software may take a long time. Don't worry, this is totally normal!
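Since the pipeline needs either mamba or conda, a quick pre-flight check before the first run can save a confusing failure later. The following is a minimal sketch; the function name pick_conda_frontend is hypothetical and is not part of REALLYrun.

```shell
# Hypothetical helper (not part of REALLYrun): report which conda
# frontend is available, preferring mamba as recommended above.
# Prints "mamba" or "conda"; returns non-zero if neither is on PATH.
pick_conda_frontend() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    elif command -v conda >/dev/null 2>&1; then
        echo conda
    else
        echo "neither mamba nor conda found on PATH" >&2
        return 1
    fi
}
```

Running pick_conda_frontend in the shell you plan to launch the pipeline from shows which of the two it will find.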
MANUAL
Download the REALLYrun code from GitHub
STEP 2: RUN REALLYrun
with STAR index generation
The basic analysis can be run with really runsamples when running on standard libraries, or really runsamples --umi when running on libraries with UMIs. For UMI-aware demultiplexing of REALLY libraries, please use our SRSLYumi Python package (more info at https://github.com/claretbio/SRSLYumi).
Run this if you do not yet have a STAR index:
really runsamples --fastqdir /home/user/fastqfiles \
--resultsdir /home/user/amazing-results \
--reference /home/user/data/hg38.fa \
--gtf /home/user/data/hg38.gtf \
--refflat /home/user/data/hg38_refflat.txt \
--ribosomal /home/user/data/hg38_rrna.interval_list \
--libraries lib1,lib2,lib3
Here lib1, lib2, and lib3 are library IDs. The library IDs provided should match the beginning of the FASTQ file names. For example, the library ID for the FASTQ files named lib1_R1.fastq.gz and lib1_R2.fastq.gz would be lib1. Library IDs can be provided directly on the command line as a comma-separated list (--libraries lib1,lib2) or as a file that lists one library ID per line (--libfile libfile.txt).
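For directories with many libraries, the file passed to --libfile can be generated from the FASTQ names rather than typed by hand. The sketch below assumes every library has a file named exactly <ID>_R1.fastq.gz, as in the lib1 example; FASTQs with extra fields in their names (lane or sample numbers, for instance) would need a different suffix pattern. The function name make_libfile is hypothetical.

```shell
# Hypothetical helper (not part of REALLYrun): write one library ID per
# line to OUTFILE, derived from files named <ID>_R1.fastq.gz in FASTQDIR.
make_libfile() {
    fastqdir="$1"
    outfile="$2"
    for f in "$fastqdir"/*_R1.fastq.gz; do
        [ -e "$f" ] || continue                  # no matches: glob stays literal
        base=$(basename "$f")
        printf '%s\n' "${base%_R1.fastq.gz}"     # strip the assumed suffix
    done | sort -u > "$outfile"
}
```

For example, make_libfile /home/user/fastqfiles libfile.txt writes the IDs to libfile.txt, and paste -sd, libfile.txt prints the same IDs as a comma-separated list suitable for --libraries.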
Optional arguments:
--fastqdir : a path to the directory containing the raw fastqs you wish to process
--resultsdir : a path to the directory you would like the output to be in
--indexdir : a path to the directory where you would like the STAR index to be created
If not specified, each of these defaults to the current directory.
without STAR index generation
Run this if you have already generated the STAR index using the workflow above, or if you have a previously generated STAR index and a matching GTF file.
really runsamples --fastqdir /home/user/fastqfiles \
--starindex /home/user/starIndex \
--resultsdir /home/user/amazing-results \
--gtf /home/user/data/hg38.gtf \
--refflat /home/user/data/hg38_refflat.txt \
--ribosomal /home/user/data/hg38_rrna.interval_list \
--libraries lib1,lib2,lib3
This command will provide more information about usage:
really runsamples --help
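The choice between the two invocations above can be scripted so that the index is built on the first run and reused afterwards. This is a minimal sketch under one loud assumption: any existing, non-empty directory is treated as a finished STAR index, which will be wrong if an earlier index build was interrupted. The function name index_args is hypothetical; the flags it prints are the ones described above.

```shell
# Hypothetical helper (not part of REALLYrun): print the index-related
# arguments for `really runsamples`.
# ASSUMPTION: a non-empty INDEXDIR already holds a complete STAR index.
index_args() {
    indexdir="$1"    # directory that holds, or will hold, the STAR index
    reference="$2"   # reference FASTA, needed only when building the index
    if [ -d "$indexdir" ] && [ -n "$(ls -A "$indexdir" 2>/dev/null)" ]; then
        printf '%s\n' "--starindex $indexdir"
    else
        printf '%s\n' "--reference $reference --indexdir $indexdir"
    fi
}

# Example (assumes paths contain no spaces, since the output is word-split):
#   really runsamples $(index_args /home/user/starIndex /home/user/data/hg38.fa) \
#       --gtf /home/user/data/hg38.gtf --libraries lib1,lib2,lib3
```

The remaining arguments (--fastqdir, --resultsdir, --refflat, --ribosomal) are the same in both cases and can be appended as shown in the commands above.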
STEP 3: ANALYZE DATA
Output files will be in the directory specified by --resultsdir above, or in the current working directory if --resultsdir was not given, with individual directories for each library.
Output files:
Trimmed FASTQs
Mapped, duplicate-marked BAMs
Samtools flagstat output
Picard CollectRnaSeqMetrics output
An insert length distribution
GC Bias plot and table
A summary stats file
UMI-aware runs will also have:
Consensus reads for each UMI, generated with fgbio
A umi.bam file with corrected UMIs and UMI-aware duplicate marking
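After a run, it can be useful to confirm that every library produced an output directory before looking at individual files. The sketch below assumes each per-library directory is named exactly after its library ID; the text above only says there is one directory per library, so adjust the path test if your results are laid out differently. The function name missing_results is hypothetical.

```shell
# Hypothetical helper (not part of REALLYrun): print each library ID from
# LIBFILE (one ID per line) that has no directory under RESULTSDIR.
# ASSUMPTION: per-library directories are named exactly after the ID.
missing_results() {
    resultsdir="$1"
    libfile="$2"
    while IFS= read -r lib || [ -n "$lib" ]; do   # also handle a last line without newline
        [ -n "$lib" ] || continue                  # skip blank lines
        [ -d "$resultsdir/$lib" ] || printf '%s\n' "$lib"
    done < "$libfile"
}
```

For example, missing_results /home/user/amazing-results libfile.txt prints nothing when every library listed in libfile.txt has a directory.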
For additional information about running the pipeline, contact technicalsupport@claretbio.com