NOT your grandma’s ANALYSIS PIPELINE

For use with REALLY RNA-Seq Library Preparation Kits

This pipeline analyzes RNA-Seq data from libraries prepared with the REALLY-rNONE workflow (it also works with data from other RNA-Seq library preps). The software takes raw FASTQs as input, trims adapters, aligns reads to a user-specified reference transcriptome using the STAR aligner, and marks duplicates.
To generate UMI-deconvoluted FASTQs from BCL files, use SRSLYumi.

REALLYRUN software workflow. Raw FASTQ or BCL files are processed to generate a comprehensive data output. Steps in teal indicate processing of sequencing data without UMIs, and steps in yellow indicate processing of UMI-aware sequencing data. SRSLYumi must be run before processing UMI-aware data. A STAR index can be generated from the reference genome; this step is not required in subsequent runs (see below for more information).

STEP 1: DOWNLOAD REALLYRUN

Install REALLYRUN with:

pip3 install reallyrun

This will also install the SRSLYumi pipeline required for UMI deconvolution. The package is compatible with Python 3 and can be installed in a virtual environment if necessary. This software requires conda; we recommend mamba for speed. If you prefer to use standard conda, installation instructions can be found in the conda documentation. For reproducibility and to ensure appropriate tool versions, we use snakemake wrappers for many of the tools in this pipeline, and these are often slow to set up the first time they are used. As a result, your first run of the software may take a long time. Don't worry, this is totally normal!
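Before the first run, it can help to confirm that the required executables are visible on your PATH. The following is a minimal sketch, not part of the package: the helper name find_tools is hypothetical, and it assumes the command-line entry point is called really (as in the commands in this document) and that conda or mamba is installed under its usual name.

```python
import shutil


def find_tools(names=("really", "mamba", "conda")):
    """Map each executable name to its full path, or None if it is not on PATH.

    The default names are assumptions: `really` is the command used in this
    README, and `mamba`/`conda` are the usual names of those executables.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Only one of mamba or conda needs to be found; if really is missing, check that the pip3 install step ran in the environment you are currently using.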

MANUAL

Download the REALLYRUN code from GitHub

STEP 2: RUN REALLYRUN

With STAR index generation

The basic analysis can be run with really runsamples for standard libraries, or with really runsamples --umi for libraries with UMIs. For UMI-aware demultiplexing of REALLY libraries, please use our SRSLYumi Python package (more info at https://github.com/claretbio/SRSLYumi).

Run this if there is no STAR Index:


really runsamples --fastqdir /home/user/fastqfiles \
--resultsdir /home/user/amazing-results \
--reference /home/user/data/hg38.fa \
--gtf /home/user/data/hg38.gtf \
--refflat /home/user/data/hg38_refflat.txt \
--ribosomal /home/user/data/hg38_rrna.interval_list \
--libraries lib1,lib2,lib3

Here lib1, lib2, and lib3 are library IDs. Each library ID should match the beginning of the corresponding FASTQ file names. For example, the library ID for the FASTQ files named lib1_R1.fastq.gz and lib1_R2.fastq.gz would be lib1. Library IDs can be provided directly on the command line as a comma-separated list (--libraries lib1,lib2) or as a file that lists one library ID per line (--libfile libfile.txt).
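For directories with many libraries, the library file can be generated rather than typed by hand. The sketch below is illustrative only: the function names are hypothetical, and it assumes every FASTQ follows the simple <library ID>_R1.fastq.gz / <library ID>_R2.fastq.gz naming used in the example above. Files named differently would need a different pattern.

```python
import re
from pathlib import Path

# Assumed naming convention: <library_id>_R1.fastq.gz / <library_id>_R2.fastq.gz
FASTQ_PATTERN = re.compile(r"^(?P<lib>.+)_R[12]\.fastq\.gz$")


def library_ids(fastqdir):
    """Return the sorted, de-duplicated library IDs found in fastqdir."""
    ids = set()
    for path in Path(fastqdir).iterdir():
        match = FASTQ_PATTERN.match(path.name)
        if match:  # skip anything that is not an R1/R2 FASTQ
            ids.add(match.group("lib"))
    return sorted(ids)


def write_libfile(ids, libfile="libfile.txt"):
    """Write one library ID per line, the layout described for --libfile."""
    Path(libfile).write_text("".join(f"{lib}\n" for lib in ids))
    return libfile
```

The resulting file can then be passed with --libfile libfile.txt in place of the --libraries list.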

Optional arguments:

    --fastqdir : a path to the directory containing the raw FASTQs you wish to process

    --resultsdir : a path to the directory where you would like the output to be written

    --indexdir : a path to the directory where you would like the STAR index to be created

If not specified, each of these defaults to the current directory.

Without STAR index generation

Run this if you have already generated the STAR index using the workflow above, or if you have a previously generated STAR index and a matching GTF file.

really runsamples --fastqdir /home/user/fastqfiles \
--starindex /home/user/starIndex \
--resultsdir /home/user/amazing-results \
--gtf /home/user/data/hg38.gtf \
--refflat /home/user/data/hg38_refflat.txt \
--ribosomal /home/user/data/hg38_rrna.interval_list \
--libraries lib1,lib2,lib3
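The two invocations above differ only in whether --reference or --starindex is supplied. When launching runs from a script, that choice can be captured in a small helper. This is a sketch under stated assumptions: build_command is a hypothetical name, it uses only flags shown in this document, and the rule that exactly one of reference or starindex is given is inferred from the two examples rather than confirmed by the tool.

```python
def build_command(fastqdir, resultsdir, gtf, refflat, ribosomal, libraries,
                  reference=None, starindex=None, umi=False):
    """Assemble the argument list for `really runsamples`.

    Pass `starindex` to reuse an existing STAR index, or `reference` to have
    one generated. Requiring exactly one of the two is an assumption based on
    the example commands in this README.
    """
    if (reference is None) == (starindex is None):
        raise ValueError("provide exactly one of reference or starindex")
    cmd = ["really", "runsamples",
           "--fastqdir", str(fastqdir),
           "--resultsdir", str(resultsdir),
           "--gtf", str(gtf),
           "--refflat", str(refflat),
           "--ribosomal", str(ribosomal),
           "--libraries", ",".join(libraries)]
    if starindex is not None:
        cmd += ["--starindex", str(starindex)]
    else:
        cmd += ["--reference", str(reference)]
    if umi:
        cmd.append("--umi")  # UMI-aware run, as described above
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True), or printed with shlex.join(cmd) to inspect the command before running it.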

This command will provide more information about usage:

 really runsamples --help

STEP 3: ANALYZE DATA

Output files will be in the directory specified by --resultsdir above or in the current working directory, with individual directories for each library.
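Since each library gets its own directory, a quick post-run check is to look for those directories under the results directory. The sketch below assumes each directory is named exactly after its library ID, which this document does not state explicitly; the function name missing_library_dirs is hypothetical.

```python
from pathlib import Path


def missing_library_dirs(resultsdir, libraries):
    """Return the library IDs that have no subdirectory under resultsdir.

    Assumes each per-library output directory is named after the library ID.
    """
    root = Path(resultsdir)
    return [lib for lib in libraries if not (root / lib).is_dir()]
```

An empty list means every library produced an output directory; any IDs returned are worth checking against the run logs.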

 Output files:

  • Trimmed fastqs

  • Mapped, duplicate-marked BAMs

  • Samtools flagstat output

  • Picard CollectRnaSeqMetrics output

  • An insert length distribution

  • GC Bias plot and table

  • A summary stats file

 UMI-aware runs will also have:

  • Consensus reads for each UMI, generated with fgbio

  • A umi.bam file with corrected UMIs and UMI-aware duplicate marking

For additional information about running the pipeline, contact technicalsupport@claretbio.com