nf-core/stableexpression

This pipeline identifies the most stable genes within one or more expression datasets. This is particularly useful for identifying the most suitable RT-qPCR reference genes for a specific species.


This is the development version of the pipeline.

Launch development version https://github.com/nf-core/stableexpression

Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

Get accessions

Get Expression Atlas dataset accessions corresponding to the provided species (and optionally keywords) (run by default; optional)
Get NCBI GEO microarray dataset accessions corresponding to the provided species (and optionally keywords) (run by default; optional)

Download data

Download Expression Atlas data (run by default; optional)
Download NCBI GEO data (run by default; optional)

ID Mapping

Map gene IDs to NCBI Entrez Gene IDs (or Ensembl IDs) for standardisation among datasets using g:Profiler (run by default; optional)

Data normalisation

Normalise raw RNA-seq data using TPM (which requires downloading the corresponding genome and computing transcript lengths) or CPM.
Perform quantile normalisation on each dataset separately using scikit-learn

Merge all data
Compute base statistics for each gene, platform-wide and for each platform (RNA-seq and microarray)
Compute stability scoring

Get list of candidate genes based on base statistics
Run optimised, scalable version of NormFinder
Run optimised, scalable version of geNorm (NOT run by default; optional)
Compute stability scores for each candidate gene

Aggregate results
Prepare Dash Plotly app for further investigation of gene / sample counts
Make MultiQC report
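The TPM and CPM options named in the normalisation step can be illustrated with a minimal sketch. This is not the pipeline's code: the function names, the gene names and the lengths below are hypothetical, and only the standard definitions of the two measures are assumed.

```python
def cpm(counts):
    """Counts per million: scale each count by the sample's total count."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / total * 1e6 for gene, value in counts.items()}


def tpm(counts, lengths_kb):
    """Transcripts per million: divide by transcript length, then scale to 1e6."""
    # Length-normalised rates (reads per kilobase); lengths_kb is hypothetical input.
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical single-sample example.
sample_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
transcript_lengths_kb = {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0}

sample_cpm = cpm(sample_counts)
sample_tpm = tpm(sample_counts, transcript_lengths_kb)
```

The sketch shows why TPM needs transcript lengths while CPM does not: geneA and geneB receive the same TPM value here because their counts are proportional to their lengths, whereas their CPM values differ.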

Output files

MultiQC

This report is located at multiqc/multiqc_report.html and can be opened in a browser.

Output files

multiqc/
- MultiQC report file: multiqc_report.html.
- MultiQC data dir: multiqc_data.
- Plots created by MultiQC: multiqc_plots.

MultiQC (http://multiqc.info) is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.

Results generated by MultiQC collate pipeline QC from supported tools, e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.

Dash Plotly app

dash_app/: folder containing the Dash Plotly app

To launch the app, you must first create and activate the appropriate conda environment:

conda env create -n nf-core-stableexpression-dash -f <OUTDIR>/dash_app/spec-file.txt
conda activate nf-core-stableexpression-dash

then:

cd dash_app
python app.py

and open your browser at http://localhost:8080

Note

The app will try to use the port 8080 by default. If it is already in use, it will try 8081, 8082 and so on. Check the logs to see which port it is using.
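The port fallback described in this note can be pictured with the following sketch. It is an illustration only, not the app's actual implementation: the function name, the `max_tries` limit and the use of a plain socket bind test are all assumptions.

```python
import socket


def first_free_port(start=8080, max_tries=20, host="127.0.0.1"):
    """Return the first port in [start, start + max_tries) that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port already in use (or otherwise unavailable): try the next one.
                continue
            return port
    raise RuntimeError(f"no free port found starting from {start}")
```

Calling `first_free_port()` on a machine where 8080 is taken would return 8081 if that port is free, which matches the behaviour described above. The app's logs remain the authoritative source for the port actually used.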

Expression Atlas

Output files

public_data/expression_atlas/accessions/: accessions found when querying Expression Atlas
public_data/expression_atlas/datasets/: count datasets (normalised: *.normalised.csv / raw: *.raw.csv) and experimental designs (*.design.csv) downloaded from Expression Atlas.

GEO

Output files

public_data/geo/accessions/: accessions found when querying GEO
public_data/geo/datasets/: count datasets (normalised: *.normalised.csv / raw: *.raw.csv) and experimental designs (*.design.csv) downloaded from GEO.

IDMapping (g:Profiler)

Output files

idmapping/
- Count datasets whose gene IDs have been mapped: *.renamed.csv.
- Table associating original gene IDs and mapped gene IDs: *.mapping.csv.
- Gene metadata (name and description): *.metadata.csv.

Normalisation

Output files

normalised/: Newly normalised datasets
- normalised/deseq2/ for DESeq2
- normalised/edger/ for edgeR
quantile_normalised/: Quantile-normalised datasets
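To give an intuition for what quantile normalisation does to one sample, here is a pure-Python sketch that maps values onto evenly spaced quantiles in [0, 1] by rank. The pipeline relies on scikit-learn for this step; which scikit-learn class and settings it uses is not stated here, so the function below is a hypothetical stand-in that only illustrates the rank-based idea.

```python
def quantile_normalise(values):
    """Map each value to its rank-based quantile in [0, 1]; ties share the average rank."""
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.5]
    # Indices of the values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [rank / (n - 1) for rank in ranks]
```

After this transformation every sample has the same distribution of values regardless of its original scale, which is what makes datasets from different platforms comparable.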

Gene base statistics

Output files

merged_datasets/: Merged count datasets (sample-wide)
- merged_datasets/all/ : all datasets together
- merged_datasets/rnaseq/ : only RNA-seq datasets
- merged_datasets/microarray/ : only microarray datasets
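As a rough picture of per-gene base statistics computed over merged data, the sketch below derives the mean, standard deviation and coefficient of variation for each gene. Which statistics the pipeline actually computes is not listed in this document, so the choice of these three, the function name and the input layout (a mapping from gene ID to a list of normalised values) are assumptions.

```python
from statistics import mean, stdev


def gene_base_statistics(values_by_gene):
    """Compute mean, sample standard deviation and coefficient of variation per gene."""
    stats = {}
    for gene, values in values_by_gene.items():
        mu = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0
        # The coefficient of variation is undefined when the mean is zero.
        cv = sd / mu if mu != 0 else None
        stats[gene] = {"mean": mu, "std": sd, "cv": cv}
    return stats


# Hypothetical merged values: geneA is constant across samples, geneB varies.
merged = {"geneA": [5.0, 5.0, 5.0], "geneB": [2.0, 4.0, 6.0]}
base_stats = gene_base_statistics(merged)
```

A gene with low variation relative to its mean, such as geneA here, is the kind of gene one would expect to pass a candidate filter based on base statistics.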

Merged counts

The file containing all normalised counts is bundled as a Parquet file with the Dash Plotly app.

Output files

dash_app/data/all_counts.parquet: Merged count datasets (sample-wide)

Summary of gene statistics and scores

The gene stat summary is also bundled with the Dash Plotly app.

Output files

dash_app/data/all_genes_summary.csv: file containing all gene statistics and scores, with genes ranked by stability score
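Because the summary file is a ranked CSV, the top candidates can be pulled out without knowing its column names. The sketch below simply returns the first rows of the file; it assumes that the most stable genes come first, which this document does not state explicitly, and the function name is hypothetical.

```python
import csv
from itertools import islice


def top_ranked_genes(summary_csv, n=10):
    """Return the first n rows of the summary file as dictionaries keyed by column name."""
    with open(summary_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        return list(islice(reader, n))
```

For example, `top_ranked_genes("dash_app/data/all_genes_summary.csv", n=5)` would return the first five rows, each as a dictionary whose keys are whatever columns the file contains.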

Overall experimental design

Output files

dash_app/data/whole_design.csv: file containing all experimental design information

Pipeline information

Output files

pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Parameters used by the pipeline run: params.json.

Nextflow can generate several reports about the execution of the pipeline. These help you troubleshoot errors in a run and provide other information such as launch commands, run times and resource usage.
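When troubleshooting, a quick tally of task outcomes from the execution trace is often a useful first step. The sketch below assumes the trace is a tab-separated file with a header row containing a `status` column, as in Nextflow's default trace fields to the best of my knowledge; the function name is hypothetical and the path is passed in by the caller.

```python
import csv
from collections import Counter


def count_task_statuses(trace_path):
    """Count tasks per value of the 'status' column in a tab-separated trace file."""
    with open(trace_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["status"] for row in reader)
```

A result with any status other than a completed or cached one points to the tasks worth inspecting in the HTML execution report.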
