--- title: 'NACHO' subtitle: "A NAnostring quality Control dasHbOard" author: "Mickaƫl Canouil, Ph.D., Gerard A. Bouland and Roderick C. Slieker, Ph.D." email: "mickael.canouil@cnrs.fr" date: '`r format(Sys.time(), "%B %d, %Y")`' bibliography: nacho.bib link-citations: true output: rmarkdown::html_vignette: number_sections: true toc: true toc_depth: 2 fig_width: 6.3 fig_height: 4.7 vignette: > %\VignetteIndexEntry{NACHO} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( eval = TRUE, collapse = TRUE, # results = "asis", include = TRUE, echo = TRUE, warning = TRUE, message = TRUE, error = TRUE, # tidy = FALSE, # crop = TRUE, # autodep = TRUE, fig.align = "center", fig.pos = "!h", cache = FALSE ) ``` ```{r logo, echo = FALSE, out.width = "150px"} knitr::include_graphics(path = "nacho_hex.png") ``` # Installation ```{r, eval = FALSE} # Install NACHO from CRAN: install.packages("NACHO") # Or the the development version from GitHub: # install.packages("remotes") remotes::install_github("mcanouil/NACHO") ``` ```{r, message = FALSE} # Load NACHO library(NACHO) ``` # Overview ```{r, echo = FALSE, results = "asis"} cat(readLines(system.file("app", "www", "about-nacho.md", package = "NACHO"))[-c(1, 2)], sep = "\n") ``` ```{r, echo = FALSE, results = "asis"} print(citation("NACHO"), "html") ``` ```{r, echo = FALSE, comment = ""} print(citation("NACHO"), "bibtex") ``` # An example To display the usage and utility of *NACHO*, we show three examples in which the above mentioned functions are used and the results are briefly examined. *NACHO* comes with presummarised data and in the first example we use this dataset to call the interactive web application using `visualise()`. In the second example, we show the process of going from raw RCC files to visualisations with a dataset queried from **GEO** using `GEOquery`. In the third example, we use the summarised dataset from the second example to calculate the sample specific size factors using `normalise()` and its added functionality to predict housekeeping genes. Besides creating interactive visualisations, *NACHO* also identifies poorly performing samples which can be seen under the Outlier Table tab in the interactive web application. While calling `normalise()`, the user has the possibility to remove these outliers before size factor calculation. ## Get NanoString nCounter data ### Presummarised data from *NACHO* This example shows how to use summarised data to call the interactive web application. The raw data used is from a study of @liu_pam50_2016 and was acquired from the NCBI GEO public database [@barrett_ncbi_2013]. ```{r ex1, eval = FALSE} library(NACHO) data(GSE74821) visualise(GSE74821) ``` ```{r ex1-fig, echo = FALSE, out.width = "650px"} knitr::include_graphics(path = "README-visualise.png") ``` ### Raw data from GEO Numerous NanoString nCounter datasets are available from GEO [@barrett_ncbi_2013]. In this example, we use a mRNA dataset from the study of @bruce_identification_2015 with the GEO accession number: **GSE70970**. The data is extracted and prepared using the following code. ```{r geo-down, echo = FALSE, warning = FALSE, message = FALSE, error = FALSE} gse <- try(GEOquery::getGEO("GSE70970"), silent = TRUE) if (inherits(gse, "try-error")) { # when GEOquery is down cons <- showConnections(all = TRUE) icons <- which(grepl("GSE70970", cons[, "description"])) - 1 for (icon in icons) close(getConnection(icon)) cat( "Note: `GEOquery` seems to be currently down. Thus, the following code was not executed.\n" ) } ``` ```{r ex2, results = "hide", message = FALSE, warning = FALSE, eval = !inherits(gse, "try-error")} library(GEOquery) # Download data gse <- getGEO("GSE70970") getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir()) # Unzip data untar( tarfile = file.path(tempdir(), "GSE70970", "GSE70970_RAW.tar"), exdir = file.path(tempdir(), "GSE70970", "Data") ) # Get phenotypes and add IDs targets <- pData(phenoData(gse[[1]])) targets$IDFILE <- list.files(file.path(tempdir(), "GSE70970", "Data")) ``` ```{r, echo = FALSE, message = FALSE, warning = FALSE, eval = !inherits(gse, "try-error")} targets[1:5, unique(c("IDFILE", names(targets)))] ``` After we extracted the dataset to the `` `r file.path(tempdir(), "GSE70970", "Data")` `` directory, a `Samplesheet.csv` containing a column with the exact names of the files for each sample can be written or use as is. ## The `load_rcc()` function The first argument requires the path to the directory containing the RCC files, the second argument is the location of samplesheet followed by third argument with the column name containing the exact names of the files. The `housekeeping_genes` and `normalisation_method` arguments respectively indicate which housekeeping genes and normalisation method should be used. ```{r ex3, eval = !inherits(gse, "try-error")} GSE70970_sum <- load_rcc( data_directory = file.path(tempdir(), "GSE70970", "Data"), # Where the data is ssheet_csv = targets, # The samplesheet id_colname = "IDFILE", # Name of the column that contains the unique identfiers housekeeping_genes = NULL, # Custom list of housekeeping genes housekeeping_predict = TRUE, # Whether or not to predict the housekeeping genes normalisation_method = "GEO", # Geometric mean or GLM n_comp = 5 # Number indicating how many principal components should be computed. ) ``` ```{r, echo = FALSE, results = "hide", eval = !inherits(gse, "try-error")} unlink(file.path(tempdir(), "GSE70970"), recursive = TRUE) ``` ## The `visualise()` function When the summarisation is done, the summarised (or normalised) data can be visualised using the `visualise()` function as can be seen in the following chunk of code. ```{r} #| eval: false visualise(GSE70970_sum) ``` The sidebar includes widgets to control quality-control thresholds. These widgets differ according to the selected tab. Each sample in the plots can be coloured based on either technical specifications which are included in the RCC files or based on specifications of your own choosing, though these specifications need to be included in the samplesheet. ## The `normalise()` function *NACHO* allows the discovery of housekeeping genes within your own dataset. *NACHO* finds the five best suitable housekeeping genes, however, it is possible that one of these five genes might not be suitable, which is why a subset of these discovered housekeeping genes might work better in some cases. For this example, we use the **GSE70970** dataset from the previous example. The discovered housekeeping genes are saved in the result object as **predicted_housekeeping**. ```{r ex5, eval = !inherits(gse, "try-error")} print(GSE70970_sum[["housekeeping_genes"]]) ``` ```{r intext, eval = !inherits(gse, "try-error"), echo = FALSE, results = "asis"} cat( "Let's say _", GSE70970_sum[["housekeeping_genes"]][1], "_ and _", GSE70970_sum[["housekeeping_genes"]][2], "_ are not suitable, therefore, you want to exclude these genes from the normalisation process.", sep = "" ) ``` ```{r, eval = !inherits(gse, "try-error")} my_housekeeping <- GSE70970_sum[["housekeeping_genes"]][-c(1, 2)] print(my_housekeeping) ``` The next step is the actual normalisation. The first argument requires the summary which is created with the `load_rcc()` function. The second argument requires a vector of gene names. In this case, it is a subset of the discovered housekeeping genes we just made. With the third argument the user has the choice to remove the outliers. Lastly, the normalisation method can be choosed. Here, the user has a choice between `"GLM"` or `"GEO"`. The differences between normalisation methods are nuanced, however, a preference for either method are use case specific. In this example, `"GLM"` is used. ```{r ex7, eval = !inherits(gse, "try-error")} GSE70970_norm <- normalise( nacho_object = GSE70970_sum, housekeeping_genes = my_housekeeping, housekeeping_predict = FALSE, housekeeping_norm = TRUE, normalisation_method = "GEO", remove_outliers = TRUE ) ``` `normalise()` returns a `list` object (same as `load_rcc()`) with `raw_counts` and `normalised_counts` slots filled with the raw and normalised counts. Both counts are also in the *NACHO* data.frame. ## The `autoplot()` function The `autoplot()` function provides an easy way to plot any quality-control from the `visualise()` function. ```{r, eval = FALSE} autoplot( object = GSE74821, x = "BD", colour = "CartridgeID", size = 0.5, show_legend = TRUE ) ``` The possible metrics (`x`) are: * `"BD"` (Binding Density) * `"FoV"` (Imaging) * `"PCL"` (Positive Control Linearity) * `"LoD"` (Limit of Detection) * `"Positive"` (Positive Controls) * `"Negative"` (Negative Controls) * `"Housekeeping"` (Housekeeping Genes) * `"PN"` (Positive Controls vs. Negative Controls) * `"ACBD"` (Average Counts vs. Binding Density) * `"ACMC"` (Average Counts vs. Median Counts) * `"PCA12"` (Principal Component 1 vs. 2) * `"PCAi"` (Principal Component scree plot) * `"PCA"` (Principal Components planes) * `"PFNF"` (Positive Factor vs. Negative Factor) * `"HF"` (Housekeeping Factor) * `"NORM"` (Normalisation Factor) ```{r, echo = FALSE, results = "asis"} metrics <- c( "BD" = "Binding Density", "FoV" = "Imaging", "PCL" = "Positive Control Linearity", "LoD" = "Limit of Detection", "Positive" = "Positive Controls", "Negative" = "Negative Controls", "Housekeeping" = "Housekeeping Genes", "PN" = "Positive Controls vs. Negative Controls", "ACBD" = "Average Counts vs. Binding Density", "ACMC" = "Average Counts vs. Median Counts", "PCA12" = "Principal Component 1 vs. 2", "PCAi" = "Principal Component scree plot", "PCA" = "Principal Components planes", "PFNF" = "Positive Factor vs. Negative Factor", "HF" = "Housekeeping Factor", "NORM" = "Normalisation Factor" ) for (imetric in seq_along(metrics)) { cat("\n\n###", metrics[imetric], "\n\n") print(autoplot(object = GSE74821, x = names(metrics[imetric]))) cat("\n") } ``` ## NACHO as a standalone app *NACHO* is also available as a standalone app to be used in a shiny server configuration. A convenience function `deploy()` is available to directly copy the *NACHO* app to the default directory of a shiny server. ```{r deploy, eval = FALSE} deploy(directory = "/srv/shiny-server", app_name = "NACHO") ``` The app can also be run directly, without manually summarising and normalising RCC files: ```{r app, eval = FALSE} shiny::runApp(system.file("app", package = "NACHO")) ``` ```{r app-fig, echo = FALSE, out.width = "650px"} knitr::include_graphics(path = "README-app.png") ``` ## The `render()` function The `render()` function renders a comprehensive HTML report, using `print(..., echo = TRUE)`, which includes all quality-control metrics and description of those metrics. ```{r, eval = FALSE} render( nacho_object = GSE74821, colour = "CartridgeID", output_file = "NACHO_QC.html", output_dir = ".", size = 0.5, show_legend = TRUE, clean = TRUE ) ``` The underneath function `print()` can be used directly within any Rmarkdown chunk, setting the parameter `echo = TRUE`. ```{r print, results = "asis"} print( x = GSE74821, colour = "CartridgeID", size = 0.5, show_legend = TRUE, echo = TRUE, title_level = 3 ) ``` # References