PALM 0.1.0
This R package implements PALM
[1], a quasi-Poisson-based framework designed for robust, scalable, and reproducible identification of covariate-associated microbial features in large-scale microbiome association studies and meta-analyses. The package can perform single-study association analysis and meta-analysis across multiple studies. This tutorial provides examples demonstrating meta-analysis.
Figure. 1 shows the schematic overview of PALM for meta-analysis. For each study, PALM takes RA count data of microbial features, sequencing depth, the covariate of interest, and any confounders that need to be adjusted for as input to generate AA-level summary statistics. Specifically, for each feature, PALM first fits a quasi-Poisson null model (without the covariate), then calculates score statistics based on the null model output and the covariate, and finally constructs the RA-level summary statistics (i.e., RA association effect estimate and variance) based on the score statistics. Next, PALM corrects the compositional effect using the RA-level summary statistics of all features to derive AA-level summary statistics. The forest plot illustrates the AA-level summary statistics for a feature. These summary statistics can be combined via meta-analysis to obtain the overall effect estimate, variance, and p-value. Covariate-associated microbial features are selected based on the test p-values, accounting for multiple testing across features.
Figure 1: Schematic overview of PALM for meta-analysis
Install package from GitHub.
if(!require("PALM", quietly = TRUE)){
devtools::install_github("ZjpWei/PALM_package")
}
Load the required packages.
library("PALM")
library("ggplot2")
library("cowplot")
library("tidyverse")
library("DT")
options(DT.options = list(
initComplete = JS("function(settings, json) {",
"$(this.api().table().header()).css({'background-color':
'#000', 'color': '#fff'});","}")))
There are two approaches PALM can run meta-analysis.
Approach #1: If the individual-level sample data across all participating studies are available, the function palm
can take these data as input to generate, harmonize, and combine summary association statistics across studies for fast and reliable identification of microbial features.
Approach #2: If individual-level samples of each study are not in a central place, summary statistics can be generated for each study separately using the functions palm.null.model
and palm.get.summary
. The summary statistics can be transported to a central place to be harmonized and combined for feature selection using the function palm.meta.summary
.
We demonstrate Approach #1 in this session and Approach #2 in the next session.
Here we use the datasets from two metagenomics studies of colorectal cancer (CRC) [2] to demonstrate the use of each function. The CRC_abd
is a list of sample-by-feature matrices of relative abundance counts of 267 species under order Clostridiales from the two studies. The CRC_meta
is a data frame including the sample-level variables from these two studies. In particular, the following variables are in the CRC_meta
data:
Sample identity: “Sample_ID”
Study name: “Study”
Disease status: “Group”
## Load CRC data
data("CRC_data")
CRC_abd <- CRC_data$CRC_abd
CRC_meta <- CRC_data$CRC_meta
# Prepare input data
rel.abd <- list()
covariate.interest <- list()
for(d in unique(CRC_meta$Study)){
rel.abd[[d]] <- CRC_abd[CRC_meta$Sample_ID[CRC_meta$Study == d],]
disease <- as.numeric(CRC_meta$Group[CRC_meta$Study == d] == "CRC")
names(disease) <- CRC_meta$Sample_ID[CRC_meta$Study == d]
covariate.interest[[d]] <- matrix(disease, ncol = 1, dimnames = list(names(disease), "disease"))
}
# palm analysis
meta.result <- palm(rel.abd = rel.abd, covariate.interest = covariate.interest, prev.filter = 0)
The table below presents results for 20 microbial features, including overall association effect estimates, standard errors, p-values, and q-values for testing the overall effect; p-values and q-values for testing cross-study effect heterogeneity; and summary statistics (association effect estimates and standard errors) for individual studies.
meta.result$disease %>% dplyr::slice(1:20) %>%
mutate(across(where(is.numeric), ~ round(.x, digits = 4))) %>%
datatable()
The figure below displays the features with \(q\)-value \(≤ 0.05\). The dots in scatter plot represent the overall association effect estimates with lines indicating the 95% confidence intervals. The bar plot shows the \(-\log_{10}(q\text{-value})\).
## summarize results
palm.df <- meta.result$disease %>% dplyr::filter(qval <= 0.05) %>%
dplyr::filter(coef > 0) %>% arrange(qval) %>%
dplyr::add_row(meta.result$disease %>% dplyr::filter(qval <= 0.05) %>%
dplyr::filter(coef < 0) %>% arrange(desc(qval))) %>%
dplyr::transmute(feature = factor(feature, levels = feature), coef, stderr, qval, sig = (qval <= 0.05),
dir = case_when(coef > 0 & qval <= 0.05 ~ "pos",
coef < 0 & qval <= 0.05 ~ "neg",
TRUE ~ "other")) %>% arrange(desc(row_number()))
## generate bar plot
df.bar <- palm.df %>% ggplot(aes(x=feature, y=-log10(qval), fill= sig)) +
geom_bar(stat="identity") + ylim(0, 2.5) + scale_fill_manual(
breaks = c(TRUE, FALSE), values = c("grey50", "grey80")) +
geom_hline(aes(yintercept = -log10(0.05)),colour="#990000", linetype="dashed") +
theme_minimal() + coord_flip() + ylab("-log10 (q-value)") +
theme(axis.title.x = element_text(size = 10),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(hjust = 0.5, size = 20),
panel.background = element_rect(fill=NULL, colour='black', linewidth = 1),
axis.text.y = element_blank(),
axis.text.x = element_text(size = 10),
legend.title = element_text(hjust = 0.5, size = 16),
legend.text = element_text(size = 10),
legend.position = "none",
legend.direction = "vertical",
legend.box = "vertical",
strip.text = element_blank())
## generate scatter plot
df.scatter <- palm.df %>% ggplot(aes(x=feature, y=coef)) +
geom_errorbar(aes(ymin = coef - 1.96 * stderr, ymax = coef + 1.96 * stderr), color = "grey80",
width = 0, linewidth = 0.5) +
geom_point(aes(color = dir), pch = 18, size = 2.5) + ylim(-3, 3) +
geom_hline(aes(yintercept = 0),colour="#990000", linetype="dashed") +
theme_minimal() + scale_color_manual(
breaks = c("pos", "other", "neg"),
values = c("red", "grey80", "blue")) +
coord_flip() + ylab("Meta Coefficients") +
theme(axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 10),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(hjust = 0.5, size = 20),
panel.background = element_rect(fill=NULL, colour='black', linewidth = 1),
axis.text.y = element_text(size = 10),
axis.text.x = element_text(size = 10),
legend.title = element_text(hjust = 0.5, size = 16),
legend.text = element_text(size = 10),
legend.position = "none",
legend.direction = "vertical",
legend.box = "vertical",
strip.text = element_blank())
## plot
plot_grid(df.scatter, df.bar, nrow = 1, align = 'h', rel_widths = c(0.73, 0.27))
The figure below shows dots representing study-level association effect estimates with lines indicating the 95% confidence intervals. The PALM effect estimates show good consistency across studies.
df.plot <- NULL
df.area <- NULL
for(l in palm.df$feature){
df.plot <- rbind(df.plot, tibble(feature = factor(l, levels = levels(palm.df$feature)),
Study = c("FR-CRC", "DE-CRC"),
AA = c(meta.result$disease[meta.result$disease$feature == l, "FR-CRC_effect"],
meta.result$disease[meta.result$disease$feature == l, "DE-CRC_effect"]),
ci = 1.96 * c(meta.result$disease[meta.result$disease$feature == l, "FR-CRC_stderr"],
meta.result$disease[meta.result$disease$feature == l, "DE-CRC_stderr"])))
if(meta.result$disease[meta.result$disease$feature == l, "qval.het"] <= 0.1){
df.area <- rbind(df.area, tibble(species = l, AA = 0))
}
}
g.PALM <- df.plot %>% ggplot(aes(x = feature, y= AA)) +
geom_errorbar(aes(ymin = AA - ci, ymax = AA + ci, group = Study),
position = position_dodge(width = 0.6),
width = 0, linewidth = 0.5) +
geom_point(pch = 18, aes(color = Study),
size = 2.5, position = position_dodge(width = 0.6)) +
theme_minimal() + ylab("PALM\nCoefficients") +
coord_flip() +
geom_hline(aes(yintercept = 0),colour="#990000", linetype="dashed") +
ylim(-8, 4.5) +
theme(axis.title.y = element_text(size = 13),
axis.title.x = element_text(size = 13),
axis.ticks = element_blank(),
panel.grid = element_blank(),
panel.background = element_rect(fill=NULL, colour='black', linewidth = 1),
axis.text.y = element_text(size = 13),
axis.text.x = element_text(size = 13),
legend.title = element_text(hjust = 0.5, size = 16),
legend.text = element_text(size = 13),
legend.position = c(0.2, 0.85),
legend.direction = "vertical",
legend.box = "vertical",
strip.text = element_blank()) +
guides(color = guide_legend(reverse=T))
if(!is.null(df.area)){
g.PALM <- g.PALM + geom_rect(data = df.area,
aes(xmin = as.numeric(species) - 0.5,
xmax = as.numeric(species) + 0.5,
ymin = -Inf, ymax = Inf),
fill = "yellow", alpha = 0.2)
}
g.PALM
If the datasets of different studies live in different locations and cannot be conveniently shared among studies, we can first generate summary statistics for each study:
## Generate summary statistics for each study
null.obj.FR <- palm.null.model(rel.abd = rel.abd$`FR-CRC`, prev.filter = 0)
summary.stats.FR <- palm.get.summary(null.obj = null.obj.FR, covariate.interest = covariate.interest$`FR-CRC`)
null.obj.DE <- palm.null.model(rel.abd = rel.abd$`DE-CRC`, prev.filter = 0)
summary.stats.DE <- palm.get.summary(null.obj = null.obj.DE, covariate.interest = covariate.interest$`DE-CRC`)
These summary statistics can be transported to a central location for meta-analysis:
## Concatenate summary statistics
summary.stats.merge <- c(summary.stats.FR, summary.stats.DE)
names(summary.stats.merge) <- c("FR-CRC", "DE-CRC")
## Meta-analysis to harmonize and combine summary statistics across studies
meta.result.2 <- palm.meta.summary(summary.stats = summary.stats.merge)
The meta-analysis results generated in this way are identical to those from the previous session.
Microbiome association studies can involve a large number of covariates of interest (e.g., omics variables). We show here how to use PALM to meta-analyze eight microbiome-metabolome association studies [3]. In these eight studies, we have 101 genera and 450 metabolites and we are interested in identifying genera that are associated with individual metabolites. This analysis takes approximately 50 seconds.
## Load metabolite data:
## You may see the following website on how to directly load data from
## github into R https://github.com/ZjpWei/PALM/raw/main/Metabolite.rda
options(timeout = 300)
load(file = url("https://github.com/ZjpWei/PALM/raw/main/Metabolite.rda"))
## Change genera names
for(d in names(otu_data_lst)){
colnames(otu_data_lst[[d]]) <- gsub(".*;g__", "", colnames(otu_data_lst[[d]]))
}
## Get null model
null.obj <- palm.null.model(rel.abd = otu_data_lst, covariate.adjust = covariates_adjust_lst)
## Get summary statistics
summary.stat <- palm.get.summary(null.obj = null.obj, covariate.interest = cmpd_data_lst,
cluster = cluster_data_lst)
## Meta-analysis
meta.scan.result <- palm.meta.summary(summary.stats = summary.stat)
The following shows the effect estimates of 20 microbial features for 5 metabolites.
meta.sum <- lapply(meta.scan.result, function(d){d %>% tibble::column_to_rownames("feature")})
selected.num <- sort(unlist(lapply(meta.sum, function(d){sum(d$qval <= 0.05)})), decreasing = TRUE)
top.cov.name <- names(selected.num)[1:min(5, length(selected.num))]
coef_mat <- do.call(cbind, lapply(meta.sum[top.cov.name], function(d){d[,"coef",drop=FALSE]}))
colnames(coef_mat) <- top.cov.name
coef_mat %>% dplyr::slice(1:20) %>%
mutate(across(where(is.numeric), ~ round(.x, digits = 4))) %>%
datatable()
The following figure displays the features associated with butyrate production with \(q\)-value \(\leq 0.05\). The dots in scatter plot represent the overall association effect estimates with lines indicating the 95% confidence intervals. The bar plot shows the \(-\log_{10}(q\text{-value})\).
palm.df <- meta.scan.result$HMDB0000039 %>% dplyr::filter(qval <= 0.05) %>%
dplyr::filter(coef > 0) %>% arrange(qval) %>%
dplyr::add_row(meta.scan.result$HMDB0000039 %>% dplyr::filter(qval <= 0.05) %>%
dplyr::filter(coef < 0) %>% arrange(desc(qval))) %>%
dplyr::transmute(feature = factor(feature, levels = feature), coef, stderr, qval, sig = (qval <= 0.05),
dir = case_when(coef > 0 & qval <= 0.05 ~ "pos",
coef < 0 & qval <= 0.05 ~ "neg",
TRUE ~ "other")) %>% arrange(desc(row_number()))
## generate bar plot
df.bar <- palm.df %>% ggplot(aes(x=feature, y=-log10(qval), fill= sig)) +
geom_bar(stat="identity") + ylim(0, 10) + scale_fill_manual(
breaks = c(TRUE, FALSE), values = c("grey50", "grey80")) +
geom_hline(aes(yintercept = -log10(0.05)),colour="#990000", linetype="dashed") +
theme_minimal() + coord_flip() + ylab("-log10 (q-value)") +
theme(axis.title.x = element_text(size = 13),
axis.title.y = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(hjust = 0.5, size = 20),
panel.background = element_rect(fill=NULL, colour='black', linewidth = 1),
axis.text.y = element_blank(),
axis.text.x = element_text(size = 13),
legend.title = element_text(hjust = 0.5, size = 16),
legend.text = element_text(size = 13),
legend.position = "none",
legend.direction = "vertical",
legend.box = "vertical",
strip.text = element_blank())
## generate scatter plot
df.scatter <- palm.df %>% ggplot(aes(x=feature, y=coef)) +
geom_errorbar(aes(ymin = coef - 1.96 * stderr, ymax = coef + 1.96 * stderr), color = "grey80",
width = 0, linewidth = 0.5) +
geom_point(aes(color = dir), pch = 18, size = 2.5) + ylim(-0.6, 0.6) +
geom_hline(aes(yintercept = 0),colour="#990000", linetype="dashed") +
theme_minimal() + scale_color_manual(
breaks = c("pos", "other", "neg"),
values = c("red", "grey80", "blue")) +
coord_flip() + ylab("Meta Coefficients") +
theme(axis.title.x = element_text(size = 13),
axis.title.y = element_text(size = 13),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(hjust = 0.5, size = 20),
panel.background = element_rect(fill=NULL, colour='black', linewidth = 1),
axis.text.y = element_text(size = 13),
axis.text.x = element_text(size = 13),
legend.title = element_text(hjust = 0.5, size = 16),
legend.text = element_text(size = 13),
legend.position = "none",
legend.direction = "vertical",
legend.box = "vertical",
strip.text = element_blank())
## plot
plot_grid(df.scatter, df.bar, nrow = 1, align = 'h', rel_widths = c(0.62, 0.38))
The following figure displays the features associated with butyrate production with \(q\)-value \(\leq 0.05\). The dots in the scatter plot represent the study-level association effect estimates, and the lines indicate 95% confidence intervals. The PALM effect estimates show good consistency across studies.
## summarize HMDB0000039 analysis
PALM.df <- meta.scan.result$HMDB0000039 %>% dplyr::filter(qval <= 0.05) %>%
dplyr::filter(coef > 0) %>% arrange(qval) %>%
dplyr::add_row(meta.scan.result$HMDB0000039 %>% dplyr::filter(qval <= 0.05) %>%
dplyr::filter(coef < 0) %>% arrange(desc(qval))) %>% tibble::column_to_rownames("feature")
## PALM plots
df.plot <- NULL
for(l in names(otu_data_lst)){
if(paste0(l,"_effect") %in% colnames(PALM.df)){
df.plot <- rbind(df.plot, tibble(genus = factor(rownames(PALM.df), levels = rownames(PALM.df)),
Study = factor(rep(l, nrow(PALM.df)), levels = names(otu_data_lst),
labels = paste0("MTBL", 1:8)),
AA = PALM.df[,paste0(l,"_effect")],
AA.lower = PALM.df[, paste0(l,"_effect")] - PALM.df[, paste0(l,"_stderr")],
AA.upper = PALM.df[, paste0(l,"_effect")] + PALM.df[, paste0(l,"_stderr")]))
}
}
df.area <- tibble(genus = factor(rownames(PALM.df %>% dplyr::filter(qval.het <= 0.1)), levels= rownames(PALM.df)), AA = 0)
g.PALM <- df.plot %>%
ggplot(aes(x=genus, y=AA)) +
geom_errorbar(aes(ymin = AA.lower, ymax = AA.upper, group = Study),
position = position_dodge(width = 0.6),
width = 0, linewidth = 0.5) +
geom_point(pch = 18, aes(color = Study),
size = 2.5, position = position_dodge(width = 0.6)) +
ylab("PALM\nCoefficients") + xlab("features") +
geom_hline(aes(yintercept = 0),colour="#990000", linetype="dashed") +
scale_color_manual(values = c("#F8766D","#CD9600","#7CAE00","#00BE67",
"#00BFC4","#00A9FF","#C77CFF","#FF61CC"),
breaks = c("MTBL8", "MTBL7", "MTBL6", "MTBL5",
"MTBL4", "MTBL3", "MTBL2", "MTBL1")) +
coord_flip() + theme_minimal() + ylim(-2.5, 1.5) +
theme(axis.title.x = element_text(size = 15),
axis.title.y = element_text(size = 15),
axis.ticks = element_blank(),
panel.grid = element_blank(),
panel.background = element_rect(fill=NULL, colour='black', linewidth = 1),
axis.text.y = element_text(size = 15),
axis.text.x = element_text(size = 15),
legend.title = element_text(hjust = 0.5, size = 16),
legend.text = element_text(size = 13),
legend.position = c(0.15, 0.91),
legend.direction = "vertical",
legend.box = "vertical",
strip.text = element_blank()) +
guides(color = guide_legend(reverse=T))
if(nrow(df.area) != 0){
g.PALM <- g.PALM + geom_rect(data = df.area, aes(xmin = as.numeric(genus) - 0.5,
xmax = as.numeric(genus) + 0.5,
ymin = -Inf, ymax = Inf),
fill = "yellow", alpha = 0.2)
}
g.PALM
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS 15.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] DT_0.29 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
## [5] dplyr_1.1.4 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
## [9] tibble_3.2.1 tidyverse_2.0.0 cowplot_1.1.1 ggplot2_3.5.1
## [13] PALM_0.1.0 BiocStyle_2.28.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 generics_0.1.3 stringi_1.7.12
## [4] enrichwith_0.3.1 lattice_0.21-8 hms_1.1.3
## [7] digest_0.6.35 magrittr_2.0.3 timechange_0.2.0
## [10] evaluate_0.21 grid_4.3.1 brglm2_0.9.2
## [13] bookdown_0.35 fastmap_1.1.1 jsonlite_1.8.9
## [16] Matrix_1.6-0 nnet_7.3-19 BiocManager_1.30.21.1
## [19] crosstalk_1.2.0 scales_1.3.0 numDeriv_2016.8-1.1
## [22] jquerylib_0.1.4 cli_3.6.3 rlang_1.1.4
## [25] munsell_0.5.1 withr_3.0.2 cachem_1.0.8
## [28] yaml_2.3.7 tools_4.3.1 tzdb_0.4.0
## [31] metafor_4.2-0 colorspace_2.1-1 mathjaxr_1.6-0
## [34] vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
## [37] htmlwidgets_1.6.2 MASS_7.3-60 pkgconfig_2.0.3
## [40] pillar_1.10.1 bslib_0.7.0 gtable_0.3.6
## [43] glue_1.8.0 Rcpp_1.0.14 xfun_0.39
## [46] tidyselect_1.2.1 highr_0.10 rstudioapi_0.15.0
## [49] knitr_1.43 farver_2.1.2 nlme_3.1-162
## [52] htmltools_0.5.8.1 labeling_0.4.3 rmarkdown_2.23
## [55] compiler_4.3.1 metadat_1.2-0
Wei et al. Fast and reliable association discovery in large-scale microbiome studies and meta-analyses using PALM. Submitted.
Wirbel, Jakob et al. Meta-analysis of fecal metagenomes reveals global microbial features that are specific for colorectal cancer.
Muller, E., Algavi, Y.M. & Borenstein, E. The gut microbiome-metabolome dataset collection: a curated resource for integrative meta-analysis. npj Biofilms Microbiomes 8, 79 (2022).