## Meta-analysis using Stata

Senior Statistician and Software Developer

StataCorp LLC

Stata Conference

July 30, 2020

## Outline

• What is meta-analysis? Why and when should you use it?
• Data setup: Effect sizes and meta-analysis models
• The meta suite: Exploring the syntax
• Examples: Two case studies (subgroup analysis with the BCG vaccine efficacy against tuberculosis data; publication bias with the NSAIDS data)
• The meta control panel
• Summary

## What is meta-analysis (MA)?

• MA is the science of combining results from multiple studies addressing a similar scientific question
• MA has been mostly used in medicine, but also in econometrics, ecology, psychology, and education, to name a few
• The goal of MA is to explore consistencies and discrepancies among the studies and, if sensible, provide a unified conclusion
• Potential problem: publication bias, which occurs when the results of the published literature in a certain domain differ systematically from the results of all the relevant research

## Why would you want to use meta-analysis?

• An increase in power and improvement in precision
• The ability to answer questions not posed by individual studies and to settle controversies arising from conflicting claims

## Data setup

• $$K$$ studies
• Study $$j$$ estimates an effect size, $$\theta_j$$, and its standard error, $$\sigma_j$$
• Effect size (ES): a value that reflects the magnitude of group differences or the strength of a relationship between two variables
• ES comparing a treatment group vs a control group: e.g. OR, RR, RD, Hedges's $$g$$, Cohen's $$d$$, etc.
• ES relating variable 1 to variable 2: e.g. correlation coefficient $$r$$, regression coefficient $$\beta$$, etc.

## MA models

$$K$$ independent studies, each reports:

• An estimate, $$\hat{\theta}_j$$, of the true (unknown) effect size $$\theta_j$$
• An estimate, $$\hat{\sigma}_j$$, of its standard error

$$\hat{\theta}_j = \theta_j + \epsilon_j, \quad \epsilon_{j}\sim N\left(0, \hat{\sigma}^2_j\right)$$

| Model | Assumption | Target of inference $$\theta$$ |
|---|---|---|
| Common effect (CE) | $$\theta_1=\theta_2=\dots=\theta_K$$ | common value $$\theta$$ |
| Fixed effects (FE) | $$\theta_j$$ fixed | $$\theta = \text{weighted avg}(\theta_j)$$ |
| Random effects (RE) | $$\theta_j = \theta + u_j \sim N\left(\theta, \tau^2\right)$$ | $$\theta= E\left(\theta_j \right)$$ |

• Estimating $$\theta$$ (and $$\tau^2$$ with the RE model) is one of the main goals of MA

$$\hat{\theta} = \frac{\sum_{j=1}^K w_j\hat{\theta}_j}{\sum_{j=1}^K w_j}$$
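The weighted average above can be sketched in a few lines of Python. This is only an illustration, not the computation Stata performs: the slide does not define the weights $$w_j$$, so the sketch assumes the inverse-variance weights $$w_j = 1/\hat{\sigma}^2_j$$ of a common-effect analysis, and the function name `pooled_estimate` is hypothetical. The input values are the precomputed effect sizes and standard errors listed later in this deck.

```python
import math

def pooled_estimate(estimates, std_errors):
    """Weighted average of study estimates.

    Assumption: inverse-variance weights w_j = 1 / se_j**2
    (common-effect weighting); the slide does not specify w_j.
    """
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    theta_hat = sum(w * es for w, es in zip(weights, estimates)) / total
    se_theta = math.sqrt(1.0 / total)  # standard error of the pooled estimate
    return theta_hat, se_theta

# Effect sizes and standard errors from the precomputed-ES listing in this deck
es = [0.03, 0.12, -0.14, 1.18, 0.26]
se = [0.125, 0.147, 0.167, 0.373, 0.369]
theta_hat, se_theta = pooled_estimate(es, se)
print(f"theta_hat = {theta_hat:.3f}, SE = {se_theta:.3f}")
```

With these weights, precise studies (small standard errors) dominate the average; a random-effects analysis would instead add an estimate of $$\tau^2$$ to each study variance before inverting.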

# The meta suite

## Exploring the syntax

## Data setup and MA declaration

• meta set: pre-computed (generic) effect sizes
• meta esize: effect sizes for binary data and effect sizes for continuous data

Both commands create variables starting with _meta_ (e.g. _meta_es, _meta_se) to be used with all other meta commands (meta summarize, meta forestplot, meta funnelplot, meta regress, etc.).

Binary summary data:

+----------------------------------+
| study  nt1     nt0   nc1     nc0 |
|----------------------------------|
|     1    4     119    11     128 |
|     2    6     300    29     274 |
|     3    3     228    11     209 |
|     4   62   13536   248   12619 |
|     5   33    5036    47    5761 |
+----------------------------------+

Precomputed effect size data:

+----------------------+
| study      ES  ES_se |
|----------------------|
|     1     .03   .125 |
|     2     .12   .147 |
|     3    -.14   .167 |
|     4    1.18   .373 |
|     5     .26   .369 |
+----------------------+

Continuous summary data:

+--------------------------------------------------+
| study   n1       m1     sd1   n2      m2     sd2 |
|--------------------------------------------------|
|     1   13    0.096   0.020   14   0.920   0.047 |
|     2   18   -0.000   0.066   11   1.110   0.094 |
|     3   10    0.054   0.088   11   0.956   0.040 |
|     4   15    0.000   0.019   20   0.899   0.098 |
|     5   15    0.036   0.020   10   1.102   0.014 |
+--------------------------------------------------+


Precomputed effect size data (ES and CI):

+----------------------------------------+
| study      ES          cil         ciu |
|----------------------------------------|
|     1     .03    -.2149955    .2749955 |
|     2     .12   -.16811471   .40811471 |
|     3    -.14   -.46731399   .18731399 |
|     4    1.18    .44893343   1.9110666 |
|     5     .26   -.46322671   .98322671 |
+----------------------------------------+

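The corresponding declarations are:

. meta esize nt1 nt0 nc1 nc0         // binary summary data
. meta esize n1 m1 sd1 n2 m2 sd2     // continuous summary data
. meta set ES ES_se                  // precomputed effect sizes and standard errors
. meta set ES cil ciu                // precomputed effect sizes and confidence limits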
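The two precomputed listings above describe the same studies, once with standard errors and once with confidence limits. A minimal Python sketch of how one maps to the other, assuming symmetric, normal-based 95% limits (whether meta set uses exactly this conversion internally is an assumption here, and `se_from_ci` is a hypothetical helper name):

```python
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Recover a standard error from a symmetric normal-based CI."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    return (upper - lower) / (2 * z)

# (cil, ciu) pairs from the "ES and CI" listing above
cis = [
    (-0.2149955, 0.2749955),
    (-0.16811471, 0.40811471),
    (-0.46731399, 0.18731399),
    (0.44893343, 1.9110666),
    (-0.46322671, 0.98322671),
]
recovered = [se_from_ci(lo, hi) for lo, hi in cis]
for study, se in enumerate(recovered, start=1):
    print(study, round(se, 3))
```

The recovered values agree with the ES_se column of the first precomputed listing (.125, .147, .167, .373, .369).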

## Effect sizes computed from summary data

• Binary summary data ($$2\times 2$$ tables): log odds-ratio $$\log$$(OR), $$\log$$(ORpeto), log risk-ratio $$\log$$(RR), and risk difference $$RD$$
. meta esize nt1 nt0 nc1 nc0
• Continuous summary data (sample size, mean, and standard deviation for each group): Hedges's $$g$$, Cohen's $$d$$, Glass's $$\Delta_1$$ and $$\Delta_2$$, and (raw) mean difference $$D$$
. meta esize n1 m1 sd1 n2 m2 sd2

## Pre-computed effect sizes

• Correlation $$r$$, $$\log$$(HR), $$\text{logit}$$($$p$$), etc.
. meta set ES ES_se

## Effect sizes for binary data

. webuse bcg, clear
(Efficacy of BCG vaccine against tuberculosis)
. keep studylbl npost - nnegc
. describe 
------------------------------------------------------------------------------------------
storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------
studylbl        str27   %27s                  Study label
npost           int     %9.0g                 Number of TB positive cases in treated group
nnegt           long    %9.0g                 Number of TB negative cases in treated group
nposc           int     %9.0g                 Number of TB positive cases in control group
nnegc           long    %9.0g                 Number of TB negative cases in control group
------------------------------------------------------------------------------------------

. list in 1/3

     +--------------------------------------------------------+
     |               studylbl   npost   nnegt   nposc   nnegc |
     |--------------------------------------------------------|
  1. |          Aronson, 1948       4     119      11     128 |
  2. | Ferguson & Simes, 1949       6     300      29     274 |
  3. | Rosenthal et al., 1960       3     228      11     209 |
     +--------------------------------------------------------+

Each study $$j$$ yields a $$2\times 2$$ table, e.g. for study 1:

| Group | TB+ | TB- |
|---|---|---|
| Vaccinated | npost = 4 | nnegt = 119 |
| Control | nposc = 11 | nnegc = 128 |

meta esize computes one of $$\log(OR_j)$$, $$\log(ORpeto_j)$$, $$\log(RR_j)$$, or $$RD_j$$, together with their SEs and CIs.

## Effect sizes for binary data

. meta esize npost nnegt nposc nnegc 
Meta-analysis setting information

Study information
No. of studies:  13
Study label:  Generic                      <--- controlled by studylabel()
Study size:  _meta_studysize
Summary data:  npost nnegt nposc nnegc

Effect size
Type:  lnoratio                     <--- controlled by esize()
Label:  Log Odds-Ratio               <--- controlled by eslabel()
Variable:  _meta_es
Zero-cells adj.:  None; no zero cells          <--- controlled by zerocells()

Precision
Std. Err.:  _meta_se
CI:  [_meta_cil, _meta_ciu]
CI level:  95%                          <--- controlled by level()

Model and method                                <--- controlled by random[()], fixed[()],
Model:  Random-effects                            and common[()]
Method:  REML

• We can now use, for example, meta summarize to compute the overall effect size (mean log odds-ratio in this example)
. meta summarize
  Effect-size label:  Log Odds-Ratio
Effect size:  _meta_es
Std. Err.:  _meta_se

Meta-analysis summary                     Number of studies =     13
Random-effects model                      Heterogeneity:
Method: REML                                          tau2 =  0.3378
I2 (%) =   92.07
H2 =   12.61

--------------------------------------------------------------------
Study | Log Odds-Ratio    [95% Conf. Interval]  % Weight
------------------+-------------------------------------------------
Study  1 |         -0.939      -2.110       0.233      4.98
Study  2 |         -1.666      -2.560      -0.772      6.34
Study  3 |         -1.386      -2.677      -0.096      4.49
(Output omitted)
Study 11 |         -0.341      -0.560      -0.121      9.88
Study 12 |          0.447      -0.986       1.879      3.97
Study 13 |         -0.017      -0.542       0.507      8.45
------------------+-------------------------------------------------
theta |         -0.745      -1.110      -0.381
--------------------------------------------------------------------
Test of theta = 0: z = -4.01                     Prob > |z| = 0.0001
Test of homogeneity: Q = chi2(12) = 163.16         Prob > Q = 0.0000
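The Study 1 row of the output above can be checked by hand from its $$2\times 2$$ table. The sketch below uses the textbook large-sample formulas for the log odds-ratio and log risk-ratio and assumes no zero cells; it is an illustration in Python, not Stata's implementation, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def log_odds_ratio(a, b, c, d, level=0.95):
    """Log odds-ratio, its standard error, and a normal-based CI.

    a, b: events and non-events in the treated group
    c, d: events and non-events in the control group
    Assumes no zero cells (no continuity correction is applied).
    """
    es = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return es, se, (es - z * se, es + z * se)

def log_risk_ratio(a, b, c, d):
    """Log risk-ratio and its standard error for the same 2x2 layout."""
    es = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return es, se

# Study 1 (Aronson, 1948): npost = 4, nnegt = 119, nposc = 11, nnegc = 128
es, se, (lo, hi) = log_odds_ratio(4, 119, 11, 128)
print(f"log(OR) = {es:.3f}, SE = {se:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")

rr_es, rr_se = log_risk_ratio(4, 119, 11, 128)
print(f"log(RR) = {rr_es:.3f}, SE = {rr_se:.3f}")
```

The log odds-ratio and its limits round to -0.939, -2.110, and 0.233, matching the Study 1 row reported by meta summarize.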

• Compute $$\log(RR)$$ (esize(lnrratio)) and use an RE model based on the DerSimonian-Laird method (random(dlaird))
. meta esize npost nnegt nposc nnegc, esize(lnrratio) random(dlaird)

Or equivalently,

. meta update, esize(lnrratio) random(dlaird)

Meta-analysis setting information

Study information
No. of studies:  13

(omitted output)

Effect size
Type:  lnrratio
Label:  Log Risk-Ratio
Variable:  _meta_es
Zero-cells adj.:  None; no zero cells

(omitted output)

Model and method
Model:  Random-effects
Method:  DerSimonian-Laird



You may change the default MA model using one of the options random[()], common[()], or fixed[()], and the default effect size via the option esize().

You may provide more descriptive labels for the studies and the effect size using the options studylabel() and eslabel():

. meta update, studylabel(studylbl) eslabel("My label")

Meta-analysis setting information from meta esize

Study information
No. of studies:  13
Study label:  studylbl
Study size:  _meta_studysize
Summary data:  npost nnegt nposc nnegc

Effect size
Type:  lnrratio
Label:  My label
Variable:  _meta_es
Zero-cells adj.:  None; no zero cells

Precision
Std. Err.:  _meta_se
CI:  [_meta_cil, _meta_ciu]
CI level:  95%

Model and method
Model:  Random-effects
Method:  DerSimonian-Laird


• If there are zero cells, you can specify how to handle them via the zerocells() option
. meta update, zerocells(.2)

// or

. meta update, zerocells(tacc)

We will construct a forest plot for the first four studies to see the effect of adding study labels and an effect-size label

. meta update, studylabel(studylbl) eslabel("Log(RR)")
. meta forestplot in 1/4

[Figure: forest plots of the first four studies, with and without the options studylabel(studylbl) and eslabel("Log(RR)")]

• At any point in your analysis, you may use meta query to remind yourself of your current MA settings
. meta query 
-> meta esize npost nnegt nposc nnegc , esize(lnrratio) studylabel(studylbl) eslabel(My
> label) random(dlaird)

Meta-analysis setting information from meta esize

Study information
No. of studies:  13
Study label:  studylbl
Study size:  _meta_studysize
Summary data:  npost nnegt nposc nnegc

Effect size
Type:  lnrratio
Label:  My label
Variable:  _meta_es
Zero-cells adj.:  None; no zero cells

Precision
Std. Err.:  _meta_se
CI:  [_meta_cil, _meta_ciu]
CI level:  95%

Model and method
Model:  Random-effects
Method:  DerSimonian-Laird



## Summary I

• If you have access to summary data, use meta esize to compute and declare effect sizes such as an odds ratio or a Hedges's $$g$$.
• Alternatively, if you have only precomputed (generic) effect sizes, use meta set.
• meta set and meta esize create system variables with names starting with _meta_ to be used by all subsequent meta commands.
• To update some of your meta-analysis settings after the declaration, use meta update.
• To check whether your data are already meta set or to see the current meta settings, use meta query.

# Data sets used

Two data sets (bcg.dta and nsaids.dta) will be used throughout this webinar.

# Exploring heterogeneity

## Subgroup analysis

Case study: Efficacy of BCG vaccine against tuberculosis

• Heterogeneity: Variability among the effect sizes beyond what is expected due to random sampling (chance).
• Exploring the possible reasons for heterogeneity between studies is an important aspect of a MA
. webuse bcgset, clear
(Efficacy of BCG vaccine against tuberculosis; set with -meta esize-)
. describe npost - studylbl
------------------------------------------------------------------------------------------
storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------
npost           int     %9.0g                 Number of TB positive cases in treated group
nnegt           long    %9.0g                 Number of TB negative cases in treated group
nposc           int     %9.0g                 Number of TB positive cases in control group
nnegc           long    %9.0g                 Number of TB negative cases in control group
latitude        byte    %9.0g                 Absolute latitude of the study location (in degrees)
studylbl        str27   %27s                  Study label
------------------------------------------------------------------------------------------

• The MA consists of 13 studies (Colditz et al.) evaluating the efficacy of the BCG vaccine against tuberculosis (TB)
• Vaccine efficacy has been controversial
. list author npost - nnegc latitude in 1/3

     +----------------------------------------------------------------+
     |           author   npost   nnegt   nposc   nnegc   latitude    |
     |----------------------------------------------------------------|
  1. |          Aronson       4     119      11     128         44    |
  2. | Ferguson & Simes       6     300      29     274         55    |
  3. | Rosenthal et al.       3     228      11     209         42    |
     +----------------------------------------------------------------+

. meta esize npost - nnegc, esize(lnrratio) studylabel(studylbl)
. meta forestplot
. meta forest, eform nullrefline

[Figure: forest plot of the BCG studies; annotations point out studies with a nonsignificant RR and studies with nonoverlapping CIs]

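## Quantifying heterogeneity

Total observed heterogeneity = sampling error (within-study heterogeneity) + between-study heterogeneity

$$I^2 = \frac{\text{between-study heterogeneity}}{\text{total observed heterogeneity}} \approx 92\%$$

$$H^2 = \frac{\text{total observed heterogeneity}}{\text{sampling error}} \approx 12.86$$

• meta summarize and meta forestplot report $$I^2$$ and $$H^2$$
• Subgroup analysis focuses on explaining between-study heterogeneity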
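Read as variance ratios, the two statistics carry the same information. The Python sketch below assumes the usual random-effects definitions, $$I^2 = \tau^2/(\tau^2 + s^2)$$ and $$H^2 = (\tau^2 + s^2)/s^2$$, where $$s^2$$ stands for a typical within-study variance. These definitions and the function names are assumptions for illustration and are not spelled out on the slide.

```python
def i2_from_tau2(tau2, s2):
    """I^2 (in percent): share of total variability due to between-study variance."""
    return 100.0 * tau2 / (tau2 + s2)

def h2_from_i2(i2_percent):
    """H^2 implied by I^2 (in percent), using H^2 = 1 / (1 - I^2)."""
    return 1.0 / (1.0 - i2_percent / 100.0)

# I2 reported earlier for the log odds-ratio analysis: 92.07%
print(round(h2_from_i2(92.07), 2))  # 12.61
```

The result agrees with the H2 = 12.61 shown next to I2 (%) = 92.07 in the earlier meta summarize output.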

# Subgroup analysis

• Subgroup analysis involves dividing the data into subgroups in order to make comparisons between them.
• The studies are grouped based on study or participants’ characteristics, and an overall effect-size estimate is computed for each group
• The goal of subgroup analysis is to compare these overall estimates across groups and determine whether the considered grouping helps explain some of the observed between-study heterogeneity.

# Compare the BCG vaccine efficacy in cold vs hot climate

Berkey et al. (1995) and Borenstein et al. (2009) suggested that latitude (as a surrogate for climate) could explain some of the variation in the efficacy of the BCG vaccine

• We will dichotomize latitude into two categories: hotter climate vs colder climate
. generate byte latitude_01 = latitude_c > 0

. label define latval 0 "hot climate" 1 "cold climate"

. label values latitude_01 latval


. list studylbl latitude latitude_01 in 1/5

     +-------------------------------------------------------+
     |                    studylbl   latitude    latitude_01 |
     |-------------------------------------------------------|
  1. |               Aronson, 1948         44   cold climate |
  2. |      Ferguson & Simes, 1949         55   cold climate |
  3. |      Rosenthal et al., 1960         42   cold climate |
  4. |     Hart & Sutherland, 1977         52   cold climate |
  5. | Frimodt-Moller et al., 1973         13    hot climate |
     +-------------------------------------------------------+

. meta forestplot, subgroup(latitude_01) nullrefline rr

[Figure: subgroup forest plot showing a summary for each group and a test of $$H_0: \theta_{grp1} = \theta_{grp2}$$]

• You may report your results as vaccine efficacies via the transform() option
. meta forest, subgroup(latitude_01) ///
      transform("Vaccine efficacy": efficacy)

Other supported transformations within the transform() option are: corr, exp, invlogit, and tanh.
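As a rough guide to what these transformations do to an effect size, the Python sketch below writes them out. It assumes that efficacy is defined as one minus the risk ratio, that is $$1 - \exp(\theta)$$ for a log risk-ratio $$\theta$$, and that corr maps a Fisher's $$z$$ value back to a correlation via tanh. Both readings are assumptions for illustration, and the input value is hypothetical.

```python
import math

def efficacy(log_rr):
    """Vaccine efficacy implied by a log risk-ratio: 1 - RR (assumed definition)."""
    return 1.0 - math.exp(log_rr)

def invlogit(x):
    """Inverse logit: maps a log odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

transforms = {
    "efficacy": efficacy,
    "exp": math.exp,
    "invlogit": invlogit,
    "tanh": math.tanh,  # corr is assumed to apply the same mapping to Fisher's z
}

log_rr = math.log(0.5)  # hypothetical pooled log risk-ratio (RR = 0.5)
for name, func in transforms.items():
    print(name, round(func(log_rr), 3))
```

A risk ratio of 0.5 corresponds to an efficacy of 0.5: the vaccinated group has half the risk of the control group.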

# Summary II

• Heterogeneity is the variability among the ES beyond what is expected due to random sampling.
• $$I^2$$, $$H^2$$ are statistics used to quantify heterogeneity among the ES
• Whenever possible, reasons behind heterogeneity should always be explored via subgroup analysis or meta-regression.
• Large unexplained heterogeneity could mean that the overall ES has no meaningful interpretation in practice or that it does not make sense to conduct a meta-analysis.

# Small-study effect (Publication bias)

• The term small-study effects (Sterne, Gavaghan, and Egger 2000) is used in MA to describe cases in which the results of smaller studies differ systematically from the results of larger studies.
• One of the reasons behind small-study effects is publication bias (or, more generally, reporting bias)
• Publication bias arises when the decision to publish a study depends on the statistical significance of its results.

• Suppose that we are missing some of the studies in our MA.
• If the studies not included in the MA (missing studies) are a random subset of all studies, the observed studies still yield valid conclusions, albeit with wider CIs and less powerful tests (less information).
• If the missing studies are systematically different from the observed studies (e.g. when smaller studies with nonsignificant findings are suppressed from publication), our meta-analytic results will be biased and decisions based on them are invalid.

## Tools for small-study effects analysis

• The funnel plot (meta funnelplot): simple funnel plot; contour-enhanced funnel plot (one-sided and two-sided significance contours); several precision metrics
• Tests for small-study effects (meta bias): Egger's, Peters's, and Harbord's regression-based tests, with the possibility of including moderators to account for heterogeneity; Begg and Mazumdar's test
• The trim-and-fill analysis (meta trimfill)

In the absence of small-study effects,

$$\hat{\theta}_j \sim N\left( \theta, \sigma^2_j \right )$$

which means the individual ES should be distributed randomly around the overall ES.

• Little evidence of small-study effects: large and small studies tell the same story about $$\theta$$
• Small-study effects (potentially due to publication bias): large and small studies tell different stories about $$\theta$$
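The normal model above is what gives the funnel its shape: for a study with standard error $$\sigma_j$$, about 95% of estimates should fall within $$\theta \pm 1.96\,\sigma_j$$, so the band widens as studies get smaller. A minimal Python sketch of these pseudo confidence limits follows; the function names and the example inputs are hypothetical.

```python
from statistics import NormalDist

def pseudo_ci(theta, se, level=0.95):
    """Pseudo confidence limits around the overall ES for a given standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return theta - z * se, theta + z * se

def share_inside(estimates, std_errors, theta, level=0.95):
    """Fraction of study estimates that fall inside their pseudo confidence limits."""
    inside = 0
    for es, se in zip(estimates, std_errors):
        lo, hi = pseudo_ci(theta, se, level)
        inside += lo <= es <= hi
    return inside / len(estimates)

# Hypothetical example: limits for a precise and an imprecise study around theta = 0
print(pseudo_ci(0.0, 0.1))
print(pseudo_ci(0.0, 1.0))
```

When small studies fall mostly on one side of the overall ES, or outside these limits more often than expected, the funnel looks asymmetric, which is the graphical signature of small-study effects.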

. webuse nsaidsset, clear
(Effectiveness of nonsteroidal anti-inflammatory drugs; set with -meta esize-)
. meta funnelplot

 Effect-size label:  Log Odds-Ratio
Effect size:  _meta_es
Std. Err.:  _meta_se
Model:  Common-effect
Method:  Inverse-variance


[Figure: funnel plot; an annotation marks a gap in the plot (missing studies?)]

• You may further customize the contour-enhanced funnel plot via the addplot() option
. scalar theta = r(theta) // obtained from the r() results of the previous -meta funnel- command

// position legend at 10 o'clock inside the graph region
. local legopts ring(0) position(10) cols(1) size(small) symxsize(*0.6)

. local opts horizontal range(0 1.6) lpattern(dash) lcolor("red") ///
      legend(order(1 2 3 4 5 6) label(6 "95% pseudo CI") `legopts')

. meta funnel, contours(1 5 10) ///
      addplot(function theta-1.96*x, `opts' || function theta+1.96*x, `opts')

• We will test for funnel-plot asymmetry using Harbord's test instead of Egger's test because we are working with $$\log$$(OR)
. meta bias, harbord
 Effect-size label:  Log Odds-Ratio
Effect size:  _meta_es
Std. Err.:  _meta_se

Regression-based Harbord test for small-study effects
Random-effects model
Method: REML

H0: beta1 = 0; no small-study effects
beta1 =      3.03
SE of beta1 =     0.741
z =      4.09
Prob > |z| =    0.0000
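The test statistic in this output is a Wald-type ratio of the slope to its standard error, which can be verified directly from the two reported numbers. The short Python check below assumes a standard normal reference distribution and a two-sided p-value.

```python
from statistics import NormalDist

beta1, se_beta1 = 3.03, 0.741  # slope and its SE as reported in the output above

z = beta1 / se_beta1
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(f"z = {z:.2f}, p = {p:.4f}")
```

The ratio reproduces z = 4.09, and the p-value is small enough to print as 0.0000 at four decimals, so the hypothesis of no small-study effects is rejected.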


• We can perform a trim-and-fill analysis to assess the effect of missing studies on the overall ES and request a contour-enhanced funnel plot based on the complete (observed + filled) set of studies
. meta trimfill, funnel(contours(1 5 10) legend(`legopts'))

Nonparametric trim-and-fill analysis of publication bias
Linear estimator, imputing on the left

Iteration                            Number of studies =     47
Model: Random-effects                       observed =     37
Method: REML                                  imputed =     10

Pooling
Model: Random-effects
Method: REML

---------------------------------------------------------------
Studies |   Log Odds-Ratio    [95% Conf. Interval]
---------------------+-----------------------------------------
Observed |            1.322       1.031       1.613
Observed + Imputed |            1.035       0.726       1.343
---------------------------------------------------------------
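To judge the practical size of the trim-and-fill adjustment, it can help to move the two pooled log odds-ratios back to the odds-ratio scale. The Python lines below simply exponentiate the values reported by meta trimfill; they are a reading aid, not part of the Stata analysis.

```python
import math

# Pooled log odds-ratios from the trim-and-fill output
observed, with_imputed = 1.322, 1.035

or_observed = math.exp(observed)
or_adjusted = math.exp(with_imputed)
print(f"OR (observed) = {or_observed:.2f}")
print(f"OR (observed + imputed) = {or_adjusted:.2f}")
```

The overall odds ratio drops from about 3.75 to about 2.82 once the 10 imputed studies are included, but it remains above 1.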

## Summary III

• Publication bias occurs if studies with favourable results are more likely to be published than studies with unfavourable results.
• Small-study effect is manifested graphically by funnel-plot asymmetry
• Publication bias is only one of the reasons behind funnel-plot asymmetry.
• Publication bias should be assessed after you have accounted for heterogeneity in your MA (see example 8 of meta funnelplot and example 1 of meta bias)
• You can investigate small-study effects visually via meta funnelplot, test for it via meta bias, and assess its impact on the overall ES via meta trimfill

# Other features

• Cumulative (and stratified-cumulative) MA forest plots
• L'Abbé plots
• Multiple subgroup analyses forest plots
• Meta-regression
• Bubble plots after meta-regression
• Effect sizes for continuous data (Hedges's $$g$$, Cohen's $$d$$, etc.)
• Pre-computed effect sizes (correlation $$r$$, $$\log(HR)$$, etc.)
• Stratified funnel plots with various precision metrics

# The meta control panel

Prefer to avoid typing commands? Everything I have shown you can be done in the meta control panel with a few mouse clicks.

[Figure: the meta control panel, annotated with the corresponding commands: meta set, meta esize, meta summarize, meta forestplot, meta labbeplot, meta regress, estat bubbleplot, meta funnelplot, meta bias, and meta trimfill]

# Summary

• A new meta suite is available in Stata 16 to perform MA
• Effect sizes for binary and continuous data may be computed via meta esize, and generic (pre-computed) ES may be specified via meta set
• Use meta update and meta query to update and describe your current MA settings, respectively
• Results of a MA are best summarized numerically using meta summarize or graphically using meta forestplot. This includes subgroup-analysis forest plots and cumulative MA (CMA) forest plots.
• When substantial heterogeneity is present among the studies, the reasons behind this heterogeneity should be explored via subgroup analysis (meta summarize, subgroup()) or meta-regression (meta regress)
• It is important to include an assessment of publication bias to ensure the integrity of the MA. This may be done using the meta funnelplot, meta bias, and meta trimfill commands