{treeheatr} and {pmlbr}

visualizing decision trees
on benchmark datasets

Trang Le

University of Pennsylvania

RLadies Johannesburg 2021-09-14

@trang1618

slides.com/trang1618/rladies-johannes/

part 1

Your decision tree may be cool, but what if I tell you
you can make it hot?

🌳

🔥

decision tree + heatmap

rpart.plot

Terry Therneau, Beth Atkinson, Brian Ripley

visNetwork::visTree()

Almende B.V., Benoit Thieurmel, Titouan Robert

partykit::plot.party()

Torsten Hothorn, Heidi Seibold, Achim Zeileis

ggparty

Martin Borkovec, Niyaz Madin, et al.

dtreeviz

(Python)

Terence Parr, Prince Grover

treeheatr

target: dependent variable/outcome/phenotype

features: predictors/variables

treeheatr::heat_tree(x)

x can be an object of class

party (or constparty): a precomputed tree
partynode: a manually specified tree
data.frame (or tibble): the data from which a tree is computed
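
As a rough illustration of two of the input types listed above, the sketch below passes first a party object and then a plain data.frame to heat_tree(). It assumes the partykit and treeheatr packages are installed and uses the built-in iris data, which is not part of the original slides; the `plots` list is a hypothetical container used only for this example.

```r
# Sketch only: iris and the `plots` list are illustrative assumptions.
plots <- list()

if (requireNamespace("partykit", quietly = TRUE) &&
    requireNamespace("treeheatr", quietly = TRUE)) {
  # 1. party object: a tree computed beforehand with partykit::ctree()
  fit <- partykit::ctree(Species ~ ., data = iris)
  plots$from_party <- treeheatr::heat_tree(x = fit, target_lab = "Species")

  # 2. data.frame: heat_tree() computes the tree from the data itself
  plots$from_data <- treeheatr::heat_tree(x = iris, target_lab = "Species")
}
```

The partynode case needs a manually built tree plus a `data_test` dataset, as in the COVID-19 example later in the deck.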

heat_tree(x, target_lab = 'Outcome')

351 blood samples (January 10 - February 18)

3 features:

lactic dehydrogenase (LDH)
lymphocyte levels
high-sensitivity C-reactive protein (hs_CRP)

Yan et al., 2020

COVID-19 patient data

heat_tree(x = covid, target_lab = 'Outcome')

data.frame

heat_tree(covid, target_lab = 'Outcome',
	  feats = NA)

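library(partykit)  # provides partysplit() and partynode()

# Splits on columns 1 to 3 of the data
# (LDH, hs-CRP, lymphocyte: order assumed from the split names)
split_ldh <- partysplit(1L, breaks = 365)
split_crp <- partysplit(2L, breaks = 41.2)
split_lymp <- partysplit(3L, breaks = 14.7)

# Values at or below a break go to the first kid, values above it to the second
custom_tree <- partynode(1L, split = split_ldh, kids = list(
  partynode(2L, split = split_crp, kids = list(
    partynode(3L, info = 'Survival'),
    partynode(4L, split = split_lymp, kids = list(
      partynode(5L, info = 'Death'),
      partynode(6L, info = 'Survival'))))),
  partynode(7L, info = 'Death')))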

heat_tree(x = custom_tree, data_test = covid, target_lab = 'Outcome')
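
To make the manual tree easier to read, here is a hand-written sketch of the decision rule it encodes, in base R only. The function name `classify_patient` and its argument names are hypothetical, and the mapping of columns 1 to 3 to LDH, hs-CRP and lymphocyte is an assumption based on the split names. It relies on the partysplit() default `right = TRUE`, under which values at or below a break go to the first kid.

```r
# Illustrative re-statement of custom_tree as plain if/else logic.
# Not part of treeheatr or partykit; names are hypothetical.
classify_patient <- function(ldh, hs_crp, lymphocyte) {
  if (ldh > 365) {
    return("Death")        # node 7: LDH above 365
  }
  if (hs_crp <= 41.2) {
    return("Survival")     # node 3: low LDH, low hs-CRP
  }
  if (lymphocyte <= 14.7) {
    "Death"                # node 5: low LDH, high hs-CRP, low lymphocyte
  } else {
    "Survival"             # node 6: low LDH, high hs-CRP, higher lymphocyte
  }
}

# One call per terminal node (input values are made up for illustration)
print(classify_patient(ldh = 400, hs_crp = 10, lymphocyte = 20))
print(classify_patient(ldh = 300, hs_crp = 10, lymphocyte = 20))
print(classify_patient(ldh = 300, hs_crp = 60, lymphocyte = 10))
print(classify_patient(ldh = 300, hs_crp = 60, lymphocyte = 20))
```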

part 2

what is benchmarking?

a standard practice to illustrate the strengths and weaknesses of algorithms with regard to different problem characteristics

target: dependent variable/outcome/phenotype

features: predictors/variables

pmlbr::fetch_data(x)

x: a character object, the name of the dataset to fetch from PMLB

fetch_data('wine_quality_red')

fixed.acidity   volatile.acidity   target
7.4             0.700              5
7.8             0.880              5
7.8             0.760              5
11.2            0.280              6
7.4             0.700              5
7.4             0.660              5
7.9             0.600              5
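
Putting the two packages together might look like the sketch below: fetch a PMLB dataset, inspect it, and draw a heat tree on it. It assumes pmlbr and treeheatr are installed and that a network connection is available; treating the `target` column as a regression outcome via `task = "regression"` is a choice made here for illustration, not something stated in the slides.

```r
# Sketch only: needs network access; falls back to NULL if the fetch fails.
wine <- NULL

if (requireNamespace("pmlbr", quietly = TRUE)) {
  wine <- tryCatch(
    pmlbr::fetch_data("wine_quality_red"),
    error = function(e) NULL
  )
}

if (!is.null(wine)) {
  print(dim(wine))            # rows and columns of the fetched data
  print(table(wine$target))   # distribution of the quality score

  if (requireNamespace("treeheatr", quietly = TRUE)) {
    # The target is a numeric quality score, so regression is assumed here
    wine_plot <- treeheatr::heat_tree(
      x = wine, target_lab = "target", task = "regression"
    )
  }
}
```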

tiny.cc/pmlbr

Please save a copy of the notebook in *your own* Google Drive so you can start editing.

PMLB for other purposes

teaching

first time open source contributors

Thanks!

@trang1618

Funding

NIH LM010098

NIH AI116794

R packages

ggplot2

partykit

ggparty

heatmaply

People

Jason Moore

Anonymous reviewers

Workshop at RLadies Miami, Nov 19 2020 https://trang1618.github.io/treeheatr https://epistasislab.github.io/pmlb/

Trang Le

#math graduate. Postdoc fellow with Jason Moore.
