A user built a decision tree in R with the tree package and got the summary below

Asked by varshaChauhan in Data Science on Nov 5, 2019

Classification tree:

tree(formula = High temperature ~ ., data = summer.train)

Variables actually used in tree construction:

[1] "Humidity" "Cloudy" "Airy" "Dry" "Windy"

Number of terminal nodes: 12

Residual mean deviance: 0.3874 = 377.7 / 975

Misclassification error rate: 0.08909 = 89 / 999

How can the variables actually used in tree construction ("Airy", "Dry", etc.) be extracted programmatically from the summary output above?

Let us use the well-known spam dataset, which ships with the kernlab package, to work out the solution.

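library(kernlab)  # provides the spam dataset

library(tree)

data(spam)

# Fit a classification tree predicting type from all other columns
spam_tree_def <- tree(type ~ ., data = spam)

summary(spam_tree_def)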

The summary prints the following:

Classification tree:

tree(formula = type ~ ., data = spam)

Variables actually used in tree construction:

[1] "charDollar" "remove" "charExclamation" "hp" "capitalLong" "our"

[7] "capitalAve" "free" "george" "edu"

Number of terminal nodes: 13

Residual mean deviance: 0.4879 = 2238 / 4588

Misclassification error rate: 0.08259 = 380 / 4601
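The printed text above is only one view of the object that summary() returns. As a rough sketch, assuming the same spam tree as above, the object can be inspected to see which components it exposes (the name spam_summary is introduced here purely for illustration):

```r
library(kernlab)  # provides the spam dataset
library(tree)

data(spam)
spam_tree_def <- tree(type ~ ., data = spam)

# Keep the summary object instead of only printing it
spam_summary <- summary(spam_tree_def)

class(spam_summary)      # class of the returned object
names(spam_summary)      # components available via $
str(spam_summary$used)   # structure of the "used" component
```

Looking at names() first is a cheap way to check that a component such as used is present before relying on it.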

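The object returned by summary() stores these variables in its used component, so the way to extract them is to take that component and convert it to a plain character vector with as.character():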

as.character(summary(spam_tree_def)$used)

[1] "charDollar" "remove" "charExclamation" "hp" "capitalLong" "our"

[7] "capitalAve" "free" "george" "edu"
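As a cross-check, the same names can probably be read straight from the fitted tree rather than from its summary. The sketch below assumes the frame layout documented for tree objects, where each row is a node and the var column holds the split variable, or "<leaf>" for terminal nodes. The names split_vars, used_from_frame and used_from_summary are illustrative only.

```r
library(kernlab)  # provides the spam dataset
library(tree)

data(spam)
spam_tree_def <- tree(type ~ ., data = spam)

# One row per node; var is the split variable or "<leaf>"
split_vars <- spam_tree_def$frame$var

# Drop terminal nodes and keep each split variable once
used_from_frame <- unique(as.character(split_vars[split_vars != "<leaf>"]))

# Compare with the summary-based approach
used_from_summary <- as.character(summary(spam_tree_def)$used)
setequal(used_from_frame, used_from_summary)
```

Reading from the frame avoids depending on the summary object at all. It may also be safer if the summary does not report a used component; I believe that happens when every predictor ends up in the tree, but treat that as an assumption and check names(summary(...)) on your own model.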
