A user built a decision tree in R with the tree package, and summary() printed the following output:
Classification tree:
tree(formula = High temperature ~ ., data = summer.train)
Variables actually used in tree construction:
[1] "Humidity" "Cloudy" "Airy" "Dry" "Windy"
Number of terminal nodes: 12
Residual mean deviance: 0.3874 = 377.7 / 975
Misclassification error rate: 0.08909 = 89 / 999
How can the variables actually used in tree construction ("Airy", "Dry", etc.) be extracted programmatically from the summary output above?
Let us use the well-known spam dataset from the kernlab package to work out the solution:
library(kernlab)
library(tree)
data(spam)
spam_tree_def <- tree(type ~ ., data = spam)
summary(spam_tree_def)
The summary gives the following:
Classification tree:
tree(formula = type ~ ., data = spam)
Variables actually used in tree construction:
[1] "charDollar" "remove" "charExclamation" "hp" "capitalLong" "our"
[7] "capitalAve" "free" "george" "edu"
Number of terminal nodes: 13
Residual mean deviance: 0.4879 = 2238 / 4588
Misclassification error rate: 0.08259 = 380 / 4601
The object returned by summary() stores these names in its used component, and as.character() converts them to a plain character vector:
as.character(summary(spam_tree_def)$used)
[1] "charDollar" "remove" "charExclamation" "hp" "capitalLong" "our"
[7] "capitalAve" "free" "george" "edu"
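
As a cross-check, the same names can also be read directly from the fitted tree rather than from its summary. The sketch below assumes the usual layout of a tree object, in which the frame component has one row per node and its var column holds the split variable, with "<leaf>" marking terminal nodes. The names used_from_summary, used_from_frame and reduced_formula are illustrative only and do not appear in the original question.

```r
library(kernlab)  # provides the spam dataset
library(tree)

data(spam)
spam_tree_def <- tree(type ~ ., data = spam)

# Route 1: names reported by summary(), as a plain character vector
used_from_summary <- as.character(summary(spam_tree_def)$used)

# Route 2: read the split variable of every node from the frame,
# drop the terminal nodes, and keep each variable once
node_vars <- as.character(spam_tree_def$frame$var)
used_from_frame <- unique(node_vars[node_vars != "<leaf>"])

# Both routes should name the same set of variables
print(setequal(used_from_summary, used_from_frame))

# Example use: build a formula restricted to the variables the tree used
reduced_formula <- reformulate(used_from_frame, response = "type")
print(reduced_formula)
```

The frame-based route does not depend on what summary() chooses to store, so it may be the safer option inside scripts. For the model in the original question, the same calls would apply to whatever object holds the fitted tree.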