A user is currently viewing the decision tree using the following code. Is there a way that we can export some calculated fields as output too?

Asked by NiharikaDeshpande in Data Science on Nov 7, 2019

For example, is it possible to display the sum of an input attribute at each node, e.g. the sum of feature 1 from the data array X over the samples in each leaf of the tree?

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:]
y = iris.target

#%%
from sklearn.tree import DecisionTreeClassifier

alg = DecisionTreeClassifier(max_depth=5, min_samples_leaf=2, max_leaf_nodes=10)
alg.fit(X, y)

#%%
## View tree
import graphviz
from sklearn import tree

dot_data = tree.export_graphviz(alg, out_file=None, node_ids=True, proportion=True,
                                class_names=True, filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph

To do that, define a function that returns the values of one feature for the samples that reach a given node. Here node is the index of the tree node you want values for, and feature is the column of X you want to extract.

def find_leaves(X, clf):
    """Return the set of leaf node ids reached by the samples in X.

    This helper is not shown in the original answer; it is assumed here
    to be a thin wrapper around clf.apply.
    clf must be a fitted DecisionTreeClassifier.
    """
    return set(clf.apply(X))


def node_feature_values(X, clf, node=0, feature=0, require_leaf=False):
    """Return an array of values from the input array X.

    Array values are limited to
      1. samples that passed through the node with index `node`
      2. and the column `feature`.

    clf must be a fitted DecisionTreeClassifier.
    """
    leaf_ids = find_leaves(X, clf)
    if require_leaf and node not in leaf_ids:
        print("require_leaf is set, "
              "select one of these nodes:\n{}".format(leaf_ids))
        return

    # a sparse array that contains node assignment by sample
    node_indicator = clf.decision_path(X)
    node_array = node_indicator.toarray()

    # which samples at least passed through the node
    samples_in_node_mask = node_array[:, node] == 1

    return X[samples_in_node_mask, feature]
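The function relies on two fitted-tree methods: decision_path, which marks every node a sample visits, and apply, which gives only the leaf a sample ends in. The sketch below, which assumes a small illustrative tree (max_depth=2 and random_state=0 are choices made here, not part of the original answer), shows that for a leaf node the two views select the same samples.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
# Illustrative settings only; any fitted DecisionTreeClassifier works.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

leaf_of_sample = clf.apply(X)          # leaf node id reached by each sample
path = clf.decision_path(X).toarray()  # (n_samples, n_nodes), 1 where a node is visited

# Every sample passes through the root, which is node 0.
print(path[:, 0].all())

# For a leaf, "visited the node" and "ended in the node" are the same mask.
some_leaf = leaf_of_sample[0]
mask_from_path = path[:, some_leaf] == 1
mask_from_apply = leaf_of_sample == some_leaf
print(np.array_equal(mask_from_path, mask_from_apply))
```

For internal nodes only the decision_path mask applies, which is why the function above uses it rather than apply.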

Applied to the tree fitted in the question (node ids depend on the fitted tree, so pick one of the leaf ids it reports):

values_arr = node_feature_values(X, alg, node=12, feature=0, require_leaf=True)
values_arr

array([6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7,
       5.8, 6.4, 6.5, 7.7, 7.7, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.1, 6.4,
       7.4, 7.9, 6.4, 7.7, 6.3, 6.4, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7,
       6.3, 6.5, 6.2, 5.9])

You can now run any computation on these values, including the sum of the first feature (column 0 of X) over the samples in that leaf.

print("There are {} total samples in this node, "
      "{}% of the total".format(len(values_arr), len(values_arr) / float(len(X)) * 100))
print("Feature Sum: {}".format(values_arr.sum()))

There are 43 total samples in this node, 28.666666666666668% of the total
Feature Sum: 286.69999999999993
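To get the sum for every node at once and show it in the exported tree, one possible approach is sketched below. It multiplies the transposed decision_path matrix by the feature column, then inserts a "sum = ..." line into each node label of the DOT text. The names node_sums, add_sum and dot_with_sums are hypothetical, random_state=0 is added only for repeatability, and the label rewrite assumes that export_graphviz with node_ids=True starts each label with "node #<id>", which is an observed output format rather than a documented guarantee.

```python
import re

import numpy as np
from sklearn import datasets, tree
from sklearn.tree import DecisionTreeClassifier

X, y = datasets.load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=5, min_samples_leaf=2,
                             max_leaf_nodes=10, random_state=0).fit(X, y)

feature = 0  # column of X to aggregate

# (n_samples, n_nodes) sparse 0/1 matrix: [i, j] is 1 if sample i visits node j
node_indicator = clf.decision_path(X)

# Transposed indicator times the feature column gives one sum per node
node_sums = np.asarray(node_indicator.T @ X[:, feature]).ravel()

# Leaves have no left child (marked -1 in the fitted tree structure)
is_leaf = clf.tree_.children_left == -1
for node_id in np.flatnonzero(is_leaf):
    print("leaf {}: feature sum {:.1f}".format(node_id, node_sums[node_id]))

dot_data = tree.export_graphviz(clf, out_file=None, node_ids=True,
                                filled=True, rounded=True)


def add_sum(match):
    # ASSUMPTION: each label begins with 'node #<id>' when node_ids=True.
    # '\\n' is a literal backslash-n, which DOT renders as a line break.
    node_id = int(match.group(1))
    return "{}\\nsum = {:.1f}".format(match.group(0), node_sums[node_id])


dot_with_sums = re.sub(r'label="node #(\d+)', add_sum, dot_data)
# graphviz.Source(dot_with_sums) would render it the same way as in the question.
```

If the label format differs in your scikit-learn version, the regular expression matches nothing and the DOT text is returned unchanged, so it is worth checking that "sum =" appears in dot_with_sums before rendering.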
