A user is viewing a decision tree with the code below. Is there a way to export some calculated fields as part of the output as well?

For example, is it possible to display the sum of an input attribute at each node, e.g. the sum of feature 1 from the 'X' data array in the leaves of the tree?

from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:]
y = iris.target

#%%
from sklearn.tree import DecisionTreeClassifier

alg = DecisionTreeClassifier(max_depth=5, min_samples_leaf=2, max_leaf_nodes=10)
alg.fit(X, y)

#%%
## View tree
import graphviz
from sklearn import tree

dot_data = tree.export_graphviz(alg, out_file=None, node_ids=True, proportion=True, class_names=True, filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph


To do that, define a function that returns the array of values from X for a given node and feature, where node is the index of the tree node you want values for and feature is the column (or feature) of X to read.

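def find_leaves(X, clf):
    """Return the set of leaf node ids reached by the samples in X.

    This helper is not shown in the original snippet; it is a minimal
    stand-in that assumes clf is a fitted DecisionTreeClassifier.
    """
    # clf.apply(X) gives the id of the leaf each sample ends up in
    return set(clf.apply(X).tolist())


def node_feature_values(X, clf, node=0, feature=0, require_leaf=False):
    """Return an array of values from the input array X.

    Array values are limited to
     1. samples that passed through the given node
     2. and taken from the column given by feature.

    clf must be a fitted DecisionTreeClassifier.
    """
    leaf_ids = find_leaves(X, clf)
    if require_leaf and node not in leaf_ids:
        print("require_leaf is set, "
              "select one of these nodes:\n{}".format(sorted(leaf_ids)))
        return None

    # a sparse array that contains node assignment by sample
    node_indicator = clf.decision_path(X)
    node_array = node_indicator.toarray()

    # which samples at least passed through the node
    samples_in_node_mask = node_array[:, node] == 1

    return X[samples_in_node_mask, feature]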
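The mask in the function covers one node per call. Since decision_path already returns a samples-by-nodes indicator matrix, the sum for every node can probably be computed in a single pass. The following is only a sketch under that assumption: the names node_sums and leaf_sums are illustrative and do not come from the original answer.

```python
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

alg = DecisionTreeClassifier(max_depth=5, min_samples_leaf=2, max_leaf_nodes=10)
alg.fit(X, y)

feature = 0  # column of X to aggregate

# Sparse (n_samples, n_nodes) matrix: entry is 1 if the sample passed through the node
node_indicator = alg.decision_path(X)

# Transposed indicator times the feature column gives one sum per node
node_sums = np.asarray(node_indicator.T @ X[:, feature]).ravel()

# Keep only the leaves, i.e. the nodes where samples end up
leaf_sums = {int(i): float(node_sums[i]) for i in np.unique(alg.apply(X))}

for node_id, total in sorted(leaf_sums.items()):
    print("leaf {}: sum of feature {} = {:.1f}".format(node_id, feature, total))
```

Every sample ends in exactly one leaf, so the leaf sums should add up to the sum of the whole column, which is a quick sanity check on the result.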

Applying node_feature_values to the example above:

values_arr = node_feature_values(X, alg, node=12, feature=0, require_leaf=True)
values_arr

array([6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7,
       5.8, 6.4, 6.5, 7.7, 7.7, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.1, 6.4,
       7.4, 7.9, 6.4, 7.7, 6.3, 6.4, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7,
       6.3, 6.5, 6.2, 5.9])


The user can now perform any operation on these values, including the sum of a feature from the 'X' data array in a leaf of the tree (here feature=0, the first column).

print("There are {} total samples in this node, "
      "{}% of the total".format(len(values_arr), len(values_arr) / float(len(X))*100))
print("Feature Sum: {}".format(values_arr.sum()))

There are 43 total samples in this node, 28.666666666666668% of the total
Feature Sum: 286.69999999999993
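The question also asks about showing the value inside the tree drawing. One possible approach, sketched here as an assumption and not as part of the original answer, is to edit the DOT text before rendering it. With node_ids=True, export_graphviz starts each label with "node #<id>", so a regular expression can append a line to it. The helper annotate_dot and the two-node DOT fragment are hypothetical and only illustrate the idea.

```python
import re


def annotate_dot(dot_data, node_values, label="sum"):
    """Append 'label = value' after the 'node #<id>' line of each node label.

    Assumes labels start with 'node #<id>', as produced by
    export_graphviz(..., node_ids=True). node_values maps node id to a number.
    """
    def add_value(match):
        node_id = int(match.group(1))
        if node_id not in node_values:
            return match.group(0)  # leave nodes without a value untouched
        # "\\n" is a literal backslash-n, the line break escape used in DOT labels
        return "{}\\n{} = {:.1f}".format(match.group(0), label, node_values[node_id])

    return re.sub(r"node #(\d+)", add_value, dot_data)


# Hypothetical fragment in the style of export_graphviz output, with made-up values
sample_dot = (
    'digraph Tree {\n'
    '0 [label="node #0\\ngini = 0.5"] ;\n'
    '1 [label="node #1\\ngini = 0.0"] ;\n'
    '0 -> 1 ;\n'
    '}'
)
annotated = annotate_dot(sample_dot, {0: 10.0, 1: 4.0})
print(annotated)
```

With the objects from the question, something like graphviz.Source(annotate_dot(dot_data, {12: values_arr.sum()})) should render the tree with the sum shown in node 12, assuming the label format matches the one described here.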


