Explain how a decision tree works along with the assumptions.
Every machine learning algorithm has its own benefits and reasons for implementation, and the decision tree is one such widely used algorithm. A decision tree is an upside-down tree that makes decisions based on conditions present in the data. Let us illustrate this with an example. Suppose we take a dataset about people's fitness and decide to build our final model with a decision tree. Internally, the algorithm will construct a tree along the lines described below.
In this representation of a tree, conditions such as age, diet plan, and exercise routine keep splitting into branches until the tree reaches a decision about whether a person is fit or not. The conditions are known as internal nodes, and the final decisions they split into are known as leaf nodes.
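The tree described above can be sketched as nested conditionals. This is a minimal illustration; the specific thresholds and questions (age under 30, fast food, morning exercise) are assumptions for the example, not values learned from real data.

```python
# Each `if` is an internal node; each `return` is a leaf node.
def is_fit(age, eats_fast_food, exercises_in_morning):
    # Root internal node: split on age (threshold is illustrative).
    if age < 30:
        # Internal node: split on diet plan.
        return not eats_fast_food  # leaf: fit if the diet is healthy
    else:
        # Internal node: split on exercise routine.
        return exercises_in_morning  # leaf: fit if the person exercises

print(is_fit(25, eats_fast_food=False, exercises_in_morning=False))  # True
print(is_fit(45, eats_fast_food=False, exercises_in_morning=False))  # False
```

A trained decision tree is essentially a learned version of such a rule chain, where the algorithm picks the splitting conditions from the data instead of a human writing them by hand.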
Decision trees are of two types.
a) Classification and
b) Regression
Classification trees are applied when the outcome is discrete or categorical, such as the presence or absence of students in a class, whether a person died or survived, or the approval of a loan. Regression trees are used when the outcome is continuous, such as prices, the age of a person, or the length of a stay at a hotel.
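The two types map directly onto two scikit-learn estimators. Below is a hedged sketch assuming scikit-learn is installed; the feature and target values are made up purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: discrete outcome (0 = died, 1 = survived).
X_cls = [[3, 22], [1, 38], [3, 26], [1, 35]]  # [passenger class, age]
y_cls = [0, 1, 0, 1]
clf = DecisionTreeClassifier(max_depth=2).fit(X_cls, y_cls)
print(clf.predict([[1, 30]]))  # predicts a class label

# Regression tree: continuous outcome (e.g., a price).
X_reg = [[50], [60], [80], [100]]  # e.g., floor area
y_reg = [150.0, 180.0, 240.0, 310.0]
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[70]]))  # predicts a continuous value
```

The same splitting machinery powers both; only the leaf values differ (a majority class for classification, a mean of the training targets for regression).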
Assumptions
Despite its simplicity, a decision tree holds certain assumptions:
a) Continuous variables are preferably discretized before training.
b) At the beginning, the whole training set is considered as the root.
c) Records are distributed recursively on the basis of attribute values.