How to train a multiple linear regression model to find the best combination of variables?

Asked by CameronOliver in Data Science, Asked on Nov 9, 2019
Answered by Nitin Solanki

One option is an automated model selection and model-averaging tool; the description here matches that of the R package glmulti. It wraps glm and other fitting functions, automatically generates all possible models (under constraints set by the user) from the specified response and explanatory variables, and ranks them by an information criterion (AIC, AICc, or BIC). It can handle very large numbers of candidate models, and it includes a genetic algorithm to find the best models when an exhaustive screening of the candidates is not feasible.
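To make the idea of exhaustive screening concrete, here is a minimal Python sketch that fits an ordinary least squares model for every subset of predictors and ranks the subsets by AIC. It is not the package's implementation: the function names `gaussian_aic` and `rank_subsets`, the variable names, and the synthetic data are all assumptions made for illustration, and the AIC is computed only up to an additive constant.

```python
import itertools
import numpy as np


def gaussian_aic(design, y):
    """AIC (up to an additive constant) of a least squares fit of y on design."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # k regression coefficients plus one parameter for the error variance
    return n * np.log(rss / n) + 2 * (k + 1)


def rank_subsets(X, y, names):
    """Fit every subset of the columns of X (an intercept is always included)
    and return (aic, selected_names) pairs sorted from best to worst AIC."""
    n = X.shape[0]
    intercept = np.ones((n, 1))
    results = []
    for size in range(len(names) + 1):
        for cols in itertools.combinations(range(len(names)), size):
            design = np.hstack([intercept, X[:, list(cols)]])
            selected = tuple(names[j] for j in cols)
            results.append((gaussian_aic(design, y), selected))
    return sorted(results, key=lambda pair: pair[0])


# Hypothetical data: only x0 and x2 actually influence y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + rng.normal(size=200)

for aic, selected in rank_subsets(X_demo, y_demo, ["x0", "x1", "x2", "x3"])[:3]:
    print(round(aic, 2), selected)
```

With p candidate predictors this loop fits 2^p models, which is why an exhaustive search stops being practical as p grows and a heuristic search such as a genetic algorithm becomes attractive.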

However, keep the following caveats in mind:

This type of data-driven model selection will almost always destroy your ability to make reliable inferences (compute p-values, confidence intervals, etc.).

It may overfit your data (although using the information criteria listed in the package description will help with this).
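The overfitting caveat can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not part of any package: the response is pure noise, the "selection" step simply picks the predictor most correlated with it on the training data, and the function names `r_squared` and `selection_optimism` are hypothetical. It compares the average R² of the selected model on the data used for selection with its R² on fresh data.

```python
import numpy as np


def r_squared(x, y, slope, intercept):
    """R^2 of the fitted line on (x, y), relative to predicting the mean of y."""
    resid = y - (intercept + slope * x)
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def selection_optimism(n=50, p=20, n_sims=200, seed=0):
    """Average in-sample and out-of-sample R^2 after choosing the 'best'
    of p predictors that are all unrelated to the response."""
    rng = np.random.default_rng(seed)
    train_r2, test_r2 = [], []
    for _ in range(n_sims):
        X_train, y_train = rng.normal(size=(n, p)), rng.normal(size=n)
        X_test, y_test = rng.normal(size=(n, p)), rng.normal(size=n)
        # Data-driven selection: keep the predictor most correlated with y.
        corr = [abs(np.corrcoef(X_train[:, j], y_train)[0, 1]) for j in range(p)]
        best = int(np.argmax(corr))
        slope, intercept = np.polyfit(X_train[:, best], y_train, 1)
        train_r2.append(r_squared(X_train[:, best], y_train, slope, intercept))
        test_r2.append(r_squared(X_test[:, best], y_test, slope, intercept))
    return float(np.mean(train_r2)), float(np.mean(test_r2))


in_sample, out_of_sample = selection_optimism()
print("mean R^2 on the selection data:", in_sample)
print("mean R^2 on fresh data:", out_of_sample)
```

Because the winning predictor was chosen for fitting the training noise well, its in-sample fit looks better than it performs on new data. The same mechanism is why p-values and confidence intervals computed as if the model had been fixed in advance tend to be too optimistic after selection.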


