The linear_model.stepwise module

Both Method Variable Selection

both_selection(formula, data, model, max_iter, formula_kwargs, fit_kwargs)

This function is deprecated. Use the `Stepwise()` class from the `estyp.linear_model` module instead.

This function performs stepwise variable selection in both directions (forward and backward), using the Akaike Information Criterion (AIC) as the selection criterion.

Parameters:
  • formula (str) – A string representing the initial model formula.

  • data (DataFrame) – A Pandas DataFrame containing the data to be used for model fitting.

  • model (GLM, OLS, Logit, LogisticRegression) – A statsmodels model class (for example sm.OLS or sm.GLM) that specifies the type of model to fit.

  • max_iter (int) – The maximum number of iterations to perform.

  • formula_kwargs (dict) – Additional keyword arguments to be passed to the model.from_formula() method.

  • fit_kwargs (dict) – Additional keyword arguments to be passed to the fit() method. Defaults to a dictionary {"disp":0}.

Returns:

A string representing the final model formula.

import statsmodels.api as sm
import pandas as pd
from estyp.linear_model.stepwise import both_selection

data = pd.DataFrame({
    "y": [1, 2, 3, 4, 5],
    "x1": [1, 2, 3, 4, 5],
    "x2": [6, 7, 8, 9, 10],
})
formula = "y ~ x1 + x2"
model = sm.OLS

final_formula = both_selection(formula=formula, data=data, model=model)
print(final_formula)
y ~ x1

Forward Variable Selection

forward_selection(y, data, model, alpha, formula_kwargs, fit_kwargs)

This function is deprecated. Use the `Stepwise()` class from the `estyp.linear_model` module instead.

This function performs forward variable selection, using p-values obtained from tests between nested models.

Parameters:
  • y (str) – A string containing the name of the dependent variable (target) to be predicted.

  • data (DataFrame) – The pandas DataFrame containing both the target variable named by y and the predictor variables used for model training.

  • model (Union[GLM, OLS, Logit, LogisticRegression]) – The statsmodels model class used to fit and evaluate the candidate models. Defaults to sm.OLS.

  • alpha (float) – The significance level for feature selection, a number between 0 and 1. A feature is added to the model if its p-value is less than alpha. Defaults to 0.05.

  • formula_kwargs (dict) – Additional keyword arguments to be passed to the model.from_formula() method. Defaults to dict().

  • fit_kwargs (dict) – Additional keyword arguments to be passed to the fit() method. Defaults to a dictionary {"disp":0}.

Returns:

A string representing the final model formula.

import pandas as pd
import statsmodels.api as sm
from estyp.linear_model.stepwise import forward_selection

# Create sample DataFrame
data = pd.DataFrame({
   'y': [1, 2, 3, 4, 5],
   'X1': [2, 4, 5, 7, 9],
   'X2': [3, 1, 6, 8, 4],
   'X3': [1, 5, 9, 2, 3]
})

# Perform the forward variable selection
formula = forward_selection(
    y="y",
    data=data,
    model=sm.OLS,
    alpha=0.05,
)

# Fit the model using the selected formula
selected_model = sm.OLS.from_formula(formula, data).fit()
print(selected_model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.990
Model:                            OLS   Adj. R-squared:                  0.986
Method:                 Least Squares   F-statistic:                     289.0
Date:                Sun, 17 Sep 2023   Prob (F-statistic):           0.000443
Time:                        02:53:15   Log-Likelihood:                 2.6178
No. Observations:                   5   AIC:                            -1.236
Df Residuals:                       3   BIC:                            -2.017
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.1438      0.203     -0.710      0.529      -0.789       0.501
X1             0.5822      0.034     17.000      0.000       0.473       0.691
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.488
Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.336
Skew:                           0.389   Prob(JB):                        0.845
Kurtosis:                       1.998   Cond. No.                         14.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.