The linear_model module

Module contents

Logistic Regression

class LogisticRegression(X, y, penalty, dual, tol, C, fit_intercept, intercept_scaling, class_weight, random_state, solver, max_iter, verbose, warm_start, n_jobs, l1_ratio)

This class implements a logistic regression model. It mirrors the sklearn.linear_model.LogisticRegression class and adds methods for computing confidence intervals, p-values, and model summaries.

__init__(X, y, penalty, dual, tol, C, fit_intercept, intercept_scaling, class_weight, random_state, solver, max_iter, verbose, warm_start, n_jobs, l1_ratio)
Parameters:
  • X (Union[DataFrame, ndarray, None]) – A Pandas DataFrame or a NumPy array containing the model predictors.

  • y (Union[Series, ndarray, None]) – A Pandas Series or a NumPy array containing the model response.

  • penalty (Literal['l1', 'l2', 'elasticnet']) – The type of penalty to use. Can be one of "none" (default), "l1", "l2", or "elasticnet".

  • dual (bool) – Whether to use the dual formulation of the problem.

  • tol (float) – The tolerance for convergence.

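  • C (int) – The regularization strength. In scikit-learn's LogisticRegression, C is the inverse of the regularization strength, so smaller values mean stronger regularization.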

  • fit_intercept (bool) – Whether to fit an intercept term.

  • intercept_scaling (int) – The scaling factor for the intercept term.

  • class_weight (Union[None, str, dict]) – None (default), “balanced” or a dictionary that maps class labels to weights.

  • random_state (int) – The random seed.

  • solver (Literal['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga']) – The solver to use. Can be one of "lbfgs" (default), "liblinear", "newton-cg", "newton-cholesky", "sag", or "saga".

  • max_iter (int) – The maximum number of iterations.

  • verbose (int) – The verbosity level.

  • warm_start (bool) – Whether to reuse the solution of the previous call to fit as the starting point (warm start).

  • n_jobs (int) – The number of jobs to use for parallel processing.

  • l1_ratio (Union[float, None]) – The mixing parameter for elasticnet regularization, between 0 and 1. Only relevant when penalty is "elasticnet".

fit()

Fits the model to the data. It can be called in the scikit-learn style, as with sklearn.linear_model.LogisticRegression, or on an instance created with the from_formula class method, in the style of statsmodels.

predict(new_data: DataFrame)

Predicts the class labels for new data.

conf_int(conf_level=0.95)

Calculates the confidence intervals for the model coefficients.
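
The source does not say how the interval is computed. As a minimal sketch, assuming the usual normal-approximation (Wald) interval, the bounds can be derived from an estimate and its standard error using only the standard library. The helper name wald_interval is hypothetical and not part of estyp; the input values are copied from the x2 row of the summary table in the Examples.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, conf_level=0.95):
    """Return (lower, upper) for a normal-approximation (Wald) interval.

    Hypothetical helper for illustration; not part of the estyp API.
    """
    # Two-sided critical value, about 1.96 when conf_level is 0.95
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_crit * std_error
    return estimate - margin, estimate + margin


# Estimate and S.E. of x2, taken from the summary table in the Examples
lower, upper = wald_interval(0.438665, 0.344263)
print(round(lower, 4), round(upper, 4))
```

With these inputs the sketch agrees with the [Lower, Upper] columns printed for x2 to about the displayed precision. That is consistent with a Wald interval but does not confirm how the library computes it.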

se()

Calculates the standard errors for the model coefficients.

z_values()

Calculates the z-scores for the model coefficients.

p_values()

Calculates the p-values for the model coefficients.
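
The relationship between se(), z_values(), and p_values() can be illustrated with a short standalone sketch. It assumes the standard Wald test, where z is the estimate divided by its standard error and the p-value is two-sided under a standard normal distribution. The function name wald_z_and_p is hypothetical; the inputs are the Intercept row of the summary table in the Examples.

```python
from statistics import NormalDist


def wald_z_and_p(estimate, std_error):
    """Return the Wald z statistic and its two-sided p-value.

    Hypothetical helper for illustration; not part of the estyp API.
    """
    z = estimate / std_error
    # Probability of a standard normal value at least as extreme as |z|
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Estimate and S.E. of the Intercept, taken from the summary table
z, p = wald_z_and_p(-0.200864, 0.202894)
print(round(z, 4), round(p, 4))
```

These values agree with the z and Pr(>|z|) columns printed for the Intercept, up to rounding of the inputs.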

summary(conf_level=0.95)

Prints a summary of the model.

from_formula(formula, data)

Class method that creates an instance from a formula string and the DataFrame containing the variables it names.

params

Returns the estimated values for model parameters.

aic

Calculates the Akaike information criterion (AIC) for the model.

bic

Calculates the Bayesian information criterion (BIC) for the model.
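
For reference, the textbook definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to a Bernoulli log-likelihood. The labels, fitted probabilities, and parameter count are made up for illustration, and whether estyp counts parameters the same way is an assumption.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary labels y under fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))


def aic(log_lik, n_params):
    # Textbook definition: 2k - 2 ln L
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    # Textbook definition: k ln(n) - 2 ln L
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical labels and fitted probabilities (not from the source)
y = [0, 1, 1, 0, 1, 0]
p_hat = [0.2, 0.7, 0.6, 0.4, 0.9, 0.3]
n_params = 2  # assumed: intercept plus one slope

log_lik = bernoulli_log_likelihood(y, p_hat)
model_aic = aic(log_lik, n_params)
model_bic = bic(log_lik, n_params, len(y))
print(model_aic, model_bic)
```

Both criteria penalize the same log-likelihood; they differ only in the penalty per parameter (2 for AIC, ln(n) for BIC).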

cov_matrix

Returns the estimated covariance matrix for model parameters.

residuals

Returns the deviance of the model.

deviance_residuals

Returns the deviance residuals.
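
As a rough illustration, the conventional deviance residual for a binary response is sign(y - p) * sqrt(-2 * l), where l is the observation's log-likelihood contribution; the squared residuals then sum to the model deviance. The sketch below follows that definition with made-up data. It is an assumption that estyp uses the same formula.

```python
import math


def deviance_residuals(y, p):
    """Conventional deviance residuals for binary labels y and fitted probabilities p.

    Hypothetical helper for illustration; not part of the estyp API.
    """
    residuals = []
    for yi, pi in zip(y, p):
        # Log-likelihood contribution of a single observation
        log_lik_i = math.log(pi) if yi == 1 else math.log(1 - pi)
        # For 0 < p < 1, sign(y - p) is +1 when y == 1 and -1 when y == 0
        sign = 1.0 if yi == 1 else -1.0
        residuals.append(sign * math.sqrt(-2 * log_lik_i))
    return residuals


# Hypothetical labels and fitted probabilities (not from the source)
y = [0, 1, 1, 0]
p_hat = [0.2, 0.7, 0.6, 0.4]

dev_res = deviance_residuals(y, p_hat)
deviance = sum(r ** 2 for r in dev_res)
print(dev_res, deviance)
```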

Examples

  • Example 1: Using LogisticRegression() like the statsmodels Logit class.

import numpy as np
import pandas as pd
from estyp.linear_model import LogisticRegression

np.random.seed(123)
data = pd.DataFrame({
   "y": np.random.randint(2, size=100),
   "x1": np.random.uniform(-1, 1, size=100),
   "x2": np.random.uniform(-1, 1, size=100),
})

formula = "y ~ x1 + x2"
spec = LogisticRegression.from_formula(formula, data)
model = spec.fit()

print(model.summary())
           Estimate      S.E.         z  Pr(>|z|)   [Lower,    Upper]
Intercept -0.200864  0.202894 -0.989996  0.322176 -0.598530  0.196801
x1         0.032006  0.375254  0.085292  0.932030 -0.703478  0.767490
x2         0.438665  0.344263  1.274215  0.202587 -0.236078  1.113407
  • Example 2: Using LogisticRegression() like the sklearn.linear_model.LogisticRegression() class.

from estyp.linear_model import LogisticRegression

X = data.drop(columns="y")
y = data["y"]

model = LogisticRegression()
model.fit(X, y)

print(model.summary())
           Estimate      S.E.         z  Pr(>|z|)   [Lower,    Upper]
Intercept -0.200864  0.202894 -0.989996  0.322176 -0.598530  0.196801
x1         0.032006  0.375254  0.085292  0.932030 -0.703478  0.767490
x2         0.438665  0.344263  1.274215  0.202587 -0.236078  1.113407

Stepwise Selection for Linear Models

class Stepwise(formula, data, model, direction, criterion, alpha, max_iter, formula_params, fit_params, verbose)

The Stepwise class performs stepwise model selection: it iteratively adds or removes predictors from a model according to their significance (F-test), AIC, or BIC.

Parameters:
  • formula (str) – A string representing the formula, using the patsy formula syntax.

  • data (DataFrame) – A pandas DataFrame that contains the data for both the dependent and independent variables.

  • model (Union[GLM, OLS, Logit, LogisticRegression]) – Specifies the type of model to be used.

  • direction (Literal["both", "forward", "backward"]) – Specifies the direction of the stepwise process.

  • criterion (Literal["aic", "bic", "f-test"]) – The criterion to be used for adding or removing predictors.

  • alpha (float) – The significance level for adding or removing predictors. It must be a value between 0 and 1.

  • max_iter (int) – The maximum number of iterations when direction is "both".

  • formula_params (Dict[str, Any]) – Additional parameters to be passed to the model’s from_formula method.

  • fit_params (Dict[str, Any]) – Additional parameters to be passed to the model’s fit method.

  • verbose (bool) – If set to False, the class will not print information about the stepwise process.

optimal_model_

The optimal model obtained after the stepwise process.

optimal_formula_

The optimal model formula after the stepwise process.

optimal_variables_

List of optimal predictor variables in the final model.

optimal_metric_

The optimal value of the chosen criterion (e.g., AIC, BIC, or F-test) for the final model.

fit()

Conducts the stepwise process based on the specified direction and criterion.
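
To make the idea concrete, here is a minimal sketch of forward selection by AIC for an ordinary least squares model, written with NumPy only. It is not the Stepwise implementation: gaussian_aic and forward_select are hypothetical helpers, the AIC follows the statsmodels convention for OLS (2k - 2 ln L, with k counting regression coefficients), and the x2 column is a made-up noisy predictor chosen to avoid a perfect fit.

```python
import numpy as np


def gaussian_aic(y, columns):
    """AIC of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = np.column_stack([np.ones(n), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = float(np.sum((y - X @ beta) ** 2))
    # Gaussian log-likelihood at the maximum-likelihood variance ssr / n
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
    return 2 * X.shape[1] - 2 * log_lik


def forward_select(y, predictors):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    selected = []
    best_aic = gaussian_aic(y, [])  # intercept-only starting model
    while True:
        remaining = [name for name in predictors if name not in selected]
        if not remaining:
            break
        scores = {
            name: gaussian_aic(y, [predictors[s] for s in selected + [name]])
            for name in remaining
        }
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best_aic:
            break  # no remaining term improves the criterion
        selected.append(candidate)
        best_aic = scores[candidate]
    return selected, best_aic


y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predictors = {
    "x1": np.array([5.0, 20.0, 3.0, 2.0, 1.0]),
    "x2": np.array([6.1, 6.9, 8.2, 8.8, 10.1]),  # hypothetical noisy predictor
}

start_aic = gaussian_aic(y, [])
selected, final_aic = forward_select(y, predictors)
print("Starting AIC:", round(start_aic, 4))
print("Selected:", selected)
```

Backward elimination follows the same pattern in reverse: start from the full model and drop the term whose removal lowers the criterion most.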

Examples:

import pandas as pd
from statsmodels.api import OLS
from estyp.linear_model import Stepwise
data = pd.DataFrame({"y": [1,2,3,4,5], "x1": [5,20,3,2,1], "x2": [6,7,8,9,10]})
stepwise = Stepwise(formula="y ~ 1", data=data, model=OLS, direction="forward", criterion="aic")
stepwise.fit()
print("Best predictors:", stepwise.optimal_variables_)
Starting AIC: 19.6551
- Term added: "x2" | AIC: -317.7430
- Term added: "x1" | AIC: -323.7311
Forward selection completed
- Obtained AIC: -323.7311
- Added terms: None
- Obtained formula: "y ~ x2 + x1"
Best predictors: ['x2', 'x1']
plot_history(ax=None)

Plots the history of the chosen criterion during the stepwise process.

Parameters:

ax (matplotlib.axes.Axes, optional) – An Axes instance for the plot. If not provided, a new figure and axes will be created.

Returns:

fig, ax (matplotlib.figure.Figure, matplotlib.axes.Axes)

The Figure and Axes instances containing the plot. If ax was not provided, these are the newly created figure and axes.

Examples:

import pandas as pd
from statsmodels.api import OLS
from estyp.linear_model import Stepwise

data = pd.DataFrame(
   {
      "y": [1, 2, 3, 4, 5],
      "x1": [5, 20, 3, 2, 1],
      "x2": [6, 7, 8, 9, 10],
      "x3": [1, 2, 40, 4, 30],
      "x4": [20, 1, 4, 5, 6],
      "x5": [90, -1, 40, 5, 26],
   }
)
stepwise = Stepwise(
   formula="y ~ x1 + x2 + x3 + x4 + x5",
   data=data,
   model=OLS,
   direction="backward",
   criterion="bic",
)
stepwise.fit()
fig, ax = stepwise.plot_history()
Starting BIC: -299.2531
- Term dropped: "x3" | BIC: -312.2742
- Term dropped: "x5" | BIC: -315.9306
- Term dropped: "x1" | BIC: -320.3508
Backward selection completed
- Obtained BIC: -320.3508
- Dropped terms: 3
- Obtained formula: "y ~ x2 + x4"
[Figure: history of the BIC across the backward selection steps, as drawn by plot_history()]

Note

  • The class is designed to work seamlessly with statsmodels models.

  • If using “both” as the direction, the “f-test” criterion is not available.

  • Ensure that the data provided is appropriate for the model chosen.