The linear_model module
Module contents
Logistic Regression
- class LogisticRegression(X, y, penalty, dual, tol, C, fit_intercept, intercept_scaling, class_weight, random_state, solver, max_iter, verbose, warm_start, n_jobs, l1_ratio)
This class implements a logistic regression model. It mirrors the sklearn.linear_model.LogisticRegression class and adds methods for calculating confidence intervals, p-values, and model summaries.
- __init__(X, y, penalty, dual, tol, C, fit_intercept, intercept_scaling, class_weight, random_state, solver, max_iter, verbose, warm_start, n_jobs, l1_ratio)
- Parameters:
X (Union[DataFrame, ndarray, None]) – A Pandas DataFrame or a NumPy array containing the model predictors.
y (Union[Series, ndarray, None]) – A Pandas Series or a NumPy array containing the model response.
penalty (Literal['l1', 'l2', 'elasticnet']) – The type of penalty to use. Can be one of "none" (default), "l1", "l2", or "elasticnet".
dual (bool) – Whether to use the dual formulation of the problem.
tol (float) – The tolerance for convergence.
C (int) – The regularization strength.
fit_intercept (bool) – Whether to fit an intercept term.
intercept_scaling (int) – The scaling factor for the intercept term.
class_weight (Union[None, str, dict]) – None (default), “balanced” or a dictionary that maps class labels to weights.
random_state (int) – The random seed.
solver (Literal['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga']) – The solver to use. Can be one of "lbfgs" (default), "liblinear", "newton-cg", "newton-cholesky", "sag", or "saga".
max_iter (int) – The maximum number of iterations.
verbose (int) – The verbosity level.
warm_start (bool) – Whether to use a warm start, that is, to reuse the solution of the previous fit as the starting point.
n_jobs (int) – The number of jobs to use for parallel processing.
l1_ratio (Union[float, None]) – The l1_ratio parameter for elasticnet regularization.
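To make the "balanced" option of class_weight concrete, here is a minimal sketch of the weighting rule that scikit-learn documents for it, n_samples / (n_classes * count_of_class). It assumes this class forwards class_weight to scikit-learn unchanged, which the text above suggests but does not state. The helper name balanced_class_weights is hypothetical and is not part of estyp.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weights under scikit-learn's "balanced" rule.

    Each class gets n_samples / (n_classes * count_of_that_class),
    so rarer classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Three observations of class 0 and one of class 1.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

A dictionary of this shape can also be passed directly as class_weight when explicit weights are preferred.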
- fit()
Fits the model to the data. Call fit(X, y) as with sklearn.linear_model.LogisticRegression, or call fit() with no arguments on an instance created by the from_formula class method, as in statsmodels.
- predict(new_data: DataFrame)
Predicts the class labels for new data.
- conf_int(conf_level=0.95)
Calculates the confidence intervals for the model coefficients.
- se()
Calculates the standard errors for the model coefficients.
- z_values()
Calculates the z-scores for the model coefficients.
- p_values()
Calculates the p-values for the model coefficients.
- summary(conf_level=0.95)
Prints a summary of the model.
- from_formula(formula, data)
Class method to create an instance from a formula.
- params
Returns the estimated values for model parameters.
- aic
Calculates the Akaike information criterion (AIC) for the model.
- bic
Calculates the Bayesian information criterion (BIC) for the model.
- cov_matrix
Returns the estimated covariance matrix for model parameters.
- residuals
Returns the deviance of the model.
- deviance_residuals
Returns the deviance residuals.
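The quantities returned by se(), z_values(), p_values(), and conf_int() are related in the usual Wald fashion. The sketch below shows that relationship for a single coefficient using only the standard library. It is an illustration under the assumption of normal-based (Wald) inference, not the library's actual implementation, and the function name wald_inference is hypothetical. The inputs are the Intercept estimate and standard error from the summary table in the examples.

```python
from statistics import NormalDist


def wald_inference(estimate, std_err, conf_level=0.95):
    """Hypothetical helper: Wald z-score, two-sided p-value, and interval."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = estimate / std_err
    # Two-sided p-value: probability of a |z| at least this large.
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical value, e.g. about 1.96 for a 95% interval.
    z_crit = std_normal.inv_cdf(0.5 + conf_level / 2.0)
    return {
        "z": z,
        "p_value": p_value,
        "lower": estimate - z_crit * std_err,
        "upper": estimate + z_crit * std_err,
    }


# Intercept row of the documented summary: Estimate and S.E.
result = wald_inference(-0.200864, 0.202894)
print(result)
```

Up to rounding of the inputs, the result agrees with the z, Pr(>|z|), and [Lower, Upper] columns reported for the Intercept in the summary output.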
Examples
Example 1: Using LogisticRegression() like the statsmodels Logit class.
import numpy as np
import pandas as pd
from estyp.linear_model import LogisticRegression

np.random.seed(123)
data = pd.DataFrame({
    "y": np.random.randint(2, size=100),
    "x1": np.random.uniform(-1, 1, size=100),
    "x2": np.random.uniform(-1, 1, size=100),
})

formula = "y ~ x1 + x2"
spec = LogisticRegression.from_formula(formula, data)
model = spec.fit()
print(model.summary())
            Estimate      S.E.         z  Pr(>|z|)   [Lower,    Upper]
Intercept  -0.200864  0.202894 -0.989996  0.322176 -0.598530  0.196801
x1          0.032006  0.375254  0.085292  0.932030 -0.703478  0.767490
x2          0.438665  0.344263  1.274215  0.202587 -0.236078  1.113407
Example 2: Using LogisticRegression() like the sklearn.linear_model.LogisticRegression() class.
from estyp.linear_model import LogisticRegression

X = data.drop(columns="y")
y = data["y"]
model = LogisticRegression()
model.fit(X, y)
print(model.summary())
            Estimate      S.E.         z  Pr(>|z|)   [Lower,    Upper]
Intercept  -0.200864  0.202894 -0.989996  0.322176 -0.598530  0.196801
x1          0.032006  0.375254  0.085292  0.932030 -0.703478  0.767490
x2          0.438665  0.344263  1.274215  0.202587 -0.236078  1.113407
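For readers unfamiliar with the aic and bic properties listed for this class, the sketch below computes both criteria from their textbook definitions, AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, for a binary response. It assumes the library uses these standard definitions, which the text does not confirm. All function names are hypothetical, and the labels and fitted probabilities are made-up toy values.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary labels y under predicted probabilities p."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
        for yi, pi in zip(y, p)
    )


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Toy data (illustrative only): labels and fitted probabilities.
y = [0, 1, 1, 0]
p = [0.2, 0.7, 0.9, 0.4]
n_params = 2  # e.g. an intercept and one slope

ll = bernoulli_log_likelihood(y, p)
aic_value = aic(ll, n_params)
bic_value = bic(ll, n_params, len(y))
print(ll, aic_value, bic_value)
```

Lower values indicate a better trade-off between fit and complexity under both criteria. BIC penalizes each extra parameter more heavily than AIC once n exceeds about 7, since ln(n) is then greater than 2.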
Stepwise Selection for Linear Models
- class Stepwise(formula, data, model, direction, criterion, alpha, max_iter, formula_params, fit_params, verbose)
The Stepwise class performs stepwise model selection: predictors are added to or removed from a model according to their significance (F-test), AIC, or BIC.
- Parameters:
formula (str) – A string representing the formula, using the patsy formula syntax.
data (DataFrame) – A pandas DataFrame that contains the data for both the dependent and independent variables.
model (Union[GLM, OLS, Logit, LogisticRegression]) – Specifies the type of model to be used.
direction (Literal["both", "forward", "backward"]) – Specifies the direction of the stepwise process.
criterion (Literal["aic", "bic", "f-test"]) – The criterion to be used for adding or removing predictors.
alpha (float) – The significance level for adding or removing predictors. It must be a value between 0 and 1.
max_iter (int) – The maximum number of iterations when direction is "both".
formula_params (Dict[str, Any]) – Additional parameters to be passed to the model’s from_formula method.
fit_params (Dict[str, Any]) – Additional parameters to be passed to the model’s fit method.
verbose (bool) – If set to False, the class will not print information about the stepwise process.
- optimal_model_
The optimal model obtained after the stepwise process.
- optimal_formula_
The optimal model formula after the stepwise process.
- optimal_variables_
List of optimal predictor variables in the final model.
- optimal_metric_
The optimal value of the chosen criterion (e.g., AIC, BIC, or F-test) for the final model.
- fit()
Conducts the stepwise process based on the specified direction and criterion.
Examples:
import pandas as pd
from statsmodels.api import OLS
from estyp.linear_model import Stepwise

data = pd.DataFrame({"y": [1, 2, 3, 4, 5], "x1": [5, 20, 3, 2, 1], "x2": [6, 7, 8, 9, 10]})
stepwise = Stepwise(formula="y ~ 1", data=data, model=OLS, direction="forward", criterion="aic")
stepwise.fit()
print("Best predictors:", stepwise.optimal_variables_)
Starting AIC: 19.6551
- Term added: "x2" | AIC: -317.7430
- Term added: "x1" | AIC: -323.7311
Forward selection completed
- Obtained AIC: -323.7311
- Added terms: None
- Obtained formula: "y ~ x2 + x1"
Best predictors: ['x2', 'x1']
- plot_history(ax=None)
Plots the history of the chosen criterion during the stepwise process.
- Parameters:
ax (matplotlib.axes.Axes, optional) – An Axes instance for the plot. If not provided, a new figure and axes will be created.
Returns:
fig, ax (matplotlib.figure.Figure, matplotlib.axes.Axes) – The Figure and Axes instances containing the plot, if ax is not provided.
Examples:
import pandas as pd
from statsmodels.api import OLS
from estyp.linear_model import Stepwise

data = pd.DataFrame(
    {
        "y": [1, 2, 3, 4, 5],
        "x1": [5, 20, 3, 2, 1],
        "x2": [6, 7, 8, 9, 10],
        "x3": [1, 2, 40, 4, 30],
        "x4": [20, 1, 4, 5, 6],
        "x5": [90, -1, 40, 5, 26],
    }
)
stepwise = Stepwise(
    formula="y ~ x1 + x2 + x3 + x4 + x5",
    data=data,
    model=OLS,
    direction="backward",
    criterion="bic",
)
stepwise.fit()
fig, ax = stepwise.plot_history()
Starting BIC: -299.2531
- Term dropped: "x3" | BIC: -312.2742
- Term dropped: "x5" | BIC: -315.9306
- Term dropped: "x1" | BIC: -320.3508
Backward selection completed
- Obtained BIC: -320.3508
- Dropped terms: 3
- Obtained formula: "y ~ x2 + x4"
Example
import pandas as pd
from statsmodels.api import OLS
from estyp.linear_model import Stepwise

data = pd.DataFrame({"y": [1, 2, 3, 4, 5], "x1": [5, 20, 3, 2, 1], "x2": [6, 7, 8, 9, 10]})
stepwise = Stepwise(formula="y ~ 1", data=data, model=OLS, direction="forward", criterion="aic")
stepwise.fit()
Starting AIC: 19.6551
- Term added: "x2" | AIC: -317.7430
- Term added: "x1" | AIC: -323.7311
Forward selection completed
- Obtained AIC: -323.7311
- Added terms: None
- Obtained formula: "y ~ x2 + x1"
Note
The class is designed to work seamlessly with statsmodels models.
If using “both” as the direction, the “f-test” criterion is not available.
Ensure that the data provided is appropriate for the model chosen.
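As a rough illustration of what the "forward" direction does with an information criterion such as AIC or BIC, the sketch below implements a generic greedy forward selection. It is not the Stepwise implementation. The function forward_select, the score callback, and the toy criterion values are all hypothetical, chosen only to make the loop easy to follow.

```python
def forward_select(candidates, score, start=()):
    """Greedy forward selection that minimizes a criterion.

    score(terms) is a hypothetical callback returning the criterion
    value (lower is better) for a frozenset of selected terms.
    """
    selected = list(start)
    remaining = [c for c in candidates if c not in selected]
    best = score(frozenset(selected))
    history = [(None, best)]  # (term added, criterion value)
    while remaining:
        # Evaluate the criterion after adding each remaining term.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        trial_score, term = min(trials)
        if trial_score >= best:
            break  # no candidate improves the criterion; stop
        selected.append(term)
        remaining.remove(term)
        best = trial_score
        history.append((term, best))
    return selected, best, history


# Made-up criterion values for every subset of three toy terms.
toy_scores = {
    frozenset(): 10.0,
    frozenset({"a"}): 8.0,
    frozenset({"b"}): 6.0,
    frozenset({"c"}): 9.5,
    frozenset({"a", "b"}): 5.0,
    frozenset({"a", "c"}): 7.8,
    frozenset({"b", "c"}): 6.5,
    frozenset({"a", "b", "c"}): 5.2,
}

selected, best, history = forward_select(["a", "b", "c"], toy_scores.__getitem__)
print(selected, best, history)
```

In a real run the score callback would fit the model for the given terms and return its AIC or BIC. Backward selection follows the same pattern but starts from the full set and drops one term per step.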