Title: | Distribution-Based Model Selection |
---|---|
Description: | Perform model selection using distribution and probability-based methods, including standardized AIC, BIC, and AICc. These standardized information criteria allow one to perform model selection in a way similar to the prevalent "Rule of 2" method, but formalize the method to rely on probability theory. A novel goodness-of-fit procedure for assessing linear regression models is also available. This test relies on theoretical properties of the estimated error variance for a normal linear regression model, and employs a bootstrap procedure to assess the null hypothesis that the fitted model shows no lack of fit. For more information, see Koeneman and Cavanaugh (2023) <arXiv:2309.10614>. Functionality to perform all subsets linear or generalized linear regression is also available. |
Authors: | Scott H. Koeneman [aut, cre] |
Maintainer: | Scott H. Koeneman <[email protected]> |
License: | GPL-3 |
Version: | 0.2.1 |
Built: | 2025-03-09 03:09:51 UTC |
Source: | https://github.com/shkoeneman/dbmodelselect |
The DESCRIPTION file:
Package: | DBModelSelect |
Type: | Package |
Title: | Distribution-Based Model Selection |
Version: | 0.2.1 |
Date: | 2023-09-25 |
Authors@R: | person("Scott H.", "Koeneman", email = "[email protected]", role = c("aut", "cre")) |
Description: | Perform model selection using distribution and probability-based methods, including standardized AIC, BIC, and AICc. These standardized information criteria allow one to perform model selection in a way similar to the prevalent "Rule of 2" method, but formalize the method to rely on probability theory. A novel goodness-of-fit procedure for assessing linear regression models is also available. This test relies on theoretical properties of the estimated error variance for a normal linear regression model, and employs a bootstrap procedure to assess the null hypothesis that the fitted model shows no lack of fit. For more information, see Koeneman and Cavanaugh (2023) <arXiv:2309.10614>. Functionality to perform all subsets linear or generalized linear regression is also available. |
URL: | https://github.com/shkoeneman/DBModelSelect |
License: | GPL-3 |
Depends: | R (>= 4.1.0) |
RoxygenNote: | 7.2.3 |
Repository: | https://shkoeneman.r-universe.dev |
RemoteUrl: | https://github.com/shkoeneman/dbmodelselect |
RemoteRef: | HEAD |
RemoteSha: | 29e5d48ebd4e1417188b7b3a1cb865ec681db342 |
Author: | Scott H. Koeneman [aut, cre] |
Maintainer: | Scott H. Koeneman <[email protected]> |
Index of help topics:
AICc | Corrected AIC for linear models |
BootGOFTestLM | Bootstrap goodness-of-fit procedure for linear models |
DBModelSelect-package | Distribution-Based Model Selection |
FitGLMSubsets | Perform all subsets regression for generalized linear models |
FitLMSubsets | Perform all subsets linear regression |
StandICModelSelect | Model selection using standardized information criteria |
The DBModelSelect package provides several methods of model selection based on distributional theory. These include selection using standardized information criteria, implemented in the StandICModelSelect function, and an omnibus goodness-of-fit test for linear models, implemented in the BootGOFTestLM function.
Maintainer: Scott H. Koeneman [email protected]
Calculates corrected AIC for an 'lm' linear model object.
    AICc(model)
model |
A fitted 'lm' object. |
The numeric value of the corrected AIC for the supplied linear model object.
    # generate data
    set.seed(9122023)
    data <- data.frame(x = rnorm(100), y = rnorm(100))
    AICc(lm(y ~ x, data = data))
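To make the quantity concrete, the following base R sketch computes a corrected AIC by hand for the same kind of fit. It assumes the standard small-sample correction, AIC + 2k(k + 1)/(n - k - 1), with k counting the regression coefficients plus the error variance; the source does not state the exact formula used by AICc, so treat the names `k`, `aic`, and `aicc_manual` and the formula itself as assumptions for illustration.

```r
# Sketch: corrected AIC computed by hand for an 'lm' fit (base R only).
set.seed(9122023)
data <- data.frame(x = rnorm(100), y = rnorm(100))
fit <- lm(y ~ x, data = data)

n <- nobs(fit)                 # number of observations used in the fit
k <- attr(logLik(fit), "df")   # estimated parameters: coefficients + error variance

# Ordinary AIC from the log-likelihood
aic <- -2 * as.numeric(logLik(fit)) + 2 * k

# Assumed small-sample correction; the package may differ in detail
aicc_manual <- aic + (2 * k * (k + 1)) / (n - k - 1)
aicc_manual
```

Under this assumed formula the correction term shrinks toward zero as n grows relative to k, so the corrected and uncorrected criteria agree closely for large samples.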
Performs a bootstrap goodness-of-fit procedure to assess the fit of a normal linear regression model.
    BootGOFTestLM(x, data, boot_iter = 1000, level = 0.95, return_dist = FALSE, ...)

    ## S3 method for class 'BootGOFTestLM'
    print(x, ...)
x |
A fitted |
data |
A dataframe used to fit the model given by |
boot_iter |
An integer indicating number of bootstrap iterations to perform. |
level |
Confidence level of the bootstrap interval used in the test. |
return_dist |
A logical specifying whether to optionally return the bootstrap distribution. Defaults to FALSE. |
... |
Additional arguments. |
A list containing the specification and results of the test. The hypothesis of adequate fit is rejected if the null value is not contained in the bootstrap interval.
    # generate some data
    set.seed(5122023)
    data <- data.frame(s = rnorm(200), t = rnorm(200))
    data$y <- data$s + rnorm(200)

    # determine whether candidate model shows lack of fit
    model <- lm(y ~ s + t, data = data)
    BootGOFTestLM(model, data = data, boot_iter = 100)
Fit a specified generalized linear model on all subsets of the covariates supplied. May be done in parallel if a cluster is supplied. Produces output suitable for use with the StandICModelSelect function.
    FitGLMSubsets(response, data, family, intercept = TRUE, force_intercept = TRUE,
                  cluster = NULL, ...)
response |
A character string specifying the name of the response variable. |
data |
A dataframe containing a column corresponding to the response variable in addition to columns for each covariate of interest. |
family |
A family suitable for supplying to the |
intercept |
A logical indicating whether an intercept term should be considered in models. Defaults to TRUE. |
force_intercept |
A logical indicating whether to force an intercept term into all models if an intercept is desired. Defaults to TRUE. |
cluster |
A cluster created using |
... |
Additional arguments that may be supplied when calling |
A list of fitted models suitable for use with the StandICModelSelect function.
    # example code
    # generate some data
    data <- data.frame(s = rnorm(200), t = rnorm(200))
    data$y <- data$s + rnorm(200)

    # perform all subsets regression
    model_list <- FitGLMSubsets(response = "y", data = data, family = gaussian(),
                                intercept = TRUE, force_intercept = TRUE)

    # perform model selection
    model_select <- StandICModelSelect(model_list, IC = "AIC")
Perform linear regression on all subsets of the covariates supplied. May be done in parallel if a cluster is supplied. Produces output suitable for use with the StandICModelSelect function.
    FitLMSubsets(response, data, intercept = TRUE, force_intercept = TRUE, cluster = NULL)
response |
A character string specifying the name of the response variable. |
data |
A dataframe containing a column corresponding to the response variable in addition to columns for each covariate of interest. |
intercept |
A logical indicating whether an intercept term should be considered in models. Defaults to TRUE. |
force_intercept |
A logical indicating whether to force an intercept term into all models if an intercept is desired. Defaults to TRUE. |
cluster |
A cluster created using |
A list of fitted linear models suitable for use with the StandICModelSelect function.
    # example code
    # generate some data
    data <- data.frame(s = rnorm(200), t = rnorm(200))
    data$y <- data$s + rnorm(200)

    # perform all subsets regression
    model_list <- FitLMSubsets(response = "y", data = data,
                               intercept = TRUE, force_intercept = TRUE)

    # perform model selection
    model_select <- StandICModelSelect(model_list, IC = "AIC")
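For readers unfamiliar with the term, the base R sketch below shows what "all subsets" regression amounts to when an intercept is forced into every model: one fit per subset of the covariates, including the empty subset. It is only an illustration of the idea, not the package's implementation; the seed and the names `covariates`, `subsets`, `formulas`, and `manual_list` are hypothetical.

```r
# Sketch: enumerate and fit every covariate subset with base R.
set.seed(1)  # hypothetical seed, for reproducibility only
data <- data.frame(s = rnorm(200), t = rnorm(200))
data$y <- data$s + rnorm(200)

response <- "y"
covariates <- setdiff(names(data), response)

# All subsets of the covariates: the empty set plus every combination of size 1..p
subsets <- c(
  list(character(0)),
  unlist(
    lapply(seq_along(covariates),
           function(m) combn(covariates, m, simplify = FALSE)),
    recursive = FALSE
  )
)

# One formula per subset; the empty subset becomes the intercept-only model
formulas <- lapply(subsets, function(vars) {
  if (length(vars) == 0) {
    as.formula(paste(response, "~ 1"))
  } else {
    reformulate(vars, response = response)
  }
})

# Fit each candidate model
manual_list <- lapply(formulas, function(f) lm(f, data = data))
length(manual_list)  # 2^p candidate models for p covariates
```

Because the number of candidates doubles with each added covariate, fitting them in parallel becomes attractive as the covariate count grows.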
Perform model selection on a list of models using standardized information criteria.
    StandICModelSelect(x, IC = "AIC", ref_model_index = NULL, sd_cutoff = 2,
                       user_df = NULL, ...)

    ## S3 method for class 'StandICModelSelect'
    print(x, ...)

    ## S3 method for class 'StandICModelSelect'
    plot(x, ...)
x |
A list containing the fitted model objects on which to perform model selection. Model objects must have a |
IC |
A character string naming the base information criterion to use. Options are "AIC", "BIC", and, for linear models, "AICc". Defaults to "AIC". |
ref_model_index |
An integer with the index of the largest candidate model to use as the reference. If not supplied, defaults to the model with largest number of estimated coefficients in |
sd_cutoff |
A numeric describing how many standard deviations to use when formulating a cutoff for model viability. |
user_df |
An optional vector the same length as |
... |
Additional arguments. |
A list containing the final model selected in addition to standardized information criteria and difference in degrees of freedom for all candidate models.
    # example code
    # generate some data
    s <- rnorm(200)
    t <- rnorm(200)
    y <- s + rnorm(200)

    # formulate and fit models
    model_list <- list(lm(y ~ 1), lm(y ~ s), lm(y ~ t), lm(y ~ s + t))

    # perform model selection
    model_select <- StandICModelSelect(model_list, IC = "AIC")

    # display best model
    model_select$best_model
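For comparison, the classical "Rule of 2" heuristic mentioned in the package description can be sketched in base R as follows: compute each model's AIC, take differences from the minimum, and treat models within 2 units as viable. This is the informal rule that the standardized criteria are described as formalizing, not the procedure StandICModelSelect itself performs; the seed and the names `aic`, `delta`, and `rule_of_two` are hypothetical.

```r
# Sketch: the classical "Rule of 2" on raw AIC differences (base R only).
set.seed(2)  # hypothetical seed, for reproducibility only
s <- rnorm(200)
t <- rnorm(200)
y <- s + rnorm(200)

model_list <- list(lm(y ~ 1), lm(y ~ s), lm(y ~ t), lm(y ~ s + t))

# AIC for each candidate and its distance from the best (smallest) AIC
aic <- vapply(model_list, AIC, numeric(1))
delta <- aic - min(aic)

# Summarize which candidates fall within 2 AIC units of the best model
rule_of_two <- data.frame(
  model  = vapply(model_list, function(m) deparse(formula(m)), character(1)),
  AIC    = aic,
  delta  = delta,
  viable = delta <= 2
)
rule_of_two
```

The fixed threshold of 2 here is a convention; per the description above, the standardized criteria instead base the viability cutoff on probability theory, with `sd_cutoff` controlling how many standard deviations define it.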