An R6 class for a model
A Model specifies what a model looks like, fits and validates it, tunes its hyperparameters, stores it, and predicts from it.
Public fields
name
A descriptive name for the model.
directory
Store/find the Model with fit_obj set in this directory.
fitter
Fit and validate the model with this fitting function.
time_cutoffs
Time cutoffs at which to threshold (binary response) or censor (Cox response) the outcome.
val_error_fun
Calculates the error of the validated predictions.
hyperparams
Optional arguments passed to fitter.
include_from_continuous_pheno
The names of the continuous variables in the pheno data (to be) included in the predictor matrix.
include_from_discrete_pheno
The names of the discrete variables in the pheno data (to be) included in the predictor matrix.
include_expr
Whether to include the expression data in the predictor matrix.
combine_n_max_categorical_features
Maximum number of categorical features to combine.
combined_feature_min_positive_ratio
Minimum ratio of positive observations in a combined (categorical) feature.
enable_imputation
Overrides the imputer attribute of the Data object.
fit_obj
The fitted object, something returned by a fitter, like a ptk_zerosum S3 object.
create_directory
Whether to create directory if it does not exist yet.
file
Store this Model object under this name in directory.
li_var_suffix
Append this to the names of features from the pheno data when adding them to the predictor matrix.
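The following base R sketch illustrates how pheno variables might enter the predictor matrix: a continuous variable is taken over as is, a discrete variable with n levels becomes n-1 dummy columns, and li_var_suffix is appended to the resulting names. The toy data, the column names and the naming scheme for the dummy columns are assumptions made for illustration; patroklos itself may name the columns differently.

```r
# Toy pheno data; the column names are made up for illustration.
pheno <- data.frame(
  age = c(61, 48, 73, 55),
  stage = factor(c("I", "II", "III", "II")),
  row.names = paste0("patient_", 1:4)
)

li_var_suffix <- "++"

# Continuous variable: take it over as is and mark its name with the suffix.
continuous <- as.matrix(pheno["age"])
colnames(continuous) <- paste0(colnames(continuous), li_var_suffix)

# Discrete variable with n = 3 levels: n - 1 = 2 dummy columns
# (treatment contrasts, the first level "I" is the reference).
dummies <- model.matrix(~ stage, data = pheno)[, -1, drop = FALSE]
# Assumed naming scheme "<variable>_<level><suffix>"; patroklos may differ.
colnames(dummies) <- paste0("stage_", levels(pheno$stage)[-1], li_var_suffix)

# The part of the predictor matrix that stems from the pheno data.
pheno_part <- cbind(continuous, dummies)
print(pheno_part)
```

The suffix makes it easy to tell pheno-derived columns apart from expression features when inspecting a fitted model.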
Methods
Method new()
Create a new Model instance.
Usage
Model$new(
  name,
  fitter,
  directory,
  time_cutoffs,
  val_error_fun,
  hyperparams = NULL,
  include_from_continuous_pheno = NULL,
  include_from_discrete_pheno = NULL,
  include_expr = TRUE,
  combine_n_max_categorical_features = 1L,
  combined_feature_min_positive_ratio = 0.04,
  enable_imputation = TRUE,
  file = "model.rds",
  create_directory = TRUE,
  li_var_suffix = "++"
)
Arguments
name
string. A descriptive name for the model.
fitter
function. Fit the model to the data and validate it with this fitting function. See fitter_prototype() for its required interface. patroklos provides two fitters out of the box: ptk_zerosum(), a wrapper around zeroSum::zeroSum, and ptk_ranger(), a wrapper around ranger::ranger(). To tune more than one combination of hyperparameters, decorate a fitter with multitune().
directory
string. Store/find the Model with fit_obj set in this directory.
time_cutoffs
numeric vector. A model-agnostic hyperparameter that changes the response during training as follows: for every value of time_cutoffs, train the specified model on the following response data:
For Cox response, censor at this value all patients whose event occurred after it.
For binary response, binarize the outcome depending on whether it occurred before or after this value.
This hyperparameter is tuned automatically: only the best model according to validation is stored as fit_obj, and the chosen time cutoff is reported as the time_cutoff attribute in it.
val_error_fun
function. Calculates the error of the validated predictions. For its interface, see val_error_fun_prototype().
hyperparams
list. Optional arguments passed to fitter, e.g. alpha in case of an elastic net.
include_from_continuous_pheno
vector of strings. The names of the continuous variables in the pheno data (to be) included in the predictor matrix. Default is NULL, which means no continuous pheno variables are or will be included.
include_from_discrete_pheno
vector of strings. The names of the discrete variables in the pheno data (to be) included in the predictor matrix. A discrete variable with n levels will be dichotomized into n-1 binary dummy variables. Default is NULL, which means no discrete pheno variables are or will be included.
include_expr
logical. Whether to include the expression data in the predictor matrix.
combine_n_max_categorical_features
integer. Maximum number of categorical features to combine in predicting features.
combined_feature_min_positive_ratio
numeric. Minimum ratio of positive observations in a combined (categorical) feature. Together with combine_n_max_categorical_features, this attribute governs which combined categorical features the predictor matrix will contain: a combination of the levels of distinct categorical features is added to the predictor matrix after imputation if at most combine_n_max_categorical_features features are involved in the combination and if the combination is (expected to be) present in at least a fraction of combined_feature_min_positive_ratio of the samples.
enable_imputation
logical. If FALSE, it overrides the imputer attribute of the Data object and we do not impute.
file
string. The name of the file inside directory in which the Model with its fit_obj is stored. Default is "model.rds".
create_directory
logical. Whether to create directory if it does not exist yet. Default is TRUE.
li_var_suffix
string. Append this to the names of features from the pheno data when adding them to the predictor matrix. Default is "++".
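To make the effect of time_cutoffs on the response more concrete, here is a minimal base R sketch for a single cutoff value. The toy data and variable names are made up, and the handling of samples censored before the cutoff in the binary case (set to NA here) is an assumption, not necessarily what patroklos does.

```r
# Toy survival data: time to event (e.g. in years) and event indicator
# (1 = event observed, 0 = censored).
time_to_event <- c(0.5, 1.2, 2.5, 3.0, 4.1)
event         <- c(1,   0,   1,   1,   0)
time_cutoff   <- 2.0

# Cox response: samples with a time after the cutoff are censored at the cutoff.
cox_time  <- pmin(time_to_event, time_cutoff)
cox_event <- ifelse(time_to_event > time_cutoff, 0, event)

# Binary response: 1 if the event occurred before the cutoff, 0 if the sample
# was still event-free at the cutoff. Samples censored before the cutoff have
# an unknown outcome; this sketch marks them as NA (an assumption).
binary <- ifelse(
  time_to_event <= time_cutoff & event == 1, 1,
  ifelse(time_to_event > time_cutoff, 0, NA)
)

print(data.frame(time_to_event, event, cox_time, cox_event, binary))
```

Repeating this for every value in time_cutoffs yields one response per cutoff, and the model trained on the response with the best validation error is the one that is kept.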
Method fit()
Fit the model to a data set, validate it and tune hyperparameters.
Arguments
data
Data object. Read it in if needed.
update_model_shell
logical. If TRUE and we find a stored model with fit_obj not being NULL, we set the fit_obj attribute of the model to the found fit_obj and save it. This way, we can keep stored Models up-to-date with changes in the Model class.
quiet
logical. Whether to suppress messages. Default is FALSE.
msg_prefix
string. Prefix for messages. Default is "".
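The logic behind update_model_shell can be sketched in base R as follows. Plain lists stand in for Model objects, and the directory name and the new_field element are hypothetical; the sketch only mirrors the behavior described above and is not the actual implementation.

```r
# Plain lists stand in for Model objects here (an assumption for illustration).
directory <- file.path(tempdir(), "model_sketch")
dir.create(directory, showWarnings = FALSE)
path <- file.path(directory, "model.rds")

# A previously stored model that has already been fitted.
stored <- list(name = "toy model", fit_obj = list(coef = c(a = 1.5)))
saveRDS(stored, path)

# A fresh shell without a fit, e.g. created with a newer version of the class.
shell <- list(name = "toy model", fit_obj = NULL, new_field = "added later")

update_model_shell <- TRUE
if (update_model_shell && file.exists(path)) {
  found <- readRDS(path)
  if (!is.null(found$fit_obj)) {
    shell$fit_obj <- found$fit_obj  # reuse the stored fit instead of refitting
    saveRDS(shell, path)            # store the up-to-date shell
  }
}
```

After this, the file holds the new shell together with the old fit, so the expensive fitting step is not repeated.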
Method predict()
Predict for a data set.
Arguments
data
Data object. Specifications on the data. Read it in if needed.
quiet
logical. Whether to suppress messages. Default is FALSE.
Returns
A named list of length 3:
"predicted": named numeric vector with predicted response,
"actual": named numeric vector with actual response,
"cox_mat": named matrix with columns "time_to_event", "event", "hazard", where hazard again is the predicted response. This matrix is helpful to calculate hazard ratios or logrank p-values.
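As a sketch of how the "cox_mat" element could be used, the example below builds a toy matrix with the three documented columns, splits the samples at the median predicted hazard, and computes a logrank p-value and a hazard ratio with the survival package. The values and the median split are assumptions chosen for illustration; in practice the matrix would come from the list returned by predict().

```r
library(survival)  # recommended package that ships with R

# Toy stand-in for the "cox_mat" element of the returned list; the values
# are made up.
cox_mat <- cbind(
  time_to_event = c(0.4, 0.9, 1.5, 2.2, 2.8, 3.5, 4.0, 4.6),
  event         = c(1,   1,   1,   0,   1,   0,   0,   1),
  hazard        = c(2.1, 0.2, 1.9, 0.3, 0.8, -0.5, 1.2, 0.1)
)
rownames(cox_mat) <- paste0("patient_", seq_len(nrow(cox_mat)))

# Split the samples into a high- and a low-risk group at the median hazard.
high_risk <- cox_mat[, "hazard"] > median(cox_mat[, "hazard"])
groups <- data.frame(cox_mat, high_risk = high_risk)

# Logrank test between the two groups.
lr <- survdiff(Surv(time_to_event, event) ~ high_risk, data = groups)
p_value <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)

# Hazard ratio of the high- vs. the low-risk group from a univariate Cox model.
cox_fit <- coxph(Surv(time_to_event, event) ~ high_risk, data = groups)
hazard_ratio <- exp(coef(cox_fit))

print(c(p_value = p_value, hazard_ratio = unname(hazard_ratio)))
```

Any other threshold on the hazard column works the same way; only the definition of high_risk changes.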