
An R6 class for a model
A Model specifies what a model looks like, fits and validates it, tunes hyperparameters, stores it and predicts from it.
Public fields
name: A telling name for the model.
directory: Store/find the Model with fit_obj set in this directory.
fitter: Fit and validate the model with this fitting function.
time_cutoffs: Threshold and censor the outcome accordingly.
val_error_fun: Calculates the error of the validated predictions.
hyperparams: Optional arguments passed to fitter.
include_from_continuous_pheno: The names of the continuous variables in the pheno data (to be) included in the predictor matrix.
include_from_discrete_pheno: The names of the discrete variables in the pheno data (to be) included in the predictor matrix.
include_expr: Whether to include the expression data in the predictor matrix.
combine_n_max_categorical_features: Maximum number of categorical features to combine.
combined_feature_min_positive_ratio: Minimum ratio of positive observations in a combined (categorical) feature.
enable_imputation: Overrides the imputer attribute of the Data object.
fit_obj: The fitted object, something returned by a fitter like a ptk_zerosum S3 object.
create_directory: Whether to create directory if it does not exist yet.
file: Store this Model object under this name in directory.
li_var_suffix: Append this to the names of features from the pheno data when adding them to the predictor matrix.
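To make the two combination fields more concrete, here is a minimal base-R sketch of the filtering idea, not the package's actual implementation. The toy pheno data, the column naming scheme and the threshold of 0.15 are assumptions chosen for illustration, and only combinations of exactly two features are shown.

```r
# Toy pheno data with two discrete variables (hypothetical names and values).
pheno <- data.frame(
  sex = c("f", "m", "f", "m", "f", "m", "f", "m", "f", "m"),
  stage = c("I", "I", "II", "II", "II", "III", "III", "II", "I", "II"),
  stringsAsFactors = TRUE
)

# Toy threshold; the documented default of the Model class is 0.04.
combined_feature_min_positive_ratio <- 0.15

# Combine one level of `sex` with one level of `stage` (two features involved)
# and keep the combination only if enough samples are positive for it.
combined <- list()
for (s in levels(pheno$sex)) {
  for (g in levels(pheno$stage)) {
    positive <- as.integer(pheno$sex == s & pheno$stage == g)
    if (mean(positive) >= combined_feature_min_positive_ratio) {
      # Hypothetical column name, not the naming used by the package.
      combined[[paste0("sex_", s, "&stage_", g)]] <- positive
    }
  }
}
combined_mat <- do.call(cbind, combined)
print(colMeans(combined_mat))
```

Rare combinations (here those present in only one of ten samples) are dropped, which keeps the predictor matrix from filling up with nearly constant columns.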
Methods
Method new()
Create a new Model instance.
Usage
Model$new(
  name,
  fitter,
  directory,
  time_cutoffs,
  val_error_fun,
  hyperparams = NULL,
  include_from_continuous_pheno = NULL,
  include_from_discrete_pheno = NULL,
  include_expr = TRUE,
  combine_n_max_categorical_features = 1L,
  combined_feature_min_positive_ratio = 0.04,
  enable_imputation = TRUE,
  file = "model.rds",
  create_directory = TRUE,
  li_var_suffix = "++"
)
Arguments
name: string. A telling name for the model.
fitter: function. Fit the model to the data and validate it with this fitting function. See fitter_prototype() for its required interface. patroklos provides two fitters out of the box: ptk_zerosum(), a wrapper around zeroSum::zeroSum(), and ptk_ranger(), a wrapper around ranger::ranger(). To tune more than one combination of hyperparameters, decorate a fitter with multitune().
directory: string. Store/find the Model with fit_obj set in this directory.
time_cutoffs: numeric vector. A model-agnostic hyperparameter that changes the response during training as follows. For every value of time_cutoffs, train the specified model on the following response data:
  - For Cox response, censor all patients whose event occurred after this value at this value.
  - For binary response, binarize the outcome depending on whether it occurred before or after this value.
  This hyperparameter is tuned automatically: only the best model according to validation is stored as fit_obj, and the chosen time cutoff is reported as the time_cutoff attribute of fit_obj.
val_error_fun: Function to calculate the error of validated predictions. For its interface, see val_error_fun_prototype().
hyperparams: list. Optional arguments passed to fitter, e.g. alpha in case of an elastic net.
include_from_continuous_pheno: vector of strings. The names of the continuous variables in the pheno data (to be) included in the predictor matrix. Default is NULL, which means no continuous pheno variables are or will be included.
include_from_discrete_pheno: vector of strings. The names of the discrete variables in the pheno data (to be) included in the predictor matrix. A discrete variable with n levels will be dichotomized into n-1 binary dummy variables. Default is NULL, which means no discrete pheno variables are or will be included.
include_expr: logical. Whether to include the expression data in the predictor matrix.
combine_n_max_categorical_features: integer. Maximum number of categorical features to combine in predicting features.
combined_feature_min_positive_ratio: numeric. Minimum ratio of positive observations in a combined (categorical) feature. This attribute together with combine_n_max_categorical_features governs which combined categorical features the predictor matrix will contain: a combination of the levels of distinct categorical features is added to the predictor matrix after imputation if at most combine_n_max_categorical_features features are involved in the combination and if the combination is (expected to be) present in at least a fraction combined_feature_min_positive_ratio of the samples.
enable_imputation: logical. If FALSE, it overrides the imputer attribute of the Data object and we do not impute.
file: string. The name of the file inside directory that stores the Model together with its fit_obj. Default is "model.rds".
create_directory: logical. Whether to create directory if it does not exist yet. Default is TRUE.
li_var_suffix: string. Append this to the names of features from the pheno data when adding them to the predictor matrix. Default is "++".
Method fit()
Fit the model to a data set, validate it and tune hyperparameters.
Arguments
data: Data object. Read it in if needed.
update_model_shell: logical. If TRUE and we find a stored model with fit_obj not being NULL, we set the fit_obj attribute of the model to the found fit_obj and save it. This way, we can keep stored Models up-to-date with changes in the Model class.
quiet: logical. Whether to suppress messages. Default is FALSE.
msg_prefix: string. Prefix for messages. Default is "".
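The idea behind update_model_shell can be sketched with plain lists standing in for Model objects. Everything here (the list fields, the version tag, the temporary directory) is hypothetical and only mirrors the behavior described above; it does not call the package.

```r
# Hypothetical stand-in: a plain list plays the role of a Model object.
directory <- file.path(tempdir(), "toy_model")
dir.create(directory, showWarnings = FALSE, recursive = TRUE)
path <- file.path(directory, "model.rds")

# A previously stored model that already carries a fit_obj.
stored <- list(name = "toy", fit_obj = list(coef = c(a = 1.5)), version = "old")
saveRDS(stored, path)

# A freshly specified model shell, e.g. built after the class definition changed.
shell <- list(name = "toy", fit_obj = NULL, version = "new")

update_model_shell <- TRUE
if (update_model_shell && file.exists(path)) {
  found <- readRDS(path)
  if (!is.null(found$fit_obj)) {
    # Reuse the stored fit instead of refitting, then save the updated shell.
    shell$fit_obj <- found$fit_obj
    saveRDS(shell, path)
  }
}

str(readRDS(path))
```

The stored file now holds the new shell together with the old fit, so no refitting is needed to pick up changes in the surrounding object.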
Method predict()
Predict for a data set.
Arguments
data: Data object. Specifications on the data. Read it in if needed.
quiet: logical. Whether to suppress messages. Default is FALSE.
Returns
A named list of length 3:
- "predicted": named numeric vector with predicted response,
- "actual": named numeric vector with actual response,
- "cox_mat": named matrix with columns "time_to_event", "event", "hazard", where hazard again is the predicted response. This matrix is helpful to calculate hazard ratios or logrank p-values.
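As a rough illustration of how cox_mat might be used downstream, the sketch below builds a hand-made list with the documented structure and derives a logrank p-value and a hazard ratio from it. The sample names and all numbers are invented, the median split into risk groups is an arbitrary choice for this example, and the survival package is assumed to be installed for the last step.

```r
# Hypothetical prediction result with the documented structure.
sample_ids <- paste0("s", 1:8)
predicted <- setNames(c(0.9, 0.2, 0.7, 0.1, 0.8, 0.3, 0.6, 0.4), sample_ids)
actual <- setNames(c(1, 0, 1, 0, 0, 1, 1, 0), sample_ids)
cox_mat <- cbind(
  time_to_event = c(0.5, 4.0, 1.0, 5.0, 3.0, 0.8, 1.5, 4.5),
  event = c(1, 0, 1, 0, 0, 1, 1, 0),
  hazard = predicted
)
rownames(cox_mat) <- sample_ids
res <- list(predicted = predicted, actual = actual, cox_mat = cox_mat)

# Split the samples into a high-risk and a low-risk group at the median hazard.
hazard <- res$cox_mat[, "hazard"]
high_risk <- hazard > median(hazard)

if (requireNamespace("survival", quietly = TRUE)) {
  df <- data.frame(res$cox_mat, high_risk = high_risk)
  y <- survival::Surv(df$time_to_event, df$event)
  # Logrank test between the two risk groups.
  lr <- survival::survdiff(y ~ high_risk, data = df)
  p_value <- stats::pchisq(lr$chisq, df = 1, lower.tail = FALSE)
  # Hazard ratio of the high-risk versus the low-risk group from a Cox model.
  cox <- survival::coxph(y ~ high_risk, data = df)
  hazard_ratio <- exp(stats::coef(cox))
  print(c(p_value = p_value, hazard_ratio = unname(hazard_ratio)))
}
```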