An R6 class for a model

A Model specifies what a model looks like; it fits and validates the model, tunes its hyperparameters, stores it and predicts from it.

Public fields

name: A descriptive name for the model.
directory: Store/find the Model with fit_obj set in this directory.
fitter: Fit and validate the model with this fitting function.
time_cutoffs: Times at which to censor (Cox response) or binarize (binary response) the outcome during training.
val_error_fun: Calculates the error of the validated predictions.
hyperparams: Optional arguments passed to fitter.
include_from_continuous_pheno: The names of the continuous variables in the pheno data (to be) included in the predictor matrix.
include_from_discrete_pheno: The names of the discrete variables in the pheno data (to be) included in the predictor matrix.
include_expr: Whether to include the expression data in the predictor matrix.
combine_n_max_categorical_features: Maximum number of categorical features to combine.
combined_feature_min_positive_ratio: Minimum ratio of positive observations in a combined (categorical) feature.
enable_imputation: If FALSE, overrides the imputer attribute of the Data object so that no imputation is done.
fit_obj: The fitted object, i.e. something returned by a fitter, such as a ptk_zerosum S3 object.
create_directory: Whether to create directory if it does not exist yet.
file: Store this Model object under this name in directory.
li_var_suffix: Append this to the names of features from the pheno data when adding them to the predictor matrix.

Methods

Method `new()`

Create a new Model instance.

Usage

Model$new(
  name,
  fitter,
  directory,
  time_cutoffs,
  val_error_fun,
  hyperparams = NULL,
  include_from_continuous_pheno = NULL,
  include_from_discrete_pheno = NULL,
  include_expr = TRUE,
  combine_n_max_categorical_features = 1L,
  combined_feature_min_positive_ratio = 0.04,
  enable_imputation = TRUE,
  file = "model.rds",
  create_directory = TRUE,
  li_var_suffix = "++"
)

Arguments

name

string. A descriptive name for the model.

fitter

function. Fit the model to the data and validate it with this fitting function. See fitter_prototype() for its required interface. patroklos provides two fitters out of the box: ptk_zerosum(), a wrapper around zeroSum::zeroSum(), and ptk_ranger(), a wrapper around ranger::ranger(). To tune more than one combination of hyperparameters, decorate a fitter with multitune().
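To illustrate the decorator idea behind multitune(), here is a minimal base-R sketch that does not use patroklos at all. Both toy_fitter() and tune_over() are hypothetical; the real fitter interface is the one described by fitter_prototype(). The decorator takes a fitter for a single hyperparameter combination and returns a function that tries several combinations and keeps the fit with the lowest validation error.

```r
# Toy fitter (hypothetical, not patroklos): ridge regression fitted on the
# first half of the samples and validated by mean squared error on the rest.
toy_fitter <- function(x, y, lambda) {
  train <- seq_len(floor(nrow(x) / 2))
  xt <- x[train, , drop = FALSE]
  beta <- solve(crossprod(xt) + lambda * diag(ncol(x)), crossprod(xt, y[train]))
  pred <- x[-train, , drop = FALSE] %*% beta
  list(beta = beta, lambda = lambda, val_error = mean((y[-train] - pred)^2))
}

# Hypothetical decorator in the spirit of multitune(): wraps a fitter for one
# hyperparameter combination into a fitter that tries a list of combinations
# and returns the fit with the lowest validation error.
tune_over <- function(fitter) {
  function(x, y, hyperparam_grid) {
    fits <- lapply(hyperparam_grid, function(hp) {
      do.call(fitter, c(list(x = x, y = y), hp))
    })
    errors <- vapply(fits, `[[`, numeric(1), "val_error")
    fits[[which.min(errors)]]
  }
}

set.seed(42)
x <- matrix(rnorm(100 * 3), ncol = 3)
y <- drop(x %*% c(1, -2, 0.5)) + rnorm(100)
grid <- list(list(lambda = 0.01), list(lambda = 10), list(lambda = 1000))
best <- tune_over(toy_fitter)(x, y, grid)
best$lambda
```

Returning a new function rather than modifying the fitter keeps the single-combination fitter reusable on its own.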

directory

string. Store/find the Model with fit_obj set in this directory.

time_cutoffs

numeric vector. A model-agnostic hyperparameter that changes the response during training. For every value of time_cutoffs, a model is specified on the response modified as follows:

For a Cox response, censor at this value all patients whose event occurred after it, and train the specified model.
For a binary response, binarize the outcome according to whether the event occurred before or after this value.

We tune this hyperparameter automatically: only the best model according to validation is stored as fit_obj, and the chosen time cutoff is reported as its time_cutoff attribute.
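The following sketch shows what a single time cutoff could do to survival data. apply_time_cutoff() is a hypothetical helper, not part of patroklos. In particular, returning NA for patients censored before the cutoff (whose binary status is unknown) is our assumption about how such cases are handled.

```r
# Hypothetical helper, not part of patroklos: apply one time cutoff to
# survival data (time_to_event, event) for a Cox and a binary response.
apply_time_cutoff <- function(time_to_event, event, cutoff) {
  after <- time_to_event > cutoff
  # Cox response: everyone whose time exceeds the cutoff is censored there
  cox <- cbind(
    time_to_event = ifelse(after, cutoff, time_to_event),
    event = ifelse(after, 0, event)
  )
  # Binary response: 1 = event up to the cutoff, 0 = event-free up to the
  # cutoff, NA = censored before the cutoff (assumed: status unknown)
  binary <- ifelse(after, 0, ifelse(event == 1, 1, NA))
  list(cox = cox, binary = binary)
}

res <- apply_time_cutoff(
  time_to_event = c(2, 5, 8, 12),
  event = c(1, 0, 1, 1),
  cutoff = 6
)
res$cox
res$binary  # c(1, NA, 0, 0)
```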

val_error_fun

Function to calculate the error of validated predictions. For its interface, see val_error_fun_prototype().
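As an illustration only, a validation error function for a binary response might look like the sketch below. misclassification_error() is hypothetical and has not been checked against val_error_fun_prototype(); it merely shows the idea that a lower value means a better model.

```r
# Hypothetical error function (NOT checked against val_error_fun_prototype()):
# share of misclassified samples when thresholding predicted probabilities.
misclassification_error <- function(predicted, actual, threshold = 0.5) {
  mean((predicted > threshold) != (actual == 1), na.rm = TRUE)
}

misclassification_error(c(0.9, 0.2, 0.6, 0.4), c(1, 0, 0, 0))  # 0.25
```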

hyperparams

list. Optional arguments passed to fitter, e.g. alpha in case of an elastic net.

include_from_continuous_pheno

vector of strings. The names of the continuous variables in the pheno data (to be) included in the predictor matrix. Default is NULL, which means no continuous pheno variables are or will be included.

include_from_discrete_pheno

vector of strings. The names of the discrete variables in the pheno data (to be) included in the predictor matrix. A discrete variable with n levels will be dichotomized into n-1 binary dummy variables. Default is NULL, which means no discrete pheno variables are or will be included.
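The n-1 dummy encoding can be reproduced in base R with model.matrix() and its default treatment contrasts. This is a sketch of the encoding idea, not the patroklos code; the variable name stage is illustrative, and which level patroklos uses as the reference level is not stated here.

```r
# Toy pheno data; "stage" is an illustrative discrete variable with 3 levels.
pheno <- data.frame(stage = factor(c("I", "II", "III", "II", "I")))

# model.matrix() returns an intercept plus n - 1 dummy columns (the first
# level, "I", is the reference); drop the intercept column.
dummies <- model.matrix(~ stage, data = pheno)[, -1, drop = FALSE]
colnames(dummies)  # "stageII" "stageIII"
```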

include_expr

logical. Whether to include the expression data in the predictor matrix.

combine_n_max_categorical_features

integer. Maximum number of categorical features that may be combined into a single predictor feature.

combined_feature_min_positive_ratio

numeric. Minimum ratio of positive observations in a combined (categorical) feature. Together with combine_n_max_categorical_features, this argument governs which combined categorical features the predictor matrix contains: after imputation, a combination of levels of distinct categorical features is added to the predictor matrix if it involves at most combine_n_max_categorical_features features and if it is (expected to be) present in at least a fraction combined_feature_min_positive_ratio of the samples.
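The selection rule can be sketched in base R as follows. combine_categorical() is hypothetical and not the patroklos implementation; in particular, applying the ratio threshold to single levels as well as to true combinations is our assumption. Each categorical feature is given as a 0/1 dummy matrix, and a combination picks one level from each of up to n_max distinct features.

```r
# Hypothetical sketch, not patroklos code.
# dummy_list: named list with one 0/1 matrix per categorical feature.
combine_categorical <- function(dummy_list, n_max, min_ratio) {
  out <- list()
  for (k in seq_len(min(n_max, length(dummy_list)))) {
    for (feats in combn(names(dummy_list), k, simplify = FALSE)) {
      # every way to pick one level (column) from each chosen feature
      level_grid <- expand.grid(lapply(dummy_list[feats], colnames),
                                stringsAsFactors = FALSE)
      for (i in seq_len(nrow(level_grid))) {
        cols <- unlist(level_grid[i, ])
        # a sample is positive for the combination if positive for all levels
        combined <- Reduce(`*`, Map(function(f, l) dummy_list[[f]][, l],
                                    feats, cols))
        if (mean(combined) >= min_ratio) {
          out[[paste(cols, collapse = "&")]] <- combined
        }
      }
    }
  }
  do.call(cbind, out)
}

sex <- cbind(male = c(1, 1, 1, 1, 0, 0, 0, 0))
stage <- cbind(stageII = c(1, 1, 0, 0, 1, 1, 0, 0),
               stageIII = c(0, 0, 0, 1, 0, 0, 1, 1))
X <- combine_categorical(list(sex = sex, stage = stage), n_max = 2,
                         min_ratio = 0.2)
colnames(X)  # "male" "stageII" "stageIII" "male&stageII"
```

Here "male&stageII" is positive in 2 of 8 samples (0.25) and kept, whereas "male&stageIII" is positive in only 1 of 8 (0.125) and dropped.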

enable_imputation

logical. If FALSE, it overrides the imputer attribute of the Data object and we do not impute.

file

string. Name of the file inside directory in which the Model object is stored. Default is "model.rds".

create_directory

logical. Whether to create directory if it does not exist yet. Default is TRUE.
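A minimal base-R sketch of how directory, file and create_directory plausibly interact when an object is stored. store_in_directory() is hypothetical and not the patroklos implementation; it only relies on dir.create(), saveRDS() and readRDS().

```r
# Hypothetical helper, not patroklos code: store obj as `file` inside
# `directory`, creating the directory first if allowed.
store_in_directory <- function(obj, directory, file = "model.rds",
                               create_directory = TRUE) {
  if (!dir.exists(directory)) {
    if (!create_directory) stop("directory does not exist: ", directory)
    dir.create(directory, recursive = TRUE)
  }
  path <- file.path(directory, file)
  saveRDS(obj, path)
  invisible(path)
}

model_dir <- file.path(tempdir(), "my_model")
path <- store_in_directory(list(fit_obj = "dummy fit"), model_dir)
restored <- readRDS(path)
```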

li_var_suffix

string. Append this to the names of features from the pheno data when adding them to the predictor matrix. Default is "++".
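The sketch below shows one way a predictor matrix could be assembled from expression data and a continuous pheno variable, with li_var_suffix appended to the pheno feature names. The data and the variables include_expr and li_var_suffix are local stand-ins for the Model fields; this is not the patroklos code. Our reading is that the suffix keeps pheno features distinguishable from expression features of the same name.

```r
# Toy data: 3 samples, 2 genes and one continuous pheno variable "age".
expr <- matrix(c(5.1, 3.2, 4.8, 2.9, 6.0, 3.5), nrow = 3,
               dimnames = list(paste0("s", 1:3), c("geneA", "geneB")))
pheno <- data.frame(age = c(61, 54, 70), row.names = paste0("s", 1:3))

include_expr <- TRUE   # stand-in for the include_expr field
li_var_suffix <- "++"  # stand-in for the li_var_suffix field

pheno_x <- as.matrix(pheno)
colnames(pheno_x) <- paste0(colnames(pheno_x), li_var_suffix)
pheno_x <- pheno_x[rownames(expr), , drop = FALSE]  # align samples

x <- if (include_expr) cbind(expr, pheno_x) else pheno_x
colnames(x)  # "geneA" "geneB" "age++"
```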

Returns

A Model R6 object.

Method `fit()`

Fit the model to a data set, validate it and tune hyperparameters.

Usage

Model$fit(data, update_model_shell = FALSE, quiet = FALSE, msg_prefix = "")

Arguments

data: Data object. Read it in if needed.
update_model_shell: logical. If TRUE and we find a stored model with a non-NULL fit_obj, we set the fit_obj attribute of this Model to the stored fit_obj and save the Model. This way, stored Models can be kept up to date with changes to the Model class.
quiet: logical. Whether to suppress messages. Default is FALSE.
msg_prefix: string. Prefix for messages. Default is "".

Returns

The Model object itself with the fit_obj attribute set to the object tuned over time_cutoffs, combine_n_max_categorical_features and hyperparams.

Method `predict()`

Predict for a data set.

Usage

Model$predict(data, quiet = FALSE)

Arguments

data: Data object. Specifications on the data. Read it in if needed.
quiet: logical. Whether to suppress messages. Default is FALSE.

Returns

A named list of length 3:

"predicted": named numeric vector with the predicted response,
"actual": named numeric vector with the actual response,
"cox_mat": named matrix with columns "time_to_event", "event" and "hazard", where hazard is again the predicted response. This matrix is helpful for calculating hazard ratios or logrank p-values.

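As a sketch of how cox_mat can be used, the code below computes a hazard ratio and a logrank p-value with the survival package. The matrix is synthetic and only mimics the documented column names; splitting samples at the median predicted hazard for the logrank test is our choice, not necessarily what patroklos does.

```r
library(survival)

# Synthetic stand-in for the "cox_mat" element returned by predict()
set.seed(1)
n <- 40
cox_mat <- cbind(
  time_to_event = rexp(n, rate = 0.1),
  event = rbinom(n, size = 1, prob = 0.7),
  hazard = rnorm(n)
)
dat <- as.data.frame(cox_mat)

# Hazard ratio per unit increase of the predicted hazard
cox_fit <- coxph(Surv(time_to_event, event) ~ hazard, data = dat)
hazard_ratio <- exp(coef(cox_fit))

# Logrank test: samples above vs. at or below the median prediction
dat$high_risk <- dat$hazard > median(dat$hazard)
lr <- survdiff(Surv(time_to_event, event) ~ high_risk, data = dat)
p_logrank <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
```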
Method `clone()`

The objects of this class are cloneable with this method.

Usage

Model$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
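The difference between a shallow and a deep clone only matters for fields that themselves hold R6 objects (or other environments). The toy classes Inner and Outer below are hypothetical and only illustrate the general R6 behaviour.

```r
library(R6)

Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() self$inner <- Inner$new()
))

original <- Outer$new()
shallow <- original$clone()          # deep = FALSE: shares the Inner object
deep <- original$clone(deep = TRUE)  # deep = TRUE: Inner object is cloned too

shallow$inner$value <- 2
original$inner$value  # 2, original and shallow share 'inner'
deep$inner$value      # 1, the deep clone has its own copy
```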

See also
