A Model specifies what a model looks like, fits and validates it, tunes its hyperparameters, stores it, and predicts from it.

Public fields

name

A telling name for the model.

directory

Store/find the Model with fit_obj set in this directory.

fitter

Fit and validate the model with this fitting function.

time_cutoffs

Threshold and censor the outcome accordingly.

val_error_fun

Calculates the error of the validated predictions.

hyperparams

Optional arguments passed to fitter.

include_from_continuous_pheno

The names of the continuous variables in the pheno data (to be) included in the predictor matrix.

include_from_discrete_pheno

The names of the discrete variables in the pheno data (to be) included in the predictor matrix.

include_expr

Whether to include the expression data in the predictor matrix.

combine_n_max_categorical_features

Maximum number of categorical features to combine.

combined_feature_min_positive_ratio

Minimum ratio of positive observations in a combined (categorical) feature.

enable_imputation

Overrides the imputer attribute of the Data object.

fit_obj

The fitted object as returned by a fitter, e.g. a ptk_zerosum S3 object.

create_directory

Whether to create directory if it does not exist yet.

file

Store this Model object under this name in directory.

li_var_suffix

Append this to the names of features from the pheno data when adding them to the predictor matrix.

Methods


Method new()

Create a new Model instance.

Usage

Model$new(
  name,
  fitter,
  directory,
  time_cutoffs,
  val_error_fun,
  hyperparams = NULL,
  include_from_continuous_pheno = NULL,
  include_from_discrete_pheno = NULL,
  include_expr = TRUE,
  combine_n_max_categorical_features = 1L,
  combined_feature_min_positive_ratio = 0.04,
  enable_imputation = TRUE,
  file = "model.rds",
  create_directory = TRUE,
  li_var_suffix = "++"
)

Arguments

name

string. A telling name for the model.

fitter

function. Fit the model to the data and validate it with this fitting function. See fitter_prototype() for its required interface. patroklos provides two fitters out of the box: ptk_zerosum(), a wrapper around zeroSum::zeroSum(), and ptk_ranger(), a wrapper around ranger::ranger(). To tune more than one combination of hyperparameters, decorate a fitter with multitune().

directory

string. Store/find the Model with fit_obj set in this directory.

time_cutoffs

numeric vector. A model-agnostic hyperparameter that changes the response during training. For every value in time_cutoffs, the model is trained on the following response data:

  • For a Cox response, censor all patients whose event occurred after this value.

  • For a binary response, binarize the outcome depending on whether the event occurred before or after this value.

fit() tunes this hyperparameter, stores only the best model according to validation as fit_obj, and reports the chosen time cutoff in the time_cutoff attribute of fit_obj.
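The effect of a single time cutoff on the response can be illustrated in base R. This is only a sketch on toy data, not the package's internal code. The truncation of follow-up times at the cutoff, the coding of the binary response (1 for an event before the cutoff) and the NA for samples censored before the cutoff are assumptions made for illustration.

```r
# Toy survival data: follow-up time and event indicator (1 = event, 0 = censored).
time_to_event <- c(0.5, 1.2, 2.0, 3.5, 4.0)
event <- c(1, 0, 1, 1, 0)
cutoff <- 2.5 # one value of time_cutoffs

# Cox response: events occurring after the cutoff are treated as censored.
# Assumption: follow-up times are also truncated at the cutoff.
cox_event <- ifelse(time_to_event > cutoff, 0, event)
cox_time <- pmin(time_to_event, cutoff)

# Binary response (assumed coding): 1 if the event occurred before the cutoff,
# 0 if the patient was followed beyond the cutoff, and NA if the patient was
# censored before the cutoff, because the status at the cutoff is unknown.
binary_response <- ifelse(
  time_to_event > cutoff,
  0,
  ifelse(event == 1, 1, NA)
)

print(data.frame(time_to_event, event, cox_time, cox_event, binary_response))
```

Repeating this for every value in time_cutoffs yields one response per cutoff, each of which can be used to train and validate a model.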

val_error_fun

Function to calculate the error of validated predictions. For its interface, see val_error_fun_prototype().

hyperparams

list. Optional arguments passed to fitter, e.g. alpha in case of an elastic net.

include_from_continuous_pheno

vector of strings. The names of the continuous variables in the pheno data (to be) included in the predictor matrix. Default is NULL, which means no continuous pheno variables are or will be included.

include_from_discrete_pheno

vector of strings. The names of the discrete variables in the pheno data (to be) included in the predictor matrix. A discrete variable with n levels will be dichotomized into n-1 binary dummy variables. Default is NULL, which means no discrete pheno variables are or will be included.
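The dichotomization into n-1 dummy variables can be reproduced in base R with model.matrix(). This is a sketch on a hypothetical pheno column named stage. The column names that patroklos actually generates (including li_var_suffix) are not reproduced here.

```r
# Hypothetical pheno data with one discrete variable that has 3 levels.
pheno <- data.frame(
  stage = factor(c("I", "II", "III", "II", "I")),
  row.names = paste0("patient_", 1:5)
)

# Treatment contrasts give one binary column per non-reference level;
# dropping the intercept column leaves n - 1 = 2 dummy variables.
dummies <- model.matrix(~stage, data = pheno)[, -1, drop = FALSE]

print(dummies)
```

The first level ("I") serves as the reference: a sample with zeros in all dummy columns belongs to it.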

include_expr

logical. Whether to include the expression data in the predictor matrix.

combine_n_max_categorical_features

integer. Maximum number of categorical features to combine in predicting features.

combined_feature_min_positive_ratio

numeric. Minimum ratio of positive observations in a combined (categorical) feature. Together with combine_n_max_categorical_features, this attribute governs which combined categorical features the predictor matrix contains: after imputation, a combination of levels of distinct categorical features is added to the predictor matrix if it involves at most combine_n_max_categorical_features features and is (expected to be) present in at least a fraction combined_feature_min_positive_ratio of the samples.
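The positive-ratio criterion can be sketched in base R for one combination of two categorical features. The column names and data are hypothetical. The sketch uses the observed ratio and ignores imputation and the "expected" ratio mentioned above.

```r
# Hypothetical pheno data with two categorical features.
pheno <- data.frame(
  gender = c("f", "m", "f", "f", "m", "m", "f", "m"),
  stage = c("I", "II", "II", "I", "II", "I", "II", "II")
)
min_positive_ratio <- 0.04 # default of combined_feature_min_positive_ratio

# Combined feature: 1 if the sample is female AND in stage II, else 0.
# It involves two features, so it is only a candidate if
# combine_n_max_categorical_features is at least 2.
female_and_stage_2 <- as.integer(pheno$gender == "f" & pheno$stage == "II")

# Fraction of samples in which the combination is present.
positive_ratio <- mean(female_and_stage_2)

# Keep the combined feature only if it is frequent enough.
keep <- positive_ratio >= min_positive_ratio
print(c(positive_ratio = positive_ratio, keep = keep))
```

A higher threshold filters out rare combinations, which would otherwise add nearly constant columns to the predictor matrix.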

enable_imputation

logical. If FALSE, override the imputer attribute of the Data object and do not impute.

file

string. The name of the file inside directory under which the Model object (with fit_obj set) is stored. Default is "model.rds".

create_directory

logical. Whether to create directory if it does not exist yet. Default is TRUE.

li_var_suffix

string. Append this to the names of features from the pheno data when adding them to the predictor matrix. Default is "++".

Returns

A Model R6 object.


Method fit()

Fit the model to a data set, validate it and tune hyperparameters.

Usage

Model$fit(data, update_model_shell = FALSE, quiet = FALSE, msg_prefix = "")

Arguments

data

Data object. Read it in if needed.

update_model_shell

logical. If TRUE and we find a stored model whose fit_obj is not NULL, we set the fit_obj attribute of this model to the found fit_obj and save it. This way, stored Models can be kept up to date with changes in the Model class.

quiet

logical. Whether to suppress messages. Default is FALSE.

msg_prefix

string. Prefix for messages. Default is "".

Returns

The Model object itself with the fit_obj attribute set to the object tuned over time_cutoffs, combine_n_max_categorical_features and hyperparams.


Method predict()

Predict for a data set.

Usage

Model$predict(data, quiet = FALSE)

Arguments

data

Data object. Specifications on the data. Read it in if needed.

quiet

logical. Whether to suppress messages. Default is FALSE.

Returns

A named list of length 3:

  • "predicted": named numeric vector with predicted response,

  • "actual": named numeric vector with actual response,

  • "cox_mat": named matrix with columns "time_to_event", "event", "hazard", where hazard again is the predicted response. This matrix is helpful to calculate hazard ratios or logrank p-values.
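As an illustration of how a matrix of this shape can feed a logrank test, here is a sketch using the survival package. The matrix is built by hand on toy data rather than returned by predict(). The median split into risk groups and the reading of a higher hazard as higher risk are assumptions made for this example.

```r
library(survival)

# Hand-made stand-in for the "cox_mat" element (toy values).
cox_mat <- cbind(
  time_to_event = c(0.5, 1.2, 2.0, 3.5, 4.0, 0.8, 2.7, 5.1),
  event = c(1, 1, 1, 0, 0, 1, 1, 0),
  hazard = c(2.1, 1.7, 0.9, -0.5, -1.2, 1.5, -0.3, -0.8)
)
rownames(cox_mat) <- paste0("patient_", 1:8)

# Hypothetical grouping rule: split at the median predicted hazard.
high_risk <- cox_mat[, "hazard"] > median(cox_mat[, "hazard"])

# Logrank test comparing survival between the two risk groups.
fit <- survdiff(
  Surv(cox_mat[, "time_to_event"], cox_mat[, "event"]) ~ high_risk
)

# With two groups, the test statistic has one degree of freedom.
p_value <- pchisq(fit$chisq, df = 1, lower.tail = FALSE)
print(p_value)
```

In practice the grouping threshold would come from the model or the analysis plan rather than from a median split on the test data.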


Method clone()

The objects of this class are cloneable with this method.

Usage

Model$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.