A new equivalence-based metric for predictive checks to qualify mixed-effects models

Abstract
The main objective of any modeling exercise is to provide a rationale for effective decision making during drug development. The aim of the current simulation experiment was to evaluate the properties of predictive check as a covariate model qualification technique and, more importantly, to introduce and evaluate alternative criteria to qualify models. Original concentration-time profiles (yod) were simulated using a 1-compartment model for an intravenous drug administered to 25 men and 25 women. The typical clearance for male subjects (TVCLm) was assumed to be 5-fold higher than that for female subjects (TVCLf). Fifty such trials under the same design were generated randomly. Predictive check was used as the model qualification tool to study predictive performance of true (males ≠ females) and false (males=females) models in the context of maximum likelihood estimation. For each yod, 200 replications were generated to study the properties of a discrepancy variable, a statistic that depends on the model, and a test statistic, a statistic that does not depend on the model. Several qualification criteria were evaluated in assessing predictive performance, such as, predictive p-value (Pp), probability of equivalence (peqv), and probability of rejecting the null hypothesis (data=model) using the Kolmogorov-Smirnov test (pks). The Pp value was calculated using sum of squared errors as a discrepancy variable. For both of the models, the Pp values uniformly ranged between 0 and 1. The pattern of Pp values suggests that qualification of the false model is unlikely. For both of the models, the range of peqv is about 0.95 to 1.0 for concentration at 0.5 hours. However, this is not the case for the concentration at 4 hours, which is primarily dependent on the clearance. The false model (0.35 to 0.50) has poor predictive performance compared with the true model (0.65 to 0.80) using peqv. 
The pks criterion suggested no difference between the distributions of replicated and original concentrations at any of the time points for either model. Discrepancy variables cannot aid in rejecting false models, whereas a test statistic can; however, selecting an informative test statistic is challenging. As far as the qualification criteria are concerned, an equivalence-based comparison of a test statistic is more informative than a significance-based comparison. No convincing evidence exists in the literature demonstrating added advantages of the predictive check as a routine model qualification tool over existing tools such as diagnostic plots or mechanistic reasoning. However, when a model is to be used for designing a trial, it should at least be able to regenerate the data used to build it. In such cases, the predictive check might offer insight into potential inconsistencies.
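A minimal sketch of the pks computation is shown below, using the two-sample Kolmogorov-Smirnov statistic and its asymptotic 5% critical value. Here both the "original" and the 200 replicated samples are drawn from the same hypothetical log-normal concentration distribution (parameters assumed for illustration), so the rejection rate should stay near the nominal level, mirroring the abstract's finding that pks detects no difference even when it should.

```python
import bisect
import math
import random

random.seed(2)

def ks_statistic(a, b):
    """Two-sample KS statistic: sup over x of |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        d = max(d, abs(bisect.bisect_right(a, x) / len(a)
                       - bisect.bisect_right(b, x) / len(b)))
    return d

# "Original" sample: 50 concentrations at one time point (log-normal; assumed)
y_od = [math.exp(random.gauss(1.0, 0.4)) for _ in range(50)]

# Asymptotic 5% critical value for sample sizes n = m = 50
n = m = 50
d_crit = 1.358 * math.sqrt((n + m) / (n * m))

# pks: fraction of replicated samples whose KS statistic exceeds the
# critical value, i.e., for which the null (data = model) is rejected
rejections = 0
for _ in range(200):
    y_rep = [math.exp(random.gauss(1.0, 0.4)) for _ in range(50)]  # same model
    if ks_statistic(y_od, y_rep) > d_crit:
        rejections += 1
print("p_ks =", rejections / 200)
```

In practice the replicated samples would come from the fitted candidate model rather than the data-generating distribution; a low pks for a false model is exactly the insensitivity the abstract reports.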