Results in Journal Statistical Methods in Medical Research: 2,559

Chao Cheng, Abigail Sloan
Statistical Methods in Medical Research; doi:10.1177/09622802211025992

Abstract:
By combining data across multiple studies, researchers increase sample size, statistical power, and precision for pooled analyses of biomarker–disease associations. However, researchers must adjust for between-study variability in biomarker measurements. Previous research often treats the biomarker measurements from a reference laboratory as a gold standard, even though those measurements are certainly not equal to their true values. This paper addresses measurement error and bias arising from both the reference and study-specific laboratories. We develop two calibration methods, the exact calibration method and approximate calibration method, for pooling biomarker data drawn from nested or matched case–control studies, where the calibration subset is obtained by randomly selecting controls from each contributing study. Simulation studies are conducted to evaluate the empirical performance of the proposed methods. We apply the proposed methods to a pooling project of nested case–control studies to evaluate the association between circulating 25-hydroxyvitamin D (25(OH)D) and colorectal cancer risk.
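A rough, generic illustration of the calibration-subset idea described above, under simplifying assumptions (a single contributing study, a linear relation between laboratories, and no error in the reference laboratory, which is exactly the assumption the paper relaxes); all variable names and numbers are illustrative, not the paper's exact or approximate calibration estimators:

```python
# Hedged sketch of a generic calibration step for pooling biomarker data.
import numpy as np

rng = np.random.default_rng(42)
true_x = rng.normal(50, 15, size=200)                  # unobserved true biomarker
ref_lab = true_x + rng.normal(0, 3, size=200)          # reference-lab measurement
local_lab = 5 + 0.8 * true_x + rng.normal(0, 5, 200)   # study-specific measurement

# Calibration subset: randomly selected controls re-assayed at the reference lab.
subset = rng.choice(200, size=60, replace=False)
slope, intercept = np.polyfit(local_lab[subset], ref_lab[subset], deg=1)

# Map all study-specific measurements onto the reference-lab scale.
calibrated = intercept + slope * local_lab
print(round(intercept, 2), round(slope, 2))
```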
Gary S Berger
Statistical Methods in Medical Research; doi:10.1177/09622802211023543

Abstract:
We conducted this study to determine whether fallopian tube anatomy can predict the likelihood of pregnancy and pregnancy outcomes after tubal sterilization reversal. We built a flexible, non-parametric, multivariate model via generalized additive models to assess the effects of the following tubal parameters observed during tubal reparative surgery: tubal lengths; differences in tubal segment location and diameters at the anastomosis sites; and fibrosis of the tubal muscularis. In this study population, age and tubal length (in that order) were the primary factors predicting the likelihood of pregnancy. For pregnancy outcomes, tubal length was the most influential predictor of birth and ectopic pregnancy, while age was the primary predictor of miscarriage. Segment location and diameters contributed slightly to the odds of miscarriage and ectopic pregnancy. Tubal muscularis fibrosis had little apparent effect. This study is the first to show that a statistical learning predictive model based on fallopian tube anatomy can predict pregnancy and pregnancy outcome probabilities after tubal reversal surgery.
Abdullah Qayed, Dong Han
Statistical Methods in Medical Research; doi:10.1177/09622802211009257

Abstract:
By collecting multiple sets per subject in microarray data, gene set analysis requires characterizing intra-subject variation using gene expression profiling. For each subject, the data can be written as a matrix with the different subsets of gene expressions (e.g. multiple tumor types) indexing the rows and the genes indexing the columns. To test the assumption of intra-subject (tumor) variation, we present and perform tests of multi-set sphericity and multi-set identity of covariance structures across subjects (tumor types). We demonstrate through both theoretical and empirical studies that the tests have good properties. We applied the proposed tests to The Cancer Genome Atlas (TCGA) and tested covariance structures for the gene expressions across several tumor types.
Mingyue Du, Hui Zhao, Jianguo Sun
Statistical Methods in Medical Research; doi:10.1177/09622802211009259

Abstract:
Cox’s proportional hazards model is the most commonly used model for regression analysis of failure time data, and some methods have been developed for its variable selection under different situations. In this paper, we consider a general type of failure time data, case K interval-censored data, that includes the other types discussed as special cases, and propose a unified penalized variable selection procedure. In addition to its generality, another significant feature of the proposed approach is that, unlike the existing variable selection methods for failure time data, it allows dependent censoring, which can occur quite often and could lead to biased or misleading conclusions if not taken into account. For the implementation, a coordinate descent algorithm is developed and the oracle property of the proposed method is established. The numerical studies indicate that the proposed approach works well for practical situations, and it is applied to a set of real data arising from the Alzheimer’s Disease Neuroimaging Initiative study that motivated this work.
Deo Kumar Srivastava, E Olusegun George, Zhaohua Lu
Statistical Methods in Medical Research; doi:10.1177/09622802211017592

Abstract:
Clinical trials with survival endpoints are typically designed to enroll patients for a specified number of years (usually 2–3 years), with another specified duration of follow-up (usually 2–3 years). Under this scheme, patients who are alive or free of the event of interest at the termination of the study are censored. Consequently, a patient may be censored due to insufficient follow-up duration or due to being lost to follow-up. Potentially, this process could lead to unequal censoring in the treatment arms and to inaccurate and adverse conclusions about treatment effects. In this article, using extensive simulation studies, we assess the impact of such censoring on statistical procedures (the generalized logrank tests) for comparing two survival distributions and illustrate our observations by revisiting Mukherjee et al.’s findings of cardiovascular events in patients who took Rofecoxib (Vioxx).
Matteo Bottai, Andrea Discacciati, Giola Santoni
Statistical Methods in Medical Research; doi:10.1177/09622802211022403

Abstract:
This paper introduces the event-probability function, a measure of occurrence of an event of interest over time, defined as the instantaneous probability of an event at a given time point conditional on having survived until that point. Unlike the hazard function, the event-probability function is a proper probability. This paper describes properties and interpretation of the event-probability function, presents its connection with other popular functions, such as the hazard and survival functions, proposes practical flexible proportional-odds models for estimating conditional event-probabilities given covariates with possibly censored and truncated observations, discusses the theoretical and computational aspects of parameter estimation, and applies the proposed models for assessing mortality in patients with metastatic renal carcinoma from a randomized clinical trial.
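As a hedged sketch of how such a quantity can be formalized when events occur on a discrete grid of time points (the paper's continuous-time definition and its proportional-odds parameterization may differ), the event-probability at time t and its link to the survival function can be written as

```latex
p(t) \;=\; \Pr(T = t \mid T \ge t) \;=\; \frac{\Pr(T = t)}{S(t^{-})},
\qquad
S(t) \;=\; \prod_{u \le t} \bigl\{ 1 - p(u) \bigr\},
```

so that, unlike the hazard rate, p(t) always lies between 0 and 1.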
Hae Jung, Saundra Buys, Mary Daly, Esther M John, John Hopper, Irene Andrulis, Mary Beth Terry
Statistical Methods in Medical Research; doi:10.1177/09622802211008945

Abstract:
Mammographic screening and prophylactic surgery such as risk-reducing salpingo-oophorectomy can potentially reduce breast cancer risks among mutation carriers of BRCA families. The evaluation of these interventions is usually complicated by the fact that their effects on breast cancer may change over time and by the presence of competing risks. We introduce a correlated competing risks model to model breast and ovarian cancer risks within BRCA1 families that accounts for time-varying covariates. Different parametric forms for the effects of time-varying covariates are proposed for more flexibility, and a correlated gamma frailty model is specified to account for the correlated competing events. We also introduce a new ascertainment correction approach that accounts for the selection of families through probands affected with either breast or ovarian cancer, or unaffected. Our simulation studies demonstrate the good performance of our proposed approach in terms of bias and precision of the estimators of model parameters and cause-specific penetrances over different levels of familial correlation. We applied our new approach to 498 BRCA1 mutation carrier families recruited through the Breast Cancer Family Registry. Our results demonstrate the importance of the functional form of the time-varying covariate effect when assessing the role of risk-reducing salpingo-oophorectomy on breast cancer. In particular, under the best fitting time-varying covariate model, the overall effect of risk-reducing salpingo-oophorectomy on breast cancer risk was statistically significant in women with a BRCA1 mutation.
Marie Alexandre, Mélanie Prague, Rodolphe Thiébaut
Published: 4 July 2021
Statistical Methods in Medical Research; doi:10.1177/09622802211023963

The publisher has not yet granted permission to display this abstract.
Hongtu Zhu, Mihye Ahn
Published: 4 July 2021
Statistical Methods in Medical Research; doi:10.1177/09622802211012015

The publisher has not yet granted permission to display this abstract.
Ariane M Mbekwe Yepnang, Sandra M Eldridge, Bruno Giraudeau
Published: 4 July 2021
Statistical Methods in Medical Research; doi:10.1177/09622802211026004

The publisher has not yet granted permission to display this abstract.
Amirhossein Alvandi, Armin Hatefi
Published: 4 July 2021
Statistical Methods in Medical Research; doi:10.1177/09622802211025989

The publisher has not yet granted permission to display this abstract.
Lianming Wang
Statistical Methods in Medical Research; doi:10.1177/09622802211023985

Abstract:
Failure time data with a cured subgroup are frequently encountered in various scientific fields, and many methods have been proposed for their analysis under right or interval censoring. However, a cure model approach does not seem to exist in the analysis of partly interval-censored data, which consist of both exactly observed and interval-censored observations on the failure time of interest. In this article, we propose a two-component mixture cure model approach for analyzing this type of data. We employ a logistic model to describe the cured probability and a proportional hazards model to model the latent failure time distribution for uncured subjects. We consider maximum likelihood estimation and develop a new expectation-maximization algorithm for its implementation. The asymptotic properties of the resulting estimators are established and the finite sample performance of the proposed method is examined through simulation studies. An application to a set of real data on childhood mortality in Nigeria is provided.
Shintaro Yamamuro, Satoshi Iimuro, Yutaka Matsuyama
Statistical Methods in Medical Research; doi:10.1177/09622802211025988

Abstract:
Modern causal mediation theory has formalized several types of indirect and direct effects of treatment on outcomes regarding specific mediator variables. We reviewed and unified distinct approaches to estimate the “interventional” direct and indirect effects for multiple mediators and time-varying variables. This study was motivated by a clinical trial of elderly type-2 diabetic patients in which atorvastatin was widely prescribed to control patients’ cholesterol levels to reduce diabetic complications, including cardiovascular disease. Among atorvastatin’s preventive side-effects (pleiotropic effects), we focus on its anti-inflammatory action as measured by white blood cell counts. Hence, we estimate atorvastatin’s interventional indirect effects through cholesterol lowering and through anti-inflammatory action, and its interventional direct effect bypassing these two actions. In our analysis, the total effect (six-year cardiovascular disease risk difference) estimated by the standard plug-in g-formula, −3.65% (95% confidence interval: −10.29%, 4.38%), is decomposed by the proposed parametric mediational g-formula into an indirect effect via low-density lipoprotein cholesterol (−0.90% [−1.91%, −0.07%]), an indirect effect via white blood cell counts (−0.03% [−0.22%, 0.11%]), and a direct effect (−2.84% [−9.71%, 5.41%]). The SAS program and its evaluation via simulated datasets are provided in the Supplemental materials.
Wei Wei, Michael Kane, Daniel Zelterman
Statistical Methods in Medical Research; doi:10.1177/09622802211013062

Abstract:
Adaptive designs are gaining popularity in early phase clinical trials because they enable investigators to change the course of a study in response to accumulating data. We propose a novel design to simultaneously monitor several endpoints, including efficacy, futility, toxicity and other outcomes, in early phase, single-arm studies. We construct a recursive relationship to compute the exact probabilities of stopping for any combination of endpoints without the need for simulation, given pre-specified decision rules. The proposed design is flexible in the number and timing of interim analyses. An R Shiny app with a user-friendly web interface has been created to facilitate the implementation of the proposed design.
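A hedged sketch of the kind of exact recursion involved, reduced to a single-arm binary endpoint with futility stopping only (the paper's recursion jointly handles several endpoints and both efficacy and futility rules); the design parameters below are made up for illustration:

```python
# Exact stopping probabilities for a single binary endpoint with futility bounds.
from math import comb

def exact_stopping_probs(p, looks, futility_bounds):
    """p: true response probability; looks: cumulative sample sizes at each analysis;
    futility_bounds: stop at look k if total responses <= futility_bounds[k]."""
    reach = {0: 1.0}          # P(responses so far = k and trial not yet stopped)
    stop_at_look, n_prev = [], 0
    for n_k, b_k in zip(looks, futility_bounds):
        m = n_k - n_prev      # new patients enrolled since the previous look
        nxt = {}
        for k, pr in reach.items():
            for x in range(m + 1):  # binomial update for the new cohort
                nxt[k + x] = nxt.get(k + x, 0.0) + pr * comb(m, x) * p**x * (1 - p)**(m - x)
        stop_at_look.append(sum(pr for k, pr in nxt.items() if k <= b_k))
        reach = {k: pr for k, pr in nxt.items() if k > b_k}
        n_prev = n_k
    return stop_at_look, sum(reach.values())   # per-look stop probs, P(completing trial)

# Example: looks at n = 10, 20, 30; stop if at most 1, 4, 8 responses observed.
print(exact_stopping_probs(0.3, [10, 20, 30], [1, 4, 8]))
```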
Jianwen Cai, Jason P Fine, Elisabeth P Dellon, Charles R Esther
Statistical Methods in Medical Research; doi:10.1177/09622802211023975

Abstract:
Proportional rates models are frequently used for the analysis of recurrent event data with multiple event categories. When some of the event categories are missing, a conventional approach is to either exclude the missing data for a complete-case analysis or employ a parametric model for the missing event type. It is well known that the complete-case analysis is inconsistent when the missingness depends on covariates, and the parametric approach may incur bias when the model is misspecified. In this paper, we aim to provide a more robust approach using a rate proportion method for the imputation of missing event types. We show that the log-odds of the event type can be written as a semiparametric generalized linear model, facilitating a theoretically justified estimation framework. Comprehensive simulation studies were conducted demonstrating the improved performance of the semiparametric method over parametric procedures. Multiple types of Pseudomonas aeruginosa infections of young cystic fibrosis patients were analyzed to demonstrate the feasibility of our proposed approach.
Zhen Meng, Qinglong Yang, Qizhai Li, Baoxue Zhang
Statistical Methods in Medical Research; doi:10.1177/09622802211002864

Abstract:
For the nonparametric Behrens-Fisher problem, a directional-sum test is proposed based on a division-combination strategy. A one-layer wild bootstrap procedure is given to calculate its statistical significance. We conduct simulation studies with data generated from lognormal, t and Laplace distributions to show that the proposed test controls the type I error rate properly and is more powerful than the existing rank-sum and maximum-type tests under most of the considered scenarios. An application to the dietary intervention trial further demonstrates the performance of the proposed test.
Jijia Wang, Jing Cao, Chul Ahn
Statistical Methods in Medical Research; doi:10.1177/09622802211022392

Abstract:
The stepped-wedge cluster randomized design has been increasingly employed in pragmatic trials in health services research. In this study, based on the GEE approach, we present a closed-form sample size calculation that is applicable to both closed-cohort and cross-sectional stepped wedge trials. Importantly, the proposed method is flexible enough to accommodate design issues routinely encountered in pragmatic trials, such as different within- and between-subject correlation structures, irregular crossover schedules for the switch to intervention, and missing data due to repeated measurements over prolonged follow-up. The closed-form formulas allow researchers to analytically assess the impact of different design factors on the sample size requirement. We also recognize the potential issue of limited numbers of clusters in pragmatic stepped wedge trials and present an adjustment approach for the underestimated variance of the treatment effect. We conduct extensive simulations to assess the performance of the proposed sample size method. An application to a real clinical trial is presented.
Statistical Methods in Medical Research; doi:10.1177/09622802211022385

Abstract:
Meta-analyses of clinical trials targeting rare events face particular challenges when the data lack an adequate number of events and are susceptible to high levels of heterogeneity. The standard meta-analysis methods (DerSimonian-Laird (DL) and Mantel–Haenszel (MH)) often lead to serious distortions because of such data sparsity. Applications of the methods suited to specific incidence and heterogeneity characteristics are lacking; thus, we compared nine available methods in a simulation study. We generated 360 meta-analysis scenarios, each considering different incidences, sample sizes, between-study variance (heterogeneity) and treatment allocation. We included globally recommended methods such as inverse-variance fixed/random-effects (IV-FE/RE), classical MH, MH-FE, MH-DL, Peto, Peto-DL and the two bootstrapped-DL (bDL) extensions, MH-bDL and Peto-bDL. Performance was assessed on mean bias, mean error, coverage and power. In the absence of heterogeneity, coverage and power taken together revealed small differences between methods in meta-analyses involving rare and very rare events. The Peto-bDL method performed best, but only in smaller sample sizes involving rare events. For medium-to-larger sample sizes, MH-bDL was preferred. For meta-analyses involving very rare events, Peto-bDL was the best performing method, and this held across all sample sizes. However, in meta-analyses with 20% or more heterogeneity, the coverage and power were insufficient. Performance based on mean bias and mean error was almost identical across methods. To conclude, in meta-analyses of rare binary outcomes, our results suggest that Peto-bDL is better in both rare and very rare event settings when sample sizes are limited. However, when heterogeneity is large, the coverage and power to detect rare events are insufficient. Whilst this study shows that some of the less studied methods appear to have good properties under sparse-data scenarios, further work is needed to assess them against more complex distributional-based methods to understand their overall performance.
Chun Yin Lee
Statistical Methods in Medical Research; doi:10.1177/09622802211022377

Abstract:
The area under the receiver operating characteristic curve (AUC) is one of the most popular measures for evaluating the performance of a predictive model. In nested models, the change in AUC (ΔAUC) can be a discriminatory measure of whether the newly added predictors provide significant improvement in terms of predictive accuracy. Recently, several authors have shown rigorously that ΔAUC can be degenerate and its asymptotic distribution is no longer normal when the reduced model is true, but it could be the distribution of a linear combination of some [Formula: see text] random variables [1,2]. Hence, the normality assumption and existing variance estimate cannot be applied directly for developing a statistical test under the nested models. In this paper, we first provide a brief review of the use of ΔAUC for comparing nested logistic models and the difficulty of retrieving the reference distribution behind it. Then, we present a special case of the nested logistic regression models in which the predictor newly added to the reduced model contains a change-point in its effect. A new test statistic based on ΔAUC is proposed in this setting. A simple resampling scheme is proposed to approximate the critical values for the test statistic. The inference of the change-point parameter is done via the m-out-of-n bootstrap. Large-scale simulation is conducted to evaluate the finite-sample performance of the ΔAUC test for the change-point model. The proposed method is applied to two real-life datasets for illustration.
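For orientation, a minimal sketch of computing ΔAUC for nested logistic models on simulated data (assuming scikit-learn is available; this gives only the point estimate and is not the authors' change-point test or their resampling scheme, and naive normal-theory inference on this quantity is precisely what the paper cautions against):

```python
# Delta-AUC between a reduced and a full logistic model on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
# Outcome depends on x1 and on x2 only above a change-point at 0.3.
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x1 + 0.8 * np.maximum(x2 - 0.3, 0)))))

reduced = LogisticRegression().fit(x1.reshape(-1, 1), y)
full = LogisticRegression().fit(np.column_stack([x1, x2]), y)

auc_reduced = roc_auc_score(y, reduced.predict_proba(x1.reshape(-1, 1))[:, 1])
auc_full = roc_auc_score(y, full.predict_proba(np.column_stack([x1, x2]))[:, 1])
print("delta AUC:", auc_full - auc_reduced)
```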
David C Hoaglin, İlyas Bakbergenuly
Published: 10 June 2021
Statistical Methods in Medical Research; doi:10.1177/09622802211013065

Abstract:
Contemporary statistical publications rely on simulation to evaluate performance of new methods and compare them with established methods. In the context of random-effects meta-analysis of log-odds-ratios, we investigate how choices in generating data affect such conclusions. The choices we study include the overall log-odds-ratio, the distribution of probabilities in the control arm, and the distribution of study-level sample sizes. We retain the customary normal distribution of study-level effects. To examine the impact of the components of simulations, we assess the performance of the best available inverse–variance–weighted two-stage method, a two-stage method with constant sample-size-based weights, and two generalized linear mixed models. The results show no important differences between fixed and random sample sizes. In contrast, we found differences among data-generation models in estimation of heterogeneity variance and overall log-odds-ratio. This sensitivity to design poses challenges for use of simulation in choosing methods of meta-analysis.
Mitsunori Ogawa, Yasuhiro Hagiwara, Yutaka Matsuyama
Statistical Methods in Medical Research; doi:10.1177/09622802211011197

Abstract:
In clinical and epidemiological studies using survival analysis, some explanatory variables are often missing. When this occurs, multiple imputation (MI) is frequently used in practice. In many cases, simple parametric imputation models are routinely adopted without checking the validity of the model specification. Misspecified imputation models can cause biased parameter estimates. In this study, we describe novel frequentist type MI procedures for survival analysis using proportional and additive hazards models. The procedures are based on non-parametric estimation techniques and do not require the correct specification of parametric imputation models. For continuous missing covariates, we first sample imputation values from a parametric imputation model. Then, we obtain estimates by solving the estimating equation modified by non-parametrically estimated conditional densities. For categorical missing covariates, we directly sample imputation values from a non-parametrically estimated conditional distribution and then obtain estimates by solving the corresponding estimating equation. We evaluate the performance of the proposed procedures using simulation studies: one uses simulated data; another uses data informed by parameters generated from a real-world medical claims database. We also applied the procedures to a pharmacoepidemiological study that examined the effect of antihyperlipidemics on hyperglycemia incidence.
K Edgar, D Jackson, K Rhodes, T Duffy, C-F Burman
Statistical Methods in Medical Research; doi:10.1177/09622802211017574

Abstract:
Background: The number of Phase III trials that include a biomarker in design and analysis has increased due to interest in personalised medicine. For genetic mutations and other predictive biomarkers, the trial sample comprises two subgroups, one of which, say [Formula: see text], is known or suspected to achieve a larger treatment effect than the other, [Formula: see text]. Despite treatment effect heterogeneity, trials often draw patients from both subgroups, since the lower responding [Formula: see text] subgroup may also gain benefit from the intervention. In this case, regulators/commissioners must decide what constitutes sufficient evidence to approve the drug in the [Formula: see text] population. Methods and Results: Assuming trial analysis can be completed using generalised linear models, we define and evaluate three frequentist decision rules for approval. For rule one, the significance of the average treatment effect in [Formula: see text] should exceed a pre-defined minimum value, say [Formula: see text]. For rule two, the data from the low-responding group [Formula: see text] should increase statistical significance. For rule three, the subgroup-treatment interaction should be non-significant, using a type I error chosen to ensure that the estimated difference between the two subgroup effects is acceptable. Rules are evaluated based on conditional power, given that there is an overall significant treatment effect. We show how different rules perform according to the distribution of patients across the two subgroups and when analyses include additional (stratification) covariates, thereby conferring correlation between subgroup effects. Conclusions: When additional conditions are required for approval of a new treatment in a lower response subgroup, easily applied rules based on minimum effect sizes and relaxed interaction tests are available. Choice of rule is influenced by the proportion of patients sampled from the two subgroups but less so by the correlation between subgroup effects.
Jan P Burgard, Ralf Münnich, Domingo Morales
Statistical Methods in Medical Research; doi:10.1177/09622802211017583

Abstract:
Obesity is considered to be one of the primary health risks in modern industrialized societies. Estimating the evolution of its prevalence over time is an essential element of public health reporting. This requires the application of suitable statistical methods on epidemiologic data with substantial local detail. Generalized linear-mixed models with medical treatment records as covariates mark a powerful combination for this purpose. However, the task is methodologically challenging. Disease frequencies are subject to both regional and temporal heterogeneity. Medical treatment records often show strong internal correlation due to diagnosis-related grouping. This frequently causes excessive variance in model parameter estimation due to rank-deficiency problems. Further, generalized linear-mixed models are often estimated via approximate inference methods as their likelihood functions do not have closed forms. These problems combined lead to unacceptable uncertainty in prevalence estimates over time. We propose an l2-penalized temporal logit-mixed model to solve these issues. We derive empirical best predictors and present a parametric bootstrap to estimate their mean-squared errors. A novel penalized maximum approximate likelihood algorithm for model parameter estimation is stated. With this new methodology, the regional obesity prevalence in Germany from 2009 to 2012 is estimated. We find that the national prevalence ranges between 15 and 16%, with significant regional clustering in eastern Germany.
Simón Ramírez, Adolfo J Quiroz
Statistical Methods in Medical Research; doi:10.1177/09622802211009258

Abstract:
There is a well-established tradition within the statistics literature that explores different techniques for reducing the dimensionality of large feature spaces. The problem is central to machine learning and has been largely explored under the unsupervised learning paradigm. We introduce a supervised clustering methodology that capitalizes on a Metropolis-Hastings algorithm to optimize the partition structure of a large categorical feature space, tailored towards minimizing the test error of a learning algorithm. This is a general methodology that can be applied to any supervised learning problem with a large categorical feature space. We show the benefits of the algorithm by applying this methodology to the problem of risk adjustment in competitive health insurance markets. We use a large claims data set that records ICD-10 codes, a large categorical feature space. We aim at improving risk adjustment by clustering diagnostic codes into risk groups suitable for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from a representative sample of twenty-three million citizens in the Colombian healthcare system. Our results outperform common alternatives and suggest that the methodology has the potential to improve risk adjustment.
Statistical Methods in Medical Research; doi:10.1177/09622802211017299

Abstract:
Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, Sure Independence Screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in particular in small samples this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. Finally, we illustrate their use in a study on the regulation of lipid metabolism.
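As a point of comparison, a minimal sketch of the plain (non-robust) SIS step that DPD-SIS robustifies: rank features by absolute marginal correlation with the response and retain the top d. The DPD-based variant in the paper replaces this correlation ranking with a density-power-divergence-based marginal fit; the data below are synthetic.

```python
# Plain Sure Independence Screening: keep the d most marginally correlated features.
import numpy as np

def sis_screen(X, y, d):
    """Return indices of the d features with largest |marginal correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(corr))[:d]

# Example: p = 2000 features, n = 100 observations, 3 truly active features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(size=100)
print(sis_screen(X, y, d=10))
```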
Fabio Paredes, Claudio Vargas, Catterina Ferreccio
Statistical Methods in Medical Research; doi:10.1177/09622802211013830

Abstract:
A key hypothesis in epidemiological studies is that time to disease exposure provides relevant information to be considered in statistical models. However, the initiation time of a particular condition is usually unknown. Therefore, we developed a multiple imputation methodology for the age at onset of a particular condition, which is supported by incidence data from different sources of information. We introduced and illustrated this methodology using simulated data in order to examine the performance of our proposal. We then analyzed the association between gallstones and fatty liver disease in the Maule Cohort, a Chilean study of chronic diseases, using participants’ risk factors and six sources of information for the imputation of the age at occurrence of gallstones. The simulation studies showed that an increase in the proportion of imputed data does not affect the quality of the estimated coefficients associated with fully observed variables, while the imputed variable slowly reduces its effect. For the Chilean study, the categorized exposure time to gallstones is a significant variable: participants with short and long exposure have, respectively, a 26.2% and 29.1% higher chance of developing fatty liver disease than non-exposed participants. In conclusion, our multiple imputation approach proved to be quite robust both in the linear/logistic regression simulation studies and in the real application, showing the great potential of this methodology.
Hormatollah Pourreza, Einolah Deiri
Statistical Methods in Medical Research; doi:10.1177/09622802211009262

Abstract:
In this paper, we concentrate on the statistical properties of the Gamma-X family of distributions. A special case of this family is the Gamma-Weibull distribution. Therefore, statistical properties of the Gamma-Weibull distribution as a sub-model of the Gamma-X family are discussed, such as the moments, variance, skewness, kurtosis and Rényi entropy. Also, the parameters of the Gamma-Weibull distribution are estimated by the method of maximum likelihood. Some sub-models of the Gamma-X family are investigated, including their cumulative distribution, probability density, survival and hazard functions. A Monte Carlo simulation study is conducted to assess the performance of these estimators. Finally, the adequacy of the Gamma-Weibull distribution for data modeling is verified using two real clinical data sets. Mathematics Subject Classification: 62E99; 62E15
You-Gan Wang
Statistical Methods in Medical Research; doi:10.1177/09622802211012012

Abstract:
In robust regression, it is usually assumed that the distribution of the error term is symmetric or the data are symmetrically contaminated by outliers. However, this assumption is usually not satisfied in practical problems, and thus if the traditional robust methods, such as Tukey’s biweight and Huber’s method, are used to estimate the regression parameters, the efficiency of the parameter estimation can be lost. In this paper, we construct an asymmetric Tukey’s biweight loss function with two tuning parameters and propose a data-driven method to find the most appropriate tuning parameters. Furthermore, we provide an adaptive algorithm to obtain robust and efficient parameter estimates. Our extensive simulation studies suggest that the proposed method performs better than the symmetric methods when error terms follow an asymmetric distribution or are asymmetrically contaminated. Finally, a cardiovascular risk factors dataset is analyzed to illustrate the proposed method.
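One plausible form of such an asymmetric loss, shown for illustration only (the authors' exact parameterization and their data-driven choice of the two tuning constants may differ), applies a separate Tukey biweight tuning constant to negative and positive residuals:

```python
# Hypothetical asymmetric Tukey biweight loss with two tuning constants.
import numpy as np

def tukey_rho(u, c):
    """Standard Tukey biweight loss with tuning constant c."""
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= c
    out = np.full_like(u, c**2 / 6.0)                 # constant loss beyond c
    out[inside] = (c**2 / 6.0) * (1.0 - (1.0 - (u[inside] / c) ** 2) ** 3)
    return out

def asymmetric_tukey_rho(u, c_neg, c_pos):
    """Apply different tuning constants to negative and positive residuals."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 0, tukey_rho(u, c_neg), tukey_rho(u, c_pos))

residuals = np.linspace(-6, 6, 7)
print(asymmetric_tukey_rho(residuals, c_neg=2.5, c_pos=4.685))
```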
Jagbir Gill, Clifford Miles, Troy Plumb
Statistical Methods in Medical Research; doi:10.1177/09622802211009265

Abstract:
This functional joint model paper is motivated by a chronic kidney disease study post kidney transplantation. The available kidney organ is a scarce resource because millions of end-stage renal patients are on the waiting list for kidney transplantation. The life of the transplanted kidney can be extended if the progression of the chronic kidney disease stage can be slowed, and so a major research question is how to extend the transplanted kidney life to maximize the usage of the scarce organ resource. The glomerular filtration rate is the best test to monitor the progression of kidney function, and it is a continuous longitudinal outcome with repeated measures. The patient’s survival status is characterized by time-to-event outcomes including kidney transplant failure, death with kidney function, and death without kidney function. Few studies have been carried out to simultaneously investigate these multiple clinical outcomes in chronic kidney disease patients based on a joint model. Therefore, this paper proposes a new functional joint model for this clinical chronic kidney disease study. The proposed joint model includes a longitudinal sub-model with a flexible basis function for subject-level trajectories and a competing-risks sub-model for multiple time-to-event outcomes. Different association structures can be accomplished through a time-dependent function of shared random effects from the longitudinal process or the whole longitudinal history in the competing-risks sub-model. The proposed joint model that utilizes a basis function and a competing-risks sub-model is an extension of the standard linear joint models. The application results from the proposed joint model can supply useful clinical references for chronic kidney disease studies post kidney transplantation.
Leyla Azarang, Roch Giorgi, the CENSUR working survival group
Statistical Methods in Medical Research, Volume 30, pp 1538-1553; doi:10.1177/09622802211003608

Abstract:
Recently, there has been a lot of development in the relative survival field. In the absence of data on the cause of death, the research has tended to focus on the estimation of the survival probability of a cancer (as a disease of interest). In many cancers, one nonfatal event that decreases the survival probability can occur. There are a few methods that assess the role of prognostic factors for multiple types of clinical events while dealing with uncertainty about the cause of death. However, these methods require proportional hazards or Markov assumptions. In practice, one or both of these assumptions might be violated. Violation of the proportional hazards assumption can lead to estimates that are biased and difficult to interpret, and violation of the Markov assumption results in inconsistent estimators. In this work, we propose a semi-parametric approach to estimate the possibly time-varying regression coefficients in the likely non-Markov relative survival progressive illness-death model. The performance of the proposed estimator is investigated through simulations. We illustrate our approach using data from a study on rectal cancer resected for cure, conducted in two French population-based digestive cancer registries.
Katy C Molina, Vera D Tomazella, Eder A Milani
Statistical Methods in Medical Research; doi:10.1177/09622802211011187

Abstract:
Survival models with a frailty term are presented as an extension of Cox’s proportional hazards model, in which a random effect is introduced into the hazard function in a multiplicative form with the aim of modeling the unobserved heterogeneity in the population. Candidates for the frailty distribution are assumed to be continuous and non-negative. However, this assumption may not be true in some situations. In this paper, we consider a discretely distributed frailty model that allows units with zero frailty, that is, it can be interpreted as allowing long-term survivors. We propose a new discrete frailty-induced survival model with a zero-modified power series family, which can be zero-inflated or zero-deflated depending on the parameter value. Parameters were estimated using the maximum likelihood method, and the performance of the proposed models was assessed through Monte Carlo simulation studies. Finally, the applicability of the proposed models was illustrated with a real melanoma cancer data set.
Statistical Methods in Medical Research; doi:10.1177/09622802211008939

Abstract:
As interactions between people increase, the impending menace of COVID-19 outbreaks materializes, and there is an inclination to apply lockdowns. In this context, it is essential to have easy-to-use indicators for people to employ as a reference. The effective reproduction number of confirmed positives, Rt, fulfills such a role. This document proposes a data-driven approach to nowcast Rt based on the statistical behavior of previous observations. As more information arrives, the method naturally becomes more precise about the final count of confirmed positives. Our method’s strength is that it is based on the self-reported onset of symptoms, in contrast to other methods that use the daily report’s count to infer this quantity. We show that our approach may be the foundation for determining useful epidemic-tracking indicators.
Yichen Qin, Feifang Hu
Statistical Methods in Medical Research; doi:10.1177/09622802211008206

Abstract:
Concerns have been expressed over the validity of statistical inference under covariate-adaptive randomization despite its extensive use in clinical trials. In the literature, the inferential properties under covariate-adaptive randomization have been mainly studied for continuous responses; in particular, it is well known that the usual two-sample t-test for treatment effect is typically conservative. This phenomenon of invalid tests has also been found for generalized linear models without adjusting for the covariates and is sometimes more worrisome due to inflated type I error. The purpose of this study is to examine the unadjusted test for treatment effect under generalized linear models and covariate-adaptive randomization. For a large class of covariate-adaptive randomization methods, we obtain the asymptotic distribution of the test statistic under the null hypothesis and derive the conditions under which the test is conservative, valid, or anti-conservative. Several commonly used generalized linear models, such as logistic regression and Poisson regression, are discussed in detail. An adjustment method is also proposed to achieve a valid size based on the asymptotic results. Numerical studies confirm the theoretical findings and demonstrate the effectiveness of the proposed adjustment method.
Yun Li, Irina Bondarenko, Michael R Elliott, Timothy P Hofer, Jeremy Mg Taylor
Statistical Methods in Medical Research, Volume 30, pp 1428-1444; doi:10.1177/09622802211002866

Abstract:
With medical tests becoming increasingly available, concerns about over-testing, over-treatment and health care cost dramatically increase. Hence, it is important to understand the influence of testing on treatment selection in general practice. Most statistical methods focus on average effects of testing on treatment decisions. However, this may be ill-advised, particularly for patient subgroups that tend not to benefit from such tests. Furthermore, missing data are common, representing large and often unaddressed threats to the validity of most statistical methods. Finally, it is often desirable to conduct analyses that can be interpreted causally. Using the Rubin Causal Model framework, we propose to classify patients into four potential outcomes subgroups, defined by whether or not a patient’s treatment selection is changed by the test result and by the direction of how the test result changes treatment selection. This subgroup classification naturally captures the differential influence of medical testing on treatment selections for different patients, which can suggest targets to improve the utilization of medical tests. We can then examine patient characteristics associated with patient potential outcomes subgroup memberships. We used multiple imputation methods to simultaneously impute the missing potential outcomes as well as regular missing values. This approach can also provide estimates of many traditional causal quantities of interest. We find that explicitly incorporating causal inference assumptions into the multiple imputation process can improve the precision for some causal estimates of interest. We also find that bias can occur when the potential outcomes conditional independence assumption is violated; sensitivity analyses are proposed to assess the impact of this violation. We applied the proposed methods to examine the influence of 21-gene assay, the most commonly used genomic test in the United States, on chemotherapy selection among breast cancer patients.
Chen Qu, Rumana Z Omar, Ewout W Steyerberg, Ian R White, Gareth Ambler
Statistical Methods in Medical Research; doi:10.1177/09622802211007522

Abstract:
Risk-prediction models for health outcomes are used in practice as part of clinical decision-making, and it is essential that their performance be externally validated. An important aspect in the design of a validation study is choosing an adequate sample size. In this paper, we investigate the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance (C-statistic for discrimination and calibration slope and calibration in the large). We aim for sufficient precision in the estimated measures. In addition, we investigate the sample size to achieve sufficient power to detect a difference from a target value. Under normality assumptions on the distribution of the linear predictor, we obtain simple estimators for sample size calculations based on the measures above. Simulation studies show that the estimators perform well for common values of the C-statistic and outcome prevalence when the linear predictor is marginally Normal. Their performance deteriorates only slightly when the normality assumptions are violated. We also propose estimators which do not require normality assumptions but require specification of the marginal distribution of the linear predictor and require the use of numerical integration. These estimators were also seen to perform very well under marginal normality. Our sample size equations require a specified standard error (SE) and the anticipated C-statistic and outcome prevalence. The sample size requirement varies according to the prognostic strength of the model, outcome prevalence, choice of the performance measure and study objective. For example, to achieve an SE < 0.025 for the C-statistic, 60–170 events are required if the true C-statistic and outcome prevalence are between 0.64–0.85 and 0.05–0.3, respectively. For the calibration slope and calibration in the large, achieving SE < 0.15 [Formula: see text] would require 40–280 and 50–100 events, respectively. Our estimators may also be used for survival outcomes when the proportion of censored observations is high.
M Ganjali
Statistical Methods in Medical Research, Volume 30, pp 1484-1501; doi:10.1177/09622802211002868

Abstract:
Joint modeling of zero-inflated count and time-to-event data is usually performed by applying a shared random effect model. This kind of joint modeling can be considered a latent Gaussian model. In this paper, the integrated nested Laplace approximation (INLA) approach is used to perform an approximate Bayesian analysis for the joint modeling. We propose a zero-inflated hurdle model under a Poisson or negative binomial distributional assumption as the sub-model for count data. Also, a Weibull model is used as the survival time sub-model. In addition to the usual joint linear model, a joint partially linear model is also considered to take into account the non-linear effect of time on the longitudinal count response. The performance of the method is investigated using simulation studies, and it is compared with the usual Bayesian approach via Markov chain Monte Carlo (MCMC). We also apply the proposed method to analyze two real data sets. The first is data from a longitudinal study of pregnancy, and the second is a data set obtained from an HIV study.
Xubiao Peng, Ebrima Gibbs, Judith M Silverman, Neil R Cashman
Statistical Methods in Medical Research, Volume 30, pp 1502-1522; doi:10.1177/09622802211002861

Abstract:
Multiple different screening tests for candidate leads in drug development may often yield conflicting or ambiguous results, sometimes making the selection of leads a nontrivial maximum-likelihood ranking problem. Here, we apply methods from the field of multiple criteria decision making (MCDM) to the problem of screening candidate antibody therapeutics. We employ the SMAA-TOPSIS method to rank a large cohort of antibodies using up to eight weighted screening criteria, in order to find lead candidate therapeutics for Alzheimer’s disease, and determine their robustness to both uncertainty in screening measurements and uncertainty in the user-defined weights of importance attributed to each screening criterion. To choose lead candidates and measure the confidence in their ranking, we propose two new quantities, the Retention Probability and the Topness, as robust measures for ranking. This method may enable more systematic screening of candidate therapeutics when multi-variate screening data that distinguish candidates become difficult to process intuitively, so that additional candidates may be exposed as potential leads, increasing the likelihood of success in downstream clinical trials. The method properly identifies true positives and true negatives from synthetic data, its predictions correlate well with clinically approved antibodies vs. those still in trials, and it allows for ranking analyses using antibody developability profiles in the literature. We provide a webserver where users can apply the method to their own data: http://bjork.phas.ubc.ca.
Frank E Harrell Jr, Ewout W Steyerberg
Statistical Methods in Medical Research, Volume 30, pp 1465-1483; doi:10.1177/09622802211002867

Abstract:
Machine learning approaches are increasingly suggested as tools to improve prediction of clinical outcomes. We aimed to identify when machine learning methods perform better than a classical learning method. To this end, we examined the impact of the data-generating process on the relative predictive accuracy of six machine and statistical learning methods: bagged classification trees, stochastic gradient boosting machines using trees as the base learners, random forests, the lasso, ridge regression, and unpenalized logistic regression. We performed simulations in two large cardiovascular datasets which each comprised an independent derivation and validation sample collected from temporally distinct periods: patients hospitalized with acute myocardial infarction (AMI, n = 9484 vs. n = 7000) and patients hospitalized with congestive heart failure (CHF, n = 8240 vs. n = 7608). We used six data-generating processes based on each of the six learning methods to simulate outcomes in the derivation and validation samples based on 33 and 28 predictors in the AMI and CHF data sets, respectively. We applied the six prediction methods in each of the simulated derivation samples and evaluated performance in the simulated validation samples according to the c-statistic, generalized R2, Brier score, and calibration. While no method had uniformly superior performance across all six data-generating processes and eight performance metrics, (un)penalized logistic regression and boosted trees tended to have superior performance to the other methods across a range of data-generating processes and performance metrics. This study confirms that classical statistical learning methods perform well in low-dimensional settings with large data sets.
Parisa Azimaee, Yaser Maddahi
Statistical Methods in Medical Research, Volume 30, pp 1523-1537; doi:10.1177/09622802211003620

Abstract:
Quantifying the tool–tissue interaction forces in surgery can be used in the training process of novice surgeons to help them better handle surgical tools and avoid exerting excessive forces. A significant challenge concerns the development of proper statistical learning techniques to model the relationship between the true force exerted on the tissue and several outputs read from sensors mounted on the surgical tools. We propose a nonparametric bootstrap technique and a Bayesian multilevel modeling methodology to estimate the true forces. We use the linear exponential loss function to asymmetrically penalize the over- and underestimation of the forces applied to the tissue. We incorporate the direction of the force as a group factor in our analysis. A weighted approach is used to account for the nonhomogeneity of read voltages from the surgical tool. Our proposed Bayesian multilevel models provide estimates that are more accurate than those under the maximum likelihood and restricted maximum likelihood approaches. Moreover, confidence bounds are much narrower and the biases and root mean squared errors are significantly smaller in our multilevel models with the linear exponential loss function.
Xiaoxiao Zhou
Statistical Methods in Medical Research, Volume 30, pp 1554-1572; doi:10.1177/09622802211003113

Abstract:
Mediation analysis aims to decompose a total effect into specific pathways and investigate the underlying causal mechanism. Although existing methods have been developed to conduct mediation analysis in the context of survival models, none of these methods accommodates the existence of a substantial proportion of subjects who never experience the event of interest, even if the follow-up is sufficiently long. In this study, we consider mediation analysis for the mixture of Cox proportional hazards cure models that cope with the cure fraction problem. Path-specific effects on restricted mean survival time and survival probability are assessed by introducing a partially latent group indicator and applying the mediation formula approach in a three-stage mediation framework. A Bayesian approach with P-splines for approximating the baseline hazard function is developed to conduct analysis. The satisfactory performance of the proposed method is verified through simulation studies. An application of the Alzheimer’s disease (AD) neuroimaging initiative dataset investigates the causal effects of APOE-[Formula: see text] allele on AD progression.
Zhiping Qiu, Huijuan Ma, Jianwei Chen, Gregg E Dinse
Statistical Methods in Medical Research; doi:10.1177/0962280221995986

Abstract:
The quantile regression model has increasingly become a useful approach for analyzing survival data due to its easy interpretation and flexibility in exploring the dynamic relationship between a time-to-event outcome and the covariates. In this paper, we consider the quantile regression model for survival data with missing censoring indicators. Based on the augmented inverse probability weighting technique, two weighted estimating equations are developed and corresponding easily implemented algorithms are suggested to solve the estimating equations. Asymptotic properties of the resultant estimators and the resampling-based inference procedures are established. Finally, the finite sample performances of the proposed approaches are investigated in simulation studies and a real data application.
Yingrui Yang
Statistical Methods in Medical Research; doi:10.1177/0962280221998410

Abstract:
In epidemiology, identifying the effect of exposure variables in relation to a time-to-event outcome is a classical research area of practical importance. Incorporating propensity score in the Cox regression model, as a measure to control for confounding, has certain advantages when outcome is rare. However, in situations involving exposure measured with moderate to substantial error, identifying the exposure effect using propensity score in Cox models remains a challenging yet unresolved problem. In this paper, we propose an estimating equation method to correct for the exposure misclassification-caused bias in the estimation of exposure-outcome associations. We also discuss the asymptotic properties and derive the asymptotic variances of the proposed estimators. We conduct a simulation study to evaluate the performance of the proposed estimators in various settings. As an illustration, we apply our method to correct for the misclassification-caused bias in estimating the association of PM2.5 level with lung cancer mortality using a nationwide prospective cohort, the Nurses’ Health Study. The proposed methodology can be applied using our user-friendly R program published online.
Tobias F Chirwa, Jim Todd, Eustasius Musenge
Statistical Methods in Medical Research; doi:10.1177/0962280221997507

Abstract:
There are numerous fields of science in which multistate models are used, including biomedical research and health economics. In biomedical studies, these stochastic continuous-time models are used to describe the time-to-event life history of an individual through a flexible framework for longitudinal data. The multistate framework can describe more than one possible time-to-event outcome for a single individual. The standard estimation quantities in multistate models are transition probabilities and transition rates, which can be mapped through the Kolmogorov-Chapman forward equations from the Bayesian estimation perspective. Most multistate models assume the Markov property and time homogeneity; however, if these assumptions are violated, an extension to non-Markovian and time-varying transition rates is possible. This manuscript reviews various types of multistate models, their assumptions, methods of estimation, and data features compatible with fitting multistate models. We highlight the contrast between the frequentist (maximum likelihood estimation) and Bayesian estimation approaches in the multistate modeling framework and point out where the latter is advantageous. A partially observed and aggregated dataset from the Zimbabwe national ART program was used to illustrate the use of the Kolmogorov-Chapman forward equations. The transition rates from a three-stage reversible multistate model based on viral load measurements were estimated in WinBUGS and reported.
Yalda Zarnegarnia, Shari Messinger
Statistical Methods in Medical Research; doi:10.1177/0962280221995956

Abstract:
Receiver operating characteristic curves are widely used in medical research to illustrate biomarker performance in binary classification, particularly with respect to disease or health status. Study designs that include related subjects, such as siblings, usually have common environmental or genetic factors giving rise to correlated biomarker data. The design could be used to improve detection of biomarkers informative of increased risk, allowing initiation of treatment to stop or slow disease progression. Available methods for receiver operating characteristic construction do not take advantage of correlation inherent in this design to improve biomarker performance. This paper will briefly review some developed methods for receiver operating characteristic curve estimation in settings with correlated data from case–control designs and will discuss the limitations of current methods for analyzing correlated familial paired data. An alternative approach using conditional receiver operating characteristic curves will be demonstrated. The proposed approach will use information about correlation among biomarker values, producing conditional receiver operating characteristic curves that evaluate the ability of a biomarker to discriminate between affected and unaffected subjects in a familial paired design.
Statistical Methods in Medical Research; doi:10.1177/0962280221990415

Abstract:
The modified Poisson regression coupled with a robust sandwich variance has become a viable alternative to log-binomial regression for estimating the marginal relative risk in cluster randomized trials. However, a corresponding sample size formula for relative risk regression via the modified Poisson model is currently not available for cluster randomized trials. Through analytical derivations, we show that there is no loss of asymptotic efficiency for estimating the marginal relative risk via the modified Poisson regression relative to the log-binomial regression. This finding holds both under the independence working correlation and under the exchangeable working correlation, provided a simple modification is used to obtain a consistent intraclass correlation coefficient estimate. Therefore, the sample size formulas developed for log-binomial regression naturally apply to the modified Poisson regression in cluster randomized trials. We further extend the sample size formulas to accommodate variable cluster sizes. An extensive Monte Carlo simulation study is carried out to validate the proposed formulas. We find that the proposed formulas perform satisfactorily across a range of cluster size variability, as long as suitable finite-sample corrections are applied to the sandwich variance estimator and the number of clusters is at least 10. Our findings also suggest that the sample size estimate under the exchangeable working correlation is more robust to cluster size variability, and we recommend the use of an exchangeable working correlation over an independence working correlation for both design and analysis. The proposed sample size formulas are illustrated using the Stop Colorectal Cancer (STOP CRC) trial.
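For readers who want to see the estimator this sample size work targets, here is a minimal Python sketch of modified Poisson regression for a cluster randomized trial, fit as a log-link Poisson GEE with an exchangeable working correlation and a robust sandwich variance via statsmodels. The simulated data, cluster counts, and effect sizes are illustrative assumptions only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
clusters, m = 20, 30                                   # 20 clusters of 30 subjects (assumed)
cluster_id = np.repeat(np.arange(clusters), m)
trt = np.repeat(rng.binomial(1, 0.5, clusters), m)     # cluster-level treatment assignment
u = np.repeat(rng.normal(0, 0.2, clusters), m)         # cluster random effect
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.4 * trt + u))))   # binary outcome

df = pd.DataFrame({"y": y, "trt": trt, "cluster": cluster_id})
fit = smf.gee("y ~ trt", groups="cluster", data=df,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()     # robust (sandwich) SEs by default
print(f"estimated marginal relative risk = {np.exp(fit.params['trt']):.2f}")
```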
, Nicholas Illenberger, Jason A Roy, Nandita Mitra
Published: 7 April 2021 by SAGE Publications
Statistical Methods in Medical Research, Volume 30, pp 1306-1319; doi:10.1177/0962280221995972

The publisher has not yet granted permission to display this abstract.
Statistical Methods in Medical Research; doi:10.1177/0962280220988570

Abstract:
Log-rank tests have been widely used to compare two survival curves in biomedical research. We describe a unified approach to power and sample size calculation for the unweighted and weighted log-rank tests in superiority, noninferiority and equivalence trials. It is suitable for both time-driven and event-driven trials. A numerical algorithm is suggested. It allows flexible specification of the patient accrual distribution, baseline hazards, and proportional or nonproportional hazards patterns, and enables efficient sample size calculation when there is a range of choices for the patient accrual pattern and trial duration. A confidence interval method is proposed for the trial duration of an event-driven trial. We point out potential issues with several popular sample size formulae. Under proportional hazards, the power of a survival trial is commonly believed to be determined by the number of observed events. This belief is roughly valid for noninferiority and equivalence trials with similar survival and censoring distributions between two groups, and for superiority trials with balanced group sizes. In unbalanced superiority trials, the power also depends on other factors such as data maturity. Surprisingly, the log-rank test usually yields slightly higher power than the Wald test from the Cox model under proportional hazards in simulations. We consider various nonproportional hazards patterns induced by delayed effects, cure fractions, and/or treatment switching. Explicit power formulae are derived for the combination test that takes the maximum of two or more weighted log-rank tests to handle uncertain nonproportional hazards patterns. Numerical examples are presented for illustration.
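For context, one of the popular formulas the paper scrutinizes is the Schoenfeld-type event count under proportional hazards. The worked example below evaluates it in Python for an illustrative hazard ratio and balanced allocation; it is not a substitute for the paper's unified numerical algorithm.

```python
import numpy as np
from scipy.stats import norm

alpha, power = 0.05, 0.80
hr = 0.70                        # target hazard ratio (illustrative)
p1, p2 = 0.5, 0.5                # allocation fractions for the two arms

# Schoenfeld approximation: required events under proportional hazards.
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
events = z**2 / (p1 * p2 * np.log(hr)**2)
print(f"required number of events = {np.ceil(events):.0f}")   # about 247 here
```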
, Alexander Petersen, Juan C Vidal,
Statistical Methods in Medical Research, Volume 30, pp 1445-1464; doi:10.1177/0962280221998064

Abstract:
Biosensor data have the potential to improve disease control and detection. However, the analysis of these data under free-living conditions is not feasible with current statistical techniques. To address this challenge, we introduce a new functional representation of biosensor data, termed the glucodensity, together with a data analysis framework based on distances between glucodensities. The new data analysis procedure is illustrated through an application in diabetes with continuous glucose monitoring (CGM) data. In this domain, we show marked improvement with respect to state-of-the-art analysis methods. In particular, our findings demonstrate that (i) the glucodensity possesses extraordinary clinical sensitivity, capturing the typical biomarkers used in standard clinical practice in diabetes; (ii) previous biomarkers cannot accurately predict the glucodensity, so the latter is a richer source of information; and (iii) the glucodensity is a natural generalization of the time in range metric, the current gold standard in the handling of CGM data. Furthermore, the new method overcomes many of the drawbacks of time in range metrics and provides more in-depth insight into glucose metabolism.
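A minimal Python sketch of the underlying idea, assuming simulated CGM readings in place of real data: each subject's readings are summarized by a density of glucose values, time in range is recovered as the fraction of the glucose distribution between clinical thresholds, and subjects are compared via a distance between distributions (here the 1-Wasserstein distance). This illustrates the concept only and is not the authors' glucodensity implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde, wasserstein_distance

rng = np.random.default_rng(2)
cgm_a = rng.normal(110, 20, 1000)   # subject A glucose readings (mg/dL), simulated
cgm_b = rng.normal(150, 35, 1000)   # subject B glucose readings (mg/dL), simulated

# "Glucodensity": a density estimate of a subject's glucose distribution.
grid = np.linspace(40, 300, 500)
dens_a = gaussian_kde(cgm_a)(grid)
print(f"subject A modal glucose = {grid[dens_a.argmax()]:.0f} mg/dL")

# Time in range (70-180 mg/dL) as the fraction of readings in range.
tir_a = np.mean((cgm_a >= 70) & (cgm_a <= 180))
print(f"subject A time in range = {tir_a:.1%}")

# Distance between two subjects' glucose distributions.
print(f"1-Wasserstein distance A vs B = {wasserstein_distance(cgm_a, cgm_b):.1f} mg/dL")
```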
Zhaoxin Ye, , Donna L Coffman
Statistical Methods in Medical Research, Volume 30, pp 1413-1427; doi:10.1177/0962280221997505

Abstract:
Causal mediation effect estimates can be obtained from marginal structural models using inverse probability weighting with appropriate weights. To compute the weights, treatment and mediator propensity score models need to be fitted first. If the covariates are high-dimensional, parsimonious propensity score models can be developed by regularization methods, including LASSO and its variants. Furthermore, in a mediation setup, more efficient direct or indirect effect estimators can be obtained by using the outcome-adaptive LASSO, which incorporates outcome information when selecting variables for the propensity score models. A simulation study is conducted to assess how different regularization methods affect the performance of the estimated natural direct and indirect effect odds ratios. Our simulation results show that regularizing propensity score models by the outcome-adaptive LASSO can improve the efficiency of the natural effect estimators and, by optimizing covariate balance, can reduce bias in most cases. The regularization methods are then applied to the MIMIC-III database, an ICU database developed at MIT.
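To make the weighting ingredients concrete, the following Python sketch (simulated data; the penalty strength and variable names are arbitrary assumptions) fits L1-penalized logistic regressions for the treatment and mediator propensity scores and forms the treatment inverse-probability weights. It uses ordinary LASSO rather than the outcome-adaptive LASSO discussed above, which additionally uses outcome information to set covariate-specific penalties.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 1000, 20
X = rng.normal(size=(n, p))                                   # covariates
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))               # binary treatment
M = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * A + X[:, 1]))))   # binary mediator

lasso = dict(penalty="l1", solver="liblinear", C=0.5)         # C chosen arbitrarily here
ps_a = LogisticRegression(**lasso).fit(X, A).predict_proba(X)[:, 1]

XA = np.column_stack([X, A])
pm = LogisticRegression(**lasso).fit(XA, M).predict_proba(XA)[:, 1]
pm_obs = np.where(M == 1, pm, 1 - pm)                         # P(M = m_i | A_i, X_i), feeds the mediator weight

w_trt = np.where(A == 1, 1 / ps_a, 1 / (1 - ps_a))            # treatment IPW component
print("mean treatment weight:", round(w_trt.mean(), 2))
print("mean observed-mediator probability:", round(pm_obs.mean(), 2))
```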
, Ruosha Li, Limin Peng,
Statistical Methods in Medical Research, Volume 30, pp 1332-1346; doi:10.1177/0962280221995977

Abstract:
The inactivity time, or lost lifespan specifically for mortality data, concerns the time from occurrence of an event of interest to the current time point and has recently emerged as a new summary measure for the cumulative information inherent in time-to-event data. This summary measure provides several benefits over traditional methods, including more straightforward interpretation and less sensitivity to heavy censoring. However, no systematic modeling approach for inference on quantiles of the inactivity time exists in the literature. In this paper, we propose a semi-parametric regression method for the quantiles of the inactivity time distribution under right censoring. The consistency and asymptotic normality of the regression parameters are established. To avoid estimating the probability density function of the inactivity time distribution under censoring, we propose a computationally efficient method for estimating the variance–covariance matrix of the regression coefficient estimates. Simulation results are presented to validate the finite sample properties of the proposed estimators and test statistics. The proposed method is illustrated with a real dataset from a clinical trial on breast cancer.
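As a simple numerical illustration of the summary measure itself (not of the proposed semi-parametric quantile regression), the Python snippet below computes empirical quantiles of the inactivity time tau - T among simulated subjects whose event occurred by a fixed time point tau; it ignores covariates and censoring, and all numbers are assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
event_times = rng.weibull(1.5, 500) * 5.0     # simulated event times (years)
tau = 6.0                                      # current time point

lost = tau - event_times[event_times <= tau]   # inactivity times among observed events
print("median lost lifespan:", np.round(np.quantile(lost, 0.5), 2), "years")
print("75th percentile:", np.round(np.quantile(lost, 0.75), 2), "years")
```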