SURROGATE ENDPOINTS AND BIOMARKERS
The FDA-NIH Biomarker Working Group (2016) defines a biomarker as a “characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions.” Biomarkers are clearly distinguished from clinical outcome assessments (COAs), which typically relate to how an individual feels or functions, or how long the person lives. COAs are measured using a report generated by a clinician, patient, or non-clinician observer, or through a performance-based assessment, and, unlike biomarkers, can be used to quantify treatment effect in a clinical trial. Different types of biomarkers may be defined, depending on their intended use. For example, predictive biomarkers help to identify patients who are likely to benefit from, or be harmed by, a treatment option. Prognostic biomarkers, on the other hand, help to assess the likelihood of a clinical event, including disease recurrence or progression. In certain situations, pharmacodynamic biomarkers may be used to determine whether a biological response to treatment has occurred in an individual. Other categories include predisposition or susceptibility biomarkers, used to determine the risk of developing a disease, and diagnostic biomarkers, used to identify individuals with the disease or condition of interest. The development of biomarkers involves adherence to strict regulatory requirements. This entails obtaining qualification based on robust evidence that the biomarkers are fit for purpose in drug development and evaluation. Once a biomarker is qualified, it has the potential to provide critical information to enhance clinical-trial design and facilitate the regulatory review process.
In personalized medicine, certain molecular targeted therapies tend to involve a high cost of delivery. In such instances, use of biomarkers that predict response may be a viable alternative to gain efficiency. This is especially attractive, provided the cost associated with the use of companion diagnostics, which involves testing before giving treatment, is not in excess of the savings obtained by tailoring treatment only to the target patient population.
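As a rough illustration of this trade-off, the sketch below compares the per-patient cost of testing with the expected per-patient savings from not treating marker-negative patients. It assumes, for simplicity, that every patient would be treated in the absence of the test, that marker-negative patients are not treated once tested, and that the test classifies patients perfectly. The function name and the cost figures are hypothetical.

```python
def net_saving_per_patient(test_cost, treatment_cost, marker_prevalence):
    """Return (net saving per patient, whether testing is cost saving).

    Simplifying assumptions (not from the text): without the test all
    patients are treated; with the test only marker-positive patients are
    treated; the test has no misclassification.
    """
    # Treatment cost avoided, on average, for each patient tested.
    expected_saving = (1.0 - marker_prevalence) * treatment_cost
    net = expected_saving - test_cost
    return net, net >= 0


# Hypothetical figures: a 500-unit test, a 20,000-unit therapy,
# and 25% of patients marker-positive.
print(net_saving_per_patient(500, 20000, 0.25))  # (14500.0, True)
```

Under these assumptions, testing pays for itself whenever the test cost does not exceed the treatment cost multiplied by the proportion of marker-negative patients.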
Surrogate endpoints form a small class of biomarkers that serve as substitutes for clinical outcomes, which directly measure how patients feel, function, or survive. Surrogate endpoints are particularly preferred when the desired clinical outcomes are not readily obtainable for practical or ethical reasons. Thus, the primary function of a surrogate endpoint is to predict, but not measure, clinical benefit or harm. However, the validity and reliability of a surrogate endpoint must be established before it can be used in medical research or clinical practice. This requires extensive testing to see how well it predicts, or correlates with, clinical benefit. In general, the predictive capacity of a surrogate endpoint is evaluated based on data from a variety of sources, including epidemiologic, therapeutic, pathophysiologic, or other scientific experiments. Depending on the level of clinical validation, surrogate endpoints may be classified as candidate, reasonably likely, or validated. Surrogate endpoints are considered candidate when they are still under evaluation for their predictive ability, whereas reasonably likely surrogate endpoints require support based on strong mechanistic and/or epidemiologic rationale.
In general, there is no definitive way of establishing the validity of a given biomarker as a surrogate endpoint. However, there are a few surrogate endpoints that are now in routine use, both in the context of drug approval and medical practice. Examples include HbA1c, a measure of glycemic control, which is a surrogate for disease severity and outcomes of morbidity and mortality in patients with diabetes; and serum cholesterol levels (e.g., LDL-C), which serve as a surrogate for similar cardiovascular outcomes.
In the following sections, we highlight statistical and regulatory issues that are of relevance to the use of biomarkers and surrogate endpoints in drug development. Special emphasis is given to the requirements for validation and qualification of biomarkers and surrogate endpoints, and the resources available to facilitate the development of biomarkers by sponsors.
From a statistical perspective, the analysis of biomarkers and surrogate endpoints is associated with several challenges, including the assessment of the validity and clinical utility of the marker, as well as the handling of high dimensionality and multiplicity issues. Recent advances in analytic methods and next-generation sequencing appear to address some of these issues, but this remains an active area of research with considerable opportunities to advance drug development and evidence-based medicine (Matsui 2013).
When dealing with one biomarker at a time, traditional univariate techniques can serve as screening tools. Advantages of such approaches include ease of implementation and interpretation. A major drawback is the inability to utilize the potential correlations among biomarkers. Therefore, it may often be essential to use more sophisticated multivariate methods when dealing with several biomarkers. Most traditional approaches may not be suitable for multivariate biomarker data, especially in the presence of issues such as high dimensionality, missing values, multicollinearity, and multiplicity. Accordingly, modern analytical approaches, including penalized regression, decision trees, and neural networks, may need to be considered (see, e.g., Hastie et al. 2009). In genomics, for example, hierarchical models can be used, since such models borrow strength by incorporating information across comparable genes (see, e.g., Speed 2003).
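To make the idea of penalized regression concrete, the following is a minimal sketch of the lasso fitted by cyclic coordinate descent, written in plain Python. It is illustrative only: the function names and the toy data are hypothetical, the marker columns and the response are assumed to be centred (so no intercept is fitted), no column is constant, and a fixed number of sweeps is used in place of a convergence check.

```python
def soft_threshold(a, lam):
    """Shrink a toward zero by lam; values in [-lam, lam] become exactly 0."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * sum((y - X b)^2) + lam * sum(|b_j|).

    X is a list of rows (one row per subject, one column per marker).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of marker j with the partial residual
            z = 0.0    # sum of squares of marker j
            for i in range(n):
                # Fitted value using all markers except marker j.
                fit_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (z / n)
    return b


# Toy data: the response depends strongly on marker 1, weakly on marker 2.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

In this toy example the penalty shrinks the coefficient of the first marker and sets that of the weakly associated second marker exactly to zero, which is the variable-selection behaviour that makes the lasso attractive when many candidate markers are screened simultaneously.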
After a promising set of genes or markers is identified, one may then assess the diagnostic potential of the markers using alternative models, while incorporating additional clinical information. Lu et al. (2013) report results that are based on the use of a penalized-regression approach to analyze data from an AIDS trial. In DeRubeis et al. (2014), a linear regression model has been applied to develop an individual treatment rule utilizing data from an RCT.
In association analyses involving many markers, one needs to control the rate of false positives. Traditional approaches, such as the Bonferroni method, tend to be too strict and may lead to many missed findings. The false discovery rate (FDR), defined as the expected proportion of incorrectly rejected null hypotheses among the results declared significant, was introduced by Benjamini and Hochberg (1995) as an attractive alternative to the more conservative traditional methods for simultaneous inference. Subsequent enhancements of the FDR include the positive false discovery rate (pFDR) and the q-value, which is a measure of significance in terms of the FDR rather than the usual false positive rate associated with traditional p-values (Storey and Tibshirani 2003).
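The contrast between the two approaches can be seen in a small sketch. The code below implements the Bonferroni rule and the Benjamini-Hochberg step-up procedure in plain Python and applies both to a set of hypothetical p-values; the function names and the p-values are illustrative and are not taken from any study cited here.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the familywise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for six markers.
p = [0.001, 0.008, 0.012, 0.020, 0.25, 0.60]
print(sum(bonferroni(p)))          # 2 markers declared significant
print(sum(benjamini_hochberg(p)))  # 4 markers declared significant
```

With these p-values the Bonferroni rule flags two markers while the Benjamini-Hochberg procedure flags four, illustrating the gain in power obtained by controlling the FDR instead of the familywise error rate.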
As pointed out earlier, before a biomarker can be used in practice, its validity and reliability have to be rigorously assessed and established. A common approach, often referred to as analytical validation, is to use a gold standard to determine the reliability of the assay and the sensitivity and specificity of the measurements (Chau et al. 2008). In contrast, clinical validity relates to the assessment of the predictive value of a biomarker for disease prognosis or treatment effect. When the focus is on prognostic biomarkers, clinical validation requires the determination of the strength of correlation between biomarker values and a clinical endpoint.
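For a binary assay result compared against a gold standard, sensitivity and specificity follow directly from the cross-classification of the two. The sketch below, with a hypothetical function name and made-up data, shows the calculation; it assumes both classes are present in the gold standard.

```python
def sensitivity_specificity(assay, gold):
    """Compare binary assay calls with a gold standard (1 = positive).

    Returns (sensitivity, specificity).
    """
    tp = sum(1 for a, g in zip(assay, gold) if a == 1 and g == 1)
    fn = sum(1 for a, g in zip(assay, gold) if a == 0 and g == 1)
    tn = sum(1 for a, g in zip(assay, gold) if a == 0 and g == 0)
    fp = sum(1 for a, g in zip(assay, gold) if a == 1 and g == 0)
    sensitivity = tp / (tp + fn)  # proportion of true positives detected
    specificity = tn / (tn + fp)  # proportion of true negatives detected
    return sensitivity, specificity


# Made-up data: four gold-standard positives and six negatives.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
assay = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(assay, gold)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```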
In the context of randomized controlled trials (RCTs), clinical validity of a predictive biomarker may be evaluated with respect to the degree of significance of the treatment-by-biomarker interaction in a suitable model. Further, it is important to establish the clinical utility of the biomarker, i.e., whether the use of the biomarker in clinical practice has benefits. This is often accomplished through suitably designed clinical trials. One approach involves randomizing patients either to a standard of care therapy or to a strategy in which a biomarker-based treatment assignment is used. In other cases, enrichment designs may be employed in which treatment effect is assessed using only patients who are predicted to be responders based on the biomarker under consideration (Matsui 2013; Simon 2010).
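As a simple illustration of a treatment-by-biomarker interaction, consider a binary marker, a two-arm trial, and a continuous outcome. In the saturated linear model with treatment, marker, and their product, the interaction coefficient equals the difference between the treatment effects in marker-positive and marker-negative patients. The sketch below computes that estimate from the four cell means. The function name and the data are hypothetical, and the standard error shown uses cell-specific variances, which is an assumption made here for simplicity and differs from the pooled-variance standard error of ordinary least squares.

```python
from math import sqrt
from statistics import mean, variance


def interaction_estimate(records):
    """records: iterable of (treated, marker_positive, outcome), with
    treated and marker_positive coded 0/1. Returns (estimate, std_error)."""
    cells = {(t, b): [] for t in (0, 1) for b in (0, 1)}
    for t, b, y in records:
        cells[(t, b)].append(y)
    # Treatment effect within each marker subgroup.
    effect_pos = mean(cells[(1, 1)]) - mean(cells[(0, 1)])
    effect_neg = mean(cells[(1, 0)]) - mean(cells[(0, 0)])
    estimate = effect_pos - effect_neg
    # Variances of the four independent cell means add.
    std_error = sqrt(sum(variance(v) / len(v) for v in cells.values()))
    return estimate, std_error


# Hypothetical outcomes: the treatment helps mainly marker-positive patients.
data = (
    [(0, 0, y) for y in (10, 12, 11)]
    + [(1, 0, y) for y in (11, 12, 13)]
    + [(0, 1, y) for y in (10, 11, 12)]
    + [(1, 1, y) for y in (15, 16, 17)]
)
est, se = interaction_estimate(data)
print(f"interaction = {est:.2f}, z = {est / se:.2f}")
```

A large ratio of the estimate to its standard error suggests that the treatment effect differs by marker status, which is the kind of evidence sought when evaluating a predictive biomarker.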
When the intent is to determine whether biomarkers can serve as a surrogate for a clinical endpoint, it is essential to evaluate a number of conditions with the help of suitable statistical techniques. The exercise typically involves an assessment of the effect of the drug on the biomarker, the effect of the drug on the clinical endpoint of interest, and the association between the surrogate biomarker and the clinical endpoint. In a seminal work, Prentice (1989) defined a surrogate endpoint as “a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the true endpoint.” Implicit in the criterion is that the surrogate response variable captures all the information pertaining to the relationship between the treatment and the true endpoint. In other words, given the surrogate endpoint, the true endpoint is conditionally independent of treatment. One of the drawbacks of Prentice's approach is its reliance on untestable assumptions. In particular, conditioning on the surrogate, which is measured posttreatment, does not in general support a causal interpretation. Further, as argued in Berger (2004), the criterion provides a necessary, but not sufficient, condition to infer a treatment effect on the true endpoint.
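The conditions above are commonly written in the following operational form. The notation is introduced here for illustration and is not taken from the text: Z denotes the treatment indicator, S the surrogate, T the true endpoint, and f a probability distribution.

```latex
\begin{align*}
f(S \mid Z) &\neq f(S) && \text{(treatment affects the surrogate)} \\
f(T \mid Z) &\neq f(T) && \text{(treatment affects the true endpoint)} \\
f(T \mid S) &\neq f(T) && \text{(the surrogate is associated with the true endpoint)} \\
f(T \mid S, Z) &= f(T \mid S) && \text{(given the surrogate, the true endpoint does not depend on treatment)}
\end{align*}
```

The last line is the conditional-independence requirement; it is the one that cannot be verified from data in a way that rules out all residual treatment effect.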
Since Prentice's idea of perfect surrogacy is unrealistic, Freedman et al. (1992) and Wang and Taylor (2003) introduced approaches based on the proportion of treatment effect explained. However, these approaches still rely on conditioning on a posttreatment marker, and they may lead to estimates of the proportion that lie outside the admissible range of 0 to 1 and that have high variability.
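In the formulation of Freedman et al. (1992), the proportion explained is one minus the ratio of the treatment coefficient adjusted for the surrogate to the unadjusted treatment coefficient. The sketch below illustrates the calculation with ordinary linear regression on a continuous endpoint, which is a simplification adopted here; the function names and data are hypothetical. The adjusted coefficient is obtained by regressing the residuals of the endpoint on the surrogate against the residuals of the treatment on the surrogate, which gives the same value as a two-covariate least-squares fit.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals from the least-squares regression of y on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def proportion_explained(z, s, t):
    """1 - (treatment effect adjusted for s) / (unadjusted treatment effect)."""
    beta_unadjusted = slope(z, t)
    beta_adjusted = slope(residuals(s, z), residuals(s, t))
    return 1.0 - beta_adjusted / beta_unadjusted


# Hypothetical data in which the endpoint is fully determined by the surrogate.
z = [0, 0, 0, 1, 1, 1]    # treatment indicator
s = [1, 2, 3, 3, 4, 5]    # surrogate
t = [2, 4, 6, 6, 8, 10]   # true endpoint (here exactly 2 * s)
print(round(proportion_explained(z, s, t), 6))
```

Because the estimate is a ratio of two coefficients, nothing constrains it to the interval from 0 to 1: an adjusted coefficient with the opposite sign to the unadjusted one gives a value above 1, and an adjusted coefficient larger than the unadjusted one gives a negative value.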
An alternative strategy involves combining information from several trials, with a view to assessing the “trial level association” between the treatment effect on the surrogate and the treatment effect on the true endpoint (Buyse et al. 2015). An example of the application of the meta-analytic approach may be found in Paoletti et al. (2013), in which results are reported concerning the validity of progression-free survival as a surrogate for overall survival in advanced/recurrent gastric cancer trials.
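A crude version of the trial-level association can be sketched as the squared correlation, across trials, between the estimated treatment effect on the surrogate and the estimated treatment effect on the true endpoint. The code below is a simplified illustration with hypothetical effect estimates (for example, log hazard ratios); unlike the full meta-analytic models, it ignores the estimation error in each trial's effects and any weighting by trial size.

```python
from math import sqrt


def trial_level_r2(effects_on_surrogate, effects_on_true):
    """Squared Pearson correlation between per-trial treatment effects."""
    n = len(effects_on_surrogate)
    mean_s = sum(effects_on_surrogate) / n
    mean_t = sum(effects_on_true) / n
    sxx = sum((a - mean_s) ** 2 for a in effects_on_surrogate)
    syy = sum((b - mean_t) ** 2 for b in effects_on_true)
    sxy = sum(
        (a - mean_s) * (b - mean_t)
        for a, b in zip(effects_on_surrogate, effects_on_true)
    )
    r = sxy / sqrt(sxx * syy)
    return r * r


# Hypothetical per-trial effect estimates from five trials.
surrogate_effects = [-0.40, -0.25, -0.10, 0.05, -0.30]
true_effects = [-0.30, -0.20, -0.05, 0.02, -0.18]
print(round(trial_level_r2(surrogate_effects, true_effects), 3))
```

A value close to 1 indicates that trials with larger effects on the surrogate also tend to show larger effects on the true endpoint, which is the pattern required for the surrogate to be useful in predicting the outcome of a new trial.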
Although the meta-analytic approach appears attractive, in practice it may not be feasible to obtain data from multiple sources on a biomarker and a new treatment. Therefore, further research is needed to establish the reliability of surrogate markers intended for use in drug development and clinical practice.