PATIENT-REPORTED OUTCOMES IN REGULATORY SETTINGS
According to a recent guidance document issued by the US FDA, patient-reported outcomes (PROs) are defined as “any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else” (FDA 2009). Thus, PROs may include a range of subjective outcomes such as symptoms (e.g., pain, fatigue, nausea, or vomiting), functioning (physical, emotional, or social), health-related quality of life (HRQOL), or preferences regarding a given treatment (Drummond et al. 2005). PRO data are captured directly from patients using a suitable instrument, which consists of a questionnaire and the accompanying instructions and documentation that support its use. While a PRO is typically measured by self-report, in some situations it may be captured by interview, in which case the interviewer is expected to record only the patient’s response. Certain symptoms or other concepts that are known only to the patient (e.g., pain severity) can be measured only with PRO instruments.
Pharmaceutical companies and regulatory agencies pay due attention to the collection, analysis, and reporting of PROs (see, e.g., Alemayehu and Cappelleri 2012; 2014). The importance of PROs stems from the growing focus on patient-centric healthcare systems. The US FDA acknowledges that evidence from a well-defined and reliable PRO instrument collected through an appropriately designed investigation can be used to support a claim in medical product labeling (FDA 2009). In the European Union (EU), the EMA has recognized that the “experience of patients of how a treatment impacts on their well-being and everyday life is an important aspect of the evaluation of the clinical benefits of new medicines” (EMA 2016). In addition, PROs can be used as evidence to support health-technology assessment (HTA) decisions and payer negotiations (Zagadailov 2013). PROs are, therefore, routinely included as an integral component of most drug-development plans, often starting in early-phase trial designs (Basch 2016).
Since PROs are subjective in nature, their acceptance as a basis for assessing the relative benefits and risks of alternative treatment options depends on the validity and reliability of the instruments used to generate the data. In addition, it is essential to ensure that PROs are developed following standardized approaches, so that results can be compared or synthesized across measures and the burden on patients is minimized. The Patient-Reported Outcome Measurement Information System, or PROMIS (Cella 2007), which was launched in 2005 through a National Institutes of Health (NIH) Roadmap Initiative, is an example of ongoing efforts intended to enhance the development, use, and interpretation of PROs (see, e.g., DeWalt 2007).
In the following, we provide a high-level summary of pertinent aspects of PRO-instrument development as well as data collection, analysis, and reporting, with emphasis on the issues that are germane to their effective use in clinical trials and regulatory submissions. For a more in-depth discussion of these issues, the reader is referred to Cappelleri et al. (2013).
Development and Validation of PRO Instruments
The development of a PRO instrument that is intended to generate evidence for regulatory or other healthcare decision-making requires a rigorous evaluation process, involving both qualitative and quantitative methods. The initial step often comprises a thorough review of the available literature to confirm the need for a new instrument and to understand the nature of the measures that are already in use for related purposes. The next step is the development and assessment of a conceptual framework that ensures that the issues of most relevance to the patient are captured. The concept of interest may involve a single item (e.g., pain intensity) or require multiple items (e.g., physical function). In the latter case, it is critical to establish how individual items are associated with each other and with each domain, and how domains are associated with each other and with the general concept of interest. Figure 4.1, adapted from FDA (2009), illustrates the interrelationships of items and domains in a conceptual framework of a PRO instrument.
Once the conceptual framework is confirmed, other properties of the instrument will need to be established, including content validity, reliability, construct validity, and ability to detect change.
The assessment of content validity, which requires evidence that the instrument measures the concept of interest, includes analyzing data collected from focus groups (Patrick et al. 2011a; 2011b). Content validity depends on a number of factors, including whether item generation includes input from the target patient population; the appropriateness of the recall period for the instrument; the mode of administration (i.e., self-administration, interview, or both); the relevance of the response options (e.g., visual analog scale or Likert scale); the scoring of items and domains; and respondent burden.
FIGURE 4.1 Illustration of a Conceptual Framework of a PRO Instrument
Construct validity involves establishing whether observed relationships between measures gathered using the instrument and results gathered using other measures are consistent with preexisting hypotheses about those relationships (i.e., discriminant and convergent validity), as well as whether the instrument can differentiate between clinically distinct groups (i.e., known-groups validity). In addition, an instrument’s floor/ceiling effects are assessed to determine the appropriate use of the instrument for a given condition. The ability of the instrument to discriminate among patients is characterized by assessing the variability in responses when the instrument is administered. Further, an instrument’s ability to capture responsiveness to meaningful change may be determined by comparing the change in PRO scores against the change in other similar measures or a gold standard (criterion validity).
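As an illustration of the floor/ceiling assessment mentioned above, the following minimal sketch computes the proportion of respondents who score at the lowest and highest possible values of a scale. The function names, the example scores, and the 15% flagging threshold are assumptions made for illustration only; they are not taken from the FDA guidance or from any specific instrument.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Return (floor, ceiling): the share of respondents at the scale's
    lowest and highest possible scores."""
    scores = list(scores)
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def flag_floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Flag a floor or ceiling effect when the share of respondents at an
    extreme exceeds `threshold` (an assumed, adjustable cutoff)."""
    floor, ceiling = floor_ceiling_proportions(scores, min_score, max_score)
    return floor > threshold, ceiling > threshold


# Hypothetical total scores on a 0-10 scale
example_scores = [0, 0, 5, 10, 10, 10, 3, 4]
print(floor_ceiling_proportions(example_scores, 0, 10))
print(flag_floor_ceiling(example_scores, 0, 10))
```

A large share of responses at either extreme suggests that the instrument may be unable to register further deterioration or improvement in that population, which bears on its ability to detect change.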
The assessment of the reliability of an instrument consists of evaluating its ability to yield consistent, reproducible results. This may include determination of reproducibility (e.g., test-retest reliability), internal consistency (e.g., agreement among the observed responses to different questions), as well as inter-interviewer reproducibility, if applicable. Internal consistency is assessed with respect to item-to-item correlations (often using Cronbach’s alpha), whereas test-retest reliability may be assessed based on an analysis of variance involving repeated measurements on the same set of subjects.
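The two reliability statistics described above can be sketched as follows. The first function computes Cronbach's alpha from a respondents-by-items matrix. The second computes a one-way random-effects intraclass correlation coefficient (ICC) from the between-subject and within-subject mean squares of a one-way analysis of variance. The choice of the one-way ICC is an assumption made here for illustration, since the text does not specify which ANOVA-based statistic is intended, and the function names and example data are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha; rows are respondents, columns are items."""
    x = np.asarray(items, dtype=float)
    n_items = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of the summed score
    return n_items / (n_items - 1) * (1.0 - item_variances.sum() / total_variance)


def icc_oneway(scores):
    """One-way random-effects ICC; rows are subjects, columns are occasions."""
    x = np.asarray(scores, dtype=float)
    n_subjects, n_occasions = x.shape
    subject_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = (n_occasions * ((subject_means - grand_mean) ** 2).sum()
                  / (n_subjects - 1))
    ms_within = (((x - subject_means[:, None]) ** 2).sum()
                 / (n_subjects * (n_occasions - 1)))
    return (ms_between - ms_within) / (ms_between + (n_occasions - 1) * ms_within)


# Hypothetical data: 4 respondents x 3 items, and 3 subjects x 2 occasions
item_responses = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
test_retest = [[1, 2], [3, 4], [5, 6]]
print(cronbach_alpha(item_responses))
print(icc_oneway(test_retest))
```

Values of either statistic closer to 1 indicate greater consistency. Any threshold for what counts as acceptable depends on the intended use of the instrument and is not prescribed by this sketch.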