Background Non-interventional large-scale research on real-world patients who had a stroke requires the use of multiple data sources ensuring access to longitudinal data from large populations with clinically-detailed information. We sought to establish a framework for longitudinal research on patients hospitalised with stroke by linking information-rich, deidentified inpatient data from the Paul Coverdell National Acute Stroke Program (PCNASP) to commercial and Medicare Advantage longitudinal claims data.
Methods All stroke admissions in PCNASP between 2008 and 2015 were evaluated for linkage to longitudinal claims from a commercial insurer using an algorithm based on six available common data fields (patient age, gender, admission date, discharge date, discharge diagnosis and state) and a hospital match. We evaluated the linkage quality (via the percentage of unique records in the linked dataset) and the representativeness of the linked population. We also described medical history, stroke severity and patterns of medication use among the PCNASP-claims linked cohort.
Results The linkage produced uniqueness equal to 99.1%. We identified 5644 linked and 98 896 unlinked patients who had an ischaemic stroke hospitalisation in claims data. Linked patients were younger than unlinked (69.7 vs 72.5 years), but otherwise similar by medical history, prestroke medication use or lab values. Stroke severity was mild and most patients were discharged home. Prestroke and discharge use of antihypertensive and statins in the PCNASP were greater than their use as measured by filled prescriptions in claims.
Conclusions High-quality linkage between the PCNASP and commercial claims data is feasible. This linkage identified differences between reported or recommended versus actual out-of-hospital medication utilisation, highlighting the importance of longitudinal data availability for research aimed to improve the care of patients who had a stroke.
- standard of care
Data availability statement
No data are available. Data are not available because they are subject to a specific data use agreement that does not allow data sharing.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Several classes of medications have been shown to be effective in managing stroke risk factors and in secondary stroke prevention,1 2 reducing risk by up to approximately 80% as compared with no treatment.3 Healthcare utilisation databases can be a useful tool to study the use and the comparative safety and effectiveness of therapeutics in routine care of patients who had ischaemic stroke, and thus complement information from randomised controlled trials (RCTs), which, although generally considered the gold standard for establishing the causal relationship between interventions and patient outcomes, are often costly, take a long time to complete, and are often applicable to only a narrow patient population.4–6 These databases allow for the creation of a continuous record of hospitalisations, outpatient care, and medication use, but they often lack sufficiently detailed information on critical clinical characteristics such as disease severity.7 Furthermore, most administrative datasets do not contain in-hospital drug use information. By contrast, clinical stroke registries are generally rich in clinical detail but lack longitudinal data and are deidentified abstractions of the medical record without explicit patient consent. The Paul Coverdell National Acute Stroke Program (PCNASP) was established by the Centers for Disease Control and Prevention in 2001 to collect data on the quality of care provided to patients who had a stroke from the initial emergency response through hospital discharge with the goal of improving the quality of hospital-based acute stroke care.8–10
In such a context, the linkage with alternative data sources such as clinical or quality improvement registries, to confirm clinical diagnoses that may not be accurately recorded in claims and to collect information on disease severity and inpatient medications and procedures, provides an opportunity for clinical research on the use and the effects of medications in large real-world patient populations.
While a body of literature exists on Medicare fee-for-service claims linked to clinical stroke registries,11–17 little is known about commercially insured patients under age 65 and those in Medicare Advantage plans. In a cohort study of patients who had ischaemic stroke, we sought to determine: (1) the feasibility of linking patients from an inpatient deidentified stroke registry (ie, the PCNASP) to a commercial claims dataset with longitudinal information on inpatient and outpatient care and out-of-hospital filled prescriptions, (2) the representativeness of the linked versus unlinked claims patients and (3) the reliability of registry-derived clinical inpatient and outpatient information such as prestroke and postdischarge medication use by comparing it to administrative claims data on filled prescriptions in the linked population.
Inpatient stroke data from the PCNASP
Information on hospitalised patients who had an ischaemic stroke was available via PCNASP between January 2008 and September 2015. The PCNASP was established in 2001 and from 2007 to 2011 collected data from acute care hospitals in six states (Georgia, Massachusetts, Michigan, Minnesota, Ohio, and North Carolina), increasing to 11 states in 2012–2015 (Arkansas, California, Georgia, Iowa, Massachusetts, Michigan, Minnesota, New York, North Carolina, Ohio and Wisconsin).18 PCNASP includes patients aged ≥18 years with a clinical diagnosis of acute ischaemic stroke, intracerebral haemorrhage, subarachnoid haemorrhage or transient ischaemic attack (TIA) or an International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code indicative of a stroke or TIA.19 PCNASP collects several in-hospital data elements including stroke subtype, National Institutes of Health (NIH) Stroke Scale Score, and prescribed preventative pharmacological therapy at discharge, for example, antihypertensive, statin and antithrombotic treatments.18
Longitudinal information on commercially insured patients was collected from the Clinformatics Data Mart (OptumInsight, Eden Prairie, Minnesota, USA) between 2008 and 2015, a US-based healthcare insurance dataset including deidentified administrative claims for over 14 million persons annually (hereinafter referred to as Optum). The patients in this dataset are commercial health plan members and Medicare Advantage members (approximately 25%–30%) from all 50 states. For each enrollee, the dataset includes demographic information, health plan enrollment status, inpatient and outpatient medical encounters coded via ICD-9-CM and Current Procedural Terminology (CPT)-4 classifications, and filled prescriptions, including National Drug Code numbers, quantity dispensed, and days’ supply. Claims data are deterministically linked to laboratory test results provided by two national laboratory providers, with results for outpatient laboratory tests available for approximately a third of beneficiaries.
Data linkage strategy
In the absence of direct patient identifiers, we used the following six available common data fields to link the PCNASP and the Optum dataset: patient age, gender, admission date, discharge date, discharge diagnosis and state. In order to improve linkage validity, we required a hospital match to occur first.20 We established a hospital ‘crosswalk’ by matching hospitalisations from the inpatient PCNASP data with hospitalisations from the claims, on the basis of exact values for the six common data fields. We restricted the crosswalk to hospitals with at least five exact patient level matches. The hospital identified in claims data that contained the majority of exactly matched hospitalisations for any given PCNASP hospital was assumed to be the correct link for that PCNASP hospital.
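The crosswalk step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the field names and the function names are hypothetical, and "majority" is approximated here by the claims hospital with the most exact matches (ties resolved by first occurrence), which is an assumption.

```python
from collections import Counter, defaultdict

# The six common fields named in the text; these dictionary keys are hypothetical.
LINK_FIELDS = ("age", "gender", "admit_date", "discharge_date", "diagnosis", "state")


def link_key(record):
    """Tuple of the six linking fields for one hospitalisation."""
    return tuple(record[field] for field in LINK_FIELDS)


def build_crosswalk(registry_rows, claims_rows, min_matches=5):
    """Map each registry hospital to the claims hospital with the most exact matches."""
    # Index claims hospitalisations by their six-field key.
    claims_by_key = defaultdict(list)
    for row in claims_rows:
        claims_by_key[link_key(row)].append(row["hospital_id"])

    # Count exact patient-level matches for every (registry, claims) hospital pair.
    pair_counts = defaultdict(Counter)
    for row in registry_rows:
        for claims_hospital in claims_by_key.get(link_key(row), []):
            pair_counts[row["hospital_id"]][claims_hospital] += 1

    # Keep only registry hospitals whose best candidate reaches the threshold.
    crosswalk = {}
    for registry_hospital, counts in pair_counts.items():
        best_hospital, best_count = counts.most_common(1)[0]
        if best_count >= min_matches:
            crosswalk[registry_hospital] = best_hospital
    return crosswalk
```

Patient-level linkage would then be attempted only between hospitalisations whose hospitals are paired in the returned dictionary.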
Within each hospital, we calculated the percentage of records that were unique after implementing the linkage, as previously described.20 This was defined as follows:
N multiple records in claims=number of multiple records in claims that linked to the same record in PCNASP,
N multiple records in PCNASP=number of multiple records in PCNASP that linked to the same record in claims,
N linked records=total number of records in claims and PCNASP for which linkage was possible.
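The expression for the percentage of unique records is not reproduced in this text. One reading consistent with the three quantities defined above, offered as an assumption rather than as the authors' stated formula, is:

```latex
\text{Uniqueness (\%)} = 100 \times
\frac{N_{\text{linked records}} - N_{\text{multiple records in claims}} - N_{\text{multiple records in PCNASP}}}
     {N_{\text{linked records}}}
```

Under this reading, a value of 100% would mean that no record in either source linked to more than one record in the other.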
We linked the PCNASP and Optum datasets using increasingly stringent criteria with respect to matching by age and dates of admission or discharge, and used the strictest linkage rule (uniqueness equal to 99.1%) to identify a study population of linked patients who had ischaemic stroke, for whom we had high confidence that the linkage accurately identified the same patient in claims and PCNASP.
Within the linked study population, we restricted to patients who had a first index ischaemic stroke hospitalisation and six or more months of continuous health plan enrollment in claims prior to that hospital admission. Within the same time period, we also identified patients who had a first index ischaemic stroke hospitalisation in claims and no linkage with the PCNASP, and applied the same inclusion criteria as for linked patients. To identify patients who had ischaemic stroke in Optum linked and unlinked patients, we used a definition previously validated against medical records, based on a primary discharge diagnosis of ischaemic stroke (ICD-9 codes 433.x1, 434.xx (excluding 434.x0), or 436.xx),21 22 during the index hospitalisation.
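As an illustration of the code-based definition, the following Python sketch flags a primary discharge diagnosis that matches the ICD-9 patterns listed above. The function name and the dotted code format are assumptions, and 436 is accepted with or without decimal digits.

```python
import re

# 433.x1, 434.xx excluding 434.x0, and 436 with optional decimal digits.
_ISCHAEMIC_STROKE = re.compile(r"^(433\.\d1|434\.\d[1-9]|436(\.\d{1,2})?)$")


def is_ischaemic_stroke(primary_dx):
    """True if a dotted ICD-9-CM primary discharge diagnosis meets the definition."""
    return bool(_ISCHAEMIC_STROKE.match(primary_dx.strip()))
```

For example, `is_ischaemic_stroke("434.91")` is true, whereas `is_ischaemic_stroke("434.90")` is false because of the 434.x0 exclusion.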
Characteristics of linked and unlinked claims-based populations with ischaemic stroke
To assess the representativeness of the claims-based population with linkage to the PCNASP, with respect to the claims-based population without linkage to the PCNASP, we compared baseline patient characteristics between Optum enrollees with ischaemic stroke who were linked and not linked to the PCNASP data. Baseline patient characteristics of interest were measured in claims during the 6 months preceding the index stroke hospitalisation and included demographic information, comorbid conditions, use of medications, measures of healthcare utilisation, and, for a subset of the study cohort, selected baseline laboratory test results. We also described characteristics of the index hospitalisation including length of stay, discharge status, and in-hospital death, when available.
Characterisation of cohorts of patients who had an ischaemic stroke through PCNASP and claims-based longitudinal information
Within the linked study population, we assessed common fields pertinent to medical history and use of medications prior to the index stroke hospitalisation as measured in both PCNASP and claims. History of comorbid conditions at baseline was assessed through ICD-9 diagnostic and procedural codes in claims, and through recorded information on past medical history in PCNASP. Active use of antihypertensive and lipid-lowering medication treatment was measured via filled prescriptions during the 90 days prior to the stroke hospitalisation in claims, and patient’s reported medication use on admission in PCNASP. Baseline stroke severity was captured by NIH Stroke Scale (NIHSS) scores and by degree of independence in ambulation at discharge, both recorded in the PCNASP.
To evaluate the representativeness of the linked population, we examined descriptive statistics for a range of claims-based characteristics of patients linked to the PCNASP versus unlinked patients. We quantified the differences between linked and unlinked patients via standardised differences, using the conventional definition of meaningful differences as values greater than 0.1.23 Standardised differences compare the difference in means in units of the pooled SD and, unlike p values, are not influenced by sample size. Thus, in the context of a large sample, such as the current study, standardised differences are the preferred tool to identify meaningful differences in covariates.
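A minimal Python sketch of the standardised difference follows, using the common formulation in which the pooled SD is the square root of the average of the two group variances. The function names are hypothetical and this formulation is an assumption, since the text does not spell out the exact expression used.

```python
import math


def std_diff_continuous(mean_1, sd_1, mean_2, sd_2):
    """Difference in means in units of the pooled SD."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


def std_diff_binary(p_1, p_2):
    """Standardised difference for two proportions."""
    pooled_sd = math.sqrt((p_1 * (1 - p_1) + p_2 * (1 - p_2)) / 2)
    return (p_1 - p_2) / pooled_sd


def is_meaningful(std_diff, threshold=0.1):
    """Apply the conventional 0.1 cut-off to the absolute value."""
    return abs(std_diff) > threshold
```

Because neither function takes the sample size as an input, the result does not shrink or grow with the number of patients, which is the property that motivates its use in large cohorts.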
To describe characteristics of linked patients who had an ischaemic stroke through PCNASP and claims-based longitudinal information, we evaluated frequencies and percentages for binary variables; and means (SD) and medians (IQR) for continuous variables. For the information captured by both PCNASP and claims within the linked population, we evaluated the presence of any meaningful discordance via McNemar test for paired nominal data and also provided absolute per cent differences. Percentages, means and medians for PCNASP variables were calculated using only the data of stroke hospitalisations that occurred in the time periods for which the specific information was collected (online supplemental material table 1).
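For paired binary data such as a condition recorded (yes or no) in both PCNASP and claims for the same patient, the McNemar test depends only on the discordant pairs. The sketch below is a generic illustration, assuming the uncorrected chi-square version with 1 degree of freedom; whether the authors applied a continuity correction or an exact version is not stated.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction).

    b: patients positive in source A only; c: patients positive in source B only.
    Returns (statistic, p_value).
    """
    if b + c == 0:
        return 0.0, 1.0
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def absolute_percent_difference(n_positive_a, n_positive_b, n_total):
    """Absolute difference between the two sources' prevalences, in percentage points."""
    return abs(n_positive_a - n_positive_b) / n_total * 100
```

Concordant pairs do not enter the test statistic, so two sources can agree for most patients and still show a significant systematic difference when disagreements run mostly in one direction.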
Within the linked study population, we also identified a subset of patients who were directly discharged home after the index stroke hospitalisation, in order to reliably measure dispensing of filled prescription medications after the stroke hospitalisation in claims data, since patients do not have individual pharmacy claims while in a postacute care facility. In this population, we compared the frequency of antihypertensive and statin treatment prescribed at discharge based on information from the PCNASP, with the frequency of filled prescriptions for antihypertensive and statin medications in the 90 days after discharge from the same hospitalisation.
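The comparison between discharge prescribing and postdischarge dispensing can be sketched in Python as follows. The record layout and function names are hypothetical, and treating both the discharge day and day 90 as inside the window is an assumption.

```python
from datetime import date, timedelta


def filled_within_window(discharge_date, fill_dates, window_days=90):
    """True if any pharmacy fill falls between discharge and discharge + window_days."""
    window_end = discharge_date + timedelta(days=window_days)
    return any(discharge_date <= fill <= window_end for fill in fill_dates)


def discharge_vs_fill(patients, window_days=90):
    """Return (proportion prescribed at discharge, proportion with a fill in the window)."""
    if not patients:
        return None
    n = len(patients)
    prescribed = sum(1 for p in patients if p["prescribed_at_discharge"])
    filled = sum(
        1
        for p in patients
        if filled_within_window(p["discharge_date"], p["fill_dates"], window_days)
    )
    return prescribed / n, filled / n
```

Both proportions share the same denominator (all patients discharged home), so the gap between them reflects prescriptions recorded in the registry that have no corresponding dispensing claim.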
In analyses stratified by age (<65 and ≥65 years) and by coverage (commercial and Medicare Advantage), we evaluated the representativeness of the linked population with stroke, described patients’ characteristics as captured by PCNASP and by longitudinal claims, and compared the frequency of antihypertensive and statin treatment prescribed at discharge based on information from the PCNASP, with the frequency of filled prescriptions for antihypertensive and statin medications in the 90 days after discharge from the same hospitalisation.
There were 32 991 571 patients contained in the claims dataset and 574 586 hospitalisations in PCNASP from 2008 to 2015. After applying a strict linkage rule based on exact matching of linking fields (table 1, linkage step 5), we successfully linked 10 079 hospitalisations among 9548 unique patients in Optum to hospitalisations in the PCNASP.
When we further restricted Optum linked and unlinked patients to those with a primary discharge diagnosis of ischaemic stroke during the index hospitalisation and with six or more months of continuous enrollment prior to the hospital admission, there were 5644 linked and 98 896 unlinked patients in claims data, that is, information from the PCNASP was available for 5.4% of Optum patients who had an ischaemic stroke (figure 1). After the linkage, most of the data derived from acute care hospitalisations in Georgia, North Carolina, Ohio and Minnesota.
Claims-based patient characteristics between enrollees for whom PCNASP information was available (linked enrollees) versus not available (unlinked enrollees) were balanced, with most standardised differences <0.1 (table 2).
In both cohorts within claims data, PCNASP-linked and unlinked patients had similar gender distribution (approximately 50% women), burden of comorbidities as measured by the combined comorbidity index, a claims-based score with lower values associated with lower mortality risk and higher values associated with higher mortality risk,24 and medication use prior to the hospitalisation. Minor imbalances were noted for a few characteristics. Compared with patients without PCNASP linkage, PCNASP-linked patients in Optum were slightly younger (69.2 vs 72.5 years), had a higher number of physician visits at baseline and a higher baseline total cholesterol and haemoglobin A1c level, and were more frequently discharged to home healthcare. During the 6 months prior to the stroke hospitalisation, prescriptions were filled for an antihypertensive in over 60% of patients, a statin in ~35% and an anticoagulant or antiplatelet in ~10%. In addition, measures of healthcare utilisation, baseline laboratory test results and characteristics of the index stroke hospitalisation were similar, with ~20% of patients experiencing a hospitalisation in the prior 6 months, of ~6 days length. Claims-based patient characteristics between PCNASP-linked and unlinked patients were also balanced with most standardised differences <0.1, when we stratified by age and coverage type (online supplemental material tables 2 and 3).
When we explored the concordance of baseline medical history in the PCNASP-linked patients between the claims and PCNASP sources, we found discordance was overall low and identified only a few variables for which discordance was higher as measured by McNemar test and absolute per cent differences (table 3).
In particular, the prevalence of history of diabetes mellitus, carotid stenosis, peripheral vascular disease, congestive heart failure and depression was higher in claims. Conversely, the prevalence of history of atrial fibrillation, obesity and cigarette smoking was higher in the PCNASP compared with claims, consistent with expected under-recording practice for these variables in claims data. Stroke severity was often mild or moderate (ie, mean initial NIHSS was 6.2 with a median of 3), and 50.4% of patients were able to ambulate independently with or without a device at discharge. When we stratified by age and coverage type (online supplemental material tables 4 and 5), findings remained consistent, except for a higher prevalence of hypertension and chronic kidney disease in the PCNASP compared with claims among patients younger than 65 and among commercially insured patients.
Patients filled prescriptions for antihypertensive and lipid-lowering medications in the 90-day period prior to the stroke hospitalisation less frequently than they reported these medications as being currently taken at admission (figure 2). Among PCNASP-linked patients discharged home, antihypertensive treatment was prescribed at discharge in 81.6% of cases; however, claims for prescriptions filled post discharge were found in only 61.9%. Similar patterns were observed for statin treatment (prescribed at discharge in 84.7% of cases and filled in the first 90 days post discharge in 61.1% of cases). Findings were similar when we stratified by age and coverage type (online supplemental material figures 1 and 2).
In a large cohort of patients hospitalised with ischaemic stroke, we found that a reliable linkage between the PCNASP, a rich inpatient stroke registry, and commercial claims data using indirect identifiers was feasible, and permitted combining detailed disease markers of the acute stroke care episode with longitudinal postdischarge care in a vulnerable population of patients at high risk of recurrent stroke. Despite small differences in a few variables, the PCNASP-claims linked subset appeared to be overall representative of the general claims-based population with stroke. Detailed data on stroke severity (NIHSS), and ambulatory status at discharge were available for most linked patients in the PCNASP, providing crucial information for predictors and risk adjustment in clinical and pharmacoepidemiological analyses addressing the postdischarge period. Medication information on antihypertensive and lipid-lowering treatments from the PCNASP was discordant with claims-based drug utilisation patterns outside of the hospital, with reported use of medications on admission and prescriptions at discharge largely overestimating the real-world use of medications as measured by filled prescriptions, which highlights the role of a longitudinal framework based on dispensing information from claims to accurately assess the use of medications outside of the hospital. Use of medications as measured by drug dispensing in claims is known to strongly correlate with medication use by patients, in contrast to prescribing information, typically captured in medical records, which may overestimate medication use by patients who fail to fill their prescriptions, resulting in substantial bias.25
Real World Evidence (RWE), the understanding of causal treatment effects from electronic data generated by the routine provision of care, has gained much attention from regulators, payers and physician groups.26–30 RWE is thought to complement essential evidence on the efficacy of medications that we gain from RCTs, by providing information on their safety and effectiveness in clinical practice.5 6 Generated evidence needs to be internally valid and generalisable to an identifiable target population in order to be actionable.31–33 A valid evaluation of patients who had a stroke in routine care settings requires the joint contribution of data sources that can ensure access to large populations with complete longitudinal healthcare information and rich clinical descriptors. Our study provides evidence that the creation of such a research framework is feasible and can provide a valid platform in which to address a range of clinical and pharmacoepidemiological research questions. These data resources can help understand why in some individuals evidence-based treatments fail to prevent recurrent events and can help identify during the hospitalisation those at greatest risk for non-adherence in the time period after discharge. The observation of important differences between reported or prescribing information and out-of-hospital medication dispensing for two mainstay treatments for stroke prevention highlights the challenges of assuming medication adherence subsequent to acute stroke hospitalisation based on discharge prescriptions.
These findings are in line with results from previous studies assessing primary medication non-adherence in the USA, with estimated pooled primary non-adherence rates of 16% for antihypertensive medications and 25% for lipid-lowering medications,34 and with rates as high as 28% for both classes in the primary care setting.35 Our findings are also consistent with prior studies showing that more than half of patients stop taking their prescribed secondary prevention medications, including antihypertensive and lipid-lowering medication, 1–2 years after an incident stroke.36–39 In a study evaluating 1-year self-report of persistence and adherence to medications prescribed to patients after stroke discharge, up to one-third of patients who had a stroke discontinued one or more secondary prevention medications within 1 year of hospital discharge.37 Of note, self-discontinuation of medications was uncommon, and several potentially modifiable patient, provider, and system-level factors associated with persistence and adherence may be targets for future interventions. The proposed research framework can hopefully help identify modifiable elements that can be the basis for taking quality improvement interventions to the next level, such as intensive education, remediation of social determinants of health, and better coordination with primary care after discharge. This framework will also provide the opportunity to study stroke care during readmissions as well as the use of other major medication classes, for example, antiplatelets, anticoagulants and glucose-lowering medications.
Another relevant example of a successful large-scale linkage between administrative data and alternative data sources with additional clinical insight on patients who had a stroke is the linkage between Medicare fee-for-service part A and B claims and the in-hospital Get With The Guidelines (GWTG)—Stroke Registry.20 40 This enriched data source allowed investigators to follow Medicare fee-for-service beneficiaries who were linked to the GWTG-Stroke Registry for a range of claims-based clinical outcomes after acute ischaemic stroke hospitalisations.11–18 However, even in this important example, out-of-hospital medication information was limited to the documentation of treatment reported at hospital admission and drug prescriptions at discharge as recorded in the GWTG-Stroke Registry, without longitudinal data on prescriptions actively filled by the patient prior or subsequent to the stroke hospitalisation.11 12 With the exception of over-the-counter medications and self-pay, our study could rely on complete information on filled prescriptions outside of the hospital. Another strength of this study is that we were able to provide information on a population enrolled in commercial or Medicare Advantage plans, which is a crucial complementary data source to those traditionally available from Medicare fee-for-service, Medicaid and Veterans Affairs.
Our study has limitations. First, as the data from claims and the PCNASP were deidentified in accordance with the data use agreement of these data sources, we could not confirm our linkage through the use of personal identifiers. However, our linkage was built on an established strategy successfully implemented in a similar setting, which demonstrated the feasibility of a reliable linkage between claims and an inpatient registry.20 Second, information from the PCNASP was only available for 5.4% of Optum patients who had an ischaemic stroke. This is not evidence of poor performance of the linkage strategy, but is likely driven by (1) the complete absence of information from US states represented in the Optum database but not participating in the PCNASP, (2) the limited presence of Optum enrollees in some of the US states that mostly contributed to the PCNASP registry during the study period and (3) the limited participation of hospitals in the PCNASP registry in some of the US states mostly represented in the Optum database. Nevertheless, we found that the Optum population with ischaemic stroke that was treated at hospitals participating in the PCNASP registry had similar characteristics to the remainder of the Optum population with ischaemic stroke that was treated at hospitals that did not participate in the PCNASP registry, suggesting that our findings likely apply to the broader Optum population with stroke. Similarly, information from the Optum database was only available for 1.7% of PCNASP patients, which could limit the generalisability of our findings to the remainder of the PCNASP population. This was not assessable as we did not have access to the unlinked PCNASP population.
Third, PCNASP-based information on several variables was characterised by a considerable amount of missingness (online supplemental material table 6), in particular for the variables collected only later in the study period (eg, obesity, chronic kidney disease, drug or alcohol abuse, depression). Fourth, claims data capture actual medication dispensing patterns and therefore provide a more complete and reliable approximation of medication use than electronic health records, which only capture prescriptions.25 However, claims data do not capture actual medication use, and still have limitations in measuring drug use in certain settings, for example, over-the-counter medication use. Lastly, claims do not provide information on the reasons patients are non-adherent or may discontinue medications, which could be due to financial hardship, forgetfulness or countermanding physician orders.
In conclusion, in a large cohort of patients who had a stroke, we found that a reliable linkage between the PCNASP and commercial claims data using indirect identifiers was feasible, representative and permitted combining detailed disease markers of the acute stroke care episode with reliable postdischarge longitudinal information in a vulnerable population of patients at high risk of recurrent stroke. This enriched data source will provide important insights into postdischarge evaluation of medication use and outcomes, ultimately improving the care of patients who had a stroke.
Patient consent for publication
The Institutional Review Boards of the Mass General Brigham and the Centers for Disease Control and Prevention approved the study (#2014P002184). A signed data use agreement was in place. Informed consent was waived for study participants.
We thank Moa Lee and Julie A. Barberio for their contribution to early stages of this research.
Contributors EP, SS and LHS were involved in all parts of the study. MGG, XT, JMF and LMVRM were involved in designing the study and revising the manuscript. PA and HM were involved in data analysis and revising the manuscript. EP, SS and LHS are the guarantors.
Funding This study was funded by the Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA. EP was supported by a career development grant K08AG055670 from the National Institute on Aging. LM was supported by a career development grant 5K08AG05338002 from the National Institute on Aging.
Disclaimer The findings and conclusions in this report are those of the authors and do not necessarily represent the opinion of the Centers for Disease Control and Prevention.
Competing interests EP is co-investigator of an investigator-initiated grant to the Brigham and Women’s Hospital from Boehringer-Ingelheim, not related to the topic of the submitted work. SS is the principal investigator of investigator-initiated grants to the Brigham and Women’s Hospital from Bayer, Vertex, and Boehringer Ingelheim unrelated to the topic of this study. He is a consultant to WHISCON and to Aetion, a software manufacturer of which he owns equity. His interests were declared, reviewed, and approved by the Brigham and Women’s Hospital and Partners HealthCare System in accordance with their institutional compliance policies. LHS is principal investigator of NIH-NINDS grants (U24NS107243), co-investigator of other grants (PCORI R-1609-35995, NIH-NINDS R01NS111952, NIH R01AG062770), and reports the following relationships relevant to research grants or companies that manufacture products for thrombolysis or thrombectomy even if the interaction involves non-thrombolysis products: scientific consultant regarding trial design and conduct to Genentech (late window thrombolysis) and member of the steering committee (TIMELESS NCT03785678); consultant on user interface design and usability to LifeImage; stroke systems of care consultant to the Massachusetts Dept of Public Health; member of a Data Safety Monitoring Board (DSMB) for Penumbra (MIND NCT03342664); serving as National PI for Medtronic (Stroke AF NCT02700945); National Co-PI, late window thrombolysis trial, NINDS (P50NS051343, MR WITNESS NCT01282242). LMVRM is principal investigator of an NIH-NIA grant (5K08AG053380-02), principal investigator of an investigator-initiated grant sponsored by the Epilepsy Foundation of America (60300-EFA-PCO-000-19-01), co-principal investigator of a CDC grant (SIP 20-007), co-investigator of other grants (NIH-NIA 5R01AG062282-02, NIH-NIA 2P01AG032952-11, NIH-NIA 3R01AG062282-03S1), and reports no conflict of interest.
Provenance and peer review Not commissioned; externally peer reviewed.