Our aim was to describe an administrative database as a source for estimating the epidemiological characteristics of a rheumatic condition. For this purpose, we used the example of SLE prevalence assessment in the database of the Estonian Health Insurance Fund (EHIF).
The EHIF database is in a favorable position for retrieving reliable estimates in epidemiological studies. The completeness of the data is secured by the homogeneity of the health care and insurance system, which covers a small population. The “fee-for-service” billing principle supports complete capture of HCP activities [9]. The data transmission design, in which data entered by a physician are transferred to the billing claims without re-entry, avoids errors caused by repeated data processing by nonmedical personnel [9, 10]. Real-time data transmission with inbuilt quality checks provides researchers with cleaned, up-to-date data.
Like every database created mainly for administrative purposes, the EHIF database has limitations with regard to epidemiological research. The EHIF does not distinguish between referral and final clinical diagnoses, which may introduce a considerable number of false positive diagnostic codes into the database. Moreover, for conditions with a diagnostic process as complicated as that of SLE, the initial diagnosis may be revised as the disease evolves over time [4]. Because the EHIF database lacks detailed clinical data, diagnoses must be ascertained using data sources that contain the information needed to assess the validity of the coded diagnosis. HCP electronic databases can be used for this purpose. In Estonia, the search for clinical records is facilitated by the limited number of structurally similar HCP electronic database versions in use. We chose to contact GPs by mail to speed up data collection, as the GPs could query their databases simultaneously; the more time-consuming alternative of having the researchers review the GPs’ databases would presumably have yielded analogous results.
Our study design matched approach 2b described by Widdifield and colleagues [4]: patients were sampled from the administrative database (EHIF) by the presence of diagnosis codes and were classified as true or false positive cases by the reference standard (HCP electronic databases). This approach precludes identification of false and true negative cases and hence calculation of the database’s sensitivity and specificity. PPV, the proportion of people with the code who truly have the disease, can be estimated from the identified true and false positives. PPV is the statistic most commonly used to report code accuracy in validation studies of administrative databases [9, 11]. Although the use of PPV is limited in some research circumstances by its dependence on prevalence [9, 12], this characteristic should not preclude its use for demonstrating the accuracy of diagnosis code assignment in a particular predefined group during a fixed study period [13].
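The definition of PPV and its dependence on prevalence can be illustrated with a minimal Python sketch. The function names and all numeric inputs below are hypothetical and chosen for illustration only; they are not the study's counts, and the Wilson interval is one common choice of confidence interval rather than the method used in this study.

```python
from math import sqrt


def ppv_from_counts(true_pos, false_pos):
    """PPV = TP / (TP + FP): share of coded patients who truly have the disease."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("at least one coded case is required")
    return true_pos / total


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def ppv_from_accuracy(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity and prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts: 30 confirmed and 20 unconfirmed coded cases.
    tp, fp = 30, 20
    low, high = wilson_interval(tp, tp + fp)
    print(f"PPV = {ppv_from_counts(tp, fp):.2f} ({low:.2f} to {high:.2f})")

    # With sensitivity and specificity held fixed (hypothetical values),
    # PPV rises with prevalence, which is why it is hard to compare
    # PPV between populations.
    for prevalence in (0.001, 0.01, 0.1):
        print(prevalence, round(ppv_from_accuracy(0.9, 0.99, prevalence), 3))
```

Only the first function can be applied to a design like ours, because sampling by the presence of a code yields true and false positives but no negatives; the third function is included solely to show why PPV varies with prevalence.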
The proportion of false positive M32 diagnoses in the EHIF database (40%) was similar to the 43% reported by Bernatsky and colleagues for an administrative database in Nova Scotia, Canada [4]. However, such a general comparison may be of limited value for inference, since the accuracy of a diagnostic code depends on several factors. Besides the purpose for which the administrative database was created, the correctness of a code is greatly affected by the case ascertainment algorithm used in the study [3, 4]. To confirm an M32 diagnosis as true SLE, we used the opinion of experienced rheumatologists on the case’s fulfillment of the ACR criteria as the “gold standard”. Because it was based on a review of clinical documentation, this approach gave us access to data from the six-year period after the end of case enrollment. This allowed us to follow each patient’s progress over a longer time interval, which is valuable for complicated diagnoses. Our decision to regard as true SLE cases the individuals who were assigned M32 by a rheumatologist four or more times during the study period may have lowered our estimate of the false positive percentage to some degree. Yet our data revealed a decrease in the false positive proportion from about 90% to 60% to 30% among the cases coded M32 by a rheumatologist once, twice and three times, respectively. During the verification of a random sample of the M32 diagnoses assigned four or more times, only one, rather exceptional, false positive case was detected. The percentage of false positives can therefore be assumed to diminish further as the number of M32 assignments increases, eventually approaching zero. Our results corroborate the earlier administrative database research by Bernatsky, Widdifield and colleagues, which demonstrated the effect of physician specialty on the accuracy of the diagnosis of a rheumatic condition [4, 5].
The PPV of the M32 codes assigned by GPs and by specialists other than rheumatologists ranged from 15 to 20%. Among the rheumatologists’ diagnoses, the proportion of false positives decreased as the number of billing episodes increased, with PPV rising from 10% for codes assigned once to 70% for codes assigned three times during the study period.
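Since every sampled case is either a true or a false positive, the PPV in each stratum is simply the complement of the false positive proportion. A minimal Python sketch, using the approximate rheumatologist-stratum percentages quoted in this discussion (about 90%, 60% and 30%) and hypothetical variable and function names:

```python
# Approximate false positive proportions among M32 codes assigned by a
# rheumatologist once, twice and three times (rounded values from the text).
approx_false_positive = {1: 0.90, 2: 0.60, 3: 0.30}


def ppv_by_stratum(false_positive_by_count):
    """Return PPV = 1 - false positive proportion for each stratum."""
    return {
        count: 1.0 - proportion
        for count, proportion in false_positive_by_count.items()
    }


if __name__ == "__main__":
    for count, ppv in sorted(ppv_by_stratum(approx_false_positive).items()):
        print(f"M32 assigned {count} time(s): PPV about {ppv:.0%}")
```

The stratum with four or more assignments is left out because only a random sample of those cases was verified and no proportion is reported for it here.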
The false positive diagnoses assigned by GPs and by other specialists were predominantly referral diagnoses that were not subsequently confirmed by a rheumatologist. Similarly, the majority (about 70%) of false positive M32 codes assigned by rheumatologists turned out to be primary diagnoses that were not confirmed by further examination. These results support the finding of Bernatsky and colleagues that initial diagnoses are a major source of low PPV in administrative databases in the case of rheumatic conditions [4]. Given the evolving nature of SLE and our finding that many cases with an initial M32 diagnosis were later diagnosed with other systemic connective tissue diseases, it may be argued that reduced validity caused by tentative diagnoses is, and will remain, an intrinsic part of administrative database research on SLE epidemiology. A potentially avoidable cause of false positivity, coding error, accounted for a relatively small part of the decrease in PPV among rheumatologists and other specialists in our study. Coding errors made by GPs occurred mostly with conditions that have similar ICD codes (e.g. F32, H32, N32) and could presumably be attributed to the beginning of the study period, when prescriptions were still handwritten. Given that the digital prescription system, which was introduced to Estonian health care in 2010, is used almost exclusively today, the role of coding error as a reason for false positive M32 codes can be expected to have decreased.
Although SLE and syphilis may share clinical and laboratory features [14, 15], we would like to believe that a case of syphilis misdiagnosed as SLE by a rheumatologist over the course of a year is a regrettable exception. According to the Estonian Health Board (http://www.terviseamet.ee/en/information.html), there were 166 cases of early syphilis diagnosed in Estonia during 2006–2011. Notably, there were no misdiagnosed syphilis cases among the false positive M32 diagnoses assigned by rheumatologists one, two or three times. In our opinion, this supports the decision to treat the single misdiagnosed case as a highly uncommon occurrence. The case illustrates the importance of concentrating rheumatological care in centers with high-level diagnostic capabilities and accumulated knowledge and experience.
In our sample, the correctness of M32 code assignment did not depend on patients’ age or sex (logistic regression analysis, results not shown); these results contradict the findings of Bernatsky and colleagues of lower sensitivity of case definitions of systemic autoimmune diseases in billing data for older individuals [4].