The Norwegian Patient Register (NPR) receives administrative data on patients from hospitals and private contract specialists, and the Cancer Registry of Norway (CRN) records information on all cases of cancer and selected pre-stages of cancer. The Norwegian Patient Register contains codes for the conditions deemed to be correct and relevant for the treatment that has been provided to the patient. The Cancer Registry, on the other hand, contains detailed information on each case of cancer on the basis of data from several different sources (clinical and pathological reports, data from radiotherapy and death certificates) and has the most updated information available on each case. The Norwegian Patient Register can therefore be regarded as a treatment register and the Cancer Registry as a disease register. The difference between the two is reflected in the fact that the Norwegian Patient Register uses the term «diagnostic code», whereas the Cancer Registry uses «diagnosis».
The CRN has received administrative patient data from Norwegian hospitals since 2002. These data have been an important source for ensuring the best possible completeness of the register. In 2010 this arrangement was replaced by a joint transfer of data from the NPR to the CRN for all hospitals (1). The CRN contacts the hospitals concerned in cases where patients have received treatment for cancer but no report of the case has been sent to the CRN.
For it to be appropriate for the CRN to use data from the NPR as a basis to send such reminders, the quality of codes used by the latter must be high.
This study is the first in which individual-level data from the NPR have been validated against corresponding data from another central health registry. The purpose of the analyses presented here is to assess the correspondence between diagnostic codes registered in the NPR and diagnoses registered in the CRN for the 2008 calendar year. The project was restricted to the six most frequently occurring forms of cancer: colon cancer; cancer of the rectosigmoid junction, rectum, or anus; cancer of the lungs and the trachea; breast cancer; prostate cancer; and cancer of the bladder, ureter and urethra.
Material and method
The project has been implemented in accordance with the regulations of the Norwegian Patient Register (2) and of the Cancer Registry of Norway (3). All data exchange was undertaken through the Norwegian Health Net. Data linkages and analyses were undertaken with the aid of SPSS for Windows (Version 17).
The files of the NPR for 2008 were searched for data on all admissions (hospitalisations and outpatient contacts) in somatic hospitals where the ICD-10 codes C18 (colon cancer), C19-C21 (cancer of the rectum, rectosigmoid junction or anus), C33-C34 (cancer of the lungs or the trachea), C50 (breast cancer), C61 (prostate cancer) or C66-C68 (cancer of the bladder, ureter or urethra) were registered as the main or as an additional code. After deletion of records for which the personal identification number was not reported (n = 1 778), we were left with data on 251 674 admissions: 59 123 hospitalisations (stays in a ward or day treatment) (23.5 per cent) and 192 551 outpatient consultations (76.5 per cent).
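The selection rule described above can be illustrated with a minimal Python sketch. The authors worked in SPSS, so this is not their code; the group labels and function names are hypothetical, and only the ICD-10 categories are taken from the text.

```python
from typing import Optional, Sequence

# ICD-10 three-character categories for the six cancer groups named in the text.
# The group labels (dictionary keys) are illustrative, not the authors' variable names.
CANCER_GROUPS = {
    "colon": {"C18"},
    "rectum_rectosigmoid_anus": {"C19", "C20", "C21"},
    "lung_trachea": {"C33", "C34"},
    "breast": {"C50"},
    "prostate": {"C61"},
    "bladder_ureter_urethra": {"C66", "C67", "C68"},
}


def cancer_group(icd10_code: str) -> Optional[str]:
    """Return the study group for an ICD-10 code, or None if it is outside the study."""
    # Normalise "c18.7" or "C187" to the three-character category "C18".
    category = icd10_code.strip().upper().replace(".", "")[:3]
    for group, categories in CANCER_GROUPS.items():
        if category in categories:
            return group
    return None


def admission_in_scope(main_code: str, additional_codes: Sequence[str]) -> bool:
    """True if a study code appears as the main code or as any additional code."""
    return any(cancer_group(code) is not None for code in [main_code, *additional_codes])
```

For example, an admission with main code N40 and additional code C61 would be selected, because the additional code falls in the prostate group.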
Data from the Cancer Registry comprised information on persons who had been registered with at least one case of illness (malignant diagnoses, basal-cell carcinomas of the skin, borderline neoplasms in the ovaries, ductal carcinomas in situ in the breast and pre-malignant or benign neoplasms in the bladder and the central nervous system) in 2008 or earlier, updated as of September 2010. Information on persons who had died or emigrated prior to 2006 was excluded from the database.
The NPR is an internally encrypted register where data on activities are never directly linked to personal identification numbers. To be able to link data from the NPR with data from the CRN, the personal identification numbers in the CRN were encrypted by the NPR according to standard procedures. The combined file therefore at no point contained readable personal identification numbers.
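The text does not describe the NPR's encryption procedure. As a rough illustration of the general idea only, the sketch below replaces identifiers with a keyed hash (HMAC-SHA-256), which is a stand-in technique and not the NPR's actual method. The key and the identification numbers are made up.

```python
import hashlib
import hmac


def pseudonymise(personal_id: str, secret_key: bytes) -> str:
    """Replace a personal identification number with a keyed SHA-256 digest."""
    return hmac.new(secret_key, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key and identifiers, for illustration only.
key = b"held-by-the-register-only"
npr_id = pseudonymise("01010112345", key)    # identifier as it would appear in NPR data
crn_id = pseudonymise("01010112345", key)    # same person, identifier taken from CRN data
other_id = pseudonymise("02020212345", key)  # a different person
```

Because the same key is applied to both sources, records for the same person receive the same pseudonym and can be linked without the readable number appearing in the combined file.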
In the NPR, the unit of registration is «admission», and several admissions will usually be registered for each patient who has received treatment for cancer. In the CRN, the corresponding unit is «case of cancer», and some patients will have multiple registrations. We therefore started by reducing the data set from the NPR to one line per patient, retaining information on the number of times the relevant codes had been recorded for hospitalisations and outpatient consultations. Similarly the data set from the CRN was reduced to one line per patient, and information on all cases of cancer (ICD-10 codes) with the appurtenant date of diagnosis was retained. Subsequently these two data sets were linked to form our analysis file.
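The reduction and linkage steps can be sketched as follows in plain Python. This is an illustration under assumed record layouts, not the authors' SPSS procedure; the tuple formats, function names and sample records are all hypothetical.

```python
from collections import Counter, defaultdict


def collapse_npr(admissions):
    """One entry per patient: how often each code was recorded, by contact type.

    `admissions` is an iterable of (patient_id, contact_type, icd10_code) tuples,
    where contact_type is "hospitalisation" or "outpatient" (assumed layout).
    """
    per_patient = defaultdict(Counter)
    for patient_id, contact_type, code in admissions:
        per_patient[patient_id][(code[:3], contact_type)] += 1
    return dict(per_patient)


def collapse_crn(cases):
    """One entry per patient: every registered case as (icd10_code, date_of_diagnosis)."""
    per_patient = defaultdict(list)
    for patient_id, code, diagnosis_date in cases:
        per_patient[patient_id].append((code[:3], diagnosis_date))
    return dict(per_patient)


def link(npr, crn):
    """Keep every NPR patient and attach that patient's CRN cases, if any."""
    return {pid: {"npr": counts, "crn": crn.get(pid, [])} for pid, counts in npr.items()}


# Made-up records for two patients.
admissions = [
    ("p1", "hospitalisation", "C18"),
    ("p1", "outpatient", "C18"),
    ("p1", "outpatient", "C18"),
    ("p2", "outpatient", "C61"),
]
cases = [("p1", "C20", "2008-03-14"), ("p1", "C44", "2001-06-02")]
analysis_file = link(collapse_npr(admissions), collapse_crn(cases))
```

Patient p2 has no CRN record and therefore ends up with an empty list of cases, which is what later allows such patients to be counted as absent from the CRN.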
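The patients were grouped into the following three categories, corresponding to the columns of Table 1: registered in the CRN with the same diagnosis as in the NPR, registered in the CRN with a different diagnosis, and not registered in the CRN.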
Furthermore, we noted whether the data from the CRN showed that the patient had been diagnosed in the same year (2008) or earlier. Finally we investigated which diagnoses had been registered in cases where the condition from the NPR did not correspond to the diagnosis listed in the CRN.
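A minimal sketch of the grouping step, taking the three categories to be those that appear as columns in Table 1. The rule that any matching CRN case counts as "same diagnosis" for a patient with several cases is an assumption, since the text does not say how such patients were handled. Function names are hypothetical.

```python
def classify(npr_group_codes, crn_codes):
    """Compare the NPR code group of a patient with that patient's CRN diagnoses.

    Assumption: a patient with several CRN cases counts as "same diagnosis"
    if at least one of them falls in the NPR code group.
    """
    crn_categories = {code[:3] for code in crn_codes}
    if not crn_categories:
        return "not in CRN"
    if crn_categories & set(npr_group_codes):
        return "same diagnosis"
    return "different diagnosis"


def diagnosed_in_year(diagnosis_dates, year=2008):
    """True if any ISO-formatted date of diagnosis (YYYY-MM-DD) falls in `year`."""
    return any(date.startswith(str(year)) for date in diagnosis_dates)
```

With these definitions, `classify({"C18"}, ["C20"])` yields "different diagnosis", the pattern reported most often for colon cancer.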
The vast majority of the patients who had been registered with one of the six most frequently occurring forms of cancer in the NPR were also registered with at least one case of illness in the CRN (Table 1). The degree of correspondence between the diagnostic code in the NPR and the diagnosis recorded by the CRN varied from 81 per cent for patients with colon cancer to 97 per cent for patients with prostate cancer. Moreover, we found that the degree of correspondence increased with the number of admissions with the relevant diagnostic code recorded by the NPR as well as with increased age of the patient (data not shown).
Table 1: Registrations in the CRN for patients registered in the NPR in 2008

| Diagnostic code in the NPR | Number of patients (NPR) | In the CRN, same diagnosis, number (%) | In the CRN, different diagnosis, number (%) | Not in the CRN, number (%) |
| --- | --- | --- | --- | --- |
| Colon cancer (C18) | 6 437 | 5 207 (81) | 1 029 (16) | (3.2) |
| Cancer of the rectosigmoid junction, rectum, or anus (C19-C21) | 4 246 | 3 470 (82) | 639 (15.1) | 137 (3.3) |
| Cancer of the lungs or the trachea (C33-C34) | 5 611 | 5 053 (90) | 431 (7.7) | (2.4) |
| Breast cancer (C50), women | 12 875 | 12 111 (94) | 624 (4.8) | 140 (1.1) |
| Prostate cancer (C61), men | 16 907 | 16 330 (97) | 264 (1.6) | 313 (1.9) |
| Cancer of the bladder, ureter or urethra (C66-C68) | 7 188 | 6 701 (93) | 295 (4.1) | 192 (2.7) |
Diagnoses in the CRN in the case of non-correspondence
In the following we will describe the diagnoses that had been registered in the CRN in cases where there was no correspondence between the two registers.
A total of 1 029 (16.0 per cent) of 6 437 patients registered with C18 colon cancer in the NPR were registered with another diagnosis in the CRN. Of these, the majority (n = 728, 70.7 per cent) were registered with other cancer diagnoses related to the gastrointestinal system (C16 stomach (n = 34), C17 the small intestine (n = 30), C19 the rectosigmoid junction (n = 200) and C20 the rectum (n = 464)). Altogether 3.2 per cent of the patients had no record in the CRN.
Of all the patients who were registered in the NPR with C19-C21 (cancer of the rectosigmoid junction, rectum, or anus), 639 out of 4 246 (15.1 per cent) were registered with a different diagnosis in the CRN. Most of them were registered with colon cancer (C18, n = 466, 72.9 per cent). Altogether 137 of the patients registered with C19-C21 in the NPR had no record in the CRN (3.3 per cent).
A total of 431 out of 5 611 patients who were registered in the NPR with C33 or C34, cancer of the lungs or the trachea were registered with another diagnosis in the CRN (7.7 per cent). Among these, altogether 63 were registered in the CRN with the diagnosis C80, malignant neoplasm without specification of site, while 45 were registered with C45, mesothelioma, and 39 were registered with C43, malignant melanoma of the skin. Of all patients registered with C33-C34 in the NPR, in total, 2.4 per cent had no record in the CRN.
Of all women registered with C50 breast cancer in the NPR, 624 out of 12 875 were registered with another diagnosis in the CRN (4.8 per cent). Among these, a total of 449 were registered with diagnostic code D05, carcinoma in situ of breast. The others were distributed across a wide range of diagnoses. Altogether 140 women with C50 in the NPR were not registered in the CRN (1.1 per cent).
Of a total of 16 907 men registered with C61 prostate cancer in the NPR there were 264 (1.6 per cent) who had another diagnosis in the CRN. Among these, 61 were registered with C67 malignant neoplasm of bladder. In addition, many different diagnoses were recorded for these patients. A total of 313 men registered with prostate cancer in the NPR had no registration in the CRN (1.9 per cent).
Of a total of 7 188 patients registered in the NPR with C66-C68, cancer of the bladder, ureter and urethra, altogether 295 (4.1 per cent) had been recorded with another diagnosis in the CRN – 236 men and 59 women. In total, 117 of these men had been registered with the diagnosis C61, malignant neoplasm of prostate (prostate cancer), while 22 were registered with C65, malignant neoplasm of renal pelvis, and 13 were recorded with C64, malignant neoplasm of kidneys, except renal pelvis. A total of 11 women registered in the NPR with the diagnostic code C66-C68 were recorded with the diagnosis C64 in the CRN, but no other diagnoses were especially frequent. A total of 192 patients registered with C66-C68 in the NPR were not found in the CRN’s records (2.7 per cent).
Correspondence in the number of patients in the two registers
Since the Cancer Registry of Norway is being continuously updated, the number of patients will vary as new information is added. The CRN’s updated statistics of new cases of cancer in Norway in 2008 show that 1 165 men and 1 263 women were diagnosed with colon cancer for the first time during that year (1). These figures are somewhat higher than those presented in our article. We found similar results for cancer of the rectosigmoid junction, the rectum and the anus, cancer of the lungs and the trachea, breast cancer and cancer of the bladder, ureter and urethra (Figure 1).
Figure 1: Proportion of new cases of cancer in 2008 (1) with a corresponding code in the Norwegian Patient Register
For prostate cancer, however, the situation was different. According to updated figures from the CRN, a total of 4 409 new cases of prostate cancer were registered in 2008 (4), while we found only 3 336 men who were registered with this diagnostic code in the NPR and diagnosis year 2008 from the CRN, which is approximately three-fourths of what would be expected (Figure 1). The data from the CRN used for the current analysis included 4 284 patients with prostate cancer that had been diagnosed in 2008. We retrieved data on all admissions to somatic hospitals in 2008 registered by the NPR for these patients, irrespective of their diagnostic codes, and found that 4 054 of 4 284 patients were registered as having been treated in Norwegian hospitals during that year. A total of 230 patients were thus not registered in the hospitals at all, whereas 718 were registered only with diagnostic codes other than C61, prostate cancer. Of these 718 patients, a total of 436 were registered with at least one of the following diagnostic codes: C67 malignant neoplasm of bladder (n = 41), N40 hyperplasia of prostate (n = 245), Z125 special screening examination for malignant neoplasm of the prostate (n = 116), D40 neoplasm of uncertain or unknown behaviour of male genital organs (n = 10), Z031 observation for suspected malignant neoplasm (n = 24).
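The two headline figures in this paragraph follow directly from the counts given. The short Python check below uses only numbers stated in the text; the variable names are illustrative.

```python
new_cases_crn_2008 = 4409     # updated CRN figure for new prostate cancer cases in 2008
found_in_npr_with_c61 = 3336  # men with code C61 in the NPR and diagnosis year 2008 in the CRN
crn_patients_2008 = 4284      # prostate cancer patients diagnosed in 2008 in the analysis data
with_any_admission = 4054     # of these, patients with any somatic hospital admission in 2008

coverage = found_in_npr_with_c61 / new_cases_crn_2008
no_hospital_contact = crn_patients_2008 - with_any_admission

print(f"{coverage:.1%}")       # 75.7%, i.e. roughly three-fourths
print(no_hospital_contact)     # 230
```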
We compared the diagnostic codes registered by the Norwegian Patient Register with the diagnoses retrieved from the Cancer Registry of Norway for the six most frequently occurring forms of cancer. The strength of our study rests on the fact that we have had access to complete annual data on hospital admissions for 2008 for which one of the relevant diagnostic codes had been registered (Norwegian Patient Register) as well as to complete and quality-controlled data of all new cases of cancer (Cancer Registry of Norway). However, we have not been able to use data from private contract practitioners, since their reporting of personal identification numbers is incomplete.
We found that a small proportion of patients with the relevant diagnostic codes in the NPR were not registered by the CRN. Moreover, we found that the degree of correspondence between the diagnostic code in the NPR and the diagnosis in the CRN varied: 81 – 82 per cent for cancer of the gastrointestinal system, 90 per cent for cancer of the airways, 93 per cent for cancer of the urinary tract, 94 per cent for breast cancer and 97 per cent for prostate cancer. We have chosen to use patient, rather than admission, as the unit of our analyses. With admission as the unit of analysis, the degree of correspondence would have been higher than what is presented here (colon cancer 86 per cent, cancer of the rectosigmoid junction, rectum, or anus 89 per cent, cancer of the lungs or the trachea 96 per cent, breast cancer 98 per cent, prostate cancer 98 per cent and cancer of the bladder, ureter and urethra 96 per cent). This difference occurs because most cancer patients are registered with several admissions in the course of a year. The likelihood of errors in the diagnostic codes is higher for cases where the patient is registered with only one or very few such admissions. The effect of errors in the diagnostic codes is therefore more prominent when patients are used as the unit of analysis than when using admissions as the unit.
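A toy calculation can make the effect of the unit of analysis concrete. The five patients below are entirely made up and are not study data; they are chosen so that the mismatching patients have only one admission each, as the paragraph describes.

```python
# (patient_id, admissions with the study code, NPR code group matches a CRN diagnosis)
# Made-up numbers for illustration only.
patients = [
    ("p1", 12, True),
    ("p2", 9, True),
    ("p3", 7, True),
    ("p4", 1, False),
    ("p5", 1, False),
]

# Patient as the unit: every patient counts once.
patient_level = sum(match for _, _, match in patients) / len(patients)

# Admission as the unit: every admission counts once, so patients with many
# admissions (who here all match) dominate the result.
total_admissions = sum(n for _, n, _ in patients)
admission_level = sum(n for _, n, match in patients if match) / total_admissions

print(f"patient level:   {patient_level:.0%}")    # 60%
print(f"admission level: {admission_level:.0%}")  # 93%
```

The same two miscoded patients lower the patient-level figure by 40 percentage points but the admission-level figure by only about 7, which is the direction of the difference reported above.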
The coding quality of the NPR was investigated on behalf of the Office of the Auditor General in 2003 and 2008, by comparing a random sample of 1 000 admissions with the information in the patient records (4, 5). The most recent audit revealed fairly substantial divergences between the main diagnostic codes reported to the NPR and the information in the patient records, with a total proportion of errors amounting to 36.2 per cent. This figure includes errors at all levels, from errors at the four-digit level in the ICD-10 system to entry of an incorrect chapter of the ICD-10. In our study we have seen that the correspondence between NPR and CRN data is considerably better than what might have been expected in light of this audit report. We believe that the results from such audits are unsuitable for drawing conclusions regarding the value of using NPR data for research purposes, or as in our study, for helping enhance the completeness of a different central health registry. When only the main diagnostic code is taken into account, admissions for which the auditors claim that the main and the additional diagnostic code ought to have been switched will also be classified as erroneously coded. In our material only one diagnostic code had been entered for 11 per cent of all hospitalisations and 51 per cent of all outpatient consultations (data not shown). If we had taken only the main diagnostic code into account, the value of the analyses would have been considerably reduced.
To gain an impression of the mechanisms that give rise to the divergences between the two registers we reviewed the reminders that the CRN had sent to the hospitals. In light of the responses from the hospitals, we can note three possible explanations for the lack of correspondence between the registers: imprecision in the specification of site (including imprecise coding of metastases); typing errors; and imprecision in the degree of malignancy. In the following we will provide some examples of how such mechanisms may help explain the differences between the registers.
The degree of correspondence between the registers was lowest with regard to cancer of the gastrointestinal system (C18 colon cancer and C19-C21 cancer of the rectosigmoid junction, rectum, or anus). In cases of divergence between the registers, a cancer had most often (in slightly more than 70 per cent of the cases) been registered in a different part of the gastrointestinal system. Here, imprecision in the location of the cancer may thus help explain part of the divergence between the registers. Of all patients registered with lung cancer in the NPR, altogether 8 per cent had been recorded with another diagnosis in the CRN. In several of these cases a malignant neoplasm without specification of site (C80) had been recorded by the CRN. In such cases, the lung cancer may be a metastasis and should then have been recorded with the code for secondary malignant neoplasm of lung (C78.0). Furthermore, a small proportion of the cases of lung cancer in the NPR had been registered with the diagnosis malignant melanoma of skin (C43) in the CRN, a form of cancer which in the vast majority of cases (99.5 per cent) has been morphologically verified (6). Such divergences may in some cases be due to simple typing errors (C34 instead of C43), although in other cases C34 lung cancer may be a metastasis from C43 malignant melanoma. Of the five per cent of all cases of breast cancer in the NPR registered with a different diagnosis in the CRN, close to three-fourths had been recorded as a pre-invasive carcinoma. This may indicate imprecision in the coding of the degree of malignancy.
A relatively small proportion (2.1 per cent) of the patients who had been registered with one of the relevant diagnostic codes in the NPR were absent from the records of the CRN. The mandatory reporting of cases of cancer, in combination with good routines for reporting as well as collection of cancer data, indicates that the CRN can be regarded as fairly complete. The data have been estimated to be near-complete for the period 2001 – 2005 (98.8 per cent) (6). In our study, the proportion of patients who were registered in the NPR, but not in the CRN, was highest for those who were registered with colon cancer and cancer of the rectosigmoid junction, the rectum and the anus. Changes in the CRN’s registration routines may provide a possible explanation for this observation. Beginning in 2008, cases of polyps and adenomas in the colon and rectum were no longer registered in the CRN’s main database, and were therefore excluded from the sample used in our study. Pre-malignant conditions erroneously coded as C18-C21 in data reported to the NPR would therefore not find a match in a linkage to CRN data.
With regard to colon cancer, cancer of the rectosigmoid junction, the rectum and the anus, cancer of the lungs and the trachea, breast cancer and cancer of the bladder, ureter and urethra, we found a very high degree of correspondence between the number of patients registered with the relevant diagnostic codes and diagnosis year 2008 from the CRN and the official statistics of the number of new cases in 2008 (1). This shows that the NPR’s records for 2008 contain data on the vast majority of patients diagnosed with one of these forms of cancer in that year.
As regards prostate cancer, the NPR had registered a considerably lower number of cases with a diagnosis year 2008 than what the incidence figures of the CRN would indicate. We also found that among the patients registered in the CRN with prostate cancer and diagnosis year 2008, approximately one in eight were not registered as treated in any hospital, or only with diagnostic codes that could not be associated with treatments related to this condition. The CRN’s figures for prostate cancer have been estimated to be 99.8 per cent complete for the period 2001 – 2005 (6). The registry’s data show that only a little more than one per cent of newly reported cases of prostate cancer in 2008 were based only on information from the death certificate (1). We therefore believe that there is little reason to assume that deaths among patients who have received no previous treatment for this condition contribute significantly to the large divergence between the number of prostate-cancer patients in the NPR and the corresponding number in the CRN.
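The text does not spell out how "approximately one in eight" was derived. One reading that reproduces it from the counts reported in the results is sketched below: patients with no hospital registration at all, plus patients registered only with codes other than C61 and other than the five codes listed there (C67, N40, Z125, D40, Z031). This is an interpretation, and the variable names are illustrative.

```python
crn_patients_2008 = 4284          # prostate cancer patients diagnosed in 2008 (CRN)
no_hospital_contact = 4284 - 4054 # 230 patients with no hospital registration in 2008
only_other_codes = 718            # registered only with codes other than C61
listed_codes = 436                # of these, with C67, N40, Z125, D40 or Z031

# Patients whose only codes were neither C61 nor one of the listed codes.
unrelated_codes_only = only_other_codes - listed_codes

share = (no_hospital_contact + unrelated_codes_only) / crn_patients_2008
print(unrelated_codes_only)   # 282
print(f"{share:.1%}")         # 12.0%, close to one in eight (12.5%)
```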
A previous study (data from the period 1957 – 1986), based on information from the CRN and the hospitals’ systems for patient administration and patient records, showed that there were major shortcomings in the registration of cases of prostate cancer in the hospitals’ lists (7). These data, however, stem from a period quite far back in time. In the current situation with performance-based funding (introduced in 1997), we find it unlikely that deficient registration of admissions can be the cause of the divergence between the registers in terms of prostate-cancer patients. The number of cases of prostate cancer has increased considerably during the last decade (8), while the use of PSA testing has also increased. Of the new cases in 2008, a total of 8 per cent had been reported by private practitioners (8), while CRN data show that 9 per cent of the total number of patients had been treated exclusively in locations other than the public hospitals (data not shown). Private contract specialists also report to the NPR, but personal identification numbers are often missing in these reports. We thus had no opportunity to use data reported by the private contract specialists for investigating the follow-up of patients with prostate cancer by this group of therapists.
Since a substantial proportion of the cases of prostate cancer are treated outside the public hospitals, caution is required when NPR data on this group of patients are used for studies. We believe that this situation is likely to improve if a more complete reporting of personal identification numbers from the private contract specialists can be achieved.
The methodology used for our study can also be used to compare data from the NPR with data from other registers. The NPR has recently established similar cooperation with two other health registers, the Medical Birth Registry of Norway and the Norwegian Surveillance System for Communicable Diseases (MSIS). It has been pointed out that personal identification of data in the NPR will be of major importance for a more reliable assessment of the completeness of the quality registers (9). It is worth noting, however, that patient consent must comprise this purpose, and that any supply of data on identifiable individuals will require that the disease or quality register concerned has a legally established right to receive information from or send information to the NPR.
Recently a review of articles (n = 132) elucidating the validity of diagnostic information in the Swedish National Inpatient Register («Slutenvårdsregisteret») was published. This register contains data on identifiable individuals from the period since 1984 (10). Most of the articles included were smaller-scale studies that compared the information in the register with information from patient records. The degree of correspondence was mostly in the range 85 – 95 per cent. However, the review did not include any articles investigating the validity of the diagnostic codes for cancer. In Sweden it has also been shown that correlating data from the patient register with data from the register of causes of death produced important information on the quality of the reporting of causes of death (11).
Whereas the Danish Cancer Registry has collected data from the National Patient Register since 1987 (12), the CRN is currently starting to remind hospitals to send clinical data for patients treated in 2010, on the basis of data the CRN has received from the NPR. We foresee that in the long term, the CRN’s routines for sending reminders will also have an impact on the quality of the data in the NPR.
Feedback loops are now leading back to the hospitals from the CRN, based on the data the hospitals have reported to the NPR. If the hospitals frequently need to report to the CRN that the administrative patient data are incorrect, this may gradually help increase the quality of the reported data. The CRN will also report any divergences to the NPR, so that the latter may have the opportunity to review its control routines.
Our findings show that with regard to the six most frequently occurring forms of cancer there is a relatively high degree of correspondence between the relevant diagnostic codes in the Norwegian Patient Register and the diagnoses recorded by the Cancer Registry of Norway. We interpret this finding to indicate that the diagnostic codes in the NPR have high validity and that the data are suitable for the CRN to use as a basis for sending reminders to hospitals. However, no conclusions should be drawn regarding other forms of cancer or other conditions on the basis of the analyses presented here. In epidemiological cancer research, the quality-controlled data from the CRN should continue to be used. Information on the date of diagnosis is included only in the CRN, and CRN data are therefore the only source which can be used for studies of incidence, prevalence and survival of cancer. On the other hand, data from the NPR can be used to supplement CRN data, for example in studies of clinical pathways. We would expect that routine correlations of registry data could help improve the quality of both registers.