Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study

<p>Abstract</p> <p>Background</p> <p>In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a...

Full description

Saved in:
Bibliographic Details
Main Authors: Gibson Diane (Author), Anderson Phil (Author), Karmel Rosemary (Author), Peut Ann (Author), Duckett Stephen (Author), Wells Yvonne (Author)
Format: Book
Published: BMC, 2010-02-01T00:00:00Z.
Subjects:
Online Access:Connect to this object online.
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:<p>Abstract</p> <p>Background</p> <p>In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services.</p> <p>Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases.</p> <p>Methods</p> <p>A stepwise deterministic record linkage algorithm was developed to link datasets using an SLK and several other variables. Three measures of likely match accuracy were used: the discriminating power of match key values, an estimated false match rate, and an estimated step-specific trade-off between true and false matches. The method was validated through examining link properties and clerical review of three samples of links.</p> <p>Results</p> <p>The deterministic algorithm resulted in up to an 11% increase in links compared with simple deterministic matching using an SLK. The links identified are of high quality: validation samples showed that less than 0.5% of links were false positives, and very few matches were made using non-unique match information (0.01%). There was a high degree of consistency in the characteristics of linked events.</p> <p>Conclusions</p> <p>The linkage strategy described in this paper has allowed the linking of multiple large aged care service datasets using a statistical linkage key while allowing for variation in its reporting. More widely, our deterministic algorithm, based on statistical properties of match keys, is a useful addition to the linker's toolkit. In particular, it may prove attractive when insufficient data are available for clerical review or follow-up, and the researcher has fewer options in relation to probabilistic linkage.</p>
Item Description:10.1186/1472-6963-10-41
1472-6963