#### *5.2. Data Completeness*

Research and statistics are only ever as good as the data contained in the records from which they are extracted. Data accuracy in EHRs has been found to be variable [35,41], which is likely to affect research quality. In one recent Australian study, approximately 13% of probable diabetes cases did not have a coded diagnosis, and were identified through the presence of one or more other diabetes management indicators [45].

Bailie et al. (2015) identified difficulties in calculating denominators in patient data extracted from EHRs. Numerous reasons were given, including incomplete data entry, differing requirements and compatibility between EHRs and data extraction tools, and differences in the definition used for active or regular patients. The authors concluded that the inconsistencies identified limited the usefulness and reliability of the EHR data [46].
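The effect of differing 'active patient' definitions on a denominator can be illustrated with a minimal sketch. The visit data, the two definitions and the function name `active_patients` are all assumptions made for illustration; they are not taken from Bailie et al. or from any extraction tool.

```python
from datetime import date, timedelta

# Hypothetical visit history: patient id -> dates of attendance.
visits = {
    "p1": [date(2019, 1, 10), date(2019, 6, 2), date(2020, 3, 15)],
    "p2": [date(2020, 5, 1)],
    "p3": [date(2017, 2, 20)],
    "p4": [date(2019, 11, 5), date(2020, 1, 9)],
}

def active_patients(visits, as_of, min_visits, window_days):
    """Return ids of patients with at least min_visits in the window ending at as_of."""
    start = as_of - timedelta(days=window_days)
    return {
        pid
        for pid, dates in visits.items()
        if sum(start <= d <= as_of for d in dates) >= min_visits
    }

as_of = date(2020, 6, 30)

# Definition A (assumed): three or more visits in the previous 730 days.
strict = active_patients(visits, as_of, min_visits=3, window_days=730)
# Definition B (assumed): at least one visit in the previous 730 days.
loose = active_patients(visits, as_of, min_visits=1, window_days=730)

# Hypothetical numerator: patients flagged with the condition of interest.
cases = {"p1", "p4"}

for label, denominator in (("strict", strict), ("loose", loose)):
    rate = len(cases & denominator) / len(denominator)
    print(f"{label}: denominator={len(denominator)}, rate={rate:.2f}")
```

With the same records and the same numerator, the two definitions give different denominators and therefore different rates, which is the kind of inconsistency the authors describe.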

#### *5.3. The Medical Record as an 'Aide Memoir'*

The primary purpose of the EHR is to capture data that relate to the clinical care of the patient, not to obtain data for research purposes [47]. Henderson et al. (2019) suggest that time-poor GPs may only enter the data they regard as important for patient care, which may not always reflect the data that are important for research. This limits the usefulness of EHR data for research purposes [45].

The medical record has long been regarded as an 'aide memoir', or memory aid, rather than as a complete record of the patient's care. Even with the advent of EHRs, this association has continued. In a benchmarking study that examined the prevalence of diabetes using BEACH data and extracted data from one Australian EHR, the prevalence of diabetes was lower when using the extracted EHR data from the 'diagnosis' data element. However, the authors found that they could obtain a comparable prevalence estimate by identifying proxies that indicate the presence of diabetes (e.g., free text searches for diabetes in other parts of the record, medications used to treat diabetes, use of MBS item numbers only used in relation to diabetes). Importantly, the authors noted that this approach would be less reliable for other clinical conditions where proxy measures may not work [45]. Interestingly, MedicineInsight does not extract free text data, as it may contain identifiable information that could compromise privacy [44].
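The proxy-based identification described above can be sketched as follows. This is an illustrative sketch only, not the method used in the cited study: the record structure, the medication list and the billing item placeholders are assumptions, and a real analysis would rely on validated code sets.

```python
from dataclasses import dataclass, field

# Illustrative proxy lists only (assumptions, not validated code sets).
DIABETES_MEDICATIONS = {"metformin", "insulin glargine", "gliclazide"}
DIABETES_ONLY_ITEMS = {"ITEM_A", "ITEM_B"}  # placeholders, not real MBS item numbers

@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one extracted patient record."""
    patient_id: str
    coded_diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    billed_items: set = field(default_factory=set)
    free_text: str = ""

def diabetes_evidence(rec):
    """Return the sources in the record that indicate diabetes."""
    evidence = []
    if "diabetes" in rec.coded_diagnoses:
        evidence.append("coded_diagnosis")
    if rec.medications & DIABETES_MEDICATIONS:
        evidence.append("medication")
    if rec.billed_items & DIABETES_ONLY_ITEMS:
        evidence.append("billing_item")
    # Naive substring match: it ignores negation such as "no diabetes".
    if "diabetes" in rec.free_text.lower():
        evidence.append("free_text")
    return evidence

records = [
    PatientRecord("p1", coded_diagnoses={"diabetes"}),
    PatientRecord("p2", medications={"metformin"}),
    PatientRecord("p3", free_text="Family history of asthma"),
]

coded = [r.patient_id for r in records if "coded_diagnosis" in diabetes_evidence(r)]
probable = [r.patient_id for r in records if diabetes_evidence(r)]
print(f"coded cases: {coded}, probable cases (any evidence): {probable}")
```

In this toy data, patient `p2` has no coded diagnosis and is counted only when proxies are considered. If free text is not extracted, as with MedicineInsight, the `free_text` check would simply never contribute evidence.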

#### *5.4. Privacy and Information Protection*

The extraction of data from EHRs for statistical and research purposes usually involves the transfer of the exported patient data to a third party (e.g., government department or university researcher). Concerns have arisen in Australia about patient privacy and information protection [35,40,41]. The removal of information from extracted data that would identify a patient has been highlighted as being of primary importance to researchers [35,41,48], GPs [40,41] and other practice staff [41]. The need for independent governance oversight of programs that involve extracted EHR data has also been emphasized [35,48].

At present, most data extraction from general practice EHRs involves whole-of-practice data, where data are extracted about all patient encounters [44]. Concerns may arise if individual GPs within a practice are not willing to have data about their clinical activity included in a download, or when patients do not give permission for their data to be downloaded.

#### **6. A Fresh Approach**

We propose a new approach to improve the production of high-quality data about general practice clinical activity. This proposal is based on the following principles:


Building on the structure of the BEACH interface for active data collection, we propose developing a hybrid active + passive data collection based on data extraction from EHRs with subsequent data curation from GPs to review the quality of extracted data and complete gaps in the dataset. A specialized data extraction tool would be required to extract relevant data from the GP EHR. To circumvent problems experienced with current EHR data extractions, the GP would curate the data for completeness and validity. We propose that two data templates are required:


For each of these, minimum datasets based on a problem-oriented record structure with in-built coding and classification systems would be required for the purposes of data extraction, encryption and transfer to researchers, and subsequent data analysis.

Initially, these could be used to provide cross-sectional data from a representative sample of patients who attend general practice. A second stage of research would involve use of the tool as the basis for longitudinal data collection, whereby a sample of patients are recruited to the study and their data are extracted and curated at every visit. The addition of data about other health services received between GP visits (e.g., specialist, hospital or allied health visits), added and curated by the GP, would enhance knowledge about patients' broader experiences with the health system.

The strength of this approach is the focus placed on the importance of record structures, data linkages, coding and classification systems, and in the general application of standards required for the success of the model.

This approach will improve the understanding of morbidity and management within the general practice population and provide baseline data for further research and evaluation examining interventions to improve quality of care for general practice patients. It also has some utility in GP clinical audits and quality assurance.
