18–21 May 2026
Europe/Warsaw timezone

OMOP ETL Pipeline Implementation for Tuberculosis Data Standardisation at Douala General Hospital, Cameroon

21 May 2026, 16:03
18m
Room 13 A

Speaker

Brenda Yankam Mbouamba (Ruhr University Bochum)

Description

Douala General Hospital, a first-class healthcare facility in Cameroon, serves thousands of patients yearly through its multidisciplinary medical teams. The hospital hosts numerous patient records that hold significant potential for public health research. However, most records remain paper-based, limiting their accessibility and reuse. In departments such as pulmonology, patient data are often stored in heterogeneous data sheets lacking uniform structure or standardisation, which constrains their use for clinical research, care management, and evidence-based decision-making. Moreover, the absence of standardisation hinders data integration within broader health systems, restricting secure sharing and interoperability.
To address these challenges, we implemented a complete Extract, Transform, and Load (ETL) pipeline aligned with the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) version 5.4, an internationally recognised framework for health data standardisation. The objective was to transform and integrate patient data from the tuberculosis department into a database compliant with FAIR (Findable, Accessible, Interoperable, Reusable) principles, thereby enhancing data quality, interoperability, and reusability for research and clinical monitoring.
The dataset included over 80 clinical and administrative variables, such as sociodemographic data, medical history, symptoms, laboratory results, and diagnoses. These data were extracted from varied paper-based sources that differed in completeness and structure. For standardisation, several Observational Health Data Sciences and Informatics (OHDSI) tools were employed: WhiteRabbit for data profiling, Usagi for vocabulary mapping, and Rabbit-in-a-Hat for defining table mappings to the OMOP CDM structure. The populated tables included person, care_site, visit_occurrence, condition_occurrence, measurement, observation, and observation_period. Concept mappings were derived from SNOMED CT, LOINC, and RxNorm, with contextual adaptations to local data.
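As an illustration of the vocabulary-mapping step, the following minimal sketch applies a source-to-concept lookup of the kind a mapping tool can produce. It is not the project's actual mapping: the source labels and the function name `to_gender_concept_id` are assumptions made for this example. The concept IDs 8507 (male) and 8532 (female) are the OMOP standard gender concepts, and 0 is the OMOP convention for a value with no matching concept.

```python
from typing import Dict, Optional

# OMOP standard gender concepts: 8507 = MALE, 8532 = FEMALE.
# The source-side labels below are assumed examples of register entries,
# not values taken from the hospital's data.
GENDER_MAP: Dict[str, int] = {
    "m": 8507,
    "male": 8507,
    "masculin": 8507,
    "f": 8532,
    "female": 8532,
    "féminin": 8532,
    "feminin": 8532,
}


def to_gender_concept_id(source_value: Optional[str]) -> int:
    """Return the OMOP gender concept ID for a raw source value.

    Unknown or missing values map to 0, the OMOP "no matching concept" ID.
    """
    if source_value is None:
        return 0
    key = source_value.strip().lower()
    return GENDER_MAP.get(key, 0)
```

Keeping the raw value alongside the mapped ID (in a `*_source_value` column) preserves the original entry for later review of unmapped terms.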
The ETL pipeline was developed from SQL scripts generated by Rabbit-in-a-Hat and executed against PostgreSQL through pgAdmin. The OMOP tables were created using scripts from the OHDSI GitHub repository, and the transformed data were then loaded into them.
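The transform-and-load step can be sketched as an `INSERT ... SELECT` from a staging table into an OMOP table. The example below is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, the staging table `tb_register` and its columns are hypothetical, and the `person` table is reduced to a subset of the CDM v5.4 columns rather than created from the official scripts.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

SourceRow = Tuple[str, Optional[str], Optional[int]]


def run_person_etl(source_rows: Iterable[SourceRow]) -> sqlite3.Connection:
    """Load hypothetical staging rows into a simplified OMOP person table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL database
    conn.executescript(
        """
        -- Hypothetical staging table holding digitised register entries.
        CREATE TABLE tb_register (
            patient_code TEXT,
            sex          TEXT,
            birth_year   INTEGER
        );
        -- Simplified subset of the OMOP CDM person table.
        CREATE TABLE person (
            person_id            INTEGER PRIMARY KEY,
            gender_concept_id    INTEGER NOT NULL,
            year_of_birth        INTEGER NOT NULL,
            race_concept_id      INTEGER NOT NULL,
            ethnicity_concept_id INTEGER NOT NULL,
            person_source_value  TEXT,
            gender_source_value  TEXT
        );
        """
    )
    conn.executemany("INSERT INTO tb_register VALUES (?, ?, ?)", source_rows)
    # Transform and load: map sex to OMOP gender concepts (0 = unmapped) and
    # skip rows without a birth year, which person.year_of_birth requires.
    # person_id is left to SQLite here; PostgreSQL would need a sequence.
    conn.execute(
        """
        INSERT INTO person (gender_concept_id, year_of_birth, race_concept_id,
                            ethnicity_concept_id, person_source_value,
                            gender_source_value)
        SELECT CASE UPPER(sex) WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END,
               birth_year, 0, 0, patient_code, sex
        FROM tb_register
        WHERE birth_year IS NOT NULL
        ORDER BY patient_code
        """
    )
    conn.commit()
    return conn


# Illustrative rows only; these are not real patient data.
demo = run_person_etl([("TB-002", "F", 1990), ("TB-001", "m", 1985), ("TB-003", None, None)])
loaded = demo.execute(
    "SELECT person_source_value, gender_concept_id, year_of_birth "
    "FROM person ORDER BY person_source_value"
).fetchall()
```

Rows that fail a required-field rule are filtered out here for brevity; a production pipeline would more likely route them to a rejection log for manual review.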
Data quality was evaluated using the Achilles tool, which automatically assessed completeness, conformance, and plausibility. The pipeline achieved an overall score of 99%, which supports its reliability. This work represents a pioneering effort in applying OMOP CDM within the African context, promoting collaboration, interoperability, and data-driven decision-making to strengthen tuberculosis care and research in Cameroon.
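To make the three quality categories concrete, the sketch below computes a pass rate for one illustrative check per category on person-level records. It does not reproduce how the OHDSI tooling works internally or how the reported score was derived; the function names, the chosen checks, and the 1900 lower bound for birth years are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List


def pass_rate(values: List[Any], check: Callable[[Any], bool]) -> float:
    """Return the fraction of values that satisfy the check (1.0 if empty)."""
    if not values:
        return 1.0
    passed = sum(1 for v in values if check(v))
    return passed / len(values)


def quality_report(persons: List[Dict[str, Any]], current_year: int) -> Dict[str, float]:
    """Compute one illustrative check for each data-quality category."""
    genders = [p.get("gender_concept_id") for p in persons]
    years = [p.get("year_of_birth") for p in persons]
    allowed_genders = {8507, 8532, 0}  # OMOP male, female, no matching concept
    return {
        # Completeness: is a required value present at all?
        "completeness": pass_rate(years, lambda y: y is not None),
        # Conformance: does the value belong to the expected value set?
        "conformance": pass_rate(genders, lambda g: g in allowed_genders),
        # Plausibility: is the value believable? (1900 is an assumed bound)
        "plausibility": pass_rate(
            years, lambda y: y is not None and 1900 <= y <= current_year
        ),
    }


# Illustrative records only; these are not real patient data.
sample = [
    {"gender_concept_id": 8507, "year_of_birth": 1985},
    {"gender_concept_id": 8532, "year_of_birth": None},
    {"gender_concept_id": 99, "year_of_birth": 1870},
    {"gender_concept_id": 0, "year_of_birth": 2001},
]
report = quality_report(sample, current_year=2026)
```

An overall score could then be summarised as the share of checks passed across all tables, although the weighting of checks is a design decision that this sketch leaves open.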

Author

Brenda Yankam Mbouamba (Ruhr University Bochum)

Co-authors

Agnes Kiragga (African Population and Health Research Center)
Bertrand Hugo Mbatchou Ngahane (Douala General Hospital, Data Science Without Borders project)
Fankoua Tchaptchet Luc Baudoin (Douala General Hospital, Data Science Without Borders project)
François Anicet Onana Akoa (Douala General Hospital, Data Science Without Borders project)
Jean Blaise Ebimbe (Douala Gynaeco-Obstetric and Pediatric Hospital)
Miranda Barasa (African Population and Health Research Center)
Pauline Andeso (African Population and Health Research Center)
Samuel Iddi (African Population and Health Research Center)
