Data Integration Platform
An open-source data integration platform for public health projects.
Health data: a complex landscape
Understanding health issues often requires combining complex and heterogeneous data sources, even in the context of single-country interventions. Data can come from HMIS platforms such as DHIS2, from individual tracking systems, from custom software built to address specific issues, or from various Excel reports provided by health experts.
Having such diverse data in disconnected silos is often the biggest obstacle to an efficient exploration and analysis process.
It also makes collaboration difficult. Many data analysts working on health data end up developing ad-hoc scripts and visualisations on their own laptops, then communicating their results in scattered publications from which it is hard to draw unified insights.
Breaking down the silos: the Bluesquare data integration platform
To address this issue, Bluesquare built OpenHexa, a cloud-based data integration platform consisting of three components: extraction, analysis, and visualisation. The platform is built mostly on mature open-source technologies, such as the Jupyter ecosystem, and gives its users a lot of flexibility.
Our platform can gather data from different sources:
1. Automatic extracts of DHIS2 data
2. GIS data management
3. Ad-hoc reports
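To give a concrete idea of what an automatic DHIS2 extract can involve, here is a minimal Python sketch, not the platform's actual implementation. It assumes the standard DHIS2 Web API analytics endpoint (`/api/analytics.json`) and its response layout of `headers` and `rows`. The server URL, the identifiers (`DE_MALARIA_CASES`, `OU_DISTRICT_A`), the sample payload, and the helper names `build_analytics_url` and `analytics_to_records` are all hypothetical, chosen for illustration only.

```python
from urllib.parse import urlencode


def build_analytics_url(base_url, data_elements, periods, org_units):
    """Build a DHIS2 analytics query URL for the given dimensions."""
    query = [
        ("dimension", "dx:" + ";".join(data_elements)),  # data elements / indicators
        ("dimension", "pe:" + ";".join(periods)),        # periods, e.g. 202301
        ("dimension", "ou:" + ";".join(org_units)),      # organisation units
    ]
    # Keep ':' and ';' unescaped: DHIS2 uses them as dimension separators.
    return f"{base_url.rstrip('/')}/api/analytics.json?{urlencode(query, safe=':;')}"


def analytics_to_records(payload):
    """Turn an analytics response ("headers" + "rows") into a list of dicts."""
    columns = [header["name"] for header in payload["headers"]]
    return [dict(zip(columns, row)) for row in payload["rows"]]


# Hypothetical payload, shaped like a DHIS2 analytics response.
# Real DHIS2 identifiers are opaque UIDs; readable placeholders are used here.
sample_payload = {
    "headers": [{"name": "dx"}, {"name": "pe"}, {"name": "ou"}, {"name": "value"}],
    "rows": [
        ["DE_MALARIA_CASES", "202301", "OU_DISTRICT_A", "120"],
        ["DE_MALARIA_CASES", "202302", "OU_DISTRICT_A", "95"],
    ],
}

url = build_analytics_url(
    "https://dhis2.example.org",  # hypothetical server
    ["DE_MALARIA_CASES"],
    ["202301", "202302"],
    ["OU_DISTRICT_A"],
)
records = analytics_to_records(sample_payload)
```

In a real extract, the URL would be fetched with an authenticated HTTP client on a schedule, and the resulting records written to shared storage so that notebooks can read them. Those parts depend on the deployment and are left out of this sketch.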
The core of the platform is a cloud-based computation environment. Based on JupyterHub, it allows different actors to collaborate using Jupyter notebooks. Our in-house data science team extracts, cleans, and stores the data, and all project stakeholders can then analyse both the raw and the cleaned data.
Many different solutions can be used to visualise data, and the choice of the tool will depend on the project: Tableau, PowerBI, Dash, Shiny or Voilà.
This hosted analysis environment offers multiple benefits:
For national health system managers
It offers a simple way to give researchers or Monitoring and Evaluation specialists access to their data, and to get feedback on the analyses and results obtained. Managers can build reports that draw on both routine data and the results of analyses produced by national and international research teams, thus benefiting from the expertise of a wide range of actors. It can also help enforce data sharing agreements by providing a single platform through which data is shared.
For analytical teams
It allows them to connect to up-to-date data from a variety of sources, so they can concentrate on their expertise rather than on acquiring and formatting data. It also lets them easily see how their results change when routine data is updated, and thus breaks the rigidity of the publication cycle, which is too constraining for policy-oriented data use.