Improving data quality in your international development programme

High quality data are important for NGOs and others implementing international development projects.  Data are needed to plan new projects and programmes, to evaluate and learn from existing ones and, of course, to be accountable to the citizens and donors you work with.  Collecting data without considering whether it is of high quality undermines its value as an ingredient in the decision-making process.

What is data quality?

Data quality means that the information collected as part of a monitoring and evaluation system is accurate and reliable.  In many cases this is taken to mean simply avoiding double-counting what you have achieved.  While this is one of the more challenging data quality issues facing some organisations, there are other important dimensions to consider.

MEASURE Evaluation separates data quality into the following operational dimensions.  The definitions below are reproduced from the Data Quality Audit Tool (2008) by MEASURE Evaluation.  While the examples are taken from the HIV and AIDS sector, the dimensions are applicable more broadly.

Accuracy - Also known as validity. Accurate data are considered correct: the data measure what they are intended to measure. Accurate data minimise error (e.g., recording or interviewer bias, transcription error, sampling error) to a point of being negligible.

Reliability - The data generated by a program’s information system are based on protocols and procedures that do not change according to who is using them and when or how often they are used. The data are reliable because they are measured and collected consistently.

Completeness - Completeness means that an information system from which the results are derived is appropriately inclusive: it represents the complete list of eligible persons or units and not just a fraction of the list.

Precision - This means that the data have sufficient detail. For example, an indicator requires the number of individuals who received HIV counselling & testing and received their test results, by sex of the individual. An information system lacks precision if it is not designed to record the sex of the individual who received counselling & testing.

Timeliness - Data are timely when they are up-to-date (current), and when the information is available on time. Timeliness is affected by: (1) the rate at which the program’s information system is updated; (2) the rate of change of actual program activities; and (3) when the information is actually used or required.

Integrity - Integrity is when data generated by a program’s information system are protected from deliberate bias or manipulation for political or personal reasons.

Where can I learn more?

If you want to learn more, here are some useful resources to review.  If you have other suggestions, please let us know in the comments below.

Data Quality Assurance Tool for Program Level Indicators - MEASURE Evaluation

This Data Quality Assurance Tool was developed for PEPFAR programmes.  It consists of diagnostics, guidance, worksheets, and text boxes that emphasise preventing and managing data quality challenges and documenting processes so that reporting systems are auditable.

Data Quality Assurance Tools - MEASURE Evaluation

This web page summarises toolkits, Excel templates, case studies and guidelines to assist with improving data quality.  The focus of these resources is again on PEPFAR, but they also have broader relevance.

Data Quality eLearning Course - Global Health eLearning Center

This online course provides an introduction to data quality, covering the seven dimensions of data quality, the different types of double counting that affect data quality, and the strategies that can be employed to avoid double counting.

How can I improve data quality in my organisation?

Now that you know more about the dimensions of data quality, what can you do to improve it in your own work?  The following questions (adapted from those developed by PEPFAR) may give you a useful starting point.

1. Does your programme have a list of operational definitions of what is being counted?

If your programme is working in schools, water points or health facilities, do you have a clear definition of what you mean by these?  Take the following indicator, for example:

Number of health facilities and schools where community scorecards have been implemented by local CSOs

In this case the definitions of a health facility and a school are clear.  However, implementing community scorecards is a multi-step process.  At what point do you define them as having been implemented?  In discussions with our partner, we agreed that this indicator would count a community scorecard as implemented once it reaches the monitoring stage.

2. Are the same operational definitions systematically followed across your programme?

Once you've agreed on operational definitions, how do you make sure these are followed across each site or facility in which you work?  When working with our partners to document their processes, we often find that field staff have different understandings of the same terms.  In other cases they raise challenges that prevent them from applying a definition accurately.

In one project the donor wanted data disaggregated by age range, gender and poverty level.  While we could come up with clear, shared definitions for each of these, there were practical challenges in determining the poverty level of someone taking part in a workshop.  In the end we went back to the donor to request a change.

To ensure all field staff were clear, we developed a Process Manual for the programme.  This clarified, for each facilitator, the steps they needed to follow during implementation, along with any operational definitions they needed to understand.

3. Does the programme have procedures for avoiding double counting at the site level?

Any time I see an indicator saying 'number of people reached by activity X', my first thought is about double-counting.  Without careful consideration you risk counting the same people several times.  For example, if you run three sensitisation workshops you cannot simply count the number of people attending each workshop and add them up.  What you need to know is the number of unique people attending across the three workshops.

Typically this requires devising (in advance) a system for uniquely identifying people.  Email addresses and phone numbers work well if the people you are working with use them, but often some other kind of unique identifier is needed.  Some programmes assign their own code based on a string of personally identifiable information.  For example:

Year of birth + First three letters of first name + First three letters of mother's name
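
As an illustration, here is a minimal Python sketch of how such a code might be constructed and then used to count unique attendees across several workshops.  The names and records are hypothetical:

    def make_id(year_of_birth, first_name, mother_name):
        # Year of birth + first three letters of first name
        # + first three letters of mother's name
        return f"{year_of_birth}{first_name[:3].upper()}{mother_name[:3].upper()}"

    # Hypothetical attendance records from three workshops
    workshops = [
        [("1987", "Amina", "Fatima"), ("1992", "Joseph", "Mary")],
        [("1987", "Amina", "Fatima"), ("1975", "Grace", "Ruth")],
        [("1992", "Joseph", "Mary")],
    ]

    # A set keeps only unique IDs, so repeat attendees are counted once
    unique_attendees = {make_id(*p) for workshop in workshops for p in workshop}
    print(len(unique_attendees))  # 3 unique people across 5 attendances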

What works best depends on your context.  In our work on a government programme we are trying out biometric identification as a way to uniquely identify people involved in the programme.  Clearly this wouldn't be appropriate on a programme working with commercial sex workers or other marginalised groups.

4. Does the programme have procedures for avoiding double counting across multiple sites?

This issue is an extension of the previous one.  In this case the challenge is to avoid double-counting a person who attends (for example) a clinic in one village and then a different clinic in another village.  As with the previous example, a unique identifier is needed.  However, the additional challenge here is ensuring that all sites share the same registry of people and their unique identifiers.
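
To illustrate the principle (independently of any particular tool), the sketch below merges hypothetical per-site registers and counts unique individuals across them:

    # Hypothetical registers of pseudo-IDs kept at two clinics
    site_registers = {
        "village_a_clinic": {"1987AMIFAT", "1992JOSMAR"},
        "village_b_clinic": {"1987AMIFAT", "1975GRARUT"},
    }

    # A shared registry is the union of every site's register, so a person
    # appearing at both clinics is still counted only once
    shared_registry = set().union(*site_registers.values())
    print(len(shared_registry))  # 3 unique people, not 4 clinic visits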

Since BetterData is web-based, this kind of data can be shared in real time across each site in a programme.

5. Does the reporting system enable the clear identification of a drop out or a person lost to follow-up?

This particular operational challenge relates to situations where a programme provides long-term support or assistance to an individual.  In this case you typically want to track their interaction with (and, hopefully, benefit from) the programme over time.  However, it's normal to lose track of people or to have others simply drop out.  Your monitoring system should have a specific way of recording when this happens.  You may also need operational systems for following up with people who have not participated within a specific time frame.
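
As a simple illustration, a monitoring system might flag anyone with no recorded interaction within a chosen time frame.  The threshold and records below are hypothetical:

    from datetime import date, timedelta

    FOLLOW_UP_THRESHOLD = timedelta(days=90)  # hypothetical cut-off

    # Hypothetical record of each participant's last interaction
    last_seen = {
        "1987AMIFAT": date(2015, 1, 10),
        "1992JOSMAR": date(2015, 6, 2),
    }

    today = date(2015, 6, 30)
    lost_to_follow_up = [pid for pid, seen in last_seen.items()
                         if today - seen > FOLLOW_UP_THRESHOLD]
    print(lost_to_follow_up)  # ['1987AMIFAT'] - flag for follow-up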

6. Who is responsible for data collection?

In practice, it's likely that different people will play different roles in the process of collecting data.  Since we map the implementation of a programme to a specific process, we can define clearly at each stage of implementation:

  • What information must be collected using which forms?
  • Who is responsible for collecting this data?
  • Who is responsible for reviewing and approving this data?

This approach can again be documented in the Process Manual and should ideally also be incorporated into people's job descriptions.

7. Does everyone on the programme use the same data collection forms?

As a programme evolves and develops its model, it's common for the data collection forms to be tweaked.  If these forms are paper-based then it's essential that one person retains the latest set of forms and that updated forms are distributed to all field staff (with older forms collected and recycled to avoid them being mistakenly used).  This kind of version control can be tricky to handle.  Happily, updating web- and mobile-based forms is much easier, as is tracking versions to see who made changes and when.

8. Are clearly written instructions available on how to fill out the data collection forms?

This area relates back to the earlier question on definitions.  Ideally each form should include clear guidance to minimise the likelihood of mistakes.  With mobile and web-based forms, some additional options are available:

  • Help text - this provides guidance (in multiple languages) for each question
  • Labels and dividers - break the form up into logical sections with clear introductions to each
  • Validations - fields can require that the data entered is a number, text, a date or an email address.  They can also apply logic, such as requiring that the date entered in one field falls after the date entered in another.  Used properly, these types of validation help avoid common data entry mistakes (see the sketch below)
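
To make the validation idea concrete, here is a minimal sketch of the kinds of checks a digital form might apply behind the scenes.  The field names and rules are hypothetical examples, not any particular product's API:

    from datetime import date

    def validate_entry(entry):
        # Return a list of data entry errors for a single form entry
        errors = []
        if not isinstance(entry.get("attendees"), int) or entry["attendees"] < 0:
            errors.append("attendees must be a non-negative whole number")
        if "@" not in entry.get("contact_email", ""):
            errors.append("contact_email must be a valid email address")
        # Cross-field logic: the end date must fall after the start date
        if entry["end_date"] <= entry["start_date"]:
            errors.append("end_date must be after start_date")
        return errors

    entry = {"attendees": 25, "contact_email": "officer@example.org",
             "start_date": date(2015, 3, 1), "end_date": date(2015, 2, 1)}
    print(validate_entry(entry))  # ['end_date must be after start_date']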

It's also useful to consider showing examples of completed forms.  This helps people see what is expected for each field.

9. Is there a standardised approach to aggregating and verifying data?

The approach to aggregating and verifying data depends on how you collect data in the first place.  Many of our partners were using paper, Excel or Word templates before they started working with us.  Aggregating these - especially across more than six sites - can get very challenging and requires a rigorous approach.

It's easy to make mistakes when tallying totals from one sheet or cutting and pasting numbers from another spreadsheet.  Standardising your templates can help, as can creating additional forms specifically for tallying totals.  Whatever approach you use, it's important to consider how you can check your working on a sample and ask questions that verify the validity of the totals you get.
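
As a rough illustration of one standardised approach, the sketch below totals a single indicator across per-site spreadsheets exported to CSV.  The folder, file names and column heading are assumptions:

    import csv
    from pathlib import Path

    site_totals = {}
    # Hypothetical per-site exports: exports/site_a.csv, exports/site_b.csv, ...
    for path in sorted(Path("exports").glob("site_*.csv")):
        with open(path, newline="") as f:
            # Assumes each sheet has a 'people_reached' column
            site_totals[path.stem] = sum(int(row["people_reached"])
                                         for row in csv.DictReader(f))

    print(site_totals)                # per-site totals make spot checks easy
    print(sum(site_totals.values()))  # programme-wide total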

Our work has focused on enhancing BetterData so that calculations can be defined within forms and on indicators.  This makes it possible for forms to automatically aggregate totals by site and use the result to populate a field on another form, making it easy to aggregate data across multiple sites.

10. Are all data source documents available for auditing purposes?

An evaluator or auditor should be able to trace any indicator back to its original source document or form to verify the claim being made.  This means having a clear link between your indicators on one side and your data collection forms on the other.

When we work with partners to set up a monitoring system, we begin with a process mapping approach.  This seeks to understand all the steps involved in implementing a programme.  At each step, specific forms may be needed to collect data, and reports to aggregate that data for review and learning.  We link the process mapping review to indicators on the logframe or results framework.  This makes it clear what the data source is for each indicator and ensures that the source data needed to audit that indicator is easily available.
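
One lightweight way to record that link is a simple lookup from each indicator to its source forms, so any reported figure can be traced back to the documents behind it.  The names below are hypothetical:

    # Hypothetical mapping from logframe indicators to their source forms
    indicator_sources = {
        "scorecards_implemented": ["scorecard_monitoring_form"],
        "people_reached": ["workshop_attendance_form", "clinic_register"],
    }

    # An auditor can look up exactly which forms support a reported figure
    print(indicator_sources["people_reached"])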

Create a Process Manual to help improve data quality

The examples above show how mapping out the processes in your programme can tackle many data quality issues at the source.  The output from a process mapping approach is a detailed Process Manual that you can share with your colleagues working on programme implementation and monitoring and evaluation.

We're producing an eBook that takes you through our approach to process mapping.  This will explain how to create a Process Manual that documents your programme processes in detail.  Sign up for updates if you'd like to receive a free copy when it's ready.

UPDATE - Our guide to creating a process manual is now available.  Please click on the picture to download it.