One potato, two potatoes, three potatoes, [blank]: handling missing data in food frequency questionnaires
Public Health Nutrition Editorial Highlight: ‘Missing data in food frequency questionnaires: making assumptions about item non-response’, by Karen E Lamb, Dana Lee Olstad, Cattram Nguyen, Catherine Milte, Sarah A McNaughton
Measuring dietary intake is challenging due to the variety of foods available for consumption. Food frequency questionnaires (FFQs) are a popular method of capturing dietary information as they are comparatively inexpensive and relatively simple to use. FFQs tend to contain a large number of questions to capture profiles of dietary patterns or nutrient intakes. The upside of a detailed and lengthy questionnaire is that the answers to these questions may help to better characterise dietary intake. The downside is that answering many questions can prove onerous to study participants. This presents a problem: a longer and more detailed questionnaire may provide more meaningful data, but it also increases the likelihood of missing responses.
This ‘missing data’ causes problems for nutrition researchers. Statistical analyses cannot be performed on incomplete data without making assumptions about how the missing data arose. Since these assumptions influence the results of the analysis, the method for handling missing data must be selected with care. In fact, how best to deal with missing data is an ongoing area of research within the field of statistics. Although this research has yielded a variety of possible approaches, it is often unclear which assumptions nutrition researchers make about missing FFQ data, because published studies frequently do not state how the missing data was handled.
The most common approach is to assume that the missing data corresponds to no consumption of a particular item. This approach is known as a ‘single imputation’ method, meaning that each missing observation is replaced with a single value. For example, a blank answer to the question ‘How often do you eat bananas?’ would be taken to mean that this person never eats bananas. While this method permits analysis, it can result in misleading findings, since participants may under-report consumption of particular items (e.g. energy-dense, nutrient-poor foods). Alternative single imputation approaches include replacing the missing observation with the most common value observed in the sample (i.e. the sample mode). However, a fundamental problem with all single imputation approaches is that they treat the imputed values as if they were known, when in fact there is uncertainty about their true values. As such, these approaches can lead to incorrect inference about associations between diet and other factors such as health.
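To make the two single imputation approaches concrete, here is a minimal sketch in Python using pandas. The banana item and its values are invented for illustration and do not come from any study; the point is only to show how zero imputation and mode imputation each replace a blank with one fixed value, and how that choice shifts the estimated average intake.

```python
import numpy as np
import pandas as pd

# Hypothetical FFQ item: weekly servings of bananas for six participants.
# NaN marks a blank (missing) answer. These values are made up.
bananas = pd.Series([3.0, np.nan, 3.0, 7.0, np.nan, 1.0])

# Zero imputation: treat every blank as "never eats bananas".
zero_imputed = bananas.fillna(0.0)

# Mode imputation: replace every blank with the most common observed value.
# Series.mode() ignores NaN by default; take the first mode if there are ties.
sample_mode = bananas.mode().iloc[0]
mode_imputed = bananas.fillna(sample_mode)

# Compare the estimated mean intake under each assumption.
print("Mean of observed answers only:", bananas.mean())
print("Mean after zero imputation:   ", zero_imputed.mean())
print("Mean after mode imputation:   ", mode_imputed.mean())
```

In both cases the completed data look fully observed, so any standard error computed from them ignores the fact that two of the six values were filled in.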
In our commentary we describe an alternative approach to dealing with missing data: multiple imputation. This approach allows the uncertainty associated with imputation to be incorporated into the analysis. We discuss the drawbacks of more commonly employed methods while highlighting both the possible alternatives and the need for greater transparency in their reporting. Our intent is to raise awareness of the challenges in dealing with missing data in FFQs, and to encourage the use of more robust methods for addressing them.
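As a rough illustration of the idea behind multiple imputation, the sketch below creates several completed versions of a dataset, analyses each one, and pools the results using Rubin's rules. It is not the procedure from the commentary: the data are invented, the imputation step is a simple approximate Bayesian bootstrap that assumes answers are missing completely at random, and the number of imputations (20) is an arbitrary choice. Real FFQ analyses would normally use an imputation model that draws on other variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly servings for eight participants; NaN marks a blank answer.
servings = np.array([3.0, np.nan, 3.0, 7.0, np.nan, 1.0, 2.0, 5.0])
observed = servings[~np.isnan(servings)]
n_missing = int(np.isnan(servings).sum())
m = 20  # number of imputed datasets (an arbitrary choice for this sketch)

estimates, variances = [], []
for _ in range(m):
    # Approximate Bayesian bootstrap: resample the observed answers first,
    # then draw the imputed values from that resample. The first step lets
    # the imputations reflect uncertainty about the observed distribution.
    donors = rng.choice(observed, size=observed.size, replace=True)
    completed = servings.copy()
    completed[np.isnan(completed)] = rng.choice(donors, size=n_missing, replace=True)

    # Analyse each completed dataset: here, the mean and its squared standard error.
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / completed.size)

# Rubin's rules: pool the estimates and combine the two sources of variance.
pooled_mean = np.mean(estimates)
within = np.mean(variances)           # average sampling variance
between = np.var(estimates, ddof=1)   # variance across imputations
total_variance = within + (1 + 1 / m) * between

print("Pooled mean:", pooled_mean)
print("Pooled standard error:", np.sqrt(total_variance))
```

The between-imputation term is what single imputation leaves out: it widens the standard error to reflect that the blank answers were never actually observed.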
The paper, ‘Missing data in food frequency questionnaires: making assumptions about item non-response’, is published in the journal Public Health Nutrition and is freely available until 8th January 2017.