UK Data Service series record for:
The Health Survey for England (HSE), sponsored by the Information Centre for Health and Social Care and the Department of Health, began in 1991 and has been carried out annually since then. A number of core questions are included in every wave, but each year’s survey also has a particular focus on a disease, condition or population group, which is revisited at appropriate intervals in order to monitor change. The survey combines questionnaire-based answers with physical measurements and the analysis of blood samples. Blood pressure, height and weight, smoking, drinking and general health are covered every year. An interview with each eligible person in the household is followed by a nurse visit.
You can find links to the datasets in the DATA ACCESS section above. When you follow the link to a dataset you will be taken to its catalogue record which contains the following information:
Most survey data may be downloaded as SPSS, Stata or tab-delimited files. There is a download button near the top right of each catalogue record. Most datasets can be downloaded after you log in to your UK Data Service account. See our Access pages for more information about how to access data.
See our Use data pages for more advice about getting started with analyses. These pages contain advice and training; guides about datasets, topics and methods and software including SPSS and Stata; information about how others have used the data and how to cite datasets. See also our Events pages for courses and webinars about how to find, use and manage data.
The Health Survey for England (HSE) series is designed to monitor trends in the nation's health. It has run every year since 1991. The study provides regular information that cannot be obtained from other sources on a range of aspects concerning the public's health and many of the factors that affect health. The aims of the series are:
The survey focuses on different health issues each year, although a number of core questions are included every year. Topics are revisited at appropriate intervals in order to monitor change.
Annual datasets are available from 1993 onwards. The 1991 and 1992 data are only available as a combined file. For a list of up-to-date datasets, see the DATA ACCESS section on this webpage.
In most years the HSE covers the private household population; in 2000, however, older people in residential and care homes were also included in the survey. Children were not included in the HSE between 1991 and 1994. Since 1995 the survey has been extended to include those aged 2-15, and since 2002 data are available for all ages. Note that not all topics are considered suitable for children; in such cases data are not available for the child sample.
Yes. For example, the 2001 data consist of an individual file and a household file. The individual file contains records for all individuals in co-operating households who gave a full interview. It also contains information from the household questionnaire, main individual schedule, self-completions and the nurse visit (where one occurred).
The 2001 household file contains records on household composition and sex, age and marital status for all individuals in co-operating households. It is provided as an aid to household level analysis. Other household level variables are stored on the individual file.
Variable lists and PDF user guides (including questionnaires) are freely available on the catalogue page of each dataset. To find a dataset’s catalogue page, follow the link to the dataset from this page under DATA ACCESS or from the results pages in our database search engine Discover.
The Health Survey for England contains a 'core', which is repeated each year, and each survey year also has one or more modules on subjects of special interest. Modules are often repeated periodically, allowing analysis over time.
The 'core' includes: questions on general health and psycho-social indicators, smoking, alcohol, demographic and socio-economic indicators, questions about use of health services and prescribed medicines and measurements of height, weight and blood pressure.
The modules may be about a single topic, several topics or about population groups. The modules to date have been: 1993 cardiovascular disease; 1994 cardiovascular disease; 1995 asthma, accidents, and disability; 1996 asthma, accidents, and special measures of general health (EuroQol, SF-36); 1997 children and young people; 1998 cardiovascular disease; 1999 ethnic groups; 2000 older people, and social exclusion; 2001 respiratory disease and atopic conditions, disability and non-fatal accidents; 2002 children and young people (aged 0-24); 2003 cardiovascular disease; 2004 ethnic minority groups; 2005 older people; 2006 cardiovascular disease; 2007 knowledge and attitudes; 2008 physical activity and fitness; 2009 long-term health conditions and self-assessed general health; 2010 respiratory disease and lung function; 2011 cardiovascular disease.
Before 2003 the HSE generally gave a good match to the total population, so weights were often not required. For example, in 2002 no weights need to be applied when using the adult sample only. Non-response weights were introduced in 2003, in line with changes on many large-scale surveys, with the aim of reducing possible biases, so HSE data from 2003 onwards should be weighted.
In addition, where a population has been oversampled (e.g. the population in care homes in 2000) weights are required to compensate for this. Weights are also required for analysis of children in the HSE.
The following boost samples are in the HSE:
The following gives guidance to users who want to pool the 1999 and 2004 HSE datasets in order to increase the number of respondents from minority ethnic groups. See the other FAQs for more detailed information about how to use weights with the 1999 and 2004 HSE using the ethnic boosts.
a. Sampling design
There are differences in how ethnic minorities were sampled in the two years. In particular, there is no boost for Black Africans in 1999 and so this group cannot be included. It should also be noted that the sampling methodology for the Chinese differed between 1999 and 2004. For more information on the sampling design of both years see Volume 2 (methodological report) of the HSE reports for 1999 and 2004.
b. Weights
There are a number of weights within the 2004 HSE but the 1999 weights are less complex. See the other FAQs for information about using the weights within the 1999 and 2004 datasets and how to prepare the weights for pooling the 1999 and 2004 datasets.
c. Primary Sampling Units (PSUs)
If you are planning to use the PSU variables to account for the survey design, bear in mind that PSUs with the same numbers in 1999 and 2004 do not represent the same PSUs across the two years. You can make the PSU variables unique across the two years by adding 1000 to each PSU in 1999 and 2000 to each PSU in 2004.
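The renumbering described above can be sketched in Python for users working with the tab-delimited files. This is a minimal illustration only: the field name `psu` and the sample records are hypothetical, and the sketch assumes the PSU numbers in each year are below 1000 so that the two offset ranges cannot overlap.

```python
def make_psu_unique(records, offset, psu_key="psu"):
    """Return copies of the records with `offset` added to each PSU number."""
    return [{**rec, psu_key: rec[psu_key] + offset} for rec in records]


# Made-up records; "psu" is a hypothetical field name, not the HSE variable name.
hse1999 = [{"id": 1, "psu": 1}, {"id": 2, "psu": 2}]
hse2004 = [{"id": 1, "psu": 1}, {"id": 2, "psu": 3}]

psu1999 = make_psu_unique(hse1999, 1000)  # 1999 PSUs become 1001, 1002, ...
psu2004 = make_psu_unique(hse2004, 2000)  # 2004 PSUs become 2001, 2002, ...

# PSU 1 in 1999 and PSU 1 in 2004 are now distinct identifiers.
pooled_psus = sorted({rec["psu"] for rec in psu1999 + psu2004})
print(pooled_psus)
```

The original records are left unchanged, so the renumbering can be repeated or checked against the source files.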
d. Merging the datasets
First it is necessary to prepare the weights in the 1999 and 2004 data – see the FAQ on using the HSE individual weights if pooling the 1999 and 2004 data.
Having prepared the weights, you can simply append the 1999 and 2004 files together. An example of the Stata syntax for appending the two datasets is given below:
use "<insert name of 1999 file>"
append using "<insert name of 2004 file>"
e. Differences in topics between years
Extra topics in 1999
- Use of health services: GP, hospital, dental services
- Demispan measurement
- Mid-upper arm circumference measurement
Extra topics in 2004
- Heating/cooking appliances, mould and dampness, household pets
- Fruit and vegetable consumption: detailed section in the individual questionnaire in 2004 but only four questions in 1999 in the self-completion booklet on ‘eating habits’
- Complementary and alternative medicines
- Parental health
- Cycling safety
- Euroqol general health (EQ-5D)
- Social capital
- Infant length
- Urine sample
Please note that not all respondents were asked about each of these topics; for example, in 2004 the nurse visit was only given to those from minority ethnic groups. Appendix A of the HSE User Guides for 1999 and 2004 gives a detailed list of topics and the population/sub-sample of respondents covered by each topic.
f. Differences in classifications
1. Ethnicity
The questions on ethnicity within the 1999 individual section and the 2004 individual and household ethnicity sections are very similar. The following differences are noted:
- Irish has a slightly different definition
- Asian/Asian British – 2004 includes Indian Caribbean (not in 1999)
- Mixed – more detailed questions in 2004 if ‘other’ (mother’s and father’s cultural background)
- In 2004 for ‘Other family origins’ if more than one answer is given, respondent is asked for their mother’s cultural background (not in 1999)
The questions on ethnicity in the 1999 household section differ from those in the 1999 individual section and the 2004 household and individual sections. Please refer to the 2004 and 1999 questionnaires for more detail.
2. Socio-economic classification
In 1999 Socio-Economic Group (SEG) and Registrar General’s Class (RG Class) were used; in 2004 NS-SEC was used (SEG and RG Class are also available in 2004). The 3 or 5-category version of NS-SEC gives reasonable comparability with RG Class. See Appendix 2 of The National Statistics Socio-economic Classification: Origins, Development and Use.
3. Index of Multiple Deprivation (IMD)
IMD is not in the 1999 dataset (but is in 2004).
NB: there may be other differences that are not noted here.
There are two files in the 2004 HSE dataset:
(1) The General Population (GP)
(2) Ethnic Boost file
GP refers to the whole population of England, regardless of ethnic group. The GP file is representative of the general population of England so it contains ‘white’ individuals and minority ethnic individuals in the GP sample. Those in the GP file who were classified as belonging to one of the target minority groups were given the same questionnaire as the ethnic boost sample.
The Ethnic Boost file includes all those from the ethnic boost sample PLUS the ethnic minorities from the GP sample.
The following weights are in the files:
(1) GP file:
- Household level (wt_hhld): adjusts for non-contact and refusal of households. It corrects the distribution of household members to match population estimates for sex/age groups and Government Office Region (GOR). These weights are generated using calibration weighting with household selection weights to correct for the limit of 3 households at addresses.
- Individual level, adults (wt_int): a combination of the household weight and a component which adjusts for additional non-response among individuals within households.
- Individual level, children (child_wt): a combination of the household weight and a selection weight that accounts for only including a maximum of 2 children in a household. The combined weight is then adjusted to ensure the weighted age/sex distribution matches that of all children in co-operating households.
(2) Ethnic boost file:
- Individual level (wt_int): adjusts for the probability of selecting an address in the minority ethnic sample (addresses within different ethnic profiles had different chances of being selected), combined with selection weights for households within addresses and for individuals within households.
- Nurse visit (wt_nurse): accounts for drop-out at this stage.
- Blood sample (wt_blood): accounts for drop-out at this stage.
b. Combining the files – how to use the weights
When combining the GP file and the Ethnic Boost file for analysis purposes you should delete the duplicate cases from the GP file. The following Stata syntax can be used to combine the files and drop the duplicate ethnic minority cases from the GP file:
*drop ethnic minority cases on the gp file
*append the ethnic boost file to the gp file
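For readers not working in Stata, the two steps named in the comments above (drop, then append) can be sketched in Python. This is an illustration under stated assumptions, not the deposited syntax: the field names `id` and `ethnic_group`, the group labels and the sample rows are all hypothetical, and the real ethnic group variable and its codes should be taken from the 2004 documentation.

```python
# Hypothetical group labels, used only for illustration.
TARGET_GROUPS = {"Black Caribbean", "Indian", "Pakistani"}


def combine_gp_and_boost(gp_rows, boost_rows, group_key="ethnic_group"):
    """Drop minority ethnic cases from the GP rows, then append the boost rows."""
    gp_kept = [row for row in gp_rows if row[group_key] not in TARGET_GROUPS]
    return gp_kept + list(boost_rows)


# Made-up rows: case g2 is in the GP sample and is also carried on the boost file.
gp_file = [
    {"id": "g1", "ethnic_group": "White"},
    {"id": "g2", "ethnic_group": "Indian"},
    {"id": "g3", "ethnic_group": "White"},
]
boost_file = [
    {"id": "g2", "ethnic_group": "Indian"},
    {"id": "b1", "ethnic_group": "Pakistani"},
]

combined = combine_gp_and_boost(gp_file, boost_file)
combined_ids = [row["id"] for row in combined]
print(combined_ids)  # each case should now appear exactly once
```

Dropping the minority ethnic cases before appending is what prevents a GP respondent who is also on the boost file from being counted twice.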
You should then check that the combined file is in the correct proportion (e.g. that the white group reflects the correct proportion of the population, based on the mid-year population estimates for 2003). We have done this and the proportions are fine, so no further adjustments are needed.
As advised by NatCen, you should then create a new weighting variable (e.g. wt_intnew) by dividing the combined weight by the mean weight (6.67) so that the weighted total equals the actual total (i.e. so that your output shows the actual number of cases rather than the weighted population size). 6.67 is the mean of the weights in the combined 2004 sample.
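The rescaling step can be illustrated with a short Python sketch. The weights below are made up for the example (their mean is 8, not 6.67), and `scale_weights` is a hypothetical helper name; the point is only that dividing each weight by the mean weight makes the weighted total equal the number of cases.

```python
def scale_weights(weights):
    """Divide each weight by the mean weight so the scaled weights average 1."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


# Made-up weights for four cases; in the combined 2004 file the mean is 6.67.
wt_int = [4.0, 8.0, 12.0, 8.0]
wt_intnew = scale_weights(wt_int)

print(wt_intnew)       # scaled weights
print(sum(wt_intnew))  # weighted total, equal to the number of cases
```

Because every weight is divided by the same constant, the relative weighting of cases, and therefore any weighted percentage or mean, is unchanged.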
If using the nurse data, you should use the variable wt_nurse, as this overrides the individual weight wt_int. Similarly, if using the blood data, you should use the variable wt_blood, as this overrides the variable wt_nurse.
The weights for children and households differ between the general population and ethnic boost files. The GP file has 3 separate weights for adults, children and households, but the Ethnic Boost file has only one weight (wt_int), which combines the adult, children and household weights. So which weights should you use for analyses based purely on children or on households? NatCen advises using the variable wt_int within both files when using purely child data. The household data is more problematic: the household selection weight is very complex because it depends on the profile of household members. If you are using this weight you should contact NatCen for further advice.
In summary you should use your new weighting variable (e.g. wt_intnew) for all individual analyses (adults and children) but use wt_nurse if using nurse visit data or wt_blood if using the blood data.
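The summary rule can be written as a small decision function. This is only a sketch of the guidance above: `weight_for_analysis` and its arguments are hypothetical names, while wt_blood, wt_nurse and wt_intnew are the variables named in the text.

```python
def weight_for_analysis(uses_nurse_data=False, uses_blood_data=False):
    """Return the name of the weight variable suggested by the guidance above."""
    if uses_blood_data:
        return "wt_blood"   # blood weight overrides the nurse weight
    if uses_nurse_data:
        return "wt_nurse"   # nurse weight overrides the individual weight
    return "wt_intnew"      # rescaled individual weight for all other analyses


print(weight_for_analysis())
print(weight_for_analysis(uses_nurse_data=True))
print(weight_for_analysis(uses_nurse_data=True, uses_blood_data=True))
```

Blood data is checked first because blood samples are taken at the nurse visit, so an analysis using blood data also uses nurse data and the more specific weight should win.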
c. Do the weights make a difference?
You can see that the weights make a difference to the final result by running the following crosstabs on the adults in the combined file without the weights and then with the weights applied, separately for men and women:
- BMI by ethnic group
- Portions of fruit and vegetables by ethnic group
- Physical activity by ethnic group
The weights do make a difference to the final results (some differences are larger than others). For example, the percentage of Black Caribbean women classified as obese is 28% when unweighted but rises to 30% when weighted. The weighted cross-tabulations replicate the weighted results (and unweighted bases) shown in the published HSE report.
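The comparison of unweighted and weighted percentages can be sketched in Python as follows. The rows, the field names `obese` and `wt`, and the resulting figures are invented for illustration and are not HSE results; the sketch only shows how applying a weight changes a percentage.

```python
def percent_obese(rows, weight_key=None):
    """Percentage of cases flagged obese, optionally weighted by `weight_key`."""
    total = 0.0
    obese = 0.0
    for row in rows:
        weight = row[weight_key] if weight_key else 1.0
        total += weight
        if row["obese"]:
            obese += weight
    return 100.0 * obese / total


# Made-up cases for one group; "obese" and "wt" are hypothetical field names.
rows = [
    {"obese": True, "wt": 1.5},
    {"obese": False, "wt": 0.5},
    {"obese": False, "wt": 1.0},
    {"obese": True, "wt": 1.0},
]

unweighted = percent_obese(rows)
weighted = percent_obese(rows, weight_key="wt")
print(unweighted, weighted)
```

In a real analysis the same calculation would be run within each ethnic group and separately for men and women, as described above.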
The 1999 and 2004 Health Surveys for England (HSE) both contained ethnic boost samples in order to increase the number of ethnic minority respondents in the survey.
It is possible to pool the 1999 and 2004 datasets to further increase the number of ethnic minority respondents for your analyses. When pooling the 2 datasets you need to ensure that you use the weights from 1999 and 2004 in the correct way.
First you need to prepare the 2004 weights, see the FAQ above about how to use the 2004 Health Survey for England (HSE) weights.
Next, you need to prepare the 1999 weights which are less complex than the 2004 weights.
For the 1999 dataset you need to follow the same procedure as 2004 in terms of dropping the ethnic minority cases from the general population dataset, merging the 2 files together and adjusting the weight so that the weighted total equals the actual total. You can check the population is in the correct proportion for 1999 (e.g. the white group reflects the correct proportion of the population) by comparing with the 2001 Census estimates.
You should then scale the weights for both years (i.e. divide each year by the mean weight). Scaling the weights for both years means you're not giving prominence to any one year - both years have a mean weight of one. You then simply give the new scaled weight in each year the same name (e.g. wt_intnew) and then combine the datasets and use the new scaled weight (e.g. wt_intnew) as described in the questions above about how to use the 2004 Health Survey for England (HSE) weights.
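The per-year scaling and pooling can be sketched in Python. All names and values here are illustrative assumptions: the source weight is called `wt_int` in both years purely for convenience (the 1999 file may use a different variable name), and the rows are made up.

```python
def scale_to_mean_one(rows, weight_key, new_key="wt_intnew"):
    """Add a scaled weight (weight divided by the mean weight) to each row."""
    mean_weight = sum(row[weight_key] for row in rows) / len(rows)
    return [{**row, new_key: row[weight_key] / mean_weight} for row in rows]


def pool_years(rows_by_year, weight_key="wt_int"):
    """Scale the weights within each year, tag each row with its year, and append."""
    pooled = []
    for year, rows in rows_by_year.items():
        for row in scale_to_mean_one(rows, weight_key):
            pooled.append({**row, "year": year})
    return pooled


# Made-up weights; the two years have very different mean weights before scaling.
data = {
    1999: [{"wt_int": 1.0}, {"wt_int": 3.0}],
    2004: [{"wt_int": 5.0}, {"wt_int": 10.0}, {"wt_int": 15.0}],
}
pooled = pool_years(data)

for year in data:
    scaled = [row["wt_intnew"] for row in pooled if row["year"] == year]
    print(year, sum(scaled) / len(scaled))  # mean scaled weight within the year
```

Scaling within each year before appending is what keeps a year with larger raw weights from dominating the pooled estimates.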
See also the other HSE FAQs for more information about pooling the 1999 and 2004 HSE datasets.
The gora variable does not have labels in the deposited data in the HSE 2001. The coding for this variable is as below:
A North East
B North West
E East Midlands
F West Midlands
G East England
J South East
K South West
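If you are working with the tab-delimited file, the codes listed above can be attached as labels with a simple lookup. This Python sketch uses only the codes and labels given in the list; `label_gora` is a hypothetical helper, and any code not in the list is returned unchanged rather than guessed.

```python
# Labels copied from the list above.
GORA_LABELS = {
    "A": "North East",
    "B": "North West",
    "E": "East Midlands",
    "F": "West Midlands",
    "G": "East England",
    "J": "South East",
    "K": "South West",
}


def label_gora(code):
    """Return the region label for a gora code, or the code itself if unlisted."""
    return GORA_LABELS.get(code, code)


print(label_gora("A"))
print(label_gora("K"))
```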
The HSE generally uses standard missing value conventions which are shown below (a few of the variables do not conform to this scheme but the value labels are clearly marked). However, sometimes missing values are coded incorrectly, for example all missing values for the 2002 and 2003 variable WhVal (Waist-hip measurement value) are coded as -1 when they should be coded as either -6 or -7.
Missing value conventions on the HSE:
-1 Not applicable: Used to signify that a particular variable did not apply to a given respondent, usually because of internal routing. For example, men in women-only questions.
-2 Schedule not applicable: Used mainly for variables on the self-completions when the respondent was not of the given age range, also used for children without legal guardians in the home who could not participate in the nurse schedule.
-6 Schedule not obtained: Used to signify that a particular variable was not answered because the respondent did not complete or agree to a particular schedule (i.e. nurse schedule or self-completions).
-7 Refused/ not obtained: Used only for variables on the nurse schedules, this code indicates that a respondent refused a particular measurement or test or the measurement was attempted but not obtained or not attempted.
-8 Don't know, Can't say.
-9 No answer/ Refused.
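When the data are read outside SPSS or Stata, these codes arrive as ordinary negative numbers and need to be set to missing before analysis. The Python sketch below shows one way to do that, using the codes listed above; the function names and the sample values are hypothetical. It assumes the variable follows the standard convention, so check the value labels first for variables that do not conform or that can legitimately take negative values.

```python
# Missing value codes and meanings, as listed above.
MISSING_CODES = {
    -1: "Not applicable",
    -2: "Schedule not applicable",
    -6: "Schedule not obtained",
    -7: "Refused/not obtained",
    -8: "Don't know, can't say",
    -9: "No answer/refused",
}


def clean_value(value):
    """Return None for a missing value code, otherwise the value unchanged."""
    return None if value in MISSING_CODES else value


def valid_mean(values):
    """Mean of the values that are not missing value codes (None if none remain)."""
    kept = [v for v in values if clean_value(v) is not None]
    return sum(kept) / len(kept) if kept else None


# Made-up measurements with two missing value codes mixed in.
readings = [120, -1, 130, -8, 140]
print(valid_mean(readings))           # mean of the three valid readings
print(sum(readings) / len(readings))  # misleading mean if codes are left in
```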
These are published in annual reports and are available from a good academic library or from The Stationery Office. Please also refer to the HSE online reports and results on the Health and Social Care Information Centre website (see the Department of Health website for earlier publications) and the HSE reports on the NatCen Health and Lifestyle web page.
Yes. Wales, Scotland and Northern Ireland all have health surveys that are downloadable from our web site. They are:
However, it is important to note that there are issues of compatibility between these surveys. The questions that are used are not always identical and there are methodological differences between surveys. For example, the WHS in 1995 and 1998 used a paper questionnaire to collect information whilst the WHS from 2003-4 onwards, HSE, SHeS and Northern Ireland surveys gather data through interviews.
See our Health and health behaviour theme pages for more health-related data and resources.
Using the Health Survey for England for teaching
Three teaching datasets based on the HSE have been produced, all held at the UK Data Archive. These datasets use data from the related HSE dataset but contain fewer variables.
The following teaching datasets were created for particular courses given by the UK Data Service (or its forerunner, the Economic and Social Data Service). They may be used for other purposes but were designed to go with the teaching materials used on the course, which are also available to download with the data.
Yes, the workbook Using SPSS for Windows: Exploring the Health Survey for England 2002 Teaching Dataset in SPSS uses the 2002 HSE teaching dataset to demonstrate how to get started doing research using SPSS.
The teaching datasets for multi-level modelling and small area estimation were designed for courses on those techniques and the materials used on the course are available to download with the teaching datasets.
See our teaching pages for practical information, exemplars, and tips for using UK Data Service data in teaching, including: