How did the study come about?

During recent decades China has undergone a rapid transition in the main disease patterns of its population. There has been a substantial decrease in maternal and child mortality and in adult mortality from the main infectious and parasitic diseases, whereas mortality from ischaemic heart disease and some other non-communicable chronic diseases has increased moderately, at least in some parts of China. Consequently, most of the premature mortality that remains in China now involves the chronic diseases of middle age, such as cancer, stroke, heart disease or chronic lung disease.1–4 In the mid-1970s, a retrospective survey of the causes of 20 million deaths in China during 1973–75 showed that, for each major disease, there was large unexplained variation in age-specific rates between different parts of China, indicating that there must be some large avoidable causes.5 This finding was confirmed and extended in the late 1980s by a more careful retrospective survey of the causes of one million deaths during 1986–88 in 69 rural counties and 24 cities (Figure 1).3,4 These large differences in disease rates between areas probably reflect differences in the ways people live rather than genetic differences. Moreover, even within one area there are likely to be substantial differences between individuals in genetic factors, as well as in patterns of chronic infection, personal biochemistry, physical characteristics and lifestyle, that persist for many years and eventually influence the likelihood of particular individuals developing particular diseases.3,4

Figure 1

Geographic variation in age-standardized mortality rates (per 100 000) at ages 35–69 for selected diseases in rural China, 1986–88. Each letter in the figure represents one county; comparable rates for UK males and females in 1990 are also shown.

Large prospective cohort studies that last for many years, comparing the characteristics of people who eventually develop a particular disease with those of people in the same area who do not, are an important way of investigating many slow-acting causes of common chronic diseases. During the 1990s, a large prospective cohort study (as well as a large retrospective study) was set up to help determine the main avoidable causes of chronic disease in China.6 That study did not involve the collection and long-term storage of blood samples, but, together with the retrospective study, it is generating valuable information about the health effects of tobacco smoking and of some other risk factors that do not require blood samples, such as body mass index (BMI) and blood pressure.6–8 Without blood samples, however, the range of risk factors that can be investigated is relatively limited. Consequently, a blood-based prospective cohort study involving 500 000 middle-aged adults in 10 different parts of China was funded in 2002 by a major research grant from the Hong Kong-based Kadoorie Charitable Foundation (KCF), with UK-based support from the Medical Research Council, the British Heart Foundation and Cancer Research UK. Recruitment started in June 2004, the first 100 000 participants had been recruited by mid-2005, and recruitment should be completed during 2008.

What does it cover?

The Kadoorie Study of Chronic Disease in China (KSCDC) is an open-ended prospective study with very broad research aims. Its primary objectives are to assess reliably the effects of both established and emerging risk factors for many different diseases, not only overall but also in a range of different circumstances (e.g. at different ages and at different levels of other risk factors). Given the recent large increase in cigarette consumption among Chinese adults, the study will also help monitor the growth of the tobacco epidemic in China over the next few decades.6,7,9,10

By storing both plasma and buffy coat samples from a large number of individuals, the study will allow reliable assessment of the relevance of many genetic and other factors that will be proposed in the future as correlates or determinants of the incidence or mortality rates of various common chronic diseases. For instance, the identification of a very large number of genetic markers and the availability of high-throughput genotyping are expected soon to allow studies of unrelated individuals to locate new genetic variants associated with increased risk of various major chronic diseases. There is also a long list of other blood factors that may well be of substantial interest as correlates of disease-specific mortality (e.g. lipoproteins and lipid fractions, homocysteine, various haemostatic and inflammatory factors, other plasma proteins, and antibodies to many infective agents). The study will use a ‘nested’ case–control approach for blood analyses: when sufficient numbers of people have developed or died from particular diseases, their stored blood will be retrieved and compared with blood retrieved from otherwise similar individuals who have not developed that disease. By seeking differences in the stored DNA and plasma, a wide range of genetic and environmental correlates and causes can be studied. This approach allows many different factors to be studied in relation to many different diseases at relatively low cost, without needing to know at the outset which factors will be of chief interest many years hence.
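The nested case–control selection described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's actual procedure: the `Participant` record and its fields, the matching criteria (same region and sex, birth year within ±5 years), the 2:1 control ratio, and measuring time in years since enrolment are all hypothetical choices. Controls are drawn by risk-set (incidence density) sampling, so a person who develops the disease later can still serve as a control for an earlier case.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Participant:
    # Hypothetical record; field names are illustrative, not the KSCDC data model.
    pid: int
    region: int
    sex: str                     # "M" or "F"
    birth_year: int
    event_time: Optional[float]  # years from enrolment to the disease of interest; None if not observed
    exit_time: float             # years of follow-up completed (death, loss or censoring)


def risk_set(cohort: List[Participant], case: Participant, age_window: int) -> List[Participant]:
    """Potential controls: matched on region, sex and age, and still under
    follow-up and free of the disease at the case's event time."""
    t = case.event_time
    return [
        p for p in cohort
        if p.pid != case.pid
        and p.region == case.region
        and p.sex == case.sex
        and abs(p.birth_year - case.birth_year) <= age_window
        and p.exit_time >= t
        and (p.event_time is None or p.event_time > t)
    ]


def sample_controls(cohort: List[Participant], n_controls: int = 2,
                    age_window: int = 5, seed: int = 0) -> Dict[int, List[int]]:
    """Map each case's pid to the pids of its randomly sampled matched controls."""
    rng = random.Random(seed)
    matched: Dict[int, List[int]] = {}
    for case in cohort:
        if case.event_time is None:
            continue  # not a case
        pool = risk_set(cohort, case, age_window)
        chosen = rng.sample(pool, min(n_controls, len(pool)))
        matched[case.pid] = [p.pid for p in chosen]
    return matched
```

Only the stored samples of the selected cases and controls would then need to be retrieved and assayed, which is what keeps the cost of testing each new hypothesis low.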

Who is in the sample?

KSCDC aims to recruit ∼500 000 adults initially aged 35–74 from the general population in 10 geographically defined regions (five rural counties and five urban districts) across China (Figure 2). The cohort is not designed to be representative of the general population of China, although within each study area the participants will be relatively unselected. Rather, the study areas were chosen to cover regions throughout China with quite different disease profiles and quite different exposures. The sites were carefully selected on the basis of patterns of major chronic disease rates and risk exposures, level of economic development, relative stability of the population, local infrastructure (including the quality of existing death reporting systems, access to broadband internet, and the availability of a reliable courier service for shipping blood samples by air), and long-term local commitment to the project. In each administrative unit (village or street committee) of the 10 study areas, all men and women aged 35–74 who are permanent residents (to allow complete follow-up of mortality and morbidity) and have no major disability will be identified and invited to participate. The response rate at each survey site will be calculated relative to these known denominators.

Figure 2

Locations of the 10 survey sites.

How often will they be followed-up, and what is the rate of loss likely to be?

At 5-year intervals, a reasonably representative sample of ∼10 000 (2%) surviving study participants will be invited for re-survey, with repeat interviews and the same measurements and blood collection procedures as in the baseline survey. These repeat assessments will be used to take account of within-person biological variation and random measurement error at baseline, allowing correction for ‘regression dilution’ bias when estimating associations between long-term ‘usual’ levels of particular risk factors and disease.11 All study participants will be followed up indefinitely for cause-specific mortality through the death registries that are now fully established at the 10 study sites. Nearly all adult deaths in these survey areas will have involved some form of medical attention, with the underlying cause certified by a doctor. In the rare cases (currently <5%) in which death occurs at home without recent medical attention, qualified staff based at the regional coordinating centre will conduct a verbal autopsy to help determine the most likely cause from the symptoms or signs described by family members.6
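To illustrate how repeat measurements can correct for regression dilution, the sketch below estimates a regression dilution ratio (λ) as the least-squares slope of resurvey values on baseline values in the same people, then rescales an observed log hazard ratio by it. This is one simple way of estimating such a ratio, not necessarily the method the study will use; the function names and example numbers are hypothetical.

```python
import math
from typing import Sequence


def regression_dilution_ratio(baseline: Sequence[float], resurvey: Sequence[float]) -> float:
    """Least-squares slope of resurvey values on baseline values in the same
    people; a value < 1 means baseline extremes regress towards the mean."""
    n = len(baseline)
    if n != len(resurvey) or n < 2:
        raise ValueError("need paired baseline and resurvey values for >= 2 people")
    mean_x = sum(baseline) / n
    mean_y = sum(resurvey) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, resurvey))
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    return sxy / sxx


def corrected_hazard_ratio(observed_hr: float, ratio: float) -> float:
    """Convert a hazard ratio per unit of baseline measurement into one per unit
    of 'usual' level: log(HR_usual) = log(HR_baseline) / ratio."""
    return math.exp(math.log(observed_hr) / ratio)
```

For example, with purely hypothetical values, if λ were 0.7 for systolic blood pressure, a hazard ratio of 1.42 per 10 mm Hg of baseline pressure would correspond to about exp(ln 1.42 / 0.7) ≈ 1.65 per 10 mm Hg of usual pressure.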

To improve the statistical power of the study, information on non-fatal events will also be obtained for certain major categories of disease (such as cancer, stroke and heart attack) through the established registry systems available at the five urban sites, by periodically visiting doctors at street or village health clinics, and by discussing the status of the study cohort with village or street committee members during annual active follow-up. For any suspected case of non-fatal stroke, coronary heart disease (CHD) or cancer, confirmation of the diagnosis will be sought by reviewing hospital or other medical records. Although the baseline survey of 50 000 participants in each study area is expected to take 3–4 years to complete, mortality follow-up for particular administrative units (villages or street committees) within each study area will start within 6 months of the start of the baseline survey in that area, to allow time to establish computerized long-term follow-up systems.

The quality and completeness of both mortality and morbidity follow-up data at each survey site will be checked regularly by the study coordinating centres. This will involve monitoring the number of people who have died or been lost to follow-up each year, the overall mortality patterns in the study cohort, the diagnostic criteria used for individual diseases, and the proportion of deaths (in middle age and, separately, in old age) with unknown cause. The observed age-specific mortality rates at each study site will also be compared with those expected from the overall death rates for that region. It is anticipated that during the first 10 years of follow-up there will be ≥60 000 deaths, including ∼15 000 from each of cancer, stroke and chronic obstructive pulmonary disease (COPD) and 5000 from CHD (plus, perhaps, even larger numbers of non-fatal events). Both the proportion of deaths with unknown cause and the proportion of participants lost to follow-up should be <5% (at least up to age 80).
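Comparing observed with expected mortality in this way amounts to indirect standardization. A minimal sketch, assuming the reference is a set of regional age-specific death rates per person-year and that the cohort's person-years are tallied in the same age bands; the band labels and numbers are hypothetical.

```python
from typing import Mapping


def expected_deaths(person_years: Mapping[str, float],
                    reference_rates: Mapping[str, float]) -> float:
    """Deaths expected if the cohort experienced the reference age-specific rates."""
    return sum(py * reference_rates[band] for band, py in person_years.items())


def observed_expected_ratio(observed: int,
                            person_years: Mapping[str, float],
                            reference_rates: Mapping[str, float]) -> float:
    """Ratio of observed to expected deaths (an indirectly standardized ratio)."""
    return observed / expected_deaths(person_years, reference_rates)


# Hypothetical site: person-years of follow-up and regional death rates by age band.
person_years = {"35-49": 10_000, "50-64": 8_000, "65-74": 2_000}
regional_rates = {"35-49": 0.002, "50-64": 0.008, "65-74": 0.03}
print(round(observed_expected_ratio(120, person_years, regional_rates), 2))  # 0.83
```

A ratio well below 1 could signal missed deaths, although a cohort that is healthier than the general population would also lower it, so the ratio is a prompt for investigation rather than proof of under-ascertainment.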

What has been measured?

After giving written informed consent (which allows their blood to be used for unspecified research purposes that are of no direct benefit to them), each study participant will be interviewed, undergo a physical examination and provide a blood sample for long-term storage. The face-to-face interview covers a broad range of variables (Table 1), including demographic factors, indicators of socio-economic status, smoking, alcohol and green tea consumption, diet, physical activity, pre-existing diseases and current long-term medication, past and present indoor air pollution, reproductive history (women), and psychological status. Blood pressure, heart rate, height (standing and sitting), weight, waist and hip circumference, bio-impedance, lung function (FEV1 and FVC) and exhaled CO level are also measured (Table 2). A 10 ml non-fasting blood sample is collected into an EDTA-containing vacutainer; a small amount of this is used for on-site rapid testing of random blood glucose and hepatitis B surface antigen (HBsAg) before the vacutainer is placed in a cool box at ∼4°C (Table 2).

All information is entered directly into a laptop computer using a data entry system developed specifically for the project. The program has built-in checks to prevent missing items and minimize logical errors during the interview, as well as help buttons to guide interviewers. At the end of each survey day, the blood samples are centrifuged at the local study laboratory and then divided into four bar-coded cryovials (three plasma and one buffy coat) for long-term storage in nitrogen tanks. Within ∼1–3 weeks of the initial survey, repeat measurements are obtained on a random sub-sample (3%) of participants, providing estimates of reliability and a check against any serious organizational failure.
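The repeat measurements on the 3% sub-sample can be summarized by a reliability coefficient. One common choice, assumed here rather than taken from the study protocol, is the one-way random-effects intraclass correlation coefficient (ICC), which compares between-person variance with within-person (repeat) variance. The function name is hypothetical.

```python
from typing import Sequence


def icc_oneway(measurements: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1,1).

    measurements: one row per participant, each with the same number k >= 2
    of repeat values (e.g. the baseline value and the 1-3 week repeat)."""
    n = len(measurements)
    k = len(measurements[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 participants, each with the same k >= 2 values")
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]
    # Between-participant and within-participant mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values close to 1 indicate that most of the variation lies between people rather than between repeat measurements on the same person.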

Table 1

Summary of questionnaire data collected in the KSCDC project

Demographic data
Socioeconomic data
    •Occupation
    •Education
    •Household composition
    •Income
    •Health insurance cover
    •Financial assets
Personal health behaviours
    •Alcohol
    •Smoking
    •Green tea
    •Diet (frequency)
    •Spicy food
    •Nutritional supplement
    •Physical activity (at work and during leisure time)
General health-related data
    •Self-rated health status
    •Disease history (for 18 common conditions)
    •Current medication for CVD or diabetes
    •History of blood transfusion
    •Pattern of bowel movements
    •History of severe food shortage
    •Exposure to passive smoking
    •Exposure to indoor air pollution from cooking/heating fuel
    •Weight change during the last 12 months
    •Weight in early adulthood
Family history
    •Parental age or age at death
    •Parental cause of death
    •Number of siblings
    •Siblings' medical history
    •Number of children
    •Children's medical history
Sleep, mood, and mental health
    •Self-rated mood status
    •Traumatic events
    •Sleep situation
    •Depression and anxiety (CIDI-SF)
    •Panic attack
    •Chronic pain
    •Claustrophobia and agoraphobia
Reproductive history (for women)
    •Age at first menstrual period
    •Menopause status
    •History of pregnancy
    •History of breast feeding
    •History of oral contraceptive use
    •History of hysterectomy and of ovary/breast surgery
Table 2

Summary of clinical measurements at baseline survey in KSCDC

Variables | No. of measurements | Equipment used
Standing height | Once | Manufactured instrument
Sitting height | Once | Manufactured instrument
Hip circumference | Once | Standard tape measure
Waist circumference | Once | Standard tape measure
Weight | Once | Body composition analyser (TBF-300GS)
Bio-impedance | Once | Body composition analyser (TBF-300GS)
FEV1 | Twice | Micro (MS01) spirometer
FVC | Twice | Micro (MS01) spirometer
CO level | Twice | Micro CO meter (MC02)
Resting pulse rate | Twice | UA-779 digital BP monitor
Resting blood pressure (a) | Twice | UA-779 digital BP monitor
Random blood glucose (b) | Once | Johnson SureStep Plus
Hepatitis B surface antigen (HBsAg) | Once | ACON dipstick, USA

(a) If the two systolic blood pressure (SBP) readings differ by >10 mm Hg, a third measurement is required, and the values of the last two measurements are recorded.

(b) If the value is between 7.8 and 11.0 mmol/l, a fasting blood glucose test will be carried out on the following day.
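The two footnote rules in Table 2 are simple protocol checks of the kind that could be built into the laptop-based data entry program. The sketch below encodes them; the function names are hypothetical, and treating the glucose range 7.8–11.0 mmol/l as inclusive at both ends is an assumption.

```python
from typing import Sequence, Tuple


def needs_third_bp_reading(sbp1: float, sbp2: float) -> bool:
    """Footnote a: a third reading is needed if the first two SBP values differ by > 10 mm Hg."""
    return abs(sbp1 - sbp2) > 10


def sbp_values_to_record(readings: Sequence[float]) -> Tuple[float, float]:
    """Return the two SBP values to record: the first two, or the last two
    when a third reading was required."""
    if len(readings) < 2:
        raise ValueError("at least two readings are needed")
    if needs_third_bp_reading(readings[0], readings[1]):
        if len(readings) < 3:
            raise ValueError("a third reading is required")
        return readings[1], readings[2]
    return readings[0], readings[1]


def needs_fasting_glucose_test(random_glucose_mmol_l: float) -> bool:
    """Footnote b: fasting test on the following day if random glucose is
    7.8-11.0 mmol/l (both bounds assumed inclusive)."""
    return 7.8 <= random_glucose_mmol_l <= 11.0
```

For instance, readings of 130 and 142 mm Hg differ by more than 10 mm Hg, so a third reading is taken and the second and third values are the ones recorded.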

What has it found?

Following extensive piloting, the main study was launched in June 2004. By mid-2005, all 10 selected sites had started recruitment and over 100 000 participants had been recruited, with a current recruitment rate of ∼11 000 per month (Figure 3). The mean age of the participants recruited so far was 51 years, and there were more women (58%) than men (42%). This sex ratio is slightly more extreme than in the local (non-migrant) population and is being rectified during the remaining recruitment period. The proportions of regular alcohol drinkers and current smokers were much higher in men (45 and 59%, respectively) than in women (5 and 2%). The mean BMI was 23.6 kg/m², and 33% of participants had a BMI of 25–30 kg/m² (‘overweight’; 29%) or >30 kg/m² (‘obese’; 4%). There was, however, a large difference across regions in the prevalence of overweight and obesity (as well as of many other factors), probably reflecting different stages of economic development (Figure 4). BMI correlated strongly with blood pressure in both men and women (Figure 5), and with the prevalence of cardiovascular disease, diabetes, or elevated random blood glucose (>11.1 mmol/l) (Figure 6).
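The BMI categories used above can be expressed directly. A minimal sketch: BMI is weight in kilograms divided by the square of height in metres. Treating exactly 30 kg/m² as ‘obese’ (the usual WHO convention) is an assumption, since the text gives the ranges as 25–30 and >30 kg/m²; the function names and category labels are hypothetical.

```python
from typing import Iterable


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    # Assumed cut-offs: [25, 30) overweight, >= 30 obese.
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "below 25"


def proportion_overweight_or_obese(values: Iterable[float]) -> float:
    """Share of participants with BMI >= 25 kg/m^2."""
    values = list(values)
    return sum(bmi_category(v) != "below 25" for v in values) / len(values)
```

Applied to the baseline data, the share of participants in the two upper categories would correspond to the 33% (29% + 4%) reported above.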

Figure 3

Cumulative recruitment rate during the first year of study

Figure 4

Age-standardized prevalence of overweight or obesity among men in different regions. (Data from the recently started 10th region were not included because the numbers are still small.)

Figure 5

Relationship between baseline BMI and systolic blood pressure, standardized for age

Figure 6

Age-standardized prevalence of stroke, CHD, and diabetes by baseline BMI levels
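Figures 4 and 6 report age-standardized prevalences. The text does not state which standard population was used, so the sketch below simply shows direct standardization with hypothetical age bands and weights: each age-specific prevalence is weighted by the standard population's share of people in that band.

```python
from typing import Mapping


def age_standardized_prevalence(cases: Mapping[str, int],
                                totals: Mapping[str, int],
                                standard_weights: Mapping[str, float]) -> float:
    """Direct standardization: weighted mean of age-specific prevalences,
    using the standard population's age distribution as weights."""
    weight_sum = sum(standard_weights.values())
    weighted = sum(w * cases[band] / totals[band] for band, w in standard_weights.items())
    return weighted / weight_sum


# Hypothetical example: 10/100 affected at ages 35-54 and 30/100 at 55-74,
# standardized to a population that is 60% aged 35-54 and 40% aged 55-74.
print(age_standardized_prevalence({"35-54": 10, "55-74": 30},
                                  {"35-54": 100, "55-74": 100},
                                  {"35-54": 0.6, "55-74": 0.4}))
```

Because every region is weighted by the same standard, two regions with identical age-specific prevalences receive the same standardized value even if one has a much older population.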

By involving rural populations that have not previously been studied extensively, this large blood-based epidemiological study could widen perspectives on what is regarded as ‘normal’ in Western populations. For example, a previous small prospective study in China showed that the positive relationship between CHD risk and blood total cholesterol continues down to at least 3 mmol/l (i.e. well below the range generally studied in Western populations), with no evidence of a threshold below which lower cholesterol was not associated with lower risk.12 If confirmed by larger studies involving assays of different components of the lipid profile, this would indicate that low-density lipoprotein (LDL) particles are an important cause of CHD even in individuals whose LDL cholesterol levels are low by Western standards. Such an international perspective on risk factors can help to avoid confusion between what is ‘statistically’ normal (e.g. average cholesterol levels of 5–6 mmol/l in the UK) and what may be ‘biologically’ normal (e.g. average cholesterol levels of <3 mmol/l, as still found in some parts of rural China4).

What are the main strengths and weaknesses?

The KSCDC will be among the largest blood-based prospective studies ever conducted. It has been carefully designed, with a range of comprehensive computerized systems for reliable and efficient data collection and management (Table 3). The chronic diseases normally associated with affluence (e.g. CHD, lung cancer and diabetes) are most prevalent in the urban and coastal study sites, whereas those normally associated with poverty (e.g. stomach cancer, oesophageal cancer and COPD) are more prevalent in the inland and rural sites (Figure 1). Although there are numerous challenges ahead, especially in recruiting enough men and ensuring reliable follow-up of non-fatal events in some study areas, this large study will provide reliable new data on both established and emerging risk factors for several major chronic diseases in a range of different circumstances. Moreover, by storing blood samples from a large number of individuals, it will allow reliable assessment of many genetic and other factors that will be proposed in the future as correlates or determinants of risk of common chronic diseases. This will be relevant to the prediction, prevention and understanding of disease both in China and in other countries.

Table 3

Summary of computerized data collection and management systems

Can I get hold of the data? Where can I find out more?

Recruitment of the target population of 500 000 is not expected to end until 2008. Although the data will not be freely available, specific proposals for future collaboration would be welcomed. Further information can be found on the study websites (www.ctsu.ox.ac.uk/~kadoorie/public/ in English or www.kscdc.net in Chinese) or through email to Pamela.linksted@ctsu.ox.ac.uk (project manager).

For the KSCDC collaborative group.

The KSCDC is supported by a research grant from the KCF in Hong Kong. The UK Medical Research Council, the British Heart Foundation, and Cancer Research UK also provide core funding to the Clinical Trial Service Unit. The most important acknowledgement is to the participants in the study, to members of the survey teams in each of the 10 regional centres (Henan, Hunan, Gansu, Sichuan, Zhejiang, Harbin, Qingdao, Suzhou, Liuzhou, and Haikou), as well as to the project development and management teams based at the Beijing national (Shaorun Dong, Zheng Bian, Yian Xu, Guoqin Zhao, and Jiang He) and Oxford international coordinating centres (Garry Lancaster, Herminio Gonzalez, Alison Offer, and Judith Nie).

References

1. Murray CJL, Lopez AD (eds). The Global Burden of Disease. Vol. 1. Harvard: Harvard University Press, 1996.

2. Yang GH, Murray CJL, Zhang Z. Exploring Adult Mortality in China: Levels, Patterns and Causes. Beijing: Hua Xia Press, 1991.

3. Chen JS, Campbell TC, Li JY, Peto R. Diet, Lifestyle and Mortality in China: A Study of the Characteristics of 65 Chinese Counties. Oxford: Oxford University Press, 1990.

4. Chen JS, Peto R, Pan WH et al. Geographic Study of Mortality, Biochemistry, Diet and Lifestyle in Rural China. Oxford: Oxford University Press, 2006.

5. Li JY, Liu BQ, Li GY, Chen ZJ, Sun XD, Rong SD. Atlas of cancer mortality in the People's Republic of China: an aid for cancer control and research. Int J Epidemiol 1981;10:127–33.

6. Niu SR, Yang GH, Chen ZM et al. Emerging tobacco hazards in China: 2. Early mortality results from a prospective study. BMJ 1998;317:1423–24.

7. Liu BQ, Peto R, Chen ZM et al. Emerging tobacco hazards in China: 1. Retrospective proportional mortality study of one million deaths. BMJ 1998;317:1411–22.

8. Chen ZM, Yang GH, Zhou MG et al. Body mass index and mortality from ischaemic heart disease in a lean population: 10 year prospective study of 220 000 adult men. Int J Epidemiol 2005 (in press).

9. Peto R, Chen ZM, Boreham J. Tobacco—the growing epidemic. Nat Med 1999;1:15–17.

10. Lam TH, Ho SY, Hedley AJ, Mak KH, Peto R. Mortality and smoking in Hong Kong: case-control study of all adult deaths in 1998. BMJ 2001;323:361–62.

11. Clarke R, Shipley M, Lewington S et al. Underestimation of risk associations due to regression dilution in long-term follow-up of prospective studies. Am J Epidemiol 1999;150:341–53.

12. Chen ZM, Peto R, Collins R, Lu JR, Li WX. Serum cholesterol concentration and coronary heart disease in a population with low cholesterol concentration. BMJ 1991;303:276–82.