Gut microbiota composition correlates with diet and health …

RESEARCH ARTICLE

subjects, representative (by UniFrac) of three community settings. (Day-hospital subjects grouped closely with community dwellers by microbiota and dietary analysis, and were not included.) A representative NMR profile is presented in Supplementary Fig. 5. Initial principal component analysis (PCA) showed a trend for separation according to community setting (data not shown). Pair-wise statistical models were therefore constructed according to the cluster groups. Valid and robust models were obtained for comparison of NMR spectra from community and long-stay subjects, and community and rehabilitation subjects (Fig. 3). The major metabolites separating community from long-stay subjects were glucose, glycine and lipids (higher levels in long-stay than community subjects), and glutarate and butyrate (higher levels in community subjects). Co-inertia analysis of the genus-level microbiota and metabolome data revealed a significant relationship (P value < 0.01) between the two data sets (Supplementary Fig. 6 and Supplementary Notes). With the exception of three long-stay subjects, a diagonal separated community from long-stay subjects in both microbiota and metabolome data sets. Other metabolites of interest were acetate, propionate and valerate, which were more abundant in community dwellers (Supplementary Fig. 6). To investigate microbial short-chain fatty acid (SCFA) production further, the frequency of microbial genes for SCFA production was investigated by shotgun metagenomic sequencing. We sequenced 125.9 gigabases (Gb) of bacterial DNA from 27 of the 29 subjects, and assembled contigs with a total length of 2.20 Gb, containing 2.51 million predicted genes (Supplementary Table 4). Consistent with reduced microbiota diversity (Supplementary Fig. 3), there were significantly fewer total genes predicted, and higher N50 values (N50 is the length of the shortest contig in the smallest set of largest contigs whose combined length represents at least 50% of the assembly), in the assembled metagenomic data of long-stay subjects compared to rehabilitation or community subjects (Supplementary Fig. 7). The metagenomes were then searched for key microbial genes in butyrate, acetate and propionate production, revealing significantly higher gene counts and coverage for butyrate- and acetate-producing enzymes (BCoAt and ACS, respectively) in community and rehabilitation compared to long-stay subjects (Supplementary Fig. 8 and Supplementary Table 5). There was also significantly higher coverage of the propionate-related genes (PCoAt) in community compared to long-stay subjects, but the higher gene count was not significant (Supplementary Table 5). These observations are consistent with the association of butyrate, acetate and propionate and the direction of the main split between long-stay and community subjects in the metabolome; candidate
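The N50 definition given in the text can be made concrete with a short calculation. The following is a minimal sketch, not the authors' assembly pipeline; the function name `n50` and the contig lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given the lengths of its contigs."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk from the largest contig downwards and stop as soon as the
    # contigs seen so far cover at least 50% of the total assembly length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total length 300.
example = [80, 70, 50, 40, 30, 20, 10]
print(n50(example))
```

With these illustrative lengths, the two largest contigs (80 and 70) together reach half of the assembly, so the N50 is the shorter of the two. An assembly made of fewer, longer contigs gives a higher N50 under this definition.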

described over 11% of the data set variance and most differences in food consumption between community-dwelling and long-stay subjects. The most discriminating food types were vegetables, fruit and meat, whose consumption changed in a gradual manner along the first eigenvector. Procrustes analysis of the FFQ and the microbiota β-diversity was used to co-visualize the data (Fig. 2b). Separations based on either diet or microbiota co-segregated along the first axis of both data sets (unweighted and weighted UniFrac, Fig. 2b; Monte-Carlo P value < 0.0001). Application of complete linkage clustering and Euclidean distances to the first eigenvector (Fig. 2c) revealed four dietary groups (DGs). DG1 (‘low fat/high fibre’) and DG2 (‘moderate fat/high fibre’) included 98% of the community and day hospital subjects, and DG3 (‘moderate fat/low fibre’) and DG4 (‘high fat/low fibre’) included 83% of the long-stay subjects. For a complete description of dietary groups, see Supplementary Notes and Supplementary Table 3. The healthy food diversity index (HFD; ref. 23) positively correlated with three microbiota diversity indices (Supplementary Fig. 2a), and all four indices showed significant differences between community and long-stay subjects (Supplementary Fig. 2b), indicating that a healthy, diverse diet promotes a more diverse gut microbiota. Analysing by dietary groups rather than residence location confirmed that both microbiota and diet were most diverse in DG1, and least diverse in DG3 and DG4 (Supplementary Fig. 3). Procrustes analysis similarly showed that the dietary groups were associated with separations in microbiota composition (Supplementary Fig. 3). Furthermore, the microbiota was associated with the duration in long-stay, with residents of more than a year having a microbiota that was furthest separated from community-dwelling subjects (Supplementary Fig. 4).
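The grouping step described above (complete linkage clustering with Euclidean distances applied to first-eigenvector scores) can be illustrated with a small, self-contained sketch. It is not the authors' code: the function name `complete_linkage`, the stopping rule (merging until a requested number of clusters remains) and the scores are assumptions made for illustration only.

```python
def complete_linkage(values, n_clusters):
    """Agglomerative clustering of 1-D scores with complete linkage.

    Returns a sorted list of clusters, each a sorted list of indices
    into `values`.
    """
    if not 1 <= n_clusters <= len(values):
        raise ValueError("n_clusters must be between 1 and len(values)")
    clusters = [[i] for i in range(len(values))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        # In one dimension the Euclidean distance is the absolute difference.
        return max(abs(values[i] - values[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical first-eigenvector scores for eight subjects.
scores = [-1.0, -0.9, -0.3, -0.2, 0.4, 0.5, 1.1, 1.2]
print(complete_linkage(scores, 4))
```

Stopping at four clusters here mirrors the four dietary groups reported in the text; how the authors chose the cut is not stated in this section.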
For the majority of these longer-term residents, the diet was different from that in more recently admitted subjects (Supplementary Fig. 4). Examination of duration of care (Supplementary Fig. 4c) showed that diet changed more quickly than the microbiota did; both diet and microbiota moved in the direction away from the community types. After 1 month in long-stay care, all subjects had a long-stay diet, but it took a year for the microbiota to be clearly the long-stay type. Collectively the data indicate that the composition of the microbiota is determined by the composition and diversity of the diet.

Community setting and faecal metabolome

Faecal metabolites correlate with microbiota composition and inflammatory scores in Crohn’s disease (ref. 24). We therefore performed metabolomic analysis (NMR spectroscopy) of faecal water from 29

[Figure 3 plots: panels a and b, PLS-DA score plots with horizontal axis t1.]

Figure 3 | PLS-DA plots of ¹H NMR spectra of faecal water from community, long-stay and rehabilitation subjects. a, Community subjects (green) versus long-stay subjects (red); R² = 0.517, Q² = 0.409, two-component model. b, Community subjects (green) versus rehabilitation subjects (orange); R² = 0.427, Q² = 0.163, two-component model. The ellipses represent the Hotelling's T² with 95% confidence. To validate the models, permutation tests (n = 1,000) were performed. For model a, the 95% confidence interval for the misclassification error rate (MER) was (0.43, 0.57). Using the PLS-DA model on the data resulted in an MER of 0.2, which is outside the 95% confidence interval obtained for random permutation tests, thus validating the model. For model b, using permutation testing the 95% confidence interval for the MER was (0.45, 0.55). Using the PLS-DA model on the data resulted in an MER of 0.16, which is outside the 95% confidence interval obtained for random permutation tests.
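The logic of the permutation check in the caption (compare the observed MER with the range of MERs obtained after shuffling the class labels) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: a leave-one-out nearest-centroid rule stands in for PLS-DA, the caption does not say how the MER was estimated, and the scores, labels and function names are hypothetical.

```python
import random


def loo_error_rate(scores, labels):
    """Leave-one-out misclassification error rate of a nearest-centroid rule.

    Stand-in classifier (assumption): the source used PLS-DA.
    """
    errors = 0
    for i in range(len(scores)):
        # Class centroids computed without the held-out sample i.
        sums, counts = {}, {}
        for j, (s, lab) in enumerate(zip(scores, labels)):
            if j != i:
                sums[lab] = sums.get(lab, 0.0) + s
                counts[lab] = counts.get(lab, 0) + 1
        predicted = min(
            sums, key=lambda lab: abs(scores[i] - sums[lab] / counts[lab])
        )
        errors += predicted != labels[i]
    return errors / len(scores)


def permutation_interval(scores, labels, n_perm=1000, seed=0):
    """Empirical 95% interval of the MER under randomly permuted labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between scores and labels
        null.append(loo_error_rate(scores, shuffled))
    null.sort()
    return null[int(0.025 * n_perm)], null[int(0.975 * n_perm) - 1]


# Hypothetical 1-D scores for two groups of ten subjects each.
group_a = [-1.0, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.3]
group_b = [1.0, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, -0.3]
scores = group_a + group_b
labels = ["community"] * 10 + ["long-stay"] * 10

observed = loo_error_rate(scores, labels)
lo, hi = permutation_interval(scores, labels)
print("observed MER:", observed)
print("permutation 95% interval:", (lo, hi))
```

An observed MER that falls below the lower bound of the permutation interval indicates that the classifier performs better than expected by chance, which is the argument the caption makes for models a and b.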

180 | NATURE | VOL 488 | 9 AUGUST 2012

©2012 Macmillan Publishers Limited. All rights reserved
