第 2 章 Vegan 使用技巧

R package for community ecologists: popular ordination methods, ecological null models & diversity analysis

Vegan 是面向研究群落的生态学家开发的软件包，包括了流行的坐标分析方法、生态学的零假设和多样性分析方法。

library(vegan)

2.1 计算 beta-dispersion

使用 betadisper() 计算。这是一个 beta-diversity （β多样性）指标。

example("betadisper")

2.1.1 使用 `varespec` 数据集计算 beta-dispersion

2.1.1.1 数据集说明

varespec 数据有 24 行和 44 列，是 44 个物种（species）在 24 个样品（site）中的估计覆盖率，列名是 44 个物种拉丁名称的简写。varechem 数据有 24 行、14 列，是 24 个样品中 14 个土壤相关的属性。

data(varespec)

## Bray-Curtis distances between samples
dis <- vegdist(varespec)

2.1.1.2 样本间距

vegdist 默认计算样品间的 Bray-Curtis 距离，其定义为：

\[d_{jk} = (\sum abs(x_{ij}-x_{ik}))/(\sum (x_{ij}+x_{ik}))\]

其中，$x_{ij}$ 和 $x_{ik}$ 分别代表物种（列） $i$ 在两个样品（行）$j$ 和 $k$ 中的数量。则前两个位点的 Bray-Curtis 距离是 0.5310021。

2.1.1.3 样本分组及计算结果

## First 16 sites grazed, remaining 8 sites ungrazed
groups <- factor(c(rep(1,16), rep(2,8)), labels = c("grazed","ungrazed"))

## Calculate multivariate dispersions
mod <- betadisper(dis, groups)
mod

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = dis, group = groups)
## 
## No. of Positive Eigenvalues: 15
## No. of Negative Eigenvalues: 8
## 
## Average distance to median:
##   grazed ungrazed 
##   0.3926   0.2706 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 23 eigenvalues)
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8 
## 1.7552 1.1334 0.4429 0.3698 0.2454 0.1961 0.1751 0.1284

mod 返回一个 betadisper 类的列表，有以下几个元件：

eig：numeric；PCoA 分析的特征值
vectors: matrix；PCoA 分析特征向量
distances: numeric；在 PCoA 的多维空间中，每个样本与其对应的分组中心间的欧氏距离。
group: factor；分组信息。
centroids: matrix；分组在 PCoA 坐标系中的中心点。
call: 函数调用信息。

str(mod)

## List of 7
##  $ eig            : Named num [1:23] 1.755 1.133 0.443 0.37 0.245 ...
##   ..- attr(*, "names")= chr [1:23] "PCoA1" "PCoA2" "PCoA3" "PCoA4" ...
##  $ vectors        : num [1:24, 1:23] 0.0946 -0.3125 -0.3511 -0.3291 -0.1926 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:24] "18" "15" "24" "27" ...
##   .. ..$ : chr [1:23] "PCoA1" "PCoA2" "PCoA3" "PCoA4" ...
##  $ distances      : Named num [1:24] 0.331 0.245 0.409 0.406 0.271 ...
##   ..- attr(*, "names")= chr [1:24] "18" "15" "24" "27" ...
##  $ group          : Factor w/ 2 levels "grazed","ungrazed": 1 1 1 1 1 1 1 1 1 1 ...
##  $ centroids      : num [1:2, 1:23] -0.1455 0.2786 0.0758 -0.2111 -0.0137 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "grazed" "ungrazed"
##   .. ..$ : chr [1:23] "PCoA1" "PCoA2" "PCoA3" "PCoA4" ...
##  $ group.distances: num [1:2(1d)] 0.393 0.271
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : chr [1:2] "grazed" "ungrazed"
##  $ call           : language betadisper(d = dis, group = groups)
##  - attr(*, "class")= chr "betadisper"
##  - attr(*, "method")= chr "bray"
##  - attr(*, "type")= chr "median"
##  - attr(*, "bias.adjust")= logi FALSE

这 24 个样品来自两组处理，其中前 16 个为一组（grazed），后 8 个为另一组（ungrazed）。 betadisper() 计算了每个组样品距离中心点（“中位数”）距离的平均值。输出中还包括 23 个特征值（eigenvalues）。

2.1.1.4 统计分析

To test if one or more groups is more variable than the others, ANOVA of the distances to group centroids can be performed and parametric theory used to interpret the significance of F. An alternative is to use a permutation test. permutest.betadisper permutes model residuals to generate a permutation distribution of F under the Null hypothesis of no difference in dispersion between groups.

anova() 方法被用来比较各个分组间是否存在有变化程度更大的情况。permutest() 则为这一任务提供了另一种统计分析手段。

## Perform test
anova(mod)

## Analysis of Variance Table
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq F value  Pr(>F)  
## Groups     1 0.07931 0.079306  4.6156 0.04295 *
## Residuals 22 0.37801 0.017182                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova() 和 permutest() 分别计算显著性。

## Permutation test for F
permutest(mod, pairwise = TRUE, permutations = 99)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 99
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq      F N.Perm Pr(>F)  
## Groups     1 0.07931 0.079306 4.6156     99   0.04 *
## Residuals 22 0.37801 0.017182                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##           grazed ungrazed
## grazed               0.04
## ungrazed 0.04295

TukeyHSD() 方法则计算出两两 group 之间存在的差异及其显著性。

## Tukey's Honest Significant Differences
(mod.HSD <- TukeyHSD(mod))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = distances ~ group, data = df)
## 
## $group
##                       diff        lwr          upr     p adj
## ungrazed-grazed -0.1219422 -0.2396552 -0.004229243 0.0429502

plot(mod.HSD)

plot() 可以绘制样品和中心点之间的距离。

## Plot the groups and distances to centroids on the
## first two PCoA axes
plot(mod)

## with data ellipses instead of hulls
plot(mod, ellipse = TRUE, hull = FALSE) # 1 sd data ellipse

plot(mod, ellipse = TRUE, hull = FALSE, conf = 0.90) # 90% data ellipse

## can also specify which axes to plot, ordering respected
plot(mod, axes = c(3,1), seg.col = "forestgreen", seg.lty = "dashed")

## Draw a boxplot of the distances to centroid for each group
boxplot(mod)

scores() 计算样本特征值。

## `scores` and `eigenvals` also work
scrs <- scores(mod)
str(scrs)

## List of 2
##  $ sites    : num [1:24, 1:2] 0.0946 -0.3125 -0.3511 -0.3291 -0.1926 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:24] "18" "15" "24" "27" ...
##   .. ..$ : chr [1:2] "PCoA1" "PCoA2"
##  $ centroids: num [1:2, 1:2] -0.1455 0.2786 0.0758 -0.2111
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "grazed" "ungrazed"
##   .. ..$ : chr [1:2] "PCoA1" "PCoA2"

# 可以将其画在图上
plot(scrs$sites, xlim = c(-.8, .8), ylim = c(-.5, .5))
text(scrs$sites, labels = rownames(scrs$sites), pos = 4)

计算给出的样本、样本组的中心点坐标等。

head(scores(mod, 1:4, display = "sites"))

##          PCoA1       PCoA2        PCoA3        PCoA4
## 18  0.09459373  0.15914576  0.074400844 -0.202466025
## 15 -0.31248809  0.10032751 -0.062243360  0.110844864
## 24 -0.35106507 -0.05954096 -0.038079447  0.095060928
## 27 -0.32914546 -0.17019348  0.231623720  0.019110623
## 23 -0.19259443 -0.01459250 -0.005679372 -0.209718312
## 19 -0.06794575 -0.14501690 -0.085645653  0.002431355

# group centroids/medians 
scores(mod, 1:4, display = "centroids")

##               PCoA1       PCoA2       PCoA3      PCoA4
## grazed   -0.1455200  0.07584572 -0.01366220 -0.0178990
## ungrazed  0.2786095 -0.21114993 -0.03475586  0.0220129

# eigenvalues from the underlying principal coordinates analysis
eigenvals(mod)

##      PCoA1      PCoA2      PCoA3      PCoA4      PCoA5      PCoA6      PCoA7 
##  1.7552165  1.1334455  0.4429018  0.3698054  0.2453532  0.1960921  0.1751131 
##      PCoA8      PCoA9     PCoA10     PCoA11     PCoA12     PCoA13     PCoA14 
##  0.1284467  0.0971594  0.0759601  0.0637178  0.0583225  0.0394934  0.0172699 
##     PCoA15     PCoA16     PCoA17     PCoA18     PCoA19     PCoA20     PCoA21 
##  0.0051011 -0.0004131 -0.0064654 -0.0133147 -0.0253944 -0.0375105 -0.0480069 
##     PCoA22     PCoA23 
## -0.0537146 -0.0741390

bias.adjust = TRUE 对于 beta-多样性估计中的小样本偏差进行校正。

## try out bias correction; compare with mod3
(mod3B <- betadisper(dis, groups, type = "median", bias.adjust=TRUE))

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = dis, group = groups, type = "median", bias.adjust
## = TRUE)
## 
## No. of Positive Eigenvalues: 15
## No. of Negative Eigenvalues: 8
## 
## Average distance to median:
##   grazed ungrazed 
##   0.4055   0.2893 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 23 eigenvalues)
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8 
## 1.7552 1.1334 0.4429 0.3698 0.2454 0.1961 0.1751 0.1284

anova(mod3B)

## Analysis of Variance Table
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq F value  Pr(>F)  
## Groups     1 0.07193 0.071927  3.7826 0.06468 .
## Residuals 22 0.41834 0.019015                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

permutest(mod3B, permutations = 99)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 99
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq      F N.Perm Pr(>F)  
## Groups     1 0.07193 0.071927 3.7826     99   0.08 .
## Residuals 22 0.41834 0.019015                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

哪怕只有一个分组，仍然可以使用

## should always work for a single group
group <- factor(rep("grazed", NROW(varespec)))
(tmp <- betadisper(dis, group, type = "median"))

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = dis, group = group, type = "median")
## 
## No. of Positive Eigenvalues: 15
## No. of Negative Eigenvalues: 8
## 
## Average distance to median:
## grazed 
## 0.4255 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 23 eigenvalues)
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8 
## 1.7552 1.1334 0.4429 0.3698 0.2454 0.1961 0.1751 0.1284

(tmp <- betadisper(dis, group, type = "centroid"))

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = dis, group = group, type = "centroid")
## 
## No. of Positive Eigenvalues: 15
## No. of Negative Eigenvalues: 8
## 
## Average distance to centroid:
## grazed 
## 0.4261 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 23 eigenvalues)
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8 
## 1.7552 1.1334 0.4429 0.3698 0.2454 0.1961 0.1751 0.1284

当有缺失值的时候，也可以使用

## simulate missing values in 'd' and 'group'
## using spatial medians
groups[c(2,20)] <- NA
dis[c(2, 20)] <- NA
mod2 <- betadisper(dis, groups) ## messages
mod2

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = dis, group = groups)
## 
## No. of Positive Eigenvalues: 14
## No. of Negative Eigenvalues: 5
## 
## Average distance to median:
##   grazed ungrazed 
##   0.3984   0.3008 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 19 eigenvalues)
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8 
## 1.4755 0.8245 0.4218 0.3456 0.2159 0.1688 0.1150 0.1060

permutest(mod2, permutations = 99)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 99
## 
## Response: Distances
##           Df   Sum Sq  Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.039979 0.039979 2.4237     99   0.11
## Residuals 18 0.296910 0.016495

anova(mod2)

## Analysis of Variance Table
## 
## Response: Distances
##           Df   Sum Sq  Mean Sq F value Pr(>F)
## Groups     1 0.039979 0.039979  2.4237 0.1369
## Residuals 18 0.296910 0.016495

plot(mod2)

boxplot(mod2)

plot(TukeyHSD(mod2))

type = "centroid" 时的结果（默认为 type = "median"，一般不需要修改）。

## Using group centroids
mod3 <- betadisper(dis, groups, type = "centroid")
mod3

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = dis, group = groups, type = "centroid")
## 
## No. of Positive Eigenvalues: 14
## No. of Negative Eigenvalues: 5
## 
## Average distance to centroid:
##   grazed ungrazed 
##   0.4001   0.3108 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 19 eigenvalues)
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8 
## 1.4755 0.8245 0.4218 0.3456 0.2159 0.1688 0.1150 0.1060

permutest(mod3, permutations = 99)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 99
## 
## Response: Distances
##           Df   Sum Sq  Mean Sq      F N.Perm Pr(>F)
## Groups     1 0.033468 0.033468 3.1749     99    0.1
## Residuals 18 0.189749 0.010542

anova(mod3)

## Analysis of Variance Table
## 
## Response: Distances
##           Df   Sum Sq  Mean Sq F value  Pr(>F)  
## Groups     1 0.033468 0.033468  3.1749 0.09166 .
## Residuals 18 0.189749 0.010542                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(mod3)

boxplot(mod3)

plot(TukeyHSD(mod3))

2.1.2 使用 `dune` 数据集计算 beta-dispersion

本例子来源于 Vignettes: Diversity analysis in vegan.

使用的数据是荷兰沙丘（dune）草地牧场（meadow）上的植被（vegetation）和环境因子。

# Vegetation and Environment in Dutch Dune Meadows.
data("dune")
data("dune.env")

使用 betadiver() 计算 beta-多样性。

betadiver() 的第二个参数指定计算 beta 多样性指标的算法。算法基于 $a$, $b$, $c$ 等 3 个值（使用 betadiver(help = TRUE) 查看计算公式）。它们的含义如下：

method = "z" 时，计算的公式为：

\[z = (log(2) - log(2*a+b+c) + log(a+b+c))/log(2)\]

计算得到的 z 是一个 dist 对象。

z = betadiver(dune, method = "z")
z

##            1         2         3         4         5         6         7
## 2  0.4150375                                                            
##            8         9        10        11        12        13        14
## 2                                                                       
##           15        16        17        18        19
## 2                                                   
##  [ reached getOption("max.print") -- omitted 18 rows ]

按照 dune.env$Management 中给出的土地管理类型进行分组，计算 beta-dispersion。

(mod = with(dune.env, betadisper(z, Management)))

## 
##  Homogeneity of multivariate dispersions
## 
## Call: betadisper(d = z, group = Management)
## 
## No. of Positive Eigenvalues: 12
## No. of Negative Eigenvalues: 7
## 
## Average distance to median:
##     BF     HF     NM     SF 
## 0.2532 0.2512 0.4406 0.3635 
## 
## Eigenvalues for PCoA axes:
## (Showing 8 of 19 eigenvalues)
##   PCoA1   PCoA2   PCoA3   PCoA4   PCoA5   PCoA6   PCoA7   PCoA8 
## 1.65466 0.88696 0.53336 0.37435 0.28725 0.22445 0.16128 0.08099

对计算得到的结果进行可视化。

plot(mod)

boxplot(mod)

1 R Markdown

3 概率论和数理统计