
GBD Analysis Methods and Reproduction Tutorial

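A research-workflow page built for directly reproducible results. It covers EAPC + 95% CI, Joinpoint/AAPC, world-map mapping, and cluster analysis, with method notes, runnable code, input/output examples, and troubleshooting checklists.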

R / tidyverse · Joinpoint · Geo Mapping · Reproducibility

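0) Module statistics dashboard

Total entry cards: topic pages + cases + code list
Course code files: viewable and downloadable online
Core analysis modules: EAPC / AAPC / map / clustering
Example data files: toy inputs and sample outputs
Method modules: 4
Hands-on cases: 4
Topic pages: 9
Code list pages: 1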
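Note: this page provides topic navigation and the complete method descriptions, so you can first locate a topic and then review the detailed code and workflow.
Topic entries

Topic module navigation

Topic navigation
The topic pages collect method descriptions, key scripts, and download links in one place for browsing and reproducing by topic.
Hands-on scenarios

GBD hands-on cases

Each case page also provides a page description, script examples, and companion data files.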

1) Page capability overview

A quick view of what this page covers; the detailed code is in the sections below.

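2) Minimal run workflow

These five steps are enough to complete one round of reproduction analysis.

1. Prepare input: organize the data as location, year, measure, age, metric, val, lower, upper.
2. Compute EAPC: run EAPC first to get each location's trend and 95% CI.
3. Read back Joinpoint results: export the Joinpoint input, then read the AAPC results back and unify their format.
4. Map country names: draw the map and save the list of unmatched countries.
5. Output clusters: merge the features, run the clustering, and output the cluster labels and dendrogram.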
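Before step 2, it can help to confirm that the extract really has the eight columns from step 1. A minimal sketch, assuming the long format above; `check_gbd_input()` and the toy rows are hypothetical and not part of the course scripts:

```r
# Hypothetical helper (not from the course scripts): verify the long-format extract
# has the eight columns listed in step 1 and that val lies inside [lower, upper].
check_gbd_input <- function(df) {
  required <- c("location", "year", "measure", "age", "metric", "val", "lower", "upper")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Uncertainty bounds should bracket the point estimate; NA comparisons are skipped
  bad_ui <- with(df, lower > val | upper < val)
  if (any(bad_ui, na.rm = TRUE)) {
    warning(sum(bad_ui, na.rm = TRUE), " row(s) have val outside [lower, upper]")
  }
  invisible(TRUE)
}

# Toy rows (made-up values) in the expected layout
toy <- data.frame(
  location = "Location A", year = c(1990, 1991),
  measure = "Incidence", age = "Age-standardized", metric = "Rate",
  val = c(100, 98), lower = c(90, 88), upper = c(110, 108)
)
check_gbd_input(toy)              # passes silently
try(check_gbd_input(toy[, -8]))   # column 8 is "upper", so this stops with "Missing columns: upper"
```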

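Detailed methods and code (sections 3 to 11)

Covers EAPC, Joinpoint, maps, clustering, the reproducibility protocol, paper templates, the code entry, and the changelog.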

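3) EAPC + 95% CI

Method: for each location, fit the linear regression ln(rate) = α + β × year + ε to the age-standardized rate; the slope β is the annual change on the log scale.

Formulas (SE is the standard error of β):

$$\mathrm{EAPC} = \left(e^{\beta} - 1\right) \times 100, \qquad 95\%\ \mathrm{CI} = \left(e^{\beta \pm 1.96\,\mathrm{SE}} - 1\right) \times 100$$

Input: location, year, val (with val > 0). Output: location, n_year, beta, se, eapc, lci, uci.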
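A quick worked example of the two formulas, using illustrative values β = 0.02 and SE = 0.005 (made up, not taken from any GBD result):

```r
# Illustrative inputs only
beta <- 0.02
se   <- 0.005

eapc <- (exp(beta) - 1) * 100                       # point estimate, % per year
ci   <- (exp(beta + c(-1, 1) * 1.96 * se) - 1) * 100 # lower and upper 95% bounds
round(c(eapc = eapc, lci = ci[1], uci = ci[2]), 3)
# eapc 2.020, lci 1.025, uci 3.025
```

The interval is not symmetric around the EAPC (1.025 is 0.995 below 2.020, while 3.025 is 1.005 above it) because the CI is built on the log scale and then back-transformed.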

R code: batch EAPC + 95% CI
library(dplyr)

# Batch EAPC + 95% CI per location from the log-linear model ln(val) ~ year
calc_eapc <- function(df_rate) {
  stopifnot(all(c("location", "year", "val") %in% names(df_rate)))
  df_rate %>%
    filter(!is.na(val), val > 0) %>%   # log() needs strictly positive rates
    group_by(location) %>%
    summarise(
      n_year = n(),                    # number of usable years in the fit
      beta   = coef(lm(log(val) ~ year))[["year"]],
      se     = summary(lm(log(val) ~ year))$coefficients["year", "Std. Error"],
      .groups = "drop"
    ) %>%
    mutate(
      eapc = (exp(beta) - 1) * 100,
      lci  = (exp(beta - 1.96 * se) - 1) * 100,
      uci  = (exp(beta + 1.96 * se) - 1) * 100
    ) %>%
    mutate(across(c(eapc, lci, uci), ~ round(.x, 3)))
}

# Example
input_df <- read.csv("examples/eapc_input_toy.csv")
out_eapc <- calc_eapc(input_df)
write.csv(out_eapc, "examples/output_eapc_toy.csv", row.names = FALSE)
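Common errors: ① non-positive values (zero or negative rates must be filtered out before taking logs); ② too few years (at least 10 years per group is recommended); ③ strong heteroscedasticity (add a sensitivity analysis).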
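Errors ① and ② can be caught before any model is fitted. A minimal base-R sketch; `flag_eapc_groups()` and the toy data are hypothetical, and the default threshold simply mirrors the 10-year recommendation above:

```r
# Hypothetical pre-check: count non-positive values (error 1) and usable years (error 2)
flag_eapc_groups <- function(df_rate, min_years = 10) {
  df_rate <- df_rate[!is.na(df_rate$val), ]
  by_loc  <- split(df_rate, df_rate$location)
  n_nonpos <- vapply(by_loc, function(d) sum(d$val <= 0), integer(1), USE.NAMES = FALSE)
  n_usable <- vapply(by_loc, function(d) length(unique(d$year[d$val > 0])), integer(1),
                     USE.NAMES = FALSE)
  data.frame(
    location     = names(by_loc),
    n_nonpos     = n_nonpos,
    n_usable_yrs = n_usable,
    too_short    = n_usable < min_years
  )
}

# Toy data (made-up values): A has one zero; B covers only five years
toy <- data.frame(
  location = rep(c("A", "B"), times = c(12, 5)),
  year     = c(2008:2019, 2015:2019),
  val      = c(rep(5, 11), 0, rep(3, 5))
)
flag_eapc_groups(toy)
# A: 1 non-positive value, 11 usable years; B: 0 non-positive, 5 usable years, too_short TRUE
```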

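4) Joinpoint / AAPC

Method: first export the Joinpoint input (rate + SE), fit the model in the external Joinpoint software, then read the resulting .aapc.txt back and reshape it into a paper-ready table.

Input (exported for Joinpoint): location_name, year, val, SE. Output (paper table): location_name, val, lower, upper, AAPC.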
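For reading back the results, it helps to know how the reported AAPC relates to the fitted segments: it is the length-weighted average of the log-scale segment slopes $b_i$ (weights $w_i$ = segment length within the interval), back-transformed as $\mathrm{AAPC} = \left(\exp\left(\sum_i w_i b_i / \sum_i w_i\right) - 1\right) \times 100$. A minimal sketch for sanity-checking read-back values; `aapc_from_segments()` and the segment values are hypothetical, and the Joinpoint output itself remains the source of record:

```r
# Hypothetical helper: AAPC from log-linear segment slopes weighted by segment length
aapc_from_segments <- function(slopes, lengths) {
  stopifnot(length(slopes) == length(lengths), all(lengths > 0))
  (exp(sum(lengths * slopes) / sum(lengths)) - 1) * 100
}

# Illustrative fit: APC = +3% for 1990-2005 (15 years), -1% for 2005-2019 (14 years)
apc <- c(3, -1)
b   <- log(1 + apc / 100)           # back-transform each APC to its log-scale slope
aapc_from_segments(b, c(15, 14))    # about 1.05
```

With a single segment the AAPC equals that segment's APC, which is a convenient check on the helper.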

R code: Joinpoint input export + AAPC read-back
library(dplyr)

# Keep age-standardized rates for one measure and derive SE from the 95% UI
prepare_joinpoint_input <- function(df, measure_name) {
  df %>%
    filter(age == "Age-standardized", metric == "Rate", measure == measure_name) %>%
    mutate(SE = (upper - lower) / (2 * 1.96)) %>%   # assumes a roughly symmetric UI
    select(location_name = location, year, val, SE) %>%
    arrange(location_name, year)                     # sort by location, then year
}

jp_in <- read.csv("examples/joinpoint_input_toy.csv")
jp_export <- prepare_joinpoint_input(jp_in, "Incidence")
write.csv(jp_export, "examples/output_joinpoint_export.csv", row.names = FALSE)

# After running Joinpoint externally, save its AAPC result as
# examples/joinpoint_aapc_result_toy.txt
# Expected columns: location_name, AAPC, LowerCI, UpperCI (standardize names first)
aapc_raw <- read.table("examples/joinpoint_aapc_result_toy.txt", header = TRUE)
aapc_tbl <- aapc_raw %>%
  transmute(
    location_name,
    val   = round(AAPC, 3),
    lower = round(LowerCI, 3),
    upper = round(UpperCI, 3),
    AAPC  = sprintf("%.3f (%.3f, %.3f)", val, lower, upper)
  )
write.csv(aapc_tbl, "examples/output_aapc_table.csv", row.names = FALSE)
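Common errors: ① wrong sort order in the Joinpoint input (it must be sorted by location, then year); ② SE equal to 0 (typically when lower = upper); ③ inconsistent column names (standardize them before reading the results back).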
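The SE conversion assumes the 95% UI is roughly symmetric: SE ≈ (upper − lower) / (2 × 1.96), so val = 100, lower = 90, upper = 110 gives 20 / 3.92 ≈ 5.102. A minimal sketch that also flags errors ① and ② before the file goes to Joinpoint; `check_joinpoint_export()` and the toy rows are hypothetical:

```r
# Hypothetical helper: check sort order (error 1) and zero SE values (error 2)
check_joinpoint_export <- function(jp) {
  list(
    # TRUE only if rows are already ordered by location_name, then year
    is_sorted    = identical(order(jp$location_name, jp$year), seq_len(nrow(jp))),
    # Row indices whose SE is zero or negative, e.g. because lower == upper
    zero_se_rows = which(!is.na(jp$SE) & jp$SE <= 0)
  )
}

# Toy rows (made-up values); SE derived from the 95% UI
jp <- data.frame(
  location_name = c("A", "A", "B"),
  year  = c(1990, 1991, 1990),
  val   = c(100, 98, 50),
  lower = c(90, 88, 50),
  upper = c(110, 108, 50)
)
jp$SE <- (jp$upper - jp$lower) / (2 * 1.96)
round(jp$SE, 3)                                      # 5.102 5.102 0.000
check_joinpoint_export(jp)                           # is_sorted TRUE, zero_se_rows 3
check_joinpoint_export(jp[c(3, 1, 2), ])$is_sorted   # FALSE
```

Rows reported in `zero_se_rows` need a look at the source UI before export; the sketch only reports them and does not decide how to fix them.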

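5) World choropleth map (country-name mapping)

Method: first harmonize GBD country names with the country names in the map data, then merge the values onto the map polygons; unmatched countries must always be written to a log.

Input: location, val. Output: map object + examples/output_unmatched_countries.csv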

R code: name-mapping table + unmatched check
library(dplyr)
library(ggplot2)
library(maps)

# Recode GBD country names to the region names used by map_data("world")
map_country_names <- function(df) {
  name_map <- c(
    "United States of America" = "USA",
    "Russian Federation" = "Russia",
    "Republic of Korea" = "South Korea",
    "Democratic People's Republic of Korea" = "North Korea",
    "Viet Nam" = "Vietnam",
    "Cote d'Ivoire" = "Ivory Coast"
  )
  df %>% mutate(location = recode(location, !!!name_map))
}

map_df <- read.csv("examples/map_input_toy.csv")
map_df <- map_country_names(map_df)
world_data <- map_data("world")
merged_map <- left_join(world_data, map_df, by = c("region" = "location"))

# Log GBD locations that found no polygon in the world map
unmatched <- setdiff(unique(map_df$location), unique(world_data$region))
write.csv(data.frame(location = unmatched), "examples/output_unmatched_countries.csv", row.names = FALSE)

ggplot(merged_map, aes(long, lat, group = group, fill = val)) +
  geom_polygon(color = "grey70", linewidth = 0.1) +
  scale_fill_viridis_c(option = "C", na.value = "grey95") +
  coord_fixed(1.3) +
  theme_void()   # theme_void must be called, not added as a bare function
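Common errors: ① inconsistent country-name spellings leave large areas as NA; ② replacing NA with 0 distorts the distribution of low-value countries; ③ legend bin boundaries do not match those in the paper's tables.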
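Errors ② and ③ can be avoided by binning once with explicit breaks, using the same binned variable for both the map legend and the paper table, and leaving countries without data as NA. A sketch with illustrative break points and values (pick breaks that suit your own data):

```r
# Illustrative breaks and labels; intervals are [0,50), [50,100), [100,200), [200,Inf)
breaks <- c(0, 50, 100, 200, Inf)
labels <- c("<50", "50-<100", "100-<200", ">=200")

vals    <- c(12.5, 75, NA, 250, 100)   # made-up rates; NA = no data for that country
val_bin <- cut(vals, breaks = breaks, labels = labels, right = FALSE)

val_bin
table(val_bin, useNA = "ifany")        # the NA stays NA; it is not recoded to 0
```

Using `val_bin` as the fill variable on the map and tabulating the same factor for the paper keeps the two boundary sets identical by construction.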

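6) Cluster analysis (with justification for the number of clusters)

Method: apply hierarchical clustering to standardized features, compare the mean silhouette width across candidate numbers of clusters (k = 2 to 6 in the code below), and choose the k with the highest value.

Input: location + multiple feature columns. Output: cluster labels + dendrogram.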

R code: hierarchical clustering + silhouette width
library(dplyr)
library(cluster)
library(factoextra)

clu_df <- read.csv("examples/cluster_input_toy.csv")
row.names(clu_df) <- clu_df$location

# Standardize every feature (mean 0, SD 1) so no variable dominates the distance
x <- clu_df %>% select(-location) %>% scale()

d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "complete")

# Mean silhouette width for k = 2..6; keep the k with the largest value
sil_scores <- sapply(2:6, function(k) {
  grp <- cutree(hc, k = k)
  mean(silhouette(grp, d)[, "sil_width"])
})
best_k <- which.max(sil_scores) + 1

cluster_out <- data.frame(location = row.names(x), cluster = cutree(hc, k = best_k))
write.csv(cluster_out, "examples/output_cluster_labels.csv", row.names = FALSE)

fviz_dend(hc, k = best_k, cex = 0.6, rect = TRUE, horiz = TRUE)
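Common errors: ① features are not standardized, so large-scale variables dominate; ② the number of clusters is fixed by habit with no justification; ③ outliers are left untreated and distort cluster boundaries.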
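Error ① is easy to see on a toy example. The sketch uses four made-up locations with an age-standardized rate (large numeric range) and an EAPC (small range), and compares P's nearest neighbour before and after `scale()`:

```r
# Toy data (made-up values): asr spans hundreds, eapc spans a few units
feat <- data.frame(
  asr  = c(100, 110, 300, 1000),    # age-standardized rate per 100,000
  eapc = c(-3.0, 2.5, -2.8, 0.0),   # % change per year
  row.names = c("P", "Q", "R", "S")
)

d_raw    <- as.matrix(dist(feat))          # Euclidean distance in raw units
d_scaled <- as.matrix(dist(scale(feat)))   # after centering and scaling to SD 1

# Nearest neighbour of P (column 1 is P itself, so it is dropped)
nn_raw    <- names(which.min(d_raw["P", -1]))
nn_scaled <- names(which.min(d_scaled["P", -1]))
c(raw = nn_raw, scaled = nn_scaled)
# raw: "Q" (close in asr only); scaled: "R" (close in eapc, moderately close in asr)
```

Without scaling, a 5.5-point EAPC gap between P and Q barely registers next to rate differences in the hundreds, so the trend feature is effectively ignored.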

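7) Reproducibility protocol

Environment: R ≥ 4.3; core packages dplyr, ggplot2, maps, cluster, factoextra.

Run order: EAPC → Joinpoint input export / read-back → map name mapping → cluster analysis.

Output location: example outputs go to public/gbd-analysis/examples/ by default; the scripts write to the relative path examples/, so run them with public/gbd-analysis/ as the working directory.

Version record: include the sessionInfo() output in the paper's supplementary material.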

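R code: environment snapshot
pkgs <- c("dplyr", "ggplot2", "maps", "cluster", "factoextra")
print(R.version.string)
# packageVersion() returns a package_version object; convert each to text
print(vapply(pkgs, function(p) as.character(packageVersion(p)), character(1)))
# sessionInfo must be called with parentheses to print the session details
sessionInfo()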
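To attach the snapshot to supplementary material, it is easier to save it as a text file than to copy it from the console. A minimal sketch; the file name `sessionInfo.txt` is an assumption, not a convention of this site:

```r
# Save the full session details next to the example outputs
out_dir <- "examples"
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

snapshot_file <- file.path(out_dir, "sessionInfo.txt")
writeLines(capture.output(sessionInfo()), snapshot_file)
```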

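8) Copy-ready paper templates

The templates below can be pasted directly into Methods, Results, and figure legends. After copying, replace only the variables in square brackets (disease name, age group, year range, key values).
Template A: Methods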
### Statistical analysis
We extracted [measure] rates for [disease/exposure] from [start_year] to [end_year] for [age_group]. Temporal trends were quantified using the estimated annual percentage change (EAPC), obtained by fitting a log-linear regression model:
ln(rate) = α + β × year + ε.
EAPC was calculated as (exp(β)-1)×100, with its 95% confidence interval (CI) calculated as (exp(β ± 1.96×SE)-1)×100. To capture segmented temporal patterns, joinpoint regression was applied, and the average annual percentage change (AAPC) with its 95% CI was reported.
Spatial distribution was visualized using country-level choropleth maps after harmonizing country names between the GBD and map datasets.
Unsupervised hierarchical clustering (Euclidean distance on standardized features, complete linkage) was used to identify pattern groups across countries/regions, with the number of clusters selected by the mean silhouette width.
Template B: Results
### Temporal trend
From [start_year] to [end_year], the age-standardized [measure] rate of [disease] in [population/region] changed from [value_start] to [value_end] per 100,000.
The overall trend showed an EAPC of [eapc]% (95% CI: [lci] to [uci]), indicating a [decreasing/increasing/stable] pattern. Joinpoint analysis identified [n] significant turning point(s), and the corresponding AAPC was [aapc]% (95% CI: [aapc_lci] to [aapc_uci]).
Spatially, higher burdens were observed in [high_regions], whereas lower values were concentrated in [low_regions]. Cluster analysis grouped countries/regions into [k] clusters with distinct epidemiological profiles, suggesting heterogeneous burden trajectories across development contexts.
Template C: Figure legend
**Figure [X]. Global pattern and temporal trend of [disease/measure].**
(A) Choropleth map of age-standardized [measure] rate in [year], per 100,000 population.
(B) Temporal trend of age-standardized rate from [start_year] to [end_year], with estimated annual percentage change (EAPC) and 95% CI.
(C) Joinpoint-derived average annual percentage change (AAPC) across selected regions.
(D) Hierarchical clustering dendrogram showing pattern groups based on trend and burden indicators.
Abbreviations: EAPC, estimated annual percentage change; AAPC, average annual percentage change; ASR, age-standardized rate; CI, confidence interval.
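When filling Template B, formatting every estimate the same way avoids mismatches between the text and the tables. A minimal sketch; `fmt_ci()` is a hypothetical helper and the 3-decimal default simply matches the rounding in the EAPC/AAPC code:

```r
# Hypothetical helper: "estimate% (95% CI: lower to upper)" with fixed decimals
fmt_ci <- function(est, lci, uci, digits = 3) {
  f <- function(x) formatC(x, format = "f", digits = digits)
  paste0(f(est), "% (95% CI: ", f(lci), " to ", f(uci), ")")
}

fmt_ci(2.020134, 1.025220, 3.024846)
# "2.020% (95% CI: 1.025 to 3.025)"
```

`formatC()` is used instead of a `sprintf()` format string so the literal percent signs in the output need no escaping.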

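9) GBD course code entry

The local course scripts have been organized into an online list (30 code/example page files in total) that can be browsed and downloaded.

🔗 Open the "GBD Course Code List"

The material can later be split by chapter into standalone teaching topic pages (e.g., EAPC, Joinpoint, BAPC, Nordpred).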

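10) Data resources

11) Page changelog

  • Fixed the page's JS smooth-scrolling logic (now working, no syntax errors).
  • Rewrote the EAPC and 95% CI code as a runnable version with complete formulas.
  • Added a closed Joinpoint input/output loop, country-name mapping for maps, and an unmatched-country logging strategy.
  • Added a basis for choosing the number of clusters (silhouette width) and the reproducibility protocol.
  • Added copy-ready templates for paper writing (Methods / Results / Figure Legend).
  • Added the "GBD Course Code List" entry and download page (30 files).
  • Unified the code example format and added standardized path conventions.
  • Added three topic pages (BAPC / APC / Figures) and their entry cards on the home page.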