GBD Analysis Methods and Reproduction Tutorial

library(dplyr)

# Estimate EAPC and its 95% CI per location from a log-linear fit: log(val) ~ year
calc_eapc <- function(df_rate) {
  stopifnot(all(c("location", "year", "val") %in% names(df_rate)))
  df_rate %>%
    filter(!is.na(val), val > 0) %>%  # log() requires strictly positive rates
    group_by(location) %>%
    summarise(
      n_year = n(),  # number of year points actually used in the fit
      beta = unname(coef(lm(log(val) ~ year))[2]),
      se = summary(lm(log(val) ~ year))$coefficients[2, 2],
      .groups = "drop"
    ) %>%
    mutate(
      eapc = (exp(beta) - 1) * 100,
      lci = (exp(beta - 1.96 * se) - 1) * 100,
      uci = (exp(beta + 1.96 * se) - 1) * 100
    ) %>%
    mutate(across(c(eapc, lci, uci), ~round(.x, 3)))
}

# Example
input_df <- read.csv("examples/eapc_input_toy.csv")
out_eapc <- calc_eapc(input_df)
write.csv(out_eapc, "examples/output_eapc_toy.csv", row.names = FALSE)

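Common errors: (1) non-positive values (zeros or negatives are present and must be filtered out first); (2) too few year points (at least 10 years per group is recommended); (3) extreme heteroscedasticity (add a sensitivity analysis).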
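The first two pitfalls above can be screened for before any model is fitted. The following is a minimal base-R sketch rather than part of the original scripts: the helper name `check_eapc_input` and its `min_years = 10` default are illustrative assumptions, and the toy data are synthetic. Location "A" falls by exactly 2% per year, so its EAPC should come out at about -2, which doubles as a sanity check of the formula.

```r
# Hypothetical pre-check helper (illustrative name, not from the original scripts).
# Per location, counts rows unusable for a log-linear fit (NA or <= 0)
# and reports whether enough usable year points remain.
check_eapc_input <- function(df_rate, min_years = 10) {
  usable <- !is.na(df_rate$val) & df_rate$val > 0
  report <- data.frame(
    location  = sort(unique(df_rate$location)),
    n_dropped = as.vector(tapply(!usable, df_rate$location, sum)),
    n_usable  = as.vector(tapply(usable, df_rate$location, sum))
  )
  report$enough_years <- report$n_usable >= min_years
  report
}

# Synthetic toy data (assumption): "A" declines 2% per year over 30 years;
# "B" has only 5 years, one of which is a zero.
years <- 1990:2019
toy <- rbind(
  data.frame(location = "A", year = years, val = 100 * 0.98^(years - 1990)),
  data.frame(location = "B", year = 2015:2019, val = c(5, 0, 4.8, 4.7, 4.6))
)

report <- check_eapc_input(toy)
print(report)

# Sanity check of the EAPC formula on the known -2% series.
fit_a  <- lm(log(val) ~ year, data = subset(toy, location == "A"))
beta_a <- unname(coef(fit_a)["year"])
eapc_a <- (exp(beta_a) - 1) * 100
print(eapc_a)
```

Locations flagged with `enough_years == FALSE` are candidates for exclusion or for a footnote in the results table; the threshold itself is a judgment call and should be reported.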

4) Joinpoint / AAPC

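Method: export the Joinpoint input (rate + SE), fit the model in the external Joinpoint software, then read the resulting .aapc.txt file back in and reshape it into a publication-ready table.

Input: location_name, year, val, SE. Output: location_name, val, lower, upper, AAPC.

R code: Joinpoint input export + AAPC read-back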

library(dplyr)

# Build the Joinpoint input table (rate + SE) for one measure
prepare_joinpoint_input <- function(df, measure_name) {
  df %>%
    filter(age == "Age-standardized", metric == "Rate", measure == measure_name) %>%
    # Approximate the SE from the width of the 95% uncertainty interval
    mutate(SE = (upper - lower) / (2 * 1.96)) %>%
    select(location_name = location, year, val, SE) %>%
    arrange(location_name, year)
}

jp_in <- read.csv("examples/joinpoint_input_toy.csv")
jp_export <- prepare_joinpoint_input(jp_in, "Incidence")
write.csv(jp_export, "examples/output_joinpoint_export.csv", row.names = FALSE)

# After running Joinpoint externally, save its AAPC result to the path read below
aapc_raw <- read.table("examples/joinpoint_aapc_result_toy.txt", header = TRUE)

# Expects columns location_name, AAPC, LowerCI, UpperCI in the read-back file
aapc_tbl <- aapc_raw %>%
  transmute(
    location_name,
    val = round(AAPC, 3),
    lower = round(LowerCI, 3),
    upper = round(UpperCI, 3),
    AAPC = sprintf("%.3f (%.3f, %.3f)", val, lower, upper)
  )
write.csv(aapc_tbl, "examples/output_aapc_table.csv", row.names = FALSE)

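Common errors: (1) Joinpoint input sorted incorrectly (it must be ordered by location, then year); (2) SE equal to 0 (typically when lower = upper); (3) inconsistent field names (standardize column names before reading the results back).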
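All three errors above can be caught with one pass over the export table before it is handed to Joinpoint. This is a base-R sketch under stated assumptions: `validate_joinpoint_export` is a hypothetical helper, and the alias table (for example `location` mapped to `location_name`) only covers the column names used on this page.

```r
# Hypothetical validator (illustrative; not part of the Joinpoint software).
validate_joinpoint_export <- function(df) {
  # 1) Standardize column names: lower-case first, then map assumed aliases.
  names(df) <- tolower(names(df))
  alias <- c(location = "location_name", se = "SE")
  hit <- names(df) %in% names(alias)
  names(df)[hit] <- unname(alias[names(df)[hit]])

  required <- c("location_name", "year", "val", "SE")
  stopifnot(all(required %in% names(df)))

  # 2) Enforce the location + year ordering.
  df <- df[order(df$location_name, df$year), required, drop = FALSE]
  rownames(df) <- NULL

  # 3) Flag rows whose SE is missing or not positive (e.g., lower == upper).
  df$se_problem <- is.na(df$SE) | df$SE <= 0
  df
}

# Synthetic toy input: unsorted, mixed-case names, one zero SE.
toy <- data.frame(
  Location = c("B", "A", "A"),
  Year     = c(2000, 2001, 2000),
  Val      = c(3.2, 5.1, 5.0),
  SE       = c(0.10, 0, 0.12)
)

checked <- validate_joinpoint_export(toy)
print(checked)
```

Rows with `se_problem == TRUE` are only flagged, not dropped, so that the decision to exclude or repair them stays visible in the analysis log.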

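5) World Choropleth Map (Country-Name Mapping)

Method: first harmonize GBD country names with the country names in the map data, then perform the spatial merge; unmatched countries must be written to a log.

Input: location, val. Output: map object + unmatched_countries.csv.

R code: mapping table + unmatched check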

library(dplyr)
library(ggplot2)
library(maps)

# Recode GBD country names to the names used by map_data("world")
map_country_names <- function(df) {
  name_map <- c(
    "United States of America" = "USA",
    "Russian Federation" = "Russia",
    "Republic of Korea" = "South Korea",
    "Democratic People's Republic of Korea" = "North Korea",
    "Viet Nam" = "Vietnam",
    "Cote d'Ivoire" = "Ivory Coast"
  )
  df %>% mutate(location = recode(location, !!!name_map))
}

map_df <- read.csv("examples/map_input_toy.csv")
map_df <- map_country_names(map_df)
world_data <- map_data("world")
merged_map <- left_join(world_data, map_df, by = c("region" = "location"))

# Log every GBD location that has no counterpart in the map data
unmatched <- setdiff(unique(map_df$location), unique(world_data$region))
write.csv(data.frame(location = unmatched), "examples/output_unmatched_countries.csv", row.names = FALSE)

ggplot(merged_map, aes(long, lat, group = group, fill = val)) +
  geom_polygon(color = "grey70", linewidth = 0.1) +
  scale_fill_viridis_c(option = "C", na.value = "grey95") +
  coord_fixed(1.3) +
  theme_void()

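Common errors: (1) inconsistent country-name spellings produce large areas of NA; (2) replacing NA with 0 distorts the distribution of low-value countries; (3) legend bin boundaries do not match those in the paper's table.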
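The last two pitfalls above (zero-filled NAs and inconsistent bin boundaries) can both be avoided by defining the bins once and leaving missing values as NA. The sketch below uses base R only; the break points in `rate_breaks` and the helper name `bin_rate` are illustrative assumptions, to be replaced with the cut-offs actually used in the paper's table.

```r
# Illustrative break points (an assumption; use the paper's own cut-offs).
rate_breaks <- c(0, 50, 100, 200, Inf)

# Bin rates with one shared definition so the map legend and the table agree.
# Intervals are left-closed, e.g. [50,100); NA inputs stay NA.
bin_rate <- function(val, breaks = rate_breaks) {
  cut(val, breaks = breaks, right = FALSE, include.lowest = TRUE)
}

# Synthetic toy data: location "C" has no estimate.
toy <- data.frame(
  location = c("A", "B", "C", "D"),
  val      = c(12.5, 100, NA, 250)
)
toy$val_bin <- bin_rate(toy$val)
print(toy)

# The same factor can feed the summary table, so counts per bin match the legend.
print(table(toy$val_bin, useNA = "ifany"))
```

When plotting, mapping `fill = val_bin` together with a discrete scale such as `scale_fill_viridis_d(na.value = "grey95", drop = FALSE)` should keep empty bins in the legend and render missing countries in a neutral colour instead of as the lowest class.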

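6) Cluster Analysis (with Basis for the Number of Clusters)

Method: run hierarchical clustering on standardized features, compare the silhouette coefficient across candidate numbers of clusters, and choose the most robust grouping.

Input: location plus multiple feature columns. Output: cluster_label plus a dendrogram.

R code: hierarchical clustering + silhouette coefficient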

library(dplyr)
library(cluster)
library(factoextra)

clu_df <- read.csv("examples/cluster_input_toy.csv")
row.names(clu_df) <- clu_df$location

# Standardize features so that no large-scale variable dominates the distances
x <- clu_df %>% select(-location) %>% scale()
d <- dist(x, method = "euclidean")
hc <- hclust(d, method = "complete")

# Mean silhouette width for k = 2..6
sil_scores <- sapply(2:6, function(k) {
  grp <- cutree(hc, k = k)
  mean(silhouette(grp, d)[, 3])
})

best_k <- which.max(sil_scores) + 1  # +1 because the k grid starts at 2
cluster_out <- data.frame(location = rownames(x), cluster = cutree(hc, k = best_k))
write.csv(cluster_out, "examples/output_cluster_labels.csv", row.names = FALSE)

fviz_dend(hc, k = best_k, cex = 0.6, rect = TRUE, horiz = TRUE)

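Common errors: (1) features not standardized, so large-scale variables dominate; (2) number of clusters fixed by habit with no supporting evidence; (3) unhandled outliers distort cluster boundaries.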
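To document the basis for the chosen number of clusters, it helps to keep the full silhouette table rather than only the winning k. The following sketch assumes the `cluster` package is installed and runs on synthetic data: three well-separated groups of ten points each, so the table should favour k = 3. The data and the object names `sil_tbl` and `k_grid` are illustrative, not taken from the course scripts.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Synthetic features (assumption): three well-separated groups of 10 "locations".
centers <- rbind(c(0, 0), c(10, 0), c(5, 9))
x_raw <- do.call(rbind, lapply(1:3, function(i) {
  cbind(
    f1 = rnorm(10, mean = centers[i, 1], sd = 0.3),
    f2 = rnorm(10, mean = centers[i, 2], sd = 0.3)
  )
}))
rownames(x_raw) <- paste0("loc", seq_len(nrow(x_raw)))

# Standardize, then cluster with Euclidean distance and complete linkage.
x  <- scale(x_raw)
d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "complete")

# One row per candidate k, so the choice can be reported in a supplement.
k_grid <- 2:6
sil_tbl <- data.frame(
  k = k_grid,
  mean_silhouette = sapply(k_grid, function(k) {
    mean(silhouette(cutree(hc, k = k), d)[, "sil_width"])
  })
)
best_k <- sil_tbl$k[which.max(sil_tbl$mean_silhouette)]

print(sil_tbl)
print(best_k)
```

Selecting `best_k` by indexing into `k_grid` avoids the off-by-one risk of `which.max(...) + 1` if the grid is later changed to start at a value other than 2.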

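7) Reproducibility Protocol

Recommended environment: R ≥ 4.3; core packages dplyr, ggplot2, maps, cluster, factoextra.

Run order: EAPC → Joinpoint input export/read-back → map mapping → cluster analysis.

Output location: all example outputs are written to public/gbd-analysis/examples/ by default.

Version record: attach the sessionInfo() output to the paper's supplementary material.

R code: environment snapshot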

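pkgs <- c("dplyr", "ggplot2", "maps", "cluster", "factoextra")
print(R.version.string)
# Convert each version to a string so sapply() returns a named character vector
print(sapply(pkgs, function(p) as.character(packageVersion(p))))
sessionInfo()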
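To attach the snapshot to supplementary material, the printed output can be captured into a text file. This is a small base-R sketch; the file name `session_info.txt` and the use of `tempdir()` are assumptions chosen so the example runs anywhere, and the path should be changed to the project's own output folder.

```r
# Write the environment snapshot to a plain-text file.
# Path is illustrative: tempdir() keeps the sketch side-effect free.
snapshot_path <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), snapshot_path)

# Read it back to confirm the R version line was recorded.
snapshot <- readLines(snapshot_path)
print(head(snapshot, 3))
```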

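8) One-Click Copy Templates for Paper Results

The templates below can be pasted directly into the Methods, Results, and Figure Legend sections. After copying, replace only the variables in square brackets (disease name, age group, year range, key values).

Template A: Methods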

### Statistical analysis
We extracted [measure] rates for [disease/exposure] from [start_year] to [end_year] at the [age_group] level. Temporal trends were quantified using the estimated annual percentage change (EAPC), obtained by fitting a log-linear model:
ln(rate) = α + β × year + ε.
EAPC was calculated as (exp(β)-1)×100, with its 95% confidence interval (CI) derived from β ± 1.96×SE, where SE is the standard error of β. For segmented temporal patterns, joinpoint regression was applied and the average annual percentage change (AAPC) with 95% CI was reported.
Spatial distribution was visualized using country-level choropleth maps after harmonizing country-name mappings.
Unsupervised hierarchical clustering (Euclidean distance, complete linkage) was used to identify pattern groups across countries/regions.

Template B: Results

### Temporal trend
From [start_year] to [end_year], the age-standardized [measure] rate of [disease] in [population/region] changed from [value_start] to [value_end] per 100,000.
The overall trend showed an EAPC of [eapc]% (95% CI: [lci] to [uci]), indicating a [decreasing/increasing/stable] pattern. Joinpoint analysis identified [n] significant turning point(s), and the corresponding AAPC was [aapc]% (95% CI: [aapc_lci] to [aapc_uci]).
Spatially, higher burdens were observed in [high_regions], whereas lower values were concentrated in [low_regions]. Cluster analysis grouped countries/regions into [k] clusters with distinct epidemiological profiles, suggesting heterogeneous burden trajectories across development contexts.
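Filling the bracketed EAPC placeholders by hand is error-prone when many locations are reported, so the sentence can be generated from the computed values. The helper below is a hypothetical sketch: `describe_eapc` is not part of the course scripts, the example numbers are made up, and the direction rule (decreasing if the whole CI is below 0, increasing if the whole CI is above 0, otherwise stable) is a common convention assumed here rather than something this page prescribes.

```r
# Hypothetical helper (illustrative): fills the EAPC sentence of the Results template.
describe_eapc <- function(eapc, lci, uci) {
  # Assumed direction rule based on whether the 95% CI excludes zero.
  direction <- if (uci < 0) {
    "decreasing"
  } else if (lci > 0) {
    "increasing"
  } else {
    "stable"
  }
  article <- if (direction == "increasing") "an" else "a"
  sprintf(
    "The overall trend showed an EAPC of %.3f%% (95%% CI: %.3f to %.3f), indicating %s %s pattern.",
    eapc, lci, uci, article, direction
  )
}

# Made-up values for illustration only.
sentence <- describe_eapc(-1.234, -1.5, -0.968)
print(sentence)
```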

Template C: Figure Legend

**Figure [X]. Global pattern and temporal trend of [disease/measure].**
(A) Choropleth map of age-standardized [measure] rate in [year], per 100,000 population.
(B) Temporal trend of age-standardized rate from [start_year] to [end_year], with estimated annual percentage change (EAPC) and 95% CI.
(C) Joinpoint-derived average annual percentage change (AAPC) across selected regions.
(D) Hierarchical clustering dendrogram showing pattern groups based on trend and burden indicators. Abbreviations: EAPC, estimated annual percentage change; AAPC, average annual percentage change; ASR, age-standardized rate; CI, confidence interval.

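9) GBD Course Code Entry Point

The local course scripts have been organized into an online list (30 code/example page files in total) that can be browsed and downloaded.

🔗 Open the "GBD Course Code List"

These can later be split by chapter into standalone teaching topic pages (e.g., EAPC, Joinpoint, BAPC, Nordpred).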

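10) Data Resources Entry Point

11) Page Changelog

Fixed the page's JS smooth-scrolling logic (working, no syntax errors).
Made the EAPC and 95% CI code runnable, with the complete formula.
Added the Joinpoint input/output round trip, the map country-name mapping, and the unmatched-country logging strategy.
Added the basis for choosing the number of clusters (silhouette coefficient) and the reproducibility protocol.
Added one-click copy templates for paper writing (Methods / Results / Figure Legend).
Added the "GBD Course Code List" entry point and download page (30 files).
Unified the code example format and added standardized path conventions.
Added three topic pages (BAPC / APC / Figures) and homepage entry cards.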