In this project, we provide a step-by-step walkthrough of a CDISC clinical data standardization workflow using R. Starting from raw data collected through electronic Case Report Forms (eCRFs) following the CDASH standard, the project demonstrates how these data are transformed into SDTM domains, then into ADaM analysis datasets, and finally summarized as Tables, Listings, and Figures (TLFs).
Project Objectives:
Simulate raw CDASH-like data (e.g., demographics, laboratory results)
Perform data cleaning and harmonization (variable renaming, visit mapping, missing value checks)
Derive SDTM-compliant domains (DM, LB)
Build ADaM datasets (ADSL, ADLB), including derived variables such as baseline, change from baseline, and analysis flags
Generate summary statistics and visualizations (mean change from baseline by treatment arm)
Demonstrate traceability from raw CDASH to final analytical results
Loading necessary R packages
library(tidyverse)
In this example, data are collected from six patients (IDs 1–6) enrolled in a clinical trial evaluating the effect of a new treatment, Drug A, compared to a placebo. Each participant is randomly assigned to either the treatment or placebo group, and their hemoglobin levels are measured at three scheduled visits: Screening, Week 4, and Week 8. These raw observations represent the initial data captured through electronic Case Report Forms (eCRFs) following the CDASH (Clinical Data Acquisition Standards Harmonization) standard, before any cleaning or transformation into SDTM format.
# Demographics form (raw CDASH-like)
raw_demographics <- data.frame(
Subject_ID = c(1:6),
Study_ID = "ABC123",
Sex = c("M", "F", "M", "F", "M", "F"),
Age_Years = c(54, 61, 49, 58, 62, 53),
Treatment_Assigned = c("Drug A", "Placebo", "Drug A", "Placebo", "Drug A", "Placebo")
) |> print()
## Subject_ID Study_ID Sex Age_Years Treatment_Assigned
## 1 1 ABC123 M 54 Drug A
## 2 2 ABC123 F 61 Placebo
## 3 3 ABC123 M 49 Drug A
## 4 4 ABC123 F 58 Placebo
## 5 5 ABC123 M 62 Drug A
## 6 6 ABC123 F 53 Placebo
# Lab form (raw CDASH-like)
raw_labs <- data.frame(
Subject_ID = rep(1:6, each = 3),
Visit_Name = rep(c("Screening", "Week 4", "Week 8"), times = 6),
Test_Name = "Hemoglobin",
Test_Result = c(13.2, 13.8, 14.0,
12.8, 13.1, 13.3,
14.1, 14.8, 15.0,
13.5, 13.6, 13.7,
13.0, 13.3, 13.5,
12.9, 13.0, 13.2),
Units = "g/dL"
) |> print()
## Subject_ID Visit_Name Test_Name Test_Result Units
## 1 1 Screening Hemoglobin 13.2 g/dL
## 2 1 Week 4 Hemoglobin 13.8 g/dL
## 3 1 Week 8 Hemoglobin 14.0 g/dL
## 4 2 Screening Hemoglobin 12.8 g/dL
## 5 2 Week 4 Hemoglobin 13.1 g/dL
## 6 2 Week 8 Hemoglobin 13.3 g/dL
## 7 3 Screening Hemoglobin 14.1 g/dL
## 8 3 Week 4 Hemoglobin 14.8 g/dL
## 9 3 Week 8 Hemoglobin 15.0 g/dL
## 10 4 Screening Hemoglobin 13.5 g/dL
## 11 4 Week 4 Hemoglobin 13.6 g/dL
## 12 4 Week 8 Hemoglobin 13.7 g/dL
## 13 5 Screening Hemoglobin 13.0 g/dL
## 14 5 Week 4 Hemoglobin 13.3 g/dL
## 15 5 Week 8 Hemoglobin 13.5 g/dL
## 16 6 Screening Hemoglobin 12.9 g/dL
## 17 6 Week 4 Hemoglobin 13.0 g/dL
## 18 6 Week 8 Hemoglobin 13.2 g/dL
/* --- Demographics form (raw CDASH-like) --- */
data raw_demographics;
length Study_ID $6 Sex $1 Treatment_Assigned $10;
do Subject_ID = 1 to 6;
Study_ID = "ABC123";
if mod(Subject_ID,2)=1 then Sex = "M";
else Sex = "F";
select (Subject_ID);
when (1,3,5) Treatment_Assigned = "Drug A";
otherwise Treatment_Assigned = "Placebo";
end;
select (Subject_ID);
when (1) Age_Years = 54;
when (2) Age_Years = 61;
when (3) Age_Years = 49;
when (4) Age_Years = 58;
when (5) Age_Years = 62;
when (6) Age_Years = 53;
end;
output;
end;
run;
proc print data=raw_demographics;
title "Raw CDASH-Like Demographics Data";
run;
/* --- Lab form (raw CDASH-like) --- */
data raw_labs;
length Visit_Name $10 Test_Name $20 Units $5;
array results[18] _temporary_ (13.2 13.8 14.0
12.8 13.1 13.3
14.1 14.8 15.0
13.5 13.6 13.7
13.0 13.3 13.5
12.9 13.0 13.2);
do i = 1 to 6; /* 6 subjects */
Subject_ID = i;
Units = "g/dL";
Test_Name = "Hemoglobin";
do j = 1 to 3; /* 3 visits per subject */
select (j);
when (1) Visit_Name = "Screening";
when (2) Visit_Name = "Week 4";
when (3) Visit_Name = "Week 8";
end;
Test_Result = results[(i-1)*3 + j];
output;
end;
end;
drop i j;
run;
proc print data=raw_labs;
title "Raw CDASH-Like Laboratory Data";
run;
In this step, we clean and standardize the raw CDASH datasets to prepare them for SDTM mapping. The code below renames variables to align with CDISC SDTM naming conventions (e.g., Subject_ID → USUBJID, Age_Years → AGE), prefixes each subject number with “SUBJ” so that identifiers are consistent across datasets, and harmonizes visit labels by relabeling “Screening” as “Baseline”, which treats the screening measurement as the baseline value in later derivations. A final check counts missing values in each cleaned dataset. The result is one clean, structured dataset for demographics and another for laboratory results, both ready for integration into SDTM domains.
# --- Clean Demographics ---
clean_DM <- raw_demographics |>
rename(USUBJID = Subject_ID,
STUDYID = Study_ID,
SEX = Sex,
AGE = Age_Years,
ARM = Treatment_Assigned) |>
mutate(USUBJID = paste0("SUBJ", USUBJID)) |> print()
## USUBJID STUDYID SEX AGE ARM
## 1 SUBJ1 ABC123 M 54 Drug A
## 2 SUBJ2 ABC123 F 61 Placebo
## 3 SUBJ3 ABC123 M 49 Drug A
## 4 SUBJ4 ABC123 F 58 Placebo
## 5 SUBJ5 ABC123 M 62 Drug A
## 6 SUBJ6 ABC123 F 53 Placebo
# --- Clean Labs ---
clean_LB <- raw_labs |>
rename(USUBJID = Subject_ID,
VISIT = Visit_Name,
LBTEST = Test_Name,
LBSTRESN = Test_Result,
LBSTRESU = Units) |>
mutate(USUBJID = paste0("SUBJ", USUBJID),
VISIT = ifelse(VISIT == "Screening", "Baseline", VISIT)) |> print()
## USUBJID VISIT LBTEST LBSTRESN LBSTRESU
## 1 SUBJ1 Baseline Hemoglobin 13.2 g/dL
## 2 SUBJ1 Week 4 Hemoglobin 13.8 g/dL
## 3 SUBJ1 Week 8 Hemoglobin 14.0 g/dL
## 4 SUBJ2 Baseline Hemoglobin 12.8 g/dL
## 5 SUBJ2 Week 4 Hemoglobin 13.1 g/dL
## 6 SUBJ2 Week 8 Hemoglobin 13.3 g/dL
## 7 SUBJ3 Baseline Hemoglobin 14.1 g/dL
## 8 SUBJ3 Week 4 Hemoglobin 14.8 g/dL
## 9 SUBJ3 Week 8 Hemoglobin 15.0 g/dL
## 10 SUBJ4 Baseline Hemoglobin 13.5 g/dL
## 11 SUBJ4 Week 4 Hemoglobin 13.6 g/dL
## 12 SUBJ4 Week 8 Hemoglobin 13.7 g/dL
## 13 SUBJ5 Baseline Hemoglobin 13.0 g/dL
## 14 SUBJ5 Week 4 Hemoglobin 13.3 g/dL
## 15 SUBJ5 Week 8 Hemoglobin 13.5 g/dL
## 16 SUBJ6 Baseline Hemoglobin 12.9 g/dL
## 17 SUBJ6 Week 4 Hemoglobin 13.0 g/dL
## 18 SUBJ6 Week 8 Hemoglobin 13.2 g/dL
# --- Quality checks ---
sum(is.na(clean_DM)) # missingness check
## [1] 0
sum(is.na(clean_LB))
## [1] 0
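The `sum(is.na(...))` check returns a single count per dataset, so it cannot show which variable is incomplete or whether the visit mapping left any label behind. A minimal sketch of a more granular check, using base R only. The `clean_LB_demo` data frame is hypothetical: it is a small slice in the shape of `clean_LB` with one missing result and one unmapped "Screening" label introduced on purpose.

```r
# Hypothetical demo data in the shape of clean_LB, with two deliberate problems:
# a missing result for SUBJ2 and a visit label that was never remapped.
clean_LB_demo <- data.frame(
  USUBJID  = c("SUBJ1", "SUBJ1", "SUBJ2", "SUBJ2"),
  VISIT    = c("Baseline", "Week 4", "Screening", "Week 4"),
  LBTEST   = "Hemoglobin",
  LBSTRESN = c(13.2, 13.8, NA, 13.1),
  LBSTRESU = "g/dL"
)

# Missing values counted per variable instead of for the whole dataset
na_by_variable <- colSums(is.na(clean_LB_demo))

# Visit labels that are not part of the expected schedule after harmonization
expected_visits <- c("Baseline", "Week 4", "Week 8")
unmapped_visits <- setdiff(unique(clean_LB_demo$VISIT), expected_visits)

print(na_by_variable)
print(unmapped_visits)
```

On the real `clean_DM` and `clean_LB` datasets from this walkthrough, both checks would be expected to come back empty, since the overall missing count is already zero and every "Screening" label is remapped.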
/* --- Clean Demographics --- */
data clean_DM;
set raw_demographics;
length USUBJID $8 STUDYID $6 SEX $1 ARM $10;
/* Rename and create SDTM-compliant variables */
USUBJID = cats("SUBJ", Subject_ID);
STUDYID = Study_ID;
AGE = Age_Years;
ARM = Treatment_Assigned;
keep STUDYID USUBJID SEX AGE ARM;
run;
title "Cleaned Demographics Dataset";
proc print data=clean_DM noobs;
run;
/* --- Clean Labs --- */
data clean_LB;
set raw_labs;
length USUBJID $8 VISIT $10 LBTEST $20 LBSTRESU $5;
/* Rename and create SDTM-compliant variables */
USUBJID = cats("SUBJ", Subject_ID);
VISIT = strip(Visit_Name);
if VISIT = "Screening" then VISIT = "Baseline"; /* Harmonize visit names */
LBTEST = Test_Name;
LBSTRESN = Test_Result;
LBSTRESU = Units;
keep USUBJID VISIT LBTEST LBSTRESN LBSTRESU;
run;
title "Cleaned Laboratory Dataset";
proc print data=clean_LB noobs;
run;
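/* --- Quality Checks: Missing Value Summary --- */
/* PROC MEANS summarizes numeric variables only (AGE, LBSTRESN); */
/* character variables are not covered by this check. */
title "Missing Value Check for Cleaned Datasets";
proc means data=clean_DM n nmiss;
run;
proc means data=clean_LB n nmiss;
run;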
In this step, we organize the cleaned data into SDTM-compliant domains following CDISC standards. Each dataset is structured with standardized variable names and assigned a corresponding domain code. The Demographics (DM) domain contains one record per subject with key attributes such as study ID, subject ID, sex, age, and treatment arm. The Laboratory Tests (LB) domain captures repeated measures of laboratory results across study visits, including test name, result, and units. These structured datasets form the foundation for subsequent ADaM derivations and statistical analyses.
# a. Demographics (DM domain)
DM <- clean_DM |>
select(STUDYID, USUBJID, SEX, AGE, ARM) |>
mutate(DOMAIN = "DM") |> print()
## STUDYID USUBJID SEX AGE ARM DOMAIN
## 1 ABC123 SUBJ1 M 54 Drug A DM
## 2 ABC123 SUBJ2 F 61 Placebo DM
## 3 ABC123 SUBJ3 M 49 Drug A DM
## 4 ABC123 SUBJ4 F 58 Placebo DM
## 5 ABC123 SUBJ5 M 62 Drug A DM
## 6 ABC123 SUBJ6 F 53 Placebo DM
# b. Laboratory Tests (LB domain)
LB <- clean_LB |>
select(USUBJID, VISIT, LBTEST, LBSTRESN, LBSTRESU) |>
mutate(STUDYID = "ABC123",
DOMAIN = "LB") |> print()
## USUBJID VISIT LBTEST LBSTRESN LBSTRESU STUDYID DOMAIN
## 1 SUBJ1 Baseline Hemoglobin 13.2 g/dL ABC123 LB
## 2 SUBJ1 Week 4 Hemoglobin 13.8 g/dL ABC123 LB
## 3 SUBJ1 Week 8 Hemoglobin 14.0 g/dL ABC123 LB
## 4 SUBJ2 Baseline Hemoglobin 12.8 g/dL ABC123 LB
## 5 SUBJ2 Week 4 Hemoglobin 13.1 g/dL ABC123 LB
## 6 SUBJ2 Week 8 Hemoglobin 13.3 g/dL ABC123 LB
## 7 SUBJ3 Baseline Hemoglobin 14.1 g/dL ABC123 LB
## 8 SUBJ3 Week 4 Hemoglobin 14.8 g/dL ABC123 LB
## 9 SUBJ3 Week 8 Hemoglobin 15.0 g/dL ABC123 LB
## 10 SUBJ4 Baseline Hemoglobin 13.5 g/dL ABC123 LB
## 11 SUBJ4 Week 4 Hemoglobin 13.6 g/dL ABC123 LB
## 12 SUBJ4 Week 8 Hemoglobin 13.7 g/dL ABC123 LB
## 13 SUBJ5 Baseline Hemoglobin 13.0 g/dL ABC123 LB
## 14 SUBJ5 Week 4 Hemoglobin 13.3 g/dL ABC123 LB
## 15 SUBJ5 Week 8 Hemoglobin 13.5 g/dL ABC123 LB
## 16 SUBJ6 Baseline Hemoglobin 12.9 g/dL ABC123 LB
## 17 SUBJ6 Week 4 Hemoglobin 13.0 g/dL ABC123 LB
## 18 SUBJ6 Week 8 Hemoglobin 13.2 g/dL ABC123 LB
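The structural claims in this step (one record per subject in DM, repeated records per subject in LB) can be checked directly. The sketch below uses base R on two hypothetical demo data frames, `DM_demo` and `LB_demo`, that mimic the domains above. `SUBJ4` is deliberately left out of `DM_demo` to show what an orphan lab record looks like.

```r
# Hypothetical demo domains; SUBJ4 has lab records but no DM record on purpose
DM_demo <- data.frame(
  STUDYID = "ABC123",
  USUBJID = c("SUBJ1", "SUBJ2", "SUBJ3"),
  ARM     = c("Drug A", "Placebo", "Drug A")
)
LB_demo <- data.frame(
  STUDYID  = "ABC123",
  USUBJID  = rep(c("SUBJ1", "SUBJ2", "SUBJ4"), each = 2),
  VISIT    = rep(c("Baseline", "Week 4"), times = 3),
  LBSTRESN = c(13.2, 13.8, 12.8, 13.1, 13.5, 13.6)
)

# DM should hold exactly one record per subject
dm_is_unique <- anyDuplicated(DM_demo$USUBJID) == 0

# Every subject with lab data should also appear in DM
orphan_subjects <- setdiff(LB_demo$USUBJID, DM_demo$USUBJID)

# LB is a repeated-measures domain: count records per subject
lb_records_per_subject <- table(LB_demo$USUBJID)

print(dm_is_unique)
print(orphan_subjects)
print(lb_records_per_subject)
```

Running checks like these before the ADaM merge makes it easier to explain any subject that later drops out of an inner join.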
/* --- a. Demographics (DM Domain) --- */
data DM;
set clean_DM;
length DOMAIN $2;
DOMAIN = "DM"; /* SDTM domain code */
keep STUDYID USUBJID SEX AGE ARM DOMAIN;
run;
title "SDTM Demographics (DM) Domain";
proc print data=DM noobs;
run;
/* --- b. Laboratory Tests (LB Domain) --- */
data LB;
set clean_LB;
length STUDYID $6 DOMAIN $2;
STUDYID = "ABC123"; /* Assign study ID */
DOMAIN = "LB"; /* SDTM domain code */
keep USUBJID VISIT LBTEST LBSTRESN LBSTRESU STUDYID DOMAIN;
run;
title "SDTM Laboratory Tests (LB) Domain";
proc print data=LB noobs;
run;
In this step, we begin transforming the standardized SDTM domains into ADaM datasets, which are structured for statistical analysis.
The first task is to merge the Demographics (DM) and
Laboratory (LB) domains by their common identifiers,
USUBJID and STUDYID. This merge combines
subject-level information (such as age, sex, and treatment arm) with
laboratory results, creating a unified dataset called
ADLB_raw. This combined data serves as the
foundation for deriving analysis variables such as baseline values,
change from baseline, and analysis flags in subsequent steps.
ADLB_raw <- merge(LB,
DM,
by = c("USUBJID", "STUDYID")) |> print()
## USUBJID STUDYID VISIT LBTEST LBSTRESN LBSTRESU DOMAIN.x SEX AGE
## 1 SUBJ1 ABC123 Baseline Hemoglobin 13.2 g/dL LB M 54
## 2 SUBJ1 ABC123 Week 4 Hemoglobin 13.8 g/dL LB M 54
## 3 SUBJ1 ABC123 Week 8 Hemoglobin 14.0 g/dL LB M 54
## 4 SUBJ2 ABC123 Baseline Hemoglobin 12.8 g/dL LB F 61
## 5 SUBJ2 ABC123 Week 4 Hemoglobin 13.1 g/dL LB F 61
## 6 SUBJ2 ABC123 Week 8 Hemoglobin 13.3 g/dL LB F 61
## 7 SUBJ3 ABC123 Baseline Hemoglobin 14.1 g/dL LB M 49
## 8 SUBJ3 ABC123 Week 4 Hemoglobin 14.8 g/dL LB M 49
## 9 SUBJ3 ABC123 Week 8 Hemoglobin 15.0 g/dL LB M 49
## 10 SUBJ4 ABC123 Baseline Hemoglobin 13.5 g/dL LB F 58
## 11 SUBJ4 ABC123 Week 4 Hemoglobin 13.6 g/dL LB F 58
## 12 SUBJ4 ABC123 Week 8 Hemoglobin 13.7 g/dL LB F 58
## 13 SUBJ5 ABC123 Baseline Hemoglobin 13.0 g/dL LB M 62
## 14 SUBJ5 ABC123 Week 4 Hemoglobin 13.3 g/dL LB M 62
## 15 SUBJ5 ABC123 Week 8 Hemoglobin 13.5 g/dL LB M 62
## 16 SUBJ6 ABC123 Baseline Hemoglobin 12.9 g/dL LB F 53
## 17 SUBJ6 ABC123 Week 4 Hemoglobin 13.0 g/dL LB F 53
## 18 SUBJ6 ABC123 Week 8 Hemoglobin 13.2 g/dL LB F 53
## ARM DOMAIN.y
## 1 Drug A DM
## 2 Drug A DM
## 3 Drug A DM
## 4 Placebo DM
## 5 Placebo DM
## 6 Placebo DM
## 7 Drug A DM
## 8 Drug A DM
## 9 Drug A DM
## 10 Placebo DM
## 11 Placebo DM
## 12 Placebo DM
## 13 Drug A DM
## 14 Drug A DM
## 15 Drug A DM
## 16 Placebo DM
## 17 Placebo DM
## 18 Placebo DM
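The output above contains `DOMAIN.x` and `DOMAIN.y` because both domains carry a `DOMAIN` column that is not part of the merge key, so `merge()` suffixes the two copies. One way to avoid this, sketched below with base R on hypothetical two-subject demo data, is to drop `DOMAIN` from the DM side before merging. Keeping the LB value is an assumption made for illustration; dropping the column from both sides would be an equally reasonable choice.

```r
# Hypothetical demo domains with the same columns as DM and LB above
DM_demo <- data.frame(
  STUDYID = "ABC123",
  USUBJID = c("SUBJ1", "SUBJ2"),
  SEX     = c("M", "F"),
  AGE     = c(54, 61),
  ARM     = c("Drug A", "Placebo"),
  DOMAIN  = "DM"
)
LB_demo <- data.frame(
  USUBJID  = rep(c("SUBJ1", "SUBJ2"), each = 2),
  VISIT    = rep(c("Baseline", "Week 4"), times = 2),
  LBTEST   = "Hemoglobin",
  LBSTRESN = c(13.2, 13.8, 12.8, 13.1),
  LBSTRESU = "g/dL",
  STUDYID  = "ABC123",
  DOMAIN   = "LB"
)

# Drop DOMAIN from the subject-level side so only the LB copy survives the merge
dm_vars <- DM_demo[, setdiff(names(DM_demo), "DOMAIN")]

ADLB_raw_demo <- merge(LB_demo, dm_vars, by = c("STUDYID", "USUBJID"))
print(names(ADLB_raw_demo))
```

The merged data frame then has a single `DOMAIN` column and no `.x`/`.y` suffixes, which keeps later `select()` calls simpler.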
proc sort data=LB; by STUDYID USUBJID; run;
proc sort data=DM; by STUDYID USUBJID; run;
data ADLB_raw;
merge LB(in=inLB) DM(in=inDM drop=DOMAIN); /* drop DM's DOMAIN so it does not overwrite "LB" */
by STUDYID USUBJID;
if inLB and inDM; /* Keep only matched subjects */
run;
title "ADaM Raw Dataset (ADLB_raw)";
proc print data=ADLB_raw noobs;
run;
After merging the SDTM domains, the next step is to create
analysis-ready variables needed for statistical evaluation. For
each subject, we identify the baseline laboratory value
(the measurement recorded at the “Baseline” visit) and calculate the
change from baseline (CHG) at subsequent
visits. Additional variables are defined to describe the parameter
(PARAM), the numeric analysis value (AVAL),
and the visit label (AVISIT). Finally, an analysis
flag (ANL01FL) is created to indicate which
records should be included in the analysis—typically all post-baseline
observations. This process ensures that the dataset is structured and
traceable for consistent statistical analysis in the ADaM framework.
ADLB <- ADLB_raw |>
group_by(USUBJID) |>
mutate(
BASE = LBSTRESN[VISIT == "Baseline"],
CHG = LBSTRESN - BASE,
PARAM = paste0(LBTEST, " (", LBSTRESU, ")"),
AVAL = LBSTRESN,
AVISIT = VISIT,
ANL01FL = ifelse(VISIT != "Baseline", "Y", "N")
) |>
ungroup() |> print()
## # A tibble: 18 × 17
## USUBJID STUDYID VISIT LBTEST LBSTRESN LBSTRESU DOMAIN.x SEX AGE ARM
## <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 SUBJ1 ABC123 Baseline Hemogl… 13.2 g/dL LB M 54 Drug…
## 2 SUBJ1 ABC123 Week 4 Hemogl… 13.8 g/dL LB M 54 Drug…
## 3 SUBJ1 ABC123 Week 8 Hemogl… 14 g/dL LB M 54 Drug…
## 4 SUBJ2 ABC123 Baseline Hemogl… 12.8 g/dL LB F 61 Plac…
## 5 SUBJ2 ABC123 Week 4 Hemogl… 13.1 g/dL LB F 61 Plac…
## 6 SUBJ2 ABC123 Week 8 Hemogl… 13.3 g/dL LB F 61 Plac…
## 7 SUBJ3 ABC123 Baseline Hemogl… 14.1 g/dL LB M 49 Drug…
## 8 SUBJ3 ABC123 Week 4 Hemogl… 14.8 g/dL LB M 49 Drug…
## 9 SUBJ3 ABC123 Week 8 Hemogl… 15 g/dL LB M 49 Drug…
## 10 SUBJ4 ABC123 Baseline Hemogl… 13.5 g/dL LB F 58 Plac…
## 11 SUBJ4 ABC123 Week 4 Hemogl… 13.6 g/dL LB F 58 Plac…
## 12 SUBJ4 ABC123 Week 8 Hemogl… 13.7 g/dL LB F 58 Plac…
## 13 SUBJ5 ABC123 Baseline Hemogl… 13 g/dL LB M 62 Drug…
## 14 SUBJ5 ABC123 Week 4 Hemogl… 13.3 g/dL LB M 62 Drug…
## 15 SUBJ5 ABC123 Week 8 Hemogl… 13.5 g/dL LB M 62 Drug…
## 16 SUBJ6 ABC123 Baseline Hemogl… 12.9 g/dL LB F 53 Plac…
## 17 SUBJ6 ABC123 Week 4 Hemogl… 13 g/dL LB F 53 Plac…
## 18 SUBJ6 ABC123 Week 8 Hemogl… 13.2 g/dL LB F 53 Plac…
## # ℹ 7 more variables: DOMAIN.y <chr>, BASE <dbl>, CHG <dbl>, PARAM <chr>,
## # AVAL <dbl>, AVISIT <chr>, ANL01FL <chr>
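The derivation above relies on every subject having exactly one "Baseline" record. If a subject had none, `LBSTRESN[VISIT == "Baseline"]` would be a length-zero vector, which `mutate()` cannot recycle to the group size. A minimal sketch of a more defensive variant, assuming dplyr is installed: taking element `[1]` of the subset returns `NA` when no baseline exists. `ADLB_demo` and subject `SUBJ7` are hypothetical and only serve to show the behavior.

```r
library(dplyr)

# Hypothetical demo data: SUBJ1 has a baseline record, SUBJ7 does not
ADLB_demo <- data.frame(
  USUBJID  = c("SUBJ1", "SUBJ1", "SUBJ1", "SUBJ7", "SUBJ7"),
  VISIT    = c("Baseline", "Week 4", "Week 8", "Week 4", "Week 8"),
  LBSTRESN = c(13.2, 13.8, 14.0, 13.4, 13.6)
) |>
  group_by(USUBJID) |>
  mutate(
    # [1] yields NA when the subset is empty, so BASE is missing instead of an error
    BASE = LBSTRESN[VISIT == "Baseline"][1],
    CHG  = LBSTRESN - BASE
  ) |>
  ungroup()

print(ADLB_demo)
```

For `SUBJ1` this reproduces the walkthrough's values (for example, Week 4 gives 13.8 - 13.2 = 0.6), while `SUBJ7` gets a missing `BASE` and `CHG` that can be reviewed rather than silently dropped.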
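proc sort data=ADLB_raw; by USUBJID VISIT; run;
data ADLB;
  set ADLB_raw;
  by USUBJID;
  retain BASE;
  /* Reset BASE for each subject so a missing baseline is not carried over */
  if first.USUBJID then BASE = .;
  /* Identify baseline value ("Baseline" sorts before "Week 4" and "Week 8") */
  if VISIT = "Baseline" then BASE = LBSTRESN;
  /* Calculate change from baseline */
  CHG = LBSTRESN - BASE;
  /* Define analysis variables */
  PARAM = catx(" ", LBTEST, cats("(", LBSTRESU, ")")); /* e.g. "Hemoglobin (g/dL)", matching the R output */
  AVAL = LBSTRESN;
  AVISIT = VISIT;
  if VISIT ne "Baseline" then ANL01FL = "Y";
  else ANL01FL = "N";
run;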
title "ADaM Laboratory Dataset (ADLB)";
proc print data=ADLB noobs;
run;
The ADSL (Subject-Level Analysis Dataset) contains one record per subject and serves as the cornerstone of all ADaM datasets. It provides the key demographic and treatment information used to define analysis populations. In this step, we derive population flags such as the Intent-to-Treat (ITTFL) and Safety (SAFFL) indicators. In this simplified example both are set to “Y” for every subject, which assumes that all six subjects were randomized and treated. We also define the planned (TRT01P) and actual (TRT01A) treatment variables, both copied from ARM in the DM domain, which assumes that each subject received the treatment they were assigned to.
ADSL <- DM |>
mutate(
ITTFL = "Y",
SAFFL = "Y",
TRT01A = ARM,
TRT01P = ARM
) |> print()
## STUDYID USUBJID SEX AGE ARM DOMAIN ITTFL SAFFL TRT01A TRT01P
## 1 ABC123 SUBJ1 M 54 Drug A DM Y Y Drug A Drug A
## 2 ABC123 SUBJ2 F 61 Placebo DM Y Y Placebo Placebo
## 3 ABC123 SUBJ3 M 49 Drug A DM Y Y Drug A Drug A
## 4 ABC123 SUBJ4 F 58 Placebo DM Y Y Placebo Placebo
## 5 ABC123 SUBJ5 M 62 Drug A DM Y Y Drug A Drug A
## 6 ABC123 SUBJ6 F 53 Placebo DM Y Y Placebo Placebo
data ADSL;
set DM;
length ITTFL SAFFL $1 TRT01A TRT01P $10; /* $10 matches ARM; $1 would truncate "Drug A" to "D" */
ITTFL = "Y"; /* Intent-to-treat flag */
SAFFL = "Y"; /* Safety flag */
TRT01A = ARM; /* Actual treatment */
TRT01P = ARM; /* Planned treatment */
run;
title "ADaM Subject-Level Dataset (ADSL)";
proc print data=ADSL noobs;
run;
With the ADaM datasets prepared, the next step is to generate Tables, Listings, and Figures (TLFs) — the key outputs used in clinical study reports and regulatory submissions. In this example, we summarize the mean change from baseline in hemoglobin by treatment arm and visit, calculating the sample size, mean, and standard deviation for each group. The results are then visualized through a line plot with error bars representing the mean ± standard deviation (SD), allowing for an easy comparison of treatment effects over time.
In this step, we use the analysis-ready dataset
(ADLB) to generate a summary table of
hemoglobin changes from baseline. The data are filtered to include only
records flagged for analysis (ANL01FL = "Y"), then grouped
by treatment arm (ARM) and visit
(AVISIT). For each group, we calculate the
sample size (N), the mean change
from baseline (Mean_Change), and its
standard deviation (SD_Change).
summary_stats <- ADLB |>
  filter(ANL01FL == "Y") |>
  group_by(ARM, AVISIT) |>
  summarise(
    N = n(),
    Mean_Change = round(mean(CHG), 2),
    SD_Change = round(sd(CHG), 2),
    .groups = "drop" # return an ungrouped tibble and avoid the grouping message
  )
summary_stats
## # A tibble: 4 × 5
## ARM AVISIT N Mean_Change SD_Change
## <chr> <chr> <int> <dbl> <dbl>
## 1 Drug A Week 4 3 0.53 0.21
## 2 Drug A Week 8 3 0.73 0.21
## 3 Placebo Week 4 3 0.17 0.12
## 4 Placebo Week 8 3 0.33 0.15
proc sort data=ADLB;
by ARM AVISIT;
run;
proc means data=ADLB n mean stddev maxdec=2 nway; /* nway: output only the ARM x AVISIT rows (_TYPE_ = 3) */
where ANL01FL = "Y"; /* Include only analysis records */
class ARM AVISIT;
var CHG;
output out=summary_stats
n=N
mean=Mean_Change
stddev=SD_Change;
run;
/* Display summary table */
title "Summary Statistics: Mean Change from Baseline in Hemoglobin";
proc print data=summary_stats noobs;
var ARM AVISIT N Mean_Change SD_Change;
run;
Now, we visualize the mean change from baseline in hemoglobin over time for each treatment arm. Using both R and SAS, line plots with markers and error bars are created to display the average change (mean) and the corresponding variability (± standard deviation) at each study visit. This visualization provides a clear and intuitive comparison of how hemoglobin levels evolve across visits between the treatment and placebo groups, effectively highlighting overall trends and treatment effects.
ggplot(summary_stats, aes(x = AVISIT, y = Mean_Change, group = ARM, color = ARM)) +
  geom_line(linewidth = 1.2) + # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = Mean_Change - SD_Change, ymax = Mean_Change + SD_Change), width = 0.2) +
  labs(
    title = "Mean Change from Baseline in Hemoglobin",
    x = "Visit",
    y = "Mean Change (g/dL)",
    color = "Treatment Arm"
  ) +
  theme_classic(base_size = 13)
data summary_plot;
set summary_stats;
where _TYPE_ = 3; /* Keep group-level means only */
LowBar = Mean_Change - SD_Change;
HighBar = Mean_Change + SD_Change;
run;
ods graphics / width=5in height=3.5in imagename="MeanChange_Hb_NoGrid";
proc sgplot data=summary_plot noborder;
/* ggplot2-style colors */
styleattrs datacontrastcolors=(CXE41A1C CX377EB8)
datasymbols=(circlefilled)
datalinepatterns=(solid);
/* Error bars (±1 SD) */
highlow x=AVISIT low=LowBar high=HighBar /
group=ARM
type=line
lineattrs=(thickness=1 pattern=solid)
transparency=0.2
name="ErrBars";
/* Mean line with markers */
series x=AVISIT y=Mean_Change / group=ARM
lineattrs=(thickness=2)
markers markerattrs=(symbol=circlefilled size=9)
name="MeanLine";
/* Legend for treatment arm */
keylegend "MeanLine" / title="Treatment Arm" position=right across=1;
/* Clean axes — no grid lines */
xaxis label="Visit" discreteorder=data display=(noticks);
yaxis label="Mean Change (g/dL)" display=(noticks);
/* Title */
title "Mean Change from Baseline in Hemoglobin";
run;
ods graphics off;
| ADaM Variable | Derived From | Description |
|---|---|---|
| AVAL | LB.LBSTRESN | Lab test numeric result |
| BASE | LB.LBSTRESN (Baseline visit) | Baseline value |
| CHG | AVAL - BASE | Change from baseline |
| ARM | DM.ARM | Treatment arm |
| ANL01FL | Derived | Analysis record flag (Y/N) |
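Traceability can also be spot-checked numerically by recomputing one cell of the summary table straight from the raw values, bypassing the SDTM and ADaM layers. The sketch below does this in base R for the Drug A arm at Week 4. The `raw_drug_a` data frame is a hand-copied subset of `raw_labs` (subjects 1, 3, and 5), and the intermediate names are illustrative rather than part of the original workflow.

```r
# Raw hemoglobin values copied from raw_labs for the Drug A subjects (1, 3, 5)
raw_drug_a <- data.frame(
  Subject_ID  = rep(c(1, 3, 5), each = 2),
  Visit_Name  = rep(c("Screening", "Week 4"), times = 3),
  Test_Result = c(13.2, 13.8, 14.1, 14.8, 13.0, 13.3)
)

# Split by visit, then pair each subject's Screening and Week 4 values
screening <- raw_drug_a[raw_drug_a$Visit_Name == "Screening", ]
week4     <- raw_drug_a[raw_drug_a$Visit_Name == "Week 4", ]
paired    <- merge(screening, week4, by = "Subject_ID",
                   suffixes = c("_base", "_wk4"))

# Change from baseline, computed directly on the raw scale
chg      <- paired$Test_Result_wk4 - paired$Test_Result_base
mean_chg <- round(mean(chg), 2)
sd_chg   <- round(sd(chg), 2)

print(c(mean = mean_chg, sd = sd_chg))
```

The result (mean 0.53, SD 0.21) agrees with the "Drug A, Week 4" row of `summary_stats`, which is the kind of agreement a traceability review looks for.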
| Step | Data Standard | Description | Example Output |
|---|---|---|---|
| Data Collection | CDASH | Raw CRF data collected at site | raw_demographics, raw_labs |
| Data Cleaning | Pre-SDTM QC | Check completeness, consistency | clean_DM, clean_LB |
| Data Tabulation | SDTM | Structured data by domain | DM, LB |
| Analysis Preparation | ADaM | Derived, merged, analysis-ready | ADSL, ADLB |
| Statistical Results | TLFs | Tables, Listings, Figures | summary_stats, plots |
CDASH = standardized data collection
SDTM = standardized data organization for regulators
ADaM = standardized data preparation for analysts
TLFs = the final outputs (tables, listings, figures)
Each layer ensures traceability, quality, and regulatory compliance