Sample covariates from the NHANES database

Usage

sample_covariates_nhanes(
  covariates = NULL,
  year = "2017-2018",
  n_subjects = 100,
  conditional = NULL,
  use_weights = TRUE,
  seed = NULL,
  dictionary = NULL,
  na.rm = TRUE,
  cache_dir = nhanes_default_cache_dir(),
  ...
)

Arguments

covariates

character vector of NHANES variable names to include in the output, e.g. c("RIDAGEYR", "BMXBMI", "WTMEC2YR"). If NULL (default), all variables in the cached data are returned (SEQN is always dropped).

year

NHANES survey cycle, e.g. "2017-2018". Supported values: "1999-2000", "2001-2002", "2003-2004", "2005-2006", "2007-2008", "2009-2010", "2011-2012", "2013-2014", "2015-2016", "2017-2018", "2019-2020".

n_subjects

number of subjects to sample. Default is 100.

conditional

named list of limits that restrict the sampled population, with one c(lower, upper) pair per variable, e.g. list("RIDAGEYR" = c(18, 65), "BMXBMI" = c(18, 35)). Filters are applied before sampling.

use_weights

logical. If TRUE, use NHANES 2-year MEC examination weights (WTMEC2YR) for probability-proportional sampling, which produces a sample more representative of the U.S. civilian non-institutionalized population. Requires WTMEC2YR to be present in the cached data (included when "DEMO" was downloaded). Default TRUE.

seed

integer random seed passed to set.seed() for reproducibility. Default NULL does not set a seed.

dictionary

named list mapping user-defined covariate names to their NHANES variable names, e.g. list("WT" = "BMXWT", "HT" = "BMXHT", "AGE" = "RIDAGEYR"). Names in covariates and conditional that appear as keys in dictionary are translated to the corresponding NHANES names before lookup and translated back in the output. Names not present in dictionary are treated as direct NHANES variable names.

na.rm

logical. If TRUE (default), rows with NA in any of the requested covariates are dropped before sampling.

cache_dir

path to a directory containing a merged NHANES RDS file created by download_nhanes_cache(). Defaults to the package-level cache populated automatically on first load. Set to NULL to always download on demand via nhanesA (requires internet).

...

additional arguments (currently unused)
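
The interplay of dictionary, conditional, and na.rm described above can be sketched in base R. This is an illustration on made-up toy data, not the package's implementation: the helper to_nhanes() is hypothetical, the limits are assumed to be inclusive, and the order of the filtering and NA-removal steps is an assumption (the documentation only states that both happen before sampling).

```r
# Toy data standing in for the merged NHANES cache (values are made up).
toy <- data.frame(
  RIDAGEYR = c(12, 25, 40, 70, 33),
  BMXWT    = c(40, 70, 85, 60, NA),
  BMXBMI   = c(17, 24, 29, 22, 31)
)

dictionary  <- list(AGE = "RIDAGEYR", WT = "BMXWT")
covariates  <- c("AGE", "WT", "BMXBMI")
conditional <- list(AGE = c(18, 65))

# Hypothetical helper: translate user-defined names to NHANES names.
# Names that are not keys in the dictionary pass through unchanged.
to_nhanes <- function(x, dict) {
  vapply(
    x,
    function(nm) if (nm %in% names(dict)) dict[[nm]] else nm,
    character(1),
    USE.NAMES = FALSE
  )
}

nh_covs <- to_nhanes(covariates, dictionary)
nh_cond <- conditional
names(nh_cond) <- to_nhanes(names(conditional), dictionary)

# Apply each c(lower, upper) limit (assumed inclusive) before sampling.
keep <- rep(TRUE, nrow(toy))
for (v in names(nh_cond)) {
  lim  <- nh_cond[[v]]
  keep <- keep & !is.na(toy[[v]]) & toy[[v]] >= lim[1] & toy[[v]] <= lim[2]
}
filtered <- toy[keep, nh_covs, drop = FALSE]

# na.rm = TRUE: drop rows with NA in any requested covariate.
filtered <- filtered[complete.cases(filtered), , drop = FALSE]

# Translate the column names back to the user-defined names.
names(filtered) <- covariates
```

In this toy example two rows survive: the subjects aged 12 and 70 fall outside the age limits, and the subject aged 33 is dropped because the weight is missing.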

Value

a data.frame with n_subjects rows and the requested covariates as columns.
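
A minimal usage sketch, built only from the arguments documented above. It assumes that the function is exported from the irxforge package and that the default 2017-2018 cache is available; the covariate choices, limits, and seed value are arbitrary illustrations.

```r
# Runs only when irxforge is installed; otherwise the block is skipped.
if (requireNamespace("irxforge", quietly = TRUE)) {
  covs <- irxforge::sample_covariates_nhanes(
    covariates  = c("AGE", "WT", "BMXBMI"),
    year        = "2017-2018",
    n_subjects  = 50,
    conditional = list(AGE = c(18, 65)),               # adults aged 18 to 65
    dictionary  = list(AGE = "RIDAGEYR", WT = "BMXWT"),
    use_weights = TRUE,
    seed        = 12345                                # arbitrary seed
  )
  str(covs)
}
```

Given the Value description, covs should be a data.frame with 50 rows and one column per requested covariate.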

Details

On first load, irxforge automatically downloads NHANES Demographics, Laboratory, and Examination tables (cycle 2017-2018) and saves a single merged RDS file in the package installation directory. Subsequent calls read from this cache with no internet access required.

Call download_nhanes_cache() to pre-download additional years or groups.

If cache_dir contains no cache file for the requested year, the function stops with an error that includes instructions to run download_nhanes_cache().

NHANES uses a complex, multi-stage sampling design, so participants do not represent equal shares of the population. Survey weights adjust for unequal selection probabilities and for non-response. Set use_weights = TRUE to account for this when sampling.
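
The effect of weighting can be illustrated with base R's sample() and its prob argument. This is a sketch on made-up toy data, not the package's internal code; in particular, sampling with replacement is an assumption, since the documentation does not state how subjects are drawn.

```r
set.seed(1)

# Toy data: the fourth subject carries a much larger examination weight
# than the other three (values are made up).
toy <- data.frame(
  RIDAGEYR = c(25, 40, 33, 58),
  WTMEC2YR = c(1000, 1000, 1000, 97000)
)

n_subjects <- 1000

# Draw row indices with probability proportional to the survey weight.
# sample() rescales `prob` internally, so the weights need not sum to 1.
idx <- sample(
  nrow(toy),
  size    = n_subjects,
  replace = TRUE,            # assumption: sampling with replacement
  prob    = toy$WTMEC2YR
)
sampled <- toy[idx, "RIDAGEYR", drop = FALSE]

# Share of draws per source row; the heavily weighted row dominates.
prop.table(table(idx))
```

With equal probabilities each row would appear in about a quarter of the draws. With the weights above, the fourth row accounts for the large majority of them, which mirrors how heavily weighted NHANES participants stand in for more of the population.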

Key covariates in the default cache (NHANES 2017-2018)

The default cache merges all Demographics, Laboratory, and Examination tables. The full merged dataset contains all variables from every table in those groups; the most commonly used ones are listed below. Only measurement variables are listed; administrative comment-code fields and SI-unit duplicates are omitted for brevity.

Demographics (DEMO_J)

Variable  Description
RIDAGEYR  Age in years at screening (top-coded at 80)
RIDAGEMN  Age in months at screening (ages ≤ 24 months)
RIAGENDR  Gender (1 = Male, 2 = Female)
RIDRETH1  Race/Hispanic origin
RIDRETH3  Race/Hispanic origin (includes Non-Hispanic Asian)
RIDEXPRG  Pregnancy status (females 20–44 at exam)
DMDBORN4  Country of birth
DMDCITZN  Citizenship status
DMDEDUC2  Education level – adults 20+
DMDEDUC3  Education level – youth 6–19
DMDMARTL  Marital status
DMDYRSUS  Years in the US
DMDFMSIZ  Total number of people in the family
DMDHHSIZ  Total number of people in the household
INDFMIN2  Total family income (range value, USD)
INDFMPIR  Ratio of family income to poverty guidelines
INDHHIN2  Total household income (range value, USD)
WTINT2YR  Full sample 2-year interview weight
WTMEC2YR  Full sample 2-year MEC exam weight

Body Measures (BMX_J)

Variable  Description
BMXWT     Weight (kg)
BMXHT     Standing height (cm)
BMXBMI    Body Mass Index (kg/m²)
BMXWAIST  Waist circumference (cm)
BMXHIP    Hip circumference (cm)
BMXARMC   Arm circumference (cm)
BMXARML   Upper arm length (cm)
BMXLEG    Upper leg length (cm)

Blood Pressure & Pulse (BPX_J)

Variable  Description
BPXSY1    Systolic blood pressure, 1st reading (mm Hg)
BPXSY2    Systolic blood pressure, 2nd reading (mm Hg)
BPXSY3    Systolic blood pressure, 3rd reading (mm Hg)
BPXDI1    Diastolic blood pressure, 1st reading (mm Hg)
BPXDI2    Diastolic blood pressure, 2nd reading (mm Hg)
BPXDI3    Diastolic blood pressure, 3rd reading (mm Hg)
BPXPLS    60-second pulse (beats/min)

Glycohemoglobin (GHB_J)

Variable  Description
LBXGH     Glycohemoglobin / HbA1c (%)

Standard Biochemistry Profile (BIOPRO_J)

Variable  Description
LBXSAL    Albumin (g/dL)
LBXSBU    Blood urea nitrogen / BUN (mg/dL)
LBXSCA    Total calcium (mg/dL)
LBXSCR    Creatinine, serum (mg/dL)
LBXSGL    Glucose, serum (mg/dL)
LBXSGB    Globulin (g/dL)
LBXSIR    Iron, serum (µg/dL)
LBXSPH    Phosphorus (mg/dL)
LBXSTB    Total bilirubin (mg/dL)
LBXSTP    Total protein (g/dL)
LBXSTR    Triglycerides, serum (mg/dL)
LBXSUA    Uric acid (mg/dL)
LBXSATSI  Alanine aminotransferase / ALT (U/L)
LBXSASSI  Aspartate aminotransferase / AST (U/L)
LBXSGTSI  Gamma-glutamyl transferase / GGT (IU/L)
LBXSAPSI  Alkaline phosphatase / ALP (IU/L)
LBXSCK    Creatine phosphokinase / CPK (IU/L)
LBXSCH    Total cholesterol, serum (mg/dL)
LBXSC3SI  Bicarbonate (mmol/L)
LBXSCLSI  Chloride (mmol/L)
LBXSKSI   Potassium (mmol/L)
LBXSNASI  Sodium (mmol/L)
LBXSOSSI  Osmolality (mmol/kg)
LBXSLDSI  Lactate dehydrogenase / LDH (IU/L)

Complete Blood Count (CBC_J)

Variable  Description
LBXWBCSI  White blood cell count (1000 cells/µL)
LBXRBCSI  Red blood cell count (million cells/µL)
LBXHGB    Hemoglobin (g/dL)
LBXHCT    Hematocrit (%)
LBXMCVSI  Mean cell volume (fL)
LBXMCHSI  Mean cell hemoglobin (pg)
LBXMC     Mean cell hemoglobin concentration (g/dL)
LBXRDW    Red cell distribution width (%)
LBXPLTSI  Platelet count (1000 cells/µL)
LBXMPSI   Mean platelet volume (fL)
LBXLYPCT  Lymphocyte percent (%)
LBDLYMNO  Lymphocyte number (1000 cells/µL)
LBXNEPCT  Segmented neutrophils percent (%)
LBDNENO   Segmented neutrophils number (1000 cells/µL)
LBXMOPCT  Monocyte percent (%)
LBDMONO   Monocyte number (1000 cells/µL)
LBXEOPCT  Eosinophils percent (%)
LBDEONO   Eosinophils number (1000 cells/µL)
LBXBAPCT  Basophils percent (%)
LBDBANO   Basophils number (1000 cells/µL)
LBXNRBC   Nucleated red blood cells (/100 WBC)

Lipids

Variable  Table     Description
LBXTC     TCHOL_J   Total cholesterol (mg/dL)
LBDHDD    HDL_J     Direct HDL-cholesterol (mg/dL)
LBXTR     TRIGLY_J  Triglycerides (mg/dL)
LBDLDL    TRIGLY_J  LDL-cholesterol, Friedewald equation (mg/dL)
LBDLDLM   TRIGLY_J  LDL-cholesterol, Martin-Hopkins equation (mg/dL)
LBDLDLN   TRIGLY_J  LDL-cholesterol, NIH equation 2 (mg/dL)

Urine Albumin & Creatinine (ALB_CR_J)

Variable  Description
URXUCR    Creatinine, urine (mg/dL)
URXCRS    Creatinine, urine (µmol/L)
URXUMA    Albumin, urine (µg/mL)
URXUMS    Albumin, urine (mg/L)
URDACT    Albumin-creatinine ratio (mg/g)

The full merged dataset contains additional variables from all other Laboratory and Examination tables downloaded for the requested year. Use names(readRDS(file.path(cache_dir, "nhanes_<year>.rds"))) to inspect all available columns.
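
Because NHANES variable names share prefixes by table family (for example BMX for body measures and LBX or LBD for laboratory values, as in the tables above), the column listing can be narrowed with grep(). The sketch below is self-contained: it writes a small toy RDS file to a temporary directory rather than reading the real cache, and the file name toy_cache.rds is hypothetical and does not follow the package's naming scheme.

```r
# Temporary directory and toy RDS file standing in for the real cache.
cache_dir <- tempfile("nhanes_cache_")
dir.create(cache_dir)
toy_file <- file.path(cache_dir, "toy_cache.rds")  # hypothetical file name

saveRDS(
  data.frame(
    SEQN     = 1:3,
    RIDAGEYR = c(25, 40, 33),
    BMXWT    = c(70, 85, 60),
    LBXSCR   = c(0.8, 1.0, 0.9)
  ),
  toy_file
)

# List all columns, then narrow the listing by variable-name prefix.
cols      <- names(readRDS(toy_file))
body_cols <- grep("^BMX", cols, value = TRUE)     # body measures
lab_cols  <- grep("^LB[XD]", cols, value = TRUE)  # laboratory values

print(body_cols)
print(lab_cols)
```

The same grep() calls can be applied to the column names of the real cache file once it has been read with readRDS().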