This vignette shows how to use the cancerR package to classify cancer
subtypes from the histology, site, and behaviour codes recorded in
pathology reports. These reports are typically coded using the
International Classification of Diseases for Oncology (ICD-O) system,
and the codes are routinely available in cancer registries.
library(cancerR)
# Make example data
data <- data.frame(
  icd_o3_histology = c("8522", "9490", "9070"),
  # Different formats of site codes commonly found in cancer registries
  icd_o3_site = c("C50.1", "C701", "620"),
  icd_o3_behaviour = c("3", "3", "3")
)
head(data)
#>   icd_o3_histology icd_o3_site icd_o3_behaviour
#> 1             8522       C50.1                3
#> 2             9490        C701                3
#> 3             9070         620                3
The site_convert() function can be used to extract the correct site
(a.k.a. topography) codes and convert them to a standardized numeric
format. It handles both character and numeric input and automatically
detects whether the codes are in decimal (“C34.1”) or integer (“C341”)
format before converting them.
# Convert site codes
data$site_conv <- site_convert(data$icd_o3_site, validate = FALSE)
head(data)
#>   icd_o3_histology icd_o3_site icd_o3_behaviour site_conv
#> 1             8522       C50.1                3       501
#> 2             9490        C701                3       701
#> 3             9070         620                3       620
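To make the conversion concrete, here is a minimal base R sketch of how codes in these three formats could be normalized. It is an illustration only, not the implementation used by site_convert(); the helper name normalize_site() is hypothetical, and it assumes every valid code reduces to exactly three digits.

```r
# Hypothetical helper, not part of cancerR: normalize site codes written as
# "C50.1", "C701", or "620" to a three-digit integer.
normalize_site <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # Drop the leading "C" and the decimal point, if present
  digits <- gsub("[C.]", "", x)
  # Anything that is not exactly three digits becomes NA
  digits[!grepl("^[0-9]{3}$", digits)] <- NA_character_
  as.integer(digits)
}

normalize_site(c("C50.1", "C701", "620"))
#> [1] 501 701 620
```

Note that this sketch would turn a numeric input such as 79 into NA because the leading zero is lost; zero-padding would be needed to handle that case.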
site_convert() also has built-in validation to ensure that the site
codes fall within the valid range of “C00.0” to “C97.9”. Validation is
enabled by setting the validate argument to TRUE.
# Valid site codes
site_convert("C34.1", validate = TRUE)
#> [1] 341
# Invalid site codes
site_convert("C99.9", validate = TRUE) # Should return NA and a warning message
#> Warning in site_convert("C99.9", validate = TRUE): There were 1 invalid ICD-O-3
#> site codes found and set to NA.
#> [1] NA
site_convert("C99.9", validate = FALSE) # Should return 999
#> [1] 999
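The validation behaviour shown above can be pictured as a simple range check on the converted codes. The sketch below is an assumption about the general idea, not the package's code: validate_site() is a hypothetical helper, and the bounds 0 to 979 are just the numeric equivalents of “C00.0” and “C97.9”.

```r
# Hypothetical helper, not part of cancerR: set converted site codes outside
# the C00.0-C97.9 range (0 to 979 as numbers) to NA and warn about it.
validate_site <- function(code) {
  invalid <- !is.na(code) & (code < 0 | code > 979)
  if (any(invalid)) {
    warning(sum(invalid), " site code(s) outside 0-979 were set to NA.")
    code[invalid] <- NA
  }
  code
}

# 341 is kept; 999 is replaced by NA and a warning is raised
validate_site(c(341, 999))
```

Returning NA with a warning, rather than stopping with an error, lets a whole registry extract be processed in one pass while still flagging the records that need review.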
The aya_class() function can be used to classify adolescent and young
adult cancer based on the histology, site, and behaviour codes of the
cancer.
The classification method is chosen with the method argument, which
accepts one of the following values:
"Barr 2020" (default) - Classification based on the AYA classification by Barr et al.
"SEER v2020" - SEER 2020 Recode Revision
"SEER-WHO v2008" - SEER WHO 2008
"SEER v2006" - SEER 2006
Users can also specify the depth of the classification tree using the
depth argument. It sets the maximum depth of the classification tree,
with 1 being the highest and most general level of grouping.
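One way to picture the depth argument, assuming the groups are numbered hierarchically with dotted codes such as 9.6.1, is as a cut-off on the number of code components that are kept. The helper truncate_code() below is hypothetical and only illustrates that idea; it is not how aya_class() is implemented.

```r
# Hypothetical helper, not part of cancerR: keep the first `depth`
# components of a dotted hierarchical code such as "9.6.1".
truncate_code <- function(code, depth) {
  parts <- strsplit(code, ".", fixed = TRUE)
  vapply(parts, function(p) {
    paste(head(p, depth), collapse = ".")
  }, character(1))
}

truncate_code("9.6.1", depth = 1)
#> [1] "9"
truncate_code("9.6.1", depth = 2)
#> [1] "9.6"
```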
# Classify AYA cancers using Barr 2020 classification
# Classify at level 1 (most general)
data$dx_lvl_1 <- aya_class(
  data$icd_o3_histology,
  data$icd_o3_site,
  data$icd_o3_behaviour,
  depth = 1
)
# Add more granular classifications
data$dx_lvl_2 <- aya_class(
  histology = data$icd_o3_histology,
  site = data$site_conv,
  behaviour = data$icd_o3_behaviour,
  depth = 2
)
# Add even more granular classifications (level 3) using SEER 2020 revision classification
data$dx_lvl_3 <- aya_class(
  histology = data$icd_o3_histology,
  site = site_convert(data$icd_o3_site), # Convert site codes using site_convert()
  behaviour = data$icd_o3_behaviour,
  method = "SEER v2020",
  depth = 3
)
# View created columns
print(data[, c("dx_lvl_1", "dx_lvl_2", "dx_lvl_3")])
#> dx_lvl_1
#> 1 9. Carcinomas
#> 2 3. CNS and other intracranial and intraspinal neoplasms
#> 3 7. Gonadal and related tumors
#> dx_lvl_2
#> 1 9.6 Carcinoma of breast
#> 2 3.3 Neuroblastomas/ganglioneuromas
#> 3 7.1 Testis
#> dx_lvl_3
#> 1 9.6.1 Breast - infiltrating duct
#> 2 3.3.2 Neuroblastoma/ganglioneuroblastoma - invasive
#> 3 7.1.1 Germ cell and trophoblastic
Similarly, the kid_class() function can be used to classify childhood
cancers.
The classification method is chosen with the method argument, which
accepts one of the following values:
"iccc3" (default) - Classification based on the International Classification of Childhood Cancer, 3rd ed. (ICCC-3)
"who-iccc3" - ICCC-3 Recode ICD-O-3/WHO 2008
"iarc2017" - ICCC-3 / IARC2017
# Make example data
data_kid <- data.frame(
  histology = c("8522", "9490", "9070"),
  site = c("C50.1", "C701", "620"),
  behaviour = c("3", "3", "3")
)
# Classify childhood cancers using ICCC-3 classification
data_kid$dx_lvl_1 <- kid_class(data_kid$histology, data_kid$site, depth = 1) # ICCC-3
data_kid$dx_lvl_1.seer <- kid_class(data_kid$histology, data_kid$site, method = "who-iccc3", depth = 1) # WHO-SEER recode
# Add SEER grouping column
data_kid$seer_grp <- kid_class(data_kid$histology, data_kid$site, depth = 99)
# View results
head(data_kid)
#> histology site behaviour
#> 1 8522 C50.1 3
#> 2 9490 C701 3
#> 3 9070 620 3
#> dx_lvl_1
#> 1 XI. Other malignant epithelial neoplasms and malignant melanomas
#> 2 IV. Neuroblastoma and other peripheral nervous cell tumors
#> 3 X. Germ cell tumors, trophoblastic tumors, and neoplasms of gonads
#> dx_lvl_1.seer seer_grp
#> 1 XI. Other malignant epithelial neoplasms and malignant melanomas 102
#> 2 IV. Neuroblastoma and other peripheral nervous cell tumors 33
#> 3 X. Germ cell tumors, trophoblastic tumors, and neoplasms of gonads 85
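For downstream work it can be convenient to separate the Roman-numeral group from the group name in these labels. The following base R sketch does that for the three level 1 labels printed above. It assumes the labels always follow the pattern "numeral, full stop, name", and the variable names are illustrative only.

```r
# Labels copied from the dx_lvl_1 output above
labels <- c(
  "XI. Other malignant epithelial neoplasms and malignant melanomas",
  "IV. Neuroblastoma and other peripheral nervous cell tumors",
  "X. Germ cell tumors, trophoblastic tumors, and neoplasms of gonads"
)

# Split "<numeral>. <name>" into its two parts
group_numeral <- sub("^([IVX]+)\\..*$", "\\1", labels)
group_name <- sub("^[IVX]+\\.\\s*", "", labels)

# utils::as.roman() converts the numerals to integers, e.g. for sorting
group_number <- as.integer(utils::as.roman(group_numeral))

data.frame(group_number, group_numeral, group_name)
```

Sorting on group_number rather than on the label itself avoids the alphabetical ordering problem where "IV" would sort before "X" but "XI" would sort before "XII" only by accident.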