| Title: | Homologous Recombination Detection in Family Pedigrees |
|---|---|
| Description: | This package implements a pedigree-based algorithm to detect homologous recombination. Additional functions are supplied to detect runs of homozygosity and to phase haplotypes at informative SNPs. |
| Authors: | Catherine Mahoney [aut, cre] (ORCID: <https://orcid.org/0000-0003-0424-104X>), Michael Salter-Townshend [aut] (ORCID: <https://orcid.org/0000-0001-6232-9109>), Denis Shields [aut] (ORCID: <https://orcid.org/0000-0003-4015-2474>), Irish Research Council [fnd] (Grant EPSPG/2019/443) |
| Maintainer: | Catherine Mahoney <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.99.1 |
| Built: | 2026-07-02 21:05:13 UTC |
| Source: | https://github.com/BiocStaging/inferRecom |
Pre-computed maternal and paternal crossover events from the simCEU dataset. Used for testing and demonstrating haplotype phasing functionality.
GRanges objects stored as RDS files with metadata columns:
Child identifier
Family identifier
SNP name at crossover interval start
SNP name at crossover interval end
Genetic position at interval start (cM)
Genetic position at interval end (cM)
Two crossover datasets are provided:
xoMat.rds - Maternal crossover events
xoPat.rds - Paternal crossover events
These objects contain detected crossover events from 3-child families in the simCEU dataset. Crossovers were identified using:
# Maternal crossovers
xoMat <- xoDetect(
    plinkFile = "simCEU",
    mapFile = "female_chr4.txt",
    familySize = 3,
    parent = "mother",
    snpFilter = 5,
    cmFilter = 1
)

# Paternal crossovers
xoPat <- xoDetect(
    plinkFile = "simCEU",
    mapFile = "male_chr4.txt",
    familySize = 3,
    parent = "father",
    snpFilter = 5,
    cmFilter = 1
)
Access the crossover data using:
dataPath <- system.file("extdata", package = "inferRecom")
xoMat <- readRDS(file.path(dataPath, "xoMat.rds"))
xoPat <- readRDS(file.path(dataPath, "xoPat.rds"))
Sex-specific and sex-averaged genetic maps for chromosome 4.
Tab-delimited text file with columns:
Chromosome identifier (chr4)
Physical position in base pairs (hg19/GRCh37)
Recombination rate (cM/Mb) at this position
Cumulative genetic distance in centiMorgans from chromosome start
Access the files using:
femaleMap <- read.delim(file.path(dataPath, "female_chr4.txt"))
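A map with these columns is typically used to assign a genetic position to a SNP that falls between two map rows. The following is a minimal sketch using linear interpolation in base R. The column names `pos` and `cM` and the toy values are assumptions made for illustration; they are not the actual headers of the bundled files, and the package may perform this lookup differently.

```r
# Toy genetic map: physical position (bp) and cumulative genetic distance (cM).
# Column names are hypothetical, chosen only for this sketch.
geneticMap <- data.frame(
    pos = c(1e6, 2e6, 3e6, 4e6),
    cM  = c(0, 1.5, 2.0, 4.0)
)

# Linearly interpolate the genetic position at arbitrary physical positions.
# rule = 2 clamps positions outside the map to the nearest map endpoint.
interpolateCm <- function(map, pos) {
    approx(x = map$pos, y = map$cM, xout = pos, rule = 2)$y
}

snpCm <- interpolateCm(geneticMap, c(1.5e6, 3.5e6))
snpCm
```

Linear interpolation assumes a constant recombination rate between adjacent map rows, which is usually acceptable when the map is dense relative to the SNP spacing.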
Derived from the European sex-specific maps of Bhérer et al. (2017).
Bhérer, C., Campbell, C. L., & Auton, A. (2017). Refined genetic maps reveal sexual dimorphism in human meiotic recombination at multiple scales. Nature Communications, 8(1), 14994.
This function identifies runs of homozygosity (ROH) in PLINK genotype data based on minimum physical length and SNP count thresholds. It can optionally filter for specific samples, apply a minimum genetic distance criterion, and separate results by case/control status.
hzRun(
    plinkFile,
    mapFile = NULL,
    minMb = 1,
    minSnps = 5,
    minCm = NULL,
    rsOnly = FALSE,
    sampleIds = NULL,
    caseControl = FALSE,
    BPPARAM = SerialParam()
)
| Argument | Description |
|---|---|
| plinkFile | Character string; path to the PLINK prefix (without extensions). |
| mapFile | Character string or NULL; path to a tab-delimited genetic map file containing columns: |
| minMb | Numeric; minimum physical length of ROH in megabases. Default is 1. |
| minSnps | Integer; minimum number of consecutive homozygous SNPs required for an ROH. Default is 5. |
| minCm | Numeric or NULL; minimum genetic length of ROH in centiMorgans. If NULL, no genetic length filtering is applied. Requires mapFile to be specified. Default is NULL. |
| rsOnly | Logical; whether to restrict analysis to SNPs with names beginning with "rs". Default is FALSE. |
| sampleIds | Character vector or NULL; specific sample IDs to analyze. If NULL, all samples are analyzed. Default is NULL. |
| caseControl | Logical; if TRUE, function returns a GRanges of case and control ROH. Requires affected status in FAM file. Default is FALSE. |
| BPPARAM | A BiocParallelParam object specifying parallel execution. Default is SerialParam(). |
Runs of homozygosity (ROH) are continuous genomic segments where an individual is homozygous at all marker positions. ROH can indicate autozygosity (inheritance of identical haplotypes from a common ancestor) and are used to estimate inbreeding, identify disease-associated loci, and study population history.
This function identifies ROH by:
Converting genotypes to binary (1 = homozygous, 0 = heterozygous/missing)
Identifying runs using run-length encoding
Filtering runs by minimum SNP count and physical length
Optionally filtering by minimum genetic length (cM)
Genotypes from snpStats are coded as:
0 = homozygous reference (AA) - counted as homozygous
1 = heterozygous (AB) - breaks ROH
2 = homozygous alternate (BB) - counted as homozygous
NA = missing - breaks ROH
Filtering strategy:
Physical length (minMb): Always applied
SNP count (minSnps): Always applied
Genetic length (minCm): Applied only if mapFile is provided and minCm is not NULL
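The steps and filters above can be illustrated with a minimal base R sketch for a single individual. The function name `findRoh`, its arguments, and the toy data are hypothetical; this shows the run-length encoding idea under the genotype coding described above and is not the package's internal implementation.

```r
# Sketch: detect ROH in one individual's genotype vector.
# genotypes: 0/1/2 coding with NA for missing; positions: base-pair coordinates.
findRoh <- function(genotypes, positions, minSnps = 5, minMb = 1) {
    # 0 and 2 are homozygous; 1 (heterozygous) and NA (missing) break a run
    isHom <- !is.na(genotypes) & genotypes != 1

    # Run-length encode the homozygosity indicator
    runs <- rle(isHom)
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1

    # Keep homozygous runs that meet the SNP count threshold
    keep <- runs$values & runs$lengths >= minSnps
    roh <- data.frame(
        startIdx = starts[keep],
        endIdx = ends[keep],
        numSnps = runs$lengths[keep]
    )

    # Apply the physical length threshold (megabases)
    roh$lengthMb <- (positions[roh$endIdx] - positions[roh$startIdx]) / 1e6
    roh[roh$lengthMb >= minMb, , drop = FALSE]
}

# Toy data: 14 SNPs spaced 300 kb apart
genotypes <- c(0, 2, 2, 0, 1, 2, 2, NA, 0, 0, 0, 2, 2, 0)
positions <- seq(1e5, by = 3e5, length.out = length(genotypes))

roh <- findRoh(genotypes, positions, minSnps = 4, minMb = 1)
roh
```

In this toy example the first run (SNPs 1 to 4) passes the SNP count filter but spans only 0.9 Mb, so only the run covering SNPs 9 to 14 is retained.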
Parallel processing is performed by individual, improving efficiency for datasets with many samples.
A GRanges object containing detected ROH with metadata
columns: sampleId, startSnp, finishSnp, numSnps.
If genetic map is provided, also includes startCm, finishCm.
If caseControl = TRUE, returns a GRanges with
two elements: case and control. Returns an empty
GRanges if no ROH are detected.
# Serial execution (default)
# Get path to example data
dataPath <- system.file("extdata", package = "inferRecom")
plinkFile <- file.path(dataPath, "simCEU")

# Detect ROH
rohData <- hzRun(
    plinkFile = plinkFile,
    minMb = 1,
    minSnps = 50
)

# View results
head(rohData)

# Analyze specific individuals
sampleIds <- c("F3C1", "F4C3", "F5P2")
rohSubset <- hzRun(
    plinkFile = plinkFile,
    minMb = 0.5,
    minSnps = 50,
    sampleIds = sampleIds
)
rohSubset

# Separate by case/control status
rohCc <- hzRun(
    plinkFile = plinkFile,
    minMb = 0.5,
    minSnps = 50,
    caseControl = TRUE
)
rohCases <- rohCc$case
rohControls <- rohCc$control
rohCases
rohControls
A simulated CEU (Utah residents with Northern and Western European ancestry) dataset for demonstrating crossover detection and ROH analysis.
PLINK binary format files (.bed, .bim, .fam) containing:
Simulated family trios and larger pedigrees with 2-3 children
Chromosome 4 markers
Simulated genotype data with realistic LD structure
Mix of 2-child and 3-child families for testing
The dataset includes three PLINK binary files:
simCEU.bed - Binary genotype data
simCEU.bim - Variant information (SNP IDs, positions, alleles)
simCEU.fam - Sample information (family IDs, relationships)
Access the files using:
dataPath <- system.file("extdata", package = "inferRecom")
plinkFile <- file.path(dataPath, "simCEU")
Simulated data based on CEU population structure from the 1000 Genomes
Project and the R package sim1000G.
Siva, N. (2008). 1000 Genomes project.
Dimitromanolakis, A., Xu, J., Krol, A., & Briollais, L. (2019). sim1000G: a user-friendly genetic variant simulator in R for unrelated individuals and family-based designs. BMC Bioinformatics, 20(1), 26.
This function identifies recombination events in family-based genotype data, using PLINK-formatted input files and a genetic map. It supports 2- or 3-child family structures, and can optionally write results to disk if an output path is provided.
xoDetect(
    plinkFile,
    mapFile,
    familySize = c(2, 3),
    parent = c("mother", "father"),
    rsOnly = FALSE,
    snpFilter = 5,
    cmFilter = 1,
    caseControl = FALSE,
    out = NULL,
    BPPARAM = SerialParam()
)
| Argument | Description |
|---|---|
| plinkFile | Character string; path to the PLINK prefix (without extensions). |
| mapFile | Character string; path to a tab-delimited map file containing columns: |
| familySize | Integer; either 2 or 3, specifying the family pedigree size. |
| parent | Character; either "mother" or "father". |
| rsOnly | Logical; whether to restrict to SNPs with names beginning with "rs". Default is FALSE. |
| snpFilter | Integer; minimum number of SNPs separating putative crossovers. Default is 5. |
| cmFilter | Numeric; minimum genetic distance (cM) separating putative crossovers. Default is 1. |
| caseControl | Logical; if TRUE, function returns a GRangesList of case and control crossovers for 3+ child families. Default is FALSE. |
| out | Character or NULL; if provided, output is written as a CSV at this path. |
| BPPARAM | A BiocParallelParam object specifying parallel execution. Default is SerialParam(). |
This function reads PLINK genotype data (via snpStats) and infers loci where meiotic recombination has occurred, resolved to the nearest informative SNPs. The algorithm identifies informative markers, where one parent is heterozygous and the other is homozygous, and filters them for Mendelian consistency before detecting the inheritance state changes that indicate crossovers.
For 3-child families, children are analyzed in triples, which allows the specific recombinant child to be identified. For 2-child families, state changes between the two siblings are used to infer crossover locations, but the recombinant child cannot be determined.
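The reasoning for 3-child families can be sketched in base R. At SNPs informative for one parent, two siblings that inherited the same parental haplotype carry the same transmitted allele, and siblings that inherited different haplotypes carry different alleles. A crossover in one child therefore flips its sharing state with both siblings while the third pair is unchanged. The function name `findRecombinantChild`, the input layout, and the toy data below are assumptions for illustration; the sketch ignores missing data and the snpFilter and cmFilter thresholds, and is not the package's implementation.

```r
# Sketch: locate crossovers and the recombinant child in a sibling trio.
# alleles: matrix (informative SNPs x 3 children) of transmitted allele codes
# (1 or 2) from the parent of interest, ordered along the chromosome.
findRecombinantChild <- function(alleles) {
    pairs <- combn(ncol(alleles), 2)  # columns: (1,2), (1,3), (2,3)

    # For each sibling pair, TRUE where both carry the same allele
    same <- apply(pairs, 2, function(p) alleles[, p[1]] == alleles[, p[2]])

    # A flip is a change of sharing state between adjacent informative SNPs
    flips <- same[-1, , drop = FALSE] != same[-nrow(same), , drop = FALSE]

    # A crossover in one child flips exactly two of the three pairs
    idx <- which(rowSums(flips) == 2)
    child <- vapply(idx, function(i) {
        flipped <- which(flips[i, ])
        shared <- intersect(pairs[, flipped[1]], pairs[, flipped[2]])
        colnames(alleles)[shared]
    }, character(1))

    data.frame(
        childId = child,
        startSnp = rownames(alleles)[idx],
        finishSnp = rownames(alleles)[idx + 1],
        stringsAsFactors = FALSE
    )
}

# Toy data: child C switches parental haplotype between snp4 and snp5
h1 <- c(1, 2, 1, 1, 2, 1, 2, 2)
h2 <- 3 - h1
alleles <- cbind(A = h1, B = h2, C = c(h1[1:4], h2[5:8]))
rownames(alleles) <- paste0("snp", 1:8)

xo <- findRecombinantChild(alleles)
xo
```

With only two children there is a single sibling pair, so a flip still localizes the crossover interval but cannot say which of the two siblings is recombinant.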
If out is provided, the resulting table is written to disk as
UTF-8 encoded CSV without row names.
Parallel processing is performed by family, improving efficiency for datasets with many families.
A GRanges listing detected
crossover intervals with metadata columns: childId (in 3-child
families), familyId, startSnp, finishSnp,
startPos, finishPos, startCm, finishCm.
# Serial execution (default)
# Load example data
dataPath <- system.file("extdata", package = "inferRecom")
plinkFile <- file.path(dataPath, "simCEU")

# Basic maternal crossover detection
mapFemale <- file.path(dataPath, "female_chr4.txt")
xoMat3 <- xoDetect(
    plinkFile = plinkFile,
    mapFile = mapFemale,
    familySize = 3,
    parent = "mother"
)

# View results
xoMat3

# Basic 2-child family
xoMat2 <- xoDetect(
    plinkFile = plinkFile,
    mapFile = mapFemale,
    familySize = 2,
    parent = "mother"
)

# View results
xoMat2

# Maternal crossover detection with case/control separation
mapFemale <- file.path(dataPath, "female_chr4.txt")
xoMat3CC <- xoDetect(
    plinkFile = plinkFile,
    mapFile = mapFemale,
    familySize = 3,
    parent = "mother",
    caseControl = TRUE
)

# View results
xoMat3CC$case
xoMat3CC$control
This function phases parental and child haplotypes based on detected crossover events from maternal and paternal meioses.
xoPhase(
    plinkFile,
    xoDetectPaternal,
    xoDetectMaternal,
    famIds = NULL,
    rsOnly = TRUE,
    outputFormat = c("list", "summarizedExperiment", "vcf"),
    vcfOutput = "phased",
    BPPARAM = SerialParam()
)
| Argument | Description |
|---|---|
| plinkFile | Character string; path to the PLINK prefix (without extensions). |
| xoDetectPaternal | GRanges; output from xoDetect() run with parent = "father". |
| xoDetectMaternal | GRanges; output from xoDetect() run with parent = "mother". |
| famIds | Character vector or NULL; specific family IDs to phase. If NULL, all families with 5+ members are analyzed. Default is NULL. |
| rsOnly | Logical; whether to restrict to SNPs with names beginning with "rs". Default is TRUE. |
| outputFormat | Character string; format for output. Options are "list" (default, returns named list of DataFrames), "summarizedExperiment" (returns SummarizedExperiment object), or "vcf" (writes VCF files). |
| vcfOutput | Character string; path prefix for VCF output files. Only used when outputFormat = "vcf". Each family will be written to a separate VCF file with suffix "_ |
| BPPARAM | A BiocParallelParam object specifying parallel execution. Default is SerialParam(). |
This function performs haplotype phasing using family-based genetic data and detected crossover events. The phasing process:
Identifies informative SNPs where parents are heterozygous/homozygous
Assigns alleles (1 or 2) to children based on inheritance
Converts allele codes to nucleotides (A, C, G, T)
Imputes homozygous sites from the other parent
Phases parental haplotypes using crossover breakpoints
Adds back non-informative SNPs with inferred genotypes where possible
Uses neighboring informative SNPs to infer phase for ambiguous cases
Outputs diploid genotypes for all family members across all SNPs
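The allele assignment step can be illustrated with a small base R sketch for one SNP where the father is homozygous and the mother is heterozygous. Because the father can only transmit one allele, the child's remaining allele must be maternal. The function name `assignParentalAlleles` and the two-character genotype strings are assumptions for illustration and do not reflect the package's internal data structures.

```r
# Sketch: split a child's genotype into paternal and maternal alleles at a SNP
# where the father is homozygous. Genotypes are two-character strings ("AG").
assignParentalAlleles <- function(fatherGt, childGt) {
    fatherAlleles <- strsplit(fatherGt, "")[[1]]
    childAlleles <- strsplit(childGt, "")[[1]]

    # This sketch only handles a homozygous father
    stopifnot(fatherAlleles[1] == fatherAlleles[2])
    paternal <- fatherAlleles[1]

    # Mendelian consistency: the child must carry the father's allele
    stopifnot(paternal %in% childAlleles)

    # Whichever child allele is not accounted for by the father is maternal
    maternalIdx <- if (childAlleles[1] == paternal) 2 else 1
    c(paternal = paternal, maternal = childAlleles[maternalIdx])
}

phasedAG <- assignParentalAlleles("AA", "AG")
phasedAA <- assignParentalAlleles("AA", "AA")
phasedAG
phasedAA
```

Applying the same logic with the parental roles swapped gives the paternal allele at SNPs where the mother is homozygous and the father is heterozygous.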
Non-informative SNPs are handled as follows:
Both parents homozygous (same allele): Genotypes confidently inferred
Both parents heterozygous: Phase uncertain, set to NA
One parent heterozygous, one homozygous: Phase inferred from neighboring informative SNPs using haplotype continuity
The function requires crossover detection results from both parents as
GRanges objects. Run xoDetect() separately for maternal and paternal
crossovers before using this function.
Parallel processing is performed by family, improving efficiency for datasets with many families.
Output format: Each phased family contains columns:
rsID - SNP identifier
location - Physical position in base pairs
For each child: <childID>Pat, <childID>Mat - paternal
and maternal haplotypes
For parents: <parentID>_1, <parentID>_2 - two
haplotypes
Depends on outputFormat:
"list": A named list of DataFrame objects, one per family. Each DataFrame contains phased haplotypes with columns for SNP ID, physical position, and phased alleles for each family member.
"summarizedExperiment": A SummarizedExperiment object containing phased genotypes across all families with rowData (SNP information) and colData (sample information).
"vcf": Writes VCF files and returns paths to created files.
Returns a message if no phase information is available for a family.
# Load example data
dataPath <- system.file("extdata", package = "inferRecom")
plinkFile <- file.path(dataPath, "simCEU")
mapFemale <- file.path(dataPath, "female_chr4.txt")
mapMale <- file.path(dataPath, "male_chr4.txt")

# Detect crossovers for both parents (returns GRanges objects)
xoPat <- xoDetect(
    plinkFile = plinkFile,
    mapFile = mapMale,
    familySize = 3,
    parent = "father"
)
xoMat <- xoDetect(
    plinkFile = plinkFile,
    mapFile = mapFemale,
    familySize = 3,
    parent = "mother"
)

# Phase haplotypes (serial execution, default)
phased <- xoPhase(
    plinkFile = plinkFile,
    xoDetectPaternal = xoPat,
    xoDetectMaternal = xoMat
)

# Access phased haplotypes for a specific family
family1Phase <- phased[[1]]
family1Phase

# Phase haplotypes for subset of families
phasedSubset <- xoPhase(
    plinkFile = plinkFile,
    xoDetectPaternal = xoPat,
    xoDetectMaternal = xoMat,
    famIds = c("F4", "F6")
)
phasedSubset