This package provides tools for analyzing meiotic recombination, runs of homozygosity, and long-distance haplotype phase in family-based genetic data.
inferRecom enables rapid, accurate detection of meiotic
recombination in nuclear families. The following vignette detects
recombination in a simulated set of pedigrees of varying sizes based on
the CEU cohort from the 1000 Genomes Project using the package
sim1000G and formatted in PLINK binary format. The
sex-specific recombination rate maps in Europeans used in this example
were generated by Bhérer, et al., and are available via Github.
This vignette also showcases several auxiliary functions that facilitate visualization, analysis of regions of homozygosity, and chromosome-scale phasing of informative markers.
The first section demonstrates the core function xoDetect(), which localizes and individualizes recombination in families with both parents and three or more children, and localizes recombination in families with both parents and exactly two children. It filters artifacts arising from genotyping error and other experimental error using the default SNP-based criterion of five consecutive informative SNPs to confirm a recombination event, as well as a map-based criterion of one centiMorgan to confirm a recombination event.
# Detect recombinations in families with two parents and three or more children
# and families with two parents and exactly two children
# Filter for recombination with SNP-based and cM-based filters at default values
# Option to select for only SNPs that have an rs number is disabled by default
exampleDir <- system.file("extdata", package = "inferRecom")
simCEU <- file.path(exampleDir, "simCEU")
mapFemale <- file.path(exampleDir, "female_chr4.txt")
mapMale <- file.path(exampleDir, "male_chr4.txt")
pat3 <- xoDetect(simCEU, mapMale, familySize = 3, parent = "father", caseControl = FALSE)
mat3 <- xoDetect(simCEU, mapFemale, familySize = 3, parent = "mother", caseControl = FALSE)
pat2 <- xoDetect(simCEU, mapMale, familySize = 2, parent = "father")
mat2 <- xoDetect(simCEU, mapFemale, familySize = 2, parent = "mother")
pat3
#> GRanges object with 3 ranges and 6 metadata columns:
#> seqnames ranges strand | childId familyId startSnp
#> <Rle> <IRanges> <Rle> | <character> <character> <character>
#> [1] chr4 78432058-78432285 * | F6C2 F6 rs10026702
#> [2] chr4 78393878-78424195 * | F8C3 F8 rs4450960
#> [3] chr4 78084453-78100731 * | F8C5 F8 rs17327923
#> finishSnp startCm finishCm
#> <character> <numeric> <numeric>
#> [1] rs10026918 57.3462 57.3462
#> [2] rs17400095 57.3424 57.3462
#> [3] rs2645687 57.3172 57.3178
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengthsThe output of the functions is a GRanges object or list showing the individual/family IDs, SNP IDs, locations, and recombination rate map centiMorgan distances for each recombination. Given that families with two children do not have sufficient information to infer the recombinant child, the ‘CHILD’ column is omitted in the output. If the caseControl option is specified, a list of GRanges separated by case/control status is returned. The results with the default filtering parameters are shown below.
Filtering for closely-spaced events can remove genotyping error and other artefacts in the data that are not consistent with true homologous recombination. The default value for the SNP filter is 5, and for the genetic-distance filter is 1cM. Setting the genetic distance-based filter to 0 removes the genetic distance condition altogether. For comparison, the output below shows paternal recombinations in families with three or more children without the genetic distance filter.
pat3_0cM <- xoDetect(simCEU,
mapMale,
familySize = 3,
parent = "father",
caseControl = FALSE,
cmFilter = 0
)
pat3_0cM
#> GRanges object with 5 ranges and 6 metadata columns:
#> seqnames ranges strand | childId familyId startSnp
#> <Rle> <IRanges> <Rle> | <character> <character> <character>
#> [1] chr4 78059331-78077129 * | F6C1 F6 rs12502028
#> [2] chr4 78828799-78833246 * | F6C1 F6 rs10518181
#> [3] chr4 78432058-78432285 * | F6C2 F6 rs10026702
#> [4] chr4 78393878-78424195 * | F8C3 F8 rs4450960
#> [5] chr4 78084453-78100731 * | F8C5 F8 rs17327923
#> finishSnp startCm finishCm
#> <character> <numeric> <numeric>
#> [1] rs1874564 57.2841 57.3170
#> [2] rs6836305 57.6484 57.6484
#> [3] rs10026918 57.3462 57.3462
#> [4] rs17400095 57.3424 57.3462
#> [5] rs2645687 57.3172 57.3178
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengthsRemoving the genetic distance filter results in the detection of two recombination intervals in child 1 in pedigree 6 that are separated by only 0.33cM (~750kb), which is unlikely to represent a true meiotic recombination due to interference in crossing over.
GenomicRanges output from
xoDetect() using trackViewer and
rtracklayerIt is straightforward to visualize recombination intervals using the
Bioconductor packages trackViewer and
rtracklayer. Here, case and control intervals are plotted
for basic comparison using trackViewer.
mat3CC <- xoDetect(simCEU,
mapFemale,
familySize = 3,
parent = "mother",
caseControl = TRUE
)
# Add feature column for compatibility with "gene" type in trackViewer
mat3CC$case$feature <- "interval"
mat3CC$control$feature <- "interval"
caseTrack <- new(
"track",
dat = mat3CC$case,
type = "gene",
format = "BED"
)
setTrackStyleParam(caseTrack, "color", "black")
controlTrack <- new(
"track",
dat = mat3CC$case,
type = "gene",
format = "BED"
)
setTrackStyleParam(controlTrack, "color", "blue")
tl <- trackList(caseTrack, controlTrack) #
viewRegion <- GRanges("chr4", IRanges(78000000, 79500000))
viewTracks(tl, gr = viewRegion, autoOptimizeStyle = TRUE)While trackViewer provides convenient functionality for
visualization entirely in an R environment, some intervals are not
visible at this resolution. This and many other applications benefit
from more interactive visualization directly in the UCSC Genome Browser.
Example tracks as exported in the code below are shown in the browser
alongside recombination maps and nearby genes.
export(mat3CC$case, "~/path/to/caseOutput.bed", format = "BED")
export(mat3CC$control, "~/path/to/controlOutput.bed", format = "BED")One of the factors that affects the density of informative SNPs and
therefore the resolution of recombination intervals is the degree of
homozygosity in the sample. The function hzRun() is a tool
to detect runs of homozygosity of a user-specified minimum length in Mb
(default 1Mb). To account for variation in SNP density, it also includes
a criterion for the minimum number of SNPs per run, with a default value
of 5. Users can also provide a genetic map and specify a minimum
centiMorgan length for runs. This example applies a sex-averaged map to
all samples.
As the output is a GenomicRanges object or list, as in
xoDetect(), the procedure to produce genome browser
visualizations is identical to that of the recombination visualizations
corresponding to xoDetect().
sexavg_chr4 <- file.path(exampleDir, "sexavg_chr4.txt")
hzRun(simCEU, sexavg_chr4)
#> GRanges object with 1 range and 6 metadata columns:
#> seqnames ranges strand | sampleId startSnp finishSnp
#> <Rle> <IRanges> <Rle> | <character> <character> <character>
#> [1] 4 78002267-79180440 * | F4C3 rs17261150 rs7340831
#> numSnps startCm finishCm
#> <integer> <numeric> <numeric>
#> [1] 1028 82.7108 83.7803
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengthsInferring recombination implies identity by descent in between the
informative SNPs, allowing for the phasing of this subset of SNPs in the
pedigrees with three or more offspring. Many population-based phasing
programs are highly accurate over short distances, but phase accuracy
deteriorates rapidly at distances greater than a few centiMorgans. The
function xoPhase() implements a novel phasing algorithm to
produce long-range haplotype phase information at informative SNPs.
Using the information in the nuclear family structures, it is possible
to obtain chromosome-scale haplotypes at the informative subset of SNPs.
In these data, 36% of total SNPs are informative. Homozygous SNPs are
also reported, and non-informative SNPs are shown as NA.
The function xoPhase() uses the sets of maternal and
paternal recombinations to infer haplotype phase at all informative SNPs
for all members of each the nuclear families with three or more
children. There is also an option to specify a subset of family IDs for
phasing.
Several options are available for output, including a list of DataFrames, a SummarizedExperiment object, or VCF. The paternal and maternal chromosomes transmitted in each child are labeled with the suffix ‘Pat’ and ‘Mat’ respectively, while each chromosome in the parents is labeled with a 1 or 2.
phased <- xoPhase(simCEU,
pat3,
mat3,
output = "list"
)
phased$F6[1158:1162, ]
#> DataFrame with 5 rows and 12 columns
#> rsID location F6C1Pat F6C1Mat F6C2Pat F6C2Mat
#> <character> <numeric> <character> <character> <character> <character>
#> 1 rs17002978 79292724 C C C C
#> 2 rs10032397 79294804 A A A A
#> 3 rs17420118 79294988 C C C C
#> 4 rs6838661 79296422 G G G G
#> 5 rs17002988 79298781 T T T T
#> F6C3Pat F6C3Mat F6P1_1 F6P1_2 F6P2_1 F6P2_2
#> <character> <character> <character> <character> <character> <character>
#> 1 C C C C C C
#> 2 A A A A A A
#> 3 C C C C C C
#> 4 C G G C G G
#> 5 T T T T T T