| Title: | Analyze Recombination and Crossover Interference with Sequence-Based Genotypes |
|---|---|
| Description: | Analyze recombination and crossover interference from genotyping-by-sequencing (GBS) data on a backcross. |
| Authors: | Karl W Broman [aut, cre] |
| Maintainer: | Karl W Broman <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.8 |
| Built: | 2026-06-01 21:00:07 UTC |
| Source: | https://github.com/kbroman/xoiGBS |
Calculate genotype probabilities with a hidden Markov model (HMM) from high-throughput sequencing data, using allele counts at SNPs
calc_genoprob_gbs(counts, map, error_prob1 = 0.002, error_prob2 = 0.002, map_function = c("haldane", "kosambi", "c-f", "morgan"), cores = 1)
| Argument | Description |
|---|---|
| counts | Three-dimensional array of read counts, positions x individuals x two alleles, with alleles A and B, where the expected genotypes are AA and AB |
| map | Vector of marker positions, same length as |
| error_prob1 | Error probability for sequencing errors |
| error_prob2 | Error probability for locus errors |
| map_function | Map function for converting cM distances to recombination fractions |
| cores | Number of CPU cores to use for multi-core calculations |

Value: Matrix of genotype probabilities, positions x individuals, giving Pr(het), the probability of the heterozygous genotype AB
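The `map_function` argument chooses how cM distances between markers become recombination fractions, which an HMM of this kind presumably uses for its transition probabilities. The sketch below shows the standard formulas for the four named map functions. The helper `map_to_rf()` is hypothetical and not part of xoiGBS; the Carter-Falconer map has no closed-form inverse, so here it is solved numerically with `uniroot()`.

```r
# Convert genetic distances (cM) to recombination fractions under a chosen
# map function. Hypothetical helper for illustration, not the xoiGBS code.
map_to_rf <- function(d, map_function = c("haldane", "kosambi", "c-f", "morgan")) {
  map_function <- match.arg(map_function)
  M <- d / 100  # cM -> Morgans

  switch(map_function,
    haldane = 0.5 * (1 - exp(-2 * M)),  # no crossover interference
    kosambi = 0.5 * tanh(2 * M),        # moderate positive interference
    morgan  = pmin(M, 0.5),             # complete interference
    "c-f"   = vapply(M, function(m) {   # Carter-Falconer, solved numerically
      if (m <= 0) return(0)
      # Carter-Falconer: M = (atanh(2r) + atan(2r)) / 4; find r in [0, 0.5).
      # The bracket works for distances up to roughly 360 cM.
      uniroot(function(r) (atanh(2 * r) + atan(2 * r)) / 4 - m,
              interval = c(0, 0.5 - 1e-12), tol = 1e-12)$root
    }, numeric(1))
  )
}

d <- c(1, 10, 50)  # example distances in cM
sapply(c("haldane", "kosambi", "c-f", "morgan"), function(f) map_to_rf(d, f))
```

At small distances all four map functions give nearly the same recombination fraction; they diverge as distance grows, reflecting their different assumptions about interference.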
Grab the locations of all double crossovers, as a two-column matrix, for cases with exactly two crossovers
grab2XO(xoloc)
| Argument | Description |
|---|---|
| xoloc | List of matrices of crossover locations, as output from |

Value: Matrix with the locations of each pair of crossovers, for cases with exactly two crossovers
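As a rough illustration of what is being collected, the sketch below takes a list of crossover matrices (one per individual), keeps only the individuals with exactly two crossovers, and returns one row per such individual. It assumes the estimated crossover location is in the first column of each matrix. `grab2XO_sketch()` is a hypothetical name, not the package's code.

```r
# Hypothetical sketch: collect crossover pairs for individuals with exactly two
# crossovers. Assumes column 1 of each matrix holds the estimated location.
grab2XO_sketch <- function(xoloc) {
  n_xo <- vapply(xoloc, nrow, integer(1))
  two <- xoloc[n_xo == 2]
  if (length(two) == 0) return(matrix(numeric(0), ncol = 2))
  # one row per individual: its two estimated crossover locations
  unname(do.call(rbind, lapply(two, function(m) m[, 1])))
}

# Three individuals: one crossover, none, and two
# (columns: estimate, left endpoint, right endpoint)
xoloc <- list(
  matrix(c(10, 9, 11), nrow = 1),
  matrix(numeric(0), ncol = 3),
  matrix(c(20, 50, 19, 49, 21, 51), nrow = 2)
)
grab2XO_sketch(xoloc)
```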
Grab all crossover locations, as a vector
grabXO(xoloc)
| Argument | Description |
|---|---|
| xoloc | List of matrices of crossover locations, as output from |

Value: Vector of crossover locations
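Pooling the estimated locations across individuals amounts to taking one column from each matrix and concatenating. The sketch assumes the estimated location is the first column; `grabXO_sketch()` is a hypothetical name.

```r
# Hypothetical sketch: pool all estimated crossover locations into one vector.
# Assumes column 1 of each matrix holds the estimated location.
grabXO_sketch <- function(xoloc) {
  # a matrix with no rows contributes numeric(0), so it simply drops out
  unlist(lapply(xoloc, function(m) m[, 1]), use.names = FALSE)
}

xoloc <- list(
  matrix(c(10, 9, 11), nrow = 1),              # one crossover
  matrix(numeric(0), ncol = 3),                # no crossovers
  matrix(c(20, 50, 19, 49, 21, 51), nrow = 2)  # two crossovers
)
grabXO_sketch(xoloc)  # returns 10 20 50
```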
Infer crossover locations from a matrix of genotype probabilities, as produced by calc_genoprob_gbs()
inferXOloc(genoprob, map, low = 0.1, high = 0.9)
| Argument | Description |
|---|---|
| genoprob | Matrix of genotype probabilities, positions x individuals, as produced by |
| map | Vector of marker positions, same length as |
| low | Lower threshold; if the probability is below this threshold, infer homozygous |
| high | Upper threshold; if the probability is above this threshold, infer heterozygous |

Value: List of length ncol(genoprob), one matrix per individual, each with columns for the estimated crossover location and the left and right endpoints of its interval. If an individual has no crossovers, its matrix has no rows.
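One simple way to turn the `low` and `high` thresholds into crossover calls is sketched below: call each marker homozygous or heterozygous, skip markers that fall between the thresholds, and place a crossover wherever consecutive called markers disagree, with its interval running between those two markers. Using the interval midpoint as the estimated location, and the column names, are assumptions of this sketch; `infer_xo_sketch()` is hypothetical and not the package's implementation.

```r
# Hypothetical sketch of threshold-based crossover inference for a backcross.
# genoprob: positions x individuals matrix of Pr(het); map: marker positions.
infer_xo_sketch <- function(genoprob, map, low = 0.1, high = 0.9) {
  lapply(seq_len(ncol(genoprob)), function(i) {
    p <- genoprob[, i]
    # 0 = homozygous, 1 = heterozygous, NA = too uncertain to call
    g <- ifelse(p < low, 0, ifelse(p > high, 1, NA))
    called <- which(!is.na(g))
    # indices (within 'called') where the genotype switches
    sw <- which(diff(g[called]) != 0)
    left <- map[called[sw]]
    right <- map[called[sw + 1]]
    # estimate = interval midpoint (an assumption of this sketch);
    # with no switches this is a 0-row, 3-column matrix
    cbind(est = (left + right) / 2, left = left, right = right)
  })
}

map <- c(0, 10, 20, 30, 40)
genoprob <- cbind(ind1 = c(0.01, 0.02, 0.50, 0.95, 0.99),
                  ind2 = rep(0.99, 5))
infer_xo_sketch(genoprob, map)
```

Here the first individual switches from homozygous to heterozygous somewhere between positions 10 and 30 (the marker at 20 is too uncertain to call), and the second has no crossover.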
Get the numeric order of individual IDs of the form blah25 or blah25-2. IDs can also carry duplicate information at the end, like blah25_dup1 or blah25_dup2.
order_ids(ids, decreasing = FALSE)
| Argument | Description |
|---|---|
| ids | Vector of individual IDs (as character strings) |
| decreasing | If TRUE, get the decreasing order (largest to smallest) |

Details: We assume the IDs have an initial non-numeric label, followed by a number, followed possibly by _dup and then another number. The IDs are sorted first by the initial non-numeric part, then by the number, then by the duplicate number (with the absence of _dup taken to be duplicate "0" and _dup without a number taken to be duplicate "1").
Value: The numeric order of the input IDs

See also: sort_ids()
Examples:
ids <- c("BCA70", "BCA1", "PWD1", "PWD2", "BCA2", "BCA75", "BCA70_dup", "PWD1_dup2", "PWD1_dup1")
order_ids(ids)
sort_ids(ids)
sort_ids(ids, decreasing = TRUE)
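The sorting rule described above can be expressed with base R's `order()` on three keys: the non-numeric prefix, the number, and the duplicate number. The sketch implements only that stated rule; it gives no special treatment to a -2 style suffix, which the rule does not describe. `order_ids_sketch()` is a hypothetical name, and the package's own parsing may differ.

```r
# Hypothetical sketch of the documented ID ordering rule:
# prefix (non-numeric), then number, then duplicate number
# (no "_dup" -> 0, "_dup" with no number -> 1).
order_ids_sketch <- function(ids, decreasing = FALSE) {
  prefix <- sub("^([^0-9]*).*$", "\\1", ids)
  num <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", ids))

  has_dup <- grepl("_dup", ids, fixed = TRUE)
  dup_str <- sub("^.*_dup", "", ids)              # text after "_dup"
  dup <- numeric(length(ids))                     # default: duplicate 0
  dup[has_dup & dup_str == ""] <- 1               # bare "_dup" -> 1
  numbered <- has_dup & dup_str != ""
  dup[numbered] <- as.numeric(dup_str[numbered])  # "_dup2" -> 2

  order(prefix, num, dup, decreasing = decreasing)
}

ids <- c("BCA70", "BCA1", "PWD1", "PWD2", "BCA2", "BCA75",
         "BCA70_dup", "PWD1_dup2", "PWD1_dup1")
ids[order_ids_sketch(ids)]
# "BCA1" "BCA2" "BCA70" "BCA70_dup" "BCA75" "PWD1" "PWD1_dup1" "PWD1_dup2" "PWD2"
```

Indexing `ids` by the returned order gives the sorted IDs, which is the relationship between an order function and a sort function.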
Randomize estimated crossover locations, uniformly within their intervals
randXOloc(xoloc)
| Argument | Description |
|---|---|
| xoloc | Either a matrix or a list of matrices of crossover locations, as output from |

Value: An object like the input, but with the estimated crossover locations randomized
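Randomizing uniformly within each interval amounts to replacing every estimated location with a draw from Uniform(left, right). The sketch below accepts either a single matrix or a list of matrices, as the argument description allows. It assumes the columns are (estimate, left, right); `rand_xo_sketch()` is a hypothetical name.

```r
# Hypothetical sketch: replace each estimated crossover location with a draw
# from Uniform(left, right). Assumes columns are (estimate, left, right).
rand_xo_sketch <- function(xoloc) {
  if (is.list(xoloc)) return(lapply(xoloc, rand_xo_sketch))
  if (nrow(xoloc) > 0) {
    # runif() is vectorized over min and max, one draw per crossover
    xoloc[, 1] <- runif(nrow(xoloc), min = xoloc[, 2], max = xoloc[, 3])
  }
  xoloc
}

set.seed(20)
xo <- matrix(c(10, 50, 9, 45, 11, 52), nrow = 2)  # two crossovers
rand_xo_sketch(xo)
```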
Read counts data from a set of files
read_counts(files)
| Argument | Description |
|---|---|
| files | A vector of file names (character strings) for the counts data. These text files can be gzipped. Each file should contain one backcross individual, with six whitespace-delimited columns: chromosome, basepair position, allele1, allele2, readcount1, readcount2, where allele1 is the allele of the homozygous parent. Alternatively, files can be a single directory, in which case all .txt and .gz files in that directory are read. |

Details: Names are taken from the file names: everything before ".txt" or ".gz", with "_read_counts" removed if it is part of the name. So BCA81-2_read_counts.txt.gz becomes BCA81-2.

Value: A list of data frames with the contents of the files
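A base R sketch of the reading and naming steps described above: it expands a directory argument into its .txt and .gz files, reads each whitespace-delimited file (`read.table()` reads gzip-compressed files transparently), and derives the IDs from the file names. The function name `read_counts_sketch()` and the column names are hypothetical labels chosen for illustration.

```r
# Hypothetical sketch of reading per-individual count files; not the package code.
read_counts_sketch <- function(files) {
  # A single directory: read every .txt or .gz file inside it
  if (length(files) == 1 && dir.exists(files)) {
    files <- list.files(files, pattern = "\\.(txt|gz)$", full.names = TRUE)
  }
  # ID = file name up to ".txt"/".gz", minus any "_read_counts"
  ids <- sub("\\.(txt|gz).*$", "", basename(files))
  ids <- sub("_read_counts", "", ids, fixed = TRUE)

  out <- lapply(files, read.table, header = FALSE,
                col.names = c("chr", "pos", "allele1", "allele2", "count1", "count2"),
                colClasses = c("character", "numeric", "character", "character",
                               "numeric", "numeric"))
  names(out) <- ids
  out
}

# Example: one gzipped file for individual BCA81-2 in a temporary directory
d <- tempfile("counts_")
dir.create(d)
con <- gzfile(file.path(d, "BCA81-2_read_counts.txt.gz"), "w")
writeLines(c("chr13 1000 A G 5 3", "chr13 2000 C T 7 0"), con)
close(con)
str(read_counts_sketch(d))
```

Fixing `colClasses` keeps allele columns as character, so an allele coded "T" is not silently converted to the logical `TRUE`.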
Reorganize count data from by-individual data frames into a list of three-dimensional arrays
reorg_counts(counts, map = NULL, clean_chr = TRUE, quiet = TRUE)
| Argument | Description |
|---|---|
| counts | A list of data frames, one per individual, with six columns: chromosome, basepair position, allele1, allele2, readcount1, and readcount2, as read with |
| map | Optional map of SNPs to consider in the output, as a list of vectors of marker positions. If provided, any other positions are ignored. |
| clean_chr | If TRUE, remove "chr" from the chromosome names, so chr13 becomes just 13 |
| quiet | If FALSE, print some tracing information |

Value: A list containing map and counts, where map is a list of vectors of positions in Mbp and counts is a list of three-dimensional arrays, positions x individuals x alleles
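The core reshaping step can be sketched for a single chromosome: optionally strip the "chr" prefix, take the union of SNP positions across individuals, then fill a positions x individuals x 2 array with the two read counts and convert positions from bp to Mbp. Filling SNPs an individual lacks with zero counts, the column names, and the helper `reorg_chr_sketch()` are all assumptions; the optional `map` restriction is not shown.

```r
# Hypothetical sketch of reorganizing counts for one chromosome.
# counts: named list of data frames with columns chr, pos (bp), count1, count2.
reorg_chr_sketch <- function(counts, chr, clean_chr = TRUE) {
  if (clean_chr) {
    # e.g. "chr13" -> "13"
    counts <- lapply(counts, function(df) { df$chr <- sub("^chr", "", df$chr); df })
  }
  # keep this chromosome only
  counts <- lapply(counts, function(df) df[df$chr == chr, , drop = FALSE])
  # union of SNP positions across individuals
  pos <- sort(unique(unlist(lapply(counts, `[[`, "pos"))))

  # positions x individuals x alleles; SNPs with no reads stay at 0 (assumption)
  arr <- array(0, dim = c(length(pos), length(counts), 2),
               dimnames = list(NULL, names(counts), c("A", "B")))
  for (i in seq_along(counts)) {
    idx <- match(counts[[i]]$pos, pos)
    arr[idx, i, 1] <- counts[[i]]$count1
    arr[idx, i, 2] <- counts[[i]]$count2
  }
  list(map = pos / 1e6, counts = arr)  # map in Mbp
}

counts <- list(
  ind1 = data.frame(chr = "chr13", pos = c(1e6, 2e6), count1 = c(5, 0), count2 = c(0, 4)),
  ind2 = data.frame(chr = "chr13", pos = c(2e6, 3e6), count1 = c(3, 2), count2 = c(3, 0))
)
res <- reorg_chr_sketch(counts, "13")
res$map
res$counts[, , "A"]
```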
Sort individual IDs numerically, for IDs of the form blah25 or blah25-2. IDs can also carry duplicate information at the end, like blah25_dup1 or blah25_dup2.
sort_ids(ids, decreasing = FALSE)
| Argument | Description |
|---|---|
| ids | Vector of individual IDs (as character strings) |
| decreasing | If TRUE, sort in decreasing order (largest to smallest) |

Details: We assume the IDs have an initial non-numeric label, followed by a number, followed possibly by _dup and then another number. The IDs are sorted first by the initial non-numeric part, then by the number, then by the duplicate number (with the absence of _dup taken to be duplicate "0" and _dup without a number taken to be duplicate "1").

Value: Input IDs, sorted numerically

See also: order_ids()
Examples:
ids <- c("BCA70", "BCA1", "PWD1", "PWD2", "BCA2", "BCA75", "BCA70_dup", "PWD1_dup2", "PWD1_dup1")
order_ids(ids)
sort_ids(ids)
sort_ids(ids, decreasing = TRUE)
Trim tight double crossovers from crossover information
trimXO(xoloc, mind = 1)
| Argument | Description |
|---|---|
| xoloc | Either a matrix or a list of matrices of crossover locations, as output from |
| mind | Minimum allowed distance between crossovers |

Value: An object like the input, but with double crossovers within mind of each other removed
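A minimal sketch of trimming tight double crossovers: within each individual, scan the crossovers in order of position and drop both members of any adjacent pair whose estimated locations are less than `mind` apart. Whether the package compares estimated locations or interval endpoints, and how it handles runs of three or more close crossovers, is not stated here, so both choices in `trim_xo_sketch()` (a hypothetical name) are assumptions.

```r
# Hypothetical sketch: drop both crossovers of any adjacent pair closer than
# 'mind'. Assumes column 1 of each matrix holds the estimated location.
trim_xo_sketch <- function(xoloc, mind = 1) {
  if (is.list(xoloc)) return(lapply(xoloc, trim_xo_sketch, mind = mind))
  xoloc <- xoloc[order(xoloc[, 1]), , drop = FALSE]
  keep <- rep(TRUE, nrow(xoloc))
  j <- 1
  while (j < nrow(xoloc)) {
    if (xoloc[j + 1, 1] - xoloc[j, 1] < mind) {
      keep[c(j, j + 1)] <- FALSE  # remove the tight pair
      j <- j + 2                  # continue after the removed pair
    } else {
      j <- j + 1
    }
  }
  xoloc[keep, , drop = FALSE]
}

# Crossovers at 10 and 10.5 are less than 1 unit apart; the one at 30 is kept
xo <- cbind(est = c(10, 10.5, 30), left = c(9, 10.2, 29), right = c(10.2, 11, 31))
trim_xo_sketch(xo, mind = 1)
```

Removing both members of a tight pair (rather than one) matches the idea that a very short double crossover is more likely a genotyping artifact than two real events.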