Combines a phylogeny and a dataset, ensuring that the tips of the phylogeny match the rows of the data and that observations with missing values are removed. This function uses the variables provided in the `formula` argument to:

  • Remove NAs: Checks whether any row has an NA in the variables included in the formula. All such rows are removed from the data.

  • Match data and phy: Checks whether the tips of the phylogeny match the row names of the data. Species not present in both the data and the phylogeny are removed from both.

  • Return matched data and phy: The returned data have no NAs in the variables included in `formula` and contain only rows that match phylogeny tips. The returned phylogeny contains only tips that match the data.
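The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `match_dataphy_sketch()` is a hypothetical name, the phylogeny is represented only by a character vector of tip labels (in an ape 'phylo' object these are stored in `phy$tip.label`, and the tree itself would be pruned with `ape::drop.tip()`), and `dropped` is assumed to be every species present in either input but absent from the result.

```r
# Hypothetical sketch of the three steps; NOT the sensiPhy implementation.
# Assumption: `tips` is a character vector standing in for phy$tip.label.
match_dataphy_sketch <- function(formula, data, tips) {
  vars <- all.vars(formula)                     # variables named in the model
  ok <- complete.cases(data[, vars, drop = FALSE])
  clean <- data[ok, , drop = FALSE]             # step 1: drop rows with NA in those variables only

  keep <- intersect(tips, rownames(clean))      # step 2: species present in both sources
  # Assumption: "dropped" = any species in the tree or the data not kept
  dropped <- setdiff(union(tips, rownames(data)), keep)

  list(data = clean[keep, , drop = FALSE],      # step 3: rows ordered like the tips
       tips = keep,
       dropped = dropped)
}

# Toy example with made-up species and values:
toy <- data.frame(y = c(1, 2, NA, 4),
                  x = c(0.1, NA, 0.3, 0.4),
                  z = c(NA, 5, 6, 7),           # not in the formula: its NA is kept
                  row.names = c("sp1", "sp2", "sp3", "sp4"))
res <- match_dataphy_sketch(y ~ x, toy, tips = c("sp1", "sp2", "sp4", "sp5"))
res$tips
# [1] "sp1" "sp4"
res$dropped
# [1] "sp2" "sp5" "sp3"
```

Note that `sp1` survives even though its `z` value is missing, because `z` does not appear in the formula.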

Used internally by `samp_phylm`, `samp_phyglm`, `clade_phylm`, `clade_phyglm`, `intra_phylm`, `intra_phyglm`, `tree_phylm`, `tree_phyglm` and all functions that analyse interactions. Users can also call this function directly to combine a phylogeny and a dataset.

match_dataphy(formula, data, phy, verbose = TRUE, ...)

Arguments

formula

The model formula

data

Data frame containing species traits with row names matching tips in phy.

phy

A phylogeny (class 'phylo' or 'multiPhylo').

verbose

Logical. If TRUE, prints the number of species that match data and phylogeny, along with any warnings. We strongly recommend keeping the default (verbose = TRUE); warnings and information can be silenced for advanced use.

...

Further arguments to be passed to match_dataphy

Value

The function match_dataphy returns a list with the following components:

data: Cropped dataset matching phylogeny

phy: Cropped phylogeny matching data

dropped: Species dropped from the phylogeny or removed from the data.
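When calling `match_dataphy` directly, the properties of the returned list can be checked explicitly. The helper below is a hypothetical sketch, not part of the package. It assumes that `res$phy` is a single tree exposing its tip labels as `res$phy$tip.label` (as ape 'phylo' objects do) and that, for a single tree, the rows of `res$data` follow the tip order.

```r
# Hypothetical checker (not part of sensiPhy) for the invariants of the value.
# Assumes res$phy is a single tree with a `tip.label` component.
check_matched <- function(res, formula) {
  vars <- all.vars(formula)
  c(no_na   = all(complete.cases(res$data[, vars, drop = FALSE])),  # no NA in model variables
    matched = identical(rownames(res$data), res$phy$tip.label),    # same species, same order
    dropped_excluded = !any(res$dropped %in% rownames(res$data)))  # dropped species are gone
}

# Toy stand-in for a match_dataphy() result (made-up values):
toy_res <- list(
  data = data.frame(y = c(1, 2), x = c(3, 4), z = c(NA, 1),
                    row.names = c("a", "b")),
  phy = list(tip.label = c("a", "b")),
  dropped = "c"
)
check_matched(toy_res, y ~ x)
```

All three checks should be TRUE here; the NA in `z` does not count because `z` is not in the formula.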

Details

This function uses all variables provided in `formula` to match data and phylogeny. To avoid cropping the dataset unnecessarily, `match_dataphy` searches for NA values only in the variables named in the formula; missing values in other variables are kept in the data. If the dataset has no species names as row names but has as many rows as the phylogeny has tips, the function assumes that the rows of the dataset are in the same order as the tips of the phylogeny.

This ensures consistency between data and phylogeny only for the variables that are being used in the model (set by `formula`).
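The same-order assumption for unnamed rows can be illustrated with a short base-R sketch. `assign_tip_names()` is a hypothetical helper, not the package code; it treats R's automatic row names ("1", "2", ...) as "no species names" and leaves the data unchanged in any other situation, since this page does not describe what the package does when the row count and tip count differ.

```r
# Hypothetical sketch of the "same order" fallback; NOT the sensiPhy code.
assign_tip_names <- function(data, tips) {
  # TRUE when the data frame only has R's automatic row names "1", "2", ...
  default_names <- identical(rownames(data),
                             as.character(seq_len(nrow(data))))
  if (default_names && nrow(data) == length(tips)) {
    rownames(data) <- tips  # assumes rows are in the same order as the tips
  }
  # Otherwise the data are returned unchanged (assumption for this sketch).
  data
}

d <- data.frame(y = c(2.5, 3.1, 4.0), x = c(NA, 1, 2))
named <- assign_tip_names(d, tips = c("spA", "spB", "spC"))
rownames(named)
# [1] "spA" "spB" "spC"
```

Because the names are assigned purely by position, a dataset sorted differently from the tree would be silently mislabelled, which is why species names as row names are preferable.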

If `phy` is a 'multiPhylo' object, all trees are cropped to match the data, but the row order of the dataset matches only the first tree provided. The returned phylogeny is a 'multiPhylo' object.
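The consequence for row order can be seen in a base-R sketch in which each tree is represented only by its vector of tip labels. `crop_multi()` is hypothetical; it assumes all trees share the same tip set (as trees sampled from one posterior usually do) and takes the species to keep from the first tree. With real 'multiPhylo' objects, each tree would instead be pruned with `ape::drop.tip()`.

```r
# Hypothetical sketch: trees stand in as character vectors of tip labels.
# Every tree is cropped, but data rows are ordered after the FIRST tree only.
crop_multi <- function(data, tip_sets) {
  keep <- intersect(tip_sets[[1]], rownames(data))           # order of tree 1
  cropped <- lapply(tip_sets, function(tips) tips[tips %in% keep])
  list(data = data[keep, , drop = FALSE], tips = cropped)
}

trees <- list(c("a", "b", "c", "d"), c("d", "c", "b", "a"))
dat <- data.frame(y = 1:3, row.names = c("c", "a", "b"))
out <- crop_multi(dat, trees)
rownames(out$data)
# [1] "a" "b" "c"
out$tips[[2]]
# [1] "c" "b" "a"
```

The second tree keeps the same species as the data but lists them in a different order, so code that relies on row order should use the first tree.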

Note

If tips are removed from the phylogeny and data, or if rows containing missing values are removed from the data, a warning with the details is printed. In addition, the final number of species that match data and phylogeny is always reported in a message.

References

This function is largely inspired by the function `comparative.data` in the caper package: David Orme, Rob Freckleton, Gavin Thomas, Thomas Petzoldt, Susanne Fritz, Nick Isaac and Will Pearse (2013). caper: Comparative Analyses of Phylogenetics and Evolution in R. R package version 0.5.2. http://CRAN.R-project.org/package=caper

Examples

# Load data:
library(sensiPhy)
data(alien)
head(alien$data)
#>                                     family adultMass gestaLen homeRange
#> Tachyglossus_aculeatus      Tachyglossidae  4020.767   28.375 0.9991117
#> Ornithorhynchus_anatinus Ornithorhynchidae  1458.208   15.000 0.1120000
#> Ondatra_zibethicus              Cricetidae  1135.014   27.100 0.0044500
#> Mesocricetus_auratus            Cricetidae    97.125   15.500        NA
#> Castor_canadensis               Castoridae 18085.634  110.000        NA
#> Myocastor_coypus             Myocastoridae  6135.768  131.737 0.0376000
#>                             SD_mass  SD_gesta   SD_range
#> Tachyglossus_aculeatus   1218.29240  4.199500 0.79224190
#> Ornithorhynchus_anatinus  180.81779  2.160000 0.04300000
#> Ondatra_zibethicus        388.17479  3.333300 0.00000000
#> Mesocricetus_auratus       12.52913  0.496000         NA
#> Castor_canadensis        2875.61581 13.090000         NA
#> Myocastor_coypus          546.08335  3.425162 0.01567712
# Match data and phy based on model formula:
comp.data <- match_dataphy(gestaLen ~ homeRange, data = alien$data, alien$phy[[1]])
#> Warning: NA's in response or predictor, rows with NA's were removed
#> Warning: Some phylo tips do not match species in data (this can be due to NA removal) species were dropped from phylogeny or data
#> Used dataset has 49 species that match data and phylogeny
# Check data:
head(comp.data$data)
#>                                     family adultMass gestaLen  homeRange
#> Tachyglossus_aculeatus      Tachyglossidae  4020.767   28.375 0.99911167
#> Ornithorhynchus_anatinus Ornithorhynchidae  1458.208   15.000 0.11200000
#> Ondatra_zibethicus              Cricetidae  1135.014   27.100 0.00445000
#> Myocastor_coypus             Myocastoridae  6135.768  131.737 0.03760000
#> Marmota_monax                    Sciuridae  3747.182   31.600 0.03335818
#> Tamiasciurus_hudsonicus          Sciuridae   209.452   35.724 0.01173571
#>                             SD_mass SD_gesta    SD_range
#> Tachyglossus_aculeatus   1218.29240 4.199500 0.792241901
#> Ornithorhynchus_anatinus  180.81779 2.160000 0.043000000
#> Ondatra_zibethicus        388.17479 3.333300 0.000000000
#> Myocastor_coypus          546.08335 3.425162 0.015677117
#> Marmota_monax             528.35266 2.275200 0.042975197
#> Tamiasciurus_hudsonicus    23.66808 3.072264 0.007209057
# Check phy:
comp.data$phy
#> 
#> Phylogenetic tree with 49 tips and 48 internal nodes.
#> 
#> Tip labels:
#>   Tachyglossus_aculeatus, Ornithorhynchus_anatinus, Ondatra_zibethicus, Myocastor_coypus, Marmota_monax, Tamiasciurus_hudsonicus, ...
#> 
#> Rooted; includes branch lengths.
# See species dropped from phy or data:
comp.data$dropped
#>  [1] "Mesocricetus_auratus"     "Castor_canadensis"
#>  [3] "Hystrix_brachyura"        "Chinchilla_lanigera"
#>  [5] "Marmota_bobak"            "Tamias_townsendii"
#>  [7] "Atlantoxerus_getulus"     "Sciurus_niger"
#>  [9] "Sciurus_aureogaster"      "Oryctolagus_cuniculus"
#> [11] "Macaca_arctoides"         "Macaca_mulatta"
#> [13] "Macaca_fascicularis"      "Ovis_ammon"
#> [15] "Ovis_aries"               "Hemitragus_jemlahicus"
#> [17] "Capra_ibex"               "Rupicapra_rupicapra"
#> [19] "Ovibos_moschatus"         "Gazella_subgutturosa"
#> [21] "Saiga_tatarica"           "Bubalus_bubalis"
#> [23] "Tragelaphus_strepsiceros" "Capreolus_capreolus"
#> [25] "Rangifer_tarandus"        "Rusa_timorensis"
#> [27] "Cervus_elaphus"           "Rusa_unicolor"
#> [29] "Camelus_bactrianus"       "Equus_hemionus"
#> [31] "Mustela_sibirica"         "Mustela_lutreola"
#> [33] "Neovison_vison"           "Nasua_nasua"
#> [35] "Lycalopex_griseus"        "Felis_catus"
#> [37] "Pseudocheirus_peregrinus" "Bettongia_lesueur"
#> [39] "Macropus_eugenii"         "Macropus_parma"
#> [41] "Petrogale_lateralis"      "Petrogale_penicillata"
#> [43] "Thylogale_billardierii"   "Potorous_tridactylus"
#> [45] "Lasiorhinus_latifrons"
# Example 2:
# Match data and phy based on model formula:
comp.data2 <- match_dataphy(gestaLen ~ adultMass, data = alien$data, alien$phy)
#> Warning: NA's in response or predictor, rows with NA's were removed
#> Warning: Some phylo tips do not match species in data (this can be due to NA removal) species were dropped from phylogeny or data
#> Used dataset has 84 species that match data and phylogeny
# Check data (missing values in variables not included in the formula are preserved):
head(comp.data2$data)
#>                                     family adultMass gestaLen homeRange
#> Tachyglossus_aculeatus      Tachyglossidae  4020.767   28.375 0.9991117
#> Ornithorhynchus_anatinus Ornithorhynchidae  1458.208   15.000 0.1120000
#> Ondatra_zibethicus              Cricetidae  1135.014   27.100 0.0044500
#> Mesocricetus_auratus            Cricetidae    97.125   15.500        NA
#> Castor_canadensis               Castoridae 18085.634  110.000        NA
#> Myocastor_coypus             Myocastoridae  6135.768  131.737 0.0376000
#>                             SD_mass  SD_gesta   SD_range
#> Tachyglossus_aculeatus   1218.29240  4.199500 0.79224190
#> Ornithorhynchus_anatinus  180.81779  2.160000 0.04300000
#> Ondatra_zibethicus        388.17479  3.333300 0.00000000
#> Mesocricetus_auratus       12.52913  0.496000         NA
#> Castor_canadensis        2875.61581 13.090000         NA
#> Myocastor_coypus          546.08335  3.425162 0.01567712
# Check phy:
comp.data2$phy
#> 101 phylogenetic trees
# See species dropped from phy or data:
comp.data2$dropped
#>  [1] "Chinchilla_lanigera"      "Marmota_bobak"
#>  [3] "Tamias_townsendii"        "Atlantoxerus_getulus"
#>  [5] "Sciurus_niger"            "Sciurus_aureogaster"
#>  [7] "Lycalopex_griseus"        "Felis_catus"
#>  [9] "Pseudocheirus_peregrinus" "Petrogale_lateralis"