Interaction between intraspecific variability and influential species - Phylogenetic Linear Regression

Performs leave-one-out deletion analysis for phylogenetic linear regression, and detects influential species, while taking into account potential interactions with intraspecific variability.

intra_influ_phylm(
  formula,
  data,
  phy,
  Vy = NULL,
  Vx = NULL,
  y.transf = NULL,
  x.transf = NULL,
  n.intra = 10,
  distrib = "normal",
  model = "lambda",
  cutoff = 2,
  track = TRUE,
  ...
)

Arguments

formula	The model formula: `response~predictor`.
data	Data frame containing species traits with row names matching tips in `phy`.
phy	A phylogeny (class 'phylo') matching `data`.
Vy	Name of the column containing the standard deviation or the standard error of the response variable. When information is not available for one taxon, the value can be 0 or `NA`.
Vx	Name of the column containing the standard deviation or the standard error of the predictor variable. When information is not available for one taxon, the value can be 0 or `NA`
y.transf	Transformation for the response variable (e.g. `"log"` or `"sqrt"`). Please use this argument instead of transforming data in the formula directly (see also details below).
x.transf	Transformation for the predictor variable (e.g. `"log"` or `"sqrt"`). Please use this argument instead of transforming data in the formula directly (see also details below).
n.intra	Number of datasets resimulated taking into account intraspecific variation (see: `"intra_phylm"`)
distrib	A character string indicating which distribution to use to generate a random value for the response and/or predictor variables. Default is normal distribution: "normal" (function `rnorm`). Uniform distribution: "uniform" (`runif`) Warning: we recommend to use normal distribution with Vx or Vy = standard deviation of the mean.
model	The phylogenetic model to use (see Details). Default is `lambda`.
cutoff	The cutoff value used to identify for influential species (see Details)
track	Print a report tracking function progress (default = TRUE)
...	Further arguments to be passed to `phylolm`

Value

The function intra_influ_phylm returns a list with the following components:

cutoff: The value selected for cutoff

formula: The formula

full.model.estimates: Coefficients, aic and the optimised value of the phylogenetic parameter (e.g. lambda) for the full model without deleted species.

influential_species: List of influential species, both based on standardised difference in intercept and in the slope of the regression. Species are ordered from most influential to less influential and only include species with a standardised difference > cutoff.

sensi.estimates: A data frame with all simulation estimates. Each row represents a deleted clade for an iteration of resimulated data. Columns report the calculated regression intercept (intercept), difference between simulation intercept and full model intercept (DIFintercept), the standardised difference (sDIFintercept), the percentage of change in intercept compared to the full model (intercept.perc) and intercept p-value (pval.intercept). All these parameters are also reported for the regression slope (DIFestimate etc.). Additionally, model aic value (AIC) and the optimised value (optpar) of the phylogenetic parameter (e.g. kappa or lambda, depending on the phylogenetic model used) are reported.

data: Original full dataset.

errors: Species where deletion resulted in errors.

Details

This function fits a phylogenetic linear regression model using phylolm, and detects influential species by sequentially deleting one at a time. The regression is repeated n.intra times for simulated values of the dataset, taking into account intraspecific variation. At each iteration, the function generates a random value for each row in the dataset using the standard deviation or errors supplied, and detect the influential species within that iteration.

All phylogenetic models from phylolm can be used, i.e. BM, OUfixedRoot, OUrandomRoot, lambda, kappa, delta, EB and trend. See ?phylolm for details.

influ_phylm detects influential species based on the standardised difference in intercept and/or slope when removing a given species compared to the full model including all species. Species with a standardised difference above the value of cutoff are identified as influential. The default value for the cutoff is 2 standardised differences change.

Currently, this function can only implement simple linear models (i.e. \(trait~ predictor\)). In the future we will implement more complex models.

Output can be visualised using sensi_plot.

Warning

When Vy or Vx exceed Y or X, respectively, negative (or null) values can be generated, this might cause problems for data transformation (e.g. log-transformation). In these cases, the function will skip the simulation.

Setting n.intra at high values can take a long time to execute, since the total number of iterations equals n.intra * nrow(data).

References

Paterno, G. B., Penone, C. Werner, G. D. A. sensiPhy: An r-package for sensitivity analysis in phylogenetic comparative methods. Methods in Ecology and Evolution 2018, 9(6):1461-1467.

Ho, L. S. T. and Ane, C. 2014. "A linear-time algorithm for Gaussian and non-Gaussian trait evolution models". Systematic Biology 63(3):397-408.

Examples

if (FALSE) {
# Load data:
data(alien)
# run analysis:
intra_influ <- intra_influ_phylm(formula = gestaLen ~ adultMass, phy = alien$phy[[1]],
data=alien$data,model="lambda",y.transf = log,x.transf = NULL,Vy="SD_gesta",Vx=NULL,
n.intra=30,distrib = "normal")
summary(intra_influ)
sensi_plot(intra_influ)
}
# \dontshow{
data(alien)
# run analysis:
intra_influ <- intra_influ_phylm(formula = gestaLen ~ adultMass, phy = alien$phy[[1]],
                                data=alien$data[1:15, ],model="lambda",y.transf = log,
                                x.transf = NULL,Vy="SD_gesta",Vx=NULL,
                                n.intra=5,distrib = "normal")
#> Warning: distrib=normal: make sure that standard deviation is provided for Vx and/or Vy
#> Warning: NA's in response or predictor, rows with NA's were removed
#> Warning: Some phylo tips do not match species in data (this can be due to NA removal) species were dropped from phylogeny or data
#> Used dataset has  10  species that match data and phylogeny
#> 
  |                                                                            
  |                                                                      |   0%
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> 
  |                                                                            
  |==============                                                        |  20%
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> 
  |                                                                            
  |============================                                          |  40%
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> 
  |                                                                            
  |==========================================                            |  60%
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> 
  |                                                                            
  |========================================================              |  80%
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> Warning: the estimation of lambda matches the upper/lower bound for this parameter.
#>                           You may change the bounds using options "upper.bound" and "lower.bound".
#> 
  |                                                                            
  |======================================================================| 100%
summary(intra_influ)
#> $`Most Common Influential species for the Estimate`
#>     Species removed (%) of iterations
#> 1 Castor_canadensis               100
#> 
#> $`Average Estimates`
#>     Species removed     Estimate  DIFestimate Change(%)      Pval
#> 1 Castor_canadensis 0.0001836317 9.958807e-05    120.68 0.0143568
#> 
#> $`Most Common Influential species for the Intercept`
#>     Species removed (%) of iterations
#> 1 Castor_canadensis                20
#> 
#> $`Average Intercepts`
#>     Species removed Intercept DIFintercept Change(%)         Pval
#> 1 Castor_canadensis  3.017682   -0.1648099      5.16 1.522807e-05
#> 
sensi_plot(intra_influ)
#> Warning: `fun.y` is deprecated. Use `fun` instead.
# }