This short example shows how the packages rgbif
, taxa
, and metacoder
can be used to retrieve biodiversity data, filter it by taxonomic characteristics, and visualize it. Each of these packages do much more than this; see the links at the end of this example for more information.
install.packages("metacoder")
install.packages("taxa")
install.packages("rgbif")
library(rgbif)
library(taxa)
library(metacoder)
We can use the rgbif
package to get a table of recorded plant occurrences in Oregon from the Global Biodiversity Information Facility (GBIF). Each row is a record of a plant being observed.
occ <- occ_data(scientificName = "Plantae", stateProvince = "Oregon")
occ$data
## # A tibble: 500 x 72
## key scientificName decimalLatitude decimalLongitude issues datasetKey publishingOrgKey
## <int> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 2.01e9 Quercus suber… 44.6 -123. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 2 1.99e9 Polystichum m… 45.5 -123. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 3 1.99e9 Chrysolepis c… 44.4 -122. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 4 1.99e9 Lysichiton am… 46.1 -124. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 5 1.99e9 Cardamine hir… 45.5 -123. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 6 1.99e9 Thuja plicata… 45.4 -123. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 7 2.01e9 Quercus garry… 44.9 -123. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 8 1.99e9 Arum italicum… 45.5 -122. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 9 1.99e9 Umbellularia … 43.5 -124. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## 10 1.99e9 Oenanthe sarm… 44.6 -123. cdrou… 50c9509d-… 28eb1a3f-1c15-4…
## # … with 490 more rows, and 65 more variables: networkKeys <list>, installationKey <chr>,
## # publishingCountry <chr>, protocol <chr>, lastCrawled <chr>, lastParsed <chr>, crawlId <int>,
## # basisOfRecord <chr>, taxonKey <int>, kingdomKey <int>, …
taxmap
formatNote how the taxonomy is encoded in a set of columns:
occ$data[26:31]
## # A tibble: 500 x 6
## kingdom phylum order family genus species
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Plantae Tracheophyta Fagales Fagaceae Quercus Quercus suber
## 2 Plantae Tracheophyta Polypodiales Dryopteridaceae Polystichum Polystichum munitum
## 3 Plantae Tracheophyta Fagales Fagaceae Chrysolepis Chrysolepis chrysophylla
## 4 Plantae Tracheophyta Alismatales Araceae Lysichiton Lysichiton americanus
## 5 Plantae Tracheophyta Brassicales Brassicaceae Cardamine Cardamine hirsuta
## 6 Plantae Tracheophyta Pinales Cupressaceae Thuja Thuja plicata
## 7 Plantae Tracheophyta Fagales Fagaceae Quercus Quercus garryana
## 8 Plantae Tracheophyta Alismatales Araceae Arum Arum italicum
## 9 Plantae Tracheophyta Laurales Lauraceae Umbellularia Umbellularia californica
## 10 Plantae Tracheophyta Apiales Apiaceae Oenanthe Oenanthe sarmentosa
## # … with 490 more rows
We can use these columns to infer a taxonomy and organize the data according to it. The function parse_tax_data
from the taxa
package can convert data that includes a taxonomy into the taxmap
format.
obj <- parse_tax_data(occ$data, class_cols = 26:31, named_by_rank = TRUE)
obj
## <Taxmap>
## 454 taxa: ab. Plantae, ac. Tracheophyta ... rm. Clematis vitalba
## 454 edges: NA->ab, ab->ac, ab->ad, ab->ae, ab->af ... im->rj, gs->rk, fq->rl, ki->rm
## 1 data sets:
## tax_data:
## # A tibble: 500 x 73
## taxon_id key scientificName decimalLatitude decimalLongitude issues datasetKey
## <chr> <int> <chr> <dbl> <dbl> <chr> <chr>
## 1 kj 2.01e9 Quercus suber… 44.6 -123. cdrou… 50c9509d-…
## 2 kk 1.99e9 Polystichum m… 45.5 -123. cdrou… 50c9509d-…
## 3 kl 1.99e9 Chrysolepis c… 44.4 -122. cdrou… 50c9509d-…
## # … with 497 more rows, and 66 more variables: publishingOrgKey <chr>,
## # networkKeys <list>, installationKey <chr>, publishingCountry <chr>,
## # protocol <chr>, lastCrawled <chr>, lastParsed <chr>, crawlId <int>,
## # basisOfRecord <chr>, taxonKey <int>, …
## 0 functions:
Note how the original table is now part of the object and a unique taxon ID has been added to each row.
Using the taxmap
format keeps the taxonomic and non-taxonomic parts of the data in sync, allowing for filtering data based on taxon characteristics and vice versa. It also allows you to use the heat_tree
function from the metacoder
package to plot per-taxon data. In the example below, we plot the number of occurrence for each taxon after filtering out:
obj %>%
filter_taxa(taxon_ranks != "species") %>%
filter_taxa(taxon_names == "Tracheophyta", subtaxa = TRUE) %>%
filter_taxa(taxon_ranks == "order", n_subtaxa > 10, subtaxa = TRUE, supertaxa = TRUE) %>%
heat_tree(node_label = taxon_names,
node_color = n_obs,
node_size = n_obs,
node_color_axis_label = "# occurrences")
Any hierarchical data can be used with metacoder
and taxa
. Here is an example of plotting voting results from the 2016 US Democratic primary between Bernie Sanders and Hillary Clinton, using geography as the hierarchy instead of taxonomy.
The metacoder
package focuses on ecological analysis of taxonomic data, especially microbiome data. Here is an example that uses metacoder
to visualize differences in the human microbiome among body sites.
The grey tree on the lower left functions as a key for the unlabeled trees. Each of the smaller trees represent a comparison between body sites in the columns and rows. A taxon colored brown is more abundant in the body site in the column and a taxon colored green is more abundant in body site of the row. For example, Bacteroidetes is more abundant (i.e. has more reads) in the throat samples than the nose samples and this is due to the greater abundance of two genera Prevotella and Porphyromonas.