Despite the countless blog posts, textbooks, and online courses devoted to R, this document attempts to provide an overview of how R is typically used in paleolimnological analyses at PEARL.

The purpose of the guide is to help both new and experienced users refine their many inevitable internet searches prefaced with *“r help …”*, as even with some familiarity with a given statistical technique, it may be unclear how to perform it using R.

The guide begins with some basic information for users new to R; however, the majority of content is focused on common analyses used in paleolimnological studies and the specific packages and functions that can facilitate them. The guide was created using R Markdown, specifically to allow easy updates and a gradual evolution over time. Therefore, please share any suggestions you may have to improve future versions.

R is an open-source programming language and software environment for statistical computing and graphics. R is freely available for Linux, macOS, and Windows and the current version can be obtained directly from the project website: https://www.r-project.org/.

As R is a command line program, many people choose to control it through a graphical interface. There are several different graphical interfaces or integrated development environments (IDEs) to choose from, but the one most commonly used at PEARL is RStudio. RStudio is an open-source IDE for R, freely available for Linux, macOS, and Windows, and its current version can be obtained directly from its project website: https://www.rstudio.com/.

RStudio is not required for anything described in this guide; it simply provides multiple panes, menus, and buttons to make some aspects of R more accessible. However, for simplicity, the guide assumes RStudio is installed, and that if you choose to use R on its own or with a different IDE, you are capable of translating the described operations to your particular setup.

Objects are the fundamental elements of the R programming language. Within R, an object is simply a data structure with some attributes. Therefore, essentially everything that you work with during an R session will be an object of some kind.

Most objects have a class attribute that provides some definition regarding the type of data it contains. Classes include things like ‘character’, ‘numeric’, ‘logical’, ‘data.frame’ and ‘function’.

Functions are the principal tools used in R. Functions are objects that can take arguments as input, and return a new object as output. Some are quite simple (e.g. `mean` returns the mean value of a numeric object), whereas others are much more complex (e.g. those used to perform ordinations). Many different functions are available within a base installation of R; however, as you become familiar with R, it becomes possible to write functions of your own that can be shared via scripts and packages.

All R functions and datasets are stored in packages, including everything contained within a base install of R. However, this base functionality can be extended through external packages that contain additional functions (along with relevant documentation). It may be helpful to view this relationship as packages are the ‘apps’ available for the R ‘operating system’. The main source for external packages is the Comprehensive R Archive Network (CRAN) repository (the ‘app store’ in the OS analogy).

It is also possible to install external packages from a local archive file (e.g. .zip or .tar.gz), and this is how the `jezioro` package (which contains several functions and datasets created at PEARL) is currently distributed. The latest version of `jezioro` is always available on the PEARL website.

R is able to access the CRAN repository directly, and external packages are typically installed either through the console, or using an IDE such as RStudio.

To install the package `rmarkdown` using the console, you would enter the following command into the lower left pane of RStudio (the ‘Console’ pane):

`install.packages("rmarkdown")`

However, the functions contained within that package do not become available until it is loaded into memory with the `library` command.

`library("rmarkdown")`

To install a package using RStudio:

- click on the ‘Packages’ tab of the lower right pane of RStudio (the ‘Viewer’ pane)
- click the ‘Install’ button
- at the prompt, type the name of the package you wish to install (e.g. `rmarkdown`)
- finally, click the ‘Install’ button and the package will be downloaded and installed

The `jezioro` package contains several functions and data sets written at PEARL and is distributed as an archive file; however, many other packages useful to the analyses performed at PEARL are available from the CRAN repository. These include:

- `analogue` - Gavin Simpson’s stats package needed for some WA techniques
- `dplyr` - A set of tools for working with data frame like objects
- `ggplot2` - A system for ‘declaratively’ creating graphics, based on the book “The Grammar of Graphics”
- `magrittr` - Provides a mechanism to chain commands with a forward-pipe operator, `%>%`
- `rioja` - Functions for the analysis of Quaternary science data (the successor to C2)
- `rLakeAnalyzer` - Physical limnology package
- `rmarkdown` - A package for converting R Markdown documents into a variety of formats
- `roxygen2` - A package to simplify documenting your own packages
- `vegan` - Ordination methods, diversity analysis and other functions for community and vegetation ecologists

All objects within R should have documentation accessible with the `help` function. To bring up the documentation for a function, package, or dataset, enter `help("object of interest")` into the console and the help file should appear in the ‘Viewer’ pane of RStudio (on the bottom right). For example, to bring up the documentation for the `help` function itself, enter:

`help("help")`

The `?` is a shortcut for the `help` function, so entering the following will also bring up the documentation for the `help` function.

`?help`

The help documentation for a package (which should contain a list of all of its functions) is accessed in the same way. So to bring up the help file for the `rmarkdown` package installed earlier, use:

`?rmarkdown`

However, this will generate an error if the package is not loaded into memory. To search for help documentation on objects that are installed, but not currently loaded, use a double question mark `??`.

`??rmarkdown`

After installing both R and RStudio, starting RStudio presents a screen split into four panes, as shown in the official RStudio Cheat Sheet.

By default these panes are:

- Upper left, the ‘Source’ pane (where files and data are displayed)
- Upper right, the ‘Environment’ pane (which displays objects and packages currently in memory and a history of commands)
- Lower right, the ‘Viewer’ pane (which contains several tabs including a file manager, one for plots, a package browser, and one for help files)
- Lower left, the ‘Console’ pane (where commands are entered)

Typing directly into the console is the simplest way to use R. For example, entering

`2+2`

will return the output,

`## [1] 4`

Here the `[1]` indicates that the first element of the result (in this case, there is only one element) has a value of `4`.

As you start performing more complex operations, you will find that it is easier to enter your input in the ‘Source’ pane, where it can be saved to a file for later use, rather than retyping everything each time. These R script files typically have an *.R* extension.

Individual lines can be sent from the ‘Source’ pane to the console, by moving the cursor to the desired line and either hitting the *‘Run’* button, or using the keyboard shortcut *‘Ctrl + Enter’* (or *‘Ctrl + R’*). The same can also be done for selections of text.

The next logical step after determining that 2+2 = 4 is storing that information for later use. This is done by ‘assigning’ the value ‘4’ to an object we will call `x` using the assignment operator `<-` (you could also use `=`, but that has some restrictions on its use, whereas `<-` can be used anywhere).

`x <- 4`

You should now see `x` and its value displayed in the ‘Environment’ pane, which means it is stored in memory and can be called back again using the console.

`x`

`## [1] 4`

Inspecting the class of `x` with the `class` function reveals that it is ‘numeric’.

`class(x)`

`## [1] "numeric"`

This means that it can be used in arithmetic operations.

`x + 2`

`## [1] 6`

`x` currently contains only one element (i.e. it has a length of 1), but this can be changed by reassigning multiple values to `x` using the `c` or ‘combine’ function. Below we reassign `x` with two 8’s, changing it to a numeric vector with two elements.

```
x <- c(8, 8)
x
```

`## [1] 8 8`

`class(x)`

`## [1] "numeric"`

`length(x)`

`## [1] 2`

Note that `x` can still be used in arithmetic operations. Adding ‘5’ to `x` will add ‘5’ to each element of `x`.

`x + 5`

`## [1] 13 13`

As mentioned in the introduction, ‘numeric’ is one of several available classes. Another class that you will frequently encounter is ‘character’. A ‘character’ element is typically a string of alphanumeric characters, such as words.

```
y <- "blue"
class(y)
```

`## [1] "character"`

We can combine ‘character’ elements into a vector with the `c` function.

```
y <- c(y, "red", y)
y
```

`## [1] "blue" "red" "blue"`

Predictably, errors occur when you try to perform addition on characters.

`y + 2 `

`## Error in y + 2: non-numeric argument to binary operator`

However, what happens if you have a vector with both data types?

```
z <- c(2, 4, 6, "blue")
class(z)
```

`## [1] "character"`

`z + 2`

`## Error in z + 2: non-numeric argument to binary operator`

So, it only takes one ‘character’ element to force the assignment of the `z` object to the ‘character’ class.

Within R, all objects have a class. The class defines what kind of information the object contains and how it may be used. The previous example demonstrated some characteristics of ‘numeric’ vs ‘character’ objects.

In general, data will most often be classified as either a vector, data frame or matrix.

As seen previously, a `vector` is a sequence of data elements.

```
x <- c(1,2,3)
y <- c(2,4,6)
```

Operations can be performed on the `vector` as a whole.

`x+2`

`## [1] 3 4 5`

Or on individual elements of a `vector` using `[]` to indicate which element.

`y[3]*2`

`## [1] 12`

Or using multiple vectors.

`x+y`

`## [1] 3 6 9`

A `matrix` is a two dimensional array, so let’s make one by combining `x` and `y` using the `rbind` function. Note that in addition to the data elements, `table1` also has row names (‘x’ and ‘y’) and column names ([,1], [,2], and [,3]).

```
table1 <- rbind(x, y)
table1
```

```
## [,1] [,2] [,3]
## x 1 2 3
## y 2 4 6
```

`class(table1)`

`## [1] "matrix" "array"`

Individual elements of a `matrix` can also be accessed using `[]` to indicate the indices of each dimension, separated by a comma (row then column).

`table1[2,3]`

```
## y
## 6
```

`table1[1,3] + table1[2,1]`

```
## x
## 5
```

`table1*2`

```
## [,1] [,2] [,3]
## x 2 4 6
## y 4 8 12
```

A `matrix` can contain only one type of data (i.e. only ‘numeric’ or only ‘character’). In contrast, a `data frame` is a table of data that can (but doesn’t necessarily) contain mixed data types. They can be created with the `data.frame` function.

```
table2 <- data.frame(Col1=c(1, 2), Col2=c("black", "white"))
table2
```

```
## Col1 Col2
## 1 1 black
## 2 2 white
```

`class(table2)`

`## [1] "data.frame"`

There are many more classes within R, but initially you will likely spend most of your time working with matrices and data frames. It is important to stay mindful of the differences between these two classes.

A matrix can be changed into a data frame using the `as.data.frame` function.

```
x <- as.data.frame(table1)
class(x)
```

`## [1] "data.frame"`

`x`

```
## V1 V2 V3
## x 1 2 3
## y 2 4 6
```

Similarly, a data frame can be changed into a matrix by using the `as.matrix` function. However, matrices must contain only one data type, and a number can be converted into a character, but not vice versa. Thus, when converting `table2` to a matrix, the numeric values are converted to character values (denoted by quotation marks), and are no longer available for arithmetic operations.

```
x <- as.matrix(table2)
class(x)
```

`## [1] "matrix" "array"`

`x`

```
## Col1 Col2
## [1,] "1" "black"
## [2,] "2" "white"
```

`x[1,1]*2`

`## Error in x[1, 1] * 2: non-numeric argument to binary operator`

This becomes important when trying to combine matrices of different data types. For example, let’s combine `table1`, a numeric matrix, with `x` (the copy of `table2` with everything converted to ‘character’ data) using the `cbind` function.

```
x <- cbind(table1, x)
x
```

```
## Col1 Col2
## x "1" "2" "3" "1" "black"
## y "2" "4" "6" "2" "white"
```

By default, output inherits its class from the input. So here, `x` must be a matrix, and matrices can contain only one data type; therefore all the numeric values in both `table1` and `x` are converted to character values (as denoted by the quotation marks).

`class(x)`

`## [1] "matrix" "array"`

`x[1,3] + x[2,1]`

`## Error in x[1, 3] + x[2, 1]: non-numeric argument to binary operator`

In this case, it would probably make more sense for the output to be a data frame, containing both numeric and character columns. One way to do this is by nesting the `as.data.frame` function within the `cbind` function. As we are now combining objects of different classes, R must choose a class for the output object. In this case it defaults to a data frame, as this can be done without changing any of the data in either object.

```
x <- cbind(table1, as.data.frame(table2))
class(x)
```

`## [1] "data.frame"`

`x`

```
## 1 2 3 Col1 Col2
## x 1 2 3 1 black
## y 2 4 6 2 white
```

`x[1,3] + x[2,1]`

`## [1] 5`

The past few examples should have provided some insight into the power of functions. Functions allow very complex operations and can be combined in many ways.

Functions follow the general form: *function(input, optional arguments)*.

For example, the function `round` will round an input value ‘x’ to ‘n’ decimal places when `round(x, digits=n)` is entered into the console. Replacing ‘x’ and ‘n’ with actual numbers gives something like:

`round(3.141593, digits=5)`

`## [1] 3.14159`

`round(3.141593, digits=2) `

`## [1] 3.14`

The available documentation for a function can be viewed by entering a `?` before its name into the console.

`?round`

A base install of R contains many functions, and so far we have already used: `class`, `c`, `length`, `rbind`, `cbind`, `as.data.frame`, `as.matrix`, and `round`. It is also possible to define your own (that can incorporate other functions) with the syntax: *function(input){operations}*.

For example, we can create a function that will calculate what proportion each element of some input data contributes to the total sum of the input: `function(x){x/sum(x)}`. Assigning (`<-`) this function to an object called `percentage` then allows it to be applied repeatedly to different datasets.

`percentage <- function(x){x/sum(x)}`

Although not particularly useful on objects containing only a single element, we can use the `c` function to *combine* several numbers into a vector, then run `percentage` on the vector as a whole.

`percentage(10)`

`## [1] 1`

`percentage(c(5,5,10))`

`## [1] 0.25 0.25 0.50`

Most of the data you will want to analyze with R will be generated outside of R. Thus, it will need to be imported before you can work with it. The simplest procedure to import data files is:

- First convert your data file into .csv format (this can be done in Excel or whatever spreadsheet program you are using)
- The import is done by entering the `read.csv` function into the console along with several arguments
- The `file=file.choose()` argument will open a dialog box that will allow you to select the file you wish to import
- The `header=T` argument will assign the top row of the imported file to the column names of the generated object
- The `stringsAsFactors=FALSE` argument will ensure character strings stay as character strings

`count.data <- read.csv(file=file.choose(), header=T, stringsAsFactors=FALSE)`

Similarly, to export an object from within your R environment to a .csv file, use the `write.csv` function. If we wanted to export the object `output.data` to a .csv file, we would enter the following into the console to open a dialog box to select the location of the new file.

`write.csv(output.data, file=file.choose())`
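In a saved script, the interactive `file.choose()` dialog can be replaced with an explicit file path. A minimal, self-contained sketch (using a temporary file and a made-up `output.data` data frame) shows a full export/import round trip:

```
# Write a small made-up data frame to a temporary .csv file, then read
# it back in. 'tmp' is a throwaway path created for this example.
tmp <- tempfile(fileext = ".csv")
output.data <- data.frame(depth = c(0.5, 1.0, 1.5), Pb210 = c(210, 180, 150))
write.csv(output.data, file = tmp, row.names = FALSE)
count.data <- read.csv(file = tmp, header = TRUE, stringsAsFactors = FALSE)
count.data
```

Note the `row.names=FALSE` argument, which prevents `write.csv` from adding an extra column of row numbers to the exported file.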

In addition to statistical analyses, one of the main uses of R is the generation of high quality figures. However, creating figures with R is a complex topic, and currently beyond the scope of this guide.

To begin, basic plots can be generated with the `plot` function.

```
x <- c(1,2,3,4)
y <- c(2,4,6,8)
plot(x,y, type ='l')
```

`?plot`


Complex plots are also possible with the `plot` function, but there are also alternative graphics systems available through external packages such as `ggplot2`.

A few different methods for installing external packages were given in the Introduction of this guide. However, before the functions within a package become available for use, the package needs to be loaded into memory.

For example, if you installed the `rmarkdown` package earlier, trying to bring up the help file for its `render` function without the package loaded should generate an error.

`?render`

Load an installed package (in this case `rmarkdown`) into memory with the `library` function, and try again.

```
library(rmarkdown)
?render
```

This section lists common data analysis techniques used at PEARL and the functions and/or packages that contain the tools necessary to perform them.

Over time I would like to add some detailed example code for each of these techniques.

To check the normality of your data, you can use the Shapiro-Wilk Normality Test provided in a base installation of R by the `shapiro.test` function.

`shapiro.test(input.data)`

If you are dealing with a large number of variables, it may make sense to examine all of them at once using the `apply` function.

`apply(input.data, 2, shapiro.test)`
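As a minimal sketch (with a made-up `input.data` of simulated values), `apply` runs `shapiro.test` down each column, and the p-values can be pulled out of the results with `sapply`:

```
# Simulate a small dataset: one normal and one uniform column.
# 'input.data' here is made up purely for illustration.
set.seed(1)
input.data <- data.frame(normal = rnorm(50), uniform = runif(50))

# Run the Shapiro-Wilk test on each column (margin = 2)
results <- apply(input.data, 2, shapiro.test)

# Extract just the p-values for a quick overview
sapply(results, function(s) s$p.value)
```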

Data can be square-root transformed using the `sqrt` function, and log (base 10) transformed using `log10`. Note that by default the `log` function will calculate the natural logarithm (i.e. ln).

```
sqrt(input.data)
log10(input.data)
log(input.data)
```

The `decostand` function within the `vegan` package provides several standardization methods (e.g. ‘chi.square’ and ‘hellinger’).

`?decostand`
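As an illustration of what the ‘hellinger’ method does, the transformation can be sketched in base R: each count is divided by its sample (row) total and then square-rooted (the count matrix here is made up):

```
# The 'hellinger' standardization, sketched in base R: divide each
# count by its sample (row) total, then take the square root.
counts <- rbind(sample1 = c(10, 0, 30), sample2 = c(5, 5, 10))
hellinger <- sqrt(counts / rowSums(counts))
hellinger
# With vegan installed, decostand(counts, method = "hellinger")
# should return the same values.
```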

Z-scores can be calculated with the `scale` function available in a base installation of R. Checking the details of the help file reveals that `scale` has two arguments, `center` and `scale`.

`?scale`

- If `center` is TRUE, then centering is done by subtracting the column means (omitting NAs) of `x` from their corresponding columns
- If `scale` is TRUE, then scaling is done by dividing the (centered) columns of `x` by their standard deviations if `center` is TRUE, and the root mean square otherwise

Therefore, z-scores for `input.data` can be calculated with:

`scale(input.data, center=TRUE, scale=TRUE)`
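A quick base R sketch (with a made-up `input.data`) confirms that `scale` returns z-scores: the columns of the result have means of zero and standard deviations of one:

```
# A made-up data frame to demonstrate z-score calculation
input.data <- data.frame(a = c(2, 4, 6, 8), b = c(10, 20, 30, 40))
z <- scale(input.data, center = TRUE, scale = TRUE)
round(colMeans(z), 10)   # column means are effectively zero
apply(z, 2, sd)          # column standard deviations are one
```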

Correlations (Pearson, Kendall, and Spearman) can be examined using the `cor` function.

`cor(input.data)`

A nice correlation matrix can be produced using the `corrplot` package.

```
library(corrplot)
# For your own data, e.g. the first 13 columns of 'my.dat':
# newdatacor <- cor(my.dat[1:13])
# corrplot(newdatacor, method = "number")

# A reproducible example using the built-in 'mtcars' dataset:
data(mtcars)
M <- cor(mtcars)
corrplot(M, method = "number", col = "black", cl.pos = "n")
```

Hill’s N2 values (Hill 1973)^{1} can be calculated for your data using the `Hill.N2` function provided by the `rioja` package.

`N2 <- Hill.N2(input.data, margin=2)`
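For context, Hill’s N2 is the inverse Simpson index, N2 = 1 / Σp², where p are relative abundances; with `margin=2` it is calculated per species (column). A base R sketch of the underlying calculation (on a made-up count matrix):

```
# Hill's N2 per species (column): N2 = 1 / sum(p^2), where p are the
# relative abundances of that species across samples.
counts <- cbind(sp1 = c(1, 1, 1, 1), sp2 = c(10, 1, 0, 0))
hillN2 <- function(x) {
  p <- x / sum(x)
  1 / sum(p^2)
}
apply(counts, 2, hillN2)
# sp1 occurs evenly in 4 samples, so its N2 is 4; sp2 is dominated by
# one sample, so its N2 is close to 1.
```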

Rarefaction is a method to correct for bias in species richness between samples of unequal size, by standardizing across samples to the number of species expected in a sample of the same total size as the smallest sample.

This can be done within R using the `rarefy` function provided by the `vegan` package.

`?rarefy`
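The expected-richness formula underlying rarefaction can be sketched in base R: for counts N_i summing to N, the expected number of species in a subsample of size n is Σ(1 − choose(N − N_i, n)/choose(N, n)). With `vegan` installed, `rarefy` should agree with this calculation:

```
# Expected species richness in a subsample of size n (Hurlbert's
# formula), sketched in base R. 'counts' is a vector of species counts.
rarefied.richness <- function(counts, n) {
  N <- sum(counts)
  sum(1 - choose(N - counts, n) / choose(N, n))
}
rarefied.richness(c(5, 5), 1)    # two equally common species, draw 1
# With vegan installed, rarefy(c(5, 5), 1) should agree.
```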

At PEARL, we currently use ScienTissiME to produce ^{210}Pb chronologies. However, cores dated prior to 2012-2013 were dated with the binford functions provided by the `jezioro` package.

For time-scales too long for ^{210}Pb analyses, an implementation of the Bacon approach to age-depth modelling (i.e. using Bayesian statistics to reconstruct accumulation histories for deposits, by combining radiocarbon and other dates with prior information; Blaauw and Christen, 2011^{2}) is provided by the package `rbacon`.

Tools to perform ordinations are provided by the `vegan` package.

A detrended correspondence analysis (DCA) can be performed using the function `decorana`. For example, to run a DCA on a matrix called “speciesData” (arranged with species as columns and samples as rows):

```
library(vegan)
outputDCA <- decorana(speciesData)
summary(outputDCA)
plot(outputDCA)
```

A principal component analysis (PCA) can be performed using the function `rda` (note that the “scale” argument can be used to scale values to unit variance).

```
outputPCA <- rda(speciesData, scale=TRUE)
summary(outputPCA, axes=4)
```

The `biplot` function can be used to generate a PCA biplot with species scores indicated by biplot arrows. Note that the scaling argument can be used to scale either sites (1), species (2), or both by eigenvalues; see `?biplot.rda` for more details.

`biplot(outputPCA, scaling=2) `

A redundancy analysis (RDA) is performed by the `rda` function when two matrices are provided as input: one containing the species data and the second containing the environmental data.

```
outputRDA <- rda(speciesData ~ ., envData)
outputRDA
summary(outputRDA, scaling=0, axes=4)
plot(outputRDA, scaling=2)
```

To manually perform forward selection on an RDA model, use `add1.cca`:
```
testRDA <- rda(speciesData ~ 1, envData) # defines starting model
add1(testRDA, scope=formula(outputRDA), test="perm", pstep=500) # examine F-ratios
testRDA <- update(testRDA, . ~ . + pH) # update model
add1(testRDA, scope=formula(outputRDA), test="perm", pstep=500) # examine F-ratio
testRDA <- update(testRDA, . ~ . + Ca) # updates model
add1(testRDA, scope=formula(outputRDA), test="perm", pstep=500) # continue until no more variables can improve model
anova(testRDA, by="terms", step=500) # tests terms in final model
testRDA
summary(testRDA, scaling=0, axes=4)
plot(testRDA, scaling=2)
```

Finally, examine the variable inflation factors in both the full model and reduced models.

```
vif.cca(outputRDA)
vif.cca(testRDA)
```

An ANOSIM can be performed using the `anosim` function provided by the `vegan` package. Similarity percentages can be determined with the `simper` function, also provided by `vegan`.

Indicator species analysis (IndVal) can be performed using the `indval` function provided by the `labdsv` package.

Transfer functions can be built using the `WA` function provided by the `rioja` package.

For example, the midge/VWHO calibration set described in detail by Quinlan and Smol^{3} ^{4} is provided as a dataset called `vwhoQuinlan2010` by the `jezioro` package.

Therefore the Quinlan and Smol midge/VWHO model can be built using the following code:

```
library(jezioro)
library(rioja)
data(vwhoQuinlan2010)
vwho.model <- WA(vwhoQuinlan2010[,3:47], vwhoQuinlan2010[,2], tolDW=TRUE)
```

The Quinlan and Smol papers settled on the WA.inv.tol_XVal model as having the best performance metrics (RMSE=1.98 mg/L, R2=0.5998). To verify that the `vwho.model` object has the same performance as the model described by Quinlan and Smol, use the `crossval` function, also provided by `rioja`.

```
library(rioja)
rioja::crossval(vwho.model, cv.method="loo", verbose=FALSE, ngroups=10, nboot=100, h.cutoff=0, h.dist=NULL)
```

```
##
## Method : Weighted Averaging
## Call : WA(y = vwhoQuinlan2010[, 3:47], x = vwhoQuinlan2010[, 2], tolDW = TRUE)
##
## Tolerance DW : Yes
## Monotonic deshrink : No
## No. samples : 54
## No. species : 45
## Cross val. : loo
##
## Deshrinking regression coefficients:
## Inverse d/s Classical d/s
## wa.b0 -14.1270 4.1927
## wa.b1 3.7760 0.1758
## wa.tol.b0 -12.0037 3.9839
## wa.tol.b1 3.4135 0.2007
##
## Performance:
## RMSE R2 Avg.Bias Max.Bias Skill
## WA.inv 1.8168 0.6636 0.0000 2.7096 66.3643
## WA.cla 2.2302 0.6636 0.0000 2.5285 49.3166
## WA.inv.tol 1.7577 0.6852 0.0000 2.4689 68.5171
## WA.cla.tol 2.1235 0.6852 0.0000 2.1941 54.0510
## WA.inv_XVal 1.9896 0.5967 -0.0209 3.0363 59.6633
## WA.cla_XVal 2.3260 0.6085 -0.0389 2.9700 44.8688
## WA.inv.tol_XVal 1.9817 0.5998 0.0017 2.8652 59.9821
## WA.cla.tol_XVal 2.2577 0.6117 -0.0011 2.7239 48.0607
```

Once you are confident that your model is the same as the one described in the manuscripts, it can then be applied down-core to midge sedimentary assemblage data to reconstruct VWHO concentrations.

Data sets can be merged on common columns using the `join` function provided by the `analogue` package.

This can be useful when comparing downcore data sets with a calibration set, by creating a merged data set adding rows/columns (populated by zeroes) as necessary for those taxa absent from either of the original datasets.

```
library(analogue)
calSet <- data.frame(taxa1=c(0.25, 0.75, 0.25),taxa2=c(0.20, 0.05, 0.35), taxa3=c(0.55, 0.20, 0.50), row.names=c("cal1", "cal2", "cal3"))
testSet <- data.frame(taxa1=c(0.25, 0.25, 0.50),taxa4=c(0.75, 0.10, 0.15), row.names=c("test1", "test2", "test3"))
join(calSet, testSet, type="outer", split=FALSE)
```

The `aggregate` function provided in a base installation of R can also be useful when harmonizing disparate data sets.

Identifying whether a fossil data set has any close modern analogs within a calibration set (i.e. evaluating a transfer function) can be done using the `analog` function provided by the `analogue` package.

The `analogue` package also contains a vignette with a detailed analog matching example that follows the approach described in Flower et al. 1997^{5} and Simpson et al. 2005^{6}.

```
library(analogue)
vignette("analogue_methods")
```

Regression trees can be constructed using the `rpart` package.
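A minimal sketch (assuming `rpart` is installed; it ships with most R distributions as a recommended package), fitting a regression tree to the built-in `cars` dataset:

```
# Fit a regression tree of stopping distance as a function of speed,
# using the built-in 'cars' dataset.
library(rpart)
fit <- rpart(dist ~ speed, data = cars, method = "anova")
printcp(fit)   # complexity parameter table for the fitted tree
```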

A Davies test (for a non-constant regression slope, e.g. when looking for breakpoints) can be performed using the `davies.test` function provided by the `segmented` package.

Locally Estimated Scatterplot Smoothing (LOESS) can be performed using the `loess` function provided by the base `stats` package.
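A minimal `loess` sketch using the built-in `cars` dataset, adding the smoothed trend to a scatterplot:

```
# Fit a LOESS smoother to stopping distance vs speed, then plot the
# points with the smoothed trend overlaid.
fit <- loess(dist ~ speed, data = cars, span = 0.75)
smoothed <- predict(fit, newdata = data.frame(speed = sort(cars$speed)))
plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(sort(cars$speed), smoothed)
```

Note that the `span` argument controls the degree of smoothing; as mentioned below, this choice is inherently subjective.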

Generalized Additive Models (GAMs) can be constructed using the `mgcv` package.

GAMs are increasingly applied in paleolimnological analyses because:

- Time series from the sediment record are often irregularly spaced in time, complicating the use of classical statistical time series models.
- Paleolimnological analyses of temporal trends often use linear regressions or (non-)parametric correlations despite potential violations of the underlying assumptions that likely occur due to temporal dependencies within the dataset.
- LOESS estimated trends contain inherent subjectivity regarding the parameters chosen to achieve the LOESS fit (e.g. the span)
- GAMs are statistical models that can be used to estimate trends as smooth functions of time; however, unlike LOESS, GAMs use automatic smoothness selection methods to objectively determine the complexity of the fitted trend.
- As formal statistical models, GAMs allow for potentially complex, non-linear trends, provide an account of model uncertainty, and can identify periods of significant temporal change.

Simpson (2018)^{7} provides a detailed introduction to the use of GAMs and their construction using R (the supplementary information for the article contains the annotated R code used to perform the described analyses).
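A minimal GAM sketch (assuming `mgcv` is installed; it ships with most R distributions as a recommended package), fitting a smooth trend to the built-in `cars` dataset:

```
# Fit a GAM with a smooth term for speed; REML is the smoothness
# selection method recommended by Simpson (2018).
library(mgcv)
fit <- gam(dist ~ s(speed), data = cars, method = "REML")
summary(fit)
plot(fit)   # the fitted smooth with a confidence band
```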

The `dplyr` package provides several functions that act intuitively as verbs for data manipulation. These functions include: `filter`, `select`, `mutate`, `arrange`, and `summarise`.

`%>%`

The `%>%` operator provided by the `magrittr` package works similarly to the Unix pipe “|”. It “pipes” the output from one function directly into the next (as its first argument), allowing intermediate steps to be bypassed and reducing the need for repetition in the code.

Note that due to the special characters in `%>%`, pulling up its help file requires:

`?magrittr::`%>%``

A simple example:

```
x <- c(5,4,3,2,1)
x %>% mean
```

`## [1] 3`

A more complicated example using `%>%` in combination with some of the data frame manipulation verbs provided by `dplyr`:

Import the `vwhoQuinlan2010` dataset from `jezioro`, and convert it to a data frame.

```
data(vwhoQuinlan2010)
vwhoQuinlan2010 <- as.data.frame(vwhoQuinlan2010)
```

Calculate the mean VWHO of those lakes (rows) where TANYSL <10 and MICROP >30.

```
library(dplyr)
filter(vwhoQuinlan2010, MICROP>30) %>% filter(TANYSL<10) %>% summarise(mean(VWHO))
```

```
## mean(VWHO)
## 1 4.6
```

`ggplot2`

`ggplot2` is a data visualization package that implements concepts outlined in ‘The Grammar of Graphics’, a book ‘about grammatical rules for creating perceivable graphs’.

The syntax used to produce plots with `ggplot2` is quite different from that of the base graphics system. For example:

```
library(ggplot2)
activities <- read.csv(file="data/ggplot2_activities.csv", header =T)
ggplot(activities, aes(x=Lake, y=Pb210, ymin=Pb210-Pb210err, ymax=Pb210+Pb210err)) +
  theme(legend.position=c(0.90, 0.85), axis.text.x=element_text(size=8)) +
  geom_point(aes(reorder(Lake, -Pb210))) +
  geom_errorbar(aes(reorder(Lake, -Pb210), colour=Interval)) +
  ylab("Total 210Pb Activity (Bq/kg)")
```

The `ggplot2` package is able to construct very complex figures (an extensive list of examples is available online), and there are extensive resources available detailing its use. However, at PEARL we occasionally need to break some of its rules (e.g. multiple y-axes); thus `ggplot2` is not the best tool for producing stratigraphies that show both depth and age.

With the appropriate shapefiles, basic maps can be generated using `ggplot2` and `rgdal` (bindings for the ‘Geospatial’ Data Abstraction Library).

```
library(ggplot2)
library(rgdal)
```

For example, to add a few points to an outline of Ontario:

- Read in the shapefile, and convert it to a dataframe for use with `ggplot2`.

```
ontario<-readOGR("data/ontarioShapefile/Ont2.shp", layer="Ont2")
ontario_df <- fortify(ontario)
```

- Import (or in this example create) a dataframe containing the data for the plot.

`sites <- data.frame(name=c("queens", "lake"), decLatitude=c(44.22, 45.18), decLongitude=c(-76.50, -78.83), stringsAsFactors = FALSE)`

- Generate the plot with `geom_polygon` and add the points with `geom_point`. Note that by default this will use a Mercator projection centered on the Prime Meridian.

```
map <- ggplot() +
geom_polygon(data = ontario_df, aes(x = long, y = lat, group = group), fill="grey40") +
geom_point(data=sites, aes(x=decLongitude, y=decLatitude))
print(map)
```

- The projection can be changed with `coord_map`. For example, to change the previous map to a Lambert projection with a true scale between latitudes of 40 and 60 degrees N:

```
map <- map +
  coord_map("lambert", lat0=40, lat1=60)
print(map)
```