# 1 Introduction

The jezioro package is a collection of functions and datasets intended for use by members of the Paleoecological Environmental Assessment and Research Laboratory (PEARL). The package is maintained by Adam Jeziorski, and contains contributions from Adam Jeziorski, Joshua Thienpont, and Andrew Labaj. If you have any suggestions for its improvement please let us know.

The package includes the following functions (along with some example data sets to illustrate their use):

• cladCount: determine the number of individuals (and their relative abundances) represented in raw cladoceran subfossil data
• dipteranVWHO: harmonize appropriately formatted chironomid/chaoborid sample data with the Quinlan and Smol (2001, 2010) calibration set, construct the VWHO inference model described in Quinlan and Smol (2001, 2010), infer VWHO from the sample data, and perform an analog matching analysis between the sample data and the calibration set
• interpDates: interpolate dates for the undated intervals of a dated sediment core
• quickGAM: performs the GAM fitting procedure and related analyses described in Simpson 2018
• vrsChla: infer lake sediment chlorophyll a concentrations from visible reflectance measurements
• the ‘binford’ functions collectively calculate 210Pb dates for a sediment core using the methods described in Binford 1990, Schelske 1994, and Appleby 2001
• binfordRho: determine water content and dry sediment density for freeze-dried intervals of a sediment core
• binfordActivity: calculate 210Pb, 137Cs, and 214Bi activities corrected for the efficiency of the gamma counter
• binfordDates: use the unsupported fraction of 210Pb activity in sediment samples to calculate sediment ages via the Constant Rate of Supply (CRS) model
• the ‘wiltse’ functions test for homogeneity and coherence within a dataset, they are slightly modified versions of the ‘brien’ and ‘decompose’ functions described in Brendan Wiltse’s 2014 PhD thesis
• wiltseBrien: perform the Brien test (Brien et al. 1984) on variables contained within a data frame
• wiltseDecompose: build upon the Brien test by identifying homogenous and coherent subsets within the correlation matrix

The package also contains some commonly used calibration sets to aid the construction of the relevant transfer functions for application to fossil assemblages:

• tpHall1996: diatom-inferred TP calibration set of 54 south-central Ontario lakes described in Hall and Smol (1996)
• vwhoQuinlan2010: dipteran-inferred VWHO calibration set of 54 south-central Ontario lakes described in Quinlan and Smol (2001; 2010)

The chirInfo data set tabulates several aspects of chironomid taxonomy/ecology to facilitate manipulation of chironomid data (e.g. sorting by littoral/profundal habitat preferences)

# 2 Functions

cladCount determines the maximum number of individuals represented in raw counts of cladoceran subfossils.

In paleolimnological cladoceran analyses, all remains (carapaces, headshields, ephippia, postabdominal claws, etc.) should be tabulated seperately, with only the most frequently encountered remain for each taxon used to estimate its abundance (Korhola and Rautio 2001). cladCount is a function to quickly calculate the maximum number of individuals represented in appropriately formatted raw counts of subfossils.

Input data must contain the taxon name and subfossil name in the first two columns (respectively), with each subsequent column a sample/interval. If a taxon can be represented by more than one subfossil type, the ‘Taxon’ cell should be left blank from the second row onwards (these blank cells are how the function identifies the number of subfossils present for each taxon).

The required format of cladCount input data is illustrated by cladCountInput:

data(cladCountInput)
Taxon Subfossil X0.0.25 X0.5.0.75 X1.0.1.25 X1.5.1.75
Bosmina sp. Antennule 495 623 178 221
Carapace 531 511 143 188
D. longispina PA Claw 11 23 10 15
D. pulex PA Claw NA NA NA NA
Daphniid Tail Stem NA NA NA NA
Ephippia NA 1 NA NA
Acroperus harpae Headshield NA NA NA NA
Carapace NA NA NA NA
Postabdomen NA NA NA NA
Alona affinis Headshield NA 1 NA NA

The largest number of individuals possibly represented by all the subfossils for a particular taxon is identified by attributing the following numbers of each subfossil to an individual (rounding up in the case of ‘half-individuals’; Korhola and Rautio, 2001):

• Carapace (2)
• PA Claw (2)
• Postabdomen (1)
• Mandible(2)
• Caudal Furca (2)
• Exopodite Segment (2)
• Basal Exp Segment (2)
• Exp Segment 2 (2)
• Exp Segment 3 (2)
• Antennule (2)
• Tail Stem (1)

The output of cladCount is controlled by three arguments (percCutoff, sampleCutoff, and outputType):

• percCutoff (defaults to 2): minimum relative abundance (i.e. %) required for a taxon to be included in the reduced subset (‘gt’ - abbreviation of ‘greater than’)
• sampleCutoff (defaults to 2): the minimum number of samples a taxon must be present in with at least a relative abundance of percCutoff for inclusion in the reduced subset (‘gt’)
• outputType (defaults to ‘indiv’): the format of the output, either individuals (‘indiv’), relative abundance (‘perc’), or the relative abundances of only those taxa that meet the sampleCutoff and percCutoff criteria (‘gt’)

For example, to return the number of individuals for all taxa present in the dataset:

data(cladCountInput)
cladCount(cladCountInput)

Similarly, to return the relative abundances of only those taxa with greater than 4% abundance in at least 2 samples:

data(cladCountInput)
cladCount(cladCountInput, percCutoff=4, sampleCutoff=2, outputType='gt')

#### 2.1.0.1 References

Korhola A, Rautio M (2001) 2. Cladocera and other branchiopod crustaceans. In: Smol JP, Birks HJB, Last WM (eds.) Tracking Environmental Change Using Lake Sediments. Volume 4: Zoological Indicators. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp 4-41

## 2.2 dipteranVWHO

dipteranVWHO harmonizes chironomid/chaoborid count data with the taxonomy used in the Quinlan and Smol (2001, 2010) calibration set, returns the chironomid inferred volume weighted hypolimnetic oxygen concentrations (VWHO) from the sample data using the inference model described in Quinlan and Smol (2001, 2010), and also performs some analog matching between the sample data and the calibration set.

The required format of the input data is provided in an example data set dipteranVWHOInput

The output of dipteranVWHO is modified by three arguments (evaluate, percentileCut, and lowCount):

• evaluate (defaults to FALSE): logical to indicate whether to display the model information
• percentileCut (defaults to 5): cutoff value for flagging samples in the plots as having poor modern analogs
• lowCount (defaults to 50): cutoff value to flag samples that have low counts (i.e. less than 50 individuals)

The function returns a list of four lists:

• formattedData contains the sample info, the raw chironomid counts, the raw chaoborid counts, the chironomid counts aggregated by taxon code to match the taxonomy used in vwhoQuinlan 2010 dataset, and vector of the 44 taxon codes used in the model.
• vwhoModel the model object constructed by the ‘WA’ function provided by ‘rioja’
• vwhoResults contains model ouput, relative abundance of the dipterans used in the model, relative abundances of all dipterans in the dataset
• analogResults contains the output from the analog analyis, the close modern analogs, and the minimum dissimilarity values

In addition, three plots are produced:

• minimum dissimilarity between each sample interval and the calibration set data
• sediment depth vs dipteran-inferred VWHO
• a composite plot combining data from both graphs

For example, to perform the analyses on the example dataset using a modern analog cutoff of 10, and only flagging counts with less than 40 individuals as low:

data(dipteranVWHOInput)
dipteranVWHO(dipteranVWHOInput, evaluate=TRUE, percentileCut=10, lowCount=40)

#### 2.2.0.1 References

Quinlan R, Smol JP (2001) Setting minimum head capsule abundance and taxa deletion criteria in chironomid-based inference models. Journal of Paleolimnology 26: 327-342

Quinlan R, Smol JP (2001) Chironomid-based inference models for estimating end-of-summer hypolimnetic oxygen from south-central Ontario shield lakes. Freshwater Biology 46: 1529-1551

Quinlan R, Smol JP (2010) Use of Chaoborus subfossil mandibles in models for inferring past hypolimnetic oxygen. Journal of Paleolimnology 44: 43-50

## 2.3 interpDates

interpDates interpolates dates for any undated intervals in a sediment core from the midpoint and age of the dated intervals, as well as the sectioning resolution.

Input data must contain two columns, and the argument intervalWidth is used to specifiy the sectioning resolution:

• Column 1: midpoint depth of the dated interval
• Column 2: date of the interval

The required format of interpDates input data:

##      Midpt  Age
## [1,]  0.25 2017
## [2,]  4.25 2000
## [3,]  8.25 1950
## [4,] 18.25 1850

The output of interpDates is modified by a single argument (intervalWidth):

• intervalWidth (defaults to 0.5): a single numeric value is used to indicate a constant sectioning resolution (e.g. the default value assumes the entire core was section at 0.5cm intervals). For a variable sectioning resolution, change intervalWidth to a vector providing the width of each interval in the core

Three different interpolation methods are used to determine the dates:

• ‘connectTheDotsDates’ are calculated from straight lines between sequential date pairs
• ‘linearDates’ are calculated by fitting a straight line through all dated intervals
• ‘polynomialDates’ are calculated by fitting a 2nd order polynomial line through all dated intervals

For example, to return the interpolated dates from all 3 methods (as well as a plot comparing the two fitted lines) for a core with four dated intervals and a constant sectioning resolution of 0.5cm:

interpDates.input <- cbind(c(0.25, 4.25, 8.25, 18.25), c(2017, 2000, 1950, 1850))
interpDates(interpDates.input)

Similarly, to return interpolated dates for a core with four dated intervals sectioned at a 0.5cm resolution for the first 10 intervals, and then a 1.0cm resolution for the next 10 intervals:

interpDates.input <- cbind(c(0.25, 4.25, 8.5, 13.5), c(2017, 2000, 1950, 1850))
intervalWidth <- c(rep(0.5, 10), rep(1.0, 10))
interpDates(interpDates.input, intervalWidth)

For more involved approaches to the estimation of age-depth relationships consult Blaauw and Heegaard (2012).

#### 2.3.0.1 References

Blaauw M, Heegaard E (2012) 12. Estimation of age-depth relationships. In: Birks HJB, Lotter AF, Juggins S, Smol JP (eds.) Tracking Environmental Change Using Lake Sediments. Volume 5: Data Handling and Numerical Techniques. Springer, Netherlands, pp 379-413

## 2.4 quickGAM

quickGAM fits a GAM along with the related analyses described in Simpson 2018. This function applies the code provided in the Simpson 2018 Supplementary Information.

Input data must be composed of two vectors:

• x: the independent variable
• y: the dependent variable

The output of ‘quickGAM’ is modified by three optional arguments, that can be used to ensure the axis labels and y-axis range remain consistent across all the output plots.

• xLabel (defaults to NULL)
• yLabel (defaults to NULL)
• yRange (defaults to NULL)

quickGAM returns a series of six plots: - A simple scatterplot of the input data - Estimated CAR1 process from the GAM fitted to the input data ( i.e. an autocorrelation check) - GAM-based trends fitted to the input data - Estimated trends (w 20 random draws of the GAM fit to the input data) - 95% simultaneous confidence intervals (light grey) and across-the-function confidence intervals (dark grey) on the estimated trends. - Estimated first derivatives (black lines) and 95% simultaneous confidence intervals of the GAM trends fitted to the data. Where the simultaneous interval does not include 0, the models detect significant temporal change in the response.

For example, to fit a GAM between sediment depth (x) and the inferred-chlorophyll a values (y) provided in the example data set vrsChlaInput:

Read in the example data set:

data(vrsChlaInput)

The optional arguments allow for consistent output plots.

indepVarLabel <- "Core Depth (cm)"
depenVarLabel <- expression("VRS-Inferred Chlorophyll "~italic("a")~" (mg"%.%"g"^"-1"*textstyle(")"))
depenVarRange <- c(0, 0.0825)

Note, that to use the default values for the plots, using ‘drop=FALSE’ when reading in data can preserve column names for use in the plots.

quickGAM(plotData[,1, drop=FALSE],plotData[,2, drop=FALSE], xLabel = indepVarLabel, yLabel = depenVarLabel, yRange = depenVarRange)

#### 2.4.0.1 References

Simpson GL (2018) Modelling Palaeoecological Time Series Using Generalised Additive Models. Frontiers in Ecology and Evolution [https://dx.doi.org/10.3389/fevo.2018.00149]

## 2.5 vrsChla

vrsChla infers chlorophyll a concentrations of sediments from spectral measurements of absorbance at wavelengths between 650-700 nm, following the approach described in Wolfe et al. (2006) and Michelutti et al. (2010), and reviewed in Michelutti and Smol (2016).

The technique uses a simple linear predictive model to infer sedimentary chlorophyll a concentrations (along with its primary degradation products, pheophytin a and pheophorbide a) from the absorbance peak centered on 675nm following the equation:

$[\mbox{chlorophyll }\textit{a } + \mbox{ derivatives}] = 0.0919 · \mbox{peak area}_{\textit{ 650-700 nm}} + 0.0011$

Input data must contain 27 columns:

• Column 1: midpoint depth of the sediment interval
• Columns 2-27: values from the spectrophotometer for wavelengths 650-700 nm, measured every 2 nm

The required format of vrsChla input data is illustrated by vrsChlaInput:

data(vrsChlaInput)
##       V1          V2           V3           V4           V5           V6
## 1  0.125 -0.01127052 -0.011009693 -0.010835648 -0.010232449 -0.009695530
## 2  0.625  0.04971433  0.049901485  0.050267935  0.050736427  0.051381111
## 3  1.125  0.02501535  0.025347233  0.025770903  0.026331663  0.026941776
## 4  1.625  0.00387764  0.004512548  0.005248785  0.006270409  0.007355452
## 5  2.125  0.12109232  0.121597052  0.122285128  0.123096704  0.123881340
## 6  2.625  0.08708549  0.087878942  0.088815451  0.089810133  0.090915442
## 7  2.875  0.09193444  0.092756271  0.093790054  0.094962120  0.096058369
## 8  3.125  0.12888002  0.129751921  0.130694628  0.131823540  0.132953167
## 9  3.375  0.14781332  0.148777962  0.150153875  0.151404858  0.152652025
## 10 3.625  0.14603949  0.146726370  0.147392273  0.148148060  0.148859501

NOTE: When using the Model 6500 series Rapid Content Analyzer at PEARL, the necessary values are contained in the ‘spectra’ tab of the excel file output (although they must be transposed). Ensure cells are formatted to 15 decimal places to avoid small rounding errors.

For example, to determine the chlorophyll a concentrations of vrsChlaInput and plot sediment depth vs chlorophyll a (red line denotes the estimated 0.01 mg·g-1 lower detection limit of the method):

data(vrsChlaInput)
vrsChla(vrsChlaInput)