Package 'restrend'

Title: EStimate TRENDs
Description: Collection of functions to estimate linear or monotonic trends in hydrologic data.
Authors: Dave Lorenz [cre]
Maintainer: Dave Lorenz <[email protected]>
License: CC0
Version: 0.4.2
Built: 2024-11-11 06:32:21 UTC
Source: https://github.com/USGS-R/restrend

Help Index


Estimate Trends (ESTREND)

Description

This package has specialized functions for managing data to facilitate testing for linear or monotonic trends in hydrologic data.

Details

Package: restrend
Type: Package
Version: 0.4.1
Date: 2015-12-04
License: File LICENSE
Depends: g.data, smwrBase, smwrGraphs, smwrStats, smwrQW

This package contains functions that facilitate testing for linear or monotonic trends in hydrologic data. Water-quality data or any other data collected on a nearly regular basis can be uncensored, left censored, or multiply censored. Data for annual analysis must be uncensored.

Author(s)

Dave Lorenz <[email protected]>

Maintainer: Dave Lorenz <[email protected]>

References

Lorenz, D.L., in preparation, restrend—an R package for trend estimation in hydrologic data, version 0.4.1

See Also

smwrQW


Annual Trends

Description

Perform the annual Kendall trend test.

Usage

annualTrends(Stations = "All", Snames = "All", use.logs = FALSE,
  ar.adj = TRUE, report)

Arguments

Stations

a vector of the the station identifiers on which to do the flow adjustment.

Snames

a vector of the response variables on which to do the flow adjustment.

use.logs

log transform the data before the trend test?

ar.adj

adjust the attained p-value to account for serial correlation? See Details.

report

the name of the PDF file that contains a report for each test. The default is to use the name of the project with "_an" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created.

Details

The kensen.test includes a correction to the attained p-value that accounts the first order auto regression of the data. It is applied whenever there are at least 10 observations. To suppress the correction, set ar.adj to FALSE, or a numeric value can be specified for the minimum number of observations, whihc can be useful for a ragged analysis. The value for ar.adj cannot be set to less than 10.

Value

The name of the report file.


Trend Test

Description

Compute the seasonal Kendall trend test with the Turnbull slope estimator for left-censored data.

Usage

censSeaken(series, nseas = 12)

Arguments

series

any regularly spaced object that can be forced to class "lcens" to test for trend. Missing values are permitted.

nseas

the number of seasons per year.

Value

An object of class "htest" also inhereting class "seaken" containing the following components:

method

a description of the method.

statistic

the value of Kendall's tau.

p.value

the p-value. See Note.

p.value.raw

the p-value computed without correction for serial correlation. See Note.

p.value.corrected

the p-value computed with correction for serial correlation. See Note.

estimate

a named vector containing the Sen estimate of the slope in units per year, the median value of the data, and the median value of time.

data.name

a string containing the actual name of the input series with the number of years and seasons.

alternative

a character string describing alternative to the test ("two.sided").

null.value

the value for the hypothesized slope (0).

nyears

the number of years.

nseasons

the number of seasons.

series

the data that was analyzed converted to numeric values.

Note

The value of p.value is p.value.raw if there are fewer than 10 years of data and is p.value.corrected otherwise.

References

The approach used in censSeaken was used ooriginally in Sullivan and others (2009). The original version of the code was published in Lorenz and others (2011). It is based on principles for comparing censored values in Helsel (2012) and the Turnbull slope estiamte is described by Turnbul(1974).

Helsel, D.R. 2012, Statistics for censored environmental data using Minitab and R: New York, Wiley, 324 p.
Lorenz, D.L., Ahearn, E.A., Carter, J.M., Cohn, T.A., Danchuk, W.J., Frey, J.W., Helsel, D.R., Lee, K.E., Leeth, D.C., Martin, J.D., McGuire, V.L., Neitzert, K.M., Robertson, D.M., Slack, J.R., Starn, J., Vecchia, A.V., Wilkison, D.H., and Williamson, J.E., 2011, USGS library for S-PLUS for Windows—Release 4.0: U.S. Geological Survey Open-File Report 2011-1130. (Available at http://pubs.er.usgs.gov/publication/ofr20111130).
Sullivan, D.J., Vecchia, A.V., Lorenz, D.L., Gilliom, R.J., and Martin, J.D., 2009, Trends in pesticide concentrations in corn-belt streams, 1996???-2006: U.S. Geological Survey Scientific Investigations Report 2009-5132, 75 p.
Turnbull, B.W., 1974, Nonparametric estimation of a survivorship function with doubly censored data: Journal of the American Statistical Society, v. 69, p. 169–173.

Examples

## Not run: 
# Compare censored and uncensored to seaken
library(USGSwsData)
data(KlamathTP)
# Construct the regular series
KlamathTP.rs <- with(KlamathTP, regularSeries(TP_ss, sample_dt,
 begin="1972-01-01", end="1980-01-01"))
# Uncensored, differences due to rounding
with(KlamathTP.rs, seaken(Value, 12))
with(KlamathTP.rs, censSeaken(Value, 12))
# About 30 percent censoring, censSeaken closer to uncensored slope
with(KlamathTP.rs, seaken(ifelse(Value < 0.05, 0.025, Value), 12))
with(KlamathTP.rs, censSeaken(as.lcens(Value, 0.05), 12))

## End(Not run)

Check the ESTREND Project

Description

Check and get the ESTREND project to use for this analyses.

Usage

ckProj()

Value

The position of the current project in the serach path is returned.


Handle Missing Values in Data

Description

This function is useful for dealing with NAs in data frames. Only rows in the data specified by columns that have only missing values are removed.

Usage

dropAllMissing(data, columns = "All", ...)

Arguments

data

the dataset to subset for all missing values.

columns

the columns to check for missing values. See Details.

...

any additional arguments to columns if it is a funciton.

Details

The value for columns can be a character vector containing the name of the columns to check, or "All," which checks all columns. Or it can be a function that returns a logical value–TRUE checks the column and FALSE does not. Any additonal arguments the the function can be given by ....

Value

The dataset data having rows with at least one nonmissing value in the columns specified by columns.

Examples

# create a short test dataset
test.df <- data.frame(A=c("a", "b", "c", "d", "e"),
   B=c(1, 2, NA, NA, 5), # numeric values
   C=c(1L, 3L, NA, 4L, NA), # integer values
   D=c(1, 2, NA, NA, 5)) # more numeric values
# The default, no row has all missing values
dropAllMissing(test.df)
# Check 2 columns
dropAllMissing(test.df, columns=c("B", "C"))
# check all numeric, including both integer and numeric types
dropAllMissing(test.df, columns=is.numeric)
# Check only those that are type numeric
dropAllMissing(test.df, columns=inherits, what="numeric")

Evaluate Monthly Definitions

Description

Create tables of observations for each season in both the first and last halfs of the record. Internal use only.

Usage

estrendMonthTable(ssn12, frctn = 0.5)

Arguments

ssn12

the vector of data selected for the monthly seaken analysis.

frctn

limit for the fraction of possible observations in each half.

Value

A list of the results for the percentage of observations in each half, the selected months, and the number of selected months.


Evaluate Seasonal Definitions

Description

Create tables of observations for each season in both the BE and MI records. Internal use only.

Usage

estrendSeasonTable(ssn12, frctn = 0.5, pctg = 0.8)

Arguments

ssn12

the vector of data selected for the 12 seasons per year seaken analysis.

frctn

limit for the fraction of possible observations in the BE.

pctg

required percentage of seasons needed to exceed frctn.

Value

A list of the results for 12, 6, 4, and 3 seasons per year and the "best."


EstrendSub Data

Description

A subset of stations and water-quality constituents from Schertz and others (1991). The water-quality constituents were selected to cover a range of censoring levels. The stations were selected to represent a range of sampling intensity and duration.

Usage

EstrendSub

Format

Data frame with 5428 rows and 18 columns

Name Type Description
STAID character USGS station identifier
DATES Date Sample date
QI numeric Instantaneous streamflow at time of sample
QD numeric daily mean streamflow for sample
RN.organic character Remark code for organic nitrogen concentration
PN.organic numeric Organic nitrogen concentration in mg/L
RAmmonia character Remark code for ammonia concentration
PAmmonia numeric Ammonia concentration in mg/L as N
RKjeldahl character Remark code for Kjeldahl nitrogen concentration
PKjeldahl numeric Kjeldahl nitrogen concentration in mg/L
RTotal.P character Remark code for total (whole water) phosphorus concentration
PTotal.P numeric Total (whole water) phosphorus concentrationin mg/L
RCopper character Remark code for copper concentration
PCopper numeric Copper concentration in ug/L
RIron character Remark code for iron concentration
PIron numeric Iron concentration in ug/L
Calcium numeric Calcium concentration in mg/L
Chloride numeric Chloride concentration in mg/L

Source

The data include 8 water-quality constituents from 19 stations from Schertz and others (1991).

References

Schertz, T.L., Alexander, R.B., and Ohe, D.J., 1991, The computer program EStimate TREND (ESTREND), a system for the detection of trends in water-quality data: U.S. Geological Survey Water Resources Investigations Report 91-4040, 72 p.

Examples

## Not run: 
data(EstrendSub)
# Sampling date ranges for each station
with(EstrendSub, tapply(DATES, STAID, range))

## End(Not run)

Flow Adjusted Concentrations

Description

Create flow adjusted concentrations for the seasonal Kendall trend test.

Usage

flowAdjust(Stations = "All", Snames = "All", use.logs = TRUE,
  span = 0.75, max.cens = 5, report)

Arguments

Stations

a vector of the the station identifiers on which to do the flow adjustment.

Snames

a vector of the response variables on which to do the flow adjustment.

use.logs

fit a log-log LOWESS curve? If FALSE, then do not use the log transforms.

span

the span to use for the LOWESS curve.

max.cens

the maximum percent censoring permitted. A warning is printed if there are any censored data and a warning is printed if any exceed the value.

report

the name of the PDF file that contains graphs of each fit. The default is to use the name of the project with "_fa" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created.

Value

The name of the report file.


Get Trends

Description

Extract the trend results.

Usage

getTrends(Stations = "All", Snames = "All", sig.level = 0.05)

Arguments

Stations

a vector of the station identifiers on which to do the flow adjustment.

Snames

a vector of the response variables on which to do the flow adjustment.

sig.level

the alpha level of the test. If the attained p-value of the test is less than sig.level, then the null hypothesis of no trend is rejected and the trend is classes as "up" or "down."

Value

A data frame of the requested trend tests having these columns:

Station

the station identifier.

Response

the response variable name.

Type

the type of trend test.

NumYears

the number of years for the trend test.

NumSeasons

the number of seasons. Set to NA for the Tobit trend test.

Nobs

the number of observations used in the test.

RepValue

the representative value of the response variable. See Note.

Trend

the trend expressed as the average rate in Response units per year

Trend.pct

the trend expressed as the average rate in percent change per year

P.value

the attained p-value of the test.

Trend.dir

an indicator of the trend. Trend.dir is "none" if the attained p-value is greater than sig.level, "up" if the attained p-value is less than sig.level and Trend is positive, and "down" if the attained p-value is less than sig.level and Trend is negative. For the seasonal Kendall test, when there are many tied values, the results from the trend test and the trend may not agree, in these cases Trend.dir will be "*."

Note

The representative value is the median from a sample of the data taken to represent a reasonably uniform sampling over time and throughout the year. For the seasonal Kendall test, it is the median of the data, for the Tobit test, it is the median of the data sampled as for a seasonal Kendall test.


List ESTREND Projects

Description

List the ESTREND projects that have been set up in the current workspace.

Usage

lsProj()

Value

The names of valid projects are returned. If a project is in use, then it is labeled "in use," otherwise the most recently used project is labeled "recent" if that can be determined.


Plot Trends

Description

Create a series of diagnostic plots for a single Tobit trend test.

Usage

plotTrends(data, which = "Trend", device)

Arguments

data

the data set returned from getTrends.

which

which trend to plot, either "Trend" or "Trend.pct."

device

the name of the graphics device. Defaults to a valid graphics device on any platform. May be "pdf" to create a pdf file.

Value

The name of the graphics device.

Note

The project from which data was retrieved must be the current project.


Plot Tobit Trends

Description

Create a series of diagnostic plots for a single Tobit trend test.

Usage

plotTT(Station, Sname, device)

Arguments

Station

the station identifier.

Sname

the response variable.

device

the name of the graphics device. Defaults to a valid graphics device on any platform. May be "pdf" to create a pdf file.

Value

The name of the graphics device.

See Also

plot.summary.censReg


Print an Object

Description

Print the results of the regional seasonal Kendall trend test.

Usage

## S3 method for class 'rsktest'
print(x, digits = 4, ...)

Arguments

x

the object to be printed, must be the output from regionalSeaken.

digits

the number of digits to use when printing numeric values.

...

further arguments for other methods.

Value

The object is returned invisibly.

Note

Three p-values are printed for the analysis. The raw p-value is based only on the computed variance of S. The p-value corrected for serial correlation includes the adjustement described in Hirsh and Slack. The p-value corected for serial and spatial correlation also includes the adjustment based on Dietz and Killeen (1981) described in Sprague and others (2009). An alternative corrected value based on the methods described Douglas and others (2000) is also printed.

References

Dietz, E.J., and Killeen, T.J., 1981, A nonparametric multivariate test for monotone trend with pharmaceutical applications: Journal of the American Statistical Association, v. 76, p 169–174.
Douglas, E.M., Vogel, R.M., and Kroll, C.N., 2000, Trends in floods and low flows in the United States: impact of spatial correlation: Journal of Hydrology, v. 240, p. 90–105.
Sprague, L.A., Mueller, D.K., Schwarz, G.E., and Lorenz, D.L., 2009, Nutrient trends in streams and rivers of the United States, 1993–2003: U.S. Geolgical Survey Scientific Investigations Report 2008-5202, 196 p.


Trend Test

Description

Compute the regional seasonal Kendall trend test.

Usage

regionalSeaken(series, nseas = 12)

Arguments

series

a matrix of the regular series for each site. Each column is the regular series of observations for that site. Missing values are permitted.

nseas

the number of seasons per year.

Value

An object of class "rsktest" containing the following components:

method

a description of the method.

statistic

the value of Kendall's tau.

p.values

the p-value. See Note.

estimate

the Sen estimate of the slope in units per year–the median of the site medians.

data.name

a string containing the actual name of the input series with the number of years and seasons.

Note

The values of p.values are the attained p-values considering the raw results, corrected for serial correlation, corrected for spatial correlation using the method of Dietz and Killeen (1981), and the corrected value for spatial correlation using the method of Douglas and others (2000).

References

Dietz, E.J., and Killeen, T.J., 1981, A nonparametric multivariate test for monotone trend with pharmaceutical applications: Journal of the American Statistical Association, v. 76, p 169–174.
Douglas, E.M., Vogel, R.M., and Kroll, C.N., 2000, Trends in floods and low flows in the United States: impact of spatial correlation: Journal of Hydrology, v. 240, p. 90–105. Sprague, L.A., Mueller, D.K., Schwarz, G.E., and Lorenz, D.L., 2009, Nutrient trends in streams and rivers of the United States, 1993–2003: U.S. Geolgical Survey Scientific Investigations Report 2008-5202, 196 p.

Examples

## Not run: 
data(RK3b)
# Build matrix, site 11 is missing 2004, would be row 60
Ammonia <- with(RK3b, matrix(c(value[1:59], NA, value[60:299]),
 ncol=25))
regionalSeaken(Ammonia, 1)

## End(Not run)

Annual Series Data

Description

Ammonium (micro eq/L) in snowpack, Colorado and New Mexico.

Usage

RK3b

Format

Data frame with 299 rows and 3 columns

Name Type Description
year integer The year in which the sample was taken
site integer The site number
value numeric The ammonium concentration of the sample in micro eq/L

Source

Data retrieved from the supplemental data set for Helsel and others (2006).

References

Helsel, D.R., Mueller, D.K., and Slack, J.R., 2006, Computer program for the Kendall family of trend tests: U.S. Geological Survey Scientific Investigations Report 2005-5275, 4 p.

Examples

## Not run: 
data(RK3b)
# How many samples at each site?
with(RK3b, table(site))

## End(Not run)

Regional Seasonal Kendall Trends

Description

Perform the regional seasonal Kendall trend test. The test is only appropriate for uncensored data and is performed after the trends have been computed.

Usage

RSKTrends(Stations = "All", Sname, FAC = FALSE)

Arguments

Stations

a vector of the the station identifiers on which to do the trend test.

Sname

the response variables on which to do the trend test.

FAC

logical, if TRUE, then do the regional test on the flow ajusted values, otherwise the analysis in done on the raw values.

Value

An object of class "rsktest" from regionalSeaken.

See Also

regionalSeaken


Summarize Samples

Description

Create a multi-page pdf file of sample data by station in a dataset. The first pages are a listing of the first and last sample and total numerof of samples at each station. The following pages are dot plots of the sample dates. No more than 40 stations per page are liested or plotted.

Usage

sampReport(data, DATES = "DATES", STAID = "STAID", file)

Arguments

data

the dataset to summarize

DATES

the name of the column containing the sample dates

STAID

the name of the column containing the station identifiers

file

the output file base name; the .pdf suffix is appended to make the actual file name. If missing, then the name of data is used as the base name.

Value

The actual file name is returned invisibly.


Set Up an ESTREND Project

Description

Define the data and other characteristics of an ESTREND project.

Usage

setProj(project, data, STAID, DATES, Snames, FLOW = NULL, Covars = NULL,
  type = "seasonal", Start = NULL, End = NULL, tol = NULL,
  min.obs = 20)

Arguments

project

the name of the project to set up. The project name is forced to all lower case.

data

the dataset to use in for the project.

STAID

the name of the column in data that contains the station identifiers. The data are forced to character for the analysis.

DATES

the name of the column in data containing the sample dates. This column must be class "Date" for seasonal analyses, but can be numeric or integer for annual analyses.

Snames

the name of the columns in data containing the sample data for trend analysis. These must be of class "numeric," "integer," "lcens," or "qw."

FLOW

the name of the column in data containing the streamflow for each sample.

Covars

the name of the columns in data containing any covariate data for trend analysis.

type

the kind of analysis. Must be "seasonal," "tobit," "annual," or "monthly." Only the fist letter is necessary. See Details.

Start

the starting date for the analysis. For seasonal analyses, must be "Date" or a character string the represents a date. For annual analyses, must match the type of DATES. See Details.

End

the ending date for the analysis. For seasonal analyses, must be "Date" or a character string the represents a date. For annual analyses, must match the type of DATES. To guarantee that the periods are set up correctly, End should be the first day of the month following the actual last day.

tol

the tolerance for the samples dates. To be included in a regular analysis, the first sample within the Start to End time period must be within tol and likewise for the last sample. if tol is NULL, then it is set to 5 percent of the time frame, except for annual series analysis when it is set to 1 year.

min.obs

the minimum number of observations required for a trend analysis.

Details

If Start and End are NULL, then a ragged analysis is set up, and each variable set up for each station is analyzed on the available data. Otherwise, the analysis is regular and will be restricted to the time frame specified by Start and End.

If type is "seasonal," then the data are processed for a seasonal Kendall type of analysis—seasons are defined (12, 6, 4, and 3 per year) and evaluated to select the "best" number of seasons. If, type is "monthly," then the data are processed for a seasonal Kendall type of analysis—seasons defined as months, but the number of seasons is set by the months during which sampling occured. There must be at least 1/2 of possible samples in a month to be included. If type is "tobit" or "annual," then no seasonal analysis is done because the data are ready for analysis. If type is "annual," then the data must be uncensored.

Value

The name of the project is returned.

Note

A directory is created using the name of the project is created in the user's directory. It contains the objects created by restrend as R workspaces.


Seasonal Kendall Trends

Description

Perform the seasonal Kendall trend test.

Usage

SKTrends(Stations = "All", Snames = "All", use.logs = TRUE,
  max.cens = 5, nseas = NULL, report)

Arguments

Stations

a vector of the the station identifiers on which to do the trend test.

Snames

a vector of the response variables on which to do the trend test.

use.logs

logical, if TRUE, then log transform the data before the trend test, otherwise no log transform is used. Applies only to uncensored seasonal Kendall test—the data for the censored seasonal Kendall test is never log-transformed.

max.cens

the maximum percent censoring permitted for the uncensored seasonal Kendall test. If the percentage of censoring exceeds this value, then the censored seasonal Kendall test is performed. Set to a negative value to force the censored seasonal Kendall test for all Stations and Snames.

nseas

the number of seasons to use for all of the tests. If NULL (default), then use the selected number of seasons defined in setProj. Applies only when the type of analysis is "seasonal."

report

the base name of the PDF file that contains a report for each test; the suffix ".pdf" should not be inlcuded. The default is to use the name of the project with "_sk" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created.

Value

The name of the report file.


Summarize Samples

Description

Produce a summary of sample data by station in a dataset.

Usage

sumSamp(data, DATES = "DATES", STAID = "STAID", by.numeric = TRUE)

Arguments

data

the dataset to summarize.

DATES

the name of the column containing the sample dates.

STAID

the name of the column containing the station identifiers.

by.numeric

compute summaries for each numeric column in data?

Value

A data frame contining the starting and ending dates of the samples and the number of samples by station identifier if by.numeric is FALSE. If by.numeric is TRUE, then the returned data are by station and numeric column (Reponse) and an indicator of censoring is included.

Examples

# do something here

Tobit Trends

Description

Perform the tobit regression trend test.

Usage

tobitTrends(Stations = "All", Snames = "All", use.logs = TRUE,
  Flow = TRUE, Seasons = TRUE, Covars = NULL, report)

Arguments

Stations

a vector of the station identifiers on which to do the flow adjustment.

Snames

a vector of the response variables on which to do the flow adjustment.

use.logs

log transform the data (and flow) before the trend test?

Flow

logical, include the flow variable? Can also be an expression that includes any transformation of flow.

Seasons

include first-order fourier seasonal terms? Can also be an integer value indicating the order of the terms to include.

Covars

the covariates listed in the regression model.

report

the name of the PDF file that contains a report for each test. The default is to use the name of the project with "_tb" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created.

Value

The name of the report file.

Note

The trend variable name is .Dectime, which represents the trend in units of per year.


Flow Adjusted Concentrations

Description

Remove flow adjusted concentrations for the seasonal Kendall trend test.

Usage

undoFA(Stations, Snames)

Arguments

Stations

a vector of the the station identifiers on which to do the flow adjustment.

Snames

a vector of the response variables on which to do the flow adjustment.

Details

The arguments for Stations and Snames must be valid station identifiers and response varaibles. "All" is not valid.

Value

Nothing is returned.

Note

The undoFA function is used to remove the flow adjusted concentration data and prevent that analysis for the uncensored seasonal Kendall test. It is intended for the rare case when a reasonable flow-adjustment model cannot be found.


Define the ESTREND Project

Description

Define the ESTREND project to use for subsequent analyses.

Usage

useProj(project)

Arguments

project

the name of the project to use. The project name is forced to all lower case. If missing, then the most recent project is opened for use.

Value

The name of the project is returned.