Title: | EStimate TRENDs |
---|---|
Description: | Collection of functions to estimate linear or monotonic trends in hydrologic data. |
Authors: | Dave Lorenz [cre] |
Maintainer: | Dave Lorenz <[email protected]> |
License: | CC0 |
Version: | 0.4.2 |
Built: | 2024-11-11 06:32:21 UTC |
Source: | https://github.com/USGS-R/restrend |
This package has specialized functions for managing data to facilitate testing for linear or monotonic trends in hydrologic data.
Package: | restrend |
Type: | Package |
Version: | 0.4.1 |
Date: | 2015-12-04 |
License: | File LICENSE |
Depends: | g.data, smwrBase, smwrGraphs, smwrStats, smwrQW |
This package contains functions that facilitate testing for linear or monotonic trends in hydrologic data. Water-quality data or any other data collected on a nearly regular basis can be uncensored, left censored, or multiply censored. Data for annual analysis must be uncensored.
Dave Lorenz <[email protected]>
Maintainer: Dave Lorenz <[email protected]>
Lorenz, D.L., in preparation, restrend—an R package for trend estimation in hydrologic data, version 0.4.1
Perform the annual Kendall trend test.
annualTrends(Stations = "All", Snames = "All", use.logs = FALSE, ar.adj = TRUE, report)
annualTrends(Stations = "All", Snames = "All", use.logs = FALSE, ar.adj = TRUE, report)
Stations |
a vector of the the station identifiers on which to do the flow adjustment. |
Snames |
a vector of the response variables on which to do the flow adjustment. |
use.logs |
log transform the data before the trend test? |
ar.adj |
adjust the attained p-value to account for serial correlation? See Details. |
report |
the name of the PDF file that contains a report for each test. The default is to use the name of the project with "_an" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created. |
The kensen.test
includes a correction to the attained p-value
that accounts the first order auto regression of the data. It is
applied whenever there are at least 10 observations. To suppress the
correction, set ar.adj
to FALSE
, or a numeric value can
be specified for the minimum number of observations, whihc can be
useful for a ragged analysis. The value for ar.adj
cannot be
set to less than 10.
The name of the report file.
Compute the seasonal Kendall trend test with the Turnbull slope estimator for left-censored data.
censSeaken(series, nseas = 12)
censSeaken(series, nseas = 12)
series |
any regularly spaced object that can be forced to class "lcens" to test for trend. Missing values are permitted. |
nseas |
the number of seasons per year. |
An object of class "htest" also inhereting class "seaken" containing the following components:
method |
a description of the method. |
statistic |
the value of Kendall's tau. |
p.value |
the p-value. See Note. |
p.value.raw |
the p-value computed without correction for serial correlation. See Note. |
p.value.corrected |
the p-value computed with correction for serial correlation. See Note. |
estimate |
a named vector containing the Sen estimate of the slope in units per year, the median value of the data, and the median value of time. |
data.name |
a string containing the actual name of the input series with the number of years and seasons. |
alternative |
a character string describing alternative to the test ("two.sided"). |
null.value |
the value for the hypothesized slope (0). |
nyears |
the number of years. |
nseasons |
the number of seasons. |
series |
the data that was analyzed converted to numeric values. |
The value of p.value
is p.value.raw
if there are
fewer than 10 years of data and is p.value.corrected
otherwise.
The approach used in censSeaken
was used ooriginally in Sullivan
and others (2009). The original version of the code was published in Lorenz
and others (2011). It is based on principles for comparing censored values
in Helsel (2012) and the Turnbull slope estiamte is described by Turnbul(1974).
Helsel, D.R. 2012, Statistics for censored environmental data
using Minitab and R: New York, Wiley, 324 p.
Lorenz, D.L., Ahearn, E.A., Carter, J.M., Cohn, T.A., Danchuk, W.J.,
Frey, J.W., Helsel, D.R., Lee, K.E., Leeth, D.C., Martin, J.D.,
McGuire, V.L., Neitzert, K.M., Robertson, D.M., Slack, J.R., Starn, J.,
Vecchia, A.V., Wilkison, D.H., and Williamson, J.E., 2011,
USGS library for S-PLUS for Windows—Release 4.0: U.S. Geological
Survey Open-File Report 2011-1130.
(Available at http://pubs.er.usgs.gov/publication/ofr20111130).
Sullivan, D.J., Vecchia, A.V., Lorenz, D.L., Gilliom, R.J., and
Martin, J.D., 2009, Trends in pesticide concentrations in corn-belt streams,
1996???-2006: U.S. Geological Survey Scientific Investigations Report
2009-5132, 75 p.
Turnbull, B.W., 1974, Nonparametric estimation of a survivorship function
with doubly censored data: Journal of the American Statistical Society,
v. 69, p. 169–173.
## Not run: # Compare censored and uncensored to seaken library(USGSwsData) data(KlamathTP) # Construct the regular series KlamathTP.rs <- with(KlamathTP, regularSeries(TP_ss, sample_dt, begin="1972-01-01", end="1980-01-01")) # Uncensored, differences due to rounding with(KlamathTP.rs, seaken(Value, 12)) with(KlamathTP.rs, censSeaken(Value, 12)) # About 30 percent censoring, censSeaken closer to uncensored slope with(KlamathTP.rs, seaken(ifelse(Value < 0.05, 0.025, Value), 12)) with(KlamathTP.rs, censSeaken(as.lcens(Value, 0.05), 12)) ## End(Not run)
## Not run: # Compare censored and uncensored to seaken library(USGSwsData) data(KlamathTP) # Construct the regular series KlamathTP.rs <- with(KlamathTP, regularSeries(TP_ss, sample_dt, begin="1972-01-01", end="1980-01-01")) # Uncensored, differences due to rounding with(KlamathTP.rs, seaken(Value, 12)) with(KlamathTP.rs, censSeaken(Value, 12)) # About 30 percent censoring, censSeaken closer to uncensored slope with(KlamathTP.rs, seaken(ifelse(Value < 0.05, 0.025, Value), 12)) with(KlamathTP.rs, censSeaken(as.lcens(Value, 0.05), 12)) ## End(Not run)
Check and get the ESTREND project to use for this analyses.
ckProj()
ckProj()
The position of the current project in the serach path is returned.
This function is useful for dealing with NA
s in data frames. Only
rows in the data specified by columns
that have only missing values
are removed.
dropAllMissing(data, columns = "All", ...)
dropAllMissing(data, columns = "All", ...)
data |
the dataset to subset for all missing values. |
columns |
the columns to check for missing values. See Details. |
... |
any additional arguments to |
The value for columns
can be a character vector containing the
name of the columns to check, or "All," which checks all columns. Or it can be
a function that returns a logical value–TRUE
checks the column and
FALSE
does not. Any additonal arguments the the function can be given
by ....
The dataset data
having rows with at least one nonmissing value
in the columns specified by columns
.
# create a short test dataset test.df <- data.frame(A=c("a", "b", "c", "d", "e"), B=c(1, 2, NA, NA, 5), # numeric values C=c(1L, 3L, NA, 4L, NA), # integer values D=c(1, 2, NA, NA, 5)) # more numeric values # The default, no row has all missing values dropAllMissing(test.df) # Check 2 columns dropAllMissing(test.df, columns=c("B", "C")) # check all numeric, including both integer and numeric types dropAllMissing(test.df, columns=is.numeric) # Check only those that are type numeric dropAllMissing(test.df, columns=inherits, what="numeric")
# create a short test dataset test.df <- data.frame(A=c("a", "b", "c", "d", "e"), B=c(1, 2, NA, NA, 5), # numeric values C=c(1L, 3L, NA, 4L, NA), # integer values D=c(1, 2, NA, NA, 5)) # more numeric values # The default, no row has all missing values dropAllMissing(test.df) # Check 2 columns dropAllMissing(test.df, columns=c("B", "C")) # check all numeric, including both integer and numeric types dropAllMissing(test.df, columns=is.numeric) # Check only those that are type numeric dropAllMissing(test.df, columns=inherits, what="numeric")
Create tables of observations for each season in both the first and last halfs of the record. Internal use only.
estrendMonthTable(ssn12, frctn = 0.5)
estrendMonthTable(ssn12, frctn = 0.5)
ssn12 |
the vector of data selected for the monthly seaken analysis. |
frctn |
limit for the fraction of possible observations in each half. |
A list of the results for the percentage of observations in each half, the selected months, and the number of selected months.
Create tables of observations for each season in both the BE and MI records. Internal use only.
estrendSeasonTable(ssn12, frctn = 0.5, pctg = 0.8)
estrendSeasonTable(ssn12, frctn = 0.5, pctg = 0.8)
ssn12 |
the vector of data selected for the 12 seasons per year seaken analysis. |
frctn |
limit for the fraction of possible observations in the BE. |
pctg |
required percentage of seasons needed to exceed frctn. |
A list of the results for 12, 6, 4, and 3 seasons per year and the "best."
A subset of stations and water-quality constituents from Schertz and others (1991). The water-quality constituents were selected to cover a range of censoring levels. The stations were selected to represent a range of sampling intensity and duration.
EstrendSub
EstrendSub
Data frame with 5428 rows and 18 columns
Name | Type | Description |
STAID | character | USGS station identifier |
DATES | Date | Sample date |
QI | numeric | Instantaneous streamflow at time of sample |
QD | numeric | daily mean streamflow for sample |
RN.organic | character | Remark code for organic nitrogen concentration |
PN.organic | numeric | Organic nitrogen concentration in mg/L |
RAmmonia | character | Remark code for ammonia concentration |
PAmmonia | numeric | Ammonia concentration in mg/L as N |
RKjeldahl | character | Remark code for Kjeldahl nitrogen concentration |
PKjeldahl | numeric | Kjeldahl nitrogen concentration in mg/L |
RTotal.P | character | Remark code for total (whole water) phosphorus concentration |
PTotal.P | numeric | Total (whole water) phosphorus concentrationin mg/L |
RCopper | character | Remark code for copper concentration |
PCopper | numeric | Copper concentration in ug/L |
RIron | character | Remark code for iron concentration |
PIron | numeric | Iron concentration in ug/L |
Calcium | numeric | Calcium concentration in mg/L |
Chloride | numeric | Chloride concentration in mg/L |
The data include 8 water-quality constituents from 19 stations from Schertz and others (1991).
Schertz, T.L., Alexander, R.B., and Ohe, D.J., 1991, The computer program EStimate TREND (ESTREND), a system for the detection of trends in water-quality data: U.S. Geological Survey Water Resources Investigations Report 91-4040, 72 p.
## Not run: data(EstrendSub) # Sampling date ranges for each station with(EstrendSub, tapply(DATES, STAID, range)) ## End(Not run)
## Not run: data(EstrendSub) # Sampling date ranges for each station with(EstrendSub, tapply(DATES, STAID, range)) ## End(Not run)
Create flow adjusted concentrations for the seasonal Kendall trend test.
flowAdjust(Stations = "All", Snames = "All", use.logs = TRUE, span = 0.75, max.cens = 5, report)
flowAdjust(Stations = "All", Snames = "All", use.logs = TRUE, span = 0.75, max.cens = 5, report)
Stations |
a vector of the the station identifiers on which to do the flow adjustment. |
Snames |
a vector of the response variables on which to do the flow adjustment. |
use.logs |
fit a log-log LOWESS curve? If |
span |
the span to use for the LOWESS curve. |
max.cens |
the maximum percent censoring permitted. A warning is printed if there are any censored data and a warning is printed if any exceed the value. |
report |
the name of the PDF file that contains graphs of each fit. The default is to use the name of the project with "_fa" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created. |
The name of the report file.
Extract the trend results.
getTrends(Stations = "All", Snames = "All", sig.level = 0.05)
getTrends(Stations = "All", Snames = "All", sig.level = 0.05)
Stations |
a vector of the station identifiers on which to do the flow adjustment. |
Snames |
a vector of the response variables on which to do the flow adjustment. |
sig.level |
the alpha level of the test. If the attained p-value
of the test is less than |
A data frame of the requested trend tests having these columns:
Station |
the station identifier. |
Response |
the response variable name. |
Type |
the type of trend test. |
NumYears |
the number of years for the trend test. |
NumSeasons |
the number of seasons. Set to |
Nobs |
the number of observations used in the test. |
RepValue |
the representative value of the response variable. See
|
Trend |
the trend expressed as the average rate in Response units per year |
Trend.pct |
the trend expressed as the average rate in percent change per year |
P.value |
the attained p-value of the test. |
Trend.dir |
an indicator of the trend. |
The representative value is the median from a sample of the data taken to represent a reasonably uniform sampling over time and throughout the year. For the seasonal Kendall test, it is the median of the data, for the Tobit test, it is the median of the data sampled as for a seasonal Kendall test.
List the ESTREND projects that have been set up in the current workspace.
lsProj()
lsProj()
The names of valid projects are returned. If a project is in use, then it is labeled "in use," otherwise the most recently used project is labeled "recent" if that can be determined.
Create a series of diagnostic plots for a single Tobit trend test.
plotTrends(data, which = "Trend", device)
plotTrends(data, which = "Trend", device)
data |
the data set returned from |
which |
which trend to plot, either "Trend" or "Trend.pct." |
device |
the name of the graphics device. Defaults to a valid graphics device on any platform. May be "pdf" to create a pdf file. |
The name of the graphics device.
The project from which data
was retrieved must be the
current project.
Create a series of diagnostic plots for a single Tobit trend test.
plotTT(Station, Sname, device)
plotTT(Station, Sname, device)
Station |
the station identifier. |
Sname |
the response variable. |
device |
the name of the graphics device. Defaults to a valid graphics device on any platform. May be "pdf" to create a pdf file. |
The name of the graphics device.
Print the results of the regional seasonal Kendall trend test.
## S3 method for class 'rsktest' print(x, digits = 4, ...)
## S3 method for class 'rsktest' print(x, digits = 4, ...)
x |
the object to be printed, must be the output from
|
digits |
the number of digits to use when printing numeric values. |
... |
further arguments for other methods. |
The object is returned invisibly.
Three p-values are printed for the analysis. The raw p-value is based only on the computed variance of S. The p-value corrected for serial correlation includes the adjustement described in Hirsh and Slack. The p-value corected for serial and spatial correlation also includes the adjustment based on Dietz and Killeen (1981) described in Sprague and others (2009). An alternative corrected value based on the methods described Douglas and others (2000) is also printed.
Dietz, E.J., and Killeen, T.J., 1981, A nonparametric multivariate test for
monotone trend with pharmaceutical applications: Journal of the American
Statistical Association, v. 76, p 169–174.
Douglas, E.M., Vogel, R.M., and Kroll, C.N., 2000, Trends in floods and low
flows in the United States: impact of spatial correlation: Journal of
Hydrology, v. 240, p. 90–105.
Sprague, L.A., Mueller, D.K., Schwarz, G.E., and Lorenz, D.L., 2009,
Nutrient trends in streams and rivers of the United States, 1993–2003:
U.S. Geolgical Survey Scientific Investigations Report 2008-5202, 196 p.
Compute the regional seasonal Kendall trend test.
regionalSeaken(series, nseas = 12)
regionalSeaken(series, nseas = 12)
series |
a matrix of the regular series for each site. Each column is the regular series of observations for that site. Missing values are permitted. |
nseas |
the number of seasons per year. |
An object of class "rsktest" containing the following components:
method |
a description of the method. |
statistic |
the value of Kendall's tau. |
p.values |
the p-value. See Note. |
estimate |
the Sen estimate of the slope in units per year–the median of the site medians. |
data.name |
a string containing the actual name of the input series with the number of years and seasons. |
The values of p.values
are the attained p-values considering
the raw results, corrected for serial correlation, corrected for spatial
correlation using the method of Dietz and Killeen (1981), and the corrected
value for spatial correlation using the method of Douglas and others (2000).
Dietz, E.J., and Killeen, T.J., 1981, A nonparametric multivariate test for
monotone trend with pharmaceutical applications: Journal of the American
Statistical Association, v. 76, p 169–174.
Douglas, E.M., Vogel, R.M., and Kroll, C.N., 2000, Trends in floods and low
flows in the United States: impact of spatial correlation: Journal of
Hydrology, v. 240, p. 90–105.
Sprague, L.A., Mueller, D.K., Schwarz, G.E., and Lorenz, D.L., 2009,
Nutrient trends in streams and rivers of the United States, 1993–2003:
U.S. Geolgical Survey Scientific Investigations Report 2008-5202, 196 p.
## Not run: data(RK3b) # Build matrix, site 11 is missing 2004, would be row 60 Ammonia <- with(RK3b, matrix(c(value[1:59], NA, value[60:299]), ncol=25)) regionalSeaken(Ammonia, 1) ## End(Not run)
## Not run: data(RK3b) # Build matrix, site 11 is missing 2004, would be row 60 Ammonia <- with(RK3b, matrix(c(value[1:59], NA, value[60:299]), ncol=25)) regionalSeaken(Ammonia, 1) ## End(Not run)
Ammonium (micro eq/L) in snowpack, Colorado and New Mexico.
RK3b
RK3b
Data frame with 299 rows and 3 columns
Name | Type | Description |
year | integer | The year in which the sample was taken |
site | integer | The site number |
value | numeric | The ammonium concentration of the sample in micro eq/L |
Data retrieved from the supplemental data set for Helsel and others (2006).
Helsel, D.R., Mueller, D.K., and Slack, J.R., 2006, Computer program for the Kendall family of trend tests: U.S. Geological Survey Scientific Investigations Report 2005-5275, 4 p.
## Not run: data(RK3b) # How many samples at each site? with(RK3b, table(site)) ## End(Not run)
## Not run: data(RK3b) # How many samples at each site? with(RK3b, table(site)) ## End(Not run)
Perform the regional seasonal Kendall trend test. The test is only appropriate for uncensored data and is performed after the trends have been computed.
RSKTrends(Stations = "All", Sname, FAC = FALSE)
RSKTrends(Stations = "All", Sname, FAC = FALSE)
Stations |
a vector of the the station identifiers on which to do the trend test. |
Sname |
the response variables on which to do the trend test. |
FAC |
logical, if |
An object of class "rsktest" from regionalSeaken
.
Create a multi-page pdf file of sample data by station in a dataset. The first pages are a listing of the first and last sample and total numerof of samples at each station. The following pages are dot plots of the sample dates. No more than 40 stations per page are liested or plotted.
sampReport(data, DATES = "DATES", STAID = "STAID", file)
sampReport(data, DATES = "DATES", STAID = "STAID", file)
data |
the dataset to summarize |
DATES |
the name of the column containing the sample dates |
STAID |
the name of the column containing the station identifiers |
file |
the output file base name; the .pdf suffix
is appended to make the actual file name. If missing, then the
name of |
The actual file name is returned invisibly.
Define the data and other characteristics of an ESTREND project.
setProj(project, data, STAID, DATES, Snames, FLOW = NULL, Covars = NULL, type = "seasonal", Start = NULL, End = NULL, tol = NULL, min.obs = 20)
setProj(project, data, STAID, DATES, Snames, FLOW = NULL, Covars = NULL, type = "seasonal", Start = NULL, End = NULL, tol = NULL, min.obs = 20)
project |
the name of the project to set up. The project name is forced to all lower case. |
data |
the dataset to use in for the project. |
STAID |
the name of the column in |
DATES |
the name of the column in |
Snames |
the name of the columns in |
FLOW |
the name of the column in |
Covars |
the name of the columns in |
type |
the kind of analysis. Must be "seasonal," "tobit," "annual," or "monthly." Only the fist letter is necessary. See Details. |
Start |
the starting date for the analysis. For seasonal analyses,
must be "Date" or a character string the represents a date. For
annual analyses, must match the type of |
End |
the ending date for the analysis. For seasonal analyses,
must be "Date" or a character string the represents a date. For
annual analyses, must match the type of |
tol |
the tolerance for the samples dates. To be included in a
regular analysis, the first sample within the |
min.obs |
the minimum number of observations required for a trend analysis. |
If Start
and End
are NULL
, then a ragged
analysis is set up, and each variable set up for each station is
analyzed on the available data. Otherwise, the analysis is regular and
will be restricted to the time frame specified by Start
and
End
.
If type
is "seasonal," then the data are processed for a
seasonal Kendall type of analysis—seasons are defined (12, 6, 4,
and 3 per year) and evaluated to select the "best" number of seasons.
If, type
is "monthly," then the data are processed for a
seasonal Kendall type of analysis—seasons defined as months, but
the number of seasons is set by the months during which sampling occured.
There must be at least 1/2 of possible samples in a month to be included.
If type
is "tobit" or "annual," then no seasonal analysis is
done because the data are ready for analysis. If type
is
"annual," then the data must be uncensored.
The name of the project is returned.
A directory is created using the name of the project is created in the user's directory. It contains the objects created by restrend as R workspaces.
Perform the seasonal Kendall trend test.
SKTrends(Stations = "All", Snames = "All", use.logs = TRUE, max.cens = 5, nseas = NULL, report)
SKTrends(Stations = "All", Snames = "All", use.logs = TRUE, max.cens = 5, nseas = NULL, report)
Stations |
a vector of the the station identifiers on which to do the trend test. |
Snames |
a vector of the response variables on which to do the trend test. |
use.logs |
logical, if |
max.cens |
the maximum percent censoring permitted for the
uncensored seasonal Kendall test. If the percentage of censoring
exceeds this value, then the censored seasonal Kendall test is
performed. Set to a negative value to force the censored seasonal
Kendall test for all |
nseas |
the number of seasons to use for all of the tests. If
|
report |
the base name of the PDF file that contains a report for each test; the suffix ".pdf" should not be inlcuded. The default is to use the name of the project with "_sk" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created. |
The name of the report file.
Produce a summary of sample data by station in a dataset.
sumSamp(data, DATES = "DATES", STAID = "STAID", by.numeric = TRUE)
sumSamp(data, DATES = "DATES", STAID = "STAID", by.numeric = TRUE)
data |
the dataset to summarize. |
DATES |
the name of the column containing the sample dates. |
STAID |
the name of the column containing the station identifiers. |
by.numeric |
compute summaries for each numeric column in |
A data frame contining the starting and ending dates of the
samples and the number of samples by station identifier if by.numeric
is FALSE
. If by.numeric
is TRUE
, then the returned
data are by station and numeric column (Reponse) and an indicator of
censoring is included.
# do something here
# do something here
Perform the tobit regression trend test.
tobitTrends(Stations = "All", Snames = "All", use.logs = TRUE, Flow = TRUE, Seasons = TRUE, Covars = NULL, report)
tobitTrends(Stations = "All", Snames = "All", use.logs = TRUE, Flow = TRUE, Seasons = TRUE, Covars = NULL, report)
Stations |
a vector of the station identifiers on which to do the flow adjustment. |
Snames |
a vector of the response variables on which to do the flow adjustment. |
use.logs |
log transform the data (and flow) before the trend test? |
Flow |
logical, include the flow variable? Can also be an expression that includes any transformation of flow. |
Seasons |
include first-order fourier seasonal terms? Can also be an integer value indicating the order of the terms to include. |
Covars |
the covariates listed in the regression model. |
report |
the name of the PDF file that contains a report for each test. The default is to use the name of the project with "_tb" appended. If the PDF file exists, then it is not overwritten, but the name is appended with a sequence of numbers until one that is valid is created. |
The name of the report file.
The trend variable name is .Dectime
, which represents
the trend in units of per year.
Remove flow adjusted concentrations for the seasonal Kendall trend test.
undoFA(Stations, Snames)
undoFA(Stations, Snames)
Stations |
a vector of the the station identifiers on which to do the flow adjustment. |
Snames |
a vector of the response variables on which to do the flow adjustment. |
The arguments for Stations
and Snames
must be valid
station identifiers and response varaibles. "All" is not valid.
Nothing is returned.
The undoFA
function is used to remove the flow adjusted
concentration data and prevent that analysis for the uncensored seasonal
Kendall test. It is intended for the rare case when a reasonable
flow-adjustment model cannot be found.
Define the ESTREND project to use for subsequent analyses.
useProj(project)
useProj(project)
project |
the name of the project to use. The project name is forced to all lower case. If missing, then the most recent project is opened for use. |
The name of the project is returned.