Title: | R functions to support statistical methods in water resources |
---|---|
Description: | Useful tools for the analysis of hydrologic data, not inlcuding censored water-quality data. |
Authors: | Dave Lorenz [cre], Laura De Cicco [ctb] |
Maintainer: | Dave Lorenz <[email protected]> |
License: | CC0 |
Version: | 0.7.6 |
Built: | 2024-11-06 04:58:14 UTC |
Source: | https://github.com/USGS-R/smwrStats |
Useful tools for the analysis of hydrologic data, not inlcuding censored water-quality data.
Package: | smwrStats |
Type: | Package |
Version: | 0.7.5 |
Date: | 2016-01-21 |
License: | CC0 |
Regression applications:
allReg
binaryReg
hosmerLemeshow.test
leCessie.test
multReg
press
rmse
selBestWave
roc
vif
predictDuan
predictFerguson
predictMVUE
senSlope
seasonalPeak
seasonalWave
Record extension applications:
move.1
jackknifeMove.1
move.2
optimBoxCox
Trend applications:
curvi
kensen.test
regken
seaken
trends
serial.test
regken
trends
Summary statistics:
cor.all
printCor
percentile
quantile.numeric
qtiles.CI
skew
sumStats
timeWeightedMean
multicomp.test
Hypothesis tests:
grubbs.test
ppcc.test
serial.test
Dave Lorenz <[email protected]>
Maintainer: Dave Lorenz <[email protected]>
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
Lorenz, D.L., in preparation. smwrStats—an R package for the analysis of hydrologic data, version 0.7.5.
Creates a table of the best subsets of explanatory variables for a response variable.
allReg(x, y, wt = rep(1, nrow(x)), nmax = ncol(x), nbst = 3, na.rm.x = TRUE, lin.dep = 10)
allReg(x, y, wt = rep(1, nrow(x)), nmax = ncol(x), nbst = 3, na.rm.x = TRUE, lin.dep = 10)
x |
matrix of candidate explanatory variables. |
y |
the response variable. |
wt |
the weight variable if needed. |
nmax |
the maximum number of explanatory variables to include in the largest model. |
nbst |
the number of best models to determine for each subset size. |
na.rm.x |
logical, if |
lin.dep |
a value to protect against linear dependencies; the number of
the number of observations must be greater than the number of columns in
|
A data frame containing these columns:
model.formula |
the subset model formula |
nvars |
the size (number of variables in the subset model |
stderr |
the standard error of the subsbet model |
R2 |
the coefficient of determination for the subset model |
adjr2 |
the adjusted r-squared of the subbset model |
Cp |
Mallow's Cp for the subset model |
press |
the press statistic for the subset model |
This function is a wrapper for the regsubsets
function in the
leaps package.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
Mallow, C.L., 1973, Come comments of Cp: Technometrics, v. 15, p.
661–675.
Miller, A.J., 1990, Subset selection in regression in Monographs on
Statistics and Applied Probability 40: London, Chapman and Hall.
# See the regression vignette for examples .pager <- options("pager") options(pager="console") vignette(package="smwrStats") options(.pager)
# See the regression vignette for examples .pager <- options("pager") options(pager="console") vignette(package="smwrStats") options(.pager)
Computes diagnostic statistics for an analysis of covariance (ANCOVA) with a single factor variable.
ancovaReg(object, find.best = TRUE, trace = FALSE)
ancovaReg(object, find.best = TRUE, trace = FALSE)
object |
the linear regression model object. |
find.best |
select the "best" subset of terms in the model? |
trace |
print the results of the selection process if |
The input model object (object
) can be the complete ancova model
including all interaction terms or it can be any form of an ANCOVA model.
Most often, if it is not the complete ancova model, then find.best
should be FALSE
.
The find.best
option uses the step
function to select the "best" subset of terms in the model. In general, this
can be used to retain or drop significant interaction terms. It will not
look at individual factor levels in the model.
A list of class "ancovaReg" containing these components:
aovtab |
the analysis of variance table from the original model |
parmests |
a summary of the final |
vif |
a named vector of variance inflation factors. |
diagstats |
a data.frame containing the observed values, predicted values, residuals, standardized residuals, studentized residuals, leverage, Cook's D, and dfits for each observation. |
crit.val |
a named vector of the critical values for leverage, Cook's D, and dfits. |
flagobs |
a logical vector indicating which observations exceeded at least one of the critical values. |
object |
the |
x |
the model matrix of explanatory variables. |
factor.var |
the name of the factor variable |
x.fr |
the model frame of explanatory variables. |
If no factor variable
is found in the final model, either because one was not specified or it was
dropped from the model, then an object of class "multReg" is returned
instead. See multReg
for details.
Objects of class "ancovaReg" have print
and plot
methods.
Draper, N.R. and Smith, H., 1998, Applied Regression Analysis, (3rd ed.): New York, Wiley, 724 p.
Computes diagnostic statistics for logistic regression.
binaryReg(object, lc.max = 1000)
binaryReg(object, lc.max = 1000)
object |
the logistic regression model object. |
lc.max |
the lmaximum number of observations for the le Cessie test. |
Because the le Cessie test is very slow to compute for many observations,
the test is not performed if there are more than lc.max
observations.
A list of class "binaryreg" containing these components:
regsum |
the output from |
Warning |
any warnings relevant to the model |
Factors |
information about factor explanatory variables |
Profile |
summary information about the coding of the response variable |
Hosmer |
output from the Hosmer-Lemeshow test
on |
leCessie |
output from the le Cessie-van Houwelingen
test on |
PctCorrect |
the classification table |
Concordance |
the concordance table |
roc |
output from the
receiver operating characteristics test on |
diagstats |
a data frame containing the response variable, the predicted response probability, the response residual, the deviance residuals, the Pearson residuals, the leverage, the value of Cook's D, and the dfits value |
crit.val |
the critical values for leverage, Cook's D, and dfits |
flagobs |
a logical value indicating which observaitons exceeded any one of the critical values |
object |
the |
Logistic regression can be very useful alternative method for heavily
censored water-quality data.
The critical values for the test criteria
are computed as: leverage, 3p/n; Cook's D, median quantile for the
F distribution with p+1 and n-p degrees of freedonm;
and dfits, the .01 quantile of the grubbs distribution for n
observations, where p is the number of parameters estiamted in the
regression and n is the number of observations.
Objects of class
"binaryreg" have print
and plot
methods.
Harrell, F.E., Jr., 2001, Regression modeling strategies with
applications to linear models, logistic regression and survival analysis:
New York, N.Y., Springer, 568 p.
Helsel, D.R., and Hirsch, R.M., 2002, Statistical methods in water
resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
McFadden, D., 1974, Conditional logit analysis of qualitative choice behavior: p. 105-142 in Zarembka, P. (ed.), Frontiers in Econometrics. London, Academic Press, 252 p.
roc
, leCessie.test
,
hosmerLemeshow.test
Reviews/accepts the results of an analysis: method for "default" data.
confirm(x, ...) ## Default S3 method: confirm(x, ...)
confirm(x, ...) ## Default S3 method: confirm(x, ...)
x |
the object to be confirmed |
... |
additional arguments required for specific methods |
The object returned depends on the specific method.
The default method simply returns the object and issues a warning.
Reviews/accepts the results of an analysis: method for "seasonalPeak" object.
## S3 method for class 'seasonalPeak' confirm(x, all = FALSE, plot.only = FALSE, ...)
## S3 method for class 'seasonalPeak' confirm(x, all = FALSE, plot.only = FALSE, ...)
x |
a seasonalPeak object. |
all |
a logical value indicating whether to accept the peak without
interactive user input or to force the user to process the peak. The default
value is |
plot.only |
a logical value indicating that only a plot is desired. If
|
... |
not used, required for compatibility with other methods. |
An object of class seasonalPeak. The confirmed object is a single
value that represents the estimate of the timing of the peak and four or
five aadditional ttributes.
NumPeaks: the number of seasonal peaks; either 1 or 2.
Models: candidate
loading models. The number indicates the number of months of constituent
loading.
hlife: candidate half-life values. The muner indicates the
half-life in terms of months.
Confirmed: logical indicating that the
object has not been confirmed.
If NumPeaks
is 2, then an additional attirbute Second.peak
that is a data frame of candidate parameters for the second peak is
included. See seasonalWave
for details.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308-1324.
## Not run: library(smwrData) data(QW05078470) # Simply click on the identified peak, and enter 1 for a single peak. confirm(with(QW05078470, seasonalPeak(dectime(DATES), P00665))) ## End(Not run)
## Not run: library(smwrData) data(QW05078470) # Simply click on the identified peak, and enter 1 for a single peak. confirm(with(QW05078470, seasonalPeak(dectime(DATES), P00665))) ## End(Not run)
Computes correlations among numeric data.
cor.all(data, method = "pearson", na.method = "pairwise", distribution = "normal")
cor.all(data, method = "pearson", na.method = "pairwise", distribution = "normal")
data |
any rectangular object such as a data.frame or matrix. |
method |
a character string indicating which correlation coefficient is to be used. One of "pearson," "kendall," or "spearman," can be abbreviated. |
na.method |
a character string indicating which method to use for missing values. One of "fail," "omit," "pairwise," can be abbreviated. |
distribution |
a character string indicating the assumed distribution of the data. One of "normal," "lognormal," or "log1p", which can be abbreviated. |
The null hypothesis is that the data are not correlated with one another.
The alternate hypothesis is that they are correlated with one another. This
is a two-sided test. For other options, see cor.all
.
If method
is "pearson," then the correlation is based on Pearson's
product moment correlation coefficient. If method
is "kendall," then
Kendall's tau is used to estimate a rank-based measure of association. If
method
is "spearman", then Spearman's rho is used to estimate the
correlation of the ranks of the data. The last two methods may be used if
the data do not necessarily come from a bivariate normal distribution.
If na.method
is "fail," then cor.all
stops if there are any
missing numeric values. If it is "omit," then all rows with any missing
values is removed before the correlations are computed. That option will
always produce a correlation matrix that is positive definite. If
na.method
is "pairwise," then missing values are removed from each
pairwise correlation.
If distribution
is "normal," then the assumption for method
=
"pearson" is that the data are bivariate normal. If distribution
is
"lognormal," then the assumption for method
= "pearson" is that the
data are bivariate log-normal and all data are natural log-transformed. If
distribution
is "log1p," then the assumption for method
=
"pearson" is that the data are bivariate log-normal after adding 1 and all
data are transformed using the log1p
function. The data are
transformed for any method
, but only produce a different result for
method
= "pearson."
An object of class "cor.all," which has these components:
estimates |
a matrix of the correlations between each pair of numeric
variables in |
p.values |
a matrix of the attained
p-values between each pair of numeric variables in |
counts |
a matrix of observations in each pair of numeric variables in
|
alternative |
a character string indicating the alternative hypothesis, always "two.sided" |
na.method |
a character string indicating the method to handle missing values |
method |
a character string describing the method to compute the correlations |
data.name |
the name of the data set, derived from |
data |
a data frame of the numeric variables |
call.method |
a character string indicating the method to compute the correlations |
distribution |
a character string indicating the distribution assumption of the data |
The print
, plot
, and summary
methods are
available for an object of class "cor.all."
Conover, W.J., 1980, Practical nonparametric statistics (2d
ed.): New York, Wiley, 512 p.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in water resources: U.S. Geological Survey Techniques of Water-Resources Investigations, book 4, chap. A3, 522 p.
cor.test
, plot.cor.all
,
summary.cor.all
## Not run: library(smwrData) data(TNLoads) cor.all(TNLoads[, 1:5]) cor.all(TNLoads, method="spearman") ## End(Not run)
## Not run: library(smwrData) data(TNLoads) cor.all(TNLoads[, 1:5]) cor.all(TNLoads, method="spearman") ## End(Not run)
Generates a matrix for curvi-linear modeling of trends.
curvi(x, ..., style = c("mw", "se"))
curvi(x, ..., style = c("mw", "se"))
x |
a vector of dates/times, assumed to be in dectime format. Missing values are permitted and result in corresponding missing values in the output. |
style |
either "mw" for midpoint time and halfwidth, or "se" for start and end times. |
... |
paired vectors specifying the midpoint of the trend and the halfwidth, or the start and end times. |
Curvi-linear trends are described in Appendix 1 (equations 1–9 through 1–11) in Vecchia (2005). The original form of the trend was described in terms of the midpoint of the trend and the halfwidth. Specifying the start and end times of the trend are added as an easy-to-use option.
A matrix with one column for each of the trends sprecified in ....
Curvi-linear trends provide a pleasing visual trend with a gradual
transition between trends in contrast to linear trends with sharp changes at
the endpoints.
Each trend is 0 prior to the start, and then increases to
a maximum of 1 and maintain that maximum value after the end. The overall
change is described by the regresison coefficient, rather than as a rate
described by trends
, for example.
Vecchia, A.V., 2005, Water-quality trend analysis and sampling desgin for streams in the Red River fo the North basin, Minnesota, Norht Dakota, and South Dakota, 1970–2001: U.S. Geolgical Survey Scientific Investigations Report 2005–5224, 54 p. Available on line at http://pubs.usgs.gov/sir/2005/5224/.
# Model with a single curvi-linear trend from 2001 through 2003 # First using midpoint and half width (default) and then start and end. curvi(2000 + seq(0,20)/5, c(2002, 1)) curvi(2000 + seq(0,20)/5, c(2001, 2003), style="se")
# Model with a single curvi-linear trend from 2001 through 2003 # First using midpoint and half width (default) and then start and end. curvi(2000 + seq(0,20)/5, c(2002, 1)) curvi(2000 + seq(0,20)/5, c(2001, 2003), style="se")
Tests for a single outlier in a sample from a normal distribution.
grubbs.test(x, alternate = "two.sided")
grubbs.test(x, alternate = "two.sided")
x |
a vector of numeric values. Missing values are allowed, but are ignored in the calculation. |
alternate |
the alternate hypothesis; must be "two.sided" for either a high or a low outlier, "high" to test only for a high outlier, or "low" to test only for a lopw outlier. |
An object of class "htest" having the following components:
statistic |
the value of the test statistic. |
p.value |
the attained p-value for the test. |
data.name |
a character string describing the name of the data used in the test. |
alternative |
a description of the altrnative and null hypotheses. |
method |
a description of the method. |
Grubbs, F., 1969, Procedures for Detecting Outlying Observations in Samples, Technometrics, v. 11, no. 1, pp. 1-21.
# A random sample with a high value set.seed(100) grubbstest <- rnorm(32) grubbs.test(grubbstest) qqnorm(grubbstest)
# A random sample with a high value set.seed(100) grubbstest <- rnorm(32) grubbs.test(grubbstest) qqnorm(grubbstest)
Performs the Hosmer-Lemeshow test for goodness-of-fit for a logistic regression model.
hosmerLemeshow.test(object, groups = 10)
hosmerLemeshow.test(object, groups = 10)
object |
an object of class "glm" on which to perform the test. |
groups |
the number of groups to use for the test. |
An object of class "htest" having these components:
method |
a description of the method. |
statistic |
the test statistic. |
p.value |
the attained p-level of the test statistic. |
data.name |
the name of |
alternative |
the alternate hypothesis—"some lack of fit." |
estimate |
a data frame of the size, expected value, and actual counts in each group. If the model has a single explanatory variable, then the mean value is included as column 4. |
The null hypothesis is "no lack of fit." Rejection of the null hypothesis indicates "some lack of fit."
Hosmer, D.W. and Lemeshow, S., 1980, Goodness-of-fit tests for the multiple logistic regression model: Communications in Statistics — Theory and Methods, v. 9, p. 1043–1069
Computes the bias and variance of move.1 predictions using the jackknife estimator.
jackknifeMove.1(object, newdata, type = c("response", "link"))
jackknifeMove.1(object, newdata, type = c("response", "link"))
object |
an object of class "move.1" on which to base the predicted values. |
newdata |
an optional data.frame in which to look for variables with
which to predict. If omitted, then the calibration data are used; the response
values are sorted from smallest to largest and |
type |
the type of prediction ("response" or "link"). See Details. |
If type
is "response," then the predicted values are
back-transformed. Otherwise, the predicted values are computed directly from
the model equation.
A vector of predictions matching newdata
or the model data.
Lorenz, D.L., 2015, smwrStats-an R package for analyzing hydrologic data, version 0.7.0: U.S. Geological Survey Open-File Report 2015-XXXX, XX p.
library(smwrData) data(IonBalance) # Build model for non missing Alkalinity IB.move <- move.1(Anion_sum ~ Cation_sum, data=IonBalance, subset=abs(Pct_Diff) < 10) print(IB.move) # Predict Anion_sum for missing Alkalinity predict(IB.move, IonBalance[1, ])
library(smwrData) data(IonBalance) # Build model for non missing Alkalinity IB.move <- move.1(Anion_sum ~ Cation_sum, data=IonBalance, subset=abs(Pct_Diff) < 10) print(IB.move) # Predict Anion_sum for missing Alkalinity predict(IB.move, IonBalance[1, ])
Tests for a temporal trend using Kendall's tau and computes the Sen slope estimate of the trend.
kensen.test(y, t, n.min = 10)
kensen.test(y, t, n.min = 10)
y |
the data collected over time. Missing values (NAs) are allowed and removed before computations. |
t |
the time corresponding to each observations in |
n.min |
the minimum number of observations for adjusting the p-value
for serial correlation. Used when |
An object of class "htest" containing the following components:
method |
a description of the method. |
statistic |
the value of Kendall's tau. |
p.value |
the p-value. |
estimate |
a named vector containing the Sen estimate of the slope in units per year, the median value of the data, the median value of time, the number of observations, and if the serial correction is applied, the effective number of observations (n*). |
data.name |
a string containing the actual name of the input series. |
coef |
a vector of an estimate of the intercept and slope. |
alternative |
a character string describing alternative to the test ("two.sided"). |
null.value |
the value for the hypothesized slope (0). |
A straight line of the form
ytrend = sen.slope * ( t - median.time ) + median.data
may be used as a trend line for graphically
portraying or detrending the data. It goes through the point
(t,y) = ( median.time , median.data )
with slope sen.slope.
Many tied values can cause misleading results.
The p-values for uniformly spaced data (t
values unit value like
years) are adjusted for lag-1 autoregressive serial correlation according to
the method described by Yue and Wang (2004) that adjusts for trend. In
keeping with the logic of seaken
, the p-value adjustment is never
performed for fewer than 10 observations. The user can suppress the
adjustment by setting the value of n.min
to Inf
.
Conover, W.J., 1980, Practical nonparametric statistics (2d
ed.): New York, Wiley, 512 p.
Helsel, D.R., and Hirsch, R.M., 2002, Statistical methods in water
resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
Hirsch, R.M., Alexander, R.B. , and Smith, R.A., 1991, Selection of methods
for the detection and estimation of trends in water quality: Water Resources
Research, v. 27 p. 803–813.
Kendall, M.G., 1938, A new measure of rank correlation: Biometrika v. 30, p.
81–89.
Kendall, M.G., 1976, Rank correlation methods (4th ed.): London, Griffin,
202 p.
Sen, P.K., 1968, Estimates of regression coefficient based on Kendall's tau:
Journal of the American Statisical Association, v. 63, p. 1379–1389.
Yue, S. and Wang. C., 2004, The Mann-Kendall test modified by effective sample size to detect trend in serially correlated hydrological series: Water Resources Management v. 18, p. 201-218.
## Not run: library(smwrData) data(SaddlePeaks) with(SaddlePeaks, kensen.test(Flow, Year)) ## End(Not run)
## Not run: library(smwrData) data(SaddlePeaks) with(SaddlePeaks, kensen.test(Flow, Year)) ## End(Not run)
Performs the le Cressie-van Houwelingen test for goodness-of-fit for a logistic regression model.
leCessie.test(object, bandwidth, newterms)
leCessie.test(object, bandwidth, newterms)
object |
an object of class "glm" on which to perform the test. |
bandwidth |
the bandwidth for smoothing the residuals. |
newterms |
any new variables to add to the model. Expressed as the right-hand side of a formula. |
If bandwidth
is missing, then the mean distance between observations
is used.
An object of class "lecessie" having these components:
method |
a description of the method. |
statistic |
the test statistic. |
parameters |
the degrees of freedom of the chi-squared test. |
p.value |
the attained p-level of the test statistic. |
data.name |
the name of |
alternative |
the alternate hypothesis–"some lack of fit." |
estimate |
the estimated values for the test. |
object |
the original |
target.object |
the |
bandwidth |
the bandwidth used for smoothing the residuals. |
max.distance |
the maximum distance between observations. |
smoothed.residuals |
the smoothed residuals. |
distance.matrix |
a matrix of the distances between observations. |
hat |
the hat matrix. |
The null hypothesis is "no lack of fit." Rejection of the null hypothesis indicates "some lack of fit."
le Cessie, S. and van Houwelingen, H.C., 1995, Testing the fit of a regression model via score tests in random effects models: Biometrics, v. 51, p 600-614.
Creates the right matrices for model.frame.default
when predicting from models with trends
terms. Used only internally.
## S3 method for class 'trends' makepredictcall(var, call)
## S3 method for class 'trends' makepredictcall(var, call)
var |
a variable. |
call |
the term in the formula, as a call. |
A replacement for call
for the prediction variable.
Calculates the Maintenance Of Variance Extension, Type 1 (MOVE.1) for record extension by fitting a Line of Organic Correlation (LOC) regression model.
move.1(formula, data, subset, na.action, distribution = "normal")
move.1(formula, data, subset, na.action, distribution = "normal")
formula |
a formula object with the response variable on the left of the ~ operator and a single explantory variable on the right. |
data |
the data frame containing the variables named in |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function that indicates what should happen when the data
contain missing values (NAs). The default is set by the |
distribution |
either "normal," "lognormal," or "commonlog" indicating nature of the bivariate distribution, See Details. |
If distribution
is "normal," then the data in x
and y
are assumed to have a bivariate normal distribution. Otherwise, they are
assumed to have a bivariate log-normal distribution and a logarithmic
transform is applied to both x
and y
before analysis. The
natural logarithm is used if distribution
is "lognormal" and the
commmon logarithm is used if distribution
is "commonlog."
An object of class "move.1" having these components:
coefficients |
the intercept and slope of the line describing the fit. |
na.action |
a character string indicating the special handling of NAs. |
R |
Pearson's correlation coefficient. |
call |
the
matched call to |
fitted.values |
the fitted LOC values for the response. |
residuals |
a 2-column matrix containing the
signed distance from the predicted to to the corresponding |
x |
the (possibly transformed) values for |
y |
the (possibly transformed) values for |
x.stats |
the mean and standard deviation of the (possibly
transformed) values for |
y.stats |
the mean and standard
deviation of the (possibly transformed) values for |
var.names |
the names of |
model |
the model frame. |
Objects of class "move.1" have print
, predict
, and
plot
methods.
Hirsch, R.M., 1982, A comparison of four streamflow record extension techniques: Water Resources Research, v. 18, p. 1081–1088.
library(smwrData) data(IonBalance) # Build model for non missing Alkalinity IB.move <- move.1(Anion_sum ~ Cation_sum, data=IonBalance, subset=abs(Pct_Diff) < 10) print(IB.move)
library(smwrData) data(IonBalance) # Build model for non missing Alkalinity IB.move <- move.1(Anion_sum ~ Cation_sum, data=IonBalance, subset=abs(Pct_Diff) < 10) print(IB.move)
Calculates the Maintenance Of Variance Extension, Type 2 (MOVE.2) for record extension by fitting a Line of Organic Correlation (LOC) regression model.
move.2(formula, data, subset, distribution = "normal", lag = 0)
move.2(formula, data, subset, distribution = "normal", lag = 0)
formula |
a formula object with the response variable on the left of the ~ operator and a single explantory variable on the right. |
data |
the data frame containing the variables named in |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
distribution |
either "normal," "lognormal," "commonlog," "log1p," or an object of class "optimBoxCox" indicating the nature of the bivariate distribution, see Details. |
lag |
the number of days to account for time of travel between the explanatory and response sites. If the explanatory site is upstream, then lag can be positive, otherwise, lag can be negative to account for the travel time between the sites. |
MOVE.2 has a necessarily predefined method for missing values—the
response variable is assumed to contain missing values and they are the values
to be estimated by the model equation. For the function move.2
, missing
values in the explanatory variable are excluded from the computations.
If distribution
is "normal," then the data in the explanatory variable
and the response variable are assumed to have a bivariate normal distribution.
Otherwise, they are assumed to have a bivariate log-normal distribution and a
logarithmic transform is applied to both the explanatory variable and the
response variable before analysis. The natural logarithm is used if
distribution
is "lognormal" and the commmon logarithm is used if
distribution
is "commonlog." If either the response or the explanatory
has zero values, then the "log1p" option can be used. That option adds 1 to
each value and then computes the natural logarithm.
Alternatively, the output from optimBoxCox
that contains both the response and explanatory variables can be supplied to
transform those variables by other than a logarithmic transform.
An object of class "move.2" having these components:
coefficients |
the intercept and slope of the line describing the fit. |
R |
Pearson's correlation coefficient. |
p.value |
the p-value from the correlation test, given a two-sided alternate hypothesis. |
call |
the matched call to |
fitted.values |
the fitted LOC values for the response. |
residuals |
a 2-column matrix containing the signed distance from the predicted to to the corresponding the response variable and the explanatory variable values. |
x |
the (possibly transformed and lagged) values for the explanatory variable. |
y |
the (possibly transformed) values for the response variable. |
lag |
the value of the |
xstats |
the mean and standard deviation of the (possibly transformed) values for the explanatory variable. |
ystats |
the mean and standard deviation of the (possibly transformed) values for the response variable. |
var.names |
the names of the response variable and the explanatory variable. |
model |
the model frame. |
data |
the data frame supplied in |
distribution |
the value supplied in |
Objects of class "move.2" have print
, predict
, and
plot
methods.
Hirsch, R.M., 1982, A comparison of four streamflow record
extension techniques: Water Resources Research, v. 18, p. 1081–1088.
Moog, D.B., Whiting, P.J., and Thomas, R.B., 1999, Streamflow record extension using
power transformations and applicaitons to sediment transport: Water Resources
Research, v. 35, p 243–254.
predict.move.2
, plot.move.2
,
optimBoxCox
## Not run: # See the vignette: vignette("RecordExtension", package="smwrStats") ## End(Not run)
## Not run: # See the vignette: vignette("RecordExtension", package="smwrStats") ## End(Not run)
Performs multiple comparison tests among groups of data. The tests may be either parametric (Yandell, 1997), nonparametric (Higgins, 2004), or Dunn's nonparametric (Glantz, 2005).
multicomp.test(x, g, method = "parametric", critical.value = "", alpha = 0.05)
multicomp.test(x, g, method = "parametric", critical.value = "", alpha = 0.05)
x |
the numeric vector of observations. Missing values (NAs) are allowed and removed before the test is performed. |
g |
any group vector for the observations. Missing values (NAs) are allowed and removed before the test is performed. |
method |
a character string describing the test. Only the first character is necessary. See Details. |
critical.value |
a character string describing the method to use for determining the critical value. Only the first character is necessary. See Details. |
alpha |
the significance level of the test. See Note. |
The choices for method
are "parametric," "nonparametric," and
"dunn." If the method
is "parametric," then the comparisons are based on
the means and variances of the raw data and the valid choices for
critical.value
are "tukey" (default), "bonferroni," or "lsd." Otherwise,
the comparisons are based on the ranks of the data. Valid choices for
critical.value
are "tukey" (default), "bonferroni," or "lsd" when
method
is "nonparametric" and "sidak" (default) or "bonferroni"
when method
is "dunn." The basic diffference between the default
nonparametric method and Dunn's nonparametric method is in the handling of ties.
An object of class MCT containing the following components:
title |
a description of the test. |
cv.method |
the method used to compute the critical value. |
alpha |
the value of |
crit.value |
the critical value for the pairwise comparisons. |
response |
the name of the response variable. |
groups |
the name of the group variable. |
means |
the means for each group. |
sizes |
the number of observations in each group. |
table |
the table of the results of the pairwise comparisons. |
assoc |
a data frame containing the possible association for each group. |
All computations of the variance for unequal group sizes are based on
the harmonic mean as described in Yandell (1997). That adjustment is only
approximate when critical.value
is "tukey" and method
is
"parametric" but useful when the design is slightly unbalanced.
The
default nonparametric method method
= "nonparametric" is only
assymptotically unbiased when some data are tied. For smaller data sets with
small numbers of ties, it may be preferable to use Dunn's nonparametric
method method
= "dunn."
Glantz, S.A., 2005, Primer of biostatistics: McGraw Hill, New York, 520 p.
Higgins, J.J., 2004, Introduction to modern nonparametric statistics: Pacific Grove, Calif., Brooks/Cole, 384 p.
Yandell, B.S., 1997, Practical data analysis for designed experiments: London, United Kingdom, Chapman & Hall, 437 p.
Computes diagnostics for linear regression.
multReg(object)
multReg(object)
object |
the linear regression model object |
A object of class "multReg" having the following components:
aovtab |
the analysis of variance table, using the type II sum of squares. See Note. |
parmests |
a summary of |
vif |
a named vector of variance inflation factors. |
diagstats |
a data.frame containing the observed values, predicted values, residuals, standardized residuals, studentized residuals, leverage, Cook's D, and dfits for each observation. |
crit.val |
a named vector of the critical values for leverage, Cook's D, and dfits. See Note |
flagobs |
a logical vector indicating which observations exceeded at least one of the critical values. |
object |
the |
x |
the model matrix of explanatory variables. |
The type II sum of squares are calculated according to the principle
of marginality, testing each term after all others, except ignoring the
term's higher-order relatives. This type sum of squares is useful for
assessing the overall marginal effect of each term in the model.
The critical values for the test criteria are computed as: leverage, 3p/n; Cook's D, median quantile for the F distribution with p+1 and n-p degrees of freedom; and dfits, the .01 quantile of the grubbs distribution for n observations time the square root of (p/n), where p is the number of parameters estimated in the regression and n is the number of observations.
Objects of class "multReg" have print
and
plot
methods.
Draper, N.R. and Smith, H., 1998, Applied Regression Analysis,
(3rd ed.): New York, Wiley, 724 p.
Computes Box-Cox transformations that maximimize the log likelihood of the transformations.
optimBoxCox(X, start = NULL)
optimBoxCox(X, start = NULL)
X |
a data frame or matrix of the data to find the optimized Box-Cox transforms to produce multivariate normality. Can also be a numeric vector for a simple Box-Cox transform to normality. |
start |
a numeric vector of length matching the number of columns in |
An object of class "optimBoxCox" having these components:
start |
the starting values for the Box-Cox transformations. |
criterion |
the log-likelihood of the Box-Cox transformations. |
names |
the names of the columns. |
lambda |
the values of the Box-Cox transformations. |
stderr |
the standard errors of the Box-Cox transformations. |
return.code |
the convergence value returned by |
gm |
the geometric means of the data in |
data |
the data in |
The maximum likelihood estimate of the Box-Cox transformations corresponds to the
minimum determinant of the variance-covariance matrix of the transformed X
. The
methodology is described in Andrews and others (1971).
Andrews, D.F., Gnanadesikan, R., and Warner, J.L., 1971, Transformations of multivariate data: Biometrics, v. 27, p. 825–840.
Computes the empirical cumulative percent or percent exceedance of observed data
for specific values.
percentile(x, q, test = ">=", na.rm = TRUE, percent = TRUE, ...) ## Default S3 method: percentile(x, q, test = ">=", na.rm = TRUE, percent = TRUE, ...)
percentile(x, q, test = ">=", na.rm = TRUE, percent = TRUE, ...) ## Default S3 method: percentile(x, q, test = ">=", na.rm = TRUE, percent = TRUE, ...)
x |
a numeric vector representing the observed data. |
q |
a vector of quantiles for which the cumulative percent or percent exceedence is desired. |
test |
a character string indicating the test. The default value, '>=,' is the percent equalling or exceeding the quantile and '<' would return the cumulative percent below the quantile. |
na.rm |
a logical value indication whether missing values (NAs) should
be removed or not. If na.rm is |
percent |
a logical value indicating whether the result should be
expressed as a percent or proportion. The default value, |
... |
not used, required for method function |
A named vector as long as q
corresponding to the requested
value.
The stats package contains the ecdf
function that performs a
similar function when test
is "<=."
set.seed(2342) Xr <- rlnorm(24) # The percentage of the observarions greater than or equal to 2 percentile(Xr, 1:5)
set.seed(2342) Xr <- rlnorm(24) # The percentage of the observarions greater than or equal to 2 percentile(Xr, 1:5)
Produce a series of diagnostic plots for statistical analyses.
## S3 method for class 'ancovaReg' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'binaryreg' plot(x, which = 2:5, set.up = TRUE, bandw = 0.3, ...) ## S3 method for class 'cor.all' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'lecessie' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'move.1' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'move.2' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'multReg' plot(x, which = "All", set.up = TRUE, span = 1, ...) ## S3 method for class 'senSlope' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'roc' plot(x, which = "All", set.up = TRUE, ...)
## S3 method for class 'ancovaReg' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'binaryreg' plot(x, which = 2:5, set.up = TRUE, bandw = 0.3, ...) ## S3 method for class 'cor.all' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'lecessie' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'move.1' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'move.2' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'multReg' plot(x, which = "All", set.up = TRUE, span = 1, ...) ## S3 method for class 'senSlope' plot(x, which = "All", set.up = TRUE, span = 0.8, ...) ## S3 method for class 'roc' plot(x, which = "All", set.up = TRUE, ...)
x |
the object to be plotted. |
which |
a character string or sequence of integers indicating which diagnostic plots to produce. See Details. |
set.up |
set up the graphics page? Set to |
span |
the smoothing parameter for |
bandw |
the bandwidth for kernel smoothing for the Hosmer-Lemeshow plot. |
... |
not used, required for method function. These diagnostic plots have fixed graphics output. |
For objects of class "ancovaReg" and "multReg," the argument which
can be a character string, "All" or any of a sequence of numbers from 1 to
8. If it is "All," then all plots are produced. For class "multReg," can
also be the name of an explanatory variable so that a residual dependence
plot is created for a single variable.
Numeric values for which
:
Fitted with separate factor levels vs. Observed
Fitted vs. Residual
S-L plot
A correlogram if dates are available in the model or in the data set
Q-normal and Tukey boxplots for each factor level
Influence plot
Outliers plot
Residual dependence plots for each explanatory variable
For objects of class "binaryreg," the argument which
can be a
character string, "All" or any of a sequence of numbers from 1 to 5. If it
is "All," then all plots are produced. By default, the le Cessie-van
Houwelingen diagnotic plot is not shown as it can be difficult to interpret.
Numeric values for which
:
Le Cessie-van Houwelingen overall fit
Overall fit
Classification plot
ROC plot
Lin-Wei-Ying partial residual plots
For objects of class "cor.all," the argument which
must be either
"All," "Lower," or the name of one of the variables. If which
is
"All", then the full scatter plot matrix is shown if there are 4 or fewer
variables, otherwise indivdual paired plots are shown. If which
is
"Lower", then the lower part of the scatter plot matrix is shown if there
are 5 or fewer variables, otherwise indivdual paired plots are shown. If
which
is the name of a variable, then only the scatter plots of that
variable and all others are shown.
For objects of class "lecessie," the argument which
must be either
"All," "First," the name of one or more of the variables, or a number
indicating which variable to plot. If which
is "All", then the
fitted vs. Residual and all partial residual plots are created. If
which
is "First", then only the fitted vs. Residual plot is created.
If which
is one or more of the variable names, then those partial
residual plots are created. If which
is numeric, then the fitted vs.
Residual (1) sequentially numbered partial residual plots are created.
For objects of class "move.1" or "move.2," the argument which
can be a character
string, "All" or any of a sequence of numbers from 1 to 3. If it is "All,"
then all plots are produced. Numeric values for which
:
x on y and y on x
The LOC regression line with ellipse
Q-Q plot of x and y
For objects of class "roc," the argument which
can be a character
string, "All" or 1. The only graph is the receiver operating characteristics
curve for a logistic regression.
For objects of class "senSlope," the argument which
can be a
character string, "All" or 1. The only graph avaialble is a scatter plot of
the data with the regression line in green and a smoothed line in cyan.
The object x
is returned invisibly.
le Cessie, S. and van Houwelingen, H.C., 1995, Testing the fit
of a regression model via score tests in random effects models: Biometrics,
v. 51, p. 600–614.
Lin, D.Y., Wei, L.J., and Ying, Z., 2002,
Model-checking techniques based on cumulative residuals: Biometrics, v. 58,
p. 1–12.
ancovaReg
, binaryReg
,
cor.all
, leCessie.test
, move.1
,
move.2
, multReg
, roc
, senSlope
,
loess.smooth
, setPage
Creates diagnostic plots for selected hypothesis tests.
## S3 method for class 'htest' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'kensen' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'ppcc' plot(x, which = "All", set.up = TRUE, ...)
## S3 method for class 'htest' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'kensen' plot(x, which = "All", set.up = TRUE, ...) ## S3 method for class 'ppcc' plot(x, which = "All", set.up = TRUE, ...)
x |
an object having classes "htest" and some other class for which a
|
which |
either "All" or a number indicating which plot, see Details. |
set.up |
set up the graphics page? |
... |
not used, required for other methods. |
The kensen
method for plot
graphs the y and t data with the
best fit line.
The ppcc
method creates a single graph that shows the q-normal plot
with a line showing the fit.
The object x
is returned invisibly.
Computes the probability plot correlation coefficient test for departures from normality.
ppcc.test(x)
ppcc.test(x)
x |
a vector of numeric values. Missing values are allowed, but are ignored in the calculation. |
An object of class "htest" having the following components:
statistic |
the value of the test statistic. |
p.value |
the attained p-value for the test. |
data.name |
a character string describing the name of the data used in the test. |
method |
a description of the method. |
The PPCC test is attractive because it has a simple, graphical
interpretation: it is a measure of the correlation in a Q-normal plot of the
data. As such, it is related to the Shapiro-Wilk test (Shapiro and Wilk,
1965) for normality.
The distribution function of the test statistic is empirical. This
application uses the "pocket calculator" approximation for computing the
p-value of the observed statistic (Royston, 1992).
Filliben, 1975, The PPCC test for normality: Technometrics, v.
17, no. 1, p. 111–117.
Looney, S.W., and Gulledge, T.R., 1985, Use of the correlation coefficient
with normal probability plots: The American Statistician, v. 39, p.
75–79.
Royston, J.P., 1992, A pocket-calculator algorithm for the Shapiro-Francia
test of non-normality–an application to medicine: Statistics in Medicine,
v. 12, p. 181–184.
Shapiro, S.S., and Wilk, M.B., 1965, An analysis of variance test for normality (complete samples): Biometrika, v. 52, p. 591–611.
## These data should produce an attained p-value less than 0.001 set.seed(45) ppcc.test.data <- rnorm(32) qqnorm(ppcc.test.data) abline(mean(ppcc.test.data), sd(ppcc.test.data)) ppcc.test(ppcc.test.data)
## These data should produce an attained p-value less than 0.001 set.seed(45) ppcc.test.data <- rnorm(32) qqnorm(ppcc.test.data) abline(mean(ppcc.test.data), sd(ppcc.test.data)) ppcc.test(ppcc.test.data)
Predicts new values from a maintenance of variance extension, type 1 (MOVE.1) model fit.
## S3 method for class 'move.1' predict(object, newdata, type = c("response", "link"), var.fit = FALSE, ...)
## S3 method for class 'move.1' predict(object, newdata, type = c("response", "link"), var.fit = FALSE, ...)
object |
an object of class "move.1" on which to base the predicted values. |
newdata |
an optional data.frame in which to look for variables with which to predict. If omitted, then the fitted values are used. |
type |
the type of prediction ("response" or "link"). See Details. |
var.fit |
logical if |
... |
not used, required for method function. |
If type
is "response," then the predicted values are
back-transformed. Otherwise, the predicted values are computed directly from
the model equation.
If var.fit
is FALSE
, then a vector of predictions matching
newdata
or the model data. If var.fit
is TRUE
, then a data frame
containing the columns:
fit |
the predicted values |
var.fit |
the variance of the predicted values |
Lorenz, D.L., 2015, smwrStats-an R package for analyzing hydrologic data, version 0.7.0: U.S. Geological Survey Open-File Report 2015-XXXX, XX p.
library(smwrData) data(IonBalance) # Build model for non missing Alkalinity IB.move <- move.1(Anion_sum ~ Cation_sum, data=IonBalance, subset=abs(Pct_Diff) < 10) print(IB.move) # Predict Anion_sum for missing Alkalinity predict(IB.move, IonBalance[1, ])
library(smwrData) data(IonBalance) # Build model for non missing Alkalinity IB.move <- move.1(Anion_sum ~ Cation_sum, data=IonBalance, subset=abs(Pct_Diff) < 10) print(IB.move) # Predict Anion_sum for missing Alkalinity predict(IB.move, IonBalance[1, ])
Predicts new values from a maintenance of variance extension, type 2 (MOVE.2) model fit.
## S3 method for class 'move.2' predict(object, newdata, type = c("response", "link"), ...)
## S3 method for class 'move.2' predict(object, newdata, type = c("response", "link"), ...)
object |
an object of class "move.2" on which to base the predicted values. |
newdata |
an optional data.frame in which to look for variables with which to predict. If omitted, then the fitted values are used. |
type |
the type of prediction ("response" or "link"). See Details. |
... |
not used, required for method function. |
If type
is "response," then the predicted values are
back-transformed. Otherwise, the predicted values are computed directly from
the model equation.
A vector of predictions matching newdata
or the model data.
If lag was set to a non-zero value in the call to move.2
, then the
explanatory variable is lagged only when predictiong values from the calibration
data (newdata
is not supplied.) This facilitates prediction of selected
statistics at the response site rather than the complete record.
## Not run: # See the vignette: vignette("RecordExtension", package="smwrStats") ## End(Not run)
## Not run: # See the vignette: vignette("RecordExtension", package="smwrStats") ## End(Not run)
Predicts new values from Sen slope (senSlope) model fit.
## S3 method for class 'senSlope' predict(object, newdata, ...)
## S3 method for class 'senSlope' predict(object, newdata, ...)
object |
an object of class "senSlope" on which to base the predicted values. |
newdata |
an optional data.frame in which to look for variables with which to predict. If omitted, then the fitted values are used. |
... |
not used, required for method function. |
A vector of predictions matching newdata
or the model data.
Predicts bias-corrected expected mean response values from a log-transformed regression model, using either the minimum variance unbiased estimate(MVUE), Duan's smoothing estimate, or Ferguson's maximum likelihood estimate.
predictDuan(object, newdata, back.trans = exp) predictFerguson(object, newdata, Log10 = FALSE) predictMVUE(object, newdata, Log10 = FALSE)
predictDuan(object, newdata, back.trans = exp) predictFerguson(object, newdata, Log10 = FALSE) predictMVUE(object, newdata, Log10 = FALSE)
object |
an object of class "lm" on which to base the predicted values. |
newdata |
an optional data.frame in which to look for variables with which to predict. If omitted, then the fitted values are used. |
back.trans |
the back-transformation function. For common log
transforms, use |
Log10 |
is the transform of the response variable the common log? |
A vector of predictions matching newdata
or the model data.
Bradu, D. and Mundlak, Y., 1970, Estimation in the lognormal linear models: Journal of the American Statistical Association, v. 65, no. 329, p. 198–211.
Duan, N., 1983, Smearing estimate: a nonparametric retransformation method: Journal of the American Statistical Association, v. 78, p. 159–178.
Ferguson, R.I. 1986, River loads underestimated by rating curves: Water Resources Research, v. 22, p 74–76.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in water resources: U.S. Geological Survey Techniques of Water-Resources Investigations, book 4, chap. A3, 522 p.
## Generate random log-normal data and build the regression model set.seed(111) XX.df <- data.frame(x=sort(runif(32, 1, 5)), y=rlnorm(32, seq(1,2, length.out=32))) XX.lm <- lm(log(y) ~ x, data=XX.df) ## Compare the results for x=1:5 ## The simple back-transformed estimates exp(predict(XX.lm, newdata=data.frame(x=1:5))) ## The bias corrected estimates of the mean response predictFerguson(XX.lm, newdata=data.frame(x=1:5)) predictDuan(XX.lm, newdata=data.frame(x=1:5)) predictMVUE(XX.lm, newdata=data.frame(x=1:5))
## Generate random log-normal data and build the regression model set.seed(111) XX.df <- data.frame(x=sort(runif(32, 1, 5)), y=rlnorm(32, seq(1,2, length.out=32))) XX.lm <- lm(log(y) ~ x, data=XX.df) ## Compare the results for x=1:5 ## The simple back-transformed estimates exp(predict(XX.lm, newdata=data.frame(x=1:5))) ## The bias corrected estimates of the mean response predictFerguson(XX.lm, newdata=data.frame(x=1:5)) predictDuan(XX.lm, newdata=data.frame(x=1:5)) predictMVUE(XX.lm, newdata=data.frame(x=1:5))
Computes the prediction error sum of squares statistic (PRESS) (Helsel and Hirsch, 2002) for a linear regression model.
press(model)
press(model)
model |
an object of class "lm" or the output from |
The prediction error sum of squares statistic.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
Prints the results of a analysis of covariance (ancovaReg
).
## S3 method for class 'ancovaReg' print(x, digits = 3, ...)
## S3 method for class 'ancovaReg' print(x, digits = 3, ...)
x |
an object of class "ancovaReg" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains the ANOVA table for the orignal models, the regression summary for the final model, variance inflation factors for each explanatory variable in the final model, and selected test criteria with observations that exceed one of more of the criteria.
Prints the results of a logistic regression diagnotic analysis
(binaryReg
).
## S3 method for class 'binaryreg' print(x, digits = 4, ...)
## S3 method for class 'binaryreg' print(x, digits = 4, ...)
x |
an object of class "binaryreg" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The original regression model should be the output from glm
with
binomial
as the value for the family
argument.
The object x
is returned invisibly.comp2 Description of
'comp2'
The printed output contains the original call; a summary of the deviance residuals; the coefficients with their estimates, standard errors, z scores, and attained p-values; a summary of the deviance comparison between the model and null model (no explanatory varaibles), also known as the G-squared or -2 log-likelihood; the response profile, which matches the value of the response variable to 0 or 1 and the number of observations in each group; the le Cessie-van Houwelingen and Hosmer-Lemeshow goodness of fit tests; three assessments of the predictinve power: the MaFadden R-squared and the adjusted R-squared, the classification table, the concordance index, and the area under the reciever operating characteristics curve; and selected test criteria with observations that exceed one of more of the criteria.
Prints the results of a correlation test (cor.all
).
## S3 method for class 'cor.all' print(x, digits = 4, lower = TRUE, ...)
## S3 method for class 'cor.all' print(x, digits = 4, lower = TRUE, ...)
x |
an object of class "cor.all" from |
digits |
the number of significant digits to print numeric data. |
lower |
logical, print only the lower triangular matrix? Otherwise, print the full, square matrix. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains a description of the test; and 3 lines for each comparison that is printed: the correlation statistic, the attained p-value, and the number of pairs.
Prints the results of a le Cessie-van Houwelingen test
(leCessie.test
).
## S3 method for class 'lecessie' print(x, digits = 4, ...)
## S3 method for class 'lecessie' print(x, digits = 4, ...)
x |
an object of class "lecessie" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed ouput is very similar to the printed ouput for class "htest," the output from a hypothesis test, but includes the additional output summarizing the distance between observations, which can be useful for comparing the reults with different bamdwidth settings.
Prints the results of a multiple comparison test (multicomp.test
).
## S3 method for class 'MCT' print(x, digits = 4, ...)
## S3 method for class 'MCT' print(x, digits = 4, ...)
x |
an object of class "MCT" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains a description of the test, critical values, the variables in the test, and two tables: the paired comparisons and associations among the groups. The table of the paried comparisons shows the groups in the comparison, the estimate of the difference between the group means, the standard error of the difference, lower and upper confidence intervals, and a flag that indicates if the confidence interval excludes 0, which indicates wheter the difference is significantly different from 0 at the user-specified value. The table of asociations shows the group, the mean value of the response, the number of observations in the group, and any number of columns names "A," "B," and so forth that represent possible associations of the groups whare an "X" is present in the group.
Prints the results of a move.1 analysis (move.1
).
## S3 method for class 'move.1' print(x, digits = 4, ...)
## S3 method for class 'move.1' print(x, digits = 4, ...)
x |
an object of class "move.1" from |
digits |
the number of significant digits to print numeric data. |
... |
additonal arguments for printing numeric values. |
The object x
is returned invisibly.
The printed output contains the call, the regression coefficients, and selected statistics of the variables.
Prints the results of a MOVE.2 analysis (move.2
).
## S3 method for class 'move.2' print(x, digits = 4, ...)
## S3 method for class 'move.2' print(x, digits = 4, ...)
x |
an object of class "move.2" from |
digits |
the number of significant digits to print numeric data. |
... |
additonal arguments for printing numeric values. |
The object x
is returned invisibly.
The printed output contains the call, the regression coefficients, and selected statistics of the variables.
Prints the results of a multiple regression diagnostic analysis
(multReg
).
## S3 method for class 'multReg' print(x, digits = 3, ...)
## S3 method for class 'multReg' print(x, digits = 3, ...)
x |
an object of class "multReg" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains the regression model call; a summary of the residuals; A table of the coefficients with their estiamtes, standard errors, t vlaues, and attained probability levels; the residual standard error; R-squared and F-statistical summaries; Model comparison statistics; if more than one explanatory variable a type II sum-of-squares analysis of variance table and variance inflation factors; and selected test criteria with observations that exceed one of more of the criteria.
Prints the results of a multivariate unconditional Box-Cox transformation.
## S3 method for class 'optimBoxCox' print(x, digits = 4, ...)
## S3 method for class 'optimBoxCox' print(x, digits = 4, ...)
x |
an object of class "optimBoxCox" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains a table showing the power transformation values and their standard errors.
Prints the results of a receiver operator characteristics (ROC) for a logistic regression model.
## S3 method for class 'roc' print(x, digits = 3, ...)
## S3 method for class 'roc' print(x, digits = 3, ...)
x |
an object of class "roc" from |
digits |
the number of significant digits to print numeric data. |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains the area under to ROC curve.
Prints the results of a seasonal peak analysis (seasonalPeak
).
## S3 method for class 'seasonalPeak' print(x, digits = 3, details = FALSE, ...)
## S3 method for class 'seasonalPeak' print(x, digits = 3, details = FALSE, ...)
x |
an object of class "seasonalPeak" from |
digits |
the number of significant digits to print numeric data. |
details |
logical, print the model details? |
... |
not used for method, required for other methods. |
The object x
is returned invisibly.
The printed output contains the default time-of-peak value and potential alternate values for unconfirmed peaks and the number of peaks, timing of the primary peak, and optionally the model information for confirmed peaks.
Prints the results of a Sen slope analysis (senSlope
).
## S3 method for class 'senSlope' print(x, digits = 4, ...)
## S3 method for class 'senSlope' print(x, digits = 4, ...)
x |
an object of class "senSlope" from |
digits |
the number of significant digits to print numeric data. |
... |
additonal arguments for printing numeric values. |
The object x
is returned invisibly.
The printed output contains the call, the smalles and largest residuals, the regression coefficients, and the confidence limits of the Sen slope.
Prints the results of a correlation test (cor.all
) or correlation matrix
in a compact form indicating positive or negative correlation.
printCor(x, criterion = 0.75)
printCor(x, criterion = 0.75)
x |
an object of class "cor.all" from |
criterion |
a numeric value indicating the test criterion for showing a value. |
The object x
is returned invisibly.
The printed output is a compressed table showing "+" where the value in x
is greater than criterion
, "-" where the value in x
is less than
-criterion
, "\" on the diagonal, if x
is symmetric, and "." where x
has a missing value, and " " otherwise.
## Not run: library(smwrData) data(TNLoads) printCor(cor(TNLoads, method="spearman"), .5) ## End(Not run)
## Not run: library(smwrData) data(TNLoads) printCor(cor(TNLoads, method="spearman"), .5) ## End(Not run)
Computes the cumulative probability and quantiles for the Grubbs distribution describing an outlier in a sample from Normal distribution.
qgrubbs(alpha, N) pgrubbs(G, N)
qgrubbs(alpha, N) pgrubbs(G, N)
alpha |
the significance level. |
N |
the number of values in the sample. |
G |
the maximum or minimum scaled difference from the mean. |
The function pgrubbs
returns the attained p-value, not the
cumulative distribution, for the maximum scaled distance from the mean. the
funciton qgrubbs
returns the one-sided critical value for the maximum
scaled difference from the mean.
Grubbs, F., 1969, Procedures for Detecting Outlying Observations in Samples, Technometrics, v. 11, no. 1, pp. 1-21.
# The difference is due to rounding errors pgrubbs(c(.9, .95, .99), 32) qgrubbs(c(5.905348, 5.483234, 5.159097), 32)
# The difference is due to rounding errors pgrubbs(c(.9, .95, .99), 32) qgrubbs(c(5.905348, 5.483234, 5.159097), 32)
Computes sample quantiles and confidence limts for specified probabilities.
qtiles.CI(x, probs = 0.5, CI = 0.9, bound = c("two.sided", "upper", "lower"), na.rm = TRUE)
qtiles.CI(x, probs = 0.5, CI = 0.9, bound = c("two.sided", "upper", "lower"), na.rm = TRUE)
x |
numeric vector to compute the sample quantiles. |
probs |
numeric vector of desired probabilities with values between 0 and 1. |
CI |
the minimum desired confidence interval for each level specifed in probs. |
bound |
a character string indicating the desired bounds, "two.sided"
means the two-sided interval, "upper" means the upper bound of the interval,
and "lower" means the lower bound of the interval. Only a single character
is needed. The lower confidence limit is |
na.rm |
logical; if TRUE, then missing values are removed before computation. |
A matrix of sample quantiles, the lower confidence limit, the upper confidence limit, and the probability represented by the confidence interval corresponding to the probs levels in the sorted x data.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
## Generate a random sample set.seed(222) XX.rn <- rexp(32) qtiles.CI(XX.rn, probs=c(.25, .5, .75))
## Generate a random sample set.seed(222) XX.rn <- rexp(32) qtiles.CI(XX.rn, probs=c(.25, .5, .75))
Computes sample quantiles corresponding to the given probabilities:
method for "numeric" data. The smallest observation corresponds to a probability
of 0 and the largest to a probability of 1. This method function is a simple
wrapper for the default function that sets the default type
to 2 for
numeric data.
## S3 method for class 'numeric' quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 2, ...)
## S3 method for class 'numeric' quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 2, ...)
x |
numeric vector whose sample quantiles are wanted. |
probs |
numeric vector of probabilities with values in the range from 0 to 1. |
na.rm |
remove missing values |
names |
include names of the probabilities, expressed as percentages? |
type |
an interger between 1 and 9 that selects the method for computing the quantile. See Note. |
... |
further arguments passed to or from other methods. |
An optionally named vector corresponding to the quantiles of
x
for the selected probabilities.
Helsel and Hirsh (2002) define the 75th percentile as "a value which
exceeds no more than 75 percent of the data and is exceeded by no more than
25 percent of the data." This rule can be easily extended to other
percentiles. The selection of type
equal to 2 ensures that this rule
is met for all data. The rule stated by Helsel and Hirsch is very useful for
an emprical description of the data, but Hyndman and Fan (1996) describe the
slection of type
for other applications.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
Hyndman, R.J. and Fan, Y. 1996, Sample quantiles in statistical packages: American Statistician, v. 50, p. 361-365.
# The default value for type (7) will compute a value that is exceeded by 4 values # for a sample of size 14 quantile(seq(14), type=7) # But 4/14 is greater than 25 percent. But setting type to 2 will result in # only 3 values that are larger than the computed 75th percentile. quantile(seq(14))
# The default value for type (7) will compute a value that is exceeded by 4 values # for a sample of size 14 quantile(seq(14), type=7) # But 4/14 is greater than 25 percent. But setting type to 2 will result in # only 3 values that are larger than the computed 75th percentile. quantile(seq(14))
Computes the regional Kendall trend test with Sen slope estimator.
regken(series, correct = nrow(series) > 9)
regken(series, correct = nrow(series) > 9)
series |
a numeric matrix with rows representing the annual observations and columns representing the sites. Missing values are permitted. |
correct |
logical, if |
An object of class "htest" also inhereting class "seaken" containing the following components:
method |
a description of the method. |
statistic |
the value of Kendall's tau. |
p.value |
the p-value. See Note. |
p.value.raw |
the p-value computed without correction for cross correlation. See Note. |
p.value.corrected |
the p-value computed with correction for cross correlation. See Note. |
estimate |
a named vector containing the Sen estimate of the slope in units per year, the median value of the data, and the median value of time. |
data.name |
a string containing the actual name of the input series with the number of years and sites |
alternative |
a character string describing alternative to the test ("two.sided"). |
null.value |
the value for the hypothesized slope (0). |
nyears |
the number of years. |
nseasons |
the number of sites |
series |
the data that was analyzed. |
seasonnames |
the names of the sites |
The value of p.value
is p.value.raw
if there are fewer
than 10 years of data and is p.value.corrected
otherwise.
The regional Kendall is described in Douglas and others (2000) and Helsel and others (2006).
The adjustment for spatial correlation used in regken
is based on equation 13 in
Douglas and others (2000) and uses the Spearman correlation to account for monotonic correlations.
Douglas, E.M., Vogel, R.M., and Kroll, C.N., 2000, Trends in floods and low flows in the United States: impact of spatial correlation: Journal of Hydrology, v. 240, p. 90–105.
Helsel, D.R., Mueller, D.K., and Slack, J.R., 2006, Computer program for the Kendall family of trend tests: U.S. Geological Survey Scientific Investigations Report 2005–5275, 4 p.
# Need example?
# Need example?
Computes the root-mean-squared error (RMSE) of the difference between observed values and the predicted values or the RMSE or relative percent differences (RPD) between samples and duplicates.
rmse(x, ...) ## Default S3 method: rmse(x, y, ...) ## S3 method for class 'lm' rmse(x, ...) rpd(x, y, plotit = FALSE)
rmse(x, ...) ## Default S3 method: rmse(x, y, ...) ## S3 method for class 'lm' rmse(x, ...) rpd(x, y, plotit = FALSE)
x |
either a random vector an object for which a method exists. |
y |
duplicate samples paired with |
plotit |
logical, if |
... |
arguments to be passed to or from methods. |
For the rmse
functions, a single value representing the
estimated RMSE. For rpd
, the relative percent differences for each
paired sample and duplicate.
The definition for the RMSE of paired water-quality duplicates is
The definition for RPD for paired water-quality duplicates is
Other disciplines may use different equations.
Bland J.M. and Altman D.G., 1986 Statistical methods for assessing agreement
between two methods of clinical measurement: Lancet, i, p. 307–310.
Clesceri, L.S., Greenberg, A.E., and Eaton, A.D., 1998, Standard
methods for the examination of water and wastewater, 20th edition:
Baltimore, Md, United Book Press, Inc., 1162 p.
Harvey, D., undated, Analytical chemistry 2.0: Analytical Sciences Digital Library: online at URL: http://www.asdlib.org/onlineArticles/ecourseware/Analytical
Helsel, D.R., and Hirsch, R.M., 2002, Statistical methods in water
resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
# Example 15.2 from Harvey. dupX1 <- c(160, 196, 207, 185, 172, 133) dupX2 <- c(147, 202, 196, 193, 188, 119) rmse(dupX1, dupX2) rpd(dupX1, dupX2)
# Example 15.2 from Harvey. dupX1 <- c(160, 196, 207, 185, 172, 133) dupX2 <- c(147, 202, 196, 193, 188, 119) rmse(dupX1, dupX2) rpd(dupX1, dupX2)
Computes the receiver operator characteristics (ROC) for a logistic regression model.
roc(object)
roc(object)
object |
an object of class "glm." |
An object of class "roc" haveing these components:
c.val |
the area under the ROC curve |
table |
a 3-column matrix of the fitted values, sensitivity and specificity |
Objects of class "roc" have print
and plot
methods.
Gonen, M., 2006, Receiver Operating Characteristics (ROC) Curves: SAS Users Group International Conference, Paper 210-31 18 p., available at http://www2.sas.com/proceedings/sugi31/210-31.pdf, last accessed Octocer 26, 2011.
Computes the seasonal Kendall trend test with Sen slope estimator.
seaken(series, nseas = 12)
seaken(series, nseas = 12)
series |
a regularly spaced numeric vector to test for trend. Missing values are permitted. |
nseas |
the number of seasons per year. Must not exceed 52. Can also be a character vector of the names of the seasons. The length of the character vector determines the number of seasons. |
An object of class "htest" also inhereting class "seaken" containing the following components:
method |
a description of the method. |
statistic |
the value of Kendall's tau. |
p.value |
the p-value. See Note. |
p.value.raw |
the p-value computed without correction for serial correlation. See Note. |
p.value.corrected |
the p-value computed with correction for serial correlation. See Note. |
estimate |
a named vector containing the Sen estimate of the slope in units per year, the median value of the data, and the median value of time. |
data.name |
a string containing the actual name of the input series with the number of years and seasons. |
alternative |
a character string describing alternative to the test ("two.sided"). |
null.value |
the value for the hypothesized slope (0). |
nyears |
the number of years. |
nseasons |
the number of seasons. |
series |
the data that was analyzed. |
seasonnames |
the names of the seasons. |
The value of p.value
is p.value.raw
if there are fewer
than 10 years of data and is p.value.corrected
otherwise.
Hirsch, R.M., Alexander, R.B., and Smith, R.A., 1991, Selection
of methods for the detection and estimation of trends in water quality:
Water Resources Research, v. 27, p. 803–813.
Hirsch, R.M., Slack, J.R., and Smith, R.A., 1982, Techniques of trend
analysis for monthly water quality data: Water Resources Research, v. 18, p.
107–121.
Hirsch, R.M., and Slack, J.R., 1984, A nonparametric trend test for seasonal
data with serial dependence: Water Resources Research, v. 20, p.
727–732.
Kendall, M.G., 1938, A new measure of rank correlation: Biometrika v. 30, p.
81–89.
Kendall, M.G., 1976, Rank correlation methods (4th ed.): London, Griffin,
202 p.
Sen, P.K., 1968, Estimates of regression coefficient based on Kendall's tau:
Journal of the American Statisical Association, v. 63, p. 1379–1389.
## Not run: library(smwrData) library(smwrBase) data(KlamathTP) RegTP <- with(KlamathTP, regularSeries(TP_ss, sample_dt)) # The warning generated is expected and acceptable for these data seaken(RegTP$Value, 12) # Manaus river data is in package boot library(boot) data(manaus) manaus.sk <- seaken(manaus, 12) print(manaus.sk) # Note for these data the large difference between the raw and corrected p-values. # p-value (raw) is << 0.001 manaus.sk$p.value.raw # p-value (with correlation correction) is = 0.10 manaus.sk$p.value.corrected # Hence, it may be concluded that these particular data show substantial serial correlation # as seen with see with acf(manaus). ## End(Not run)
## Not run: library(smwrData) library(smwrBase) data(KlamathTP) RegTP <- with(KlamathTP, regularSeries(TP_ss, sample_dt)) # The warning generated is expected and acceptable for these data seaken(RegTP$Value, 12) # Manaus river data is in package boot library(boot) data(manaus) manaus.sk <- seaken(manaus, 12) print(manaus.sk) # Note for these data the large difference between the raw and corrected p-values. # p-value (raw) is << 0.001 manaus.sk$p.value.raw # p-value (with correlation correction) is = 0.10 manaus.sk$p.value.corrected # Hence, it may be concluded that these particular data show substantial serial correlation # as seen with see with acf(manaus). ## End(Not run)
Computes the timing of the seasonal peak value. The timing of the seasonal peak is needed for the seasonalWave model (Vecchia and others, 2008).
seasonalPeak(x, y)
seasonalPeak(x, y)
x |
a vector of decimal time representing dates and times. Missing values are permitted and are removed before analysis. |
y |
a vector of the data for which the peak is needed. Missing values are permitted and are removed before analysis. |
The timing of the peak of the data is computed by identifying the largest
value produced by smoothing that data with supsmu
. The remaining data
in the attributes are used by using the seasonalPeak method of confirm.
An object of class seasonalPeak. The unconfirmed object is a single
value that represents the estimate of the timing of the peak and five
additional attributes.
Data: a list of the x
and y
values where x is the fractional
part of the original decimal time data. Missing values have been removed.
Smooth: a list of the x
and y
smoothed values.
Points: a
list of 361 evenly spaced xout and yout values.
Extra: pointers to all
the peaks in Points
.
Confirmed: logical indicating that the
object has not been confirmed.
The generic functions print and confirm have methods for object of class seasonalPeak.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308-1324
seasonalWave
, confirm.seasonalPeak
,
supsmu
, print.seasonalPeak
library(smwrData) data(QW05078470) with(QW05078470, seasonalPeak(dectime(DATES), P00665)) ## Should be: # Default value: 0.499 # Alternate values: 0.497
library(smwrData) data(QW05078470) with(QW05078470, seasonalPeak(dectime(DATES), P00665)) ## Should be: # Default value: 0.499 # Alternate values: 0.497
Computes a seasonal wave model to describe the variation of a constituent over the course of a year. This model is particularly well suited to describe the concentration of pesticides.
seasonalWave(x, cmax, loading, hlife, second.peak = NULL)
seasonalWave(x, cmax, loading, hlife, second.peak = NULL)
x |
a vector of decimal time representing dates. Missing values
( |
cmax |
the time of the greatest peak value, expressed as a fraction of the year. |
loading |
the number of months of contituent loading for the primary peak. |
hlife |
the half life of the decay rate, expressed in units of months. This should be in the range of 1 though 4. |
second.peak |
a list of the parameters for the second peak. See Details. |
The seasonal wave model is expressed as a differential equation
where is a
specified constant, which represents the rate of removal or chemical decay;
are specified nonnegative constants, which
represent monthly input amounts;
is the indicator function with
if
lies in the given interval and
otherwise; and
is the input amount at time
.
The value of hlife is used to define the decay rate in the
differential equation. The value of
is 12/
hlife
and
decays at a a rate of
per month.
The list for second.peak
must contain these components:
la, the
lag from the primary peak to the second peak, in months. This is always the
time from the primary peak to the second peak, even if the second peak
occurs earlier in the year.
lo, the loading duration in months, this
value must be leas than la
. w, the load scaling factor relative to
the primary peak. Must be greater than 0 and less than or equal to 1. For
practical pruposes, it should be greater than 0.5.
A vector expressing the expected variation of concentration for each value in x; the values are scaled to a range of -0.5 to 0.5.
Dave Lorenz, original coding by Aldo Vecchia.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308-1324.
seasonalPeak
, confirm.seasonalPeak
## Not run: # Selected single peak models # 1 month loading, 1 month half-life curve(seasonalWave(x, 3/12, 1, 1), 0, 1, n=361, xlab='fraction of year', ylab="W") # 1 month loading, 3 month half-life curve(seasonalWave(x, 3/12, 1, 3), 0, 1, n=361, add=TRUE, col="blue") # 3 month loading, 2 month half-life curve(seasonalWave(x, 3/12, 3, 2), 0, 1, n=361, add=TRUE, col="green") # Add a second peak model curve(seasonalWave(x, 3/12, 3, 2, second.peak=list(la=6, lo=2, w=.75) ), 0, 1, n=361, add=TRUE, col="red") ## End(Not run)
## Not run: # Selected single peak models # 1 month loading, 1 month half-life curve(seasonalWave(x, 3/12, 1, 1), 0, 1, n=361, xlab='fraction of year', ylab="W") # 1 month loading, 3 month half-life curve(seasonalWave(x, 3/12, 1, 3), 0, 1, n=361, add=TRUE, col="blue") # 3 month loading, 2 month half-life curve(seasonalWave(x, 3/12, 3, 2), 0, 1, n=361, add=TRUE, col="green") # Add a second peak model curve(seasonalWave(x, 3/12, 3, 2, second.peak=list(la=6, lo=2, w=.75) ), 0, 1, n=361, add=TRUE, col="red") ## End(Not run)
This is a support function for seasonalWave model (Vecchia and others, 2008).
seasonalWave.fit(cmax, wtx, pkt, phi)
seasonalWave.fit(cmax, wtx, pkt, phi)
cmax |
the time of the greatest peak value, expressed as a fraction of the year. |
wtx |
a vector of 12 monthly application rates. |
pkt |
the numeric month of peak expected from the data in |
phi |
the recession rate, in 1/years. |
A vector of 361 values describing the annual seasonal wave.
This function is support for the seasonalWave function and is not intended to be called by the user.
Dave Lorenz, original coding by Aldo Vecchia.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308-1324.
This is a support function for seasonalWave model (Vecchia and others, 2008).
seasonalWave.wt(loading, second.peak)
seasonalWave.wt(loading, second.peak)
loading |
the number of months of primary loading for the seasonal wave model. |
second.peak |
a list describing the loading characteristics of the
second peak. See Details. If |
If second.peak
is supplied, then it must be a list with these
components:
la, the lag (number of months) between the primary and second
peak;
lo, the loading duration (number of months) for the second peak
(must be less than la
;
w, the weigthing factor for the second peak
loading, relative to the primary peak. Must be greater than 0 and less than
or equal to 1.
The weighting vector for the seasaon wave model. A vector of length 12 that represents the relative loading to the system.
This function is support for the seasonalWave function and is not intended to be called by the user.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p. 1308-1324.
Selects the "best" parameters for a seasonal wave fit.
selBestWave(formula, data, dec.time, wave.list, exhaustive = FALSE, Regression = lm, Test = AIC, ...)
selBestWave(formula, data, dec.time, wave.list, exhaustive = FALSE, Regression = lm, Test = AIC, ...)
formula |
the formula describing the model without a
|
data |
the data.frame that contians the veriable specified in the formula. |
dec.time |
a character string of the name of the column in |
wave.list |
an object of class "seasonalPeak" (confirmed) describing the timing of the peak and potential candidate models. |
exhaustive |
logical; if TRUE, then do a fairly complete search for the
timing of the peak, otherwise accept the timing specified in
|
Regression |
the regression function. |
Test |
the function to perform the comparison test among all of the candidate models. |
... |
any additional arguments to |
For logistic regression, use Regression
=glm
,
Test
=deviance
, family=binomial(link="logit")
.
A 4- or 7-column matrix of the "best" models. The columns are the
timing of the peak (Cmax
), the primary peak loading (Loading
),
the half life (Hlife
), and the test score (Test
). If the model
is a two-peak model, then additional columns (la
, lo
, and
w
) describeing the second peak are included.
Vecchia, A.V., Martin, J.D., and Gilliom, R.J., 2008, Modeling variability and trends in pesticide concentrations in streams: Journal of the American Water Resources Association, v. 44, no. 5, p.1308–1324.
## See the SeasonalWave demo
## See the SeasonalWave demo
Computes the Sen slope with confidence interval and an intercept for paired data.
senSlope(formula, data, subset, na.action, intercept = "Ac", CI = 0.95)
senSlope(formula, data, subset, na.action, intercept = "Ac", CI = 0.95)
formula |
a model formula with exactly one explanatory variable. |
data |
the data. |
subset |
any descriptiopn to subset |
na.action |
the function to handle missing values. |
intercept |
a character string indicating the method to compute the intercept. See Details. |
CI |
the desired confidence interval for the slope. |
The argument intercept
may be either "Ac" or "A1m." If it is "Ac,"
then the intercept is computed from the median of y
and x
,
also known as the Conover method. If it is "A1m," then the intercept is
chosen so that the median of the residuals is zero.
An object of class "senSlope" with these components:
call |
the matched call. |
coefficients |
the intercept ans Sen slope. |
slope.CI |
the lower and upper confidence limits of the Sen slope. |
residuals |
the residuals of the regression. |
fitted.values |
the fitted values. |
na.action |
information about any missing values. |
x |
the explanatory variable. |
y |
the response variable. |
var.names |
the response and explanatory variable names. |
model |
the model frame. |
Dietz, E.J., 1989, Teaching regression in a nonparametric
statistics course: The American Statstician, v. 43, p. 35–40
Helsel, D.R., and Hirsch, R.M., 2002, Statistical methods in water
resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
Sen, P.K., 1968, Estimates of the regression coefficient based on Kendall's
tau: Journal of the American Statistical Association, v. 63 p. 1379–1389
## Not run: library(smwrData) data(SaddlePeaks) senSlope(Flow ~ Year, data=SaddlePeaks) ## End(Not run)
## Not run: library(smwrData) data(SaddlePeaks) senSlope(Flow ~ Year, data=SaddlePeaks) ## End(Not run)
Performs either of two nonparametric tests (Wilcoxon Rank-Sum test or runs
test) for serial correlation. Both tests compute Z, which is the product of
sequential values of x
after removing any trend and normalizing so that the
median of Z values is 0.
serial.test(x, method = "wilcoxon")
serial.test(x, method = "wilcoxon")
x |
the numeric vector of observations. Missing values (NAs) are not allowed. |
method |
the string "wilcoxon" or "runs," depending on which test should be used. Only the first character is necessary. See Details. |
If the method is "wilcoxon," then the Wilcoxon Rank-Sum test (Wilcoxon, 1945) is performed to compare the distribution of positive Z values to the distribution of negative Z values. Excessive numbers of positive Z values suggests positive serial correlation and excessive negative Z values suggests negative serial correlation.
If the method is "runs," then the runs test (Wald and Wolfowitz, 1940) is performed on sequences of positive and negative values of Z. If there is a larger than expected number of runs, then that suggest negative serial correlation and if fewer than the expected number of runs, then positive serial correlation.
An object of class htest containing the following components:
statistic |
the score for either test. |
p.value |
the two-sided p-value of the statistic. |
Z |
the raw data of the analysis. |
alternative |
a string "two.sided" indicating the hypothesis. |
method |
a description of the method. |
data.name |
a string containing the actual name of the input data. |
The null hypothesis is that the data are uncorrelated.
Dufour, J.M., 1981, Rank test for serial dependence: Journal of the Time Series Analysis, v. 2, no. 3, p. 117–128.
Wald, A., and Wolfowitz, J., 1940, On a test whether two samples are from the same population: Annals of Mathematical Statistics, v. 11, p. 147–162.
Wilcoxon, F., 1945, Individual comparisons by ranking methods: Biometrics, v. 1, p. 80–83.
Produces a plot of time-series data by season that shows seasonal and overall trends: method for "seaken" object.
## S3 method for class 'seaken' seriesPlot(x, SeasonLine = list(name = "", what = "vertical", color = "black"), SeasonPoint = list(name = "", what = "points", symbol = "circle", filled = TRUE, size = 0.09, color = "black"), yaxis.log = FALSE, yaxis.range = c(NA, NA), ylabels = 7, xlabels = x$seasonames, xtitle = "", ytitle = deparse(substitute(x)), caption = "", margin = c(NA, NA, NA, NA), SeasonTrend = list(name = "", what = "lines", color = "brown"), TrendLine = list(name = "", what = "lines", color = "blue"), ...)
## S3 method for class 'seaken' seriesPlot(x, SeasonLine = list(name = "", what = "vertical", color = "black"), SeasonPoint = list(name = "", what = "points", symbol = "circle", filled = TRUE, size = 0.09, color = "black"), yaxis.log = FALSE, yaxis.range = c(NA, NA), ylabels = 7, xlabels = x$seasonames, xtitle = "", ytitle = deparse(substitute(x)), caption = "", margin = c(NA, NA, NA, NA), SeasonTrend = list(name = "", what = "lines", color = "brown"), TrendLine = list(name = "", what = "lines", color = "blue"), ...)
x |
data that can be treated as a regularly-spaced time series. |
SeasonLine |
control parameters of the lines in the plot. See Details. |
SeasonPoint |
control parameters of the points in the plot. See Details. |
yaxis.log |
log-transform the y axis? |
yaxis.range |
set the range of the y axis. |
ylabels |
set the y-axis labels. See |
xlabels |
set the x-axis labels, may be a single numeric value
indicating the number of season in |
xtitle |
the x-axis title. |
ytitle |
the y-axis title. |
caption |
the figure caption. |
margin |
the parameters of the margin of the plot area. |
SeasonTrend |
control parameters of the trend line for each season. See Details. |
TrendLine |
control parameters of the overall trend line compute by
|
... |
any additional arguments required for specific methods. |
The argument what
for SeasonLine
must be either "lines" or
"vertical." See monthplot
for more information.
The
argument what
for either SeasonPoint
, SeasonTrend
, or
trendLine
may be set to "none" to suppress drawing of that feature.
Information about the graph.
Computes the skewness statistic.
skew(x, na.rm = TRUE, method = "fisher")
skew(x, na.rm = TRUE, method = "fisher")
x |
any numeric vector. |
na.rm |
logical; if TRUE, then remove missing value before computation. |
method |
the method to use for computing the skew, must be "fisher" or "moments." |
a single value representing the skewness of the data in x
.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.
skew(c(1.0, 1.2, 1.5, 1.9, 2.5))
skew(c(1.0, 1.2, 1.5, 1.9, 2.5))
Summarizes the correlations: method for "cor.all" object.
## S3 method for class 'cor.all' summary(object, p.adjust = "holm", variable = NULL, ...)
## S3 method for class 'cor.all' summary(object, p.adjust = "holm", variable = NULL, ...)
object |
an object created by the |
p.adjust |
a character string describing the method to use to adjust
the p-values to account for m ultiple comparisons. See
|
variable |
if the name of a variable, then summarize only those correlations of this variable with all others, otherwise summarize all combinations. |
... |
further arguments passed to or from other methods. |
A data frame containing columns of the paired variables, the correlation, the adjusted p-value, and the number of observations in the correlation.
library(smwrData) data(TNLoads) # Extract only the correlations with the log of total nitrogen summary(cor.all(TNLoads[, 1:5]), variable="LOGTN")
library(smwrData) data(TNLoads) # Extract only the correlations with the log of total nitrogen summary(cor.all(TNLoads[, 1:5]), variable="LOGTN")
Creates a dataset of specified summary statistics from a collection of data.
sumStats(..., group = NULL, Num = "Num", Stats = list(Mean = mean, StdDev = sd), Probs = (0:4)/4, na.rm = TRUE)
sumStats(..., group = NULL, Num = "Num", Stats = list(Mean = mean, StdDev = sd), Probs = (0:4)/4, na.rm = TRUE)
group |
list whose components are interpreted as categories, each of the same length as the objects in ... The e lements of the categories define the position in a multi-way array corresponding to each observation. Missing values (NAs) are allowed. The names of group are used as the names of the columns in the output dataset. If a vector is given, it will be treated as a list with one component. The default is NULL, which indicates no grouping variable. |
Num |
a character string indicating the name of the column that contains the number of nonmissing observations. |
Stats |
a tagged list. An element in the list should have the name of the target variable in the output data set, a nd it should be a function that accepts the na.rm argument and computes a single value. See Notes for commonly used functions. The default is the target columns Mean and StdDev, which are computing using functions mean and stdev. |
Probs |
vector of desired probability levels. Values must be between 0 and 1. Minimum returned for probs=0 and maximum returned for probs=1. Default is c(0.0, 0.25, 0.50, 0.75, 1.0). |
na.rm |
logical; if TRUE, then missing values are removed from each column in ... before computing the statistics |
... |
any number of arguments, which can be of many different forms: a dataset, selected columns of a dataset, vectors, and matrices. If a dataset is supplied, then nonnumeric columns are removed before computing statistics. |
A data frame containing columns identifying each variable in ..., any grouping variables, and the requested statistics as named in the call.
Commonly used functions referenced in the Stats arguments include
mean, sd, skew and var.
The statistics requested by the Probs argument are computed by the quantile
function using type
=2.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in water resources: U.S. Geological Survey Techniques of Water-Resources Investigations, book 4, chap. A3, 522 p.
mean
, sd
, skew
,
var
, quantile
,
## Generate a random sample set.seed(222) XX.rn <- rexp(32) sumStats(XX.rn)
## Generate a random sample set.seed(222) XX.rn <- rexp(32) sumStats(XX.rn)
Computes the mean weighted by the duration between values.
timeWeightedMean(x, time, na.rm = TRUE, fill = TRUE, by.period = FALSE, excessive = 2.5)
timeWeightedMean(x, time, na.rm = TRUE, fill = TRUE, by.period = FALSE, excessive = 2.5)
x |
a sequnece of numeric values whose mean is to be computed. |
time |
the times associated with |
na.rm |
remove missing values before computing the mean? |
fill |
expand the weights for the first and last observation to the end of the period? Otherwise, the weight is based on the distance to the second or next-to-last observation. |
by.period |
computes wights by the periods defined by |
excessive |
a warning is issued if the largest weight for any
observation exceeds |
The values of time
are expected to be in decimal format, where the
integer part indicates the period and the fractional part linearly
distributed through the period. For example, the year 2000 begins at 2,000.0
and July 2, 2000 is 2,000.5. The function dectime
can be used to
convert Date
or POSIX
to decimal format. If time
is of
class "Date," or any "POSIX" classs, it is converted using
dectime
.
If na.rm
is FALSE
and byeriod
is
TRUE
, then missing values are returned only for periods that contain
a missing value (NA
).
If by.period
is TRUE
, then a vector with one entry per
period, otherwise the mean for the entire data set.
Crawford,C.G., 2004, Sampling strategies for estimating acute and chronic exposures of pesticides in streams: Journal of the American Water Resources Association, v. 40, n. 2, p 485-502.
## Not run: library(smwrData) data(QW05078470) with(QW05078470, timeWeightedMean(P00665, dectime(DATES))) ## End(Not run)
## Not run: library(smwrData) data(QW05078470) with(QW05078470, timeWeightedMean(P00665, dectime(DATES))) ## End(Not run)
Generates a basis matrix for piecewise linear modeling of trends.
trends(x, breaks, boundary.breaks = c(FALSE, FALSE), steps)
trends(x, breaks, boundary.breaks = c(FALSE, FALSE), steps)
x |
a vector of dates/times, assumed to be in dectime format. Missing values are permitted and result in corresponding missing values in the output. |
breaks |
a vector of breakpoints in the linear trends. |
boundary.breaks |
a logical vector of length 2, indicating whether the
breaks include the range of the data. If the first is TRUE, then the first
break denotes the beginning of a trend. If the last is TRUE, then the last
break denotes the end of a trend. Otherwise, the trends begin at the
|
steps |
a vector indicating any step trends. |
A matrix of dimension length of x by number of linear trend and step trends. The breaks are included as an attribute.
Each trend is 0 prior to the break, and then increases at the rate of 1 per unit and maintain that maximum value after the break. The regression coefficient then reprents the trend as a rate. A step trend is 0 before the step and 1 after it.
# model two piecewise linear trends from 2000 to 2004, with a break at 2001 and 2003 trends(2000 + seq(0,20)/5, breaks=c(2001, 2003))
# model two piecewise linear trends from 2000 to 2004, with a break at 2001 and 2003 trends(2000 + seq(0,20)/5, breaks=c(2001, 2003))
Computes the variance inflation factor (Helsel and Hirsch, 2002) for each variable in a linear regression fit.
vif(model, ...) ## S3 method for class 'lm' vif(model, ...)
vif(model, ...) ## S3 method for class 'lm' vif(model, ...)
model |
an object of class "lm." There is no default method for
|
... |
further arguments passed to or from other methods. |
A named numeric vector containing the variance inflation factors for each variable.
Helsel, D.R. and Hirsch, R.M., 2002, Statistical methods in
water resources: U.S. Geological Survey Techniques of Water-Resources
Investigations, book 4, chap. A3, 522 p.