Title: | Functions to import and manipulate hydrologic data |
---|---|
Description: | This package has data import and export functions for specialized formats used within the U.S. Geological Survey. It also contains several functions that are useful for managing or manipulating hydrologic and other data. |
Authors: | Dave Lorenz [aut], Laura DeCicco [aut, cre], Sheila Saia [ctb] |
Maintainer: | Laura DeCicco <[email protected]> |
License: | CC0 |
Version: | 1.1.5 |
Built: | 2024-11-19 03:55:47 UTC |
Source: | https://github.com/USGS-R/smwrBase |
This package has specialized functions for importing, managing, or manipulating hydrologic data.
Package: | smwrBase |
Type: | Package |
Version: | 1.1.2 |
Date: | 2015-12-08 |
License: | File CC0 |
Depends: | methods,memoise,digest,lubridate |
This package contains functions that import, manage, or manipulate hydrologic data and functions that apply specialized transformations used in hydrologic analyses amd modeling. A listing of the functions and their description is in the following table.
Function | Description |
%cn% | Identify character strings that contain the specified pattern. |
anomalies | Break down time-series data into long- and short-term deviations (anomalies) |
and the high-frequency variation. | |
as.timeDay | Convert data to objects of class "timeDay." |
baseDay | Computes the "base" day of the year, a reference value that can be used to |
group days for the computation of summary statistics. | |
boxCox | Apply a Box-Cox power transformation. |
coalesce | Merge a matrix or list of vectors selecting the first non-missing value. |
conc.meq | A list containing necessary information for the function conc2meq . |
conc2meq | Convert concentration in milligrams per liter to milli-equivalents per liter. |
daysInMonth | The number of days in a month. |
dectime | Convert dates and times to decimal time in years. |
dectime2Date | Convert decimal time in years to an object of class "Date." |
dlpearsonIII | The density of the log-Pearson Type III distribution. |
dms2dd | Convert data in degrees, minutes, and seconds to decimal degrees. |
dpearsonIII | The density of the Pearson Type III distribution. |
eventLen | Compute the length or duration of an event. |
eventNum | Compute the number of an event, identified by a TRUE value in a sequence. |
eventSeq | Compute the sequence number within an event. |
eventSeries | Create regular time-series data from recorded events. |
exportCSV | Export a data frame to a comma-separated values file. |
exportRDB | Export data to an ASCII relational-database file. |
fillMissing | Interpolate missing values in a regular time-series of data. |
fourier | Compute the Fourier series decomposition from date data. |
group2row | Unstack data oriented in columns to rows of data. |
hyperbolic | Apply a hyperbolic transformation. |
hysteresis | Compute a basis for estimating hysteresis effect in some variable related to |
the argument x . |
|
IboxCox | Apply the inverse Box-Cox power transformation. |
Ihyperbolic | Apply the inverse hyperbolic transformation. |
importCSV | Import a data frame from a comma-separated values file. |
importRDB | Import a data frame from an ASCII relational-database file. |
index.coalesce | Return the index column number instead of the values for the first non-missing value. |
isCharLike | Determine whether the data be treated like character data. |
IsCurve | Apply the inverse s-curve transformation. |
isDateLike | Deterimne whether the data can be treated like date data. |
isGroupLike | Determine whether the data can be treated like grouping data. |
isNumberLike | Determine whether the data can be treated as numeric data. |
makeMeta | Create a template meta file for a comma-separated values file. |
mergeNearest | Merge two datasets by the nearest date and time. |
mergeQ | Merge flow data with water-quality data. |
miss2na | Convert a coded missing value to NA. |
more | Display the contents of an object by pages. |
movingAve | Compute the moving average in regular time-series data. |
movingDiff | Compute the moving difference in regular time-series data. |
na2miss | Convert NA to a coded missing value. |
peaks | Compute the indices of peaks in time-series data. |
pick | Select a value based on the value of a logical, integer, or character reference value. |
plpearsonIII | Compute the cumulative probability of the log-Pearson Type III distribution. |
ppearsonIII | Compute the cumulative probability of the Pearson Type III distribution. |
qlpearsonIII | Compute the quantile of the log-Pearson Type III distribution. |
qpearsonIII | Compute the quantile of the Pearson Type III distribution. |
quadratic | Compute a basis for an orthogonal second-order polynomial. |
readList | Import data arranged on lines into a list. |
recode | Recode distinct values. |
regularSeries | Put data collected at arbitrary times into a regular time series. |
rlpearsonIII | Compute the random variates of the log-Pearson Type III distribution. |
rpearsonIII | Compute the random variates of the Pearson Type III distribution. |
sCurve | Apply the s-curve transformation. |
scaleRng | Scale data to a specified range. |
screenData | Screen data for missing values or gaps. |
seasons | Create seasonal categories from dates. |
seqCollapse | Collapse a sequence of integers to a compact character string. |
setFileType | Support function to manage file suffixes. |
setTZ | Set the time zone information for dates and times. |
shiftData | Shift time-series data forward or backward. |
sumComposition | Compute the percentages of data within a matrix. |
timeDay | Various methods for manipulating time-of-day data, including conversion to and |
from character, addition, and others. | |
untable | Expand a 2-dimensional table into the raw values. |
waterYear | Compute the water year of date data. The water year ends on September 30 of the year. |
whichRowCol | Identify the row and column indexes for TRUE values in a logical matrix. |
Dave Lorenz
Lorenz, D.L., 2015, smwrBase—an R package for managing hydrologic data, version 1.1.1: U.S. Geological Survey Open-File Report 2015–1202, 7 p.
Extract elements of a time-of-day object.
## S3 method for class 'timeDay' x[i]
## S3 method for class 'timeDay' x[i]
x |
the object. |
i |
an index specifying elements to extract. See |
The subset of x
indicated by i
.
Matches partial values, such as substrings.
x %cn% pattern
x %cn% pattern
x |
the character vector to be matched. Missing values are permitted. |
pattern |
the pattern to be matched against, may be a regular expression. |
A vector the same length as x
of logical values indicating
whether pattern
is found in the element of x
or not.
## A simple example c("abc", "def") %cn% "c"
## A simple example c("abc", "def") %cn% "c"
Decompose a series of observations into deviations (anomalies) from the mean for selected periods and the remainder (HFV or high frequency variation) using the method described in Appendix A of Vecchia (2000).
anomalies(x, ...)
anomalies(x, ...)
x |
a time series (ts) or a vector of observations that represents a daily series. Missing values (NAs) are allowed only at the beginning and end of the series. |
... |
named anomalies and the length of the selected periods, generally in days. The anomalies must be specified in order of decreasing length. |
The intent of computing anomalies is to give flexibility in fitting the
relation between flux, or concentration, and flow for time periods longer
than a couple of years. Taking a very simple regression model:
C = B0 + B1 * Q + e,
where C is the concentration, B0 and B1 are the regression coefficients, Q is
the flow, and e is the error. This can be re-expressed in terms of flow
anomalies (for this example, 5- and 1-year anomalies are used, many others are possible):
C = B0 + B1 * Qbar + B1 * A5 + B1 * A1 + B1 * HFV + e,
where C, B0, B1, and e are the same as the simple regression, and Qbar is the
mean flow, A5 is the 5-year anomaly, A1 is the 1-year anomaly, and HFV is the
high-frequency variation. The simple regression model assumes that the
regression coefficient (B1) is constant for all anomalies. Computing
anomalies removes that constraint and is represented by this model:
C = B0 + B1 * A5 + B2 * A1 + B3 * HFV + e,
where C, A5, A1, HFV, and e are the same as the re-expressed model, and B0,
B1, B2, and B3 are regression coefficients (numerically different from the
simple coefficients). Qbar is a constant and is not needed for the regression.
Anomalies are computed sequentially. First, the mean of x
is computed and
subtracted from the data. Then for each anomaly, the running mean of the
specified period is computed (the anomaly) and is subtracted from the data.
The remainder is the HFV. This procedure ensures that the sum of the
anomalies plus the mean is equal to the original data.
A matrix of the specified anomalies and HFV. The mean of x
is
included as an attribute.
The output matrix contains missing values in the beginning,
corresponding to the length of the longest anomaly.
A long time-frame anomaly that is often of interest, is the 5-year anomaly, which is 1,826 days.
Vecchia, A.V., 2000, Water-quality trend analysis and sampling design for the Souris River, Saskatchewan, North Dakota, and Manitoba: U.S. Geological Survey Water-Resources Investigations Report 00-4019, 77 p.
## Not run: library(smwrData) data(Q05078770) anomalies(log(Q05078770$FLOW), A3mo=90) ## End(Not run)
## Not run: library(smwrData) data(Q05078770) anomalies(log(Q05078770$FLOW), A3mo=90) ## End(Not run)
timeDay
objectsAddition of time-of-day data to either "Date" or "POSIXt" classes. This is useful when dates and times are recorded in separate columns in a dataset.
## S4 method for signature 'timeDay,POSIXt' Arith(e1, e2) ## S4 method for signature 'POSIXt,timeDay' Arith(e1, e2) ## S4 method for signature 'timeDay,Date' Arith(e1, e2) ## S4 method for signature 'Date,timeDay' Arith(e1, e2)
## S4 method for signature 'timeDay,POSIXt' Arith(e1, e2) ## S4 method for signature 'POSIXt,timeDay' Arith(e1, e2) ## S4 method for signature 'timeDay,Date' Arith(e1, e2) ## S4 method for signature 'Date,timeDay' Arith(e1, e2)
e1 , e2
|
timeDay and POSIXt or Date objects. Missing values are permitted in either argument and result in a missing value in the output. |
A vector of class "POSIXct."
as.Date("2001-03-04") + as.timeDay("10:00") ## Not run: library(smwrData) data(QW05078470) # Note that the result is reported in the local time zone! QW05078470$DATES + as.timeDay(QW05078470$TIMES) ## End(Not run)
as.Date("2001-03-04") + as.timeDay("10:00") ## Not run: library(smwrData) data(QW05078470) # Note that the result is reported in the local time zone! QW05078470$DATES + as.timeDay(QW05078470$TIMES) ## End(Not run)
Convert the time-of-day object to a character string.
## S3 method for class 'timeDay' as.character(x, ...)
## S3 method for class 'timeDay' as.character(x, ...)
x |
the time-of-day object to be converted. |
... |
not used, required for other methods. |
The values in x
converted to a character representation.
Convert an object to class "data.frame."
## S3 method for class 'timeDay' as.data.frame(x, row.names = NULL, optional = FALSE, ..., nm = deparse(substitute(x)))
## S3 method for class 'timeDay' as.data.frame(x, row.names = NULL, optional = FALSE, ..., nm = deparse(substitute(x)))
x |
the time-of-day object to be converted. |
row.names |
|
optional |
logical. If |
... |
not used, required for other methods. |
nm |
the column name to create for |
A data frame is created containing the data in x
.
as.timeDay
Valid conversions for function as.timeDay
.
as.timeDay(time, format) ## S4 method for signature 'timeDay,missing' as.timeDay(time, format) ## S4 method for signature 'numeric,missing' as.timeDay(time, format) ## S4 method for signature 'character,character' as.timeDay(time, format) ## S4 method for signature 'character,missing' as.timeDay(time, format)
as.timeDay(time, format) ## S4 method for signature 'timeDay,missing' as.timeDay(time, format) ## S4 method for signature 'numeric,missing' as.timeDay(time, format) ## S4 method for signature 'character,character' as.timeDay(time, format) ## S4 method for signature 'character,missing' as.timeDay(time, format)
time |
the time of day in character format. |
format |
The format for converting the time of day. See |
Inconsistent formats for time
will result in an error. Missing values
or empty strings in time
will result in missing values in the output.
as.timeDay("10:00") as.timeDay("3 PM", format="%I %p")
as.timeDay("10:00") as.timeDay("3 PM", format="%I %p")
Computes the base day of the year, a reference value that can be used to group days for the computation of summary statistics.
baseDay(x, numeric = TRUE, year = c("calendar", "water", "climate"))
baseDay(x, numeric = TRUE, year = c("calendar", "water", "climate"))
x |
a vector of class POSIXt, Dates, or character that represents a date. Missing values are permitted. |
numeric |
a logical value; |
year |
a character string indicating the basis of the factor levels. See Details. |
The base day is computed such that all dates have the same reference value
regardless of whether the year is a leap year or not. If year
is
"calendar," then the factor levels or day number begin on January 1; if year
is
"water," then the factor levels or day number begin on October 1; and if year
is
"climate," then the factor levels or day number begin on April 1.
An integer value representing the base day number if numeric
is TRUE
. Otherwise a factor with levels for every day of the year.
# The default numeric result baseDay(c("2000-02-29", "2000-03-01", "2001-03-01")) # The result as a factor baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE)
# The default numeric result baseDay(c("2000-02-29", "2000-03-01", "2001-03-01")) # The result as a factor baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE)
Computes the decimal time representation of the base day of the year.
baseDay2decimal(x)
baseDay2decimal(x)
x |
a vector of baseDay values, character, or factors of the form month
abbreviation and day number, generally created from |
A numeric value representing the base day.
# The baseDay ordered by calendar year bd.tmp <- baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE) baseDay2decimal(bd.tmp) # ordered by water year, result should agree bd.tmp <- baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE, year="water") baseDay2decimal(bd.tmp)
# The baseDay ordered by calendar year bd.tmp <- baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE) baseDay2decimal(bd.tmp) # ordered by water year, result should agree bd.tmp <- baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE, year="water") baseDay2decimal(bd.tmp)
Functions for transforming and back-transforming data using the Box-Cox power transform, with options to preserve the measurement units.
boxCox(x, lambda = 1, GM, alpha = 0) IboxCox(x, lambda = 1, GM, alpha = 0)
boxCox(x, lambda = 1, GM, alpha = 0) IboxCox(x, lambda = 1, GM, alpha = 0)
x |
a numeric vector to be transformed by |
lambda |
the power term in the Box-Cox transformation. The value of 1 is a linear transform, the value of 0 results in a natural log transform. |
GM |
the value to use for the geometric mean of |
alpha |
an offset value for |
If x
contains missing values, then GM
is computed after
omitting the missing values and the output vector has a missing value
wherever x
has a missing value.
The function boxCox
computes the forward transform and the function
IboxCox
computes the inverse [boxCox] transform, or back-transform.
A numeric vector of the transformed or back-transformed values in
x
with an attribute "GM" of the geometric mean.
The original power transform described by Box and Cox (1964) is adjusted by a power transform of the geometric mean to retain the correct dimensional units of the original data as described in section 13.2 by Draper and Smith (1998).
Box, G.E.P., and Cox, D.R., 1964, An analysis of transformations:
Journal of the Royal Statistical Society, v. 26, Series B, p. 211–243.
Draper, N.R., and Smith, H., 1998, Applied regression analysis: New York,
John Wiley and Sons, 706 p.
X.test <- c(1,4,9,16,25,36,49) boxCox(X.test) boxCox(X.test, lambda=0) IboxCox(boxCox(1:3, lambda=0), lambda=0) # verify the back-transform ## Should return # [1] 1 2 3 # attr(,"GM") # [1] 1.817121
X.test <- c(1,4,9,16,25,36,49) boxCox(X.test) boxCox(X.test, lambda=0) IboxCox(boxCox(1:3, lambda=0), lambda=0) # verify the back-transform ## Should return # [1] 1 2 3 # attr(,"GM") # [1] 1.817121
Combine time-of-day data into a single vector.
## S3 method for class 'timeDay' c(..., recursive = FALSE)
## S3 method for class 'timeDay' c(..., recursive = FALSE)
... |
any number of objects that can be converted to class "timeDay." The first must be a time-of-day object. |
recursive |
not used, required for other methods. |
A single vector of class "timeDay."
c(as.timeDay("10:00"), as.timeDay("3 PM", format="%I %p"))
c(as.timeDay("10:00"), as.timeDay("3 PM", format="%I %p"))
Construct a vector with as few missing values as possible from a selected sequence of vectors.
coalesce(mat, ...) index.coalesce(mat, ...)
coalesce(mat, ...) index.coalesce(mat, ...)
mat |
a vector or matrix. |
... |
additional vectors or matrices, must have the same number of
rows as |
For coalesce
, a vector in which each element is determined by
selecting the first non-missing value in the order in which they are
specified in the argument list. The first step is to construct a matrix from
all arguments. The output is initially set to column 1; for any missing values
in the column, the data from column 2 are used and so on until all columns have
been searched or all missing values replaced.
For index.coalesce
, an integer vector indicating which column from mat or
from the vectors or constant specified for ... produced the result in coalesce.
This function is most useful for creating a column in a dataset from related columns that represent different methods. For example, a single column of alkalinity may be desired when there are multiple columns of alkalinity determined by various methods.
coalesce(c(1,NA,NA,3), c(2,2,NA,2)) # should be: [1] 1 2 NA 3 coalesce(c(1,NA,NA,3), c(2,2,NA,2), 0) # should be: [1] 1 2 0 3
coalesce(c(1,NA,NA,3), c(2,2,NA,2)) # should be: [1] 1 2 NA 3 coalesce(c(1,NA,NA,3), c(2,2,NA,2), 0) # should be: [1] 1 2 0 3
Support function to supply data for functions within smwrBase.
conc.meq()
conc.meq()
A list containing necessary information for the function
conc2meq
.
The user may choose to make local copies of the data with the same name (conc.meq) to be able to change or add to the list.
Convert concentrations in milligrams per liter (mg/L) to milli-equivalents per liter.
conc2meq(conc, constituent)
conc2meq(conc, constituent)
conc |
a numeric vector containing the concentration data in milligrams per liter. |
constituent |
the name of the constituent. Must be one of "aluminum," "ammonium," "bicarbonate," "bromide," "calcium," "carbonate," "chloride," "fluoride," "iron," "magnesium," "manganese," "nitrate as n," "nitrite as n," "phosphorus as p," "potassium," "sodium," "sulfate," or "sulfide." There must be enough characters in the name to uniquely identify the constituent. The case of the input name is ignored. |
Vector containing the milli-equivalent values. Missing values
(NA
s) are returned if the constituent name is invalid.
The user must verify that the units of concentration are milligrams per liter. Only those constituents that are typically reported in milligrams per liter (rather than micrograms per liter) are provided in this function. Aluminum, iron, and manganese (and possibly sulfide) are sometimes reported in micrograms per liter. Such values should be divided by 1000.0 before using this function.
The conversion for iron assumes that the dissolved iron is iron II.
The conversion for phosphorus assumes that most of the phosphorus is divalent. The actual conversion for phosphorus is very dependent on pH.
The conversion factors are taken from table 9 (page 56) of Hem (1985). The
available conversion factors are stored in the list created by conc.meq
in
smwrBase.
Hem, J.D., 1985, Study and interpretation of the chemical characteristics of natural water: U.S. Geological Survey Water-Supply Paper 2254, 263 p.
conc2meq(c(1,2,3), "Nitrate") # should be: [1] 0.07139 0.14278 0.21417
conc2meq(c(1,2,3), "Nitrate") # should be: [1] 0.07139 0.14278 0.21417
Computes the number of days in a month or the total number of days in the year to the end of the month.
daysInMonth(month, year, cum = FALSE)
daysInMonth(month, year, cum = FALSE)
month |
the month number, must range in value from 1 to 12. Missing values are permitted. |
year |
the calendar year, replicated in length to match |
cum |
a logical value. If |
A vector matching month
of the requested number of days.
Missing values are returned wherever either month
or year
is
missing.
# Check February on a leap year and regular year. # Should return 29, 28 daysInMonth(c(2,2), c(2000, 2001))
# Check February on a leap year and regular year. # Should return 29, 28 daysInMonth(c(2,2), c(2000, 2001))
Convert date/time data to be expressed as year and fractional part of year. This can be useful for plotting or representing time in a regression model.
dectime(dates, times, time.format, date.format, Date.noon = TRUE, year.type = c("calendar", "water", "climate"))
dectime(dates, times, time.format, date.format, Date.noon = TRUE, year.type = c("calendar", "water", "climate"))
dates |
a vector of a valid date object, or character representation of dates. Missing values are permitted and produce corresponding missing values in the output. |
times |
a character representation of times. Missing values are permitted and produce corresponding missing values in the output. |
time.format |
format to convert |
date.format |
format to convert |
Date.noon |
logical, if |
year.type |
a character string indicating the type of year to determine the offset, must be one of "calendar," "water," or "climate." |
The format for times
must be one of "hm," "hms," or "ms." Note that
this is actually a conversion function, see See Also. If times
is missing, dates
is class "Date," and Date.noon
is TRUE
,
then set the time to 12:00, so that the decimal time represents the center of
the day.
A vector representation of the data in decimal format–year and decimal fraction.
dectime("11/11/1918", date.format="%m/%d/%Y") dectime(1988:1990)
dectime("11/11/1918", date.format="%m/%d/%Y") dectime(1988:1990)
Convert time data expressed as year and fractional part of year to class "Date."
dectime2Date(x, Date.noon = TRUE)
dectime2Date(x, Date.noon = TRUE)
x |
the decimal date to convert. |
Date.noon |
correct from noon correction for |
A vector of class "Date" corresponding to each value in x
.
A small value, representing about 1 minute, is added to each value in x
to prevent truncation errors in the conversion. This can cause some errors if
the data were converted from date and time data.
dectime("02/07/2013", date.format="%m/%d/%Y") # Convert back the printed result: dectime2Date(2013.103)
dectime("02/07/2013", date.format="%m/%d/%Y") # Convert back the printed result: dectime2Date(2013.103)
Density, cumulative probability, quantiles, and random generation for the log-Pearson Type III distribution.
dlpearsonIII(x, meanlog = 0, sdlog = 1, skew = 0) plpearsonIII(q, meanlog = 0, sdlog = 1, skew = 0) qlpearsonIII(p, meanlog = 0, sdlog = 1, skew = 0) rlpearsonIII(n, meanlog = 0, sdlog = 1, skew = 0)
dlpearsonIII(x, meanlog = 0, sdlog = 1, skew = 0) plpearsonIII(q, meanlog = 0, sdlog = 1, skew = 0) qlpearsonIII(p, meanlog = 0, sdlog = 1, skew = 0) rlpearsonIII(n, meanlog = 0, sdlog = 1, skew = 0)
x , q
|
vector of quantiles. |
meanlog |
vector of means of the distribution of the log-transformed data. |
sdlog |
vector of standard deviation of the distribution of the log-transformed data. |
skew |
vector of skewness of the distribution of the log-transformed data. |
p |
vector of probabilities. |
n |
number of observations. If length( |
Elements of x, q, or p that are missing will result in missing values in the returned data.
Either the density (dlpearsonIII
), cumulative probability
(plpearsonIII
), quantile (qlpearsonIII
), or random sample
(rlpearsonIII
) for the described distribution.
The log-Pearson Type III distribution is used extensively in flood-frequency analysis in the United States.
## Simple examples dlpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution dlpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples plpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution plpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples qlpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution qlpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples rlpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution rlpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25)
## Simple examples dlpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution dlpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples plpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution plpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples qlpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution qlpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples rlpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qlnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution rlpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25)
Convert data in degrees, minutes, and seconds format to decimal degrees.
dms2dd(x, minutes = NULL, seconds = 0, split = "")
dms2dd(x, minutes = NULL, seconds = 0, split = "")
x |
a character or numeric vector coded as degrees, minutes, and seconds or a numeric vector of degrees. The character string may contain a leading zero for values less than 100. Missing values are permitted and result in missing values in the output. |
minutes |
a vector of minutes. If supplied, then |
seconds |
a vector of seconds. Assumed to be 0 if not supplied. Missing values are permitted. |
split |
the delimiter for |
A numeric vector of decimal degrees the same length as x
.
Missing values are returned wherever x
, minutes
, or
seconds
has a missing value.
dms2dd(983206) # using a numeric value # should be [1] 98.535 dms2dd("0983206") # using a character value # should be [1] 98.535 dms2dd(98, 32, 6) # using numeric values for degrees, minutes and seconds # should be [1] 98.535 dms2dd("98:32", split=":") # Note seconds not included in text # should be [1] 98.53333
dms2dd(983206) # using a numeric value # should be [1] 98.535 dms2dd("0983206") # using a character value # should be [1] 98.535 dms2dd(98, 32, 6) # using numeric values for degrees, minutes and seconds # should be [1] 98.535 dms2dd("98:32", split=":") # Note seconds not included in text # should be [1] 98.53333
Density, cumulative probability, quantiles, and random generation for the Pearson Type III distribution.
dpearsonIII(x, mean = 0, sd = 1, skew = 0) ppearsonIII(q, mean = 0, sd = 1, skew = 0) qpearsonIII(p, mean = 0, sd = 1, skew = 0) rpearsonIII(n, mean = 0, sd = 1, skew = 0)
dpearsonIII(x, mean = 0, sd = 1, skew = 0) ppearsonIII(q, mean = 0, sd = 1, skew = 0) qpearsonIII(p, mean = 0, sd = 1, skew = 0) rpearsonIII(n, mean = 0, sd = 1, skew = 0)
x , q
|
vector of quantiles. Missing values are permitted and result in corresponding missing values in the output. |
mean |
vector of means of the distribution of the data. |
sd |
vector of standard deviation of the distribution of the data. |
skew |
vector of skewness of the distribution of the data. |
p |
vector of probabilities. |
n |
number of observations. If length( |
Elements of x
, q
, or p
that are missing will result in
missing values in the returned data.
Either the density (dpearsonIII
), cumulative probability
(ppearsonIII
), quantile (qpearsonIII
), or random sample
(rpearsonIII
) for the described distribution.
The log-Pearson Type III distribution is used extensively in flood-frequency analysis in the United States. The Pearson Type III forms the basis for that distribution.
## Simple examples dpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution dpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples ppearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution ppearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples qpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution qpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) # Simple examples rpearsonIII(c(.5, .75, .9), 1.5, .25, 0)
## Simple examples dpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution dpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples ppearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution ppearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) ## Simple examples qpearsonIII(c(.5, .75, .9), 1.5, .25, 0) ## compare to normal qnorm(c(.5, .75, .9), 1.5, .25) ## Make a skewed distribution qpearsonIII(c(.5, .75, .9), 1.5, .25, 0.25) # Simple examples rpearsonIII(c(.5, .75, .9), 1.5, .25, 0)
Computes the event number eventNum
, the length of events
eventLen
, or the sequence number for individual observations within an
event eventSeq
.
eventNum(event, reset = FALSE, na.fix = FALSE) eventSeq(eventno) eventLen(eventno, summary = FALSE)
eventNum(event, reset = FALSE, na.fix = FALSE) eventSeq(eventno) eventLen(eventno, summary = FALSE)
event |
a logical vector where |
reset |
a logical value indicating whether the event is assumed to
continue until the next event, or only while event is |
na.fix |
the value to use where event has missing values ( |
eventno |
an integer vector indicating the event number. Generally the
output from the |
summary |
a logical value, controlling output. See Value for details. |
The function eventNum
returns an integer vector the same
length as event
indicating the event sequence number.
The function eventLen
returns an integer vector the same length as
eventno
indicating the sequence length of the event if summary
is FALSE
, or a named integer vector indicating the sequence length of
each event if summary
is TRUE
.
The function eventSeq
returns an integer vector the same length as
eventno
indicating the sequence number of each element in the
event.
## Notice the difference caused by setting reset to TRUE eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)) eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE) ## Notice the difference caused by setting reset to TRUE eventSeq(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE))) eventSeq(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE)) ## Notice the difference caused by setting reset to TRUE eventLen(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE)) ## This is an example of the summary option eventLen(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE), summary=TRUE)
## Notice the difference caused by setting reset to TRUE eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)) eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE) ## Notice the difference caused by setting reset to TRUE eventSeq(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE))) eventSeq(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE)) ## Notice the difference caused by setting reset to TRUE eventLen(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE)) ## This is an example of the summary option eventLen(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE), summary=TRUE)
Some time-series analyses require data that are uniformly spaced in time. This function will construct a regular series from randomly spaced events.
eventSeries(times, period = "hour", which = "cumsum", begin, end, k.period = 1)
eventSeries(times, period = "hour", which = "cumsum", begin, end, k.period = 1)
times |
a date-like vector corresponding to data. |
period |
character string that is valid input to the POSIXct method
for the |
which |
a character string indicating the method to use. See Details for options. |
begin |
the beginning date as POSIXt or as character. |
end |
the end date as POSIXt or as character. |
k.period |
the number of units of |
If there is no observation during a period, then that value
is set to 0 if which
is "sum" or the value for the previous period if
which
is "cumsum." The initial value of the series is always 0.
The function eventSeries
returns a data frame with two columns:
DateTime |
the date and time. |
Sum |
the sum of the number of events in the period if |
CumSum |
the cumulative sum of the number of events up to and
including the period if |
## Not run: library(smwrData) data(QW05078470) # Count the number of samples per month with(QW05078470, eventSeries(DATES, "month", which="sum")) ## End(Not run)
## Not run: library(smwrData) data(QW05078470) # Count the number of samples per month with(QW05078470, eventSeries(DATES, "month", which="sum")) ## End(Not run)
Exports a data frame to a text-based file.
exportCSV(x, file.name = "")
exportCSV(x, file.name = "")
x |
the data frame to be written. |
file.name |
a character string naming the file for output. |
The file name is returned.
The function exportCSV
also writes a meta file that contains
information about column formatting.
Exports a data frame to a text-based file.
exportRDB(x, file.name = "data.rdb", col.names = NULL, meta = FALSE, code.rule = 10)
exportRDB(x, file.name = "data.rdb", col.names = NULL, meta = FALSE, code.rule = 10)
x |
the data frame to be written. |
file.name |
a character string naming the file for output. |
col.names |
a vector of column names to use instead of the column names
in |
meta |
a logical value indicating whether the header should include a metadata template for documentation or not. |
code.rule |
an integer value indicating how many unique numeric values should be included in the metadata template for cases where each distinct value has a descriptive meaning rather than a numeric meaning. |
Setting the meta
argument to TRUE
generates a header that
contains a template that can be edited by the user to describe the contents
of the file.
The file name is returned.
write.table
, importRDB
Replace missing values in time-series data by interpolation.
fillMissing(x, span = 10, Dates = NULL, max.fill = 10)
fillMissing(x, span = 10, Dates = NULL, max.fill = 10)
x |
the sequence of observations. Missing values are permitted and will be replaced. |
span |
the maximum number of observations on each side of each range of missing values to use in constructing the time-series model. See Details. |
Dates |
an optional vector of dates/times associated with each value
in |
max.fill |
the maximum gap to fill. |
Missing values at the beginning and end of x
will not be replaced.
The argument span
is used to help set the range of values used to
construct the StructTS
model. If span
is set small, then the
variance of epsilon dominates and the estimates are not smooth. If
span
is large, then the variance of level dominates and the estimates
are linear interpolations. The variances of level and epsilon are components
of the state-space model used to interpolate values, see StructTS
for details.
See Note for more information about the method.
If span
is set larger than 99, then the entire time series is used to
estimate all missing values. This approach may be useful if there are many
periods of missing values. If span
is set to any number less than 4,
then simple linear interpolation will be used to replace missing values.
The observations in x
with missing values replaced by
interpolation.
The method used to interpolate missing values is based on
tsSmooth
constructed using StructTS
on x
with
type
set to "trend." The smoothing method basically uses the
information (slope) from two values previous to missing values and the two
values following missing values to smoothly interpolate values accounting for
any change in slope. Beauchamp (1989) used time-series methods for synthesizing
missing streamflow records. The group that is used to define the statistics that
control the interpolation is very simply defined by span
rather than
the more in-depth measures described in Elshorbagy and others (2000).
If the data have gaps rather than missing values, then fillMissing will return
a vector longer than x
if Dates
is given and the return data
cannot be inserted into the original data frame. If Dates
is not given,
then the gap will be recognized and not be filled. The function
insertMissing
can be used to create a data frame with the complete
sequence of dates.
Beauchamp, J.J., 1989, Comparison of regression and time-series
methods for synthesizing missing streamflow records: Water Resources
Bulletin, v. 25, no. 5, p. 961–975.
Elshorbagy, A.A., Panu, U.S., and Simonovic, S.P., 2000, Group-based estimation of missing hydrological data, I. Approach and general methodology: Hydrological Sciences Journal, v. 45, no. 6, p. 849–866.
tsSmooth
, StructTS
, insertMissing
## Not run: library(smwrData) data(Q05078470) # Create missing values in flow, the first sequence is a peak and the second is a recession Q05078470$FlowMiss <- Q05078470$FLOW Q05078470$FlowMiss[c(109:111, 198:201)] <- NA # Interpolate the missing values Q05078470$FlowFill <- fillMissing(Q05078470$FlowMiss) # How did we do (line is actual, points are filled values)? par(mfrow=c(2,1), mar=c(5.1, 4.1, 1.1, 1.1)) with(Q05078470[100:120, ], plot(DATES, FLOW, type="l")) with(Q05078470[109:111, ], points(DATES, FlowFill)) with(Q05078470[190:210, ], plot(DATES, FLOW, type="l")) with(Q05078470[198:201, ], points(DATES, FlowFill)) ## End(Not run)
## Not run: library(smwrData) data(Q05078470) # Create missing values in flow, the first sequence is a peak and the second is a recession Q05078470$FlowMiss <- Q05078470$FLOW Q05078470$FlowMiss[c(109:111, 198:201)] <- NA # Interpolate the missing values Q05078470$FlowFill <- fillMissing(Q05078470$FlowMiss) # How did we do (line is actual, points are filled values)? par(mfrow=c(2,1), mar=c(5.1, 4.1, 1.1, 1.1)) with(Q05078470[100:120, ], plot(DATES, FLOW, type="l")) with(Q05078470[109:111, ], points(DATES, FlowFill)) with(Q05078470[190:210, ], plot(DATES, FLOW, type="l")) with(Q05078470[198:201, ], points(DATES, FlowFill)) ## End(Not run)
Format an object of class "timeDay."
## S3 method for class 'timeDay' format(x, format, ...)
## S3 method for class 'timeDay' format(x, format, ...)
x |
the object to be formatted to type "character." |
format |
the format to use for output. See |
... |
not used, required for other methods. |
A vector of character strings representing the time of day values in x
.
Compute sine and cosine terms for describing annual or daily variations.
fourier(x, k.max = 1)
fourier(x, k.max = 1)
x |
a numeric vector where one unit specifies the period. See Details. Missing values are permitted. |
k.max |
the maximum number of paired sine and cosine terms specifying the order of the Fourier series. |
The argument x
can be expressed as decimal time, either annual or
diel; or it can be an object of class "Date," "POSIXct," or "POSIXlt" in
which case it will be converted to annual decimal time using the
dectime
function.
A matrix of the sine and cosine terms corresponding to the
value—two terms are computed for each value of k from 1 to k.max
: sine(k 2
pi x
) and cosine(k 2 pi x
). The value of k.max
is included as an attribute.
Water-quality data commonly follow a sinusoidal variation throughout a yearly cycle. A Fourier series of order one to three is generally enough to adequately describe that variation for many constituents.
# compute the sine and cosine terms for quarters of 2002 fourier(2002 + (0:3)/4) # sin(k=1) cos(k=1) # [1,] 3.54692e-014 1.00000e+000 # [2,] 1.00000e+000 7.08886e-013 # [3,] -3.65749e-013 -1.00000e+000 # [4,] -1.00000e+000 -3.78606e-013 # attr(, "k.max"): # [1] 1 # Compare to 2 cycles per year: fourier(2002 + (0:3)/4, 2) # sin(k=1) cos(k=1) sin(k=2) cos(k=2) #[ 1,] 3.546924e-14 1.000000e+00 7.093848e-14 1 # [2,] 1.000000e+00 7.088855e-13 1.417771e-12 -1 # [3,] -3.657492e-13 -1.000000e+00 7.314983e-13 1 # [4,] -1.000000e+00 -3.786056e-13 7.572112e-13 -1 # attr(,"k.max") # [1] 2
# compute the sine and cosine terms for quarters of 2002 fourier(2002 + (0:3)/4) # sin(k=1) cos(k=1) # [1,] 3.54692e-014 1.00000e+000 # [2,] 1.00000e+000 7.08886e-013 # [3,] -3.65749e-013 -1.00000e+000 # [4,] -1.00000e+000 -3.78606e-013 # attr(, "k.max"): # [1] 1 # Compare to 2 cycles per year: fourier(2002 + (0:3)/4, 2) # sin(k=1) cos(k=1) sin(k=2) cos(k=2) #[ 1,] 3.546924e-14 1.000000e+00 7.093848e-14 1 # [2,] 1.000000e+00 7.088855e-13 1.417771e-12 -1 # [3,] -3.657492e-13 -1.000000e+00 7.314983e-13 1 # [4,] -1.000000e+00 -3.786056e-13 7.572112e-13 -1 # attr(,"k.max") # [1] 2
Combine data from several rows into a single row based on common data in selected columns.
group2row(data, carryColumns, splitColumn, collectColumns)
group2row(data, carryColumns, splitColumn, collectColumns)
data |
a data frame containing the columns to be combined. |
carryColumns |
the names of the columns that form a row in the output dataset. Each unique combination of values in these columns will be a new row in the output dataset. |
splitColumn |
the name of a single column. For each unique value in
|
collectColumns |
the names of the columns to be collected. See Details. |
The function group2row
combines data from several rows into a single
row. Certain columns in the input dataset are said to be "collected." Other
columns may be "carried" into the output dataset by listing them in
carryColumns
. A new row will be created for each unique combination of
values in the carryColumns
. The output row consists of the carried
columns plus new columns that are named by the unique values in the
splitColumn
concatenated with the names in the collectColumns
.
The number of columns in the output data frame is equal to the number of
carryColumns
plus the number of unique values in the
splitColumn
times the number of names in the collectColumns
.
The strategy for collecting columns is to use a set of index values defined
by the splitColumn
. The maximum number of input rows collected for
each output row is equal to the number of unique values defined in the
splitColumn
. The splitColumn
is used to identify a column from
the input data that contains output column information. If a row of input has
a value in this column that matches one of the index values, then that row's
data will be included in the output in the column positions corresponding to
the matched index. The index values are concatenated with the input column
names of the collected columns to derive output column names.
## Not run: library(smwrData) data(QWstacked) group2row(QWstacked, c("site_no", "sample_dt", "sample_tm"), "parm_cd", c("result_va", "remark_cd")) ## End(Not run)
## Not run: library(smwrData) data(QWstacked) group2row(QWstacked, c("site_no", "sample_dt", "sample_tm"), "parm_cd", c("result_va", "remark_cd")) ## End(Not run)
Functions for transforming and back-transforming data using a hyperbolic function.
hyperbolic(x, factor = 0, scale = mean(x, na.rm = TRUE)) Ihyperbolic(x, factor = 0, scale)
hyperbolic(x, factor = 0, scale = mean(x, na.rm = TRUE)) Ihyperbolic(x, factor = 0, scale)
x |
a numeric vector to be transformed by |
factor |
the hyperbolic adjustment term in the hyperbolic equation. |
scale |
the scaling factor for the data. |
If x
contains missing values, then scale
is computed after
omitting the missing values and the output vector has a missing value
wherever x
has a missing value.
The basic equation for the hyberbolic transform is 1/(1 + (10^factor
* x
)/
scale
). The basic equation is adjusted to produce fairly consistent values for small changes
in factor
and increase for increasing values in x
.
The function hyperbolic
computes the forward transform and the
function Ihyperbolic
computes the inverse [hyperbolic] transform, or back-transform.
A numeric vector of the transformed or back-transformed values in
x
with an attribute "scale" of the values used for scale
. The range
of the values returned from hyperbolic
is between 0 and 2 times scale
.
The original hyperbolic transform used a linear factor. The version in
these functions uses the common log of the factor to make the factors easier
to use.
When used with the default value for scale
, factor
values
outside the range of +/- 3 have very little effect on the transform.
The use of a variable hyperbolic transform to help model the relations between stream water chemistry and flow was first described in:
Johnson, N.M., Likens, G.E., Borman, F.H., Fisher, D.W., and Pierce, R.S., 1969, A working model for the variation in stream water chemistry at the Hubbard Brook Experimental Forest, New Hampshire: Water Resources Research, v. 5, no. 6, p. 1353–1363.
X.test <- c(1,4,9,16,25,36,49) hyperbolic(X.test) # accept the defaults hyperbolic(X.test, factor=1) hyperbolic(X.test, factor=-1)
X.test <- c(1,4,9,16,25,36,49) hyperbolic(X.test) # accept the defaults hyperbolic(X.test, factor=1) hyperbolic(X.test, factor=-1)
Compute a basis for estimating hysteresis effects in some variable related to
the argument x
.
hysteresis(x, step = 3)
hysteresis(x, step = 3)
x |
the sequence of observations. Missing values are permitted and will be copied in the output. |
step |
the number of previous observations to use to compute the local mean. See Note. |
A numeric vector that approximates the local trend in x
.
The basis for estimating hysteresis is the current value x
minus
the mean of the previous step
values. The first step
values in
the output will be missing, and each missing value will result in step
plus 1 missing values. This approximates the trend in x
; if x
is
increasing in value over the previous step
values, then the output will be positive
and the greater the relative increase, the larger the output.
The use of hysteresis to help model the relations between stream water chemistry
and flow are described in:
Garrett, J.D., 2012, Concentrations, loads, and yields of select constituents from major tributaries of the Mississippi and Missouri Rivers in Iowa, water years 2004-2008: U.S. Geological Survey Scientific Investigations Report 2012–5240, 61 p.
Wang, P., and Linker, L.C., 2008, Improvement of regression simulation in fluvial sediment loads: Journal of Hydraulic Engineering, v. 134, no. 10, p. 1,527–1,531.
## Not run: library(smwrData) data(Q05078770) # Plot flow and hysteresis to show looping with(Q05078770, plot(log(FLOW), hysteresis(log(FLOW), 3), type="l")) ## End(Not run)
## Not run: library(smwrData) data(Q05078770) # Plot flow and hysteresis to show looping with(Q05078770, plot(log(FLOW), hysteresis(log(FLOW), 3), type="l")) ## End(Not run)
Imports a comma-separated variable file to a data frame.
importCSV(file.name = "", tz = "")
importCSV(file.name = "", tz = "")
file.name |
a character string specifying the name of the comma-separated variable
(CSV) file containing the data to be imported; |
tz |
a character string indicating the time-zone information for data imported as "POSIXct." The default is to use the local setting. |
All of the dates in a date column must have the same format as the first
non-blank date in the column. Any date with a format different from that of
the first non-blank date in the column will be imported as NA
(missing
value). Dates imported as class "Date" using a 4-digit year,
2-digit month, and 2-digit day with the period (.), hyphen (-), slash (/), or
no separator. Time and date data are imported as class "POSIXct" and
assumes the standard POSIX format for date and time.
A data frame with one column for each data column in the CSV file.
A NULL data frame is created if there are no data in the file.
read.csv
, read.table
, scan
,
as.Date
, as.POSIXct
## Not run: ## These datasets are available in smwrData as text files TestDir <- system.file("misc", package="smwrData") TestPart <- importCSV(file.path(TestDir, "TestPart.csv")) ## End(Not run)
## Not run: ## These datasets are available in smwrData as text files TestDir <- system.file("misc", package="smwrData") TestPart <- importCSV(file.path(TestDir, "TestPart.csv")) ## End(Not run)
Imports a formatted, tab-delimited file to a data frame.
importRDB(file.name = "", date.format = NULL, tz = "", convert.type = TRUE)
importRDB(file.name = "", date.format = NULL, tz = "", convert.type = TRUE)
file.name |
a character string specifying the name of the relational database (RDB) file containing the data to be imported. |
date.format |
a character string specifying the format of all date
columns. Required for columns that contain date and time. The default value,
|
tz |
the time zone information of the data. |
convert.type |
logical |
All of the dates in a date column must have the same format as the first
non-blank date in the column. Any date with a format different from that of
the first non-blank date in the column will be imported as NA
(missing
value). By default, dates are imported as class "Date" using a 4-digit year,
2-digit month, and 2-digit day with the period (.), hyphen (-), slash (/), or
no separator.
If a valid date.format
is supplied, then the data are imported using
as.POSIXct
, and time information can be included in the data. If
date.format
is "none," then conversion of the date information is
suppressed and the data are retained as character strings.
The value for tz
should be a valid "Olson" format consisting typically
of a continent and city. See timezone
for a description of
time zones. For the United States, use these time-zone specifications where
daylight savings time is used:
Eastern | "America/New_York" |
Central | "America/Chicago" |
Mountain | "America/Denver" |
Pacific | "America/Los_Angeles" |
Alaska | "America/Anchorage" |
Hawii | "America/Honolulu" |
Use these time specifications where daylight savings time is not used: #'
Eastern | "America/Jamaica" |
Central | "America/Managua" |
Mountain | "America/Phoenix" |
Pacific | "America/Metlakatla" |
A data frame with one column for each data column in the RDB file.
A NULL data frame is created if there are no data in the file.
The header information contained in the RDB file is retained in the output
dataset as comment
.
If convert.type
is TRUE
, then non-numeric values, other than blanks,
are converted to NaN
(not a number) rather than NA
(missing value)
in numeric columns. NaN
values are treated like NA
values but can
be identified using the is.nan
function.
read.table
, as.Date
, as.POSIXct
,
comment
## Not run: ## This dataset is available in smwrData as a text file TestDir <- system.file("misc", package="smwrData") TestFull <- importRDB(file.path(TestDir, "TestFull.rdb")) ## End(Not run)
## Not run: ## This dataset is available in smwrData as a text file TestDir <- system.file("misc", package="smwrData") TestFull <- importRDB(file.path(TestDir, "TestFull.rdb")) ## End(Not run)
Inserts rows of missing values into a data frame for gaps in a sequence.
insertMissing(x, col, fill = FALSE)
insertMissing(x, col, fill = FALSE)
x |
the data frame. |
col |
the name of the column that defines the sequence. |
fill |
logical ( |
A data frame like x
, but with rows of missing values where
there is a break in the sequence in column col
.
Setting fill
to TRUE
is useful for setting up datasets
for fillMissing
if there are gaps in the retrieved data. Setting
fill
to FALSE
, the default, is useful for creating a break
in the sequence for plotting the data.
## Not run: library(smwrData) data(Q05078470) # Plot the original data with(Q05078470[100:120, ], plot(DATES, FLOW, type="l")) # Remove 3 rows from the data set Q05078470 <- Q05078470[-(109:111), ] # Plot the data--line drawn through the missing record with(Q05078470[100:117, ], lines(DATES, FLOW, col="green")) # Insert a missing record Q05078470 <- insertMissing(Q05078470, "DATES") # Now plot to show gap in line with(Q05078470[100:118, ], lines(DATES, FLOW, col="blue")) ## End(Not run)
## Not run: library(smwrData) data(Q05078470) # Plot the original data with(Q05078470[100:120, ], plot(DATES, FLOW, type="l")) # Remove 3 rows from the data set Q05078470 <- Q05078470[-(109:111), ] # Plot the data--line drawn through the missing record with(Q05078470[100:117, ], lines(DATES, FLOW, col="green")) # Insert a missing record Q05078470 <- insertMissing(Q05078470, "DATES") # Now plot to show gap in line with(Q05078470[100:118, ], lines(DATES, FLOW, col="blue")) ## End(Not run)
Indicate which elements are missing.
## S3 method for class 'timeDay' is.na(x)
## S3 method for class 'timeDay' is.na(x)
x |
the object to be tested. |
A logical vector of the same length as its argument x
,
containing TRUE
for those elements marked NA
and FALSE
otherwise.
is.na(as.timeDay(c("10:30", "11:00")))
is.na(as.timeDay(c("10:30", "11:00")))
Tests if an object can be treated as a character, to name something; as a date; as a grouping variable, has distinct values; or as a number.
isCharLike(x) isDateLike(x) isGroupLike(x) isNumberLike(x)
isCharLike(x) isDateLike(x) isGroupLike(x) isNumberLike(x)
x |
any object. |
The function isCharLike
tests whether x
is of class "character"
or "factor." The function isDateLike
tests whether x
is of
class "Date" or "POSIXt." The function isGroupLike
tests whether
x
is of class "character" or "factor" or if x
is of type
"integer" or "logical." The function isNumberLike
tests whether
x
is of type "numeric" or of class "Date."
A logical value TRUE
if x
meets the criteria, or
FALSE
if it does not.
This function is most useful within other functions to control how that function handles a particular argument.
class
, is.numeric
,
is.factor
, is.character
,
is.integer
, is.logical
## The first should be FALSE and the second TRUE isCharLike(as.Date("2004-12-31")) isCharLike("32") ## The first should be FALSE and the second TRUE isDateLike(32) isDateLike(as.Date("2004-12-31")) ## The first should be FALSE and the second TRUE isGroupLike(as.Date("2004-12-31")) isGroupLike(32) ## The first should be FALSE and the second TRUE isNumberLike(as.Date("2004-12-31")) isNumberLike(32)
## The first should be FALSE and the second TRUE isCharLike(as.Date("2004-12-31")) isCharLike("32") ## The first should be FALSE and the second TRUE isDateLike(32) isDateLike(as.Date("2004-12-31")) ## The first should be FALSE and the second TRUE isGroupLike(as.Date("2004-12-31")) isGroupLike(32) ## The first should be FALSE and the second TRUE isNumberLike(as.Date("2004-12-31")) isNumberLike(32)
Get the length of a time-of-day object.
## S3 method for class 'timeDay' length(x)
## S3 method for class 'timeDay' length(x)
x |
a time-of-day object. |
An integer of length 1 indicating the number of elements in x
.
length(as.timeDay(c("10:30", "11:00")))
length(as.timeDay(c("10:30", "11:00")))
Create a template meta file for a CSV file. The meta file is used by
importCSV
to define column types.
makeMeta(file.name = "")
makeMeta(file.name = "")
file.name |
the name of the CSV file, which may include the path. |
The name of the meta file that was created.
The meta file that is created will only contain column types of character, numeric, integer, and logical. It may need to be edited by the user to redefine the column types actually needed for the data, for example columns of class "Date," "POSIXct," or "factor."
A utility to help model.frame.default
create the correct matrices
when predicting from models with quadratic
,
hyperbolic
, or boxCox
terms. Used only internally.
## S3 method for class 'quadratic' makepredictcall(var, call) ## S3 method for class 'hyperbolic' makepredictcall(var, call) ## S3 method for class 'boxCox' makepredictcall(var, call) ## S3 method for class 'scaleRng' makepredictcall(var, call)
## S3 method for class 'quadratic' makepredictcall(var, call) ## S3 method for class 'hyperbolic' makepredictcall(var, call) ## S3 method for class 'boxCox' makepredictcall(var, call) ## S3 method for class 'scaleRng' makepredictcall(var, call)
var |
a variable. |
call |
the term in the formula, as a call. |
A replacement for call
for the prediction variable.
timeDay
objectsNo mathematical functions, such as log
or exp
are allowed
on time-of-day data. An error results when trying to use any mathematical function
on objects of class "timeDay."
## S4 method for signature 'timeDay' Math(x)
## S4 method for signature 'timeDay' Math(x)
x |
an object of class "timeDay." |
Merge two datasets based on the the nearest date between each observation.
mergeNearest(left, dates.left = "DATES", all.left = FALSE, suffix.left = "left", right, dates.right = "DATES", suffix.right = "right", Date.noon = TRUE, max.diff = "7 days")
mergeNearest(left, dates.left = "DATES", all.left = FALSE, suffix.left = "left", right, dates.right = "DATES", suffix.right = "right", Date.noon = TRUE, max.diff = "7 days")
left |
the left-hand dataset. |
dates.left |
the name of the column of dates in the left-hand dataset. |
all.left |
logical ( |
suffix.left |
the suffix to apply to common column names in the left-hand dataset. |
right |
the right-hand dataset. |
dates.right |
the name of the column of dates in the right-hand dataset. |
suffix.right |
the suffix to apply to common column names in the right-hand dataset. |
Date.noon |
logical ( |
max.diff |
the maximum allowable difference in time for a match. See Details. |
The format for max.diff
should be a numeric value followed by a
description of the time span. The time span must be one of "secs," "mins,"
"hours," "days," or "weeks" for seconds, minutes, hours, days, or weeks,
respectively.
A data frame of the merged data with common column names renamed by
the suffix
arguments to avoid conflict.
Water-quality data taken at a specific time frequently need to be merged with daily flow data or merged with other water-quality data such as replicate samples or samples of a different medium taken at about the same time, but having a different time stamp.
library(smwrData) data(Q05078470) data(QW05078470) # Set the actual time of sampling in QW05078470 QW05078470 <- transform(QW05078470, DATES=DATES + as.timeDay(TIMES)) mergeNearest(QW05078470, right=Q05078470) # Notice the difference in selected dates mergeNearest(QW05078470, right=Q05078470, Date.noon=FALSE)
library(smwrData) data(Q05078470) data(QW05078470) # Set the actual time of sampling in QW05078470 QW05078470 <- transform(QW05078470, DATES=DATES + as.timeDay(TIMES)) mergeNearest(QW05078470, right=Q05078470) # Notice the difference in selected dates mergeNearest(QW05078470, right=Q05078470, Date.noon=FALSE)
Merges the flow (or other data) column from one or many daily-value datasets into a another dataset with one or more stations.
mergeQ(QWdata, STAID = "STAID", FLOW = "FLOW", DATES = "DATES", Qdata = NULL, Prefix = NULL, Plot = TRUE, ...)
mergeQ(QWdata, STAID = "STAID", FLOW = "FLOW", DATES = "DATES", Qdata = NULL, Prefix = NULL, Plot = TRUE, ...)
QWdata |
a data frame with at least a date column on which to merge. |
STAID |
a character string of the name of the station-identifier column.
The column name must agree in the |
FLOW |
a character string of the name of the flow column. The column name must agree in flow datasets and will be the column name in the merged dataset. See Details |
DATES |
a character string of the name of the column containing the date
information. The column name must agree in |
Qdata |
a data frame containing daily-flow values. |
Prefix |
a character string indicating the prefix of the names of datasets containing daily-flow values. |
Plot |
a logical value indicating whether to plot the joint distribution
of sampled flows and observed flows. See Notes for a description of
the plot. Used only if a single column is specified in |
... |
defines the dataset containing daily flow values for each station identifier. |
More than one column can be specified for FLOW
when merging a single
station, and the flow data are specified in Qdata
.
A data frame like QWdata
with an attached flow column(s).
The station-identifier columns must be of class character.
There are fours ways to merge flow and water-quality data:
A dataset that contains data for a single site does not require
a STAID
column. Qdata
must be supplied. This case must be used if
the flow record is incomplete or does not cover the range of dates in
QWdata
; all other methods will fail if that is the case. See Example 1.
A dataset that contains data for one or more sites can be
merged with a dataset that contains flow data for the sites in that
first dataset. This method will fail if there is not a complete list
of station identifiers in the flow dataset. See Example 2.
A dataset that contains data for one or more sites can be
merged with flow datasets that have names based on STAID
. The structure of
the name must be some common prefix followed by the station identifier. The
station identifier must conform to a valid name. This method will
fill in missing values (NA
s) if a dataset corresponding to a station
identifier is not available. See Example 3.
A dataset that contains data for one or more sites can be
merged with flow datasets that have arbitrary names. Station identifiers that
do not conform to valid names must be quoted. This method will fail if
there is not a complete list of station identifiers supplied as arguments.
See Example 4.
The plot shows the joint distribution of the sampled flows and observed flows from the sampling time period. The quantile-quantile plots are used to assess whether the sampled and observed flows have the same distribution. If the distributions are the same, then the plot will be approximately a straight line (included as a reference line). The extreme points can have more variability than points toward the center. A plot that shows the upper end trailing upward indicates that the largest flows have been under sampled and the sampled data may not give a reliable estimate of loads.
## Not run: library(smwrData) data(Q05078470) data(Q05078770) data(Qall) data(QW05078470) data(QWall) # Example 1 # mergeQ(QW05078470, Qdata=Q05078470, Plot=FALSE) # Example 2 # mergeQ(QWall, FLOW="Flow", Qdata=Qall, Plot=FALSE) # Example 3 # mergeQ(QWall, Prefix="Q", Plot=FALSE) # Example 4 # Note quotes required for station identifiers mergeQ(QWall, "05078470"=Q05078470, "05078770"=Q05078770, Plot=FALSE) ## End(Not run)
## Not run: library(smwrData) data(Q05078470) data(Q05078770) data(Qall) data(QW05078470) data(QWall) # Example 1 # mergeQ(QW05078470, Qdata=Q05078470, Plot=FALSE) # Example 2 # mergeQ(QWall, FLOW="Flow", Qdata=Qall, Plot=FALSE) # Example 3 # mergeQ(QWall, Prefix="Q", Plot=FALSE) # Example 4 # Note quotes required for station identifiers mergeQ(QWall, "05078470"=Q05078470, "05078770"=Q05078770, Plot=FALSE) ## End(Not run)
Display the contents of an object by pages.
more(x, n = 20L, ...)
more(x, n = 20L, ...)
x |
any valid object, generally a data frame, matrix, or table. |
n |
a positive integer indicating how many lines to print for a page. |
... |
additional arguments to be passed to methods for |
The object x
is returned invisibly. Sections of x
of
length n
are displayed during the execution of the function.
The function more
is intended for interactive sessions. If used
in a non-interactive session, it simply returns x
invisibly.
Several keyboard commands can be used to view the contents of x
. The
function more
will display n
lines of x
and wait for
user input. Any of the following commands can be entered by the user;
either upper- or lowercase letters are accepted.
q
Quit
t
Go to top of x
b
Go to bottom of x
u
Go up 1/2 page
p
Go to previous page
d
Go down 1/2 page
colName/pattern
Search for pattern in the column named colName
h
or ?
Print help
any other letter
Go down full page
Searching for a pattern in a column uses grep
to search for the
specified pattern in the character representation of the data in the column.
This makes it possible to search columns that are not type character.
Implements the Savitzky-Golay (Savitzky and Golay, 1964) filter on a regular series of data to compute a moving average.
movingAve(x, span = 3, order = 0, pos = "center")
movingAve(x, span = 3, order = 0, pos = "center")
x |
the data to be averaged or differenced. Missing values are permitted but result in missing values in the output. |
span |
the length of the data to be averaged. |
order |
the polynomial order for averaging. Must be less than
|
pos |
how to position the output data relative to the value returned;
"center" means that the value represents the average of the
most central value relative to the |
A vector of the same length as x
containing the averages.
For odd values of span
and pos
equal to "center,"
order
equal to 0 or 1 gives the same result.
In general, there is no
reason to use polynomial orders greater than 2, and pos
should always
be set to "center" for polynomial orders greater than 1 to avoid strange
behavior due to end effects.
The weights for the averages are computed based on linear model theory (Savitzky and Golay, 1964; Wood and Hockens, 1970). Wood and Hockens (1970) also discuss some artifacts resulting from smoothing.
Savitzky, A., and Golay, M.J.E., 1964, Smoothing and differentiation of data
by simplified least squares procedures: Analytical Chemistry, v. 36, no. 8, p.
1627–1639.
Wood, L.C., and Hockens, S.N., 1970, Least squares smoothing operators: Geophysics, v. 35, no. 6, p. 1005–1019.
## Construct a simple valley movingData <- abs(seq(-5, 5)) movingAve(movingData, span=5) movingAve(movingData, span=5, order=2)
## Construct a simple valley movingData <- abs(seq(-5, 5)) movingAve(movingData, span=5) movingAve(movingData, span=5, order=2)
Filter a regular series of data to compute a moving difference.
movingDiff(x, span = 1, pos = "end")
movingDiff(x, span = 1, pos = "end")
x |
the data to be averaged or differenced. Missing values are permitted but result in missing values in the output. |
span |
the span of the differences. |
pos |
how to position the output data relative to the value returned;
"center" means that value represents the difference between the
preceding |
A vector of the same length as x
containing the differences.
# Construct a simple valley movingData <- abs(seq(-5, 5)) movingDiff(movingData, span=1) # Compare to diff: diff(movingData)
# Construct a simple valley movingData <- abs(seq(-5, 5)) movingDiff(movingData, span=1) # Compare to diff: diff(movingData)
Converts missing values (NA
s) to or from a user specified value.
na2miss(x, to = -99999) miss2na(x, from = -99999)
na2miss(x, to = -99999) miss2na(x, from = -99999)
x |
a vector. Missing values ( |
to |
the replacement value for |
from |
the target value to match and replace with |
An object like x
with each target value replaced by the
specified value.
The function na2miss
converts missing values (NA
) to the
value to
and is useful to prepare a vector for export and subsequent
use by software external to R that does not handle NAs.
The function
miss2na
converts the value from
to NA
and can be used to
recode data imported from external software that uses a special value to
indicate missing values.
## Construct simple substitutions na2miss(c(1, 2, 3, NA, 5, 6))
## Construct simple substitutions na2miss(c(1, 2, 3, NA, 5, 6))
Find the local maxima in a vector.
peaks(x, span = 3, ties = "first", ends = TRUE)
peaks(x, span = 3, ties = "first", ends = TRUE)
x |
any numeric vector. Missing values are permitted, but suppress
identifying peaks within |
span |
The window width, the default value is 3, meaning compare each
value to both of its neighbors. The value for |
ties |
a character indicating how to handle ties. See Details. |
ends |
a logical value indicating whether or not to include either the first or last observations in the sequence if it is a local maximum. |
Possible values for ties
include "none," which treats sequential tied values as
individual values; all other values can be thought of as collapsing
sequential tied values—"first," "middle," or "last" identify the first,
middle, or last, respectively, of a sequence of ties as the peak if appropriate.
A vector matching x
of logical values indicating whether the
corresponding element is a local maximum or not.
A peak is defined as an element in a sequence that is strictly greater
than all other elements within a window of width span
centered at that
element. As such, setting ties
to "none" has the effect of not
identifying peaks with sequential tied values.
# Note the effect of missing values peaks(c(1:6,5,4,NA,4,6,9,NA)) peaks(c(1:6,NA,5,4,NA,4,6,9,NA)) # Note the effect of ties peaks(c(1:6,6,6,5,4,3,4,6,9)) peaks(c(1:6,6,6,5,4,3,4,6,9), ties="none")
# Note the effect of missing values peaks(c(1:6,5,4,NA,4,6,9,NA)) peaks(c(1:6,NA,5,4,NA,4,6,9,NA)) # Note the effect of ties peaks(c(1:6,6,6,5,4,3,4,6,9)) peaks(c(1:6,6,6,5,4,3,4,6,9), ties="none")
Return the value associated with test
from the supplied vectors.
pick(test, ..., .pass = test, na = NA)
pick(test, ..., .pass = test, na = NA)
test |
a logical, numeric, or character vector that indicates which value
to select from the data supplied in |
... |
the values to be selected. |
.pass |
the value to return for any element of |
na |
the value to return for any element of |
If test
is logical, then if test is TRUE
, return the first
argument in ..., otherwise return the second argument.
If test
is numeric, then return that value in the list defined by ...
If test
is character, then return that value in the list defined by
...
, which must be named in the call.
If test
is
NA
, then return the value specified by na
.
A vector of the same length as test
and data values from the
values list defined by ...
The mode of the result will be coerced
from the values list defined by ...
This function is designed to replace nested ifelse
expressions.
See Examples. It is different from switch
in that the value
selected from the possible alternatives is selected by the values in
test
rather than by a single value.
## Create the test vector testpick <- c(1,2,3,1) ## Nested ifelse ifelse(testpick == 1, 1, ifelse(testpick == 2, 3, ifelse(testpick == 3, 5, NA))) ## Results by pick: pick(testpick, 1, 3, 5) ## Create a test vector of character data testpick <- c("a","b","c","a") pick(testpick, a=1, b=3, c=5)
## Create the test vector testpick <- c(1,2,3,1) ## Nested ifelse ifelse(testpick == 1, 1, ifelse(testpick == 2, 3, ifelse(testpick == 3, 5, NA))) ## Results by pick: pick(testpick, 1, 3, 5) ## Create a test vector of character data testpick <- c("a","b","c","a") pick(testpick, a=1, b=3, c=5)
Print an object of class "timeDay."
## S3 method for class 'timeDay' print(x, ...)
## S3 method for class 'timeDay' print(x, ...)
x |
an object of class "timeDay." |
... |
not used, required for other methods. |
The object x
is returned invisibly.
The object x
is printed.
The object is printed using the format that created the object
in as.timeDay
.
Computes orthogonal polynomials of degree 2 (Cohn and others, 1992). Used primarily in a linear regression formula.
quadratic(x, center = NULL)
quadratic(x, center = NULL)
x |
a numeric vector. Missing values are permitted and result in corresponding missing values in the output. |
center |
an optional value to use for the center of |
A matrix of two columns—the centered value of x
and its
square.
If center
is specified, then the polynomials will not
necessarily be orthogonal. If used in a linear regression formula, then the
coefficient of the linear term is the slope at center
.
The function quadratic
differs
from poly
in that the data are not scaled, so the regression
coefficients are directly interpretable in terms of the units of x
.
Cohn, T.A., Caulder, D.L., Gilroy, E.J., Zynjuk, L.D., and Summers, R.M., 1992, The validity of a simple statistical model for estimating fluvial constituent loads—An empirical study involving nutrient loads entering Chesapeake Bay: Water Resources Research, v. 28, no. 5, p. 937–942.
## first and second orthogonal polynomials for the sequence from 1 to 10 quadratic(seq(10))
## first and second orthogonal polynomials for the sequence from 1 to 10 quadratic(seq(10))
Import data arranged on lines into a list.
readList(file, names = TRUE, sep = "", nlines = 1, convert = NULL)
readList(file, names = TRUE, sep = "", nlines = 1, convert = NULL)
file |
a character string specifying the name of the file. |
names |
logical, if |
sep |
the separator character for the data in each line. If a blank string (the default), then any white space is taken as the separator. |
nlines |
the number of lines that represent a single collection of data, |
convert |
character string indicating how to convert the data. Must be
a valid value for the |
A list with one component for each nlines
in the input file.
# Make a 3-line example dataset with component names A, B, and C. cat("A 1 2 3 4\nB 5 6 7\nC 8 9\n", file="readList.test") # Read the example dataset readList("readList.test")
# Make a 3-line example dataset with component names A, B, and C. cat("A 1 2 3 4\nB 5 6 7\nC 8 9\n", file="readList.test") # Read the example dataset readList("readList.test")
Converts a specified value to another value.
recode(x, from, to) ## S3 method for class 'factor' recode(x, from, to) ## S3 method for class 'integer' recode(x, from, to) ## S3 method for class 'character' recode(x, from, to) ## S3 method for class 'numeric' recode(x, from, to)
recode(x, from, to) ## S3 method for class 'factor' recode(x, from, to) ## S3 method for class 'integer' recode(x, from, to) ## S3 method for class 'character' recode(x, from, to) ## S3 method for class 'numeric' recode(x, from, to)
x |
a vector. Missing values ( |
from |
the target value to match and replace. |
to |
the replacement value. |
An object like vector with each target value replaced by the specified value.
When used on numeric (type "double"), the recode
function uses an approximate match within a small tolerance range to avoid
mismatches due to computations.
The function sub
offers greater
flexibility than recode
for replacing parts of text instead of the
complete text.
XT <- c(1, 2, 0, 4) recode(XT, 0, 3)
XT <- c(1, 2, 0, 4) recode(XT, 0, 3)
Some time-series analyses require data that are uniformly spaced in time. This function will construct a regular series from randomly spaced data using any of several user-definable methods.
regularSeries(x, times, period = "month", which = "middle", begin, end, k.period = 1)
regularSeries(x, times, period = "month", which = "middle", begin, end, k.period = 1)
x |
a vector of observations that represents a series. |
times |
a date-like vector corresponding to data. |
period |
character string that is valid input to the POSIXct method
for the function |
which |
a character string indicating the method to use, or the name of a function. See Details for options. |
begin |
the beginning date as POSIXt or as character. |
end |
the end date as POSIXt or as character. |
k.period |
the number of units of |
For regularSeries
, if there is no observation during a period, then that
value is set to NA
. If there is one observation, then the value is set
to the value of that single observation. The value of which
controls how
periods with multiple observations are handled. Three character strings are
recognized for selecting a single value: "earliest" selects the earliest
observation in the period, "middle" selects the observation closest to the
middle of the period, and "latest" selects the latest observation in the
period. If which
is not one of these, then it should be the name of a
function such as mean or median.
The function regularSeries
returns a data frame with the
following columns:
Season |
the season number. |
SeasonStartDate |
the starting date of the corresponding season number—the season includes dates greater than or equal to this date. |
SeasonEndDate |
the end date of the corresponding season number—the season includes dates strictly less than this date. |
Value |
the value from |
ValueDate |
the date from |
## Not run: library(smwrData) data(QW05078470) with(QW05078470, regularSeries(P00665, DATES)) # there should be no values for season numbers 2, 5, or 10 ## End(Not run)
## Not run: library(smwrData) data(QW05078470) with(QW05078470, regularSeries(P00665, DATES)) # there should be no values for season numbers 2, 5, or 10 ## End(Not run)
Transforms numeric data to a specified range.
scaleRng(x, Min = 0, Max = 1, x.range = range(x, na.rm = TRUE)) IscaleRng(x, Min, Max, x.range)
scaleRng(x, Min = 0, Max = 1, x.range = range(x, na.rm = TRUE)) IscaleRng(x, Min, Max, x.range)
x |
any numeric vector. Missing values are permitted and result in missing values in the corresponding output. |
Min |
the minimum of the output range. |
Max |
the maximum of the output range. |
x.range |
the input range to map to the output range. The default range is
computed from the range of |
The function scaleRng
maps the minimum of x.range
to Min
and the maximum of x.range
to Max
and uses linear interpolation
for other values in x
.
A numeric vector scaled to the specified range.
Some applications recommend or require data scaled to a consistent range.
The function scaleRng
will do that and can be used to back-transform
the data.
## simple case with back-transform x.tmp <- print(scaleRng(c(1.2, 2.3, 3.4, 5.6))) IscaleRng(x.tmp) ## now set the expected ranges x.tmp <- print(scaleRng(c(1.2, 2.3, 3.4, 5.6), x.range=c(1, 6))) IscaleRng(x.tmp)
## simple case with back-transform x.tmp <- print(scaleRng(c(1.2, 2.3, 3.4, 5.6))) IscaleRng(x.tmp) ## now set the expected ranges x.tmp <- print(scaleRng(c(1.2, 2.3, 3.4, 5.6), x.range=c(1, 6))) IscaleRng(x.tmp)
Screens data to determine if a value is reported for each date by calendar or water year.
screenData(dates, values, type = "DV", year = "calendar", printit = TRUE)
screenData(dates, values, type = "DV", year = "calendar", printit = TRUE)
dates |
the sequence of dates for each value in |
values |
the sequence of observations. |
type |
the frequency of |
year |
the type of year: "calendar" or "water," which begins on October 1 of the previous calendar year and ends on September 30. |
printit |
logical, if |
Missing values are permitted in either dates
or values
.
Those missing values are tallied in the completeness of record.
For type
= "DV," a matrix of the counts of missing values,
either coded as NA
or not in the dataset, for each month and each
year within the range of dates
.
For type
= "IV," a matrix of the counts of observed values
for each month and each year within the range of dates
.
This function is based on the screen program described in:
Rutledge, A.T., 2007, Program user guide for RECESS: at
https://water.usgs.gov/ogw/recess/UserManualRECESS.pdf.
library(smwrData) data(Q05078770) # this should indicate no missing values. with(Q05078770, screenData(DATES, FLOW)) # There should be missing values shown for: #months 10-12 in water year 2003 (October - December, 2002), and #months 1-9 of water year 2004. with(Q05078770, screenData(DATES, FLOW, year="w"))
library(smwrData) data(Q05078770) # this should indicate no missing values. with(Q05078770, screenData(DATES, FLOW)) # There should be missing values shown for: #months 10-12 in water year 2003 (October - December, 2002), and #months 1-9 of water year 2004. with(Q05078770, screenData(DATES, FLOW, year="w"))
Functions for transforming and back-transforming data using an s-shaped curve.
sCurve(x, location = 0, scale = 1, shape = 1) IsCurve(x, location = 0, scale = 1, shape = 1)
sCurve(x, location = 0, scale = 1, shape = 1) IsCurve(x, location = 0, scale = 1, shape = 1)
x |
a numeric vector to be transformed by |
location |
the transition point in the s-curve transform. |
scale |
the scaling factor for the data, the slope at the transition point in the s-curve transform. Must be greater than 0. |
shape |
a value that determines how quickly the curve approaches the limits of -1 or 1. Must be greater than 0. |
The basic equation for the s-curve is z/(1 + abs(z)^shape
)^(1/shape
),
where z is scale
*(x
-location
).
The function sCurve
computes the forward transform and the
function IsCurve
computes the inverse [sCurve] transform, or back-transform.
A numeric vector of the transformed or back-transformed values in
x
.
The sCurve
function is related to the hyperbolic
function in that both can represent mixing models for flow in stream water
chemistry. The sCurve
function is more flexible when there are
distinct upper and lower limits to the concentration. The hyperbolic
function is more flexible for open-ended concentrations for either high
or low flows. Also, sCurve
would typically use log-transformed
values for flow.
## Not run: # Basic changes to the s-curve curve(sCurve(x), -5,5, ylim=c(-1,1)) # Shift to left curve(sCurve(x, location=1), -5,5, add=TRUE, col="red") # increase slope curve(sCurve(x, scale=2), -5,5, add=TRUE, col="cyan") # increase rate curve(sCurve(x, shape=2), -5,5, add=TRUE, col="purple") ## End(Not run)
## Not run: # Basic changes to the s-curve curve(sCurve(x), -5,5, ylim=c(-1,1)) # Shift to left curve(sCurve(x, location=1), -5,5, add=TRUE, col="red") # increase slope curve(sCurve(x, scale=2), -5,5, add=TRUE, col="cyan") # increase rate curve(sCurve(x, shape=2), -5,5, add=TRUE, col="purple") ## End(Not run)
Create categories for any definitions of seasons by month and day.
seasons(x, breaks, Names = paste("Season Ending ", breaks, sep = ""))
seasons(x, breaks, Names = paste("Season Ending ", breaks, sep = ""))
x |
any vector of valid dates or date-time data of class "Date" or "POSIXt." |
breaks |
either month names of the end of the seasons or specific days in the form of "mm/dd," where mm is the 2-digit month and dd is the 2-digit day. Breaks in the form of "mm/dd" indicate the last day of each season. Breaks must be in calendar order. |
Names |
optional names for the seasons. |
The default names for the seasons are of the form "Season Ending ...," where
... is derived from breaks
.
A factor of seasonal categories.
## Just two seasons seasons(as.Date(c("2001-03-31", "2001-06-30", "2001-09-30")), breaks=c("June", "December")) ## The equivalent using mm/dd format seasons(as.Date(c("2001-03-31", "2001-06-30", "2001-09-30")), breaks=c("06/30", "12/31")) ## Not run: # Apply to a real dataset library(smwrData) data(QW05078470) transform(QW05078470, Seas=seasons(DATES, breaks=c("June", "December"))) ## End(Not run)
## Just two seasons seasons(as.Date(c("2001-03-31", "2001-06-30", "2001-09-30")), breaks=c("June", "December")) ## The equivalent using mm/dd format seasons(as.Date(c("2001-03-31", "2001-06-30", "2001-09-30")), breaks=c("06/30", "12/31")) ## Not run: # Apply to a real dataset library(smwrData) data(QW05078470) transform(QW05078470, Seas=seasons(DATES, breaks=c("June", "December"))) ## End(Not run)
Collapse a numeric sequence into a compact form that represents continuous ranges and discontinuous values.
seqCollapse(x, sequential = "-", skips = ", ")
seqCollapse(x, sequential = "-", skips = ", ")
x |
an integer vector, missing values and repeated values are permitted and removed before collapsing. |
sequential |
the separator for sequential values. |
skips |
the separator for gaps in the sequence. |
A character string that represents that data in x
in a compact
form. If x
is empty, then "" is returned.
This function is commonly used to express years in a compact form.
# A single value seqCollapse(1968) # A single, continuous range of values seqCollapse(1968:1992) # A collection of continuous and individual values seqCollapse(c(1968:1992, 1998, 2002, 2006:2012))
# A single value seqCollapse(1968) # A single, continuous range of values seqCollapse(1968:1992) # A collection of continuous and individual values seqCollapse(c(1968:1992, 1998, 2002, 2006:2012))
Create a new file or object name from an old name with an optional new suffix.
setFileType(filename, type = "tmp", replace = FALSE)
setFileType(filename, type = "tmp", replace = FALSE)
filename |
a single character string of the name of the file or object. |
type |
character string identifying the new suffix. |
replace |
logical ( |
A character string like filename
but with a new suffix.
This function is designed as a support function for many functions.
Dave Lorenz, original coding by Jim Slack, U.S. Geological Survey retired.
# Replace the .dat suffix with .txt setFileType("TestName.dat", "txt", replace=TRUE)
# Replace the .dat suffix with .txt setFileType("TestName.dat", "txt", replace=TRUE)
Set the time-zone information for dates and times.
setTZ(x, TZ, force.stz = FALSE)
setTZ(x, TZ, force.stz = FALSE)
x |
the date-time data, generally class "POSIXct." |
TZ |
time-zone code or time-zone name, see Details. |
force.stz |
force standard time specified in |
The time-zone information should be a standard name like those described in http://en.wikipedia.org/wiki/List_of_zoneinfo_time_zones. For the convenience of users in the United States, correct conversion is provided for the time-zone codes of EST, EDT, CST, CDT, MST, MDT, PST, PDT, AKST, AST, AKDT, ADT, HAST, and HST. However, time data in States like Arizona, where savings time is never used, would use time-zone information specified like " America/Phoenix" to avoid the possibility of setting savings time when it is not appropriate.
Data like x
, but with times adjusted by the time-zone information.
The time-zone information is a characterisitic of the data and not of
each individual value. If the data in x
come from different time
zones, then a time zone is selected from the data and used as the base—the
dates in x
are correctly converted to the selected time zone and a
warning is issued.
TestDts <- as.POSIXct(c("2010-05-28 09:50:00", "2010-11-29 15:20:00")) setTZ(TestDts, c("PDT", "PST")) # Try setting to different time zones setTZ(TestDts, c("PDT", "CST"))
TestDts <- as.POSIXct(c("2010-05-28 09:50:00", "2010-11-29 15:20:00")) setTZ(TestDts, c("PDT", "PST")) # Try setting to different time zones setTZ(TestDts, c("PDT", "CST"))
Returns a vector like the input, but with the position of the data shifted up or down.
shiftData(x, k = 1, fill = NA, circular = FALSE)
shiftData(x, k = 1, fill = NA, circular = FALSE)
x |
any vector. |
k |
a positive or negative whole number of positions to shift the data. Positive values shift data to a higher position and negative values shift data to a lower position. |
fill |
a scalar value like |
circular |
logical ( |
A vector like x
, with data shifted in position.
shiftData(1:5, k=1) # [1] NA 1 2 3 4 shiftData(1:5, k=1, circ=TRUE) # [1] 5 1 2 3 4
shiftData(1:5, k=1) # [1] NA 1 2 3 4 shiftData(1:5, k=1, circ=TRUE) # [1] 5 1 2 3 4
show
for Time-of-Day ObjectsDisplay the time of day.
## S4 method for signature 'timeDay' show(object)
## S4 method for signature 'timeDay' show(object)
object |
the object to be printed. |
The object is printed and returned invisibly.
Compute percentage or proportion of elements in a composition.
sumComposition(x, ..., Range = 100)
sumComposition(x, ..., Range = 100)
x |
any numeric vector, matrix, or data frame containing only numeric columns. |
... |
any additional vectors or matrices. |
Range |
the output range, generally 100 for percentages (the default) or 1 for proportions. |
Missing values are permitted in x
or ...
and result in missing
values for the row in the output.
A matrix with columns matching all of the data in x
and
...{}
with rows summing to Range
.
This function is designed to meet a very simple need in some
applications like constructing data for Piper (Piper, 1944) or trilinear diagrams.
For more in-depth manipulations of compositional data, the user is directed
to the compositions
or other similar package.
Piper, A.M., 1944, A graphical procedure in the geochemical interpretation of water analyses: Transactions of the American Geophysical Union, v. 25, p. 914-923.
# Create tiny dataset TinyCations <- data.frame(Ca=c(32, 47, 28), Mg=c(10,12,15), Na=c(7, 5, 7)) sumComposition(TinyCations)
# Create tiny dataset TinyCations <- data.frame(Ca=c(32, 47, 28), Mg=c(10,12,15), Na=c(7, 5, 7)) sumComposition(TinyCations)
Class "timeDay" describes the time of day without any reference to the date. The data are stored as seconds since midnight.
time
the time of day in seconds since midnight.
format
a character string indicating the format to display the time of day.
Objects can be created by calls of the form
as.timeDay(time, format)
.
showClass("timeDay")
showClass("timeDay")
Constructs a data frame from the count data in a contingency table, using the column and row names as classes.
untable(x, rows = "Rows", cols = "Columns", counts = FALSE)
untable(x, rows = "Rows", cols = "Columns", counts = FALSE)
x |
a contingency table. Missing values are not permitted. |
rows |
a character string indicating the name of the column containing the data for the rows. The default column name is "Rows." |
cols |
a character string indicating the name of the column containing the data for the columns. The default column name is "Columns." |
counts |
a logical value indicating whether there should be one row in
the result for each observation, which is the default |
A data frame containing two columns named from rows
and cols
and an
optional column named "Counts" if counts
is set to TRUE
.
The output for this function can be used for input to contingency table analysis functions that require a data frame rather than a contingency table. To convert a column from factor to ordered, use the ordered function.
## Create a small synthetic data matrix mdat <- matrix(seq(6), nrow = 2, ncol=3, dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3"))) untable(mdat)
## Create a small synthetic data matrix mdat <- matrix(seq(6), nrow = 2, ncol=3, dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3"))) untable(mdat)
Create an ordered factor or numeric values from a vector of dates based on the water year.
waterYear(x, numeric = FALSE)
waterYear(x, numeric = FALSE)
x |
an object of class "Date" or "POSIXt." Missing values are permitted and result in corresponding missing values in the output. |
numeric |
a logical value that indicates whether the returned values
should be numeric |
An ordered factor or numeric vector corresponding to the water year.
The water year is defined as the period from October 1 to September 30. The water year is designated by the calendar year in which it ends. Thus, the year ending September 30, 1999, is the "1999 water year."
library(smwrData) data(QW05078470) ## Return an ordered factor waterYear(QW05078470$DATES)
library(smwrData) data(QW05078470) ## Return an ordered factor waterYear(QW05078470$DATES)
Identifies the row and column numbers (indexes) of TRUE
values in a
logical matrix.
whichRowCol(x, which = "both")
whichRowCol(x, which = "both")
x |
a logical matrix. Missing values are treated as |
which |
a character string indicating what should be returned. |
A matrix of the row and column number if which
is "both."
Otherwise a named vector of the row number, if which
is "row," or
column number, if which
is "col." Only the first character is needed.
Some comparisons, %in%
for example, will return a vector rather
than a matrix and cause whichRowCol to fail.
# Simple case to find a single value whichRowCol(matrix(1:20, ncol=4) == 16) # Where are the missing values in a data set? library(smwrData) data(MenomineeMajorIons) whichRowCol(sapply(MenomineeMajorIons, is.na))
# Simple case to find a single value whichRowCol(matrix(1:20, ncol=4) == 16) # Where are the missing values in a data set? library(smwrData) data(MenomineeMajorIons) whichRowCol(sapply(MenomineeMajorIons, is.na))