\documentclass{article} \parskip 6pt \usepackage{pdfpages} \usepackage[margin=1.25in]{geometry} \usepackage[colorlinks=true,urlcolor=blue]{hyperref} %\VignetteIndexEntry{Trends using Monthly Seasonal Kendall} %\VignetteDepends{restrend} %\VignetteDepends{smwrBase} %\VignetteDepends{smwrQW} \begin{document} \SweaveOpts{concordance=TRUE} \raggedright \title{Trend Analysis of Nutrients} \author{Dave Lorenz} \maketitle \begin{abstract} This example illustrates the data manipulations for the seasonal Kendall analysis using the monthly option rather than the seasonal option for defining seasons. Note that these data were actually sampled on a periodic basis and would be more appropriate for the seasonal option, but the monthly option is used for illustration purposes. The example uses nutrient data. This example uses a common time frame for all of the trend tests. The common time frame facilitates comparing trends among the stations and constituents. Most often users will want to divide trend analyses into similar groups of analytes like major ions, nutrients and so forth because they will be analyzed in similar ways and will have common sampling time frames. \end{abstract} \tableofcontents \eject \section{Introduction} Need to add blurb about censoring and no log transform. The data used in this application are a small subset of the data used by Schertz and others (1991). The data are samples taken from water year 1969 (October, 1968) through water year 1989 (September, 1989). Nineteen stations were selected and four nutrients were selected from the original data. <>= # Load restrend and other packages and the data library(restrend) library(smwrBase) library(smwrQW) data(EstrendSub) head(EstrendSub) @ \eject \section{Summarize the Sample Data} In general, it is desirable, but not necessary, to subset the data before proceeding with the analysis of a subset of the constituents. Nutrient concentrations can vary widely in natural waters and can range from completely uncensored to highly censored. No flow adjustment will be done for these data becuase of the potential high variability in censoring--for these data, about 10 percent of ammonia samples are censored, but the other constituents have much lower percentages. <>= # Convert to class qw EstrendSub.qw <- convert2qw(EstrendSub) # Create the subset Nuts <- subset(EstrendSub.qw, select=c("STAID", "DATES", "PN.organic", "PAmmonia", "PKjeldahl", "PTotal.P")) # Rename to remove leading P, not required--just pretty constituents <- c("N.organic", "Ammonia", "Kjeldahl", "Total.P") names(Nuts)[3:6] <- constituents # The sampling for nutrients started later, so remove the samples that # have no nutrient data. Nuts <- dropAllMissing(Nuts, constituents) @ The \texttt{sampReport} function creates a simple PDF file that contains a report of the sample date ranges and graph of samples for each station. It can be used to help define the starting and ending date ranges for the trend tests as well as identifying sample gaps and other sampling issues. <>= # Create the report sampReport(Nuts, DATES="DATES", STAID="STAID", file="NutrientSampling") @ \includepdf[pages={-}]{NutrientSampling.pdf} The call to \texttt{sampReport} returns the file name invisibly (NutrientSampling.pdf). Because it is a full-size portrait PDF file, it is inserted here with compressed pages. The report gives the actual begin and end dates for sampling and the graph shows the sampling dates for each station. It is easy to see that only 5 stations (07227500, 07228000, 07336820, 07343200, and 07346070) were sampled for the entire retrieval period. Note that the report only shows when any sample was taken, the ranges for individual constituents can differ from the pattern shown. The code immediately following this paragraph demonstrates how to show the sampling pattern for a single constituent, but is not executed. The report would show that Kjeldahl sampling did not start at any station until 1974. \noindent\fbox{ \parbox{\textwidth}{ \texttt{\# The sampling pattern for Kjeldahl} \texttt{sampReport(na.omit(Nuts[c("STAID", "DATES", "Kjeldahl")]),} \texttt{DATES="DATES", STAID="STAID", file="KjeldahlSampling")} }} \section{Set up the Project} The user must balance the need to include as many stations as possible and the targeted time frame for the trend estimation. For these data, 5 stations have a reasonably complete record, but to include all of those stations, the analysis period would need to be much shorterThis example will use the full retrieval period and include only the 5 stations with reasonably complete record. The \texttt(setProj) function sets up the trend estimation project. There are many arguments to \texttt(setProj), see the documentation for details. The constituent names or response variable names are referred to as \texttt{Snames} in keeping with the names used in the original ESTREND. After projects have been set up, the user can get a list of the projects by using \texttt{lsProj} or can specify a project to use with \texttt{useProj}. The function \texttt{useProj} must be used to continue working on a project after the user quits from the R session. <>= # subset to a few selected stations: Nuts <- subset(Nuts, STAID %in% c("07227500", "07228000", "07336820", "07343200", "07346070")) # Set up the project setProj("nutrients", Nuts, STAID="STAID", DATES="DATES", Snames=constituents, type="monthly", Start="1969-10-01", End="1989-10-01") @ The \texttt(setProj) function creates a folder in the users workspace with that name. That folder contains \texttt{R} data that are updated after each successful call to an analysis function in \texttt{restrend}. Table 1 describes the data created in this example's call to \texttt(setProj). Any object of class "matrix" or "by" are indexed by station and sname. \textbf{Table 1.} The data created by \texttt(setProj). \begin{tabular}{l l p{8cm}} Name & Class & Description \\ estrend.cl & list & A record of the calls to analysis functions. \\ estrend.cn & matrix & A description of the censoring. May be "none," "left," or "multiple." \\ estrend.cp & matrix & The percent of observations that are left-censored. \\ estrend.df & by & The dataset, contains STAID, DATES, and the response variable. \\ estrend.in & list & Information about the project, such as the start and end dates and the names of columns in each dataset. \\ estrend.ml & by & Details from the monthly selection process. Each is a list from the potential comparisons from each month of the year, the selected months, and the number of months . See Lorenz (2016) for details. \\ estrend.ms & matrix & The number of months or seasons from the analysis recorded in \texttt{estrend.ml}. \\ estrend.st & matrix & The status for each station and sname. Must be "OK" to continue with the trend analysis. \\ \end{tabular} It is useful to verify which stations and snames will be analyzed and what the seasonal definitions are. The user need only enter the name of the R data object in the console. For these data, the seasonal definition is 0 in all cases where the status is not "OK." <>= # Which are OK? estrend.st # What seasonal definition? estrend.ms @ \eject \section{Seasonal Kendall Trend Test} These data are ready for the seasonal Kendall trend test. The function \texttt{SKTrends} executes the trend test on all valid combinations of stations and snames. It can also execute the test on subsets if some changes need to be made. The \texttt{SKTrends} function also creates a PDF file that contains the result of the analysis and a series graph on each page. See the documentation for \texttt{seriesPlot} for information about that graph. The file reports the results for each sname by station with the flow-adjusted results following the untransformed results. Most trends are very small for these data; only the reports for Calcium at 07228000 is shown. <>= # Trend tests, accepting default seasons SKTrends() @ \includepdf[pages={5}]{nutrients_sk.pdf} \section{Trend Results} When completed, or to check on intermediate results, the estimated trends can be extracted using the \texttt{getTrends} function. By default, all stations and snames are extracted. The output dataset is explained in the documentation for \texttt{getTrends}. The user has the option to set a significance level to determine whether there is a significant trend, the default level is 0.05. <>= # get the trends nutrients.tnd <- getTrends() print(nutrients.tnd) @ \eject \section{Further Remarks} Because trend analysis is not necessarily a straightforward process, but requires user assessments at several points in the process, it is not necessarily a good idea to simply create scripts and run them without any user review and interaction. To overcome recording the steps in a script, the functions in restrend record all changes to the projects database in a list called \texttt{estrend.cl}. It can be viewed at any time simply by entering estrend.cl in the console window. It can be saved with the data to ensure that the trend analysis is reproducible. <>= # get the history estrend.cl @ \begin{thebibliography}{9} \bibitem{Lor} Lorenz, D.L., in preparation, restrend: an R package for EStimate TRENDs: U.S. Geological Survey Open File Report, ? p. \bibitem{SAO} Schertz, T.L., Alexander, R.B., and Ohe, D.J., 1991, The computer program EStimate TREND (ESTREND), a system for the detection of trends in water-quality data: U.S. Geological Survey Water Resources Investigations Report 91-4040, 72 p. \end{thebibliography} \end{document}