GSmithREU

From MariachiWiki

Back to my home page.

Contents

Averaging Scripts

mtbcomb8

The R script mtbcomb8 is a variant of mtbcomb2, one of Dima's R scripts. It performs the same operation but does so more efficiently. To do this, the first column (the date/time portion) of the counts data is loaded as a vector of class POSIXct; the remaining columns (d1 through c5) are loaded as a numeric matrix. (Below are some excerpts of mtbcomb8, please see the script in its entirety here.)

 tab1 <- data.matrix( tab[, 2:ncol(tab)] )
 dvec <- tab$date

Selecting the data within each interval is a two-step process. Boundary indexes are used to select a subset of the data that corresponded roughly to the averaging interval:

 dveci <- dvec[ndxA:ndxB]
 tba1i <- tab1[ndxA:ndxB,]

This subset is small, usually about 33 rows long for a 30 minute time interval. Compare this to the size of the original data frame, which may be several hundred thousand rows long. Within this subset, only the rows corresponding to times within the interval are selected:

 dvecj <- dveci[ (b1 <= dveci)&(dveci < b2) ]
 tba1j <- tba1i[ (b1 <= dveci)&(dveci < b2), ]

Since R only needs to search the subset of the data for matching dates, the boundary indexes enable this portion of the script to run much faster, compared to previous versions of this script. Next, the counts data within the interval is averaged and added to the end of tba1:

 tba1 <- rbind(tba1, apply(tba1j, 2, mean, na.rm=TRUE))
 dvec1[k] <- (b1 + dt/2)

Unlike a data frame, in a data matrix, each element has the same class (numeric). (This is because each column of a data frame can have its own class.) When using the rbind() command on two data frames, R must check that the matching columns have the same class. R does not need to do this extra step when using the rbind() command on data matrixes, as the matching columns inherently are of the same class. At the end of each loop, the indexes are advanced:

 k <- (k+1)
 ndxA <- ndxA + length(dvecj)
 ndxB <- ndxA + 1.1 * td

At the end of the script, the date vector is combined with the counts matrix as a data frame, which is returned to the user:

 tbn <- data.frame(dvec1, tba1, wbt1)
 attr(tbn,"names") <- c(attr(tab,"names"), attr(wtb[,2:ncol(wtb)], "names"))
 return(tbn)

Using the Script

To use mtbcomb8, both the counts and weather data must be loaded into R's memory. Unless specified, mtbcomb8 expects the counts data to be named ct0 and the weather data to be named wt0; the averaging interval is 30 minutes by default. If these conditions are met, typing the following into R will save the results of mtbcomb8 into tab:

 tab <- mtbcomb8()

If the counts or weather data to be averaged are not ct0 or wt0, then the following should be used to average counts and weather with a time interval of intervalmin:

 tab <- mtbcomb8( counts, weather, intervalmin )

runavgcounts

The bash shell script runavgcounts runs Jason Immerman's script on all schools within a specified directory. Since this script calls Jason's ROOT script and runs within bash, the user must have both ROOT and bash installed on their computer.

The script will work best if all the counts files named with a similar format (for example, <schoolname>_counts.dat). It will write an averaged file in the same directory as the counts file, named <schoolname>_<time interval in seconds>.csv and a ROOT histogram file named <schoolname>_<time interval in seconds>_hist.root

Using the Script

Download the script, preferably into the directory where you keep your MARIACHI files. Open the script in a text editor (Notepad, SciTe, gedit, etc.) There are three parameters that need to be set if this is your first time using the script. (You won't need to change them unless you change the names/locations of your counts/weather/averaging script files.) First, you need to tell the script where the averaging script is located by changing the parameter ascript. The syntax is:

 ascript=<directory of script>/<script name>

For example, Jason's script avgCounts is saved in the directory /home/greg/work/AverageCounts/ on my computer:

 ascript=/home/greg/work/AverageCounts/avgCounts

Next, the location of the weather file must be specified:

 wfile=<weather file directory and name>

For example, the weather file weather-20070601-20080707.csv saved in the directory /home/greg/work/MtSinaiWeather/:

 wfile=/home/greg/work/MtSinaiWeather/weather-20070601-20080707.csv

The last parameter that needs to be set is cfile, which tells the script where to find the counts files. The syntax for cfile is identical to the syntax for wfile. If you only had one counts file to average (say, bayshore_counts.dat located in the directory /home/greg/work/bayshore/), then you would set:

 cfile=/home/greg/work/bayshore/bayshore_counts.dat

You probably have more than one counts file to average, and don't want to edit the script file more than once. If this is the case, you can type a *, which acts as a wildcard in the cfile parameter. For example, I have saved all the counts files in the following format:

 /home/greg/work/<school name>/<school name>_counts.dat

To average all the counts files, I would set the following:

 cfile=/home/greg/work/*/*_counts.dat

The script will look for any file named <anything>_counts.dat in any directory /home/greg/work/<anything>/<anything>/. Note that the * can take any value, even nothing. (To clarify, the file example_counts.dat in the directory /home/greg/work would be averaged by the script, as it was matched by cfile.)

Once you've set the parameters to the correct values, you can use the script. If the runavgcounts script is located in directory DIR, open a terminal and type:

 >/DIR/runavgcounts <averaging interval in seconds>

Or, if you are already in the directory DIR, then you can just type:

 >./runavgcounts <averaging interval in seconds>

If you want to see the help file, type:

 >./runavgcounts -h

c2 (and higher) plots vs. time

Zipped folder: plots of c2, c3.1, c3.2, c3.3, c4.a and c5 vs. time for all schools, as of July 28, 2008. Separate plots for 1800 and 7200 second averaging intervals. The c2 graphs contain 1 week of data per plot; the higher coincidence graphs contain 2 weeks of data per plot. Green dashed lines indicate the current week's average flux. Blue dotted lines indicate the previous week's average flux. The up/down triangles indicate points that do not fit on the vertical scale of the plot.

monthlykplot

monthly q-plots (plot 1 of 2).
Enlarge
monthly q-plots (plot 1 of 2).
monthly q-plots (plot 2 of 2).
Enlarge
monthly q-plots (plot 2 of 2).

Plots of the relative change of c2 vs. the relative change in pressure were made on a monthly basis for the bayshore detector array. (Please see the attached .PNG files.) For each month, the average c2 rate (r0) was calculated. The 'average' pressure was taken to be 101.325 kPa. Thus, the relative change in c2 is (c2-r0)/r0 and the relative change in pressure is (pressure-p0)/p0. That is,

\Delta c2_i = \frac{c2_i - \overline{c2}}{\overline{c2}} \mbox{ and } \Delta \mbox{pressure}_i = \frac{\mbox{pressure}_i - p0}{p0}

It is believed that the slope of the best fit line should be constant with respect to time. From January - April 2008, the monthly slopes have a constant value, within 1-2 standard deviations. After this time period, the slope increases in magnitude. This change has not yet been accounted for.

Update

q-plots of each school (plot 1 of 3).
Enlarge
q-plots of each school (plot 1 of 3).
q-plots of each school (plot 2 of 3).
Enlarge
q-plots of each school (plot 2 of 3).
q-plots of each school (plot 3 of 3).
Enlarge
q-plots of each school (plot 3 of 3).

The uncertainties in the "monthly q-plots" are those returned by the R function lm(). These uncertainties are not correct; they do not depend on the value of the weights= option of the lm() function for some reason. As a default, lm() will set weights=1, which is about 50 times larger than the appropriate weight. Thus, the reported uncertainty is about 2500 times larger than the actual value. This uncertainty is valid if the only fluctuations are statistical in nature, which is not the case for most of the data (see below).

Bayshore was chosen because it had a good "overall q-plot distribution" (see the q-plots). Other locations (such as the NSL) had poor distributions - why the difference? The answer: schools like Roosevelt have changing c2 values over time (this can be seen in these plots). Representing this data with a single average value is not appropriate. The solution: determine the intervals with (relatively) constant c2 values (from the c2 vs. time plots) for each school, and calculate a q-value for each interval (see NSL plots on the right).

q-plot for NSL (@ Stonybrook University), using a single value for the c2 average. Note the poor linear fit.
Enlarge
q-plot for NSL (@ Stonybrook University), using a single value for the c2 average. Note the poor linear fit.
q-plot for a subset of the NSL data. The fit for the subset is better than the fit for the entire data.
Enlarge
q-plot for a subset of the NSL data. The fit for the subset is better than the fit for the entire data.