Using "R" to Analyze Data

What is "R"?

You collected a data set using a GCDC logger and realized, "Wow, that's a lot of data! Now what?" Data analysis is tedious, and the process is particular to each user's application. Don't expect to find a magic software solution that will reduce your data to a perfect answer. However, don't despair. Several options are available that, combined with a little user effort, provide powerful and versatile analysis capabilities.

Spreadsheets, such as Microsoft Excel or OpenOffice Calc, are great choices for plotting moderately sized data sets. The user interfaces are highly polished and customized plotting is easy to handle. However, most spreadsheets can handle only about 100,000 lines of data before performance begins to slow. Furthermore, scripting complex analysis procedures in a spreadsheet is cumbersome. We recommend trying "R" because it is more powerful than a spreadsheet and it is easy to learn.

"R" is a high-level programming language used most commonly for statistical analysis of data. R is based on the "S" language, which was developed at Bell Laboratories in the 1970s. R provides a simple workspace environment that can manipulate large data sets using simple math commands and complex function libraries. R is widely used by statisticians and data miners, and the language is well supported by the open source community. The software is compact, free, and available for Windows, Mac, and Linux (visit www.r-project.org).

Matlab is another common software application for analyzing data, but it is usually reserved for universities or businesses with copious budgets (it's expensive software!). Octave is a free open source adaptation of Matlab with nearly the same capabilities. However, Octave is a significantly larger download and a more complicated installation than R. We favor R because it's small, easy to learn, and free.

R is operated from a command line workspace. If you are an experienced programmer, you may even cringe at some of the constructs used in R. Don't worry, it just works. User input occurs at the ">" prompt and the R interpreter responds with the results. Printed results are preceded by an index in brackets, such as [1], which gives the position of the first value shown on that line. The "#" character begins a comment, which the R interpreter ignores.
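As a brief illustration of the conventions described above, the lines below can be typed at the ">" prompt (the prompt itself is not typed). This is only a sketch: the variable names and sample values are made up for the example and do not come from a logger file.

```r
# Lines starting with "#" are comments; the interpreter ignores them.
accel <- c(0.02, -0.15, 0.98, 1.01)  # hypothetical readings in g

# Typing a name or expression prints its value, prefixed by [1].
mean_accel <- mean(accel)
mean_accel

# Math applies to every element of a vector at once.
accel_ms2 <- accel * 9.80665  # convert g to m/s^2
accel_ms2
```

Assigning a result with "<-" stores it without printing; typing the name on its own line prints it.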

The R workspace includes a single command line interface window and a separate graphics window for displaying plots. "RStudio" is a free software package that provides a more versatile interface to the R interpreter. RStudio is available at www.rstudio.com.

Introduction to R

A complete description of the R language is available at cran.r-project.org. Beginners should skip to Appendix A and review an example R session. This will provide a quick overview of how R works in practice. From there, return to the R documentation to understand commands and data formats.

The GCDC logger user manual includes a basic introduction to R and a simple R session that loads and plots a data file.

Tutorials for Using RStudio

RStudio puts a fancy IDE over the R work environment and makes data analysis a little easier. Tutorials and cheat sheets for RStudio are available online.

Advanced Topics

R is very powerful and versatile by itself, but there is a wide range of additional packages that extend the R toolset. Two packages in particular provide powerful functions for analyzing accelerometer data: tuneR and seewave.

Packages are installed using the "install.packages" command. For example, *install.packages("tuneR")* automatically finds and downloads the tuneR package from the R repository. Then, *library("tuneR")* loads the library into the active R session. Now, the special functions within tuneR are available to the user. RStudio handles additional packages using the bottom-right window pane. Click the "Packages" tab, then the "Install" button, and enter the name of the package to install (equivalent to the "install.packages" command). The package is added to the "User Library" list. Click the check box next to the library to make the library active (equivalent to the "library" command).
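The install-then-load steps described above can be wrapped so that a package is only downloaded when it is missing. The following is a minimal sketch, not part of R or of either package: "ensure_package" is a hypothetical helper name chosen for this example, and it assumes a CRAN mirror is reachable the first time a package is installed.

```r
# ensure_package() is a hypothetical helper written for this example.
# It installs a package only if it is not already present, then loads it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # attach the package by name
  invisible(TRUE)
}

# Usage (needs an internet connection the first time):
# ensure_package("tuneR")
# ensure_package("seewave")
```

The usage lines are commented out so the sketch does not download anything until you choose to run them.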

Now that the tuneR and seewave libraries are loaded, R is ready to make pretty plots of accelerometer data. Below is an example R console session that imports a GCDC X2-5 data file and creates a spectral plot. In this test, the X2-5 was configured to sample at 2000 Hz in high gain and was attached to the dashboard of a car. The car was started, the engine speed stabilized at 1250 rpm, the engine was revved to about 3000 rpm, and then the engine was turned off.

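> #load the packages (skip if they are already loaded)
> library("tuneR")
> library("seewave")
> #read the logger file; lines starting with ";" are header comments
> data<-read.table("e:\\GCDC\\DATA-435.CSV",sep=",",comment.char=";")
> #import x-axis column 2 between 1 and 20 seconds
> CarWaveX<-Wave(data[2000:40000,2], bit=16, pcm=TRUE, samp.rate=2000)
> #plot the spectrogram versus time
> spectro(CarWaveX, flim=c(0.005,0.2), wl=1000, ovlp=75, db=0)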

The plot illustrates the vibration signal in "dB", which is amplitude relative to the maximum signal. In this case, the logger was not perfectly level, so a small offset error occurs in the x-axis at 0 Hz (the maximum signal). This can be filtered out but, for the sake of simplicity, the spectro function used the "flim" option to crop out the offset (<0.005 kHz).
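One simple way to remove a constant offset before analysis is to subtract the mean of the signal. This is not necessarily the filtering the text above has in mind; it is a sketch using base R only. The synthetic signal stands in for one column of logger data, and all names and values here are assumptions made for the example. It also shows how a time window maps to row numbers at a 2000 Hz sample rate.

```r
sample_rate <- 2000   # Hz, matching the test described above
t_start <- 1          # start of window, seconds
t_end   <- 20         # end of window, seconds

# Row numbers for the time window: 1 s is row 2000, 20 s is row 40000
rows <- (t_start * sample_rate):(t_end * sample_rate)

# Synthetic stand-in for one accelerometer axis: a 40 Hz vibration
# riding on a constant offset (hypothetical values, not logger data)
t <- (0:(25 * sample_rate - 1)) / sample_rate
x <- 500 + 100 * sin(2 * pi * 40 * t)

# Subtracting the mean removes the constant (0 Hz) component
x_window   <- x[rows]
x_centered <- x_window - mean(x_window)
```

With real data, the centered values could be passed to Wave in place of the raw column, although the lower "flim" limit may still be useful for hiding slow drift.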