Math 321 Statistics for Experimentalists Spring 2023: R Projects

0: Akritas' datasets

00: Install RStudio on your home computer (optional)

1: Intro to R and data

  1. (Optional) Log in to Posit Cloud and create a new RStudio project. An R console should occupy the left side of the page.
  2. Launch RStudio on your computer if you aren't using the cloud.
  3. Try the following commands and then write a brief description of what they did.
    • > x
    • > y
    • > x = c(1, 1, 2, 3, 5, 8, 13)
    • > y <- c(7:1)
    • > x
    • > y
    • > x+1
    • > x+y
    • > 2*y
    • > x*y
    • > sort(x*y)
    • > x < y
    • > x[4]
    • > x[4:6]
    • > x[x < y]
  4. Try the following commands and then write a brief description of what they did.
    • > 1:100
    • > sample(1:100, size=20)
    • > x=sample(1:100, size=20)
    • > x
    • > sort(x)
    • > stem(x, scale=2)
    • > stem(x)
    • > hist(x)
    • > hist(x, breaks=10)
    • > hist(x, breaks=10, freq=F); lines(density(x))
    • > boxplot(x)
    • > y=sample(1:100, size=20, replace=T)
    • Repeat the good parts of the exploration of x above. How are x and y different?
    • > boxplot(x,y)
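
The commands in steps 3 and 4 rely on two ideas: R applies arithmetic and comparisons element by element, and sample() draws without replacement unless told otherwise. The following is a minimal sketch of both, not a model answer. The seed value and the names a and b are assumptions chosen for illustration; they do not appear in the steps above.

```r
# Vectors from step 3
x <- c(1, 1, 2, 3, 5, 8, 13)
y <- 7:1

x + y       # element-wise sum:     8  7  7  7  8 10 14
x * y       # element-wise product: 7  6 10 12 15 16 13
x < y       # one TRUE/FALSE per position
x[x < y]    # keeps entries of x where the comparison is TRUE: 1 1 2 3

# Sampling with and without replacement
set.seed(321)                                   # hypothetical seed, only for repeatability
a <- sample(1:100, size = 20)                   # without replacement: no value repeats
b <- sample(1:100, size = 20, replace = TRUE)   # with replacement: repeats are possible

anyDuplicated(a)     # 0 means no repeated values
length(unique(b))    # at most 20; smaller whenever a value was drawn twice
```

Running sort(a) and sort(b) side by side is a quick way to spot repeated values by eye.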

2: Loading data and numerical v. categorical data

  1. Download the USGS data for all earthquakes in the past 30 days in Washington. Pay attention to where you save this file.
  2. Upload the data to Posit Cloud: click the "Files" tab in the bottom-right pane, then "Upload", and select all_month.csv.
  3. Load the data into R: > quakes=read.csv("all_month.csv", header=T)
  4. > str(quakes) tells you the structure of the data. How many earthquakes were there? How many variables are there?
  5. The magnitudes of the earthquakes are in the column mag. The following commands give you different views of the data:
    • > summary(quakes$mag) (the five-number summary plus the mean)
    • > boxplot(quakes$mag) (a box-and-whisker plot; the box covers the IQR, and each whisker extends to the most extreme data point within 1.5 × IQR of the box)
    • > hist(quakes$mag) (an absolute frequency histogram)
    • > hist(quakes$mag, freq=F) (a density histogram: the bar areas, not the heights, sum to 1)
    • > plot(density(quakes$mag)) (an estimated density curve--like a smooth histogram)
    • > hist(quakes$mag, freq=F); lines(density(quakes$mag)). You may need to tell R to skip missing values: lines(density(quakes$mag, na.rm=T))
  6. The number of seismic stations used to determine the earthquake's location is in the column nst. This is numerical data, but it is discrete, not continuous. Try:
    • > boxplot(quakes$nst)
    • > hist(quakes$nst)
    • > barplot(table(quakes$nst))
    Do you have a preference?
  7. What goes wrong with > barplot(table(quakes$mag))?
  8. The type of earthquake is in the column type. This is categorical data. What happens when you try > boxplot(quakes$type) or > hist(quakes$type)? What about > barplot(table(quakes$type))?
  9. Sometimes we'll look at more than one variable at a time. > plot(quakes$mag, quakes$latitude) and > plot(quakes$mag, quakes$nst) both look interesting; what do you see?
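
To see what boxplot() and hist(..., freq=F) actually compute, it can help to look at the numbers behind the pictures. The sketch below uses R's built-in Fiji earthquake dataset (datasets::quakes) as a stand-in, since it needs no download. That substitution is an assumption made for illustration: it is not the USGS file, its station counts are in a column called stations rather than nst, and the names fiji, b, box_len, and h are hypothetical.

```r
# Built-in Fiji earthquake data (a stand-in, NOT the USGS all_month.csv file)
fiji <- datasets::quakes
str(fiji)                        # 1000 observations of 5 variables

summary(fiji$mag)                # min, quartiles, median, mean, max

# Numbers behind boxplot(fiji$mag)
b <- boxplot.stats(fiji$mag)
b$stats                          # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out                            # points plotted individually beyond the whiskers

# Whiskers stop at actual data values within 1.5 box lengths of the box
box_len <- b$stats[4] - b$stats[2]
b$stats[5] <= b$stats[4] + 1.5 * box_len   # TRUE

# Density histogram: the bar areas sum to 1
h <- hist(fiji$mag, freq = FALSE)
sum(h$density * diff(h$breaks))

# Discrete data: one bar per distinct value
barplot(table(fiji$stations))
```

Note that assigning the USGS data to a variable named quakes, as in step 3, hides the built-in dataset of the same name in your workspace. Writing datasets::quakes still reaches the built-in one.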

3: Estimation and the Central Limit Theorem (optional)

4: Hypothesis tests (due 4/21)

5: Goodness-of-fit tests (due 4/26)

6: Linear regression with echidnas (due 4/28)

7: Linear regression II (due 5/1)

8: Opah! (due 5/1)

9: ANOVA (due 5/5)

Course resources

Links

Office hours

Logan Axon
Department of Mathematics
MSC 2615
Gonzaga University
Spokane, WA 99258
Office: Herak 227A
Phone: 509.313.3897
Email: axon@gonzaga.edu

Last updated 5/2/2023