### Using R-studio for bare-bone, quick and dirty stats analysis # Author: Andreas Madlung # Date: Jan 20 2016 # Disclaimer: This mini-tutorial is no replacement for a statistics course # and assumes that the user knows which test to use and why. # For publication quality stats analysis check # assumptions more carefully than described here. # A good website for help is http://rcompanion.org/rcompanion/d_06.html ### #Download R from here: http://cran.fhcrc.org/ #Download R Studio Installer for Mac or Windows from here: https://www.rstudio.com/products/rstudio/download/ # If you are new to R it is recommended that you run through the tutorial once with the # example data sets as described below. When you are ready to use your own data, save the this script as # a new file, edit the code and run it sequentially. # Create a data set: # Open an Excel file and save as .csv (Bio332_16_R_ttest.csv) # To do that use "save as",then at the "Format" pull down tab choose Windows comma seperated (.csv) # (both for Mac and PC) # Note: Never use spaces for file or folder names. Use underscores_to_separate_words. # Note: R is case sensitive. Make sure to use the correct spelling as case format. # Type in data as follows: # header (that is the first row): give columns descriptive short names # order (important): all treatments go in the first column. # their responses go in the second column as in the example below: # #species response # apple 5 # apple 4.5 # apple 5.1 # apple 5.2 # pear 7.5 # pear 7.1 # pear 5.7 # pear 7 # Create a second data set in Excel (for the ANOVA analysis below) # and save as Bio332_16_R_ANOVA.csv #species response # apple 5 # apple 4.5 # apple 5.1 # apple 5.2 # pear 7.5 # pear 7.1 # pear 5.7 # pear 7 # orange 10 # orange 11 # orange 11.3 # orange 12 # You can now ask if the response is differnt between species. # This minitutorial introduces you to 4 simple tests: # For comparing two treatments: use t-test (if normal) and Mann-Whitney-U plus post-hoc (if non-normal) # For comparing more than 2 treatments: use ANOVA (if normal) and Kruskal-Wallis plus post-hoc (if non-normal) # WHICH TEST TO USE? #If you assume normality of your data, use a t-test. If the data are non-normally distributed, # use a Mann-Whitney-U test. # If unknown you can't go wrong with the Mann-Whitney-U test, but you lose a bit of power # to detect significant differences. If your sample size is larger than n=25 use a t-test as the # "central limit theorem" applies and normality no longer matters as much. #IMPORTANT NOTE: # This tutorial uses two test data sets with names that only apply to this test data set. You will need # to change these names everywhere you find them to reflect your data set names. The same is true for the # names of the treatments you use in your file. # The "path" is the address or the location on your computer of the file you want to use. # You will also have to adjust that every time you see a path in this experiment # (which currently in this script points to a specific file on my computer, not yours). # A quick way to find the path for PC users is to right click on the file and then select properties. The path is the "Location:". You can copy and paste it into R, but have to replace "/" with "/". # On a Mac you can open the terminal (search "terminal" on your computer) and type in pwd after the prompt and hit return. # Import .csv file # If using your own data you have to define the "path" to where your document is located on # YOUR computer. In this example the file is on my (A. Madlung's) Desktop and # called Bio332_16_R_ttest.csv. test_data_ttest <- read.csv ("/Users/amadlung/Desktop/Bio332_16_R_ttest.csv", header = TRUE) # call up the data file to see if it loaded correctly test_data_ttest # 1. UNPAIRED T-TEST (assuming normality) for exactly 2 samples # Do an unpaired ("regular") t-test (asks if response between apples and pears is different) t.test(response~species, data= test_data_ttest) # Output interpretation: # The unpaired t-test in R is called Welch two-sample t-test # You need the t-value, the degrees of freedom (df) and the p-value. # The output also gives you the mean (average) of each group and the 95% confidence intervals. # 2. Equivalent of UNPAIRED T-TEST (NOT assuming normality) for exactly 2 samples # --> run Mann-Whitney-U instead of t-test as expalined here: # Do an unpaired ("regular") M-W-U test (which asks if the response between apples and pears # is different in a non-normal sample.) # Import data if you haven't already test_data_ttest <- read.csv ("/Users/amadlung/Desktop/Bio332_16_R_ttest.csv", header = TRUE) # call up the data file to see if it loaded correctly test_data_ttest wilcox.test(response~species, data= test_data_ttest) # Output interpretation: # The unpaired "non-normal t-test" equivalent in R is called Wilcoxon rank sum test. # You need the W-value, and the p-value # You also get the mean (average) displayed # Ignore the warning regarding ties if you have several data points with the same values, # which cannot be unequivocvally ranked. If you use the data set provided, this shouldn't be a problem, though. # 3. ANOVA: test differences in means if you have more than 2 samples test_data_ANOVA <- read.csv ("/Users/amadlung/Desktop/Bio332_16_R_ANOVA.csv", header = TRUE) test_data_ANOVA fit <- aov(response ~ species, data=test_data_ANOVA) # If you learned how to interpret residual plots and q-q plots, you can use the # next two lines of code, otherwise skip them. layout(matrix(c(1,2,3,4),2,2)) # optional layout plot(fit) # diagnostic plots layout(matrix(c(1),1)) # return to one figure per pane # Use this code to display the ANOVA results summary(fit) # display Type I ANOVA table # Use the next line only if you need a type III analysis # most likely ignore (if you want to run it, uncomment the next line) #drop1(fit,~.,test="F") # type III SS and F Tests #ANOVA INTERPRETATION # From the summary displayed you will need the degrees of freedom (df), the F-value # and the p-value. This is for the entire ANOVA. If p< 0.05 at least one of the samples is different # from at least one other. Continue with the Tukey posthoc test to determine which is differnt from which. TukeyHSD(fit) # where "fit" comes from the previous command. # INTERPRETATION OF TUKEY TEST: # Look at each comparison and check the adjusted p-value for significance between pairs. # Assign letters to variables (here species) such that values NOT connected by the same letter # signify statsistically significant differences. In this example all three species are # different and would get the letters A, B, C. # 4. KRUSKAL-WALLIS: test differences in means if you have more than 2 samples in non-normal data sets test_data_ANOVA <- read.csv ("/Users/amadlung/Desktop/Bio332_16_R_ANOVA.csv", header = TRUE) test_data_ANOVA kruskal.test (response~species, data = test_data_ANOVA) # INTERPRETATION # You need the chi-squared value, the degrees of freedom, and the p-value for your report. # Like in the ANOVA, the p-value tells you that there is at least one species that is different from # at least one other. To know which it is you need to run a Dunn test or a Nemenyi test. # Non-parametric tests like Dunn and Nemenyi are less powerful and might not find significance # that a Tukey test would find in normally distributed data. #POSTHOC TEST FOR PAIRWISE COMPARISON #The next step installs a stats package that you need for the Nemenyi test. Wait until R finishes #spitting out red code before proceding. install.packages("PMCMR") library(PMCMR) require(PMCMR) posthoc.kruskal.nemenyi.test(x= test_data_ANOVA$response, g= test_data_ANOVA$species, method ="BH") #For connoisseurs: Another way to run the previous line is the following option: #attach(test_data_ANOVA) #posthoc.kruskal.nemenyi.test(x= response, g= species, method ="BH") #detach(test_data_ANOVA) #INTERPRETATION: # The numbers you get from the Nemenyi test are the p-values for each pair-wise comparison. # Get p-values for each comparison and assign letters as described for the Tukey test in the ANOVA description. # If Nemenyi is non-significant and you want to run a less stingent test, try the Dunn test below # (recommended if Nemenyi is insignificant). #OPTIONAL: # If you don't want to use a Nemenyi test (or you had trouble installing the package) you can choose # a different package to run the Nemenyi test or to run the Dunn test, which is equivalent to the # Nemenyi test. If you want to do so you can try the next 4 lines # of code to install two packages, then run Dunn or Nemenyi tests (must uncomment to run). #install.packages("manipulate") #library(manipulate) #install.packages("DescTools") #library(DescTools) #NemenyiTest(x= test_data_ANOVA$response, g= test_data_ANOVA$species, dist="tukey") #DunnTest(x= test_data_ANOVA$response, g= test_data_ANOVA$species, method ="fdr") #GRAPHING: #To graph the data in a box plot use the following code: boxplot(response ~ species, data = test_data_ANOVA, ylab="Response", xlab="Species") # Save the plot by going to Plots -> Save as PDF -> click on "Directory" and choose where to put the PDF # then click save. # If you want to add the letters to the categories it is easiest done by importing the PDF into a MS Word document # (in Word: Insert -> Photo -> Picture from file, then browse to your PDF file, # then use Format Picture tab, Wrap text tab, choose "tight") # or, better, use Photoshop and add the letters there. # If you want to change the labels of the category names the only way to do that is to change it in the # original .csv file with your raw data. #END OF THE TUTORIAL