

Introduction and Preliminaries


1 Introduction

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has

● an effective data handling and storage facility,

● a suite of operators for calculations on arrays, in particular vectors and matrices,

● a large, coherent, integrated collection of intermediate tools for data analysis,

● graphical facilities for data analysis and display either directly at the computer or on hardcopy, and

● a well developed, simple and effective programming language, which includes conditionals, loops, user defined recursive functions and input and output facilities.

R is often described as an “environment”. The term is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages. However, most programs written in R are essentially ephemeral, written for a single piece of data analysis.


2 General Properties

R makes it extremely easy to code complex mathematical or statistical procedures, though the programs may not run all that quickly. You can interface R with other languages (C, C++, Fortran) to provide fast implementations of subroutines, but writing this code (and making it portable) will typically take longer. Where the advantage falls in this trade-off will depend upon what you’re doing; for most things you will encounter during your degree, R is sufficiently fast.

R is open source and widely adopted by statisticians and data scientists. There is a huge wealth of existing libraries, so you can often save time by using these, though it is sometimes easier to start from scratch than to adapt someone else’s function to meet your needs. Contributing new packages to the central repository (CRAN) is easy: even your lecturer has managed it. As a result, R packages are not always built to very high standards.

R is portable, and works equally well on Windows, Mac OSX, and Linux.


3 Interfaces

For Windows and Mac OSX, the standard R download comes with an R GUI, which is adequate for simple tasks. You can also run R from the command line in any operating system.

There are a number of more powerful interfaces (RStudio, preferably) which you may like to try. Here are a few.

● RStudio: very popular, with a nice, well-thought-out interface, especially for more advanced usage. It can be a bit buggy, so make sure you update it regularly. Available on all platforms.

● Emacs with ESS: available on all platforms, and very powerful once you get used to it. Has a habit of freezing in my experience, though.

● Tinn-R: an alternative Windows interface.


4 R and SAS Comparison

R is in many ways comparable with SAS. The software is predominantly syntax driven and relies on its user to know the R language (which in many ways resembles UNIX programming languages). R is comparable in structure and conceptual arrangement to other syntax-based software packages. Similarities and dissimilarities with other software packages will be pointed out to ease the transition into R.

Technically R is an expression language with a very simple syntax. Unlike SAS, it is case sensitive, so “A” and “a” are different symbols and would refer to different variables.

Commands are separated either by a semi-colon (;), or by a newline. Elementary commands can be grouped together into one compound expression by braces ({ and }).

Comments can be put almost anywhere. A comment starts with a hashmark (#), and everything from there to the end of the line is part of the comment.

If a command is not complete at the end of a line, R will give a different prompt (by default +) on the second and subsequent lines, and will continue to read input until the command is syntactically complete.
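The rules above can be tried out with a short sketch like the following. The variable names (A, a, x, y, z, s, total) are only illustrative and do not come from the course material.

```r
# R is case sensitive: A and a are two different variables
A <- 1
a <- 2
A == a            # FALSE

# Two commands on one line, separated by a semi-colon
x <- 3; y <- 4

# Braces group commands into one compound expression;
# its value is the value of the last command inside
z <- {
  s <- x + y
  s * 2
}
z                 # 14

# A command left incomplete at the end of a line continues on the next line
# (at the console, R shows the + prompt until the command is complete)
total <- x +
  y
total             # 7
```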


5 Getting started

5.1 Install R and RStudio

R can be downloaded from http://cloud.r-project.org. Most users should download and install a binary version. This is a version that has been translated (by compilers) into machine language for execution on a particular type of computer with a particular operating system. R is designed to be very portable: it will run on Microsoft Windows, Linux, Solaris, Mac OSX, and other operating systems, but different binary versions are required for each. In this class most of what we do would be the same on any system, but when we write system-specific instructions, we will assume that students are using Microsoft Windows.

Installation on Microsoft Windows is straightforward. A binary version is available for Windows from the web page http://cloud.r-project.org/bin/windows/base. Click “Download R 4.1.0 for Windows” to download a file with a name like R-4.1.0-win.exe. Clicking on this file will start an almost automatic installation of the R system. Though it is possible to customize the installation, the default responses will lead to a satisfactory installation in most situations, particularly for beginning users. One of the default settings of the installation procedure is to create an R icon on your computer’s desktop.

You should also install RStudio, after you have installed R. As with R, there are separate versions for different computing platforms, but they all look and act similarly. You should download the free edition of “RStudio Desktop” from https://www.rstudio.com/, and follow the instructions to install it on your computer.
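Once both are installed, one quick way to confirm that R itself is working is to type the commands below at the R prompt. This is only a sanity check, and the version string you see depends on the release you installed.

```r
# Print the version of R that is currently running
R.version.string

# TRUE if the running R is at least the 4.1.0 release mentioned above
getRversion() >= "4.1.0"
```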

5.2 RStudio layout

The RStudio interface consists of several panes (see the figure below):

● Top left: editor pane (also called script pane). Collections of commands (scripts) can be edited and saved. If you don’t see this pane, you can open it with File → New File → R Script.

Just typing a command in the editor window is not enough: it has to get into the command window before R executes it. If you want to run a line from the script window (or the whole script), you can click Run or press CTRL+ENTER to send it to the command window.

● Bottom left: console pane (also called command pane). Here you can type commands after the “>” prompt and R will then execute your command. This is the most important window, because this is where R actually does stuff.

● Top right: workspace, history window. In the workspace window you can see which data and values R has in its memory. You can view and edit the values by clicking on them. The history window shows what has been typed before.

● Bottom right: files, plots, packages, help, viewer window. Here you can open files, view plots (also previous plots), install and load packages or use the help function.

All of the panes can be resized and repositioned, so sometimes it may appear that you’ve lost one, but there’s no need to worry: just find the header of the pane and click there with your mouse, and the pane will reappear. If the pane is there but the content isn’t what you want, try clicking on the tabs at the top.


6 Packages

All R functions and datasets are stored in packages. A package is a preassembled collection of functions and objects, and its contents are available only when the package is loaded. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code. Here we go over a few commands related to packages:

6.1 install.packages

Most R packages are hosted at http://cran.r-project.org, the same website that hosts R. However, you don’t need to visit the website to download an R package; you can download packages straight from R’s command line. Here’s how:

1. Open RStudio.

2. Make sure you are connected to the Internet.

3. Run, for example, install.packages("ggplot2") at the command line.

That’s it. R will have your computer visit the website, download ggplot2, and install the package on your hard drive right where R wants to find it. You now have the ggplot2 package. If you would like to install another package, replace ggplot2 with your package name in the code.
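If you run the same script more than once, you may prefer to install a package only when it is missing. The sketch below shows one way to do that. Note that install_if_missing() is a hypothetical helper written for this example, not a function that ships with R.

```r
# install_if_missing() is a hypothetical helper, not part of R itself
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package is installed and can be loaded
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN; needs an Internet connection
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call installs nothing
ok <- install_if_missing("stats")
ok

# For a CRAN package you would call, for example:
# install_if_missing("ggplot2")
```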

6.2 library

Installing a package doesn’t place its functions at your fingertips just yet: it simply places them on your hard drive. To use an R package, you next have to load it in your R session with the command library("ggplot2"). If you would like to load a different package, replace ggplot2 with your package name in the code.

To see what this does, try an experiment. First, ask R to show you the qplot function. R won’t be able to find qplot because qplot lives in the ggplot2 package, which you haven’t loaded:
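The original listing for this step is not reproduced here. As a stand-in, the sketch below uses exists() to ask whether R can see qplot, assuming a fresh session in which ggplot2 has not been loaded.

```r
# In a fresh session, before library("ggplot2") has been run
exists("qplot")   # FALSE: R cannot see qplot yet

# Typing the bare name qplot at this point stops with an error
# saying that the object was not found.
```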

Now load the ggplot2 package: library("ggplot2")

If you installed the package with install.packages as instructed, everything should go fine. Don’t worry if you don’t see any results or messages. No news is fine news when loading a package. Don’t worry if you do see a message either; ggplot2 sometimes displays helpful start up messages. As long as you do not see anything that says “error”, you are doing fine.

Now if you ask to see qplot, R will show you quite a bit of code (qplot is a long function):
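The full printout of qplot is long, so the sketch below only checks that the function has become visible. It assumes ggplot2 was installed as described in 6.1 and that your version of ggplot2 still exports qplot.

```r
# Only attempt to load ggplot2 if it is actually installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library("ggplot2")
  print("package:ggplot2" %in% search())  # TRUE: ggplot2 is on the search path
  print(exists("qplot"))                  # TRUE once ggplot2 is loaded
  # Typing qplot on its own at the console now prints the function's code
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```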

The main thing to remember is that you only need to install a package once, but you need to load it with library each time you wish to use it in a new R session. R will unload all of its packages each time you close RStudio.


7 Getting help

Learning any new language requires lots of help. Luckily, the help documentation and support in R are comprehensive and easily accessible from the command line.

7.1 General help

To leverage general help resources you can use:
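The two general-purpose entry points are help.start() and help.search(). In the sketch below, the search string "regression" is only an example chosen for illustration, and the interactive() guard is there because help.start() opens a browser.

```r
# Open the HTML help index in a browser (only useful in an interactive session)
if (interactive()) help.start()

# Search the documentation of installed packages for a character string
res <- help.search("regression")
class(res)   # "hsearch"

# At the console, typing res on its own lists the matching help pages
```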

Note that help.search("some text here") requires a character string enclosed in quotation marks. It searches the documentation of installed packages, so if you are looking for random forest functions and the randomForest package is installed, help.search("randomForest") will pull up a list of help pages, vignettes and code demonstrations for packages and functions that work with the randomForest command.

7.2 Getting help on functions

For more direct help on functions that are installed on your computer:

Run the following commands and see what each one returns:
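The original commands are not reproduced here. A minimal set to try, using mean purely as an example function, might be:

```r
help(mean)      # open the help page for mean()
?mean           # shorthand for help(mean)
example(mean)   # run the examples listed at the end of that help page
```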

Note that help() and ? only work for functions within loaded packages. If you want to see details on a function in a package that is installed on your computer but not loaded in the active R session, you can use help(functionname, package = "packagename").
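As a small illustration, the stats4 package ships with R but is not loaded by default, so its mle function is a convenient test case. The choice of mle and stats4 is ours, not the course's.

```r
# stats4 is installed with R but not loaded in a default session
h <- help("mle", package = "stats4")
length(h) > 0   # TRUE: the help page was found without loading stats4

# At the console, typing h on its own displays the page
```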

7.3 Getting help from the web

Typically, a problem you encounter is not new: others have faced, solved, and documented the same issue online. The following resources can be used to search for online help, although I typically just google the problem and find answers relatively quickly.

● Stack Overflow: a searchable Q&A site oriented toward programming issues. 75% of my answers typically come from Stack Overflow questions tagged for R at http://stackoverflow.com/questions/tagged/r.

● Cross Validated: a searchable Q&A site oriented toward statistical analysis. Many questions regarding specific statistical functions in R are tagged for R at http://stats.stackexchange.com/questions/tagged/r.

● R-seek: a Google custom search that is focused on R-specific websites. Located at http://rseek.org/

● R-bloggers: a central hub of content collected from over 500 bloggers who provide news and tutorials about R. Located at http://www.r-bloggers.com/