# Introduction to R for Stata Users

**Let's set up the software**

**R** is a programming language and environment for statistical computing.

Robert Gentleman and Ross Ihaka developed **R** at the University of Auckland, New Zealand, in 1996. They designed the language to combine the strengths of two existing languages, S and Scheme.

Tools are distributed as packages, which any user can download to customize the R environment.

https://cran.r-project.org/

R is free software.

**RStudio** provides a friendlier interface (similar to Stata's). It can become slow with very large data sets.

**Data Types**

1. Vector

**Definition:** A vector is a sequence of elements that share the same data type. A vector can hold logical, integer, double, character, complex, or raw data.

**Example code:**

```
# Generating a scalar (in R, a vector of length one)
x <- 2
# Generating vectors
x1 <- c(1, 2, 3)
x2 <- c(1, 2, 5.3, 6, -2, 4)                    # numeric vector
x3 <- c("one", "two", "three")                  # character vector
x4 <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)   # logical vector
x2            # print the vector
x2[c(2, 4)]   # 2nd and 4th elements of the vector
```
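Because every element of a vector must share one type, R silently converts mixed inputs to the most general type. The following is a minimal sketch of that behavior; the object names mixed_num and mixed_chr are illustrative and not part of the original notes.

```r
# typeof() reports how a vector is stored
typeof(c(1L, 2L))        # "integer"
typeof(c(1, 2.5))        # "double"
typeof(c(TRUE, FALSE))   # "logical"

# Mixing types coerces every element to the most general type
mixed_num <- c(1, TRUE, FALSE)   # logical values become numbers
mixed_chr <- c(1, "two", TRUE)   # everything becomes character
mixed_num   # 1 1 0
mixed_chr   # "1" "two" "TRUE"
```

This is worth remembering when importing data: a single stray text value in a numeric column is enough to turn the whole column into character.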

Operations with vectors

x[c(position1, position2)]: subsetting vectors

rep(a, repetitions): repeat a value

seq(from =, to =, by =): regular sequences

a:b: patterned vectors (consecutive integers from a to b)
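The operations listed above can be tried out with a few lines of base R. This is only a sketch; the vector x2 is the one defined in the earlier example, and the other object names are made up for illustration.

```r
x2 <- c(1, 2, 5.3, 6, -2, 4)

# Subsetting
picked   <- x2[c(2, 4)]   # elements 2 and 4: 2 6
no_first <- x2[-1]        # a negative index drops that position
large    <- x2[x2 > 3]    # logical condition: 5.3 6 4

# Patterned vectors
zeros <- rep(0, 3)                        # 0 0 0
ab_ab <- rep(c("a", "b"), times = 2)      # "a" "b" "a" "b"
aa_bb <- rep(c("a", "b"), each = 2)       # "a" "a" "b" "b"
steps <- seq(from = 0, to = 1, by = 0.25) # 0.00 0.25 0.50 0.75 1.00
run   <- 3:7                              # 3 4 5 6 7
```

Stata users may find it helpful to think of x2[x2 > 3] as the counterpart of an if qualifier: the condition selects the elements to keep.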

Exercise 1 - Practice with vectors

Suggested solution - 1 (Try yourself first)

2. Matrix

3. Arrays

4. Data Frame

Exercise 2 - Data frame

Suggested solution - 2 (Try yourself first)

5. List

6. Factor

**Some functions to start**

1. Get help

There are many blogs and help resources on the Internet. Try searching online for the specific function or error message you need.

**R** also has built-in help, which you can access with the following code:

```
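?options             # open the help page for options()
help(options)        # same as ?options
example(options)     # run the examples from that help page
example(lm)
# If the exact name of the command is not known
help.search("sum")   # search the installed documentation (same as ??sum)
apropos("sum")       # list objects whose names contain "sum"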
```

2. Loops

3. Export and import

4. Merge function

5. More functions

6. Random variables

**Linear Regression (economists, such as myself, love regressions)**

Let's study the demand for economics journals.

We begin with a small data set taken from Stock and Watson (2007) that provides information on the number of library subscriptions to economics journals in the US in 2000. The data set, collected initially by Bergstrom (2001), is available in package AER under the name Journals.

1. Load the database

We will need to install the package AER.

R has thousands of user-contributed packages that implement a wide range of statistical procedures. Installing packages on Windows is usually more straightforward than on macOS. In RStudio, I usually install packages manually.

**Example code:**

```
install.packages("AER")              ## install the package (only needed once)
library(AER)                         ## load the package
data("Journals", package = "AER")    ## load the data
```
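When a script is run repeatedly, calling install.packages() every time is unnecessary. One common pattern installs a package only when it is missing. The sketch below uses a hypothetical helper named ensure_package(), which is not part of R or AER.

```r
# Hypothetical helper: install a package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package can now be loaded
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never triggers a download
ok <- ensure_package("stats")
ok

# For this tutorial the call would be:
# ensure_package("AER"); library(AER)
```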

Let's check the dimensions and variable names of the data before continuing:

```
dim(Journals)
names(Journals)
```

2. Simple graphs

3. Estimations

Exercise 3 - Wage Equation

Suggested solution - 3 (Try yourself first)

Exercise 4 - Wages and years of experience

Suggested solution - 4 (Try yourself first)

Exercise 5 - Prices and subscriptions

Suggested solution - 5 (Try yourself first)

4. Dichotomous variables (Dummy variables)

5. Non-Linear regressions

6. Comparison of models

**Descriptive Statistics**

In Stata, we can use the command summarize to calculate the descriptive statistics of the database. We can do the same in R with the following commands.

1. Mean, Median, and Standard Deviation

**Example code:**

```
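rm(list = ls(all = TRUE))          # remove all objects from memory
data("CPS1985", package = "AER")   # requires the AER package to be installed
str(CPS1985)     # structure of the data frame
head(CPS1985)    # first six rows
levels(CPS1985$occupation)[c(2, 6)] <- c("techn", "mgmt")   # shorten two occupation labels
attach(CPS1985)  # lets us refer to the column wage directly
summary(wage)
mean(wage)
median(wage)
var(wage)
sd(wage)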
```
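Stata's summarize reports the number of observations, mean, standard deviation, minimum, and maximum in a single table. A rough R analogue can be sketched with base functions. The helper name stata_summarize() and the small data frame toy are made up for illustration; with AER loaded, the same call should also work on CPS1985.

```r
# Hypothetical helper that mimics the layout of Stata's summarize
stata_summarize <- function(df) {
  num <- df[sapply(df, is.numeric)]   # keep numeric columns only
  data.frame(
    Obs  = sapply(num, function(v) sum(!is.na(v))),
    Mean = sapply(num, mean, na.rm = TRUE),
    SD   = sapply(num, sd,   na.rm = TRUE),
    Min  = sapply(num, min,  na.rm = TRUE),
    Max  = sapply(num, max,  na.rm = TRUE)
  )
}

# Made-up data, used only to keep the example self-contained
toy <- data.frame(
  wage      = c(5.1, 4.95, 6.67, 4, 7.5),
  education = c(8, 9, 12, 12, 12),
  gender    = factor(c("female", "female", "male", "male", "male"))
)

toy_stats <- stata_summarize(toy)
toy_stats   # one row per numeric variable; the factor column is skipped
```

Passing the data frame to the helper also avoids attach(): columns are reached through the data frame itself, so they cannot be confused with other objects in the workspace.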

2. Histograms

3. More sophisticated graphs

**Interactions, Separate, and Weights**

y ~ a + x: model without interaction. Identical slopes with respect to x but different intercepts for each level of a.

y ~ a * x: model with interaction. In the example below, this includes ethnicity, education, and the interaction between the two.

y ~ a + x + a:x: the same model written out. The term a:x gives the difference in slopes compared with the reference category, in other words, just the interaction.

**Example code:**

```
# Interaction
data("CPS1988", package = "AER")   # load the data; library(AER) also provides coeftest()
cps_int <- lm(log(wage) ~ experience + I(experience^2) +
  education * ethnicity, data = CPS1988)
# Test of coefficients
coeftest(cps_int)
# The same model with the interaction written out
cps_int <- lm(log(wage) ~ experience + I(experience^2) +
  education + ethnicity + education:ethnicity,
  data = CPS1988)
coeftest(cps_int) ## Both models are the same.
```
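The equivalence of the two formula notations can also be checked without AER. The sketch below uses simulated data; the variables a, x, and y and all coefficient values are made up for illustration.

```r
set.seed(1)   # simulated data, for illustration only
n <- 200
a <- factor(sample(c("g1", "g2"), n, replace = TRUE))
x <- rnorm(n)
y <- 1 + 0.5 * x + 0.8 * (a == "g2") + 0.3 * x * (a == "g2") + rnorm(n)

fit_star  <- lm(y ~ a * x)         # shorthand
fit_colon <- lm(y ~ a + x + a:x)   # written out

names(coef(fit_star))   # "(Intercept)" "ag2" "x" "ag2:x"
all.equal(coef(fit_star), coef(fit_colon))   # TRUE

# Slope of x in each group: the a:x coefficient is the difference in slopes
slope_g1 <- coef(fit_star)[["x"]]
slope_g2 <- coef(fit_star)[["x"]] + coef(fit_star)[["ag2:x"]]
```

Here slope_g2 is the reference slope plus the interaction coefficient, which is what "difference in slopes compared with the reference category" means in practice.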

Separate regression for each level

As a further variation, it may be necessary to fit separate regressions for African-Americans and Caucasians.

In the model below, the / operator specifies that the terms within parentheses are nested within ethnicity.

The term -1 removes the intercept of the nested model. A matrix is then used to display the coefficients for both ethnicities side by side.

anova(model1, model2) compares the two fits: the model where ethnicity interacts with every other regressor fits significantly better, at any reasonable level, than the model without any interaction term.

**Example code:**

```
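# Baseline model without interaction terms
# (assumed definition; cps_lm is not defined in this section)
cps_lm <- lm(log(wage) ~ experience + I(experience^2) +
  education + ethnicity, data = CPS1988)
# Nested model: separate coefficients for each ethnicity
cps_sep <- lm(log(wage) ~ ethnicity /
  (experience + I(experience^2) + education) - 1,
  data = CPS1988)
summary(cps_sep)
# Arrange the coefficients in a matrix, one row per ethnicity
cps_sep_cf <- matrix(coef(cps_sep), nrow = 2)
rownames(cps_sep_cf) <- levels(CPS1988$ethnicity)
colnames(cps_sep_cf) <- names(coef(cps_lm))[1:4]
cps_sep_cf
# Compare the nested model with the baseline model
anova(cps_sep, cps_lm)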
```
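To see why the nested formula amounts to a separate regression for each level, the following sketch compares it with two regressions run on subsets. The data are simulated, and the names sim, grp, fit_nested, fit_A, and fit_B are made up for illustration.

```r
set.seed(42)   # simulated data, for illustration only
n   <- 300
grp <- factor(sample(c("A", "B"), n, replace = TRUE))
x   <- runif(n, 0, 10)
y   <- ifelse(grp == "A", 1 + 0.2 * x, 2 + 0.5 * x) + rnorm(n)
sim <- data.frame(y, x, grp)

# Nested specification: one intercept and one slope per group
fit_nested <- lm(y ~ grp / x - 1, data = sim)

# The same coefficients from two separate regressions
fit_A <- lm(y ~ x, data = sim, subset = grp == "A")
fit_B <- lm(y ~ x, data = sim, subset = grp == "B")

# Coefficients come out as grpA, grpB, grpA:x, grpB:x,
# so filling a two-row matrix column by column gives one row per group
nested_cf <- matrix(coef(fit_nested), nrow = 2,
                    dimnames = list(levels(sim$grp), c("(Intercept)", "x")))
nested_cf
rbind(A = coef(fit_A), B = coef(fit_B))   # should match nested_cf
```

The point estimates agree, but the standard errors generally do not: the nested model pools the residual variance across groups, while the separate regressions estimate it group by group.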

Weighted least squares

**References**

A Modern Approach to Regression with R.

An Introduction to R for Quantitative Economics.

R for Stata Users.

Applied Econometrics with R.
