Using SQL from R: A short introduction

By Deveshnie Mudaly

Elevator Pitch

The sqldf package can be used to execute SQL statements in R. For those competent in SQL, this function is extremely useful. SQL is a powerful language, and using it in conjunction with R can enhance and improve the analysis of data. I have made use of SQL to quickly simplify my data frames.

Description

It is easy to select rows and columns in a data frame with conditions using ‘SELECT’ and ‘WHERE’ statements. ‘GROUP BY’ and ‘ORDER BY’ statements can be used to group data together in a particular order. Data frames can be merged together using ‘INNER JOIN’, ‘LEFT JOIN’ or ‘RIGHT JOIN’. ‘CASE’ statements can be used to produce a new column using logical conditions. When working with large data sets, SQL can be used to simplify data quickly. One of the ways in which I used sqldf was by matching card numbers to their actual values. Cards (0-51) needed to be matched to their respective values (1-13) . I decided to match these values using a SQL INNER JOIN. An example of the code which I used is given below:

Playercards=data.frame(as.integer(runif(20,0,51)))

colnames(Playercards)=c(“Playercards”) CardOptions=data.frame(c(0:51),c(13,1,2,3,4,5,6,7,8,9,10,11,12, 13,1,2,3,4,5,6,7,8,9,10,11,12, 13,1,2,3,4,5,6,7,8,9,10,11,12, 13,1,2,3,4,5,6,7,8,9,10,11,12))

colnames(CardOptions)=c(“CardNumber”,”CardValue”)

sqldf(“SELECT Playercards.Playercards as PlayerCardOption, CardOptions.CardValue as PlayerCardValue FROM Playercards INNER JOIN CardOptions ON Playercards.Playercards=CardOptions.CardNumber”)