Monday, July 25, 2011

Using the OS X clipboard in R

One annoyance with using a command line based statistical system, i.e. R, is that if you use a spreadsheet program to handle your data (as most of you do) you have to incessantly export to csv and import to R. For big data sheets and complex calculations this generally a Good Idea(tm). It improves reproducibility and readability when going back to the project at some later time. However, the complexity of that is enough that you often, at least I, end up doing statistics in Excel. This is generally a Bad Idea, and often entails much more work in the long run. In order to make importing and exporting data easier R can handle the clipboard directly, which is what this post is about.

In OS X the clipboard is easily accessed through the command line programs pbcopy and pbpaste (pb means "pasteboard" but everyone knows it as the clipboard so that's what I'm going to call it). These are easily scriptable programs that can be used with any unix command in the shell or scripts (enter "man pbpaste" in the terminal to learn more). "pbpaste" pastes the current clipboard to std-out (that is where you call it from), while "pbcopy" copies std-in (what you put there) to the clipboard. An important quirk is that they only handle plain text, RTF and EPS, that is formats that are transferred as plain text. This is because unix is a text based system. What it means is that you can't copy stuff from formatted text, e.g. word, but with spreadsheets you are OK.

As command line programs they can be handled through the R command: pipe(). Pipe does not produce the current clipboard contents, just a description of the pipe as such. You have to use a read command to get the actual contents.
> readLines(pipe("pbpaste"))
which will produce one string per line (completely useless as data, but often useful to see how a given copy is formatted). To format it for data you use the same as you would any import, using one of the higher level read-commands: read.table(), read.csv(), etc.
> x <- read.table(pipe("pbpaste"))
This will put your copied excel table directly into the table "x" in R. Often you will be forced to tweak the function a little to get a usable table.
> x <- read.table(pipe("pbpaste"), header=TRUE, sep="\t")
That should be able to read your copied Excel data ("header" specifies that the first row is the header, and "sep" specifies the separator which defaults to tab or "\t" in excel).

To write to the clipboard you use "pbcopy". It is a bit more involved, but not much. First you connect a variable to the pipe from pbcopy.
> osxclipboard <- pipe("pbcopy", "w")
Now you can write to the variable "osxclipboard" and it will turn up on the clipboard so that you can paste it into another application. You have to remember that it is a pipe to an external file, which means that you have to use "write()" to put data in the clipboard. If you just assign some data to "osxclipboard" it will be changed from a pipe to that data. Thus:
> write(x, file="osxclipboard")
where "x" is your data, will to the trick. Then you just have to clean up using:
> close(osxclipboard)
In the same way as with copying from the clipboard, this will not format your data optimally. so you have to add some formatting command for Excel to read it properly.
> write(x, file="osxclipboard", sep = "\t")
There we are, but that is rather alot of code. Not really that much easier than a regular export from excel and then import to R. So here are two simple functions to do it all at once, with the most used arguments (I'm sorry for the code, I haven't figured out how to get indenting to work properly in blogger).

Simple readlines from the clipboard:
clipboard.readlines <- function (
clipboardname = "pbpaste",
n = -1L) {
readLines(con = pipe(clipboardname),
n = n)}
Read table from the clipboard. I have only included the most commonly used arguments to read.table(). You could easily add all the rest if you need them:
clipboard.readtable <- function (
clipboardname = "pbpaste",
header = FALSE,
sep = "",
quote = "\"'",
dec = ".") {
read.table(file = pipe(clipboardname),
header = header,
sep = sep,
quote = quote,
dec = dec)}
Write to the clipboard:
clipboard.write <- function (x,
clipboardname = "pbcopy",
sep = " ") {
osxclipboard <- pipe(clipboardname, "w")
write(x = x,
file = clipboard,
sep = sep)
Close(osxclipboard)}
That was it for now. I hope someone finds it helpful. I'm probably not putting any of this into a formal extension, so you will just have to run the code to use it. Consider the code GPL-licensed, although the text is under regular copyright.

5 comments:

  1. Very useful! Thanks for sharing.

    ReplyDelete
  2. clipboard.write is missing a comma at the end of second line

    ReplyDelete
  3. Fantastic functions! But shouldn't the 'clipboard' in function 'clipboard.write' be 'osxclipboard', as otherwise it is not defined anywhere? Beside, "Close' is a mistype for 'close' ? I got an error like 'could not find function "Close"...

    ReplyDelete
  4. Thanks Dong Liang,

    Honestly, I don't know. You may be right, I think I copy-pasted working code, but they are like five years old now so I don't know. In addition, a lot might have changed on the operating system side.

    ReplyDelete