Monday, May 2, 2011

Unicode symbols in R

A friend asked me this morning if there was a way to plot a symbol in R (as a plotting character) representing a half-filled circle. I didn't know, but I figured this out (perhaps it's demonstrated elsewhere -- the ability to use Unicode symbols was added in 2008 or so -- but I didn't stumble across it). First, looking at this list of Unicode shapes indicated that I wanted Unicode symbol 25D1. Then looking at ?points indicated that a negative pch value is interpreted as a Unicode code point (where the platform supports it). In this case -0x25D1L works: the 0x prefix lets me enter the value as hexadecimal, and the L suffix denotes a (long) integer. So

plot(1,1,pch=-0x25D1L)
plot(1,1,pch=-as.hexmode("25D1"))
plot(1,1,pch=-0x25D1L)

all work equivalently.
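A quick way to check that the hexadecimal, hexmode, and decimal spellings all name the same code point is to compare the integers directly. This is only a sketch: the variable names are made up for illustration, and nothing here touches a graphics device.

```r
# Three spellings of the same code point, U+25D1 (decimal 9681).
# All variable names are hypothetical, chosen for this illustration.
cp_hex_literal <- 0x25D1L                         # hexadecimal integer literal
cp_hexmode     <- as.integer(as.hexmode("25D1"))  # parsed from a hex string
cp_decimal     <- 9681L                           # plain decimal integer

# The character itself, built from the code point
half_circle <- intToUtf8(cp_hex_literal)

identical(cp_hex_literal, cp_decimal)
identical(cp_hexmode, cp_decimal)
utf8ToInt(half_circle) == cp_decimal
```

With that, plot(1, 1, pch = -cp_decimal) should draw the same symbol as the calls above, and in a UTF-8 locale pch = half_circle is probably another way in, provided the device and font can render the glyph.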

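TestUnicode <- function(start="25a0", end="25ff", ...)
  {
    ## character arguments are read as hexadecimal; numeric ones are used as-is
    nstart <- as.hexmode(start)
    nend <- as.hexmode(end)
    r <- nstart:nend
    s <- ceiling(sqrt(length(r)))   ## side length of the square grid
    op <- par(pty="s")
    on.exit(par(op))                ## restore graphics settings on exit
    plot(c(-1,(s)), c(-1,(s)), type="n", xlab="", ylab="",
         xaxs="i", yaxs="i")
    grid(s+1, s+1, lty=1)
    for(i in seq_along(r)) {
      ## negative pch = Unicode code point; try() keeps going if one fails
      try(points(i%%s, i%/%s, pch=-1*r[i],...))
    }
  }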

TestUnicode()
TestUnicode(9500,9900)  ## some cool spooky stuff in here!

One thing to keep in mind is that you should test whatever symbols you decide to use carefully with whatever graphics path/display/printing solution you plan to use, as not all platforms render all Unicode symbols properly. With a little more work I could change TestUnicode() to do proper indexing so that it would be easier to figure out which symbol was which. Watch for my next paper, in which I will use Unicode symbols 9748/x2614 ('UMBRELLA WITH RAIN DROPS'), 9749/x2615 ('HOT BEVERAGE'), 9763/x2623 ('BIOHAZARD SIGN'), and 9764/x2624 ('CADUCEUS') to represent my data ...
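For what it's worth, the "proper indexing" could look something like the sketch below: work out each symbol's grid position and hex label first, then draw the symbol with its label underneath. The names unicode_grid() and plot_unicode_grid() are hypothetical helpers invented here, not part of the original TestUnicode(), and the layout choices (row-major order, label size) are assumptions.

```r
# Hypothetical helper: lay a range of code points out on a square grid.
# Character arguments are read as hexadecimal, as in TestUnicode().
unicode_grid <- function(start = "25a0", end = "25ff") {
  codes <- as.integer(as.hexmode(start)):as.integer(as.hexmode(end))
  side  <- ceiling(sqrt(length(codes)))
  idx   <- seq_along(codes) - 1L          # zero-based position in the grid
  data.frame(
    code  = codes,                        # code point as an integer
    label = sprintf("%04X", codes),       # code point as a hex label
    x     = idx %% side,                  # column
    y     = idx %/% side                  # row
  )
}

# Hypothetical helper: draw each symbol with its hex code underneath.
plot_unicode_grid <- function(grid, ...) {
  op <- par(pty = "s")
  on.exit(par(op))                        # restore graphics settings
  plot(grid$x, grid$y, type = "n", xlab = "", ylab = "", axes = FALSE,
       xlim = range(grid$x) + c(-0.5, 0.5),
       ylim = rev(range(grid$y)) + c(0.5, -0.5))  # first row at the top
  for (i in seq_len(nrow(grid))) {
    # negative pch = Unicode code point; skip any that raise an error
    try(points(grid$x[i], grid$y[i], pch = -grid$code[i], ...),
        silent = TRUE)
  }
  text(grid$x, grid$y, grid$label, pos = 1, cex = 0.5)
}

g <- unicode_grid()
head(g)
```

Calling plot_unicode_grid(g) would then draw the chart. Keeping the layout in a data frame also means a symbol can be looked up by its label without plotting anything.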

PS This worked fine on my primary 'machine' (Ubuntu 10.04 under VMWare on MacOS X.6), but under MacOS X.6 most of the symbols were not resolved. The friend for whom I worked this out has also stated that it didn't work under his (unstated) Linux distribution ... feel free to post in comments below if this works on your particular machine/OS combination. There is a remote possibility that this could be done with Hershey fonts as well (see this page on the R wiki for further attempts at symbol plotting), but I don't know how thorough the correspondence is between the Hershey fonts and the Unicode symbol set ...

PPS I asked about this on StackOverflow and got a useful answer from Gavin Simpson, referencing some notes by Paul Murrell: use cairo_pdf. This should work on any Linux installation with the Pango libraries, I think. In principle it could work on MacOS (and/or Windows?) with Pango installed as well, but I haven't tried ...
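A minimal sketch of the cairo_pdf route, assuming an R build with cairo support. The output file name is made up for illustration, and whether the glyph actually shows up still depends on the fonts installed on the system.

```r
# Hypothetical output location; any writable path would do.
out_file <- file.path(tempdir(), "half-circle.pdf")

if (capabilities("cairo")) {
  cairo_pdf(out_file, width = 4, height = 4)  # cairo-based PDF device
  plot(1, 1, pch = -0x25D1L, cex = 3)         # U+25D1 as plotting symbol
  dev.off()
} else {
  message("This R build has no cairo support; cairo_pdf() is unavailable.")
}
```

Checking capabilities("cairo") first avoids silently falling back to another device on builds where cairo is missing.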

4 comments:

  1. Very cool. Now it's time to make Star Wars with it!

  2. I can get the figure to render on Ubuntu but not when I send it to a pdf device. On MacOS 10.6.6, I get the following behavior: (1) failure without warnings when sent to x11 device, (2) failure with warnings when sent to pdf device, (3) very partial success when sent to a png device. The warnings I get were of the following form:

    In plot.xy(xy.coords(x, y), type = type, ...) ... :
    conversion failure on '┣' in 'mbcsToSbcs': dot substituted for

  3. Are two of your three statements that work equivalently supposed to be identical, or did I miss something?

  4. Here's a puzzle. It's possible to write print("\u2348") but paste0("\u", as.character(2348)) fails.

    Maybe it's possible with cat as per Duncan Murdoch: http://r.789695.n4.nabble.com/Export-Unicode-characters-from-R-td3669075.html

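On the puzzle in the last comment: the "\u" escape is handled by R's parser when it reads the string literal, so it needs its hex digits written inline and can't be assembled afterwards with paste0(). One way around this, sketched here with made-up variable names, is to convert the hex digits to an integer and build the character from that code point.

```r
# Build a Unicode character from hex digits held in a string.
# Variable names are hypothetical, chosen for this illustration.
hex_digits <- "2348"
code_point <- strtoi(hex_digits, base = 16L)  # hex string -> integer
ch <- intToUtf8(code_point)                   # integer -> UTF-8 character

utf8ToInt(ch) == 0x2348L
```

print(ch) or cat(ch, "\n") should then show the same character as print("\u2348"), as long as the console can display it.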