Wednesday, May 16, 2012

My take on polar bar (a.k.a. consultant's) charts

Once upon a time, when I was working at Johnson & Johnson (pharma branch), I was surrounded by a bunch of programmers working to develop (among other things) a nifty piece of software for internal use. Part of it was later released as freeware, called Vlaaivis. The main idea was to visualize each compound's many data at once, with each property represented by a slice of the pie. For each property, it was possible to define an ideal value, or range, and values above or below it would show up as incomplete slices in two shades of the same color... This way, the fuller the pie, the better, or more 'ideal', the compound under examination would be.
You can still find it at its home, http://www.vlaaivis.com/.

As I moved on to different things, I still remembered this as a good way of visualising multifactorial data, especially when comparing several candidates (compounds, in my case). Inspired by a post on LearnR, I decided to start reimplementing something similar in R for use within our own group.

The most significant change from the LearnR post is that I added a 'facet_grid()' call to the ggplot, so as to split the polar bar chart by compound. Thanks to the faceting, all compounds are plotted on the same 'space' and are therefore immediately comparable.
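
For anyone who wants to try the faceting on its own, here is a minimal sketch rather than my actual template. It assumes ggplot2 is installed; the data frame DF, its columns cpd, property and value, and the random scores are all made up for illustration.

```r
library(ggplot2)

# Dummy long-format data (hypothetical): 3 compounds x 5 properties, scores in 0..1
set.seed(1)
DF <- expand.grid(cpd = c("A", "B", "C"), property = letters[1:5])
DF$value <- runif(nrow(DF))

p <- ggplot(DF, aes(x = property, y = value, fill = property)) +
  geom_bar(stat = "identity", width = 1) +  # one bar per property
  ylim(0, 1) +                              # same radial scale in every panel
  coord_polar() +                           # bend the bar chart into a pie
  facet_grid(. ~ cpd)                       # one panel per compound, side by side
print(p)
```

Fixing the y limits is a choice I make here so that a full slice means the same thing in every panel.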

The other, minor, change I made was to create the dataframe as a matrix, which is the way we usually store this kind of data, and use 'melt', from the package 'reshape', to convert it into a form suitable for plotting as a bar chart (polar or not). I left in the comments an option for reading in the data from a text file...
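
A minimal sketch of that matrix-to-long-format step, under a few assumptions: I use reshape2 (the successor of 'reshape') here, the scores are random dummy values, and the column names cpd and property as well as the file name scores.txt are hypothetical.

```r
library(reshape2)  # assumption: reshape2 instead of the older 'reshape'

# Dummy data: one row per compound, one column per property (scores in 0..1)
set.seed(1)
scores <- matrix(runif(15), nrow = 3,
                 dimnames = list(c("A", "B", "C"), letters[1:5]))

# Alternative (hypothetical file name): read the same layout from a text file
# scores <- as.matrix(read.table("scores.txt", header = TRUE, row.names = 1))

# Long format: one row per (compound, property) pair
DF <- melt(scores, varnames = c("cpd", "property"))
head(DF)
```

The resulting data frame has one score per row, which is the shape ggplot expects for a bar chart.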

Here's the dummy template I came up with:



And here's my output:

If, as I described previously, the bars were some kind of normalised score, such as the recently suggested druglikeness score, then the fuller the pie, the better looking the compound would be for a medicinal chemist.

I omitted the legend, since the variable names (a-e) are present in each plot (does anyone know how I can get rid of the 0.5 legend key? It comes from the alpha definition in the ggplot).
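
For the record, the fix suggested in the comments is to set alpha as a fixed parameter of geom_bar() rather than inside aes(). A minimal sketch with made-up data (DF and its columns are hypothetical, not my real template):

```r
library(ggplot2)

# Dummy long-format data (hypothetical): 3 compounds x 5 properties
set.seed(1)
DF <- expand.grid(cpd = c("A", "B", "C"), property = letters[1:5])
DF$value <- runif(nrow(DF))

# alpha inside aes() is treated as a mapped variable and gets a "0.5" legend key:
#   geom_bar(aes(alpha = 0.5), stat = "identity")

# alpha as a fixed geom parameter: still semi-transparent, no legend entry for it
p <- ggplot(DF, aes(x = property, y = value, fill = property)) +
  geom_bar(stat = "identity", width = 1, alpha = 0.5) +
  coord_polar() +
  facet_grid(. ~ cpd) +
  theme(legend.position = "none")  # property names already label each slice
print(p)
```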

Two major things left to do:


  1. I would like to plot the compounds in order of 'fullness' - a sort/order snippet is there in the code, and the new ordering survives the melting of the matrix - however, ggplot seems to rearrange the data according to some internal order... (Thanks to Christoph for fixing this)
  2. Right now, the code isn't suitable for too many compounds, since facet_grid() will arrange them horizontally. I would be grateful if someone were to let me know how to automatically arrange them in a grid with a given maximum number of columns... I know how to do that when I explicitly create each plot, but then I lose the ease of comparison which comes from all compounds being plotted on the same scale... (Thanks to Christoph for fixing this too)
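
Both points can be sketched roughly as follows, along the lines Christoph suggested in the comments. The data are made up, and taking 'fullness' as the sum of a compound's scores is my own assumption for this example.

```r
library(ggplot2)

# Dummy long-format data (hypothetical): 7 compounds x 5 properties
set.seed(1)
DF <- expand.grid(cpd = c("A", "B", "C", "D", "E", "F", "G"),
                  property = letters[1:5])
DF$value <- runif(nrow(DF))

# 'Fullness' taken here as the sum of a compound's scores (an assumption)
fullness <- tapply(DF$value, DF$cpd, sum)

# Facets follow the factor levels, so put the fullest compound first
ordered_cpds <- names(fullness)[order(fullness, decreasing = TRUE)]
DF$cpd <- factor(DF$cpd, levels = ordered_cpds)

p <- ggplot(DF, aes(x = property, y = value, fill = property)) +
  geom_bar(stat = "identity", width = 1) +
  ylim(0, 1) +
  coord_polar() +
  facet_wrap(~ cpd, ncol = 4)  # at most 4 panels per row, all on the same scale
print(p)
```

Because the panels still come from a single ggplot call, they keep a common scale even when they wrap onto several rows.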


I'll update this text and the code as I improve the visualization.

If you don't feel like messing around with code, you can always try to build a similar plot using Deducer:
http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/

Hope one or more of you find this useful!

5 comments:

  1. The 0.5 is in the legend because you specified it as an aesthetic. Move the "alpha=0.5" from the aes() call to the geom_bar() one and it'll be gone.

    ggplot orders the facets by the order of the factor that you're using, which is by default alphabetical. Reorder them using something like

    DF$cpd <- factor(DF$cpd, levels=c("B", "A", "C"))

    I'm not quite sure what you mean in question 2; maybe facet_wrap is what you want?

    1. Thank you for your suggestions, Christoph.

      The ordering now works beautifully!

      facet_wrap is exactly what I was looking for, as well.

      Unfortunately if I move the 'alpha' statement from the aes to the parent function, the fill colors aren't semi-transparent any longer... perhaps I misunderstood something?

    2. Yes, in the parent function it does not work, but putting it into the geom_bar() call works for me.

      aes() defines mappings from data to presentation, but the alpha level in your chart is supposed to be a fixed property. Thus you need to put that directly into the elements that are supposed to be transparent.

    3. Ouch! Thanks a lot! I had forgotten that I had described the variables to plot in the ggplot function, not in the geom_bar... I'll fix it.
