Monday, May 14, 2012

Plotting data and distribution simultaneously (with ggplot2)

Ever wanted to see at a glance how your data are distributed along each axis? It happens to me often, and R makes it easy to build a nice plot composition. This is my latest concoction. I used ggplot2 here, but equivalent graphics can be made with either base graphics or lattice.

The data set is the usual 'iris'; the central plot has petal length and width along the X and Y axes. I used a customised color palette so as to be friendlier to color-blind people. To the left of and above the main plot are the density distributions of the whole set (grey) and of each species.

Well, I hope the code is clear - this time I commented it a bit more...
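
For readers who don't have the listing at hand, here is a minimal sketch of this kind of composition. It is not the code behind the figure: the palette (three Okabe-Ito colors) and the 1:3 layout proportions are assumptions, and the panels are placed with a plain grid layout, so the axes of the side plots only line up approximately with the main plot.

```r
library(ggplot2)
library(grid)

# Assumed palette: three Okabe-Ito colours, often recommended as
# colour-blind friendly. The post's own palette is not known.
pal <- c(setosa = "#E69F00", versicolor = "#56B4E9", virginica = "#009E73")

# Central scatter plot: petal length vs petal width, coloured by species.
p_main <- ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point() +
  scale_colour_manual(values = pal) +
  theme(legend.position = "bottom")

# Top panel: density of petal length, whole set in grey plus one line per species.
p_top <- ggplot(iris, aes(Petal.Length)) +
  geom_density(fill = "grey80", colour = NA) +
  geom_density(aes(colour = Species)) +
  scale_colour_manual(values = pal) +
  theme(legend.position = "none")

# Left panel: same idea for petal width, flipped so the density runs sideways.
p_left <- ggplot(iris, aes(Petal.Width)) +
  geom_density(fill = "grey80", colour = NA) +
  geom_density(aes(colour = Species)) +
  scale_colour_manual(values = pal) +
  coord_flip() +
  theme(legend.position = "none")

# 2 x 2 grid layout: top-left cell stays empty, main plot goes bottom-right.
grid.newpage()
pushViewport(viewport(layout = grid.layout(
  2, 2,
  widths  = unit(c(1, 3), "null"),
  heights = unit(c(1, 3), "null")
)))
print(p_top,  vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
print(p_left, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(p_main, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))
```

Drawing the grey whole-set density first and the per-species lines on top keeps the overall shape visible without hiding the groups.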

Saturday, May 12, 2012

My own version of bubble plot (part 1)

During one of my projects, I found myself in need of visualizing more than 3 dimensions at once. Three-dimensional graphs are usually not a good solution: they need to be properly oriented, for a start, and that's tricky.
So, I started looking at bubble plots. The size of the bubble can show one property, as illustrated by the nice post at FlowingData, and you can show one more property through a color scale (continuous below, but nothing stops it from being categorical).

I decided to push it and encode two properties in the color: look at the example below. The redder the color, the higher the value of the property ApKUpt (or whatever you want); the greener, the higher ApVUpt. I moved the color legend to a square on the extreme right to make better use of the available space.
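
The red/green encoding can be sketched in a few lines of base R. This is only an illustration under assumptions, not the routine used for the figure: the data are random, `rescale01` and `two_colour` are hypothetical helper names, and only the property names ApKUpt and ApVUpt come from the post.

```r
# Hypothetical data: every value here is made up for illustration.
set.seed(1)
d <- data.frame(
  x = runif(30), y = runif(30),
  size = runif(30, 1, 5),
  ApKUpt = runif(30), ApVUpt = runif(30)
)

# Map a numeric vector linearly onto [0, 1].
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0.5, length(v)))
  (v - rng[1]) / (rng[2] - rng[1])
}

# First property drives the red channel, second the green one; blue stays 0.
two_colour <- function(k, v) rgb(rescale01(k), rescale01(v), 0)

d$col <- two_colour(d$ApKUpt, d$ApVUpt)

# Bubbles on the left, square colour legend on the right.
layout(matrix(1:2, nrow = 1), widths = c(3, 2))
op <- par(mar = c(4, 4, 1, 1))
plot(d$x, d$y, pch = 21, bg = d$col, cex = d$size, xlab = "x", ylab = "y")

# Legend: a grid of squares spanning both channels.
lv <- seq(0, 1, length.out = 11)
lg <- expand.grid(k = lv, v = lv)
plot(lg$k, lg$v, pch = 15, cex = 2, col = rgb(lg$k, lg$v, 0),
     asp = 1, xlab = "ApKUpt", ylab = "ApVUpt")
par(op)
```

Note that `cex` scales the symbol's linear size, so the bubble area grows with the square of the value; take a square root first if area should carry the property.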


I tried three colors, but it turns out that it just doesn't work. Even when your eyes don't interpret every RGB triplet as a completely different color, the amount of redness, greenness or blueness is difficult to estimate. Also, it gets tricky to show the color grading in a legend... One has to resort to slices of the three-dimensional color space. See what I mean?
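
To make the legend problem concrete, here is a small sketch that draws three slices of the RGB cube at fixed blue levels. The function name `slice_colours` and the choice of three slices are assumptions made for this illustration.

```r
# One slice of the RGB cube at a fixed blue level.
# Rows run over red, columns over green.
slice_colours <- function(blue, n = 11) {
  lv <- seq(0, 1, length.out = n)
  g <- expand.grid(red = lv, green = lv)  # red varies fastest
  matrix(rgb(g$red, g$green, blue), nrow = n)
}

# Draw three slices side by side; a full legend would need many more.
op <- par(mfrow = c(1, 3), mar = c(1, 1, 3, 1))
for (b in c(0, 0.5, 1)) {
  plot(as.raster(slice_colours(b)))
  title(main = paste("blue =", b))
}
par(op)
```

Even with only three slices, reading a bubble's color back to three separate values is hard, which is the point made above.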

Of course, one can define an ad-hoc color scale, such as the one used below, vaguely inspired by the colors that Mathematica uses to paint its surfaces. Many thanks to my colleague Pär for teaching me how to define this kind of color scale, and much else.

Here is the code for the one-, two- and three-color plots:
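
One common way to build such a scale in base R is `colorRampPalette`. The sketch below is an assumption-laden example: the anchor colors are hypothetical (they are not the scale used in the figure, nor Mathematica's), and `value_to_colour` is a helper name invented here.

```r
# Hypothetical anchor colours, chosen only for illustration.
anchors <- c("#3B4CC0", "#8DB0FE", "#DDDDDD", "#F49A7B", "#B40426")

# colorRampPalette() returns a function that interpolates between the anchors.
make_scale <- colorRampPalette(anchors)
pal <- make_scale(64)

# Assign each value the palette entry matching its position in the data range.
value_to_colour <- function(v, pal) {
  rng <- range(v, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(pal[1], length(v)))
  idx <- round((v - rng[1]) / (rng[2] - rng[1]) * (length(pal) - 1)) + 1
  pal[idx]
}

# Usage: colour points along a sine curve by their value.
x <- seq(0, 10, length.out = 50)
v <- sin(x)
plot(x, v, pch = 21, bg = value_to_colour(v, pal), cex = 2)
```

A single scale like this shows only one property, but it is far easier to read back from a legend than a mix of color channels.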


It's messy and not at all clean, but it gets the job done. This routine also depends on several others that define the color scale and other accessory functions... feel free to drop me a line in the comments if you want the lot. Similar plots can be obtained with ggplot2 in far fewer lines, although right now I'm less expert at it, so they're much less customised.

First Post: Welcome to this new blog!!!

It's been almost a year since I started using R as my main programming/analysis tool.

I like the fact that so many beautiful graphics can be produced directly within R.

Although I mostly use just the basic functionality, my work often pushes me to develop more complex visualisations, which I'd like to share with others so that my efforts aren't wasted once I'm done using them.

Here I'll do my best to share them, in the hope that they may be useful to someone, and that more expert users may point out improvements to the code as well.

Later on I'll add this blog to the R-Bloggers feed so that I can contribute back to where I picked up so much inspiration.

Update - 15/05/2012 - I just added the feed to R-Bloggers. To celebrate, I'll do my best to put out a nice pie chart: my take on the consultant charts!