In my recent posts analysing the Ontario Public Sector Salary Disclosure, I produced several visuals, and I thought I would share how I did it. Custom, high-quality, clean visuals can be difficult to wrestle out of standard analytical software, so here is some guidance on how to do it.
I used all open source tools:
- R – Open source statistical analysis and charting software
- OpenOffice Calc – Open source alternative to Excel
- Inkscape – Open source alternative to Adobe Illustrator
The process was:
- Data Gathering and Analysis – a deep topic for another post
- CSV as a bridge between analysis tools and charting tools
- R or Calc to generate base chart
- Inkscape to clean up
Charts with R and Inkscape
Chart 1 – Average Pathologist Salary by List Ranking
1. Data Gathering and Analysis – an extensive task undertaken with C#, and a topic for another post
2. CSV – In order to bridge from the analysis tool to the charting application (R), I used a simple flat file. If the analysis were done in R, this would obviously not be necessary.
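One detail worth knowing at this step: column names such as X2013 in the R code for the next step come from how read.csv treats headers that are not valid R names. A minimal sketch, assuming the CSV header holds bare years (the header of the original file is not shown in this post, and the values below are made up):

```r
# Hypothetical miniature of the flat file; values are invented for illustration
csvPath <- tempfile(fileext = ".csv")
writeLines(c("rank,2012,2013",
             "1,0.05,0.12",
             "2,0.03,0.08"), csvPath)

# check.names defaults to TRUE in read.csv, so headers that
# start with a digit are given an "X" prefix on load
demo <- read.csv(csvPath, header = TRUE, sep = ",", as.is = TRUE)
print(names(demo))

# Build the column name as a string and use it to fetch the column
y <- paste0("X", 2013)
print(demo[, y] * 100)
```

Headers containing spaces are converted in the same spirit, with each space replaced by a dot.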

3. R – Load the CSV into R and produce the base chart with the following code. See in-line comments for detail on how it works.

    # Clear all existing variables from memory
    rm(list=ls())

    # Set working directory for the csv file
    setwd("C:\\Users\\Aleksey\\Documents\\Data Journalism\\c#\\salaryDisclosure")

    # load the csv file
    data <- read.csv("rankRaiseAalysis.csv", header=TRUE, sep=",", as.is=TRUE)

    # take a subset of the data, only the top 4000 rows
    lessData <- data[1:4000,]

    # x axis is rank
    x <- lessData$rank

    # set up the grid for the graphs
    # mfrow (4,4) defines a 4 x 4 grid
    # mar defines margins: bottom, left, top, right
    # mgp moves the axis labels around and is currently redundant
    par(mfrow = c(4,4), mar = c(0.1,0.5,2,0.5), mgp=c(1,1,0))

    for (i in 1998:2013)
    {
      # for whatever reason R doesn't do sensible string concatenation
      # this adds X to i to get the string for fetching the variable from the dataframe
      y <- paste0("X",i)
      plot(x, lessData[,y]*100, # also multiply by 100 to get % values
           type="h",       # histogram-like vertical lines
           ylim=c(0,20),   # y-axis limits min 0, max 20
           main=i-1,       # this is the chart title
           ylab="",        # no y axis labels
           xlab="",        # no x axis labels
           xaxt="n",       # suppress x axis
           yaxt="n",       # suppress y axis
           xaxs="i",       # no margin within the plotting frame to the left or right
           yaxs="i",       # similarly, no margin at the top or bottom
           col="#550000"   # plotting colour
      )
    }

    # for the next plot, we don't want the 4x4 grid, so set it back to 1x1
    par(mfrow=c(1,1), mar=c(3,3,3,3), mgp=c(1.5,0.5,0))

    plot(x, lessData$X2013*100, # plot only 2013 in this chart, multiply by 100 to get % values
         type="h",         # histogram-like vertical lines
         ylim=c(0,20),     # y-axis goes from 0 to 20
         ylab="",          # no y axis labels
         yaxs="i",         # no margin within the plotting frame to the top or bottom
         yaxt="n",         # suppress y axis
         xlim=c(0,4000),   # x-axis goes from 0 to 4000
         xlab="",          # no x axis labels
         xaxs="i",         # no margin within the plotting frame to the left or right
         xaxt="n",         # suppress the x-axis
         main="Year 2012 % Salary Growth For Top 4000 on Sunshine List", # title
         col="#D45500")

    # add our own axis title
    title(xlab="Rank",
          cex.lab = 1) # size

    # add a custom x-axis
    axis(1, # 1 = at the bottom
         at=c(1,1000,2000,3000,4000),     # vector of value locations for the ticks
         labels=c(1,1000,2000,3000,4000)) # vector of labels for those ticks

    # add a custom y-axis
    axis(2, # 2 = y axis
         at=c(0, 10, 20),             # vector of value locations for the ticks
         labels=c("0", "10", "20"),   # vector of labels for those ticks
         cex.lab=0.5)                 # size

The following chart is generated by R and can be exported to SVG for loading into Inkscape.
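How the export was done is not spelled out here. One way, sketched under that assumption, is to draw into the svg() graphics device that ships with base R (it relies on R having cairo support). The file name and the data below are placeholders, not the salary figures.

```r
# Placeholder output file; any path Inkscape can open will do
svgPath <- file.path(tempdir(), "baseChart.svg")

# Open an SVG device; width and height are in inches
svg(svgPath, width = 8, height = 6)

# Stand-in data, used only so the sketch runs without the CSV
x <- 1:100
y <- 20 / sqrt(x)
plot(x, y, type = "h", ylim = c(0, 20), xlab = "", ylab = "",
     xaxs = "i", yaxs = "i", col = "#D45500")

# Close the device so the file is written out completely
dev.off()
```

Everything drawn between svg() and dev.off() goes to the file rather than to the screen, so the plotting code itself does not need to change.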

4. Inkscape
Open the file in Inkscape, ungroup the elements and start cleaning:
- Colourise
- Add legends and labels
- Remove excess ink for a cleaner look
  - No Y or X axis lines; these can be visually implied by the ticks
  - No plotting area border; again, visually implied by the other elements
- Better Y and X axis ticks and labels
  - Required extensive use of the object align and distribute features
Chart 2 – Salary Growth at top of “Sunshine List”
1. Data Gathering and Analysis – an extensive task undertaken with C#, and a topic for another post
2. CSV – In order to bridge from the analysis tool to the charting application (R), I used a simple flat file. If the analysis were done in R, this would obviously not be necessary.

3. R
Load the CSV into R and produce the base chart with the following code. See in-line comments for detail on how it works.

    # Clear all existing variables from memory
    rm(list=ls())

    # Set working directory for the csv file
    setwd("C:\\Users\\Aleksey\\Documents\\Data Viz\\blogging\\017 - Pathologists follow-up")

    # load the csv file
    data <- read.csv("newPathologists.csv", # csv file
                     header=TRUE,           # variable names are at the top
                     sep=",",               # it's commas, it's a csv
                     as.is=TRUE)

    # build a plot with the first series of data against year
    plot(data$year, data$X1.to.25,
         type="l",          # makes a line plot
         ylim=c(0,450000),  # sets the range for the y axis
         xaxt="n",          # suppresses the x-axis for customisation later
         lwd=3)             # sets the line width

    # build our own custom x axis
    axis(1, # puts it at the bottom
         at=c(1997,2002,2007,2012),     # position for the ticks
         labels=c(1997,2002,2007,2012)) # labels for those ticks

    # use a for loop for the rest of the data, the other 7 series
    for (i in 1:7)
    {
      # below is my ridiculous solution for string manipulation in R
      # in order to turn 1, 2, 3... into the names of my variables
      y <- paste0("X",paste0((1+25*i),paste0(".to.",(25+25*i))))

      # the lines command adds lines to an existing plot
      lines(data$year,        # x is still year
            data[,y],         # having built y as a string, e.g. "X26.to.50", you can reference it this way
            lwd=(3-2*(i/7)))  # a little function for line width that makes later series thinner
    }

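To see what the loop above produces on each pass without needing the CSV, the column names and line widths can be printed on their own. This is only a sketch of the two expressions from the loop; seriesName and lineWidth are names introduced here for illustration.

```r
# Column name for series i; a single paste0 call gives the same
# result as the nested paste0 calls in the loop
seriesName <- function(i) paste0("X", 1 + 25 * i, ".to.", 25 + 25 * i)

# Line width for series i: starts just under 3 and thins out to 1
lineWidth <- function(i) 3 - 2 * (i / 7)

# Print one line per series: its column name and rounded line width
for (i in 1:7) {
  cat(seriesName(i), round(lineWidth(i), 2), "\n")
}
```

The first series drawn by the loop is X26.to.50 and the last is X176.to.200, which gets a line width of exactly 1.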
The following two charts are generated by R and can be exported to SVG for loading into Inkscape.


4. Inkscape
Open the file in Inkscape, ungroup the elements and start cleaning:
- Customise colours
- Better titles
- Grey-out borders and labels to reduce chart clutter
Chart with OpenOffice Calc and Inkscape
Chart 3 – Ontario “Sunshine List” Salary Growth
1. Data Gathering and Analysis – an extensive task undertaken with C#, and a topic for another post
2. CSV
In order to bridge from the analysis tool to the charting application (Calc), I used a simple flat file. If the analysis were done in Calc, this would obviously not be necessary.
3. OpenOffice Calc
Load the CSV into Calc. Create the three bar charts.
4. Inkscape
- Copy and paste from OpenOffice Calc into Inkscape.
- Tear up everything, keeping only the bars.
- Custom labelling, lines, row shading, etc.
- Extensive use of align and distribute features.




