New Project: K-means Clustering

When it comes to data visualisation design, it’s always important to consider your purpose and your audience. Are you trying to convince your audience of a particular point of view? Are you giving your audience a platform from which to explore and find their own insights? In my latest piece I take a step down a less-discussed path.

I have created an interactive tool using D3.js that lets the user see and interact with the standard k-means clustering algorithm from data mining and machine learning. My hope is that it will help students develop an intuition for how the algorithm works and a better appreciation of its shortcomings.
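
For readers who want to see the steps in code, here is a minimal Python sketch of the standard k-means loop (often called Lloyd's algorithm). It is not the code behind the interactive tool, which is written in D3.js. The function names `assign`, `update`, and `kmeans`, the choice to initialise from randomly sampled data points, and the sample points are all assumptions made for illustration.

```python
import random
from math import dist


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # empty cluster: leave the centroid in place
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def kmeans(points, k, seed=0, max_iter=100):
    """Alternate the two steps until the assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k sampled data points
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # no point changed cluster, so we have converged
            break
        labels = new_labels
    return centroids, labels


# Illustrative data only: two tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

On less tidy data, changing `seed` can change the final clusters, because the result depends on where the centroids start. That sensitivity is one of the shortcomings the tool is meant to make visible.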

You can learn more about k-means clustering here.

K-means Clustering

New Project: Paid to Win

Here’s one I had ready to go four weeks ago, but the unreliability of my current web provider got the better of me. This is another piece working from the BBC Price of Football Survey data, only this time mashing it up with league tables available from Wikipedia. I ask where you find the cheapest goals and cheapest wins throughout the English and Scottish Football leagues:

Paid to Win


Click through for the interactive version.

This is again quite similar to the English Football Value for Money piece I did previously and is mainly just a data remix with an extra dimension to filter on. It’s an interesting way to explore a ranking against two parameters at the same time.

Advantages:
  • Display a ranking against two parameters simultaneously
  • Display both rank and relative value, so you can easily see which is higher or lower and by how much
  • Easily compare two teams against both parameters as well as their relative performance against both parameters (i.e. comparing line slopes)
  • Easily identify big movers between the two measures
Disadvantages:
  • Slightly complex, so the reader has to spend some time figuring it out
  • The alternative to showing two rankings at once would be a design decision to show only the one deemed most important
  • Current solution requires hover, which is not mobile-friendly
  • Solution uses D3.js and therefore SVG, so it does not work in IE8 and below
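
The idea of ranking against two parameters and spotting the big movers can be sketched in a few lines of Python. The team names and figures below are hypothetical placeholders, not values from the BBC survey or the league tables, and the helper `rank` is an illustrative name rather than anything from the project.

```python
def rank(values):
    """Rank names from the lowest value (rank 1) to the highest."""
    ordered = sorted(values, key=values.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical figures, for illustration only.
cost_per_goal = {"Team A": 12.0, "Team B": 8.5, "Team C": 20.0, "Team D": 9.0}
cost_per_win = {"Team A": 30.0, "Team B": 55.0, "Team C": 41.0, "Team D": 25.0}

goal_rank = rank(cost_per_goal)
win_rank = rank(cost_per_win)

# Rank on the first measure minus rank on the second: this is what the slope
# of each line in the chart encodes. Negative means the team ranks worse on
# cost per win than on cost per goal.
movement = {team: goal_rank[team] - win_rank[team] for team in goal_rank}

# The biggest mover is the team with the steepest line.
biggest_mover = max(movement, key=lambda team: abs(movement[team]))
```

With these placeholder numbers, Team B ranks first on cost per goal but last on cost per win, so it is the biggest mover.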

New Project: Passing Direction of Premier League Football Players

The guys over at MCFC Analytics have released a dataset for the entire 2010-11 English Premier League football season. This has generated a number of visuals on passing. For the time being, only aggregate by-game data is available for the entire season, but in-game event data for all games should follow. This is my first project in what will likely be a series on MCFC data.

Passing Direction of Premier League Football Players

This visualisation is an excellent tool for outlier identification. With players scattered and coloured by position, we can immediately see who the oddballs are. Who are the defenders that pass more like a midfielder? Vice versa? If I were more intimately familiar with the game, this would lead me down a path of investigation to determine what is different about that player. Is it interesting? Is it positive?

Similar analyses carry value in the business world also. A retailer may have many locations and formats, but within those formats are there stores that behave outside of the norm? Do we have large format stores that look more like small format stores in the data? Is there a cost savings opportunity?

In terms of technology, this is another application of D3.js and SVG. Unfortunately that means it won’t work on browsers IE8 and below. If this piece were to be part of a wide-ranging, consumer-facing project, then a graceful degradation for IE8 would be required. If this were an internal business tool, a particular browser could be mandated (though this unfortunately might be IE8 or below). Since this is a personal project, I choose not to spend the hours to support IE8.

Global Games, Regional Sports – Vancouver 2010

I previously posted a project I worked on for the London 2012 Olympics. Well, I did some digging around and I managed to find data for the Vancouver 2010 Winter Olympics, so I wrangled it into the format I had previously used, made some tweaks, and republished my Global Games, Regional Sports analysis, but this time with different data. Does that make this some kind of data-remix?

Please find the result here: http://thinkdatavis.com/portfolio/global-games-regional-sports-vancouver-2010-olympics.html

Sample Insights
  • Biathlon and Cross Country Skiing both move together towards Northern Europe when switching from participants to medals. Due to similarities in the sports, we would expect them to be linked.
  • Skeleton is most southern in participants due largely to Kiwi competitors, but as they don’t win medals it rockets up to the top when switching to medals.
  • Four of six ice hockey medals won by Canada and the United States move that sport far to the west. A bad Olympics for European ice hockey.

Data
  • Vancouver 2010 medals by athlete: http://winterolympics.external.bbc.co.uk/medals/2010-standings/athletes/index.html
  • Participants from the Contextures Blog via http://blog.contextures.com/archives/2010/02/19/excel-pivot-tables-at-the-olympics/ (thank you for manually scraping, compiling, and publishing)
  • Latitude and longitude of capitals: mix of sources, primarily Wikipedia

New Project: Who Won the Olympics?

The 2012 London Summer Olympics are over and the medals are tallied. As it is probably the most nationalistic event of the year, we all look to the medal tables to see how our nation did. So, “Who won the Olympics?” As with any good question, the answer is: it depends.

I’ve created this visualisation to explore that question: http://thinkdatavis.com/portfolio/who-won-the-olympics.html

It’s not as well designed as I would like and doesn’t have the affordances I would like, but I’m publishing it today in draft form.

Guidance for usage:
  • The two-dimensional space is created by valuing a Gold at between 1 and 10 Silvers and a Bronze at between 0 and 1 Silvers. The rankings change depending on where you are in the space.
  • Click a country in the rankings table on the right to see their information and thresholds
  • Click a region in the chart to see the ranking that corresponds to that combination of Gold and Bronze valuation.
  • Cross a red line to turn it blue and increase the ranking of your selected country. Conversely, cross a blue line to turn it red and decrease the ranking.
  • When hovering you can also see the rank for your selected country if you were to click
  • When hovering, the darker shaded regions are close to the current region in terms of rank for your selected country; the lighter regions are further away.

Clearly the United States came top with more golds, more silvers and more medals than anybody else. China is similarly a sure bet for second. In the standard medals table, countries are ranked by number of golds, with other medals used to break ties. This is convenient for the host nation, Great Britain, who scored lots of golds, and inconvenient for Russia, who didn’t.

So what about those Silvers and Bronzes? Are they worth something? Is a Gold worth three Silvers? Is a Bronze worth half a Silver? These two parameters, the value of a Gold and the value of a Bronze, enable us to create a two-dimensional space to explore. What are the possible different rankings for a given nation? What ranking corresponds to my valuation of medals?

Interesting insights:
  • For Great Britain to come 3rd, gold medals need to be worth nearly 2 silvers or more, and if bronzes are worth something, golds may need to be worth as much as 4.5 silvers
  • Canada is probably more sensitive to changes in medal values than any other country. With a huge bronze haul and not much else, Canada could rank as low as 36th, where gold medals are nearly everything, or as high as 13th, where all medals are created equal
  • For many countries, an increasing valuation of golds or bronzes is strictly a good thing, but for some, it actually gets more complex. Iran, for example: high valuations of gold and bronze are generally a bad thing for Iran, except when the value of gold is very high
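
The scoring behind this two-parameter space can be sketched in Python as follows. This is an illustration of the idea rather than the code behind the visualisation: a Silver is fixed at 1, and the function names, the `medals` layout, and the medal counts are all hypothetical.

```python
def rank_countries(medals, gold_value, bronze_value):
    """Order countries by total medal value, with a Silver worth exactly 1."""
    def score(country):
        gold, silver, bronze = medals[country]
        return gold_value * gold + silver + bronze_value * bronze
    return sorted(medals, key=score, reverse=True)


def gold_threshold(a, b, bronze_value):
    """Gold value at which countries a and b tie, for a fixed bronze value.

    Returns None when both have the same number of golds, because the value
    of a Gold then cannot change their relative order.
    """
    (gold_a, silver_a, bronze_a), (gold_b, silver_b, bronze_b) = a, b
    if gold_a == gold_b:
        return None
    return ((silver_b - silver_a) + bronze_value * (bronze_b - bronze_a)) / (gold_a - gold_b)


# Hypothetical (gold, silver, bronze) counts, for illustration only.
medals = {
    "Country A": (10, 2, 1),
    "Country B": (6, 9, 12),
    "Country C": (1, 5, 12),
}

gold_heavy = rank_countries(medals, gold_value=10, bronze_value=0)
all_equal = rank_countries(medals, gold_value=1, bronze_value=1)
tie_point = gold_threshold(medals["Country A"], medals["Country B"], bronze_value=0)
```

With these placeholder counts, Country A leads when a Gold is worth 10 Silvers and drops to last when all medals count equally. The tie point between A and B (with Bronzes worth nothing) is a Gold worth 1.75 Silvers, which is the kind of threshold that a line in the chart represents.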

New Project: Global Games, Regional Sports – London 2012

Today I am publishing a project that I have been working on over the last couple of weeks during the 2012 Summer Olympics here in London. The essential question I asked was: If you calculate the weighted geographical midpoint of each sport at the Olympics, do you get an interesting result? I think the answer is yes, and I built a visualisation to support exploring it.

Please find the visualisation here: http://thinkdatavis.com/portfolio/global-games-regional-sports-london-2012-olympics.html

Sample Insights
  • Any east/west or north/south participation bias is emphasised in the medal results. For example, a sport that has more eastern participants will have proportionally even more eastern medals. Examples: Badminton, Table Tennis, Archery. There are a few exceptions including Boxing with southern participants and northern winners.
  • Athletics has an incredibly wide range of participants and a much narrower set of winners, dominated by the United States, pulling the medals point to the West
  • Football and Hockey are quite southern sports. At least in the case of Football this is a result of low European participation
  • Very few sports have proportionally more southern winners than participants. Of the 32 sports, only six move south when switching from participants to medals: Beach Volleyball, Modern Pentathlon, Rowing, Sailing, Volleyball, Water Polo. This is due in part to the dominance of the United States and China in the games, but is true more generally as more northern countries tend to lead the medals table.
  • A lot of sports land where my intuition puts them: Northwestern European tennis, European handball, New World beach volleyball
  • Some surprises to me: Judo and Taekwondo are surprisingly central. Fencing appears European as well as east-Asian. Surprisingly international: Basketball. Probably not all surprises for all people.

Data
  • 2012 Olympic medals by country by sport: http://www.london2012.com/medals/medal-count/, scraped manually
  • All London 2012 athletes and medal data, used for participants: https://docs.google.com/spreadsheet/ccc?key=0AuKpKzUJbSqtdEdDR29BY0JsRDFlbHQ1SVRHcjlsLWc (thank you, The Guardian)
  • Latitude and longitude of capitals: mix of sources, primarily Wikipedia

Technology
This visualisation uses D3.js (http://d3js.org/), a JavaScript library used here to draw SVG. It therefore requires Firefox, Chrome, Safari, Opera, or IE9. Note that IE8 is explicitly excluded from this list.
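
For anyone wanting to reproduce a weighted geographical midpoint like the one behind this project, here is one common way to compute it, sketched in Python. The post does not say which method was actually used, so treat this as an assumption: each location is converted to a 3D unit vector, the vectors are averaged by weight, and the result is converted back to latitude and longitude. The function name `weighted_midpoint` and the sample inputs are illustrative only.

```python
from math import atan2, cos, degrees, radians, sin, sqrt


def weighted_midpoint(places):
    """Weighted geographic midpoint of (lat_deg, lon_deg, weight) tuples.

    For this project, a place could be a country's capital and the weight its
    number of participants or medals in one sport.
    """
    x = y = z = total = 0.0
    for lat, lon, weight in places:
        phi, lam = radians(lat), radians(lon)
        # Convert to a point on the unit sphere, scaled by the weight.
        x += weight * cos(phi) * cos(lam)
        y += weight * cos(phi) * sin(lam)
        z += weight * sin(phi)
        total += weight
    if total == 0:
        raise ValueError("total weight must be non-zero")
    x, y, z = x / total, y / total, z / total
    # Convert the averaged vector back to latitude and longitude in degrees.
    lat = degrees(atan2(z, sqrt(x * x + y * y)))
    lon = degrees(atan2(y, x))
    return lat, lon


# Two equally weighted points on the equator, 90 degrees of longitude apart.
lat, lon = weighted_midpoint([(0.0, 0.0, 1.0), (0.0, 90.0, 1.0)])
```

Averaging in 3D avoids the wrap-around problem at the 180° meridian that a plain average of longitudes would have. A simple arithmetic mean of latitudes and longitudes would be a different, simpler choice and would give somewhat different points.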