Visualization with Gourse
Table of Contents
Recently I made this visualization for contributors to https://l10n.fs.al for the last two years (2012-2013). This visualization was done with gource. It was a bit of hacking, so I would like to describe here how I did it.
1 Getting the data to be visualized
For each translation or vote on https://l10n.fs.al/ the time and the name of the author is saved as well. So, we would like to get this information from the database, as well as the project to which this translation belonged, for all votes and translations.
First have a look at this diagram: db_diagram.png just to get an idea of the structure of the database.
Then look at the script below which I used to extract the data.
This is simply a join of the tables and extracting the fields that
are needed. When the time of vote is the same as the time of
translation, then this is a new translation and the field new
is
true.
2 Transforming data to the right format
In the result file contrib.txt
fields are separated by TAB, so
first I replaced tabs with commas (,) for easy processing. I did it
with find/replace on Emacs, but any other editor can do this.
Gource expects the input file (which is called log file) in the
format timestamp|username|action|filename
. Gource was designed to
work with version control systems (like git, subversion, etc.) in
order to visualize the project activity, so the action
is expected
to be A, or M, or D etc. (respectively for adding, modifying,
deleting a file) and filename
is the file that was touched.
I had to transform the data to this format and I did it with a script like this:
I also made sure that there are no lines containing NULL and that the file is sorted in increasing order by the timestamp:
sed -i contrib.log -e '/NULL/d' sort -t'|' -k1 contrib.log > contrib-1.log rm contrib.log mv contrib-1.log contrib.log
3 Feeding the data to gource and generating the output
First make sure that the tools that we need are installed:
sudo apt-get install gource ffmpeg
Then generate the video with a script like this:
#!/bin/bash gource contrib.log -s 0.1 -i 0 --max-files 0 \ --date-format "%B %Y" --hide dirnames,filenames \ -640x360 -o - \ | ffmpeg -y -r 25 -f image2pipe -vcodec ppm -i - \ -vcodec libvpx -b 10000K contrib.webm
How did I know that these are the right options? I didn't know
(especially for ffmpeg
), I simply googled and found some examples,
then I played with some options of gource
in order to get it
right.