mercredi 23 décembre 2009

Excel 2000/2003 bug



Trouvé ce jour un bug sous la version française de Excel 2003. Rien d'extraordinaire, vous me direz... Je publie néanmoins parce qu'il est a mon sens révélateur de la philosophie des produits Microsoft (à laquelle je suis de plus en plus réfractaire).

Le descriptif du bug:
- dans une cellule, taper les 3 lettres "dec"
- copier/coller (CTRL-C/CTRL-V) cette cellule, et la coller dans celle du dessous.
- Sélectionner les 2 cellules, puis à la souris, pointer la poignée (coin inférieur droit de la sélection), clic-gauche, et on descend (recopie incrémentale).
Comme on a sélectionné deux cellules identiques, il n'y a bien sur pas incrémentation, par contre, Excel transforme le contenu des cellules en "déc".

On pourrait se dire qu'il s'agit du formatage automatique de la cellule en fonction de ce qu'on saisit dedans. Exemple, je tape "1/1/09" dans une cellule, et il considère ça comme une date, et active le formatage adéquat. Du coup, si dans la même cellule, je retape "1", il m'affiche "01/01/1900". Déjà ça, c'est limite, mais on s'y fait (si, si...)

Mais en l'occurrence, non, c'est même pas ça ! Si dans une cellule copiée contenant "déc", je saisis "1", j'obtiens bien "1" affiché !

Voilà bien le genre de chose qui m'horripile au plus haut point. Lui se dit : "bon, ce crétin parle du mois de décembre, mais il ne sait pas qu'il y a un accent, alors je vais lui mettre l'accent, dans son dos, sans rien lui dire"

Je ne supporte pas qu'un programme fasse des trucs dans mon dos. Qu'il me fasse des suggestions, des propositions, tout ce qu'on veut, mais quand je tape 3 lettres, que je copie ensuite, je ne veut pas qu'il me les change. C'est moi qui connait la sémantique derrière mes saisies clavier, je ne lui demande pas de faire des hypothèses là-dessus.

Après essais, existe aussi dans la version 2000. Et sur 2007 ?

mercredi 9 décembre 2009

Transforming column datafiles into line datafiles on Windows




In the field of computer science, you frequently need to visualize data: it always makes things clearer when you show a chart, the reader gets the picture just by looking at it, without having to read the painfull 15-lines paragraph below.

So you have data. Sometimes lots of data. Sometimes badly organised text data files. For instance, where successive values are not on successive lines, but on successive columns. This might not be clear for everybody, so here is an example. Say you have a file containing temperatures measured every hour, for some period of time (a month, for instance). The usual way of doing is one measure per line:

1/1/2009;0;22
1/1/2009;1;23
1/1/2009;2;22.5
2/1/2009;0;24
...
and so on. And sometimes you run across datafiles where the layout is:

1/1/2009;22;23;22.5
2/1/2009;24; ...
...
While regular people won't find anything bad about this (it does indeed save some disk space!), this type of layout is actually unlogical: columns are supposed to be the different fields of data, not successive values. And it makes things more complicated when it comes to plotting...

I ran across this issue when trying to illustrate my previous post on gnuplot with a nice figure. I wanted to have a fancy "real-world data" illustration, so I downloaded electricity daily consumption datafiles from RTE (you can get those here). They provide Excel files per year, with 365 lines, one per day, and the power consumption for every half-hour (48 values per day). And guess what, the layout is just as described here...

So, first, before writing an adequate gnuplot script file, you need to transform columns into lines. And this is what this post is about, for Windows users. It can also be considered as a demo of what you can do with the Windows command-line interpreter.

A quick search shows some interesting material, mostly based on Linux tools (see here for example). And yes, Windows users, you'll need to get some new software, because Windows lacks some basic tools. At present we will only need the 'cut' tool, a binary can be downloaded through the gnuwin32 coreutils package.

What we need to do here is to process each line of the file, cut it into fields, and write a one datum per line output datafile. So, lets go first for the line-by-line processing, using the for command (you have of course already converted the file to .csv format):

for /F "delims=." %%a in (%file%) do call :sp1 "%%a"
Of course, you will have previously put the file name into the 'file' variable with set file=myfile.csv. The "delims" item is there just to get the whole line of data.

Then, we need to produce one output file for every column that contains a data value. This is done with the "numeric" version of the 'for' command:

--------------------------------------------
:sp1
set line=%~1
echo %line% >line.txt
for /L %%b in (1,1,48) do call :sp2 %%b
goto :eof
--------------------------------------------
Finally, cut the same line 48 times, and add each of the columns to the output files (column 1 is the date, column 2 is some non-significant data, the first value is in column 3):

--------------------------------------------
:sp2
set /A col=2+%1
"%app%" -d; -f1,%col% line.txt >> all.dat
goto :eof
--------------------------------------------
with the variable 'app' containing "c:\program files\gnuwin32\bin\cut.exe"

And that's about it, the output file all.dat will now contain one line per data point. The only thing left is to leave a line between each day, so gnuplot can figure out where the day stops, so sp1 is actually more like:

...
for /L %%b in (1,1,48) do call :sp2 %%b
:: gnuplot needs 1 blank lines to separate records
echo. >> all.dat
goto :eof
--------------------------------------------
Finally, we can plot the thing with some classical gnuplot scripting:

------------------------------------------------------
set title "France power consumption on mondays, 2008\n(data source : RTE)"
set xrange [0:47]
set xlabel "day period"
set ylabel "week"
set yrange [0:51]
set zlabel "MW" offset 0,3
set ztics 30000,10000
set datafile separator ";"
set style data lines
set grid
set pm3d
set surface
set hidden3d
fn="all.dat"
set view 42.0,56.0
unset colorbox
splot fn using 2 every 1:7 notitle
pause -1
set terminal png size 640,480
set output "RTE_2008_monday.png"
replot
------------------------------------------------------
Feel free to comment (english or french) if you'd like more details.