I recently started using collectd (storing the values in RRD files).
I have noticed something I consider a flaw in how the RRD files are
created. I hope somebody can explain the current practice :)

When the files are created, the first (shortest) timespan also
determines the number of rows in the first RRA segments whenever that
timespan divided by the interval is larger than the configured RRARows.
In other words, the number of rows is increased to accommodate one CDP
slot for each incoming PDP over that timespan.

There are, however, three RRAs defined for each timespan: AVERAGE, MIN
and MAX. With a single PDP per CDP, these three values are identical,
so I fail to see the point of storing them three times. Can somebody
explain why it was done this way?
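
To illustrate the point, here is a minimal sketch of consolidating a
window of PDPs into one CDP. The cdp_t type and the consolidate()
function are hypothetical names of mine, not rrdtool or collectd code,
and the sketch ignores unknown values and the xff setting. With a
window of one PDP, all three results are necessarily the same value.

```c
#include <stddef.h>

/* Hypothetical container for the three consolidated values of one CDP. */
typedef struct {
  double average;
  double min;
  double max;
} cdp_t;

/* Consolidate n primary data points (n >= 1) into a single CDP.
 * Simplified: no handling of unknown (NaN) values. */
static cdp_t consolidate(const double *pdp, size_t n) {
  cdp_t c = {0.0, pdp[0], pdp[0]};
  double sum = 0.0;

  for (size_t i = 0; i < n; i++) {
    sum += pdp[i];
    if (pdp[i] < c.min)
      c.min = pdp[i];
    if (pdp[i] > c.max)
      c.max = pdp[i];
  }
  c.average = sum / (double)n;
  return c;
}
```

Calling consolidate() with n == 1 returns the input value in all three
fields; only for n > 1 can they differ.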

For our use case (14d timespan with 5s interval) the savings in
disk space would be considerable.
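
A rough back-of-the-envelope sketch of that saving follows. It assumes
8 bytes per stored value (RRD files store doubles), one row per
interval, and it ignores the file header; the function names and the
ds_count parameter are mine, for illustration only.

```c
#include <stdint.h>

/* Assumption: each stored value is an 8-byte double. */
#define RRD_VALUE_BYTES 8u

/* Rows in an RRA that keeps one CDP per PDP for the whole timespan. */
static uint64_t rra_rows(uint64_t timespan_s, uint64_t interval_s) {
  return timespan_s / interval_s;
}

/* Approximate data size of one RRA, header overhead not included. */
static uint64_t rra_bytes(uint64_t rows, uint64_t ds_count) {
  return rows * ds_count * RRD_VALUE_BYTES;
}

/* Bytes saved by dropping the MIN and MAX RRAs and keeping AVERAGE. */
static uint64_t bytes_saved(uint64_t timespan_s, uint64_t interval_s,
                            uint64_t ds_count) {
  return 2u * rra_bytes(rra_rows(timespan_s, interval_s), ds_count);
}
```

For 14 days at 5 seconds this gives 241920 rows per RRA, so about
1.9 MB per RRA and data source, and roughly 3.9 MB saved per data
source in each file.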

Any thoughts on why storing only AVERAGE in that case would be a bad
idea?

I have looked at the source, and it looks to me like the patch would be
confined to util_rrdcreate.c. The rra_get function already calculates
the PDP/CDP ratio. When it is 1, the code could use another list of
aggregator words (the rra_types variable) containing just 'AVERAGE'.
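
Something along these lines is what I have in mind. This is only a
sketch of the idea and not the actual collectd code: select_cfs(),
cf_all, cf_single and the cdp_len parameter are hypothetical names
standing in for whatever rra_get really uses.

```c
#include <stddef.h>

/* Full list, used when several PDPs are consolidated into one CDP. */
static const char *const cf_all[] = {"AVERAGE", "MIN", "MAX"};

/* Reduced list, used when every CDP holds exactly one PDP. */
static const char *const cf_single[] = {"AVERAGE"};

/* Hypothetical helper: pick the list of consolidation functions based
 * on the PDP/CDP ratio (cdp_len). Returns the number of entries and
 * stores a pointer to the chosen list in *cfs_out. */
static size_t select_cfs(int cdp_len, const char *const **cfs_out) {
  if (cdp_len == 1) {
    *cfs_out = cf_single;
    return sizeof(cf_single) / sizeof(cf_single[0]);
  }
  *cfs_out = cf_all;
  return sizeof(cf_all) / sizeof(cf_all[0]);
}
```

The loop that emits the RRA definitions would then iterate over the
returned list instead of always using all three words.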

Martijn Posthuma

