[collectd] Authentication/Encryption for the network plugin.

Sun Apr 12 20:41:17 CEST 2009

Hi Thorsten,

On Sat, Apr 11, 2009 at 09:05:12AM -0700, Thorsten von Eicken wrote:
> Cool! I believe a more powerful scheme is needed, but this is a great
> start. It would be really good to explain in the man page what you're
> signing and encrypting. Also, your english is a bit confusing around
> the "Sign" option. I believe you want to say that the receiver
> requires signed input data which may optionally also be encrypted.

I've fixed/changed bits and pieces all day on Saturday, so some of what
I wrote previously has been superseded.

First off: The network packet consists of “parts”, which all have the
following form:
 +-+-+----------+
 !T!L! Data ... !
 +-+-+----------+
 T == type of data (2 bytes)
 L == length of the part (including the two header fields, 2 bytes)

Parts of an unknown type are skipped. This is possible because each part
carries its own length, and it ensures forward compatibility.
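To make the skipping rule concrete, here is a minimal C sketch of a
receiver loop. It assumes both header fields are in network byte order
(big-endian); `count_known_parts` and the `known` type list are made up
for illustration, and a real receiver would dispatch on the type instead
of just counting.

```c
#include <stddef.h>
#include <stdint.h>

/* Each part starts with a 2-byte type and a 2-byte length (assumed to be
 * in network byte order); the length includes these 4 header bytes. */
#define PART_HEADER_SIZE 4

static uint16_t read_u16(const uint8_t *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

static int type_is_known(uint16_t type, const uint16_t *known, size_t n_known)
{
  for (size_t i = 0; i < n_known; i++)
    if (known[i] == type)
      return 1;
  return 0;
}

/* Walks every part in `buf`. Parts whose type is not in `known` are
 * skipped using their length field, which is what makes the format
 * forward-compatible. Returns the number of known parts, or -1 if a
 * length field is inconsistent with the buffer. */
static int count_known_parts(const uint8_t *buf, size_t len,
                             const uint16_t *known, size_t n_known)
{
  size_t pos = 0;
  int count = 0;

  while (pos < len) {
    if (len - pos < PART_HEADER_SIZE)
      return -1; /* truncated header */

    uint16_t type = read_u16(buf + pos);
    uint16_t part_len = read_u16(buf + pos + 2);

    if (part_len < PART_HEADER_SIZE || part_len > len - pos)
      return -1; /* length too small or runs past the end */

    if (type_is_known(type, known, n_known))
      count++; /* a real receiver would parse buf + pos + 4 here */

    pos += part_len; /* jump to the next part, known or not */
  }
  return count;
}
```

Note that the length check also protects the receiver: a part claiming
to be longer than the rest of the packet is rejected rather than read.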

> I'm assuming the signature is something like:
>    SHA1(shared_secret + ":" + unsigned_data)
> and the resulting packet becomes
>    unsigned_data + signature

It is now HMAC-SHA-256, so the call would be something like this in
pseudocode:
 hash = hmac_sha256 (shared_secret, data)
I've learned that simply doing “hash = sha256 (secret + data)” is prone
to attacks. Since libgcrypt offers HMAC functions, I'm using those.
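For reference, this is the standard HMAC construction from RFC 2104
(the post relies on libgcrypt's implementation of it). It hashes twice
with two keys derived from the secret, which is what prevents the
length-extension attack that makes sha256 (secret + data) unsafe:

```latex
\mathrm{HMAC}(K, m) = H\bigl((K' \oplus \mathrm{opad}) \,\|\, H\bigl((K' \oplus \mathrm{ipad}) \,\|\, m\bigr)\bigr)
```

Here H is SHA-256, K' is the secret zero-padded to the 64-byte block
size (or hashed first if it is longer than that), and ipad and opad are
the bytes 0x36 and 0x5c repeated to the block size.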
The “part” looks like this:
 +-+-+------+
 !T!L! Hash !
 +-+-+------+
 Hash == HMAC-SHA-256 hash (32 bytes => L = 36)
Currently the signature is assumed to sign ALL following data, i. e.
everything following the “part”. This way, older servers receiving such
a signature can simply skip it.
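A minimal C sketch of the receiving side, assuming the part layout
above (L = 36, 32-byte hash, signature covering everything after the
part). The function names are hypothetical. The HMAC itself would be
computed over `signed_data` with the shared secret, e.g. via libgcrypt's
`gcry_md_open` with `GCRY_MD_SHA256` and `GCRY_MD_FLAG_HMAC`, then
`gcry_md_setkey`, `gcry_md_write` and `gcry_md_read`; that call is left
out so the sketch stays self-contained. The constant-time comparison is
my addition, not something the design above specifies.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_PART_SIZE 36 /* 2 (type) + 2 (length) + 32 (HMAC-SHA-256) */

/* Given a packet and the offset of a signature part, locate the stored
 * hash and the region it covers: everything after the part up to the end
 * of the packet. Returns 0 on success, -1 if the part is malformed. */
static int sig_part_region(const uint8_t *buf, size_t len, size_t offset,
                           const uint8_t **hash,
                           const uint8_t **signed_data, size_t *signed_len)
{
  if (offset > len || len - offset < SIG_PART_SIZE)
    return -1;

  uint16_t part_len = (uint16_t) ((buf[offset + 2] << 8) | buf[offset + 3]);
  if (part_len != SIG_PART_SIZE)
    return -1;

  *hash = buf + offset + 4;
  *signed_data = buf + offset + SIG_PART_SIZE;
  *signed_len = len - offset - SIG_PART_SIZE;
  return 0;
}

/* Compare two digests without an early exit, so the time taken does not
 * reveal how many leading bytes matched. */
static int digest_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
  uint8_t diff = 0;
  for (size_t i = 0; i < n; i++)
    diff |= (uint8_t) (a[i] ^ b[i]);
  return diff == 0;
}
```

Because the signed region simply runs to the end of the packet, an
older receiver that does not know this part type skips its 36 bytes and
parses the rest as usual.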

An alternative would be to put the signed data as payload into the part,
i. e. something like this:
 +-+-+------+---------+
 !T!L! Hash ! Payload !
 +-+-+------+---------+
 L == 36 + length(Payload)
The advantage would be that you could mix signed and unsigned data in
one packet, or sign different pieces of data with different keys.

I think compatibility with older clients is worth the slightly inelegant
design of the first (implemented) approach, and that the features
mentioned above, which this design rules out, are not important enough
to break compatibility. It would, however, be very easy to add another
part type that behaves according to the second description, making both
possibilities available.

> And is the encryption something like:
>    AES(shared_secret, iv + unencrypted_data)
> and the resulting packet:
>    iv, encrypted_data
> where iv is a randomly generated "initialization vector"?

Basically, it works along these lines:
  hash = sha1 (data);
  e = aes256 (shared_secret, (hash, data))
In the packet there will be:
 +-+-+-+---------+------+---------+
 !T!L!l! Padding ! Hash ! Payload !
 +-+-+-+---------+------+---------+
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 l    == length(Payload) (2 bytes)
 Hash == SHA-1 hash (20 bytes)
The data marked with ‘^’ will be encrypted, i. e. the entire payload of
the part. Because the length of the buffer passed to AES must be a
multiple of 16, 0–15 padding bytes are necessary. As a consequence, the
size of the payload cannot be calculated from L, so it is included
explicitly (l). Padding bytes are filled with random content.
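The size arithmetic can be sketched in C, assuming exactly the layout in
the diagram above: a 2-byte l, random padding, a 20-byte SHA-1 hash and
the payload, all inside the encrypted region. The constant and function
names are illustrative only.

```c
#include <stddef.h>

#define AES_BLOCK_SIZE 16
#define SHA1_SIZE 20
#define PAYLOAD_LEN_FIELD 2 /* the "l" field */
#define PART_HEADER_SIZE 4  /* T and L */

/* Number of padding bytes needed so that l + padding + hash + payload is
 * a multiple of the AES block size. Always in the range 0..15. */
static size_t enc_padding(size_t payload_len)
{
  size_t plain = PAYLOAD_LEN_FIELD + SHA1_SIZE + payload_len;
  return (AES_BLOCK_SIZE - plain % AES_BLOCK_SIZE) % AES_BLOCK_SIZE;
}

/* Value of the L field: the header plus the encrypted region. */
static size_t enc_part_len(size_t payload_len)
{
  return PART_HEADER_SIZE + PAYLOAD_LEN_FIELD + enc_padding(payload_len)
         + SHA1_SIZE + payload_len;
}

/* The receiver cannot derive the payload size from L alone, which is why
 * l is sent. Given the decrypted region and l, the hash is found by
 * counting back from the end; the offset is relative to the l field. */
static size_t hash_offset(size_t encrypted_len, size_t payload_len)
{
  return encrypted_len - payload_len - SHA1_SIZE;
}
```

For example, a 10-byte payload needs no padding (2 + 20 + 10 = 32), so
L = 36, while an 11-byte payload needs 15 padding bytes, giving a
48-byte encrypted region and L = 52.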

Encryption is done without an initialization vector (IV), and AES is set
up to work in cipher-block chaining (CBC) mode. When talking with
Sebastian yesterday, we realized that this allows known-plaintext
attacks if you have the network plugin set up to forward data using
encryption. So I will probably add a 16-byte IV and possibly rearrange
the parts a bit.

> The big downside of all this that I see is that all machines need to
> have the same shared secret.

Maybe you could tell us some more about the problem you're trying to
solve..?

> I'm wondering whether we can't construct a simple scheme. Let me try:

So if I understand you correctly, you want to have one secret for each
client. Depending on *which* secret was used to sign the data, you want
to allow only certain values for ‘host’. Is that correct?

If we want to go in that direction, we should try to make this more
general. I'm sure that as soon as we have integrated something like
this, someone will want to restrict their customers to specific plugins,
etc.

I could picture something along these lines:

 - Use the proven and tested username/password tuple.
 - The client includes ‘username’ in the packet and calculates an HMAC
   of the data using ‘password’.
 - The server looks up ‘password’ using ‘username’ and verifies the
   validity and integrity of the data.
 - The ‘username’ is added to the ‘meta data’ of each value_list_t that
   is dispatched. Meta data is a concept that has been planned for some
   time but isn't implemented yet. Basically, the idea is to add a
     notification_meta_t *meta;
   to value_list_t.
 - Use the ‘username’ in the filter subsystem to achieve your goal.

For “The server looks up ‘password’”, I'd go for simple username/
password files, something in the form:
  foo: waecheM0

This has worked well for Apache for years, so it's probably not the
worst solution. We should keep in mind, though, that demand for other
authentication schemes, such as LDAP, may come up.
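A minimal C sketch of the lookup, assuming the password file has been
read into memory and uses exactly the `user: password` form shown above
(one entry per line, no comments, no escaping). `lookup_password` is a
hypothetical name, not existing collectd code.

```c
#include <string.h>

/* Looks up `user` in text of the form "user: password\n...". Copies the
 * password into `out` and returns 0 if found; returns -1 if the user is
 * missing or `out` is too small. Whitespace after the colon is skipped. */
static int lookup_password(const char *text, const char *user,
                           char *out, size_t out_size)
{
  size_t user_len = strlen(user);
  const char *line = text;

  while (line != NULL && *line != '\0') {
    const char *end = strchr(line, '\n');
    size_t line_len = end ? (size_t) (end - line) : strlen(line);

    /* Match "user:" exactly, so "fo" does not match "foo:". */
    if (line_len > user_len && strncmp(line, user, user_len) == 0
        && line[user_len] == ':') {
      const char *pw = line + user_len + 1;
      while (*pw == ' ' || *pw == '\t')
        pw++;
      size_t pw_len = line_len - (size_t) (pw - line);
      if (pw_len + 1 > out_size)
        return -1; /* output buffer too small */
      memcpy(out, pw, pw_len);
      out[pw_len] = '\0';
      return 0;
    }
    line = end ? end + 1 : NULL;
  }
  return -1;
}
```

Unlike Apache's password files, which store hashes, this file has to
hold the plain secret, because the server needs it to recompute the
HMAC. It should therefore be readable only by the daemon.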

Here's an example of “achieving our goal”. Everything except the “meta”
match already exists.
 <Match "meta">
   # If this data has been authenticated
   # as belonging to user "foo" ...
   Username equals "foo"
 </Match>
 <Target "replace">
   # ... prepend "foo-" to the hostname.
   Host "^" "foo-"
 </Target>

And a more complex example, doing what I think you want to achieve:
 <Chain "customer foo">
   <Rule "check hostname">
     # If hostname ends with ".foo.com" ...
     <Match "regex">
       Host "\\.foo\\.com$"
     </Match>
     # ... continue in the calling chain.
     Target "return"
   </Rule>
   # Else (default rule): Stop processing.
   Target "stop"
 </Chain>
 <Chain "PreCache">
   <Rule>
     <Match "meta">
       # If this data has been authenticated
       # as belonging to user "foo" ...
       Username equals "foo"
     </Match>
     <Target "jump">
       # ... jump to the chain "customer foo".
       Chain "customer foo"
     </Target>
   </Rule>
   Target "return"
 </Chain>

Of course, you could write a more specific ‘match’ to get this done
with less configuration:
  <Rule>
    Match "username_and_host"
    Target "return"
  </Rule>
  # else
  Target "stop"
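The logic such a match would need is small. Here is a hypothetical C
sketch mirroring the regex in the chain above: the `rules` table, the
struct and `username_and_host_match` are all made up, and a user without
an entry is rejected, matching the "stop" default of the short snippet.

```c
#include <string.h>

/* Hypothetical per-user rule: the host-name suffix an authenticated user
 * may report. In practice this mapping would come from the config. */
struct user_host_rule {
  const char *username;
  const char *host_suffix;
};

static const struct user_host_rule rules[] = {
  { "foo", ".foo.com" },
};

static int ends_with(const char *s, const char *suffix)
{
  size_t n = strlen(s), m = strlen(suffix);
  return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Returns 1 ("return" to the calling chain) if `username` has a rule and
 * `host` ends with the allowed suffix, 0 ("stop") otherwise. */
static int username_and_host_match(const char *username, const char *host)
{
  for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
    if (strcmp(rules[i].username, username) == 0)
      return ends_with(host, rules[i].host_suffix);
  return 0;
}
```

A plain suffix comparison is enough here; the regex in the chain is
only needed because the generic "regex" match has no suffix option.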

So let me know if I understood you correctly and whether you think this
is going in the right direction.

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/