[collectd] The joy of (different) encodings

Florian Forster octo at verplant.org
Sat Nov 25 10:26:01 CET 2006


Hi Lubos,

I've broken the thread, since ultimately this isn't an ignorelist
problem.

On Fri, Nov 24, 2006 at 12:29:36PM +0100, Luboš Staněk wrote:
> There is another problem. My regex implementation is for 8-bit
> characters. It will work with UTF-8 strings until you want to check
> special character properties.

As far as I know, the POSIX regex functions use the locale setting
(mainly LC_CTYPE) to determine the character set. This may or may not
match the actual encoding of the config file, which might cause
problems. I see two possible ways around this:
1) Determine the file's encoding, temporarily set the locale and parse
   the file. This, however, may cause problems later on, and I don't
   think setting the locale and never changing it back is a bad thing.
2) Convert all external data (config file, data read, network stuff) to
   the encoding of our current locale. This may restrict the usable
   characters, but it then becomes the user's responsibility to set the
   correct locale, as with all other programs.
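Approach 2) can be sketched with iconv(3): convert incoming text to the
character set that nl_langinfo(CODESET) reports for the current locale.
This is a minimal sketch under two assumptions: the external data arrives
as UTF-8, and the program has called setlocale(LC_ALL, "") at startup so
the user's locale is in effect (without that call a C program runs in the
"C" locale). `utf8_to_locale()` is a hypothetical name; characters the
locale cannot represent are replaced by '?'.

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a UTF-8 string to the character set of the
 * current LC_CTYPE locale. Unrepresentable characters and malformed input
 * become '?'. Returns a malloc()ed string, or NULL on error. */
static char *utf8_to_locale(const char *in)
{
    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t) -1)
        return NULL;

    size_t in_left = strlen(in);
    /* Generous bound: 4 output bytes per input byte, plus the NUL. */
    size_t out_size = 4 * in_left + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *) in;     /* iconv() takes char **, not const */
    char *out_ptr = out;
    size_t out_left = out_size - 1; /* keep room for the terminating NUL */

    while (in_left > 0) {
        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) != (size_t) -1)
            break; /* all remaining input converted */
        if (errno != EILSEQ && errno != EINVAL) {
            free(out); /* e.g. E2BIG, not expected with the bound above */
            iconv_close(cd);
            return NULL;
        }
        /* Unconvertible or malformed: emit '?' and skip the lead byte plus
         * any UTF-8 continuation bytes (10xxxxxx) that follow it. */
        *out_ptr++ = '?';
        out_left--;
        in_ptr++;
        in_left--;
        while (in_left > 0 && ((unsigned char) *in_ptr & 0xC0) == 0x80) {
            in_ptr++;
            in_left--;
        }
    }
    /* Write any reset sequence needed by stateful target encodings. */
    iconv(cd, NULL, NULL, &out_ptr, &out_left);
    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

On glibc, appending "//TRANSLIT" to the target charset would approximate
characters instead of replacing them, but that suffix is a GNU extension
rather than POSIX.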

Also, since strings are exchanged over the network, either the encoding
used needs to be transferred as well, or we should use a unified
encoding, such as UTF-8. Since UTF-8 is the future and it's hip and it's
colorful (think: executive talk), I'd prefer it.
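If UTF-8 becomes the encoding on the wire, a receiver should probably
check incoming strings before using them. A minimal sketch of such a
check, following the UTF-8 definition in RFC 3629 (no overlong forms, no
UTF-16 surrogates, nothing above U+10FFFF); `utf8_valid()` is a
hypothetical helper, not part of collectd's network code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the `len` bytes at `s` form
 * well-formed UTF-8 as defined by RFC 3629. */
static bool utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;                           /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range, second byte */

        if (c < 0x80) {                     /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1;                          /* 0xC0/0xC1 would be overlong */
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2;
            if (c == 0xE0) lo = 0xA0;       /* overlong 3-byte form */
            if (c == 0xED) hi = 0x9F;       /* UTF-16 surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3;
            if (c == 0xF0) lo = 0x90;       /* overlong 4-byte form */
            if (c == 0xF4) hi = 0x8F;       /* beyond U+10FFFF */
        } else {
            return false;                   /* 0x80-0xC1 or 0xF5-0xFF */
        }

        if (len - i <= n)                   /* sequence truncated */
            return false;
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += n + 1;
    }
    return true;
}
```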

> You could object that this will be rare. With HAL-mounted media it is
> possible, though; take into account that a UDF DVD has a Unicode volume
> label.

That's right, it _is_ rare. We could substitute eight-bit characters
(bytes with the high bit set) with something else, e.g. a question mark,
as many other applications do, to save users from weird characters being
displayed. That should be trivial to implement.
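The substitution could be as simple as the following C sketch, which
replaces every byte with the high bit set; `replace_non_ascii()` is a
hypothetical name. Because it works byte by byte, a multibyte UTF-8
character such as "š" (two bytes) turns into "??"; additionally skipping
continuation bytes (those matching 10xxxxxx) would yield one '?' per
character instead.

```c
#include <stddef.h>

/* Hypothetical helper: replace, in place, every byte of the NUL-terminated
 * string `s` that has the high bit set (i.e. every non-ASCII byte) with
 * '?'. Returns the number of bytes replaced. */
static size_t replace_non_ascii(char *s)
{
    size_t replaced = 0;

    for (; *s != '\0'; s++) {
        if ((unsigned char) *s & 0x80) {
            *s = '?';
            replaced++;
        }
    }
    return replaced;
}
```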

Any thoughts are welcome. Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/