Monday, February 18, 2013

ICU is cool

I have always disliked iconv. Both iconv and ICU are designed to convert text from one character encoding into another. iconv is normally already installed on your computer (on OS X, Linux etc.), but ICU is much easier to use. The problem is that iconv doesn't provide any easy way to compute the length of the destination buffer, whereas in ICU this is trivial. For example, if I want to know how big a buffer I need to hold some text once it has been converted from UTF-16, all I do is pass a NULL buffer with a destination length of 0, and it tells me how long to make it.

Having called that function, I simply allocate a buffer of exactly that length, pass it into the conversion function again, and Bob's your uncle. The way to do this in iconv is to guess how big to make the destination buffer, then reallocate it as often as needed during the chunk-by-chunk conversion. Only once the conversion has finished can you work out how much output was actually produced. What a messy way to do it! I particularly like the fact that ICU does NOT require you to specify a locale, as iconv does for some obscure reason. That limits the conversions you can do to those locales installed on your machine, and you have to guess which of them is appropriate for the current conversion. That's just nuts.
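For comparison, the guess-and-grow dance with iconv looks roughly like the sketch below. It uses the standard iconv_open/iconv/iconv_close calls; the helper name convert_with_iconv, the initial size guess and the doubling strategy are my own assumptions for illustration. It also skips the final flush call that stateful encodings need, so treat it as a sketch rather than production code.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper (not part of iconv): convert src from one encoding
 * to another, growing the output buffer whenever iconv reports E2BIG.
 * Returns a malloc'd buffer (not NUL-terminated) that the caller must
 * free, or NULL on error. *out_len receives the length in bytes. */
static char *convert_with_iconv(const char *from, const char *to,
                                const char *src, size_t src_len,
                                size_t *out_len)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = src_len + 16;          /* initial guess, may be too small */
    char *dest = malloc(cap);
    if (dest == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)src;             /* iconv takes a non-const pointer */
    size_t in_left = src_len;
    size_t used = 0;

    while (in_left > 0) {
        char *out = dest + used;
        size_t out_left = cap - used;
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        used = cap - out_left;          /* bytes written so far */

        if (rc != (size_t)-1)
            break;                      /* all input consumed */
        if (errno != E2BIG) {           /* invalid or truncated input */
            free(dest);
            iconv_close(cd);
            return NULL;
        }

        /* Output buffer full: double it and carry on where we stopped. */
        cap *= 2;
        char *bigger = realloc(dest, cap);
        if (bigger == NULL) {
            free(dest);
            iconv_close(cd);
            return NULL;
        }
        dest = bigger;
    }

    iconv_close(cd);
    *out_len = used;
    return dest;
}
```

Note that the final length only falls out at the end, from how much of the last buffer was left unused.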
