2012-11-01

UTF-8 everything

snarklet

Popular wisdom among developers does change over time. But on character encoding, this wisdom has become consistent: use UTF-8. Use UTF-8 everywhere. In your Strings, in your database tables, in your data store, in your messages, in your pages, in your scripts. "UTF-8 all the things."

It raises the question: why is anything other than UTF-8 the default character encoding for any system? Every system that contends with character encoding released since (oh, I'll be generous and keep it recent) 2010 should use UTF-8 out of the box, fresh from the git hub or bit locker.

It angers me, frankly, that as a user of an API, I have to be the one to realize, oh lordy, there's a character transcoding going on here, and for some insane reason this thing isn't using UTF-8. Okay then, how do I tell it to snap out of its fit of insanity?
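To make the complaint concrete, here is a minimal Python sketch of what goes wrong when one side of an exchange assumes something other than UTF-8, and of what "telling it" usually looks like: naming the encoding explicitly at every boundary. The sample string and variable names are purely illustrative and not taken from any particular API.

```python
# A string containing one non-ASCII character.
original = "café"

# Encode explicitly as UTF-8; "é" becomes the two bytes 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# A consumer that wrongly assumes Windows-1252 reads each of those
# bytes as a separate character, which produces mojibake.
mojibake = utf8_bytes.decode("cp1252")

# A consumer that is told the real encoding gets the text back intact.
restored = utf8_bytes.decode("utf-8")

print(repr(mojibake))   # 'cafÃ©'
print(repr(restored))   # 'café'
```

The same habit applies to file and network I/O: pass the encoding yourself (for example `open(path, encoding="utf-8")`) rather than trusting whatever default the platform picked.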

Stop it. I don't want ISO 8859. I don't want Windows-1252. I doubt anyone really does; they just don't know they need to tell you they don't.
