Proposed PEP 349: Generalized String Coercion
- Goal: ease transition from str to unicode
- Unicode-safe: can take unicode, doesn't try to coerce to str
- Str-stable: if passed str, doesn't coerce to unicode
- Using str() makes code non-Unicode-safe; using unicode() makes it not str-stable
- Currently, "%s" % obj formats an obj that defines only __unicode__ as
'<foo instance>' (note that "%s" % unistr works)
- Returning unicode from __str__ causes str() to raise UnicodeEncodeError
if it can't encode the result.
- Want: promotion to unicode. PEP 349 defines text(), which calls
__str__, checks that isinstance(res, basestring), and returns the result
without encoding it.
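
A minimal sketch of the text() behavior described above, written from these notes rather than copied from the PEP. The early return for values that are already strings, the error message, the string_types name, and the Name class are my assumptions; the Python 3 fallback exists only so the sketch runs on interpreters that have no basestring.

```python
try:
    string_types = basestring  # Python 2: common base of str and unicode
except NameError:
    string_types = (str, bytes)  # assumption: Python 3 stand-in so the sketch runs


def text(obj):
    """Return obj as a string without forcing it to str or to unicode."""
    if isinstance(obj, string_types):
        return obj  # str stays str, unicode stays unicode
    res = obj.__str__()
    if not isinstance(res, string_types):
        raise TypeError("__str__ returned non-string")
    return res  # returned as-is, with no encoding step


class Name(object):
    """Hypothetical class whose __str__ returns non-ASCII text."""

    def __str__(self):
        return u"caf\xe9"


print(repr(text(Name())))
```

Because nothing is encoded on the way out, a unicode result from __str__ passes straight through (Unicode-safe), and a str argument is returned unchanged (str-stable).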
Guido still prefers a bytes type (and immutable frozenbytes).
PEP 349 solves a problem I don't have often (probably because I rarely
write such libraries), but it doesn't seem to solve the problem I do
have, of reading from external data sources:
foo_string.decode('ascii') is type unicode and doesn't need to be,
but foo_string.decode('iso-8859-1') might need to be or might not,
depending on what it contains.
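
To make the external-data problem concrete, here is a rough sketch of the content-dependent decision, not something PEP 349 proposes. The helper name decode_if_needed is hypothetical, and it assumes an ASCII-compatible encoding such as iso-8859-1, so that bytes below 0x80 mean the same thing before and after decoding.

```python
def decode_if_needed(data, encoding):
    """Hypothetical helper: decode only when data holds non-ASCII bytes."""
    # bytearray yields integers on both Python 2 and Python 3
    if all(byte < 0x80 for byte in bytearray(data)):
        return data  # pure ASCII: leave the byte string alone
    return data.decode(encoding)  # non-ASCII present: promote to unicode


plain = decode_if_needed(b"cafe", "iso-8859-1")
accented = decode_if_needed(b"caf\xe9", "iso-8859-1")
print("%s %s" % (type(plain).__name__, type(accented).__name__))
```

The return type here depends on the data rather than on the call site, which is exactly the awkwardness noted above.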