path: root/lib/normalize_string.rb
Commit message (author, date, lines removed/added)
* Remove unused variable (Rowan Crawford, 2014-03-01, -2/+0)
* Handle UndefinedConversionError when converting to utf-8 (Rowan Crawford, 2014-03-01, -5/+5)
  From: http://ruby-doc.org/core-2.0/String.html#method-i-encode
  Duck-types on whether the string has an encode method, rather than relying on RUBY_VERSION.
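A minimal sketch of the pattern this commit describes, assuming Ruby 1.9 or later. The helper name try_encode_to_utf8 is hypothetical and this is not the code in normalize_string.rb. It checks for String#encode by duck typing instead of comparing RUBY_VERSION, and treats Encoding::UndefinedConversionError (a byte that is valid in the source encoding but has no UTF-8 mapping) as a sign that the candidate encoding does not fit.

```ruby
# Minimal sketch, not the project's code: try_encode_to_utf8 is a
# hypothetical helper. Returns a UTF-8 copy of s interpreted as
# from_encoding, or nil if that interpretation does not work.
def try_encode_to_utf8(s, from_encoding)
  # Duck typing: use the encoding API only if this Ruby's String has it,
  # instead of comparing RUBY_VERSION.
  return nil unless s.respond_to?(:encode)

  candidate = s.dup.force_encoding(from_encoding)
  # Bytes that are not valid in from_encoding rule it out immediately.
  return nil unless candidate.valid_encoding?

  candidate.encode('UTF-8')
rescue Encoding::UndefinedConversionError, Encoding::ConverterNotFoundError, ArgumentError
  # UndefinedConversionError: a byte is valid in from_encoding but has no
  # UTF-8 mapping (e.g. 0x9D in Windows-1252).
  # ConverterNotFoundError: Ruby has no converter for this pair.
  # ArgumentError: from_encoding is not a known encoding name.
  nil
end
```

For example, "caf\xE9".b labelled ISO-8859-1 converts to "café", while a lone 0x9D byte labelled Windows-1252 yields nil instead of an unhandled exception.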
* Add a helper function for dumping text to disk (Mark Longair, 2013-05-16, -0/+11)
  This function is useful for investigating problems with the handling of emails, attachments and the related character encoding issues. It can safely be removed later, but for now it is useful for debugging.
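A sketch of what such a debugging helper might look like. The name dump_text_to_disk, the default directory (Dir.tmpdir) and the MD5-based filename are all assumptions for illustration, not details taken from the commit. Writing in binary mode keeps the exact bytes, so the file can be inspected later with external tools regardless of how Ruby labelled the string.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical debugging helper (name, signature and output location are
# assumptions). Writes the raw bytes of text to a file named after their
# MD5 digest and returns the file's path.
def dump_text_to_disk(text, dir = Dir.tmpdir)
  path = File.join(dir, "#{Digest::MD5.hexdigest(text)}.txt")
  # 'wb' writes the bytes unchanged, whatever encoding the string carries.
  File.open(path, 'wb') { |f| f.write(text) }
  # Log the encoding label too, since that is usually what is being debugged.
  label = text.respond_to?(:encoding) ? text.encoding : 'n/a'
  warn "dumped #{text.bytesize} bytes (#{label}) to #{path}"
  path
end
```

Because the filename is derived from the content, dumping the same text twice overwrites one file rather than creating duplicates.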
* Add functions for converting from arbitrary text data to UTF-8 (Mark Longair, 2013-05-16, -0/+75)
  Throughout the codebase it is simplest and most consistent if we can assume that all text/* attachments are represented by UTF-8 strings. This was largely true with the TMail backend, which ensured that all returned text parts were in UTF-8, so the replacement Mail-based backend has to make a similar attempt to convert text parts to UTF-8. This commit introduces two functions that help with this.

  The normalize_string_to_utf8 function tries various encodings, either suggested or guessed (with charlock_holmes), to convert the passed string to UTF-8; if it can't find a suitable encoding, it raises an exception.

  Unfortunately, the current behaviour of the site is that uninterpretable text/* attachments are still passed around and mangled to UTF-8 just before display. To mimic this, it is also useful to have the convert_string_to_utf8_or_binary function, which tries to convert the string to UTF-8 with normalize_string_to_utf8 but, if that's not possible, just returns the original string. (In Ruby 1.9, the encoding will be set to UTF-8 or ASCII-8BIT as appropriate.)
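A minimal sketch of the two functions described above, assuming Ruby 1.9 or later; it is not the code from this commit. To stay self-contained it does not call charlock_holmes: the guess is passed in as an optional guessed_encoding argument instead. The candidate order (suggested, then UTF-8, then guessed) and the exception class EncodingNormalizationError are assumptions.

```ruby
# Hypothetical exception class; the commit only says an exception is raised.
class EncodingNormalizationError < StandardError; end

# Try each candidate encoding in turn and return the first successful
# UTF-8 conversion. guessed_encoding stands in for a charlock_holmes guess.
def normalize_string_to_utf8(s, suggested_encoding = nil, guessed_encoding = nil)
  candidates = [suggested_encoding, 'UTF-8', guessed_encoding].compact.uniq
  candidates.each do |from|
    begin
      candidate = s.dup.force_encoding(from)
      return candidate.encode('UTF-8') if candidate.valid_encoding?
    rescue ArgumentError, EncodingError
      # Unknown encoding name, or bytes with no UTF-8 mapping
      # (EncodingError covers Encoding::UndefinedConversionError):
      # move on to the next candidate.
    end
  end
  raise EncodingNormalizationError, "no candidate fits: #{candidates.inspect}"
end

# Same as above, but fall back to the original bytes labelled as binary
# (ASCII-8BIT) instead of raising. Returns a copy; s itself is not modified.
def convert_string_to_utf8_or_binary(s, suggested_encoding = nil, guessed_encoding = nil)
  normalize_string_to_utf8(s, suggested_encoding, guessed_encoding)
rescue EncodingNormalizationError
  s.dup.force_encoding('ASCII-8BIT')
end
```

Single-byte encodings such as ISO-8859-1 map every byte, so a wrong ISO-8859-1 label is never rejected; it simply produces the wrong characters. This is one reason the order of candidates matters.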