path: root/lib/normalize_string.rb
Commit message [Author, Age, Lines]
* convert_string_to_utf8 returns struct of string and scrubbing status. [Louise Crow, 2015-06-22, -1/+6]
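A minimal sketch of the idea behind this commit: return the converted string together with a flag saying whether invalid bytes had to be dropped. `ScrubResult` and `scrub_to_utf8` are hypothetical names, and `String#scrub` (Ruby 2.1+) is swapped in here for brevity; it is not necessarily what the commit itself does.

```ruby
# Hypothetical result type: the converted text plus a scrubbing flag.
ScrubResult = Struct.new(:string, :scrubbed)

# Hypothetical helper, not the project's convert_string_to_utf8.
def scrub_to_utf8(text)
  utf8 = text.dup.force_encoding('UTF-8')
  return ScrubResult.new(utf8, false) if utf8.valid_encoding?

  # String#scrub('') drops every invalid byte sequence.
  ScrubResult.new(utf8.scrub(''), true)
end

result = scrub_to_utf8("abc\xFFdef")
puts result.string    # the cleaned text
puts result.scrubbed  # whether any bytes were dropped
```

Returning the flag alongside the string lets a caller log or warn when data was lost instead of silently displaying altered text.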
|
* Round trip through utf-16 to clean utf-8 string [Louise Crow, 2015-06-22, -6/+10]
|     As noted in the ruby docs (http://ruby-doc.org/core-1.9.3/String.html#method-i-encode), any conversion from an encoding to the same encoding is a no-op, so convert it first to utf-16.
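The round trip described above can be sketched as follows. `clean_utf8_via_utf16` is a hypothetical name, and dropping invalid bytes (`replace: ''`) is an assumption; a visible replacement character would work the same way.

```ruby
# Hypothetical helper illustrating the UTF-8 -> UTF-16 -> UTF-8 round trip.
def clean_utf8_via_utf16(text)
  text.dup.force_encoding('UTF-8')
      # The first hop changes encoding, so the :invalid option is honoured.
      .encode('UTF-16', invalid: :replace, undef: :replace, replace: '')
      # The second hop converts the now-clean text back to UTF-8.
      .encode('UTF-8')
end

cleaned = clean_utf8_via_utf16("abc\xFFdef")
puts cleaned.valid_encoding?
```

Because the first conversion targets a different encoding, it cannot be skipped as a same-encoding no-op, which is the point of going through utf-16.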
* Merge branch 'force-filenames-to-utf8' into rails-3-develop [Louise Crow, 2015-05-28, -0/+14]
|\
| * Force attachment filenames to utf-8 before trying to save them [Louise Crow, 2015-05-15, -3/+6]
| |     In a database with encoding SQL-ASCII, an invalid utf-8 filename can be saved but will cause an "invalid byte sequence in UTF-8" error when the filename is prepared for display. In a database with a UTF-8 encoding, saving the string will cause an error like "ActiveRecord::StatementInvalid (PG::Error: ERROR: invalid byte sequence for encoding "UTF8")".
| * Add method for forcing strings to valid utf-8 [Louise Crow, 2015-05-14, -0/+11]
| |     Try likely conversions but if that fails, just replace the characters that are invalid utf-8.
* | Add source file encoding for all ruby files. [Louise Crow, 2015-05-15, -0/+1]
|/    This is important under ruby 1.9 in order to determine the encoding that will be used for new strings created in the code in the file.
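For illustration, a source-encoding magic comment looks like the sketch below. The exact comment form used in the repository is not shown in this log, so `# encoding: utf-8` is an assumption; `label` is a hypothetical variable.

```ruby
# encoding: utf-8
# The magic comment above must be the first line of the file
# (or the second, directly after a shebang line).
label = 'café'

puts __ENCODING__     # the source encoding of this file
puts label.encoding   # string literals in the file inherit it
```

Ruby 2.0 and later default to UTF-8 source encoding, so the comment mainly matters under 1.9, where the default is US-ASCII.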
* Remove unused variable [Rowan Crawford, 2014-03-01, -2/+0]
|
* Handle UndefinedConversionError when converting to utf-8 [Rowan Crawford, 2014-03-01, -5/+5]
|     From: http://ruby-doc.org/core-2.0/String.html#method-i-encode
|     Duck-types on whether the string responds to encode rather than relying on RUBY_VERSION.
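A rough sketch of both points in this commit: check for `encode` by duck typing, and rescue `Encoding::UndefinedConversionError` alongside invalid-byte errors. `encode_to_utf8` is a hypothetical helper, and returning `nil` on failure is an assumption made for the example.

```ruby
# Hypothetical helper: convert text from source_encoding to UTF-8,
# returning nil when the bytes cannot be converted.
def encode_to_utf8(text, source_encoding)
  # Duck typing: Ruby 1.8 strings have no #encode, so pass them through
  # instead of comparing RUBY_VERSION.
  return text unless text.respond_to?(:encode)

  text.encode('UTF-8', source_encoding)
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  nil
end

latin1_bytes = "caf\xE9".b
puts encode_to_utf8(latin1_bytes, 'ISO-8859-1').inspect
puts encode_to_utf8(latin1_bytes, 'ASCII-8BIT').inspect
```

The second call fails because a byte above 0x7F has no defined mapping from ASCII-8BIT to UTF-8, which is the case UndefinedConversionError covers.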
* Add a helper function for dumping text to disk [Mark Longair, 2013-05-16, -0/+11]
|     This function is useful for investigating problems with handling of emails, attachments and the related character encoding issues. It can safely be removed later, but is currently useful to have for debugging purposes.
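A debugging helper of this kind might look like the sketch below. The name `dump_text_for_debugging`, its parameters, and the file naming scheme are all assumptions; the actual helper is not shown in this log.

```ruby
require 'tmpdir'

# Hypothetical helper: write the raw bytes of text to a file and
# return the path, so the exact bytes can be inspected later.
def dump_text_for_debugging(text, label, dir = Dir.tmpdir)
  path = File.join(dir, "#{label}-#{Process.pid}.txt")
  # Binary mode avoids any transcoding or newline conversion on write.
  File.open(path, 'wb') { |file| file.write(text) }
  path
end

path = dump_text_for_debugging("caf\xE9".b, 'attachment-body')
puts path
```

Writing in binary mode matters here: the goal is to see the bytes exactly as received, including any that are invalid in the assumed encoding.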
* Add functions for converting from arbitrary text data to UTF-8 [Mark Longair, 2013-05-16, -0/+75]
      Throughout the codebase it is simplest and most consistent if we can assume that all text/* attachments are represented by UTF-8 strings, and this was largely true with the TMail backend, which ensured that all returned text parts were in UTF-8. We have to change the replacement Mail backend to similarly attempt to convert text parts to UTF-8.

      This commit introduces two functions which are useful for this. The normalize_string_to_utf8 function will try various encodings, either suggested or guessed (with charlock_holmes), to convert the passed string to UTF-8, and if it can't find a suitable encoding will throw an exception.

      Unfortunately, the current behaviour of the site is that uninterpretable text/* attachments are still passed around and mangled to UTF-8 just before display. To mimic this it's also useful to have the convert_string_to_utf8_or_binary function, which tries to convert the string to UTF-8 with normalize_string_to_utf8, but if that's not possible just returns the original string. (In Ruby 1.9, encoding will be set to UTF-8 or ASCII-8BIT appropriately.)
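The relationship between the two functions can be sketched as below. This is not the commit's implementation: the charlock_holmes guessing step is left out (only the suggested encoding and UTF-8 are tried), and the error class name `EncodingNormalizationError` is an assumption.

```ruby
# Assumed error class name, for illustration only.
class EncodingNormalizationError < StandardError; end

# Sketch: try each candidate encoding; raise if none fits.
def normalize_string_to_utf8(text, suggested_encoding = nil)
  [suggested_encoding, 'UTF-8'].compact.each do |name|
    begin
      candidate = text.dup.force_encoding(name)
      return candidate.encode('UTF-8') if candidate.valid_encoding?
    rescue ArgumentError, EncodingError
      next # unknown encoding name or unconvertible bytes: try the next one
    end
  end
  raise EncodingNormalizationError, 'no candidate encoding fits the text'
end

# Sketch: fall back to the original bytes, tagged as binary.
def convert_string_to_utf8_or_binary(text, suggested_encoding = nil)
  normalize_string_to_utf8(text, suggested_encoding)
rescue EncodingNormalizationError
  text.dup.force_encoding('ASCII-8BIT')
end

puts convert_string_to_utf8_or_binary("caf\xE9".b, 'ISO-8859-1').encoding
puts convert_string_to_utf8_or_binary("caf\xE9".b).encoding
```

Callers of the second function can check the returned string's encoding to tell whether conversion succeeded (UTF-8) or the original bytes came back untouched (ASCII-8BIT).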