-rw-r--r-- app/models/incoming_message.rb 32
-rw-r--r-- todo.txt 13
2 files changed, 33 insertions, 12 deletions
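
Note on the charset hunk in app/models/incoming_message.rb: Iconv was removed from Ruby's standard library in 2.0, so the same fallback order (declared charset, then UTF-8, then windows-1252, else keep the bytes as they are) could be sketched today with String#force_encoding and String#encode. The helper name to_utf8_with_fallback and its signature are hypothetical and not part of this commit. As a further assumption, the sketch also falls through to guessing when the declared charset name is unknown, which the patch itself does not handle.

```ruby
# Hypothetical helper, NOT part of this commit: a sketch of the same
# fallback order using String#encode instead of Iconv.
def to_utf8_with_fallback(text, declared_charset = nil)
  # 1. Trust the declared charset only if the bytes are valid in it
  #    and convert cleanly to UTF-8.
  if declared_charset
    begin
      candidate = text.dup.force_encoding(declared_charset)
      return candidate.encode('UTF-8') if candidate.valid_encoding?
    rescue ArgumentError, EncodingError
      # Unknown charset name or failed conversion: fall through and guess.
    end
  end

  # 2. See if the bytes are already good UTF-8.
  utf8 = text.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?

  # 3. Otherwise try windows-1252, the most likely alternative.
  begin
    text.dup.force_encoding('Windows-1252').encode('UTF-8')
  rescue EncodingError
    # Nonsense in every guess: keep the bytes, labelled as UTF-8.
    utf8
  end
end
```

For example, a body declared as us-ascii that actually contains the windows-1252 byte 0xE9 fails steps 1 and 2 and is converted in step 3.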
diff --git a/app/models/incoming_message.rb b/app/models/incoming_message.rb
index 2dbfec682..17a5844bb 100644
--- a/app/models/incoming_message.rb
+++ b/app/models/incoming_message.rb
@@ -17,8 +17,7 @@
# Copyright (c) 2007 UK Citizens Online Democracy. All rights reserved.
# Email: francis@mysociety.org; WWW: http://www.mysociety.org/
#
-# $Id: incoming_message.rb,v 1.82 2008-04-21 11:23:03 francis Exp $
-
+# $Id: incoming_message.rb,v 1.83 2008-04-21 14:45:06 francis Exp $
# TODO
# Move some of the (e.g. quoting) functions here into rblib, as they feel
@@ -328,12 +327,33 @@ class IncomingMessage < ActiveRecord::Base
# Charset conversion, turn everything into UTF-8
if not text_charset.nil?
- if text_charset == 'us-ascii'
- # Emails say US ASCII, but mean Windows-1252
- # XXX How do we autodetect this properly?
- text = Iconv.conv('utf-8', 'windows-1252', text)
+ begin
+ text = Iconv.conv('utf-8', text_charset, text)
+ rescue Iconv::IllegalSequence
+ # Clearly specified charset was nonsense
+ text_charset = nil
end
end
+ if text_charset.nil?
+ # No specified charset, so guess
+
+ # Could use rchardet here, but it had trouble with
+ # http://www.whatdotheyknow.com/request/107/response/144
+ # So I gave up - most likely in UK we'll only get windows-1252 anyway.
+
+ begin
+ # See if it is good UTF-8 anyway
+ text = Iconv.conv('utf-8', 'utf-8', text)
+ rescue Iconv::IllegalSequence
+ begin
+ # Or is it good windows-1252, most likely
+ text = Iconv.conv('utf-8', 'windows-1252', text)
+ rescue Iconv::IllegalSequence
+ # Just use it even though it is nonsense - treat as UTF-8
+ end
+ end
+
+ end
# Fix DOS style linefeeds to Unix style ones (or other later regexps won't work)
# Needed for e.g. http://www.whatdotheyknow.com/request/60/response/98
diff --git a/todo.txt b/todo.txt
index de298bd0c..ac4df9519 100644
--- a/todo.txt
+++ b/todo.txt
@@ -13,8 +13,9 @@ Cluster solr patch - https://issues.apache.org/jira/browse/SOLR-236
FOI requests to use to test it
==============================
-Complaint to info commissioner:
+Internal review:
http://www.whatdotheyknow.com/request/search_engine_advertising_bought
+http://www.whatdotheyknow.com/request/communications_from_home_office_
http://www.whatdotheyknow.com/request/details_of_grant_awarded_to_vi_g_
I received a reply on 4 April from Alison McCarthy to my request for
@@ -81,11 +82,6 @@ when sending "my response is late"
Change email address interface - easier to do now with post_redirect.circumstance?
-Consider showing Subject: of email somewhere
- e.g. for http://www.whatdotheyknow.com/request/172/response/234
- http://www.whatdotheyknow.com/request/breakdown_of_calulation_of_jsa
-the subject has all the content
-
One of the PDFs on live site has:
Error: PDF version 1.6 -- xpdf supports version 1.5 (continuing anyway)
Need to upgrade to poppler-utils?
@@ -198,11 +194,16 @@ Quoting fixing TODO:
http://www.whatdotheyknow.com/request/94/response/161
http://www.whatdotheyknow.com/request/police_powers_to_inform_car_insu
http://www.whatdotheyknow.com/request/sale_of_public_land_in_worcester
+ http://www.whatdotheyknow.com/request/148/response/209
+ http://www.whatdotheyknow.com/request/35/response/191
Char encoding and other bad formatting:
http://www.whatdotheyknow.com/request/107/response/144
http://www.whatdotheyknow.com/request/35/response/177
http://www.whatdotheyknow.com/request/52/response/238
+ http://localhost:3001/request/107/response/144
+ http://localhost:3001/request/52/response/238
+
Sources of public bodies
========================