Diffstat (limited to 'lib')
30 files changed, 2061 insertions(+), 426 deletions(-)
diff --git a/lib/acts_as_xapian/.gitignore b/lib/acts_as_xapian/.gitignore new file mode 100644 index 000000000..60e95666f --- /dev/null +++ b/lib/acts_as_xapian/.gitignore @@ -0,0 +1,3 @@ +/xapiandbs +CVS +*.swp diff --git a/lib/acts_as_xapian/LICENSE.txt b/lib/acts_as_xapian/LICENSE.txt new file mode 100644 index 000000000..72d93c4be --- /dev/null +++ b/lib/acts_as_xapian/LICENSE.txt @@ -0,0 +1,21 @@ +acts_as_xapian is released under the MIT License. + +Copyright (c) 2008 UK Citizens Online Democracy. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of the acts_as_xapian software and associated documentation files (the +"Software"), to deal in the Software without restriction, including without +limitation the rights to use, copy, modify, merge, publish, distribute, +sublicense, and/or sell copies of the Software, and to permit persons to whom +the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. diff --git a/lib/acts_as_xapian/README.txt b/lib/acts_as_xapian/README.txt new file mode 100644 index 000000000..a1d22ef3f --- /dev/null +++ b/lib/acts_as_xapian/README.txt @@ -0,0 +1,276 @@ +The official page for acts_as_xapian is now the Google Groups page. + +http://groups.google.com/group/acts_as_xapian + +frabcus's github repository is no longer the official repository, +find the official one from the Google Groups page. + +------------------------------------------------------------------------ + +Do patch this file if there is documentation missing / wrong. It's called +README.txt and is in git, using Textile formatting. The wiki page is just +copied from the README.txt file. + +Contents +======== + +* a. Introduction to acts_as_xapian +* b. Installation +* c. Comparison to acts_as_solr (as on 24 April 2008) +* d. Documentation - indexing +* e. Documentation - querying +* f. Configuration +* g. Performance +* h. Support + + +a. Introduction to acts_as_xapian +================================= + +"Xapian":http://www.xapian.org is a full text search engine library which has +Ruby bindings. acts_as_xapian adds support for it to Rails. It is an +alternative to acts_as_solr, acts_as_ferret, Ultrasphinx, acts_as_indexed, +acts_as_searchable or acts_as_tsearch. + +acts_as_xapian is deployed in production on these websites. +* "WhatDoTheyKnow":http://www.whatdotheyknow.com +* "MindBites":http://www.mindbites.com + +The section "c. Comparison to acts_as_solr" below will give you an idea of +acts_as_xapian's features. + +acts_as_xapian was started by Francis Irving in May 2008 for search and email +alerts in WhatDoTheyKnow, and so was supported by "mySociety":http://www.mysociety.org +and initially paid for by the "JRSST Charitable Trust":http://www.jrrt.org.uk/jrsstct.htm + + +b. Installation +=============== + +Retrieve the plugin directly from the git version control system by running +this command within your Rails app. 
+ + git clone git://github.com/frabcus/acts_as_xapian.git vendor/plugins/acts_as_xapian + +Xapian 1.0.5 and associated Ruby bindings are also required. + +Debian or Ubuntu - install the packages libxapian15 and libxapian-ruby1.8. + +Mac OSX - follow the instructions for installing from source on +the "Installing Xapian":http://xapian.org/docs/install.html page - you need the +Xapian library and bindings (you don't need Omega). + +There is no Ruby Gem for Xapian, it would be great if you could make one! + + +c. Comparison to acts_as_solr (as on 24 April 2008) +============================= + +* Offline indexing only mode - which is a minus if you want changes +immediately reflected in the search index, and a plus if you were going to +have to implement your own offline indexing anyway. + +* Collapsing - the equivalent of SQL's "group by". You can specify a field +to collapse on, and only the most relevant result from each value of that +field is returned. Along with a count of how many there are in total. +acts_as_solr doesn't have this. + +* No highlighting - Xapian can't return you text highlighted with a search +query. You can try and make do with TextHelper::highlight (combined with +words_to_highlight below). I found the highlighting in acts_as_solr didn't +really understand the query anyway. + +* Date range searching - this exists in acts_as_solr, but I found it +wasn't documented well enough, and was hard to get working. + +* Spelling correction - "did you mean?" built in and just works. + +* Similar documents - acts_as_xapian has a simple command to find other models +that are like a specified model. + +* Multiple models - acts_as_xapian searches multiple types of model if you +like, returning them mixed up together by relevancy. This is like +multi_solr_search, only it is the default mode of operation and is properly +supported. + +* No daemons - However, if you have more than one web server, you'll need to +work out how to use "Xapian's remote backend":http://xapian.org/docs/remote.html. + +* One layer - full-powered Xapian is called directly from the Ruby, without +Solr getting in the way whenever you want to use a new feature from Lucene. + +* No Java - an advantage if you're more used to working in the rest of the +open source world. acts_as_xapian, it's pure Ruby and C++. + +* Xapian's awesome email list - the kids over at +"xapian-discuss":http://lists.xapian.org/mailman/listinfo/xapian-discuss +are super helpful. Useful if you need to extend and improve acts_as_xapian. The +Ruby bindings are mature and well maintained as part of Xapian. + + +d. Documentation - indexing +=========================== + +Xapian is an *offline indexing* search library - only one process can have the +Xapian database open for writing at once, and others that try meanwhile are +unceremoniously kicked out. For this reason, acts_as_xapian does not support +immediate writing to the database when your models change. + +Instead, there is a ActsAsXapianJob model which stores which models need +updating or deleting in the search index. A rake task 'xapian:update_index' +then performs the updates since last change. You can run it on a cron job, or +similar. + +Here's how to add indexing to your Rails app: + +1. Put acts_as_xapian in your models that need search indexing. e.g. 
+ + acts_as_xapian :texts => [ :name, :short_name ], + :values => [ [ :created_at, 0, "created_at", :date ] ], + :terms => [ [ :variety, 'V', "variety" ] ] + +Options must include: + +* :texts, an array of fields for indexing with full text search. +e.g. :texts => [ :title, :body ] + +* :values, things which have a range of values for sorting, or for collapsing. +Specify an array quadruple of [ field, identifier, prefix, type ] where +** identifier is an arbitary numeric identifier for use in the Xapian database +** prefix is the part to use in search queries that goes before the : +** type can be any of :string, :number or :date + +e.g. :values => [ [ :created_at, 0, "created_at", :date ], +[ :size, 1, "size", :string ] ] + +* :terms, things which come with a prefix (before a :) in search queries. +Specify an array triple of [ field, char, prefix ] where +** char is an arbitary single upper case char used in the Xapian database, just +pick any single uppercase character, but use a different one for each prefix. +** prefix is the part to use in search queries that goes before the : +For example, if you were making Google and indexing to be able to later do a +query like "site:www.whatdotheyknow.com", then the prefix would be "site". + +e.g. :terms => [ [ :variety, 'V', "variety" ] ] + +A 'field' is a symbol referring to either an attribute or a function which +returns the text, date or number to index. Both 'identifier' and 'char' must be +the same for the same prefix in different models. + +Options may include: +* :eager_load, added as an :include clause when looking up search results in +database +* :if, either an attribute or a function which if returns false means the +object isn't indexed + +2. Generate a database migration to create the ActsAsXapianJob model: + + script/generate acts_as_xapian + rake db:migrate + +3. Call 'rake xapian:rebuild_index models="ModelName1 ModelName2"' to build the index +the first time (you must specify all your indexed models). It's put in a +development/test/production dir in acts_as_xapian/xapiandbs. See f. Configuration +below if you want to change this. + +4. Then from a cron job or a daemon, or by hand regularly!, call 'rake xapian:update_index' + + +e. Documentation - querying +=========================== + +Testing indexing +---------------- + +If you just want to test indexing is working, you'll find this rake task +useful (it has more options, see tasks/xapian.rake) + + rake xapian:query models="PublicBody User" query="moo" + +Performing a query +------------------ + +To perform a query from code call ActsAsXapian::Search.new. This takes in turn: +* model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent] +* query_string - Google like syntax, see below + +And then a hash of options: +* :offset - Offset of first result (default 0) +* :limit - Number of results per page +* :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance +* :sort_by_ascending - Default true (documents with higher values better/earlier), set to false for descending sort +* :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group) + +Google like query syntax is as described in + "Xapian::QueryParser Syntax":http://www.xapian.org/docs/queryparser.html +Queries can include prefix:value parts, according to what you indexed in the +acts_as_xapian part above. 
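For illustration only, here is a sketch of a complete query built from the
example indexing options in section d above; the model names, query string
and option values are assumptions rather than anything the plugin requires.

  # Models assumed to be indexed with the example acts_as_xapian options above
  search = ActsAsXapian::Search.new([PublicBody, User], "stew variety:moo",
      :offset => 0, :limit => 10,
      :sort_by_prefix => "created_at",  # must be a :values prefix; omit to sort by relevance
      :sort_by_ascending => false)

  search.matches_estimated      # estimated total number of hits
  search.spelling_correction    # corrected query string, or nil if none
  for result in search.results
      # result[:model] is the ActiveRecord object; :percent, :weight and
      # :collapse_count are also available in each hash
      puts "#{result[:percent]}% #{result[:model].class} #{result[:model].id}"
  end
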
You can also say things like model:InfoRequestEvent +to constrain by model in more complex ways than the :model parameter, or +modelid:InfoRequestEvent-100 to only find one specific object. + +Returns an ActsAsXapian::Search object. Useful methods are: +* description - a techy one, to check how the query has been parsed +* matches_estimated - a guesstimate at the total number of hits +* spelling_correction - the corrected query string if there is a correction, otherwise nil +* words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight +* results - an array of hashes each containing: +** :model - your Rails model, this is what you most want! +** :weight - relevancy measure +** :percent - the weight as a %, 0 meaning the item did not match the query at all +** :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix + +Finding similar models +---------------------- + +To find models that are similar to a given set of models call ActsAsXapian::Similar.new. This takes: +* model_classes - list of model classes to return models from within +* models - list of models that you want to find related ones to + +Returns an ActsAsXapian::Similar object. Has all methods from ActsAsXapian::Search above, except +for words_to_highlight. In addition has: +* important_terms - the terms extracted from the input models, that were used to search for output +You need the results methods to get the similar models. + + +f. Configuration +================ + +If you want to customise the configuration of acts_as_xapian, it will look for +a file called 'xapian.yml' under Rails.root/config. As is familiar from the +format of the database.yml file, separate :development, :test and :production +sections are expected. + +The following options are available: +* base_db_path - specifies the directory, relative to Rails.root, in which +acts_as_xapian stores its search index databases. Default is the directory +xapiandbs within the acts_as_xapian directory. + + +g. Performance +============== + +On development sites, acts_as_xapian automatically logs the time taken to do +searches. The time displayed is for the Xapian parts of the query; the Rails +database model lookups will be logged separately by ActiveRecord. Example: + + Xapian query (0.00029s) Search: hello + +To enable this, and other performance logging, on a production site, +temporarily add this to the end of your config/environment.rb + + ActiveRecord::Base.logger = Logger.new(STDOUT) + + +h. Support +========== + +Please ask any questions on the +"acts_as_xapian Google Group":http://groups.google.com/group/acts_as_xapian + +The official home page and repository for acts_as_xapian are the +"acts_as_xapian github page":http://github.com/frabcus/acts_as_xapian/wikis + +For more details about anything, see source code in lib/acts_as_xapian.rb + +Merging source instructions "Using git for collaboration" here: +http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html diff --git a/lib/acts_as_xapian/acts_as_xapian.rb b/lib/acts_as_xapian/acts_as_xapian.rb new file mode 100644 index 000000000..b30bb4d10 --- /dev/null +++ b/lib/acts_as_xapian/acts_as_xapian.rb @@ -0,0 +1,979 @@ +# encoding: utf-8 +# acts_as_xapian/lib/acts_as_xapian.rb: +# Xapian full text search in Ruby on Rails. +# +# Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved. +# Email: hello@mysociety.org; WWW: http://www.mysociety.org/ +# +# Documentation +# ============= +# +# See ../README.txt foocumentation. 
Please update that file if you edit +# code. + +# Make it so if Xapian isn't installed, the Rails app doesn't fail completely, +# just when somebody does a search. +begin + require 'xapian' + $acts_as_xapian_bindings_available = true +rescue LoadError + STDERR.puts "acts_as_xapian: No Ruby bindings for Xapian installed" + $acts_as_xapian_bindings_available = false +end + +module ActsAsXapian + ###################################################################### + # Module level variables + # XXX must be some kind of cattr_accessor that can do this better + def ActsAsXapian.bindings_available + $acts_as_xapian_bindings_available + end + class NoXapianRubyBindingsError < StandardError + end + + @@db = nil + @@db_path = nil + @@writable_db = nil + @@init_values = [] + + # There used to be a problem with this module being loaded more than once. + # Keep a check here, so we can tell if the problem recurs. + if $acts_as_xapian_class_var_init + raise "The acts_as_xapian module has already been loaded" + else + $acts_as_xapian_class_var_init = true + end + + def ActsAsXapian.db + @@db + end + def ActsAsXapian.db_path=(db_path) + @@db_path = db_path + end + def ActsAsXapian.db_path + @@db_path + end + def ActsAsXapian.writable_db + @@writable_db + end + def ActsAsXapian.stemmer + @@stemmer + end + def ActsAsXapian.term_generator + @@term_generator + end + def ActsAsXapian.enquire + @@enquire + end + def ActsAsXapian.query_parser + @@query_parser + end + def ActsAsXapian.values_by_prefix + @@values_by_prefix + end + def ActsAsXapian.config + @@config + end + + ###################################################################### + # Initialisation + def ActsAsXapian.init(classname = nil, options = nil) + if not classname.nil? + # store class and options for use later, when we open the db in readable_init + @@init_values.push([classname,options]) + end + end + + # Reads the config file (if any) and sets up the path to the database we'll be using + def ActsAsXapian.prepare_environment + return unless @@db_path.nil? + + # barf if we can't figure out the environment + environment = (ENV['RAILS_ENV'] or Rails.env) + raise "Set RAILS_ENV, so acts_as_xapian can find the right Xapian database" if not environment + + # check for a config file + config_file = Rails.root.join("config","xapian.yml") + @@config = File.exists?(config_file) ? YAML.load_file(config_file)[environment] : {} + + # figure out where the DBs should go + if config['base_db_path'] + db_parent_path = Rails.root.join(config['base_db_path']) + else + db_parent_path = File.join(File.dirname(__FILE__), 'xapiandbs') + end + + # make the directory for the xapian databases to go in + Dir.mkdir(db_parent_path) unless File.exists?(db_parent_path) + + @@db_path = File.join(db_parent_path, environment) + + # make some things that don't depend on the db + # XXX this gets made once for each acts_as_xapian. Oh well. + @@stemmer = Xapian::Stem.new('english') + end + + # Opens / reopens the db for reading + # XXX we perhaps don't need to rebuild database and enquire and queryparser - + # but db.reopen wasn't enough by itself, so just do everything it's easier. + def ActsAsXapian.readable_init + raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available + raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty? + + prepare_environment + + # We need to reopen the database each time, so Xapian gets changes to it. 
+ # Calling reopen() does not always pick up changes for reasons that I can + # only speculate about at the moment. (It is easy to reproduce this by + # changing the code below to use reopen() rather than open() followed by + # close(), and running rake spec.) + if !@@db.nil? + @@db.close + end + + # basic Xapian objects + begin + @@db = Xapian::Database.new(@@db_path) + @@enquire = Xapian::Enquire.new(@@db) + rescue IOError => e + raise "Failed to open Xapian database #{@@db_path}: #{e.message}" + end + + init_query_parser + end + + # Make a new query parser + def ActsAsXapian.init_query_parser + # for queries + @@query_parser = Xapian::QueryParser.new + @@query_parser.stemmer = @@stemmer + @@query_parser.stemming_strategy = Xapian::QueryParser::STEM_SOME + @@query_parser.database = @@db + @@query_parser.default_op = Xapian::Query::OP_AND + begin + @@query_parser.set_max_wildcard_expansion(1000) + rescue NoMethodError + # The set_max_wildcard_expansion method was introduced in Xapian 1.2.7, + # so may legitimately not be available. + # + # Large installations of Alaveteli should consider + # upgrading, because uncontrolled wildcard expansion + # can crash the whole server: see http://trac.xapian.org/ticket/350 + end + + @@stopper = Xapian::SimpleStopper.new + @@stopper.add("and") + @@stopper.add("of") + @@stopper.add("&") + @@query_parser.stopper = @@stopper + + @@terms_by_capital = {} + @@values_by_number = {} + @@values_by_prefix = {} + @@value_ranges_store = [] + + for init_value_pair in @@init_values + classname = init_value_pair[0] + options = init_value_pair[1] + + # go through the various field types, and tell query parser about them, + # and error check them - i.e. check for consistency between models + @@query_parser.add_boolean_prefix("model", "M") + @@query_parser.add_boolean_prefix("modelid", "I") + if options[:terms] + for term in options[:terms] + raise "Use a single capital letter for term code" if not term[1].match(/^[A-Z]$/) + raise "M and I are reserved for use as the model/id term" if term[1] == "M" or term[1] == "I" + raise "model and modelid are reserved for use as the model/id prefixes" if term[2] == "model" or term[2] == "modelid" + raise "Z is reserved for stemming terms" if term[1] == "Z" + raise "Already have code '" + term[1] + "' in another model but with different prefix '" + @@terms_by_capital[term[1]] + "'" if @@terms_by_capital.include?(term[1]) && @@terms_by_capital[term[1]] != term[2] + @@terms_by_capital[term[1]] = term[2] + # XXX use boolean here so doesn't stem our URL names in WhatDoTheyKnow + # If making acts_as_xapian generic, would really need to make the :terms have + # another option that lets people choose non-boolean for terms that need it + # (i.e. 
searching explicitly within a free text field) + @@query_parser.add_boolean_prefix(term[2], term[1]) + end + end + if options[:values] + for value in options[:values] + raise "Value index '"+value[1].to_s+"' must be an integer, is " + value[1].class.to_s if value[1].class != 1.class + raise "Already have value index '" + value[1].to_s + "' in another model but with different prefix '" + @@values_by_number[value[1]].to_s + "'" if @@values_by_number.include?(value[1]) && @@values_by_number[value[1]] != value[2] + + # date types are special, mark them so the first model they're seen for + if !@@values_by_number.include?(value[1]) + if value[3] == :date + value_range = Xapian::DateValueRangeProcessor.new(value[1]) + elsif value[3] == :string + value_range = Xapian::StringValueRangeProcessor.new(value[1]) + elsif value[3] == :number + value_range = Xapian::NumberValueRangeProcessor.new(value[1]) + else + raise "Unknown value type '" + value[3].to_s + "'" + end + + @@query_parser.add_valuerangeprocessor(value_range) + + # stop it being garbage collected, as + # add_valuerangeprocessor ref is outside Ruby's GC + @@value_ranges_store.push(value_range) + end + + @@values_by_number[value[1]] = value[2] + @@values_by_prefix[value[2]] = value[1] + end + end + end + end + + def ActsAsXapian.writable_init(suffix = "") + raise NoXapianRubyBindingsError.new("Xapian Ruby bindings not installed") unless ActsAsXapian.bindings_available + raise "acts_as_xapian hasn't been called in any models" if @@init_values.empty? + + # if DB is not nil, then we're already initialised, so don't do it + # again XXX reopen it each time, xapian_spec.rb needs this so database + # gets written twice correctly. + # return unless @@writable_db.nil? + + prepare_environment + + full_path = @@db_path + suffix + + # for indexing + @@writable_db = Xapian::WritableDatabase.new(full_path, Xapian::DB_CREATE_OR_OPEN) + @@enquire = Xapian::Enquire.new(@@writable_db) + @@term_generator = Xapian::TermGenerator.new() + @@term_generator.set_flags(Xapian::TermGenerator::FLAG_SPELLING, 0) + @@term_generator.database = @@writable_db + @@term_generator.stemmer = @@stemmer + end + + ###################################################################### + # Search with a query or for similar models + + # Base class for Search and Similar below + class QueryBase + attr_accessor :offset + attr_accessor :limit + attr_accessor :query + attr_accessor :matches + attr_accessor :query_models + attr_accessor :runtime + attr_accessor :cached_results + + def initialize_db + self.runtime = 0.0 + + ActsAsXapian.readable_init + if ActsAsXapian.db.nil? + raise "ActsAsXapian not initialized" + end + end + + MSET_MAX_TRIES = 5 + MSET_MAX_DELAY = 5 + # Set self.query before calling this + def initialize_query(options) + #raise options.to_yaml + + self.runtime += Benchmark::realtime { + offset = options[:offset] || 0; offset = offset.to_i + limit = options[:limit] + raise "please specifiy maximum number of results to return with parameter :limit" if not limit + limit = limit.to_i + sort_by_prefix = options[:sort_by_prefix] || nil + sort_by_ascending = options[:sort_by_ascending].nil? ? true : options[:sort_by_ascending] + collapse_by_prefix = options[:collapse_by_prefix] || nil + + ActsAsXapian.enquire.query = self.query + + if sort_by_prefix.nil? + ActsAsXapian.enquire.sort_by_relevance! + else + value = ActsAsXapian.values_by_prefix[sort_by_prefix] + raise "couldn't find prefix '" + sort_by_prefix.to_s + "'" if value.nil? 
+ ActsAsXapian.enquire.sort_by_value_then_relevance!(value, sort_by_ascending) + end + if collapse_by_prefix.nil? + ActsAsXapian.enquire.collapse_key = Xapian.BAD_VALUENO + else + value = ActsAsXapian.values_by_prefix[collapse_by_prefix] + raise "couldn't find prefix '" + collapse_by_prefix + "'" if value.nil? + ActsAsXapian.enquire.collapse_key = value + end + + tries = 0 + delay = 1 + begin + self.matches = ActsAsXapian.enquire.mset(offset, limit, 100) + rescue IOError => e + if e.message =~ /DatabaseModifiedError: / + # This should be a transient error, so back off and try again, up to a point + if tries > MSET_MAX_TRIES + raise "Received DatabaseModifiedError from Xapian even after retrying #{MSET_MAX_TRIES} times" + else + sleep delay + end + tries += 1 + delay *= 2 + delay = MSET_MAX_DELAY if delay > MSET_MAX_DELAY + + ActsAsXapian.db.reopen() + retry + else + raise + end + end + self.cached_results = nil + } + end + + # Return a description of the query + def description + self.query.description + end + + # Does the query have non-prefixed search terms in it? + def has_normal_search_terms? + ret = false + #x = '' + for t in self.query.terms + term = t.term + #x = x + term.to_yaml + term.size.to_s + term[0..0] + "*" + if term.size >= 2 && term[0..0] == 'Z' + # normal terms begin Z (for stemmed), then have no capital letter prefix + if term[1..1] == term[1..1].downcase + ret = true + end + end + end + return ret + end + + # Estimate total number of results + def matches_estimated + self.matches.matches_estimated + end + + # Return query string with spelling correction + def spelling_correction + correction = ActsAsXapian.query_parser.get_corrected_query_string + if correction.empty? + return nil + end + return correction + end + + # Return array of models found + def results + # If they've already pulled out the results, just return them. + if !self.cached_results.nil? + return self.cached_results + end + + docs = [] + self.runtime += Benchmark::realtime { + # Pull out all the results + iter = self.matches._begin + while not iter.equals(self.matches._end) + docs.push({:data => iter.document.data, + :percent => iter.percent, + :weight => iter.weight, + :collapse_count => iter.collapse_count}) + iter.next + end + } + + # Log time taken, excluding database lookups below which will be displayed separately by ActiveRecord + if ActiveRecord::Base.logger + ActiveRecord::Base.logger.add(Logger::DEBUG, " Xapian query (#{'%.5fs' % self.runtime}) #{self.log_description}") + end + + # Look up without too many SQL queries + lhash = {} + lhash.default = [] + for doc in docs + k = doc[:data].split('-') + lhash[k[0]] = lhash[k[0]] + [k[1]] + end + # for each class, look up all ids + chash = {} + for cls, ids in lhash + conditions = [ "#{cls.constantize.table_name}.#{cls.constantize.primary_key} in (?)", ids ] + found = cls.constantize.find(:all, :conditions => conditions, :include => cls.constantize.xapian_options[:eager_load]) + for f in found + chash[[cls, f.id]] = f + end + end + # now get them in right order again + results = [] + docs.each do |doc| + k = doc[:data].split('-') + model_instance = chash[[k[0], k[1].to_i]] + if model_instance + results << { :model => model_instance, + :percent => doc[:percent], + :weight => doc[:weight], + :collapse_count => doc[:collapse_count] } + end + end + self.cached_results = results + return results + end + end + + # Search for a query string, returns an array of hashes in result order. 
+ # Each hash contains the actual Rails object in :model, and other detail + # about relevancy etc. in other keys. + class Search < QueryBase + attr_accessor :query_string + + # Note that model_classes is not only sometimes useful here - it's + # essential to make sure the classes have been loaded, and thus + # acts_as_xapian called on them, so we know the fields for the query + # parser. + + # model_classes - model classes to search within, e.g. [PublicBody, + # User]. Can take a single model class, or you can express the model + # class names in strings if you like. + # query_string - user inputed query string, with syntax much like Google Search + def initialize(model_classes, query_string, options = {}, user_query = nil) + # Check parameters, convert to actual array of model classes + new_model_classes = [] + model_classes = [model_classes] if model_classes.class != Array + for model_class in model_classes + raise "pass in the model class itself, or a string containing its name" if model_class.class != Class && model_class.class != String + model_class = model_class.constantize if model_class.class == String + new_model_classes.push(model_class) + end + model_classes = new_model_classes + + # Set things up + self.initialize_db + + # Case of a string, searching for a Google-like syntax query + self.query_string = query_string + + # Construct query which only finds things from specified models + model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s}) + if user_query.nil? + user_query = ActsAsXapian.query_parser.parse_query( + self.query_string, + Xapian::QueryParser::FLAG_BOOLEAN | Xapian::QueryParser::FLAG_PHRASE | + Xapian::QueryParser::FLAG_LOVEHATE | + Xapian::QueryParser::FLAG_SPELLING_CORRECTION) + end + self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, user_query) + + # Call base class constructor + self.initialize_query(options) + end + + # Return just normal words in the query i.e. Not operators, ones in + # date ranges or similar. Use this for cheap highlighting with + # TextHelper::highlight, and excerpt. + def words_to_highlight + # TODO: In Ruby 1.9 we can do matching of any unicode letter with \p{L} + # But we still need to support ruby 1.8 for the time being so... + query_nopunc = self.query_string.gsub(/[^ёЁа-яА-Яa-zA-Zà-üÀ-Ü0-9:\.\/_]/iu, " ") + query_nopunc = query_nopunc.gsub(/\s+/, " ") + words = query_nopunc.split(" ") + # Remove anything with a :, . or / in it + words = words.find_all {|o| !o.match(/(:|\.|\/)/) } + words = words.find_all {|o| !o.match(/^(AND|NOT|OR|XOR)$/) } + return words + end + + # Text for lines in log file + def log_description + "Search: " + self.query_string + end + + end + + # Search for models which contain theimportant terms taken from a specified + # list of models. i.e. Use to find documents similar to one (or more) + # documents, or use to refine searches. + class Similar < QueryBase + attr_accessor :query_models + attr_accessor :important_terms + + # model_classes - model classes to search within, e.g. 
[PublicBody, User] + # query_models - list of models you want to find things similar to + def initialize(model_classes, query_models, options = {}) + self.initialize_db + + self.runtime += Benchmark::realtime { + # Case of an array, searching for models similar to those models in the array + self.query_models = query_models + + # Find the documents by their unique term + input_models_query = Xapian::Query.new(Xapian::Query::OP_OR, query_models.map{|m| "I" + m.xapian_document_term}) + ActsAsXapian.enquire.query = input_models_query + matches = ActsAsXapian.enquire.mset(0, 100, 100) # XXX so this whole method will only work with 100 docs + + # Get set of relevant terms for those documents + selection = Xapian::RSet.new() + iter = matches._begin + while not iter.equals(matches._end) + selection.add_document(iter) + iter.next + end + + # Bit weird that the function to make esets is part of the enquire + # object. This explains what exactly it does, which is to exclude + # terms in the existing query. + # http://thread.gmane.org/gmane.comp.search.xapian.general/3673/focus=3681 + eset = ActsAsXapian.enquire.eset(40, selection) + + # Do main search for them + self.important_terms = [] + iter = eset._begin + while not iter.equals(eset._end) + self.important_terms.push(iter.term) + iter.next + end + similar_query = Xapian::Query.new(Xapian::Query::OP_OR, self.important_terms) + # Exclude original + combined_query = Xapian::Query.new(Xapian::Query::OP_AND_NOT, similar_query, input_models_query) + + # Restrain to model classes + model_query = Xapian::Query.new(Xapian::Query::OP_OR, model_classes.map{|mc| "M" + mc.to_s}) + self.query = Xapian::Query.new(Xapian::Query::OP_AND, model_query, combined_query) + } + + # Call base class constructor + self.initialize_query(options) + end + + # Text for lines in log file + def log_description + "Similar: " + self.query_models.to_s + end + end + + ###################################################################### + # Index + + # Offline indexing job queue model, create with migration made + # using "script/generate acts_as_xapian" as described in ../README.txt + class ActsAsXapianJob < ActiveRecord::Base + end + + # Update index with any changes needed, call this offline. Usually call it + # from a script that exits - otherwise Xapian's writable database won't + # flush your changes. Specifying flush will reduce performance, but make + # sure that each index update is definitely saved to disk before + # logging in the database that it has been. + def ActsAsXapian.update_index(flush = false, verbose = false) + # STDOUT.puts("start of ActsAsXapian.update_index") if verbose + + # Before calling writable_init we have to make sure every model class has been initialized. + # i.e. has had its class code loaded, so acts_as_xapian has been called inside it, and + # we have the info from acts_as_xapian. 
+ model_classes = ActsAsXapianJob.find_by_sql("select model from acts_as_xapian_jobs group by model").map {|a| a.model.constantize} + # If there are no models in the queue, then nothing to do + return if model_classes.size == 0 + + ActsAsXapian.writable_init + # Abort if full rebuild is going on + new_path = ActsAsXapian.db_path + ".new" + if File.exist?(new_path) + raise "aborting incremental index update while full index rebuild happens; found existing " + new_path + end + + ids_to_refresh = ActsAsXapianJob.find(:all).map() { |i| i.id } + for id in ids_to_refresh + job = nil + begin + ActiveRecord::Base.transaction do + begin + job = ActsAsXapianJob.find(id, :lock =>true) + rescue ActiveRecord::RecordNotFound => e + # This could happen if while we are working the model + # was updated a second time by another process. In that case + # ActsAsXapianJob.delete_all in xapian_mark_needs_index below + # might have removed the first job record while we are working on it. + #STDERR.puts("job with #{id} vanished under foot") if verbose + next + end + STDOUT.puts("ActsAsXapian.update_index #{job.action} #{job.model} #{job.model_id.to_s} #{Time.now.to_s}") if verbose + + begin + if job.action == 'update' + # XXX Index functions may reference other models, so we could eager load here too? + model = job.model.constantize.find(job.model_id) # :include => cls.constantize.xapian_options[:include] + model.xapian_index + elsif job.action == 'destroy' + # Make dummy model with right id, just for destruction + model = job.model.constantize.new + model.id = job.model_id + model.xapian_destroy + else + raise "unknown ActsAsXapianJob action '" + job.action + "'" + end + rescue ActiveRecord::RecordNotFound => e + # this can happen if the record was hand deleted in the database + job.action = 'destroy' + retry + end + if flush + ActsAsXapian.writable_db.flush + end + job.destroy + end + rescue => detail + # print any error, and carry on so other things are indexed + STDERR.puts(detail.backtrace.join("\n") + "\nFAILED ActsAsXapian.update_index job #{id} #{$!} " + (job.nil? ? "" : "model " + job.model + " id " + job.model_id.to_s)) + end + end + # We close the database when we're finished to remove the lock file. Since writable_init + # reopens it and recreates the environment every time we don't need to do further cleanup + ActsAsXapian.writable_db.flush + ActsAsXapian.writable_db.close + end + + def ActsAsXapian._is_xapian_db(path) + is_db = File.exist?(File.join(path, "iamflint")) || File.exist?(File.join(path, "iamchert")) + return is_db + end + + # You must specify *all* the models here, this totally rebuilds the Xapian + # database. You'll want any readers to reopen the database after this. + # + # Incremental update_index calls above are suspended while this rebuild + # happens (i.e. while the .new database is there) - any index update jobs + # are left in the database, and will run after the rebuild has finished. + + def ActsAsXapian.rebuild_index(model_classes, verbose = false, terms = true, values = true, texts = true, safe_rebuild = true) + #raise "when rebuilding all, please call as first and only thing done in process / task" if not ActsAsXapian.writable_db.nil? 
+ prepare_environment + + update_existing = !(terms == true && values == true && texts == true) + # Delete any existing .new database, and open a new one which is a copy of the current one + new_path = ActsAsXapian.db_path + ".new" + old_path = ActsAsXapian.db_path + if File.exist?(new_path) + raise "found existing " + new_path + " which is not Xapian flint database, please delete for me" if not ActsAsXapian._is_xapian_db(new_path) + FileUtils.rm_r(new_path) + end + if update_existing + FileUtils.cp_r(old_path, new_path) + end + ActsAsXapian.writable_init + ActsAsXapian.writable_db.close # just to make an empty one to read + # Index everything + if safe_rebuild + _rebuild_index_safely(model_classes, verbose, terms, values, texts) + else + @@db_path = ActsAsXapian.db_path + ".new" + ActsAsXapian.writable_init + # Save time by running the indexing in one go and in-process + for model_class in model_classes + STDOUT.puts("ActsAsXapian.rebuild_index: Rebuilding #{model_class.to_s}") if verbose + model_class.find(:all).each do |model| + STDOUT.puts("ActsAsXapian.rebuild_index #{model_class} #{model.id}") if verbose + model.xapian_index(terms, values, texts) + end + end + ActsAsXapian.writable_db.flush + ActsAsXapian.writable_db.close + end + + # Rename into place + temp_path = old_path + ".tmp" + if File.exist?(temp_path) + @@db_path = old_path + raise "temporary database found " + temp_path + " which is not Xapian flint database, please delete for me" if not ActsAsXapian._is_xapian_db(temp_path) + FileUtils.rm_r(temp_path) + end + if File.exist?(old_path) + FileUtils.mv old_path, temp_path + end + FileUtils.mv new_path, old_path + + # Delete old database + if File.exist?(temp_path) + if not ActsAsXapian._is_xapian_db(temp_path) + @@db_path = old_path + raise "old database now at " + temp_path + " is not Xapian flint database, please delete for me" + end + FileUtils.rm_r(temp_path) + end + + # You'll want to restart your FastCGI or Mongrel processes after this, + # so they get the new db + @@db_path = old_path + end + + def ActsAsXapian._rebuild_index_safely(model_classes, verbose, terms, values, texts) + batch_size = 1000 + for model_class in model_classes + model_class_count = model_class.count + 0.step(model_class_count, batch_size) do |i| + # We fork here, so each batch is run in a different process. This is + # because otherwise we get a memory "leak" and you can't rebuild very + # large databases (however long you have!) + + ActiveRecord::Base.connection.disconnect! + + pid = Process.fork # XXX this will only work on Unix, tough + if pid + Process.waitpid(pid) + if not $?.success? + raise "batch fork child failed, exiting also" + end + # database connection doesn't survive a fork, rebuild it + else + # fully reopen the database each time (with a new object) + # (so doc ids and so on aren't preserved across the fork) + ActiveRecord::Base.establish_connection + @@db_path = ActsAsXapian.db_path + ".new" + ActsAsXapian.writable_init + STDOUT.puts("ActsAsXapian.rebuild_index: New batch. #{model_class.to_s} from #{i} to #{i + batch_size} of #{model_class_count} pid #{Process.pid.to_s}") if verbose + model_class.find(:all, :limit => batch_size, :offset => i, :order => :id).each do |model| + STDOUT.puts("ActsAsXapian.rebuild_index #{model_class} #{model.id}") if verbose + model.xapian_index(terms, values, texts) + end + ActsAsXapian.writable_db.flush + ActsAsXapian.writable_db.close + # database connection won't survive a fork, so shut it down + ActiveRecord::Base.connection.disconnect! 
+ # brutal exit, so other shutdown code not run (for speed and safety) + Kernel.exit! 0 + end + + ActiveRecord::Base.establish_connection + + end + end + end + + ###################################################################### + # Instance methods that get injected into your model. + + module InstanceMethods + # Used internally + def xapian_document_term + self.class.to_s + "-" + self.id.to_s + end + + def xapian_value(field, type = nil, index_translations = false) + if index_translations && self.respond_to?("translations") + if type == :date or type == :boolean + value = single_xapian_value(field, type = type) + else + values = [] + for locale in self.translations.map{|x| x.locale} + I18n.with_locale(locale) do + values << single_xapian_value(field, type=type) + end + end + if values[0].kind_of?(Array) + values = values.flatten + value = values.reject{|x| x.nil?} + else + values = values.reject{|x| x.nil?} + value = values.join(" ") + end + end + else + value = single_xapian_value(field, type = type) + end + return value + end + + # Extract value of a field from the model + def single_xapian_value(field, type = nil) + value = self.send(field.to_sym) || self[field] + if type == :date + if value.kind_of?(Time) + value.utc.strftime("%Y%m%d") + elsif value.kind_of?(Date) + value.to_time.utc.strftime("%Y%m%d") + else + raise "Only Time or Date types supported by acts_as_xapian for :date fields, got " + value.class.to_s + end + elsif type == :boolean + value ? true : false + else + # Arrays are for terms which require multiple of them, e.g. tags + if value.kind_of?(Array) + value.map {|v| v.to_s} + else + value.to_s + end + end + end + + # Store record in the Xapian database + def xapian_index(terms = true, values = true, texts = true) + # if we have a conditional function for indexing, call it and destroy object if failed + if self.class.xapian_options.include?(:if) + if_value = xapian_value(self.class.xapian_options[:if], :boolean) + if not if_value + self.xapian_destroy + return + end + end + + existing_query = Xapian::Query.new("I" + self.xapian_document_term) + ActsAsXapian.enquire.query = existing_query + match = ActsAsXapian.enquire.mset(0,1,1).matches[0] + + if !match.nil? + doc = match.document + else + doc = Xapian::Document.new + doc.data = self.xapian_document_term + doc.add_term("M" + self.class.to_s) + doc.add_term("I" + doc.data) + end + # work out what to index + # 1. Which terms to index? We allow the user to specify particular ones + terms_to_index = [] + drop_all_terms = false + if terms and self.xapian_options[:terms] + terms_to_index = self.xapian_options[:terms].dup + if terms.is_a?(String) + terms_to_index.reject!{|term| !terms.include?(term[1])} + if terms_to_index.length == self.xapian_options[:terms].length + drop_all_terms = true + end + else + drop_all_terms = true + end + end + # 2. Texts to index? Currently, it's all or nothing + texts_to_index = [] + if texts and self.xapian_options[:texts] + texts_to_index = self.xapian_options[:texts] + end + # 3. Values to index? 
Currently, it's all or nothing + values_to_index = [] + if values and self.xapian_options[:values] + values_to_index = self.xapian_options[:values] + end + + # clear any existing data that we might want to replace + if drop_all_terms && texts + # as an optimisation, if we're reindexing all of both, we remove everything + doc.clear_terms + doc.add_term("M" + self.class.to_s) + doc.add_term("I" + doc.data) + else + term_prefixes_to_index = terms_to_index.map {|x| x[1]} + for existing_term in doc.terms + first_letter = existing_term.term[0...1] + if !"MI".include?(first_letter) # it's not one of the reserved value + if first_letter.match("^[A-Z]+") # it's a "value" (rather than indexed text) + if term_prefixes_to_index.include?(first_letter) # it's a value that we've been asked to index + doc.remove_term(existing_term.term) + end + elsif texts + doc.remove_term(existing_term.term) # it's text and we've been asked to reindex it + end + end + end + end + + for term in terms_to_index + value = xapian_value(term[0]) + if value.kind_of?(Array) + for v in value + doc.add_term(term[1] + v) + end + else + doc.add_term(term[1] + value) + end + end + + if values + doc.clear_values + for value in values_to_index + doc.add_value(value[1], xapian_value(value[0], value[3])) + end + end + if texts + ActsAsXapian.term_generator.document = doc + for text in texts_to_index + ActsAsXapian.term_generator.increase_termpos # stop phrases spanning different text fields + # XXX the "1" here is a weight that could be varied for a boost function + ActsAsXapian.term_generator.index_text(xapian_value(text, nil, true), 1) + end + end + + ActsAsXapian.writable_db.replace_document("I" + doc.data, doc) + end + + # Delete record from the Xapian database + def xapian_destroy + ActsAsXapian.writable_db.delete_document("I" + self.xapian_document_term) + end + + # Used to mark changes needed by batch indexer + def xapian_mark_needs_index + xapian_create_job('update', self.class.base_class.to_s, self.id) + end + + def xapian_mark_needs_destroy + xapian_create_job('destroy', self.class.base_class.to_s, self.id) + end + + # Allow reindexing to be skipped if a flag is set + def xapian_mark_needs_index_if_reindex + return true if (self.respond_to?(:no_xapian_reindex) && self.no_xapian_reindex == true) + xapian_mark_needs_index + end + + def xapian_create_job(action, model, model_id) + begin + ActiveRecord::Base.transaction(:requires_new => true) do + ActsAsXapianJob.delete_all([ "model = ? and model_id = ?", model, model_id]) + xapian_before_create_job_hook(action, model, model_id) + ActsAsXapianJob.create!(:model => model, + :model_id => model_id, + :action => action) + end + rescue ActiveRecord::RecordNotUnique => e + # Given the error handling in ActsAsXapian::update_index, we can just fail silently if + # another process has inserted an acts_as_xapian_jobs record for this model. + raise unless (e.message =~ /duplicate key value violates unique constraint "index_acts_as_xapian_jobs_on_model_and_model_id"/) + end + end + + # A hook method that can be used in tests to simulate e.g. an external process inserting a record + def xapian_before_create_job_hook(action, model, model_id) + end + + end + + ###################################################################### + # Main entry point, add acts_as_xapian to your model. 
+ + module ActsMethods + # See top of this file for docs + def acts_as_xapian(options) + # Give error only on queries if bindings not available + if not ActsAsXapian.bindings_available + return + end + + include InstanceMethods + + cattr_accessor :xapian_options + self.xapian_options = options + + ActsAsXapian.init(self.class.to_s, options) + + after_save :xapian_mark_needs_index_if_reindex + after_destroy :xapian_mark_needs_destroy + end + end + +end + +# Reopen ActiveRecord and include the acts_as_xapian method +ActiveRecord::Base.extend ActsAsXapian::ActsMethods + + diff --git a/lib/acts_as_xapian/tasks/xapian.rake b/lib/acts_as_xapian/tasks/xapian.rake new file mode 100644 index 000000000..c1986ce1e --- /dev/null +++ b/lib/acts_as_xapian/tasks/xapian.rake @@ -0,0 +1,66 @@ +require 'rubygems' +require 'rake' +require 'rake/testtask' +require 'active_record' + +namespace :xapian do + # Parameters - specify "flush=true" to save changes to the Xapian database + # after each model that is updated. This is safer, but slower. Specify + # "verbose=true" to print model name as it is run. + desc 'Updates Xapian search index with changes to models since last call' + task :update_index => :environment do + ActsAsXapian.update_index(ENV['flush'] ? true : false, ENV['verbose'] ? true : false) + end + + # Parameters - specify 'models="PublicBody User"' to say which models + # you index with Xapian. + + # This totally rebuilds the database, so you will want to restart + # any web server afterwards to make sure it gets the changes, + # rather than still pointing to the old deleted database. Specify + # "verbose=true" to print model name as it is run. By default, + # all of the terms, values and texts are reindexed. You can + # suppress any of these by specifying, for example, "texts=false". + # You can specify that only certain terms should be updated by + # specifying their prefix(es) as a string, e.g. "terms=IV" will + # index the two terms I and V (and "terms=false" will index none, + # and "terms=true", the default, will index all) + + + desc 'Completely rebuilds Xapian search index (must specify all models)' + task :rebuild_index => :environment do + def coerce_arg(arg, default) + if arg == "false" + return false + elsif arg == "true" + return true + elsif arg.nil? + return default + else + return arg + end + end + raise "specify ALL your models with models=\"ModelName1 ModelName2\" as parameter" if ENV['models'].nil? + ActsAsXapian.rebuild_index(ENV['models'].split(" ").map{|m| m.constantize}, + coerce_arg(ENV['verbose'], false), + coerce_arg(ENV['terms'], true), + coerce_arg(ENV['values'], true), + coerce_arg(ENV['texts'], true)) + end + + # Parameters - are models, query, offset, limit, sort_by_prefix, + # collapse_by_prefix + desc 'Run a query, return YAML of results' + task :query => :environment do + raise "specify models=\"ModelName1 ModelName2\" as parameter" if ENV['models'].nil? + raise "specify query=\"your terms\" as parameter" if ENV['query'].nil? 
+ s = ActsAsXapian::Search.new(ENV['models'].split(" ").map{|m| m.constantize}, + ENV['query'], + :offset => (ENV['offset'] || 0), :limit => (ENV['limit'] || 10), + :sort_by_prefix => (ENV['sort_by_prefix'] || nil), + :collapse_by_prefix => (ENV['collapse_by_prefix'] || nil) + ) + STDOUT.puts(s.results.to_yaml) + end +end + diff --git a/lib/configuration.rb b/lib/configuration.rb index fba70f27c..2192433f7 100644 --- a/lib/configuration.rb +++ b/lib/configuration.rb @@ -21,6 +21,7 @@ module AlaveteliConfiguration :AVAILABLE_LOCALES => '', :BLACKHOLE_PREFIX => 'do-not-reply-to-this-address', :BLOG_FEED => '', + :CACHE_FRAGMENTS => true, :CONTACT_EMAIL => 'contact@localhost', :CONTACT_NAME => 'Alaveteli', :COOKIE_STORE_SESSION_SECRET => 'this default is insecure as code is open source, please override for live sites in config/general; this will do for local development', diff --git a/lib/generators/acts_as_xapian/USAGE b/lib/generators/acts_as_xapian/USAGE new file mode 100644 index 000000000..2d027c46f --- /dev/null +++ b/lib/generators/acts_as_xapian/USAGE @@ -0,0 +1 @@ +./script/generate acts_as_xapian diff --git a/lib/generators/acts_as_xapian/acts_as_xapian_generator.rb b/lib/generators/acts_as_xapian/acts_as_xapian_generator.rb new file mode 100644 index 000000000..434c02cb5 --- /dev/null +++ b/lib/generators/acts_as_xapian/acts_as_xapian_generator.rb @@ -0,0 +1,10 @@ +require 'rails/generators/active_record/migration' + +class ActsAsXapianGenerator < Rails::Generators::Base + include Rails::Generators::Migration + extend ActiveRecord::Generators::Migration + source_root File.expand_path("../templates", __FILE__) + def create_migration_file + migration_template "migration.rb", "db/migrate/add_acts_as_xapian_jobs.rb" + end +end diff --git a/lib/generators/acts_as_xapian/templates/migration.rb b/lib/generators/acts_as_xapian/templates/migration.rb new file mode 100644 index 000000000..84a9dd766 --- /dev/null +++ b/lib/generators/acts_as_xapian/templates/migration.rb @@ -0,0 +1,14 @@ +class CreateActsAsXapian < ActiveRecord::Migration + def self.up + create_table :acts_as_xapian_jobs do |t| + t.column :model, :string, :null => false + t.column :model_id, :integer, :null => false + t.column :action, :string, :null => false + end + add_index :acts_as_xapian_jobs, [:model, :model_id], :unique => true + end + def self.down + drop_table :acts_as_xapian_jobs + end +end + diff --git a/lib/has_tag_string/README.txt b/lib/has_tag_string/README.txt new file mode 100644 index 000000000..0d3a38229 --- /dev/null +++ b/lib/has_tag_string/README.txt @@ -0,0 +1 @@ +Plugin used only in WhatDoTheyKnow right now. diff --git a/lib/has_tag_string/has_tag_string.rb b/lib/has_tag_string/has_tag_string.rb new file mode 100644 index 000000000..4022faaac --- /dev/null +++ b/lib/has_tag_string/has_tag_string.rb @@ -0,0 +1,165 @@ +# lib/has_tag_string.rb: +# Lets a model have tags, represented as space separate strings in a public +# interface, but stored in the database as keys. Each tag can have a value +# followed by a colon - e.g. url:http://www.flourish.org +# +# Copyright (c) 2010 UK Citizens Online Democracy. All rights reserved. +# Email: hello@mysociety.org; WWW: http://www.mysociety.org/ + +module HasTagString + # Represents one tag of one model. + # The migration to make this is currently only in WDTK code. + class HasTagStringTag < ActiveRecord::Base + # XXX strip_attributes! 
+ + validates_presence_of :name + + # Return instance of the model that this tag tags + def tagged_model + return self.model.constantize.find(self.model_id) + end + + # For display purposes, returns the name and value as a:b, or + # if there is no value just the name a + def name_and_value + ret = self.name + if !self.value.nil? + ret += ":" + self.value + end + return ret + end + + # Parses a text version of one single tag, such as "a:b" and returns + # the name and value, with nil for value if there isn't one. + def HasTagStringTag.split_tag_into_name_value(tag) + sections = tag.split(/:/) + name = sections[0] + if sections[1] + value = sections[1,sections.size].join(":") + else + value = nil + end + return name, value + end + end + + # Methods which are added to the model instances being tagged + module InstanceMethods + # Given an input string of tags, sets all tags to that string. + # XXX This immediately saves the new tags. + def tag_string=(tag_string) + if tag_string.nil? + tag_string = "" + end + + tag_string = tag_string.strip + # split tags apart + tags = tag_string.split(/\s+/).uniq + + ActiveRecord::Base.transaction do + for tag in self.tags + tag.destroy + end + self.tags = [] + for tag in tags + # see if is a machine tags (i.e. a tag which has a value) + name, value = HasTagStringTag.split_tag_into_name_value(tag) + + tag = HasTagStringTag.new( + :model => self.class.base_class.to_s, + :model_id => self.id, + :name => name, :value => value + ) + self.tags << tag + end + end + end + + # Returns the tags the model has, as a space separated string + def tag_string + return self.tags.map { |t| t.name_and_value }.join(' ') + end + + # Returns the tags the model has, as an array of pairs of key/value + # (this can't be a dictionary as you can have multiple instances of a + # key with different values) + def tag_array + return self.tags.map { |t| [t.name, t.value] } + end + + # Returns a list of all the strings someone might want to search for. + # So that is the key by itself, or the key and value. + # e.g. if a request was tagged openlylocal_id:12345, they might + # want to search for "openlylocal_id" or for "openlylocal_id:12345" to find it. + def tag_array_for_search + ret = {} + for tag in self.tags + ret[tag.name] = 1 + ret[tag.name_and_value] = 1 + end + + return ret.keys.sort + end + + # Test to see if class is tagged with the given tag + def has_tag?(tag_as_string) + for tag in self.tags + if tag.name == tag_as_string + return true + end + end + return false + end + + class TagNotFound < StandardError + end + + # If the tag is a machine tag, returns array of its values + def get_tag_values(tag_as_string) + found = false + results = [] + for tag in self.tags + if tag.name == tag_as_string + found = true + if !tag.value.nil? + results << tag.value + end + end + end + if !found + raise TagNotFound + end + return results + end + + # Adds a new tag to the model, if it isn't already there + def add_tag_if_not_already_present(tag_as_string) + self.tag_string = self.tag_string + " " + tag_as_string + end + end + + # Methods which are added to the model class being tagged + module ClassMethods + # Find all public bodies with a particular tag + def find_by_tag(tag_as_string) + return HasTagStringTag.find(:all, :conditions => + ['name = ? 
and model = ?', tag_as_string, self.to_s ] + ).map { |t| t.tagged_model }.sort { |a,b| a.name <=> b.name }.uniq + end + end + + ###################################################################### + # Main entry point, add has_tag_string to your model. + module HasMethods + def has_tag_string() + has_many :tags, :conditions => "model = '" + self.to_s + "'", :foreign_key => "model_id", :class_name => 'HasTagString::HasTagStringTag' + + include InstanceMethods + self.class.send :include, ClassMethods + end + end + +end + +ActiveRecord::Base.extend HasTagString::HasMethods + diff --git a/lib/i18n_fixes.rb b/lib/i18n_fixes.rb index 9f0849e75..64c370477 100644 --- a/lib/i18n_fixes.rb +++ b/lib/i18n_fixes.rb @@ -35,7 +35,7 @@ def gettext_interpolate(string, values) pattern, key = $1, $1.to_sym if !values.include?(key) - raise I18n::MissingInterpolationArgument.new(pattern, string) + raise I18n::MissingInterpolationArgument.new(pattern, string, values) else v = values[key].to_s if safe && !v.html_safe? diff --git a/lib/mail_handler/backends/mail_backend.rb b/lib/mail_handler/backends/mail_backend.rb index 28c486e1b..e019eba97 100644 --- a/lib/mail_handler/backends/mail_backend.rb +++ b/lib/mail_handler/backends/mail_backend.rb @@ -95,7 +95,7 @@ module MailHandler def get_from_address(mail) first_from = first_from(mail) if first_from - if first_from.is_a?(ActiveSupport::Multibyte::Chars) + if first_from.is_a?(String) return nil else return first_from.address @@ -109,7 +109,7 @@ module MailHandler def get_from_name(mail) first_from = first_from(mail) if first_from - if first_from.is_a?(ActiveSupport::Multibyte::Chars) + if first_from.is_a?(String) return nil else return (first_from.display_name || nil) diff --git a/lib/mail_handler/backends/mail_extensions.rb b/lib/mail_handler/backends/mail_extensions.rb index 029331802..87af526bf 100644 --- a/lib/mail_handler/backends/mail_extensions.rb +++ b/lib/mail_handler/backends/mail_extensions.rb @@ -7,54 +7,6 @@ module Mail attr_accessor :within_rfc822_attachment # for parts within a message attached as text (for getting subject mainly) attr_accessor :count_parts_count attr_accessor :count_first_uudecode_count - - # A patched version of the message initializer to work around a bug where stripping the original - # input removes meaningful spaces - e.g. in the case of uuencoded bodies. - def initialize(*args, &block) - @body = nil - @body_raw = nil - @separate_parts = false - @text_part = nil - @html_part = nil - @errors = nil - @header = nil - @charset = 'UTF-8' - @defaulted_charset = true - - @perform_deliveries = true - @raise_delivery_errors = true - - @delivery_handler = nil - - @delivery_method = Mail.delivery_method.dup - - @transport_encoding = Mail::Encodings.get_encoding('7bit') - - @mark_for_delete = false - - if args.flatten.first.respond_to?(:each_pair) - init_with_hash(args.flatten.first) - else - # The replacement of this commented out line is the change. - # init_with_string(args.flatten[0].to_s.strip) - init_with_string(args.flatten[0].to_s) - end - - if block_given? 
- instance_eval(&block) - end - - self - end - - def set_envelope_header - raw_string = raw_source.to_s - if match_data = raw_source.to_s.match(/\AFrom\s(#{TEXT}+)#{CRLF}/m) - set_envelope(match_data[1]) - self.raw_source = raw_string.sub(match_data[0], "") - end - end - end # A patched version of the parameter hash that handles nil values without throwing @@ -77,6 +29,7 @@ module Mail # HACK: Backport encoding fixes for Ruby 1.8 from Mail 2.5 # Can be removed when we no longer support Ruby 1.8 class Ruby18 + def Ruby18.b_value_decode(str) match = str.match(/\=\?(.+)?\?[Bb]\?(.+)?\?\=/m) if match @@ -129,11 +82,11 @@ module Mail def Ruby19.b_value_decode(str) match = str.match(/\=\?(.+)?\?[Bb]\?(.+)?\?\=/m) if match - encoding = match[1] + charset = match[1] str = Ruby19.decode_base64(match[2]) # Rescue an ArgumentError arising from an unknown encoding. begin - str.force_encoding(fix_encoding(encoding)) + str.force_encoding(pick_encoding(charset)) rescue ArgumentError end end @@ -141,18 +94,5 @@ module Mail decoded.valid_encoding? ? decoded : decoded.encode("utf-16le", :invalid => :replace, :replace => "").encode("utf-8") end - def Ruby19.q_value_decode(str) - match = str.match(/\=\?(.+)?\?[Qq]\?(.+)?\?\=/m) - if match - encoding = match[1] - str = Encodings::QuotedPrintable.decode(match[2].gsub(/_/, '=20')) - # Backport line from mail 2.5 to strip a trailing = character - # Remove trailing = if it exists in a Q encoding - str = str.sub(/\=$/, '') - str.force_encoding(fix_encoding(encoding)) - end - decoded = str.encode("utf-8", :invalid => :replace, :replace => "") - decoded.valid_encoding? ? decoded : decoded.encode("utf-16le", :invalid => :replace, :replace => "").encode("utf-8") - end end end diff --git a/lib/mail_handler/mail_handler.rb b/lib/mail_handler/mail_handler.rb index 918f91180..53033d440 100644 --- a/lib/mail_handler/mail_handler.rb +++ b/lib/mail_handler/mail_handler.rb @@ -59,7 +59,7 @@ module MailHandler end # e.g. http://www.whatdotheyknow.com/request/copy_of_current_swessex_scr_opt#incoming-9928 - if content_type == 'application/acrobat' + if content_type == 'application/acrobat' or content_type == 'document/pdf' content_type = 'application/pdf' end diff --git a/lib/no_constraint_disabling.rb b/lib/no_constraint_disabling.rb index d515a959a..32a4a6bfe 100644 --- a/lib/no_constraint_disabling.rb +++ b/lib/no_constraint_disabling.rb @@ -47,7 +47,7 @@ module ActiveRecord connection, table_name, class_names[table_name.to_sym] || table_name.classify, - File.join(fixtures_directory, path)) + ::File.join(fixtures_directory, path)) end all_loaded_fixtures.update(fixtures_map) diff --git a/lib/quiet_opener.rb b/lib/quiet_opener.rb index ae6605c43..16ea27b8e 100644 --- a/lib/quiet_opener.rb +++ b/lib/quiet_opener.rb @@ -1,6 +1,8 @@ require 'open-uri' require 'net-purge' -require 'net/http/local' +if RUBY_VERSION.to_f < 2.0 + require 'net/http/local' +end def quietly_try_to_open(url) begin @@ -12,17 +14,36 @@ def quietly_try_to_open(url) return result end +# On Ruby versions before 2.0, we need to use the net-http-local gem +# to force the use of 127.0.0.1 as the local interface for the +# connection. However, at the time of writing this gem doesn't work +# on Ruby 2.0 and it's not necessary with that Ruby version - one can +# supply a :local_host option to Net::HTTP:start. So, this helper +# function is to abstract away that difference, and can be used as you +# would Net::HTTP.start(host) when passed a block. 
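+#
+# A minimal usage sketch of this helper (the host and path are hypothetical;
+# Net::HTTP::Get is just the standard library request class):
+#
+#   http_from_localhost('www.example.com') do |http|
+#     response = http.request(Net::HTTP::Get.new('/'))
+#     puts response.code
+#   end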
+def http_from_localhost(host) + if RUBY_VERSION.to_f >= 2.0 + Net::HTTP.start(host, :local_host => '127.0.0.1') do |http| + yield http + end + else + Net::HTTP.bind '127.0.0.1' do + Net::HTTP.start(host) do |http| + yield http + end + end + end +end + def quietly_try_to_purge(host, url) begin result = "" result_body = "" - Net::HTTP.bind '127.0.0.1' do - Net::HTTP.start(host) {|http| - request = Net::HTTP::Purge.new(url) - response = http.request(request) - result = response.code - result_body = response.body - } + http_from_localhost(host) do |http| + request = Net::HTTP::Purge.new(url) + response = http.request(request) + result = response.code + result_body = response.body end rescue OpenURI::HTTPError, SocketError, Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ECONNRESET, Errno::ENETUNREACH Rails.logger.warn("PURGE: Unable to reach host #{host}") diff --git a/lib/strip_attributes/README.rdoc b/lib/strip_attributes/README.rdoc new file mode 100644 index 000000000..bd55c0c1c --- /dev/null +++ b/lib/strip_attributes/README.rdoc @@ -0,0 +1,77 @@ +== StripAttributes + +StripAttributes is a Rails plugin that automatically strips all ActiveRecord +model attributes of leading and trailing whitespace before validation. If the +attribute is blank, it strips the value to +nil+. + +It works by adding a before_validation hook to the record. By default, all +attributes are stripped of whitespace, but <tt>:only</tt> and <tt>:except</tt> +options can be used to limit which attributes are stripped. Both options accept +a single attribute (<tt>:only => :field</tt>) or arrays of attributes (<tt>:except => +[:field1, :field2, :field3]</tt>). + +=== Examples + + class DrunkPokerPlayer < ActiveRecord::Base + strip_attributes! + end + + class SoberPokerPlayer < ActiveRecord::Base + strip_attributes! :except => :boxers + end + + class ConservativePokerPlayer < ActiveRecord::Base + strip_attributes! :only => [:shoe, :sock, :glove] + end + +=== Installation + +Option 1. Use the standard Rails plugin install (assuming Rails 2.1). + + ./script/plugin install git://github.com/rmm5t/strip_attributes.git + +Option 2. Use git submodules + + git submodule add git://github.com/rmm5t/strip_attributes.git vendor/plugins/strip_attributes + +Option 3. Use braid[http://github.com/evilchelu/braid/tree/master] (assuming +you're using git) + + braid add --rails_plugin git://github.com/rmm5t/strip_attributes.git + git merge braid/track + +=== Other + +If you want to use this outside of Rails, extend StripAttributes in your +ActiveRecord model after putting strip_attributes in your <tt>$LOAD_PATH</tt>: + + require 'strip_attributes' + class SomeModel < ActiveRecord::Base + extend StripAttributes + strip_attributes! + end + +=== Support + +The StripAttributes homepage is http://stripattributes.rubyforge.org. You can +find the StripAttributes RubyForge progject page at: +http://rubyforge.org/projects/stripattributes + +StripAttributes source is hosted on GitHub[http://github.com/]: +http://github.com/rmm5t/strip_attributes + +Feel free to submit suggestions or feature requests. If you send a patch, +remember to update the corresponding unit tests. In fact, I prefer new features +to be submitted in the form of new unit tests. + +=== Credits + +The idea was triggered by the information at +http://wiki.rubyonrails.org/rails/pages/HowToStripWhitespaceFromModelFields +but was modified from the original to include more idiomatic ruby and rails +support. 
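+
+=== Note on this copy
+
+The copy of strip_attributes.rb bundled alongside this README (see further
+down this change) deliberately departs from the "blank becomes +nil+"
+behaviour described above: blank values are left as empty strings. A rough
+sketch of the difference, using hypothetical +name+ and +nickname+
+attributes on the DrunkPokerPlayer example model:
+
+  player = DrunkPokerPlayer.new(:name => "  Slim  ", :nickname => "")
+  player.valid?
+  player.name      # => "Slim"
+  player.nickname  # => ""  (upstream StripAttributes would give nil here)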
+ +=== License + +Copyright (c) 2007-2008 Ryan McGeary released under the MIT license +http://en.wikipedia.org/wiki/MIT_License
\ No newline at end of file diff --git a/lib/strip_attributes/Rakefile b/lib/strip_attributes/Rakefile new file mode 100644 index 000000000..05b0c14ad --- /dev/null +++ b/lib/strip_attributes/Rakefile @@ -0,0 +1,30 @@ +require 'rake' +require 'rake/testtask' +require 'rake/rdoctask' + +desc 'Default: run unit tests.' +task :default => :test + +desc 'Test the stripattributes plugin.' +Rake::TestTask.new(:test) do |t| + t.libs << 'lib' + t.pattern = 'test/**/*_test.rb' + t.verbose = true +end + +desc 'Generate documentation for the stripattributes plugin.' +Rake::RDocTask.new(:rdoc) do |rdoc| + rdoc.rdoc_dir = 'rdoc' + rdoc.title = 'Stripattributes' + rdoc.options << '--line-numbers' << '--inline-source' + rdoc.rdoc_files.include('README.rdoc') + rdoc.rdoc_files.include('lib/**/*.rb') +end + +desc 'Publishes rdoc to rubyforge server' +task :publish_rdoc => :rdoc do + cmd = "scp -r rdoc/* rmm5t@rubyforge.org:/var/www/gforge-projects/stripattributes" + puts "\nPublishing rdoc: #{cmd}\n\n" + system(cmd) +end + diff --git a/lib/strip_attributes/strip_attributes.rb b/lib/strip_attributes/strip_attributes.rb new file mode 100644 index 000000000..130d10185 --- /dev/null +++ b/lib/strip_attributes/strip_attributes.rb @@ -0,0 +1,37 @@ +module StripAttributes + # Strips whitespace from model fields and leaves nil values as nil. + # XXX this differs from official StripAttributes, as it doesn't make blank cells null. + def strip_attributes!(options = nil) + before_validation do |record| + attribute_names = StripAttributes.narrow(record.attribute_names, options) + + attribute_names.each do |attribute_name| + value = record[attribute_name] + if value.respond_to?(:strip) + stripped = value.strip + if stripped != value + record[attribute_name] = (value.nil?) ? nil : stripped + end + end + end + end + end + + # Necessary because Rails has removed the narrowing of attributes using :only + # and :except on Base#attributes + def self.narrow(attribute_names, options) + if options.nil? + attribute_names + else + if except = options[:except] + except = Array(except).collect { |attribute| attribute.to_s } + attribute_names - except + elsif only = options[:only] + only = Array(only).collect { |attribute| attribute.to_s } + attribute_names & only + else + raise ArgumentError, "Options does not specify :except or :only (#{options.keys.inspect})" + end + end + end +end diff --git a/lib/strip_attributes/test/strip_attributes_test.rb b/lib/strip_attributes/test/strip_attributes_test.rb new file mode 100644 index 000000000..8158dc664 --- /dev/null +++ b/lib/strip_attributes/test/strip_attributes_test.rb @@ -0,0 +1,90 @@ +require "#{File.dirname(__FILE__)}/test_helper" + +module MockAttributes + def self.included(base) + base.column :foo, :string + base.column :bar, :string + base.column :biz, :string + base.column :baz, :string + end +end + +class StripAllMockRecord < ActiveRecord::Base + include MockAttributes + strip_attributes! +end + +class StripOnlyOneMockRecord < ActiveRecord::Base + include MockAttributes + strip_attributes! :only => :foo +end + +class StripOnlyThreeMockRecord < ActiveRecord::Base + include MockAttributes + strip_attributes! :only => [:foo, :bar, :biz] +end + +class StripExceptOneMockRecord < ActiveRecord::Base + include MockAttributes + strip_attributes! :except => :foo +end + +class StripExceptThreeMockRecord < ActiveRecord::Base + include MockAttributes + strip_attributes! 
:except => [:foo, :bar, :biz] +end + +class StripAttributesTest < Test::Unit::TestCase + def setup + @init_params = { :foo => "\tfoo", :bar => "bar \t ", :biz => "\tbiz ", :baz => "" } + end + + def test_should_exist + assert Object.const_defined?(:StripAttributes) + end + + def test_should_strip_all_fields + record = StripAllMockRecord.new(@init_params) + record.valid? + assert_equal "foo", record.foo + assert_equal "bar", record.bar + assert_equal "biz", record.biz + assert_equal "", record.baz + end + + def test_should_strip_only_one_field + record = StripOnlyOneMockRecord.new(@init_params) + record.valid? + assert_equal "foo", record.foo + assert_equal "bar \t ", record.bar + assert_equal "\tbiz ", record.biz + assert_equal "", record.baz + end + + def test_should_strip_only_three_fields + record = StripOnlyThreeMockRecord.new(@init_params) + record.valid? + assert_equal "foo", record.foo + assert_equal "bar", record.bar + assert_equal "biz", record.biz + assert_equal "", record.baz + end + + def test_should_strip_all_except_one_field + record = StripExceptOneMockRecord.new(@init_params) + record.valid? + assert_equal "\tfoo", record.foo + assert_equal "bar", record.bar + assert_equal "biz", record.biz + assert_equal "", record.baz + end + + def test_should_strip_all_except_three_fields + record = StripExceptThreeMockRecord.new(@init_params) + record.valid? + assert_equal "\tfoo", record.foo + assert_equal "bar \t ", record.bar + assert_equal "\tbiz ", record.biz + assert_equal "", record.baz + end +end diff --git a/lib/strip_attributes/test/test_helper.rb b/lib/strip_attributes/test/test_helper.rb new file mode 100644 index 000000000..7d06c40db --- /dev/null +++ b/lib/strip_attributes/test/test_helper.rb @@ -0,0 +1,20 @@ +require 'test/unit' +require 'rubygems' +require 'active_record' + +PLUGIN_ROOT = File.expand_path(File.join(File.dirname(__FILE__), "..")) + +$LOAD_PATH.unshift "#{PLUGIN_ROOT}/lib" +require "#{PLUGIN_ROOT}/init" + +class ActiveRecord::Base + alias_method :save, :valid? + def self.columns() + @columns ||= [] + end + + def self.column(name, sql_type = nil, default = nil, null = true) + @columns ||= [] + @columns << ActiveRecord::ConnectionAdapters::Column.new(name.to_s, default, sql_type, null) + end +end diff --git a/lib/tasks/gettext.rake b/lib/tasks/gettext.rake index 366dfbe88..3f357213f 100644 --- a/lib/tasks/gettext.rake +++ b/lib/tasks/gettext.rake @@ -29,11 +29,11 @@ namespace :gettext do end def theme_files_to_translate(theme) - Dir.glob("{vendor/plugins/#{theme}/lib}/**/*.{rb,erb}") + Dir.glob("{lib/themes/#{theme}/lib}/**/*.{rb,erb}") end def theme_locale_path(theme) - File.join(Rails.root, "vendor", "plugins", theme, "locale-theme") + Rails.root.join "lib", "themes", theme, "locale-theme" end end diff --git a/lib/tasks/import.rake b/lib/tasks/import.rake new file mode 100644 index 000000000..c8183c745 --- /dev/null +++ b/lib/tasks/import.rake @@ -0,0 +1,78 @@ +require 'csv' +require 'tempfile' + +namespace :import do + + desc 'Import public bodies from CSV provided on standard input' + task :import_csv => :environment do + dryrun = ENV['DRYRUN'] != '0' + if dryrun + STDERR.puts "Only a dry run; public bodies will not be created" + end + + tmp_csv = nil + Tempfile.open('alaveteli') do |f| + f.write STDIN.read + tmp_csv = f + end + + number_of_rows = 0 + + STDERR.puts "Preliminary check for ambiguous names or slugs..." 
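+    # (A hypothetical illustration of what this pass catches: two rows named
+    # "Department of Health" and "DEPARTMENT OF HEALTH" have different names
+    # but simplify to the same url_part, so the import stops and reports the
+    # clash instead of creating either body.)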
+ + # Check that the name and slugified version of the name are + # unique: + url_part_count = Hash.new { 0 } + name_count = Hash.new { 0 } + reader = CSV.open tmp_csv.path, 'r' + header_line = reader.shift + headers = header_line.collect { |h| h.gsub /^#/, ''} + + reader.each do |row_array| + row = Hash[headers.zip row_array] + name = row['name'] + url_part = MySociety::Format::simplify_url_part name, "body" + name_count[name] += 1 + url_part_count[url_part] += 1 + number_of_rows += 1 + end + + non_unique_error = false + + [[name_count, 'name'], + [url_part_count, 'url_part']].each do |counter, field| + counter.sort.map do |name, count| + if count > 1 + non_unique_error = true + STDERR.puts "The #{field} #{name} was found #{count} times." + end + end + end + + next if non_unique_error + + STDERR.puts "Now importing the public bodies..." + + # Now it's (probably) safe to try to import: + errors, notes = PublicBody.import_csv_from_file(tmp_csv.path, + tag='', + tag_behaviour='replace', + dryrun, + editor="#{ENV['USER']} (Unix user)", + I18n.available_locales) do |row_number, fields| + percent_complete = (100 * row_number.to_f / number_of_rows).to_i + STDERR.print "#{row_number} out of #{number_of_rows} " + STDERR.puts "(#{percent_complete}% complete)" + end + + if errors.length > 0 + STDERR.puts "Import failed, with the following errors:" + errors.each do |error| + STDERR.puts " #{error}" + end + else + STDERR.puts "Done." + end + + end +end diff --git a/lib/tasks/stats.rake b/lib/tasks/stats.rake index 4eda27289..38eb15996 100644 --- a/lib/tasks/stats.rake +++ b/lib/tasks/stats.rake @@ -1,8 +1,14 @@ namespace :stats do - desc 'Produce transaction stats' + desc 'Produce monthly transaction stats for a period starting START_YEAR' task :show => :environment do - month_starts = (Date.new(2009, 1)..Date.new(2011, 8)).select { |d| d.day == 1 } + example = 'rake stats:show START_YEAR=2009 [START_MONTH=3 END_YEAR=2012 END_MONTH=10]' + check_for_env_vars(['START_YEAR'], example) + start_year = (ENV['START_YEAR']).to_i + start_month = (ENV['START_MONTH'] || 1).to_i + end_year = (ENV['END_YEAR'] || Time.now.year).to_i + end_month = (ENV['END_MONTH'] || Time.now.month).to_i + month_starts = (Date.new(start_year, start_month)..Date.new(end_year, end_month)).select { |d| d.day == 1 } headers = ['Period', 'Requests sent', 'Annotations added', @@ -94,7 +100,7 @@ namespace :stats do desc 'Update statistics in the public_bodies table' task :update_public_bodies_stats => :environment do verbose = ENV['VERBOSE'] == '1' - PublicBody.all.each do |public_body| + PublicBody.find_each(:batch_size => 10) do |public_body| puts "Counting overdue requests for #{public_body.name}" if verbose # Look for values of 'waiting_response_overdue' and @@ -102,7 +108,12 @@ namespace :stats do # described_state column, and instead need to be calculated: overdue_count = 0 very_overdue_count = 0 - InfoRequest.find_each(:conditions => {:public_body_id => public_body.id}) do |ir| + InfoRequest.find_each(:batch_size => 200, + :conditions => { + :public_body_id => public_body.id, + :awaiting_description => false, + :prominence => 'normal' + }) do |ir| case ir.calculate_status when 'waiting_response_very_overdue' very_overdue_count += 1 diff --git a/lib/tasks/temp.rake b/lib/tasks/temp.rake index d371ad0dc..67fa10174 100644 --- a/lib/tasks/temp.rake +++ b/lib/tasks/temp.rake @@ -1,292 +1,40 @@ namespace :temp do - desc "Fix the history of requests where the described state doesn't match the latest status value - used by search, by 
adding an edit event that will correct the latest status" - task :fix_bad_request_states => :environment do - dryrun = ENV['DRYRUN'] != '0' - if dryrun - puts "This is a dryrun" - end - - InfoRequest.find_each() do |info_request| - next if info_request.url_title == 'holding_pen' - last_info_request_event = info_request.info_request_events[-1] - if last_info_request_event.latest_status != info_request.described_state - puts "#{info_request.id} #{info_request.url_title} #{last_info_request_event.latest_status} #{info_request.described_state}" - params = { :script => 'rake temp:fix_bad_request_states', - :user_id => nil, - :old_described_state => info_request.described_state, - :described_state => info_request.described_state - } - if ! dryrun - info_request.info_request_events.create!(:last_described_at => last_info_request_event.described_at + 1.second, - :event_type => 'status_update', - :described_state => info_request.described_state, - :calculated_state => info_request.described_state, - :params => params) - info_request.info_request_events.each{ |event| event.xapian_mark_needs_index } - end - end - - end - end - - def disable_duplicate_account(user, count, dryrun) - dupe_email = "duplicateemail#{count}@example.com" - puts "Updating #{user.email} to #{dupe_email} for user #{user.id}" - user.email = dupe_email - user.save! unless dryrun - end - - desc "Re-extract any missing cached attachments" - task :reextract_missing_attachments, [:commit] => :environment do |t, args| - dry_run = args.commit.nil? || args.commit.empty? - total_messages = 0 - messages_to_reparse = 0 - IncomingMessage.find_each :include => :foi_attachments do |im| - begin - reparse = im.foi_attachments.any? { |fa| ! File.exists? fa.filepath } - total_messages += 1 - messages_to_reparse += 1 if reparse - if total_messages % 1000 == 0 - puts "Considered #{total_messages} received emails." - end - unless dry_run - im.parse_raw_email! true if reparse - sleep 2 - end - rescue StandardError => e - puts "There was a #{e.class} exception reparsing IncomingMessage with ID #{im.id}" - puts e.backtrace - puts e.message - end - end - message = dry_run ? "Would reparse" : "Reparsed" - message += " #{messages_to_reparse} out of #{total_messages} received emails." - puts message - end - - desc 'Cleanup accounts with a space in the email address' - task :clean_up_emails_with_spaces => :environment do - dryrun = ENV['DRYRUN'] == '0' ? false : true - if dryrun - puts "This is a dryrun" - end - count = 0 - User.find_each do |user| - if / /.match(user.email) - - email_without_spaces = user.email.gsub(' ', '') - existing = User.find_user_by_email(email_without_spaces) - # Another account exists with the canonical address - if existing - if user.info_requests.count == 0 and user.comments.count == 0 and user.track_things.count == 0 - count += 1 - disable_duplicate_account(user, count, dryrun) - elsif existing.info_requests.count == 0 and existing.comments.count == 0 and existing.track_things.count == 0 - count += 1 - disable_duplicate_account(existing, count, dryrun) - user.email = email_without_spaces - puts "Updating #{user.email} to #{email_without_spaces} for user #{user.id}" - user.save! unless dryrun - else - user.info_requests.each do |info_request| - info_request.user = existing - info_request.save! unless dryrun - puts "Moved request #{info_request.id} from user #{user.id} to #{existing.id}" - end - - user.comments.each do |comment| - comment.user = existing - comment.save! 
unless dryrun - puts "Moved comment #{comment.id} from user #{user.id} to #{existing.id}" - end - - user.track_things.each do |track_thing| - track_thing.tracking_user = existing - track_thing.save! unless dryrun - puts "Moved track thing #{track_thing.id} from user #{user.id} to #{existing.id}" - end - - TrackThingsSentEmail.find_each(:conditions => ['user_id = ?', user]) do |sent_email| - sent_email.user = existing - sent_email.save! unless dryrun - puts "Moved track thing sent email #{sent_email.id} from user #{user.id} to #{existing.id}" - - end - - user.censor_rules.each do |censor_rule| - censor_rule.user = existing - censor_rule.save! unless dryrun - puts "Moved censor rule #{censor_rule.id} from user #{user.id} to #{existing.id}" - end - - user.user_info_request_sent_alerts.each do |sent_alert| - sent_alert.user = existing - sent_alert.save! unless dryrun - puts "Moved sent alert #{sent_alert.id} from user #{user.id} to #{existing.id}" - end - - count += 1 - disable_duplicate_account(user, count, dryrun) - end - else - puts "Updating #{user.email} to #{email_without_spaces} for user #{user.id}" - user.email = email_without_spaces - user.save! unless dryrun - end - end - end - end - - desc 'Create a CSV file of a random selection of raw emails, for comparing hexdigests' - task :random_attachments_hexdigests => :environment do - # The idea is to run this under the Rail 2 codebase, where - # Tmail was used to extract the attachements, and the task - # will output all of those file paths in a CSV file, and a - # list of the raw email files in another. The latter file is - # useful so that one can easily tar up the emails with: - # - # tar cvz -T raw-email-files -f raw_emails.tar.gz - # - # Then you can switch to the Rails 3 codebase, where - # attachment parsing is done via - # recompute_attachments_hexdigests - - require 'csv' - - File.open('raw-email-files', 'w') do |f| - CSV.open('attachment-hexdigests.csv', 'w') do |csv| - csv << ['filepath', 'i', 'url_part_number', 'hexdigest'] - IncomingMessage.all(:order => 'RANDOM()', :limit => 1000).each do |incoming_message| - # raw_email.filepath fails unless the - # incoming_message has an associated request - next unless incoming_message.info_request - raw_email = incoming_message.raw_email - f.puts raw_email.filepath - incoming_message.foi_attachments.each_with_index do |attachment, i| - csv << [raw_email.filepath, i, attachment.url_part_number, attachment.hexdigest] - end - end - end - end - - end - - - desc 'Check the hexdigests of attachments in emails on disk' - task :recompute_attachments_hexdigests => :environment do - - require 'csv' - require 'digest/md5' - - OldAttachment = Struct.new :filename, :attachment_index, :url_part_number, :hexdigest - - filename_to_attachments = Hash.new {|h,k| h[k] = []} - - header_line = true - CSV.foreach('attachment-hexdigests.csv') do |filename, attachment_index, url_part_number, hexdigest| - if header_line - header_line = false - else - filename_to_attachments[filename].push OldAttachment.new filename, attachment_index, url_part_number, hexdigest - end + desc 'Analyse rails log specified by LOG_FILE to produce a list of request volume' + task :request_volume => :environment do + example = 'rake log_analysis:request_volume LOG_FILE=log/access_log OUTPUT_FILE=/tmp/log_analysis.csv' + check_for_env_vars(['LOG_FILE', 'OUTPUT_FILE'],example) + log_file_path = ENV['LOG_FILE'] + output_file_path = ENV['OUTPUT_FILE'] + is_gz = log_file_path.include?(".gz") + urls = Hash.new(0) + f = is_gz ? 
Zlib::GzipReader.open(log_file_path) : File.open(log_file_path, 'r') + processed = 0 + f.each_line do |line| + line.force_encoding('ASCII-8BIT') if RUBY_VERSION.to_f >= 1.9 + if request_match = line.match(/^Started (GET|OPTIONS|POST) "(\/request\/.*?)"/) + next if line.match(/request\/\d+\/response/) + urls[request_match[2]] += 1 + processed += 1 + end + end + url_counts = urls.to_a + num_requests_visited_n_times = Hash.new(0) + CSV.open(output_file_path, "wb") do |csv| + csv << ['URL', 'Number of visits'] + url_counts.sort_by(&:last).each do |url, count| + num_requests_visited_n_times[count] +=1 + csv << [url,"#{count}"] + end + csv << ['Number of visits', 'Number of URLs'] + num_requests_visited_n_times.to_a.sort.each do |number_of_times, number_of_requests| + csv << [number_of_times, number_of_requests] + end + csv << ['Total number of visits'] + csv << [processed] end - total_attachments = 0 - attachments_with_different_hexdigest = 0 - files_with_different_numbers_of_attachments = 0 - no_tnef_attachments = 0 - no_parts_in_multipart = 0 - - multipart_error = "no parts on multipart mail" - tnef_error = "tnef produced no attachments" - - # Now check each file: - filename_to_attachments.each do |filename, old_attachments| - - # Currently it doesn't seem to be possible to reuse the - # attachment parsing code in Alaveteli without saving - # objects to the database, so reproduce what it does: - - raw_email = nil - File.open(filename) do |f| - raw_email = f.read - end - mail = MailHandler.mail_from_raw_email(raw_email) - - begin - attachment_attributes = MailHandler.get_attachment_attributes(mail) - rescue IOError => e - if e.message == tnef_error - puts "#{filename} #{tnef_error}" - no_tnef_attachments += 1 - next - else - raise - end - rescue Exception => e - if e.message == multipart_error - puts "#{filename} #{multipart_error}" - no_parts_in_multipart += 1 - next - else - raise - end - end - - if attachment_attributes.length != old_attachments.length - puts "#{filename} the number of old attachments #{old_attachments.length} didn't match the number of new attachments #{attachment_attributes.length}" - files_with_different_numbers_of_attachments += 1 - else - old_attachments.each_with_index do |old_attachment, i| - total_attachments += 1 - attrs = attachment_attributes[i] - old_hexdigest = old_attachment.hexdigest - new_hexdigest = attrs[:hexdigest] - new_content_type = attrs[:content_type] - old_url_part_number = old_attachment.url_part_number.to_i - new_url_part_number = attrs[:url_part_number] - if old_url_part_number != new_url_part_number - puts "#{i} #{filename} old_url_part_number #{old_url_part_number}, new_url_part_number #{new_url_part_number}" - end - if old_hexdigest != new_hexdigest - body = attrs[:body] - # First, if the content type is one of - # text/plain, text/html or application/rtf try - # changing CRLF to LF and calculating a new - # digest - we generally don't worry about - # these changes: - new_converted_hexdigest = nil - if ["text/plain", "text/html", "application/rtf"].include? new_content_type - converted_body = body.gsub /\r\n/, "\n" - new_converted_hexdigest = Digest::MD5.hexdigest converted_body - puts "new_converted_hexdigest is #{new_converted_hexdigest}" - end - if (! 
new_converted_hexdigest) || (old_hexdigest != new_converted_hexdigest) - puts "#{i} #{filename} old_hexdigest #{old_hexdigest} wasn't the same as new_hexdigest #{new_hexdigest}" - puts " body was of length #{body.length}" - puts " content type was: #{new_content_type}" - path = "/tmp/#{new_hexdigest}" - f = File.new path, "w" - f.write body - f.close - puts " wrote body to #{path}" - attachments_with_different_hexdigest += 1 - end - end - end - end - - end - - puts "total_attachments: #{total_attachments}" - puts "attachments_with_different_hexdigest: #{attachments_with_different_hexdigest}" - puts "files_with_different_numbers_of_attachments: #{files_with_different_numbers_of_attachments}" - puts "no_tnef_attachments: #{no_tnef_attachments}" - puts "no_parts_in_multipart: #{no_parts_in_multipart}" - end end diff --git a/lib/tasks/themes.rake b/lib/tasks/themes.rake index a8d16f108..4a864d141 100644 --- a/lib/tasks/themes.rake +++ b/lib/tasks/themes.rake @@ -1,94 +1,123 @@ +require Rails.root.join('commonlib', 'rblib', 'git') + namespace :themes do - def plugin_dir - File.join(Rails.root,"vendor","plugins") + # Alias the module so we don't need the MySociety prefix here + Git = MySociety::Git + + def all_themes_dir + File.join(Rails.root,"lib","themes") end def theme_dir(theme_name) - File.join(plugin_dir, theme_name) + File.join(all_themes_dir, theme_name) end - def checkout(commitish) - puts "Checking out #{commitish}" if verbose - system "git checkout #{commitish}" + def old_all_themes_dir(theme_name) + File.join(Rails.root, "vendor", "plugins", theme_name) end - def checkout_tag(version) - checkout usage_tag(version) + def possible_theme_dirs(theme_name) + [theme_dir(theme_name), old_all_themes_dir(theme_name)] end - def checkout_remote_branch(branch) - checkout "origin/#{branch}" + def installed?(theme_name) + possible_theme_dirs(theme_name).any? { |dir| File.directory? dir } end def usage_tag(version) "use-with-alaveteli-#{version}" end - def install_theme_using_git(name, uri, verbose=false, options={}) - install_path = theme_dir(name) - Dir.chdir(plugin_dir) do - clone_command = "git clone #{uri} #{name}" - if system(clone_command) - Dir.chdir install_path do - # First try to checkout a specific branch of the theme - tag_checked_out = checkout_remote_branch(AlaveteliConfiguration::theme_branch) if AlaveteliConfiguration::theme_branch - if !tag_checked_out - # try to checkout a tag exactly matching ALAVETELI VERSION - tag_checked_out = checkout_tag(ALAVETELI_VERSION) - end - if ! tag_checked_out - # if we're on a hotfix release (four sequence elements or more), - # look for a usage tag matching the minor release (three sequence elements) - # and check that out if found - if hotfix_version = /^(\d+\.\d+\.\d+)(\.\d+)+/.match(ALAVETELI_VERSION) - base_version = hotfix_version[1] - tag_checked_out = checkout_tag(base_version) - end - end - if ! tag_checked_out - puts "No specific tag for this version: using HEAD" if verbose - end - puts "removing: .git .gitignore" if verbose - rm_rf %w(.git .gitignore) - end - else - rm_rf install_path - raise "#{clone_command} failed! Stopping." 
- end - end - end - def uninstall(theme_name, verbose=false) - dir = theme_dir(theme_name) - if File.directory?(dir) - run_hook(theme_name, 'uninstall', verbose) - puts "Removing '#{dir}'" if verbose - rm_r dir - else - puts "Plugin doesn't exist: #{dir}" + possible_theme_dirs(theme_name).each do |dir| + if File.directory?(dir) + run_hook(theme_name, 'uninstall', verbose) + end end end def run_hook(theme_name, hook_name, verbose=false) - hook_file = File.join(theme_dir(theme_name), "#{hook_name}.rb") + directory = theme_dir(theme_name) + hook_file = File.join(directory, "#{hook_name}.rb") if File.exist? hook_file - puts "Running #{hook_name} hook for #{theme_name}" if verbose + puts "Running #{hook_name} hook in #{directory}" if verbose load hook_file end end - def installed?(theme_name) - File.directory?(theme_dir(theme_name)) + def move_old_theme(old_theme_directory) + puts "There was an old-style theme at #{old_theme_directory}" if verbose + moved_directory = "#{old_theme_directory}-moved" + begin + File.rename old_theme_directory, moved_directory + rescue Errno::ENOTEMPTY, Errno::EEXIST + raise "Tried to move #{old_theme_directory} out of the way, " \ + "but #{moved_directory} already existed" + end + end + + def committishes_to_try + result = [] + theme_branch = AlaveteliConfiguration::theme_branch + result.push "origin/#{theme_branch}" if theme_branch + result.push usage_tag(ALAVETELI_VERSION) + hotfix_match = /^(\d+\.\d+\.\d+)(\.\d+)+/.match(ALAVETELI_VERSION) + result.push usage_tag(hotfix_match[1]) if hotfix_match + result + end + + def checkout_best_option(theme_name) + theme_directory = theme_dir theme_name + all_failed = true + committishes_to_try.each do |committish| + if Git.committish_exists? theme_directory, committish + puts "Checking out #{committish}" if verbose + Git.checkout theme_directory, committish + all_failed = false + break + else + puts "Failed to find #{committish}; skipping..." if verbose + end + end + puts "Falling to using HEAD instead" if all_failed and verbose end def install_theme(theme_url, verbose, deprecated=false) + FileUtils.mkdir_p all_themes_dir deprecation_string = deprecated ? " using deprecated THEME_URL" : "" - theme_name = File.basename(theme_url, '.git') + theme_name = theme_url_to_theme_name theme_url puts "Installing theme #{theme_name}#{deprecation_string} from #{theme_url}" + # Make sure any uninstall hooks have been run: uninstall(theme_name, verbose) if installed?(theme_name) - install_theme_using_git(theme_name, theme_url, verbose) + theme_directory = theme_dir theme_name + # Is there an old-style theme directory there? If so, move it + # out of the way so that there's no risk that work is lost: + if File.directory? theme_directory + unless Git.non_bare_repository? theme_directory + move_old_theme theme_directory + end + end + # If there isn't a directory there already, clone it into place: + unless File.directory? theme_directory + unless system "git", "clone", theme_url, theme_directory + raise "Cloning from #{theme_url} to #{theme_directory} failed" + end + end + # Set the URL for origin in case it has changed, and fetch from there: + Git.remote_set_url theme_directory, 'origin', theme_url + Git.fetch theme_directory, 'origin' + # Check that checking-out a new commit will be safe: + unless Git.status_clean theme_directory + raise "There were uncommitted changes in #{theme_directory}" + end + unless Git.is_HEAD_pushed? 
theme_directory + raise "The current work in #{theme_directory} is unpushed" + end + # Now try to checkout various commits in order of preference: + checkout_best_option theme_name + # Finally run the install hooks: run_hook(theme_name, 'install', verbose) run_hook(theme_name, 'post_install', verbose) end @@ -102,4 +131,5 @@ namespace :themes do install_theme(AlaveteliConfiguration::theme_url, verbose, deprecated=true) end end + end diff --git a/lib/theme.rb b/lib/theme.rb new file mode 100644 index 000000000..4f03b5d99 --- /dev/null +++ b/lib/theme.rb @@ -0,0 +1,3 @@ +def theme_url_to_theme_name(theme_url) + File.basename theme_url, '.git' +end diff --git a/lib/whatdotheyknow/strip_empty_sessions.rb b/lib/whatdotheyknow/strip_empty_sessions.rb index e162acf67..6d175ca98 100644 --- a/lib/whatdotheyknow/strip_empty_sessions.rb +++ b/lib/whatdotheyknow/strip_empty_sessions.rb @@ -1,9 +1,9 @@ module WhatDoTheyKnow - + class StripEmptySessions ENV_SESSION_KEY = "rack.session".freeze HTTP_SET_COOKIE = "Set-Cookie".freeze - STRIPPABLE_KEYS = [:session_id, :_csrf_token, :locale] + STRIPPABLE_KEYS = ['session_id', '_csrf_token', 'locale'] def initialize(app, options = {}) @app = app diff --git a/lib/world_foi_websites.rb b/lib/world_foi_websites.rb index c3f3655df..50976c897 100644 --- a/lib/world_foi_websites.rb +++ b/lib/world_foi_websites.rb @@ -53,7 +53,20 @@ class WorldFOIWebsites {:name => "Informace pro Vsechny", :country_name => "Česká republika", :country_iso_code => "CZ", - :url => "http://www.infoprovsechny.cz"} + :url => "http://www.infoprovsechny.cz"}, + {:name => "¿Qué Sabés?", + :country_name => "Uruguay", + :country_iso_code => "UY", + :url => "http://www.quesabes.org/"}, + {:name => "Nu Vă Supărați", + :country_name => "România", + :country_iso_code => "RO", + :url => "http://nuvasuparati.info/"}, + {:name => "Marsoum41", + :country_name => "تونس", + :country_iso_code => "TN", + :url => "http://www.marsoum41.org"} + ] return world_foi_websites end |
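
As a quick sketch of the theme-selection helpers added above in lib/theme.rb
and lib/tasks/themes.rake (the branch name, version number and theme URL here
are hypothetical, not values taken from this change):

  # With AlaveteliConfiguration::theme_branch == 'master' and
  # ALAVETELI_VERSION == '0.6.9.1':
  committishes_to_try
  # => ["origin/master", "use-with-alaveteli-0.6.9.1", "use-with-alaveteli-0.6.9"]

  theme_url_to_theme_name('git://github.com/example/sometheme.git')
  # => "sometheme"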