-rw-r--r-- | vendor/plugins/acts_as_xapian/README | 1
-rw-r--r-- | vendor/plugins/acts_as_xapian/README.txt | 170
-rw-r--r-- | vendor/plugins/acts_as_xapian/lib/acts_as_xapian.rb | 160
3 files changed, 174 insertions, 157 deletions
diff --git a/vendor/plugins/acts_as_xapian/README b/vendor/plugins/acts_as_xapian/README
deleted file mode 100644
index b851372c8..000000000
--- a/vendor/plugins/acts_as_xapian/README
+++ /dev/null
@@ -1 +0,0 @@
-See extensive comments at top of lib/acts_as_xapian.rb for documentation.
diff --git a/vendor/plugins/acts_as_xapian/README.txt b/vendor/plugins/acts_as_xapian/README.txt
new file mode 100644
index 000000000..0ccab02ba
--- /dev/null
+++ b/vendor/plugins/acts_as_xapian/README.txt
@@ -0,0 +1,170 @@
+Contents
+========
+
+a. Introduction to acts_as_xapian
+b. Comparison to acts_as_solr (as on 24 April 2008)
+c. Documentation - indexing
+d. Documentation - querying
+
+
+a. Introduction to acts_as_xapian
+=================================
+
+Xapian is a full text search engine library, which has Ruby bindings.
+acts_as_xapian adds support for it to Rails. It is an alternative to
+acts_as_lucene or acts_as_ferret.
+
+Xapian is an *offline indexing* search library - only one process can have the
+Xapian database open for writing at once, and others that try meanwhile are
+unceremoniously kicked out. For this reason, acts_as_xapian does not support
+immediate writing to the database when your models change.
+
+Instead, there is an ActsAsXapianJob model which stores which models need
+updating or deleting in the search index. A rake task 'xapian:update_index'
+then performs the updates since last change. Run it on a cron job, or similar.
+
+Xapian 1.0.5 and associated Ruby bindings are required.
+
+Email francis@mysociety.org with patches.
+
+
+b. Comparison to acts_as_solr (as on 24 April 2008)
+=============================
+
+* Offline indexing only mode - which is a minus if you want changes
+immediately reflected in the search index, and a plus if you were going to
+have to implement your own offline indexing anyway.
+
+* Collapsing - the equivalent of SQL's "group by". You can specify a field
+to collapse on, and only the most relevant result from each value of that
+field is returned. Along with a count of how many there are in total.
+acts_as_solr doesn't have this.
+
+* No highlighting - Xapian can't return you text highlighted with a search
+query. You can try and make do with TextHelper::highlight (combined with
+words_to_highlight below). I found the highlighting in acts_as_solr didn't
+really understand the query anyway.
+
+* Date range searching - maybe this works in acts_as_solr, but I never found
+out how.
+
+* Spelling correction - "did you mean?" built in and just works.
+
+* Multiple models - acts_as_xapian searches multiple models if you like,
+returning them mixed up together by relevancy. This is like multi_solr_search,
+only it is the default mode of operation and is properly supported.
+
+* No daemons - However, if you have more than one web server, you'll need to
+work out how to use Xapian's remote backend http://xapian.org/docs/remote.html.
+
+* One layer - full-powered Xapian is called directly from the Ruby, without
+Solr getting in the way whenever you want to use a new feature from Lucene.
+
+* No Java - an advantage if you're more used to working in the rest of the
+open source world. acts_as_xapian, it's pure Ruby and C++.
+
+* Xapian's awesome email list - the kids over at xapian-discuss are super
+helpful. Useful if you need to extend and improve acts_as_xapian. The
+Ruby bindings are mature and well maintained as part of Xapian.
+http://lists.xapian.org/mailman/listinfo/xapian-discuss
+
+
+c. Documentation - indexing
+===========================
+
+1. Put acts_as_xapian in your models that need search indexing.
+
+e.g. acts_as_xapian :texts => [ :name, :short_name ],
+         :values => [ [ :created_at, 0, "created_at", :date ] ],
+         :terms => [ [ :variety, 'V', "variety" ] ]
+
+Options must include:
+:texts, an array of fields for indexing with full text search
+         e.g. :texts => [ :title, :body ]
+:values, things which have a range of values for indexing, or for collapsing.
+         Specify an array quadruple of [ field, identifier, prefix, type ] where
+         - identifier is an arbitrary numeric identifier for use in the Xapian database
+         - prefix is the part to use in search queries that goes before the :
+         - type can be any of :string, :number or :date
+         e.g. :values => [ [ :created_at, 0, "created_at" ], [ :size, 1, "size"] ]
+:terms, things which come after a : in search queries. Specify an array
+         triple of [ field, char, prefix ] where
+         - char is an arbitrary single upper case char used in the Xapian database
+         - prefix is the part to use in search queries that goes before the :
+         e.g. :terms => [ [ :variety, 'V', "variety" ] ]
+A 'field' is a symbol referring to either an attribute or a function which
+returns the text, date or number to index. Both 'identifier' and 'char' must be
+the same for the same prefix in different models.
+
+Alternatively,
+:instead_index, a field which refers to another model that should be reindexed
+         instead of this one.
+
+Options may include:
+:eager_load, added as an :include clause when looking up search results in
+database
+:if, either an attribute or a function which if returns false means the
+object isn't indexed
+
+2. Make and run this database migration to create the ActsAsXapianJob model.
+
+    class ActsAsXapianMigration < ActiveRecord::Migration
+      def self.up
+        create_table :acts_as_xapian_jobs do |t|
+          t.column :model, :string, :null => false
+          t.column :model_id, :integer, :null => false
+
+          t.column :action, :string, :null => false
+        end
+        add_index :acts_as_xapian_jobs, [:model, :model_id], :unique => true
+      end
+
+      def self.down
+        drop_table :acts_as_xapian_jobs
+      end
+    end
+
+3. Call 'rake xapian:rebuild_index models="ModelName1 ModelName2"' to build the index
+the first time (you must specify all your indexed models). It's put in a
+development/test/production dir in acts_as_xapian/xapiandbs.
+
+4. Then from a cron job or a daemon, or by hand regularly!, call 'rake xapian:update_index'
+
+
+d. Documentation - querying
+===========================
+
+If you just want to test indexing is working, you'll find this rake task
+useful (it has more options, see lib/tasks/xapian.rake)
+    rake xapian:query models="PublicBody User" query="moo"
+
+To perform a query call ActsAsXapian::Search.new. This takes in turn:
+    model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
+    query_string - Google like syntax, see below
+And then a hash of options:
+    :offset - Offset of first result
+    :limit - Number of results per page
+    :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
+    :sort_by_ascending - Default true, set to false for descending sort
+    :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)
+
+Google like query syntax is as described in http://www.xapian.org/docs/queryparser.html
+Queries can include prefix:value parts, according to what you indexed in the
+acts_as_xapian part above. You can also say things like model:InfoRequestEvent
+to constrain by model in more complex ways than the :model parameter, or
+modelid:InfoRequestEvent-100 to only find one specific object.
+
+Returns an ActsAsXapian::Search object. Useful methods are:
+    description - a techy one, to check how the query has been parsed
+    matches_estimated - a guesstimate at the total number of hits
+    spelling_correction - the corrected query string if there is a correction, otherwise nil
+    words_to_highlight - list of words for you to highlight, perhaps with TextHelper::highlight
+    results - an array of hashes containing:
+        :model - your Rails model, this is what you most want!
+        :weight - relevancy measure
+        :percent - the weight as a %, 0 meaning the item did not match the query at all
+        :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix
+
+
+
+
diff --git a/vendor/plugins/acts_as_xapian/lib/acts_as_xapian.rb b/vendor/plugins/acts_as_xapian/lib/acts_as_xapian.rb
index 3983f5c19..976a5df19 100644
--- a/vendor/plugins/acts_as_xapian/lib/acts_as_xapian.rb
+++ b/vendor/plugins/acts_as_xapian/lib/acts_as_xapian.rb
@@ -4,150 +4,13 @@
 # Copyright (c) 2008 UK Citizens Online Democracy. All rights reserved.
 # Email: francis@mysociety.org; WWW: http://www.mysociety.org/
 #
-# $Id: acts_as_xapian.rb,v 1.20 2008-05-15 10:00:06 francis Exp $
+# $Id: acts_as_xapian.rb,v 1.21 2008-05-15 11:20:47 francis Exp $
 
 # Documentation
 # =============
 #
-# Xapian is a full text search engine library, which has Ruby bindings.
-# acts_as_xapian adds support for it to Rails. It is an alternative to
-# acts_as_lucene or acts_as_ferret.
-#
-# Xapian is an *offline indexing* search library - only one process can have
-# the Xapian database open for writing at once, and others that try meanwhile
-# are unceremoniously kicked out. For this reason, acts_as_xapian does not
-# support immediate writing to the database when your models change.
-#
-# Instead, there is a ActsAsXapianJob model which stores which models need
-# updating or deleting in the search index. A rake task 'xapian:update_index'
-# then performs the updates since last change. Run it on a cron job, or
-# similar.
-#
-# Xapian 1.0.5 and associated Ruby bindings are required.
-#
-# Email francis@mysociety.org with patches.
-#
-#
-# Comparison to acts_as_solr (as on 24 April 2008)
-# ==========================
-#
-# * Offline indexing only mode - which is a minus if you want changes
-# immediately reflected in the search index, and a plus if you were going to
-# have to implement your own offline indexing anyway.
-#
-# * Collapsing - the equivalent of SQL's "group by". You can specify a field
-# to collapse on, and only the most relevant result from each value of that
-# field is returned. Along with a count of how many there are in total.
-# acts_as_solr doesn't have this.
-#
-# * No highlighting - Xapian can't return you text highlighted with a search
-# query. You can try and make do with TextHelper::highlight (combined with
-# words_to_highlight below). I found the highlighting in acts_as_solr didn't
-# really understand the query anyway.
-#
-# * Date range searching - maybe this works in acts_as_solr, but I never found
-# out how.
-#
-# * Spelling correction - "did you mean?" built in and just works.
-#
-# * Multiple models - acts_as_xapian searches multiple models if you like,
-# returning them mixed up together by relevancy. This is like multi_solr_search,
-# only it is the default mode of operation and is properly supported.
-#
-# * No daemons - However, if you have more than one web server, you'll need to
-# work out how to use Xapian's remote backend http://xapian.org/docs/remote.html.
-#
-# * One layer - full-powered Xapian is called directly from the Ruby, without
-# Solr getting in the way whenever you want to use a new feature from Lucene.
-#
-# * No Java - an advantage if you're more used to working in the rest of the
-# open source world. acts_as_xapian, it's pure Ruby and C++.
-#
-# * Xapian's awesome email list - the kids over at xapian-discuss are super
-# helpful. Useful if you need to extend and improve acts_as_xapian. The
-# Ruby bindings are mature and well maintained as part of Xapian.
-# http://lists.xapian.org/mailman/listinfo/xapian-discuss
-#
-#
-# Indexing
-# ========
-#
-# 1. Put acts_as_xapian in your models that need search indexing.
-#
-# e.g. acts_as_xapian :texts => [ :name, :short_name ],
-#          :values => [ [ :created_at, 0, "created_at", :date ] ],
-#          :terms => [ [ :variety, 'V', "variety" ] ]
-#
-# Options must include:
-# :texts, an array of fields for indexing with full text search
-#          e.g. :texts => [ :title, :body ]
-# :values, things which have a range of values for indexing, or for collapsing.
-#          Specify an array quadruple of [ field, identifier, prefix, type ] where
-#          - number is an arbitary numeric identifier for use in the Xapian database
-#          - prefix is the part to use in search queries that goes before the :
-#          - type can be any of :string, :number or :date
-#          e.g. :values => [ [ :created_at, 0, "created_at" ], [ :size, 1, "size"] ]
-# :terms, things which come after a : in search queries. Specify an array
-#          triple of [ field, char, prefix ] where
-#          - char is an arbitary single upper case char used in the Xapian database
-#          - prefix is the part to use in search queries that goes before the :
-#          e.g. :terms => [ [ :variety, 'V', "variety" ] ]
-# A 'field' is a symbol referring to either an attribute or a function which
-# returns the text, date or number to index. Both 'number' and 'char' must be
-# the same for the same prefix in different models.
-#
-# Alternatively,
-# :instead_index, a field which refers to another model that should be reindexed
-#          instead of this one.
-#
-# Options may include:
-# :eager_load, added as an :include clause when looking up search results in
-# database
-# :if, either an attribute or a function which if returns false means the
-# object isn't indexed
-#
-# 2. Make and run the migration to create the ActsAsXapianJob model, code below
-# (search for ActsAsXapianJob).
-#
-# 3. Call 'rake xapian::rebuild_index models="ModelName1 ModelName2"' to build the index
-# the first time (you must specify all your indexed models). It's put in a
-# development/test/production dir in acts_as_xapian/xapiandbs.
-#
-# 4. Then from a cron job or a daemon, or by hand regularly!, call 'rake xapian:update_index'
-#
-#
-# Querying
-# ========
-#
-# If you just want to test indexing is working, you'll find this rake task
-# useful (it has more options, see lib/tasks/xapian.rake)
-#     rake xapian:query models="PublicBody User" query="moo"
-#
-# To perform a query call ActsAsXapian::Search.new. This takes in turn:
-#     model_classes - list of models to search, e.g. [PublicBody, InfoRequestEvent]
-#     query_string - Google like syntax, see below
-# And then a hash of options:
-#     :offset - Offset of first result
-#     :limit - Number of results per page
-#     :sort_by_prefix - Optionally, prefix of value to sort by, otherwise sort by relevance
-#     :sort_by_ascending - Default true, set to false for descending sort
-#     :collapse_by_prefix - Optionally, prefix of value to collapse by (i.e. only return most relevant result from group)
-#
-# Google like query syntax is as described in http://www.xapian.org/docs/queryparser.html
-# Queries can include prefix:value parts, according to what you indexed in the
-# acts_as_xapian part above. You can also say things like model:InfoRequestEvent
-# to constrain by model in more complex ways than the :model parameter, or
-# modelid:InfoRequestEvent-100 to only find one specific object.
-#
-# Returns an ActsAsXapian::Search object. Useful methods are:
-#     description - a techy one, to check how the query has been parsed
-#     matches_estimated - a guesstimate at the total number of hits
-#     spelling_correction - the corrected query string if there is a correction, otherwise nil
-#     results - an array of hashes containing:
-#         :model - your Rails model, this is what you most want!
-#         :weight - relevancy measure
-#         :percent - the weight as a %, 0 meaning the item did not match the query at all
-#         :collapse_count - number of results with the same prefix, if you specified collapse_by_prefix
+# See ../README.txt for documentation. Please update that file as you update
+# this code.
 
 require 'xapian'
 
@@ -427,22 +290,7 @@ module ActsAsXapian
     ######################################################################
     # Index
 
-    # Offline indexing job queue model, create with this migration:
-    #   class ActsAsXapianMigration < ActiveRecord::Migration
-    #     def self.up
-    #       create_table :acts_as_xapian_jobs do |t|
-    #         t.column :model, :string, :null => false
-    #         t.column :model_id, :integer, :null => false
-    #
-    #         t.column :action, :string, :null => false
-    #       end
-    #       add_index :acts_as_xapian_jobs, [:model, :model_id], :unique => true
-    #     end
-    #
-    #     def self.down
-    #       remove_table :acts_as_xapian_jobs
-    #     end
-    #   end
+    # Offline indexing job queue model, create with migration in ../README.txt
     class ActsAsXapianJob < ActiveRecord::Base
     end
 
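
Note on the option rules moved into README.txt: the constraints on :values and :terms (a quadruple with a type of :string, :number or :date; a triple with a single upper case char; the same identifier or char for the same prefix in every model) can be checked mechanically. The following is only a sketch in plain Ruby. The method name xapian_option_problems and the constant VALUE_TYPES are hypothetical and not part of acts_as_xapian, and the sketch reads the "quadruple" wording literally, so a three-element :values entry is reported as a problem.

```ruby
# Hypothetical helper, NOT part of acts_as_xapian: checks option hashes
# against the rules stated in README.txt.
VALUE_TYPES = [:string, :number, :date].freeze

# options_by_model maps a model name to the hash passed to acts_as_xapian.
# Returns an array of problem descriptions; an empty array means no problems.
def xapian_option_problems(options_by_model)
  problems = []
  first_use = {} # prefix => [identifier or char, model that used it first]

  # Remember the key used for a prefix and complain if a later use differs.
  record = lambda do |model, prefix, key|
    known_key, known_model = first_use[prefix]
    if known_key.nil?
      first_use[prefix] = [key, model]
    elsif known_key != key
      problems << "#{model}: prefix #{prefix.inspect} uses #{key.inspect}, " \
                  "but #{known_model} uses #{known_key.inspect}"
    end
  end

  options_by_model.each do |model, options|
    # :values entries are [ field, identifier, prefix, type ]
    Array(options[:values]).each do |entry|
      _field, identifier, prefix, type = entry
      if entry.size == 4 && identifier.is_a?(Integer) && VALUE_TYPES.include?(type)
        record.call(model, prefix, identifier)
      else
        problems << "#{model}: bad :values entry #{entry.inspect}"
      end
    end

    # :terms entries are [ field, char, prefix ], char a single upper case letter
    Array(options[:terms]).each do |entry|
      _field, char, prefix = entry
      if entry.size == 3 && char.is_a?(String) && char.match?(/\A[A-Z]\z/)
        record.call(model, prefix, char)
      else
        problems << "#{model}: bad :terms entry #{entry.inspect}"
      end
    end
  end

  problems
end

# Example input; the User options are made up for illustration.
EXAMPLE_OPTIONS = {
  "PublicBody" => {
    :texts  => [ :name, :short_name ],
    :values => [ [ :created_at, 0, "created_at", :date ] ],
    :terms  => [ [ :variety, 'V', "variety" ] ]
  },
  "User" => {
    :texts  => [ :name ],
    :values => [ [ :created_at, 1, "created_at", :date ] ], # identifier differs from PublicBody
    :terms  => [ [ :variety, 'v', "variety" ] ]             # char is not upper case
  }
}.freeze

puts xapian_option_problems(EXAMPLE_OPTIONS)
```

Run as a plain script, this needs neither Rails nor Xapian. A check like this could sit in a test suite so that a mismatched identifier is caught before 'rake xapian:rebuild_index' is run, but whether that fits a given project is a judgment call.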