Add Solr Spatial Search for Django Haystack

Solr supports spatial search natively as of the latest release, Solr 3.4. However, Django Haystack does not support it yet. There are some very helpful discussions of the issue in the Haystack Google group, but the patch discussed there targets JTeam’s SSP plugin, not Solr's native spatial search. I followed that discussion and made similar changes to support the native version.

Basically, we need two changes.
1. Add a new Solr field type, LatLonType, to the schema generated by Haystack.
2. Support search parameters for spatial queries such as &fq={!geofilt pt=45.15,-93.85 sfield=store d=5}.
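
To make the target concrete, here is a small sketch of how that filter-query value is put together. The helper name build_geofilt_fq is my own invention for illustration; it is not part of Haystack or Solr.

```python
def build_geofilt_fq(lat, lng, sfield, d, filter_name="geofilt"):
    """Return a Solr fq value in local-params syntax (hypothetical helper)."""
    if filter_name not in ("geofilt", "bbox"):
        raise ValueError("filter_name must be 'geofilt' or 'bbox'")
    # pt is "lat,lng"; sfield names the LatLonType field; d is the distance.
    return "{!%s pt=%s,%s sfield=%s d=%s}" % (filter_name, lat, lng, sfield, d)


fq = build_geofilt_fq(45.15, -93.85, "store", 5)
print(fq)
```

The changes in step 2 end up sending pt, sfield and d as separate request parameters rather than inline, but the information carried is the same.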

Here are the details of the changes:

Step 1. Add LatLonType

As described in the spatial search wiki, spatial fields should be defined as LatLonType. To support this, we need to add a type definition for LatLonType to the Solr schema. Haystack generates the schema the same way Django renders HTML: from a template file, haystack/templates/search_configuration/solr.xml. We need to add the following lines to the "types" and "fields" sections.

<schema name="default" version="1.1">
  <types>
  ...
  <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  </types>
  <fields>
  ...
  <!-- Type used to index the lat and lon components for the "location" FieldType -->
  <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>
  ...
  </fields>

Field type "location" needs "*_coordinate" fields to store latitude and longitude, and those "*_coordinate" fields are of type "tdouble". That’s why we need all three of these lines.
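
As an illustration only, the sketch below mimics how a "lat,lng" value for a location field ends up in two numeric subfields. The function latlon_subfields is hypothetical, and the <name>_0_coordinate / <name>_1_coordinate naming is my understanding of how Solr combines the field name with subFieldSuffix, so treat it as an assumption rather than a specification.

```python
def latlon_subfields(field_name, value, suffix="_coordinate"):
    """Split a 'lat,lng' string into the two subfields a LatLonType would index.

    Hypothetical helper for illustration; Solr does this internally.
    """
    lat_text, lng_text = value.split(",")
    return {
        "%s_0%s" % (field_name, suffix): float(lat_text),  # latitude
        "%s_1%s" % (field_name, suffix): float(lng_text),  # longitude
    }


print(latlon_subfields("loc", "45.15,-93.85"))
```

Both generated names match the *_coordinate dynamic field, which is why that dynamicField has to exist in the schema.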

Correspondingly, we need to add a new search field type, LocationField, in haystack/fields.py as follows. It takes two constructor parameters, model_lat_attr and model_lng_attr, which name the latitude and longitude fields of the indexed Django model, just as model_attr does in SearchField.

class LocationField(SearchField):
    field_type = 'location'

    def __init__(self, model_lat_attr=None, model_lng_attr=None, **kwargs):
        if kwargs.get('faceted') is True:
            raise SearchFieldError("%s can not be faceted." % self.__class__.__name__)
        super(LocationField, self).__init__(**kwargs)
        self.model_lat_attr = model_lat_attr
        self.model_lng_attr = model_lng_attr

    def prepare(self, obj):
        if self.model_lat_attr is not None and self.model_lng_attr is not None:
            lat = getattr(obj, self.model_lat_attr, None)
            lng = getattr(obj, self.model_lng_attr, None)
            if lat is not None and lng is not None:
                # Use the values just read, not hard-coded obj.lat / obj.lng.
                location = '%s,%s' % (lat, lng)
                return self.convert(location)
        return self.convert(super(LocationField, self).prepare(obj))

    def convert(self, value):
        if value is None:
            return None
        return unicode(value)

Notice the line "field_type = 'location'". It tells Haystack that LocationField is of the location type we defined in the schema above. For this to take effect, we also need to add the following lines to the build_schema method in haystack/backends/solr_backend.py, which is called when executing "manage.py build_solr_schema".

class SearchBackend(BaseSearchBackend):
    def build_schema(self, fields):
            ...
            elif field_class.field_type == 'location':
                field_data['type'] = 'location'

I don’t like this kind of hard-coded change. It is not very object oriented. Ideally, build_schema would automatically pick up the field_type defined in LocationField and put it in the schema. That way we would not need to change the Haystack code at all; we could define the LocationField subclass in our own code. However, this is just the beginning. There are more hard-coded changes in step 2.
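
A minimal sketch of that idea, using stand-in classes rather than Haystack's real ones. The solr_type attribute and the build_field_data helper are hypothetical names I made up for illustration; Haystack 1.2.5 does not work this way.

```python
class SearchField(object):
    """Stand-in for haystack.fields.SearchField (not the real class)."""
    field_type = None


class CharField(SearchField):
    field_type = "string"


class LocationField(SearchField):
    field_type = "location"
    solr_type = "location"  # hypothetical: the field declares its own Solr type


def build_field_data(field_name, field):
    """Look up the Solr type on the field instead of an if/elif chain."""
    # Fields that declare nothing fall back to a generic text type.
    return {"field_name": field_name, "type": getattr(field, "solr_type", "text")}


print(build_field_data("loc", LocationField()))
print(build_field_data("name", CharField()))
```

With a lookup like this, a new field type defined outside Haystack would show up in the generated schema without patching the backend.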

With the above code changes, we can define search indices for Django models using LocationField. For example,

class Store(models.Model):
    name = models.CharField(max_length=100)
    lat = models.DecimalField(max_digits=10, decimal_places=6)
    lng = models.DecimalField(max_digits=10, decimal_places=6)

class StoreIndex(RealTimeSearchIndex):
    name = CharField(model_attr='name', document=True)
    loc = LocationField(model_lat_attr='lat', model_lng_attr='lng')

site.register(Store, StoreIndex)
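
To see what value would be sent to Solr for such a Store, here is a standalone sketch of the formatting done in LocationField.prepare. FakeStore and format_location are hypothetical stand-ins so the snippet runs without Django or Haystack.

```python
from decimal import Decimal


class FakeStore(object):
    """Hypothetical stand-in for the Store model; no Django required."""

    def __init__(self, lat, lng):
        self.lat = lat
        self.lng = lng


def format_location(obj, lat_attr, lng_attr):
    """Return 'lat,lng' for obj, or None if either coordinate is missing."""
    lat = getattr(obj, lat_attr, None)
    lng = getattr(obj, lng_attr, None)
    if lat is None or lng is None:
        return None
    return "%s,%s" % (lat, lng)


store = FakeStore(Decimal("45.150000"), Decimal("-93.850000"))
print(format_location(store, "lat", "lng"))
```

The attribute names are passed in rather than hard-coded, which is the point of model_lat_attr and model_lng_attr.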

Step 2. Add spatial search query support

SearchQuerySet is Haystack's user-facing interface for querying Solr. To pass spatial search parameters to Solr, we add a method spatial to SearchQuerySet in haystack/query.py.

class SearchQuerySet(object):
    ...
    def spatial(self, **kwargs):
        """Adds spatial search to the query"""
        clone = self._clone()
        clone.query.add_spatial(**kwargs)
        return clone

SearchQuerySet is similar to Django's QuerySet: it is mainly used for chaining query methods, such as the spatial method we just added. The queries are accumulated and evaluated together later. Notice that spatial passes all of its parameters to add_spatial, which is defined in haystack/backends/__init__.py as follows.

from haystack.exceptions import SpatialError

class BaseSearchQuery(object):
    def __init__(self, using=DEFAULT_ALIAS):
        ...
        self.spatial_query = {}

    def add_spatial(self, **kwargs):
        if 'lat' not in kwargs or 'lng' not in kwargs or 'd' not in kwargs or 'sfield' not in kwargs:
            raise SpatialError("Spatial queries must contain args lat, lng, d and sfield")
        if 'filter' not in kwargs:
            kwargs['filter'] = 'geofilt'
        self.spatial_query.update(kwargs)

    def _clone(self, klass=None, using=None):
        ...
        clone.spatial_query = self.spatial_query.copy()

As you can see, a spatial search needs at least 4 parameters: lat, lng, d, and sfield, as defined in the Solr wiki. We can also specify which spatial filter to use, either geofilt (the default) or bbox. For example, the following query matches items within 3 kilometers of the point with latitude=45.15 and longitude=-93.85, sorted by distance in ascending order. The filter in this example is bbox, which matches on the bounding box around that circle, so it may include a few items slightly farther away.

SearchQuerySet().spatial(lat=45.15, lng=-93.85, d=3, sfield='loc', filter='bbox').order_by('geodist()')

To make this example work, we need to modify haystack/backends/solr_backend.py as follows. This is where the Solr query is finally constructed.

class SearchBackend(BaseSearchBackend):
    def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
               fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
               narrow_queries=None, spelling_query=None,
               limit_to_registered_models=None, result_class=None,
               spatial_query=None, **kwargs):
        ...
        if spatial_query is not None:
            kwargs['pt'] = '%s,%s'%(spatial_query['lat'], spatial_query['lng'])
            kwargs['sfield'] = spatial_query['sfield']
            kwargs['d'] = spatial_query['d']
            if narrow_queries is None:
                narrow_queries = set()
            narrow_queries.add('{!%s}'% spatial_query['filter'])
        if narrow_queries is not None:
            kwargs['fq'] = list(narrow_queries)

class SearchQuery(BaseSearchQuery):
    def run(self, spelling_query=None):
        ...
        if self.spatial_query:
            kwargs['spatial_query'] = self.spatial_query
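
To check what actually reaches Solr, the sketch below reproduces the parameter-building part of search as a standalone function. The name apply_spatial is hypothetical, and unlike the patch it returns a new dict and sorts the fq list so the output is predictable.

```python
def apply_spatial(params, spatial_query, narrow_queries=None):
    """Return a copy of params with spatial parameters and fq added."""
    params = dict(params)
    narrow = set(narrow_queries or ())
    if spatial_query:
        params["pt"] = "%s,%s" % (spatial_query["lat"], spatial_query["lng"])
        params["sfield"] = spatial_query["sfield"]
        params["d"] = spatial_query["d"]
        # The filter picks up pt, sfield and d from the request parameters.
        narrow.add("{!%s}" % spatial_query["filter"])
    if narrow:
        params["fq"] = sorted(narrow)
    return params


query = {"lat": 45.15, "lng": -93.85, "d": 3, "sfield": "loc", "filter": "bbox"}
print(apply_spatial({}, query))
```

For the bbox example above, this yields pt, sfield and d as top-level parameters plus a single {!bbox} filter query.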

The final change is to add SpatialError to haystack/exceptions.py. The exception is used in the code above.

class SpatialError(HaystackError):
    """Raised when incorrect arguments have been provided for spatial."""
    pass
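
For a quick look at how the validation behaves, here is a standalone sketch of the check done in add_spatial. The SpatialError class below is a local stand-in, and validate_spatial is a hypothetical helper that also reports which arguments are missing, something the patch itself does not do.

```python
class SpatialError(Exception):
    """Local stand-in for haystack.exceptions.SpatialError."""


REQUIRED_ARGS = ("lat", "lng", "d", "sfield")


def validate_spatial(**kwargs):
    """Check required spatial args and default the filter to geofilt."""
    missing = [name for name in REQUIRED_ARGS if name not in kwargs]
    if missing:
        raise SpatialError("Spatial queries are missing: %s" % ", ".join(missing))
    kwargs.setdefault("filter", "geofilt")
    return kwargs


print(validate_spatial(lat=45.15, lng=-93.85, d=3, sfield="loc"))
try:
    validate_spatial(lat=45.15)
except SpatialError as exc:
    print("rejected: %s" % exc)
```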

All code changes in this post can be found here on GitHub. They are based on Haystack v1.2.5.
