Add Solr Spatial Search for Django Haystack

Solr has native support for spatial search as of the latest release, Solr 3.4. However, Django Haystack does not support it yet. Some very helpful discussions of the issue can be found in the Haystack Google group, but the patch discussed there targets JTeam’s SSP plugin, not Solr’s native spatial search. I followed the discussion and made similar changes to support the native version.

Basically, we need two changes.
1. Add a new Solr field type LatLonType into the schema generated by Haystack.
2. Support search parameters for spatial queries such as &fq={!geofilt pt=45.15,-93.85 sfield=store d=5}.
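
To make the target of the second change concrete, here is a minimal Python sketch that formats such a filter query string. The helper name build_spatial_fq is hypothetical and is not part of Haystack or Solr; only the {!geofilt ...} syntax is taken from the example above.

```python
def build_spatial_fq(lat, lng, sfield, d, filter_name="geofilt"):
    """Format a Solr spatial filter query with inline parameters.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    if filter_name not in ("geofilt", "bbox"):
        raise ValueError("filter_name must be 'geofilt' or 'bbox'")
    # pt is "latitude,longitude"; d is the distance; sfield is the spatial field.
    return "{!%s pt=%s,%s sfield=%s d=%s}" % (filter_name, lat, lng, sfield, d)


print(build_spatial_fq(45.15, -93.85, "store", 5))
# prints {!geofilt pt=45.15,-93.85 sfield=store d=5}
```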

Here are the details of the changes:

Step 1. Add LatLonType

As described in the Solr spatial search wiki, spatial fields should be defined as LatLonType. To support this, we need to add a type definition for LatLonType to the Solr schema. Haystack generates the schema the same way Django renders HTML: from a template file, haystack/templates/search_configuration/solr.xml. We need to add the following lines to the "types" and "fields" sections.

<schema name="default" version="1.1">
  <types>
  ...
  <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
  </types>
  <fields>
  ...
  <!-- Type used to index the lat and lon components for the "location" FieldType -->
  <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>
  ...
  </fields>

Field type "location" needs "*_coordinate" fields to store latitude and longitude. Fields "*_coordinate" are of type "tdouble". That’s why we need these 3 lines.

Correspondingly, we need to add a new search field type, LocationField, to haystack/fields.py as follows. It takes two constructor parameters, model_lat_attr and model_lng_attr, which name the latitude and longitude fields of the indexed Django model, just as model_attr does in SearchField.

class LocationField(SearchField):
    field_type = 'location'

    def __init__(self, model_lat_attr=None, model_lng_attr=None, **kwargs):
        if kwargs.get('faceted') is True:
            raise SearchFieldError("%s can not be faceted." % self.__class__.__name__)
        super(LocationField, self).__init__(**kwargs)
        self.model_lat_attr = model_lat_attr
        self.model_lng_attr = model_lng_attr

    def prepare(self, obj):
        if self.model_lat_attr is not None and self.model_lng_attr is not None:
            lat = getattr(obj, self.model_lat_attr, None)
            lng = getattr(obj, self.model_lng_attr, None)
            if lat is not None and lng is not None:
                # Use the values read through the configured attribute names.
                location = '%s,%s' % (lat, lng)
                return self.convert(location)
        return self.convert(super(LocationField, self).prepare(obj))

    def convert(self, value):
        if value is None:
            return None
        return unicode(value)

Notice the line "field_type = 'location'". It tells Haystack that LocationField is of the location type we defined in the schema above. For this to take effect, we also need to add the following lines to the build_schema method in haystack/backends/solr_backend.py, which is called when executing "manage.py build_solr_schema".

class SearchBackend(BaseSearchBackend):
    def build_schema(self, fields):
            ...
            elif field_class.field_type == 'location':
                field_data['type'] = 'location'

I don’t like this kind of hard-coded change. It is not very object-oriented. Ideally, build_schema would automatically use the field_type defined in LocationField and put it in the schema. That way we would not need to change the Haystack code; we could define the inherited LocationField in our own code instead. However, this is just the beginning. There are more hard-coded changes in step 2.
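
As a rough illustration of that idea, the sketch below looks the schema type up from the field class itself instead of adding one elif branch per type. Everything in it is hypothetical: the stand-in classes, the SOLR_TYPE_OVERRIDES table and the solr_type_for function are made up for this example and do not reflect how Haystack's build_schema is actually written.

```python
class SearchField(object):
    """Stand-in for a Haystack field class; only field_type matters here."""
    field_type = None


class CharField(SearchField):
    field_type = 'string'


class LocationField(SearchField):
    field_type = 'location'


# Hypothetical table for types whose schema name differs from field_type.
SOLR_TYPE_OVERRIDES = {'string': 'text'}


def solr_type_for(field_class):
    """Return the schema type name, defaulting to the class's own field_type."""
    field_type = field_class.field_type
    return SOLR_TYPE_OVERRIDES.get(field_type, field_type)


print(solr_type_for(LocationField))  # prints location
print(solr_type_for(CharField))      # prints text
```

With a lookup like this, a new field class defined outside the library would only need to set field_type, provided the schema template declares a fieldType of the same name.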

With the above code changes, we can define search indices for Django models using LocationField. For example,

class Store(models.Model):
    name = models.CharField(max_length=100)
    lat = models.DecimalField(max_digits=10, decimal_places=6)
    lng = models.DecimalField(max_digits=10, decimal_places=6)

class StoreIndex(RealTimeSearchIndex):
    name = CharField(model_attr='name', document=True)
    loc = LocationField(model_lat_attr='lat', model_lng_attr='lng')

site.register(Store, StoreIndex)

Step 2. Add spatial search query support

SearchQuerySet is the user interface in Haystack to query Solr. To pass spatial search query parameters to Solr, we add a method 'spatial' for SearchQuerySet in haystack/query.py.

class SearchQuerySet(object):
    ...
    def spatial(self, **kwargs):
        """Adds spatial search to the query"""
        clone = self._clone()
        clone.query.add_spatial(**kwargs)
        return clone

SearchQuerySet is similar to Django’s QuerySet: it is mainly used for chaining query methods such as the spatial method we just added. The queries are accumulated and evaluated together later. Notice that spatial passes all of its parameters to add_spatial, which is defined in haystack/backends/__init__.py as follows.
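
The clone-and-chain idea can be shown with a toy class. This is only a sketch of the pattern: ChainableQuery is a hypothetical stand-in, not Haystack code, and it stores the arguments without validating them.

```python
import copy


class ChainableQuery(object):
    """Toy stand-in for SearchQuerySet: every call returns a modified copy."""

    def __init__(self):
        self.spatial_query = {}
        self.order = []

    def _clone(self):
        # Copy the accumulated state so the original object is never mutated.
        return copy.deepcopy(self)

    def spatial(self, **kwargs):
        clone = self._clone()
        clone.spatial_query.update(kwargs)
        return clone

    def order_by(self, field):
        clone = self._clone()
        clone.order.append(field)
        return clone


base = ChainableQuery()
q = base.spatial(lat=45.15, lng=-93.85, d=3, sfield='loc').order_by('geodist()')
print(base.spatial_query)  # prints {} because the original is untouched
print(q.order)             # prints ['geodist()']
```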

from haystack.exceptions import SpatialError

class BaseSearchQuery(object):
    def __init__(self, using=DEFAULT_ALIAS):
        ...
        self.spatial_query = {}

    def add_spatial(self, **kwargs):
        if 'lat' not in kwargs or 'lng' not in kwargs or 'd' not in kwargs or 'sfield' not in kwargs:
            raise SpatialError("Spatial queries must contain args lat, lng, d and sfield")
        if 'filter' not in kwargs:
            kwargs['filter'] = 'geofilt'
        self.spatial_query.update(kwargs)

    def _clone(self, klass=None, using=None):
        ...
        clone.spatial_query = self.spatial_query.copy()

As you can see, to do a spatial search, we need to provide at least 4 parameters: lat, lng, d, and sfield. Here lat and lng are later combined into Solr’s pt parameter, while d and sfield are as defined in the Solr wiki. We can also specify which spatial filter to use, either geofilt (the default) or bbox. For example, the following query matches all items within 3 kilometers of the point with latitude 45.15 and longitude -93.85, sorted by distance in ascending order. The filter in this example is bbox.

SearchQuerySet().spatial(lat=45.15, lng=-93.85, d=3, sfield='loc', filter='bbox').order_by('geodist()')

To make this example work, we need to modify haystack/backends/solr_backend.py as follows. This is where the Solr query is finally constructed.

class SearchBackend(BaseSearchBackend):
    def search(self, query_string, sort_by=None, start_offset=0, end_offset=None,
               fields='', highlight=False, facets=None, date_facets=None, query_facets=None,
               narrow_queries=None, spelling_query=None,
               limit_to_registered_models=None, result_class=None,
               spatial_query=None, **kwargs):
        ...
        if spatial_query is not None:
            kwargs['pt'] = '%s,%s'%(spatial_query['lat'], spatial_query['lng'])
            kwargs['sfield'] = spatial_query['sfield']
            kwargs['d'] = spatial_query['d']
            if narrow_queries is None:
                narrow_queries = set()
            narrow_queries.add('{!%s}'% spatial_query['filter'])
        if narrow_queries is not None:
            kwargs['fq'] = list(narrow_queries)

class SearchQuery(BaseSearchQuery):
    def run(self, spelling_query=None):
        ...
        if self.spatial_query:
            kwargs['spatial_query'] = self.spatial_query
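
To see what these lines send to Solr, the same mapping can be reproduced as a small standalone function. This is a sketch for experimenting outside Haystack; the name spatial_params is hypothetical, and sorting the fq list is my addition to make the output predictable.

```python
def spatial_params(spatial_query, narrow_queries=None):
    """Translate a spatial_query dict into Solr request parameters."""
    params = {
        # Solr expects the point as "latitude,longitude".
        'pt': '%s,%s' % (spatial_query['lat'], spatial_query['lng']),
        'sfield': spatial_query['sfield'],
        'd': spatial_query['d'],
    }
    fq = set(narrow_queries or ())
    # The filter reads pt, sfield and d from the top-level parameters.
    fq.add('{!%s}' % spatial_query.get('filter', 'geofilt'))
    params['fq'] = sorted(fq)
    return params


print(spatial_params({'lat': 45.15, 'lng': -93.85, 'd': 3,
                      'sfield': 'loc', 'filter': 'bbox'}))
```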

The final change is to add SpatialError to haystack/exceptions.py. The exception is used in the code above.

class SpatialError(HaystackError):
    """Raised when incorrect arguments have been provided for spatial."""
    pass

All code changes in this blog can be found here on GitHub. It is based on Haystack v1.2.5.

Install WinRAR on Fedora

Simply put, to install WinRAR’s Linux extractor (unrar) on Fedora:

rpm -ivh http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-stable.noarch.rpm
yum install unrar

The first command adds the RPM Fusion nonfree repository to yum. The second installs unrar from that repository. Now you can extract rar files with ‘unrar e filename’. During the installation of unrar, yum may ask whether it is OK to import the GPG key from the RPM Fusion nonfree repository. Type y to accept it.

WinRAR is provided by RARLAB. Its download page points to rpm.livna.org for the Linux version of unrar. However, livna has been merged into RPM Fusion, so you should add RPM Fusion to your yum repositories instead. To do that, either follow the section “command line setup using rpm” on its config page or add only the nonfree repository (WinRAR is not open source) as I did above.

Install Ruby On Rails and Django on Fedora

Some notes about installing Ruby on Rails and Django on Fedora 11.

To install Ruby on Rails on Fedora:

yum install mysql
yum install mysql-devel
yum install ruby
yum install rubygems
gem install rails
gem install mysql

Later, I found that the rubygems version installed by yum is 1.3.1, which is lower than rake requires. So I downloaded rubygems-1.3.5.tgz from http://rubyforge.org/frs/?group_id=126, and installed it with

> ruby setup.rb

To test the installation of mysql, write a ruby script with

require "mysql"

But I got this error,

mysqltestst.rb:1:in `require': no such file to load -- mysql (LoadError)

To fix it, set the environment variable RUBYOPT=rubygems.

Another way is to require rubygems explicitly:

require "rubygems"
require "mysql"

To install Django on Fedora:

yum install Django
yum install MySQL-python.i586

Weird Boot Error: virbr0 starting userspace STP failed

I suddenly could not boot my Linux box (Fedora 11) today. It showed some error messages as follows,

virbr0: starting userspace STP failed, starting kernel STP
ADDRCONF(NETDEV_UP): eth0: link is not ready
e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
0000:00:19.0: eth0: 10/100 speed: disabling TSO
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

In the beginning, I thought it was related to the network or libvirtd, but turning off libvirtd did not solve the problem. I had not changed anything since my last boot, except that I had updated Fedora packages using Software Update without paying attention to what was updated. Searching the web, I found that some people had reported this issue after upgrading video drivers. I suspected the same cause.

My video card is an Nvidia card, and I use the Nvidia Linux driver (NVIDIA-Linux-x86-190.42-pkg1.run) downloaded from the Nvidia web site. To give it a try, I decided to reinstall it. I booted into runlevel 3 and ran the driver installer NVIDIA-Linux-x86-190.42-pkg1.run again in text mode. It rebuilt the driver and reset xorg.conf, and the problem was fixed.

After that, I found that there are actually a lot of errors in /var/log/Xorg.?.log such as

NVIDIA: Failed to load the NVIDIA kernel module. Please check
your system's kernel log for additional error messages.

Also, /var/log/messages log has error message as follows.

WARNING: GdmLocalDisplayFactory: maximum number of X display failures
reached: check X server log for errors
init:prefdm main process terminated with status 1

So this weird boot error was really caused by the video driver. The virbr0 errors were just coincidental messages that showed up when X failed to start. I checked my logs: such messages have been there on every boot; I just never saw them. I have now turned off the libvirtd service.

FYI, to boot into runlevel 3 when the machine hangs like this: select the Linux image entry in GRUB and press ‘e’. On the next screen, move to the kernel line and press ‘e’ again. In my case, the line ends with ‘rhgb quiet’ (rhgb is Red Hat graphical boot). Delete ‘rhgb quiet’, replace it with ‘3’, and press ‘b’. It will boot to runlevel 3 without starting X. Just log in as root and reinstall the video driver. If runlevel 3 does not work, try single user mode, i.e. replace ‘rhgb quiet’ with ‘single’.

Ruby MySQL adapter on Windows

It is very easy to install the Ruby MySQL adapter using gem on Windows.

gem install mysql

But I found that the Ruby MySQL adapter does not work with MySQL 5.1 on my Windows box. When Ruby executed SQL statements or the Rails server received requests, I got errors like this:

C:/Ruby/lib/ruby/gems/1.8/gems/activerecord-2.3.5/lib/active_record/
connection_adapters/abstract_adapter.rb:39: [BUG] Segmentation fault
ruby 1.8.6 (2009-08-04) [i386-mingw32]

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

It seems that this version of the mysql gem does not work with the MySQL 5.1 client library. I saw some suggestions on the web to downgrade MySQL to 5.0, but I didn’t want to do that. It turns out the solution is very simple: download the MySQL 5.0 noinstall version, mysql-noinstall-5.0.89-win32.zip, from http://dev.mysql.com/downloads/mysql/5.0.html, extract libmysql.dll, and copy it to C:\Ruby\bin. That solves the problem. I am still running MySQL 5.1, but Ruby uses the MySQL 5.0 version of the DLL.

My environment:  Ruby 1.8.6. Gem 1.3.5. MySQL gem version 2.8.1. Rails version is 2.3.5. Windows XP and MySQL 5.1.

Install MySQL for Python (MySQLdb) on Windows

It took me quite a while to figure out how to build and install MySQL for Python (MySQLdb) on Windows. I’d better write it down.

There is no binary distribution of MySQLdb for Python 2.6 on Windows, so I had to build it from source. My environment is Windows XP, MySQL 5.1, Python 2.6 (the Windows version, not Cygwin), and MySQL-python-1.2.3c1. I also have Microsoft Visual C++ 2008 Express Edition (Microsoft Visual Studio 9.0) installed, which is required to compile the C code in MySQL-python.

First of all, install Python setuptools if you haven’t already. It is required by the MySQL-python setup.py. I also added C:\Python26\Scripts, where easy_install is installed, to the PATH environment variable.

Then, make sure you have the MySQL Developer Components installed. Download the MySQL msi installer and select “Developer Components” in Custom Setup. It will install C:\Program Files\MySQL\MySQL Server 5.1\include, lib\debug and lib\opt for you. They are not installed by default.

Uncompress MySQL-python-1.2.3c1.tar.gz into a directory. Open a command window (cmd), change to the directory.

Try to run,

setup.py build

I got this error in setup_windows.py:

in get_config
serverKey = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, options['registry_key'])
WindowsError: [Error 2] The system cannot find the file specified

So I edited site.cfg, changed the MySQL version from 5.0 to 5.1 (since I am using 5.1)

registry_key = SOFTWARE\MySQL AB\MySQL Server 5.1

You can use regedit to check which version you are using. It is specified at: HKEY_LOCAL_MACHINE/SOFTWARE/MySQL AB/MySQL Server 5.1.

Now try to build it again. I got this error:

build\temp.win32-2.6\Release\_mysql.pyd.manifest : general error c1010070: Failed to load and parse the manifest. The system cannot find the file specified.
error: command 'mt.exe' failed with exit status 31

To fix this problem, go to C:\Python26\Lib\distutils, edit msvc9compiler.py, and search for 'MANIFESTFILE'. You will find the following line:

ld_args.append('/MANIFESTFILE:' + temp_manifest)

Then append the following line after the line above:

ld_args.append('/MANIFEST')

Then run “setup.py build” again, and it will succeed. Finally, run

setup.py install

Test it in python

>>> import MySQLdb
>>>

Cannot open libstdc++.so.5 on fedora 11

I just upgraded to Fedora 11. When I installed the Java EE SDK, I got the following error:

> ./java_ee_sdk-5_07-linux-nojdk.bin
./java_ee_sdk-5_07-linux-nojdk.bin: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory

The default libstdc++ on fedora is libstdc++.so.6 installed from libstdc++.i586. To solve the problem, install compat-libstdc++-33. For fedora 11, the package can be installed by

> yum install compat-libstdc++-33-3.2.3-66.i586

How to read input files in maven junit

Sometimes we need to put unit test data into plain text files. For example, assume we want to test a parser using a JSON string as the test data. If we put the JSON string into the Java code as a string constant, we end up with a lot of error-prone escape characters. In that case, we may want to put the test string into a resource file and read it from that file in JUnit.

In maven, we need to put the resource file in src/test/resources. Let me create a demo from scratch.

> mvn archetype:create -DgroupId=org.fuyun -DartifactId=junitresdemo
> find .
.
./junitresdemo
./junitresdemo/pom.xml
./junitresdemo/src
./junitresdemo/src/test
./junitresdemo/src/test/java
./junitresdemo/src/test/java/org
./junitresdemo/src/test/java/org/fuyun
./junitresdemo/src/test/java/org/fuyun/AppTest.java
./junitresdemo/src/main
./junitresdemo/src/main/java
./junitresdemo/src/main/java/org
./junitresdemo/src/main/java/org/fuyun
./junitresdemo/src/main/java/org/fuyun/App.java

> cd junitresdemo
> mkdir -p src/test/resources
> vi src/test/resources/myres.txt
> cat src/test/resources/myres.txt
test1=testdata

As you can see, I put my test data as a key-value pair in a property file. If your test data contains special characters such as escape char, you’d better handle file reading by yourself instead of using Properties as I am going to show.

Then I modify the automatically generated test code src/test/java/org/fuyun/AppTest.java as follows.

package org.fuyun;

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

import java.io.InputStream;
import java.util.Properties;

public class AppTest extends TestCase {
    public AppTest( String testName ) {
        super( testName );
    }

    public static Test suite() {
        return new TestSuite( AppTest.class );
    }

    public void testApp() throws java.io.IOException {
        InputStream in =
            getClass().getClassLoader().getResourceAsStream("myres.txt");
        Properties p = new Properties();
        p.load(in);
        String mystr = p.getProperty("test1");
        assertEquals("testdata", mystr);
    }
}

To read the property file, we need to use getResourceAsStream. This is actually why I wanted to write this post. If you search the web, you will find people saying that you can load the file using Class.getResourceAsStream(). So I would be expected to write the line as

        InputStream in = getClass().getResourceAsStream("myres.txt");

It can compile. But the test will fail. The InputStream variable in will be null, i.e., it cannot find myres.txt. Why? Why do we have to use the method defined in ClassLoader?

The difference between Class.getResourceAsStream and ClassLoader.getResourceAsStream is that Class.getResourceAsStream first resolves the file name: if the name is not absolute, it prepends the package path (org/fuyun/); if the name is absolute (it starts with “/”), it removes the leading “/”. Then it calls the ClassLoader’s getResourceAsStream to load the resolved name. This is documented here.
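
The resolution rule is easy to mimic in a few lines. The following Python sketch only imitates the behavior described above for illustration; resolve_name is a hypothetical function written for this post, not the JDK implementation.

```python
def resolve_name(package, name):
    """Imitate how Class.getResourceAsStream resolves a resource name."""
    if name.startswith('/'):
        # Absolute name: drop the leading slash and use the rest as is.
        return name[1:]
    if not package:
        # A class in the default package has no prefix to add.
        return name
    # Relative name: prepend the package, with dots turned into slashes.
    return package.replace('.', '/') + '/' + name


print(resolve_name('org.fuyun', 'myres.txt'))   # prints org/fuyun/myres.txt
print(resolve_name('org.fuyun', '/myres.txt'))  # prints myres.txt
```

The first result shows why the test could not find the file: the class loader was asked for org/fuyun/myres.txt, while Maven copies the resource to the root of target/test-classes.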

For example, if I do the following hack, the test will pass temporarily.

> mv target/test-classes/myres.txt target/test-classes/org/fuyun/.
> mvn test

But to really fix it, we should revise the line by adding a leading “/” in the file name as follows.

        InputStream in = getClass().getResourceAsStream("/myres.txt");

On the other hand, if you use ClassLoader.getResourceAsStream, the leading “/” will make it unable to find the file.

How to convert an integer to base64 in Python

I found a bunch of solutions on the web and noticed that they can lead to entirely different results if you ignore a few assumptions.

First of all, the input of base64 algorithm is an array of bytes, which is a string in Python. Thus, before converting an integer to base64, you need to convert the integer to an array of bytes. To do so, you need to first decide the byte order: big endian or little endian. The best solution is using Python module struct. It can convert an integer to binary format in either byte order. Also, it can handle signed integers, etc.
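
To see how much the byte order matters, the sketch below packs the same integer both ways and encodes each result. Note that it is written for Python 3, where struct.pack returns bytes, which is an assumption that differs from the Python 2 code used in the rest of this post.

```python
import base64
import struct

n = 12345  # 0x3039

little = struct.pack('<Q', n)  # least significant byte first
big = struct.pack('>Q', n)     # most significant byte first

print(little.hex())  # prints 3930000000000000
print(big.hex())     # prints 0000000000003039

# The same integer gives two different base64 strings.
print(base64.b64encode(little).decode('ascii'))
print(base64.b64encode(big).decode('ascii'))
```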

Using an unsigned long long (64 bits) in little endian, the simplest solution could be:

>>> import struct
>>> n = 12345
>>> struct.pack('<Q', n).encode("base64").strip()
'OTAAAAAAAAA='

However, the reason I need base64 here is that it is more compact than the decimal format of integers. The above example has a lot of A’s, which are caused by the zero bytes padding the unsigned long long. Also, the above example uses the string encode method to generate base64. The base64 module provides better support, including a URL-safe version of base64.

The following example can solve these problems. Also, I remove the padding =’s here.


import base64
import struct

def encode(n):
  # Pack as little endian, then drop the high-order zero bytes at the end.
  data = struct.pack('<Q', n).rstrip('\x00')
  if len(data)==0:
    data = '\x00'
  s = base64.urlsafe_b64encode(data).rstrip('=')
  return s

def decode(s):
  data = base64.urlsafe_b64decode(s + '==')
  # Pad back to 8 bytes before unpacking as an unsigned long long.
  n = struct.unpack('<Q', data + '\x00'* (8-len(data)) )
  return n[0]

Notice that when stripping zeros, we have to decide how to handle the integer zero itself. If we simply remove all trailing zero bytes, we end up with an empty string. Thus, I keep one zero byte if the integer is zero. In decode, we have to pad the variable data with zeros up to 8 bytes to unpack it as an unsigned long long. Notice that the padding follows little endian order too.

Run some tests:

>>> print encode(0)
AA

>>> print decode('AA')
0

>>> print encode(12345)
OTA

>>> print decode('OTA')
12345

The code is for Python 2.5.
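
For readers on Python 3, where struct and base64 work on bytes rather than str, a port could look like the sketch below. It follows the same approach as the code above; the only deliberate difference is that decode computes the exact amount of padding instead of always appending '=='.

```python
import base64
import struct


def encode(n):
    # Pack as little endian, drop high-order zero bytes, keep one byte for 0.
    data = struct.pack('<Q', n).rstrip(b'\x00') or b'\x00'
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def decode(s):
    # Restore the '=' padding up to a multiple of 4 characters.
    padded = s + '=' * (-len(s) % 4)
    data = base64.urlsafe_b64decode(padded)
    # Pad with zero bytes on the right (little endian) up to 8 bytes.
    return struct.unpack('<Q', data.ljust(8, b'\x00'))[0]


print(encode(0), decode('AA'))       # prints AA 0
print(encode(12345), decode('OTA'))  # prints OTA 12345
```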

Finally, an interesting question. Is it true that any string constructed from base64 characters is a valid base64 string? See the following example,

>>> print decode('100')
19927

>>> print decode('101')
19927

>>> print decode('102')
19927

>>> print decode('103')
19927

The results are the same. Why? I would say only 100 is a valid base64 string, but the Python base64 decoder tolerates the other 3. We know that 4 base64 chars (24 bits) represent 3 bytes. If we have only 3 base64 chars (18 bits) here, they can only represent 2 bytes (16 bits). Thus, the least significant 2 bits of the 3 chars, which are the only bits that differ among "100", "101", "102" and "103", are ignored by the base64 decoder.
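
The bit arithmetic can be checked by hand without relying on the decoder’s tolerance. The sketch below is written for Python 3; decode_three_chars is a hypothetical helper that handles only the three-character case discussed here.

```python
import string

# URL-safe base64 alphabet: the index of each character is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '-_'


def decode_three_chars(s):
    """Decode exactly 3 base64 chars into a little endian integer."""
    if len(s) != 3:
        raise ValueError('expected exactly 3 characters')
    bits = 0
    for ch in s:
        bits = (bits << 6) | ALPHABET.index(ch)
    # 18 bits were collected; only the top 16 form the 2 bytes.
    two_bytes = (bits >> 2).to_bytes(2, 'big')
    return int.from_bytes(two_bytes, 'little')


for s in ('100', '101', '102', '103', '104'):
    print(s, decode_three_chars(s))
```

The first four strings give 19927, while '104' gives a different value because '4' changes a bit above the two discarded ones.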

glassfish errors in netbeans

I don’t normally use an IDE since I am used to editing in vim. I tried to adopt Eclipse for Java work several times, but I just could not keep using it. Recently, however, I started to see whether I could adopt NetBeans for Java EE development. My environment is NetBeans + GlassFish + Maven.

I experienced several problems in the beginning:

(1) NetBeans: No suitable Deployment Server is defined for the project or globally.

I found the solution here: http://wiki.netbeans.org/JAXWSNB6Maven2GlassFishV2

In Netbeans:

  1. Right-click the project and select Properties. Navigate to the Run tab.
  2. In the Server field select GlassFish V2

(2) netbeans SEC5046: Audit: Authentication refused for [admin].

Why do I see this error while I can still successfully deploy my application to glassfish from netbeans?

I found the solution here http://forums.java.net/jive/thread.jspa?threadID=35551

Just simply remove ~/.asadminpass