Latest Version: 0.9.6.2
  Dashboard > Pylons Cookbook > ... > Database > MySQL fulltext search in SQLAlchemy with safe query parameters
  Pylons Cookbook Log In | Sign Up   View a printable version of the current page.  
  MySQL fulltext search in SQLAlchemy with safe query parameters
Added by Graham Higgins, last edited by James Gardner on Apr 07, 2007  (view change)
Labels: 
(None)

Name Space Section Page Version Status Curator Reviewed Author(s)
MySQL fulltext search in SQLAlchemy with safe query parameters Pylons CookBook   MySQL fulltext search in SQLAlchemy with safe query parameters 1.0 Draft Graham Higgins False  

MySQL fulltext search in SQLAlchemy with safe query parameters

I needed to be able to query a table with exact matches for some fields but fuzzy fulltext search on another. This allowed me to search on messages which had words my query did not, and found hits when my query had extra words. Standalone testing showed it worked very well. Getting it working with query parameters which would be immune to SQL Injection attacks took me a while to figure out.

Creating the FULLTEXT Index in websetup.py

I define my tables to SQLAlchemy in models/_init_.py and the ORM classes in models/domain.py.

In websetup.py I use raw SQL to add the fulltext index after the tables are created:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
from sqlalchemy import *
from myapp.models import *
    
def setup_config(command, filename, section, vars):
    app_conf = appconfig('config:' + filename)
    if not app_conf.has_key('sqlalchemy.dburi'):
        raise KeyError("No sqlalchemy database config found!")
    conn = metadata.connect(app_conf\['sqlalchemy.dburi'])
    metadata.engine.echo = asbool(app_conf.get("sqlalchemy.echo", False))  
    metadata.drop_all()
    metadata.create_all()
    
    metadata.engine.execute("ALTER TABLE kb ADD FULLTEXT(message)")
    
    session = create_session()

The MySQL docs point out that if you only have a couple entries then your FULLTEXT queries will fail because it omits data that occurs in 50% or more of the entries, just like "stop words". It also says that it's much faster to load the table with the index off then create the index rather than loading with the index turned on.

Run query with bound parameters in the controller

In my controller kb.py, I could use the same style of raw SQL as above but I'd be passing in naked user query strings which can break if they have quotes in them, leaving the DB open to SQL injection attacks. Instead, create a query with bound query parameters using the technique outlined in the SQLAlchemy manual section "Using Bind Parameters in Text Blocks". The config/routing.py file matches URLs like ".../kb/:fac/:sev/*text" and I found I had to use the asterisk-text or text with some odd characters didn't match the URL, even when the HTTP query string was URL-encoded.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
from myapp.lib.base import *
from myapp.lib.database import session_context
from myapp.models import metadata
    
class KbController(BaseController):
    
    def __before__(self):
        self.session = session_context.current
    
    def search(self, fac, sev, text):
       t = metadata.engine.text("""
        SELECT ROUND(MATCH(message) 
            AGAINST(:message), 2) 
            AS score,  facility, severity, message,
             explanation, solution, significance,
             notes1, notes2, notes3, os
        FROM kb
        WHERE MATCH(message) AGAINST(:message)
          AND facility=:facility
          AND severity=:severity
        LIMIT :limit
        """)
        c.results = t.execute(message=text, 
                              facility=fac, 
                              severity=sev, 
                              limit=100).fetchall()
        c.fac = fac
        c.sev = sev
        c.text = text
	return render_response('kb_search_results.myt')

The "MATCH...AGAINST" is the interesting part here. As part of the SELECT clause it returns a relevance rating, which I round to a couple decimal places; in the WHERE clause is where the work is done. The query runs surprisingly fast and results are returned from most to least relevant.

Site running on a free Atlassian Confluence Open Source Project License granted to Pylons. Evaluate Confluence today.
Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.3.3 Build:#645 Feb 13, 2007) - Bug/feature request - Contact Administrators
Top