| Name |
Space |
Section |
Page |
Version |
Status |
Curator |
Reviewed |
Author(s) |
| MySQL fulltext search in SQLAlchemy with safe query parameters |
Pylons CookBook |
|
MySQL fulltext search in SQLAlchemy with safe query parameters |
1.0 |
Draft |
Graham Higgins |
False |
|
MySQL fulltext search in SQLAlchemy with safe query parameters
I needed to be able to query a table with exact matches for some fields but fuzzy fulltext search on another. This allowed me to search on messages which had words my query did not, and found hits when my query had extra words. Standalone testing showed it worked very well. Getting it working with query parameters which would be immune to SQL Injection attacks took me a while to figure out.
Creating the FULLTEXT Index in websetup.py
I define my tables to SQLAlchemy in models/_init_.py and the ORM classes in models/domain.py.
In websetup.py I use raw SQL to add the fulltext index after the tables are created:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15 | from sqlalchemy import *
from myapp.models import *
def setup_config(command, filename, section, vars):
app_conf = appconfig('config:' + filename)
if not app_conf.has_key('sqlalchemy.dburi'):
raise KeyError("No sqlalchemy database config found!")
conn = metadata.connect(app_conf\['sqlalchemy.dburi'])
metadata.engine.echo = asbool(app_conf.get("sqlalchemy.echo", False))
metadata.drop_all()
metadata.create_all()
metadata.engine.execute("ALTER TABLE kb ADD FULLTEXT(message)")
session = create_session()
|
The MySQL docs point out that if you only have a couple entries then your FULLTEXT queries will fail because it omits data that occurs in 50% or more of the entries, just like "stop words". It also says that it's much faster to load the table with the index off then create the index rather than loading with the index turned on.
Run query with bound parameters in the controller
In my controller kb.py, I could use the same style of raw SQL as above but I'd be passing in naked user query strings which can break if they have quotes in them, leaving the DB open to SQL injection attacks. Instead, create a query with bound query parameters using the technique outlined in the SQLAlchemy manual section "Using Bind Parameters in Text Blocks". The config/routing.py file matches URLs like ".../kb/:fac/:sev/*text" and I found I had to use the asterisk-text or text with some odd characters didn't match the URL, even when the HTTP query string was URL-encoded.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30 | from myapp.lib.base import *
from myapp.lib.database import session_context
from myapp.models import metadata
class KbController(BaseController):
def __before__(self):
self.session = session_context.current
def search(self, fac, sev, text):
t = metadata.engine.text("""
SELECT ROUND(MATCH(message)
AGAINST(:message), 2)
AS score, facility, severity, message,
explanation, solution, significance,
notes1, notes2, notes3, os
FROM kb
WHERE MATCH(message) AGAINST(:message)
AND facility=:facility
AND severity=:severity
LIMIT :limit
""")
c.results = t.execute(message=text,
facility=fac,
severity=sev,
limit=100).fetchall()
c.fac = fac
c.sev = sev
c.text = text
return render_response('kb_search_results.myt')
|
The "MATCH...AGAINST" is the interesting part here. As part of the SELECT clause it returns a relevance rating, which I round to a couple decimal places; in the WHERE clause is where the work is done. The query runs surprisingly fast and results are returned from most to least relevant.