I’ve just installed Sphider, a free PHP search engine, on this very website. I felt that WordPress’s built-in search lacked a lot of features, like indexing content outside the blog itself and advanced search capabilities.
I used to use phpDig a lot, but it’s been a long time since an update was released. While it is (or used to be) a very good search engine, it was kind of tricky to get it back on track if it got messed up.
The installation is pretty easy, so I won’t go into details here. I got it working within 10 minutes, and I have to say it can crawl a site really fast. It’s even better if you start the crawl from the command line rather than from your browser.
Sphider can also index PDF files (.pdf) as well as Microsoft Word documents (.doc) if you have the right converters installed. To convert PDF documents to text on Linux, you will need the pdftotext utility bundled with Xpdf.
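Before pointing the indexer at a converter, it can help to check that pdftotext is actually installed and returns usable text. Here is a minimal Python sketch of that check. The function names are my own, made up for illustration, and this is not how Sphider itself calls the converter (Sphider is PHP and handles that on its own). The only thing it relies on is that pdftotext prints to standard output when the output file name is given as "-".

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, binary="pdftotext"):
    """Return the argument list that makes pdftotext print text to stdout."""
    # Passing "-" as the output file tells pdftotext to write to stdout.
    return [binary, str(pdf_path), "-"]


def pdf_to_text(pdf_path, binary="pdftotext"):
    """Return the text extracted from a PDF, or None if the converter is missing."""
    # Hypothetical helper: bail out early when the binary is not on the PATH.
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        build_pdftotext_command(pdf_path, binary),
        capture_output=True,
        check=True,  # raise CalledProcessError if the conversion fails
    )
    # Replace undecodable bytes rather than failing on odd encodings.
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text("some-file.pdf") on a machine without the converter simply returns None, which makes it easy to tell "not installed" apart from "installed but failed" (the latter raises an exception).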
Unfortunately, I found out that catdoc, the MS Word to text converter, has been dead for quite a while. Even though you can still find it on the web, it won’t convert documents created with Office XP or 2003. I know there’s a command-line utility that comes with OpenOffice, but since I don’t have X installed, I couldn’t try it. And I don’t plan to install OpenOffice on my server just for that purpose.
If you find a good text converter for MS Word documents, let us know!
2 responses so far ↓
1. Response by: Scott Kingsley Clark on Apr 7, 2009 at 1:16 pm
Hey, check out the Sphider for WordPress plugin, which might make it easier for you to manage your Sphider within WordPress.
http://wordpress.org/extend/plugins/sphider/
2. Response by: Stephane Brault on Apr 27, 2009 at 7:08 am
I like Sphider Plus more than Sphider, but Sphider Plus is not free anymore.