watching the world go by

PHP & Microsoft Index Server

Sunday 1st October, 2006

One of the main sites I maintain at work runs on a Windows IIS server, although it is written entirely in PHP. One issue most people have with the site is the sheer volume of documents on it. They are at least categorised into folders, but it is still pretty difficult to find specific documents, and I could not find any usable document-search software with a cheap price tag. Fortunately, Microsoft Windows ships with something called Index Server, which is a much underrated technology. Essentially, Index Server is able to index certain document types and make them searchable through a free-text search engine, a little like the way Google search finds PDF files.

Searching Index Server from PHP is pretty simple, and there are a few tutorials out there. Finding information about the Index Server query language can be a little difficult, however. I have barely skimmed the surface of what is possible, but some examples follow.

Querying Index Server is possible if you run the Win32 version of PHP, as you need to create a COM object that exposes the Index Server functionality. The following assumes you have already created a catalog called "myIndex" which contains a number of documents.

// Create com object to point to index server
$Q = new COM("ixsso.Query");
$util = new COM("ixsso.util");     

// Point to catalog
$Q->Catalog = "myIndex";

// Specify the columns 
$Q->Columns = "filename, rank, vpath, path, DocTitle";

// Define a query
$Q->Query = "Holiday Form";

// Specify we want to sort by relevance rank
$Q->SortBy = "rank[d]";

// Create a RecordSet containing the results 
$fileRS = $Q->CreateRecordSet("nonsequential");

// Only loop if the query returned at least one result
if ($fileRS->RecordCount > 0)
{
   $fileRS->MoveFirst();
   $i = 0;
   while ($i < $fileRS->RecordCount)
   {
      // Output rank and filename, one result per line
      echo $fileRS->fields['rank']->value . ' - ' .
         $fileRS->fields['filename']->value . "<br />\n";
      $fileRS->MoveNext();
      $i++;
   }
}

The query language is pretty powerful, and Microsoft provide technical documentation for the Index Server Query Language on MSDN. However, I tend to use one of three query types for a web query:

Exact Phrase
$Q->Query = "Holiday Form";

All Keywords
$Q->Query = "volvo AND price";

Any Keywords
$Q->Query = "volvo OR price";
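
If the search terms come from a web form, the three query types above can be built from the user's input with a small helper. This is only a rough sketch: buildIndexQuery() and its mode names ('phrase', 'all', 'any') are my own hypothetical names, not part of Index Server or PHP, and it makes no attempt to escape characters or operators that the query language treats specially.

```php
<?php
// Hypothetical helper (not part of Index Server): turns free-text input
// into one of the three query forms shown above.
function buildIndexQuery($input, $mode = 'phrase')
{
    // Split the input on whitespace and discard empty pieces
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    switch ($mode) {
        case 'all':
            // Every keyword must match
            return implode(' AND ', $words);
        case 'any':
            // At least one keyword must match
            return implode(' OR ', $words);
        default:
            // Words in sequence, as in the exact phrase example
            return implode(' ', $words);
    }
}
```

It could then be used along the lines of $Q->Query = buildIndexQuery($_GET['q'], 'all'); where the "q" form field is again just an assumed name.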

Some other useful information about Index Server: by default it is quite happy to index Microsoft Office documents (XLS, DOC, PPT), and there is an IFilter plugin provided by Adobe for indexing PDF files.

Microsoft also provide an application called Microsoft Office Document Imaging with recent versions of Office. This application can convert images into a searchable format using Optical Character Recognition (OCR). It only works with TIFF or MDI (Microsoft's TIFF-based format) files, but you can quite easily OCR an entire folder of images using a simple batch file:

FOR %%i in (*.tif) DO mspview -o "%%i"
FOR %%i in (*.tiff) DO mspview -o "%%i"
FOR %%i in (*.mdi) DO mspview -o "%%i"

Sources :
http://www.dynamicajax.com
