<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A Stemming Analyzer for Zend&#8217;s PHP Lucene</title>
	<atom:link href="http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/</link>
	<description>One programmer's formatted output stream</description>
	<lastBuildDate>Fri, 22 Jan 2010 08:20:25 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Kenny Katzgrau</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-236</link>
		<dc:creator>Kenny Katzgrau</dc:creator>
		<pubDate>Tue, 22 Dec 2009 05:29:57 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-236</guid>
		<description>@Mike,

Normally, Lucene stems all docs as they go into the index. During searches, Lucene stems any words in the search phrase.
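
A minimal sketch of that flow in Python, assuming a toy in-memory index rather than Zend_Search_Lucene itself. The hypothetical TOY_STEMS table holds only the stems quoted in this thread and stands in for the real Porter stemmer. The point is that documents and queries pass through the same analyze() step:

```python
from collections import defaultdict

# Hypothetical stand-in for the Porter stemmer: only the stems quoted in this thread.
TOY_STEMS = {
    "mortgage": "mortgag",
    "mortgages": "mortgag",
    "mortgagee": "mortgage",
}


def analyze(text: str) -> list:
    """Lowercase, split on whitespace, then stem each token."""
    return [TOY_STEMS.get(tok, tok) for tok in text.lower().split()]


class ToyIndex:
    def __init__(self) -> None:
        # term -> set of document ids
        self.postings = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Index time: documents are stemmed before their terms are stored.
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, phrase: str) -> set:
        # Query time: the search phrase goes through the same analyzer.
        hits = set()
        for term in analyze(phrase):
            hits.update(self.postings.get(term, set()))
        return hits


index = ToyIndex()
index.add(1, "Mortgage rates")
index.add(2, "Two mortgages")
index.add(3, "The mortgagee")
print(sorted(index.search("mortgages")))  # [1, 2]
print(sorted(index.search("mortgage")))   # [1, 2]
print(sorted(index.search("mortgagee")))  # [3]
```

Because both sides share one analyzer, a search for &quot;mortgage&quot; or &quot;mortgages&quot; hits the stored term &quot;mortgag&quot;. The &quot;mortgagee&quot; document is stored under the separate term &quot;mortgage&quot;.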

Really, that&#039;s the extent of word stemming.

I think I may have an idea of what&#039;s going on. Every document added to the search results is scored, and that score is normalized to a value between 0 and 1. I&#039;m thinking that in your search for &quot;mortgage&quot;, since you have an exact match between the search phrase and the stemmed word, the score is 1.00 while other matches get a very low score (and aren&#039;t added to the final result list). In fact, &quot;mortgage&quot; and &quot;mortgag&quot; are considered completely different words in the post-analysis operations.

I could be wrong, but I think it makes sense. Your situation might be an interesting edge case in the StandardAnalyzer search.

And I think &quot;mortgagee&quot; to &quot;mortgage&quot; might be incorrect, but I&#039;m not sure whether the Porter Stemmer was ever considered anything other than &quot;the best available,&quot; so I guess we&#039;ll have to settle :)</description>
		<content:encoded><![CDATA[<p>@Mike,</p>
<p>Normally, Lucene stems all docs as they go into the index. During searches, Lucene stems any words in the search phrase.</p>
<p>Really, that&#8217;s the extent of word stemming.</p>
<p>I think I may have an idea of what&#8217;s going on. Every document added to the search results is scored, and that score is normalized to a value between 0 and 1. I&#8217;m thinking that in your search for &#8220;mortgage&#8221;, since you have an exact match between the search phrase and the stemmed word, the score is 1.00 while other matches get a very low score (and aren&#8217;t added to the final result list). In fact, &#8220;mortgage&#8221; and &#8220;mortgag&#8221; are considered completely different words in the post-analysis operations.</p>
<p>I could be wrong, but I think it makes sense. Your situation might be an interesting edge case in the StandardAnalyzer search.</p>
<p>And I think &#8220;mortgagee&#8221; to &#8220;mortgage&#8221; might be incorrect, but I&#8217;m not sure whether the Porter Stemmer was ever considered anything other than &#8220;the best available,&#8221; so I guess we&#8217;ll have to settle <img src='http://codefury.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-234</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Mon, 21 Dec 2009 20:41:56 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-234</guid>
		<description>&#039;mortgage&#039; and &#039;mortgages&#039; stem down to &#039;mortgag&#039; (correct)

&#039;mortgagee&#039; (as in the lender in a mortgage) stems down to &#039;mortgage&#039;; I was expecting &#039;mortgag&#039;

Is this a bug or correct behavior? Here&#039;s more to the story (I am using Zend_Search_Lucene):

-----------------------

I&#039;m using the Porter stemmer to stem the words, and here&#039;s a problem I&#039;m running into:

The word &quot;mortgage&quot; is correctly stemmed to &quot;mortgag&quot;. The word &quot;mortgagee&quot; is (arguably incorrectly) stemmed to &quot;mortgage&quot;.
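
For reference, both results appear consistent with step 5a of Porter&#039;s algorithm, which removes at most one final e. The Python sketch below implements only Porter&#039;s measure m and step 5a, not the full five-step stemmer. None of the earlier steps change these two words, so step 5a alone reproduces the stems quoted above. The helper names are illustrative:

```python
# Illustrative helper names; only step 5a of Porter's algorithm is implemented.
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter: y is a consonant at the start of a word or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m in [C](VC)^m[V]: count each vowel run that is followed by a consonant."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        if is_consonant(stem, i):
            if prev_is_vowel:
                m += 1
            prev_is_vowel = False
        else:
            prev_is_vowel = True
    return m


def ends_cvc(stem: str) -> bool:
    """Porter's *o condition: consonant-vowel-consonant, last letter not w, x or y."""
    n = len(stem)
    return (
        n >= 3
        and is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def step_5a(word: str) -> str:
    """Drop one final e if m(stem) is above 1, or equals 1 without a cvc ending."""
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            return stem
    return word


for w in ("mortgage", "mortgagee"):
    print(w, "->", step_5a(w))
# mortgage -> mortgag
# mortgagee -> mortgage
```

Because the stemmer makes a single pass, &quot;mortgagee&quot; loses only its last e and ends up as &quot;mortgage&quot;.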

There are approximately 100 documents with the word &quot;mortgage&quot;. There is 1 document with the word &quot;mortgagee&quot;.

When I build an index without putting &quot;mortgagee&quot; in any documents, everything works fine: searching for &quot;mortgage&quot; or &quot;mortgages&quot; or &quot;mortgag&quot; returns all 100 documents.

When I build an index and one of the documents contains &quot;mortgagee&quot;, searching the index for &quot;mortgage&quot; only returns a single document with &quot;mortgagee&quot; (which was stemmed down to &quot;mortgage&quot;). However, searching for &quot;mortgag&quot; or &quot;mortgages&quot; returns all 100 documents.
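
A toy model makes it easy to check whether the lookup order suspected in the next paragraph would explain these results. In the Python sketch below, search_exact_first() is purely the hypothesis being tested, not a description of Zend_Search_Lucene internals. TOY_STEMS is a hypothetical table holding only the stems quoted above, and three documents stand in for the roughly 100 real ones:

```python
from collections import defaultdict

# Hypothetical stemmer: just the stems quoted in this comment.
TOY_STEMS = {"mortgage": "mortgag", "mortgages": "mortgag", "mortgagee": "mortgage"}


def stem(word: str) -> str:
    return TOY_STEMS.get(word, word)


def build_index(docs: dict) -> dict:
    """Map each stemmed term to the ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[stem(word)].add(doc_id)
    return postings


def search_stemmed(postings: dict, word: str) -> set:
    """Always stem the query word before looking it up."""
    return set(postings.get(stem(word), set()))


def search_exact_first(postings: dict, word: str) -> set:
    """Hypothesis: use the raw word if it already exists as an indexed term,
    and only fall back to the stemmed form otherwise."""
    if word in postings:
        return set(postings[word])
    return search_stemmed(postings, word)


# Three "mortgage" documents stand in for the ~100 in the real index.
docs = {1: "mortgage", 2: "mortgage rates", 3: "your mortgage", 4: "the mortgagee"}
postings = build_index(docs)

for query in ("mortgage", "mortgages", "mortgag"):
    print(query, sorted(search_stemmed(postings, query)), sorted(search_exact_first(postings, query)))
# mortgage [1, 2, 3] [4]
# mortgages [1, 2, 3] [1, 2, 3]
# mortgag [1, 2, 3] [1, 2, 3]
```

With the exact-first rule, a search for &quot;mortgage&quot; returns only document 4, which matches the symptom above. Without document 4, &quot;mortgage&quot; is no longer an indexed term, so both functions return [1, 2, 3].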

The only logical conclusion I can draw from this is that Lucene first searches for the unstemmed word, and only if it finds no results does it search for the stemmed word. Thus, when searching for &#039;mortgage&#039;, it first finds the &#039;mortgage&#039; that was stemmed from &#039;mortgagee&#039; and stops searching. Is this the correct behavior, or is it a bug?</description>
		<content:encoded><![CDATA[<p>&#8216;mortgage&#8217; and &#8216;mortgages&#8217; stem down to &#8216;mortgag&#8217; (correct)</p>
<p>&#8216;mortgagee&#8217; (as in the lender in a mortgage) stems down to &#8216;mortgage&#8217;; I was expecting &#8216;mortgag&#8217;</p>
<p>Is this a bug or correct behavior? Here&#8217;s more to the story (I am using Zend_Search_Lucene):</p>
<p>-----------------------</p>
<p>I&#8217;m using the Porter stemmer to stem the words, and here&#8217;s a problem I&#8217;m running into:</p>
<p>The word &#8220;mortgage&#8221; is correctly stemmed to &#8220;mortgag&#8221;. The word &#8220;mortgagee&#8221; is (arguably incorrectly) stemmed to &#8220;mortgage&#8221;.</p>
<p>There are approximately 100 documents with the word &#8220;mortgage&#8221;. There is 1 document with the word &#8220;mortgagee&#8221;.</p>
<p>When I build an index without putting &#8220;mortgagee&#8221; in any documents, everything works fine: searching for &#8220;mortgage&#8221; or &#8220;mortgages&#8221; or &#8220;mortgag&#8221; returns all 100 documents.</p>
<p>When I build an index and one of the documents contains &#8220;mortgagee&#8221;, searching the index for &#8220;mortgage&#8221; only returns a single document with &#8220;mortgagee&#8221; (which was stemmed down to &#8220;mortgage&#8221;). However, searching for &#8220;mortgag&#8221; or &#8220;mortgages&#8221; returns all 100 documents.</p>
<p>The only logical conclusion I can draw from this is that Lucene first searches for the unstemmed word, and only if it finds no results does it search for the stemmed word. Thus, when searching for &#8216;mortgage&#8217;, it first finds the &#8216;mortgage&#8217; that was stemmed from &#8216;mortgagee&#8217; and stops searching. Is this the correct behavior, or is it a bug?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kamal</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-158</link>
		<dc:creator>Kamal</dc:creator>
		<pubDate>Thu, 21 May 2009 05:42:49 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-158</guid>
		<description>Hi, I was trying to add more filters than the ones you have already provided in 
standardanalyzer-1.0.0b/StandardAnalyzer/Analyzer/Standard/English.php on lines 37-39:

        $this-&gt;addFilter(new Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8());
        $this-&gt;addFilter(new Zend_Search_Lucene_Analysis_TokenFilter_StopWords($this-&gt;_stopWords));
        $this-&gt;addFilter(new StandardAnalyzer_Analysis_TokenFilter_EnglishStemmer());	

I am only able to add filters which are of type Zend_Search_Lucene_Analysis_TokenFilter. Let&#039;s say I try to add $this-&gt;addFilter(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive): it doesn&#039;t work, because it&#039;s not of type Zend_Search_Lucene_Analysis_TokenFilter. Could you add extra functionality to cater for this? Or, if you have a solution, could you post it or email me?

Thanks a lot for your help in advance</description>
		<content:encoded><![CDATA[<p>Hi, I was trying to add more filters than the ones you have already provided in<br />
standardanalyzer-1.0.0b/StandardAnalyzer/Analyzer/Standard/English.php on lines 37-39:</p>
<p>        $this-&gt;addFilter(new Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8());<br />
        $this-&gt;addFilter(new Zend_Search_Lucene_Analysis_TokenFilter_StopWords($this-&gt;_stopWords));<br />
        $this-&gt;addFilter(new StandardAnalyzer_Analysis_TokenFilter_EnglishStemmer());	</p>
<p>I am only able to add filters which are of type Zend_Search_Lucene_Analysis_TokenFilter. Let&#8217;s say I try to add $this-&gt;addFilter(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive): it doesn&#8217;t work, because it&#8217;s not of type Zend_Search_Lucene_Analysis_TokenFilter. Could you add extra functionality to cater for this? Or, if you have a solution, could you post it or email me?</p>
<p>Thanks a lot for your help in advance</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dayg</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-144</link>
		<dc:creator>dayg</dc:creator>
		<pubDate>Thu, 12 Feb 2009 05:42:29 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-144</guid>
		<description>This is exactly what I was looking for, a Porter Stemmer based analyzer for Zend Lucene.

Thank you very much. :)</description>
		<content:encoded><![CDATA[<p>This is exactly what I was looking for, a Porter Stemmer based analyzer for Zend Lucene.</p>
<p>Thank you very much. <img src='http://codefury.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peleg Michaeli</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-125</link>
		<dc:creator>Peleg Michaeli</dc:creator>
		<pubDate>Wed, 10 Dec 2008 21:00:53 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-125</guid>
		<description>That&#039;s weird -- after posting this comment, my last comment appeared again.
Sorry for bothering you for no reason.

Peleg.</description>
		<content:encoded><![CDATA[<p>That&#8217;s weird &#8212; after posting this comment, my last comment appeared again.<br />
Sorry for bothering you for no reason.</p>
<p>Peleg.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peleg Michaeli</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-124</link>
		<dc:creator>Peleg Michaeli</dc:creator>
		<pubDate>Wed, 10 Dec 2008 20:59:17 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-124</guid>
		<description>Hello Kenny,

Maybe it was due to a fault or maybe due to your decision, but my comment here (from last week) has been deleted.

Could you explain the reason for this to me? And if not, may I ask my question again?

Thanks in advance,
Peleg.</description>
		<content:encoded><![CDATA[<p>Hello Kenny,</p>
<p>Maybe it was due to a fault or maybe due to your decision, but my comment here (from last week) has been deleted.</p>
<p>Could you explain the reason for this to me? And if not, may I ask my question again?</p>
<p>Thanks in advance,<br />
Peleg.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jaskson &#8217;s DevNotes &#187; Implementing a Stemming Analyzer for Zend_Search_Lucene</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-104</link>
		<dc:creator>Jaskson &#8217;s DevNotes &#187; Implementing a Stemming Analyzer for Zend_Search_Lucene</dc:creator>
		<pubDate>Thu, 23 Oct 2008 07:22:31 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-104</guid>
		<description>[...] project can be downloaded from the PHP Standard analyzer project page. I also have a blog post on this [...]</description>
		<content:encoded><![CDATA[<p>[...] project can be downloaded from the PHP Standard analyzer project page. I also have a blog post on this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick</title>
		<link>http://codefury.net/2008/06/a-stemming-analyzer-for-zends-php-lucene/comment-page-1/#comment-4</link>
		<dc:creator>Patrick</dc:creator>
		<pubDate>Sun, 22 Jun 2008 19:37:50 +0000</pubDate>
		<guid isPermaLink="false">http://katzgrau.simplesample.org/?p=10#comment-4</guid>
		<description>Hi,

Do you have an alpha / beta version of the WpSearch plugin? There doesn&#039;t appear to be anything on the Project page. I&#039;m in the process of learning how to use Lucene and would love to see how it works on my local installation of WordPress.

I&#039;m pretty new to all of this so I&#039;d be happy to give whatever credit / donation you were looking for.

Thanks!</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Do you have an alpha / beta version of the WpSearch plugin? There doesn&#8217;t appear to be anything on the Project page. I&#8217;m in the process of learning how to use Lucene and would love to see how it works on my local installation of WordPress.</p>
<p>I&#8217;m pretty new to all of this so I&#8217;d be happy to give whatever credit / donation you were looking for.</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
