search function issue

henry66
Posts: 40
Joined: Sun May 24, 2015 3:26 am

search function issue

Post by henry66 »

This search function seems to be one of the most sought-after features in v5, and that's true for me too.

Since I cannot find a detailed description of how it is supposed to work, here are some assumptions:

- It is supposed to find complete words as well as parts of words, case-insensitively.

Right so far?

- Example: I have 9 very similar mails (plain text), all containing the word 'Medimops'.

- I search for 'medi' - result: nothing

- I search for 'medim' - result: nothing

- I search for 'medimo' - result: one hit

- I search for 'dimops' - result: nothing

- I search for 'medimop' - result: nothing

- I search for 'medimops' - result: all nine hits

Totally baffling and inexplicable to me. It makes no sense, but it is absolutely reproducible: always the same results.

Anybody?

greetings - heinz -

Some samples as attachments:
[attachment: PPsearch_medimop_0.jpg]
[attachment: PPsearch_medimo_1.jpg]
[attachment: PPsearch_medi_0.jpg]

Re: search function issue

Post by henry66 »

And here is one more, with all 9 hits:
[attachment: PPsearch_medimops_9.jpg]
Jeff
Admin / Developer
Posts: 9227
Joined: Sat Sep 08, 2001 9:46 pm

Re: search function issue

Post by Jeff »

As of b22, it performs a whole-word search; prior to b22, it performed partial matches. This was changed for a couple of reasons: partly because whole-word searches are more efficient in the new (b22) search engine, but also because they may provide more relevant results. It has its downsides: searching for "horse" won't match "horses", but, on the other hand, searching for "one" won't match "stones".

However, b22 has added more commands so that you can still perform partial matches: just add a '*' to the beginning and/or end of your search term. So, if you want to find "stones", you can search for "*one*". You can also make partial matching the default if you want (in which case '*' changes from a wildcard to a word terminator).
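
A minimal Python sketch of the two matching modes and the '*' rules just described. It is not POP Peeper's code: the tokenizer (runs of ASCII letters and digits, compared case-insensitively) and the function names are assumptions for illustration, and a real engine would look words up in an index rather than scanning the text.

```python
import re


def words_in(text: str) -> list[str]:
    # Assumption: a "word" is a run of ASCII letters/digits, compared in lowercase.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term: str, text: str, default_partial: bool = False) -> bool:
    """Return True if a single search term matches a word in text.

    default_partial=False (whole word by default): a '*' at either end
    is a wildcard that lets the word continue on that side.
    default_partial=True (partial by default): a '*' at either end is a
    word terminator that pins that side to the word boundary.
    """
    star_left = term.startswith("*")
    star_right = term.endswith("*")
    core = term.strip("*").lower()
    if default_partial:
        open_left, open_right = not star_left, not star_right
    else:
        open_left, open_right = star_left, star_right

    for word in words_in(text):
        if open_left and open_right:
            hit = core in word            # *core*  -> anywhere in the word
        elif open_left:
            hit = word.endswith(core)     # *core   -> word ends with core
        elif open_right:
            hit = word.startswith(core)   # core*   -> word starts with core
        else:
            hit = word == core            # core    -> the whole word only
        if hit:
            return True
    return False


text = "Carry the stones home on the horses."
print(term_matches("one", text))                          # False
print(term_matches("*one*", text))                        # True
print(term_matches("horse", text))                        # False
print(term_matches("horse*", text))                       # True
print(term_matches("one", text, default_partial=True))    # True
print(term_matches("*one*", text, default_partial=True))  # False
```

Note how the same '*' flips meaning between the two defaults: it opens a side of the word in whole-word mode and closes it in partial mode.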

If your search for medim and dimops found something, I'd recommend searching the message for those words and checking whether they occur as words on their own (and let me know in case something is wrong).


For more information:
https://www.esumsoft.com/products/pop-p ... ced-search
(featured in the MOTD, though the page was updated for b22, and edited minutes ago when I found a reference to the default behavior being a partial match; actually, I'm going to edit it again to use the phrases "partial" and "whole word" as that's probably better)



NOTE: I have updated PP and the documentation to use the terms "whole" and "partial" instead of the former "strict" and "loose" -- so, in b22, even though the doc says to use //match:whole// and //match:partial// -- you should instead use //match:strict// and //match:loose//. Again, that only applies to b22; once b23 is released, the strict/loose will no longer work and you need to use whole/partial. Further, this is only relevant if you want to change the default to partial matching, which I wouldn't recommend since using partial matching on-demand using wildcards is the recommended method.
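
As a rough illustration of that naming difference, here is a hypothetical query parser in Python. It assumes the option is typed as a standalone `match:VALUE` token in the search box (the exact syntax in the documentation may differ) and only models which names each build accepts and what default each one selects.

```python
# Hypothetical: the names the match: option accepts in each build, per the
# note above. b22 still uses the old names; b23 and later use the new ones.
B22_NAMES = {"strict": False, "loose": True}    # value -> partial by default?
B23_NAMES = {"whole": False, "partial": True}


def parse_query(query: str, build: int) -> tuple[bool, list[str]]:
    """Split a query into (partial_by_default, search terms) for build >= 22.

    Assumption: the option is written as a standalone match:VALUE token;
    the real syntax may differ.
    """
    if build < 22:
        raise ValueError("the match: option is assumed not to exist before b22")
    names = B22_NAMES if build == 22 else B23_NAMES
    partial_by_default = False  # whole-word matching is the default from b22 on
    terms = []
    for token in query.split():
        if token.lower().startswith("match:"):
            value = token.split(":", 1)[1].lower()
            if value not in names:
                raise ValueError(f"match:{value} is not recognised in b{build}")
            partial_by_default = names[value]
        else:
            terms.append(token)
    return partial_by_default, terms


print(parse_query("match:loose medimops", 22))    # (True, ['medimops'])
print(parse_query("match:partial medimops", 23))  # (True, ['medimops'])
print(parse_query("medimops", 23))                # (False, ['medimops'])
```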

Re: search function issue

Post by henry66 »

Jeff, I understand your reasoning, and it makes sense often enough, even to me.

But we are not alone in this world: every search I know of that goes through emails, even in webmail, uses partial matching by default, so that's what a "normal" user expects. And if you (or whoever) change things from version to version (which you should; 99% of changes are improvements), it would only be fair to introduce this properly, and rather loudly, to your beta users rather than confusing them.

Yes, I did check for 'medim' and 'dimops', but no, they were not there as fragments. So on this point, I still have no idea.

And naturally, the *xxxx* partial search works fine; it's a relic from the good old DOS days, but a good substitute. IF one knows about it.

I still think that partial matching by default, as it was in b21, should be the standard, and other modes like whole words only, case sensitivity, or whatever could be options: *xxx, xxx*, Xxxxx, Xxxx*, and many others. Simple to understand, simple to learn.

Many people will copy/paste words or fragments into the search box, and then adding * in front and behind is a pain, often forgotten, and leads to wrong or no results.

Many greetings, and please understand: this was not meant only as criticism, more as suggestions.

- heinz -
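
Heinz's proposal (partial matching by default, with whole-word and case-sensitive matching as options) can be sketched in a few lines of Python with the standard `re` module. The function and parameter names are hypothetical, and the `\b` word boundaries used for the whole-word option may not split words exactly the way POP Peeper's engine does.

```python
import re


def search(messages: list[str], term: str, whole_word: bool = False,
           case_sensitive: bool = False) -> list[int]:
    """Return the indices of messages that contain term.

    Partial (substring) matching is the default; whole-word and
    case-sensitive matching are opt-in options.
    """
    pattern = re.escape(term)  # treat the term literally, not as a regex
    if whole_word:
        pattern = rf"\b{pattern}\b"  # \b: boundary between word and non-word characters
    flags = 0 if case_sensitive else re.IGNORECASE
    rx = re.compile(pattern, flags)
    return [i for i, body in enumerate(messages) if rx.search(body)]


mails = ["Your Medimops order", "medimops newsletter", "Stones and bones"]
print(search(mails, "medi"))                       # [0, 1]
print(search(mails, "Medi", case_sensitive=True))  # [0]
print(search(mails, "medi", whole_word=True))      # []
print(search(mails, "one"))                        # [2]
```

The trade-off Jeff describes shows up in the last line: with partial matching on by default, "one" finds "Stones".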

Re: search function issue

Post by Jeff »

henry66 wrote: "But we are not alone in this world: every search I know of that goes through emails, even in webmail, uses partial matching by default"
I just tested Thunderbird: it does not use partial searches (for the record, I searched for "mortal" vs. "mort"; the former found results and the latter did not). My email provider's web-based mail does use partial searches. Gmail does not (test case: "receipt" vs. "ceipt").

In the cases of TB and Gmail, which do not appear to use partial searching, there's probably (and "certainly" in the case of Gmail, for obvious reasons) fancier searching going on, so I would presume that searching for "horse" will find "horses". But I'd be very surprised if the examples of partial words you provided worked in TB or Gmail the way they did in PPv5b21.

So, right now, I'm still inclined to stick with whole word, but I'm very open to changing that. After all, it's just a flip of a switch to change the default, no skin off my back; I just want to go with what I think the majority of people would expect (*because* I don't expect people to know about the wildcard usage).
henry66 wrote: "it would only be fair to introduce this properly, and rather loudly, to your beta users rather than confusing them."
It's in the change notes. And it includes a very detailed explanation of the changes to the search engine that took place in b22. And the first change I mention regarding the search engine is this very topic. Further, I've stated that the MOTD will be the primary channel for information, and the current MOTD is "This update has made changes to the search engine. Please report any issues." The button at the bottom of the MOTD is (and always has been, for the beta) a link to the change notes.

That being said, I actually had an ulterior motive for being so brief in the MOTD. b22 brought a *major* change to the search engine, and I had originally planned on going into great detail, but I wanted to see if anyone would bring it up. So, thank you: you started the conversation that I wanted, and sometimes talking to oneself isn't very constructive :mrgreen: Hopefully anyone else who has an opinion also joins this conversation.
henry66 wrote: "Yes, I did check for 'medim' and 'dimops', but no, they were not there as fragments. So on this point, I still have no idea."
Would you be willing to send the email source so that I can test it? Save the source (view message in PP: File / Save message as) and then attach it via PM or email (do not attach in the public forum). If it really doesn't exist, then I do have a theory.

henry66 wrote: "Many people will copy/paste words or fragments into the search box, and then adding * in front and behind is a pain, often forgotten, and leads to wrong or no results."
I would disagree with that generalization. If you copy/paste, then you're probably going to copy the entire phrase and not stop the selection halfway through a word. E.g., the thing that I most often copy/paste into the search box is an email address, and I make sure I copy the entire email address; I did so even before the change to whole word.

Why would you copy half a word....?

henry66 wrote: "and please understand: this was not meant only as criticism, more as suggestions."
Criticism is fine as long as it's constructive :)

Re: search function issue

Post by Jeff »

I've come up with what may be a compromise and hopefully best-of-both-worlds kind of solution. It's what I'm calling "mixed matching".

Basically, for each word you provide in the search box, POP Peeper will first perform a "whole word" match on that word; but if the word doesn't exist in the database, then it will instead do a "partial" match.

The idea being that if you specify a "real" word, then you probably want to only find that word. But if you specify a "fake" word, then it probably means that you want to find forms of that word.

For example, suppose you search for "communicat":
"communicat" is not a real word and would probably not appear as a whole word. So, PP does a whole-word search for "communicat", finds that it doesn't exist, and then does a partial search for "*communicat*", which would potentially find matches for "communicate", "communication", "communicating", etc.

On the other hand, if you do a search for a word like "eve" (e.g. perhaps a person's name), then you're only going to get messages that contain the whole word "eve" and not a bunch of irrelevant messages that contain the words seven, even, event, steve, ever, never, etc.

Case in point (and this was pure kismet, since I'd been using "eve" as an example before I had ever tested it): if I search for "eve", I get 7 results; if I search for *eve*, I get 8749 results. (Make that 8750; I just got a new message :lol: the trigger word being retrieve [and believe] [and however] [and other forms of retrieve].)
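
The "mixed matching" rule can be sketched in Python as below. A small in-memory inverted index stands in for POP Peeper's search database (an assumption; the real engine isn't described here), and each term is looked up as a whole word first, falling back to a partial match only when it is not a known word. How several terms in one query are combined is not modelled.

```python
import re
from collections import defaultdict


def build_index(messages: list[str]) -> dict[str, set[int]]:
    """Inverted index: lowercase word -> ids of the messages containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for msg_id, body in enumerate(messages):
        for word in re.findall(r"[a-z0-9]+", body.lower()):
            index[word].add(msg_id)
    return dict(index)


def mixed_match(index: dict[str, set[int]], term: str) -> set[int]:
    """Whole-word match if term is a known word, otherwise a partial match."""
    term = term.lower()
    if term in index:
        # A "real" word (it occurs somewhere on its own): whole word only.
        return set(index[term])
    # A "fake" word: fall back to *term*, i.e. any indexed word containing it.
    hits: set[int] = set()
    for word, ids in index.items():
        if term in word:
            hits |= ids
    return hits


mails = [
    "Please communicate soon",
    "Communication failed",
    "Steve is here",
    "Eve says hi",
    "Seven new events",
]
index = build_index(mails)
print(sorted(mixed_match(index, "communicat")))  # [0, 1]
print(sorted(mixed_match(index, "eve")))         # [3]
```

With "eve", only the message containing the word on its own is returned, whereas a plain partial match would also pull in "Steve", "Seven" and "events".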


Also, something else I've added based on something you've mentioned:
[attachment: PPv5_Search_NoResultsTip.png]
It's probably less likely to be relevant with the "mixed" matching, but it may still provide useful information to the user.