View Issue Details

ID: 0003040
Project: MediaMonkey 4
Category: Playlist / Search
View Status: public
Last Update: 2009-02-12 17:10
Reporter: jiri
Assigned To:
Priority: immediate
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 3.1
Fixed in Version: 3.1
Summary: 0003040: Implement proper full-text search
Description: Currently, users can use our search bar to quickly find data in their collections. However, there are several problems:
 - The search can be rather slow for large collections because it needs to process pretty much the whole database sequentially.
 - The syntax of search could be enhanced, e.g. to support 'xxx NOT yyy'

We could implement our own full-text search engine or use the one currently developed for SQLite (in beta stage), but both approaches involve quite a lot of work and have some problems.

An alternative approach would be to use desktop search engines already installed on user machines, like Windows Desktop Search (WDS), which is the default on Vista and can be downloaded for XP as well. We could write a handler that supplies data to the WDS engine (i.e. indexes all audio files supported by MM) and then use the engine to implement our Quick search.
Additional Information: Some technical links related to WDS:
Main MSDN page: http://msdn2.microsoft.com/en-us/library/aa965362.aspx
Adding data to index: http://msdn2.microsoft.com/en-us/library/bb231248.aspx
Querying the index: http://msdn2.microsoft.com/en-us/library/bb266517.aspx
SDK: http://www.microsoft.com/downloads/details.aspx?familyid=645300ae-5e7a-4ce7-95f0-49793f8f76e8&displaylang=en
Searchable file types: http://www.microsoft.com/windows/desktopsearch/technical/searchtype.mspx
Tags: doc_help
Fixed in build: 1198

Relationships

related to 0000885 new MediaMonkey 4 Use 'madman' fuzzy search for getting track details from amazon 
parent of 0005134 closedLudek MediaMonkey 4 Searching for special chars like '.' no longer works with new full-text search engine 
parent of 0005280 closedLudek MediaMonkey 4 Boolean Search: Add support for 'NOT' (in addition to '-') 
parent of 0005104 closedLudek MediaMonkey 4 Search bar: Boolean 'OR' should be localizable 
parent of 0005133 closedjiri MediaMonkey 4 Improved context help for Search bar 
parent of 0005186 closedLudek MediaMonkey 4 Quick Search with "" (quoted strings) doesn't work as expected 
parent of 0005004 closedLudek MediaMonkey 4 Full-text-search: characters such as '.' should be removed from the query 
parent of 0005179 closedLudek MediaMonkey 4 Quick search for : fails to respect field selection 
parent of 0005147 closedLudek MediaMonkey 4 Search for exact text containing special characters doesn't always work 
parent of 0005565 closedLudek MediaMonkey 5 Full-text search only finds prefixes (gives poor results in Oriental languages) 
has duplicate 0001337 closedrusty MediaMonkey 4 Google style fuzzy searching in the toolbar 
has duplicate 0004503 closedLudek MediaMonkey 4 Search for exact word is no longer possible via search bar (if the word is not quoted) 
related to 0003464 feedbackjiri MediaMonkey 4 Make Search bar options more clear 
related to 0002570 closedpetr MediaMonkey 4 Library node should show all tracks in Library 
related to 0004395 closedLudek MediaMonkey 4 Configurability re. which fields the searchbar searches 
related to 0003754 closedLudek MediaMonkey 4 Search bar: Configureable search mode on a per root node basis 
related to 0003223 feedbackrusty MediaMonkey 4 search box actions should be available in hot key 
related to 0004088 closedLudek MediaMonkey 4 Search Results node is filled with non-sensical searches --> Optimize timer 
related to 0003760 closedLudek MediaMonkey 4 Searchbar returns different results for 'Search selected' vs 'entire library' 
related to 0004147 closed MediaMonkey 4 Unclear search mechanism for multiple-value tags: ';' can represent OR or AND 
related to 0005639 closedLudek MediaMonkey 4 Full text search yields poor results when searching for Drive paths 

Activities

jiri

2007-05-09 12:08

administrator   ~0009120

Assigning to Rusty for a review - will discuss over IM.

rusty

2007-05-09 16:17

administrator   ~0009124

I'm all for improved searching. As to which approach to use, I'd say that we can leave this as a technical decision, assuming that:
-search would work as is in the absence of a search engine
-any of the various underlying search engines would result in roughly similar performance
The only other approach that might be worth considering is to use an open source desktop search engine instead of an SQL one. see: http://www.searchtools.com/tools/tools-opensource.html

Other key requirements:
-the tool should support 'fuzzy' searches (e.g. if the user replaces an accented e with a plain e).
-Lastly, can this be implemented as a plugin so that it can be done post release (e.g. script another entry to be added to the quicksearch bar)?

--------------
Jiri's response:
I was thinking about WDS because it would also let users reach MM results from WDS itself (i.e. searching for ABBA in WDS would list not only HTML pages, but also MP3, OGG, ... tracks by ABBA).

I'm currently raising this only for discussion; there are many unclear things, e.g.
 - What to do when WDS isn't installed: some supplementary search (like the current one?) would have to be present.
 - WDS doesn't currently seem to handle accents properly; at least in my test it didn't find what it should in Czech. Maybe Google Desktop Search could be checked out and even distributed with MM?

---------------------------
The problem with bundling any external desktop search engine is that the user may already have another one running. However, it's unrealistic to create plugins for each engine unless we are able to modify the framework so that users could create such plugins.

That leaves us with 2 real options:
a) Bundle GDS or WDS (the pro of GDS is that it involves revenue; the cons are that it isn't installed by default on any systems, and Google may include competitive products in future Google Packs).
b) Use an internal indexer

I suspect that option b) might be the best approach from a usability perspective, but that a) with GDS could also be a decent approach.

jiri

2007-08-22 20:59

administrator   ~0010245

Deferred past 3.0.

rusty

2008-07-14 15:40

administrator   ~0014365

I pretty much agree with the spec at:
http://www.jirihajek.net/MMwiki/index.php/Full-text_search

The only additions I would suggest are:
1) It should be possible to search by other fields in addition to those described in the spec. e.g. in addition to artist:word(s) and year: the following should be supported:
Album:word(s)
genre:
rating:x..y
grouping:
composer:
conductor:

2) There should be a shorthand means of supporting these (e.g.):
ar:
y:
al:
g:
r:
g:
cp:
cd:

I'm not sure of the best way to define these shortcuts though...

3) There should be a way of supporting multiple attributes for Artists, Genres, Composers. E.g. artist:U2 King should return tracks that have multiple artists including both U2 AND B.B. King. (This was probably already planned; I just want to make certain.)

4) Is there a need to support a 'Does not contain' operator?

Note: I'm also making several issues related to this bug since they should probably be modified at the same time.

jiri

2008-07-15 11:32

administrator   ~0014370

1-3) are now added to the spec.

4) I'm sure that users will find some good uses for this feature, I can imagine e.g. 'beatle -beatles', or more specifically 'beatle -artist:beatles'. Or 'rock -hard'.

rusty

2008-07-15 17:40

administrator   ~0014372

using '-' for NOT might be a problem since many tracks have '-' somewhere within. Any ideas how to get around this problem?
e.g. Iron Maiden-The Best of could be a problem.

Or perhaps we just agree that if the format is " -word" that it's ok to use '-' as NOT.

jiri

2008-07-17 16:13

administrator   ~0014379

'-' will mean NOT only if it directly precedes a word (or phrase), i.e.:
-word
-"a phrase"

Any other use of '-' doesn't make much sense in the full-text search, since it's one of the delimiters (like '.', ',', etc.) that are ignored for search and indexing purposes. For example, the search string 'AC/DC' will find both 'AC DC' and 'AC-DC'.
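The rule above can be sketched in a few lines of Python. This is only an illustration, not the MM implementation: the delimiter set, the function names, and the decision to treat a delimiter-joined word such as 'Maiden-The' as a sequence of adjacent words are all assumptions, and matching here is on whole words only.

```python
import re

# Assumed delimiter set: whitespace plus '-', '.', ',' and '/'.
# The real index may treat more characters as delimiters.
DELIMITERS = re.compile(r"[\s\-.,/]+")

# A term starts after whitespace (or at the start of the query): an optional
# leading '-', then either a quoted phrase or a bare run of non-space characters.
TERM = re.compile(r'(?<!\S)(-?)(?:"([^"]*)"|(\S+))')


def tokenize(text):
    """Split text into lowercase words, dropping delimiter characters."""
    return [w for w in DELIMITERS.split(text.lower()) if w]


def parse_query(query):
    """Return (required, excluded); each is a list of word sequences."""
    required, excluded = [], []
    for negated, phrase, word in TERM.findall(query):
        words = tokenize(phrase or word)
        if words:  # a lone '-' yields no words and is ignored
            (excluded if negated else required).append(words)
    return required, excluded


def matches(text, query):
    """True if text contains every required term and no excluded term."""
    words = tokenize(text)

    def contains(seq):
        n = len(seq)
        return any(words[i:i + n] == seq for i in range(len(words) - n + 1))

    required, excluded = parse_query(query)
    return all(contains(t) for t in required) and not any(contains(t) for t in excluded)
```

With this sketch, 'rock -hard' excludes tracks containing 'hard', while the '-' inside 'Iron Maiden-The Best of' is just a delimiter, and a '-' typed on its own has no effect until a word follows it.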

jiri

2008-08-01 12:04

administrator   ~0014439

I implemented the core of this feature, i.e. SQLite support, full-text index maintenance, etc. It's all in the SVN branch https://svn1.cvsdude.com/jirik/MediaMonkey/Branches/Full-text Search, where we should finish the feature before it's merged into the main branch.

An updated SQLite.dll is needed, I uploaded it to FTP, folder SQLite-FullText.

Some details of the implementation are described in wiki: http://www.jirihajek.net/MMwiki/index.php/Full-text_search

Ludek

2008-08-08 21:01

developer   ~0014446

Last edited: 2008-08-25 17:39

I implemented everything described in the wiki article at http://www.jirihajek.net/MMwiki/index.php/Full-text_search and it is all in the SVN branch ../Branches/Full-text Search. I also added support for localized strings (e.g. in Czech, 'interpret:abba' = 'artist:abba').

I also added several DUnit regression tests.

As already mentioned, we should review some of the field names, e.g.
I would replace the current 'Track#:1..3' with 'Track:1..3', 'Disc#:1..3' with 'Disc:1..3', 'OrigDate:2005..' with 'OrigYear:2005..', etc.

Fields like 'Album Artist' need to be written without the space, e.g. 'AlbumArtist:Bjork'
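For reference, range values such as '1..3' or the open-ended '2005..' can be parsed roughly as follows. This is a sketch under assumptions: the function names are hypothetical, both bounds are treated as inclusive, and the '..hi' form is supported only by analogy with 'lo..'.

```python
def parse_range(value):
    """Parse 'lo..hi', 'lo..', '..hi' or a single number into (lo, hi).

    A missing bound is returned as None (unbounded on that side).
    """
    lo, sep, hi = value.partition("..")
    if not sep:
        # A single number is treated as a range of one value.
        n = float(lo)
        return n, n
    return (float(lo) if lo else None, float(hi) if hi else None)


def in_range(x, bounds):
    """True if x lies within the (assumed inclusive) bounds."""
    lo, hi = bounds
    return (lo is None or x >= lo) and (hi is None or x <= hi)
```

Because the split happens on the first '..', a fractional lower bound such as the one in 'rating:3.5..5' is parsed as 3.5 and 5 rather than being cut at the decimal point.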

The functionality can be tested/reviewed by using the MM_FTS.exe and SQLite3.dll uploaded to our FTP, folder SQLite-FullText.

jiri

2008-08-25 20:35

administrator   ~0014480

Maybe we could ignore non-alphanumeric characters in field names, i.e. no matter whether user enters 'Track' or 'Track#', the result would always be the same.

rusty

2008-08-25 20:51

administrator   ~0014482

I think that for the purpose of Searching, all fields that are entered in the searchbar should be entered exactly as they appear. e.g. Album Artist:Abba should be valid--otherwise it's unclear to users what they have to enter.

BUT, if we must create 'aliases' for the fields, then we might as well create shorter Aliases as well. e.g.

Album Artist:= AlbumArtist: or alar:

Though as Jiri pointed out, it may be too complex to do this in a localizable fashion.

Ludek

2008-08-26 11:16

developer   ~0014486

Last edited: 2008-08-26 11:30

OK, I enhanced full-text searching so that both 'track:5' and 'track#:5' work.

Re: searching for 'Album Artist:Abba'
The problem is that a space acts as the AND operator, so according to the FTS3 engine this should mean: any text field contains the word 'Album' and the artist field contains the word 'Abba'. But I see your point and agree that this should be overridden in such cases.

I fixed it, so 'Album Artist:Abba' now means that the albumartist field contains the word 'Abba'.
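One way to get both behaviours ('track#:5' equals 'track:5', and 'Album Artist:Abba' resolves to the albumartist field) is to normalize the text before the colon and prefer the longest known field name. The following is a sketch only; the FIELDS table, the function names and the single-field limitation are assumptions, not the actual MM code.

```python
# Hypothetical table: normalized name -> canonical field.
# 'interpret' stands in for a localized (Czech) alias of 'artist'.
FIELDS = {
    "artist": "artist",
    "albumartist": "albumartist",
    "album": "album",
    "track": "track",
    "disc": "disc",
    "interpret": "artist",
}


def normalize(name):
    """Lowercase and drop every non-alphanumeric character ('Track#' -> 'track')."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def split_field(query):
    """Return (free_words, field, value) for a query with at most one 'field:value'."""
    head, sep, value = query.partition(":")
    if not sep:
        return query.split(), None, None
    words = head.split()
    # Try the longest run of words before ':' first, so that
    # 'Album Artist' wins over plain 'Artist'.
    for start in range(len(words)):
        key = normalize(" ".join(words[start:]))
        if key in FIELDS:
            return words[:start], FIELDS[key], value.strip()
    return query.split(), None, None
```

The trade-off is the one described above: with this rule a user can no longer express "the word 'Album' anywhere AND artist contains 'Abba'" by typing 'Album Artist:Abba'.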

The functionality can be tested/reviewed by using the MM_FTS.exe and SQLite3.dll uploaded to our FTP, folder SQLite-FullText.

rusty

2008-09-03 21:07

administrator   ~0014525

I tried to do a quick test of this however I found that:
1) the FTS build doesn't permit MM 3.0.4 libraries to be scanned or imported
2) the FTS build regularly experiences SQL errors when attempting to scan a fresh DB

Ludek

2008-09-04 10:19

developer   ~0014529

Last edited: 2008-09-04 10:21

Sorry for that, I probably uploaded a bad SQLite3.dll that caused this. I also forgot to tell you that you should back up your DB first :-(

In any case, I have now uploaded the right files.
I tested the files and they work fine:
1. Installed 1185
2. Created a DB (added some tracks)
3. Downloaded MM_FTS.exe and SQLite3.dll to the same directory.
4. Ran MM_FTS.exe
-> DB is updated, works fine
5. Optionally, added other tracks to the DB

So please re-download the files and it should work fine for you too.

rusty

2008-09-09 18:24

administrator   ~0014554

It's working now. There are a couple of search results that I'm not sure about:
a) Search for: dargent _doesn't_ find d'argent
b)i) Search for: hurricane _doesn't find_ 'Hurricane'. Shouldn't it?
 ii) Search for: hurr _doesn't find_ Hurricane. Shouldn't it? Also, the search results don't update dynamically as the user is typing.
c) Search for: come with me _finds_ come away with me. I think this is intentional--just want to confirm.
d) Search for: rating:3.5..5 doesn't work.
e) Adding - for NOT gets rid of all current search results. It should only take effect after the user types something after the -

Ludek

2008-09-10 10:36

developer   ~0014562

RE: a) Search for: dargent _doesn't_ find d'argent
-> I think that this works fine, the ' is not a delimiter like - or / and the word "dargent" differs from "d'argent".

RE: b)i) Search for: hurricane _doesn't find_ 'Hurricane'. Shouldn't it?
-> I cannot reproduce, works fine for me, searching is not case sensitive in my case, could you send me the file or find any steps to reproduce?

RE: b)ii) Search for: hurr _doesn't find_ Hurricane. Shouldn't it? Also, the search results don't update dynamically as the user is typing.
-> This is due to spec, see http://www.jirihajek.net/MMwiki/index.php/Full-text_search and the 'Searching of full words' paragraph, it is all about whether MM should automatically add '*'.

c) Search for: come with me _finds_ come away with me. I think this is intentional--just want to confirm.
-> Yes, it now really works like a full-text search, i.e. searching for 'come away with me' can match when e.g. the words 'come' and 'away' are in the title, the word 'with' is in the path and the word 'me' is in the album field. Is this clear?
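To make the behaviour in c) concrete, here is a minimal sketch of "every query word must appear in some field". The track dictionary, the function names and the `\w+` word splitting are illustrative assumptions, not the real index.

```python
import re


def words(text):
    """Return the set of lowercase words in text."""
    return set(re.findall(r"\w+", text.lower()))


def track_matches(track, query):
    """True if each query word occurs in at least one field of the track.

    Words need not be adjacent, in order, or in the same field.
    """
    indexed = set()
    for value in track.values():
        indexed |= words(value)
    return words(query) <= indexed


# Hypothetical track whose matching words are spread over three fields.
track = {"title": "Come Away", "path": "C:/Music/With/track01.mp3", "album": "Me"}
```

Here both 'come with me' and 'come away with me' match the track, which is why an unquoted search is looser than an exact-phrase search.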

d) Search for: rating:3.5..5 doesn't work.
-> fixed, new MM_FTS.exe uploaded

e) Adding - for NOT gets rid of all current search results. It should only take effect after the user types something after the -
-> fixed, new MM_FTS.exe uploaded

peke

2008-11-12 02:12

developer   ~0014908

The changes made in SVN revision 6440 prevent MM from starting on my PC; it reports an SQLite3.dll error. I see that Petr has uploaded my settings to FTP; if you need the whole library, I could upload it as well (18 MB compressed).

Ludek

2008-11-13 10:54

developer   ~0014930

The related DB update issue has been fixed in 3.0.1.1192. Another issue (not related to this) is tracked in 0004926.

RE: b)ii) Search for: hurr _doesn't find_ Hurricane
As Petr suggested, we should rather add a config entry: a checkbox for whether to search only full words (like Google) or not. Otherwise I expect that users will assume that searching for 'an' finds 'another', as it has so far.

rusty

2008-11-26 17:18

administrator   ~0015149

On tools > options > Search, we can add the following to the top:

[ ] Search whole words only

Tooltip: Causes search for 'an' to not show tracks containing 'another'

Note: do you propose that this works for all searches? or only for the searchbar?

jiri

2008-11-26 23:17

administrator   ~0015164

I'm not sure whether this really needs to be an option, users will need it to act differently in different cases anyway. I'd propose to wait for more feedback on the feature.

If you think we really have to add it, then ok, but it should be applied to Search bar only.

rusty

2008-11-27 18:36

administrator   ~0015199

Jiri, the reason why I think it IS required right away is that currently (1195) MM restricts searches to whole words (unlike previous versions of MM). I personally don't like this behaviour, and expect that a lot of users will find it annoying as well.

jiri

2008-11-28 12:22

administrator   ~0015235

OK, we can certainly add it (even though I'd like to avoid adding too many settings to the Options dialog). Note, however, that due to the design of FTS in SQLite, it only finds words _starting_ with the entered characters; there is currently no way to use full-text search to find a term in the middle of a word.
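As a general illustration of why a term index favours prefixes (this is not a description of SQLite's internals, and the names are hypothetical): when terms are kept in sorted order, all words starting with a prefix sit next to each other and can be found with a binary search, whereas a mid-word match has to scan every term.

```python
import bisect


def prefix_matches(sorted_terms, prefix):
    """Return all terms starting with prefix, using binary search on a sorted list."""
    start = bisect.bisect_left(sorted_terms, prefix)
    found = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        found.append(term)
    return found


def substring_matches(terms, fragment):
    """Return all terms containing fragment anywhere; needs a full scan."""
    return [t for t in terms if fragment in t]


terms = sorted(["abba", "an", "another", "b", "hurricane", "king"])
```

With these terms, the prefix 'an' finds 'an' and 'another' and 'hurr' finds 'hurricane', but 'bb' finds nothing as a prefix; only the full scan locates it inside 'abba'.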

rusty

2008-11-28 17:10

administrator   ~0015239

I think that's fine.

Note: I think that 'Search whole words only' should be disabled by default, unless we find that it doesn't work as well as I expect.

Ludek

2008-12-01 10:44

developer   ~0015268

Fixed in 1198.

Added the '[ ] Search whole words only' checkbox, unchecked by default.

Owyn

2008-12-06 13:08

updater   ~0015448

Search for contained string does not seem to work. Per previous example, "bb" to find "ABBA".

Tried e.g. *bb. No luck.
http://www.mediamonkey.com/forum/viewtopic.php?f=6&t=34552

Ludek

2008-12-06 14:19

developer   ~0015453

Last edited: 2008-12-06 17:14

See Jiri's last note (15235). That is why it doesn't work for the 'Entire Library' quick search. However, searching for '*bb' should work in 'Current selection' mode, which doesn't use SQLite's FTS3 engine.

Owyn

2008-12-06 14:37

updater   ~0015454

Thanks. Tested it. "bb" finds, e.g. "B.B. King" but not Abba, "*bb" finds Abba, etc.

rusty

2008-12-07 04:31

administrator   ~0015506

Verified 1201.