View Issue Details

ID: 0002862
Project: MMW v4
Category: Other
View Status: public
Last Update: 2007-06-17 19:09
Reporter: Ludek
Assigned To:
Priority: urgent
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 3.0
Fixed in Version: 3.0
Summary: 0002862: Improve OPML v 1.0 parser for podcast directories (error when browsing directory)
Description: Currently we have two OPML parsers implemented (v1.0 and v2.0). The latter works fine, but the former is sometimes too slow (for large OPML directories) and does not work correctly for some OPML directories.
Tags: No tags attached.
Fixed in build: 1043

Relationships

related to 0002869 (closed, rusty): The '+' sign always appears in the podcast directory hierarchy even when there's no subdirectory

Activities

Ludek

2007-02-27 18:53

developer   ~0008646

I'm going to solve it with something similar to the Boyer-Moore or Knuth-Morris-Pratt algorithm. We know the patterns ('<outline>', '</outline>', '/>'), so we do not need to implement an algorithm that builds an automaton for an arbitrary pattern. My implementation will therefore be much simpler, and the time complexity will be O(n).
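The idea above can be illustrated with a minimal Python sketch of a single left-to-right scan over fixed patterns. This is not the MediaMonkey implementation; the function name `scan_outlines` is hypothetical, and the sketch assumes opening tags carry attributes (so it matches on `<outline` rather than `<outline>`) and that attribute values never contain a literal `>`.

```python
def scan_outlines(text):
    """Return (depth, tag_text) for every <outline ...> tag, in document order.

    Hypothetical sketch: the patterns are fixed and short, so each position
    is examined a bounded number of times and the scan is O(n).
    """
    OPEN, CLOSE = "<outline", "</outline>"
    result = []
    depth = 0
    i = 0
    n = len(text)
    while i < n:
        if text.startswith(CLOSE, i):
            # Closing tag: go one level up in the hierarchy.
            depth -= 1
            i += len(CLOSE)
        elif text.startswith(OPEN, i):
            # Assumption: no '>' appears inside attribute values.
            end = text.find(">", i)
            if end == -1:
                break  # truncated tag, stop scanning
            result.append((depth, text[i:end + 1]))
            # A tag ending in '/>' has no children, so the depth is unchanged.
            if text[end - 1] != "/":
                depth += 1
            i = end + 1
        else:
            i += 1
    return result
```

Because the scanner jumps past each tag it recognises and never backtracks, the cost grows linearly with the size of the OPML data.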

Ludek

2007-02-28 16:21

developer   ~0008659

Resolved in revision 2471.

rusty

2007-03-01 20:30

administrator   ~0008689

I'm not sure if this is a regression or not, but when I try to add the following OPML 1.1 directory, MM generates an AV:

http://www.digitalpodcast.com/opml/digitalpodcast.opml

Ludek

2007-03-01 23:10

developer   ~0008697

I cannot reproduce the AV, but adding this OPML directory takes about 30 seconds in my case.

This is because it is a very large OPML directory: 1.5 MB of HTTP data are read at once.
The truth is that when simply adding the OPML directory there is no need to read all this data. I should read only enough to find the name of the directory (header info), and when expanding the directory node I could try to read it piecewise somehow, e.g. node by node.

Nevertheless, reading the whole directory will always take this long.

Ludek

2007-03-01 23:17

developer   ~0008698

Fixed - Only the first 4 KB of the OPML data are read to find the header info, so that adding a large OPML directory does not take a long time.
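A rough Python sketch of this fixed-size header read follows. It is only an illustration, not the code from the revision: the function name `read_directory_name` is hypothetical, and it assumes the directory name is stored in a `<title>` element near the start of the data.

```python
import re

HEADER_LIMIT = 4096  # the "first 4 KB" mentioned in the note above

# Assumption: the directory name is the text of the <title> element.
_TITLE_RE = re.compile(rb"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def read_directory_name(stream, limit=HEADER_LIMIT):
    """Read at most `limit` bytes from `stream` and return the title, or None."""
    head = stream.read(limit)
    match = _TITLE_RE.search(head)
    if match is None:
        return None  # no name within the limit; the rest is never read
    return match.group(1).decode("utf-8", errors="replace").strip()
```

Any binary file-like object works as `stream`, for example an open HTTP response, so the remaining data of a large directory is not downloaded at this point.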

rusty

2007-03-02 05:59

administrator   ~0008702

Tested build 1019 and it looks like it's not completely fixed.
 
I'm now able to add the OPML directory, but when I click it in MM, the application appears frozen and a debug log is automatically generated (I sent this log to Ludek).

Ludek

2007-03-02 09:14

developer   ~0008708

Last edited: 2007-03-02 09:24

Yes, I have fixed only the case of adding it (it is enough to read only 4 KB of the data to get the header info).

In the case of expanding the directory, all the data have to be read (it is strange that it freezes; in my case it freezes only for about 30 seconds while the data are read).

There are probably two ways to solve it:

a) Read it piecewise (this is not trivial, and it will not be a good solution in all cases, e.g. for the http://podcast.com/opml/1817 OPML 1.0 directory, where there is only one child on expanding).

b) Read it in the background: probably, when the OPML directory is added, all its data would be stored somewhere on the HDD.
Advantages:
 - Expanding would be much faster because the data are read from the HDD.
 - The user can browse it offline (but cannot subscribe to a podcast).
Disadvantages:
 - The data would not be up to date, but we can add an update command similar to the one for podcast subscriptions.

I lean toward b).
Which do you prefer?

rusty

2007-03-02 16:00

administrator   ~0008713

I expect that b) would work fine, as long as there's some feedback given to the user that this is happening (e.g. mini status bar).

jiri

2007-03-05 10:26

administrator   ~0008720

Seems that b) should work fine.

Also, I don't like the description in SVN Revision 2437: 'Only the first 4 KB of an OPML data are read to find out a header info...'. Will this be removed when b) is implemented? If not, it should at least be implemented dynamically, i.e. read HTTP data as long as we haven't found the info we are looking for, and then stop.

Ludek

2007-03-05 18:54

developer   ~0008725

I must disagree - 4 KB is enough to find the info (just the name of the directory). If the info is not included in the directory's OPML data, your solution would search through all the data uselessly.
This is why I think my solution is better.

jiri

2007-03-06 10:56

administrator   ~0008726

OK, in case there's some chance that the directory name isn't included anywhere in the file, it possibly makes sense to limit the amount of data read. However, it still seems to make sense to do it in the background and read only until the directory name is found. In any case, it doesn't seem to be too important an issue...
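The two positions in this exchange can be combined: read in small chunks, stop as soon as the name is found, and give up after a fixed cap. The Python sketch below is only an illustration under assumptions. The function name `find_directory_name`, the chunk size, and the 64 KB cap are all hypothetical, and the name is again assumed to be in a `<title>` element.

```python
import re

# Assumption: the directory name is the text of the <title> element.
_TITLE_RE = re.compile(rb"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def find_directory_name(stream, chunk_size=1024, max_bytes=64 * 1024):
    """Read chunks until the title is found or `max_bytes` have been read.

    Returns (name_or_None, bytes_read).
    """
    buffer = b""
    while len(buffer) < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - len(buffer)))
        if not chunk:
            break  # end of data
        buffer += chunk
        # Re-searching the whole buffer is simple; its cost is bounded by max_bytes.
        match = _TITLE_RE.search(buffer)
        if match:
            name = match.group(1).decode("utf-8", errors="replace").strip()
            return name, len(buffer)
    return None, len(buffer)
```

With this shape, a directory whose name appears early costs one or two chunks, while a directory without any name costs at most `max_bytes`, which addresses the concern about searching all the data uselessly.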

Ludek

2007-03-08 12:52

developer   ~0008740

Last edited: 2007-03-09 11:04

b) is fixed, i.e. added caching of an added OPML directory's data to the HDD in order to speed up reading/expanding it.

TO DO: The default OPML directories cannot be cached when they are added, because they are added by default. We probably have three possibilities:
1. We could start a background process for this one minute after MM starts?
2. We could include these OPML files somewhere in the MM install folder so that they need not be downloaded per user.
3. We could cache it on the first attempt to expand it.

Ludek

2007-03-15 10:12

developer   ~0008813

Reminder sent to: jiri, rusty

Ludek

2007-03-15 15:20

developer   ~0008815

Last edited: 2007-03-16 08:56

As discussed with Jiri over IM, we've chosen option 3 (i.e. we cache it on the first attempt to expand it, show this in the mini progress bar, and expand it once it is cached). Added an 'Update directory' command to the context menu of the appropriate OPML directory.
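The chosen behaviour (cache on first expand, report progress, refresh on demand) can be sketched in Python as below. This is an assumption-laden illustration rather than the MediaMonkey code: the class `OpmlDirectoryCache`, the `fetch` callable, the `on_progress` callback, and the hashed file naming are all hypothetical.

```python
import hashlib
from pathlib import Path


class OpmlDirectoryCache:
    """Hypothetical on-disk cache for OPML directory data."""

    def __init__(self, cache_dir, fetch):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # assumed callable: url -> bytes (e.g. an HTTP download)

    def _path_for(self, url):
        # One cache file per URL, named by a hash of the URL.
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".opml")

    def expand(self, url, on_progress=None):
        """Return the directory data, downloading it only on the first expand."""
        path = self._path_for(url)
        if not path.exists():
            if on_progress:
                on_progress("Caching " + url)  # stands in for the mini progress bar
            path.write_bytes(self.fetch(url))
        return path.read_bytes()

    def update(self, url, on_progress=None):
        """Force a fresh download, like an 'Update directory' command."""
        if on_progress:
            on_progress("Updating " + url)
        self._path_for(url).write_bytes(self.fetch(url))
```

Later expands read from disk and never call `fetch`, so only the first expand (or an explicit update) pays the download cost. Running the download on a background thread is left out of the sketch.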

rusty

2007-06-11 16:27

administrator   ~0009307

Re-opening: I just observed a EurekaLog error when browsing OPML directories. The log was sent automatically (to Pavel).

rusty

2007-06-11 20:09

administrator   ~0009313

Note also that the following isn't accepted as OPML:
http://www.podcastalley.com/feeds/PodcastAlleyTop50.opml

Ludek

2007-06-16 19:05

developer   ~0009452

Fixed in build 1043: the http://www.podcastalley.com/feeds/PodcastAlleyTop50.opml
directory is now accepted.

Ludek

2007-06-16 19:23

developer   ~0009454

Last edited: 2007-06-16 19:24

The threading error is fixed in build 1043 too.

rusty

2007-06-17 19:09

administrator   ~0009462

Verified 1043.