View Issue Details

ID: 0006453 | Project: MMW v4 | Category: DB/FileMonitor | View Status: public | Last Update: 2010-12-05 13:54
Reporter: rusty | Assigned To: (none)
Priority: immediate | Severity: major | Reproducibility: always
Status: closed | Resolution: fixed
Product Version: 4.0
Target Version: 4.0 | Fixed in Version: 4.0
Summary: 0006453: Default video metadata is often missing / incorrect
Description: When MM4 scans video files, they almost never have metadata stored in tags. Consequently, MM infers the metadata from the path/filename. Unfortunately, the current inferences are almost always incorrect:

a) often Titles are blank
b) often Year is empty, even though the year is usually included in the filename in a demarcated fashion

This is likely because MM currently uses metadata inference rules that were designed for audio. Here are some basic conventions that seem to be used in titling videos:


In summary, there's no real consistent naming convention. But there are a few common patterns:
- Title is usually the beginning of the path or filename, and spaces in the title can be represented by spaces, periods, or underscores.
- Title is often followed by dvdrip or bdrip (capitalization is irrelevant; sometimes there's a space between 'dvd' and 'rip'; it may or may not be demarcated by brackets [], (), {})
- Title is often followed by the year (which may or may not be demarcated by brackets [], (), {})
- For Series or TV Series, the filename may or may not match the directory. In either case, the filename will include additional metadata about the episode (the episode number, e.g. S01E02, and sometimes the episode title)

Tags: No tags attached.
Fixed in build: 1330

Relationships

related to 0006435 (closed, Ludek): Add video properties
related to 0006794 (closed, Ludek): Publisher Mask, Publisher Column and Tree Node are missing
related to 0006806 (new): Optimized Auto Tools algorithms
related to 0006934 (resolved, Ludek): Not all metadata values are available for Masks in Auto-Tools

Activities

peke

2010-08-28 14:20

developer   ~0020467

We should consider using the scene naming rules for video files, where most of the data is in a standardized form.

Example for TV Shows:
<Show Title>.S<Season#>E<Episode#>.<Source>.<Encoding format>-<Scene group>
or
<Show Title> - <Season#>x<Episode#> - <Episode Title> - <Skip> (http://www.addic7ed.com/)
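To illustrate, the two patterns above could be turned into regular expressions. This is a hypothetical Python sketch, not MediaMonkey code: the regex names and the parse_scene_name() helper are assumptions, and real scene names vary more than these patterns allow.

```python
import re
from typing import Optional

# <Show Title>.S<Season#>E<Episode#>.<Source>.<Encoding format>-<Scene group>
SCENE_DOTTED = re.compile(
    r"^(?P<show>.+?)\.S(?P<season>\d{1,2})E(?P<episode>\d{1,3})"
    r"(?:\.(?P<tags>.+?))?(?:-(?P<group>[A-Za-z0-9]+))?$",
    re.IGNORECASE,
)
# <Show Title> - <Season#>x<Episode#> - <Episode Title> - <Skip>
DASHED = re.compile(
    r"^(?P<show>.+?) - (?P<season>\d{1,2})x(?P<episode>\d{1,3})"
    r" - (?P<title>.+?)(?: - .*)?$"
)


def parse_scene_name(stem: str) -> Optional[dict]:
    """Return show/season/episode fields for a filename stem, or None."""
    m = SCENE_DOTTED.match(stem)
    if m:
        return {
            "show": m["show"],
            "season": int(m["season"]),
            "episode": int(m["episode"]),
            "group": m["group"],  # None when no "-<Scene group>" suffix exists
        }
    m = DASHED.match(stem)
    if m:
        return {
            "show": m["show"],
            "season": int(m["season"]),
            "episode": int(m["episode"]),
            "episode_title": m["title"],  # stops at the next " - " (the <Skip> part)
        }
    return None
```

For example, parse_scene_name("The.IT.Crowd.S02E06.WS.PDTV.XviD-RiVER") yields show 'The.IT.Crowd', season 2, episode 6 and group 'RiVER'. An episode title that itself contains ' - ' would be truncated by the second pattern; that is a deliberate simplification.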

jiri

2010-09-28 21:31

administrator   ~0020611

Since our standard mask-based patterns probably don't help much here, it would be best to implement some logic in our code (for videos only).

It could:
1. Take the Title from the filename (and remove any known junk text or characters, and patterns like the year or series number, if we are able to locate them using the rules below).
2. In case the filename is too generic (some common patterns here?), it could take the folder name.
3. Try to find year - as a 4-digit number. There probably can be some false matches, but we can't do much about it.
4. Try to find series and episode numbers (like SnnEmm pattern).
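Steps 1 to 4 could be sketched roughly as follows. This is a minimal Python illustration, not the logic that shipped in MM; the junk-word list, the "generic name" pattern and the infer_video_metadata() name are assumptions made for the example.

```python
import re
from pathlib import PureWindowsPath

# Illustrative patterns only; JUNK and GENERIC are assumed lists.
SXXEYY = re.compile(r"\bS(\d{1,2})\s?E(\d{1,3})\b", re.IGNORECASE)
YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")
JUNK = re.compile(r"\b(?:dvd\s?rip|bd\s?rip|dvdscr|xvid|divx)\b", re.IGNORECASE)
GENERIC = re.compile(r"^(?:title|track|video|movie)?[\s._-]*\d*$", re.IGNORECASE)


def infer_video_metadata(path: str) -> dict:
    """Steps 1-4: title from filename, folder fallback, year, SnnEmm."""
    p = PureWindowsPath(path)
    stem = p.stem
    meta: dict = {}
    cut = len(stem)  # the title ends where the first recognised token starts

    # Step 4: season/episode in SnnEmm form.
    m = SXXEYY.search(stem)
    if m:
        meta["season"], meta["episode"] = int(m.group(1)), int(m.group(2))
        cut = min(cut, m.start())

    # Step 3: the first 4-digit number is taken as the year (false matches possible).
    y = YEAR.search(stem)
    if y:
        meta["year"] = int(y.group(1))
        cut = min(cut, y.start())

    # Step 1: known junk text also terminates the title.
    j = JUNK.search(stem)
    if j:
        cut = min(cut, j.start())

    title = stem[:cut].strip(" ._-[({")
    # Step 2: fall back to the folder name when the filename is too generic.
    if not title or GENERIC.match(title):
        title = p.parent.name
    meta["title"] = title
    return meta
```

With this sketch, 'F:\Films\Up In The Air.2009.DVDSCR.XviD-CAMELOT.avi' gives Title 'Up In The Air' and Year 2009, while a generic name such as 'TITLE01.avi' falls back to its folder name.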

peke

2010-09-28 22:03

developer   ~0020613

2. The patterns I posted are the ones most commonly used. The folder format is mostly "<Show Title>\Season <Season>\<Show Title>.S<Season#>E<Episode#>.<Source>.<Encoding format>-<Scene group>"
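Assuming the folder layout above, the Series and Season could be read from the folders alone. A rough Python sketch; series_from_folders() and the SEASON_DIR pattern are hypothetical.

```python
import re
from pathlib import PureWindowsPath
from typing import Optional

# Assumed layout: <Show Title>\Season <N>\<file>
SEASON_DIR = re.compile(r"^Season\s+(\d{1,2})$", re.IGNORECASE)


def series_from_folders(path: str) -> Optional[dict]:
    """Return the show title and season number implied by the folder layout, if any."""
    p = PureWindowsPath(path)
    m = SEASON_DIR.match(p.parent.name)
    if not m:
        return None  # the parent folder is not a "Season N" folder
    return {"series": p.parent.parent.name, "season": int(m.group(1))}
```

Roman-numeral folders such as 'Season I' are not matched by this pattern.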

rusty

2010-10-20 14:48

administrator   ~0020881

Note: I just noticed that when I scan MP3 files without tags _from a network location_ (e.g. \\192.168.0.147\Qdownload\Swayzak_-_collected\swayzak_-_route_de_la_slack-2006\108-bergheim_34--random_access_memory_(swayzak_superico_mix)-dh.mp3 ), the filename isn't imported into the Title field.

This may be (another) reason why movie Titles are often blank in my environment (all video is stored on the network).

Ludek

2010-10-25 13:43

developer   ~0020959

Last edited: 2010-10-25 13:46

Fixed in build 1319 and covered by automated regression testing.

The following metadata is now properly extracted from filename:

'F:\Films\The Big Bang Theory\The Big Bang Theory s01e01.avi'
properly extracts
Title = 'The Big Bang Theory'
Season = '1'
Episode = '1'

'F:\Films\IT Crowd\The.IT.Crowd.S02E06.WS.PDTV.XviD-RiVER.avi'
properly extracts
Title = 'The.IT.Crowd'
Season = '2'
Episode = '6'

'F:\Films\Up In The Air.2009.DVDSCR.XviD-CAMELOT.avi'
properly extracts
Title = 'Up In The Air'
Year = 2009

'F:\Films\A Clockwork Orange\A Clockwork Orange.avi'
properly extracts
Title = 'A Clockwork Orange'

rusty

2010-11-04 13:09

administrator   ~0021169

Tested 1320 and there are still numerous problems:

1) Scanning the following paths --> No Metadata:
\\NAS2\Qdownload\TV - Arthur (01-10)\Season 05\5x01 - Arthur & The Big Riddle; Double Dare.avi
\\NAS2\Qdownload\the scooby doo show complete series\scooby doo show season 1\The Scooby-Doo Show 110 Scared Alot In Chamalot.avi
This occurs consistently, and I'm not sure why even the Title field is empty?!

2) For TV Series, which usually store the 'Series' in the directory immediately preceding the filename, the Series is consistently missing, e.g.:

\\192.168.0.147\Puzzle\TV\The Big Bang Theory\01x04 - The Luminous Fish Effect.avi
\\192.168.0.147\Puzzle\Kids TV\Phineas and Ferb Season 1\05 Lights Candace Action.avi
\\192.168.0.147\Puzzle\TV\30 Rock\30 Rock Season 4\30 Rock S04 E12 - Vema.avi
\\192.168.0.147\Puzzle\Documentary\BBC - Planet Earth\BBC.Planet.Earth.11of11.Ocean.Deep.DVB.XviD.MP3.avi
\\192.168.0.147\Puzzle\TV\Chuck - Season 1\Chuck 01x07 - Chuck Versus the Alma Mater.avi
\\192.168.0.147\Puzzle\Kids TV\Pink Panther\Pink.Panther.023-Super.Pink-DvdRip.xvid.ac3.avi

a) This logic should only be used when the preceding directory contains >2 video files. E.g. in the following example, the preceding directory contains only 1 or 2 video files, so it shouldn't be treated as a 'season':
\\192.168.0.147\Puzzle\Kids Movies\Coraline.DVDRip.XviD\Coraline.DVDRip.XviD\Coraline.DVDRip.XviD.avi

b) If the preceding directory has xxx - Season #, then the portion with 'Season #' should be stripped out. e.g.
\\192.168.0.147\Puzzle\TV\Chuck - Season 1\Chuck 01x07 - Chuck Versus the Alma Mater.avi
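Rules a) and b) could look something like this. The sketch is hypothetical Python: the extension list and the series_from_parent() name are assumptions; only the >2 threshold and the stripping of 'Season #' come from the comment above.

```python
import os
import re
from typing import Optional

VIDEO_EXTS = {".avi", ".mkv", ".mp4", ".mpg", ".wmv"}  # assumed list of video extensions
# A trailing "Season N" with any separator before it, e.g. "Chuck - Season 1".
SEASON_SUFFIX = re.compile(r"[\s._-]*\bSeason\s*\d+\s*$", re.IGNORECASE)


def series_from_parent(dir_name: str, files_in_dir: list[str]) -> Optional[str]:
    """Use the parent folder as Series only when it looks like a season folder."""
    videos = [f for f in files_in_dir if os.path.splitext(f)[1].lower() in VIDEO_EXTS]
    if len(videos) <= 2:
        return None  # rule a): one or two videos is more likely a single movie
    series = SEASON_SUFFIX.sub("", dir_name).strip()  # rule b): drop "- Season #"
    return series or None  # a bare "Season 05" folder leaves nothing behind
```

Under these assumptions, 'Chuck - Season 1' holding three episodes yields 'Chuck', while the Coraline folder with a single .avi yields nothing.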


3) Season# and Episode# information was consistently missing for files in the following format (MM should be smart enough to look for this information when multiple tracks are stored in a single directory):

\\192.168.0.147\Puzzle\TV\The Big Bang Theory\01x04 - The Luminous Fish Effect.avi (i.e. ##x## should be treated as season/episode info when multiple tracks are stored in a single directory)

\\192.168.0.147\Puzzle\Kids TV\Phineas and Ferb Season 1\05 Lights Candace Action.avi (i.e. 'Season #' should be treated as the Season number if it appears within a parent directory that has >2 tracks inside; ## or ### should be treated as the Episode# when it appears within the Title, if there are >2 files in the directory). Exception: when ### = 264 (i.e. x264, h.264 or H264).

Here are a bunch of other similar examples (note that for all of these cases, 'Series' was also not identified):
\\192.168.0.147\Puzzle\TV\30 Rock\30 Rock Season 4\30 Rock S04 E08 - Secret Santa.avi
\\192.168.0.147\Puzzle\Documentary\BBC - Planet Earth\BBC.Planet.Earth.09of11.Shallow.Seas.DVB.XviD.MP3.www.mvgroup.org.avi
\\192.168.0.147\Puzzle\TV\Chuck - Season 1\Chuck 01x07 - Chuck Versus the Alma Mater.avi
\\192.168.0.147\Puzzle\Kids TV\X-Men_Evolution_Complete_Series\X-Men Evolution\Season I\Episode 03 - Rogue Recruit.avi
\\192.168.0.147\Puzzle\Kids TV\Pink Panther\Pink.Panther.023-Super.Pink-DvdRip.xvid.ac3.avi
\\192.168.0.147\Puzzle\Kids TV\Clone Wars - Season 2\Star Wars 02x12 The Mandalore Plot.avi
\\192.168.0.147\Puzzle\TV\Terminator.The.Sarah.Connor.Chronicles\Sarah.Connor.Chronicles.S02\Terminator.The.Sarah.Connor.Chronicles.S02E06.HDTV.XviD-0TV.avi
\\192.168.0.147\Puzzle\TV\Weeds\Weeds - Season 4\Weeds - 406 - Excellent Treasures.avi
\\192.168.0.147\Puzzle\Kids TV\The Wonder Years\Wonder Years.s3e035 - The Powers That Be.avi
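The rules in item 3 (##x##, 'Season #' in the parent directory, a bare ## or ### as the Episode#, and the 264 exception) can be sketched as below. This is hypothetical Python; the function name and regexes are assumptions, and files_in_dir stands in for however the scanner counts the files in a folder.

```python
import re

NXN = re.compile(r"(?<![\dA-Za-z])(\d{1,2})x(\d{2,3})(?!\d)", re.IGNORECASE)
SEASON_WORD = re.compile(r"\bSeason\s*(\d+)", re.IGNORECASE)
BARE_NUM = re.compile(r"(?<![\dA-Za-z])(\d{2,3})(?!\d)")


def season_episode(stem: str, parent_dir: str, files_in_dir: int) -> dict:
    """Infer Season#/Episode#; every rule requires more than two files in the folder."""
    if files_in_dir <= 2:
        return {}
    m = NXN.search(stem)
    if m:  # "01x04" style: season x episode
        return {"season": int(m.group(1)), "episode": int(m.group(2))}
    out = {}
    s = SEASON_WORD.search(parent_dir)
    if s:  # e.g. "Phineas and Ferb Season 1" -> season 1
        out["season"] = int(s.group(1))
    for n in BARE_NUM.finditer(stem):
        # x264/H264 never match (a letter precedes the digits); h.264 does,
        # so 264 is skipped explicitly.
        if n.group(1) != "264":
            out["episode"] = int(n.group(1))
            break
    return out
```

Note that '2001' in '[2001-DVDRip-H.264-x264]' is not taken as an episode, because the pattern only accepts two or three digits that are not part of a longer number.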


4) Year/Date information was consistently missing for the following:
\\NAS2\Qdownload\Prince of Persia The Sands of Time (2010) DVDRip XviD\Prince of Persia The Sands of Time (2010) DVDRip XviD.avi
\\192.168.0.147\Puzzle\Movies\2012 (2009) DVDRip XviD-MAXSPEED\2012 (2009) DVDRip XviD-MAXSPEED www.torentz.3xforum.ro.avi
\\192.168.0.147\Puzzle\Movies\About.A.Boy[2002]DvDrip[Eng][CZsubs]-morgie\About.A.Boy[2002]DvDrip[Eng]-morgie.avi
\\192.168.0.147\Puzzle\Movies\Adventureland.2009.DvDRip.AC3-FxM\Adventureland.2009.DvDRip.AC3-FxM.avi
\\192.168.0.147\Puzzle\Movies\Akira.1988.DVDRip.DivX.english.dubbed.avi
\\192.168.0.147\Puzzle\Kids Movies\Alvin_and_the_Chipmunks[2007]DvDrip[Eng]-FXG.4058561.TPB\Alvin and the Chipmunks[2007]DvDrip[Eng]-FXG\Alvin and the Chipmunks[2007]DvDrip[Eng]-FXG.avi
\\192.168.0.147\Puzzle\Movies\Annie Hall (Woody Allen 1977) XviD DVDRip.avi
\\192.168.0.147\Puzzle\Movies\Artificial Intelligence - AI [2001-DVDRip-H.264-x264]-WOLViSH\Artificial Intelligence - AI [2001-DVDRip-H.264-x264]-WOLViSH.mp4
\\192.168.0.147\Puzzle\Movies\District 9 (2009) DVDRip XviD-MAXSPEED\District 9 (2009) DVDRip XviD-MAXSPEED www.torentz.3xforum.ro.avi
\\192.168.0.147\Puzzle\Kids Movies\Dora.The.Explorer.Best.Friends.2009.DvDRiP.XviD-ExtraScene RG\Dora.The.Explorer.Best.Friends.2009.DvDRiP.XviD-ExtraScene RG.avi

Note: Preference should be given to:
- numbers that are enclosed within brackets (round or square)
- numbers that range between 1930 and <current year>
so that, for movie titles that actually contain a year, the Year/Date field is filled in with the preferred year.
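One way to express this preference is to score every 4-digit candidate. The sketch below is hypothetical Python: in-range years are weighted above bracketed ones (an assumption, chosen so that e.g. '[1080p]' cannot win), and ties go to the earliest candidate.

```python
import re
from datetime import date
from typing import Optional

FOUR_DIGITS = re.compile(r"(?<!\d)(\d{4})(?!\d)")
BRACKETS = re.compile(r"\([^()]*\)|\[[^\[\]]*\]")


def pick_year(stem: str, current_year: Optional[int] = None) -> Optional[int]:
    """Pick the most plausible year; in-range and bracketed numbers are preferred."""
    if current_year is None:
        current_year = date.today().year
    spans = [b.span() for b in BRACKETS.finditer(stem)]
    best, best_score = None, -1
    for m in FOUR_DIGITS.finditer(stem):
        year = int(m.group(1))
        in_range = 1930 <= year <= current_year
        bracketed = any(s <= m.start() and m.end() <= e for s, e in spans)
        score = 2 * in_range + bracketed  # bools count as 0/1
        if score > best_score:  # strict ">" keeps the earliest candidate on ties
            best, best_score = year, score
    return best
```

For '2012 (2009) DVDRip XviD-MAXSPEED' this picks 2009, and for 'Annie Hall (Woody Allen 1977) XviD DVDRip' it picks 1977.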

Ludek

2010-11-05 00:04

developer   ~0021188

Fixed in build 1322.

The following is now covered by automated regression testing:

'C:\Films\IT Crowd\The Big Bang Theory s01e03.avi'
'C:\Films\IT Crowd\The.IT.Crowd.S02E06.WS.PDTV.XviD-RiVER.avi'
'C:\Films\A Clockwork Orange\A Clockwork Orange.avi'
'\\192.168.0.147\Puzzle\TV\The Big Bang Theory\01x04 - The Luminous Fish Effect.avi'
'\Phineas and Ferb Season 1\05 Lights Candace Action.avi'
'\TV\30 Rock\30 Rock Season 4\30 Rock S04 E12 - Vema.avi'
'\TV\Chuck - Season 1\Chuck 01x07 - Chuck Versus the Alma Mater.avi'
'\\NAS2\Qdownload\TV - Arthur (01-10)\Season 05\5x01 - Arthur & The Big Riddle; Double Dare.avi'
'\\NAS2\Qdownload\the scooby doo show complete series\scooby doo show season 1\The Scooby-Doo Show 110 Scared Alot In Chamalot.avi'
'\X-Men Evolution\Season 01\Episode 03 - Rogue Recruit.avi'
'\X-Men Evolution\Episode 03 - Rogue Recruit.avi'
'\Documentary\BBC - Planet Earth\BBC.Planet.Earth.09of11.Shallow.Seas.DVB.XviD.MP3.www.mvgroup.org.avi'
'\Kids TV\The Wonder Years\Wonder Years.s3e035 - The Powers That Be.avi'
'C:\Films\Up In The Air.2009.DVDSCR.XviD-CAMELOT.avi'
'\Prince of Persia\Prince of Persia The Sands of Time (2010) DVDRip XviD.avi'
'\About.A.Boy [2002]DvDrip[Eng]-morgie.avi'
'Alvin and the Chipmunks[2007]DvDrip[Eng]-FXG.avi'
'\Movies\District 9 (2009) DVDRip XviD-MAXSPEED\District 9 (2009) DVDRip XviD-MAXSPEED www.torentz.3xforum.ro.avi'
'\Artificial Intelligence - AI [2001-DVDRip-H.264-x264]-WOLViSH.mp4'
'\Kids Movies\Dora.The.Explorer.Best.Friends.2009.DvDRiP.XviD-ExtraScene RG\Dora.The.Explorer.Best.Friends.2009.DvDRiP.XviD-ExtraScene RG.avi'

rusty

2010-11-07 14:11

administrator   ~0021216

Last edited: 2010-11-07 14:22

Tested in 1322: for every single video file on a network path (e.g. scanning \\192.168.0.122\Videos\, where 'Videos' is the share name), metadata wasn't inferred at all, i.e. every single field is blank!

Note: metadata is inferred for tracks on local drives.

Edit: Note that in previous builds, MM would sometimes correctly infer metadata on network scans on the first scan, and then subsequently fail. I was never able to figure out exactly what causes the failure. In build 1322, failure to infer metadata is consistent (i.e. I've never gotten it to succeed).

Ludek

2010-11-07 17:36

developer   ~0021220

Fixed the network paths issue in build 1323.

peke

2010-11-07 21:21

developer   ~0021227

@Rusty
I wonder: are you sure that \\192.168.0.122\ is in the LAN zone and not in the Internet zone (Network root path in the Explorer tree on Win 7)?

rusty

2010-11-08 22:22

administrator   ~0021260

Tested build 1323, and experienced the following issues. We need to discuss to figure out the best solution:

1) When 'Series' information is stored in the directory, it is lost, e.g.:
\\192.168.0.147\Puzzle\Documentary\Heritage, Civilization and the Jews (Abba Eban,1984)[1Maven]\8 Out of the Ashes, 1919 - 1947.avi
\\192.168.0.147\Puzzle\Kids TV\X-Men_Evolution_Complete_Series\X-Men Evolution\Season III\Episode 08 - Self Possessed.avi

Similarly, when the title is only stored in the directory, it is lost:
\\192.168.0.147\Puzzle\Kids Movies\BATTERIES NOT INCLUDED[DVDRIP][ENG]-kidzcorner\BATTERIES_NOT_INCLUDED\TITLE01.avi

The only thing we might be able to do for such cases is to give preference to the directory name when:
a) The directory is a parent of a directory named 'Season ##' and/or there are multiple files in a directory without any commonality
b) When the file has no information (as in TITLE01), use the parent directory for metadata
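The "without any commonality" test in a) could be approximated as follows. This is a hypothetical Python sketch: commonality is reduced to whether the sibling filenames share their first token, which is an assumption, not a rule stated in the thread.

```python
import re
from typing import Optional

TOKEN = re.compile(r"[a-z0-9]+")


def series_if_no_commonality(dir_name: str, sibling_stems: list[str]) -> Optional[str]:
    """Return the folder name as Series when the files in it share no leading token."""
    if len(sibling_stems) < 2:
        return None  # nothing to compare a single file against
    first_tokens = set()
    for stem in sibling_stems:
        tokens = TOKEN.findall(stem.lower())
        first_tokens.add(tokens[0] if tokens else "")
    if len(first_tokens) == 1:
        return None  # a shared leading token suggests the filenames carry the series
    return dir_name
```

A folder of files named 'Episode 03 - ...', 'Episode 08 - ...' would share the token 'episode' and defeat this simple check, so it would need to be combined with the 'Season ##' parent rule.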

2) When a '-' appears in the title, it often causes the title to be lost, e.g.:

\\192.168.0.147\Puzzle\Kids Movies\Mary.and.Max.2009.DVDRip.XviD.TheWretched.NoRar.www.crazy-torrent.com\Mary.and.Max.2009.DVDRip.XviD.TheWretched.NoRar.www.crazy-torrent.com.avi

\\192.168.0.147\Puzzle\Kids Movies\Cloudy.with.a.Chance.of.Meatballs.DVDRip.XviD-DoNE\meatball-done.avi

\\192.168.0.147\Puzzle\Kids Movies\Coraline.DVDRip.XviD-ARROW..4910439.TPB\Coraline.DVDRip.XviD-ARROW.[www.FilmsBT.com]\Coraline.DVDRip.XviD-ARROW.avi

\\192.168.0.147\Puzzle\Kids Movies\Dora.The.Explorer.Super.Babies.Dream.Adventure.2009.DVDrip.Xvid\Dora.The.Explorer.Super.Babies.Dream.Adventure.2009.DVDrip.Xvid-FHW-sample.avi

\\192.168.0.147\Puzzle\Kids Movies\Hotel for Dogs\Hotel.for.Dogs.DVDRip.XviD\Hotel.for.Dogs.DVDRip.XviD-NeDiVx.avi

\\192.168.0.147\Puzzle\Kids Movies\The.Secret.of.Kells.2009.LiMiTED.DVDRip.XviD-LPD.[www.torrentfive.com]\The.Secret.of.Kells.2009.LiMiTED.DVDRip.XviD-LPD.avi

\\192.168.0.147\Puzzle\Kids TV\Clone Wars - Season 2\Star wars 02x09-10.avi

\\192.168.0.147\Puzzle\Movies\Serenity\serenity-cd1.avi

This makes me wonder whether the Title should always be the filename (unless a tag exists), because it's fairly difficult to infer the title properly, and we end up losing a fair bit of metadata in doing so.

3) Episode # is lost for the following series:
\\192.168.0.147\Puzzle\Kids TV\Pink Panther\Pink.Panther.003-We.Give.Pink.Stamps-DvdRip.xvid.ac3.avi


4) Series not inferred for filenames of the following format, AND all Titles are identical!!
Clone.Wars.S01E02.avi
\\192.168.0.147\Puzzle\TV\30 Rock\30 Rock Season 1\30.Rock.S01E15.Hard.Ball.avi

==> This is really just a combination of 1) and 2), but illustrates the nature of the problem...

Ludek

2010-11-11 17:04

developer   ~0021314

1, 2, 4 - All fixed and covered by regression testing in build 1325

Re 3) We cannot do anything about this, because it is not obvious whether 003 is the Season# or the Episode#, or whether the 3 (as part of 'ac3') is the Episode#.
Therefore we don't try to extract anything in this case, in order to prevent false matches.

jiri

2010-11-11 17:34

administrator   ~0021315

I haven't tested, but I'm reopening for a possible small improvement: many test cases contain '.' instead of ' '. I think we could try to fix this automatically. In order to avoid false removal of dots, I'd suggest removing them only if:
1. There aren't any spaces (' ') in the string.
2. There are at least two letters between some dots - this is to prevent removals in cases of some abbreviations like 'A.B.C.'.
3. Always keep several consecutive dots, like in 'And...'
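The three rules could be sketched like this. It is a minimal Python illustration with a hypothetical undot() helper; "two letters between some dots" is interpreted here as at least one dot-separated chunk containing two or more letters.

```python
import re

SINGLE_DOT = re.compile(r"(?<!\.)\.(?!\.)")  # a dot with no dot on either side


def undot(title: str) -> str:
    """Replace word-separating dots with spaces, following rules 1-3."""
    if " " in title:
        return title  # rule 1: real spaces exist, so the dots are probably meaningful
    segments = title.split(".")
    if not any(sum(ch.isalpha() for ch in seg) >= 2 for seg in segments):
        return title  # rule 2: looks like an abbreviation such as 'A.B.C.'
    # rule 3: only isolated dots are replaced; runs like '...' stay as they are
    return SINGLE_DOT.sub(" ", title).strip()
```

So 'The.IT.Crowd' becomes 'The IT Crowd', while 'A.B.C.' and 'Mr. Smith' are left untouched.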

Ludek

2010-11-11 19:28

developer   ~0021317

Fixed in build 1325.

peke

2010-12-02 01:40

developer   ~0021543

Last edited: 2010-12-02 01:53

Still not working in 1329 if the user upgrades from MM3 and MM4 imports settings in which default masks exist.

Tested using/scanning <user>\Desktop folder.
File examples:
1. C:\Users\Peke\Desktop\Legend.of.the.Boneknapper.Dragon.2010.DVDRip.XviD.AC3-YeFsTe.avi

Where Info is imported as:
Title: Legend of the Boneknapper Dragon 2010 DVDRip XviD AC3-YeFsTe
Date: 2010

2. C:\Users\Peke\Desktop\Filmovi\Lena_-_Satellite__Germany_.mp4

Where Info is imported as:
Title: Lena -
Season: at
Episode: ll

Instead of:
Title: Satellite Germany
Artist: Lena

Season and Episode should be considered in a mask only for SXX and EXX (SXXEXX), where XX is numeric.
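To show how output like 'Season: at / Episode: ll' can arise, compare a permissive mask that accepts any two characters in the <Season>/<Episode> slots with a strict, digits-only one. This is hypothetical Python; it is not how MM's mask engine is implemented, only an illustration of the proposed rule.

```python
import re
from typing import Optional

# Permissive: any two characters may fill the <Season> and <Episode> slots.
LOOSE_MASK = re.compile(r"S(..)E(..)", re.IGNORECASE)
# Strict: digits only, and the "S" must not be glued to a preceding letter or digit.
STRICT_MASK = re.compile(r"(?<![A-Za-z0-9])S(\d{1,2})\s?E(\d{1,3})(?!\d)", re.IGNORECASE)


def season_episode_strict(stem: str) -> Optional[tuple]:
    """Return (season, episode) only for numeric SxxExx tokens."""
    m = STRICT_MASK.search(stem)
    return (int(m.group(1)), int(m.group(2))) if m else None
```

On 'Lena_-_Satellite__Germany_' the loose mask captures 'at' and 'll' from "Satellite", whereas the strict one finds nothing; '30 Rock S04 E12 - Vema' still gives (4, 12).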

Ludek

2010-12-02 18:00

developer   ~0021560

Re: 1.
This is all right: the year 2010 is properly extracted and the title is dot-free.

Re: 2.
This is a bug. I have fixed it and covered it by regression testing; the title is now properly extracted as
'Lena - Satellite Germany'
and the Episode# and Season# are empty

Note that we cannot use a guess mask like '<Series> - <title>', '<director> - <title>' or '<title> - <description>', because we cannot distinguish between them, and titles often include '-' with no special meaning.

Fixed in 1330

peke

2010-12-03 02:45

developer   ~0021576

2. Verified 1330

1. I see. I expected something more like this:
Title: Legend of the Boneknapper Dragon
?Publisher(Artist?): DVDRip XviD AC3-YeFsTe
Date: 2010

Especially as Year is already extracted.

zvezdan

2010-12-03 09:04

updater   ~0021585

>> ?Publisher(Artist?): DVDRip XviD AC3-YeFsTe

Wow, please don't tell me that you want to get information about the video and audio codecs used from the filename. Oh well... More and more, it seems to me that your video implementation is just a joke. I think that you didn't provide essential columns for:
- video resolution (as one string or eventually as 2 independent numbers for horizontal and vertical resolution);
- video frames per second;
- video codec;
- audio codec(s);
- video bitrate.

What, you want users to enter this data into some Custom field manually? Please, come on... Such data should be taken directly from the header/content of the video file when we use the Add/Rescan option, and should be stored in dedicated fields. You know, such data are essential for anyone who doesn't want to keep a video file with low resolution, low bitrate and/or an old video codec, e.g. DivX 3.x.

Here is another thing - what if I have one video file which has two or more audio tracks? You need to store technical data for each audio track since all of them could have different audio bitrate, different number of audio channels, different audio codec... For example, some MPG video file could have one AC3 track with 5.1 channels and one MP3 track with 2 channels. So, this video file cannot be represented as one record in the Songs table. Have you covered all of that?

What about Spoken Languages field(s)? What about Subtitles? Every audio track could have different language and there could be several subtitles files for each video file...

You know, if you want to add video database support, do it properly or don't do it at all. The mentioned data are critical for database design and should be implemented from the start; you cannot add them later without breaking compatibility.

jiri

2010-12-03 10:49

administrator   ~0021586

1. I just wonder how to implement such an algorithm to process these fields:
 a. We could consider the title to run from the first character only until the first different field is found (in this case the Date field).
 b. If some codec-related string is found (like 'DVDRip' or 'XviD'), then move everything from this position to the Comment field and cut it from the Title field.
If a and b are implemented, the result would be as Peke expected. That said, I'm a little skeptical about trying to be so clever here, because there will probably always be strings that these heuristics handle incorrectly, and then it's probably better to leave some junk in the Title field than to sometimes accidentally remove useful info from it.
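Points a and b could be sketched as follows. This is hypothetical Python (split_title() and the short codec list are assumptions). Besides reproducing Peke's expected result, it also reproduces the two counterexamples Ludek gives in his reply, which is why the heuristic was not adopted.

```python
import re

YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")
CODEC = re.compile(r"\b(?:dvd\s?rip|xvid)\b", re.IGNORECASE)  # assumed codec list


def split_title(name: str) -> dict:
    """(a) the title stops at the year; (b) codec-like text moves to Comment."""
    out: dict = {}
    end = len(name)
    y = YEAR.search(name)
    if y:  # (a) the title ends where the first other field (the date) begins
        out["date"] = int(y.group())
        end = min(end, y.start())
    c = CODEC.search(name)
    if c:  # (b) everything from the first codec-like token goes to Comment
        out["comment"] = name[c.start():]
        end = min(end, c.start())
    out["title"] = name[:end].strip(" .-([")
    return out
```

split_title('Legend of the Boneknapper Dragon 2010 DVDRip XviD AC3-YeFsTe') gives the expected split, but 'The Official 2010 FIFA World Cup Film in 3D' is cut to 'The Official', and 'The Phantom (2009) DvDrip Part 2' loses 'Part 2' to the Comment field.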

Zvezdan, you probably haven't read this issue properly; it's only about parsing filenames, not about metadata extraction in general. The fields you mentioned certainly _are_ in MM 4.0. It doesn't make sense to me to judge the new version without actually seeing it (or to enter anything into Mantis before testing it).

Ludek

2010-12-03 11:27

developer   ~0021588

Last edited: 2010-12-03 11:27

1. I would rather leave it as is now, because otherwise the year would be removed from titles like:
'The Official 2010 FIFA World Cup Film in 3D'
and this is not desired.
Also titles like
'The Phantom (2009) DvDrip Part 2'
are problematic in this case, because 'DvDrip Part 2' would be moved to the Comment field, which is not desired either.

Leaving as is in 1330.

jiri

2010-12-03 14:17

administrator   ~0021593

Right, good examples of heuristics failure. I agree, it doesn't seem to make sense to go beyond what's already implemented.

zvezdan

2010-12-03 21:52

updater   ~0021598

Jiri, sorry for hijacking this issue. Yes, I have read it properly, but its subject says: "Default video metadata is often missing / incorrect". That could relate to any metadata, not only metadata inferred from the filename. I have read many issues in Mantis and haven't found anything related to the fields I mentioned. I posted my comment here because you didn't answer my questions in the Forum about the video database (http://www.mediamonkey.com/forum/viewtopic.php?f=4&t=53032), and because it seemed illogical to me that you have been discussing since 2010-08-27 how filenames should be parsed, with nothing anywhere about video resolution and similar data. There is no information about that in the Forums either, nor can it be seen from the mockup images you have left on the wiki. If you had given me the latest working version when I asked for it, I would have known what you have implemented. Well, never mind. Since you don't think my suggestions could be useful for its development, I will not bother you with them anymore.

peke

2010-12-04 00:18

developer   ~0021599

I agree; given those examples, you were right, Ludek.

Verified 1330

jiri

2010-12-05 13:54

administrator   ~0021604

Zvezdan, alpha testing is going to start any day now, so there will be plenty of opportunities to review MM and give us feedback. We are interested in suggestions, even criticism, but it must be based on facts.