[06:58:40] DrTrigon, briefly: I initially tried using the intensity profile of the top and bottom of the images
[06:59:14] This was fairly noisy, and I attempted to find peaks (jumps), because these are squares (so there would be a flat portion and then a jump, and so on)
[06:59:18] Similar to floor(x)
[06:59:47] I did that, and thresholded the number of jumps at ~20 (+/- 3) to avoid errors
[07:00:08] I've tried median filtering and mean filtering the image before taking the intensity profile to reduce noise
[07:00:42] I tried segmentation too - and the images were over-segmented. I was unable to find a single set of parameters that segments all the images correctly
[07:00:50] The images are pretty noisy in general
[07:16:30] DrTrigon: The images are pretty noisy in general
[08:38:07] DrTrigon: There?
[08:58:17] AbdealiJK: now I have some time. What is the current algo?
[08:59:40] So, I tried a few things
[09:00:00] The algo you see on the -newimages page takes the intensity profile at the top/bottom
[09:01:31] Still here?
[09:01:38] yep
[09:02:31] So, yeah, getting the intensity profile
[09:02:53] Trying to detect the jumps in the profile
[09:03:34] There should be around 20 jumps, and I check that the jumps are greater than a threshold
[09:03:48] you talk of the "ruler" or scale on the bottom, right?
[09:03:55] No
[09:03:59] The IT8 bar
[09:04:06] Which has those 20 shades of grey
[09:04:28] that's what I meant
[09:04:49] grey? remember seeing yellowish...
[09:04:53] Ah, alright (there was another "ruler" also... so, misunderstood)
[09:05:02] ;)
[09:05:17] https://commons.wikimedia.org/wiki/File:Schooner_Mendocino_wrecked_at_Mendocino_in_the_great_storm_of_1867_(NYPL_b11707260-G89F318_001F).tiff
[09:05:26] ^ That thing at the top of this file
[09:06:01] wait... I thought they are on the bottom... or both?
[09:06:17] It can be on the top or the bottom
[09:06:25] hmm
[09:06:39] so you take the intensity profile of the whole image?
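[Editor's note] The jump-counting idea described above (a staircase profile with ~20 steps, counted after smoothing) can be sketched roughly as follows. All thresholds and kernel sizes here are illustrative guesses, not the values the actual bot used:

```python
import numpy as np
from scipy.signal import medfilt

def count_jumps(profile, jump_threshold=10.0, kernel=5):
    """Count step-like jumps in a 1-D intensity profile.

    `jump_threshold` and `kernel` are illustrative values only.
    """
    smooth = medfilt(np.asarray(profile, dtype=float), kernel_size=kernel)
    diffs = np.abs(np.diff(smooth))
    # A "jump" is any difference above the threshold; consecutive
    # above-threshold samples belong to the same edge, so count only
    # the rising edges of the boolean mask.
    above = diffs > jump_threshold
    return np.flatnonzero(above[1:] & ~above[:-1]).size + int(above[0])

def looks_like_it8(profile, expected=20, tolerance=3):
    """Accept the profile if the jump count is within 20 +/- 3."""
    return abs(count_jumps(profile) - expected) <= tolerance
```

On a clean synthetic staircase of 21 grey levels this counts exactly 20 jumps; the difficulty discussed above is that real scans are far noisier.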
[09:07:05] Nope
[09:07:13] I first segment out the top and bottom 10% of the image
[09:07:23] ok.
[09:07:28] And then take the middle 5 pixel rows of that segmented image
[09:07:39] good
[09:08:05] And then do a mean or median in these pixel rows (mean/median over columns)
[09:08:26] To get 1 row - which I am doing the peak detection on
[09:08:44] Have you tried color segmentation?
[09:08:53] I did try color segmentation
[09:09:06] but?
[09:09:26] The parameters required to get the "correct" ~20 segments vary a lot
[09:09:56] hmmm
[09:10:24] the thing with your pixel row is you lose a lot of information
[09:10:28] I tried the 3 algos in skimage: SLIC, Felzenszwalb, Quickshift
[09:10:52] DrTrigon: I lose "geometry" information - but the color information is more refined than before
[09:11:16] imagine doing this as a human being - I guess you would have the same false positives...
[09:11:23] Nope
[09:11:38] yes, but the geometry counts as well
[09:11:47] So, I drew the plots and compared, and I didn't have at least some of the false positives
[09:11:53] True, the geometries do count
[09:12:13] I tried a Canny corner detection, but that was highly erroneous too
[09:12:49] i.e. the corners detected varied between images for the same parameters
[09:13:11] contours?
[09:13:32] Sorry, what exactly is contours? Edge detection?
[09:13:34] did you check on color vs greyscale?
[09:14:03] So, Canny is only possible on greyscale
[09:14:12] I tried checking the RGB values of the pixels in the scale
[09:14:19] And they were not all R==G==B
[09:14:39] In fact there was as much as +/- 40 between the R, G, B channels
[09:15:03] 40 is approx. 15%, as 255 is the maximum
[09:16:45] DrTrigon: But other than this... are you aware of what jayvdb is thinking the metric should be?
[09:17:02] * DrTrigon thinking...
[09:17:18] metric? you mean MVP?
[09:17:45] yep
[09:18:21] that was my last question, basically - did not check for an answer yet...
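[Editor's note] The strip extraction described above (top/bottom 10%, middle 5 rows, median over columns) and the RGB-spread check can be sketched like this. Function and parameter names are illustrative, not the bot's real code; the spread limit of 40 (~15% of 255) matches the figure quoted in the discussion:

```python
import numpy as np

def extract_profile(image, region="top", frac=0.10, rows=5):
    """Collapse a horizontal strip of a 2-D grayscale image (H x W)
    into a single 1-D profile, as described above."""
    h = image.shape[0]
    band = image[: int(h * frac)] if region == "top" else image[-int(h * frac):]
    mid = band.shape[0] // 2
    strip = band[max(mid - rows // 2, 0): mid + rows // 2 + 1]
    # Median over the few rows suppresses scanner noise better than a mean.
    return np.median(strip, axis=0)

def is_neutral_grey(rgb_strip, max_spread=40):
    """True if R, G, B stay within `max_spread` of each other everywhere,
    i.e. the strip is approximately neutral grey."""
    spread = (rgb_strip.max(axis=-1).astype(int)
              - rgb_strip.min(axis=-1).astype(int))
    return spread.max() <= max_spread
```

As noted in the log, real IT8 strips in these scans fail the naive `R==G==B` test, which is why a tolerance is needed at all.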
[09:19:00] Alright
[09:19:04] Coming back to IT8 then
[09:21:24] will answer jayvdb later - yes, IT8
[09:21:49] I feel we need to use height info too
[09:22:20] additionally check for a (white) gap between the bar and the picture
[09:22:44] and exclude images that are not grayscale there.
[09:22:58] (I saw e.g. grass etc.)
[09:24:37] AbdealiJK: ^^^
[09:25:21] * DrTrigon2 back
[09:25:50] How exactly can that be done?
[09:26:01] Grey scale images - there are all sorts of colors like blue, sepia, etc. also
[09:31:11] then do sepia vs monochrome detection on the extracted bar first
[09:31:51] other question: what is the number of those pictures in the wiki? Are they relevant?
[09:34:19] There are quite a lot of images
[09:34:26] about 27k or so, I think
[09:34:33] It is very popular in museums
[09:34:45] Sepia vs monochrome is not easy - there are too many errors possible there too
[09:39:27] try to make a strict but working algo that maybe gets 5% only, then we can improve
[09:42:26] so 1. extract region
[09:42:41] 2. check colors
[09:42:54] 3. convert to gray scale
[09:43:08] 4. denoise
[09:43:26] 5. check for white separator
[09:43:36] 6. do what you did
[09:44:01] AbdealiJK: ^^^ does this make sense?
[09:51:24] DrTrigon2: Sorry - was away for a bit
[09:51:41] If the goal is to get 5% right - I can do that, yes
[09:51:53] then improve
[09:51:58] Nod
[09:52:08] I proposed:
[09:52:21] 1. extract region
[09:52:32] I can see the steps you mentioned
[09:52:33] 2. check color
[09:52:46] 3. convert to gray scale
[09:52:55] It sounds good. Essentially cascade all the weak classifiers and make a strong one, which may not always be right
[09:53:01] 5. check for white separator
[09:53:11] ^ which may not get everything, but gets some *
[09:53:15] 6. do what you did
[09:53:47] then learn how to improve it or train it on a set
[09:54:07] does that sound good?
[09:54:32] yep
[09:55:22] cool! :)
[09:58:58] Will try it out :+1:
[09:59:04] Thanks!
[10:00:31] you're always welcome!
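[Editor's note] The six-step cascade agreed above can be sketched as a single function. This is a rough stand-in, not the bot's implementation: all thresholds are illustrative, and step 5 is approximated here by simply requiring a near-white patch in the strip rather than locating the actual white gap:

```python
import numpy as np

def detect_it8_bar(image_rgb, frac=0.10, max_spread=40, jump_threshold=0.04):
    """Weak-classifier cascade for an IT8 grey bar along the top or
    bottom edge, following steps 1-6 above.
    Returns "top", "bottom", or None (strict: most images fail)."""
    h = image_rgb.shape[0]
    for region, band in (("top", image_rgb[: int(h * frac)]),
                         ("bottom", image_rgb[-int(h * frac):])):
        mid = band.shape[0] // 2
        strip = band[max(mid - 2, 0): mid + 3]            # 1. extract region
        spread = (strip.max(axis=-1).astype(int)
                  - strip.min(axis=-1).astype(int))
        if spread.max() > max_spread:                     # 2. check colors
            continue
        grey = strip.mean(axis=-1) / 255.0                # 3. convert to grey scale
        profile = np.median(grey, axis=0)                 # 4. denoise (median over rows)
        if profile.max() < 0.9:                           # 5. crude stand-in for the
            continue                                      #    white-separator check
        jumps = int((np.abs(np.diff(profile)) > jump_threshold).sum())
        if 17 <= jumps <= 23:                             # 6. count the ~20 steps
            return region
    return None
```

This matches the "strict but working" spirit: each stage can only reject, so the cascade errs toward false negatives that can be relaxed later.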
:))
[12:28:28] jayvdb, DrTrigon2: Hi
[12:28:57] hi
[12:32:09] ...
[12:39:06] john, are you here?
[12:39:13] jayvdb, could you type here? Unable to hear
[12:39:37] yea, ok
[12:40:57] so I guess what we need is a "histogram of the number of categories found per file", excluding file-type categories
[12:44:24] then we need to look at which of these categories are not a leaf category
[12:44:32] agree
[12:44:33] e.g. 'Czech Republic'
[12:51:16] sorry, I didn't hear that
[12:52:40] https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm
[12:53:44] https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm#uploader_activity_levels
[13:09:02] I don't hear you anymore, sorry
[13:09:15] jayvdb: ^^^
[13:09:16] sorry
[13:09:35] I can't hear you either ;-)
[13:09:37] Touch a page... (was about the last)
[13:09:53] if the bot edits a page, it should be a good job
[13:10:04] people will review the contribution history, and we want them to be impressed
[13:12:46] if it only adds "Category:JPEG files" to many pages, people will believe that is all it does
[13:13:03] right, agree
[13:13:42] so the auto mode should decide not to edit a page if it only adds low-quality (file-type) categories
[13:14:06] agree with john
[13:14:10] then every diff is impressive
[13:14:19] ! agree too
[13:14:23] I agree too *
[13:14:45] why edit a page at the moment anyway? just write logs...
[13:21:48] Unable to hear you again, jayvdb
[13:21:49] CC DrTrigon
[13:21:51] I'll think more about geography.
[13:32:54] https://commons.wikimedia.org/wiki/Category:Hidden_faces_in_objects_or_places
[13:34:08] Category:Graphics is also a medium-quality category.
[13:34:21] We talked last week about detecting maps vs logos
[13:34:36] did you have any success there?
[13:37:49] Category:Human faces will be the top item in the table... it is hard to avoid that
[13:38:05] exactly, ok...
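[Editor's note] The two ideas above (a per-file histogram of non-file-type categories, and skipping edits that would only add file-type categories) might look like the sketch below. `FILE_TYPE_CATS` is a hypothetical hard-coded set for illustration; a real bot would use a maintained list:

```python
from collections import Counter

# Hypothetical list of low-quality, file-type categories (illustration only).
FILE_TYPE_CATS = {"JPEG files", "PNG files", "TIFF files", "GIF files", "SVG files"}

def category_histogram(files_to_cats):
    """Histogram of the number of substantive (non file-type) categories
    per file. `files_to_cats` maps file title -> list of category names."""
    counts = (sum(1 for c in cats if c not in FILE_TYPE_CATS)
              for cats in files_to_cats.values())
    return Counter(counts)

def worth_editing(proposed_cats):
    """Auto mode edits a page only if at least one proposed category
    is not a mere file-type category."""
    return any(c not in FILE_TYPE_CATS for c in proposed_cats)
```

With this gate, every diff the bot makes contains at least one substantive category, which is the "every diff is impressive" goal stated above.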
[13:38:32] but we need to make it a lower percentage, and make other, better subcategorisation a higher percentage
[13:48:41] DrTrigon, ?
[13:48:49] sorry, fell out
[13:53:31] https://commons.wikimedia.org/wiki/Category:Hidden_faces_in_objects_or_places
[14:06:49] What about checking SVGs for validity and then putting them into these: https://commons.wikimedia.org/wiki/Category:SVG_created_with_..._templates
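[Editor's note] A first-pass "SVG validity" check, as suggested above, could be as simple as XML well-formedness plus a root-element check. This is only a cheap filter (no DTD or schema validation) and the function name is illustrative:

```python
import xml.etree.ElementTree as ET

def is_valid_svg(svg_text):
    """Cheap well-formedness check: the text parses as XML and the
    root element is <svg> (possibly namespace-qualified)."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # Namespaced tags look like "{http://www.w3.org/2000/svg}svg".
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

Files passing this check could then be sorted into the "SVG created with ... templates" categories based on generator metadata; files failing it would need a different treatment entirely.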