[12:14:25] Hello folks!
[12:21:37] DrTrigon, DrTrigon_ jayvdb Hi
[12:21:51] your ISP ok?
[12:22:40] It does work right now
[12:22:47] Not sure how stable it is going to be
[12:22:56] do you want to do IRC only today ?
[12:23:10] That would be good
[12:23:22] Meeting tomorrow would also be ok for me - in case of "emergency"
[12:23:40] Nope let's do it today
[12:23:57] gooood! ;) Do we want to start right now?
[12:24:35] sure
[12:25:10] jayvdb: ^^^
[12:25:39] ok
[12:25:55] I can find time tomorrow evening if we fail tonight
[12:26:12] cool!
[12:26:27] So, I saw DrTrigon's comments on the -newimages log
[12:26:44] I've incorporated it in the script (about showing the unique categories, and mentioning the number of categories)
[12:26:54] The new script is running right now, should get over soonish
[12:27:19] I've also added various cmd line arguments like -limitsize to limit the file size, etc., as we had discussed a few weeks back
[12:27:26] coool! Like to see other cats than file types...
[12:27:54] .
[12:27:54] nod
[12:28:24] So, I wanted to know if we should focus on adding more analysis methods to increase the percentage of files being categorized right now
[12:28:43] Or should I focus on testing more to make stuff more stable to show at WikiConferenceIndia
[12:29:15] (both please ;))) jayvdb, your opinion?
[12:29:35] I agree. more automatic categories
[12:29:58] we need leaf categories, which means better analysis of the media
[12:30:15] Alright, so I will implement the OSM nominatim API key stuff next to get Location suggestions from images.
[12:30:25] How many weeks to the conf? 3, 4?
[12:30:38] 3
[12:31:09] But keeping last week for cleaning up, documenting stuff, etc.
[12:31:18] +1
[12:31:29] (bug fixing)
[12:31:48] Ping me if I can help with that
[12:31:54] DrTrigon_, sure
[12:32:06] Is that ok for you?
[12:32:09] DrTrigon_, Thanks a lot for all the docker support !
[12:32:21] Sure!! Was not tooo much ;))
[12:32:23] nod. great to see Docker done
[12:32:41] So, what other easy automatic categories can be done ?
[12:33:31] I've spent some time on Monochromatic images, but haven't been able to make it robust yet ... Need to work on that.
[12:33:45] nod. (wanted to mention that)
[12:34:05] What is the issue?
[12:34:52] AbdealiJK__, well, we should look at your logs of new files, to find groups of media that could be detected with more categories
[12:35:02] I downloaded a few images for testing, and they seemed to have blues and other colors which were not very expected. And there was a wide variation in the blueness (even though it wasn't very visible by eye)
[12:35:27] dark blue = black ?
[12:36:16] https://commons.wikimedia.org/wiki/User:AbdealiJKTravis/logs/newimages
[12:36:48] DrTrigon_, Probably - but the blue as compared to Red and Green was a very high percentage (like 50%+ of pixels were blue). Don't remember exact numbers right now
[12:36:51] Need to spend more time debugging
[12:37:18] nod.
[12:37:30] what is the thing at the bottom of these images: https://commons.wikimedia.org/wiki/File:View_from_the_carriage_resting_place_at_the_summit,_looking_n.-e_(NYPL_b11708073-G91F219_013B).tiff
[12:37:33] Also, a note that Category:Icon based on size (16x16, 32x32, 64x64, etc.) is present now. Although no images in -newimages were Icons, so it's not seen
[12:37:36] gradient
[12:38:19] https://commons.wikimedia.org/wiki/Category:Robert_N._Dennis_collection_of_stereoscopic_views
[12:39:02] I am not sure what that gradient is.
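
A minimal sketch of one way to approach the monochrome/black-and-white check discussed above (flagging images where the channels barely differ), assuming Pillow and numpy are available; the thresholds are illustrative guesses, not values from file-metadata:

    import numpy as np
    from PIL import Image

    def looks_monochrome(path, channel_spread=20, pixel_fraction=0.95):
        """True when nearly all pixels have almost equal R, G and B values."""
        img = np.asarray(Image.open(path).convert('RGB'), dtype=np.int16)
        # Per-pixel spread between the brightest and darkest channel.
        spread = img.max(axis=2) - img.min(axis=2)
        return (spread <= channel_spread).mean() >= pixel_fraction

A scanned "black and white" photo with a blue or sepia tint would fail this check, which may explain the unexpected blue percentages mentioned above; loosening the spread threshold or measuring saturation in HSV are possible variations.
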
[12:39:12] https://commons.wikimedia.org/wiki/File:WiknicNYC_2016-29_cut_jeh.jpg currently only adding "Human faces", but if there is more than one, it is a group of people
[12:40:15] https://commons.wikimedia.org/wiki/Category:Groups_of_people
[12:40:30] AbdealiJK__: Could you try to do a black-white/monochrome detection on https://upload.wikimedia.org/wikipedia/commons/c/c3/El%C5%91nyomul%C3%A1s_a_Bug_foly%C3%B3n%C3%A1l._Fortepan_52229.jpg at some point and tell me the result, please?
[12:40:34] jayvdb, So, if >=3 people I can add "Category:Groups of people" ?
[12:40:41] seems sensible
[12:41:15] don't know the threshold anymore but basically YES
[12:41:47] jayvdb: both cats together? faces and group of people?
[12:41:50] DrTrigon_, Added that image to my ToDo. Will ping when done
[12:41:58] cool!
[12:41:58] yes?
[12:42:34] maybe there are different categories for faces depending on how large/focused the face is?
[12:42:53] do we add both cats or does group of people replace human faces?
[12:43:00] https://commons.wikimedia.org/wiki/Category:Faces_in_profile
[12:43:07] jayvdb, What about Category:Robert_N._Dennis_collection_of_stereoscopic_views ?
[12:43:43] AbdealiJK__, that strip at the bottom looks like it is an important 'thing', and very easy to detect
[12:44:03] "Images containing a thingamabob on the bottom"
[12:44:10] ask the uploaders what it is?
[12:44:13] they can ask the library
[12:44:18] and we find out its real name
[12:45:07] can we detect big smiles ? https://commons.wikimedia.org/wiki/Category:Happy_faces
[12:45:11] jayvdb, Alright
[12:45:42] DrTrigon_, I think groups and faces are independent categories and both can be added
[12:45:52] nice!
[12:46:16] guess happy faces might be a hard one
[12:46:16] jayvdb, Nope - smile detection is really really bad. There's a haarcascade for it. But it gives a lot of false positives
[12:46:30] hmmm....
[12:46:40] ...is there a cascade for teeth?
[12:46:58] we could try that on detected faces, like eyes, nose, mouth, etc.
[12:47:02] DrTrigon_, Nope, there's a cascade for mouth and smile IIRC
[12:47:14] https://commons.wikimedia.org/wiki/File:Reyes_Nazar%C3%ADes.png - currently "line drawing" but could be https://commons.wikimedia.org/wiki/Category:Family_trees
[12:47:19] ohhh ... location maps
[12:47:55] https://commons.wikimedia.org/wiki/File:Armenia_adm_location_map.svg
[12:48:02] I do not think it's possible to automatically categorize that as Family trees.
[12:48:17] we upload lots of location maps, and they are fairly distinctive, even in colors used
[12:48:58] map detection may be from color histogram... IF they all use exactly the same style...
[12:49:04] jayvdb, Do they use a similar color scheme ?
[12:49:06] https://commons.wikimedia.org/wiki/Category:Transparent_background
[12:49:10] or train a cascade?
[12:49:13] AbdealiJK__, most of them do, yes
[12:50:00] graphics + specific color histogram = map?
[12:50:34] Nice, I was trying to find Category:Transparent_background but was unable to :|
[12:50:37] (colors: yellowish, gray and blue, with a bit of black)
[12:50:59] https://commons.wikimedia.org/wiki/File:PHOTOS_INSIDE_THE_CLASSROOM_UPDATED006.jpg - this is a good one to test for smiling
[12:51:09] (but if smiling is impossible, don't bother)
[12:52:38] balls : https://commons.wikimedia.org/wiki/File:Alyssa_Naeher_Cleveland.jpg ?
[12:52:53] round objects ... ;-)
[12:53:09] circle detection?
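
A rough sketch of the face-count rule agreed above (>= 3 detected faces adds "Groups of people" in addition to "Human faces"); the function shape and the list-of-bounding-boxes input are hypothetical, not file-metadata's actual interface:

    def face_categories(faces, group_threshold=3):
        """Map a list of detected face bounding boxes to category suggestions."""
        categories = []
        if faces:
            categories.append('Category:Human faces')
        if len(faces) >= group_threshold:
            categories.append('Category:Groups of people')
        return categories
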
[12:53:16] https://commons.wikimedia.org/wiki/Category:Spherical_objects
[12:53:22] Circle detection is very robust :)
[12:53:53] But even a large number of faces could be detected as circles
[12:53:55] ok, detect a circle that has a diameter of at least half the image
[12:53:59] wow - https://commons.wikimedia.org/wiki/File:Argentina,_administrative_divisions_-_Nmbrs_-_colored_(%2Bclaims).svg was added to human faces
[12:54:39] yea... you can basically exclude that for svgs, right?
[12:54:48] except on embedded stuff maybe...
[12:54:49] .
[12:54:57] if it is a line drawing, and a human face, it is not a human face, it is a different category, and possibly even a caricature
[12:54:59] Nod - dlib detected a face in it ... There will be some false positives
[12:55:29] nod. what was the score and size?
[12:55:54] Face (dlib) #1
[12:55:54] Score: 0.047
[12:55:54] Bounding Box: Left:141, Top:1421, Width:73, Height:73
[12:55:54] Other features: Eyes (2), Mouth, Nose
[12:55:54] Face (haarcascade) #1
[12:55:57] Bounding Box: Left:605, Top:183, Width:145, Height:145
[12:56:18] small (compared to image) and bad score...
[12:56:27] should be fixable ;))
[12:56:58] For line drawings, like https://commons.wikimedia.org/wiki/File:PIT-CNT.svg , add "major color" categories - you had a library for that, but I objected to it being used for photographs, but it might work very well with line drawings
[12:56:59] strange that both gave a false positive...
[12:57:26] agree
[12:57:59] do we have color segmentation already?
[12:58:22] The score isn't very bad ... There are quite a few images with correctly detected faces at 0.04x score
[12:58:29] jayvdb, I think I can do color categories for any flat images, i.e. images with ONLY 4-5 colors; they would normally be logos
[12:58:47] DrTrigon_, No, never did it because it was not needed (yet)
[12:59:01] AbdealiJK__: Faces...
[12:59:12] ...you can e.g. exclude small ones, since ...
[12:59:22] https://commons.wikimedia.org/wiki/File:Kit_body_monaco1617a.png - see https://es.wikipedia.org/wiki/Association_Sportive_de_Monaco_Football_Club
[12:59:30] ...they are either wrong, or it needs to have a lot of them ...
[12:59:44] ... to be a group photo of e.g. a concert.
[12:59:58] Color/Maps: ...
[13:00:03] those pixels and filenames are 100% giveaways that they are football kit images
[13:00:18] ... if we could have color segmentation back, we could also use it for map detection
[13:00:20] ...
[13:00:40] ... a lot of segments, but all the same colors, few fluctuations within the segments.
[13:01:11] better example https://en.wikipedia.org/wiki/Manchester_United_F.C.
[13:01:55] https://commons.wikimedia.org/wiki/File:Kit_shorts.svg
[13:02:16] https://commons.wikimedia.org/wiki/Category:Football_kit_templates
[13:02:49] hmm, LOOKS easy ... ;)
[13:02:54] an obscured ball : https://commons.wikimedia.org/wiki/File:Whitney_Engen_Cleveland.jpg
[13:03:57] another human face : https://commons.wikimedia.org/wiki/File:Chile,_administrative_divisions_-_Nmbrs_-_colored.svg
[13:04:09] another incorrectly categorised human face : https://commons.wikimedia.org/wiki/File:Chile,_administrative_divisions_-_Nmbrs_-_colored.svg
[13:04:34] we need map detection to just exclude them
[13:04:59] https://commons.wikimedia.org/wiki/File:Jannis_knight.jpg - it finds the face ... I wonder if we can detect hair color at the edges of the bounding box
[13:05:02] AbdealiJK__: What do you think is possible? (I guess map and circle)
[13:05:34] jayvdb: maybe with segments too.
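
A sketch of the "big circle" heuristic for Category:Spherical objects discussed above (a circle spanning at least half the image), assuming OpenCV's HoughCircles; HOUGH_GRADIENT is the OpenCV 3 constant name and the parameter values are rough starting points, not tuned values from the project:

    import cv2

    def has_dominant_circle(path, min_diameter_ratio=0.5):
        """True if a Hough circle spanning >= half the shorter side is found."""
        grey = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if grey is None:
            return False
        height, width = grey.shape
        min_radius = int(min(height, width) * min_diameter_ratio / 2)
        # Large minDist so at most one dominant circle is reported.
        circles = cv2.HoughCircles(grey, cv2.HOUGH_GRADIENT, 2, min(height, width),
                                   param1=100, param2=100, minRadius=min_radius)
        return circles is not None

Requiring the minimum radius keeps small circular faces, coins, etc. from triggering the category, per the concern raised above.
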
[13:06:25] the oval here : https://commons.wikimedia.org/wiki/File:%D7%91%D7%A8_%D7%91%D7%95%D7%A8%D7%95%D7%9B%D7%95%D7%91,_%D7%91%D7%98%D7%95%D7%A8%D7%95%D7%A0%D7%98%D7%95,_1915.jpg
[13:06:45] if we can detect the shapes of frames, we can say it is a framed photograph
[13:06:56] it also makes it very likely to be a human if the frame is oval
[13:06:57] AbdealiJK__: still there?
[13:07:09] Yes, I am
[13:07:16] Too much information overload. Processing ...
[13:07:21] ;))
[13:07:32] hair color again: https://commons.wikimedia.org/wiki/File:Janet_Tamaro_Gracie_Award_Acceptance_Speech.jpg
[13:08:29] huge group of people (there must be another category for this) : https://commons.wikimedia.org/wiki/File:Women_2015_2016.jpg
[13:08:56] you could at least suggest it is a team photograph
[13:09:07] huge group > 10 persons ?
[13:09:19] https://commons.wikimedia.org/wiki/Category:Team_photographs
[13:09:26] 16 ?
[13:09:51] most teams with reserves, coaches, etc. are at least 16
[13:09:52] There's a very interesting https://commons.wikimedia.org/wiki/Category:People_by_quantity
[13:09:53] and the faces have to be aligned somehow...
[13:09:57] .
[13:10:19] medals https://commons.wikimedia.org/wiki/File:Wrwc_2014_jules_and_kim.jpg
[13:10:21] ► People in bed by number ;))))
[13:10:25] hahaha
[13:11:29] so.... we can basically add 'No people' to anything?
[13:11:51] (except faces of course)
[13:11:54] I found that weird too.
[13:12:29] here is a weird case for line drawing colors: https://commons.wikimedia.org/wiki/File:Sudamerica_Rugby(en).png - lots of colors without much meaning
[13:12:56] you could probably add 6 color categories to that one ;-)
[13:13:10] :))) brilliant!
[13:13:18] https://commons.wikimedia.org/wiki/File:Aby.jpg
[13:13:23] sunglasses ^
[13:13:29] they are being detected as a face
[13:13:40] so the 'face' detection library must know about glasses
[13:14:11] * DrTrigon_ wondering about glasses haarcascades
[13:14:13] Yes, it does
[13:14:21] nice
[13:14:40] so that should be really an easy one...?
[13:14:55] nod
[13:15:13] But I am not sure how accurate the sunglass detection is alone
[13:15:15] Will try
[13:15:16] crests : https://commons.wikimedia.org/wiki/File:Burnaby_Lake_RC_Blue_on_White.jpg
[13:15:23] https://commons.wikimedia.org/wiki/File:Sudamerica_Rugby(en).png - what were you trying to say ?
[13:15:43] flag detection : https://commons.wikimedia.org/wiki/File:Identification_Flag_Thai_Army_Battalion_(Artillery).svg
[13:15:45] https://commons.wikimedia.org/wiki/Category:People_with_glasses
[13:16:39] Ok, I think we need to pause
[13:16:41] *phew*
[13:16:42] flag and map detection could go together with different thresholds
[13:17:00] nod
[13:17:05] another bad face detection : https://commons.wikimedia.org/wiki/File:JUNTOSSOMOSRUGBY.png
[13:17:34] these hexagons are chemical drawings : https://commons.wikimedia.org/wiki/File:Oxogestone_phenpropionate.svg
[13:17:56] so... AbdealiJK__ do you want to continue reporting after this very nice brainstorming (thanks jayvdb!)
[13:18:31] jayvdb, In https://commons.wikimedia.org/wiki/File:JUNTOSSOMOSRUGBY.png although the faces were detected, the category isn't added
[13:18:36] As it's not reliable enough
[13:19:09] nod. but the detection is bad
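
A minimal sketch of the "flat image with only a handful of colours" idea mentioned above (line drawings, logos, flags), using Pillow; the cut-off values are illustrative only:

    from PIL import Image

    def dominant_flat_colors(path, max_colors=6, coverage=0.95):
        """Return the few colours covering most of a flat image, else []."""
        img = Image.open(path).convert('RGB')
        # getcolors() returns None when there are more distinct colours than the cap,
        # which is typical for photographs.
        colors = img.getcolors(maxcolors=1 << 16)
        if not colors:
            return []
        colors.sort(reverse=True)                 # most frequent colour first
        width, height = img.size
        top = colors[:max_colors]
        if sum(count for count, _ in top) >= coverage * width * height:
            return [rgb for _, rgb in top]
        return []

The returned colours could feed "major color" categories for graphics, and a yellow/grey/blue palette like the one described above could be one signal for location-map detection.
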
[13:19:30] it is easy to exclude media like that, as they are line drawings
[13:19:58] not sure how reliable line detection is either
[13:20:15] actually I would like to change "line detection" to "graphics"
[13:20:26] DrTrigon_, I've done that in the new script which is currently running
[13:20:27] as that is more general for now
[13:20:32] perfect
[13:20:53] but I would really vote for e.g. excluding single faces smaller than 70 px
[13:20:58] the most common group in that /newimages is football kit
[13:21:01] I am thinking.... that I can take SVG images like https://commons.wikimedia.org/wiki/File:Oxogestone_phenpropionate.svg and detect the number of times H, O, C are written in it
[13:21:13] you can detect them as football kit and detect the club colors
[13:21:24] AbdealiJK__, +1
[13:21:36] yes! add text recognition generally
[13:21:50] e.g. tesseract from pip
[13:22:29] Tesseract is very bulky
[13:22:37] better alternative?
[13:22:39] Unless there's a case to add categories using that, let's not add it ?
[13:22:44] (for now)
[13:23:01] DrTrigon_, There is no better alternative though
[13:23:13] ;))
[13:23:42] https://commons.wikimedia.org/wiki/Category:Texts
[13:24:12] Nod, alright
[13:24:32] https://commons.wikimedia.org/wiki/Category:Text_logos
[13:24:58] ...if it's easy and simple - do not waste too much time on it
[13:25:13] alright
[13:25:13] you could also add poppler or equivalent to look at pdfs
[13:25:31] How do you propose to figure out football kits ?
[13:25:48] that is a goood question?
[13:25:55] how reliable is line detection?
[13:26:19] DrTrigon_, I do not have any specific number for that
[13:26:24] I don't think tesseract is needed here. We need to detect letters/glyphs, not sentences
[13:26:55] All football kits for left/right sleeve seem to be 31 x 59. I wonder if that is some standardised size
[13:27:13] (poppler extracts from pdfs)
[13:27:19] there are simpler tools for finding if an image has digits and Arabic letters in it
[13:27:45] AbdealiJK__, yes, all of our football club articles have almost identical images in them
[13:28:09] so it's more like icon detection
[13:28:10] the football articles have an infobox template, which automates everything if the images exist
[13:28:34] Nice.
[13:29:05] I can also do some basic shape detection to verify on top if we find there are false positives. But that size seems sufficiently standardizes
[13:29:09] standardized*
[13:29:12] the filenames are even regulated
[13:29:50] nice!!
[13:29:58] sounds like a plan!
[13:30:04] AbdealiJK__: you mentioned in the report 'Bulk test' you had issues with deleted files...
[13:30:19] DrTrigon_, Yep
[13:30:23] ... what about delaying analysis of new files by 5 to 30 mins...?
[13:30:40] (such that users can correct mistakes first)
[13:31:16] Not sure if that's going to matter much. Because those errors happened even after the 600th file I analyzed, which would probably have been older
[13:31:29] But it does make sense to postpone the analysis
[13:31:44] maybe you get fewer issues then
[13:31:58] Right now, I've made the existence check and downloading in the same line with `and` - this has reduced those issues a lot
[13:32:01] there will always be such cases
[13:32:20] ...
[13:32:20] right
[13:32:27] what about ...
[13:32:41] ... not checking for existence at all and just trying to download directly?
[13:32:54] if it fails ... continue on.
[13:33:29] Yep, I can do a try catch for that :+1:
[13:33:42] cool
[13:34:49] Another thing I need to report ...
[13:35:04] ... I still have issues with the scripts ...
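
A hedged sketch of the football-kit heuristic discussed above: a regulated "Kit_..." file name combined with the small, standardised canvas sizes. Only the 31 x 59 sleeve size comes from the discussion; the other sizes and the name prefixes are assumptions that would need to be verified against real uploads:

    import re

    # Regulated kit filename prefixes (assumed from the examples above).
    KIT_NAME = re.compile(r'^Kit[ _](body|shorts|socks|left[ _]arm|right[ _]arm)',
                          re.IGNORECASE)
    # 31x59 is the sleeve size mentioned above; other entries are guesses.
    KIT_SIZES = {(31, 59), (38, 59)}

    def looks_like_football_kit(title, width, height):
        """True if the file name and pixel size match the kit-template pattern."""
        title = title.replace('File:', '', 1)
        return bool(KIT_NAME.match(title)) and (width, height) in KIT_SIZES
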
[13:35:25] The bulk script ?
[13:35:29] Which version are you using ?
[13:35:51] ... for bulk.py we should add some 'unicode' statements: https://gist.github.com/drtrigon/a1945629d1e7d7f566045629a43c0b06
[13:36:07] Is this inside the docker ?
[13:36:13] (version: I do not know... I hope it was quite recent)
[13:36:26] yes, inside catimages-gsoc
[13:36:52] (the patch is old for sure but shows the issue)
[13:36:52] So, the str() for that I added about a week ago
[13:37:20] the str in the map?
[13:37:20] Nod, this was like a very minor issue which was fixed within 15 mins. So it seems like you have an old-ish version still
[13:37:34] ok, I'll check that!
[13:37:42] then about simple_bot.py ...
[13:37:44] Where are you getting the bulk script from btw ?
[13:38:17] checking https://gist.github.com/drtrigon/2dcbc5fbac1e00f0f89dec9343994e48
[13:38:39] I'll be back in 5
[13:38:41] sorry wrong: https://commons.wikimedia.org/wiki/User:DrTrigon/file-metadata
[13:38:50] $ wget https://raw.githubusercontent.com/AbdealiJK/file-metadata/bulk/tests/bulk.py
[13:39:12] aha
[13:39:20] ?
[13:39:33] wrong place?
[13:39:42] It's now at https://github.com/pywikibot-catfiles/file-metadata/blob/ajk/work/file_metadata/wikibot/bulk_bot.py
[13:40:00] ok, I'll change that and test again! Thanks!
[13:40:01] I shifted it here 4-5 days ago because I'm trying to make it an "official" bot script
[13:40:08] simple_bot.py: ...
[13:40:21] Right, so what's the error with simple_bot.py ?
[13:40:29] ... I always have the nasty ASCII/unicode conversion/encoding issues
[13:40:39] Right. That's because of docker
[13:40:45] ... they actually come from pywikibot somehow...
[13:41:01] Most OSes have a nice locale setting. But docker does not, as it's a minimal environment
[13:41:15] aaa
[13:41:26] I have to set it to utf8?
[13:41:33] I think exporting `PYTHONIOENCODING=utf-8` should overall solve that. But we should indeed handle that in a better way
[13:41:51] Because Windows does not use utf-8 and will fail there (IIRC)
[13:42:06] I think that is a never-ending story with pywikibot in python2
[13:42:22] Yep, encodings are a pain.
[13:42:23] I think the statement was python3 will solve that anyways...
[13:42:47] Not quite
[13:42:49] I think there was even a patch to python2 once... jayvdb can maybe comment on that when he's back
[13:42:58] But my understanding of encodings is minimal. I'll check it out :+1:
[13:43:13] I did see that issue you made on docker-file-metadata
[13:43:15] I think there were UNSOLVABLE cases
[13:43:21] https://github.com/pywikibot-catfiles/docker-file-metadata/issues/7
[13:43:26] exactly
[13:43:34] maybe at the wrong place, right?
[13:43:59] I wonder if using pywikibot.output() instead of print() will solve that
[13:44:21] possible, can check that!
[13:45:21] DrTrigon_, A review on https://github.com/pywikibot-catfiles/docker-file-metadata/pull/6 when you're free would also be good
[13:45:28] It should reduce the docker image size a bit.
[13:45:39] And ad travis testing
[13:45:42] add *
[13:46:12] will look into that then of course, noted! :)
[13:46:28] btw ...
[13:46:32] yes ?
[13:46:47] ... I did quite some changes to docker-file-metadata (master)
[13:46:54] Yep. I saw them all
[13:47:05] maybe you want to pull some of them into file-metadata?
[13:47:13] Why ?
[13:47:25] Normally the docker repo and code repo are separate
[13:47:32] e.g. the dockerfiles into ajk/docker in order to enable building from dockerhub
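
A Python 2 sketch of working around the missing-locale problem in a minimal Docker image by forcing a UTF-8 writer when Python falls back to ASCII; exporting PYTHONIOENCODING=utf-8 in the image, as suggested above, achieves the same thing from outside the process, and pywikibot.output() may side-step it since it does its own encoding handling:

    import codecs
    import sys

    # In a container with no locale, stdout's encoding is None or plain ASCII.
    if sys.stdout.encoding in (None, 'ascii', 'ANSI_X3.4-1968'):
        sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

    print(u'El\u0151nyomul\u00e1s a Bug foly\u00f3n\u00e1l')  # no UnicodeEncodeError
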
[13:47:46] But Travis will build and upload stuff to dockerhub for us
[13:47:52] ok, I have no idea about how to properly organise
[13:47:57] https://commons.wikimedia.org/wiki/Category:Unidentified_objects
[13:48:18] jayvdb, ?
[13:48:18] https://commons.wikimedia.org/wiki/Category:Unidentified_symbols should be easy to add items to
[13:48:22] will travis upload to pywikibotcatfiles?
[13:48:54] DrTrigon_, It can upload to anything I can. I've added my username/password (encrypted) to it
[13:49:13] Currently (mistakenly) it's set to abdealijk/file-metadata. Need to change it.
[13:49:39] I see ... and travis will build all dockers from the docker-file-metadata repo?
[13:49:52] https://commons.wikimedia.org/wiki/Category:Unidentified_people here we go ; this should be used in addition to category:faces
[13:50:31] and with even limited geo info, you can use subcats of https://commons.wikimedia.org/wiki/Category:Unidentified_people_by_country instead
[13:50:35] jayvdb: agree a must have!
[13:50:44] Yep !
[13:50:49] DrTrigon_, So, how it works is
[13:50:58] When we push to docker-file-metadata, Travis builds are triggered
[13:51:15] Now in travis, we tell travis to execute the commands we normally execute to upload to docker
[13:51:39] building the centos docker takes 1h
[13:51:44] So, essentially it uploads to docker using those commands (to whichever docker repo we want and with whichever tags we want it to use)
[13:52:08] Not in travis. Travis has a very fast internet connection and has good processors
[13:52:23] ok, cool. What about when we push to file-metadata?
[13:52:25] can I suggest that this week we don't do any technology improvements -- just adding/improving algorithms to add categories
[13:52:33] will that trigger a build as well?
[13:52:51] jayvdb: yes, sorry, just want to understand...
[13:53:02] we need to get a feeling for whether 5% categorisation can be obtained
[13:53:09] that is a nice number
[13:53:32] that is why I would like to look at the new stats as soon as we have them
[13:53:32] DrTrigon_, so when we push to file-metadata, the file-metadata's Travis is run. This doesn't use docker at all and just runs the unittests
[13:54:14] so that should trigger building of dockers as well. That's all for me about that. ;)
[13:54:24] back to what jayvdb mentioned ...
[13:54:27] since these NYPL tiff files are so common in the new files stream, we should talk with them to find out how we can help them
[13:54:51] ah, a ruler at the bottom : https://commons.wikimedia.org/wiki/File:Deacon_McGuire,_3-20-86_(NYPL_b13537024-55953).tiff
[13:54:58] ... I need to sort out what amount of cats are not file type cats.
[13:55:25] Would you like me to remove File type cats for now ?
[13:55:33] jayvdb: what about downloading the files in reduced size for first analysis?
[13:55:35] I mean we basically know that every file will have a file type cat ... so ...
[13:55:38] if we know they are all inches, the detection could determine the size of the object
[13:55:42] the big tiff files
[13:56:02] mediawiki has an automatic image scaler
[13:56:13] so you can fetch a smaller version if it is helpful
[13:56:21] since AbdealiJK__ mentioned they are very big sometimes
[13:56:43] I'm currently just ignoring them with -limitsize
[13:57:16] this is another batch upload happening now : https://commons.wikimedia.org/wiki/Commons:Batch_uploading/Fortepan.HU
[13:58:02] So. 1 moment: jayvdb, Could you send an email to the NYPL team and see if we can do something specific for them ?
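
A sketch of fetching a scaled-down rendering of a huge TIFF through MediaWiki's image scaler instead of skipping it with -limitsize, using the public imageinfo API (iiurlwidth); `requests` and the 1024 px width are assumptions, and the file title in the comment is just one of the examples above:

    import requests

    API = 'https://commons.wikimedia.org/w/api.php'

    def thumb_url(title, width=1024):
        """Return the URL of a <width>px rendering of the given File: page."""
        params = {
            'action': 'query', 'format': 'json', 'prop': 'imageinfo',
            'iiprop': 'url', 'iiurlwidth': width, 'titles': title,
        }
        data = requests.get(API, params=params).json()
        page = next(iter(data['query']['pages'].values()))
        return page['imageinfo'][0]['thumburl']

    # e.g. thumb_url('File:Deacon_McGuire,_3-20-86_(NYPL_b13537024-55953).tiff')
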
[13:58:19] AbdealiJK__: NO, do not remove file type cats. They are ok, just let me have the stats to see what's going on.
[13:58:30] DrTrigon_, ok
[13:59:09] jayvdb: 5% cats ...
[13:59:09] no. both NYPL and Fortepan.HU are being done by https://commons.wikimedia.org/wiki/User:F%C3%A6 , so talk to him ?
[13:59:41] ... as I saw we had a lot more, but we need leaf cats as you said, right?
[13:59:49] alright
[14:00:12] cc: me, if you want, but I can't help much - he wants your help for his project, and you want his help for your project.
[14:00:26] +1
[14:00:38] all of these images beg for categories to be added
[14:00:42] and you can do that for him
[14:01:36] jayvdb: what kind of images are those? all photographs of people?
[14:02:03] Would be better to speak to Fae - as he'd have a better idea
[14:02:17] there are all sorts in : https://commons.wikimedia.org/wiki/Category:Robert_N._Dennis_collection_of_stereoscopic_views
[14:02:29] and other batch categories he is populating
[14:03:10] so we should have the methods running that we discussed last week, right?
[14:04:01] would that make sense? be helpful?
[14:04:32] or do you just want to run what we have right now on those sets?
[14:05:10] DrTrigon_, sorry, what methods are you talking about ?
[14:05:27] monochrome photographs etc...
[14:06:21] Yes
[14:07:23] Hm. That's interesting. I would have assumed https://commons.wikimedia.org/wiki/File:Lövészárok_az_Ikva_folyó_közelében._Fortepan_52218.jpg would have been categorized as Black and White
[14:07:27] Need to check that out.
[14:07:51] jayvdb: do you want to run what code we have now on those sets? or do you want to have also cats like we discussed last and this week (e.g. monochrome pictures etc.)?
[14:08:04] I would rather wait and add these
[14:08:41] Only because I want to code and am a bit tired of testing >_<
[14:09:21] to me both is fine? just wanted to check jayvdb's opinion?
[14:09:26] fine!
[14:09:41] jayvdb, Still there ?
[14:10:02] yup
[14:10:23] Which do you think should be priority ?
[14:11:22] I think we shouldn't add categories yet. we need leaf categories. geo will help select better categories
[14:12:03] at least per-country categories where they exist
[14:12:16] (need leaf categories = add categories?)
[14:12:40] Sorry I don't understand. I think DrTrigon_ was asking whether to add these new categories like Transparent_background, Football_kits, etc. OR run the current code on the NYPL and Fortepan.HU datasets
[14:12:52] ^ or rather which to do first
[14:14:11] work on new algorithms and improving existing ones
[14:14:20] alright, cool
[14:14:30] start the conversation with Fae about his two batch uploads
[14:14:45] he may suggest features which you can implement as new algorithms
[14:14:57] sure, will do
[14:15:36] this bot can only work on uncategorised files. if you categorise a file on the wiki, you can't update the categories later
[14:16:05] because a human could have added categories, and your bot can't detect when it is adding a dumber version of the same category
[14:16:26] Yep (right now at least)
[14:17:09] we want to hit 5% in non-writing testing first, then we can do a real run and hit that 5%, and write that in the project report
[14:18:07] if you partially categorise the easy ones now with dumb categories, you can't (easily) recategorise them later with better categories which will show the features of the tool
[14:18:39] (good point!)
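
A rough pywikibot sketch of the "only touch uncategorised files" guard described above, ignoring hidden maintenance categories; whether file-metadata's bot implements it exactly this way is an assumption, and 'File:Example.jpg' is a placeholder title:

    import pywikibot

    def is_uncategorised(page):
        """True if the file page has no visible (non-hidden) categories yet."""
        return all(cat.isHiddenCategory() for cat in page.categories())

    site = pywikibot.Site('commons', 'commons')
    page = pywikibot.FilePage(site, 'File:Example.jpg')
    if is_uncategorised(page):
        pass  # only then is it safe for the bot to add its suggested categories
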
[14:19:31] (if 5% is the right number of course ...)
[14:20:08] I need the stats to decide on that.
[14:21:06] DrTrigon_, to confirm. You wanted stats on how many files were being added to each category, right ?
[14:21:55] Exactly, to distinguish between the top and the others a bit.
[14:22:22] jayvdb: Are you considering categorisation of such upload sets or recently uploaded ones or old uncategorized ones, when you talk about the 5%?
[14:22:33] newfiles
[14:22:53] but we can cheat, knowing what newfiles will contain
[14:23:09] There I was always thinking we say 5% within a day period, e.g.
[14:23:30] that means as we improve the bot, that rate will increase on a per-day basis
[14:23:36] yes. but we know Fae is adding files every day ;-)
[14:23:48] and we know what files he is adding
[14:24:17] (cheating can be complex... ;))
[14:24:39] so we know more about the input - it isn't random - and we can optimise this project to work well on the current input files
[14:24:50] (nod)
[14:25:03] it looks like 5% of newfiles is NYPL
[14:25:08] maybe even higher
[14:25:24] Hey !
[14:25:31] The new analysis is up at https://commons.wikimedia.org/wiki/User:AbdealiJKTravis/logs/newimages
[14:25:57] DrTrigon_, Is this what you were expecting ?
[14:26:09] * DrTrigon_ checking...
[14:26:10] ^ are these stats what you were expecting * ?
[14:26:46] perfect, very nice - since now we can calculate whether we have 5% or not! +10
[14:27:16] 10% human faces!!!
[14:27:41] * DrTrigon_ needs to check the false rate now...
[14:27:54] Interesting. It seems like PAINT.NET has "Made with Paint.NET" instead of "Created with Paint.NET"
[14:28:30] DrTrigon_, If you wanted this data earlier you could have searched for "Category:Human faces" with Ctrl+F in the browser and seen the number of hits.
[14:28:36] ^ Just mentioning as that's what I had done earlier
[14:30:47] ?
[14:31:37] Nevermind ^^
[14:31:48] jayvdb: Do you want to add anything?
[14:32:09] no
[14:32:44] AbdealiJK__: Anything from your side?
[14:32:47] Nope
[14:33:37] So then I think we should finish, right?
[14:33:57] Yep
[14:34:00] Cya later :)
[14:34:33] Thanks a lot for all your input guys!
[14:34:38] ;-)