2016-07-08 12:21:44
|
<jayvdb>
|
hey ;-)
|
2016-07-08 12:22:55
|
<jayvdb>
|
have you looked at the video copyright code yet?
|
2016-07-08 12:28:16
|
<AbdealiJK>
|
Hi DrTrigon DrTrigon_ jayvdb
|
2016-07-08 12:28:30
|
<jayvdb>
|
perfect timing ;-)
|
2016-07-08 12:28:34
|
<DrTrigon_>
|
Hi!
|
2016-07-08 12:28:46
|
<AbdealiJK>
|
:D
|
2016-07-08 12:28:54
|
<AbdealiJK>
|
I didn't have time today to do it on the recent files though :(
|
2016-07-08 12:29:23
|
<AbdealiJK>
|
Had to go out suddenly and just came back. I will do it in another 2-3 hrs and post results for that in conpherence
|
2016-07-08 12:30:24
|
<jayvdb>
|
ok
|
2016-07-08 12:30:24
|
<DrTrigon_>
|
about timing: the other meeting today is at a bad time for me - any chance that we can find a replacement?
|
2016-07-08 12:31:06
|
<jayvdb>
|
I havent heard back from Ty. If he can make it, I'll talk with him, and then organise another meeting with us all
|
2016-07-08 12:31:26
|
<AbdealiJK>
|
Tomorrow same time or day after same time is good for me.
|
2016-07-08 12:31:39
|
<DrTrigon_>
|
that would be perfect - if not possible ping me and I'll try to follow...
|
2016-07-08 12:31:41
|
<AbdealiJK>
|
I'm not available on Mon-Wed at that time though
|
2016-07-08 12:32:18
|
<jayvdb>
|
ok. AbdealiJK is that time today good for you?
|
2016-07-08 12:32:24
|
<DrTrigon_>
|
can we share a google calendar with availability? or a doodle?
|
2016-07-08 12:32:24
|
<AbdealiJK>
|
Yes
|
2016-07-08 12:32:26
|
<jayvdb>
|
ok
|
2016-07-08 12:32:59
|
<DrTrigon_>
|
if this is of any use?
|
2016-07-08 12:33:02
|
<jayvdb>
|
If Ty cant make it in a few hours, a Doodle would be good
|
2016-07-08 12:33:19
|
<DrTrigon_>
|
perfect - so let's see what happens
|
2016-07-08 12:33:32
|
<DrTrigon_>
|
are you fine guys? do we want to start?
|
2016-07-08 12:33:40
|
<jayvdb>
|
ok, do we intend to skype now?
|
2016-07-08 12:34:06
|
<AbdealiJK>
|
I think IRC is fine - not much to discuss this week
|
2016-07-08 12:34:14
|
<AbdealiJK>
|
But anything's good for me
|
2016-07-08 12:34:15
|
<DrTrigon_>
|
no opinion
|
2016-07-08 12:34:41
|
<AbdealiJK>
|
jayvdb, Which would you prefer ?
|
2016-07-08 12:35:56
|
<jayvdb>
|
maybe try with IRC first and then a quick skype for anything that we want to chat about
|
2016-07-08 12:36:08
|
<AbdealiJK>
|
Alrighto
|
2016-07-08 12:36:15
|
<DrTrigon_>
|
alrighty
|
2016-07-08 12:36:32
|
<AbdealiJK>
|
So, regarding updates - I looked at EXIF data a bit more to see if we can use any more of it
|
2016-07-08 12:37:14
|
<AbdealiJK>
|
I found that we can detect whether an image is a screenshot using one bit of information (If "gnome-screenshot" was in some EXIF data it was a screenshot from GNOME's default tool)
|
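A minimal sketch of the screenshot check described above, assuming the EXIF data has already been read into a plain dict. The dict layout and the function name `looks_like_gnome_screenshot` are hypothetical and not taken from the project's code; only the "gnome-screenshot" marker comes from the discussion.

```python
def looks_like_gnome_screenshot(exif):
    """Return True if any EXIF value mentions gnome-screenshot.

    `exif` is assumed to be a dict of tag name -> value, as some
    EXIF reader might return it (hypothetical layout).
    """
    marker = "gnome-screenshot"
    return any(marker in str(value).lower() for value in exif.values())


# Hypothetical metadata dicts for illustration only.
print(looks_like_gnome_screenshot({"Software": "gnome-screenshot"}))
print(looks_like_gnome_screenshot({"Make": "Canon", "Model": "EOS 600D"}))
```

Scanning every value rather than one fixed tag is a deliberate choice here, since the log only says the string appears "in some EXIF data".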
2016-07-08 12:37:40
|
<AbdealiJK>
|
I tried checking other tools too - cheese, digikam, imagemagick, etc. But none of them left any trace on the exif data
|
2016-07-08 12:38:13
|
<jayvdb>
|
very useful
|
2016-07-08 12:38:28
|
<jayvdb>
|
screenshots can be copyrighted, and thus need to be deleted
|
2016-07-08 12:38:29
|
<AbdealiJK>
|
I then also tried to add more software but found popular things like matplotlib, octave, matlab, etc don't mess with the exif data :(
|
2016-07-08 12:38:54
|
<AbdealiJK>
|
jayvdb, Nod. Plus there's a Category:Screenshot where we can move it to get the right people looking at it
|
2016-07-08 12:39:02
|
<jayvdb>
|
yup
|
2016-07-08 12:39:08
|
<DrTrigon_>
|
jayvdb: > 3000 files in cat screenshots
|
2016-07-08 12:40:17
|
<DrTrigon_>
|
AbdealiJK: but you have some detection for matlab, right?
|
2016-07-08 12:40:23
|
<jayvdb>
|
I see "Number of grey shades used: 246" - I guess that can be used to determine if an image is b&w only
|
2016-07-08 12:41:04
|
<AbdealiJK>
|
DrTrigon_, Right you are. Sorry - matlab was written wrongly in that message
|
2016-07-08 12:41:16
|
<DrTrigon_>
|
:)
|
2016-07-08 12:41:26
|
<AbdealiJK>
|
jayvdb, Is there a reason to detect BnW images ? I found that the Category didnt have all sorts of BnW images
|
2016-07-08 12:41:37
|
<jayvdb>
|
there is a separate category for browns only , like https://commons.wikimedia.org/w/index.php?oldid=198123175
|
2016-07-08 12:41:38
|
<DrTrigon_>
|
yea it's a pity not all software uses the exif tags ...
|
2016-07-08 12:42:11
|
<jayvdb>
|
https://commons.wikimedia.org/wiki/Category:Black_and_white_photographs
|
2016-07-08 12:42:28
|
<jayvdb>
|
and other https://commons.wikimedia.org/wiki/Category:Monochrome_photographs
|
2016-07-08 12:43:03
|
<AbdealiJK>
|
Ahh, ok. I was just looking at Category:Black and white
|
2016-07-08 12:43:12
|
<jayvdb>
|
your tool should be able to process all of the files in https://commons.wikimedia.org/wiki/Category:Monochrome_photographs , and move them into a subcategory
|
2016-07-08 12:43:49
|
<DrTrigon_>
|
(straight forward...)
|
2016-07-08 12:44:08
|
<AbdealiJK>
|
Not very sure if it is straight forward - but yes will look into it
|
2016-07-08 12:44:14
|
<jayvdb>
|
hehe
|
2016-07-08 12:44:18
|
<AbdealiJK>
|
Especially the sepia color checking
|
2016-07-08 12:44:18
|
<DrTrigon_>
|
;))
|
2016-07-08 12:44:41
|
<DrTrigon_>
|
You don't have to do ALL subcats PERFECT
|
2016-07-08 12:44:47
|
<AbdealiJK>
|
Cool, will do that :+1:
|
2016-07-08 12:44:59
|
<AbdealiJK>
|
DrTrigon_, nod
|
2016-07-08 12:45:24
|
<AbdealiJK>
|
I think converting RGB -> HSV and then thresholding the Hue should work ...
|
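The RGB -> HSV idea could be sketched roughly as follows, using only the standard library. The hue band, the saturation floor, and the function name `sepia_fraction` are assumptions chosen for illustration; real thresholds would need tuning against files from the Commons categories.

```python
import colorsys


def sepia_fraction(pixels, hue_range=(0.05, 0.15), min_saturation=0.1):
    """Fraction of pixels whose hue lies in an assumed brownish band.

    `pixels` is an iterable of (r, g, b) tuples with values 0-255.
    Near-grey pixels (low saturation) have no meaningful hue and are
    not counted as sepia.
    """
    hits = 0
    total = 0
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total += 1
        if s >= min_saturation and hue_range[0] <= h <= hue_range[1]:
            hits += 1
    return hits / float(total) if total else 0.0


# Two brownish pixels and two non-sepia pixels (blue and mid-grey).
print(sepia_fraction([(112, 66, 20), (160, 120, 80)]))
print(sepia_fraction([(0, 0, 255), (128, 128, 128)]))
```

An image could then be called sepia when the fraction exceeds some cut-off such as 0.9, which is again an assumed value.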
2016-07-08 12:45:44
|
<DrTrigon_>
|
or peak detection?
|
2016-07-08 12:45:47
|
<AbdealiJK>
|
will try it out later
|
2016-07-08 12:46:03
|
<DrTrigon_>
|
may be also with summing/integrating?
|
2016-07-08 12:46:03
|
<AbdealiJK>
|
Peak detection to find whether an image is sepia color ?
|
2016-07-08 12:46:35
|
<jayvdb>
|
regarding models, it would be useful to publish a list of makes/models that you cant categorise, because the category doesnt exist
|
2016-07-08 12:46:52
|
<jayvdb>
|
someone probably needs to create a category, or create a redirect to an existing category
|
2016-07-08 12:47:50
|
<AbdealiJK>
|
What do you mean by publish a list ?
|
2016-07-08 12:47:51
|
<DrTrigon_>
|
jayvdb: yes important point! good note!
|
2016-07-08 12:48:03
|
<DrTrigon_>
|
e.g. if there is a color cat missing...
|
2016-07-08 12:48:19
|
<DrTrigon_>
|
...like you have red, green but blue is missing
|
2016-07-08 12:48:55
|
<DrTrigon_>
|
jayvdb: can we just use/add these cats?
|
2016-07-08 12:49:12
|
<DrTrigon_>
|
not too extensively of course...
|
2016-07-08 12:49:34
|
<jayvdb>
|
yes, possibly, but I am worried about junk EXIF data resulting in junk categories.
|
2016-07-08 12:50:07
|
<AbdealiJK>
|
jayvdb, I think DrTrigon_ meant can we add it manually when we find it ourselves rather than making a list
|
2016-07-08 12:50:14
|
<jayvdb>
|
maybe have a threshold - if the bot has seen the same make/model 3 times, add the category to the file page even if the category does not exist
|
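A rough sketch of the threshold idea, assuming the bot keeps a list of the make/model strings it has seen and a set of category names known to exist. The function name `red_category_candidates` and the input shapes are hypothetical; only the "seen 3 times" rule is from the discussion.

```python
from collections import Counter


def red_category_candidates(makes_models, existing_categories, min_seen=3):
    """Return make/model names seen at least `min_seen` times
    for which no category exists yet, sorted alphabetically."""
    counts = Counter(makes_models)
    return sorted(
        name
        for name, seen in counts.items()
        if seen >= min_seen and name not in existing_categories
    )


# Hypothetical batch: one model is frequent but uncategorised,
# one is frequent and already has a category, one is a one-off.
seen = ["Canon EOS 600D"] * 3 + ["Nikon D90"] * 4 + ["Xyz 1"]
print(red_category_candidates(seen, existing_categories={"Nikon D90"}))
```

The same output doubles as the published list of missing categories that a human reviewer can work through.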
2016-07-08 12:50:14
|
<DrTrigon_>
|
yes, we should only do that in very restricted cases
|
2016-07-08 12:50:49
|
<AbdealiJK>
|
Nooo, the bot should never add it.
|
2016-07-08 12:50:54
|
<DrTrigon_>
|
AbdealiJK: some of them are not very disputed - but some will be - thus a list beforehand might be useful
|
2016-07-08 12:50:57
|
<jayvdb>
|
then it is easy for a human reviewer to create the category
|
2016-07-08 12:51:24
|
<AbdealiJK>
|
Ah - right. If there's a human reviewer then it's fine.
|
2016-07-08 12:51:26
|
<jayvdb>
|
AbdealiJK, the bot shouldnt create the category, but it can add `[[Category:Foo]]` even if the category doesnt ecist
|
2016-07-08 12:51:27
|
<DrTrigon_>
|
the bot checks whether it exists, and if so starts to populate it
|
2016-07-08 12:51:50
|
<jayvdb>
|
*exist
|
2016-07-08 12:51:51
|
<DrTrigon_>
|
jayvdb: that's what I consider "creating"
|
2016-07-08 12:51:58
|
<DrTrigon_>
|
... ;))
|
2016-07-08 12:52:15
|
<AbdealiJK>
|
Just a note - even if the category doesnt exist, the bot does print it (just without the [[ ]] for the link)
|
2016-07-08 12:52:20
|
<jayvdb>
|
ya, it does implicitly create a category
|
2016-07-08 12:52:48
|
<AbdealiJK>
|
ok, got it
|
2016-07-08 12:52:54
|
<jayvdb>
|
AbdealiJK, but to reach 5% (or another goal %), you need to modify the file pages
|
2016-07-08 12:53:19
|
<jayvdb>
|
so linking to categories, even if they do not exist, is still categorisation
|
2016-07-08 12:53:38
|
<AbdealiJK>
|
ok
|
2016-07-08 12:53:54
|
<DrTrigon_>
|
jayvdb: which is good?!?
|
2016-07-08 12:53:59
|
<jayvdb>
|
yup
|
2016-07-08 12:54:29
|
<DrTrigon_>
|
AbdealiJK: about "Camera/Scanner" ...
|
2016-07-08 12:54:38
|
<AbdealiJK>
|
DrTrigon_, yes ?
|
2016-07-08 12:54:53
|
<DrTrigon_>
|
... I think you can try to suggest categories ...
|
2016-07-08 12:55:11
|
<DrTrigon_>
|
... again be bit careful and do only obvious cases ...
|
2016-07-08 12:55:21
|
<DrTrigon_>
|
... that is better than nothing, right?
|
2016-07-08 12:55:31
|
<AbdealiJK>
|
I am suggesting categories right ?
|
2016-07-08 12:55:34
|
<jayvdb>
|
DrTrigon , that is what I was talking about above. The bot can add categories to the page, which is a suggestion
|
2016-07-08 12:55:35
|
<AbdealiJK>
|
I don't think I understand
|
2016-07-08 12:55:48
|
<DrTrigon_>
|
"Because of this, it cannot "suggest" categories that do not exist yet"
|
2016-07-08 12:55:55
|
<AbdealiJK>
|
Ahhh
|
2016-07-08 12:55:56
|
<DrTrigon_>
|
(may be I got it wrong?)
|
2016-07-08 12:56:13
|
<AbdealiJK>
|
ok. Got what you meant
|
2016-07-08 12:56:21
|
<jayvdb>
|
we call them 'red categories'
|
2016-07-08 12:56:32
|
<AbdealiJK>
|
Based on the above discussion that it's ok to add red-categories, I will add it.
|
2016-07-08 12:56:42
|
<DrTrigon_>
|
cool! ;)
|
2016-07-08 12:56:42
|
<jayvdb>
|
ok
|
2016-07-08 12:56:48
|
<AbdealiJK>
|
BTW: One issue here is that
|
2016-07-08 12:57:07
|
<AbdealiJK>
|
I cannot know whether it should be "Taken with <make>" or "Scanned with <make>"
|
2016-07-08 12:57:27
|
<DrTrigon_>
|
in case of doubt just "<make>"
|
2016-07-08 12:57:32
|
<AbdealiJK>
|
Hence we still cannot suggest whether it is "scanned" or "taken"
|
2016-07-08 12:57:58
|
<DrTrigon_>
|
nice info but not the main part...
|
2016-07-08 12:58:14
|
<DrTrigon_>
|
...as soon as you look up the software you should know it anyways.
|
2016-07-08 12:58:15
|
<jayvdb>
|
"Created with ..." ?
|
2016-07-08 12:58:16
|
<AbdealiJK>
|
I think I'll just make "... with <make>" so that the human who is checking can easily move that category and decide what "..." is
|
2016-07-08 12:58:33
|
<AbdealiJK>
|
"Created with ..." is for software jayvdb - Like ImageMagick, Matlab, etc.
|
2016-07-08 12:58:57
|
<AbdealiJK>
|
"Taken with ..." is for camera photographs
|
2016-07-08 12:58:57
|
<jayvdb>
|
a camera is 80% software these days ;-)
|
2016-07-08 12:59:13
|
<DrTrigon_>
|
yes but a scanner or a generator?
|
2016-07-08 12:59:16
|
<DrTrigon_>
|
;))
|
2016-07-08 12:59:19
|
<AbdealiJK>
|
I don't think the commons community would agree though ^
|
2016-07-08 12:59:38
|
<DrTrigon_>
|
photographers or coders?
|
2016-07-08 13:00:11
|
<AbdealiJK>
|
Ok, 1 moment. I'd like to pause as I find this is a very small issue
|
2016-07-08 13:00:25
|
<AbdealiJK>
|
I've already tested it for most of the big brands - Fujifilm, Nikon, Canon, etc.
|
2016-07-08 13:00:27
|
<DrTrigon_>
|
sure, go on!
|
2016-07-08 13:00:46
|
<AbdealiJK>
|
And so 80% of images should be categorized correctly
|
2016-07-08 13:01:08
|
<jayvdb>
|
ok. so not worth worrying about?
|
2016-07-08 13:01:20
|
<AbdealiJK>
|
We can increase it later by adding *some sort of* red links. But It's a minor detail and we can just do it later
|
2016-07-08 13:01:36
|
<jayvdb>
|
nod.
|
2016-07-08 13:01:47
|
<DrTrigon_>
|
agree, the code needs to be ready for that ...
|
2016-07-08 13:01:57
|
<AbdealiJK>
|
agree
|
2016-07-08 13:02:00
|
<DrTrigon_>
|
... but adding is some "minor" task for later.
|
2016-07-08 13:02:11
|
<jayvdb>
|
Maybe the red-category can be "Taken/Scanned with <make>"
|
2016-07-08 13:02:37
|
<AbdealiJK>
|
^ interesting :)
|
2016-07-08 13:02:50
|
<jayvdb>
|
or "Taken or scanned with <make>"
|
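One way the naming fallback might look, as a sketch only. The function name `make_category` and the `kind` argument are hypothetical; the three prefixes are the ones mentioned in the discussion.

```python
def make_category(make, kind=None):
    """Build a category link for a device make.

    `kind` is "camera", "scanner", or None when the bot cannot tell,
    in which case the combined red-category name is used.
    """
    prefixes = {"camera": "Taken with", "scanner": "Scanned with"}
    prefix = prefixes.get(kind, "Taken or scanned with")
    return "[[Category:{} {}]]".format(prefix, make)


print(make_category("Canon", kind="camera"))
print(make_category("Epson"))
```

Keeping the fallback in one place means a later rename only touches this function, and the files already tagged can be moved with cat-a-lot.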
2016-07-08 13:02:54
|
<DrTrigon_>
|
or we ask on village pump? ;)
|
2016-07-08 13:03:08
|
<DrTrigon_>
|
AbdealiJK: do you want to continue with the next point?
|
2016-07-08 13:03:09
|
<AbdealiJK>
|
I think we can just choose one. And if someone comments with a better option, we can just use cat-a-lot to move stuff easily
|
2016-07-08 13:03:20
|
<AbdealiJK>
|
DrTrigon_, yes
|
2016-07-08 13:03:21
|
<jayvdb>
|
nod
|
2016-07-08 13:03:22
|
<DrTrigon_>
|
nod
|
2016-07-08 13:03:49
|
<jayvdb>
|
now regarding OpenStreetMap vs Google Maps, we really should be using OpenStreetMap if we can
|
2016-07-08 13:03:51
|
<AbdealiJK>
|
So, I tried creating an "official" docker file based on our last week conversation
|
2016-07-08 13:04:10
|
<DrTrigon_>
|
nod
|
2016-07-08 13:04:14
|
<AbdealiJK>
|
AH, ok - skipping the docker point then
|
2016-07-08 13:04:18
|
<jayvdb>
|
how much worse is the OSM data ?
|
2016-07-08 13:04:24
|
<AbdealiJK>
|
I tried using OSM. I prefer OSM compared to Google too
|
2016-07-08 13:04:25
|
<DrTrigon_>
|
???
|
2016-07-08 13:04:59
|
<AbdealiJK>
|
So the issue is that OSM was giving "New No 4, Street ABC, Town, City, Country, PIN" as a string
|
2016-07-08 13:05:14
|
<jayvdb>
|
and you need to parse it?
|
2016-07-08 13:05:26
|
<AbdealiJK>
|
Can't parse it - it has no standard format
|
2016-07-08 13:05:45
|
<jayvdb>
|
hmm. sounds like the geopy package is not helpful
|
2016-07-08 13:05:58
|
<AbdealiJK>
|
checks the OSM API
|
2016-07-08 13:06:30
|
<jayvdb>
|
looking for the python package I switched to recently ...
|
2016-07-08 13:06:44
|
<AbdealiJK>
|
AH, youve tried it already - nice
|
2016-07-08 13:07:18
|
<AbdealiJK>
|
jayvdb, But even with that OSM will still have a limit
|
2016-07-08 13:07:47
|
<jayvdb>
|
the limit isnt the problem
|
2016-07-08 13:08:18
|
<jayvdb>
|
wikimedia has a copy of the OSM data, so we can host our own instance of any query service if we need it
|
2016-07-08 13:08:33
|
<AbdealiJK>
|
:o nice
|
2016-07-08 13:08:34
|
<DrTrigon_>
|
ahhhh!
|
2016-07-08 13:08:41
|
<jayvdb>
|
(and may already have a copy of the Nominatim service running somewhere on tools)
|
2016-07-08 13:08:56
|
<AbdealiJK>
|
Wait ! so is OSM a part of wikimedia ?
|
2016-07-08 13:09:00
|
<AbdealiJK>
|
Or is this a collaboration ?
|
2016-07-08 13:09:40
|
<DrTrigon_>
|
not part ..., right?
|
2016-07-08 13:09:47
|
<jayvdb>
|
https://github.com/damianbraun/nominatim
|
2016-07-08 13:09:56
|
<jayvdb>
|
https://pypi.python.org/pypi/nominatim
|
2016-07-08 13:10:10
|
<jayvdb>
|
this is a collaboration
|
2016-07-08 13:10:39
|
<jayvdb>
|
Wikimedia uses OSM data a lot, so we set up our own database and tile server
|
2016-07-08 13:10:48
|
<AbdealiJK>
|
I can use `nominatim` +1
|
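For the unparseable address string, a hedged sketch: the Nominatim HTTP reverse endpoint returns a structured `address` object when `addressdetails=1` is passed, so the fields can be picked out without parsing `display_name`. This uses the public HTTP API directly rather than the `nominatim` Python package linked above; the helper names and the sample response are assumptions, and which address keys appear varies by location.

```python
from urllib.parse import urlencode

NOMINATIM_REVERSE = "https://nominatim.openstreetmap.org/reverse"


def reverse_url(lat, lon):
    """Build a reverse-geocoding URL that asks for structured address parts."""
    query = urlencode(
        {"format": "json", "lat": lat, "lon": lon, "addressdetails": 1}
    )
    return NOMINATIM_REVERSE + "?" + query


def place_names(response, keys=("city", "state", "country")):
    """Pick the wanted address parts out of a decoded JSON response,
    skipping any key the response does not contain."""
    address = response.get("address", {})
    return [address[key] for key in keys if key in address]


# Hand-written sample shaped like a reverse-geocoding response (not real data).
sample = {
    "display_name": "New No 4, Street ABC, Town, City, Country, PIN",
    "address": {"road": "Street ABC", "city": "Chennai",
                "state": "Tamil Nadu", "country": "India"},
}
print(reverse_url(13.0, 80.2))
print(place_names(sample))
```

The base URL could be swapped for a Wikimedia-hosted instance if one is available, which would also sidestep the public rate limit.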
2016-07-08 13:10:52
|
<DrTrigon_>
|
I knew once that wiki draws a lot of data from OSM... :)
|
2016-07-08 13:10:54
|
<AbdealiJK>
|
I see jayvdb
|
2016-07-08 13:11:00
|
<DrTrigon_>
|
cool!
|
2016-07-08 13:11:09
|
<jayvdb>
|
and replicate the data into our local clone fairly regularly
|
2016-07-08 13:12:47
|
<jayvdb>
|
Wikimedia also has a license for MaxMind GeoIP, but I have never used that, so I dont know how useful it is. And I am not sure whether Wikimedia Foundation will be able to share their license for this purpose (probably, but not 100%)
|
2016-07-08 13:13:20
|
<DrTrigon_>
|
AbdealiJK: briefly coming back to docker - sorry for the time! - do you have anything you can give me in order that I can experiment around a bit?
|
2016-07-08 13:13:37
|
<AbdealiJK>
|
jayvdb, Is there any IRC channel I can ask about the OSM + wikimedia stuff ?
|
2016-07-08 13:13:50
|
<jayvdb>
|
I know the MaxMind license is basically 'unlimited'
|
2016-07-08 13:14:22
|
<DrTrigon_>
|
jayvdb: IF we can avoid using commercial services it would be nice ...
|
2016-07-08 13:14:35
|
<DrTrigon_>
|
...of course only if we don't lose quality
|
2016-07-08 13:14:40
|
<jayvdb>
|
https://wikitech.wikimedia.org/wiki/OSM_Tileserver
|
2016-07-08 13:15:08
|
<jayvdb>
|
https://lists.wikimedia.org/pipermail/maps-l/
|
2016-07-08 13:15:26
|
<AbdealiJK>
|
DrTrigon_ I'm updating my latest dockerfiles
|
2016-07-08 13:15:48
|
<jayvdb>
|
https://www.mediawiki.org/wiki/Maps
|
2016-07-08 13:15:55
|
<DrTrigon_>
|
just put it somewhere and drop me a link
|
2016-07-08 13:16:02
|
<jayvdb>
|
https://maps.wikimedia.org/#4/40.75/-73.96
|
2016-07-08 13:16:41
|
<AbdealiJK>
|
DrTrigon_, in https://github.com/AbdealiJK/file-metadata/tree/ajk/docker you can see Dockerfile.ubuntu - that would be what you'd be interested in
|
2016-07-08 13:16:55
|
<AbdealiJK>
|
There's also Dockerfile.centos if you prefer that
|
2016-07-08 13:17:14
|
<DrTrigon_>
|
whatever you think works better - can use anything inside the VM
|
2016-07-08 13:17:56
|
<jayvdb>
|
https://tools.wmflabs.org/locator-tool/index.html#/
|
2016-07-08 13:18:00
|
<AbdealiJK>
|
I think Ubuntu would be easier.
|
2016-07-08 13:18:08
|
<DrTrigon_>
|
ok
|
2016-07-08 13:18:16
|
<AbdealiJK>
|
maps.wikimedia.org is really cool jayvdb
|
2016-07-08 13:20:06
|
<DrTrigon_>
|
agree - thanks for that!
|
2016-07-08 13:20:23
|
<DrTrigon_>
|
open questions here? do we want to go through the meeting agenda?
|
2016-07-08 13:21:28
|
<DrTrigon_>
|
AbdealiJK, jayvdb: ^^^
|
2016-07-08 13:21:30
|
<jayvdb>
|
sure
|
2016-07-08 13:21:47
|
<AbdealiJK>
|
yes please
|
2016-07-08 13:22:36
|
<DrTrigon_>
|
so AbdealiJK as I understand we get "categorization hints" (could not check the code yesterday...), right?
|
2016-07-08 13:23:28
|
<AbdealiJK>
|
Yes, that was done last week if I'm not mistaken
|
2016-07-08 13:23:51
|
<DrTrigon_>
|
I thought there were minor things open - even better! :)
|
2016-07-08 13:24:14
|
<DrTrigon_>
|
then I was thinking about how to define the goal ...
|
2016-07-08 13:24:33
|
<DrTrigon_>
|
... would it be possible to start gathering and outputting stats now?
|
2016-07-08 13:24:48
|
<DrTrigon_>
|
in a nice table / overview
|
2016-07-08 13:25:05
|
<AbdealiJK>
|
DrTrigon_ what sort of stats are you thinking about ?
|
2016-07-08 13:25:35
|
<DrTrigon_>
|
e.g. when you do a run for https://commons.wikimedia.org/wiki/User:AbdealiJKTravis/logs/Category_Male_faces ...
|
2016-07-08 13:26:00
|
<AbdealiJK>
|
I find automatically adding it is irritating - because the script gets a bit messy
|
2016-07-08 13:26:25
|
<DrTrigon_>
|
... can you output a table at the end summarizing e.g. 1000 processed, 100 categorized, 10 issues, like ...
|
2016-07-08 13:26:27
|
<AbdealiJK>
|
Stats need structured data, and right now that is just dumping wiki lines
|
2016-07-08 13:26:38
|
<DrTrigon_>
|
something you did in the beginning but as table
|
2016-07-08 13:27:01
|
<AbdealiJK>
|
I can do "1000 processed" and "100 categorized"
|
2016-07-08 13:27:05
|
<DrTrigon_>
|
it does not need to be automatic ...
|
2016-07-08 13:27:09
|
<AbdealiJK>
|
"10 issues" cannot be done easily right ?
|
2016-07-08 13:27:21
|
<DrTrigon_>
|
... maybe in a separate script ...?
|
2016-07-08 13:28:02
|
<AbdealiJK>
|
DrTrigon_ I tried that in the beginning too. But it seemed unnecessary ... I would need to parse the wikicode that the first script writes and again process data, etc.
|
2016-07-08 13:28:04
|
<DrTrigon_>
|
by "issues" I mean "uncertain categorization"
|
2016-07-08 13:28:12
|
<AbdealiJK>
|
I found it was just easier to do manually
|
2016-07-08 13:28:42
|
<DrTrigon_>
|
it's not that we need that in the bot - it's about getting an idea of how it performs overall
|
2016-07-08 13:29:42
|
<jayvdb>
|
or maybe a table, with number of categories added per file. i.e. 5+ cats added: 10 images ; 4+ cats added: 50 images ; 3+ cats added: 100 images ...
|
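The "N+ cats added" table could be computed along these lines, assuming each bot run can produce a mapping from file name to the number of categories added. The function name and the input shape are hypothetical.

```python
def cumulative_category_counts(cats_per_file, max_bucket=5):
    """Map each threshold k (max_bucket down to 1) to the number of
    files that received at least k categories."""
    return {
        k: sum(1 for n in cats_per_file.values() if n >= k)
        for k in range(max_bucket, 0, -1)
    }


# Hypothetical run: file name -> number of categories added.
run = {"a.jpg": 5, "b.jpg": 3, "c.jpg": 3, "d.jpg": 0, "e.jpg": 1}
for threshold, files in cumulative_category_counts(run).items():
    print("{}+ cats added: {} images".format(threshold, files))
```

Because the buckets are cumulative, a file with 5 categories is also counted under 4+, 3+ and so on, matching the example in the message above.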
2016-07-08 13:30:20
|
<AbdealiJK>
|
that can be done
|
2016-07-08 13:30:30
|
<DrTrigon_>
|
we need a tool to keep control of the bot quality
|
2016-07-08 13:30:41
|
<AbdealiJK>
|
I think uncertain categories can be done too
|
2016-07-08 13:30:50
|
<DrTrigon_>
|
and that has 2 parts; how good the bot finds data from files ...
|
2016-07-08 13:31:07
|
<DrTrigon_>
|
... and how good it can relate this info to categories on commons
|
2016-07-08 13:31:56
|
<jayvdb>
|
right. we dont want a bot adding "Black and white" and "photograph" when it can be adding only "Black and white photographs"
|
2016-07-08 13:32:16
|
<jayvdb>
|
the most valuable categorisation is leaf categories in the category tree
|
2016-07-08 13:33:55
|
<jayvdb>
|
perhaps that is a way to decide on quality categories : the bot must add three categories that are leaf categories
|
2016-07-08 13:34:01
|
<jayvdb>
|
^ just an example
|
2016-07-08 13:34:07
|
<DrTrigon_>
|
so we need to find a way to measure performance of (A) algorithms like face detection, etc. and (B) relating that info to commons categorization
|
2016-07-08 13:34:35
|
<jayvdb>
|
if it does that, for 5% of the images, the tool is very useful
|
2016-07-08 13:35:28
|
<DrTrigon_>
|
of categorized images, right?
|
2016-07-08 13:35:47
|
<DrTrigon_>
|
(that could be a goal... correct)
|
2016-07-08 13:36:03
|
<jayvdb>
|
'three leaf categories' is probably too hard. maybe only 'one leaf category, and two non-leaf categories'
|
2016-07-08 13:36:48
|
<jayvdb>
|
ok, I need to go soon
|
2016-07-08 13:37:08
|
<DrTrigon_>
|
I might go for 1 leaf cat in 5% as MVP and 3 leaf cat in 5% as bulls eye
|
2016-07-08 13:37:16
|
<DrTrigon_>
|
:)
|
2016-07-08 13:37:29
|
<jayvdb>
|
nod
|
2016-07-08 13:37:43
|
<DrTrigon_>
|
AbdealiJK: what do you think about measuring performance?
|
2016-07-08 13:37:55
|
<AbdealiJK>
|
3 leaf is probably not very feasible. Currently, the only analysis which gives us a leaf is Scanned with/Created with/Taken with
|
2016-07-08 13:38:17
|
<DrTrigon_>
|
and the black and white photographs, right?
|
2016-07-08 13:38:20
|
<jayvdb>
|
geo should give you a leaf cat
|
2016-07-08 13:38:23
|
<AbdealiJK>
|
Nope
|
2016-07-08 13:38:33
|
<AbdealiJK>
|
jayvdb, Not really, the place name can be of any granularity ..
|
2016-07-08 13:39:00
|
<AbdealiJK>
|
For example, I can detect that it was in India, Chennai. But maybe not that it was in IIT Madras (my college)
|
2016-07-08 13:39:03
|
<DrTrigon_>
|
AbdealiJK: sry, the monochromes
|
2016-07-08 13:39:08
|
<jayvdb>
|
geo should give you a leaf cat , or red leafcat
|
2016-07-08 13:39:29
|
<jayvdb>
|
ok . going now. ill be around in 15 mins
|
2016-07-08 13:40:17
|
<AbdealiJK>
|
DrTrigon_, https://commons.wikimedia.org/wiki/Category:Black_and_white_photographs has lots of subcats
|
2016-07-08 13:40:42
|
<DrTrigon_>
|
;) the "Nope" was not for me...
|
2016-07-08 13:40:57
|
<DrTrigon_>
|
so do you think measuring performance is feasible?
|
2016-07-08 13:41:44
|
<DrTrigon_>
|
jayvdb: bye - see ya!
|
2016-07-08 13:42:04
|
<AbdealiJK>
|
DrTrigon_ I am not sure how it can be done actually ...
|
2016-07-08 13:42:04
|
<DrTrigon_>
|
AbdealiJK: so do you think measuring performance is feasible?
|
2016-07-08 13:42:39
|
<AbdealiJK>
|
^ The measuring performance
|
2016-07-08 13:43:40
|
<DrTrigon_>
|
maybe the first step is to summarize info per bot run - then we can think about merging all those together
|
2016-07-08 13:44:29
|
<AbdealiJK>
|
nod. I will try creating a summary report based on what jayvdb mentioned (Number of images with 3 cats, etc)
|
2016-07-08 13:44:35
|
<AbdealiJK>
|
Is there any other stat ?
|
2016-07-08 13:45:17
|
<DrTrigon_>
|
I would include anything you want to learn about or does not work as you expect ...
|
2016-07-08 13:45:34
|
<DrTrigon_>
|
... you mentioned face-detection to be worse than you expected, etc.
|
2016-07-08 13:46:06
|
<AbdealiJK>
|
In the stats ?
|
2016-07-08 13:47:04
|
<DrTrigon_>
|
they are mostly for us developers, right? we want to know what parts of the code can or need to be improved...
|
2016-07-08 13:47:33
|
<DrTrigon_>
|
also I would output info like 5 sure categorizations, 3 experimental ones (relates to the face-detection etc.)
|
2016-07-08 13:47:47
|
<AbdealiJK>
|
Ok, so I think "Number of distinct categories"
|
2016-07-08 13:47:52
|
<DrTrigon_>
|
...respective to the 2 run modes
|
2016-07-08 13:47:58
|
<AbdealiJK>
|
"Number of files for each category"
|
2016-07-08 13:48:21
|
<AbdealiJK>
|
But we currently don't have a distinction for experimental vs sure - so, I'd like to procrastinate that if possible
|
2016-07-08 13:48:52
|
<DrTrigon_>
|
that is ok - but keep it in mind, for design etc.
|
2016-07-08 13:49:02
|
<AbdealiJK>
|
yep
|
2016-07-08 13:49:25
|
<DrTrigon_>
|
also it should be clear why they are distinct or categorized etc
|
2016-07-08 13:49:57
|
<AbdealiJK>
|
I didnt follow your last message, could you elaborate ?
|
2016-07-08 13:50:30
|
<DrTrigon_>
|
if you do stats like "Number of distinct categories" we also need to know WHY they are distinct e.g.
|
2016-07-08 13:50:48
|
<DrTrigon_>
|
what result / algorithm made the bot decide like that
|
2016-07-08 13:51:09
|
<DrTrigon_>
|
(was it due to facedetection or anything else etc.)
|
2016-07-08 13:51:32
|
<AbdealiJK>
|
Apologies. Let me clarify what I meant by "number of distinct categories"
|
2016-07-08 13:51:55
|
<DrTrigon_>
|
please :)
|
2016-07-08 13:52:04
|
<AbdealiJK>
|
I meant overall in the whole batch, how many different categories were used
|
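A sketch of the per-batch summary discussed here (files processed, files categorized, distinct categories, files per category), assuming the run's suggestions are available as a dict of file name to list of category names. The structure and the name `batch_summary` are assumptions; the log notes the script currently only dumps wiki lines.

```python
from collections import Counter


def batch_summary(suggestions):
    """Summarise one bot run.

    `suggestions` maps file name -> list of suggested category names.
    A category repeated for the same file is counted once for that file.
    """
    per_category = Counter(
        cat for cats in suggestions.values() for cat in set(cats)
    )
    return {
        "processed": len(suggestions),
        "categorized": sum(1 for cats in suggestions.values() if cats),
        "distinct_categories": len(per_category),
        "files_per_category": dict(per_category),
    }


# Hypothetical run over three files.
run = {
    "a.jpg": ["Taken with Canon", "Monochrome photographs"],
    "b.jpg": ["Taken with Canon"],
    "c.jpg": [],
}
print(batch_summary(run))
```

Recording which analysis produced each suggestion would be a natural extension later, so the stats can also say why a category was chosen.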
2016-07-08 13:53:20
|
<DrTrigon_>
|
I see.
|
2016-07-08 13:54:55
|
<AbdealiJK>
|
DrTrigon_ ok. think we can call this to a close ? Let me begin with the current stats we've mentioned and see if we need more over time ?
|
2016-07-08 13:55:14
|
<AbdealiJK>
|
And add appropriate stats as needed/feasible
|
2016-07-08 13:55:59
|
<DrTrigon_>
|
Yes let's do that. Once we see what we are talking about it will get more clear anyway. :)
|
2016-07-08 13:56:08
|
<AbdealiJK>
|
nod
|
2016-07-08 13:56:13
|
<DrTrigon_>
|
(clearer?)
|
2016-07-08 13:56:14
|
<jayvdb>
|
back
|
2016-07-08 13:56:24
|
<DrTrigon_>
|
nice!
|
2016-07-08 13:56:35
|
<AbdealiJK>
|
We were just planning on ending :)
|
2016-07-08 13:56:37
|
<DrTrigon_>
|
so AbdealiJK: do you want to add anything?
|
2016-07-08 13:56:44
|
<AbdealiJK>
|
Nothing from me - no
|
2016-07-08 13:57:10
|
<DrTrigon_>
|
jayvdb: anything to add? questions?
|
2016-07-08 13:57:29
|
<jayvdb>
|
list of distinct categories suggested (and then 'added' when the bot writes) would be useful
|
2016-07-08 13:57:35
|
<AbdealiJK>
|
Minor update: I've begun running the script on new images. Will update with results when done
|
2016-07-08 13:57:55
|
<jayvdb>
|
nice
|
2016-07-08 13:58:23
|
<jayvdb>
|
ill be around until late tonight, so post an update when you have some useful % stats
|
2016-07-08 13:58:27
|
<DrTrigon_>
|
AbdealiJK: wanted to add the Bulk test, Upstream and Software results are nice!
|
2016-07-08 13:58:56
|
<DrTrigon_>
|
...and I got a very strange error: http://dpaste.com/3JA5RK3
|
2016-07-08 13:59:05
|
<DrTrigon_>
|
that's all from my side!
|
2016-07-08 13:59:22
|
<AbdealiJK>
|
DrTrigon_ I answered that in the conpherence I believe
|
2016-07-08 13:59:25
|
<jayvdb>
|
thats all from me
|
2016-07-08 13:59:46
|
<DrTrigon_>
|
did not refresh conpherence yet...
|
2016-07-08 14:00:17
|
<DrTrigon_>
|
you wrote it as pwb.py already... I see... will check! thanks!
|
2016-07-08 14:00:42
|
<DrTrigon_>
|
so thanks and have a nice evening folks!
|
2016-07-08 15:02:35
|
<TyLandercasper>
|
Is everyone up?
|
2016-07-08 15:09:46
|
<jayvdb>
|
hi
|
2016-07-08 15:10:01
|
<TyLandercasper>
|
hey
|
2016-07-08 15:10:30
|
<jayvdb>
|
ping DrTrigon DrTrigon2 ;-)
|
2016-07-08 15:10:47
|
<DrTrigon>
|
jayvdb: Yeessss? :)
|
2016-07-08 15:10:55
|
<jayvdb>
|
AbdealiJK just left this room; i've pinged him on Skype
|
2016-07-08 15:11:07
|
<DrTrigon>
|
You wanna meet now?
|
2016-07-08 15:11:22
|
<jayvdb>
|
if you are free, TyLandercasper is here and on Skype
|
2016-07-08 15:11:44
|
<DrTrigon>
|
I'm kind of about to leave... let me check...
|
2016-07-08 15:12:25
|
<DrTrigon>
|
ok, have about 1.5 hrs - let's go... ;)
|
2016-07-08 15:16:59
|
<AbdealiJK>
|
TyLandercasper, Hi !
|
2016-07-08 15:17:15
|
<TyLandercasper>
|
hello!
|
2016-07-08 15:17:30
|
<AbdealiJK>
|
Is there a meeting going on ? Or is it postponed ?
|
2016-07-08 15:17:43
|
<TyLandercasper>
|
they're trying to add you right now
|
2016-07-08 15:34:47
|
<DrTrigon>
|
I fell out of the call...
|
2016-07-08 15:34:56
|
<jayvdb>
|
me too
|
2016-07-08 16:05:33
|
<jayvdb>
|
i think i've lost everyone
|
2016-07-08 16:33:17
|
<DrTrigon>
|
AbdealiJK: https://phabricator.wikimedia.org/Z441 running `$ python bulk.py -search:'eth-bib' -logname:ETH-Bib` currently ;)
|
2016-07-09 17:02:34
|
<jayvdb>
|
AbdealiJK, https://meta.wikimedia.org/wiki/WikiConference_India_2016/Scholarships#Decision
|
2016-07-09 17:03:11
|
<jayvdb>
|
have you received an email?
|