[06:53:55] [[User:2003:66:8F3D:E785:222:4DFF:FEAF:FA63]] appears to be an IP bot, which has contributed ~12500 edits since Saturday. Most of them are okay (creation of cinema-related authority control statements), but some edits seem to be incorrect. This account also clogs recent changes to some extent and it continuously needs patrol attention. Should we do something? Is it allowed to run a bot without an account? Thanks!
[06:53:55] [2] https://www.wikidata.org/wiki/User:2003:66:8F3D:E785:222:4DFF:FEAF:FA63
[06:54:27] [[Special:Contributions/2003:66:8F3D:E785:222:4DFF:FEAF:FA63]] appears to be more useful ;-)
[06:54:27] [3] https://www.wikidata.org/wiki/Special:Contributions/2003:66:8F3D:E785:222:4DFF:FEAF:FA63
[07:17:59] MisterSynergy: it's been going on for months and months (e.g. https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard/Archive/2015/12#2003:66:8F2A:1AF6:C557:135:FB9A:7BDC last year) but the ip address changes every day... the bot policy does say bots need accounts, but it seems like not everyone agrees that it's a bot
[07:27:51] nikki: thanks! https://tools.wmflabs.org/xtools-ec/?user=2003:66:8F3D:E785:222:4DFF:FEAF:FA63&project=wikidata indicates a pretty constant edit rate since saturday evening. I am pretty sure it is a bot.
[07:51:44] nikki, MisterSynergy: didn't he get an account?
[07:58:56] I don't know, I just recently started to do countervandalism work.
[[User:YMS]] left a message on the IP address's talk page which sounds as if the user worked with an account in the past
[07:58:56] [4] https://www.wikidata.org/wiki/User:YMS
[07:59:30] since they continue to add poor data after this message, I'll raise attention on the [[Wikidata:Administrators' noticeboard]] now
[07:59:30] [5] https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard
[08:01:13] https://www.wikidata.org/w/index.php?title=Q3384856&diff=prev&oldid=371937172 :/
[08:01:25] oh, you're on it kek
[08:01:33] please patrol too then
[08:06:54] Filtering with ORES also makes me sad.
[08:07:47] I now left a message on the Administrators' noticeboard
[08:08:16] might have forgotten to mark patrolled, typically I do --- sry!
[08:09:55] We should have more inline patrolling. :(
[08:10:37] you mean more "mark as patrolled" buttons?
[08:11:15] Well we have a gadget that adds them inline to the recent changes. But the contributions page doesn't feature indications of unpatrolled edits so we can't add them there.
[08:15:13] yes, it would be useful to have these links on contributions pages as well
[08:15:23] btw. the IP bot is blocked now
[08:23:42] need to go now. cu
[09:41:24] SMalyshev : STRDT() wins the race :p
[10:22:48] sjoerddebruin: hmm do you have an idea how to find how many Wikipedia redirects are linked on Wikidata?
[10:23:11] I remember statistics from 2015, but no idea of the technical tools involved
[10:23:20] seems like something you would know?
[10:25:52] not really
[10:26:09] :s
[10:26:36] harmonia : i promise i'll try something tonight ^^
[10:27:06] Alphos: I need two things, actually, the number and then the list
[10:27:16] should be back around 18:00 CEST
[10:29:34] a sparql question (i'm really a newbie). what's wrong with this: http://tinyurl.com/j7znp78 ?
[10:33:14] you probably need to group by everything that isn't being grouped manually in the select bit, not just by ?journal
[10:33:48] changing the last line to "GROUP BY ?journal ?journalLabel ?languageLabel ?itwiki ?enwiki" returns results for me, at least
[10:34:50] thanks, seems ok. the result number is correct
[10:36:06] good :)
[10:38:29] http://tinyurl.com/hylx2p9 uh, I ran it and it gave me 10 times the same Qid O_O
[10:39:13] hmm DISTINCT maybe?
[10:43:06] harmonia: you get one line for each ?item + ?label so ..
[10:43:35] yep, with distinct it work
[10:43:37] s
[10:43:55] it gave me 10 Qids, not just one repeated ten times
[11:16:03] i got another question, if you can point me to some examples: how to find all items that have an official website (P856) qualified by point in time (P585)
[11:16:03] P585 Somebody subscripted core deployment branches?!??!?!?!!!!???? - https://phabricator.wikimedia.org/P585
[11:16:03] P856 Second run of convertNamespaceFromWikitext.php on Catalan Beta - https://phabricator.wikimedia.org/P856
[11:18:08] atomotic: I'm doing your query now
[11:18:20] wait a little, I'm eating at the same time :p
[11:19:36] * nikki came up with http://tinyurl.com/jytt97d
[11:20:19] I usually use https://www.mediawiki.org/wiki/Wikibase/Indexing/SPARQL_Query_Examples#Largest_cities_with_female_mayor when trying to do queries with qualifiers because I always forget how to do them >_<
[11:21:47] great!
[11:22:33] http://tinyurl.com/hc4eu3r
[11:22:46] hmm nikki made a better one :)
[11:23:04] thanks both
[11:23:46] :)
[12:41:30] Tobi_WMDE_SW: Are you around? I got the next CI failure, a segfault even. https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/891/console
[12:48:12] Thiemo_WMDE: yeah, saw it. that's in the sqlite-php55 job, right? I have no clue what could be the reason..
[12:48:13] :(
[12:48:37] retry?
[12:49:29] Thiemo_WMDE: yeah, let's try
[12:50:11] but I think I saw it somewhere else in that chain as well recently, can't find it now anymore though.. Thiemo_WMDE
[12:51:51] Thiemo_WMDE: hm.. it's also not rebased to latest master.. hit the rebase button?
[12:52:24] Browsertests are all fine in the last CI run.
[12:52:30] No, not rebased yet.
[12:53:07] Thiemo_WMDE: same failure: https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/893/console
[12:53:16] I've now hit the rebase button
[12:54:21] thanks.
[12:54:22] Thiemo_WMDE: ah, it seems the failing job is only executed on gate-submit?
[12:54:48] i hate when this happens. rebase after rebase in literally dozens of gerrit patches just to fix a tiny annoying bug.
[13:03:23] Thiemo_WMDE: https://gerrit.wikimedia.org/r/#/c/298759/ is now green. but the evil sqlite-php55 job is only executed on gate-submit I think
[13:04:53] Thiemo_WMDE: we've had something similar for the wikidata build 2 weeks ago: https://phabricator.wikimedia.org/T142905
[13:05:05] I don't know how it got fixed
[13:06:28] my patches in wikibase randomly fail
[13:06:38] I have some examples if you want to check
[13:06:44] recheck works most of the time
[13:07:01] Amir1: yeah, give me some examples please
[13:07:23] okay
[13:08:31] https://gerrit.wikimedia.org/r/#/c/299284/31/
[13:08:36] patchset 31
[13:08:50] https://gerrit.wikimedia.org/r/#/c/305849/11
[13:08:52] patchset 11
[13:09:10] https://gerrit.wikimedia.org/r/#/c/307811/1
[13:09:12] patchset 1
[13:09:22] https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer/4736/console
[13:09:34] https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer/4635/console
[13:09:44] all selenium composer Tobi_WMDE_SW
[13:09:45] Amir1: ok, thx!
[13:09:53] those are from yesterday or older
[13:10:27] the latest is from last night
[13:10:32] we've merged a (potential) fix for the selenium tests yesterday evening..
so if you rebase against current master at least the selenium tests should not fail anymore
[13:11:13] https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer/4736/console
[13:11:32] this one failed yesterday around 10 PM UTC+2
[13:11:56] the change Thiemo_WMDE is currently working on is failing for another reason.. it's the phpunit tests for 5.5 on sqlite (only executed on gate-submit and not on every patchset)
[13:12:02] see https://phabricator.wikimedia.org/T142905
[13:12:13] erm. I mean https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/895/console
[13:12:31] Thiemo_WMDE: still failing: https://integration.wikimedia.org/ci/job/mwext-Wikibase-repo-tests-sqlite-php55/895/console
[13:13:08] okay, sorry for false alarm
[13:13:09] can you create a phabricator ticket for it? and add aude - at least she reported something similar 2 weeks ago: https://phabricator.wikimedia.org/T142905
[15:13:59] ALVARO VALES TRASH.. SHIT EXCREMENT
[15:14:00] ALVARO VALES TRASH.. SHIT EXCREMENT
[15:14:15] LALALALLALALA
[15:14:17] JDDJJDJDJDJSSU
[15:14:19] DDUDUWUSUUSW
[15:14:21] hmm
[15:14:29] !ops wpekdkdjwjsjsj
[15:14:41] !admin please ban him
[15:14:43] SHUT UP ALVARO
[15:14:57] OR I'LL KILL YOU
[15:15:05] (and how can I get op rights on this channel?)
[15:19:40] ALVARO VALES TRASH.. SHIT EXCREMENT
[15:19:51] LALALALALLALALA
[15:20:00] I WILL KILL YOU SON OF A BITCH
[15:20:23] ALALALALLALA
[15:20:33] you must be very bored
[15:21:01] LALALALLALALALA
[15:53:00] Sorry, I was doing... other things.
[15:58:22] #wikimedia-wikidata
[15:58:42] oops, sorry
[15:58:49] kjschiroo, that chan is invite only ;)
[15:59:19] Amir1 had it listed on his user page so I figured it was open
[16:00:08] ah, ok
[16:26:02] like wikipedia where vandalism is a challenging problem, does Wikidata also face the same? More like where users would arbitrarily create items or add arbitrary statements to existing items?
[16:26:14] or is this issue not so serious here?
[16:28:25] there is a lot of vandalism
[16:29:11] vandalism is a problem, from what I've seen, it tends to be people editing existing data though, not adding new items/statements
[16:30:18] Better abusefilters are needed.
[16:32:01] sjoerddebruin not to mention a better test bed for RollBot ! :3
[16:32:09] (and a better way to trigger it)
[16:32:47] (better test bed so it can get its bot flag ^^ )
[16:52:42] Alphos: or more volunteers! :P
[16:55:32] https://www.wikidata.org/wiki/Q3568440 some people just won't die :-/
[17:35:18] sjoerddebruin: Alphos nikki can you tell me about the common types of vandalism that occur? like is it adding arbitrary property-value pairs to items, or editing existing statements with random facts to distort knowledge?
[17:35:36] asking because I'm looking into such cases which could make items bad quality
[17:35:46] Also labels, descriptions and aliases
[17:36:35] https://www.wikidata.org/w/index.php?title=Q11028&curid=12492&action=history
[17:38:58] sjoerddebruin: can you provide one or two examples where knowledge has been tampered with by adding wrong statements or modifying existing ones?
[17:39:32] See https://www.wikidata.org/w/index.php?title=Q39444&action=history for example
[17:39:34] like I said, it's generally editing existing things, not adding items or statements (although I'm sure everything happens at some point)... the things that are most noticeable are things like changing descriptions from "nationality occupation" to "nationality some-insult", changing genders (often marking men as female, or changing it to things like "alien" or "dog"), changing occupations (to things like "murderer")...
other things are changing identifiers to random strings
[17:40:31] https://www.wikidata.org/w/index.php?title=Q36517&action=history
[17:40:45] there's other stuff that's less clearly vandalism to me, like changing commons categories to non-existent categories (maybe they're just confused by the interface, the changes are still wrong though)
[17:44:55] what I can deduce is there are some edits which are outright vandalism which are done with intention of harm, they'll mostly consist of random content
[17:45:45] but if I'm not wrong, there might be unintentional disparities too, like wrong usage of a property on an item by a harmless user, though I'm not able to find a constructive example as of yet
[17:46:43] because after all Wikidata would not allow constraints on the usage of properties specific to items
[17:48:14] but still many properties only make sense on certain instances of items, like for example property "category for people who died here" would only make sense with a subject which is a place
[17:48:30] Sometimes people use https://www.wikidata.org/wiki/Property:P488 etc. on people items
[17:48:30] P488 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P488
[17:48:55] I also saw people using https://www.wikidata.org/wiki/Property:P910 as P31.
[17:48:56] P910 Masterwork From Distant Lands - https://phabricator.wikimedia.org/P910
[17:48:56] P31 Fork of P29 (An Untitled Masterwork) - https://phabricator.wikimedia.org/P31
[17:48:58] but there's no stopping from using such properties on other items
[17:49:07] yes I was exactly looking for similar things
[17:50:13] codezee: this is to some extent captured by constraint reports; which are very long, so "unintentional vandalism" might not be seen by anyone immediately
[17:51:02] MisterSynergy: I'm working on the same thing, researching to try and come up with something beyond constraint reports just like intentional vandalism is detected
[17:51:12] https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/Mandatory_constraints/Violations&action=history
[17:51:19] which is why examples as sjoerddebruin provided are very helpful to analyze
[17:51:59] for a good overview of typical vandalism I recommend that you do RC work for some days -- this would already show you the majority of cases
[17:52:43] MisterSynergy: RC as in?
[17:52:55] use the ORES beta feature (or gadget?), go to [[Special:RecentChanges]] and click "hide probably good edits". lots of vandalism will be listed ;-)
[17:52:56] [6] https://www.wikidata.org/wiki/Special:RecentChanges
[17:52:56] https://www.wikidata.org/w/index.php?title=Special:RecentChanges&hidepatrolled=1
[17:54:41] ORES is still a beta feature (just verified); it can be activated at [[Special:Preferences#mw-prefsection-betafeatures]]
[17:54:41] [7] https://www.wikidata.org/wiki/Special:Preferences%23mw%2Dprefsection%2Dbetafeatures
[17:55:06] MisterSynergy: Oh, I see, but I suppose to catch hold of unintentional violations I'll have to go through constraint violation report as that's the area of my research
[17:55:15] uh, encoding broke. go to the "beta features" tab
[17:58:16] okay, I understand. "unintentional vandalism" (good faith) mostly comes from inexperienced users, whose edits are not patrolled automatically. RC with unpatrolled edits only would help; you can also see lots of interesting edits by using User:Pasleim's "rech"-tools with the possibility to show only particular types of edits (e.g. just mergers, just label/desc/alias changes, etc)
[18:04:32] nikki: you might find this interesting https://phabricator.wikimedia.org/T141230
[18:06:26] Constraint Violations page is improved by KrBot, I guess it's using the definitions from https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations and scraping the pages to see for problems...
[18:08:17] Yup, but the number of properties makes the time to get them updated longer.
[18:13:39] There is also https://www.wikidata.org/wiki/User:Pasleim/Vandalism btw
[18:15:46] sjoerddebruin: do we have a WikiProject or any other community-based page that organises counter-vandalism activities?
[18:15:54] AFAIK no
[18:16:16] I'm asking every month for numbers about vandalism, but it doesn't seem to be a high issue for some.
[18:16:26] I have just recently started to work in that field, and it is somewhat difficult to figure out which tools etc. are available
[18:17:32] so there are no statistics about unpatrolled edits (by day or month), and also nothing about number of reverts?
[18:18:03] We have this https://stats.wikimedia.org/wikispecial/EN/EditsRevertsWIKIDATA.htm
[18:18:18] (July 2016)
[18:20:38] interesting, but difficult to interpret
[18:21:28] Yep, that's why I asked addshore a few times for grafana stuff...
[18:21:39] *waves*
[18:22:05] It would be great to track unpatrolled edits etc., to get a view of the scale of this problem addshore
[18:23:07] Hm, so this counts as a revert?
https://www.wikidata.org/w/index.php?title=Q15526030&action=history
[18:24:33] * addshore needs to read / understand the problem first
[18:26:49] this is the item with the third-most reverts, according to the stats.wikimedia.org page. looks like a wild bot problem, rather than real vandalism
[18:27:44] actually it is even the item with the most reverts, since the other two items with more reverts are sandbox items which are regularly purged by reverts
[18:27:52] addshore: we don't know the scale of vandalism on Wikidata.
[18:29:11] it probably wouldn't be too difficult to read the number of unpatrolled edits per month from the replica databases, right?
[18:29:12] sjoerddebruin: what do you think would be good ways to track that? can't ores help?
[18:29:15] sjoerddebruin: For some reason it feels like I encounter less of it in the last couple of months
[18:29:23] It might just be me editing less.....
[18:29:28] multichill: It's more widespread now.
[18:30:05] ORES has known false-positive/false-negative rates. It should be an effective strategy for getting an overall rate.
[18:30:11] A proper workflow is needed for people wanting to fight vandalism
[18:30:21] We can also use some of our analyses for a back of the envelope
[18:30:30] https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata
[18:30:40] Vandalism already gets tagged most of the time halfak
[18:30:54] https://meta.wikimedia.org/wiki/Research:Building_automated_vandalism_detection_tool_for_Wikidata#Building_a_corpus
[18:30:59] But no good flow of checking it and signing off (ok or revert)
[18:30:59] multichill, not true :(
[18:31:16] halfak: do you get pinged every time someone says ores? :P
[18:31:18] Wikidata had *a lot* of damage slipping through.
[18:31:20] Even the stuff that does get tagged doesn't get processed
[18:31:26] addshore, yup
[18:31:48] multichill, tagged by ORES?
[18:31:52] So first we need a process for that or do you propose a robot reverting?
[18:31:55] Not sure what "tagged" you are referring to?
[18:32:06] tagged is marked with a label by the abusefilter
[18:32:15] I don't think a bot should do the reverting
[18:33:03] Right now, we can filter the recent changes feed by 99% and effectively catch all of the vandalism with human review of the remaining 1%
[18:33:15] That seems like it should be tractable.
[18:33:38] https://www.wikidata.org/w/index.php?namespace=&tagfilter=new+editor+changing+statement&translations=noaction&hideliu=1&title=Special%3ARecentChanges
[18:33:40] I used to patrol the newly created items by anonymous people. But you feel like you're the only one doing it though.
[18:34:08] (it would be a lot faster if redirects of merged items would be patrolled)
[18:34:16] so ores doesn't create tags within mediawiki for things it thinks are vandalism? (I'm rather behind with all of this)
[18:34:17] sjoerddebruin, I think that this is the biggest problem. "counter-vandalism" is just not as much of a career path in wikidata as it is in the big wikipedias
[18:34:33] Language is another problem, I can't read them all.
[18:34:35] addshore, no it doesn't generate tags.
[18:35:10] addshore, we can produce a queue of non-patrolled edits that are likely to be damaging with Special:RecentChanges though.
[18:35:14] in theory it could (not sure if that would be useful though..)
[18:35:46] halfak: ORES's limitations would be that it could only detect syntactic vandalism, isn't it, more like bad, informal words, or small random edits?
[18:35:59] Most Wikipedias have a workflow to help out with counter-vandalism. I can just walk in, spend an hour and walk out again. No double work with other editors
[18:36:04] Wikidata doesn't seem to have that
[18:36:32] Generate a possible list of vandalism edits and provide a system to sign off
[18:37:30] codezee, well, it works pretty well for catching nearly all types of vandalism in practice.
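The review queue discussed above (unpatrolled edits ranked by how likely they are to be damaging, so that reviewers only look at the top slice of the feed) could be sketched roughly as follows. This is only an illustration: the `Edit` record, its field names, the scores, and the 1% default cut-off are assumptions made for the example, not the actual ORES or MediaWiki data model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    # Hypothetical record; the field names are illustrative only.
    rev_id: int
    damaging_score: float  # assumed probability-like score in [0, 1]
    patrolled: bool


def review_queue(edits, keep_fraction=0.01):
    """Return the highest-scoring unpatrolled edits, worst first."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Edits that someone has already signed off on never enter the queue.
    unpatrolled = [e for e in edits if not e.patrolled]
    if not unpatrolled:
        return []
    ranked = sorted(unpatrolled, key=lambda e: e.damaging_score, reverse=True)
    # Keep at least one edit so that a small feed still yields a queue.
    size = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:size]


# Made-up feed: revision 3 scores highest but is already patrolled.
feed = [
    Edit(1, 0.02, False),
    Edit(2, 0.97, False),
    Edit(3, 0.99, True),
    Edit(4, 0.40, False),
]
print([e.rev_id for e in review_queue(feed, keep_fraction=0.5)])  # → [2, 4]
```

Marking an edit as patrolled removes it from the next reviewer's queue, which is the "no double work" property mentioned in the discussion.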
[18:37:40] But you're right, that nuanced vandalism would be hard to catch
[18:37:46] This is true on all wikis, wikibase or not
[18:38:09] multichill, can we enable edit patrolling?
[18:38:16] That would provide a workflow
[18:38:18] It's already on afaik
[18:38:27] People just forget to patrol.
[18:38:28] Ahh... Well, ORES talks to the patrol flag :)
[18:38:47] (new items especially, because it's at the fricking bottom)
[18:38:47] Maybe this is a good incentive. Every edit you patrol reduces the workload of the next reviewer.
[18:38:56] halfak: yes, but in case of Wikibase I'm of the opinion that catching semantic vandalism or for that matter "unintentional semantic violations" would be "relatively" easy, given the linked nature
[18:39:18] codezee, indeed. Easy is the wrong word, but our classifier is effective :)
[18:39:45] halfak: I guess you have scoring, right? So I would like to be able to see the edits with the worst score for the last say 24 hours
[18:39:54] But it turns out that most of our work on vandalism is based on text features and querying the structures of wikidata is intractable for the scale we want to do it.
[18:40:01] ... that haven't been patrolled or reverted
[18:40:08] multichill, can do.
[18:40:09] halfak: yes, I understand, that's why I qualified it with "relatively", though I agree "easy" would be wrong
[18:40:23] multichill, we'd need an "is reverted" flag. We don't have that.
[18:40:31] But we could make rollback/undo set the "patrolled" flag
[18:40:41] And be sure to hash in the mediawiki tags in there somewhere, good info
[18:40:54] I'm quite sure revert/undo marks the edit as patrolled
[18:41:05] multichill, OK good.
I didn't know that
[18:41:15] Would have to check the manual
[18:41:31] halfak: indeed I discussed this with you in Jerusalem if you remember in context of https://phabricator.wikimedia.org/T127470 and I'm currently working on it as part of my academic project
[18:41:41] sjoerddebruin: now your request to simply keep track of a count of the number of unpatrolled edits makes more sense ;)
[18:42:39] Yeah, I'm not good at explaining stuff. :P
[18:42:45] the first problem that I saw as you mentioned was that Wikidata does not have large text features rather small meaningful chunks
[18:42:49] codezee, awesome. I think that's going to be a solid use-case. It'll be nice to use that to get a good overview of the quality of subsections of wikidata :)
[18:43:11] halfak: https://www.mediawiki.org/wiki/Help:Patrolled_edits & https://www.wikidata.org/wiki/Special:ListGroupRights , so my undo/revert is autopatrolled
[18:43:39] I've got to run now, but feel free to ping me more about ORES/patrolling stuff. I want to get these things on someone's backlog and help find someone to work on it.
[18:59:05] Hello all! I suck at proposing new External Identifiers...should a new proposal about a taxon database be added to [[Wikidata:Property_proposal/Authority_control]] or [[Wikidata:Property_proposal/Natural_science#Biology]]?
[18:59:06] [8] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Authority_control =>
[18:59:09] [9] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Natural_science%23Biology
[19:00:44] I'll try the first one...
[19:33:11] Josve05a: Just do both to be sure
[19:33:52] oh right. I can just include :D
[19:35:15] matanya: https://www.wikidata.org/wiki/Wikidata:Property_proposal/iNaturalist_taxon_ID :)
[19:35:26] oops..wrong ping... lol... multichill
[19:41:03] :O
[19:52:23] Made a query to find better images to add to Wikidata. It takes quite some time. Maybe I should have put a limit on it....
[19:54:41] multichill : i feel you >_>
[19:55:19] you know that feeling, when it's taking an awful long time, and you don't want to kill it because it'd be a shame, but you're not sure you'll be able to run it otherwise ? :D
[19:55:44] multichill: what do you use, page props?
[19:57:38] https://tools.wmflabs.org/multichill/queries2/commons/artworks_better_image_wikidata.sql <- query from hell
[19:57:51] It will make db admins weep
[19:58:11] :)
[19:58:39] * Josve05a weeps as well just looking at that query
[19:58:55] It returns combinations like https://commons.wikimedia.org/wiki/File:Regnault,_Henri,_Salom%C3%A9.jpg -> https://commons.wikimedia.org/wiki/File:Henri_Regnault_(French,_Paris_1843%E2%80%931871_Buzenval)_-_Salom%C3%A9_-_Google_Art_Project.jpg
[20:04:23] does anyone know how to change the redirection? https://www.wikidata.org/w/index.php?title=Q16043039&redirect=no should redirect to https://www.wikidata.org/wiki/Q26741981
[20:04:54] restore version before the redirect and then just merge again
[20:05:19] nikki: I can't, the old sitelink is now used by the other entry
[20:06:03] fun fun fun
[20:06:31] hm, not sure if there's a better way, I usually just temporarily remove the link, do the edits I'm trying to do and then fix the links again
[20:06:51] that
[20:06:55] me is pretty tempted to just ignore the problem
[20:06:59] * Harmonia_Amanda is lazy
[20:08:32] there is no sitelink if you only undo the very last edit, which was the redirect creation; then you can merge, as nikki said, since it is an empty item
[20:09:45] oh, true. I was looking at the wrong diff
[20:10:02] Josve05a: I didn't see any obvious speed improvements in the query. I already added quite a few constraints to at least hit plenty of indices
[20:26:42] nikki: are you also seeing bogus primary sources suggestions like these: https://www.wikidata.org/wiki/Q17428400
[20:28:04] dunno, I haven't got it enabled at the moment :/
[20:29:20] Oh, ok.
[20:30:38] have the bugs in it been fixed yet?
[20:32:06] of course not
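For reference, the qualifier question asked at 11:16 (items whose official website (P856) statement carries a point in time (P585) qualifier) can be expressed with the `p:` / `ps:` / `pq:` prefixes of the Wikidata Query Service. The sketch below is not the query behind either tinyurl link in the log, since those are not reproduced here; it only builds the query text and a request URL in Python. The `LIMIT` value and the helper name `build_request_url` are arbitrary choices made for this example.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# p:  links an item to a statement node,
# ps: gives the main value of that statement,
# pq: gives a qualifier attached to the same statement.
QUERY = """
SELECT ?item ?website ?pointInTime WHERE {
  ?item p:P856 ?statement .
  ?statement ps:P856 ?website ;
             pq:P585 ?pointInTime .
}
LIMIT 100
"""


def build_request_url(query: str = QUERY) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_request_url())
```

Going through the statement node is what makes the qualifier reachable; the shorter `wdt:P856` form returns only the value and drops the qualifiers.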