[01:05:20] Analytics: Old deleted pages have empty fields in Analytics Cluster edit data - https://phabricator.wikimedia.org/T165201#3260061 (Neil_P._Quinn_WMF)
[01:06:32] Analytics: Pivot "MediaWiki history" data lake: Feature request for "Event Users" - https://phabricator.wikimedia.org/T161185#3260072 (Neil_P._Quinn_WMF)
[01:06:38] Analytics: Pivot "MediaWiki history" data lake: Feature request for "Time" dimension to split by calendar month / quarter / year - https://phabricator.wikimedia.org/T161186#3260073 (Neil_P._Quinn_WMF)
[06:28:27] neilpquinn: FYI I have a draft query with a rough approximation of edit_count here:
[06:28:27] https://phabricator.wikimedia.org/T149021#3260176
[06:28:57] it's running now and I'll update the results when it's done, and try a different way to get edit_count too if the results come relatively quickly. If it takes until tomorrow I'll reconsider :)
[06:29:17] oh :) it just finished
[06:35:47] https://www.irccloud.com/pastebin/Wk50WEV8/
[06:35:57] running by hour now, will update with results
[06:36:51] fyi kaldari ^
[06:36:58] milimetric: hah, very nice! You're much more fluent at this than I am.
[06:37:22] milimetric: since you're here, I'm working on another data lake query that's got me stumped.
[06:37:33] neilpquinn: ok, shared results here: https://phabricator.wikimedia.org/T149021#3260178
[06:38:21] you should definitely double check my logic there, I'll share the hourly when it's done too and it'll be interesting to see if there are many differences.
[06:38:38] (go ahead with the other question, I'm making some plans around here but I'll try and answer)
[06:38:57] milimetric: I ran this query, but got no results. https://phabricator.wikimedia.org/P5443
[06:39:35] no idea why.
[06:39:54] neilpquinn: looking, while I do that, why do we lump people with 1000000 edits in the same "autoconfirmed" group as people with 11 edits?
[06:39:56] just curious
[06:40:13] like, doesn't that group become meaningless after like 20-30 edits?
[06:40:27] and should I filter those edits to only content ones?
[06:41:37] milimetric: well, autoconfirmed is just a threshold. As in, "are you an established enough account that we trust you enough to edit this frequently-vandalized page?" so if you meet that threshold, you meet it, whether you have 11 edits or a million.
[06:42:17] and we're interested in how many creations are done by users below that threshold.
[06:43:02] and no, don't filter to content edits. MediaWiki just looks at overall edit count when it's checking autoconfirmed status.
[06:48:27] milimetric: the core of the query looks very sound but I think you want to tweak the outermost select a bit. It looks like you're getting the number of page creations and page deletions done every day by autoconfirmed users. This request is for the proportion of page creations done by autoconfirmed users. (In other words, if you had to be autoconfirmed to create a page, what percentage of creations would
[06:48:28] that prevent?)
[06:50:10] actually, I can tweak it :)
[06:50:47] cool, that doesn't make sense to me yet but I'm sure it will when I see your tweak
[06:51:02] (I'm still puzzled by why your query is not returning results, it seems simple enough)
[06:51:28] btw, mine ran really fast, like 8 minutes, so you can definitely tweak and try relatively quickly
[06:51:58] neilpquinn: ok, so I tried this with enwiki and kowiki and I have no results for kowiki and lots for enwiki:
[06:52:00] https://www.irccloud.com/pastebin/ub1YG9Zh/
[06:52:50] I'm looking for page/create here, which should be a bit nicer than revision/create/rev_parent_id=0 but neither gives results on kowiki and both do on enwiki
[06:52:58] maybe their namespaces are different or something weird?
[06:54:02] milimetric: hmm, I can't imagine how they would be, since we're using the codes.
[06:54:43] yeah, I can look closer at the data to see if there's a bug there, but I gotta run soon
[06:54:47] you ok with the other query?
[06:56:23] milimetric: yeah, just about to run it.
[06:56:41] cool, good luck, see you in Vienna :)
[06:57:02] milimetric: I won't be there. I'm leaving on Monday for four weeks of user research :)
[06:57:21] ah! have fun then, ping me if you need me
[06:57:30] thanks!
[16:48:06] Hi there, is it possible to get referrer URLs? I'd like to know how many visitors of Czech Wikipedia come from Google, how many come directly, how many come from Wikipedia etc. Is this information public?
[16:57:03] a-team
[17:06:05] hi Urbanecm, we don't have too much public information on this, but there is this outdated report: https://stats.wikimedia.org/wikimedia/squids/SquidReportOrigins.htm
[17:06:15] milimetric: Thank you.
[17:06:57] Urbanecm: we have requests to bring it up to date, and we have decent data so it's possible to do but lower priority than some immediate projects we're working on.
[17:07:52] milimetric: Okay. Thank you again.
[20:49:55] Urbanecm: there is also this https://discovery.wmflabs.org/external/
[20:50:56] Urbanecm: the info is neither public nor private, as for the most part we do not produce aggregated stats like that per project. cc bearloga
[22:38:08] Analytics: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3260779 (Neil_P._Quinn_WMF)
[22:42:35] milimetric, nuria_: ^ take a look when you get a chance. I was trying to use the Data Lake to do some participant selection for the Korean user research today, but I imagine a fix will take time, so I'll just have to fall back to MariaDB.
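
Editorial note on the 06:48 exchange: the request was for a proportion (what share of page creations would be blocked if creation required autoconfirmed status), not for daily counts. The following is only a rough Python sketch of that calculation, not the query from the pastebins, which are not reproduced in this log. The `PageCreation` record, its field names, and the threshold of 10 edits are all hypothetical; the log only implies that 11 edits clears the threshold, and the sketch uses overall edit count as the sole criterion, as discussed at 06:43.

```python
from dataclasses import dataclass

# Hypothetical threshold: the log only implies that 11 edits is enough.
AUTOCONFIRMED_MIN_EDITS = 10


@dataclass
class PageCreation:
    """One page creation event (hypothetical record shape)."""
    day: str                 # e.g. "2017-05-01"
    creator_edit_count: int  # creator's overall edit count at creation time


def share_below_threshold(creations, min_edits=AUTOCONFIRMED_MIN_EDITS):
    """Return {day: fraction of that day's creations made by users
    whose edit count is below min_edits}."""
    totals = {}
    below = {}
    for c in creations:
        totals[c.day] = totals.get(c.day, 0) + 1
        if c.creator_edit_count < min_edits:
            below[c.day] = below.get(c.day, 0) + 1
    # Days with no below-threshold creations get a share of 0.0.
    return {day: below.get(day, 0) / totals[day] for day in totals}
```

Per the 06:41 message, a creator with 11 edits and one with a million fall in the same group here; only creators under the threshold count toward the share.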