[01:51:07] (03PS4) 10Nuria: [WIP] Changing initialization of QTree to work arround precision Bug [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/408329 [06:03:05] 10Analytics-Kanban, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#4009196 (10Neil_P._Quinn_WMF) >>! In T172410#4006697, @Nuria wrote: > Could you elaborate on which data do you need... [08:07:02] (03CR) 10Joal: "Looks super cleaner :) Thanks a lot !" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415174 (owner: 10Ottomata) [08:11:58] Heya elukey [08:12:09] elukey: what happened yesterday with Cassandra? [08:12:37] joal: hey! She doesn't like me, the feeling didn't change over time [08:12:38] :D [08:13:11] it's written in the task if you want to check, let me pick the link [08:13:21] https://phabricator.wikimedia.org/T184795#4006912 [08:14:03] similar to what happened with hive-server [08:14:26] of course this happens only in the Cassandra 2.2 package, not 3.x [08:14:50] so we'll probably need to rebuild 2.2.6 to remove that weird check [08:15:22] then restarting should be fine afterwards [08:15:43] on completely unrelated news, we need to reboot all the hosts for kernel upgrades :) [08:25:35] Yay, full reboot [08:26:21] elukey: how do you want to proceed? [08:29:20] joal: this afternoon I am planning to reboot some canaries first, and start rolling it out to hadoop tomorrow [08:29:35] AQS will need to wait for the jmx exporter fix so we'll do everything in one go [08:30:06] all kafkas will take a bit :) [08:30:14] I think I'd need to open a task for this work [08:30:48] ah also, this time I'd like to do an apt-get dist-upgrade too on jessie hosts to upgrade the Debian minor version [08:35:37] ok elukey - just a bit longer, no risk, right? 
[08:37:44] joal: it updates all the os packages to the new debian minor release, it should be perfectly fine, but of course it might have some downsides, like all upgrades [08:38:48] !log rerun cassandra-hourly-wf-local_group_default_T_pageviews_per_project_v2-2018-2-27-15 [08:38:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:49:11] 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, 10ops-eqiad, and 2 others: rack/setup/install analytics107[0-7] - https://phabricator.wikimedia.org/T188294#4009389 (10elukey) >>! In T188294#4006219, @Ottomata wrote: >> We still haven't tested Hadoop packages on stretch > > We kinda have, just not s... [09:02:14] elukey: all done [09:02:36] (I mean the packages that are shipped via Debian jessie point releases) [09:02:56] those are deployed across the whole fleet [09:05:34] 10Analytics, 10Operations, 10User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4009435 (10elukey) >>! In T188377#4006258, @Ottomata wrote: > stars: https://github.com/wikimedia/puppet-zookeeper/stargazers > watchers: https://github.com/w... [09:07:06] good morning! [09:07:13] moritzm: Debian GNU/Linux 8.10 (jessie) - lovely! Thanks! [09:07:27] there's been a long-running tmux process on analytics1003 for the past 8 days FYI [09:07:58] ema: morning! I think it was mine, I should've killed it this morning [09:08:16] (together with one on eventlog1001) [09:09:33] elukey: that said, there are a few odd internal packages we can look into cleaning up while rebooting [09:09:52] e.g. analytics1031 has a pending upgrade for hunspell-vi for some reason [09:09:57] ahahhaha [09:10:05] ema [09:10:06] elukey@analytics1003:~$ /usr/bin/sudo /usr/local/lib/nagios/plugins/check_long_procs -w 96 -c 480 [09:10:09] OK: No SCREEN or tmux processes detected. 
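The point-release bump being discussed (an apt-get dist-upgrade on jessie hosts to pick up Debian 8.10) can be sanity-checked per host; a tiny sketch, with the helper name invented for illustration:

```shell
# Extract the point-release number from /etc/debian_version-style input
# (jessie reports versions like "8.10"). Helper name is made up.
debian_minor() {
  awk -F. '{print $2}'
}

# On a real jessie host (assumption):
#   debian_minor < /etc/debian_version   # "10" once 8.10 is installed
# The upgrade itself would be roughly:
#   sudo apt-get update && sudo apt-get dist-upgrade
```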
[09:10:22] the retry time is suuuuper long, it will take a bit to clear :( [09:10:22] also, all the analytics hosts can lose libtirpc1 (pending on plenty of them), since it's only needed for NFS servers/clients [09:10:57] elukey: thanks! [09:11:24] moritzm: should it be removed via autoremove? Or manually? [09:11:49] yeah, it's in the autoremove list as well [09:12:00] autoremove needs to be done carefully, though [09:12:23] the interaction between puppet and Debian package deps isn't ideal [09:12:44] e.g. when we stopped installing salt-minion across the fleet [09:13:20] some scripts broke since they expected a Python package which was previously pulled in via salt and no one added an explicit package dep in puppet [09:13:51] I usually ignore autoremove, but libtirpc1 has pending updates and we can clear those at least [09:16:05] sure sure [09:16:27] it also happened with a prometheus-related deb required in puppet but not part of any explicit dependency [09:27:19] 10Analytics-Kanban: Meta-statistics on MediaWiki history reconstruction process - https://phabricator.wikimedia.org/T155507#4009473 (10JAllemandou) a:03JAllemandou [09:34:04] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore: dbstore1002 (analytics store) enwiki lag due to blocking query - https://phabricator.wikimedia.org/T175790#4009484 (10Addshore) I have flagged up this ticket to be tackled by one of the WMDE teams whose sprints will start next week. 
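Given the autoremove caution above (implicit deps that puppet never declared), a dry run first is the safe route; a sketch assuming Debian's `apt-get -s` simulation output format (`Remv pkg [version]` lines), with an invented helper name:

```shell
# List what `apt-get -s autoremove` would remove, without touching the system.
# Simulation output contains lines like "Remv libtirpc1 [0.2.5-1+deb8u1]".
list_autoremovable() {
  grep '^Remv ' | awk '{print $2}' | sort -u
}

# On a real host (assumption: Debian with sudo):
#   sudo apt-get -s autoremove | list_autoremovable
# then remove only the packages you actually want gone, e.g.:
#   sudo apt-get remove libtirpc1
```

Reviewing the simulated list against puppet before removing anything avoids the salt-minion-style breakage mentioned above.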
[10:37:49] (03PS20) 10Joal: Upgrade scala to 2.11.7 and Spark to 2.2.1 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/348207 [10:39:33] (03CR) 10jerkins-bot: [V: 04-1] Upgrade scala to 2.11.7 and Spark to 2.2.1 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/348207 (owner: 10Joal) [10:40:53] (03PS1) 10Joal: [WIP] Add by-wiki stats to MediawikiHistory job using new MapAccumulator [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) [10:41:31] Gone for lunch - will be back to deploy Sqoop with Dan [10:42:40] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add by-wiki stats to MediawikiHistory job using new MapAccumulator [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) (owner: 10Joal) [11:31:27] * elukey lunch + errand! Be back in ~2h [13:14:00] (03PS21) 10Joal: Upgrade scala to 2.11.7 and Spark to 2.2.1 [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/348207 [13:14:02] (03PS2) 10Joal: [WIP] Add by-wiki stats to MediawikiHistory job using new MapAccumulator [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) [13:21:27] (03CR) 10jerkins-bot: [V: 04-1] [WIP] Add by-wiki stats to MediawikiHistory job using new MapAccumulator [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) (owner: 10Joal) [13:47:06] (03CR) 10Hashar: "recheck" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) (owner: 10Joal) [13:58:01] I checked Spark after yesterday's update: it's not working when started with yarn [13:58:14] :( [13:58:18] I think it's due to Spark assembly not having been updated [13:58:49] hdfs dfs -ls /user/spark/share/lib tells us [13:58:52] -rw-r--r-- 3 spark spark 177627601 2017-11-10 17:57 
/user/spark/share/lib/spark2-assembly.zip [13:59:18] Meaning the spark-assembly file present in hadoop that the driver tries to use is not the same version [14:00:06] elukey: Can you confirm the patch andrew merged would have recreated a new one and put it on HDFS if it wasn't already existing? [14:01:39] joal: I have no idea what Andrew did [14:02:31] was it a puppet patch? Can't see it in operations/puppet [14:02:32] hey joal / elukey: wanna talk sqoop stuff? [14:03:09] joal: he wrote "updating spark2-* CLIs to spark 2.2.1" [14:03:19] milimetric: morning! sqoop stuff? [14:03:40] ah there you go https://gerrit.wikimedia.org/r/#/c/405894/ [14:03:45] elukey: we have to talk over this puppet: https://gerrit.wikimedia.org/r/#/c/415217/ [14:04:21] merging the associated python script and going over how we're doing logging/testing that my updates to the cron command work / decide whether to launch the cu_changes job as well [14:04:57] sure [14:05:03] cave? [14:05:15] I am a bit busy now with eventlogging in beta, can we do it later on? [14:05:52] well, this is time sensitive because ideally we merge it before the job runs (Friday) [14:06:15] it's what nuria was talking about yesterday [14:08:22] milimetric: i'm here, can help [14:08:24] what's up? [14:08:28] elukey: you keep doin yo thang [14:08:30] :) [14:08:59] oh, cool, thx ottomata [14:09:04] okok [14:09:09] it's easier to explain in the cave? [14:09:12] joal: ready? [14:09:45] milimetric: I am checking the code change, the only doubt that I have is about the 2> redirect, because it wouldn't allow an email to be sent in case of failure right? [14:09:51] the rest seems fine [14:10:09] milimetric: k one sec... 
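On the 2>-redirect worry just raised (a plain `2>` to a log file keeps stderr away from cron, so no failure email): one common way to duplicate the stream is to tee it back onto stderr. This is a sketch, not necessarily what the puppet patch ended up doing; the path and helper name are placeholders:

```shell
# Send a command's stderr both to a log file and onward to the caller's
# stderr, so cron still sees (and mails) failures. POSIX sh compatible.
log=/tmp/sqoop-demo.log   # placeholder path

run_logged() {
  # fd3 preserves the real stdout; stderr is piped through tee, which
  # appends to the log and re-emits on stderr. Caveat: the pipeline's exit
  # status is tee's, not the command's (bash's `set -o pipefail` fixes that).
  { "$@" 2>&1 1>&3 | tee -a "$log" 1>&2; } 3>&1
}

# example:
#   run_logged sh -c 'echo oops >&2; echo normal'
#   # "normal" stays on stdout; "oops" goes to stderr *and* into $log
```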
[14:10:17] elukey: yeah, that's one thing I wanted to ask about, I found some way to duplicate error output but wasn't sure if it's hacky [14:10:45] the other thing I wanted to ask joal is whether he thought we should deploy the cu_changes stuff too [14:12:10] https://gerrit.wikimedia.org/r/#/c/415217/1/modules/profile/manifests/analytics/refinery/job/sqoop_mediawiki.pp [14:23:10] Hi milimetric - sorry was away for a few minutes [14:34:34] oh ottomata - do you mind having a look at https://gerrit.wikimedia.org/r/415255 and confirm it looks ok? [14:35:32] looking [14:36:19] ottomata: spark2 working, thanks ! [14:36:20] ah you guys already out of the cave, sorry milimetric but el beta's root partition was full :( [14:36:27] can I do anything now? [14:37:00] elukey: andrew had a good idea to make logging easier, working on that now, I can ping you when I'm done and we can review? [14:37:16] sure [14:38:13] I usually suggest to reach out to Andrew for a quick and clever solution rather than dealing with me :D :D [14:42:50] so I discovered that on deployment-eventlog02 we had a 60G lvm physical volume to use [14:42:59] so I mounted /var/lib/mysql on it [14:43:07] (03CR) 10Ottomata: "Some nits and Qs. Wowee joseph I understand this at about 60% :) But, LGTM!" (034 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) (owner: 10Joal) [14:43:22] will do the same on eventlog05 [14:43:30] gr8 [14:45:44] ottomata: is the data on eventlog02's mariadb needed ? Or can I just start with an empty db on eventlog05? [14:46:15] empty db fine afaik [14:46:26] folks usually only use it to test when they make new events [14:53:55] (03CR) 10Joal: "Answered comments inline. I'll update the code today." 
(034 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) (owner: 10Joal) [15:02:57] 10Analytics, 10Analytics-Wikistats: Active Editors metric for all projects - https://phabricator.wikimedia.org/T188265#4001846 (10Psychoslave) Thank @Nuria for the ticket. I'm not sure the description is clear here. My initial demand, which conducted to this ticket, was to have a the number of active Wikisourc... [15:14:58] (03CR) 10Ottomata: "Niiiice, comments inline." (036 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [15:15:17] ottomata: deployment-eventlog05 is ready in deployment-prep.. it has joined all the consumer groups, currently mostly in standby [15:15:54] whenever you have time we can do a sanity check and possibly deprecate deployment-eventlog02? [15:17:24] elukey: sure gimme few mins [15:18:28] ottomata: even after standup or tomorrow, no rush [15:20:29] 10Analytics-EventLogging, 10Analytics-Kanban, 10User-Elukey: Run eventlogging purging script on beta labs to avoid disk getting full - https://phabricator.wikimedia.org/T171203#3457322 (10elukey) Today deployment-eventlog02 got its root partition saturated. I discovered a lvm pvs with 60G available, so I cre... [15:24:41] (03PS16) 10Milimetric: Refactor sqoop jobs and add from_timestamp [analytics/refinery] - 10https://gerrit.wikimedia.org/r/408848 (https://phabricator.wikimedia.org/T184759) [15:26:08] 10Analytics, 10Operations, 10User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4010694 (10Ottomata) > Anyhow, let's do kafkatee/varnishkafka/jmxtrans for the moment. Would it be ok? Ya, let's do these. > Half of the watchers are Wikimed... 
[15:32:50] 10Analytics, 10Operations, 10User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4010715 (10elukey) >>! In T188377#4010694, @Ottomata wrote: >> Anyhow, let's do kafkatee/varnishkafka/jmxtrans for the moment. Would it be ok? > Ya, let's do... [15:39:38] (03PS3) 10Joal: [WIP] Add by-wiki stats to MediawikiHistory job using new MapAccumulator [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/415255 (https://phabricator.wikimedia.org/T155507) [15:39:38] joal: can we merge that sqoop patch now? Take a look at the logging changes and then it would be useful to deploy refinery so I can run generate-jar and 1. test that everything's ok in prod and 2. have a new jar [15:41:34] milimetric: reading [15:41:45] thx [15:43:04] (03CR) 10Joal: [C: 031] "LGTM ! FEel free to merge and test" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/408848 (https://phabricator.wikimedia.org/T184759) (owner: 10Milimetric) [15:43:21] milimetric: read quickly, nothing stands wrong [15:43:25] let's go [15:43:30] (03CR) 10Milimetric: [V: 032 C: 032] Refactor sqoop jobs and add from_timestamp [analytics/refinery] - 10https://gerrit.wikimedia.org/r/408848 (https://phabricator.wikimedia.org/T184759) (owner: 10Milimetric) [15:50:48] (03CR) 10Mforns: "Thanks for the CR!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [15:56:45] ottomata / elukey: I gotta upload a new mediawiki sqoop orm to archiva, it says I need ops to login as archiva-deploy. 
I have the jar, how do we do this? [15:56:50] (https://wikitech.wikimedia.org/wiki/Archiva#Uploading_dependency_artifacts) [15:57:35] (03PS1) 10Milimetric: Update mediawiki sqoop orm README [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415314 [15:57:48] (03CR) 10Milimetric: [V: 032 C: 032] Update mediawiki sqoop orm README [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415314 (owner: 10Milimetric) [15:57:50] milimetric: we have the credentials in pwstore (ops only) as far as I can see [15:57:57] (just doc update) [15:58:07] elukey: yeah, so one of you has to do it [15:58:32] milimetric: check your home on stat1004 [15:58:38] /home/milimetric/archiva-deploy.pw [15:59:05] even simpler [15:59:05] uh, ok. Trusting me with passwords and stuff :) [16:00:10] 10Analytics-Kanban, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore, 10User-Elukey: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#4010821 (10Nuria) Ah, sorry, i see. Then -since we plan to add tags next quarter - your should not have to do cross... [16:00:17] milimetric: testing for now if I understand - We'll deploy refinery after success, right? [16:00:45] joal: oh I already tested, I merged and generated the orm jar with the merged code [16:01:03] and now I'm uploading it to archiva, then I have to figure out what git fat magic needs to happen [16:01:06] and then I deploy refinery [16:01:15] and then we merge the puppet [16:02:24] milimetric: I think the git-fat magic is in refinery, before deploy [16:02:32] milimetric: I follow you now :) [16:02:33] Cool [16:02:42] ottomata: for https://github.com/wikimedia/analytics-refinery/blob/master/artifacts/mediawiki-tables-sqoop-orm.README#L10 do I check 
[16:02:44] milimetric: let me know if I can help [16:02:52] thx joal [16:03:48] milimetric: naw don't think so [16:04:10] the instructions (https://wikitech.wikimedia.org/wiki/Archiva#Uploading_dependency_artifacts) say "Click + Choose File twice, once for the .pom file, and again for the .jar file. Check the pomFile box for the .pom file." [16:04:20] so, don't do that, right? [16:06:07] elukey, do you have 5 mins to discuss where to put the new EventLogging whitelist? [16:06:41] milimetric: i think in this case you don't have a pom, [16:06:47] and i don't know what generate maven pom does [16:07:41] heh, ok, yeah, pom doesn't make sense here, just didn't know if our archiva had to have a pom for some other reason [16:07:43] cool [16:07:58] i actually don't know either :x [16:10:16] ok, joal, I searched and don't find the magic spell for the artifacts, you got a doc link? [16:11:39] milimetric: you just trying to add the artifact to refinery/artifacts? [16:11:50] i'd DL the .jar from archiva (to be 100% sure the sha matchces) [16:11:54] put it in artifacts/ [16:11:54] then [16:12:03] git add ... [16:12:06] git commit, etc. 
[16:12:12] that *should* be all :/ [16:12:23] as long as you have git-fat inited and all [16:12:27] https://github.com/wikimedia/analytics-refinery#setting-up-the-refinery-repository [16:13:50] cool, thanks [16:19:35] (03CR) 10Fdans: [C: 032] Fix issues with numer formatting (039 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) (owner: 10Nuria) [16:19:59] nuria_: sorry, that was an accidental +2 :( [16:21:02] oh poop, there was a spelling error in https://github.com/wikimedia/analytics-refinery/blob/master/artifacts/mediawiki-tables-sqoop-orm.README#L10 and I didn't catch it, so now we have an extra group: https://archiva.wikimedia.org/#browse/org.wikimedia.analytics [16:21:06] sqooooooop :) [16:21:11] I tried to delete but it doesn't seem to let me [16:21:46] (03CR) 10Fdans: [V: 032 C: 04-1] "Great stuff! Tested with English, Japanese and Spanish on Mac Chrome, Firefox and Safari." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) (owner: 10Nuria) [16:22:59] Hey milimetric - missed the ping sorry :( [16:23:18] uh.... ottomata ok now something's wrong with this: https://archiva.wikimedia.org/#artifact/org.wikimedia.analytics/mediawiki-tables-sqoop-orm [16:28:10] ah! found what I did wrong [16:28:23] (03PS1) 10Joal: [WIP] Test oozie-spark2 with hive job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415324 [16:28:38] that readme only has half the instructions, I forgot to select the mirror repository and I accidentally set the orm as a snapshot thing... [16:28:46] gonna try to fix [16:30:34] mforns: sorryyy I was in a meeting.. after standup is ok? [16:31:38] elukey, sure :] [16:32:39] milimetric: I'm willing to help but not sure how [16:32:57] sorry... 
I think it's broke :( I got my 1/1 [16:41:09] 10Analytics-Kanban: Spark 2.2.1 as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#4010929 (10JAllemandou) [16:46:09] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install notebook100[34] - https://phabricator.wikimedia.org/T183935#4010948 (10elukey) [16:51:49] 10Analytics-Kanban: Refresh SWAP notebook hardware - https://phabricator.wikimedia.org/T183145#4010962 (10Ottomata) [17:01:14] milimetric: can you give me that archiva pass file please? [17:02:38] joal: I don't have rights to do anything with the file that would add you to see it :( [17:02:47] crap [17:03:32] ottomata: could you give me the archiva pass file please? [17:06:18] joal: put it in your homedir too [17:06:20] stat1004 [17:07:25] Thanks ottomata [17:08:55] (03PS11) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [17:10:08] (03CR) 10Mforns: "Mhhh, not sure what to do with TestRefineTarget..." (035 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [17:12:46] (03CR) 10jerkins-bot: [V: 04-1] Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [17:12:59] 10Analytics-Kanban: Create metrics dropdown component - https://phabricator.wikimedia.org/T188526#4011016 (10fdans) [17:17:17] (03PS12) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [17:28:17] mforns: batcave-2 ? [17:28:22] elukey, sure! 
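Circling back to the artifact workflow ottomata described earlier (download the jar back from archiva so the sha matches, then commit it through git-fat, which tracks files by their sha1): a sketch with an invented checksum helper; the URL and paths are placeholders, not the real coordinates:

```shell
# Verify a downloaded artifact against an expected SHA-1 before committing
# it via git-fat, so the bytes in the repo match what archiva serves.
verify_sha1() {
  file=$1; expected=$2
  actual=$(sha1sum "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Illustrative flow (placeholder URL/sha, assuming git-fat is set up per
# https://github.com/wikimedia/analytics-refinery#setting-up-the-refinery-repository):
#   curl -o artifacts/mediawiki-tables-sqoop-orm.jar "$ARCHIVA_URL/.../mediawiki-tables-sqoop-orm.jar"
#   verify_sha1 artifacts/mediawiki-tables-sqoop-orm.jar "<sha1 shown by archiva>"
#   git add artifacts/mediawiki-tables-sqoop-orm.jar && git commit
```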
[17:35:11] (03PS1) 10Milimetric: Update sqoop jar artifact and README [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415342 [17:35:25] (03CR) 10Milimetric: [V: 032 C: 032] Update sqoop jar artifact and README [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415342 (owner: 10Milimetric) [17:40:03] !log deploying Refinery [17:40:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:41:05] (03PS13) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [17:42:22] (03PS14) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [17:43:39] (03PS15) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [17:50:24] (03PS16) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [17:55:41] (03CR) 10Mforns: [C: 032] "LGTM! You said in the comments you would change the private method name to be different from the other but it is still unchanged. In any c" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/413633 (https://phabricator.wikimedia.org/T186602) (owner: 10Ottomata) [17:55:53] !log Refinery synced to HDFS, deploy completed [17:55:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:56:25] (03CR) 10Mforns: "I think this is good to go now!" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [17:59:56] team: batcave or other room? 
[18:00:12] (03Merged) 10jenkins-bot: Add RefineMonitor job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/413633 (https://phabricator.wikimedia.org/T186602) (owner: 10Ottomata) [18:00:31] bc i think no? [18:00:34] am there [18:00:40] meeee tooo [18:50:11] Gone for diner team [18:51:16] (03PS17) 10Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) [18:53:43] aaaaah, I lost SoS... [18:53:49] sorry team [18:54:55] (03PS5) 10Nuria: Changing initialization of QTree to work arround precision Bug [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/408329 [18:55:58] chelsyx, joal: please take a look at https://gerrit.wikimedia.org/r/#/c/408329/ i have fixed issue with qtree initialization [18:57:06] chelsyx: let me know if 4 digits of precision sounds good: SessionsPerUser,Android,Map(percentile_50 -> (1.0,1.0625), count -> 48678, percentile_90 -> (1.0,1.0625), min -> 1, percentile_1 -> (1.0,1.0625), max -> 2, percentile_99 -> (2.0,2.0625))) [19:00:02] 10Analytics, 10Analytics-Wikistats: Active Editors metric for all projects - https://phabricator.wikimedia.org/T188265#4011432 (10Nuria) @Psychoslave I see, we call those (for the lack of a better name) "project families" , example: wikipedia is a "Project Family" and so will be "wikidictonary". That is in f... [19:02:09] nuria_: Yes, 4 digits is good! Thank you so much! 
[19:02:41] (03CR) 10Chelsyx: [V: 031 C: 031] Changing initialization of QTree to work arround precision Bug [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/408329 (owner: 10Nuria) [19:03:04] chelsyx: ok, +1 and will wait for joal, we can deploy with our next cluster deploy, not sure if this job would need to be restarted, it might if it has a refinery version [19:04:45] nuria_: indeed, refinery-source deploy + jar version update in refinery + refinery deploy + restart [19:05:06] nuria_: This'll get shipped hopefully next week [19:06:09] joal: sounds good [19:06:19] Sounds good! Thank you all! [19:06:49] ottomata: time to vet deployment-prep? [19:07:40] Do you know how I can access the database of iOS team's piwik dashboard? [19:08:12] chelsyx: that's kind of locked away on bohrium, but you can get anything you want from the piwik interface [19:08:32] chelsyx: the database is a weird format anyway, not very useful for direct querying, what do you need? [19:09:49] milimetric: I see. I don't need to do anything particular at this point, but I thought getting access to the database would be helpful for future analysis [19:10:12] millimetric[m]: also, looks like something [19:10:40] milimetric: something's wrong, there's no data since Feb 20 [19:12:07] chelsyx: that's weird, we didn't do anything. Maybe some new deployment messed up the logging? [19:13:22] milimetric: Idk, I will check with the team [19:13:57] milimetric: I can still see real-time visitors, but pageviews and unique visitors on piwik are 0 since Feb 20 [19:16:21] oh interesting, thanks chelsyx I'll take a look after lunch. This might mean the reports aren't running [19:17:51] elukey: oh yessssss [19:17:57] Thanks milimetric ! [19:18:01] elukey: can we do it via irc or should we bc? [19:18:41] ottomata: irc is fine! I was trying to make 05 work but mariadb doesn't seem to like toku, fixing it [19:24:09] (03CR) 10Mforns: "Right, rebased that on top of monitor change." 
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [19:32:34] ottomata: should be fine now, but I didn't see any insert logged for the m4 consumer yet [19:32:41] some tables created on the db though [19:33:57] oh that's good [19:34:05] if tables are created that's something [19:34:09] i would expect inserts to come with them though [19:34:46] elukey: i'm tailing logs there now [19:34:51] i see inserted events :) [19:34:52] eventbus' mysql consumer fails with weird errors like [19:34:53] Feb 28 19:33:51 deployment-eventlog05 eventlogging-consumer@mysql-eventbus[32181]: ValueError: Unknown timestamp type: 8 [19:35:01] yeah [19:35:28] 2018-02-28 19:34:47,505 [31764] (MainThread) Inserted 205 MobileWikiAppLinkPreview_15730939 events in 0.108530 seconds [19:35:35] but this works :) [19:35:38] ??? hmmm [19:35:54] yeah, hm [19:36:15] I think it is maybe some missing config of the db [19:36:46] weird [19:37:26] no it's throwing that in the kafka consumer somehow... [19:37:50] yeah but /srv/log/eventlogging/* consumers work fine.. [19:38:00] ah they are not eventbus [19:38:11] mmmmm [19:40:30] yeah weird hm [19:41:15] hmmm [19:41:23] ottomata: same thing on 02 [19:41:29] I can see the same error in the logs [19:41:51] ValueError: Unknown timestamp type: 8 [19:43:34] investigating... [19:45:50] 10Quarry: SQL Syntax errors in Quarry - https://phabricator.wikimedia.org/T188538#4011560 (10Tohaomg) [19:47:03] elukey: i can reproduce from simple kafka consumer hmm [19:47:04] librdkafka1 is 0.9.3 [19:47:10] can it be it? 
[19:47:23] no, this is kafka python [19:47:30] i think its a kafka broker 1.0 vs 0.9 api version issue [19:47:33] possibly with timestamp types [19:47:50] ah right kafka python doesn't use that [19:48:02] it could very well be [19:48:21] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Switch cdnPurge to Kafka - https://phabricator.wikimedia.org/T188540#4011588 (10Pchelolo) p:05Triage>03Normal [19:48:39] ottomata: so I disabled puppet on el02 and left a /etc/motd to point to 05 [19:48:48] 05 is the only one running now [19:49:18] yeah, same code can consume from the deployment-main-eqiad cluster [19:49:57] that is on 0.9? [19:50:41] yeah [19:50:46] anyhow, it is getting a bit late in here, commuting home.. will check later! [19:50:55] so this is a good catch, this would have broke in prod too [19:50:57] still investigating [19:51:04] ok elukey i'll let you know what I figure out! [19:51:09] super! [19:54:16] elukey: https://github.com/dpkp/kafka-python/pull/828 [19:54:36] will prepare a new .deb for kafka python [20:11:32] 10Analytics, 10Analytics-EventLogging, 10TimedMediaHandler, 10Wikimedia-Video: Record and report metrics for audio and video playback - https://phabricator.wikimedia.org/T108522#4011675 (10brion) @fdans & @DarTar yes let's schedule a brief talk to get started? I'm on pacific time, mostly flexible time othe... 
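The timestamp-type error above was traced to kafka-python needing the fix from PR 828; the log doesn't say which release carries it, so the minimum below is a placeholder. A version gate like this (using GNU sort's version ordering) could confirm a host got the rebuilt .deb:

```shell
# True if version $1 >= version $2, using GNU `sort -V` version ordering.
ver_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# On a host (assumption: kafka-python importable; "1.3.5" is a placeholder
# minimum, not a confirmed fixed release):
#   installed=$(python -c 'import kafka; print(kafka.__version__)')
#   ver_ge "$installed" "1.3.5" || echo "kafka-python too old for Kafka 1.0 timestamps"
```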
[20:19:42] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade eventlogging servers to Stretch - https://phabricator.wikimedia.org/T114199#4011693 (10Ottomata) [20:21:42] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4011697 (10Pchelolo) [20:21:51] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade eventlogging servers to Stretch - https://phabricator.wikimedia.org/T114199#4011701 (10Ottomata) @elukey, the error you were getting in deployment prep was caused by https://github.com/dpkp/kafka-python/pull/828, which... [20:22:03] 10Quarry: Implement SQL Query Validator in Quarry - https://phabricator.wikimedia.org/T188538#4011705 (10Reedy) [20:22:06] 10Analytics-Cluster, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Move EventLogging analytics processes to Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T183297#3849554 (10Ottomata) [20:22:21] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade eventlogging servers to Stretch - https://phabricator.wikimedia.org/T114199#4011708 (10Ottomata) a:03elukey [20:45:40] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore: dbstore1002 (analytics store) enwiki lag due to blocking query - https://phabricator.wikimedia.org/T175790#3603220 (10leila) @jcrespo I'm removing Research tag from this task. If the issue persists and you suspect it's Research team re... [20:45:47] 10Analytics, 10WMDE-Analytics-Engineering, 10User-Addshore: dbstore1002 (analytics store) enwiki lag due to blocking query - https://phabricator.wikimedia.org/T175790#4011774 (10leila) [20:49:07] ottomata: I want to test this cron update that I made in puppet, how do I test puppet? [20:50:58] not easily! 
but you can kinda see what it would do in prod with [20:51:16] https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/build [20:53:30] woah, ottomata so for the list of nodes there I can just put analytics1003.eqiad.wmnet? [20:53:56] yup [20:54:00] this thing's awesome, ok, thx [20:54:06] click through to console output [20:54:09] wait til it's done [20:54:12] then show the changes [20:54:14] it'll link them to you [20:58:15] 10Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4011836 (10Reedy) [20:58:20] 10Analytics, 10Analytics-Wikistats: Wikistats 2.0: allow to view stats for all language versions - https://phabricator.wikimedia.org/T188550#4011848 (10Pamputt) [20:59:32] 10Analytics, 10Analytics-Wikistats: Wikistats 2.0: allow to view stats for all language versions - https://phabricator.wikimedia.org/T188550#4011861 (10Pamputt) [20:59:34] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0. - https://phabricator.wikimedia.org/T130256#4011862 (10Pamputt) [21:03:15] 10Analytics, 10Analytics-Wikistats: Wikistats 2.0: allow to view stats for all language versions (a.k.a. 
Project families) - https://phabricator.wikimedia.org/T188550#4011877 (10Nuria) [21:08:15] 10Analytics-Kanban, 10Analytics-Wikistats: Wikistats 2.0.: display the mean value for Total Page Views - https://phabricator.wikimedia.org/T188552#4011881 (10Pamputt) p:05Triage>03Normal [21:08:40] (03CR) 10Nuria: [C: 032] Changing initialization of QTree to work arround precision Bug [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/408329 (owner: 10Nuria) [21:15:07] ottomata: hm, it looks like maybe I did something wrong with puppet, based on the build output: https://puppet-compiler.wmflabs.org/compiler03/10205/analytics1003.eqiad.wmnet/ [21:15:15] it seems to only generate one of the cron commands [21:15:25] but I added a second cron: https://gerrit.wikimedia.org/r/#/c/415217/ [21:17:24] milimetric: it worked, but for some reason isn't showing you the cron because it's new [21:17:25] see [21:17:29] Resources only in the new catalog [21:17:31] section [21:17:33] • Cron[refinery-sqoop-mediawiki-private] [21:17:53] if you click on Change Catalog [21:17:54] https://puppet-compiler.wmflabs.org/compiler03/10205/analytics1003.eqiad.wmnet/change.analytics1003.eqiad.wmnet.pson [21:18:02] and then search for refinery-sqoop-mediawiki-private [21:18:17] hmm no that doesn't show you the full thing hmm [21:18:21] oh it does [21:18:35] oh, cool [21:18:54] --log-file /var/log/refinery/sqoop-mediawiki-private.log [21:18:55] etc. looks good [21:22:39] aha! found a small mistake, this is great! [21:27:33] ottomata, do you have some minutes for me to show you the Refine errors I talked about in stand-up? [21:33:08] they read: [21:33:09] org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10006]: Partition not found {year=2018, month=2, day=27, hour=0} [21:33:27] and then a long stacktrace [21:33:40] but the job continues and succeeds [21:34:38] mforns: does hour 0 exist?
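The "Partition not found {year=2018, month=2, day=27, hour=0}" error above embeds a partition spec that corresponds to both an HDFS directory layout and a Hive partition clause. A minimal sketch of that mapping (the helper names are hypothetical, not part of refinery; the base path and spec are taken from the log):

```python
def partition_path(base, spec):
    """Render an ordered partition spec as an HDFS directory path."""
    return base + "/" + "/".join(f"{k}={v}" for k, v in spec.items())

def partition_clause(spec):
    """Render the same spec as a Hive PARTITION (...) clause."""
    return "PARTITION (" + ", ".join(f"{k}={v}" for k, v in spec.items()) + ")"

spec = {"year": 2018, "month": 2, "day": 27, "hour": 0}
print(partition_path("/wmf/data/event/ChangesListFilters", spec))
# /wmf/data/event/ChangesListFilters/year=2018/month=2/day=27/hour=0
print(partition_clause(spec))
# PARTITION (year=2018, month=2, day=27, hour=0)
```

The point of the check below ("hdfs dfs -ls ... gives expected contents") is that the path form can exist on HDFS while the clause form is still unknown to the Hive metastore, which is what Error 10006 complains about.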
[21:34:45] i guess it must [21:36:24] nuria_, yes, hdfs dfs -ls /wmf/data/event/ChangesListFilters/year=2018/month=2/day=27/hour=0 gives expected contents [21:36:56] mforns: yaaa one min...! [21:37:08] no rush! :] [21:38:23] maybe the error is that the partition is not found **in the output**, cause it's purged, it's an empty DataFrame [21:38:52] but.. does not make much sense [21:39:42] query exception hmmm [21:40:00] yea [21:40:13] 18/02/28 21:24:21 ERROR Refine: Failed refinement of dataset hdfs://analytics-hadoop/wmf/data/event/ChangesListFilters/year=2018/month=2/day=27/hour=0 -> mforns.ChangesListFilters (year=2018,month=2,day=27,hour=0). [21:40:13] org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10006]: Partition not found {year=2018, month=2, day=27, hour=0} [21:40:34] so, I guess by your reaction that it's not expected [21:40:41] looking deeper [21:40:42] haha nope [21:41:52] mforns: it's def not working, the job catches exceptions and then prints status at end [21:41:57] and writes failure flag [21:42:05] _REFINED_FAILED exists in your table [21:42:06] dir [21:42:07] so, ottomata what if the transformation function returns an empty dataFrame? [21:42:13] I see [21:42:23] i'd guess it would work, as long as the schema is provided [21:42:39] OH hm [21:42:40] hm [21:42:48] maybe mforns [21:42:51] the partition columns are added first [21:42:58] before the df is passed to your transform function [21:43:03] aha [21:43:07] if you are just emptying the df, you'll lose those [21:43:08] hm [21:43:34] hmmm [21:43:37] mforns: you could empty it, and then call [21:43:49] DataFrameToHive.dataFrameWithHivePartitions [21:43:50] again [21:43:52] on the empty data frame [21:43:54] to re-add them [21:44:05] but...
[21:44:18] the empty data frame is created using the schema of the passed dataFrame [21:44:28] val emptyRDD = sqlContext.sparkContext.emptyRDD.asInstanceOf[RDD[Row]] [21:44:28] sqlContext.createDataFrame(emptyRDD, dataFrame.schema) [21:44:37] yeah, but the partitions need static values [21:44:41] so, if you empty it [21:44:48] there is no data for spark to know how it should be partitioned [21:44:52] hmmm weird [21:44:53] this is weird [21:44:59] ok i am trying some things... [21:45:24] you might need at least one record... OR [21:45:24] hm [21:45:31] wait, mforns [21:45:38] waiting :] [21:45:40] haha [21:45:45] empty df == data is dropped? [21:45:56] what do you mean? [21:46:01] i mean, why is there an empty df? [21:46:18] * - If the table name of the given HivePartition is not present in the [21:46:18] * whitelist, the transformation function will return an empty DataFrame. [21:46:19] ok [21:46:21] right [21:46:24] so maybe [21:46:25] when the table is not in the whitelist, for instance, the transformation function returns an empty dataframe [21:46:32] instead of inserting into hive [21:46:37] Refine should just skip empty df [21:46:49] doesn't make any sense to insert empty df [21:46:53] ottomata, that's a possibility [21:47:03] yea I guess [21:47:16] DataFrameToHive [21:47:17] hm [21:47:27] for non-whitelisted tables, we'd have empty tables in event_public [21:47:39] ottomata, I can add that code if you want [21:48:03] and test it [21:48:15] well, no tables at all, because we can return before even calling prepareHiveTable [21:48:30] hmm, mforns bc real quick?
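The alternative that surfaces later in this exchange — passing the whitelisted table names into the target-finding step so non-whitelisted tables are never refined at all — comes down to turning the whitelist keys into an include regex. A minimal Python sketch of that idea (the whitelist contents and helper name are hypothetical; the actual Refine job is Scala):

```python
import re

# Hypothetical whitelist: table name -> fields allowed into the public copy.
whitelist = {"ChangesListFilters": ["field_a"], "SomeOtherSchema": ["field_b"]}

def table_include_regex(wl):
    """Build an anchored alternation so e.g. 'Foo' does not match 'FooBar'."""
    return re.compile("^(" + "|".join(re.escape(t) for t in sorted(wl)) + ")$")

regex = table_include_regex(whitelist)
print(bool(regex.match("ChangesListFilters")))  # True: gets refined
print(bool(regex.match("NotWhitelisted")))      # False: skipped entirely
```

Skipping at find() time avoids the whole empty-DataFrame problem: no empty table is created in event_public, and there is nothing for the partition-insert step to fail on.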
[21:48:36] sure [21:51:47] (03PS1) 10Milimetric: Fix unlikely breaking error in comments [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415457 [21:51:57] (03CR) 10Milimetric: [V: 032 C: 032] Fix unlikely breaking error in comments [analytics/refinery] - 10https://gerrit.wikimedia.org/r/415457 (owner: 10Milimetric) [22:07:28] ottomata: that change is ready to merge I think, fixed a small typo and tried both commands as compiled by puppet [22:09:14] milimetric: do you want both crons to launch at the same time? [22:09:33] ottomata: yeah, that's fine, they're hitting two different dbs [22:09:37] ok [22:09:41] !log re-deployed refinery for a small docs fix in the sqoop script [22:09:42] ok, i will merge, ya? [22:09:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [22:09:57] thanks ottomata, yeah, ready as far as I can test [22:09:59] k [22:11:53] milimetric: there's not a dry run option or something, is there? :) [22:12:02] cronjob installed :) [22:12:25] ottomata: not really, there should be! [22:12:44] ya, like one that would just print out final sqoop command(s) or something [22:12:57] yeah, would be very easy to do actually [22:13:04] adding a task [22:13:15] ok, well, it's installed, i guess we can wait til the 2nd and you can tail /var/log/refinery/sqoop-mediawiki-private.log on an03 [22:13:53] milimetric: +1 to adding a task about dry run [22:14:42] 10Analytics: Add a --dry-run option to the sqoop script - https://phabricator.wikimedia.org/T188556#4012066 (10Milimetric) [22:15:22] yeah, ottomata but I did execute the commands as puppet compiled them under my username with a test output directory and they worked fine [22:15:31] that's why I was more confident than my usual shaky self [22:15:36] ok great [22:17:46] nuria_: any idea why iOS data is no longer getting to piwik reports?
https://piwik.wikimedia.org/index.php?module=CoreHome&action=index&idSite=3&period=day&date=yesterday#?module=Dashboard&action=embeddedIndex&idSite=3&period=day&date=yesterday&idDashboard=1 [22:18:14] I looked at the logs and haven't found any problems, but it looks like data is coming in, just not getting aggregated [22:20:05] 10Analytics-Kanban, 10Patch-For-Review: Sqoop cu_changes table for geowiki - https://phabricator.wikimedia.org/T184759#3894451 (10Milimetric) [22:21:27] ottomata, what about flags? should Refine write done flags for empty datasets? [22:21:49] otherwise, it will keep selecting them as RefineTargets all the time, no? [22:22:02] 10Analytics, 10Analytics-Data-Quality, 10Discovery-Analysis, 10Wikipedia-Android-App-Backlog: Malformed wiki field in mobile app event logs - https://phabricator.wikimedia.org/T188557#4012090 (10mpopov) p:05Triage>03Unbreak! [22:22:23] OH Hmmmm [22:22:23] mforns: HMM [22:22:39] shoot [22:22:43] xD [22:22:51] grrr [22:22:52] hm [22:22:54] grr [22:23:07] mforns: can you provide whitelisted table names to find()? [22:23:20] mmmm [22:23:21] you read the whitelist in already in order to do setup [22:23:24] you could get keys from it [22:23:28] ? [22:23:34] build the regex [22:23:35] ? [22:23:41] I guess so [22:23:52] you mean to Refine no? [22:24:05] uhh yes. [22:24:06] :) [22:24:20] yes, I think that would work! [22:24:23] trying [22:24:51] hmm, i wonder if we should have an overloaded Refine apply that takes Seq[RefineTarget], if you don't want it to search in base paths [22:24:51] hmm [22:24:57] then you can do the find yourself. [22:25:15] not sure which is better :) [22:25:26] passing whitelist to refine probably easier right now... :) [22:25:32] am heading out v soon... [22:26:03] k [22:26:12] chelsyx: I looked and couldn't find anything wrong. Can you double check with the iOS team and if they're sure everything's ok on their end I'll look deeper. [22:26:13] Hello.
I'm having trouble navigating analytics documentation to see what data is available and how it can be used. Thought that'd be one click away from https://wikitech.wikimedia.org/wiki/Analytics, but it is not. [22:26:27] Sveta: what do you need? [22:26:40] An overview of what is available. I don't have a particular goal in mind. [22:26:42] Sveta: our data's pretty complicated, and we're happy to improve docs [22:26:46] Sveta: ok, one sec [22:26:49] millimetric: Sure! [22:27:32] Sveta: is this section https://wikitech.wikimedia.org/wiki/Analytics#Datasets too technical or not useful? [22:28:02] Sveta: have you looked at http://stats.wikimedia.org/v2/ [22:28:42] chelsyx: I can look, or did milimetric look already? [22:28:43] nuria_ / Sveta: wikistats is good for community folks looking for wiki activity stats, but the wikitech docs are an overall overview of all our data [22:28:57] nuria_: I looked at piwik-archive logs, I pinged you above, I can't find any errors [22:29:12] nuria_: though it is weird, it says something about site id 3 (iOS) every day [22:29:13] 10Analytics, 10Analytics-Data-Quality, 10Discovery-Analysis, 10Wikipedia-Android-App-Backlog: Malformed wiki field in mobile app event logs - https://phabricator.wikimedia.org/T188557#4012125 (10Dbrant) Yes indeed, there were some old versions of the app (from around Dec 2016) that were erroneously populat... [22:29:27] chelsyx, milimetric : data on the prior 24 hrs period is always incomplete [22:29:37] nuria_: yeah, but it's missing for over a week [22:29:41] milimetric: but not zero [22:29:45] right right [22:30:29] I gotta run but I'll watch chat - nite all [22:30:48] milimetric: ok, chelsey can you write a ticket? [22:34:08] nuria_: Sure [22:41:42] l8rrs alll! [22:46:29] 10Analytics, 10Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4012140 (10chelsyx) [22:51:04] nuria_ millimetric: Just checked with the team.
No new versions went out and they didn't change anything. Created a ticket https://phabricator.wikimedia.org/T188559 [22:52:44] 10Quarry: Implement SQL Query Validator in Quarry - https://phabricator.wikimedia.org/T188538#4011560 (10zhuyifei1999) Quarry isn't supposed to be slow in 'queued'. I'll investigate. [22:55:29] (03PS10) 10Nuria: Fix issues with numer formatting [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) [22:56:21] 10Analytics, 10Analytics-Data-Quality, 10Discovery-Analysis, 10Wikipedia-Android-App-Backlog: Malformed wiki field in mobile app event logs - https://phabricator.wikimedia.org/T188557#4012216 (10mpopov) 05Open>03Resolved a:03mpopov >>! In T188557#4012125, @Dbrant wrote: > Yes indeed, there were some... [22:59:03] (03PS11) 10Nuria: Fix issues with numer formatting [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) [23:02:22] (03CR) 10Nuria: "Please sanity check, i think i corrected all your comments." (037 comments) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) (owner: 10Nuria) [23:04:08] 10Analytics: Piwik to measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4012230 (10Nuria) Turns out documentation pages will not be included in this site, from my brief inspection terms_of_use and privacy_policy are the most visited pages in wikimediafoundation.org so... [23:05:09] 10Quarry: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564#4012234 (10zhuyifei1999) [23:05:50] milimetric: Analytics#Datasets is OK, but it needs clicking each item to understand what sort of information it may provide. I'll do that. [23:06:18] nuria_: http://stats.wikimedia.org/v2/ looks informative and clear. Is this comprehensive, or is there some data that is not shown on the site?
[23:06:45] Sveta: the answer depends on what you are looking for [23:07:15] Sveta: there are bazillions of data on http://stats.wikimedia.org [23:07:24] nuria_: does https://stats.wikimedia.org/v2/#//contributing/editors include contributors whose edits were deleted and the contributors were blocked? [23:07:39] nuria_: or does it filter that out? [23:08:27] Sveta: short answer, yes, here are docs: https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Editors [23:08:44] Yes it includes the blocked contributors? [23:08:49] Sveta: no, wait, sorry, i misunderstood your question [23:09:06] OK, thanks. [23:09:26] I'd be interested to know when and how it collects the data. [23:09:26] 10Quarry: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564#4012265 (10zhuyifei1999) https://www.mediawiki.org/wiki/Topic:U1le4hrq6eunlafz says 500k works. NO, that's waaay tooo large, you should be using dumps if you want so much unfiltered data. Gonna limit... [23:09:33] At the time of making the edit, they naturally aren't blocked. [23:09:53] 10Quarry: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564#4012267 (10zhuyifei1999) p:05Triage>03High [23:13:46] Sveta: it is not a trivial process: https://wikitech.wikimedia.org/wiki/File:Wikistats_2_Backend.png [23:14:42] Sveta: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits [23:15:41] Sveta: https://github.com/wikimedia/analytics-refinery/tree/master/oozie/mediawiki/history [23:15:58] Sveta: https://github.com/wikimedia/analytics-refinery/tree/master/oozie/mediawiki/history/metrics [23:19:04] I'll check in a bit, when February data is available. [23:19:21] Thanks for showing how it runs. I'd like to test its output for a small wiki to see what it does with deleted pages. [23:19:35] I'll get back to you.
[23:20:43] Sveta: it is recomputed every month; if edits have been removed and the editor blocked, that editor will not be counted [23:24:25] 10Analytics-Kanban, 10Discovery-Analysis, 10Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#4012316 (10Nuria) See tracking code: