[00:05:54] 10Analytics, 10Security: Review request for data export - https://phabricator.wikimedia.org/T264255 (10calbon) No objection from me. The ORES data referred to in this ticket is not sensitive or PII. [00:51:59] PROBLEM - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [05:51:55] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Review and improve Oozie authorization permissions - https://phabricator.wikimedia.org/T262660 (10elukey) @razzi when you have a moment let's create the puppet patches for this, so we can start reviewing the code etc.. [05:58:50] !log execute "sudo -u hdfs kerberos-run-command hdfs hdfs dfs -chown -R analytics-privatedata /wmf/data/archive/geoip" - T264152 [05:58:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [05:58:52] T264152: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 [06:04:22] !log execute "sudo chown -R analytics-privatedata:analytics-privatedata-users /srv/geoip/archive" on stat1007 - T264152 [06:04:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [06:04:25] T264152: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 [06:13:09] 10Analytics, 10Analytics-Kanban: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (10elukey) As a temporary solution I have implemented 2), namely moving the timer to the `analytics-privatedata` user (we have the keytab on the host). I have also moved the file ownership to `analytic... 
[06:18:31] RECOVERY - Check the last execution of archive-maxmind-geoip-database on stat1007 is OK: OK: Status of the systemd unit archive-maxmind-geoip-database https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [06:44:22] (03CR) 10Elukey: [C: 03+1] "Fran, when you have a moment let's review/merge this and deploy it (so we can restart the job and re-run one failed hour)." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/630682 (owner: 10Joal) [06:46:17] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey) Bootstrapped 12,15,16 - disk/partitions look good! [07:03:28] 10Analytics-Radar, 10Operations, 10Patch-For-Review: Move Hue to a Buster VM - https://phabricator.wikimedia.org/T258768 (10MoritzMuehlenhoff) 05Open→03Resolved Ack, closing this task. [07:12:41] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (10MoritzMuehlenhoff) [07:13:44] 10Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (10MoritzMuehlenhoff) [07:15:07] !log restart hdfs namenodes on an-master100[1,2] to pick up new hadoop workers settings [07:15:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:24:21] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (10elukey) ` ====== stat1004 ====== total 24 -rw-r--r-- 1 nathante wikidev 12682 Nov 18 2018 DwellTimeModels.R.r drwxrwxr-x 3 nathante wikidev 4096 Sep 25 2018 R drwxrwxr-x 2 nathante wikidev 4096 Oct 2 2018 read... [07:26:49] 10Analytics: Check home/HDFS leftovers of shiladsen - https://phabricator.wikimedia.org/T264269 (10elukey) ` ====== stat1004 ====== total 7078300 drwxr-xr-x 15 shiladsen wikidev 4096 Dec 29 2017 analytics-refinery drwxrwxr-x 7 shiladsen wikidev 4096 Nov 28 2017 annoy-master -rw-r--r-- 1 shiladsen... 
[07:54:54] 10Analytics, 10Operations, 10Traffic: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (10elukey) I am not an expert in `perf` but I tried to do the following on cp5012: `sudo perf record -F 99 -p 29945 --call-graph dwarf sleep 10` (the pid is varnishkafka-webrequest) And I... [08:07:16] 10Analytics-Clusters: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) p:05Triage→03High [08:07:37] 10Analytics-Clusters, 10Analytics-Kanban: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) [08:07:47] 10Analytics-Clusters, 10Analytics-Kanban: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) a:03elukey [09:01:51] 10Analytics, 10Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (10elukey) [09:02:45] going afk for an errand, ttl! [09:02:51] Bye elukey :) [10:27:53] back! [10:28:07] I am adding an-worker1103 to the hadoop cluster [10:28:14] (first of the 16 new nodes to add) [10:30:18] !log add an-worker1103 to the hadoop cluster [10:30:21] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [10:40:55] all good, the worker just started [10:42:27] going to have a quick lunch, ttl [11:13:02] (03CR) 10Mforns: "Awesome :]" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/630680 (https://phabricator.wikimedia.org/T263495) (owner: 10Razzi) [12:08:18] joal: an-worker1103 bootstrapped fine, I am adding another two new workers [12:08:25] \o/ [12:08:31] (I am expanding the cluster right now, and then I'll shrink it to decom nodes) [12:08:45] GPU nodes are almost ready, I have a code review for the gpu drivers etc.. [12:08:52] elukey: thinking of secondary cluster for data backup - which nodes are we gonna use? 
[12:09:52] after this expand/shrink we'll have 16 nodes [12:10:19] ~770TBs of total HDFS data [12:10:23] Ok - This represents ~100TB useful to save - sounds great [12:10:36] joal: 100? [12:10:44] Oh! 22 disks of 2TB, not 1TB, right? [12:11:06] nono the 16 workers have 12x4TB disks [12:11:23] O.o [12:11:25] sorry [12:11:29] the 22 disks are 2TBs, and in the gpu nodes [12:11:44] (6 that I'll add without shrink) [12:11:58] I can explain in meet if you want (the whole plan I mean) [12:12:02] ok, so 250TB useful available [12:12:06] that's great [12:12:12] this is hardware for last year's fiscal [12:12:18] Thanks for helping me with the maths :) [12:12:20] then we'll have 24 more nodes :D [12:12:39] And actually, if we use rep-factor 2, it means 350TB useful [12:12:42] I am trying to bootstrap them asap to speed up the creation of the backup cluster [12:12:48] <3 [12:12:58] elukey: Please let me know if I can help [12:14:03] should be good now, the cookbook to bootstrap the partitions etc.. works really great, it takes minutes to do the whole thing for multiple nodes [12:14:24] I already created all the keytabs, and the TLS certs are taken from puppet local certs [12:14:56] * joal would love to be half as good as elukey in terms of automation [12:15:38] joal: without your help on the TLS cert swap it would be way more painful to bootstrap nodes :) [12:25:46] morning! elukey, looking at the pings now [12:44:01] morning! ack [12:46:25] joal: forgot to ask - have you seen this marvellous https://phabricator.wikimedia.org/T264074 ? [12:46:48] * elukey plays sad_trombone.wav [12:46:52] Oh no I missed that :( [12:47:19] nothing on fire, but also not great [12:47:34] all the cp nodes are on varnish6 now [12:47:48] yeah :( I read with strong emergency feeling the: Analytics is currently working on confirming if the data produced by atskafka is good enough to permanently get rid of varnishkafka [12:48:17] elukey: are there things I should start investigating in that regard? 
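The capacity figures quoted earlier in the conversation (16 workers, 12x4TB disks each, "~770TBs" raw, "250" useful, more with rep-factor 2) can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes "useful" means raw capacity divided by the HDFS replication factor, takes 3 as the replication factor because that is the HDFS default (the log does not state it), and ignores reserved non-DFS space and filesystem overhead. The function name `hdfs_capacity_tb` is made up for illustration.

```python
def hdfs_capacity_tb(nodes, disks_per_node, disk_tb, replication):
    """Return (raw_tb, usable_tb) for a cluster of identical workers.

    Assumption: usable capacity is simply raw capacity divided by the
    replication factor; reserved space and overhead are ignored.
    """
    raw_tb = nodes * disks_per_node * disk_tb
    usable_tb = raw_tb / replication
    return raw_tb, usable_tb


if __name__ == "__main__":
    # Figures from the chat: 16 workers with 12 disks of 4 TB each.
    for replication in (3, 2):
        raw, usable = hdfs_capacity_tb(16, 12, 4, replication)
        print(f"replication={replication}: raw={raw} TB, usable={usable:.0f} TB")
```

Under these assumptions the raw total is 768 TB, with 256 TB usable at replication 3 and 384 TB at replication 2, which is in the same ballpark as the rounder numbers mentioned in the chat.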
[12:48:38] joal: nono I'll follow up with e*ma, I just wanted to let you know [12:48:41] so you are aware [12:48:58] ack elukey - please let me know if you wish me to check some data [12:49:05] sure :) [13:05:58] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) The plan is to add the 16 new nodes (expanding the cluster) progressively, and then remove the 16 old ones (shrinking the cluster) later on. [13:10:17] PROBLEM - Check the last execution of performance-asoranking on stat1007 is CRITICAL: CRITICAL: Status of the systemd unit performance-asoranking https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [13:15:13] !log execute "sudo chown analytics-privatedata:analytics-privatedata-users /srv/published-datasets/performance/autonomoussystems/*" on stat1007 to fix a perm issue after reimage [13:15:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:15:49] !log restart performance-asoranking on stat1007 [13:15:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:16:24] 10Analytics, 10Analytics-Kanban: Debianize Python's pid library to be able to use it from reportupdater - https://phabricator.wikimedia.org/T262574 (10mforns) > Can @mforns confirm that reportupdater can use this package as its? It works great! (tested the new code in an-launcher1002) Now, one behavior chang... 
[13:20:33] RECOVERY - Check the last execution of performance-asoranking on stat1007 is OK: OK: Status of the systemd unit performance-asoranking https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [13:21:11] 10Analytics, 10Analytics-Kanban: Wikistats time selector tooltip spawns at center of site instead of aligned with button - https://phabricator.wikimedia.org/T264310 (10fdans) [13:23:21] (03CR) 10Mforns: "@Paul Kernfeld," [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/623470 (https://phabricator.wikimedia.org/T173604) (owner: 10Paul Kernfeld) [13:24:54] (03CR) 10Mforns: "@Paul Kernfeld," [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/623470 (https://phabricator.wikimedia.org/T173604) (owner: 10Paul Kernfeld) [13:27:40] (03PS1) 10Fdans: Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) [13:27:52] (03CR) 10jerkins-bot: [V: 04-1] Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) (owner: 10Fdans) [13:28:53] (03PS2) 10Fdans: Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) [13:32:53] elukey: what's going on with the prom export on an-worker1096? Is that the machine where rocm33 doesn't even recognize the GPU? [13:33:28] klausman: hi! I need to reboot the host, it doesn't have the kfd etc.. loaded [13:33:42] Ah, right. I can see to it if you're busy [13:34:37] klausman: to reboot one node we'd need to drain it first, and currently it is a little bit manual. 
If you want I can tell you how I have been doing it so far and you can do it [13:35:31] Sounds good [13:35:39] !log bootstrap an-worker1097 (GPU node) as hadoop worker [13:35:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:36:01] klausman: ok so on the host there are two daemons: yarn node manager (responsible for allocating containers) and hdfs datanode [13:36:29] usually I do: puppet agent --disable "etc.." ; sudo systemctl stop hadoop-yarn-nodemanager [13:36:40] and finally stop hadoop-hdfs-datanode [13:37:11] when the nodemanager is stopped the running containers keep going, so I usually wait a bit and reboot (like 10 mins) [13:38:08] there are some ways IIRC to instruct the master node to drain the hosts, but it is more like a decommission than a temporary stop [13:38:16] so I have never followed that road [13:38:31] Roger. [13:38:40] I am also bootstrapping now 1097 so you can reboot that one as well later on :) [13:38:51] Will disable agent, then stop node manager and datanode, then wait 10m or so with an eye on htop [13:38:56] (maybe let's wait a bit on 97 so it gets the first blocks assigned etc..) [13:39:01] super [13:41:42] * elukey coffee, bbiab [13:41:51] Huh. Looks like the machine is empty basically immediately. [13:42:05] There are two du(1)s running, but that's about it [13:42:14] Thanks for the link isaacj :) [13:43:23] joal: happily! i don't know how soon it'll happen but there seemed to be enough agreement among major browsers that changes could start soon and affect our bot detection, reader session, etc. approaches. 
for others: https://www.zdnet.com/article/google-to-phase-out-user-agent-strings-in-chrome/ [13:44:49] PROBLEM - Hadoop DataNode on an-worker1096 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [13:45:08] hehe :) Welcome an-worker1096 [13:45:47] PROBLEM - Hadoop NodeManager on an-worker1096 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [13:45:59] oops. Should have put in a silence [13:48:34] klausman: also don't forget to !log in #operations [13:55:12] RECOVERY - Hadoop DataNode on an-worker1096 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [13:55:14] rocm-smi works fine now [13:55:29] But still: [13:55:31] TypeError: __init__() got an unexpected keyword argument 'text' [13:56:22] ah, this is an incompatibility with this Python's subprocess module :-/ [13:56:46] ah snap 3.7 vs 3.5 [13:56:50] Yep [13:56:51] lovely [13:57:12] Should we change the script to hardcode 3.7? Or is it better to change the default Python3 on the machine? 
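The `TypeError` quoted above fits a known difference between Python versions: the `text` keyword of the `subprocess` functions was added in Python 3.7 as an alias for `universal_newlines`, so a call that passes `text=True` fails on 3.5. The following is a minimal sketch of a call that works on both versions. It is not the actual exporter script, and the helper name `run_text` is made up for illustration.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str, on Python 3.5 and later."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # Pre-3.7 spelling of text=True: decode the pipes to str.
        universal_newlines=True,
        # Raise CalledProcessError on a non-zero exit status.
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Use the running interpreter as a portable stand-in for a real command.
    print(run_text([sys.executable, "-c", "print('ok')"]))
```

Keeping `universal_newlines=True` is one option; pinning the script to a 3.7 interpreter, as raised in the question above, is the other, and it avoids touching the calls themselves.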
[14:01:39] klausman: +1 for python3.7, looks fine [14:03:33] Ok, will make patch [14:07:46] RECOVERY - Hadoop NodeManager on an-worker1096 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [14:09:05] klausman: when you have a moment, please also answer in alerts@ to the alerts so people don't look at them [14:09:25] will do [14:09:29] thanks :) [14:15:51] Patch is ready at https://gerrit.wikimedia.org/r/631452 [14:30:17] And an-worker1096 has both a working GPU and Prom reporting [14:35:09] nice! [14:35:52] I are teh productives! [14:51:17] !log bootstrap an-worker109[8-9] as hadoop workers (with GPU) [14:51:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:01:14] PROBLEM - Disk space on Hadoop worker on an-worker1098 is CRITICAL: NRPE: Command check_disk_space_hadoop_worker not defined https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [15:03:26] RECOVERY - Disk space on Hadoop worker on an-worker1098 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration [15:05:05] 10Analytics, 10Analytics-Kanban: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (10Nuria) Thanks and agreed on solution 3) . I have assigned to @razzi and we can take this up as part of regular development [15:17:08] joal: are you following the keynote? [15:17:19] nope elukey - socializing with a-team [15:17:21] what's up [15:17:21] ? [15:18:14] joal: there is a talk about spark and parquet at oak ridge national lab :) [15:18:22] ohhhh :) [15:18:33] I guess you might like it :D [15:19:29] elukey: would you by any chance have a link? 
[15:20:08] joal: it is just in the "stage" button from the main page of the conf [15:21:08] awesome - thanks elukey - following it now [15:21:39] :) [15:31:19] 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Epic: Add data quality alarm for mobile-app data - https://phabricator.wikimedia.org/T257692 (10Nuria) Entropy of os-family per method of access, orange line shows the big oscillations in mobile apps, oscillation is not present in desktop or mobile ap... [15:49:25] 10Analytics, 10Analytics-Kanban: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (10fdans) @razzi we can pair up on this if you have any questions or get stuck! It'll be a nice refresher for me too :) [15:51:04] 10Analytics, 10Analytics-Wikistats: Wikistats New Feature - https://phabricator.wikimedia.org/T264327 (10nurdinjaelani) [16:18:27] 10Analytics, 10Analytics-Wikistats: Wikistats New Feature - https://phabricator.wikimedia.org/T264327 (10Nuria) ping @nurdinjaelani want to add some info here as to your request? [16:18:46] 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: Develop a new schema for MediaSearch analytics or adapt an existing one - https://phabricator.wikimedia.org/T263875 (10nettrom_WMF) This is awesome work so far! I've read through this task, its parent task, a... [16:42:54] 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey) There are two nodes marked with `fails to hit dhcp server, please check cable/port`, @Cmjohnson when you have a moment can you check? an-worker... [16:54:47] mforns: Have you deployed the train last week? [16:56:19] joal: no... 
[16:56:24] ok [16:56:32] fdans: o/ [16:56:32] joal: I can do today [16:56:49] there seem to be a lot of things in https://etherpad.wikimedia.org/p/analytics-weekly-train ready for a deploy [16:56:52] (03CR) 10Razzi: [C: 03+1] Improve path discovery in drop-older-than [analytics/refinery] - 10https://gerrit.wikimedia.org/r/628933 (https://phabricator.wikimedia.org/T263495) (owner: 10Mforns) [16:57:08] mforns: not needed - elukey pointed that sqoop has started without the new tables [16:57:19] I'm gonna run a manual run [16:57:19] yeah :( [16:57:33] there is the druid banner stuff though [16:57:55] We can deploy, it'd be useful - but it won't solve the sqoop problem [16:58:16] (03CR) 10Mforns: [V: 03+2 C: 03+2] "Merging!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/628933 (https://phabricator.wikimedia.org/T263495) (owner: 10Mforns) [17:00:50] 10Analytics-Radar, 10Operations, 10ops-eqiad: an-presto1004 down - https://phabricator.wikimedia.org/T253438 (10RobH) a:05RobH→03Cmjohnson Reassigning to Chris, as I listed him as the contact on the self dispatch for the dell tech to contact and arrange a time for the onsite work. [17:02:49] mforns: joal i'm doing the train today! [17:03:02] ack fdans [17:04:59] ok fdans thanks! [17:06:12] nuria: Razzi and I merged the deletion script changes, and we were thinking of manually deleting the raw mediawiki_job and netflow data. Just checking if that is OK? [17:07:04] (03CR) 10Joal: [V: 03+2 C: 03+2] "Merging for deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/630682 (owner: 10Joal) [17:12:09] mforns: did you end up deploying refinery on monday? 
[17:12:34] fdans: no, no, we wanted to finish the deletion script changes, and we couldn't [17:12:57] gotcha mforns gracias [17:13:46] :] [17:15:56] (03PS3) 10Razzi: Test using mocked file tree in refinery-drop-older-than [analytics/refinery] - 10https://gerrit.wikimedia.org/r/630680 (https://phabricator.wikimedia.org/T263495) [17:18:23] !log deploying refinery [17:18:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:20:46] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/630680 (https://phabricator.wikimedia.org/T263495) (owner: 10Razzi) [17:21:08] fdans [17:21:17] helo [17:21:29] fdans: we just merged a patch for refinery... [17:21:40] that's ok, scap is giving me grief [17:21:41] is it too late? [17:21:44] nono [17:21:46] ok [17:21:56] let me add info to deployment train etherpad! [17:23:09] fdans: deployment train etherpad is kaput! [17:23:56] mforns: if it is the patch for banner daily I already added it [17:24:15] elukey: hellooo I'm getting access denied on scap [17:24:17] elukey: no no it's for the deletion script [17:24:57] https://www.irccloud.com/pastebin/89AWlV2E/ [17:28:53] fdans: when doing scap deploy? [17:29:01] elukey: yessir [17:30:02] ahhh on stat1007 [17:30:05] that was reimaged [17:30:07] lemme check [17:32:48] !log remove + re-create /srv/deployment/analytics/refinery on stat1007 (perm issues after reimage) [17:32:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:33:04] running puppet now, hopefully will be unblocked in a sec [17:34:15] thank you elukey [17:34:47] so it might be the same on 1004 and 1006 I fear [17:35:04] fdans: can you retry? 
(but stop at the canary please, that should be 1007) [17:35:22] elukey: yes [17:36:43] elukey: it's going now instead of failing immediately so I'm guessing that's good [17:36:54] will update when canary's done [17:37:36] ok so I need to do the same on 1004 and 1006 before you can proceed [17:38:32] (03PS3) 10Fdans: Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) [17:38:41] (03CR) 10jerkins-bot: [V: 04-1] Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) (owner: 10Fdans) [17:39:07] elukey: 1007 is good, lmk when I should proceed [17:40:13] !log remove + re-create /srv/deployment/analytics/refinery* on stat100[46] (perm issues after reimage) [17:40:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [17:40:34] I am doing 1004/1006 but also with their caches, it might take some mins [17:41:14] (03CR) 10Milimetric: [C: 03+2] "Looks good, just rebase and deploy" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) (owner: 10Fdans) [17:41:24] (03CR) 10jerkins-bot: [V: 04-1] Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) (owner: 10Fdans) [17:41:28] (03PS4) 10Fdans: Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) [17:42:47] (03CR) 10Fdans: [V: 03+2] Fix time selection tooltip not alighning with button [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/631443 (https://phabricator.wikimedia.org/T264310) (owner: 10Fdans) [17:45:09] fdans: green light [17:45:17] elukey: thank you! 
[17:45:28] lemme know if it works [17:45:47] I need to go in a few but in case I'll bring my laptop with me [17:46:39] elukey: should be good now, I'm sorry for the pings [17:47:22] nono np :) [17:48:01] scap deploy-log looks good :) [17:49:27] elukey: scap good! [17:49:35] thank you again [17:49:58] nice :) [17:49:59] logging off! [17:50:22] (if you need me call me, I will not check IRC) [17:51:59] (03PS2) 10Paul Kernfeld: Use pid library to manage Pidfile [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/623470 (https://phabricator.wikimedia.org/T173604) [17:57:34] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T256356 (10leila) [17:57:36] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (10leila) [17:58:34] 10Analytics: Check home/HDFS leftovers of nathante - https://phabricator.wikimedia.org/T264268 (10leila) @elukey the only items @Groceryheist needs to export are at T264255 (waiting security and analytics review). The rest can be purged. [17:59:30] (03CR) 10Paul Kernfeld: "No problem, I think PS2 should now exit successfully on a duplicate query. I verified that it works by looking at the logging output when " [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/623470 (https://phabricator.wikimedia.org/T173604) (owner: 10Paul Kernfeld) [18:00:34] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM! Thanks a lot for this change :] Merging." [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/623470 (https://phabricator.wikimedia.org/T173604) (owner: 10Paul Kernfeld) [18:13:12] mforns: i would let the puppet change delete the netflow data once merged , no need to do it by hand no? 
[18:13:39] nuria: no need, indeed [18:13:53] mforns: we just need to modify: https://gerrit.wikimedia.org/r/c/operations/puppet/+/628895 [18:16:18] 10Analytics, 10Security: Review request for data export - https://phabricator.wikimedia.org/T264255 (10Nuria) I think these should be fine to export, agreed. [18:23:29] 10Analytics-Radar, 10Operations, 10ops-eqiad: an-presto1004 down - https://phabricator.wikimedia.org/T253438 (10Cmjohnson) the dell tech came today and replaced the board but did not bring new power supplies...anyway, swapped the board, and the power supplies still burned up [18:27:50] 10Analytics, 10Analytics-Kanban, 10good first task: Reportupdater: do not write execution control files in source directories - https://phabricator.wikimedia.org/T173604 (10Nuria) @paulkernfeld thanks for your work, would you be interested in doing more wikistats work (javascript) or you prefer these type of... [18:35:06] 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: Develop a new schema for MediaSearch analytics or adapt an existing one - https://phabricator.wikimedia.org/T263875 (10egardner) Thanks @nettrom_WMF – I'll answer what I can below, but some of these questions... [18:36:37] nuria: mforns regarding the data quality bundle, should the daily one also be restarted or only the hourly [18:36:49] fdans: looking [18:38:07] fdans: only the hourly one needs to be restarted, however! in the etherpad, it says to backfill last 90 days [18:38:11] fdans: but [18:38:35] fdans: that should be done separately, I can do that tomorrow [18:38:51] fdans: please just restart the hourly bundle with today's date [18:39:04] otherwise we'll receive many undesired alerts [18:39:18] mforns: so not 90 days prior [18:39:28] fdans: exactly, just from today [18:39:40] ok [18:39:55] we'll probably receive some alerts for the next couple hours, but they are expected [18:44:23] oh wait fdans... 
those changes are for the navtiming queries, which are both hourly and daily [18:44:32] fdans: we also need to restart the daily one [18:44:46] but no backfilling [18:45:01] mforns: same parameters but granularity daily? [18:45:09] fdans: yes [18:45:16] fdans: the date [18:45:20] the date might change [18:45:33] like set the hour to 00 [18:46:17] wait mforns daily bundle has no navtiming coord [18:46:21] https://hue.wikimedia.org/oozie/list_oozie_bundle/0095740-200720135922440-oozie-oozi-B [18:46:35] only pageview [18:46:45] ah fdans, I'm comfused! it's only hourly then [18:46:47] sorrryyyyyy [18:47:02] mforns: np! :) [18:47:02] *confused [18:53:08] 10Analytics, 10Analytics-Kanban, 10good first task: Reportupdater: do not write execution control files in source directories - https://phabricator.wikimedia.org/T173604 (10paulkernfeld) No problem! I would be happy to take a shot at Python or Wikistats work. If it helps in your planning, my goal is to spen... [18:56:28] !log creating hive table wmf_raw.mediawiki_user_properties [18:56:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:56:54] 10Analytics-Radar, 10Technical-blog-posts: Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors - https://phabricator.wikimedia.org/T259559 (10srodlund) @Milimetric this is published, but can you please review it to make sure there are no errors before I announce widely? Particularly, I... [18:57:06] !log creating hive table wmf_raw.mediawiki_page_props [18:57:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:59:42] !log restarting mediawiki-history-load-coord [18:59:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:00:36] 10Analytics-Radar, 10Technical-blog-posts: Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors - https://phabricator.wikimedia.org/T259559 (10Milimetric) Looks great @srodlund. 
Do you think the formatting will be fixed in a week or so? Maybe I can help? If so, then let's wait to anno... [19:06:44] !log restarted banner_activity-druid-daily-coord from Sep 26 [19:06:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:07:10] !log deploying wikistats [19:07:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:26:24] 10Analytics-Radar, 10Technical-blog-posts: Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors - https://phabricator.wikimedia.org/T259559 (10srodlund) @Milimetric I meeting w/@bd808 to help troubleshoot on Monday; it may be something we can fix then, but I'm not entirely sure. Let me k... [19:55:06] 10Analytics: Investigate oozie banner monthly job timeouts - https://phabricator.wikimedia.org/T264358 (10JAllemandou) [20:39:21] hey a-team: isn't stat1004, stat1006, and stat1007 supposed to run Debian Buster at this point? I'm still seeing stretch mentioned when I log in, and the R version is 3.3.3. [20:39:55] stat1008 says it's running Buster, and the R version is 3.5.2 [20:40:33] * Nettrom is confused [21:29:51] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (10nettrom_WMF) With these new upgrades happening, I wanted to move my Jupyter notebooks from stat1008 to stat1006 as stat1008 has been very busy lately. Afte... [21:43:46] 10Analytics, 10Code-Health-Objective, 10Epic, 10Platform Engineering Roadmap, 10Platform Team Initiatives (API Gateway): AQS 2.0 - https://phabricator.wikimedia.org/T263489 (10WDoranWMF) [22:19:26] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (10elukey) >>! In T255028#6510550, @nettrom_WMF wrote: > With these new upgrades happening, I wanted to move my Jupyter notebooks from stat1008 to stat1006 as... 
[22:20:30] this is... --^ [22:20:49] ok will restart tomorrow, sigh [22:21:00] (I was checking if anything was needed on my side)