[00:09:16] <icinga-wm>	 RECOVERY - Check the last execution of drop-el-unsanitized-events on an-launcher1002 is OK: OK: Status of the systemd unit drop-el-unsanitized-events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[01:29:09] <wikibugs>	 10Analytics, 10Analytics-Wikistats: Wikistats - Add avk.wikipedia.or to scoop list  - https://phabricator.wikimedia.org/T264660 (10Nuria)
[01:29:57] <wikibugs>	 10Analytics, 10Analytics-Wikistats: Wikistats - Add avk.wikipedia.or to scoop list - https://phabricator.wikimedia.org/T264660 (10Nuria) Thanks for reporting, new wikis data is not scooped automatically, we will add this one to our list.
[07:07:54] <elukey>	 good morning
[07:09:08] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1114.eqiad.wmnet'] ` The log can be...
[07:11:04] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey)
[07:20:13] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1114.eqiad.wmnet'] `  Of which those **FAILED**: ` ['an-worker1114.eqiad.wmnet'] `
[07:24:07] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1114.eqiad.wmnet'] ` The log can be...
[07:32:17] <elukey>	 !log bootstrap an-worker111[13] as hadoop workers
[07:32:19] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:35:46] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1114.eqiad.wmnet'] `  Of which those **FAILED**: ` ['an-worker1...
[07:47:02] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey) an-worker1114's reimage fails for:  ` 07:35:31 | an-worker1114.eqiad.wmnet | Unable to run wmf-auto-reimage-host: Unable...
[07:56:49] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10Volans) @elukey if I try to ssh with the install console key I get a BusyBox... I guess that's the reason. Basically the reimage...
[08:10:00] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad, 10Patch-For-Review: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey) The boot sequence was NIC then HD (as it happened for 1117), just fixed it thanks for the suggestion :)
[08:13:03] <elukey>	 https://gerrit.wikimedia.org/r/c/operations/puppet/+/632433 ready to bump the heap settings of the namenodes
[08:13:39] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1114.eqiad.wmnet'] ` The log can be...
[08:17:47] * elukey bbiab
[08:24:56] <wikibugs>	 10Analytics-Clusters, 10Operations, 10Traffic: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (10elukey) I found https://github.com/varnishcache/varnish-cache/issues/2788 that might be what's happening. The fix is https://github.com/varnishcache/varnish-cache/commit/ed1696e...
[08:34:23] <wikibugs>	 10Analytics-Clusters, 10Operations, 10Traffic: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (10ema) >>! In T264074#6520380, @elukey wrote: > I found https://github.com/varnishcache/varnish-cache/issues/2788 that might be what's happening. The fix is https://github.com/var...
[08:35:40] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1114.eqiad.wmnet'] `  and were **ALL** successful.
[09:04:12] <klausman>	 !log Starting reimaging of stat1007
[09:04:14] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:07:36] <klausman>	 elukey: last chance to stop me in 1m :)
[09:07:52] <elukey>	 :)
[09:08:41] <elukey>	 !log add an-worker1114 to the hadoop cluster
[09:08:43] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:09:23] <wikibugs>	 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by klausman on cumin1001.eqiad.wmnet for hosts: ` ['stat1007.eqiad.wmnet'] ` The log can be found...
[09:16:14] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey)
[09:16:30] <wikibugs>	 10Analytics-Radar, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[02-17] - https://phabricator.wikimedia.org/T259071 (10elukey) 05Open→03Resolved All nodes are in hadoop now, looks good!
[09:18:07] <wikibugs>	 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) All nodes are now in Hadoop, just closed the rack/setup/deploy task. I am going to update the docs on adding worker nodes, they probably need a...
[09:22:28] <wikibugs>	 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Refresh 16 nodes in the Hadoop Analytics cluster - https://phabricator.wikimedia.org/T255140 (10elukey) During the first puppet run, datanode and nodemanager fail for different reasons:  ` 2020-10-06 09:17:19,422 FATAL org.apache.hadoop.yarn.serve...
[09:37:35] <elukey>	 I have updated https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Standard_Worker_Installation_(12_disk,_2_flex_bay_drives_-_analytics1028-analytics1077,_most_of_an-worker10XX)
[09:37:44] <elukey>	 the procedure is simplified a lot
[09:58:53] <joal>	 Good morning team
[10:09:12] <klausman>	 1007 almost done, one last puppet run+reboot
[10:09:17] <klausman>	 Heya!
[10:10:28] <wikibugs>	 10Analytics, 10Analytics-Wikistats: Wikistats - Add avk.wikipedia.or to scoop list - https://phabricator.wikimedia.org/T264660 (10JAllemandou) avkwiki has been added in https://gerrit.wikimedia.org/r/c/analytics/refinery/+/628917 This patch however got deployed after this month sqoop, so data will be available...
[10:11:37] <joal>	 elukey: do you have a minute? I'd like to entertain you with a weird idea we had with Dan yesterday
[10:11:58] <elukey>	 joal: one sec, alarm shower in operations
[10:12:07] <joal>	 sure, ping when ready elukey 
[10:12:34] <joal>	 wow - indeed - flooding
[10:17:34] <klausman>	 Looks like not enough ignorelisting for pseudo-filesystems?
[10:18:26] <elukey>	 seems so yes, by default we do it but we have an override for hadoop workers (that I wasn't aware / didn't recall)
[10:18:36] <elukey>	 why it triggered now though no idea
[10:19:58] <klausman>	 1007 update is complete, Wiki page for SSH keys is updated and info mail sent.
[10:20:09] <klausman>	 Shall we schedule 1006 for Thursday?
[10:21:45] <elukey>	 seems good!
[10:22:15] <klausman>	 Alright, will send announcement in a minute. Do you want/need help with the disk space false positives?
[10:22:16] <elukey>	 joal: ok I am good now :)
[10:22:22] <joal>	 elukey: batcave?
[10:22:28] <elukey>	 klausman: nono all good thanks :)
[10:22:34] <elukey>	 alarms recovering
[10:22:38] <elukey>	 joal: sure
[10:26:33] <klausman>	 Is it safe to ctrl-c the reimage script? It doesn't seem to see the reboot?
[10:26:38] <klausman>	 10:21:57 | stat1007.eqiad.wmnet | Still waiting for reboot after 20.0 minutes
[10:31:45] <elukey>	 yep yep I think so
[10:35:32] <klausman>	 Ok. Manually removed the Icinga downtimes. And now, lunch!
[11:02:25] * elukey lunch!
[12:11:24] <elukey>	 joal: o/ merged both changes, going to restart the namenodes
[12:11:31] <joal>	 ack elukey
[12:19:31] <elukey>	 going to start with an-master1002
[12:19:58] <elukey>	 !log increase spark shuffle io retry logic (10 tries every 10s)
[12:20:00] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:20:21] <elukey>	 !log update HDFS Namenode GC/Heap settings on an-master100[1,2]
[12:20:22] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:24:06] <fdans>	 good morning tema :)
[12:24:10] <fdans>	 team
[12:24:15] <fdans>	 great start
[12:26:34] <joal>	 Hi fdans :)
[12:29:59] <elukey>	 hello :)
[12:30:20] <elukey>	 so I'll leave an-master1002 with the new config for 30 mins, just to see how it goes 
[12:30:33] <elukey>	 then I'll failover 1001 to 1002, then restart 1001 and failback
[12:40:17] <wikibugs>	 10Analytics: Establish what data must be backed up before the HDFS upgrade - https://phabricator.wikimedia.org/T260409 (10elukey)
[12:40:51] <wikibugs>	 10Analytics: Establish what data must be backed up before the HDFS upgrade - https://phabricator.wikimedia.org/T260409 (10JAllemandou) After talking with the, we chose to backup all data except for logs, raw data (unprocessed webrequest, events, and dumps), 2 month of webrequest, and processed wikitext (heavy)....
[12:40:55] <joal>	 elukey: --^
[12:41:36] <joal>	 elukey: Just added a comment with a more precise version of the computation we did this morning - It looks ok :)
[12:41:39] <elukey>	 nice!
[12:41:46] <elukey>	 <3
[12:43:02] <elukey>	 joal: I have re-done the calculations, and with 16 nodes we have two possibilities
[12:43:35] <elukey>	 1) we use 16x48T=768 (minus some TBs of disks failed in old nodes)
[12:43:45] <elukey>	 and we create two ganeti vms for the masters
[12:44:16] <elukey>	 2) we use 2 of the 16 nodes for the masters (wasting a lot of disks), and 14x48T=672T
[12:44:31] <elukey>	 but we'll need 3/4 more nodes from the new batch
[12:44:35] <joal>	 elukey: if it's not too expensive for ganeti, I'd for VMs
[12:45:07] <elukey>	 the only doubt that I have is about the heap size
[12:45:23] <joal>	 elukey: conf change for spark confirmed :)
[12:45:26] <joal>	 \o/
[12:45:30] <elukey>	 namely works??
[12:46:08] <joal>	 elukey: I have no idea if it actually changes something for our troubles, but the changes are applied (spark reads them and reports them in its config)
[12:46:58] <joal>	 elukey: the thing I want to check now is that the oozie-launched spark also have them
[12:47:00] <elukey>	 !log force re-creation of the base virtualenv for jupyter on stat1007 after the reimage
[12:47:02] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:47:18] <elukey>	 joal: in theory if spark-defaults is updated on an-coord1001 it should
[12:47:55] <joal>	 elukey: hm - not sure - for instance, we at some needed to explicitely set dynamic-allocation to true in oozie job conf as it was not picked-up
[12:48:05] <joal>	 elukey: this is why I want to double check
[12:49:18] <elukey>	 joal: yep but do you recall that we added a setting to oozie to pick up spark-defaults some months ago?
[12:49:36] <joal>	 Ahhhh!! I had forgotten about that one :)
[12:49:46] <joal>	 So it wouldn't have worked, but now it should :)
[12:49:51] <joal>	 I'm still gonna chekc :)
[12:49:58] <elukey>	 yep yep! should be oozie.service.SparkConfigurationService.spark.configurations
[12:49:58] <joal>	 Too many things to recall
[12:53:18] <joal>	 elukey: from what I read, not picked up
[12:53:37] <joal>	 elukey: could it be that oozie needs a bump?
[12:54:37] <elukey>	 ah yes probably
[12:54:47] <elukey>	 feel free to restart oozie
[12:55:12] * joal is afraid of doing something wrong :S
[12:55:26] <elukey>	 nono please do it
[12:55:28] <joal>	 elukey: sudo -u analytics service oozie restart
[12:55:43] <elukey>	 without -u analytics, and if you want sudo systemctl restart oozie
[12:56:03] <joal>	 ok will use your command
[12:56:14] <joal>	 !log Restart oozie to pick up new spark settings
[12:56:16] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:04:03] <klausman>	 I think I may have overdone it yesterday and so far today: my wrist is unhappy. I am going to take a break for a bit. /msg if you need me.
[13:04:22] <elukey>	 klausman: ack, please rest :)
[13:06:01] <klausman>	 But somebody's wrong on the Internet!
[13:14:45] <elukey>	 !log cleaned up /srv/jupyter/venv and re-created it to allow jupyterhub to start cleanly on stat1007
[13:14:48] <stashbot>	 Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:16:24] <elukey>	 doing the an-master failover - 1001 -> 1002
[13:17:31] <elukey>	 lovely
[13:17:42] <elukey>	 my jupyter notebook fails to start with PermissionError: [Errno 13] Permission denied: '/srv/home/elukey/.local/share'
[13:18:04] <elukey>	 ah lol files owned by root
[13:18:11] <elukey>	 this is weirdness of the reimage
[13:18:12] <elukey>	 oh my
[13:18:34] <joal>	 meh :S
[13:20:16] <elukey>	 removing/re-creating the venv is probably the only way
[13:28:20] <elukey>	 ok now seems to work
[13:28:30] <elukey>	 but perms are not ok all around :(
[13:28:37] <elukey>	 I'll have a chat with Tobias
[13:29:14] <klausman>	 I figure the UIDs of ~everything changed
[13:29:24] <joal>	  /o\
[13:30:20] <klausman>	 elukey: maybe cd /srv/home; for i in *; do chown -R ${i}: $i; done
[13:30:40] <klausman>	 I doubt there is cross-owner file/dir ownership in homedirs
[13:31:33] <klausman>	 It is inclear to me how a file like that could end up with UID 0, tho
[13:39:31] <elukey>	 klausman: so in theory all files owned by users / groups that we have stated in puppet should have a fixed id/gid
[13:40:03] <elukey>	 the files that I pointed out above were created by jupyterhub, that I believe doesn't have a system user with fixed id :(
[13:40:17] <klausman>	 Ah
[13:40:23] <klausman>	 But we know the old and new IDs?
[13:40:48] <elukey>	 I don't know a way to get those
[13:40:50] <klausman>	 then it'd just be find /srv -uid 123 -print0|xargs -0 chown 456
[13:40:59] <elukey>	 yeah
[13:41:11] <klausman>	 well, the old ID would be still visible in broken notebooks
[13:41:21] <klausman>	 and the new user should be in /etc/passwd, no?
[13:41:28] <elukey>	 the other problem is with /srv/deployment, because scap deploy users I think are not system users with fixed id
[13:42:06] <elukey>	 in theory yes, but in this case some files in my .local/share/jupyter were owned by root, not sure why
[13:42:27] <klausman>	 UID0 is definitely weird. Have you checked jup nbs of other users?
[13:43:13] <elukey>	 still haven't got the time to investigate deeply
[13:43:38] <elukey>	 I just fixed/tested jupyterhub
[13:44:02] <elukey>	 (going afk for a bit, will read in a few)
[13:44:02] <klausman>	 Is there anything we can do to prevent this with 1006 on Thursday?
[13:44:30] <elukey>	 not sure, we need probably to brainstorm
[13:44:45] <elukey>	 adding system users in puppet etc.. would be good, but I think not for this round of reimages
[13:45:23] <joal>	 zpapierski: Heya - Just sent an update to https://phabricator.wikimedia.org/T261841
[13:45:30] <joal>	 zpapierski: feel free to close it :)
[13:45:46] <zpapierski>	 joal: great. thanks!
[13:45:51] <klausman>	 brute force might be to create a file with the current perms just before the reimage, and fix the perms afterwards?
[13:46:10] <klausman>	 I can write a tool to do that
[13:48:50] <GoranSM>	 Thanks for R 3.3 on stat1007. 
[13:49:01] <GoranSM>	 R 3.5 sorry 3.3 was already there.
[13:49:21] <klausman>	 a pirate's favorite language :)
[14:01:15] <joal>	 Gone for kids - Back at standup
[14:08:57] <mforns>	 hey teammm :]
[14:10:46] <mforns>	 elukey: you said in the email thread that you'll try to remove some terabytes from user folders by asking politely... now, are you trying to free space in the production cluster?
[14:12:14] <elukey>	 mforns: what thread?
[14:12:50] <elukey>	 I am not doing any cleanup, I am not following
[14:13:19] <mforns>	 oh, was it a phab task? about the backup of all data with replication=2
[14:13:53] <elukey>	 ah you mean https://phabricator.wikimedia.org/T260409 ? That was joseph, not me :D
[14:14:50] <mforns>	 elukey: ooh! you're right
[14:15:34] <mforns>	 ok, I was just saying because once the deletion timers I'm working on get merged, they will delete lots of raw data, mediawiki_job and netflow
[14:16:24] <mforns>	 sorry I read @elukey and I got confused
[14:17:40] <elukey>	 mforns always blaming luca
[14:17:42] <elukey>	 :D
[14:17:49] <mforns>	 hehehe
[14:18:12] <elukey>	 mforns: really nice that we'll drop a lot of raw data
[14:19:19] <mforns>	 elukey: BTW, is it OK if we pair on re-testing that? I guess it's you who is going to merge it (puppet change), so if you want I can share screen and show you the test I did?
[14:21:19] <elukey>	 sure
[14:28:48] <mforns>	 elukey: cool :], let me know when you have 10 mins
[14:34:33] <elukey>	 mforns: I have them now if you want
[14:34:45] <elukey>	 (need to leave at 17 to pick my car)
[14:35:08] <mforns>	 elukey: I'm in da cave with Dan, but maybe after? I'll ping you :]
[14:36:46] <elukey>	 yep!
[15:26:57] <elukey>	 got completely lost in some code, I am late to get my car, going now, I may end up late at standup sorry :(
[15:31:43] <wikibugs>	 10Analytics-Radar, 10Platform Engineering Roadmap Decision Making, 10Epic, 10MW-1.35-notes (1.35.0-wmf.32; 2020-05-12), and 2 others: Remove revision_comment_temp and revision_actor_temp - https://phabricator.wikimedia.org/T215466 (10daniel)
[15:55:43] <wikibugs>	 10Analytics-Clusters, 10Operations, 10Traffic: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (10ema) >>! In T264074#6520414, @ema wrote: > Definitely, please feel free to go ahead if you have the time.  Implicit but it's probably better to state it clearly: if you do have...
[15:58:17] <joal>	 fdans: heya - the meeting for the dev standup seems not schedule to accept me - weird
[16:01:45] <nuria>	 ping milimetric fdans 
[16:02:55] <elukey>	 ping razzi (SRE standup)
[16:03:00] <razzi>	 Ohh
[16:03:01] <razzi>	 :)
[16:03:08] <nuria>	 elukey: wait
[16:03:32] <elukey>	 nuria: ?
[16:03:37] <nuria>	 klausman, elukey : can we start split standups next week
[16:03:49] <elukey>	 nuria: sure,
[16:03:55] <nuria>	 klausman, elukey , razzi : after lex joins?
[16:04:07] <nuria>	 klausman, elukey , razzi: and otto is back?
[16:05:12] <wikibugs>	 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (10CBogen) @egardner I just remembered that @mwilliams created T263172 - does this plan cover the ability to answer the ques...
[16:08:52] <wikibugs>	 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (10egardner) @cbogen I think my latest updates to this patch and the schema should capture enough data to answer those quest...
[16:10:35] <wikibugs>	 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (10CBogen)
[16:19:02] <wikibugs>	 10Analytics-Radar, 10Product-Analytics, 10Anti-Harassment (The Letter Song): Capture special mute events in Prefupdate table [4 hour spike] - https://phabricator.wikimedia.org/T261461 (10Niharika) p:05Triage→03Medium
[16:19:10] <wikibugs>	 10Analytics-Radar, 10Product-Analytics, 10Anti-Harassment (The Letter Song): Capture special mute events in Prefupdate table [4 hour spike] - https://phabricator.wikimedia.org/T261461 (10Niharika)
[16:22:47] <wikibugs>	 10Analytics, 10observability: Indexing errors / malformed logs for aqs on cassandra timeout - https://phabricator.wikimedia.org/T262920 (10JAllemandou) Thanks @colewhite for the explanation.
[16:34:23] <wikibugs>	 10Analytics-Radar, 10Product-Analytics, 10Anti-Harassment (The Letter Song): Capture special mute events in Prefupdate table [4 hour spike] - https://phabricator.wikimedia.org/T261461 (10jwang)
[17:13:07] <elukey>	 razzi: want to chat about TLS and envoy?
[17:13:15] <mforns>	 elukey: oh you beat me
[17:13:29] <elukey>	 ah okok if you already have work scheduled np :)
[17:13:46] <wikibugs>	 10Analytics, 10Anti-Harassment, 10CheckUser, 10Privacy Engineering, and 2 others: Deal with Google Chrome User-Agent deprecation - https://phabricator.wikimedia.org/T242825 (10nshahquinn-wmf)
[17:13:55] <mforns>	 elukey:  no no, I was thinking abnout the deletion jobs testing, we could do that together with razzi
[17:14:07] <mforns>	 elukey: but that is not urgent, we can work on that tomorrow
[17:14:32] <elukey>	 mforns: it makes that that you two pair together, I'll review the outstanding patches :)
[17:14:35] <razzi>	 :) I have a meeting in 15, I'm happy to work with either /both in 45 minutes
[17:14:36] <elukey>	 please go aehad
[17:15:06] <mforns>	 elukey: but I wanted your input as well :]
[17:15:33] <elukey>	 ah sure, we can meet now if you want
[17:15:49] <mforns>	 elukey: ok! razzi, can you now?
[17:16:13] <razzi>	 Yeah, for a few minutes
[17:16:25] <mforns>	 razzi, elukey: let's go to tardis then: https://meet.google.com/kti-iybt-ekv
[17:55:53] <zpapierski>	 I've ben rate limited by etherpad...
[17:56:15] <joal>	 you type too fast zpapierski!
[17:56:16] <zpapierski>	 that was suppose to be writen on discovery channel :)
[17:56:22] <zpapierski>	 apparently
[17:56:25] <zpapierski>	 weird
[18:07:05] * fdans has internet at home!
[18:07:13] <joal>	 \o/
[18:07:20] <joal>	 welcome fdans@home :)
[18:08:43] <elukey>	 mforns just dropped 3M files from hdfs \o/
[18:08:47] <elukey>	 victory
[18:08:49] <joal>	 \o/
[18:09:09] <elukey>	 going afk now! 
[18:09:14] <joal>	 I'm gonna create a patch to drop webrequest-stats as well :)
[18:09:24] <joal>	 That''' be another million or so
[18:09:38] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Set up automatic deletion/snitization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (10mforns) Raw data for mediawiki_job and netflow *older than 90 days* has been deleted with the script, and periodical deletion jobs have been d...
[18:09:38] <elukey>	 razzi: I'll try to add more info to the envoy task tomorrow so you don't get stuck, and I added some comments to your code review for oozie
[18:09:50] <razzi>	 elukey: great, thanks
[18:10:12] <joal>	 Gone for diner, back after
[19:31:44] <nuria>	 razzi: give me 5 mins to join our meeting
[19:31:48] <razzi>	 Sounds good
[19:36:32] <wikibugs>	 10Analytics: Rework how mediawiki-history differentiates fake page-create from real ones - https://phabricator.wikimedia.org/T264791 (10JAllemandou)
[19:36:40] <wikibugs>	 10Analytics: Rework how mediawiki-history differentiates fake page-create from real ones - https://phabricator.wikimedia.org/T264791 (10JAllemandou)
[19:37:06] <joal>	 milimetric, mforns: I'd value some proof-reading on that one when you folks have a minute --^
[19:37:14] <mforns>	 k
[19:48:57] <wikibugs>	 10Analytics: Rework how mediawiki-history differentiates fake page-create from real ones - https://phabricator.wikimedia.org/T264791 (10JAllemandou)
[20:05:38] <nuria>	 milimetric (and mforns and joal)  can you confirm you have access to this drive? https://drive.google.com/drive/u/0/folders/0AB5b7sFjfnJXUk9PVA
[20:05:52] <nuria>	 it is teh one i created for the whole org for design docs
[20:06:01] <milimetric>	 nuria: confirmed
[20:06:07] <joal>	 nuria: confirmed as well
[20:06:09] <mforns>	 nuria: yes
[20:06:24] <nuria>	 razzi was saying that "design home depot" does sound best 
[20:06:33] <nuria>	 milimetric: any other ideas for name of drive
[20:06:36] <nuria>	 ?
[20:06:44] <nuria>	 does NOT sound best sorry
[20:06:47] <nuria>	 cc razzi 
[20:08:56] <mforns>	 nuria: design home depot, or design documents depot?
[20:09:25] <nuria>	 mforns: it will only be documents so "design documents depot"
[20:09:30] <joal>	 Design Documents
[20:09:34] <joal>	 without depot?
[20:09:36] <nuria>	 k
[20:09:43] <mforns>	 +1
[20:11:35] <joal>	 nuria: should we also put https://docs.google.com/document/d/1JAO0TjzryBcgPZNYwsr7T7eyY7jPwJMj2FWp2_Dsbcg in that folder?
[20:12:10] <nuria>	 joal: yes but take a look https://docs.google.com/document/d/164ofi_muWrQKtOBuUWz_Aezo9YO19WrsolNrYZdXenI/edit
[20:12:17] <nuria>	 joal: we are moving shortcuts to docs
[20:12:29] <nuria>	 joal: not teh docs themselves to avoid accidental deletions
[20:12:42] <joal>	 ack nuria - no problem
[20:14:15] <wikibugs>	 10Analytics: Rework how mediawiki-history differentiates fake page-create from real ones - https://phabricator.wikimedia.org/T264791 (10Milimetric) > at the page's first revision timestamp if no page-title collision happens, or at earliest timestamp before collision otherwise (By doing so, we enforce a single pa...
[20:17:43] <wikibugs>	 10Analytics: Rework how mediawiki-history differentiates fake page-create from real ones - https://phabricator.wikimedia.org/T264791 (10JAllemandou)
[20:20:44] <wikibugs>	 10Analytics: Rework how mediawiki-history differentiates fake page-create from real ones - https://phabricator.wikimedia.org/T264791 (10JAllemandou) > couldn't we just join based on the page_first_revision timestamp? We'd need to differentiate which timestamp is used for the join from event-type: user `first-rev...
[20:22:01] <joal>	 Ok gone for tonight team - tomorrow is kids day, working in the evntin
[20:22:06] <joal>	 /evening/
[20:30:23] <milimetric>	 nuria: I'm not so passionate about these kinds of names, they can be easily changed :)  Design Documents is fine
[20:30:36] <nuria>	 milimetric: k
[20:30:47] <nuria>	 milimetric: Design Documents it is!
[20:30:51] <milimetric>	 I'm out for a bike ride, will be back to check on things in the evening
[20:32:20] <milimetric>	 btw, folks are complaining about the pagecounts-ez not being there, so far we're up to 3 complaints.  I asked if they had an urgent need for the data, we should be prepared to try and fix it if we can't get pageviews-complete out quickly.  I think if we could point them at the 2020-09 monthly dump sometime this week or next, then maybe it would be ok.
[20:32:48] <milimetric>	 but realizing that they have to change scripts, etc. makes me lean towards just fixing the job.  cc fdans lemme know what you think
[20:33:45] <fdans>	 milimetric: hmmm
[20:34:14] <fdans>	 regarding data being public, I'm waiting on this CR getting merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/629409
[20:34:24] <fdans>	 it has +2 but I don't have merge perms
[20:35:58] <fdans>	 milimetric: fixing whatever is breaking pagecounts_ez sounds like a whole worm canning facility
[20:36:16] <milimetric>	 I'm not sure... it could just be a disabled cron or something
[20:37:13] <milimetric>	 fdans: you should add a landing / documentation page to that and then ping Luca to merge it
[20:37:38] <milimetric>	 you do that and I'll do a 30 minute dive into the worm canning factory
[20:38:18] <fdans>	 milimetric: but if the files aren't rsynced yet we shouldn't publish a landing page yet no? I already have it written but I thought the files should be available on the dumps host first
[20:38:28] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10mforns) Hi all!  I believe we can use a Refine transform function to add the requested fields (except for BGP communities IIUC) at refine time. Pl...
[20:38:50] <nuria>	 milimetric: i do not think fixing the job is the way to go , let's focus on getting the new changes out
[20:38:58] <milimetric>	 fdans: you can put the landing page there and then when we change the index page to point to it we'll make sure the rsync has completed a good chunk
[20:39:25] <fdans>	 milimetric: that sounds good
[20:39:34] <milimetric>	 nuria: I can't help with that anyway and it's my ops week, I'm just going to look at it for 30 minutes
[20:39:54] <milimetric>	 either way, whatever's going on with that job we have to figure out anyway if we're going to decomission it
[20:40:23] <milimetric>	 heh, like we can't just leave it and hope it did exactly what we wanted it to by accident
[20:40:49] <wikibugs>	 10Analytics, 10Analytics-Kanban, 10Operations, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10Nuria) @mforns: it could also be a second job run after the refined one (similar to how we do virtual-pageviews) as we probably do not want to cre...
[21:14:13] <wikibugs>	 (03CR) 10Nuria: [C: 04-1] Add DesktopWebUIActionsTracking fields to eventlogging allowlist (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/631988 (https://phabricator.wikimedia.org/T263143) (owner: 10MNeisler)
[21:42:02] <fdans>	 milimetric: I don't know, if the complaints were that they have no access to, say, 2012 data, I'd feel way more charitable, but pageview dumps have been out since 2016 and current, way less problematic data is fully available
[21:44:31] <fdans>	 I get that pagecounts-ez was never officially deprecated but its permanent upkeep is a contract we never signed, like in the case of wikistats 1
[22:08:52] <wikibugs>	 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: Develop a new schema for MediaSearch analytics or adapt an existing one - https://phabricator.wikimedia.org/T263875 (10nettrom_WMF) >>! In T263875#6510065, @egardner wrote: > If there is a better/standard way...
[23:43:20] <wikibugs>	 10Analytics, 10Product-Analytics, 10Structured Data Engineering, 10Patch-For-Review, and 2 others: Develop a new schema for MediaSearch analytics or adapt an existing one - https://phabricator.wikimedia.org/T263875 (10Ramsey-WMF) > Is there a plan to bring MediaSearch to other wikis in the future, or will...