[00:16:08] <mutante>	 fixing puppet on icinga server, follow-up to an earlier change with notes_links/notes_urls
[00:37:42] <mutante>	 fixed. this applied a lot of changes coming from https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509365/ now and also that other change that gives Icinga a user agent
[00:40:32] <mutante>	 they had just never been applied because of the puppet fail.  icinga working.  going off now
[00:42:15] <mutante>	 arr.. or not, because now that this works i get to see another error that comes from my own merge re: gerrit contact groups, heh
[00:52:02] <mutante>	 ok, done. Total Errors/Warnings: 0 in icinga config again.   now ok
[00:53:21] <mutante>	 off
[07:54:08] <moritzm>	 FYI, I'll reboot cumin2001 in a bit, please don't use it for new screen sessions/reimages for now
[08:04:24] <moritzm>	 cumin2001 is back, please use it for reimages etc. for now (cumin1001 is up next)
[08:04:42] <marostegui>	 moritzm: when do you plan to do cumin1001?
[08:05:32] <moritzm>	 I'll start hunting down screen sessions etc. in a bit, but if anyone is long-running and can't be skipped, we'll just re-attempt a different time
[08:05:42] <moritzm>	 there's one current reimage running
[08:06:48] <marostegui>	 ok, I just finished a long-running one, so fine from my side as of today
[08:25:52] <moritzm>	 cumin1001 is rebooted and good to use again
[08:31:26] <jynus>	 thanks
[08:53:06] <moritzm>	 any objections against a reboot of deploy1001 in the next 10 minutes? there are no deployments scheduled today, but maybe there's something currently needing a DB change in wmf-config or similar?
[08:53:32] <marostegui>	 moritzm: ok from the DB side
[08:56:06] <moritzm>	 thanks, I'll proceed in a few minutes, then
[09:10:43] <moritzm>	 deploy1001 is back up
[09:30:31] <moritzm>	 FYI, I'll reboot netmon1002 in a bit, speak up if it's a bad time
[09:38:14] <volans>	 it includes netbox (JFYI)
[09:41:43] <moritzm>	 it's back up already :-)
[11:49:10] <volans>	 moritzm: so keyholder on deploy1001 says only the analytics key is armed
[11:49:39] <jynus>	 just to confirm, it was not deployed: https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php&1
[11:49:48] <moritzm>	 the keyholder is re-armed, but then it was asking me for the modsec one we don't have
[11:50:02] <moritzm>	 I assumed arming the first would arm all except the modsec one
[11:50:22] <volans>	 eh
[11:50:26] <volans>	 it does them in alphabetic order
[11:50:29] <volans>	 let me hack it
[11:50:56] <moritzm>	 from https://phabricator.wikimedia.org/T224887#5232493
[11:52:27] <volans>	 ok done, jynus retry please
[11:52:39] <jynus>	 doing
[11:53:02] <volans>	 I've temporarily moved the private and public keys in /etc/keyholder.d from apache2modsec to zapache2modsec
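The rename described above can be sketched as follows. This is only an illustration, not keyholder's actual code: it assumes the arming pass visits keys in sorted name order and stops at the first key whose passphrase is unavailable. The names `deploy_service` and `mwdeploy` are hypothetical placeholders; only `analytics`, `apache2modsec` and `zapache2modsec` come from the log.

```python
def armed_before_block(key_names, missing):
    """Return the keys an alphabetical arming pass reaches before it
    stops at the key whose passphrase is missing (assumed behaviour)."""
    armed = []
    for name in sorted(key_names):
        if name == missing:
            break  # assumed: the pass cannot continue past this key
        armed.append(name)
    return armed


# Hypothetical key set; deploy_service and mwdeploy are placeholders.
before = ["mwdeploy", "analytics", "apache2modsec", "deploy_service"]

# Renaming the key with a leading "z" pushes it to the end of the sort order.
after = ["zapache2modsec" if k == "apache2modsec" else k for k in before]

print(armed_before_block(before, "apache2modsec"))
print(armed_before_block(after, "zapache2modsec"))
```

Under these assumptions only `analytics` sorts ahead of `apache2modsec`, which matches the state volans reported; after the rename every other key sorts ahead of the blocked one.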
[11:53:03] <jynus>	 it is very slow
[11:53:45] <jynus>	 11:53:26 Check 'Logstash Error rate for mw1278.eqiad.wmnet' failed: ERROR: 50% OVER_THRESHOLD (Avg. Error rate: Before: 0.10, After: 2.00, Threshold: 1.00)
[11:54:45] <jynus>	 I don't see an increased error rate, though
[11:55:13] <volans>	 interesting, scap canaries are not those defined in A:mw-canaries
[11:55:33] <volans>	 so I've deployed scap to: mw[1261-1265].eqiad.wmnet,mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet (9 hosts)
[11:55:41] <volans>	 s/deployed/upgraded/
[11:56:07] <volans>	 but when scap deploys it actually checks the error rates on a different set of hosts
[11:56:24] <volans>	 I don't know if they were supposed to be in sync or not
[11:56:40] <moritzm>	 yeah, in this context mw canaries are the hosts with the app server canary role
[11:56:47] <jynus>	 11:53:26 Canary error check failed for 1 canaries, less than threshold to halt deployment (2/11), see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details. Continuing...
[11:56:51] <moritzm>	 as those are usually used to stage new extensions, HHVM etc.
[12:01:31] <volans>	 jynus: any error related to the hosts I listed above?
[12:01:51] <jynus>	 no
[12:02:21] <volans>	 ack, thanks
[12:07:47] <volans>	 I'm going to lunch but ping if needed
[12:43:48] <volans>	 jynus: I'm wondering if the error you got earlier was due to:
[12:43:49] <volans>	 https://gerrit.wikimedia.org/r/c/mediawiki/tools/scap/+/519074
[12:43:53] <volans>	 that was included in this release too