[07:40:20] hello people
[07:40:24] hi elukey
[07:40:34] just added more alarms for the statsv and eventlogging varnishkafka instances
[07:40:48] awesome
[07:48:28] another thing that we discussed some days ago was to merge the nginx submodule to operations/puppet
[07:49:15] the trick that Joe suggested (moving the submodule to environments/production/modules first, then back to modules) worked fine for jmxtrans|kafkatee|varnishkafka
[08:04:50] elukey: BTW, let me know when you're ready to update librdkafka :)
[08:05:38] before the offsite I've tested it on cp1008 and it seemed happy
[08:06:28] test result: https://phabricator.wikimedia.org/T182993#4290888
[08:06:43] vgutierrez: if we could start from cache misc and leave it there for a couple of days it would be awesome
[08:06:46] (even one day is fine)
[08:07:15] elukey: so... let's update the lib on cache::misc without merging the config change, right?
[08:07:46] I don't like the idea of stopping puppet for 24 hours in the whole cache cluster
[08:08:45] change here: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/440544/
[08:08:47] vgutierrez: now that I think about it the upgrade only adds your patch right? Anyhow, why stopping puppet? Do we force a specific version in there? I thought that a simple apt-get install would have been enough
[08:09:12] elukey: the update only adds my patch, that's right
[08:11:27] vgutierrez: ah ok so your "update librdkafka" means deploying the new lib + your patch
[08:11:52] elukey: we can split it in two phases
[08:11:56] no problem at all
[08:12:23] yeah that would be awesome - upgrade cache misc first, leave it boil for today just_to_be_sure_tm, then text/upload
[08:12:32] and then we merge your change incrementally
[08:12:46] perfect
[08:12:52] (I completely trust your tests but I am pessimist by nature :)
[08:13:05] better safe than sorry man :)
[08:13:56] exactly :)
[08:20:32] elukey: so I'd go for cache::misc in a jiffy
[08:20:37] *I'll
[08:25:01] +1
[09:22:31] elukey: at what pace should I restart varnishkafka on cache::misc nodes?
[09:23:40] vgutierrez: small batches with some sleep, 30s should be super fine
[09:23:44] nice
[09:23:48] I'd do cp2006 manually first
[09:24:10] hmm maybe some from esams :)
[09:24:20] yeah.. cp3007
[09:26:48] elukey: done, could you confirm that everything is ok with cp3007 on your side?
[09:28:29] just ran kafkacat -C -b kafka-jumbo1001.eqiad.wmnet -t webrequest_misc | grep 'cp3007' --color on stat1004, looks good
[09:30:35] \o/
[09:36:31] vgutierrez: you might want to hold off, see -operations
[09:36:49] yep
[09:37:04] godog: I didn't run anything yet besides cp3007
[09:40:57] ack
[10:22:00] elukey: misc done and looking good from my side :)
[10:24:29] \o/
[10:55:09] quick update about varnishkafka - I merged the changes in T177647 after a bit of testing, so in master we have now vk working from Varnish 5.2 to 6.0
[10:55:10] T177647: Varnishkafka does not play well with varnish 5.2 - https://phabricator.wikimedia.org/T177647
[10:55:32] in the varnishv51 branch there are all the commits up to the last two in master
[10:55:40] to track the version that we currently use
[10:56:16] (I also added a note in the README.md stating that the last master version is not the one deployed in Wikimedia's production)
[10:56:27] nice
[13:26:48] Traffic, Operations, TemplateStyles, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4315414 (hashar)
[16:00:57] bblack: so, dns2001 and dns2002 look happy :)
[16:01:14] I think we are in position of decomm the old ones
[16:06:20] +1
[16:07:36] BTW, I've discovered a tricky issue by accident
[16:07:49] T198215
[16:07:49] T198215: systemd-logind fails with result 'timeout' in db2093 and dns4001 - https://phabricator.wikimedia.org/T198215
[16:10:38] stracing is not helpful at all...
[16:10:39] epoll_wait(4, 0x7fffb0a17580, 11, -1) = -1 EINTR (Interrupted system call)
[16:10:42] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
[16:10:45] +++ killed by SIGTERM +++
[16:10:48] :/
[18:23:58] netops, Operations, fundraising-tech-ops: new pfw policy for monitor server - https://phabricator.wikimedia.org/T198237#4316809 (cwdent)
[20:41:30] gehel: nope, not really around :)
[20:42:36] gehel: currently in London for https://www.women-in-technology.com/
[20:43:09] ema: you're too busy smiling :-P
[20:43:26] yes!
[20:43:43] * volans saw the picture
[20:45:08] it was a pretty good day, except for the polar temperature due to air conditioning
[20:49:33] eheheh ofc!
[20:56:35] ema: cool ! You're with debt ? Enjoy !
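
The restart pacing agreed at 09:22-09:24 (small batches with a 30s sleep in between) could look roughly like the sketch below. It is only an illustration under assumptions: the batch size of 2, the function names and the `restart` callable are hypothetical, and nothing here is the tooling that was actually used on the cache::misc nodes. The `restart` callable is a placeholder for whatever really restarts varnishkafka on a host.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batches(hosts: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    batch: List[str] = []
    for host in hosts:
        batch.append(host)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    batch_size: int = 2,        # assumption: "small batches" was not quantified in the log
    pause: float = 30.0,        # 30s, as suggested in the log
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart hosts batch by batch, pausing between batches (not after the last)."""
    done: List[str] = []
    groups = list(batches(hosts, batch_size))
    for index, group in enumerate(groups):
        for host in group:
            restart(host)       # placeholder: e.g. a remote service restart
            done.append(host)
        if index < len(groups) - 1:
            sleep(pause)
    return done


if __name__ == "__main__":
    # Dry run: print instead of restarting, and skip the real sleep.
    rolling_restart(["cp3007", "cp2006"], print, batch_size=1, sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the pause easy to skip in a dry run, and restarting a single host first (cp3007 in the log) before the batches matches the "do one manually first" step.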