[08:43:23] Wikimedia-Apache-configuration, operations, Puppet: Refactor the mediawiki puppet classes to make HHVM default, drop zend compatibility - https://phabricator.wikimedia.org/T126310#2010867 (Joe) NEW
[08:52:37] Wikimedia-Apache-configuration, operations, Puppet: Refactor the mediawiki puppet classes to make HHVM default, drop zend compatibility - https://phabricator.wikimedia.org/T126310#2010876 (MoritzMuehlenhoff) It would also be great to fix up the package dependencies so that we can stop installing the Ze...
[08:54:52] Wikimedia-Apache-configuration, operations, Puppet: Refactor the mediawiki puppet classes to make HHVM default, drop zend compatibility - https://phabricator.wikimedia.org/T126310#2010877 (Joe) @MoritzMuehlenhoff I think that should be targeted when we upgrade the appservers to jessie (or stretch). Ano...
[09:16:53] Traffic, operations: Forward-port VCL to Varnish 4 - https://phabricator.wikimedia.org/T124279#2010889 (ema) p:Triage>Normal a:ema
[14:13:36] paravoid: is 4.4 manual package only or going into apt repo now?
[14:35:05] Traffic, operations, Pybal: pybal fails to detect dead servers under production lb IPs for port 80 - https://phabricator.wikimedia.org/T113151#2011406 (Joe) Open>Resolved
[14:35:18] Traffic, operations, Patch-For-Review, Pybal: pybal etcd coroutine crashed - https://phabricator.wikimedia.org/T125397#2011407 (Joe) Open>Resolved
[14:36:39] bblack: moritz said he'd put it in apt today
[14:37:20] Traffic, operations, Monitoring, Patch-For-Review, Pybal: Implement pybal pool state monitoring and alerting via icinga - https://phabricator.wikimedia.org/T102394#2011421 (Joe) Open>Resolved
[14:40:47] paravoid: ok awesome. I'm holding off 4x machines from my "reboot for 3.19" list to put 4.4 on them as canaries.
[14:40:55] :)
[15:01:17] Traffic, operations, ops-esams: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2011443 (BBlack) Add cp3043 to the list of nodes that needed ipmi_si blacklist
[15:24:33] Traffic, operations, ops-esams: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2011504 (BBlack) and cp3045 ...
[15:27:37] Traffic, operations: Upgrade LVS servers to a 4.3+ kernel - https://phabricator.wikimedia.org/T119515#2011509 (faidon) 4.4.0 was released and subsequently packaged by @MoritzMuehlenhoff. After installing it on a couple of canary hosts it was determined that it doesn't suffer from 4.3's (nor 4.2's) issues...
[15:36:16] bblack: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=99cb99aa055a72d3880d8a95a71034c4d6
[15:36:24] the first one sounds interesting
[15:37:55] paravoid: is that in 4.4?
[15:38:00] yes
[15:38:10] nice!
[15:39:35] https://gerrit.wikimedia.org/r/269423
[15:39:47] I'll echo 1 it on cp3001 to see what happens
[15:40:25] * bblack waits for the internet to implode
[15:40:29] :P
[15:40:49] er, lvs3001 that is
[15:40:51] sorry :)
[16:14:59] Traffic, operations, ops-esams: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2011692 (BBlack) So for the record, the total list of hosts that are now running with ipmi_si blacklisted are: cp3032, cp3039, cp3043, and cp3045
[16:15:22] Traffic, operations, ops-esams: cp30[34]x hw/firmware/BMC issues - https://phabricator.wikimedia.org/T126062#2011695 (BBlack) (if I had to guess, these machines won't correctly reboot/poweroff due to that, but who knows until we try)
[16:42:26] Traffic, operations: Upgrade LVS servers to a 4.3+ kernel - https://phabricator.wikimedia.org/T119515#2011775 (BBlack) I see we also have a 4.4.0-rt to try as well. It sounds like it might be beneficial on LVS and/or cp, but probably needs separate testing.
[16:47:21] Traffic, Performance-Team, operations, Patch-For-Review: Disable SPDY on cache_text for a week - https://phabricator.wikimedia.org/T125979#2011788 (BBlack) The cache kernel reboots will be done in a few hours. I figure allow the rest of the day for the perf impact there to settle back to "normal",...
[16:48:34] Traffic, Performance-Team, operations, Patch-For-Review: Disable SPDY on cache_text for a week - https://phabricator.wikimedia.org/T125979#2011789 (BBlack) (also note pinkunicorn/cp1008 already has SPDY removed. You can locally hack e.g. en.wikipedia.org DNS to point at 208.80.154.42 to see how the...
[17:03:18] bblack: re: do_spdy, there are also some spdy-related settings in localssl.erb, not sure whether we want to include them if do_spdy is false
[17:03:43] proxy_set_header X-Connection-Properties and proxy_set_header Accept-Encoding
[17:04:10] ema: I tested those manually on cp1008 a few days ago, turns out they all do the right thing with the simple version of the patch
[17:04:50] oh nice
[17:05:02] (because the spdy module is still loaded (it's compiled-in), the variables still exist in the basic sense, so they all come up as if it just happened to be a non-spdy connection on a spdy-enabled server)
[17:06:16] perfect
[17:18:49] how hard should "reboot" really be for the kernel and hardware? :P
[17:21:27] bblack: pulling the power plug should be enough I guess :)
[17:22:22] yeah it's frustrating
[17:22:56] I mean I know way way back in the day, this was pretty simple. it was invoking "int 0x10" with a certain argument, or jumping to some bios address, and it always worked
[17:23:29] now with supposedly-more-advanced bios and firmware and BMCs and power management .... "reboot" is no longer a simple and reliable operation.
[17:24:09] aside from whatever problem plagues cp30[34]x, something like 8 of the other ~100 machines have failed to reboot randomly as well (but rebooted fine on racadm powercycle)
[17:24:43] wow
[17:24:51] anything interesting on the console?
[17:25:50] not really
[17:26:17] two of the most recent ones came back up into the kernel (so the reboot itself worked!) but at an (initramfs) prompt as if they failed to mount the rootfs
[17:26:37] yet dmesg + /dev showed /dev/sd[ab][123] all fine, and just rebooting again caused them to boot properly
[17:27:06] no other obvious problem in dmesg either, and of course we lack the output from before it dropped to the (initramfs) prompt
[17:27:21] (it would be nice if it would capture a bit of backlog on the consoles...)
[17:27:30] oh, perhaps /dev/sdblabla came up too late for some reason?
[17:27:36] probably
[17:27:42] blame systemd :P
[17:27:51] :)
[17:27:52] it has funny concepts around "too late" sometimes
[17:28:33] but hey laptops boot 10% faster, so disrupting everything about how booting works and encouraging non-portable linux-only code all around is a win
[18:38:22] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012072 (DVdm) NEW
[18:47:07] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012103 (BBlack) The solution is pretty simple. All Wikimedia project URLs start with `https://`. If Huggle 2 uses `http://` for Wikimedia sites, that needs fixing!
[19:01:33] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012152 (DVdm) Yes, I tried to replace all occurrences of http:// in config.txt with https:// the error does not occur, but I get another error: "Failed to load configuration pages!: You must be lo...
[19:03:21] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012153 (BBlack) CC @Petrb - I really don't know the Huggle code to tell you what to try next.
[21:00:46] Traffic, Zero, operations: Use Text IP for Mobile hostnames to gain SPDY/H2 coalesce between the two - https://phabricator.wikimedia.org/T124482#2012719 (BBlack) p:Triage>Low Updates: 1. We're still trying to get to the bottom of historical and present mysteries about Zero-rated whitelist subnet...
[21:41:40] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012865 (Se4598) Wasn't https forced already long before "28/29 Jan 2016"? Nevermind, there are locations in the vb-sourcecode where http is hardcoded. Also even if there were https, afaik there woul...
[21:49:33] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012884 (Josve05a) or...or...Download Huggle 3.x.x instead. And mark 2.x.x as defunct and move on.
[22:08:35] Traffic, Deployment-Systems, Performance-Team, operations, Patch-For-Review: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime - https://phabricator.wikimedia.org/T99096#2012953 (Krinkle)
[22:10:59] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012976 (BBlack) >>! In T126357#2012865, @Se4598 wrote: > Wasn't https forced already long before "28/29 Jan 2016"? Yes, for `GET`/`HEAD` requests, for the better part of a year now, by directing th...
[22:14:02] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2012994 (Luke081515) >>! In T126357#2012865, @Se4598 wrote: > Wasn't https forced already long before "28/29 Jan 2016"? (...) Since this day, the API throws a warning, the "HTTP used when HTTPS expec...
[22:48:23] HTTPS, Huggle: Huggle 2 fails on HTTP used when HTTPS expected - https://phabricator.wikimedia.org/T126357#2013160 (bd808) >>! In T126357#2012994, @Luke081515 wrote: >>>! In T126357#2012865, @Se4598 wrote: >> Wasn't https forced already long before "28/29 Jan 2016"? (...) > Since this day, the API throws...