[05:35:06] https://lwn.net/SubscriberLink/800501/c42e5c5e9243637b/ - Upstreaming multipath TCP
[05:40:13] akosiaris: k8s is slowly taking over the load-balancers dashboard :) - https://grafana.wikimedia.org/d/000000343/load-balancers?orgId=1
[05:48:36] <_joe_> XioNoX: uhm why is that?
[05:48:42] <_joe_> those servers are not load balancers
[05:48:47] <_joe_> something is wrong I think
[05:48:53] yeah I don't know :)
[05:49:03] was looking for LB traffic
[05:49:31] <_joe_> XioNoX: heh yeah I think kube can use ipvs mode
[05:49:41] <_joe_> so I think probably prometheus is collecting such metrics
[05:49:44] <_joe_> but
[05:49:59] <_joe_> you can just add a $cluster variable to your graphs
[05:50:07] <_joe_> can I modify it?
[05:50:38] not mine, but I'd guess yeah :)
[05:59:24] <_joe_> it's fixed now
[06:01:48] cool, thx!
[08:23:49] <_joe_> volans: correct me if I'm wrong, but spicerack/cumin have no way to copy a file to multiple hosts, right?
[08:26:21] <_joe_> also - I find myself copying a lot of boilerplate between python scripts. Shouldn't we start a common SRE library for CLI scripts helper functions?
[09:02:11] _joe_: correct, that was on purpose but I've seen more use cases recently so maybe we could change that decision
[09:02:37] what's your current use case? because there might be ways, based on what you need
[09:07:00] <_joe_> volans: I was thinking of automating the recovery of an etcd cluster. Part of it is choosing a datastore snapshot you trust and copying it to the hosts
[09:09:22] technically using the keyholder socket on a cumin host you can scp with that, you have to make the snapshot pass through the cumin host, but for etcd sizes that shouldn't be a problem
[09:40:22] <_joe_> yes
[12:42:27] cdanis: re: swift, ms-be hosts in codfw should be ready by today or monday, I initially thought we'd be trying buster in https://phabricator.wikimedia.org/T229911 but now I'm thinking I'd rather go ahead with stretch and postpone distro+swift upgrade
[12:42:53] at least reinstalling ms-be doesn't wipe the hdd so it isn't that big of a deal
[12:43:43] also this is gen10 hw and first time we use it in ms-be I'm told, good times
[12:44:48] I think postponing makes some sense, there's enough to do in just setting up new machines anyway
[12:45:20] is the RAID controller different from one we've used before?
[12:46:28] good question -- IIRC a new version of gen9 raid controller
[12:47:36] ok. I will have time to help next week, keep me in the loop :)
[12:49:55] for sure -- thanks!
[13:11:59] the gen10 RAID stuff has been sorted out before a DB host, the only change is that one needs to use ssacli instead of hpssacli (the former supports the Gen10 hw)
[13:12:10] https://phabricator.wikimedia.org/T220787 for some background
[13:12:30] I had imported that to the hwraid component, so that should "just work"
[13:12:35] thanks moritzm!
[13:19:04] I have migrated noc.wikipedia.org to php7
[13:19:23] this https://noc.wikimedia.org/db.php
[13:19:25] looks ok to me
[13:19:44] <_joe_> so the last thing missing is just the dreaded apple search interface
[13:19:46] ping me if for any reason you find something not right
[13:19:56] that is wwwportals yes?
[13:19:58] <_joe_> that somehow manages to always be the last thing we move over
[13:19:58] or search?
[13:20:09] <_joe_> search IIRC
[13:20:10] I will check them after noc
[13:20:27] <_joe_> yes, I was just noticing how it's always the last thing to go :D
[13:23:20] I don't see it complaining
[13:23:21] :D
[13:24:06] I'm trying to remember if the sort order was different before
[13:24:09] on db.php
[13:24:36] not an actual concern, just think I remember DEFAULT being sorted as if it were s3, but there could have also been a code change anyway
[13:29:02] tx chris
[13:31:28] no no, thank you
[13:33:45] hahaha
[13:43:41] _joe_: is this a valid way to verify that search works?
[13:43:43] https://search.wikimedia.org/index.php?search=lala
[13:44:28] <_joe_> no idea, but the output isn't promising
[13:44:32] <_joe_> what did you change?
[13:44:43] I added the filehandler
[13:44:47] (on mwdebug)
[13:45:06] <_joe_> because right now I get the php file
[13:45:07] ok I will find timo
[13:45:14] I am not
[13:45:20] on mwdebug1001
[13:45:29] <_joe_> no I mean from a normal appserver
[13:45:34] <_joe_> this is pretty bad
[13:46:13] <_joe_> and no, I get the php output on mwdebug too
[13:46:17] <_joe_> for the string you searched
[13:46:39] <_joe_> scratch that, not on mwdebug
[13:47:41] <_joe_> RewriteRule ^/$ fcgi://127.0.0.1:9000/srv/mediawiki/docroot/search.wikimedia.org/index.php [P]
[13:47:49] <_joe_> why is this not working?
[13:48:46] no idea, let me go back and see what timo was telling us yesterday
[13:48:55] <_joe_> anyways, this is not what timo was referring to
[13:49:01] <_joe_> that is served from wwwportals
[13:49:28] I don't remember, that's what I wanted to look at
[13:49:54] but if mwdebug's response and output is valid
[13:49:54] <_joe_> effie: the correct url to use would be
[13:49:59] <_joe_> https://search.wikimedia.org/?search=lalaw
[13:50:10] what is wrong with lala ?
[13:50:10] <_joe_> without index.php
[13:50:15] oh :p
[13:50:30] <_joe_> so there is a bug in the current configuration your patch would fix
[13:50:38] ok yeah I understand now
[13:50:40] <_joe_> but it would need to respond correctly to this request
[13:50:44] <_joe_> lemme see
[13:51:15] <_joe_> yes it should, because we automatically use index.php for /
[13:51:25] * effie broke it
[13:51:27] :p
[13:51:42] ok, I will move forward
[13:51:45] <_joe_> I don't think you did
[14:18:15] is there something in particular i need to check with portals?
[14:18:24] www.wikipedia.org works on mwdebug
[14:20:31] <_joe_> well look at the apache logs for the path currently served by HHVM I guess
[14:23:48] like https://www.wikipedia.org/ ?
[15:06:12] <_joe_> no I meant grep on an appserver and get the urls that are requested via HHVM
[15:06:23] <_joe_> fgrep 127.0.0.1:9000 /var/log/apache2/other_vhosts_access.log | head -n 1 should suffice
[15:06:26] chaomodus: I have a question about https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/514395/51/modules/postgresql/manifests/master.pp
[15:06:29] (really a puppet question)
[15:11:44] In particular I'm looking at a profile that does
[15:11:47] https://www.irccloud.com/pastebin/1EZ1fprU/
[15:12:16] Which breaks now that 'includes' is typed as an array. But… it has me wondering what it did /before/ that explicit type was added. Did puppet just shrug and say, must be an array with one item?
[15:14:14] hm, it's used lower down as "[ $includes, 'master.conf']," which makes me think it shouldn't be an array to begin with...
[15:22:17] yah the issue is some modules use arrays and some modules use strings
[15:22:29] err, some .. callers includers whatever
[15:23:27] (oh i guess the ambiguous usage is more in the slave.pp stuff)
[15:24:40] imo we should rationalize it across all of these
[16:02:32] which i mean either you're going to have to pass an array there, or you're going to want to change the type (or get rid of the type if there's no time for either of those)
[16:23:29] Is someone working on puppetdb2002? this has been alerting for 4 days: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=puppetdb2002&service=Postgres+Replication+Lag "ERROR: FATAL: password authentication failed for user "replication""
[16:24:02] i think j.bond if i had to guess?
[16:26:58] I think he is out for today, anyone else? Or is there a tracking task (didn't find anything with a quick phab search)
[16:31:32] also mwdebug1002 is alerting about low disk space - https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=mwdebug1002&service=Disk+space /cc _joe_ maybe effie ?
[16:34:04] oh damn I saw it this morning and then forgot
[16:34:22] let me take a look
[16:34:51] thx!
[16:36:25] tx as well
[16:36:40] it alerted? it was warning in the morning
[16:36:45] anyway doesn't matter
[16:37:05] yeah, alerting as warning
[16:41:32] oh ok
[16:41:59] I will create a task for monday, it is all mwdebug* and we just need to remove some mediawiki versions
[16:42:56] thx!
[16:44:38] moritzm: is https://phabricator.wikimedia.org/T234047 for network devices only or ferm too?
[16:48:47] nevermind, commented on the task
[17:52:17] XioNoX: for network devices, thx. I'll reach out to OIT for the changes on their end
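
Note on the 15:06:23 suggestion (fgrep for 127.0.0.1:9000 in the Apache access log, piped to head): a minimal Python sketch of the same fixed-string filter, in case it needs to live inside a script rather than a shell one-liner. The function name `lines_via_backend` and its parameters are hypothetical, and nothing here assumes a particular access-log format; it only checks whether the backend address appears in a line.

```python
from itertools import islice
from typing import Iterable, List

# Backend address quoted in the log above (the FastCGI endpoint HHVM listens on).
HHVM_BACKEND = "127.0.0.1:9000"


def lines_via_backend(
    lines: Iterable[str], backend: str = HHVM_BACKEND, limit: int = 1
) -> List[str]:
    """Return up to `limit` lines that contain `backend`.

    This is a plain substring match (like fgrep), followed by a cap on the
    number of results (like head -n). Trailing newlines are stripped.
    """
    matches = (line.rstrip("\n") for line in lines if backend in line)
    return list(islice(matches, limit))
```

It could be called on an open file handle, for example `lines_via_backend(open(path, errors="replace"), limit=1)` with `path` pointing at the log file named in the chat. Because the matches are generated lazily, reading stops as soon as `limit` matches are found.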