[05:20:29] netops, DBA, Operations, ops-eqiad, Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4135199 (Marostegui) There is definitely an impact on which kernel we are running. After running: ``` root@db1114:~# uname -a Linux db1114 4.9.0-4-amd64 #1 SMP Debian 4.9...
[06:54:09] Traffic, Operations, Patch-For-Review: Migrate dns caches to stretch - https://phabricator.wikimedia.org/T187090#4135292 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` chromium.wikimedia.org ``` The log can be found in `/var/log/wmf-aut...
[07:25:34] Traffic, Operations, Patch-For-Review: Migrate dns caches to stretch - https://phabricator.wikimedia.org/T187090#4135310 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['chromium.wikimedia.org'] ``` Of which those **FAILED**: ``` ['chromium.wikimedia.org'] ```
[07:47:07] netops, DBA, Operations, ops-eqiad, Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4135346 (Marostegui)
[08:51:34] Traffic, Operations, Patch-For-Review: Migrate dns caches to stretch - https://phabricator.wikimedia.org/T187090#4135444 (Vgutierrez) Open>Resolved a:Vgutierrez
[08:51:49] \o/
[09:06:35] yay!
[09:16:32] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal - https://phabricator.wikimedia.org/T187014#4135465 (ema) Here's pageview hourly after deploying the changes above: {F17025595} US going down, upwards trend for India and Nigeria...
[12:53:46] hmm this is weird
[12:53:49] https://puppet-compiler.wmflabs.org/compiler02/10948/
[12:54:05] changes for dns5001 are the expected ones: https://puppet-compiler.wmflabs.org/compiler02/10948/dns5001.wikimedia.org/
[12:54:23] for some reason, for hydrogen.wikimedia.org.. it's using a deprecated catalog (prior to reimaging to stretch)
[12:54:43] vgutierrez: did you update the compiler facts?
[12:54:59] hmm nope
[12:55:04] https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs#FAQ
[12:55:44] oh, thx <3
[12:55:53] I was going crazy
[12:55:56] blaming my layer8
[12:56:12] lol, yw :)
[13:40:46] bblack, ema: re T191897 can I continue reimaging non eqiad secondary LVSs? aka lvs[2004-2005].codfw.wmnet,lvs[3003-3004].esams.wmnet?
[13:40:46] T191897: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897
[13:44:41] vgutierrez: +1 from my side, but let's wait for brandon too. IIRC he wanted to double-check things
[13:46:02] yup
[13:46:03] I did sometime last week, all good here
[13:46:07] ack
[13:46:09] did we do a multi-interface one yet?
[13:46:20] (I checked the single-interface ones)
[13:46:29] indeed
[13:46:34] lvs2006.codfw.wmnet :)
[13:46:36] right
[13:47:04] we need to give some love to the "NTPD time servers" dashboard BTW
[13:47:13] (it's missing dns4* and dns5* servers)
[13:47:21] in any case, from what I've seen while staring at the others, it seems incredibly unlikely there could be a multi-interface-specific problem in how the puppetization plays out
[13:47:27] but, I'm seeing weird stuff on the delay graph
[13:47:35] well, other than "failing to set which interfaces names map to which rows" correctly heh
[13:47:41] https://grafana.wikimedia.org/dashboard/db/ntp-time-servers?orgId=1&var-Server=acamar&var-Server=achernar&var-Server=chromium&var-Server=hydrogen&var-Server=maerlant&var-Server=nescio&var-ntpd_metric=delay
[13:48:25] what is the delay metric anyways? average Delay of peers/servers of this server?
[13:48:42] things change, we bumped up through some major ntp versions with the switch
[13:48:51] and switching our peering/pooling/servers config
[13:55:44] bblack: hey! Current thoughts on https://gerrit.wikimedia.org/r/#/c/421542?
[13:57:49] as it is now, the patch is mostly about moving from hfp to hfm (unless there's potential for conditional requests)
[13:58:25] we've circled around that a few times and I forgot what the conclusions were
[13:59:37] then there's the whole "unconditional return(deliver) in vcl_hit" topic which should probably be addressed separately
[14:09:04] ema: yeah.... both are inter-related, and there's a lot of stuff in there...
[14:09:35] ema: I'm inclined to say "split that a little bit into multiple commits", e.g. the 4xx-vs-404 thing separately for consideration (are there other 404-like cases?)
[14:10:36] ema: but then on the other hand, I'm inclined in the other direction that we shouldn't solve the problems that commit is trying to solve without first considering the vcl_hit issue as well and fixing it all together, because it's all inter-related, which moves in the direction of more commit aggregation rather than splitting
[14:12:26] ema: on that latter part, I'm still behind you way back at "I can't even understand the VTC results about keep-v-grace"
[14:15:21] bblack: yeah I think we need a task to track that, creating it
[14:16:14] Traffic, Operations, Pybal, Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4136099 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` lvs3004.esams.wmnet ``` The log can be found in `/var/lo...
[14:19:57] Traffic, Operations, Pybal, Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4136107 (Vgutierrez)
[14:21:51] Traffic, Operations, Pybal, Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4120349 (Vgutierrez)
[14:39:55] ah, interesting
[14:40:32] bblack, XioNoX re: https://phabricator.wikimedia.org/T184293#4005921 how are hostports being identified right now? we could include a step to be done while the server is being racked: map legacy interface names (ethX) to PNI naming
[14:40:46] my initial vtc test case w/ keep did not trigger any varnish conditional requests https://phabricator.wikimedia.org/P6970
[14:41:00] that's because I was returning an empty body from the origin server
[14:41:45] netops, DBA, Operations, ops-eqiad, Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4136190 (Marostegui) Cable has been replaced by @Cmjohnson just now.
[14:42:33] varnish is apparently being smart and avoiding conditional requests for empty objects then
[14:48:44] I guess under the assumption that if the object is unchanged we still need to transfer the headers only, if it has changed we need a full fetch anyways
[14:58:55] netops, DBA, Operations, ops-eqiad, Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4136265 (Marostegui)
[15:01:25] Traffic, Operations, Pybal, Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4136271 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['lvs3004.esams.wmnet'] ``` and were **ALL** successful.
[15:06:21] makes sense (no conditionals if no body)
[15:09:01] netops, DBA, Operations, ops-eqiad, Patch-For-Review: db1114 connection issues - https://phabricator.wikimedia.org/T191996#4136295 (Marostegui)
[15:11:49] Traffic, Operations: Unconditional return(deliver) in vcl_hit - https://phabricator.wikimedia.org/T192368#4136305 (ema)
[15:12:00] ok I've spent too much time drafting that task, at a certain point I just had to hit 'create'
[15:12:20] Traffic, Operations: Unconditional return(deliver) in vcl_hit - https://phabricator.wikimedia.org/T192368#4136317 (ema) p:Triage>Normal
[15:12:32] there's probably other questions to ask :)
[15:23:45] gotta go afk, I might be around later tonight o/
[15:30:33] Traffic, Operations, ops-eqiad, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#3878953 (Vgutierrez) @Cmjohnson how are hostports being identified right now? I mean, how do you know which interface is eth0 and which one is eth3? We are currently upgra...
[15:31:41] Traffic, Operations, ops-eqiad, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4136494 (Cmjohnson) @ayounsi the new card arrived and is installed...all the fibers are run. I need to know which port you prefer on each switch. I was going to use xe-4...
[15:39:08] Traffic, Operations, Pybal, Patch-For-Review: Reimage LVS servers as stretch - https://phabricator.wikimedia.org/T191897#4136535 (Vgutierrez)
[15:59:43] Traffic, Operations, ops-eqiad: sda failure in hydrogen.wikimedia.org - https://phabricator.wikimedia.org/T192280#4136603 (Cmjohnson) This server's warranty expired in 2014 and should be replaced instead of repaired. @faidon please comment.
[16:05:57] Traffic, Operations, ops-eqiad: sda failure in hydrogen.wikimedia.org - https://phabricator.wikimedia.org/T192280#4136626 (faidon) Yup, a replacement is underway as part of T189317 :)
[16:17:33] Traffic, Commons, MediaWiki-File-management, Multimedia, and 12 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#4136660 (Fjalapeno)
[17:44:47] Traffic, Fundraising-Backlog, Operations, fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561#4136977 (CCogdill_WMF) Bumping this! We are doing a series of newsletter tests with Chapters this quarter and it is really important for us to have ac...
[17:49:03] Traffic, Fundraising-Backlog, Operations, fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561#4137007 (Dzahn) Open>stalled
[19:48:43] Traffic, Operations, ops-eqiad, Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4137521 (ayounsi) @Cmjohnson: would that work for you? |lvs1016|eth0/eno1|asw2-d:xe-7/0/15|cable #4061| |lvs1016|eth1/eno2|asw2-a:xe-4/0/7 |cable #3917| |lvs1016|eth2/ens...
[20:52:23] Traffic, netops, Operations, Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4137680 (ayounsi)
[21:12:17] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal - https://phabricator.wikimedia.org/T187014#4137700 (Nuria) {F17046846} Indeed things look like they are coming back, Nigeria pageviews are present again and US traffic is quite...
[21:15:12] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#4137702 (Nuria)
[21:15:47] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#3961955 (Nuria) Solving ticket. Added note to: https://wikitech.wikimedia.org/wiki/A...
[21:15:55] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#4137704 (Nuria) Solving ticket. Added note to: https://wikitech.wikimedia.org/wiki/A...
[21:16:41] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#3961955 (Nuria) Open>Resolved
[22:20:51] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#3961964 (atgo) Thank you!
[22:30:55] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#4137843 (DFoy) Thanks everyone, great to see this working again!
[23:48:02] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#4138028 (Tbayer) Great, thanks everyone! But do we now know what caused the correct...
[23:49:48] Traffic, Analytics, Analytics-Data-Quality, Analytics-Kanban, and 5 others: Proxies information gone from Zero portal. Opera mini pageviews geolocating to wrong country - https://phabricator.wikimedia.org/T187014#4138030 (Nuria) The proxy list for zero was emptied and that must have included also...