[06:22:01] Traffic, Operations: Provide nginx support in compile_redirects() - https://phabricator.wikimedia.org/T224539 (Vgutierrez) Open→Resolved
[06:22:08] HTTPS, Traffic, Operations, Goal, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (Vgutierrez)
[09:45:28] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp1086.eqiad.wmnet'] ` The log can be found in `...
[10:52:40] Traffic, Operations: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp1086.eqiad.wmnet'] ` and were **ALL** successful.
[11:04:10] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp1088.eqiad.wmnet'] ` The log can be found in `...
[11:45:14] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp1088.eqiad.wmnet'] ` and were **ALL** successful.
[12:02:47] Traffic, Operations, Performance-Team, Patch-For-Review, Performance: Study performance impact of disabling TCP selective acknowledgments - https://phabricator.wikimedia.org/T225998 (ema) @Gilles: is there anything left to be done here? Other than blogging about the results that is. :-)
[12:05:17] Traffic, Analytics, Operations: Increased number of webrequest sequence-numbers alarms (mostly) on upload webrequest-source - https://phabricator.wikimedia.org/T225786 (ema)
[12:15:17] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ops-monitoring-bot) Script wmf-auto-reimage was launched by ema on cumin1001.eqiad.wmnet for hosts: ` ['cp1090.eqiad.wmnet'] ` The log can be found in `...
[12:43:39] Traffic, Operations, Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (MoritzMuehlenhoff) @Vgutierrez The firmware update on the NICs fixed this for good, right? Can we close this task?
[13:01:16] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp1090.eqiad.wmnet'] ` and were **ALL** successful.
[13:06:09] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes in eqiad - https://phabricator.wikimedia.org/T226638 (ema) Open→Resolved a:ema With the conversion of cp1090 this is now done.
[13:06:11] Traffic, Operations, Patch-For-Review: Replace Varnish backends with ATS on cache upload nodes - https://phabricator.wikimedia.org/T226589 (ema)
[13:07:05] ema: \o/ :)
[13:07:42] done! :)
[13:08:29] I took out the "finish cache_upload conversion" part of the next Q goal
[13:09:08] https://etherpad.wikimedia.org/p/SRE-goals-FQ1-FY1920 has the draft stuff I pasted into a section currently near the bottom
[13:09:19] feel free to amend/rationalize as appropriate :)
[13:10:38] there could/should maybe be something about finishing up the non-canonical redirect stuff (as in, turning on live usage of it), but maybe that's just a background nongoal as long as we don't forget it.
[13:12:23] (or we just say we're still finishing up the prev goal as stated into the early part of this Q, I donno. It's not clear to me that we have any standards right now about how to handle the misalignment of real work and quarterly cycles... do goals that run over (intentionally or otherwise) get refresher goals at each Q boundary? Do we just continue working on the goal in the prev-Q slot to complet
[13:12:30] ion? etc..)
[13:13:34] on that note, what I'd really like to see us reach eventually, is a system where we have a different sort of abstraction for this that's not so quarter-bound.
[13:14:06] e.g. more like a system of tracking "ongoing major projects" that may have start dates and estimated end dates, and may run for anywhere from a few weeks to multiple quarters.
[13:14:32] and have the quarterly boundary be more like a check-in of progress on those that are ongoing or have completed in the most-recent Q.
[13:22:49] Traffic, Operations, ops-eqiad: cp1083 crashed - https://phabricator.wikimedia.org/T222620 (ema) Open→Resolved a:ema The host has been in production for weeks without issues now. Closing.
[13:24:36] that's an interesting idea, though I think I like having deadlines
[13:26:44] yeah, deadlines are useful too :)
[13:26:59] but right now I feel like the quarterly system just imposes an artificial cadence to things
[13:27:28] we may sometimes do less than we could just because an 8-week-ish thing was a "whole quarter" goal and so we slack and make it into a 12-week-ish thing.
[13:27:45] and we sometimes overload because what should've taken 18 weeks gets crammed into a quarter because we really want to check that box on time
[13:29:34] and for longer-term efforts in various architectural areas in general: we might know a rough plan for the next 7 things that need to happen.
[13:30:05] but then the process of quarterly planning devolves into making some arbitrary decisions about which 2 or 5 of those things fits exactly into a quarter (which they never do, even by rough estimation)
[13:30:46] so we have either stretch targets of varying likelihood, or some varying likelihood of missing EOQ, or something. It's kind of artificial.
[13:43:24] Traffic, Operations: Rename role::cache::upload_ats to role::cache::upload - https://phabricator.wikimedia.org/T227328 (ema)
[13:43:47] Traffic, Operations: Rename role::cache::upload_ats to role::cache::upload - https://phabricator.wikimedia.org/T227328 (ema) p:Triage→Normal
[13:57:20] bblack: ah, I almost forgot -- https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/512925/
[13:57:35] the changes look good, but that adds so many new services
[13:57:52] from 12 to 24!
[19:06:30] Traffic, CX-cxserver, Citoid, Operations, and 4 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001 (WDoranWMF)