[13:10:50] Traffic, Operations, Pybal, monitoring, Patch-For-Review: pybal: add prometheus metrics - https://phabricator.wikimedia.org/T171710#3473875 (faidon) I know a bunch of work happened during the Wikimania hackathon, but what's the status of this?
[14:52:38] morning ema :)
[14:53:07] so the puppetization refactors for future-parser, are you just kinda working through squashing the errors as they go, or do you have some broader plan about how to refactor the various modules/classes better?
[14:53:35] the module/class layouts we have now are definitely-crazy. they're the product of many partial refactorings.
[14:54:27] I worry a bit about the commits that move hiera calls from role:: into varnish::, they may fix future-parser, but they're also moving backwards against the grain of our standards on these things
[15:16:39] Traffic, Diamond, Operations, monitoring, Prometheus-metrics-monitoring: Enable diamond PowerDNSRecursor collector on dnsrecursors - https://phabricator.wikimedia.org/T169600#3584652 (faidon) a:akosiaris
[15:25:11] bblack: hi!
[15:25:20] so far I've been pushing left and right till I
[15:25:31] till I've managed to please the future parser
[15:26:00] with _joe_ we were discussing a few things though, including the fact that varnish::instances should probably be a class instead of a define
[15:26:30] for all I know varnish::instances really shouldn't exist, but it seemed like it should at the time
[15:26:50] err, cache::instances
[15:27:06] which would allow us to leave the hiera calls where they were, and access them from the templates with scope['varnish::instances::whatever']
[15:27:18] s/them/the attributes/
[15:27:37] right now a lot of the problems stem, structurally, from the poorly-placed dividing lines between role::cache:: and varnish::
[15:28:14] and also the poor logical scoping (regardless of future parser) of some of the variables as being one of "global to all caches", "cluster-specific", "cluster-instance-specific", etc....
[15:29:26] there's probably a fair amount, by now, of low-hanging fruit for refactoring away excess complexity in general. some things were structured in an overly-complex way to meet past needs, but not simplified later when the needs were simplified
[15:30:18] anyways, I don't have any concrete inputs, and I don't think I have time to wrap my brain fully around the whole scope of it all anytime soon.
[15:30:30] good luck? :)
[15:30:52] hehe thanks! :)
[15:31:47] I'm gonna try and see what can be simplified/refactored and keep you posted
[17:48:08] Traffic, netops, Operations, ops-eqiad: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3585595 (Cmjohnson) Created the bootable img using the HP utility provided in the iso. It is a Windows software and had to borrow from a family member. Booted the Service pack and...
[18:02:03] ema: I'm going for now with the theory that the accidentally (for quite a long while now) 1d keep values on cache_upload are hurting rather than helping the mailbox lag situation
[18:02:32] ema: (the intended 7d keep there would result in fewer expiries from the cache?)
[19:36:16] Traffic, Operations, Pybal: Implement stateless TCP balancing in our LVS servers - https://phabricator.wikimedia.org/T175203#3586081 (BBlack)
[19:36:25] Traffic, Operations, Pybal: Implement stateless TCP balancing in our LVS servers - https://phabricator.wikimedia.org/T175203#3586097 (BBlack) p:Triage>High
[19:36:48] Traffic, Operations, Pybal: Implement stateless TCP balancing in our LVS servers - https://phabricator.wikimedia.org/T175203#3586081 (BBlack)
[22:20:15] Traffic, Analytics, Operations: Invalid "wikimedia" family in unique devices data due to misplaced WMF-Last-Access-Global cookie - https://phabricator.wikimedia.org/T174640#3586807 (Dzahn)
[23:52:36] Traffic, Operations: Lower geodns TTLs from 600 to 300 - https://phabricator.wikimedia.org/T140365#2462333 (herron) It would also be good to know that a single server will handle the increased load when degraded. Adjusting the TTL before adding redundancy/capacity may be advantageous in that it could hi...