[00:04:17] Krinkle: based on our discussion, i'm thinking 1) add the statsd publisher to just a single high-project job; 2) modify the script to only submit stats based on node labels that match /^statsd-/; 3) label docker nodes that i want to measure with "statsd-{instancetype}"
[00:04:49] i think that might be better than switching job labels around just because doing so might affect their performance for other reasons
[00:06:07] and i want to get metrics (segmented by node instance type) without my measurements changing the behavior of the cluster, as much as that is possible
[00:09:19] Just to clarify, what I meant is that where we have DebianJessieDocker now, we'd also have DebianJessieDockerM4 (for example), and then the jobs to try out on M4 would have their label changed from DebianJessieDocker to DebianJessieDockerM4. Then, with job+project* metrics and label.join()* metrics you'll have (for the former) a graph that will improve or regress and always runs on the same type of node. And the latter providing general high
[00:09:19] level counts for all jobs on that type.
[00:09:19] M4: Phabricator project labels - https://phabricator.wikimedia.org/M4
[00:09:45] Instead of DebianJessieDockerM4 one could also have DebianJessieDocker && m4, which would work the same way indeed. I think that's what you're using now.
[00:11:14] but couldn't the graph values improve or regress based on capacity too?
[00:12:22] for instance, almost all of our nodes at the moment use m1.medium instances. i've introduced two more nodes, one on a m1.xlarge, and one on a bigmem.
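The label-based plan from [00:04:17] can be sketched roughly as follows. This is a minimal Python illustration, not the script from the Gerrit change under discussion; the function name `metric_keys`, the `ci.build` key prefix, the key layout, and the sample job and labels are all assumptions made for the example. Only the `statsd-` label prefix comes from the chat.

```python
import re

# Prefix taken from the /^statsd-/ pattern mentioned in the discussion.
STATS_LABEL = re.compile(r"^statsd-(?P<segment>.+)$")


def metric_keys(job_name, node_labels, prefix="ci.build"):
    """Return one metric key per node label that opts in to measurement.

    Nodes without a matching label yield no keys: they keep being
    scheduled as before and are simply not measured.
    (Function name, prefix and key layout are hypothetical.)
    """
    keys = []
    for label in node_labels:
        match = STATS_LABEL.match(label)
        if match:
            # Dots separate levels in statsd-style keys, so an instance
            # type such as "m1.xlarge" is flattened to "m1_xlarge".
            segment = match.group("segment").replace(".", "_")
            keys.append(f"{prefix}.{job_name}.{segment}")
    return keys


# Hypothetical node: its scheduling label is untouched, one stats label added.
print(metric_keys("some-job", ["DebianJessieDocker", "statsd-m1.xlarge"]))
```

Because only the extra label decides whether anything is reported, the labels that jobs are scheduled on stay unchanged, which is the point made at [00:06:07].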
[00:12:55] if i switch a high-project job over from the m1.medium pool to the m1.xlarge, there will very likely be a regression, but like due to capacity
[00:12:59] *likely*
[00:14:12] if I instead make no changes to the running cluster (nodes continuing to be selected at random), but just in what i'm measuring, it seems like i would get a clearer picture of the differences in performance between the different instance types
[00:16:16] i'm taking a dog training approach here, btw. :) as soon as a trainer walks into the room, the environment changes, and the dog behaves differently.
[00:33:14] (PS3) Dduvall: Statsd publisher that sends job/node metrics to statsd.eqiad.wmnet [integration/config] - https://gerrit.wikimedia.org/r/455269 (https://phabricator.wikimedia.org/T201972)
[00:34:16] Krinkle: made some updates ^ there. thanks again for the feedback! i'm going to come back to it on monday with a not so exhausted brain
[01:08:16] Phabricator: Convert RT links in Bugzilla comments in links in Phabricator tasks - https://phabricator.wikimedia.org/T874 (Dzahn) Kind of unfortunate though because not all tickets from RT have been imported and the ones that have are mostly set to NDA and you only find them with advanced search if you reall...
[01:08:36] marxarelli: I understand now. I missed the part where executor count would differ.
[01:09:02] I was thinking you're considering different resources available per slot, to see what works best and still run fast enough.
[01:09:39] But yeah, overall cluster capacity also factors in given you're not swapping labels with the same effective executor count available to them.
[01:10:30] makes sense to me!
[02:03:11] Phabricator: Convert RT links in Bugzilla comments in links in Phabricator tasks - https://phabricator.wikimedia.org/T874 (Peachey88) >>!
In T874#4531707, @Dzahn wrote: > Kind of unfortunate though because not all tickets from RT have been imported and the ones that have are mostly set to NDA and you only fi...
[03:06:30] Krenair: so we need to package python-ib3 then?
[03:07:11] legoktm, either that or we do some awful hacks to only load the module if a password is provided
[03:07:18] and never give it a password in prod
[03:07:27] uh doesn't that defeat the point?
[03:07:33] hm?
[03:07:39] since we want ircecho to auth?
[03:07:53] yeah so we'll give it a password for shinken
[03:08:27] and icinga-wm?
[03:08:33] not my problem
[03:08:39] lives in prod
[03:09:41] other option is to take a copy of auth.py
[03:09:43] so everyone's problem then :)
[03:09:46] * legoktm afk -> dinner
[03:09:57] no one outside ops can touch that one
[03:10:06] aka their problem
[03:10:39] another option would be to take a copy of the class we need
[03:11:30] ircecho is PD whereas that's GPLv3+
[03:12:00] guess we could just put it into a separate class and have ircecho import it
[03:12:12] in any case, the pip package works within labs
[03:12:22] if ops won't accept that in prod that's fine but the prod bots are their problem
[03:17:41] put it into a separate module*
[03:23:05] icinga-wm doesn't even really need to auth
[03:23:26] it has a single public hostname
[03:24:13] it appears to join a much smaller list of channels, in at least one of which it's exempted with a hostname match
[03:25:07] no need to worry about stuff like that wmflabs.org exempt we were using earlier
[03:25:19] * Krenair -> sleep
[04:21:05] (CR) Legoktm: [C: 2] Run seccheck for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/455107 (owner: Umherirrender)
[04:21:24] (CR) Legoktm: [C: 2] Make seccheck for Thanks voting [integration/config] - https://gerrit.wikimedia.org/r/454776 (owner: Umherirrender)
[04:22:36] (Merged) jenkins-bot: Run seccheck for MobileFrontend [integration/config] -
https://gerrit.wikimedia.org/r/455107 (owner: Umherirrender)
[04:23:27] !log deployed https://gerrit.wikimedia.org/r/455107
[04:23:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[04:23:45] (PS2) Legoktm: Make seccheck for Thanks voting [integration/config] - https://gerrit.wikimedia.org/r/454776 (owner: Umherirrender)
[04:23:53] (CR) Legoktm: [C: 2] "..." [integration/config] - https://gerrit.wikimedia.org/r/454776 (owner: Umherirrender)
[04:25:27] (Merged) jenkins-bot: Make seccheck for Thanks voting [integration/config] - https://gerrit.wikimedia.org/r/454776 (owner: Umherirrender)
[04:26:04] !log deployed https://gerrit.wikimedia.org/r/454776
[04:26:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[04:29:59] (PS1) Legoktm: seccheck for WikibaseLexeme and VisualEditor [integration/config] - https://gerrit.wikimedia.org/r/455279 (https://phabricator.wikimedia.org/T202388)
[04:31:41] (CR) Legoktm: [C: 2] seccheck for WikibaseLexeme and VisualEditor [integration/config] - https://gerrit.wikimedia.org/r/455279 (https://phabricator.wikimedia.org/T202388) (owner: Legoktm)
[04:33:15] (Merged) jenkins-bot: seccheck for WikibaseLexeme and VisualEditor [integration/config] - https://gerrit.wikimedia.org/r/455279 (https://phabricator.wikimedia.org/T202388) (owner: Legoktm)
[04:34:52] !log deployed https://gerrit.wikimedia.org/r/455279
[04:34:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[04:38:53] (PS1) Legoktm: Remove seccheck from Echo [integration/config] - https://gerrit.wikimedia.org/r/455282
[04:43:23] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[04:52:42] Continuous-Integration-Config, Wikimedia-General-or-Unknown, phan-taint-check-plugin, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)),
Patch-For-Review: Enable phan-taint-check-plugin on all Wikimedia-deployed repositories where it is curr... - https://phabricator.wikimedia.org/T201219
[05:18:21] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:22:27] Continuous-Integration-Config, Wikimedia-General-or-Unknown, phan-taint-check-plugin, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review: Enable phan-taint-check-plugin on all Wikimedia-deployed repositories where it is curr... - https://phabricator.wikimedia.org/T201219
[06:06:23] (PS1) Legoktm: Generate coverage reports for more extensions [integration/config] - https://gerrit.wikimedia.org/r/455285
[06:06:41] (CR) Legoktm: [C: 2] Generate coverage reports for more extensions [integration/config] - https://gerrit.wikimedia.org/r/455285 (owner: Legoktm)
[06:06:45] (CR) Legoktm: [C: 2] Remove seccheck from Echo [integration/config] - https://gerrit.wikimedia.org/r/455282 (owner: Legoktm)
[06:09:22] (Merged) jenkins-bot: Remove seccheck from Echo [integration/config] - https://gerrit.wikimedia.org/r/455282 (owner: Legoktm)
[06:09:24] (Merged) jenkins-bot: Generate coverage reports for more extensions [integration/config] - https://gerrit.wikimedia.org/r/455285 (owner: Legoktm)
[06:09:50] !log deployed https://gerrit.wikimedia.org/r/455282 https://gerrit.wikimedia.org/r/455285
[06:09:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:22:31] (CR) Zoranzoki21: [C: -1] "For now, for MR70, whitelist is not needed. Until now, he has only three patches created on gerrit."
[integration/config] - https://gerrit.wikimedia.org/r/455226 (owner: Gergő Tisza)
[08:07:47] (PS1) Hashar: Add ChangeUserPasswords extension [integration/config] - https://gerrit.wikimedia.org/r/455289
[08:08:17] (CR) Hashar: [C: 2] Add ChangeUserPasswords extension [integration/config] - https://gerrit.wikimedia.org/r/455289 (owner: Hashar)
[08:10:42] (Merged) jenkins-bot: Add ChangeUserPasswords extension [integration/config] - https://gerrit.wikimedia.org/r/455289 (owner: Hashar)
[08:14:50] PROBLEM - SSH on integration-slave-docker-1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:24:42] RECOVERY - SSH on integration-slave-docker-1004 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u5 (protocol 2.0)
[08:46:11] (CR) Umherirrender: "T197898" [integration/config] - https://gerrit.wikimedia.org/r/455144 (owner: Hashar)
[08:53:53] (CR) Thiemo Kreuz (WMDE): "I had a look at the original sniff, and find it a little scary. You see, I would love to be able to continue using at least this style:" [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[09:00:17] (PS2) Umherirrender: Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225
[09:00:40] (CR) Umherirrender: "The sniff is safe for newline uses, I have added your example as generic_pass" [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[09:04:24] (PS3) Umherirrender: Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225
[09:08:10] (CR) Thiemo Kreuz (WMDE): [C: 1] "Very nice, thanks a lot!"
(1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[09:17:59] (PS4) Umherirrender: Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225
[09:19:26] (CR) Umherirrender: "[Spelling mistakes are for free]" [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[10:14:42] Phabricator: Convert RT links in Bugzilla comments in links in Phabricator tasks - https://phabricator.wikimedia.org/T874 (Aklapper) >>! In T874#4531707, @Dzahn wrote: > Kind of unfortunate though because not all tickets from RT have been imported and the ones that have are mostly set to NDA and you only fin...
[10:24:03] Phabricator: Convert RT links in Bugzilla comments in links in Phabricator tasks - https://phabricator.wikimedia.org/T874 (Aklapper) >>! In T874#4531714, @Peachey88 wrote: > Should we look at transferring over the queues that didn't transition over? Out of scope / offtopic for this task. See T38 and subta...
[12:05:54] Gerrit: Can not change group membership in gerrit as a group member anymore - https://phabricator.wikimedia.org/T173337 (MarcoAurelio) [ not a Phabricator-hosted repo; nothing for #repository-admins ]
[12:11:53] Gerrit: Can not change group membership in gerrit as a group member anymore - https://phabricator.wikimedia.org/T173337 (Krenair) Open>stalled a:Florian
[12:15:04] Continuous-Integration-Config: CI fail in mediawiki/extensions/SemanticExpressiveness repository because of fail-archived-repositories - https://phabricator.wikimedia.org/T202810 (Zoranzoki21)
[12:16:33] Continuous-Integration-Config: CI fail in mediawiki/extensions/SemanticExpressiveness repository because of fail-archived-repositories - https://phabricator.wikimedia.org/T202810 (Zoranzoki21) https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SemanticExpressiveness/+/455320/
[12:24:20] Continuous-Integration-Config: CI fail in mediawiki/extensions/SemanticExpressiveness repository because of fail-archived-repositories - https://phabricator.wikimedia.org/T202810 (MarcoAurelio) Open>Invalid This is one of the instances of {T190671}. https://www.mediawiki.org/wiki/Extension:Semantic_E...
[12:29:42] Continuous-Integration-Config: CI fail in mediawiki/extensions/SemanticExpressiveness repository because of fail-archived-repositories - https://phabricator.wikimedia.org/T202810 (Zoranzoki21) >>! In T202810#4532237, @MarcoAurelio wrote: > This is one of the instances of {T190671}. > > https://www.mediawiki....
[12:45:29] (CR) Zoranzoki21: "With syntax in this patch is everything ok.
For the time being" [integration/config] - https://gerrit.wikimedia.org/r/446185 (owner: Saint Johann)
[13:09:55] (CR) Saint Johann: "I did this patch as an advice from people in #wikimedia-tech where they told me that it was OK for me to add myself in config, but I don’t" [integration/config] - https://gerrit.wikimedia.org/r/446185 (owner: Saint Johann)
[14:39:23] PROBLEM - Puppet errors on deployment-memc06 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[14:49:22] RECOVERY - Puppet errors on deployment-memc06 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:56:01] (CR) Legoktm: [C: 1] "It's fine to add you to the whitelist, I'll merge this later today" [integration/config] - https://gerrit.wikimedia.org/r/446185 (owner: Saint Johann)
[20:30:01] Release-Engineering-Team (Watching / External), Core-Platform-Team, PoolCounter, Patch-For-Review: Fix tests of PoolCounter extension - https://phabricator.wikimedia.org/T178517 (Legoktm) Snipped from an email I wrote: > Tangentially, it would be nice if we could get the poolcounter daemon > tes...
[21:34:55] legoktm: which CI testing would be good for mediawiki/services/poolcounter ?
[21:35:07] and for their /deploy repo if it exists
[21:36:12] (I responded in -dev)
[21:36:18] ack
[23:35:04] (CR) Legoktm: [C: 2] Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[23:36:02] (Merged) jenkins-bot: Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[23:36:35] (CR) jenkins-bot: Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender)
[23:42:29] (PS5) Legoktm: Enable Squiz.PHP.NonExecutableCodeSniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/449159 (https://phabricator.wikimedia.org/T168465) (owner: Matěj Suchánek)
[23:44:09] (CR) Legoktm: [C: 2] "PS5: Updated the tests to make them pass" [tools/codesniffer] - https://gerrit.wikimedia.org/r/449159 (https://phabricator.wikimedia.org/T168465) (owner: Matěj Suchánek)
[23:45:01] (Merged) jenkins-bot: Enable Squiz.PHP.NonExecutableCodeSniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/449159 (https://phabricator.wikimedia.org/T168465) (owner: Matěj Suchánek)
[23:45:29] MediaWiki-Codesniffer, Patch-For-Review: add or create sniff to detect unreachable code after break in switch - https://phabricator.wikimedia.org/T168465 (Legoktm) Open>Resolved
[23:45:47] (CR) jenkins-bot: Enable Squiz.PHP.NonExecutableCodeSniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/449159 (https://phabricator.wikimedia.org/T168465) (owner: Matěj Suchánek)