[00:35:52] Scap3, Discovery, Maps: Failed to rollback scap3 deployment - https://phabricator.wikimedia.org/T142792#2546430 (Yurik) [01:20:52] Scap3, Discovery, Maps: Failed to rollback scap3 deployment - https://phabricator.wikimedia.org/T142792#2546578 (thcipriani) p:Triage>High So it looks like scap tried to restart the service on the canary host and then reach out the port 6533. The timeout for this check is 120 seconds, so that... [01:35:27] RECOVERY - Host deployment-parsoid05 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [01:46:10] PROBLEM - Host deployment-parsoid05 is DOWN: CRITICAL - Host Unreachable (10.68.16.120) [02:27:55] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #116: FAILURE in 14 min: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/116/ [03:01:19] (CR) BryanDavis: Change check_message_ok test text (1 comment) [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304249 (owner: BryanDavis) [03:06:05] (CR) BryanDavis: "> Should we validate the length of the change id? 
Looks good" [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304247 (https://phabricator.wikimedia.org/T142672) (owner: BryanDavis) [03:11:06] (CR) BryanDavis: "Filed change-id validation as T142801" [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304247 (https://phabricator.wikimedia.org/T142672) (owner: BryanDavis) [03:30:20] (PS1) BryanDavis: Allow lines >100 chars if they are URLs [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304430 (https://phabricator.wikimedia.org/T142800) [03:34:12] (CR) Legoktm: [C: 2] Add support for Depends-On statements [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304247 (https://phabricator.wikimedia.org/T142672) (owner: BryanDavis) [03:34:40] (Merged) jenkins-bot: Add support for Depends-On statements [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304247 (https://phabricator.wikimedia.org/T142672) (owner: BryanDavis) [03:34:42] (Merged) jenkins-bot: Add python artifacts to .gitignore [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304248 (owner: BryanDavis) [03:34:44] (Merged) jenkins-bot: Change check_message_ok test text [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304249 (owner: BryanDavis) [03:35:28] legoktm: I'll do the change-id validation thing next. Then probably follow up with some test refactoring. [03:35:47] that 1 test all the things test is a bit goofy [03:36:26] I was kind of wondering about Signed-Off-By lines too [03:36:42] I know they are used occasionally by some devs [03:36:58] hmm yeah [03:38:48] (CR) Legoktm: "Can we optionally allow the url to be wrapped in <...>?" 
[integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304430 (https://phabricator.wikimedia.org/T142800) (owner: BryanDavis) [03:39:03] yeah, good call [03:39:32] There's also headers like Co-Authored-By and stuff [03:40:03] *nod* I wonder if there is a good list of such things? [03:40:06] I wonder if we're better off using a blacklist for things we definitely don't want like Task: [03:40:30] could be easier than adding things whack a mole style [03:41:37] and a generic "^\S: " rule to say that such things come after the body, separated by a blank line, and before Change-Id: ? [03:42:29] I'm not sure how useful being as pedantic as we are right now really is [03:42:33] except for the first line, where that's used to denote the component [03:42:37] probably not [03:43:19] yeah the only validation on the subject is line length [03:43:26] which seems right really [03:43:45] I'd like to run the validator against the last 100 or 1000 merged commits in mw/core and see what false positives are triggered...didn't have time for that today though [03:44:14] *nod* seems a useful sanity check [03:51:59] (PS2) BryanDavis: Allow lines >100 chars if they are URLs [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304430 (https://phabricator.wikimedia.org/T142800) [03:52:59] (btw, unclosed braces/parenthesis are absolutely terrible ;-) [03:53:22] (whatever [03:53:33] (CR) Legoktm: [C: 2] Allow lines >100 chars if they are URLs [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304430 (https://phabricator.wikimedia.org/T142800) (owner: BryanDavis) [03:54:01] (Merged) jenkins-bot: Allow lines >100 chars if they are URLs [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304430 (https://phabricator.wikimedia.org/T142800) (owner: BryanDavis) [03:54:25] if g.reg-g was watching here he would have closed that paren by now [03:55:18] :P [03:55:33] * legoktm goes afk, might be on 
later tonight [03:55:39] o/ [04:05:49] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #106: FAILURE in 9 min 48 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/106/ [04:24:25] bd808: my LISP days were very formative [04:25:19] caaaaaaaar [04:43:58] (PS1) BryanDavis: Validate Change-Id and Depends-On values [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304432 (https://phabricator.wikimedia.org/T142801) [04:46:24] (PS2) BryanDavis: Validate Change-Id and Depends-On values [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304432 (https://phabricator.wikimedia.org/T142801) [06:16:17] Beta-Cluster-Infrastructure: Enable Quiz Extension on ca.wikipedia.beta.wmflabs.org for testing - https://phabricator.wikimedia.org/T142692#2546872 (Toniher) @greg that's right. Thanks! [06:54:35] (PS5) Lethexie: Add detection for calling global functions in target classes. 
[tools/codesniffer] - https://gerrit.wikimedia.org/r/301335 [08:05:09] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 588 bytes in 0.002 second response time [08:06:17] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 590 bytes in 0.004 second response time [09:56:53] (CR) Legoktm: [C: 2] Validate Change-Id and Depends-On values [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304432 (https://phabricator.wikimedia.org/T142801) (owner: BryanDavis) [09:57:22] (Merged) jenkins-bot: Validate Change-Id and Depends-On values [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304432 (https://phabricator.wikimedia.org/T142801) (owner: BryanDavis) [11:01:05] Browser-Tests, Continuous-Integration-Config, MediaWiki-extensions-RelatedArticles, Reading-Web-Backlog: RelatedArticles browser tests should run on a commit basis - https://phabricator.wikimedia.org/T120715#2547410 (bmansurov) a:bmansurov>None Not working on this currently. [13:07:21] Scap3 (Scap3-Adoption-Phase1), scap, Parsoid, Services, User-mobrovac: Deploy Parsoid with scap3 - https://phabricator.wikimedia.org/T120103#2547752 (mobrovac) a:mobrovac [13:28:27] Beta-Cluster-Infrastructure, Operations: Check status of under_NDA group - https://phabricator.wikimedia.org/T142822#2547816 (AlexMonk-WMF) To get input about deployment-prep you need to add #Beta-Cluster-Infrastructure (excluding the list of members/sudoUser) ```dn: cn=under_NDA,ou=sudoers,cn=deploymen... 
[14:33:19] Project selenium-WikiLove » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #110: FAILURE in 1 min 18 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/110/ [14:50:39] Release-Engineering-Team, Developer-Relations, Project-Admins: Clean #Wiki-Release-Team project - https://phabricator.wikimedia.org/T88263#2547969 (Danny_B) [14:52:22] Release-Engineering-Team, Developer-Relations, Project-Admins: Clean #Wiki-Release-Team project - https://phabricator.wikimedia.org/T88263#1007725 (Danny_B) Still 5 tagged open tasks... [15:22:38] Release-Engineering-Team, Developer-Relations, Project-Admins: Clean #Wiki-Release-Team project - https://phabricator.wikimedia.org/T88263#2548046 (greg) >>! In T88263#2547969, @Danny_B wrote: > Still 5 tagged open tasks... Which are all also tagged with #mediawiki-stakeholders-group, so we're fine/... [15:23:17] Release-Engineering-Team, Developer-Relations, Project-Admins: Clean #Wiki-Release-Team project - https://phabricator.wikimedia.org/T88263#2548047 (greg) To be clear: there is nothing else to do here, please don't remove those tasks from that archived project. [15:46:52] Release-Engineering-Team, Developer-Relations, Project-Admins: Clean #Wiki-Release-Team project - https://phabricator.wikimedia.org/T88263#1007725 (Danny_B) That's what I expected, hence why I didn't reopen it, but only left a note just in case... 
;-) [16:17:34] PROBLEM - Puppet run on deployment-parsoid09 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [16:21:48] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:31:32] (CR) Dduvall: [C: 2] Replace git.wikimedia.org with diffusion links [integration/raita] - https://gerrit.wikimedia.org/r/296879 (https://phabricator.wikimedia.org/T139089) (owner: Paladox) [16:31:47] (Merged) jenkins-bot: Replace git.wikimedia.org with diffusion links [integration/raita] - https://gerrit.wikimedia.org/r/296879 (https://phabricator.wikimedia.org/T139089) (owner: Paladox) [16:46:29] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [16:52:15] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer [16:57:34] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [17:58:31] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T142855#2548545 (greg) [17:58:46] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T142117#2523354 (greg) [17:59:51] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T139217#2548568 (greg) Open>Resolved [17:59:57] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.13 deployment blockers - https://phabricator.wikimedia.org/T139215#2548569 (greg) Open>Resolved [18:00:04] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.12 deployment blockers - https://phabricator.wikimedia.org/T139214#2548570 (greg) Open>Resolved [18:19:56] !log deploying 2ef24f2 to ores-beta in sca03 [18:20:00] Logged the message at 
https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [18:31:13] Did we intentionally start bouncing people accessing the Beta Cluster into HTTPS? It seems to break a few things… [18:41:45] We did a while ago [18:41:51] But HTTPS just broke [18:41:54] I think I know why [18:42:07] (I changed it earlier, probably my change broke it) [18:45:22] James_F, fixed [18:45:41] Krenair: Aha, thanks. [18:45:55] Umm. [18:46:09] Did someone push a Beta Feature that breaks Special:Contributions [18:46:29] it appears blank to you too? [18:46:44] No, [V64ZWQpEEaoAAFXd2lAAAAAA] /wiki/Special:Contributions/Jdforrester_(WMF) MWException from line 436 of /srv/mediawiki/php-master/extensions/Flow/includes/Formatter/AbstractQuery.php: Accessing non-existent parameter: oresc_probability [18:46:54] ORES Beta Feature breakage. [18:47:07] Will file a bug with ORES. [18:47:14] (Goes away if I opt out of that BF.) [18:47:30] I get a blank page at https://en.wikipedia.beta.wmflabs.org/wiki/Special:Contributions while logged out :S [18:47:39] oh, duh, logged out [18:47:45] … yeah. [18:47:47] :-) [18:48:12] Hi! Do we do any systematic collecting or monitoring of JS console errors? [18:48:27] AndyRussG, I don't think so. Last I heard there was a task open to do so [18:48:27] AndyRussG: No. [18:48:34] sadly no, Sentry was being explored but it dropped off of the priority list [18:49:01] James_F, it might've been https://gerrit.wikimedia.org/r/#/c/264608/ [18:49:07] https://phabricator.wikimedia.org/tag/sentry/ [18:49:34] Krenair: James_F: greg-g: ah K thx... 
Here is the task that I'm looking at https://phabricator.wikimedia.org/T139439 [18:50:02] (PS1) BryanDavis: Make rules for footer contents less strict [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304496 (https://phabricator.wikimedia.org/T142804) [18:50:13] Basically it used to be that CentralNotice could load on edit pages and loaded modules but didn't show on Special pages [18:50:15] Krenair: Good catch; have filed https://phabricator.wikimedia.org/T142858 [18:51:10] Since the error comes from Flow it's possible that the commit may work standalone, but not when you enable both that and flow? [18:51:22] We stopped that and haven't heard of any ill side effects, really there shouldn't be any, but it'd be nice to be more certain [18:51:43] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T140971#2548698 (greg) [18:58:21] Krenair: Plausibly, yeah; they might be wrongly assuming that everyone implements the same interface. Will leave it to them. [18:58:25] greg-g: Thanks. [19:00:31] meh, upload is broken in beta [19:00:57] the instance it runs on (deployment-cache-upload04) is still broken [19:01:22] it'll do TLS on port 443, but not respond to HTTP requests on that or port 80. it will ping too [19:02:14] no SSH, no salt [19:02:45] James_F: as editing PM, have you heard of any unexplained issues that may have been due to the removal of CentralNotice RL modules from edit pages (where action=edit in URL params) by any chance? Thx in advance! [19:03:32] Or is there anyone else you think I might ping to ask? [19:03:53] AndyRussG: I haven't heard of that kind of issue. Might be worth asking quiddity and other CLs though as they know everything. :-) [19:04:21] ah heh good idea :) [19:04:56] not currently aware of any, but I'll check and ask the team (those not in Europe) [19:05:08] AndyRussG, ^ [19:05:18] quiddity: cool beans, thanks so much!! 
Here's the Phab task, BTW https://phabricator.wikimedia.org/T139439 [19:05:54] Thanks quiddity! [19:07:10] aha [19:07:17] it eventually returns an nginx timeout [19:07:24] which makes me think varnish on the box has died [19:07:44] (504 Gateway Time-out from nginx/1.11.1) [19:08:06] it got listed in https://phabricator.wikimedia.org/T141673#2546759 too [19:11:06] ohh but I can see the console logs now, great [19:11:26] wonder why that was broken yesterday [19:16:57] AndyRussG, I've checked a few obvious places (VPT and meta:Talk:CentralNotice) and can't see anything mentioned. I've told the CLs (and Seddon) to keep an eye out, and to ping you or that task in case of issues. EOM. :) [19:18:32] quiddity: ah K thx much!! Ah yes should have thought to ask Seddon...... [19:20:29] ok, upload fixed [19:24:18] ty Krenair, what was the issue? [19:24:41] greg-g, instance was one of a bunch that had broken. got stuck on certain processes [19:24:50] when I last looked at it, SSH and salt had died. this time varnish had gone too [19:24:51] gotcha [19:24:58] rebooted it, everything came back up [19:26:12] greg-g, the HTTPS patch is pretty much ready to merge afaict. 
Brandon will look on Monday :) [19:26:13] sweet success [19:26:18] w00t [19:26:31] so then monitoring in shinken (outside of deployment-prep :/) will work again [19:26:36] (theoretically) [19:51:14] Continuous-Integration-Config, Jenkins, Puppet: There is no sane way to get arcanist's conduit tokens onto nodepool CI slaves - https://phabricator.wikimedia.org/T140417#2548797 (mmodell) [19:52:14] Continuous-Integration-Config, Jenkins, Puppet: There is no sane way to get arcanist's conduit tokens onto nodepool CI slaves - https://phabricator.wikimedia.org/T140417#2463982 (mmodell) a:hashar>mmodell [19:52:21] Continuous-Integration-Config, Jenkins, Puppet: There is no sane way to get arcanist's conduit tokens onto nodepool CI slaves - https://phabricator.wikimedia.org/T140417#2463982 (mmodell) p:Normal>Low [19:57:54] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches): create `scap swat` command (the successor to make-wmf-branch) - https://phabricator.wikimedia.org/T140918#2548803 (mmodell) [19:57:57] Release-Engineering-Team (Long-Lived-Branches), Scap3 (Scap3-MediaWiki-MVP): make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2548802 (mmodell) Open>Resolved [20:00:32] Release-Engineering-Team (Long-Lived-Branches), Scap3 (Scap3-MediaWiki-MVP): make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2548810 (mmodell) [20:18:16] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches): create `scap swat` command (the successor to make-wmf-branch) - https://phabricator.wikimedia.org/T140918#2480927 (bd808) nitpick, but wouldn't `scap branch` or `scap release` be a better command name? We use swat to mean something compl... 
[20:28:34] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, I18n: On Beta Cluster, MediaWiki namespace override is inconsistently applied - https://phabricator.wikimedia.org/T142863#2549208 (Mattflaschen-WMF) [20:31:35] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, Collab-Team-Q1-July-Sep-2016, I18n: On Beta Cluster, MediaWiki namespace override is inconsistently applied - https://phabricator.wikimedia.org/T142863#2549253 (Mattflaschen-WMF) a:Mattflaschen-WMF [20:59:27] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T140971#2549363 (Ladsgroup) [21:11:44] Scap3, Discovery, Maps: Failed to rollback scap3 deployment - https://phabricator.wikimedia.org/T142792#2549382 (thcipriani) [21:58:07] Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #112: FAILURE in 6.9 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/112/ [21:58:08] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #112: FAILURE in 6.2 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/112/ [21:59:53] thcipriani: is it somehow not possible to deploy a local change with scap3? I'm trying on a dev server that I run and the git tag keeps being added pointing to origin/HEAD rather than HEAD [22:01:01] hmm [22:01:14] it should be possible to deploy a local only change. 
[22:01:43] it looks like scap deploy -r HEAD worked [22:01:57] but I guess that was what I expected the default behavior to be [22:02:03] yeah, that should work, also should be able to define that in the scap.cfg file [22:03:08] hmm, should be the default behavior...https://github.com/wikimedia/scap/blob/master/scap/deploy.py#L469-L471 [22:03:52] ah ha! [22:04:12] I have copy-pasta of "git_rev: origin/master" in my scap.cfg [22:04:23] ah, yeah, that'll do it :) [22:04:47] is that in the docs? [22:04:49] * thcipriani checks [22:05:02] I just copied the config from ores [22:05:13] * bd808 never reads the docs ;) [22:06:03] :D [22:07:31] the symlink switching tripped me up for a bit while debugging too [22:08:00] I had a shell open in /srv/deployment/striker/deploy which is a symlink to the current tree [22:08:11] I was confused for a bit as to why things weren't changing [22:08:43] then I realized that the symlink had moved and I had to cd back to the dir again to get to the live version [22:08:56] yeah, there have been a couple of trip ups with it. [22:09:21] space issues is another thing [22:09:37] instant deploy and rollback though [22:10:13] shouldn't git pretty much give you that already? [22:10:20] near instant anyway [22:10:44] I guess git-fat assets might be different [22:11:39] yeah, plus the config file stuff capabilities that are a little...configuration management-y [22:12:25] has the weird side benefit of not having un-gitified stuff in your repo checkout directory [22:13:14] also saves the hassle of dealing with corrupted checkouts to some degree. Removes a certain class of problems, or I guess exchanges them for another class, but a more manageable one. [22:13:26] *nod* [22:29:37] any idea what's wrong with this build? 
https://integration.wikimedia.org/ci/job/selenium-CentralAuth/113/console [22:32:10] Continuous-Integration-Infrastructure, Labs: Request increased quota for labs project - https://phabricator.wikimedia.org/T142877#2549610 (yuvipanda) [22:32:22] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Release-Engineering-Team: Identify metric (or metrics) that gives a useful indication of user-perceived (Wikimedia developer) service of CI - https://phabricator.wikimedia.org/T139771#2442096 (yuvipanda) [22:32:24] Continuous-Integration-Infrastructure, Labs: Request increased quota for labs project - https://phabricator.wikimedia.org/T142877#2549626 (yuvipanda) [22:33:49] Continuous-Integration-Infrastructure, Labs: Request increased quota for labs project - https://phabricator.wikimedia.org/T142877#2549610 (yuvipanda) [22:34:16] hello wonderful releng people! [22:34:28] I couldn't find a task for the instance quota increase request for contintcloud [22:34:40] so I made https://phabricator.wikimedia.org/T142877 and started adding blocking tasks [22:34:52] on things that need to happen before the request can be considered [22:34:57] Continuous-Integration-Infrastructure, Labs: Request increased quota for contintcloud labs project - https://phabricator.wikimedia.org/T142877#2549634 (tom29739) [22:35:46] Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Release-Engineering-Team: Identify metric (or metrics) that gives a useful indication of user-perceived (Wikimedia developer) service of CI - https://phabricator.wikimedia.org/T139771#2549637 (chasemp) We had an outage for CI 2 ni... [22:35:56] note that I'm not trying to add any new requirements right now, only recording what I am told is the current situation. If that is not accurate, feel free to comment. 
[22:37:00] Continuous-Integration-Infrastructure, Labs: Request increased quota for contintcloud labs project - https://phabricator.wikimedia.org/T142877#2549641 (yuvipanda) Open>stalled Copying from T139771#2549637 > We had an outage for CI 2 night ago and during that we discovered that nodepool seems to... [22:38:56] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, Collab-Team-Q1-July-Sep-2016, I18n: On Beta Cluster, MediaWiki namespace override is inconsistently applied - https://phabricator.wikimedia.org/T142863#2549649 (Mattflaschen-WMF) [22:39:30] greg-g, bd808, tgr, any thoughts on T142863 ? [22:39:46] The bug itself is weird, and even weirder, it's not reproducible in shell. [22:41:02] matt_flaschen: the "web but not shell" bugs I ran into in the past had to do with HHVM caching and required a service restart to solve [22:41:14] weird. How does the path for MW namespace message overrides work? Is there a cache somewhere? [22:42:25] bd808, yeah. I was digging into it, and I think the cache was empty (but then I realized when I first "reproduced it" in shell I was on the wrong wiki, so I pulled up and then couldn't reproduce it on enwiki in shell). [22:42:57] tgr, what kind of caching? There has been no relevant recent code change. [22:43:10] (That I know of) [22:44:38] some stuff is cached in PHP arrays which then get opcode cached [22:44:45] messages probably aren't though [22:44:50] if there is APC caching then it would only get cleared by timeout or an hhvm restart [22:45:18] apc seems like a place we would stuff message lookups [22:45:49] but I wouldn't expect that to last for >1-5 minutes [22:45:55] should work the same way for web and cli though? 
[22:46:12] I'd give that a big "maybe" [22:46:24] APC is in-process cache [22:46:41] so you'd have to be inside the same hhvm parent process [22:46:43] which maybe mwrepl does [22:46:50] mwscript certainly does not [22:47:29] APC in hhvm works differently than APC in PHP5/7 [22:47:29] I'm not sure the APC user cache is in-process [22:47:52] bd808, I think mwrepl normally creates its own context. I think there is a way to attach to a web request, though. [22:47:54] in hhvm APC is just an unbounded(!) array [22:48:14] matt_flaschen: in any case if you are less interested in figuring out what's wrong and more interested in fixing it, just restart hhvm, that will probably help [22:48:28] 10Deployment-Systems, 06Release-Engineering-Team (Long-Lived-Branches): create `scap swat` command (the successor to make-wmf-branch) - https://phabricator.wikimedia.org/T140918#2549685 (10mmodell) @bd808: I kinda conflated two things in this one ticket. This is the result of discussion that occurred in a rece... [22:48:35] tgr, by now I am interested in figuring it out. [22:48:37] 10Deployment-Systems, 06Release-Engineering-Team (Long-Lived-Branches): create `scap merge` command (the successor to make-wmf-branch) - https://phabricator.wikimedia.org/T140918#2549686 (10mmodell) [22:49:08] bd808, tgr, have either of you debugged the PHP of Beta web requests? [22:49:46] matt_flaschen: only with live hacks and log tailing [22:50:12] is there any other way? 
[22:50:42] Release-Engineering-Team (Long-Lived-Branches), Scap3: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2549692 (mmodell) [22:50:45] in theory you could attach to the in-process hhvm debugger and set breakpoints [22:51:05] ah, yeah, hhvm has its own debugger [22:51:11] never tried using it [22:51:39] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches), releng-201617-q1, Epic: Merge to deployed branches instead of cutting a new deployment branch every week. - https://phabricator.wikimedia.org/T89945#2549706 (mmodell) [22:51:43] Release-Engineering-Team (Long-Lived-Branches), Scap3: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2549705 (mmodell) [22:52:19] Release-Engineering-Team (Long-Lived-Branches), Scap3 (Scap3-MediaWiki-MVP): make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2549722 (mmodell) [22:52:21] Release-Engineering-Team (Long-Lived-Branches), Scap3: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2549692 (mmodell) [22:53:11] Release-Engineering-Team (Long-Lived-Branches), Scap3: Create `scap swat` command to automate patch merging & testing during a swat deployment - https://phabricator.wikimedia.org/T142880#2549692 (mmodell) [22:53:14] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches), releng-201617-q1, Epic: Merge to deployed branches instead of cutting a new deployment branch every week. 
- https://phabricator.wikimedia.org/T89945#1050666 (mmodell) [22:53:16] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches): create `scap merge` command (the successor to make-wmf-branch) - https://phabricator.wikimedia.org/T140918#2549723 (mmodell) [22:53:27] bd808, tgr, ebernhardson has tricks for doing it, but a. It doesn't look like they work on Beta, and b. I need it so rarely I forget the exact details and have to go through IRC logs. [22:53:58] *nod* [22:54:12] If I needed to do it I would grab him and ask how :) [22:54:59] I've done more live debugging on mw1017 in practice than I have on beta cluster hosts [22:55:22] it's hard to keep jenkins from undoing the hacks long enough to test things [22:56:15] don't you just need to disable puppet? [22:59:26] (PS1) Legoktm: Add script to test already merged commits in a repository [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304583 [23:00:43] (CR) jenkins-bot: [V: -1] Add script to test already merged commits in a repository [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304583 (owner: Legoktm) [23:01:25] (PS2) Legoktm: Add script to test already merged commits in a repository [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/304583 [23:03:52] bd808: heh, look at the commit message for https://gerrit.wikimedia.org/r/#/c/195297/ [23:04:01] tgr, alright, I'm punting for now. How do I restart hhvm, just sudo service hhvm restart ? Will I have rights, or do I need to ask someone? [23:04:09] it fails on your patch :P [23:06:46] matt_flaschen: yeah, just ssh into the appservers and do a service restart; if you are a labs admin for deployment-prep you can do it [23:07:00] tgr, yeah, I did. Testing now. [23:07:12] there is a nicer way with salt that I don't know the details of but with only 3 appservers there is not much difference [23:08:43] legoktm: it fails with the new loose patch? 
[23:08:48] yeah [23:08:52] because of Special:Login [23:08:55] I'm commenting on the bug [23:09:02] ah right [23:09:18] I knew that was going to suck sometimes [23:09:41] do you have a good idea on how to avoid that? [23:09:46] tgr, yeah, that solved the symptoms. [23:10:11] We could go with "labels aren't labels until you are in the footer" I guess [23:10:42] the definition of the footer right now is the first "xxx: ..." line following a blank linke [23:10:43] *line [23:10:46] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, I18n: On Beta Cluster, MediaWiki namespace override is inconsistently applied - https://phabricator.wikimedia.org/T142863#2549822 (Mattflaschen-WMF) a:Mattflaschen-WMF>None [23:12:07] Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, I18n: On Beta Cluster, MediaWiki namespace override is inconsistently applied - https://phabricator.wikimedia.org/T142863#2549208 (Mattflaschen-WMF) Wasn't able to reproduce it in shell, so I restarted hhvm (at tgr's suggestion), which worked ar... [23:13:25] legoktm: if we went with the footer logic then the thing that may slip through the cracks is a "Bug: xxx" line that isn [23:13:32] Isn't actually in the footer [23:13:56] but I guess I could special case a few known footer components for more checking [23:14:36] why is wikibugs not talking about it in here? [23:14:47] bd808: yeah, that's basically what I proposed in https://phabricator.wikimedia.org/T142804#2549843 [23:14:55] to have a whitelist of footer things that we check [23:15:34] there were more failures from your patch, but those were issues in the patch itself, not in the proposed logic [23:17:03] I've got a little time before I head out for the evening. 
I'll see how far I can get [23:19:54] thanks :) [23:25:25] Continuous-Integration-Infrastructure, Release-Engineering-Team, Labs, Labs-Infrastructure, User-greg: Create incident report for CI outage on Aug 10th - https://phabricator.wikimedia.org/T142887#2549861 (greg) [23:46:20] (PS1) Legoktm: Use tox-jessie zuul template [integration/config] - https://gerrit.wikimedia.org/r/304587 [23:46:22] (PS1) Legoktm: Move tox-jessie & co. off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304588 [23:53:09] (PS1) Legoktm: Rename "*node-4.3" to "*node-4" [integration/config] - https://gerrit.wikimedia.org/r/304590 [23:53:29] (CR) Paladox: "this was already done" [integration/config] - https://gerrit.wikimedia.org/r/304590 (owner: Legoktm) [23:53:45] (CR) Paladox: "See https://gerrit.wikimedia.org/r/#/c/304068/ please" [integration/config] - https://gerrit.wikimedia.org/r/304590 (owner: Legoktm) [23:53:51] (PS4) Paladox: Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068 [23:54:25] (CR) Legoktm: [C: -1] "This doesn't update zuul/parameter_functions.py" [integration/config] - https://gerrit.wikimedia.org/r/304068 (owner: Paladox) [23:54:45] (Abandoned) Legoktm: Rename "*node-4.3" to "*node-4" [integration/config] - https://gerrit.wikimedia.org/r/304590 (owner: Legoktm) [23:54:59] (CR) Legoktm: [C: 2] Use tox-jessie zuul template [integration/config] - https://gerrit.wikimedia.org/r/304587 (owner: Legoktm) [23:55:06] (PS5) Paladox: Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068 [23:55:48] legoktm im wondering do we want to migrate npm-node-4 off nodepool? 
[23:55:56] yes, that was what I was going to do next [23:56:00] Oh :) [23:56:01] thanks [23:56:31] guess we should migrate everything off, and let releng investigate why nodepool keeps going down, probably file some bugs upstream [23:56:32] :) [23:56:49] I don't think we can migrate everything off [23:56:55] Oh [23:57:01] we only have 2 jessie slaves right now [23:57:04] Oh [23:57:20] so I'm migrating the fast jobs right now [23:57:24] Ok [23:57:25] thanks [23:57:34] the majority of these jobs are less than a minute [23:57:45] !log deploying https://gerrit.wikimedia.org/r/304587, no-o [23:57:46] p [23:57:48] !log p [23:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:57:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:58:00] :) [23:58:48] do you want to review https://gerrit.wikimedia.org/r/#/c/304588/ while I deploy your node change? [23:59:03] (PS6) Legoktm: Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068 (owner: Paladox) [23:59:27] legoktm yeh ok [23:59:31] thanks [23:59:59] You're welcome
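
Note on the footer rule discussed at 23:10-23:15: the chat defines the footer as starting at the first "xxx: ..." line following a blank line, and proposes checking only a whitelist of known footer labels so that other "label-like" text is left alone. What follows is a minimal sketch of that idea, not the code in integration/commit-message-validator. The names `LABEL_RE`, `CHECKED_LABELS`, `footer_start`, `footer_labels` and `misplaced_labels` are hypothetical, and the whitelist contents are an assumption based on the labels mentioned in the log (Bug, Change-Id, Depends-On).

```python
import re

# Hypothetical label pattern: a token without spaces, then ": " and a value.
LABEL_RE = re.compile(r"^(?P<name>[A-Za-z][\w-]*): (?P<value>\S.*)$")

# Assumed whitelist of footer labels that get checked; anything else is ignored.
CHECKED_LABELS = {"bug", "change-id", "depends-on"}


def footer_start(lines):
    """Return the index of the first footer line, or None if there is none.

    The footer is taken to begin at the first "Label: value" line that
    directly follows a blank line, as described in the chat.
    """
    for i in range(1, len(lines)):
        if lines[i - 1].strip() == "" and LABEL_RE.match(lines[i]):
            return i
    return None


def footer_labels(lines):
    """Return (name, value) pairs for whitelisted labels inside the footer."""
    start = footer_start(lines)
    if start is None:
        return []
    found = []
    for line in lines[start:]:
        m = LABEL_RE.match(line)
        if m and m.group("name").lower() in CHECKED_LABELS:
            found.append((m.group("name"), m.group("value")))
    return found


def misplaced_labels(lines):
    """Return (line_number, name) for whitelisted labels before the footer.

    This covers the "Bug: xxx line that isn't actually in the footer" case.
    Line numbers are 1-based; the subject line is skipped.
    """
    start = footer_start(lines)
    end_of_body = len(lines) if start is None else start
    found = []
    for i in range(1, end_of_body):
        m = LABEL_RE.match(lines[i])
        if m and m.group("name").lower() in CHECKED_LABELS:
            found.append((i + 1, m.group("name")))
    return found
```

With this shape, a body paragraph that happens to start with something like "Note: ..." after a blank line still counts as the start of the footer under the simple definition, but it is never validated because "Note" is not on the assumed whitelist. A "Bug: ..." line sitting in the middle of the body is reported by `misplaced_labels` instead of being silently accepted.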