[00:14:21] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[00:15:25] Hmm
[00:15:50] Does anyone else think we can get away with an extension having to have a vagrant role (or at least, a WIP patch in gerrit) as a pre-requisite for requesting review?
[00:17:15] greg-g: ^
[00:19:22] wont that make it harder to get reviews for extensions?
[00:19:35] Since some patches can go un reviewed for years like some of mine.
[00:19:48] I mean, review for deployment
[00:19:55] on WMF wikis
[00:20:01] Not arbitary review of other extensions
[00:20:10] In most cases, writing a vagrant role to do it is super trivial
[00:21:29] oh
[00:21:31] i see
[00:24:01] Continuous-Integration-Config, MinervaNeue, Patch-For-Review, Ruby, User-zeljkofilipin: Setup CI on Minerva repo - https://phabricator.wikimedia.org/T166750#3428665 (Jdlrobson)
[00:24:23] Continuous-Integration-Config, MinervaNeue, Patch-For-Review, Ruby, User-zeljkofilipin: Setup CI on Minerva repo - https://phabricator.wikimedia.org/T166750#3306463 (Jdlrobson) Open>Resolved a:Jdlrobson
[00:28:02] (PS1) AnotherLadsgroup: Remove myself from tests whitelist [integration/config] - https://gerrit.wikimedia.org/r/364628
[00:29:38] Reedy: Do you think you can merge this? https://gerrit.wikimedia.org/r/#/c/364628/
[00:29:58] Amir1: I can, but I can never easily deploy stuff...
[00:30:02] legoktm: About? Could you? :)
[00:30:11] yep
[00:30:23] * legoktm hugs Amir1
[00:30:28] Does it needs deployment, I didn't know
[00:30:29] (CR) Legoktm: [C: 2] Remove myself from tests whitelist [integration/config] - https://gerrit.wikimedia.org/r/364628 (owner: AnotherLadsgroup)
[00:30:35] Thanks :) <3
[00:30:59] Yeah, gotta get it onto teh servers
[00:31:30] (Merged) jenkins-bot: Remove myself from tests whitelist [integration/config] - https://gerrit.wikimedia.org/r/364628 (owner: AnotherLadsgroup)
[00:32:30] continuous deployment (like for beta cluster) would be nice, but that might be a dream if the infra is not set to work like that
[00:32:47] Thanks legoktm
[00:35:06] !log deploying https://gerrit.wikimedia.org/r/364628
[00:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:35:39] Amir1: you want to continuously deploy the continuous deployment system? ;)
[00:36:30] dogfood
[00:37:00] yeah, like inception
[00:39:37] https://en.wikipedia.org/wiki/Self-hosting
[00:49:23] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:46:10] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<11.11%)
[03:19:16] Release-Engineering-Team (Kanban), Developer-Wishlist (2017), Phabricator (Upstream), Upstream: Duplicate tasks are not listed in or near the description of the target task - https://phabricator.wikimedia.org/T883#3428864 (mmodell) Open>Resolved a:mmodell
[03:23:51] Release-Engineering-Team (Kanban), Phabricator (Upstream), Upstream: Impossible to unsubscribe from calendar event - https://phabricator.wikimedia.org/T113446#3428869 (mmodell) stalled>Resolved a:mmodell I suspect this has been resolved since we haven't heard further complaints. Please re...
[03:52:25] Gitblit-Deprecate, MW-1.30-release-notes (WMF-deploy-2017-05-23_(1.30.0-wmf.2)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3428898 (TerraCodes)
[04:16:52] Yippee, build fixed!
[04:16:53] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #450: FIXED in 20 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/450/
[05:59:09] Continuous-Integration-Infrastructure, Cloud-Services, Beta-Cluster-reproducible, Puppet: New instances attached to a role::puppetmaster::standalone Puppetmaster need manual changes after switching from the default Puppetmaster - https://phabricator.wikimedia.org/T148929#3429039 (bd808)
[06:35:38] Hello!
[06:36:01] I was going to push a wmf config change for db-eqiad.php and ran into this on tin: https://phabricator.wikimedia.org/P5727
[06:43:35] RECOVERY - Free space - all mounts on deployment-kafka03 is OK: OK: All targets OK
[07:09:26] PROBLEM - Puppet staleness on deployment-eventlogging03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[07:11:12] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:11:18] This has been solved by Joe, I have sent an email for follow this issue up
[08:44:25] RECOVERY - Puppet staleness on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [3600.0]
[09:45:25] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[10:20:23] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:35:47] (PS1) Robert Vogel: Extenson:NSFileRepo add composer unit tests [integration/config] - https://gerrit.wikimedia.org/r/364697
[12:00:33] Release-Engineering-Team, Language-Team, MediaWiki-Platform-Team, MediaWiki-extensions-WikimediaIncubator, and 2 others: Make creating a new Language project easier - https://phabricator.wikimedia.org/T165585#3430124 (Amire80)
[12:04:50] Release-Engineering-Team, Language-Team, MediaWiki-Platform-Team, MediaWiki-extensions-WikimediaIncubator, and 2 others: Make creating a new Language project easier - https://phabricator.wikimedia.org/T165585#3430148 (Amire80)
[13:03:46] Continuous-Integration-Config, Wikidata, Patch-For-Review, User-Tobi_WMDE_SW, WMDE-QWERTY-Team-Board: E-Mail notification on failures of Wikidata-builds - https://phabricator.wikimedia.org/T152495#3430322 (Addshore) p:Normal>Low
[13:04:04] Continuous-Integration-Config, Wikidata, Patch-For-Review, User-Tobi_WMDE_SW, WMDE-QWERTY-Team-Board: E-Mail notification on failures of Wikidata-builds - https://phabricator.wikimedia.org/T152495#2850267 (Addshore) I'm going to go ahead and mark this as low was we are trying to get rid of th...
[13:32:54] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:46:20] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #457: FAILURE in 2 min 19 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/457/
[13:46:23] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[14:21:22] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:23:26] PROBLEM - Puppet errors on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:33:42] Yippee, build fixed!
[14:33:43] Project selenium-WikiLove » firefox,beta,Linux,BrowserTests build #452: FIXED in 1 min 42 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/452/
[15:01:11] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:11:53] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:21:13] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:29:35] PROBLEM - Puppet errors on deployment-tmh01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:33:06] PROBLEM - Puppet errors on integration-slave-docker-1000 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:36:27] Gitblit-Deprecate, MW-1.30-release-notes (WMF-deploy-2017-05-23_(1.30.0-wmf.2)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#2561337 (Umherirrender) Git references in the i18n shims are not needed to fix, because the files should be r...
[15:47:36] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #489: FAILURE in 25 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/489/
[16:13:11] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:20:33] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:21:55] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:23:59] PROBLEM - Puppet errors on deployment-eventlogging04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:29:35] RECOVERY - Puppet errors on deployment-tmh01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:43:47] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:54:03] PROBLEM - Puppet errors on deployment-db03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:59:49] RainbowSprinkles: just making coffee and then ill be ready
[16:59:59] * RainbowSprinkles sips his coffee
[17:03:49] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:07:46] MediaWiki-Codesniffer: Always use leading zeros in php files - https://phabricator.wikimedia.org/T170442#3431441 (Umherirrender)
[17:08:08] RainbowSprinkles: ready when you are :)
[17:08:48] So https://gerrit.wikimedia.org/r/358141 would be the starting point
[17:08:50] Let's take it to #-operations
[17:09:03] RECOVERY - Puppet errors on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:12:23] PROBLEM - Check for valid instance states on labnodepool1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:16:03] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0]
[17:20:04] hmm, shows no tests running
[17:20:10] (ie stuck)
[17:20:34] i see this 1 hr 53 min for how long the test has been running.
[17:21:40] (PS1) Chad: Fix phpunit test for make-release [tools/release] - https://gerrit.wikimedia.org/r/364788
[17:21:42] (PS1) Chad: Adding MinervaNeue extension [tools/release] - https://gerrit.wikimedia.org/r/364789
[17:22:44] Beta-Cluster-Infrastructure, Release-Engineering-Team, Epic, Services (attic), Wikimedia-Hackathon-2015: Meeting: Automatic deployment of backend services on beta cluster - https://phabricator.wikimedia.org/T100099#3431538 (GWicke)
[17:22:47] !log CI is backed up, only one nodepoll instance running for the last long while, many in building
[17:22:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:22:53] jdlrobson: Let's hold off a bit, because ^
[17:23:07] We'll let our current stuff merge, but let's get CI unclogged before continuing
[17:30:47] FYI we are working on the nodepool issue in -cloud
[17:31:22] looks like it might be a problem related to some stuff Andrew was starting to work on
[17:31:55] We are shutting down nodepool completely as part of the attempt to fix
[17:32:03] RainbowSprinkles, greg-g: ^
[17:32:11] K
[17:32:37] bd808: thanks, tyler's in there (-cloud) now as point
[17:34:28] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d
[17:37:09] (CR) Addshore: [C: 1] Add Squiz.Classes.SelfMemberReference to ruleset [tools/codesniffer] - https://gerrit.wikimedia.org/r/363525 (owner: Legoktm)
[17:39:05] RECOVERY - Puppet errors on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:39:40] Beta-Cluster-Infrastructure: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3431671 (MarkTraceur)
[17:45:30] (CR) Addshore: [C: 1] Add composer-install & use in composer-test-mwextension [integration/config] - https://gerrit.wikimedia.org/r/354522 (https://phabricator.wikimedia.org/T165316) (owner: Aude)
[17:46:53] RECOVERY - nodepoold running on labnodepool1001 is OK: PROCS OK: 1 process with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d
[17:51:30] Beta-Cluster-Infrastructure, Thumbor: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3431742 (mmodell)
[17:52:39] !log nodepool is back to making instances and running jobs, thanks Cloud team
[17:52:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:54:12] Beta-Cluster-Infrastructure, Thumbor: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3431671 (mmodell) Nobody from #releng are familiar with #thumbor, maybe @gilles can take a look at it?
[17:55:46] jdlrobson: So, with the backup in CI, we've already lost an hour. What I think we should do is finish the merges + backports, get the l10n built, but stop short of actually enabling it anywhere yet.
[17:56:01] Train is in an hour, so I don't wanna move too many things just before that
[17:56:07] RainbowSprinkles: okay
[17:56:28] Means MobileFrontend would need to be r/o another day, or be willing to backport changes
[17:56:29] But that'd be it
[17:56:32] RainbowSprinkles: once we've done that can i merge the change to MobileFrontend?
[17:56:42] im a bit anxious with the skin defined in two places
[17:56:49] Well, it won't be defined yet.
[17:56:57] We wouldn't *load* the new extension
[17:57:00] The code would just be out there
[17:57:02] with l10n
[17:57:20] Existing, but never required()
[17:58:00] RainbowSprinkles: can https://gerrit.wikimedia.org/r/#/c/362448/ be merged to master? given that won't go out on the train today?
[17:58:13] Yes, that can go to master
[17:58:15] okay cool
[17:58:19] Deployment-Systems, Services (watching): Use semantic versioning for services (for consistency with mediawiki core) - https://phabricator.wikimedia.org/T102550#3431779 (mmodell) So this should be resolved?
[17:58:30] as long as we can do that and rely on Minerva being there next train i'm fine with whatever
[17:59:02] Minerva (the code, not enabling) will also be on *this* train
[17:59:09] So we've primed everything
[17:59:21] Deployment-Systems, Services (watching): Use semantic versioning for services (for consistency with mediawiki core) - https://phabricator.wikimedia.org/T102550#3431795 (GWicke) Open>Resolved a:GWicke Yup, was mostly waiting for your feedback as the owner. Resolving.
[17:59:44] RainbowSprinkles: and next train Minerva will be enabled right?
[18:00:04] Well, we're planning to enable tomorrow, right?
[18:00:14] I figured
[18:00:20] We wanted the whole thing done
[18:01:56] RainbowSprinkles: okay that's great too
[18:02:11] RainbowSprinkles: :)
[18:02:21] Basically, we'll get *most* of the work done today
[18:02:24] And finish up tomorrow :)
[18:02:29] yup understand now
[18:02:35] This one is a little more complicated than just rolling out a new extension ;-)
[18:03:07] And tbh, we can prep all the commits for tomorrow by COB today, so tomorrow will be just some +2s and syncs
[18:03:32] (CR) Chad: [C: 2] Fix phpunit test for make-release [tools/release] - https://gerrit.wikimedia.org/r/364788 (owner: Chad)
[18:07:01] PROBLEM - Puppet errors on deployment-stream is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[18:14:02] (CR) Chad: [V: 2 C: 2] Fix phpunit test for make-release [tools/release] - https://gerrit.wikimedia.org/r/364788 (owner: Chad)
[18:14:09] (CR) Chad: [V: 2 C: 2] Adding MinervaNeue extension [tools/release] - https://gerrit.wikimedia.org/r/364789 (owner: Chad)
[18:14:46] Deployment-Systems, Operations, Services (attic): Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#3431909 (GWicke) We are working on this as part of {T170453}.
[18:15:21] Release-Engineering-Team, Operations, Services (watching): 2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3431889 (GWicke)
[18:15:34] Release-Engineering-Team, Operations, Services (watching): 2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3431889 (GWicke) p:Triage>Normal
[18:16:02] Release-Engineering-Team, Operations, Services (watching): FY2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3431917 (GWicke)
[18:18:46] !log things are back to a bad state, chase etc investigating
[18:18:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:18:59] !logs where "things" == nodepool instance delete/creation
[18:19:04] !log where "things" == nodepool instance delete/creation
[18:19:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:19:25] (PS1) Ejegg: Get rid of npm test for CiviCRM [integration/config] - https://gerrit.wikimedia.org/r/364804
[18:19:58] greg-g: time to change the topic? :o
[18:20:10] (the channel topic)
[18:20:49] sure
[18:27:01] RECOVERY - Puppet errors on deployment-stream is OK: OK: Less than 1.00% above the threshold [0.0]
[18:32:36] Beta-Cluster-Infrastructure, Thumbor: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3431967 (Gilles) Thumbor doesn't support those yet, you guys have to figure out how to deploy 3d2png properly first. There's a change cherry-picked for it on beta, b...
[18:34:00] Beta-Cluster-Infrastructure, Multimedia, Thumbor: On beta commons, thumbnailing of 3D files is broken still - https://phabricator.wikimedia.org/T170444#3431971 (greg)
[18:37:41] Release-Engineering-Team, Operations, Epic, Services (doing): Streamlined Service delivery Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3431976 (GWicke)
[18:38:11] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3431889 (GWicke)
[18:38:57] Release-Engineering-Team, Operations, Epic, Services (doing): Streamlined Service delivery Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3431976 (GWicke)
[18:39:17] Release-Engineering-Team, Operations, Epic, Services (doing): Streamlined Service delivery Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3431976 (GWicke)
[18:39:21] MediaWiki-Releasing, MediaWiki-Containers, Services (doing), User-mobrovac, Wikimedia-Hackathon-2015: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#3431993 (GWicke)
[18:39:45] Release-Engineering-Team, Operations, Epic, Services (doing): Streamlined Service delivery Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3431996 (GWicke)
[18:40:16] Release-Engineering-Team, Operations, Epic, Services (doing): Streamlined Service delivery Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3431976 (GWicke)
[18:42:26] Release-Engineering-Team, MediaWiki-Containers, Operations, Epic, and 3 others: Streamlined Service delivery Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3432001 (mobrovac)
[18:43:02] PROBLEM - Puppet errors on deployment-memc04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[18:47:08] RainbowSprinkles: it finally merged
[18:47:27] I saw, it looks like there's also wmf.7 and 9 branches
[18:47:31] But point to different sha1s
[18:47:40] Should rebranch from master
[18:48:03] Right?
[18:48:27] I think that nodepool is working again, although things are super slow
[18:49:08] {{done}}
[18:49:23] andrewbogott: Things are moving, but gonna take awhile to catch up
[18:49:43] Gate-and-submit queue is almost caught up
[18:49:46] Which is most important
[18:49:48] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3432015 (greg)
[18:50:12] Release-Engineering-Team, MediaWiki-Containers, Operations, Epic, and 3 others: Streamlined Service Delivery: Outcome 2, Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3432016 (greg)
[18:50:35] mobrovac: whatever kind of naming scheme you want to use, but I made it clear it was part of outcome 2 ^ per https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2017-2018/Draft/Programs/Technology#Program_6._Streamlined_service_delivery
[18:51:41] thnx greg-g!
[18:52:00] dash vs colon vs comma blah
[18:52:57] vs vs vs :)
[18:55:18] Ah, it's already in wmf.9
[18:58:02] RECOVERY - Puppet errors on deployment-memc04 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:58:20] RainbowSprinkles: gonna grab lunch soon, do you need me to stick around?
[18:58:27] Nah, I'm good
[18:58:58] RainbowSprinkles: after lunch i will merge https://gerrit.wikimedia.org/r/#/c/362448/ and then https://gerrit.wikimedia.org/r/362451
[18:59:15] that will make sure Minerva is ready to go tomorrow
[18:59:38] So tomorrow we'll just backport those
[18:59:42] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[18:59:46] RainbowSprinkles: sweet
[19:01:44] Hi relengers!
[19:02:18] the latest nodepool fun reminded me I wanted to consolidate the npm test and composer test for CiviCRM
[19:02:57] hashar had given us a patch that linted json files in composer test
[19:03:10] but we ended up going with one that did it in npm test
[19:03:40] now that we're using composer test to lint yaml files, I figured it made sense to do the json linting there too
[19:04:09] the patch in the CRM repo is ready to merge, we just need this CI patch too:
[19:04:19] https://gerrit.wikimedia.org/r/364804
[19:04:28] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): blubber: produces empy ENV statements - https://phabricator.wikimedia.org/T170285#3432160 (dduvall)
[19:10:25] (PS1) Chad: Revert "Adding MinervaNeue extension" [tools/release] - https://gerrit.wikimedia.org/r/364824
[19:10:34] (CR) Chad: [V: 2 C: 2] Revert "Adding MinervaNeue extension" [tools/release] - https://gerrit.wikimedia.org/r/364824 (owner: Chad)
[19:11:05] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #2: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/2/
[19:19:40] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[19:22:02] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Improve Blubber unit test coverage - https://phabricator.wikimedia.org/T168001#3432261 (dduvall) Open>Resolved
[19:24:37] Release-Engineering-Team (Backlog), Operations, Services (later), Wikimedia-Incident: Review new service 'pre-deployment to production' checklist - https://phabricator.wikimedia.org/T141897#3432284 (GWicke)
[19:25:01] PROBLEM - Puppet errors on deployment-mediawiki06 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:29:58] thcipriani: hi, how do i deploy scap changes. Since i do scap deploy and passes but looking on target it still has the same source code.
[19:30:22] I am trying to revert to a specfic phab commit for phabricator as it is segfaulting in the latest change
[19:30:29] please
[19:32:12] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:32:58] paladox: what is the sha1 currently at head on your deployment server? Is that sha1 listed in the /srv/deployment/[repo]-cache/revs/ dir?
[19:33:10] hmm will check
[19:33:43] nothing seems to be in the cache folder
[19:34:39] wat
[19:34:48] though there is on the host
[19:34:49] what does scap deploy-log -v show?
[19:34:53] (not deployment host)
[19:34:59] the target?
[19:35:02] yep
[19:35:04] yeah, that's where it sould be
[19:35:15] ah ok
[19:35:20] should have specified, on the target the rev directory should exist
[19:35:42] shows
[19:35:43] 8039e54eb72bc2c370a7b296a5af651f18d9efd4
[19:35:55] is that the rev you're trying to deploy?
[19:36:04] ah yeh
[19:36:06] looks like it
[19:36:25] ok, where does the symlink for /srv/deployment/[repo] on the target point?
[19:36:28] Deployment-Systems, Performance-Team, Technical-Debt: Replace xenon subscriber in wmf-config/StartProfile with Arc-Lamp - https://phabricator.wikimedia.org/T103462#1390824 (Krinkle)
[19:37:08] deployment -> deployment-cache/revs/8039e54eb72bc2c370a7b296a5af651f18d9efd4
[19:37:45] hrm
[19:37:48] it's deployed the change that i wanted now :)
[19:37:49] thanks
[19:37:55] sure :)
[19:37:55] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[19:37:59] looks like segfault is in stable
[19:38:04] twentyafterfour ^^
[19:38:58] paladox: ?
[19:39:08] i will get the logs now
[19:39:21] https://phabricator.wikimedia.org/P5738
[19:39:25] twentyafterfour ^^
[19:39:30] daemons are segfaulting
[19:39:40] with something to do with
[19:39:40] [ 3148.221271] php[8651]: segfault at 7f38a0f529a0 ip 00007f38a0f529a0 sp 00007ffefbee1188 error 14 in libffi.so.6.0.2[7f38a3fcc000+7000]
[19:41:15] weird
[19:41:46] yep
[19:42:12] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:43:36] Beta-Cluster-Infrastructure, Scap (Scap3-Adoption-Phase1), scap2, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#3432367 (GWicke) Is there still anything actionable left on this task, or is it time to declare victory?
[19:44:59] RECOVERY - Puppet errors on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:47:00] hmm fails on the known working version too
[19:47:01] strange
[19:47:15] then maybe it isnt actually a bug, but the unit file
[19:48:04] (PS1) Thcipriani: Update the php symlink after group1 promote [tools/release] - https://gerrit.wikimedia.org/r/364833
[19:48:20] hmm maybe
[19:48:28] but running bin/phd throws seg fault too
[19:49:17] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3432385 (Nuria)
[19:49:21] Beta-Cluster-Infrastructure, Scap (Scap3-Adoption-Phase1), scap2, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#3432384 (Nuria) Open>Resolved
[19:49:55] Beta-Cluster-Infrastructure, Scap (Scap3-Adoption-Phase1), scap2, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (Nuria) Victory it is.
[19:50:16] Continuous-Integration-Infrastructure, Math, RESTBase, Services: Restbase should be reachable from Jenkins - https://phabricator.wikimedia.org/T130783#2146318 (GWicke) We are working on moving to container-based testing as part of {T170453}. Each test run will likely have its own cluster of servi...
[19:50:39] Continuous-Integration-Infrastructure, Math, RESTBase, Services (attic): Restbase should be reachable from Jenkins - https://phabricator.wikimedia.org/T130783#3432400 (GWicke)
[19:50:52] what is libffi.so?
[19:51:54] libffi6
[19:53:57] paladox: google it
[19:54:04] already done that
[19:54:09] :)
[19:54:41] so when is the last time you did NOT have a segfault
[19:54:53] and what did you change since then BESIDES the phab version itself
[19:55:12] the last time was last night
[19:55:24] is there a stretch upgrade involved here?
[19:55:27] ok
[19:55:29] and i did a mariadb upgrade today
[19:55:35] !
[19:55:45] roll that back and see if segfault stays??
[19:55:54] i doint think i can do that
[19:56:01] since it is not easy rolling back
[19:56:02] db updates
[19:56:13] if that is really the case, delete it and create a new one
[19:56:32] hmm i will see if i can roll back first
[20:01:53] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:05:02] aha
[20:05:08] php is the one segfaulting
[20:05:14] mutante ^^
[20:05:30] * paladox https://phabricator.wikimedia.org/P5739
[20:07:01] !log Update mobileapps to d30dae2
[20:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:08:06] paladox: HAH, wow.. is that a new version?
[20:08:13] Nope
[20:08:20] from what i tell in the apt logs
[20:12:22] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:13:26] paladox: do you have quota for one more instance?
[20:13:32] nope
[20:13:43] will need to probaly request some
[20:13:45] paladox: take the "sundayroast" instance
[20:13:49] ok
[20:13:50] and just test this
[20:13:51] thanks
[20:13:54] dont install phab
[20:13:59] just install that php version
[20:14:02] ok
[20:14:03] and do the "php -v"
[20:14:09] and check if you really get that segfault again
[20:14:22] ok
[20:15:06] mutante it's tuesdaytaco now
[20:15:10] also php is not on there.
[20:15:14] should i install it?
[20:15:42] paladox: lol right, taco. yes do it.. just treat the whole instance as "throw-away"
[20:15:49] ok
[20:15:50] later kill it
[20:15:51] thanks :)
[20:15:53] np
[20:16:19] oh right, php 5 is not on stretch
[20:16:23] lol
[20:16:28] it's 7
[20:33:21] RainbowSprinkles: so Minerva is on the cluster now right? Just not enabled?
[20:33:31] I need to rebuild l10n
[20:33:33] But ya
[20:33:53] RainbowSprinkles: is it enabled on beta cluster?
[20:33:55] or can it be?
[20:33:58] Nope
[20:34:00] It can be
[20:34:09] yes please - will give me more confidence
[20:35:28] jdlrobson: a couple of $alcoholicDrinks should have the same effect
[20:35:36] Reedy: haha
[20:35:48] trying not to interrupt browser test service
[20:35:56] Let's go ahead and do the extension-list change. That'll get l10n built everywhere
[20:36:01] Then we can do the wfLoadExtension() patch
[20:36:04] To enable it on beta
[20:36:09] wfLoadSkin you mean?
[20:37:00] RainbowSprinkles: CommonSettings-labs.php >?
[20:37:08] I'll do it
[20:37:13] I don't like using -labs
[20:37:19] RainbowSprinkles: roger. Lemme know when patch is up
[20:37:19] Just do it in CommonSettings with a flag
[20:37:55] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:43:18] Beta-Cluster-Infrastructure, Wikidata: [betalabs] Wikidata sitelinks does not link to betalabs - https://phabricator.wikimedia.org/T170020#3432739 (jmatazzoni)
[20:43:24] Yippee, build fixed!
[20:43:24] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #3: FIXED in 22 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/3/
[20:52:23] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:58:21] Yippee, build fixed!
[20:58:21] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #490: FIXED in 26 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/490/
[21:01:56] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:03:45] i am going to delete phabricator and recreate it to see if it is reproducible reliably.
[21:05:42] taken out of context that sounds really scary :P [21:08:30] i meant the labs instance :) [21:15:36] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6 - Outcome 2: Developers are able to develop and test their applications through a unified pipeline towards production deployment. - https://phabricator.wikimedia.org/T170480#3432874 (greg) [21:15:56] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6 - Outcome 2: Developers are able to develop and test their applications through a unified pipeline towards production deployment. - https://phabricator.wikimedia.org/T170480#3432874 (greg) [21:16:00] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3431889 (greg) [21:16:01] Release-Engineering-Team, MediaWiki-Containers, Operations, Epic, and 3 others: Streamlined Service Delivery: Outcome 2, Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3432891 (greg) [21:17:11] Release-Engineering-Team, MediaWiki-Containers, Operations, Epic, and 3 others: FY2017/18 Program 6 - Outcome 2 - Objective 3: Integrated, container-based development environment - https://phabricator.wikimedia.org/T170456#3431976 (greg) [21:19:09] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline - https://phabricator.wikimedia.org/T170481#3432906 (greg) [21:20:27] Release-Engineering-Team, Mathoid, Epic: Define functional tests for Mathoid running on the staging Kubernetes cluster for use in future gating decisions - https://phabricator.wikimedia.org/T170482#3432925 (greg) [21:21:41] Release-Engineering-Team, Epic: Define method for monitoring and reacting to the mathoid 
functional tests - https://phabricator.wikimedia.org/T170483#3432940 (greg) [21:21:58] Release-Engineering-Team (Kanban), Mathoid, Epic: Define functional tests for Mathoid running on the staging Kubernetes cluster for use in future gating decisions - https://phabricator.wikimedia.org/T170482#3432925 (greg) [21:22:16] Release-Engineering-Team (Kanban), Mathoid: Define functional tests for Mathoid running on the staging Kubernetes cluster for use in future gating decisions - https://phabricator.wikimedia.org/T170482#3432925 (greg) [21:22:19] Release-Engineering-Team (Kanban): Define method for monitoring and reacting to the mathoid functional tests - https://phabricator.wikimedia.org/T170483#3432940 (greg) [21:24:28] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] [21:24:58] Scap, WorkType-NewFunctionality: Play elevator music while scap is running - https://phabricator.wikimedia.org/T170484#3432960 (demon) [21:31:32] lol ^^ [21:35:11] Release-Engineering-Team, Documentation: Require vagrant role for extensions wanting review for WMF deployment - https://phabricator.wikimedia.org/T170488#3433042 (Reedy) [21:40:33] Scap, WorkType-NewFunctionality: Play elevator music while scap is running - https://phabricator.wikimedia.org/T170484#3432960 (Dzahn) ``` File: Elevator Music Drum Loop Details: Elevator Music On Hold Artist: miafas Format: .WAV CC License: Noncommercial Category: Drum Loop Size: 4 MB ``` source: http:... [21:41:31] ^ click audio player in phab comment [21:41:46] lol [21:43:15] surely we have something on commons already? 
[21:43:23] and chad ruins afternoon productivity [21:44:16] greg-g: https://www.youtube.com/watch?v=6g4dkBF5anU [21:44:52] Well I also have a few dj friends [21:45:01] Could ask one to compose us a track [21:45:01] Release-Engineering-Team, Documentation: Require vagrant role for extensions wanting review for WMF deployment - https://phabricator.wikimedia.org/T170488#3433042 (greg) I really like this idea and even if each extension we already have deployed doesn't have a corresponding vagrant role that's OK for now... [21:45:09] Should be able to put something together in FL [21:45:10] p858snake: that one is great :) [21:45:30] Continuous-Integration-Infrastructure, Cloud-VPS, Patch-For-Review: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3433155 (Andrew) For a dramatic change, I'm going to merge a patch that doubles the spawn time from 5 seconds to 10 seconds. If... [21:46:43] Continuous-Integration-Infrastructure, Cloud-VPS, Nodepool, Patch-For-Review: figure out if nodepool is overwhelming rabbitmq and/or nova - https://phabricator.wikimedia.org/T170492#3433162 (greg) [21:50:27] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #4: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/4/ [22:06:52] greg-g or thcipriani do you know what a healthy nodepool looks like? I restarted it and now I'm watching to see it spawn new instances, and it… maybe isn't? [22:07:09] It says it is but I don't see them [22:07:47] * thcipriani looks [22:07:54] oh wait, there is one finally! [22:07:59] Maybe it's fine and I'm just impatient... [22:08:12] or else I somehow made it 20x slower when trying to make it 2x slower... [22:09:42] it appears to be doing stuff...I'm watching both /var/log/nodepool/debug.log and nodepool list [22:10:21] yep, agreed, it seems fine now. 
I don't know what the long pause was [22:10:30] but I'm going to let it be for a while and we'll see [22:11:32] fwiw, I've noticed that when first started it really tries to take stock of the state of the world and sometimes stumbles and flails around in the process -- as evidenced by various error messages in the debug log [22:15:23] Release-Engineering-Team (Kanban), releng-201718-q1, Mathoid: Define functional tests for Mathoid running on the staging Kubernetes cluster for use in future gating decisions - https://phabricator.wikimedia.org/T170482#3433371 (greg) [22:15:33] Release-Engineering-Team (Kanban), releng-201718-q1: Define method for monitoring and reacting to the mathoid functional tests - https://phabricator.wikimedia.org/T170483#3433373 (greg) [22:17:38] nodepool sounds like me in the morning [22:21:39] Yippee, build fixed! [22:21:40] Project selenium-CentralAuth » firefox,beta,Linux,BrowserTests build #455: FIXED in 1 min 38 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/455/ [22:23:30] Release-Engineering-Team, Operations, Epic, Services (watching): FY2017/18 Program 6: Streamlined Service delivery - https://phabricator.wikimedia.org/T170453#3433422 (GWicke) [22:46:48] Scap (Scap3-Adoption-Phase1), releng-201516-q4, releng-201718-q1, Trebuchet: [keyresult] Migrate remaining trebuchet deployed services - https://phabricator.wikimedia.org/T129290#3433569 (greg) [22:46:51] Scap (Scap3-Adoption-Phase1), scap2, Cassandra, RESTBase-Cassandra: Deploy logstash logback encoder with scap3 - https://phabricator.wikimedia.org/T116340#3433568 (greg) [22:52:26] Release-Engineering-Team (Watching / External), Phlogiston, User-KSmith: Adjust phlogiston configuration for Release Engineering - https://phabricator.wikimedia.org/T170359#3433659 (greg) [22:56:35] nodepool just makes me think of deadpool [22:57:31] love 
deadpool [22:58:33] lol [23:06:09] jdlrobson: Were you gonna land those 2 changes to their masters? [23:06:13] So they can go ahead out to beta? [23:22:51] twentyafterfour strange segfault, i've recreated the instance and see no segfaulting now [23:28:44] jdlrobson: I take that as a yes ::p [23:30:31] I did the backports of MinervaNeue to wmf.7/9 (-2'd til tomorrow), but the MF one has merge conflicts [23:47:25] twentyafterfour i will work on diffusion later today (i say today because it's 00:47am) [23:47:37] I already have an idea where to put the download button :) [23:48:32] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:56:45] Yippee, build fixed! [23:56:46] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #5: FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/5/