[00:14:10] (PS1) Legoktm: Disable ruby for Echo [integration/config] - https://gerrit.wikimedia.org/r/436208
[00:14:24] (CR) Legoktm: [C: 2] Disable ruby for Echo [integration/config] - https://gerrit.wikimedia.org/r/436208 (owner: Legoktm)
[00:16:39] (Merged) jenkins-bot: Disable ruby for Echo [integration/config] - https://gerrit.wikimedia.org/r/436208 (owner: Legoktm)
[00:20:17] Project beta-update-databases-eqiad build #25813: FAILURE in 16 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/25813/
[00:55:10] Phabricator: External links are not displayed - https://phabricator.wikimedia.org/T195936#4241712 (Zoranzoki21)
[00:55:44] Phabricator: External links are not displayed - https://phabricator.wikimedia.org/T195936#4241712 (Zoranzoki21)
[01:01:29] Phabricator, User-Zoranzoki21: External links are not displayed - https://phabricator.wikimedia.org/T195936#4241744 (Zoranzoki21)
[01:20:16] Project beta-update-databases-eqiad build #25814: STILL FAILING in 14 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/25814/
[01:21:49] Exception: ('command: ', "echo 'aawiki'; /usr/local/bin/mwscript update.php --wiki=aawiki --quick", 'output: ', 'aawiki\n#!/usr/bin/env php\nMediaWiki 1.32.0-alpha Updater\n\noojs/oojs-ui: 0.27.1 installed, 0.27.0 required.\nError: your composer.lock file is not up to date. Run "composer update --no-dev" to install newer dependencies\n')
[01:29:01] paladox: too late :)
[01:29:07] heh
[01:29:14] but paladox, guess what:
[01:29:17] on planet2001:
[01:29:23] Unpacking rawdog (2.22-1-wmf1) over (2.22-1) ...
[01:29:29] :)
[01:29:34] apt-get upgrade worked
[01:29:44] :)
[01:29:46] and it was already stretch
[01:29:52] and the puppet class is ready
[01:30:23] heh
[01:30:38] "0.27.1 installed, 0.27.0 required.\nError: your composer.lock file is not up to date."
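Editorial sketch: the updater output quoted above rejects an installed 0.27.1 against a required 0.27.0. One way to picture how that can happen is to compare an exact-match check with a minimum-version check. This is only an illustration, assuming plain dotted integer versions; the function names are hypothetical and this is not MediaWiki's or Composer's actual code.

```python
def parse_version(text):
    """Turn a dotted version such as '0.27.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_satisfied_exact(installed, required):
    # Exact match: any difference, including a newer install, is a mismatch.
    return parse_version(installed) == parse_version(required)


def is_satisfied_minimum(installed, required):
    # Minimum version: a newer install is accepted.
    return parse_version(installed) >= parse_version(required)


installed, required = "0.27.1", "0.27.0"
print(is_satisfied_exact(installed, required))    # False
print(is_satisfied_minimum(installed, required))  # True
```

Comparing tuples of integers rather than raw strings avoids ordering surprises such as "0.9" sorting after "0.27".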
[01:30:53] sounds like there is a "=="
[01:30:59] instead of a ">="
[01:31:11] and a newer version is also considered "not up to date"
[01:32:12] im guessing someone updated it in mw vendor
[01:32:15] but not mw cor
[01:32:16] core
[01:32:21] or the other way around
[01:32:37] * paladox goes now
[01:36:32] cya
[01:37:23] (PS1) Legoktm: Add the option to skip a single stage [integration/quibble] - https://gerrit.wikimedia.org/r/436210
[01:47:52] (CR) Krinkle: [C: 1] Add the option to skip a single stage [integration/quibble] - https://gerrit.wikimedia.org/r/436210 (owner: Legoktm)
[02:24:10] Yippee, build fixed!
[02:24:11] Project beta-update-databases-eqiad build #25815: FIXED in 4 min 9 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/25815/
[05:28:34] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<20.00%)
[06:53:34] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:33:41] Phabricator (Upstream), Upstream: Option to Turn Off Status Updates in Phabricator Task-Threads - https://phabricator.wikimedia.org/T195728#4241966 (Johnywhy) @AKlapper, Phabricator Support informed me: Aklapper can now Mute the conversation, which should stop this. Also, since he’d removed himself from...
[07:46:26] Project-Admins, wikiba.se: permit a phabricator page for the FactGrid project - https://phabricator.wikimedia.org/T193071#4241972 (samuwmde) @Olaf_Simons Do you still want to work on Phabricator for Factgrid?
[07:47:14] Phabricator, User-Zoranzoki21: External links are not displayed - https://phabricator.wikimedia.org/T195936#4241973 (Johnywhy) @Zoranzoki21 thx, but this thread has absolutely nothing to do with my post that you merged into it.
Cheers
[07:54:23] Phabricator (Upstream), Upstream: Quoting shouldn't readd me to a task I've unsubscribed from - https://phabricator.wikimedia.org/T76993#824352 (Johnywhy) >>! In T76993#824543, @Qgil wrote: > A mere reply... to a comment you made. Isn't it reasonable to expect that the commenter wants you to know? No do...
[08:22:21] Continuous-Integration-Infrastructure, MediaWiki-Installer, Patch-For-Review: WikibaseLexeme's PHPunit test failing on Jenkins due to autoloading issues - https://phabricator.wikimedia.org/T195823#4238331 (hashar) PropertySuggester ( T195783 ) apparently no more have the issue. The tests still fail b...
[08:25:22] Continuous-Integration-Config, MediaWiki-Releasing: Test all MediaWiki tarball extensions in gate for all changes to MediaWiki and each other - https://phabricator.wikimedia.org/T195932#4242017 (hashar)
[08:29:57] (PS1) Hashar: Migrate GraphViz to Quibble [integration/config] - https://gerrit.wikimedia.org/r/436230 (https://phabricator.wikimedia.org/T183512)
[08:30:14] (CR) Hashar: [C: 2] Migrate GraphViz to Quibble [integration/config] - https://gerrit.wikimedia.org/r/436230 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[08:30:40] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4242025 (hashar)
[08:31:50] (Merged) jenkins-bot: Migrate GraphViz to Quibble [integration/config] - https://gerrit.wikimedia.org/r/436230 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[08:33:25] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4242027 (hashar)
[08:33:52] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4170903 (hashar)
[08:46:46] PROBLEM - Host deployment-ores01 is DOWN: CRITICAL - Host Unreachable (10.68.16.235)
[09:02:11] is someone working/tracking the current issues in wikibase the block merges?
[09:03:03] Phabricator, User-Zoranzoki21: External links are not displayed - https://phabricator.wikimedia.org/T195936#4242088 (Aklapper) Open>Invalid None of the examples are a valid URL as no protocol prefix is provided. Hence closing task as invalid.
[09:03:47] Phabricator (Upstream), Upstream, User-Zoranzoki21: Phabricator does not turn external links without a protocol prefix into links - https://phabricator.wikimedia.org/T195936#4242090 (Aklapper)
[09:09:43] Phabricator, Pywikibot-core, Pywikibot-RfCs: Shall we rename Pywikibot-core Phabricator project to Pywikibot? - https://phabricator.wikimedia.org/T195893#4242100 (Aklapper) Practically speaking, note that T76732 is not fully fixed. Now that Phab offers [[ https://www.mediawiki.org/wiki/Phabricator/P...
[09:09:58] Project-Admins, Pywikibot-core, Pywikibot-RfCs: Shall we rename Pywikibot-core Phabricator project to Pywikibot? - https://phabricator.wikimedia.org/T195893#4242102 (Aklapper)
[09:20:05] Nikerabbit: this? https://wikitech.wikimedia.org/wiki/Incident_documentation/20180524-wikidata
[09:21:51] zeljkof: no I mean https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-jessie/45288/console
[09:23:22] Nikerabbit: oh, what a mess...
is there a task for it
[09:23:35] I was just about to ping hashar, I think he sensed the ping :)
[09:23:44] hashar: see ^
[09:23:46] client crash
[09:25:47] zeljkof: that's what I was trying to figure out :D
[09:27:13] Nikerabbit: ah yeah that is because whatever extension triggered that patch depends on WikibaseQuality
[09:27:26] and the WikibaseQuality files cause a test in mediawiki/core to fail
[09:27:34] because the file name / path does not match the namespaced class
[09:27:48] I was talking about it with leszek_wmde yesterday and this morning. That is being fixed
[09:28:34] Nikerabbit: there are a bunch of patches at https://gerrit.wikimedia.org/r/#/q/owner:%22WMDE-leszek+%253Cleszek.manicki%2540wikimedia.de%253E%22
[09:28:44] hashar: I see... how comes this error is not caught before it hits other extensions? Is this job not always run or what?
[09:28:56] It's becoming a recurrent issue
[09:29:39] fwiw https://gerrit.wikimedia.org/r/#/c/436234/ should be the fix for most cases
[09:30:35] I believe the issue at hand could have only be noticed if there was a job for core changes, that would run all core tests + tests of ALL the extensions
[09:30:55] but I don't know how sensible/wanted/possible is this
[09:31:19] Nikerabbit: that is a change in mediawiki/core .
It does run tests for some extensions but not for all of them
[09:31:32] hmhmhm
[09:31:36] and running all extensions tests is prone to failure
[09:31:54] and CX is unlucky because it has a (soft) dependency on Wikibase
[09:32:18] we tried when we migrated to hhvm a few years ago but: a) it is too long b) extensiosn conflict with each other c) mediawiki/core tests fails due to extensions altering the code behavior via hooks
[09:32:31] but
[09:33:01] when one send a change to CX, probably we should only run CX tests and other extensions tests with a specific label such as @integration
[09:33:27] a change to CX probably ends up running the Scribunto tests that verify lua works properly which is definitely unrelated
[09:33:40] that is known by us but we dont have the bandwith / time to figure out a solution
[09:35:34] I guess that's worth for us to bring up at some point to see if you can be resourced for that, it can be pretty bad if a blocker appears when we need to get stuff merged
[09:36:01] there is no urgency now, but would be nice if the current issue can be fixed today
[09:39:26] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4242211 (hashar)
[10:05:34] (Draft2) Ayounsi: Allow netbox-deploy to subscribe to netbox [software/netbox] (refs/meta/config) - https://gerrit.wikimedia.org/r/436253
[10:10:12] (Abandoned) Zfilipin: Run mwext-ruby-jessie on PHP 7.0 [integration/config] - https://gerrit.wikimedia.org/r/436027 (https://phabricator.wikimedia.org/T195851) (owner: Zfilipin)
[10:12:07] Release-Engineering-Team (Kanban), Collaboration-Team-Triage, Notifications, Patch-For-Review, User-zeljkofilipin: Mocha tests for Echo notifications - https://phabricator.wikimedia.org/T177412#4242345 (zeljkofilipin)
[10:12:10] Release-Engineering-Team (Kanban), Patch-For-Review, User-zeljkofilipin: mwext-ruby-jessie fails with `Error: You might be using an older PHP version` - https://phabricator.wikimedia.org/T195851#4242344 (zeljkofilipin) Open>Resolved
[10:13:25] Release-Engineering-Team (Kanban), Release Pipeline, Wikimedia-Hackathon-2018, Services (watching), User-zeljkofilipin: Wikimedia Continuous Delivery Pipeline: Say What? - https://phabricator.wikimedia.org/T194940#4213478 (zeljkofilipin) >>! In T194940#4240636, @bcampbell wrote: > New YouTube...
[10:28:29] Continuous-Integration-Infrastructure, MediaWiki-Installer, Patch-For-Review: WikibaseLexeme's PHPunit test failing on Jenkins due to autoloading issues - https://phabricator.wikimedia.org/T195823#4242393 (WMDE-leszek) Open>Resolved a:WMDE-leszek We're green again. Thanks a lot @Legoktm a...
[11:00:06] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:41:16] Did anyone happen to shut off some deployment-prep boxes today?
[11:42:49] /topic public log is broken for me (in NL), http://ur1.ca/qlpgz
[11:43:00] RECOVERY - Host deployment-ores01 is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms
[11:51:51] awight: url1.cs has been dead for ages
[11:54:12] darn, I can't op myself here
[11:54:39] Here's a nice URL, for anyone who's interested.
https://wm-bot.wmflabs.org/browser/index.php?display=%23wikimedia-releng
[12:50:06] (CR) Ayounsi: [V: 2 C: 2] Allow netbox-deploy to subscribe to netbox [software/netbox] (refs/meta/config) - https://gerrit.wikimedia.org/r/436253 (owner: Ayounsi)
[13:18:47] PROBLEM - Puppet errors on deployment-maps03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:40:25] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[14:15:24] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:31:06] https://gerrit-review.googlesource.com/c/gerrit/+/181674 yay
[15:38:10] Is it weird that the calendar on https://wikitech.wikimedia.org/wiki/Deployments only extends to tomorrow?
[15:38:39] PROBLEM - Host integration-slave-k8s-1015 is DOWN: CRITICAL - Host Unreachable (10.68.16.112)
[15:39:36] Release-Engineering-Team (Kanban), Release Pipeline, Wikimedia-Hackathon-2018, Services (watching), User-zeljkofilipin: Wikimedia Continuous Delivery Pipeline: Say What? - https://phabricator.wikimedia.org/T194940#4243267 (bcampbell)
[15:40:06] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Someday): Get rid of Zend 5.5 tests for wmf branches - https://phabricator.wikimedia.org/T94149#4243269 (Jdforrester-WMF) stalled>Open No longer stalled.
[15:40:25] Release-Engineering-Team (Kanban), Release Pipeline, Wikimedia-Hackathon-2018, Services (watching), User-zeljkofilipin: Wikimedia Continuous Delivery Pipeline: Say What? - https://phabricator.wikimedia.org/T194940#4213478 (bcampbell) The links should be fixed now.
Thanks, @zeljkofilipin
[15:42:09] PROBLEM - Host integration-slave-k8s-1017 is DOWN: CRITICAL - Host Unreachable (10.68.16.153)
[15:43:00] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Quibble: Only run phpcs (and other slow, non-variant tasks?) in PHP7, not also in HHVM - https://phabricator.wikimedia.org/T195984#4243273 (Jdforrester-WMF)
[15:43:11] (CR) Jforrester: "Created T195984 for the initiative." [integration/quibble] - https://gerrit.wikimedia.org/r/436210 (owner: Legoktm)
[15:51:48] andrewbogott: I'm just behind, will be creating more weeks tomorrow
[15:52:18] ok :)
[16:09:38] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:20:11] (PS2) Legoktm: Add the option to skip a single stage [integration/quibble] - https://gerrit.wikimedia.org/r/436210 (https://phabricator.wikimedia.org/T195984)
[16:23:21] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Quibble, Patch-For-Review: Only run phpcs (and other slow, non-variant tasks?) in PHP7, not also in HHVM - https://phabricator.wikimedia.org/T195984#4243323 (Legoktm) a:Legoktm And parallel-lint under HHVM is atrociously sl...
[16:28:29] (CR) Jforrester: "Also fixed T94149." [integration/config] - https://gerrit.wikimedia.org/r/434922 (https://phabricator.wikimedia.org/T172165) (owner: Reedy)
[16:28:52] (CR) Legoktm: [C: 2] Add the option to skip a single stage [integration/quibble] - https://gerrit.wikimedia.org/r/436210 (https://phabricator.wikimedia.org/T195984) (owner: Legoktm)
[16:28:53] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Someday): Get rid of Zend 5.5 tests for wmf branches - https://phabricator.wikimedia.org/T94149#4243332 (Jdforrester-WMF) Open>Resolved a:Reedy Indeed, done last week in h...
[16:29:53] (Merged) jenkins-bot: Add the option to skip a single stage [integration/quibble] - https://gerrit.wikimedia.org/r/436210 (https://phabricator.wikimedia.org/T195984) (owner: Legoktm)
[16:30:11] Continuous-Integration-Config, Patch-For-Review: Jenkins check for vulnerable libraries in all node.js repos - https://phabricator.wikimedia.org/T96078#1207791 (Jdforrester-WMF) When we upgrade to npm 5.x we'll get `npm audit` which'll do this for us…
[16:30:13] (CR) jenkins-bot: Add the option to skip a single stage [integration/quibble] - https://gerrit.wikimedia.org/r/436210 (https://phabricator.wikimedia.org/T195984) (owner: Legoktm)
[16:30:50] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Quibble, Patch-For-Review: Only run phpcs, parallel-lint, and other(?) slow, non-variant tasks in PHP7, not also in HHVM - https://phabricator.wikimedia.org/T195984#4243341 (Jdforrester-WMF)
[16:38:42] Phabricator, Project-Admins, Security-Team: Create Tag for WMDE Fundraising Security issues - https://phabricator.wikimedia.org/T194286#4243360 (gabriel-wmde) Thank you for the explanation. At the moment I'd gravitate towards having a Space to avoid giving heart attacks the poor people who are watchi...
[18:09:50] (PS1) Legoktm: docker: quibble 0.0.15 [integration/config] - https://gerrit.wikimedia.org/r/436345
[18:10:51] (CR) Legoktm: [C: 2] docker: quibble 0.0.15 [integration/config] - https://gerrit.wikimedia.org/r/436345 (owner: Legoktm)
[18:12:42] (Merged) jenkins-bot: docker: quibble 0.0.15 [integration/config] - https://gerrit.wikimedia.org/r/436345 (owner: Legoktm)
[18:13:15] PROBLEM - Host deployment-deploy1001 is DOWN: CRITICAL - Host Unreachable (10.68.20.75)
[18:15:57] (CR) Krinkle: [C: 1] Add FunctionAnnotations checking tags in function comments only [tools/codesniffer] - https://gerrit.wikimedia.org/r/424554 (https://phabricator.wikimedia.org/T182057) (owner: Thiemo Kreuz (WMDE))
[18:21:05] Krenair: so.. creating a new deployment-deploy1001
[18:21:14] you say it should be using just -01?
[18:21:35] should be but I don't think it's worth the effort to replace it now that it's made
[18:21:46] there is another reason to replace it.. disk space and another ticket
[18:21:47] indeed, 10x is prod eqiad.
[18:21:49] and it already got deleted
[18:22:20] do we give up on the idea of multi-dc deployment-prep , theoretically
[18:22:25] when doing that
[18:22:42] I suppose not, we enforce fqdn in cloud now
[18:22:51] So it just means more localised fragmentation
[18:23:31] is the idea of multi-dc deployment-prep remotely doable right now?
[18:23:32] Although if we want to emulate it within a single cluster, we could use other patterns for that yeah. Anyway.
[18:24:15] I think the idea mostly is that once MW is free of current single-dc restrictions, there won;t be a ton of difference between a multi-node setup and a multi-dc setup.
[18:25:00] ok, i'll go with -01 and follow your advice :)
[18:25:04] Many of the same principles apply. Its just that we're currently ignoring many of the same problems given they're less common within a single DC.
But so far lots of the multi-dc work that's been done just made things more stable within a single DC as well.
[18:25:06] but .. i am not sure i can create instances
[18:25:12] what is the permission that is tied to now
[18:25:14] we don't use a hyphen in the name
[18:25:16] projectadmin
[18:25:17] it used to be project admins
[18:25:18] Anyway, I suppose it's always best to remain consistent even if we want to change it later :/
[18:25:49] the "Access" tab in Horizon shows me project members but not project admins
[18:26:00] i dont see a "create" link in the instance list yet
[18:26:07] mutante, you now have projectadmin
[18:26:27] 🎉
[18:26:29] thanks! trying
[18:26:46] signs out and back in again
[18:27:36] yes, "launch instance" is there, creating one with more disk
[18:28:04] Krenair: i see all 3 variations in that project
[18:28:13] deployment-ores01
[18:28:19] deployment-kafka-main-2
[18:28:32] deployment-mediawiki-09
[18:28:46] re: consisteny
[18:29:34] deployment-deploy-01 ? i think i should follow mediawiki
[18:31:41] I think those are also in the wrong
[18:32:28] :pp
[18:32:33] (PS1) Legoktm: Bump Quibble jobs to 0.0.15 [integration/config] - https://gerrit.wikimedia.org/r/436349
[18:32:41] well, deployment-ores01 isn't
[18:32:52] !log created instance deployment-deploy-01 with stretch and flavor x-large (T192561)
[18:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:32:55] T192561: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561
[18:32:59] apparently someone used a hyphen to show that a mediawiki appserver is using stretch
[18:33:13] arg
[18:33:30] dunno why kafka-main-2 is a thing
[18:35:08] Beta-Cluster-Infrastructure, Release-Engineering-Team, Operations: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561#4243815 (Dzahn) deployment-deploy1001 has been deleted by thcipriani.
deployment-deploy-01 has been created with x-large flavor for mor...
[18:36:16] Beta-Cluster-Infrastructure, Release-Engineering-Team (Next): deployment-tin has disk space issues - https://phabricator.wikimedia.org/T166492#3297749 (Dzahn) - created instance deployment-deploy-01 with stretch and flavor x-large (T192561) - deployment-deploy1001 has been deleted by thcipriani
[18:38:19] (PS1) Legoktm: Move "composer-test" to separate job for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/436351 (https://phabricator.wikimedia.org/T195984)
[18:39:56] Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=error: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: deployment-puppetmaster03.deployment-prep.eqiad.wmflabs]
[18:40:08] running puppet on the new instance fails with these cert issues
[18:40:22] mutante i guess you have to remove the cert
[18:40:24] first
[18:40:27] (CR) jerkins-bot: [V: -1] Move "composer-test" to separate job for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/436351 (https://phabricator.wikimedia.org/T195984) (owner: Legoktm)
[18:40:37] it's a new host name
[18:40:38] as new installs include the cert for the main labs puppet master
[18:40:51] happens to me too.
[18:41:09] why arent we using the default master
[18:41:20] because the proect has a puppet master
[18:41:22] project
[18:41:50] what would i remove if this host never existed before?
[18:41:56] so when puppet runs first (when it's doing its thing setting up the instance) it will change the puppet master to the local one
[18:41:57] mutante: https://phabricator.wikimedia.org/T192561#4183530 under "fix certs" will fix that issue
[18:42:09] we use deployment-puppetmaster02 for testing puppet patches in beta
[18:42:12] reading, thanks!
[18:42:16] or for beta-specific patches
[18:42:26] that haven't yet been merged for one reason or another
[18:42:33] it's its own problem :)
[18:42:44] thcipriani i think krenair updated it to
[18:42:46] deployment-puppetmaster03
[18:42:47] (CR) Legoktm: [C: -1] Move "composer-test" to separate job for MediaWiki core [integration/config] - https://gerrit.wikimedia.org/r/436351 (https://phabricator.wikimedia.org/T195984) (owner: Legoktm)
[18:43:01] paladox: ah, didn't realize, thanks :)
[18:43:13] your welcome :)
[18:43:38] it's deployment-puppetmaster03 thcipriani
[18:43:47] oh
[18:44:00] :)
[18:44:07] wanna sign my cert request there?
[18:44:38] # sudo puppet cert sign deployment-deploy-01.deployment-prep.eqiad.wmflabs
[18:44:38] done
[18:45:00] Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.
[18:45:11] ehm
[18:45:32] was there more than one request?
[18:46:34] let's try it again..
[18:46:48] On the master:
[18:46:49] puppet cert clean deployment-deploy-01.deployment-prep.eqiad.wmflabs
[18:46:49] On the agent:
[18:46:49] 1a. On most platforms: find /var/lib/puppet/ssl -name deployment-deploy-01.deployment-prep.eqiad.wmflabs.pem -delete
[18:47:29] Info: Creating a new SSL key for deployment-deploy-01.deployment-prep.eqiad.wmflabs
[18:47:32] Info: Caching certificate for deployment-deploy-01.deployment-prep.eqiad.wmflabs
[18:47:35] Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.
[18:48:03] hold on
[18:55:42] mutante, I sorted out the certificates and puppet is just taking it's time to run
[18:56:43] PROBLEM - Puppet errors on deployment-deploy-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[18:57:23] Release-Engineering-Team (Watching / External), Operations, Patch-For-Review: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4243879 (Dzahn) The new planned window for this migration is the upcoming Friday, June 1st. (with thcipriani hoping he gets t...
[18:57:34] Krenair: thank you :)
[18:57:43] yes, the first run on deployment_server is LONG
[18:57:51] i applied that role to it via instance puppet
[18:57:56] that is matching prod
[18:58:04] all other additional roles would be beta-specific
[18:59:54] mutante, we usually wait for the first successful puppet run before adding roles but okay
[19:00:26] we have a project-specific puppetmaster so you always have to fix up certificate stuff before it starts to work properly
[19:00:41] ok
[19:01:50] the very first puppet run is against the labs central puppetmaster is when it overwrites its puppet.conf to have the project-specific puppetmaster... but doesn't have a signed cert etc. for the new puppetmaster
[19:02:09] Beta-Cluster-Infrastructure, Release-Engineering-Team, Operations: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561#4243899 (Dzahn) applied the "role(deployment_server)" on it via instance puppet. (like in prod, no other roles yet that would differ fr...
[19:03:40] Release-Engineering-Team (Watching / External), Operations, Patch-For-Review: setup/install/deploy deploy1001 as deployment server - https://phabricator.wikimedia.org/T175288#4243902 (Dzahn) deployment-prep now also has a new instance using stretch with more disk space to match this (T192561#4243810)
[19:11:18] it's still going...
[19:11:18] may be one of the longest puppet runs I've seen
[19:11:38] i can confirm this from prod when making deploy1001.. hours
[19:12:23] just doing something different and we can look at it later
[19:12:57] cloning all the things
[19:13:09] yup
[19:15:29] hours to clone stuff across the wikimedia network?
[19:15:33] done
[19:15:34] Notice: Applied catalog in 1360.60 seconds
[19:15:51] 22 minutes 40.6 seconds
[19:16:27] Krenair: I guess it clones all the git repos folks have to deploy
[19:16:32] various things failing out of Scap_source
[19:16:47] the disk on labs instance is not lightning fast, but the worse is probably resolving all the deltas and building the pack
[19:17:01] which is cpu bounded and takes a while for sure
[19:17:28] also possibly your instances might be running on a labvirt migth be too busy at this time (it happens sometime)
[19:17:46] that seems faster than prod. dunno why :)
[19:18:39] https://phabricator.wikimedia.org/P7189
[19:19:31] see line 2695
[19:19:43] -2725
[19:20:49] run it a second time?
[19:20:57] also, run "/usr/bin/scap deploy --init" manually?
[19:24:01] Krenair: "We create the file /var/lock/scap-global-lock preventing deploys, then we run scap deploy --init which fails because it checks the lock file"
[19:24:14] see "Broken stuff" on https://phabricator.wikimedia.org/T192561#4183530
[19:25:13] so I gotta delete /var/lock/scap-global-lock and then run puppet?
[19:25:38] unless puppet puts it back
[19:26:15] it sounds like it, but the lines after that also mention other issues in relation to scap --init
[19:26:20] thcipriani: hey! we're having some issues with scap and maps, would you have some time to have a look with us?
[19:26:29] Or anyone else who knows something about scap
[19:26:33] 3. and 5.
[19:26:36] yeah it put it back
[19:26:40] okay so what I'm going to do
[19:26:48] delete the file
[19:27:22] disable puppet, delete file, run scap command manually, re-enable puppet, run puppet ?
[19:27:23] and make scap_source stuff happen before File[/var/lock/scap-global-lock]
[19:27:24] more specifically, we have some templated config file, with is generated with the wrong variable substitution. We suspect we did something wrong with environments, but that might be entirely unrelated
[19:27:27] gehel: I'm working on train, but I'm currently waiting for l10n updates so I've got a few minutes, what are you seeing?
[19:27:32] i didnt have to do this in prod
[19:27:34] ah
[19:27:41] thcipriani: no emergency
[19:27:41] gehel: for which service?
[19:27:54] tilerator, deployed on maps-test2004
[19:29:26] ugh it uses create_resources right
[19:30:08] specifically the 'source_location' variable should be '@kartotherian/meddo', but is '@kartotherian/osm-bright-source'
[19:30:57] we expect the value of that var to come from /srv/deployment/tilerator/deploy/scap/environments/cleartables/vars.yaml
[19:31:13] gehel: so if you checkout /srv/deployment/tilerator/deploy/.git/DEPLOY_HEAD scap does thing that is the correct override to use for source_location for some reason
[19:32:32] ah, I see what you were thinking. So the environments in the scap.cfg file are only specific to the deployment host
[19:32:33] looks like we found part of the issue! we needed to pass --environemnt explicitly to scap
[19:32:37] yeah
[19:33:09] if you pass --environment cleartables that will work, environment in the scap.cfg file would work to, but that's a per-deployment-host thing
[19:33:50] yeah alright this is hard
[19:33:52] so, like, deploying from production vs deploying from beta can have different environments and use different config, but different targets can't have a different config based on the scap.cfg
[19:33:55] I'm going to do it the manual way
[19:35:12] mutante, I gotta run scap deploy --init in each of the directories?
[19:35:23] thcipriani: ok, I think I got it...
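Editorial sketch: the templating problem discussed above comes down to whether an environment's override is applied on top of the default template variables. The toy model below illustrates that idea only; it is not scap's implementation. The function name and dictionary layout are hypothetical, the two values are the ones quoted in the conversation, and treating '@kartotherian/osm-bright-source' as the default is an assumption.

```python
def resolve_vars(base_vars, environment_vars, environment=None):
    """Return template variables, with an environment's overrides applied last."""
    resolved = dict(base_vars)
    if environment is not None:
        # An unknown environment simply contributes no overrides.
        resolved.update(environment_vars.get(environment, {}))
    return resolved


# Assumed default value, as observed when no environment was selected.
BASE_VARS = {"source_location": "@kartotherian/osm-bright-source"}
# Override expected from scap/environments/cleartables/vars.yaml.
ENVIRONMENT_VARS = {"cleartables": {"source_location": "@kartotherian/meddo"}}

print(resolve_vars(BASE_VARS, ENVIRONMENT_VARS)["source_location"])
# @kartotherian/osm-bright-source
print(resolve_vars(BASE_VARS, ENVIRONMENT_VARS, "cleartables")["source_location"])
# @kartotherian/meddo
```

In this model, leaving the environment unselected silently falls back to the default, which matches the symptom reported for tilerator before `--environment cleartables` was passed.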
[19:35:33] we have in scap.cfg:
[19:35:41] https://www.irccloud.com/pastebin/CVCsKgfc/
[19:36:29] gehel: that won't work since the environments there are based on where you're deploying from rather than where you're deploying to
[19:36:31] but that does not actually make any sense. We could specific different environment for deployment.eqiad.wmnet vs deplyoment-tin...
[19:37:06] Krenair: i did not have to do that in prod. i am not sure
[19:37:20] but also.. we havent deployed from that yet
[19:38:24] gehel: one tip is if you run: scap deploy --init it will generate the .git/DEPLOY_HEAD file that will show you the templates and templated values it thinks is right ot use
[19:38:27] the puppet run is green though and doesnt say anything about failed scap
[19:38:36] deploy init should run as part of the puppet run, FWIW
[19:38:59] thcipriani: and I would run that on tin?
[19:39:04] and that is what took very long in prod
[19:39:09] cloning all the initial things
[19:39:28] gehel: yeah, if you run that on tin in your deploy repo, it'll generate that DEPLOY_HEAD file
[19:40:35] DEPLOY_HEAD has a lot of stuff in there, but config_files and override_vars are the ones for templating
[19:42:51] Krenair: scap deploy --init for first puppet run is handled via https://github.com/wikimedia/puppet/blob/production/modules/scap/lib/puppet/provider/scap_source/default.rb#L109-L128
[19:42:58] (hopefully)
[19:43:33] thcipriani: thanks for all the context! I have learned a few things today...
[19:44:52] gehel: sure thing, happy to help! If you run into other issues or anything let me know.
[19:46:41] I tried to generate some commands but did it all wrong and it mostly didn't work
[19:47:45] thcipriani: pnorman will probably have a CR for you at some point, with some cleanup to our scap config now that we understand what we're doing
[19:48:08] gehel: cool, sounds good
[19:48:59] thcipriani, mutante: it got 3d2png/deploy and phabricator/deployment
[19:49:08] wanna handle the others?
[19:49:16] am not familiar with scap::source [19:49:32] I tried running the commands generated by grep -E "^ *repository:" hieradata/labs/deployment-prep/common.yaml | sed -e 's/^ *repository: \(.*\)/cd \1; scap deploy --init; cd \/srv\/deployment/' [19:50:51] hrm, we should probably just fix this ordering in puppet somewhere. [19:51:42] well "somewhere" https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/mediawiki/deployment/server.pp#L102-L110 [19:52:56] so that should run after all scap_source resources have done their thing [19:53:08] how come? [19:53:10] mutante: did you run into ^ in prod at all? [19:53:21] there's no explicit dependency in there [19:55:08] yeah, it's a weird relationship. The first thing scap does is check for locks, but we want it to generate everything needed for --init so if we swap deployment servers and then add another...something...service server and then run puppet there it'll pull down the repo from tin just fine. [19:55:20] thcipriani: i did not, but i assume that is because i was never the active deployment server so far [19:55:26] if $deploy_ensure == 'present' { [19:55:27] # Lock the passive servers, leave untouched the active one. [19:56:50] does --init just do a git clone? [19:56:59] if so, maybe we can make that ignore the deployment lock? [19:57:00] hrm, but the non-active deployment servers *should* get this file: https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/mediawiki/deployment/server.pp#L86-L89 [19:57:40] init actually doesn't even clone, it just generates that DEPLOY_HEAD file [19:57:51] ... so it should be safe to ignore the deployment lock at that point? 
[19:58:01] yeah, it should be fine [19:58:29] might be tricky to put into scap, I don't think we parse arguments, even, before checking for the global lock [19:59:00] yeah I figured [20:06:20] hrm, this might not be so bad, it looks like we do know about the init flag before we lock the repo, then we look at both global and repo-specific locks [20:14:22] 10Continuous-Integration-Infrastructure (shipyard): Get Wikibase + dependencies to run with Quibble - https://phabricator.wikimedia.org/T196013#4244189 (10hashar) [20:17:09] 10Continuous-Integration-Infrastructure (shipyard), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Get Wikibase + dependencies to run with Quibble - https://phabricator.wikimedia.org/T196013#4244204 (10hashar) As of May 30th 20:14 UTC, install.php fails with: ``` $ install.php --with-extensions [..]... [20:18:12] 10Continuous-Integration-Infrastructure (shipyard), 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Get Wikibase + dependencies to run with Quibble - https://phabricator.wikimedia.org/T196013#4244212 (10hashar) [20:18:17] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4244211 (10hashar) [20:18:48] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10releng-201718-q3, 10Epic, 10Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512#4174398 (10hashar) [20:21:25] (03PS2) 10Legoktm: Move "composer-test" to separate job for MediaWiki core [integration/config] - 10https://gerrit.wikimedia.org/r/436351 (https://phabricator.wikimedia.org/T195984) [20:26:03] (03CR) 10Legoktm: [C: 032] "INFO:jenkins_jobs.builder:Creating jenkins job mediawiki-quibble-composer-mysql-hhvm-docker" [integration/config] - 10https://gerrit.wikimedia.org/r/436351 
(https://phabricator.wikimedia.org/T195984) (owner: 10Legoktm) [20:27:35] !log contint1001: (for ext in BlueSpiceAbout BlueSpiceArticleInfo BlueSpiceAuthors BlueSpiceAvatars BlueSpiceBlog BlueSpiceCategoryManager BlueSpiceChecklist BlueSpiceConfigManager BlueSpiceContextMenu BlueSpiceCountThings BlueSpiceEditNotifyConnector BlueSpiceEmoticons BlueSpiceExtendedFilelist BlueSpiceExtendedSearch BlueSpiceExtendedStatistics BlueSpiceExtensions BlueSpiceFoundation BlueSpiceGroupManager BlueSpiceHideTitle [20:27:35] BlueSpiceInsertCategory BlueSpiceInsertFile BlueSpiceInsertLink BlueSpiceInsertMagic BlueSpiceInsertTemplate BlueSpiceInterWikiLinks BlueSpiceMultiUpload BlueSpiceNamespaceCSS BlueSpiceNamespaceManager BlueSpicePageAccess BlueSpicePageAssignments BlueSpicePagesVisited BlueSpicePageTemplates BlueSpicePageVersion BlueSpicePermissionManager BlueSpiceReaders BlueSpiceRSSFeeder BlueSpiceSignHere BlueSpiceSmartList BlueSpiceSubPageTree [20:27:36] BlueSpiceTagCloud BlueSpiceUEModulePDF BlueSpiceUniversalExport BlueSpiceUserManager BlueSpiceUserPreferences BlueSpiceWatchList BlueSpiceWhoIsOnline GlobalPreferences GoogleLogin Linter MultiLanguageManager ORES ReadingLists CirrusSearch ContentTranslation GeoData LifeWeb Math MathSearch PropertySuggester WikibaseJavaScriptApi WikibaseLexeme Wikibase WikibaseMediaInfo WikibaseQualityConstraints WikibaseQualityExternalValidation [20:27:36] WikibaseQuality Wikidata WikidataPageBanner WikimediaBadges Wikisource ; do python gear_client.py --function build:quibble-vendor-mysql-php70-docker --params '{"ZUUL_PROJECT": "mediawiki/extensions/'"${ext}"'", "ZUUL_URL": "https://gerrit.wikimedia.org/r/p", "ZUUL_BRANCH": "master", "ZUUL_REF": "master"}'; done; ) | tee invalid_callback.log [20:27:37] pfff [20:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:27:50] lol that was a big log message hasharAway [20:27:51] :) [20:28:04] !log contint1001: triggered a few quibble runs from contint1001. 
Running in a screen [20:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:28:12] paladox: yeah I am mass testing extensions [20:28:19] ok :) [20:28:26] I wrote some script to run a single job directly from the CLI. I should polish it up [20:29:06] (03PS1) 10Legoktm: Run 'mediawiki-core-hhvmlint' job [integration/config] - 10https://gerrit.wikimedia.org/r/436403 [20:29:55] hasharAway: I forked your gear_client.py script last week and made it a lot more flexible, I'll publish it later this week hopefully [20:30:09] hasharAway: I used it to make https://people.wikimedia.org/~legoktm/seccheck.html [20:30:35] \o/ [20:31:01] (03CR) 10jerkins-bot: [V: 04-1] Run 'mediawiki-core-hhvmlint' job [integration/config] - 10https://gerrit.wikimedia.org/r/436403 (owner: 10Legoktm) [20:31:32] so much BlueSpice [20:32:04] legoktm: I am not sure where to commit that script though. Maybe integration/config.git /bin or something [20:32:30] I was thinking ops/puppet as `zuul-trigger-jobs` or something [20:33:49] (03CR) 10Hashar: "isn't parallel lint super fast since it just lint the php files that got modified? 
At least that is how I remember I hacked it in Quibble" [integration/config] - 10https://gerrit.wikimedia.org/r/436403 (owner: 10Legoktm) [20:34:24] (03CR) 10Legoktm: "It's the latter that causes the performance problems" [integration/config] - 10https://gerrit.wikimedia.org/r/436403 (owner: 10Legoktm) [20:34:39] (03CR) 10Legoktm: [C: 032] Bump Quibble jobs to 0.0.15 [integration/config] - 10https://gerrit.wikimedia.org/r/436349 (owner: 10Legoktm) [20:36:35] (03Merged) 10jenkins-bot: Bump Quibble jobs to 0.0.15 [integration/config] - 10https://gerrit.wikimedia.org/r/436349 (owner: 10Legoktm) [20:36:37] (03Merged) 10jenkins-bot: Move "composer-test" to separate job for MediaWiki core [integration/config] - 10https://gerrit.wikimedia.org/r/436351 (https://phabricator.wikimedia.org/T195984) (owner: 10Legoktm) [20:37:39] !log deployed https://gerrit.wikimedia.org/r/436351 [20:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:41:08] (03CR) 10Hashar: "Yeah that makes sense. Though ultimately I would like to phase out the phplint jobs entirely. I am not entirely sure why hhvm lint is s" [integration/config] - 10https://gerrit.wikimedia.org/r/436403 (owner: 10Legoktm) [20:41:43] legoktm: beware with the quibble stage skip feature you added. I am worried we end up with a lot of super specific jenkins jobs all over the place :] [20:42:25] I will be careful, I promise :) [20:42:36] I think we just need it for core because it's so large [20:43:51] PROBLEM - Puppet errors on integration-slave-jessie-android is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [20:44:23] legoktm: and the psr4 / autoloader patch should let me migrate a few dozens of extensions tomorrow :] [20:44:32] for now. it is bed time! Happy hacking [20:44:39] awesome :) good night! [20:46:10] gerrit should be listening on ipv6? 
`telnet gerrit.wikimedia.org 29418` uses ipv6 and stalls as if it's firewalled, but `telnet 208.80.154.85 29418` goes through just fine. Tried testing from labs too but doesn't have ipv6 [20:47:01] $ telnet -6 gerrit.wikimedia.org 29418 [20:47:01] Trying 2620:0:861:3:208:80:154:85... [20:47:01] Connected to gerrit.wikimedia.org. [20:47:09] SSH-2.0-GerritCodeReview_2.14.8-22-g07c8aa9910 (SSHD-CORE-1.4.0) [20:47:20] hmm, i get same ipv6 address but not connecting :S i guess i'll blame comcast? [20:47:22] ebernhardson: works for me :) [20:47:45] possibly, or comcast got blacklisted someho [20:47:46] w [20:47:56] do you have stream events rights hasharAway ? [20:48:08] though [20:48:08] telnet -6 gerrit.wikimedia.org 29418 [20:48:10] works for me [20:48:12] using bt [20:49:09] hmm, yea ipv6-test.com seems to agree my ipv6 is hosed somehow :( [20:49:14] paladox: yeah I do, but I am heading to bed! [20:49:27] ok [20:49:36] except dns, of course ipv6 dns is the only thing working :) [20:49:42] ebernhardson: that is a nice one! 
19/20 I just lack a hostname :] [20:50:52] have good dreams and night *wave* [21:04:41] (03PS1) 10Legoktm: Revert "Move "composer-test" to separate job for MediaWiki core" [integration/config] - 10https://gerrit.wikimedia.org/r/436404 [21:04:43] (03PS1) 10Legoktm: Revert "Bump Quibble jobs to 0.0.15" [integration/config] - 10https://gerrit.wikimedia.org/r/436405 [21:04:52] (03CR) 10Legoktm: [C: 032] Revert "Move "composer-test" to separate job for MediaWiki core" [integration/config] - 10https://gerrit.wikimedia.org/r/436404 (owner: 10Legoktm) [21:04:56] (03CR) 10Legoktm: [C: 032] Revert "Bump Quibble jobs to 0.0.15" [integration/config] - 10https://gerrit.wikimedia.org/r/436405 (owner: 10Legoktm) [21:06:37] (03Merged) 10jenkins-bot: Revert "Move "composer-test" to separate job for MediaWiki core" [integration/config] - 10https://gerrit.wikimedia.org/r/436404 (owner: 10Legoktm) [21:07:11] (03Merged) 10jenkins-bot: Revert "Bump Quibble jobs to 0.0.15" [integration/config] - 10https://gerrit.wikimedia.org/r/436405 (owner: 10Legoktm) [21:07:18] !log deployed https://gerrit.wikimedia.org/r/436404, reverted quibble upgrade [21:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:08:22] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:19:44] PROBLEM - SSH on integration-slave-docker-1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:22:16] RECOVERY - Puppet errors on deployment-mx02 is OK: OK: Less than 1.00% above the threshold [0.0] [21:25:47] 10Beta-Cluster-Infrastructure, 10Puppet: Puppet broken on deployment-mx due to systemd on trusty - https://phabricator.wikimedia.org/T184244#4244425 (10Krenair) With help from @herron on https://gerrit.wikimedia.org/r/#/c/435814/ and Andrew on T195059 (and a weird labsaliaser problem) I've managed to get the p... 
[21:30:04] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team, 10Operations: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561#4244432 (10EddieGP) Woohoo! Thanks Daniel and Tyler! :) [21:32:53] PROBLEM - SSH on integration-slave-docker-1010 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:34:33] RECOVERY - SSH on integration-slave-docker-1017 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [21:41:35] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.32.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T191051#4244467 (10greg) 05Open>03Resolved [21:41:52] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T191054#4244468 (10greg) a:05demon>03None [21:42:02] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.32.0-wmf.9 deployment blockers - https://phabricator.wikimedia.org/T191055#4244470 (10greg) a:05demon>03None [22:03:16] hmmmm [22:07:44] RECOVERY - SSH on integration-slave-docker-1010 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [22:08:17] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.32.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T191052#4244529 (10Legoktm) [22:20:18] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.32.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T191051#4244562 (10Krinkle) [22:25:57] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.32.0-wmf.5 deployment blockers - https://phabricator.wikimedia.org/T191051#4092232 (10Krinkle) Re-adding T195546. This was wrongly removed. The wmf.5 branch introduced a breaking change to the Unicode normalisation... 
[22:34:46] 10Release-Engineering-Team (Kanban), 10Release Pipeline: Determine a standard way of installing MediaWiki lib/extension dependencies within containers - https://phabricator.wikimedia.org/T193824#4244594 (10dduvall) Thanks, @Legoktm. We seem to be working at an intersection of different efforts and use cases—an... [22:36:08] 10Phabricator, 10Release-Engineering-Team (Watching / External), 10Operations, 10Patch-For-Review: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568#4244599 (10Dzahn) Using the mw server has not been approved (T195623). We will have to use another spare machine with... [23:25:18] PROBLEM - Free space - all mounts on deployment-deploy-01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy-01.diskspace.root.byte_percentfree (<33.33%) [23:33:52] lol? really ^ [23:33:59] we chose XL flavor to fix that [23:38:33] really [23:39:00] oh is the thing not mounted? [23:39:07] Krenair https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/labs/lvm/srv.pp [23:39:13] need to add that as the puppet class [23:39:18] to get the xl qouta [23:39:43] but wait, stuff already exists at /srv [23:42:12] hold on [23:44:08] should be puppetized in role(deployment_server) with an "if $realm" .. not be a second role [23:44:35] but we also hate "if $realm" [23:44:47] so then it should be a Hiera override [23:44:56] which would be set with project puppet [23:45:14] anything but "add 'labs-only'-roles to it" [23:45:31] mutante: ta-da [23:45:42] /dev/mapper/vd-second--local--disk 138G 8.7G 122G 7% /srv [23:46:10] yea, i dont get it. so when you select a flavor in horizon and it says a disk space.. you dont actually get that? [23:46:19] you do but it's not mounted by default [23:46:42] ok. what mounted it? 
[23:47:42] role::labs::lvm::srv -> profile::labs::lvm::srv -> labs_lvm::volume [23:47:57] this puppet role mounts it for us: https://github.com/wikimedia/puppet/blob/production/modules/labs_lvm/manifests/volume.pp [23:48:03] puppet class* I should say [23:48:14] out of interest, how does this stuff work in prod? [23:48:22] does ops manage mounts manually in most cases there? [23:48:23] we should add that code in the actual deployment_server role [23:48:32] no, we dont manage mounts manually [23:48:52] we tell it which partman recipe to use when it gets installed [23:48:53] and that's it [23:49:14] ... so it's unpuppetised there [23:49:22] ? [23:49:35] puppet only exists after you have partitions [23:49:51] the code is in the repo though [23:49:54] it's partman config [23:50:07] it's puppetized in the way that the partman config gets installed on the install server [23:50:10] and then that uses it [23:50:12] hm [23:50:57] I wonder if the labs base images should include the mount by default [23:51:06] "if in labs, then mount this" in the normal prod role would be much better imho .. than a second unrelated role [23:51:46] or another way to mount it each time somebody selects the XL flavor [23:51:53] I'm not sure the extent to which labs and prod initial images differ actually [23:52:12] I guess labs stuff needs cloud-init etc [23:52:18] never really looked into it [23:55:18] RECOVERY - Free space - all mounts on deployment-deploy-01 is OK: OK: All targets OK
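For anyone retracing the disk-space thread above: a minimal Python sketch, using only the standard library, of how one might check whether a path such as /srv is its own mount point and how much space is free on it. The 33.33% default mirrors the threshold quoted in the alert text; the function names are hypothetical, and this is an illustration only, not the check the monitoring actually runs.

```python
import os
import shutil


def percent_free(path):
    """Return free space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def describe_mount(path, threshold=33.33):
    """Report whether `path` is a mount point and whether it is low on space.

    `threshold` is a percentage; the default is taken from the alert text
    in the log and is an assumption about what is worth flagging.
    """
    free = percent_free(path)
    return {
        "path": path,
        "is_mount": os.path.ismount(path),
        "percent_free": round(free, 2),
        "below_threshold": free < threshold,
    }


if __name__ == "__main__":
    # /srv only shows up as a separate mount once the extra volume is mounted.
    for candidate in ("/", "/srv"):
        if os.path.exists(candidate):
            print(describe_mount(candidate))
```

If /srv reports `is_mount: False`, its contents live on the root filesystem, which would match the situation in the log where the larger flavor's disk existed but was not mounted by default.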