[00:12:31] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (ssastry) Open→Resolved a:Jdforrester-WMF Thanks @Jdforrester-WMF ! I tested an edit on the beta cluster and it went th...
[00:23:52] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Jdforrester-WMF) Hmm, I'm still getting 404s for some attempts but not others.
[00:39:54] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Krenair) Is it possible that it's cached somewhere?
[00:43:45] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Jdforrester-WMF) >>! In T246833#5939661, @Krenair wrote: > Is it possible that it's cached somewhere? ` jforrester@deployment-med...
[00:49:26] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Jdforrester-WMF) Resolved→Open
[00:49:39] James_F: what role does that host use again?
[00:50:25] mutante: "deployment-mediawiki-parsoid10 is mediawiki::appserver \n … a Parsoid labs (parsoid)"
[00:50:38] Maybe the "Parsoid labs" role needs looking at.
[00:52:09] But I don't know where that comes from?
[00:52:23] Is it a local patch on the beta cluster puppet master?
[00:52:26] James_F: if you click configure puppet in Horizon .. what does it say there?
[00:52:55] role::mediawiki::appserver is not exactly like production but it does include mediawiki::appserver
[00:52:59] and that includes mediawiki::httpd
[00:53:11] and in there it sets the SERVERGROUP to whatever is $cluster in puppet
[00:53:40] On Horizon it says…
[00:54:02] role::parsoid, role::mediawiki::appserver, role::beta::mediawiki, profile::mediawiki::php, role::mediawiki::common
[00:54:10] :o
[00:54:24] ok, yea.. i think i left a comment in the past on another ticket
[00:54:26] about that
[00:54:43] Is that wrong?
[00:54:57] it's pretty different from production
[00:55:06] normally they should just have a single role
[00:55:18] a wtp server is role(parsoid)
[00:55:18] Well, the parsoid boxes in prod have two, right?
[00:55:28] and through Hiera settings it becomes a parsoid-php server
[00:55:33] as opposed to a parsoid-js
[00:55:34] Oh, did we mix MW into parsoid?
[00:55:39] Ah.
[00:55:55] parsoid-php server is now mw appserver plus
[00:56:11] * James_F nods.
[00:56:16] in prod each node can always just have a single role
[00:56:20] We should probably tear down all the parsoid-js infrastructure.
[00:56:29] Given we've deleted the code.
[00:56:29] but the special role keyword is only in prod
[00:56:47] this is the cause of many "prod vs beta" puppet things
[00:57:12] nevertheless.. i don't see why it would not set the SERVERGROUP at all.. right now
[00:57:54] I'm assuming that my SERVERGROUP change didn't go into effect at all, and that things suddenly working was due entirely to running puppet for the first time in 10 days. :-)
[00:58:36] there is mw* and wtp* and soon parse*
[00:58:39] But that's only fixed some kinds of request and not others. Though I'm not sure what differs between VE and DT in how they parse things with a Parsoid API call that means the former doesn't work and the latter does.
[00:58:47] mw* will not get parsoid
[00:58:51] Yeah, and shouldn't.
[00:58:52] but wtp* is mw+parsoid
[00:59:01] and parse* will be wtp* renamed
[00:59:45] whether wtp* is "only parsoid/js" or "mediawiki + parsoid/php" depends only on Hiera setting "use_php" true or false
[00:59:53] but the role just always stays "parsoid"
[01:00:14] Yeah, I was saying we should remove that complexity and have use_php always true.
[01:01:00] yea, let's try making a new instance with just the parsoid role and copying all the Hiera settings like:
[01:01:03] 17 # switch Parsoid/JS-server to Parsoid/PHP-MW-appserver
[01:01:06] 18 profile::parsoid::use_php: true
[01:01:08] 19 has_lvs: true
[01:01:17] Yeah…
[01:01:24] somehow i thought we already did that .. but i forget
[01:05:06] James_F: https://phabricator.wikimedia.org/T236275#5621970
[01:06:14] that was the fix for the "php-fpm restart check" in beta
[01:06:35] and it's kind of the same thing about the roles
[01:06:36] Right.
[01:07:07] So… do we need to try to create deployment-parsoid11 somehow?
[01:08:11] yea, there will be a bunch of issues but i think it's the right direction
[01:09:22] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Dzahn) also see T236275#5621970
[01:09:44] Phabricator, Project-Admins, CommRel-Design, WMF-Communications: Archive #CommRel-Design and related Phabricator Form? - https://phabricator.wikimedia.org/T246853 (Aklapper)
[01:09:49] Beta-Cluster-Infrastructure, Parsoid: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (Jdforrester-WMF)
[01:15:31] James_F: something like https://gerrit.wikimedia.org/r/c/operations/puppet/+/576493/1/hieradata/cloud/eqiad1/deployment-prep/hosts/deployment-parsoid11
[01:15:39] but that's really quick copy/paste
[01:16:41] definitely needs the "has_lvs: false" (to not fail in cloud) and "use_php: true" (to include all the mediawiki classes)
[01:16:59] * James_F nods.
[01:17:01] that would be if the only role on it is role(parsoid)
[01:17:44] the rest is all copy/paste from prod but without them there will be a bunch of puppet errors.. and i expect more :)
[01:18:04] * James_F grins.
[01:18:49] going afk for now.. cya
[01:19:04] unless you want me to merge that already now
[01:19:25] Ha. Well, it can't break anything.
[01:19:33] actually.. the best you could do is just try the same stuff but in Horizon
[01:19:38] But it'd probably not be ideal to merge that without someone around.
[01:19:40] before we merge stuff in the repo
[01:19:49] copy/paste from that gerrit change to horizon
[01:20:25] once we actually know it works we can make a single merge
[01:20:38] and remove it from Horizon web UI again
[01:22:07] I'm not an admin of the deployment-prep account, so I can't create new instances, I think.
[01:22:21] That's easily resolvable.
[01:22:44] But then I might end up having to fix Beta Cluster all the time! ;-)
[01:22:46] :P
[01:23:00] Krenair: If you want me, sure, I'll give it a go.
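The point made in the chat, that a single role(parsoid) host becomes one of two kinds of server purely through Hiera, can be sketched roughly as follows. This is only an illustration, assuming the host's Hiera data is available as a flat dict; `parsoid_flavor` and `check_cloud_host` are hypothetical helpers, not anything in the puppet repo, and treating an unset key as false is an assumption of the sketch. The two key names are the ones quoted in the conversation.

```python
# Hypothetical sketch: one role, two outcomes, decided by a Hiera flag.
# Key names come from the chat; the helpers and defaults are assumptions.

def parsoid_flavor(hiera: dict) -> str:
    """Return which kind of server a role(parsoid) host would become."""
    # Assumption: an unset flag behaves like false.
    if hiera.get("profile::parsoid::use_php", False):
        return "mediawiki + parsoid/php"
    return "parsoid/js only"


def check_cloud_host(hiera: dict) -> list:
    """List the settings the chat says a Cloud VPS parsoid host needs changed."""
    problems = []
    if hiera.get("has_lvs", False):
        problems.append("has_lvs should be false (otherwise puppet fails in cloud)")
    if not hiera.get("profile::parsoid::use_php", False):
        problems.append("use_php should be true (to include the mediawiki classes)")
    return problems
```

A settings dict copied straight from the quoted production lines (`has_lvs: true`) would be flagged by `check_cloud_host`, which is why the chat calls out changing it before the host is used in deployment-prep.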
[01:23:30] I'm not saying I think you have to do this particular thing but in general you should have these permissions
[01:23:43] !log Made James a deployment-prep projectadmin
[01:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[01:23:51] (Why on Earth does deployment-prep need 3 different "deployment-hadoop-test-\d" boxes? Oy.)
[01:23:59] Yeah.
[01:24:19] Also, like 5 different memc boxes
[01:24:23] And two logstashes
[01:24:40] Woah, we still have sca boxes?
[01:24:42] yeah well that is the result of a failed migration
[01:24:45] Do they have anything running on them?
[01:25:12] logstash2 was going to be gotten rid of in favour of 03, then something broke/didn't work and instead of fixing it someone just brought 2 back up
[01:25:16] Oh, cxserver. Lovely.
[01:25:30] Should we delete logstash03 to save space?
[01:25:34] no
[01:25:40] we should delete 2 and fix 03
[01:25:55] but like maybe not in that order
[01:26:01] :-D
[01:34:48] !log Beta Cluster: Created deployment-parsoid11 in Horizon T246854 to test 576493.
[01:34:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[01:34:50] T246854: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854
[01:35:23] PROBLEM - Free space - all mounts on deployment-puppetmaster04 is CRITICAL: CRITICAL: deployment-prep.deployment-puppetmaster04.diskspace.root.byte_percentfree (<55.56%)
[01:37:21] Oops, yes, /dev/vda2 is full
[01:38:50] daemon.log is 5G.
[01:38:52] Ouch.
[01:41:11] Yeah, it's very unhappy.
[01:45:43] Everything's trying to talk to the wrong puppetmaster?
[01:45:46] "Failed to open TCP connection to deployment-puppetmaster03.deployment-prep.eqiad.wmflabs"
[01:46:27] But puppetmaster03 doesn't exist? There's dumps-puppetmaster02 and puppetmaster04 and puppetdb03?
[01:47:45] hm
[01:47:56] probably something left over as a default from before the switch
[01:47:57] Is this my fault? (And if so, how?)
[01:48:05] no
[01:48:13] OK, good, at least there's that.
[01:49:04] fixed the hostname
[01:49:15] now we need to do the custom puppetmaster thing
[01:50:12] ugh what a mess
[01:50:18] puppetmaster's disk is full... that's a new one
[01:50:31] Yeah, I deleted a bunch of logrotated logs but it filled again.
[01:50:57] daemon.log, syslog, and debug are each over 5G.
[01:53:41] alright sorted out basic setup against our puppetmaster
[01:53:47] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Invalid tag 'cluster: parsoid' on node deployment-parsoid11.deployment-prep.eqiad.wmflabs
[01:54:10] Oh, right, that's meant to be in the hieradata.
[01:56:30] Krenair: Fixed now?
[01:56:51] Oh, fun. No.
[01:57:07] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Cluster parsoid not defined in wikimedia_clusters (file: /etc/puppet/modules/profile/manifests/base.pp, line: 39, column: 9) on node deployment-parsoid11.deployment-prep.eqiad.wmflabs
[01:57:08] "Cluster parsoid not defined in wikimedia_clusters (file: /etc/puppet/modules/profile/manifests/base.pp, line: 39, column: 9) on node deployment-parsoid11.deployment-prep.eqiad.wmflabs"
[01:57:10] Yeah.
[01:57:24] Does Beta not inherit the names of the clusters from production?
[01:57:27] often go through several of these types of things whenever creating a new instance
[01:58:21] https://phabricator.wikimedia.org/T234232
[01:58:35] it seems hieradata/cloud/eqiad1.yaml overrides hieradata/common.yaml's list of clusters
[01:59:03] and we don't then go and override it in beta's project hieradata
[01:59:03] Of course. :-)
[02:00:20] RECOVERY - Free space - all mounts on deployment-puppetmaster04 is OK: OK: All targets OK
[02:16:21] PROBLEM - Free space - all mounts on deployment-puppetmaster04 is CRITICAL: CRITICAL: deployment-prep.deployment-puppetmaster04.diskspace.root.byte_percentfree (<11.11%)
[02:31:51] Gerrit, Analytics, Gerrit-Privilege-Requests, Patch-For-Review, User-MarcoAurelio: Give access to Wikistats 2 to l10n-bot - https://phabricator.wikimedia.org/T245805 (abi_) Thanks @MarcoAurelio, that seems like the likely cause. I've deployed the patch, exports will be run tomorrow. Will upda...
[02:52:51] PROBLEM - Parsoid on deployment-parsoid11 is CRITICAL: connect to address 172.16.1.115 and port 8000: Connection refused
[04:15:12] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Izno) Is this also {T246833}?
[05:21:01] Gerrit, Analytics, Gerrit-Privilege-Requests, User-MarcoAurelio: Give access to Wikistats 2 to l10n-bot - https://phabricator.wikimedia.org/T245805 (Milimetric) Thank you so much yall!
[06:00:59] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Jdforrester-WMF) >>! In T246833#5939986, @Izno wrote: > Is this also {T246833}? Yes, that's this task; did you mean a different one?
[06:56:54] (PS1) Vgutierrez: Edit Project Config [debs/trafficserver] (refs/meta/config) - https://gerrit.wikimedia.org/r/576545
[06:57:19] (Abandoned) Vgutierrez: Edit Project Config [debs/trafficserver] (refs/meta/config) - https://gerrit.wikimedia.org/r/576545 (owner: Vgutierrez)
[08:22:37] o/ Just gonna shout this out here in case anyone can action it https://phabricator.wikimedia.org/T706#5937671
[08:47:58] (PS1) KartikMistry: Add apertium-oci-fra and apertium-pol-szl [integration/config] - https://gerrit.wikimedia.org/r/576627 (https://phabricator.wikimedia.org/T202276)
[09:05:00] Beta-Cluster-Infrastructure, Parsoid, Quality-and-Test-Engineering-Team (QTE): Parsoid integration is broken on beta cluster sites - https://phabricator.wikimedia.org/T246760 (zeljkofilipin) @kostajh why did you tag #qte? Is there something we should do?
[09:11:01] (CR) Hashar: [C: +2] Add apertium-oci-fra and apertium-pol-szl [integration/config] - https://gerrit.wikimedia.org/r/576627 (https://phabricator.wikimedia.org/T202276) (owner: KartikMistry)
[09:12:00] (Merged) jenkins-bot: Add apertium-oci-fra and apertium-pol-szl [integration/config] - https://gerrit.wikimedia.org/r/576627 (https://phabricator.wikimedia.org/T202276) (owner: KartikMistry)
[09:12:20] (CR) Hashar: [C: +2] "Deployed" [integration/config] - https://gerrit.wikimedia.org/r/576627 (https://phabricator.wikimedia.org/T202276) (owner: KartikMistry)
[09:23:23] (CR) Hashar: [C: -1] "The issue comes from https://gerrit.wikimedia.org/r/#/c/integration/config/+/518974/ which was made to remove the trailing comma. The patc" [integration/config] - https://gerrit.wikimedia.org/r/575012 (owner: Jbond)
[09:27:15] Beta-Cluster-Infrastructure, Parsoid, Quality-and-Test-Engineering-Team (QTE): Parsoid integration is broken on beta cluster sites - https://phabricator.wikimedia.org/T246760 (kostajh) @zeljkofilipin for awareness, I imagine QA people would want to know that common use cases (VE edits and Flow commen...
[09:32:39] Continuous-Integration-Config, Wikidata, Wikidata-Campsite: Wikibase post-merge builds are failing - https://phabricator.wikimedia.org/T242617 (Addshore) Added the people involved in https://gerrit.wikimedia.org/r/#/c/integration/config/+/563573/ to this ticket.
[09:41:47] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Toolforge, and 2 others: Add CI checks for golang admission controllers - https://phabricator.wikimedia.org/T236203 (hashar)
[10:09:10] Beta-Cluster-Infrastructure, Parsoid: Parsoid integration is broken on beta cluster sites - https://phabricator.wikimedia.org/T246760 (zeljkofilipin) I'm the only one watching the tag 😁 https://phabricator.wikimedia.org/project/members/4403/ I'll paste the link in `#wikimedia-qte`, that will get more vi...
[10:14:23] Continuous-Integration-Config, BlueSpice: Enable CI on BlueSpicePermissionManager - https://phabricator.wikimedia.org/T246877 (hashar)
[10:15:33] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable CI on BlueSpicePermissionManager - https://phabricator.wikimedia.org/T246877 (hashar)
[10:16:52] (PS1) Hashar: BlueSpicePermissionManager: enable tests [integration/config] - https://gerrit.wikimedia.org/r/576654 (https://phabricator.wikimedia.org/T246877)
[10:16:58] (CR) Hashar: [C: -1] "experimental build is running on dummy change https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/BlueSpicePermissionManager/+/446075/" [integration/config] - https://gerrit.wikimedia.org/r/576654 (https://phabricator.wikimedia.org/T246877) (owner: Hashar)
[10:35:18] Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), ContentTranslation: debian-glue fails with: jessie no longer support backports - https://phabricator.wikimedia.org/T240175 (hashar) Open→Resolved a:hashar Marking this CI task resolved, the issue is A...
[10:40:32] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable CI on BlueSpicePermissionManager - https://phabricator.wikimedia.org/T246877 (hashar)
[10:40:38] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811 (hashar)
[10:41:06] Continuous-Integration-Config, BlueSpice, Patch-For-Review: Enable CI on BlueSpicePermissionManager - https://phabricator.wikimedia.org/T246877 (hashar) p:Triage→Medium There are still failures, I have reopened T197900
[11:16:42] (CR) Jbond: "> Patch Set 1: Code-Review-1" [integration/config] - https://gerrit.wikimedia.org/r/575012 (owner: Jbond)
[12:14:23] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Izno) >>! In T246833#5940063, @Jdforrester-WMF wrote: >>>! In T246833#5939986, @Izno wrote: >> Is this also {T246833}? > > Yes, t...
[12:37:01] holy lord, look at that gate-and-submit-1_31 chain o_O
[12:37:07] BlueSpice folks being very busy…
[12:37:18] at least it’s not the main gate-and-submit pipeline I guess
[12:37:48] but that’s still going to dry up executors for hours :/
[12:41:10] :D
[12:45:44] Beta-Cluster-Infrastructure, MediaWiki-File-management: I can duplicate my files (identical revisions) on betacommons by (un)deleting them - https://phabricator.wikimedia.org/T246695 (AlexisJazz) I suspect this was caused by my botched undeletion tool making undeletion requests for the same file in rapid...
[12:48:25] LTS if lyf
[12:48:27] *is
[13:37:23] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Port scap to Python 3 - https://phabricator.wikimedia.org/T246025 (LarsWirzenius) It seems the oldest Python3 we have in production is 3.4, so we need to keep that as the minimum baseline when porting.
[13:56:28] Reedy: any idea when the next LTS branch cut is?
[13:56:52] It's due for release in June
[13:57:04] I think end of this month it's probably branched
[13:57:57] is LTS the same as 1.35.0?
[13:58:09] ack!
[13:58:17] liw: Yeah
[13:58:30] if so, from Scrum of Scrums: * Release Engineering - [All] MediaWiki 1.35.0 will get cut on 7 April 2020. If your team has any proposed blockers/deadlines for that, please get them done: https://phabricator.wikimedia.org/tag/MW-1.35-release
[13:58:34] also liw is the plan for the train to roll to group1 in this next train slot? :)
[13:58:46] addshore, yes, starting in a few minutes
[13:58:56] 7th April, great! I don't think I got to this point for SoS in my inbox yet! :D
[14:09:35] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (LarsWirzenius) Group1 deployed just now.
[14:27:33] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (LarsWirzenius)
[14:30:46] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (LarsWirzenius) Rolled back due to T246898
[14:34:54] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-zeljkofilipin: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (zeljkofilipin)
[14:49:58] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (cscott) Yes, T246760 is almost certainly a dup.
[14:55:34] Hello, Zuul doesn't start tests for https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/BrickipediaExtra/+/576849/
[14:55:35] Why?
[14:55:46] It should run tests per https://gerrit.wikimedia.org/g/integration/config/+/59c8323d78578323706fe4c452ab9fe2beb1bd49/zuul/layout.yaml#4950
[14:56:48] Patch got +2 from Thiemo, but gate-and-submit doesn't start
[15:05:24] * thcipriani looking
[15:05:48] hrm, well it looks like zuul is up and doing ... something
[15:06:19] It's working on gate-and-submit-1_31 jobs for BlueSpice(*
[15:06:26] *BlueSpice related extensions
[15:11:25] sigh. I'm going to kick zuul and then do a postmortem after we restore service.
[15:11:51] thcipriani, thanks
[15:11:53] Post... what?
[15:12:00] !log restarting zuul
[15:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:12:16] Zoranzoki21, an analysis after the fact of what happened
[15:12:39] also known as a "retrospective" in some circles
[15:13:31] liw: Ok, thanks for explaining.. :)
[15:14:00] well. I see stuff moving through.
[15:14:40] Works now
[15:14:48] all sorts of exceptions about bluespice and gerrit in the logs
[15:15:47] It had 109 patches in gate-and-submit-1_31 ~2-3 hours ago
[15:19:53] Wow.
[15:22:34] 109 patches x (how many?) tests... That is a lot
[15:25:56] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (thcipriani)
[15:41:27] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (Osnard) We have a tool that helps us updating I18N and bumping version numbers in all Blue...
[15:45:12] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (thcipriani) Hrm, on closer inspection it seems that the mutex problem is not that uncommon...
[15:46:52] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (thcipriani) >>! In T246903#5941403, @Osnard wrote: > We have a tool that helps us updating...
[15:48:32] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-zeljkofilipin: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (LarsWirzenius) Rolled forward to group1 again, after waiting for a wikidata edit spike to be over.
[15:50:15] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (Jdforrester-WMF) Pushing directly won't help at all; CI will still be DOS'ed, it'll just b...
[15:51:47] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-zeljkofilipin: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (LarsWirzenius)
[15:53:55] Beta-Cluster-Infrastructure, Parsoid: Parsoid integration is broken on beta cluster sites - https://phabricator.wikimedia.org/T246760 (Jdforrester-WMF) Up-merging as the other task has more content.
[15:54:05] Beta-Cluster-Infrastructure, Parsoid: Parsoid integration is broken on beta cluster sites - https://phabricator.wikimedia.org/T246760 (Jdforrester-WMF)
[15:54:08] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (Jdforrester-WMF)
[15:58:56] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (cscott) tl;dr: the root cause is that the parsoid-beta machine has an unusual configuration that didn't track the changes we made...
[15:59:37] Beta-Cluster-Infrastructure, Core Platform Team, Parsoid, User-Ryasmeen: Parsoid/RESTbase seems to be unavailable in Beta - https://phabricator.wikimedia.org/T246833 (cscott)
[15:59:41] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (cscott)
[16:00:11] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (thcipriani) Couple of troubleshooting notes from this event: * I didn't see it processing...
[16:12:46] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), MediaWiki-Docker, dev-images, User-brennen: dev-images: reorganize to reduce various image sizes - https://phabricator.wikimedia.org/T245443 (brennen)
[16:13:34] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Wikimedia-Phabricator-Extensions, User-brennen: Related Gerrit Patches: Do not display CR-2 patches as "awaiting review" - https://phabricator.wikimedia.org/T245247 (brennen)
[16:14:27] (CR) Jforrester: "> Patch Set 1: Code-Review-1" [integration/config] - https://gerrit.wikimedia.org/r/576654 (https://phabricator.wikimedia.org/T246877) (owner: Hashar)
[16:18:52] Continuous-Integration-Config, Wikidata, Wikidata-Campsite: Wikibase post-merge builds are failing - https://phabricator.wikimedia.org/T242617 (Jdforrester-WMF) >>! In T242617#5940399, @Addshore wrote: > Added the people involved in https://gerrit.wikimedia.org/r/#/c/integration/config/+/563573/ to t...
[16:52:46] Release-Engineering-Team-TODO, Cleanup, Operations, Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (MusikAnimal) > [x] Remove references from the CentralAuth database. This item is checked but I still see it in `meta_p.wiki` on the Toolforge...
[16:59:40] Release-Engineering-Team, MediaWiki-extensions-MediaModeration, Core Platform Team Workboards (Contractor Workboard): Enable integration testing in CI - https://phabricator.wikimedia.org/T246914 (Pchelolo)
[17:03:00] Release-Engineering-Team-TODO, Scap, MediaWiki-Internationalization, Performance-Team, Patch-For-Review: Use static php array files for l10n cache at WMF (instead of CDB) - https://phabricator.wikimedia.org/T99740 (ori) >>! In T99740#5934663, @Krinkle wrote: > I'll enable it on a canary ser...
[17:07:48] Release-Engineering-Team-TODO, Cleanup, Operations, Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (Jdforrester-WMF) >>! In T238803#5941793, @MusikAnimal wrote: >> [x] Remove references from the CentralAuth database. > > This item is checked...
[17:31:50] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (hashar) That has been caused by a lot of changes being sent and immediately voted Code-Rev...
[17:47:52] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused
[17:47:52] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused
[17:49:36] James_F: you?
[17:49:42] Release-Engineering-Team (Other / Uncategorized), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Releng March priorities vs resources Chart - https://phabricator.wikimedia.org/T242237 (thcipriani) {F31662357} This chart obviously has some problems: * @hashar is on 3 projects, 2 of which have...
[17:49:58] so there is also the "09" machine, did not see that because of the different prefix
[18:00:46] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): local-charts: Repair ability to enable xdebug on mw/core - https://phabricator.wikimedia.org/T246921 (jeena)
[18:05:00] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (Dzahn) @Jdforrester-WMF Merged on prod master, it should find the cluster now.
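The "Cluster parsoid not defined in wikimedia_clusters" failure discussed earlier follows from Hiera's default behaviour of returning the first value found in its hierarchy, so a more specific layer replaces a list wholesale instead of extending it. A minimal sketch of that lookup, with loud caveats: the layer names mirror the files mentioned in the chat, but the ordering, the cluster names, and the use of a plain list are all invented for illustration, and `lookup` / `check_cluster` are hypothetical helpers rather than puppet code.

```python
# Hypothetical model of first-match lookup across Hiera layers.
# Most specific layer first; all data below is made up for illustration.
LAYERS = [
    ("beta project hieradata", {}),  # no override of the list here
    ("hieradata/cloud/eqiad1.yaml", {"wikimedia_clusters": ["misc", "appserver"]}),
    ("hieradata/common.yaml", {"wikimedia_clusters": ["misc", "appserver", "parsoid"]}),
]


def lookup(key, layers):
    """Return (value, source) from the first layer that defines key."""
    for source, data in layers:
        if key in data:
            return data[key], source
    raise KeyError(key)


def check_cluster(cluster, layers):
    """Raise if cluster is missing from the winning wikimedia_clusters list."""
    clusters, source = lookup("wikimedia_clusters", layers)
    if cluster not in clusters:
        raise ValueError(
            f"Cluster {cluster} not defined in wikimedia_clusters (list from {source})"
        )
    return source
```

With the sample data, `check_cluster("parsoid", LAYERS)` raises even though the common layer knows the cluster, because the cloud layer's list wins. Adding the cluster to whichever layer wins the lookup makes the check pass.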
[18:08:08] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): Get an IDE to run tests against services running in kubernetes - https://phabricator.wikimedia.org/T245656 (jeena) I was able to get this working in VSCode using a php unit test plugin that allows one to define a command to exec into docker. I used kub...
[18:09:22] Phabricator: Phabricator is inaccessible from Egypt - https://phabricator.wikimedia.org/T246923 (ahmad)
[18:11:18] Phabricator (Search), Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-MModell, User-brennen: Make search context highlights work with the ferret search engine - https://phabricator.wikimedia.org/T230787 (mmodell) If all goes accor...
[18:22:15] Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Parsoid, Patch-For-Review: Replace deployment-mediawiki-parsoid10 with a "purer" deployment-parsoid11 box - https://phabricator.wikimedia.org/T246854 (Jdforrester-WMF) a:Jdforrester-WMF Will look.
[18:26:40] Phabricator, Project-Admins, CommRel-Design, WMF-Communications: Archive #CommRel-Design and related Phabricator Form? - https://phabricator.wikimedia.org/T246853 (hdothiduc) Hi! Thanks for creating this, Andre! I think archiving the Form 62 sounds good. Just making sure: if I or anyone else like...
[18:30:21] Phabricator, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): move_project breaks the world when moving a subproject that already has subprojects. - https://phabricator.wikimedia.org/T242254 (mmodell) p:Medium→Low
[18:31:33] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), User-MModell: Make sure elasticsearch 6 is supported in phabricator - https://phabricator.wikimedia.org/T181393 (mmodell) This won't be needed if we are successful with {T230787}
[18:31:35] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)): local-charts: Repair ability to enable xdebug on mw/core - https://phabricator.wikimedia.org/T246921 (thcipriani) p:Triage→Medium
[18:32:02] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), local-charts: local-charts: Repair ability to enable xdebug on mw/core - https://phabricator.wikimedia.org/T246921 (thcipriani)
[18:32:38] Release-Engineering-Team-TODO: Get an IDE to run tests against services running in kubernetes - https://phabricator.wikimedia.org/T245656 (thcipriani)
[18:36:41] Phabricator, Project-Admins, CommRel-Design, WMF-Communications: Archive #CommRel-Design and related Phabricator Form? - https://phabricator.wikimedia.org/T246853 (mmodell) >>! In T246853#5942278, @hdothiduc wrote: > Hi! Thanks for creating this, Andre! > I think archiving the Form 62 sounds good...
[18:37:24] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Developer Productivity, and 2 others: Upgrade to Gerrit 2.16.13 - https://phabricator.wikimedia.org/T200739 (Jdforrester-WMF)
[18:37:55] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap: Make scap release with --canary-wait-time and integration test changes - https://phabricator.wikimedia.org/T246455 (thcipriani)
[18:38:05] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap: Port scap to Python 3 - https://phabricator.wikimedia.org/T246025 (thcipriani)
[18:38:18] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Scap: Add an integration smoke test to scap - https://phabricator.wikimedia.org/T245614 (thcipriani)
[18:43:04] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (Zoranzoki21) I have better option: # Make another account for I18N and bumping version num...
[18:51:01] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), BlueSpice: Investigate Zuul crashing due to BlueSpice REL1_31 backlog - https://phabricator.wikimedia.org/T246903 (thcipriani) p:Triage→Medium
[18:52:57] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release Pipeline (Blubber): Implement golang directives for Blubber - https://phabricator.wikimedia.org/T246700 (thcipriani) p:Triage→Medium
[18:53:51] Release-Engineering-Team, Language-Team, MediaWiki-Interface, I18n, and 2 others: Interface language using Accept-Language header value instead of $wgLanguageCode - https://phabricator.wikimedia.org/T246071 (Nirmos) The post-mortem at https://wikitech.wikimedia.org/wiki/Incident_documentation/202...
[19:12:26] mutante: Yeah, we need to decom the 09 machine.
[19:13:30] ok, ack [19:14:02] (There's a task, I just have no idea how to check that switching the 09 machine off won't break things more.) [19:15:27] First thing first, though, deployment-puppetmaster04 is disk-full and very unhappy. [19:15:35] Which is holding up everything else. [19:15:53] What'd happen if I just restarted the image? [19:18:45] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Release, 10Train Deployments, 10User-brennen, 10User-zeljkofilipin: 1.35.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T233870 (10brennen) [19:19:46] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Release, 10Train Deployments, 10User-brennen: 1.35.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T233871 (10brennen) [19:22:42] 10Continuous-Integration-Config: Add mediawiki/extensions/PushNotifications.git to CI - https://phabricator.wikimedia.org/T246933 (10MarcoAurelio) [19:23:52] !log Rebooting deployment-puppetmaster04 [19:23:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:26:11] OK, well, "progress". [19:26:26] Now the error is "Evaluation Error: Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::services_proxy::envoy::local_clusters" [19:27:34] James_F: ok, in prod it is hieradata/common/profile/services_proxy/envoy.yaml:profile::services_proxy::envoy::local_clusters: [19:27:59] Envoy is a new thing, right? [19:28:05] another list we need to copy to eqiad1.yaml probably [19:28:08] or not sure yet [19:28:17] James_F: yes, that's new [19:28:49] a part of it is it does TLS termination [19:29:28] so we can have https also between caching layer and backends or between services [19:31:26] Right. Do we just need to configure it off in Beta Cluster? [19:32:34] i am not sure if it uses that but there is "has_tls: true/false" in Hiera in a couple places [19:32:45] incl. 
in mediawiki::appserver [19:33:01] if you don't have that in Hiera in Horizon yet you can try it [19:33:12] you might have has_lvs: false but not has_tls [19:34:00] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10MediaWiki-Docker, 10dev-images: Increase Docker image's PHP upload limit from 2 MiB to 100 MiB - https://phabricator.wikimedia.org/T246930 (10brennen) a:03brennen Will do. [19:34:08] !log Shutting off deployment-parsoid09 for T246395 [19:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:34:12] T246395: Decommission deployment-parsoid09 - https://phabricator.wikimedia.org/T246395 [19:34:37] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10MediaWiki-Docker, 10dev-images, 10User-brennen: Increase Docker image's PHP upload limit from 2 MiB to 100 MiB - https://phabricator.wikimedia.org/T246930 (10brennen) [19:36:21] RECOVERY - Free space - all mounts on deployment-puppetmaster04 is OK: OK: All targets OK [19:36:24] PROBLEM - Host deployment-parsoid09 is DOWN: CRITICAL - Host Unreachable (172.16.5.63) [19:36:39] lunch & [19:36:43] Meh, where can I configure shinken-wm to ignore parsoid09? [19:37:19] 10Beta-Cluster-Infrastructure, 10Cloud-VPS (Debian Jessie Deprecation): Decommission deployment-parsoid09 - https://phabricator.wikimedia.org/T246395 (10Jdforrester-WMF) [19:38:36] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10MediaWiki-Docker, 10dev-images, 10User-brennen: Job queue runners for MediaWiki-Docker - https://phabricator.wikimedia.org/T246935 (10brennen) [19:40:59] 10Beta-Cluster-Infrastructure: deployment-puppetmaster04 disc use exploding - https://phabricator.wikimedia.org/T246937 (10Jdforrester-WMF) [19:43:00] 10Beta-Cluster-Infrastructure: deployment-puppetmaster04 disc use exploding - https://phabricator.wikimedia.org/T246937 (10Jdforrester-WMF) Not quite sure what's blowing up exactly. 
The log has huge numbers of comments about hiera, but some level of that is normal, right? [19:51:38] (03PS1) 10MarcoAurelio: [LiveTranslate] Archive extension [integration/config] - 10https://gerrit.wikimedia.org/r/576936 (https://phabricator.wikimedia.org/T196452) [19:53:57] !log GitHub: extension-LiveTranslate mirror deleted refs. T196452 [19:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:53:59] T196452: Archive the LiveTranslate extension - https://phabricator.wikimedia.org/T196452 [20:06:23] (03CR) 10DannyS712: [C: 03+1] "LGTM" [integration/config] - 10https://gerrit.wikimedia.org/r/576936 (https://phabricator.wikimedia.org/T196452) (owner: 10MarcoAurelio) [20:19:32] hauskatze: :] thank you! [20:19:49] hashar: why? [20:20:11] hauskatze: for the github cleanup !! [20:20:21] It was truly a pleasure [20:21:06] 10Phabricator, 10Operations, 10Traffic: Phabricator is inaccessible from Egypt - https://phabricator.wikimedia.org/T246923 (10Aklapper) Hi, this might be intended if you use certain providers / IP addresses which were also used by a persistent vandal (non-public reference: T218589#5033515 ). [20:51:46] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10MediaWiki-Docker, 10dev-images, 10User-brennen: Job queue runners for MediaWiki-Docker - https://phabricator.wikimedia.org/T246935 (10kostajh) I'm not sure about what the "80% of use cases" behavior should be here. We could use Redis for job qu... [20:57:13] 10Beta-Cluster-Infrastructure: deployment-puppetmaster04 disc use exploding - https://phabricator.wikimedia.org/T246937 (10Jdforrester-WMF) … and an hour later it's now disk full again.
:-( [20:57:17] 10Beta-Cluster-Infrastructure, 10MediaWiki-File-management: I can duplicate my files (identical revisions) on betacommons by (un)deleting them - https://phabricator.wikimedia.org/T246695 (10AlexisJazz) With 250ms delay still not reliable, undeletion of https://commons.wikimedia.beta.wmflabs.org/wiki/File:76f2d... [20:57:21] PROBLEM - Free space - all mounts on deployment-puppetmaster04 is CRITICAL: CRITICAL: deployment-prep.deployment-puppetmaster04.diskspace.root.byte_percentfree (<22.22%) [20:59:27] !log Deleting deployment-parsoid09 for T246395 [20:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:59:29] T246395: Decommission deployment-parsoid09 - https://phabricator.wikimedia.org/T246395 [20:59:39] !log deployment-puppetmaster04: /var/log/syslog /var/log/debug and /var/log/daemon.log are 5GB filling / [20:59:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:00:54] James_F: eeeek [21:01:15] hashar: Yeah, I'm restarting it to clear the /var/log [21:01:24] the files got deleted [21:01:31] instead one can trim them [21:01:38] echo -n > /somefile.txt [21:01:48] and one would want to check which process fills those [21:01:53] Sure, but it'll not drop the file lock on them. [21:02:01] It's puppet. See the task. [21:02:24] ah [21:02:43] maybe some SRE turned debug on [21:03:06] Or there's a new hiera flag that defaults on and we need to set it off? [21:03:16] Last puppet commit: (993ad5c932) root - scap: enable logging to syslog [21:03:23] Bug: T227080 [21:03:23] Change-Id: I4fbaca635b18b1f2493d2d4b3755615ded794626 [21:03:24] T227080: Deprecate all non-Kafka logstash inputs - https://phabricator.wikimedia.org/T227080 [21:04:05] Oh, yeah, that might break things. [21:04:16] syslog isn't the only one growing out of control. 
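Editorial aside on the trim-versus-delete exchange above (21:01:24 to 21:02:01): a minimal Python sketch of the difference, not taken from the channel. It assumes a POSIX filesystem and a writer that opens its log in append mode; whether the real daemons on deployment-puppetmaster04 do so is not stated in the log. The function name `truncate_vs_delete` and the file name `daemon.log` are hypothetical stand-ins.

```python
import os
import tempfile


def truncate_vs_delete():
    """Compare truncating and deleting a log file that a writer still holds open."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "daemon.log")  # hypothetical stand-in for a real log

    # Stand-in for a long-running daemon: append mode, handle kept open.
    writer = open(path, "a")
    writer.write("x" * 4096)
    writer.flush()
    sizes = {"before": os.fstat(writer.fileno()).st_size}

    # Trim in place (what `echo -n > file` does): same inode, length reset to 0.
    os.truncate(path, 0)
    sizes["after_truncate"] = os.fstat(writer.fileno()).st_size

    # Because the writer uses append mode, its next write lands at the new end.
    writer.write("y" * 10)
    writer.flush()
    sizes["after_next_write"] = os.path.getsize(path)

    # Delete while still open: the name goes away, but the inode and its data
    # stay allocated until the writer closes its handle.
    os.remove(path)
    info = os.fstat(writer.fileno())
    sizes["after_delete_via_handle"] = info.st_size
    sizes["links_after_delete"] = info.st_nlink
    sizes["name_exists"] = os.path.exists(path)

    writer.close()
    os.rmdir(workdir)
    return sizes


if __name__ == "__main__":
    for key, value in truncate_vs_delete().items():
        print(f"{key}: {value}")
```

The point of the sketch is that removing a log a process still has open does not return the space until that process closes the file or restarts, while truncation returns it immediately. If the writer did not use append mode, it would keep writing at its old offset after a truncate and the file could become sparse instead, so the append-mode assumption matters.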
[21:04:42] 10Beta-Cluster-Infrastructure, 10Cloud-VPS (Debian Jessie Deprecation): Decommission deployment-parsoid09 - https://phabricator.wikimedia.org/T246395 (10Jdforrester-WMF) 05Open→03Resolved a:03Jdforrester-WMF Done. [21:04:45] 10Beta-Cluster-Infrastructure, 10Cloud-VPS (Debian Jessie Deprecation): Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster - https://phabricator.wikimedia.org/T218729 (10Jdforrester-WMF) [21:05:38] The actual top item from puppet is 01739ac9b4 though? [21:05:48] Looking at /var/lib/git/operations/puppet [21:07:28] it gets rebased from time to time I guess [21:07:41] there is a cronjob to keep the local checkout in sync with the gerrit repo [21:07:53] so every 10 minutes or so there is a fetch && rebase [21:08:07] Yes. [21:08:10] but the issue seems to be the puppet-master having debug log on [21:08:14] But it's not successfully running. [21:08:14] and hiera spamming ;D [21:08:33] And yeah, I think the puppetmaster spamming isn't the problem, I think it's just masking the problem. [21:09:18] 10Phabricator, 10Operations, 10Traffic: Phabricator is inaccessible from Egypt - https://phabricator.wikimedia.org/T246923 (10Krenair) If it is that I would not expect HTTP 501 responses.
[21:11:12] 10Continuous-Integration-Config, 10Product-Infrastructure-Team-Backlog (Kanban): Add mediawiki/extensions/PushNotifications.git to CI - https://phabricator.wikimedia.org/T246933 (10Mholloway) a:03Mholloway [21:12:19] RECOVERY - Free space - all mounts on deployment-puppetmaster04 is OK: OK: All targets OK [21:16:26] 10Release-Engineering-Team-TODO, 10Cleanup, 10Operations, 10Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10bd808) [21:16:38] no clue :/ [21:17:23] 10Release-Engineering-Team-TODO, 10Cleanup, 10Operations, 10Traffic, and 3 others: Retire fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T238803 (10Jdforrester-WMF) [21:17:42] I guess I must fall on the mercy of K.renair then. :-( [21:20:24] 10Beta-Cluster-Infrastructure: deployment-puppetmaster04 disc use exploding - https://phabricator.wikimedia.org/T246937 (10Jdforrester-WMF) @hashar points out that the last puppet commit is suspicious: `(993ad5c932) root - scap: enable logging to syslog` / T227080 / I4fbaca635b18b1f2493d2d4b3755615ded794626. Ma... 
[21:23:14] (03PS1) 10MarcoAurelio: layout: [GlobalBlocking] Add code coverage tests [integration/config] - 10https://gerrit.wikimedia.org/r/576961 [21:27:02] (03PS1) 10Jforrester: make-deployment-calendar: Stop pinging the Parsing team [tools/release] - 10https://gerrit.wikimedia.org/r/576964 [21:28:20] (03Abandoned) 10MarcoAurelio: layout: [GlobalBlocking] Add code coverage tests [integration/config] - 10https://gerrit.wikimedia.org/r/576961 (owner: 10MarcoAurelio) [21:29:50] (03CR) 10Subramanya Sastry: [C: 03+1] make-deployment-calendar: Stop pinging the Parsing team [tools/release] - 10https://gerrit.wikimedia.org/r/576964 (owner: 10Jforrester) [21:39:31] (03CR) 10MarcoAurelio: [C: 03+2] make-deployment-calendar: Stop pinging the Parsing team [tools/release] - 10https://gerrit.wikimedia.org/r/576964 (owner: 10Jforrester) [21:39:53] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [21:40:51] (03Merged) 10jenkins-bot: make-deployment-calendar: Stop pinging the Parsing team [tools/release] - 10https://gerrit.wikimedia.org/r/576964 (owner: 10Jforrester) [21:45:41] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10MediaWiki-Docker, 10dev-images, 10User-brennen: Job queue runners for MediaWiki-Docker - https://phabricator.wikimedia.org/T246935 (10brion) I think principle of least surprise leads us to want a job queue that is processed automatically without... [21:47:24] 10Phabricator, 10Operations, 10Traffic: Phabricator is inaccessible from Egypt - https://phabricator.wikimedia.org/T246923 (10ahmad) Shouldn't this be a 403/Forbidden response?
[21:47:27] (03PS1) 10Mholloway: Add CI config for mediawiki/extensions/PushNotifications [integration/config] - 10https://gerrit.wikimedia.org/r/576969 [21:48:45] 10Phabricator, 10Operations, 10Traffic: Phabricator is inaccessible from Egypt: HTTP 501 error - https://phabricator.wikimedia.org/T246923 (10Aklapper) [21:49:17] (03CR) 10Mholloway: "Since this is an empty repo at the moment, I assume we won't need the -composer variant of extension-quibble." [integration/config] - 10https://gerrit.wikimedia.org/r/576969 (owner: 10Mholloway) [21:50:11] I guess we don't use composer but quibble nonetheless? ^^ [21:55:59] 10Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), 10Scap, 10Python3-Porting: Port scap to Python 3 - https://phabricator.wikimedia.org/T246025 (10Aklapper) [22:00:42] 10Phabricator, 10Operations, 10Traffic: Phabricator is inaccessible from Egypt: HTTP 501 error - https://phabricator.wikimedia.org/T246923 (10Urbanecm) >>! In T246923#5943004, @Krenair wrote: > If it is that I would not expect HTTP 501 responses. >>! In T246923#5943149, @ahmad wrote: > Shouldn't this be a 4...
[22:09:08] (03CR) 10Jforrester: [C: 03+2] Add CI config for mediawiki/extensions/PushNotifications [integration/config] - 10https://gerrit.wikimedia.org/r/576969 (owner: 10Mholloway) [22:10:11] (03Merged) 10jenkins-bot: Add CI config for mediawiki/extensions/PushNotifications [integration/config] - 10https://gerrit.wikimedia.org/r/576969 (owner: 10Mholloway) [22:11:06] (03PS2) 10MarcoAurelio: [LiveTranslate] Archive extension [integration/config] - 10https://gerrit.wikimedia.org/r/576936 (https://phabricator.wikimedia.org/T196452) [22:12:06] !log Zuul: Add CI for mediawiki/extensions/PushNotifications [22:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:12:37] (03CR) 10Jforrester: [C: 03+2] [LiveTranslate] Archive extension [integration/config] - 10https://gerrit.wikimedia.org/r/576936 (https://phabricator.wikimedia.org/T196452) (owner: 10MarcoAurelio) [22:13:43] (03Merged) 10jenkins-bot: [LiveTranslate] Archive extension [integration/config] - 10https://gerrit.wikimedia.org/r/576936 (https://phabricator.wikimedia.org/T196452) (owner: 10MarcoAurelio) [22:14:04] 10Continuous-Integration-Config, 10Release-Engineering-Team-TODO (201911): Ensure all wikis are configured to be in exactly one "family" (wikipedia/wiktionary/special/…) - https://phabricator.wikimedia.org/T239301 (10Urbanecm) 05Resolved→03Open I seem to accidentally reverted this by https://gerrit.wikimed... 
[22:14:12] 10Continuous-Integration-Config, 10Release-Engineering-Team (Pipeline), 10Release-Engineering-Team-TODO, 10Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (10Urbanecm) [22:14:56] 10Continuous-Integration-Config, 10Release-Engineering-Team-TODO (201911): Ensure all wikis are configured to be in exactly one "family" (wikipedia/wiktionary/special/…) - https://phabricator.wikimedia.org/T239301 (10Urbanecm) [22:15:15] !log Zuul: Archive CI for LiveTranslate T196452 [22:15:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:15:17] T196452: Archive the LiveTranslate extension - https://phabricator.wikimedia.org/T196452 [22:16:19] 10Continuous-Integration-Config, 10Release-Engineering-Team-TODO (201911): Ensure all wikis are configured to be in exactly one "family" (wikipedia/wiktionary/special/…) - https://phabricator.wikimedia.org/T239301 (10Urbanecm) Time to revert https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/565751... [22:18:52] 10Continuous-Integration-Config, 10Release-Engineering-Team-TODO (201911): Ensure all wikis are configured to be in exactly one "family" (wikipedia/wiktionary/special/…) - https://phabricator.wikimedia.org/T239301 (10Jdforrester-WMF) 05Open→03Resolved This is still resolved (except for special wikis, which... [22:19:00] 10Continuous-Integration-Config, 10Release-Engineering-Team (Pipeline), 10Release-Engineering-Team-TODO, 10Scap, and 5 others: Define variant Wikimedia production config in compiled, static files - https://phabricator.wikimedia.org/T223602 (10Jdforrester-WMF) [22:30:05] 10Continuous-Integration-Config, 10Release-Engineering-Team-TODO (201911), 10Patch-For-Review: Ensure all wikis are configured to be in exactly one "family" (wikipedia/wiktionary/special/…) - https://phabricator.wikimedia.org/T239301 (10Krinkle) @Urbanecm Thanks, that makes sense. I guess I didn't have to re... 
[22:33:42] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [22:38:24] PROBLEM - Free space - all mounts on deployment-puppetmaster04 is CRITICAL: CRITICAL: deployment-prep.deployment-puppetmaster04.diskspace.root.byte_percentfree (<44.44%) [22:41:53] (03PS1) 10Brennen Bearnes: php-fpm-apache: set upload size limit to 100M [releng/dev-images] - 10https://gerrit.wikimedia.org/r/576983 (https://phabricator.wikimedia.org/T246930) [22:49:00] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [22:53:20] RECOVERY - Free space - all mounts on deployment-puppetmaster04 is OK: OK: All targets OK [23:14:20] PROBLEM - Free space - all mounts on deployment-puppetmaster04 is CRITICAL: CRITICAL: deployment-prep.deployment-puppetmaster04.diskspace.root.byte_percentfree (<44.44%) [23:16:47] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [23:50:55] 10Continuous-Integration-Config, 10Product-Infrastructure-Team-Backlog (Kanban): Add mediawiki/extensions/PushNotifications.git to CI - https://phabricator.wikimedia.org/T246933 (10Jdforrester-WMF) 05Open→03Resolved This is done.