[00:05:43] 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team: Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776447 (10awight) [00:06:01] 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current): Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776331 (10awight) [00:06:47] 10Gerrit-Migration, 10Diffusion, 10GitHub-Mirrors, 10Repository-Admins: Have Phabricator take over replication to Github - https://phabricator.wikimedia.org/T115624#1728437 (10TerraCodes) Are there any repos that still mirror from gerrit rather than phab? [00:24:37] 10Gerrit-Migration, 10Diffusion, 10GitHub-Mirrors, 10Repository-Admins: Have Phabricator take over replication to Github - https://phabricator.wikimedia.org/T115624#3776471 (10demon) >>! In T115624#3776450, @TerraCodes wrote: > Are there any repos that still mirror from gerrit rather than phab? Um, almost... [00:26:22] 10Gerrit-Migration, 10Phabricator, 10Phabricator (Upstream), 10Upstream: Disable policies on Differential - https://phabricator.wikimedia.org/T118669#3776475 (10TerraCodes) Has upstream added this? [00:34:45] Apparently Gerrit throws HTTP 500 when trying to save a draft comment containing an emoji [00:38:04] Krinkle: i guess gerrit is anti-emoji? [00:39:46] For a product (Gerrit) that effect people emotionally on a regular basis, it seems rather oppressive to disallow emoji. [00:39:53] affects* [00:41:06] That's known [00:41:08] Krinkle [00:41:15] It's because we are using utf8 [00:41:20] but need to migrate to utf8mb4 [00:41:28] migrating to notedb will fix this [00:41:56] see T174034 [00:41:56] T174034: Migrate to NoteDb - https://phabricator.wikimedia.org/T174034 [00:42:59] Zppix gerrit is not ant emojie.
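The utf8/utf8mb4 point above is easy to verify: most emoji sit outside the Basic Multilingual Plane, so UTF-8 encodes them in 4 bytes, one more than MySQL's legacy utf8 charset can store per character (utf8mb4 raises the cap to 4 bytes). A quick shell illustration, not Gerrit's actual code path:

```shell
# U+1F600 GRINNING FACE encodes to 4 bytes in UTF-8 (f0 9f 98 80);
# MySQL's legacy "utf8" tops out at 3 bytes per character, hence the 500s
# when a draft comment containing an emoji hits the database.
printf '😀' | wc -c   # prints 4
```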
It is pro emojis with the polygerrit emoji interface [00:43:08] ant = anti [00:43:56] PROBLEM - Free space - all mounts on integration-slave-jessie-1004 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1004.diskspace._srv.byte_percentfree (<50.00%) [00:46:44] * paladox forgot that when i sent logs from my local install of gerrit it included my real name lol @ http://gerrit-logstash.wmflabs.org/ [00:48:50] "Gerrit Code Review 2.15-rc2-1194-g712fd46f56 ready" [00:49:57] 10Gerrit, 10Operations, 10Traffic, 10Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3776546 (10Dzahn) Maybe Planet can be the guinea pig. [00:52:29] 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current): Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3776331 (10Legoktm) RecentChanges is one of the core features that basically can't be unavailabl... [01:03:05] no_justification i've converted our log4j properties file into an xml one per gehel's feedback, which is to make use of the advanced features to prevent gerrit having problems when logstash is unreachable. [01:03:07] :) [01:03:20] i've also found some other fixes too [01:03:25] like path to the gc file [01:04:26] Cool cool. Yeah logging to the disk is important for when we can't hit logstash [01:04:46] yep. [01:04:59] i've been successfully running it since i created the patch [01:23:57] 10MediaWiki-Releasing: Cleanup https://www.mediawiki.org/keys/keys.html and related - https://phabricator.wikimedia.org/T181017#3776610 (10Legoktm) [01:25:55] 10MediaWiki-Releasing: keys.html has outdated styling - https://phabricator.wikimedia.org/T181018#3776624 (10Legoktm) [01:26:34] no_justification i think we can do the logstash thing tomorrow?
:) [01:26:46] i am in if you like :) [01:28:02] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<30.00%) [01:33:48] 10MediaWiki-Releasing: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3776640 (10Legoktm) [01:53:54] PROBLEM - Free space - all mounts on integration-slave-jessie-1004 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1004.diskspace._srv.byte_percentfree (<10.00%) [02:42:22] !log Adding relative time to [[Deployments]] calendar (Common.js), e.g. "4 hours from now" or "soon" [02:42:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [06:39:15] 10MediaWiki-Releasing, 10Security: Consider using a single MediaWiki releases key instead of individual keys - https://phabricator.wikimedia.org/T181019#3776881 (10greg) Input from #security or Ops security-type folks? 
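The log4j rework paladox describes above (keep logging to local disk when logstash is unreachable) corresponds to a failover-appender setup. A hedged sketch in log4j2-style XML, with made-up appender names and hostname; Gerrit's stock logging is log4j 1.x, so the actual patch will differ:

```xml
<Configuration>
  <Appenders>
    <!-- primary: ship logs to logstash; ignoreExceptions="false" lets
         delivery failures propagate so the Failover appender sees them -->
    <Socket name="logstash" host="logstash.example.org" port="4560"
            ignoreExceptions="false">
      <JsonLayout/>
    </Socket>
    <!-- fallback: keep logging to the local disk -->
    <RollingFile name="file" fileName="logs/gerrit.log"
                 filePattern="logs/gerrit-%d{yyyy-MM-dd}.log.gz">
      <PatternLayout pattern="%d %-5p %c: %m%n"/>
      <TimeBasedTriggeringPolicy/>
    </RollingFile>
    <Failover name="failover" primary="logstash">
      <Failovers>
        <AppenderRef ref="file"/>
      </Failovers>
    </Failover>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="failover"/>
    </Root>
  </Loggers>
</Configuration>
```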
[06:53:01] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [07:27:40] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T178635#3776925 (10Marostegui) [07:50:50] (03CR) 10Hashar: [C: 032] Whitelist ShoutWiki and Uncyclomedia email addresses [integration/config] - 10https://gerrit.wikimedia.org/r/392492 (owner: 10Jack Phoenix) [07:51:50] (03Merged) 10jenkins-bot: Whitelist ShoutWiki and Uncyclomedia email addresses [integration/config] - 10https://gerrit.wikimedia.org/r/392492 (owner: 10Jack Phoenix) [07:55:56] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3776954 (10hashar) [08:16:19] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T178635#3698328 (10Jack_who_built_the_house) Group 2 is back on wmf.7. When the update is gonna be? [08:48:10] RECOVERY - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is OK: OK: Less than 100.00% above the threshold [0.0] [08:49:14] 10Gerrit, 10Operations, 10Readers-Web-Backlog, 10Patch-For-Review, and 3 others: [spike] Temporarily allow pushing large objects - https://phabricator.wikimedia.org/T178189#3777010 (10phuedx) 05Open>03Resolved Being **bold**. I'll be creating a higher-level "Deploy the service" task that summarises th... [08:49:40] 10Release-Engineering-Team (Watching / External), 10Electron-PDFs, 10Operations, 10Proton, and 4 others: How should we get Chromium for use in puppeteer? - https://phabricator.wikimedia.org/T178570#3777013 (10phuedx) 05Open>03Resolved >>! In T178189#3777010, @phuedx wrote: > Being **bold**. > > I'll b... 
[08:51:04] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3777016 (10hashar) Thanks @thcipriani and indeed deployment-tin works just fine now :] [08:53:54] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [08:54:27] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3777033 (10hashar) > deployment-mx > > Error: Could not retrieve catalog from remote server: Error 400 on SERVER: You can only use systemd resources on syste... [08:55:27] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [08:55:33] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban): Various puppet issues in deployment-prep - https://phabricator.wikimedia.org/T180935#3777039 (10hashar) [08:58:43] 10Gerrit, 10Operations, 10Traffic, 10Patch-For-Review: Switch on http/2 in apache for gerrit - https://phabricator.wikimedia.org/T180978#3775204 (10elukey) Added a couple of notes in the code review: 1) mod_http2 does not work with mpm-prefork but only with worker/event (latter is preferred). The mod_http... [09:05:37] 10Release-Engineering-Team (Watching / External), 10Electron-PDFs, 10Operations, 10Proton, and 4 others: How should we get Chromium for use in puppeteer? - https://phabricator.wikimedia.org/T178570#3777061 (10phuedx) ^ For context: The conversation around this problem forked between this task and {T178189... 
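elukey's first note on T180978 above boils down to a short config change. A sketch using standard Apache 2.4 directives (the real change lives in the linked code review and in puppet, and may differ):

```apache
# mod_http2 requires a threaded MPM: event (preferred) or worker, not prefork.
LoadModule mpm_event_module modules/mod_mpm_event.so
LoadModule http2_module     modules/mod_http2.so

# Negotiate HTTP/2 (h2) where the client supports it, else fall back to HTTP/1.1.
Protocols h2 http/1.1
```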
[09:12:14] (03PS1) 10Hashar: Remove deleted repo operations/containers [integration/config] - 10https://gerrit.wikimedia.org/r/392604 [09:14:13] (03CR) 10Hashar: [C: 032] Remove deleted repo operations/containers [integration/config] - 10https://gerrit.wikimedia.org/r/392604 (owner: 10Hashar) [09:15:14] (03Merged) 10jenkins-bot: Remove deleted repo operations/containers [integration/config] - 10https://gerrit.wikimedia.org/r/392604 (owner: 10Hashar) [09:28:53] RECOVERY - Puppet errors on deployment-ms-be04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:30:27] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:07:18] 10Beta-Cluster-Infrastructure, 10Security: Require email address to register on Beta Cluster - https://phabricator.wikimedia.org/T181034#3777390 (10MarcoAurelio) [11:27:29] 10Beta-Cluster-Infrastructure: cxserver deployment on beta is broken - https://phabricator.wikimedia.org/T181037#3777479 (10KartikMistry) [11:49:26] 10Gerrit-Migration, 10Phabricator, 10Phabricator (Upstream), 10Upstream: Disable policies on Differential - https://phabricator.wikimedia.org/T118669#3777544 (10Aklapper) >>! In T118669#3776475, @TerraCodes wrote: > Has upstream added this? If something makes you think that T118669#1842373 is not the situ... 
[12:22:38] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #593: 04FAILURE in 37 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/593/ [13:02:20] (03PS1) 10Hashar: docker: fix ci-src-setup with unbound variables [integration/config] - 10https://gerrit.wikimedia.org/r/392632 (https://phabricator.wikimedia.org/T177684) [13:02:28] !log docker push wmfreleng/ci-src-setup:v2017.11.21.12.57 && docker push wmfreleng/ci-src-setup:latest | https://gerrit.wikimedia.org/r/392632 | T177684 [13:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:02:32] T177684: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684 [13:05:02] (03PS2) 10Hashar: docker: fix ci-src-setup with unbound variables [integration/config] - 10https://gerrit.wikimedia.org/r/392632 (https://phabricator.wikimedia.org/T177684) [13:06:33] (03PS1) 10Hashar: Pass env to docker run [2] [integration/config] - 10https://gerrit.wikimedia.org/r/392633 (https://phabricator.wikimedia.org/T177684) [13:08:31] !log gerrit: created wikimedia/portals/deploy https://gerrit.wikimedia.org/r/#/admin/projects/wikimedia/portals/deploy for jan_drewniak | T180777 [13:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:08:36] T180777: Move portal deployment artifacts into their own repo - https://phabricator.wikimedia.org/T180777 [13:20:42] (03CR) 10Hashar: [C: 032] docker: fix ci-src-setup with unbound variables [integration/config] - 10https://gerrit.wikimedia.org/r/392632 (https://phabricator.wikimedia.org/T177684) (owner: 10Hashar) [13:21:07] (03CR) 10Hashar: [C: 032] Pass env to docker run [2] [integration/config] - 10https://gerrit.wikimedia.org/r/392633 (https://phabricator.wikimedia.org/T177684) (owner: 10Hashar) [13:21:46] (03Merged) 10jenkins-bot: docker: fix 
ci-src-setup with unbound variables [integration/config] - 10https://gerrit.wikimedia.org/r/392632 (https://phabricator.wikimedia.org/T177684) (owner: 10Hashar) [13:22:03] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3777777 (10hashar) At least the mwext-php70-phan-docker seems to work now. [13:22:09] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Should we expose some JENKINS_ environment variables in docker? - https://phabricator.wikimedia.org/T177684#3777778 (10hashar) 05Open>03Resolved [13:22:22] (03Merged) 10jenkins-bot: Pass env to docker run [2] [integration/config] - 10https://gerrit.wikimedia.org/r/392633 (https://phabricator.wikimedia.org/T177684) (owner: 10Hashar) [13:24:41] !log gerrit: adding Jdrewniak to wmf-deployment group https://gerrit.wikimedia.org/r/#/admin/groups/21,members | T180639 [13:24:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [13:24:45] T180639: Requesting deployment access for jdrewniak - https://phabricator.wikimedia.org/T180639 [15:05:42] o/ [15:06:00] Is there a way to parallelize deployments across a cluster in scap? [15:07:28] E.g. I think that barrier synchronization would be great for everything leading up to the service restarts. Then those restarts can happen in sequence. 
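halfak's barrier idea maps directly onto shell job control. A toy sketch (fetch/restart are stand-in stubs, not scap commands): fan out everything leading up to the restarts in parallel, synchronize at a barrier, then restart one host at a time:

```shell
# Stub stages standing in for scap's real per-host work.
fetch()   { echo "fetched $1"; }
restart() { echo "restarted $1"; }

hosts="ores1 ores2 ores3"
for h in $hosts; do
  fetch "$h" &        # pre-restart stages run in parallel across hosts
done
wait                  # barrier: block until every fetch has finished
for h in $hosts; do
  restart "$h"        # service restarts then happen in sequence
done
```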
[15:24:44] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #637: 04FAILURE in 2 min 43 sec: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/637/ [15:25:06] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #637: 04FAILURE in 3 min 5 sec: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/637/ [16:06:51] no_justification lol we have been disabling hmac md5 wrong lol. you have to do mac = on multiple lines [16:06:54] * paladox submits a change [16:08:57] https://gerrit.wikimedia.org/r/#/c/392666/ [16:16:07] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3778354 (10hashar) There is now a `portals@wikimedia.org` however it is pu... [16:18:06] halfak: yes there is. The `batch_size` parameter in scap.cfg can be used to parallelize deployments. Additionally the `[stage]_batch_size` param can set it differently for different stages, e.g. fetch and promote could be completely parallel and restart_service could be serial by setting: batch_size: 80; restart_service_batch_size: 1 [16:18:42] Cool! Thanks. I'll look into that. [16:19:50] 10Release-Engineering-Team (Kanban), 10Discovery-Portal-Sprint: Create a dedicated deployment window for portal deployments - https://phabricator.wikimedia.org/T180401#3778365 (10hashar) @Jdrewniak has deployed an update today using the European SWAT window. That went smoothly!!!!!!!!! Monday/Tuesday at 11:00... [16:21:03] awesome :) I did notice at some point that ORES had batch_size: 1 but I couldn't remember the rationale there.
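In concrete terms, thcipriani's suggestion above would land in the repo's scap/scap.cfg roughly like this (a sketch: the section name is an assumption and the values are taken from the chat, not from a verified ORES config):

```ini
[global]
# fetch and promote may touch up to 80 targets at once...
batch_size: 80
# ...but restart_service proceeds one target at a time.
restart_service_batch_size: 1
```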
[16:21:14] 10Continuous-Integration-Config, 10Discovery, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: CI tests on wikimedia/portals repo: cache node_modules to save time - https://phabricator.wikimedia.org/T152386#3778371 (10debt) [16:21:24] 10Continuous-Integration-Config, 10Discovery, 10Wikimedia-Portals, 10Discovery-Portal-Sprint: CI tests on wikimedia/portals repo: cache node_modules to save time - https://phabricator.wikimedia.org/T152386#2846564 (10debt) p:05Triage>03Normal [16:24:18] https://gerrit.wikimedia.org/r/392671 [16:25:38] thcipriani, ^ [16:25:40] Look OK? [16:25:53] * thcipriani looks [16:29:39] thcipriani, we have https://phabricator.wikimedia.org/source/ores-deploy/browse/master/scap/cmd_web.sh [16:29:46] That currently runs during "promote" [16:29:53] Does that mean it won't be parallelized? [16:30:14] Relevant: https://phabricator.wikimedia.org/source/ores-deploy/browse/master/scap/checks.yaml;508425167b237a0c1de463b049260ed860177324$10 [16:30:30] 10Release-Engineering-Team (Watching / External), 10Scoring-platform-team (Current): Write reports about why Ext:ORES is helping cause server 500s and alternatives to fix - https://phabricator.wikimedia.org/T181010#3778429 (10awight) [16:32:41] thcipriani: Another thing that just occurred to me, Krampusnacht would come early if our virtualenv dirs were stored along with the cached source revision. The mapping is exactly 1:1, and recreating the venv is expensive during rollback. Are there any other components that are coupled to source revs this way? [16:33:39] halfak: made some clarifying (hopefully) remarks on the task. The tl;dr is fetch and promote will be parallel, restart and port check (where applicable) would be serial with that patch. [16:34:04] thcipriani, great! 
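For context, a promote-stage check such as the cmd_web.sh one above is declared in scap/checks.yaml; the sketch below is from memory of scap3's check format (the check name is made up), so treat it as illustrative rather than the actual ORES file:

```yaml
checks:
  cmd_web:
    type: command
    stage: promote   # runs on each target right after the symlink swap
    command: /srv/deployment/ores/deploy/scap/cmd_web.sh
```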
Thanks for your help :D [16:34:05] promote stage includes promote checks, so that would be done across servers in a group in parallel as well [16:35:56] This should speed us up by maybe 3x [16:36:03] Maybe more :) [16:36:26] awight: not as far as I know; however, the promote check is run at the end of the promote stage (which does symlink swapping fanciness to make /srv/deployment/ores/deploy point to a dir in /srv/deployment/ores/deploy-cache/revs/[whateverrev]) so in the promote check you've defined it *should be* safe to assume that /srv/deployment/ores/deploy is the same directory as the rev directory [16:37:10] thcipriani: Is there anything in the toolchain that assumes deploy/ is a pure src checkout with no additional files? [16:37:14] all that is to say that: if you build the venv in that directory and had a check in your shell script like: [ -d venv ] && echo 'venv deployed already' you could store venvs with checkouts [16:37:34] * awight high-fives [16:38:00] of course this bears some testing in beta... [16:38:04] We’d need --force to rebuild, so hopefully there’s some way to check for that from “user” scripts [16:38:20] eh not currently, but it wouldn't be too hard to add [16:38:36] there are some tasks around about injecting environment variables for checks... [16:38:39] thcipriani: no way. /me 503’s straight to production like a C-level [16:39:34] thcipriani: hmm, that would be a blocker then.
I’ve seen the venv get corrupted during deployment, so we’ll need a mechanism to retry [16:40:53] well, I suppose there's no need for a --force check in a promote check since checks don't run if a directory already exists on disk: scap assumes it was deployed correctly and won't futz with it, it'll just do what's "needed" (i.e., swap symlinks) [16:41:40] that is, if you don't pass --force and the directory for a revision already exists, it assumes that directory is in good shape for promotion (i.e., symlink swapping) [16:43:07] so maybe storing venv with revs is the Right Thing in this instance [16:44:23] thcipriani: wait, the docs for --force say that we recheckout the dir. That’s true, right? [16:44:33] yes [16:45:01] kk so you’re just saying, the promote step comes after the fetch step and --force only applies to fetch? [16:45:49] no, maybe an example will clarify (bear with me I'm a slow typer :)) ... [16:46:04] (haha sorry I’ll stop chan flooding) [16:48:45] so if you're deploying revision 1234 for the first time then on the target: fetch fetches from tin to /srv/deployment/ores/deploy-cache/cache (to speed up next steps), then a new directory is created /srv/deployment/ores/deploy-cache/revs/1234 with HEAD pointed at that revision, then the fetch check (from checks.yaml) is run if there is one. [16:50:38] if you're deploying revision 1234 for the second time, that is /srv/deployment/ores/deploy-cache/revs/1234 already exists on the target, then fetch fetches from tin to /srv/deployment/ores/deploy-cache/cache, then it sees that the directory /srv/deployment/ores/deploy-cache/revs/1234 exists, then it runs git rev-parse --verify HEAD and checks that to ensure HEAD == 1234 and if so: nothing else runs
We build the venv in a fetch check and all is well. [16:54:15] and each stage has something similar. For promote it checks that /srv/deployment/ores/deploy is a symlink to /srv/deployment/ores/deploy-cache/revs/1234 and if it is not, or if you pass --force, it will attempt to symlink /srv/deployment/ores/deploy-cache/revs/1234 to /srv/deployment/ores/deploy and then run post promote checks (from checks.yaml). [16:54:49] oh, also service_restart is contingent upon promote happening [16:55:17] anyway, this is more than you ever wanted to know about scap :P [16:55:29] So the missing link you were mentioning is just that the promote checks don’t have access to SCAP_IS_FORCE — no problem for us. [16:55:47] LOL yes it is but will be invaluable. I’ll make a patch & see what you think. [16:58:22] (03PS1) 10Hashar: Migrate cergen tox job to docker [integration/config] - 10https://gerrit.wikimedia.org/r/392678 [16:59:15] thcipriani: Can one check apply to a list of groups? [17:00:35] i think so... i mean checks are aware of groups in the code IIRC [17:00:38] * thcipriani checks [17:00:55] Currently we repeat the check for each group, is all. [17:02:05] yep: so adding a group: whatever to a check should work https://github.com/wikimedia/scap/blob/master/scap/deploy.py#L474-L479 [17:02:33] yeah, so removing group (i.e. group is None in ^) would run it for all groups [17:02:48] ooh. yes I see [17:03:08] why we named that function _valid_chk and not _valid_check is unknown to me at this time :) [17:03:20] (sure made it harder to grep for) [17:03:36] When we have a union, you’ll be paid by the letter [17:05:12] +1 [17:06:27] !log docker push wmfreleng/tox-cergen:v2017.11.21.16.52 | https://gerrit.wikimedia.org/r/392678 | For https://integration.wikimedia.org/ci/job/cergen-tox-docker/ which passes!
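The venv-with-revs plan could then be a fetch-stage check along these lines. This is a hypothetical sketch: the helper name and the requirements path are invented, and only the `[ -d venv ]` guard comes from the chat:

```shell
# Keep the virtualenv inside each per-revision directory so that a rollback
# to an already-fetched rev reuses its venv instead of paying for a rebuild.
# With --force, scap recreates the rev dir, so the venv is rebuilt as well.
ensure_venv() {
  rev_dir="$1"   # e.g. /srv/deployment/ores/deploy-cache/revs/<rev>
  if [ -d "$rev_dir/venv" ]; then
    echo 'venv deployed already'
  else
    virtualenv -p python3 "$rev_dir/venv" &&
      "$rev_dir/venv/bin/pip" install -r "$rev_dir/requirements.txt"
  fi
}
```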
[17:06:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:06:35] (03CR) 10Hashar: [C: 032] Migrate cergen tox job to docker [integration/config] - 10https://gerrit.wikimedia.org/r/392678 (owner: 10Hashar) [17:10:06] (03Merged) 10jenkins-bot: Migrate cergen tox job to docker [integration/config] - 10https://gerrit.wikimedia.org/r/392678 (owner: 10Hashar) [17:38:31] 10Release-Engineering-Team (Watching / External), 10Operations, 10Scoring-platform-team (Current): Cache ORES virtualenv within versioned source - https://phabricator.wikimedia.org/T181071#3778593 (10awight) [17:53:25] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3778644 (10debt) >>! In T179694#3778354, @hashar wrote: > There is now a `... [17:54:46] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:54:57] i didn’t do it [18:00:44] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [18:01:47] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3733219 (10debt) [18:12:45] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3778690 (10RobH) [18:15:06] 10Release-Engineering-Team (Kanban), 10Discovery-Portal-Sprint: Create a dedicated deployment window for portal deployments - https://phabricator.wikimedia.org/T180401#3778700 (10debt) Yay for the update by @Jdrewniak! 
If we can get a dedicated time to do the portals deployment, that'd be great - 11:00 UTC s... [18:17:22] no_justification hi should we do https://gerrit.wikimedia.org/r/#/c/392079/ and https://gerrit.wikimedia.org/r/#/c/392083/ today? :) [18:34:44] RECOVERY - Puppet errors on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:38:35] !log deployed mobileapps@9d1602d on the beta cluster [18:38:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:40:41] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0] [19:03:19] Hello. I was told I suppose to ask Reedy or no_justification. My question is when will be released 1.30? [19:05:16] I wait before upgrade my site, but I do not know how much I suppose wait [19:05:29] adblanca: more information on timeline for that will be coming. Is there an urgent need you have? [19:05:50] Not very urgent, just wanted to know if it will be release soon [19:08:47] Thank you for your help Mr. greg-g . Have a good day [19:10:54] adblanca: not this week :) [19:17:38] greg-g: OK :) [19:37:51] (03PS1) 10Hashar: Migrate labs/tools/striker to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392696 [19:43:27] (03CR) 10Hashar: [C: 032] Migrate labs/tools/striker to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392696 (owner: 10Hashar) [19:43:33] no_justification how would we do a dashboard for gerrit in logstash please? [19:43:38] (03CR) 10Hashar: [C: 032] "tested / works :)" [integration/config] - 10https://gerrit.wikimedia.org/r/392696 (owner: 10Hashar) [19:44:33] (03Merged) 10jenkins-bot: Migrate labs/tools/striker to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392696 (owner: 10Hashar) [19:50:08] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.8 deployment blockers - https://phabricator.wikimedia.org/T178635#3698328 (10thcipriani) >>! 
In T178635#3776970, @Jack_who_built_the_house wrote: > Group 2 is back on wmf.7. When the update is gonna be? There were some issues ye... [19:55:59] 10Beta-Cluster-Infrastructure, 10Security: Require email address to register on Beta Cluster - https://phabricator.wikimedia.org/T181034#3777390 (10Bawolff) There are some privacy implications here - one should not assume that an email address used at beta cluster is private. [20:26:23] Yippee, build fixed! [20:26:23] Project selenium-Wikibase-chrome » chrome,beta,Linux,DebianJessie && contintLabsSlave build #19: 09FIXED in 39 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase-chrome/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=DebianJessie%20&&%20contintLabsSlave/19/ [21:28:16] !log deployment-prep Ran cleanupSpam.php on deploymentwiki. [21:28:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:50:36] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Deployments: cxserver deployment on beta is broken - https://phabricator.wikimedia.org/T181037#3779508 (10hashar) Most probably it now has to be deployed using scap from the deployment server deployment-tin.deployment-prep.eqiad.wmflabs ?...
[21:58:57] Project selenium-PageTriage » chrome,beta,Linux,BrowserTests build #583: 04FAILURE in 56 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/583/ [21:59:08] Project selenium-PageTriage » firefox,beta,Linux,BrowserTests build #583: 04FAILURE in 1 min 7 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/583/ [22:05:53] (03PS1) 10Hashar: Migrate eventlogging to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392722 [22:07:18] (03CR) 10jerkins-bot: [V: 04-1] Migrate eventlogging to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392722 (owner: 10Hashar) [22:09:23] (03PS2) 10Hashar: Migrate eventlogging to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392722 [22:12:02] (03CR) 10Hashar: [C: 032] "https://integration.wikimedia.org/ci/job/eventlogging-tox-docker/ pass :)" [integration/config] - 10https://gerrit.wikimedia.org/r/392722 (owner: 10Hashar) [22:16:23] (03Merged) 10jenkins-bot: Migrate eventlogging to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/392722 (owner: 10Hashar) [22:37:30] 10Continuous-Integration-Config, 10Release-Engineering-Team (Kanban), 10Discovery, 10Wikimedia-Portals, and 2 others: Create a Jenkins Job that builds the portal deployment artifacts in CI - https://phabricator.wikimedia.org/T179694#3779662 (10hashar) Nice! I will change the registered email / jenkins job... [22:49:29] 10Release-Engineering-Team (Kanban), 10Discovery-Portal-Sprint: Create a dedicated deployment window for portal deployments - https://phabricator.wikimedia.org/T180401#3779729 (10greg) When do you want to start? :) [22:56:46] no_justification just an fyi puppet is disabled on cobalt so that we can test logstash on gerrit2001. but found that because of the db issue, it doesn't log, thus making that harder to test.
[22:58:43] 10MediaWiki-Releasing: keys.html has outdated styling - https://phabricator.wikimedia.org/T181018#3779861 (10Krinkle) p:05Triage>03Low [23:29:34] !log deployed mobileapps@52d6a83 on the beta cluster [23:29:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [23:45:33] 10Release-Engineering-Team (Kanban), 10Discovery-Portal-Sprint: Create a dedicated deployment window for portal deployments - https://phabricator.wikimedia.org/T180401#3779917 (10debt) How about next week, with Nov 27th as the start date? @Jdrewniak can let me know if I'm being too optimistic here. :) [23:49:03] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<50.00%) [23:59:58] 10Release-Engineering-Team (Kanban), 10Discovery-Portal-Sprint: Create a dedicated deployment window for portal deployments - https://phabricator.wikimedia.org/T180401#3779958 (10greg) >>! In T180401#3779917, @debt wrote: > How about next week, with Nov 27th as the start date? @Jdrewniak can let me know if I'm...