[01:49:26] (PS1) Kosta Harlan: Add dev directories/files to git and dockerignore [integration/quibble] - https://gerrit.wikimedia.org/r/497221
[02:00:52] (CR) Kosta Harlan: "note, I could be totally wrong on this (maybe src and .env are needed), just an observation while building the image so I thought I'd subm" [integration/quibble] - https://gerrit.wikimedia.org/r/497221 (owner: Kosta Harlan)
[02:05:22] (PS1) Kosta Harlan: [WIP] Add Parsoid to docker image and run for Selenium tests [integration/quibble] - https://gerrit.wikimedia.org/r/497222 (https://phabricator.wikimedia.org/T218534)
[02:14:54] Gerrit, Release-Engineering-Team, Operations: I'm refused to login to Gerrit - https://phabricator.wikimedia.org/T218507 (Paladox) @GChecker yes that was still required before.
[02:33:03] Release-Engineering-Team (Kanban), Code-Stewardship-Reviews, MediaWiki-extensions-UserMerge, Stewards-and-global-tools: UserMerge: Code Stewardship Review - https://phabricator.wikimedia.org/T204747 (Zer00CooL) I understand that this extension is likely to fall by the wayside. I do not have any t...
[02:55:06] PROBLEM - Puppet staleness on deployment-restbase02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[03:02:22] PROBLEM - Puppet staleness on deployment-mediawiki-09 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[04:35:24] Project beta-scap-eqiad build #241796: FAILURE in 8 min 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241796/
[04:46:24] Yippee, build fixed!
[04:46:25] Project beta-scap-eqiad build #241797: FIXED in 9 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241797/
[04:55:10] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[05:15:11] RECOVERY - Free space - all mounts on deployment-deploy01 is OK: OK: All targets OK
[05:26:10] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[05:36:12] RECOVERY - Free space - all mounts on deployment-deploy01 is OK: OK: All targets OK
[06:46:28] Project beta-scap-eqiad build #241808: FAILURE in 8 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241808/
[06:57:18] Yippee, build fixed!
[06:57:18] Project beta-scap-eqiad build #241809: FIXED in 9 min 31 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241809/
[07:19:04] (PS1) Legoktm: Use castor for mwext-php70-phan-docker [integration/config] - https://gerrit.wikimedia.org/r/497234 (https://phabricator.wikimedia.org/T217479)
[07:24:36] (CR) Legoktm: [C: +2] "INFO:jenkins_jobs.builder:Reconfiguring jenkins job mwext-php70-phan-docker" [integration/config] - https://gerrit.wikimedia.org/r/497234 (https://phabricator.wikimedia.org/T217479) (owner: Legoktm)
[07:26:46] (Merged) jenkins-bot: Use castor for mwext-php70-phan-docker [integration/config] - https://gerrit.wikimedia.org/r/497234 (https://phabricator.wikimedia.org/T217479) (owner: Legoktm)
[07:34:28] Continuous-Integration-Infrastructure, Patch-For-Review, phan: mwext-php70-phan-docker doesn't use composer cache - https://phabricator.wikimedia.org/T217479 (Legoktm) a:Legoktm My understanding is that someone now needs to merge a patch that runs one of these jobs to populate the cache, and then...
[07:35:23] (CR) Legoktm: [C: +2] [WikimediaEditorTasks] Add phan [integration/config] - https://gerrit.wikimedia.org/r/496610 (https://phabricator.wikimedia.org/T218136) (owner: Jforrester)
[07:36:51] (Merged) jenkins-bot: [WikimediaEditorTasks] Add phan [integration/config] - https://gerrit.wikimedia.org/r/496610 (https://phabricator.wikimedia.org/T218136) (owner: Jforrester)
[07:37:26] !log deployed https://gerrit.wikimedia.org/r/496610
[07:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:01:10] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[08:41:10] RECOVERY - Free space - all mounts on deployment-deploy01 is OK: OK: All targets OK
[09:07:11] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[09:08:23] (PS1) Hashar: Debian glue job for operations/debs/superior-cache-analyzer [integration/config] - https://gerrit.wikimedia.org/r/497248
[09:09:25] (Abandoned) Hashar: Debian glue job for operations/debs/superior-cache-analyzer [integration/config] - https://gerrit.wikimedia.org/r/497248 (owner: Hashar)
[09:09:33] (CR) Hashar: [C: +2] Test operations/debs/superior-cache-analyzer [integration/config] - https://gerrit.wikimedia.org/r/496783 (https://phabricator.wikimedia.org/T213263) (owner: Ema)
[09:11:11] (Merged) jenkins-bot: Test operations/debs/superior-cache-analyzer [integration/config] - https://gerrit.wikimedia.org/r/496783 (https://phabricator.wikimedia.org/T213263) (owner: Ema)
[09:12:25] !log deployment-deploy01: cleaning disk: rm /var/cache/hhvm/cli.hhbc.sq3
[09:12:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:24:01] Continuous-Integration-Infrastructure, Shinken: Host DOWN alert for integration-publishing02 and integration-slave-docker-1046 - https://phabricator.wikimedia.org/T218146 (hashar)
[09:27:09] RECOVERY - Free space - all mounts on deployment-deploy01 is OK: OK: All targets OK
[09:34:41] Release-Engineering-Team (Kanban), Release, Train Deployments, User-zeljkofilipin: 1.33.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T206676 (zeljkofilipin) a:zeljkofilipin
[09:48:59] Release-Engineering-Team (Kanban), MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, Release, and 2 others: 1.33.0-wmf.21 deployment blockers - https://phabricator.wikimedia.org/T206675 (zeljkofilipin) Open→Resolved
[09:54:27] hashar: there is (as always) something I don't understand about JJB:
[09:54:29] context: https://github.com/wikimedia/integration-config/blob/master/zuul/layout.yaml#L8312-L8318
[09:54:48] the post-merge build does not seem to run, as (not) seen on https://gerrit.wikimedia.org/r/c/search/glent/+/496361
[09:54:53] what did I miss?
[09:55:30] Continuous-Integration-Infrastructure, OOUI: Host OOUI PHP demo (and all others?) on a PHP 7 capable server, because it needs PHP 7 - https://phabricator.wikimedia.org/T206046 (hashar)
[09:55:46] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), serviceops, Developer-Wishlist (2017), and 3 others: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890 (hashar) Open→Resolved This one is resolved. There are a few pending c...
[09:55:50] !log deleting shutdowned instance integration-publisher02 , we do not use it anymore since doc publishing got overhauled ( T137890 ) # T218146
[09:55:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:55:57] T137890: Relocate CI generated docs and coverage reports - https://phabricator.wikimedia.org/T137890
[09:55:57] T218146: Host DOWN alert for integration-publishing02 and integration-slave-docker-1046 - https://phabricator.wikimedia.org/T218146
[09:56:33] Continuous-Integration-Infrastructure, Shinken: Host DOWN alert for integration-publishing02 and integration-slave-docker-1046 - https://phabricator.wikimedia.org/T218146 (hashar) The configuration is generated via a python script `modules/shinken/files/shinkengen` in operations/puppet. It seems to query...
[09:57:16] Continuous-Integration-Infrastructure, Shinken: Host DOWN alert for integration-slave-docker-1046 - https://phabricator.wikimedia.org/T218146 (hashar)
[09:58:47] !log arming keyholder on integration-cumin
[09:58:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:06:19] !log github: deleting https://github.com/wikimedia/wikidata-gremlin # archived T155829
[10:06:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:06:22] T155829: Archive wikidata/gremlin.git - https://phabricator.wikimedia.org/T155829
[10:09:38] !log contint1001: rm -fR /srv/doc1001.eqiad.wmnet
[10:09:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:11:27] gehel: task fill it ! :)
[10:11:36] but in short, I think postmerge: is broken
[10:15:07] gehel: I have added it manually
[10:21:26] Continuous-Integration-Config: post-merge jenkins job not run after merge for search/glent project - https://phabricator.wikimedia.org/T218550 (Gehel)
[10:21:52] hashar: ^ done! No idea what info you need. Ping me if there is anything I can do!
[10:22:18] gehel: I guess I need time to investigate the issue :(((
[10:22:45] does not sound like I can help with that one :(
[10:35:24] Continuous-Integration-Config: post-merge jenkins job not run after merge for search/glent project - https://phabricator.wikimedia.org/T218550 (hashar) The change got merged on 03/18 at 09:29 UTC. From Zuul debug logs ` 2019-03-18 09:29:11,764 DEBUG zuul.source.Gerrit: Updating
gehel: yeah well it is an entire mystery :-///
[10:36:26] hashar: sounds like everything is in place, it might "just work (tm)" next time?
[10:36:38] might :(
[10:36:49] or it is an issue related to gerrit upgrade from last week
[10:39:22] let's wait until we have something else to merge and see what happens
[10:39:28] not a deal breaker anyway
[10:40:45] Release-Engineering-Team (Kanban), Code-Stewardship-Reviews, MediaWiki-extensions-UserMerge, Stewards-and-global-tools: UserMerge: Code Stewardship Review - https://phabricator.wikimedia.org/T204747 (MarcoAurelio) UserMerge is not going to disappear. It'll just be undeployed from Wikimedia wikis...
[10:42:50] hashar: https://integration.wikimedia.org/ci/job/mwext-node10-rundoc-docker/11/console :(
[10:42:54] any ideas?
[10:48:57] hashar: made https://phabricator.wikimedia.org/T218553
[10:49:19] * hashar E_QUEUE_OVERFLOW
[10:49:25] :D
[10:50:00] I was just replying to my team how I just noticed a task from 03/7th and can't see any foreseeable future to even start looking into it :(
[10:50:13] according to https://stackoverflow.com/questions/33559746/c-handling-queue-overflow, MsQueue.resize(MsQueue.max_size() + 100)
[10:50:40] or rather HasharQueue.resize(HasharQueue.max_size() + 100)
[10:50:43] :D
[10:51:13] this one might relate to https://phabricator.wikimedia.org/T213944
[10:51:25] but the timing seems to be a bit off? as this issue only just appeared
[10:51:35] addshore: check what version of fiber is being used and what depends on it in your repo
[10:51:50] *looks*
[10:51:50] and I guess the fiver version is not compatible with whatever version of nodejs the job container is using
[10:51:57] timo did some upgrade of nodejs since january
[10:52:07] to switch us from node 6 I think toward node 10 (maybe)
[10:52:32] hmm, fiber / fiver? in package.json ?
[10:52:39] yeah or has a dep
[10:52:46] then those messages might just be warnings
[10:52:49] 11:31:49 > fibers@2.0.2 install /src/node_modules/fibers
[10:52:52] since in the end:
[10:52:53] 10:16:40 Installed in `/src/node_modules/fibers/bin/linux-x64-64/fibers.node`
[10:53:42] OAAH AZHEAZEO
[10:53:59] xD
[10:55:14] OH
[10:55:26] addshore: package.json is missing a "doc" script https://phabricator.wikimedia.org/T218553#5031983
[10:55:46] should it have one? :P
[10:55:59] well did it have one previously?
[10:56:04] i dont believe so
[10:56:06] guess
[10:56:09] I am going to revert stuff
[10:56:23] https://github.com/wikimedia/mediawiki/blob/master/package.json has one
[10:56:34] * 8c5d2563 - Replace *-jsduck-* jobs with *-node10-docs-* ones (35 hours ago) |
[10:56:34] | zuul/layout.yaml | 41 +++++++++++------------------------------
[10:56:41] aaah yes, the failure point is the doc script
[10:56:48] evil James_F
[10:56:49] :P
[10:57:41] I cant blame anyone for enhancing CI and moving toward phasing out jsduck hehe
[10:58:27] (PS1) Hashar: Revert "Replace *-jsduck-* jobs with *-node10-docs-* ones" [integration/config] - https://gerrit.wikimedia.org/r/497268 (https://phabricator.wikimedia.org/T218553)
[10:58:34] (PS2) Hashar: Revert "Replace *-jsduck-* jobs with *-node10-docs-* ones" [integration/config] - https://gerrit.wikimedia.org/r/497268 (https://phabricator.wikimedia.org/T218553)
[10:59:19] (CR) Hashar: [C: +2] Revert "Replace *-jsduck-* jobs with *-node10-docs-* ones" [integration/config] - https://gerrit.wikimedia.org/r/497268 (https://phabricator.wikimedia.org/T218553) (owner: Hashar)
[11:01:03] (Merged) jenkins-bot: Revert "Replace *-jsduck-* jobs with *-node10-docs-* ones" [integration/config] - https://gerrit.wikimedia.org/r/497268 (https://phabricator.wikimedia.org/T218553) (owner: Hashar)
[11:01:47] addshore: I went reverting. should be good
[11:01:59] not sure what other repos end up being broken as part of that revert
[11:02:02] :(
[11:03:09] :/
[11:30:41] Continuous-Integration-Infrastructure, Operations-Software-Development, cloud-services-team: puppet broken on integration WMCS instances due to cumin/openstack Debian packages - https://phabricator.wikimedia.org/T218559 (hashar)
[11:36:15] Continuous-Integration-Config, Project-Admins: Create phabricator tag to track CI blockers (#jenkins-failure) - https://phabricator.wikimedia.org/T218043 (Lucas_Werkmeister_WMDE) >>! In T218043#5027961, @Jdforrester-WMF wrote: > We use #jenkins-failure for these. I don't think we need a second tag? And...
[12:00:55] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin) a:zeljkofilipin
[12:02:04] Continuous-Integration-Infrastructure, HHVM, Language-Team (Language-2019-January-March), Patch-For-Review, Wikimedia-production-error (Shared Build Failure): Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11... - https://phabricator.wikimedia.org/T216689
[12:03:41] Continuous-Integration-Infrastructure, Release-Engineering-Team, HHVM, Jenkins: phpunit drops dead on some extension tests on hhvm - https://phabricator.wikimedia.org/T217384 (hashar) For the record, that was due to an issue in libc6/libpthread which got fixed in a more recent Debian package than...
[12:07:34] Phabricator (Upstream), Upstream: Burnup rate month date shown in tool tip is off by one. January is month zero! - https://phabricator.wikimedia.org/T164478 (hashar)
[12:07:54] Phabricator (Upstream), Upstream: Burnup rate month date shown in tool tip is off by one. January is month zero! - https://phabricator.wikimedia.org/T164478 (hashar)
[12:09:25] Phabricator (Upstream), Upstream: Burnup rate month date shown in tool tip is off by one. January is month zero! - https://phabricator.wikimedia.org/T164478 (hashar) Stalled→Resolved Fixed by #upstream with https://secure.phabricator.com/D19967 Thank you @epriestley ! Indeed the tooltip is fine...
[12:12:19] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[12:19:57] Beta-Cluster-Infrastructure, PHP 7.0 support: Beta Cluster does not have php7.0-redis available - https://phabricator.wikimedia.org/T217938 (Joe) Sorry for coming late to the party, I was AFK last week. I should really eradicate php7.0 from both production and beta, as it's unsupported at this point. O...
[12:20:56] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[12:27:20] Beta-Cluster-Infrastructure, PHP 7.0 support: Beta Cluster does not have php7.0-redis available - https://phabricator.wikimedia.org/T217938 (Joe) Digging further: - php-redis was rebuilt as part of the work for T216712 **without** php 7.0 support - it has been installed in deployment-prep but not in pro...
[12:27:31] Beta-Cluster-Infrastructure, PHP 7.0 support: Beta Cluster does not have php7.0-redis available - https://phabricator.wikimedia.org/T217938 (Joe) a:Joe
[12:29:27] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[12:41:49] Release-Engineering-Team (Kanban), Release Pipeline: Determine a standard way of installing MediaWiki lib/extension dependencies within containers - https://phabricator.wikimedia.org/T193824 (hashar) We have filled this task following #release-engineering-team May 2018 offsite. One of the need is for CI...
[12:46:39] Continuous-Integration-Config, phpunit-patch-coverage: Do not run mediawiki-phpunit-coverage-patch-docker on wmf branches - https://phabricator.wikimedia.org/T202496 (hashar) Open→Resolved a:Krinkle 8b45a9a4df8b2e56d4f262a23ea94d677d64dd58 introduced the job in Feb 2018 with: ` # Only run...
[12:47:04] Continuous-Integration-Infrastructure, Operations-Software-Development, cloud-services-team (Kanban): puppet broken on integration WMCS instances due to cumin/openstack Debian packages - https://phabricator.wikimedia.org/T218559 (aborrero) p:Triage→High
[12:52:12] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[12:54:59] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[12:59:51] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[13:04:40] Release-Engineering-Team (Kanban), Quibble, Patch-For-Review: Error: 1071 Specified key was too long; max key length is 767 bytes - https://phabricator.wikimedia.org/T193222 (hashar) Open→Resolved a:hashar I have filled this task when I have migrated the CI job for MediaWiki toward Docker...
[13:13:36] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin) TLDR: I would not recommend Harbormaster. Phabricator does not have an easy way (like Docker container) to install and test it locally. It does have a free...
[13:13:50] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin)
[13:14:14] Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (zeljkofilipin)
[13:14:17] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (zeljkofilipin) Open→Resolved
[13:19:45] Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (zeljkofilipin)
[13:20:59] Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (zeljkofilipin)
[13:21:27] Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (zeljkofilipin)
[13:22:35] Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (zeljkofilipin)
[13:23:46] Release-Engineering-Team (Kanban): Evaluate Tekton Pipeline - https://phabricator.wikimedia.org/T217912 (LarsWirzenius) a:LarsWirzenius→None
[13:24:00] Release-Engineering-Team (Kanban): Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (zeljkofilipin)
[13:24:18] Release-Engineering-Team (Kanban): Evaluate GitLab-CI - https://phabricator.wikimedia.org/T217594 (zeljkofilipin)
[13:24:20] (CR) Kosta Harlan: [C: -1] "So, I think this change is not needed and will abandon it. IIRC, when I was experimenting with Quibble a few months ago I followed the "se" [integration/quibble] - https://gerrit.wikimedia.org/r/497221 (owner: Kosta Harlan)
[13:24:24] (Abandoned) Kosta Harlan: Add dev directories/files to git and dockerignore [integration/quibble] - https://gerrit.wikimedia.org/r/497221 (owner: Kosta Harlan)
[13:25:28] Release-Engineering-Team (Kanban): Evaluate sourcehut for CI future WG - https://phabricator.wikimedia.org/T217889 (Izno)
[13:25:30] Release-Engineering-Team (Kanban), User-zeljkofilipin: Evaluate sourcehut builds - https://phabricator.wikimedia.org/T217852 (Izno)
[13:38:21] Gerrit, Release-Engineering-Team (Next), DBA, Operations: Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532 (Dzahn) T218570 might unblock this
[13:38:31] Phabricator, Release-Engineering-Team (Backlog), Availability, User-MModell, WorkType-NewFunctionality: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928 (Dzahn) T218570 might unblock this
[13:56:30] hashar, legoktm do you know if https://gerrit.wikimedia.org/r/c/integration/config/+/494802 is ready to go? it has been sitting for about 11 days now.
[13:58:00] (CR) Subramanya Sastry: "Pinging again. Anything I can do to help move this ahead?" [integration/config] - https://gerrit.wikimedia.org/r/494802 (https://phabricator.wikimedia.org/T216102) (owner: C. Scott Ananian)
[14:08:15] Release-Engineering-Team (Kanban): Evaluate Concourse-CI - https://phabricator.wikimedia.org/T217595 (LarsWirzenius) I'm having a hard time getting the "build blubber" toy project done with Concourse. Possibly it's due my unfamiliarity with Docker, or that I'd need to study the Concourse documentation more,...
[14:20:47] deployment-prep people: How much will it mess with you if/when we disable creation of new Jessie VMs?
[14:22:51] dunno about the others but I'm avoiding new jessie VMs
[14:23:21] ok, so maybe not too much?
[14:23:24] integration might have a problem
[14:49:32] Continuous-Integration-Config, Wikidata: SlowTimer for two PHPunit tests, possibly from Wikibase - https://phabricator.wikimedia.org/T211035 (hashar) I do not know whether it still occurs, I haven't checked. Maybe the SQL database is too slow :(
[14:53:55] Continuous-Integration-Config, Discovery: Move tox-pyspark docker image to Debian Stretch - https://phabricator.wikimedia.org/T212399 (hashar) Open→Resolved a:hashar The Docker container is now based on Stretch and has java 8: ` tox-pyspark (0.3.0) wikimedia; urgency=medium * Rebuild based...
[14:56:38] (CR) Hashar: [C: +2] Adds Vedmaka Wakalaka to the CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/496196 (owner: Vedmaka Wakalaka)
[14:58:30] (Merged) jenkins-bot: Adds Vedmaka Wakalaka to the CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/496196 (owner: Vedmaka Wakalaka)
[14:58:47] (CR) Hashar: [C: +2] "Deployed! :)" [integration/config] - https://gerrit.wikimedia.org/r/496196 (owner: Vedmaka Wakalaka)
[15:01:23] andrewbogott: integration requires Jessie instances for now
[15:01:28] andrewbogott: (oh and hello)
[15:01:38] we haven't looked yet at migrating the CI slaves to Stretch
[15:02:00] though given most of the workload is on Docker containers, it is probably not too hard
[15:02:13] but yeah meanwhile, we need to keep the ability to spawn jessie instances
[15:29:26] (CR) Daniel Kinzler: "thanks!" [integration/config] - https://gerrit.wikimedia.org/r/496196 (owner: Vedmaka Wakalaka)
[15:33:21] hashar: ok, I'll see if I can carve out an exception for Integration
[15:34:03] (PS1) Jforrester: Revert "Revert "Replace *-jsduck-* jobs with *-node10-docs-* ones"" [integration/config] - https://gerrit.wikimedia.org/r/497322 (https://phabricator.wikimedia.org/T218553)
[15:34:44] andrewbogott, maybe like a private copy of the image or something?
[15:35:02] Krenair: yeah, I don't know the exact command but it's definitely possible
[15:35:19] maybe worth having someone write up the rationale for keeping access to it
[15:35:58] Project beta-scap-eqiad build #241854: FAILURE in 8 min 19 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241854/
[15:47:03] Yippee, build fixed!
[15:47:03] Project beta-scap-eqiad build #241855: FIXED in 9 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241855/
[15:47:59] i'm not saying renderd has a memory leak...
[15:48:00] https://grafana-labs.wikimedia.org/dashboard/db/cloud-vps-project-board?orgId=1&var-project=maps&var-server=maps-tiles1&panelId=17&fullscreen
[15:48:07] but i think it has a memory leak ;)
[15:48:09] Continuous-Integration-Config, Project-Admins: Create phabricator tag to track CI blockers (#jenkins-failure) - https://phabricator.wikimedia.org/T218043 (Jdforrester-WMF) >>! In T218043#5032159, @Lucas_Werkmeister_WMDE wrote: >>>! In T218043#5027961, @Jdforrester-WMF wrote: >> We use #jenkins-failure fo...
[15:51:10] PROBLEM - Puppet staleness on deployment-db04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[15:54:27] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [3.0]
[16:06:48] thedj, is that it hitting the instance's max memory and getting OOM killed?
[16:10:54] Krenair: it was, i now restart it hourly (although, it seems that sometimes it's not reaching that hour).
[16:12:32] nice
[16:12:40] is this thing JVM-based by any chance?
[16:28:04] PROBLEM - Host integration-publishing02 is DOWN: CRITICAL - Host Unreachable (172.16.4.5)
[16:38:59] !log created deployment-acme-chief01 and a client instance for further acme-chief testing + dev. used stretch, would be buster like prod but not sure that's easily available outside testlabs yes
[16:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:39:21] !log yet*
[16:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:57:09] Just got a 503 on Phabricator: "Request from 189.203.45.40 via cp1075 cp1075, Varnish XID 6359525 Error: 503, Backend fetch failed at Mon, 18 Mar 2019 16:56:05 GMT"
[16:57:26] see -operations
[16:57:37] ah ok thanks paladox
[16:57:43] you're welcome :)
[17:05:06] Project beta-code-update-eqiad build #239272: FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239272/
[17:06:13] Project mediawiki-core-doxygen-docker build #5511: FAILURE in 2 min 10 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/5511/
[17:07:50] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts]
[17:10:03] ACKNOWLEDGEMENT - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] cole_white gerrit maintenance
[17:13:13] ACKNOWLEDGEMENT - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] cole_white gerrit maintenance
[17:17:17] Project beta-code-update-eqiad build #239273: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239273/
[17:29:28] Project beta-code-update-eqiad build #239274: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239274/
[17:41:39] Project beta-code-update-eqiad build #239275: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239275/
[17:52:55] releng.team, huh? fancy
[17:53:36] anyone online who is familiar with how the localization cache builder in scap works?
[17:53:51] Project beta-code-update-eqiad build #239276: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239276/
[17:54:58] scap update-l10n tgr ?
[17:55:30] that one, yes
[17:56:00] specifically, we need to add an extension to wmf-config/extension-list for beta deployment
[17:56:21] will the l10n builder care about that?
[17:56:34] given that at this point the extension is not present in production
[17:56:44] well I think that once a week there's a cron taking care of that
[17:56:58] for prod
[17:57:06] for beta it works every 10 minutes iirc
[17:57:12] and does a full scap
[17:57:20] we used to have extension-list-labs
[17:57:23] But that went away
[17:57:30] afaics in the jenkins logs
[17:58:21] 16:51:58 16:51:58 Updating ExtensionMessages-master.php
[17:58:21] 16:51:59 16:51:59 Updating LocalisationCache for master using 6 thread(s)
[17:58:53] yup
[17:59:13] I think you can run it standalone in prod but it takes a loooooooong time
[18:00:03] ExtensionMessages-master.php on beta looks to be the same as prod
[18:01:07] yeah, with extension-list-labs beta and prod were completely decoupled, other than using the same git repo
[18:01:16] tgr: It looks like we branch/add them to prod
[18:01:17] now I'm not sure if that's the case
[18:01:20] for example Sentry
[18:01:25] Not enabled in prod, but is in beta
[18:01:33] It's on /srv/mediawiki-staging
[18:01:39] So I'm guessing it's in make-wmf-branch too
[18:02:21] the context is that we want to deploy a new extension and don't have time to wait for make-wmf-branch
[18:02:34] (with the beta deployment anyway)
[18:02:56] You can just manually add it to the active wmf branches (and get it deployed/staged before you add it to extension-list)
[18:02:57] so I'm trying to figure out if we need to set the branches up manually in production
[18:03:03] But obviously, not when gerrit is out of action
[18:03:13] did that, they got -1-ed as unnecessary
[18:03:33] so I'm trying to double-check that
[18:03:54] I literally cannot see any other option
[18:03:58] It's either you do it like that
[18:04:04] Or you add it to make-wmf-branch and wait
[18:04:42] ok, thanks
[18:05:04] https://github.com/wikimedia/mediawiki-tools-release/commit/9381cdba25db86ef0cfb396321a81af6129c1a9e
[18:05:07] "Adding Sentry to make-wmf-branch
[18:05:07] See I5e21f89d for the full explanation"
[18:05:15] Obviously cannot easily find out what that change id contains
[18:06:02] Project beta-code-update-eqiad build #239277: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239277/
[18:06:14] Project mediawiki-core-doxygen-docker build #5512: STILL FAILING in 2 min 10 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/5512/
[18:17:30] I had a config release scheduled for SWAT today. That doesn't look like it's going to happen because of "things." If I can't get that config set, I need to revert a patch in master that will break without this config. I cannot revert because of these same "things." I also cannot add this revert as a train blocker because of "things." So, I'm writing here to post some awareness.
[18:18:13] Project beta-code-update-eqiad build #239278: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239278/
[18:30:24] Project beta-code-update-eqiad build #239279: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239279/
[18:42:35] Project beta-code-update-eqiad build #239280: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239280/
[18:54:47] Project beta-code-update-eqiad build #239281: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239281/
[19:06:14] Project mediawiki-core-doxygen-docker build #5513: STILL FAILING in 2 min 10 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/5513/
[19:06:57] Project beta-code-update-eqiad build #239282: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239282/
[19:19:09] Project beta-code-update-eqiad build #239283: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239283/
[19:31:20] Project beta-code-update-eqiad build #239284: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239284/
[19:43:31] Project beta-code-update-eqiad build #239285: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239285/
[19:55:42] Project beta-code-update-eqiad build #239286: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239286/
[20:06:13] Project mediawiki-core-doxygen-docker build #5514: STILL FAILING in 2 min 9 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/5514/
[20:07:53] Project beta-code-update-eqiad build #239287: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239287/
[20:20:04] Project beta-code-update-eqiad build #239288: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239288/
[20:32:15] Project beta-code-update-eqiad build #239289: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239289/
[20:36:09] * awight sprays wmf-insecte with neem oil
[20:36:20] next one should work
[20:36:50] Project beta-code-update-eqiad build #239290: STILL FAILING in 4 min 34 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239290/
[20:38:08] Yippee, build fixed!
[20:38:09] Project beta-code-update-eqiad build #239291: FIXED in 1 min 18 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239291/
[20:39:36] thanks for /topic work, Reedy :)
[21:00:29] Will there be a SWAT deploy today?
[21:01:12] I think the one in ~2 hours can probably go ahead
[21:02:22] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:03:02] What he said
[21:05:25] PROBLEM - Host deployment-sessionstore01 is DOWN: CRITICAL - Host Unreachable (172.16.3.4)
[21:13:17] Yippee, build fixed!
[21:13:18] Project mediawiki-core-doxygen-docker build #5515: 09FIXED in 9 min 13 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/5515/ [22:18:55] !log create LoginUsers group [22:18:55] paladox: Failed to log message to wiki. Somebody should check the error logs. [22:18:58] hmm [22:20:42] no logging for right now [22:21:03] there's a bug, it's being worked on (sorry) [22:21:23] manual edits to SAL? [22:21:40] (03PS1) 10Paladox: Fix blocking users [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497419 [22:21:45] thcipriani ^^ [22:21:51] apergos ah thanks [22:22:38] apergos: it's posting to sal on tools, just not writing to wikitech [22:22:45] so at least not totally lost [22:22:53] also, who uses the wikitech sal anymore? :) [22:23:05] as long as it posts to twitter, who cares [22:23:10] https://tools.wmflabs.org/sal/log/AWmS4t5JA1BDhGjCXi7s [22:24:11] * paladox finally got a working config :) [22:24:17] i missed the fine print in the docs [22:25:38] bookmarked so we can manually copy anything missed, later. thanks! [22:25:46] * hauskatze doesn't have trolltwitter [22:28:07] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (10Aklapper) @zeljkofilipin: - @20after4; + @mmodell. [22:28:36] 10Continuous-Integration-Config, 10Fresnel, 10Performance-Team: Omit "npm install" step in Fresnel job output - https://phabricator.wikimedia.org/T218374 (10Krinkle) 05Open→03Resolved a:03Krinkle [22:29:59] 10Project-Admins, 10Security-Team: Create maybe-public tag - https://phabricator.wikimedia.org/T215981 (10Aklapper) 05Open→03Declined Maybe's usually stay maybe until there's a process to unmaybify. Non-static states should be statuses or workboard columns, but not tags. (I admit we make this mistake in a... 
[22:30:12] 10Beta-Cluster-Infrastructure: Figure out future for newly created deployment-prep jessie instances - https://phabricator.wikimedia.org/T218609 (10Krenair) [22:30:20] 10Beta-Cluster-Infrastructure: Figure out future for newly created deployment-prep jessie instances - https://phabricator.wikimedia.org/T218609 (10Krenair) ` ottomata, hi, I was wondering if you made deployment-eventgate-analytics-1 be jessie for any particular reason is this mirroring a prod... [22:30:22] 10Beta-Cluster-Infrastructure: Figure out future for newly created deployment-prep jessie instances - https://phabricator.wikimedia.org/T218609 (10Krenair) [22:31:21] Project beta-scap-eqiad build #241871: 04FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241871/ [22:31:29] 10Beta-Cluster-Infrastructure: Figure out future for newly created deployment-prep jessie instances - https://phabricator.wikimedia.org/T218609 (10Andrew) Jessie creation is now disabled in most projects (including deployment-prep). I'd prefer to leave it that way in order to provide some mild resistance to new... [22:31:58] 10Continuous-Integration-Config, 10Project-Admins: Create phabricator tag to track CI blockers (#jenkins-failure) - https://phabricator.wikimedia.org/T218043 (10Krinkle) >>! In T218043#5032159, @Lucas_Werkmeister_WMDE wrote: >>>! In T218043#5027961, @Jdforrester-WMF wrote: >> We use #jenkins-failure for these.... 
[22:32:14] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments, 10User-zeljkofilipin: 1.33.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T206676 (10Jdforrester-WMF) [22:33:02] 10Release-Engineering-Team (Kanban), 10Code-Health-Metrics: Define a code health metrics reporting approach/strategy - https://phabricator.wikimedia.org/T205143 (10Jrbranaa) [22:33:04] (03CR) 10Paladox: [V: 03+1] Fix blocking users [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497419 (owner: 10Paladox) [22:33:08] 10Release-Engineering-Team (Kanban), 10Code-Health-Metrics: Define a code health metrics reporting approach/strategy - https://phabricator.wikimedia.org/T205143 (10Jrbranaa) [22:33:29] 10Release-Engineering-Team (Kanban): TEC13:O3.4:Q1 Goal - Put in place Tech Debt management process for PEP - https://phabricator.wikimedia.org/T199263 (10Jrbranaa) 05Open→03Resolved [22:33:39] 10Release-Engineering-Team (Kanban): TEC13:O3.4:Q1 Goal - Identify key Tech Debt areas (Platform) - https://phabricator.wikimedia.org/T199262 (10Jrbranaa) 05Open→03Resolved Tech debt identification and management is being done as part of the PEP refactoring work. [22:34:01] 10Release-Engineering-Team (Kanban), 10Technical-Debt: Define an approach for tracking/managing tech debt for PEP - https://phabricator.wikimedia.org/T196096 (10Jrbranaa) 05Open→03Resolved [22:34:46] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Evaluate Phabricator Harbormaster - https://phabricator.wikimedia.org/T217901 (10mmodell) @zeljkofilipin Harbormaster can do more than that, however, not without some customization of the code (which isn't possible on their free cloud instances) [22:42:50] Yippee, build fixed! 
[22:42:51] Project beta-scap-eqiad build #241872: 09FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/241872/ [23:03:54] 10Gerrit, 10Release-Engineering-Team (Backlog), 10Patch-For-Review: Upgrade to Gerrit 2.16.7 - https://phabricator.wikimedia.org/T200739 (10Paladox) p:05Normal→03High After the incident over the weekend, it's brought up the fact we cannot fall over to gerrit2001 due to the db. This update will allow us t... [23:09:21] (03PS2) 10Thcipriani: Fix blocking users [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497419 (owner: 10Paladox) [23:10:04] thank you thcipriani! [23:16:10] (03CR) 10Thcipriani: [V: 03+2 C: 03+2] Fix blocking users [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497419 (owner: 10Paladox) [23:16:33] thanks! [23:16:39] now users can be blocked instantly [23:17:24] Project beta-code-update-eqiad build #239305: 04FAILURE in 1 min 21 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239305/ [23:24:19] Project beta-code-update-eqiad build #239306: 04STILL FAILING in 1 min 18 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239306/ [23:30:13] (03PS1) 10Thcipriani: Revert "Fix blocking users" [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497427 [23:30:43] (03CR) 10Thcipriani: [V: 03+2 C: 03+2] Revert "Fix blocking users" [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497427 (owner: 10Thcipriani) [23:34:19] Yippee, build fixed! [23:34:19] Project beta-code-update-eqiad build #239307: 09FIXED in 1 min 18 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/239307/ [23:35:17] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] [23:44:35] 10Continuous-Integration-Config, 10Growth-Team, 10StructuredDiscussions, 10Patch-For-Review, 10User-kostajh: Fix Flow random test failures - https://phabricator.wikimedia.org/T208988 (10Tgr) Same issue in https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-hhvm-docker/10779/console: `... [23:50:05] (03PS1) 10Paladox: Fix blocking users [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/497429 [23:50:07] thcipriani ^^ [23:50:13] correctly tested this time [23:50:17] :) [23:54:56] who knew that using a group does not inherit the groups specified. [23:55:22] at least not in All-Projects.