[00:01:38] Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO (201909), Release Pipeline, User-zeljkofilipin: Gerrit/Argo CI proof of concept - https://phabricator.wikimedia.org/T229246 (brennen)
[00:01:40] Release-Engineering-Team (Kanban), User-zeljkofilipin: Consider and evaluate possible new CI tooling - https://phabricator.wikimedia.org/T217325 (brennen)
[00:03:25] (PS1) EBernhardson: update tox-pyspark container to spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542267
[00:05:13] (CR) jerkins-bot: [V: -1] update tox-pyspark container to spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542267 (owner: EBernhardson)
[00:09:05] (PS2) EBernhardson: update tox-pyspark container to spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542267
[00:11:01] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:32:09] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:49:57] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:37:04] !log Killed the job that was waiting for executor for beta-scap-eqiad, it's now running after 4 hours not
[01:37:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[01:38:29] Is github replication broken?
[01:38:36] or backed up
[01:38:58] https://github.com/wikimedia/mediawiki-extensions-CirrusSearch and https://github.com/wikimedia/mediawiki-extensions-Elastica are out of date
[02:04:07] Reedy: only 4 tasks in queue. does not look like it usually looks when broken
[02:06:07] replication_log shows normal replication to gerrit2001 but does NOT mention github
[02:07:31] remote "github" is in replication.config as normal and I don't think we changed it
[02:11:20] tried to start the replication with "manual" commands via ssh
[02:13:31] Project mediawiki-core-doxygen-docker build #10522: FAILURE in 0.24 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/10522/
[02:14:47] Reedy: i think i fixed it. logged in operations
[02:14:54] i see it pushing to github now
[02:15:01] before it wasn't doing that all day
[02:16:35] except there are some repos not found on the github side. that's probably normal
[03:03:07] cheers
[03:14:30] Yippee, build fixed!
[03:14:30] Project mediawiki-core-doxygen-docker build #10523: FIXED in 10 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/10523/
[06:29:56] Release-Engineering-Team: php-composer-security-docker currently failing with "curl: command not found" error - https://phabricator.wikimedia.org/T235221 (Daimona) Probably a side effect of T234623, due to the new image not having curl installed.
[06:55:03] Yippee, build fixed!
[06:55:04] Project mwcore-phpunit-coverage-master build #229: FIXED in 3 hr 55 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/229/
[08:19:14] latest gerrit fun, the gerrit-replica does not support "query" :-\
[08:20:49] I'd like to experiment with a Jenkins job, and am thinking of having the job pull a docker image for testing, from hub.docker.com. Is this a terrible idea?
[08:29:23] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201910), Upstream: Gerrit replica does not support the 'query' command - https://phabricator.wikimedia.org/T235251 (hashar)
[08:30:07] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO (201910), Developer-Advocacy, wikimedia.biterg.io: biterg.io Gerrit crawling probably stresses the server too much - https://phabricator.wikimedia.org/T234328 (hashar) Resolved→Open On the replica:...
[08:33:09] awight: most probably yes
[08:33:12] I mean
[08:33:18] most probably yes it is a bad idea :-]]]
[08:33:33] and surely for testing, you should be able to conduct it on a local VM
[08:34:32] hashar: Thank you for the (in)sanity check ;-)
[08:35:42] I've done most of the smoke-testing locally, but still want to try publishing the coverage results.
[08:37:19] Also, I used a variable I don't understand, and wanted to see if there was invisible magic glue. This is the `$DOC_PROJECT` value passed to cover-publish.
[08:44:07] awight: DOC_PROJECT is injected by Zuul parameter functions:
[08:44:11] zuul/parameter_functions.py: params['DOC_PROJECT'] = raw_project.replace('/', '-')
[08:44:44] :) thanks, I think I was grepping at 5pm yesterday and completely code-blind
[08:44:44] and there is also DOC_SUBPATH forged from the branch or tag
[08:44:51] all of that is rather messy though
[08:47:43] Okay, I'm happy with my patch in that case. I still haven't tried the cover-publish step, but the job is wired under trigger `experimental` so I can safely smoke-test the full integration on WMF Jenkins.
[09:59:58] (CR) Awight: "Good to note."
[integration/quibble] - https://gerrit.wikimedia.org/r/541862 (https://phabricator.wikimedia.org/T235118) (owner: Hashar)
[10:00:01] (CR) Awight: [C: +2] releasing: do run quibble before tagging [integration/quibble] - https://gerrit.wikimedia.org/r/541862 (https://phabricator.wikimedia.org/T235118) (owner: Hashar)
[10:00:18] (CR) Awight: [C: +2] changelog: begin new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/541860 (owner: Hashar)
[10:02:28] (CR) Thiemo Kreuz (WMDE): [C: +1] Coverage report for PHP extensions [integration/config] - https://gerrit.wikimedia.org/r/542071 (https://phabricator.wikimedia.org/T185583) (owner: Awight)
[10:03:04] (Merged) jenkins-bot: changelog: begin new version cycle [integration/quibble] - https://gerrit.wikimedia.org/r/541860 (owner: Hashar)
[10:03:06] (Merged) jenkins-bot: releasing: do run quibble before tagging [integration/quibble] - https://gerrit.wikimedia.org/r/541862 (https://phabricator.wikimedia.org/T235118) (owner: Hashar)
[10:10:25] (CR) Awight: [C: +2] "Wow, you found quite a few of these!" (5 comments) [integration/quibble] - https://gerrit.wikimedia.org/r/541806 (owner: Hashar)
[10:11:13] (Merged) jenkins-bot: Consistent http host [integration/quibble] - https://gerrit.wikimedia.org/r/541806 (owner: Hashar)
[10:25:28] (PS3) Awight: Npm install before each node command [integration/quibble] - https://gerrit.wikimedia.org/r/540387 (https://phabricator.wikimedia.org/T225008)
[10:42:25] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an...
- https://phabricator.wikimedia.org/T234635
[10:43:13] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an... - https://phabricator.wikimedia.org/T234635
[10:43:40] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an... - https://phabricator.wikimedia.org/T234635
[10:46:46] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an... - https://phabricator.wikimedia.org/T234635
[10:59:37] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an... - https://phabricator.wikimedia.org/T234635
[11:32:10] hellooo releng! I've been getting a 500 error every time I've tried to abandon this change over the last couple days
[11:32:21] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/517423/
[11:33:40] * hauskater sees
[11:34:13] it's "Working..." for me
[11:34:28] and yup, Error 500 now
[11:36:03] fdans: LockFailureException: Update aborted with one or more lock failures:
[11:36:19] no idea what that means and hashar just went to have lunch
[11:36:54] hauskater: no rush at all from me, I was wondering if this was something happening to other people
[11:37:31] fdans: does retrying work?
[11:38:09] paladox: nope, takes a while showing "working..." and then 500s
[11:39:24] Oh
[11:39:26] paladox / fdans -- trace: https://phabricator.wikimedia.org/P9312
[11:41:12] Thanks!
[11:41:13] I think there must be a stale lock
[11:41:14] cc thcipriani ^
[11:41:23] (PS1) Pwirth: parameter_functions: Add dependency "BlueSpiceExtendedSearch" to BlueSpiceStatistics [integration/config] - https://gerrit.wikimedia.org/r/542412
[11:42:44] it says LOCK_FAILURE
[11:42:51] but I'm not aware of any lock
[11:43:12] (CR) jerkins-bot: [V: -1] parameter_functions: Add dependency "BlueSpiceExtendedSearch" to BlueSpiceStatistics [integration/config] - https://gerrit.wikimedia.org/r/542412 (owner: Pwirth)
[11:43:12] perhaps the db locked something for a transaction and forgot to clear it?
[11:44:35] rebasing also fails
[11:45:15] Nope
[11:45:16] We use NoteDB :)
[11:45:17] That would only be for groups as that’s currently in the db
[11:45:47] (PS2) Pwirth: parameter_functions: Add dependency "BlueSpiceExtendedSearch" to BlueSpiceStatistics [integration/config] - https://gerrit.wikimedia.org/r/542412
[11:47:12] Yeh
[11:47:13] (Unrelated, I saw https://phabricator.wikimedia.org/P9313)
[11:47:13] thcipriani: ^ touches LocalCache
[11:47:21] fdans: so, something weird. A Queimada might solve it
[11:48:25] Though that was at 2am
[11:49:35] lunch time for me as well
[11:49:37] see you
[11:49:41] hauskater: very handy that I'm currently in Santiago then :)
[12:04:56] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an...
- https://phabricator.wikimedia.org/T234635
[12:06:45] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO (201910), International-Developer-Events, Wikimedia-Technical-Conference-2019, and 2 others: Wikimedia Technical Conference 2019 Session: System level testing: patterns an... - https://phabricator.wikimedia.org/T234635
[12:37:55] (CR) Hashar: [C: -1] "The repo is used for https://integration.wikimedia.org/ which is served by contint1001. It is using Jessie and thus php5.6." [integration/docroot] - https://gerrit.wikimedia.org/r/542175 (owner: Jforrester)
[12:47:46] Release-Engineering-Team, MediaWiki-User-management, User-DannyS712: User rights validation is malfunctioning - https://phabricator.wikimedia.org/T234743 (Zache) Would it help for debugging if we could replicate this in test2.wikipedia.org? I think that adding `unreviewedpages` right to sysops in [...
[12:52:47] (CR) Hashar: [C: +2] "Sounds great thanks :]" [integration/config] - https://gerrit.wikimedia.org/r/542267 (owner: EBernhardson)
[12:55:06] (Merged) jenkins-bot: update tox-pyspark container to spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542267 (owner: EBernhardson)
[13:04:22] (PS1) Hashar: Switch mjolnir to Spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542423
[13:07:18] (CR) Ottomata: "FYI i hope to do this in the next couple of weeks:" [integration/config] - https://gerrit.wikimedia.org/r/542423 (owner: Hashar)
[13:09:41] (PS1) Octfx: Edit Project Config [extensions/WikiSEO] (refs/meta/config) - https://gerrit.wikimedia.org/r/542425
[13:25:53] (PS2) Hashar: Switch mjolnir to Spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542423
[13:25:55] (PS1) Hashar: docker: update Spark keyring [integration/config] - https://gerrit.wikimedia.org/r/542428
[13:26:19] (CR) Hashar: [C: +2] docker: update Spark keyring
[integration/config] - https://gerrit.wikimedia.org/r/542428 (owner: Hashar)
[13:27:58] (Merged) jenkins-bot: docker: update Spark keyring [integration/config] - https://gerrit.wikimedia.org/r/542428 (owner: Hashar)
[13:32:30] !log Build docker-registry.discovery.wmnet/releng/tox-pyspark:0.5.1 for ebernhardson
[13:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[13:32:37] grr building
[13:41:58] (PS3) Hashar: Switch mjolnir to Spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542423
[13:57:36] (CR) Hashar: [C: +2] "Job updated and it seems to work :-]" [integration/config] - https://gerrit.wikimedia.org/r/542423 (owner: Hashar)
[13:59:55] (Merged) jenkins-bot: Switch mjolnir to Spark 2.3.1 [integration/config] - https://gerrit.wikimedia.org/r/542423 (owner: Hashar)
[14:05:23] hashar i saw https://phabricator.wikimedia.org/P9313 which leads to LocalCache (it was thrown at 2am this morning),
[14:05:28] i wonder if this is related
[14:05:39] https://github.com/google/guava/blob/v24.1.1/guava/src/com/google/common/cache/LocalCache.java#L2023
[14:10:56] nah, it's not.
[14:25:50] Gerrit: Gerrit crashed due to out of Heap - https://phabricator.wikimedia.org/T225166 (Paladox)
[14:25:54] Gerrit, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Paladox)
[14:25:56] Gerrit: Gerrit crashed due to out of Heap - https://phabricator.wikimedia.org/T225166 (Paladox) p:Triage→Normal
[15:22:44] MediaWiki-Releasing, Security: Release MediaWiki 1.31.4/1.32.4/1.33.1 - https://phabricator.wikimedia.org/T225151 (sbassett)
[15:24:05] MediaWiki-Releasing, Security: Write and send supplementary release announcement for extensions with security patches (MediaWiki 1.31.4/1.32.4/1.33.1) - https://phabricator.wikimedia.org/T232113 (sbassett)
[15:29:27] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (Bstorm)
[15:41:18] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (Bstorm) Although, this may be the wrong way if there is a way to smoothly move/rename projects in the future. In that case...
[15:50:13] hey hey, it's me again, How can I run createAndPromote on beta cluster? I need to give sysop group to one user for testing
[15:51:04] raynor: log onto deployment-deploy01.deployment-prep.eqiad.wmflabs
[15:51:05] just ` mwscript createAndPromote.php --wiki=enwiki ....` on deploy1001 ?
[15:51:12] Then the same as you would on prod
[15:53:30] thx Reedy
[15:53:32] it worked
[15:59:36] !log ssh -p 29418 gerrit.wikimedia.org gerrit create-project cloud.git --permissions-only --description="Container for all other cloud/* projects for permissions, etc."
- T235279
[15:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:59:42] T235279: Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279
[16:00:00] well... fatal: Too many arguments: for
[16:00:01] heh
[16:03:08] !log ssh -p 29418 gerrit.wikimedia.org gerrit create-project cloud.git --permissions-only - T235279
[16:03:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:03:16] i guess there's some kind of bug with --description
[16:41:08] (CR) Jforrester: [C: +2] parameter_functions: Add dependency "BlueSpiceExtendedSearch" to BlueSpiceStatistics [integration/config] - https://gerrit.wikimedia.org/r/542412 (owner: Pwirth)
[16:44:07] (Merged) jenkins-bot: parameter_functions: Add dependency "BlueSpiceExtendedSearch" to BlueSpiceStatistics [integration/config] - https://gerrit.wikimedia.org/r/542412 (owner: Pwirth)
[16:45:00] !log Zuul: Add dependency "BlueSpiceExtendedSearch" to BlueSpiceStatistics
[16:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:17:28] Continuous-Integration-Infrastructure, Zuul: Stop/Restart tests for zuul - https://phabricator.wikimedia.org/T230019 (Daimona) Another use case is when a quick test (e.g. phan) fails, and you're not interested to see the results of other (longer) tests. At least based on my usual workflow (which sometime...
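Note on the `--description` failure at 16:00 above: a likely cause, not confirmed anywhere in this log, is ssh quoting rather than a Gerrit bug. ssh joins its arguments into one string and Gerrit parses that string again, so quotes consumed by the local shell are gone by the time Gerrit sees the value, and the description arrives as several separate words. Gerrit's `create-project` documentation suggests wrapping the value in single quotes inside the outer quoting. A minimal sketch of that idea, where the helper name `gerrit_create_project_argv` is hypothetical and the host, project and description are copied from the `!log` entry:

```python
def gerrit_create_project_argv(host, project, description, port=29418):
    """Build an argv list for `gerrit create-project` over ssh.

    Hypothetical helper. The description is wrapped in literal single
    quotes so that it survives the second round of parsing on the Gerrit
    side. Assumes the description itself contains no single quote.
    """
    if "'" in description:
        raise ValueError("description must not contain a single quote")
    return [
        "ssh", "-p", str(port), host,
        "gerrit", "create-project", project,
        "--permissions-only",
        # The inner quotes are part of the argument on purpose.
        "--description", f"'{description}'",
    ]


argv = gerrit_create_project_argv(
    "gerrit.wikimedia.org",
    "cloud.git",
    "Container for all other cloud/* projects for permissions, etc.",
)
# Passing argv to subprocess.run(argv) would skip the local shell entirely,
# so only Gerrit's own parsing has to be satisfied.
```

The typed-by-hand equivalent would be `--description "'Container for all other cloud/* projects for permissions, etc.'"`.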
[17:20:49] Continuous-Integration-Config, Google-Code-in-2019: Add yourself to the Jenkins whitelist in Gerrit to trigger testing unit test failures or code style issues yourself - https://phabricator.wikimedia.org/T235286 (Urbanecm)
[17:57:59] !log Click "Disable Publishing" on extension-Elastica Phab mirror, T235233, T143162
[17:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:58:03] T143162: Reduce task notification noise/frequency of changes to associated open patchsets - https://phabricator.wikimedia.org/T143162
[17:58:03] T235233: Stop using PHP5 versions of ruflin/elastica and elasticsearch/elasticsearch - https://phabricator.wikimedia.org/T235233
[17:58:59] Diffusion, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO: Reduce task notification noise/frequency of changes to associated open patchsets - https://phabricator.wikimedia.org/T143162 (Krinkle) @mmodell That does not appear to have worked. I still every week see this n...
[18:00:21] Continuous-Integration-Config, Google-Code-in-2019: Add yourself to the Jenkins whitelist in Gerrit to trigger testing unit test failures or code style issues yourself - https://phabricator.wikimedia.org/T235286 (Gopavasanth)
[18:20:38] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (Bstorm) This is closely tied to T229936 in goals and direction, but I need to get some code up there fairly soon, so I pers...
[18:24:28] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (hashar) What Paladox did :] As for renaming / moving repositories, that is not possible in Gerrit out of the box. One coul...
[18:36:12] Continuous-Integration-Config, MediaWiki-extensions-General: Add phan to MediaWiki extensions and skins for static analysis [cloneable] - https://phabricator.wikimedia.org/T179554 (Gopavasanth)
[19:18:26] Deployments, Release-Engineering-Team, VisualEditor, Wikimedia-Logstash, and 3 others: Logstash discards messages from MediaWiki if they contain uncommon keys in the $context array - https://phabricator.wikimedia.org/T234564 (herron) >! In T234564#5565056, @Krinkle wrote: > Would it be possible t...
[19:36:59] paladox: thcipriani latest finding for gerrit, in replica mode, the daemon does not allow "query" command :-]
[19:37:10] --replica means commands have to be explicitly whitelisted in the source code
[19:37:18] since they should not do any write ;D
[19:37:33] and well... Query is not marked explicitly as for masterorslave or just for master
[19:37:40] so it defaults to master only
[19:37:56] which means I could not offload the biterg.io bot to the replica for their ssh gerrit query ;-\\
[19:40:08] paladox: thcipriani: yesterday Reedy found replication was not running to Github. I looked at it and the replication.log showed normal behaviour to gerrit2001 just .. did not mention github at all. only showed up in the log of the previous day. Not like it showed errors.. it just wasn't in it. I restarted Gerrit (last i had merged the autoreload=true/false change for replication.config) and then ran
[19:40:14] "start replication" via ssh and then github replication started again
[19:40:44] did we recently make a replication change without a restart?
[19:40:47] and it was back to normal. a bit odd since we did not even change the github part..
but also there was the autoreload thing
[19:41:01] Continuous-Integration-Config, Release-Engineering-Team: quibble-selenium jobs re-downloading npm packages (castor not loading) - https://phabricator.wikimedia.org/T234738 (hashar) Looks like the central store has some content: ` integration-castor03:/srv/jenkins-workspace/caches$ du -m -d1 mediawiki-cor...
[19:41:01] yeah, the autoreload thing is completely undefined
[19:41:02] thcipriani: yes
[19:41:12] in terms of how it's going to behave
[19:41:16] ack
[19:42:07] first: ssh -p 29418 dzahn@gerrit.wikimedia.org replication start --url github trying to start _just_ github
[19:42:33] then just "start" and it started fine
[19:42:38] yeh
[19:42:52] the first one alone did not work though
[19:42:58] maybe i got it wrong about --url
[19:43:05] it looked like it just has to match a part of it
[19:43:30] anyways. i think it's fine. just sharing anyways
[19:43:46] it was the missing restart somehow
[19:43:46] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:44:39] the only strange part is how it affected only github when we did not change the github remote
[19:47:18] Continuous-Integration-Config, Release-Engineering-Team: quibble-selenium jobs re-downloading npm packages (castor not loading) - https://phabricator.wikimedia.org/T234738 (hashar) Running npm install for Wikibase, the CPU skyrockets and the installation takes time on each of: ` > core-js@3.2.1 postinstal...
[19:49:01] holy shit
[19:49:04] gr
[19:50:55] mutante it'll be due to the reloading of the config
[19:50:59] by the replication plugin
[19:51:58] paladox: ack!
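Note on the `replication start --url github` attempt at 19:42 above: as far as the replication plugin's documentation goes, `--url` takes a substring and restricts the run to destinations whose URL contains it, and the command is normally given either `--all` or a project pattern as well. Whether the missing `--all` explains why the first attempt did nothing is not established in this log. A minimal sketch of building such a command, where the helper name `replication_start_argv` is hypothetical and the flags should be checked against the installed plugin version:

```python
def replication_start_argv(user, host, url_substring=None,
                           all_projects=True, port=29418):
    """Build an argv list for the replication plugin's `start` command.

    Hypothetical helper. `--url` and `--all` are taken from the plugin
    documentation as recalled, not verified against this installation.
    """
    argv = ["ssh", "-p", str(port), f"{user}@{host}", "replication", "start"]
    if url_substring:
        # Limit the run to destinations whose URL contains this substring.
        argv += ["--url", url_substring]
    if all_projects:
        # Schedule every project rather than a named project pattern.
        argv.append("--all")
    return argv


# Only the github destination, all projects.
github_only = replication_start_argv(
    "dzahn", "gerrit.wikimedia.org", url_substring="github"
)
# Every destination, all projects.
everything = replication_start_argv("dzahn", "gerrit.wikimedia.org")
```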
[19:56:40] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:57:33] thanks for the restart yesterday mutante glad github replication worked after that
[19:57:53] hashar: thanks for the restart overnight due to threads, sorry you're now sucked into that :)
[19:58:10] hmm
[19:58:21] so hmm "hashar" in #wikimedia-operations is actually a bot
[19:58:34] uhhh
[19:58:44] it reacts on cdanis speaking and uses an AI to check whether that can mention gerrit issue
[19:58:51] the AI is not really robust ;]
[19:58:54] ah
[19:59:05] :D
[19:59:07] then it has a corpus of random sentences to reply to hold the bar and pretend I am actually working
[19:59:30] sounds like a win
[19:59:33] eliminating toil
[19:59:39] pick_from( "oh yeah on it", "hmm", "what?!!!", "mais c'est quoi ce bins", "taking a thread dump hold on", "can you fill a task")
[19:59:59] if a manager gets involved (checking via ldap lookup), I then automatically !log systemctl restart gerrit
[20:00:03] Project mwcore-phpunit-coverage-master build #230: FAILURE in 5 hr 0 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/230/
[20:00:10] I still have to manually reload it though :-\
[20:00:18] more seriously,
[20:00:37] Paladox seems to have some lead and proposed a patch upstream to potentially work around the deadlock of doom
[20:00:53] all the rest. I must confess I am only as good as restarting the service :-\
[20:01:12] hashar: new hardware for gerrit arriving very soon
[20:01:23] hashar i think upstream want us to test it before they merge
[20:01:28] to confirm it fixes the issue
[20:02:29] hashar: by Oct 22nd we will have 64GB RAM and run on buster if nothing unexpected happens
[20:07:46] Hmm, the flooding of wikibugs is probably my fault.
[20:07:57] Individually DoSing CI with patches.
[20:12:45] paladox: yeah we should pick your patch up and test it out something
[20:13:13] I think Tyler said to delay to after the hardware/os upgrade
[20:13:35] and yeah 64G will definitely help !!!
[20:13:39] yeh that's better doing it after the hardware change to make sure that no issues arise.
[20:14:01] do you stick to java8 or is there an upgrade to java11 at the same time?
[20:15:15] paladox: https://phabricator.wikimedia.org/T175929 ?:)
[20:15:25] James_F: https://phabricator.wikimedia.org/T231733
[20:15:32] heh
[20:15:38] i created a change
[20:15:56] James_F: https://phabricator.wikimedia.org/T112032 :p
[20:16:19] maybe that deserves "reopen". not sure
[20:17:17] paladox: awww. abandoned. i see. but also ticket from 2017. should the ticket be declined ?
[20:18:12] oh, i have another one opened
[20:18:17] https://gerrit.wikimedia.org/r/#/c/labs/tools/wikibugs2/+/491868/
[20:18:24] mutante: Not really Phab actions. But yes.
[20:19:01] Yeah, next time I see Merlin around I'll talk about improvements.
[20:19:44] :))
[20:23:22] James_F: Kunal has a bot to handle the upgrade, it has some smartness monitoring the gate-and-submit queue to throttle the +2ing of patches
[20:23:32] that is rather nice for extremely massive upgrades
[20:23:47] but for a few repositories, it is probably not worth the hassle
[20:23:56] hashar: Yes, it's nice, but he's not around and it's relatively pressing to get rid of huge ignore lists for phan, so…
[20:24:09] yeah yeah
[20:24:21] no complaint, just making sure everyone knows about the Library Upgrader
[20:24:38] maybe we should productionize it / make it reusable
[20:24:54] That was Kunal's plan, but he got busy, so it's now half done.
[20:25:06] but it is at least half complete!
[20:25:08] Which means I can't use it any more (I used to be able to use it locally).
[20:25:13] which is better than not done at all :]
[20:25:17] oh
[20:25:19] :-\\\\\\\\\
[20:25:21] Yeah.
[20:25:27] Oh well, he'll be back.
[20:26:04] I am sometimes wondering whether that is the kind of task we could ask developers for help on
[20:35:16] Maybe, but it's also a fiddly series of interrelated patches in lots of repos.
[21:43:28] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (thcipriani) >>! In T235279#5567259, @hashar wrote: > What Paladox did :] +1 Looks like this has created the "gerrit.wikim...
[22:38:43] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (Bstorm) So! Now we have cloud/toolforge in place, and I'm following the usual procedure to make my new repo for T234231....
[22:38:55] Gerrit, cloud-services-team (Kanban): Create a new repository for kubernetes tool under a new directory in gerrit that doesn't exist yet - https://phabricator.wikimedia.org/T235279 (Bstorm) Open→Resolved a:Bstorm
[22:40:10] Phabricator, Release-Engineering-Team (Development services), Release-Engineering-Team-TODO, Operations, serviceops: Reimage both phab1001 and phab2001 to stretch / buster - https://phabricator.wikimedia.org/T190568 (Papaul)