[00:11:31] Release-Engineering-Team, Recommendation-API, Services, serviceops, Patch-For-Review: Migrate recommendation-api to kubernetes - https://phabricator.wikimedia.org/T241230 (bmansurov) @akosiaris thanks! The first two points were already done. I've created a chart and uploaded a patch. Would y...
[04:32:07] Project mwcore-phpunit-coverage-master build #433: FAILURE in 1 hr 32 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/433/
[09:22:21] Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Release, Train Deployments, User-brennen: 1.35.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T233864 (WMDE-leszek)
[09:34:46] Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-01 to 2020-03 (Q3)), Graphite, User-greg: Increase retention time of graphite stats for CI - https://phabricator.wikimedia.org/T242826 (fgiunchedi) No idea @greg right off the bat, although I see the panel `Gate...
[10:07:05] (PS1) Addshore: experimental node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/565985 (https://phabricator.wikimedia.org/T211784)
[10:07:42] (PS2) Addshore: experimental node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/565985 (https://phabricator.wikimedia.org/T211784)
[10:08:46] (CR) Addshore: [C: +2] experimental node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/565985 (https://phabricator.wikimedia.org/T211784) (owner: Addshore)
[10:09:53] (Merged) jenkins-bot: experimental node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/565985 (https://phabricator.wikimedia.org/T211784) (owner: Addshore)
[10:10:27] !log reload zuul for https://gerrit.wikimedia.org/r/#/c/integration/config/+/565985/
[10:10:29] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:10:53] !log the last zuul reload also pulled in "jjb: [java8-sonar-scanner] version bump to 0.5.4" .. T238004
[10:10:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:10:56] T238004: Implement sonarcloud integration for Java projects in the same way as PHP projects - https://phabricator.wikimedia.org/T238004
[10:37:58] Beta-Cluster-Infrastructure, MediaWiki-Core-Testing, MediaWiki-User-login-and-signup, Quality-and-Test-Engineering-Team (QTE), User-zeljkofilipin: Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login s... - https://phabricator.wikimedia.org/T243123
[10:40:00] Beta-Cluster-Infrastructure, MediaWiki-Core-Testing, MediaWiki-User-login-and-signup, Quality-and-Test-Engineering-Team (QTE), User-zeljkofilipin: Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login s... - https://phabricator.wikimedia.org/T243123
[10:42:27] Beta-Cluster-Infrastructure, MediaWiki-Core-Testing, MediaWiki-User-login-and-signup, Quality-and-Test-Engineering-Team (QTE), User-zeljkofilipin: Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login s... - https://phabricator.wikimedia.org/T243123
[10:43:06] Beta-Cluster-Infrastructure, MediaWiki-Core-Testing, MediaWiki-User-login-and-signup, MediaWiki-extensions-CentralAuth, and 2 others: Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login sesion` - https://phabricator.wikimedia.org/T243123 (zeljko...
[10:46:09] Continuous-Integration-Config, Wikidata: Wikibase CI: Quibble job should possibly include Math extension - https://phabricator.wikimedia.org/T201496 (Lucas_Werkmeister_WMDE) {T243122} would also have been caught by this.
[10:46:55] (PS1) Addshore: node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/566001 (https://phabricator.wikimedia.org/T211784)
[10:47:06] (CR) Addshore: [C: +2] node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/566001 (https://phabricator.wikimedia.org/T211784) (owner: Addshore)
[10:48:00] (Merged) jenkins-bot: node10 for wikidata/query/gui [integration/config] - https://gerrit.wikimedia.org/r/566001 (https://phabricator.wikimedia.org/T211784) (owner: Addshore)
[10:48:13] !log reload zuul for https://gerrit.wikimedia.org/r/#/c/integration/config/+/566001/ (node10 wikidata query gui)
[10:48:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:54:07] Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, JavaScript, Patch-For-Review: Upgrade all CI jobs from node6/npm3 to node10/npm6 across all projects - https://phabricator.wikimedia.org/T211784 (Addshore)
[11:35:34] Gerrit, Gerrit-Privilege-Requests, Massmailer: Give access to l10n-bot - https://phabricator.wikimedia.org/T243183 (abi_)
[11:51:31] Release-Engineering-Team, Recommendation-API, Services, serviceops, Patch-For-Review: Migrate recommendation-api to kubernetes - https://phabricator.wikimedia.org/T241230 (akosiaris) >>! In T241230#5815630, @bmansurov wrote: > @akosiaris thanks! The first two points were already done. I've c...
[12:20:52] Gerrit, Gerrit-Privilege-Requests, Massmailer: Give access to l10n-bot for repository labs/tools/massmailer - https://phabricator.wikimedia.org/T243183 (Aklapper)
[12:47:11] Gerrit, Gerrit-Privilege-Requests, Massmailer, User-MarcoAurelio: Give access to l10n-bot for repository labs/tools/massmailer - https://phabricator.wikimedia.org/T243183 (MarcoAurelio) Open→Resolved a:MarcoAurelio Done.
[13:42:34] (Abandoned) Kosta Harlan: Add missing runuser for Apache and make directory for php-fpm [releng/dev-images] - https://gerrit.wikimedia.org/r/555750 (owner: Kosta Harlan)
[14:22:39] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, Google-Code-in-2019, and 3 others: Stop using jsonlint and instead use eslint-plugin-json for the linting - https://phabricator.wikimedia.org/T220036 (Majavah)
[15:20:04] Project beta-update-databases-eqiad build #39602: FAILURE in 3.3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39602/
[15:42:34] Beta-Cluster-Infrastructure, Wikimedia-General-or-Unknown: session_name(): Cannot change session name when headers already sent in CommonSettings.php on line 510 - https://phabricator.wikimedia.org/T243219 (Reedy)
[15:43:39] Beta-Cluster-Infrastructure, Wikimedia-General-or-Unknown: session_name(): Cannot change session name when headers already sent in CommonSettings.php on line 510 - https://phabricator.wikimedia.org/T243219 (Reedy)
[15:51:06] Beta-Cluster-Infrastructure, Wikimedia-General-or-Unknown: session_name(): Cannot change session name when headers already sent in CommonSettings.php on line 510 - https://phabricator.wikimedia.org/T243219 (Daimona) I may be wrong, but errors like this one ("headers already sent") are often seen with som...
[15:52:47] Beta-Cluster-Infrastructure, Wikimedia-General-or-Unknown: session_name(): Cannot change session name when headers already sent in CommonSettings.php on line 510 - https://phabricator.wikimedia.org/T243219 (Reedy) >>! In T243219#5817321, @Daimona wrote: > I may be wrong, but errors like this one ("header...
[16:20:03] Project beta-update-databases-eqiad build #39603: STILL FAILING in 3.1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39603/
[16:23:00] Beta-Cluster-Infrastructure, MediaWiki-Core-Testing, MediaWiki-User-login-and-signup, MediaWiki-extensions-CentralAuth, and 2 others: Login to en.wikipedia.beta.wmflabs.org sometimes fails with `There seems to be a problem with your login sesion` - https://phabricator.wikimedia.org/T243123 (matthi...
[16:29:46] can someone check free disk space on integration-agent-docker-1008?
[16:30:08] I have two builds that look like they failed due to running out of space: https://integration.wikimedia.org/ci/job/quibble-vendor-selenium-docker/2644/consoleFull and https://integration.wikimedia.org/ci/job/quibble-vendor-selenium-docker/2675/consoleFull
[16:30:42] or something else might be wrong with /tmp, not sure
[16:30:51] “cannot create temp dir for user data dir”; “Cannot open "/tmp/server-94.xkm" to write keyboard description”
[16:31:29] Lucas_WMDE: Not loads, but looks ok to me
[16:31:35] /dev/vda2 19G 2.5G 16G 15% /
[16:31:56] hm
[16:32:11] is that df? Filesystem Size Used Avail Use% MountedOn?
[16:32:23] that would mean it’s almost empty
[16:32:47] oh yeah, I read it backwards
[16:32:48] 15% use
[16:32:48] lol
[16:33:22] strange
[16:33:23] Definitely none of the drives are anywhere near full
[16:33:30] highest is 74% usage
[16:33:36] ok, then I’m not sure where that error is coming from?
[16:34:21] I’ll create a phab task
[16:34:51] I also see..
[16:34:51] 12:44:14 INFO:backend.DevWebServer:[Mon Jan 20 12:44:14 2020] PHP Notice: tempnam(): file created in the system's temporary directory in /workspace/src/vendor/zordius/lightncandy/src/LightnCandy.php on line 152
[16:34:51] 12:44:14 INFO:backend.DevWebServer:[Mon Jan 20 12:44:14 2020] Can not generate tmp file under /tmp!!
[16:35:57] oh yeah, loads of those
[16:36:02] I didn’t notice that
[16:36:15] (well, “loads” in the second, not in the first build)
[16:37:36] Continuous-Integration-Infrastructure: Occasional build failures related to temporary directories on integration-agent-docker-1008 - https://phabricator.wikimedia.org/T243223 (Lucas_Werkmeister_WMDE)
[16:39:37] Continuous-Integration-Infrastructure: Occasional build failures related to temporary directories on integration-agent-docker-1008 - https://phabricator.wikimedia.org/T243223 (Lucas_Werkmeister_WMDE) The file system should have plenty of disk space (checked by @Reedy): ` Filesystem Size Used Avail Use...
[16:40:10] Continuous-Integration-Infrastructure: Occasional build failures related to temporary directories on integration-agent-docker-1008 - https://phabricator.wikimedia.org/T243223 (Lucas_Werkmeister_WMDE) Not high priority for now, let’s see if this keeps happening I guess.
[16:46:21] hm, https://integration.wikimedia.org/ci/job/quibble-vendor-selenium-docker/2677/console seems to have run out of space outside of /tmp
[16:46:25] 17:09:41 error: cannot lock ref 'refs/tags/1.31.4': Unable to create '/workspace/src/.git/refs/tags/1.31.4.lock': No space left on device
[16:47:23] Continuous-Integration-Infrastructure: Occasional build failures related to temporary directories / no space left on integration-agent-docker-1008 - https://phabricator.wikimedia.org/T243223 (Lucas_Werkmeister_WMDE)
[16:47:35] https://p.defau.lt/?3MlJ3K5WUvrNlHCr_nV8jg
[16:49:29] can that build really have filled up 16GB of available disk space… during the `git clone` step?
[16:49:32] very strange
[16:50:07] I can't believe so..
[16:50:17] My clone of wmf prod stuff, with git objects is 1.3G
[16:50:38] It's not some weirdness with the docker containers underneath is it?
[16:52:05] I don’t know enough docker for that
[16:52:16] Me neither, unfortunately
[16:52:26] FWIW, it is a WMF holiday today, so the Americans are likely out all day
[16:52:51] oh right
[16:53:01] I only found out on Friday :D
[16:53:20] no deploys today?
[16:53:36] so that’s why jouncebot didn’t ping me about SWAT today ^^
[16:53:51] no deploys according to calendar
[16:53:55] I see, Martin Luther Kind day
[16:54:03] *King
[16:54:24] oh, and next week is the all hands, all week
[16:54:31] It is!
[16:54:58] party week :P
[17:11:36] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, Google-Code-in-2019, and 3 others: Stop using jsonlint and instead use eslint-plugin-json for the linting - https://phabricator.wikimedia.org/T220036 (Majavah)
[17:20:07] Project beta-update-databases-eqiad build #39604: STILL FAILING in 6.6 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39604/
[17:23:53] Reedy: do you think T243129 has a fix?
[17:23:54] T243129: deployment-cache-upload05: Several millions of logstash error entries - https://phabricator.wikimedia.org/T243129
[17:24:27] turn it off
[17:25:23] lolol
[17:25:41] turn of prometheus?
[17:25:47] the whole machine
[17:26:01] I'm... not sure that's the right path?
[17:26:08] It fixes those errors
[17:26:13] Beta-Cluster-Infrastructure, Operations, observability: deployment-cache-upload05: Several millions of logstash error entries - https://phabricator.wikimedia.org/T243129 (Reedy)
[17:26:36] I'm not familiar with prometheus, someone in SRE should be able to weigh in a bit
[17:26:53] I think godog is?
[17:29:34] The last Puppet run was at Mon Jan 6 14:11:55 UTC 2020 (20357 minutes ago).
[17:29:55] on deployment-cache05 ?
[17:29:58] yes
[17:29:58] weird
[17:30:11] I wouldn't be surprised if that's part of the issue
[17:30:11] so something is broken
[17:30:15] yeah
[17:30:28] * Reedy pokes
[17:32:17] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Resource type not found: Wmflib::Service at /etc/puppet/modules/wmflib/functions/service/fetch.pp:4:51 on node deployment-cache-upload05.deployment-prep.eqiad.wmflabs
[17:32:19] Reedy:
[17:32:31] the problem would be prometheus-varnish-exporter (i.e. the thing that talks to varnish on one end and exposes metrics over http on the other)
[17:32:51] godog: I think I've seen entries with varnish as well
[17:33:12] yeah quite likely varnish isn't running as it should, and the exporter barfs
[17:33:28] is the puppetmaster borked too?
[17:34:06] I don't think so. Let me log in
[17:34:10] It's fine for other hosts
[17:34:41] hmm
[17:34:42] The last Puppet run was at Mon Jan 20 11:00:02 UTC 2020 (394 minutes ago).
[17:34:52] on deploy01
[17:35:04] indeed
[17:35:29] Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Resource type not found: Wmflib::Service at /etc/puppet/modules/wmflib/functions/service/fetch.pp:4:51 on node deployment-deploy01.deployment-prep.eqiad.wmflabs
[17:35:58] so we have the varnish-exporter and the puppet went to $#!t€
[17:36:43] That erroring line was only added a month or so ago
[17:47:53] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: connect to address 172.16.5.63 and port 8000: Connection refused
[17:47:53] PROBLEM - Parsoid on deployment-mediawiki-parsoid10 is CRITICAL: connect to address 172.16.0.141 and port 8000: Connection refused
[17:59:54] hauskatze: puppet broken due to a bug in puppet 4
[18:00:14] ouch
[18:00:28] puppet version 4 you mean?
[18:00:42] puppetmaster at d-p is 03 iirc
[18:03:18] Beta-Cluster-Infrastructure, Operations: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (Reedy)
[18:14:50] Beta-Cluster-Infrastructure, Operations: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (MarcoAurelio) FWIW: ` maurelio@deployment-deploy01:~$ sudo puppet --version 4.8.2 ` to which version do we need to upgrade?
[18:20:03] Project beta-update-databases-eqiad build #39605: STILL FAILING in 3.1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39605/
[18:20:18] Beta-Cluster-Infrastructure, Operations: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (Reedy) 5.5 I think
[18:59:25] Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, MediaWiki-Core-Testing, Google-Code-in-2019, and 3 others: Stop using jsonlint and instead use eslint-plugin-json for the linting - https://phabricator.wikimedia.org/T220036 (Majavah)
[19:09:01] Yippee, build fixed!
[19:09:01] Project mwcore-phpunit-coverage-master build #434: FIXED in 4 hr 9 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/434/
[19:20:03] Project beta-update-databases-eqiad build #39606: STILL FAILING in 3.1 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39606/
[20:01:36] Beta-Cluster-Infrastructure, Operations: Upgrade puppet in deployment-prep - https://phabricator.wikimedia.org/T243226 (Krenair) am guessing this is just us needing to get a new puppetmaster with buster instead of stretch
[20:20:04] Project beta-update-databases-eqiad build #39607: STILL FAILING in 3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39607/
[21:20:07] Project beta-update-databases-eqiad build #39608: STILL FAILING in 6.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39608/
[21:24:40] !log fix Jenkins Console Section patterns to account for change in Quibble output (no more colors and now "cmd: << Start/Finish x" instead of "cmd:x finished"
[21:24:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:20:03] Project beta-update-databases-eqiad build #39609: STILL FAILING in 3.2 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39609/
[22:21:25] wikibase error ^
[22:21:39] "TypeError from line 105 of /srv/mediawiki-staging/php-master/extensions/Wikibase/repo/includes/Store/PropertyTermsRebuilder.php: Argument 1 passed to Wikibase\\Repo\\Store\\PropertyTermsRebuilder::saveTerms() must be an instance of Wikibase\\DataModel\\Entity\\Property, null given"
[22:37:22] paladox: I filed a bug hours ago ;)
[22:37:29] oh, heh
[22:37:51] https://phabricator.wikimedia.org/T243218
[23:20:03] Project beta-update-databases-eqiad build #39610: STILL FAILING in 3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/39610/