[00:05:51] legoktm: Is there a way to try and upgrade zlib locally in an already built container, and would that affect nodejs or is it compiled into it already?
[00:05:59] Not sure how stuff works at that level.
[00:06:51] you could try downloading and installing the debian buster .deb
[00:06:59] it looks ABI compatible so it theoretically would work
[00:07:22] assuming that zlib1g in buster doesn't have any other dependencies on stuff in buster
[00:07:25] k, there'll be like 10 things in there I've never done before, but I'll have a go.
[00:14:22] paladox, possibly but it looks like the remote is gerrit? it did 23:14:34 + git remote add zuul git://contint2001.wikimedia.org/operations/puppet
[00:14:25] hm I suppose that is a jenkins server
[00:14:35] but if it were internal why would it be pulling over git://
[00:14:49] yup, it pulls for git-daemon
[00:14:53] instead of the local filesystem or something. dunno, feels odd
[00:15:10] *from
[00:16:24] that's definitely an internal ref to zuul.
[00:19:57] legoktm: OK. I went for the sources.list approach, adding buster temporarily instead of the three lines currently there. Then apt-get update, apt search zlib1g, to see it offers 1.2.11-… and "upgradeable from 1.2.8-…", then installing it and confirming again via apt search that it says [installed]. And.... well, now it works locally. I can confirm it still fails in the unpatched container.
[00:21:09] That's not really surprising, but at least it shows that it works with newer zlib. The question then becomes: Is it broken because we're running Node 10 packaged for buster on stretch, or because Node 10 in general seems to be incompatible with Debian Stretch's zlib?
[00:21:36] but either way, quickest way to get us back on track might be to upgrade zlib.
[00:22:11] I imagine Debian won't touch zlib on stretch unless it affects the node version with stretch.
[00:22:21] (which is node 8, not node 10)
[00:22:38] but I suppose Node.js will want to make sure they work on stretch, but that's their issue, and could take a while to get fixed.
[00:30:17] a backport of zlib in stretch seems very very unlikely
[00:36:16] paladox, something else is wrong with jenkins too, see -operations
[00:44:20] Hmm
[01:13:37] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:21:57] oops
[01:21:59] zuul looks stuck
[01:24:50] so that's like three different weird jenkins/zuul behaviours now
[01:26:04] ohh
[01:26:11] that could be because zuul uses /p/
[01:27:36] Release-Engineering-Team (Kanban), Zuul, Patch-For-Review: Patch zuul to remove /p/ from /info/refs call - https://phabricator.wikimedia.org/T214807 (Paladox)
[01:27:44] Gerrit, Wikimedia-General-or-Unknown, Documentation, Epic, and 3 others: Update Gerrit /r/p/ links to /r/ - https://phabricator.wikimedia.org/T218844 (Paladox)
[01:28:45] Krenair: nah, it's because I overloaded it
[01:28:47] it's moving now
[01:30:33] or that ^^ :)
[02:05:46] legoktm, any idea why it's upset in https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/8597/console ?
[02:07:34] 17:33:27 mv: cannot stat '/srv/workspace/puppet/.tox/log/*': No such file or directory
[02:07:39] someone reported that earlier
[02:07:53] do we have a ticket?
[02:07:58] if not I could make one
[02:08:03] otherwise would like to subscribe
[02:09:01] Krenair: I don't see it, so creating one is probably a good idea
[02:10:16] legoktm Krenair: that's a red herring, most likely a real issue you missed somewhere above
[02:10:37] I reported it before, check my history (on phone right now)
[02:10:38] hmm
[02:10:55] It has a || true
[02:10:57] https://phabricator.wikimedia.org/T218962
[02:11:39] I don't see anything similar to that
[02:11:46] but you're right about the || true
[02:12:00] so
[02:12:09] it can't be the tox logs problem
[02:12:13] right
[02:12:15] Right
[02:12:25] but it's also not a puppet thing like with greg's, afaict
[02:13:03] No, just some other failed test most likely
[02:13:07] :)
[02:13:39] I'm still wondering about that wmf_style thing paladox pointed out
[02:13:46] no new violations listed
[02:13:48] but delta 1
[02:13:50] is that normal?
[02:14:25] it might be the source of the problem
[02:14:47] https://integration.wikimedia.org/ci/job/operations-puppet-tests-stretch-docker/8580/console - here's one that succeeded, delta 0
[02:14:52] no new violations
[02:14:59] I think wmf_style is up to something
[02:15:24] I don't think that module was fixed to pass the wmf style thing
[02:16:36] sure but wmf_style should list the error if it's going to fail on that basis
[02:32:35] guess I'll make a ticket then
[02:35:04] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Unexplained failure of puppet commit - https://phabricator.wikimedia.org/T219085 (Krenair)
[02:38:25] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[02:40:18] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Unexplained failure of puppet commit, possibly wmf_style related - https://phabricator.wikimedia.org/T219085 (Krenair)
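The `|| true` point raised at 02:10:55 can be illustrated with a minimal sketch. It assumes the CI build step aborts on the first failing command (modelled here with `set -e`) and uses a made-up missing path in place of the real tox log directory; neither detail is taken from the job's actual configuration.

```python
import subprocess


def run_step(script: str) -> int:
    """Run a shell snippet under `set -e` and return its exit status."""
    # Assumption: the build step stops at the first failing command,
    # which `set -e` reproduces for this sketch.
    result = subprocess.run(
        ["sh", "-c", "set -e\n" + script],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# Hypothetical missing path, standing in for an empty log directory.
missing = "/nonexistent-dir-for-example/*"

# Without the guard, the failing mv aborts the step with a non-zero status.
unguarded = run_step(f"mv {missing} /tmp/\necho reached")

# With `|| true`, the mv failure is swallowed and the step carries on.
guarded = run_step(f"mv {missing} /tmp/ || true\necho reached")

print("unguarded failed:", unguarded != 0)
print("guarded succeeded:", guarded == 0)
```

Under those assumptions, the `mv: cannot stat` message would still appear in the console, but it could not be what turned the build red, which matches the conclusion reached in the channel.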
[02:55:09] PROBLEM - Puppet staleness on deployment-restbase02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[03:08:17] Beta-Cluster-Infrastructure, Mathoid, Operations, Core Platform Team Backlog (Watching / External), and 2 others: remove mathoid from scb - https://phabricator.wikimedia.org/T200832 (Krenair) deployment-mathoid still exists and has been failing puppet runs since December 3rd when profile::mathoid...
[03:09:05] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (Krenair)
[03:25:26] Gerrit, Release-Engineering-Team, Operations: Create Gerrit Administrator right policy - https://phabricator.wikimedia.org/T218686 (Legoktm) >>! In T218686#5039340, @hashar wrote: > We need a dedicated policy. Granting Gerrit administrative rights is a lot more responsibilities and require a lot of t...
[03:30:16] Deployments, Release-Engineering-Team (Backlog): Review removal of ukwikimedia wiki - https://phabricator.wikimedia.org/T218170 (Legoktm) AIUI the wiki was never formally deleted (just closed), the domain was redirected and the wiki continued to exist until these errors started cropping up and people beg...
[03:36:22] Deployments, Release-Engineering-Team (Backlog): Review removal of ukwikimedia wiki - https://phabricator.wikimedia.org/T218170 (Peachey88)
[03:39:55] MediaWiki-Codesniffer, MediaWiki-Documentation, Documentation, Patch-For-Review, Upstream: Doxygen doesn't handle `@inheritDoc` by default, only `@inheritdoc` - https://phabricator.wikimedia.org/T219001 (Legoktm) Filed an issue upstream for #3: https://github.com/doxygen/doxygen/issues/6900
[03:44:55] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 2.25% of data above the critical threshold [3.0]
[04:12:23] !log removed php7.0-fpm package (conflicting with php7.2-fpm) and removed /etc/nginx/sites-enabled/default (conflicting with apache, puppet will remove the available copy too) from -deploy02, -jobrunner03, -mwmaint01, and -mediawiki-07 hosts to try to get puppet there happy again
[04:12:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[04:34:14] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair)
[04:36:44] Beta-Cluster-Infrastructure: Migrate away from Debian Jessie to Debian Stretch - https://phabricator.wikimedia.org/T218729 (Krenair)
[04:37:13] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair)
[04:42:19] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair) So I suggest we: * Confirm db03 is unused, we have any data we need from it, then eliminate it. * Create deployment-db06 as stretch and begin the process of copying data in from deployment-db05, a...
[04:49:02] Beta-Cluster-Infrastructure, cloud-services-team: Ensure we are unlikely to have both deployment-prep DB instances hosted together again in future - https://phabricator.wikimedia.org/T219088 (Krenair)
[04:49:45] Beta-Cluster-Infrastructure, cloud-services-team: Ensure we are unlikely to have both deployment-prep DB instances hosted together again in future - https://phabricator.wikimedia.org/T219088 (Krenair)
[04:52:37] Beta-Cluster-Infrastructure, cloud-services-team: Ensure we are unlikely to have both deployment-prep DB instances hosted together again in future - https://phabricator.wikimedia.org/T219088 (Krenair)
[05:00:00] Beta-Cluster-Infrastructure: Puppet error on deployment-imagescaler03 due to conflicting Node.js packages - https://phabricator.wikimedia.org/T219089 (Krenair)
[05:01:57] Beta-Cluster-Infrastructure: deployment-mediawiki-09 has puppet error due to libpcre3 package problems - https://phabricator.wikimedia.org/T219090 (Krenair)
[05:02:44] Beta-Cluster-Infrastructure: deployment-mediawiki-09 has puppet error due to libpcre3 package problems - https://phabricator.wikimedia.org/T219090 (Krenair) `root@deployment-mediawiki-09:~# apt-cache policy libpcre3 libpcre3: Installed: 2:8.42-1+0~20190203125157.5+stretch~1.gbp79d75d Candidate: 2:8.42-1+...
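The monitoring alerts in this log report the share of sampled data points above a threshold (for example 57.14% above 140.0 at 01:13:37). A rough sketch of that arithmetic follows. The sample values are invented so that 4 of 7 points exceed the threshold; the real check's data window and exact comparison are not shown in the log, so this is only one combination that would yield the same figure.

```python
def percent_above(values, threshold):
    """Return the percentage of data points strictly above `threshold`."""
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


# Hypothetical queue-length samples, not taken from the real dashboard.
samples = [120, 150, 160, 130, 170, 145, 110]

pct = percent_above(samples, 140.0)
print(f"{pct:.2f}% of data above the critical threshold [140.0]")
```

Whether a given percentage trips the alert depends on the check's configured limit, which the log does not state.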
[05:14:52] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [2.0]
[05:39:21] !log cleaned up old puppet certs/nodes -certcentral-testclient03 -certcentral-testdns -certcentral03 -zotero01 -eventgate-analytics -t153468-test -rd3-cptest-master01 -maps05
[05:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[05:49:15] Beta-Cluster-Infrastructure, cloud-services-team: Ensure we are unlikely to have both deployment-prep DB instances hosted together again in future - https://phabricator.wikimedia.org/T219088 (bd808) The openstack::monitor::spreadcheck Puppet module sets up the nrpe monitor for Toolforge instances. Adding...
[05:51:37] Beta-Cluster-Infrastructure, cloud-services-team, Patch-For-Review: Ensure we are unlikely to have both deployment-prep DB instances hosted together again in future - https://phabricator.wikimedia.org/T219088 (Krenair) a:Krenair
[06:11:45] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<40.00%)
[07:16:47] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:53:27] PROBLEM - Host integration-slave-docker-1046 is DOWN: CRITICAL - Host Unreachable (172.16.1.115)
[10:22:02] (CR) Umherirrender: [C: +1] Whitelist more phan annotations (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/498686 (owner: Legoktm)
[10:52:56] Hi everyone, do we have some problems with Gerrit HTTP auth?
I'm unable to do git push, with remote set as https://gerrit.wikimedia.org/r/labs/tools/urbanecmbot/, where 'unable' means git complains 'fatal: Authentication failed for 'https://gerrit.wikimedia.org/r/labs/tools/urbanecmbot/''
[10:53:09] I've regenerated Gerrit HTTP password several times
[10:53:22] the non-anonymous URL from Gerrit doesn't work either
[10:54:13] ftr, I'm trying it at Toolforge, but trying the same thing locally doesn't work either
[11:35:43] can you push with ssh auth or no?
[12:01:19] apergos: you mean like pushing straight into the repo?
[12:01:57] no... was that the intent here, to push to the actual repo rather than to refs/for/master or whatever it is?
[12:02:47] Oh I see what he means
[12:02:54] He wants to push over https
[12:03:04] Urbanecm: use your ldap password
[12:04:28] Release-Engineering-Team (Watching / External), wikitech.wikimedia.org, Wikimedia-production-error, cloud-services-team (Kanban): labtestweb2001 is sending updates to a read-only db host: db2037 - https://phabricator.wikimedia.org/T201082 (GTirloni)
[12:07:30] paladox, okay. Why is the "HTTP credentials" thing in settings then?
[12:07:34] Is there a link about this change?
[12:08:05] I don't think I can say why, but that’s been fixed.
[12:09:17] ok, thx
[12:09:58] Phabricator, Security-Team, Striker, Patch-For-Review, cloud-services-team (Kanban): Unable to mirror repository from git.legoktm.com into diffusion - https://phabricator.wikimedia.org/T143969 (GTirloni)
[12:10:39] Project-Admins, Striker, cloud-services-team (Kanban): Allow self-service creation of Maniphest projects for Tools - https://phabricator.wikimedia.org/T144111 (GTirloni)
[12:33:35] Continuous-Integration-Config, Operations, Operations-Software-Development, Patch-For-Review: Puppet tox: properly lint both Py2 and Py3 files - https://phabricator.wikimedia.org/T184435 (Volans) Given that py2 EOL is at the end of 2019, I'm not sure it's worth to spend our energies to make this...
[12:49:21] Beta-Cluster-Infrastructure, Operations, Traffic, HTTPS: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate - https://phabricator.wikimedia.org/T202564 (Zoranzoki21) Open→Resolved
[12:49:26] Beta-Cluster-Infrastructure, Parsing-Team, VisualEditor, VisualEditor-MediaWiki, and 3 others: Swedish beta cluster wiki is busted (Parsoid/RESTBase not set up?) - https://phabricator.wikimedia.org/T191184 (Zoranzoki21)
[14:13:09] Beta-Cluster-Infrastructure, Operations, Traffic, HTTPS: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate - https://phabricator.wikimedia.org/T202564 (Krenair) Resolved→Open @Zoranzoki21: they're just cherry-picks... specifically https://gerrit.wikimedia.org/r/#/c/497670/ an...
[14:13:14] Beta-Cluster-Infrastructure, Parsing-Team, VisualEditor, VisualEditor-MediaWiki, and 3 others: Swedish beta cluster wiki is busted (Parsoid/RESTBase not set up?)
- https://phabricator.wikimedia.org/T191184 (Krenair)
[16:04:57] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair) It doesn't look like mysql/mariadb is actually running on db03 so I'm going to shut it off and, assuming nothing comes up, delete it in a couple of weeks.
[16:06:38] !log shut off old deployment-db03 instance per T219087
[16:06:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:06:41] T219087: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087
[16:09:25] Phabricator (Upstream), Upstream: Add task status to phabricator notification mails - https://phabricator.wikimedia.org/T181001 (Aklapper) Open→Declined As described by Hashar, this can be achieved already: See `task-status(status)` on https://www.mediawiki.org/wiki/Phabricator/Help/Managing_mail...
[16:10:30] PROBLEM - Host deployment-db03 is DOWN: CRITICAL - Host Unreachable (172.16.5.23)
[16:13:53] Project-Admins: Rename #Wikimedia-production-error to something less generic? - https://phabricator.wikimedia.org/T216795 (Aklapper) Thanks for taking a closer look! Does anyone have an idea where that ["Report Application Error" form](https://phabricator.wikimedia.org/maniphest/task/edit/form/46/) is 'promi...
[16:17:57] Phabricator, Mail: "Wikimedia Foundation mail couldn't verify phabricator.wikimedia.org actually sent this message (and not a spammer)" in GMail - https://phabricator.wikimedia.org/T144381 (Aklapper) Open→Resolved >>! In T144381#4995774, @Aklapper wrote: > SPF recently got fixed in T216714; wonde...
[16:28:05] PROBLEM - Host integration-publishing02 is DOWN: CRITICAL - Host Unreachable (172.16.4.5)
[17:09:23] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair) Created deployment-db06, did standard deployment-prep puppet cert stuff, did mysql setup using my steps from T216067#4952271
[18:37:21] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair) Running: * `nc -l -p 9210 | /opt/wmf-mariadb101/bin/mbstream -x` from /srv/sqldata in screen `import` on deployment-db06 * `mariabackup --innobackupex --open-files-limit=8000 --stream=xbstream /sr...
[19:05:51] Scap: `scap update-interwiki-cache` needs unbreaking - https://phabricator.wikimedia.org/T219103 (MarcoAurelio)
[19:08:59] Scap: `scap update-interwiki-cache` needs unbreaking - https://phabricator.wikimedia.org/T219103 (MarcoAurelio) In the meanwhile I guess [[ /diffusion/EWMA/browse/master/dumpInterwiki.php | dumpInterwiki.php ]] is our friend.
[19:18:11] Beta-Cluster-Infrastructure: Get rid of deployment-db0[34] - https://phabricator.wikimedia.org/T219087 (Krenair) That completed, on deployment-db06 ran `mariabackup --innobackupex --apply-log --use-memory=12G /srv/sqldata`, `chown -R mysql: /srv`, `service mariadb start` Ran the following using /root/mysql.s...
[19:31:05] Urbanecm: push over HTTPS was disabled temporarily recently, lemme find the ticket
[19:31:17] thanks legoktm
[19:31:23] Urbanecm: https://phabricator.wikimedia.org/T218750
[19:32:02] thx #2
[19:34:17] (CR) MaxSem: [C: +2] Whitelist more phan annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/498686 (owner: Legoktm)
[19:35:05] (Merged) jenkins-bot: Whitelist more phan annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/498686 (owner: Legoktm)
[19:46:59] (CR) jenkins-bot: Whitelist more phan annotations [tools/codesniffer] - https://gerrit.wikimedia.org/r/498686 (owner: Legoktm)
[20:01:14] Continuous-Integration-Config, Continuous-Integration-Infrastructure: Unexplained failure of puppet commit, possibly wmf_style related - https://phabricator.wikimedia.org/T219085 (Krenair) So I read through the code that does this in `rake_modules/taskgen.rb` and got suspicious that something appearing i...
[20:29:18] Project-Admins: Rename #Wikimedia-production-error to something less generic? - https://phabricator.wikimedia.org/T216795 (Krinkle) {F28458343}
[20:42:35] James_F: Looks like mFR is failing for ooui due to src/ not existing.
[21:05:26] PROBLEM - Host deployment-sessionstore01 is DOWN: CRITICAL - Host Unreachable (172.16.3.4)
[21:08:06] (PS1) Krinkle: Allow +2'ers to set owner (not just the patch owner). [core] (refs/meta/config) - https://gerrit.wikimedia.org/r/498739
[21:09:03] (CR) Krinkle: [V: +2 C: +2] Allow +2'ers to set owner (not just the patch owner). [core] (refs/meta/config) - https://gerrit.wikimedia.org/r/498739 (owner: Krinkle)
[21:09:10] (CR) Krinkle: [V: +2] Allow +2'ers to set owner (not just the patch owner). [core] (refs/meta/config) - https://gerrit.wikimedia.org/r/498739 (owner: Krinkle)
[21:11:22] Hm.. Looks like I can't do that anymore.
refs/meta has +2 for "Gerrit Managers" and "Administrator", but Submit is only for Administrators which is an empty group.
[21:14:05] yup
[21:14:32] Seems like either the +2 for Gerrit Mgr is unintentional, or the Submit was forgotten. They should be the same given refs/meta doesn't have Jenkins.
[21:15:11] Nope, that was intentional
[21:15:22] but gerrit has a safety feature for All-Projects
[21:15:55] that is to prevent non admins/owners from being able to merge anything in that repo.
[21:16:33] I do not think it is intentional to grant Gerrit Mgr +2 rights on refs/meta on all repos, without also granting Submit on those same refs/config commits
[21:16:44] Krinkle it was
[21:16:56] paladox: please explain the purpose of pressing +2?
[21:17:11] if it doesn't do anything?
[21:17:13] but then we found out that gerrit has a safety feature (which is when we figured out why submit was not working).
[21:17:36] You are saying it is indeed unintentional, but due to a perceived bug are unable to fix it.
[21:17:58] yeh
[21:18:03] https://gerrit.wikimedia.org/r/#/admin/projects/All-Projects,access under "Reference: refs/meta/*"
[21:18:09] well it's not a bug, but an actual safety feature :)
[21:18:13] has a section for Submit rights specific to "refs/meta"
[21:18:18] Gerrit Mgr can be added there.
[21:18:32] ah right
[21:18:33] yeh
[21:19:49] actually, It already contains Project Owners which would normally suffice. Except for 'mediawiki/core' that is set to Administrators, not mediawiki.
[21:25:52] yup, the safety feature prevents non owners from being able to submit on refs/meta/*
[21:29:13] Project-Admins: Rename #Wikimedia-production-error to something less generic? - https://phabricator.wikimedia.org/T216795 (Aklapper) Ah, that's a bar I locally hide, that's why.
Indeed, my volunteer account with default settings exposes "Report Application Error" too prominently: {F28458534}
[21:32:22] Project-Admins: Rename #Wikimedia-production-error tag or "Report Application Error" form to something less generic? - https://phabricator.wikimedia.org/T216795 (Aklapper)
[22:46:58] paladox: i see, ok, that's quite useful.
[22:47:10] yup
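The +2 versus Submit mismatch discussed from 21:11:22 onward can be pictured with a toy model. This is only a sketch of the reasoning in the channel, not Gerrit's real permission engine: the class and function names are made up, the voter's group membership is assumed, and the All-Projects safety feature paladox mentions is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class RefAccess:
    """Toy stand-in for one access section (not Gerrit's real data model)."""
    plus2_groups: set = field(default_factory=set)   # groups allowed to vote Code-Review +2
    submit_groups: set = field(default_factory=set)  # groups allowed to press Submit


def can_vote_plus2(user_groups: set, access: RefAccess) -> bool:
    return bool(user_groups & access.plus2_groups)


def can_merge(user_groups: set, access: RefAccess) -> bool:
    # In this sketch the same user both votes and submits, so merging
    # needs the +2 right and the separate Submit right.
    return can_vote_plus2(user_groups, access) and bool(user_groups & access.submit_groups)


# State as described at 21:11:22: +2 for two groups, Submit for Administrators only.
refs_meta = RefAccess(
    plus2_groups={"Gerrit Managers", "Administrators"},
    submit_groups={"Administrators"},
)
user = {"Gerrit Managers"}  # assumed membership of the person voting

print(can_vote_plus2(user, refs_meta), can_merge(user, refs_meta))

# Change suggested at 21:18:18: add Gerrit Managers to the Submit section.
refs_meta.submit_groups.add("Gerrit Managers")
print(can_merge(user, refs_meta))
```

In the model, the vote succeeds but the merge does not until the Submit section also lists the group, which is the gap described in the channel. Whether the same change is enough on the real server depends on the safety feature for refs/meta/* that the log describes.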