[02:19:23] 10Release-Engineering-Team, 10cloud-services-team, 10wikitech.wikimedia.org, 10Wikimedia-log-errors: labtestweb2001 is sending updates to a read-only db host: db2037 - https://phabricator.wikimedia.org/T201082 (10Krinkle) [02:19:50] 10Release-Engineering-Team, 10cloud-services-team, 10wikitech.wikimedia.org, 10Wikimedia-log-errors: labtestweb2001 is sending updates to a read-only db host: db2037 - https://phabricator.wikimedia.org/T201082 (10Krinkle) [02:23:50] 10Release-Engineering-Team, 10cloud-services-team, 10Wikimedia-log-errors: labtestweb2001: Memcached error for key "WANCache:m:global:Wikimedia\Rdbms\LoadBalancer:server-read-only:db2037" on server "127.0.0.1:11213": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETR... - https://phabricator.wikimedia.org/T203479 [02:32:42] legoktm, looks like /usr/share/lintian/data/changes-file/known-dists is the place to make lintian be happy with stretch-wikimedia etc. [02:33:07] https://github.com/Debian/lintian/blob/13ccfaa7571f5478157a4bd92e1bfc9754ab7bea/checks/changes-file.pm#L122 [02:36:00] ahh nice :) [02:38:58] could do /usr/share/lintian/vendors/wikimedia/main/data/changes-file/known-dists I guess [02:40:22] though it still complains when I gbp buildpackage :/ [02:58:58] also the profile thing in modules/package_builder/manifests/init.pp is good [03:08:13] how do i properly link to a topic branch in gerrit in a commit message [03:08:30] when i want to say things like "after the changes in $topic_branch.. now we can do X" [03:08:53] a full URL would be https://gerrit.wikimedia.org/r/#/q/topic:icinga-stretch+(status:open+OR+status:merged) but is there a shortcut for topics within commit messages? [03:20:26] I don't think there's any shortcut [03:20:37] you can skip the "status:open OR status:merged" part though [03:21:36] *nod*.. yep. 
thanks [03:25:36] what would also be cool is to be able to ask gerrit to "show all open changes by people who don't have +2 / can't self-merge" [03:33:18] (03PS1) 10Legoktm: seccheck for Wikimedia-deployed MediaWiki skins and Example [integration/config] - 10https://gerrit.wikimedia.org/r/458095 [03:33:47] good night.. out [03:33:54] mutante: you can probably do -ownerin:operations I think [03:34:05] (03CR) 10Legoktm: [C: 032] seccheck for Wikimedia-deployed MediaWiki skins and Example [integration/config] - 10https://gerrit.wikimedia.org/r/458095 (owner: 10Legoktm) [03:35:37] (03Merged) 10jenkins-bot: seccheck for Wikimedia-deployed MediaWiki skins and Example [integration/config] - 10https://gerrit.wikimedia.org/r/458095 (owner: 10Legoktm) [03:35:55] !log deploying https://gerrit.wikimedia.org/r/458095 [03:35:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [04:03:59] (03PS6) 10Legoktm: Have all docker jobs use tmpfs for /tmp [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) [04:08:08] !log deployed https://gerrit.wikimedia.org/r/c/integration/config/+/457070 (tmpfs for /tmp) to all *tox*docker and *composer*docker jobs [04:08:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [04:10:03] (03CR) 10Legoktm: "Deployed to *tox*docker and *composer*docker jobs." [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm) [04:35:01] 10Phabricator: Document phabricator user groups in a central place - https://phabricator.wikimedia.org/T203511 (10Aklapper) We should turn all of them into `acl*` (see e.g. T126953) and then https://phabricator.wikimedia.org/project/query/O1eugK19EhOf/#R would basically fix this? What a group can do concretely... 
[05:08:59] 10Project-Admins, 10Security-Team: Undefined #Security-General and #Security-Other - https://phabricator.wikimedia.org/T109328 (10Aklapper) Thanks @chasemp! I've edited the project descriptions a bit more to be clearer. For the records, #Security-Core had three watchers (akosiaris, bawolff, jay8g), #Security-... [06:09:11] (03CR) 10KartikMistry: "> I don(t think any change NOT touching a debian file would ever need" [integration/config] - 10https://gerrit.wikimedia.org/r/457929 (owner: 10Ema) [07:30:19] 07:01:00 INFO:zuul.Cloner:Creating repo mediawiki/core from cache /srv/git/mediawiki/core.git [07:30:22] 07:01:00 DEBUG:git.cmd:AutoInterrupt wait stderr: "fatal: destination path '/src' already exists and is not an empty directory.\n" [07:33:32] Nikerabbit: hello. Please fill it as a task with a link to the Gerrit change and the Jenkins build :] [07:36:28] (03CR) 10Hashar: [C: 031] "I think my main concern was MediaWiki tests which could potentially fill that tmpfs and exhaust the host memory. If the quibble jobs ran" [integration/config] - 10https://gerrit.wikimedia.org/r/457070 (https://phabricator.wikimedia.org/T203181) (owner: 10Legoktm) [07:37:44] zeljkof: did you get wdio-mediawiki published for Popups? [07:42:14] hashar: no, got it working locally, it's in review https://gerrit.wikimedia.org/r/c/mediawiki/core/+/457942 [07:42:33] I have to reply to Krinkle's questions [07:42:56] (03Abandoned) 10Hashar: Revert "Migrate Math to Quibble" [integration/config] - 10https://gerrit.wikimedia.org/r/456411 (https://phabricator.wikimedia.org/T202266) (owner: 10Physikerwelt) [07:46:11] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:47:33] !log tested and removed a patch to operations/puppet on puppetmaster03. 
Solved a git rebase conflict between two changes (hope I did it well) and updated the nginx submodule
[07:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:48:57] PROBLEM - Puppet errors on deployment-chromium01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[07:49:50] zeljkof: and I have merged the selenium-daily script thing for Math https://gerrit.wikimedia.org/r/457486 :]
[07:50:12] hashar: cool!
[07:51:08] math was passing anyway, we're just popups away from green for all old jogbs
[07:51:09] jobs
[07:52:11] have you got the mediawiki-wdio patch reviewed/merged?
[07:52:39] hashar: it's in review https://gerrit.wikimedia.org/r/c/mediawiki/core/+/457942
[07:52:55] I have to reply to a few questions, will do in a few minutes, in the middle of something else
[07:58:59] PROBLEM - Puppet errors on deployment-ores01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[08:04:35] (CR) Hashar: [C: 2] "Thanks. And I had further discussion with ema over irc to confirm the behavior."
[integration/config] - https://gerrit.wikimedia.org/r/457929 (owner: Ema)
[08:06:11] (Merged) jenkins-bot: Only run debian-glue when debian/* is changed [integration/config] - https://gerrit.wikimedia.org/r/457929 (owner: Ema)
[08:07:26] (CR) Lokal Profil: ">" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/457926 (https://phabricator.wikimedia.org/T2033099) (owner: Lokal Profil)
[08:07:35] (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/457929 (owner: Ema)
[08:09:22] (Abandoned) Lokal Profil: Add commit-message-validator to labs/tools/heritage [integration/config] - https://gerrit.wikimedia.org/r/457926 (https://phabricator.wikimedia.org/T2033099) (owner: Lokal Profil)
[08:15:38] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, User-zeljkofilipin: Run tests daily targeting beta cluster for all repositories with Selenium tests - https://phabricator.wikimedia.org/T188742 (zeljkofilip...
[08:26:11] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[08:28:58] RECOVERY - Puppet errors on deployment-chromium01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:53:35] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Easy, User-zeljkofilipin: All repositories with Selenium tests should use wdio-mediawiki - https://phabricator.wikimedia.org/T199113 (zeljkofilipin)
[09:39:46] (PS1) Hashar: QA report: debian repo lacking debian-glue [integration/config] - https://gerrit.wikimedia.org/r/458134 (https://phabricator.wikimedia.org/T180330)
[09:41:14] (CR) Hashar: [C: 2] QA report: debian repo lacking debian-glue [integration/config] - https://gerrit.wikimedia.org/r/458134 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[09:42:37] (Merged) jenkins-bot: QA report: debian repo lacking debian-glue [integration/config] - https://gerrit.wikimedia.org/r/458134 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[09:44:45] (PS1) Hashar: QA report: drop debug return statement [integration/config] - https://gerrit.wikimedia.org/r/458136
[09:44:58] (CR) Hashar: [C: 2] QA report: drop debug return statement [integration/config] - https://gerrit.wikimedia.org/r/458136 (owner: Hashar)
[09:46:30] (Merged) jenkins-bot: QA report: drop debug return statement [integration/config] - https://gerrit.wikimedia.org/r/458136 (owner: Hashar)
[10:40:47] Release-Engineering-Team, GitHub-Mirrors, Wikidata, Composer, and 2 others: wikibase/javascript-api composer package is not installable (mainly due to a repo move) - https://phabricator.wikimedia.org/T203162 (Addshore) As there was no responce on the old ticket on github I created a new one @ htt...
[11:24:37] (PS1) Hashar: Add debian-glue-non-voting on operations/ repos [integration/config] - https://gerrit.wikimedia.org/r/458156 (https://phabricator.wikimedia.org/T180330)
[11:25:32] (PS2) Hashar: Add debian-glue-non-voting on operations/ repos [integration/config] - https://gerrit.wikimedia.org/r/458156 (https://phabricator.wikimedia.org/T180330)
[11:25:58] (PS1) Hashar: Add debian-glue-non-voting to cergen [integration/config] - https://gerrit.wikimedia.org/r/458158
[11:26:31] (PS1) Hashar: Add debian-glue-non-voting to mediawiki/services/poolcounter [integration/config] - https://gerrit.wikimedia.org/r/458159
[11:30:18] (CR) Hashar: [C: 2] Add debian-glue-non-voting on operations/ repos [integration/config] - https://gerrit.wikimedia.org/r/458156 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[11:30:36] (CR) Hashar: [C: 2] Add debian-glue-non-voting to cergen [integration/config] - https://gerrit.wikimedia.org/r/458158 (owner: Hashar)
[11:30:58] (CR) Hashar: [C: 2] Add debian-glue-non-voting to mediawiki/services/poolcounter [integration/config] - https://gerrit.wikimedia.org/r/458159 (owner: Hashar)
[11:31:46] (Merged) jenkins-bot: Add debian-glue-non-voting on operations/ repos [integration/config] - https://gerrit.wikimedia.org/r/458156 (https://phabricator.wikimedia.org/T180330) (owner: Hashar)
[11:32:04] (Merged) jenkins-bot: Add debian-glue-non-voting to cergen [integration/config] - https://gerrit.wikimedia.org/r/458158 (owner: Hashar)
[11:32:26] (Merged) jenkins-bot: Add debian-glue-non-voting to mediawiki/services/poolcounter [integration/config] - https://gerrit.wikimedia.org/r/458159 (owner: Hashar)
[11:35:05] (PS1) Hashar: Add mediawiki/skins/BlueSpiceCalumma [integration/config] - https://gerrit.wikimedia.org/r/458160
[11:35:21] (CR) Hashar: [C: 2] Add mediawiki/skins/BlueSpiceCalumma [integration/config] -
10https://gerrit.wikimedia.org/r/458160 (owner: 10Hashar) [11:37:48] (03Merged) 10jenkins-bot: Add mediawiki/skins/BlueSpiceCalumma [integration/config] - 10https://gerrit.wikimedia.org/r/458160 (owner: 10Hashar) [12:16:34] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [12:21:13] hasharAway: sorry to bother again but this is blocking me from pushing anything to beta or prod for ores: https://phabricator.wikimedia.org/T203246 [12:41:31] RECOVERY - Puppet errors on deployment-elastic05 is OK: OK: Less than 1.00% above the threshold [0.0] [13:11:01] 10Phabricator: Document phabricator user groups in a central place - https://phabricator.wikimedia.org/T203511 (10MGChecker) >>! In T203511#4558618, @Aklapper wrote: > We should turn all of them into `acl*` (see e.g. T126953) and then https://phabricator.wikimedia.org/project/query/O1eugK19EhOf/#R would basicall... [13:28:59] RECOVERY - Puppet errors on deployment-ores01 is OK: OK: Less than 1.00% above the threshold [0.0] [13:50:50] 10Phabricator, 10Release-Engineering-Team (Watching / External), 10Mail, 10Operations, and 3 others: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916 (10herron) 05Open>03Resolved a:03herron This is looking good. Here are the received headers from... [14:15:02] 10Phabricator, 10Release-Engineering-Team (Watching / External), 10Mail, 10Operations, and 3 others: Phabricator outbound email seems to have a SPOF of mx1001 - https://phabricator.wikimedia.org/T196916 (10herron) [14:22:31] howdy, could someone help me add two new hosts to the operations-puppet-catalog-compiler project in Jenkins? context is https://phabricator.wikimedia.org/T191438 [14:25:18] https://integration.wikimedia.org/ci/label/puppet-compiler-node/ [14:27:10] herron: I think, it's just a machine label... 
[14:27:24] let's see [14:27:56] if possible I’d like to have the new hosts be in addition to the old, for paranoias sake [14:30:10] https://integration.wikimedia.org/ci/computer/compiler1001.puppet-diffs.eqiad.wmflabs/log [14:30:35] herron: ^ It looks like SSH is struggling to connect [14:30:47] ok, timing out? maybe need to update that security group [14:30:54] let’s see [14:31:02] Hasn't yet, but I'm guessing it's going to [14:32:20] [09/05/18 14:30:31] [SSH] Opening SSH connection to 10.68.19.281:22. [14:32:20] 10.68.19.281: Name or service not known [14:32:28] lol [14:32:31] i typoed for that one [14:32:45] Good to know jerkins doesn't validate IP addresses [14:35:08] herron: Yeah, timeout [14:36:11] hmm do you know what the source address for jenkins will be? [14:37:22] Look at the other two older hosts? [14:38:00] herron: contint1001 and contint2001 [14:38:02] I guess [14:54:29] herron: Any luck? [15:01:25] sorry in a meeting [15:08:10] 10Release-Engineering-Team (Watching / External), 10cloud-services-team, 10wikitech.wikimedia.org, 10Wikimedia-production-error: labtestweb2001 is sending updates to a read-only db host: db2037 - https://phabricator.wikimedia.org/T201082 (10greg) [15:13:55] Reedy: ok, updated the security group and can telnet on port 22 from contint [15:14:43] [09/05/18 15:14:20] [SSH] The SSH key with fingerprint 5f:f7:03:6c:14:72:f8:b9:ee:69:e2:11:27:f0:a9:2e has been automatically trusted for connections to this machine. [15:14:43] ERROR: Server rejected the 1 private key(s) for jenkins-deploy (credentialId:ae711ff4-813e-4462-9a27-21bdbd4fdcb9/method:publickey) [15:14:43] ERROR: Failed to authenticate as jenkins-deploy with credential=ae711ff4-813e-4462-9a27-21bdbd4fdcb9 [15:14:43] java.io.IOException: Publickey authentication failed. [15:15:08] hooray for new errors [15:15:13] it's progress at least ;P [15:17:00] indeed! [15:17:44] has puppet provisioned the jenkins-deploy user on there? 
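The SSH log above shows the typoed address `10.68.19.281` being accepted as a node host and only failing at connect time with "Name or service not known". A minimal sketch of catching that kind of typo up front, using Python's standard `ipaddress` module. The helper name `is_valid_ipv4` and the second address are hypothetical; none of this is part of Jenkins or of the tooling discussed in the channel.

```python
import ipaddress


def is_valid_ipv4(text):
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        # Raised for malformed input, including octets above 255.
        return False
    return True


# The address from the log has a last octet of 281, which cannot be valid.
print(is_valid_ipv4("10.68.19.281"))
# A hypothetical address with every octet in range, for comparison.
print(is_valid_ipv4("10.68.19.28"))
```

A check like this only rejects malformed addresses. It says nothing about whether the host is reachable, which is the separate security-group problem discussed next.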
[15:22:43] yes, jenkins-deploy user is present [15:22:52] looks like the ssh key for this user is coming from ldap? [15:23:19] Has it got role::ci::slave::common? [15:23:33] let’s see [15:24:33] "jenkins-deploy (key to connect to labs instances set up with role::ci::slave::labs::common)" [15:24:41] is what the old one was set to.... [15:25:13] yes, there is require role::ci::slave::common in profile::puppet_compiler [15:32:02] 10Gerrit: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10thcipriani) Any pointers to details or pointers to tasks with details about how this was setup would be helpful for troubleshooting. I'm familiar with gerrit -> github replication, but I don't know anything ab... [15:34:22] 10Gerrit: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10Ladsgroup) {T192042} ? [15:36:54] 10Gerrit: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10thcipriani) >>! In T203246#4559851, @Ladsgroup wrote: > {T192042} ? From that task: >>! In T192042#4517100, @mmodell wrote: > It seems git-lfs does not have any support for mirroring. I think that we will hav... [15:53:45] 10Gerrit: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10Ladsgroup) so the problem is that scoring/ores/ores doesn't need git lfs but it has it because it's inside scoring/ores/ and I don't have push right to update it. Maybe Scoring platform team should have push r... [16:05:23] 10Gerrit: ORES mirrors in gerrit are not getting updated - https://phabricator.wikimedia.org/T203246 (10mmodell) @thcipriani no solution has been found yet for mirroring lfs. [16:14:49] So I'm trying to run https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/dockerfiles/mediawiki-phan-seccheck locally. and I'm getting permission errors. 
I tried to use the -u option to docker to fix that, but didn't seem to work [16:15:32] should I just make my mw checkout be chmod 0666, or is there a better way here [16:22:48] bawolff: this looks interesting: https://blog.ippon.tech/docker-and-permission-management/ [16:23:08] "A user namespace extends user's rights by allowing it to access files owned by other users or groups. Its configuration is made by editing two files:" [16:23:23] /etc/subuid [16:23:25] /etc/subgid [16:25:35] Hmm. Even after I changed my mediawiki checkout to 0777, it still didn't work because it created the vendor directory as 0755, but it didn't own the directory it just created so it couldn't write in it [16:25:39] * bawolff very new to docker [16:26:39] I guess I could just run composer myself before hand, but that seems to be missing the point of all this [16:30:19] bawolff: yeah, though I'm not sure if running tests locally in docker is really the goal? Isn't it mostly driven by resource management and isolation in CI/Prod rather than dev usage? [16:30:36] twentyafterfour: true [16:30:57] This particular test is a bit of a pain to set up locally, so we're thinking of promoting docker as a way to run it [16:30:57] So maybe that wouldn't be missing the point quite so much [16:31:14] ok, so what user is docker running as, root? [16:31:30] not sure why it can't read the files it created? [16:31:37] Also that makes it so that you can get exactly the same reproducible results as jenkins [16:31:39] er write [16:32:15] twentyafterfour: I think the docker image in the virtual machine is running as root, but its mapped to me running as my normal user [16:32:53] so the crappy solution would be to change umask to avoid the 0755 [16:33:25] so the files get checked out as your user with 0755, from inside the container? [16:34:26] I think so. 
Docker creates directory as 755 while running as root, but the directory is created as my uid, but when it reads back from the file system it reads my uid, so it can't read them as root doesn't own the file [16:34:30] I think [16:34:38] I literally just downloaded docker today, so this is new to me [16:35:00] or maybe not. even if I change the owner of all files as root, this still happens [16:40:09] bawolff: if it's helpful, here's how we run the image in CI: https://github.com/wikimedia/integration-config/blob/master/jjb/macro-docker.yaml#L79-L86 [16:41:14] I think maybe part of this is because i'm running on mac not linux [16:41:15] maybe [16:41:23] or maybe its because i don't know what I'm doing [16:41:30] that's probably the more likely answer [16:42:00] the directories being mounted are made via mkdir -m 2777 -p {log,cache,src} [16:42:15] this blog post might be somewhat helpful: https://phabricator.wikimedia.org/phame/post/view/100/run_selenium_tests_using_quibble_and_docker/ [16:42:45] hmm, with setgid? [16:43:37] yeah, I think we have to do that so that all the logs are still accessible by jenkins-deploy at the end so we can archive them in jenkins, IIRC [16:44:30] https://en.wikipedia.org/wiki/Setuid#When_set_on_a_directory I didn't actually know what setGID did on a directory [16:44:32] TIL :) [16:45:30] heh, I've learned a great deal of shell magic from the integration/config repo :) [16:46:03] ask the CMO about it [16:47:18] chief magic officer? [16:47:21] ;D [16:48:10] Project beta-scap-eqiad build #221178: 04FAILURE in 1 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221178/ [16:48:32] gotta reboot in order to print a doc bah ... [16:49:36] * thcipriani looks at beta-scap-eqiad failure [16:58:43] * addshore likes magic [16:59:21] * addshore goes back to eating quid [17:09:26] why are people opping [17:09:35] testing :0 [17:09:36] :) [17:10:58] Yippee, build fixed! 
[17:10:59] Project beta-scap-eqiad build #221179: FIXED in 17 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/221179/
[17:22:02] Reedy: fwiw this is what compiler1001 sees
[17:22:20] https://www.irccloud.com/pastebin/lIBEhUPm/
[17:23:43] compared fingerprints with the keys returned by `/usr/sbin/ssh-key-ldap-lookup jenkins-deploy` and it matches the first key
[17:28:38] herron, is the current puppet compiler host in that puppet-diffs project?
[17:29:33] herron, or is it in puppet3-diffs?
[17:29:51] root@deployment-deploy01:/etc/security# ldapsearch -LLLx member=uid=jenkins-deploy,ou=people,dc=wikimedia,dc=org dn | grep dn
[17:29:51] dn: cn=project-bastion,ou=groups,dc=wikimedia,dc=org
[17:29:51] dn: cn=project-deployment-prep,ou=groups,dc=wikimedia,dc=org
[17:29:51] dn: cn=project-integration,ou=groups,dc=wikimedia,dc=org
[17:29:51] dn: cn=project-puppet3-diffs,ou=groups,dc=wikimedia,dc=org
[17:29:52] root@deployment-deploy01:/etc/security#
[17:30:10] no puppet-diffs on the list so I wouldn't expect it to be able to get into puppet-diffs
[17:31:01] sounds like you just need to go into horizon as a projectadmin for that project and add jenkins-deploy
[17:33:16] Krenair: yeah, in puppet-diffs project
[17:33:19] looking
[17:34:06] It looks like the current ones are compiler02.puppet3-diffs and compiler03.puppet3-diffs
[17:36:27] also wow, those old jenkins keys
[17:36:31] Release-Engineering-Team (Watching / External), cloud-services-team, wikitech.wikimedia.org, Wikimedia-production-error: labtestweb2001 is sending updates to a read-only db host: db2037 - https://phabricator.wikimedia.org/T201082 (Andrew) The context here is that a while ago I moved the local lab...
[17:36:41] gallium was the predecessor to contint1001 iirc
[17:37:24] as for the second key, from="10.4.0.58" was an old pmtpa host, and the hostname i-00000390 indicates it was an early one
[17:37:25] awesome that did the trick. thanks Krenair!
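The `ldapsearch` output pasted at 17:29:51 lists the project groups that `jenkins-deploy` belongs to, and `project-puppet-diffs` is not among them. A minimal sketch of that same check in Python, working only on the pasted text instead of querying LDAP. The function names `group_names` and `is_member` are hypothetical.

```python
def group_names(ldif_text):
    """Extract the cn from every 'dn: cn=...' line of ldapsearch output."""
    names = []
    for line in ldif_text.splitlines():
        line = line.strip()
        if not line.startswith("dn: cn="):
            continue
        # 'dn: cn=project-bastion,ou=groups,...' -> 'project-bastion'
        names.append(line[len("dn: cn="):].split(",", 1)[0])
    return names


def is_member(ldif_text, project):
    """True if the project-<name> group appears in the output."""
    return "project-" + project in group_names(ldif_text)


# The four dn lines as they appear in the log.
PASTED = """\
dn: cn=project-bastion,ou=groups,dc=wikimedia,dc=org
dn: cn=project-deployment-prep,ou=groups,dc=wikimedia,dc=org
dn: cn=project-integration,ou=groups,dc=wikimedia,dc=org
dn: cn=project-puppet3-diffs,ou=groups,dc=wikimedia,dc=org
"""

print(is_member(PASTED, "puppet-diffs"))   # False
print(is_member(PASTED, "puppet3-diffs"))  # True
```

Matching on the exact group name matters here, since `puppet-diffs` and `puppet3-diffs` differ by one character and a substring test would be easy to get wrong.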
[17:37:38] pmtpa was the main datacentre for years until eqiad took over [17:39:50] gotcha, yeah I was wondering about that from= [17:39:54] makes sense [17:40:39] I also am curious how this ldap group lookup is configured on the host [17:41:54] thcipriani: I uploaded https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EducationProgram/+/458220 [17:41:59] looks like pam access.conf [17:42:56] thcipriani: also it would be appreciated if releng could pick up https://phabricator.wikimedia.org/T125618#4449413 so we wouldn't have to worry about that extension anymore [17:46:00] legoktm: thank you for the patch! I'll add the EducationProgram task to our team meeting agenda for next Monday and we'll talk about it. [17:46:53] can we try adding these compiler100[12] hosts to the operations-puppet-catalog-compiler project again now that ssh as jenkins-deploy is working? [17:54:25] paladox: do you know if changing a gerrit user's name (using eg set-account: https://gerrit-review.googlesource.com/Documentation/cmd-set-account.html) will change the name associated with past comments on changes as well? I assume yes, but just checking. [17:54:48] paladox: also, do you know of a way of getting all of a user's comments? 
[17:55:07] greg-g yep *i think* it does [17:55:11] as it uses a rest api [17:56:04] greg-g https://stackoverflow.com/questions/41698962/how-do-i-list-all-comments-posted-on-my-changes-in-gerrit [17:56:05] :) [17:57:29] if the change is open [17:57:32] you can remove the /a/ [17:57:40] unless your POST, DELETE [17:57:53] ie [17:59:17] thanks [17:59:18] curl -s --request GET https://gerrit.wikimedia.org/r/changes/?q=owner:self+AND+status:open | sed 1d | jq --raw-output ".[] | ._number" [18:00:08] or there's [18:00:27] curl -s --request GET https://gerrit.wikimedia.org/changes/CHANGE-NUMBER/comments | sed 1d | jq --raw-output ".[] | .[] | {Updated: .updated, Message: .message}" [18:00:32] which is what you want [18:01:31] i missed /r/ before changes [18:03:26] paladox: thanks, this is mostly theoretical for now :) [18:03:37] ah ok and your welcome :) [18:05:48] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog, 10Jenkins: Popups and RelatedArticles daily jobs absent - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) [18:06:37] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog, 10Jenkins: Popups and RelatedArticles daily jobs absent - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) p:05Triage>03High Having no integration tests running for Popups is not acceptable IMO. We rely o... [18:06:51] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog, 10Jenkins: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) [18:15:03] 18:13:34 npm ERR! shasum check failed for /tmp/npm-558-f9aeb955/registry.npmjs.org/fibers/-/fibers-2.0.2.tgz [18:15:03] 18:13:34 npm ERR! Expected: 36db63ea61c543174e2264675fea8c2783371366 [18:15:03] 18:13:34 npm ERR! Actual: ba31e7c5167d3c6f969fc7a9e259a497c086a864 [18:15:03] 18:13:34 npm ERR! 
From: https://registry.npmjs.org/fibers/-/fibers-2.0.2.tgz [18:15:09] maintenance-disconnect-full-disks build 611 - integration-slave-docker-1026: OFFLINE due to disk space [18:16:07] we should get that ^^ to automatically clean up the space [18:16:14] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%) [18:20:08] maintenance-disconnect-full-disks build 612 - integration-slave-docker-1026: OFFLINE due to disk space [18:22:05] (03PS41) 10Hashar: Selenium daily tests for beta using Docker/wdio [integration/config] - 10https://gerrit.wikimedia.org/r/443931 (https://phabricator.wikimedia.org/T188742) [18:23:07] uhhh [18:23:12] I don't know why 1026 is offline [18:23:15] (03CR) 10Hashar: "I have removed the junit processig part, that causes jobs to fail entirely when there is no junit xml file. T203591" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/443931 (https://phabricator.wikimedia.org/T188742) (owner: 10Hashar) [18:23:18] /dev/vda3 19G 10G 7.7G 57% / [18:23:22] /dev/mapper/vd-second--local--disk 60G 18G 40G 31% /srv [18:23:26] that's enough room [18:23:31] thcipriani: ^ [18:23:49] shinken also warned [18:23:58] so maybe it all freed up since then?? [18:24:16] maybe we could have the job include the available disk space in the offline message? [18:24:35] if a new job started and cleared the old workspace, then that's possible [18:24:53] possible that the space freed up since then I mean. 
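The suggestion above is to have the offline message report how much space was actually available, so a node that has since recovered is easier to spot. A minimal sketch of that idea using `shutil.disk_usage` from the Python standard library. The function name, the message wording, the node name and the 5% threshold are all assumptions for illustration; the log does not show how the real `maintenance-disconnect-full-disks` job is implemented.

```python
import shutil


def offline_message(node, mount, total, free, threshold=5.0):
    """Return an OFFLINE message with disk details, or None if space is fine.

    total and free are byte counts; threshold is a percentage (assumed value).
    """
    pct = 100.0 * free / total
    if pct >= threshold:
        return None
    return "{}: OFFLINE due to disk space ({} has {:.1f}% free, {:.1f}G of {:.1f}G)".format(
        node, mount, pct, free / 2**30, total / 2**30)


# Check the root filesystem of whatever machine runs this sketch.
usage = shutil.disk_usage("/")
print(offline_message("local-test-node", "/", usage.total, usage.free))
```

Returning `None` when space is above the threshold keeps the decision and the message in one place, so the caller cannot take a node offline without also having the numbers to report.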
[18:25:07] maintenance-disconnect-full-disks build 613 - integration-slave-docker-1026: OFFLINE due to disk space [18:25:09] tweaking the job to include disk space should be pretty easy [18:25:44] * thcipriani does [18:29:25] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10hashar) The jobs are being migrated off of Nodepool toward Docker container. That is T188742 Popups fails, it was p... [18:30:09] maintenance-disconnect-full-disks build 614 - integration-slave-docker-1026: OFFLINE due to disk space [18:30:56] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10hashar) https://integration.wikimedia.org/ci/view/Reading-Web/job/selenium-daily-beta-RelatedArticles/12/ passed (af... [18:31:13] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK [18:31:43] !log bring integration-slave-docker-1026 back online since disk space is normal again [18:31:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:38:19] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) Thanks for fixing RelatedArticles and the reassurance! I'm still very nervous about the Popups one as we... [18:41:20] thcipriani: around? https://phabricator.wikimedia.org/T203246 [18:41:52] Amir1: hi [18:42:37] sorry to bother, What do you think we should do? Funny thing is ores repo itself doesn't need LFS [18:42:53] The only thing I can think of is pushing [18:44:09] so you need different perms for scoring/ores/ores than the perms from scoring? 
[18:44:22] Also I need push rights (temporarily for this repo as it's already hosted in gerrit and not mirrored) for this repo to rewrite all of its history with LFS: https://gerrit.wikimedia.org/r/admin/projects/research/ores/wheels [18:45:17] thcipriani: it depends, if you just disable LFS for that repo, it might just work but I'm not sure the LFS is the (only) reason that mirroring got broken [18:45:48] (03PS1) 10Thcipriani: Edit Project Config [scoring/ores/ores] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/458271 [18:46:05] but in a more long term we still need a way to make mirrors work for other repos, not today but soon (TM) [18:46:39] (03Abandoned) 10Thcipriani: Edit Project Config [scoring/ores/ores] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/458271 (owner: 10Thcipriani) [18:47:19] if there is no way to have lfs mirroring work, the only option I can think of is to give push rights indefinitely to scoring platform team members so they force update it from github [18:49:24] I'm talking about three different problems: 1- mirroring for scoring/ores/ores (it doesn't need LFS but it has it, so mirroring is broken) 2- mirroring for editquality and some other LFS repos (we need to fix it probably by giving push rights) 3- making https://gerrit.wikimedia.org/r/admin/projects/research/ores/wheels have all of its history in LFS (that requires push rights for half an hour) [18:51:30] thcipriani: sorry for lots of confusion, I hope ^ helps [18:53:38] ok, for 1 if you could add a patch to All-Projects for disabling LFS I could +2 that. for 2 who needs push right exactly? [18:54:31] okay, let me try for 1 [18:55:07] for 2, me, Adam Wight (awight), Aaron Halfaker (halfak) [18:56:07] could you point me to what made you think push rights would work? 
I'm still fuzzy on how we get stuff from github -> gerrit and the conversation on the task didn't help me understand :( [18:59:00] basically I have clone of the repo from github, I define a remote named gerrit and I push master of github to gerrit by "git push -f gerrit master" [18:59:15] i.e. doing the mirroring manually [18:59:26] I couldn't find any other solution [19:01:29] ah, ok, I was confused because I thought you were proposing this as a way to fix mirroring to gerrit [19:02:06] no, TBH, I don't think it's even a solution but that makes thing working until we find a way to fix mirroring [19:07:01] did mirroring used to work, that is, did it recently stop working? [19:12:24] that was sort of opaque in the task and I wasn't involved (obviously) with the initial setup [19:32:00] 10Continuous-Integration-Infrastructure, 10Wikimedia-production-error (Shared Build Failure): Jenkins jobs for MediaWiki failing with 'npm: shasum check failed' - https://phabricator.wikimedia.org/T203506 (10Krinkle) And again 10Continuous-Integration-Infrastructure, 10Wikimedia-production-error (Shared Build Failure): Jenkins jobs for MediaWiki failing with 'npm: shasum check failed' - https://phabricator.wikimedia.org/T203506 (10Legoktm) https://npm.community/t/shasum-check-or-integrity-eintegrity-errors/153 basically says to retr... 
[19:36:59] https://github.com/travis-ci/travis-build/blob/73f74a94957f73eb54dc821f80c0c85ad8f8aab7/lib/travis/build/script/templates/header.sh#L168-L187 looks like a neat thing to copy [19:37:12] (03PS1) 10Ladsgroup: Disable LFS for scoring/ores/ores [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/458278 (https://phabricator.wikimedia.org/T203246) [19:37:32] thcipriani: https://gerrit.wikimedia.org/r/#/c/All-Projects/+/458278 [19:38:03] It can be also that the mirroring broke because we moved the github repos (from wiki-ai to wikimedia org) [19:38:11] but it should just redirect, right [19:38:15] 10Continuous-Integration-Infrastructure, 10Wikimedia-production-error (Shared Build Failure): Jenkins jobs for MediaWiki failing with 'npm: shasum check failed' - https://phabricator.wikimedia.org/T203506 (10Krinkle) @Legoktm Yeah, I think this is probably due to a cache corruption. Clearing castor for the aff... [19:38:23] maybe the mirroring system can't handle a 303 :D [19:38:50] (03CR) 10Thcipriani: [V: 032 C: 032] Disable LFS for scoring/ores/ores [All-Projects] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/458278 (https://phabricator.wikimedia.org/T203246) (owner: 10Ladsgroup) [19:39:25] Amir1: ^ merged [19:39:36] Thank you! 
[19:39:48] and now we should wait to see if it gets updated [19:44:37] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10hashar) Zeljko and I have a pairing session on Thursday, we will polish up the Popups job and complete the migration :] [19:56:04] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team, 10Readers-Web-Backlog: Popups and RelatedArticles daily jobs absent/unusable - https://phabricator.wikimedia.org/T203591 (10Jdlrobson) 🙏 [20:28:46] 10Release-Engineering-Team (Kanban), 10User-greg: Wikimedia Portals Update and European Mid-day SWAT windows at the same time on Mondays - https://phabricator.wikimedia.org/T201932 (10greg) 05Open>03Resolved Done. [20:36:12] (03PS1) 10Thcipriani: Refactor disconnect-full-disks to use Disk object [integration/config] - 10https://gerrit.wikimedia.org/r/458290 [20:42:37] upstream 3.0 blockers https://groups.google.com/forum/#!topic/repo-discuss/lp2UarcPWxY [20:43:09] dropping requirement of having a db is a 3.0 blocker [20:53:21] 10Release-Engineering-Team, 10Mail, 10Operations: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) [20:53:31] 10Release-Engineering-Team, 10Mail, 10Operations, 10User-herron: Ensure Jenkins mail configuration supports outbound smtp server failover - https://phabricator.wikimedia.org/T203607 (10herron) [21:16:56] (03PS2) 10Thcipriani: Refactor disconnect-full-disks to use Disk object [integration/config] - 10https://gerrit.wikimedia.org/r/458290 [21:42:37] (03PS1) 10Volans: spicerack: use the backport version of debian-glue [integration/config] - 10https://gerrit.wikimedia.org/r/458303 (https://phabricator.wikimedia.org/T199079) [21:45:13] (03CR) 10Thcipriani: [C: 032] "This is deployed" [integration/config] - 10https://gerrit.wikimedia.org/r/458290 (owner: 
10Thcipriani) [21:47:31] (03Merged) 10jenkins-bot: Refactor disconnect-full-disks to use Disk object [integration/config] - 10https://gerrit.wikimedia.org/r/458290 (owner: 10Thcipriani) [21:56:35] (03PS1) 10D3r1ck01: Add SendGrid extension to release tool [tools/release] - 10https://gerrit.wikimedia.org/r/458309 [22:40:15] /: 95%: OFFLINE due to disk space [22:40:15] /srv: 95%: OFFLINE due to disk space [22:40:52] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [22:43:29] PROBLEM - Free space - all mounts on integration-slave-docker-1014 is CRITICAL: CRITICAL: integration.integration-slave-docker-1014.diskspace.root.byte_percentfree (<11.11%) [22:47:10] thcipriani: ^^ did you forget to have it output the hostname? :) [22:47:38] ugh, evidently :) [22:48:59] It's part of the fun [22:49:11] 1014 it looks like [22:53:28] RECOVERY - Free space - all mounts on integration-slave-docker-1014 is OK: OK: All targets OK [23:03:23] hrm, I assumed that using a class with a toString method in a string context would make it a string. It did, but somehow wiped out the rest of the formatting :\ [23:07:18] (03PS1) 10Thcipriani: Maintenance: call .toString() explicitly [integration/config] - 10https://gerrit.wikimedia.org/r/458327 [23:15:52] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:22:12] Eurgh, SlowTimer is responsible for *so much noise* on fatalmonitor. :-( [23:31:29] I've got phan failing to detect an existing class: https://integration.wikimedia.org/ci/job/mwext-php70-phan-docker/12668/consoleFull - anybody knows why it might happen? [23:36:07] We had something like that with lightncandy recently [23:40:27] twentyafterfour or thcipriani wondering could you review https://gerrit.wikimedia.org/r/c/operations/puppet/+/439808 please? 
:) [23:45:11] maintenance-disconnect-full-disks build 711 integration-slave-jessie-1001 (/srv: 95%): OFFLINE due to disk space [23:47:18] Reedy: anything that can be done to fix it? [23:50:11] maintenance-disconnect-full-disks build 712 integration-slave-jessie-1001: OFFLINE due to disk space [23:55:13] maintenance-disconnect-full-disks build 713 integration-slave-jessie-1001: OFFLINE due to disk space