[00:00:02] k, just looking for beta-picked stuff that would be no-op on prod
[00:00:04] I think that commit was a result of my experiment, which is waiting for the neutron migration to happen
[00:00:08] This channel has an alert when rebase fails. Maybe that should go to -ops instead
[00:01:07] Krinkle, it did but shinken-wm has been suspiciously quiet recently. Also ops won't go near it
[00:01:52] thcipriani, to be honest we could probably just drop that cherry-pick for the time being
[00:02:20] I can merge 316512
[00:02:25] I mean, having puppet failures in beta addressed quickly is still miles away from "beta is up" and "stuff is added to beta before/when added to prod", entire modules are and can just never be set up. But it would be a start.
[00:05:56] done
[00:08:27] thanks
[00:10:16] re 446242 i think it's actually changing/breaking diamond config. at least in prod. but also we are removing diamond i think
[00:10:55] is it possible to have a systemd service which is installed by puppet but its enabled/disabled/running state is not managed by puppet
[00:12:10] systemd has lots of ways to make unit execution conditional but it seemed a bit hackish
[00:12:49] I think our systemd::service manifest doesn't allow for that
[00:13:04] yea, we have that for jenkins i think
[00:13:05] :service_ensure => 'unmanaged',
[00:13:55] hm so
[00:13:56] base::service_unit is being replaced by systemd::service but some are done and some are not
[00:14:21] it has ensure => present, but the service_params dict it passes in can have ensure => undef, making it unmanaged?
[00:15:08] right because it does $params = merge($base_params, $service_params)
[00:15:18] the $real_ensure in this example https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/434427/6/modules/zuul/manifests/server.pp can be 'unmanaged'
[00:16:04] got it
[00:16:36] base::service_unit has service_params but systemd::service does not
[00:16:59] hm?
[00:17:05] modules/systemd/manifests/service.pp
[00:17:09] $service_params = {},
[00:17:21] they should both have it
[00:17:31] yeah ok
[00:17:34] $params = merge($base_params, $service_params)
[00:17:34] ensure_resource('service', $label, $params)
[00:17:39] sorry
[00:18:32] this one is merged as opposed to the former example: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/434538/6/modules/jenkins/manifests/init.pp
[00:18:43] that was the jenkins one
[00:24:24] re: 446242 i'm a bit surprised how it is cherry-picked but not breaking stuff. i mean per comments it was already picked when i still had "duplicate declaration" in prod
[00:24:39] and then amended and got a different issue
[00:26:00] i'll try to move that forward though some way.. we want to remove diamond after all
[00:26:31] pt-heartbeat is supposed to run on master DB servers but not on slaves
[00:27:07] one of the reasons why it takes half an hour to do a master switch right now is because you have to change the master status in puppet and wait for puppet to shut down the pt-heartbeat service and start it up in some other place
[00:29:07] TimStarling, so you're planning to manage it from outside puppet? makes sense
[00:30:10] yeah but this is mostly a demo in beta only, because I'm not sure if jaime will buy it
[00:39:48] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Krenair) During cherry-pick review today I realised that my attempt (`b59add730544b922e1fb6...
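The merge() trick discussed above can be modelled outside Puppet. The following is a minimal Python sketch. It assumes stdlib merge() semantics (on a key collision the right-hand hash wins) and uses None to stand in for Puppet's undef, so a service property left undef is one Puppet does not enforce. The names base_params, service_params_for() and managed_properties() are hypothetical and do not come from the puppet repository, and the default values are illustrative only.

```python
from typing import Any, Dict, Optional, Set


def merge_params(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic stdlib merge(): on a key collision the right-hand hash wins."""
    merged = dict(base)
    merged.update(override)
    return merged


def service_params_for(real_ensure: str) -> Dict[str, Optional[str]]:
    """Hypothetical translation of a $real_ensure-style switch.

    'unmanaged' maps to None (standing in for Puppet's undef) for both
    ensure and enable, so neither the running state nor boot-time
    enablement is asserted by the resulting service resource.
    """
    if real_ensure == "unmanaged":
        return {"ensure": None, "enable": None}
    return {"ensure": real_ensure}


def managed_properties(params: Dict[str, Any]) -> Set[str]:
    """Properties Puppet would enforce: everything not left undef."""
    return {key for key, value in params.items() if value is not None}


# Hypothetical defaults; the real systemd::service base params may differ.
base_params = {"ensure": "running", "enable": True, "provider": "systemd"}

managed = merge_params(base_params, service_params_for("running"))
unmanaged = merge_params(base_params, service_params_for("unmanaged"))

print(sorted(managed_properties(managed)))    # ['enable', 'ensure', 'provider']
print(sorted(managed_properties(unmanaged)))  # ['provider']
```

The design point is that the override hash only needs to name the keys it wants to blank out. Everything else in the defaults survives the merge, which is why a caller-supplied service_params is enough to hand the running state (for example pt-heartbeat's) to something outside Puppet.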
[00:43:08] TimStarling, oh yeah, other fun problem around external puppet contributions
[00:43:14] I can't run puppet-compiler
[00:43:16] i merged https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/447483/ i could confirm that was beta-only and you said it was picked for 1 or 2 years
[00:43:36] turns out you have to be in a privileged LDAP group for jenkins to allow you to build jobs, including running puppet-compiler
[00:43:54] mutante, yeah that's been there for ages
[00:44:12] thanks
[00:44:51] for my other cherry-picks...
[00:45:08] Krenair: i was about to say "that is https://phabricator.wikimedia.org/T192532" and guess what i found then:
[00:45:14] https://phabricator.wikimedia.org/T97580
[00:46:01] linking those
[00:46:25] yeah puppet-compiler should not need an NDA
[00:46:56] i dont know about NDA but "it's possible if you are in a certain group" is already new to me and much better than "figure out how it could even work"
[00:47:17] I have one but I doubt they'll put me in that group, plus IIRC legal doesn't like that my one was a few years old and not written with a specific purpose
[00:48:02] Continuous-Integration-Config, Release-Engineering-Team (Someday), Operations, puppet-compiler, Puppet: Figure out a way to enable volunteers to use the puppet compiler - https://phabricator.wikimedia.org/T192532 (Dzahn) i just noticed T97580 Is that the same thing and it got solved??
[00:48:04] Continuous-Integration-Config: Allow Tool Labs volunteer roots to submit jobs on puppet-catalog-compiler - https://phabricator.wikimedia.org/T97580 (Dzahn) also see T192532
[00:48:21] This one is definitely not ready yet, and is relevant to an ops quarterly goal, so I'm not worried about it: 9bfdfcb166 [WIP] Central certificates service
[00:48:32] https://phabricator.wikimedia.org/T192561#4528909: b59add7305 Try to fix npm package on deployment-deploy01
[00:48:57] Nag specific person to merge: c1db9e4c85 cumin: Allow Puppet DB backend to be used within Labs projects that use it
[00:49:06] that group seems strange to use for this purpose
[00:49:27] while it probably makes sense that _some_ group is used
[00:49:37] Nag specific person to merge: 96faa97547 exim: Permit DKIM domain to be changed by hiera
[00:49:45] Nag specific person to merge: 3062da3a7b Re-combine labs and production exim minimal config
[00:50:02] well. i wasnt even aware that membership in WMF-NDA would do _anything_ besides letting you read private tickets
[00:50:07] Ignore as we'll be ditching this hopefully this quarter, with the above mentioned ops goal: 313646ba2a letsencrypt: Push acme-setup timeout to 350s
[00:50:11] mutante, it doesn't
[00:50:18] AFAIK
[00:50:26] and maybe letting you create such private tickets
[00:50:48] Having the production branch be what's running in production seems reasonable to me.
[00:50:52] ah, i see "LDAP groups: wmf or nda." ok
[00:51:17] Unfortunately blocked on unknown production puppet behaviour, see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/437640/ comments: ff5e4a8b80 Attempt to secure Puppet DB better
[00:53:51] Find out where Filippo is with his parent commit: 79799d257d swift: Fix checks on drive/filesystem titles to allow for labs ones
[00:53:58] Review whether needed: bd56bdf0c7 Puppetise simple no-CA class for deployment-dumps-puppetmaster02
[09:10:33] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T191064 (Addshore)
[09:14:53] Continuous-Integration-Config, Continuous-Integration-Infrastructure (shipyard), BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811 (Osnard) @Paladox In some cases we get errors from the `ParserIntegrationTest`. Something like /work...
[09:15:08] (PS1) Umherirrender: Run seccheck for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/455107
[09:20:53] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team, Quibble: Xvfb causes _XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created - https://phabricator.wikimedia.org/T202710 (hashar) Xvfb / Xserver creates a `/tmp/.X11-unix/94` unix socket and ensure the...
[09:23:10] (PS1) Hashar: Xvfb does not need to listen on a unix socket [integration/quibble] - https://gerrit.wikimedia.org/r/455110 (https://phabricator.wikimedia.org/T202710)
[09:23:45] (CR) Hashar: "Untested :]" [integration/quibble] - https://gerrit.wikimedia.org/r/455110 (https://phabricator.wikimedia.org/T202710) (owner: Hashar)
[09:45:57] Project-Admins: New Extension: ChangeUserPasswords - https://phabricator.wikimedia.org/T202275 (Aklapper) Please ping here once the Git/Gerrit code repository has been created - afterwards I'm happy to create a Phabricator project for task/issue tracking if that is wanted
[10:00:57] Continuous-Integration-Config, Wikidata: integration-slave-jessie-1002.integration.eqiad.wmflabs editting under 10.68.16.199 on test.wikidata - https://phabricator.wikimedia.org/T189047 (hashar) Open>declined
[10:10:24] (CR) Hashar: "Seems to work for me locally now :]" [integration/quibble] - https://gerrit.wikimedia.org/r/455110 (https://phabricator.wikimedia.org/T202710) (owner: Hashar)
[10:32:13] (PS2) Hashar: Archive MOOC extension [integration/config] - https://gerrit.wikimedia.org/r/454856 (https://phabricator.wikimedia.org/T199032)
[10:32:25] (CR) Hashar: [C: 2] Archive MOOC extension [integration/config] - https://gerrit.wikimedia.org/r/454856 (https://phabricator.wikimedia.org/T199032) (owner: Hashar)
[10:32:53] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[10:33:17] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[10:33:57] (Merged) jenkins-bot: Archive MOOC extension [integration/config] - https://gerrit.wikimedia.org/r/454856 (https://phabricator.wikimedia.org/T199032) (owner: Hashar)
[10:42:41] Phabricator, Project-Admins, MediaWiki-General-or-Unknown, Developer-Wishlist (2017): Allow to search tasks about MediaWiki core and core only (create MediaWiki umbrella project?) - https://phabricator.wikimedia.org/T76942 (Aklapper)
[10:43:34] Phabricator (Upstream), Upstream: Task rename notifications are hard to understand due to missing markup for old and new title - https://phabricator.wikimedia.org/T166358 (Aklapper)
[10:44:03] Phabricator (2018-02-15), Upstream: When another user removes you as a subscriber from a task, you don't receive an email notification - https://phabricator.wikimedia.org/T126711 (Aklapper) Open>Resolved
[10:47:13] Phabricator, Phabricator (Upstream), Upstream: Herald does not seem to act on its own changes - https://phabricator.wikimedia.org/T128143 (Aklapper) Open>declined The Herald transcripts for both actions (adding #Traffic and #Operations separately) are not available anymore so it's not possibl...
[10:58:46] Phabricator (Upstream), Upstream: Search bar of phabricator isn't easily accessible on mobile devices - https://phabricator.wikimedia.org/T172705 (Aklapper)
[10:58:48] Phabricator (Upstream), Mobile, Upstream: Toggle buttons in Phabricator don't work on mobile (affects: Login, Search) - https://phabricator.wikimedia.org/T201480 (Aklapper)
[11:00:24] Phabricator, MediaWiki-extensions-CentralAuth, Mobile: Using the MediaWiki login to Phabricator on mobile, got a "No active login attempt is in progress for your session." error on CentralLogin - https://phabricator.wikimedia.org/T95221 (Aklapper) Does this still happen / can someone still reproduce?...
[11:00:28] Beta-Cluster-Infrastructure, Operations, Traffic, HTTPS: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate - https://phabricator.wikimedia.org/T202564 (ema) p:Triage>Normal
[11:01:04] Phabricator, Browser-Support-Android-Google-Chrome, Mobile: Search box in Phabricator on Android Chrome closes right away (need to hold down finger to stay open) - https://phabricator.wikimedia.org/T160045 (Aklapper)
[11:01:08] Phabricator (Upstream), Mobile, Upstream: Toggle buttons in Phabricator don't work on mobile (affects: Login, Search) - https://phabricator.wikimedia.org/T201480 (Aklapper)
[11:07:20] Phabricator: Viewing raw files (on phab.wmfusercontent.org) fails with ERROR_MESSAGE_MAIN on iOS mobile - https://phabricator.wikimedia.org/T169454 (Aklapper) @Paladox: Can you still reproduce this? Asking as https://phabricator.wikimedia.org/T201460 got fixed (which might be unrelated but as I do not find a...
[11:17:27] Phabricator: CURLE_COULDNT_CONNECT error when trying to use Conduit from Analytics VLAN (stat1005) - https://phabricator.wikimedia.org/T201746 (Aklapper)
[11:29:30] (PS1) Hashar: Migrate ORES to Quibble [integration/config] - https://gerrit.wikimedia.org/r/455141 (https://phabricator.wikimedia.org/T198201)
[11:29:48] (CR) Hashar: [C: 2] Migrate ORES to Quibble [integration/config] - https://gerrit.wikimedia.org/r/455141 (https://phabricator.wikimedia.org/T198201) (owner: Hashar)
[11:30:21] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[11:31:16] (Merged) jenkins-bot: Migrate ORES to Quibble [integration/config] - https://gerrit.wikimedia.org/r/455141 (https://phabricator.wikimedia.org/T198201) (owner: Hashar)
[11:36:20] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[11:38:02] (PS1) Hashar: Archive BlueSpiceUserPreferences [integration/config] - https://gerrit.wikimedia.org/r/455144
[11:38:34] (CR) Hashar: [C: 2] Archive BlueSpiceUserPreferences [integration/config] - https://gerrit.wikimedia.org/r/455144 (owner: Hashar)
[11:40:11] (PS1) Hashar: Skip MediaWiki tests on BlueSpiceSMWConnector [integration/config] - https://gerrit.wikimedia.org/r/455145 (https://phabricator.wikimedia.org/T130811)
[11:40:23] hashar: I look forward to being able to increase window-floor for the gate-and-submit pipeline when everything runs as docker jobs, as cancelling jobs will be cheap
[11:40:26] Continuous-Integration-Config, Continuous-Integration-Infrastructure (shipyard), BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811 (hashar)
[11:40:31] (Merged) jenkins-bot: Archive BlueSpiceUserPreferences [integration/config] - https://gerrit.wikimedia.org/r/455144 (owner: Hashar)
[11:40:43] * addshore has nearly been waiting for 2 hours for a patch he +2ed this morning to be merged
[11:41:02] addshore: if you increase the window-floor that would make it worse probably
[11:41:21] addshore: one of the issue is that the quibble jobs are quite slow (~ 20 minutes)
[11:41:37] well, if jobs can instantly be cancelled and restarted though, then window-floor wouldnt really be needed
[11:41:49] (CR) Hashar: [C: 2] Skip MediaWiki tests on BlueSpiceSMWConnector [integration/config] - https://gerrit.wikimedia.org/r/455145 (https://phabricator.wikimedia.org/T130811) (owner: Hashar)
[11:42:01] the overhead of having to cancel a bunch of jobs is surely the reason it is there?
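The window-floor trade-off argued about above can be made concrete. In a Zuul dependent pipeline such as gate-and-submit, the window is the number of queued changes tested in parallel. It grows after changes merge and shrinks after failures, but never below window-floor. A failure ahead in the queue forces the jobs of the changes behind it to be cancelled and restarted, so a higher floor means more of that churn. The sketch below is a rough Python model, not Zuul's code. It assumes Zuul's documented defaults (window 20, window-floor 3, linear increase by 1, exponential decrease by a factor of 2), and GateWindow and simulate() are hypothetical names. Wikimedia's actual gate-and-submit settings are not shown in the log.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateWindow:
    """Rough model of a Zuul dependent-pipeline window (assumed defaults)."""
    window: int = 20
    window_floor: int = 3
    increase_factor: int = 1  # linear growth after a change merges
    decrease_factor: int = 2  # exponential shrink after a failure

    def on_success(self) -> None:
        self.window += self.increase_factor

    def on_failure(self) -> None:
        # Shrink, but never below the floor.
        self.window = max(self.window_floor, self.window // self.decrease_factor)


def simulate(outcomes: str, window_floor: int = 3) -> List[int]:
    """Replay a string of 's' (merged) / 'f' (failed) outcomes.

    Returns the window size recorded after each outcome.
    """
    gate = GateWindow(window_floor=window_floor)
    history = []
    for outcome in outcomes:
        if outcome == "s":
            gate.on_success()
        else:
            gate.on_failure()
        history.append(gate.window)
    return history


print(simulate("fffs"))                  # [10, 5, 3, 4]
print(simulate("fffs", window_floor=8))  # [10, 8, 8, 9]
```

With the higher floor, the queue keeps eight changes in flight even after repeated failures. That is only cheap if cancelling and restarting roughly 20-minute Quibble jobs is cheap, which is the crux of the exchange between addshore and hashar.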
[11:42:45] Continuous-Integration-Config, Front-end-Standards-Group: Consider moving from npm to yarn for WMF repos? - https://phabricator.wikimedia.org/T148230 (TheDJ) stalled>declined I think we can pretty much call this dead. It seems npm is catching up on most fronts, and this being a Facebook project s...
[11:43:18] (Merged) jenkins-bot: Skip MediaWiki tests on BlueSpiceSMWConnector [integration/config] - https://gerrit.wikimedia.org/r/455145 (https://phabricator.wikimedia.org/T130811) (owner: Hashar)
[11:43:59] hashar: this one has taken 36 mins so far :( https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/12470/
[11:44:20] oh my god
[11:44:34] * addshore goes to look at which bit took the longest
[11:44:55] addshore: we had slaves 1025 and 1026 added in with more executors
[11:44:57] and more CPU etc
[11:45:03] but I guess they hit some I/O contention
[11:45:12] oooh, perhaps
[11:45:27] that one is on 1025
[11:45:36] https://integration.wikimedia.org/ci/computer/integration-slave-docker-1025/
[11:45:45] it can run up to 5 quibble jobs concurrently
[11:46:14] how many cpus does it have?
[11:47:24] addshore: 8 cpu
[11:53:43] mhhmh
[11:55:35] !log manually install iotop on integration-slave-docker-1025 to inspect IO
[11:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:58:48] (PS1) Hashar: Remove non voting PHPUnit tests [integration/config] - https://gerrit.wikimedia.org/r/455147 (https://phabricator.wikimedia.org/T183512)
[12:01:01] !log manually install nmon on integration-slave-docker-1025 to inspect IO
[12:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:01:18] i wonder what the speed of the disks are
[12:01:57] (CR) jerkins-bot: [V: -1] Remove non voting PHPUnit tests [integration/config] - https://gerrit.wikimedia.org/r/455147 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[12:05:21] (PS1) Hashar: Remove generic unit tests on DonationInterface [integration/config] - https://gerrit.wikimedia.org/r/455148
[12:07:35] (PS1) Hashar: Clean up non-voting MediaWiki jobs [integration/config] - https://gerrit.wikimedia.org/r/455150 (https://phabricator.wikimedia.org/T183512)
[12:07:47] addshore: probably slow :/
[12:08:20] (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/455147 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[12:16:38] (CR) Hashar: [C: 2] Remove non voting PHPUnit tests [integration/config] - https://gerrit.wikimedia.org/r/455147 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[12:18:12] (Merged) jenkins-bot: Remove non voting PHPUnit tests [integration/config] - https://gerrit.wikimedia.org/r/455147 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[12:19:02] (CR) Hashar: [C: 2] Remove generic unit tests on DonationInterface [integration/config] - https://gerrit.wikimedia.org/r/455148 (owner: Hashar)
[12:19:20] (CR) Hashar: [C: 2] Clean up non-voting MediaWiki jobs [integration/config] - https://gerrit.wikimedia.org/r/455150 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[12:20:38] (Merged) jenkins-bot: Remove generic unit tests on DonationInterface [integration/config] - https://gerrit.wikimedia.org/r/455148 (owner: Hashar)
[12:23:06] (Merged) jenkins-bot: Clean up non-voting MediaWiki jobs [integration/config] - https://gerrit.wikimedia.org/r/455150 (https://phabricator.wikimedia.org/T183512) (owner: Hashar)
[12:54:48] Release-Engineering-Team (Kanban), BlueSpice, Patch-For-Review: [BlueSpiceAvatars] PHP Notice BS_DATA_DIR / BS_DATA_PATH - https://phabricator.wikimedia.org/T202412 (hashar) a:hashar
[13:10:29] (CR) Addshore: [C: 2] Enable Squiz.WhiteSpace.ObjectOperatorSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454718 (owner: Umherirrender)
[13:10:59] (CR) Addshore: [C: 2] Enable Squiz.Strings.ConcatenationSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 (owner: Umherirrender)
[13:11:38] (Merged) jenkins-bot: Enable Squiz.WhiteSpace.ObjectOperatorSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454718 (owner: Umherirrender)
[13:11:40] (CR) jerkins-bot: [V: -1] Enable Squiz.Strings.ConcatenationSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 (owner: Umherirrender)
[13:12:36] (CR) jenkins-bot: Enable Squiz.WhiteSpace.ObjectOperatorSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454718 (owner: Umherirrender)
[13:20:01] (PS3) Addshore: Update Quibble Docker README log directory [integration/quibble] - https://gerrit.wikimedia.org/r/451294 (owner: Tarrow)
[13:22:24] (PS1) Hashar: Migrate BlueSpiceAvatars to Quibble [integration/config] - https://gerrit.wikimedia.org/r/455162 (https://phabricator.wikimedia.org/T130811)
[13:23:05] Release-Engineering-Team (Kanban), BlueSpice, Patch-For-Review: [BlueSpiceAvatars] PHP Notice BS_DATA_DIR / BS_DATA_PATH - https://phabricator.wikimedia.org/T202412 (hashar) Open>Resolved
[13:24:42] Continuous-Integration-Config, Continuous-Integration-Infrastructure (shipyard), BlueSpice, Patch-For-Review: Enable unit tests on BlueSpice* repos - https://phabricator.wikimedia.org/T130811 (hashar)
[13:26:51] Hi, what happening with tests? Example: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/12527/console
[13:29:22] 13:21:33 fatal: cannot create directory at 'languages/messages': No space left on device
[13:29:26] integration-slave-docker-1026
[13:29:49] I saw to space no left on device
[13:30:14] Can you resolve that?
[13:32:39] kinda looks like someone already has
[13:34:43] Can jobs be redirected from integration-slave-docker-1026 to another slave-docker if we have it?
[13:34:47] on example
[13:36:23] CI will just pick one
[13:36:32] addshore: I noticed you added a blocker for wmf.18, do I need to rollback wikidata?
[13:36:37] /dev/vda3 20G 17G 2.9G 85% /
[13:36:37] /dev/mapper/vd-second--local--disk 64G 15G 47G 24% /srv
[13:36:39] 1026 is fine
[13:38:07] I think to 1026 should have more space (40G on example)..
[13:40:15] thcipriani: wikidata is on .16 isnt it?
[13:40:28] oh, its on .18 :D
[13:40:29] no all wikis are on wmf.18
[13:40:35] when did that happen? xD
[13:40:45] yesterday
[13:40:48] I guess maybe https://tools.wmflabs.org/versions/ is just super behind for some reason...
[13:42:04] yeah, so is https://noc.wikimedia.org/conf/wikiversions.json which seems even weirder.
[13:42:16] * addshore smells something evil
[13:42:53] s/evil/cached ¯\_(ツ)_/¯/
[13:43:17] anyway Special:Version is correct on the wikis
[13:45:32] addshore: anyway, should I be rolling back wikidata for T202706?
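The version confusion above came from two cached views (tools.wmflabs.org/versions and wikiversions.json) lagging behind what the wikis themselves reported. A check that compares the two sources can be sketched in Python. The sketch assumes two formats: wikiversions.json maps a database name to a "php-<branch>" directory string, and the MediaWiki API's siteinfo module (api.php?action=query&meta=siteinfo&siprop=general&format=json) reports the running version in a "generator" field such as "MediaWiki 1.32.0-wmf.18". Both JSON samples below are illustrative and were not fetched, so the sketch runs offline. The helper names are hypothetical.

```python
import json
import re
from typing import Optional

# Illustrative payloads only (not fetched): a stale wikiversions.json entry
# and a siteinfo response like the one the live wiki would return.
WIKIVERSIONS_JSON = '{"wikidatawiki": "php-1.32.0-wmf.16"}'
SITEINFO_JSON = '{"query": {"general": {"generator": "MediaWiki 1.32.0-wmf.18"}}}'


def branch_from_wikiversions(raw: str, dbname: str) -> Optional[str]:
    """Return the branch a wikiversions.json-style map assigns to dbname."""
    entry = json.loads(raw).get(dbname)
    if entry is None:
        return None
    # Entries name a checkout directory such as "php-1.32.0-wmf.16".
    return entry[len("php-"):] if entry.startswith("php-") else entry


def branch_from_siteinfo(raw: str) -> Optional[str]:
    """Extract the version from the siteinfo 'generator' string."""
    generator = json.loads(raw)["query"]["general"]["generator"]
    match = re.match(r"MediaWiki (\S+)", generator)
    return match.group(1) if match else None


listed = branch_from_wikiversions(WIKIVERSIONS_JSON, "wikidatawiki")
live = branch_from_siteinfo(SITEINFO_JSON)
print(listed, live, "stale" if listed != live else "in sync")
# prints: 1.32.0-wmf.16 1.32.0-wmf.18 stale
```

Asking the wiki itself, as Special:Version does, sidesteps whatever caching sits in front of the summary pages.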
[13:45:32] T202706: wmf.18 - "Failed to load blob from address" while merging entities - https://phabricator.wikimedia.org/T202706
[13:48:42] thcipriani: no it should be okay as it is over the weekend, its not a major one
[13:48:58] the only reason I added it as a blocker was because the version page said wikidata was still on .16 :)
[13:49:13] addshore: makes sense :) mind if I move it to block next week's train?
[13:49:21] sure! :)
[13:49:26] * thcipriani does
[13:49:29] thanks addshore
[13:51:28] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T191064 (thcipriani)
[14:01:22] Release-Engineering-Team: tools.wmflabs.org/versions caching - https://phabricator.wikimedia.org/T202734 (thcipriani)
[14:02:17] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.18 deployment blockers - https://phabricator.wikimedia.org/T191064 (thcipriani) Moved T202706 after talking with @Addshore in IRC. Added here due to {T202734}.
[14:25:36] Hi! I want to deploy an extension to the beta cluster and I'm thoroughly confused on the right way to do that because https://www.mediawiki.org/wiki/Review_queue according to this page I have to edit files which don't exist anymore. Can someone take a moment to update it?
[14:45:47] (PS1) Hashar: Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480)
[14:46:18] (CR) Hashar: [C: 2] Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480) (owner: Hashar)
[14:47:05] (CR) jerkins-bot: [V: -1] Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480) (owner: Hashar)
[14:47:12] (CR) jerkins-bot: [V: -1] Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480) (owner: Hashar)
[14:48:03] rgrr
[14:49:02] (CR) Hashar: [C: 2] Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480) (owner: Hashar)
[14:49:03] Release-Engineering-Team (Kanban), Analytics-Tech-community-metrics, Code-Health: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. - https://phabricator.wikimedia.org/T190891 (Aklapper)
[14:49:47] (CR) jerkins-bot: [V: -1] Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480) (owner: Hashar)
[14:59:57] (PS1) Hashar: Dont retrieve Jenkins plugins info in test suite [integration/config] - https://gerrit.wikimedia.org/r/455177
[15:01:08] (CR) jerkins-bot: [V: -1] Dont retrieve Jenkins plugins info in test suite [integration/config] - https://gerrit.wikimedia.org/r/455177 (owner: Hashar)
[15:19:52] (PS2) Hashar: pin python-jenkins to 1.1.0 [integration/config] - https://gerrit.wikimedia.org/r/455177
[15:19:54] (PS2) Hashar: Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480)
[15:27:34] (CR) Hashar: [C: 2] pin python-jenkins to 1.1.0 [integration/config] - https://gerrit.wikimedia.org/r/455177 (owner: Hashar)
[15:29:11] (Merged) jenkins-bot: pin python-jenkins to 1.1.0 [integration/config] - https://gerrit.wikimedia.org/r/455177 (owner: Hashar)
[15:30:14] (Merged) jenkins-bot: Mark NSFileRepo as broken [integration/config] - https://gerrit.wikimedia.org/r/455175 (https://phabricator.wikimedia.org/T196480) (owner: Hashar)
[15:30:32] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Operations, Jenkins, Patch-For-Review: Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (Dzahn) Production deployment servers don't have a nodejs installed. I don't know why deploy...
[15:47:34] (PS1) Hashar: Mark BlueSpicePermissionManager as broken [integration/config] - https://gerrit.wikimedia.org/r/455184 (https://phabricator.wikimedia.org/T197900)
[15:49:07] (PS1) Hashar: Mark BlueSpiceSmartList as broken [integration/config] - https://gerrit.wikimedia.org/r/455185
[15:49:45] (CR) Hashar: [C: 2] Mark BlueSpicePermissionManager as broken [integration/config] - https://gerrit.wikimedia.org/r/455184 (https://phabricator.wikimedia.org/T197900) (owner: Hashar)
[15:49:49] (CR) Hashar: [C: 2] Mark BlueSpiceSmartList as broken [integration/config] - https://gerrit.wikimedia.org/r/455185 (owner: Hashar)
[15:51:35] (PS1) Aklapper: Order list of extensions by alphabet [tools/release] - https://gerrit.wikimedia.org/r/455186
[15:52:21] (Merged) jenkins-bot: Mark BlueSpicePermissionManager as broken [integration/config] - https://gerrit.wikimedia.org/r/455184 (https://phabricator.wikimedia.org/T197900) (owner: Hashar)
[15:52:24] (Merged) jenkins-bot: Mark BlueSpiceSmartList as broken [integration/config] - https://gerrit.wikimedia.org/r/455185 (owner: Hashar)
[15:59:39] MediaWiki-Codesniffer, MediaWiki-extensions-Variables, Patch-For-Review: Add MediaWiki Codesniffer to Variables extension - https://phabricator.wikimedia.org/T191811 (MGChecker) Open>Resolved
[15:59:57] MediaWiki-Codesniffer, MediaWiki-extensions-Variables: Add MediaWiki Codesniffer to Variables extension - https://phabricator.wikimedia.org/T191811 (MGChecker)
[16:02:58] hmmmm
[16:03:04] anyone got any idea whats up with this patch in gerrit?
[16:03:04] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/455180/
[16:03:17] for some reason gerrit is reporting it as needed-by and depending-on the same other change....
[16:03:33] but it is not needed-by it at all, and that isn't set anywhere as far as I can tell
[16:12:20] hrm
[16:12:24] looking at https://gerrit.googlesource.com/plugins/zuul/+/stable-2.15/src/main/java/com/googlesource/gerrit/plugins/zuul/GetCrd.java#77
[16:12:38] it's doing: https://gerrit.wikimedia.org/r/#/q/message:I89ab5e9c2d608fb2d2f7de4ed8ba3d40fd7ef13c+-change:I89ab5e9c2d608fb2d2f7de4ed8ba3d40fd7ef13c
[16:12:50] which is where needed-by is coming from
[16:14:53] so I guess it's just because the change-id is in another commit message
[16:31:46] (Abandoned) Hashar: Migrate wikidata/query/rdf publish job to Docker [integration/config] - https://gerrit.wikimedia.org/r/415002 (owner: Hashar)
[16:32:16] (Abandoned) Hashar: Migrate conftool to Docker [integration/config] - https://gerrit.wikimedia.org/r/392798 (owner: Hashar)
[16:32:36] (Abandoned) Hashar: quibble: docker build it from CI [integration/config] - https://gerrit.wikimedia.org/r/355548 (owner: Hashar)
[16:32:40] (Abandoned) Hashar: quibble: test running the container [integration/config] - https://gerrit.wikimedia.org/r/355560 (owner: Hashar)
[16:33:05] Release-Engineering-Team (Kanban), Analytics-Tech-community-metrics, Code-Health: Develop canonical/single record of origin, machine readable list of all repos deployed to WMF sites. - https://phabricator.wikimedia.org/T190891 (Aklapper) I made a bunch of updates today on https://www.mediawiki.org/w/...
[16:35:27] Phabricator: Viewing raw files (on phab.wmfusercontent.org) fails with ERROR_MESSAGE_MAIN on iOS mobile - https://phabricator.wikimedia.org/T169454 (mmodell) I don't think the fix for T201460 would affect loading images. It could be another cross domain policy issue but I don't really think so. Seems like a...
[16:38:51] Phabricator: Viewing raw files (on phab.wmfusercontent.org) fails with ERROR_MESSAGE_MAIN on iOS mobile - https://phabricator.wikimedia.org/T169454 (Paladox) Nope still fails. Fails with The operation couldn’t be completed. (QuickLookErrorDomain error 16.) (null)
[16:41:55] Phabricator: Viewing raw files (on phab.wmfusercontent.org) fails with ERROR_MESSAGE_MAIN on iOS mobile - https://phabricator.wikimedia.org/T169454 (Mainframe98) It's most likely a bug in iOS. I can reproduce it, with https://phab.wmfusercontent.org/file/data/qqvmtekknxnpbfwtap7q/PHID-FILE-rxk2etm7k467bhv3g2...
[16:46:40] Phabricator, Cloud-Services, cloud-services-team: Tools-GUC workboard inaccessible on Phabricator - https://phabricator.wikimedia.org/T202757 (Krinkle)
[16:54:56] (PS1) Hashar: jjb: simplify selenium definitions [integration/config] - https://gerrit.wikimedia.org/r/455203 (https://phabricator.wikimedia.org/T188742)
[16:55:45] (CR) Hashar: "I have simplified the JJB definition with:" [integration/config] - https://gerrit.wikimedia.org/r/443931 (https://phabricator.wikimedia.org/T188742) (owner: Hashar)
[16:56:32] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), Patch-For-Review, User-zeljkofilipin: Run tests daily targeting beta cluster for all repositories with Selenium tests - https://phabricator.wikimedia.org/T188742 (hashar)
[16:56:34] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
[16:57:05] Phabricator, Cloud-Services, cloud-services-team: Tools-GUC workboard inaccessible on Phabricator - https://phabricator.wikimedia.org/T202757 (Aklapper) a:Aklapper Fixed; same as T199207
[16:57:15] Continuous-Integration-Infrastructure (shipyard), Release-Engineering-Team (Kanban), releng-201718-q3, Epic, Patch-For-Review: [EPIC] Migrate Mediawiki jobs from Nodepool to Docker - https://phabricator.wikimedia.org/T183512 (hashar)
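The needed-by puzzle traced above can be reproduced in a few lines. The sketch below is a Python approximation, not the plugin's Java code. It builds the same `message:<id> -change:<id>` query string quoted from GetCrd.java and emulates that query with a plain substring test. Gerrit's real `message:` search goes through its index, so edge cases may differ. Depends-On dependencies are read from a `Depends-On:` commit-message footer. The Change-Ids, commit messages and helper names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Gerrit Change-Ids are "I" followed by 40 hex digits.
CHANGE_ID_FOOTER = re.compile(r"^Change-Id: (I[0-9a-f]{40})\s*$", re.MULTILINE)
DEPENDS_ON_FOOTER = re.compile(r"^Depends-On: (I[0-9a-f]{40})\s*$", re.MULTILINE)


@dataclass
class Change:
    number: int
    message: str

    @property
    def change_id(self) -> str:
        match = CHANGE_ID_FOOTER.search(self.message)
        if match is None:
            raise ValueError(f"change {self.number} has no Change-Id footer")
        return match.group(1)


def needed_by_query(change_id: str) -> str:
    """Same shape as the search quoted from GetCrd.java."""
    return f"message:{change_id} -change:{change_id}"


def depends_on(change: Change) -> List[str]:
    """Change-Ids declared in Depends-On footers."""
    return DEPENDS_ON_FOOTER.findall(change.message)


def needed_by(target: Change, candidates: List[Change]) -> List[int]:
    """Approximate the query: any other change whose message mentions the id."""
    return [c.number for c in candidates
            if c.change_id != target.change_id and target.change_id in c.message]


ID_A = "I" + "a" * 40  # hypothetical Change-Ids
ID_B = "I" + "b" * 40

change_a = Change(1, f"Fix entity merging\n\nDepends-On: {ID_B}\nChange-Id: {ID_A}")
# B declares no dependency; it merely mentions A's Change-Id in prose.
change_b = Change(2, f"Refactor blob lookup\n\nFollow-up in {ID_A}\nChange-Id: {ID_B}")

print(depends_on(change_a) == [ID_B])             # True
print(needed_by(change_a, [change_a, change_b]))  # [2]
```

Change A lists B as depends-on because of its footer. It also lists B as needed-by, purely because B's commit message quotes A's Change-Id. That is the same symmetry reported for 455180, and it fits thcipriani's conclusion.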
[16:57:17] Phabricator, Cloud-Services, cloud-services-team: Tools-GUC workboard inaccessible on Phabricator - https://phabricator.wikimedia.org/T202757 (Aklapper) [16:57:21] Phabricator, Release-Engineering-Team (Kanban), User-Ryasmeen: 404 on workboard for an existing project (due to custom filter applied which did not exist in database) - https://phabricator.wikimedia.org/T199207 (Aklapper) [16:57:57] merry week-end [17:09:46] thcipriani: OK for me to do an emergency deploy? One of the icons in the main wikitext editor is missing (single line fix). [17:09:51] to you too, hashar [17:10:54] James_F: sure, sounds innocuous and important [17:11:13] Kk. [17:13:06] Hi! I want to deploy an extension to the beta cluster and I'm thoroughly confused on the right way to do that because according to https://www.mediawiki.org/wiki/Review_queue#Deploy_to_Beta_Cluster I have to edit files which don't exist anymore. Can someone take a moment to update it? [17:14:39] which files don't exist? [17:15:19] thcipriani: extension-list-labs. [17:15:39] hrm [17:15:45] * thcipriani digs a bit [17:15:45] thcipriani: I talked to James_F yesterday and he graciously agreed to help but I think we should still document it. [17:15:56] definitely a good time to update docs [17:15:59] +1 [17:16:16] Update now before we get rid of Beta Cluster and they have to be updated again! ;-) [17:16:47] wait what? [17:16:56] I am surprised we let people do changes like remove files without updating docs. It's quite reckless. [17:17:40] Niharika: It'd only be possible to properly avoid that if the docs and the files lived in the same repo, or some automated dependency checker existed between them. [17:17:54] yeah good luck finding all references to a file everywhere... [17:18:25] James_F: Not aiming for perfection is fine but when you change the way deployments are done, you need to document it. [17:18:30] That's a big change. [17:18:42] I don't disagree.
:-) [17:19:08] Maybe "expect" rather than "allow"? [17:19:13] If I do something to cause a production outage following outdated docs it'd hardly be my fault, IMHO. [17:19:48] I think we're still trying to get people to stop breaking maintenance scripts like addWiki. [17:20:05] Extension deployment documentation is a little way off [17:23:36] I found where it was removed. I don't know what's supposed to replace it, honestly. [17:26:07] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/409750/ [17:26:24] There's more problems with those docs. There are no `wfLoadExtension( );` statements in CommonSettings-labs. So there's no point in creating the variable in InitialiseSettings-labs? [17:26:36] Where do I configure my extension? [17:27:55] James_F: You initialise the extension variable here but where do you use it? https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/446845/ [17:28:20] Niharika: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/446843/1 [17:28:53] Oh, hmm, I was going to set wgMediaInfoEnable not wmgUseWikibaseMediaInfo. [17:29:11] Left a comment to myself. [17:29:28] James_F: My extension doesn't depend on Wikibase so I can skip touching IS-labs and CS-labs completely? [17:29:58] … umm. [17:30:11] Let's not try to have this conversation whilst I'm deploying. :-) [17:30:28] Sure. :) [17:33:08] * James_F sighs at dirty production. [17:42:27] Eurgh. [17:42:44] Server Admin Log not getting written to by the bot, and for that matter not getting announced in IRC. [17:42:52] Anyone know how to fix those? [17:43:38] you could always manually log it for now and announce it in the relevant channel(s) [17:43:44] Doing so. [17:46:43] I guess this is how, although I have never interacted with it: https://wikitech.wikimedia.org/wiki/Logmsgbot [17:46:59] it run on https://wikitech.wikimedia.org/wiki/Einsteinium [17:47:40] to which I have no access [17:49:14] Fun. [17:49:24] thcipriani: Emergency deploy over. 
[17:49:44] James_F: thank you [17:49:53] you might point it out to the person on clinic duty; they could refer it or look into it or perhaps they are aware [17:49:55] Thank *you*. :-) [17:50:09] (I am at 9pm definitely just doing driveby snarks) [17:50:22] apergos: Good idea, will do so. [17:52:58] (CR) Dduvall: [C: 1] "Looks good!" [integration/config] - https://gerrit.wikimedia.org/r/454306 (https://phabricator.wikimedia.org/T201224) (owner: Thcipriani) [17:57:56] (PS4) Umherirrender: Enable Squiz.Strings.ConcatenationSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 [17:58:06] (CR) Umherirrender: "Rebased" [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 (owner: Umherirrender) [18:39:44] (CR) Krinkle: [C: 2] Enable Squiz.Strings.ConcatenationSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 (owner: Umherirrender) [18:40:39] (Merged) jenkins-bot: Enable Squiz.Strings.ConcatenationSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 (owner: Umherirrender) [18:41:26] (CR) jenkins-bot: Enable Squiz.Strings.ConcatenationSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/454702 (owner: Umherirrender) [19:47:23] !log set profile::elasticsearch::cirrus::tls_port: 9243 to appease puppet on deployment-elastic* hosts following https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/447568/ [19:47:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:52:40] (PS1) Umherirrender: Enable Squiz.Functions.FunctionDeclarationArgumentSpacing [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 [19:53:47] really wish people would not reuse instance names [19:53:55] it looks like gehel replaced deployment-elastic* recently [19:54:06] causing the big scary SSH key warnings [19:56:43] where is shinken-wm [19:57:38] messages are going into /var/log/ircecho/irc-releng.log but not appearing here [20:14:06] wtf is going on
[20:15:19] ... actually I think that one might be my fault :| [20:16:04] still doesn't work [20:16:56] Check service? [20:17:25] paladox, what? [20:17:38] Krenair: ircecho [20:17:44] paladox, what about it? [20:17:58] Krenair: on why it’s not joining the channel :) [20:18:11] Though I have been experiencing this problem too [20:18:15] paladox, ... that's what I'm looking into [20:18:29] Where it’s only joining one channel but not any other channel [20:20:01] I think I know why that is but I'm trying to run it and have it only join this channel which it's not successfully doing [20:20:24] (PS1) Gergő Tisza: Add MR70 to CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/455226 [20:21:06] alright here we go [20:21:20] [u':asimov.freenode.net 474 shinken-wm-test #wikimedia-releng :Cannot join channel (+b) - you are banned'] [20:23:18] and here it is: :cherryh.freenode.net 367 Krenair #wikimedia-bans $~a AlexZ!sid12766@wikimedia/Az1568 1534838626 [20:23:35] * Krenair sigh [20:25:04] Test [20:25:05] okay [20:25:32] that's still got debug code in so I should clean up [20:26:55] paladox, so basically the problem is a global ban on all non-identified users [20:27:11] (global meaning everywhere with a +b $j:#wikimedia-bans) [20:27:17] this prevents shinken-wm from joining new channels and sending messages to existing ones [20:27:51] Ok [20:27:53] so I have to either make it auth (which let's face it is never going to get reviewed), or get it exempted in all the channels it's configured for [20:28:08] Krenair: I have a change to support authing in ircecho [20:28:15] But I need to address comments [20:28:53] Krenair: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/405594/ [20:30:53] and I don't have op rights in all the channels it joins [20:31:28] It's either the ban or constant ascii spam, take your pick :/ [20:31:37] I happen to be able to do -releng and -operations [20:31:46] but -cloud-feed, -cloud, #countervandalism, and -ai [20:32:29]
AlexZ_, there's no support for globalising exempts is there? [20:32:33] like there is with bans [20:32:43] No sadly there is not [20:32:47] bah [20:33:27] freenode can't get their act together, so we're mainly limited on what can be done. [20:35:13] paladox, so, shinken-01 [20:35:16] there's no project puppetmaster [20:35:21] it runs directly off the labs puppetmaster [20:35:33] therefore all I can do is disable puppet and apply the patch manually [20:35:45] until/unless ops get involved to approve the puppet.git change [20:37:06] Ok [20:37:40] also, your patch provides us no secure way to get the password for a shinken-wm account to the instance, as it lives in labs [20:37:53] we could hide it away but it'd still be publicly accessible if someone knew where to look for it [20:37:53] I wonder if my change is authing after it tries to join the channels [20:38:10] uh potentially [20:38:20] As icinga2-wm doesn't seem to be able to join -au [20:38:28] *-ai [20:38:31] technically you may need to wait for NickServ to send you confirmation before issuing the JOIN commands [20:42:59] Krenair: which place would you recommend me do the authing in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/405594/ ? [20:43:57] the place where you're doing it now [20:44:25] Ah ok [20:44:37] How do we get it so that it joins after doing the auth? [20:46:23] gotta wait for the notice/privmsg from nickserv [20:46:34] and what about SASL ? [20:47:03] oh that might make things easier [20:48:28] would have to figure out the implementation [20:49:47] sasl requires --iirc-- putting a file with your password [20:50:00] putting a file? [20:50:13] I think so? [20:50:18] what do you mean?
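The cause only became visible once the raw server replies were printed: the 474 "Cannot join channel (+b)" error and the 367 ban-list entry quoted at 20:21 and 20:23. Reacting to replies like these (or to a NickServ confirmation before JOINing) starts with splitting raw lines. Below is a minimal RFC 1459-style splitter in Python; it skips IRCv3 message tags rather than parsing them, and the helper names are hypothetical, not part of ircecho or the `irc` library.

```python
from typing import List, NamedTuple, Optional

ERR_BANNEDFROMCHAN = "474"  # "Cannot join channel (+b)"
RPL_BANLIST = "367"         # one entry of a channel's ban list


class IrcMessage(NamedTuple):
    prefix: Optional[str]
    command: str
    params: List[str]


def parse_line(line: str) -> IrcMessage:
    """Split one raw IRC line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    if line.startswith("@"):
        # IRCv3 message tags: dropped, not interpreted, in this sketch.
        _, _, line = line.partition(" ")
    prefix = None
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
    trailing = None
    if " :" in line:
        # Everything after the first " :" is one parameter that may contain spaces.
        line, _, trailing = line.partition(" :")
    parts = line.split()
    if not parts:
        raise ValueError("IRC line has no command")
    params = parts[1:]
    if trailing is not None:
        params.append(trailing)
    return IrcMessage(prefix, parts[0].upper(), params)


def banned_channel(msg: IrcMessage) -> Optional[str]:
    """Channel name if msg is a 'banned from channel' error, else None."""
    if msg.command == ERR_BANNEDFROMCHAN and len(msg.params) >= 2:
        return msg.params[1]
    return None
```

Fed the 474 line from the log, this yields the channel `#wikimedia-releng`; fed the 367 line, the ban mask (`$~a`, the entry the log ties to the ban on non-identified users) ends up in `params[2]`.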
[20:50:34] forget about it, it's some time after I looked into that for my account [20:50:58] :( [20:52:09] I see docs for IRC clients, not for servers, dawg [20:52:43] we don't need to write the server side [20:52:47] just the client side [20:53:09] but we don't want docs for how users configure their clients, we want to know what we as a client need to send over the network to the server [20:55:15] if it helps https://freenode.net/kb/answer/sasl [20:55:20] if not, sorry :( [20:55:59] https://github.com/charybdis-ircd/charybdis/blob/master/doc/features/sasl.txt [20:56:21] so https://ircv3.github.io/ [20:56:25] nope [20:56:37] https://ircv3.github.io/irc/#sasl-authentication [21:05:15] "Error: /src/SULWatcher/SULWatcher.sql should not be executable" <-- it is not your precious worship [21:05:29] composer test returns no errors [21:05:38] but they do now? sigh [21:05:45] what? [21:06:27] https://gerrit.wikimedia.org/r/#/c/labs/tools/stewardbots/+/455233/ [21:14:04] oh, not sure about that, sorry [21:20:33] np [21:22:10] Project beta-scap-eqiad build #220155: FAILURE in 22 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220155/ [21:24:00] Project beta-scap-eqiad build #220156: STILL FAILING in 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220156/ [21:24:14] * thcipriani looking ^ [21:27:12] ack [21:27:40] weird [21:27:44] something wrong on the jenkins server? [21:27:56] no wait [21:28:06] on deployment-deploy01 [21:28:13] but under /srv/jenkins [21:28:28] guess that's something to do with how jenkins runs stuff on deployment-deploy01 [21:31:16] Krenair i've tried everything to try and get it to auth before it joins [21:31:22] doesn't seem to work [21:31:29] what exactly have you tried [21:32:52] Krenair doing join twice [21:33:11] and using def on_privmsg(self, c, e): [21:33:53] have you looked at the raw messages coming back to see what's up?
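What the client has to send is the exchange described in the IRCv3 SASL spec linked above: `CAP REQ :sasl`, `AUTHENTICATE PLAIN`, the base64 credentials, then `CAP END` once the server replies 903. Doing it during registration also solves the "auth before join" ordering: the server only sends 001 after `CAP END`, so JOINs issued on 001 are already authenticated. A minimal sketch in Python (the bot code above is Python, cf. `def on_privmsg(self, c, e)`), written as a library-independent state machine. `SaslJoiner`, its methods and the example credentials are hypothetical, not ircecho's code or the `irc` library's API, and it splits lines naively on whitespace, which is enough for these particular commands.

```python
import base64

SASL_CHUNK = 400  # IRCv3: AUTHENTICATE data is sent in pieces of at most 400 bytes


def sasl_plain_payload(account: str, password: str) -> str:
    """Base64 of 'authzid NUL authcid NUL passwd' (RFC 4616), with an empty authzid."""
    raw = b"\0" + account.encode("utf-8") + b"\0" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def authenticate_lines(payload: str) -> list:
    """Split a base64 payload into AUTHENTICATE lines per the IRCv3 chunking rule."""
    chunks = [payload[i:i + SASL_CHUNK] for i in range(0, len(payload), SASL_CHUNK)]
    lines = ["AUTHENTICATE " + chunk for chunk in chunks]
    if not chunks or len(chunks[-1]) == SASL_CHUNK:
        # An empty payload, or one that is an exact multiple of 400, ends with "+".
        lines.append("AUTHENTICATE +")
    return lines


class SaslJoiner:
    """Authenticate with SASL PLAIN during registration, and only then JOIN.

    feed() takes one raw server line and returns the raw lines to send back.
    JOINs are emitted on 001 (RPL_WELCOME), which the server sends only after
    CAP END, and CAP END is sent only once the SASL exchange has ended.
    """

    def __init__(self, nick, account, password, channels):
        self.nick = nick
        self.account = account
        self.password = password
        self.channels = list(channels)
        self.authenticated = False

    def start(self):
        # Requesting a capability holds registration open until CAP END.
        return ["CAP REQ :sasl", "NICK " + self.nick, f"USER {self.nick} 0 * :{self.nick}"]

    def feed(self, line):
        words = line.rstrip("\r\n").split()
        if words and words[0].startswith(":"):
            words = words[1:]  # drop the source prefix, e.g. ":irc.example"
        if not words:
            return []
        command = words[0].upper()
        if command == "PING":
            return ["PONG " + " ".join(words[1:])]
        if command == "CAP" and len(words) >= 4:
            caps = " ".join(words[3:]).lstrip(":").split()
            if words[2] == "ACK" and "sasl" in caps:
                return ["AUTHENTICATE PLAIN"]
            if words[2] == "NAK":
                return ["CAP END"]  # server refused SASL; registration continues
        if command == "AUTHENTICATE" and words[1:] == ["+"]:
            return authenticate_lines(sasl_plain_payload(self.account, self.password))
        if command == "903":  # RPL_SASLSUCCESS
            self.authenticated = True
            return ["CAP END"]
        if command in ("902", "904", "905", "906"):  # SASL refused, failed or aborted
            return ["CAP END"]
        if command == "001":  # RPL_WELCOME: registration is complete
            # Joining while unidentified would only hit the ban, so skip it.
            return [f"JOIN {ch}" for ch in self.channels] if self.authenticated else []
        return []
```

A caller would write `start()` to the socket, then pass every received line to `feed()` and write whatever comes back; where the account password comes from is left open here.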
[21:33:57] Project beta-scap-eqiad build #220157: STILL FAILING in 9.8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220157/ [21:34:18] ^ that one was probably my fault (still testing stuff) [21:37:54] Krenair oh nope [21:38:25] well... otherwise you're pretty much doing it blindly :) [21:38:38] PROBLEM - Free space - all mounts on integration-slave-docker-1026 is CRITICAL: CRITICAL: integration.integration-slave-docker-1026.diskspace.root.byte_percentfree (<22.22%) [21:39:03] heh [21:39:05] thcipriani: looks like a permission error, maybe some file got chmod-ed? [21:40:28] still trying to figure it out. I can run: find . -name .gitignore -delete as the jenkins-deploy user in the directory where it's trying that and it succeeds...so either it's running that in the wrong directory or as the wrong user or I am [21:40:42] Krenair have to go now (movie time!) [21:42:06] have fun [21:42:38] Project beta-scap-eqiad build #220158: STILL FAILING in 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220158/ [21:43:32] oh! I see what's happening.
It can't cd after running that command [21:43:48] sudo bash :P [21:43:58] Project beta-scap-eqiad build #220159: STILL FAILING in 9.5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220159/ [21:49:28] Project beta-scap-eqiad build #220160: STILL FAILING in 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220160/ [21:53:36] RECOVERY - Free space - all mounts on integration-slave-docker-1026 is OK: OK: All targets OK [21:54:01] Project beta-scap-eqiad build #220161: STILL FAILING in 9.2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220161/ [21:55:26] should work this time...or at least fail for a different reason [22:37:54] MediaWiki-Codesniffer: IfElseStructureSniff produces Undefined index - https://phabricator.wikimedia.org/T197197 (Umherirrender) [22:37:57] MediaWiki-Codesniffer, Patch-For-Review: Undefined index: scope_opener in IfElseStructureSniff - https://phabricator.wikimedia.org/T183828 (Umherirrender) [22:43:05] Yippee, build fixed! [22:43:06] Project beta-scap-eqiad build #220162: FIXED in 48 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/220162/ [22:56:01] Krenair: what was the conclusion on the shinken+IRC+auth stuff? [22:56:19] legoktm, it's likely still broken in all the other channels [22:56:25] the proper way forward requires ops help [22:56:46] I've got a slightly questionable ban exemption in here to fix it here [22:58:45] Krenair: is there a ticket for this / could you file one?
:) [22:59:31] well there was https://phabricator.wikimedia.org/T48254 [22:59:42] which was closed [23:00:29] I'm considering reopening it [23:00:36] because it wasn't just icinga-wm [23:00:46] and our bots have to be able to function in an environment that bans all unidentified users [23:01:41] I posted in -cloud-admin but I think they're all done for the week [23:02:09] Phabricator: Not filling in a Due Date, tag list of a task is indented (in search results, workboard columns, etc) - https://phabricator.wikimedia.org/T199141 (Aklapper) Open>declined This issue gets hidden whenever H295 gets applied (see e.g. https://phabricator.wikimedia.org/T199135#4531537 ) henc... [23:02:32] Krenair: agreed on re-opening, though we should probably have individual tickets for each bot since I assume they'll all need diff fixes [23:02:58] yes [23:03:22] Project-Admins, Epic: Replace all "tracking" tasks with tags/projects (if unbounded), relabel them to outcome tasks (if bounded), or kill them entirely (if pointless) - https://phabricator.wikimedia.org/T192655 (Aklapper) [23:05:49] MediaWiki-Codesniffer: PHPCS Internal error when using SPDX license expressions like OR - https://phabricator.wikimedia.org/T195429 (Umherirrender) It seems to fix this issue the license string must be split by `\s+(?:AND|OR)\s+` and each part must be checked. I am not sure if this is something for upstream... [23:06:54] legoktm, so here's what I think we should do with ircecho [23:07:07] (CR) Umherirrender: "You can find some examples in the following repos:" [tools/codesniffer] - https://gerrit.wikimedia.org/r/455225 (owner: Umherirrender) [23:07:10] allow a parameter to be passed in that goes to a file just containing the credentials [23:07:30] implement SASL PLAIN with that, if it's passed [23:08:14] change the puppetisation to let a path be provided [23:08:37] for shinken, have that file *unmanaged* by puppet.
prod may use its secret system if they wish [23:08:57] legoktm, thoughts? [23:09:15] shinken is only running in cloud services right? [23:09:40] if so, that makes sense to me [23:09:40] yes [23:09:50] without a project puppetmaster [23:09:54] hence why it's a problem to use puppet secrets there [23:10:06] right, because labs/private is not [23:10:10] yes [23:11:38] (see -ops-internal) [23:22:11] MediaWiki-Codesniffer: PHPCS Internal error when using SPDX license expressions like OR - https://phabricator.wikimedia.org/T195429 (Krinkle) Using them separately seems cleaner to me, and works with the current rule. I did the same at (PS1) Dduvall: Statsd publisher that sends job/node metrics to statsd.eqiad.wmnet [integration/config] - https://gerrit.wikimedia.org/r/455269 (https://phabricator.wikimedia.org/T201972) [23:27:07] :oooo [23:28:31] Phabricator, OTRS: Enable Nuance ticket tracking system in Phabricator (an alternative to OTRS) - https://phabricator.wikimedia.org/T107014 (Aklapper) stalled>declined p:Low>Lowest Blocked on https://secure.phabricator.com/T8783#163973 so I'm boldly declining this. IMO we don't need more... [23:31:10] (PS2) Dduvall: Statsd publisher that sends job/node metrics to statsd.eqiad.wmnet [integration/config] - https://gerrit.wikimedia.org/r/455269 (https://phabricator.wikimedia.org/T201972) [23:33:31] (CR) Krinkle: "Assuming this is to be sent to prod statsd, a couple issues." (2 comments) [integration/config] - https://gerrit.wikimedia.org/r/455269 (https://phabricator.wikimedia.org/T201972) (owner: Dduvall) [23:34:21] (CR) Dduvall: "Created a test job with this script as a post-build action, and it seems to work correctly.
The stats are available at grafana.wikimedia.o" [integration/config] - https://gerrit.wikimedia.org/r/455269 (https://phabricator.wikimedia.org/T201972) (owner: Dduvall) [23:34:25] Phabricator, Performance: /maniphest/report/project/ : Maximum execution time of 10 seconds exceeded - https://phabricator.wikimedia.org/T125357 (Aklapper) [23:34:28] Phabricator, Easy, Patch-For-Review: tooltip mentions non-existent "Wishlist priority" in Open Tasks by Project and Priority report - https://phabricator.wikimedia.org/T91428 (Aklapper) [23:35:03] marxarelli: would job.label.labelname.*ms and job.job.jobname.*ms suffice? [23:36:08] Krinkle: almost, yes [23:36:15] Phabricator: Converted bugs could link to the original report in static-bugzilla.wikimedia.org - https://phabricator.wikimedia.org/T882 (Aklapper) I don't think many folks still look up stuff on https://static-bugzilla.wikimedia.org/ nowadays. Proposing to decline. [23:36:20] i'd like to get the zuul project in there as well though [23:36:47] since there's bound to be a lot of variance in a single job's duration for one project and another [23:36:58] marxarelli: I think the growth for that is too big. How would it be visualised/used? [23:37:40] I assume the main use case is infra monitoring, not job/code perf right? [23:37:47] we're trying to measure how jobs perform on a few different node configurations (labs instance type, basically, and number of executors) [23:38:12] legoktm, google'd 'SingleServerIRCBot SASL', first result: https://bd808.com/blog/2017/03/01/sasl-auth-with-python-irc/ [23:38:37] note that metrics are lazy-created with a low throttle. which means for the first week, most data will be dropped while each combination's depth and statsd expansion metrics are created one by one.
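Krinkle's proposed naming (`job.label.<label>` and `job.job.<job>`, timers only) maps directly onto statsd's plain-text line protocol, `<bucket>:<value>|ms`. A minimal Python sketch of a publisher that emits just those timers. The function names, the optional per-project bucket (the breakdown marxarelli asked about, which Krinkle worried would grow too much), and UDP port 8125 (the conventional statsd default) are assumptions; only the host `statsd.eqiad.wmnet` comes from the Gerrit change title. Each bucket is a separate series on the server, so the project breakdown is opt-in.

```python
import re
import socket

_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def _segment(name: str) -> str:
    """One metric path segment: '.' separates hierarchy levels, so replace it."""
    return _UNSAFE.sub("_", name)


def timer_lines(label: str, job: str, duration_ms: float, project: str = None) -> list:
    """statsd timer lines ('<bucket>:<value>|ms') for one finished build.

    Only timers are emitted: statsd derives count/rate aggregates from each
    timer itself, so a separate '|c' counter would be redundant.
    """
    value = int(round(duration_ms))
    buckets = [f"job.label.{_segment(label)}", f"job.job.{_segment(job)}"]
    if project:
        # Hypothetical opt-in breakdown; every extra bucket is another set of series.
        buckets.append(f"job.project.{_segment(project)}.{_segment(job)}")
    return [f"{bucket}:{value}|ms" for bucket in buckets]


def send(lines: list, host: str = "statsd.eqiad.wmnet", port: int = 8125) -> None:
    """Fire-and-forget UDP send; several metrics share one newline-separated packet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto("\n".join(lines).encode("ascii"), (host, port))
```

Sanitising each segment matters because a label such as an instance type containing a dot would otherwise silently create an extra hierarchy level.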
[23:39:37] marxarelli: That sounds cool :) nice [23:39:54] Phabricator: Convert RT links in Bugzilla comments in links in Phabricator tasks - https://phabricator.wikimedia.org/T874 (Aklapper) I don't think many folks still look up stuff on https://rt.wikimedia.org/ nowadays. Proposing to decline. [23:40:07] marxarelli: If it is temporary (like < 1month?) with the data removed for later, and visualisation/data dropping not a concern, then I'll retract. [23:40:22] it should be short term [23:40:26] Phabricator (Upstream), Upstream: When a task which is a blocker and has open blockers is closed, change the open blockers to block the grandparent task. - https://phabricator.wikimedia.org/T103182 (Aklapper) [23:40:26] a few weeks i'm thinking [23:40:47] but i'm definitely down to limit the number of separate contexts if you think that's an issue [23:40:49] OK. The c+ms deduplication remains, but for the rest, I'd see, try and see. It's good practice. [23:41:01] I'd say* [23:41:05] my statsd knowledge is limited and i just went with what seemed intuitive for me [23:41:24] every variation is actually 10 variations, given statsd fans out [23:41:25] ah, right. i just followed up with different names for c and ms [23:41:34] remove the c one entirely [23:41:38] right [23:41:42] Phabricator (Upstream), Upstream: When a task which is a blocker and has open blockers is closed, change the open blockers to block the grandparent task. - https://phabricator.wikimedia.org/T103182 (Aklapper) Proposing to decline per T103182#1385104. [23:41:43] makes sense. we don't really care about that [23:41:44] foo.bar.sample_rate, foo.bar.median [23:41:51] you'll have it anyway [23:42:00] ah, cool [23:42:55] perhaps we don't need one by job without context for node too [23:43:26] and we could maybe introduce a convention for node label [23:43:56] i.e. only stats for labels matching "stats-{x}" get sent?
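marxarelli's idea of publishing only for nodes whose label matches `stats-{x}` amounts to a simple opt-in filter in the publisher. A minimal Python sketch, assuming (hypothetically) that the node's labels arrive as Jenkins' whitespace-separated label string; the regex and function names are illustrative, not part of the actual Groovy publisher.

```python
import re
from typing import Optional

# Hypothetical convention from the discussion: only nodes carrying a label of
# the form "stats-<group>" have their build timings published.
STATS_LABEL = re.compile(r"^stats-([A-Za-z0-9._-]+)$")


def stats_group(label: str) -> Optional[str]:
    """Return <group> for a 'stats-<group>' label, else None."""
    match = STATS_LABEL.match(label)
    return match.group(1) if match else None


def measured_group(node_labels: str) -> Optional[str]:
    """Given a node's whitespace-separated labels, return the first stats group."""
    for label in node_labels.split():
        group = stats_group(label)
        if group is not None:
            return group
    return None
```

Builds on nodes without such a label return `None` and are skipped, so measuring a group of nodes only requires adding the label to them and removing it afterwards.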
[23:44:19] that way we'd have a basic mechanism for measuring groups of nodes at any given time, without having to collect en masse [23:44:24] Do you have different labels per type of node, or will they get distributed randomly? [23:44:46] i was going to assign them by the instance type for all docker nodes [23:44:56] since that's what i want to group on for measurement [23:45:06] does that make sense? [23:45:08] right, so you don't have different instance sizes with the same label. [23:45:21] exactly, yeah [23:45:28] and means you can promote/demote jobs individually to see their effect with more data to compare. [23:45:46] hmm, what do you mean? [23:45:59] jobs are tied to a node label, right? where label is a category of nodes. [23:46:01] like stick them to different node labels? [23:46:14] I could be wrong. [23:46:19] no, you're right [23:47:00] I don't know which approach you want to take, both are equally fine I think. just wondering. [23:47:22] One approach is to add the new executors with the same label, and they'll start running there sometimes and you can compare side by side. [23:47:32] but might have very low sampling for most of it. [23:48:08] Alternatively, you could focus on a few high profile ones, and switch one over from one to the other (zuul config, simple hot reload). And then see its effect just on the per-job metrics. [23:48:17] Might be more predictable and easier to roll back. [23:48:17] right. so, the reason i wanted job name and project in there is that i wanted to narrow down the measurement to a consistent set of jobs [23:49:34] right, that seems fine, for a temporary one, or else, to just whitelist ones you want to monitor (core + a handful of extensions at most). [23:49:43] ah, yes, what you said. (i.e. 
focus on high profile jobs) :) [23:49:58] that's what i was thinking at the outset, and then i got greedy :) [23:50:33] cool, i'll find a candidate then, and just schedule the publisher for that one job [23:51:00] I remember a few years back we did a similar thing for permanent slaves to nodepool, and between different kinds of slaves. Granted, those were much more different from each other than a small and big docker host, so we actually needed different labels to avoid having to deal with compatibility/bugs due to differences from all jobs at once. [23:51:09] Instead, being able to re-pin just a few at a time. [23:51:52] But you might want to do the same here, e.g. the ones currently pinned to DockerSomething introduce a bigger/different ones with DockerSomethingDifferent and then re-pin a few on that and see if metrics for a job+project combo improve or regress. [23:52:02] anyway, I'll be quiet now. I think you've got it. [23:52:54] no, i really appreciate the feedback! i'll probably bug you more when we have data to visualize, but this is a good start [23:54:15] yw [23:55:11] regarding labels, I recall a similar issue with RL metrics a few months ago. in that case, I found it easier to visualise when joining the labels into one metric name (e.g. dash joined or some such), but it depends. Just something to consider. For this one an -and- like approach might work better. [23:56:09] also, I see it triggered via a Jenkins publisher, that's very neat. And also means it won't trigger for everything already, so I was a bit overly concerned I see. [23:57:37] it was the best solution i found, using the groovy post-build plugin [23:57:55] since there's a lot more information available to groovy scripts than shell scripts
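For the SPDX issue filed earlier in this log (T195429), Umherirrender's suggested fix is to split the license string on `\s+(?:AND|OR)\s+` and check each part. The real sniff is PHP (MediaWiki-Codesniffer); this is only a Python illustration of that splitting rule, with a hypothetical set of accepted identifiers. It covers flat `A OR B` / `A AND B` expressions only: parentheses and `WITH` exceptions are not handled, and Krinkle's reply on the task prefers listing licenses separately instead.

```python
import re

# Operator split as suggested on T195429; flat expressions only.
OPERATOR = re.compile(r"\s+(?:AND|OR)\s+")


def license_parts(expression: str) -> list:
    """Split a flat SPDX expression into its individual license identifiers."""
    return [part for part in OPERATOR.split(expression.strip()) if part]


def unknown_licenses(expression: str, known: set) -> list:
    """Parts of the expression that are not in the set of accepted identifiers."""
    return [part for part in license_parts(expression) if part not in known]
```

With `known = {"MIT", "GPL-2.0-or-later"}`, the expression `"GPL-2.0-or-later OR MIT"` yields no unknown parts, whereas checking the whole string as a single identifier would reject it.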