[00:09:50] Beta-Cluster-Infrastructure, Deployment-Systems, Release-Engineering-Team (Kanban), Patch-For-Review: deployment-imagescaler01 has no mwdeploy user - https://phabricator.wikimedia.org/T166013#3468620 (mmodell) Open>Resolved I'm marking this as resolved since the problem only affects beta...
[00:18:19] Project selenium-Flow » firefox,beta,Linux,BrowserTests build #462: FAILURE in 2 min 18 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/462/
[00:52:25] (CR) Krinkle: [C: 2] "Compiled and deployed 2 job update matching 'performance-webpagetest-w*'" [integration/config] - https://gerrit.wikimedia.org/r/367416 (https://phabricator.wikimedia.org/T166756) (owner: Hashar)
[00:53:49] (Merged) jenkins-bot: Move WebPageTest to a dedicated slave [integration/config] - https://gerrit.wikimedia.org/r/367416 (https://phabricator.wikimedia.org/T166756) (owner: Hashar)
[01:21:11] Release-Engineering-Team, MW-1.29-release, Patch-For-Review, Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3457185 (demon) I did not use that branch. I didn't even know it existed.
[01:38:44] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Performance-Team, WebPageTest, Patch-For-Review: Where to trigger WebPageTest jobs? - https://phabricator.wikimedia.org/T166756#3468761 (Krinkle) >>! In T166756#3466308, @hashar wrote: > ``` > webperformance:~$ nodejs -...
[01:51:44] MediaWiki-Releasing, Release-Engineering-Team (Kanban), MW-1.29-release-notes, Patch-For-Review: Include release extensions/skins/vendor as submodules of core - https://phabricator.wikimedia.org/T137564#3468783 (MacFan4000)
[01:51:46] Release-Engineering-Team, MW-1.29-release, Patch-For-Review, Release: MediaWiki 1.29 tarball comes with the wrong extensions - and misses some - https://phabricator.wikimedia.org/T171197#3468782 (MacFan4000) Open>Resolved
[01:55:40] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Performance-Team, WebPageTest, Patch-For-Review: Where to trigger WebPageTest jobs? - https://phabricator.wikimedia.org/T166756#3468784 (Krinkle) Had to press "Trust Host Key", due to manual confirmation it presumably s...
[01:59:09] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Performance-Team, WebPageTest, Patch-For-Review: Where to trigger WebPageTest jobs? - https://phabricator.wikimedia.org/T166756#3468786 (Krinkle) >>! In T166756#3466253, @gerritbot wrote: > Change 367411 had a related p...
[02:43:40] Project beta-scap-eqiad build #165521: FAILURE in 0.4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/165521/
[02:55:56] Yippee, build fixed!
[02:55:57] Project beta-scap-eqiad build #165522: FIXED in 2 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/165522/
[03:15:43] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install (for non-periodic jobs) - https://phabricator.wikimedia.org/T171562#3468844 (Mholloway)
[03:17:05] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install (for non-periodic jobs) - https://phabricator.wikimedia.org/T171562#3468859 (Mholloway)
[03:17:07] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install (for non-periodic jobs) - https://phabricator.wikimedia.org/T171562#3468844 (Mholloway) p:Triage>High
[03:18:11] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install (for non-periodic jobs) - https://phabricator.wikimedia.org/T171562#3468844 (Mholloway)
[03:18:12] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install (for non-periodic jobs), blocking tests from being performed - https://phabricator.wikimedia.org/T171562#3468844 (Mholloway)
[03:22:03] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install, blocking tests from being performed - https://phabricator.wikimedia.org/T171562#3468863 (Mholloway)
[03:23:10] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install, blocking tests from being executed - https://phabricator.wikimedia.org/T171562#3468844 (Mholloway)
[04:19:56] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #463: FAILURE in 23 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/463/
[06:35:45] Beta-Cluster-Infrastructure, Release-Engineering-Team, Services (next), User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3468983 (Joe) ok, what I don't understand atm is why this works in production and not in deployment-prep. more in general,...
[07:18:22] Scap, Analytics, EventBus: eventlogging-service-eventbus scap deployments should depool/pool during deployment - https://phabricator.wikimedia.org/T171506#3467218 (Joe) indeed.
[07:41:53] Scap, Analytics, EventBus: eventlogging-service-eventbus scap deployments should depool/pool during deployment - https://phabricator.wikimedia.org/T171506#3469035 (elukey) Puppet was broken on kafka2002: ``` Error: Could not set home on user[eventlogging]: Execution of '/usr/sbin/usermod -d...
[08:25:01] hashar: I want to mark this task resolved but I don't have access to logstash right now to check if it's really fixed or still happens. Can you check it when you're free? https://phabricator.wikimedia.org/T170599
[08:25:46] good morning Amir1
[08:25:59] Good morning :)
[08:26:29] I don't want to bother you but since it's made be RelEng, I thought you're the best person
[08:26:33] *by
[08:27:04] Amir1: so yeah releng is in charge of noticing log issues on production
[08:27:10] filing bugs
[08:27:16] and having devs aware / working on the resolution
[08:28:51] Yeah, I'm sorry. I'm waiting for my rights
[08:31:14] Amir1: the last trace was on 2017-07-20T16:56:19
[08:31:54] hashar: that's when the fix got deployed https://phabricator.wikimedia.org/T170599#3455712
[08:32:15] I'm calling it done. Thanks
[08:32:39] Amir1: that is because Wikidata got rolled back
[08:32:57] oops I did it again
[08:33:09] well that is my understanding :-)
[08:35:02] It's very valuable
[08:35:05] Thanks
[08:38:13] (PS2) Hashar: Run WebPageTest tests from Asia to verify the new cache pop. [integration/config] - https://gerrit.wikimedia.org/r/362972 (https://phabricator.wikimedia.org/T168416) (owner: Phedenskog)
[08:38:54] (PS3) Hashar: Run WebPageTest tests from Asia to verify the new cache pop. [integration/config] - https://gerrit.wikimedia.org/r/362972 (https://phabricator.wikimedia.org/T168416) (owner: Phedenskog)
[08:39:39] (PS4) Hashar: Run WebPageTest tests from Asia to verify the new cache pop. [integration/config] - https://gerrit.wikimedia.org/r/362972 (https://phabricator.wikimedia.org/T168416) (owner: Phedenskog)
[08:41:26] (CR) Hashar: [C: 1] "PS2 is a rebase" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/362972 (https://phabricator.wikimedia.org/T168416) (owner: Phedenskog)
[08:41:38] (CR) Hashar: [C: 1] Run WebPageTest tests from Asia to verify the new cache pop. [integration/config] - https://gerrit.wikimedia.org/r/362972 (https://phabricator.wikimedia.org/T168416) (owner: Phedenskog)
[09:36:52] MediaWiki-Codesniffer, Patch-For-Review: Ignore blocks with @inheritDoc - https://phabricator.wikimedia.org/T164649#3240961 (Tgr) Note that while the PSR-5 draft has `@inheritDoc`, most other projects and tools use `@inheritdoc`, in all lowercase. (E.g. [[https://manual.phpdoc.org/HTMLSmartyConverter/Han...
[09:40:01] elukey: good morning :-] For jobrunner/Redis there is a patch for mediawiki/core that would eventually slightly reduce the # of connections made to the jobrunner redis
[09:40:41] elukey: I have noticed that bots hitting the API request site stats, which gets the # of jobs and always makes a connection to redis to get the job count
[09:40:48] https://gerrit.wikimedia.org/r/#/c/356119/ would cache that for 30 seconds
[09:41:00] which should slightly reduce the # of connections
[09:41:07] (all of that is solely for info)
[09:41:08] ;-D
[09:44:57] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Performance-Team, WebPageTest, Patch-For-Review: Where to trigger WebPageTest jobs? - https://phabricator.wikimedia.org/T166756#3469291 (hashar) @Krinkle regarding the bad slave being connected, that is entirely my faul...
[09:50:08] Release-Engineering-Team (Watching / External), Wikimedia-General-or-Unknown: enwiki file "Lock_icon_blue.gif" in sites CSS has to be switched to commons wiki - https://phabricator.wikimedia.org/T162235#3469313 (hashar) Open>Resolved They all have been edited: ``` $ mwgrep 'wikipedia/en/0/00/Lock...
[09:55:59] hashar: nice!
[10:01:51] elukey: and there are still some issues to fix for scap to properly deploy the jobrunner stuff
[10:02:11] namely restart/reload TWO services, that would be in scap 3.6
[10:02:30] and one that remains to be implemented which is to not start the services in the backup datacenter
[10:02:42] currently scap deploy would start the job runners in codfw :/
[10:03:03] in short, that was a quick status update letting you know that it is known / tracked
[10:03:43] thanks! I gave up working on the persistent conns HHVM -> Redis since we are migrating to the new architecture
[10:03:53] and there wasn't enough traction
[10:04:04] (same thing for the duplicate jobs in codfw)
[10:25:28] anybody already seen an issue like
[10:25:29] Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item profile::etcd::cluster_name in any Hiera data file and no default supplied at /etc/puppet/modules/profile/manifests/etcd.pp:32 on node deployment-aqs02.deployment-prep.eqiad.wmflabs
[10:25:50] I have no idea why I should have etcd on AQS :D
[10:45:25] Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Next), Wikidata, and 2 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3469553 (Aleksey_WMDE) Hi @zeljkofilipin! I see that T164721 is resolved ;) So what...
[10:56:03] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:56:06] elukey: who knows? Our puppet modules are tightly coupled and maybe that comes from some other manifest than the AQS ones
[10:56:35] somebody put the etcd profile in horizon -> aqs prefix
[10:56:37] not sure why
[11:04:41] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:05:15] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:25:46] elukey: no idea either :(
[12:43:25] Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Wikidata, and 2 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3469790 (zeljkofilipin) a:zeljkofilipin
[12:48:19] Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Wikidata, and 2 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3469816 (zeljkofilipin) @Aleksey_WMDE I will take a look. Please note that we wil...
[12:49:12] Release-Engineering-Team, User-zeljkofilipin: Use pwstore (a shared gpg-encrypted password store) for Release Engineering related passwords - https://phabricator.wikimedia.org/T139093#3469817 (zeljkofilipin)
[12:52:03] Browser-Tests-Infrastructure, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Wikidata, and 2 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#3469820 (Aleksey_WMDE) It's not exactly ruby related. We just need a job in Jenkins
[13:02:06] Release-Engineering-Team (Kanban), User-zeljkofilipin: Use pwstore (a shared gpg-encrypted password store) for Release Engineering related passwords - https://phabricator.wikimedia.org/T139093#3469830 (zeljkofilipin)
[13:06:52] Release-Engineering-Team (Kanban), Reading-Web-Backlog, RelatedArticles, Patch-For-Review, User-zeljkofilipin: Rewrite Related pages browser tests in Node.js - https://phabricator.wikimedia.org/T164024#3469837 (zeljkofilipin) Blocked by {T170880}.
[13:07:52] Continuous-Integration-Config, Release-Engineering-Team, MW-1.30-release-notes, MediaWiki-Core-Tests, and 5 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3445978 (zeljkofilipin) This is blocking {T164024}.
[13:29:10] Continuous-Integration-Config, Wikipedia-Android-App-Backlog: Android SDK is suddenly failing to auto-install, blocking tests from being executed - https://phabricator.wikimedia.org/T171562#3469892 (Mholloway) @hashar , has any configuration around these machines (integration-slave-jessie-1001, integrati...
[13:32:43] (PS5) MacFan4000: rm DeepSea [integration/config] - https://gerrit.wikimedia.org/r/367041 (https://phabricator.wikimedia.org/T171385)
[13:36:37] Continuous-Integration-Config, Release-Engineering-Team, MW-1.30-release-notes, MediaWiki-Core-Tests, and 5 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3469937 (hashar) The fix is https://gerrit.wikimedia.org/...
[13:40:56] (PS1) Zfilipin: Do not run mwext-mw-selenium-jessie for RelatedArticles [integration/config] - https://gerrit.wikimedia.org/r/367674 (https://phabricator.wikimedia.org/T164024)
[14:30:46] Continuous-Integration-Config, Wikipedia-Android-App-Backlog, Patch-For-Review: Android SDK is suddenly failing to auto-install, blocking tests from being executed - https://phabricator.wikimedia.org/T171562#3470232 (Mholloway) Open>Resolved a:Mholloway Reverting to the prior build-tools...
[14:51:21] Beta-Cluster-Infrastructure, MediaWiki-extensions-WikibaseRepository, Wikidata: rebuildTermSqlIndex.php causing beta-update-databases-eqiad to timeout and abort - https://phabricator.wikimedia.org/T168036#3470345 (Aleksey_WMDE)
[15:49:39] Continuous-Integration-Config, Release-Engineering-Team (Backlog), Zuul: recheck is ignored if there are also inline comments - https://phabricator.wikimedia.org/T171352#3470598 (Florian) a:Florian Aha, interesting, I haven't thought about that so far :) However, the syntax of the comment looks...
[16:18:09] (PS1) Florianschmidtwelzow: Recognize recheck with inline comments, too [integration/config] - https://gerrit.wikimedia.org/r/367691 (https://phabricator.wikimedia.org/T171352)
[16:21:37] Continuous-Integration-Config, Release-Engineering-Team (Watching / External), MW-1.30-release-notes, MediaWiki-Core-Tests, and 5 others: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink - https://phabricator.wikimedia.org/T170880#3470698 (greg)
[16:35:19] sheesh irc keeps timing out my connection ...
[16:35:31] wrong room :P
[16:42:01] RainbowSprinkles hi, i wonder could you paste the entries for Reception123 please? (upstream would like to see those entries.) ie from the external id table for the user.
[16:42:26] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3470774 (jcrespo)
[16:44:25] So you think it will be possible to fix my account then?
[16:44:54] any idea why I am getting this msg for a new instance in the analytics group?
[16:44:57] 2017-07-25T16:17:02.940437+00:00 zk2-1 nslcd[487]: [b127f8] no available LDAP server found: Server is unavailable
[16:45:58] elukey: on labs/VPS? or in beta cluster?
[16:46:40] Reception123: RainbowSprinkles and I were talking yesterday about how to fix your account. We have decided on a way to delete the account without losing your review history.
[16:46:47] Reception123: <+RainbowSprinkles> I was thinking we could delete the account--just accounts & external id entries, not actual review history or patches. Then swap the account_id on those 3 entries to the old one?
[16:46:59] greg-g: analytics project in labs
[16:47:14] beta cluster is deployment-prep right?
[16:47:17] right
[16:47:19] Though the entries i am asking for are for upstream to help and see if there's a bug and fix it :).
[16:47:31] it is unlikely this will be fixed in 2.13
[16:47:34] elukey: so, analytics in labs -> #wikimedia-cloud
[16:47:41] or #wikimedia-analytics
[16:47:49] it's now end of life as far as upstream says (no more releases for 2.13)
[16:47:57] Whatever works for you.
[16:48:06] greg-g: ah sorry thanks!
[16:48:34] elukey: no prob :)
[16:51:43] !log deploying ores 835d848 T171505
[16:51:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:51:47] T171505: Late-July 2017 ORES deploy - https://phabricator.wikimedia.org/T171505
[17:02:07] Release-Engineering-Team, Scap, ORES, Scoring-platform-team-Backlog: ORES should use git-fat for wheel deployments - https://phabricator.wikimedia.org/T171619#3470967 (demon)
[17:02:34] Unrelated to my other problem but could someone review https://gerrit.wikimedia.org/r/#/c/348577/ (extension registration)? It overrides an extension that I would like to use, so it would be nice if it could finally be fixed
[17:04:19] Release-Engineering-Team, Scap, ORES, Scoring-platform-team-Backlog: ORES should use git-fat for wheel deployments - https://phabricator.wikimedia.org/T171619#3470987 (demon)
[17:25:55] Release-Engineering-Team, Scap, ORES, Scoring-platform-team-Backlog: ORES should use git-fat for wheel deployments - https://phabricator.wikimedia.org/T171619#3470967 (bd808) #striker could use this too. It has the same sort of wheel blob repo as ORES.
[17:28:50] Release-Engineering-Team, Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Split editquality repo to two repos, one with full history, one shallow - https://phabricator.wikimedia.org/T170967#3471090 (awight)
[17:35:44] Release-Engineering-Team (Watching / External), MediaWiki-Containers, Kubernetes, Services (designing), User-mobrovac: RFC: Container path conventions - https://phabricator.wikimedia.org/T169998#3471155 (greg)
[17:37:05] Release-Engineering-Team (Watching / External), Scoring-platform-team, editquality-modeling, User-Ladsgroup, artificial-intelligence: Split editquality repo to two repos, one with full history, one shallow - https://phabricator.wikimedia.org/T170967#3471163 (greg) See also: {T171619}
[17:41:12] Release-Engineering-Team, Scap, ORES, Scoring-platform-team-Backlog: ORES should use git-fat for wheel deployments - https://phabricator.wikimedia.org/T171619#3470967 (awight) I don't want to hijack this task, but for the record our binary problem is much more severe in the `editquality` repo, wh...
[17:43:09] Release-Engineering-Team, Scap, ORES, Scoring-platform-team-Backlog: ORES should use git-fat for wheel deployments - https://phabricator.wikimedia.org/T171619#3470967 (greg) Yeah, let's do it for all the repos with binaries :) And we'll figure out something re labs + prod re git-fat access.
[17:44:12] Release-Engineering-Team, Scap, ORES, Scoring-platform-team-Backlog: ORES should use git-fat for wheel deployments - https://phabricator.wikimedia.org/T171619#3470967 (Halfak) Where are the best docs for setting this up. Will it work with out github repos? We do development mainly against them...
[17:51:35] Continuous-Integration-Infrastructure, Release-Engineering-Team (Watching / External), Discovery, Discovery-Analysis, Operations: Setup a mirror for R language dependencies (CRAN) - https://phabricator.wikimedia.org/T170995#3471367 (mpopov) From @Ottomata at https://gerrit.wikimedia.org/r/#/c...
[17:55:47] MediaWiki-Codesniffer, Patch-For-Review: Provide a Codesniffer rule to enforce "short" type definitions: int and bool, not integer and boolean - https://phabricator.wikimedia.org/T145162#2621760 (Umherirrender) Along with @return it should also check @param or @var
[17:56:48] MediaWiki-Codesniffer, MW-1.30-release-notes, MediaWiki-General-or-Unknown, Patch-For-Review: phpcs not running on all files - https://phabricator.wikimedia.org/T129664#3471401 (Umherirrender) Open>Resolved
[18:02:45] MediaWiki-Codesniffer: Provide a Codesniffer rule to enforce "short" type definitions: int and bool, not integer and boolean - https://phabricator.wikimedia.org/T145162#3471453 (Krinkle) p:Triage>Normal
[18:02:46] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3471455 (jcrespo)
[18:03:04] MediaWiki-Codesniffer: Provide Codesniffer rules to enforce "short" type definitions (int/bool, not integer/boolean) - https://phabricator.wikimedia.org/T145162#2621760 (Krinkle)
[18:03:29] MediaWiki-Codesniffer: Spacing around + - https://phabricator.wikimedia.org/T171393#3463430 (Umherirrender) This could also be done for - * / % ^ But not for += or ++ (that is easy, because there are an own token for it) But it should respect standalone +1 or -1
[18:06:00] Beta-Cluster-Infrastructure, Release-Engineering-Team, Services (next), User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3471479 (mobrovac) >>! In T171173#3468983, @Joe wrote: > ok, what I don't understand atm is why this works in production and...
[18:06:20] thcipriani: thnx for finding a work-around for ^^^
[18:06:56] sure, happy to horribly patch things any time :)
[18:07:32] loool
[18:07:45] * thcipriani lives in a house built of duct tape
[18:07:55] has the next train branch been cut?
[18:08:05] thcipriani: at least you have a house :P
[18:08:22] I believe so, I think l10n is getting updated for the next branch right now
[18:08:26] I already cut it, yes
[18:09:24] oh ok
[18:09:33] * mobrovac missed his opportunity to get a core patch in
[18:09:38] swatting it is
[18:09:41] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3471520 (Krinkle)
[18:10:41] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Fix or remove Blubber's node_modules optimization - https://phabricator.wikimedia.org/T171632#3471557 (dduvall)
[18:12:22] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3471601 (Krinkle)
[18:13:48] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (Krinkle)
[18:15:57] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#3471645 (jcrespo)
[18:16:44] Release-Engineering-Team (Watching / External), Operations: Provide cross-dc redundancy (active-active or active-passive) to all important misc services - https://phabricator.wikimedia.org/T156937#2990470 (jcrespo)
[18:30:51] jdlrobson: Heyyyy, you about? Got a question about MinervaNeue
[18:31:19] I was looking at some user_properties values, and I see a bit of some discrepancy:
[18:31:36] | minerva | 9 |
[18:31:36] | minervaneue | 4 |
[18:31:37] | minerva-neue | 1 |
[18:31:50] What's the correct normalized value? We should clean this up before there becomes too much drift / we forget
[18:32:00] (I'm assuming it's one of the latter two)
[18:35:04] o_0
[18:41:35] RainbowSprinkles were there three entries in the table?
[18:42:04] For what, the login issue in gerrit?
[18:42:53] yep
[18:43:33] Release-Engineering-Team, MobileFrontend, Reading-Web-Backlog, RelatedArticles, MW-1.30-release-notes (WMF-deploy-2017-07-18_(1.30.0-wmf.10)): RelatedPages PHP unit tests are failing during to an issue with MobileFrontend - https://phabricator.wikimedia.org/T170624#3471783 (Jdlrobson) Let's a...
[18:45:44] Release-Engineering-Team, MobileFrontend, Reading-Web-Backlog, RelatedArticles, MW-1.30-release-notes (WMF-deploy-2017-07-18_(1.30.0-wmf.10)): Restore Skip ApiMobileViewTest::testView and verify PHP unit tests for RelatedArticles pass - https://phabricator.wikimedia.org/T170624#3471788 (Jdlrob...
[19:11:58] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #22: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/22/
[19:14:24] paladox: Nope, just 2
[19:14:31] ok thanks
[19:14:48] RainbowSprinkles would you be able to paste them so i can show upstream please?
[19:14:52] edwin is asking for them
[19:16:25] It's the same ones as always.
[19:16:30] They haven't changed
[19:17:14] 3635 | | NULL | mailto:
[19:17:14] 3635 | NULL | NULL | username:reception123
[19:17:31] thanks
[19:18:40] RainbowSprinkles ah
[19:18:43] it's missing gerrit:
[19:18:50] the gerrit: prefix
[19:18:52] Durrrrrrr
[19:18:54] I saw 2
[19:18:57] There should be 3 here
[19:19:02] MEA CULPA
[19:19:10] I saw 2 and misread it as the 2 correct ones
[19:19:24] lol
[19:19:25] :)
[19:19:26] Aha
[19:19:29] It can be confusing yeh. We all presumed two entries
[19:20:57] reception123 we found your problem :)
[19:21:02] Added, flushing all caches
[19:21:07] thanks :)
[19:22:00] Thank you!!
[19:22:02] it works now
[19:22:18] errors are always easier than you think
[19:22:36] :)
[19:22:55] and most of the time the small issue goes unnoticed :P
[20:01:11] thcipriani: ping regarding https://phabricator.wikimedia.org/T129148 - I'd like to get this into a non-dirty state where we can actually make deployments again, even if that means running one or more complicated commands (as long as we can document them on-wiki).
[20:01:51] Right now the repo on tin is dirty and I'm not quite sure what the proper way is. Proposal: 1) Commit the removal of restart command, 2) Document which command to use to restart the eqiad runners only.
[20:03:56] Krinkle: this is on my list. We can work around this in scap with a bit of a more complicated setup for the time being. My suggestion for the time being would be to use 2 environments for deployment. So commands would be like: scap -e codfw deploy -v && scap -e eqiad deploy -v then we'd add scap.cfg for each: scap/environments/{eqiad,codfw}
[20:04:36] scap multiple service restart support will be with the 3.6 release, which I just tagged (still need to push and ask for deb rebuild)
[20:05:29] this setup would at least allow jobrunner to be deployed; however, a more long term solution would be good to work out re:not restarting on a subset of machines.
[20:06:05] thcipriani: the 2 env solution, does that work now?
[20:06:30] it would require some config changes, but, yeah, I think it could.
[20:09:53] * thcipriani braindumps on T129148
[20:17:49] Deployment-Systems, Scap, Patch-For-Review: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#3472190 (thcipriani) Resolved>Open Hi @fgiunchedi just tagged debian/3.6.0-1 on scap's release branch. Could you update scap on carbon please? Thanks!
[20:19:24] thcipriani: It seems from sudo -l that it has rights to service jobrunner but not jobchron
[20:19:26] that's a problem
[20:19:40] In our case we were lucky the code from the jobchron file wasn't modified in that patch
[20:19:50] 2 services 1 repo/package/deployment
[20:20:24] do you know how this was set up with trebuchet? I'm unclear how the 2 service 1 repo thing ever worked...
[20:20:27] That means code affecting jobchron can't be deployed at all without root, right?
[20:20:49] thcipriani: firstly, the deployment tool didn't do restart, that was manually done with salt
[20:20:55] and we'd just restart both one after the other
[20:21:00] first jobrunner, then jobchron.
[20:21:10] but that's now root-only.
[20:21:16] (2014 ish)
[20:21:28] gotcha
[20:22:30] yeah, we'll need to add a sudoers rule in puppet to allow mwdeploy to restart jobchron
[20:23:45] scap 3.6.0 will allow multiple services to be restarted as part of a deployment.
[20:52:18] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Fix or remove Blubber's node_modules optimization - https://phabricator.wikimedia.org/T171632#3472421 (hashar) Could it be that `/tmp` is a `tmpfs` and thus moving files under / is actually a copy+delete? What about running `npm install` at...
[20:56:19] Beta-Cluster-Infrastructure, Release-Engineering-Team, Services (next), User-Joe: puppet dependency loop on deployment-sca hosts - https://phabricator.wikimedia.org/T171173#3456426 (hashar) Thank you @thcipriani for the analysis and the patch!
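Editor's note: the two-environment workaround proposed at [20:03:56] amounts to running one scap deploy per environment, in order, and stopping at the first failure. The sketch below only assembles that command line; it does not run scap. The `scap -e <env> deploy -v` form and the environment names come from the chat, while the function names `deploy_commands` and `as_shell` are hypothetical and exist only for this illustration.

```python
import shlex


def deploy_commands(environments):
    """Return one argv list per environment, in the order given.

    The `scap -e <env> deploy -v` form is quoted from the chat; this
    helper itself is a hypothetical illustration, not part of scap.
    """
    return [["scap", "-e", env, "deploy", "-v"] for env in environments]


def as_shell(commands):
    """Join the argv lists with && so a failed deploy stops the sequence."""
    return " && ".join(shlex.join(argv) for argv in commands)


if __name__ == "__main__":
    # Same order as suggested in the chat: codfw first, then eqiad.
    print(as_shell(deploy_commands(["codfw", "eqiad"])))
```

Keeping the environment list as data would make it simple to drop the second environment once scap can skip service restarts on a subset of machines, which the chat describes as the longer-term goal.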
[21:11:12] Continuous-Integration-Infrastructure, Release-Engineering-Team (Watching / External), Discovery, Discovery-Analysis, Operations: Setup a mirror for R language dependencies (CRAN) - https://phabricator.wikimedia.org/T170995#3472542 (hashar) @mpopov using twitter to get the size was a smart mo...
[21:21:34] RainbowSprinkles: hey, it seems wikibase/wikiba.se is not mirrored in diffusion, is it fixable?
[21:21:59] Yes. File a ticket for me so I don't forget, kinda busy today
[21:22:12] Continuous-Integration-Config, Wikipedia-Android-App-Backlog, Patch-For-Review: Android SDK is suddenly failing to auto-install, blocking tests from being executed - https://phabricator.wikimedia.org/T171562#3472602 (hashar) >>! In T171562#3469892, @Mholloway wrote: > @hashar , has any configuration...
[21:22:32] sure
[21:23:52] but I think amir1 can do it himself via toolsadmin right?
[21:24:12] Release-Engineering-Team, Wikidata, wikiba.se: Mirror wikibase/wikiba.se in diffusion - https://phabricator.wikimedia.org/T171667#3472623 (Ladsgroup)
[21:24:24] https://phabricator.wikimedia.org/T171667
[21:24:59] Nope, I don't have the rights
[21:25:16] okay, sorry, all that stuff is kinda new to me
[21:28:07] No worries
[21:34:04] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:35:46] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:36:37] RainbowSprinkles i don't see why upstream have to limit cc to notedb. :)
[21:36:47] PROBLEM - Puppet errors on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:36:52] would be useful for us as then you don't have to be a reviewer to get updates
[21:36:52] Meh
[21:37:18] they even allow non gerrit users to be subscribed using the cc option.
[21:37:48] they also limit the hashtag feature too [21:38:26] they even have a picture of a hashtag (heh) [21:38:53] you can now move changes from one branch to another in gerrit using the rest api :) [21:41:17] 10Release-Engineering-Team, 10Wikidata, 10wikiba.se: Mirror wikibase/wikiba.se from Gerrit to Diffusion - https://phabricator.wikimedia.org/T171667#3472711 (10Krinkle) [21:49:52] 10Deployment-Systems, 10Release-Engineering-Team (Kanban), 10Scap (Scap3-Adoption-Phase1), 10scap2, and 3 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3472739 (10thcipriani) I guess in deploying with trebuchet any service restarts were hand... [21:52:35] thcipriani: Do you know if wmf.13/14 will ship? They're during Wikimania so it's traditional that we don't deploy, but there are blocker tasks extant… [21:56:57] James_F: they will, I'm the only one from RelEng going, and Ops has good coverage during that time as well. [21:57:30] (I confirmed with mark/faidon) [21:57:36] Do the people whose code is being deployed, though? [21:58:24] cost/benefit: each time we make a call someone always says it was wrong. 
Either "devs won't be around" or "2 weeks of backlogged code" or "but devs are all around at the hackathon anyway" [22:04:38] Deployment-Systems, Release-Engineering-Team (Next), Scap (Scap3-Adoption-Phase1), MediaWiki-JobRunner, Operations: Figure out how to disable starting of jobrunner/jobchron in the non-active DC - https://phabricator.wikimedia.org/T167104#3472824 (Krinkle) [22:06:44] Deployment-Systems, Release-Engineering-Team (Kanban), Scap (Scap3-Adoption-Phase1), scap2, and 3 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3472827 (Krinkle) [22:08:57] Gerrit, Diffusion, Phabricator, Wikidata, wikiba.se: Mirror wikibase/wikiba.se from Gerrit to Diffusion - https://phabricator.wikimedia.org/T171667#3472831 (greg) [22:09:40] Continuous-Integration-Infrastructure, Zuul: Add support for ecdsa keys in zuul - https://phabricator.wikimedia.org/T171165#3472848 (greg) [22:10:44] Deployment-Systems, Release-Engineering-Team (Kanban), Scap (Scap3-Adoption-Phase1), scap2, and 3 others: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148#3472851 (Krinkle) [22:11:03] Continuous-Integration-Config, Fundraising-Backlog, Wikimedia-Fundraising-CiviCRM: Find way to exclude php 5.4 files from vendor lint task - https://phabricator.wikimedia.org/T170641#3472852 (greg) [22:13:45] thcipriani: Thanks, I've created a task list in the task summary [22:13:47] LGTY? [22:13:59] * thcipriani looks [22:14:47] * thcipriani thumbs-up [22:14:51] looks good [22:16:03] > Create dsh group jobrunners-active [22:16:20] I'm not quite sure how to do that just yet, other than manually. I'll look into it. [22:20:37] greg-g: Yeah, I know. :-( [22:20:58] greg-g: I'll ping my teams reminding them not to merge scary stuff. [22:22:36] James_F: I'm writing the qunit mail at the moment.
[22:22:42] Just FYI [22:22:45] but landing it now is fine [22:22:47] Krinkle: Good. [22:22:59] Krinkle: Was midst writing to ping you in -dev. ;-) [22:23:10] Gerrit, Diffusion, Phabricator, Repository-Admins, and 2 others: Mirror wikibase/wikiba.se from Gerrit to Diffusion - https://phabricator.wikimedia.org/T171667#3472896 (MarcoAurelio) They can create the repo and set it to watch gerrit or github. [22:27:24] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Fix or remove Blubber's node_modules optimization - https://phabricator.wikimedia.org/T171632#3472903 (dduvall) >>! In T171632#3472421, @hashar wrote: > Could it be that `/tmp` is a `tmpfs` and thus moving files under / is actually a copy+del... [23:41:31] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3473212 (Paladox) Adding to the record we managed to fix Reception123 account. It was missing the gerrit: prefix in the entry in the external id...