[00:18:04] Release-Engineering-Team (Deployment-Blockers), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3250192 (TTO)
[00:26:42] marxarelli: dunno if you got the last msg, but repeating just in case: would it make sense to transfer blubber to wikimedia on gh?
[00:27:27] mobrovac: yeah we should probably move it to phab at this point
[00:27:47] oh that could work too i guess :)
[00:33:49] addshore: Thanks, looking into it now.
[00:34:39] Project beta-scap-eqiad build #154670: FAILURE in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154670/
[00:44:40] Project beta-scap-eqiad build #154671: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154671/
[00:52:16] Scap: scap did not catch `Notice: Undefined variable: wmgRelatedArticlesShowInSidebar in /srv/mediawiki/wmf-config/CommonSettings.php on line 2893` - https://phabricator.wikimedia.org/T164754#3250270 (bd808) The early timestamps are in the raw syslog logs on mwlog1001, so I don't think logstash is doing anyt...
[00:54:41] Project beta-scap-eqiad build #154672: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154672/
[00:59:34] Should fix soon, I had a typo in a scap change ^
[01:04:43] Project beta-scap-eqiad build #154673: STILL FAILING in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154673/
[01:14:41] Project beta-scap-eqiad build #154674: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154674/
[01:24:40] Project beta-scap-eqiad build #154675: STILL FAILING in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154675/
[01:28:59] grrr.
[01:35:40] Yippee, build fixed!
[01:35:41] Project beta-scap-eqiad build #154676: FIXED in 2 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/154676/
[01:54:57] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:58:32] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[02:13:00] Gitblit-Deprecate, MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3250358 (TerraCodes)
[02:26:22] Gitblit-Deprecate, MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3250369 (TerraCodes)
[06:06:41] Krinkle: any joy?
[06:10:54] well, looks like it is fixed!
[06:26:30] Project selenium-Wikibase » chrome,test,Linux,BrowserTests build #356: FAILURE in 1 hr 46 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=BrowserTests/356/
[06:48:16] Project selenium-Wikibase » chrome,beta,Linux,BrowserTests build #356: FAILURE in 2 hr 8 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/356/
[07:14:17] Gitblit-Deprecate, MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3250673 (TerraCodes)
[07:14:53] Gitblit-Deprecate, MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#2535563 (TerraCodes)
[07:18:31] Gitblit-Deprecate, MW-1.29-release (WMF-deploy-2017-04-25_(1.29.0-wmf.21)), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3250679 (TerraCodes)
[07:40:45] (CR) Hashar: [C: 2] "Jobs refreshed" [integration/config] - https://gerrit.wikimedia.org/r/352128 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[07:42:30] (Merged) jenkins-bot: Allow composer-test to run in different directory [integration/config] - https://gerrit.wikimedia.org/r/352128 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[08:04:31] (PS2) Hashar: mwext-testextension now runs composer test [integration/config] - https://gerrit.wikimedia.org/r/352160 (https://phabricator.wikimedia.org/T161895)
[08:04:57] !log merging 'composer test' into mwext-testextension-* jobs https://gerrit.wikimedia.org/r/#/c/352160/ - T161895
[08:05:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:05:01] T161895: For MediaWiki extensions, merge composer test into mwext-textextension / mediawiki-extensions jobs - https://phabricator.wikimedia.org/T161895
[08:06:59] (CR) Hashar: [C: 2] mwext-testextension now runs composer test [integration/config] - https://gerrit.wikimedia.org/r/352160 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[08:08:05] (Merged) jenkins-bot: mwext-testextension now runs composer test [integration/config] - https://gerrit.wikimedia.org/r/352160 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[09:45:07] Gerrit, MediaWiki-Vagrant: "index-pack failed" when installing new MediaWiki-Vagrant box - https://phabricator.wikimedia.org/T152801#3250962 (Tgr) >>! In T152801#3184281, @Lunnorey wrote: > I found solution. It work's for me. > >> Quick solution: >> >> With this kind of error, I usually start by raisin...
[10:04:26] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[10:14:52] Deployment-Systems, ArchCom-RfC, I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360#3251026 (Nikerabbit) >>! In T158360#3239476, @greg wrote: > It's not hard at all: it logs success/failure in the SAL after each run. Language team has received repo...
[10:29:29] (PS4) Hashar: Add some composer jobs for skins [integration/config] - https://gerrit.wikimedia.org/r/352347 (owner: Umherirrender)
[10:30:02] hashar: thanks for patches that fixes parralel lint but please put a throttle on the +2s or patches themselves. My patch in Wikibase is waiting for jenkins for 54 minutes now
[10:30:13] *parallel
[10:30:52] (CR) Hashar: [C: 2] "testextension job now runs 'composer test'. I have deployed the change this morning ( https://gerrit.wikimedia.org/r/#/c/352160/ )." [integration/config] - https://gerrit.wikimedia.org/r/352347 (owner: Umherirrender)
[10:31:40] Amir1: it is not like your patch has any kind of priority does it ?
[10:31:57] hashar: it does not but it's blocking my work
[10:33:10] Amir1: the spam of changes is almost done :)
[10:33:29] Thanks :)
[10:33:30] (Merged) jenkins-bot: Add some composer jobs for skins [integration/config] - https://gerrit.wikimedia.org/r/352347 (owner: Umherirrender)
[10:38:11] Deployment-Systems, ArchCom-RfC, I18n: RFC: Reevaluate LocalisationUpdate extension for WMF - https://phabricator.wikimedia.org/T158360#3251063 (Reedy) >>! In T158360#3251026, @Nikerabbit wrote: > Language team has received reports multiple times about LU being broken. What I proposed could help to i...
[10:39:27] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:45:56] PROBLEM - Puppet errors on deployment-imagescaler01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[10:53:09] Project beta-code-update-eqiad build #155077: FAILURE in 8.4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155077/
[10:59:32] PROBLEM - Puppet errors on deployment-pdf01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:03:11] Project beta-code-update-eqiad build #155078: STILL FAILING in 10 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155078/
[11:13:09] Project beta-code-update-eqiad build #155079: STILL FAILING in 8.6 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155079/
[11:23:04] Project beta-code-update-eqiad build #155080: STILL FAILING in 4.3 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155080/
[11:33:05] Project beta-code-update-eqiad build #155081: STILL FAILING in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155081/
[11:43:05] Project beta-code-update-eqiad build #155082: STILL FAILING in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155082/
[11:53:04] Project beta-code-update-eqiad build #155083: STILL FAILING in 4.4 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155083/
[12:03:09] Project beta-code-update-eqiad build #155084: STILL FAILING in 8.9 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155084/
[12:03:50] (PS1) Hashar: Generic tests for new extensions [integration/config] - https://gerrit.wikimedia.org/r/353052
[12:13:05] Project beta-code-update-eqiad build #155085: STILL FAILING in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155085/
[12:15:10] (CR) Hashar: [C: 2] Generic tests for new extensions [integration/config] - https://gerrit.wikimedia.org/r/353052 (owner: Hashar)
[12:16:09] (Merged) jenkins-bot: Generic tests for new extensions [integration/config] - https://gerrit.wikimedia.org/r/353052 (owner: Hashar)
[12:23:05] Project beta-code-update-eqiad build #155086: STILL FAILING in 4.6 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155086/
[12:33:09] Project beta-code-update-eqiad build #155087: STILL FAILING in 8.5 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155087/
[12:35:20] !log deployment-prep: git -C /srv/mediawiki-staging/php-master/extensions rm --cached SemanticFormsInputs
[12:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:36:21] Yippee, build fixed!
[12:36:21] Project beta-code-update-eqiad build #155088: FIXED in 41 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/155088/
[14:34:58] !log cherry pick gerrit/352582 to puppet master
[14:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:39:29] Gerrit, SyntaxHighlight: Rename git repo to "SyntaxHighlight" - https://phabricator.wikimedia.org/T103614#1411929 (TheDJ)
[14:45:49] PROBLEM - Puppet errors on deployment-eventlogging03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:46:04] just fixed it --^
[14:49:30] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[14:55:51] RECOVERY - Puppet errors on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:05:28] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:08:11] PROBLEM - Puppet errors on deployment-aqs02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:09:19] PROBLEM - Puppet errors on deployment-cache-upload04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:15:12] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:26:09] !log refresh cherry pick gerrit/352582 on puppet master (rebase -i to remove, then cherry pick)
[15:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:50:23] (CR) Umherirrender: "Thanks for the big change" [integration/config] - https://gerrit.wikimedia.org/r/352347 (owner: Umherirrender)
[15:53:38] Yippee, build fixed!
[15:53:39] Project selenium-MobileFrontend » firefox,beta,Linux,BrowserTests build #419: FIXED in 31 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/419/
[16:11:50] PROBLEM - Puppet errors on deployment-eventlogging03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:15:47] Deployment-Systems, Scap (Scap3-MediaWiki-MVP), Operations, Patch-For-Review, User-Joe: Install conftool on deployment masters - https://phabricator.wikimedia.org/T163565#3252249 (demon) Oh, duh other total obvious usecase I forgot: being able to pull our target list from etcd directly, inste...
[17:05:40] PROBLEM - Puppet errors on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:05:44] PROBLEM - Puppet errors on swift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:07:18] PROBLEM - Puppet errors on swift-storage-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:41:12] marxarelli: would syncing up about blubber and service-runner tomorrow around 9:30 PDT work for you?
[17:41:20] if it's too early, we can do it in the afternoon
[17:41:33] mobrovac: works for me!
[17:41:58] kk
[17:42:02] mobrovac: alarm set :)
[17:42:19] :)))))
[17:42:40] (CR) Umherirrender: "In my opinion three repos now missing composer jobs" (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/352160 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[17:44:21] MediaWiki-Releasing, MW-1.27-release, MW-1.28-release: Patch for 1.27.3/1.28.2 missing - https://phabricator.wikimedia.org/T164470#3252632 (demon) Open>Resolved a:demon Created patches: == 1.27 with signature == * https://releases.wikimedia.org/mediawiki/1.27/syntaxhighlightgeshi-1.27.3....
[17:51:51] Gerrit, Release-Engineering-Team: Support redis as a cache store - https://phabricator.wikimedia.org/T152802#3252691 (demon) a:demon>None Not actively working on this right now.
[17:57:57] (PS2) Chad: Make-release: Drop stupid remote update code path [tools/release] - https://gerrit.wikimedia.org/r/352894
[17:58:09] !log deployment-tin: deleting puppet lock file (claimed it was running but also didnt run since > 900 min), looking at fixing deployment::server role name change
[17:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:59:36] (CR) Chad: [C: 2] Make-release: Drop stupid remote update code path [tools/release] - https://gerrit.wikimedia.org/r/352894 (owner: Chad)
[18:00:53] (Merged) jenkins-bot: Make-release: Drop stupid remote update code path [tools/release] - https://gerrit.wikimedia.org/r/352894 (owner: Chad)
[18:01:29] (PS2) Chad: make-release: Remove --gitroot, it's dumb [tools/release] - https://gerrit.wikimedia.org/r/352896
[18:01:58] (Abandoned) Chad: make-release: Collapse the different versions into 1.23+ [tools/release] - https://gerrit.wikimedia.org/r/352904 (owner: Chad)
[18:05:54] !log deployment-tin: configure to use role::deployment_server (instead of deployment::server), for some reason now Horizon shows _nothing_ under "other classes" where this was before
[18:05:57] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:08:39] (CR) Chad: [C: 2] make-release: Remove --gitroot, it's dumb [tools/release] - https://gerrit.wikimedia.org/r/352896 (owner: Chad)
[18:11:46] (Merged) jenkins-bot: make-release: Remove --gitroot, it's dumb [tools/release] - https://gerrit.wikimedia.org/r/352896 (owner: Chad)
[18:12:09] !log deployment-tin: puppet run now ok, except ":Upload/File[/var/lib/releases/.ssh/id_rsa.bromine.eqiad.wmnet]: Could not evaluate:" this should be an unrelated issue
[18:12:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:26:56] (PS4) Chad: Simplify config, remove additions/removal upto REL1_23 [tools/release] - https://gerrit.wikimedia.org/r/343589 (owner: Reedy)
[18:28:38] !log deployment-mira: configure puppet config in horizon, remove "role::deployment::server", use correct new name "role::deployment_server" (moved to profile). (a bit tricky because then in Horizon it seems to disappear from the "others" section, but if you click the "all" tab you get to see the class names
[18:28:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:32:09] !log deployment-tin/mira: the change of the role class name was because of https://gerrit.wikimedia.org/r/#/c/344728/ which moved deployment::server to profile/role structure. both instances configured accordingly now. the remaining issue with "id_rsa.bromine" should be all unrelated
[18:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[19:05:07] Continuous-Integration-Infrastructure: Allow use of ext-gmp on CI for composer tests - https://phabricator.wikimedia.org/T164977#3252974 (Umherirrender)
[19:20:09] (CR) Umherirrender: "mediawiki/extensions/Wikidata needs a test entry now - https://gerrit.wikimedia.org/r/#/c/353028/" [integration/config] - https://gerrit.wikimedia.org/r/352160 (https://phabricator.wikimedia.org/T161895) (owner: Hashar)
[19:20:37] Release-Engineering-Team, Phabricator, Project-Admins, Release: Decide whether to continue using deployment blocker tasks or combine them with the release milestones - https://phabricator.wikimedia.org/T164978#3253016 (mmodell)
[19:21:34] Release-Engineering-Team, Phabricator, Project-Admins, Release: Decide whether to continue using deployment blocker tasks or combine them with the release milestones - https://phabricator.wikimedia.org/T164978#3253031 (mmodell)
[19:23:50] Release-Engineering-Team, Phabricator, Project-Admins, Release: Decide whether to continue using deployment blocker tasks or combine them with the release milestones - https://phabricator.wikimedia.org/T164978#3253016 (mmodell) p:Triage>Normal
[19:25:09] Release-Engineering-Team, Phabricator, Project-Admins, Release: Decide whether to continue using deployment blocker tasks or combine them with the release milestones - https://phabricator.wikimedia.org/T164978#3253016 (Zppix) I personally believe that Task is better it has a status option, whethe...
[19:26:11] Scap: scap should always announce when it starts changing the cluster state - https://phabricator.wikimedia.org/T164980#3253066 (thcipriani)
[19:27:39] Scap: scap should always announce when it halts a sync due to error rate - https://phabricator.wikimedia.org/T164981#3253080 (thcipriani)
[19:28:01] Scap: scap should always announce when it halts a sync due to error rate - https://phabricator.wikimedia.org/T164981#3253096 (thcipriani) p:Triage>Normal
[19:29:58] Scap: scap did not catch `Notice: Undefined variable: wmgRelatedArticlesShowInSidebar in /srv/mediawiki/wmf-config/CommonSettings.php on line 2893` - https://phabricator.wikimedia.org/T164754#3253102 (mmodell) >>! In T164754#3250270, @bd808 wrote: > Recommendations: > * scap should always `announce` when it...
[19:31:26] Scap: scap: investigate adding a per-deploy message ID to log messages - https://phabricator.wikimedia.org/T164982#3253106 (thcipriani)
[19:33:16] twentyafterfour: apology unneeded, but accepted. I was in a rabbit hunting mood yesterday
[19:34:17] bd808: nevertheless, good work. It had both thcipriani and myself scratching our heads for a a while...
[19:34:31] I also discovered that my python script for looking at scap's logs didn't make it to mwlog1001 from fluorine. I need to write a new one I guess
[19:35:09] yeah. when I found the sync starting and then no finish for a really long time I got interested.
[19:36:12] Scap: scap did not catch `Notice: Undefined variable: wmgRelatedArticlesShowInSidebar in /srv/mediawiki/wmf-config/CommonSettings.php on line 2893` - https://phabricator.wikimedia.org/T164754#3253127 (thcipriani) Open>Invalid @bd808 thank you for the masterful grepping! I was really confused about ho...
[19:37:11] bd808: +1 thanks for the assist, I was very confused about what was going on with logs in that instance :)
[19:37:26] logs are sneaky beasts
[19:38:18] logstash is useful for many things, but digging through the mass of messages that come out of a scap can be hard there
[19:38:55] this is truth, also hard on the laptop fans :)
[19:39:16] I had so many logstash tabs open yesterday
[19:40:12] somehow elastic.co keeps making kibana less useful for this kind of forensic dive
[19:40:38] *shrug* I guess they are making nicer visualizations at the same time
[19:40:41] looks like bubblegum now, so it's got that going for it.
[19:40:53] Miami Vice!
[19:40:58] :)
[19:41:20] I keep waiting for Crocket and Tubbs to pop up as a clippy
[19:41:52] lol
[19:42:25] adding a uuid to the messages would make kibana diving easier for sure
[19:42:26] those colors clash badly with the dark theme
[19:42:37] bd808: yeah that's a great idea
[19:43:03] and scap really should alert in a few more places, it's silent in a couple of places where it should be noisy
[19:43:05] scap message definitely don't visualize very usefully it seems, much easier to grep on mwlog1001 it seems. The uuid one is definitely one to explore. Unclear at this point how hard it would be to add, but it's in the backlog.
[19:43:30] I *think* it would be trivial
[19:43:35] thcipriani: seems like it would be easy - just generate a random id at the start of each scap run and use it in the logger
[19:43:35] at least per-host
[19:43:54] we could pass the id from tin to the targets so that they all have a shared id
[19:44:25] yeah. that bit has more moving parts. but we do have global cli args so probably not too horrible
[19:44:55] I'll leave it to the scap experts though. I'm just a manager now
[19:44:57] ;)
[19:45:36] how convenient :P
[19:46:44] the error rate explosion message probably got lost in a stack-trace in this instance, I'm going to add an announce there, that should be easy and probably would have caught this specific case. Other tasks from this are all in the backlog now.
[19:50:31] uhm, hi
[19:50:40] so i'm trying to fix puppet on deployment-mira/tin
[19:50:52] and i did but then there was another eror
[19:51:04] and then i thought i found how to fix that.. but it doesnt fix it :p
[19:51:30] i have now: Could not retrieve information from environment production source(s) puppet:///private/releases/id_rsa.upload
[19:51:42] Release-Engineering-Team (Deployment-Blockers), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3253181 (mmodell)
[19:51:43] and so i added a fake key in labs/private, totally expecting to fix that
[19:51:54] but .. after https://gerrit.wikimedia.org/r/#/c/353160/ it still does not find them
[19:52:31] has the deployment-prep puppetmaster caught that update?
[19:53:02] i am not sure, what i did was upload to labs/private in gerrit and merge that
[19:53:05] but nothing else
[19:53:29] should i just wait a moment
[19:53:31] Dosen't it take a while for puppetmaster to get caught up with updates?
[19:53:31] "b0ca8f6 add private/files/releases/id_rsa.upload FAKE secret key" its there
[19:54:17] Release-Engineering-Team (Deployment-Blockers), MW-1.30-release-notes (WMF-deploy-2017-05-09_(1.30.0-wmf.1)), Release: MW-1.30.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T162954#3181337 (mmodell)
[19:54:29] running puppet again on deployment-mira
[19:54:43] yay, there it is
[19:54:49] all it was was short delay. cool
[19:54:59] that update runs on a cron
[19:55:07] aha, gotcha
[19:55:13] I think it only happens every 10 minutes or something like that
[19:56:12] such memory: */10 * * * * /usr/local/bin/git-sync-upstream
[19:56:22] so fyi, i moved the deployment server role to a profile/role structure in prod. that included a rename of the role::deployment::server to role::deployment_server. so i went to Horizon and changed the instance config to reflect that
[19:56:35] ugh.
[19:56:36] nice!
[19:56:38] it was a little weird because before it was in "other classes"
[19:56:44] in Horizon web ui
[19:56:48] or at least, nice that you updated it to match prod
[19:56:53] ok. that will break my striker project :)
[19:57:04] and then i edited it and it disappeared from "other"
[19:57:08] but it appears in "all" tab
[19:57:17] i dont know why that is but paladox showed me
[19:57:34] then after that was fixed, i saw the issue with the id_rsa
[19:57:56] so it was also broken unrelatedly from reprepro::upload. then added the fake key as above
[19:58:15] now there is one more thing left. that is "IPv6" because we dont have it in labs
[19:58:33] someday!
[19:58:34] bd808 does your project use labs or is it on a prod machine. Reason i ask is you will most likly hit an ipv6 error if you update the class to the one mutante said. You will need https://gerrit.wikimedia.org/r/#/c/353095/ to fix the ipv6 problem.
[19:59:09] bd808: sorry, if i still had "watroles" i could have checked what else uses it
[19:59:37] when doing the profile conversion in review i was told to just use the underscore and one level
[19:59:42] yeah. I think recreating watroles may be a hackathon project for the cloud services team :)
[19:59:49] so that caused the change in role name
[19:59:58] not so much the fact that we have a profile now
[20:00:18] * bd808 is still not sold on the profile thing
[20:00:27] and yea, it is more like prod but it still also includes other roles, not _just_ deployment server
[20:00:44] so it's not really the same
[20:01:27] but getting there step by step i guess
[20:03:14] now how to fix/skip the IPv6 thing without an "if $realm" ..
[20:04:29] mutante https://gerrit.wikimedia.org/r/#/c/353095/
[20:04:49] thats the fix for labs
[20:05:07] !:)
[20:05:43] i see... eh, yea. i moved some of those the toher direction
[20:06:15] yep
[20:07:18] Release-Engineering-Team, Phabricator, Project-Admins, Release: Decide whether to continue using deployment blocker tasks or combine them with the release milestones - https://phabricator.wikimedia.org/T164978#3253236 (Jdforrester-WMF) I'm content to keep the current behaviour. I'd worry about se...
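A minimal sketch of the per-deploy message ID idea from the 19:42-19:44 exchange (T164982): generate one random ID at the start of a run and stamp it on every log record. It uses only Python's standard logging and uuid modules. It is not scap's actual logger setup, and DeployIdFilter, make_deploy_logger and the "deploy-sketch" logger name are hypothetical names chosen for illustration.

```python
import logging
import uuid


class DeployIdFilter(logging.Filter):
    """Annotate every record passing through with one shared deploy ID."""

    def __init__(self, deploy_id):
        super().__init__()
        self.deploy_id = deploy_id

    def filter(self, record):
        record.deploy_id = self.deploy_id
        return True  # never drop records, only annotate them


def make_deploy_logger(name="deploy-sketch", deploy_id=None):
    # Generate a fresh random ID unless one was handed in, e.g. passed from
    # the deploy host to a target (how it is passed is an assumption here).
    deploy_id = deploy_id or uuid.uuid4().hex
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addFilter(DeployIdFilter(deploy_id))
    return logger, deploy_id


if __name__ == "__main__":
    logger, deploy_id = make_deploy_logger()
    handler = logging.StreamHandler()
    # The formatter can reference deploy_id because the filter sets it.
    handler.setFormatter(logging.Formatter("[%(deploy_id)s] %(message)s"))
    logger.addHandler(handler)
    logger.info("sync started")
    logger.info("sync finished")
```

Handing the same deploy_id to the targets would give all hosts a shared ID to search for in kibana or grep for on mwlog1001.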
[20:14:22] compiling that, will merge it
[20:19:42] uhm http://puppet-compiler.wmflabs.org/6378/radium.wikimedia.org/
[20:19:55] was hoping for no-op
[20:22:18] mutante also note it depends on other changes
[20:22:50] depends on https://gerrit.wikimedia.org/r/#/c/353095/ and https://gerrit.wikimedia.org/r/#/c/345568/5 and https://gerrit.wikimedia.org/r/#/c/350767/4
[20:23:46] i was gonna say, yea he is working on refactoring the whole IPv6 thing
[20:24:54] yep
[20:27:46] i'll try to fix only for the deployment_server role while leaving the other ones untouched
[20:28:34] !log Update mobileapps to 75b135e
[20:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:30:09] ok
[20:30:50] twentyafterfour: regarding https://phabricator.wikimedia.org/T164984, I think I know why it's happening but I can't find it in logstash
[20:39:38] fixed puppet on eventlog03 sorry
[20:43:06] Amir1: I rolled back so you won't find it currently in logstash
[20:43:21] twentyafterfour: I checked the last 24 hours
[20:43:27] and still wasn't able to see it
[20:43:30] hmm
[20:44:32] Amir1: https://logstash.wikimedia.org/goto/1a9fe4a00a42c129edd7ff651a034b1b
[20:45:04] Thanks
[20:51:49] RECOVERY - Puppet errors on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:53:22] bd808, thcipriani - if we have scap !log the uniqueid then we could use that to jump directly from the SAL to a kibana dashboard for a given deployment
[20:54:47] would be dope
[20:59:44] paladox: same thing but as real no-op http://puppet-compiler.wmflabs.org/6380/
[21:00:04] ok
[21:00:08] thanks :)
[21:04:25] runs puppet on all deployment servers, prod and labs
[21:04:40] prod is unaffected
[21:06:10] same as earlier, just gotta wait for cron
[21:06:28] ok
[21:10:42] Scap: scap should always announce when it halts a sync due to error rate - https://phabricator.wikimedia.org/T164981#3253454 (thcipriani)
[21:16:23] shinken-wm: tell us about the good news please?
[21:16:33] paladox: ^ it _should_ recover i say
[21:16:50] Ok
[21:16:51] thanks
[21:16:53] deployment-tin/mira both finished without errors now :)
[21:17:10] still shows
[21:17:10] deployment-mira Puppet errors CRITICAL19h 22m CRITICAL: 90.00% of data above the critical threshold [0.0]
[21:17:35] ah that percentage. yea, 90% is not 100% i guess
[21:18:06] it uses graphite data to check afaict
[21:18:18] oh
[21:23:31] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[21:23:43] mutante ^^ :)
[21:24:05] twentyafterfour: bd808 ^ all fixed, what i broke and more
[21:24:25] nice
[21:25:00] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[22:07:58] heh @ https://gerrit.wikimedia.org/r/#/c/353094/2/modules/profile/manifests/mediawiki/deployment/server.pp
[22:08:21] surprised it didnt break in other ways
[22:15:55] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins: Upgrade jenkins server and jenkins slaves to java 8 - https://phabricator.wikimedia.org/T162828#3253620 (Paladox) @hashar it seems openjdk 8 is available on apt.wikimedia.org per T121020 and h...
[22:54:11] 22:27:27 oojs/oojs-ui: 0.21.2 installed, 0.21.3 required.
[22:54:12] 22:27:27 Error: your composer.lock file is not up to date. Run "composer update" to install newer dependencies
[22:54:33] what would cause that? my change doesn't touch anything related to oojs-ui
[22:54:46] https://integration.wikimedia.org/ci/job/mediawiki-extensions-php55-trusty/3447/console
[22:58:52] twentyafterfour https://gerrit.wikimedia.org/r/#/q/owner:%22VolkerE+%253Cvolker.e%2540wikimedia.org%253E%22
[22:58:54] does it happen on all changes? it sounds like it probably does
[22:59:03] they did a update a around 2pm utc +1
[23:00:33] looks like it was transient, recheck went through cleanly
[23:01:14] ok :)
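On the 21:17 exchange about deployment-mira still reading 90.00% right after puppet was fixed: the alert text suggests the check reports the share of recent datapoints above the threshold, so older failing runs keep counting until they age out of the window. A rough sketch of that arithmetic follows. The window contents, the function names and the 1% OK cutoff (borrowed from the wording of the RECOVERY messages) are assumptions, not the actual shinken/graphite check.

```python
def percent_above(datapoints, threshold=0.0):
    """Percentage of non-null datapoints strictly above the threshold."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    over = sum(1 for v in values if v > threshold)
    return 100.0 * over / len(values)


def status(datapoints, threshold=0.0, ok_below=1.0):
    # ok_below mirrors "Less than 1.00% above the threshold" in the log;
    # the real alerting cutoff is not stated there and is assumed here.
    pct = percent_above(datapoints, threshold)
    return ("OK" if pct < ok_below else "CRITICAL"), pct


if __name__ == "__main__":
    # Nine failed puppet runs followed by one clean run in a 10-point window.
    window = [1] * 9 + [0]
    print(status(window))    # one clean run is not enough to clear the alert
    print(status([0] * 10))  # once every failing point has aged out
```

Under these assumptions a single clean run only moves the figure from 100% to 90%, which matches the "90% is not 100%" remark.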