[06:09:40] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<55.56%)
[06:49:40] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:52:25] (PS1) Hashar: rcstream has been removed [integration/config] - https://gerrit.wikimedia.org/r/361413
[07:56:35] (CR) Hashar: [C: 2] rcstream has been removed [integration/config] - https://gerrit.wikimedia.org/r/361413 (owner: Hashar)
[07:58:39] (Merged) jenkins-bot: rcstream has been removed [integration/config] - https://gerrit.wikimedia.org/r/361413 (owner: Hashar)
[08:18:10] Beta-Cluster-Infrastructure, MediaWiki-extensions-Newsletter, Patch-For-Review, Wikimedia-Hackathon-2016: Add the Newsletter extension to the Beta Cluster - https://phabricator.wikimedia.org/T127297#3377761 (Bawolff)
[08:47:22] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[09:19:21] !log gerrit: marked wikimedia/bugzilla/* repos read-only
[09:19:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:22:21] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:49:48] Beta-Cluster-Infrastructure, Scap (Scap3-Adoption-Phase1), scap2, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#3378114 (hashar) Note: puppet is disabled on `deployment-aqs01` since June 8th though there is no reason given. The last Puppet run was a...
[10:13:00] Beta-Cluster-Infrastructure, Scap (Scap3-Adoption-Phase1), scap2, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#3378202 (elukey) >>! In T116206#3378114, @hashar wrote: > Note: puppet is disabled on `deployment-aqs01` since June 8th though there is no reas...
[10:23:43] RECOVERY - Puppet staleness on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[10:31:03] elukey: deployment-aqs01 now has a happy puppet ^^^ Thank you!
[10:32:46] sorry for the lag, forgot about it :(
[10:52:22] elukey: no worries :]
[12:55:56] Gerrit, Upstream: Change Gerrit's default indentation to tabs - https://phabricator.wikimedia.org/T168590#3378733 (Paladox)
[13:11:59] Beta-Cluster-Infrastructure, Deployment-Systems, Release-Engineering-Team (Next): deployment-imagescaler01 has no mwdeploy user - https://phabricator.wikimedia.org/T166013#3378764 (hashar) Apparently that comes from the cherry picked patch: **Add 3d2png deploy repo to image scalers** https://gerrit.w...
[14:02:07] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Labs, Labs-Infrastructure, and 2 others: Lower rate of Nodepool requests to OpenStack API - https://phabricator.wikimedia.org/T167803#3344784 (hashar) a:hashar
[14:04:04] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Labs, Labs-Infrastructure, and 2 others: Lower rate of Nodepool requests to OpenStack API - https://phabricator.wikimedia.org/T167803#3378947 (hashar) Keeping it open for monitoring. The OpenStack API might be struggling wi...
[14:55:41] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:14:23] Gerrit, Differential, Repository-Admins: Separate Diffusion's rANWS from gerrit's analytics-wikistats - https://phabricator.wikimedia.org/T168549#3379216 (Milimetric) No, we don't want to reverse the replication, the gerrit one is still semi-active. This is a tough situation. The new code is going...
[15:16:03] Gerrit, Differential, Repository-Admins: Separate Diffusion's rANWS from gerrit's analytics-wikistats - https://phabricator.wikimedia.org/T168549#3379233 (Milimetric) or we could make a new one called ANW2 if that's better for you
[15:25:47] (PS1) Hashar: Remove packages from Trusty instances [integration/config] - https://gerrit.wikimedia.org/r/361469
[15:33:22] (CR) Hashar: [C: 2] Remove packages from Trusty instances [integration/config] - https://gerrit.wikimedia.org/r/361469 (owner: Hashar)
[15:34:18] !log Rebuilding nodepool image for trusty and regenerating snapshots
[15:34:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:34:51] (Merged) jenkins-bot: Remove packages from Trusty instances [integration/config] - https://gerrit.wikimedia.org/r/361469 (owner: Hashar)
[15:35:40] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[15:41:11] !log Image snapshot-ci-trusty-1498491445 in wmflabs-eqiad is ready
[15:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:41:48] Release-Engineering-Team, Jenkins: Upgrade jenkins to 2.60.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3379337 (Paladox)
[15:41:59] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Jenkins, Patch-For-Review: Upgrade jenkins server and jenkins slaves to java 8 - https://phabricator.wikimedia.org/T162828#3379339 (Paladox)
[15:45:31] Yippee, build fixed!
[15:45:32] Project selenium-MobileFrontend » chrome,beta,Linux,BrowserTests build #468: FIXED in 23 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/468/
[16:26:09] Release-Engineering-Team, Jenkins: Upgrade jenkins to 2.60.1 (new lts release) - https://phabricator.wikimedia.org/T168644#3379614 (Nuria)
[16:26:11] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Release-Engineering-Team (Backlog), Jenkins, Patch-For-Review: Upgrade jenkins server and jenkins slaves to java 8 - https://phabricator.wikimedia.org/T162828#3379615 (Nuria)
[16:32:16] (Draft1) Paladox: Migrate analytics tests from java 7 to java 8 [integration/config] - https://gerrit.wikimedia.org/r/361482 (https://phabricator.wikimedia.org/T168644)
[16:32:19] (PS2) Paladox: Migrate analytics tests from java 7 to java 8 [integration/config] - https://gerrit.wikimedia.org/r/361482 (https://phabricator.wikimedia.org/T168644)
[16:38:44] (PS3) Paladox: Migrate analytics tests from java 7 to java 8 [integration/config] - https://gerrit.wikimedia.org/r/361482 (https://phabricator.wikimedia.org/T168644)
[16:40:52] Release-Engineering-Team (Kanban), Phabricator: Exception when doing Phabricator search - https://phabricator.wikimedia.org/T168556#3379719 (mmodell) I've disabled search on the new custom fields, for now, to work around this problem.
[16:41:11] Release-Engineering-Team (Kanban), Phabricator: Exception when doing Phabricator search - https://phabricator.wikimedia.org/T168556#3379722 (mmodell) p:High>Normal
[16:41:52] Release-Engineering-Team (Kanban), Phabricator (Upstream), Upstream: Exception when doing Phabricator search - https://phabricator.wikimedia.org/T168556#3368238 (mmodell)
[16:44:31] Release-Engineering-Team (Kanban), MediaWiki-General-or-Unknown, MW-1.28-release (WMF-deploy-2016-10-25_(1.28.0-wmf.23)), Patch-For-Review, Wikimedia-log-errors: logspam from mediawiki - LoadBalancer::{closure}: found writes/callbacks pending. - https://phabricator.wikimedia.org/T149353#3379737 (...
[17:10:23] (CR) Nuria: "I run tests locally with java8 on Mac so I do not think there are any problems with this switch." [integration/config] - https://gerrit.wikimedia.org/r/361482 (https://phabricator.wikimedia.org/T168644) (owner: Paladox)
[17:10:41] (CR) Nuria: [C: 1] Migrate analytics tests from java 7 to java 8 [integration/config] - https://gerrit.wikimedia.org/r/361482 (https://phabricator.wikimedia.org/T168644) (owner: Paladox)
[17:39:13] Release-Engineering-Team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), Patch-For-Review, Release, Train Deployments: MW-1.30.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T167535#3379951 (mmodell)
[17:58:44] Release-Engineering-Team (Kanban), Release, Train Deployments: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536#3380040 (mmodell)
[17:58:47] Release-Engineering-Team (Kanban), MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), Patch-For-Review, Release, Train Deployments: MW-1.30.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T167535#3380041 (mmodell)
[18:02:52] Release-Engineering-Team (Kanban), Release, Train Deployments: MW-1.30.0-wmf.7 deployment blockers - https://phabricator.wikimedia.org/T167536#3380059 (mmodell)
[18:06:31] hashar: nodejs is missing on new images? Core commits failing
[18:08:11] Krinkle which changes?
[18:08:20] Are they using trusty?
[18:08:52] ripts/bin/mw-fetch-composer-dev.sh: line 11: node: command not found
[18:08:53] Yes
[18:08:59] thanks
[18:09:07] hashar did some cleanup work with trusty today
[18:09:10] All mw jobs on trusty are broken
[18:09:15] Krinkle https://gerrit.wikimedia.org/r/361469
[18:10:26] twentyafterfour: can we revisit the issue of comments-on-phame? It sounded like you understand what was happening but the discussion trailed off...
[18:10:35] Or I can make a bug if there isn't one already
[18:10:42] (PS1) Paladox: Revert "Remove packages from Trusty instances" [integration/config] - https://gerrit.wikimedia.org/r/361495
[18:10:42] Please revert the image ASAP. It was intended as no-op so rollback is trivial.
[18:11:18] Even if that is merged it requires someone to build the image.
[18:11:24] andrewbogott: The issue is that to allow comments we have to allow editing of the blog
[18:11:39] I thought I fixed it already
[18:11:46] Yes, or better yet, restore previous image. No need to revert per se (can do later)
[18:12:07] twentyafterfour: maybe you did, I'm just trying to follow up without really knowing anything.
[18:12:29] What did 'fixing it' consist of? Presumably not allowing anyone to edit...
[18:12:45] twentyafterfour: RainbowSprinkles: do either of you have access to nodepool? Needs to be fixed as nothing can be tested/merged right now
[18:13:08] Krinkle: I have access but could use more info about what's happening
[18:13:28] andrewbogott the tests are failing with node command
[18:13:37] + /srv/deployment/integration/slave-scripts/bin/mw-fetch-composer-dev.sh
[18:13:37] 17:34:30 /srv/deployment/integration/slave-scripts/bin/mw-fetch-composer-dev.sh: line 11: node: command not found
[18:13:46] andrewbogott: hashar pushed a meant-to-be-noop commit to ci config
[18:14:06] Removes some packages from the puppet config
[18:14:25] But was not well tested and there are jobs breaking now, all mw jobs basically
[18:14:30] Just revert and rebuild
[18:15:06] ok, I don't know what 'build' consists of but I'll start with the revert
[18:15:26] andrewbogott: should be fixed, I just checked all of the blogs and the only one that I can't fix is https://phabricator.wikimedia.org/phame/blog/view/7/ because I don't have access
[18:15:30] oh, hm, I don't have +2 on that repo
[18:15:42] twentyafterfour: great! I will stop worrying then, thank you
[18:16:02] Krinkle: access to nodepool in what sense
[18:16:13] I can merge the commit. But I don't have access nor know-how to rebuild a nodepool image snapshot and to promote it somehow
[18:16:22] See RelEng admin log
[18:16:34] Krinkle: I've done it once before but it's been a while
[18:16:38] Doesn't include the commands. But I assume it is documented somewhere
[18:17:43] https://wikitech.wikimedia.org/wiki/Nodepool
[18:17:57] (PS1) Krinkle: Revert "Remove packages from Trusty instances" [integration/config] - https://gerrit.wikimedia.org/r/361496
[18:18:04] (CR) Krinkle: [C: 2] Revert "Remove packages from Trusty instances" [integration/config] - https://gerrit.wikimedia.org/r/361496 (owner: Krinkle)
[18:18:08] "Diskimage building requires root access hence they are build manually on labs instance, uploaded to labnodepool1001"
[18:18:13] (Abandoned) Krinkle: Revert "Remove packages from Trusty instances" [integration/config] - https://gerrit.wikimedia.org/r/361495 (owner: Paladox)
[18:18:27] https://wikitech.wikimedia.org/wiki/Nodepool#Diskimage
[18:18:51] This is for the image of ci-trusty
[18:19:14] same steps apply for trusty
[18:19:22] ok I see an old trusty image
[18:19:28] just those are documented for jessie, but can be applied for trusty.
[18:19:28] image-ci-trusty-old_20170626
[18:19:29] (Merged) jenkins-bot: Revert "Remove packages from Trusty instances" [integration/config] - https://gerrit.wikimedia.org/r/361496 (owner: Krinkle)
[18:22:10] Krinkle: ok I think that I got it
[18:22:27] I don't think images are still being built on seemingly random labs project in unrelated projects. If I recall correctly, Antoine polished the process to use a dedicated snapshot host for it. Probably in contintcloud or in integration project somewhere.
[18:22:33] !log reverted nodepool image-ci-trusty to previous version 'image-ci-trusty-old_20170626'
[18:22:34] But yeah, the old image should still be there, which is easier to restore
[18:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:23:26] !log renamed previously active image to 'image-ci-trusty_bad_20170626'
[18:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:23:40] Thanks. Testing now..
[18:30:26] twentyafterfour: Hm.. hard to tell if this failure was from a standby instance from earlier or not.
[18:30:28] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-php55-trusty/5195/console
[18:30:34] This started 1 min ago, and still fails.
[18:30:41] Let's wait a few more minutes for other instances
[18:34:31] yep, this time before the instance I also saw a new data point for "Launch time" at https://grafana.wikimedia.org/dashboard/db/nodepool?orgId=1&from=now-1h&to=now which means this will have been a fresh one
[18:35:03] .. and it still failed
[18:36:16] OK. One more, and then I'll give up on this one.
[18:36:25] hello
[18:37:17] 00:00:18.868 /srv/deployment/integration/slave-scripts/bin/mw-fetch-composer-dev.sh: line 11: node: command not found
[18:37:18] grbmbmbl
[18:37:23] I forgot about that script :(((
[18:37:34] hashar: also oojs-ui has a trusty job that needs node
[18:37:46] hashar: twentyafterfour tried to restore the _old image, but seems like it didn't work
[18:37:50] still failing
[18:38:03] I did merge the revert for your commit, so I guess it will eventually rebuild automatically, but don't know when that happens.
[18:38:09] one has to purge the instance that got spawned with the previous image
[18:38:11] I'll let you take over :) - Good luck.
[18:38:23] thanks Krinkle! and sorry for the mess :(
[18:39:33] twentyafterfour: so it seems you have renamed the base image which is great
[18:39:50] the next step is to recreate the snapshot
[18:40:22] nodepool$ nodepool image-update wmflabs-eqiad snapshot-ci-trusty
[18:44:10] !log nodepool image-delete 1636 # Deletes snapshot-ci-trusty-1498491445 which lack nodejs when we still need it.
[18:44:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:50:54] twentyafterfour: Krinkle: should be good now
[19:22:48] thanks hashar
[19:29:38] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3380242 (Paladox) @demon would you be able to have a look in the table to see if it's like the other please? So that i can bring the findings up...
[19:40:25] twentyafterfour: I should have double checked that one after deployment :/
[20:08:01] PROBLEM - Puppet errors on jenkinstest is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[20:18:00] RECOVERY - Puppet errors on jenkinstest is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:38] !log Update mobileapps to 0b05026
[20:33:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:45:31] Release-Engineering-Team (Kanban), Release Pipeline (Blubber): Support environment variables in configuration - https://phabricator.wikimedia.org/T168425#3380440 (dduvall) Open>Resolved
[20:58:38] (PS3) Krinkle: Publish WebPageTest job status to wikimedia-perf-bots [integration/config] - https://gerrit.wikimedia.org/r/360875 (https://phabricator.wikimedia.org/T126216) (owner: Phedenskog)
[20:59:23] (PS4) Krinkle: Publish WebPageTest job status to wikimedia-perf-bots [integration/config] - https://gerrit.wikimedia.org/r/360875 (https://phabricator.wikimedia.org/T126216) (owner: Phedenskog)
[21:00:28] (CR) Krinkle: [C: 2] "$ jenkins-jobs test config/ -o output/" [integration/config] - https://gerrit.wikimedia.org/r/360875 (https://phabricator.wikimedia.org/T126216) (owner: Phedenskog)
[21:02:30] (Merged) jenkins-bot: Publish WebPageTest job status to wikimedia-perf-bots [integration/config] - https://gerrit.wikimedia.org/r/360875 (https://phabricator.wikimedia.org/T126216) (owner: Phedenskog)
[21:04:07] PROBLEM - Free space - all mounts on deployment-tin is CRITICAL: CRITICAL: deployment-prep.deployment-tin.diskspace._mnt.byte_percentfree (No valid datapoints found)deployment-prep.deployment-tin.diskspace._srv.byte_percentfree (<33.33%)
[21:27:32] Release-Engineering-Team (Watching / External), Fundraising-Backlog, Spike: Spike: decide how payments-wiki deployment process should relate to the core MW deployment process - https://phabricator.wikimedia.org/T130658#3380531 (Jgreen)
[21:49:54] Continuous-Integration-Config, Wikipedia-Android-App-Backlog, Technical-Debt: Add support to periodic CI tests for exercising arbitrary revisions - https://phabricator.wikimedia.org/T152455#3380602 (Mholloway)
[22:20:02] !log deploying ores-prod-deploy:82dfd56 to beta
[22:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:24:17] !log deploying ores-prod-deploy:82dfd56 to beta (note: T168099)
[22:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:24:20] T168099: Mid June 2017 ORES deployment - https://phabricator.wikimedia.org/T168099