[01:14:25] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[01:14:55] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:16:09] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[01:49:19] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:51:07] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[01:54:52] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:58:20] Yippee, build fixed!
[03:58:21] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #560: FIXED in 2 min 19 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/560/
[04:16:26] Yippee, build fixed!
[04:16:26] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #560: FIXED in 20 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/560/
[05:01:20] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[05:41:20] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:46:13] (CR) Legoktm: Introduce ci-src-setup-simple (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/386259 (owner: Legoktm)
[05:51:01] (PS2) Legoktm: Introduce ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386259
[06:07:32] (PS3) Legoktm: Introduce ci-src-setup-simple and use it in composer-test [integration/config] - https://gerrit.wikimedia.org/r/386259
[06:26:27] (PS4) Legoktm: Introduce ci-src-setup-simple and use it in composer-test [integration/config] - https://gerrit.wikimedia.org/r/386259
[06:28:41] (CR) Legoktm: Introduce ci-src-setup-simple and use it in composer-test (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/386259 (owner: Legoktm)
[06:29:44] (CR) Legoktm: [C: 2] Introduce ci-src-setup-simple and use it in composer-test [integration/config] - https://gerrit.wikimedia.org/r/386259 (owner: Legoktm)
[06:31:37] (Merged) jenkins-bot: Introduce ci-src-setup-simple and use it in composer-test [integration/config] - https://gerrit.wikimedia.org/r/386259 (owner: Legoktm)
[06:52:16] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:53:00] (PS1) Legoktm: Convert mediawiki-phpcs to use ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386975
[06:53:35] (CR) Legoktm: [C: 2] Convert mediawiki-phpcs to use ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386975 (owner: Legoktm)
[06:54:41] (Merged) jenkins-bot: Convert mediawiki-phpcs to use ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386975 (owner: Legoktm)
[07:01:30] Continuous-Integration-Config, Release-Engineering-Team (Watching / External), Discovery, Discovery-Analysis (Current work), Patch-For-Review: Add lint/CI to all wikimedia/discovery analytics repositories - https://phabricator.wikimedia.org/T153856#3717381 (Legoktm) @mpopov ping?
[07:08:45] (PS1) Legoktm: Convert composer-package to use ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386976
[07:11:16] (PS1) Legoktm: Move composer-package-php70-docker out of experimental (try #2) [integration/config] - https://gerrit.wikimedia.org/r/386977 (https://phabricator.wikimedia.org/T144961)
[07:13:47] (CR) Legoktm: [C: 2] Convert composer-package to use ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386976 (owner: Legoktm)
[07:14:08] (Abandoned) Legoktm: composer-package: git clone inside the container [integration/config] - https://gerrit.wikimedia.org/r/383560 (https://phabricator.wikimedia.org/T144961) (owner: Hashar)
[07:15:34] (Merged) jenkins-bot: Convert composer-package to use ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386976 (owner: Legoktm)
[07:18:45] fabric appears to not support ssh-ed25519 keys
[07:19:44] I guess I can deploy the old-school way?
[07:28:41] (CR) Legoktm: [C: 2] Move composer-package-php70-docker out of experimental (try #2) [integration/config] - https://gerrit.wikimedia.org/r/386977 (https://phabricator.wikimedia.org/T144961) (owner: Legoktm)
[07:29:43] (Merged) jenkins-bot: Move composer-package-php70-docker out of experimental (try #2) [integration/config] - https://gerrit.wikimedia.org/r/386977 (https://phabricator.wikimedia.org/T144961) (owner: Legoktm)
[07:32:15] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:58:37] What happened to deployment-salt02?
[07:59:50] oh, deployment-cumin. got it
[08:02:30] ===== NODE GROUP =====
[08:02:30] (1) deployment-phab.deployment-prep.eqiad.wmflabs
[08:02:33] ----- OUTPUT of 'id' -----
[08:02:38] Permission denied (publickey).
[08:03:45] also root fails because wrong laptop
[08:03:50] would someone mind updating my key at https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep ?
[08:04:26] new one is at https://phabricator.wikimedia.org/P6209
[08:51:48] Continuous-Integration-Config, MediaWiki-General-or-Unknown: Make sure extensions using composer/npm for development dependencies have the right .gitignore rules - https://phabricator.wikimedia.org/T116434#3717421 (Umherirrender) Open>Resolved a:Umherirrender Should all done for now (in media...
[09:10:24] !lot fixed puppet on deployment-kafka01 by installing ldap-utils
[09:12:09] !log fixed puppet on deployment-kafka01 by installing ldap-utils
[09:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[09:13:02] deployment-mx puppet failing I think due to trying to get a cert for deployment-mx.eqiad.wmflabs rather than the public name
[09:22:20] RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:50:14] Continuous-Integration-Infrastructure, MediaWiki-extensions-General: Decide wether we want the package-lock.json to commit or ignore - https://phabricator.wikimedia.org/T179229#3717425 (Umherirrender)
[09:53:15] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[09:53:48] Continuous-Integration-Config, MinusX: Add MinusX to MediaWiki extensions and PHP library repos - https://phabricator.wikimedia.org/T175794#3717437 (Umherirrender)
[10:28:14] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:56:38] Krenair i thought you can edit that page?
[11:50:34] Beta-Cluster-Infrastructure: Request for adminship on zh Beta Cluster - https://phabricator.wikimedia.org/T179233#3717555 (A2093064)
[12:22:42] Project selenium-GettingStarted » firefox,beta,Linux,BrowserTests build #569: FAILURE in 42 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/569/
[12:23:09] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:09:37] Continuous-Integration-Infrastructure, MediaWiki-extensions-General: Decide wether we want the package-lock.json to commit or ignore - https://phabricator.wikimedia.org/T179229#3717640 (Florian) Just from my experience: The npm team clearly states (in their documentation) that the file "is intended to be...
[13:25:14] (PS1) Mainframe98: Archive Automatic Board Welcome [integration/config] - https://gerrit.wikimedia.org/r/387019 (https://phabricator.wikimedia.org/T179196)
[13:32:12] Beta-Cluster-Infrastructure, User-Luke081515: Request for adminship on zh Beta Cluster - https://phabricator.wikimedia.org/T179233#3717648 (Luke081515) a:Luke081515
[13:34:02] Beta-Cluster-Infrastructure, User-Luke081515: Request for adminship on zh Beta Cluster - https://phabricator.wikimedia.org/T179233#3717650 (Luke081515) Open>Resolved
[13:35:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[14:10:02] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:25:47] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:48:29] Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3717703 (Aklapper) p:Normal>Low Nothing to be done in #Phabricator code if I understand it correctly, hence removing tag.
[15:55:23] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:56:55] paladox, no
[15:57:05] If I could, I wouldn't be asking for someone else to do it
[15:57:13] Someone else could just put me in the projectadmin group so I could do it
[15:57:32] ok
[16:00:50] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[16:07:13] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3717711 (hoo) Happening again, this time on `cp1055`. Example from `mw1180`: ``` $ s...
[16:14:13] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3717712 (hoo) Also on `mw1180`: ``` $ sudo -u www-data ss --tcp -r -p > ss $ cat ss |...
[16:23:00] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3717713 (Paladox) p:Triage>Unbreak! Spoke to hoo on irc, who agreed it's an UB...
[16:24:55] Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3717717 (Paladox) @Aklapper it depends though. We may need a new conduit call for this as gerrit side chooses the size of images, whereas phab's side, it dosen't support tha...
[18:26:04] PROBLEM - Free space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1002.diskspace._srv.byte_percentfree (<100.00%)
[18:55:07] Gerrit, Cloud-Services, Repository-Admins: Deactivate repository labs/invisible-unicorn - https://phabricator.wikimedia.org/T154099#2901197 (MarcoAurelio) rLINU in Difussion has been marked as inactive and description amended accordingly. If there's anything left to do in Gerrit, I can't assist there.
[19:45:06] Continuous-Integration-Infrastructure, MediaWiki-extensions-General: Decide wether we want the package-lock.json to commit or ignore - https://phabricator.wikimedia.org/T179229#3717807 (Legoktm) If we can use a lock file to pin versions instead of hardcoding in package.json that might be nice. But what h...
[19:53:34] (PS2) Legoktm: Add experimental npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/382946
[19:59:47] Release-Engineering-Team (Kanban), Scap (Tech Debt Sprint FY201718-Q2): Scap failing to rewrite submodule urls in beta - https://phabricator.wikimedia.org/T179013#3717824 (mmodell) p:High>Normal It's unbroken now. Still work to do but at least beta deploys are no longer blocked.
[20:01:51] (CR) SamanthaNguyen: [C: 1] Archive Automatic Board Welcome [integration/config] - https://gerrit.wikimedia.org/r/387019 (https://phabricator.wikimedia.org/T179196) (owner: Mainframe98)
[20:05:50] (CR) Hashar: [C: 2] Archive Automatic Board Welcome [integration/config] - https://gerrit.wikimedia.org/r/387019 (https://phabricator.wikimedia.org/T179196) (owner: Mainframe98)
[20:06:52] (Merged) jenkins-bot: Archive Automatic Board Welcome [integration/config] - https://gerrit.wikimedia.org/r/387019 (https://phabricator.wikimedia.org/T179196) (owner: Mainframe98)
[20:21:41] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3717847 (BBlack) Updates from the Varnish side of things today (since I've been bad ab...
[21:00:29] Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3717851 (Paladox) p:Low>Normal I've set it as normal as im working on this :)
[21:15:21] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:19:49] (PS3) Legoktm: Add experimental npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/382946
[21:26:05] (CR) Legoktm: [C: 2] Add experimental npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/382946 (owner: Legoktm)
[21:28:04] (Merged) jenkins-bot: Add experimental npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/382946 (owner: Legoktm)
[21:50:22] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:57:05] * paladox wonders why apple is keeping a local copy of phabricator.wikimedia.org
[21:57:08] https://lookup-api.apple.com/phabricator.wikimedia.org/phame/post/view/59/labs_and_tool_labs_being_renamed/
[21:57:24] and wikitech https://lookup-api.apple.com/wikitech.wikimedia.org/wiki/Category:Eqiad_cluster
[21:57:40] and wikipedia
[21:57:41] https://lookup-api.apple.com/en.wikipedia.org/wiki/Category:Eqiad_cluster
[22:04:10] its probably a cache proxy for their tools that use our content
[22:05:26] (PS1) Legoktm: Generate .env for npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/387061
[22:05:29] (PS1) Legoktm: Pass --no-progress to npm install [integration/config] - https://gerrit.wikimedia.org/r/387062
[22:05:34] (CR) jerkins-bot: [V: -1] Generate .env for npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/387061 (owner: Legoktm)
[22:05:37] (CR) jerkins-bot: [V: -1] Pass --no-progress to npm install [integration/config] - https://gerrit.wikimedia.org/r/387062 (owner: Legoktm)
[22:05:42] Release-Engineering-Team (Watching / External), ORES, Operations, Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3717895 (BBlack) A while after the above, @hoo started focusing on a different aspect...
[22:12:09] (PS2) Legoktm: Generate .env for npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/387061
[22:12:11] (PS2) Legoktm: Pass --no-progress to npm install [integration/config] - https://gerrit.wikimedia.org/r/387062
[22:13:54] Seems like some kind of proxy...unsure why it's public
[22:21:05] thanks
[22:23:33] (CR) Legoktm: [C: 2] Pass --no-progress to npm install [integration/config] - https://gerrit.wikimedia.org/r/387062 (owner: Legoktm)
[22:23:38] (CR) Legoktm: [C: 2] Generate .env for npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/387061 (owner: Legoktm)
[22:24:43] (Merged) jenkins-bot: Generate .env for npm-node-6-docker job [integration/config] - https://gerrit.wikimedia.org/r/387061 (owner: Legoktm)
[22:26:04] (Merged) jenkins-bot: Pass --no-progress to npm install [integration/config] - https://gerrit.wikimedia.org/r/387062 (owner: Legoktm)
[22:28:44] (PS1) Legoktm: Automatically call docker-zuul-env if the .env file will be used [integration/config] - https://gerrit.wikimedia.org/r/387065
[22:38:18] (CR) Legoktm: [C: -1] "This will break operations-puppet-wmf-style-guide...we probably need to port that to a ci-src-setup image first." [integration/config] - https://gerrit.wikimedia.org/r/387065 (owner: Legoktm)
[22:56:49] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:31:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[23:42:07] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]