[03:02:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [03:42:01] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [03:58:26] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,BrowserTests build #562: 04FAILURE in 2 min 26 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=BrowserTests/562/ [04:07:30] Yippee, build fixed! [04:07:31] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,BrowserTests build #562: 09FIXED in 11 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=BrowserTests/562/ [04:17:16] Yippee, build fixed! [04:17:16] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #562: 09FIXED in 21 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/562/ [05:43:48] PROBLEM - Free space - all mounts on deployment-eventlog02 is CRITICAL: CRITICAL: deployment-prep.deployment-eventlog02.diskspace.root.byte_percentfree (<50.00%) [07:03:30] 10Release-Engineering-Team, 10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3718571 (10Marostegui) Thanks @Reedy @Anomie and @tstarling for reviewing the patches, really appreciated. As... 
[07:43:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [08:23:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [08:27:14] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3718637 (10hashar) To summarize the discussion on Friday, Gerrit has a plugin for Gravatar. That looks up the email address from a third party site which we can't do due to pr... [09:21:48] !log gerrit: prefix mediawiki/extensions/AWS description with '[ARCHIVED] ' - T174864 [09:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:21:53] T174864: Archive the AWS extension - https://phabricator.wikimedia.org/T174864 [09:22:55] !log gerrit: prefix mediawiki/extensions/AutomaticBoardWelcome description with '[ARCHIVED] ' - T179196 [09:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:22:59] T179196: Archive the AutomaticBoardWelcome extension - https://phabricator.wikimedia.org/T179196 [09:27:38] !log gerrit: deleted /nfsd.git (unused / no changes, created on October 4th 2016) [09:27:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:32:40] 10Continuous-Integration-Infrastructure, 10User-Addshore: un blacklist https://integration.wikimedia.org/ci/computer/XXXX/builds - https://phabricator.wikimedia.org/T178458#3718736 (10Addshore) [09:34:19] (03PS1) 10Hashar: Add non voting debian glue to a few .deb repos [integration/config] - 10https://gerrit.wikimedia.org/r/387199 [09:35:51] (03CR) 10Hashar: [C: 032] Add non voting debian glue to a few .deb repos [integration/config] - 10https://gerrit.wikimedia.org/r/387199 (owner: 10Hashar) [09:36:56] (03Merged) 10jenkins-bot: Add non voting debian glue to a few .deb repos [integration/config] - 
10https://gerrit.wikimedia.org/r/387199 (owner: 10Hashar) [09:43:17] (03PS1) 10Hashar: Register test/gerrit-ping [integration/config] - 10https://gerrit.wikimedia.org/r/387200 [09:47:26] (03CR) 10Hashar: [C: 032] Register test/gerrit-ping [integration/config] - 10https://gerrit.wikimedia.org/r/387200 (owner: 10Hashar) [09:49:11] (03Merged) 10jenkins-bot: Register test/gerrit-ping [integration/config] - 10https://gerrit.wikimedia.org/r/387200 (owner: 10Hashar) [09:55:49] !log gerrit: deleted graphs/shared.git unused / emtpy repo [09:55:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:14:57] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3718772 (10ema) >>! In T179156#3717895, @BBlack wrote: > My best hypothesis for the "unr... [10:47:10] hashar what's the gerrit/ping repo for? :) [10:52:59] !log deployment-logstash2 removed puppet class role::labs::lvm::mnt, replacing with role::labs::lvm::srv . /srv is already mounted. Unmounting /mnt and restarting elastcisearch [10:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:53:28] !log deployment-logstash2 removed puppet class role::labs::lvm::mnt, replacing with role::labs::lvm::srv . /srv is already mounted. Unmounting /mnt and restarting elastcisearch - T 178722 [10:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:53:36] paladox: a dummy git repo used for testing gerrit [10:53:44] ah ok thanks. [10:55:11] 10Beta-Cluster-Infrastructure, 10Cloud-Services, 10PAWS, 10Wikidata, 10Wikimedia-Logstash: Remove puppet class role::labs::lvm::mnt - https://phabricator.wikimedia.org/T178722#3718881 (10hashar) [10:56:20] !log deployment-logstash2 removed puppet class role::labs::lvm::mnt, replacing with role::labs::lvm::srv . 
/srv is already mounted. Unmounting /mnt and restarting elastcisearch - T178722 [10:56:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:56:26] T178722: Remove puppet class role::labs::lvm::mnt - https://phabricator.wikimedia.org/T178722 [10:57:20] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [10:59:33] hashar there should be some performance upgrades with the next big lts update :) [11:00:06] manly it has to do with the build que you see on the home page [11:07:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0] [11:29:04] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3718957 (10Lucas_Werkmeister_WMDE) > The only live polling feature I can think of that w... 
[11:29:20] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3718958 (10zeljkofilipin) [11:33:18] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3718963 (10zeljkofilipin) [11:34:09] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3294839 (10zeljkofilipin) [11:41:22] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719021 (10zeljkofilipin) [11:44:50] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719044 (10zeljkofilipin) [11:46:29] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3294924 (10zeljkofilipin) [11:48:51] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3719057 (10BBlack) >>! In T179156#3718772, @ema wrote: > There's a timeout limiting the... 
[11:49:53] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719058 (10zeljkofilipin) [11:50:48] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3297956 (10zeljkofilipin) [11:51:39] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3719064 (10hoo) [11:52:36] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719066 (10zeljkofilipin) [11:54:13] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3298082 (10zeljkofilipin) [11:55:57] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719070 (10zeljkofilipin) [11:58:33] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719072 (10zeljkofilipin) [12:03:09] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719077 (10zeljkofilipin) [12:05:44] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: 
Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719081 (10zeljkofilipin) [12:06:30] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3320738 (10zeljkofilipin) [12:08:07] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719089 (10zeljkofilipin) [12:09:04] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3335089 (10zeljkofilipin) [12:09:56] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3335102 (10zeljkofilipin) [12:11:20] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719103 (10zeljkofilipin) [12:12:03] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3402695 (10zeljkofilipin) [12:23:08] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [12:23:12] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719143 (10zeljkofilipin) [12:23:48] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 
10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3719145 (10Lucas_Werkmeister_WMDE) >>! In T179156#3719057, @BBlack wrote: >could other s... [12:23:55] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3444451 (10zeljkofilipin) [12:26:20] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719163 (10zeljkofilipin) [12:30:34] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719182 (10zeljkofilipin) [12:31:49] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3466639 (10zeljkofilipin) [12:53:53] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [12:54:19] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [13:01:52] 10Release-Engineering-Team, 10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3719274 (10Reedy) Beta is still working after that patch being deployed [13:17:59] 10Beta-Cluster-Infrastructure, 10ORES, 10Scoring-platform-team (Current), 10User-Ladsgroup: ORESFetchScoreJob: RuntimeException No model available for [goodfaith] - https://phabricator.wikimedia.org/T178792#3719327 (10Ladsgroup) a:03Ladsgroup [13:23:05] 10Release-Engineering-Team, 
10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3719379 (10Marostegui) >>! In T178553#3719274, @Reedy wrote: > Beta is still working after that patch being dep... [13:33:52] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0] [13:34:18] RECOVERY - Puppet errors on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0] [13:45:25] hashar using a repo is much faster heh, tested using phab-01 :) [13:45:28] i mean phab.wmflabs.org [13:45:39] https://phab.wmflabs.org/diffusion/4/repository/master/ [13:48:15] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3719455 (10Paladox) Using a repo would make things faster, and less likly to break phab's side with all that query's. It would also allow users to opt in. We need to request... [13:56:11] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10Ruby, 10User-zeljkofilipin: Announce Selenium Ruby framework deprecation on appropriate mailing lists (QA, engineering, wikitech-l) - https://phabricator.wikimedia.org/T173488#3719480 (10zeljkofilipin) 05Open>03Resolved October [[ h... [13:56:13] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719482 (10zeljkofilipin) [13:57:41] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3478080 (10zeljkofilipin) [13:59:50] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Did something change in beta cluster configuration around September 16 2017? 
- https://phabricator.wikimedia.org/T179157#3719502 (10zeljkofilipin) 05Open>03Resolved Thanks @Bawolff and @Jdlrobson, I think the myster... [14:03:40] 10Browser-Tests-Infrastructure, 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Add createAccount method to nodemw - https://phabricator.wikimedia.org/T173505#3530636 (10zeljkofilipin) a:05zeljkofilipin>03None [14:05:10] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin, 10WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#3719543 (10zeljkofilipin) [14:05:41] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin, 10WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#1738691 (10zeljkofilipin) 05Open>03stalled I do not know how to proceed. [14:05:54] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin, 10WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#3719547 (10zeljkofilipin) a:05zeljkofilipin>03None [14:08:02] grmblbl [14:08:07] I broken logstash on beta :( [14:08:36] ah no [14:08:37] it works [14:09:55] 10Beta-Cluster-Infrastructure, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), 10User-Ladsgroup: ORESFetchScoreJob: RuntimeException No model available for [goodfaith] - https://phabricator.wikimedia.org/T178792#3719557 (10hashar) 05Open>03Resolved [14:20:20] 10Beta-Cluster-Infrastructure: User rights request for clearing spam - https://phabricator.wikimedia.org/T176299#3719615 (10Anooprao) 05Open>03Resolved a:03Anooprao I dont have time now for being an admin on Deployment wiki, please change the status if status i have chosen is wrong. 
[14:22:49] 10Beta-Cluster-Infrastructure: User rights request for clearing spam - https://phabricator.wikimedia.org/T176299#3719641 (10Anooprao) a:05Anooprao>03None [14:26:08] 10Beta-Cluster-Infrastructure: User rights request for clearing spam - https://phabricator.wikimedia.org/T176299#3719647 (10Aklapper) 05Resolved>03declined [14:28:37] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3478096 (10jayvdb) @zeljkofilipin , is there any part of this which might be suitable for #google-code-in-2017... [14:55:04] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719725 (10zeljkofilipin) @jayvdb if a team in charge of a repository wants to mentor a student to (re)write Se... [15:18:10] (03PS10) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [15:18:35] (03CR) 10Hashar: [C: 04-2] "check experimental" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [15:20:33] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3715032 (10daniel) @BBlack wrote: > something that's doing a legitimate request->respon... 
[15:25:21] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719885 (10zeljkofilipin) [15:34:47] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3719914 (10BBlack) Trickled-in POST on the client side would be something else. Varnish... [15:38:23] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3719924 (10demon) Creating a repository full of binary images is just going to grow unmanageably large. I'm not a fan of this task generally, I don't think it's a good use of... [15:39:15] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3719928 (10daniel) > In any case, this would consume front-edge client connections, but... [15:41:13] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719931 (10zeljkofilipin) [15:44:00] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3719939 (10Paladox) >>! In T179212#3719924, @demon wrote: > Creating a repository full of binary images is just going to grow unmanageably large. > > I'm not a fan of this ta... 
[15:44:07] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:44:34] 10Continuous-Integration-Infrastructure, 10MediaWiki-extensions-General: Decide whether we want the package-lock.json to commit or ignore - https://phabricator.wikimedia.org/T179229#3719945 (10Jdforrester-WMF) >>! In T179229#3717807, @Legoktm wrote: > If we can use a lock file to pin versions instead of hardco... [15:46:30] 10Release-Engineering-Team (Kanban), 10releng-201718-q1, 10MediaWiki-General-or-Unknown, 10Epic, and 3 others: Port Selenium tests from Ruby to Node.js - https://phabricator.wikimedia.org/T139740#3719956 (10zeljkofilipin) [15:54:38] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3719995 (10BBlack) >>! In T179156#3719928, @daniel wrote: >> In any case, this would co... 
[15:55:21] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [16:04:44] (03PS11) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [16:04:46] (03PS1) 10Hashar: docker: build.py now updates any jjb file [integration/config] - 10https://gerrit.wikimedia.org/r/387256 [16:05:23] (03CR) 10Hashar: [C: 04-2] "check experimental" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [16:19:06] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0] [16:28:07] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin, 10WorkType-NewFunctionality: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#3720104 (10zeljkofilipin) @Kunal: I have just talked to @demon about this and he said you might know about botflags :) [16:28:34] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3720106 (10BBlack) p:05Unbreak!>03High Reducing this from UBN->High, because current... 
[16:44:29] 10Release-Engineering-Team (Watching / External), 10MediaWiki-Platform-Team, 10Patch-For-Review, 10Performance-Team (Radar): Support multi-instance hosts on mediawiki-config - https://phabricator.wikimedia.org/T178553#3720175 (10greg) [16:46:05] (03CR) 10Hashar: [C: 04-2] "check experimental" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [16:48:19] 10Release-Engineering-Team, 10Scap, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, and 3 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesize limited to 512MBytes - https://phabricator.wikimedia.org/T145819#3720189 (10greg) [16:48:26] 10Release-Engineering-Team (Backlog), 10Scap, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata, and 3 others: Jobs invoking SiteConfiguration::getConfig cause HHVM to fail updating the bytecode cache due to being filesize limited to 512MBytes - https://phabricator.wikimedia.org/T145819#2642064 (10greg) [16:48:48] 10Release-Engineering-Team (Watching / External), 10Commons, 10MediaWiki-Debug-Logger, 10monitoring, 10Performance: High replication lag causing read only mode on commons - https://phabricator.wikimedia.org/T178094#3720195 (10greg) [16:49:10] (03PS12) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [16:50:29] (03CR) 10Hashar: "We have some jjb/castor*.bash files and next change stick a docker command in one of them." 
[integration/config] - 10https://gerrit.wikimedia.org/r/387256 (owner: 10Hashar) [16:50:40] 10Release-Engineering-Team (Backlog), 10Scap, 10Phabricator, 10Patch-For-Review: Improve phabricator's deployment process - https://phabricator.wikimedia.org/T172847#3720209 (10greg) [16:50:43] (03CR) 10Hashar: [C: 04-2] "check experimental" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [16:53:44] (03PS2) 10EBernhardson: Enable maven builds for search/xgboost repository [integration/config] - 10https://gerrit.wikimedia.org/r/386938 [16:56:44] (03CR) 10Legoktm: [C: 032] docker: build.py now updates any jjb file [integration/config] - 10https://gerrit.wikimedia.org/r/387256 (owner: 10Hashar) [16:57:25] (03PS3) 10EBernhardson: Enable maven builds for search/xgboost repository [integration/config] - 10https://gerrit.wikimedia.org/r/386938 [16:58:00] (03Merged) 10jenkins-bot: docker: build.py now updates any jjb file [integration/config] - 10https://gerrit.wikimedia.org/r/387256 (owner: 10Hashar) [16:58:01] anyone can help me push ^ to zuul/jenkins? While i know in the past i could submit jjb updates, now (and after verifying api token is correct) i get: jenkins.JenkinsException: Error in request. Possibly authentication failed [403]: Forbidden [16:58:52] (03CR) 10jerkins-bot: [V: 04-1] Enable maven builds for search/xgboost repository [integration/config] - 10https://gerrit.wikimedia.org/r/386938 (owner: 10EBernhardson) [16:58:56] heh [17:00:17] (03CR) 10Hashar: "Thank you Kunal :)" [integration/config] - 10https://gerrit.wikimedia.org/r/387256 (owner: 10Hashar) [17:01:04] ebernhardson: good morning. That jjb test is a bit too verbose :( [17:01:31] ebernhardson: Job search-xgboost-maven-site-publish not defined [17:02:02] ahh, ok i removed it from one spot and not the other. 
we don't want site-publish for that (its a minor fork of an upstream project so it's easy to release into our internal java repository) [17:02:05] easy fix [17:02:19] (03PS4) 10EBernhardson: Enable maven builds for search/xgboost repository [integration/config] - 10https://gerrit.wikimedia.org/r/386938 [17:02:22] (03CR) 10Hashar: Enable maven builds for search/xgboost repository (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/386938 (owner: 10EBernhardson) [17:02:31] ;) [17:02:53] indeed i think my browser is still trying to load consoleFull for that ;) [17:07:19] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3720286 (10demon) Have we talked to those maintaining LDAP to see if putting profile images into the records is something we want? I'm pretty sure there's no user-facing front... [17:09:27] ebernhardson: gotta prepare dinner sorry. But will review later tonight [17:10:06] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3720290 (10daniel) > Because they're POST they'd be handled as an immediate pass through... [17:10:08] (03PS13) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [17:10:11] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3720291 (10Paladox) @demon it's already available phabs side with https://phabricator.wikimedia.org/conduit/method/user.ldapquery/ :) I just need to copy that so we doint ne... [17:10:13] hashar: thanks! 
[17:10:25] (03CR) 10Hashar: [C: 04-2] "check experimental" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [17:13:12] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Run Cucumber+Selenium+Node.js in CI - https://phabricator.wikimedia.org/T179190#3720298 (10zeljkofilipin) p:05High>03Normal [17:13:17] 10Release-Engineering-Team (Kanban), 10User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#3720299 (10zeljkofilipin) p:05High>03Normal [17:14:17] (03CR) 10Hashar: [C: 04-2] "check experimental" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [17:16:33] (03CR) 10Hashar: "Sorry for the spam. Will do a final self review later today and deploy it. Then immediately after migrate integration/config to use it, pr" [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [17:22:21] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3720342 (10demon) What on earth does that have to do with it? We don't store the images in LDAP, they're stored in Phab's DB. [17:22:30] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3720343 (10demon) Because not all Phab users are LDAP users. [17:40:01] 10Release-Engineering-Team (Watching / External), 10ORES, 10Operations, 10Scoring-platform-team, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3720392 (10BBlack) >>! In T179156#3719995, @BBlack wrote: > We have an obvious case of n... 
[18:00:37] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3720447 (10Paladox) Yes users would need to link there ldap account with phab to be able to use there avatar. And yes that's what i mean, the image is stored in the db. We can... [18:25:11] no_justification hi, should i stop working on avatars? [18:25:24] I don't think it's worth the effort. [18:25:28] ok [18:25:36] Plus I don't really want to have to maintain *yet another* shim for gerrit/phab [18:25:52] ok [18:25:53] hashar: could you take a look at my question on https://gerrit.wikimedia.org/r/#/c/386580/ if you have some time? [18:26:04] PROBLEM - Free space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-1002.diskspace._srv.byte_percentfree (<100.00%) [18:32:59] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [18:34:22] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [18:46:37] hey [18:50:46] eddiegp [18:50:56] hi [18:51:56] hey guys [18:52:04] hey [18:52:36] menu Status #wikimedia-relengX #wikimedia-releng: Wikimedia Release Engineering (team members +voiced) | Status: Beta Cluster down | Wiki: http://ur1.ca/jmph5 | Phab: http://ur1.ca/jmphb | Home of deployments, Beta Cluster, Continuous Integration, and so much more! 
| Our SAL: http://ur1.ca/nhui9 | This channel is publicly logged: http://ur1.ca/qlpgz [14:45] == gavin [c6142001@gateway/web/freenode/ip.198.20.32.1] has joined #wik
[18:53:10] [14:52] menu Status #wikimedia-relengX #wikimedia-releng: Wikimedia Release Engineering (team members +voiced) | Status: Beta Cluster down | Wiki: http://ur1.ca/jmph5 | Phab: http://ur1.ca/jmphb | Home of deployments, Beta Cluster, Continuous Integration, and so much more! | Our SAL: http://ur1.ca/nhui9 | This channel is publicly logged: http://ur1.ca/qlpgz [14:45] == gavin [c6142001@gateway/web/freenode/ip.198.20.32.1 [18:53:50] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:53:58] thanks Reception123 [18:54:07] grr, sorry Reception123, a mis-ping [18:55:49] I should start to ignore it when my phone notifies me about naked pings. [18:55:59] eddiegp: indeed. [19:04:51] eddiegp: I always link folks to https://blogs.gnome.org/markmc/2014/02/20/naked-pings/ to understand the problem with it :) [19:05:04] "saves one roundtrip of communication" [19:06:56] andre__: I know, I read it after you mentioned it somewhere :D [19:08:48] Yippee, build fixed! [19:08:48] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #180: 09FIXED in 19 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/180/ [19:09:21] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0] [19:12:59] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [19:16:36] 10Release-Engineering-Team, 10Vector, 10User-zeljkofilipin: Selenium job blocking merges in Vector repo - https://phabricator.wikimedia.org/T179327#3720716 (10Jdlrobson) [19:25:04] eddiegp: Oh oh.
:D [19:33:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0] [19:55:15] PROBLEM - Puppet errors on deployment-ores-redis-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [20:00:23] twentyafterfour: Donno if you want to look at this git weirdness, but: https://phabricator.wikimedia.org/P6227 [20:00:35] That’s happened 4/4 times now [20:06:02] awight: strange, I've never seen that before [20:06:17] We might have the most ridiculously bloated repos… [20:06:50] code 504 ... [20:08:00] gateway timeout. So yeah, the backend didn't respond quickly enough I guess? [20:08:04] how big is this repo? [20:09:49] the .git is 2GB [20:11:20] it it fetching from phab or gerrit? [20:12:00] if this is https from phab then it's probably https://secure.phabricator.com/T4369 [20:12:21] but that results in a 500 not 504 [20:14:48] fetching from tin in that instance. The editconfig repo is 2.2GB. I think that the apache on tin is timing out because of hugeness. [20:21:06] RECOVERY - Free space - all mounts on integration-slave-jessie-1002 is OK: OK: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found) [20:22:59] (03CR) 10Hashar: "Sounds smart? 
:)" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/386580 (owner: 10Legoktm) [20:26:15] (03CR) 10Hashar: [C: 031] Add script to manipulate clover.xml files (031 comment) [integration/jenkins] - 10https://gerrit.wikimedia.org/r/386765 (owner: 10Legoktm) [20:26:23] legoktm: yeah that sounds good :] [20:26:37] legoktm: I like the idea of copying the clover.xml to /cover/ to have it published on doc.wm.o automagically [20:27:26] legoktm: also for https://doc.wikimedia.org/cover/ , someone mentionned the progress bar could come before the project name [20:27:31] so they would all be nicely aligned [20:29:39] twentyafterfour: the .gitmodules shows it will be fetching from tin.wmo [20:29:45] err, tin.eqiad.wmnet [20:30:16] RECOVERY - Puppet errors on deployment-ores-redis-01 is OK: OK: Less than 1.00% above the threshold [0.0] [20:32:30] awight: I think what's happening is that apache is timing out fetching from tin because it's trying to fetch down that 2.2GB objects directory for editquality on tin [20:32:58] harr. Any thoughts about what I should do? [20:34:00] hrm, well... [20:34:09] (03CR) 10Hashar: [C: 032] "Aced! Well done :]" (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/386938 (owner: 10EBernhardson) [20:34:10] so it fetched down from phab -> tin ok, right? [20:34:31] ebernhardson: all good to me :] also gehel added a few jobs that run mvn site goal [20:34:42] ebernhardson: that seems to generate a bunch of helpful documents / reports [20:35:08] awight: you can set: git_upstream_submodules: True in scap.cfg and it won't attempt to fetch submodules from tin, it'll try to grab them from whatever is in .gitmodules IIRC [20:35:21] not a great solution, but may allow you to deploy in this instance. 
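(Editor's sketch of the workaround suggested at 20:35:08. Only the option name git_upstream_submodules and the file name scap.cfg come from the log. The [global] section name, the git_repo value, and the helper function are assumptions made for illustration, so check them against the actual scap.cfg in the deploy repo. The snippet only shows how such a flag could be read from an INI-style file; it is not how scap itself loads its configuration.)

```python
import configparser

# Hypothetical scap.cfg contents. Only git_upstream_submodules is taken from
# the log; the [global] section and git_repo value are illustrative guesses.
SCAP_CFG = """
[global]
git_repo: ores/deploy
git_upstream_submodules: True
"""


def upstream_submodules_enabled(cfg_text: str, section: str = "global") -> bool:
    """Return True if the config text turns on git_upstream_submodules."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback=False covers both a missing section and a missing option.
    return parser.getboolean(section, "git_upstream_submodules", fallback=False)


if __name__ == "__main__":
    print(upstream_submodules_enabled(SCAP_CFG))
```

(As described in the log, with the flag set the targets would fetch submodules from whatever URLs .gitmodules lists instead of from tin, which shifts the load to the upstream host.)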
[20:35:24] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [20:35:31] we need a task for this one, I reckon :( [20:36:07] Ack, so I would be defeating the submodule rewrite entirely. [20:36:23] hashar: yea, i might get around to adding the maven-site stuff later, but right now the upstream maven config isn't setup to build that stuff [20:36:32] thcipriani: Fine by me, if releng is okay with the load on Phabricator? [20:36:41] ^ twentyafterfour [20:36:57] awight: that's his department :) [20:36:59] Normal deployment will be pulling 3 repos of this size, to 9 machines. [20:37:22] why is it doing a full re-clone each time? [20:38:26] the clones should all share a common object tree shouldn't they? [20:38:36] * bd808 hasn't looks at scap3 guts much [20:39:19] bd808: for the main repo it does. Scap3 reimplemented the bad implementation of submodules from trebuchet, so it regrabs submodules from tin for every rev. [20:39:52] I saw some code to address this go through review last week, so it may be fixed in master for all I know [20:39:53] ah. that is unfortunate for these deployments that have big ass submodules [20:40:11] but I hear git-lfs is coming soon :) [20:40:29] yeah, ores is definitely a deploy that highlights this deficiency [20:40:39] that's the word on the streets :) [20:41:12] lol. 
I may be exacerbating the issue by retrying the deployment with “-f” [20:42:20] (03Merged) 10jenkins-bot: Enable maven builds for search/xgboost repository [integration/config] - 10https://gerrit.wikimedia.org/r/386938 (owner: 10EBernhardson) [20:42:39] PROBLEM - Puppet errors on integration-puppetmaster01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [20:42:51] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #564: 04FAILURE in 1 min 51 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/564/ [20:42:53] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #564: 04FAILURE in 1 min 53 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/564/ [20:44:56] 10Release-Engineering-Team (Kanban), 10Scap: ORES deploy submodule 504 - https://phabricator.wikimedia.org/T179336#3721055 (10thcipriani) [20:46:25] 10Release-Engineering-Team (Kanban), 10Scap, 10ORES, 10Scoring-platform-team: ORES deploy submodule 504 - https://phabricator.wikimedia.org/T179336#3721074 (10Zppix) [20:46:38] 10Release-Engineering-Team (Kanban), 10Scap, 10ORES, 10Scoring-platform-team: ORES deploy submodule 504 - https://phabricator.wikimedia.org/T179336#3721075 (10thcipriani) A workaround over the short-term may be to use `git_upstream_submodules: True` in the `scap.cfg` file. This would cause a fetch of the s... [20:53:00] ebernhardson: deployed and I did a recheck on the last change https://gerrit.wikimedia.org/r/#/c/387240/ [20:53:06] ebernhardson: but ci is quite busy [20:53:33] hashar: thanks! yea it's a bit busy. I'll check on it whenever it finishes. Appreciate the help! [20:54:21] 10Release-Engineering-Team (Kanban), 10Scap, 10ORES, 10Scoring-platform-team: ORES deploy submodule 504 - https://phabricator.wikimedia.org/T179336#3721109 (10thcipriani) hrm. 
I was able to clone this locally on tin FWIW: ``` [thcipriani@tin ~]$ git clone http://tin.eqiad.wmnet/ores/deploy/.git/modules/su... [21:08:50] thanks hashar [21:09:20] legoktm: and potentially you could process the huge clover.xml and generate a .json that just has what you need :] [21:09:28] legoktm: but yeah that looks like a good idea :] [21:10:02] 10Release-Engineering-Team, 10Vector, 10User-zeljkofilipin: Selenium job blocking merges in Vector repo - https://phabricator.wikimedia.org/T179327#3720716 (10matmarex) > ... This is definitely not generated by MediaWIki. Looks like an XDebug PHP exception stack tr... [21:10:04] (03PS14) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [21:12:03] (03CR) 10Legoktm: [C: 032] Add script to manipulate clover.xml files (031 comment) [integration/jenkins] - 10https://gerrit.wikimedia.org/r/386765 (owner: 10Legoktm) [21:12:26] awight: troubleshooting your scap issue now, don't attempt a deploy for a bit please -- I'm fiddling with stuff on tin [21:12:47] thcipriani: 10-4, thanks [21:12:57] FYI I’m not in a rush, this was a test deployment anyway. [21:13:01] cool, I'll give you the all clear here in a bit :) [21:13:53] 10Release-Engineering-Team, 10Vector, 10User-zeljkofilipin: Selenium job blocking merges in Vector repo - https://phabricator.wikimedia.org/T179327#3720716 (10hashar) You can look at the build page in Jenkins and check `mw-error.log`. It has a bunch of errors such as: jenkins-mediawiki-core-qunit-selenium-... [21:14:37] 10Release-Engineering-Team, 10Vector, 10User-zeljkofilipin: Selenium job blocking merges in Vector repo - https://phabricator.wikimedia.org/T179327#3721173 (10Legoktm) 05Open>03Invalid It's related to the patch. You can see screenshots on https://integration.wikimedia.org/ci/job/mediawiki-core-qunit-sele... 
[21:19:01] (03Merged) 10jenkins-bot: Add script to manipulate clover.xml files [integration/jenkins] - 10https://gerrit.wikimedia.org/r/386765 (owner: 10Legoktm) [21:20:51] (03PS15) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [21:21:18] legoktm: I am adding a docker job for integration/config with support to restore/save /cache between runs [21:21:35] legoktm: will give it a bit more of a try and then I guess write down something to QA list and migrate other tox jobs [21:21:41] awesome :D [21:22:33] legoktm: did you get the composer one working as you intended? [21:22:51] I had to revert it iirc because bunch of files in the workspace could not be deleted [21:23:07] 10Release-Engineering-Team (Kanban), 10Scap, 10ORES, 10Scoring-platform-team: ORES deploy submodule 504 - https://phabricator.wikimedia.org/T179336#3721216 (10awight) Just a point of information, we have three large repos, which add up to c. 3GB and will only grow. Our deployment cluster has 9 machines, s... [21:24:31] hashar: I did, thanks to addshore figuring out the umask stuff [21:36:22] legoktm: cool. I found some other gems such as /cache/pip/foo being rw------- nobody nogroup [21:36:30] and hence jenkins-deploy cant even read it :D [21:36:40] (03CR) 10jerkins-bot: [V: 04-1] Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [21:38:11] :/ [21:38:49] (03PS16) 10Hashar: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) [21:39:40] 10Continuous-Integration-Config, 10User-MarcoAurelio: Configure CI to run tox jobs for mediawiki/tools/cookiecutter-library - https://phabricator.wikimedia.org/T178727#3721327 (10Legoktm) Thank you Marco! 
[21:42:56] 10Release-Engineering-Team (Kanban), 10Scap, 10ORES, 10Scoring-platform-team: ORES deploy submodule 504 - https://phabricator.wikimedia.org/T179336#3721358 (10thcipriani) In some fiddling I realized this error message is coming from phab and not tin. Found via `GIT_TRACE=1` directly on the ores1002 server... [21:43:50] awight: are you around to try something with me? [21:44:00] thcipriani: name it [21:44:31] cool, I'm going to remove this checkout from the revs directory and then I want to try a fresh deploy... [21:44:56] I *think* something got screwed up in multiple failed deploys (I hope :)) and that's obscuring the real problem [21:45:46] awight: so if you can try another deploy: I'm monitoring all the things I want to be, I think... [21:46:09] OK great, here goes. I’ll use scap -l but not -f, FWIW [21:46:17] k [21:47:08] (running) [21:48:13] ok, this time they *are* fetching from tin unlike what I found on ores1002 when I started troubleshooting.. [21:48:43] huh. it's...working now [21:48:46] looks like the fetch was clean. [21:48:47] yup. [21:48:53] well shit. [21:49:00] I should try again at rush hour tomorrow… [21:49:18] yeah...I'll leave that task open and update with this, but: that's weird... [21:49:33] I’m a little suspicious of how fast the clone was, actually. c. 1 minute [21:49:50] maybe the -f flag is circumventing a cache [21:50:06] yeah, checkout the checkout on ores1002 and ensure that everything looks sane [21:50:20] maybe it succeeded because it skipped something... [21:50:52] It’s good. [21:50:55] huh [21:51:02] How about I try deploying to another machine, with -f for fun? [21:51:04] ---fun [21:51:12] sure, which other machine? [21:51:32] ores1003.eqiad.wmnet. [21:52:10] these are all not-pooled right? I'm not breaking production? 
:) [21:52:17] right [21:52:24] ok, just thought I'd ask :P [21:52:28] these are our new toys, still in the shrink-wrap [21:52:39] "Toys" [21:52:39] RECOVERY - Puppet errors on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [21:52:51] awight: ok, I'm monitoring if you want to try that one [21:53:40] huh: I think this one looks good too [21:53:52] at least, not the problem I had with the other one [21:53:56] still watching [21:54:40] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3721419 (10hashar) https://phabricator.wikimedia.org/api/user.ldapquery is a good find, I gave it a try with ldapnames: `["Hashar"]` which yields: ``` { "0": { ... "... [21:55:27] (03CR) 10Hashar: [C: 032] Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [21:55:33] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3721431 (10Paladox) @hashar though query conduit in java is very slowwwwwww. [21:55:36] 10Gerrit: Create avatar plugin for gerrit that uses phab's conduit to get users profile image - https://phabricator.wikimedia.org/T179212#3721432 (10demon) 05Open>03declined Per IRC. [21:55:45] Interestingly, the fetch took the same amount of time. [21:55:46] yeah, so, when I got to ores1002 initially all the submodules where pointing to phab [21:55:52] wat. [21:55:55] https://phabricator.wikimedia.org/T179336#3721358 [21:56:03] * awight grips head [21:56:10] checkout the second paste there [21:56:37] the only thing I could think happened is that somehow we ran: git checkout .gitmodules; git submodule sync ... 
somehow [21:56:53] like an initial failed deploy that left that checkout it weird shape [21:57:06] and then subsequent deploys failed weirdly [21:57:17] there's a bug here somewhere, but I'm not clear where... [21:57:17] I did look at .gitmodules in its final state and it was correctly rewritten. FYI twentyafterfour just patched that code to tighten it up. [21:57:47] yeah, but we haven't cut a new release yet so it's probably only on beta at this point (new release coming soon™) [21:57:59] (03Merged) 10jenkins-bot: Castor for Docker jobs [integration/config] - 10https://gerrit.wikimedia.org/r/385390 (https://phabricator.wikimedia.org/T179208) (owner: 10Hashar) [21:58:17] where "it's" is twentyafterfour 's better code [21:58:43] ok. well...I will leave that task open, and I'll dig a little more to see how this could have happened. [21:59:06] FYI there are logs of many deployments on tin.eqiad.wmnet:/srv/deployment/ores/deploy/scap/logs [21:59:26] and you can use: scap deploy-log -v -f [path/to/log/file] to look at them [21:59:52] I'm going to dig in there a bit [22:01:19] 10Continuous-Integration-Infrastructure (shipyard), 10Release-Engineering-Team (Kanban), 10Patch-For-Review: Port castor to support docker container - https://phabricator.wikimedia.org/T179208#3721469 (10hashar) 05Open>03Resolved Some basis have been done in https://gerrit.wikimedia.org/r/385390 That i... [22:02:26] ty for the command [22:06:17] thcipriani: [22:06:34] thcipriani: lol I’ve broken some things, could you rm -rf /srv/deployment/ores/venv on ores1002 and 1003? [22:06:54] I forgot about a bug I introduced which breaks virtualenv irreparably. [22:07:25] heh [22:08:13] awight: heh, sure I can remove it...will that break everything? [22:08:15] 8D my forte. In this case, I’m doing a stunt where I update pip in order to use newer wheels, but it doesn’t update the virtualenv bin/pip correctly, it sems. 
[22:08:22] it will, but non-production so NBD [22:08:38] ohhh k, doing [22:08:52] well, we don't want it spamming the beta logs for very long [22:09:14] awight: done [22:09:16] greg-g: luckily, this is a production cluster which hasn’t been pooled yet. [22:09:22] thcipriani: thanks! [22:09:40] oh, right! [22:10:16] legoktm: thanks to thcipriani and his umask stuff ;) [22:10:27] Just landed back from Berlin! Time to head home! [22:12:26] addshore: weee ? [22:13:07] Weee! [22:13:29] :) [22:14:34] PROBLEM - Puppet errors on deployment-pdfrender02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [22:19:41] thcipriani: XioNoX may start bugging you soon about scap3 stuff. He's got a django app that he wants to deploy for some techops stuff [22:22:09] bd808: okie doke, sounds good. Just realizing Scap3 Deployment Guide is a redlink https://wikitech.wikimedia.org/wiki/Scap3 :( [22:23:09] yeah :) [22:23:39] I kind of started on something about deploying striker with scap3 but haven't gotten very far [22:24:04] https://wikitech.wikimedia.org/wiki/Toolsadmin.wikimedia.org/Build [22:24:26] nice [22:24:34] I swear I started on this page at some point. Just need to find where I put the draft... [22:31:55] * paladox had 1am shown twice on sunday :) [22:47:54] (03PS1) 10Hashar: tox-docker generic job [integration/config] - 10https://gerrit.wikimedia.org/r/387459 [22:51:29] (03PS2) 10Hashar: tox-docker generic job [integration/config] - 10https://gerrit.wikimedia.org/r/387459 [22:52:25] 10Gitblit-Deprecate, 10Epic, 10MW-1.30-release-notes (WMF-deploy-2017-07-18_(1.30.0-wmf.10)), 10Patch-For-Review: Fix references to git.wikimedia.org in all repos - https://phabricator.wikimedia.org/T139089#3721790 (10Umherirrender) Search without i18n php shim: ``` core + languages + Language.php 17... 
[22:54:35] RECOVERY - Puppet errors on deployment-pdfrender02 is OK: OK: Less than 1.00% above the threshold [0.0] [23:00:26] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [23:04:03] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [23:29:41] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:34:28] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 46805 bytes in 1.081 second response time [23:39:00] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:40:28] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0] [23:49:55] PROBLEM - Free space - all mounts on deployment-mediawiki04 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki04.diskspace.root.byte_percentfree (<11.11%) [23:53:48] PROBLEM - Free space - all mounts on deployment-eventlog02 is CRITICAL: CRITICAL: deployment-prep.deployment-eventlog02.diskspace.root.byte_percentfree (<11.11%) [23:59:15] 10Release-Engineering-Team (Watching / External), 10Scap, 10Operations: Scap: Standardize git version - https://phabricator.wikimedia.org/T179353#3721967 (10thcipriani)