[00:17:11] Yippee, build fixed!
[00:17:11] Project selenium-Flow » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #106: FIXED in 1 min 10 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/106/
[00:17:16] Yippee, build fixed!
[00:17:16] Project selenium-Flow » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #106: FIXED in 1 min 15 sec: https://integration.wikimedia.org/ci/job/selenium-Flow/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/106/
[00:44:31] scap: scap sync-(file|dir) breaks tab complete - https://phabricator.wikimedia.org/T142548#2538846 (Reedy)
[00:46:11] legoktm: OK, https://www.mediawiki.org/w/index.php?title=Commit-message-validator&oldid=2212306 works but (a) you don't need '/path/to/', it's into the repo, and (b) it dirties the repo, boo.
[00:46:56] James_F: I was thinking that you do it outside the repo somewhere, so you can use the same install for every repo, meaning you need an absolute path
[00:47:14] legoktm: Ah, OK. Yeah, that can work.
[00:47:31] (Also your instructions obliterate the post-commit hook if there's anything else there; you should append.)
[00:48:11] legoktm: Maybe add `cd ~` and simplify?
[00:50:23] legoktm: https://www.mediawiki.org/w/index.php?diff=2212438&oldid=2212306&title=Commit-message-validator&type=revision
[00:51:32] James_F|Away: thanks, that looks good
[01:40:15] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[01:45:17] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[01:51:16] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[01:52:24] (CR) Legoktm: [C: -1] "We should use the zuul templates instead of manually defining pipelines. I'll amend." [integration/config] - https://gerrit.wikimedia.org/r/303218 (owner: BryanDavis)
[01:55:31] (PS3) Legoktm: labs/striker: Add test and gate-and-submit [integration/config] - https://gerrit.wikimedia.org/r/303218 (owner: BryanDavis)
[01:55:53] (PS4) Legoktm: labs/striker: Add test and gate-and-submit [integration/config] - https://gerrit.wikimedia.org/r/303218 (owner: BryanDavis)
[01:56:43] (CR) Legoktm: [C: 2] labs/striker: Add test and gate-and-submit [integration/config] - https://gerrit.wikimedia.org/r/303218 (owner: BryanDavis)
[01:57:24] (Merged) jenkins-bot: labs/striker: Add test and gate-and-submit [integration/config] - https://gerrit.wikimedia.org/r/303218 (owner: BryanDavis)
[01:58:11] !log deploying https://gerrit.wikimedia.org/r/303218
[01:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[01:59:18] bd808: ^
[02:17:00] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #113: FAILURE in 3 min 59 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/113/
[02:31:16] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[02:34:51] "The sqlite data files are binary blobs and as such will not rsync well. Scap could either invent yet another binary->text->binary processing step as we did with the l10n cache files, or we could figure out a better way to sync blobs."
[02:35:17] bd808: in what sense do binary blobs "not rsync well"?
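The exchange between 00:46 and 00:48 makes two points about installing commit-message-validator as a git hook. First, install it once outside any repository and call it by absolute path. Second, append to an existing post-commit hook instead of overwriting it. Below is a minimal Python sketch of the append step. It assumes the hook is a POSIX shell script and that the validator is started by a single command line. `append_hook_line` is a hypothetical helper and is not part of the validator or the wiki instructions.

```python
import stat
from pathlib import Path


def append_hook_line(hook_path: Path, line: str) -> bool:
    """Append `line` to a git hook script without clobbering what is there.

    Returns True if the hook was changed, False if `line` was already present.
    """
    text = hook_path.read_text() if hook_path.exists() else ""
    if line in text.splitlines():
        return False  # already installed; keep repeated installs idempotent
    if not text.strip():
        text = "#!/bin/sh\n"  # new or empty hook: start with a shebang
    elif not text.endswith("\n"):
        text += "\n"  # don't glue our command onto the last existing line
    hook_path.write_text(text + line + "\n")
    # git skips hooks that are not executable, so make sure the bit is set
    mode = stat.S_IMODE(hook_path.stat().st_mode)
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

A call would look like `append_hook_line(Path(".git/hooks/post-commit"), "/absolute/path/to/validator-command")`, where the command is a placeholder for whatever the wiki page says to run. The check for the exact line up front stops repeated installs from stacking duplicate calls.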
[02:56:02] (CR) Legoktm: [C: 2] Convert check_output to str in Python 3 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303755 (https://phabricator.wikimedia.org/T142458) (owner: Legoktm)
[02:56:10] (Abandoned) Legoktm: [WIP] Support being used in commit-msg git hook [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303756 (https://phabricator.wikimedia.org/T142460) (owner: Legoktm)
[02:57:57] (Merged) jenkins-bot: Convert check_output to str in Python 3 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303755 (https://phabricator.wikimedia.org/T142458) (owner: Legoktm)
[02:59:22] TimStarling: those particular files don't rsync well in that the l10n build process seems to make a nearly completely different file each time a key is updated. So they end up sending many megs of diff for a relatively small content change
[02:59:38] ok
[03:00:01] what was the file size?
[03:00:02] oh.. not l10nupdate but hhvm change I guess
[03:00:23] I... actually don't remember. I could build some and find out
[03:01:24] the build harness I used when we were testing repo authoritative stuff is at https://github.com/bd808/bug-67168
[03:02:09] yeah, I have been looking at that code again
[03:04:34] legoktm: thanks for deploying that. I was being lazy and hoping you or hashar would take care of it :)
[03:04:42] heh, np :)
[03:06:44] heh "SRC_DIR=/usr/local/apache/common-local" that was a while ago
[03:08:02] (Abandoned) Legoktm: Add jsonchecker.py [integration/jenkins] - https://gerrit.wikimedia.org/r/192059 (https://phabricator.wikimedia.org/T73284) (owner: Legoktm)
[03:11:35] where is the current scap source code?
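The answer at 02:59:22 comes down to how rsync's delta transfer works. It only saves bandwidth when the new file still contains long runs of bytes from the old one. The sketch below is a simplified model of that matching, not rsync's actual implementation. It cuts the old file into fixed-size blocks and then looks for them at every byte offset of the new file. Real rsync finds candidate matches with a cheap rolling checksum before confirming them with a stronger hash. The 64-byte block size here is an arbitrary choice.

```python
import hashlib


def literal_bytes(old: bytes, new: bytes, block: int = 64) -> int:
    """Estimate how many bytes of `new` would have to be sent verbatim.

    Simplified model of rsync's delta transfer: `old` (the receiver's copy)
    is cut into non-overlapping blocks, and `new` (the sender's copy) is
    scanned at every byte offset for a window equal to one of those blocks.
    Matched windows cost only a block reference; everything else is literal.
    """
    known = {
        hashlib.sha256(old[i:i + block]).digest()
        for i in range(0, len(old) - block + 1, block)
    }
    literal = 0
    i = 0
    while i < len(new):
        window = new[i:i + block]
        if len(window) == block and hashlib.sha256(window).digest() in known:
            i += block  # whole block already on the receiver: skip it
        else:
            literal += 1  # no match at this offset: send this byte as-is
            i += 1
    return literal
```

Take a random 64,000-byte file and insert a few bytes in the middle: only a few dozen bytes end up literal. Flip one bit in every byte instead, as a stand-in for a rebuild that rewrites the whole file, and every byte becomes literal even though the "content" barely changed. That is the pattern described at 02:59:22.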
[03:11:56] https://phabricator.wikimedia.org/diffusion/MSCA/
[03:13:45] (Abandoned) Legoktm: Run mediawiki/core phpunit tests on sqlite again (in addition to mysql) [integration/config] - https://gerrit.wikimedia.org/r/207135 (owner: Legoktm)
[03:20:35] (CR) Legoktm: Add checkstyle publisher for mediawiki-core-phpcs job (3 comments) [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[03:20:44] (PS2) Legoktm: Add checkstyle publisher for mediawiki-core-phpcs job [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865)
[03:27:56] how do you propose changes to it?
[03:30:06] TimStarling: as differential changes -- arc diff
[03:30:21] https://secure.phabricator.com/book/phabricator/article/arcanist_diff/
[03:31:25] https://secure.phabricator.com/book/phabricator/article/arcanist_quick_start/ -- might be a good place to start reading
[03:31:37] I have to look it all up again every time still
[03:35:12] done: https://phabricator.wikimedia.org/D307
[03:36:59] nice :)
[03:40:34] TimStarling: the convention with differential is that the author lands the change after a reviewer approves it. So that's good to land whenever you'd like
[03:43:18] (PS1) Legoktm: Fix running tests on Python 3 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303970 (https://phabricator.wikimedia.org/T142455)
[03:46:36] bd808: wanna review that? ^
[03:49:13] I gather the jenkins failure is not my fault
[03:49:15] (CR) BryanDavis: [C: 2] Fix running tests on Python 3 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303970 (https://phabricator.wikimedia.org/T142455) (owner: Legoktm)
[03:49:36] TimStarling: I don't think I've ever seen it pass
[03:49:43] (Merged) jenkins-bot: Fix running tests on Python 3 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303970 (https://phabricator.wikimedia.org/T142455) (owner: Legoktm)
[03:51:05] Exception: You do not have permission to push to this repository.
[03:51:18] also, I can't edit the diffusion project either
[03:51:23] " Members of the project "Repository-Admins" can take this action."
[03:51:24] hmm... that's lame. /me looks at perms
[03:52:05] https://phabricator.wikimedia.org/project/members/85/
[03:52:44] the push users are me, Amir, releng, and ops
[03:53:22] I'll land it for you I guess
[03:59:50] TimStarling: landed. phab is still trying to digest the change. -- https://phabricator.wikimedia.org/rMSCA821d7c7
[04:00:00] thanks
[04:04:55] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #104: FAILURE in 8 min 55 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/104/
[04:11:17] Continuous-Integration-Config, Patch-For-Review: Set up composer-test for all MW extensions where it isn't broken - https://phabricator.wikimedia.org/T124342#2538983 (Legoktm) a:Legoktm>Paladox @paladox: I'm assigning this to you since I think you were working on this last - feel free to unassign...
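The commit-message-validator patches "Convert check_output to str in Python 3" (02:56) and "Fix running tests on Python 3" (03:43) deal with a common porting issue. On Python 2, `subprocess.check_output()` returned a byte `str` that code could treat as text. On Python 3 it returns `bytes`, so comparing or splitting its output against `str` literals breaks. Below is a minimal sketch of one standard fix. It is not necessarily what those patches do, and `run_text` is a hypothetical name.

```python
import subprocess


def run_text(*cmd: str) -> str:
    """Run `cmd` and return its standard output as str.

    On Python 3, subprocess.check_output() returns bytes unless asked to
    decode. universal_newlines=True (also spelled text=True since Python 3.7)
    decodes using the locale's preferred encoding and normalizes line
    endings to "\n". Raises CalledProcessError on a non-zero exit status.
    """
    return subprocess.check_output(list(cmd), universal_newlines=True)
```

An alternative is to keep the `bytes` and call `.decode("utf-8")` explicitly. That makes sense when output such as git commit messages should be decoded as UTF-8 no matter what locale the CI host runs with.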
[04:12:30] Continuous-Integration-Infrastructure, Patch-For-Review: Fetch dependencies using composer instead of cloning mediawiki/vendor for non-wmf branches - https://phabricator.wikimedia.org/T90303#2538986 (Legoktm) a:Legoktm>None
[04:13:18] Project selenium-MultimediaViewer » chrome,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #104: FAILURE in 17 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/104/
[04:13:55] MediaWiki-Codesniffer, MW-1.27-release-notes, Upstream: @codingStandardsIgnoreStart only works with some types of comments - https://phabricator.wikimedia.org/T114213#2538989 (Legoktm) a:Legoktm>None
[04:14:11] Release-Engineering-Team: Send email of last day's SAL entries to releng@ - https://phabricator.wikimedia.org/T106443#2538990 (Legoktm) a:Legoktm>None Sorry, never found time to work on this.
[04:18:42] Project selenium-MultimediaViewer » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #104: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/104/
[04:27:34] (PS1) Legoktm: Add README and update URL [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303971
[04:27:36] (PS1) Legoktm: Release 0.3.0 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303972
[04:28:04] (CR) Legoktm: [C: 2] Release 0.3.0 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303972 (owner: Legoktm)
[04:28:08] (CR) Legoktm: [C: 2] Add README and update URL [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303971 (owner: Legoktm)
[04:28:37] (Merged) jenkins-bot: Add README and update URL [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303971 (owner: Legoktm)
[04:28:39] (Merged) jenkins-bot: Release 0.3.0 [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303972 (owner: Legoktm)
[04:44:08] (PS1) Legoktm: run-commit-message-validator: Switch to Python 3 [integration/jenkins] - https://gerrit.wikimedia.org/r/303974
[05:02:35] (PS1) Legoktm: Improve and fix detection of merge commits [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303977
[05:02:53] (CR) Legoktm: [C: 2] Improve and fix detection of merge commits [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303977 (owner: Legoktm)
[05:03:22] (Merged) jenkins-bot: Improve and fix detection of merge commits [integration/commit-message-validator] - https://gerrit.wikimedia.org/r/303977 (owner: Legoktm)
[05:05:56] (CR) Legoktm: [C: 2] run-commit-message-validator: Switch to Python 3 [integration/jenkins] - https://gerrit.wikimedia.org/r/303974 (owner: Legoktm)
[05:06:26] (Merged) jenkins-bot: run-commit-message-validator: Switch to Python 3 [integration/jenkins] - https://gerrit.wikimedia.org/r/303974 (owner: Legoktm)
[05:11:07] okaaay. I think it's all ready now.
[05:54:47] Beta-Cluster-Infrastructure, WikimediaPageViewInfo: Deploy WikimediaPageViewInfo extension to beta cluster - https://phabricator.wikimedia.org/T129602#2539028 (Legoktm) [Oops, apparently I never submitted my comment here] >>! In T129602#2123685, @greg wrote: > I'd like explicitly list the blockers to ha...
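The log doesn't show the contents of the 05:02 change "Improve and fix detection of merge commits". The sketch below only illustrates one standard way to tell a merge commit from an ordinary one. `git rev-list --parents -n 1 <rev>` prints the commit id followed by its parent ids, and a merge commit has more than one parent. The helper names are hypothetical and are not the validator's API.

```python
import subprocess
from typing import List, Optional


def parents_from_rev_list(line: str) -> List[str]:
    """Parse one line of `git rev-list --parents` output.

    The line is '<commit> <parent1> <parent2> ...'; a root commit has none.
    """
    return line.split()[1:]


def is_merge(rev_list_line: str) -> bool:
    """A merge commit is one with two or more parents."""
    return len(parents_from_rev_list(rev_list_line)) > 1


def is_merge_commit(rev: str = "HEAD", cwd: Optional[str] = None) -> bool:
    """Ask git about `rev` in the repository at `cwd` (needs git on PATH)."""
    out = subprocess.check_output(
        ["git", "rev-list", "--parents", "-n", "1", rev],
        cwd=cwd,
        universal_newlines=True,
    )
    return is_merge(out)
```

Keeping the parsing separate from the subprocess call means the rule itself can be tested without a repository.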
[05:55:13] greg-g: ^ only took me 4 months to respond, sorry
[06:24:38] (PS3) Legoktm: Add checkstyle publisher for mediawiki-core-phpcs job [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865)
[06:25:18] (CR) Legoktm: "Hashar: I updated the job and ran it (), but I don't se" [integration/config] - https://gerrit.wikimedia.org/r/243869 (https://phabricator.wikimedia.org/T113865) (owner: Legoktm)
[06:31:15] Gerrit-Migration, Differential, Documentation: Document use of Owners in Phabricator and advertise it - https://phabricator.wikimedia.org/T128372#2539051 (mmodell) Should we mark this as resolved or should we make some attempt to further promote the use of owners?
[07:01:16] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[07:11:15] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[07:12:21] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3230 bytes in 0.076 second response time
[07:13:33] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3232 bytes in 1.078 second response time
[07:14:02] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3230 bytes in 0.080 second response time
[08:05:08] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 588 bytes in 0.003 second response time
[08:06:19] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 590 bytes in 0.008 second response time
[08:14:36] [V6riZQpEEH8AAB7KA8sAAAAG] /wiki/Main_Page?debug=true MWException from line 335 of /srv/mediawiki/php-master/includes/MagicWord.php: Error: invalid magic word 'coordinates'
[08:16:04] on it
[08:22:52] Continuous-Integration-Config, Patch-For-Review: Set up composer-test for all MW extensions where it isn't broken - https://phabricator.wikimedia.org/T124342#2539183 (Paladox) Thanks and yep i am working on this.
[08:48:39] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 44529 bytes in 5.633 second response time
[08:49:01] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 44527 bytes in 0.973 second response time
[08:52:19] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 44522 bytes in 1.403 second response time
[08:57:41] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Reading-Web-Backlog, Reading-Web-Sprint-78-Terminal-Velocity, Unplanned-Sprint-Work: MultimediaViewer tests fail with waiting for {:class=>"mw-mmv-final-image"} (Firefox only) - https://phabricator.wikimedia.org/T142423#2534386 (phuedx) [...
[09:20:15] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[09:25:38] Beta-Cluster-Infrastructure, Commons, Multimedia: Setup deployment-imagescaler host(s) in Beta Cluster - https://phabricator.wikimedia.org/T142289#2539296 (Gilles) Thumbor is meant to replace mediawiki image scaling entirely. We're not considering running both on the same machine. deployment-imagesca...
[09:27:48] Beta-Cluster-Infrastructure, Commons, Multimedia: Setup deployment-imagescaler host(s) in Beta Cluster - https://phabricator.wikimedia.org/T142289#2539299 (Gilles) Or we could also move faster on beta and have thumbnailing done entirely by thumbor there as soon as the swift part is set up, that's als...
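The Main page alerts at 08:05 and 08:06 combine two facts: a `301 TLS Redirect` response from `http://...:80/`, and "string 'Wikipedia' not found". The log doesn't show how those checks are configured. One plausible reading is that the probe fetches plain HTTP and runs its string match against the small redirect response, not the page it points to. The self-contained sketch below reproduces that failure mode locally. The handler is a hypothetical stand-in for the beta frontend. Python's `http.client` never follows redirects, which mimics a probe that doesn't follow them either.

```python
import http.client
import http.server
import threading
from typing import Tuple


class TLSRedirectHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical plain-HTTP frontend that redirects every GET to HTTPS."""

    def do_GET(self) -> None:
        body = b"<html><body>301 Moved Permanently</body></html>"
        self.send_response(301, "TLS Redirect")
        self.send_header("Location", "https://en.wikipedia.beta.wmflabs.org" + self.path)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:
        pass  # keep the demo quiet


def probe(port: int, path: str, needle: str) -> Tuple[int, bool]:
    """GET `path` over plain HTTP without following redirects.

    Returns the status code and whether `needle` occurs in the response body.
    """
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, needle.encode() in resp.read()
    finally:
        conn.close()


def run_demo() -> Tuple[int, bool]:
    """Start the local redirecting server on a free port, probe it, shut down."""
    server = http.server.HTTPServer(("127.0.0.1", 0), TLSRedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return probe(server.server_port, "/wiki/Main_Page?debug=true", "Wikipedia")
    finally:
        server.shutdown()
        server.server_close()
```

`run_demo()` returns a 301 status with the string not found, which is the same shape as the alert text. If that reading is right, such a check would report CRITICAL no matter what state MediaWiki is in behind the redirect.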
[09:30:47] MediaWiki-Codesniffer: Update squizlabs/PHP_CodeSniffer to 3.x - https://phabricator.wikimedia.org/T142474#2539307 (Paladox)
[09:40:16] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[09:46:17] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[11:26:14] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[11:38:15] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[12:01:22] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #108: FAILURE in 21 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/108/
[12:08:16] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[12:22:45] Project selenium-GettingStarted » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #109: FAILURE in 45 sec: https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/109/
[13:20:07] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Reading-Web-Backlog, Reading-Web-Sprint-78-Terminal-Velocity, Unplanned-Sprint-Work: MultimediaViewer tests fail with waiting for {:class=>"mw-mmv-final-image"} (Firefox only) - https://phabricator.wikimedia.org/T142423#2539746 (dr0ptp4kt...
[13:26:06] Hey, does the job runner in beta have a problem?
[13:26:41] My jobs for ores scores get done but no score is being added (and when I run the maintenance script, it works)
[13:26:54] probably there is something wrong with db connection
[13:27:16] no ores-related error in logstash though
[14:11:09] Release-Engineering-Team, Operations: Manage Appveyor account - https://phabricator.wikimedia.org/T104306#2539826 (MoritzMuehlenhoff) a:MoritzMuehlenhoff>None
[14:22:18] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:34:01] Yippee, build fixed!
[14:34:01] Project selenium-WikiLove » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #108: FIXED in 2 min 0 sec: https://integration.wikimedia.org/ci/job/selenium-WikiLove/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/108/
[14:52:18] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:05:16] PROBLEM - Puppet staleness on deployment-cache-upload04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[15:20:17] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[15:25:20] (CR) Thcipriani: [C: 2] update delete-stale-branch to use keyholder [tools/release] - https://gerrit.wikimedia.org/r/303913 (owner: 20after4)
[15:30:59] (Merged) jenkins-bot: update delete-stale-branch to use keyholder [tools/release] - https://gerrit.wikimedia.org/r/303913 (owner: 20after4)
[15:46:49] Browser-Tests, MediaWiki-extensions-MultimediaViewer, Reading-Web-Backlog, Reading-Web-Sprint-78-Terminal-Velocity, Unplanned-Sprint-Work: MultimediaViewer tests fail with waiting for {:class=>"mw-mmv-final-image"} (Firefox only) - https://phabricator.wikimedia.org/T142423#2540072 (jhobs) @p...
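The Puppet alerts at 14:22, 14:52 and 15:05 report the share of recent metric datapoints that lie above a fixed threshold. The figures 44.44% and 11.11% are consistent with 4 and 1 out of 9 datapoints. The log does not say how many points the check actually reads, though. The sketch below models only the arithmetic and the message format. Two things are assumptions: "above" meaning strictly greater, and a single OK/CRITICAL cutoff at 1% with no WARNING tier. The real check's cutoffs are configured elsewhere.

```python
from typing import Sequence


def percent_above(samples: Sequence[float], threshold: float) -> float:
    """Percentage of datapoints strictly greater than `threshold`."""
    if not samples:
        raise ValueError("no datapoints to evaluate")
    above = sum(1 for value in samples if value > threshold)
    return 100.0 * above / len(samples)


def threshold_status(samples: Sequence[float], threshold: float,
                     ok_below: float = 1.0) -> str:
    """Render a status line in the style of the alerts in this log.

    Assumption: less than `ok_below` percent is OK, anything else CRITICAL.
    """
    pct = percent_above(samples, threshold)
    if pct < ok_below:
        return f"OK: Less than {ok_below:.2f}% above the threshold [{threshold}]"
    return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"
```

For example, nine datapoints of which four are above 0.0 give 4/9, about 44.44%, which matches the 14:22 message.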
[15:52:03] (CR) Paladox: [C: -1] Lets install MySQL before installing extension and extensions dependencies [integration/config] - https://gerrit.wikimedia.org/r/264333 (owner: Paladox)
[15:56:23] (CR) Paladox: [C: 1] Lets install MySQL before installing extension and extensions dependencies [integration/config] - https://gerrit.wikimedia.org/r/264333 (owner: Paladox)
[15:56:38] (CR) Paladox: Lets install MySQL before installing extension and extensions dependencies [integration/config] - https://gerrit.wikimedia.org/r/264333 (owner: Paladox)
[15:56:42] (Abandoned) Paladox: Lets install MySQL before installing extension and extensions dependencies [integration/config] - https://gerrit.wikimedia.org/r/264333 (owner: Paladox)
[16:00:52] (PS2) Paladox: [Cards] Add test npm-run-doc [integration/config] - https://gerrit.wikimedia.org/r/278447
[16:04:14] (PS6) Paladox: In node-4.3 clone under src [integration/config] - https://gerrit.wikimedia.org/r/290702 (https://phabricator.wikimedia.org/T130208)
[16:04:56] (PS4) Paladox: Add pywikibot-npm-node-4.3 experimental job to pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/290703 (https://phabricator.wikimedia.org/T130207)
[16:07:03] (CR) jenkins-bot: [V: -1] Add pywikibot-npm-node-4.3 experimental job to pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/290703 (https://phabricator.wikimedia.org/T130207) (owner: Paladox)
[16:07:05] (Abandoned) Paladox: [timeline] Update Jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/291670 (owner: Paladox)
[16:09:52] (PS5) Paladox: Add pywikibot-npm-node-4.3 experimental job to pywikibot/i18n [integration/config] - https://gerrit.wikimedia.org/r/290703 (https://phabricator.wikimedia.org/T130207)
[16:12:40] (PS3) Paladox: [timeline] Add test extensions-unittests-generic [integration/config] - https://gerrit.wikimedia.org/r/303305
[16:15:29] Beta-Cluster-Infrastructure, WikimediaPageViewInfo: Deploy WikimediaPageViewInfo extension to beta cluster - https://phabricator.wikimedia.org/T129602#2540183 (greg) >>! In T129602#2539028, @Legoktm wrote: > I don't really have an answer for you, because I don't understand the costs or consequences of se...
[16:18:29] (PS3) Paladox: [wikidata/query/gui-deploy] make npm test voting [integration/config] - https://gerrit.wikimedia.org/r/291736
[16:19:31] (PS7) Paladox: Migrate wikimedia/fundraising/dash node-0.10 test to node-4.3 test [integration/config] - https://gerrit.wikimedia.org/r/291603
[16:19:56] (PS7) Paladox: [CentralAuth] Add composer-test test [integration/config] - https://gerrit.wikimedia.org/r/288819
[16:20:41] paladox: are those just a bunch of rebases? is there anything new regarding them that would require that rebase? or is it just to remove the "this patch needs a rebase" notice?
[16:21:13] greg-g i am going through my patches, since im aware i need to clean up most of them since most need abandoning.
[16:21:36] But most of those are still waiting for review and were last updated 1month+ ago
[16:21:43] and hi
[16:21:48] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:22:10] paladox: hola!
[16:22:18] :)
[16:22:33] paladox: yeah, just wondering if the rebase will actually do any good or if it's just noise :) but, I'll move on ;)
[16:23:04] greg-g oh, sorry for the noise, some rebases i found actually needed fixing locally.
[16:23:22] since it is hard to see if there are merge conflicts, since they always say that
[16:23:23] :)
[16:26:27] * greg-g nods
[16:26:43] yeah, it'd be nice if gerrit knew beforehand
[16:27:08] Oh
[16:27:22] * paladox likes the new inline edit, so easy now.
[16:28:35] (Abandoned) Paladox: [Cards] Add test npm-run-doc [integration/config] - https://gerrit.wikimedia.org/r/278447 (owner: Paladox)
[16:28:52] (Abandoned) Paladox: Add composer-test53 and npm-node-4.3 test to experimental: in some repos [integration/config] - https://gerrit.wikimedia.org/r/280664 (owner: Paladox)
[16:30:17] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[16:30:43] Release-Engineering-Team (Long-Lived-Branches), Scap3: Deploy mediawiki release tools repo (rMREL) with scap3 - https://phabricator.wikimedia.org/T142588#2540290 (mmodell)
[16:31:14] Release-Engineering-Team (Long-Lived-Branches), Scap3: make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2540325 (mmodell)
[16:31:26] Release-Engineering-Team (Long-Lived-Branches), Scap3: Deploy mediawiki release tools repo (rMREL) with scap3 - https://phabricator.wikimedia.org/T142588#2540338 (mmodell)
[16:31:28] Release-Engineering-Team (Long-Lived-Branches), Scap3: make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2540337 (mmodell)
[16:35:56] Release-Engineering-Team (Long-Lived-Branches), Scap3: make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2540344 (mmodell) a:mmodell
[16:36:17] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[16:37:16] shinken is not so happy today :/
[16:42:17] (Abandoned) Paladox: [Metrolook] Add mw-checks-test test [integration/config] - https://gerrit.wikimedia.org/r/265099 (owner: Paladox)
[16:42:32] Release-Engineering-Team (Long-Lived-Branches), Scap3: make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2540413 (mmodell)
[16:44:28] Release-Engineering-Team (Long-Lived-Branches), Scap3: make scap3 look in PWD to find local CLI extensions - https://phabricator.wikimedia.org/T142590#2540325 (mmodell)
[16:44:30] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches): create merge-wmf-branch the successor to make-wmf-branch - https://phabricator.wikimedia.org/T140918#2540426 (mmodell)
[16:44:59] (PS2) Paladox: [JsonData] Add npm test and composer-test [integration/config] - https://gerrit.wikimedia.org/r/280772
[16:45:06] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches): create `scap swat` command (the successor to make-wmf-branch) - https://phabricator.wikimedia.org/T140918#2480927 (mmodell)
[16:45:47] Deployment-Systems, Release-Engineering-Team (Long-Lived-Branches), WorkType-NewFunctionality: Use subrepos instead of git submodules for deployed MediaWiki extensions - https://phabricator.wikimedia.org/T98834#2540434 (mmodell) Open>declined I don't think this is in the cards after all.
[16:45:54] (PS3) Paladox: [JsonData] Add composer-test [integration/config] - https://gerrit.wikimedia.org/r/280772
[16:46:16] RECOVERY - SSH on deployment-redis02 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[16:47:32] (Abandoned) Paladox: [SolrStore] Switch unit tests to composer non-voting unit tests [integration/config] - https://gerrit.wikimedia.org/r/284492 (owner: Paladox)
[16:52:17] PROBLEM - SSH on deployment-redis02 is CRITICAL: Server answer
[16:52:50] Gerrit-Migration, releng-201617-q3, Documentation: Update Code Review related documentation on wiki pages from Gerrit to Differential - https://phabricator.wikimedia.org/T207#2540498 (greg)
[16:52:55] Gerrit-Migration, Differential, Documentation: Document use of Owners in Phabricator and advertise it - https://phabricator.wikimedia.org/T128372#2540494 (greg) Open>Resolved I think we're done for now, honestly. Thanks Andre and Mukunda!
[17:02:43] * paladox can now order from target.com in the uk :)
[17:06:53] Release-Engineering-Team, Pywikibot-General: Manage Appveyor account - https://phabricator.wikimedia.org/T104306#2540561 (greg) So, just to be clear, the pywikibot team wants the WMF Release Engineering team to own this account? As this account is not something that RelEng set up nor maintains I will s...
[17:07:39] Release-Engineering-Team: Use pwstore (a shared gpg-encrypted password store) for Release Engineering related passwords - https://phabricator.wikimedia.org/T139093#2540564 (greg)
[17:07:40] Release-Engineering-Team, Pywikibot-General: Manage Appveyor account - https://phabricator.wikimedia.org/T104306#2540563 (greg)
[17:08:24] Release-Engineering-Team, Pywikibot-General: Share Appveyor account credentials with Release Engineering - https://phabricator.wikimedia.org/T104306#1413120 (greg)
[17:08:35] Release-Engineering-Team, Pywikibot-General: Share Appveyor account credentials with Release Engineering - https://phabricator.wikimedia.org/T104306#1413120 (greg)
[17:21:44] MediaWiki-Codesniffer: Update squizlabs/PHP_CodeSniffer to 3.x - https://phabricator.wikimedia.org/T142474#2540627 (Paladox) @legoktm could I assign you this please?
[17:25:45] Yippee, build fixed!
[17:25:45] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #109: FIXED in 43 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/109/
[17:26:22] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2540639 (Jdlrobson)
[17:26:45] Browser-Tests-Infrastructure, MediaWiki-extensions-MultimediaViewer, Reading-Web-Backlog, Patch-For-Review, and 2 others: A JSON text must at least contain two octets! (JSON::ParserError) in MultimediaViewer, Echo, Flow, RelatedArticles, MobileFront... - https://phabricator.wikimedia.org/T129483#2540655
[17:26:47] Browser-Tests-Infrastructure, Reading-Web-Backlog, Patch-For-Review, Reading-Web-Sprint-78-Terminal-Velocity, and 3 others: Upgrade Selenium gem for various reading web opened extensions - https://phabricator.wikimedia.org/T142141#2540651 (Jdlrobson) Open>Resolved The task requested that...
[17:36:34] fyi thcipriani https://phabricator.wikimedia.org/T142600
[17:38:07] Hi jdlrobson. Did you get a patch deployment yesterday?
[17:42:46] Deployment-Systems, Scap3 (Scap3-MediaWiki-MVP), scap, MediaWiki-API, and 3 others: Create a script to run test requests for the MediaWiki service - https://phabricator.wikimedia.org/T136839#2540773 (mobrovac) @Anomie thnx a lot for the comments! You are way ahead of what I had in mind for the fi...
[18:10:14] Release-Engineering-Team, User-greg: Determine timing of 2016 RelEng team offsite - https://phabricator.wikimedia.org/T137720#2540978 (greg) a:greg>None
[18:19:32] Nodepool is down again
[18:19:40] greg-g ^^ https://integration.wikimedia.org/zuul/
[18:20:02] Oh wait
[18:20:07] wikibase is using all the resources
[18:20:11] and it was at the bottom
[18:20:25] Oh never mind
[18:20:33] that was using normal instances and not nodepool
[18:20:44] so nodepool is down it seems
[18:21:09] Oh its working
[18:21:31] for experimental jobs
[18:23:21] check https://integration.wikimedia.org/ci/ as well, it sure doesn't look down to me
[18:26:31] greg-g yep sorry, it seemed that at the bottom it was using the resources, so i presumed because of that it was down since it didnt look like it was running at the top
[18:26:35] and sorry for pining
[18:26:40] pining = pinging
[18:26:46] PROBLEM - Host Generic Beta Cluster is DOWN: PING CRITICAL - Packet loss = 100%
[18:34:53] not sure what that is testing, but https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page just loaded for me
[18:35:31] greg-g: It's just generically down
[18:36:03] I'm just generically not believing that :)
[18:41:24] just emailed an incident report
[18:49:54] RECOVERY - Host Generic Beta Cluster is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[18:54:33] https://wikitech.wikimedia.org/wiki/Incident_documentation/20160809-MediaWiki
[19:31:28] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Services, Easy, Patch-For-Review: npm-node-4.3 jobs are failing because node is version 4.4.6 - https://phabricator.wikimedia.org/T139374#2541301 (mobrovac) a:Paladox This has been worked around. Resolving for now.
[19:32:30] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Services, Easy, Patch-For-Review: npm-node-4.3 jobs are failing because node is version 4.4.6 - https://phabricator.wikimedia.org/T139374#2541317 (Paladox) @Krenair yeh I think we should rename them to v4 instead of it bein...
[19:32:59] Woops wrong person ^^ was meant for krinkle sorry.
[19:48:46] paladox: Yeah, you can edit the comment
[19:49:07] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Services, Easy, Patch-For-Review: npm-node-4.3 jobs are failing because node is version 4.4.6 - https://phabricator.wikimedia.org/T139374#2541379 (Krinkle) Open>Resolved @Paladox Yeah, let's track that separately th...
[19:49:07] Krinkle yep i did.
[19:49:09] thx
[19:49:16] Your welcome :)
[19:51:09] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:51:43] Krinkle that should be easy to do, i just used github now. Im creating the patch
[19:51:55] I hope i did not just gnix myself and when i upload it fails LOl
[20:10:29] * paladox wonders if we can do https://phabricator.wikimedia.org/T124690
[20:10:31] :)
[20:14:12] (PS1) Paladox: Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068
[20:14:18] Krinkle ^^
[20:15:02] (CR) jenkins-bot: [V: -1] Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068 (owner: Paladox)
[20:16:56] (PS2) Paladox: Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068
[20:18:06] (CR) jenkins-bot: [V: -1] Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068 (owner: Paladox)
[20:19:40] (PS3) Paladox: Rename npm-node-4.3 to npm-node-4 [integration/config] - https://gerrit.wikimedia.org/r/304068
[20:22:41] Release-Engineering-Team, Developer-Relations (Jul-Sep-2016): Release Engineering Offsite - https://phabricator.wikimedia.org/T141941#2541484 (Rfarrand) Yup - this is just basically a place holder to show what I am working on. Will take an opportunity to update. In discussions to work out details with a...
[20:51:40] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2541616 (Jdlrobson) I can replicate this. x-analytics: "ns=-1;special=Badtitle;WMF-Last-Access=10-Aug-2016;https=1" x-client-ip: "198.73.209.4" MediawikiApi...
[20:52:35] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2541624 (Jdlrobson) ``` 2.1.1 :001 > require 'mediawiki_api' => true 2.1.1 :002 > c = MediawikiApi::Client.new('https://en.wikipedia.beta.wmflabs.org/w/ap...
[20:52:50] ^ tgr
[20:59:29] Deployment-Systems, Operations: Have fallback communication channel when freenode has problems - https://phabricator.wikimedia.org/T127904#2057999 (Dzahn) WMF used to run a freenode server (T82958) but we are now decom;ing it (T120752). That would have been the perfect fallback, like if there are netspli...
[21:01:43] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2541660 (Tgr) Oh, right, I didn't think of that. The least painful way to fix it is just to change passwords. The old login API just submits the username an...
[21:08:44] greg-g hi, it seems nodepool is vary slow https://integration.wikimedia.org/zuul/ only one test is running.
[21:08:47] sorry for ping
[21:08:57] and sorry if i am wrong again. But it seems very slow
[21:09:31] it is slow, yes, but there are now 3 instances
[21:09:37] I'm not sure what to make of it
[21:10:21] Oh
[21:11:57] 21:10 < thciprian> there are a bunch marked as "building" in nodepool list
[21:12:06] 21:10 < thciprian> so "coming soon" I suppose
[21:12:10] Oh
[21:12:14] Thanks
[21:22:49] Deployment-Systems, Operations: Have fallback communication channel when freenode has problems - https://phabricator.wikimedia.org/T127904#2541749 (greg) I'm inclined to just have an official Conpherence room for this. It'd need to be clear that this room (or any solution, really) is **only for backup pu...
[21:26:59] MediaWiki-Codesniffer: Update squizlabs/PHP_CodeSniffer to 3.x - https://phabricator.wikimedia.org/T142474#2541754 (Legoktm) Oh wow, this looks awesome. But it does require updating every single one of our sniffs, so I would like to finish the current GSoC project, do a 0.8.0 release, and then begin working...
[21:28:44] MediaWiki-Codesniffer: Update squizlabs/PHP_CodeSniffer to 3.x - https://phabricator.wikimedia.org/T142474#2541763 (Paladox) @legoktm yeh we get speed increase too since it can now test at the same time. So we choose the file limit it tests at the same time. We should also make this release a 1.x as it is br...
[21:36:27] hmm
[21:36:45] no trusty nodes/
[21:36:46] ?
[21:36:51] Oh
[21:36:57] I think it is slow
[21:37:19] no trusy nodepool nodes, that is?
[21:37:37] looks like it from zuul...
[21:38:24] I think that may be
[21:38:30] because it has to decide what
[21:38:46] avilable nodes are avilable for nodepool, and currently there arnt alot
[21:38:58] there is one trusty node building currently according to nodepool list
[21:39:01] today, even though there should be at least 2 for trusty and 8 for jessie
[21:39:32] although, right now, there are only 6 nodes shown in nodepool list.
[21:40:40] so
[21:40:47] what did I miss?
[21:41:10] tl;dr: the jobs are stuck on running tests on a trusty instance, nodepool is now building 1 such instance, but we don't know why it's so far behind as of now
[21:41:37] * apergos heaves a heavy sigh
[21:41:43] well it must be smoothie time
[21:41:55] LOL
[21:41:58] I was going to wait until officially "clocking out" but that seems not in the cards
[21:42:13] I have the other half smoothie from earlier, might as well get down to it
[21:42:14] thcipriani: I see the nodes building and then going away, but jobs aren't starting in zuul...
[21:42:25] hrm. Seeing a lot of Forbidden: Quota exceeded for instances: Requested 1, but already used 10 of 10 instances (HTTP 403) in the nodepool logs, but I only see 6 instances when I run nodepool list I'm unclear why that's happening?
[21:42:34] that happened to hashar ^^ on sunday
[21:42:44] hrm, /me checks email
[21:43:06] But instead of it being 6, it was 10
[21:43:19] but there wasent actually 10 avilable.
[21:43:45] the inode issue?
[21:44:01] inodes on gallium?
[21:44:12] Deployment-Systems, Operations: Have fallback communication channel when freenode has problems - https://phabricator.wikimedia.org/T127904#2057999 (Southparkfan) Do we prefer a fallback that cannot be impacted by a Wikimedia outage of any kind? Conpherence is an option, but it is not off-site; a network...
[21:44:30] He was going to write to the ops list
[21:44:41] So if Antoine wrote about inodes on sun/mon..
[21:44:42] That'll be it :)
[21:44:52] Oh wait
[21:44:54] jenkins
[21:45:08] He did have to delete a file from jenkins
[21:45:12] https://phabricator.wikimedia.org/T126552
[21:45:17] "Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected"
[21:45:23] Deployment-Systems, Operations: Have fallback communication channel when freenode has problems - https://phabricator.wikimedia.org/T127904#2541910 (Dzahn) There is also the external VM that runs wikitech-static. It is outside WMF infra for this reason.
[21:45:27] Yeah
[21:45:31] Looks like an inode issue
[21:45:57] Yep
[21:46:13] Dirty fix is:
[21:46:13] ssh gallium
[21:46:13] find /var/lib/jenkins/config-history/config/nodes \
[21:46:13] -path '*_deleted_*' -delete
[21:46:27] Continuous-Integration-Infrastructure, Jenkins, Patch-For-Review: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2541914 (Paladox) I am wondering, can we set this as high priority please?
[21:46:29] Yep
[21:46:38] uh
[21:46:45] but that's not the issue
[21:46:57] cd: /var/lib/jenkins/config-history/config/nodes: No such file or directory also df -i shows 8% inodes used
[21:47:04] yeah, not it then
[21:47:07] Oh
[21:47:14] it only needs being cleaned out every few months or so
[21:47:24] oh
[21:47:35] * paladox wonders what it could be
[21:47:47] I see one instance in nodepool alien-list
[21:48:00] it's *just* trusty too right? not jessie?
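The "dirty fix" pasted at 21:46:13 removes every entry under Jenkins' config-history `nodes` directory whose full path contains `_deleted_`. Below is a minimal Python sketch of the same cleanup that previews its targets before anything is deleted. It assumes GNU `find` semantics for `-path` (a glob over the whole path, where `*` also matches `/`) and for `-delete` (which implies `-depth`, so the deepest entries go first). The function name `prune_deleted_history` and its dry-run default are hypothetical and do not belong to Jenkins or to T126552. As the log goes on to show, that directory did not exist on gallium and inode usage was only 8%, so this cleanup was not the fix for this incident.

```python
import os
from fnmatch import fnmatchcase

# Same glob as the -path test in the quoted find command.
PATTERN = "*_deleted_*"


def prune_deleted_history(root, dry_run=True):
    """List (and, unless dry_run, delete) entries under root matching PATTERN.

    Hypothetical helper mirroring `find ROOT -path '*_deleted_*' -delete`.
    A missing root yields no entries, because os.walk ignores errors by default.
    """
    matches = []
    # topdown=False yields children before their parents, like find's -depth,
    # so a matching directory is already empty by the time we reach it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            # As with find -path, '*' in fnmatch also matches '/'.
            if not fnmatchcase(path, PATTERN):
                continue
            matches.append(path)
            if dry_run:
                continue
            if os.path.isdir(path) and not os.path.islink(path):
                os.rmdir(path)  # its contents were removed earlier in the walk
            else:
                os.remove(path)  # regular files and symlinks
    return matches
```

Calling `prune_deleted_history("/var/lib/jenkins/config-history/config/nodes")` only returns the paths that match. Pass `dry_run=False` to delete them.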
[21:48:13] legoktm it is both
[21:48:21] restart nodepool then?
[21:48:22] just jessie has more resources
[21:48:25] or the zeromq thingy?
[21:48:40] LOL, yeh. But could possibly be labs
[21:48:55] since they did disable instance creation
[21:49:08] so, hmm, now I'm seeing 9 instance in nodepool list
[21:49:14] Oh
[21:49:15] maybe it's just super bogged down?
[21:49:45] no, trusty jobs are definitely not running
[21:49:54] strange since it seems to be running on one repo
[21:50:06] yeah, I don't see any trusty instances in the jenkins view
[21:50:11] seems to be getting slower and slower.
[21:50:17] yeah, I don't really see any jobs running. Must be some communication issue here. Lots of building, lots of deleting.
[21:51:01] oh
[21:51:15] I looked in https://integration.wikimedia.org/ci/log/ and didn't see anything obvious
[21:52:43] have we tried restarting gearman yet?
[21:52:46] no
[21:53:10] But then it would happen to the other jobs if it was gearman
[21:53:22] but we are only seeing the problem with nodepool.
[21:53:30] paladox: good point
[21:53:34] Maybe the rabitt thing needs restarting.
[21:53:51] yep, :)
[21:54:43] I only see one test working now
[21:55:01] that was it, jenkins just gave me the big +2
[21:55:06] :)
[21:55:06] I can create new vms, so I don't think there's anything wrong with rabbit or scheduling
[21:55:13] Oh
[21:56:04] yeah, I don't think it's that it can't create. more like it isn't communicating with zuul.
[21:56:34] Oh, i guess try restarting gearman and zuul
[21:57:07] huh, I was the only one who got lucky there? weird
[21:57:12] LOL
[21:57:15] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2541936 (Jdlrobson) @greg is it possible for your team to rule out whether the user that runs the beta cluster tests simply needs a password change?
[21:57:17] not complaining
[21:57:19] greg-g: ^
[21:58:07] I'm going to restart nodepool
[21:58:13] jdlrobson: don't need double pings :)
[21:58:17] Thanks
[21:59:30] !log restarted nodepool, no trusty instances were being used by jobs
[21:59:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:01:08] thcipriani: working now!
[22:01:25] :)
[22:01:49] that's good. still unclear on why it wasn't working before though :\
[22:02:33] would be nice if releng had more rights on that box, like: lsof or strace :(
[22:02:47] thcipriani: file a task (sorry)
[22:02:54] But weird trusty works now, but seems only 4 instances are working for nodepool
[22:03:50] I'm guessing that task will come to ops? If so just be nice and specific about what you want and an example (ie now) of how it would have come in handy
[22:04:32] that way it will take the minimum time possible
[22:04:52] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2541959 (greg) They can be changed in the jenkins credential store: https://integration.wikimedia.org/ci/credential-store/domain/selenium/ If it is changed...
[22:05:25] ack, will try to keep the access request task scope narrow
[22:06:05] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2541962 (greg) I don't see any warning about changing the password when I manually log in to this selenium user account
[22:06:22] * paladox goes and watch tv :)
[22:06:46] thanks thcipriani
[22:06:51] what happens to the things still in gate-and-submit?
[22:07:01] they'll get caught up
[22:07:21] they'll get processed slowly, IIRC gate-and-submit takes precedence over test so it *should* clear out first
[22:07:31] hm
[22:07:38] I'm just watching a lack of movement over there is all
[22:07:42] guess we'll see
[22:07:43] no need for rechecks though?
[22:07:46] No
[22:07:51] That makes it worse ;-)
[22:07:58] no, no rechecks please
[22:08:02] Like pressing F5 repeatedly when a website is slow ;-)
[22:08:07] oh, nm, it finished and just didn't email me
[22:08:08] * andrewbogott merges
[22:08:17] click on the top change there in gate-and-submit, you can see it now has some of the trusty tess done
[22:08:46] It's mainly backlogged because there's only 1 trusty instance and a couple of jobs required trusty. So other jobs got caught behind them. As best I can tell at least.
[22:08:50] Yay old branches that suck!
[22:08:59] ostriches: (and if there's no cache what so ever in front of said website ;) )
[22:09:04] There should be at least 2
[22:09:08] for trusty
[22:09:18] unless they changed
[22:09:22] I couldn't find where you see the trusty stuff happening
[22:09:32] it's gone now
[22:09:32] probably clicked the wrong thing somehow
[22:09:34] ah
[22:09:39] it was https://gerrit.wikimedia.org/r/#/c/304107/
[22:10:00] aka, blame ostriches
[22:10:02] ;)
[22:10:09] LOL
[22:10:41] hahahaha
[22:11:01] mister "nothing's wrong, I'll just got have coffee" *whistles innocently*
[22:11:04] *go
[22:11:10] LOL
[22:11:29] apergos: remember his old nick :)
[22:11:30] Nothing's wrong!
[22:11:34] :-D
[22:11:42] And my coffee was excellent :p
[22:11:49] well you can have my share
[22:11:51] hate the stuff
[22:12:11] I'm having your share of the smoothies after all
[22:12:12] coffeecoffeecoffee
[22:12:18] I want a smoothie now
[22:12:23] LOL
[22:13:04] come here and get it buddy
[22:13:18] lol
[22:14:54] apergos: I think it'll melt before I can....
[22:15:29] well, I offered
[22:16:10] * paladox goes back to watching tv 23:15pm bst here :)
[22:16:51] apergos: Offer appreciated. I'll hold that to you next time we're in the same city :)
[22:17:08] if we're not here I'll have to buy you one
[22:17:20] but if we're here you'll get one fresh outa the blender
[22:18:29] Continuous-Integration-Infrastructure, Jenkins, Patch-For-Review, Wikimedia-Incident: Jenkins files under /var/lib/jenkins/config-history/config need to be garbage collected - https://phabricator.wikimedia.org/T126552#2542027 (greg) There's not much use in petitioning about specific priority sett...
[22:19:07] seems like it's going to be slow for awhile
[22:19:20] good time for me to wander off and read...
[22:22:14] gate and submit is catching up, normal test queue will catch up quickly after that
[22:37:39] ostriches: Seems to be stuck again
[22:37:53] at least wikibase jobs seem stuck
[22:38:08] It's not stuck, it's just slow as fuck
[22:38:16] Hey I rhymed!
[22:38:20] I see no running nodepool instances
[22:38:24] https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm-composer/5851/console hm
[22:38:28] it's goin'
[22:38:29] just slow
[22:38:41] no output for 17 minutes
[22:38:46] OH wait what?
[22:38:50] That's dumb as shit
[22:38:56] I wish there was a quick view into what nodepool is doing
[22:39:12] Testing is for people who write bad code. Maybe we should write less bad code and then we don't need to test things!
[22:39:17] ;-)
[22:40:00] humans should strop writing code
[22:40:02] Made that one fail
[22:40:08] hoo: Silly humans
[22:40:10] thanks
[22:41:12] thcipriani: I can't confirm, but I see no nodepool instances in the jenkins ui
[22:41:22] I mean, I can't confirm via doing a nodepool list
[22:41:28] * greg-g doesn't have access
[22:41:32] hmm
[22:41:50] seem like there are some building
[22:41:57] k
[22:42:01] (sorry)
[22:42:24] looks like this time jessie aren't building this time?
[22:42:31] none
[22:42:42] well, in the jenkins ui at least
[22:42:47] and trusty now too.
[22:43:19] OuKB: yes, we know :)
[22:43:25] oh, you weren't online before
[22:44:17] hmm, it's trying to build some jessie instances, seemingly
[22:47:41] well.
[22:47:57] now instance creation/deletion may be busted.
[22:54:41] thcipriani: how terrible would it be if I moved some jobs back on to the normal trusty slaves? they're just sitting there unused right now
[22:54:56] andrewbogott: hmm, I think there may be something going on with nodepool image creation/deletion. I can't seem to manually delete a nodepool instance. The same IDs have been marked as building for a while now :\
[22:55:10] legoktm: not terrible at all in my view.
[22:56:10] thcipriani: jessie is moving again now
[22:56:29] thcipriani: that's one of the labvirt hosts acting up, I think yuvi is going to depool it
[22:56:37] it's acting in extreme slow motion for some reason
[22:56:43] oh, good to know!
[22:57:05] from -labs topic: Status: Normal, instance creation is disabled
[22:57:12] that's still accurate, I presume?
[22:57:15] oh
[22:57:53] greg-g: it's re-enabled but of course now we have an all new bug that just started 10 minutes ago
[22:58:01] heh
[22:58:14] andrewbogott: my condolences/sympathies
[22:58:41] I'm about to be on vacation where there's no phone service. So this will stop being my problem, somehow :/
[22:59:12] where is the discussion about this happening?
[23:03:03] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2542235 (Jdlrobson) Thanks @greg Looking closely all issues seem to be in api.create_page So it might be an issue with the login here: https://github.com/...
[23:06:09] Browser-Tests, Reading-Web-Sprint-78-Terminal-Velocity: Various browser tests failing due to login error - https://phabricator.wikimedia.org/T142600#2542243 (greg) sure, not sure how that changes the test i just did where I can log in using the username/password without prompt for changing it
[23:06:31] (PS1) Legoktm: Temporarily move composer-hhvm/php5 jobs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304131
[23:09:05] (CR) Zppix: [C: 1] Temporarily move composer-hhvm/php5 jobs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304131 (owner: Legoktm)
[23:11:54] nodes are very slowly being created...
[23:12:39] legoktm i guess it may be because of thcipriani: that's one of the labvirt hosts acting up, I think yuvi is going to depool it
[23:12:44] hm
[23:14:11] it's acting in extreme slow motion for some reason
[23:15:09] no jessie nodes though
[23:15:32] yeah, I'm still having trouble manually deleting nodes.
[23:15:42] (CR) Legoktm: [C: 2] Temporarily move composer-hhvm/php5 jobs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304131 (owner: Legoktm)
[23:17:36] legoktm we should revert that later today ^^ once we have found nodepool to be improved.
[23:24:17] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T139217#2542315 (greg)
[23:36:19] Hi nodepool has gone down
[23:36:22] no longer working
[23:36:45] We know.
[23:37:09] ok
[23:37:23] nodepool isn't down, it's just very slow apparently
[23:37:47] *probably* due to labs
[23:37:54] investigation is on-going
[23:37:55] it isent working since labs took down one of the labs virt hosts
[23:38:30] legoktm: i can tell by looking at zuul i've seen stuff sitting there for over 2 hrs now
[23:38:43] if theres anything i can do to help tell me
[23:38:57] Don't submit new changes and don't type "recheck" on everything ;-)
[23:39:10] Good thing i've not been doing that :P
[23:39:18] and i dont think i have access to recheck
[23:39:19] lol
[23:39:32] !log restarted rabbitmq on labcontrol1001
[23:39:39] Zppix ^^
[23:39:49] that means nothing to me :P
[23:39:57] (CR) Legoktm: [V: 2] Temporarily move composer-hhvm/php5 jobs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304131 (owner: Legoktm)
[23:40:04] It actually has to do with nodepool
[23:40:24] its prob time to upgrade the cpu or get a bigger nodepool cap
[23:40:58] !log deploying https://gerrit.wikimedia.org/r/304131
[23:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:41:16] LOL, actually we carnt since labs is out of space.
[23:41:23] developers - We fix bugs and break the nodepool while doing it :P
[23:41:30] lol
[23:41:43] let's keep the non-critical discussion to a minimum right now, please
[23:41:48] ok sorry
[23:41:59] wow.
[23:42:08] In about 30 seconds jenkins processed 12 jobs
[23:42:17] oh wow
[23:42:21] only 40+ to go
[23:43:51] (that was composer-hhvm/php55)
[23:44:02] oh
[23:44:30] (PS1) Paladox: Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/304145
[23:44:34] (CR) Paladox: [C: -1] Revert "Temporarily move composer-hhvm/php5 jobs off of nodepool" [integration/config] - https://gerrit.wikimedia.org/r/304145 (owner: Paladox)
[23:44:47] (CR) Paladox: "Wait until labs fix the problem." [integration/config] - https://gerrit.wikimedia.org/r/304145 (owner: Paladox)
[23:46:36] I don't even know if we should revert
[23:46:53] using an entire VM for these 4 second jobs is extremely wasteful
[23:47:02] heh
[23:47:32] !log stopping nodepool to clean up
[23:47:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:48:13] like, more than half of our slaves are sitting there doing nothing
[23:48:19] legoktm: we prob should merge the 4 second jobs with another vm that isnt used as much or doesnt use/need alot of power
[23:48:44] but idk what to do about the jessie jobs
[23:49:19] Well we carnt roll back npm node 4.3
[23:49:19] One of the jessie vms are offline
[23:49:27] since that will break alot of jobs
[23:49:34] make that 2
[23:49:41] :/
[23:49:54] But what we could do is see if we can setup some jessie instances and use that until labs get alot of space
[23:50:12] I say we merge jobs that can be merged to a set of vms and try to optimise what we have to work with
[23:51:03] Zppix nodepool is different to instances
[23:51:08] ah
[23:51:12] since nodepool creates instances on the fly
[23:51:21] whereas instances are already created
[23:51:26] I thought we disabled nodepool
[23:51:47] The pros of nodepool is increased security, the cons is we hit labs limited space making ci unstable
[23:52:09] Zppix no we carnt disable nodepool, we use it for npm node 4.3
[23:52:20] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d
[23:52:23] which would break alot of tests if we go back to 0.10
[23:52:28] rip noodepool
[23:52:40] (PS1) Legoktm: Move mediawiki-core-phpcs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304149
[23:53:19] I say we move anything possible off the nodepool and use nodepool for what we actually need it for atm til we can find an alternative
[23:53:44] (CR) Zppix: [C: 1] Move mediawiki-core-phpcs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304149 (owner: Legoktm)
[23:53:46] uh, no
[23:53:51] Zppix there is no alternitive, plus it benefits us for the security
[23:54:05] What legoktm is doing is temporarily to get things to move on
[23:54:10] until labs fixes it
[23:54:20] (me guessing, sorry if i am wrong)
[23:54:21] but we have no clue how long that will take
[23:54:29] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T139217#2542450 (Reedy)
[23:55:03] Zppix but it will be impossible for us to go of nodepool, since it will break tests, plus we want the security it brings.
[23:55:19] (CR) Legoktm: [C: 2 V: 2] Move mediawiki-core-phpcs off of nodepool [integration/config] - https://gerrit.wikimedia.org/r/304149 (owner: Legoktm)
[23:55:23] So we are losing the battle either way
[23:56:27] !log deploying https://gerrit.wikimedia.org/r/304149
[23:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:56:48] Is there a reason for 2 jessie instances or whatever being offline?
[23:56:57] correction 1
[23:57:07] wow my eyes are bad its now 3
[23:57:07] Not sure
[23:57:19] legoktm: thank you.
[23:57:57] :)
[23:57:58] legoktm im wondering will this be short term, ie until labs fix it
[23:58:05] paladox: I don't know
[23:58:08] Ok
[23:58:35] Hopefully labs fixes the issue soon
[23:58:37] Zppix: yes, that's normal
[23:58:47] ah i'm still learing the CI ways
[23:59:47] I mean, it's normal given we stopped the nodepool service