[02:12:59] Gerrit, Release-Engineering-Team, Operations: Investigate why cobalt went down for 1 minute on February 5 and then again 4 minutes later - https://phabricator.wikimedia.org/T157203#2999987 (demon) Open>declined Meh, not worth investigating...seems as though it was transient.
[03:56:28] Project selenium-MultimediaViewer » firefox,mediawiki,Linux,contintLabsSlave && UbuntuTrusty build #288: FAILURE in 27 sec: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=mediawiki,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/288/
[04:06:46] Yippee, build fixed!
[04:06:47] Project selenium-MultimediaViewer » safari,beta,OS X 10.9,contintLabsSlave && UbuntuTrusty build #288: FIXED in 10 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=safari,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=OS%20X%2010.9,label=contintLabsSlave%20&&%20UbuntuTrusty/288/
[06:33:46] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:01:46] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[07:43:04] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[08:23:06] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:43:08] Continuous-Integration-Config, MediaWiki-extensions-WikiLexicalData-or-OmegaWiki, Easy, I18n: Extension OmegaWiki failing tests due to missing apihelp messages - https://phabricator.wikimedia.org/T155044#2931798 (hashar) p:Triage>Normal
[10:03:56] umm, I might have made zuul-runner stuck, can someone check?
[10:04:56] Beta-Cluster-Infrastructure, Release-Engineering-Team, scap, ORES, Revision-Scoring-As-A-Service: Running out of space when deploying on sca03 (deploy-cache) - https://phabricator.wikimedia.org/T157199#2998859 (hashar) The root cause is that the scap cache clone submodules from the deployment...
[10:08:27] Nikerabbit: there is no such thing as zuul-runner ? :}
[10:13:18] hashar: https://integration.wikimedia.org/zuul/ 335610,2 seems stuck, I accidentally submitted the patch before it was merged
[10:13:44] oh, it just went through
[10:20:20] Nikerabbit: looks like it fixed itself
[10:20:55] oh no it is in gate-and-submit
[10:21:50] Nikerabbit: I guess once the test jobs complete, the change will dequeue from the gate-and-submit
[10:21:55] and things will process just fine
[10:22:01] hashar: let's hope so
[10:22:09] anyway, I'm done for today :D
[10:22:14] :-}
[11:26:05] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Differential, Developer-Wishlist (2017), Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#2637873 (scfc) I dislike the idea because it is neither Jenkins nor...
[11:56:21] (CR) Hashar: [C: 2] Remove tests for Vine extension [integration/config] - https://gerrit.wikimedia.org/r/336127 (https://phabricator.wikimedia.org/T157224) (owner: Florianschmidtwelzow)
[11:57:19] (Merged) jenkins-bot: Remove tests for Vine extension [integration/config] - https://gerrit.wikimedia.org/r/336127 (https://phabricator.wikimedia.org/T157224) (owner: Florianschmidtwelzow)
[12:01:50] Yippee, build fixed!
[12:01:51] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #301: FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/301/
[12:02:10] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #301: FAILURE in 1 min 9 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/301/
[12:36:13] Gerrit, Developer-Relations, Developer-Wishlist (2017): Implement a way to bring GitHub pull requests into gerrit - https://phabricator.wikimedia.org/T37497#398624 (ThurnerRupert) i stumbled on https://gerrit.googlesource.com/plugins/github/+/master/README.md - is this something you envision here?
[13:01:04] (PS2) Hashar: Change SmashPig tests from PHP 5.3 to 5.5 [integration/config] - https://gerrit.wikimedia.org/r/335271 (owner: Ejegg)
[13:01:48] (CR) Hashar: [C: 2] "\O/" [integration/config] - https://gerrit.wikimedia.org/r/335271 (owner: Ejegg)
[13:02:39] (Merged) jenkins-bot: Change SmashPig tests from PHP 5.3 to 5.5 [integration/config] - https://gerrit.wikimedia.org/r/335271 (owner: Ejegg)
[13:08:54] (PS1) Hashar: [SemanticPageMaker] remove jshint/jsonlint [integration/config] - https://gerrit.wikimedia.org/r/336216
[13:10:09] (CR) Hashar: [C: 2] [SemanticPageMaker] remove jshint/jsonlint [integration/config] - https://gerrit.wikimedia.org/r/336216 (owner: Hashar)
[13:10:19] Gerrit, Release-Engineering-Team, Operations: Investigate why cobalt went down for 1 minute on 2017-02-05 and then again 4 minutes later - https://phabricator.wikimedia.org/T157203#3001539 (Aklapper)
[13:11:01] (Merged) jenkins-bot: [SemanticPageMaker] remove jshint/jsonlint
[integration/config] - https://gerrit.wikimedia.org/r/336216 (owner: Hashar)
[13:21:43] Gerrit, DBA, Operations, Patch-For-Review: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3001567 (Aklapper) @Paladox: Please do not add random projects to tasks.
[13:49:07] PROBLEM - Puppet run on buildlog is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:01:35] hashar: if you're feeling up to it, want to push through that image renaming for nodepool?
[14:06:20] Deployment-Systems, Release-Engineering-Team, scap, Operations, User-Addshore: cannot delete non-empty directory: php-1.29.0-wmf.3 messages on 'scap sync' on mwdebug1002 - https://phabricator.wikimedia.org/T157030#3001755 (Addshore) Looks like these messages still appear today during EU swat!
[15:41:08] chasemp: eek I have missed your poke sorry :/
[15:43:34] hashar: no worries, depending on how adventurous you are feeling we could knock it out
[15:43:52] basically, you babysit your patch and then I can push through the check based on it later today
[15:44:04] sounds good
[15:44:34] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #316: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/316/
[15:45:00] * hashar checks deployment calendar
[15:45:19] chasemp: I guess we can do it now
[15:45:42] hashar: you are ready for me to +2 and merge https://gerrit.wikimedia.org/r/#/c/335809/?
[15:45:56] yeah
[15:45:59] will stop nodepool
[15:47:29] saw your sal log and following that I merged
[15:48:25] hashar: let me know what I can do
[15:48:41] chasemp: should be good
[15:48:46] running puppet: sudo /usr/local/sbin/puppet-run
[15:50:42] renamed the base images
[15:51:37] so the base image was recreated w/ new scheme?
[15:52:21] PROBLEM - nodepoold running on labnodepool1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d
[15:52:41] I am just renaming it in glance
[15:56:19] renaming fields in the database
[15:58:40] going to restart it
[15:59:21] RECOVERY - nodepoold running on labnodepool1001 is OK: PROCS OK: 1 process with UID = 113 (nodepool), regex args ^/usr/bin/python /usr/bin/nodepoold -d
[15:59:43] hashar: ok, all good?
[16:00:08] checking
[16:01:15] it is booting still
[16:06:16] chasemp: I messed up some database update :\
[16:06:26] though it should be good now. Restarted nodepool again
[16:06:30] remediation?
[16:06:31] ok
[16:10:08] I am afraid it is messed up :/
[16:10:38] it lost track of the snapshot images
[16:10:39] so it is recreating fresh new ones
[16:10:42] which takes roughly 10 mins
[16:11:06] well even less
[16:11:12] just boot an instance / run puppet to catch up
[16:11:16] then upload the result
[16:11:18] tailing the progress in /var/log/nodepool/image.log
[16:13:58] ok, I'm not sure what we would do from our side to help that
[16:15:06] Beta-Cluster-Infrastructure, MediaWiki-extensions-InterwikiSorting, Operations, Wikidata, and 4 others: Deploy InterwikiSorting extension to beta - https://phabricator.wikimedia.org/T155995#3002296 (Addshore)
[16:16:02] I see a lot of
[16:16:03] 2017-02-06 16:15:53,955 DEBUG nodepool.NodePool: wmflabs-eqiad does not have image snapshot-ci-jessie for label ci-jessie-wikimedia
[16:16:06] Beta-Cluster-Infrastructure, MediaWiki-extensions-InterwikiSorting, Operations, Wikidata, and 4 others: Deploy InterwikiSorting extension to beta - https://phabricator.wikimedia.org/T155995#2960964 (Addshore) Open>Resolved The InterwikiSorting extension is now deployed on beta sites! Thi...
[16:16:47] I'm unclear on why the DB manual change was needed?
[16:20:22] chasemp: nodepool stores the images/snapshot image names in the database
[16:20:25] so they have to be updated
[16:20:40] why not generate new ones via the cli?
[16:20:48] that is what it ends up doing
[16:20:51] :(
[16:22:00] what would you like to do for this hashar? My impression was change the name of the image via config and then have nodepool reup the same as always, I'm not sure what makes sense atm if we've changed the name of the snapshot out from under it manually
[16:22:22] it has finished the ci-jessie snapshot
[16:22:27] so will see how it goes with that
[16:22:29] ok
[16:22:33] else I guess we are good to revert
[16:22:56] I'm hopping on a call, but I'm watching this channel for pings on whether you want to revert or not hashar
[16:23:03] let me know
[16:25:57] 2017-02-06 16:24:35,789 INFO nodepool.NodeLauncher: Creating server with hostname ci-jessie-wikimedia-516170 in wmflabs-eqiad from image snapshot-ci-jessie for node id: 516170
[16:26:05] 2017-02-06 16:24:35,578 INFO nodepool.NodePool: Need to launch 19 ci-jessie-wikimedia nodes for contint1001 on wmflabs-eqiad
[16:26:10] chasemp: looks like it managed to sort it out
[16:26:13] at least for the jessie image
[16:29:33] chasemp: jessie instances are still booting
[16:29:45] I guess it takes a while because the image gets copied/synced to the compute nodes
[16:33:08] chasemp: jessie works!!
[16:39:57] fixing up trusty one still
[16:40:08] it got deadlocked trying to upgrade mariadb bah
[16:53:14] hashar: ok so are you actively fixing something for the trusty image now?
[16:53:20] yes
[16:53:22] k
[16:53:26] waiting for https://horizon.wikimedia.org/project/instances/4c4fea80-6913-46c7-8e7b-b557050c7343/ to spawn
[16:53:34] that is the instance that is going to be used for a snapshot
[16:53:40] ah it booted
[16:53:52] k
[16:53:56] being provisioned
[16:54:08] I'm just here waiting in the wings let me know how I can help
[16:54:21] puppet running
[16:54:46] ./setup_node.sh complete (hostname: snapshot-ci-trusty-1486399717) !
[16:55:00] well besides watching logs and waiting .. not much to do
[16:55:22] maybe we should set up a small nodepool targeting the labtest / dev openstack infra
[16:55:53] Image snapshot-ci-trusty-1486399717 in wmflabs-eqiad is ready
[16:57:38] and two trusty instances are spawning
[16:59:10] Scap3, Parsoid: Saying yes (y) continues to all groups - https://phabricator.wikimedia.org/T156839#3002418 (thcipriani) p:Triage>Normal
[17:00:18] waiting for trusty instances to boot
[17:00:23] and confirm they run fine
[17:00:40] yup a job ran on trusty \O/
[17:00:56] hashar so, all indications are things are running off new images now?
[17:01:05] yup
[17:01:18] ok
[17:01:26] so the thing is I updated the state in MySQL manually (I had the command ready)
[17:01:38] but nodepool did not find the renamed snapshots for some reason
[17:01:41] eventually it spawned a new jessie one
[17:01:59] and the trusty image failed to upgrade mariadb.
So I built a new image which managed to boot
[17:02:53] chasemp: so yeah looks in a good state right now
[17:03:09] ok
[17:03:12] thanks hashar
[17:03:16] that was messy sorry about that :(
[17:03:31] done now :)
[17:03:38] there are a bunch of image backups etc
[17:03:42] will clear them out tonight
[17:03:45] ok
[17:03:54] so from now on
[17:04:18] the instances that run jobs are matching: /ci-(trusty|jessie)-wikimedia-\d+/
[17:04:38] and the snapshots should be: /snapshot-ci-(trusty|jessie)-\d+/
[17:04:45] so they can be filtered out
[17:05:27] all snapshots have snapshot in the name so I was going to filter on that now
[17:05:33] yup
[17:06:15] and there are no aliens right now
[17:09:01] chasemp: yeah all good \O/
[17:09:25] chasemp: while you are around. Do you think we could set up a small nodepool on labtest ?
[17:09:32] would be nice to test that kind of upgrade/changes
[17:09:49] I'm on a meeting atm, but possibly yes. andrew is out sick however and would probably be part of it.
[17:09:51] not sure how easy/doable it is though
[17:10:13] would poke andrew when he is back I guess. Good meeting chasemp !
[17:12:05] Scap3, Operations: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3002474 (thcipriani) Is there anything on the releng side we need to do to push this forward? @Ottomata are you the right person to bother? :) For context this will likely solve {T147856} and (probabl...
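Editor's note: the naming scheme quoted at 17:04 can be sketched as a small filter. This is only an illustration, not the check that was actually deployed; the `classify` helper and the choice to match the whole name with `fullmatch` are assumptions, while the two patterns and the sample names are taken from the log above.

```python
import re

# Patterns quoted in the chat at 17:04. Matching the whole name with
# fullmatch() is an assumption made for this sketch.
JOB_INSTANCE = re.compile(r"ci-(trusty|jessie)-wikimedia-\d+")
SNAPSHOT = re.compile(r"snapshot-ci-(trusty|jessie)-\d+")


def classify(name: str) -> str:
    """Hypothetical helper: label an instance or image name by naming scheme."""
    if SNAPSHOT.fullmatch(name):
        return "snapshot"
    if JOB_INSTANCE.fullmatch(name):
        return "job-instance"
    return "other"


if __name__ == "__main__":
    # Sample names that appear elsewhere in this log.
    for name in ("ci-jessie-wikimedia-516170",
                 "snapshot-ci-trusty-1486399717",
                 "buildlog"):
        print(name, classify(name))
```

A simpler substring test on "snapshot", as suggested at 17:05, would also separate the two groups; the regex form additionally rejects names that do not follow either scheme.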
[17:12:30] hashar: sure thanks also https://gerrit.wikimedia.org/r/#/c/335373/
[17:26:40] chasemp: will attempt to review your last patch though in meeting now, then dinner + another meeting
[17:26:45] bah
[17:27:24] Gerrit, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3002514 (demon)
[17:31:16] hashar: no worries I think it is gtg from the previous review, just an fyi I'm going to drop that today if possible
[17:33:45] Deployment-Systems, Scap3, WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#3002570 (hashar)
[17:34:01] Deployment-Systems, Scap3, WorkType-NewFunctionality: Scap3 submodule space issues - https://phabricator.wikimedia.org/T137124#2358196 (hashar)
[17:34:03] Beta-Cluster-Infrastructure, Release-Engineering-Team, scap, ORES, Revision-Scoring-As-A-Service: Running out of space when deploying on sca03 (deploy-cache) - https://phabricator.wikimedia.org/T157199#2998859 (hashar)
[17:34:32] Beta-Cluster-Infrastructure, Release-Engineering-Team, scap, ORES, Revision-Scoring-As-A-Service: Running out of space when deploying on sca03 (deploy-cache) - https://phabricator.wikimedia.org/T157199#2998859 (hashar) Made this bug a dupe of the older task T137124 and I have copy pasted the...
[17:37:46] Gerrit, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3002584 (Paladox) @Aklapper gerrit is maintained by releng. Which the task i added is not random.
[17:44:28] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T155525#3002626 (mmodell) Open>Resolved
[18:00:44] MediaWiki-Releasing, MediaWiki-Stakeholders-Group, MediaWiki-extensions-General-or-Unknown, Developer-Wishlist (2017): Improve LTS support of extensions - https://phabricator.wikimedia.org/T156640#3002681 (greg)
[18:01:56] Gerrit, Release-Engineering-Team, DBA, Operations, Patch-For-Review: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#3002686 (demon) >>! In T145885#3001567, @Aklapper wrote: > @Paladox: Please do not add random projects to tasks...
[18:15:17] Scap3, Parsoid: Saying yes (y) continues to all groups - https://phabricator.wikimedia.org/T156839#3002735 (dduvall) a:dduvall
[18:56:34] Yippee, build fixed!
[18:56:34] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #302: FIXED in 43 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/302/
[19:35:40] Continuous-Integration-Config, Continuous-Integration-Infrastructure, Differential, Developer-Wishlist (2017), Jenkins: Add support for a wmf-ci.yaml type file for wikimedia jenkins - https://phabricator.wikimedia.org/T145669#3003039 (Legoktm) >>! In T145669#3001204, @scfc wrote: > I dislike...
[19:42:02] (PS1) Umherirrender: [Bouquet] Add npm job [integration/config] - https://gerrit.wikimedia.org/r/336251
[19:43:04] Scap3, Parsoid: Saying yes (y) continues to all groups - https://phabricator.wikimedia.org/T156839#3003079 (dduvall) Looks like I caused the regression {rMSCA98247477db5dcc61666cf91de94fccf909de129e} and called it out in the commit message as a "behavior change".
:)
[19:44:51] Gerrit, Release-Engineering-Team, Upstream: Convert its-phabricator upstream repo which wmf maintains to bazel - https://phabricator.wikimedia.org/T156024#3003082 (Paladox) All the plugins wikimedia uses have now been migrated to bazel, so no blockers on the plugin side of things, i've even converted...
[19:46:26] (PS2) Umherirrender: [Bouquet] Add npm job and make test voting [integration/config] - https://gerrit.wikimedia.org/r/336251
[19:51:44] twentyafterfour: saw a "PhabricatorClusterStrandedException" on phab just now. seems to have been isolated but just fyi
[19:52:21] I got that too
[19:52:37] paladox: (cc twentyafterfour) ah, just saw your message in -ops
[19:52:51] Yep, a refresh worked for me :)
[20:21:56] (CR) Hashar: [C: 2] [Bouquet] Add npm job and make test voting [integration/config] - https://gerrit.wikimedia.org/r/336251 (owner: Umherirrender)
[20:24:34] (Merged) jenkins-bot: [Bouquet] Add npm job and make test voting [integration/config] - https://gerrit.wikimedia.org/r/336251 (owner: Umherirrender)
[20:53:47] weird
[21:31:34] !log Update mobileapps to 034a391
[21:31:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:17:43] I am beginning the migration for phab-01 now. Phabricator has been working fine, I just need some modifications (cherry picks) to fix elasticsearch for now, to work with puppet.
[22:21:05] All done now :), domain still the same
[22:21:06] https://phab-01.wmflabs.org
[22:26:00] Gerrit, MediaWiki-extensions-General-or-Unknown, Repository-Admins: Mark Vine extension as a read-only repository on Gerrit - https://phabricator.wikimedia.org/T157224#3003531 (SamanthaNguyen) Open>Resolved @Florian Thanks for the patch! I'm going to close this ticket now as I think everythin...
[22:28:40] Release-Engineering-Team, Operations, Phabricator, hardware-requests, ops-eqiad: replacement hardware for iridium (phabricator) - https://phabricator.wikimedia.org/T156970#3003536 (Paladox) Migrations complete for phab-01 -> phabricator (labs instance). Phabricator officially currently works...
[22:32:29] Scap3, Operations: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3003544 (Ottomata) I can help!
[22:32:42] Scap3, Analytics, Operations: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3003545 (Ottomata)
[22:49:46] Scap3, Analytics, Operations: Package + deploy new version of git-fat - https://phabricator.wikimedia.org/T155856#3003573 (thcipriani) >>! In T155856#3003544, @Ottomata wrote: > I can help! yay! Thanks in advance. Feel free to poke me in IRC if you have questions/problems/need a post-deploy checker.
[23:35:09] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T155526#3003705 (Krinkle)
[23:35:30] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T155526#2946186 (Krinkle)
[23:37:26] Continuous-Integration-Config, OOjs-UI, Documentation: Link to jsduck docs somewhere in OOUI demos - https://phabricator.wikimedia.org/T127281#3003716 (SamanthaNguyen) Open>Resolved a:Prtksxna