[00:21:24] greg-g hi, im wondering if we can setup puppet swat for gerrit please? Since recently there are many changes for it now. It would stop us from keep having to ask one of the ops with +2 rights.
[00:24:02] Or just use normal puppet swat window :)
[00:24:48] Yep or that
[00:47:49] so... what're we doing about 'rake' stopping mediawiki commits from merging properly?
[00:49:48] Continuous-Integration-Infrastructure: timeouts with rubygems.global.ssl.fastly.net causing jobs to fail - https://phabricator.wikimedia.org/T144325#2596308 (Krinkle) This is preventing commits from being merged in MediaWiki core and elsewhere because the `rake` job runs there as well (for coding style of Ru...
[00:49:53] Continuous-Integration-Infrastructure: timeouts with rubygems.global.ssl.fastly.net causing jobs to fail - https://phabricator.wikimedia.org/T144325#2596546 (Krinkle) p:Triage>Unbreak!
[01:18:55] ah, I'm just encountering this rake thing too
[01:37:54] (PS1) Krinkle: rake: Set "*.rb" files filter [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325)
[01:38:17] (PS2) Krinkle: rake: Limit to commits that touch "*.rb" files [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325)
[01:38:20] (CR) Krinkle: [C: 2] rake: Limit to commits that touch "*.rb" files [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325) (owner: Krinkle)
[01:39:15] (CR) jenkins-bot: [V: -1] rake: Limit to commits that touch "*.rb" files [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325) (owner: Krinkle)
[01:42:36] (PS3) Krinkle: rake: Set "*.rb" files filter [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325)
[01:43:24] (PS4) Krinkle: rake: Limit to commits that touch "*.rb" files [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325)
[01:43:30] (CR) Krinkle: [C: 2] rake: Limit to commits that touch "*.rb" files [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325) (owner: Krinkle)
[01:44:28] (Merged) jenkins-bot: rake: Limit to commits that touch "*.rb" files [integration/config] - https://gerrit.wikimedia.org/r/307670 (https://phabricator.wikimedia.org/T144325) (owner: Krinkle)
[01:44:50] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/307670
[01:44:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:03:02] Beta-Cluster-Infrastructure, Release-Engineering-Team, DBA, Patch-For-Review, WorkType-Maintenance: Upgrade mariadb in deployment-prep from Precise/MariaDB 5.5 to Jessie/MariaDB 5.10 - https://phabricator.wikimedia.org/T138778#2596718 (dduvall) Scratch that @jcrespo. I had a conflict after al...
[05:58:54] Continuous-Integration-Config: Standalone jshint check for ULS is failing - https://phabricator.wikimedia.org/T144337#2596797 (Nikerabbit)
[07:49:20] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2597344 (hashar)
[08:21:11] Browser-Tests-Infrastructure, MediaWiki-extensions-WikibaseRepository, Wikidata: "Scenario: Add reference with multiple snaks" browsertest is flaky - https://phabricator.wikimedia.org/T144190#2597502 (thiemowmde) p:Triage>High
[09:09:24] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597635 (fgiunchedi) Open>Resolved a:fgiunchedi this is completed, package uploaded # reprepro list jessie-wikimedia...
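Krinkle's fix (Gerrit change 307670) limits the rake job to changes that touch Ruby files. Zuul's layout lets a job carry a `files` list of regular expressions, and the job runs only if at least one changed file matches at least one pattern. The Python sketch below models that decision. It rests on two assumptions: `.*\.rb$` is a guess at how "*.rb" is written in the real layout entry, and `job_should_run` is a hypothetical helper, not Zuul's own code.

```python
import re
from typing import Iterable, Sequence

# Assumed regex for the '"*.rb" files' filter; the real layout entry may differ.
RAKE_FILES = [re.compile(r".*\.rb$")]


def job_should_run(changed_files: Iterable[str],
                   patterns: Sequence = RAKE_FILES) -> bool:
    """Return True if a job with this files filter should run for a change.

    With no patterns the job is unfiltered and always runs. Otherwise at
    least one changed path must match at least one pattern.
    """
    if not patterns:
        return True
    # re.match anchors at the start of the path, hence the leading '.*'.
    return any(p.match(path) for path in changed_files for p in patterns)


# A core-only change no longer triggers rake; touching a .rb file does.
print(job_should_run(["includes/Setup.php", "RELEASE-NOTES-1.28"]))  # False
print(job_should_run(["tests/browser/features/support/env.rb"]))    # True
```

With only this pattern, a change that edits just `Gemfile` or `Rakefile` would skip the job. This log does not show whether the real filter also lists those files.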
[09:35:48] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597676 (hashar) Confirmed. Since 0.17.0 got installed to `main` can you drop the old version from `thirdparty` please? That is to...
[09:39:06] Beta-Cluster-Infrastructure, Scap3 (Scap3-Adoption-Phase1), scap, Analytics, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2597678 (elukey) >>! In T116206#2595478, @bd808 wrote: >>>! In T116206#2582429, @elukey wrote: >> Thanks for reporting, this is my bad since an...
[09:45:21] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597698 (hashar) Sorry I have forgot about a bunch of other ones: | jenkins-debian-glue-buildenv-slave | 0.13.0 | http://apt.wikim...
[09:48:24] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597726 (Paladox) Also it seems the packages it depends on do not exist in debian package manager causing installs to fail.
[09:54:20] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597732 (hashar) @paladox what do you mean? Do you have some kind of trace? On Jessie `apt-get update && apt-get install jenkins-deb...
[09:55:43] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597733 (Paladox) >>! In T141114#2597732, @hashar wrote: > @paladox what do you mean? Do you have some kind of trace? > > On Jessie...
[09:59:24] paladox: that is no more relevant
[09:59:32] Oh
[09:59:34] dpkg -i does not resolve dependencies
[09:59:34] so after you have to apt-get install -f
[09:59:38] and manually fix a bunch of issues
[09:59:39] now
[09:59:44] Oh ah
[09:59:44] that the packages have been pushed to apt.wm.o
[09:59:57] you just have to apt-get install jenkins-debian-glue jenkins-debian-glue-buildenv
[10:00:00] thanks for explaining :)
[10:00:04] and it would just work
[10:00:13] Ok thanks
[10:13:28] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597764 (hashar) @Paladox `dpkg -i` does not resolve dependencies so you then have to manually fix them. Now that the package has bee...
[10:14:02] Continuous-Integration-Infrastructure, Operations: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2597765 (Paladox) Oh ok, thanks :)
[10:14:44] (CR) Paladox: "@hashar could you review please?" [integration/config] - https://gerrit.wikimedia.org/r/307654 (https://phabricator.wikimedia.org/T143233) (owner: Paladox)
[10:14:48] (PS3) Paladox: Add rm -fR "$WORKSPACE/modules/*/bin" to jenkins job operations-puppet-doc [integration/config] - https://gerrit.wikimedia.org/r/307654 (https://phabricator.wikimedia.org/T143233)
[10:14:59] (CR) Paladox: "@hashar could you review please?" [integration/config] - https://gerrit.wikimedia.org/r/307543 (owner: Paladox)
[10:15:02] (PS3) Paladox: [DonationInterface] Switch jenkins tests to extension-unittests-composer-non-voting [integration/config] - https://gerrit.wikimedia.org/r/307543
[10:15:17] paladox: I am subscribed / a reviewer of all integration/config changes :D
[10:15:23] Oh
[10:20:02] Project beta-update-databases-eqiad build #11001: FAILURE in 0.84 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11001/
[10:20:37] 10:20:01 Exception: ('command: ', '/usr/local/bin/mwscript update.php --wiki=aawiki --quick', 'output: ', "#!/usr/bin/env php\nFatal error: Uncaught exception 'MediaWiki\\Services\\NoSuchServiceException' with message 'No such service: MobileFrontend.Config' in /mnt/srv/mediawiki-staging/php-master/includes/Services/ServiceContainer.php:185\nStack trace:\n#0 /mnt/srv/mediawiki-staging/php-master/includes/MediaWikiServices.php(205):
[10:20:37] MediaWiki\\Services\\ServiceContainer->peekService('MobileFrontend....')\n#1 /mnt/srv/mediawiki-staging/php-master/includes/MediaWikiServices.php(185): MediaWiki\\MediaWikiServices->salvage(Object(MediaWiki\\MediaWikiServices))\n#2 /mnt/srv/mediawiki-staging/php-master/includes/Setup.php(506): MediaWiki\\MediaWikiServices::resetGlobalInstance(Object(GlobalVarConfig), 'quick')\n#3 /mnt/srv/mediawiki-staging/php-master/maintenance/doMaintenanc
[10:20:37] e.php(97): require_once('/mnt/srv/mediaw...')\n#4 /mnt/srv/mediawiki-staging/php-master/maintenance/update.php(216): require_once('/mnt/srv/mediaw...')\n#5 /mnt/srv/mediawiki-staging/multiversion/MWScript.php(97): require_once('/mnt/srv/mediaw...')\n#6 {main}\n thrown in /mnt/srv/mediawiki-staging/php-master/includes/Services/ServiceContainer.php on line 185\n")
[10:20:50] hashar ^^ i probably should have pasted that in paste
[10:20:51] sorry
[10:21:10] but anyways looks like the MobileFrontend error is back
[10:21:47] Caused by https://gerrit.wikimedia.org/r/#/c/307225/
[10:32:28] hashar, paladox: woo
[10:32:57] phuedx hi, i believe it is because we decided to go with https://gerrit.wikimedia.org/r/#/c/307133/4 instead of doing https://gerrit.wikimedia.org/r/#/c/307133/1
[10:33:03] which fixed it in the first patch
[10:33:21] I guess duplicating seemed to fix it.
[10:34:11] Reverted here https://gerrit.wikimedia.org/r/#/c/307716/ for now
[10:35:11] paladox: you're right that version has an error
[10:35:23] but the duplication is unnecessary
[10:35:26] Yep and oh
[10:35:56] but https://gerrit.wikimedia.org/r/#/c/307133/1/includes/MediaWikiServices.php that fixes it
[10:36:02] whereas https://gerrit.wikimedia.org/r/#/c/307133/4/includes/MediaWikiServices.php
[10:36:03] doesn't
[10:36:07] phuedx ^^
[10:36:29] yes
[10:37:04] but it's because the "NoSuchServiceException" isn't defined in the MediaWiki namespace
[10:37:12] it's defined in the "MediaWiki\Services" namespace
[10:37:16] Oh
[10:37:24] so that catch stanza won't catch //anything// ;/
[10:37:28] Ah
[10:37:38] Do you have a fix?
[10:37:45] because it's such a minor fix i'm not going to revert florian's patch to core
[10:37:49] i'm going to follow it up
[10:37:49] Ok
[10:37:52] thanks
[10:37:53] but i'll merge your revert
[10:38:05] ok thanks
[10:38:05] thanks for flagging it paladox :)
[10:38:12] You're welcome :)
[10:40:59] phuedx adding use MediaWiki\Services\NoSuchServiceException; to mediawiki services still doesn't fix it.
[10:41:35] paladox: where are you adding that line and how are you testing it?
[10:41:53] Im testing it here https://en.random-wikisaur.tk/
[10:42:05] im adding it to includes/MediaWikiServices.php
[10:42:11] I applied https://gerrit.wikimedia.org/r/#/c/307133/4/includes/MediaWikiServices.php
[10:42:18] and adding that use statement at the top
[10:42:30] i have the MobileFrontend extension installed.
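phuedx's diagnosis at 10:37 is a PHP name-resolution problem. `MediaWikiServices` lives in `namespace MediaWiki;` (the stack trace shows `MediaWiki\MediaWikiServices`). Inside that namespace, an unqualified `NoSuchServiceException` in a `catch` resolves to `MediaWiki\NoSuchServiceException`, a class that does not exist. The clause therefore never matches, and PHP raises no error for it. Adding `use MediaWiki\Services\NoSuchServiceException;` fixes the lookup. The Python sketch below models the resolution rules. `resolve_class_name` is a hypothetical helper, and the model leaves out details such as case-insensitive aliases.

```python
from typing import Dict


def resolve_class_name(name: str, namespace: str, uses: Dict[str, str]) -> str:
    """Simplified model of how PHP resolves a class name, e.g. in a catch.

    - A leading backslash marks a fully qualified name.
    - Otherwise, if the first segment is an alias imported with `use`,
      that alias is replaced by the imported name.
    - Otherwise the current namespace is prepended. Class names, unlike
      functions and constants, never fall back to the global namespace.
    Real PHP compares aliases case-insensitively; this sketch does not.
    """
    if name.startswith("\\"):
        return name[1:]
    first, sep, rest = name.partition("\\")
    if first in uses:
        return uses[first] + sep + rest
    return namespace + "\\" + name if namespace else name


NS = "MediaWiki"  # includes/MediaWikiServices.php is in namespace MediaWiki
IMPORT = {"NoSuchServiceException": "MediaWiki\\Services\\NoSuchServiceException"}

# Without the `use` statement the catch names a class that does not exist.
print(resolve_class_name("NoSuchServiceException", NS, {}))
# MediaWiki\NoSuchServiceException
print(resolve_class_name("NoSuchServiceException", NS, IMPORT))
# MediaWiki\Services\NoSuchServiceException
```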
[10:43:09] phuedx ^^
[10:43:20] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2597795 (hoo) Did e92cefafdf70b380a0882b2fbe3250860769c986 fix this?
[10:43:35] stacktrace?
[10:45:11] phuedx strange it is working now
[10:45:28] Yep let's go with that fix
[10:45:43] I will upload the patch now
[10:47:17] phuedx https://gerrit.wikimedia.org/r/#/c/307720/
[10:47:21] :)
[10:54:23] paladox: hah! https://gerrit.wikimedia.org/r/#/c/307721/
[10:55:05] phuedx maybe move the use statement up like here https://gerrit.wikimedia.org/r/#/c/307720/ and we use your patch :)
[10:57:41] paladox: i'll abandon mine. provide reasoning in your commit message
[10:57:53] Oh woops
[10:58:03] phuedx already did mine, in favour of yours
[10:58:13] oh
[10:58:14] okie
[10:58:14] since your commit msg explains more
[10:58:15] w/e
[10:58:47] phuedx i left a comment at https://gerrit.wikimedia.org/r/#/c/307721/
[11:00:27] paladox: updated
[11:00:45] Thank you
[11:00:46] not quite where you wanted it ;)
[11:00:56] Nope but it's ok
[11:01:11] phuedx would you be able to merge it please?
[11:02:39] paladox: i'm the owner of the patch -- i've added reviewers of the previous patch
[11:02:43] shouldn't take long :)
[11:02:48] Ok
[11:05:32] thanks
[11:20:59] Yippee, build fixed!
[11:21:00] Project beta-update-databases-eqiad build #11002: FIXED in 58 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11002/
[11:24:09] (PS37) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:28:39] (PS38) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:29:42] (PS39) Zfilipin: Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:31:26] Release-Engineering-Team, 2017 (Organization), Developer-Relations (Jul-Sep-2016): Developer Summit 2017: Work with TPG and RelEng on solution to event documenting - https://phabricator.wikimedia.org/T132400#2597913 (Aklapper)
[11:33:17] (PS40) Zfilipin: Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:33:47] (PS41) Zfilipin: WIP Run language screenshots script for VisualEditor in Jenkins [integration/config] - https://gerrit.wikimedia.org/r/300035 (https://phabricator.wikimedia.org/T139613)
[11:47:50] Continuous-Integration-Infrastructure (phase-out-gallium), Operations, hardware-requests: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2597938 (mark) @hashar and I just had a long chat on IRC, where we clarified some things in both directions. A f...
[13:16:59] Browser-Tests, MediaWiki-extensions-WikibaseRepository, Wikidata: "Scenario: Add reference with multiple snaks" browsertest is flaky - https://phabricator.wikimedia.org/T144190#2598066 (Tobi_WMDE_SW)
[13:22:45] Browser-Tests-Infrastructure, JavaScript, Malu (Malu-Prototype), User-zeljkofilipin: Release malu 0.0.3 - https://phabricator.wikimedia.org/T139742#2598073 (zeljkofilipin) Open>declined
[13:34:37] Continuous-Integration-Infrastructure (phase-out-gallium), Operations, hardware-requests: Allocate contint1001 to releng and allocate to a vlan - https://phabricator.wikimedia.org/T140257#2598087 (chasemp) I am good with following in the foot step of gallium here as pragmatic. It seems like the mos...
[13:47:57] Yippee, build fixed!
[13:47:57] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #131: FIXED in 3 min 55 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/131/
[13:54:54] hashar: we probably need to hold off on reverts today as we are seeing new instance errors that seem to be in relation to concurrent activities and we have two other sizable maintenances scheduled today
[13:59:22] Browser-Tests, Wikidata: Update Wikidata browsertest documentation - https://phabricator.wikimedia.org/T144392#2598152 (Tobi_WMDE_SW)
[13:59:30] Browser-Tests, Wikidata: Update Wikidata browsertest documentation - https://phabricator.wikimedia.org/T144392#2598164 (Tobi_WMDE_SW) p:Triage>High
[14:00:20] chasemp: yeah I have assumed the maintenance would prevent reverts
[14:00:34] and after it you probably want to stick to a known solution
[14:02:13] thcipriani|afk: ostriches: labs network is going to maintenance mode one hour from now.
[14:02:44] so would probably want to stop Jenkins entirely or make it stop processing jobs via the safe shutdown system
[14:05:21] Browser-Tests, Wikidata, User-Tobi_WMDE_SW: Update Wikidata browsertest documentation - https://phabricator.wikimedia.org/T144392#2598191 (Tobi_WMDE_SW)
[14:05:36] Browser-Tests, MediaWiki-extensions-WikibaseRepository, Wikidata, User-Tobi_WMDE_SW: "Scenario: Add reference with multiple snaks" browsertest is flaky - https://phabricator.wikimedia.org/T144190#2598192 (Tobi_WMDE_SW)
[14:20:28] hiyaa, did group0 deployment make it out yesterday?
[14:25:36] it would appear so
[14:27:20] Browser-Tests-Infrastructure, Continuous-Integration-Config, Upstream, User-zeljkofilipin: Firefox v47 breaks mediawiki_selenium - https://phabricator.wikimedia.org/T137561#2598287 (zeljkofilipin) a:zeljkofilipin
[14:27:31] Browser-Tests-Infrastructure, User-zeljkofilipin: Update mediawiki_selenium to use Marionette - https://phabricator.wikimedia.org/T137540#2598288 (zeljkofilipin) a:zeljkofilipin
[14:29:38] Krenair: does group0 include wikitech?
[14:30:04] no
[14:30:08] why?
[14:30:54] oh just curious, am testing stuff
[14:30:58] ah mediawikiwiki is good
[14:31:00] testing there
[14:31:05] oh ja it works! awesome
[14:33:04] ottomata: https://tools.wmflabs.org/versions/ by Bryan Davis, has all your answers :}
[14:33:13] shows each group, the version they run at and list of wikis
[14:33:17] oh nice!
[14:33:28] ottomata: wikitech is later since it is very crucial to wmflabs
[14:33:50] I am not sure why zerowiki is in group0
[14:34:10] mediawiki.org it is because whatever oddity happening there would result in a detailed bug report
[14:34:16] since the audience is usually tech savvy
[14:34:32] ah right
[14:34:32] ok cool, thanks hashar
[14:39:08] hashar: omg omg omg! that's awesome /cc bd808
[14:40:14] * phuedx sends a note to the rest of reading web
[15:29:41] a trick paladox found out is to add a message to the top of https://integration.wikimedia.org/zuul/
[15:29:59] can be done via integration/docroot and it is auto deployed on merge via a jenkins job
[15:30:17] Beta-Cluster-Infrastructure: wgCentralAuthCheckLoggedInURL in Beta Cluster should be https - https://phabricator.wikimedia.org/T124275#2598475 (AlexMonk-WMF) Open>Resolved a:AlexMonk-WMF It is HTTPS now!
[15:30:18] if jenkins is dead, gotta hack it directly on gallium under /srv/org/wikimedia/ somewhere in a default.php file
[15:31:35] it actually seems like nodepool/ci is behaving just fine at the moment.
[15:32:25] I think andrew said there "might" be impact
[15:32:26] maybe the network will be only shortly disrupted
[15:32:26] I will keep an eye on it/kill all the things as necessary
[15:33:44] we're all done breaking things, so if tests are still running then all is well
[15:34:46] RECOVERY - Puppet run on deployment-ms-fe01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:42:28] Beta-Cluster-Infrastructure: Reenable $wgMWOAuthSecureTokenTransfer=true; on the beta cluster - https://phabricator.wikimedia.org/T67421#2598539 (AlexMonk-WMF) Open>Resolved I'm considering this done, unless someone really thinks we need to reset those. (@dpatrick, @bawolff?)
[15:44:06] Continuous-Integration-Infrastructure, Patch-For-Review: timeouts with rubygems.global.ssl.fastly.net causing jobs to fail - https://phabricator.wikimedia.org/T144325#2598546 (hashar) Open>Resolved a:hashar Pretty sure that has been sorted out by their CDN now.
[15:49:01] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2593143 (greg) >>! In T144247#2597795, @hoo wrote: > Did e92cefafdf70b380a0882b2fbe3250860769c986 fix this? Does that mean that you see things working again?
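The "safe shutdown system" mentioned at 14:02:44 refers to Jenkins' quiet-down mode, the "Prepare for Shutdown" option under Manage Jenkins. In that mode Jenkins starts no new builds but lets running builds finish. The Python sketch below assumes the standard `/quietDown` endpoint, with `/cancelQuietDown` to undo it. It deliberately leaves out authentication. `build_quiet_down_request` is a hypothetical helper that only builds the request and does not send it.

```python
import urllib.request


def build_quiet_down_request(jenkins_url: str) -> urllib.request.Request:
    """Build, but do not send, a POST to Jenkins' /quietDown endpoint.

    Authentication (for example a user name plus API token sent as HTTP
    basic auth) is deliberately left out of this sketch.
    """
    url = jenkins_url.rstrip("/") + "/quietDown"
    return urllib.request.Request(url, data=b"", method="POST")


req = build_quiet_down_request("https://integration.wikimedia.org/ci/")
print(req.full_url, req.get_method())
# https://integration.wikimedia.org/ci/quietDown POST
# Sending it would be urllib.request.urlopen(req) once credentials are added.
```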
[15:49:12] RECOVERY - Puppet run on deployment-mediawiki01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:49:30] RECOVERY - Puppet run on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:49:36] RECOVERY - Puppet run on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:49:56] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2598563 (hashar) Each examples ran on different slaves. Looks like the permissions for the database are off on the CI slaves. Looks like the user we create does not ha...
[15:51:12] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2598565 (hashar) Another possibility is that mysql got magically upgraded on the CI slaves (via unattended upgrade) and whatever permission needed got dropped while the d...
[15:54:11] RECOVERY - Puppet run on integration-aptly01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:25] RECOVERY - Puppet run on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:29] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:33] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:33] RECOVERY - Puppet run on zuul-dev-jessie is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:35] RECOVERY - Puppet run on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:35] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:37] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2598587 (Ladsgroup) Jenkins doesn't fail on Wikibase tests now, so maybe we can lower the priority now.
[15:54:39] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:39] RECOVERY - Puppet run on deployment-poolcounter02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:41] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:49] RECOVERY - Puppet run on deployment-kafka04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:52] RECOVERY - Puppet run on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:52] RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:57] RECOVERY - Puppet run on deployment-pdfrender is OK: OK: Less than 1.00% above the threshold [0.0]
[15:54:59] RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:55:05] RECOVERY - Puppet run on deployment-db1 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:58:27] thcipriani: hashar all clear on labs maint if you guys are still holding back
[15:59:16] chasemp: thanks for the all-clear. been monitoring, but hadn't made any changes; didn't notice any weirdness on our side.
[15:59:32] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:59:36] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:59:38] thcipriani: bonus
[15:59:41] :D
[15:59:50] RECOVERY - Puppet run on deployment-zookeeper01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:59:50] RECOVERY - Puppet run on deployment-fluorine02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:01:22] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2598674 (hashar) Wikibase got "fixed" by skipping all the impacted tests. The job running on the CI slaves failed at least twice for Echo on https://gerrit.wikimedia.org...
[16:02:01] nodepool has all instances in "ready"
[16:02:14] https://integration.wikimedia.org/ci/ shows a bunch of nodepool instances connected
[16:02:22] as well as the permanent slaves
[16:02:50] with the exception of https://integration.wikimedia.org/ci/computer/integration-slave-trusty-1012/log but that one has exploded a few days ago already
[16:03:11] https://integration.wikimedia.org/zuul/ shows a spike of function in Gearman that all got processed
[16:03:13] queue is empty
[16:03:20] thcipriani: ^^^ looks all fine to me
[16:03:22] I am rushing out
[16:03:29] if in doubt: restart !
[16:03:35] hashar: ack thanks :)
[16:03:50] chasemp: andrewbogott: kudos :)
[16:04:02] I am off. I will be back for the group1 train
[16:35:59] Scap3 (Scap3-Adoption-Phase1), scap, Parsoid, Services, and 2 others: Deploy Parsoid with scap3 - https://phabricator.wikimedia.org/T120103#2598773 (greg) \o/
[16:39:24] Continuous-Integration-Infrastructure: Frivolous Jenkins failures for Selenium due to DB error - https://phabricator.wikimedia.org/T144247#2598798 (greg) p:Unbreak!>High
[16:53:54] hm… are other people getting their test results?
[16:53:58] probably I'm just impatient
[16:55:06] it seems to be slow
[16:55:07] again
[16:55:08] see https://integration.wikimedia.org/zuul/
[16:55:14] long queue
[16:55:20] andrewbogott ^^
[16:55:36] ok, I'll just wait :)
[16:55:57] Ok
[16:56:12] andrewbogott: I imagine our maint backed it up
[16:56:37] Wikibase is using all the resources
[16:57:06] but it's not too bad on nodepool things?
[16:57:06] https://graphite.wikimedia.org/render/?width=966&height=489&_salt=1471549160.573&target=cactiStyle(zuul.pipeline.gate-and-submit.label.ci*.wait_time.mean)&hideLegend=false&lineMode=connected&from=-6h
[16:58:42] twentyafterfour i wonder can we switch phabricator to letsencrypt ?
[16:58:46] like we did with gerrit
[16:58:52] or will this only work on jessie
[17:01:10] I wonder will nodepool work on Microsoft windows 10 bash (ubuntu) lol
[17:06:10] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[17:10:42] PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[17:15:23] Beta-Cluster-Infrastructure: Betacluster hangs when visiting 2016_Formula_One_season - https://phabricator.wikimedia.org/T144409#2598913 (Jdlrobson)
[17:15:38] PROBLEM - Puppet run on integration-slave-precise-1012 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[17:16:00] PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:19:52] https://gerrit.wikimedia.org/r/#/c/306750/
[17:20:08] nvm.
[17:25:14] PROBLEM - Puppet run on integration-slave-trusty-1016 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:25:14] PROBLEM - Puppet run on integration-slave-trusty-1012 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[17:25:54] PROBLEM - Puppet run on integration-slave-trusty-1013 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[17:30:18] PROBLEM - Puppet run on integration-puppetmaster is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:30:33] PROBLEM - Puppet run on integration-publisher is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:36:02] PROBLEM - Puppet run on deployment-elastic07 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:40:30] Continuous-Integration-Config, MediaWiki-extensions-ZeroBanner, Wikipedia-Android-App-Backlog, Wikipedia-App-MobileApp-extension: MobileFrontend should run the tests of ZeroBanner and MobileApps - https://phabricator.wikimedia.org/T144412#2599002 (Jdlrobson)
[17:41:01] (PS1) Jdlrobson: Make ZeroBanner and MobileApps dependencies of MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/307794 (https://phabricator.wikimedia.org/T144412)
[17:41:24] (PS2) Jdlrobson: Make ZeroBanner and MobileApps dependencies of MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/307794 (https://phabricator.wikimedia.org/T144412)
[17:42:08] (CR) jenkins-bot: [V: -1] Make ZeroBanner and MobileApps dependencies of MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/307794 (https://phabricator.wikimedia.org/T144412) (owner: Jdlrobson)
[17:43:32] (PS3) Jdlrobson: Make ZeroBanner and MobileApps dependencies of MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/307794 (https://phabricator.wikimedia.org/T144412)
[18:00:06] greg-g: beta cluster mobile site seems to be down. Are you aware of that?
[18:00:38] Beta-Cluster-Infrastructure: Betacluster hangs on mobile site - https://phabricator.wikimedia.org/T144409#2599111 (Jdlrobson)
[18:01:10] jdlrobson: no, can you tell me more?
[18:01:17] https://en.m.wikipedia.beta.wmflabs.org/wiki/H
[18:01:25] any page other than main page seems to throw a service unavailable
[18:01:42] not just mobile
[18:01:47] did you test non mobile? :)
[18:02:28] looks like i didn't :) I thought I did but I guess I only checked desktop main page
[18:02:45] Beta-Cluster-Infrastructure: Betacluster hangs on page views other than Main page - https://phabricator.wikimedia.org/T144409#2599115 (Jdlrobson)
[18:14:10] robla: you want someone from our team to attend https://phabricator.wikimedia.org/E266 for a followup discussion of https://phabricator.wikimedia.org/T69223 is that correct?
[18:14:29] from releng that is
[18:16:00] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:16:26] * robla double checks
[18:17:08] thcipriani: the E266 meeting today is about https://phabricator.wikimedia.org/T589
[18:18:50] I'll expand a little bit. In preparing for today, it occurred to me that what Krinkle is really looking for help on is deployment strategy questions
[18:18:55] Release-Engineering-Team, Labs, Operations, wikitech.wikimedia.org, LDAP: Rename specific account in LDAP, Wikitech and Gerrit - https://phabricator.wikimedia.org/T133968#2599177 (demon) Open>Resolved >>! In T133968#2551726, @Sophivorus wrote: > I modified my rename request. The reque...
[18:20:04] the questions are going to be what the migration needs to look like. how far do we need to go to ensure that this work has a good backout strategy
[18:21:25] ah, yeah, I see that there is some question about depooling, etc.
[18:22:30] ack, ok, lemme poke some folks. Wasn't initially clear on what we were talking about after looking back at the SoS notes :)
[18:34:53] cool, thanks!
[18:49:23] o/
[18:54:30] jdlrobson: it's relatedarticles breaking all of beta cluster
[18:54:48] jdlrobson: in the future: https://logstash-beta.wmflabs.org/
[18:55:19] ahh logstash-beta. I was looking on logstash.wmflabs.org (/facepalm)
[18:55:28] greg-g: i'll take a look
[18:55:33] ostriches: https://phabricator.wikimedia.org/T144409
[18:55:49] jdlrobson: yeah, that confusion is annoying
[18:55:55] (which logstash)
[18:56:31] instant fix coming
[18:56:52] Er, already got a patch
[18:56:52] https://gerrit.wikimedia.org/r/#/c/307800/
[18:56:57] (i was working on it lol)
[18:56:59] jdlrobson: ostriches beat you ^
[18:57:06] Dammit
[18:57:09] NOW I'M SAD
[18:57:12] * ostriches runs away
[18:57:22] needs a revert
[18:57:42] A revert or a fix?
[18:57:48] always revert
[18:57:51] it's our punishment for messing up :)
[18:58:08] https://gerrit.wikimedia.org/r/#/c/307801/
[18:58:34] Easier to fix but w/e ;-)
[18:58:40] Your call
[18:59:00] heh
[18:59:21] Chad
[18:59:21] Abandoned
[18:59:21] MY PATCH IS NOT WANTED
[18:59:34] jdlrobson: I SEE HOW IT IS
[18:59:41] * ostriches pouts
[18:59:42] lol
[18:59:48] lol no
[18:59:56] i was gonna squash it into the re-revert
[18:59:58] if that's okay :)
[19:00:15] First you don't want my work and now you do?!
[19:00:19] MAKE UP YOUR MIND SIR
[19:00:23] ;-)
[19:00:38] lol
[19:01:07] 2nd fatal from sam this week :) he rocks!
[19:01:22] and 2nd patch I've wrongly helped merge.
[19:01:29] go team!
[19:01:42] if we're not breaking things we're not working :)
[19:01:59] seriously though - sorry for the pain :)
[19:15:42] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:24:48] yay we're almost ready to call fixing the gerrit puppet role in labs a success; now deleting the instance and recreating it. We have set up gerrit-mysql for our db
[19:24:57] mutante helped a lot with ^^
[19:49:42] Release-Engineering-Team (Deployment-Blockers), Patch-For-Review, Release: MW-1.28.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T142117#2599672 (hashar) 19:46:03 rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.17
[19:50:42] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[20:01:16] Release-Engineering-Team (Deployment-Blockers), Patch-For-Review, Release: MW-1.28.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T142117#2599739 (AlexMonk-WMF)
[20:01:20] Release-Engineering-Team (Deployment-Blockers), Patch-For-Review, Release: MW-1.28.0-wmf.17 deployment blockers - https://phabricator.wikimedia.org/T142117#2599741 (hashar) There is a couple explicit commit of implicit transactions for Wikidata T144433 T144434 not much of a worry
[20:03:10] (CR) Bmansurov: Make ZeroBanner and MobileApps dependencies of MobileFrontend (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/307794 (https://phabricator.wikimedia.org/T144412) (owner: Jdlrobson)
[20:05:53] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:15:36] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:42:05] paladox: Re https://gerrit.wikimedia.org/r/307869 , it's better to wait until after the patch merges, so that Gerrit will add the (cherry picked from abc123) line
[20:42:47] RoanKattouw oh sorry. Should i abandon and wait for it to merge to include cherry picked from abc123 ?
[20:45:55] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:55:33] PROBLEM - Puppet run on deployment-mediawiki02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:55:39] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:06:48] hi folks, we're starting the T589 discussion now (thcipriani)
[21:07:51] robla: ack, ostriche.s incoming (as you probably saw :))
[21:14:58] thanks!
[21:22:38] ostriches how can you use another instance as a mysql host in gerrit db
[21:22:41] it is failing for me
[21:22:52] i am having gerrit-mysql as our db host
[21:22:57] and gerrit-test3 as our gerrit host
[21:24:18] You have to set it in hiera.
[21:24:27] Oh
[21:24:34] What should it look like
[21:24:43] since i have it set in hiera and it still fails
[21:24:52] https://wikitech.wikimedia.org/wiki/Hiera:Git
[21:24:57] "gerrit::jetty::db_host": gerrit-mysql
[21:25:18] Fails how then?
[21:25:42] ostriches it fails with https://phabricator.wikimedia.org/P3957
[21:26:20] What does gerrit.config on disk say the host is?
[21:26:24] Is it set correct?
[21:26:32] Can you verify a connection to it with the commandline `mysql` tool? [21:27:05] (Also, why is install_gerrit_jetty downloading the mysql jar file? It should be installed by the package & symlinked from the puppet manifest already) [21:27:10] ostriches Oh, how can i remotly connect to mysql to another host [21:27:16] and not sure why it does that [21:28:08] `mysql -h whateveryourhostnameis -u whateveryourusernameis -p` [21:28:57] Ok thanks [21:30:33] RECOVERY - Puppet run on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:04:56] Beta-Cluster-Infrastructure, Mathoid: Move mathoid to deployment-sca* hosts in Beta Cluster - https://phabricator.wikimedia.org/T142255#2600176 (AlexMonk-WMF) [22:04:57] Beta-Cluster-Infrastructure, ContentTranslation-CXserver: Move apertium to deployment-sca* hosts in Beta Cluster - https://phabricator.wikimedia.org/T142152#2600177 (AlexMonk-WMF) [22:04:59] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2600178 (AlexMonk-WMF) [22:05:02] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet, User-bd808: deployment-sca0[12] puppet failure due to issues involving /srv/deployment directory - https://phabricator.wikimedia.org/T143065#2600172 (AlexMonk-WMF) Open>Resolved a:bd808 Thanks @bd808 and @dzahn [22:22:45] Continuous-Integration-Config, MediaWiki-extensions-ZeroBanner, Wikipedia-Android-App-Backlog, Wikipedia-App-MobileApp-extension, and 2 others: MobileFrontend should run the tests of ZeroBanner and MobileApps - https://phabricator.wikimedia.org/T144412#2600288 (jhobs) p:Triage>High Triagi... [22:24:43] (CR) Jhobs: [C: 1] "Assuming there is not a problem with circular dependencies." [integration/config] - https://gerrit.wikimedia.org/r/307794 (https://phabricator.wikimedia.org/T144412) (owner: Jdlrobson)
[22:59:54] !log Deleted empty /data, /data/project, and /data/scratch on integration-publisher to fix puppet [23:00:00] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:04:40] !log Deleted empty /data, /data/project, and /data/scratch on integration-puppetmaster to fix puppet [23:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:08:43] !log Deleted /data on integration-slave-jessie-1001 to fix puppet [23:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:10:32] RECOVERY - Puppet run on integration-publisher is OK: OK: Less than 1.00% above the threshold [0.0] [23:11:10] !log Deleted /data on integration-slave-precise-1011 to fix puppet [23:11:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:15:11] !log Deleted /data/scratch on integration-slave-precise-1012 to fix puppet [23:15:16] RECOVERY - Puppet run on integration-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [23:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:19:43] !log Deleted /data/scratch on integration-slave-trusty-1011 to fix puppet [23:19:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:21:01] RECOVERY - Puppet run on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [23:22:30] !log Deleted /data/scratch on integration-slave-trusty-1012 to fix puppet [23:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:25:37] RECOVERY - Puppet run on integration-slave-precise-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [23:25:43] RECOVERY - Puppet run on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:31:08] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [23:32:27] !log Deleted /data/scratch on integration-slave-trusty-1013 to fix puppet [23:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:35:14] RECOVERY - Puppet run on integration-slave-trusty-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [23:36:11] !log Deleted /data/scratch on integration-slave-trusty-1016 to fix puppet [23:36:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:40:24] !log forced puppet run on deployment-salt02. Had not run automatically for 8 hours [23:40:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master [23:40:54] RECOVERY - Puppet run on integration-slave-trusty-1013 is OK: OK: Less than 1.00% above the threshold [0.0] [23:45:13] RECOVERY - Puppet run on integration-slave-trusty-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [23:51:29] RECOVERY - Puppet staleness on deployment-salt02 is OK: OK: Less than 1.00% above the threshold [3600.0]
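The 21:22 to 21:28 exchange with ostriches comes down to two checks run from gerrit-test3. First, does gerrit.config on disk actually name gerrit-mysql as the database host? Second, can that host be reached at all? Below is a minimal Python sketch of both checks. It assumes the Gerrit 2.x layout of gerrit.config, where a `[database]` section holds a `hostname` key, and it assumes MySQL listens on its default port, 3306. `read_gerrit_db_host` and `tcp_port_open` are hypothetical helpers. They are not part of Gerrit or the puppet role.

```python
import socket
from typing import Optional


def read_gerrit_db_host(config_text: str) -> Optional[str]:
    """Return database.hostname from gerrit.config text, or None if unset.

    Hypothetical, minimal parser for the git-config style used by
    gerrit.config: it only understands [section] headers and `key = value`
    lines, and ignores subsections, quoting, inline comments, and includes.
    """
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or full-line comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "database" and "=" in line:
            key, value = line.split("=", 1)
            # git-config keys are case-insensitive
            if key.strip().lower() == "hostname":
                return value.strip()
    return None


def tcp_port_open(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Assumes MySQL's default port 3306. This only shows the port is
    reachable; unlike `mysql -h HOST -u USER -p`, it does not log in,
    so credentials and grants are not checked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name did not resolve
        return False
```

To use it, pass the contents of the site's `etc/gerrit.config` to `read_gerrit_db_host`, then call `tcp_port_open("gerrit-mysql")`. A port check that passes does not prove the database login will work. The `mysql -h ... -u ... -p` command suggested in the log remains the next step, because it also tests the username, password, and grants that Gerrit will use.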
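Starting at 22:59, the `!log` entries record that `/data`, `/data/project`, and `/data/scratch` were deleted by hand on several integration hosts. The first two entries describe those directories as empty. The log does not say how the deletions were done. If this step were scripted, one cautious approach is to remove only directories that really are empty, and to process children before parents. `remove_empty_dirs` is a hypothetical helper written as a minimal Python sketch of that approach:

```python
import os
from typing import Iterable, List


def remove_empty_dirs(paths: Iterable[str]) -> List[str]:
    """Remove each directory in `paths` only if it is empty.

    Hypothetical helper. Paths are processed deepest-first, so a parent
    such as /data can be removed once /data/project and /data/scratch
    are gone. Non-empty, missing, or non-directory paths are skipped.
    Returns the paths that were removed, in removal order.
    """
    removed = []
    # Deepest paths first; sorted() is stable, so equal depths keep input order.
    ordered = sorted(paths, key=lambda p: os.path.normpath(p).count(os.sep), reverse=True)
    for path in ordered:
        try:
            os.rmdir(path)  # os.rmdir only succeeds on an empty directory
        except OSError:
            continue  # not empty, not a directory, or does not exist
        removed.append(path)
    return removed
```

For example, `remove_empty_dirs(["/data", "/data/project", "/data/scratch"])` would reproduce the integration-publisher cleanup, provided all three directories were empty. Building on `os.rmdir` rather than `rm -rf` is a deliberate safety choice. A directory that still holds files is left in place for someone to inspect.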