[05:38:19] PROBLEM - Puppet run on deployment-eventlogging04 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[05:57:07] (CR) Robert Vogel: [C: 1] Whitelist Ljonka [integration/config] - https://gerrit.wikimedia.org/r/315579 (owner: Paladox)
[06:13:18] RECOVERY - Puppet run on deployment-eventlogging04 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:47:09] Project selenium-Wikibase » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #140: FAILURE in 2 hr 7 min: https://integration.wikimedia.org/ci/job/selenium-Wikibase/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/140/
[08:34:39] (CR) Hashar: [C: 2] Whitelist Ljonka [integration/config] - https://gerrit.wikimedia.org/r/315579 (owner: Paladox)
[08:35:22] (Merged) jenkins-bot: Whitelist Ljonka [integration/config] - https://gerrit.wikimedia.org/r/315579 (owner: Paladox)
[08:42:49] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2712068 (hashar) @mmodell seems the way search now works is the source of lot of confusion...
[08:59:43] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2712108 (mmodell)
[09:14:18] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2712139 (hashar) From a conversation with Mukunda, D413 should address it and is deployed...
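Note on the "Puppet run" alerts that recur in this log: they report the share of recent datapoints that exceeded a threshold. The arithmetic can be sketched as below, assuming the check simply counts samples strictly above the threshold within a window. The function name and the sample window are hypothetical and are not taken from the monitoring code.

```python
def percent_above(datapoints, threshold):
    """Return the percentage of datapoints strictly above threshold."""
    # Assumption: missing samples (None) are ignored rather than counted.
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


# Hypothetical window: two of three samples report failed resources.
window = [1, 0, 2]
print("%.2f%% of data above the critical threshold [0.0]"
      % percent_above(window, 0.0))
```

With two of three samples above 0.0, this prints the same 66.67% figure seen in the first alert of the log.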
[09:59:24] so memcached seems working fine afaics
[09:59:28] good :)
[10:41:04] Continuous-Integration-Config, Analytics-Dashiki: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2712346 (hashar)
[10:42:03] Continuous-Integration-Config, Analytics-Dashiki: Dashiki bower has a version conflict for jQuery - https://phabricator.wikimedia.org/T148020#2712358 (hashar)
[10:42:49] Continuous-Integration-Config, Analytics-Dashiki: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2712346 (hashar)
[10:44:58] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[11:11:43] Continuous-Integration-Config, Analytics-Dashiki, Patch-For-Review: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2712433 (hashar)
[11:12:17] Continuous-Integration-Config, Analytics-Dashiki: Add CI job for analytics/mediawiki-storage - https://phabricator.wikimedia.org/T148023#2712435 (hashar)
[11:20:00] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0]
[11:37:19] Release-Engineering-Team, ArchCom-RfC, Developer-Relations, WMF-Legal, and 3 others: Create formal process for CREDITS files - https://phabricator.wikimedia.org/T139300#2712491 (Aklapper) Thanks for working on this! Once merged and dust has settled, https://gerrit.wikimedia.org/r/#/c/211034/ shou...
[12:20:01] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2712578 (Paladox) I found an ssh app on the iPhone so I used that. I have installed this h...
[12:39:59] (PS1) Hashar: Experimental npm job for analytics mediawiki-storage and dashiki [integration/config] - https://gerrit.wikimedia.org/r/315669 (https://phabricator.wikimedia.org/T148019)
[12:40:58] (CR) Hashar: [C: 2] Experimental npm job for analytics mediawiki-storage and dashiki [integration/config] - https://gerrit.wikimedia.org/r/315669 (https://phabricator.wikimedia.org/T148019) (owner: Hashar)
[12:41:34] (Merged) jenkins-bot: Experimental npm job for analytics mediawiki-storage and dashiki [integration/config] - https://gerrit.wikimedia.org/r/315669 (https://phabricator.wikimedia.org/T148019) (owner: Hashar)
[13:37:53] PROBLEM - Puppet run on deployment-redis01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:42:26] !log Cherry picking https://gerrit.wikimedia.org/r/#/c/315248/ on deployment-puppetmaster
[13:42:30] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:46:44] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #178: FAILURE in 2 min 43 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/178/
[13:47:48] PROBLEM - Puppet run on deployment-imagescaler01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:12:52] RECOVERY - Puppet run on deployment-redis01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:22:13] !log Resetting to 61a9cd1f47c5aec8ded92f2486ce43309b9e3e03 on deployment-puppetmaster
[14:22:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:27:03] !log Cherry-picking https://gerrit.wikimedia.org/r/#/c/315234/ on deployment-puppetmaster
[14:27:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:32:07] !log Resetting to 61a9cd1f47c5aec8ded92f2486ce43309b9e3e03 on deployment-puppetmaster
[14:32:11] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:32:56] !log Cherry-picking https://gerrit.wikimedia.org/r/#/c/315234/4 on deployment-puppetmaster
[14:32:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:34:06] !log Resetting to 61a9cd1f47c5aec8ded92f2486ce43309b9e3e03 on deployment-puppetmaster
[14:34:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:38:06] !log Cherry-picking https://gerrit.wikimedia.org/r/#/c/315234/5 on deployment-puppetmaster
[14:38:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[14:41:36] hashar hi, thanks for merging my patches today
[14:41:57] hashar also I did the migrating one of the tables in phabricator_search to innodb
[14:42:04] and am now running the reindex
[14:44:04] Beta-Cluster-Infrastructure: Enable Quiz Extension on ca.wikipedia.beta.wmflabs.org for testing - https://phabricator.wikimedia.org/T142692#2712949 (Toniher) Hello, would it be possible to enable it in beta cluster cawiki soon for starting some tests? Thanks!
[14:52:50] RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:57:15] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:12:41] Continuous-Integration-Config, Analytics-Dashiki: Dashiki bower has a version conflict for jQuery - https://phabricator.wikimedia.org/T148020#2713053 (hashar)
[15:13:06] Continuous-Integration-Config, Analytics-Dashiki, Patch-For-Review: Add CI job for Dashiki | jQuery version mismatch - https://phabricator.wikimedia.org/T148019#2713055 (hashar)
[15:13:29] Continuous-Integration-Config, Analytics-Dashiki, Patch-For-Review: Add CI job for Dashiki - https://phabricator.wikimedia.org/T148019#2712346 (hashar)
[15:15:23] Continuous-Integration-Infrastructure, Zuul: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#2713060 (hashar) We can now run `/usr/bin/zuul-clear-ref`. But since it is playing with the ref, it is safer to have the zuul-merger ser...
[15:31:18] !log add settings to duplicate traffic to thumbor in beta and restart swift-proxy
[15:32:15] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[15:35:28] !log Resetting to 61a9cd1f47c5aec8ded92f2486ce43309b9e3e03 on deployment-puppetmaster
[15:35:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[15:38:35] hashar twentyafterfour there's a new feature in phab
[15:38:44] Persistent chat
[15:39:00] It opens your chat rooms so you can view tasks and do other things
[15:39:05] A bit like facebook
[15:40:01] It's live on phab-01
[16:03:40] !log Cherry-picking https://gerrit.wikimedia.org/r/#/c/315648/ on deployment-puppetmaster
[16:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:04:45] godog: it looks like qa-morebots didn't like your message
[16:06:06] ugh you are right gilles
[16:06:10] !log add settings to duplicate traffic to thumbor in beta and restart swift-proxy
[16:06:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:16:56] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713364 (mmodell) Thanks @paladox! I'll reindex.
[16:17:58] twentyafterfour i would have wrote on task but I have done the reinde
[16:18:00] reinde
[16:18:02] reindex
[16:18:12] Ssl problems are preventing me from accessing phab
[16:23:35] !log Resetting to 61a9cd1f47c5aec8ded92f2486ce43309b9e3e03 on deployment-puppetmaster
[16:23:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[17:16:02] This patch for CentralAuth won't merge automatically: https://gerrit.wikimedia.org/r/#/c/309553. Should I just force merge it?
[17:16:35] * greg-g looks
[17:17:07] kaldari Try doing +2 again
[17:17:10] for code review
[17:17:16] Please
[17:17:25] re +2ed ...
[17:18:19] kaldari it seems you also done v+2, if you want to force merge without jenkins
[17:18:29] there should be a submit button you press
[17:18:44] yeah, I was wondering if I should click it :)
[17:19:16] I have no idea what the rules are for CentralAuth, but if it passed the gate and submit then yep the submit button can be pressed
[17:20:43] did resetting it to no +2 (either CR or V) then re +2'ing CR not work?
[17:20:57] greg-g: no, still didn't merge
[17:21:05] :/ go ahead
[17:21:48] ah, it says "missing dependency" when I try to merge. I guess that explains it.
[17:22:11] no idea what the dependency is though
[17:22:34] kaldari, try a rebase
[17:22:44] But i see no dependency either
[17:22:49] parent one is merged
[17:22:51] Though
[17:22:53] So strange
[17:23:52] kaldari then you will have to remove your c+2 and re do it
[17:23:56] and see if that will work
[17:23:56] ?
[17:31:35] paladox: it still wouldn't automatically merge, but the manual merge worked this time :)
[17:32:09] Oh, hasharAway when you're back could you take a look at why jenkins isn't auto merging please? ^^
[17:44:32] PROBLEM - Puppet run on deployment-lvs-realservertest is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[17:50:41] RECOVERY - Puppet staleness on deployment-lvs-realservertest is OK: OK: Less than 1.00% above the threshold [3600.0]
[17:56:57] Release-Engineering-Team, Operations, HHVM, Patch-For-Review, Services (doing): Migrate deployment servers (tin/mira) to jessie - https://phabricator.wikimedia.org/T144578#2713751 (mobrovac)
[17:59:15] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713758 (Paladox) You're welcome, I guess we reindex twice now :)
[18:00:43] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713762 (Paladox) We should definitely deploy this If this improves things more.
[18:02:29] Gerrit, Repository-Admins: Rename the Semantic Forms extension to "Page Forms" - https://phabricator.wikimedia.org/T147582#2713766 (Yaron_Koren) @Paladox - thank you!
[18:03:24] Gerrit, Repository-Admins: Rename the Semantic Forms extension to "Page Forms" - https://phabricator.wikimedia.org/T147582#2713773 (Paladox) @Yaron_Koren you're welcome.
[18:12:01] Beta-Cluster-Infrastructure: Enable Quiz Extension on ca.wikipedia.beta.wmflabs.org for testing - https://phabricator.wikimedia.org/T142692#2543311 (Krenair) It'll still need a security review before that can be done
[18:13:44] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713790 (Paladox) @mmodell are we running reindexing on iridium, since It looks like it wor...
[18:14:42] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713792 (mmodell) @paladox: reindexing already happened a while ago.
[18:15:23] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713794 (Paladox) Oh, so we don't need to reindex with your change?
[18:16:18] I'm wondering where "ci-jessie-wikimedia" nodes are defined. For example, do they include mysql?
[18:16:29] awight that's nodepool
[18:16:41] It depends, we need to add mysql on per test
[18:17:02] PROBLEM - Puppet run on deployment-elastic08 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:17:03] It isn't like instances, this way it is safer and securer since we won't be using things that the test doesn't need
[18:17:11] That's great news
[18:17:22] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713829 (mmodell) Right. This just changes the query to always include + before every word....
[18:17:23] * awight hunts for docs
[18:17:56] I presume we remove the whitelist and allow every one to use the tests as if they were whitelisted.
[18:18:25] Since once the test is finished the instance is deleted and then re created when another test starts
[18:19:12] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2713850 (Paladox) Ok, thank you.
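Note on the search change described above ("always include + before every word"): in a MySQL boolean-mode full-text query, a leading + marks a word as required. A minimal sketch of that rewrite follows, assuming plain whitespace tokenization. The function name is hypothetical, and leaving words that already carry a + or - operator untouched is an assumption of this sketch, not something the task states.

```python
def require_all_words(query):
    """Prefix each whitespace-separated word with '+' so it becomes required."""
    rewritten = []
    for word in query.split():
        # Assumption: words that already start with an operator are kept as-is.
        if word[0] in "+-":
            rewritten.append(word)
        else:
            rewritten.append("+" + word)
    return " ".join(rewritten)


print(require_all_words("phabricator search outage"))
```

This is not Phabricator's actual implementation, only an illustration of what the rewritten query string would look like.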
[18:19:12] yeah the isolation is nice
[18:19:22] Yep
[18:19:40] I'm imagining that configuration is done with hiera, and controls /opt/git/integration/config/dib/puppet/ciimage.pp ?
[18:19:55] awight yep i think
[18:20:36] awight but we have to define what php version or if php is included in the test in parameters_function.py
[18:20:42] in the integration/config repo
[18:20:48] ah great lead
[18:21:34] Yep
[18:24:20] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42)
[18:42:15] From what I can tell, it seems that mariadb-server is installed by default on nodepool ci-jessie-wikimedia, but I'm having a rough time finding the default passwords.
[18:44:33] Oh
[18:44:46] I guess one of releng know where the password
[18:44:47] is
[18:44:49] hashar ^^
[18:46:03] I'm trying root/(no password) at the moment...
[18:46:20] Oh
[18:46:32] PROBLEM - jenkins_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[18:46:44] awight Is this for a new test?
[18:47:29] paladox: Yeah, I'm adding tox nose tests to the wikimedia/fundraising/tools repo, and one of the tests will require mysql (with the memory backend)
[18:47:37] If you're curious, https://gerrit.wikimedia.org/r/#/c/315638/
[18:47:48] awight you won't need the password, it should work
[18:48:31] Oh i see
[18:48:32] another benefit of isolation!
[18:48:36] you need mysql setup
[18:48:38] ACKNOWLEDGEMENT - jenkins_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war daniel_zahn gallium is still prod for now
[18:48:38] ACKNOWLEDGEMENT - jenkins_zmq_publisher on contint1001 is CRITICAL: connect to address 127.0.0.1 and port 8888: Connection refused daniel_zahn gallium is still prod for now
[18:49:08] paladox: I'll happily believe you--but it does seem to be enabled by default
[18:49:25] awight yes
[18:49:30] Well not really
[18:49:32] mw-install-mysql
[18:49:35] I think you need ^^
[18:49:38] for that test
[18:49:43] Setting up mysql
[18:49:45] dib/puppet/ciimage.pp has the line ensure_packages(['mariadb-client', 'mariadb-server'])
[18:50:04] huh, okay I'll try that too
[18:50:21] Yep, oh
[18:52:03] RECOVERY - Puppet run on deployment-elastic08 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:52:13] I think I might be in now, the latest error was "OperationalError: (1049, "Unknown database 'test'")"
[18:52:28] awight this https://phabricator.wikimedia.org/diffusion/CIJE/browse/master/bin/mw-install-mysql.sh
[18:52:31] so root/ is the ticket I suppose
[18:52:42] We don't need all that
[18:52:47] but the part at the top
[18:52:51] paladox: oh wonderful, that does demonstrate the username
[18:52:55] Yep
[18:53:23] awight we will want to do a separate test for you if you want to do that so that other user don't do that with the normal test
[18:55:51] awight we can do
[18:55:52] - project:
[18:55:52] name: 'labs'
[18:55:52] jobs:
[18:55:52] - '{name}-tox-jessie'
[18:56:03] labs replaced with whatever name you want the test
[18:56:07] to be in {name}
[19:00:01] paladox: Are you sure we need that? My plan is to drop and recreate the test database, but assuming the tests are isolated, I shouldn't have to worry about other users or tests, right?
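Note on the "drop and recreate the test database" plan: the "Unknown database 'test'" error above suggests the server accepts the connection but has no such database yet. A minimal sketch of the statements a test setup step might issue is below. The function name is hypothetical, and actually executing the statements (for example as root with an empty password, as tried in the chat) would need a MySQL client library, which is deliberately left out here.

```python
import re


def recreate_database_statements(name):
    """Return SQL statements that drop and recreate a scratch database."""
    # Only allow simple identifiers, since the name is interpolated into SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError("unsafe database name: %r" % name)
    return [
        "DROP DATABASE IF EXISTS `%s`" % name,
        "CREATE DATABASE `%s`" % name,
    ]


for statement in recreate_database_statements("test"):
    print(statement)
```

Because each Nodepool instance is discarded after the job, a setup step like this would only ever touch that job's own server, which is the isolation point raised in the conversation.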
[19:05:57] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2714091 (Pchelolo)
[19:08:11] paladox: success! Thanks for pointing me in the right direction.
[19:08:19] I still don't understand the need for a separate test, though...
[19:10:11] Hi all. Is it possible to deploy an unmerged core patch on betacluster?
[19:10:17] I see instructions for an unmerged puppet patch: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Cherry-picking_a_patch_from_gerrit
[19:18:24] awight oh, if it worked then we won't need a new test
[19:18:39] I only suggested a new test if we had to do a shell script to do this
[19:19:54] ah, yeah that makes sense.
[19:20:05] Yep
[19:20:06] FYI the tests I was struggling with pass on CI now!
[19:20:11] :)
[19:35:03] Looks like soon you can vote on merged changes in gerrit - https://gerrit-review.googlesource.com/#/c/88572/
[19:36:01] (CR) Awight: [C: -1] "Almost there--we're just fixing a few tests." [integration/config] - https://gerrit.wikimedia.org/r/314481 (https://phabricator.wikimedia.org/T145012) (owner: Awight)
[20:08:43] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2714297 (thcipriani)
[20:08:58] (PS8) Hashar: Move composer-hhvm/php5 jobs back to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938)
[20:11:43] (CR) Hashar: [C: 2] "Later on yeah :]" [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[20:12:14] (CR) Paladox: ":)" [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[20:12:20] (Merged) jenkins-bot: Move composer-hhvm/php5 jobs back to Nodepool [integration/config] - https://gerrit.wikimedia.org/r/306727 (https://phabricator.wikimedia.org/T143938) (owner: Hashar)
[20:12:26] !log Switching composer-hhvm / composer-php55 to Nodepool https://gerrit.wikimedia.org/r/#/c/306727/ T143938
[20:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[20:13:09] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2714326 (hashar)
[20:14:07] Continuous-Integration-Infrastructure, Nodepool: 2016-08-10 CI incident follow-ups - https://phabricator.wikimedia.org/T142952#2714330 (hashar)
[20:14:09] Continuous-Integration-Infrastructure, Nodepool, Patch-For-Review: Bring back jobs to Nodepool - https://phabricator.wikimedia.org/T143938#2584030 (hashar) Open>Resolved Finally the rollback is complete. I will baby sit / monitor it more and tomorrow morning delete some permanent slaves to f...
[20:15:50] Continuous-Integration-Infrastructure, Nodepool: 2016-08-10 CI incident follow-ups - https://phabricator.wikimedia.org/T142952#2552198 (hashar)
[20:15:52] Continuous-Integration-Infrastructure, Labs, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2714349 (hashar) Open>Resolved a:hashar This has been fixed up quite fast. The root cause is the quota we...
[20:16:40] Continuous-Integration-Infrastructure, Nodepool: 2016-08-10 CI incident follow-ups - https://phabricator.wikimedia.org/T142952#2714353 (hashar) p:Normal>Low All settled. The last action is to upgrade some python modules on labnodepool1001.eqiad.wmnet T137217 Effects unknown though :(
[20:23:07] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.28.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T146998#2714384 (thcipriani)
[20:23:15] PROBLEM - Puppet run on deployment-ores-redis is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:47:29] thcipriani: so that's the current blocker now? https://phabricator.wikimedia.org/T147971 (sorry, I had a long lunch break due to $reasons)
[20:47:57] greg-g: yeap
[20:48:09] blew up pretty huge when I rolled forward
[20:48:18] ongoing discussion in -operations
[20:48:49] PROBLEM - Puppet run on deployment-fluorine02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:49:09] gotcha
[20:58:09] Release-Engineering-Team, ArchCom-RfC, Developer-Relations, WMF-Legal, and 3 others: Create formal process for CREDITS files - https://phabricator.wikimedia.org/T139300#2714492 (RobLa-WMF) >>! In T139300#2712491, @Aklapper wrote: > Thanks for working on this! > Once merged and dust has settled, h...
[21:09:38] Release-Engineering-Team, MW-1.28-release: Release MW 1.28 - https://phabricator.wikimedia.org/T148087#2714505 (greg)
[21:12:04] Release-Engineering-Team, releng-201617-q2, MW-1.28-release: Release MW 1.28 - https://phabricator.wikimedia.org/T148087#2714519 (greg)
[21:14:08] Scap3 (Scap3-MediaWiki-MVP), releng-201617-q2: Flatten MediaWiki config, all MediaWiki versions, and extensions into a unified git repo - https://phabricator.wikimedia.org/T147478#2714522 (greg)
[21:14:30] Scap3 (Scap3-MediaWiki-MVP), releng-201516-q3, scap, Tracking, WorkType-NewFunctionality: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2714523 (greg)
[21:14:49] Scap3 (Scap3-MediaWiki-MVP), releng-201516-q3, scap, Tracking, WorkType-NewFunctionality: [EPIC] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#1691147 (greg)
[21:19:04] PROBLEM - Host integration-slave-trusty-1018 is DOWN: CRITICAL - Host Unreachable (10.68.16.240)
[21:19:20] PROBLEM - Host integration-slave-trusty-1017 is DOWN: CRITICAL - Host Unreachable (10.68.17.28)
[21:20:14] PROBLEM - Host integration-slave-trusty-1014 is DOWN: CRITICAL - Host Unreachable (10.68.16.235)
[21:20:36] ^^me
[21:21:51] !log Deleted CI slaves integration-slave-jessie-1004 integration-slave-jessie-1005 integration-slave-trusty-1013 integration-slave-trusty-1014 integration-slave-trusty-1017 integration-slave-trusty-1018
[21:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[21:22:47] PROBLEM - Host integration-slave-trusty-1013 is DOWN: CRITICAL - Host Unreachable (10.68.18.28)
[21:23:54] ah grafana got upgraded
[21:23:58] PROBLEM - Host integration-slave-jessie-1004 is DOWN: CRITICAL - Host Unreachable (10.68.19.167)
[21:24:10] when one change the default time, the URL get updated with proper parameter
[21:24:11] s.
[21:24:23] https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning?panelId=6&fullscreen&from=now-30m&to=now
[21:24:29] I have freed up some disk on labs infra :]
[21:24:58] PROBLEM - Host integration-slave-jessie-1005 is DOWN: CRITICAL - Host Unreachable (10.68.16.223)
[21:38:14] RECOVERY - Puppet run on deployment-ores-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[21:39:06] thcipriani: Where were those errors appearing from Zero stuff?
[21:39:15] Just wondering where to look when trying a fix on mw1017?
[21:39:42] Reedy: Saw them in fatalmonitor on logstash
[21:39:55] was a big spike as soon as I moved the wikipedia wikis
[21:40:16] Mmmm
[21:40:21] Wonder if we'd notice mw1017
[21:40:36] https://logstash.wikimedia.org/goto/e7461b2e5ce52210eae2c4caeb2081d0
[21:41:26] Release-Engineering-Team, AbuseFilter, Wikimedia-Logstash, Patch-For-Review, and 2 others: Should we keep StashEdit logs in AbuseFilter? - https://phabricator.wikimedia.org/T146697#2714553 (hashar) Open>Resolved a:aaron The most spasming message has been lowered from INFO to DEBUG whi...
[21:41:56] aha, perfect
[21:41:59] and filter by host...
[21:42:11] ok, let's setup a deploy
[21:43:13] alright. Do you just want to move mw1017 to .22?
[21:43:21] that was my plan :)
[21:43:25] just merging the config fix
[21:43:28] cool
[21:43:30] was gonna do a local revert of your revert
[21:43:34] fine with me
[21:43:38] scap pull on mw1017?
[21:43:46] then how do we get it to rebuild wikiversions?
[21:43:52] eh, you may have to do a rebuild...one sec
[21:44:02] I can rebuild on mira and sync if it's easier
[21:44:09] obviously not syncing everywhere
[21:45:29] scap wikiversion-compile I think
[21:45:52] locally on mw1017 post scap pull
[21:46:27] reedy@mw1017:~$ scap wikiversions-compile
[21:46:27] 21:46:18 wikiversions-compile must be run as user mwdeploy
[21:46:28] 21:46:18 Compiled /srv/mediawiki/wikiversions.json to /srv/mediawiki/wikiversions.php
[21:46:35] Does that mean it ran it as the right user?
[21:46:36] * Reedy looks
[21:46:47] yes, it seems
[21:47:05] hrm
[21:47:31] ah, yeah, seems to call exec as that user
[21:47:35] It's unobvious if that means I'm supposed to sudo ;)
[21:47:36] cool
[21:48:12] reedy@mw1017:/srv/mediawiki$ mwscript eval.php enwiki
[21:48:12] PHP Fatal error: Class 'Memcached' not found in /srv/mediawiki/php-1.28.0-wmf.22/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 63
[21:48:12] Fatal error: Class 'Memcached' not found in /srv/mediawiki/php-1.28.0-wmf.22/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 63
[21:48:13] bah
[21:48:21] obviously not in php-cli
[21:48:41] ah shit. I think this just got reimaged as jessie and without php5
[21:48:48] heh
[21:48:51] https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor?_g=(refreshInterval:(display:'5%20minutes',pause:!f,section:2,value:300000),time:(from:'2016-10-13T20:09:13.420Z',mode:absolute,to:'2016-10-13T20:20:16.578Z'))&_a=(filters:!(('$state':(store:appState),bool:(must:!((terms:(level:!(NOTICE,INFO,WARNING))),(term:(type:mediawiki)))),meta:(alias:!n,disabled:!f,index:'logstash-*',key:bool,negate:!t,value:'%7B%22must%22:%5B%7B
[21:48:51] %22terms%22:%7B%22level%22:%5B%22NOTICE%22,%22INFO%22,%22WARNING%22%5D%7D%7D,%7B%22term%22:%7B%22type%22:%22mediawiki%22%7D%7D%5D%7D')),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:message,negate:!t,value:SlowTimer),query:(match:(message:(query:SlowTimer,type:phrase)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:message,negate:!t,value:'Invalid%20host%20name'),query:(
[21:48:51] match:(message:(query:'Invalid%20host%20name',type:phrase)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:host,negate:!f,value:mw1017),query:(match:(host:(query:mw1017,type:phrase))))),options:(darkTheme:!f),panels:!((col:1,id:Top-20-Hosts,panelIndex:2,row:3,size_x:9,size_y:2,type:visualization),(col:1,columns:!(type,level,wiki,host,message),id:Default-Events-List,panelIndex:3,row:10,size_x:12,size_y:
[21:48:51] 23,sort:!('@timestamp',desc),type:search),(col:1,id:Fatal-Events-Over-Time,panelIndex:4,row:1,size_x:12,size_y:2,type:visualization),(col:1,id:Trending-Messages,panelIndex:5,row:5,size_x:12,size_y:5,type:visualization),(col:10,id:MediaWiki-Versions,panelIndex:6,row:3,size_x:3,size_y:2,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'(type:mediawiki%20AND%20(channel:exception%20OR%20channel:wfLogDBError))%20OR%
[21:48:53] 20type:hhvm')),title:'Fatal%20Monitor',uiState:(P-2:(spy:(mode:(fill:!f,name:!n)),vis:(legendOpen:!f)),P-4:(spy:(mode:(fill:!f,name:!n)),vis:(colors:(exception:%23C15C17,hhvm:%23BF1B00))),P-6:(spy:(mode:(fill:!f,name:!n)),vis:(legendOpen:!t))))
[21:48:57] Ugh, ffs
[21:49:02] or php5 packages rather.
[21:49:10] https://logstash.wikimedia.org/app/kibana#/dashboard/Fatal-Monitor?_g=(refreshInterval:(display:'5%20minutes',pause:!f,section:2,value:300000),time:(from:'2016-10-13T20:09:13.420Z',mode:absolute,to:'2016-10-13T20:20:16.578Z'))&_a=(filters:!(('$state':(store:appState),bool:(must:!((terms:(level:!(NOTICE,INFO,WARNING))),(term:(type:mediawiki)))),meta:(alias:!n,disabled:!f,index:'logstash-*',key:bool,negate:!t,value:'%7B%22must%22:%5B%7B
[21:49:11] %22terms%22:%7B%22level%22:%5B%22NOTICE%22,%22INFO%22,%22WARNING%22%5D%7D%7D,%7B%22term%22:%7B%22type%22:%22mediawiki%22%7D%7D%5D%7D')),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:message,negate:!t,value:SlowTimer),query:(match:(message:(query:SlowTimer,type:phrase)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:message,negate:!t,value:'Invalid%20host%20name'),query:(
[21:49:13] match:(message:(query:'Invalid%20host%20name',type:phrase)))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'logstash-*',key:host,negate:!f,value:mw1017),query:(match:(host:(query:mw1017,type:phrase))))),options:(darkTheme:!f),panels:!((col:1,id:Top-20-Hosts,panelIndex:2,row:3,size_x:9,size_y:2,type:visualization),(col:1,columns:!(type,level,wiki,host,message),id:Default-Events-List,panelIndex:3,row:10,size_x:12,size_y:
[21:49:15] 23,sort:!('@timestamp',desc),type:search),(col:1,id:Fatal-Events-Over-Time,panelIndex:4,row:1,size_x:12,size_y:2,type:visualization),(col:1,id:Trending-Messages,panelIndex:5,row:5,size_x:12,size_y:5,type:visualization),(col:10,id:MediaWiki-Versions,panelIndex:6,row:3,size_x:3,size_y:2,type:visualization)),query:(query_string:(analyze_wildcard:!t,query:'(type:mediawiki%20AND%20(channel:exception%20OR%20channel:wfLogDBError))%20OR%
[21:49:16] thanks kibana4
[21:49:17] 20type:hhvm')),title:'Fatal%20Monitor',uiState:(P-2:(spy:(mode:(fill:!f,name:!n)),vis:(legendOpen:!f)),P-4:(spy:(mode:(fill:!f,name:!n)),vis:(colors:(exception:%23C15C17,hhvm:%23BF1B00))),P-6:(spy:(mode:(fill:!f,name:!n)),vis:(legendOpen:!t))))
[21:49:21] That is not a share url, logstash
[21:49:40] So, my fix seems to be having the intended result
[21:49:46] but I wanted to compare on eval.php
[21:49:47] why would keeping application state in a url ever cause problems?
[21:50:00] :)
[21:50:30] Reedy: after you hit the [^] share toggle, hit the >< icon second from the right to get a short url
[21:51:01] it will give you something like https://logstash.wikimedia.org/goto/......
[21:51:34] * Reedy puts on the portal OST
[21:51:44] yurik: ^
[21:52:01] https://phabricator.wikimedia.org/P4213
[21:52:09] I obviously removed the username and password manually
[21:52:26] Reedy, looks good to me
[21:52:32] lets depl :)
[21:52:56] greg-g: thcipriani: Are you alright if I push everything back to .22?
[21:53:02] I'll make a revert commit in gerrit
[21:53:25] thcipriani: your call
[21:53:28] Reedy: that was the last blocker, so works for me.
[21:54:02] So, first, full deploy of my hack to mobile.php
[21:55:20] should probably migrate deploy convo back to -operations
[21:56:00] mmm
[22:14:08] composer jobs on Nodepool vm looks all fine
[22:14:13] so time for a sleep :]
[22:14:27] see you tomorrow !
[22:14:54] hashar: g'night sir
[22:27:01] Beta-Cluster-Infrastructure: Enable Quiz Extension on ca.wikipedia.beta.wmflabs.org for testing - https://phabricator.wikimedia.org/T142692#2543311 (greg) >>! In T142692#2713788, @Krenair wrote: > It'll still need a security review before that can be done Made the blocker task.
[22:52:17] Release-Engineering-Team, DBA, Phabricator, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2714880 (mmodell)
[22:54:27] * Reedy checks on logstash
[22:54:54] it seems angry
[22:58:48] Release-Engineering-Team, User-greg: Create RelEng FY1617Q1 QR slides (due 10/14) - https://phabricator.wikimedia.org/T146515#2714892 (greg)
[22:58:59] Release-Engineering-Team, User-greg: Gather workflow metrics for FY1617Q1 - https://phabricator.wikimedia.org/T146519#2714894 (greg) Open>Resolved
[22:59:01] Release-Engineering-Team, User-greg: Create RelEng FY1617Q1 QR slides (due 10/14) - https://phabricator.wikimedia.org/T146515#2663812 (greg)
[23:00:58] Gerrit, Patch-For-Review: Update site CSS customizations for the new change screen in Gerrit 2.12 - https://phabricator.wikimedia.org/T141286#2493106 (Dzahn) applied on prod gerrit
[23:30:25] Release-Engineering-Team, User-greg: Update RelEng skillmatrix (due 10/18) - https://phabricator.wikimedia.org/T146516#2715031 (greg)
[23:30:38] Release-Engineering-Team, User-greg: Update RelEng skillmatrix (due 10/18) - https://phabricator.wikimedia.org/T146516#2663826 (greg)
[23:30:40] Release-Engineering-Team, User-greg: skill matrix updates - https://phabricator.wikimedia.org/T140507#2715032 (greg)
[23:31:02] Release-Engineering-Team, User-greg: skill matrix updates - https://phabricator.wikimedia.org/T140507#2466983 (greg) Sticking to the wikipage for now, current setup. May work on this post QR for next quarter...