[01:47:17] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<22.22%)
[06:37:18] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[07:02:33] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:57:06] Release-Engineering: broken link to /w/COPYING in Special:Version on production wikis - https://phabricator.wikimedia.org/T107007#1484102 (Spage) NEW
[08:11:58] !log apt-get upgrade Trusty Jenkins slaves
[08:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:12:29] !log On CI slaves, bumping HHVM from 3.6.1+dfsg1-1+wm3 to 3.6.5+dfsg1-1+wm1
[08:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:24:25] Continuous-Integration-Infrastructure, HHVM: Upgrade HHVM related packages on Trusty Jenkins slaves - https://phabricator.wikimedia.org/T106699#1484130 (hashar) Open>Resolved a:hashar @Joe confirmed the extension are compatible, the extension packages have `Provides: hhvm-api-20150212` like the P...
[08:24:33] PROBLEM - Puppet failure on integration-slave-trusty-1011 is CRITICAL 40.00% of data above the critical threshold [0.0]
[08:25:52] PROBLEM - Puppet failure on integration-slave-trusty-1012 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:26:05] I broke puppet :D
[08:26:55] PROBLEM - Puppet failure on integration-slave-trusty-1016 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:39:52] !log upgrading python-pip on Trusty from 1.5.4-1ubuntu1 to 1.5.4-1ubuntu3 . Fix up pip silently removing system packages ( https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771794 )
[08:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:46:57] RECOVERY - Puppet failure on integration-slave-trusty-1016 is OK Less than 1.00% above the threshold [0.0]
[08:49:33] RECOVERY - Puppet failure on integration-slave-trusty-1011 is OK Less than 1.00% above the threshold [0.0]
[08:49:36] !log rebooting all Trusty jenkins slaves
[08:49:39] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:50:53] RECOVERY - Puppet failure on integration-slave-trusty-1012 is OK Less than 1.00% above the threshold [0.0]
[08:52:19] !log upgrading packages on Precise slaves
[08:52:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:52:55] PROBLEM - Puppet failure on integration-slave-trusty-1016 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:56:58] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL 40.00% of data above the critical threshold [0.0]
[08:59:12] PROBLEM - Puppet failure on integration-slave-precise-1014 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:01:06] PROBLEM - Puppet failure on integration-slave-precise-1012 is CRITICAL 11.11% of data above the critical threshold [0.0]
[09:01:44] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:06:04] RECOVERY - Puppet failure on integration-slave-precise-1012 is OK Less than 1.00% above the threshold [0.0]
[09:07:56] RECOVERY - Puppet failure on integration-slave-trusty-1016 is OK Less than 1.00% above the threshold [0.0]
[09:11:43] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK Less than 1.00% above the threshold [0.0]
[09:14:11] RECOVERY - Puppet failure on integration-slave-precise-1014 is OK Less than 1.00% above the threshold [0.0]
[09:16:57] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK Less than 1.00% above the threshold [0.0]
[09:22:26] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0]
[09:30:06] RECOVERY - Puppet failure on integration-dev is OK Less than 1.00% above the threshold [0.0]
[09:37:43] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL 30.00% of data above the critical threshold [0.0]
[09:37:53] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL 20.00% of data above the critical threshold [0.0]
[09:40:13] PROBLEM - Puppet failure on integration-slave-precise-1014 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:42:08] PROBLEM - Puppet failure on integration-slave-precise-1012 is CRITICAL 66.67% of data above the critical threshold [0.0]
[09:45:34] RECOVERY - Free space - all mounts on integration-slave-trusty-1015 is OK All targets OK
[10:16:00] Release-Engineering: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#1484393 (Nemo_bis)
[10:24:27] (PS2) Hashar: Add the npm job for the MobileApps service repository [integration/config] - https://gerrit.wikimedia.org/r/226736 (https://phabricator.wikimedia.org/T106831) (owner: Mobrovac)
[10:25:47] (CR) Hashar: [C: 2] Add the npm job for the MobileApps service repository [integration/config] - https://gerrit.wikimedia.org/r/226736 (https://phabricator.wikimedia.org/T106831) (owner: Mobrovac)
[10:27:18] (Merged) jenkins-bot: Add the npm job for the MobileApps service repository [integration/config] - https://gerrit.wikimedia.org/r/226736 (https://phabricator.wikimedia.org/T106831) (owner: Mobrovac)
[10:40:08] (PS1) Hashar: tests: report test duration with nose-timer [integration/config] - https://gerrit.wikimedia.org/r/227199
[10:54:23] (CR) Hashar: "The tox-py27 job console shows duration for each test now: https://integration.wikimedia.org/ci/job/tox-py27/2239/console" [integration/config] - https://gerrit.wikimedia.org/r/227199 (owner: Hashar)
[11:03:27] (CR) Hashar: "gotta set debian-glue $distribution based on the changelog entry." [integration/config] - https://gerrit.wikimedia.org/r/226911 (owner: Hashar)
[12:18:27] Release-Engineering: broken link to /w/COPYING in Special:Version on production wikis - https://phabricator.wikimedia.org/T107007#1484619 (mmodell) a:mmodell
[12:18:39] Release-Engineering: broken link to /w/COPYING in Special:Version on production wikis - https://phabricator.wikimedia.org/T107007#1484102 (mmodell) p:Triage>High
[12:35:17] Beta-Cluster, Pywikibot-OAuth: Set up a Pywikibot OAuth test client on the Beta cluster - https://phabricator.wikimedia.org/T104764#1484629 (hashar) Changed group membership for User:Pywikibot-oauth from (none) to confirmed user :-}
[12:37:25] Release-Engineering: broken link to /w/COPYING in Special:Version on production wikis - https://phabricator.wikimedia.org/T107007#1484632 (mmodell) As far as I can tell this must have been broken for a long time. /w/static/$VERSION/ contains 3 symlinks: |`extensions/`|all of the deployed extensions| |`skin...
[12:39:55] PROBLEM - Puppet failure on integration-slave-trusty-1014 is CRITICAL 60.00% of data above the critical threshold [0.0]
[12:44:48] Release-Engineering: broken link to /w/COPYING in Special:Version on production wikis - https://phabricator.wikimedia.org/T107007#1484635 (hashar) The `w/COPYING` and `w/CREDITS` were removed by https://gerrit.wikimedia.org/r/#/c/177476/ which was self merged. Note the links show as broken in a checkout of o...
[13:14:56] RECOVERY - Puppet failure on integration-slave-trusty-1014 is OK Less than 1.00% above the threshold [0.0]
[13:15:47] pfff
[13:19:32] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<20.00%)
[13:21:18] !log puppet stalled on Precise Jenkins slaves :-(
[13:21:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:26:14] (PS1) Paladox: Add BlogPage to testextension [integration/config] - https://gerrit.wikimedia.org/r/227217
[13:26:27] (PS2) Paladox: Add BlogPage to testextension [integration/config] - https://gerrit.wikimedia.org/r/227217
[13:26:40] !log Precise slaves had faulty elasticsearch: apt-get install --reinstall elasticsearch
[13:26:43] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[13:31:22] (PS8) Paladox: Add jenkings test for BoilerPlate [integration/config] - https://gerrit.wikimedia.org/r/226680
[13:42:42] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK Less than 1.00% above the threshold [0.0]
[13:45:11] RECOVERY - Puppet failure on integration-slave-precise-1014 is OK Less than 1.00% above the threshold [0.0]
[13:46:02] Continuous-Integration-Infrastructure, Zuul: Zuul fails "Function build:integration-zuul-layoutdiff is not registered (negative ttl in effect)" - https://phabricator.wikimedia.org/T94657#1484732 (hashar) Open>declined a:hashar Not sure what happened. We can investigate properly if it occurs again.
[13:47:04] RECOVERY - Puppet failure on integration-slave-precise-1012 is OK Less than 1.00% above the threshold [0.0]
[13:47:55] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK Less than 1.00% above the threshold [0.0]
[14:11:11] PROBLEM - Free space - all mounts on integration-slave-trusty-1011 is CRITICAL integration.integration-slave-trusty-1011.diskspace._mnt.byte_percentfree (<30.00%)
[14:22:16] (PS1) Hashar: 'recheck' on CR+2 now triggers gate-and-submit [integration/config] - https://gerrit.wikimedia.org/r/227223 (https://phabricator.wikimedia.org/T105474)
[14:22:45] (CR) Hashar: [C: -1] "Requires a Zuul upgrade to include https://review.openstack.org/#/c/102726/26/doc/source/zuul.rst" [integration/config] - https://gerrit.wikimedia.org/r/227223 (https://phabricator.wikimedia.org/T105474) (owner: Hashar)
[14:23:43] (CR) jenkins-bot: [V: -1] 'recheck' on CR+2 now triggers gate-and-submit [integration/config] - https://gerrit.wikimedia.org/r/227223 (https://phabricator.wikimedia.org/T105474) (owner: Hashar)
[14:36:09] RECOVERY - Free space - all mounts on integration-slave-trusty-1011 is OK All targets OK
[14:40:02] Continuous-Integration-Infrastructure: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#1484872 (hashar) a:hashar
[14:40:16] Continuous-Integration-Infrastructure: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#1392464 (hashar) p:Normal>Low Still being reviewed upstream at https://review.openstack.org/#/c/109276/
[14:40:46] (PS4) Paladox: Update Apex tests [integration/config] - https://gerrit.wikimedia.org/r/226994
[14:42:24] Release-Engineering, Wikimedia-Git-or-Gerrit: Unreviewed commits merged in gerrit - https://phabricator.wikimedia.org/T103396#1484885 (hashar)
[14:43:13] Beta-Cluster, Pywikibot-OAuth: Set up a Pywikibot OAuth test client on the Beta cluster - https://phabricator.wikimedia.org/T104764#1484887 (VcamX) @hashar Thanks! [Th new consumer](http://deployment.wikimedia.beta.wmflabs.org/w/index.php?title=Special:OAuthListConsumers/view/8dd31b299e8e455b6b6d12fbd5c5...
[14:44:31] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Diamond metrics for cpu.system suddenly up 100% after a reboot - https://phabricator.wikimedia.org/T95912#1484891 (hashar) Open>Resolved a:hashar I rebooted the slaves and all metrics look fine.
[14:44:43] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Diamond collected metrics about memory usage inaccurate until third reboot - https://phabricator.wikimedia.org/T91351#1484903 (hashar) I rebooted the slaves and all metrics look fine.
[14:44:45] Continuous-Integration-Infrastructure: Re-create ci slaves (April 2015) - https://phabricator.wikimedia.org/T94916#1484906 (hashar)
[14:44:48] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Diamond collected metrics about memory usage inaccurate until third reboot - https://phabricator.wikimedia.org/T91351#1484904 (hashar) Open>Resolved a:hashar
[14:51:25] Continuous-Integration-Infrastructure, Wikidata, Composer: create mirror for for our composer dependencies - https://phabricator.wikimedia.org/T106548#1484922 (hashar) I played a bit with Satis ages ago. IIRC you can list the packages you want to install on the mirror and end up with a local/private...
[15:00:50] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1484941 (Addshore) Just took another looks at this and it would seem there are other URLs that composer could use to download the zips that...
[15:21:10] Continuous-Integration-Infrastructure, Patch-For-Review: Create CI slaves using Debian Jessie (tracking) - https://phabricator.wikimedia.org/T94836#1485044 (hashar)
[15:21:12] Continuous-Integration-Infrastructure, Patch-For-Review: Install etcd on Jessie CI slaves - https://phabricator.wikimedia.org/T103976#1485041 (hashar) Open>Resolved a:hashar We moved the tox jobs to Jessie as part of T103972
[15:22:54] Continuous-Integration-Infrastructure, Wikimedia-Fundraising-CiviCRM: CI for Civi: provision and run tests under Jenkins/Zuul - https://phabricator.wikimedia.org/T86103#961478 (hashar) Seems this is just waiting for {T91911}.
[15:23:28] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[15:31:52] greg-g: It's just a POC at this point, but -- https://stashbot.wmflabs.org/#/dashboard/elasticsearch/SAL
[15:35:58] heh https://stashbot.wmflabs.org/#/dashboard/elasticsearch/default
[15:36:01] bd808: neat-o
[15:37:17] crazy plan #2: provide searchable irc logs by channel. Like wm-bot logs but without all the download and grep
[15:37:35] (PS9) Paladox: Update CheckUser tests [integration/config] - https://gerrit.wikimedia.org/r/225182
[15:38:09] ie: "who needs slack?" :)
[15:38:34] (PS8) Paladox: Update Maintenance extension tests [integration/config] - https://gerrit.wikimedia.org/r/225222
[15:38:42] crazy plan #3: "!bash some really cool quote here" in any of those channels will feed an index to power a quips tool :)
[15:39:08] now you're talking
[15:39:40] (PS13) Paladox: Update SyntaxHighlight_GeSHi tests [integration/config] - https://gerrit.wikimedia.org/r/225035
[15:40:54] most useful thing I discovered working on it this weekend: regex to strip irc formatting codes
[15:40:58] (PS8) Paladox: Update farmer tests [integration/config] - https://gerrit.wikimedia.org/r/225042
[15:41:18] (PS9) Paladox: Update tests in vector extension [integration/config] - https://gerrit.wikimedia.org/r/225029
[15:42:01] (PS15) Paladox: Update theme tests [integration/config] - https://gerrit.wikimedia.org/r/224838
[15:42:07] (PS17) Paladox: Update tests for Vector skin [integration/config] - https://gerrit.wikimedia.org/r/224824
[15:44:08] (PS8) Paladox: Update BreadCrumbs tests [integration/config] - https://gerrit.wikimedia.org/r/225291
[15:44:10] bd808: but! colors!
[15:44:25] (PS3) Paladox: Add extension-unittests-generic to metrolook [integration/config] - https://gerrit.wikimedia.org/r/226995
[16:05:49] greg-g: subjective irc logging question for you: are join/part/nick messages useful in a channel log?
[16:06:33] I'm not catching them yet with the logstash setup and debating if they are really useful or just noise
[16:07:06] other than an occasional nick based joke I'm leaning to them just being noise
[16:11:21] bd808: probably for this use case not, unless you want to log it all, then do post-processing?
[16:11:36] kinda like the SAL view you have
[16:12:30] I think I'll leave them out until people actually use the tool and whine that they are missing
[16:13:54] good call :)
[16:15:55] (PS1) Hashar: Throttle mediawiki core jobs to one per node [integration/config] - https://gerrit.wikimedia.org/r/227234
[16:33:03] ostriches: I jsut added a repo request to https://www.mediawiki.org/wiki/Git/New_repositories/Requests -- I need an operations/software/logstash/plugins.git repo to store gem files in for the newest version of logstash
[16:33:26] which is yuck but what they have decided to do now :/
[16:34:28] bd808: fyi, he's still on vacation through today
[16:34:36] bah
[16:34:45] "reachable by email"
[16:34:56] who else has gerrit repo creation powers?
[16:35:06] qchris and ...?
[16:35:15] opsen?
[16:35:32] I think i can create via UI
[16:35:37] dunno if this is specific to me
[16:36:35] uhh don't
[16:36:36] https://gerrit.wikimedia.org/r/#/admin/create-project/
[16:36:42] roots have all the fun
[16:36:43] this is documented...somewhere
[16:36:58] https://www.mediawiki.org/wiki/Git/Creating_new_repositories
[16:37:18] https://gerrit.wikimedia.org/r/#/admin/groups/119,members
[16:37:19] > Even though there's a way to create new repositories via the Gerrit web interface, please don't use that GUI.
[16:37:21] in bold too!
[16:37:48] ori has da powers! I can bug him
[16:37:55] I've done it with success but I won't recommend it :)
[16:44:56] Browser-Tests, MediaWiki-extensions-WikibaseRepository, Wikidata: investigate failing Wikidata browsertests on jenkins - https://phabricator.wikimedia.org/T92619#1485301 (Jonas)
[16:45:15] Browser-Tests, MediaWiki-extensions-WikibaseRepository, Wikidata: investigate failing Wikidata browsertests on jenkins - https://phabricator.wikimedia.org/T92619#1116439 (Jonas)
[16:55:49] Browser-Tests, MediaWiki-extensions-DonationInterface: Write browser tests for DonationInterface - https://phabricator.wikimedia.org/T99955#1485345 (awight)
[17:02:18] marxarelli: dan can you state that Jenkins config is freeform and anyone is welcome to help :D_
[17:02:20] bah too late
[17:02:25] time goes fast
[17:02:41] marxarelli: thank you to have set up that workshop
[18:07:47] PROBLEM - Free space - all mounts on deployment-fluorine is CRITICAL deployment-prep.deployment-fluorine.diskspace.root.byte_percentfree (<100.00%)
[18:19:09] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1485647 (JanZerebecki) We will still need a local cache like satis. But changing composer to not hit the github api if it is possible is a g...
[18:24:31] (CR) JanZerebecki: [C: 2] tests: report test duration with nose-timer [integration/config] - https://gerrit.wikimedia.org/r/227199 (owner: Hashar)
[18:28:03] (Merged) jenkins-bot: tests: report test duration with nose-timer [integration/config] - https://gerrit.wikimedia.org/r/227199 (owner: Hashar)
[18:45:13] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1485720 (hashar) Maybe we can route all requests to a shared web proxy that would cache the packages? That would benefit other package syst...
[18:49:12] Beta-Cluster, Analytics-EventLogging, VisualEditor: Beta cluster is sending VisualEditor events to production bits.wikimedia.org/statsv - https://phabricator.wikimedia.org/T98196#1485738 (Jdforrester-WMF)
[18:49:48] (CR) Ejegg: "The call to composer was added to the wmff configuration in our branch of civicrm-buildkit in I6b7e01719d90bf3f6450bf8938a6a1a29921aa1c" [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[18:50:44] Beta-Cluster, Pywikibot-OAuth: Set up a Pywikibot OAuth test client on the Beta cluster - https://phabricator.wikimedia.org/T104764#1485748 (hashar) http://deployment.wikimedia.beta.wmflabs.org/w/index.php?title=Special:OAuthListConsumers/view/8dd31b299e8e455b6b6d12fbd5c5d2c1&name=&publisher=&stage=0 is...
[18:58:44] (CR) Ejegg: "Or rather, this patch makes it stop needlessly checking out the vendor submodule. That submodule has .git directories removed for the dev" [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[19:03:19] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%)
[19:10:54] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1485812 (JanZerebecki) Yes unless we add a MITM proxy they can not be cached by a normal proxy.
[19:15:39] (CR) Dduvall: "recheck" [selenium] - https://gerrit.wikimedia.org/r/226950 (owner: Dduvall)
[19:15:43] Browser-Tests, MediaWiki-extensions-General-or-Unknown: running `bundle install` on the core repo modifies Gemfile.lock - https://phabricator.wikimedia.org/T107071#1485842 (Amire80) NEW
[19:18:36] Beta-Cluster, Parsoid, Varnish: deployment-parsoidcache02 fails varnish VCL compilation - https://phabricator.wikimedia.org/T106662#1485860 (thcipriani) p:Triage>Normal
[19:19:06] Beta-Cluster, Parsoid, Varnish: deployment-parsoidcache02 fails varnish VCL compilation - https://phabricator.wikimedia.org/T106662#1485866 (hashar) Error is still present.
[19:20:40] Beta-Cluster, Labs, operations, Monitoring: Setup (simple) catchpoint monitoring and metrics for enwiki betacluster just like production - https://phabricator.wikimedia.org/T97865#1485870 (hashar) Will be done with Jenkins, see {T106421}.
[19:22:09] Beta-Cluster, Release-Engineering, Jenkins, Monitoring: Create metrics of Beta Cluster stability using a Jenkins job - https://phabricator.wikimedia.org/T106421#1485881 (thcipriani) p:Triage>Normal
[19:24:25] (CR) JanZerebecki: Update vendor using composer rather than cloning the deployment repo [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[19:24:40] Beta-Cluster: Parser cache (memcached?) broken in Beta Cluster - https://phabricator.wikimedia.org/T91310#1485897 (Mattflaschen) This is still present: ``` matthew@matthew-l55: ~/Code/Wikimedia/operations/mediawiki-config (master|…)% curl -s 'http://en.wikipedia.beta.wmflabs.org/wiki/Timestamp_test' -H 'Coo...
[19:25:13] Beta-Cluster: puppet run broken on deployment-fluorine - https://phabricator.wikimedia.org/T106655#1485905 (hashar)
[19:28:21] Beta-Cluster: puppet run broken on deployment-fluorine - https://phabricator.wikimedia.org/T106655#1485921 (hashar) trimmed /var/log/udp2log/udp2log.log which is VERY QUICKLY taking all the disk space on /var .
[19:32:42] Beta-Cluster: Upgrade beta cluster puppet master from Precise to Trusty - https://phabricator.wikimedia.org/T106649#1485942 (hashar)
[19:32:43] Beta-Cluster: puppet run broken on deployment-fluorine - https://phabricator.wikimedia.org/T106655#1485939 (hashar) Open>Resolved a:hashar I deleted the client SSL cert `rm -fR /var/lib/puppet/client/ssl/` and puppet managed to grab and write on disk the cert stuff. Puppet run !!!!!!
[19:33:16] Beta-Cluster, Sentry: Integrate Sentry with beta cluster - https://phabricator.wikimedia.org/T106920#1485946 (mmodell) p:Normal>Triage
[19:33:37] Beta-Cluster, Sentry: Integrate Sentry with beta cluster - https://phabricator.wikimedia.org/T106920#1485949 (mmodell) p:Triage>Normal
[19:34:09] Beta-Cluster: Beta cluster "test.wikipedia" thinks it is "test.wikimedia" - https://phabricator.wikimedia.org/T99156#1485953 (mmodell) p:Triage>Normal
[19:37:49] Beta-Cluster: Can not ssh to beta cluster instance deployment-apertium01 - https://phabricator.wikimedia.org/T106658#1485976 (thcipriani) p:Triage>Low
[19:41:46] Beta-Cluster: Can not ssh to beta cluster instance deployment-apertium01 - https://phabricator.wikimedia.org/T106658#1485995 (hashar) I have soft rebooted the instance via Horizon.
[19:41:47] RECOVERY - Puppet staleness on deployment-fluorine is OK Less than 1.00% above the threshold [3600.0]
[19:42:49] RECOVERY - Free space - all mounts on deployment-fluorine is OK All targets OK
[19:43:15] (PS2) Dduvall: Expose client cookies [ruby/api] - https://gerrit.wikimedia.org/r/226949
[19:45:12] Beta-Cluster, Traffic, operations: Upgrade beta-cluster caches to jessie - https://phabricator.wikimedia.org/T98758#1486006 (thcipriani) p:Normal>High Sounds like Varnish packages won't be getting built for Trusty any longer, upping priority.
[19:55:41] Release-Engineering, Patch-For-Review: broken link to /w/COPYING in Special:Version on production wikis - https://phabricator.wikimedia.org/T107007#1486040 (mmodell) Open>Resolved
[19:56:17] Beta-Cluster: Can not ssh to beta cluster instance deployment-apertium01 - https://phabricator.wikimedia.org/T106658#1486042 (hashar) Hard rebooted the instance.
[20:00:05] Beta-Cluster: Can not ssh to beta cluster instance deployment-apertium01 - https://phabricator.wikimedia.org/T106658#1486050 (hashar) The instance is unreachable by SSH. OpenSSH did start but the 22 port does not respond. Maybe the instance is firewalled :-/ Maybe the best would be to rebuild the apertium...
[20:18:57] Beta-Cluster: Can not ssh to beta cluster instance deployment-apertium01 - https://phabricator.wikimedia.org/T106658#1486101 (thcipriani) Looks like there are IPTables rules in place preventing login. Running `sudo salt 'deployment-apertium01*' cmd.run 'iptables -I INPUT -p tcp --dport 22 -s 10.68.17.232 -j...
[20:23:49] PROBLEM - Puppet staleness on deployment-apertium01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[20:48:13] (PS4) JanZerebecki: Update vendor using composer rather than cloning the deployment repo [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[20:49:53] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[20:50:07] (PS1) Florianschmidtwelzow: Add QuickSearchLookup generic basic tests [integration/config] - https://gerrit.wikimedia.org/r/227341
[20:58:48] RECOVERY - Puppet staleness on deployment-apertium01 is OK Less than 1.00% above the threshold [3600.0]
[20:59:52] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0]
[21:18:27] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:26:17] PROBLEM - Puppet failure on deployment-cache-text03 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:29:33] (CR) Hashar: [C: -1] "I love this new scap feature and I am eager to see being used on beta cluster to deploy all the backend services whenever a merge occur on" (17 comments) [tools/scap] - https://gerrit.wikimedia.org/r/224374 (owner: Thcipriani)
[21:38:23] (CR) JanZerebecki: "Updated Jenkins jobs wikimedia-fundraising-civicrm." [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[21:38:46] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[21:52:08] (CR) JanZerebecki: [C: 2] "Seems Adam already did that. The git checkout of vendor is on no slave for that job anymore. Please make sure my update of the script didn" [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[21:54:25] (Merged) jenkins-bot: Update vendor using composer rather than cloning the deployment repo [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[22:25:55] (PS1) Dduvall: Increase default MW-Selenium WebDriver timeout [integration/config] - https://gerrit.wikimedia.org/r/227362 (https://phabricator.wikimedia.org/T106878)
[22:26:27] jdlrobson: ^
[22:35:13] (PS2) Dduvall: WIP Login helper for fast API-based authentication [selenium] - https://gerrit.wikimedia.org/r/226950
[22:35:39] (CR) jenkins-bot: [V: -1] WIP Login helper for fast API-based authentication [selenium] - https://gerrit.wikimedia.org/r/226950 (owner: Dduvall)
[22:55:04] marxarelli: thnx for hosting the testing session today
[22:55:11] it was really cool
[22:56:47] mobrovac: oh, thanks. glad you found it useful
[22:58:00] mobrovac: now we just need to keep the momentum going. grace and i will be meeting this wednesday to sift through all the notes
[22:58:14] cool
[22:58:17] glad to hear that
[22:59:02] marxarelli: glad to hear there are immediate follow-ups and that this practice will continue
[22:59:07] (CR) Jdlrobson: [C: 1] Increase default MW-Selenium WebDriver timeout [integration/config] - https://gerrit.wikimedia.org/r/227362 (https://phabricator.wikimedia.org/T106878) (owner: Dduvall)
[22:59:12] ofc, let me know if i can help
[23:00:17] mobrovac: most definitely
[23:04:33] !log running `jenkins-jobs update config/ 'browsertests-*'` to deploy I3c61ff4089791375e21aadfa045d503dfd73ca0e
[23:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:07:47] (CR) Dduvall: [C: 2] "Deployed." [integration/config] - https://gerrit.wikimedia.org/r/227362 (https://phabricator.wikimedia.org/T106878) (owner: Dduvall)
[23:11:57] PROBLEM - Free space - all mounts on integration-slave-trusty-1012 is CRITICAL integration.integration-slave-trusty-1012.diskspace._mnt.byte_percentfree (<10.00%)
[23:24:07] (Merged) jenkins-bot: Increase default MW-Selenium WebDriver timeout [integration/config] - https://gerrit.wikimedia.org/r/227362 (https://phabricator.wikimedia.org/T106878) (owner: Dduvall)
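Editor's note on the 15:40:54 remark about a "regex to strip irc formatting codes": the regex actually used for the logstash setup is not shown in this log, so the following is only a minimal sketch of the idea in Python. It assumes the common mIRC-style conventions (a \x03 colour byte followed by up to two foreground digits and an optional ",background", a \x04 hex colour, and single-byte toggles for bold, italic, underline and so on). The names `IRC_FORMATTING` and `strip_irc_formatting` are hypothetical.

```python
import re

# Assumed mIRC-style control codes; not taken from the tool discussed above.
IRC_FORMATTING = re.compile(
    r"\x03(?:\d{1,2}(?:,\d{1,2})?)?"                  # colour: \x03 [fg[,bg]]
    r"|\x04(?:[0-9A-Fa-f]{6}(?:,[0-9A-Fa-f]{6})?)?"   # hex colour: \x04 [rrggbb[,rrggbb]]
    r"|[\x02\x0f\x11\x16\x1d\x1e\x1f]"                # bold, reset, monospace, reverse,
)                                                     # italic, strikethrough, underline


def strip_irc_formatting(text: str) -> str:
    """Return text with IRC colour and style control codes removed."""
    return IRC_FORMATTING.sub("", text)


if __name__ == "__main__":
    raw = "(\x0303PS2\x03) \x0310Hashar\x03: [C: \x03032\x03] example"
    print(strip_irc_formatting(raw))
```

The colour digits have to be removed together with the \x03 byte. If a logger drops only the control byte, the leftover digits become indistinguishable from ordinary text and cannot be stripped reliably afterwards.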