[00:01:18] thcipriani, so now I'm stuck with 0/1 minions completed fetch
[00:02:36] Krenair: it might be easier to see the error running: sudo salt-call deploy.fetch "" but it usually has something to do with the redis returner.
[00:03:22] status 10?
[00:04:39] https://github.com/wikimedia/operations-puppet/blob/production/modules/deployment/files/modules/deploy.py#L504
[00:05:18] so could be something to do with: /srv/deployment/[repo]/.git/deploy/deploy being...not there or incorrect.
[00:06:24] root@sm-puppetmaster-trusty2:/srv/deployment/servermon/servermon# cat .git/deploy/deploy
[00:06:24] {"sync-time": "20160422-000144", "tag": "servermon/servermon-sync-20160422-000144", "user": "Alex Monk", "time": "20160422-000144"}
[00:06:37] hmmm that seems correct.
[00:08:15] I wonder if the configured url is wrong in the deployment_config pillar?
[00:08:27] or if you can't fetch from that configured url via http
[00:09:02] possibly
[00:09:07] not sure how to construct the url myself
[00:09:18] check the server value in /srv/pillars/deployment/deployment_config.sls
[00:10:13] it should just be: http://[server value from deployment_config]/[repo]/.git/deploy/deploy
[00:10:18] ['deployment_config']['servers']['eqiad'] ?
[00:10:38] ah, yeah, that is correct.
[00:10:48] The requested URL /servermon/servermon/.git/deploy/deploy was not found on this server.
[00:11:06] < Server: Apache/2.4.7 (Ubuntu)
[00:11:54] if you GET / then this is the only td:
[00:12:02] tr with tds:*
[00:12:04] [DIR]html/2016-04-21 21:50 -
[00:12:24] so the apache docroot is /var/www
[00:12:48] instead of /srv/deployment
[00:14:07] ah, the apache site comes from role::deployment::server
[00:14:11] only non-dummy file in apache's sites-enabled contains this line: "CustomLog /var/log/apache2/_access.log wmf", but that file doesn't exist
[00:14:41] /etc/apache2/sites-enabled/50-deployment.conf ? yes, I got that one
[00:15:26] /etc/apache2/apache2.conf does have this near the end: "IncludeOptional sites-enabled/*.conf"
[00:17:02] confirmed it's definitely looking in /var/www/ by creating a file there - it shows in the index
[00:17:39] sites-available/default-ssl.conf: DocumentRoot /var/www/html
[00:17:39] sites-available/000-default.conf: DocumentRoot /var/www/html
[00:17:46] but neither of these are symlinked in sites-enabled
[00:17:57] hmm, I suppose it wouldn't be surprising if we made some bad assumptions about the base-state of apache in puppet.
[00:19:41] but the apache config looks fine, it's just misbehaving
[00:20:19] (PS1) Ppchelko: Set up CI for event-schemas repository. [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438)
[00:21:47] thcipriani, ohhh... hang on
[00:21:53] I stopped apache, but when I start it:
[00:21:54] Output of config test was:
[00:21:54] AH00526: Syntax error on line 3 of /etc/apache2/sites-enabled/50-deployment.conf:
[00:21:54] ServerName takes one argument, The hostname and port of the server
[00:22:06] wtf?
[00:22:07] (CR) Ppchelko: "I'd like to point out that I have 0 experience with this system, so I have no idea what I am doing here. Help would be very much appreciat" [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[00:22:44] evidently it doesn't like whatever the value of apache_fqdn is?
[00:22:56] ServerName <%= @apache_fqdn %>
[00:23:32] That explains the /var/log/apache2/_error.log
[00:23:38] There's supposed to be something before the _
[00:25:00] (CR) Legoktm: "Can the package.json file be moved to the root? Most MW extensions are also not npm packages, but still keep it in the root." [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[00:26:38] so yeah, the problem was my puppet changes broke apache_fqdn
[00:26:44] still wonder how apache started in that state
[00:27:43] (CR) Ppchelko: "@Legoktm I'm wondering how we didn't see this coming..." [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[00:28:34] maybe just reloading the config failed, no restart attempted.
[00:29:12] maybe puppet showed it fail to reload, but the next time puppet ran it didn't touch apache, so didn't re-attempt, and I didn't notice
[00:31:57] still got 0 minions completed fetch
[00:31:58] but:
[00:31:59] # curl http://sm-puppetmaster-trusty2/servermon/servermon/.git/deploy/deploy
[00:31:59] {"sync-time": "20160422-003039", "tag": "servermon/servermon-sync-20160422-003039", "user": "Alex Monk", "time": "20160422-003039"}
[00:34:43] and the files arrive on the remote host correctly
[00:34:47] yeah, there are lots of ways that trebuchet can fail :) might try the deploy.fetch again
[00:35:20] if that works try: deploy.checkout
[00:35:30] status 0
[00:35:48] also status 0
[00:35:57] kk, so it's got to be a redis thing now
[00:36:27] redis-cli hgetall deploy:[repo]:minions:[host]
[00:37:07] 1) "fetch_checkin_timestamp"
[00:37:07] 2) "1461285221.474493"
[00:37:07] 3) "checkout_checkin_timestamp"
[00:37:07] 4) "1461285008.485345"
[00:37:15] trebuchet is checking for the tag that it's deploying in redis to power the 0/X minions message.
[00:37:44] yeah, the checkin timestamps happen before it does checkout and fetch
[00:38:07] then after they complete it uses the salt returner...lemme see if I have how to test this in my notes...
[00:41:35] hmmm...not in my notes...it'd be something like sudo salt-call deploy.fetch [repo] --return deploy_redis
[00:42:44] that's the name of the returner: deploy_redis.py. It should have keys like: fetch_tag, fetch_status, and fetch_timestamp
[00:43:00] status 0
[00:44:35] but didn't write anything to deploy:[repo]:minions:[host] fetch_tag?
[00:44:58] redis-cli hget deploy:[repo]:minions:[host] 'fetch_tag'
[00:46:09] root@sm-puppetmaster-trusty2:/srv/deployment/servermon/servermon# redis-cli hget deploy:servermon/servermon:minions:sm1.servermon.eqiad.wmflabs 'fetch_tag'
[00:46:09] (nil)
[00:47:20] ugh. I hate trebuchet. OK, so it can write tags since it's writing the fetch_checkin tag.
[00:48:46] so what git deploy is actually calling is: sudo salt-call publish.runner deploy.fetch [repo] maybe that'll provide some more insight?
[00:50:20] [INFO ] Publishing runner 'deploy.fetch' to tcp://10.68.16.66:4506
[00:50:20] local:
[00:50:20] None
[00:50:33] holy crap
[00:50:39] dig that IP
[00:51:20] hmm, weird
[00:51:56] the last time I ran into this, my solution was not very technically sound: https://phabricator.wikimedia.org/T125067#1979516 it was mostly: mash everything until it works.
[00:52:18] ultimately what worked was reinstalling salt-minion
[00:56:15] ok, left the IP thing at https://phabricator.wikimedia.org/T115194#1717673
[01:00:45] hmm, also 4506 isn't a very standard port for redis iirc, is that where redis is running?
[01:01:25] no, that port is bound by /usr/bin/python /usr/bin/salt-master
[01:02:21] maybe that's normal then? I dunno. Worth doublechecking the redis port settings in /srv/pillars/deployment/deployment_config.sls
[01:03:06] "redis": {"db": 0, "host": "sm-puppetmaster-trusty2.servermon.eqiad.wmflabs", "port": 6379, "socket_connect_timeout": 5}
[01:03:44] (that port is in use by /usr/bin/redis-server)
[01:03:47] ah, ok, it's probably normal then. I guess it just means that it's running stuff on the puppet master.
[01:04:01] er saltmaster
[01:04:17] the host was originally intended to be a puppetmaster, yes. but then I added the salt master and trebuchet deployment server
[01:04:24] so now it's all three
[01:04:51] heh, shouldn't be a problem really.
[01:05:42] thcipriani, saltutil.refresh_pillar returned True for all three hosts
[01:05:42] yeah, I'd give sudo apt-get remove salt-minion; sudo apt-get install salt-minion a shot. I'm out of ideas after that :(
[01:06:05] saltutil.sync_all returned this for each:
[01:06:06] ----------
[01:06:06] grains:
[01:06:06] modules:
[01:06:06] outputters:
[01:06:08] renderers:
[01:06:10] returners:
[01:06:12] states:
[01:06:14] utils:
[01:06:30] I think that means everything is in sync
[01:06:59] (or, at least, the salt master didn't think it needed to sync anything)
[01:07:10] and yet "0/2 minions completed fetch" comes from git deploy sync
[01:07:30] ... maybe that happens if they don't need to sync anything??
[01:07:56] nah, git deploy creates a new tag each deploy, iirc
[01:08:16] so there's always something to fetch
[01:08:42] I haven't been starting a new deploy each time
[01:09:15] I'm pretty sure git deploy start just creates a lock file
[01:10:29] oh no wait, git deploy start does write the tag
[01:11:25] I did abort --force, start, sync
[01:11:32] 0/2 minions completed fetch
[01:12:31] wait. they both write tags, for some reason. https://github.com/wikimedia/operations-software-deployment-trebuchet-trigger/blob/master/trigger/shell.py#L150
[01:14:37] it might be useful to run the master in the foreground, there is some debug output https://github.com/wikimedia/operations-puppet/blob/production/modules/deployment/files/returners/deploy_redis.py#L45-L46
[01:14:46] https://docs.saltstack.com/en/develop/topics/troubleshooting/index.html#running-in-the-foreground
[01:19:00] anyway, I've actually got to run
[01:19:22] also: this effort should be expended moving this repo to scap3 anyway :P
[01:21:49] I'd still like to upload the puppet patch, can't really do that if it might break everything
[01:24:50] yeah. I don't think there's likely anything wrong with puppet so much as this is a weird salt-related thing.
[01:25:54] the non-determinism is strong whenever salt is involved. In my limited experience anyway.
[02:07:03] I stopped the salt-master service and am running "salt-master -l debug", nothing like "Entering deploy_redis returner" seen
[02:20:48] hmm, I'm fairly sure it's the master that should receive that message since git deploy triggers the runner that runs a client.cmd with a returner
[02:21:41] although, since it is a client.cmd it *could be* that it is the minion that would get the debug message.
[02:38:21] yep, there it is
[02:38:24] [DEBUG ] Entering deploy_redis returner
[02:38:27] on the minion
[02:40:04] 1/2 minions completed fetch
[02:40:05] WTF
[02:40:28] I didn't change anything
[02:42:15] (the minion not completing the fetch is the deployment server/salt master. . .)
[02:42:54] hmm, probably going to have to futz with editing that returner file to get more debug info /var/cache/salt/minion/files/base/_returners/deploy_redis.py
[02:48:15] salt-minion on the master just has "grains target:" and "Attempting to match"
[02:50:29] like it fails to match
[02:50:50] the other one would do things like "User root Executing command deploy.fetch with jid 20160422023503910412"
[02:51:25] should it be trying to deploy to itself like that?
[02:52:16] hmm, oh! actually I think there may be code that prevents a master from deploying to itself. Lemme double-check, this is something I just half-remembered.
[02:57:15] yup: https://github.com/wikimedia/operations-puppet/blob/production/modules/deployment/files/runners/deploy.py#L15-L18
[02:58:23] so it matches all the deployment_target:[repo] but not deployment_server:*
[02:58:25] https://docs.saltstack.com/en/latest/topics/targeting/compound.html
[03:01:33] doesn't explain how it ended up in the redis db in the first place.
[03:02:31] maybe try removing trebuchet master, resyncing, see if it's 1/1: redis-cli srem deploy:[repo]:minions "[deployhost]"
[03:09:34] (integer) 1
[03:10:23] 1/1 minions completed fetch
[03:10:49] 1/1 minions completed checkout
[03:10:57] Deployment finished.
[03:11:11] I still don't know what I changed to fix the proper deployment target
[03:11:15] or how the deployment master became a target
[03:12:33] Yippee, build fixed!
[03:12:34] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #950: FIXED in 22 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/950/
[03:13:32] yeah :\ the second piece may be important for a puppet patch: that is, if the deployment_target grain is added, then the trebuchet provider is called, prior to the deployment_server grain being added.
[03:21:39] PROBLEM - Host cache-rsync is DOWN: CRITICAL - Host Unreachable (10.68.23.165)
[03:39:01] Release-Engineering-Team, Phabricator, Project-Admins, WorkType-Maintenance: Create a Policy Admins project and move all of the acl*various_policy_admins pojects under it as subprojects. - https://phabricator.wikimedia.org/T129515#2229754 (mmodell) Open>Resolved
[03:44:49] Scap3, scap, Patch-For-Review: scap::target shouldn't allow users to redefine the user's key - https://phabricator.wikimedia.org/T132747#2229757 (mmodell) p:Triage>High
[04:41:22] Beta-Cluster-Infrastructure, Puppet, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2229784 (Krenair)
[04:41:53] Beta-Cluster-Infrastructure, Scap3, Revision-Scoring-As-A-Service, Puppet: deployment-((sca|aqs)01|ores-web) puppet failures due to scap3 errors - https://phabricator.wikimedia.org/T132267#2229786 (Krenair)
[04:41:58] Beta-Cluster-Infrastructure, Puppet: deployment-cache-parsoid05 puppet failures due to removal of role::cache::parsoid - https://phabricator.wikimedia.org/T132260#2229787 (Krenair)
[04:42:40] Beta-Cluster-Infrastructure, Analytics: deployment-fluorine puppet failure due to '/usr/sbin/usermod -u 10003 datasets' returned 4: usermod: UID '10003' already exists - https://phabricator.wikimedia.org/T117028#2229789 (Krenair)
[04:46:48] Beta-Cluster-Infrastructure, Labs, Patch-For-Review, Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2229791 (Krenair) a:mmodell (I went and found the code in puppet...
[04:49:30] Beta-Cluster-Infrastructure, Labs, Patch-For-Review, Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2229793 (mmodell) I think I found the race condition: The order of o...
[04:50:14] Beta-Cluster-Infrastructure, Labs, Patch-For-Review, Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2229794 (mmodell) I'm gonna cherry pick the patch on beta. We'll see...
[04:52:30] Beta-Cluster-Infrastructure, Labs, Patch-For-Review, Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2229795 (mmodell) Actually, come to think of it, I'm not sure if it's...
[05:00:05] releng-201516-q3, scap, Scap3 (Scap3-MediaWiki-MVP), WorkType-NewFunctionality: [keyresult] Migrate the MW weekly train deploy to scap3 - https://phabricator.wikimedia.org/T114313#2229806 (mmodell) a:mmodell>None
[05:02:14] Beta-Cluster-Infrastructure, Labs, Patch-For-Review, Puppet: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2206880 (yuvipanda) See also T120159
[05:26:16] Yippee, build fixed!
[05:26:17] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #765: FIXED in 24 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/765/
[05:39:27] "23:26:17 Yippee, build fixed!" That bot is the most enthusiastic one I've seen in a while.
[06:35:18] PROBLEM - Puppet run on integration-slave-trusty-1024 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:42:52] PROBLEM - Puppet run on integration-slave-trusty-1015 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:54:35] PROBLEM - Puppet run on integration-slave-trusty-1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[07:07:58] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50704 bytes in 0.517 second response time
[07:11:16] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 50705 bytes in 1.314 second response time
[07:15:23] RECOVERY - Puppet run on integration-slave-trusty-1024 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:22:51] RECOVERY - Puppet run on integration-slave-trusty-1015 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:34:36] RECOVERY - Puppet run on integration-slave-trusty-1004 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:02:17] (PS6) Legoktm: Update squizlabs/php_codesniffer to 2.6.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/281816 (owner: Paladox)
[08:05:15] (PS7) Legoktm: Update squizlabs/php_codesniffer to 2.6.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/281816 (owner: Paladox)
[08:09:22] (PS1) Legoktm: Also run 'mw-tools-codesniffer-mwcore-testrun' job under HHVM [integration/config] - https://gerrit.wikimedia.org/r/284860
[08:09:33] (PS2) Legoktm: Also run 'mw-tools-codesniffer-mwcore-testrun' job under HHVM [integration/config] - https://gerrit.wikimedia.org/r/284860
[08:09:35] (CR) Mobrovac: "Alternatively, we could use the trick from jobs for deploy repos: they have a SRC env var that tells the job where to go to execute test, " [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[08:12:00] (CR) Legoktm: [C: 2] Also run 'mw-tools-codesniffer-mwcore-testrun' job under HHVM [integration/config] - https://gerrit.wikimedia.org/r/284860 (owner: Legoktm)
[08:12:47] (Merged) jenkins-bot: Also run 'mw-tools-codesniffer-mwcore-testrun' job under HHVM [integration/config] - https://gerrit.wikimedia.org/r/284860 (owner: Legoktm)
[08:13:07] !log deploying https://gerrit.wikimedia.org/r/284860
[08:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[08:14:21] (PS1) Legoktm: Fix comment in fabfile.py [integration/config] - https://gerrit.wikimedia.org/r/284861
[08:17:00] Continuous-Integration-Infrastructure, Technical-Debt, Tracking: All repositories should pass jshint test (tracking) - https://phabricator.wikimedia.org/T62619#2229938 (hashar)
[08:18:34] (CR) Legoktm: [C: 2] Update squizlabs/php_codesniffer to 2.6.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/281816 (owner: Paladox)
[08:19:11] (Merged) jenkins-bot: Update squizlabs/php_codesniffer to 2.6.0 [tools/codesniffer] - https://gerrit.wikimedia.org/r/281816 (owner: Paladox)
[08:20:38] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #929: FAILURE in 38 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/929/
[08:21:15] Continuous-Integration-Infrastructure, Technical-Debt, Tracking: All repositories should pass jshint test (tracking) - https://phabricator.wikimedia.org/T62619#2229939 (hashar) Open>Resolved a:Paladox The reason I have filled all those tasks was to get rid of the myriad of non voting jsli...
[08:27:09] Continuous-Integration-Infrastructure, Operations: Investigate usage of ttf-ubuntu-font-family which is not available on Jessie - https://phabricator.wikimedia.org/T103325#2229946 (hashar) On `integration-slave-jessie1001` (which has the puppet class `mediawiki::packages`) puppet is happy: ``` Notice: /S...
[08:28:46] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #948: FAILURE in 18 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/948/
[08:39:26] Continuous-Integration-Infrastructure, Technical-Debt, Tracking: All repositories should pass jshint test (tracking) - https://phabricator.wikimedia.org/T62619#2229947 (Legoktm) Major kudos Paladox!
[08:46:33] Continuous-Integration-Config, Fundraising-Backlog, FR-ActiveMQ, FR-Smashpig, and 2 others: SmashPig CI should run phpunit - https://phabricator.wikimedia.org/T133248#2226085 (hashar) It is magic! :-}
[08:47:48] (CR) Hashar: [C: 2] Fix comment in fabfile.py [integration/config] - https://gerrit.wikimedia.org/r/284861 (owner: Legoktm)
[08:48:36] (Merged) jenkins-bot: Fix comment in fabfile.py [integration/config] - https://gerrit.wikimedia.org/r/284861 (owner: Legoktm)
[09:36:38] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #788: FAILURE in 15 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/788/
[09:38:01] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #842: FAILURE in 1 min 1 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/842/
[09:52:06] Continuous-Integration-Infrastructure, Technical-Debt, Tracking: All repositories should pass jshint test (tracking) - https://phabricator.wikimedia.org/T62619#2230086 (Paladox) Your welcome.
[09:56:11] (CR) Paladox: "Thanks." [tools/codesniffer] - https://gerrit.wikimedia.org/r/281816 (owner: Paladox)
[11:32:44] (PS1) Zfilipin: WIP make JJB work for VisualEditor [integration/config] - https://gerrit.wikimedia.org/r/284883
[11:58:39] Project browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #460: FAILURE in 1 min 38 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/460/
[12:30:34] (CR) Mobrovac: [C: -1] "Nah, scratch that, let's not special-case this one repo. I vote for simply putting package.json in the root or symlink it there." [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[12:37:19] PROBLEM - Puppet run on integration-slave-trusty-1024 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:38:35] PROBLEM - Puppet run on deployment-ms-be02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[12:39:15] PROBLEM - Puppet run on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:06:34] Project browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #890: FAILURE in 33 sec: https://integration.wikimedia.org/ci/job/browsertests-PageTriage-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/890/
[13:13:36] RECOVERY - Puppet run on deployment-ms-be02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:17:23] RECOVERY - Puppet run on integration-slave-trusty-1024 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:20:37] (PS37) Zfilipin: WIP Simplify creating Jenkins jobs for running browser tests daily [integration/config] - https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190)
[14:02:30] Release-Engineering-Team, DBA: Missing / Dropped databases? - https://phabricator.wikimedia.org/T132838#2230390 (jcrespo) @Krenair @demon : these databases existing, despite being "deleted" created multiple long-to fix replication issues on dbstore (analytics/backups) and labs. The main conceptual proble...
[14:13:17] Krenair: Currently here? Beta cluster shows white pages again
[14:27:23] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #861: FAILURE in 1 min 22 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/861/
[14:33:45] Luke081515, fixing github/gerrit stuff first
[14:33:49] beta is next
[14:34:03] ok, thx
[14:36:04] hmm...made a task for beta, dunno what's causing it: https://phabricator.wikimedia.org/T133391
[14:36:13] ^ that's the error in the logs though.
[14:41:29] $this->vectorConfig = ConfigFactory::getDefaultInstance()->makeConfig( 'vector' );
[14:42:49] (PS1) Hashar: dib: glue for Ubuntu Trusty imaging [integration/config] - https://gerrit.wikimedia.org/r/284900 (https://phabricator.wikimedia.org/T133203)
[14:42:54] thcipriani|afk, 02 works, 01 and 03 are showing that
[14:44:18] (CR) Hashar: [C: -2] "Have to try building an image and see whether it provision and boot properly." [integration/config] - https://gerrit.wikimedia.org/r/284900 (https://phabricator.wikimedia.org/T133203) (owner: Hashar)
[14:44:59] Project beta-scap-eqiad build #99412: FAILURE in 0.23 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/99412/
[14:53:52] huh, back.
[14:54:07] hhvm restart on 02 and 03 got it back up
[14:54:57] I was changing things at the same time
[14:55:56] Yippee, build fixed!
[14:55:57] Project beta-scap-eqiad build #99413: FIXED in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/99413/
[14:56:15] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 40324 bytes in 0.696 second response time
[15:02:59] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 40304 bytes in 0.828 second response time
[15:07:03] ty for the beta hhvm restart
[15:16:06] RECOVERY - Puppet run on integration-slave-trusty-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:25:36] Release-Engineering-Team, DBA: Missing / Dropped databases? - https://phabricator.wikimedia.org/T132838#2230587 (demon) >>! In T132838#2211939, @Krenair wrote: > I don't know what `steward` is... > Maybe it was a mistake during the creation of `stewardwiki`? That'd be my guess. >>! In T132838#2230390,...
[15:31:27] Release-Engineering-Team, DBA: Missing / Dropped databases? - https://phabricator.wikimedia.org/T132838#2230626 (jcrespo) +1. I need to check a general solution for "long term storage/archive".
[15:34:09] (PS38) Zfilipin: WIP Simplify creating Jenkins jobs for running browser tests daily [integration/config] - https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190)
[15:41:29] (PS39) Zfilipin: WIP Simplify creating Jenkins jobs for running browser tests daily [integration/config] - https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190)
[16:08:25] Browser-Tests-Infrastructure, Patch-For-Review, User-zeljkofilipin: Simplify creating Jenkins jobs for running browser tests daily - https://phabricator.wikimedia.org/T128190#2230751 (zeljkofilipin)
[16:23:17] (CR) Ottomata: "Ah ok, if there are technical reasons to put it in root, I'm ok with it. My reasons for not putting it there are purely aesthetic :)" [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[16:33:12] Browser-Tests-Infrastructure, Patch-For-Review, User-zeljkofilipin: Simplify creating Jenkins jobs for running browser tests daily - https://phabricator.wikimedia.org/T128190#2230811 (zeljkofilipin)
[16:33:47] (PS40) Zfilipin: WIP Simplify creating Jenkins jobs for running browser tests daily [integration/config] - https://gerrit.wikimedia.org/r/274136 (https://phabricator.wikimedia.org/T128190)
[16:41:56] legoktm: Looks like composer 1.0.2 has been released https://github.com/composer/composer/releases
[16:52:42] robla: I added a section to the RfC on the current state of code review tools. https://www.mediawiki.org/wiki/Requests_for_comment/Migrate_code_review_and_management_to_Phabricator_from_Gerrit#Alternatives
[17:05:54] * robla looks while he waits for his next meeting
[17:07:41] ostriches: I'm tempted to put a pros/cons section on Gerrit. I suspect that Ops may prefer Gerrit in some ways (because of tighter +2 controls)
[17:09:41] (PS2) Ppchelko: Set up CI for event-schemas repository. [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438)
[17:12:30] robla: Tighter controls on who can press +2, but no less tight on who can push at the end
[17:14:34] robla: Plus you could set rules for repos too, requiring a "+2/Approved" before landing from a specific group (iirc)
[17:14:43] paladox: let's get to 1.0.0 first
[17:14:58] "Must be reviewed by group" type thing
[17:15:11] Ok but 1.0.2 is stable and fixes a few things
[17:15:19] legoktm ^^
[17:15:25] Should we do that now?
[17:35:03] Release-Engineering-Team, Release: MW-1.27.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T131559#2170735 (Jdforrester-WMF) Is wmf.24 going to exist? From [[https://lists.wikimedia.org/pipermail/wikitech-l/2016-April/085334.html | Chad's announcement]] I got the impression that 1.27.0-w...
[17:37:48] ostriches: I can't believe you didn't mention Gareth as another Gerrit alternative! ;)
[17:46:18] (PS3) Legoktm: Set up CI for event-schemas repository. [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[17:46:31] (PS4) Legoktm: Set up CI for event-schemas repository. [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[17:52:01] since greg-g probably isn't around, any objections to me doing T131844 today? it passed security review
[17:52:01] T131844: Set up UploadsLink extension on the beta cluster - https://phabricator.wikimedia.org/T131844
[18:03:07] ostriches: if I go on to #wikimedia-operations, ping a few key people, and say the RFC says Gerrit this about it: "Nobody loves it, people just use it", will those people agree that "of course we should move to Differential!"
[18:04:45] robla: But it is limiting to people who use other platforms such as Windows, since Windows is not Linux and Windows support for arc is new, so there are a lot of bugs.
[18:05:07] I've created this installer https://github.com/paladox/Arcanist-installer-for-windows that should make it easy
[18:06:54] robla: Probably not, but if objections are based on fear of access to repos they're unfounded.
[18:08:25] accusing other people of fear and having unfounded beliefs isn't a good way of building trust. I wouldn't be surprised if ArchCom rejects the RFC based on that.
[18:09:09] the RFC seems incredibly biased in tone
[18:09:14] I'm not saying all objections are unfounded.
[18:09:28] I'm saying if someone is having concerns about access to repositories that can be solved via education and documentation.
[18:09:51] *one very specific concern that you had raised about review just now*
[18:12:29] (CR) Legoktm: [C: 2] Set up CI for event-schemas repository. [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[18:13:28] (Merged) jenkins-bot: Set up CI for event-schemas repository. [integration/config] - https://gerrit.wikimedia.org/r/284841 (https://phabricator.wikimedia.org/T124438) (owner: Ppchelko)
[18:13:49] !log deploying https://gerrit.wikimedia.org/r/284841
[18:13:52] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[18:17:41] Beta-Cluster-Infrastructure, TemplateStyles: Deploy TemplateStyles to the beta-cluster - https://phabricator.wikimedia.org/T133414#2231195 (Luke081515)
[18:19:18] Beta-Cluster-Infrastructure, TemplateStyles: Deploy TemplateStyles to the beta-cluster - https://phabricator.wikimedia.org/T133414#2231195 (Luke081515)
[18:28:36] Beta-Cluster-Infrastructure, TemplateStyles, Wikimedia-Extension-setup: Deploy TemplateStyles to the beta-cluster - https://phabricator.wikimedia.org/T133414#2231195 (Luke081515)
[18:30:51] Beta-Cluster-Infrastructure, Wikimedia-Site-requests: On beta metawiki, a mix of the beta enwiki and the production metawiki logos show - https://phabricator.wikimedia.org/T125942#2001079 (Luke081515) You mean a metawiki logo with a beta "note"?
[18:46:09] PROBLEM - Host integration-dev is DOWN: CRITICAL - Host Unreachable (10.68.17.81)
[18:49:59] Beta-Cluster-Infrastructure, Wikimedia-Site-requests: On beta metawiki, a mix of the beta enwiki and the production metawiki logos show - https://phabricator.wikimedia.org/T125942#2231344 (Krenair) yes
[19:53:13] MediaWiki-Codesniffer, Google-Summer-of-Code-2016: [GSoC 2016 Proposal] Improving static analysis tools for MediaWiki - https://phabricator.wikimedia.org/T130574#2231471 (Sumit) Congratulations @Lethexie for getting selected for this project in GSoC 2016! Wish you a good luck with it. You can start discu...
[20:25:06] (CR) Aaron Schulz: [C: 1] Enable tests for kafka-watcher [integration/config] - https://gerrit.wikimedia.org/r/284746 (owner: Smalyshev)
[21:45:49] Hi, can somebody tell me where I can find the code for a jenkins job which builds a new wiki and runs tests afterwards?
[23:01:18] Is there a reason gerrit keeps logging me out today? I'm logging in for the 3rd time...
[23:13:52] csteipp: I'm still logged in, since 10 hours
[23:41:44] csteipp: it has logged me out multiple times as well