[00:20:00] Continuous-Integration-Config, Fundraising-Backlog, Patch-For-Review, WorkType-Maintenance: Tests on deployment branches of wikimedia/fundraising/crm falling causing to force merge (and deadlock of Zuul) - https://phabricator.wikimedia.org/T117062#1809607 (DStrine)
[00:23:37] Deployment-Systems, Release-Engineering-Team, Scap3: scap creating directories owned by root on mira - https://phabricator.wikimedia.org/T118691#1809619 (Krinkle) When syncing a file: ``` 00:21:21 Started sync-masters 00:21:29 ['/srv/deployment/scap/scap/bin/sync-master', 'tin.eqiad.wmnet'] on mira.co...
[01:12:03] (PS1) Tim Starling: Add .gitreview [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253519
[01:12:05] (PS1) Tim Starling: Optimisations [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253520
[01:13:18] (CR) Tim Starling: [C: 2] Add .gitreview [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253519 (owner: Tim Starling)
[01:15:39] (CR) Tim Starling: [V: 2] Add .gitreview [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253519 (owner: Tim Starling)
[01:15:53] (CR) Tim Starling: [C: 2 V: 2] Optimisations [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253520 (owner: Tim Starling)
[02:06:54] Continuous-Integration-Infrastructure, MediaWiki-Unit-tests: MediaWiki PHPUnit tests skips HtmlFormatterTest because "Tidy extension not installed" - https://phabricator.wikimedia.org/T118814#1809962 (Krinkle) NEW
[02:08:10] Continuous-Integration-Infrastructure, MediaWiki-Unit-tests: MediaWiki PHPUnit tests skips TidyTest because "Tidy not found" - https://phabricator.wikimedia.org/T118814#1809969 (Krinkle)
[02:29:24] PROBLEM - Host integration-labsvagrant is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[03:04:58] Project beta-scap-eqiad build #78855: FAILURE in 10 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/78855/
[03:27:17] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #884: FAILURE in 45 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/884/
[06:27:21] greg-g: twentyafterfour: ostriches: Almost 100 "Production impact" issues in Wikimedia-log-errors. May be time for a sprint or some other kind of highlight.
[06:38:22] RECOVERY - Free space - all mounts on deployment-bastion is OK: OK: All targets OK
[06:40:13] Krinkle: I know.
[06:40:25] Half of them are probably fixed/no longer issues
[06:40:31] The other half nobody owns.
[06:40:36] And the last half might get fixed :)
[09:47:40] Continuous-Integration-Scaling, operations: Upload new Zuul packages on apt.wikimedia.org for Precise / Trusty / Jessie - https://phabricator.wikimedia.org/T118340#1810354 (hashar) The packaging work is held in our Gerrit repo `integration/zuul.git` with the following branches: | `upstream` | 1cc37f7b469a...
[09:57:26] PROBLEM - Host deployment-cache-parsoid04 is DOWN: CRITICAL - Host Unreachable (10.68.19.197)
[10:02:55] Continuous-Integration-Config, Continuous-Integration-Infrastructure, MobileFrontend: MobileFrontend is failing mwext-mw-selenium test - https://phabricator.wikimedia.org/T118771#1810389 (hashar)
[10:27:43] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.54 ms
[11:02:01] Gerrit-Migration, Analytics-Tech-community-metrics: Make MetricsGrimoire/korma support gathering Code Review statistics from Phabricator's Differential - https://phabricator.wikimedia.org/T118753#1810527 (Aklapper)
[11:52:15] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[12:37:35] Continuous-Integration-Config, ArticlePlaceholder, Wikidata, Patch-For-Review, and 2 others: [Task] add CI to extension ArticlePlaceholder - https://phabricator.wikimedia.org/T113049#1810631 (Tobi_WMDE_SW)
[12:42:48] (CR) Hashar: [C: 2] Set up CI for eventlogging (python) repo [integration/config] - https://gerrit.wikimedia.org/r/253359 (https://phabricator.wikimedia.org/T118761) (owner: Ottomata)
[12:43:40] (Merged) jenkins-bot: Set up CI for eventlogging (python) repo [integration/config] - https://gerrit.wikimedia.org/r/253359 (https://phabricator.wikimedia.org/T118761) (owner: Ottomata)
[12:55:00] Yippee, build fixed!
[12:55:01] Project browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #662: FIXED in 59 sec: https://integration.wikimedia.org/ci/job/browsertests-GettingStarted-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/662/
[13:17:08] hashar: new date works for me
[13:18:04] jzerebecki: sorry for the late notification :(
[13:18:32] jzerebecki: andrew proposed that time slot to get a new zuul-merger instance deployed and there is no other good time slot this week
[13:27:28] (PS1) Hashar: Dependencies install notes for Mac/Homebrew [integration/uprightdiff] - https://gerrit.wikimedia.org/r/253598
[14:28:37] Yippee, build fixed!
[14:28:37] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #693: FIXED in 2 min 36 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/693/
[14:51:43] Continuous-Integration-Scaling, operations: Upload new Zuul packages on apt.wikimedia.org for Precise / Trusty / Jessie - https://phabricator.wikimedia.org/T118340#1810828 (Andrew) ok, good enough for me :)
[15:02:37] hashar: this’ll do
[15:03:00] :-}
[15:03:00] there is a lot of other ops stuff going on
[15:05:07] andrewbogott: so that is a two-step process
[15:05:16] first get zuul-merger installed / set up on scandium
[15:05:25] then once happy / completed
[15:05:45] add scandium to the iptables rule on gallium which prevents it from joining the pool
[15:06:34] ok, so installing zuul-merger is https://gerrit.wikimedia.org/r/#/c/252336/3 right?
[15:06:49] yes
[15:07:08] which will create the /etc related files to have the service running
[15:07:25] and should create a disk mount under /srv/ssd
[15:08:12] once happy
[15:08:19] we can enable the iptables rule ( https://gerrit.wikimedia.org/r/252337 )
[15:08:36] hm, "parent directory /srv/ssd/zuul does not exist"
[15:08:42] do you want to fix that in puppet or shall I?
[15:08:49] oh man
[15:08:55] I forgot to mount the ssd
[15:09:10] on gallium that is done via site.pp
[15:09:16] file { '/srv/ssd':
[15:09:16] mount { '/srv/ssd':
[15:09:29] * hashar copy paste
[15:10:26] hmm
[15:10:58] /dev/md2 139G 33M 139G 1% /srv
[15:11:05] andrewbogott: I don't know how that mount got realized
[15:11:07] maybe on setup
[15:11:36] most of our partman recipes put extra drives at /srv
[15:11:54] so, it’s no surprise. You can move it, or move your stuff to use /srv instead
[15:12:10] I would move it to /srv/ssd for consistency
[15:12:18] or agrrg
[15:12:25] yeah
[15:12:25] it is easier
[15:12:34] otherwise I will have to vary the mount point between gallium and scandium
[15:13:02] PROBLEM - puppet last run on scandium is CRITICAL: CRITICAL: Puppet has 1 failures
[15:13:57] andrewbogott: https://gerrit.wikimedia.org/r/253611
[15:14:00] copy-pasted from gallium
[15:14:30] wrong disk
[15:15:53] made it mount /dev/md2
[15:16:01] ready for me to merge that?
[15:20:28] yeah
[15:20:37] I think you still need to mkdir /srv/ssd/zuul someplace
[15:21:04] looks like the zuul puppet manifest is not magic
[15:21:46] file { $git_dir:
[15:21:46] ensure => directory,
[15:21:47] owner => 'zuul',
[15:21:47] }
[15:21:53] zuul::merger should create it
[15:23:29] oh I got it
[15:23:34] so
[15:23:43] the zuul::merger class is being passed `/srv/ssd/zuul/git`
[15:23:48] but only /srv/ssd exists :(
[15:23:57] and puppet doesn't mkdir -p
[15:24:09] yeah, puppet still lacks proper recursive mkdir I think
[15:25:06] The role should probably create all the parent dirs before invoking the module
[15:25:06] should we just create it manually and call it an end? :D
[15:25:37] or I can exec {} mkdir -p
[15:25:46] let me look...
[15:27:53] andrewbogott: and I had the same issue with nodepool actually
[15:28:55] https://gerrit.wikimedia.org/r/#/c/253616/1/modules/zuul/manifests/merger.pp
[15:29:39] I took it from the nodepool manifest https://github.com/wikimedia/operations-puppet/blob/production/modules/nodepool/manifests/init.pp#L135-L149
[15:31:27] hashar: won’t https://gerrit.wikimedia.org/r/#/c/253617/ do it?
[15:31:56] yup
[15:31:59] though a couple of lines below
[15:32:00] 'git_dir' => '/srv/ssd/zuul/git',
[15:32:05] that is a configurable dir
[15:32:13] so in our specific case that is going to work
[15:32:25] but if we ever change the git_dir that can cause some trouble
[15:32:26] oh, I see...
[15:32:44] yeah, yours is better :)
[15:33:07] must have been suggested to me by filipo when reviewing the nodepool manifest
[15:33:14] I deserve no credit :-}
[15:34:54] hashar: ok, puppet is happy now. Want to make sure that things are doing what you’d expect?
[15:35:08] checking
[15:35:22] RECOVERY - puppet last run on scandium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[15:35:25] the zuul-merger did not start
[15:35:34] but maybe it lacks ensure => started
[15:35:36] or something similar
[15:37:10] PROBLEM - zuul_merger_service_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[15:37:25] started it manually
[15:40:27] hashar: let me know when you want me to merge that last patch
[15:40:31] so
[15:40:34] I got it running
[15:40:44] although maybe we should puppetize the running state first?
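A minimal sketch of the two fixes worked out above, mounting the md2 array at /srv/ssd and pre-creating the parents of the configurable git_dir. The filesystem type and mount options are assumptions (the log confirms the device but not the fs), and the actual changes live in the Gerrit patches linked above:

```puppet
# Sketch only, not the merged patches. Puppet's file resource will not
# mkdir -p missing parents, so each level is declared explicitly.
file { '/srv/ssd':
    ensure => directory,
}

mount { '/srv/ssd':
    ensure  => mounted,
    device  => '/dev/md2',
    fstype  => 'ext4',              # assumption: fs type not stated in the log
    options => 'defaults,noatime',  # assumption
    require => File['/srv/ssd'],
}

# The other option mentioned above is an exec resource:
# exec { 'mkdir -p /srv/ssd/zuul/git': creates => '/srv/ssd/zuul/git' }
file { ['/srv/ssd/zuul', '/srv/ssd/zuul/git']:
    ensure  => directory,
    owner   => 'zuul',
    require => Mount['/srv/ssd'],
}
```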
[15:41:03] running puppet
[15:41:06] to make sure it starts the service
[15:41:25] there might be some weird interaction between systemd and the .pid file
[15:42:00] and puppet just refuses to start it
[15:42:03] :(-
[15:43:00] RECOVERY - zuul_merger_service_running on scandium is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[15:43:17] service { 'zuul-merger':
[15:43:18] name => 'zuul-merger',
[15:43:19] enable => true,
[15:43:19] hasrestart => true,
[15:44:11] I see in the debug
[15:44:12] Debug: Executing '/usr/sbin/service zuul-merger status'
[15:44:13] Debug: Executing '/bin/systemctl show -pSourcePath zuul-merger'
[15:45:53] 5$ that puppet doesn't manage systemd properly :/
[15:46:04] it works in lots of other cases
[15:46:13] could be that the systemd setup in the package isn’t quite right...
[15:46:55] and puppet doesn't start the git-daemon either
[15:48:03] the package for Jessie doesn't have systemd
[15:48:42] PROBLEM - zuul_merger_service_running on scandium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[15:48:53] ok — so sounds like either we need to add proper systemd scripts to the package, or add them externally with puppet
[15:49:15] yeah
[15:49:25] or figure out what is happening with puppet
[15:49:29] (CR) Alexandros Kosiaris: [C: 1] "LGTM, but I am not the best one around to judge that" [integration/config] - https://gerrit.wikimedia.org/r/252716 (https://phabricator.wikimedia.org/T110019) (owner: Zfilipin)
[15:49:30] service { 'git-daemon':
[15:49:30] ensure => running,
[15:49:31] enable => true,
[15:49:31] hasrestart => true,
[15:49:35] doesn't start it either :-(
[15:51:27] If there’s no systemd script then I wouldn’t expect anything to work. It’s not puppet’s fault, is it?
[15:54:18] I think it auto-detects the provider
[15:54:18] so on Jessie it assumes everything is systemd
[15:54:18] but then
[15:54:18] it executes /usr/sbin/service zuul-merger status
[15:54:18] and probably ends up being confused while trying to parse the output
[15:54:19] so
[15:54:19] I guess it is fail
[15:54:19] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:54:19] till I figure out what is happening on the puppet side
[15:54:19] if you look in modules/openstack/manifests/designate/service.pp
[15:54:19] you can see me hacking around a similar problem, where a .deb doesn’t have systemd scripts
[15:54:19] look for 'These would be automatically included in a correct designate package'
[15:54:19] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:54:53] goood morninnniiing
[15:54:57] and you use base::service_unit
[15:55:12] ostriches: do you use scap3 in beta labs (deployment-prep)?
[15:55:53] hashar: fixing the packages so they work properly on Debian would be better, but I don’t really know how to do that :/
[15:56:06] i think I had a patch floating around
[15:56:25] but that means diverging the one for Jessie from the ones for Precise/Trusty
[15:56:25] but yeah
[15:56:27] will have to do that
[15:56:34] though, git-daemon doesn't work either
[15:57:54] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 38934 bytes in 0.614 second response time
[15:57:55] andrewbogott: so I am calling it an end. Let's not pool it
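A hedged sketch of the designate-style workaround andrewbogott points at above (modules/openstack/manifests/designate/service.pp, or base::service_unit) for a .deb that ships no systemd unit on Jessie: Puppet installs the unit itself, reloads systemd, then manages the service. The unit file path and source are hypothetical, not the actual operations/puppet code:

```puppet
# Assumption: the package installs /usr/bin/zuul-merger but no unit file,
# so we ship one from the module (hypothetical source path).
file { '/etc/systemd/system/zuul-merger.service':
    ensure => present,
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
    source => 'puppet:///modules/zuul/zuul-merger.service',  # hypothetical
    notify => Exec['zuul-merger systemd reload'],
}

exec { 'zuul-merger systemd reload':
    command     => '/bin/systemctl daemon-reload',
    refreshonly => true,
}

service { 'zuul-merger':
    ensure  => running,  # the behaviour being chased at this point in the log
    enable  => true,
    require => File['/etc/systemd/system/zuul-merger.service'],
}
```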
[15:58:10] ok :(
[15:58:25] will try to reproduce on labs
[15:58:33] and figure out what the heck is happening in puppet :D
[15:58:38] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39292 bytes in 0.747 second response time
[15:58:40] zuul is workable
[15:58:44] but the git-daemon is not
[16:02:08] Continuous-Integration-Infrastructure, Patch-For-Review, Zuul: Zuul-cloner should use hard links when fetching from cache-dir - https://phabricator.wikimedia.org/T97106#1811009 (hashar)
[16:02:09] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1811010 (hashar)
[16:02:11] Continuous-Integration-Scaling, operations: Upload new Zuul packages on apt.wikimedia.org for Precise / Trusty / Jessie - https://phabricator.wikimedia.org/T118340#1811007 (hashar) Open>Resolved Andrew uploaded them all :-} Thank you!
[16:02:12] andrewbogott: I started it manually to clear the icinga alarm.
[16:02:21] thanks
[16:02:25] will file a bunch of follow-up tasks
[16:02:33] and I guess we will want to reschedule something next week :-(
[16:02:51] RECOVERY - zuul_merger_service_running on scandium is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger
[16:08:13] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:13] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:08:14] andrewbogott: one last thing, do we set up the Icinga monitoring probe in the modules or on the role?
[16:08:25] hashar: modules, usually
[16:08:40] oh
[16:08:42] modules/zuul/manifests/monitoring/merger.pp !!
[16:12:36] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.71 ms
[16:26:29] Continuous-Integration-Scaling, operations, Puppet: On Jessie, puppet does not start zuul-merger via init scripts - https://phabricator.wikimedia.org/T118861#1811101 (hashar) NEW a: hashar
[16:27:02] andrewbogott: can I grab your hand to facepalm self?
[16:27:14] having zuul-merger not be run by puppet is intended
[16:27:25] just remembered about it when I proposed the task
[16:27:36] the idea is to be able to manually stop it without having puppet interfere
[16:28:56] was confused by enable => true
[16:29:00] which is really "start at boot"
[16:29:00] hashar: does that mean we’re done?
[16:29:11] and it lacks ensure => running,
[16:29:56] Continuous-Integration-Scaling, operations, Patch-For-Review: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1811113 (hashar)
[16:29:58] Continuous-Integration-Scaling, operations, Puppet: On Jessie, puppet does not start zuul-merger via init scripts - https://phabricator.wikimedia.org/T118861#1811111 (hashar) Open>Resolved zuul-merger does not have `ensure => running,` so we can stop it manually without having puppet to start...
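The enable/ensure distinction the resolved task hinges on, spelled out; a sketch consistent with the snippets quoted earlier in the log rather than the exact manifest:

```puppet
# 'enable' and 'ensure' are independent in a Puppet service resource.
service { 'zuul-merger':
    enable     => true,   # only registers the service to start at boot
    hasrestart => true,
    # ensure => running,  # deliberately omitted (T118861): without it,
    #                     # Puppet neither starts nor restarts the daemon,
    #                     # so an op can stop zuul-merger by hand and
    #                     # Puppet will not bring it straight back.
}
```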
[16:30:00] oh yeah :) So puppet was doing what we told it to do
[16:30:00] andrewbogott: yeah pretty much
[16:30:09] I will get the iptables rule added tomorrow with European ops
[16:30:12] yeah
[16:30:13] as usual
[16:30:32] the problem was between the keyboard / chair and the poor semantics used by puppet (enable vs ensure)
[16:30:40] ok — ping me tomorrow if things aren’t done by the time I’m awake
[16:30:59] + side
[16:31:07] the git-daemon will now be monitored
[16:31:53] and the parent directory of /srv/ssd/zuul/git is now created
[16:32:11] want me to merge https://gerrit.wikimedia.org/r/#/c/253622/ ?
[16:32:47] andrewbogott: yeah I think it is fine
[16:32:55] merely copy-pasted
[16:33:01] I can get the iptables rule lifted tomorrow since I got access to ferm rules on gallium
[16:33:28] and test everything works fine. If so the iptables patch can just be merged
[16:33:30] \O/
[16:34:25] thank you very much andrewbogott!
[16:35:24] Browser-Tests, Continuous-Integration-Config, Wikidata, Wikidata-Sprint-2015-11-03: create a Wikibase browser test job running against a fresh MediaWiki installation - https://phabricator.wikimedia.org/T118284#1811138 (JanZerebecki) Patch in wikibase that adds an initial browsertest: https://gerrit...
[16:36:15] Browser-Tests, Continuous-Integration-Config, Wikidata, Wikidata-Sprint-2015-11-03: create a Wikibase browser test job running against a fresh MediaWiki installation - https://phabricator.wikimedia.org/T118284#1811143 (JanZerebecki) Job: https://integration.wikimedia.org/ci/job/mwext-mw-selenium-co...
[16:47:46] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[16:52:13] (PS1) JanZerebecki: Add set_ext_dependencies to mwext-mw-selenium-composer [integration/config] - https://gerrit.wikimedia.org/r/253636 (https://phabricator.wikimedia.org/T118284)
[16:52:16] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.89 ms
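Judging by the check_procs output in the Icinga messages above, the module-level probe hashar found (modules/zuul/manifests/monitoring/merger.pp) plausibly looks like the sketch below; the exact parameters of the nrpe::monitor_service define are assumptions:

```puppet
# Hedged sketch of a module-level Icinga probe matching the
# "zuul_merger_service_running" check seen above; parameters assumed.
class zuul::monitoring::merger {
    nrpe::monitor_service { 'zuul_merger_service_running':
        description  => 'zuul_merger_service_running',
        nrpe_command => "/usr/lib/nagios/plugins/check_procs -c 1: --ereg-argument-array '^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger'",
    }
}
```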
[16:55:18] (CR) JanZerebecki: "That job is not whitelisted in test_zuul_layout.py for the check pipeline." [integration/config] - https://gerrit.wikimedia.org/r/253343 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[17:01:33] (CR) Dduvall: [C: 2] Add set_ext_dependencies to mwext-mw-selenium-composer [integration/config] - https://gerrit.wikimedia.org/r/253636 (https://phabricator.wikimedia.org/T118284) (owner: JanZerebecki)
[17:02:46] (Merged) jenkins-bot: Add set_ext_dependencies to mwext-mw-selenium-composer [integration/config] - https://gerrit.wikimedia.org/r/253636 (https://phabricator.wikimedia.org/T118284) (owner: JanZerebecki)
[17:12:55] (CR) JanZerebecki: [C: 2] Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[17:13:03] (PS5) JanZerebecki: Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[17:13:13] (CR) JanZerebecki: [C: 2] Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[17:14:06] !log Reloading Zuul to deploy I902e9dace28a6e5f42a71f90c86891cfb645b232
[17:14:39] (Merged) jenkins-bot: Run Ruby jobs using Rake [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[17:18:29] (CR) JanZerebecki: [C: -1] "Would make CI for that repo fail: https://gerrit.wikimedia.org/r/#/c/253637/1" [integration/config] - https://gerrit.wikimedia.org/r/252716 (https://phabricator.wikimedia.org/T110019) (owner: Zfilipin)
[17:19:07] (CR) JanZerebecki: [C: -1] "Would make the repo fail CI: https://gerrit.wikimedia.org/r/#/c/253637/1" [integration/config] - https://gerrit.wikimedia.org/r/252689 (https://phabricator.wikimedia.org/T110019) (owner: Zfilipin)
[17:24:33] PROBLEM - Puppet failure on deployment-eventlogging03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:27:27] (PS5) JanZerebecki: Code review by whitelisted users should triggers tests [integration/config] - https://gerrit.wikimedia.org/r/184886 (https://phabricator.wikimedia.org/T64429) (owner: Hashar)
[17:29:23] (CR) JanZerebecki: [C: 2] Code review by whitelisted users should triggers tests [integration/config] - https://gerrit.wikimedia.org/r/184886 (https://phabricator.wikimedia.org/T64429) (owner: Hashar)
[17:30:34] (CR) Legoktm: [C: 1] "Yay! Please announce this somewhere, as it is a pretty drastic behavior change (CR +1 is not useless anymore)" [integration/config] - https://gerrit.wikimedia.org/r/184886 (https://phabricator.wikimedia.org/T64429) (owner: Hashar)
[17:30:40] (Merged) jenkins-bot: Code review by whitelisted users should triggers tests [integration/config] - https://gerrit.wikimedia.org/r/184886 (https://phabricator.wikimedia.org/T64429) (owner: Hashar)
[17:31:48] (CR) JanZerebecki: [V: -1] "Waiting to get dependent patches merged." [integration/config] - https://gerrit.wikimedia.org/r/248663 (owner: Gergő Tisza)
[17:37:44] !log reload zuul for 339b575..a2e0173
[17:42:43] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[17:45:37] (CR) JanZerebecki: "Two of the changed repos are now failing." [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[17:45:47] zeljkof: here? ^^
[17:46:06] *sigh* probably means I need to revert.
[17:49:30] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL: CRITICAL: deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<33.33%)
[18:21:19] (CR) JanZerebecki: "Tested on https://gerrit.wikimedia.org/r/#/c/251924/ . Mail sent to wikitech-l." [integration/config] - https://gerrit.wikimedia.org/r/184886 (https://phabricator.wikimedia.org/T64429) (owner: Hashar)
[18:22:05] (PS1) JanZerebecki: Revert "Run Ruby jobs using Rake" [integration/config] - https://gerrit.wikimedia.org/r/253649
[18:23:04] (PS2) JanZerebecki: Revert "Run Ruby jobs using Rake" [integration/config] - https://gerrit.wikimedia.org/r/253649
[18:23:12] (CR) JanZerebecki: [C: 2] Revert "Run Ruby jobs using Rake" [integration/config] - https://gerrit.wikimedia.org/r/253649 (owner: JanZerebecki)
[18:33:41] (Merged) jenkins-bot: Revert "Run Ruby jobs using Rake" [integration/config] - https://gerrit.wikimedia.org/r/253649 (owner: JanZerebecki)
[18:36:32] !log reloading zuul for a2e0173..9f35c8d
[18:38:30] Continuous-Integration-Infrastructure, Patch-For-Review: Zuul: run 'test' jobs on jenkins when trusted user votes +1 and only 'check' jobs was ran - https://phabricator.wikimedia.org/T64429#1811676 (JanZerebecki) Open>Resolved
[18:40:39] (CR) JanZerebecki: "Please reupload. We can try again when the repos under test are changed so that they will pass the job." [integration/config] - https://gerrit.wikimedia.org/r/252690 (https://phabricator.wikimedia.org/T114860) (owner: Zfilipin)
[18:44:16] hm, i'm having trouble setting up a new trebuchet deploy target
[18:44:22] things seem to work... but nothing happens
[18:44:24] bd808: ?
[18:44:36] i've done this
[18:44:36] https://gerrit.wikimedia.org/r/#/c/253637/
[18:44:44] i'm testing in both beta labs and in prod
[18:44:51] in prod, i see the new pillars get added on palladium
[18:45:07] then i run puppet on tin, but /srv/deployment/eventlogging/eventlogging never shows up
[18:46:57] ottomata: I can try to take a look in a few minutes
[18:47:05] k
[18:47:05] thanks
[18:47:49] hoping it's not due to case insensitivity
[18:47:53] maybe i'll try removing the old target
[18:49:10] don't really want to, as i'm not ready to force prod deploys from this new repo yet...
[19:05:21] Browser-Tests, Continuous-Integration-Config, Wikidata, Wikidata-Sprint-2015-11-17: [Task] Move Wikidata browsertests into Wikibase repository - https://phabricator.wikimedia.org/T118727#1811830 (JanZerebecki)
[19:05:43] Browser-Tests, Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-11-17: create a Wikibase browser test job running against a fresh MediaWiki installation - https://phabricator.wikimedia.org/T118284#1811831 (JanZerebecki)
[19:06:06] Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-11-17: [Task] Add Wikidata to extension-gate in CI - https://phabricator.wikimedia.org/T96264#1811834 (JanZerebecki)
[19:08:11] Browser-Tests, Continuous-Integration-Config, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-11-17: create a Wikibase browser test job running against a fresh MediaWiki installation - https://phabricator.wikimedia.org/T118284#1811836 (JanZerebecki) a: JanZerebecki
[19:22:32] bd808: ping again, am a little lost atm. recommend another helper? :)
[19:22:45] ottomata: I think I figured it out
[19:22:52] oh!
[19:22:54] k...
[19:22:55] I believe you have to add your new repo in https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep
[19:22:56] is it case sensitive?
[19:23:04] oh, but in prod it's not moving either...
[19:23:15] no new repo checked out on tin
[19:23:25] will edit deployment prep and try
[19:24:01] ottomata: this gets to a place that needs root powers pretty quickly for debugging so I won't be much help outside of beta cluster
[19:24:21] when things are right your new repo should show up in /srv/pillars/deployment/repo_config.sls on the salt master
[19:24:30] yes, it is there
[19:24:37] on palladium
[19:24:55] the next thing to try then on tin would be `sudo salt-call deploy.deployment_server_init` and see if it gets a mention
[19:25:08] https://wikitech.wikimedia.org/wiki/Trebuchet#Repo_doesn.27t_exist_on_tin
[19:26:02] for beta cluster the new repo isn't listed on the salt master because it hasn't been added to the weird on-wiki hiera settings
[19:26:11] i just added it
[19:26:21] is puppetmaster salt master there?...
[19:26:24] Yippee, build fixed!
[19:26:25] Project beta-scap-eqiad build #78954: FIXED in 41 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/78954/
[19:26:33] no, -salt
[19:26:51] * bd808 forces puppet on deployment-salt
[19:27:02] ah ha!
[19:27:03] [ERROR ] Command '/usr/bin/git clone https://gerrit.wikimedia.org/r/eventlogging/.git /srv/deployment/eventlogging/eventlogging' failed with return code: 128
[19:27:06] thank you, now i'm onto something
[19:27:22] cool. permissions problem?
[19:27:45] [ERROR ] output: fatal: could not create work tree dir '/srv/deployment/eventlogging/eventlogging'.: Permission denied
[19:27:46] yeah
[19:28:06] drwxrwsr-x 3 sartoris wikidev 4096 Mar 16 2015 .
[19:28:09] vs trebuchet?
[19:28:34] should I just chown it?
[19:28:41] gonna try...
[19:28:42] yeah. sartoris was before the trebuchet rename
[19:29:13] i think that works.
[19:29:38] yes cool, and it is on deployment-bastion now too
[19:29:41] thank you bd808!
[19:29:50] ottomata: yw
[19:30:00] i shoulda just found that ref in the wiki myself, apologies for bugging, help much appreciated though!
[19:30:26] sure. I have that page pretty much memorized :/
[19:30:56] ha
[19:31:13] one more q bd808. does puppetmaster self not work on beta labs? i tried to apply it to a node so I can more easily test a puppet patch
[19:31:22] i guess i can cherry pick on puppetmaster...
[19:32:05] the beta cluster uses deployment-puppetmaster as the "self hosted" puppet
[19:32:24] right, but i should be able to override it for an individual node, no?
[19:32:28] so the way to test is by cherry picking your patches there (on top of the current stack)
[19:32:31] ok
[19:32:48] i'll just do that
[19:33:00] I think the hiera stuff we have set up makes overriding per-node hard
[19:33:26] ah, k
[19:33:42] yeah
[19:33:42] hiera > the configure instance page
[19:33:42] probably
[19:33:44] that is probably why
[19:34:13] yeah, I think we do it in the Hiera namespace on wikitech and that trumps all the other config locations
[19:50:22] hiera is first value winner-take-all for the most part, most specific value first
[20:04:35] RECOVERY - Puppet failure on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:07:01] (PS1) Gilles: Configure thumbor/exif-optimizer [integration/config] - https://gerrit.wikimedia.org/r/253668 (https://phabricator.wikimedia.org/T111722)
[20:25:30] PROBLEM - Puppet failure on deployment-eventlogging03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:57:21] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms
[22:03:47] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[22:10:38] (PS1) Krinkle: mediawiki/conf: Use wgDebugLogGroups['ratelimit'] instead of wgRateLimitLog [integration/jenkins] - https://gerrit.wikimedia.org/r/253762
[22:10:57] (CR) Krinkle: [C: 2] mediawiki/conf: Use wgDebugLogGroups['ratelimit'] instead of wgRateLimitLog [integration/jenkins] - https://gerrit.wikimedia.org/r/253762 (owner: Krinkle)
[22:13:43] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 1.08 ms
[22:19:45] (Merged) jenkins-bot: mediawiki/conf: Use wgDebugLogGroups['ratelimit'] instead of wgRateLimitLog [integration/jenkins] - https://gerrit.wikimedia.org/r/253762 (owner: Krinkle)
[22:37:05] PROBLEM - Host deployment-parsoidcache02 is DOWN: CRITICAL - Host Unreachable (10.68.16.145)
[23:42:18] RECOVERY - Host deployment-parsoidcache02 is UP: PING OK - Packet loss = 0%, RTA = 0.56 ms
[23:54:54] PROBLEM - Puppet failure on pmcache is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
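To make the hiera precedence bd808 describes at 19:50 concrete, a hedged sketch with a hypothetical key name; Hiera returns the value from the most specific hierarchy level that defines the key and ignores everything below it:

```puppet
# Hypothetical lookup. If Hiera:Deployment-prep on wikitech sets
# puppetmaster: deployment-puppetmaster, that level wins project-wide,
# which is why the per-node override attempted above never took effect.
$puppetmaster = hiera('puppetmaster', 'puppet.example.org')  # default only if no level defines it

notify { "using puppetmaster: ${puppetmaster}": }
```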