[00:40:28] Labs: Configure a database for services in labtest - https://phabricator.wikimedia.org/T120302#1938637 (Andrew) Open>Resolved a:Andrew We're using mysql on labtestcontrol2001. Databases already exist for: -nova -keystone -glance -designate
[00:40:29] Labs: [Tracking] Create labtest cluster - https://phabricator.wikimedia.org/T120293#1938640 (Andrew)
[00:43:11] Labs: Allocate vlan and IPs for labtest VMs - https://phabricator.wikimedia.org/T123817#1938645 (Andrew) NEW
[04:46:50] (CR) Legoktm: [C: -1] Redis delay to 2 seconds (1 comment) [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/263931 (https://phabricator.wikimedia.org/T112032) (owner: Samtar)
[04:48:05] (PS3) Legoktm: Bump redis delay to 2 seconds to avoid flooding [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/263931 (https://phabricator.wikimedia.org/T112032) (owner: Samtar)
[04:49:11] (CR) Legoktm: [C: 2] Bump redis delay to 2 seconds to avoid flooding [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/263931 (https://phabricator.wikimedia.org/T112032) (owner: Samtar)
[04:49:48] (Merged) jenkins-bot: Bump redis delay to 2 seconds to avoid flooding [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/263931 (https://phabricator.wikimedia.org/T112032) (owner: Samtar)
[04:50:10] Labs, Tool-Labs: Install libbytes-random-secure-perl on tool labs - https://phabricator.wikimedia.org/T123824#1938842 (Anomie) NEW
[04:50:40] !log tools.wikibugs legoktm: Deployed 9fde1aa8a06e27a8a08f5d2468cc5e00799dd43f Bump redis delay to 2 seconds to avoid flooding wb2-phab
[04:51:26] Wikibugs, Patch-For-Review: wikibugs - throttle output, don't get kicked for flooding - https://phabricator.wikimedia.org/T112032#1938850 (Legoktm) Let's try it...
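The change merged above bumps wikibugs2's "redis delay" to 2 seconds so the bot stops getting kicked for flooding. How wikibugs2 actually implements that delay is not shown in this log. The following is only a hypothetical Python sketch of the general idea: a sender that enforces a minimum interval between consecutive messages. `ThrottledSender` and its injected `send`, `clock`, and `sleep` callables are illustrative names, not wikibugs2 APIs.

```python
import time
from typing import Callable, Optional


class ThrottledSender:
    """Hypothetical sketch: forward messages to `send`, but never more than
    one every `delay` seconds, so a burst of bug updates is spread out
    instead of flooding the IRC channel."""

    def __init__(
        self,
        send: Callable[[str], None],
        delay: float = 2.0,  # 2 seconds, as in the merged wikibugs2 change
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.send = send
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self._last_sent: Optional[float] = None  # time of the previous send

    def __call__(self, message: str) -> None:
        if self._last_sent is not None:
            # Wait out whatever is left of the minimum interval.
            remaining = self._last_sent + self.delay - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        self.send(message)
        self._last_sent = self.clock()
```

Injecting the clock and sleep function keeps the pacing logic testable without a live IRC connection; in real use the defaults (`time.monotonic`, `time.sleep`) apply.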
[04:53:19] Wikibugs: wikibugs test bug - https://phabricator.wikimedia.org/T1152#1938854 (Legoktm) !
[05:41:12] PROBLEM - Puppet staleness on tools-mail-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[11:45:44] Labs, Tool-Labs: Prevent overly-large log files - https://phabricator.wikimedia.org/T122508#1938971 (Nemo_bis) > The problem with compressing those log files is that the jobs writing to them need to reliably reopen them after they are moved elsewhere Then don't move them? Can use `xz -k` and a separate c...
[12:03:55] PROBLEM - Puppet failure on tools-test2-for-backports-scfc is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:15:36] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939001 (doctaxon) Here the next problem (tools.taxonbot): ``` JSUB_OPTIONS=-once -j y -quiet -v LC_ALL=en_US.UTF-8 -mem 1g 0 0 * * * jsub -once -j y -quiet -v LC_ALL=en_US.UTF...
[12:43:54] RECOVERY - Puppet failure on tools-test2-for-backports-scfc is OK: OK: Less than 1.00% above the threshold [0.0]
[14:29:14] is tools-webgrid-lighttpd-1204 dead or half-dead? I'm unable to ssh in, my tools return 404 but some other tools on it seem only very slow or look like they work
[14:30:30] phe: yep, looks dead
[14:30:50] yeps, webservice restart timeout
[14:37:27] Labs, Tool-Labs: tools-webgrid-lighttpd-1204 locked up - https://phabricator.wikimedia.org/T123835#1939046 (valhallasw) NEW
[14:39:05] Labs, Tool-Labs: tools-webgrid-lighttpd-1204 locked up - https://phabricator.wikimedia.org/T123835#1939054 (valhallasw) Can't login from tools-bastion-01, nor with my root key. Rebooting it.
[14:41:34] phe: try again?
[14:42:54] running now, thanks
[14:44:52] Labs, Tool-Labs: tools-webgrid-lighttpd-1204 puppet tries to downgrade php-* packages - https://phabricator.wikimedia.org/T123836#1939061 (valhallasw)
[14:54:20] Labs, Tool-Labs: tools-webgrid-lighttpd-1204 locked up - https://phabricator.wikimedia.org/T123835#1939073 (valhallasw) From kern.log: ``` Jan 16 12:11:43 tools-webgrid-lighttpd-1204 kernel: [1500960.820584] INFO: task lighttpd:18320 blocked for more than 120 seconds. Jan 16 12:11:43 tools-webgrid-lighttp...
[17:18:42] (CR) Esanders: "Should have been tagged as T88747" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/188926 (https://phabricator.wikimedia.org/T88474) (owner: Merlijn van Deen)
[18:52:06] Labs, Patch-For-Review: Any puppet failure on a labs instance should send an email to project admins - https://phabricator.wikimedia.org/T121773#1939163 (Andrew) a:Andrew OK... in projects 'testlabs' and 'puppet' I can do this: echo "this is text" | mail -s 'subject line' andrewbogott@gmail.com and it...
[19:09:39] Labs: Rename labcontrol2001 to labtestweb2001 - https://phabricator.wikimedia.org/T123790#1939171 (Andrew)
[19:26:38] Labs, operations, ops-codfw, Patch-For-Review: Update tag and racktables for labcontrol2001: renamed to labtestweb2001 - https://phabricator.wikimedia.org/T123841#1939187 (Andrew) NEW
[19:50:36] kaldari: sorry, my suggestion 'send an email to root@tools' might not have been a very good one. It actually ended up in a filtered directory with lots of autogenerated emails for me :/.
[19:51:09] kaldari: In retrospect, I should have suggested the #tool-labs project in phabricator, and I think it also means we don't have a good way to contact roots for private issues (other than private tickets on phab)
[20:01:32] Labs, Patch-For-Review: Rename labcontrol2001 to labtestweb2001 - https://phabricator.wikimedia.org/T123790#1939225 (Andrew) This box is now renamed and reimaged; all that remains is racktables and tagging (as per subtask T123841)
[20:22:31] andrewbogott: would it be possible to make the instance console / vnc tab in Horizon available? It currently errors out with 'console is currently unavailable. Please try again later.', and I'm not sure how much engineering effort would be required to get it to work
[20:26:38] valhallasw`cloud: I haven’t thought about it much. It would be handy but makes me nervous security-wise
[20:27:31] mm, right, it's an extra possible attack vector
[20:30:34] Labs, Patch-For-Review: Any puppet failure on a labs instance should send an email to project admins - https://phabricator.wikimedia.org/T121773#1939240 (scfc) This works for me in an interactive session on `tools-bastion-01`. For grid jobs, @Anomie found out that this can fail under some circumstances (c...
[20:32:08] Labs, Patch-For-Review: Any puppet failure on a labs instance should send an email to project admins - https://phabricator.wikimedia.org/T121773#1939245 (Andrew) @scfc thank you! I will try that.
[20:35:54] andrewbogott: I don't see any mail to you in the exim log other than ones to root@tools
[20:37:04] andrewbogott: was that around the same time as your phab message? (18:50 UTC?)
[20:39:08] yeah, just a minute or two before
[20:39:26] I ran the same command on tools-puppet-is-broken-here-on-purpose.tools.eqiad.wmflabs and on util-abogott.testlabs.eqiad.wmflabs
[20:39:30] it worked in testlabs but not in tools
[20:40:25] 2016-01-16 06:50:41 1aJzIg-0002oE-Aa no IP address found for host polonium.wikimedia.org
[20:40:25] 2016-01-16 06:50:41 1aJzIg-0002oE-Aa == root@wmflabs.org R=smart_route defer (-1): lookup of host "polonium.wikimedia.org" failed in smart_route router
[20:40:48] why on earth is it trying to route via polonium O_o
[20:42:09] * andrewbogott out for now
[20:44:44] Labs, Patch-For-Review: Any puppet failure on a labs instance should send an email to project admins - https://phabricator.wikimedia.org/T121773#1939274 (valhallasw) The messages seem to be queued on the test host (tools-puppet-is-broken-here-on-purpose.tools.eqiad.wmflabs), and are not actually being sent...
[21:20:11] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939287 (valhallasw) Number of cronjobs for each minute of the day: {F3234316} There's a huge peak at midnight, but no clear peak at 2300 UTC which suggests that might be a different issue....
[21:50:21] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939297 (valhallasw) And only about 200 of them actually ran: ``` valhallasw@tools-submit:~$ sudo grep /var/log/syslog.1 -e "Jan 16 00:00" | grep -e "CMD" | wc -l 198 ``` Syslog is full of `...
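The `grep ... | wc -l` pipeline in the T123186 comment counts syslog lines from the minute `Jan 16 00:00` that contain `CMD`, i.e. cron job launches. Below is a minimal Python sketch of the same tally, extended to every minute so the midnight peak can be compared with the rest of the day. It assumes the usual Debian/Ubuntu cron syslog format (`Mon DD HH:MM:SS host CRON[pid]: (user) CMD (command)`); `cron_runs_per_minute` is a hypothetical helper, not part of any Tool Labs tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line format: "Jan 16 00:00:01 tools-submit CRON[123]: (user) CMD (...)".
# Group 1 captures the minute, e.g. "Jan 16 00:00" (single-digit days are space-padded).
CRON_CMD = re.compile(r"^(\w{3} [ \d]\d \d\d:\d\d):\d\d \S+ CRON\[\d+\]: \(.*?\) CMD ")


def cron_runs_per_minute(lines: Iterable[str]) -> Counter:
    """Count cron job launches per minute from syslog lines."""
    counts: Counter = Counter()
    for line in lines:
        match = CRON_CMD.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Fed the lines of `/var/log/syslog.1`, the entry for `"Jan 16 00:00"` can be at most the number the grep pipeline reports, because the regex only counts lines emitted by `CRON` rather than any line containing `CMD`.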
[21:51:37] valhallasw`cloud: I wonder if we should rebuild tools-submit as tools-cronrunner-01 and a bigger trusty machine
[21:52:03] YuviPanda: I think that's a good idea in any case, but I also think this shouldn't be a huge issue even for a small server
[21:52:15] YuviPanda: and if it's LDAP, the bigger server won't help
[21:52:27] and I think LDAP might be part of the reason it takes a minute to run all the jobs
[21:52:34] anyway, bed
[21:52:57] * YuviPanda nods
[21:53:01] yes
[21:53:04] LDAP is terrible
[21:53:10] moritz is on vacation though
[21:53:13] oh well
[21:53:19] I too should get off bed now
[21:53:21] cya
[21:59:18] valhallasw`cloud: I think the answer to the mail issue is “It works as andrew but not as root”
[22:00:16] hm… or not
[22:00:23] that was true in testlabs but not in tools
[22:00:44] tools has its own custom mail config of sorts btw
[22:00:49] so it isn't the most accurate testbed
[22:00:54] I had totally forgotten about that
[22:05:05] YuviPanda: I can’t get it to work as root at all, in tools or in testlabs
[22:05:19] ah
[22:05:21] ok
[22:05:39] but it works as user in testlabs but not as user in tools
[22:05:52] so there are many issues :(
[22:06:03] I don’t suppose you have a solution to suggest?
[22:06:24] no
[22:06:27] mail is a dark art
[22:06:32] valhallasw`cloud and scfc know more than I do :D
[22:06:40] putting it on a task is bound to attract one or both of those
[22:06:57] I guess if I have something that will work everywhere /but/ tools I’ll be happy
[22:07:04] me too
[22:07:10] But I’m not sure what user to use, if not root.
[22:10:01] Tools.admin?
[22:10:25] valhallasw`cloud: I’m talking about in non-tools instances
[22:10:29] I’ve already given up on tools :)
[22:28:03] Labs, Tool-Labs: tools.taxonbot and tools.giftbot cronjobs not firing - https://phabricator.wikimedia.org/T123186#1939336 (scfc) `cron` keeps a "virtual" time and should not drop jobs unless the difference becomes more than five minutes. The thread at http://lists.arthurdejong.org/nss-pam-ldapd-users/201...
[22:45:08] ok, everything I said about things not getting sent as root was wrong — I was just looking in the wrong folder :(
[22:53:01] Labs, Patch-For-Review: Any puppet failure on a labs instance should send an email to project admins - https://phabricator.wikimedia.org/T121773#1939338 (Andrew) Ok, seems like tools is an outlier. I'm testing in 'testlabs' instead.
[22:57:06] PROBLEM - Host tools-puppet-is-broken-here-on-purpose is DOWN: CRITICAL - Host Unreachable (10.68.22.146)