[00:03:54] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:09:02] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:09:17] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:13:17] RECOVERY - Puppet run on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:13:47] (CR) Dzahn: "all up to the releng team" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307439 (owner: Paladox)
[00:16:40] everything ok?
[00:22:52] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:23:50] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:25:14] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:25:40] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:26:08] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:26:12] RECOVERY - Puppet run on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:26:40] RECOVERY - Puppet run on tools-web-static-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:26:58] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:27:00] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:27:28] RECOVERY - Puppet run on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:27:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:27:42] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:28:14] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:28:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:28:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:28:51] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:03] RECOVERY - Puppet run on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:25] RECOVERY - Puppet run on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:43] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:45] RECOVERY - Puppet run on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:49] RECOVERY - Puppet run on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:29:53] RECOVERY - Puppet run on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:13] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:25] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:35] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:37] RECOVERY - Puppet run on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:39] RECOVERY - Puppet run on tools-web-static-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:30:57] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:31:11] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0]
[00:31:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:31:25] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:31:51] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:32:01] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:32:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:32:19] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:32:42] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:33:00] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:33:04] RECOVERY - Puppet run on tools-static-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:34:32] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:34:48] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:35:48] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:45:25] (PS1) BryanDavis: jsub: Add warnings for precise deprecation [labs/toollabs] - https://gerrit.wikimedia.org/r/307461 (https://phabricator.wikimedia.org/T143282)
[00:48:46] everything should be good now
[01:09:31] !log exim deleted the only instance exim-jessie as requested by andrew
[01:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Exim/SAL, Master
[01:20:56] i'm trying to double-check if "role::labs::extdist" is used by anything or not
[01:21:05] i have https://tools.wmflabs.org/watroles/role/role::labs::extdist looking like it's not
[01:21:20] is that the right way/URL to check?
[02:18:52] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[02:37:12] Tool-Labs-tools-Pageviews: Add "wiki page" as a source to Massviews - https://phabricator.wikimedia.org/T144251#2593411 (MusikAnimal)
[02:53:51] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:41:56] Labs: Clean up data in /data/scratch/mwoffliner - https://phabricator.wikimedia.org/T144025#2593457 (Kelson) @madhuvishy OK, I can live with that.
[05:19:22] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2593476 (madhuvishy) In order to backup scratch from labstore1001 to labstore1003 using rsync: ### snapshot `lvcreate -L1T -s -n backup-scratch /dev/labstore/scratch` ### mount `mount /...
[05:39:25] Labs, Operations, Tracking: Migrate tools-project and others(Labs) data from labstore1001 to labstore1004/5 - https://phabricator.wikimedia.org/T144255#2593494 (madhuvishy)
[05:54:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:01:12] PROBLEM - Puppet staleness on tools-exec-1204 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[06:03:00] PROBLEM - Puppet staleness on tools-exec-1213 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[06:21:24] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:22:12] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:22:59] uhhh
[06:34:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:36:12] RECOVERY - Puppet staleness on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [3600.0]
[06:37:19] madhuvishy every day at somewhere around this time, the puppetmaster logrotates and a random bunch of nodes that were running puppet at exactly that time will fail
[06:37:52] yuvipanda: oho
[06:38:00] RECOVERY - Puppet staleness on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [3600.0]
[06:38:13] madhuvishy or could be something else, which is annoying.
[06:38:24] should write the less-flaky puppet test at some point
[06:38:36] yuvipanda: i ran puppet on some of them, it runs fine
[06:38:45] cool
[06:39:27] * yuvipanda goes to bed
[06:39:38] good night yuvipanda :)
[06:42:06] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:42:29] Labs: Clean up data in /data/scratch/mwoffliner - https://phabricator.wikimedia.org/T144025#2593628 (madhuvishy) @Kelson Thank you!
[06:43:29] PROBLEM - Puppet run on tools-prometheus-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:46:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:47:59] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[06:49:13] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:51:11] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[06:58:18] PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:01:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:01:48] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:18:30] RECOVERY - Puppet run on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:27:59] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:38:18] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:41:48] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:14:39] Labs, Tool-Labs, Patch-For-Review: Install pdf2djvu for Wikisource DjVu aid - https://phabricator.wikimedia.org/T130138#2593807 (Nemo_bis) Open>Resolved a:valhallasw
[09:02:12] Labs, Tool-Labs: Maintainers are not shown in the Tools list - https://phabricator.wikimedia.org/T142684#2593876 (4nn1l2)
[09:44:56] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[09:56:25] Labs, Analytics: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594010 (Stigmj) Any chance for this to be expedited? At nowiki we are currently using these datasets for a local stats-service (https://tools.wmflabs.org/pagecount/) and the users are complaining (ht...
[10:19:57] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:59:09] !log stashbot bounce stashbot, not seen on irc
[10:59:09] stashbot is not a valid project.
[10:59:16] !log tools bounce stashbot, not seen on irc
[10:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[11:00:30] (CR) Hashar: [C: -1] "I dont think we maintain GerritBot." [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307439 (owner: Paladox)
[11:01:24] (Abandoned) Paladox: Add GerritBot project to #wikimedia-releng channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307439 (owner: Paladox)
[11:01:26] (CR) Hashar: [C: -1] "I dont think we maintain grrrr-bot" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307441 (owner: Paladox)
[11:02:46] (PS3) Paladox: Add grrrit-wm project to #wikimedia-labs channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307441
[11:03:15] (Restored) Paladox: Add GerritBot project to #wikimedia-releng channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307439 (owner: Paladox)
[11:03:36] (PS3) Paladox: Add GerritBot project to #wikimedia-releng channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307439
[11:03:44] (PS4) Paladox: Add GerritBot project to #wikimedia-labs channel [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/307439
[11:35:12] where do I check the refresh delay of labs to enWS db?
[11:35:19] <- forgotten clearly
[11:36:08] sDrewth, https://tools.wmflabs.org/replag/
[11:36:29] or on heartbeat_p db
[11:36:40] rhx
[11:36:42] thx
[11:58:36] What IP address range does Labs use?
[11:59:10] I can see it uses 10.*.*.*, but is there a more specific range?
[12:06:45] tom29739: please see here: https://github.com/wikimedia/operations-puppet/blob/production/modules/network/spec/fixtures/hieradata/common.yaml
[12:07:18] Thanks.
[12:25:56] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[12:53:03] Labs, Analytics-Kanban, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594241 (Ottomata) p:Triage>High a:Ottomata
[13:20:58] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:44:22] Labs: Can not kill job on tools labs - https://phabricator.wikimedia.org/T138924#2594405 (Magnus) No, after the two month this ticket lay dormant, it appears to have died on its own. Good thing I followed the advice of using Phabricator over other means of communication to contact roots...
[14:36:09] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 0.91 ms
[14:39:23] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 1.38 ms
[14:42:00] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[14:42:26] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 0.74 ms
[14:45:04] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[14:45:54] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[15:03:06] hi http://quarry.wmflabs.org/ give me bad gateway are you aware ?
[15:07:33] Labs, Operations: Connect secondary nic for labstore1004 and labstore1005 - https://phabricator.wikimedia.org/T144183#2594689 (Cmjohnson) Open>Resolved a:Cmjohnson @chasemp Second NIC cabled up for both...updated switch description and enabled port. Vlan will need to be updated. labstore100...
[15:07:35] Labs, Operations, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2594692 (Cmjohnson)
[15:07:46] xcombelle thanks for reporting, I'm looking into it now
[15:07:52] thanks yuvipanda
[15:19:52] xcombelle should be back up now!
[15:20:37] perfect yuvipanda well done
[15:43:25] Labs, Analytics-Kanban, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594755 (Ottomata) Done! I'm running the first rsync over now. I didn't create a specific rsync module, it seems like the ::dumps one should be enough. @Stigmj, I don't...
[16:01:31] Labs, Tool-Labs, Community-Tech-Tool-Labs, Patch-For-Review, User-bd808: Make jsub warn when run without `-l release=...` - https://phabricator.wikimedia.org/T143282#2594834 (bd808) a:bd808
[16:09:46] Labs, Analytics-Kanban, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594863 (Nuria) Open>Resolved
[16:30:25] Labs, Analytics-Kanban, Patch-For-Review: Put pageviews dataset in labs /public/dumps - https://phabricator.wikimedia.org/T142671#2594927 (Stigmj) @Ottomata yeah they are there. Thank you.
[16:57:52] * valhallasw`cloud prods valhallasw`vecto
[17:08:04] * bd808 worries about valhallasw`cloud talking to himself in public
[18:09:14] !log deployment-prep reboot deployment-kafka03 seems to be stuck
[18:09:14] Please !log in #wikimedia-releng for beta cluster SAL
[18:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[18:39:30] Labs, Tool-Labs: Puppet not running on tools-webgrid-lighttpd-1207 - https://phabricator.wikimedia.org/T143191#2595438 (valhallasw) This seems to be related to the root password that was introduced a week or so ago. On hosts that are OK, `/etc/shadow` looks like this: ``` valhallasw@tools-webgrid-lightt...
[18:49:23] Labs, Tool-Labs: Puppet not running on tools-webgrid-lighttpd-1207 - https://phabricator.wikimedia.org/T143191#2595450 (valhallasw) This seems to be a remainder of a half-applied puppet change. The solution is simple -- remove the newline, and puppet will clean up the shadow file on the first run.
[18:54:00] !log tools edited /etc/shadow on a range of hosts to fix https://phabricator.wikimedia.org/T143191
[18:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[18:57:39] Labs, Tool-Labs: Puppet not running on tools-webgrid-lighttpd-1207 - https://phabricator.wikimedia.org/T143191#2595504 (valhallasw) Open>Resolved a:valhallasw did this on: * tools-webgrid-lighttpd-1207 * tools-webgrid-lighttpd-1208 * tools-exec-1204 * tools-exec-1211 * tools-exec-1213 T...
[19:00:15] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [3600.0]
[19:10:55] RECOVERY - Puppet staleness on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [3600.0]
[19:20:04] Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2595654 (Huji)
[19:28:21] RECOVERY - Puppet staleness on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [3600.0]
[19:37:20] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:40:24] RECOVERY - Puppet staleness on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [3600.0]
[20:25:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:59:35] Quarry: Forking your own query results in a new one owned by YuviPanda - https://phabricator.wikimedia.org/T144309#2595935 (yuvipanda) whoops :( try again?
[21:18:05] Labs, Labs-Infrastructure: Move designate and nova plugins out of packages and straight into the puppet repo - https://phabricator.wikimedia.org/T144317#2595992 (Andrew)
[21:18:13] Labs, Labs-Infrastructure, Patch-For-Review: Once we have Liberty: remove project-id logic from designate/ldap plugin, use project_id in metadata instead. - https://phabricator.wikimedia.org/T105891#2596005 (Andrew) https://phabricator.wikimedia.org/T144317
[21:28:37] Labs, Labs-Infrastructure, Patch-For-Review: Once we have Liberty: remove project-id logic from designate/ldap plugin, use project_id in metadata instead. - https://phabricator.wikimedia.org/T105891#2596027 (Andrew) Open>Resolved One! Less! Hack!
[23:20:17] !log deployment-prep removed 'project_id' key from deployment-restbase02's metadata to fix compatibility with the new labsprojectfrommetadata code
[23:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[23:43:26] Labs, Operations, Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2596341 (yuvipanda) deployment-prep is done! \o/ None of the instances have duplicates in their puppet.conf either! \o/