[00:37:58] PROBLEM - Host tools-docker-builder-01 is DOWN: CRITICAL - Host Unreachable (10.68.19.180) [00:44:57] RECOVERY - Free space - all mounts on tools-worker-1003 is OK: OK: tools.tools-worker-1003.diskspace._var_lib_docker.byte_percentfree (No valid datapoints found) tools.tools-worker-1003.diskspace._public_dumps.byte_percentfree (No valid datapoints found) [01:13:22] RECOVERY - Puppet staleness on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [3600.0] [01:22:30] Labs: Log files filling up k8s worker nodes - https://phabricator.wikimedia.org/T148487#2724374 (yuvipanda) [01:41:51] PROBLEM - Host tools-k8s-master-02 is DOWN: CRITICAL - Host Unreachable (10.68.20.144) [01:45:00] PROBLEM - Host tools-puppetmaster-02 is DOWN: CRITICAL - Host Unreachable (10.68.21.122) [02:09:41] PROBLEM - Puppet run on tools-docker-builder-02 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [02:19:40] RECOVERY - Puppet run on tools-docker-builder-02 is OK: OK: Less than 1.00% above the threshold [0.0] [03:35:52] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [03:36:51] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [05:25:45] Labs, labs-sprint-116, DBA, Patch-For-Review: Make watchlist table available on labs - https://phabricator.wikimedia.org/T59617#2724516 (MZMcBride) >>! In T59617#2384976, @jcrespo wrote: > I have now the watchlist count generator running on all public wikis. Someone just asked me about generatin... 
[06:20:41] PROBLEM - Puppet run on tools-docker-builder-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [06:30:40] RECOVERY - Puppet run on tools-docker-builder-02 is OK: OK: Less than 1.00% above the threshold [0.0] [06:40:53] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [06:41:48] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [06:55:46] Labs, Tool-Labs: Tools puppet runs hanging - https://phabricator.wikimedia.org/T148244#2718154 (yuvipanda) I've an instance now (tools-puppetmaster-02) that's role::puppetmaster::standalone, with all the private cherrypicks in it. tools-puppetmaster-02 itself is on the labs puppetmaster, though. [07:06:52] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:16:41] PROBLEM - Puppet run on tools-docker-builder-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:27:26] PROBLEM - Puppet run on tools-exec-1214 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [07:27:48] PROBLEM - Puppet run on tools-proxy-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [07:27:50] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:28:00] PROBLEM - Puppet run on tools-exec-1201 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:28:00] PROBLEM - Puppet run on tools-worker-1011 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:28:02] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [07:28:12] PROBLEM - Puppet run on tools-grid-master is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [07:28:12] PROBLEM - Puppet run on tools-worker-1016 is CRITICAL: 
CRITICAL: 33.33% of data above the critical threshold [0.0] [07:28:16] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:28:18] PROBLEM - Puppet run on tools-k8s-etcd-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:28:19] PROBLEM - Puppet run on tools-flannel-etcd-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:28:26] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:28:28] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [07:28:32] PROBLEM - Puppet run on tools-k8s-etcd-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:28:38] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [07:28:42] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:28:48] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:28:54] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:28:59] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:28:59] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:29:03] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:29:15] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [07:29:19] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:29:19] 
PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [07:29:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [07:29:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:29:23] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:29:27] PROBLEM - Puppet run on tools-exec-1211 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:29:31] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:29:41] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:29:45] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:29:53] PROBLEM - Puppet run on tools-worker-1006 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:30:01] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [07:30:01] PROBLEM - Puppet run on tools-k8s-etcd-03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:30:03] PROBLEM - Puppet run on tools-worker-1020 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:30:03] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:30:05] PROBLEM - Puppet run on tools-elastic-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:30:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:30:11] PROBLEM - Puppet run on 
tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:30:13] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:14] PROBLEM - Puppet run on tools-worker-1019 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:30:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:30:20] PROBLEM - Puppet run on tools-worker-1008 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:30:21] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:30:23] PROBLEM - Puppet run on tools-grid-shadow is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:26] PROBLEM - Puppet run on tools-worker-1022 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:32] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:34] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:30:34] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:30:36] PROBLEM - Puppet run on tools-worker-1009 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:30:40] PROBLEM - Puppet run on tools-worker-1021 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:30:40] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:30:42] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:44] PROBLEM - Puppet run on tools-worker-1018 is CRITICAL: CRITICAL: 50.00% of data 
above the critical threshold [0.0] [07:30:48] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:48] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:48] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:30:50] PROBLEM - Puppet run on tools-elastic-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:50] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:30:51] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:30:54] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:31:00] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:31:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:06] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:10] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:12] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:31:16] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:31:17] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:22] PROBLEM - Puppet run on tools-logs-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:24] PROBLEM - Puppet run on 
tools-cron-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:26] PROBLEM - Puppet run on tools-static-10 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:31:31] PROBLEM - Puppet run on tools-worker-1017 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:31:31] PROBLEM - Puppet run on tools-redis-1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:37] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:31:39] PROBLEM - Puppet run on tools-exec-1217 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:31:39] PROBLEM - Puppet run on tools-docker-registry-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:31:39] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:31:41] RECOVERY - Puppet run on tools-docker-builder-02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:31:41] PROBLEM - Puppet run on tools-exec-1218 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:31:49] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:31:51] PROBLEM - Puppet run on tools-exec-1208 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:31:52] ^yuvipanda is that you? [07:31:59] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [07:32:02] jynus: not entirely. 
[07:32:07] I'm around and working on something related [07:32:14] which is replacing this dying puppetmaster [07:32:20] it just seems to have chosen this moment to die [07:32:25] :-( [07:32:47] PROBLEM - Puppet run on tools-worker-1012 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:32:51] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:33:01] !log tools restarted puppetmaster on tools-puppetmaster-01 [07:33:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [07:33:15] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:33:17] PROBLEM - Puppet run on tools-exec-1207 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [07:38:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:39:19] @quiet shinken-wm [07:39:20] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [07:39:24] meh [07:40:01] !log tools complete moving all general tools exec nodes to tools-puppetmaster-02 [07:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [07:40:54] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:08] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:36] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:38] RECOVERY - Puppet run on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:50] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:58] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0] [07:42:52] RECOVERY - 
Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [07:43:00] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [07:43:13] !log tools move all tools webgrid nodes to tools-puppetmaster-02 too [07:43:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [07:43:18] RECOVERY - Puppet run on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [07:43:28] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [07:44:23] Labs, Tool-Labs: Tools puppet runs hanging - https://phabricator.wikimedia.org/T148244#2724581 (yuvipanda) I've moved all general exec nodes as well as all the webgrid nodes to tools-puppetmaster-02 now [07:44:28] RECOVERY - Puppet run on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0] [07:45:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [07:45:31] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0] [07:45:41] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [07:45:51] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [07:45:59] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [07:46:13] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [07:46:15] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0] [07:46:39] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [07:47:27] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0] [07:48:15] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [07:48:59] RECOVERY - Puppet run 
on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0] [07:49:47] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [07:50:25] jynus: I moved about 62 nodes to the new puppetmaster I'd built, and things are better again [07:50:35] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [07:50:38] new one is based off the prod role, uses apache + passenger, so can actually handle the instances [07:50:39] \o/ [07:50:49] I'll migrate the rest tomorrow [07:50:59] yes, go have some rest [07:51:22] yes [07:51:32] RECOVERY - Puppet run on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0] [07:53:02] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [07:53:12] RECOVERY - Puppet run on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0] [07:53:18] RECOVERY - Puppet run on tools-k8s-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:53:18] RECOVERY - Puppet run on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:53:34] RECOVERY - Puppet run on tools-k8s-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:53:50] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [07:54:00] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [07:54:16] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [07:54:42] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:13] RECOVERY - Puppet run on 
tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:18] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:38] RECOVERY - Puppet run on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:44] RECOVERY - Puppet run on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:50] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:51] RECOVERY - Puppet run on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:55:51] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [07:56:17] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [07:56:25] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:56:39] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [07:56:53] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:57:49] RECOVERY - Puppet run on tools-worker-1012 is OK: OK: Less than 1.00% above the threshold [0.0] [07:57:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [07:58:01] RECOVERY - Puppet run on tools-worker-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [07:58:11] RECOVERY - Puppet run on tools-grid-master is OK: OK: Less than 1.00% above the threshold [0.0] [07:58:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [07:58:25] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [07:58:37] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [07:58:43] RECOVERY - Puppet run on tools-bastion-02 is 
OK: OK: Less than 1.00% above the threshold [0.0] [07:59:03] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:21] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:29] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:55] RECOVERY - Puppet run on tools-worker-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [07:59:57] RECOVERY - Puppet run on tools-k8s-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:02] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:04] RECOVERY - Puppet run on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:06] RECOVERY - Puppet run on tools-worker-1020 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:20] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:24] RECOVERY - Puppet run on tools-grid-shadow is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:38] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:38] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:48] RECOVERY - Puppet run on 
tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [08:00:52] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [08:01:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [08:01:30] RECOVERY - Puppet run on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [08:01:38] RECOVERY - Puppet run on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0] [08:01:46] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0] [08:03:18] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [08:03:54] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [08:05:14] RECOVERY - Puppet run on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [08:05:26] RECOVERY - Puppet run on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [08:06:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [08:06:25] RECOVERY - Puppet run on tools-logs-02 is OK: OK: Less than 1.00% above the threshold [0.0] [08:06:25] RECOVERY - Puppet run on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0] [08:07:49] RECOVERY - Puppet run on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0] [08:31:47] Labs, labs-sprint-116, DBA, Patch-For-Review: Make watchlist table available on labs - https://phabricator.wikimedia.org/T59617#2724627 (jcrespo) The tables actually are generated physically, but work is being done as of this moment by labs engineers to make those available on all wikis. Stay tun... 
[09:27:43] (PS1) Gehel: maps - change structure of slaves according to https://gerrit.wikimedia.org/r/#/c/315271/ [labs/private] - https://gerrit.wikimedia.org/r/316536 (https://phabricator.wikimedia.org/T147194) [09:28:06] (CR) Gehel: [C: 2 V: 2] maps - change structure of slaves according to https://gerrit.wikimedia.org/r/#/c/315271/ [labs/private] - https://gerrit.wikimedia.org/r/316536 (https://phabricator.wikimedia.org/T147194) (owner: Gehel) [10:54:42] Labs, Labs-Infrastructure, DBA: Implement proxysql both for labs and for later production usage - https://phabricator.wikimedia.org/T148500#2724798 (jcrespo) [10:55:18] Labs, Labs-Infrastructure, DBA: Implement proxysql both for labs and for later production usage - https://phabricator.wikimedia.org/T148500#2724813 (jcrespo) proxysql 1.2.4 is now available on the wikimedia repository (jessie only). [11:48:15] (PS1) Gehel: maps - adding dummy monitoring password for postgresql [labs/private] - https://gerrit.wikimedia.org/r/316549 (https://phabricator.wikimedia.org/T147194) [11:50:14] (CR) Gehel: [C: 2 V: 2] maps - adding dummy monitoring password for postgresql [labs/private] - https://gerrit.wikimedia.org/r/316549 (https://phabricator.wikimedia.org/T147194) (owner: Gehel) [12:45:17] Labs, Labs-Infrastructure, DBA, Patch-For-Review: Implement proxysql both for labs and for later production usage - https://phabricator.wikimedia.org/T148500#2724798 (jcrespo) a:jcrespo [14:12:09] (PS1) Ricordisamoa: Get rid of fake_globals hack [labs/tools/ptable] - https://gerrit.wikimedia.org/r/316559 [14:17:16] Labs, Tool-Labs, Patch-For-Review: Install python-pyicu - https://phabricator.wikimedia.org/T102165#2725243 (Aklapper) Open>Resolved a:Aklapper Not #easy as this requires special permissions; plus looks like this task got fixed and forgotten to be closed. 
[15:20:12] Labs, Puppet: Puppet parser, puppet API, and inline docs - https://phabricator.wikimedia.org/T148479#2725484 (Andrew) This appears to be fixed in 3.8.5 [15:22:43] Labs: Move all of Labs (including self-hosted masters) to puppet 3.8.5 - https://phabricator.wikimedia.org/T148431#2725500 (Andrew) [15:22:45] Labs, Puppet: Puppet parser, puppet API, and inline docs - https://phabricator.wikimedia.org/T148479#2725499 (Andrew) [15:34:46] Labs, DBA: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#2725676 (chasemp) [15:38:15] Labs, Puppet: Puppet parser, puppet API, and inline docs - https://phabricator.wikimedia.org/T148479#2725683 (Andrew) p:Triage>Low [17:39:01] Labs, Labs-Infrastructure, DBA, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2726213 (chasemp) [17:40:53] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2726240 (chasemp) [17:40:56] Labs, DBA: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#609458 (chasemp) [17:41:00] Labs, Wikimedia-Labs-General, DBA, Operations, Tracking: Database replication services (tracking) - https://phabricator.wikimedia.org/T50930#2726242 (chasemp) [17:41:04] Labs, Labs-Infrastructure, DBA: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (chasemp) Open>Resolved I'm resolving this as I believe {T148560}, {T147302} and {T59617} are now the relevant work tasks. 
Big thanks to @Krenair [17:45:07] 06Labs, 10Labs-Infrastructure, 10DBA, 10MediaWiki-extensions-ORES, 06Revision-Scoring-As-A-Service: Replicate ores_classification and ores_model on labs - https://phabricator.wikimedia.org/T148561#2726255 (10Ladsgroup) [17:45:31] 06Labs, 10Labs-Infrastructure, 10DBA, 10MediaWiki-extensions-ORES, 06Revision-Scoring-As-A-Service: Replicate ores_classification and ores_model tables in labs - https://phabricator.wikimedia.org/T148561#2726270 (10Ladsgroup) [17:48:36] !log maps Disabled puppet across maps hosts [17:48:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL, Master [17:49:21] !log maps Rolling out https://gerrit.wikimedia.org/r/#/c/316482/ on maps instances for T147657 [17:49:22] T147657: Move maps share to labstore1003 - https://phabricator.wikimedia.org/T147657 [17:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL, Master [17:50:50] (03CR) 10ArthurPSmith: [C: 031] "Hey, I caught this before you were done for once. I don't know I'd call this a "hack", it seems a cleaner way to handle it. I looked throu" [labs/tools/ptable] - 10https://gerrit.wikimedia.org/r/316559 (owner: 10Ricordisamoa) [18:14:19] Well, I have a problem now in running the jar files. Yesterday I could'nt run because the java version, but I fix the problem. Now I have a problem in running jars with crontab. [18:15:37] when I write for exampe: "1 * * * * java -jar update/dist/update.jar" It does not work, and also doesn't give me any error. [18:15:57] I need a way to find out if the cron has an error. [18:16:17] PROBLEM - Host tools-exec-cyberbot is DOWN: CRITICAL - Host Unreachable (10.68.16.39) [18:30:54] Any help? [18:34:41] The cron entry looks right to me... does it output to STDOUT? If so, maybe add some logging? Or output to a file? 
[18:36:31] 06Labs: Request creation of status labs project - https://phabricator.wikimedia.org/T148569#2726492 (10Matthewrbowker) [18:38:32] Matthew_: Thank's for your reply. Is there a tutorial about that? [18:39:02] Hm... I'd have to look, I'm very rusty in Java. Is your program outputting anything to the console? [18:39:57] If yes, You can change your command to be "java -jar update/dist/update.jar > PATH/TO/LOG/FILE" (though I'm not completely sure how that works with the Grid) [18:40:27] 06Labs, 10Labs-Infrastructure, 10DBA, 06Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2726213 (10AlexMonk-WMF) (a script run would also handle wikimania2017wiki_p and tcywiki_p which are currently missing) [18:40:41] 06Labs, 10DBA: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#2726519 (10chasemp) I think with T148560 this is acheivable [18:41:31] When I prompt the cron, It give me nothing, if there is an error in the paths, I recieve an email, else; I don't see any thing. [18:42:05] 06Labs, 10Labs-Infrastructure, 10DBA, 06Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2726213 (10chasemp) >>! In T148560#2726515, @AlexMonk-WMF wrote: > (a script run would also handle wikimania2017wiki_p and tcywiki_p which are currently m... [18:43:27] ASammour: uh, your cron should be changed into something else [18:43:30] with a jsub in front [18:43:45] so the output is probably in java.err/java.out in your home dir [18:45:05] 06Labs: Increase quota for tools project - https://phabricator.wikimedia.org/T146322#2726540 (10chasemp) +1 @Andrew could you knock this out sometime this week? [18:45:45] Can you please edit the cron for me?. 
I'm a Windows man :)
[18:45:57] Labs, Tool-Labs: tools-exec-cyberbot in SHUTOFF state - https://phabricator.wikimedia.org/T147805#2726541 (chasemp) thanks @Luke081515Socke double ping for @Cyberpower678. any objections?
[18:46:50] ASammour: which tool?
[18:47:28] Here is the pure code: 1 * * * * /usr/bin/jsub -N cron-tools.sammour-1 -once -quiet java -jar root/data/project/sammour/testt/dist/testt.jar
[18:47:58] then the output will be in cron-tools.sammour-1.err and .out
[18:49:35] valhallasw`cloud: Thank you very much. I see the error log file.
[18:57:38] Labs, Operations, Patch-For-Review: Move maps share to labstore1003 - https://phabricator.wikimedia.org/T147657#2726587 (madhuvishy) Maps migration details Announcement: The /data/project/maps NFS share is undergoing maintenance, starting 9 AM PST (16:00 GMT) and will be unavailable for a short win...
[19:07:25] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 0.66 ms
[19:11:04] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[19:13:32] Labs: Request creation of hound labs project - https://phabricator.wikimedia.org/T148573#2726640 (Dereckson)
[19:15:32] Labs, Labs-Infrastructure, DBA, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2726654 (chasemp) a:chasemp>None
[19:16:06] Labs, Labs-Infrastructure, DBA, Operations: Create maintain-views user for labsdb1001 and labsdb1003 - https://phabricator.wikimedia.org/T148560#2726213 (chasemp) I'm not sure what the right steps are to do this through Puppet so I'm hoping to connect with one of the #DBA folks to knock it out so...
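Editor's note on the jsub exchange (18:43 to 18:49): the advice in the channel was to put jsub in front of the cron command and then look for the job's .out and .err files. A minimal Python sketch of that pattern follows. The helper names jsub_cron_line and job_log_files and the jar path are hypothetical, chosen only for illustration; the jsub flags (-N, -once, -quiet) and the /usr/bin/jsub path are copied from the crontab line quoted in the log, and the sketch assumes nothing about jsub beyond that.

```python
import shlex


def jsub_cron_line(schedule, job_name, command, jsub="/usr/bin/jsub"):
    """Build one crontab line that submits `command` through jsub.

    `schedule` is the usual five-field cron expression; `command` is a
    list of arguments. The flags mirror the line quoted in the log.
    """
    if len(schedule.split()) != 5:
        raise ValueError("schedule must have exactly five cron fields")
    parts = [jsub, "-N", job_name, "-once", "-quiet"] + list(command)
    # Quote each argument so spaces in paths survive the shell.
    return schedule + " " + " ".join(shlex.quote(p) for p in parts)


def job_log_files(job_name):
    """Names of the output files mentioned in the log: <job>.out and <job>.err."""
    return (job_name + ".out", job_name + ".err")


# Hypothetical jar path, used only for illustration.
line = jsub_cron_line(
    "1 * * * *",
    "cron-tools.sammour-1",
    ["java", "-jar", "/data/project/example/dist/example.jar"],
)
print(line)
print(job_log_files("cron-tools.sammour-1"))
```

Naming the job with -N is what makes the log files predictable, which is why the error log could be found right after the cron entry was changed.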
[19:21:10] Labs: Request creation of hound labs project - https://phabricator.wikimedia.org/T148573#2726669 (Dereckson)
[19:24:33] Labs: Request creation of status labs project - https://phabricator.wikimedia.org/T148569#2726492 (chasemp) I'm unclear what the plan is here, we are moving towards a model of prometheus integration with k8s. Is this to pursue something separate from that?
[19:34:24] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms
[19:46:20] Labs: Request creation of status labs project - https://phabricator.wikimedia.org/T148569#2726767 (Matthewrbowker) Oh, no. This is specifically for Tool Labs and Labs instances. The intention here is to provide an easy way for volunteers to set up monitoring for their tools and labs instances. My pl...
[19:47:19] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[19:48:11] Labs: Request creation of status labs project - https://phabricator.wikimedia.org/T148569#2726492 (Legoktm) shinken can be used for other projects, extdist.wmflabs.org uses it for example.
[19:53:33] Labs: Request creation of status labs project - https://phabricator.wikimedia.org/T148569#2726793 (Matthewrbowker) >>! In T148569#2726769, @Legoktm wrote: > shinken can be used for other projects, extdist.wmflabs.org uses it for example. Are there plans to expand shinken to tool labs? Running http checks?...
[20:06:09] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 0.69 ms
[20:17:01] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[22:56:09] !log maps Cannot ssh even as root into maps-warper.maps, no changes will be applied there
[22:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL, Master
[22:56:32] !log tools flip tools-k8s-master-01 to tools-puppetmaster-02
[22:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[23:00:13] PROBLEM - Puppet run on tools-worker-1015 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[23:09:49] !log maps Reenabled puppet on maps instances. Maps share mounted at /mnt/nfs/labstore1003 on all instances (T147657)
[23:09:50] T147657: Move maps share to labstore1003 - https://phabricator.wikimedia.org/T147657
[23:09:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Maps/SAL, Master
[23:10:14] PROBLEM - Puppet run on tools-flannel-etcd-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[23:17:04] Labs, Tool-Labs, Community-Tech-Tool-Labs, Research-and-Data, User-bd808: 2016 Tool Labs user survey - https://phabricator.wikimedia.org/T147336#2727380 (bd808)
[23:25:14] RECOVERY - Puppet run on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:25:15] RECOVERY - Puppet run on tools-flannel-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:58:00] Labs, Tool-Labs: Tools puppet runs hanging - https://phabricator.wikimedia.org/T148244#2727425 (yuvipanda) I've migrated everything to tools-puppetmaster-02 now, and things seem faster in general.