[00:35:02] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 07Tracking: Issues with 'webservice' kubernetes backend - https://phabricator.wikimedia.org/T139107#2489156 (10yuvipanda)
[00:35:04] 10Labs-Kubernetes: Kubernetes does not mount shared path - https://phabricator.wikimedia.org/T141098#2489154 (10yuvipanda) 05Open>03Resolved Cool! I've documented that it is available in /data/project/shared in https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web/Kubernetes
[01:56:19] !log tools deploy kubernetes v1.3.3wmf1
[01:56:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[02:04:35] PROBLEM - Puppet run on tools-exec-1213 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[02:21:38] 06Labs, 10Labs-Infrastructure, 10DBA: labsdb* has no High Availability solution - https://phabricator.wikimedia.org/T141097#2489261 (10Danny_B)
[02:21:48] 06Labs, 10Labs-Infrastructure, 10DBA: Having lots of accounts with separate grants makes auditing difficult. - https://phabricator.wikimedia.org/T141096#2489262 (10Danny_B)
[02:22:01] 06Labs, 10Labs-Infrastructure, 10DBA: Users can't run EXPLAIN queries to check the theoretical efficiency of their SQL - https://phabricator.wikimedia.org/T141095#2489263 (10Danny_B)
[02:39:37] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:51:50] 06Labs, 10Labs-Infrastructure, 06Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2489274 (10chasemp) I had to reboot seaborgium today as it froze up and took out ldap with it. > !log gnt-instance reboot seaborgium.wikimedia.org I would say...definitely something is sti...
[03:54:26] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[03:54:31] PROBLEM - Puppet run on tools-redis-1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[03:54:49] PROBLEM - Puppet run on tools-proxy-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[03:55:01] PROBLEM - Puppet run on tools-worker-1008 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:55:04] PROBLEM - Puppet run on tools-redis-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:55:45] uh
[03:56:28] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0]
[03:56:30] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:34] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:38] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:40] PROBLEM - Puppet run on tools-docker-registry-01 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[03:56:41] PROBLEM - Puppet run on tools-web-static-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:46] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[03:56:46] PROBLEM - Puppet run on tools-merlbot-proxy is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:48] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[03:56:48] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 85.71% of data above the critical threshold [0.0]
[03:56:49] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:51] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:55] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:56:59] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[03:57:03] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:57:08] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:57:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[03:57:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[03:57:15] PROBLEM - Puppet run on tools-exec-1221 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:57:17] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:57:17] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[03:57:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:57:31] chasemp ^ just side effects of the ldap outage, verified it is fine now.
[03:57:36] puppet is going to take a dive w/ ldap having been down and should recover here
[03:57:45] yeah ok cool I was doing the same
[03:57:54] this is getting to be a serious thing
[03:58:37] chasemp yeah...
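For readers following along: when LDAP goes away, Puppet agent runs on the Tools instances start failing, and the shinken alerts above clear on their own once runs succeed again. A minimal sketch of how one might spot-check a single instance instead of waiting for the next alert cycle, assuming sudo access and a stock Puppet 3 agent layout (generic commands, not necessarily the exact procedure the admins used here):

```
# Did the last Puppet run report failures, and when did it happen?
sudo grep -A4 'events:' /var/lib/puppet/state/last_run_summary.yaml

# Kick off a run right away instead of waiting for the next scheduled one,
# so the corresponding shinken check can recover sooner.
sudo puppet agent --test
```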
[03:59:09] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:11] PROBLEM - Puppet run on tools-webgrid-generic-1405 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:13] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:17] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:17] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[03:59:19] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:25] PROBLEM - Puppet run on tools-exec-1403 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:25] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[03:59:25] PROBLEM - Puppet run on tools-exec-1202 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:26] PROBLEM - Puppet run on tools-exec-1210 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:29] PROBLEM - Puppet run on tools-exec-1214 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:30] PROBLEM - Puppet run on tools-exec-1211 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [0.0]
[03:59:30] PROBLEM - Puppet run on tools-exec-1401 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:30] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[03:59:37] PROBLEM - Puppet run on tools-exec-1217 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:38] PROBLEM - Puppet run on tools-web-static-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:38] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[03:59:40] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[03:59:40] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [0.0]
[03:59:42] PROBLEM - Puppet run on tools-exec-1402 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:42] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:52] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[03:59:52] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:52] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[03:59:52] PROBLEM - Puppet run on tools-exec-1208 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:59:53] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:00:00] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:00:02] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[04:00:02] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[04:00:02] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[04:00:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:00:06] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:01:58] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:02:16] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:04:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:04:32] RECOVERY - Puppet run on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:04:52] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:04:52] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:05:04] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:05:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:05:58] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[04:06:29] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:06:37] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:06:47] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:06:48] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:06:55] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:07:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:07:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:07:18] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:06] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Omidfi was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=781262 edit summary:
[04:09:22] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:09:22] RECOVERY - Puppet run on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:23] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:26] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:27] RECOVERY - Puppet run on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:41] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:41] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:41] RECOVERY - Puppet run on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:42] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:09:50] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:10:04] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:10:06] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:11:46] RECOVERY - Puppet run on tools-merlbot-proxy is OK: OK: Less than 1.00% above the threshold [0.0]
[04:12:12] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:14:09] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:14:19] PROBLEM - Puppet run on tools-exec-1207 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[04:14:31] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:17:03] RECOVERY - Puppet run on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:19:25] RECOVERY - Puppet run on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:19:26] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:19:52] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:20:04] RECOVERY - Puppet run on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:21:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:21:51] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0]
[04:22:09] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:13] RECOVERY - Puppet run on tools-webgrid-generic-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:14] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:15] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:18] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:36] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:50] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:50] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:24:52] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:25:00] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:25:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:25:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:25:58] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:26:42] RECOVERY - Puppet run on tools-web-static-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:27:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:29:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:29:39] RECOVERY - Puppet run on tools-web-static-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:29:41] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:30:01] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:30:05] RECOVERY - Puppet run on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:31:31] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:31:39] RECOVERY - Puppet run on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:31:47] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:31:47] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:34:17] RECOVERY - Puppet run on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:34:49] RECOVERY - Puppet run on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:20:26] !log mediawiki-core-team deleted urlshortener instance
[05:20:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mediawiki-core-team/SAL, Master
[05:20:51] bd808: ^ fyi
[06:25:24] legoktm did you get php7 set up?
[06:26:28] YuviPanda: heh no. I was going to work on it a few days ago except the labs instance was busted and needed a reboot and I spent all of my time realizing that and was not motivated to actually work on it -.-
[06:27:06] sounds not unfamiliar unfortunately :(
[09:03:33] 06Labs, 10Labs-Infrastructure, 10DBA: labsdb* has no High Availability solution - https://phabricator.wikimedia.org/T141097#2489365 (10jcrespo) > bd808 changed the title from "labsdb* has no HA solution" to "labsdb* has no High Availability solution". I do not like much this title. I think we all understand...
[09:10:49] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:17:36] PROBLEM - SSH on tools-k8s-etcd-02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:29:00] Hi! Are there currently any known problems with creating new tool projects? I tried to create one half an hour ago and I still cannot access it. However, trying to register one with the same name also fails. Any ideas?
[09:33:37] Yellowcard: I think LDAP went out earlier today
[09:35:15] tom29739: ah, that explains it. Is it still out and/or is there a phab bug?
[09:50:50] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:11:21] 06Labs, 10Tool-Labs: Created tool does not show up, re-creation impossible - https://phabricator.wikimedia.org/T141178#2489411 (10Yellowcard)
[10:39:00] Hi! How do I make a cron job quiet enough that there are no cron status lines like this in the .out file:
[10:39:03] [Sat Jul 23 10:20:15 2016] there is a job named 'portal-db' already active
[10:42:15] Yellowcard: I'm not sure
[10:43:29] tom29739: alright, I filed a bug in Phabricator. Let's see what the sysops find out; the tool still seems to be stuck somewhere
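As an aside on what "stuck somewhere" can look like: a rough diagnostic sketch for a half-created tool, assuming a shell on a Tools bastion, that the default LDAP client configuration there allows this read, and using Yellowcard's tool name from the log purely as the example:

```
# 1. Was the LDAP service group created?
ldapsearch -x -b 'ou=servicegroups,dc=wikimedia,dc=org' 'cn=tools.eulenwiki' dn member

# 2. Did the tool's home directory appear on the shared filesystem?
ls -ld /data/project/eulenwiki

# 3. `become` only works once both pieces exist.
become eulenwiki
```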
[10:44:22] There were some messages about it at about quarter to five my time this morning: "just side effects of the ldap outage, verified it is fine now", "puppet is going to take a dive w/ ldap having been down and should recover here"
[10:45:10] Yellowcard: what's the error message you get when you try and become it?
[10:45:14] (if any)
[10:45:56] trying to become it returns "project doesn't exist" (or something like that), but trying to create one with this name simply returns "creation failed"
[10:46:58] Weird
[10:47:00] however, the first time I created it, it returned "creation successful" (or similar)
[10:47:13] When did you try and create it?
[10:47:37] phew, let's say two hours ago, maybe a little less
[10:48:12] actually, my second weird tool labs problem within 24 hours :D
[10:48:41] trying to become it returns: "become: no such tool 'eulenwiki'"
[10:49:01] Gimme a sec
[10:49:08] I'm wondering whether I should create a second one with a different name, but I don't want to create an even bigger mess
[10:49:13] Wonder whether the LDAP user was created
[10:50:54] Yellowcard: it doesn't appear in the great big list
[11:07:50] tom29739: mhm, would you suggest trying to create a new one, then, or rather waiting?
[11:27:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:37:58] Yellowcard: it's as if the user was never created
[11:38:19] Yellowcard: try creating the tool again
[11:38:27] tom29739: just that I cannot re-create the tool
[11:38:30] OK, I will!
[11:38:53] It's like the tool was only partially created
[11:40:54] tom29739: "Failed to create service group."
[11:41:50] I wonder...
[11:42:26] RECOVERY - SSH on tools-k8s-etcd-02 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:48:54] Yellowcard, that's weird
[11:49:08] It shows up in the big list of service groups
[11:49:16] https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup
[11:49:19] 10PAWS: Paws display 502 - Bad gateway error - https://phabricator.wikimedia.org/T140578#2489524 (10Ivanhercaz) @yuvipanda Since yesterday I tried to work in my bot but when I follow the instructions ―clean cookies, logout and sign in again―, it works a few minutes and then the "502 - Bad Gateway error" happens...
[11:49:31] Select the tools project and go to the very bottom
[11:50:03] right. I'll try to remove it and then re-create it, maybe that helps
[11:50:30] mhm, trying to remove it returns "You must be a member of the projectadmin role in project tools to perform this action."
[11:51:58] Yeah, I can't get rid of it
[11:52:02] Nor can you
[11:52:14] You'll need a tools admin
[11:54:27] PROBLEM - SSH on tools-k8s-etcd-02 is CRITICAL: Server answer
[11:59:27] RECOVERY - SSH on tools-k8s-etcd-02 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:04:53] tom29739: do you know who would be best to contact regarding this issue?
[12:06:28] Yellowcard, a tools admin
[12:07:03] Yellowcard, it's a Saturday, so I don't think there are many around
[13:38:42] PROBLEM - SSH on tools-grid-master is CRITICAL: Server answer
[13:57:28] 06Labs, 10Tool-Labs: Disable the sumdisc job of AsuraBot - https://phabricator.wikimedia.org/T140909#2489655 (10Luke081515) p:05Triage>03High The Bot is still spamming. Can someone handle that fast please?
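The task above asks for AsuraBot's sumdisc job to be stopped. A hedged sketch of how a running job is normally stopped on the Tools grid (gridengine's qstat/qdel plus the Tools jstop wrapper); the job name and ID are the ones given later in this log at 20:21, and an ordinary maintainer would use `become asurabot` rather than sudo:

```
# Switch to the tool account (admin variant shown; maintainers: `become asurabot`)
sudo -niu tools.asurabot

# See what is running, then stop the offending job by ID or by name
qstat
qdel 8965678        # job ID mentioned at 20:21 below
jstop sum_disc      # equivalent: delete the job by its name
```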
[13:58:41] RECOVERY - SSH on tools-grid-master is OK: SSH OK - OpenSSH_6.9p1 Ubuntu-2~trusty1 (protocol 2.0)
[14:04:40] PROBLEM - SSH on tools-grid-master is CRITICAL: Server answer
[14:14:28] 06Labs, 10Tool-Labs: i18n for https://tools.wmflabs.org/ - https://phabricator.wikimedia.org/T141182#2489670 (10Steinsplitter)
[14:22:53] 06Labs, 10Tool-Labs, 13Patch-For-Review: Convert most top level tool and bastion dns redcords to CNAMEs - https://phabricator.wikimedia.org/T131796#2178815 (10AlexMonk-WMF) > The latter is in a special org that requires commandline access on labcontrol1001 to manipulate. This makes it unusable for tools admi...
[14:26:01] 06Labs, 10Wikimedia-Site-requests, 10wikitech.wikimedia.org, 13Patch-For-Review: Allow wikitech to write files - https://phabricator.wikimedia.org/T126628#2019112 (10AlexMonk-WMF) I understand a bit more about how the file backends work these days, and I think this would make wikitech attempt to connect to...
[14:29:07] 06Labs, 10Tool-Labs, 07I18n: Internationalize Tool Labs' homepage - https://phabricator.wikimedia.org/T105590#2489715 (10Bugreporter)
[14:29:09] 06Labs, 10Tool-Labs: i18n for https://tools.wmflabs.org/ - https://phabricator.wikimedia.org/T141182#2489717 (10Bugreporter)
[14:30:43] 06Labs, 10Tool-Labs, 07I18n: Internationalize Tool Labs' homepage - https://phabricator.wikimedia.org/T105590#2489718 (10Luke081515) p:05Lowest>03Triage
[14:37:11] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Support reverse dns for public labs IPs - https://phabricator.wikimedia.org/T104521#2489734 (10AlexMonk-WMF) a:03AlexMonk-WMF
[15:55:17] 10:39 < doctaxon> Hi! How do I make a cron job quiet enough that there are no cron status lines like this in the .out file:
[15:55:20] 10:39 < doctaxon> [Sat Jul 23 10:20:15 2016] there is a job named 'portal-db' already active
[15:58:16] doctaxon: jsub with -quiet maybe?
[16:03:30] I am already using jsub with -quiet
[16:03:51] jsub with -quiet does not help
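On doctaxon's question: a sketch of the crontab pattern that usually keeps the .out file quiet. The schedule, job name, and script path below are invented for illustration; `-once`, `-quiet`, and `-N` are real jsub options, and the trailing redirect catches whatever jsub still prints at submission time (which appears to be what -quiet alone was not suppressing here):

```
# m   h  dom mon dow   command
*/10  *  *   *   *     jsub -once -quiet -N portal-db $HOME/bin/portal-db.sh >/dev/null 2>&1
```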
[16:33:08] Could someone make grrrit-wm talk again?
[16:34:37] 10Labs-Kubernetes: Odd kubernetes error - https://phabricator.wikimedia.org/T141041#2489800 (10Magnus) I'll have a look at it this weekend, maybe I can code around it in PHP. Then I could continue to use the "default" container.
[16:39:30] Glaisher: it isn't talking?
[16:39:32] Weird
[16:40:56] tom29739: It hasn't shown any change for hours
[16:51:06] I think it needs a reboot
[16:52:31] legoktm: YuviPanda ^
[17:50:36] Glaisher: would kicking it work?
[17:50:54] Don't know much about grrrit-wm
[17:51:01] I don't think so..
[17:51:34] Of course, it could be that there hasn't actually been a gerrit change
[17:51:44] But I find that unlikely
[18:01:37] 10Tool-Labs-tools-Other: https://tools.wmflabs.org/merlbot-web/ 404s - https://phabricator.wikimedia.org/T85739#2489834 (10bd808)
[18:01:39] 10Tool-Labs-tools-Other, 07Tracking: merl tools (tracking) - https://phabricator.wikimedia.org/T69556#2489833 (10bd808)
[19:01:19] !log tools.lolrrit-wm Killed pod grrrit-wm-230500525-ze741
[19:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL, Master
[19:15:27] bd808: currently here?
[19:15:34] *still here
[19:15:42] Hi Luke081515
[19:15:45] hi :)
[19:16:02] can you tell me which s5... user merlbot is?
[19:16:56] a user from de had an idea we can look into: IIRC merlbot writes its own DBs, and if we can access them, it would be possible to take over merlbot's write jobs, so we could create a temporary workaround
[19:17:18] but for that I need to know how the DBs are organised and whether it would be possible to access them
[19:17:19] Luke081515: Run `id tools.merlbot` on one of the tools bastions
[19:17:38] ah, thx :)
[19:22:08] 10Labs-Kubernetes: Odd kubernetes error - https://phabricator.wikimedia.org/T141041#2490008 (10yuvipanda) @Magnus thank you :D You can also go to https://grafana-labs-admin.wikimedia.org/dashboard/db/kubernetes-pods (login with your wikitech username/password, will be made more open soon), and select your tool n...
[19:23:42] bd808: I got another question :). Are you able to look at s51127__temp_transient? It's the only DB from merlbot I could find on ToolsDB (I will search the replicas too now). Can you take a look at the schema?
[19:25:16] 06Labs, 10Tool-Labs: Disable the sumdisc job of AsuraBot - https://phabricator.wikimedia.org/T140909#2490010 (10yuvipanda) 05Open>03Resolved a:03yuvipanda Done
[19:25:23] YuviPanda: thx
[19:27:57] YuviPanda: that labs grafana, I can't log into it
[19:28:12] Nor can I use any LDAP console utilities
[19:28:28] I think it's because I have 2FA enabled
[19:28:35] hi tom29739
[19:28:41] tom29739: do you know how many labsdbs we have, and whether there is another DB besides these and ToolsDB?
[19:28:43] But I need it enabled to use Horizon
[19:28:57] are you using your shell name or wikitech username?
[19:29:22] YuviPanda: it's case sensitive isn't it?
[19:29:36] * tom29739 facepalms
[19:29:49] Luke081515: I think only s2 and s3 are active right now
[19:29:50] the username? yeah
[19:30:34] bd808 c1 and c3 (equivalent to labsdb1001 and 1003). s* are prod slices not entirely relevant to labs anymore
[19:30:48] I wonder, because I just found one database for merlbot's various entries. The problem is that I don't have read access for s51127__temp_transient, so I can't take a look at whether I can get merlbot's data from these tables :-/
[19:31:36] I think without help from merl that reviving that bot is a lost cause :/
[19:32:09] there's no code
[19:32:10] so...
[19:32:57] I'm close to adding per-tool http request counts to graphite :D
[19:33:16] I think programs to decompile things exist? But I'm not good at programming Java...
[19:33:37] working with decompiled java is not fun
[19:33:41] no comments
[19:33:50] You'd need to be a Java expert
[19:33:57] and depending on how it was compiled the symbols may be garbage
[19:34:07] but at least we could maybe find out the hardcoded DB tables
[19:34:20] I *am* a java expert but I wouldn't do it
[19:34:21] maybe...
[19:34:25] hm, ok
[19:34:55] there are 12 tables in that schema. t, t2 and 10 taxa_* tables
[19:35:05] * tom29739 does not decompile programs when he can help it
[19:35:11] * YuviPanda personally recommends writing from scratch with pywikibot and a proper VCS + CI, involving multiple people
[19:35:36] and not depending on array jobs and other OGE features
[19:35:45] * YuviPanda likes how JeanFred and Lokal_Profil are handling the heritage API :)
[19:36:49] bd808: is it possible to get read access to that DB? maybe I can find useful information in there that I can use
[19:37:14] The process for that is to ask permission from the owner...
[19:37:36] since it isn't named *_p
[19:37:53] There might be private data in there
[19:38:06] bd808: but I think WMDE-Fisch was added to the tool too, without asking the owner?
[19:38:44] YuviPanda: that's a good idea
[19:38:50] at least a DESC for each table would be enough for the first moment
[19:38:59] Luke081515: then they are an owner and can approve I guess
[19:39:05] * tom29739 hasn't really looked into testing or CI
[19:39:09] so that I can see how the tables are set up, without seeing the data in there
[19:41:51] bd808: is it possible to look at the output of the DESC command for each table?
[19:42:37] Luke081515: https://phabricator.wikimedia.org/P3564
[19:42:43] thx :)
[20:10:03] bd808: I contacted someone from WMDE now, so they can take a look at whether it seems possible (the user I mailed has access to the tool)
[20:11:19] (just FYI)
[20:20:08] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Thathanka was created, changed by Thathanka link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Thathanka edit summary: Created page with "{{Tools Access Request |Justification=Individual wiki maintenance |Completed=false |User Name=Thathanka }}"
[20:20:31] YuviPanda: Still here?
[20:20:46] bd808: or you?
[20:21:00] what's up Luke081515
[20:21:37] bd808: sry, I forgot to mention it at T140909: the actual running job needs to be killed too. It's sum_disc with ID 8965678, running on tools-exec-1404
[20:21:38] T140909: Disable the sumdisc job of AsuraBot - https://phabricator.wikimedia.org/T140909
[20:23:50] !log tools.asurabot killed running job for sum_disc per T140909
[20:23:51] T140909: Disable the sumdisc job of AsuraBot - https://phabricator.wikimedia.org/T140909
[20:24:28] bd808: thank you very much :)
[20:25:48] yw
[20:28:54] * bd808 -> new star trek movie
[20:51:13] bd808 you watch star trek movies
[20:54:01] 06Labs, 10Tool-Labs: Track web request stats for each tool on tool labs - https://phabricator.wikimedia.org/T69880#2490058 (10yuvipanda)
[20:56:47] 06Labs, 10Tool-Labs, 13Patch-For-Review: Track web request stats for each tool on tool labs - https://phabricator.wikimedia.org/T69880#2490063 (10yuvipanda) a:03yuvipanda
[21:22:28] PROBLEM - SSH on tools-k8s-etcd-02 is CRITICAL: Server answer
[21:23:03] kaldari: YuviPanda: section-redirect doesn't exist.
[21:24:08] ?
[21:24:58] kaldari: whoops. I meant to ping ksft
[21:25:57] yeah, what kind of person starts their name with a K?
[21:26:14] only I get to do that
[21:26:24] Hmm...good question. What person starts their name with a k?
[21:26:34] :p
[21:27:15] ksft: be careful, kaldari is with WMF. He will run you over. :D
[21:27:21] only crazy people who probably spend hours setting up a way to get the program they wrote to maintain an online encyclopedia to run regularly
[21:27:26] RECOVERY - SSH on tools-k8s-etcd-02 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[21:28:50] ksft: now he will definitely run you over. :p
[21:29:03] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 13Patch-For-Review: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2490102 (10yuvipanda) Happened to tools-k8s-etcd-02 again.
[21:29:45] I mean, uh, isn't K great?
[21:29:58] lol
[21:31:05] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 13Patch-For-Review: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2490103 (10yuvipanda) ``` [140280.197047] INFO: task sshd:20819 blocked for more than 120 seconds. [140280.198328] Not tainted 4.4.0-1-amd64 #1 [140280.198794] "...
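For anyone following T140256: the pasted trace is the kernel's hung-task watchdog firing. A generic way to check a host for the same symptom (standard dmesg/ps, nothing Tools-specific):

```
# Any tasks the kernel flagged as blocked for more than 120 seconds?
dmesg -T | grep -i 'blocked for more than'

# Processes currently stuck in uninterruptible sleep (D state), the usual
# source of those traces, together with what they are waiting on.
ps -eo pid,stat,wchan:32,cmd | awk 'NR==1 || $2 ~ /D/'
```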
[21:31:32] ksft: try logging out and logging back in again.
[21:31:36] okay
[21:31:45] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 13Patch-For-Review: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2490104 (10yuvipanda) It's in labvirt1001 now.
[21:31:56] still not working
[21:32:15] TimStarling: ^
[21:34:20] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 13Patch-For-Review: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2490106 (10yuvipanda) I'm migrating it to labvirt1013 to see if that helps, since this node has died previously too.
[21:36:17] PROBLEM - Host tools-k8s-etcd-02 is DOWN: CRITICAL - Host Unreachable (10.68.18.64)
[21:45:22] RECOVERY - Host tools-k8s-etcd-02 is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms
[21:50:54] 06Labs: Two small instances: for WikiToLearn development - https://phabricator.wikimedia.org/T115282#2490135 (10Toma.luca95) >>! In T115282#2481251, @chasemp wrote: >>>! In T115282#1771743, @Toma.luca95 wrote: >> It is possible have a proxy for *.wikitolearn.org/*.wiki2learn.org domains and subdomains with webso...
[21:52:31] PROBLEM - Puppet staleness on tools-k8s-etcd-02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[21:52:48] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 13Patch-For-Review: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2490138 (10yuvipanda) I see ``` [Sat Jul 23 21:44:05 2016] HTB: quantum of class 10002 is big. Consider r2q change. ``` repeated a bunch of times in dmesg, doesn't...
[21:57:31] RECOVERY - Puppet staleness on tools-k8s-etcd-02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[22:05:24] Change on 12www.mediawiki.org a page Wikimedia Labs was modified, changed by Shirayuki link https://www.mediawiki.org/w/index.php?diff=2199573 edit summary: [+4] translation tweaks
[22:26:42] 10Labs-Kubernetes: Odd kubernetes error - https://phabricator.wikimedia.org/T141041#2490190 (10yuvipanda) (URL at https://grafana-labs-admin.wikimedia.org/dashboard/db/kubernetes-tool-combined-stats now)
[22:49:52] I can't `become` my tool account.
[22:51:58] account names?
[22:52:22] ksft: yellowcard was having that issue earlier
[22:52:36] Krenair: ksft
[22:52:44] the tool is section-redirect
[22:53:06] Wikitech username is KSFT
[22:53:18] seems I'm not much use at the moment, SSH doesn't want to work on my connection
[22:53:36] high packet loss
[22:54:14] Krenair: I reckon it's the same issue as before
[22:54:22] what was the problem?
[22:54:41] I just created the tool, by the way. I probably should have mentioned that before.
[22:54:44] The service group gets created: it appears in https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup
[22:55:01] I created it a little over an hour and a half ago.
[22:55:02] The tool doesn't appear in the big list of tools
[22:55:07] I noticed that.
[22:55:33] And you can't become it, it says the tool does not exist
[22:55:55] Krenair: do you know why a tool would be only partially created?
[22:56:06] to be honest I don't know much about our service group system
[22:56:10] I know there was an LDAP outage last night
[22:56:16] But that's been resolved
[22:56:31] yeah, that probably wouldn't be relevant
[22:57:20] dn: cn=tools.section-redirect,ou=servicegroups,dc=wikimedia,dc=org
[22:57:26] member: uid=ksft,ou=people,dc=wikimedia,dc=org
[23:01:31] rather unhelpfully you can't read sudoers files without being root, and I don't currently have admin in that project
[23:05:24] Krenair: I told yellowcard: it's a Saturday
[23:05:56] There aren't likely to be many around unfortunately
[23:06:38] ksft, it says no such tool?
[23:09:19] ksft, try `sudo -niu tools.section-redirect`
[23:17:20] LDAP has the group and you as a member, `become` won't work because /data/project/section-redirect doesn't exist - this might also be why it doesn't appear on the list
[23:17:33] I'm not yet sure what creates those directories
[23:21:37] !log tools restart maintain-kubeusers on tools-k8s-master-01, was stuck on connecting to seaborgium preventing new tool creation
[23:21:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[23:22:51] 06Labs, 10Tool-Labs: Add appropriate timeouts to maintain-kubeusers - https://phabricator.wikimedia.org/T141203#2490212 (10yuvipanda)
[23:22:58] krenair ^ was the cause
[23:23:06] ksft Yellowcard your tools should work now
[23:24:36] 06Labs, 10Labs-Infrastructure: Unable to SSH onto tools-login.wmflabs.org - https://phabricator.wikimedia.org/T130446#2490225 (10yuvipanda) 05Open>03Resolved a:03yuvipanda
[23:24:55] * YuviPanda goes afk again
[23:25:15] tom29739 I've renamed the dashboard again, and it has web request stats as well! https://grafana-labs-admin.wikimedia.org/dashboard/db/kubernetes-tool-combined-stats
[23:25:39] I tried it
[23:25:44] It showed nothing
[23:26:12] OK...
[23:26:21] It seems to be working now
[23:26:25] I fixed it up again, been fiddling with it
[23:26:34] Though nothing shows for web requests
[23:26:50] web requests are being recorded only in the last few hours
[23:27:00] so maybe there have been no web requests to that tool in that time?
[23:27:49] maintain-kubeusers is responsible for creating servicegroup dirs under /data/project?
[23:28:16] YuviPanda: I made some web requests
[23:28:19] It works
[23:28:24] I guess it could be worse - I was looking at labstore's create-dbusers :)
[23:28:27] :)
[23:29:33] krenair it used to be worse, there was a bash script running in a loop
[23:29:38] dirpath = os.path.join('/data', 'project', user.name, '.kube')
[23:29:41] os.makedirs(dirpath, mode=0o775, exist_ok=False)
[23:29:42] :/
[23:29:54] God, I bet that was unreliable
[23:29:59] this meant there were three things racing - the bash script (toolswatcher), create-dbusers and maintain-kubeusers
[23:30:02] (the bash script)
[23:30:10] so I folded two of them together
[23:30:35] YuviPanda: I like this one: https://grafana-labs-admin.wikimedia.org/dashboard/db/tools-activity
[23:30:48] krenair you want create_homedir
[23:30:53] Though it doesn't show the web requests
[23:31:07] YuviPanda, lovely
[23:31:22] "N/Areq/min
[23:31:25] "
[23:31:28] tom29739 yeah...
[23:32:12] tom29739 graphite labs is set to 'proxy mode', which I guess causes some of the problems. I'll try to investigate moving it to 'direct' mode next week
[23:32:13] should be less flaky
[23:32:23] krenair yeah... incrementally less shitty tho
[23:32:56] I just went afk for a while. What was the answer?
[23:32:59] It should work now?
[23:33:14] oh, it does
[23:33:45] YuviPanda, so is this now on the list of things that break whenever ldap does?
[23:33:58] unexpected when things work like originally planned \o/
[23:35:10] krenair yeah
[23:35:21] the problem was that it didn't recover when ldap did
[23:36:03] YuviPanda: what's the difference between direct and proxy mode?
[23:36:33] tom29739 direct mode makes requests to graphite-labs.wikimedia.org directly from your browser
[23:36:54] proxy mode sends them to grafana-labs-admin.wikimedia.org, which then makes a request from there to graphite-labs.wikimedia.org
[23:38:16] YuviPanda: why wasn't it in direct mode in the first place?
[23:38:34] because it doesn't work :)
[23:38:38] and I haven't had time to investigate yet
[23:38:56] krenair https://gerrit.wikimedia.org/r/#/c/300741/ should make it better
[23:40:04] alright, now I go for realz
[23:40:07] ttyl!
[23:41:03] * Krenair waves
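A closing note on the direct-versus-proxy point: Grafana's "direct" access mode has the browser query Graphite itself, so graphite-labs would need to answer cross-origin requests from the Grafana origin. A quick, hedged way to check for that from a shell (the Origin value is just the grafana host mentioned above; whether graphite-labs is expected to send CORS headers at all is an assumption):

```
# Does Graphite reply with the Access-Control-* headers a browser-side
# (direct mode) Grafana datasource would need?
curl -sI -H 'Origin: https://grafana-labs-admin.wikimedia.org' \
  'https://graphite-labs.wikimedia.org/render?target=test&format=json' \
  | grep -i '^access-control-'
```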