[01:16:43] Hi, I know tools are supposed to be read only but should this affect ssh login? [01:16:52] It seems I cannot login any more with ssh [01:17:30] debug1: Connecting to tools-login.wmflabs.org [208.80.155.163] port 22. [01:17:36] ... [01:17:41] debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased [01:17:41] debug1: Next authentication method: publickey [01:17:41] debug1: Offering RSA public key: /home/hr/.ssh/id_rsa [01:17:42] debug1: Server accepts key: pkalg ssh-rsa blen 279 [01:17:43] Connection closed by 208.80.155.163 [01:17:59] hroest: Same problem here. Word is that they're working on it. [01:18:01] anybody else have similar issues? [01:18:03] ah ok, [01:18:10] at least it's not just me :-) [01:18:26] well I wish them good luck [01:18:31] $ ssh matthewrbowker@login.tools.wmflabs.org [01:18:31] Connection to login.tools.wmflabs.org closed by remote host. [01:18:31] Connection to login.tools.wmflabs.org closed. [01:56:43] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/ChongDae was created, changed by ChongDae link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/ChongDae edit summary: Created page with "{{Tools Access Request |Justification=Running Bot for kowiki |Completed=false |User Name=ChongDae }}" [02:14:28] For what it's worth, we're all four of us frantically jamming the pieces of toollabs back together :( [02:31:03] !log tools reboot tools-exec-1406 [02:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [02:56:02] !log tools reboot tools-exec-1405 to ensure noauto works (because atboot=>false is lies) [02:56:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [03:21:56] !log tools reboot tools-checker-01 [03:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:31:59] !log tools.nagf Restarted webservice to get access to r/w NFS [04:32:01] Logged the
message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.nagf/SAL [04:39:58] bd808: hmm, was that needed? [04:39:59] interesting [04:40:08] maybe I should do a restart of all webservices once we come back [04:40:24] yuvipanda: yeah. it had an error message about not being able to write to a cache dir [04:41:30] hey have we gotten enough jobs off of precise nodes to start shutting some of them down? [04:46:42] bd808: yeah [04:47:15] bd808: not today, but I think we can cut out 50% of them maybe in a few days [04:48:45] HaeB: paws is back now [04:49:15] saw it, thanks :) [04:59:26] yuvipanda: are you gonna do an webservice restart [04:59:39] all webservice* [05:03:02] madhuvishy: considering, but also hungry. [05:03:23] yuvipanda: i was wondering if i should send email or wait [05:03:27] till you do that [05:03:41] madhuvishy: ok, I'll do it now [05:03:52] we also have to switch back to the no /dev/null writing thing no? [05:04:04] yuvipanda: ^ [05:04:22] madhuvishy: nah, that I want to do tomorrow only [05:04:28] yuvipanda: okay cool [05:04:30] madhuvishy: in case stuff goes south. don't want 50g of logs [05:04:33] yes [05:04:49] thanks for the hard work guys! everything is running smoothly now [05:05:13] yay thanks musikanimal :) [05:05:23] I did notice my app is taking an unusually long time to update after I pulled in new code. Not sure if that's related to the maintenance [05:05:53] !log tools restarting all webservices on gridengine [05:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:05:58] musikanimal: what do you mean by 'update'? [05:06:20] madhuvishy: ok, I'll wait for this to finish and then start k8s. [05:06:25] I git pulled in some new HTML (and some JS), it's there in public_html directory but not in my browser [05:06:28] I've cleared the cache, etc. [05:07:10] let me try a different app, see if it's isolated to just this one [05:07:15] musikanimal: let us know if it's repeatable? 
[05:07:17] yeah okay [05:11:34] madhuvishy: yeah it seems none of them are updating. They're running on k8s [05:11:45] you should see a big "BLAH BLAH" instead of "Langviews Analysis" http://tools.wmflabs.org/langviews-test/ [05:12:03] there I edited the index.php directly [05:15:44] musikanimal: ah hmmm [05:16:53] musikanimal: can you try restarting the webservice? [05:17:23] tools.wmflabs.org is down atm [05:17:38] chasemp: probably going through the webservice restart cycle? [05:17:41] chasemp: probably momentary [05:17:45] yeah - it's up now [05:17:46] chasemp: and seems up to me too [05:17:55] Ah ok [05:20:40] !log tools restart all k8s webservices too [05:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:21:56] musikanimal: restart seems to have brought it back to liveness [05:22:00] musikanimal: try again? [05:22:07] I see BLAH BLAH! [05:22:09] :) [05:22:12] okay [05:22:25] so sorry for my ignorance, with k8s I can still do `webservice restart`? [05:23:52] musikanimal: yes [05:24:31] okay cool. Thanks! [05:26:19] i brought shinken back [05:27:47] yuvipanda: let me know when all restarts are done? [05:28:16] madhuvishy: yup will do [05:28:26] thanks! [05:29:11] madhuvishy: in the email you send out, mention that logs are still not being written until tomorrow? [05:29:17] yuvipanda: yes [05:29:25] madhuvishy: thanks [05:44:35] yuvipanda: Hm.. labs grafana/graphite is still a bit of a roulette. Getting the project dashboard to show 3 graphs is hard. Usually 2/3 or 3/3 are broken. [05:44:44] * Krinkle keeps refreshing until all three render.. [05:44:49] https://grafana-labs.wikimedia.org/dashboard/db/labs-project-board?from=now-3h&to=now&var-project=cvn&var-server=cvn-app4&var-server=cvn-app6 [06:25:36] ssh back, but php can't work? [06:30:59] Shizhao: what issue are you facing? [06:35:55] labs reboot, php doesn't work, see https://tools.wmflabs.org/pub/ [06:36:23] I didn't change the code [06:36:26] that page works for me?
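The restart flow musikanimal and madhuvishy settle on above (`webservice restart` works the same whether the tool runs on gridengine or k8s) can be sketched as a small shell snippet. This is a sketch only: `become` and `webservice` are Tool Labs wrapper commands that exist only on the bastions, and the tool name is simply the example from the log, so the real commands are guarded and shown in comments.

```shell
# Sketch of the restart cycle discussed above. `become` and `webservice`
# are Tool Labs wrappers; the tool name is the example from the log.
TOOL="langviews-test"

# Only attempt this where the Tool Labs wrappers actually exist.
if command -v become >/dev/null 2>&1; then
  become "$TOOL"         # drops you into a shell as the tool account
  # ...then, as the tool:
  # webservice restart   # same front-end command for gridengine and k8s
  # webservice status    # confirm the service came back
fi
```

The point of the wrapper is exactly what musikanimal asks about: users do not need to know which backend is serving their tool to restart it.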
[06:37:06] only the html works [06:38:47] 3 hours ago the page was ok, now it's not [06:39:44] no errors were logged in error.log [06:39:53] yeah, logs won't work for another 12h [06:40:11] :( [06:41:00] Is this caused by the maintenance? [06:42:18] it shouldn't have affected it in any way, but it'll be hard to debug without access to error log [06:43:20] Shizhao: so I'd recommend waiting for the logs to come back in ~12h before attempting to debug [06:43:33] Shizhao: I'm also going to just try restarting it, to see if that helps [06:45:10] Shizhao: try now? [06:45:22] !log tools.pub restarted with webservice stop && webservice --backend=kubernetes start [06:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.pub/SAL [06:51:03] thx [06:54:09] Shizhao: does it work for you? [07:05:57] yuvipanda: no :( [07:07:21] Shizhao: alright. let's check back tomorrow then [07:10:20] en [07:14:56] It seems that all php internal functions are not working [07:15:32] curl is ok [07:34:34] PROBLEM - Free space - all mounts on tools-docker-builder-03 is CRITICAL: CRITICAL: tools.tools-docker-builder-03.diskspace.root.byte_percentfree (<10.00%) [15:47:18] Is there a way I can get the stdout and stderr of 'generic' webservice jobs? I saw https://gerrit.wikimedia.org/r/319798 got merged yesterday which looks like it might be relevant but perhaps output wasn't being logged to ~/error.log before this either [15:59:53] tarrow: We are not logging those, I sent a note last night. We'll switch it back within the next couple hours [16:04:10] Ah cool, so once you switch it back I should see output in ~/error.log again? [16:12:27] tarrow: yes [16:18:25] hi yuvipanda and madhuvishy: I'm getting a "502 Bad Gateway" error when attempting to start a PAWS server. is PAWS still down?
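The `!log tools.pub` entry above records the full stop/start cycle used when a tool needs to come back up on the kubernetes backend, rather than a plain restart. A minimal sketch of the same cycle, assuming the Tool Labs `webservice` wrapper is on PATH and that you are already running as the tool account:

```shell
# Sketch of the cycle from the !log entry above; run as the tool account.
restart_on_k8s() {
  webservice stop                        # tear down the current backend
  webservice --backend=kubernetes start  # bring it back up on k8s
}

# Guarded so this is a no-op outside a Tool Labs bastion.
if command -v webservice >/dev/null 2>&1; then
  restart_on_k8s
fi
```

The explicit `--backend=kubernetes` flag matters here: a bare `webservice start` would bring the tool up on whatever backend it last used.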
[16:18:39] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [16:21:26] zareen: shouldn't be - i'm looking [16:21:54] RECOVERY - Puppet staleness on tools-puppetmaster-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:22:01] zareen: it looks fine to me now [16:22:20] could you check again? [16:22:22] RECOVERY - Puppet staleness on tools-docker-builder-03 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:23:04] RECOVERY - Puppet staleness on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:23:06] RECOVERY - Puppet staleness on tools-flannel-etcd-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:23:14] RECOVERY - Puppet staleness on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:24:36] RECOVERY - Puppet staleness on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:25:05] madhuvishy: still not working for me [16:25:16] RECOVERY - Puppet staleness on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:26:08] RECOVERY - Puppet staleness on tools-k8s-etcd-03 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:26:41] zareen: can you try hard reloading? 
i am on phone, can look in a bit [16:26:51] RECOVERY - Puppet staleness on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:27:17] RECOVERY - Puppet staleness on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:27:17] RECOVERY - Puppet staleness on tools-k8s-etcd-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:27:45] RECOVERY - Puppet staleness on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:27:51] RECOVERY - Puppet staleness on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:28:35] RECOVERY - Puppet staleness on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:28:41] RECOVERY - Puppet staleness on tools-logs-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:28:45] RECOVERY - Puppet staleness on tools-prometheus-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:28:45] RECOVERY - Puppet staleness on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:29:37] RECOVERY - Puppet staleness on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:29:53] RECOVERY - Puppet staleness on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:30:52] RECOVERY - Puppet staleness on tools-k8s-etcd-02 is OK: OK: Less than 1.00% above the threshold [3600.0] [16:31:03] madhuvishy: hard reload didn't fix it [16:31:11] zareen: ok looking now [16:33:38] RECOVERY - Puppet run on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:07:45] bd808: sorry to bug you. If you have a chance could you take a look at T149709? [17:07:45] T149709: Possible use of tools-lab-elasticsearch cluster - https://phabricator.wikimedia.org/T149709 [17:11:39] tarrow: man. I just keep forgetting about you. :( I'll run the "fancy" new process today and get you your creds. [17:12:30] zareen: sorry, can you check now? 
[17:13:17] madhuvishy: all good now, thanks for the help [17:13:26] zareen: cool :) np! [17:24:01] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:26:57] madhuvishy: what did you do to fix? [17:28:09] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:28:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:28:17] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:29:23] chasemp: paws? I logged in as the pod tool, did kubectl get pods - it didn't have a pod for zareen - but i just deleted the hub and proxy pods and they respawned [17:29:33] paws tool* [17:29:40] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:29:54] i realized i didn't know anything about this setup on k8s [17:31:54] PROBLEM - Puppet run on tools-exec-1213 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:32:30] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:32:54] PROBLEM - Puppet run on tools-exec-1211 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:32:58] PROBLEM - Puppet run on tools-exec-1403 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:33:02] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:33:20] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:33:32] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:33:48] PROBLEM - Puppet run on tools-exec-1402 
is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:35:00] PROBLEM - Puppet run on tools-exec-1202 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:35:07] PROBLEM - Puppet run on tools-exec-1221 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:35:13] PROBLEM - Puppet run on tools-exec-1413 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:35:49] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:36:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:36:19] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:37:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:38:15] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:38:39] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:38:43] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:39:09] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:39:33] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:39:39] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:40:31] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:42:06] PROBLEM - Puppet run on tools-exec-1210 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [17:42:26] 
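The PAWS recovery madhuvishy described above (log in as the paws tool, `kubectl get pods`, then delete the hub and proxy pods so the deployment respawns them) could be sketched as follows. The pod names here are hypothetical placeholders, since real pods carry generated suffixes; the commands are echoed rather than executed because they require credentials for the paws tool's cluster.

```shell
# Sketch of the PAWS recovery described above. Pod names are hypothetical
# placeholders; in practice, list the pods first to find the generated names.
HUB_POD="hub-xxxxx"
PROXY_POD="proxy-xxxxx"

# The two commands, as they would be run from the paws tool account:
echo "kubectl get pods"                              # look for the missing user pod
echo "kubectl delete pod ${HUB_POD} ${PROXY_POD}"    # deployment respawns both
```

Deleting the pods is safe precisely because they are managed by a deployment, which recreates them automatically.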
PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:43:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:43:52] PROBLEM - Puppet run on tools-exec-1214 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:44:16] PROBLEM - Puppet run on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:44:19] PROBLEM - Puppet run on tools-exec-1201 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:44:20] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:45:24] PROBLEM - Puppet run on tools-exec-1208 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:46:20] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:46:50] PROBLEM - Puppet run on tools-exec-1217 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:47:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:47:13] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:47:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:47:41] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:47:43] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [17:47:53] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:48:18] andrewbogott: is this ldap issues^? 
[17:48:21] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:48:57] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:49:01] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:49:07] PROBLEM - Puppet run on tools-webgrid-lighttpd-1202 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:49:15] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:49:23] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:49:51] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:49:53] PROBLEM - Puppet run on tools-exec-1220 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:50:11] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:50:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:50:20] I no longer think it's ldap, but yeah, it's all those puppet failures that I was trying to investigate [17:50:38] PROBLEM - Puppet run on tools-webgrid-lighttpd-1415 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:51:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:51:56] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:53:16] PROBLEM - Puppet run on tools-exec-1420 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:54:16] PROBLEM - Puppet run on tools-exec-1207 is CRITICAL: 
CRITICAL: 33.33% of data above the critical threshold [0.0] [17:54:24] PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:54:40] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:55:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:55:13] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:55:15] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:56:02] PROBLEM - Puppet run on tools-exec-1415 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:57:12] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:57:32] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:57:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [17:57:38] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:57:50] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:58:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [17:58:08] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:58:24] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:59:02] PROBLEM - Puppet run on tools-exec-1218 is CRITICAL: CRITICAL: 66.67% of data above the critical 
threshold [0.0] [17:59:12] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:59:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [18:02:45] bd808: thanks! I really appreciate it :-) [18:03:55] tarrow: let me know if you run into problems. I've been the only user of this elasticsearch cluster so far so you are a bit of a test case :) [18:04:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:08:55] are atm nfs problems? [18:09:24] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [18:12:23] Steinsplitter: can you be more specific? [18:13:01] Steinsplitter: i dont believe so probably just servers that are restarting or puppets not being used by puppetmasters atm [18:13:12] RECOVERY - Puppet run on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [18:14:02] chasemp: i get errors with strange phats such as /mnt/nfs/labstore-secondary-tools-project/ and i can't git pull etc. just wondering if nfs problem atm otherwise i need to find out what is broken now. [18:14:21] I'm not sure what this means 'strange phats' [18:14:44] but that path works for me w/ my own tool on tools-bastion-03 [18:14:56] that is the mount that provides /data/project fyi [18:16:22] How do i move between bastions? 
[18:16:42] logout of one and into another [18:17:31] I just get 02 [18:17:49] zppix|mobile: try direct: tools-bastion-03.eqiad.wmflabs [18:18:06] Ah ok i was just curious [18:18:09] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [18:18:19] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [18:18:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [18:18:49] RECOVERY - Puppet run on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [18:19:09] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [18:19:11] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [18:19:31] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [18:19:37] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [18:19:39] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [18:19:53] Hey bd808 stashbot needs rejoined to my chan again [18:20:12] RECOVERY - Puppet run on tools-exec-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [18:20:31] zppix|mobile: restarting now.... 
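chasemp's suggestion above (connect to a named bastion directly instead of the round-robin alias) amounts to pointing ssh at the specific host. A sketch, with a hypothetical shell account name; the command is echoed rather than run, since connecting requires a Labs account and SSH key.

```shell
# Sketch: targeting a specific bastion, per the suggestion above.
SHELL_USER="example"                       # hypothetical shell account name
BASTION="tools-bastion-03.eqiad.wmflabs"   # direct host, bypassing the alias

# Echoed rather than executed: connecting needs Labs credentials.
echo "ssh ${SHELL_USER}@${BASTION}"
```

The generic `login.tools.wmflabs.org` name can land you on any bastion, which is why zppix kept getting bastion-02.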
[18:20:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [18:20:38] RECOVERY - Puppet run on tools-webgrid-lighttpd-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [18:21:02] RECOVERY - Puppet run on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [18:21:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [18:21:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [18:21:20] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [18:22:30] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [18:22:46] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [18:23:00] RECOVERY - Puppet run on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [18:23:08] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [18:23:40] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [18:23:47] chasemp: \o/ wors again. likely just a machine was bus. thanks. 
[18:23:55] *k [18:24:08] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:24:12] PROBLEM - Puppet run on tools-exec-1420 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:24:18] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:24:26] PROBLEM - Puppet run on tools-webgrid-lighttpd-1401 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [18:24:41] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:24:43] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:25:09] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [18:25:13] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:25:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [18:25:35] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [18:26:13] PROBLEM - Puppet run on tools-exec-1413 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:26:33] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:27:01] PROBLEM - Puppet run on tools-exec-1415 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [18:27:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [18:27:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:32:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: 
OK: Less than 1.00% above the threshold [0.0] [18:33:04] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [18:33:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [18:33:20] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [18:33:58] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [18:34:24] RECOVERY - Puppet run on tools-webgrid-lighttpd-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [18:34:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [18:35:12] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [18:36:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [18:36:48] looks like the cron box is broken [18:36:57] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [18:37:01] RECOVERY - Puppet run on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [18:37:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [18:37:09] error: SGE_ROOT directory "/var/lib/gridengine" doesn't exist [18:37:28] Betacommand: ok thanks, somewhat of a known freak thing we are cleaning up [18:37:29] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [18:37:30] I'll hit that next [18:37:57] Is kubectl affected? [18:38:04] shouldn't be at all [18:38:07] chasemp: np. 
thought I would let you know since I haven't seen any outage emails [18:38:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [18:38:17] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [18:38:38] it's entirely internal grid engine house keeping shenanigans [18:39:09] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [18:39:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [18:39:51] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [18:41:09] RECOVERY - Puppet run on tools-exec-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [18:43:18] Betacommand: cron should be ok [18:44:16] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0] [18:44:42] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [18:45:08] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [18:45:34] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [18:49:18] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [18:49:41] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [18:54:19] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [18:55:13] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:57:16] !log tools.wikibugs restarted wikibugs [18:57:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL [18:57:59] Labs, DBA: Recover/Rename p50380g51020_perfectbot on replica LabsDBs or toolDBs - https://phabricator.wikimedia.org/T150659#2796674
(jcrespo) Open>declined I apologize again. [18:58:56] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [18:59:12] RECOVERY - Puppet run on tools-exec-1420 is OK: OK: Less than 1.00% above the threshold [0.0] [19:05:01] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [19:05:15] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [19:06:03] Labs, Labs-Infrastructure, LDAP: Remove shell user "80686" - https://phabricator.wikimedia.org/T63967#2796723 (demon) >>! In T63967#2600758, @MoritzMuehlenhoff wrote: > validnames is a configuration setting of nslcd and configured via a regex in puppet. There's a comment that the regex must be kept i... [19:10:55] Labs, Gerrit, wikitech.wikimedia.org, LDAP: Alter full name on Gerrit - https://phabricator.wikimedia.org/T149976#2796754 (demon) Open>Resolved Sorry for the delay, all done. Please note, your shell name remains `gryllida` as those are not changed. Your display name and what you login to...
[19:23:29] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [19:25:13] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [19:26:18] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [19:29:16] RECOVERY - Puppet run on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [19:30:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [19:35:23] RECOVERY - Puppet run on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [19:37:05] RECOVERY - Puppet run on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [19:39:41] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [19:41:54] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0] [19:42:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [19:42:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [19:42:36] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [19:42:54] RECOVERY - Puppet run on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0] [19:43:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [19:44:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [19:44:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [19:47:11] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [19:47:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0] 
[19:47:39] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:47:41] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:48:32] is it normal that I have a jsub error on dev.tools but not on login.tools?
[19:48:32] error: SGE_ROOT directory "/var/lib/gridengine" doesn't exist
[19:48:51] Framawiki: no, it's a known issue we are dealing w/
[19:48:55] RECOVERY - Puppet run on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:49:05] Framawiki: what host is that?
[19:51:36] tools.framabot@tools-bastion-02
[19:51:46] dev.login
[19:51:49] RECOVERY - Puppet run on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:51:52] !log reboot tools-precise-dev
[19:52:11] Unknown project "reboot"
[19:52:20] !log tools reboot tools-precise-dev
[19:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:52:29] Framawiki: sure, I'll handle it, will require a quick reboot tho
[19:52:38] thank you
[19:54:02] RECOVERY - Puppet run on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:58:22] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:59:52] RECOVERY - Puppet run on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:00:06] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:12:26] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:13:14] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[20:15:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:15:38] PROBLEM - Host tools-puppetmaster-01 is DOWN: CRITICAL - Host Unreachable (10.68.22.61)
[20:17:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:17:53] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[20:19:19] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0]
[20:20:42] runs now, thanks!
[20:22:42] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:40:46] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:00:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:01:09] 06Labs, 10Labs-Infrastructure, 10DBA, 07Epic, 07Tracking: Labs databases rearchitecture (tracking) - https://phabricator.wikimedia.org/T140788#2797093 (10jcrespo)
[21:05:51] yuvipanda hi, I'm wondering if we could switch wikibugs over to Kubernetes, as long as it is ok with legoktm and valhallasw`vecto, please?
[21:06:02] It would make restarting the bot easier.
[21:06:19] I'm not a wikibugs maintainer, so you have to ask them and not me :)
[21:06:47] Oh ok
[21:06:49] sorry
[21:09:07] 06Labs, 10Labs-Infrastructure, 10DBA: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2797122 (10jcrespo)
[21:10:53] Paladox: uh, why would that make restarting easier?
[21:11:42] According to https://www.mediawiki.org/wiki/Wikibugs#Deploying_changes you have to install fab on your local pc
[21:11:45] But no, I'd rather not. The current infrastructure is built around SGE
[21:11:48] Ok
[21:11:50] Yes
[21:11:52] So?
[21:12:14] With Kubernetes you don't need to install anything on your local pc
[21:13:56] Yes. And the idea is to develop locally, then push the changes to tools, so you need a local setup anyway.
[21:15:22] Oh
[21:21:28] valhallasw`vecto I'm wondering, could we upgrade irc3 on wikibugs to 0.9.x please? It requires Python 3.3+ and no longer supports Python 2.x; could we upgrade Python on wikibugs please?
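Framawiki's jsub failure above (`error: SGE_ROOT directory "/var/lib/gridengine" doesn't exist`) comes down to a simple directory check that gridengine's client tools perform before anything else. A minimal sketch of that check, assuming the standard Tool Labs path; the helper name `check_sge_root` is illustrative, not part of gridengine:

```shell
#!/bin/sh
# Sketch: the directory test behind the jsub/SGE error above.
# Assumption: /var/lib/gridengine is the SGE_ROOT used on Tool Labs bastions.
check_sge_root() {
  if [ -d "$1" ]; then
    echo "SGE_ROOT ok: $1"
  else
    echo "error: SGE_ROOT directory \"$1\" doesn't exist"
  fi
}

check_sge_root "${SGE_ROOT:-/var/lib/gridengine}"
```

On tools-precise-dev the directory check failed because the NFS-backed gridengine state was not mounted; the reboot logged at 19:52 restored the mount.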
[21:25:08] (03Draft1) 10Paladox: Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742
[21:25:11] (03Draft2) 10Paladox: Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742
[21:25:26] (03CR) 10jenkins-bot: [V: 04-1] Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742 (owner: 10Paladox)
[21:25:43] Paladox: why?
[21:26:19] It's currently working. If there is no reason to change things, don't change them
[21:26:24] Ok
[21:26:32] (03Abandoned) 10Paladox: Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742 (owner: 10Paladox)
[21:28:18] (03Draft1) 10Paladox: Fix tox-jessie [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321743
[21:28:20] (03Draft2) 10Paladox: Fix tox-jessie [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321743
[21:28:48] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:28:56] (03CR) 10Paladox: "Fix failures found in https://integration.wikimedia.org/ci/job/tox-jessie/13524/console" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321743 (owner: 10Paladox)
[21:32:28] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 15User-bd808, 10Wikimedia-Developer-Summit (2017): Developing community norms for vital bots and tools - https://phabricator.wikimedia.org/T149312#2797257 (10bd808)
[21:38:24] 06Labs, 10Labs-Infrastructure, 06Operations, 10netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2797271 (10Andrew)
[21:42:28] 06Labs, 10Labs-Infrastructure, 06Operations, 10netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2797284 (10Andrew)
[21:43:47] 06Labs, 10Labs-Infrastructure, 06Operations, 10netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2773828 (10Andrew)
[21:44:56] 06Labs, 10Labs-Infrastructure, 06Operations, 10netops, and 2 others: Provide public access to OpenStack APIs - https://phabricator.wikimedia.org/T150092#2797286 (10Andrew)
[21:45:39] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 15User-bd808, 10Wikimedia-Developer-Summit (2017): Developing community norms for vital bots and tools - https://phabricator.wikimedia.org/T149312#2797290 (10bd808)
[21:46:15] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[21:49:37] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 15User-bd808, 10Wikimedia-Developer-Summit (2017): Developing community norms for vital bots and tools - https://phabricator.wikimedia.org/T149312#2797294 (10bd808) Call for participation [[https://lists.wikimedia.org/pipermail/labs-l/2016-November/004771....
[22:05:22] are you all aware grid engine is not working
[22:06:55] Zppix: I can do things, that's a difficult statement, can you be more specific?
[22:08:32] actually my jobs are still working in the grid, idk what the problem is?
[22:08:55] Zppix ^^
[22:09:01] try running qstats
[22:09:04] qstat*
[22:09:18] it works fine for me as a tool I have
[22:09:33] from where, for what tool, etc are you doing this
[22:10:25] oh wait, webservice migrated to kubectl, didn't it?
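The diagnostic steps suggested above (`qstat` to list your grid engine jobs, `webservice status` to see which backend is serving the webservice) can be wrapped in a small guard so the same snippet degrades gracefully on hosts without the Tool Labs tooling. The `run_if_present` helper is illustrative, not part of the toolchain:

```shell
#!/bin/sh
# Sketch: run the Tool Labs diagnostics discussed above only if the
# command actually exists on this host (e.g. on a tools bastion).
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "skipping: $1 not available on this host"
  fi
}

run_if_present qstat               # grid engine: list the tool's jobs
run_if_present webservice status   # reports grid engine vs Kubernetes backend
```

Run this as the tool account (`become <toolname>` first), since both commands report on the current user's jobs and webservice.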
[22:10:53] not by default, but it is possible that you moved your webservice
[22:11:03] try `webservice status`
[22:13:05] weird, now it works. Was grid engine down for maintenance earlier this afternoon (UTC-5)?
[22:13:38] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[22:33:32] 10Tool-Labs-tools-Zppixbot, 05Goal: Complete config - https://phabricator.wikimedia.org/T148944#2797438 (10Zppix) Thanks to @tom29739 we now have the bot on kubectl and running perfectly so far, now all is left is to make sure config is satisfactory for the time being at least and doing some other customizatio...
[22:33:47] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:48:38] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:50:40] is it possible to ssh to a job that's on kubectl like you could with grid engine jobs?
[22:52:57] (03CR) 10Legoktm: [C: 032] Fix tox-jessie [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321743 (owner: 10Paladox)
[22:53:21] (03Merged) 10jenkins-bot: Fix tox-jessie [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321743 (owner: 10Paladox)
[22:53:30] (03CR) 10Paladox: "Thanks." [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321743 (owner: 10Paladox)
[22:53:53] (03Restored) 10Paladox: Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742 (owner: 10Paladox)
[22:53:55] (03PS3) 10Paladox: Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742
[22:53:58] (03Abandoned) 10Paladox: Wikibugs: Update irc3 to 0.8.9 [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742 (owner: 10Paladox)
[22:54:04] (03CR) 10Paladox: "recheck" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/321742 (owner: 10Paladox)
[22:56:15] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:59:13] (03PS1) 10Yurik: Added mapdata to interactive [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/321800
[23:02:06] (03CR) 10Zppix: "Is this patch complete and ready if so im okay with merging it" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/321800 (owner: 10Yurik)
[23:02:26] Zppix, ??
[23:02:39] is that a question or a statement? :)
[23:03:15] question
[23:04:40] Zppix that looks ok, yurik want us to merge? Or do you still need to update your patch?
[23:05:02] Zppix, it's ready to go :)
[23:05:06] thx paladox
[23:05:07] paladox ok
[23:05:09] i will merge
[23:05:12] you're welcome
[23:05:17] thx!
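On Zppix's earlier question about getting a shell into a Kubernetes-backed job: the usual analogue of attaching to a grid engine job is `kubectl exec -it <pod> -- /bin/bash`. A sketch that only builds the command string (so it is safe to run anywhere); `k8s_shell_cmd` and the pod name are illustrative, and it assumes kubectl on the bastion is already scoped to the tool's namespace:

```shell
#!/bin/sh
# Sketch: construct (without executing) the kubectl command that opens an
# interactive shell in a pod -- the rough equivalent of ssh'ing to a job.
k8s_shell_cmd() {
  printf 'kubectl exec -it %s -- /bin/bash\n' "$1"
}

# On a real bastion you would first find the pod with `kubectl get pods`,
# then run the command this prints:
k8s_shell_cmd "mytool-1234567890-abcde"
```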
[23:05:18] (03CR) 10Zppix: [C: 032] Added mapdata to interactive [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/321800 (owner: 10Yurik)
[23:05:58] (03Merged) 10jenkins-bot: Added mapdata to interactive [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/321800 (owner: 10Yurik)
[23:05:59] !log tools.lolrrit-wm merging https://gerrit.wikimedia.org/r/#/c/321800/
[23:06:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL
[23:06:32] !log tools.lolrrit-wm deploying https://gerrit.wikimedia.org/r/#/c/321800/
[23:06:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL
[23:08:45] yurik: it should be broadcasting changes to mapdata to the proper channel now; if it doesn't work, feel free to let me or paladox or other grrrit-wm maintainers know
[23:08:57] :)
[23:23:25] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 4.16 ms
[23:25:31] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)