[02:02:38] Labs, Tool-Labs: DNS resolution sometimes fails on tools-bastion-03 - https://phabricator.wikimedia.org/T143194#2569135 (Samwilson) Here's the job status: ``` tools.mediawiki-feeds@tools-bastion-03:~$ qstat -j 9910004 ============================================================== job_number:...
[06:05:04] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL - No data received from host
[06:10:05] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.044 second response time
[06:49:26] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:27:20] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Bharathirajas was created, changed by Bharathirajas link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Bharathirajas edit summary: Created page with "{{Tools Access Request |Justification=To analyze Data |Completed=false |User Name=Bharathirajas }}"
[07:29:25] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:39:15] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[10:45:45] Labs, Tool-Labs: Python is different versions on Tool labs and Grit. How run a scripts on grid and with scheduling? - https://phabricator.wikimedia.org/T143473#2569290 (Vladis13)
[10:47:36] Labs, Tool-Labs: Python is different versions on Tool labs and Grit. How run a scripts on grid and with scheduling? - https://phabricator.wikimedia.org/T143473#2569304 (Vladis13)
[10:53:55] Labs, Tool-Labs: Python is different versions on Tool labs and Grit. How run a scripts on grid and with scheduling? - https://phabricator.wikimedia.org/T143473#2569290 (valhallasw) bastion hosts run trusty; to make grid jobs also run on trusty, pass "-l release=trusty" as parameter to jsub.
[11:28:14] is tools-mail actually down?
[11:28:24] or slow?
[11:32:39] Luke|away: ?
[11:34:04] valhallasw`cloud: I actually don't receive mails if someone sends them to the @tools.wmflabs.org address. I checked by sending a mail to myself too
[11:34:12] so is it down, or is there a delay?
[11:34:31] It's always worked for me.
[11:34:33] * tom29739 tries
[11:36:03] Luke|away: what address are you mailing to?
[11:36:30] valhallasw`cloud: luke081515 [at] web [dot] de
[11:36:41] Luke|away: what @tools.wmflabs.org address
[11:36:51] oh. luke081515@tools...
[11:37:06] (the other mail is already public via gerrit, so not a problem ;))
[11:38:18] hrm, there's nothing with that mail address in the exim log. When did you try this?
[11:39:38] hrrrrm
[11:39:38] It doesn't work for me either.
[11:40:03] valhallasw`cloud: 11 minutes ago
[11:40:18] valhallasw@maeglin:~$ telnet mail.tools.wmflabs.org. 25 0
[11:40:18] Trying 208.80.155.162...
[11:40:19] welllll
[11:40:30] btw, if you take a look at nagf, tools-mail looks like it's been shut down since yesterday
[11:40:42] the host is called tools-mail-01
[11:40:46] ah, ok
[11:40:52] and I can ssh to that host
[11:41:05] ...unless it's not that host and tools-mail-01 is something else
[11:41:06] * valhallasw`cloud checks
[11:41:19] yeah, we actually have both
[11:41:19] ugh.
[11:41:35] and the external IP is linked to tools-mail. Ok.
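The two checks valhallasw runs above, probing the SMTP port and grepping the exim log, are the standard first steps for "is mail down?". A minimal sketch, assuming exim4's default Debian/Ubuntu log path (the exact grep used isn't shown in the log, and the address is the one from the conversation):

```
# a healthy relay answers with a '220 ...' banner on connect:
telnet mail.tools.wmflabs.org 25

# on the mail host itself, check whether the address appears in exim's
# main log (default location on Debian/Ubuntu):
sudo grep 'luke081515@tools.wmflabs.org' /var/log/exim4/mainlog
```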
[11:41:41] valhallasw`cloud: I'm sure that I should have gotten at least 3 mails from that address, maybe 4; that's why I'm asking
[11:42:40] !log tools rebooting tools-mail (hanging)
[11:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[11:45:12] valhallasw`cloud: does that mean that we'll get the already-sent mail later, or are the mails from before gone?
[11:45:23] should be delivered later
[11:45:28] valhallasw`cloud, is tools-mail hanging another case of T141673?
[11:45:28] T141673: Track labs instances hanging - https://phabricator.wikimedia.org/T141673
[11:46:19] ok, good
[11:49:12] Labs, Labs-Infrastructure, Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2569369 (valhallasw)
[11:49:13] tom29739: yes
[11:49:18] RECOVERY - SSH on tools-mail is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[11:52:07] except.. shit
[11:53:35] PROBLEM - Puppet staleness on tools-mail is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[11:54:13] Labs, Tool-Labs: tools-mail: "Couldn't open journal file /var/spool/exim4/input//1baLLw-00025F-75-J: No space left on device" - https://phabricator.wikimedia.org/T143476#2569371 (valhallasw)
[11:55:15] Labs, Tool-Labs: tools-mail: "Couldn't open journal file /var/spool/exim4/input//1baLLw-00025F-75-J: No space left on device" - https://phabricator.wikimedia.org/T143476#2569384 (valhallasw) ``` df -i /var/ Filesystem Inodes IUsed IFree IUse% Mounted on /dev/vda2 125184 125184 0 100% /var...
[12:09:59] valhallasw`cloud: does https://phabricator.wikimedia.org/T143476 mean that sent mails are affected, do you know?
[12:11:25] ?
[12:11:46] it means exim is not processing any mails right now
[12:24:53] Labs, Tool-Labs: tools-mail: "Couldn't open journal file /var/spool/exim4/input//1baLLw-00025F-75-J: No space left on device" - https://phabricator.wikimedia.org/T143476#2569420 (valhallasw) The mail log is full of messages to `tools.crosswatch`, which can't be delivered (gmail says: 'this user is receiv...
[12:51:00] valhallasw`cloud: actually I got 1 of 4 expected mails, so at least part of it works already :)
[12:51:34] Luke081515: right, the others are probably in the sender's email server's queue
[12:52:21] now 2/4 :)
[12:53:01] valhallasw`cloud, if the gmail user has used up all their storage then that user won't receive mail.
[12:53:09] Gmail will just bounce it.
[12:53:24] And the mail server will try to resend, etc.
[12:53:31] tom29739: I don't think that's the issue
[12:53:40] there were 60k emails
[12:54:10] Crosswatch is unmaintained if I recall.
[12:54:25] yep, sitic is sadly inactive
[12:54:29] since September
[12:54:56] Those look like automated mail.
[12:55:15] they are
[12:55:21] It'll keep sending the mail if the problem isn't solved.
[12:57:04] if that happens, I'm shutting down crosswatch
[12:57:13] until the issue is resolved
[12:58:12] Labs, Tool-Labs, crosswatch: Crosswatch sends out large amounts of error mails, crashing tools-mail - https://phabricator.wikimedia.org/T143476#2569452 (valhallasw)
[12:58:35] RECOVERY - Puppet staleness on tools-mail is OK: OK: Less than 1.00% above the threshold [3600.0]
[12:59:05] valhallasw`cloud: alternatively, shut down crosswatch mail, not the whole tool?
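The T143476 failure mode above is inode exhaustion rather than disk-space exhaustion: every queued exim message holds spool files, and roughly 60k undeliverable crosswatch mails consumed all the inodes on /var. A cleanup sketch with stock exim4 tooling; the log does not show the exact commands that were run, so treat this as illustrative:

```
df -i /var                 # IUse% at 100 means inodes ran out, not bytes
sudo exim -bpc             # number of messages currently queued
sudo exim -bp | head -20   # peek at the oldest queue entries

# drop the undeliverable crosswatch messages; exiqgrep -i prints only
# message IDs, -r matches the recipient address:
sudo exiqgrep -i -r 'tools.crosswatch' | xargs -r -n 50 sudo exim -Mrm
```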
[13:02:00] Crosswatch is broken anyway.
[13:02:20] it's running, isn't it?
[13:02:34] works for my watchlists
[13:03:33] OK, it appears to be working now.
[13:03:51] I've tried it several times in the past and it hasn't worked.
[14:17:10] With the Kubernetes setup, can custom images be built? The registry in the docs accepts my login, then gives me unauthorized on an attempted push. Interested in using it for a C++ service that needs hitting to get running.
[14:17:57] Damianz: iirc no, because that would give you root access via NFS
[14:20:02] Maybe... does it any more than having shell for web services, though? Dunno, haven't looked into it that much; just got the server built against jessie finally. Maybe back to plan A, static compiled binary and hope :)
[14:20:33] Damianz: web services run as the current user
[14:20:48] also in k8s
[14:21:08] but if someone could build their own docker container, they could just add themselves to sudoers, or add a suid binary, etc.
[14:24:14] Ah, yep... never mind, there are a few other ways to make it nicer than it is at the moment; that was just a nice option if possible :)
[14:25:46] Damianz: it might be possible in the future with non-NFS-enabled containers, but that's pretty far on the horizon
[14:27:51] Cool... ironically I could do without NFS :)
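To spell out the 14:21 point: nothing stops an arbitrary image build from baking in a set-uid root shell, and since Tool Labs mounts NFS home directories into containers, root inside a container would mean access to every tool's files. A hypothetical illustration of why pushes are rejected, not anything the registry accepts:

```
# two lines a malicious Dockerfile RUN step could contain:
cp /bin/bash /usr/local/bin/rootsh
chmod u+s /usr/local/bin/rootsh
# any user in the container can then get root with 'rootsh -p'
# (bash only honours the set-uid bit when invoked with -p), and
# root inside the container can read other tools' NFS home directories
```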
[15:13:47] (PS1) Jean-Frédéric: Safeguard against non-array $row [labs/tools/heritage] - https://gerrit.wikimedia.org/r/305794 (https://phabricator.wikimedia.org/T137505)
[15:17:34] (PS2) Jean-Frédéric: Safeguard against non-array $row [labs/tools/heritage] - https://gerrit.wikimedia.org/r/305794 (https://phabricator.wikimedia.org/T143481)
[15:29:33] (CR) Lokal Profil: Safeguard against non-array $row (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/305794 (https://phabricator.wikimedia.org/T143481) (owner: Jean-Frédéric)
[15:31:17] (CR) Lokal Profil: Refactor database configuration handling (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/303428 (owner: Jean-Frédéric)
[15:32:53] (CR) Lokal Profil: Setup local development environment for ErfgoedBot (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/303498 (owner: Jean-Frédéric)
[15:35:05] (CR) Lokal Profil: "> * The toolbox all in all was a worthy attempt − to get nice," [labs/tools/heritage] - https://gerrit.wikimedia.org/r/303933 (https://phabricator.wikimedia.org/T142570) (owner: EdouardHue)
[16:27:01] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[16:34:55] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[16:51:03] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[18:08:15] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:08:44] PROBLEM - Puppet run on tools-proxy-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:10:07] Guest95904: ^ did you deploy that clush change on tools-puppetmaster or something like that?
[19:10:16] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: secret(): invalid secret clush/clushuser.pub at /etc/puppet/modules/clush/manifests/target.pp:15 on node tools-proxy-02.tools.eqiad.wmflabs
[19:10:30] I did
[19:10:49] I also checked that it works
[19:10:54] so it is probably puppet being transient?
[19:11:25] tools-precise-dev is really broken
[19:11:26] hrm
[19:11:52] tools-precise-dev is also on tools-puppetmaster?
[19:12:01] but maybe didn't get new certs?
[19:12:20] Error: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: tools-puppetmaster-01.tools.eqiad.wmflabs]
[19:12:47] oh
[19:12:48] yes
[19:12:49] looks like it
[19:12:53] :| I must've missed it
[19:13:07] valhallasw`cloud can you rm -rf /var/lib/puppet/ssl and run puppet, if you're already there? I'll sign on master
[19:13:39] ok
[19:14:00] yuvipanda: does this mean all hosts are on tools-puppetmaster now?
[19:14:24] valhallasw`cloud yup
[19:14:27] yuvipanda: Info: Certificate Request fingerprint (SHA256): 55:2F:5E:37:2A:E0:15:69:4E:BB:5A:5F:B4:43:BF:38:D6:08:13:52:5E:2B:34:53:6B:1E:C3:0E:FB:1A:57:23
[19:14:28] D6: Interactive deployment shell aka iscap - https://phabricator.wikimedia.org/D6
[19:14:47] valhallasw`cloud confirmed and signed!
[19:15:10] yuvipanda: cool! that means tool-labs puppet compiler is basically one CR away
[19:15:32] (instead of one CR and one 'convince someone to run scripts on the central labs puppetmaster')
[19:17:23] * yuvipanda nods
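The rm-and-sign exchange above is the usual way to re-home a Puppet 3.x agent onto a new puppetmaster. A sketch of both halves; the hostname is just an example taken from this log:

```
# on the agent: discard certificates issued by the old CA, request new ones
sudo rm -rf /var/lib/puppet/ssl
sudo puppet agent --test        # generates a CSR and prints its fingerprint

# on the new puppetmaster: verify the fingerprint matches, then sign
sudo puppet cert list           # pending certificate requests
sudo puppet cert sign tools-precise-dev.tools.eqiad.wmflabs
```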
[19:17:39] I'm off Monday and Tuesday, helping run a Python bootcamp
[19:17:59] tools-docker-builder-01 is also failing
[19:18:12] * valhallasw`cloud checks why
[19:18:27] Error: Failed to apply catalog: Could not find dependency Package[docker-engine] for Service[docker] at /etc/puppet/modules/docker/manifests/engine.pp:35
[19:18:36] I'll try an apt-get update
[19:18:46] have fun, yuvipanda
[19:19:10] valhallasw`cloud no, I think that requires some puppet changes I haven't done yet.
[19:19:22] yeah, something with a duplicate sources.list
[19:28:16] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[19:37:19] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:39:21] valhallasw`cloud I'm super close to having clush fully working :D
[19:40:07] clush is like salt but better?
[19:40:40] valhallasw`cloud yeah, and over ssh
[19:40:46] and you can do things like
[19:40:54] 'run this command on all trusty exec hosts, 8 at a time'
[19:40:57] and it works reliably
[19:40:59] oh nice
[19:41:26] can also 'copy this script to all trusty exec hosts and run them, 8 at a time, with a command timeout of 2 minutes, collect all output and deduplicate'
[19:41:31] and it works reliably
[19:41:39] also has a nice Python API so you can script it from Python rather than bash
[19:41:48] I've been using it from my local machine for many months now
[19:41:56] since it doesn't require any extra setup in that case
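For reference, the two workflows described above map directly onto clush flags. A sketch, assuming a node group named exec-trusty is defined in the ClusterShell group configuration (only the 'all' group is confirmed later in this log):

```
# run a command on all trusty exec hosts, 8 at a time (-f fanout),
# gathering and deduplicating identical output (-b):
clush -g exec-trusty -f 8 -b 'uname -r'

# copy a script to the same hosts and run it with a 2-minute
# per-command timeout (-u):
clush -g exec-trusty --copy fix.sh --dest /tmp/fix.sh
clush -g exec-trusty -f 8 -u 120 -b 'bash /tmp/fix.sh'
```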
[19:48:43] RECOVERY - Puppet run on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:56:54] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:57:46] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:01:51] going to be a bunch of failures because I just quit a 'run puppet on all nodes' :)
[20:01:55] (because I realized that was a terrible idea)
[20:02:55] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:03:09] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:03:17] PROBLEM - Puppet run on tools-k8s-master-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:03:28] yeah, we should really only log failures if the cron one fails
[20:03:32] bah
[20:04:12] PROBLEM - Puppet run on tools-flannel-etcd-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:04:23] PROBLEM - Puppet run on tools-logs-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:04:41] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:04:46] valhallasw`cloud yeah, I agree
[20:05:00] valhallasw`cloud I was thinking of doing this with prometheus in some way. it could write out a status in the puppet-run script that the cron calls
[20:05:05] and we can alert only on that
[20:05:19] PROBLEM - Puppet run on tools-flannel-etcd-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:05:24] yuvipanda: just tail /var/log/puppet.log | grep Error ?
[20:06:00] there's nothing written there on manual runs, and we can ignore 'lock file' warnings the first k times
[20:06:13] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:07:12] valhallasw`cloud mhm, that sounds much more indirect tho
[20:07:30] it is
[20:08:17] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:08:49] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:13:09] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:13:43] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Bharathirajas was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=817880 edit summary:
[20:14:12] RECOVERY - Puppet staleness on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [3600.0]
[20:14:16] RECOVERY - Puppet run on tools-flannel-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:18:18] RECOVERY - Puppet run on tools-k8s-master-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:25:19] RECOVERY - Puppet run on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:26:13] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[20:29:41] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:17] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:33:47] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:34:21] RECOVERY - Puppet run on tools-logs-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:36:56] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:37:46] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:55:22] yuvipanda: I just found https://www.python.org/shell/
[20:55:22] :D
[20:56:26] nice!
[20:56:57] valhallasw: am trying to figure out why a bunch of exec nodes don't seem to be running the puppet cron at all
[20:57:04] O_o
[20:57:09] (looking at tools-exec-1204)
[20:57:10] which ones?
[20:57:17] /var/log/puppet.log was last modified Aug 9
[20:57:28] I just restarted the cron daemon there to see what's up
[20:57:30] Yet 'The last Puppet run was at Sat Aug 20 19:59:23 UTC 2016 (58 minutes ago).'
[20:57:53] that was my clush to run puppet everywhere, because I noticed the ssh key stuff hadn't propagated
[20:58:35] valhallasw`cloud on tools-puppetmaster-01, you can do `clush -g 'all' -b 'uname -r'`
[20:58:37] and that works now
[21:00:26] not fully puppetized yet tho
[21:00:33] and you can see the hosts on which puppet hasn't run yet
[21:01:21] yuvipanda: so I investigated -1207 last week which also hadn't had puppet running for ages
[21:01:29] seemed to be an nslcd issue
[21:01:51] sorry
[21:01:51] https://phabricator.wikimedia.org/T143191
[21:01:57] tools-webgrid-lighttpd-1207
[21:06:12] ah, interesting
[21:07:07] Labs, Tool-Labs: Puppet not running on tools-webgrid-lighttpd-1207 - https://phabricator.wikimedia.org/T143191#2569749 (yuvipanda) The nslcd one is a red herring I think, has been happening on all hosts since the move to openldap (but I think @MoritzMuehlenhoff told us it doesn't matter?)
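One way to turn the single-host last-modified check above into a fleet-wide sweep, using the clush setup that was just confirmed working (the 'all' group comes from the log; GNU date's -r flag prints a file's mtime):

```
# day-granularity mtimes group together under -b, so a node whose
# puppet cron died on Aug 9 stands out from the rest of the fleet:
clush -g all -b 'date +%F -r /var/log/puppet.log'
```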
[21:16:27] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569795 (Ivanhercaz)
[21:17:22] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569795 (Ivanhercaz) One comment more: I have log-out and login again but it persists.
[21:32:40] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569894 (yuvipanda) Hello! It looks like you hit your memory limit and so got stuck. I've killed it now, and you should be able to log in.
[21:43:34] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:44:11] ^ more clush stuff
[21:44:12] yuvipanda, valhallasw`cloud, there's another one ^
[21:45:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:45:23] tom29739: thanks :-)
[21:45:51] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:45:51] hmm
[21:45:54] I'm running puppet again
[21:45:58] I didn't do anything
[21:46:01] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:46:12] did you do anything valhallasw`cloud? or is it just fallout from earlier?
[21:47:42] manual run works fine
[21:48:20] yuvipanda: no
[21:48:54] that run was.... 08/20/2016 @ 9:10pm (UTC)
[21:49:03] so that's a bit more than half an hour ago
[21:49:12] I see
[21:49:25] let me file a bug about building a non-flaky puppet failure system
[21:51:00] Labs, Tool-Labs: Build a puppet failure check for tools that's less flaky than current one - https://phabricator.wikimedia.org/T143499#2569899 (yuvipanda)
[21:52:57] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569911 (Ivanhercaz) OMG! Sorry for hit it. Could you give me some advice to not hit the memory limit? Or is it normal?
[21:54:34] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569912 (yuvipanda) No worries! Hitting your memory limit only causes problems for you and not other users, so no need to be sorry! Things to check: 1. Check the 'running' tab in your notebook home page, shut down thi...
[21:55:25] Labs, Tool-Labs: Build a puppet failure check for tools that's less flaky than current one - https://phabricator.wikimedia.org/T143499#2569913 (yuvipanda) Things it should do: 1. Not complain about transient puppet failures (aka they disappear on next run) 2. Catch failed restarts (since they *do* disap...
[21:58:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:59:48] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569918 (Ivanhercaz) Okay! I am going to be relaxed so hehe. About your first point, I usually have only one (or two) terminals, but now I'm working with a lot of pages adding a template (Authority Control); about the...
[22:00:38] valhallasw`cloud radical idea - if the automatic puppet run fails, just re-run it and report failure only if that fails too!
[22:00:55] (handwavey fix to make sure this doesn't ddos the puppetmaster)
[22:00:57] *nod*
[22:01:11] still fails if e.g. a manual puppet run is running
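A minimal sketch of that retry-then-report idea as a cron wrapper, including the status file mentioned at 20:05 for prometheus to pick up. The path, status file, and jitter are hypothetical, not the script Tool Labs actually deployed:

```
#!/bin/bash
# hypothetical /usr/local/sbin/puppet-run: report failure only if two
# consecutive runs fail, and leave a machine-readable status behind
set -u
run_puppet() {
    puppet agent --onetime --no-daemonize --detailed-exitcodes \
        >> /var/log/puppet.log 2>&1
    local rc=$?
    # with --detailed-exitcodes, 0 (no changes) and 2 (changes) mean success
    [ "$rc" -eq 0 ] || [ "$rc" -eq 2 ]
}
if run_puppet || { sleep $((RANDOM % 120)); run_puppet; }; then
    echo "ok $(date +%s)" > /var/lib/puppet-run.status
else
    echo "fail $(date +%s)" > /var/lib/puppet-run.status
fi
```

As the next message notes, even this double-run approach misreports when something else (such as a manual run) holds puppet's agent lock for the duration of both attempts.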
[22:01:21] or we can just reduce the reports
[22:03:30] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569932 (yuvipanda) ah, are you working only on terminals?
[22:03:31] yuvipanda, a way to disable it might be a good idea
[22:03:39] Like if work is being done for instance
[22:03:50] right
[22:04:13] edited to say that
[22:06:42] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569946 (Ivanhercaz) Mainly. Sometimes I use another notebooks, but it is usually for some test.
[22:07:33] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569950 (yuvipanda) ah ok. can you try opening a python notebook, and see if you can see a number like: 'Mem: 124 MB' on the top right?
[22:09:11] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569952 (Ivanhercaz) I see "Mem: " and nothing more.
[22:10:32] Labs, Tool-Labs: Build a puppet failure check for tools that's less flaky than current one - https://phabricator.wikimedia.org/T143499#2569954 (yuvipanda) so we have a mechanism for marking, *on the machine*, that we're working on it, and so it shouldn't complain about puppet failures. This mechanism sho...
[22:10:46] valhallasw: ^
[22:14:33] Labs, Tool-Labs: Build a puppet failure check for tools that's less flaky than current one - https://phabricator.wikimedia.org/T143499#2569899 (valhallasw) We should also have a simple way to acknowledge failures before spamming the entire irc channel; Shinken allows scheduling downtime per host, but not...
[22:18:58] valhallasw`cloud I think puppet checks and alerts are the only things that shinken still does that prometheus doesn't, so I want to totally get rid of shinken
[22:20:50] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:21:06] valhallasw`cloud demo at http://demo.robustperception.io:9093/#/alerts
[22:21:41] http://demo.robustperception.io:9093/#/silences is very configurable
[22:25:20] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:25:59] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:36:39] ok
[22:58:53] PAWS: Paws display 504 - Bad gateway time-out - https://phabricator.wikimedia.org/T143493#2569963 (Ivanhercaz) Is possible that I have hit again the memory limit? :/
[23:53:44] hi all! question regarding OAuth: I'm sending an API request with multipart/form-data (since I'm trying to edit a page on a user's behalf), and when I sign the parameters passed in the request body, the API returns an auth error. But when I don't, it goes through.
[23:54:59] Is it a bug or a feature lol? This pretty much means I could send any params to the API in a multipart/form-data request while effectively only signing the URL and not the params.
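For what it's worth, that last observation matches the OAuth 1.0a specification rather than a MediaWiki bug: RFC 5849 only includes request-body parameters in the signature base string when the body is application/x-www-form-urlencoded, so a multipart/form-data body is deliberately left unsigned and only the method, URL, query string, and oauth_* parameters are covered. A rough sketch of that base-string rule; percent_encode is a hypothetical helper and all values are placeholders:

```
# HMAC-SHA1 per RFC 5849; note the multipart body never enters base_string
base_string="POST&$(percent_encode "$api_url")&$(percent_encode "$sorted_oauth_and_query_params")"
oauth_signature=$(printf '%s' "$base_string" \
    | openssl dgst -sha1 -hmac "$(percent_encode "$consumer_secret")&$(percent_encode "$token_secret")" -binary \
    | base64)
```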