[01:24:16] 6Labs, 10Tool-Labs, 6operations, 7Mail: Offer a solution to manage @toolserver.org mail redirections - https://phabricator.wikimedia.org/T116373#1798202 (10Dzahn) [02:22:18] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798267 (10Dzahn) [04:15:26] Hi, I'm experiencing some trouble with the OAuth service. I got "Invalid response from token request" for some hours now. https://tools.wmflabs.org/oauth-hello-world/index.php?action=authorize [04:18:27] Trying to login at X Tools give more info about the error [04:18:27] https://tools.wmflabs.org/xtools/oauthredirector.php?action=login&callto=https://www.mediawiki.org/w/api.php&returnto=http://tools.wmflabs.org/xtools-ec/ [04:18:27] Request from 10.64.32.105 via cp1067 cp1067 ([10.64.0.104]:3128), Varnish XID 780290271 [04:18:27] Forwarded for: 10.68.17.163, 10.64.32.105, 10.64.32.105 [04:18:28] Error: 503, Service Unavailable at Wed, 11 Nov 2015 04:13:54 GMT [06:58:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:04:13] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [07:33:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [07:34:02] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798470 (10yuvipanda) p:5Triage>3Unbreak! 
[07:39:08] RECOVERY - Puppet failure on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:41] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798491 (10yuvipanda) I looked at the error logs and get: ```2015-11-11 07:35:04: (mod_fastcgi.c.2673) FastCGI-stderr: PHP Warning: Missing argument 1 for MW_OAuth::doAu... [07:59:40] 6Labs, 5Patch-For-Review: HBA failing to trusty instances on tool labs - https://phabricator.wikimedia.org/T116687#1798517 (10yuvipanda) 5Open>3Resolved [08:01:28] 6Labs, 5Patch-For-Review: HBA failing to trusty instances on tool labs - https://phabricator.wikimedia.org/T116687#1798519 (10yuvipanda) a:3valhallasw [08:05:45] 6Labs, 5Patch-For-Review: HBA failing to trusty instances on tool labs - https://phabricator.wikimedia.org/T116687#1798520 (10yuvipanda) (I tested it, seems ok! \o/) [09:05:27] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798584 (10Magnus) That's just a warning for an optional argument not passed, not related to the issue. OAuth appears to be down. See T118372. [09:11:35] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798588 (10valhallasw) [09:18:13] 6Labs, 10Tool-Labs, 5Patch-For-Review, 3labs-sprint-116, and 2 others: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1798597 (10valhallasw) Why didn't we revert this change until T116687 was resolved? 
Now inter-host ssh was broken for two weeks :/ [09:50:28] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798654 (10jcrespo) [09:56:08] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798672 (10Joe) [09:58:02] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798675 (10mmodell) [09:59:29] <_joe_> O [09:59:46] <_joe_> err, OAuth has been fixed, at least it works from phabricator now [10:00:02] <_joe_> I'd like to get more people to confirm before I solve the ticket [10:00:38] works for gerrit-patch-uploader and widar for me [10:01:03] <_joe_> it also works in the test oauth app [10:01:07] <_joe_> so I guess it's now fixed [10:01:17] <_joe_> valhallasw`cloud: thanks for pointing me to the problem [10:01:28] yw [10:01:59] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798680 (10jcrespo) [10:02:57] 6Labs, 10Tool-Labs, 10Wikidata: widar oauth broken - magical script elves are temporarily ill - https://phabricator.wikimedia.org/T118363#1798686 (10Joe) [10:06:10] _joe_: thanks :) [10:08:13] 6Labs, 10Tool-Labs, 6operations, 7Mail: Offer a solution to manage @toolserver.org mail redirections - https://phabricator.wikimedia.org/T116373#1798694 (10valhallasw) Yes: create a ticket with the requested change, in the #tool-labs project, ccing @coren. /etc/toolserver.aliases is currently unpuppetized... 
[10:08:40] 6Labs, 10Tool-Labs, 7Mail: Offer a solution to manage @toolserver.org mail redirections - https://phabricator.wikimedia.org/T116373#1798696 (10valhallasw) 5Open>3Resolved p:5High>3Low [10:10:17] 6Labs, 10Tool-Labs: Add tool-labs admins to `relic` project - https://phabricator.wikimedia.org/T118375#1798699 (10valhallasw) 3NEW [11:36:53] anyone know what to do about repeated bigbrother failure messages? I have already removed the components that were for the run [11:36:55] ? [11:37:38] [bigbrother] warn: job '' failed to start [11:37:52] [bigbrother] warn: Too many attempts to restart job ''; throttling [11:38:01] there is no file, it is reporting a ghost [12:12:41] sDrewth: please create a Task for it [12:12:49] bigbrother sometimes gets stuck [12:13:05] ok [12:18:57] 6Labs: bigbrother trying to run and restart non-existing task (wikisource-bot) - https://phabricator.wikimedia.org/T118387#1798898 (10Billinghurst) 3NEW [12:47:46] 6Labs: bigbrother trying to run and restart non-existing task (wikisource-bot) - https://phabricator.wikimedia.org/T118387#1798928 (10scfc) [12:47:48] 6Labs, 10Tool-Labs: bigbrother doesn't stop - https://phabricator.wikimedia.org/T94500#1798929 (10scfc) [12:48:51] 6Labs: bigbrother trying to run and restart non-existing task (wikisource-bot) - https://phabricator.wikimedia.org/T118387#1798898 (10scfc) (I have restarted `bigbrother` on `tools-submit`, so your case should be resolved.) [13:46:55] 6Labs: bigbrother trying to run and restart non-existing task (wikisource-bot) - https://phabricator.wikimedia.org/T118387#1798988 (10Billinghurst) 5duplicate>3Open While it may be the same bug, I need for the specific action to resolved promptly, not for a long term solution. 
[14:11:12] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:48:26] 6Labs, 10wikitech.wikimedia.org: "Edit with form" missing on a Tools access request page - https://phabricator.wikimedia.org/T118136#1799058 (10Krenair) I can see the "edit with form" button... [14:51:10] RECOVERY - Puppet failure on tools-bastion-01 is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:12] 6Labs, 10Labs-Infrastructure, 10hardware-requests, 6operations, and 2 others: Labs test cluster in codfw - https://phabricator.wikimedia.org/T114435#1799088 (10chasemp) [15:10:14] 6Labs, 10Labs-Infrastructure, 10netops, 6operations, and 3 others: Allocate labs subnet in dallas - https://phabricator.wikimedia.org/T115491#1799086 (10chasemp) 5Open>3Resolved >>! In T115491#1795538, @faidon wrote: > @chasemp, is this done? yes, I believe we can call this done. The hosts in this new... [15:12:44] 6Labs, 6operations, 10wikitech.wikimedia.org: wikitech regularly looses session directly after login - https://phabricator.wikimedia.org/T118395#1799090 (10JanZerebecki) 3NEW [15:17:13] 10Tool-Labs-tools-tsreports, 6Commons: kmlexport down - https://phabricator.wikimedia.org/T118396#1799107 (10Rodhullandemu) 3NEW [15:26:31] 6Labs, 10Labs-Infrastructure, 6operations, 5Patch-For-Review: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1799126 (10chasemp) 5Open>3Resolved >>! In T117097#1798052, @Andrew wrote: > All the boxes now have an OS installed and puppet and salt signed and run... 
[15:55:52] 10Tool-Labs-tools-tsreports, 6Commons: kmlexport down - https://phabricator.wikimedia.org/T118396#1799162 (10valhallasw) [15:55:54] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1799163 (10valhallasw) [16:13:57] valhallasw`cloud: hmm, https://tools.wmflabs.org/zoomviewer/iipsrv.fcgi?FIF=cache/779543aa14d92a2dff180a4cbc0eb2f6.tif&obj=IIP,1.0&obj=Max-size&obj=Tile-size&obj=Resolution-number [16:14:01] says not serviced [16:14:06] but webserbvice exists [16:14:10] and zoomviewer itself works [16:14:13] wtf [16:14:16] schwd: ^ [16:14:23] need to look at lighttpd response [16:14:25] YuviPanda: wrong 404 filter in nginx? [16:14:48] ooooo [16:14:50] yes [16:14:52] prolly [16:14:59] is that what lighty says? [16:15:09] * YuviPanda has to run to kubecon again now [16:15:17] YuviPanda: haven't checked [16:15:20] but that would be my guess [16:15:26] ok [16:16:24] i'll take a look in a couple hours [16:29:36] 6Labs, 6operations, 3Labs-Sprint-101: Make Labs NFS alerts paging - https://phabricator.wikimedia.org/T101650#1799216 (10yuvipanda) 5Open>3Resolved These are paging from icinga now. [16:30:07] 6Labs, 3Labs-Sprint-104: Recover files from old corrupted file system (Tracking) - https://phabricator.wikimedia.org/T104334#1799219 (10yuvipanda) p:5High>3Normal a:5yuvipanda>3None [16:31:28] 6Labs, 10wikitech.wikimedia.org, 5Patch-For-Review: Use a Puppet ENC to define which classes are included in which nodes (in Labs) - https://phabricator.wikimedia.org/T85279#1799222 (10yuvipanda) a:5yuvipanda>3None [16:33:05] 6Labs, 6operations, 3Labs-sprint-112, 5Patch-For-Review: labstore1002 out of space in vg to create new snapshots - https://phabricator.wikimedia.org/T109954#1799230 (10yuvipanda) 5Open>3Resolved Verified. [17:05:53] 6Labs, 10Tool-Labs: ZoomViewer FCGI broken - https://phabricator.wikimedia.org/T118405#1799320 (10dschwen) 3NEW a:3yuvipanda [17:06:37] thx guys! 
[17:34:46] 6Labs, 6operations: labs precise instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673#1799370 (10fgiunchedi) yup, works for me, both instances are up but can be deleted/recreated at will for tests too [18:05:08] 6Labs, 3labs-sprint-118, 3labs-sprint-119: Document support levels for tools and labs projects - https://phabricator.wikimedia.org/T116598#1799416 (10chasemp) a:5Andrew>3chasemp [18:12:52] schwd: from what I can see, it's a 503 thrown by the application [18:14:32] 6Labs, 10Tool-Labs: ZoomViewer FCGI broken - https://phabricator.wikimedia.org/T118405#1799447 (10valhallasw) a:5yuvipanda>3None As far as I can see, this is an application error: ``` tools.zoomviewer@tools-webgrid-lighttpd-1401:~$ curl "http://localhost:60146//zoomviewer/iipsrv.fcgi?FIF=cache/779543aa14d... [18:15:48] $log librarybase updated wikibase to wmf/1.26wmf22 [18:15:52] !log librarybase updated wikibase to wmf/1.26wmf22 [18:18:13] 6Labs, 10Tool-Labs: ZoomViewer FCGI broken - https://phabricator.wikimedia.org/T118405#1799457 (10dschwen) Hmm, so where can I see details on that error? My ~/error.log is empty. [18:21:56] bah, i dont rmemeber that command... [18:22:35] 6Labs, 10Tool-Labs: ZoomViewer FCGI broken - https://phabricator.wikimedia.org/T118405#1799480 (10valhallasw) ``` tools.zoomviewer@tools-webgrid-lighttpd-1401:~$ ls -l /proc/5591/fd total 0 lrwx------ 1 tools.zoomviewer tools.zoomviewer 64 Oct 24 11:39 0 -> /dev/null lrwx------ 1 tools.zoomviewer tools.zoomvie... 
[18:22:50] addshore: it should be !log, but the bot is buggy [18:22:54] =] [18:23:15] !log librarybase should work now [18:23:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Librarybase/SAL, Master [18:23:20] !log librarybase updated wikibase to wmf/1.26wmf22 [18:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Librarybase/SAL, Master [18:23:25] =] [18:23:33] https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource%3ALibrarybase%2FSAL&type=revision&diff=202524&oldid=175931 [18:45:38] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1799510 (10Rodhullandemu) This hasn't been addressed in over a month. Given the demand for this tool, could someone please kick it back into life? [18:46:54] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1799512 (10yuvipanda) It needs an actual maintainer who can look at the code and figure out what is causing the issue - the webserver itself is running but just 'hung'. I'm not fully sure what us admins can do. [19:00:09] 6Labs, 10Tool-Labs: ZoomViewer FCGI broken - https://phabricator.wikimedia.org/T118405#1799535 (10yuvipanda) I restarted the webservice and it seems to work now?! [19:00:28] schwd: ^ can you check? [19:09:26] YuviPanda: huh. 2015-11-11 19:09:03: (mod_cgi.c.1055) fork failed: Cannot allocate memory for kmlexport [19:09:31] but there's still 700M free [19:09:49] it's probably leaking memory left and right? [19:09:58] well, the lighttpd process is using ~800M [19:10:06] hmm [19:10:21] but why doesn't it use COW semantics? [19:10:25] or is that turned off on exec hosts? [19:10:26] not sure [19:10:32] don't think you can turn that off? [19:10:42] dunno? maybe overcommit settings? [19:10:54] hmm [19:11:11] not sure, all the memory stuff was tweaked by Coren and I'm not really sure what they're even set to and more importantly why. [19:11:28] should we strace it to see what's going on? 
[19:11:34] oh good idea [19:11:34] valhallasw`cloud: what host is it on? [19:11:42] tools-webgrid-lighttpd-1413 [19:11:44] I'll strace [19:11:47] kkk [19:12:48] YuviPanda: we should also have better checks for how busy servers are [19:12:56] this has been on the to-do for months [19:13:04] please make it happen =p [19:13:06] servers as in processes? [19:13:09] or [19:13:11] vms [19:13:12] instances? [19:13:14] right [19:13:19] do we have a bug somewhere? [19:13:23] proooobably [19:13:25] this should be doable via graphite I think [19:13:49] clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b46c9e27210) = -1 ENOMEM (Cannot allocate memory) [19:14:01] heh [19:15:09] so the next question is why on earth lighttpd needs 800M [19:15:23] I wonder if there's an strace equivalent for memory [19:15:43] typical memory usage is < 0.1%, i.e. less than 8MB [19:16:45] bd808 are you around? [19:20:20] valhallasw`cloud: I guess that's a core dump we can try [19:20:25] YuviPanda: yeah did that [19:20:27] then strings [19:20:27] woo [19:20:31] ok [19:20:37] lots of requests and responses [19:20:47] in memory? [19:20:52] ya [19:21:09] hmm [19:21:10] maybe this is another of those CLOSE_WAIT situations? [19:21:17] valhallasw`cloud: you can check that with lsof -p [19:22:04] hm, nope, no connections to be seen [19:22:30] none at all or just no network connections? [19:22:46] only the listening port [19:22:51] no network connections [19:24:09] 6Labs, 10Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#1799558 (10yuvipanda) a:5yuvipanda>3None [19:24:14] ok [19:25:18] https://tools.wmflabs.org/kmlexport/server-status [19:29:37] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1799566 (10valhallasw) You can consider creating a proposal at https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey. 
We don't have the manpower to fix tools -- we barely have enough manpower to keep tool labs itself... [19:30:27] * YuviPanda provides hugs to valhallasw`cloud [19:33:13] valhallasw@tools-webgrid-lighttpd-1413:~$ sysctl vm.overcommit_memory [19:33:14] vm.overcommit_memory = 2 [19:36:32] YuviPanda: ^ I'm not sure if that even makes sense for webgrid hosts [19:36:45] what does 2 do [19:37:26] sorry. 2 is 'never overcommit' [19:37:33] ... [19:37:37] should be 0 maybe [19:37:40] is that even in puppet? [19:37:44] I think so [19:37:57] can you make a patch? I'll merge [19:42:20] Coren: was there a specific reason for setting overcommit to 2? [19:42:20] YuviPanda: I'm also not sure if we need overcommit disabled anywhere at all? [19:42:28] yeah I don't know why it is [19:42:41] because sge should stop us from overcommitting to begin with [19:43:10] not for lighttpd nodes [19:48:22] right [19:48:23] so [19:48:27] lighttpd doesn't release memory [19:48:40] because fragmentation of virtual memory [19:48:49] so [19:48:57] we need to allow lighttpd to swap and overcommit [19:56:27] https://gerrit.wikimedia.org/r/#/c/63283/2/modules/toollabs/manifests/exec_environ.pp [19:57:26] YuviPanda: the alternative is a huge swap space [19:58:16] valhallasw`cloud: we actually already have huge swap spaces [19:58:23] 2x RAM I think [19:58:33] no? [19:58:36] 500M [19:58:42] really? [19:58:51] on tools-webgrid-lighttpd-1413 at least [19:58:51] I even puppetized it [19:58:55] ugh [19:58:58] really? [19:59:03] maybe that's the problem? [19:59:41] no, wait [19:59:50] yes! [19:59:53] that's exactly the issue [20:00:11] because there is less than [lighttpd resident size] in swap available, linux refuses to allocate memory [20:00:32] because there is memory available -- 5285536 cached Mem [20:01:08] the missing swap? 
[20:01:21] cached mem = file system cache [20:01:27] in-memory cache of the FS [20:01:36] yeah and that gets evicted right [20:01:42] yep [20:01:45] let me try something [20:01:49] ok! [20:03:32] 6Labs, 10wikitech.wikimedia.org: "Edit with form" missing on a Tools access request page - https://phabricator.wikimedia.org/T118136#1799649 (10scfc) 5Open>3Resolved a:3scfc //Now// I do as well. [20:03:55] 6Labs, 10wikitech.wikimedia.org: "Edit with form" missing on a Tools access request page - https://phabricator.wikimedia.org/T118136#1799654 (10scfc) a:5scfc>3None [20:11:42] I can get a python process to approx 4-5% memory, which is 320-400MB [20:11:48] which is odd as well [20:11:59] but that's the overcommit in action I suppose [20:12:26] but /proc/sys/vm/overcommit_ratio is set to 95 [20:12:38] so it should allow malloc up to swap + 95% of ram [20:12:40] *confused* [20:13:46] CommitLimit: 8267684 kB [20:13:46] Committed_AS: 7809168 kB [20:13:49] ooooooh [20:15:49] but the sum of VIRT is much larger... but that's probably because it doesn't count shared libs for overcommit? [20:18:51] sum of RES is the right order of magnitude (~100 procs * 20MB = 2GB, plus a few big ones) [20:23:00] YuviPanda: turning overcommit_memory to 0 does bring kmlexport back online [20:23:15] turning swap back on? [20:23:17] the risk is of course not being able to login when something completely hoards memory [20:23:26] no that just disables overcommit checking [20:23:37] no I mean, did you try turning swap on instead? [20:23:45] not sure how to extend swap? [20:23:58] not sure why puppet didn't do that already [20:24:22] see modules/toollabs/manifests/node/compute/general.pp [20:24:35] and labs_lvm::swap [20:24:49] because that's only for general compute nodes [20:24:52] not for webgrid? [20:24:53] :P [20:25:22] can I easily test that on a single node? [20:25:29] more swap? 
[20:25:36] yeah [20:25:37] lvmadmin is scary [20:25:39] on trusty definitely [20:25:43] you don't need lvmadmin [20:25:46] oh [20:25:48] kindof [20:26:10] valhallasw`cloud: you can run the commands in labs_lvm::swap [20:26:13] and see what happens [20:26:16] ah right [20:29:27] yep, that works [20:29:46] valhallasw@tools-webgrid-lighttpd-1413:~$ cat /proc/meminfo | grep Commit [20:29:46] CommitLimit: 33433504 kB [20:29:46] Committed_AS: 8055556 kB [20:30:05] so let's do the swap for all exec hosts? [20:30:31] yeah [20:30:36] but precise hosts will need a restart I think [20:30:57] valhallasw`cloud: can you make a patch gating it to just trusty hosts now? [20:31:04] uuuuh [20:32:09] I remember needing to do that when I did it for exec nodes [20:32:17] I'm not sure why we didn't do it for the web ones [20:32:32] I even wrote this down somewhere maybe [21:01:38] 6Labs, 10Tool-Labs: webgrid nodes have very limited swap (500MB) - https://phabricator.wikimedia.org/T118419#1799812 (10valhallasw) 3NEW [21:01:43] ^ YuviPanda [21:01:56] ok! [21:09:40] 6Labs, 10Tool-Labs: ZoomViewer FCGI broken - https://phabricator.wikimedia.org/T118405#1799829 (10dschwen) 5Open>3Resolved a:3dschwen Thanks, yes, it works now. [21:10:23] !help [21:10:23] !documentation for labs !wm-bot for bot [21:10:51] !documentation [21:10:58] * Ryan_Lane grumbles [21:11:26] Attempting to join #confidant using wm-bot [21:11:26] @add #confidant [21:25:42] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1799872 (10Rodhullandemu) Thanks, I have addedmy 2c worth there. Meanwhile, it is up again, thanks to whomever did that. [21:34:15] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1799895 (10valhallasw) After discussing this issue with lighttpd people: - lighttpd does not release memory, so dynamic responses to big requests are a big issue - the current memory configuration on tool labs does not hand... 
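[editor's note] The overcommit exchange above can be checked with a little arithmetic. With vm.overcommit_memory = 2, the kernel caps total committed memory at CommitLimit, which (ignoring huge pages) is swap plus overcommit_ratio percent of RAM, matching the "swap + 95% of ram" remark in the log. The sketch below is only an illustration: the function names are hypothetical, the two meminfo snippets are the values pasted in the channel before and after the swap was extended, and the 800 MB figure is the rough lighttpd size mentioned in the log, used here as an assumed size for the forking process.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value kB' lines into a dict of ints (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """Memory (kB) that can still be committed before the strict limit is hit."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def expected_commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """CommitLimit in strict mode, ignoring huge pages: swap + ratio% of RAM."""
    return swap_kb + ram_kb * overcommit_ratio // 100


# Values pasted in the channel (before and after extending swap).
BEFORE = "CommitLimit:     8267684 kB\nCommitted_AS:    7809168 kB"
AFTER = "CommitLimit:    33433504 kB\nCommitted_AS:    8055556 kB"

# Assumption: the forking lighttpd process needs roughly 800 MB committed.
LIGHTTPD_KB = 800 * 1024

for label, text in (("before", BEFORE), ("after", AFTER)):
    headroom = commit_headroom_kb(parse_meminfo(text))
    verdict = "fits" if LIGHTTPD_KB <= headroom else "does not fit"
    print(f"{label}: headroom {headroom} kB, an 800 MB fork {verdict}")
```

Under these assumptions the "before" headroom is well below 800 MB, which is consistent with the ENOMEM seen in the strace output even though plenty of page cache was free. On a live host the same parser could be fed the contents of /proc/meminfo instead of the pasted snippets.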
[21:36:18] hello , I have a question about this == https://tools.wmflabs.org/xtools-ec [21:36:38] at the bottom it says copyright (list of half a dozen wikipedians) [21:36:58] unlike most wmf-hosted pages, it does not mention CCBYSA nor GFDL [21:37:16] user75108: you should ask the xtools maintainers. [21:37:26] (aka the people listed there) [21:37:30] none of whom seem to be here [21:37:35] is it permissible to copy the output of tools , to an on-wiki location? or is their an equivalent local template, that generates the same output? [21:37:56] YuviPanda , okay thanks. [21:41:46] 6Labs, 10Tool-Labs: webgrid nodes have very limited swap (500MB) - https://phabricator.wikimedia.org/T118419#1799915 (10valhallasw) Increasing this to 24GB solves the kmlexport woes, and probably also the regular puppet issues we see. Before: 21:13 CommitLimit: 8267684 kB 21:13 anyone have any experience doing custom event tracking in piwik? [22:21:25] 10Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1800007 (10scfc) Does lighttpd really hold the dynamic response in memory and not just pass it through? If the culprit was PHP, there is the setting `PHP_FCGI_MAX_REQUESTS` which (IIRC) triggers after how many requests the PH... [23:18:38] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [23:36:36] PROBLEM - Puppet failure on tools-proxy-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [23:51:39] RECOVERY - Puppet failure on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:53:39] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [23:57:16] valhallasw`cloud: legoktm ok I fixed the final security issue with k8s! [23:57:22] valhallasw`cloud: legoktm I can give wikibugs access now