[00:00:37] PROBLEM - Puppet failure on tools-jessie-test is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:15:49] Coren: I am currently running a script to generate datasets using information from the db replicas on Tool Labs. These datasets feature usernames of editors. This is all publicly available information, but I was wondering if there were any rules on publishing datasets of usernames?
[00:20:35] RECOVERY - Puppet failure on tools-jessie-test is OK: OK: Less than 1.00% above the threshold [0.0]
[00:22:04] harej: what kind of dataset?
[00:22:42] it is a csv with usernames, edits to a wikiproject space (i.e. [[Wikipedia:WikiProject so-and-so]], the talk page, and subpages), and edits to pages within that WikiProject's scope.
[00:23:44] the usernames come from the respective revision tables. anyone could find this information and get it by looking at edit history pages or through API calls
[00:24:21] harej: should be ok
[00:37:02] does anyone know how I can set the permissions on the files in my directory without confusing git?
[00:37:23] I see the my bot's entire directory structure is readable by global
[00:37:39] well not the entire structure, but much of it
[04:15:42] Tool-Labs: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098762 (scfc) p:Triage>High a:scfc
[04:31:05] PROBLEM - Host tools-jessie-test is DOWN: CRITICAL - Host Unreachable (10.68.17.212)
[04:49:36] Tool-Labs: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098775 (scfc) The WMF repositories (`wikimedia-precise` & Co.) have a priority of 1001. So if the priority of our local repository was set in a way that it would just sup...
[08:30:31] Labs: Instances with no name or project associated on wikitech - https://phabricator.wikimedia.org/T91922#1098825 (scfc) NEW
[09:34:01] Tool-Labs: Get rid of custom nginx packages - https://phabricator.wikimedia.org/T91878#1098844 (yuvipanda) (and get rid of the tools-webproxy instance as well)
[09:49:40] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0]
[09:54:01] Tool-Labs, Patch-For-Review: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098849 (yuvipanda) Everything seems ok. Now to remove the unneeded packages from trusty...
[09:55:35] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[10:07:31] Tool-Labs, Patch-For-Review: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098851 (scfc) You will end up with :-): ``` scfc@toolsbeta-puppetmaster3:~$ ls -l /data/project/.system/deb-trusty total 1564 -r--r--r-- 1 root tools...
[10:08:56] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [43200.0]
[10:52:25] RECOVERY - Puppet failure on tools-exec-12 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:54:53] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0]
[10:55:23] RECOVERY - Puppet failure on tools-webgrid-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:56:52] RECOVERY - Puppet failure on tools-webgrid-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:00:38] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0]
[11:00:47] Tool-Labs: Get rid of custom nginx packages - https://phabricator.wikimedia.org/T91878#1098873 (yuvipanda) Open>Resolved Done now.
[11:00:48] Tool-Labs: Reduce amount of tools-local packages - https://phabricator.wikimedia.org/T91874#1098875 (yuvipanda)
[11:01:01] PROBLEM - Host ToolLabs is DOWN: CRITICAL - Host Unreachable (tools.wmflabs.org)
[11:03:25] RECOVERY - Puppet failure on tools-webgrid-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:04:19] PROBLEM - Host tools-webproxy is DOWN: CRITICAL - Host Unreachable (10.68.16.4)
[11:04:28] Tool-Labs, Patch-For-Review: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098876 (yuvipanda) Did you remove some from deb-trusty for tools? misctools etc is missing there..
[11:07:07] don’t panic
[11:07:09] there’s no tools outage
[11:10:18] Tool-Labs, Patch-For-Review: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098877 (scfc) Open>Resolved Yes, accidentally, I started with a clean plate, added what was missing, and as misctools was already installed ever...
[11:11:32] Tool-Labs, Patch-For-Review: Setup a local repo for toollabs that supports separate trusty and precise packages - https://phabricator.wikimedia.org/T76802#1098879 (yuvipanda) Wheeee! You're awesome :) ty
[11:13:07] Labs, Tool-Labs: Please install python's MySQLdb module in all uwsgi instances on labs tools. - https://phabricator.wikimedia.org/T91155#1098881 (yuvipanda) It's already installed. ```± |master ✗| → ssh tools-webgrid-generic-01.eqiad.wmflabs Last login: Wed Mar 4 08:41:05 2015 from bastion-restricted1.e...
[11:59:36] Labs, Tool-Labs: Please install python's MySQLdb module in all uwsgi instances on labs tools. - https://phabricator.wikimedia.org/T91155#1098916 (Mjbmr) @yuvipanda , I were getting "No module named MySQLdb" in my logs, maybe someone installed till now, I used virtualenv, nvm. could you check my the other...
[12:00:08] Labs, Tool-Labs: Please install python's MySQLdb module in all uwsgi instances on labs tools. - https://phabricator.wikimedia.org/T91155#1098918 (yuvipanda) Open>Resolved a:yuvipanda Cool, marking this as resolved...
[12:00:58] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098922 (yuvipanda) declined>Open p:Triage>Normal
[12:01:11] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1075356 (yuvipanda) Retitling and re-opening based on comment in T91155
[12:01:48] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098931 (yuvipanda) @Mjbmr heya! Can you point us to the code that is having problems?
[12:05:12] RECOVERY - Host ToolLabs is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[12:06:17] there we go
[12:06:21] well done, DNS cache
[12:07:51] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098932 (Mjbmr) @yuvipanda you really want my code? it don't document my code, but if you're a root you can check it out on /data/project/xmlfeed/www/python/src/app.py when I run it separa...
[12:09:59] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098934 (yuvipanda) @Mjbmr You do know that your code has to be open source to live on labs, right?
[12:14:35] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098936 (Mjbmr) @yuvipanda the only way you can put a code on wmf projects is to be hosted labs, I can't host them somewhere else, beside I know sysadmins has access to my codes and I don'...
[12:18:18] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098938 (Mjbmr) @yuvipanda there are passwords being stored on labs, you can't count any privacy as open source.
[12:18:59] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098939 (yuvipanda) Alright, so two unrelated issues here. 1. Code on labs has to be open source. It has to have a OSI compatible license (https://wikimediafoundation.org/wiki/Terms_of_Us...
[12:21:43] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098942 (yuvipanda) >>! In T91156#1098938, @Mjbmr wrote: > @yuvipanda there are passwords being stored on labs, you can't count any privacy as open source. Sure, but your code itself can...
[12:25:00] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098947 (Mjbmr) @yuvipanda I count my codes as privacy, unless you want me stop helping these projects. are you going to debug and fix uwsgi? or you just want keep talking about owning my...
[12:26:33] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098951 (yuvipanda) As an admin on toollabs I think it is my duty to investigate non open source applications on toollabs. I'll start an internal thread amongst other admins and figure out...
[12:27:43] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098952 (Mjbmr) Open>declined
[12:28:44] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098953 (yuvipanda) declined>Open Re-opening. Why was this closed?
[12:30:04] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098955 (Mjbmr) Open>declined I removed my codes.
[12:31:51] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098957 (yuvipanda) I'm not going to wheel war on the status of this bug. I'll take a look next week on the actual bug.
[12:32:09] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098958 (yuvipanda) a:coren>yuvipanda
[14:11:41] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1098995 (Aklapper) declined>Open The actual issue reported in this ticket looks valid so far as per T91156#1098922. Hence this ticket should remain open for the time being until the...
[14:58:42] .q Yuvipanda
[14:58:48] Heh
[14:59:07] Coren: hi :)
[15:00:47] i have some problems with inexplicable (array) job aborts (sigkill) that didn't occur before. could it have to do with ubuntu upgrades? or something else? i'm sort of lost
[15:01:46] gifti: On tools-gift or on the regular queues?
[15:01:57] tools-gift, yes :)
[15:20:22] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1099078 (Mjbmr) @Aklapper I don't need him to do anything for me, if he don't know what privacy means.
[17:28:42] Hi
[17:28:51] Can someone help a noob like me?
[17:29:36] Why does https://tools.wmflabs.org/eagleeye/xregexp-all-min.js say that ther's no webservice running, though it used to work and I just re-started the webservice?
[17:36:38] Ah, okay, I see the status now: "Your webservice is scheduled"
[17:37:37] queue instance "webgrid-lighttpd@tools-webgrid-03.eqiad.wmflabs" dropped because it is temporarily not available
[17:37:44] So I guess I just have to wait?
[17:46:54] "error: commlib error: got select error (Connection refused) - error: unable to send message to qmaster using port 6444 on host "tools-master.eqiad.wmflabs": got send error"
[17:47:08] I'm getting this message when trying to execute "qstat"
[17:48:07] on tool labs
[17:50:53] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1099231 (coren) @Mjbmr: you are welcome to keep code private or proprietary at some place //other// than the Wikimedia Labs. The Foundation offers free hosting for tools and utilities mad...
[17:51:35] gifti: You seem to be running into the OOM killer.
[17:52:02] apper: I'm on it.
[17:56:19] coren: ah, hm, i didn't see any suspicious ram use
[17:57:31] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1099232 (Mjbmr) @coren I totally understand that and knew it, the only problem here is to solve the issue without trying to run my code, I don't like people giving me advice how to code, t...
[17:58:32] apper: Something odd going on with the queue database that'd making the master stop itself. Debugging now.
[17:58:45] Coren: thanks :)
[18:00:47] I've just got a "error: commlib error: got select error (Connection refused)" after qsub. What should I do, just wait patiently?
[18:01:00] alkamid: Known issue; I'm working on it now.
[18:01:14] cheers Coren
[18:11:46] alkamid, apper: There was a corrupted job entry in the database. Deleting it allowed me to restart the master.
[18:12:07] Coren, thanks!
[18:12:12] Coren: thanks, works now
[18:14:05] Coren: while I remember, could you add User:alpha to the tools project please? :)
[18:21:47] Coren: I'm not able to start a webservice. Project is "persondata". I had a custom lighty config, but this doesn't work since this morning, so I now stopped it and tried the normal "webservice start" (and also "webservice2 start"). The job could be found in "qstat", but the server says "No webservice".
[18:23:12] JohnLewis: {{done}}
[18:23:20] thanks :D
[18:24:54] apper: That's odd; your actual webservice is definitely running but it looks like the proxy doesn't know about it for some reason.
[18:25:30] YuviPanda: If you're around, ^^ could use your wisdom.
[18:25:54] Coren: thanks for looking into this.
[18:28:08] (PS1) John F. Lewis: rename file extension + remove redundant file [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195101
[18:30:45] (CR) Aldnonymous: [C: 1] "New update for WMT." [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195101 (owner: John F. Lewis)
[18:31:56] PROBLEM - Puppet failure on tools-shadow is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[18:33:07] (CR) John F. Lewis: [C: 2 V: 2] rename file extension + remove redundant file [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195101 (owner: John F. Lewis)
[18:38:30] Coren: there is something wrong with the proxy... for tool "wikihistory" I'm getting an empty 404 answer (Content-Length: 0) no matter if there is a webservice job running or not
[18:39:36] apper: Please open a phab and cc Yuvi. I'm trying to look into it but Yuvi is the proxy guru
[18:44:15] Tool-Labs: Problems with web proxy for Tool Labs - https://phabricator.wikimedia.org/T91939#1099251 (APPER) NEW
[18:45:32] Coren: afais the -t parameter is not supported by jsub, would you (in principal) mind adding it?
[18:46:30] if otherwise, would i achieve -mem 1g with -hard -l h_vmem=1g?
[18:46:58] RECOVERY - Puppet failure on tools-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[18:47:12] gifti: I'm pretty sure I don't want to have jsub support array jobs - it's meant to be an easy sauce wrapper for people who don't want to deal with the oddness of qsub but if you do array jobs you really should be using the raw interface.
[18:47:46] But if you want h_vmem to be usable on tools-exec-gift I need to set a maximum amount for the node.
[18:47:54] (Right now, you don't have one)
[18:48:14] i just need the tasks to not oom
[18:49:06] It's not hard to add an h_vmem limit to your node; you just have to tell me what you want the maximum to be and it won't schedule over that maximum
[18:49:52] i don't really know yet
[18:50:30] do you mean an overall limit for all tasks combined?
[18:50:40] Right.
[18:51:07] As jobs start, they reserve their h_vmem and jobs that would go over the limit are queued until it's available again.
[18:52:03] so the memory size for a single task isn't restricted at all by the grid engine? only by the killer?
[18:52:20] The most conservative way to set it is to put the max at (ram+swap-500M); this guarantees no overcommitment.
[18:52:45] No, if you -hard -l h_vmem it'll restrict the process even if there is no total limit.
[18:53:02] But then you can still run 500 jobs with a limit of 1G and the os will run out of patience. :-)
[18:53:19] this is hard to grasp for me
[18:53:33] what is responsible for killing my tasks right now?
[18:53:39] The OOM killer only starts if the OS is out of memory. That's not the gridengine, it's the kernel.
[18:53:47] ah
[18:53:51] strange
[18:53:51] Tool-Labs-tools-WMT-bots: Implement whitelisting function - https://phabricator.wikimedia.org/T91940#1099262 (JohnLewis) NEW
[18:54:25] Oh!
[18:54:28] see 'qacct -j 8737604 -t 1
[18:54:33] ' for example
[18:54:38] 3 seconds runtime
[18:54:45] 51.379M maxvmem
[18:55:14] Well yeah, but what tickles the oom killer is the total memory used, not any one process.
[18:55:25] ah
[18:55:33] is there a way to prevent this?
[18:55:38] Once it wakes up, it'll kill according to an algorithm that appears random but that favors recent processes.
[18:56:48] gifti: Prevent the oom killer from waking? Not use too much memory. That's the kernel's last resort before the box goes asploding and there is no way to prevent it from triggering if the kernel believes it must.
[18:57:10] hm
[18:57:51] so, the node thinks 27 tasks in parallel are just enough
[18:58:48] It's also probably data-driven unless your code has very strict static allocation.
[18:59:38] it worked until a few weeks ago, this may be caused by my custom tcl c code
[19:00:47] maybe i compiled it in a way, so that it consumes more memory than usual …
[19:00:48] gifti: Lemme dig in see if I can find further information.
[19:00:52] i don't remember
[19:03:56] gifti: is a single job spawning workers?
[19:04:04] gifti: At first glance there were 191 workers when that happened.
[19:04:25] Betacommand: yes
[19:05:08] gifti: then depending on how the OoM killer works it might be looking at combined memory usage totals for killing
[19:05:13] gifti: Ah, they're not synchronous. Ignore what I just said - 27 died at once, it's just the highest numbered one that was 191
[19:05:52] IE vmem or similar could have gotten too high
[19:07:48] (PS1) John F. Lewis: send WMT bugs to #wmt [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/195104
[19:08:04] 03/08/2015 05:35:58| main|tools-exec-gift|W|job 8737604 exceeds job hard limit "h_vmem" of queue "giftbot@tools-exec-gift.eqiad.wmflabs" (296910848.00000 > limit:268435456.00000) - sending SIGKILL
[19:08:24] Aha! It's your local shepherd that went boom.
[19:08:47] (CR) Legoktm: [C: 2] send WMT bugs to #wmt [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/195104 (owner: John F. Lewis)
[19:08:59] (Merged) jenkins-bot: send WMT bugs to #wmt [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/195104 (owner: John F. Lewis)
[19:10:02] Wait, gifti, are you putting in an h_vmem limit?
[19:10:16] no
[19:11:16] gifti: You're being attacked by a stray default, it looks like.
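Editor's note on the numbers in the 19:08:04 SIGKILL entry: the sketch below is only an illustration of the arithmetic, not anything that was run in the channel. It pulls the two byte counts out of that log line, converts them to MiB, and also writes down the (ram+swap-500M) rule of thumb from 18:52:20. The function names are hypothetical, and reading "500M" as 500 MiB is an assumption.

```python
import re

MIB = 1024 * 1024

# The gridengine message quoted in the channel at 19:08:04.
LOG_LINE = (
    '03/08/2015 05:35:58| main|tools-exec-gift|W|job 8737604 exceeds job hard '
    'limit "h_vmem" of queue "giftbot@tools-exec-gift.eqiad.wmflabs" '
    '(296910848.00000 > limit:268435456.00000) - sending SIGKILL'
)


def parse_h_vmem_kill(line):
    """Return (used_bytes, limit_bytes) from an 'exceeds job hard limit' line.

    Hypothetical helper: the pattern only targets the message shape seen above.
    """
    match = re.search(r'\((\d+)(?:\.\d+)? > limit:(\d+)(?:\.\d+)?\)', line)
    if match is None:
        raise ValueError('no "(used > limit:max)" pair found in line')
    return int(match.group(1)), int(match.group(2))


def conservative_node_limit(ram_bytes, swap_bytes, headroom_bytes=500 * MIB):
    """Node-wide h_vmem cap following the (ram+swap-500M) rule of thumb.

    Assumes "500M" means 500 MiB; the real headroom is the admin's choice.
    """
    return ram_bytes + swap_bytes - headroom_bytes


used, limit = parse_h_vmem_kill(LOG_LINE)
overshoot = used - limit

print('limit:     %.2f MiB' % (limit / MIB))
print('used:      %.2f MiB' % (used / MIB))
print('overshoot: %.2f MiB' % (overshoot / MIB))
```

The limit works out to exactly 256 MiB, and the job was roughly 27 MiB over it when it was killed.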
[19:11:48] 268435456 is 256M
[19:12:31] hm
[19:12:37] interesting
[19:13:16] Coren: called it :P
[19:13:42] 256M is indeed the default; but it looks like it's being applied to you even though there is no h_vmem limit for the node. How annoying.
[19:14:00] gifti: It's per job though - not per task. So it's the sum that's biting you.
[19:14:28] gifti: Try starting you tasks with -l h_vmem=1G or so.
[19:14:47] !log tools.wikibugs Updated channels.yaml to: 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt
[19:14:51] Logged the message, Master
[19:14:57] But also, if the behaviour changed the default most certainly did not in the past 2 years - so something /did/ happen to increase the memory your jobs use.
[19:15:24] ok
[19:15:46] i will try to run the job with the default interpreter and see again
[19:15:55] Coren: it could have been just skirting the limit, and something minor caused a few extra bytes to push it over the limit
[19:16:15] … which i already do
[19:16:28] I ran into that a while back with a PHP update
[19:21:43] Tool-Labs-tools-WMT-bots: Redesign how setlists are generated - https://phabricator.wikimedia.org/T91941#1099275 (JohnLewis) NEW a:JohnLewis
[19:24:47] yay, seems to work again
[19:25:08] Coren: thank you for your analysis!
[19:31:08] legoktm: thanks :)
[20:24:03] Tool-Labs-tools-WMT-bots: -delete is broken - https://phabricator.wikimedia.org/T62750#1099290 (Southparkfan) $page (the messed up variable) is equal to $matches2[2], which is defined by a preg_match - and the input string for that preg_match is given by yet another preg_match. Less preg_matches might help h...
[20:36:28] (PS1) John F. Lewis: restructure the repo [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195111
[20:45:56] (CR) Southparkfan: "Haven't tested the "new bot" yet, but seems good." [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195111 (owner: John F. Lewis)
[20:52:00] (CR) Aldnonymous: [C: 1] "Seems the typo already fixed/removed." [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195111 (owner: John F. Lewis)
[21:09:50] (PS2) John F. Lewis: restructure the repo [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195111
[21:13:33] (CR) John F. Lewis: [C: 2 V: 2] restructure the repo [labs/tools/WMT] - https://gerrit.wikimedia.org/r/195111 (owner: John F. Lewis)
[21:24:11] https://tools.wmflabs.org/reasonator/?q=Q7817498 Reasonator is down ... is it the webserver ?
[21:29:15] GerardM-: no
[21:32:13] it does not work as you can see
[21:34:02] GerardM-: It's some of the javascript that fails to load off magnustools.
[21:55:48] Labs, Tool-Labs: Multithreading does not seem to work on uwsgi in toollabs - https://phabricator.wikimedia.org/T91156#1099342 (scfc) @Mjbmr: @yuvipanda probably didn't want to advise you on how to code, but solve the issue you claimed there is. There are various helpful hints for good bug reporting, with...
[23:38:03] Hey, X!'s tool needs a restart.
[23:44:26] Cyberpower678: It appears as if X!'s tool is down. Do you mind restarting it when you get the time? Thanks.