[00:11:06] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Armyboy2 was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1029574 edit summary:
[00:20:44] Labs, Labs-Kubernetes, Tool-Labs: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2824933 (scfc)
[00:23:13] Labs, Labs-Kubernetes, Tool-Labs: Check that all k8s nodes are in 'ready' condition - https://phabricator.wikimedia.org/T140248#2824937 (scfc)
[00:23:44] Labs, Labs-Kubernetes, Tool-Labs: Health check for k8s etcd - https://phabricator.wikimedia.org/T140247#2824939 (scfc)
[00:24:22] Labs, Labs-Kubernetes, Tool-Labs: Monitor k8s flannel etcd health - https://phabricator.wikimedia.org/T140246#2824941 (scfc)
[00:25:45] Labs, Labs-Kubernetes, Tool-Labs, Tracking: Packages to be installed in Tool Labs Kubernetes Images (Tracking) - https://phabricator.wikimedia.org/T140110#2824945 (scfc)
[00:25:47] Labs, Labs-Kubernetes, Tool-Labs, Patch-For-Review: Install libmysqlclient-dev in tools python2 kubernetes containers - https://phabricator.wikimedia.org/T140112#2824943 (scfc) Open>Resolved a:yuvipanda
[00:26:50] Labs, Tool-Labs: Setup running uwsgi webservices on k8s - https://phabricator.wikimedia.org/T139783#2824946 (scfc)
[00:30:16] Labs, Labs-Kubernetes, Tool-Labs: Move tools-db and tools-redis into DNS - https://phabricator.wikimedia.org/T139190#2824952 (scfc) p:Triage>Normal
[00:31:02] Labs, Tool-Labs, Patch-For-Review: Add yanker redirect - https://phabricator.wikimedia.org/T136924#2824955 (scfc) p:Triage>Normal a:valhallasw
[00:32:34] Labs, Tool-Labs: Update nginx on tools and labs proxies and static file server - https://phabricator.wikimedia.org/T134383#2824958 (scfc)
[00:33:26] Labs, Labs-Kubernetes, Tool-Labs: Setup monitoring for kubernetes core components. - https://phabricator.wikimedia.org/T131929#2824960 (scfc)
[00:36:11] Labs, Tool-Labs: Convert most top level tool and bastion dns records to CNAMEs - https://phabricator.wikimedia.org/T131796#2824962 (scfc)
[00:37:21] Labs, Tool-Labs, Patch-For-Review: Offer Korean Locales "ko_KR.euckr" and "ko_KR.utf8" on Tool Labs - https://phabricator.wikimedia.org/T130532#2824964 (scfc) a:valhallasw
[00:39:27] Labs, Tool-Labs, Scap3 (Scap3-Adoption-Phase1): Setup a proper deployment strategy for Kubernetes - https://phabricator.wikimedia.org/T129311#2824971 (scfc)
[00:41:03] Labs, Tool-Labs, Operations, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2824992 (scfc)
[00:45:52] Labs, Tool-Labs, DBA: Tool Labs queries die - https://phabricator.wikimedia.org/T127266#2824997 (scfc)
[00:46:46] Labs, Tool-Labs, Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#2825001 (scfc)
[00:46:49] Labs, Tool-Labs, Patch-For-Review: Install inkscape - https://phabricator.wikimedia.org/T126933#2824999 (scfc) Open>Resolved a:valhallasw
[00:49:00] Labs, Tool-Labs: Instrument jsub/jstart/webservices usage - https://phabricator.wikimedia.org/T123444#2825002 (scfc)
[00:52:24] Labs, Tool-Labs, labs-sprint-116, labs-sprint-117, labs-sprint-118: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#2825005 (scfc)
[00:53:55] Labs, Tool-Labs, Patch-For-Review, User-bd808: install python-ldap dependencies - https://phabricator.wikimedia.org/T114388#2825006 (scfc) Open>Resolved a:bd808
[00:56:42] Labs, Tool-Labs: Reduce amount of Tools-local packages - https://phabricator.wikimedia.org/T91874#2825010 (scfc)
[00:59:14] Labs, Tool-Labs: Make http (404, 302, 301 etc) statistics for toolserver.org - https://phabricator.wikimedia.org/T85167#2825012 (scfc)
[00:59:55] Labs, Tool-Labs, Puppet: Puppetize adding a host to a particular queue - https://phabricator.wikimedia.org/T88713#2825013 (scfc)
[01:00:46] Labs, Tool-Labs: /etc/cron.daily/logrotate: gzip: stdin: file size changed while zipping - https://phabricator.wikimedia.org/T96007#2825014 (scfc)
[01:01:27] Labs, Tool-Labs, puppet-compiler: toolsbeta: set up puppet-compiler / temporary-apply - https://phabricator.wikimedia.org/T97081#2825015 (scfc)
[01:03:46] Labs, Tool-Labs: support python3 uwsgi apps - https://phabricator.wikimedia.org/T104374#2825018 (scfc)
[01:04:42] Labs, Tool-Labs, Operations, Traffic, HTTPS: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#2825020 (scfc)
[01:06:27] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#2825021 (scfc)
[01:16:18] Labs, Tool-Labs, Mail: Set up shinken for tools-mail exim paniclog - https://phabricator.wikimedia.org/T96898#2825024 (scfc)
[01:34:19] Labs, Tool-Labs: Implement metrics for tool labs (under NDA?) - https://phabricator.wikimedia.org/T121233#2825029 (scfc)
[02:21:26] Labs, Labs-Kubernetes, Tool-Labs: Move kubernetes authentication to using X.509 client certs - https://phabricator.wikimedia.org/T144153#2590052 (scfc) I had thought about using certificates to authenticate tools in services in the past, mostly so that there would be only one "thing" per tool and not...
[02:25:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[03:00:00] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:22:59] yuvipanda: see #wikimedia-operations, what's going on there?
[03:24:57] doctaxon: we are seeing some instability in the api servers. Folks are looking at it.
[03:25:18] yes, i get many 502 and 503 errors
[03:25:34] nothing's working any more
[03:26:44] also see T151686
[03:26:45] T151686: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686
[03:27:46] doctaxon: "nothing's working" might be a bit hyperbolic. we are seeing increased 5xx api responses but it's not a complete outage by any stretch
[03:35:40] bd808: but those 5xx API responses are VERY increased
[03:35:58] but you're working on it, and thanks for that
[05:29:59] !log tools.sal Fixed long standing pagination bug and upgraded PHP libraries
[05:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.sal/SAL
[06:33:18] PROBLEM - Puppet run on tools-worker-1015 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[06:37:33] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[06:40:00] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 531 bytes in 0.019 second response time
[07:01:24] Hi all. I'm running an IRC bot. I jstopped and jstarted my program and the bot can no longer connect to Freenode. The bot connects fine in the login server.
[07:02:05] Wonder what can possibly be the cause of this.
[07:02:17] And written in Python
[07:02:43] Yup.
[07:03:46] I was thrown a broken pipe error, so definitely sounds like something network related.
[07:05:03] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.041 second response time
[07:08:17] RECOVERY - Puppet run on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:12:31] dargasia: freenode throttles the number of clients per IP, so it's possible you're just hitting a limit that has nothing to do with your bot.
[07:12:34] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:12:44] I don't immediately recall how we deal with that in general, though :(
[07:13:13] probably best to write to labs-l or re-ask here when people are more awake
[07:13:34] Andrewbogott short term solution would be to give all exec nodes a floating ip
[07:14:09] Ah, that might be the cause then.
[07:14:20] yuvipanda: yeah, that was the reason we have them on half the nodes, right?
[07:14:21] I tried again and got placed on a different node / outbound IP, and things started working again.
[07:14:33] seems like we should be able to beg an exception from freenode
[07:14:41] dargasia: heh, yeah, that sounds like exactly the issue then
[07:15:37] Andrewbogott right. Should get them on all for now
[07:16:04] I'll make a bug, unless you think there's already one...
[07:17:16] Andrewbogott please do
[07:19:38] Labs, Tool-Labs: Freenode sometimes throttles bot connections from tools - https://phabricator.wikimedia.org/T151704#2825129 (Andrew)
[12:31:24] learning how to use the grid: I run
[12:31:26] echo "mvn package" | qsub -cwd -e mvnpackage.log.err -N mvnDbnaryEtymology -o mvnpackage.log.out
[12:31:35] and got error Eqw
[12:31:46] any idea why?
[12:33:36] Epantaleo: I'm not sure what that command is supposed to do
[12:34:04] to run command mvn package
[12:34:34] you have to pass the command as parameter to qsub (or jsub, which has a friendlier interface)
[12:34:49] i.e. qsub/jsub mvn package
[12:35:57] thanks. when I do qsub -cwd -e mvnpackage.log.err -N mvnDbnaryEtymology -o mvnpackage.log.out mvn package
[12:36:14] I get Unable to read script file because of error: error opening mvn: No such file or directory
[12:37:47] if I run "mvn package" on the tools bastion it runs but gives me: Error occurred during initialization of VM
[12:37:47] [ERROR] java.lang.OutOfMemoryError: unable to create new native thread
[13:13:23] Epantaleo: I don't know what the exact parameters to qsub would be, but jsub takes a -mem parameter
[13:13:32] to request a given amount of memory
[13:13:38] for maven, you probably want something like 1GB
[13:13:52] thanks
[13:13:53] there's a few notes on wikitech on what java settings work best, I think
[13:14:35] what I have understood by rerunning the command is that on the grid it doesn't find mvn as a command
[13:14:43] while on the bastion it has it
[13:15:11] ah. That could be.
[13:15:24] you could try using tools-dev instead of tools-login
[13:15:42] tools-dev is meant for things like compiling things, and may have some more memory available
[13:15:54] the java error was given by the tools-dev
[13:18:19] I don't really know enough about java memory management to really be of help here :(
[14:03:45] (PS1) Jean-Frédéric: Add script to print a ready-made deploy message for SAL [labs/tools/heritage] - https://gerrit.wikimedia.org/r/323689
[14:05:01] (CR) jenkins-bot: [V: -1] Add script to print a ready-made deploy message for SAL [labs/tools/heritage] - https://gerrit.wikimedia.org/r/323689 (owner: Jean-Frédéric)
[14:18:30] (PS1) Jean-Frédéric: Fix Flake8 violation E305 and pin flake8 dependency [labs/tools/heritage] - https://gerrit.wikimedia.org/r/323692
[14:53:05] (PS2) Jean-Frédéric: Add script to print a ready-made deploy message for SAL [labs/tools/heritage] - https://gerrit.wikimedia.org/r/323689
[17:02:06] valhallasw`cloud: legoktm: Hi. I see you're both listed as maintainers of ReleaseTaggerBot. It's not working for some days now. Can you restart it? https://phabricator.wikimedia.org/T151725 Thanks.
[18:28:53] I have a problem with wdq.wmflabs.org . I seem to always get timeouts. Anyone else?
[18:29:21] Timeouts doing what?
[18:29:24] Trying to run queries?
[18:29:28] Just viewing the page?
[18:30:00] Example: https://wdq.wmflabs.org/wdq?q=string[214:%2264192849%22]
[18:30:07] Running queries
[18:31:00] seems my internet is sucking
[18:31:42] https://wdq.wmflabs.org/api/ itself doesn't respond as well , but https://wdq.wmflabs.org/ does
[18:32:25] Yeah, it does seem rather slow
[18:32:38] I'd suggest filing a task in phabricator, someone should look at it probably tomorrow
[18:43:18] (PS4) Zppix: Adding quit message when issued command 'force-restart' [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323342 (https://phabricator.wikimedia.org/T151508)
[18:45:34] (PS5) Zppix: Adding quit message when issued command 'force-restart' [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323342 (https://phabricator.wikimedia.org/T151508)
[19:19:13] (Draft1) Paladox: Test: Do not merge [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709
[19:19:17] (Draft2) Paladox: Test: Do not merge [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709
[19:20:13] (CR) Zppix: "can you please, make this appart of the other change to keep it from getting messy?" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:21:14] (CR) Paladox: "> can you please, make this appart of the other change to keep it" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:22:35] (CR) Zppix: "No i mean add it to the gerrit change i created for the phabricator task" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:23:22] (CR) Paladox: "Oh, but that will be messy and will mess your change up, this is for marktraceur to see what I have done so far." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:23:46] (CR) Zppix: "Isnt this basically the same exact change?" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:24:27] (CR) Paladox: "No, this is what I have done locally, this dosent work correctly yet, but once it is working we can add it to your change." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:25:11] (CR) Zppix: "ok" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:36:01] (PS3) MarkTraceur: [WIP] Attempting to send a quit message [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:37:53] (PS4) MarkTraceur: [WIP] Attempting to send a quit message [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[19:40:34] (PS5) MarkTraceur: [WIP] Attempting to send a quit message + add logging [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323709 (owner: Paladox)
[20:24:25] (PS1) MarkTraceur: Remove hacks, fix logging [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323712
[20:24:59] (CR) jenkins-bot: [V: -1] Remove hacks, fix logging [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323712 (owner: MarkTraceur)
[20:25:32] Labs, Labs-Kubernetes, Tool-Labs: etcd hosts hanging with kernel hang - https://phabricator.wikimedia.org/T140256#2825802 (hashar) I haven't seen that kernel soft lock occurring for a while. I guess it was a bug in the kernel that ran on labvirt hosts.
[20:26:01] (PS2) MarkTraceur: Remove hacks, fix logging [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/323712
[20:28:54] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2824464 (scfc) I don't really like the idea. I think there would be quite a few users trying out `sqldump enwiki_p`.
[20:30:45] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2825807 (Reedy) >>! In T151680#2825805, @scfc wrote: > I don't really like the idea. I think there would be quite a few users trying out `sqldump enwiki_p`. Well, you can prevent it, if you really care. Of course, they c...
[20:38:02] Reedy: ^ I'm confused about the use case here. What part of it can't be solved by just using mysqldump? Dumping a production database on tool labs doesn't sound like a very common use case...
[20:38:22] valhallasw`cloud: A user was trying to dump one of their own tables
[20:38:26] /db's
[20:39:39] mmm
[20:40:04] and you want to auto-fill the --defaults-file --h enwiki_p.labsdb part
[20:40:16] pretty much
[20:40:40] As I linked, I did it for prod to scratch an itch
[20:42:51] yes, but there *you* do the maintenance ;-)
[20:44:15] I don't think anyone has made any changes to it since :P
[20:44:18] nothing functional anyway
[20:46:19] Though. You can template it out, and make 95% of it common
[20:47:08] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2825813 (valhallasw)
[20:47:19] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2824464 (valhallasw) If we can somehow merge this into the existing sql script (e.g. by having one script with two names, or by copying out the server resolution logic): sure. Otherwise, I'm afraid the maintenance is going...
[20:56:53] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2824464 (Platonides) >>! In T151680#2825805, @scfc wrote: > I don't really like the idea. I think there would be quite a few users trying out `sqldump enwiki_p`. They will end up with a VIEW definition ;) @Valhallasw t...
[20:58:32] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2825822 (Reedy) Puppet templates could be used to bring in the common stuff... And then have the different stuff at the bottom... Or parameterise that in too so you have separate scripts on disk
[20:59:56] Reedy: building bash scripts with puppet templates? https://cdn.meme.am/cache/instances/folder251/59558251.jpg
[21:00:13] It's already done... numerous times
[21:29:14] Labs, Tool-Labs: Create sqldump script - https://phabricator.wikimedia.org/T151680#2825848 (scfc) Ah, I missed the `VIEW` part for `enwiki_p`. In that case I don't have concerns.
[21:38:07] grrrit-wm: force-restart
[21:38:08] Re-connecting to Gerrit and IRC.
[21:38:50] re-connected to Gerrit and IRC.
[22:52:08] Tool-Labs-tools-nlwikibots: tvpmelder broken - https://phabricator.wikimedia.org/T151734#2825959 (Akoopal)
[22:59:26] Tool-Labs-tools-nlwikibots: tvpmelder broken - https://phabricator.wikimedia.org/T151734#2825979 (Akoopal) Open>Invalid False alert, I didn't see any reports in the contributions quickly, and I saw a nomination that was not reported to the creator, but that must have been a one off, looking better I...