[06:43:43] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [06:47:18] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [06:51:55] PROBLEM - Puppet failure on tools-webgrid-05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [07:13:38] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [07:17:17] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0] [07:21:01] Legoktm: ? [07:21:56] RECOVERY - Puppet failure on tools-webgrid-05 is OK: OK: Less than 1.00% above the threshold [0.0] [07:28:59] legoktm: wait, a local merge?? [07:29:05] legoktm: because 9538cc69ef4226d248a38fa86dadca6d646b6b37 sure is not on github [07:30:15] legoktm: oh, I think that's my fault. [07:31:15] legoktm: reset to origin/master now [07:33:30] autolist might be a good thing to move to hhvm [07:40:14] !log tools increase memory limit for autolist from 4G to 7G [07:40:17] Logged the message, Master [07:44:17] Tool-Labs: Autolist keeps dying - https://phabricator.wikimedia.org/T86134#962008 (yuvipanda) NEW [07:44:32] Tool-Labs: Autolist keeps dying - https://phabricator.wikimedia.org/T86134#962008 (yuvipanda) (This should probably be filed elsewhere) [07:45:20] Tool-Labs: Autolist keeps dying - https://phabricator.wikimedia.org/T86134#962015 (yuvipanda) I have: - Added a bigbrotherrc file so it would get restarted automatically - Move it to trusty, where it will use php5.5 which is faster / more memory efficient - Bump up the memory limit for it from 4G (standard)... 
[07:47:42] 3Wikidata, Tool-Labs: Autolist keeps dying - https://phabricator.wikimedia.org/T86134#962016 (10Lydia_Pintscher) [08:09:56] (03CR) 10Merlijn van Deen: "If we want to do this (I'm not sure whether this is actually better than just having #wikimedia-releng contain all projects), I'd rather s" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183371 (https://phabricator.wikimedia.org/T86053) (owner: 10Hashar) [08:46:39] 3Tool-Labs: Investigate using monit to replace bigbrother - https://phabricator.wikimedia.org/T76840#962127 (10yuvipanda) [09:01:42] (03CR) 10Hashar: "The use of mapping instead of sequences is indeed rather hacky. That is transient though, we will get rid of the -qa and -devtools channe" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183371 (https://phabricator.wikimedia.org/T86053) (owner: 10Hashar) [09:03:03] 3Labs-Team, Wikimedia-Labs-Infrastructure: port 22 blocked in some cases despite being allowed with security groups - https://phabricator.wikimedia.org/T86143#962149 (10yuvipanda) 3NEW [09:03:31] 3Labs-Team: Set up ssh checks for all labs hosts - https://phabricator.wikimedia.org/T86027#962155 (10yuvipanda) [09:09:21] 3Labs-Team, Wikimedia-Labs-Infrastructure: port 22 blocked in some cases despite being allowed with security groups - https://phabricator.wikimedia.org/T86143#962163 (10yuvipanda) It's also inconsistent - deployment-mediawiki01 and 02 can't be reached, but 03 can be. -salt and -bastion can easily be reached as w... [09:28:05] (03CR) 10Merlijn van Deen: "Copy paste + tests to check sounds good to me!" 
[labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183371 (https://phabricator.wikimedia.org/T86053) (owner: 10Hashar) [09:28:11] (03PS1) 10Gilles: Wikibugs should listen to the Multimedia project for the multimedia channel [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183452 (https://phabricator.wikimedia.org/T77947) [09:29:48] (03CR) 10Merlijn van Deen: [C: 04-1] "Maybe it's better to add Multimedia to the list (instead of replacing it) while Herald rules are not in place yet?" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183452 (https://phabricator.wikimedia.org/T77947) (owner: 10Gilles) [09:30:04] (03CR) 10Merlijn van Deen: [C: 04-1] Duplicate -qa notifcations to -releng [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183371 (https://phabricator.wikimedia.org/T86053) (owner: 10Hashar) [09:31:19] (03CR) 10Gilles: "The list that was there was quite incomplete and we add the multimedia tag to tasks that don't have it almost daily." [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183452 (https://phabricator.wikimedia.org/T77947) (owner: 10Gilles) [09:44:42] (03CR) 10Merlijn van Deen: [C: 032] "OK, if you think that's the best option, let's do that." [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183452 (https://phabricator.wikimedia.org/T77947) (owner: 10Gilles) [09:45:01] (03Merged) 10jenkins-bot: Wikibugs should listen to the Multimedia project for the multimedia channel [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183452 (https://phabricator.wikimedia.org/T77947) (owner: 10Gilles) [09:45:41] gi11es: deployed! 
[09:46:09] legoktm: we should get an automagic pull for wikibugs2 :-p [09:51:43] !log tools.wikibugs Updated channels.yaml to: 019f6b0366a97df69733f7c80303aec8058ecb79 Wikibugs should listen to the Multimedia project for the multimedia channel [09:51:48] Logged the message, Master [10:50:42] (03PS1) 10Merlijn van Deen: Add Wikibugs to -labs [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183464 [10:50:54] (03CR) 10jenkins-bot: [V: 04-1] Add Wikibugs to -labs [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183464 (owner: 10Merlijn van Deen) [10:51:27] (03PS1) 10Merlijn van Deen: Get translatewiki.net project changes to #mediawiki-i18n [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183465 (https://phabricator.wikimedia.org/T86148) [10:51:43] (03CR) 10jenkins-bot: [V: 04-1] Get translatewiki.net project changes to #mediawiki-i18n [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183465 (https://phabricator.wikimedia.org/T86148) (owner: 10Merlijn van Deen) [10:52:19] grrr [10:52:49] (03PS2) 10Merlijn van Deen: Add Wikibugs to -labs [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183464 [10:53:12] (03PS2) 10Merlijn van Deen: Get translatewiki.net project changes to #mediawiki-i18n [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183465 (https://phabricator.wikimedia.org/T86148) [10:53:20] YuviPanda: ^ rubberstamp kthx? 
:> [10:53:27] Ok [10:55:03] (03CR) 10Yuvipanda: [C: 031] Get translatewiki.net project changes to #mediawiki-i18n [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183465 (https://phabricator.wikimedia.org/T86148) (owner: 10Merlijn van Deen) [10:55:17] valhallasw`cloud: rubber stamped [10:55:27] YuviPanda: you have +2, silly person ;-) [10:55:37] (03CR) 10Merlijn van Deen: [C: 032] Get translatewiki.net project changes to #mediawiki-i18n [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183465 (https://phabricator.wikimedia.org/T86148) (owner: 10Merlijn van Deen) [10:55:56] (03Merged) 10jenkins-bot: Get translatewiki.net project changes to #mediawiki-i18n [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183465 (https://phabricator.wikimedia.org/T86148) (owner: 10Merlijn van Deen) [10:55:59] YuviPanda: also https://gerrit.wikimedia.org/r/183464 ? [10:56:46] I'm also on my phone and didn't want to merge things I'm not deploying [10:56:56] YuviPanda: OK [10:57:02] (03CR) 10Yuvipanda: [C: 031] Add Wikibugs to -labs [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183464 (owner: 10Merlijn van Deen) [10:57:48] (03CR) 10Merlijn van Deen: [C: 032] Add Wikibugs to -labs [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183464 (owner: 10Merlijn van Deen) [10:57:53] thx [10:57:57] * valhallasw`cloud hugs YuviPanda :3 [10:58:01] (03Merged) 10jenkins-bot: Add Wikibugs to -labs [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183464 (owner: 10Merlijn van Deen) [10:58:11] :D [10:59:23] !log tools.wikibugs Updated channels.yaml to: 29b1c027a31c7650094b195e70e3a4ac82c05d00 Merge "Add Wikibugs to -labs" [10:59:23] 3translatewiki.net, Wikibugs: Get translatewiki.net project changes to #mediawiki-i18n - https://phabricator.wikimedia.org/T86148#962312 (10valhallasw) 5Open>3Resolved a:3valhallasw [10:59:25] Logged the message, Master [11:34:27] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 55.56% of data above the 
critical threshold [0.0] [11:59:23] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0] [13:04:09] translatewiki.net, Wikibugs: Get translatewiki.net project changes to #mediawiki-i18n - https://phabricator.wikimedia.org/T86148#962573 (Nikerabbit) Thanks for the swift fix. [13:17:57] Continuous-Integration, Wikimedia-Labs-General: Create labs project for continuous integration nodepool - https://phabricator.wikimedia.org/T55978#962605 (Krinkle) [13:18:11] Continuous-Integration, Wikimedia-Labs-General: Create labs project for continuous integration nodepool - https://phabricator.wikimedia.org/T55978#578885 (Krinkle) p:Triage>Normal [13:26:18] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<11.11%) [13:36:16] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [13:52:42] Continuous-Integration, Wikimedia-Labs-General: Create labs project for continuous integration nodepool - https://phabricator.wikimedia.org/T55978#962689 (hashar) I am merging this up in its parent task {T47499}. Will fill some new tasks for each of the item listed there. 
[13:52:56] Continuous-Integration, Wikimedia-Labs-General: Create labs project for continuous integration nodepool - https://phabricator.wikimedia.org/T55978#962691 (hashar) [13:56:24] Continuous-Integration, Wikimedia-Labs-Infrastructure: Create labs project for CI disposables instances - https://phabricator.wikimedia.org/T86167#962706 (hashar) NEW [14:02:16] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<11.11%) [14:04:25] Continuous-Integration, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#962741 (hashar) NEW [14:04:40] Continuous-Integration, Wikimedia-Labs-Infrastructure: Create labs project for CI disposables instances - https://phabricator.wikimedia.org/T86167#962749 (hashar) [14:06:28] Continuous-Integration, Wikimedia-Labs-Infrastructure: Create labs project for CI disposables instances - https://phabricator.wikimedia.org/T86167#962706 (hashar) [14:06:40] Continuous-Integration, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#962755 (hashar) [14:11:00] Continuous-Integration, Wikimedia-Labs-General: Create labs project for continuous integration nodepool - https://phabricator.wikimedia.org/T55978#962770 (hashar) [14:11:44] Continuous-Integration, Wikimedia-Labs-General: Create labs project for continuous integration nodepool - https://phabricator.wikimedia.org/T55978#578885 (hashar) I have filled new tasks for the list of item that were there: {T86168} {T86170} {T84989} [14:25:52] Labs-Team: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#962811 (hashar) [14:29:06] Continuous-Integration, Wikimedia-Labs-Infrastructure: Create labs project for CI disposables instances - https://phabricator.wikimedia.org/T86167#962706 (hashar) 
[14:32:16] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [14:35:52] Hey, *I* actually need help from a tool writer for a change. :-) Anyone here with experience enough with OAuth to write me a (I expect) trivial little webservice in short order? [14:36:20] Coren: sure [14:36:35] Coren: does it have to be in perl, or would python also work ;-) [14:42:32] nooooo peeerrrllll! [14:46:29] Coren: any idea what could be causing https://phabricator.wikimedia.org/T86143 [14:48:32] goooood morning [14:49:06] !log contincloud created project per request at https://phabricator.wikimedia.org/T86167 [14:49:07] YuviPanda: some beta cluster instances have ferm::rules() [14:49:07] contincloud is not a valid project. [14:49:14] !log contintcloud created project per request at https://phabricator.wikimedia.org/T86167 [14:49:18] Logged the message, Master [14:49:22] hashar: but mediawiki01 and 02 aren’t reachable, but 03 is [14:49:33] * hashar blames openstack [14:49:40] 03 is half backed iirc [14:49:42] 3Labs-Team: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#962873 (10yuvipanda) [14:49:46] 3Continuous-Integration, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#962876 (10yuvipanda) [14:53:17] 3Labs-Team, Wikimedia-Labs-Infrastructure: port 22 blocked in some cases despite being allowed with security groups - https://phabricator.wikimedia.org/T86143#962903 (10hashar) Lot of the beta cluster instances have ferm::rules applied to them. I can confirm deployment-mathoid and mediawiki02 both have the ferm... [14:53:24] YuviPanda: details at https://phabricator.wikimedia.org/T86143#962903 [14:53:41] YuviPanda: tl;dr the ferm:rules default rules in puppet have to be adjusted to allow ssh from the Shinken host [14:54:39] hashar: ah, hmm. 
I was suspecting that as well, but got thrown off by mw03 didn’t have it [14:54:43] let me poke around [14:54:45] hashar: thanks! [14:56:15] YuviPanda: and thanks for the labs project creation! [15:06:20] YuviPanda: Looking [15:06:31] andrewbogott_afk: Ping when you are around. [15:06:35] Coren: greg-g asked me yesterday about the labs DNS madness status. Seems it still fails to respond from time to time and no idea how I can help. It might just be overloaded. [15:07:07] hashar: It is. As soon as I manage to corner Andrew I have an alleviating measure to deploy [15:07:34] Coren: have you looked whether some IP might be spamming the dns server? [15:07:42] hashar: But I don't want to do something that drastically labs-wide without a nod from Andrew (and possibly an OMG-don't-do-it-that-way-or-X!) [15:07:53] haha [15:08:21] Coren: hmm, so base::firewall has ' rule => 'saddr $MONITORING_HOSTS ACCEPT;’,' [15:08:27] where is the $MONITORING_HOSTS set? [15:08:39] hashar: There are a couple outliers, but I fear we're just hitting against the (mediocre, wimpy) capacity of dnsmasq [15:08:48] I don’t see it anywhere else in puppet [15:09:29] YuviPanda: akosiaris might know [15:09:41] YuviPanda: network.pp [15:09:50] it's a ferm variable [15:10:03] oh, that explains why my all caps GREP didn’t catch that [15:10:11] paravoid sees all, and knows all. :-) [15:10:21] modules/base/templates/firewall/defs.erb is the magic that creates it [15:10:28] and indeed, akosiaris wrote that [15:10:40] Coren: if there are any offenders from beta cluster / contint I am willing to know. Can surely fix them [15:11:18] hashar: Well, I didn't separate by project since there were no major outliers, but lemme see if there are big ones from beta. [15:13:57] paravoid: nice. [15:14:04] I think the labs IPs there might use some cleaning up [15:14:42] there = network.pp? 
[15:14:45] yeah [15:15:09] feel free :) [15:15:16] yeah, investigating to make sure :) [15:20:01] 3Labs-Team: Cleanup labs IPs in network.pp - https://phabricator.wikimedia.org/T86183#963017 (10yuvipanda) 3NEW [15:22:12] 3Labs-Team: Cleanup labs IPs in network.pp - https://phabricator.wikimedia.org/T86183#963031 (10yuvipanda) p:5Triage>3Low [15:31:45] Coren: labstore1001 disk space is on a WARNING state for 15d [15:31:52] 15d21h to be exact :) [15:32:25] paravoid: Moar hardware on the way. I don't want to expand the lvm before next week's copy though. [15:32:34] ok [15:32:40] paravoid: That said, I should proably ask the bigger outliers to clean up after themselves. [15:33:00] I'm pretty sure some of that is wasted. [15:34:13] YuviPanda: The biggest DNS users are our proxies, when counted together. Interesting. [15:34:37] Coren: oh, hmm. I can make the proxies hit a different DNS server. [15:34:45] Coren: or, make proxylistener set IP addresses instead? [15:34:51] YuviPanda: Does your nginx config reverse the IPs for the logs? [15:35:06] Coren: no, but it proxies back using domain name, not IP [15:35:13] also why isn’t it caching them!? [15:35:22] Coren: it doesn’t reverse client ip for log, no [15:35:28] YuviPanda: Lemme look at what is being looked up [15:35:56] ok [15:36:53] Coren: we could also set up a local resolver/recursor just on the proxies [15:38:28] Holy stupid, batman. It really looks as though ngnix does _no_ caching whatsoever. [15:39:04] 90% of what your proxy queries is 'maps-tiles3.eqiad.wmflabs.' and 'maps-wma1.eqiad.wmflabs.' [15:39:30] Coren: ugh, that’s very stupid. [15:39:54] Mine is even more "funny". It doesn't have issues with the real names, but it keeps looking up 'tools-webgrid-01.' and so on at the TLD [15:39:55] Coren: it also doesn’t use system resolver, we have to specify its resolver manually [15:40:34] Coren: so looks like we can solve our issues with a local dns resolver in each of those machines? 
:) [15:40:37] YuviPanda: I think our best solution for both of those is to add a caching resolver [15:40:42] yeah [15:40:44] agree [15:40:54] Coren: should I take that up or do you want to do it? [15:41:04] I haven’t really played around with DNS before, so this might be a good / bad time [15:41:27] YuviPanda: Together, those are about 40% of the current DNS load. The rest is spread out somewhat evenly around labs, but one of the exe nodes has a disproportionate 10% or so I'm going to look into. [15:41:36] right [15:44:36] YuviPanda: I'm not going to just use Ubuntu's default bind9 config which is literally just a recusor. [15:44:47] s/not just/just/ [15:45:18] * Coren tests. [15:45:26] ok [15:46:41] Ahh, right, but ngnix won't use the local dns by default will it. [15:46:44] * Coren grumbles. [15:47:02] Coren: ah, so nginx doesn’t use *any* DNS by default [15:47:12] Coren: we set the server for it to use. it’s a param to the dynamicproxy class [15:47:16] we can easily set that to 127.0.0.1 [15:48:31] Bleh. If we have to do that anyways, we might as well point them at a more general labswide recursor that will combine caching. [15:48:41] Which is the first plan all along. [15:49:08] And which I will do now. [15:51:24] Coren: even then, having a local recursor would make the proxies themselves faster [15:51:33] since that’ll provide them local caching [15:51:52] Coren: and I suspect that if we take the proxies off the main recursor the load will drop a fair bit [16:08:16] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<33.33%) [16:09:55] YuviPanda: Well, let's try it first then. [16:11:57] Coren: yup. let’s try it on the general proxy first and see what happens [16:13:38] I'm looking in modules/dynamicproxy but I don't see where dns is configured. [16:13:56] Coren: it has a $resolver setting [16:14:18] Ah, I just saw it. 
[16:14:20] Coren: and if you look at domainproxy.conf and urlproxy.conf, they have resolver <%= resolver %>; [16:14:50] So it's actually a class parameter. Goodie. [16:14:55] yeah [16:42:35] 3Labs-Team, Continuous-Integration, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#963208 (10hashar) Dear #Labs-Team , do you have any idea how to provide OpenStack API credentials for the `contintcloud` project ? Would use i... [16:43:47] 3Labs-Team, Continuous-Integration, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#963215 (10yuvipanda) Where are you going to call the API from? From inside labs or from inside production? [16:49:10] YuviPanda: The resolver is already tested as working with that config, btw, it's just a matter of convincing ngnix to use it now. [16:49:30] Coren: :D [16:49:46] Commented to your comments. [16:53:58] YuviPanda: Coren: I am going to leave so no urgency. Could use some comment about how to get credentials to hit our OpenStack API directly ( https://phabricator.wikimedia.org/T86170 ) [16:54:17] hashar: are you going to hit them from inside labs or from outside labs? [16:54:21] as in, from prod? [16:54:24] no clue yet [16:54:36] either prod, a labs server or a labs instance :D [16:54:56] I am still gathering pieces to write down the architecture document [16:55:17] I think prod is not able to hit virt machines [16:55:37] so will probably end up with the system being installed on a bare metal in the labs network, much like labsmon [16:56:46] hashar: yup, so I think you’ll need something like that first. [16:57:00] not sure if you can hit the API from inside labs. [16:57:17] does anyone know what populates /etc/ssh/userkeys/foo on precise labs hosts? 
[16:57:53] YuviPanda: Dude, if this works properly we almost certainly want the bind options to point at $resolver; making a new class is serious overkill imo [16:58:29] YuviPanda: Given that nginx doesn't cache, it makes no sense to not have it live alongide a resolver imo. [16:58:32] Coren: sure, in which case we should drop the $resolver. If it doesn’t work we can just revert and get resolver back? [16:59:26] Wait, so you'll drop it now, then put it back when we parametrize the caching? I'm all for good linting but I think you're overdoing it now. :-) [17:00:02] paravoid: https://phabricator.wikimedia.org/T59751 has some info [17:00:11] paravoid: "" The cron that manages this runs on labstore2 rather than labstore1. (I'm not sure why.) It calls manage-keys, which adds keys to /mnt/keys and logs to /var/log/manage-keys.log. """ [17:00:33] Coren: why can’t we parameterize the caching in this patch itself? [17:00:35] not in puppet? wtf? [17:00:38] paravoid: found while searching Phabricator for /etc/ssh/userkeys which yields some other results https://phabricator.wikimedia.org/search/query/krpUjSoip_sF/#R [17:00:46] paravoid: tech debt from when labs got setup :/ [17:01:04] YuviPanda: yeah you are correct. Should write down my doc first, ask question after :] thx [17:01:18] hashar: That's very very out of date. [17:01:28] Coren: anyway, we’re almost bikeshedding now :) [17:01:47] I also filled a task to fetch the keys from ldap https://phabricator.wikimedia.org/T59752 :D might have been implemented nowadays [17:01:51] YuviPanda: Because I wanted to *test* the actual improvement before spending the brain cycles. :-) [17:02:20] Coren: ah, I see. ok [17:02:34] hmm, I thought we had testing instances for both the proxies [17:02:36] Coren: ok, then where is it? 
[17:03:16] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [17:03:27] paravoid: modules/ldap/manifests/client.pp it's misnamed "manage-keys-nfs" now and runs on the active labstore (atm, labstore1001) [17:03:36] 3Labs-Team, Continuous-Integration, Wikimedia-Labs-Infrastructure: OpenStack API account to control `contintcloud` labs project - https://phabricator.wikimedia.org/T86170#963332 (10chasemp) p:5Triage>3Normal [17:03:50] Coren: oh but that's for /public/keys [17:04:39] root@bastion-restricted1:~# ls /etc/ssh/userkeys/ [17:04:39] root ubuntu [17:04:42] where is that coming from? [17:04:44] and wth is ubuntu? :) [17:04:55] Hm. If manage-keys did more than manage-keys-nfs every did, I wasn't aware. The latter is just supposed to be the former for the NFS server vs gluster. [17:05:41] modules/admin/manifests/user.pp ? [17:05:45] no [17:05:53] admin isn't even applied in labs [17:06:01] Ah, true. [17:06:11] I know scap creates stuff in /etc/ssh/userkeys [17:06:15] But that's it [17:06:28] sure, that I can see [17:06:32] "root" I can't :) [17:06:57] Hm. Maybe it's part of the base image? andrewbogott_afk comments? [17:07:16] beta has a hack to put a key in /etc/ssh, scap doesn't in general [17:07:19] Oh wait, it might also be in -private [17:07:48] good point [17:07:57] it is [17:08:00] well, root is [17:08:03] ubuntu... no clue [17:12:25] off, have a good rest of your day! [17:15:06] Coren: btw, nginx needs to be reloaded manually, it’s not automatic [17:15:16] YuviPanda: Amusing, the proxy isn't subscribed to the change of config file? [17:15:20] Yeah, what you said. :-) [17:15:29] Coren: nope, intentional, I think. [17:15:37] well, it’s intentional in the nginx:: module [17:16:00] Ah, hm. It might require a restart vs reload - doesn't look like the config change had any effect [17:16:16] afaict, nginx is still doing queries itself. [17:16:35] Coren: is this on dynamicproxy? 
[17:16:39] * Coren nods [17:17:37] Lemme try a (more gentle?) force-reload first [17:18:37] No dice. I see the config changed to resolver 127.0.0.1 but it's still doing the queries to 10.68.16.1 [17:19:00] Coren: try a restart? [17:21:30] * Coren inspecting the result now. [17:21:55] (03CR) 10Greg Grossmeier: "Btw, the duplication is only temporary as we move channels. After everything is duplicated/setup in -releng correctly we'll disable the no" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183371 (https://phabricator.wikimedia.org/T86053) (owner: 10Hashar) [17:26:09] OYHGTBFKM [17:26:35] YuviPanda: So, little to no improvement. Except that bind is quite a bit more verbose about what is going on. [17:26:56] dnsmasq returns responses with a TTL of... 0! [17:27:05] oh. [17:27:06] wat [17:27:09] is that why nginx isn’t caching [17:27:16] It may very well be. [17:27:30] gurrr [17:27:34] yes, I didn’t see that either. [17:28:01] Coren: I suppose we should fix *that* [17:28:14] There is no reason why we should have even looked. Who ever heard of a [bleep]ing DNS server giving a TTL of 0 to authoritative answers?! [17:29:23] That also explains with the nscd cache hit rate is so low in general. [17:29:34] * Coren strangles dnsmasq [17:29:54] Now to track down where that insano-braindead TTL comes from. [17:34:15] (03CR) 10Legoktm: "If this is temporary I'd rather just copy-paste since we're going to have to undo this anyways once the other channels are gone." [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/183371 (https://phabricator.wikimedia.org/T86053) (owner: 10Hashar) [17:37:55] Coren: looks like dnsmasq sets ttl to 0 if the ip is coming from the DHCP leases [17:38:03] we can override by setting local-ttl [17:41:03] Except that we can't; dnsmasq settings are managed by openstack [17:41:23] Ah, nvm. 
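[editor's note] The TTL-of-0 finding above can be illustrated with a toy model. This is only a sketch of the general caching idea, not dnsmasq, bind, or nginx code: `TtlCache`, `count_upstream_queries`, the address `10.0.0.1`, and the 300-second value are all hypothetical names and numbers chosen for illustration. It assumes a cache that reuses an answer only while the current time is strictly before the answer's expiry.

```python
class TtlCache:
    """Toy caching resolver: remembers answers until their TTL expires."""

    def __init__(self, upstream, clock):
        self.upstream = upstream      # callable: name -> (address, ttl_seconds)
        self.clock = clock            # callable returning current time in seconds
        self.entries = {}             # name -> (address, expiry_time)
        self.upstream_queries = 0     # how many lookups reached the upstream server

    def resolve(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]          # still fresh: no upstream traffic
        self.upstream_queries += 1
        address, ttl = self.upstream(name)
        # With ttl == 0 the entry expires at `now`, so it can never be reused.
        self.entries[name] = (address, now + ttl)
        return address


def count_upstream_queries(ttl, lookups=1000, seconds_between=0.1):
    """Resolve one name repeatedly and count the queries that reach upstream."""
    now = [0.0]                       # simulated clock, advanced by hand
    cache = TtlCache(lambda name: ("10.0.0.1", ttl), lambda: now[0])
    for _ in range(lookups):
        cache.resolve("maps-tiles3.eqiad.wmflabs.")
        now[0] += seconds_between
    return cache.upstream_queries
```

In this model, `count_upstream_queries(0)` sends every one of the 1000 lookups upstream, while `count_upstream_queries(300)` sends only the first, which is consistent with the proxies generating a large share of the DNS load while answers carried a TTL of 0.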
[17:41:31] You're doing the change /there/ :-) [17:53:56] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:56:48] PROBLEM - Puppet failure on tools-uwsgi-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:58:22] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [17:59:33] PROBLEM - Puppet failure on tools-exec-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:59:45] PROBLEM - Puppet failure on tools-webgrid-04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:00:44] Coren: ^ I suppose these hit dnsmasq when it was just down? [18:00:47] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [18:02:47] 3Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#963720 (10yuvipanda) dnsmasq was overloaded because it was setting TTL to 0 for all responses, which sounds rather terrible and stupid. Have set it to 300s now, and let's see how this goes. [18:11:53] YuviPanda: Pretty sure that's the case. [18:12:19] Coren: yeah. let’s wait for a day or so I guess and then mark that bug as fixed [18:12:53] Coren: I think this should also be considered an upstream bug... [18:13:36] 3Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#963755 (10coren) Indeed, TTL of 0 lies deep in the middle of batsh*t crazy territory - even single requests can be duplicated needlessly (because glue records). [18:14:33] It's absolutely a bug. There is no sane reason to have a TTL of zero ever. 
If you need crazy fast turnaround of dynamic records you're misusing DNS (or, if you really must, set ttl at 5-10s not 0) [18:14:34] Coren: hmm, tools-exec-gift has puppet failures because it still can’t resolve itself [18:14:53] Coren: I guess it hit a nxdomain cache in the local cache? [18:15:01] nscd has a (short) negative cache [18:15:22] 2 or 5 min, iirc. [18:15:34] I’m restarting it [18:15:37] to see if that fixes it [18:15:51] It won't - the cache persist. You can flush it though. [18:16:31] nscd -i hosts [18:16:31] yeah, invalidating now [18:16:36] why can't i ssh to tools-webgrid-05? >.> [18:17:04] annika_: Works for me. [18:17:09] annika_: What happens when you try? [18:17:11] annika_: from which host? [18:18:00] gifti@tools-login:~$ ssh tools-webgrid-05 [18:18:00] Permission denied (publickey). [18:18:01] Coren: huh, I can’t ssh there either. [18:18:39] Coren: specifically, HBA doesn’t seem to work [18:18:43] Coren: but bastion access does. [18:19:07] Oooooh. It's paravoid's ssh fixes. Changed which host keys are presented. [18:19:23] ECDSA keys are there now. [18:19:33] why would this fail for labs? [18:19:51] paravoid: Probably just tool labs; I collect keys with puppet for HBA [18:20:01] collecting how? [18:20:07] :D [18:20:07] how do you export them? [18:20:24] ssh::hostkeys-collect was adjusted [18:20:24] paravoid: In an ugly way; they're exported to a shared directory and collected by puppet on the bastions. [18:20:40] paravoid: Because labs doesn't have resource collection enabled. [18:20:43] well, you get to keep both pieces :) [18:20:44] :P [18:21:03] 3Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#963781 (10yuvipanda) There will be some failures for the next hour or so - if local nscd has cached an nxdomain for something in the few seconds the dns server was down. You can wait for th... [18:21:07] Yeah, I know. It shouldn't be hard to fix. 
[18:21:27] not sure how your fix will look like but don't make any assumptions about key types [18:21:32] we might move to ed25519 at some point [18:21:40] or whatever is current [18:21:45] where is this code? not puppet? [18:22:03] 3Labs-Team: Fix dnsmasq in nova to not cache - https://phabricator.wikimedia.org/T80293#963782 (10yuvipanda) p:5High>3Normal Actually, *this* should be declined. It *wasn't* caching, and that almost killed it. [18:22:28] 3Labs-Team: Fix dnsmasq in nova to not cache - https://phabricator.wikimedia.org/T80293#963785 (10yuvipanda) See T72076 [18:22:52] paravoid: It's in puppet. modules/toollabs/manifests/init.pp [18:22:59] 3Labs-Team: Fix dnsmasq in nova to not cache - https://phabricator.wikimedia.org/T80293#963788 (10yuvipanda) 5Open>3declined Recreating dead instances with same name should be rare enough for the 5min TTL to not be a big deal, I think. [18:23:21] oops [18:23:29] sorry, I grepped for other variances but I missed that [18:23:48] No worries. Right now, it presumes RSA so I'll have to be a bit smarter about how I do it. [18:23:50] still [18:24:00] Permission denied isn't the right error message [18:24:12] and SSH falls back gracefully when new key types are added [18:24:30] so if your known_hosts contains an RSA key and the server provides both RSA and ECDSA, it prefers RSA [18:24:40] so there shouldn't be any issue here [18:24:45] paravoid: It's not in principle; but because the hosts aren't known HBA is ignored entirely and there is no other usable auth. [18:25:05] paravoid: It looks like the hostkeys might have been... regenerated? That'd be odd. [18:25:15] * Coren digs deeper. [18:25:20] oh I purged unmanaged hostkeys :) [18:25:49] anything that's not managed by puppet (by virtue of the sshkey resource) is now... gone :) [18:26:04] ... yeay? [18:26:16] heh, sorry [18:26:42] What I don't get is that it doesn't seem to have been universal. 
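[editor's note] The host key fallback described above (a client whose known_hosts already holds an RSA key keeps using RSA even when the server also offers ECDSA) can be sketched as follows. This is a simplified model of the behaviour stated in the chat, not OpenSSH's actual implementation; the function names and the default ordering are assumptions made for illustration.

```python
def order_hostkey_algorithms(default_order, known_types):
    """Move key types already recorded for the host to the front.

    Relative order within each group follows `default_order`.
    """
    known_first = [alg for alg in default_order if alg in known_types]
    others = [alg for alg in default_order if alg not in known_types]
    return known_first + others


def pick_host_key(default_order, known_types, server_types):
    """Return the first preferred key type the server offers, or None."""
    for alg in order_hostkey_algorithms(default_order, known_types):
        if alg in server_types:
            return alg
    return None


# Assumed client preference order for this sketch: ECDSA first, RSA last.
DEFAULT_ORDER = ["ecdsa-sha2-nistp256", "ssh-ed25519", "ssh-rsa"]
SERVER_OFFERS = {"ssh-rsa", "ecdsa-sha2-nistp256"}

# Host already known by its RSA key: RSA is still chosen.
known_host_choice = pick_host_key(DEFAULT_ORDER, {"ssh-rsa"}, SERVER_OFFERS)
# Host seen for the first time: the default preference (ECDSA) wins.
new_host_choice = pick_host_key(DEFAULT_ORDER, set(), SERVER_OFFERS)
```

Because nothing in the sketch hard-codes a key type, adding a newer algorithm to the preference list does not change the outcome for hosts already known by an older key, which matches the advice above not to assume a key type.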
[18:26:48] RECOVERY - Puppet failure on tools-uwsgi-01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:26:53] annika_: Can you ssh to tools-webgrid-01? [18:27:16] i can [18:27:35] Ah-ha! [18:27:56] paravoid: Turns out you are correct, the RSA keys still work. [18:28:07] paravoid sees all and knows all. [18:28:24] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0] [18:28:36] based on which hostkeys are already recorded in known_hosts. This [18:28:36] avoids hostkey warnings when connecting to servers with new ECDSA [18:28:36] keys, since these are now preferred when learning hostkeys for the [18:28:39] first time [18:28:45] Your changed just made the error confusing on my end because of the complaint about the new key. [18:29:36] RECOVERY - Puppet failure on tools-exec-02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:29:46] RECOVERY - Puppet failure on tools-webgrid-04 is OK: OK: Less than 1.00% above the threshold [0.0] [18:30:48] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0] [18:31:08] annika_: For some reason, -webgrid-05 didn't have HBA turned on in Wikitech. Fixing now. [18:31:20] thx [18:32:34] annika_: Try now? 
[18:33:07] works [18:33:41] YuviPanda: I'm keeping an eye on dnsmasq, but I'm not seeing drop requests anymore (that its current load is a fraction of what it was surely helped) [18:33:51] Coren: yup [18:33:58] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0] [18:35:29] Labs-Team: Fix dnsmasq in nova to not cache - https://phabricator.wikimedia.org/T80293#963811 (coren) If it ever turns out to be an issue, reducing the TTL slightly would be acceptable (but no lower than 2 min or so - it takes that long to build an instance anyways) [18:35:42] Tool-Labs: Make HHVM based webservices available on toollabs - https://phabricator.wikimedia.org/T78783#963814 (yuvipanda) [18:35:43] Tool-Labs: Create a static file server service for tools - https://phabricator.wikimedia.org/T84982#963812 (yuvipanda) Open>Resolved Done now. [18:41:19] Coren: so, not related at all? [18:42:22] paravoid: Nope. Works fine with RSA still. The only thing your change did was change what appeared in the logs which made it look like your change had an influence. [18:42:44] ok [18:42:56] regardless, your code should be adjusted to prefer ECDSA [18:43:05] It should indeed. [18:43:42] Coren: also, I’m thinking of starting work on a maintain-replicas replacement that would explicitly list columns even for fully whitelisted views. [18:43:47] (and also move it off perl) [18:43:54] Doesn't look like those keys are in facter though, so new facts will be needed. [18:44:18] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<11.11%) [18:44:49] YuviPanda: I know, you already told me that and I already agreed. :-) Only thing is I wouldn't want that to share any code or data with the sanitarium code to avoid the same error propagating. [18:45:15] Coren: hmm, I think they *should* share data, though.
[18:45:25] otherwise it’ll literally just be two copies of the same data [18:45:43] That makes the exercise pointless; it's no longer two layers of defense but the same layer twice. [18:45:52] Coren: wait, ‘sanitarium’? [18:46:09] I meant your audit code. [18:46:21] I was under the impression it ran on the sanitarium? [18:46:32] Coren: no, it hits labsdb. [18:46:36] Ah. [18:47:00] At any rate, they should be entirely independent so that an error in one becomes visible/breaks in the other. [18:47:04] Coren: right now, that is. I’m verifying view definitions only at the moment, mostly because there’s already so many random columns to clean up [18:47:17] Coren: hmm, thoughts on how exactly to do that without sharing data? [18:47:20] Otherwise, any error or typo in the maintain code will just be blitherly okayed by the audit. :-) [18:47:35] Coren: we have no reference for ‘how prod should be’, since prod has a lot more cruft than labsdb :D [18:47:51] We do have one: mw.org documentation of the schema. [18:48:20] nice job finding the ttl=0 thing. [18:48:38] Coren: hmm, *technically* I could parse tables.sql [18:48:40] in core [18:48:53] YuviPanda: That'd be nice, for one of the implementations. [18:49:06] Coren: for the whitelist, maybe. for the greylist? [18:49:09] johang: Thank bind for being more verbose about why it's doing things. [18:49:09] we’d still need to specify conditions [18:49:56] YuviPanda: Indeed; and that's a very good reason why they both should be independently maintained so that if maintain-replicas creates a broken condition the auditor will notice. [18:50:06] Coren: it's not every day someone praises bind. [18:50:21] johang: It's relative praise. "Better than dnsmasq" :-) [18:50:35] Coren: but it’s going to be literally the same file, no? I mean, the code will be totally independent. We have no other source for greylist. [18:51:05] YuviPanda: No, that's my point. It should be two files, maintained independently. No copypasta. 
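Editor's note on the "two layers of defense" argument above: a minimal sketch of what cross-checking two independently maintained column whitelists might look like. This is not maintain-replicas or labsdb-auditor code. The function name `cross_check`, the table name, and the column names are hypothetical and only illustrate the idea that a mistake in one list should surface as a disagreement with the other.

```python
# Hypothetical sketch: compare two separately maintained whitelists and report
# every disagreement, so an error in one file is flagged by the other.
from typing import Dict, List, Set


def cross_check(
    maintained: Dict[str, Set[str]],
    audited: Dict[str, Set[str]],
) -> List[str]:
    """Return human-readable mismatches between the two whitelists."""
    problems = []
    for table in sorted(set(maintained) | set(audited)):
        if table not in maintained:
            problems.append(f"{table}: missing from the view-maintenance list")
            continue
        if table not in audited:
            problems.append(f"{table}: missing from the auditor list")
            continue
        # Columns exposed by the maintenance list that the auditor does not allow.
        for column in sorted(maintained[table] - audited[table]):
            problems.append(f"{table}.{column}: exposed but not allowed by the auditor")
        # Columns the auditor allows that the maintenance list never exposes.
        for column in sorted(audited[table] - maintained[table]):
            problems.append(f"{table}.{column}: allowed by the auditor but not exposed")
    return problems


if __name__ == "__main__":
    # Made-up data: the maintenance list exposes one column too many.
    maintained = {"example_table": {"id", "name", "secret_token"}}
    audited = {"example_table": {"id", "name"}}
    for problem in cross_check(maintained, audited):
        print(problem)
```

If both lists were generated from the same file, this check could never fail, which is the objection raised in the conversation.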
[18:51:14] hmm [18:51:54] Otherwise, any mistake gets put in live without so much as a peep of protestation. [18:52:25] I suppose, that should also have associated processes [18:52:34] and not live in the same repo, for example. [18:53:11] like, different people have to modify them differently [18:53:18] Coren: alright, I’ll think up ways to parse this + the process [18:53:25] Well, same repo seems okay to me - but different people seems very good process. [18:53:48] in fact, if we just set up an airtightish process (ie. having different people update the files), I can just not parse off table.sql [18:53:52] and just have people update files [18:54:15] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [18:56:08] 3Tool-Labs, LabsDB-Auditor: Make labsdb views fully column-whitelist based - https://phabricator.wikimedia.org/T86218#963882 (10yuvipanda) 3NEW [18:56:50] 3Tool-Labs, LabsDB-Auditor: Make labsdb views fully column-whitelist based - https://phabricator.wikimedia.org/T86218#963882 (10yuvipanda) This will replace maintain-replicas.pl. It should live separately from labsdb-auditor and have a separate set of data files to read off. Will also need a documented process t... [19:02:55] anyway, I’m off to sleep now [19:02:56] night [19:03:08] Happy zzzs. [19:03:34] Coren: not so much. have to sleep now b’coz have an MRI scheduled early morning tomorrow (9am) [19:04:15] * YuviPanda|zzz hopefully will wake up on time [19:06:18] * valhallasw`cloud hugs YuviPanda|zzz [19:06:23] good luck tomorrow [19:07:16] bah, i lose my connections to labs more often than normal today [19:08:22] actually, i only lose my connections when i have my adsl reconnect, but not today (except if it has adsl reconnects every hour-ish now …) [19:10:11] annika_: If you type 'last ' you'll see if your IP changed between disconnections. [19:10:33] Also, the labs bastions support mosh; and those survive IPs changing. You could always look into that. 
[19:11:15] it's bastion1 always … maybe i should look there … [19:11:31] last time i tried mosh it didn't work at all [19:11:55] annika_: you can use mosh with toollabs [19:11:59] As a note, if you mostly use just tools it maintains its own bastions that you can use directly [19:12:11] Coren: regular bastions are useless with mosh. it has no proxycommand nor key forwarding [19:12:26] YuviPanda|zzz: Ah, good point. [19:12:28] well, i just use proxycommand for everything ;) [19:12:42] Coren: so we do have mosh, but it’s just not very useful :) [19:12:56] annika_: If you want to mosh, you should connect directly to login.tools.wmflabs.org [19:13:25] wait, it's not tools-login.wmflabs.org? [19:13:41] That too, but the canonical name is login.tools [19:13:46] ah [19:14:00] interesting [19:14:28] let's fix that in the docs, then :-) [19:14:39] Well, perhaps not canonical since it's not the actual hostname - but it's the guaranteed stabled name. However you want to call that. :-) [19:14:43] stable* [19:15:17] valhallasw`cloud: Stop making sense! [19:15:56] ok, ip hasn't changed [19:16:06] must be something else [19:16:54] maybe my wireless … [19:17:02] Coren: mm, should tools-static then also really be static.tools? [19:17:09] Coren: are we allowed to use screen for that? [19:17:35] annika_: Keep your session live? That's what it's /for/ :-) [19:17:56] i thought it was banned on tools or something [19:18:03] Also multiplexing of course. [19:18:15] annika_: No, running bots in a detached screen is. [19:18:23] ah, well, then … :) [19:18:39] Coren: 'tis fixed! [19:18:54] valhallasw`cloud: You rule. [19:19:24] valhallasw`cloud: Right now, subdomain names need manual intervention, but static.* sounds like a good idea. 
[19:19:46] Tool-Labs: Make tools-static.wmflabs.org available on static.tools.wmflabs.org - https://phabricator.wikimedia.org/T86222#963956 (valhallasw) NEW [19:22:26] Tool-Labs: Make tools-static.wmflabs.org available on static.tools.wmflabs.org - https://phabricator.wikimedia.org/T86222#963974 (yuvipanda) Uh oh, needs its own ssl certificate, I think. [19:23:10] YuviPanda|zzz: GO TO BED (also, I think you're right :( ) [19:23:25] ah, only problems … [19:23:44] YuviPanda|zzz: Or modify the current one and add yet another alias [19:23:56] multichill: current certificate is *.wmflabs.org [19:27:14] * doesn't match a "." AFAIK [19:27:46] Not sure if you're allowed to add *.*.wmflabs.org as an alias [19:28:00] multichill: don’t think so. that’s why lots of betalabs doesn’t have ssl [19:28:35] Tool-Labs: Make tools-static.wmflabs.org available on static.tools.wmflabs.org - https://phabricator.wikimedia.org/T86222#963992 (yuvipanda) p:Triage>Volunteer? [19:31:21] Labs-Team, Wikimedia-Labs-Infrastructure: port 22 blocked in some cases despite being allowed with security groups - https://phabricator.wikimedia.org/T86143#963997 (yuvipanda) p:Triage>Low [19:31:32] Tool-Labs, Labs-Team: Document labsdb replication set up - https://phabricator.wikimedia.org/T85868#964001 (yuvipanda) p:Triage>Normal [19:31:43] Tool-Labs, Labs-Team: Document labsdb replication set up - https://phabricator.wikimedia.org/T85868#964003 (yuvipanda) a:yuvipanda [19:32:08] Labs-Team, Wikimedia-Labs-Infrastructure: Have shinken check for basic labs infrastructure - https://phabricator.wikimedia.org/T75865#964005 (yuvipanda) p:Triage>Normal [19:35:15] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<22.22%) [19:41:07] Tool-Labs, Wikidata: Autolist keeps dying - https://phabricator.wikimedia.org/T86134#964033 (scfc) (Assuming Autolist is http://tools.wmflabs.org/autolist/.)
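Editor's note on the certificate exchange above ("* doesn't match a '.'"): a simplified sketch of why a `*.wmflabs.org` certificate would not cover `static.tools.wmflabs.org`. The function `wildcard_matches` is hypothetical and implements only the common rule that a `*` in the leftmost label stands for exactly one label. Real TLS libraries apply further checks that are not modelled here.

```python
# Hypothetical, simplified wildcard matching: '*' is honoured only as the whole
# leftmost label and matches exactly one label, never a dot.
def wildcard_matches(pattern: str, hostname: str) -> bool:
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A single-label wildcard cannot change the number of labels.
    if len(pattern_labels) != len(host_labels):
        return False
    for index, (p_label, h_label) in enumerate(zip(pattern_labels, host_labels)):
        if index == 0 and p_label == "*":
            if not h_label:
                return False  # the wildcard must match a non-empty label
            continue
        if p_label != h_label:
            return False
    return True


if __name__ == "__main__":
    for name in ("tools-static.wmflabs.org", "static.tools.wmflabs.org"):
        print(name, wildcard_matches("*.wmflabs.org", name))
```

Under this rule the existing certificate covers tools-static.wmflabs.org but not a name one level deeper, which matches the concern raised on T86222.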
[19:41:49] 3Tool-Labs-tools-Other, Wikidata: Autolist keeps dying - https://phabricator.wikimedia.org/T86134#964036 (10scfc) [19:43:58] meh, readline doesn't work as before when i use screen :( [19:45:17] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [19:53:31] annika_: huh? [19:53:52] TERM issues [20:05:32] oh, hm, it's independent of screen :( [20:07:34] and of my terminal emulator [20:07:42] duh [20:07:53] this used to work [20:08:50] is this the right place for tools-labs questions? [20:10:48] yes [20:11:37] ok, my problem is solved with rlwrap --set-term-name dumb :p [20:21:34] i have a bot in tools that appears to be running but it's invisible to me (via ps) [20:22:07] i was thinking maybe i started it a long time ago as a job, but i can't seem to figure out how to list the jobs [20:22:08] edsu: qstat [20:22:39] valhallasw`cloud: qstat doesn't output anything, so i guess it's not a job? [20:23:07] edsu: well, at least not being controlled by the grid engine anymore [20:23:16] ok [20:23:20] was this with the right account? [20:23:32] i believe so yes [20:23:39] i can see the log file being written to at least [20:24:00] huh, weird [20:24:10] i am using dev and i seem to remember there is a dev environment too? [20:24:26] sorry, i'm using node (not dev) [20:24:37] edsu: dev.tools.wmflabs.org, login.tools.wmflabs.org and trusty.tools.wmflabs.org are the three login servers [20:24:57] so i could see different processes on each potentially? [20:25:06] yes [20:25:24] and then there's the whole range of tools-exec-* hosts, but it shouldn't be running there if it's not shown by qstat [20:26:11] ahh yes, it was on dev.tools.wmflabs.org ; thanks! [20:26:20] the filesystem is shared though i guess eh? [20:26:30] /data/project [20:26:37] it is [20:27:37] edsu: That said, you shouldn't be running a bot outside the grid. [20:29:30] Coren: yeah, i should change that eh? [20:30:09] Yes, yes you should. 
Off-grid bots are under a permanent death sentence and can be killed randomly as they are noticed. :-) [20:30:53] they're just never noticed it seems [20:31:39] last time you rebooted tools-login, they were all up again [20:51:57] Coren: can we get some progress on https://phabricator.wikimedia.org/T47646 ? [21:08:02] It shouldn't be /too/ hard to do, but it's not on my immediate todo list I fear. If you find someone to write a patch against the class in puppet to fiddle with the symlinks, I'll review and merge it though. [21:08:51] 3Wikimedia-Labs-Infrastructure: Create -latest alias for dumps - https://phabricator.wikimedia.org/T47646#964239 (10coren) p:5Normal>3Volunteer? [21:09:24] It's probably a reasonably easy task for someone who wants to cut their teeth. [21:57:47] Coren: it's sort of working \o/ [21:58:15] except I have to tell flask what it's url prefix is somehow, which magically worked in the fcgi world (using uwsgi now) [22:04:14] YuviPanda|zzz: uwsgi dark magic :{ [22:06:10] how did this suddenly stop working :| [22:07:27] valhallasw`cloud: Where can I play with / test? [22:07:42] Coren: well, under /outofband, I had a 404 working, so calling into flask worked [22:07:46] but that's now broken, too :/ [22:08:37] That's the error message for no webservice [22:08:46] Your server went boom. [22:08:57] no, this is uwsgi throwing http/500s [22:13:51] I think I'm going to need YuviPanda|zzz for this [22:14:57] Or you could use the lighttpd/fcgi scheme. :-) [22:15:20] Coren: potentially :-p [22:15:32] let's see how quickly I get that set up [22:23:22] Coren: working \o/ [22:23:25] "OutOfBand secure communication" is no longer approved as a Connected App, contact the application author for help. 
[22:23:33] Coren: yeah, not approved *yet* ;-) [22:24:08] Coren: https://merlijn.vandeen.nl/drafts/flask-mwoauth-on-tools.html [22:24:22] that's very rough, to say the least, but it's something [22:24:41] Coren: you can get another key and try with that (you can always login with your own user) [22:24:47] only thing missing is returning the right html file ;-) [22:25:34] ... I'm not seeing how. [22:27:28] Coren: the author of an oauth app can always use a key/secret, even if the app is not approved [22:27:47] Ah! [22:27:51] so if you propose a new version at https://www.mediawiki.org/wiki/Special:OAuthConsumerRegistration/propose and use the keys you get, you can test [22:28:01] Lemme see if I can push the approval forward rather. :-) [22:28:12] that's more effective, yes ;-) [22:29:27] valhallasw`cloud: Are you under NDA? [22:29:31] Coren: no [22:29:40] I'll have to remove you as maintainer then. :-( [22:29:47] I'd happily sign an NDA if you think that's effective, but you can also kick me out and add YuviPanda|zzz instead [22:29:48] np [22:29:56] lemme fix the crappy html first :p [22:30:04] Sure. [22:30:50] I'm not going to kick you out without warning. Go ahead and tweak; I'll have to create a consumer myself first so that the contact is right (there doesn't seem to be a way to /edit/ the publisher and switch it anyways. [22:31:10] *nod* [22:31:23] Tell me when you're ready to hand it off, I'll reject yours and approve mine. :-) [22:33:50] Coren: https://imgur.com/jFMCcgn,CF9sypD,g8goE20,C6cfQ1A [22:33:57] good enough for me ;-) [22:34:22] the IOError isn't the prettiest, but it tells you what's going on [22:34:43] Coren: also please check if the permissions are OK, www/python/app/secrets is now 700, which I think means no-one can read any file inside [22:34:55] same for app.py which contains sensitive tokens [22:35:00] anyway, have fun with it :-) [22:35:19] I will. TYVM. 
[22:41:10] valhallasw`cloud: there's an app.py and and app.pyc; is the conversion of one to the other automated in some way or does it need manual intervention? [22:41:22] Coren: app.fcgi calls app.py [22:41:38] Coren: app.fcgi is basically middleware between lighttpd and the flask app [22:41:44] What's the pyc for? uwsgism? [22:42:11] Coren: python bytecode cache, automatically generated when you run python app.py [22:42:39] Cool beans. I presume the app.secret_key is arbitrary? [22:43:02] Coren: the first one? yeah, it's used to sign cookies [22:43:23] then consumer_key and consumer_secret are the oauth params [22:43:27] (called token and secret by MW) [22:43:32] Those I got. :-) [22:43:37] and the rest is fairly obvious [22:43:50] you might want to undutchify the 404 as well (or just remove that handler altogether) [22:44:28] Meh. This is special purpose enough that I probably will never prettify it. :-) [22:44:33] anyway, I'm off to bed now. Feel free to kick me off the project! [22:44:39] Thanks again! [22:44:44] you're welcome!