[00:43:24] So, the problem is not with json but with accessing the local tools database when run as a webservice. I seem to be missing something basic here which I couldn't find in any documentation yet.
[01:12:40] what hostname are you using to connect?
[01:12:46] is it on /etc/host of the node?
[01:13:02] * /etc/hosts
[01:26:43] I'm using tools-db as the hostname. I do not know if it is on /etc/hosts of the node. Should I be using replica servers?
[01:42:28] ashwinpp, it's just a blind shot on what could be different there
[01:42:34] night!
[03:20:40] Labs, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574035 (awight) NEW
[03:26:43] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574042 (Krenair)
[03:31:46] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574045 (Krenair) I looked at https://wikitech.wikimedia.org/wiki/Special:NovaAddress for depl...
[03:35:20] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574047 (Krenair) Since this is urgent, I used "Add host name" on deployment-cache-mobile04, a...
[03:37:43] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574049 (awight)
[03:52:59] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574052 (Krenair) ```alex@alex-laptop:~/Development/Wikimedia/Operations-Puppet (master)$ grep...
[04:05:30] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574061 (awight) http://meta.m.wikimedia.beta.wmflabs.org/wiki/Main_Page Hey, you did it!
[04:07:19] Labs, Beta-Cluster, Fundraising Tech Backlog, MediaWiki-extensions-CentralNotice, Fundraising Sprint Rowlf the Dog: Create meta.m.wikipedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T110273#1574062 (awight) Woohoo, removing the blocked task link so you can finish cleaning up the docs...
[04:53:58] Labs, Labs-Sprint-107, Labs-Sprint-108, Labs-Sprint-109, Labs-Sprint-111: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1574095 (Etune) One simple thing that could work right now for you for authentication is to generate a password or a token per user, and pu...
[05:07:34] Hi, can someone tell me how one installs a new extension in Mediawiki-vagrant? An extension for which no role exists.
[05:08:15] I have been trying to set up my extension ( mediawiki.org/wiki/Extension:LanguageTool ) under vagrant unsuccessfully so far
[05:08:32] I cloned it inside extensions folder.
[05:09:12] Then I modified /srv/mediawiki-vagrant/LocalSettings.php
[05:09:23] Is there something I am missing out on?
[05:20:18] Labs, Labs-Sprint-107, Labs-Sprint-108, Labs-Sprint-109, Labs-Sprint-111: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1574138 (Etune) Some thoughts about service accounts and security contexts: I saw in the other issue you said that each user's jobs run as...
[05:33:50] Labs, Datasets-General-or-Unknown, Labs-Infrastructure, Wikidata: Wikidata JSON entity dumps not being copied correctly on labs - https://phabricator.wikimedia.org/T109830#1574150 (Hydriz)
[05:33:51] Labs, Datasets-Archiving, Datasets-General-or-Unknown, Labs-Infrastructure, Wikidata: [Bug] Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1574151 (Hydriz)
[06:35:03] MediaWiki-extensions-OpenStackManager, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team: Update OpenStackManager to use AuthManager - https://phabricator.wikimedia.org/T110288#1574317 (Tgr) NEW
[06:35:53] MediaWiki-extensions-OpenStackManager, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team: Update OpenStackManager to use AuthManager - https://phabricator.wikimedia.org/T110288#1574317 (Tgr)
[06:56:20] valhallasw`cloud : around?
[07:19:42] Labs, Labs-Infrastructure, operations: disk space on labvirt1007 - https://phabricator.wikimedia.org/T109752#1574464 (hashar) @andrew seems any instance on labvirt1007 might have ended up being corrupted. deployment-puppetmaster suffers from the same issue that occurred on the Jenkins slaves: files wr...
[07:38:11] Labs, Labs-Infrastructure, operations: disk space on labvirt1007 - https://phabricator.wikimedia.org/T109752#1574486 (hashar)
[08:09:16] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1574509 (hashar)
[08:21:13] Labs, Labs-Infrastructure: cloud-init stacktrace on Precise instance first boot - https://phabricator.wikimedia.org/T110304#1574532 (hashar) NEW
[08:22:07] Labs, Labs-Infrastructure: cloud-init stacktrace on Precise instance first boot - https://phabricator.wikimedia.org/T110304#1574539 (hashar) There are more going on later: ``` 2015-08-26 08:21:09,912 - __init__.py[ERROR]: config handling of power-state-change, None, [] failed 2015-08-26 08:21:09,912 - c...
[08:22:51] Labs, Labs-Infrastructure: cloud-init stacktrace on Precise instance first boot - https://phabricator.wikimedia.org/T110304#1574540 (hashar) Note, we can ssh to the instance once cloud-init has completed since puppet first run managed to run.
[08:31:34] ankita-ks: partially. Please just ask your question -- maybe others know the answer as well.
[08:32:39] valhallasw`cloud : I did twice earlier. :/ I am trying to install my extension inside a labs instance with Mediawiki-vagrant (visual editor role enabled)
[08:32:56] but I can not see my extension in the toolBar
[08:32:58] ankita-ks: I know very little about mw-vagrant
[08:33:05] and mediawiki in general
[08:33:22] ah..okay. Not a problem. :)
[10:56:14] valhallasw`cloud: ! :D I have another python / labs query for you :D
[10:57:11] If I have some python web thing running on the web grid and it requires to be in a virtualenv to start how can I make the virtual env stuff stick through auto / non user controlled restarts?
[10:59:46] addshore: use /path/to/venv/bin/python instead of just python
[11:00:02] or better yet, use uwsgi
[11:00:33] addshore: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Python_2_.28uwsgi.29
[11:00:49] will read in a sec ;D
[11:00:59] if you're using fcgi, use #!/path/to/venv/bin/python as hashbang in your app.fcgi
[12:05:24] valhallasw`cloud: cool! many thanks!
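The hashbang advice above can be checked from inside the script itself. What follows is only a rough sketch, assuming Python 3; `/path/to/venv/bin/python` is the placeholder path from the conversation and `in_virtualenv` is a hypothetical helper name, not something from the tool being discussed. It reports which interpreter is running and whether that interpreter belongs to a virtualenv, which is handy after a restart that the user did not trigger.

```python
#!/path/to/venv/bin/python
# Placeholder hashbang taken from the advice above; point it at the real virtualenv.
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module
    # (and newer virtualenv) make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    # Print where the interpreter lives so the output can be checked in a log.
    print("interpreter:", sys.executable)
    print("in virtualenv:", in_virtualenv())
```

If the script is started through the hashbang line, `sys.executable` should point into the virtualenv's `bin/` directory.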
[12:55:48] where are the global blocks stored in the database?
[12:57:04] on centralauth db
[12:57:45] but specific whitelist tables for each wiki
[13:03:26] ah, thanks. I always forget that db exists
[13:05:33] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1575146 (hashar) Zeljko rebuild integration-slave-precise-1014 :-}
[13:52:03] Labs, Labs-Infrastructure: re-image labnet1001 - https://phabricator.wikimedia.org/T110332#1575296 (Andrew) NEW a:Andrew
[14:02:43] Tool-Labs-tools-Other, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#1575321 (Steinsplitter) @KasiaWMDE: Can WMDE migrate this tool please? It is a very useful tool.
[14:03:15] Tool-Labs-tools-Other, Commons: Move catbot from toolserver to toollabs - https://phabricator.wikimedia.org/T63825#1575326 (Steinsplitter)
[14:03:17] Tool-Labs-tools-Other, Tracking: Toolserver.org tools that have not been migrated (tracking) - https://phabricator.wikimedia.org/T60865#1575328 (Steinsplitter)
[14:03:19] Tool-Labs-tools-Other, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#1575324 (Steinsplitter) declined>Open a:daniel>None
[14:04:27] Tool-Labs-tools-Other, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#1575332 (Steinsplitter)
[14:05:15] Tool-Labs-tools-Other, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#606252 (Steinsplitter)
[14:05:28] Tool-Labs-tools-Other, Commons, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#606252 (Steinsplitter)
[14:07:38] Tool-Labs-tools-Other, Commons, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#1575341 (daniel) @Steinsplitter Which tool, exactly? WikiSense was a collection of about 20 tools. Do you mean CatScan? These...
[14:09:54] Tool-Labs-tools-Other, Commons: Move catbot from toolserver to toollabs - https://phabricator.wikimedia.org/T63825#1575345 (daniel)
[14:09:56] Tool-Labs-tools-Other, Tracking: Toolserver.org tools that have not been migrated (tracking) - https://phabricator.wikimedia.org/T60865#1575346 (daniel)
[14:09:57] Tool-Labs-tools-Other, Commons, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#1575342 (daniel) Open>declined a:daniel I close this as declined again, since the toolserver project "WikiSense", as c...
[14:25:14] jzerebecki@bastion-01:~$ ssh wikidata-builder1.eqiad.wmflabs
[14:25:14] ssh: connect to host wikidata-builder1.eqiad.wmflabs port 22: No route to host
[14:25:37] seems the instance dropped off the network
[14:25:50] https://tools.wmflabs.org/nagf/?project=wikidata-build
[14:26:10] jzerebecki: hmm, just saw this in another instance, assumed it was OOM'd a while ago....
[14:26:13] jzerebecki: let me see.
[14:27:49] it happened according to nagf on the 24th
[14:27:58] hmm
[14:28:11] it's also on labvirt1007 which had disk issues earlier....
[14:28:17] no indication of growing mem before it drops off
[14:28:18] halfak: do you know when ores-compute stopped?
[14:28:32] YuviPanda, I don't
[14:28:36] hmm
[14:28:44] I just discovered it was unreachable this morning.
[14:28:46] I suspect the cause of both is the disk issues on labvirt1007
[14:28:50] I could look in the redis cache to find out.
[14:28:51] since they're on the same host
[14:28:58] jzerebecki: I'm going to reboot the instance
[14:28:58] Or flower logs?
[14:29:12] precached was running on there sending ~3 requests/second
[14:29:17] To ORES
[14:29:17] was flower running anything on ores-compute?
[14:29:17] aaah
[14:29:18] I see
[14:30:09] !log wikidata-build rebooted wikidata-build1 instance, seems to be down from the disk issues on labvirt1007 from a few days ago
[14:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-build/SAL, Master
[14:30:14] Looks like flower was rebooted recently too
[14:34:12] YuviPanda: thx
[15:02:00] Labs, Labs-Sprint-107, Labs-Sprint-108, Labs-Sprint-109, Labs-Sprint-111: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1575595 (scfc) (@Etune: In #Tool-Labs, all web services (and almost all of the grid jobs) run as so-called tool accounts/service groups (cf...
[15:17:32] YuviPanda, you there still?
[15:18:47] Krenair: yup
[15:18:51] (still in european timezone)
[15:19:35] YuviPanda, what magic is going on behind https://phabricator.wikimedia.org/T110273#1574045 ?
[15:21:43] Krenair: good question. I have no idea - andrewbogott probably knows...
[15:21:53] (hopefully we can all switch out of our custom DNS soon!)
[15:22:37] I don't really understand how http://meta.wikimedia.beta.wmflabs.org/ goes anywhere because it's not set up in NovaAddress.
[15:24:37] I hope it's not just working because something is set up in the labs-wide DNS domains by a cloudadmin...
[15:32:07] Krenair: I’m not sure I understand your question...
[15:32:19] But also I have an upgrade window coming up so don’t really have time to look, ask me again tomorrow?
[15:32:25] ok
[15:57:30] Krenair: in the proxy maybe?
[15:57:40] unless it's set up in ops/dns
[15:57:49] It's not in operations/dns.
[15:58:02] It's not in NovaProxy either.
[16:12:40] Labs, Labs-Infrastructure, Labs-Sprint-111: Update Labs to OpenStack Juno - https://phabricator.wikimedia.org/T110047#1575901 (Andrew)
[16:12:41] Labs, Labs-Infrastructure: Upgrade Labs to Openstack Juno - https://phabricator.wikimedia.org/T104587#1575902 (Andrew)
[16:13:41] Krenair: there's a *.beta.wmflabs.org catchall associated with 208.80.155.135
[16:13:48] ldaplist -l hosts | grep "*.beta.wmflabs.org" -C 15
[16:14:28] valhallasw`cloud, but not *.wikimedia.beta.wmflabs.org?
[16:14:33] Labs, Labs-Infrastructure: Upgrade Labs to Openstack Juno - https://phabricator.wikimedia.org/T104587#1421642 (Andrew) Because Juno is just a stepping stone, I'm going to try a partial upgrade. Horizon and Designate will remain on Icehouse. Horizon, because we may as well change the GUI fewer times rath...
[16:14:34] Krenair: correct
[16:14:38] so how does it work?
[16:15:10] Hi! Can anyone help me reset my staff WMF account password on the beta cluster?
[16:15:11] only m.wikimedia.beta.wmflabs.org and meta.m.wikimedia.beta.wmflabs.org are in ldap
[16:16:20] AndyRussG, why do you have a separate staff account on the beta cluster...?
[16:16:40] Krenair: shouldn't I? I do have the password for my account on wikitech/gerrit
[16:16:59] wikitech/gerrit is separate
[16:17:25] I don't see why you'd want a separate account in beta, but whatever
[16:17:32] Krenair: I do have my password for one account, AndyRussG. I guess the next question is who could maybe make me a centralNotice admin there?
[16:17:56] Maybe I have to go through one of these folks? http://meta.wikimedia.beta.wmflabs.org/w/index.php?title=Special:ListUsers&group=sysop
[16:18:02] Krenair: as I understand it, dns wildcards catch multiple levels. e.g. http://whatever.dfdsaffsa.arctus.nl/ gives a 'domain not configured' error through a *.arctus.nl catchall
[16:18:23] valhallasw`cloud, so basically meta.m.wikimedia.beta.wmflabs.org should have been working already
[16:18:40] that's what the bug said, right? That it was just a vhost that wasn't configured
[16:18:46] maybe I misread
[16:19:06] We didn't set up a vhost
[16:19:32] I added an entry in NovaAddress, and it started resolving and HTTP returned a domain unconfigured error
[16:19:42] aaah
[16:19:55] Shortly after I went to bed, it stopped returning a domain unconfigured error and started showing the actual site in mobile view
[16:21:21] I had a question from yesterday. When running a webservice, is it possible to use the local tools-db or do we have to use production replicas? I couldn't find any documentation covering it.
[16:22:29] hey ashwinpp
[16:22:29] Hi
[16:22:30] ashwinpp: you can connect to 'enwiki.labsdb', '.labsdb' from your code
[16:22:30] AndyRussG, you should now be able to reset your password for http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:CentralAuth/AGreen_(WMF)
[16:22:32] ashwinpp: you should find a mysql ini file at ~/replica.my.cnf
[16:22:46] yes, I'm using it to connect to tools-db as of now, because that was the recommended place to keep the database if it does not need the replicas.
[16:22:55] ashwinpp: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Database_access
[16:23:02] ashwinpp: yup, tools.labsdb
[16:24:01] I see, I was using tools-db as the hostname and everything ran perfectly on localhost
[16:24:24] So now I simply change the hostname to tools.labsdb, right?
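A minimal sketch of reading the credentials file mentioned above and pairing it with the `tools.labsdb` hostname, assuming Python 3. The layout of `~/replica.my.cnf` is an assumption here (a `[client]` section with `user` and `password` keys, possibly quoted), and `read_db_settings` is a hypothetical helper name. The sketch only builds the connection settings; handing them to a MySQL client library is left out.

```python
import configparser
import os


def read_db_settings(cnf_path, host="tools.labsdb"):
    """Build connection settings from a MySQL-style ini file.

    Assumes a [client] section with `user` and `password` keys; this layout
    is an assumption, not something confirmed in the conversation.
    """
    # interpolation=None keeps a literal '%' in the password from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    with open(cnf_path) as handle:
        parser.read_file(handle)
    client = parser["client"]
    return {
        "host": host,
        # Values in such files are sometimes wrapped in quotes; strip them.
        "user": client["user"].strip("'\""),
        "password": client["password"].strip("'\""),
    }


if __name__ == "__main__":
    path = os.path.expanduser("~/replica.my.cnf")
    if os.path.exists(path):
        settings = read_db_settings(path)
        # Print only non-secret fields.
        print(settings["host"], settings["user"])
```

Keeping the hostname as a parameter makes it easy to switch between `tools.labsdb` and a replica host without touching the parsing code.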
[16:24:59] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 20.00% of data above the critical threshold [0.0]
[16:25:00] ashwinpp: yeah
[16:25:11] ashwinpp: and use the username and password in ~/replica.my.cnf
[16:25:22] yes, I'm doing that
[16:25:27] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 40.00% of data above the critical threshold [0.0]
[16:25:35] right, so that should work
[16:25:53] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 30.00% of data above the critical threshold [0.0]
[16:26:09] Incorrect password on wikitech?
[16:26:10] wot?
[16:26:17] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 44.44% of data above the critical threshold [0.0]
[16:26:17] ostriches, /topic
[16:26:22] Krenair: I'm very confused now. abc.bla.wikimedia.beta.wmflabs.org gives me .135 (i.e. the *.beta wildcard), but m.wikimedia.beta.wmflabs.org gives me NXDOMAIN.
[16:26:27] Bleh
[16:26:32] * ostriches goes and makes a coffee
[16:26:50] The puppet alerts look like they're from the upgrade, btw
[16:26:53] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 60.00% of data above the critical threshold [0.0]
[16:27:07] Krenair: then again, in ldap there's an entry dc=m-wikimedia-beta which defines m.wikimedia.beta.wmflabs.org as 'something that can have subdomains', so it's probably that.
[16:27:08] valhallasw`cloud, the temporary 'domain unconfigured' error might be because I initially assigned the address to the wrong instance, and then changed it
[16:27:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 66.67% of data above the critical threshold [0.0]
[16:27:17] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 50.00% of data above the critical threshold [0.0]
[16:27:23] it's probably that, yes
[16:27:31] I don't know how this stuff is cached
[16:27:36] Krenair: cool! done :) many thanks...! Mmm now I just need centralnotice admin (at least)
[16:27:49] and then *.m.wikimedia.beta.wmflabs.org is delegated, so the *.beta.wmflabs.org wildcard doesn't trigger
[16:28:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 33.33% of data above the critical threshold [0.0]
[16:28:24] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 60.00% of data above the critical threshold [0.0]
[16:28:26] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 20.00% of data above the critical threshold [0.0]
[16:28:37] YuviPanda: Still not working
[16:29:01] ashwinpp: can you expand on what you mean by 'not working'?
[16:29:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 30.00% of data above the critical threshold [0.0]
[16:29:05] YuviPanda: FYI, puppet failures are also because of the wikitech upgrade; facter times out.
[16:29:05] specific error messages maybe?
[16:29:15] valhallasw`cloud: yeah, figured.
[16:29:20] should we kick out shinken-wm?
[16:29:23] (it tries to get things like internal IP from openstack)
[16:29:25] or just +m
[16:29:30] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 66.67% of data above the critical threshold [0.0]
[16:29:31] but I am not op
[16:29:54] I just killed it for now
[16:30:03] puppet should restart it
[16:30:03] k
[16:30:08] when it's back alive
[16:30:09] smart!
[16:30:12] AndyRussG, bah. I have to grant the right from deploymentwiki rather than metawiki, but the group doesn't exist on deploymentwiki
[16:30:22] I have a patch for this but it's years old and no one will approve it
[16:30:29] Guess I'll have to use the shitty workaround
[16:30:59] Are there error logs where I can get detailed error messages? Currently in uwsgi.log I'm getting 500 error
[16:31:00] valhallasw`cloud: heh, and then we'll get flooded with recoveries
[16:31:25] ashwinpp: I think you've to turn on debug mode in flask or whatever web framework you're using
[16:31:29] I'm sure the problem is not due to urls because unless I'm doing anything with the database, it is working fine
[16:31:55] I recently added a puppet role for role::labs::vagrant_lxc. I know i can add that to my project from Special:NovaPuppetGroup, but how can i add that globally?
[16:32:40] ebernhardson: ah, cloudadmin has to add it globally
[16:33:03] AndyRussG, done
[16:33:06] ashwinpp: what's the name of the tool?
[16:33:09] YuviPanda: ahh ok, i'll make a ticket
[16:33:13] ebernhardson: +1 thanks
[16:33:26] YuviPanda: navlink-recommendation
[16:33:51] YuviPanda: Running it in debug mode did not yield better error messages
[16:34:24] YuviPanda: The uwsgi.log says "GET /navlink-recommendation/Bioluminescence/ => generated 291 bytes in 68 msecs (HTTP/1.1 500) 2 headers in 84 bytes (1 switches on core 0)"
[16:34:48] ashwinpp: I'm looking at it now
[16:36:04] Krenair: thanks so much!!!! :D
[16:38:04] labs network outage?
[16:38:34] YuviPanda, I can connect to bastion?
[16:38:45] And I can ping stuff from there.
[16:38:49] So no
[16:39:07] yeah it is back
[16:39:10] local network outage, I guess :D
[16:39:16] ashwinpp: can you hit a few test URLs now?
[16:39:49] YuviPanda: I did
[16:40:08] meh
[16:40:10] stupid uwsgi
[16:40:32] ashwinpp: can you give me a sample URL?
[16:40:45] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1575973 (bd808) >>! In T110052#1575146, @hashar wrote: > Zeljko r...
[16:40:51] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1575975 (Legoktm) >>! In T110052#1575146, @hashar wrote: > Zeljko...
[16:41:08] YuviPanda: http://tools.wmflabs.org/navlink-recommendation/topk/10/
[16:45:02] labs puppet not happy: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to determined $::labsproject at /etc/puppet/manifests/realm.pp:14 on node search-datavis.shiny-r.eqiad.wmflabs
[16:45:13] ebernhardson: yup, see /topic
[16:45:31] :) thanks
[16:45:35] ashwinpp: aha!
[16:46:47] YuviPanda: What was the problem?
[16:46:57] ashwinpp: your uwsgi logs have exception logs now
[16:47:10] oh man, you really should use version control...
[16:47:16] ashwinpp: i edited your app.py to enable debugging
[16:47:29] ashwinpp: basically http://stackoverflow.com/a/17839750 and the answer after that
[16:47:44] I also highly recommend using git or a version control system to keep your code safe :)
[16:47:57] YuviPanda: Yes
[16:48:02] btw, nova metadata is broken for the moment, so puppet runs are failing on labs instances. I’m pretty sure I know the fix, will be done in a few.
[16:48:06] But the problem still remains
[16:48:28] ashwinpp: yes, it seems you are using a global wrongly in your python code
[16:48:36] NameError: global name 'recommenderObj' is not defined
[16:48:54] It runs correctly on localhost
[16:49:29] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: integration-slave-trusty-1014 and integration-slave-trusty-1017 instances can't boot anymore, ended up corrupted. Need rebuild - https://phabricator.wikimedia.org/T110052#1575990 (hashar) Seems it takes more than 10 minutes to clone med...
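A NameError like the one quoted above, in code that runs fine when started by hand, is the kind of failure that shows up when a global is assigned only inside an `if __name__ == "__main__":` block: that block runs for `python app.py`, but not when a server imports the module. The sketch below shows one possible arrangement that avoids it, assuming Python 3. `recommenderObj` is the name from the error message; `DemoRecommender`, `top_k`, and `topk` are hypothetical stand-ins, not the tool's real code.

```python
class DemoRecommender(object):
    """Hypothetical stand-in for whatever object the tool builds at startup."""

    def top_k(self, k):
        # Return k placeholder items; the real tool would query its data here.
        return ["item-%d" % i for i in range(k)]


# Created at import time, so the name exists however the module is loaded
# (run directly, or imported by a WSGI server).
recommenderObj = DemoRecommender()


def topk(k):
    # The global is looked up when the function is called, not when it is defined.
    return recommenderObj.top_k(k)


if __name__ == "__main__":
    # Only reached when the file is executed directly; keep just the
    # local-run behaviour here, not objects that request handlers depend on.
    print(topk(3))
```

If building the object is expensive, creating it lazily on first use is another option; the point is only that nothing a handler needs should live solely under the `__main__` guard.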
[16:49:43] YuviPanda: So I was thinking it has to be something associated with running it as a webservice as opposed to running it on localhost
[16:50:08] ashwinpp: that's because you have the recommenderObj = navlinkRecommender() in __name__ == "__main__"
[16:50:34] ashwinpp: aah, indeed. yes, if __name__ == '__main__' is only called when you run it locally
[16:50:49] Cool. Got it!
[16:50:51] what valhallasw`cloud said :)
[16:50:55] Thanks guys :)
[16:50:58] yw
[16:56:40] http://tools.wmflabs.org/ is not loading
[16:57:41] that is also probably me, working on it...
[16:57:44] ok
[16:58:26] Reasonator et al are not available either
[16:59:00] GerardM-, when the labs network is down, it's safe to assume all tools and all other labs projects are inaccessible
[16:59:03] GerardM-: it's being worked on, should be back soon.
[16:59:24] thanks yuvi
[17:00:59] GerardM-: ashwinpp back up
[17:01:27] YuviPanda: Yes, thanks. Everything seems to be working now.
[17:01:34] ok, sorry for the network interruption, it didn’t like the version mismatch
[17:01:43] I think things are stable now although I still have bits and pieces to upgrade.
[17:04:30] now let’s make sure instance creation/dns still works...
[17:48:45] andrewbogott: you can probably block all ssh connections that are not coming from an internal ip?
[17:49:04] oh, but you don't want to do that, because you normally /do/ want to listen to connections because they have to be routed
[17:49:08] that same node handles floating ips
[17:49:16] hm, also that
[17:49:35] how would you normally ssh in? maybe sshd can just listen on one specific interface?
[17:50:34] andrewbogott: or, simpler, running sshd on a non-standard port
[18:40:43] Tool-Labs-tools-Other, Commons, Need-volunteer: Migrate https://toolserver.org/~daniel/WikiSense/* to Tool Labs and provide redirect - https://phabricator.wikimedia.org/T60869#1576467 (Steinsplitter) You used a sql query to get the cats or how you do? It was api-only based? If you publish the source c...
[18:49:22] Labs, wikitech.wikimedia.org: Searching for "Hiera:" with namespace "Hiera" deselected still shows results in "Hiera:" - https://phabricator.wikimedia.org/T110377#1576533 (scfc) NEW
[18:52:54] Labs, wikitech.wikimedia.org: Searching for "Hiera:" with namespace "Hiera" deselected still shows results in "Hiera:" - https://phabricator.wikimedia.org/T110377#1576547 (Krenair) > I expect the search to return no (*1) results. Hmm. Are you sure it's not just smarter than you think it is? You did search...
[18:55:08] MediaWiki-extensions-OpenStackManager, CirrusSearch, Discovery: Searching for "Hiera:" with namespace "Hiera" deselected still shows results in "Hiera:" - https://phabricator.wikimedia.org/T110377#1576551 (Krenair)
[19:02:13] (PS1) Jean-Frédéric: Catch Exception in processCountry in unused_monument_images [labs/tools/heritage] - https://gerrit.wikimedia.org/r/234041
[19:02:48] (CR) Jean-Frédéric: [C: 2 V: 2] Catch Exception in processCountry in unused_monument_images [labs/tools/heritage] - https://gerrit.wikimedia.org/r/234041 (owner: Jean-Frédéric)
[19:13:03] MediaWiki-extensions-OpenStackManager, CirrusSearch, Discovery: Searching for "Hiera:" with namespace "Hiera" deselected still shows results in "Hiera:" - https://phabricator.wikimedia.org/T110377#1576599 (scfc) If I //explicitly// deselect a namespace, I expect no hits in it. If the search was smart...
[19:48:26] valhallasw`cloud: if still here, can you double-check something for me?
[19:48:55] or, chasemp, you could do this too...
[19:49:08] I just want someone to create an instance on wikitech, agree with me that it came up and looks fine.
[19:54:34] ok
[19:55:55] instance state: ERROR
[19:55:56] off the bat
[19:56:21] testposticehouse
[19:57:07] Labs, Labs-Infrastructure, Patch-For-Review: Upgrade Labs to Openstack Juno - https://phabricator.wikimedia.org/T104587#1576791 (Andrew) Upgraded: labcontrol1001 labnet1002 labvirt1004 labvirt1005 Partial upgrade: holmium (this is in a stable state and will remain half-and-half until Kilo. There ar...
[19:57:33] andrewbogott: ^
[19:57:56] chasemp: huh… it just worked for me. Let me try again
[19:58:28] chasemp: what project?
[19:58:34] chasetest
[20:03:07] chasemp: try now?
[20:03:36] BUILD (spawning)
[20:04:43] (PS1) Jean-Frédéric: Remove NS 104 from (jo,ar) configuration [labs/tools/heritage] - https://gerrit.wikimedia.org/r/234125 (https://phabricator.wikimedia.org/T110386)
[20:05:58] (CR) Multichill: [C: 2 V: 2] "Delete!" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/234125 (https://phabricator.wikimedia.org/T110386) (owner: Jean-Frédéric)
[20:06:31] andrewbogott: gtg
[20:06:35] ok
[20:06:40] I’ll keep an eye out
[20:08:54] ok, I need a break, will mop up later.
[20:14:25] andrewbogott: anything I still need to test?
[20:28:02] (PS1) Jean-Frédéric: Add bin/ scripts from ToolLabs [labs/tools/heritage] - https://gerrit.wikimedia.org/r/234132
[20:30:24] (CR) Multichill: [C: 2 V: 2] "Great!" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/234132 (owner: Jean-Frédéric)
[20:30:25] Wikibugs, Collaboration-Team-Current, Patch-For-Review: Add Collaboration-Team-Current to #wikimedia-collaboration - https://phabricator.wikimedia.org/T110186#1576900 (Etonkovidova) Additionally checked on labs / tools/wikibugs2. Looks consistent with what other teams - e.g. #wikimedia-analytics, #me...
[20:47:16] (03PS1) 10Jean-Frédéric: Update some paths Toolserver --> ToolLabs [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234136 [20:54:16] (03CR) 10Jean-Frédéric: [C: 032 V: 032] Update some paths Toolserver --> ToolLabs [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234136 (owner: 10Jean-Frédéric) [20:55:14] (03PS1) 10Jean-Frédéric: Fix Shebang of update_monuments_min bin script [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234137 [20:55:26] (03CR) 10Jean-Frédéric: [C: 032 V: 032] Fix Shebang of update_monuments_min bin script [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234137 (owner: 10Jean-Frédéric) [21:17:09] (03PS1) 10Jean-Frédéric: Fix SQL command in update_monuments_min.sh bin script [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234144 [21:17:21] (03CR) 10Jean-Frédéric: [C: 032 V: 032] Fix SQL command in update_monuments_min.sh bin script [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234144 (owner: 10Jean-Frédéric) [21:24:39] I'm trying to figure out how a bot I'm trying to fix works. Are there any logging or debugging tools on Labs that would be helpful? [21:26:01] I'm troubleshooting the login error here: https://tools.wmflabs.org/citations-dev/doibot.php [21:27:20] but because the login requests/responses are happening server-side, I can't just use my browser tools to inspect the http traffic. [21:27:40] fhocutt: (heading to bed in a few mins). Not much specific to Labs; mostly just good old add-print-statements-everywhere debugging [21:28:38] if this were Python, that would be easier--I tried adding echo($response); lines but nothing printed on the page. [21:28:44] I'll poke at it some more. [21:29:23] I typically have more success with print_r($value);, but I'm not very good with php [21:29:45] I'll give that a try. I'm still learning its quirks. [21:30:46] fhocutt: from what I can see, the login happens in expandFns.php in function logIn. 
I'd suggest checking the contents of $submit_vars (and maybe sending that to the api directly) [21:30:52] and dumping $login_result [21:31:06] yeah, that's where it's happening [21:31:18] oh, but there's a print_r($login_result) on error already [21:31:42] is it maybe trying to connect over http and not handling the http->https redirect? [21:31:54] very possible [21:32:27] it's using Snoopy for the http client and I'm not familiar with how that handles things [21:32:45] (03PS1) 10Jean-Frédéric: Surround processCountry with try/except [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234150 [21:32:46] you can also take a look at the first request (to get a login token) [21:33:07] I don't think I can in my browser. [21:33:40] (03CR) 10Jean-Frédéric: [C: 032 V: 032] Surround processCountry with try/except [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234150 (owner: 10Jean-Frédéric) [21:34:40] I just see a GET to doibot.php [21:34:51] and I'm not sure how to monitor the server-side traffic. [21:34:54] $bot->submit(api, $submit_vars); <-- and api is defined as https://test.wikipedia.org/etc [21:34:58] humdumdum [21:35:06] you can try mitmproxy [21:35:16] if you can get Snoopy to talk through a proxy, at least :/ [21:35:29] and use a fake certificate... [21:35:33] huh, interesting [21:39:25] I'm off to bed -- good luck with debugging! [21:39:56] thanks, valhallasw`cloud. [21:40:30] (03PS1) 10Jean-Frédéric: Change Namespace of pt/pt config to 0 [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234154 [21:40:42] (03CR) 10Jean-Frédéric: [C: 032 V: 032] Change Namespace of pt/pt config to 0 [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/234154 (owner: 10Jean-Frédéric) [22:19:34] 6Labs, 10Tool-Labs, 6Engineering-Community, 6WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1577350 (10Aklapper) >>! 
In T87730#1249301, @yuvipanda wrote: > I don't know if just closing the rfc on meta is enough - this needs some consensus fro... [22:22:52] Labs, Labs-Sprint-107, Labs-Sprint-108, Labs-Sprint-109, Labs-Sprint-111: Evaluate kubernetes for use on Tool Labs - https://phabricator.wikimedia.org/T107993#1577360 (yuvipanda) Thanks for Chiming in @ETune! I wasn't aware of admission control plugins - that sounds like exactly what we need :)... [22:33:07] is there a standard place on labs for a php error log or a server error log for any given tool? [22:35:31] fhocutt: hello! it's often ~/error.log in the tool's homedir [22:35:42] hey YuviPanda! [22:35:51] hello fhocutt [22:35:54] * YuviPanda is still in Europe [22:36:05] fhocutt: if it's a python uwsgi tool, it's going to be in ~/uwsgi.log [22:36:13] ah, yes [22:36:14] php ones are in ~/error.log I think [22:36:17] no, this is PHP [22:36:22] right [22:36:48] ok, that is helpful [22:37:11] fhocutt: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web has some info dumps (errr... documentation?) [22:37:38] web! there we go [22:38:13] yes that is very helpful [22:38:26] \o/ [22:38:39] maybe now I can meaningfully log things! [22:38:42] do edit if you learn something new that should be on that page [22:38:42] that is exciting [22:38:45] will do [22:38:45] :) [22:39:14] fhocutt: so this uses NFS for logging (sad trombone, etc) which means there's sometimes a few seconds lag between when a log item is written and it becomes visible to you [22:39:21] ok [22:39:28] good to know [22:39:45] :) [22:39:49] now I go sleeep [22:52:06] Tool-Labs-tools-Other, Commons, Community-Tech, Multimedia: [AOI] Ceate a new DerivativeFX after the Toolserver shutdown - https://phabricator.wikimedia.org/T110409#1577472 (Aklapper) [22:53:39] looks like beta cluster code isn't syncing: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/67380/console [23:19:12] Hi Krenair!
Not sure if I should be pinging u again, or who else, just a heads-up that beta cluster code isn't scapping to: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/67383/console [23:19:46] * Krenair will poke it [23:23:38] bd808, how is deployment SSH in beta supposed to work? [23:23:52] I know the jenkins-deploy user on deployment-bastion is supposed to do it [23:24:29] then logs in as mwdeploy across the other servers? [23:25:06] yes. It uses a shared agent like in prod [23:25:19] I wonder if it is unprimed [23:25:46] Where is the mwdeploy_rsa password kept? [23:26:31] it's in the private puppet repo on deployment-puppetmaster [23:26:44] ah [23:26:56] when you have it -- https://tools.wmflabs.org/sal/log/AU8pB8bQ1oXzWjit5Nvc [23:28:32] Krenair: I found it if you haven't yet [23:28:44] okay, where is it? [23:29:14] deployment-puppetmaster.deployment-prep:/var/lib/git/labs/private/files/ssh/tin/mwdeploy_rsa.passphrase [23:29:20] ah [23:30:01] * bd808 guesses ostriches picked that password [23:30:08] that did it [23:30:30] that damn jenkins job is supposed to fail when that happens :/ [23:30:37] bd808, although... I didn't use that command you posted [23:30:53] is there a nicer way? [23:30:57] bd808, scap doesn't fail when that happens :) [23:31:10] bd808, sudo keyholder arm? [23:31:33] ah. I figured there was a nice command :) [23:31:49] When this breaks in prod you get this: [23:31:50] PROBLEM - Keyholder SSH agent on mira is CRITICAL Keyholder is not armed. Run keyholder arm to arm it. [23:32:01] scap should exit with a non-zero status if there are any soft failures [23:32:21] I actually looked into that not long ago and thought I fixed it [23:33:05] and ... my patch isn't on beta :( [23:34:27] AndyRussG, anyway, it's working again now, thanks for reporting [23:35:27] !log Updated scap to a7ec319 (Use configured bin_dir to find refreshCdbJsonFiles) [23:35:27] Updated is not a valid project.
[23:38:35] !log Krenair primed keyholder agent via `sudo keyholder arm` with password from deployment-puppetmaster:/var/lib/git/labs/private/files/ssh/tin/mwdeploy_rsa.passphrase [23:38:36] Krenair is not a valid project. [23:39:12] doh wrong channel twice for me [23:40:16] bd808, I logged it in -releng [23:40:23] awesome [23:40:30] although not with that level of detail [23:41:20] I noticed searching sal that I had never logged where to find the password before [23:41:38] because I've hunted it down several times now [23:44:08] Krenair: fantastic, thx!
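
Note on the 23:32:01 remark ("scap should exit with a non-zero status if there are any soft failures"): the idea is that a wrapper such as the Jenkins job can only fail if the deploy command reports per-host problems through its exit code. The following is a minimal sketch of that pattern only; it is not scap's code, and `run_on_hosts`, `exit_status`, the `sync_host` callback and the host names are all hypothetical.

```python
from typing import Callable, Iterable, List


def run_on_hosts(hosts: Iterable[str], sync_host: Callable[[str], bool]) -> List[str]:
    """Call sync_host for every host and return the hosts that failed.

    A failure on one host is "soft": it is recorded, and the loop
    carries on with the remaining hosts instead of aborting.
    """
    failed = []
    for host in hosts:
        try:
            ok = sync_host(host)  # hypothetical per-host sync step
        except Exception:
            ok = False  # an exception also counts as a soft failure
        if not ok:
            failed.append(host)
    return failed


def exit_status(failed: List[str]) -> int:
    """Map the list of failed hosts to a process exit status."""
    return 1 if failed else 0
```

A caller would finish with `sys.exit(exit_status(failed))`, so that any soft failure (for example an SSH agent that is not armed) makes the wrapping job fail instead of passing silently.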