[04:20:26] Labs, operations: Replicate or back up glance image data on virt1000 - https://phabricator.wikimedia.org/T90628#1077040 (Krenair) virt1000, yep.
[04:30:04] Labs: labs_lvm can clash file resource for mount point with other packages - https://phabricator.wikimedia.org/T91225#1077055 (scfc) NEW a:scfc
[04:43:01] Labs, Wikimedia-Labs-wikitech-interface: wikitech is unresponsive/unable to log in - https://phabricator.wikimedia.org/T91218#1077083 (yuvipanda) Restarted keystone, and it works now. Probably got killed OOM.
[04:43:11] Labs, Wikimedia-Labs-wikitech-interface: wikitech is unresponsive/unable to log in - https://phabricator.wikimedia.org/T91218#1077084 (yuvipanda) Open>Resolved a:yuvipanda
[04:44:10] Labs, Wikimedia-Labs-wikitech-interface: Monitor for wikitech logins failing - https://phabricator.wikimedia.org/T91226#1077086 (yuvipanda) NEW
[04:44:21] Labs, Wikimedia-Labs-wikitech-interface: wikitech is unresponsive/unable to log in - https://phabricator.wikimedia.org/T91218#1076771 (yuvipanda) Filed T91226 for monitoring
[04:44:32] Labs, Wikimedia-Labs-wikitech-interface, Monitoring: Monitor for wikitech logins failing - https://phabricator.wikimedia.org/T91226#1077086 (yuvipanda)
[04:44:58] Labs, Tool-Labs, Patch-For-Review: Retire 'tomcat' node, make Java apps run on the generic webgrid - https://phabricator.wikimedia.org/T91066#1077097 (yuvipanda) I'm going to agree as well, but one step at a time :)
[06:55:38] Labs, Tool-Labs: Generic services nodes should be redundant so OGE can reschedule them onto another machine if one goes down - https://phabricator.wikimedia.org/T90557#1077193 (yuvipanda) Open>Resolved a:yuvipanda Done, and they are on different virt hosts.
[06:55:39] Labs, Tool-Labs, Tracking: Make sure that toollabs can function fully even with one virt* host fully down - https://phabricator.wikimedia.org/T90542#1077196 (yuvipanda)
[06:55:48] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:02:14] Tool-Labs: 10-minute load times on toolserver - https://phabricator.wikimedia.org/T76297#1077199 (scfc) No, all issues regarding XTools should be reported only at [[https://github.com/x-Tools/xtools/issues|GitHub]] as that is where it is maintained.
[07:04:39] (PS3) Yuvipanda: Point users to webservice2 for tomcat [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066)
[07:23:39] Tool-Labs: Create a utility that dumps all databases of a user - https://phabricator.wikimedia.org/T91231#1077206 (scfc) NEW
[07:25:54] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[08:10:59] !log tools delete tools-uwsgi-02 because https://phabricator.wikimedia.org/T91065
[08:11:05] Logged the message, Master
[08:12:33] PROBLEM - Host tools-uwsgi-02 is DOWN: CRITICAL - Host Unreachable (10.68.17.216)
[08:13:01] * YuviPanda pats shinken-wm
[08:13:02] we know
[08:18:56] Tool-Labs: Document / get rid of jobkill.pl - https://phabricator.wikimedia.org/T91233#1077305 (yuvipanda) NEW a:coren
[08:33:30] Tool-Labs: Get rid of toolwatcher, use skeleton homedirs instead - https://phabricator.wikimedia.org/T91235#1077343 (yuvipanda) NEW a:coren
[08:36:15] Tool-Labs, Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#1077355 (yuvipanda) a:coren>None
[08:38:47] Labs, Tool-Labs, Patch-For-Review: Move uwsgi jobs to be run on generic hosts, retire uwsgi hosts - https://phabricator.wikimedia.org/T91065#1077361 (yuvipanda) Moved 'em all over! \o/
[08:41:05] !log tools delete tools-uwsgi-01
[08:41:08] Logged the message, Master
[08:44:35] PROBLEM - Host tools-uwsgi-01 is DOWN: CRITICAL - Host Unreachable (10.68.16.64)
[08:44:57] Labs, Tool-Labs, Patch-For-Review: Move uwsgi jobs to be run on generic hosts, retire uwsgi hosts - https://phabricator.wikimedia.org/T91065#1077380 (yuvipanda) Deleted the puppet code, and also deleted the two instances. Need to remove the queue and the exec hosts from OGE.
[08:48:24] Tool-Labs: Get rid of toolwatcher, use skeleton homedirs instead - https://phabricator.wikimedia.org/T91235#1077399 (scfc) IIRC we use `pam_mkhomedir.so` (cf. `/etc/pam.d/common-session`), so we would need to have separate pam setups (to reference different skeletons) for normal users and tool accounts. Rea...
[08:57:17] PROBLEM - Puppet failure on tools-webgrid-generic-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:07:27] Tool-Labs: Migrate tools to trusty - https://phabricator.wikimedia.org/T88228#1077419 (yuvipanda) Yup, this is about moving individual tools, sorry that wasn't clear. Jessie looks like a non-starter now, since Debian isn't going to put Oracle's Abandonware in its main repos. We should seriously consider get...
[09:07:46] Tool-Labs: Migrate individual tools to trusty to relieve pressure on older precise nodes - https://phabricator.wikimedia.org/T88228#1077420 (yuvipanda)
[09:07:50] (CR) Tim Landscheidt: "Wouldn't it be better to bite the bullet, merge webservice2 into webservice and be done? IIRC the only change is that webservice2 default" [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[09:09:48] (CR) Yuvipanda: "There are a couple of issues with webservice2 that still have to be fixed:" [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[09:11:00] Labs, Tool-Labs: Have bigbrother run on multiple nodes to provide redundancy against tools-submit failure - https://phabricator.wikimedia.org/T91237#1077421 (yuvipanda) NEW
[09:17:38] Labs, Tool-Labs: Setup a redis slave for toollabs as backup / redundancy - https://phabricator.wikimedia.org/T91239#1077440 (yuvipanda) NEW
[09:19:55] (CR) Tim Landscheidt: [C: -1] "Okay, that makes sense. Small syntax error in the patch, then it should work." (1 comment) [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[09:41:20] (PS4) Yuvipanda: Point users to webservice2 for tomcat [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066)
[09:43:02] (CR) Yuvipanda: "Fixed in PS4!" [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[09:44:08] (CR) Yuvipanda: Point users to webservice2 for tomcat (1 comment) [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[09:47:05] good morning
[09:47:06] Labs, MediaWiki-extensions-OpenStackManager, Tool-Labs, Tool-Labs-tools-Article-request, and 10 others: Labs' Phabricator tags overhaul - https://phabricator.wikimedia.org/T89270#1077511 (Aklapper)
[09:47:16] hi marcmiquel
[09:47:36] hi YuviPanda, here with my research.
[09:47:53] hi Yuvi - I am working on an API for Dario's research team
[09:48:02] it is written in Python Flask and uses SQLite
[09:48:16] nice
[09:48:20] hi ananthrk_
[09:48:28] i would like to ask a question.
[09:48:37] marcmiquel: sure
[09:48:43] ananthrk_: do you have access to toollabs already?
[09:49:06] how could I get the coords from each article in mediawiki?
[09:49:07] if I have to host my code & DB so that Dario & team can access, should I create a new project as suggested in https://phabricator.wikimedia.org/T76375 ?
[09:49:35] ananthrk_: nope! you should just use toollabs. tools.wmflabs.org / wikitech.wikimedia.org/wiki/Help:Tools
[09:49:48] ananthrk_: if you tell me your labs username, I can add you to the tools project, and then you can follow the help pages
[09:50:05] it's "ananthrk"
[09:50:21] ananthrk_: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs. https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Python_.28uwsgi.29 specifically for setting up a flask app
[09:50:24] ananthrk_: adding
[09:50:47] marcmiquel: there’s an API for that, I believe.
[09:50:52] i've been checking different options. in tool labs mediawiki there is a table called geo_tags, which seems incomplete
[09:51:01] YuviPanda: in wikidata?
[09:51:06] marcmiquel: https://www.mediawiki.org/wiki/Extension:GeoData
[09:51:11] marcmiquel: and wikidata too, maybe. I do not know.
[09:51:16] marcmiquel: geo_tags is useless, afaik.
[09:51:29] marcmiquel: you might get better answers from emailing labs-l
[09:51:45] geodata is an option?
[09:52:07] marcmiquel: yeah, but I don’t know if it stores values in databases.
[09:52:23] i would like to retrieve
[09:52:40] let's say: all articles from german wikipedia. select then those with coords only...
[09:52:42] marcmiquel: https://www.mediawiki.org/wiki/Extension:GeoData#API lists the APIs
[09:52:59] marcmiquel: might be possible by referring to the templates being used.
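A minimal sketch of the GeoData route mentioned above, assuming the `prop=coordinates` query module that the GeoData extension adds to the MediaWiki API. The helper names, the example API base URL, and the hand-written sample response are illustrative assumptions, not something from the channel; the response shape should be checked against the GeoData API page before relying on it.

```python
from urllib.parse import urlencode


def build_coordinates_url(api_base, titles):
    """Build a query URL asking the GeoData module for page coordinates."""
    params = {
        "action": "query",
        "prop": "coordinates",       # query module provided by Extension:GeoData
        "titles": "|".join(titles),  # several titles can be batched per request
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def pages_with_coordinates(response):
    """Keep only pages that carry coordinates; map title -> (lat, lon)."""
    found = {}
    for page in response.get("query", {}).get("pages", {}).values():
        coords = page.get("coordinates")
        if coords:
            first = coords[0]
            found[page["title"]] = (first["lat"], first["lon"])
    return found


# Hand-written sample in the assumed response shape (not real API output).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {"title": "Berlin", "coordinates": [{"lat": 52.5, "lon": 13.4}]},
            "2": {"title": "Philosophie"},  # no coordinates, so it is dropped
        }
    }
}

if __name__ == "__main__":
    print(build_coordinates_url("https://de.wikipedia.org/w/api.php", ["Berlin", "Philosophie"]))
    print(pages_with_coordinates(SAMPLE_RESPONSE))
```

Going through the API rather than the geo_tags table works the same way on every wiki, which matters if more than dewiki is in scope.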
[09:53:39] !log tools added ananthrk to project
[09:53:41] Logged the message, Master
[09:53:42] ananthrk_: ^ added you.
[09:54:03] thanks YuviPanda. i'll check it, and if i need more. i will email labs-l
[09:54:32] thanks
[09:55:16] marcmiquel: yw! if you know which templates are used to specify geo-co-ords in dewiki, you can probably use the https://www.mediawiki.org/wiki/Manual:Templatelinks_table table to find out all pages with that template
[09:55:32] ananthrk_: let me know if you need any help
[09:55:42] that would be an option, but i don't plan to use only german wiki
[09:55:56] right
[09:56:02] I tried a "create new tool" request
[09:56:04] so probably i'm going to go for wikidata or geodata apis
[09:56:10] but get back a message "Your account is not in the project tools"
[09:56:19] ananthrk_: try logging out and logging back in?
[09:56:21] see if they understand well with my python :)
[09:56:23] from wikitech.wikimedia.org
[09:56:55] marcmiquel: :)
[09:58:19] "Your account is not in the project tools.
[09:58:20] You cannot complete the action requested as your user account is not in the project tools."
[09:58:25] no luck...same message
[09:58:57] (CR) Tim Landscheidt: [C: 2] Point users to webservice2 for tomcat [labs/toollabs] - https://gerrit.wikimedia.org/r/193559 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda)
[09:59:08] ananthrk_: hmm, interesting. let me look
[10:00:32] just filed a new access request
[10:01:58] ananthrk_: hmm, I see you in https://wikitech.wikimedia.org/wiki/Special:NovaProject
[10:02:01] (under toollabs)
[10:02:07] my second question is regarding the database. i'm not a sql expert and i think what i am doing is not very efficient. i would like to know which wikipedia a username is from and, if it exists in two, which one has the higher edit_count.
[10:03:03] marcmiquel: hmm, I don’t know either. Certainly sounds possible with some code and SQL.
[10:03:10] maybe iterate through all wikis and then sort edit counts?
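A minimal sketch of the "sort edit counts" step, assuming the per-wiki lookups have already run (for instance by reading `user_editcount` from each wiki's `user` table) and were collected into one row with a value per wiki. The function name and the sample numbers are hypothetical; wikis where the username does not exist are assumed to show up as `None`.

```python
def rank_wikis_by_edit_count(counts):
    """Return (wiki, edit_count) pairs, highest count first.

    `counts` maps a wiki database name to the user's edit count there,
    or to None when the username has no account on that wiki.
    """
    present = {wiki: n for wiki, n in counts.items() if n is not None}
    return sorted(present.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up row: one column per wiki, as described in the channel.
    row = {"dewiki": 120, "enwiki": 4500, "cawiki": None}
    ranked = rank_wikis_by_edit_count(row)
    print(ranked[0] if ranked else "username not found on any wiki")
```

Sorting in application code sidesteps the one-column-per-language problem, and the ranked result can be cached per username so the expensive cross-wiki lookup is not repeated.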
[10:03:20] i tried that
[10:03:33] and got a row with one column per lang
[10:03:42] i didn't know how to sort them
[10:03:48] I don’t think you can do this with pure SQL
[10:03:55] but then again, my SQL is very primitive too
[10:04:10] ananthrk_: are you able to select ‘tools’ in the dropdown on the page I linked you to?
[10:04:20] yes..i am
[10:04:42] yup. but consulting 200 db at the same time is very expensive!
[10:05:09] this is why i wonder if there is a shortcut
[10:05:54] marcmiquel: I think the query itself (if you use the edit count column in the user table) should be simple enough...
[10:06:04] just remember to cache it and not re-issue query over and over again
[10:06:35] yes, it's not the simplicity though. it's that in a single query i may be consulting many databases
[10:06:42] and that takes time
[10:07:08] right.
[10:07:11] i was thinking if there was a table or something
[10:07:17] I’m not sure if there are other ways of optimizing it.
[10:07:21] not that I know of.
[10:07:26] labs-l might know better
[10:07:33] ananthrk_: hmm, I’m investigating (still)
[10:09:12] btw if I have done things right in my end, should i be able to login to bastion.wmflabs.org or login.tools.wmflabs.org? because I don't seem to be able to do that
[10:10:23] ananthrk_: latter.
[10:10:27] ananthrk_: oh, I wonder if you don’t have shell right
[10:10:28] * YuviPanda checks
[10:12:25] ananthrk_: hmm, you do have shell.
[10:12:30] ananthrk_: have you uploaded a public key to wikitech?
[10:13:37] yuvipanda@tools-dev:~$ groups ananthrk
[10:13:37] ananthrk : wikidev project-bastion project-tools
[10:13:43] hmm, so ldap sees you appropriately
[10:14:39] yes..i have already uploaded my public key
[10:15:13] ananthrk_: try sshing to tools-login.wmflabs.org? I’m watching the logs...
[10:15:27] in fact i do have access to bast1001.wikimedia.org
[10:15:49] right, so that’s different
[10:15:54] wikimedia.org != wmflabs.org
[10:15:58] they are strictly separated.
[10:16:21] hmm
[10:16:21] Failed publickey for ananthrk
[10:16:23] > Failed publickey for ananthrk
[10:16:25] from the logs
[10:17:35] oops..got it resolved
[10:17:45] the identities were not added in my local setup
[10:17:49] ananthrk_: :) ok!
[10:18:04] so when I explicitly did ssh-add my wikimedia keys I am able to login to tools-login now
[10:18:30] ssh-rsa
[10:18:30] AAAAB3NzaC1yc2EAAAADAQABAAABAQC3oR+Ke/M1+VC18siR+zLBK+mD/Ek4ZOmapPLamXZwvoSNwAGS7aCzx76BsvTsHT5kZOwqeRTT+EoqY0HNdwjCbFsPQ54rAlDwSN4wwa9/a/n1Oxg7eYEOv7Tt4yK4mMjcO1WtRQ1KAvjLxyWuVgZ/tP99lVVn/lmDYyh3AGvUHxOz9eI2NTqnfb3i1X9+j3zY3dC3YtrOqR9OtttYQZR+Glr/u9wWLswGB6m8zAhuHetjc3i4NLWKzuDeyXEjF6iMEL++FNPwnsWyalRSDko0fZBRTrOhk8SKPjKzppcA8P88C+ZjZxvuVIMqnyVeBtsn/Vm5MtzF
[10:18:30] dLBUFVjrjLPR ananthrk@ymxdata.com
[10:18:35] so that key also has a newline at the end
[10:19:15] so do I create a new tool in order to host the API?
[10:20:32] ananthrk_: yup
[10:22:42] YuviPanda: Hi, just tried to add a new instance "mwoffliner3" and it seems the VM creation failed. Are you facing troubles currently?
[10:22:44] but keep getting the same error whenever i try to create a new tool request
[10:23:00] Kelson: what error did you get?
[10:23:00] that my account is not in project tools
[10:23:07] Kelson: no, no issues atm.
[10:23:24] ananthrk_: hmm, strange. what is the name of the tool you wanted to create? Let me see if I can create it and add you
[10:23:28] YuviPanda: the first time I tried, I think I had no error, but the VM was not listed in the list of https://wikitech.wikimedia.org/wiki/Special:NovaInstance
[10:23:30] fyi, when I logout/login I select "labs" as my domain (which is the only one listed)
[10:23:44] YuviPanda: then I tried again with the same name, and I got something like "creation failed"
[10:23:48] for now, just create it as "clickstream_api"
[10:24:26] YuviPanda: and I can not connect via SSH to it, so I guess the creation process is stuck somewhere
[10:24:35] ananthrk_: I’ll call it clickstream-api, is that ok
[10:24:44] sure :)
[10:28:15] ananthrk_: try sshing in now, and do ‘become clickstream-api’?
[10:28:21] your current ssh session won’t work
[10:29:27] i logged out and logged in again to the host
[10:29:39] and was able to ‘become clickstream-api’
[10:30:51] thanks
[10:30:53] ananthrk_: w00t.
[10:31:06] ananthrk_: can you file a bug in phabricator about not being able to create a tool?
[10:31:48] do you really think it is a bug and not something I did on my side? :)
[10:32:03] ananthrk_: we don’t really know yet :) bugs are good to file anyway
[10:32:36] hah..sure..is it just a new phabricator ticket? do I need to tag any specific projects/users?
[10:36:12] ananthrk_: ‘tool labs'
[10:37:24] Labs: Two instances with same name - https://phabricator.wikimedia.org/T89931#1077667 (scfc) I had to delete the older instance because the confusion was too much :-). So now there's only one instance.
[10:40:12] Tool-Labs: Unable to "Create New Tool" from tools.wmflabs.org webpage - https://phabricator.wikimedia.org/T91246#1077672 (ananthrk) NEW
[10:40:26] https://phabricator.wikimedia.org/T91246
[10:40:37] do I need to add anything?
[10:41:01] ananthrk_: nope :) go ahead
[10:41:07] with your work
[10:41:09] and use uwsgi
[10:41:10] :D
[10:41:23] thanks for the help :)
[10:42:14] btw do I just copy my project code to "/data/project/clickstream-api"?
[10:43:33] ananthrk_: I’d suggest putting it in git somewhere and pulling it down to there
[10:44:05] okay..but the location is correct?
[10:45:37] ananthrk_: yeah.
[10:45:41] ananthrk_: for exact locations recommended for uwsgi, see https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Python_.28uwsgi.29
[10:48:04] Labs, Tool-Labs: Unable to "Create New Tool" from tools.wmflabs.org webpage - https://phabricator.wikimedia.org/T91246#1077684 (yuvipanda) Note that I was able to create the tool for him and add him to it.
[10:54:45] Tool-Labs, Continuous-Integration: labs-toollabs-debian-glue fails apparently with a timeout - https://phabricator.wikimedia.org/T91247#1077688 (scfc) NEW
[10:56:44] thanks..will check
[11:19:41] @seen scfc_de
[11:19:41] YuviPanda: Last time I saw scfc_de they were quitting the network with reason: Client Quit N/A at 10/16/2014 2:53:13 PM (136d20h26m28s ago)
[11:22:23] RECOVERY - Puppet failure on tools-webgrid-generic-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:24:53] RECOVERY - Puppet failure on tools-webgrid-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:26:37] wheeee
[11:28:07] Hi everyone
[11:28:38] Are there any known problems with wdq/autolist?
[11:29:24] I cannot get any results, or just errors
[11:30:48] jem: http://wdq.wmflabs.org/ looks up to me...
[11:30:56] https://tools.wmflabs.org/autolist/ as well
[11:30:59] outside of that, I dunno.
[11:31:46] YuviPanda: Yes, but when I run queries... no results
[11:32:00] Right now, "Running query..." forever
[11:32:21] jem: ah, I’m restarting them now
[11:32:47] Ah
[11:32:53] Ok, let's wait then
[12:09:35] YuviPanda: I cloned my repo in the location specified in the docs
[12:09:56] but could not get it to refer other files
[12:10:12] wdq / autolist still not working here...
[13:02:02] jem: at this point I dunno if there is anything I can do
[13:04:27] ananthrk_: what do you mean by 'refer to other files'
[13:05:39] I get a valid webpage (index list) when I navigate to the URL only when I place a file called "app.py" in www/python/src
[13:06:27] but it does not recognize other routes specified in this file
[13:06:33] Ah hmm
[13:06:42] I'm on my phone right now and can't debug
[13:06:51] thus I am assuming it is serving a different app than the one in app.py?
[13:06:57] You can look at uwsgi.log in your tool homedir
[13:07:07] ananthrk_: Aah you have to restart the webservice after every change
[13:07:14] i did
[13:07:19] Hmm
[13:07:22] That's strange
[13:07:41] I am out now
[13:07:48] Can I fetch all items that have certain properties from the wikidata repdb?
[13:07:56] If you haven't figured it out by then I'll try help
[13:08:01] * YuviPanda|zzz goes away
[13:08:39] just checked the logs
[13:08:43] "ImportError: No module named flask"
[13:08:59] so looks like i need to explicitly install flask in the new virtualenv?
[13:09:38] ananthrk_: Yeah, I don't think it comes with any modules.
[13:09:54] ananthrk_: When you make a new venv, you need to install them all.
[13:12:49] btw is it recommended that I always install venv? or is it okay to just have flask running from the folder for now?
[13:13:49] ananthrk_: I'm not sure what you're doing, but I have a few pythons run in venvs where extra modules are needed, and some others running regular.
[13:15:02] a930913: am just trying to run an API built in Flask that exposes data in a local SQLite DB for now
[13:16:25] ananthrk_: ah yeah. Put things in the venv. It doesn't have anything by default
[13:16:44] okay..will do
[13:18:00] the path for the venv is also out in the docs
[13:18:34] ~/www/python/venv
[13:18:35] yup
[13:28:19] Tool-Labs, Patch-For-Review: Enable OpenJDK 8 - https://phabricator.wikimedia.org/T68171#1077818 (Krinkle) See also T85964.
[13:39:58] Thanks anyway, YuviPanda|zzz
[13:40:23] It's really frustrating, it is back for some minutes and then off again
[13:42:36] YuviPanda|zzz: works now, again logout/login has helped.
[13:42:58] Kelson: wheee
[13:43:16] jem: nobody except magnus understands the wdq code. So hard to debug
[13:46:02] YuviPanda|zzz: shinken’s worries overnight were just you creating/destroying instances, right? Not an actual tools outage?
[13:46:25] Ah, so I see in the backscroll :)
[13:46:26] YuviPanda|zzz: wrong feedback, this is related to the size, it works for a small, but not for a x-large instance
[13:47:09] Kelson: I don’t know what you’re talking about but I can maybe help :) Do you mind starting over at the beginning?
[13:47:39] andrewbogott: Hi Andrew
[13:48:18] andrewbogott: I try to create "mwoffliner3" as an xlarge instance in "mwoffliner" project, and it fails.
[13:48:42] Most likely your project is over quota — have you checked that?
[13:48:57] I see, YuviPanda|zzz
[13:49:29] Kelson: and/or do you know how to check quota? There’s a link on the ‘manage projects’ page
[13:49:30] Currently there is no working alternative to those tools, right?
[13:50:45] Wikimedia-Labs-Infrastructure, Continuous-Integration, Continuous-Integration-Isolation: Figure out how to dedicate baremetal to a specific labs project - https://phabricator.wikimedia.org/T84989#1077901 (Krinkle) p:Triage>Normal
[13:52:08] andrewbogott: that's a good hint
[13:52:34] andrewbogott: cores & RAM are too limited
[13:53:45] Do you have other instances you can sacrifice? We’re a bit shy of space just now
[13:54:34] andrewbogott: not really, I have two of them which work well and I think to achieve to make dump of all our projects we will need around a dozen
[13:54:54] andrewbogott: the only thing we could try to shrink is IMO the memory
[13:55:08] andrewbogott: it might work with 8GB instead of 16
[13:57:04] andrewbogott: would that help if we talk again about that in a week?
[13:58:26] Kelson: Why so many instances?
[13:58:49] I can raise your quota for another 8G instance, but not for a dozen of them :)
[13:59:18] andrewbogott: generating copies of Wikipedias with pictures takes resources...
[13:59:37] Yeah, but /processing/ resources?
[14:00:05] andrewbogott: Didn't labs recently get a load more hardware? Has it all been used already?
[14:00:32] andrewbogott: processing and storage (storage because we need to cache pictures... and also to compute ZIM files) but also CPU because we need compress ZIM files, portable ZIP files and also recompress/optimize pictures
[14:00:54] a930913: yes, but it also lost a lot of hardware. And labs is growing quickly.
[14:01:41] Moar hardware! :D
[14:02:02] andrewbogott: but I don't need everything now, I'm just doing things step by step.
[14:02:39] ananthrk_: mwoffliner3 will be for example dedicated for wiktionary and the two which are already there do all the rest (except wikipedia stuff)
[14:05:31] Labs, Tool-Labs: Distributing tools, deployment-prep to both data centers (availability/redundancy) - https://phabricator.wikimedia.org/T85610#1077971 (coren) Open>declined a:coren This was decided against during the Operations meeting at the All-Hands.
[14:06:30] andrewbogott: if we reduce RAM for mwoffliner1 to 8GB and increase the mwoffliner storage quota. would we be able to create a new xlarge instance (but with 8GB of RAM)?
[14:06:59] Instance sizes can’t be changed once they’re running.
[14:07:13] andrewbogott: I can stop mwoffliner1
[14:07:30] Have you experimented to verify that you really need xlarge instances? Almost no one is using that size
[14:07:41] I mean, size is only set at creation time.
[14:09:25] Could someone point me the fastest way to reach Magnus? (I assume he's not in IRC)
[14:09:58] (autolist and wdq aren't working)
[14:10:20] Even http://wdq.wmflabs.org/stats is giving gateway errors
[14:11:18] andrewbogott: Yes, I have been building ZIM files for at least 5 years and making offline copies of Wikipedia for almost 10 years. The solution we have now can probably be improved, but it's already pretty much resource efficient.
[14:12:40] andrewbogott: ok, if we can not change the available RAM per VM after a VM creation... then this quota needs to be increased too.
[14:14:59] It looks like you have the quota to create an m1.large instance… am I miscounting?
[14:16:57] andrewbogott: I confirm, let's try to do something with this... wiktionary is a little bit a special case, there are almost no pictures... might work.
[14:17:38] andrewbogott: that's a good idea for this step. Thx
[14:18:45] Kelson: Can you send an email to labs-l describing your proposed project and resource needs? I’ll see if we can allocate more hardware so that we have capacity for it.
[14:19:03] andrewbogott: Yes, I will.
[14:19:08] thanks
[14:19:21] (Sorry if you’ve already gone over all this w/Yuvi)
[14:20:23] Hey, anyone?
[14:20:42] andrewbogott: no problem, I should already have written that email.
[14:21:32] jem: no idea. I agree that he’s seldom on IRC
[14:32:29] Thanks, andrewbogott, I've tried Twitter, let's hope there is some luck
[14:34:04] But it's strange that no one else is complaining about this, I thought wdq and autolist are very used
[15:00:17] Is it normal to not have your puppet run since Christmas day?
[15:13:04] Izhidez: is it failing, or just not running at all?
[15:13:17] (Not normal, it should run every 30 mins)
[15:17:09] Labs, MediaWiki-extensions-OpenStackManager, Tool-Labs, Tool-Labs-tools-Article-request, and 9 others: Labs' Phabricator tags overhaul - https://phabricator.wikimedia.org/T89270#1078173 (hashar)
[15:18:35] Labs, Puppet: dynamicproxy: Move list of blocked user agents to hiera - https://phabricator.wikimedia.org/T90844#1078178 (coren)
[15:20:41] andrewbogott: not sure, I just see the message every time I ssh into the instance or look at the instance on wikitech
[15:21:30] Izhidez: what project and instance?
[15:22:56] utrs - utrs-primary - i-0000024a.eqiad.wmflabs. but apparently it's still rebooting from when I hit it a second ago. I'm getting public key errors.
[15:23:51] also account-creation-assistance - accounts-appserver2 - i-00000104.eqiad.wmflabs
[15:28:08] how long before you can SSH into an instance after reboot? I'm going on 10 minutes, but the web port is working fine...
[15:30:09] i'm still getting publickey issues...
[15:31:18] Labs: Process for user backups - https://phabricator.wikimedia.org/T85608#1078208 (coren) Bugs in LVM2 make thin snapshots iffy, I'm backporting lvm2 back to Precise to fix. (WIP at https://launchpad.net/~marc-u/+archive/ubuntu/wmf/+packages)
[15:32:47] andrewbogott: ^^
[15:33:30] Izhidez: is this an instance that you have ever accessed or used for anything?
[15:34:23] You should be able to ssh in immediately.
[15:36:02] yes, I rebooted utrs - utrs-primary - i-0000024a.eqiad.wmflabs, and it hosts all of UTRS, over 13,000 unblock tickets for the english wikipedia (utrs.wmflabs.org) - still having public key issues, and I just logged into it before the reboot
[15:41:54] andrewbogott: ^^ (should I ping you with each reply? or do you see them)
[15:41:58] I can’t ssh either. I don’t know why not
[15:42:03] Why did you reboot?
[15:42:38] modified my personal .bashrc to add a few alias commands, that's it.
[15:43:41] hm… .bashrc takes effect on each login, it’s definitely not necessary to reboot for that.
[15:45:17] on each time I go in via SSH it takes effect? I thought it was only through local login, that's why I rebooted
[15:47:24] Yesh, each time.
[15:47:32] hmm ok, good to know
[15:47:42] Is the web server working still? (I rebooted in an attempt to follow the log but I’m not getting logs.)
[15:48:20] no, 502 bad gateway
[15:48:40] but that's usual till it boots up
[15:51:07] How long did it take to boot last time? seems very slow.
[15:52:04] few minutes (2-3 min)
[15:52:10] at worst
[15:56:11] ya it's usually active stale by now...
[16:05:40] andrewbogott: it's back up and active now, trying ssh
[16:05:59] still public key issues tho
[16:07:00] Oh, the firewall blocks ping — that’s part of why I’m confused. Mind if I fix that?
[16:08:29] sure, can you just let me know what you do, so I can make sure that doesn't happen in the future?
[16:08:34] I cannot ssh either
[16:09:01] I added the -1,-1 entry to https://wikitech.wikimedia.org/wiki/Special:NovaSecurityGroup
[16:09:13] it should’ve been there by default, I don’t know where it went
[16:09:31] hmm ok
[16:10:06] I don’t have any idea why ssh is broken. I can’t reach it with my root key either, so there’s not much I can do to debug
[16:13:38] there is still a way to get into the VM though, even if it requires physical access to the server hosting it?
[16:14:24] No, ssh is the only access path
[16:15:01] oh lovely...
[16:18:12] andrewbogott: mwoffliner3 is almost configured now, may you please increase our public IP quota of one so I can be able to download the created ZIM files?
[16:27:17] Hey, does anybody know if there's a way to set a newer php version on tools?
[16:27:20] Are we all stuck on 5.3?
[16:37:45] andrewbogott: could it be an issue of /home encryption? (http://askubuntu.com/questions/254776/ubuntu-server-ssh-after-reboot-permission-denied-publickey)
[16:49:40] andrewbogott: after reboot number 3 or 4, and waiting a few minutes after the reboot to try SSH, i'm in
[16:57:14] can someone do something about WDQ that is not working ?
[16:57:58] Warning: fopen(http://wdq.wmflabs.org/api?q=claim%5B31%3A5%5D+and+noclaim%5B69%3A1795487%5D): failed to open stream: HTTP request failed! HTTP/1.1 504 Gateway Time-out in /data/project/catscan2/public_html/omniscan.inc on line 132
[17:03:07] GerardM-: I dont know if there is anything we can do
[17:03:18] I restarted wdq a few times but it isn't really working
[17:03:34] And I'm not really in a position to read the c++ code to debug it
[17:03:44] I can restart it a few times again if you want
[17:04:26] huskyr: hey!
[17:04:36] YuviPanda|zzz: did you send a notice to Magnus ?
[17:04:36] huskyr: if you use trusty you get 5.5
[17:04:40] If not I will
[17:05:00] huskyr: webservice2 start instead of webservice start
[17:06:06] GerardM-: I believe jem had done something
[17:06:16] jem ?
[17:08:45] YuviPanda|zzz: If you are still not asleep :) Would you be so kind to make a new public IP available for mwoffliner3 please?
[17:09:22] Kelson: you are using these for rsync right?
[17:09:33] YuviPanda|zzz: yes, only for rsync
[17:09:39] I can do it when I'm back.
[17:09:44] YuviPanda|zzz: merci
[17:09:44] andrewbogott: ^
[17:10:13] Labs: milimetric and halfak would like postgresql database access - https://phabricator.wikimedia.org/T91267#1078554 (Milimetric) NEW a:yuvipanda
[17:10:18] Kelson: we could also try to find a solution to this that doesn't involve so many public IPS some day. Rsync proxies maybe
[17:11:31] YuviPanda|zzz: yes, of course. I'm really open to that... I simply still haven't found on my own a robust and simple to implement solution for this.
[17:12:12] who is Yem ... do I need to say something about this ?
[17:12:41] eh Jem
[17:14:36] Izhidez: sorry, I stepped away. Did you enable home encryption?
[17:15:13] not by myself no. I was wondering if it was default. but I am back in now after as I said waiting a few minutes after a reboot to SSH
[17:15:37] *to attempt ssh
[17:16:27] Hi everyone again
[17:17:44] GerardM- and the rest: The only thing I "did" was warn Magnus in Twitter
[17:18:05] And try and retry with several queries to see if there was any difference
[17:18:37] Apparently simple ones work from time to time, but complex ones never
[17:20:06] GerardM-: And nothing else to say apart from "please fix it soon" :)
[17:23:37] if complex queries break WDQ it makes sense not to run them
[17:23:52] OR define the query and ask Magnus for help
[17:24:13] breaking things time and again does not help
[17:29:49] GerardM-: But complex queries did work two weeks ago, and errors happen if loading wdq/stats
[17:29:55] *even if loading
[17:30:26] Anyway, ok, I've stopped testing until news from Magnus
[17:32:44] jem define what you need and send a mail to Magnus
[17:33:17] at this moment things break ... ie it is not testing ... not really
[17:38:51] GerardM-: I've twitted him, won't that be enough?
[17:39:25] no
[17:39:34] Ok
[17:39:41] please write up what you are doing and send Magnus a mail
[17:39:47] do you have the address ?
[17:39:56] I would have to search
[17:40:19] Or grep in the mailboxes
[17:41:54] or you get a PM
[17:45:12] :)
[17:48:30] YuviPanda|zzz: heh, shinken's top warning is that shinken can't contact shinken? (Host Unreachable (10.68.16.42) )
[18:03:27] Mail sent
[18:05:41] Labs: Hardware for Designate - https://phabricator.wikimedia.org/T91277#1078794 (Andrew) NEW
[18:37:24] bd808: Hi, I have the feeling that the runJobs.php does not really work on my labs-vagrant instance. I assume that there must be a jobrunner.json file somewhere that specifies the details how jobs are run. Do you know where I can find it?
[18:38:04] physikerwelt: It should be /etc/jobrunner.json [18:39:22] bd808: thank you. The config looks fine... but I think I'll figure out the rest by myself [18:40:31] * bd808 nods [18:44:24] bd808: thanks, it's all right... I just have way more jobs than expected. [19:40:33] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#1079204 (coren) The new shelf has been added, and configured. Actual expansion is pending on thin volumes, which itself requires a backport of a recent version of lvm2 (which is nearly complete) - Precise has... [19:42:27] aude: GerardM- jem I’m starting the update process for wdq dump from 2015-03-02 now. maybe that’ll make things better. it’s gonna have to run overnight tho [19:46:21] Labs: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1079225 (coren) Despite the success with labstore2001, the instabilities of the past three weeks have made me wary of making a change of this magnitude on labstore1001. I would suggest waiting for a week or two with no significant Labs... [19:52:16] Labs: Create a Labs instance for Shiny - https://phabricator.wikimedia.org/T91297#1079279 (Ironholds) NEW [19:53:30] Labs: Create a Labs instance for Shiny - https://phabricator.wikimedia.org/T91297#1079290 (yuvipanda) Couple of comments! 1. This is for their 'open source version', right? 2. I wonder if this can't be on toollabs? I can take a look about (2) in a day or so. [19:58:23] Labs, MediaWiki-extensions-OpenStackManager, Tool-Labs, Tool-Labs-tools-Article-request, and 9 others: Labs' Phabricator tags overhaul - https://phabricator.wikimedia.org/T89270#1079319 (coren) I'm fine with either, although the points raised by @yuvipanda make sense. I would say "do without the... [20:02:27] Labs: Create a Labs instance for Shiny - https://phabricator.wikimedia.org/T91297#1079354 (Ironholds) 1. Yes! 2.
I guess I see it more as a pseudo-infrastructure thing; it's not a tool so much as it is a platform we're experimenting with. Of course, we could just host it under the analytics project in a new i... [20:04:09] Labs: Create a Labs instance for Shiny - https://phabricator.wikimedia.org/T91297#1079356 (yuvipanda) Hmm, right. After looking at it for a bit more, perhaps your own project might be the better way to go now. @Andrew do we have enough spare capacity left atm to give a project where Oliver can create an inst... [20:21:08] Labs: Create a Labs instance for Shiny - https://phabricator.wikimedia.org/T91297#1079456 (yuvipanda) Open>Resolved a:yuvipanda Considering that I just deleted a couple of tools instances (the separate uwsgi ones), and @Ironholds promises to not use up a lot of instances and mine bitcoin, I've gone a... [20:23:10] hi Coren, back again [20:23:14] ,-) [20:23:26] my webservice task was again killed [20:23:26] Hello. [20:23:45] without me manually turning it down [20:23:52] Did you check what qacct reports for it? It'll tell you more information. [20:23:56] tool name was wikidata-primary-sources [20:24:02] ok, I look [20:25:09] YuviPanda: can I mine bitcoin on labs ;) (re. the shiny task) [20:25:16] Labs: milimetric and halfak would like postgresql database access - https://phabricator.wikimedia.org/T91267#1079483 (yuvipanda) @akosiaris can you do this one last time? I swear I'll get on the postgres user creation script as soon as labs is less on fire. [20:25:18] JohnLewis: no :) [20:25:28] <^d> Can we mine dogecoins? [20:25:33] memory gives me an amazingly high number of 120419019788.388 [20:27:44] wastl: What job number was that? [20:28:14] Oh, wait, you used 'qacct' without specifying a job number. That gives you the sum over all of your jobs ever.
:-) [20:28:23] I was just running qacct - since it is a webservice I have no jobnumber [20:28:28] I was just running qacct - since it is a webservice I have no jobnumber [20:28:33] ah ok :) [20:29:03] wastl: No, like I told you last time, you can use qstat to find the job number while it's running. Hang on, lemme find out your last one. [20:29:24] sorry [20:29:49] Don't worry about it. :-) [20:31:17] It's just a bit longer to find it otherwise. [20:31:24] not sure if you guys know this - but Special:NovaProxy seems broken [20:31:33] I logged out / logged in many times [20:31:39] with cache deleting [20:32:09] wastl: Yep. Your last ended job reached "maxvmem 4.387G"; that is why it was killed. [20:32:22] ok, that's bad [20:32:39] particularly since it's only vmem [20:33:05] milimetric: Known issue since this weekend. https://phabricator.wikimedia.org/T91114 [20:33:23] sorry for the doubletalk, thx [20:33:44] wastl: For comparison, your previous job - which you ended yourself (a restart, I think) - maxed out at 2.246G [20:34:11] yes, thanks [20:34:18] wastl: It looks like you are leaking vmem. [20:34:24] I'll see why this is happening [20:34:32] well, more likely the library I am using [20:34:47] the funny thing is that it worked for several weeks without restarting before [20:35:10] wastl: Well, the leak may be data-driven; or actual usage of your tool may have increased. [20:36:22] not really, just a handful of requests [20:36:31] but I'll investigate [20:46:04] Labs, Wikimedia-Labs-wikitech-interface: Proxy creation fails with opaque error message - https://phabricator.wikimedia.org/T91114#1079598 (coren) p:Normal>High [20:47:05] Coren: I cannot really reproduce the problem, I was running a loadtest with siege and 20 concurrent requests for 30 secs on my laptop (1170 requests overall) and the memory consumption is constant, 2961 MB virtual and 13 MB resident [20:47:19] can I run a loadtest on the labs server?
[20:47:51] wastl: Sure; the only thing you risk breaking is yours. [20:48:00] ok :) [20:48:09] since it is anyways behaving strangely ... [20:48:22] Wait, is this a big bandwidth hog or just processing? [20:48:31] Because bandwidth hogs /can/ affect others. :-) [20:48:52] I was planning to run siege -c 20 -t 30S on the service [20:49:38] no dramatic amount of data, just small JSON responses [20:49:38] That should be okay unless the requests or responses are really large. [20:50:39] in my local test with 1200 requests it said 0.18 MB transferred, so no bandwidth problem [20:51:32] ok, was killed again [20:51:48] after 45 requests [20:55:10] I was now trying to get the accounting data, qstat tells me my job id is 8572890, but qacct -j tells me job id not found [21:02:00] Coren: I think I solved it - the library I am using is computing the number of started threads based on the number of cores * 5, and probably the grid servers have some more cores than my laptop [21:02:20] so I set the number of threads now explicitly [21:02:35] still I cannot get the accounting data with the job ID [21:03:05] wastl: 8 or 16, depending on which. [21:03:56] wastl: You can't get accounting with a running job; those are postmortem. You /can/ see the live stats with 'qstat -j ' [21:04:01] 8572973 atm [21:04:08] ah good [21:04:29] wastl: In particular, you want the 'usage 1' line [21:04:39] yes, saw it [21:05:02] the thing is it only shows vmem, but in reality the service never uses more than a few MB [21:12:49] Coren: please restart the webserver for https://tools.wmflabs.org/fengtools/contribsize/result.php?user=Matanya [21:12:52] arrg [21:12:56] https://tools.wmflabs.org/magnus-toolserver/persondata.php [21:13:13] * Coren looks into it. [21:13:31] request # unsigned-int [21:13:38] matanya: It's already running - is it ill? 
[21:13:44] yes [21:13:47] see the link [21:14:03] returns something that looks like 503 [21:16:42] Coren: FYI: https://tools.wmflabs.org/catscan2/notice.html [21:17:55] matanya: That explains http://tools.wmflabs.org/catscan3/catscan2.php [21:18:04] matanya: I've restarted it. [21:18:10] thanks [21:18:43] Coren: all the saga around bigbrother, can I suggest we replace it with monit ? [21:19:27] matanya: It's not clear this would help; some of the tools are ailing not for lack of restarting, but because they have systemic instabilities that would take time to solve. [21:19:40] Coren: in qstat, what does the mem= column tell me? I read it is the accumulated memory consumption but has it any consequence? [21:19:46] I understand [21:20:05] can you name a few issues, and i see if i can help ? [21:20:17] it is the only memory count that is growing [21:21:30] mem= includes subprocesses, and cumulates. It's not a particularly useful metric, which is why resource allocation uses vmem instead [21:21:59] ok, so it will inevitably grow bigger over time [21:22:09] then I think the issue is solved [21:22:17] wastl: Yes, but it's not counted against the job. [21:22:40] thanks :) [21:28:26] matanya: phab all the way :D [21:28:43] * matanya dives deep [21:30:11] ok, bye [21:42:28] Coren: what about this tool? https://tools.wmflabs.org/wikiviewstats/ [21:42:45] sorry for driving you crazy, i would debug myself, if i could [21:43:03] matanya: Heh. Don't worry about it. [21:43:35] This one at least has a pretty error message. :-) [21:43:51] Hm. [21:43:58] unlike this 504: https://tools.wmflabs.org/dnbtools/dnb_wikisource.php [21:44:06] the tool author must be German [21:44:12] i can tell from the test image :) [21:44:22] hi mutante ! :) [21:44:29] it's what you saw on German tv after midnight [21:44:37] back in the days [21:44:43] matanya: hello:) [21:46:24] (08004/1040): Too many connections [21:46:26] Hm.
[21:47:13] Coren: https://tools.wmflabs.org/bbc-tv-cite/ one more with webserver issues [21:47:23] and https://tools.wmflabs.org/catfood/catfood.php [21:47:29] matanya: That one just isn't running. [21:47:43] which one of them ? [21:47:46] matanya: Nor is catfood. Lemme check those two first - that's trivial. [21:47:52] https://tools.wmflabs.org/catfood/commons_image_feed.php [21:47:54] as well [21:48:34] matanya: put those on T90569 [21:49:12] i made T90800 but not sure if you want a separate one for each [21:49:51] Catfood back up - it had a typo in its .bigbrotherrc that prevented restarts [21:50:48] bbc-tv-cite restarted; that one didn't have a .bigbrotherrc to restart its webservice at all. [21:52:54] Coren: the third one too please [21:53:14] What third one? [21:54:05] https://tools.wmflabs.org/catfood/commons_image_feed.php [21:54:24] That's also catfood. That's the second one. [21:54:48] thanks [21:55:47] matanya: The wikiviewstats one doesn't seem to be an error message at all but a hardcoded thing. [21:56:35] I see an actual error 2h ago, but fresh webserver restart doesn't report errors in the log. [21:58:06] Coren: what about writing a status.wmflabs.org that will list the status of every service ? [21:58:12] * matanya just throws ideas [21:58:49] matanya: You mean, like https://tools.wmflabs.org/?status [21:59:05] matanya: That can't help for tools with errors. :-) [21:59:14] yes and no [21:59:38] Coren: something along : http://status.wikimedia.org [22:00:08] but instead of services the status of the tool [22:00:09] we had icinga [22:00:28] me and you know how to access that [22:00:29] matanya: Sounds like a worthwhile hackathon thing to do. [22:00:33] average user doesn't [22:01:03] yes, Coren do you have vacation days to spare for me ?:) [22:01:23] Surely thou jests! :-) [22:06:45] matanya: Sorry, can't seem to see what could be wrong with wikiviewstats. It logs no error, and unconditionally returns that message.
AFAICT, that only should happen when it can't connect to the database but I see nothing wrong with them nor does it report a failure. [22:07:18] well, too bad. thanks for checking [22:08:16] matanya: And when I connect to the tool's DB with the tool's credentials, it works fine. [22:08:38] so it is a hardcoded error :) [22:08:45] https://phabricator.wikimedia.org/T63833 [22:08:50] ^ wikiviewstats bug [22:09:10] (yea, it is kind of abused for a different thing, should be 2) [22:09:54] matanya: I'm trying to see if it's using the right credentials. [22:11:07] Huh. That's odd. [22:12:02] There are two sets of creds, one of which is specifically connecting to the wrong db [22:13:46] Tool-Labs-tools-Other: wikiviewstats - No db-connection - https://phabricator.wikimedia.org/T91320#1080054 (Dzahn) NEW [22:14:45] Labs, Wikimedia-Labs-Infrastructure: Fix syslog error "nslcd[29117]: error writing to client: Broken pipe" - https://phabricator.wikimedia.org/T78616#1080067 (Krinkle) It seems on Trusty instances this is frequenting the logs quite a lot. About a dozen syslog entries every minute. [22:14:53] Tool-Labs-tools-Other: Wikiviewstats does not support Wikidata - https://phabricator.wikimedia.org/T63833#677111 (Dzahn) >>! In T63833#823180, @Andyrom75 wrote: > It seems that Hedonil do not connect since 08/2014. Can anyone support the resolution of db connection on his behalf? I made a separate ticket fo... [22:15:42] Tool-Labs-tools-Other: wikiviewstats - No db-connection - https://phabricator.wikimedia.org/T91320#1080054 (Dzahn) [22:16:03] matanya: Nope, sorry, can't figure out what's wrong with the tool, but it's clear that something is confused in its configuration. It's using wrong ports for some things, apparently. [23:01:34] andrewbogott: fyi: Looks like provisioning the instances will take at least another 1-2 days.
[23:02:22] As usual, whenever we need to do anything with puppet in CI, it turns out everything is broken since the last time we created an instance. I've resolved 4 blockers. Each one only exposed after the other. It seems there's only one left, but not sure whether something else will expose itself after that. [23:02:36] Tracking via https://phabricator.wikimedia.org/T90984 [23:03:05] So we're still on double quota for the moment. Which is stressing our puppetmaster with limited /var/log, but anyways. Just letting you know since it was supposed to be done by today. [23:07:17] Krinkle: that’s a good reason to build new instances every week :) It keeps puppet in line [23:07:55] andrewbogott: Soon we'll be re-creating them continuously ;-) [23:08:09] perfect [23:08:16] 1000s a day [23:08:25] ^_^ [23:08:54] aka node pool / isolation testing [23:09:26] We'll put the 'continuous' back in 'continuous integration' :P [23:11:09] heh [23:11:17] that's a good motto :P