[00:18:42] Hm… I'm here to deal with the outage but it seems to've been fixed already.
[00:18:54] Good news I guess :)
[00:19:07] back soon...
[07:01:30] (CR) Adamw: "lol, good judgement call!" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/112311 (owner: Adamw)
[08:00:58] hey andrewbogott
[08:01:00] andrewbogott: I have an instance where I am trying to sudo, and can't
[08:03:27] what instance?
[08:04:28] YuviPanda: ?
[08:04:37] andrewbogott: dptypes
[08:04:42] (design-prototypes)
[08:04:57] andrewbogott: I had a couple of other instances that had the same problem, so I just deleted and re-created them. still have it
[08:05:26] And you checked the sudo policy for the project already?
[08:06:17] andrewbogott: yeah
[08:08:12] andrewbogott: everyone can be everyone else
[08:09:08] you should never need to delete an instance to get sudo working
[08:09:09] it uses LDAP
[08:09:34] Ryan_Lane: true, but I had used labs::vagrant and thought that fucked it up.
[08:09:48] oh
[08:09:50] Ryan_Lane: and then when I deleted the second one I was slightly drunk and it was saturday night where nobody else was around :P
[08:09:53] but apparently not
[08:09:57] :D
[08:10:13] ok, more than slightly. but still :P
[08:10:49] YuviPanda: it looks to me like you have a special policy just for you, that forbids everything
[08:11:17] andrewbogott: oh?
[08:11:29] Ryan_Lane: where? I don't see it
[08:11:31] err
[08:11:32] andrewbogott: ^
[08:11:45] this project is called 'design' right?
[08:11:45] andrewbogott: and I only checked myself after this problem started occurring
[08:12:09] Oh, hm, maybe I'm misreading. Anyway I'm going to clean that up
[08:12:15] andrewbogott: yeah
[08:12:16] andrewbogott: ok
[08:12:35] but you're right, it shouldn't have mattered.
[08:13:23] YuviPanda: Can you check another project and see if the behavior is different?
[08:14:26] andrewbogott: let me do it
[08:16:01] andrewbogott: worked for multimedia-alpha
[08:21:25] andrewbogott, yuvipanda: it's possible to debug sudo ldap
[08:21:27] on the system
[08:21:50] hmmm
[08:21:52] SUDOERS_DEBUG
[08:22:09] I guess I can't do it, not having root
[08:22:11] * yuvipanda tries
[08:22:15] hah
[08:22:16] right
[08:22:22] Yuvi - I got the HTTPS stuff worked out on UTRS
[08:23:16] andrewbogott: that's set in /etc/sudo-ldap.conf
[08:23:19] Ryan_Lane: So, since sudo works for me anyway, does that help?
[08:23:32] andrewbogott: you can su to root, then su to yuvi
[08:23:34] on that box
[08:23:37] to see what's happening
[08:23:42] with debug enabled, of course
[08:24:21] "sudoRunAsUser: %project-design" <- most likely the problem
[08:24:45] TParis: wooo!
[08:24:52] BTW, IIRC there had been some failures to login b/c of /public/keys not working. What's the reason not to use LDAP for auth on the instances?
[08:24:55] well, maybe… I can't remember what that means
[08:25:14] scfc_de: no support in ssh to read keys from ldap
[08:25:23] not with default ssh. you need to have a patched ssh
[08:25:27] in 14.04 this changes
[08:25:34] Ryan_Lane: That's a very valid reason :-).
[08:25:38] there's a way to write scripts that handle the keys
[08:25:55] so we'll read keys from LDAP directly for >=14.04
[08:26:33] hm… Ryan_Lane, what's supposed to happen when I set SUDOERS_DEBUG?
[08:26:42] SUDOERS_DEBUG=3
[08:26:43] or so
[08:26:50] http://www.sudo.ws/sudoers.ldap.man.html
[08:27:08] andrewbogott: wat. I am root now :|
[08:27:11] i can be root now
[08:27:16] when you use sudo after that it'll show all the debug info
[08:27:21] Yeah, the 'run as' was set weird.
[08:27:26] ah
[08:27:35] The gui is kind of dumb, it shouldn't offer you that option.
[08:27:46] I had run-as set?
[08:28:14] YuviPanda: I think so… I just created a different project for you a few days ago, right? What was that called?
[08:28:17] I'll check and compare.
[08:28:27] andrewbogott: that is this one :P
[08:28:31] oh
[08:28:32] andrewbogott: 'design'
[08:28:32] hm
[08:28:52] I thought there were two
[08:29:06] no?
[08:29:07] just 'design'
[08:29:31] anyway, I'll just pick a different recent one
[08:29:40] ok!
[08:32:39] Yeah, YuviPanda, other new projects don't have that box checked. But nonetheless, the box shouldn't be there, so that's the real issue.
[08:32:49] hmm, ok
[08:32:53] Do you have a minute to make a bug for me about that?
[08:33:01] *shrug* Actually, I can do it. nm
[08:33:30] andrewbogott: ah, ok!
[08:33:56] wikitech is so slow though, on any of the Nova* pages
[08:34:19] scfc_de:
[08:34:33] andrewbogott: I still can't sudo into being another user.
[08:34:55] scfc_de: Good day Tim. I have made a tarball for the old meetbot logs ( https://bugzilla.wikimedia.org/show_bug.cgi?id=61128 , assigned to you :D )
[08:35:02] andrewbogott: I could do 'sudo -s' first and then su vagrant
[08:35:39] hmm, ok, I am running a vagrant provision
[08:35:57] andrewbogott: hmm, 'only root can execute commands as other user'?
[08:36:07] YuviPanda: there isn't anyone in that project but you :)
[08:36:10] Well, and me I guess.
[08:36:17] andrewbogott: there's a local user called 'vagrant'
[08:36:39] Oh...
[08:36:48] a local user, I don't know that you can do that via the gui.
[08:37:06] yeah, but I do have a sudoers.d file
[08:37:09] my role puts it there
[08:37:28] hashar: Thanks! I'll put them there later.
[08:37:48] scfc_de: and thank you very much for having handled the Meetbot installation :-]
[08:38:38] hashar: np, it was an interesting experience :-). supybot is very far from a traditional "these are your options, now run a daemon with it".
[08:39:10] YuviPanda: I don't immediately know how to fix this. I would think that modifying stuff in /etc/sudoers.d/ would do the trick.
[08:39:28] scfc_de: iirc it is more like, run daemon, let you figure out what python code you have to write to set it up properly :-D
[08:39:31] andrewbogott: let me look at my sudoers.d, my role does put something there
[08:39:34] If it doesn't then this will be a somewhat big development issue. You can log a bug and pester me about it if it's important.
[08:39:46] scfc_de: as for packaging Meetbot, I tried to reach out upstream but never got any reply :-(
[08:39:48] Anyway, you can sudo su -
[08:39:54] so that should get you… anything you actually need?
[08:40:07] sudo cat /etc/sudoers.d/vagrant
[08:40:08] vagrant ALL=(ALL) NOPASSWD:ALL
[08:40:09] andrewbogott: ^
[08:40:37] andrewbogott: labs-vagrant tries to run as root sometimes and as vagrant other times, so needs sudo
[08:40:42] so are we talking about granting rights /to/ vagrant?
[08:40:50] Or granting rights to you so you can become vagrant?
[08:40:54] hashar: MeetBot seems very ... "diversely maintained". The links on the Debian page are broken, and there are lots of repos. But I think copy-catting the OpenStack people isn't a bad choice.
[08:41:13] YuviPanda: OK, well, as I said… "If it doesn't then this will be a somewhat big development issue. You can log a bug and pester me about it if it's important."
[08:41:13] andrewbogott: *to* vagrant
[08:41:20] andrewbogott: heh, ok!
[08:41:24] wait, now you're just saying both things!
[08:41:34] andrewbogott: yes! both are an issue!
[08:42:00] Well… ldap doesn't know about the vagrant user, so that should definitely be configurable in /etc
[08:42:12] I'll turn on debugging so you can see what's happening.
[08:43:00] andrewbogott: I can't do sudo -u vagrant -s, nor can I become root from vagrant
[08:43:01] ok
[08:43:09] done. Very wordy!
[08:43:20] I am many hours late for my lunch, so -- back later.
[08:43:40] andrewbogott: baaah, works now
[08:43:42] grrr
[08:43:44] andrewbogott: anyway, nevermind
[08:43:47] :p
[08:44:01] * andrewbogott didn't do it!
[08:44:23] heh
[09:01:39] scfc_de: I used the Meetbot version from a darcs repository
[09:02:17] scfc_de: from https://wiki.debian.org/MeetBot : darcs get http://anonscm.debian.org/darcs/collab-maint/MeetBot/
[09:02:23] scfc_de: i guess the openstack version is fine :-]
[09:04:44] hashar: I'm *so* sure that I tried that and got an error; I wouldn't have been so stupid to mistake the error message that darcs isn't installed for that, would I? :-) Well, I want to package MeetBot properly anyway, so it will be updated in time :-).
[09:05:58] scfc_de: I am sure a lot of people would love to just: apt-get install meetbot :D
[09:08:54] hashar: I think it should be a two-parter: One package that goes under /usr/share/something/python-modules/supybot/plugins, and maybe another for the app that just calls supybot with appropriate options. But for the latter, it would probably be good to refactor supybot so that it is more similar to common Unix daemons, and that's a lot of work.
[09:16:56] scfc_de: yup probably a lot of work indeed. That is certainly why Meetbot is not packaged yet :]
[12:41:08] scfc_de: i'm getting 502 on labs webs
[12:41:40] matanya: Which URL?
[12:41:44] The proxy server received an invalid response from an upstream server.
[12:41:44] The proxy server could not handle the request GET /rightstool/cgi-bin/recentlogs.
[12:41:44] Reason: Error reading from remote server
[12:41:48] http://tools.wmflabs.org/rightstool/cgi-bin/recentlogs?user=matanya
[12:42:47] s/labs/tools/
[12:43:12] yeah, that
[12:43:17] thanks Damianz
[12:44:00] matanya: "[Mon Feb 10 12:43:11 2014] [error] [client 10.4.1.89] Script timed out before returning headers: recentlogs, referer: http://tools.wmflabs.org/rightstool/cgi-bin/recentlogs"
[12:44:20] too heavy, i guess
[12:44:30] Too long?
[12:44:49] i think the query is too heavy, and hence, takes too long
[12:45:32] Potato, potato ... :-)
[12:46:19] Didn't look closer at your query but you seem to select logging for user; there is a special table for that with indexes. Moment.
[12:47:05] matanya: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Tables_for_revision_or_logging_queries_involving_user_names_and_IDs
[12:47:37] oh, thanks scfc_de
[14:21:24] petan: wm-bot is present in #wikimedia-operations, but hasn't written the log since yesterday, [04:28:29].
[14:21:35] yes I know
[14:21:39] petan: And here neither.
[14:21:45] because /data/project is read only
[14:21:53] it is waiting to flush the logs to fs
[14:21:59] but someone needs to fix gluster
[14:22:09] now it's in memory
[14:22:17] let's hope it won't crash until someone fixes gluster
[14:22:27] btw I think that mysql logs are ok
[14:22:32] andrewbogott: Could you take a look at that? Is it possible to fix Gluster on the Bots project without reboot?
[14:22:47] scfc_de: http://tools.wmflabs.org/wm-bot/logs/
[14:22:55] it uses double storage
[14:24:16] it once happened that bot was holding data in memory for weeks until gluster got fixed...
[14:24:27] that was a funny resync
[14:25:00] petan: Looks okay; still I prefer the text files :-). I have commands /todayslog and /yesterdayslog that just load the corresponding text files.
[14:25:18] yes I prefer text files too, that's why it produces them as well
[14:25:26] especially because I can wget the tarball with all of them
[14:25:47] should send them to elasticsearch :P
[14:26:03] wm-bot can't do magic, if there is no filesystem to write to, it doesn't write...
[14:26:16] scfc_de: Back -- I'll take a look.
[14:26:42] scfc_de: is the problem /home or /data/project?
[14:26:51] /data/project
[14:27:13] if you do sudo touch /data/project/meep it will slap you
[14:27:44] restarting autofs didn't fix it
[14:27:57] try now
[14:28:14] wm-bot: go ahead ;P
[14:28:15] Hi petan, there is some error, I am a stupid bot and I am not intelligent enough to hold a conversation with you :-)
[14:29:04] andrewbogott: I forgot: Is there some plan to move off Gluster, or just occasional ranting? :-)
[14:29:20] scfc_de: yes, in the new datacenter everything will use nfs
[14:30:42] scfc_de: it's back
[14:31:26] eqiad - it sounds like some fantastic place in a movie that is never quite reached :-).
[14:31:34] scfc_de: according to my personal statistics, since tools moved from gluster to nfs, nfs started to have outages more often than gluster
[14:32:02] petan: And the logs look fine; thanks!
[14:32:06] so even when it's on nfs I expect troubles :P
[14:32:17] but they'll be /different/ troubles!
[14:32:21] scfc_de: I didn't even do anything :D
[14:32:30] * petan pats wm-bot
[14:32:30] wm-bot, hi
[14:32:39] petan: ... anything *now*, but previously :-).
[14:33:44] wm-bot, ping
[14:34:00] mm...
[14:34:01] umm....
[14:34:01] oh right
[14:34:02] !ping
[14:34:02] !pong
[14:34:04] Yesterday I wondered whether we should maybe partition NFS in smaller pieces (say, each tool or so), so that a failure in XFS doesn't cause the whole cluster to halt, but unfortunately that probably doesn't work well for a few hundred NFS volumes :-).
[14:35:33] scfc_de, that way only one tool fails and not all?
[14:36:14] Gotta go.
[14:36:27] Cyberpower678: That was the idea; and it probably works for a dozen or so volumes, but not on Tools scale :-).
[14:49:45] andrewbogott: i am unable to ssh into our wikidata-jenkins instance
[14:50:19] i wonder if there is a way to "fix" it or do i need to make a new instance?
[14:50:36] aude: I'll have a look… have a call in a few minutes though
[14:50:41] ok
[14:51:01] i know there were some issues during the weekend on labs, so perhaps related
[14:51:06] aude, what's the project name?
[14:51:13] wikidata-dev
[14:51:36] mostly puppetized but might be a few things not
[14:52:27] aude, try now?
[14:53:08] Unable to create and initialize directory '/home/aude'.
[14:53:15] that's new
[14:53:59] * aude in now :)
[14:54:05] ok
[14:54:11] thanks!
[14:54:20] I wonder if we just crossed some numeric threshold with gluster, it's failing like crazy this week...
[14:54:25] after a couple of months of peace :(
[14:54:33] no idea
[15:21:34] Hey folks. I was asked to help get a set of new users approved for Tool Labs. The access requests should have already been filed. All summaries should reference "Taha Yasseri's Northeastern Workshop".
[15:21:47] Can someone take a look?
[15:23:25] petan, addshore: ^
[15:23:40] Thanks Krenair
[15:23:59] ok sec
[15:27:20] halfak: There are no access requests that have not been completed. Do you have a link?
[15:28:03] Not exactly. I got the message 5 hours ago. I looked around wikitech, but I couldn't figure out where to look for open requests.
[15:28:44] halfak:
[15:28:45] https://wikitech.wikimedia.org/w/index.php?title=Special%3AAsk&q=%5B%5BCategory%3ATools+Access+Requests%5D%5D%5B%5BIs+Completed%3A%3ANo%5D%5D&po=&eq=yes&p%5Bformat%5D=broadtable&sort_num=&order_num=ASC&p%5Blimit%5D=&p%5Boffset%5D=&p%5Blink%5D=all&p%5Bsort%5D=&p%5Bheaders%5D=show&p%5Bmainlabel%5D=&p%5Bintro%5D=&p%5Boutro%5D=&p%5Bsearchlabel%5D=%E2%80%A6+further+results&p%5Bdefault%5D=&p%5Bclass%5D=sortable+wikitable+smwtable&eq=yes
[15:29:20] halfak: I don't see any open requests at https://wikitech.wikimedia.org/wiki/Special:RecentChanges, either.
[15:29:41] Gotcha. Might have been a miscommunication. I'll go back to the event organizer. Thanks for taking a look.
[15:29:48] np
[15:34:58] andrewbogott: when you have a minute, wikidata-jenkins is inaccessible again :(
[15:35:15] are we doing something wrong or what is the problem?
[15:35:16] hm… I can do what I did before...
[15:35:33] i can pull latest puppet (when i get in again)
[15:35:39] I think it's just that gluster is over-taxed. I don't know what to do about that in the short run.
[15:35:45] ok
[15:35:46] It's nothing happening on the instance.
[15:36:04] even http doesn't work (did before)
[15:37:20] oh, that's something else then!
[15:37:37] could be
[15:37:56] maybe i can reboot this time
[15:40:49] yeah, might be oom or something
[15:41:07] should i do reboot?
[15:42:21] sure
[15:42:39] If http is broken then something is wrong local to the instance -- largely beyond my influence :)
[15:42:59] You guys having issues with GlusterFS as well?
[15:43:27] lol
[15:43:37] When does one not have issues with glusterfs?
[15:45:48] oh, i'm in again
[15:45:50] ok
[16:28:44] andrewbogott: sorry forgot to update jenkins-deploy home directory in LDAP and I can't do it apparently :(
[16:28:57] andrewbogott: would you mind changing jenkins-deploy homedir to point to: /mnt/home/jenkins-deploy please? :-]
[16:29:47] I'll see if I can do it without puppet countering
[16:30:43] hashar, surely you have sudo on that machine? (I don't even have a login atm)
[16:31:30] andrewbogott: that is on virt0 isn't it ?
[16:31:33] I don't have access there
[16:31:45] ?
[16:32:03] I mean the LDAP server. I don't think I have any credentials to do ldap modifications :/
[16:32:07] You're asking for a change on jenkins-deploy, wouldn't that change be made on...
[16:32:47] Oh, you mean the /user/ jenkins-deploy?
[16:32:56] the user is not a systemuser{} declared by puppet, sorry
[16:33:22] yeah the LDAP user :-]
[16:33:23] sorry
[16:33:35] I should commute to singapore from time to time
[16:45:30] hashar, how's that?
[16:45:43] my hero :-]
[16:53:38] HOME=/mnt/home/jenkins-deploy
[16:53:46] andrewbogott: THANK YOU VERY MUCH
[16:54:14] hashar: sure -- the hard part was figuring out the ldap password :)
[16:54:23] :]]
[18:48:11] !ping
[18:48:12] !pong
[18:51:03] !searchlogs
[18:51:07] !searchlog
[18:51:07] http://bots.wmflabs.org/~wm-bot/searchlog
[21:52:55] Coren: Logging into tools-webproxy, I get "Creating directory '/home/scfc'." => "Unable to create and initialize directory '/home/scfc'." => "Connection to tools-webproxy.pmtpa.wmflabs closed.". Sounds like /home mount is broken?
[23:00:41] scfc_de: On webproxy? Odd. Lemme check.
[23:16:12] !log tools rebooting webproxy (braindead autofs)
[23:16:14] Logged the message, Master
[23:17:44] scfc_de: Should be all better now.
[23:18:07] * Coren hates autofs
[23:23:35] Yep, it seems to work. Do we really need autofs?
[23:27:17] scfc_de: We don't need it at all in our use case, IMO. It adds a layer of unneeded complexity since the mappings are few, very static, and explicitly in puppet anyways.
[23:28:07] Coren: So replace them with normal puppetized fstab mounts some time in the future?
[23:46:53] the tools web is broken :/
[23:49:13] Coren: scfc_de: https://tools.wmflabs.org/ is down
[23:52:14] hedonil: Working on another outage in prod atm, will look at it shortly.
[23:52:30] Coren: 'k
[23:53:03] "Syntax error on line 42 of /etc/apache2/sites-enabled/100-webproxy.conf:"
[23:53:09] "RewriteMap: file for map dynamic not found:/var/run/apache2/dynamic"
[23:54:10] Ah, the /var/run symlink got killed by the reboot. Moment.
[23:58:11] hedonil: Should be back now.
[23:58:27] scfc_de: aaah! very good. thx
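
Editor's note on the last outage in this log: Apache refused to start because the file behind the "dynamic" RewriteMap was missing after the reboot. Below is a minimal sketch of a pre-start check that would flag this kind of problem. It is not something the channel participants wrote or ran. The function name missing_map_files is hypothetical, and the "txt:" map type in the sample is an assumption, since the log does not show the actual RewriteMap line from 100-webproxy.conf.

```python
import os
import re

# Matches directives such as: RewriteMap dynamic txt:/var/run/apache2/dynamic
# Only file-backed map types are checked; "int:" maps are built in and have no file.
REWRITE_MAP = re.compile(
    r"^\s*RewriteMap\s+(\S+)\s+(txt|rnd|dbm(?:=\w+)?|prg):(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def missing_map_files(config_text, exists=os.path.exists):
    """Return (map_name, path) pairs whose backing file does not exist.

    `exists` is injectable so the check can be exercised without touching disk.
    """
    missing = []
    for name, _map_type, path in REWRITE_MAP.findall(config_text):
        if not exists(path):
            missing.append((name, path))
    return missing


# Hypothetical sample; the real config line is not quoted in the log.
sample = "RewriteEngine On\nRewriteMap dynamic txt:/var/run/apache2/dynamic\n"
for name, path in missing_map_files(sample):
    print(f"map {name!r}: {path} not found")
```

Running a check like this against the site config before restarting Apache would report the missing path up front instead of leaving the proxy down. It only reads the text it is given and does not follow Include directives.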