[00:20:14] how do I schedule a crontab? Should I do it from my account? IE local-voxelbot doesn't have permission to set a crontab [00:20:33] er, you should be able to do it as your tool [00:20:53] crontabs/local-voxelbot/: fdopen: Permission denied [00:21:05] My tool can'... what sdamashek said [00:21:07] *sigh* [00:21:18] fwilson: I'm discussing that with legoktm :P [00:21:22] :) [00:21:33] is that when you try crontab -e? [00:21:45] yes [00:21:51] weird [00:21:56] file a bug :) [00:22:09] on tools-login [00:22:29] tools-login is correct? [00:23:48] yeah [00:43:22] legoktm: I noticed local-voxelbot isn't in the crontab group [00:43:29] ouch. [00:43:32] should that be automatic? [00:43:38] probably [00:48:20] bug fileds [00:48:22] filed* [00:51:02] Tools don't need to be in crontab group. [00:51:26] oh they don't? [00:51:33] how should I schedule crontabs then? [00:51:39] from my account? [00:52:10] sdamashek: Do you "become voxelbot", and then "crontab -e"? [00:52:20] scfc_de: exactly [00:52:51] Hmmm. What does "crontab -l" say? [00:53:02] local-voxelbot@tools-login:~$ crontab -l [00:53:02] crontabs/local-voxelbot/: fopen: Permission denied [00:53:21] cause it's not in the crontab group [00:54:55] No, that's not the cause. local-typoscan isn't in the crontab group either, and "crontab -l" works perfectly. "crontab" is setuid, so the calling user doesn't need to access /var/spool/cron/crontab. [00:55:23] any ideas then? [00:57:09] I don't think normal users can misconfigure their tool accounts in a way to shut them out from cron, and the admins petan and Coren are away, so you'll probably have to wait till tomorrow. [00:57:36] okay [01:00:18] Ryan_Lane, are you currently grinding through instances with puppetValues.php? I'm happy to do the cleanup but don't know how to generate the list of 'old salt' instances. [01:00:32] I did, yeah [01:00:42] still a bunch of instances not reporting with salt, though [01:00:47] not totally sure why [01:01:15] sdamashek: "man crontab" says: "There is one file for each user's crontab under the /var/spool/cron/crontabs directory. Users are not allowed to edit the files under that directory directly to ensure that only users allowed by the system to run periodic tasks can add them, and only syntactically correct crontabs will be written there. This is enforced by having the directory writable only by the crontab group and configuring crontab [01:01:15] command with the setgid bid set for that specific group." [01:01:32] sdamashek: So the trailing "/" in your error message looks very odd. [01:01:49] andrewbogott: I think I got all the ones I can [01:02:23] 30 more -- big improvement! [01:02:40] yeah [01:02:50] Still leaves a lot of mysterious lost instances though. [01:02:52] and at least 5 instances are ubuntu 11.10 [01:02:56] sdamashek: If you do "env | fgrep voxelbot" as the tool account, are there any "/"s there? [01:03:17] still, though, that's like 60 not reporting [01:06:27] Is it possible to make a lucid instance behave? Or are lucid instances permenantly out of the game? 
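A quick way to verify the two points raised in the crontab exchange above: the crontab binary should carry the setgid bit for the crontab group (so the calling tool account needs no group membership, as quoted from "man crontab"), and the stray trailing "/" in the error message suggests looking at how the account's home directory is recorded. A minimal sketch, run as the tool account named in the log:

    ls -l /usr/bin/crontab           # expect something like -rwxr-sr-x root crontab; the "s" is the setgid bit
    getent passwd local-voxelbot     # does the home-directory field end in "/"?
    env | grep -E '^(HOME|PWD)='     # the same check against the live environment, as suggested above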
[01:10:38] vacation9@tools-login:~$ env | fgrep voxelbot [01:10:39] OLDPWD=/data/project/voxelbot/VoxelBot [01:11:06] sorry irc froze [01:11:31] sdamashek: erm, thats not as your local-user [01:11:42] he said as the tool account [01:11:56] which means $ become voxelbot; first [01:12:15] oh [01:12:23] local-voxelbot@tools-login:~$ env | fgrep voxelbot [01:12:23] USER=local-voxelbot [01:12:23] USERNAME=local-voxelbot [01:12:23] MAIL=/var/mail/local-voxelbot [01:12:24] PWD=/data/project/voxelbot/ [01:12:26] HOME=/data/project/voxelbot/ [01:12:28] LOGNAME=local-voxel [01:12:37] sorry for the spam [01:16:48] Hmmm. I don't understand that, but Coren or petan will have to debug that. [01:18:02] andrewbogott: they should work. is something broken with one? [01:18:21] Nope, just wondering if I should be exluding lucid boxes from the instances that need fixing [01:19:02] precise and lucid should work [01:19:07] only 11.10 are broken [01:19:16] andrewbogott: I have a list of downed hosts on virt0 [01:19:29] hosts-down.txt in /root [01:19:54] this instance is hosed: https://wikitech.wikimedia.org/wiki/Nova_Resource:I-000000e2 [01:20:02] drdee: ^^ [01:20:06] that's in analytics [01:20:11] I think it's been dead for ages [01:20:32] in fact, I think it was one of the 30 that was corrupted ages ago and I just never deleted it [01:20:35] Ah, I just generated that same list. Matches. [01:20:53] I'll start at the end :) [01:20:59] heh [01:21:00] cool [01:21:01] thanks [01:21:12] yep. that one is long dead [01:21:14] I'm deleting it [01:21:40] hm, of course there's the problem of instances that predate my root key :( [01:24:44] andrewbogott: mark the ones you get permission denied on in the file [01:24:55] hm. [01:24:58] maybe copy the file [01:25:25] Yeah, I have a copy [01:25:39] When puppet says 'err: Could not request certificate: getaddrinfo: Name or service not known' that makes me think that the instance never came fully up to begin with [01:25:49] there's nothing to do about that, is there? [01:26:20] Oh, or it's possible I just forgot to sudo :( [01:28:27] andrewbogott: some instances may be missing /etc/puppet/puppet.conf [01:29:15] I ran a ddsh to kill puppetmaster::self on instances the class was removed and ran a command to fix puppet.conf [01:29:37] but some instances hadn't run puppet in so long that they didn't have /etc/puppet/puppet.conf.d [01:29:45] will a clean puppet run generally bring salt up, or is there another step to get it going? [01:32:01] clean puppet run should [01:32:06] look in /etc/salt/minion [01:32:20] Ah, good, this one just started working. [01:32:22] if master_finger ends in dd it should be ok [01:35:58] OK… this one is saying "Exiting; no certificate found and waitforcert is disabled" and it has a puppet.conf dated today [01:37:13] andrewbogott: is it empty? [01:37:33] if so, copy a good one from bastion-restricted and change the instance-id in it [01:37:41] puppet.conf? No, not empty. Looks right. [01:37:46] oh [01:37:46] hm [01:37:52] which instance? [01:38:14] bots-gs [01:38:22] which should maybe just be deleted anyway [01:40:08] ah. I know why [01:40:19] look at its hostname in puppet.conf [01:40:30] there's some package that fucks with the hostname [01:41:02] hm. no petan around.... [01:41:40] I'll just make a note to ask him tomorrow… maybe we can delete it. [01:42:16] * Ryan_Lane nods [01:46:29] I'm going to try to resize this tiny instance [01:46:39] hopefully I don't break it [01:46:53] of course, I have serious doubts its being used. 
/ has been at 100% for ages [01:53:28] bleh. I already can't wait to upgrade to grizzly [01:54:19] damn it [01:54:24] killed that tiny instance [01:58:21] ImageNotFound: Image 28 could not be found. [01:58:23] bleh [01:58:33] annoying [01:58:53] thanks diablo -> essex upgrade for that one [01:59:53] are you getting lots of 'no route to host'? [02:00:30] yep [02:00:36] if you get that it likely needs to be rebooted [02:00:42] it probably OOMd [02:08:09] i-0000018f.pmtpa.wmflabs is dead and is 11.10 [02:09:47] I've emailed the owner [02:12:58] i-000001e5.pmtpa.wmflabs was in some weird state in nova [02:14:00] it wouldn't let me reboot it via the nova command, so I changed its state via the database [02:14:02] then rebooted it [02:16:30] same with i-00000207 [02:17:37] hm… the home-migrate project can die, right? That was you and me moving things around... [02:17:55] yep [02:18:11] anything in the project anyway [02:18:15] deleting the project itself is hard [02:18:22] it may leave lingering resources [02:20:25] hm… speaking of which, this instance seems to belong to a no-longer-existing project. -00000500 [02:20:58] ewwww [02:21:07] which project is it supposed to belong to? [02:21:27] ah. I see [02:21:31] that's problematic [02:21:38] I'll recreate the project [02:22:07] and I'll delete the instance [02:22:17] done [02:22:35] There's another one... [02:22:42] 4fa [02:22:54] same project. [02:24:07] deleted [02:24:32] are you down to 2dd yet? [02:25:33] 2dd? [02:25:37] I don't see it in my list [02:25:42] I'm at 207 [02:26:01] it's possible I deleted it? [02:26:36] not sure… it rejects my key [02:26:57] the list I'm working on is hosts-down-andrew-progress.txt [02:27:05] yeah. I deleted it [02:27:12] it should disappear in a bit [02:27:44] ah. we only have 211 and 21c left [02:28:19] heh. except for the no-root key ones [02:28:24] I'll start on those [02:28:44] yeah, that's most of 'em [02:28:54] I'm working on 21c now [02:29:01] I get no host for i-00000282.pmtpa.wmflabs [02:29:03] must be deleted [02:29:18] yep [02:34:18] New patchset: Tim Landscheidt; "Test that Perl scripts are compilable before packaging." [labs/toollabs] (master) - https://gerrit.wikimedia.org/r/68330 [02:35:31] oops, this one is 11.10, no version of salt available. [02:36:26] yep [02:36:34] I've been emailing folks with 11.10 instances [02:36:41] which one are you on? [02:36:54] i-0000061d.pmtpa.wmflabs doesn't let me in either [02:37:12] rebooting it [02:39:50] i-0000051a isn't in the default security group [02:39:56] there's no way it's ever been used. deleting [02:41:05] rebooted 211 and it seems to be ooming again [02:41:26] heh [02:41:36] I'll email about 211 and 21c [02:41:58] it's ooming as a medium? ouch [02:42:16] wtf is going on on that instance? [02:44:18] ok. taking a break for food [02:44:31] Yeah, I'm going to go in a minute too. I'll catch up in the morning. [02:48:56] we're up to 336 out of 380. I feel like that number on wikitech must be off by a few [03:37:34] New patchset: Tim Landscheidt; "Fix variable declaration and enable warnings in job." [labs/toollabs] (master) - https://gerrit.wikimedia.org/r/68331 [04:08:33] wikidatawiki_p replag is increasing :( [04:35:38] why database dump do I want to try and get a complete mirror of wikipedia working? 
[04:35:45] which* [06:23:54] YuviPanda: ping [06:23:58] ponnng [07:46:03] Change on 12mediawiki a page Wikimedia Labs/Migration of Toolserver tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=710639 edit summary: [+240] /* What do I have to think of when finishing migration? */ after-migration process (draft) [08:05:41] Change on 12mediawiki a page Wikimedia Labs/Migration of Toolserver tools was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=710643 edit summary: [+300] /* What do I have to think of when finishing migration? */ switching the url [09:13:12] petan: around? [09:13:21] hi [09:13:29] hi [09:13:42] so what's the position on having long running processes on tools-labs? [09:13:44] specifically... deamons? [09:14:14] what exactly you mean [09:14:58] do you mean system daemons (like some service which is part of tool labs infrastructure) or some tool? [09:15:46] petan: some tool [09:16:16] that is something what should be easily possible I guess [09:16:20] !log integration added Yuvipanda to the project with su rights. [09:16:21] Logged the message, Master [09:16:26] you just run it as continuous task [09:16:37] petan: ah. [09:18:12] petan: specific thing in this instance is to listen to gerrit's stream constantly and do things [09:22:58] ok [09:29:46] !log integration integration-apache1 updated the proxy rules in /etc/apache2/sites-enabled to use the integration-jenkins2.pmtpa.wmflabs hostname instead of an IP. [09:29:48] Logged the message, Master [09:32:45] YuviPanda: depending on what you want to do, emails might be more reliable :-) [09:32:59] but anyway, jstart is what you're looking for [09:33:17] yeah, I decided I don't want to maintain something of that sort [09:33:24] will just integrate into our production zuul [09:36:26] !log integration upgraded Gerrit on integration-jenkins2 to 2.6-rc0-144-gb1dadd2 which comes from apt.wm.o [09:36:28] Logged the message, Master [09:42:34] where can I see the error log for my php page [09:42:59] on toolserver [09:52:08] Oren_Bochman: For genuine PHP errors, there's php_error.log in the tool's directory. But server errors aren't viewable by users yet. [09:53:30] thamks [10:08:14] at last I figured out the bugs [12:17:30] !log tools petrb: killing process 31186 31185 69 Jun11 pts/32 1-13:14:41 /usr/bin/perl ./bin/catpagelinks.pl ./enwiki/target/main_pages_sort_by_ids.lst ./enwiki/target/pagelinks_main_sort_by_ids.lst because it seems to be a bot running on login server eating too many resources [12:17:32] Logged the message, Master [12:19:53] theo|cloud, carl-cbm [12:20:01] you are maintainers of local-enwp10? [12:20:12] it is running too many cpu expensive jobs on tools-login [12:21:15] !log tools petrb: killing process 31190 sort -T./enwiki/target of user local-enwp10 for same reason as previous one [12:21:16] Logged the message, Master [12:22:19] !log tools petrb: killing process 31187 sort -T./enwiki/target -t of user local-enwp10 for same reason as previous one [12:22:20] Logged the message, Master [12:22:48] petan: Load on tools-login is less than 2 (and has been for fifteen minutes). [12:23:35] that box has 1 cpu, load higher than 1 isn't good for interactive machine [12:23:51] these all processes were eating 99% of cpu all time [12:24:46] and, according to rules, the bots and similar tasks /should/ run on grid, or at least on execution nodes, not on login server... 
it slows the box down and causes various troubles [12:25:34] petan: I just wanted to say that there's no urgency. It would have been totally sufficient to ask them to clean that up. [12:25:57] I did sent them e-mail like 2 hours ago and changed priority to 19 [12:26:05] no response... [12:26:46] Some people have a life (or sleep) apart from Wikipedia, and does changing the priority affect the load? [12:27:31] no it doesn't affect the load, but it makes the processes cause less troubles as they have lower priority than interactive processes which are supposed to run there [12:28:57] this isn't the only bot running on -login, I wouldn't have killed it if it wasn't eating as much resources as it could... [12:30:28] tools-login didn't feel laggy to me. [12:31:53] idk... Coren told me to kill all similar processes on -login (and I am actually not doing it, except for really evil ones) if you disagree with that, why you don't ask Coren to change the rules? he is the boss... I am just a monkey [12:35:17] What rules? http://tools.wmflabs.org/?Rules is still empty. But if you're more comfortable, I'll discuss it with your keeper :-). [12:43:24] !log tools petrb: tools-webserver-01 is running quite expensive python job (currently eating almost 1gb of ram) it may need to be fixed or moved to separate webserver, adding swap to prevent machine die OOM [12:43:25] Logged the message, Master [12:46:39] Didn't Coren set a memory limit on the webservers of, hmmm, 512 MByte or something? [12:46:53] I don't know :/ these things aren't documented [12:47:02] but if he did, he did it wrong [12:47:18] because this is a single process with more than 1000mb of resident ram [12:48:33] Did he only change that for PHP? Hmmm. [12:48:53] idk... anyway that python job just died / ended [13:20:44] a930913: you are maintaining bracketbot, can you please move the bot from tools-login to grid? [13:29:01] whats the naming convention for the different language wikis? [13:29:26] enwiki points to what replica? dewiki points to what replica? [13:30:31] FutureTense: database names. XXwiki = XX.wikipedia.org, XXwikibooks = XX.wikibooks.org, etc [13:31:02] for the replicas? [13:32:28] well, its dbname_p for the public views, if that's what you mean. [13:33:02] yes [13:38:33] When trying to connect to dewiki_p im getting an error: 1049, "Unknown database 'dewiki_p'" [13:38:52] from my application, not the shell [13:39:51] kind of stumped on this [13:41:31] FutureTense try sql dewiki -v [13:41:48] fromt he shell? [13:42:00] yes [13:42:13] it works to me [13:42:19] it connect me to server Connecting to dewiki.labsdb [13:42:24] database dewiki_p [13:42:30] yes, it works for me as well [13:42:42] however my application won't connecto to dewiki_p [13:42:43] I think you connect to wrong host [13:42:54] ah [13:43:01] whats the host for enwiki_p [13:43:18] thats it [13:43:28] thx petan [13:43:31] yw [13:44:36] so all of the databases are xxwiki_p and all of the hsots are xxwiki.labsdb [13:44:37] ? [13:47:38] they should be [13:47:55] if you aren't sure you can always use the sql script with -v or --verbose option [13:48:00] it will tell you what it was resolved to [14:19:50] dewiki.labsdb [14:20:04] works for me :> [14:20:29] oh /me slaps self as he continues reading the above [14:44:46] FutureTense, to be more specific. All the hosts are connected to .labsdb where the database is called _p. [14:45:20] enwiktionary.labsdb host the enwiktionary database. 
[14:45:33] *enwikitionary_p database [14:46:07] wikidatawiki.labsdb hosts the wikidatawiki_p database etc... [14:46:45] From shell, interface, terminal, bash, whatever... sql is the shortcut command to connect to any database. [14:47:03] FutureTense, hope that helps [14:47:19] got it.. [15:05:18] petan or anyone: i heard there is git support on tool labs. is there documentation about it somewhere? [15:06:03] we want to move render tools to a git repository on labs. [15:06:07] petan: Ah yes, I was testing it and forgot to move it over :p [15:09:38] basically we just want to create a git repository on tool labs which we can push into. what's needed for that? scfc_de, addshore, ... anyone? [15:10:45] JohannesK_WMDE: There's no Git server, if you mean that. But you can create git repositories in the file system, of course. [15:11:10] scfc_de: oh. [15:13:10] JohannesK_WMDE: The bigger iron of course is a Gerrit repo. You have to ping demon^ for that. [15:13:27] JohannesK_WMDE: Eh, ^demon. [15:13:32] scfc_de: is there an svn server? [15:13:36] No. [15:13:41] uh [15:13:46] ok... [15:13:55] Why do you need an VCS server on Tool Labs? [15:14:29] well we need some version control. we could just put our stuff on github of course [15:14:46] put stuff in git on gerrit [15:16:00] JohannesK_WMDE: For something close to Wikimedia like RENDER, Gerrit is probably a good idea. [15:16:39] <^demon> scfc_de: I'm more than happy to make new git repos in gerrit for folks. I have a page on mw.org to keep track of them. https://www.mediawiki.org/wiki/Git/New_repositories [15:16:51] <^demon> There's no backlog, so I'll most likely knock it out right away. [15:18:53] ^demon, scfc_de: it looks to me like gerrit is mostly for mediawiki extensions? we just want to have a repo where several users can push to. [15:19:46] I don't think Gerrit allows direct pushing, you're supposed to go through review first... [15:20:04] the majority of repos there are mediawiki extensions but you can still use it for other stuff [15:20:17] uh. okay. [15:20:29] <^demon> Krenair: Depends on the repo. It's mainly extensions we forbid the direct pushing. [15:20:35] <^demon> Of course pushing for review is always encouraged. [15:21:14] <^demon> I'm thinking though...it might be nice to have a "miscellany" repo or somesuch. [15:21:18] <^demon> Allows direct pushing. [15:21:28] <^demon> Just for people to dump one-off things. [15:23:41] Could just make a git repo somewhere on tool labs and use that as a server [15:28:03] Krenair: Yes, if you "abuse" a tool account, you could share a repository quite easily. But with Gerrit, and SourceForge, and GitHub, and ... around -- they usually have made setting up repositories so simple, that it's not worth to bother. [15:28:32] don't have to use a tool account even, could be anywhere on labs [15:28:33] yes [15:28:36] 'git init' [15:28:38] very difficult [15:29:45] But permissions and umasks can be a pain in the ass. [15:30:22] (Though -- obviously -- I have a number of Git repos on Tool Labs -- every "git clone" creates one.) [15:32:14] for me the biggest problem with sharing repositories was always other people leaving uncommitted changes [15:33:25] You have to create bare repos. [15:34:11] yeah, which should be fine if you just want to use it as a 'server' [15:34:51] It's also the only way to push something to a remote repo if the branch you want to update is checked out there. [15:35:18] so, could we just create a git repo on the tool account, and push to it from remote using our user accounts? 
[15:35:24] using ssh [15:35:28] But the nice thing is that you can use Git hooks for example to create a mini-Jenkins. [15:35:33] JohannesK_WMDE: Yes, that should work. [15:36:00] (But Gerrit (or GitHub) will be more convenient and less error-prone.) [15:45:35] for now, we will try the bare repo way, i think. it is easiest, and we are on a schedule :-] [15:45:49] we can still push to a different remote repo later... [16:10:38] So… replag >.> [16:13:27] legotm: yeah, i read about the 12-hour replag on tools. is that true? [16:14:10] yup [16:14:15] at least on wikidata [16:14:20] wiki_p [16:17:56] strange [16:19:08] if it's that bad now, i don't want to imagine the replag half a year from now when more tools have moved :D [16:19:45] mhm :( [16:22:05] scfc_de, who is/was working on replication? [16:22:53] asher/Coren i think [16:28:10] legoktm where you see replag [16:28:22] wikidatawiki_p [16:28:31] MariaDB [wikidatawiki_p]> SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) FROM recentchanges; [16:28:35] ah [16:28:36] 109661.000000 [16:29:02] but maybe there is no RC from that time :P [16:29:05] wikidata is small wiki [16:29:08] :D [16:29:16] you're joking right...? [16:29:21] sort of [16:29:41] wikidata has edits every second so that's not the issue. [16:30:02] MariaDB [dewiki_p]> SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(MAX(rc_timestamp)) FROM recentchanges; [16:30:09] 109712.000000 [16:30:25] a remarkably similar number... ;) [16:30:33] so i guess its s5? [16:31:05] i dunno. on enwiki_p it's 0.000000 [16:31:24] https://noc.wikimedia.org/conf/highlight.php?file=s5.dblist is just dewiki and wikidatawiki [16:32:08] then it seems to be a problem with s5 [16:39:20] legoktm, JohannesK_WMDE: are you guys sure that UNIX_TIMESTAMP can actually understand the format used by rc_timestamp? [16:39:39] DanielK_WMDE: yes, thats the command thats been used on the TS for years [16:39:48] and, don't we have a stored porocedure to solve exactly that problem? [16:40:06] er? [16:40:08] ah, that's the custom conversion function? interesting name :P [16:40:20] idk, i just copied it from https://wiki.toolserver.org/view/Replag [16:41:14] according to http://dev.mysql.com/doc/refman/5.5/en/date-and-time-literals.html the answer should be "yes" [16:41:26] as it allows timestamps to be relaxed, not having delimiters [16:41:49] legoktm: ok, you are right, it does understand it [16:42:03] hey #labs - I need to modify my my.cnf but puppet wants to put it back. this is on a single node mediawiki instance. Is there a good way to make the change stick? [16:51:37] manybubbles disable puppet :> [16:53:16] !puppet [16:53:16] learn: http://docs.puppetlabs.com/learning/ troubleshoot: http://docs.puppetlabs.com/guides/troubleshooting.html [16:53:21] !puppet del [16:53:21] Successfully removed puppet [16:53:31] !puppet is http://scary.hostei.com/ascarywaterorg-/img/scary+puppet1.jpg learn: http://docs.puppetlabs.com/learning/ troubleshoot: http://docs.puppetlabs.com/guides/troubleshooting.html [16:53:31] Key was added [17:03:05] I know labs usually has amazing technology but why is the Wikidata edit counter reporting "Caution: Replication lag is high, changes newer than 1 days, 7 hours may not be shown." [17:04:12] CP678|iPad: we know. s5 is replagged [17:07:59] Is there a page like https://noc.wikimedia.org/dbtree/ for labs? [17:09:36] legoktm: s5? [17:09:49] all the wikis on s5. aka dewiki and wikidatawiki [17:10:04] Ok. [17:10:19] That made me think of toolserver. 
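For the bare-repository route JohannesK_WMDE settles on above, a minimal sketch of the setup; the tool directory and repository name are placeholders, tools-login.wmflabs.org is assumed as the ssh host, and it relies on the maintainers' own accounts being in the tool's group so they can write to its directory:

    # once, on tools-login, inside the tool's project directory:
    git init --bare --shared=group /data/project/render-tests/repo.git
    # then from any maintainer's machine, over ssh with their own account:
    git remote add labs tools-login.wmflabs.org:/data/project/render-tests/repo.git
    git push labs master

The --shared=group flag is there because of the permissions/umask concern raised above; it keeps the repository files group-writable so pushes from different maintainers do not lock each other out.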
[17:10:58] https://noc.wikimedia.org/conf/s5.dblist [17:20:22] petan, how would you feel about deleting bots-gs? I was trying to get puppet working on it last night but it's kind of a mess. [17:20:43] Coren|Away: seems the replag for s5 is incredibly high [17:21:04] andrewbogott I dont think we need puppet there [17:21:17] How does a labs server let that happen? [17:21:33] petan, in theory we need puppet everywhere, to push out updates and such… at the moment its salt is broken. [17:21:34] andrewbogott I would prefer to delete it later if we really need to because that involves installation of grid master servr [17:21:50] petan: OK… mind if I just patch up salt by hand in the meantime? [17:21:56] Ryan_Lane: It clearly is (laggy), but I've no idea why. I expect Asher coudl tell us. [17:21:58] no problem [17:22:13] petan: everything needs puppet [17:22:24] some package is causing the instance to change its hostname [17:22:29] that needs to get fixed [17:22:37] if that happens, the instance can stay [17:22:39] Ryan_Lane did you see how scary is it? [17:22:40] http://scary.hostei.com/ascarywaterorg-/img/scary+puppet1.jpg [17:23:04] I dont know if I want that creepy thing in front of me :/ [17:23:07] (and when I say hostname, I mean the fqdn) [17:23:29] it's changing it from bots-gs.pmtpa.wmflabs to bots-gs.canonical.com [17:23:37] anyway I am fine with having puppet there as long as it doesnt require reinstalling the system from scratch every week :P [17:23:54] in which way it puppet breaking it? [17:35:37] marktraceur, can you comment on the status of orgcharts-dev? It seems to be down, was maybe never up? [17:38:05] Warning: There is 1 user waiting for shell: Orbartal (waiting 0 minutes) [17:39:02] andrewbogott: Yeah, it's not supposed to be [17:39:08] It was replaced by the new instance [17:39:11] You can delete it [17:39:29] cc Ryan_Lane (oh, he's not here) [17:39:34] marktraceur, ok, deleting now. thanks. [17:43:52] ryan_lane, I updated hosts-down-andrew.txt; we're now down to six instances that need your root key magic. [17:46:05] Our counts are still wrong, but the other way now :( ldap reports 351 hosts, wikitech 344 [17:51:39] Warning: There is 1 user waiting for shell: Orbartal (waiting 13 minutes) [17:59:57] andrewbogott: hm. instance count is a little off [18:00:05] andrewbogott: vs what's in LDAP [18:00:15] yep, off by six [18:00:18] 351 in ldap [18:00:34] I wonder if some ldap entries weren't deleted [18:00:49] Or the instances were created while the wiki updater was broken [18:01:05] 340 instances reporting in salt :) [18:01:06] ah. right [18:01:16] Have a good way to isolate those six? It's a whole lot of sed if you start with the page output [18:01:28] I think so, yeah [18:01:28] Oh, you were gone, let me quote myself... 
[18:01:37] I can do a SMW query to dump the list [18:01:44] andrewbogott: ryan_lane, I updated hosts-down-andrew.txt; we're now down to six instances that need your root key magicandrewbogott: Our counts are still wrong, but the other way now :( ldap reports 351 hosts, wikitech 344 [18:01:44] [12:49pm] [18:01:50] whoah, that didn't work very well [18:01:57] heh [18:03:16] I have a python script that will list the instances from SMW [18:03:24] let me compare against allhosts.txt [18:03:39] I should really put these scripts I have somewhere :) [18:09:31] andrewbogott: missing.txt on virt0 [18:10:01] python list-instances.py | sort > allhosts-smw.txt [18:11:12] diff -u allhosts-smw.txt allhosts.txt | grep '+i' | grep -v '@' > missing.txt [18:12:23] hm… Special:NovaInstance doesn't get its lists from smw, so these instances should be visible there [18:12:23] And yet, this one isn't [18:12:44] oh, wait, dumb mistake... [18:12:48] i-000004b1.pmtpa.wmflabs <— as an existing page [18:12:54] i-0000073e.pmtpa.wmflabs <— does not [18:13:23] ^^ deployment-parsoid2 [18:14:06] so, let me think of a way to force wikistatus to generate the page... [18:14:18] * Ryan_Lane nods [18:14:53] https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000073e [18:14:54] weird [18:14:57] shows as a deleted page [18:16:57] ah. must have been when you initially enabled wikistatus [18:17:24] of course, rebooting it would work :) [18:17:42] or suspend/resume [18:17:52] that seems mostly harmless [18:18:27] yeah [18:18:33] suspend/resume sounds good [18:18:43] we should probably make a fake api call [18:18:57] well, "fake" [18:19:03] yeah, or update on a metadata change. That might make it into upstream [18:19:04] where fake is "refresh your docs" [18:19:09] ah. right [18:19:13] that would be easiest [18:19:58] hm, how can I discover which host this is running on? [18:20:05] Or is there a way to suspend w/out knowing that? [18:20:16] I am trying to run puppetd -tv on a labs instance, but get an error: [18:20:19] err: Could not retrieve catalog from remote server: getaddrinfo: Name or service not known [18:20:21] warning: Not using cache on failed catalog [18:20:45] gwicke, when I get that error it's usually because I forgot to 'sudo' [18:20:58] I'm running this as root [18:20:58] andrewbogott: if you know the project you can do a nova show on it [18:21:11] gwicke: which instance? I pushed a change that probably broke puppet on some nodes [18:21:19] Ryan_Lane: varnishtest [18:21:26] I'll fix. one sec [18:22:32] gwicke: done [18:23:13] bah, I can't remember how to do auth when running nova commands directly... [18:23:21] Ryan_Lane: works now, thanks! [18:23:55] andrewbogott: OS_TENANT_NAME= nova [18:24:02] gwicke: yw [18:24:08] seems I broke puppet on a bunch of instances [18:24:12] * Ryan_Lane grumbles [18:25:24] only 38. not too bad [18:33:31] Ryan_Lane: now puppet seems to be broken again for me [18:33:40] err: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed. This is often because the time is out of sync on the server or client [18:34:19] gwicke: ugh. crap [18:34:28] it's because of puppetmaster self [18:34:36] should I restart that? [18:34:41] restart the puppetmaster [18:35:18] it does not seem to start [18:35:41] ok. one sec [18:36:08] it was running [18:36:11] and lost its pid [18:36:51] yep. 
that fixed it [18:37:34] I killed the running service then did a restart [18:37:40] makes sense [18:37:48] strange that it lost the pid [18:38:00] works again now, thanks! [18:38:36] yw [18:46:26] ok. fixed puppet on the broken hosts [18:49:11] Now we have 351 on wikitech and 341 in salt [18:51:27] cool [18:51:30] that's way closer [18:51:49] we have a number of 11.10 instances and some OOM'd ones too [18:52:13] I have emails out about deleting the 11.10 instances [18:52:28] andrewbogott: thanks for the help. I think we've fix this for now :) [18:52:46] yep, pending some users responding. [18:54:43] Ryan_Lane, most likely the next thing I'll work on is refactoring low-hanging puppet manifests into modules. If you have anything that's bothering you more than that just let me know :) [18:54:58] nope. that sounds awesome :) [18:55:49] Kind of -- it's one of those things that gets us no joy until it's 100% finished. No real incremental benefit [18:56:20] yeah, but when it's actually done it's a massive benefit :) [18:59:18] * andrewbogott -> lunch [19:32:24] Warning: There is 1 user waiting for shell: Mpolucha (waiting 0 minutes) [19:45:59] Warning: There is 1 user waiting for shell: Mpolucha (waiting 13 minutes) [19:59:28] Warning: There is 1 user waiting for shell: Mpolucha (waiting 27 minutes) [20:01:03] Re replag on s5: IIRC at least Merlissimo had a query that caused the replication to block (CREATE TABLE or something?). [20:08:49] hey #labs - I have a question about the performance of my labs instances [20:09:20] I've been working on a script to bootstrap my search index and it is slower than I'd like. [20:09:57] I spent a few hours finding the slow bits and made it about 10x faster - but 10x is still way way way too slow for production use. [20:10:17] how much of that might be coming from labs vs coming from me? [20:11:08] manybubbles, what project are you running in? And, is there lots of file i/o in your script? [20:12:08] the search project and yes, it is mostly io [20:13:02] Warning: There is 1 user waiting for shell: Mpolucha (waiting 40 minutes) [20:13:12] Try moving your I/O to a local dir (that is, outside of /home or /data) -- that may make quite a difference. [20:15:58] I'd have to bump the size of my / partition for that. I can build a new machine to try it out though without too much trouble. [20:16:35] It's probably worth it as an experiment. Probably gluster is dragging you down. [20:16:47] I can't pick an instance with enough space on /. [20:17:06] or, rather, it'd be a pretty tight squeeze [20:17:26] Can you run a limited test just to verify that disk i/o is the problem? [20:18:09] If there aren't a lot of things going on in the search project then maybe we can just move you to nfs right now. Ryan_Lane, Coren, would that be possible? [20:19:02] not a lot of things. [20:19:23] I could ask ^demon to try moving his mysql off to /. he has a smaller wiki [20:19:51] <^demon> I thought mine was on /, or did you move it on solr-mw too? [20:20:20] oh [20:20:24] The process requires making everything read-only for a big, and then rebooting all instances. If you can corral everyone to permit that... [20:20:35] mysql lives on /mnt/mysql [20:20:43] ^demon and I are everyone. [20:20:55] would it speed up /mnt or /? 
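To answer the "gluster or instance-local" question in the I/O thread above, the mount table is usually enough; a small sketch, with the expectation (an assumption about these instances) that the shared paths show up as GlusterFS and the instance's own disks as plain ext filesystems:

    df -hT /home /data/project /mnt /tmp
    # "fuse.glusterfs" in the Type column means shared network storage;
    # ext3/ext4 means the instance's local virtual disk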
[20:21:35] I'd prefer /mnt be faster or just having more space on / so I can use it instead [20:22:13] probably would not affect /mnt or / [20:22:16] just /project and /home [20:22:32] Well, depends on if /mnt/mysql is mapped to something [20:22:52] <^demon> !log solr solr-mw: copying mysql data from /mnt/mysql to /a/mysql [20:22:53] Logged the message, Master [20:25:42] hmm - I haven't been putting anything in /project actually [20:26:26] The only interesting question is whether your data is on gluster or instance-local. Should be easy enough to tell. [20:26:32] Warning: There is 1 user waiting for shell: Mpolucha (waiting 54 minutes) [20:27:28] the disks we're using are at /vda and /vdb. I imagine those are instance-local [20:28:16] yeah -- in that case I'm not sure what to say about performance :( [20:28:21] <^demon> Grrr, mysql failed to start. [20:28:24] <^demon> That's nice and specific. [20:28:31] meh - just move it back [20:28:32] <^demon> I hate init scripts :) [20:28:45] I think it won't help any way [20:29:15] So mysql is pulling a whole 3MB/second when it gets up and puffing doing our reindexing. [20:30:42] stupid last job. I keep misspelling apt-get [21:36:33] manybubbles: /mnt is too small? [21:36:42] there's instance types with fairly large /mnt folders [21:56:47] Coren|Away ping [21:59:21] !log tools petrb: replaced logsplitter on both apache servers with far more powerfull c++ version thus saving a lot of resources on both servers [21:59:22] Logged the message, Master [21:59:40] Ryan_Lane I am now using c++ instead of c# just to make u happy :3 [21:59:59] I'd be way happier if you didn't write your own logging daemon ;) [22:00:09] logsplitter isnt logging daemon [22:00:22] it just split logs from apache to multiple files [22:00:41] before we had some version written in shell script which was horribly slow and had to spawn like 5 processes to parse 1 line [22:01:08] ah [22:01:22] what's the purpose of splitting the logs? [22:01:26] btw, Ryan_Lane if you dont like own logging daemons, why dont you slap Tim Starling for making udp logger :P [22:01:40] you mean udp2log? [22:01:43] yes [22:01:52] well, it's very likely going away [22:02:25] so, there's that [22:02:33] each user has access log of their tool in their home, also there is a global access log, and in some cases the log needs to go special place (if tool is missing etc) [22:02:47] heh [22:02:52] that is what a splitter is doing :P [22:03:00] well, in 2.4 that should be less necessary [22:03:04] it basically takes 1 information and distribute it to different places [22:03:14] in different forms [22:03:24] since you can have a virtual host that runs as the user itself, where it logs as the user to a specific place with a logging format [22:03:25] and it is horribly fast now :> [22:03:39] yes we can do that now as well [22:03:43] but we need to log to multiple places [22:03:47] why? [22:03:50] otherwise we never get the global stats [22:04:04] why not have apache write to syslog/ [22:04:06] for whole tools project [22:04:12] idk [22:04:20] maybe because it is complicated to set up? 
:o [22:04:26] it really isn't [22:04:35] I think Coren made this splitter as a temporary workaround [22:04:37] especially if you use syslog-ng [22:04:40] !toolsadmin [22:04:40] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Documentation/Admin [22:04:45] this thing needs to be extended [22:04:57] there is lot of stuff that is not documented reason for this is one of them [22:05:34] heh [22:06:27] how would you simply convince syslog to a) save logs of existing users to their home directory respectively (converting tool -> local-tool -> back to tool) b) save logs to global log folder in a different format (prefix with apache server) c) log nonexistent tools to logA d) log tools with missing home to logB [22:07:00] heh. seriously, let's just use syslog [22:07:00] I think that is a reason why Coren decided to write 5 line shell script instead of messing with syslog [22:07:01] err [22:07:04] logstash [22:07:11] then we can tag the log [22:07:12] *logs [22:07:20] and people can see all of their logs by using the tag [22:07:27] that is a cool future... but I guess until then we need to stick with simple solutions [22:07:46] I have a feeling it'll end up being easier to use and easier to maintain [22:08:20] I would happily use log stash, once I read the documentation... yuvipanda was saying this about syslog as well, that it is easy and fantastic. he is promising for 2 weeks he will set it up and nothing [22:08:33] heh [22:08:43] I guess it is not /that/ easy [22:09:08] easier than writing and maintaing a daemon [22:09:11] in c++ [22:09:20] it took me just 3 hours :P [22:09:33] and I learned that using threads in c++ is far easier than people say [22:09:49] and now you'll need to write a web frontend [22:09:56] and add search indexing [22:10:00] nah I dont mean the logging thing [22:10:06] I mean the logsplitter [22:10:08] ah [22:10:14] right [22:10:42] but I am really curious how easy this logstash will be for tool developers to implement into their programs [22:10:58] you can write to it with syslog [22:11:14] I guess even writing to syslog might be complicated for some of them... 
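For the "just write to syslog" idea being discussed here, the lowest-effort route from a shell job is the stock logger utility; where the message ends up (a local file, syslog-ng, or eventually logstash) is then a matter of the daemon's configuration rather than the tool's code. A sketch with a made-up tag and message:

    logger -t mytool -p user.info "processed 120 pages"
    # on these Ubuntu hosts the default destination is /var/log/syslog,
    # which is normally readable only by root/adm unless the syslog configuration says otherwise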
[22:11:29] we still have people who didnt understand how to use grid [22:11:38] then they can just not have logs [22:11:44] :P [22:11:46] or they can write them directly into their homedirs [22:11:52] and not be able to search them [22:11:56] yes that is what most of them do [22:12:09] some of them run their bots in screen on tools-login :DD [22:12:12] or we can have libraries for people to use [22:12:33] hmm syslog has libraries for many languages [22:12:38] I suppose logstash will be same [22:12:56] well, maybe [22:13:10] in documentation they have interface for many languages [22:13:38] it looks like you can just write log messages directly to the port [22:13:47] nc localhost 3333 < apache_log.1 [22:14:00] I think we could set up multiple logging providers, such as syslog but merge the output of them - like redirect all of them to logstash [22:14:18] so developers would just pick what they like most and all logs would be on 1 place [22:14:51] they would use native syslog function of their language so tool -> syslog -> logstash [22:15:11] result would be same as if they used logstash directly just they wouldnt need to learn anythinh new [22:15:15] hell, they could always just shell out to log :) [22:15:38] (that's actually a horrible idea and I'm kidding) [22:15:51] yes I think scfc_de was saying something like that :P [22:15:56] before [22:16:00] logger writes to syslog [22:16:07] but yeah, that's asking for security issues [22:16:15] when I suggested writing logs to stdout and pipe them out to logeater [22:21:16] Ryan_Lane: I had some trouble pointing logger to the right host/port. It kept writing to the local syslogd, so using the standard libraries is probably safer :-). [22:25:16] scfc_de: :D [22:25:26] yeah. libraries are likely much easier [22:30:18] I'm being an idiot and need some help logging in via ssh. I'm getting "Permission denied (publickey)". I have uploaded the public key. I'm on Linux. [22:31:21] bgwhite: your shell account name is bgwhite? [22:31:31] Yes [22:31:57] heh [22:32:06] does your public key have linebreaks in it? [22:32:22] that will make it not work [22:32:40] I wonder if we have a bug in to strip line endings in uploaded keys [22:32:48] No. I just copy and pasted from the .pub file [22:33:14] yeah, but it may have pasted with line breaks [22:33:33] it definitely looks like it does [22:35:49] I added a bug for us to make sure this doesn't happen [22:36:01] but, yeah, you'll need to reupload your key [22:36:06] make sure there's no line breaks in it [22:36:31] I just uploaded it again and there are no line breaks (cross fingers) [22:40:00] bgwhite: it looks better [22:40:10] give it a sec to sync [22:45:26] Yea, thank you. I've logged in. [22:46:16] \o/ [22:46:23] yw [23:01:32] Ryan_Lane: hello, I want to run Parsoid on a service group in tools [23:01:52] but I think it's not enabled globally [23:02:03] there's a parsoid-spof instance available to all of labs [23:02:03] Warning: There is 1 user waiting for shell: Patrick87 (waiting 0 minutes) [23:02:24] I don't know how to use it [23:02:31] neither do I ;) [23:02:33] gwicke: ^^ [23:02:55] I think gwicke is a little busy [23:03:03] but I hope he helps [23:03:35] Amir1: I don't know much about the puppet / labs integration [23:04:08] gwicke: what about using parsoid-spof in a service group? [23:04:14] I see a misc::parsoid option in the visualeditor project, but am not sure if that is avialable everywhere [23:04:34] Amir1: you can point your browser to http://parsoid.wmflabs.org/ [23:04:58] in localsetting.js? 
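Referring back to the key-upload exchange above: an uploaded OpenSSH public key has to survive as a single line, and either of these checks makes a stray line break obvious before pasting. The filename is the usual default and may differ:

    wc -l ~/.ssh/id_rsa.pub              # expect exactly 1
    ssh-keygen -l -f ~/.ssh/id_rsa.pub   # prints the fingerprint only if the key file parses cleanly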
[23:05:40] what are you trying to do? [23:05:52] you can check our parsing using a plain browser at the above URL [23:07:16] gwicke: I want to enable Visual Editor in my wiki in a service group [23:07:27] I already installed Visual Editor [23:07:35] (extension) [23:07:38] then you want a local Parsoid install [23:08:02] but "Error loading data from server: parsoidserver-http-curl-error: couldn't connect to host. Would you like to retry?" [23:08:21] I installed Parsoid [23:08:21] but it doesn't work [23:08:21] if you don't see Parsoid in the available puppet options in your VM's config page, then Roan might have to add that option to your project (or all of them) [23:08:55] Amir1: you configured both Parsoid and VE? [23:09:05] yes [23:09:16] is parsoid working? [23:09:27] gwicke: no [23:09:45] is it running? [23:10:39] not now but I ran it and it didn't work neither [23:12:19] this is my localsettings.js [23:12:21] http://pastebin.ca/2397993 [23:12:28] is it something wrong in it? [23:12:35] gwicke: ^ [23:13:22] $wgVisualEditorParsoidURL = 'http://localhost:8000'; does not belong in there [23:13:44] Ryan_Lane, petan, Coren|Away, legoktm: Not only is it replaged, but I think replication has completely halted over there. [23:13:46] that belongs in LocalSettings.php [23:13:54] Cyberpower678: Yeah. [23:14:01] gwicke: ok [23:14:05] let me check [23:14:13] Caution: Replication lag is high, changes newer than 1 days, 13 hours, 11 minutes, 7 seconds may not be shown. [23:15:34] Warning: There is 1 user waiting for shell: Patrick87 (waiting 13 minutes) [23:15:38] Cyberpower678: When Merlissimo originally ported some of his tools, there was an issue with "CREATE TABLE" or something similar I believe (are the channel logs somewhere grepable, BTW?). But we probably need Coren or binasher to look at the process list on the labsdb server. [23:28:59] Warning: There is 1 user waiting for shell: Patrick87 (waiting 27 minutes) [23:42:33] Warning: There is 1 user waiting for shell: Patrick87 (waiting 40 minutes) [23:48:12] guys I want to submit this job but it is deleted every time [23:48:13] jsub -N Parsoid -continuous -mem 1024m node /data/project/wikitest-rtl/public_html/w/extensions/Parsoid/js/api/server.js [23:48:21] what's wrong in this [23:48:35] I checked, Parsoid.err is empty [23:48:54] Parsoid.out shows part of running code [23:49:25] S5 replication is going again [23:49:49] petan: [23:49:52] ^ [23:52:32] (thanks to asher) [23:53:35] Ryan_Lane: What was the cause? [23:53:51] I believe some maintenance that needed to be done [23:54:08] s4 and s5 with both need to be restarted at some point in the future as well [23:56:03] Warning: There is 1 user waiting for shell: Patrick87 (waiting 54 minutes)
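The Parsoid job question above is left open in the log; a sketch of how one might see why a -continuous job keeps disappearing, using the exact command quoted there plus the standard gridengine status tools, which are assumed to be available on tools-login:

    jsub -N Parsoid -continuous -mem 1024m node /data/project/wikitest-rtl/public_html/w/extensions/Parsoid/js/api/server.js
    qstat                             # is the job still queued/running, and on which node?
    qacct -j Parsoid | tail -n 20     # for a job that already exited: exit_status and maxvmem
    tail ~/Parsoid.err ~/Parsoid.out  # the files mentioned above

If maxvmem sits at the 1024m cap, the memory limit is a likely reason the job is being killed; and per gwicke's note earlier, $wgVisualEditorParsoidURL belongs in LocalSettings.php and has to point at wherever the Parsoid process actually listens.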