[00:01:04] is tools-login.wmflabs.org down?
[00:01:47] OK I can't find the host tools-login.wmflabs.org
[00:02:32] Coren: petan ^
[00:02:49] * Coren checks.
[00:03:23] just went to start an sftp connection and got an error
[00:03:29] Betacommand: Works for me, without delay or issue.
[00:03:36] * Coren checks from elsewhere.
[00:03:42] Coren: what's the IP address?
[00:03:56] it may be a local DNS issue
[00:04:14] Works from my colo in Ohio too. It's 208.80.153.224
[00:05:37] Coren: it's a local DNS issue
[00:05:45] connecting via IP works
[00:07:34] Coren: any plans on getting 45646 fixed?
[00:13:40] Coren: shouldn't I have an access.log in my home dir?
[00:15:44] Hm. It should indeed be relatively simple to do but it has to be done on the dump side. I'll poke Ariel.
[00:16:13] I'm guessing that the symlink can be done by the same process that does the dump in the first place.
[00:17:48] Coren: would make the dump-related tools much easier to maintain
[00:20:24] And yes, you should have an access.log, though I expect it should be in your tool's home and not your own. :-)
[00:20:38] Coren: that's what I meant :P
[00:22:19] ... and you don't? That might happen if it was rm'ed since the webservice got started (the service would still be writing to the unlinked file, not recreating it). Stopping and restarting the daemon will cause it to create a new one.
[00:22:28] ah
[00:23:11] Normally, if you want to just flush your log, the "right" thing to do is to just truncate it. '>access.log' works.
[00:37:35] do lower Page revision IDs ALWAYS mean earlier edits? I.e. is this a guarantee? It seems true in general.
[00:59:08] notconfusing: I think it's true, but not by design.
[00:59:37] notconfusing: I know they are explicitly not guaranteed to be monotonic.
[01:01:50] Coren, thanks.
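Note on the access.log exchange above (00:22 to 00:23): the difference between removing a log and truncating it can be reproduced with a small shell sketch. This is only an illustration under stated assumptions, not how the Tool Labs webservice is actually implemented: file descriptor 3 stands in for the long-running service, it is opened in append mode, and the temporary directory and "request N" lines are made up for the demo.

```shell
#!/bin/sh
# Sketch: an rm'ed log does not come back while the writer keeps running,
# whereas truncating it in place keeps the same file usable.
# Assumption: fd 3 plays the role of the web service; all names are hypothetical.
set -eu
demo=$(mktemp -d)
log="$demo/access.log"

# The "service" opens the log once (append mode) and keeps the descriptor.
exec 3>>"$log"
echo "request 1" >&3

# Case 1: remove the file. The writer still holds the unlinked inode,
# so its output is no longer visible and no new file appears.
rm -f "$log"
echo "request 2" >&3
if [ -e "$log" ]; then after_rm=present; else after_rm=missing; fi

# Only reopening the file (the equivalent of restarting the daemon) recreates it.
exec 3>&-
exec 3>>"$log"
echo "request 3" >&3

# Case 2: truncate in place, as with '>access.log'. Same inode, writer unaffected.
: >"$log"
echo "request 4" >&3
after_truncate=$(cat "$log")
exec 3>&-

echo "after rm: $after_rm"
echo "after truncate: $after_truncate"
rm -rf "$demo"
```

The append mode matters for the second case: a writer that keeps its own file offset instead of appending would continue at the old position after truncation and leave a hole at the start of the file.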
[01:01:59] not shortcuts here
[05:43:34] (PS1) Tim Landscheidt: Add Apple Touch icon for Labs [labs/toollabs] - https://gerrit.wikimedia.org/r/113335
[08:19:44] !log tools tools-login: rm -f /var/log/exim4/paniclog (OOM)
[08:19:47] Logged the message, Master
[08:54:53] (CR) Odder: [C: 1] Add Apple Touch icon for Labs [labs/toollabs] - https://gerrit.wikimedia.org/r/113335 (owner: Tim Landscheidt)
[09:38:53] Change on mediawiki a page Wikimedia Labs/Tool Labs/List of Toolserver Tools was modified, changed by Nemo bis link https://www.mediawiki.org/w/index.php?diff=906266 edit summary: typo
[10:50:31] hi there
[10:50:49] I'm looking for an api route to list all revision ids of an article
[10:53:48] Roux_taff: You'll probably find more expertise on #mediawiki or mediawiki-api-l :-).
[10:53:57] arf indeed sorry :/
[10:56:07] Roux_taff: https://en.wikinews.org/wiki/Special:ApiSandbox#action=query&prop=revisions&format=json&rvprop=ids&rvlimit=500&titles=Main
[12:05:38] petan, Ryan_Lane, andrewbogott_afk: Could you add me to https://wikitech.wikimedia.org/wiki/Nova_Resource:Nagios with rights to create a new instance and root there for https://bugzilla.wikimedia.org/60112? Thanks.
[12:06:52] scfc_de: I don't understand at all what do you need it for? o.o
[12:07:02] that bug is related to production icinga not labs icinga
[12:07:10] and why do you need to create a new instance there?
[12:09:29] IMHO our icinga on labs isn't affected by this, because it should be backported
[12:09:42] so it's way newer than what they have on production
[12:13:47] petan: I need an instance to test the package I built on.
[12:14:18] ok, I don't think that this project is really intended for testing of random icinga instances, but I don't really care...
sec
[12:15:30] @labs-user scfc_de
[12:15:30] That user is not a member of any project
[12:15:34] @labs-user Scfc_de
[12:15:34] That user is not a member of any project
[12:15:39] what is your labs username
[12:17:11] "Tim Landscheidt"
[15:16:09] Coren: are you here? I need some perl guru
[15:16:21] petan: What be up?
[15:16:51] I will send you a source code :P which I just wrote and which doesn't work, basically I need to understand concept of classes in perl...
[15:17:58] http://tools.wmflabs.org/paste/view/a1637168
[15:18:27] Coren: ^ this is the code which crashes on "$hostinfo->{"name"} = $host;"
[15:18:48] $hostinfo is supposed to be an instance of class / package host
[15:18:57] which has attribute $name
[15:19:05] so... why it doesn't work o.O
[15:21:07] Coren: any idea or should I ask someone else? :/
[15:21:19] petan: Because your new functions never actually create or return an object. :-)
[15:21:20] petan: you should try #perl
[15:21:22] ?
[15:21:26] s/should/could/
[15:21:32] YuviPanda: I prefer someone I know
[15:21:36] like Coren :P
[15:21:43] petan: And only declare variables that you never use. :-)
[15:21:50] mhm
[15:22:03] petan: sub new { my $stuff = [ ]; return bless $stuff; }
[15:22:11] so how does a proper construction of a class that just has 2 public attributes I can read / write to look
[15:22:19] petan: Objects are just arrays or hashes that are blessed.
[15:22:41] In your case, you seem to want a hash. So:
[15:22:44] petan: :D
[15:22:58] sub new { my $thing = { }; return bless $thing; }
[15:23:04] if you don't need to initialize anything.
[15:23:25] Otherwise, you can add stuff to your hash before you return it as needed.
[15:23:48] Also: "man perlobj"
[15:24:05] Or possibly "man perlootut"
[15:24:11] ok so for 2 elements it would be sub new { my $a = { }; my $b = { }; return bless $a, $b; }
[15:24:12] ?
[15:24:23] No, that's trying to return two objects.
[15:24:48] ok so I need that new statement twice?
[15:24:51] sub new { my $self = { }; $self->{"a"} = 'foo'; $self->{"b"} = 'bar'; return bless $self; }
[15:24:56] petan: And *always* "use strict; use warnings;" :-).
[15:25:02] scfc_de`: I do
[15:25:09] scfc_de`: that's why I receive so many warnings :D
[15:25:32] The pastebin doesn't say so :-).
[15:25:33] Or, nicer: sub new { my $self = { a=>'foo', b=>'bar' }; return bless $self; }
[15:25:54] Or, nicest: sub new { return bless { a=>'foo', b=>'bar' }; }
[15:26:28] All three of those new subs return identical objects.
[15:26:36] petan: Had a chance to look at https://wikitech.wikimedia.org/wiki/Nova_Resource:Nagios?
[15:28:09] Successfully added Tim Landscheidt to nagios...
[15:28:47] petan: Thanks!
[15:29:36] petan: No "Add new instance" button; I need some more karma :-).
[15:30:18] done now you have to help me finish that perl thing
[15:30:21] XD
[15:30:27] I will soon be stuck somewhere else
[15:30:29] What's the problem?
[15:35:23] scfc_de: http://tools.wmflabs.org/paste/view/26180f4d
[15:35:30] now it crashes on line 78
[15:35:44] Not an ARRAY reference at ./daemon_mail.pl line 78, <$hostfile> line 2.
[15:39:18] scfc_de: some idea?
[15:41:49] scfc_de: are u even here? :P
[15:46:49] I am :-). One moment, please.
[15:49:51] So I created ~/abc{,/conf}, set $path to "/home/scfc/abc" and the script runs without doing anything, i.e. it's not crashing?
[15:50:29] With "use warnings;" it says "Scalar value @values[2] better written as $values[2] at ./test.pl line 76.", "Scalar value @values[0] better written as $values[0] at ./test.pl line 77.", "Scalar value @values[1] better written as $values[1] at ./test.pl line 78." (line 78!) and "Use of uninitialized value $arg1 in string eq at ./test.pl line 130.".
[16:24:05] scfc_de: I already figured how to fix it
[16:25:33] petan: k
[17:35:32] Hi; any memories as to how long Sunday's outage was?
[17:35:50] * twkozlowski trying to finish https://meta.wikimedia.org/wiki/Tech/News/2014/08
[17:36:08] "On February 9, Wikimedia Labs was broken for I-don't-know-how-many hours due to an [[m:w:Network File System|NFS]] problem." is what I have now
[17:37:36] twkozlowski: I'm not entirely sure when it started, but it ended about 20 minutes after I started work ~9 PST
[17:40:07] [15:12:51] the tools webserver seems non-responsive.
[17:40:21] 9:20 PST is 17:20 UTC
[17:40:24] so about two hours
[17:40:54] so thanks Coren :)
[17:48:10] twkozlowski: Also, strictly speaking, it wasn't the NFS service that was broken, but the underlying XFS that was hosed.
[17:48:50] Coren: sure, I'll fix that
[17:49:16] mmm xfs
[18:19:08] hi Yuvi
[18:19:51] hi YuviPanda
[18:20:03] hi veera
[18:20:23] i have made an access request for Nova Tools with help of Arjuna
[18:20:33] ah
[18:20:35] let me look
[18:20:40] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Veera.sj
[18:20:54] can you please approve it, so that I can get some training from Arjuna
[18:20:57] veera: doing
[20:27:32] Something's broken with initial Puppet runs on instance creations. I just aborted a "puppet agent --splay" that had been sitting from 15:30Z to now. And now it's stuck in "Could not request certificate: The certificate retrieved from the master ..."
[20:28:50] And it is still confused about being i-00000906.pmtpa.wmflabs. *Argl*
[20:29:40] Ryan_Lane: Do you have an idea how to debug this? (Or better: To fix this? :-))
[21:54:54] bd808: I have to `sudo su vagrant; bash` to do anything on a labs-vagrant instance. I added this to https://wikitech.wikimedia.org/wiki/Labs-vagrant . Can one configure sudoers so `sudo -u vagrant ` works?
[21:56:46] spagewmf: Configuring sudo would be possible, but how did you end up with things being owned by vagrant? I haven't played with a labs-vagrant instance for a while but that sounds like a regression of some sort.
[21:57:08] Can you clarify "to do anything"?
[21:58:15] i just got SIGTERM on the login server
[21:58:18] bd808: umm, everything in /mnt/vagrant on ee-flow-extra.pmtpa.wmflabs is owner:group vagrant:www-data
[21:58:26] i was testing something there before jsubbing
[21:59:16] notconfusing: kinda helpful if you say which server you mean
[21:59:50] jeremyb, tools-login, is there a proc that kills things that are running on tools-login?
[22:00:05] notconfusing: maybe!
[22:00:13] either 755 or 644, and I'm not in www-data. So git checkouts, adding files to /vagrant/settings.d all require me to be vagrant, or chmod files that I worry puppet/vagrant will reset
[22:00:28] jeremyb can you remind me how i redirect stdout and stderr when jsubbing?
[22:01:28] <45PAAEWZL> notconfusing: it does that automatically (to jobname.err and jobname.out)
[22:01:35] <45PAAEWZL> errr.
[22:01:53] spagewmf: Ok. I'll check an older instance when I get a minute and see if that's always been the case or if it's something new that we should be excluding in a labs deploy.
[22:02:28] 45PAAEWZL, found it, thanks https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Submitting.2C_managing_and_scheduling_jobs_on_the_grid
[22:02:58] I haven't been using/testing mw-vagrant in labs for a while and may have missed some changes that cause problems there
[22:03:36] notconfusing: viafbot?
[22:03:40] spagewmf: Actually would you mind filing a bug about this issue so I don't lose track?
[22:03:45] notconfusing: really local-viafbot
[22:05:20] for whatever reason it names my job "python" so if i run two jobs the filenames would overwrite each other, so i ran it locally, but i will redirect output by instructing jsub to do so, so it's not a problem anymore
[22:05:22] bd808: Will do. the ownership seems fine and matches MW-vagrant, but you don't `vagrant ssh` to a labs instance.
[22:06:37] Yeah. Yuvi's hacks are hacks. :) But we should try and support them if we can.
[22:06:38] notconfusing: anyway, yes there is a killer and you were killed
[22:06:40] Maintainer: Petr Bena
[22:06:47] Description: Terminator daemon A terminator daemon which can watch the system resources and automatically clean unwanted processes to prevent system going out of memory
[22:06:55] Package: terminatord
[22:07:15] jeremyb, ok good to confirm that i did in fact get spanked for a reason
[22:07:24] sorry for being naughty
[22:07:24] hi
[22:07:35] did someone mention me :D
[22:07:36] notconfusing: it was memory
[22:07:49] petan: yes. apt-cache show terminatord mentioned you
[22:08:44] when you get killed by terminator it sends you a mail with explanation
[22:08:53] thanks for being helpful :)
[22:08:58] it kills processes with low priority OR highest memory usage
[22:09:16] you should be able to see why you get killed by typing mail
[22:09:49] i wonder what the mail looks like...
[22:10:49] Your job python (19064 using 1168 kb of memory) on server tools-login was killed, because the system didn't have enough of free operating memory (only 91090944 bytes of memory was remaining in
[22:10:50] +moment) and Your process was one of most resource expensive jobs. This action was done automatically to prevent whole server from dying, and is no way Your fault. System administrators were
[22:10:52] notified about this problem and will resolve the issue soon, also please keep in mind that if this process was a task or a bot, it should have been scheduled on grid instead. I am sorry and I wish you a pleasant day. Your terminator daemon
[22:11:41] 1168kb doesn't sound like too much, but I guess it had low priority... or there were zillions of them
[22:12:46] given that I was getting can't allocate memory failures from system daemons in same time, I guess it really had some memory issues...
[22:13:09] daemon: fork of queue-runner process failed: Cannot allocate memory :o
[22:13:41] this one was for viabot;
[22:13:42] Your job python (20222 using 321872 kb of memory) on server tools-login was killed, because the system didn't have enough of free operating memory (only 84692992 bytes of memory was remaining
[22:46:11] spagewmf: Were you trying to do vagrant things on your labs image without using sudo at all?
[22:47:00] spagewmf: I'm looking at an instance where mw-vagrant hasn't been updated for 5+ months and it has the same file ownership (vagrant:www-data)
[22:47:14] So it's not a recent regression as I feared
[22:47:25] bd808: yes. Obviously I could do `sudo vi ...` but that doesn't seem a good idea for `sudo git checkout xxx`
[22:49:29] Sure. It seems dirty but I'm not sure that `sudo -u vagrant …` seems any cleaner.
[22:50:45] bd808: for old-skool labs instances we chmod g+w and config git for shared groups, but I worry that puppet might change it all back. Maybe sudo su vagrant; bash; is the best approach
[22:52:31] I'm not sure that the puppet bits you are using would change permissions back. It would be worth trying.
[22:53:58] When running inside virtualbox with shared dirs file permission management doesn't work from inside the vm so we might not be trying to enforce permissions on the git checkouts.
[22:59:01] error 400 when loading a tool page
[22:59:39] connection closed after letting me log in
[22:59:47] what exactly is going on?
[23:02:14] I am having same issue as Sveta
[23:02:19] tools-dev:~$ ls
[23:02:19] ls: cannot open directory .: Stale NFS file handle
[23:03:03] :s
[23:03:38] Coren: ^
[23:04:04] sounds like a systemwide issue, let's just be patient, understanding and appreciative of our lovely wmflabs
[23:04:53] Getting 400s
[23:04:56] on tool labs
[23:05:01] likewise
[23:05:10] waiting for someone relevant to read this channel
[23:05:20] Coren, ^
[23:05:25] scfc_de, ^
[23:05:52] greg-g, is looking into it
[23:05:56] Ah, hm.
I hadn't checked tools-dev after the XFS issue earlier this week.
[23:06:07] It just needs a swift kick in the butt.
[23:06:10] i am not on -dev, i am on something else
[23:06:47] * greg-g was mostly looking for someone to look into it ;)
[23:07:05] Oh, FFS, did the XFS break again?!
[23:07:19] heya notconfusing :)
[23:07:26] long time
[23:07:35] greg-g, likewise
[23:08:03] Coren, throw away the tools infrastructure and build a new one.
[23:08:13] That's what we're doing.
[23:08:18] And yes, XFS went boom again.
[23:08:44] ls: cannot access /srv: Input/output error
[23:09:33] Things will return shortly once the server has been kicked.
[23:09:49] Long live ext4
[23:10:26] greg-g, it's a great day, hope life is good besides the server f-up
[23:10:53] That [bleeping] filesystem has been nothing but trouble since day one.
[23:10:53] notconfusing: well, this is the second outage today (nay, third) :/
[23:11:17] greg-g: In all fairness, one of our DCs having a "power event" is not even close to our fault. :-)
[23:11:20] well i need to start writing this paper anyway
[23:11:26] Coren: no, but still us dealing with
[23:11:32] notconfusing: enjoy
[23:14:25] It'll take a bit; those servers are surprisingly long to pass POST
[23:18:37] * Coren doesn't take a chance and runs an xfs_repair
[23:20:17] aa something is bent
[23:24:00] ♬ take us down to eqiad city, where the grass is green and the bits are pretty ...♬
[23:27:26] xfs_repair done. Restarting service.
[23:28:59] Instances should recover gradually from now.
[23:30:03] (That may take a while as the piled up processes wake up gradually)
[23:33:07] load average: 63.79, 225.12, 181.23
[23:33:45] * Coren grumbles.
[23:34:02] Well, it's brittle but at least it's relatively easy to kickstart it back into shape.
[23:36:51] Coren: you mentioned a "power event". did that happen yesterday?
[23:37:11] No, earlier today and that was ulsfo (caching for west coast and APAC)
[23:54:17] !log tools restarting grrrit-wm since it disappeared
[23:54:23] Logged the message, Master