[03:05:40] RECOVERY Puppet freshness is now: OK on mobile-enwp i-000000ce output: puppet ran at Mon Apr 9 03:05:30 UTC 2012
[03:31:00] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 14% free memory
[03:41:21] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 15% free memory
[03:43:01] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 16% free memory
[03:46:21] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 14% free memory
[03:47:31] PROBLEM Puppet freshness is now: CRITICAL on puppet-lucid i-00000080 output: Puppet has not run in last 20 hours
[03:56:01] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[03:57:21] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 11% free memory
[04:01:01] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 93% free memory
[04:01:21] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory
[04:01:21] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 5% free memory
[04:02:21] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[04:03:01] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 4% free memory
[04:06:21] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory
[04:11:21] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:13:01] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[06:30:03] How do I again get access to phpmyadmin for the bots mysql server?
[06:35:54] RECOVERY Disk Space is now: OK on deployment-transcoding i-00000105 output: DISK OK
[06:41:14] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 15% free memory
[06:51:14] RECOVERY Free ram is now: OK on bots-3 i-000000e5 output: OK: 20% free memory
[07:02:34] RECOVERY Disk Space is now: OK on aggregator1 i-0000010c output: DISK OK
[07:15:14] PROBLEM Free ram is now: WARNING on bots-3 i-000000e5 output: Warning: 17% free memory
[07:21:28] Beetstra: which
[07:21:43] you just need to have an account on sql server
[07:35:21] I have that
[07:35:30] hmm .. bots-3 is complaining ..
[07:35:43] I need the mysql of bots-sql3
[07:35:49] phpmyadmin, I mean
[07:36:27] I was there to set up the tables .. but I forgot to bookmark the address
[07:37:02] bots.wmflabs.org/phpmyadmin
[07:37:40] Cheers, that was it!
[07:39:24] Wow .. that db is groiwing fast
[07:43:04] What is the size of the storage for the sql servers?
[07:46:11] dunno
[08:16:00] One of my dbs is already 8 Gb .. after 3 months
[08:16:18] Maybe I should consider a restructure (hopeing I do not lose search capabilities with that)
[09:15:39] PROBLEM Disk Space is now: WARNING on aggregator1 i-0000010c output: DISK WARNING - free space: / 351 MB (3% inode=93%):
[09:17:18] Beetstra: sql-2 80gb
[09:17:23] sql-3 like 10
[09:17:27] it's smallest
[09:20:39] PROBLEM Disk Space is now: CRITICAL on aggregator1 i-0000010c output: DISK CRITICAL - free space: / 50 MB (0% inode=93%):
[10:03:32] sql-3 has only 10 .. hmmm
[10:03:38] I filled 8.2 ...
[10:24:44] unless nagios complain all is ok
[10:25:10] this isn't production anyway
[10:25:12] Ryan_Lane: hi
[10:25:28] I would like to begin with production project of bots on monday
[10:25:54] start puppetizing stuff, write some manuals etc
[10:26:09] Ryan_Lane: I really need you to look in that request I made
[10:26:23] Give it boobies and 5hour
[10:27:01] meh
[10:27:21] why can't he work 24 * 7 like my server
[10:27:34] You mean he doesn't?
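A back-of-the-envelope estimate of how long sql-3 has left, using the numbers from the exchange above. It assumes that "10" and "8.2" are GB, that the ~8 GB database grew linearly over its 3 months, and that nothing else on the server grows. None of these assumptions is confirmed in the log.

```python
def months_until_full(capacity_gb: float, used_gb: float, growth_gb_per_month: float) -> float:
    """Linear extrapolation: remaining space divided by monthly growth."""
    if growth_gb_per_month <= 0:
        raise ValueError("growth rate must be positive")
    return (capacity_gb - used_gb) / growth_gb_per_month


# Figures from the log (units assumed to be GB):
# sql-3 is "like 10", 8.2 is already filled, one db reached ~8 in 3 months.
rate = 8 / 3                                   # ~2.67 GB per month
remaining = months_until_full(10, 8.2, rate)   # ~0.68 months
print(f"{remaining:.2f} months (~{remaining * 30:.0f} days)")
```

Under these assumptions, the remaining ~1.8 GB lasts about three weeks. That is why the restructure Beetstra mentions matters sooner rather than later.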
[10:27:43] yes
[10:28:29] I just spent the last 7hours configuring a cluster for something going live this afternoon that got traffic raped the end of last week so I'm all for working 24x7 :P
[10:29:57] raped? I don't think I'm properyl understanding you...
[10:30:54] Yes, raped.
[10:31:31] teenage girls + f5 is bad news for servers
[10:55:56] not really
[10:56:07] I have one server, try to hold f5 you will see easter bunny
[10:56:11] :D
[10:56:19] I installed anti hammer there
[10:57:11] * Damianz goes to find a few dozen thousand people to press f5 on petan's site
[10:57:30] http://corz.org/serv/tools/anti-hammer/
[10:57:41] server is fast
[10:57:42] no worries
[10:57:50] I tried to hammer it myself
[10:58:03] worst effect was it got some load, but still was responding quickly
[10:58:17] but hundreds of apache processes heh
[10:58:47] Hundreds isn't a problem....
[10:58:58] When you push a decent spec'd box to a load of 500 you have issues
[10:59:04] this anti hammer make it stop when someone excess the limit
[10:59:07] When you have 4 of them....
[10:59:10] so no other processes are created
[10:59:24] for that user
[10:59:50] It still runs a php instance and from the screenshot, stats the filesystem though?
[11:00:00] I don't know
[11:00:04] but it works
[11:00:10] yes it work w filesystem heh
[11:00:16] it can works in sessions too
[11:00:22] but it's not so effective then
[11:30:55] Platonides: hey you online ?
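The idea petan describes, where a client that exceeds a request limit is refused before any real work is done, can be illustrated with a generic per-client sliding-window limiter. This is not Anti-Hammer's actual code. The class name, limit, and window are hypothetical. State is kept in memory here, not on the filesystem or in sessions as discussed above.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window` seconds (hypothetical)."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        q = self.hits[client_id]
        # Forget requests that have slid out of the time window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject cheaply, do no real work
        q.append(now)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window=1.0, clock=lambda: fake_now[0])
print([limiter.allow("10.0.0.1") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.5
print(limiter.allow("10.0.0.1"))                      # True: the window has passed
```

Rejected requests are not recorded, so a client that keeps hammering is let back in once its earlier accepted requests age out. A stricter policy could count rejected hits as well.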
[12:49:47] PROBLEM Current Load is now: CRITICAL on nagios 127.0.0.1 output: CRITICAL - load average: 2.53, 7.48, 4.25
[12:50:17] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 4.01, 9.29, 5.51
[12:54:47] PROBLEM Current Load is now: WARNING on nagios 127.0.0.1 output: WARNING - load average: 0.24, 3.19, 3.30
[12:55:23] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.24, 3.58, 4.06
[12:59:47] RECOVERY Current Load is now: OK on nagios 127.0.0.1 output: OK - load average: 0.08, 1.36, 2.48
[13:48:08] PROBLEM Puppet freshness is now: CRITICAL on puppet-lucid i-00000080 output: Puppet has not run in last 20 hours
[17:22:33] welcome Faidon!
[17:56:33] <^demon> Ryan_Lane: Good morning :)
[17:56:40] morning
[17:56:51] petan: what request?
[17:57:21] petan: this? https://labsconsole.wikimedia.org/wiki/Requests#Create_a_new_group
[17:57:27] <^demon> Ryan_Lane: Do you think you could take care of the port 29418 forwarding issue this week? I've had more people complain about firewalls.
[17:57:56] sure...
[17:58:02] it's not an easy change you realixe.
[17:58:04] *realize
[17:58:27] <^demon> I'm willing to help if there's anything I can do.
[18:00:01] <^demon> What would be the easiest way to go about it?
[18:00:21] well, I need to give the box another IP
[18:00:26] assign it to the NIC
[18:00:42] and change the SSH daemon to not listen on 0.0.0
[18:00:44] .0
[18:02:56] <^demon> We'd still be able to ssh in to manganese, right?
[18:25:36] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 78 MB (5% inode=53%):
[18:27:17] ^demon: yes
[18:27:32] ^demon: we'd change gerrit to a service IP, rather than a CNA *CNAME
[18:29:01] Why do en wiki admins think that is a good reason to have review access for my bot stuff o.0
[18:36:49] PROBLEM dpkg-check is now: CRITICAL on reportcard1 i-000000a8 output: DPKG CRITICAL dpkg reports broken packages
[18:49:27] PROBLEM host: reportcard1 is DOWN address: i-000000a8 CRITICAL - Host Unreachable (i-000000a8)
[18:52:53] hiyaaaa evweybody!
[18:52:57] long time no bother
[18:53:15] I just rebooted our reportcard1 labs instance, and it is not coming back up
[18:53:37] should I submit a ticket?
[18:54:57] hm
[18:55:00] lemme take a look
[18:55:37] danke!
[18:55:39] We have tickets?
[18:55:44] ha, do we?
[18:55:45] I just blame Ryan
[18:55:47] yes
[18:55:51] or maybe that is just for other admin stuff
[18:55:51] bugzilla
[18:55:56] Wikimedia Labs product
[18:56:07] Labs could be a bug, yes
[18:56:07] :D
[18:56:26] do I need a new account for bugzilla or is it LDAPed?
[18:56:33] new account
[18:56:37] RECOVERY host: reportcard1 is UP address: i-000000a8 PING OK - Packet loss = 0%, RTA = 0.57 ms
[18:56:45] k
[18:56:49] bz totally should use central auth
[18:56:58] hm, in now
[18:57:03] maybe it just took a while to come up?
[18:57:24] ah
[18:57:25] likely
[18:57:27] cool
[18:57:33] welp, thanks!
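Ryan's plan for the port 29418 issue depends on how listening sockets share a port. A daemon bound to the wildcard address 0.0.0.0 claims that port on every address of the box, so nothing else can listen on port 22 on a new service IP. If sshd is bound only to the host's own IP, the service IP's port 22 is free for gerrit. The sketch below shows this with plain Python sockets on Linux. It uses 127.0.0.1 and 127.0.0.2 as hypothetical stand-ins for the host IP and the service IP, and an arbitrary free port in place of 22. It does not touch sshd or gerrit themselves.

```python
import errno
import socket
from typing import Optional

# Hypothetical stand-ins: on Linux all of 127.0.0.0/8 is on loopback,
# so 127.0.0.2 can play the role of the extra "service IP".
HOST_IP = "127.0.0.1"
SERVICE_IP = "127.0.0.2"


def try_listen(addr: str, port: int) -> Optional[socket.socket]:
    """Bind and listen on (addr, port); return None if the port is already taken."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        s.listen()
        return s
    except OSError as exc:
        s.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise


def pick_free_port() -> int:
    """Ask the kernel for an unused port (stands in for port 22)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST_IP, 0))
        return s.getsockname()[1]


def service_ip_can_use_port(sshd_addr: str) -> bool:
    """Simulate 'sshd' listening on sshd_addr, then 'gerrit' on SERVICE_IP, same port."""
    port = pick_free_port()
    sshd = try_listen(sshd_addr, port)
    gerrit = try_listen(SERVICE_IP, port)
    for s in (sshd, gerrit):
        if s is not None:
            s.close()
    return gerrit is not None


print(service_ip_can_use_port("0.0.0.0"))  # expected on Linux: False
print(service_ip_can_use_port(HOST_IP))    # expected on Linux: True
```

On the real host, the equivalent would be setting OpenSSH's `ListenAddress` in `sshd_config` to the host's own IP, and pointing Gerrit's `sshd.listenAddress` at the service IP. The actual addresses are not given in the log.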
[18:57:59] i was worried somethign was wrong, especially since i couldn't get console output on labsconsole
[19:03:28] yargh, i want to scrap this instance
[19:03:39] oops
[19:03:40] wrong chat
[19:13:43] PROBLEM Current Load is now: CRITICAL on reportcard2 i-000001ea output: Connection refused by host
[19:14:23] PROBLEM Current Users is now: CRITICAL on reportcard2 i-000001ea output: Connection refused by host
[19:15:03] PROBLEM Disk Space is now: CRITICAL on reportcard2 i-000001ea output: Connection refused by host
[19:15:43] PROBLEM Free ram is now: CRITICAL on reportcard2 i-000001ea output: Connection refused by host
[19:16:53] PROBLEM Total Processes is now: CRITICAL on reportcard2 i-000001ea output: Connection refused by host
[19:17:33] PROBLEM dpkg-check is now: CRITICAL on reportcard2 i-000001ea output: Connection refused by host
[19:27:40] Ryan_Lane, have a moment before you go to lunch?
[19:41:03] andrewbogott: sure
[19:42:08] Ryan_Lane: I created an account for user 'Shizhao' a couple of days ago, and just got a note that he can't reset his password because he gets a db error.
[19:42:19] Can you investigate, or coach me on how to investigate?
[19:42:44] His note is on my talk page... http://www.mediawiki.org/wiki/User_talk:Andrew_Bogott
[19:42:45] strace -p$(pidof mysqld)
[19:43:26] ah
[19:43:34] I know the problem
[19:43:48] Does it mean that I've been creating accounts incorrectly?
[19:43:48] someone linked him directly to the password reset page
[19:44:00] 'someone' = me
[19:44:04] hm
[19:44:06] does that not work for some reason?
[19:44:06] or I broke something
[19:44:13] with my latest code push
[19:44:20] is this the only user that's complained?
[19:44:38] I think so, yes.
[19:45:02] He had an svn account, so I created his account on the cmdline
[19:45:04] on formey
[19:45:06] yeah
[19:45:09] ah
[19:45:09] right
[19:45:18] then he's hitting the password reset form directly
[19:45:20] which doesn't work
[19:45:27] he needs to click through to the page
[19:45:39] I don't understand why that does work, but linking directly doesn't
[19:45:56] likely a bug that I fixed in core that isn't in 1.18
[19:45:59] weird. OK, click through from where?
[19:46:34] Here is the 'done' message I've been using: http://www.mediawiki.org/wiki/Developer_access/Archive#Discussion_33
[19:46:44] (For folks w/SVN accounts)
[19:47:08] Log in -> (Click log in button on login page) -> Click "Forgotten your login details?" -> Fill in your user name -> click "E-mail new password" button
[19:47:50] ok, I will revise...
[19:48:19] Can I at least link to the login page?
[19:49:21] yeah
[19:49:58] I don't get why linking directly doesn't work
[19:50:01] mediawiki annoys me
[19:51:46] ok, I will advise future users differently. thanks.
[19:52:08] yw
[20:11:07] Ryan_Lane, maybe going directly you don't get a session created?
[20:11:20] could be
[20:11:47] I really hate all of our authn/z code
[20:13:50] Ryan_Lane: *points at the gsoc volunteer*
[20:14:03] Or use your new greek counterpart :D
[20:14:20] the gsoc person is working on supporting the openstack api
[20:14:28] faidon is working on puppet and database replication
[20:14:31] Multitasking!
[20:14:39] both of those are more important than the auth code in mediawiki
[20:14:53] the mediawiki people should give more of a crap about the authn/z code anyway ;)
[20:14:57] * Damianz disagrees...want oauth
[20:15:17] the problem, of course, is that no one likes writing authn/z code.
[20:15:49] not really
[20:16:08] it's simple to make a well-coded authentication against a users table
[20:16:21] it's allowing it to interact which all kind of systems what makes it hard
[20:16:23] our system is way more complex than that, though
[20:16:37] that's not really true
[20:16:57] (I hereby make the obligatory reminder that we haven't decided yet whom to admit to GSoC, and that the acceptances and rejections won't be public till 23 April)
[20:16:58] a properly pluggable system makes that easy enough
[20:17:26] but, our system is fucked up
[20:18:03] especially since parts of our system get rewritten and authplugin calls are just left out
[20:18:12] there's no tests written for authplugin
[20:18:56] also, the code is broken into a bunch of special pages, none of which handle things properly
[20:19:10] and part of the authentication is handled in User.php
[20:19:29] sumanah: Yeah but he wanted it, he can work on it eitherway. STUCKFORLIFE
[20:19:35] Ha!
[20:19:53] ideally all authentication would occur via a plugin
[20:20:02] where the default is what happens in user.php
[20:20:12] Hell we have core features in extensions.
[20:20:15] and we'd be able to have a fallback mechanism
[20:20:19] *cough* mergeuser
[20:20:36] hell, *rename user*
[20:21:29] * Damianz votes Ryan_Lane should be a core dev then we can have a 15th direction of the core code :D
[20:21:58] I am a core dev
[20:22:06] or well, I was in svn, I dunno about in git
[20:22:22] Does anyone know about git :P
[20:22:35] I'm in ops, so technically I am :D
[20:23:00] I've always avoided major core changes, though
[20:23:02] sudo git rewrite :D
[20:23:11] because I don't want to be the maintainer of any core systems
[20:24:07] Ryan_Lane, could you create a project for gerrit replacement dreaming?
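The "ideal" design Ryan outlines in the authn/z discussion has three parts. All authentication goes through plugins, the local users table is simply the default plugin, and there is a fallback mechanism. This can be sketched as an ordered chain of backends. It is a hypothetical illustration, not MediaWiki's AuthPlugin API. The class names, the three-way accept/reject/unknown result, the stand-in external directory, and the PBKDF2 parameters are all assumptions.

```python
import hashlib
import hmac
import os
from typing import Dict, Iterable, Optional, Protocol, Tuple


class AuthBackend(Protocol):
    def authenticate(self, username: str, password: str) -> Optional[bool]:
        """True = accept, False = reject, None = unknown user, try the next backend."""
        ...


class LocalUserTable:
    """Default backend (hypothetical): a users table of salted PBKDF2 hashes."""
    ITERATIONS = 100_000

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[bytes, bytes]] = {}

    def _hash(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.ITERATIONS)

    def add_user(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self.rows[username] = (salt, self._hash(password, salt))

    def authenticate(self, username: str, password: str) -> Optional[bool]:
        row = self.rows.get(username)
        if row is None:
            return None  # not our user: let the chain continue
        salt, stored = row
        return hmac.compare_digest(stored, self._hash(password, salt))


class StaticDirectory:
    """Hypothetical stand-in for an external system such as LDAP."""

    def __init__(self, accounts: Dict[str, str]) -> None:
        self.accounts = accounts

    def authenticate(self, username: str, password: str) -> Optional[bool]:
        if username not in self.accounts:
            return None
        return hmac.compare_digest(self.accounts[username].encode(), password.encode())


def authenticate(backends: Iterable[AuthBackend], username: str, password: str) -> bool:
    """Ask each backend in order; the first definite answer wins."""
    for backend in backends:
        verdict = backend.authenticate(username, password)
        if verdict is not None:
            return verdict
    return False  # no backend knows this user


local = LocalUserTable()
local.add_user("alice", "s3cret")
chain = [StaticDirectory({"bob": "hunter2"}), local]  # external first, local table as fallback
print(authenticate(chain, "bob", "hunter2"))   # True  (external backend)
print(authenticate(chain, "alice", "s3cret"))  # True  (falls back to the local table)
print(authenticate(chain, "alice", "wrong"))   # False
print(authenticate(chain, "carol", "x"))       # False (unknown everywhere)
```

Returning `None` instead of `False` lets a backend say "not my user" so the next one gets a chance. A definite `False` stops the chain, so a later backend cannot override a rejection. Whether rejections should fall through instead is a policy choice the log does not settle.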
[20:24:44] Isn't gerrit replacing dreaming ^demon's remit :D
[20:25:06] Platonides: as I mentioned in the list, I created a gareth project
[20:25:11] I added Daniel Friesen in it
[20:25:30] oh, I didn't see it
[20:25:40] I think it's going to be a long and difficult path to replace gerrit :)
[20:25:47] I had been wanting to catch you in irc for that
[20:25:53] * Ryan_Lane nods
[20:25:57] It is
[20:26:06] Most things worth doing are arduous and difficult :)
[20:26:30] I don't think we'll necessarily do a better job
[20:26:40] We should write it in brainfuck, it would look nicer than java
[20:26:53] if we do an equally good job, it will probably suck less
[20:27:01] at least it should
[20:27:08] don't forget git replication
[20:27:21] git replication?
[20:27:24] making a web interface is supposed to be one of the points we are good at
[20:27:33] * Platonides echoes RoanKattouw question
[20:27:36] RoanKattouw: manganese is replicating to formey right now
[20:27:44] Platonides: have you used wikipedia?
[20:27:48] ;)
[20:27:52] xD
[20:27:55] Have you used Gerrit?
[20:28:01] xDDD
[20:28:02] It's like the definition of bad UX
[20:28:11] I think using gerrit's interface is easier than editing wikipedia
[20:28:29] Yeah because editing wikipedia is so user friendly :D
[20:28:34] * Ryan_Lane puts on his troll hat
[20:28:40] hehe
[20:28:59] I think editing wikipedia is easier than writing a bootloader in asm.
[20:29:03] I wish gerrit was written in python
[20:29:08] ^
[20:30:02] I think editing wikipedia is easier than writing a wiktext parser in asm
[20:30:08] heh
[20:34:52] Ryan_Lane: https://gerrit.wikimedia.org/r/#change,4483
[23:49:04] PROBLEM Puppet freshness is now: CRITICAL on puppet-lucid i-00000080 output: Puppet has not run in last 20 hours