[00:06:04] Hello all.
[00:06:28] *** I got Common.js on beta error-free! ***
[00:08:35] nice
[01:00:23] hi ryan, could you maybe install argparse on stat1 as well (http://pypi.python.org/pypi/argparse)
[01:00:34] gimme a sec
[01:01:09] done
[01:11:16] super!
[02:06:45] \o/
[02:06:48] virt2 is happy again
[02:06:57] with the gluster volume rebalanced, it's doing less IO
[02:07:55] and now we have 2.3 TB of storage for instance storage!
[02:34:53] PROBLEM host: test1 is DOWN address: test1 CRITICAL - Host Unreachable (test1)
[02:46:13] PROBLEM host: labs-mc2 is DOWN address: labs-mc2 CRITICAL - Host Unreachable (labs-mc2)
[03:17:13] PROBLEM host: labs-mc2 is DOWN address: labs-mc2 CRITICAL - Host Unreachable (labs-mc2)
[03:48:03] PROBLEM host: labs-mc2 is DOWN address: labs-mc2 CRITICAL - Host Unreachable (labs-mc2)
[03:59:06] Daniel Kinzler of Wikimedia Germany & the toolserver has now arrived in San Francisco for some discussions tomorrow & Fri re Wikidata
[04:18:03] PROBLEM host: labs-mc2 is DOWN address: labs-mc2 CRITICAL - Host Unreachable (labs-mc2)
[04:38:13] RECOVERY host: labs-mc2 is UP address: labs-mc2 PING OK - Packet loss = 0%, RTA = 1.04 ms
[10:03:57] Err .. how many bots can connect to irc.freenode.net from the servers?
[10:04:12] Can it be that I can't connect another bot because of this?
[11:14:55] There is a limit of X clients from a certain IP, yes
[11:15:39] can't seem to find the number tho
[11:19:33] Beetstra: they tell me it's ~50, but this was some time ago
[11:19:58] but it should give you an error message saying too many connections
[11:20:55] If that's the problem, somebody should request an iline
[11:23:13] Bot does not seem to get a decent error message, but I can't find it anywhere on freenode
[11:23:22] especially not in the channels where it should be
[11:23:44] * Beetstra will try to figure out what message the bot gets ..
[13:11:28] Snowolf:
[13:11:29] LINKREPORTER 000003 3: IRC: Reporter 3 got disconnect call from irc.freenode.net, attempt to reconnect @ 1326978587
[13:11:29] LINKREPORTER 000003 3: returned error: Closing Link: 208.80.153.192 (Too many user connections (global))
[13:11:39] Okay, yep, that's it
[13:11:42] :-)
[13:12:04] .. I told one bot to die, and the one that did not connect pops up immediately ..
[13:24:47] LINKREPORTER 000003 3: IRC: Reporter 3 got disconnect call from irc.freenode.net, attempt to reconnect @ 1326978596
[13:24:47] LINKREPORTER 000003 3: returned error: Closing Link: 208.80.153.192 (Too many user connections (global))
[13:24:53] oops, wrong channel
[14:13:36] Beetstra: you need to get a labs admin and get them to send an email to ilines@freenode.net detailing the IPs, the # of connections and the like
[14:57:24] looks like an LDAP server is dead
[14:57:28] !nagios
[14:57:28] http://nagios.wmflabs.org/nagios3
[14:58:04] !ldap
[14:58:24] mutante: petan: can one of you look at the wmfLabs LDAP?
[14:58:28] looks like virt1 is dead
[14:58:50] of course nagios does not look at it
[14:59:44] !log Labs LDAP seems down
[14:59:44] Labs is not a valid project.
[14:59:47] !log
[15:00:04] ...
[15:00:42] Seems up to me.
[15:02:06] maybe the DNS entry virt1.wikimedia.org was removed
[15:02:38] LDAP is on virt0 and I can bind to it using the proxyagent and my login fine.
[15:02:59] clear
[15:03:04] Jenkins uses virt1.wikimedia.org, which does not exist anymore it seems
[15:03:33] Dunno, try setting it to ldap://virt0.wikimedia.org:389, that's what all the servers use.
[15:05:24] trying :)
[15:08:42] You could stick virt1 in the hosts file pointing to virt0's IP if you need to get into the web interface. Jenkins is a PITA to disable security on.
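Editor's sketch: the Freenode disconnects quoted earlier in this log all carry the same "Closing Link" text, so a bot could recognise the per-IP connection limit from that message instead of reconnecting blindly. The snippet below is only an illustration under that assumption; the function names `parse_closing_link` and `is_connection_limit` are hypothetical and are not part of LINKREPORTER or any bot mentioned here.

```python
import re

# Matches the server text quoted in the log:
#   Closing Link: <host> (<reason>)
# The reason may itself contain parentheses, e.g. "(global)".
CLOSING_LINK = re.compile(r"Closing Link: (?P<host>\S+) \((?P<reason>.*)\)$")


def parse_closing_link(text):
    """Return (host, reason) if text contains a 'Closing Link' notice, else None."""
    match = CLOSING_LINK.search(text)
    if match is None:
        return None
    return match.group("host"), match.group("reason")


def is_connection_limit(text):
    """True when the disconnect reason mentions too many user connections."""
    parsed = parse_closing_link(text)
    return parsed is not None and "too many user connections" in parsed[1].lower()


# Sample taken verbatim from the log above.
line = "returned error: Closing Link: 208.80.153.192 (Too many user connections (global))"
print(parse_closing_link(line))
print(is_connection_limit(line))
```

A bot that sees `is_connection_limit(...)` return true could stop retrying and report the problem, which is the case where an iline request to Freenode is the actual fix.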
[15:25:13] Damianz: works with virt0 thx
[15:25:37] I have just changed jenkins config.xml and reloaded it
[15:29:26] :)
[16:10:23] PROBLEM Free ram is now: WARNING on bots-3 bots-3 output: Warning: 19% free memory
[17:50:23] RECOVERY Free ram is now: OK on bots-3 bots-3 output: OK: 64% free memory
[18:02:13] PROBLEM Current Load is now: WARNING on bots-cb bots-cb output: WARNING - load average: 7.90, 17.72, 9.40
[18:17:13] RECOVERY Current Load is now: OK on bots-cb bots-cb output: OK - load average: 0.28, 1.18, 3.76
[19:59:53] hi
[20:00:09] could someone import the MediaWiki namespace to hu.wikipedia.beta?
[20:07:48] tgr: sure
[20:08:04] thanks
[20:23:58] tgr: done
[20:24:08] :o
[20:24:22] btw if you want anything else, please list it on Global Requests
[20:26:25] petan: where is that?
[20:26:48] http://labs.wikimedia.beta.wmflabs.org/wiki/Global_Requests
[20:28:14] ok
[20:28:35] tgr: I am leaving now, do you need anything else?
[20:28:39] admin?
[20:28:40] etc
[20:28:41] will the pages only be visible later? it seems they are only from A to G
[20:28:56] it's still running now
[20:29:01] ah, ok
[20:29:03] should be finished in a minute
[20:29:14] yes, an admin bit would be useful
[20:29:36] username?
[20:29:39] Tgr
[20:30:05] There is no user named "Tgr"
[20:30:12] one sec
[20:31:16] petan: we have more instance storage now
[20:31:20] twice as much
[20:31:22] cool
[20:31:32] so, feel free to add more space
[20:31:32] can I make a backup project now?
[20:31:36] :)
[20:31:44] yeah
[20:31:47] I want one big instance there
[20:31:50] like 80gb
[20:31:58] it'll be temporary.
when we get snapshots, we probably don't need it anymore
[20:32:01] sure
[20:32:05] it's mostly for db
[20:32:12] yeah
[20:32:28] petan: registration does not seem to be working right now, maybe the import script slows things down
[20:32:28] when we get the databases, we probably won't need it as much either
[20:32:28] I hope :)
[20:32:34] I will go to the requests page once I manage to register
[20:32:38] but we need to find out how to make backups there
[20:32:41] thanks for your help
[20:32:43] + how to recover them
[20:32:49] tgr: weird
[20:32:50] yeah
[20:32:56] tgr: try on meta
[20:33:03] btw what error do you get?
[20:33:26] but yes, import is eating a lot of cpu
[20:33:30] nothing, it just keeps loading when I submit the form
[20:33:42] ah, right maybe meta would be faster then
[20:33:44] it's SUL
[20:34:00] you mean, meta.wikimedia.beta.wmflabs.org?
[20:34:02] yes
[20:34:24] btw pages open for me
[20:34:26] on hu
[20:35:28] Ryan_Lane: it would be really useful if people could ssh there to check load, processes, memory etc :) eventually control backups
[20:35:30] yes, even Special:Userlogin loads, it's just the form submit which hangs
[20:35:42] because it's in squid
[20:35:46] meta is the same
[20:35:47] apache is overloaded now
[20:35:50] heh
[20:35:52] well, I'd prefer that stuff be monitored via ganlia
[20:35:54] *ganglia
[20:35:58] hm... ok
[20:35:59] anyway, I managed to register on hu.wp now
[20:36:05] and I'd prefer any actions controlled via an API
[20:36:09] you know I like top :)
[20:36:18] it won't help much
[20:36:34] hm... probably not
[20:36:40] since there may be a bunch of databases, and you wouldn't know which one is causing issues
[20:36:44] but is it possible to check disk space in ganglia?
[20:36:47] I'd like to be able to see processlist and such
[20:36:51] should be, yes
[20:36:54] ok
[20:37:07] I would be fine if I could check disk space, free memory and load
[20:37:15] also, we'll likely make quotas for databases using xfs filesystem quotas
[20:37:19] ok
[20:37:36] total diskspace would be interesting though :) I want to know how much people use it, heh
[20:37:44] heh
[20:37:55] I want to be able to add that info to labsconsole
[20:38:00] ok
[20:38:35] tgr: sysoped
[20:38:42] tgr: if you need any other flag, let me know
[20:38:56] petan, ok, thanks
[20:39:16] btw import is still running, I will leave for 2 hours now, ok?
[20:39:26] sure
[20:39:28] I hope that's all, you can import pages yourself
[20:53:34] Ryan_Lane: how do I configure squid so that it works when apache is offline
[20:54:03] you can't?
[20:54:07] it doesn't work
[20:54:10] squid is just a proxy
[20:54:17] when I turn off apache, squid tells me it can't connect
[20:54:21] if apache is down, squid will break, if something isn't in the cache
[20:54:33] maybe it isn't caching
[20:54:38] hm
[20:54:43] but I have no idea how to check it
[20:54:49] headers say cache hit
[20:55:03] squid made a 400 MB cache folder
[20:55:09] so I guess something is cached
[20:55:13] PROBLEM Current Load is now: WARNING on bots-cb bots-cb output: WARNING - load average: 1.11, 6.43, 5.67
[20:55:19] oh
[20:55:19] but in apache I see logs on every hit
[20:55:20] hm
[20:55:29] when I refresh a page it's in apache
[20:55:34] I think it should not be
[20:55:34] I'll work on bringing our squid config in soon
[20:55:37] ok
[20:59:43] Ryan_Lane: would it be possible to make information about virt cluster public?
[20:59:51] hw, memory, load
[20:59:53] etc
[21:00:08] yes.
ganglia
[21:00:13] RECOVERY Current Load is now: OK on bots-cb bots-cb output: OK - load average: 0.21, 2.56, 4.18
[21:00:18] I've seen that but it doesn't say that much
[21:00:24] http://ganglia.wikimedia.org/2.2.0/?r=hour&cs=&ce=&m=&s=by+name&c=Virtualization%2520cluster%2520pmtpa&tab=m&vn=
[21:00:28] it mostly only shows load, but some weird one
[21:00:32] I wanted unix load
[21:00:39] http://ganglia.wikimedia.org/2.2.0/?c=Virtualization%20cluster%20pmtpa&h=virt2.pmtpa.wmnet&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[21:00:39] not %
[21:00:43] http://ganglia.wikimedia.org/2.2.0/graph_all_periods.php?h=virt2.pmtpa.wmnet&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1327006829&g=load_report&z=large&c=Virtualization%20cluster%20pmtpa
[21:00:47] 41%, 34%, 22%
[21:00:51] what does it mean?
[21:01:02] where do you see %?
[21:01:10] Current Load Avg (15, 5, 1m):
[21:01:12] 41%, 34%, 22%
[21:01:17] ganglia
[21:01:21] again, look at the last link
[21:01:23] I don't see %
[21:01:32] cool
[21:01:56] is it possible to see usage of ram?
[21:02:08] how many CPUs do we have?
[21:02:18] a ton
[21:02:23] PROBLEM Free ram is now: WARNING on deployment-web deployment-web output: Warning: 9% free memory
[21:02:25] CPUs Total:
[21:02:25] 96
[21:02:27] it shows it in ganglia
[21:02:34] http://ganglia.wikimedia.org/2.2.0/?r=hour&cs=&ce=&m=&s=by+name&c=Virtualization%2520cluster%2520pmtpa&tab=m&vn=
[21:02:46] ah
[21:03:04] 200gb of ram?
[21:03:29] close
[21:03:33] 189
[21:04:04] I'm not totally sure why we are swapping
[21:04:13] since we have cache and free memory on all systems
[21:04:21] usage of disk storage is there?
[21:04:22] maybe the kernel just wants to swap out unused pages
[21:04:29] no
[21:04:31] unfortunately
[21:04:33] ok
[21:04:57] oh.
sweet
[21:05:00] http://ganglia.wikimedia.org/2.2.0/graph.php?r=hour&c=Virtualization+cluster+pmtpa&m=load_one&s=by+name&mc=2&g=network_report&json=1
[21:05:03] load 54 with 96 CPUs is ok
[21:05:06] you can get json from ganglia
[21:05:14] cool
[21:05:24] we could make a bot which would report status to the channel :D
[21:05:38] well, we can use ganglios
[21:05:42] like !status
[21:05:43] :D
[21:05:50] to have nagios alert on ganglia status
[21:06:04] hm, maybe it would be nice to add the virt cluster to the nagios we use on labs
[21:06:10] so that it would be all in one place
[21:06:15] nah
[21:06:18] I know it's on prod nagios now
[21:06:34] I'd prefer to have production in production and labs in labs
[21:06:43] hm, ok
[21:06:59] but the virt cluster is used for labs only, or not?
[21:07:07] maybe not forever
[21:07:10] ah
[21:07:22] we may have a zone for production
[21:10:57] I am wondering how insecure it would be to open an nrpe port to labs net
[21:11:14] is it possible to hack something using nrpe?
[21:12:09] what does nrpe stand for?
[21:12:16] eh...
[21:12:27] I don't remember the abbreviation, it's a client of nagios
[21:12:29] oh, Nagios Remote Plugin Executor
[21:12:32] yes
[21:13:07] Ryan_Lane: main benefit would be that we would have reports in this channel too
[21:13:19] I don't think people here watch nagios in -tech
[21:13:31] nrpe isn't terribly secure
[21:13:41] I don't really know how it works, but ok
[21:14:23] brb
[21:14:25] :o
[21:14:36] hu wiki has mw bigger than en wiki
[21:16:56] Ryan_Lane, could you give an external IP and domain name for mobile-enwp? we're ready to try it with mobile devices
[21:17:13] I thought you guys already had it
[21:17:23] RECOVERY Free ram is now: OK on deployment-web deployment-web output: OK: 25% free memory
[21:18:10] no, that was mobile-feeds
[21:18:39] which project is this?
[21:18:59] mobile
[21:19:16] you have more than one instance in here?
[21:19:36] what's the difference between the two?
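Editor's sketch: the ganglia link pasted at 21:05:00 shows that `graph.php` accepts `json=1`, and the remark about a load of 54 on 96 CPUs is a per-CPU comparison. The snippet below only rebuilds that URL from its query parameters and does the division; it does not fetch anything, because the shape of the JSON response is not shown in this log. The helper names `ganglia_json_url` and `load_per_cpu` are hypothetical.

```python
from urllib.parse import urlencode

# Base URL copied from the link in the log.
GANGLIA_GRAPH = "http://ganglia.wikimedia.org/2.2.0/graph.php"


def ganglia_json_url(cluster, metric, graph, period="hour"):
    """Build a graph.php URL with json=1, using the parameters seen in the log."""
    params = {
        "r": period,      # time range
        "c": cluster,     # cluster name
        "m": metric,      # metric name
        "s": "by name",   # sort order
        "mc": 2,
        "g": graph,       # report name
        "json": 1,        # ask for JSON instead of an image
    }
    return GANGLIA_GRAPH + "?" + urlencode(params)


def load_per_cpu(load, cpus):
    """Unix load average divided by the number of CPUs."""
    return load / cpus


url = ganglia_json_url("Virtualization cluster pmtpa", "load_one", "network_report")
print(url)
print(load_per_cpu(54, 96))
```

A per-CPU value below 1.0 means there are, on average, fewer runnable processes than CPUs, which is why a load of 54 on 96 CPUs was called ok.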
[21:20:51] they're for testing different features
[21:21:25] and have different MW, etc
[21:21:50] what DNS name do you want?
[21:22:34] mobile-geo
[21:22:42] mobile-geo.wmflabs.org?
[21:22:44] ok
[21:24:03] !log mobile upped the floating IP quota to 2
[21:24:05] Logged the message, Master
[21:24:09] !log allocated a new IP
[21:24:09] allocated is not a valid project.
[21:24:14] !log mobile allocated a new IP
[21:24:15] Logged the message, Master
[21:24:22] !log mobile associated new IP with mobile-en
[21:24:23] Logged the message, Master
[21:24:45] !log mobile adding DNS name for newly allocated IP (mobile-geo.wmflabs.org)
[21:24:45] Logged the message, Master
[21:25:04] MaxSem: done
[21:25:21] Ryan_Lane, thank you
[21:25:32] ye
[21:25:34] err
[21:25:34] yw
[21:32:23] ah Ryan!!! :-)
[21:32:54] is there any FQDN to reach the labs LDAP service?
[21:33:06] jenkins denied authentication this afternoon because it pointed at virt1.wikimedia.org
[21:33:18] which was deleted / removed recently.
[21:33:40] I switched it to virt0.wikimedia.org but would prefer to use a service name if that is possible
[21:33:49] * hashar pings Ryan_Lane ^^^
[21:36:17] ah
[21:36:18] heh
[21:36:19] sorry
[21:36:24] it's virt0 right now
[21:36:43] I'd prefer to move all LDAP to separate servers
[21:38:57] that preferably don't have public IPs
[21:39:09] brb
[21:49:14] Dantman: Are you doing anything with the bugtracker project?
[21:49:35] Haven't done anything with anything yet
[21:49:48] Though on that project I don't even have a database to import
[21:50:16] Dantman: And that's to test out other tracker software, right?
[21:50:40] mhmm
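Editor's sketch: the `!log` exchanges in this log suggest the bot reads the first word after `!log` as a project name and rejects the message if that project is unknown ("Labs is not a valid project.", "allocated is not a valid project."). The snippet below mimics only that observable behaviour. It is not the bot's real code: `handle_log_command` is a hypothetical name, the project set contains only the one project the log confirms as valid, and returning nothing for a missing message is an assumption.

```python
# Only "mobile" is confirmed as a valid project by the log; the set is illustrative.
KNOWN_PROJECTS = {"mobile"}


def handle_log_command(line, projects=KNOWN_PROJECTS):
    """Return the reply a '!log <project> <message>' line would get, or None."""
    parts = line.split(maxsplit=2)
    if len(parts) < 2 or parts[0] != "!log":
        return None  # not a !log command, or no project given
    project = parts[1]
    if project not in projects:
        return f"{project} is not a valid project."
    if len(parts) < 3:
        return None  # assumption: a project without a message is ignored
    return "Logged the message, Master"


print(handle_log_command("!log allocated a new IP"))
print(handle_log_command("!log mobile allocated a new IP"))
```

This reproduces why the 21:24:09 attempt failed: with the project name omitted, the first word of the message was taken as the project.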