[00:54:40] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[00:58:40] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[01:16:02] paravoid: We're going grab some food. Want to come?
[01:16:14] sure
[01:16:15] where?
[01:16:53] Not sure. You at the office?
[01:17:00] yes
[01:17:19] We'll meet you there
[01:17:33] oh okay
[02:41:00] PROBLEM Free ram is now: WARNING on mobile-enwp i-000000ce output: Warning: 15% free memory
[02:41:40] PROBLEM Free ram is now: WARNING on mobile-feeds i-000000c1 output: Warning: 7% free memory
[03:01:44] PROBLEM Free ram is now: CRITICAL on mobile-feeds i-000000c1 output: Critical: 5% free memory
[03:04:00] RECOVERY Current Load is now: OK on wikidata-dev-1 i-0000020c output: OK - load average: 1.29, 1.45, 0.63
[03:06:44] RECOVERY Current Users is now: OK on wikidata-dev-1 i-0000020c output: USERS OK - 0 users currently logged in
[03:06:44] RECOVERY Disk Space is now: OK on wikidata-dev-1 i-0000020c output: DISK OK
[03:06:44] RECOVERY Free ram is now: OK on wikidata-dev-1 i-0000020c output: OK: 86% free memory
[03:07:20] RECOVERY Total Processes is now: OK on wikidata-dev-1 i-0000020c output: PROCS OK: 90 processes
[03:07:30] RECOVERY dpkg-check is now: OK on wikidata-dev-1 i-0000020c output: All packages OK
[03:11:10] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 4.24, 5.76, 5.17
[03:16:10] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.67, 4.45, 4.80
[03:16:50] PROBLEM Disk Space is now: CRITICAL on deployment-transcoding i-00000105 output: DISK CRITICAL - free space: / 38 MB (2% inode=53%):
[03:21:50] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 60 MB (4% inode=53%):
[03:46:30] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 16% free memory
[03:51:30] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 17% free memory
[03:51:30] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory
[03:55:14] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 13% free memory
[04:06:24] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:11:24] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:11:24] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory
[04:11:24] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:15:14] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 5% free memory
[04:16:24] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[04:16:24] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:16:24] PROBLEM Free ram is now: WARNING on test3 i-00000093 output: Warning: 7% free memory
[04:21:24] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[04:25:14] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 93% free memory
[05:20:34] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-2 i-0000020a output: Puppet has not run in last 20 hours
[05:30:53] nagios PROBLEM Disk Space is now: WARNING on nagios 127.0.0.1 output: DISK WARNING - free space: /home/dzahn 2242 MB (13% inode=82%): /home/petrb 2242 MB (13% inode=82%): /home/laner 2242 MB (13% inode=82%):
[06:44:15] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 5.58, 6.32, 5.39
[06:47:25] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK
[07:09:21] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.08, 3.84, 4.66
[08:03:18] hello
[08:52:01] PROBLEM Disk Space is now: CRITICAL on deployment-transcoding i-00000105 output: DISK CRITICAL - free space: / 34 MB (2% inode=53%):
[08:57:01] PROBLEM Disk Space is now: WARNING on deployment-transcoding i-00000105 output: DISK WARNING - free space: / 61 MB (4% inode=53%):
[10:55:31] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[10:59:31] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[12:02:00] hi
[12:09:38] !log
[12:23:29] New patchset: Dzahn; "test multiple cronjob creation in labs where i need it anyways" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5315
[12:23:41] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (test); V: -1 - https://gerrit.wikimedia.org/r/5315
[12:24:50] New patchset: Dzahn; "test multiple cronjob creation in labs where i need it anyways" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5315
[12:25:04] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/5315
[12:25:54] New review: Dzahn; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5315
[12:25:57] Change merged: Dzahn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5315
[12:28:09] https://bugzilla.wikimedia.org/show_bug.cgi?id=36090
[12:28:13] DamianZ, mutante
[12:28:20] if you have an idea how to solve it do that
[12:28:41] the bot is configured by Ryan and in fact I don't know where it is
[12:28:50] bots-labs is where it should live
[12:28:57] Bit busy to look for it atm though
[12:30:16] petan|wk: is it broken?
[12:30:24] or just "unstable" , what means stable instance
[12:30:53] you mean like the distro version?
[12:34:00] mutante: bots-2 is most loaded instance in bots project
[12:34:07] it's so loaded that log bot is down most of time
[12:34:13] !log blah blah
[12:34:15] you see?
[12:34:32] only bots which can recover from OOM can live there
[12:34:46] it's not for weak bots :P
[12:35:21] if oom, fork!
[12:35:37] that's how other bots work I think
[12:35:54] :)
[12:36:01] i see what you mean but no idea why its so loaded
[12:36:22] just restart it
[12:36:22] I don't know it either and I don't really care I just want to move it to another instance so that it works
[12:36:24] that'll sort it out
[12:36:39] Reedy: this issue is happening for months, dozen of restarts happened in past
[12:36:55] I just gave up on waiting for someone to fix it, so I created ticket
[12:37:19] add moar ram
[12:37:28] good idea
[12:37:58] that bot itself doesn't need much ram, so moving it to another instance is surely easiest + we can't add moar ram to existing instance
[12:38:47] http://cdn.memegenerator.net/instances/400x/18959123.jpg
[12:38:48] just create another instance?
[12:39:01] you are in bots project, right
[12:39:23] mutante: I don't have problem allocating the resources, I have problem locating where bot lives and moving it
[12:39:33] I mean if I create new instance who move the bot? :-)
[12:39:37] it's Ryan's bot
[12:39:40] he knows where it is
[12:39:44] he knows how to move it
[12:40:14] we already have instance bots-labs which is idling and waiting for this bot
[12:40:23] but someone need to move it there
[12:40:27] thats true, just saying we are at the same level then
[12:40:35] that's why I created the ticket
[12:40:43] which is assigned to Ryan now
[12:40:48] i dont know about the bot, and either there are public docs or we'll have to ask Ryan to make some
[12:40:51] I just hoped someone knows more than I do
[12:41:02] Ryan does
[12:41:10] !Ryan
[12:41:10] man of all answers ever (but there are others :))
[12:41:20] but im sure you can feel free to also just do it
[12:41:25] problem is that he's not around when I am
[12:41:27] since its broken now
[12:41:36] and you have access to the instance
[12:41:47] mutante: I'd love to do it if only I knew where the bot lives
[12:42:21] I don't even know how to properly move it
[12:42:26] it's packaged to .deb package
[12:42:26] search the file system?
[12:42:32] apt-get remove logmsgbot
[12:42:33] it's installed using apt
[12:42:45] ok that's way how to remove existing one
[12:42:55] but I need to install it to another server too :)
[12:42:58] i can look at it later, just give me a while to finish what i was at
[12:43:06] np
[12:43:21] if its just a debian package
[12:43:27] apt-get install :)
[12:43:30] yes it is .deb package no one has
[12:43:45] it's just installed package which doesn't exist on fs
[12:43:51] maybe it's somewhere on Ryan's pc
[12:44:07] then the package is most likely installed via a puppet class you'd have to apply to the instance via labsconsole
[12:44:17] I didn't find any
[12:44:34] I think he didn't puppetize this so far
[12:44:43] also it require other stuff, like having bot password
[12:44:55] it needs to login to console so that it can edit wiki
[12:45:20] hmm,ok, maybe this _is_ better to ask him about first
[12:45:36] i'll take a look later..bbiaw
[12:48:19] New patchset: Dzahn; "cronjobs - fix project name in command" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5318
[12:48:32] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/5318
[12:48:58] New review: Dzahn; "(no comment)" [operations/puppet] (test); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5318
[12:49:01] Change merged: Dzahn; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/5318
[13:17:58] hashar: if u are around https://bugzilla.wikimedia.org/show_bug.cgi?id=36011
[13:18:10] you said that you could help with stuff on deployment site
[13:18:39] yup
[13:18:55] that is probably going to keep me busy during May ;-D
[13:19:00] ok
[13:20:44] petan|wk: while you are around, is there any document about the WMF Beta project ?
[13:21:45] yes
[13:22:08] http://deployment.wikimedia.beta.wmflabs.org/wiki/Help
[13:22:31] http://deployment.wikimedia.beta.wmflabs.org/wiki/HetDeploy
[13:22:40] !sal
[13:22:40] https://labsconsole.wikimedia.org/wiki/Server_Admin_Log see it and you will know all you need
[13:23:15] now we have 6 apaches
[13:23:27] other than that it's same
[13:23:53] \o/
[13:25:50] petan|wk: about memcached, seems you can enable debugging with $wgMemCachedDebug = true
[13:26:01] according to doc, that will send message to $wgDebugLogFile
[13:26:27] that file is pretty much unreadable given to number of messages
[13:26:46] ohhh
[13:26:49] we don't have such a traffic like production but we still have some :)
[13:26:52] so we need to have that split somewhere else :D
[13:27:25] but I could try to grep some errors from there if I knew how they look
[13:27:39] but I will try it
[13:28:15] there is another debug function which accept a prefix
[13:28:37] then any debug message using that prefix can be written in a specific file
[13:29:07] ahhh wfDebugLog()
[13:32:47] ok tail -f errors.log
[13:32:51] I get a load of text
[13:32:58] :D
[13:34:13] memcached: MemCache: sock i:0; got swwiki:sidebar:sw
[13:34:14] memcached: get(swwiki:resourceloader:filter:minify-css:7:d5a1bf6cbd05fc6cc2705e47f52062dc)
[13:34:25] is that what we need?
[13:37:32] it would be really cool if mediawiki created a token for each run of php code and added it to source code like