[00:19:02] PROBLEM Free ram is now: WARNING on mobile-feeds i-000000c1 output: Warning: 7% free memory
[01:22:04] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-2 i-0000020a output: Puppet has not run in last 20 hours
[02:10:04] PROBLEM Free ram is now: WARNING on mobile-enwp i-000000ce output: Warning: 19% free memory
[02:20:01] RECOVERY Free ram is now: OK on mobile-enwp i-000000ce output: OK: 24% free memory
[02:21:21] PROBLEM Current Load is now: WARNING on bots-cb i-0000009e output: WARNING - load average: 12.75, 19.24, 8.99
[02:31:21] RECOVERY Current Load is now: OK on bots-cb i-0000009e output: OK - load average: 0.30, 2.86, 4.86
[02:37:57] PROBLEM Total Processes is now: CRITICAL on incubator-bots2 i-00000119 output: PROCS CRITICAL: 851 processes
[03:44:02] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 15% free memory
[03:46:20] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory
[03:51:20] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 14% free memory
[03:56:20] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 16% free memory
[04:01:21] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory
[04:06:20] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 96% free memory
[04:06:20] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 5% free memory
[04:09:00] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory
[04:14:00] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 93% free memory
[04:16:20] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 96% free memory
[04:16:20] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 4% free memory
[04:21:20] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory
[04:21:20] PROBLEM Free ram is now: CRITICAL on test3 i-00000093 output: Critical: 1% free memory
[04:26:20] RECOVERY Free ram is now: OK on test3 i-00000093 output: OK: 96% free memory
[06:44:05] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 6.11, 6.30, 5.44
[06:56:14] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[07:00:14] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[07:24:05] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.03, 2.71, 4.28
[08:55:35] 04/20/2012 - 08:55:35 - Creating a home directory for platonides at /export/home/gareth/platonides
[08:56:37] 04/20/2012 - 08:56:37 - Updating keys for platonides
[10:13:46] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation
[11:22:15] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-2 i-0000020a output: Puppet has not run in last 20 hours
[12:17:08] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 3.91, 5.63, 5.11
[12:27:08] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.54, 3.57, 4.41
[13:25:08] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 5.19, 5.85, 5.16
[14:00:08] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.95, 3.78, 4.59
[14:30:09] petan|wk: can the debug token take out again? its also inserted into javascript
[14:30:21] and breaks pages like http://commons.wikimedia.beta.wmflabs.org/wiki/Special:UploadWizard
[14:32:40] ok
[14:33:48] done
[14:36:10] !log .
[14:36:11] Message missing. Nothing logged.
[14:36:18] !log bots
[14:36:19] Message missing. Nothing logged.
[14:36:24] !log blah blah
[14:36:25] Can't contact LDAP for project list.
[14:36:25] blah is not a valid project.
[14:36:36] petan|wk: any way to flush the cache in squid?
[14:37:04] j^: yes, you need to purge cache for one page or more of them
[14:37:33] ?action=purge does it for 1 page
[14:39:11] thing its needed for all pages, or all javascript cache, i.e. translation strings are broken
[14:39:24] ok
[14:39:29] it's gonna take a while
[14:43:04] j^: the site will be down for 5 minutes, ok?
[14:43:14] ok
[14:45:38] RECOVERY Free ram is now: OK on deployment-squid i-000000dc output: OK: 91% free memory
[14:53:18] ok given that I need to create new filesystem on storage it will be a bit more
[14:53:31] previous one was damaged a lot
[14:53:39] no wonder it was so slow
[14:56:53] thanks petan|wk
[15:02:18] hello - test...
[15:04:40] hi
[15:05:01] j^: it's back
[15:08:01] hm, page loads now but lots of instead of the text
[15:08:30] it may be that I enabled localisation cache
[15:08:42] I made a lot of changes to config
[15:09:03] did you update any code recently?
[15:09:09] there is new script in bin
[15:09:23] did not make any changes
[15:09:40] in the last days
[15:09:47] hm...
[15:10:08] it could be also broken translation?
[15:10:18] can you show me example page
[15:10:54] i.e. http://commons.wikimedia.beta.wmflabs.org/wiki/Special:UploadWizard
[15:11:21] or some parts of http://commons.wikimedia.beta.wmflabs.org/wiki/File:Sheep.ogv
[15:11:24] I found some other pages with problems, I think someone commited a bug
[15:11:51] it happened even before I enabled cache
[15:13:39] http://commons.wikimedia.beta.wmflabs.org/wiki/Main_Page does not load the File embed
[15:15:08] ah linked image is missing
[15:27:50] meh
[15:27:53] it works
[15:27:55] j^: ^
[15:28:06] mutante: how do you rebuild cache on prod?
[15:28:14] you do it for each wiki?
[15:30:18] petan|wk: http://commons.wikimedia.beta.wmflabs.org/wiki/Special:UploadWizard still has placeholders
[15:30:27] where?
[15:30:29] I see it ok
[15:30:37] maybe try to ?action=purge
[15:31:05] oh yes, there are some on top of page
[15:33:38] petan|wk: there is caching at the squid level. and there the answer would be "squid clean" on sq* servers. if you meant that cache
[15:34:25] mutante: no I mean rebuildLocalizationCache.php
[15:34:28] purging one specific URL would be something | php ./purgeList.php --wiki aawiki
[15:34:52] because that thing create cache in php-trunk/cache
[15:34:54] oh..hm.. afraid no idea. i would try -dev
[15:34:58] which is shared for all wikis
[15:35:04] mutante: but you are from operation?
[15:35:13] you guys do run these scripts
[15:35:25] devs only created them :P
[15:35:48] problem is that the script create cache files based on extensions enabled for wiki
[15:35:58] so when I start it for commons it create cache for commons
[15:36:08] but it's a bit broken when I use it for other wiki
[15:36:14] like simple wiki
[15:36:32] when I start rebuild of cache for simple it remove the previous commons cache
[15:36:37] and then commons is broken
[15:36:56] because whole farm is using het and share same files
[15:36:57] petan|wk: enwiki beta is also of interest, for interwiki links to commons
[15:37:13] chrismcmahon: is there a ticket in bugzilla for that?
[15:37:43] petan|wk: not at the moment, but I'll make one
[15:37:46] ok
[15:38:25] petan|wk: I did run your script that did enable interwiki links, but I think it has broken again since then
[15:38:35] petan: still different people do different stuff and i haven't had to do with LocalizationCache before, there's way too much stuff to expect everybody to know everything.. that's why we have ticket pools
[15:39:11] ok, maybe someone else could answer this
[15:39:18] I asked in -operations
[15:40:18] hashar you know how to rebuild localizationCache?
[15:41:20] the production script is in operations/puppet files/misc/l10nupdate
[15:41:31] um
[15:41:32] its l10nupdate-1
[15:41:38] ok, can you get it to labs?
[15:41:48] oh sorry
[15:41:53] I just need to know how does it work
[15:41:53] it is not the LC :D
[15:42:30] I guess you will want to ask nikerabbit or siebrand or raymond in #mediawiki-i18n :-(
[16:33:22] petan: around?
[16:38:19] petan|wk, petan: /export/home/bots/petrb/production/bot2/errors was 4.2G
[16:38:40] I deleted it, but your process is holding open the file descriptor, meaning the space isn't being given back
[16:38:43] Only 4.2? We need more traffic
[16:38:44] please bounce your service
[16:38:54] it's an *error log)
[16:38:56] *
[16:39:00] Ryan_Lane: You so need hadoop though :D
[16:39:10] so that we can save worthless logs?
[16:39:14] Yes :D
[16:43:38] RECOVERY Disk Space is now: OK on labs-nfs1 i-0000005d output: DISK OK
[16:44:47] ok. freed up a little space
[16:44:53] it's basically all log files
[16:45:03] what a waste of disk IO
[16:45:29] You totally havn't seen cluebots log files which are also replicated mostly to irc in multiple places :D
[16:45:37] Maximum bandwidth usage on all levels
[16:48:09] Ryan_Lane: ok
[16:48:33] petan|wk: thanks :)
[16:48:44] petan|wk: I added something special for you!
[16:48:56] I haven't implemented it on the instance side yet, though
[16:49:08] petan|wk: in labsconsole, click on "Manage sudo policies"
[16:49:51] ok I killed the proccess it's still using same space I guess
[16:49:55] is it better?
[16:52:22] hm
[16:52:30] better
[16:52:49] ok. need to get offlie
[16:53:28] mutante: any luck with fixing labs?
[16:57:18] PROBLEM Puppet freshness is now: CRITICAL on nova-production1 i-0000007b output: Puppet has not run in last 20 hours
[17:01:18] PROBLEM Puppet freshness is now: CRITICAL on nova-gsoc1 i-000001de output: Puppet has not run in last 20 hours
[17:16:08] With Ubuntu 12.04LTS being released next week. Are there any plans or time lines for when it will be available on labs yet (and an option for production)?
[17:20:33] I'm sure it'll be on labs in the neare future
[17:20:43] as for production, that's supposed to start later this year
[17:21:00] Though, some hosts might be upgraded much earlier than others
[17:22:44] i suspect they'll appear on labs when there's an image available
[17:25:35] Thanks. It would likely be much easier to set up an openstreetmap tileserver on 12.04 than 10.04, as a number of packages on 10.04 are rather out of date. Which is why I am asking.
[17:36:41] When it's released we can ask Ryan nicely about it
[20:52:20] 04/20/2012 - 20:52:20 - Creating a home directory for faidon at /export/home/nginx/faidon
[20:53:20] 04/20/2012 - 20:53:20 - Updating keys for faidon
[20:58:33] Ryan_Lane: did you add me to the project?
[20:58:35] I can't see it
[21:23:12] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-2 i-0000020a output: Puppet has not run in last 20 hours
[21:33:12] PROBLEM Current Load is now: WARNING on bots-sql3 i-000000b4 output: WARNING - load average: 5.91, 5.77, 5.22
[21:43:12] RECOVERY Current Load is now: OK on bots-sql3 i-000000b4 output: OK - load average: 2.90, 4.09, 4.71
[21:56:35] hmm
[21:57:03] so I'm typing "python fixing_redirects.py -help" into my labs bots-3 instance through SSH
[21:57:07] but it doesn't give any output
[22:11:12] PROBLEM Puppet freshness is now: CRITICAL on deployment-web4 i-00000163 output: Puppet has not run in last 20 hours
[22:26:27] petan, petan|wk: Hey. can you replace the web servers in deployment-prep with instances that have 8GB of memory?
[22:26:39] that's what they have in production, and the new hardware can handle it
[22:30:12] PROBLEM Puppet freshness is now: CRITICAL on deployment-web2 i-00000125 output: Puppet has not run in last 20 hours
[22:50:10] 04/20/2012 - 22:50:10 - Creating a home directory for kaldari at /export/home/bots/kaldari
[22:51:12] 04/20/2012 - 22:51:11 - Updating keys for kaldari
[22:52:16] !log bots added Kaldari to bots
[22:52:16] Can't contact LDAP for project list.
[22:52:16] bots is not a valid project.
[22:52:21] heh
[22:52:23] damn it
[22:53:50] !log bots added Kaldari to bots
[22:53:50] bots is not a valid project.
[22:53:54] seriously?
[22:54:23] ah. cache
[22:55:00] !log bots added Kaldari to bots
[22:55:01] Logged the message, Master
[22:55:30] ok. I need to finish packaging and puppetization of this bot
[22:58:49] paravoid: other than /var/run, where's a good spot to put a directory and file owned by a user that's created by a package?
[22:59:01] the directory needs to survive reboots
[22:59:09] /var/spool?
[22:59:17] /var/lib/
[22:59:28] wait, does the program write to this?
[22:59:29] yes
[22:59:41] which is why I originally used /var/run :)
[22:59:41] right, /var/lib/
[22:59:48] hm
[23:00:03] ah. right
[23:00:15] why the hell is it installing the script in /var/lib?
[23:00:23] that should be /usr/lib
[23:00:24] what does?
[23:00:26] this package is broken
[23:00:29] adminbot
[23:00:38] yes, it shouldn't
[23:00:53] /usr/lib or /usr/share
[23:00:54] wait
[23:01:01] the package isn't even installed here
[23:01:02] wtf
[23:02:37] how was this installed?
[23:03:08] where's the packaging files? :(
[23:03:14] I'm going to have to redo all of this, aren't i?
[23:04:43] \o/
[23:04:49] hyperon has it in his homdir
[23:39:32] PROBLEM dpkg-check is now: CRITICAL on bots-2 i-0000009c output: CHECK_NRPE: Socket timeout after 10 seconds.
[23:44:22] RECOVERY dpkg-check is now: OK on bots-2 i-0000009c output: All packages OK
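
Note on the 16:38 exchange about the deleted 4.2G error log: removing a file that a running process still holds open only removes its name, so the disk space comes back when the process closes the descriptor or exits. The sketch below is an illustration of that behaviour, not anything run in the channel; it assumes a POSIX filesystem, and the function name, file name and size are made up for the example.

```python
import os
import tempfile


def unlink_while_open(size_bytes=1024 * 1024):
    """Delete a file that is still open and report what the OS says about it.

    Assumes POSIX unlink semantics; "errors.log" is a hypothetical stand-in
    for the bot's error log mentioned in the log above.
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "errors.log")

    with open(path, "wb") as log:
        log.write(b"x" * size_bytes)
        log.flush()

        # Equivalent of "rm errors.log": only the directory entry goes away.
        os.unlink(path)

        # The open descriptor still refers to the data, so it is not freed yet.
        st = os.fstat(log.fileno())
        result = {
            "name_exists": os.path.exists(path),
            "size_still_held": st.st_size,
        }
    # Leaving the "with" block closes the descriptor; only now can the
    # filesystem reclaim the space.
    os.rmdir(workdir)
    return result


if __name__ == "__main__":
    print(unlink_while_open())
```

This is why the request at 16:38:44 was to bounce the service: restarting closes the descriptor. On Linux, `lsof +L1` is one way to list files that are deleted but still open.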