[00:07:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [00:37:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [01:07:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [01:19:15] PROBLEM Current Load is now: WARNING on bots-3 bots-3 output: WARNING - load average: 6.34, 8.38, 5.51 [01:27:05] PROBLEM Current Load is now: WARNING on dumps-nfs1 dumps-nfs1 output: WARNING - load average: 5.32, 6.75, 5.34 [01:29:15] RECOVERY Current Load is now: OK on bots-3 bots-3 output: OK - load average: 2.10, 3.47, 4.32 [01:31:44] huh [01:31:47] my crontab got deleted? [01:32:05] RECOVERY Current Load is now: OK on dumps-nfs1 dumps-nfs1 output: OK - load average: 4.41, 4.66, 4.77 [01:32:18] is there a way to restore it? [01:37:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [02:06:51] petan: whats up [02:07:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [02:37:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [03:01:55] PROBLEM Free ram is now: CRITICAL on puppet-lucid puppet-lucid output: Critical: 3% free memory [03:07:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [03:26:55] RECOVERY Free ram is now: OK on puppet-lucid puppet-lucid output: OK: 20% free memory [03:37:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [03:42:05] PROBLEM Current Load is now: WARNING on dumps-nfs1 dumps-nfs1 output: WARNING - load average: 4.17, 5.37, 5.11 [03:57:05] RECOVERY Current Load is now: OK on dumps-nfs1 dumps-nfs1 output: OK - load average: 3.83, 4.32, 4.75 [04:07:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [04:10:55] PROBLEM Free ram is now: WARNING on bots-3 bots-3 output: Warning: 12% free memory [04:15:55] RECOVERY Free ram is now: OK on bots-3 bots-3 output: OK: 58% free memory [04:37:25] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [05:07:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [05:37:28] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [05:48:38] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [05:56:58] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [06:02:13] PROBLEM Current Load is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CHECK_NRPE: Socket timeout after 10 seconds. [06:02:13] PROBLEM Disk Space is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CHECK_NRPE: Socket timeout after 10 seconds. [06:05:08] PROBLEM Current Users is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CHECK_NRPE: Socket timeout after 10 seconds. [06:05:08] PROBLEM Free ram is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CHECK_NRPE: Socket timeout after 10 seconds. [06:05:08] PROBLEM Total Processes is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CHECK_NRPE: Socket timeout after 10 seconds. [06:05:13] PROBLEM dpkg-check is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CHECK_NRPE: Socket timeout after 10 seconds. 
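An aside on the deleted-crontab question above: a minimal recovery sketch, assuming an Ubuntu host where user crontabs live under /var/spool/cron/crontabs and cron logs to /var/log/syslog (both paths are assumptions, not confirmed anywhere in this log).

    # Check whether the crontab file itself is simply missing:
    ls -l /var/spool/cron/crontabs/

    # cron records every job it runs, so the old schedule can usually be
    # reconstructed from syslog even after "crontab -r":
    grep CRON /var/log/syslog /var/log/syslog.1 | grep "$USER" | less

    # Reinstall the reconstructed table once it has been retyped:
    crontab /tmp/reconstructed-crontab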
[06:06:38] PROBLEM SSH is now: CRITICAL on driver-dev-jumbo driver-dev-jumbo output: CRITICAL - Socket timeout after 10 seconds [06:06:38] PROBLEM Current Users is now: CRITICAL on dumpster01 dumpster01 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:06:38] PROBLEM Disk Space is now: CRITICAL on dumpster01 dumpster01 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:07:28] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [06:08:28] PROBLEM Current Load is now: CRITICAL on dumpster01 dumpster01 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:08:28] PROBLEM Free ram is now: CRITICAL on dumpster01 dumpster01 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:08:28] PROBLEM Total Processes is now: CRITICAL on dumpster01 dumpster01 output: CHECK_NRPE: Socket timeout after 10 seconds. [06:10:28] PROBLEM Current Load is now: WARNING on dumps-nfs2 dumps-nfs2 output: WARNING - load average: 3.15, 6.37, 5.28 [06:11:28] RECOVERY SSH is now: OK on driver-dev-jumbo driver-dev-jumbo output: SSH OK - OpenSSH_5.8p1 Debian-7ubuntu1 (protocol 2.0) [06:11:28] RECOVERY Current Users is now: OK on dumpster01 dumpster01 output: USERS OK - 0 users currently logged in [06:11:28] RECOVERY Disk Space is now: OK on dumpster01 dumpster01 output: DISK OK [06:11:58] PROBLEM Current Load is now: WARNING on driver-dev-jumbo driver-dev-jumbo output: WARNING - load average: 4.73, 11.94, 7.93 [06:11:58] RECOVERY Disk Space is now: OK on driver-dev-jumbo driver-dev-jumbo output: DISK OK [06:13:18] RECOVERY Current Load is now: OK on dumpster01 dumpster01 output: OK - load average: 1.13, 6.27, 4.20 [06:13:18] RECOVERY Free ram is now: OK on dumpster01 dumpster01 output: OK: 83% free memory [06:13:18] RECOVERY Total Processes is now: OK on dumpster01 dumpster01 output: PROCS OK: 77 processes [06:14:58] RECOVERY Current Users is now: OK on driver-dev-jumbo driver-dev-jumbo output: USERS OK - 0 users currently logged in [06:14:58] RECOVERY Free ram is now: OK on driver-dev-jumbo driver-dev-jumbo output: OK: 80% free memory [06:14:58] RECOVERY Total Processes is now: OK on driver-dev-jumbo driver-dev-jumbo output: PROCS OK: 161 processes [06:15:03] RECOVERY dpkg-check is now: OK on driver-dev-jumbo driver-dev-jumbo output: All packages OK [06:15:28] RECOVERY Current Load is now: OK on dumps-nfs2 dumps-nfs2 output: OK - load average: 4.54, 3.92, 4.42 [06:19:28] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [06:21:58] RECOVERY Current Load is now: OK on driver-dev-jumbo driver-dev-jumbo output: OK - load average: 0.00, 1.63, 4.18 [06:27:57] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [06:37:37] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [06:49:42] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [06:58:11] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [07:07:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [07:10:31] PROBLEM Current Load is now: WARNING on bots-sql3 bots-sql3 output: WARNING - load average: 4.25, 5.73, 5.04 [07:12:37] @new [07:20:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [07:23:55] I've kind of screwed things up... 
[07:28:11] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [07:35:21] PROBLEM Current Load is now: WARNING on dumps-nfs1 dumps-nfs1 output: WARNING - load average: 13.30, 10.22, 5.66 [07:37:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [07:40:31] RECOVERY Current Load is now: OK on bots-sql3 bots-sql3 output: OK - load average: 1.72, 2.86, 4.49 [07:50:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [07:53:31] PROBLEM Current Load is now: WARNING on bots-sql3 bots-sql3 output: WARNING - load average: 7.32, 6.84, 5.90 [07:58:21] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [08:07:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [08:14:02] !account [08:14:02] in order to get an access to labs, please type !account-questions and ask Ryan, or someone who is in charge of creating account on labs [08:14:07] !account-questio [08:14:08] !account-question [08:14:10] !account-question [08:14:11] !account-questions [08:14:11] I need the following info from you: 1. Your preferred wiki user name. This will also be your git username, so if you'd prefer this to be your real name, then provide your real name. 2. Your preferred email address. 3. Your SVN account name, or your preferred shell account name, if you do not have SVN access. [08:14:14] morning [08:14:25] Ryan_Lane: hi [08:14:29] morning. I believe I've broken things fairly badly [08:14:37] Ryan_Lane: I sent you a mail now [08:14:53] you did? [08:14:56] I don't see it [08:15:07] should arrive in a minute [08:15:23] ah ok [08:15:28] so... [08:15:30] we decide to make a web huggle and we want to host it on labs [08:15:40] so I need to create a new project and accounts for 2 devs [08:15:44] it's likely that any instance that reboots will die [08:15:48] oh [08:16:06] I'm working on fixing that, but I'm not totally sure how I'm going to do it [08:17:02] what you did? [08:17:15] I mean why it happens now [08:17:59] I screwed up the _base directory on the instances share [08:18:13] oh [08:18:16] THO|Cloud: hi [08:18:21] I'm recovering the files, but now the directory is fucked up [08:18:26] I can't re-create it [08:18:45] that's the folder where instance storage is? [08:18:56] well…. no [08:19:08] /var/lib/nova/instances is [08:19:14] _base is under that directory [08:19:19] ah [08:19:24] which data are in that [08:19:34] * is [08:19:38] when an instance is created, nova pulls an image from glance [08:19:43] it sticks it there [08:19:53] so it's the image of vm? [08:20:02] kind of [08:20:30] right, which fs it's using? 
most of unix fs should have not break since the fd is open [08:20:35] there's a cow2 image in /var/lib/nova/instances//disk.local [08:20:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [08:20:45] problem is when instance die [08:20:54] but, it's based on the stuff in base [08:20:57] same with snapshots [08:21:08] so, I'm restoring _base from the fds [08:21:13] ok [08:21:17] but, I can't write it back into _base [08:21:19] does it affect the current instance data [08:21:30] because all of the instances are holding open files in the old _base [08:21:35] you likely need to shutdown the instances now [08:21:39] it shouldn't, no [08:21:59] those write into /var/lib/nova/instances/ [08:22:05] ah [08:22:11] the data in _base likely shouldn't change [08:22:19] I don't know that that is actually true, though [08:22:25] I need to ask the openstack people how that works [08:22:32] can you make a backup before putting it back [08:22:40] if I shut down the instances, it's possible they may not come back up [08:22:48] I think it's possible to recover the file just using the fd [08:23:01] yeah, that's what I'm doing [08:23:06] right [08:23:09] as long as the fd is there, I can just copy it [08:23:11] in that case instances must not die [08:23:15] indeed [08:23:28] I have a copy going based on lsof [08:23:35] ok [08:23:46] sounds cool [08:24:03] I wasn't really about to reboot stuff :) [08:24:04] I don't know how to handle the directory problem, though [08:24:21] hm, yes [08:24:24] creating instances is likely broken right now [08:24:36] you don't have a backup of that [08:24:44] oh what? [08:24:53] we don't keep backups of instances [08:24:53] to check how the directory was looking [08:24:57] ok [08:25:17] instances aren't intended to have data that needs to be backed up, for the most part [08:25:26] true [08:25:28] bots is an obvious exception [08:25:56] maybe the project storage could have a backup though [08:26:33] well, eventually it's supposed to go on the gluster shared storage [08:26:37] that isn't instance storage [08:26:50] ok [08:27:37] oh well. I'm off to bed [08:27:43] I'm going to finish fixing this tomorrow [08:27:47] ok [08:27:54] so no new instances today [08:28:13] actually I have one instance which can be removed so I can try to reboot it [08:28:19] we will see if it die [08:28:21] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [08:28:42] let's try it [08:30:21] RECOVERY Current Load is now: OK on dumps-nfs1 dumps-nfs1 output: OK - load average: 2.60, 2.75, 4.12 [08:34:02] sure [08:34:16] it's very, very likely to die [08:34:44] unless nova doesn't kill the process and recreate it [08:35:07] if it does it will definitely die, because it can't access the _base directory [08:36:41] PROBLEM host: turnkey-1 is DOWN address: turnkey-1 CRITICAL - Host Unreachable (turnkey-1) [08:37:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [08:40:13] yes [08:40:16] it's gone [08:45:07] bah. I've been copying the same file over and over. heh [08:45:12] stupid incorrect script [08:50:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [08:53:08] so, the _base directory holds a qcow2 copy of the image the instance is running. all changes to the qcow2 image are added to the filesystem in the instance's directory. 
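A hedged sketch of the fd-based recovery Ryan describes above: find the processes that still hold the deleted _base images open with lsof, then copy the data back out through /proc while those processes keep running. The PID, descriptor number and destination path are illustrative, not the actual values used.

    # 1. List unlinked-but-still-open files; the running kvm/qemu processes
    #    keep descriptors to the old _base images alive:
    lsof +L1 | grep /var/lib/nova/instances/_base

    # 2. While the process is alive, the data stays readable via procfs
    #    (PID 12345 and fd 17 are placeholders taken from the lsof output):
    mkdir -p /root/base-recovery
    cp /proc/12345/fd/17 /root/base-recovery/recovered-base-image

    # 3. Sanity-check the copy before anything is allowed to reboot:
    md5sum /proc/12345/fd/17 /root/base-recovery/recovered-base-image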
[08:53:38] as long as the initial image exists it's possible to reboot the instances [08:56:49] but it doesn't exist [08:56:54] now [08:57:03] I can modify the basedir in nova's code for now [08:57:10] then I can reboot all of the instances [08:57:15] ok [08:57:17] of course, I'm going to test that ;) [08:57:21] not just do it [08:57:34] did you totally delete that instance, or left it broken? [08:57:38] left it [08:57:40] cause I can test using that [08:57:40] ok [08:57:42] which one is it? [08:57:48] turnkey-1 [08:57:52] ok [08:58:21] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [09:06:11] RECOVERY host: turnkey-1 is UP address: turnkey-1 PING OK - Packet loss = 0%, RTA = 0.52 ms [09:07:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [09:20:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [09:28:21] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [09:37:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [09:50:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [09:58:21] PROBLEM host: testing-puppet is DOWN address: testing-puppet CRITICAL - Host Unreachable (testing-puppet) [09:58:30] Ryan_Lane: creation is broken? [09:58:32] now [09:58:42] yeah, I told you it would be [09:58:46] ok [09:59:08] isn't it like 2 am where you are? [09:59:11] :D [09:59:14] yes [09:59:16] yay [09:59:22] I think we can wait [10:00:11] Most people do their best work at 2am :D [10:00:15] if there is anything I can help with, let me know [10:03:37] * Ryan_Lane nods [10:03:54] like rebooting all instances etc [10:04:06] please don't do that :) [10:05:46] ok [10:07:14] seems like that worked [10:07:30] yeah, so, this isn't going to be much fun :) [10:07:41] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [10:07:44] don't tell me we need to reinstall all [10:07:53] nah [10:08:21] we may need to shut down all instances and then restart them, though [10:08:31] RECOVERY host: testing-puppet is UP address: testing-puppet PING OK - Packet loss = 0%, RTA = 0.78 ms [10:08:44] ok [10:08:47] tell me when [10:08:54] well, I need to do it [10:08:57] ok [10:09:03] because it needs to be every single instance [10:09:04] I wanted to restart apaches one by one [10:09:09] so there is a short outage only [10:09:19] no. it needs to be all of them at the same time [10:09:22] ah [10:09:23] ok [10:09:32] because they need to release their handle on the directory [10:09:38] right [10:09:48] and when it gets shut down, it won't come back up until the directory is released [10:09:59] I guess I could just reboot all nodes [10:10:17] I think it's a good time to patch my bot [10:10:21] heh [10:10:42] maybe I could also install new kernels on all [10:10:51] I wonder if I can just suspend the instances [10:10:53] I bet I can [10:10:57] hm... [10:11:07] I think for bots it's better to shut down [10:11:11] why? [10:11:19] because some can't handle connection outage [10:11:31] ah [10:11:44] restart of process is better [10:11:52] but maybe it's just my case [10:12:05] I don't know the other bots but I bet cluebot will crash as well [10:12:10] Damianz: ^ [10:12:12] :o [10:12:48] are you sure you can suspend them? 
[10:12:57] well, it seems suspend keeps the process running [10:13:09] I think it needs to keep fd [10:13:20] otherwise it couldn't recover [10:13:28] depends [10:13:47] if there is a difference between the two images it was running from, it would crash [10:13:50] CB will just have a heart attack but supervise will bring it back up [10:13:50] I guess [10:13:58] if it was like hibernate, it would write changes to disk, then shut down the process [10:14:03] then reconnect to the file [10:14:06] when you resume [10:14:19] ok but if the file is changed while it's down [10:14:32] it could be problem a bit [10:14:41] maybe it keep the fd for that reason to prevent changes [10:14:42] why would the file change? [10:15:06] the files I restored never change [10:15:07] because as I understand it you copied the old image using fd and now you want to put it back [10:15:10] ah [10:15:24] I thought it's like a vd image [10:15:38] mounted as /dev/vda1 [10:15:43] ok [10:15:52] there's a base image and a disk image [10:16:02] the disk image only writes changes from the base [10:16:08] ah [10:16:22] that way, if we have 20 lucid images, it doesn't need to make 20 copies of the base [10:16:33] ok [10:19:11] Ryan_Lane: I forward that request to dzahn ok? my email [10:19:18] so that you don't need to handle it... [10:19:29] you can if you'd like [10:20:41] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [10:20:47] mutante: ping [10:24:52] !log dumps Deleted instances dumps-6 & dumps-7 [10:24:54] Logged the message, Master [10:27:39] yep. no getting around it. I'm going to have to stop all instances [10:27:47] * Ryan_Lane sighs [10:28:39] ALL? [10:29:12] yep [10:29:31] the way in which I've fucked up only has one solution :D [10:29:52] you don't happen to mean all >100 instances [10:30:01] hey, it could be worse, we could have 1,000 instances [10:30:05] yes, I mean all of them [10:30:11] wait wait [10:30:17] * Hydriz goes and save work [10:30:22] I don't mean immediately [10:30:36] I'm going to send an email out, and give people till some time tomorrow. [10:30:49] phew [10:31:06] yeah, tomorrow then probably I have time [10:31:22] if right now then it would be scramble [10:31:29] but whats fucked? [10:31:41] I screwed up the filesystem somewhat [10:32:07] but how does it affect the instances? 
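The shared base / per-instance delta layout described above is qcow2's backing-file mechanism; a short sketch with illustrative paths (not the real instance names), to make the "20 images, one base" point concrete:

    # Show which backing file an instance's delta disk depends on:
    qemu-img info /var/lib/nova/instances/instance-00000042/disk

    # Create a new copy-on-write overlay on top of a shared base image;
    # only changes land in the overlay, so many instances share one base:
    qemu-img create -f qcow2 -b /var/lib/nova/instances/_base/lucid-base.img overlay.qcow2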
[10:32:09] and the only way to fix what I did is to make all of the instances drop their handles to the current files [10:32:29] Oh, I see what you did there [10:41:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [10:43:37] I'm going to shut them down on March 6th [10:43:59] which also means no instance creation [10:49:18] sent an email explaining it to labs-l [10:50:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [10:51:52] * Ryan_Lane goes to sleep [10:51:56] * Damianz notes to take backup [10:55:56] bye [11:11:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [11:20:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [11:41:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [11:50:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [12:11:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [12:20:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [12:41:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [12:50:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [13:11:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [13:20:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [13:41:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [13:49:26] 03/05/2012 - 13:49:26 - Creating a project directory for hugglewa [13:50:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [13:59:33] mutante: here? [13:59:40] can you insert me to huggle wa too [14:01:34] petan|wk: Am I able to be added into deployment-prep? [14:01:49] hm... what are you going to do there? [14:02:03] probably going to just enable extensions for testing [14:02:12] (of course not cluster-wide or something) [14:02:15] that's exactly what people are not supposed to do there [14:02:26] zzz [14:02:34] deployment prep is a project where we test software before deploying it to cluster [14:02:37] if only labs allowed forking [14:02:52] then I would have forked this project and used it to enable extensions [14:02:57] that mean it's a place where approved sw is enabled to check it won't break the cluster [14:03:17] sw? [14:03:17] hm, it should be possible in future [14:03:27] Soft Ware [14:03:35] Ware soft! [14:04:39] <^demon> Wario Ware. 
[14:05:08] zzz [14:05:26] * Hydriz was just hoping to find out how the beta cluster is actually set up [14:05:43] weird [14:05:52] I would like to have someone to check and fix it [14:05:56] atm it's totally broken [14:06:10] if people won't start logging to channel [14:06:15] I will install audit daemon there [14:06:22] speaking of logging [14:06:31] I am currently running the bot from screen [14:06:42] ok [14:06:44] sudo service adminlogbot restart or something broke [14:06:48] permission issue [14:06:56] but it doesn't really matter who runs it anyway [14:06:57] which instance it is [14:07:06] sudo should run it as root [14:07:20] it does matter at some point [14:07:28] yes [14:07:34] I am wondering how root is getting permission issues [14:07:34] but somehow it broke in mid-air [14:07:43] ok let me fix it [14:07:50] !log bots restart logbot [14:07:51] Logged the message, Master [14:07:51] I think its starting the bot without sudo [14:08:01] letme kill it [14:08:22] logging in... [14:08:38] bad bot [14:08:39] !log killed :P [14:09:08] petan|wk: The bot was failing the other day on trying to write to a folder in /var that didn't exist [14:09:14] Also could do with moving to the labs instance. [14:09:22] it was moving? [14:09:30] thats good though [14:09:50] We have a bots-labs for like irc stuff as bots-2 randomly gets overloaded and kills like logbot. [14:10:01] Hydriz: you should not have kill it [14:10:06] but fine... [14:10:07] :O [14:10:15] * Hydriz is a murderer!!! [14:11:27] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [14:11:42] now it's runnig as root [14:11:55] loggie [14:12:00] not really [14:12:11] it doesn't run as root, but as its service user [14:12:40] !log bots Restarted as service user now... [14:12:41] Logged the message, Master [14:12:46] \o/ [14:13:22] but it doesn't matter who runs it, still [14:13:40] if it breaks again I believe anyone should just ssh in and restart it [14:13:47] as labslogbot [14:14:33] point of service is to be able to restart itself on fail [14:14:47] oh good [14:14:48] it shouldn't require us to restart it [14:14:52] it had better [14:16:14] hmm, whats audit daemon [14:16:29] stuff which track the changes to files [14:16:46] hyperon: hey [14:16:50] then how does logmsgbot work in -tech? [14:16:54] petan|wk: hey [14:17:02] what's up [14:17:07] can you move the log bot to bots-labs [14:17:15] I don't know where the .deb is [14:17:45] Hydriz: exactly as this one :o [14:18:10] Qn: What does "this one" refer to? [14:18:19] !log me [14:18:20] Message missing. Nothing logged. [14:18:23] this [14:18:30] nono [14:18:47] there is a small diff between them [14:18:48] the bot in #wikimedia-tech which automatically logs changes in the cluster [14:19:05] hm I guess it can be called using the shell too [14:19:09] either shell or irc [14:19:18] wtf isn't it called logmsgbot... [14:19:27] !log me [14:19:27] Message missing. Nothing logged. [14:19:35] actually there is another bot, one which logs and one which echoes text with !log [14:19:37] is morebots on #wikimedia-tech [14:19:52] yeah, I am talking about the echo bot [14:20:02] I think it's just same as nagios bot [14:20:09] it reads a file and that's all [14:20:21] when you write a line to file it echo it [14:20:26] that's all [14:20:28] simple [14:20:30] hmm [14:20:45] so shouldn't you be enabling that on deployment-prep? 
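A minimal sketch of the "reads a file and echoes it" bot pattern petan describes above. The log file path and the FIFO-style IRC client (something like suckless ii, which exposes a channel as an "in" FIFO) are assumptions for illustration, not how morebots/adminlogbot is actually implemented.

    LOGFILE=/var/log/adminbot/labs.log
    IRC_IN=~/irc/irc.freenode.net/#wikimedia-labs/in

    # Follow the file from its current end and relay each new line to the channel:
    tail -n0 -F "$LOGFILE" | while IFS= read -r line; do
        printf '%s\n' "$line" > "$IRC_IN"
    done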
[14:20:47] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [14:20:56] you can just create a command log, which does echo "!log $1" >> log [14:20:57] it automatically logs stuff that is changed [14:21:03] I can [14:21:05] true [14:21:09] yeah [14:21:14] thats what should be done [14:21:25] ok [14:21:27] let me fix it [14:21:34] I look at the SAL of deployment-prep, but I don't understand much that is happening [14:21:46] I am still watching config.beta's all.dblist [14:21:48] for new wikis [14:22:06] but the latest is still enwikisource [14:22:40] oh yes [14:23:04] Can I get global sysop rights on the beta cluster so that I can help in importing MediaWiki: stuff? [14:23:17] and some important pages of the respective wikis [14:32:00] yes [14:32:05] but it's broken atm [14:32:37] importing? [14:33:26] of course this global sysop right can help fight vandalism :) [14:33:38] I see 2 in labs.wikimedia itself [14:34:07] whole site is down [14:34:11] down? [14:34:14] sort if [14:34:21] someone broke it I still try to find what's up [14:34:34] * Hydriz raises eyebrows [14:34:53] wait, whats broken? [14:34:59] eh [14:35:05] :D [14:35:11] !log deployment-prep petrb: this is test :o [14:35:13] Logged the message, Master [14:35:28] ok [14:35:42] here we go [14:36:23] !log deployment-prep petrb: created a new log system, just type log message to log your change on prep [14:36:24] Logged the message, Master [14:36:37] ... [14:36:43] its still not automatic logging [14:37:00] unless wikimedia's logmsgbot isn't automatic [14:37:40] see this example that just showed on #wikimedia-labs: [14:37:41] !log reedy synchronized php-1.19/extensions/CategoryTree/ 'r113035' [14:37:41] reedy is not a valid project. [14:38:04] thats what should be done [14:39:51] * Damianz eyes labs-sexy-bottie [14:40:52] Hydriz: yes it's not automatic on prod [14:41:01] zzz [14:41:06] that just sucks [14:41:11] maybe [14:41:22] yeah, maybe [14:41:22] write a system which is automatic and I will be happy to deploy it here :) [14:41:28] too much logging is bad too [14:41:31] heh [14:41:40] people will be eyeing on your system [14:41:58] (people like me) [14:42:23] but can I has global sysop rights? I can't stand that vandalism there [14:42:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [14:43:16] !log deployment-prep petrb: blah [14:43:27] !log deployment-prep petrb: this is a test of bottie :o [14:43:35] CRASH [14:43:36] damn [14:43:37] EPIC [14:43:46] bottie is too sexy for log bot [14:43:50] it crashed [14:44:28] who shall restart it? [14:44:29] Hydriz: I give you all what you need once I fix the labs [14:44:32] whoever [14:44:38] me then :P [14:44:40] ok [14:44:44] I hate that service service [14:44:47] heh [14:44:58] I shall use my own direct start? :P [14:45:04] no [14:45:16] hyperon: can you move it? :P [14:45:41] shyt [14:46:34] sorry, forgot what the service was called [14:46:35] :P [14:47:08] hmm [14:47:16] beta.wmflabs.org looks working fine to me [14:47:47] unless its broken somewhere very hidden [14:48:45] no doubt its slow [14:49:07] yes it looks like it works but it's broken atm [14:49:34] * Hydriz spots something [14:49:38] !log deployment-prep petrb: moved git repository to new path [14:49:51] "Sorry! We could not process your edit due to a loss of session data. Please try again. If it still does not work, try logging out and logging back in." 
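The "command log" idea quoted above (echo "!log $1" >> log), fleshed out slightly as a sketch; the project name, target file and the inclusion of $USER are illustrative additions, not part of the quoted suggestion.

    log() {
        echo "!log deployment-prep ${USER}: $*" >> "$HOME/log"
    }

    # usage:
    #   log updated live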
[14:50:07] !log deployment-prep petrb: the new path is /usr/local/apache/common-local/ by the way [14:50:30] nonono [14:50:32] hm [14:50:37] there is something wrong [14:50:38] I don't think we can start it using service [14:50:42] it is hopeless [14:51:20] lets see if I can save preferences [14:51:23] ok [14:51:24] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [14:52:08] !log deployment-prep petrb: updating svn [14:52:11] looks like it ain't saved [14:52:21] okok [14:52:25] I start it direct [14:52:27] and see the diff [14:53:34] fail [14:53:55] my preferences are not executed [14:54:47] or saved [14:56:46] \o/ I saved my userpage [14:57:16] !log deployment-prep petrb: meh [14:57:21] MEH [14:57:24] itdoesn't wotk [14:57:26] work [14:57:32] so, morebots is broken itself [14:57:36] yes [14:57:43] oh wait [14:57:44] no [14:57:48] I forgot sudo again [14:57:51] GRRR [14:57:55] meh [14:58:06] IOError: [Errno 13] Permission denied: '/var/run/adminbot/project.cache' [14:58:06] !log deployment-prep petrb: meh [14:58:07] Logged the message, Master [14:58:09] heh [14:58:13] here we go [14:58:17] \o/ [14:58:42] !log deployment-prep petrb: updated live [14:58:43] Logged the message, Master [14:59:35] but its hell slow, despite having like 5 servers? [14:59:41] it's broken [14:59:49] s/servers/very small servers/ [15:01:29] !log deployment-prep petrb: please ignore some of the previous lines in log we were just testing bot [15:01:30] Logged the message, Master [15:04:57] !log deployment-prep petrb: inserted new wiki to sul [15:04:58] Logged the message, Master [15:05:46] thats just vague [15:05:58] * Damianz pokes labs-sexy-bottie with a stick [15:12:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [15:20:26] !log deployment-prep petrb: fixed broken memc :o [15:20:27] Logged the message, Master [15:20:52] wth [15:21:22] ? [15:22:24] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [15:23:59] !log deployment-prep petrb: temporary changed code of localsettings to debug site [15:24:00] Logged the message, Master [15:24:28] * Damianz questions why we have bots talking to bots [15:25:03] love it [15:25:31] If they get close and have botsexing we'll be overrun with bos [15:25:46] I see debugging information in beta [15:25:51] :O [15:36:58] !log deployment-prep petrb: restarted servers [15:36:59] Logged the message, Master [15:37:09] done [15:37:11] it's fixed [15:42:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [15:48:38] !log deployment-prep petrb: reconfigured squid [15:48:39] Logged the message, Master [15:48:48] !log deployment-prep petrb: temporary disabled ssl server [15:48:49] Logged the message, Master [15:52:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [16:05:58] petan|wk: use puppet or similar to push a log binary out to all wikis. 
Have a daemon on bastion listening for input, which can then dump it to IRC [16:06:19] binary send text to bastion:port etc [16:06:27] I'm sure WMF ha(ve|d) something similar [16:12:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [16:16:05] Reedy: I have no access on bastion so having daemon there isn't a best place [16:16:19] I would rather put to bots-labs [16:16:39] Seems a bit strange to put it on a specific project like that [16:16:49] bots-labs is instance for bots related to labs [16:16:58] ryan made it [16:17:08] logbot should be moved there itself [16:17:13] now it's on bots-2 [16:17:36] so it makes sense to have such a service there [16:18:16] is there a .bash_rc in puppet? [16:18:39] it would be easier if I could change the variables of all users on all projects [16:18:51] I need to define $PROJECT [16:22:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [16:42:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [16:52:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [17:12:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [17:22:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [17:28:15] petan|wk: just saw your mail about deployment-prep [17:29:00] petan|wk: i think i've been pretty good about logging there. but otoh there's some stuff i haven't been able to do because i couldn't figure out how so i just did nothing (instead of breaking something) [17:30:31] petan|wk: e.g., are there docs on how to make a new wiki (with a complete clone of MW space and a few hundred / thousand other pages)? [17:31:28] +1 that would be a good idea especially for the n00bs that are learning around here like myself ;) [17:31:44] Thehelpfulone: are you a member of the project? [17:33:00] deployment-prep? nope, but I've been asking a few questions about it [17:33:06] k [17:33:29] another .php is used instead of LocalSettings.php? [17:33:42] petan|wk: also, [i guess this is more of a general mediawiki question: ] i know there's the createandpromote maint script that's included with all installs but i didn't see any way (short of direct DB write or making a new script) to make a new steward or promote an existing user to steward. i guess wikis w/ stewards are just not that common in the world ;) [17:33:55] Thehelpfulone: yes, kinda a clone of prod [17:34:30] okay, jeremyb - the deployment-prep has SUL so you can request global steward yourself [17:35:36] Thehelpfulone: did you see petan|wk's mail? maybe SUL's still broken? (the SAL isn't clear on that front) [17:36:23] yep saw it - it seems to work for me [17:36:41] * Thehelpfulone reads the email in a bit more detail [17:37:01] Thehelpfulone: http://labs.wikimedia.beta.wmflabs.org/wiki/Main_Page#Config_files [17:37:57] ah interesting [17:39:37] jeremyb: do you have an account on the wiki? [17:40:17] Thehelpfulone: i think i couldn't find it last i checked? 
maybe i had one on a previous incarnation and then it was nuked and i never made a new one [17:40:25] making an account is easy though ;) [17:40:41] yep [17:42:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [17:52:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [18:12:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [18:22:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [18:42:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [18:52:34] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [19:12:34] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [19:17:24] PROBLEM host: storm2 is DOWN address: storm2 CRITICAL - Host Unreachable (storm2) [19:17:54] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [19:22:37] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [19:22:56] fwiw, i deleted storm2, so ignore that. [19:23:20] i can't get into reportcard1. it was unresponsive to ssh before so i rebooted it. [19:24:58] oh. i see the email now. [19:42:37] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [19:43:33] andrewbogott: lemme see if I can get your instance running [19:43:52] oneiric had a corrupted base image [19:44:06] that'd be great, thanks. [19:44:13] I tried cleaning it up, and that led me to wiping out the whole directory. heh [19:44:19] it was a very fail night [19:44:36] which instance is it again? [19:44:55] driver-dev [19:46:00] hm. hopefully this isn't a corrupted version [19:48:02] it surely won't boot [19:48:14] I wonder what's up with this image [19:48:19] the base must be corrupt [19:48:37] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [19:50:44] ah. crap [19:50:49] I think I know the problem [19:51:11] I think this image can't have its kernel upgraded [19:51:48] I wonder if I can mount your filesystem [19:52:44] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [19:56:28] andrewbogott: I don't see anything in /mnt [19:56:36] andrewbogott: all I see is lost+found [19:56:42] did you have it in /? [19:56:57] the bits I need are in /opt/stack. [19:57:01] ah [19:57:11] I'm having trouble mounting / [19:58:04] Any idea what caused this corruption in the first place? [19:58:11] hm. 
the base image is odd [19:58:28] it may be that the oneiric image isn't good [19:59:27] /var/lib/nova/instances/_base_bak/972a67c48192728a34979d9a35164c1295401b71: x86 boot sector; partition 1: ID=0x83, active, starthead 0, startsector 16065, 4176900 sectors, code offset 0x63 [19:59:37] it shows it as an x86 boot sector [19:59:49] whereas the lucid base shows: 7719a1c782a1ba91c031a682a0a2f8658209adbf: Linux rev 1.0 ext3 filesystem data, UUID=d0b682c8-a65f-43e0-a6b6-65987b7a9850, volume name "cloudimg-rootfs" (large files) [20:00:16] so, it's likely I need to mount a partition, and not the device [20:00:18] lemme try that [20:02:04] ah, you can specify a partition explicitly [20:02:21] qemu is pretty fucking awesome [20:04:10] well, partition1 doesn't seem to be correct. heh [20:04:25] maybe I should fdisk the entire disk so that I can see the partition table [20:04:36] hm. well, this is annoying [20:05:40] ok. I take back the part about it being awesome [20:07:00] there's a *really* good chance I'm about to segfault virt1 [20:07:22] We're all rooting for you! [20:08:31] wow. I can't believe that didn't kill the system [20:09:17] I just kill −9'd a mount command, and a nbd command [20:09:54] yikes [20:09:56] ok. let's try this again :) [20:10:07] * Damianz finds his spike [20:11:17] hm. there's only a single partition... [20:11:40] \o/ [20:11:44] I mounted it! [20:11:56] * Damianz looks at Ryan_Lane mounted [20:11:58] andrewbogott: let me back up your data :) [20:12:12] OK. Or I can just salvage the files I need, if I have access to the mount [20:12:22] it's on the virtualization host [20:12:34] at some point we should set you up with production cluster access [20:12:39] so that you can access these nodes [20:12:45] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [20:12:53] OK, I was just about to ask w/not I have access to virt1 :) [20:13:21] yeah, there's no good reason not to give you access to them [20:13:48] ok, qemu, I take it back, you're kind of awesome [20:14:46] Ryan_Lane: Everytime you feel hate towards qemu just think of Xen 3 and the kernel admin hastle :) [20:15:01] I despise xen [20:15:26] I despise xen less than virtuoozo [20:15:59] heh. I've made /dev/nbd0 worthless [20:16:03] the kernel can't use it now [20:17:20] andrewbogott: where do you want me to stick this tar file? [20:17:54] My homedir in the gluster project is good, if there's space. [20:18:10] Um... presuming my other gluster instances are still working. [20:18:11] * andrewbogott checks [20:18:32] yep. [20:19:05] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [20:21:50] driver-dev-jumbo [20:22:45] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [20:29:14] so, the good thing is, as long as you don't delete the actual instance's change disks, it's always possible to at minimum get the data back [20:29:41] That comes as quite a relief. [20:29:52] and even then, you may be able to suspend the instance, and recover the disks via the file descriptors [20:30:07] Is that .tar still copying, or is it up someplace? [20:30:14] sorry. 
copying now [20:30:34] it isn't easy to move stuff from a backend production system to a labs node :) [20:30:56] I can just scp it to my local machine too if that's simpler [20:31:14] nah, I'll have it there in a sec [20:31:40] all this will be less problematic when we have the gluster storage up [20:31:49] no worry of dataloss then [20:32:00] well, not from instances going away anyway [20:32:34] there's an undelete module for gluster. it may be good to enable that, and put a cleanup script in the undelete directory [20:32:55] (for volume storage) [20:33:17] I really need to stop calling that volume storage [20:33:23] project storage? :) [20:33:41] <^demon> project? Let's not bikeshed over that word again :p [20:33:52] heh [20:34:18] ^demon: well, this is specifically storage for labs projects [20:34:37] <^demon> I know, I'm just being a jerk ;-) [20:34:49] I'm going to continue to use project, you guys can decide to change your term, if you'd like :) [20:35:22] andrewbogott: it's copying to /tmp on driver-dev-jumbo [20:35:34] is devstack ported to precise yet? [20:35:41] I really want to get away from oneiric [20:35:53] <^demon> Is oneric bad? [20:36:01] no, but it isn't an LTS [20:36:08] and we aren't set up to use it [20:36:10] Thanks for rescuing my data! [20:36:14] yw [20:36:18] Not sure about precise; I don't think it's ported yet. [20:36:31] I don't know what's up with oneiric images, but not being able to reboot is a PITA [20:36:46] I bet it's because of kernel upgrades [20:37:59] for lucid I'm using a specific image loader. I thought oneiric just worked, but apparently not [20:42:34] andrewbogott: ok. finished copying [20:42:45] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [20:43:10] cool [20:49:05] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [20:49:27] did someone reboot reportcard1? [20:49:42] why's it reporting as down? [20:50:58] I'm avoiding rebooting anything right now... [20:52:45] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [20:53:35] yeah, everyone should :) [20:54:43] Might just take a backup as well :P [20:55:55] hm. yeah. seems it's been rebooted [20:56:01] it isn't running on any node [21:16:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [21:19:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [21:23:52] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [21:31:05] petan: petan|wk: sup? 
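A sketch of the salvage path described above: expose the instance's qcow2 disk with qemu-nbd, mount the partition read-only, and copy out what is needed. Device node, partition number and paths are illustrative.

    modprobe nbd max_part=16
    qemu-nbd --connect=/dev/nbd0 /var/lib/nova/instances/instance-00000042/disk

    # The image showed a single Linux partition; mount it read-only to be safe:
    mkdir -p /mnt/rescue
    mount -o ro /dev/nbd0p1 /mnt/rescue

    # Grab the needed directory (/opt/stack in this case):
    tar -czf /tmp/driver-dev-optstack.tar.gz -C /mnt/rescue opt/stack

    # Detach cleanly so the nbd device stays usable:
    umount /mnt/rescue
    qemu-nbd --disconnect /dev/nbd0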
[21:32:58] * addshore_ needs sleep [21:46:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [21:49:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [21:54:32] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [22:16:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [22:19:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [22:24:32] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [22:46:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [22:49:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [22:54:32] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [23:16:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [23:17:58] andrewbogott: what directory structure were you aiming at for the gluster stuff again? [23:18:11] /<project>/<volume>/ ? [23:19:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [23:19:27] Ryan_Lane: It'll be configurable. [23:19:34] ah. cool [23:19:41] that's what I'm aiming at for now [23:19:55] and gluster volume name: <project>-<volume> [23:20:04] There might be some merit to using an arbitrary hash instead, to avoid conflicts. [23:20:15] hm [23:20:31] But that makes it a lot harder to dig around by hand. [23:20:38] that may make it difficult to mount [23:20:44] Yeah. [23:21:02] <project>-<volume> should avoid conflicts, right? [23:21:10] since projects are unique [23:21:15] and volumes in a project must be unique [23:21:22] Yep. [23:21:27] same with /<project>/<volume>/ [23:21:58] Yeah, unless someone creates a project called 'global' :) [23:22:04] heh [23:22:11] yeah, that's problematic [23:22:48] hm. maybe /global/<project>/<volume>/ ? [23:22:52] So maybe /projects/<project>/<volume>, etc. [23:23:00] * Ryan_Lane nods [23:23:07] and then global stuff in /global/<volume>, etc. [23:23:16] yeah. that makes sense [23:24:32] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge) [23:24:49] I wonder if we should make global filesystems have unique names across projects, or if they should be in their own project namespace [23:25:18] /global/<project>/<volume> vs /global/<volume> [23:25:26] I'd imagine project specific is likely saner [23:25:44] Wait, isn't the point of a global fs that it isn't project-specific? [23:25:58] well, who manages the volume? [23:26:14] Oh, I see... ownership vs. access. [23:26:18] * Ryan_Lane nods [23:26:52] global shares could be read/write or read/only, but only the creator should be able to delete the share [23:26:58] If a filesystem is always owned by the project that creates it regardless of access scope, then everything can just be /<project>/<volume> [23:27:06] ah. true [23:27:14] then it's just meta-information about the volume [23:27:31] And scope just won't be reflected in the directory structure at all. It'll be stored in a db someplace [23:27:41] right [23:27:44] Which I need to do anyway. [23:27:53] I like that idea [23:28:11] ok, seems straightforward. [23:33:55] * Ryan_Lane twitches [23:34:03] seeme access is all or nothing [23:34:05] *seems [23:35:43] per host, you mean? [23:35:55] Ah, no read-only?
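One way the <project>-<volume> naming discussed above could look in practice, purely as a sketch; the server, project and volume names are placeholders and this is not the design that was settled on.

    # A gluster volume named <project>-<volume>, e.g. "bots-home":
    gluster volume create bots-home replica 2 labstore1:/bricks/bots-home labstore2:/bricks/bots-home
    gluster volume start bots-home

    # On an instance it would then be mounted at /<project>/<volume>:
    mkdir -p /bots/home
    mount -t glusterfs labstore1:/bots-home /bots/home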
[23:36:01] seems to be the case [23:38:22] let's see if it's an option in 3.3 [23:38:24] :) [23:38:46] hm. can use glusterfs as a backend for hadoop in 3.3 [23:42:05] no read-only support in 3.3 [23:42:11] other interesting features, though [23:46:22] PROBLEM host: driver-dev is DOWN address: driver-dev CRITICAL - Host Unreachable (driver-dev) [23:47:54] addshore_: hey can u give me sysadmin in hugglewa [23:48:02] + netadmin [23:48:09] or Ryan_Lane1 can you do it [23:48:14] ? [23:48:18] mutante forgot to do that [23:48:28] he created a project but didn't put me in [23:48:45] it's not really urgent since we can't create instances but anyway :D [23:49:22] PROBLEM host: reportcard1 is DOWN address: reportcard1 CRITICAL - Host Unreachable (reportcard1) [23:49:23] done [23:49:26] ok [23:49:27] 03/05/2012 - 23:49:27 - Creating a home directory for petrb at /export/home/hugglewa/petrb [23:49:43] heh. it's actually not a bad time to break instance creation [23:49:51] :) [23:49:51] I probably needed to disable it anyway [23:49:56] really? [23:50:01] out of space? [23:50:01] :D [23:50:06] we have 115 instances and 4 hosts [23:50:08] no. space is fine [23:50:10] heh [23:50:13] memory isn't [23:50:14] I think I can kill 1 [23:50:22] nah. it's fine [23:50:25] all new instances are 1gb memory only [23:50:26] 03/05/2012 - 23:50:26 - Updating keys for petrb [23:50:36] problem is that we didn't have other than 2g in past [23:50:49] so I couldn't save memory even if I wanted [23:51:10] apaches could be all 1g [23:51:54] nikerabit wanted me someone know why? [23:52:01] nah. hydriz needs to kill some instances, though [23:52:08] that's way too many for dumps [23:52:14] dumps? [23:52:15] of what [23:52:20] we already have dumps afaik [23:52:30] it's uploading dumps to internet archive [23:52:32] apergos handle it or not [23:52:36] ah [23:52:50] wikimedia dumps? [23:52:55] what's it for [23:53:04] :D [23:53:05] 6 instances for uploading and two for storage is too many [23:53:08] wikimedia dumps [23:53:19] is there a good reason for this project to exist? [23:53:27] why we need to upload dumps to archive? [23:53:31] why not? [23:53:40] I think we already have archive or not [23:53:46] it's good for orgs other than us to have our dumps [23:53:48] old dumps can be downloaded too [23:53:58] what if we get hacked and someone deletes them all? [23:54:03] but dumps we provide are public or not [23:54:07] public [23:54:07] ah, right [23:54:09] true [23:54:25] i'm fine with having dumps uploaded. less so with this many instances for doing it [23:54:32] ok [23:54:32] PROBLEM host: canonical-bridge is DOWN address: canonical-bridge CRITICAL - Host Unreachable (canonical-bridge)