[00:01:08] Looks good on a few labs instances I just updated. Monitoring should be fine, but has another unrelated issue right now. Can fix it if not anyway. [00:01:13] cool [00:01:22] ok [00:19:59] For at least the last several hours, attempts to access Catscan v2 (https://tools.wmflabs.org/catscan2/catscan2.php) immediately result in "Internal Server Error" [00:23:17] Confirmed. [00:26:32] !log tools tools-webserver-02: Rebooted; inaccessible via ssh, http said "500 Internal Server Error" [00:26:33] Logged the message, Master [00:27:24] scfc_de: still borken, any suggestions? [00:30:40] I can log into tools-webserver-02 now, let me see. [00:31:27] http://tools.wmflabs.org/catscan2/catscan2.php seems to work now. [00:31:44] Yay, AlanM1: fixed [00:31:47] thanks scfc_de [00:31:56] Lots of "Cannot allocate memory: couldn't create child process:" for xtools. [00:33:15] Cool. I guess I should have reported it hours ago :) I'm surprised nobody else did. [00:41:36] Now I get, after a little while, "Could not connect to commonswiki.labsdb : Can't connect to MySQL server on 'commonswiki.labsdb' (4)" [00:44:40] AlanM1: if everyone just assumes that someone else will do something, it will not get done at all [00:44:40] although I am surprised nobody noticed it [00:45:15] :/ [00:45:26] I understand. I'm still surprised. [00:47:30] scfc_de: can you take a look? [00:48:56] *Argl*. I'll never learn to run iptables after a reboot. Moment, please. [00:49:52] AlanM1: Please try again? [00:50:03] k [00:51:24] Excellent on both catscan and quick_intersection. Thanks! [00:51:51] np [00:52:51] good evening [01:21:01] Coren, is manage-volumes-daemon running wild again? [01:47:01] Coren: Re the mail to root@tools.wmflabs.org, same issue as with Tahir: As fsainsbu, I can't "/data/project/tasmania/abc". [01:48:45] But I remembered anomie (?) suggesting using newgrp, and that works "newgrp local-tasmania && touch ...". 
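The "run iptables after a reboot" slip above is the classic persistence problem: rules loaded by hand vanish on reboot. A minimal sketch of one common fix on Debian/Ubuntu of that era; the file paths and hook name are illustrative, not necessarily what the Labs hosts actually used. The hook below would be the contents of /etc/network/if-pre-up.d/iptables, restoring a rule set previously saved with `iptables-save > /etc/iptables.rules`:

```
#!/bin/sh
# Restore the firewall rules saved earlier with
#   iptables-save > /etc/iptables.rules
# so the host does not come back up with an empty rule set.
/sbin/iptables-restore < /etc/iptables.rules
```

The hook must be executable (`chmod +x`) for ifupdown to run it before bringing interfaces up.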
[01:58:45] Googling (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/409366) suggests a causality with users being in more than 16 groups, but fsainsbu is only in four. [02:00:16] Are labstore* run with --manage-gids or without? [02:50:05] andrewbogott: Our image has an issue. ^@%# [02:50:11] andrewbogott: / is only 4G [02:50:54] weird… surely that's the same as the base image we were using in pmtpa? [02:51:20] Actually no, there are other disk partitioning differences; most not an issue but this one is going to bite us. [02:52:12] OTOH, I did manage to puppetize LVM partitioning of the extra space; I can probably weasel off something. (Which isn't all bad, you don't want to have all in / if you can avoid it) [02:55:04] Just to make things fun; tools-login (which has a lot of packages, and the whole dev tool chain) ends up at 96% full. :-) [02:56:54] * Coren wonders. [02:57:21] Wait, /is/ it the image that does this or is that made by openstack when it creates the instance? [02:57:40] Nah, gotta be the image. [02:59:47] Other difference: rather than a fixed /dev/vda and a /dev/vdb for the rest, eqiad has one big /dev/vda with the unpartitioned extra. (Not an issue) [03:02:15] Nah, that image is going to be unusable in practice; it's going to cause us support headaches down the road. [03:02:27] Crap crap crap. [03:02:50] andrewbogott: Not really any choice, I got to find and fix this and we have to rebuild everything in eqiad. :-( [03:03:34] * Coren has lots of very stern, very angry words. [03:03:42] Why oh why did I not notice this earlier? [03:04:20] * Coren screams [03:05:02] andrewbogott: Read scrollback and weep. [03:05:23] andrewbogott: Were there any other things in the image you really wished would have been tweaked? Because now is the time. :-) [03:08:10] Coren, sorry, back. Catching up... [03:08:49] Dang. [03:09:10] Well, let's make sure it's not the 'flavor' setting that's messed up vs. the image. 
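The launchpad bug referenced above is about the AUTH_SYS limit: an NFS client transmits at most 16 supplementary groups per request unless the server resolves group membership itself (rpc.mountd's --manage-gids, hence the question). A quick client-side sketch for checking how close a user is to that cutoff:

```shell
# AUTH_SYS (NFS without --manage-gids on the server) only carries the
# first 16 supplementary groups, so membership in a group beyond that
# cutoff is invisible to the file server.
ngroups=$(id -G | wc -w)
echo "current user is in $ngroups groups"
if [ "$ngroups" -gt 16 ]; then
    echo "over the AUTH_SYS 16-group limit"
fi
```

This also explains the `newgrp` workaround mentioned above: `newgrp local-tasmania` makes that group the primary group, which is always transmitted.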
Did you try creating a 'large' instance to see if / is bigger there? [03:09:34] But… :( to having to rebuild new instances [03:09:41] andrewbogott: It makes a bigger overall disk. The / partition has the same size. [03:09:51] huh [03:10:02] Why different between folsom and havana I wonder? [03:10:11] Do you see where the size of / is specified in the image? [03:10:22] (and, btw, no, I don't have anything else in my wishlist for new images I think) [03:10:26] I'm looking for it now. [03:12:06] The .qcow2 has the partitioned scheme. [03:12:42] Which means it's definitely at image creation time. [03:15:20] Ah, so maybe the device names are rearranged in eqiad? [03:16:07] That's part of it, but not all of it. [03:16:55] eff, and I was basically done setting up tools too. [03:17:43] In situations like this I usually tell myself "It will be even better the second time, thanks to what I've learned!" [03:17:48] But in this case it was already the second time :( [03:28:53] andrewbogott: Apparently, vmbuilder ignores the specified vmbuilder.partition file in vmbuilder.cfg; you gotta specify it (again?) on the command line. [03:29:10] that's dumb [03:29:19] I'm not sure where it even gets the 4G from though. [03:29:31] Yeah, makes me wonder if it's assigned in yet a third place? [03:29:47] * Coren experiments. [03:33:52] There goes my weekend. :-( [03:37:30] andrewbogott: Do you see a sane reason to keep a swap partition by default? [03:38:07] I mean, if I'm going to redo the image, I might as well get it right the first time. (split /var for one) [03:38:14] Coren: I don't know enough to have an opinion. If existing puppet setups expect swap then maybe we should keep one... [03:38:22] But, I defer to your judgement :) [03:41:51] puppet doesn't care about swap -- it's all in the image; and on a VM the presence of swap is of dubious value anyways. The current sysadmin thought is divided on its value anyways. [03:41:59] IMA go paranoid and still put some. 
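For reference, vmbuilder's partition file is a plain list of "mountpoint size-in-MB" lines. A layout matching the 8G root / 2G /var the rebuilt image ended up with might look like the fragment below (sizes illustrative); as discovered above, the setting in vmbuilder.cfg was ignored, so the file apparently has to be passed explicitly on the command line, e.g. with `--part vmbuilder.partition`:

```
root 8000
swap 2000
/var 2000
```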
[03:42:29] Turning it off isn't complicated anyways. [04:06:49] Coren: so after putting my (puppetized) webserver behind proxy recently, should i just remove all SSL configuration from the local Apache ? [04:07:05] or is it like nice to keep for others to reuse without such a proxy.. .. or shrug [04:07:09] mutante|away: Might as well, it's just going to collect dust. [04:07:26] hmm, yeah, i have a pending "turn into module" change [04:07:26] The proxy always hits port 80 [04:07:41] but my main class has parameters like wikistats_ssl_cert => $wikistats_ssl_cert, etc [04:08:02] i guess i can remove all of that, yea [04:08:37] and if it ever becomes prod, then i guess it will make me use misc varnish.. ok [04:15:38] mutante|away: You'd want to do that anyways; we wouldn't want to expose the actual server to the 'net [04:16:01] andrewbogott: pushing the image with glance now. [04:16:18] ok [04:16:26] Best to leave the old one there I guess [04:21:11] andrewbogott: Done. *sigh* Now to delete all the tools instances and start over. Yeay. [04:21:18] :( [04:21:52] have you talked to hashar about this already? [04:23:10] No, but with luck his instance with the cramped root won't be a trouble for him. [04:23:52] Is there a predictable pattern on which server instances are created? Round robin? [04:25:57] It should place new instances on the least-busy host [04:26:04] for a somewhat complicated definition of 'least busy' [04:26:10] which I can look up for you if you care :) [04:26:36] Not enough. What counts to me is that two consecutive new instances are likely to be on different servers. :-) [04:28:47] Worst case is I have to move a couple post-facto. [04:29:42] New images are correct with 8G root, 2G /var [04:31:55] which is the same as what we were getting in pmtpa? 
[04:49:49] andrewbogott: pmtpa gets 10G /, no separate /var [04:57:21] !log wikistats - turned into puppet module [04:57:24] Logged the message, Master [05:56:15] YuviPanda: If you are around… can we talk about the labs proxy a bit? [06:52:55] andrewbogott: https://gerrit.wikimedia.org/r/116063 [06:57:24] Ryan_Lane2: That looks good… does that make it harder to turn on for everyone later on? [06:57:35] nope. just make the list empty [06:57:38] I'm testing it now [06:57:55] hm. I'm logged into wikitech-test, but it isn't working [06:58:08] well, I mean I see projects but have no rights [06:58:13] maybe I need to log out and back in [06:59:11] yep. that helped [07:05:42] I had in_array backwards :D [07:06:25] andrewbogott: ok, tested. this can be merged and deployed [07:06:27] wait [07:06:31] I take that back. one sec [07:07:33] yep. tested the right added to a group [07:07:40] andrewbogott: it can be merged and deployed :) [07:07:55] great! [07:08:07] if you deploy it, I'll add the necessary config [07:08:24] ok, one second... [07:08:25] in fact, I'll add it now [07:09:51] configured [07:10:13] we'll need to do a few more things [07:10:23] 1. the region config needs to be added to keystone in pmtpa [07:11:16] ok, deployed. [07:11:18] now, the question is, do we want to switch the keystone url to eqiad? [07:11:27] and have eqiad replicate to pmtpa [07:11:31] or vice versa? [07:11:41] Hm... [07:11:59] I'd say leave it at pmtpa for now. that seems like the conservative choice. [07:12:10] well, either way I'm about to wipe out every keystone token ;) [07:13:00] let me switch the replication for redis [07:13:09] so that pmtpa replicates to eqiad [07:13:37] ok [07:13:58] *shrug* I could be convinced either way. 
It'll be easier to switch off virt0 if virt1000 is the boss of keystone [07:15:33] we may want to do more testing on keystone in eqiad [07:15:41] so let's keep it all in pmtpa for now [07:16:02] I'm turning off keystone in eqiad and deleting its data [07:16:05] same in pmtpa [07:16:07] err [07:16:11] well, redis in pmtpa [07:16:15] not keystone [07:16:28] andrewbogott: mind adding the region info in pmtpa to keystone? [07:16:58] I don't mind but I don't immediately know what you mean [07:19:26] adding eqiad region data to keystone in pmtpa [07:19:35] all the change I'm doing is replicate keys [07:19:50] the region data still needs to live in mysql for both pmtpa and eqiad [07:19:52] Ah, you mean, configure endpoints? [07:19:56] yeah [07:19:57] and services [07:20:16] it's the same services isn't it? Just new endpoints for existing services? [07:22:02] shit [07:22:03] broken [07:22:13] * andrewbogott stands back [07:22:57] I keep forgetting folsom and havana have different templates [07:23:53] should be working now [07:23:59] I have it locally fixed and puppet disabled [07:24:50] hm [07:24:57] wikitech is showing no instances [07:25:34] I see some in bastion but not my instance in testlabs [07:25:38] ah. memcache [07:25:59] restarted memcache to purge any cached keystone keys [07:26:05] So… my question before… I point the eqiad endpoints and the existing pmtpa services, yes? [07:26:05] now everything is showing up [07:26:24] yep, looks right to me [07:26:24] no. eqiad's endpoints point at eqiad's endpoints ;) [07:26:32] keystone should have both pmtpa and eqiad regions [07:26:38] with services and endpoints for both [07:26:51] in both pmtpa and eqiad [07:27:04] that data isn't replicated between pmtpa and eqiad [07:27:07] only token data is [07:27:21] ok, but… I don't think a service is region-specific... 
[07:28:03] services may not be [07:28:10] if not, then it's just the endpoints [07:28:13] ok, well… we'll see how this goes [07:28:19] I haven't looked at that in a while ;) [07:28:27] hm. why are the tokens not replicating? [07:29:22] redis needed a restart [07:29:32] well, this is going swimmingly so far [07:29:42] andrewbogott: had you logged out and back in to wikitech? [07:30:03] yes, a couple of minutes ago [07:30:26] ok [07:30:33] I'll need to purge people's long-term tokens [07:30:39] so that they'll be logged out [07:31:51] hm… do we want an eqiad keystone endpoint on pmtpa? [07:31:53] we don't do we? [07:31:55] just compute and image? [07:32:13] keystone too [07:32:23] everything that's in pmtpa should be in eqiad and vice versa [07:32:24] ok [07:32:35] (except proxy 'cause i haven't set it up yet :( ) [07:34:39] ok… keystone in pmtpa should be able to see eqiad endpoints now. [07:35:01] ok, purged wiki user session tokens and keystone tokens associated with those [07:35:44] hm. I may need to restart memcache again [07:36:00] yep [07:36:02] that did it [07:36:07] everyone will be forced to log back in now [07:36:19] nova commandline can see both regions now [07:36:30] Do I need to log back into wikitech again? [07:36:35] Oh, so I do [07:37:06] Heeeeey, I can see two regions! [07:37:19] \o/ [07:37:23] same [07:37:34] let me remove your permission for that [07:37:42] BTW, yesterday: "[22:23:32] ssh: connect to host abusefilter-global-main port 22: No route to host" [07:37:55] and the 'add instance' links show different images for the two regions, that's a good sign [07:38:00] Still: "ping abusefilter-global-main" => "From tools-login.pmtpa.wmflabs (10.4.0.220) icmp_seq=1 Destination Host Unreachable" [07:38:06] andrewbogott: and now it's gone ;) [07:38:12] scfc_de: there was a dns outage, caused by an ldap freakout [07:38:18] hm. I may want to make another group for this [07:38:28] right now it's set to cloudadmin [07:38:28] Ryan_Lane2: yep, gone. 
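The conclusion reached above (services are global, endpoints are per-region) can be sketched with the era-appropriate `keystone` client. This is a hedged illustration only: the URLs and the service name are placeholders, not the actual Wikimedia endpoints, and the commands assume the pre-openstackclient python-keystoneclient CLI in use around folsom/havana:

```
# Find the id of the (shared, region-agnostic) compute service...
SVC=$(keystone service-list | awk '/ compute /{print $2}')
# ...then register a second set of endpoints for it in the new region.
keystone endpoint-create --region eqiad --service-id "$SVC" \
    --publicurl   "http://nova.example.org:8774/v2/%(tenant_id)s" \
    --adminurl    "http://nova.example.org:8774/v2/%(tenant_id)s" \
    --internalurl "http://nova.example.org:8774/v2/%(tenant_id)s"
```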
[07:38:41] but we'll want to add some early access [07:38:47] maybe not [07:38:54] andrewbogott: back [07:39:32] andrewbogott: DNS works fine ("abusefilter-global-main.pmtpa.wmflabs has address 10.4.1.38"); that looks like network. [07:40:01] looks like it's talking to nova-api well [07:40:10] and it's giving back glance data [07:40:31] ah. the proxy stuff is limited to pmtpa right now, eh? [07:40:44] yeah, only because I haven't gotten around to it. [07:40:54] And the proxy lives /in/ labs so requires a working eqiad to create [07:41:00] yep [07:41:01] heh [07:42:04] ok, so to throw the big switch I change [07:42:05] $wgGroupPermissions['cloudadmin']['accessrestrictedregions'] = true; [07:42:06] to [07:42:11] $wgGroupPermissions[]['accessrestrictedregions'] = true; [07:42:12] ? [07:42:20] nope [07:43:01] $wgOpenStackManagerRestrictedRegions = array( 'eqiad' ); [07:43:07] make that an empty array [07:43:14] no restricted regions ;) [07:43:47] Ah, I see. Ok. [07:46:19] Ryan_Lane: I'm copying a gluster volume by piping it through tar… and 'du' says the copy is bigger than the original. Is it possible I copied sparse files wrong? Or something else? [07:46:36] gluster volumes aren't sparse [07:46:44] (I'm doublechecking the numbers to verify that the size mismatch is actually true...) [07:46:45] so no need to specify that option there [07:46:54] and yes, gluster will lie about its du [07:47:03] it's basically completely inaccurate [07:47:04] Oh, ok. Then I will disregard! [07:47:11] * Ryan_Lane hates gluster [07:51:48] * Damianz hates xfs [07:57:51] Ryan_Lane: so, more patches incoming to make up for hotfixes? Or is all that in already? [07:58:04] all done [08:02:52] should I redeploy on virt0? [08:09:21] andrewbogott: redeploy? 
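The sparse-file question above is a fair one even though gluster's unreliable `du` turned out to be the culprit: sparse files are the usual benign reason `du` and the apparent size disagree, and copying one without tar's sparse handling inflates the copy. A small demonstration (file path is just an example):

```shell
# Create a 1 MiB file containing a single real byte at the end;
# everything before it is a hole.
dd if=/dev/zero of=/tmp/sparse_demo bs=1 count=1 seek=1048575 status=none
stat -c '%s bytes apparent' /tmp/sparse_demo   # 1048576 bytes apparent
du -k /tmp/sparse_demo                         # usually far less than 1024K
# GNU tar needs -S/--sparse to preserve such holes when copying;
# without it, the copy's `du` grows to the full apparent size.
```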
[08:09:34] It should be re-enabled [08:09:56] Ah, sorry, I mean, shall I rebase the code on virt0 to match what's in gerrit [08:10:15] I didn't deploy anything to openstackmanager [08:10:19] it was a puppet issue [08:10:41] I merged the fix in and reenabled puppet on virt0 [08:10:54] OSM should be at whatever point you left it :) [08:11:41] ok, great. [08:11:42] Thanks! [08:11:47] Probably you would like to go to sleep [08:12:05] I will be in a bit [08:39:57] andrewbogott: are things working ok for eqiad in wikitech? [08:40:06] or haven't tried? [08:40:41] I haven't done much so far. Just now created an eqiad instance, let's see how it goes... [08:40:56] cool [08:41:41] looks ok so far, dispatched to virt1006 [08:43:48] Ryan_Lane: Yep, new instance came up and looks good. [08:43:54] So that's one tiny test passed :) [08:43:58] \o/ [10:17:01] !log deployment-prep Puppet running on varnish upload cache after several months. Might break random things in the process :( [10:17:02] Logged the message, Master [10:54:37] Hi, I am interested in participating in the FOSS Outreach Program for Women this year. I would like to contribute to the Wikimedia project "Welcoming new contributors to Wikimedia Labs and Wikimedia Tool Labs". How can I get to know more about this project? [10:56:22] sandaru__: That's probably me, but I'm quite swamped today. Do you happen to know when the application deadline is? [10:57:17] March 19 is the deadline for applications [10:58:16] It's ok. I already sent you a mail. Can you reply to that when you are free? [10:59:45] Ah, great. Yes, I will reply -- probably will be on Tuesday or Wednesday though. Feel free to pester me if you don't hear back by then. [11:00:50] ok :) Thank you. [11:04:24] Speaking of which, we also need a "Village Pump" for Tools for the "old" contributors; mailing lists and IRC apparently don't reach everybody. I remember some discussion about the proper name, but I forgot the results :-). 
[14:12:26] notice: Finished catalog run in 3867.26 seconds [14:12:32] Impressive. [14:14:50] nice [14:46:40] !log tools petrb: extending /usr on tools-dev by 800mb [14:46:42] Logged the message, Master [15:15:18] Coren: andrewbogott: mind merging a change for the contint slave labs ? https://gerrit.wikimedia.org/r/116111 :-] [15:16:00] Coren: andrewbogott I need a global configuration for python package manager 'pip' (it does not run on production) [15:16:24] so definitely safe :] [15:16:59] hashar, just a minute and I'll look... [15:17:03] hashar: Comment inline; switch those to \n and we're okay. [15:17:14] thanks! [15:17:48] so all on a very long huge line? :-] [15:18:17] I will use a file instead [15:18:17] easier to edit/spot [15:20:24] hashar: Yeah; I rarely use content => for anything over two lines, because that messes up the puppet manifest. [15:22:38] updated https://gerrit.wikimedia.org/r/#/c/116111/2/manifests/role/ci.pp,unified [15:22:49] but I have put the file in the module contint [15:22:52] referring to it as puppet:///modules/contint/pip-labs-slaves.conf [15:23:00] and the file is modules/contint/files/pip-labs-slaves.conf [15:24:05] Coren: :-) [15:35:09] Coren, is eqiad working OK for you on eqiad? (If you've gotten to that yet…) [15:35:56] eqiad is working okay on eqiad? :-) [15:36:15] I'm sure you mean Wikitech, in which case it does appear to. I haven't run into any problems. [15:42:16] Coren: thx :] [15:46:53] Um… yes, wikitech [15:47:24] Coren: I am (evidently) moments away from starting my weekend, but about to send you an email for proof-reading. [15:55:29] andrewbogott: Email works for me. I'll be able to write my part soon once I get tools working. Again. Because I had to start over. :-) [15:56:03] I am very very sorry about that :( [15:59:04] OK, sent. [15:59:27] Coren, if the hotel internet smiles on me then I'll be online a bit over the weekend. Failing that, I'll see you on Tuesday. 
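The pattern agreed on above (a file in the module's files/ directory instead of a multi-line content => string) would look roughly like the manifest fragment below. This is a hedged sketch: the source path is the one quoted in the conversation, but the target path (/etc/pip.conf, pip's global config location) and the permission bits are assumptions, not taken from the actual change:

```puppet
# Ship pip's global configuration from the module's files/ directory
# rather than inlining it with content =>, keeping the manifest tidy.
file { '/etc/pip.conf':
    ensure => file,
    owner  => 'root',
    group  => 'root',
    mode   => '0444',
    source => 'puppet:///modules/contint/pip-labs-slaves.conf',
}
```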
[15:59:41] I hope you get at least a little bit of weekend in before then :/ [16:01:47] Have a good flight back! [16:04:59] I am off for the weekend have fun folks! [16:59:31] Coren: question about migrating: will it be possible to copy symlinks to the new setup? [17:00:01] I have a system of symlinks that I would rather not have to re-create [17:00:13] Oh, yes -- it makes a tarball so everything should stay put. [17:00:47] Coren: that includes restoring the sym paths ? [17:01:10] None of the paths changes, so yeah. [17:01:55] Coren: I'll hold you to that :) if stuff breaks I'll know whose head to put on a pike [17:06:19] Coren: I know it's kinda ugly but I've dumped everything into a single svn repo and use symlinks to make the structure more usable [17:21:09] Betacommand: Whatever floats your boat. :-) [17:22:32] Coren: since I can't use winscp to copy new files to labs I had to get around that issue by throwing everything into a repo [17:22:54] I'm sure there has to be an actually good scp client for Windows. [17:23:23] I mean, I know Winscp is teh suxx0rz, but surely someone /somewhere/ wrote a decent one. [18:43:51] Coren, will home directories also be sub-copied when running the transfer script manually? [18:44:28] valhallasw: There will be a script provided to just outright copy your home; that copy-to-a-subdirectory is just what will happen if you don't move by hand first. [18:45:11] OK, great. [18:46:57] (It's a last-resort-help-save-my-data thing) [19:23:10] Hi, anyone around? [19:23:17] ssh: connect to host abusefilter-global-main port 22: No route to host [19:23:19] ^ ??? [19:26:32] huh: I'm around. Looks like that box is down for the count. [19:26:38] s/box/vm/ [19:27:19] Is it possible to start it? [19:27:33] If so, how? [19:28:26] huh: Well, any of the project's admins can do so. In practice, that'd be PiRSquared17 [19:28:40] ... which is you. [19:28:40] :-) [19:28:47] Didn't recognize the nick. 
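The migration promise above ("it makes a tarball so everything should stay put") rests on tar storing symlinks as symlinks, which GNU tar does by default (passing -h/--dereference would break it). A quick self-contained check of that behavior, using throwaway paths:

```shell
# Build a tiny tree with a relative symlink, archive it, restore it
# elsewhere, and confirm the link survives as a link.
work=$(mktemp -d)
mkdir -p "$work/proj"
echo data > "$work/proj/real"
ln -s real "$work/proj/alias"
tar -C "$work" -cf "$work/proj.tar" proj
mkdir "$work/restore"
tar -C "$work/restore" -xf "$work/proj.tar"
readlink "$work/restore/proj/alias"   # prints: real
```

Because the link target here is relative, it keeps working as long as the directory layout is preserved, which matches Coren's "none of the paths changes" reply.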
[19:28:53] Sorry ;) [19:29:18] I tried rebooting it on wikitech, but no luck (and the console output was not that helpful). [19:29:40] Is there a command I can use to start it? [19:29:48] What state is it in? [19:29:59] it says active [19:31:24] Then there is little to do but scrutinize the log to see what happened. Presuming your configuration is all in puppet (which it should be) you can simply recreate the instance. [19:31:49] The best I could do is go kill the actual KVM; see if it helps. [19:32:14] (It's possible the system is too broken to respond to the reboot request) [19:32:43] :/ [19:33:20] Coren: is the console output just the result of dmesg ? [19:33:29] Or is it from stderr/stdout/something else? [19:33:42] It's mostly what comes out of syslog. [19:33:48] Which includes dmesg [19:33:56] Okay. [19:37:18] Coren: would the memory be lost? [19:37:39] or virtual memory [19:37:53] Wait, if I kill the vm? Yes. It'd be a cold boot. [19:42:20] Coren: it's okay, I don't think I have anything important there [19:56:21] huh: Should be rebooting now. [19:56:26] ok [20:39:05] huh: Any luck? [20:39:40] pirsquared@bastion1:~$ ssh abusefilter-global-main [20:39:42] ssh: connect to host abusefilter-global-main port 22: No route to host [21:14:38] (PS1) John F. Lewis: Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116160 [21:19:37] (PS2) John F. Lewis: Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116160 [21:27:08] (Abandoned) John F. Lewis: Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116160 (owner: John F. Lewis) [21:30:02] (PS1) John F. Lewis: Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116161 [21:33:17] (PS2) John F. 
Lewis: Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116161 [21:33:49] (CR) PiRSquared17: [C: -2 V: 2] Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116161 (owner: John F. Lewis) [21:34:13] (CR) PiRSquared17: [C: 2] Increase length before removing users from -attacks [labs/tools/WMT] - https://gerrit.wikimedia.org/r/116161 (owner: John F. Lewis) [21:36:01] * abuuuse hides [22:07:49] huh: I think your instance is hosed.