[00:14:16] 41 $servernames = $site ? {
[00:14:16] 42     'pmtpa' => [ 'nfs1.pmtpa.wmnet' ],
[00:14:17] 43     'eqiad' => [ 'nfs1.pmtpa.wmnet' ],
[00:14:23] from LDAP config
[01:49:40] hehe, nice commit message from andrew
[01:49:42] https://gerrit.wikimedia.org/r/#/c/164489/1
[01:49:55] cool if it helps
[03:37:47] mutante: it worked!
[04:05:55] Did someone do something with SSH on Labs?
[04:06:00] The keys in particular
[04:06:07] I just connected to one of my instances a little while ago.
[04:06:13] Upon reconnecting just now, it's telling me the key has changed.
[04:15:58] Okay, I don't know if this is the "DNS issue" mentioned in the topic, but I think I tracked it down:
[04:16:01] https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000010d.eqiad.wmflabs shows:
[04:16:18] Private IP: 10.68.16.61
[04:16:28] But on bastion1.eqiad.wmflabs:
[04:16:44] The authenticity of host 'mwui (10.68.16.63)' can't be established.
[04:16:48] Different IP!
[04:16:53] ^ Coren
[04:57:32] 3Wikimedia Labs: DNS problem with instance on Wikimedia Labs (apparently two instances with same name) - 10https://bugzilla.wikimedia.org/71595 (10Matthew Flaschen) 3NEW p:3Unprio s:3major a:3None The symptoms are as follows: https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000010d.eqiad.wmflabs (...
[06:37:47] PROBLEM - ToolLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: tools.tools-webgrid-01.puppetagent.failed_events.value (33.33%)
[07:02:37] RECOVERY - ToolLabs: Puppet failure events on labmon1001 is OK: OK: All targets OK
[08:12:30] 3Wikimedia Labs / 3Infrastructure: acct (process and login accounting) fill up instances /var/ partition - 10https://bugzilla.wikimedia.org/69604#c1 (10Antoine "hashar" Musso) The package is acct, which is part of our standard packages installed via puppet. It provides data for commands such as 'ac' or 'la...
[08:22:47] 3Wikimedia Labs / 3Infrastructure: acct (process and login accounting) fill up instances /var/ partition - 10https://bugzilla.wikimedia.org/69604 (10Antoine "hashar" Musso) a:3Antoine "hashar" Musso
[08:24:01] 3Wikimedia Labs / 3Infrastructure: acct (process and login accounting) fill up instances /var/ partition - 10https://bugzilla.wikimedia.org/69604#c3 (10Antoine "hashar" Musso) +bd808 +jeremyb
[08:28:00] 3Wikimedia Labs / 3Infrastructure: atop (monitoring system) logs fill up instances /var/ partition - 10https://bugzilla.wikimedia.org/69605#c1 (10Antoine "hashar" Musso) Example on deployment-bastion: $ du --total -h /var/log/atop* 4.0K /var/log/atop 2.7M /var/log/atop.log 41M /var/log/atop.log.1 3...
[08:38:14] 3Wikimedia Labs / 3deployment-prep (beta): deployment-rsync01 20GB hard drive is too small - 10https://bugzilla.wikimedia.org/71431#c7 (10Antoine "hashar" Musso) Thanks Bryan for the detailed explanation :-)
[12:28:59] Hey guys. I have a little issue that is absolutely important for fawiki. We have a tool served on tools.wmflabs that authenticates with OAuth and works without any issue, but when a user wants to use the tool a second time, its session (a vanilla PHP session) seems to be destroyed after just one hour. I tried every trick mentioned on the net but nothing helped. Does anyone have any idea?
[12:46:29] ebraminio: what tool?
[12:48:10] valhallasw`cloud: /fawikiauto
[12:49:03] valhallasw`cloud: used for automating some tasks on fawiki, like running the "Delinker" bot
[12:49:35] ebraminio: hm. The cookie is set to expire in a week.
[12:49:59] but apart from setting a cookie, https://tools.wmflabs.org/fawikiauto/ doesn't do anything for me
[12:50:17] valhallasw`cloud: https://tools.wmflabs.org/fawikiauto/?action=identify
[12:50:29] you should first visit https://tools.wmflabs.org/fawikiauto/?action=authorize once
[12:51:09] valhallasw`cloud: it just acts as a service for a gadget and doesn't provide a UI itself
[12:51:11] hmm
[12:51:29] so php is killing the session before the session expires?
[12:51:40] valhallasw`cloud: as far as I know
[12:51:41] or is it on the oauth end?
[12:51:48] no, I don't think so
[12:54:09] valhallasw`cloud: I've sent you its source as a pm; it's nothing serious, but as I'm running some python code there I'm not sure about publishing its source
[12:55:04] I can't find anything obvious
[12:55:12] the session length of 7 days is correctly sent to my browser
[12:56:11] valhallasw`cloud: I guessed wmflabs is dropping the session -- or is there something wrong with my code?
[12:56:39] ebraminio: maybe php runs on a different web server for the next request, and the sessions aren't shared? not sure...
[12:58:46] the only other thing I can think of is another tool clearing your cookies
[12:59:56] valhallasw`cloud: a different web server makes sense
[13:00:20] Thank you
[13:00:39] valhallasw`cloud: probably setting the session save path would be useful?
[13:05:22] ebraminio: that should work, yes
[13:49:10] valhallasw`cloud: seems to be working, thank you :)
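[A minimal sketch of the fix ebraminio landed on: pointing PHP at a session directory on the shared NFS home, so every web server behind tools.wmflabs.org sees the same session files. The directory path and lifetimes below are illustrative assumptions, not the tool's actual code; Tool Labs tool homes live under /data/project/<tool>.]

    <?php
    // Hypothetical path on the tool's shared NFS home, visible to all
    // web servers (unlike the per-host default session directory).
    $sessionDir = '/data/project/fawikiauto/sessions';
    if (!is_dir($sessionDir)) {
        mkdir($sessionDir, 0700, true);
    }
    session_save_path($sessionDir);
    // Keep the server-side lifetime in step with the one-week cookie
    // mentioned above, instead of PHP's default ~24-minute gc_maxlifetime.
    ini_set('session.gc_maxlifetime', (string)(7 * 24 * 3600));
    session_set_cookie_params(7 * 24 * 3600);
    session_start();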
[14:02:12] !log deployment-prep rebuilding beta's simplewiki cirrus inde
[14:02:16] Logged the message, Master
[14:02:18] !log deployment-prep rebuilding beta's simplewiki cirrus *index*
[14:02:21] Logged the message, Master
[15:27:56] YuviPanda: I just forwarded you a ganglia status email… I've been getting lots of those.
[15:27:58] Any idea?
[17:56:06] YuviPanda: so… in pmtpa the instance flavor types always had great big /var partitions that varied based on instance size. In eqiad that disk space is allocated but not partitioned until the user partitions it with an lvm partition.
[17:56:28] I regard that as a feature, kind of? Except people sure do complain about /var being too small a lot
[17:56:31] what are your thoughts?
[17:56:37] andrewbogott: I don't think it's a feature, I think it's a bug
[17:56:44] Coren: same question
[17:56:53] andrewbogott: since if you add the biglogs role, it mounts a *new* 8G volume
[17:57:01] andrewbogott: and the old 2G volume still exists, with the old files
[17:57:15] so 'migration' involves essentially stopping everything, applying the role, moving the old log files, then restarting things
[17:57:17] which is a PITA
[17:57:20] so I definitely consider it a bug
[17:57:23] What I'm leaning towards right now is… a slightly bigger /var by default (maybe 10G), but the rest of the space remains partitionable.
[17:57:29] yeah, that sounds good
[17:57:35] Does that seem right? Or do you think we should just get one massive partition like in pmtpa?
[17:57:35] 10G /var should fix most things
[17:58:04] nah, just increasing /var to 10G and leaving the rest as is seems like a good way to go
[17:58:07] I am not positive that I know how to do this, but I'm running a test now
[17:58:10] so people can still get a huge /srv if they want
[17:58:27] Of course it will mean setting up a new array of flavor types and disabling the old one for old instances. I don't think that's a big deal though
[17:58:38] yeah
[17:58:42] (The only thing about old image and flavor types is that we have to keep them around forever to run old instances)
[17:58:51] that seems ok
[17:59:25] For images it's expensive, but a flavor is just a db entry; seems unimportant.
[18:01:52] andrewbogott: yeah
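[The "new array of flavor types" would presumably be created with the nova client of that era, roughly as sketched below. The flavor name, ID, and sizes are invented for illustration; only the command shape is real.]

    # Hypothetical new flavor: name, id, ram (MB), root disk (GB), vcpus,
    # plus ephemeral scratch space users can still partition themselves.
    nova flavor-create m1.small-bigvar 101 2048 10 1 --ephemeral 40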
[18:11:36] No code? https://github.com/x-Tools/SuperCount
[18:11:46] YuviPanda: hm, adding 'ephemeral' space worked, but it mounted it on /mnt
[18:12:12] hmm, isn't there a definition for the 2G of /var that we can just increase?
[18:12:16] * YuviPanda is completely unsure how this works
[18:12:44] I'm unsure too
[18:12:59] heh
[18:13:46] * andrewbogott googles
[18:14:47] heh, there's a file called 'vmbuilder.partition'
[18:14:51] that could be related!
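[For context: vmbuilder's --part option reads a file of "mountpoint size-in-MB" lines, with '---' starting a new disk, so the change under discussion plausibly comes down to editing one number. A sketch with invented sizes, assuming the labs image build uses that format:]

    root 10000
    swap 2048
    ---
    /var 10240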
[18:23:15] andrewbogott: unrelated, but does OpenStack support a 'snapshot' feature for labs?
[18:23:36] YuviPanda: OpenStack sort of supports it, I've never gotten it to work very well.
[18:23:42] yurikR1: ^
[18:23:43] right
[18:23:46] And if we supported it in labs we'd need 20x as much storage space as we have now :(
[18:23:55] ah
[18:24:01] so I guess that's mostly out of the question for now
[18:24:03] 'cause people would use it and then never clean up after themselves, I predict
[18:24:06] heh
[18:24:16] andrewbogott, the reason for it - we need to test various vagrant patches
[18:24:21] It's not a terrible idea. we could quota it.
[18:24:47] yurikR1: hmmmm, we could also set up MWV in docker
[18:24:51] (we already have support)
[18:24:54] But it's not likely to be supported any time soon. Log a bug and make a case and maybe Yuvi will implement it in his spare time :)
[18:24:59] haha ;)
[18:25:18] so every time someone changes vagrant code, we should automatically test whether labs-vagrant would 1) build a new vagrant instance, and 2) be able to update from the current 'master' to the new patch, followed by an immediate rollback
[18:25:43] yurikR1: so we could have a 'mwlabs' similar to toollabs, which lets you provision MW instances with a specific extension setup + config, but use a shared db / redis / proxy
[18:26:08] that would give you snapshots, but more importantly would mean you can throw away broken code easily
[18:26:30] yurikR1: And test every combination of enabled roles and verify that all extensions in the wiki work and ... ;)
[18:26:33] hehe :)
[18:26:50] yuvipanda, not sure we even need the shared workspace, as long as vagrant can do provisioning :)
[18:27:11] yeah, but 20 mysql + redis instances on one machine may not give you the best performance :)
[18:27:19] bd808, that would be coool.... with 30 roles, that would only be ... 30! ?
[18:28:09] yeah. I'm all for better testing for MW-V but we need to form reasonable parameters for it.
[18:28:28] heh
[18:30:36] * YuviPanda wonders if we should experiment with OpenShift to replace labs-vagrant
[18:31:04] YES!!!!!!
[18:31:15] hah :)
[18:31:26] OpenShift on top of OpenStack...
[18:31:30] If that means what I think it means (provision labs vm via vagrant)
[18:31:50] bd808: OpenShift is containers, IIRC?
[18:37:01] andrewbogott: I'm still unconvinced that a bigger /var is a good idea: pretty much by definition, if you've got over 2G of crap filling /var/log you have an issue you need to fix, and that issue isn't disk space. It's bad logging hygiene.
[18:37:35] 2G isn't very much...
[18:37:41] plus mysql puts data in there by default, for example
[18:37:42] If you actually /need/ very large logs (which is almost never true), then you need to manage that space specifically and insulate it from the running system.
[18:37:44] Coren: I'm responding partly to https://gerrit.wikimedia.org/r/#/c/164520/ -- even with log rotation set up they're still overloading it
[18:37:54] besides, /var/log isn't all there is in /var
[18:38:15] Coren: and, just now when I logged into 130 different instances to fix their puppet, many of them were broken due to /var being full.
[18:38:22] Of course, it might be that they'd've been full either way
[18:38:40] YuviPanda: That's a mysql bug and against the standard. *sigh* But it /is/ the broken Ubuntu default.
[18:39:53] andrewbogott: If you really must change the recipe, we should be doing it with LVM, so we can keep the default at a correct small value but it can be *grown* if needed (as opposed to mounting something atop it)
[18:41:25] Coren: With overprovisioning, isn't that essentially what it does to make it 10g? The allocated space doesn't cost us anything unless it's filled.
[18:42:12] Coren: but having it be resizable seems ok -- how would we do that?
[18:42:15] andrewbogott: True, but it *will* get filled - just slightly later. Most instances log too much, keep the logs too long, and don't even use them.
[18:42:34] hashar, how does mediawiki-config work on Beta Labs?
[18:42:40] Hm...
[18:42:41] andrewbogott: If you give 10G, it'll just break when they fill 10G. :-)
[18:42:49] That's true. But it will break less often.
[18:42:53] still will break faaaaar less than with 2G :)
[18:43:12] "Less often" only means "The problem will hit harder and be more surprising".
[18:43:14] In particular, I'm trying to figure out where the checkout is and how private data works.
[18:43:24] Coren: partly it's a question of how perfect we're expecting our labs sysadmins to be
[18:43:37] I see /srv/mediawiki/wmf-config on deployment-bastion, although that's not a git checkout.
[18:43:48] Does it get copied from a git repo somewhere else?
[18:43:51] andrewbogott: Not very. That's why I expect that if you grow /var (and /var/log by extension) they'll just fill it to capacity too. :-)
[18:44:50] andrewbogott: As to how, you just need to change the partitioning recipe. There are stanzas to create an lvm vg and make partitions in it. I never can recall it by heart though, I'd have to look the detail up.
[18:45:13] superm401: I am not really there, sorry. Can you ask some of the other usual deployers from the mw core group?
[18:45:47] superm401: there is also https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated
[18:45:53] sorry, off, diaper duty :D
[18:47:04] andrewbogott: And yes, having /var and /var/log as logical volumes would be a *major* improvement.
[18:48:03] IIRC the only reason I didn't do it is that we were in some sort of rush to have a new precise image for eqiad, so we mostly kept the defaults everywhere.
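[Coren's "grown if needed" suggestion amounts to something like the following on an instance whose /var is an LVM logical volume; the volume group and LV names here are assumptions. The point is that ext3/ext4 can be grown online, with no new volume mounted over the old data the way the biglogs role does.]

    # Assumed names: volume group 'vd', logical volume 'var' mounted on /var.
    lvextend -L +8G /dev/vd/var      # grow the logical volume by 8G
    resize2fs /dev/vd/var            # grow the filesystem to fill the LV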
[19:08:32] YuviPanda: by the way, I forwarded you a labs ganglia email, did it make any sense to you?
[19:08:34] I'm still getting them
[19:31:08] andrewbogott: ah, yes, I've no idea where it's coming from, tho
[19:31:29] YuviPanda: if not you then who? :(
[19:31:33] andrewbogott: there's no ganglia specified in toollabs, perhaps manual...
[19:32:32] andrewbogott: ah bam, found it
[19:32:41] that was fast!
[19:34:02] superm401: still around? Sorry, had some dad duties to complete earlier :D
[19:51:45] !log deployment-prep performing rolling restart of elasticsearch nodes to pick up preview of accelerated regex plugin for testing at larger-than-mylaptop-scale
[19:51:49] Logged the message, Master
[19:53:09] hashar, it's okay, it's resolved, I think.
[19:55:41] superm401: good to know
[19:55:52] superm401: kid(2) keeps me busy :D
[20:58:30] 3Wikimedia Labs: DNS problem with instance on Wikimedia Labs (apparently two instances with same name) - 10https://bugzilla.wikimedia.org/71595#c1 (10Andrew Bogott) There is only one instance named mwui. Its IP is 10.68.16.63 and its ec2 id is i-0000064f. That id, 10d, is quite a low number, which suggests th...
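[For the stale-DNS symptom reported earlier (wikitech showing 10.68.16.61 while ssh saw 10.68.16.63), the mismatch can be confirmed from any labs host with standard tools; the FQDN below is an assumption based on the instance name and project domain.]

    dig +short mwui.eqiad.wmflabs    # what the labs resolver returns now
    host 10.68.16.63                 # reverse lookup of the IP ssh actually hit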