[05:11:58] Tool Labs tools / [other]: Migrate https://toolserver.org/~mauro742 to Tool Labs - https://bugzilla.wikimedia.org/60903#c2 (Nemo) RESO/INV>RESO/WOR I think the crucial service operated by this account is now running under itwp-deletions, additionally for it.wiki lists the others are using https:/... [06:58:38] the Wikidata dump http://dumps.wikimedia.org/wikidatawiki/ is not available yet in Labs ... I thought it was automated ??? [07:06:01] Wikimedia Labs / Infrastructure: latest dump not available again - https://bugzilla.wikimedia.org/66362 (Gerard Meijssen) NEW p:Unprio s:normal a:None The latest Wikidata dump was ready on May 28, the dump should become available soon after this. It does not and consequently the statistics... [07:35:31] Wikimedia Labs / tools: Include imagescaler::packages in tool labs - https://bugzilla.wikimedia.org/66354#c1 (Tim Landscheidt) a:Marc A. Pelletier>Tim Landscheidt We included the fonts packages for bug #58740 in exec nodes in February (since then replaced by ::mediawiki::multimedia::fonts). Due... [14:04:59] Wikimedia Labs: Mail notifications from fab.wmflabs.org delivered only days later (or not at all?) - https://bugzilla.wikimedia.org/65861#c12 (Andre Klapper) RESO/FIX>REOP There seems to be a problem again for me. I have received no Phabricator notifications since Thursday (though I have actively c... [14:49:15] Wikimedia Labs: Mail notifications from fab.wmflabs.org delivered only days later (or not at all?) - https://bugzilla.wikimedia.org/65861#c13 (Marc A. Pelletier) REOP>RESO/FIX /var was full again, because /var/tmp/phd/log had a 1.4G logfile(!) I've truncated it to the last 50K lines, but this is tr... [15:00:15] Wikimedia Labs: Mail notifications from fab.wmflabs.org delivered only days later (or not at all?) - https://bugzilla.wikimedia.org/65861#c14 (Andre Klapper) Sigh. Thanks! Confirming I received mail now. 
> at the very least /var/tmp/phd/log needs to be configured to go into /var/log > (which has more spa... [15:00:19] Is anyone around who knows a bit about deployment-prep? I have a few questions, the first of which is: which of these boxes is the puppetmaster? [15:03:12] * andrewbogott is dumb, can clearly figure that out for himself [15:10:07] !log beta updating all instances to puppet 3 via a cherry-pick of https://gerrit.wikimedia.org/r/#/c/137898/ on deployment-salt [15:10:10] beta is not a valid project. [15:10:24] !log deployment-prep updating all instances to puppet 3 via a cherry-pick of https://gerrit.wikimedia.org/r/#/c/137898/ on deployment-salt [15:10:27] Logged the message, dummy [15:10:55] that certainly sounds rude [15:19:46] !log deployment-prep doing a 'rebase origin' on deployment-salt, because it needs it. [15:19:48] Logged the message, dummy [18:04:52] (PS1) Andrew Bogott: Specify puppet:///modules/etc [labs/private] - https://gerrit.wikimedia.org/r/138379 [18:12:30] (PS2) Andrew Bogott: Specify puppet:///private/modules/etc [labs/private] - https://gerrit.wikimedia.org/r/138379 [18:14:28] (CR) Andrew Bogott: [C: 2 V: 2] Specify puppet:///private/modules/etc [labs/private] - https://gerrit.wikimedia.org/r/138379 (owner: Andrew Bogott) [18:46:56] (PS1) Andrew Bogott: Added a bogus 'lib' dir to the passwords module. [labs/private] - https://gerrit.wikimedia.org/r/138386 [18:47:29] (CR) Andrew Bogott: [C: 2 V: 2] Added a bogus 'lib' dir to the passwords module. [labs/private] - https://gerrit.wikimedia.org/r/138386 (owner: Andrew Bogott) [18:49:05] (PS1) Andrew Bogott: Um... make that dummy dir in the right place. [labs/private] - https://gerrit.wikimedia.org/r/138387 [18:49:31] (CR) Andrew Bogott: [C: 2 V: 2] Um... make that dummy dir in the right place. 
[labs/private] - https://gerrit.wikimedia.org/r/138387 (owner: Andrew Bogott) [18:56:44] (PS1) Andrew Bogott: Revert "Specify puppet:///private/modules/etc" [labs/private] - https://gerrit.wikimedia.org/r/138390 [18:56:58] (CR) Andrew Bogott: [C: 2 V: 2] Revert "Specify puppet:///private/modules/etc" [labs/private] - https://gerrit.wikimedia.org/r/138390 (owner: Andrew Bogott) [19:42:26] scfc_de: more spams are coming these days and some of them can even bypass the Gmail spam filter :( [19:57:42] Coren: pm? [19:57:52] YuviPanda: Que pasa? [19:58:02] liangent: The spammers, they learn too. :-( [20:19:44] Wikimedia Labs: Mail notifications from fab.wmflabs.org delivered only days later (or not at all?) - https://bugzilla.wikimedia.org/65861#c15 (Matthew Flaschen) (In reply to Marc A. Pelletier from comment #13) > at the very least that logfile needs to be configured to go > into /var/log (which has more sp... [20:20:29] Wikimedia Labs: Mail notifications from fab.wmflabs.org delivered only days later (or not at all?) - https://bugzilla.wikimedia.org/65861#c16 (Matthew Flaschen) The /var/log workaround, I mean. [21:38:48] I'm getting home dir errors on parsoid-spof [21:39:30] are there any known issues with home dir storage for the visualeditor project? [21:39:45] it's a very recent thing, was working ~10 minutes ago [21:40:26] Coren: ^^ [22:27:28] andrewbogott: ^^ [22:27:33] gwicke: looking... [22:29:16] gwicke: I can log into towtruck but not into parsoid-spof. [22:29:28] You can log in, I take it? [22:29:47] no home dir -> no login [22:29:49] same for other users [22:30:03] I was actually logged in when the home dir disappeared [22:30:20] all home dirs for that matter [22:30:36] The syslog for that instance shows problems with a local disk. [22:30:39] Mind if I reboot it? [22:30:50] sure [22:33:17] gwicke: When was the last time this box was working properly and doing things and such? 
[22:36:26] 14:39:44 gwicke it's a very recent thing, was [22:36:28] working ~10 minutes ago [22:36:30] y [22:36:57] andrewbogott: ^^ [22:38:43] hm, the exports are right... [22:38:50] Do you know if puppet has mostly been working or not working on that box? [22:42:11] I have not heard of any issues with it [22:42:33] we did lose the home dir quite a few times though; typically that was glusterfs [22:43:59] It can't access the exported NFS volumes. I don't know why not… I can see that those volumes are properly exported to that IP [22:45:00] any issues on the NFS server? [22:45:40] looks fine to me, working for other instances [22:47:28] the odd thing is that it just disappeared [22:48:05] I was trying out aptly (a debian repo manager) in my home dir, and got a write error when I tried to create a repo [22:48:13] then did ls /home, which was now empty [22:50:36] [16054431.786503] init: manage-nfs-volumes respawning too fast, stopped [22:51:11] mutante: that might be my fault… I tried to restart it a couple of times [22:51:14] gwicke: did you just reboot? [22:51:18] andrewbogott: yes [22:51:29] That instance has some kind of local disk failure [22:51:40] the mount points where nfs should be attaching don't exist [22:51:43] and I can't create them... [22:52:46] I have no idea what to do next, when I'm already root and I get 'permission denied' [22:53:52] andrewbogott, with puppet3; puppetd -tv no longer starts a puppet run with results visable to the tty that launched it (because puppetd no longer exists...) -- what is the equivalent command? 
puppet kick is deprecated and we dont have mcollective installed (which seems to be what puppetlabs recommends) [22:54:05] mwalker: 'puppet agent -tv' [22:54:21] ah; thanks much :) [22:54:35] rpcbind: Cannot open '/run/rpcbind/rpcbind.xdr' file for reading, errno 2 (No such file or directory) [22:54:36] rpcbind: Cannot open '/run/rpcbind/portmap.xdr' file for reading, errno 2 (No such file or directory) [22:54:38] mount.nfs: Failed to resolve server labstore.svc.eqiad.wmnet: Temporary failure in name resolution [22:54:59] right after / is mounted rw [22:55:35] and then, a bit later: mountall: mount /public/keys [378] terminated with status 32 [22:56:03] # mount /public/keys [22:56:04] mount.nfs: mount point /public/keys does not exist [23:03:51] gwicke: I don't know what else to try. I can't explain where those mount points went or why I can't recreate them [23:07:08] andrewbogott: it's hard to tell the root cause from looking at the log [23:07:26] are there any hardware issues on the host node? [23:08:00] * Coren returns to dinner [23:08:02] from* [23:08:18] Need me to take a look? What instance is this? [23:08:30] Coren, sure. parsoid-spof [23:08:32] project 'visualeditor' [23:08:49] Things look right on labstore1001 [23:09:00] but the instance won't mount for lack of mount points. where did they go? [23:10:54] andrewbogott: They're being subsumed by autofs. When in doubt, look at /proc/mount. Purging autofs5 now. [23:11:27] oh, of course. [23:11:37] I wonder why autofs decided to suddenly start messing with us? [23:13:00] andrewbogott: beats me; it might have been off but then the box rebooted? [23:13:17] the problem started while gwicke was logged in and typing [23:13:17] All the filesystems are there now. [23:13:26] huh [23:13:27] welp. [23:13:30] gwicke: working now? [23:13:39] yes, just logged in [23:13:45] For some reason, autofs decided to wake up and take over /data and /home at that time. [23:13:59] Which is why I purge the [bleep]ing thing. 
[23:14:24] is autofs still supposed to be installed? [23:14:36] it's the first time it seems to have done this [23:14:43] No; it's leftover from Tampa [23:15:02] gwicke: I purged it by hand when I migrated instances… must've missed yours :( [23:15:10] I see [23:15:26] strange that it decided to wake up randomly [23:15:43] yeah [23:16:25] andrewbogott, when you and gwicke get done -- I need some help figuring out how to make beta::natfix and base::firewall play nice together (or rather, how to make beta::natfix and a ferm::service rule place nice) [23:17:25] mwalker: I don't really know about either of those classes… and I don't have too many minutes left in my day :/ [23:17:29] what's the conflict? [23:18:17] heh; beta::natfix installs ferm, but adding a ferm::service rule doesn't actually add a rule (jeff_green made it work in production by adding the base::firewall class, but adding that in labs removes the natfix rules) [23:18:35] coren, building an image for precise I'm getting a lot of these: invoke-rc.d: policy-rc.d denied execution of start [23:18:45] so basically, I need a way to add a ferm rule via puppet that works in both production and labs [23:18:47] but best I can tell policy-rc.d isn't even set up on that box. Ever see that? [23:19:49] mwalker: put base::firewall on node level [23:19:59] so on an instance in labs [23:20:03] Coren, andrewbogott, mutante: thanks for the help! [23:20:15] mutante, I did that; but doing that removed the natfix rules [23:20:38] gwicke: No worries. If I /had/ to venture a guess, I'd say autofs had been running all along /under/ NFS but then it crashed; so when it restarted it reclaimed the filesystems. [23:21:05] *actually; I may have accidently unclicked the beta::natfix checkbox -- trying again [23:21:47] mwalker: removed ? are they not created via ferm yet? [23:21:55] is that using the old pre-firm iptables puppet stuff? 
[23:22:02] pre-ferm [23:22:29] mutante, andrewbogott; ah; my problem was simply user error -- somehow the beta::natfix role got removed from this box [23:22:31] andrewbogott: ... why do you have policy-rc.d installed at all? [23:23:01] mwalker: aha! so yea, put base::firewall on the instance additionally, just don't add it to the role class [23:23:01] mutante, I don't know; the rules are in /etc/ferm/conf.d -- but I don' tknow how they're created since I needed to add base::firewall to get my rules to work [23:23:01] Coren: it isn't, as far as I know [23:23:19] Or at any rate, I didn't install it; this is the same labs-vmbuilder-precise as always [23:23:25] * Coren odds. [23:23:25] mwalker: if you needed base::firewall for them to work then they are ferm [23:23:37] Coren: sounds plausible [23:23:37] Hm; well, it's supposed to /be/ there but not deny anything. [23:23:50] Coren: I just built a scratch trusty build box and that went fine, I'll start doing that for precise as well. [23:24:26] Oh, wait, that _during_ the vm-builder run? [23:24:29] mutante, that's the fun thing; I needed base::firewall for my rules -- beta::natfix was standalone (e.g. worked without base::firewall); but somehow still uses ferm to implement them [23:24:30] yeah [23:24:43] Coren: Presumably something about the chroot jail [23:24:59] Oh! Then that's perfectly normal; in vm-builder, policy-rc.d will deny services starting. [23:24:59] but also the image creationg is failing, which I take to be related. [23:25:04] oh, so a red herring. [23:25:05] Hm [23:25:23] mwalker: can it be explained by the puppet rule being applied in the past and then removed again ? (puppet does not delete iptables rules when the class is not applied anymore) [23:25:38] yeah, I'd say red herring because services not starting when installing packages in the chroot jail is exactly what you want. :-) [23:26:05] I have "red herring" stalked in my client. [23:26:11] It gets said most often in here, I think. 
[23:26:23] mwalker: well.. does it work when you apply both rules? [23:26:27] roles.. damnit [23:26:36] yes -- now that I've clicked both checkboxes it works :) [23:26:40] Perhaps I should stalk just "herring" and do a color comparison. [23:26:43] ... somehow [23:26:45] Gloria: Now of course I wonder why in blazes you'd stalk that -- but do I really want to know? :-) [23:26:46] mwalker: cool:) [23:26:56] Coren: ultimately the build fails due to a busy mount. Which led me to think that the problem was that policy-rc.d was stopping something that was still running there [23:27:00] …or something [23:27:37] andrewbogott: I'll take a look at it tomorrow if you haven't figured it out. Email me a pointer at your build area? [23:28:03] ok. I'm just following the rules on wikitech… but I will email if the fresh build box has the same issue. [23:37:30] E: Sub-process /usr/bin/dpkg returned an error code (1) [23:37:40] my instance broke when installing package upgrades [23:37:48] N: Ignoring file 'puppet_base_2.7' in directory '/etc/apt/preferences.d/' as it has an invalid filename extension [23:38:00] Errors were encountered while processing: /var/cache/apt/archives/mariadb-server-5.5_5.5.38+maria-1~precise_amd64.deb [23:38:24] mariadb-server : Depends: mariadb-server-5.5 (= 5.5.38+maria-1~precise) but 5.5.36+maria-1~precise is installed [23:38:27] oh man [23:41:05] did mariadb get imported into apt again? [23:46:08] mutante: the puppet_base_2.7 thing shouldn't matter… I see that too but it should be resolved with the new images I'm building [23:46:11] or trying to build [23:48:22] andrewbogott: gotcha, thx [23:48:43] for now just restoring services, gotta look at the package versions later [23:49:05] fwiw, i had already written puppet code to import mariadb back in 2012 or something [23:49:10] incl. the apt key [23:49:21] it disappeard in the "test" branch