[01:51:38] hello ! i am getting errors when running puppet on labs: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class passwords::phabricator for i-00000102.eqiad.wmflabs on node i-00000102.eqiad.wmflabs
[01:51:38] Warning: Not using cache on failed catalog
[01:52:04] any ideas?
[01:57:40] interesting, i got the same error on puppet-compiler earlier today. i already checked that this class _does_ exist
[01:58:16] in labs/private since https://gerrit.wikimedia.org/r/#/c/166569/
[01:58:40] so i don't know why, but ran into same issue
[02:44:23] Wikimedia Labs / Infrastructure: make Debian Jessie image for labs - https://bugzilla.wikimedia.org/73592 (Daniel Zahn) NEW p:Unprio s:normal a:None make an installer image for Debian Jessie, that can be used on Labs instances currently we are very close to release: https://release.debian...
[02:51:01] I'm trying to test restbase on some new instances, but don't get a ssh login; other pre-existing instances in deployment prep work fine though.
[02:51:19] the failing instances are deployment-restbase01 and deployment-restbase02
[02:52:24] nm, started working about 15 minutes later
[03:11:30] hello
[03:12:06] i'd like to propose a change to the /usr/bin/sql script to the tools-login.wmflabs.org server
[03:12:08] how to do that?
[03:34:16] DiegoQueiroz: Best way is open a bugzilla with the details.
[03:34:46] Thanks Coren, I am doing that
[03:34:47] :)
[03:59:52] Wikimedia Labs / tools: sql script does not accept wildcards as parameter - https://bugzilla.wikimedia.org/73595 (Diego Queiroz) UNCO p:Unprio s:minor a:Marc A. Pelletier Created attachment 17169 --> https://bugzilla.wikimedia.org/attachment.cgi?id=17169&action=edit New version of the /us...
[05:07:43] hi folks, anyone here know the difference between http://tools.wmflabs.org/xtools/pcount/ and http://tools.wmflabs.org/xtools/ec/ ?
[09:59:41] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add hieradata [labs/private] - https://gerrit.wikimedia.org/r/174197 (owner: Alexandros Kosiaris)
[13:50:19] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_avail.value (11.11%)
[13:57:29] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[14:29:14] huh..?? what did just happen to http://en.wikipedia.beta.wmflabs.org/?
[14:29:19] getting a blank white page
[14:29:32] same on http://wikidata.beta.wmflabs.org/
[14:30:01] http://de.wikipedia.beta.wmflabs.org/ seems to work
[14:37:41] Hm, that's not encouraging
[14:38:13] I'll see if I can find logzzz
[14:40:31] Some MWExceptions about a missing VE file, but shouldn't be fatal I think
[14:41:18] Ooh, no space left on device
[14:42:12] But it looks like there's space
[14:42:13] Weird
[14:43:05] Oh, seems to work now
[14:43:22] Oh, no, just once randomly
[14:49:30] Tobi_WMDE_SW, might be https://bugzilla.wikimedia.org/show_bug.cgi?id=73567
[14:50:16] ^ blank page in beta labs
[14:50:27] andre__: See -operations
[14:50:36] /var is full on mediawiki01, maybe others
[14:50:46] hmm
[14:51:02] One possible cause anyway
[14:51:13] andre__: "blank page" only means "there's an error!!!!"
[14:51:22] yeah
[14:56:02] Tobi_WMDE_SW: /var has free space now, seems to work again
[14:56:22] marktraceur: thx for looking into!
[14:56:37] No problem at all!
[16:10:42] Coren: is there any eta for fixing https://bugzilla.wikimedia.org/show_bug.cgi?id=73493 ?
[16:11:20] Merlissimo: I expect to be running the maintain-replicas script sometime tonight; this will fix this and add the new wikis.
[16:11:29] if not i have a lot of work to rewrite all my scripts running queries joining with wikidata using not the default db server
[16:11:50] ok thanks
[17:06:32] andrewbogott: just a fyi, puppet is failing on all labs instances, probably due to upgrade in progress. Nothing to worry about atm, I'd say
[17:06:51] YuviPanda: yeah, almost certainly my fault. Hopefully it will recover in a few minutes
[17:06:58] yeah
[17:07:10] In fact, I bet it's better already
[17:07:55] * YuviPanda tries
[17:10:16] well, splendid, keystone icehouse doesn't work at all
[17:10:34] hmm, puppet still dead, but that can wait.
[17:12:51] YuviPanda: if you have a moment, please log into virt1000 and see if you can tell what's up with the puppet master
[17:12:55] while I wrestle with keystone
[17:13:01] ok
[17:14:27] hmm, something else is already using the port
[17:21:22] YuviPanda: I need to relocate, back soon...
[17:21:31] andrewbogott: ok. I'll keep investigating this.
[17:23:36] oh, nm, staying for a bit
[17:26:50] hmm, both puppetmaster and apache want to listen on port 443
[17:28:51] Maybe there shouldn't be a puppetmaster, I'm not sure
[17:29:09] isn't virt1000 our puppetmaster?
[17:29:26] it is
[17:29:31] it's proxied by apache, I think
[17:48:52] Wikimedia Labs / deployment-prep (beta): File upload area resorts to 0777 permissions to for uploaded conent - https://bugzilla.wikimedia.org/73206 (Greg Grossmeier) p:Unprio>High
[17:58:35] YuviPanda: ok, keystone is now upgraded and sort of working. wikitech remains broken because glance needs an upgrade… I'll start that next.
[17:58:41] But, first changing venue...
[17:58:56] andrewbogott: ok. puppetmaster is actually fine, was a red herring (we just use apache). investigating why it isn't running properly now.
[17:59:08] ok, thanks!
[17:59:21] In theory my openstack nonsense shouldn't have anything to do with puppet, except I did a dist-upgrade.
[17:59:25] So possible the versioning is screwed up?
[17:59:27] Anyway, bbl
[17:59:36] oh, dist-upgrade? to trusty?
[17:59:45] oh, dist-upgrade
[17:59:48] nvm
[18:34:36] YuviPanda: ok, now I'm updating the network node. Might be some service interruptions on the instances.
[18:34:40] Hopefully brief :/
[18:35:52] andrewbogott: yeah, ok. facter's fine except ec2id is now a bit screwed up, investigating.
[18:39:46] ec2metadata still thinks it's on ec2
[19:11:53] (PS1) Legoktm: Add support for creating tarballs of skins [labs/tools/extdist] - https://gerrit.wikimedia.org/r/174468
[19:12:02] (CR) jenkins-bot: [V: -1] Add support for creating tarballs of skins [labs/tools/extdist] - https://gerrit.wikimedia.org/r/174468 (owner: Legoktm)
[19:17:54] (PS2) Legoktm: Add support for creating tarballs of skins [labs/tools/extdist] - https://gerrit.wikimedia.org/r/174468
[20:27:57] andrewbogott: yay on upgrade completion!
[20:28:22] YuviPanda: it was both easier and slower than I expected
[20:28:27] :)
[20:29:18] YuviPanda: how are your grub skills?
[20:29:39] ah, fairly outdated, I think? been 2-3 years.
[20:29:51] I can still poke around if you'd like
[20:30:16] YuviPanda: when I did dist-upgrade on virt1009 grub complained that every volume it was installed on didn't exist
[20:30:19] Which seems… worrisome.
[20:30:29] But now that the upgrade is done I can't really figure out if there's a problem or not
[20:30:36] ah, hmm
[20:30:46] The only way to test is to reboot… which I am not going to do :)
[20:30:47] that seems like something where poking at it might cause more problems...
[20:30:49] yeah :)
[20:32:16] I'm hoping to find someone who can tell by looking...
[20:32:27] I can't get apt to complain a second time though
[20:33:51] log?
[20:34:53] Yeah, it's most likely logged in the apt log
[20:36:01] indeed it is
[20:53:57] andrewbogott: It's not clear why it failed during the apt run, but an update-grub2 worked without issue.
[20:54:31] !log ircnotifier created project for irc notifier relay
[20:54:33] grub-probe is being needlessly noisy, but I'm not worried about it.
[20:54:34] The probe command in the log fails though… and works on other hosts.
[20:54:34] Logged the message, Master
[20:54:50] Oh, it's not saying no such disk anymore?
[20:55:04] !log ircnotifier created instance ircnotifier-test-01 to test ircnotifier and puppetize
[20:55:06] Logged the message, Master
[20:55:06] Yeah, for some grub-probes
[20:55:38] Hmm. It's not clear whether any one of them succeeded; you're right that this bears looking into.
[21:00:12] it's... trying to open the wrong uuid. dafu?
[21:07:46] gwicke: deployment-restbase01 has puppet failures, you're aware, right?
[21:13:20] grub-probe is confused/broken
[21:13:53] I can certainly work around it with a device-map, but I'd rather try to figure out what's happening?
[21:19:23] !log deployment-prep Cherry-picked https://gerrit.wikimedia.org/r/#/c/173336/3 to Beta
[21:19:25] Logged the message, Master
[21:21:01] YuviPanda: thanks, I'll check later
[21:41:47] andrewbogott: This... confuses me. initramfs whines that it can't find /dev/md/virt1008:0 (and :1).
[21:41:55] ... on virt1009 mind you.
[21:42:31] and also mdadm.conf gives /0 and /1 names to the arrays anyways, not the longform.
[21:42:32] why would the devs have the hostname in them at all?
[21:42:55] andrewbogott: That's normal for foreign arrays - they always have names, just hidden when the same as the hostname.
[21:43:01] ok
[21:43:29] But the arrays in the box are named - as expected - Name : virt1009:0 (local to host virt1009)
[21:43:32] and so on.
[21:43:44] Did you copy bits of config from 1008?
[21:53:10] * Coren strangles grub-probe.
[21:55:06] Coren: as far as I know virt1009 was built fresh from puppet
[21:57:16] It's not finding the uuid of the array. But it's *there*
[21:58:10] But I think it's looking for the wrong thing; it shouldn't be trying to find the array uuid (as it does) but the disk uuid which /is/ in /dev/disk/by-uuid
[21:59:28] None of this makes any sense.
[22:06:03] andrewbogott: are there docs on creating a new gerrit repo + importing?
[22:06:12] I don't think so
[22:09:54] andrewbogott: think you can create operations/software/ircyall and import http://github.com/yuvipanda/ircnotifier into it?
[22:10:16] is ok if you're busy with something else, can wait.
[22:10:28] I'm concentrating on something else atm. I'm sure you can do it -- just go to the 'projects' tab in gerrit.
[22:10:52] oh
[22:10:53] right
[22:10:53] ok
[22:10:55] andrewbogott: thanks
[22:13:49] andrewbogott: Why are you getting all the really obscure problems?
[22:13:59] Coren: sorry :(
[22:26:44] * Coren rages at this.
[22:35:43] me too, but at something different
[22:37:05] What, you got another one?
[22:38:18] Um… just, a stupid change in icehouse that breaks some things
[23:16:30] Coren: are you encountering a barely-controllable urge to reboot that box just to see what happens?
[23:16:40] I would be, if I were you
[23:39:14] andrewbogott: I'm semi-confident that the way grub is failing means it almost certainly didn't touch the actual bootloader which means it'd reboot.
[23:39:25] andrewbogott: But isn't that box running instances?
[23:47:40] (CR) Chad: [C: 1] Add support for creating tarballs of skins [labs/tools/extdist] - https://gerrit.wikimedia.org/r/174468 (owner: Legoktm)
[23:53:30] Coren: yes, many. So not so good to reboot :)