[04:25:01] Hi Deskana. I'm told you're the one to bother about approving an OAuth app request. :)
[04:26:17] Pathoschild: Me or any other person in the staff global group. :)
[04:27:02] Pathoschild: I approved your request. :)
[04:27:08] Thanks!
[06:22:29] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools-dev.diskspace._var.byte_avail.value (11.11%)
[06:34:18] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[06:56:58] PROBLEM - ToolLabs: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: tools.tools-exec-12.puppetagent.failed_events.value (33.33%)
[07:21:59] RECOVERY - ToolLabs: Puppet failure events on labmon1001 is OK: OK: All targets OK
[11:56:39] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%) WARN: tools.tools-dev.diskspace._var.byte_avail.value (100.00%)
[12:25:44] Wikimedia Labs / tools: Gerrit Patch Uploader does not work - https://bugzilla.wikimedia.org/73078 (Fomafix) NEW p:Unprio s:blocke a:Marc A. Pelletier https://tools.wmflabs.org/gerrit-patch-uploader/ delivers an empty HTML page.
[12:44:41] Wikimedia Labs / tools: Gerrit Patch Uploader does not work - https://bugzilla.wikimedia.org/73078#c1 (Andre Klapper) Meh. We advertise that on the Gerrit help pages on mediawiki.org... CC'ing Legoktm and Valhallasw who are listed as maintainers on http://tools.wmflabs.org/
[12:46:13] Tool Labs tools / Other: Gerrit Patch Uploader does not work - https://bugzilla.wikimedia.org/73078 (Tim Landscheidt) a:Marc A. Pelletier>None
[12:48:42] Tool Labs tools / Other: Gerrit Patch Uploader does not work - https://bugzilla.wikimedia.org/73078#c2 (Merlijn van Deen) NEW>RESO/FIX Restarted webserver, seems to work again, but I'm not sure why it was broken...
[13:16:44] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[13:57:11] Coren: I have a few hundred files that contain JSON objects separated by \n. Each object maps some text to an ID. What is the best way to restore this dump, so that given an ID, I can fetch the text?
[13:57:29] There are about 15 million objects.
[13:58:50] So making them into individual text files was unwieldy. rm ran out of argument space :/
[14:01:45] w930913: use a database?
[14:01:47] tools-db
[14:01:55] a table with id and 'json'?
[14:01:58] and index on the id
[14:19:55] Tool Labs tools / Quentinv57's tools: Tool cannot provide results for usernames with some diacritical symbols - https://bugzilla.wikimedia.org/69004#c1 (Dan) + one case http://tools.wmflabs.org/quentinv57-tools/tools/sulinfo.php?username=%D0%A7%D1%80%D1%8A%D0%BD%D1%8B%D0%B9+%D1%87%D0%B5%D0%BB%D0%BE%D0%...
[14:45:11] Tool Labs tools / Quentinv57's tools: Tool cannot provide results for usernames with some diacritical symbols - https://bugzilla.wikimedia.org/69004#c2 (Glaisher) This tool is quite unstable now and Quentin has been inactive for months now. [[m:Special:CentralAuth]] now has almost everything that sulin...
[15:23:57] hey labs! I want to start playing around with a distributed solution for wikidata querying - should I just jam that into the mediawiki-core project or can someone make me a new one?
[15:24:12] w930913: My first reflex would be to use a BDB as keystore.
[14:45:11 cont'd]
[15:24:57] Tool Labs tools / Quentinv57's tools: Tool cannot provide results for usernames with some diacritical symbols - https://bugzilla.wikimedia.org/69004#c3 (Cyberpower678) UNCO>NEW a:Cyberpower678 This tool is old. As its maintainer, I'm going to be merging the tool into my edit counter when I h...
[15:29:45] manybubbles: Well, if you can reuse an existing project right now it'd be better as we're a bit tight on resources.
We got new hardware on the way but it'll be some time before it's deployed.
[15:35:27] Coren: I can reuse an existing project but not existing hardware, I believe
[15:35:42] I could try to do something with vagrant locally....
[15:36:17] You mean you need new instances? Then it doesn't really matter whether it lives in a distinct project or not; those are organizational units and have no cost.
[15:36:45] And we're not /out/ of resources, so if you don't need many new instances you're still okay.
[15:37:27] Coren: yeah - it's maybe 3 or 4 small instances, should do it
[15:50:17] YuviPanda: Coren: Are databases that good for storing large blocks of text?
[15:50:32] how large are these?
[15:50:49] YuviPanda: Academic paper sized.
[15:51:01] how much is that in MB? :)
[15:51:07] w930913: Depends which, and what your use pattern is; but a bdb is fairly reasonable as a key/value store with large values.
[15:51:22] are these open access papers? :) would there be copyright issues, etc?
[15:52:02] w930913: That said, the filesystem could also be reasonable if you do something like images and split the files into many directories according to some hashing scheme.
[15:52:25] YuviPanda: We can republish OA, can't we?
[15:53:30] Coren: By hashing, you mean a folder for 1*, 2*, 3*...? :p
[15:54:14] w930913: something like this; as long as whatever scheme you use distributes the files fairly evenly.
[15:54:38] w930913: You might want more than one level though; something like 1/2/12345.txt
[15:55:00] This has the advantage of being very simple to implement, and very robust.
[15:55:01] Coren: 1/2/3/4/5/6/7.txt? :p
[15:55:49] That's what I liked about the FS version. Very simple with little overhead.
[15:55:56] w930913: Possibly, though that may be overkill. It's not worthwhile to reduce the last level to contain too few files; directories with a couple hundred entries are quite okay.
[15:56:23] It worked, but crashed on ls/rm :p
[15:56:30] (The flat FS I mean.)
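[Editor's note: the layered key-to-path scheme Coren sketches above (e.g. 1/2/12345.txt) can be written in a few lines. This is a hedged illustration, not anything deployed on Tool Labs; the function names and the choice of SHA-1 for bucketing are assumptions of this sketch. Hashing the ID rather than taking its literal leading digits keeps the buckets evenly filled, which is the distribution property Coren asks for.]

```python
import hashlib
from pathlib import Path

def shard_path(root: Path, doc_id: str, levels: int = 2, width: int = 2) -> Path:
    """Map an ID to a nested path, e.g. root/ab/cd/<id>.txt.

    Hashing the ID (instead of using its leading digits) spreads files
    evenly across buckets regardless of how IDs were assigned.
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, f"{doc_id}.txt")

def store(root: Path, doc_id: str, text: str) -> None:
    # Create the intermediate bucket directories on demand.
    path = shard_path(root, doc_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")

def fetch(root: Path, doc_id: str) -> str:
    return shard_path(root, doc_id).read_text(encoding="utf-8")
```

[With two levels of two hex characters each there are 65,536 buckets, so 15 million objects land at roughly a couple hundred files per directory, which matches Coren's "a couple hundred entries are quite okay" guideline.]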
[15:57:16] Ok, I'm going to try three or so double digit layers, before resorting to a DB.
[15:57:21] Thanks guys.
[15:57:26] w930913: That's because flat directories with many thousand of files can cause issues in tools, and have inefficiencies.
[15:59:42] s/thousand/million/ :p
[16:01:53] my test wiki on a tools lab [which I use for bug fixes] looks very slow, any common advice for improving wiki performance on labs?
[16:06:34] Coren: so can you make me a wikidata-query labs project? I can be judicious in my node creation
[16:22:04] Coren: wikidata-query - something like that
[19:32:39] manybubbles: {{done}}
[19:32:47] grrrit is gone?
[19:32:48] and wikibugs is gone
[19:32:49] and lots of tools are extremely slow!
[19:32:49] * aude wonders if labs is broken
[19:32:50] wikidata game is not loading
[19:32:50] Coren: ?
[19:32:50] <^d> gerrit's there for me.
[19:32:51] <^d> Oh the gerrit bot.
[19:32:51] <^d> nvm
[19:32:52] yeah
[19:32:52] SSH on labstore1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:32:53] Something is going broken.
[19:32:53] since 10 minutes
[19:32:54] Oy!
[19:32:55] The NFS server went down.
[19:32:55] I'm on it.
[19:32:56] thanks
[19:32:57] <3
[19:32:57] Dafu, the box went completely down.
[19:32:58] ah that's what's wrong with logging in
[19:32:59] Was wondering why my bots died
[19:33:01] Oh, sheisse. Possible hardware failure.
[19:33:01] :(
[19:33:02] m(
[19:33:05] * Coren tries first-pass remedial first.
[19:33:05] did this affect bastion.wfmlabs.org?
[19:33:07] This will have all of Labs down.
[19:33:07] Coren: Erm... That... Was it me? :/
[19:33:09] w930913: No, the raid controller is reporting hardware issues.
[19:33:09] Coren: Was I the straw that broke the camel's back?
[19:33:10] w930913: Not unless you magically found a way to make ram remotely fail.
[19:33:11] * w930913 sighs with relief.
[19:33:11] Coren: can I do anything?
[19:33:12] andrewbogott: No, I'm trying to determine if the ECC error in the backing ram was a fluke or not, and trying to convince the controller to retake the devices.
[19:33:13] yikes
[19:33:16] It looks like it's the shelves that have an issue and they need to be powercycled (at least)
[19:33:16] I'm not sure I know what that means. 'shelf' is different from 'rack' right?
[19:33:19] thanks YuviPanda
[19:33:19] andrewbogott: Disk shelves, three 4U enclosures full of disks.
[19:33:20] Ah, that makes sense
[19:33:21] Some day I should visit the data center and get things pointed out to me. It's all very abstract atm
[19:33:21] Coren: I take it the disk shelves don't have a mgmt console? They're external to the server?
[19:33:22] andrewbogott: Correct.
[19:33:23] andrewbogott: The good news is, chris was already on-site and power cycling them made them visible to the controller again.
[19:33:23] Yeah, I saw. But it sucks that there's no remote option.
[19:33:24] The datacentre needs a telepresent robot!
[19:33:25] That would introduce a whole new range of interesting outage reports.
[19:33:25] :P
[19:33:26] (there are some nice (but outdated) photos about the datacenter on commons, maybe when andrewbogott goes to visit it, he can take new photos)
[19:33:26] :D
[19:33:27] andrewbogott: We might have controllable PDUs in that rack; that would have been my second go-to if chris hadn't been there.
[19:33:32] duh, no logs even
[19:33:36] labs down?
[19:33:36] yes
[19:33:37] howcome...
[19:33:37] !ping
[19:33:37] !pong
[19:33:38] indeed... it's down
[19:33:38] petan: hardware failure in the NFS server
[19:33:38] means that all user logins are broken
[19:33:39] and all shared storage
[19:33:39] petan: We're doomed I tell you. DOOMED!
[19:33:40] is there redundancy?
[19:33:40] it's fun how wm-bot's bouncers are still up
[19:33:40] * milimetric runs around with his head cut off
[19:33:40] why isn't it fixed yet?
[19:33:41] these bouncers never die
[19:33:41] hehe
[19:33:42] There's controller redundancy, but the failure is more fundamental than that so it knocked out the redundant system as well.
[19:33:42] * ^d hides under his desk
[19:33:42] We'll send a post-mortem when things are fixed
[19:33:42] <^d> Hey, it works for nuclear fallout drills in the 60s!
[19:33:43] :D
[19:33:43] if it's good enough for the 60s, it's good enough for me
[19:33:43] Where's an IRC BOFH excuse generator when you need one?
[19:33:44] <^d> w930913: It was running on labs :(
[19:33:44] We should have a redundant labs, codenamed toolserver :p
[19:33:45] silly question: there's no other way for me to read data from /var/lib/git from one of my instances, right?
[19:33:45] YuviPanda: First action, pulling the plug on labs? ;-)
[19:33:46] w930913 :D
[19:33:46] I have to admit that toollabs gives a genuine Toolserver experience these days
[19:33:46] Although I wouldn't mind if we dropped the irregular outages from that experience
[19:33:47] I remember DaB telling us we will be like toolserver once all users move from there :D
[19:33:47] he was so right heh
[19:33:47] btw there is still one thing better than TS, that is user registration
[19:33:47] you don't have to wait 2 years for access :P
[19:33:48] True, a lot of things are better, just getting some things to the better level took a bit too much time
[19:33:48] multichill: :D
[19:33:48] And some things are still behind, but I'm pretty sure YuviPanda is going to attack that part
[19:33:48] But hardware issues, that sounds, ehm, interesting
[19:33:49] * petan slaps wmflabs
[19:33:49] wooden servers
[19:33:49] I'm more scared of software issues than of hardware issues in my day to day operations
[19:33:49] switch to better sw then :P
[19:33:49] or be like me
[19:33:50] write everything yourself :D
[19:33:50] every time I decide to use some SW I find out it sucks so hard that I rewrite it from scratch
[19:33:50] I
[19:33:50] I'm a network engineer.
The software I'm talking about powers switches/routers/firewalls/load balancers
[19:33:50] hi, what's up with labs? I cannot log in to labs, my labs bots don't work and I cannot save pages with the API even from my home computer
[19:33:51] For the first three, writing it yourself or open source is not really an option yet, although some cool stuff is happening right now
[19:33:51] yes, a load balancer was something I was going to rewrite as well
[19:33:51] there is LVM
[19:33:51] that is quite a decent open source load balancer I think
[19:33:51] * LVS
[19:33:52] mbh: There is a serious outage in progress. Good places to look for info are e.g. the topic line in this room, or the labs-l list.
[19:33:52] wm-bot: can you hear us? :o
[19:33:53] Hi petan, there is some error, I am a stupid bot and I am not intelligent enough to hold a conversation with you :-)
[19:33:53] I believe it does, it just can't respond :/
[19:33:53] andrewbogott: any docs on the physical layer of labs openstack deployment ?
[19:33:53] matanya: sure, it's on... eh, labs :P
[19:33:53] petan: i.e. what servers, how many, what cpu / ram etc
[19:33:54] some info was on wikitech
[19:33:54] dunno how recent
[19:33:54] matanya: not really, outside of the puppet code. The virt nodes are https://wikitech.wikimedia.org/wiki/Cisco_UCS_C250_M1
[19:33:54] There's one controller and one network/api node. It's pretty simple.
[19:33:54] Did your UCS setup die, andrewbogott ?
[19:33:55] so no separate node for network?
[19:33:55] multichill: the current failure is in the drive shelf attached to the NFS server that supports shared storage.
[19:33:55] matanya: maybe I don't understand the question. There is a deadicated network node. We're currently using nova-network and not neutron
[19:33:55] since it doesn't really support our use case.
[19:33:56] andrewbogott: so one controller, one network and one compute node ?
[19:33:56] um… dedicated :)
[19:33:56] andrewbogott: That sucks man, controller failure or disk failure?
[19:33:56] one controller, one network node, 9 compute nodes (and we're getting three more in a week or two)
[19:33:56] Lol, I just checked my emails, and saw "YuviPanda joins Ops" immediately followed by "Outage of labs in progress"
[19:33:57] the compute nodes are those big ciscos that I just linked to. The other nodes are modest misc Dell servers.
[19:33:57] multichill: It's unclear so far. Coren and Chris are digging frantically, I'm trying not to bug them with too many questions.
[19:33:57] I see. thanks andrewbogott
[19:33:57] Mercifully, Chris happened to be /at/ the datacenter today, so we have hands on the hardware if necessary.
[19:33:58] matanya: I'll write some of that up right now, since I can't really do what I was planning on doing today...
[19:33:58] thanks!
[19:33:59] Ha, and he is getting blamed now for pulling the wrong plug :P
[19:33:59] really?
[19:33:59] page saving through the API does not work for the same reason?
[19:33:59] 503
[19:34:00] mbh: a) you're not telling us the specifics of what you're doing, so it's hard to comment; b) sounds like something that's more related to #mediawiki or #wikimedia-tech
[19:34:00] valhallasw`cloud: my bot tries to save a page from my local computer and receives a 503 error
[19:34:01] andrewbogott: Maybe you can redirect all traffic to a "houston we have a problem" page?
[19:34:03] shouldn't the outage appear on https://wikitech.wikimedia.org/wiki/Labs_Server_Admin_Log ?
[19:34:03] Platonides: the bot that manages the SAL runs on labs
[19:34:04] andrewbogott: Outage landing page?
[19:34:04] multichill: for what URL?
[19:34:04] andrewbogott, I saw your !log in the other channel :)
[19:34:04] (actually, I wasn't sure on which one it applied)
[19:34:04] *to
[19:34:05] Toollabs for starters
[19:34:05] the nginx is up, so it shouldn't be too hard
[19:34:06] matanya: for starters, https://wikitech.wikimedia.org/wiki/Labs_infrastructure -- I welcome your thoughts or additions.
[19:34:06] andrewbogott: thank you very much, didn't really understand why neutron won't work
[19:34:07] Hm, I just updated https://wikitech.wikimedia.org/wiki/MediaWiki:SiteNotice but that clearly doesn't do what I thought it would
[19:34:07] andrewbogott: https://wikitech.wikimedia.org/wiki/MediaWiki:Sitenotice
[19:34:08] this is the one you need ^
[19:34:08] oh, case mismatch
[19:34:08] you can also give me admin rights and i can help :P
[19:34:08] I will never get used to when mw does and doesn't handle that
[19:34:08] generally, never for the mediawiki namespace
[19:34:09] and now it works for you
[19:34:09] andrewbogott: please correct the link to point at: https://lists.wikimedia.org/pipermail/labs-l/2014-November/003094.html
[19:34:09] matanya: I meant that as a subtle "If you don't already know this from the labs list, subscribe dammit"
[19:34:10] But, yes, I will link that also
[19:34:10] Or, rather, my followup
[19:34:10] a nice way to put it :D
[19:34:11] Platonides: So far I can't get a login on tools-webproxy.eqiad.wmflabs, which would be required to put up an outage notice. I'm still fussing with it.
[19:34:12] andrewbogott: Isn't it load balanced so you can just point it somewhere else on the load balancer?
[19:34:12] multichill: yes, I could monkey with ldap but I don't want to cause another outage in my attempts to notify about an outage...
[19:34:12] ldap? I'm used to load balancers where you can set a last-resort page. If it loses all members it will just show that page
[19:34:13] multichill: tools-webproxy isn't load balanced. another SPOF for tools
[19:34:15] Hi YuviPanda, do you guys keep a backlog for labs for these kinds of things?
[19:34:15] YuviPanda: And will you be in SF in January? I will be :-D
[19:34:15] multichill: not yet, but I'm starting to set something up now. Will make announcements next week.
ATM it is all a bit ad hoc
[19:34:15] multichill: yay :)
[19:34:16] multichill: I will be, for all-staff
[19:34:16] I'll arrive around that time too. Dev summit and then crazy road trip
[19:34:16] yeah
[19:34:16] nice
[19:34:16] * YuviPanda is at the foot of the himalayas, hacking under the moonlight
[19:34:16] mmm
[19:34:17] O_O
[19:34:17] Awesome YuviPanda, post some dull photos
[19:34:18] multichill: indeed :) I shall!
[19:34:18] The company I work at is switching to agile too for operations. That's going to be interesting
[19:34:19] andrewbogott: In theory, the hardware is okay now; I'm going to gradually bring the server back up, checking every step.
[19:34:19] Great!
[19:34:19] :)
[19:34:19] andrewbogott: It looks like it wasn't so much the controller as the cable leading to it.
[19:34:20] Well, that's… cheap to fix!
[19:34:21] multichill: I don't think we're switching to anything :)
[19:34:23] ACTIVE '/dev/store/project' [30.00 TiB] inherit
[19:34:23] * Coren sighs in relief.
[19:34:24] ... perhaps prematurely.
[19:34:24] Ah, no, journal recovery was _long_
[19:34:26] NFS is back up; instances are gradually recovering.
[19:34:27] thank you :)
[19:34:27] !!!
[19:35:10] andrewbogott: I need to go eat; I'm at least 2.5h past my lunchtime. Things should be waking back up gradually, NFS is fine again. Followup email will have to wait until I've eaten.
[19:35:30] Coren: ok, no worries.
[19:35:56] Thank $deity Chris was already on-site; jiggling cables around isn't something that's easy to do remotely.
[19:36:26] thanks for bringing labs back, y'all
[19:45:19] wm-bot, a bot that outlived hardware
[19:45:27] !ping
[19:45:27] !pong
[19:55:49] Hmm, something is starting again. Getting emails from cron: "/bin/sh: execle: Cannot allocate memory"
[20:10:29] Coren: I assume you guys are still working on erros?
[20:10:31] *errors?
[20:11:21] catscan3 has 2 webservices running now
[20:11:33] multichill: That's almost certainly dust falling off the outage, not a new outage.
Lots of stuff will have piled up while the filesystem wasn't there and much of it would have failed; but no notices could have gone out during.
[20:11:34] i get emails about oom
[20:12:06] And as things try to catch up, some will have piled up unreasonably.
[20:12:32] Labs as a whole tends to be wobbly for a little while while things catch up.
[20:15:53] So db replication etc is all running again ?
[20:21:41] db replication was never affected
[20:28:15] when was mono reestablished?
[20:30:40] Wikimedia Labs / Infrastructure: Can't ssh into bastion-eqiad.wmflabs.org - https://bugzilla.wikimedia.org/73089#c2 (Tim Landscheidt) RESO/WON>REOP AFAIUI, someone wants to fix it :-).
[20:32:12] Wikimedia Labs / Infrastructure: Can't ssh into bastion-eqiad.wmflabs.org - https://bugzilla.wikimedia.org/73089 (Gilles Dubuc) NEW p:Unprio s:normal a:None As of a few minutes ago I can't ssh into that server: $ ssh bastion-eqiad.wmflabs.org ssh_exchange_identification: Connection closed...
[20:41:30] there seems to be a disparity between emails maintainers get
[20:41:55] "disparity"?
[20:42:21] my co-maintainer got 30 cron mails, i only 3
[20:42:55] gifti: Perhaps they are throttled or put in the junk folder at your end?
[20:43:12] we'll see
[20:44:40] Wikimedia Labs / Infrastructure: Can't ssh into bastion-eqiad.wmflabs.org - https://bugzilla.wikimedia.org/73089#c1 (Andrew Bogott) NEW>RESO/WON There is a labs outage underway. It's always a good idea to check the labs mailing list before logging bugs like this :)
[20:49:24] Wikimedia Labs / Infrastructure: Can't ssh into bastion-eqiad.wmflabs.org - https://bugzilla.wikimedia.org/73089#c3 (Andrew Bogott) REOP>RESO/FIX ok then :)
[20:49:46] a reminder that mono still does not work
[20:51:53] What do you mean "mono" doesn't work?
[20:56:24] Coren: http://i.imgur.com/nt3DVPA.png
[20:56:52] http://i.imgur.com/5mM04xq.png
[21:00:12] That's unrelated to the previous outage, and doesn't seem to be a problem with mono itself but with your program. It looks like you're trying to load "NotNetWikiBot" but your program can't load it.
[21:00:42] I don't really know much about dotnet programming, though, so there is little I can do to help you; but perhaps others on labs-l might be able to.
[21:04:02] yes, the second screen was because I didn't put the DNWB dll into this folder
[21:51:22] Coren: Around?
[21:51:39] Yep. What be up?
[21:53:14] Coren: I've cleared those fClueBot forks... no idea why it did that
[21:53:37] I take it that didn't happen again?
[21:54:24] 0Nope, well, not yet anywa
[21:54:45] (Fail typing tonight)
[21:56:16] Although... I have bumped up the RAM... as I don't like getting 75 e-mails from Cron Daemon saying that there is no more memory lol
[21:58:10] Wikimedia Labs / deployment-prep (beta): no log in deployment-bastion:/data/project/logs from "503 server unavailable" on beta labs - https://bugzilla.wikimedia.org/72275#c6 (Greg Grossmeier) (Commented on that change)
[21:58:53] i suggest that this is an outage-induced sge hiccup
[22:09:02] gifti: It may well be; mail processing would have also been affected.
[22:49:01] I have a puppet problem on a labs instance: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid tag "role::restbase::labs::otto_cass " on node i-000006b1.eqiad.wmflabs
[22:49:54] this is after adding the (new) role through the web interface, copying it & the module it depends on into /etc/puppet & running sudo puppet agent -tv
[22:56:40] gwicke: If your C&P is precise, then you have an extra trailing space.
[22:57:00] C&P?
[22:57:01] It wouldn't matter in a manifest, but it may be significant in the LDAP entry
[22:57:07] Copy & Paste
[22:57:12] ah
[22:57:30] hmmmm
[22:57:39] that's a good point
[23:02:41] I re-entered it, definitely without a trailing space, but still get the same error
[23:24:33] Coren: have a few minutes to help me investigate a mystery?
[23:24:46] andrewbogott: Sure. What be up?
[23:25:05] openstack-test-horizon2
[23:25:22] If you log in, you will see that the package openstack-dashboard is installed.
[23:25:36] And yet, experience shows me that if I enable and run puppet, it will try to install it again, breaking many things.
[23:25:48] It's set to ensure => present, so why install again?
[23:26:09] (The reason it breaks things is… my manifest waits until the package is installed and then rearranges the config files. Reinstalling de-arranges them and causes trouble.)
[23:26:13] And it's not a virtual package, right? Puppet's apt-get provider is not very good about that.
[23:26:31] I don't think so… how would I know?
[23:27:37] !log integration deleted corrupt mediawki/core clone in workspace/mwext-MobileFrontend-qunit-mobile on gallium
[23:28:00] !log !log is not working :(
[23:28:11] Logged the message, Master
[23:28:12] !log is not a valid project.
[23:29:27] aptitude is more verbose about things like this; but it's not virtual so that's not it.
[23:29:59] And you see puppet actually doing an install again, not just thinking it does it?
[23:30:28] Wait, how are you doing your rearranging? Just having a dependency on the package?
[23:31:27] Yes, actually installing. Which causes it to rearrange config files into a broken state, and then try to restart apache, which fails, thus causing the subsequent dependencies (which would fix the config) to not run
[23:31:54] That's a self-hosted puppet instance, the code in question is in modules/openstack/manifests/horizon
[23:37:02] ... require => package['openstack-dashboard']
[23:37:08] Shouldn't that be 'Package'?
[23:37:18] Hm, yes.
[23:38:21] I think the behaviour with lowercase dependencies is subtly wrong. It might be that simple.
[23:38:22] Would that matter?
[23:38:27] ok, trying...
[23:40:20] holy crap
[23:40:26] Is that somehow a feature?
[23:41:26] Shit, no!
[23:41:32] Coren, the first run was clean but I ran again
[23:41:34] and now it's trying to install it
[23:41:41] (This is after fixing the case)
[23:42:18] This kind of oscillating behavior is exactly why I've burned so much time on this stupid simple manifest
[23:43:44] Personally, I'd do the file cleanup with an exec that creates a file so that you can use a creates => to make sure you never do it twice.
[23:44:28] Also, 'refreshonly => true' on the exec and 'refresh => Exec['the-exec']' in the package?
[23:44:52] OK. But… it's insane that puppet is trying to install the package at all, right?
[23:45:47] Not really; remember puppet stanzas are supposed to be idempotent - I'm pretty sure the dependency doesn't mean 'if you installed' but 'if it /is/ installed'
[23:46:14] Wait, now I'm lost.
[23:46:31] Shouldn't the package definition get traversed but do nothing since it checks to see if something is installed before installing it?
[23:46:43] Or do you think puppet just sometimes reinstalls packages when it feels like it, even if they're already there?
[23:47:29] I also don't follow how the refreshonly logic will prevent getting into the broken state I'm in now...
[23:47:30] It may or may not; but your issue I think is that whatever depends on Package['foo'] isn't going to fire when foo gets installed, but whenever foo is there at all.
[23:48:02] OK, but, that's fine I think? It removes those config files, and then tries to remove them subsequently but they're not there anyway, so, no problem.
[23:48:17] And meanwhile I install a config in a different place that the package is unaware of
[23:48:48] The config file I /want/ is called 50-horizon
[23:49:19] The problem is that when the package reinstalls it then has /two/ configs, which duplicate definitions and prevent Apache from starting
[23:49:41] Ah, but with refreshonly… that fixes it for the second run, maybe...
[23:49:47] Wait...
[23:49:49] Does refresh happen even if the dependency fails?
[23:49:53] Waiting!
[23:50:12] That makes no sense. If you apt-install something that is already there it doesn't fix config files you have removed.
[23:51:29] I think that the package has a postinstall script that creates a symlink
[23:51:39] which is bad because I removed the file it points to
[23:51:56] I guess I should change my exec to empty that file but not remove it. Then the package can point to it as much as it wants.
[23:52:49] That'd be a neat workaround, but I still don't get it. If you 'apt-get install foo' and 'foo' is known to be installed, apt-get doesn't run postinstalls either - it's supposed to do nothing. Unless you have something that /removes/ the package first.
[23:53:20] Now you are back to my original question!
[23:53:24] Try it!
[23:53:25] dpkg --list | grep openstack-dashboard
[23:53:30] apt-get install openstack-dashboard
[23:53:32] It does things
[23:54:43] Wikimedia Labs / wikitech-interface: Get Parsoid (and thus VisualEditor) working again on Wikitech - https://bugzilla.wikimedia.org/73104 (James Forrester) NEW p:Unprio s:normal a:None They broke and were disabled, but we should get them back.
[23:56:09] Ah, "1 not fully installed or removed." The config files aren't properly marked as config files in the package!
[23:56:27] Wikimedia Labs / deployment-prep (beta): An inserted image gives 403 Forbidden - https://bugzilla.wikimedia.org/73102#c1 (Andre Klapper) http://en.wikipedia.beta.wmflabs.org/wiki/File:Paragon_2725918194_4227b11610.jpg "Size of this preview" links also trigger 403s Not an issue in the MediaWiki codeb...
[23:56:29] This is why puppet reinstalls it.
[23:56:33] Ah, so removing the file causes it to think it's not installed.
[23:56:40] So, will blanking the file vs. removing it help with that as well?
[23:56:44] Well, it causes it to think it's broken.
[23:57:11] I don't know if it will; it may well decide that the file is /wrong/ and still want to reinstall -- but it's worth a try.
[23:57:15] lemme try
[23:57:29] Just 'ensure => file' and 'content => ""'
[23:57:42] Bah, still wants to install
[23:57:46] I just cleared it by hand
[23:57:47] But, btw, that's an upstream bug.
[23:58:09] config files should be marked as such in the package - end users are supposed to be allowed to beat config files up without issues.
[23:58:38] Consider how "fun" apache2 would be if it always wanted its 00-default in sites-enabled :-)
[23:58:53] So, can we ask the package why exactly it thinks it's broken?
[23:59:20] There's a way, but I haven't done that in ages. I think you need apt-file
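[Editor's note: some context for the diagnosis above. dpkg records an md5sum for each declared conffile in its status database, which is why users may edit or remove a conffile without the package being considered broken; a tracked file that is *not* declared a conffile and goes missing is treated as package damage, which is consistent with the "still wants to install" result after blanking the file. The sketch below is illustrative only: `conffile_status` and the paths are made up, and real dpkg reads /var/lib/dpkg/status rather than a Python dict.]

```python
import hashlib
import os

def conffile_status(tracked: dict) -> dict:
    """Classify each tracked file roughly the way dpkg's verify logic does:
    'ok' (hash matches), 'modified' (hash differs), or 'missing'.

    `tracked` maps a file path to the md5 hex digest recorded at install
    time (for conffiles, dpkg stores these in its status database).
    """
    result = {}
    for path, recorded_md5 in tracked.items():
        if not os.path.exists(path):
            # For a declared conffile this is tolerated; for an ordinary
            # package file it makes the package look broken/incomplete.
            result[path] = "missing"
        else:
            with open(path, "rb") as f:
                actual = hashlib.md5(f.read()).hexdigest()
            result[path] = "ok" if actual == recorded_md5 else "modified"
    return result
```

[On a real Debian system the same question is answered by tools like `dpkg --verify` or by inspecting the Conffiles field of the package's status entry, rather than by hand-rolled code like this.]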