[00:08:30] Sisyphus!
[00:52:04] atus
[00:54:12] geohack is broken
[00:54:26] https://tools.wmflabs.org/geohack/geohack.php?pagename=File%3AEFF_photograph_of_NSA%27s_Utah_Data_Center.jpg&params=40.431530_N_-111.933092_E_globe:Earth_type:camera_&language=en
[00:54:38] Internal Error
[00:54:44] should i BZ?
[00:55:10] it links to that from all images on commons that have location metadata
[00:58:50] how might i connect to a mysql slave from a labs instance? i see there are labsdb1001-3.eqiad.wmnet accessible, but they dont like my wikitech l/p
[01:03:00] nm i'm daft, there appears to be a file in my home directory with credentials
[01:03:50] 3Tool Labs tools / 3[other]: geohack.php - Internal Error - "camera location" links on commons broken - 10https://bugzilla.wikimedia.org/67785 (10Daniel Zahn) 3NEW p:3Unprio s:3normal a:3None (all) images on commons that have location metadata automatically link to geohack.php on tool labs in the "Ca...
[01:09:48] 3Tool Labs tools / 3[other]: wiwosm - broken links from commons to open street map - 10https://bugzilla.wikimedia.org/67786 (10Daniel Zahn) 3NEW p:3Unprio s:3normal a:3None (all) images on commons that have location metadata display link to these external sites in the "Camera Location" field: "View...
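[Editor's note] The "file in my home directory with credentials" found at [01:03:00] is, on Labs, typically an ini-style MySQL defaults file (commonly named `replica.my.cnf`; treat the name as an assumption and check your own home directory). A minimal sketch of using it, run here against a canned sample so it works anywhere:

```shell
# Sketch only: parse the ini-style credentials file Labs drops into each
# account's home directory. A throwaway sample stands in for the real file.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[client]
user = u1234
password = examplesecret
EOF

# Pull the user out of the [client] section.
db_user=$(sed -n 's/^user *= *//p' "$cnf")
echo "user=$db_user"

# On a real Labs instance, mysql can read the file directly, which keeps
# the password out of the process list:
#   mysql --defaults-file="$HOME/replica.my.cnf" -h labsdb1001.eqiad.wmnet
rm -f "$cnf"
```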
[01:33:32] !log tools tools-exec-11 and tools-exec-13 have been added to the @general hostgroup
[01:33:36] Logged the message, Master
[01:34:15] Coren: ^ two new nodes added, since all the ones seem to be CPU bound
[01:45:02] Coren: hmm, I added them to the @general hostgroup, but it still seems to be sending any new tasks I start to other machines, despite these being empty
[01:46:32] * YuviPanda|zzz goes to try and sleep again
[01:47:04] Amir1: btw, I added two new hosts (exec-11 and -13), try submitting jobs and see if any of them end up there :)
[01:50:15] YuviPanda|zzz: Thank you
[01:50:23] for your relentless efforts
[01:50:28] I try
[01:52:49] the new job is now in exec-01
[02:03:43] Amir1: yeah, looks like they aren't fully in rotation yet. Unsure why, will have to wait till Core.n wakes up
[02:23:29] Coren: I've removed them from both exec list and host list, will try again when you're around. I did the same steps as for webgrid (but with @general), but qconf -se shows them to have 0 processors and hence no jobs go there. I also realized that these are bigger machines than the other exec hosts (8cores vs 4 cores), I guess we can either delete and start over or just give these more slots.
[04:11:09] heya, Flow API calls on beta labs aren't working right, they're stuck in the past. ebernhardson says "can you have someone poke the APC cache on beta" ?
[04:13:10] beta labs has more than one API server, right? Maybe one of them isn't updated or something?
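[Editor's note] A hedged sketch of the gridengine steps being attempted above. Host and group names come from the chat; the commands are only printed here, since they need a grid master to run. `qconf -se` reporting 0 processors, as seen at [02:23:29], usually means the execution daemon on the node has not reported in yet, so the scheduler treats it as having no slots.

```shell
# Print, rather than run, the qconf sequence for putting a new exec node
# into rotation on the grid master.
new_host="tools-exec-11.eqiad.wmflabs"
hostgroup="@general"

cmds=$(cat <<EOF
qconf -ae ${new_host}
qconf -aattr hostgroup hostlist ${new_host} ${hostgroup}
qconf -se ${new_host}
EOF
)
echo "$cmds"
```

The first command registers the execution host, the second appends it to the hostgroup's hostlist, and the third verifies that the node reports a non-zero processor count.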
[04:13:15] specifically, its not finding a class the certainly exists and is in the autoloader
[04:23:27] there might be something in deployment-bastion:/data/project/logs but I'm not sure what to look for
[04:26:07] ebernhardson: I issued the API request in a new browser tab and it worked fine,
[04:26:13] http://en.wikipedia.beta.wmflabs.org/w/api.php?action=flow&format=json&page=Talk%3AFlow&submodule=view-topic-summary&workflow=rxgdvyxm2fypkekf&vtscontentFormat=wikitext
[04:26:20] spagewmf: i was able to get the error from beta over here too though
[04:26:38] well, its just a general error page no details so could be something different
[04:27:06] failed in-page and successful both had the same X-Cache: deployment-cache-text02 miss (0), deployment-cache-text02 frontend miss (0). I don't know how to tell which server did the work.
[04:27:18] ^I mean HTTP response header
[04:27:35] hmm, a few refreshes later and it works again
[04:28:08] but i see te same error in /data/project/logs/fatal.log @deployment-bastion
[04:28:53] its always 02 with the fail
[04:32:39] ok i just bounced the apache on 02, all is fine
[08:15:08] (03PS1) 10Lokal Profil: Add new list for Sweden [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145244
[09:07:10] (03PS1) 10Lokal Profil: Update Defaults to wmflabs [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145254
[09:08:24] (03PS1) 10Lokal Profil: Https for markers and images [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145256
[09:09:03] (03CR) 10Lokal Profil: [C: 031] Https for markers and images [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145256 (owner: 10Lokal Profil)
[09:09:54] (03CR) 10Lokal Profil: [C: 031] Update Defaults to wmflabs [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145254 (owner: 10Lokal Profil)
[09:13:21] (03CR) 10Lokal Profil: [C: 031] Add new list for Sweden [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145244 (owner: 10Lokal Profil)
[09:13:33] (03CR) 10Multichill: [C: 032] "Are you working down the to do list at https://wikitech.wikimedia.org/wiki/Nova_Resource:Local-heritage/SAL ?" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145254 (owner: 10Lokal Profil)
[09:13:43] (03CR) 10Multichill: [V: 032] "Are you working down the to do list at https://wikitech.wikimedia.org/wiki/Nova_Resource:Local-heritage/SAL ?" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145254 (owner: 10Lokal Profil)
[09:17:02] (03CR) 10Multichill: [C: 032 V: 032] "You don't have to +1 your own changes. Just use the -1/-2 if you notice an error after you submitted the change." [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145244 (owner: 10Lokal Profil)
[09:21:10] (03CR) 10Multichill: "Wouldn't // instead of http:// or https:// work here?" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145256 (owner: 10Lokal Profil)
[09:38:06] (03PS2) 10Lokal Profil: Https for markers and images [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145256
[09:39:37] (03CR) 10Lokal Profil: "True. Also spotted that the image is loaded through CommonFuntions.php (only the link to the image is in FormatKml)" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145256 (owner: 10Lokal Profil)
[09:42:13] !log local-heritage Updated Default.php to point to toollabs instead of toolserver
[09:42:15] Logged the message, Master
[09:42:41] !log local-heritage Added se-arbetsliv a list for Working Life Museums in Sweden
[09:42:42] Logged the message, Master
[09:43:31] hi lokal-profil
[09:43:46] !log local-heritage Images and markers in Kml now load from // instead of http://
[09:43:47] Logged the message, Master
[09:43:59] hi multichill
[09:44:10] Good job :-)
[09:44:57] You could include the link to gerrit patchset in the log. Makes it easier for others.
[09:45:14] The http:// isn't submitted yet, right? Or did you do a local patch?
[09:45:34] thanks. True, didn't think about that. Will fix it manually.
[09:47:13] not submitted, to fast on the log trigger for that one. New patch up though
[09:47:50] not sure I understood about the todo on SAL
[09:57:11] lokal-profil: I did some local changes, but was too lazy to push them into Gerrit
[09:57:39] (03CR) 10Multichill: [C: 032 V: 032] "Mixed content warnings suck" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145256 (owner: 10Lokal Profil)
[09:59:45] ok. Does it change any of the i18n settings in Default.php/database.inc?
[10:02:44] think so, hold on
[10:03:19] lokal-profil: Just do a "git diff" to see the changes
[10:15:43] multichill: Should still load intuition data in the same way though.
[10:16:07] I had to make some changes to get that working again
[10:16:07] and i18n paths had been updated in both places already
[10:16:30] ah
[10:54:58] (03PS1) 10Lokal Profil: Add .gitignore [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145268
[10:57:47] hey
[11:12:07] i have to copy quite an amount of tiles to maps-tile3 but i'd love to have it faster then last time
[11:12:22] anyone who can advice on an appropriate copy path?
[11:13:37] it would finally have to arrive at maps-tiles3:/data/project/tiles
[11:19:59] Coren: ? ^^
[11:20:48] !log local-heritage Created ~/temp so that the change in https://gerrit.wikimedia.org/r/#/c/145254/1/api/includes/Defaults.php doesn't produce an error any more
[11:20:50] Logged the message, Master
[11:30:17] 3Tool Labs tools / 3[other]: geohack.php - Internal Error - "camera location" links on commons broken - 10https://bugzilla.wikimedia.org/67785#c1 (10This, that and the other) 5NEW>3UNCO Works for me!
[11:46:38] !log local-heritage Corrected commands at [[commons:Commons:Monuments_database/Harvesting]]
[11:46:40] Logged the message, Master
[12:28:31] 3Tool Labs tools / 3[other]: geohack.php - Internal Error - "camera location" links on commons broken - 10https://bugzilla.wikimedia.org/67785#c2 (10Andre Klapper) Currently works for me too...
[12:29:46] 3Tool Labs tools / 3[other]: wiwosm - broken links from commons to open street map - 10https://bugzilla.wikimedia.org/67786#c1 (10Andre Klapper) 5NEW>3UNCO Works for me
[12:43:13] !paste
[12:43:13] http://tools.wmflabs.org/paste/
[12:56:36] Coren: Did we have some sort of hickup in dns or labs as a whole?
[12:57:14] I just noticed tools.wmflabs.org not resolving
[12:57:33] Coren: godog just restarted the DNS server there, I think
[12:57:53] (is still not resolving for me)
[12:58:38] Restarting a dns server shouldn't break all resolving unless you set the ttl really short
[12:58:39] ah forgot to log it here, I've restarted opendj on virt1000
[12:59:10] multichill: what record doesn't resolve?
[12:59:22] godog: The most important one: tools.wmflabs.org
[13:02:44] so tools does resolve for me, bastion-restricted e.g. doesn't
[13:02:44] $ dig +short tools.wmflabs.org @labs-ns0.wikimedia.org
[13:02:45] 208.80.155.131
[13:03:06] or does it?
[13:03:06] $ dig +short bastion-restricted.wmflabs.org @labs-ns0.wikimedia.org
[13:03:07] 208.80.155.155
[13:03:23] godog: @labs-ns1.wikimedia.org is borked
[13:03:53] multichill: ack, taking a look
[13:04:21] (dig tools.wmflabs.org @labs-ns1.wikimedia.org returns an empty record)
[13:09:14] multichill: better now?
[13:09:28] yup
[13:09:47] Do you plan on changing tools.wmflabs.org. 3600 IN A 208.80.155.131 a lot?
[13:09:58] Otherwise it might be smart to bump the ttl to a day
[13:10:41] not sure what are the plans there, this was related to ldap not accepting new connections and pdns not retrying when ldap was back up
[13:12:03] godog / YuviPanda : You should have a (Nagios?) dns check for this. That way you'll see it right away when it breaks
[13:12:04] YuviPanda: Some housekeeping: Could you please abandon https://gerrit.wikimedia.org/r/#/c/125241/, https://gerrit.wikimedia.org/r/#/c/139685/ and https://gerrit.wikimedia.org/r/#/c/142819/ ?
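[Editor's note] The debugging above boils down to querying the same record on each authoritative server and flagging a disagreement or an empty answer. A sketch of that check; the helper is exercised on canned strings so it runs offline, and the live dig invocations mirror the ones quoted in the chat:

```shell
# Succeed only when both nameservers gave the same, non-empty answer.
same_record() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Offline demonstration with canned answers:
same_record "208.80.155.131" "208.80.155.131" && echo "in sync"
same_record "208.80.155.131" "" || echo "labs-ns1 out of sync"

# Live usage (not run here):
#   a0=$(dig +short tools.wmflabs.org @labs-ns0.wikimedia.org)
#   a1=$(dig +short tools.wmflabs.org @labs-ns1.wikimedia.org)
#   same_record "$a0" "$a1" || echo "nameservers disagree"
```

Wired into a cron job or a monitoring check, this would have caught the half-broken state (ns0 fine, ns1 returning empty answers) the moment it appeared, as multichill suggests.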
Also, I think https://gerrit.wikimedia.org/r/#/c/144615/ has been obsoleted by Rush's patch.
[13:12:53] multichill: there was one, it was handled on -operations (since this is on virt1001 I think)
[13:14:09] scfc_de: there :)
[13:14:25] scfc_de: I want to rescue the python code from the mongo patch for accounts for postgres users (and mysql ones as well, eventually), but that can come later
[13:14:26] YuviPanda: They closed down their Nagios, see https://bugzilla.wikimedia.org/show_bug.cgi?id=60112
[13:14:46] multichill: ah :(
[13:14:51] fuckers
[13:14:54] multichill: but icinga-wm complains in -operations
[13:15:24] YuviPanda: Thanks!
[13:15:29] scfc_de: :D yw!
[13:15:55] scfc_de: btw, the new hosts I created weren't the same size as the old ones. were twice as big :(
[13:17:10] multichill: The problem with Icinga are some security bugs, and I even backported the patch to the version used on the cluster, but haven't had the energy to test + push upstream since ... January? If you want to, I can give you the code.
[13:17:21] YuviPanda: Are the hosts already in use?
[13:17:29] scfc_de: no, they aren't.
[13:17:48] Well, you could go back, but on the other hand I think unallocated space isn't "really" used.
[13:18:08] scfc_de: yeah, but it's mostly the fact that CPU/Memory will be different and harder to see load 'at a glance'
[13:20:15] YuviPanda: For a quick overview of load, you can always use "qstat -f".
[13:20:29] scfc_de: true, true.
[13:20:38] scfc_de: I'm personally ok with it, let's see what Coren says
[13:25:50] !log local-heritage dns was broken, because of that api has been acting up for the last 2 (?) hours
[13:25:51] Logged the message, Master
[13:26:28] scfc_de: I'm not going to do the operations job. They should fix up their systems, that's part of their job description
[13:46:34] ... there are problems AGAIN with webservices ....
[13:46:35] !@#$%
[13:46:56] I cannot do my thing in this way..
I relay of reliable services
[13:54:11] GerardM-: If you have the url of the tool, maybe YuviPanda can give it a gentle push
[13:57:48] yeah, I think this problem needs fixing stat
[13:59:57] .. it is now running again ...
[14:00:49] tried to add some MeP to Wikidata
[14:07:10] yuvi .. this one should show results of a query and does not http://tools.wmflabs.org/reasonator/?q=Q9014290
[14:07:22] GerardM-: looking
[14:10:39] GerardM-: hmm, unsure what's happening. no obvious issues I could see :(
[14:49:15] Show all jobs for tools that a user is member of: "qstat $(groups magnus | sed -e 's/^[a-z]\+ : //;' | tr \ \\n | sed -ne 's/^tools\./-u tools./p;')"
[14:50:09] So from the looks of it it can either be tools.wikidata-todo or tools.commonshelper.
[14:51:22] For the former, error.log shows a couple of "server stopped by UID = 0 PID = 16094" => probably OOM.
[14:53:30] Requested memory is 7 GByte, maxvmem = 6.034G.
[15:07:42] godog: a bit late for this, but… pdns pretty much always crashes whenever opendj restarts.
[15:07:56] So I always restart pdns on both hosts (virt0, virt1000) after messing with opendj
[15:10:20] andrewbogott: oh ok! I thought it was smarter than that :( anyways I have https://gerrit.wikimedia.org/r/145282 out for review
[15:10:28] some day replace with gdnsd?
[15:11:13] some day :/
[15:11:28] godog: do you know what the ulimit is now?
[15:12:02] andrewbogott: ye I think it starts at 4096, that was exhausted on virt1000 when I looked tho
[15:12:19] ok
[15:17:34] !@#$%^ services down again :(
[15:18:04] .... two minutes load time ... is not down ...
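[Editor's note] scfc_de's one-liner at [14:49:15] is worth unpacking. This sketch applies the same pipeline to canned `groups` output so it runs without gridengine; the sample user and tool names are illustrative, not verified:

```shell
# Turn "user : group1 tools.a tools.b ..." into "-u tools.a -u tools.b "
# suitable for passing to qstat.
groups_output="magnus : users project-tools tools.wikidata-todo tools.commonshelper"
flags=$(printf '%s\n' "$groups_output" \
    | sed -e 's/^[a-z]\+ : //' \
    | tr ' ' '\n' \
    | sed -ne 's/^tools\./-u tools./p' \
    | tr '\n' ' ')
echo "$flags"
# Live: qstat $flags   -- lists jobs of every tool the user is a member of
```

The first sed strips the leading "user : " prefix, tr puts one group per line, the second sed keeps only `tools.*` groups while prefixing each with `-u `, and the final tr joins everything back into one argument list.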
but bad
[15:24:13] godog: naturally when you roll out that change and restart opendj… you will also need to restart pdns :)
[15:28:32] andrewbogott: hehe indeed, I'm also suspecting opendj is leaking file descriptors
[15:29:29] godog: if you want to just rip it out and put in gdnsd I won't stop you :)
[15:30:33] heh I'll file an RT so we don't forget at least
[15:31:47] GerardM-: I'm not ignoring you but I also don't much know how to investigate. Hopefully Coren or a useful volunteer will be along shortly...
[15:32:11] GerardM-: I do know that we're having some growing pains w/all the new TS arrivals.
[15:35:01] andrewbogott: The webservices are pretty much isolated against each other, so that shouldn't matter that much.
[15:35:17] scfc_de: ok, any idea what GerardM- is seeing then?
[15:37:24] I'm along. What's up?
[15:37:26] andrewbogott: No.
[15:37:39] GerardM-: Talking about Magnus' tools?
[15:39:02] At the cost of being Captain Obvious, isn't there a way to spread load more? :) https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&m=load_report&s=by+name&c=Virtualization+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4
[15:39:55] Nemo_bis: We could reschedule webservices on -01 and -02, but that would cause momentary disruption to those.
[15:40:19] Oh, you mean the virtual nodes themselves; my mind was still on webservices.
[15:40:45] Nemo_bis: you're talking about 8 and 9 being idle?
[15:43:17] Nemo_bis: virt1009 is depooled at the moment, virt1008 I only just set up and added a few days ago. So it will fill up gradually.
[15:43:58] I believe that the scheduler looks at allocated resources and not activity. So it may be imprecise and get us a bunch of idle instances on one box and a bunch of busy ones elsewhere...
[15:44:13] we could move some busy tools instances to virt1008
[15:44:14] andrewbogott: The instances are spread over virt* manually IIRC? I. e., OpenStack doesn't move them by itself?
[15:44:37] There's balancing on VM creation time, but once they're on a host they stay there
[15:44:43] it requires downtime to move a VM between hosts.
[15:45:48] Coren: saw my messages from yesterday?
[15:46:50] YuviPanda: Yeah, I don't think there's anything /wrong/ with the bigger nodes per say, gridengine does manage it, but it makes things a bit more complicated.
[15:46:55] per se*
[15:50:02] Coren: right. I'm ok with either, so let's keep 'em
[15:53:11] YuviPanda: You added 3? I thought you added 2?
[15:53:23] Coren: right, so I was going to add 2, but the second one I accidentally added trusty
[15:53:33] Coren: so I haven't added it to any host, but testing the packages on it
[15:53:34] role
[15:53:43] we should add a trusty queue later on, I THINK>
[15:53:48] Aha. -12 is it?
[15:54:08] Coren: can you add -11 and -13 to the hostgroup? I tried yesterday (first with -Ae and then with -mhgrp), and it seemed to affect nothing
[15:54:15] Coren: -se on the new hosts showed them to have 0 processors
[15:54:33] Coren: yeah, -12 is trusty. https://gerrit.wikimedia.org/r/#/c/145195/ should fix most of the failures, I belive. Wanna merge? :)
[15:54:56] I'm going to do you one better and add a requestable resource for 'trusty' and 'precise' soon; I don't want a distinct queue for stuff that may work perfectly well on either.
[15:55:31] Coren: ah, cool! that's nice.
[15:56:26] Coren: ty, let me see if puppet succeeds now
[16:00:04] !log tools manually removed mariadb remote repo from tools-exec-12 instance, won't be added to new instances (puppet patch was merged)
[16:00:07] Logged the message, Master
[16:07:17] Coren: hmm, a bunch of packages seem to have different names.
[16:07:45] YuviPanda: Yeah, that'll take some time to sort out; I'm not going to put it in the queue until then.
[16:08:08] Coren: yeah, I'm sorting them out now. will have to branch in puppet, but they mostly have equivalents
[16:13:30] YuviPanda: 11 and 13 added.
[16:13:35] Coren: woot
[16:14:25] brb lunch
[16:22:35] Coren: you seem to have added -12 as well? I see it in http://tools.wmflabs.org/?status
[16:28:43] -12 is there, but not part of any queue.
[17:51:36] could someone please restart autolist2 ?
[17:51:48] 400 - Bad Request
[17:52:05] it is said to run out of memory
[17:52:47] GerardM-: there's no such tool called autolist2, can you give me the URL?
[17:58:10] YuviPanda: http://tools.wmflabs.org/wikidata-todo/autolist2.php
[17:59:05] GerardM-: try again?
[18:44:09] qchris, any thoughts on why wikitech would be claiming an instance doesn't exist when it definitely does exist (its running and I can log into it): https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=collection-alt-renderer&instanceid=1cc1be94-dde3-44e1-abf4-846535702a14&region=eqiad
[18:44:37] this is probably related to me also not being able to create new instances in this project -- collection-alt-renderer
[18:44:57] mwalker: No clue. I hardly have labs experience :-/
[18:45:02] mwalker: I had the same problem and logging out and in fixed this.
[18:45:14] was just about to suggest that.
[18:45:23] It fixed a problem for me two days ago :-)
[18:45:30] heh; ok! trying :)
[18:45:35] also consider turning your computer on and off
[18:45:38] who is the labs admin btw?
[18:46:01] or is it just whoever is unlucky enough to pick up the rt tickets?
[18:46:06] YuviPanda: I turned my computer on and off, but now it's not doing anything
[18:46:13] oh dear
[18:46:20] first I messed up horses vs ducks
[18:46:20] now this
[18:46:21] WHY
[18:46:30] (03CR) 10Multichill: [C: 032 V: 032] "ok" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/145268 (owner: 10Lokal Profil)
[18:46:32] mwalker: usually andrewbogot.t and Core.n
[18:46:55] And YuviPanda if they are not around :-D
[18:47:04] ah; yes; that rings a bell :)
[18:47:06] qchris: I don't have root on labs :D
[18:47:08] qchris: only toollabs
[18:47:52] Lame excuse ... I don't have root on gerrit.wikimedia.org, but still I am told to fix things there :-P
[18:48:08] fascinating... that worked
[18:48:14] mwalker: \o/
[18:48:28] qchris: :D
[19:34:44] andrewbogott: Any ETA on my patches for Echo on wikitech being deployed? Just curious.
[19:35:11] bd808: usually I try to schedule a deployment window for OpenStackManager roll outs.
[19:35:17] Which I haven't done so far :)
[19:35:44] Sure. No worries. I just got another "empty" email and it made me curious
[19:52:46] Damn. MariaDB [nlwiki_p]> SELECT * FROM wikidatawiki_f_p.wb_entity_per_page LIMIT 1; ..... 1 row in set (16.75 sec)
[19:53:14] federated tables performance issues?
[19:53:56] !ping
[19:53:57] !pong
[19:56:20] YuviPanda: Sure looks like it
[19:58:22] hi there! is there any issue known with tools-db?
[19:58:23] because my bot keep returning error with it : (2005, "Unknown MySQL server host 'tools-db' (11)")
[19:58:47] yet I’ve just tested it on eqiad and it looked just fine…
[20:22:28] Toto_Azero: Hours ago there was a (very short) DNS outage. Does your error happen now?
[20:23:00] scfc_de: yes, at least a few dozens of minutes ago
[20:23:04] let me check right now
[20:23:29] scfc_de: less than 5 minutes ago
[20:25:16] YuviPanda: Doh! labs-vagrant is very broken now
[20:26:14] YuviPanda: https://gerrit.wikimedia.org/r/#/c/144102/ got rid of helpers.rb
[20:26:21] * bd808 will debug and try to fix
[20:26:40] bd808: oh, guh
[20:26:45] bd808: damn.
[20:27:33] Toto_Azero: I can connect just fine. What is your tool's name?
[20:28:46] scfc_de: totoazero ; yet the problem is that I can connect fine too manually, but it looks like scripts running with SGE can’t
[20:29:14] scfc_de: lack of iptables/NAT on the new nodes maybe?
[20:31:01] No, the task is running on -exec-06, if I see that correctly.
[20:31:56] right
[20:32:16] Toto_Azero: In which file can I see the error message?
[20:32:35] YuviPanda: And tools-db doesn't need NAT, so it shouldn't be affected anyhow.
[20:32:40] right
[20:35:01] !log tools tools-exec-11, tools-exec-12, tools-exec-13: iptables-restore /data/project/.system/iptables.conf
[20:35:03] Logged the message, Master
[20:35:48] scfc_de: we should fern those :)
[20:37:51] YuviPanda: I tried that once and failed miserably :-). I'd prefer to move the aliases to DNS and the NAT on the DB hosts.
[20:38:01] scfc_de: :D right
[20:53:57] scfc_de: in totoazero/logs/07/10/unblock.out for instance
[20:54:05] (sorry I was afk)
[21:07:32] Toto_Azero: Very odd. When I (as tools.totoazero) log into tools-exec-10 and do "mysql -us51245 -p -htools-db", it works without problem (tools-exec-10 being the last host the job was executed on).
[21:31:37] hi Is the time in tools labs and the grid in GMT?
[21:31:51] rohit-dua: should be
[21:33:40] YuviPanda: both grid and instance will haave the same time in gmt.. right?
[21:33:46] should, yeah
[21:36:07] And with ntp sync, they should probably be not more than a second apart.
[21:48:36] hi labs folks
[21:48:39] Error 400 on SERVER: Could not find class passwords::mysql::phabricator
[21:49:09] anyone know how to fix that? It's been happening in labs for a week or more
[21:49:24] christian did some magic but he's not around so I'm a bit stuck
[21:49:40] milimetric: yeah, I just opened up the phabricator role and commented out the lines including that include and the lines after
[21:49:42] He is not?
[21:49:47] ugh!
[21:49:53] you monitor the word christian now?!
[21:49:54] :P
[21:50:20] thanks YuviPanda; qchris - is that how I should proceed on production wikimetrics?
[21:50:30] milimetric: I basically did what Yuvi said in the commit that has the wikimetrics passwords.
[21:50:39] ok, so I'll do the same in prod
[21:50:41] thanks!
[21:50:46] milimetric: aaaah, right. I don't think that'll work for things that aren't your own puppetmaster, actually...
[21:51:00] it's ok YuviPanda, we're self-hosted here
[21:51:37] milimetric: ah, cool
[21:51:50] milimetric: btw, the underlying problem is the labs private repo not being sync'd properly, unsure how to fix
[21:52:06] yeah, annoying, seems like we shouldn't merge stuff that does this though
[21:55:03] milimetric: I think you just need to "git pull" the labs/private repo. Basically operations/puppet and labs/private need to be updated in parallel.
[21:56:13] milimetric: can you try what scfc_de suggested to see if that fixes it?
[21:56:15] theoretically it should
[21:56:47] hm, I'll try that on staging at some point, but I'm on production now and it's working so I'm afraid to break it :)
[21:57:15] milimetric: :D ok
[21:58:06] milimetric: Production = WMF production? Then you need to ask someone from ops; I don't think there should be labs/private outside of Labs :-).
[21:58:10] nonono
[21:58:12] heh :)
[21:58:24] scfc_de: I think wikimetrics production is on labs
[21:58:26] production = wikimetrics production (it's like 3rd tier analytics stuff)
[21:58:36] yeah, runs on labs
[21:59:14] Ah, okay. Just wanted to say something before you break the cluster :-).
[22:00:17] yep, no worries, nobody in their *right mind* would give me access to that
[22:02:15] YuviPanda: labs-vagrant should be fixed now on Trusty hosts.
[22:02:44] bd808: woot!
[22:03:01] bd808: tyvm
[22:03:06] scfc_de: Did you try that? I tried it some days ago, And pulling was not allowed. Is it supposed to work?
[22:03:28] YuviPanda: Sure. S yells at me before he yells at you when it's broken
[22:03:36] bd808: aaah :D
[22:04:06] qchris: Yes and no. I did that, but I had to copy the ssh key to ~root/.ssh/id_rsa. Sorry, forgot to mention.
[22:04:27] scfc_de: Ok. Makes sense.
[22:04:31] bd808: I need to work on hardening it a bit, not depend on vagrant internals
[22:04:34] Thanks.
[22:05:03] qchris: /var/lib/git/labs/private/labs-puppet-key
[22:05:29] YuviPanda: it's kind of a mess I've noticed. It reads the /etc/puppet/puppet.conf file for one thing, which seems wrong.
[22:05:30] scfc_de: Ok. Thanks.
[22:05:52] YuviPanda: Does it disable normal puppet runs when it is installed?
[22:05:58] bd808: nope
[22:06:45] Hmmm... I wonder if apache vhosts disappear randomly now then? I guess S hasn't complained so hopefully not.
[22:07:07] bd808: shouldn't since they are on /vagrant, no?
[22:07:52] puppet manages /etc/apache2/sites-enabled now I thought with purge,recursive
[22:08:05] bd808: ah, hmm, yes it does.
[22:08:21] * bd808 forces a puppet run to see what happens
[22:09:39] lots of ownership changes...but apache vhost is left alone
[22:10:02] bd808: hmm, and the vhosts are on /etc
[22:10:21] * bd808 shrugs
[22:11:13] YuviPanda: It could use some work in any event.
[22:11:26] bd808: definitely. lots of things have changed since I first wrote it.
[22:11:32] trusty, me not being as much of an idiot...
[22:12:32] :) I'd like to figure out if we can move some stuff into hiera to fix vagrant/labs differences too
[22:13:02] Like https://bugzilla.wikimedia.org/show_bug.cgi?id=67331
[22:13:14] bd808: right
[22:13:34] bd808: I think what I'd like to do when I touch it next is to have a general role that lets any git puppet repo be applied (in addition to the ops puppet repo) to an instance
[22:13:40] bd808: and then do a specific role for labs-vagrant
[22:13:55] shiny
[22:14:15] That would be useful for lots of labs folks too
[22:14:27] You have good ideas :)
[22:14:38] bd808: :D
[22:14:45] bd808: and eventually, we can extend that to let, say, ansible...
[22:14:47] run there
[22:15:10] ansible? Now your talking goofy
[22:15:30] How about letting me provision directly from Vagrant on my laptop
[22:15:59] bd808: :D that perhaps can wait for the evaluation of Horizon to replace the wikitech interface
[22:17:02] YuviPanda: Did you see this? -- https://gerrit.wikimedia.org/r/#/c/144334/
[22:17:17] I can't wait for it to deploy so I get nicer emails
[22:17:22] HAH FINALLY
[22:17:27] so much emtpy email
[22:17:53] Coren: btw, when you added the new nodes, did you increase number of slots?
[22:18:03] That's been bugging me for at least 9 months
[22:18:09] Coren: we should also probalby find the node that's smaller (there was one, IIRC?) and reduce its slot count
[22:22:03] spagewmf: Hey if you haven't heard, be careful updating the git checkout for labs-vagrant in /vagrant for labs hosts still running Ubuntu 12.04. The latest master won't work with it.
[22:37:51] !log deployment-prep Added Gergő Tisza and Yuvipanda as project admins
[22:37:54] Logged the message, Master
[22:37:59] woot
[22:49:36] bd808: I'll keep you posted when I start on the newer puppet-in-puppet system :)
[22:49:49] probably gonna be a while, with graphite taking up my time now
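[Editor's note] For the recurring `Could not find class passwords::mysql::phabricator` failures discussed above, scfc_de's fix amounts to updating labs/private alongside operations/puppet on the self-hosted puppetmaster. A hedged sketch; the paths and key name are the ones mentioned in channel, and the commands are only printed here since they need a real puppetmaster to run:

```shell
# Print the update sequence for a self-hosted labs puppetmaster. The deploy
# key location was mentioned in channel; verify it before relying on it.
private_repo="/var/lib/git/labs/private"
puppet_repo="/var/lib/git/operations/puppet"

cmds=$(cat <<EOF
cd ${private_repo} && git pull     # may need labs-puppet-key as the ssh key
cd ${puppet_repo} && git pull
puppet agent --test                # re-run to confirm the class now resolves
EOF
)
echo "$cmds"
```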