[01:29:55] !log wikimania-support switched scholarship-alpha to gerrit repo
[01:29:56] Logged the message, Master
[04:17:35] Coren, ping
[18:17:44] Coren: what's your BZ address?
[18:21:14] StevenW: marc@uberbox.org
[18:21:32] Thx
[18:22:32] StevenW: What YuviPanda said.
[18:34:56] petan: can you tell me about the state of nagios-in-labs?
[18:35:03] I just rebooted the host because it was OOM.
[18:36:41] YuviPanda: regarding my recent emails, I think you can ignore all of them except for the code review request that you just got :)
[18:37:01] andrewbogott: yup! Investigating an API breakage right now, will look at that right after
[18:37:09] thanks
[19:13:55] hey folks
[19:14:00] labsdb1003 DISK WARNING - free space: /a 217114 MB (6% inode=99%):
[19:14:05] Ryan_Lane: ^^
[19:16:09] Coren: ^^
[19:16:25] I've never actually touched those boxes
[19:16:37] Huh.
[19:16:44] * Coren looks into it.
[19:18:14] Mostly consumed by one specific user.
[19:19:28] 1.2T of table space.
[19:20:16] awww
[19:21:57] andrewbogott: that patch looks good. have you tested it?
[19:21:57] Well, he's collecting some stats. Lots and lots of stats.
[19:22:23] YuviPanda: I tested it enough to make sure that something was getting written to redis. I didn't test to see if the stuff in redis subsequently worked :)
[19:22:32] hehe
[19:24:23] (CR) Yuvipanda: [C: 2 V: 2] "http://blogs.msdn.com/cfs-filesystemfile.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-01-32-02-metablogapi/7317.image_5F0" [labs/invisible-unicorn] - https://gerrit.wikimedia.org/r/96668 (owner: Andrew Bogott)
[19:24:26] andrewbogott: ^
[19:24:31] the image being http://blogs.msdn.com/cfs-filesystemfile.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-01-32-02-metablogapi/7317.image_5F00_0F65063B.png
[19:25:09] That's me!
[19:25:31] :D
[19:25:37] andrewbogott: ALL OF OPS! actually :D
[19:26:35] This week at least :(
[19:28:13] Dafu is he doing? He's proxying a mysql connection through ssh and dumping stuff in the db.
[19:28:52] It looks like serious analytics stuff, but 1.2T needs its own support, not dumping on a replica
[19:31:52] ... and whatever he uses to do it reconnects automatically.
[19:32:13] Ryan_Lane: Is a blocked account on wikitech disabled from logging in, BTW?
[19:32:27] you need to remove the user from shell
[19:33:01] That seems like overkill.
[19:33:03] Hm.
[19:33:14] oh, blocked from logging in on wikitech?
[19:33:14] Well, I can't have him bring down the database.
[19:33:17] or via shell?
[19:33:21] No, via shell.
[19:33:32] removing shell will remove him from all groups too
[19:33:39] we probably should have a disabled group
[19:33:40] Yeah, hence 'overkill'
[19:33:53] and adding someone into disabled will temporarily disable
[19:34:05] while removing from shell will permanently disable
[19:34:23] Personally, I'd hook into blocks for this -- that seems like the natural thing we'd want.
[19:34:30] No shell login while blocked.
[19:34:44] how in the world could that work? :D
[19:34:47] In the meantime, I'll just chmod his home 000
[19:35:00] the instances have no clue if someone is blocked
[19:35:06] Ryan_Lane: Blocking could fiddle with LDAP
[19:35:14] yeah, I guess so
[19:35:31] blocking someone from login blocks them from gerrit, too, though
[19:35:51] Well, I'd expect that if we block a user, we want to get their attention.
[19:45:33] Well, with no alternatives right now I had to remove shell anyways.
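(Aside: a rough sketch of the two stopgaps discussed above. The username, home path, and all LDAP DNs and attribute names are hypothetical, not taken from the log.)

# Coren's interim measure: make the user's home directory inaccessible.
# "exampleuser" is a placeholder, not the actual account.
sudo chmod 000 /home/exampleuser

# The "remove from shell" / "disabled group" idea, roughly: shell access
# is driven by LDAP group membership, so a block could be enforced by
# dropping the user from the access group. Every DN below is a guess.
ldapmodify -x -D "cn=admin,dc=wikimedia,dc=org" -W <<'EOF'
dn: cn=shell,ou=groups,dc=wikimedia,dc=org
changetype: modify
delete: member
member: uid=exampleuser,ou=people,dc=wikimedia,dc=org
EOF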
[19:46:02] Coren: It's my database, what's wrong?
[19:46:36] Well, it's now at 1.2T and growing. If you want to do analytics on large datasets, the replicas are most certainly not the right place for it.
[19:47:00] !log deployment-prep manually fixed some permission rights that prevented automatic deployment of mediawiki-config. It's been broken since roughly Nov 21st at 7pm UTC.
[19:47:05] Logged the message, Master
[19:47:08] Coren: Sorry for that, a job is importing daily dumps
[19:47:21] Also, doing it remotely through a proxy makes it harder for me to notify you with more subtlety than cutting you off. :-)
[19:48:08] coren: this connection was only mysql workbench, no imports from there
[19:48:34] hedonil: Ah, I couldn't tell that - just that you were connected through it. :-)
[19:48:48] !log deployment-prep mwscript update.php --wiki=labswiki --quick (for OAuth database updates)
[19:48:52] Logged the message, Master
[19:49:17] So yeah; if you explain to me what you're trying to do, I can probably help you find a way to do it that isn't going to be all that troublesome (and probably work better for you too)
[19:49:49] First question: Is that data of yours precious? Because it's using almost all of the free space on that replica.
[19:49:54] hey lovely labs peoples
[19:50:04] i'm trying to find a way to backup my wikimetrics instance
[19:50:20] we're in the process of deploying it to production, but I want to make sure it's safe from the Tampa tornado
[19:50:29] Coren: Tool is - as you know - https://tools.wmflabs.org/wikiviewstats/ - making some smart presentation of view stats
[19:51:24] hedonil: Yeah, I know the end result, I'm just wondering why your database is so big, and why it needs to be on the replica.
[19:51:59] milimetric: The general rule of thumb is: if it's puppetized and you don't store stuff on the local disk, you're golden.
[19:52:11] Coren: Well, it was the first place to go, and fast enough
[19:52:19] then I'm whatever the opposite of golden is :)
[19:52:42] milimetric: I think you're looking for "SOL". :-)
[19:52:45] it's not a super-critical service but I'd like to just make sure we have daily backups for now
[19:53:05] i was thinking just mysqldump out to a file, then save that somewhere
[19:53:20] and was wondering if there was some safe central file system or bacula or something like that
[19:53:20] milimetric: That'd work. Stuff it onto project space and you're set.
[19:53:28] what's project space?
[19:53:37] It's /data/project
[19:53:44] That doesn't live on the instance.
[19:53:49] oh, fantastic
[19:53:54] i'll do that, thank you much!
[19:54:49] hm, I get permission denied even with sudo Coren
[19:54:59] does someone need to give me rights there?
[19:55:21] milimetric: Not supposed to. What instance is this?
[19:55:26] wikimetrics
[19:56:16] milimetric: I see nothing wrong there. What is the exact command you are trying to do?
[19:56:27] sudo mysqldump --database wikimetrics > /data/project/wikimetrics.bak
[19:56:59] hedonil: I can probably arrange something for you if you give me a minute, but you'll not have SSD performance. :-)
[19:57:33] milimetric: Ah, common mistake. You're running the mysqldump as root through sudo, but your redirection (>) is done as yourself before sudo can kick in.
[19:57:36] Coren: Ok, let's stay in touch. Can I regain access?
[19:58:35] milimetric: The better way to do this is use sudo to create a directory, give it to your normal user, then dump there without sudo.
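(Aside: a minimal sketch of the fix Coren describes, assuming the login is "milimetric" and that mysqldump can already authenticate; the backup path is illustrative.)

# The shell performs the redirection (>) as the calling user *before*
# sudo elevates privileges, so the dump ran as root but the target file
# could not be created. Workaround: own the target directory yourself.
sudo mkdir /data/project/backups
sudo chown milimetric: /data/project/backups
mysqldump --databases wikimetrics > /data/project/backups/wikimetrics.bak

# Or keep sudo, but move the redirection into the elevated process:
sudo sh -c 'mysqldump --databases wikimetrics > /data/project/wikimetrics.bak'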
[19:59:03] cool, thanks
[19:59:28] milimetric: (unrelated), but looks like hedonil just filled up a db or two by importing pagecounts into labs :)
[19:59:45] oh! hedonil - are you Magnus?
[19:59:53] No.
[19:59:57] oh :)
[20:00:04] :-D
[20:00:08] 'cause Magnus is trying to import pagecounts into hive as well
[20:00:12] *hive on labs
[20:00:21] Magnus Manske, you might want to talk with him
[20:00:28] and I'm about to get involved in that as well
[20:00:32] yes, i know him by name.
[20:00:33] I'm Dan Andreescu (analytics)
[20:00:40] Hi
[20:00:51] howdy, what's your interest in pageviews
[20:00:55] hedonil: Yeah, hang on.
[20:00:57] and your name seems familiar...
[20:01:03] Coren: thanks
[20:01:27] New presentation and combination of view statistics
[20:01:54] actually combined with new wiki thanks (German)
[20:02:30] grok.se does not fulfill all the needs
[20:03:56] cool, hedonil, we're totally working on that - it just got reprioritized for our team
[20:04:03] so we'll start again this coming Wednesday
[20:04:23] make sure to join #wikimedia-analytics and watch analytics-l
[20:05:01] milimetric: what will your features be like? better than mine? 8-)
[20:05:23] well, the current thinking is to do this: https://www.mediawiki.org/wiki/Analytics/Hypercube
[20:05:45] milimetric: let's have a look at that.
[20:05:46] the big problem with labs, as you can see, is that you run out of space if you're trying to copy 3TB
[20:05:46] :)
[20:06:23] milimetric: Yeah, tried everything to minimize but it's a vast amount of data
[20:07:04] I thought Tool Labs /could/ handle big data ;-)
[20:07:09] there are some tricks, but ultimately hadoop might be the right thing
[20:07:26] i have a test hadoop cluster set up on hadoop-test1, but the API seems like it'd be more useful
[20:07:30] then people don't have to learn hive
[20:07:31] brb
[20:07:45] thanks Coren, btw, I understand my problem now and I've set up the backup
[20:07:46] :)
[20:08:02] andrewbogott: around?
[20:08:02] unfortunately i'm not familiar with hadoop
[20:08:07] hedonil: It can, but not on the replica storage -- that's limited SSD space.
[20:08:15] YuviPanda: yep, what's up?
[20:08:22] hedonil: I'm moving your database to spinning rust as we speak.
[20:08:39] andrewbogott: so this weekend, I am going to try to hack up a way to get per-project puppet repos
[20:08:49] Coren: Wow, tried that once from my place.....
[20:08:49] Quiet. I'm trying to watch CSI.
[20:08:51] :p
[20:08:51] andrewbogott: that aren't ops/puppet. Is this something that'll be vetoed?
[20:09:08] andrewbogott: this will just be generalizing the work I already did on labs-vagrant.
[20:09:26] YuviPanda: not necessarily vetoed, but… how are you thinking you'll implement it?
[20:09:30] andrewbogott: when configuring an instance, you can put in a URL to a puppet repo in the wikitech interface as a variable, and it'll clone it and run that
[20:09:45] andrewbogott: puppet apply with appropriate paths set.
[20:09:57] And… the URL will point to a repo hosted where?
[20:10:09] andrewbogott: just a git repo. Can be gerrit or github or wherever.
[20:10:14] Coren: While you are moving the database - if i could login, i could stop the importers. Next will be in ~20 minutes
[20:10:43] hedonil: No need to worry about that; there's a global lock atm.
[20:11:10] YuviPanda: We sort of have that already… you can make a self-hosted puppet instance and use that as the puppetmaster for other instances.
[20:11:11] https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#Puppet_Client_Setups
[20:11:15] That's the same thing, right?
[20:11:45] andrewbogott: well, not really
[20:11:53] ?
[20:11:59] andrewbogott: first, you won't be using puppetmaster self, so you will still get updates from ops/puppet.git changes
[20:12:12] Coren: if required, cancel job st-current job-nr: 1511095 (the hourly importer)
[20:12:17] !log integration added Anomie as user + sudoer
[20:12:19] Logged the message, Master
[20:12:19] YuviPanda, who will get those updates?
[20:12:26] andrewbogott: the instances that have them ticked?
[20:12:46] andrewbogott: those instances will essentially be running from two separate puppet repos
[20:13:01] I think that's the same thing as what I'm talking about though...
[20:13:03] andrewbogott: check out the labs-vagrant module. It does exactly this.
[20:13:09] andrewbogott: CAUTION: there are no provisions for automated pulls of the repository right now. This means that as soon as you add role::puppet::self the instance will stop receiving updates that are pushed into gerrit.
[20:13:39] andrewbogott: plus isn't per-project puppetmaster a bit more... complex than a thing with puppet apply?
[20:13:47] hedonil: It'll take a while. I'll give you access back in the meantime, but don't touch that DB. :-)
[20:14:00] YuviPanda: I think you're missing what I'm saying.
[20:14:03] Coren: ok
[20:14:04] andrewbogott: also that page doesn't tell me if I can use my own arbitrary puppet repo
[20:14:10] andrewbogott: that's highly possible :D
[20:14:28] So, this is stuff Otto set up. It's maybe currently used by the beta project (hashar was going to adopt it but I don't know if he's set it up yet.)
[20:14:44] which stuff?
[20:14:53] You would have one instance that is a puppetmaster (derived from gerrit upstream, but not self-updating)
[20:15:04] Then you have a set of clients which automatically update and sync to that puppetmaster.
[20:15:18] So, you have your own puppet repo, which is on your own puppetmaster...
[20:15:22] hedonil: {{done}}
[20:15:25] which derives from gerrit but doesn't automatically update.
[20:15:45] hashar: Setting up a beta-specific puppetmaster so that you can test individual patches without merging them into production… remember that?
[20:15:47] andrewbogott: by 'your own puppet repo', you mean 'my own clone of operations/puppet.git', correct?
[20:15:57] right.
[20:16:25] andrewbogott: right. So with what I am proposing, you are not restricted to operations/puppet.git
[20:16:41] I can have operations/mobile/hajitsu.git for example, and run off that
[20:17:07] andrewbogott: ohh yeah you told me about it during the all staff, completely forgot to follow up, sorry
[20:17:07] ok...
[20:17:30] hashar: your loss :)
[20:17:40] meanwhile I came up with another idea which would be to add a class that would include '/data/project/puppet/*.pp' :-D
[20:17:57] andrewbogott: look at CAUTION: there are no provisions for automated pulls of the repository right now. This means that as soon as you add role::puppet::self the instance will stop receiving updates that are pushed into gerrit.
[20:18:10] andrewbogott: line 15
[20:18:21] Coren: canceled the jobs. what do I have to do now?
[20:18:37] YuviPanda: Well, that's otto's fault for not properly updating the docs :)
[20:18:52] Well, and it's true for a puppetmaster.
[20:18:56] oh
[20:18:57] grr
[20:18:58] Mostly people don't use clients, just a master.
[20:19:26] I mean, it's correct that it won't receive updates pushed into gerrit.
[20:19:29] hedonil: Wait patiently. :-)
[20:19:34] Because, rather, everyone is getting updates from the local puppetmaster.
[20:19:43] So they will get updates that are pushed there, but that's not gerrit...
[20:19:54] hedonil: About 25% of your DB has been moved.
[20:19:58] *shrug* The docs don't do a good job of explaining this use because it's obscure.
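(Aside: since a role::puppet::self master stops auto-pulling from gerrit, keeping it current is a manual step; this sketch assumes the conventional clone path and branch name, which may differ per setup.)

# On the self-hosted puppetmaster instance: rebase the local clone of
# operations/puppet.git onto gerrit by hand to pick up new changes.
# The path and the "production" branch are assumptions.
cd /var/lib/git/operations/puppet
sudo git pull --rebase origin production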
[20:20:53] * hedonil waiting patiently for new info and orders :-X
[20:21:04] andrewbogott: anyway, I still think that what I'm doing is different enough to be useful vs. puppetmaster.
[20:21:11] andrewbogott: let me patch it up and poke you in a few days :)
[20:21:20] Do you have specific applications that require it?
[20:21:26] It doesn't seem terrible, just complex.
[20:21:59] andrewbogott: it'll be far simpler than puppetmaster, IMO
[20:22:24] YuviPanda: Because having volunteers create their own puppetmasters by hand is… simple?
[20:22:53] Will you be composing puppet classes from two different sources? Or will the instance be totally cut off from labs puppetmaster?
[20:23:03] andrewbogott: two different sources
[20:23:10] but *not* mixed together
[20:23:18] you can't refer to classes from one source to another
[20:23:22] see https://github.com/wikimedia/mediawiki-vagrant/blob/master/lib/labs-vagrant.rb#L56
[20:23:28] that's what labs-vagrant uses now
[20:23:36] OK, but the instances will still pick up labs-wide changes made in the central repo?
[20:23:43] andrewbogott: yup!
[20:24:05] Will a compilation failure on the secondary repo prevent puppet runs? Or will there be two different isolated runs?
[20:24:11] isolated runs
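(Aside: a rough sketch of the scheme YuviPanda describes, a second arbitrary puppet repo applied masterless and in isolation from the regular labs run. The clone path is hypothetical; the repo name is YuviPanda's own example from above.)

# Clone the per-project repo named as an instance variable on wikitech;
# any git URL works.
git clone https://gerrit.wikimedia.org/r/operations/mobile/hajitsu.git /srv/project-puppet

# Separate, masterless run: only this repo's modules and manifests are
# on the path, so its classes can't reference ops/puppet classes (and
# vice versa), and a failure here leaves the puppetmaster run intact.
sudo puppet apply --modulepath=/srv/project-puppet/modules \
    /srv/project-puppet/manifests/site.pp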
[20:25:35] milimetric: do you have volunteers on your team or just WMF clerks?
[20:25:59] YuviPanda: That sounds ok then. As long as I can still administer the affected boxes.
[20:26:22] andrewbogott: yup, you still can. That's the difference from puppetmaster, that applying the class doesn't 'isolate' it
[20:26:51] hedonil: yeah actually, we're working with a couple of volunteers loosely right now
[20:28:23] milimetric: Do you know how many developers are currently working on view statistics? Wouldn't it be smart to make /one/ database for all?
[20:29:18] * with SSD performance and full history data.
[20:31:59] andrewbogott: I'll take that as a 'no, that is not a completely terrible idea' :) and try to write up a patch this weekend
[20:32:18] YuviPanda: that's fair :)
[20:32:24] :)
[20:32:38] andrewbogott: any idea when we'll start publicizing the new proxy system?
[20:32:45] andrewbogott: before the migration or after?
[20:33:19] YuviPanda: definitely before -- pretty much as soon as I do some disaster-recovery tests and get things deployed.
[20:35:21] YuviPanda: any last little bits you want to throw into the proxy api code before I build a new package?
[20:35:50] andrewbogott: nope! but can you leave the package source somewhere where I can look at it later?
[20:35:55] to learn, and tinker, etc
[20:36:00] maybe even put it on gerrit?
[20:36:17] I think that package-building instructions are already in the repo. I'll add them if not.
[20:36:50] andrewbogott: more like the actual files?
[20:37:37] I regenerate the package source files every time, as part of the build.
[20:37:57] And I pretty much never look at 'em so I assume they're boring :)
[20:38:02] aaah
[20:38:08] andrewbogott: then instructions are fine :D
[20:38:22] haven't looked at the readme yet
[20:38:27] It's in README.md, just a couple of lines.
[20:38:28] * MrZ-man is also working with view statistics
[20:38:29] so dunno about the instructions
[20:38:31] ok
[20:39:29] MrZ-man: Hi
[20:40:14] What's your database size? Where is it located?
[20:41:30] I only aggregate data into monthly stats, so it's not very big. I'm still in the process of migrating that from the toolserver
[20:42:16] MrZ-man: Ah, toolserver. Strong debate in Germany.
[20:42:24] I generate lists like https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Cats/Popular_pages for wikiprojects on enwiki
[20:43:05] along with the migration, I'm also rewriting a lot of it
[20:43:48] * hedonil looking at your tool right now
[20:44:28] I used to only collect data on specific pages, which made sense when there were only a few projects using it, but now it's ~90% of enwiki articles
[20:44:47] so I'm just going to collect data on all articles
[20:45:40] MrZ-man: Yeah that's true, a full take on all pages contains lots of crap.
[20:46:27] it requires more processing to filter out the crap than to process the data
[20:46:53] MrZ-man: I've filtered and aggregated data, but still 1.2 billion records per month - but i need that
[20:47:28] I use a 3-pass process. #1 filters out obvious crap and namespaces I don't want to track, #2 does existence checks and resolves redirects, #3 gets the assessment/importance rankings for the project
[20:48:27] #2 also does the monthly aggregation so that #3 (which is the slowest) only has to do a few million
[20:49:54] MrZ-man: yeah i see, you specialized on some pages of interest
[20:50:35] currently. Starting next month (hopefully) it will track all enwiki articles
[20:51:07] I've been doing performance tests the past couple weeks to make sure the new process can actually run reliably
[20:51:49] like, that it can actually get through 1 hourly data file in less than an hour
[20:52:19] MrZ-man: may god and coren guide you through your process - and provide enough and fast storage :-D
[20:53:29] most of it is written in Python, except the program that actually parses the data files, which is written in C
[20:55:08] hedonil: Almost done.
[20:55:48] hedonil: Why, by the way, did you pick /that/ slice?
[20:56:22] Coren: by chance
[20:57:55] Coren: labsdb1002 was under heavy load at this point in time. labsdb1003 was /quiet/
[20:59:54] hedonil: You've been moved.
[21:00:13] Coren: yep. thanks
[21:00:17] hedonil: That said, please speak with milimetric further -- this does seem like something you might want to coordinate.
[21:00:39] Coren: i guess you are right
[21:01:35] Coren: where did i move to? new disk? new datacenter? :)
[21:01:54] hedonil: Different disk; spinning rust on the same server.
[21:02:34] * hedonil sees heavy load on labsdb1003 on ganglia
[21:03:38] hedonil: The whole thing has been a table lock while I was moving your data; there's probably lots of things hard at work to catch up.
[21:05:02] Coren: I'll take a look at it now, but there are no other processes except web, so it shouldn't be that much work to do.
[21:05:23] Coren: thanks again.
[21:06:40] Hey folks. I'm trying to get access to the "archive" table in the tool labs DB replicas. Is this possible?
[21:07:48] Hmmm... It looks like I have a bigger issue.
[21:07:49] ERROR 1142 (42000): SELECT command denied to user 'u2041'@'10.4.0.220' for table 'user'
[21:07:53] halfak: here's some information: https://bugzilla.wikimedia.org/show_bug.cgi?id=49088
[21:08:24] halfak: maybe Coren knows any updates on this
[21:08:36] Thanks hedonil
[21:09:34] halfak: tl;dr: no, archive isn't accessible now. It's going to be made available in redacted form soon, pending availability of our DBA.
[21:09:54] Coren: Thanks for the update.
[21:10:29] and halfak: if you want a performant version of "revision" on labsdb, you should use "revision_userindex"
[21:10:39] that one has all the nice indices and it's what I'm using in wikimetrics
[21:11:03] Is that table English specific?
[21:11:04] hedonil - wanna private chat? I am hesitant to keep flooding labs but I'd love to talk more about pageviews
[21:11:21] revision_userindex is available on all the databases I've tried
[21:11:30] (ar, commons, de, fr, en, etc.)
[21:11:32] I don't see it for eswiki
[21:11:43] are you on eswiki_p?
[21:11:54] Good question. That could be part of the problem.
[21:12:08] There it is.
[21:12:13] :)
[21:12:15] Thanks.
[21:12:37] yep, np, you just got the 1 minute version of my 1 hour confusion when I started on labsdb :)
[21:13:05] yay!
[21:13:12] <3 timesavers
[21:13:28] * anomie would like more than only 3 timesavers
[21:13:35] gotta run. thanks for your help all.
[21:14:02] halfak: a new pleased labs customer :)
[21:14:30] :P I'm not new.
[21:14:44] But I was pleased when I was.
[21:14:50] And continue to be pleased.
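(Aside: a minimal example of milimetric's tip, run from a tools shell. The "eswiki.labsdb" server alias, the "_p" database suffix, and credentials coming from ~/.my.cnf follow the conventions discussed here; the query itself is illustrative.)

# Query the indexed view instead of "revision" for user-based lookups;
# "ExampleUser" is a placeholder.
mysql -h eswiki.labsdb eswiki_p -e "
    SELECT COUNT(*) AS edits
    FROM revision_userindex
    WHERE rev_user_text = 'ExampleUser';"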
[21:29:30] andrewbogott: sure
[21:29:47] andrewbogott: what kind of state do you need to know?
[21:31:03] petan: I'm interested in puppet freshness. It looks like instances test that but I don't see stats on nagios-main
[21:31:49] aha ok let me check
[21:33:26] andrewbogott: I think puppet works there
[21:33:28] notice: Finished catalog run in 15.13 seconds
[21:33:56] heh yes because motd is broken
[21:33:59] petan: Sorry, what I mean is -- I'm wondering if nagios/icinga is collecting stats about puppet freshness across labs.
[21:34:02] It looks like it should be.
[21:34:08] ooh
[21:34:10] that
[21:34:30] this is something I wanted to implement a very long time ago and it did work in the past
[21:34:37] then it suddenly stopped working :(
[21:35:01] unfortunately it was basically all set up by mutante
[21:35:12] Ok… I may work on it a bit in that case, I just didn't want to duplicate your work.
[21:35:16] and I never understood well how it works
[21:35:33] there is some data sent by instances to the nagios server
[21:35:41] using some snmp trap I think
[21:35:45] or whatever it is called
[21:36:16] but the service on the nagios server that used to accept this data and pass it to nagios is defunct now, dunno why
[21:36:35] I think we should document this :)
[21:37:21] nagios is working on labs, a little bit -- it reports downtime I think.
[21:38:38] yes but puppet freshness doesn't
[21:38:59] you know there is a -nagios channel too?
[21:39:23] irc bot was moved so that it doesn't spam here
[21:39:33] wikimedia-labs-nagios?
[21:39:36] yes
[21:39:38] it's in topic
[21:39:44] heh, so it is :)
[21:39:53] channel must've been quiet since that system was dead until just now
[21:40:09] (19:30:53) PROBLEM Current Load is now: WARNING on app2-wpd.pmtpa.wmflabs 10.4.1.91 output: WARNING - load average: 5.58, 5.13, 5.18
[21:40:10] (19:35:45) labs-nagios-wm_!~labs-nagi@208.80.153.210 has quit [Read error: Connection reset by peer]
[21:40:18] it did work until 19:35
[21:40:25] lot of messages there in history
[21:45:11] andrewbogott: the freshness monitoring - so we had issues before, and afair the root cause was
[21:45:25] need to fix init script for snmptrapd or snmptt
[21:45:32] so that it survives a reboot
[21:45:37] on hosts or server?
[21:45:38] and listens with the right options like in prod
[21:46:07] andrewbogott: on server
[21:46:24] snmptrapd and snmptt, on neon they look like
[21:46:29] mutante: ok, so you think that individual labs instances are already reporting properly, it's just that no one is listening?
[21:46:31] snmp 1059 0.0 0.0 48804 5036 ? S 11:16 0:04 /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid
[21:46:34] root 4708 0.0 0.0 49900 3064 ? Ss 11:24 0:19 /usr/sbin/snmptrapd -On -Lsd -p /var/run/snmptrapd.pid
[21:46:49] and the snmptrapd didn't start up with the right options, so it ran but didn't listen
[21:47:04] i'm just doing wild guesses here, it might be fixed, but was an issue in the past
[21:48:23] That certainly fits with what I see
[21:48:44] andrewbogott: are any instances reporting freshness? Service is not scheduled to be checked...
[21:49:00] um...
[21:49:13] Again, I don't know if you're talking about clients or hosts
[21:49:20] every time you say something I don't know which you're talking about :)
[21:49:32] I don't know much, just that it looks like in theory things are getting installed on instances to report freshness.
[21:49:43] And it's clear that the nagios server doesn't actually know anything.
[21:49:50] Judging from lack of mention of puppet in the logfile
[21:49:50] snmpd / snmptrapd listen on the monitoring server
[21:49:56] so compare that to neon in prod
[21:50:05] Pretty sure it isn't listening now. I'll look at that shortly.
[21:50:22] and they receive freshness checks from all the clients/nodes
[21:50:44] they send that packet out to the server when a puppet run completes
[21:50:50] and snmptrapd receives it
[21:51:19] so what you wanna do to debug is tcpdump on the monitoring server
[21:51:26] and check for incoming snmptraps
[21:51:45] on udp port 162
[21:51:56] and if they arrive you know it's not the nodes and not networking
[21:52:30] if they arrive the next step would be to tail the icinga log
[21:52:41] to see if it gets them and maybe just has "not matching hostname"
[21:53:02] icinga will receive a node name and then try to match it against a defined host
[21:53:20] could also be mismatch of hostnames, like instance name vs. hostname
[21:53:35] hold on. i have some old mail ..
[21:55:38] andrewbogott: http://lists.wikimedia.org/pipermail/labs-l/2012-March/000128.html
[21:56:04] mutante: thanks. I'm in the middle of a different project now but will note this all down for next...
[21:57:17] nods, yep
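(Aside: mutante's debugging recipe from above, as concrete commands to run on the labs monitoring server; the interface name and icinga log path are assumptions.)

# 1. Are traps from the instances reaching the server at all?
sudo tcpdump -n -i eth0 udp port 162
# 2. If they arrive, is snmptrapd actually listening on that port?
sudo netstat -ulnp | grep 162
# 3. Then watch the icinga log for the trap being processed, or for a
#    "not matching hostname" style rejection.
sudo tail -f /var/log/icinga/icinga.log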
[22:17:12] YuviPanda: OK, restoring the proxy db onto a new instance works.
[22:17:18] so, now I just have to get a sidebar link...
[22:17:18] w00t
[22:17:23] and maybe fix that domain pulldown
[22:17:24] andrewbogott: how many cores does it have?
[22:17:31] hopefully not 1 :P
[22:18:02] Hm, yeah, the VM currently running the proxy has 1. Should I move to a bigger instance?
[22:18:09] * andrewbogott thinks, threading
[22:18:11] ?
[22:18:40] andrewbogott: yeah, probably. AT LEAST two so redis and nginx don't fight for one, and larger if possible. I'd give it 4 if I could
[22:18:52] ok, I'll build a new instance and migrate things there...
[22:18:58] since that seems to be possible :)
[22:20:00] andrewbogott: :D
[22:40:46] Coren, congratulations, I had no idea you were Marc-Andre Pelletier. But what does Hashar's "I am very happy to see you rempiles for new project" mean?
[23:34:57] spagewmf: "put stuff back on the heap/pile". I never heard the idiom myself (different dialect) but I took it to mean "sign up for more"
[23:35:49] merci