[08:14:06] hashar: you emailed about getting ganglia up for labs. I'm considering spending some time setting up graphite + diamond. thoughts/strong feelings?
[08:16:31] YuviPanda: I thought I would have some time to fix up the Ganglia install
[08:16:37] YuviPanda: I was wrong :-/
[08:16:51] hashar: yeah, time is a precious commodity around here :|
[08:17:04] hashar: the app release is going out today, so I'll have some more breathing room over the next few weeks
[08:17:11] great!
[08:17:40] Ganglia can probably be fixed easily. Gotta create a bigger instance, migrate the data and update the puppet manifests to have instances send their metrics to the new IP address
[08:17:41] hashar: so I was thinking of just setting up diamond / graphite rather than ganglia. That'd also let us collect custom stats easily, like reqs/s, 500s/s etc. from the tools proxy, etc.
[08:18:01] for graphite / diamond, I am not sure whether there is any nice graphical interface for it
[08:18:11] hashar: graphite itself is the nice graphical interface, no?
[08:18:21] hashar: I am also not fully sure of what ops uses ganglia vs graphite for :|
[08:18:24] well it is missing some nice dashboards :D
[08:18:41] though one could automatically generate configuration files for a shared gdash
[08:20:16] hashar: right
[08:24:19] YuviPanda: I think diamond got set up by chasemp
[08:24:28] hashar: yeah. it's currently realmed to not run in labs
[08:24:58] YuviPanda: feel free to build a proof of concept and then set it up on beta cluster :]
[08:25:00] that is a good showcase
[08:25:17] hashar: ah, yes. I was thinking of putting it on tools, since I know tools much better than beta cluster
[08:25:25] the deployment-bastion instance has a diamond process running apparently
[08:25:30] not sure where it sends its data though
[08:25:37] yeah, or tools
[08:25:39] up to you
[08:26:45] YuviPanda: the metrics are apparently sent to tungsten.eqiad.wmnet port 8125
[08:26:50] so that is sent to prod
[08:27:22] hashar: yeah. that is what's specified in the puppet config.
[08:27:47] I don't see the instance in prod graphite though
[08:28:06] one would probably want to namespace the metric
[08:28:24] hashar: yeah, and it'll probably not make it to tungsten anyway because of firewalls
[08:28:27] something like: servers.wmflabs..
[08:28:44] or set up a new graphite server for labs usage
[08:28:44] hashar: that is why it was realmed to be removed in labs: it's sending to prod, the metrics don't make it, and diamond creates huge log files :)
[08:28:46] possibly in prod
[08:28:54] gotta check with labs ops (andrew/coren/chase)
[08:29:01] hashar: setting it up on labs sounds more sane I think, because again of firewall issues.
[08:29:08] hashar: are you projectadmin on the ganglia project? if so can you add me?
[08:29:08] yup
[08:29:15] adding
[08:29:26] hashar: ty. adding as projectadmin would be nice too :D
[08:29:30] I can't remember which instance is running ganglia right now
[08:29:55] but its IP address is hardcoded in the puppet manifests to have labs instances send their metrics to it
[08:30:09] the web proxy configuration will tell you which instance is used
[08:30:15] hashar: yeah, I can poke around
[08:30:29] hashar: I assume it's fully puppetized? :)
[08:30:35] !log ganglia adding YuviPanda as a project admin
[08:30:37] Logged the message, Master
[08:30:48] YuviPanda: no. The ganglia frontend is not puppetized. You gotta clone it :(
[08:30:49] hashar: ty
[08:30:53] hashar: ah, gah.
[08:31:01] hashar: if only resizing instances wasn't such a PITA
[08:31:02] you are an admin
[08:31:06] yeah :-(
[08:31:09] that is why I gave up
[08:31:30] been too lazy to create a new instance, reinstall ganglia, update the IP in the puppet manifest and migrate the existing RRDs
[08:31:35] though we can probably drop the history
[08:31:47] hashar: yeah.
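The namespacing idea discussed above (prefixing labs metrics with something like `servers.wmflabs.<instance>`) can be sketched against Graphite's plaintext protocol, which accepts one `metric value timestamp` line per metric on TCP port 2003. This is a minimal illustration, not the eventual setup; the instance name and the `graphite.example.wmflabs` host are placeholders, since no labs Graphite server existed at this point in the conversation.

```shell
#!/bin/sh
# Minimal sketch: build one namespaced metric line in Graphite's
# plaintext format ("metric value timestamp"), mirroring the
# servers.wmflabs.<instance> scheme suggested above.
instance="i-000002a4"                                   # example instance name
value="$(cut -d' ' -f1 /proc/loadavg 2>/dev/null || echo 0.00)"  # 1-min load avg
line="servers.wmflabs.${instance}.loadavg.01 ${value} $(date +%s)"
echo "$line"
# Shipping it would be one more line (the host here is hypothetical):
#   echo "$line" | nc -q1 graphite.example.wmflabs 2003
```

Note that diamond's statsd-style target on port 8125 speaks a different wire format (`name:value|type` over UDP); the plaintext line above is what graphite's carbon receiver itself accepts.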
dropping history sounds ok
[08:31:57] another thing is that the instance runs both the ganglia frontend and the aggregators, which causes a lot of CPU usage
[08:32:12] a better architecture would be to have several aggregators and point instances to different aggregators
[08:32:23] then aggregate the aggregators on a central instance that has the web frontend
[08:32:37] but I guess a large instance can handle all the load
[08:32:41] hashar: would we need that much for labs? a big aggregator instance + a big frontend instance should be fine, no?
[08:32:51] web proxy : ganglia.wmflabs.org http://aggregator.eqiad.wmflabs:80
[08:33:12] I have no clue what the other instances are for
[08:33:24] they can probably be dropped. Gotta find out who created them and ask whether they are still needed.
[08:33:31] hashar: yeah
[08:33:37] I have created aggregator i-000002a4.eqiad.wmflabs ACTIVE stale
[08:33:37] 10.68.16.101
[08:33:48] hashar: heh, one of them is even 10.04
[08:33:48] 1 CPU / 2GB ram :-(
[08:33:51] not enough cpu
[08:33:59] yeah, they are probably the old ones from pmtpa
[08:34:26] YuviPanda: finally, the SAL has some information https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia/SAL
[08:34:29] hashar: meh, no quota to create a big enough instance.
[08:34:30] on April 1st
[08:34:33] hashar: I'll wait till andrewbogott_afk is around
[08:34:34] that lists what I did, more or less
[08:34:42] 19:55 hashar: cloned https://github.com/ganglia/ganglia-web.git to /usr/share/ganglia-webfrontend and checked out tag 3.5.12. Total unpuppetized hack for the win
[08:34:44] :(
[08:34:58] heh
[08:36:27] hashar: I'll keep you updated on major updates :)
[08:37:04] YuviPanda: or the labs-l list :]
[08:37:34] hashar: will keep you updated via emailing labs-l then :)
[08:44:52] andrewbogott_afk: I've found https://github.com/jmervine/autoperf which lets me automate log replay testing of our nginx proxy. setting it up now
[13:23:32] Could somebody with powers truncate /var/tmp/phd/log on the instance behind fab.wmflabs.org? Looks like https://bugzilla.wikimedia.org/show_bug.cgi?id=65861 hits again...
[13:23:48] Wikimedia Labs: Mail notifications from fab.wmflabs.org delivered only days later (or not at all?) - https://bugzilla.wikimedia.org/65861#c17 (Andre Klapper) RESO/FIX>REOP Problem happens again, as expected. I've asked on #labs. CC'ing Chase because this sounds relevant for the final setup: (In re...
[14:21:26] YuviPanda: I'm catching up on the backscroll...
[14:21:32] You need me to raise the quota in the ganglia project?
[14:21:42] andrewbogott: ah, no. I think I'll play with diamond instead.
[14:21:48] andrewbogott: can you create a 'graphite' project for me?
[14:21:59] sure
[14:22:19] andrewbogott: ty
[14:22:32] Actually, there is one already -- members are mutante and Damianz
[14:22:57] andrewbogott: I've asked mutante to add me. Damianz - around?
[14:23:00] And it has 0 instances
[14:23:16] andrewbogott: ah, cool :) you can add me too if you're ok with it, but I'm ok with waiting for projectadmins too
[14:23:36] I added you since there clearly isn't much happening there
[14:23:43] andrewbogott: woot, ty. projectadmin as well?
[14:23:54] yep
[14:24:33] andrewbogott: there's actually one instance in that project, but 'tis k
[14:24:52] hm, why can't I see it, I wonder?
[14:26:32] andrewbogott: should be two now
[14:27:36] hm, I guess I can only see them if I'm a project admin. That's wrong, isn't it?
[14:29:03] andrewbogott: yea
[14:37:39] andrewbogott: new labs instances get NFS by default now, right? no glusterfs?
[14:37:56] YuviPanda: yeah, we don't have gluster servers anymore
[14:38:02] andrewbogott: woot, cool.
[14:38:23] Shared storage wasn't turned on for that project, though -- I just switched it on.
[14:38:29] So it'll be a few minutes & a puppet run
[14:38:31] andrewbogott: cool!
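The earlier request to truncate /var/tmp/phd/log (the runaway Phabricator daemon log on fab.wmflabs.org) is a textbook case for truncate(1) rather than rm: truncating keeps the same inode, so the daemon that holds the file open keeps writing to it, whereas deleting the file leaves the daemon writing to an unlinked inode that still consumes disk until the process restarts. A sketch on a throwaway file, so nothing real is touched:

```shell
#!/bin/sh
# Sketch: zero a log file in place. On the fab.wmflabs.org instance
# the equivalent would be:
#   sudo truncate -s 0 /var/tmp/phd/log
# Demonstrated here on a temp file.
log="$(mktemp)"
printf 'old line 1\nold line 2\n' > "$log"
truncate -s 0 "$log"            # file still exists, same inode, now empty
size="$(wc -c < "$log")"
echo "size after truncate: $size"
rm -f "$log"
```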
[14:38:36] initial puppet run is still happening
[15:23:13] !ping
[15:23:13] !pong
[15:50:10] can anyone give me admin on ko.wikipedia.beta.wmflabs.org? (username Revi) :P
[15:50:31] Revi: ask chrismcmahon
[15:51:09] Revi: I think so, let me check...
[15:52:54] and... for Korean (I haven't tested other languages) Read, Edit, View history are in English (they're translated on production wikis)
[15:54:56] Revi: what is your user name on ko/beta?
[15:55:25] Revi
[15:56:12] Revi: you now have Administrator rights on http://ko.wikipedia.beta.wmflabs.org/
[15:57:22] Thanks!
[16:36:04] Wikimedia Labs / deployment-prep (beta): Vector buttons are not translated - https://bugzilla.wikimedia.org/67087 (Revi) NEW p:Unprio s:normal a:None Step to reproduce: 1. Go to URL with Vector skin, language Korean. (Modern skin shows translation correctly, and I haven't test other languag...
[16:41:37] is this a known issue? when i try to run 'become mytool' on my instance (not tools-lab) it returns "Sorry, command-not-found has crashed!"
[16:44:07] Coren: welcome back! Have a good holiday?
[16:44:09] notconfusing: "become" is a program only available on Tools (well, you could install it on your instance as well). I don't think command-not-found should crash, but I don't know how/what/where.
[16:45:22] andrewbogott: Incredibly good, although I am not paying the price for having been completely off-grid for over a week. :-)
[16:45:48] s/not/now/
[16:46:38] scfc_de: thanks
[16:46:57] scfc_de: i was reading this documentation, but i guess it's only for creating new databases
[16:47:07] Coren: :( I handled a few tools issues but didn't follow bugzilla all that closely
[16:47:22] I'm (finally) working on service groups today; ping me when you have time to think about that.
[16:47:54] andrewbogott: Probably more tomorrow than today. I don't expect I'll seriously have time for more than catching up with bz and replying to email.
[16:48:34] Coren: sure. Feel free to fob off bugs on me in the meantime.
[19:24:33] Tool Labs tools / Erwin's tools: Migrate https://toolserver.org/~erwin85/randomarticle.php to Tool Labs - https://bugzilla.wikimedia.org/60871#c3 (Andre Koopal) NEW>ASSI a:Andre Koopal Now it is; the fix was simple: use the correct database, same as with related_changes. Only this one is also not...
[20:58:33] !log deployment-prep Fixed rebase conflict in operations/puppet.git on deployment-salt caused by cherry-picked vcl patch left over from varnish submodule usage
[20:58:38] Logged the message, Master
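The conflict logged at 20:58 — a cherry-picked vcl patch left sitting in the operations/puppet.git clone colliding with upstream on rebase — is a common failure mode, and it can be reproduced and cleared on a throwaway repository. This is a generic sketch of that failure mode, not the actual commands run on deployment-salt; all file and branch names here are invented.

```shell
#!/bin/sh
# Sketch: a stale cherry-pick conflicting with upstream, then cleared
# with `git cherry-pick --abort`. Runs entirely in a temp directory.
set -e
tmp="$(mktemp -d)"
cd "$tmp"
git init -q repo
cd repo
git config user.email demo@example.org
git config user.name demo
echo "vcl v1" > wikimedia.vcl
git add wikimedia.vcl
git commit -qm "base"
main="$(git symbolic-ref --short HEAD)"
git checkout -qb vcl-patch
echo "vcl cherry-picked" > wikimedia.vcl
git commit -qam "local vcl patch"
git checkout -q "$main"
echo "vcl upstream" > wikimedia.vcl
git commit -qam "upstream vcl change"
# Replaying the old local patch now conflicts, as it did on deployment-salt:
if ! git cherry-pick vcl-patch >/dev/null 2>&1; then
    git cherry-pick --abort    # drop the stale pick; worktree is clean again
fi
echo "dirty paths after abort: $(git status --porcelain | wc -l)"
```

After the abort, the worktree matches the upstream tip again, which is the state the !log entry describes restoring.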