[00:04:18] Ironholds, I think that a few slides your most important open questions with wikis + some discussion of your work would be great. [00:04:27] huh. Okie-dokes! [09:01:12] (CR) Nuria: Add Newly Registered Users metric (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136431 (https://bugzilla.wikimedia.org/65944) (owner: Milimetric) [09:36:28] qchris, a question about publishing to graphite from EL if you have the time [09:36:45] nuria: Sure (But I know nothing about graphite) [09:36:59] (And nothing about EventLogging) [09:37:03] i bet you know this though [09:37:08] from your graph [09:37:10] Let's find out. [09:37:17] https://wikitech.wikimedia.org/w/images/d/d6/Eventlogging-backend.svg [09:37:31] the multiplexer publishes to graphite [09:37:44] which is right [09:37:45] nuria@vanadium:~$ sudo netstat -lpa | grep 29336 [09:37:45] tcp 0 0 *:8600 *:* LISTEN 29336/python [09:37:45] tcp 0 0 localhost:47741 localhost:8522 ESTABLISHED 29336/python [09:37:45] tcp 0 0 vanadium.eqiad.wmn:8600 hafnium.wikimedia:54155 ESTABLISHED 29336/python [09:37:45] tcp 0 0 localhost:43724 localhost:8521 ESTABLISHED 29336/python [09:37:46] tcp 0 0 localhost:8600 localhost:57916 ESTABLISHED 29336/python [09:37:46] tcp 0 0 vanadium.eqiad.wmn:8600 hafnium.wikimedia:55525 ESTABLISHED 29336/python [09:37:47] tcp 0 0 localhost:8600 localhost:40010 ESTABLISHED 29336/python [09:37:47] tcp 0 570 vanadium.eqiad.wmn:8600 hafnium.wikimedia:54153 ESTABLISHED 29336/python [09:37:48] tcp 0 84480 localhost:8600 localhost:60640 ESTABLISHED 29336/python [09:37:48] tcp 0 0 localhost:8600 localhost:48402 ESTABLISHED 29336/python [09:38:24] but ... [09:38:48] where is it configured to do so? 
[09:38:55] mutiplexer reads [09:39:05] the config here [09:39:11] etc/eventlogging.d/multiplexers/all-events [09:39:19] --sid [09:39:19] all events [09:39:19] tcp://*:8600 [09:39:19] tcp://127.0.0.1:8521 [09:39:19] tcp://127.0.0.1:8522 [09:39:55] Some of the scripts just look for all running relevant services and parse them. [09:39:56] but in that config the harvester for graphite does not appear [09:40:02] Let me check the code if multiplexer is one of them. [09:40:41] as far as i can see it just reads the file on /etc/ [09:43:03] By "harvester" you mean the service on hafnium? [09:43:21] That is in manifests/site.pp [09:43:32] as "role::eventlogging::graphite" [09:44:34] nuria^ [09:44:54] Or what kind of "harvester" do you mean? [09:45:15] the service on hafnium, that is correct [09:45:41] but what i do not understand is what "makes" the multiplexer publish to that port [09:46:13] the "output" parameter in the multiplexer configuration [09:46:58] output => 'tcp://*:8600' [09:47:48] https://git.wikimedia.org/blob/operations%2Fpuppet/d37109dc98954d2377ed25fce66b2361b8cd190e/manifests%2Frole%2Feventlogging.pp#L73 [09:47:57] (git.wikimedia.org is slow today :-( ) [09:48:12] but wait, the multiplexer is publishing on 8600, correct? [09:48:54] yes. [09:49:03] And role::eventlogging::graphite is reading from that port [09:49:22] https://git.wikimedia.org/blob/operations%2Fpuppet/d37109dc98954d2377ed25fce66b2361b8cd190e/manifests%2Frole%2Feventlogging.pp#L177 [09:50:32] ahhh right, it is hardcoded [09:50:47] hardcoded in puppet. Yes. [09:51:44] ok, see, you knew .... EVERYTHING seems to be in puppet [09:52:07] puppet is the mother of all things wmf :-) [09:53:34] indeed it is [10:29:35] ^qchris another question if you can answer [10:29:57] Let's hear it. [10:30:25] i was trying to see what is being published on 8600 on vanadium [10:30:28] with netcat [10:30:46] but i get nothing ... 
[10:30:48] root@vanadium:~# nc -v -p 160000 localhost 8600 [10:30:48] Connection to localhost 8600 port [tcp/*] succeeded! [10:30:53] IIRC multiplexer uses zmq. [10:30:54] ....void ... [10:31:22] Wait ... [10:31:30] "-p 160000" ? [10:31:34] Let me check the man page. [10:32:18] "-p" is the port number. [10:32:25] incoming yes, [10:32:39] i thought you needed that for listening to tcp with nc [10:32:39] 160000? [10:32:44] well any [10:32:53] tcp only has 16-bit port numbers. [10:33:09] ahh, wait is that too large, i just typed whatever num [10:33:24] 16-bit is 65536. [10:33:47] maybe you do not even need incoming port... [10:34:00] but still nothing [10:34:02] root@vanadium:~# nc -v localhost 8600 [10:34:02] Connection to localhost 8600 port [tcp/*] succeeded! [10:35:11] I get a byte. [10:35:23] (when running it on stat1002) [10:36:02] And as I said above ... it seems to be zmq [10:36:02] \ [10:36:13] Try running: [10:36:15] zsub vanadium.eqiad.wmnet:8600 [10:36:18] on stat1002 [10:36:29] That shows you the output [10:36:40] Or rather: zsub vanadium.eqiad.wmnet:8600 | head [10:38:03] (But nc should at least give you a byte) [10:38:24] nc does give me a byte [10:39:03] Oh. Ok :-) [10:39:07] i still do not understand ... though, shouldn't i still see the output with nc even if it is raw? [10:39:25] nc does not talk the zmq protocol on its own. [10:39:38] but isn't tcp at the end? [10:40:17] Yes. But zmq sits on top of tcp. [10:40:25] (Like different iso/osi layer) [10:41:28] Like if you netcat a webserver at port 80, it also waits for an http command before sending web pages. [12:04:19] qchris [12:04:24] Yup. [12:04:35] I was looking ta rates to put a threshold for EL [12:04:45] and i am seeing that again Media viewer events are huge [12:05:04] they are generating about 14 million rows every day [12:05:16] *was looking at rates [12:05:25] 162 events/second? [12:05:35] ~ about that [12:06:32] Didn't we apply sampling to those a week (two weeks) back? 
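The "162 events/second" figure quoted just above follows from the "14 million rows every day" estimate. A minimal sketch of that conversion, assuming the rows arrive evenly over a 24-hour day (the function name is hypothetical, not part of EventLogging):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def rows_per_day_to_events_per_second(rows_per_day):
    """Average event rate, assuming rows arrive evenly across the day."""
    return rows_per_day / SECONDS_PER_DAY


# Daily row estimate taken from the conversation above.
rate = rows_per_day_to_events_per_second(14_000_000)
print(f"{rate:.1f} events/second")  # prints "162.0 events/second"
```

This is only an average; real traffic will peak above it at busy hours.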
[12:06:39] That number seems quite high for this type of data and I think we probably lack the capacity to analyze it [12:06:47] ya but it seems it got reverted. [12:07:18] http://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Miscellaneous+eqiad&h=vanadium.eqiad.wmnet&jr=&js=&v=564667490&m=eventlogging_client-side-events&vl=events&ti=client-side-events [12:07:22] ^ Looks fun :-) [12:07:44] ok, i will send them an e-mail [12:07:46] Counting 162 requests/second. That should be possible. [12:07:58] But do we want to see that many requests/second? [12:08:27] not from just 1 event [12:08:45] check out the break [12:08:46] MediaViewer 61.73% (165.02/sec) [12:08:46] UniversalLanguageSelector-tofu 21.78% (58.23/sec) [12:08:46] PageContentSaveComplete 5.01% (13.40/sec) [12:08:46] NavigationTiming 2.07% (5.53/sec) [12:08:46] TrackedPageContentSaveComplete 1.98% (5.28/sec) [12:08:47] WikimediaBlogVisit 0.99% (2.63/sec) [12:08:47] SignupExpAccountCreationImpression 0.93% (2.48/sec) [12:08:48] PersonalBar 0.80% (2.13/sec) [12:08:48] UniversalLanguageSelector 0.71% (1.90/sec) [12:08:49] PageCreation 0.59% (1.57/sec) [12:08:49] EchoInteraction 0.45% (1.20/sec) [12:08:50] MultimediaViewerNetworkPerformance 0.40% (1.07/sec) [12:09:14] EventLogging can keep up, up to now. [12:09:28] dbstore-analytics is starting to lag a bit for eventlogging. [12:09:36] Currently ~1 hour. [12:10:00] Not sure what the expectations around EventLogging are. [12:10:13] That's why I want tnegrin to publicly announce the EventLogging ownership. [12:10:27] So it gets clear /who/ should make those calls. [12:11:25] (CR) Milimetric: Add Newly Registered Users metric (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/136431 (https://bugzilla.wikimedia.org/65944) (owner: Milimetric) [12:11:56] All in all, it looks like vanadium is currently not choking. [12:12:26] But if not been part of capacity/schema planning, so I have no clue about expectations. 
[12:12:32] s/if/I've/ [12:13:17] where do you see the lag on dbstore-analytics? [12:13:37] In my lag monitoring :-) [12:13:44] But it should be on icinga as well ... [12:13:49] let me find that for you. [12:14:10] There it is "OK slave_sql_lag Seconds_Behind_Master: 7969" [12:14:18] Go to https://icinga.wikimedia.org/icinga/ [12:14:25] Search for the host dbstore1002 [12:14:32] *your* lag monitoring .....mi madre ..... [12:14:39] Look for the "MariaDB Slave Lag: m2" line [12:15:32] what is m2? [12:15:45] The cluster where EventLogging data is sitting. [12:16:15] ok [12:24:05] Mhmm ... by the rate the m2 lag is increasing ... it looks something weird is going on ... [12:53:36] (PS3) QChris: Switch back to normal slave for s1 again [analytics/geowiki] - https://gerrit.wikimedia.org/r/133273 [13:39:04] (CR) Milimetric: [C: 2] Switch back to normal slave for s1 again [analytics/geowiki] - https://gerrit.wikimedia.org/r/133273 (owner: QChris) [15:44:40] ottomata: hi there, are you free at all today, tonight, or tomorrow AM? My intern is in town and I thought it would be nice for her to meet you [15:45:47] you have an intern!? :) [15:45:50] I do! [15:46:49] She was in town for Wiki Conference USA. http://franceshocutt.com/2014/06/02/wikiconference-usa-2014-rundown/ [15:46:59] i'm feeling sicky today, so taking a sick day but still kinda working, and I can't tomorrow. Thursday not possible? [15:47:25] Bleh, no, sorry [15:47:54] rats, hm [15:47:57] where's your intern live? [15:48:03] i assume not in nyc? [15:48:25] no, Seattle [15:48:55] she's also gonna be at Open Source Bridge in Portland in late June [15:48:58] (as will I) [15:49:15] anyway ottomata you ought to concentrate on getting well - it's fine. I'm sure you will get other chances to meet [15:49:39] did you get to see Rachel dC while she was in town? [15:50:41] i didn't, was out of town last weekend [15:50:48] oh, have you been to the cowork space? [15:50:54] yup [15:51:42] how is it? 
[15:54:56] I like it ottomata - it's still improving, e.g., getting power to all the outlets, etc. I wish there were just 1 more privacy booth. But the light is lovely, and the people are reasonable [15:55:50] reasonable! [15:55:50] ha [15:55:52] cool [15:56:44] Like, you know, polite and friendly but noninterfering if you want to be left alone, and sharing free snacks, and so on [15:58:45] aye, hm ,cool [15:58:52] ja wanna try it out, maybe friday or next week hmm [17:00:04] tnegrin: [17:00:04] hiya [17:00:04] hey [17:00:04] got this one going [17:00:04] https://plus.google.com/hangouts/_/gsperkj5l3ki6p3llp65nr3e5qa?authuser=1 [17:00:04] be ther ein 2 secs [17:00:04] ok [17:00:04] https://plus.google.com/hangouts/_/gsperkj5l3ki6p3llp65nr3e5qa [17:00:04] tnegrin: ^ [17:00:05] ugh -- give me a second [17:00:05] this meeting is going long -- I'm going to need to reschedule [17:00:05] how about 4:30? [17:00:05] ottomata: sorry -- I need to reschedule -- 4:30 ET? [17:00:05] sure [17:00:05] although i'm kinda taking a sick day (not sure how to officially do that), want to do it tomorrow? [17:05:02] (PS2) AndyRussG: WIP Create a cohort from campaign membership [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/126927 (owner: Awight) [17:05:04] (CR) jenkins-bot: [V: -1] WIP Create a cohort from campaign membership [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/126927 (owner: Awight) [17:05:06] (CR) AndyRussG: "Just rebasing..." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/126927 (owner: Awight) [17:06:38] DarTar, where are you? [17:06:41] tnegrin, ditto ;p [17:07:02] still in this meeting about AE 2.0, started 2 hours ago [17:07:14] AE? [17:07:23] active editors(tm) [17:07:39] ohh [17:07:48] see, I thought it stood for Aneurysm-Endorsing [17:07:54] that makes far more sense than your explanation [17:11:02] DarTar, toby in there with you? 
[17:11:09] yes [17:11:12] kk [17:11:31] leila has sort of been waiting around by phones for 40 minutes with no luck, and I feel bad [17:13:29] Ironholds: I know, I’ve been trying to keep her informed [17:13:43] yup [17:13:43] but my messages had trouble going through [17:14:05] because of some weird issue with skype [17:14:25] we’re done now, I understand we’ll be starting staff in 5 [17:16:03] kk [20:31:22] looking for tnegrin.. [20:39:12] (PS1) Milimetric: Update for April meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/137164 [20:39:38] (CR) Milimetric: [C: 2] Update for April meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/137164 (owner: Milimetric) [20:39:44] (CR) Milimetric: [V: 2] Update for April meeting [analytics/reportcard/data] - https://gerrit.wikimedia.org/r/137164 (owner: Milimetric) [21:51:47] wow, are all replicas of wikis other than enwiki gone from analytics-slave or have the privs for user:research changed? [21:51:47] likely the latter [21:52:08] springle ^ [21:53:39] probably related to the migration qchris_ was referring to [21:54:03] Mhmm ... interesting [21:54:18] BUt should not be related to the migration I mentioned. [21:55:10] 'show databases' does show the wikis for me. [21:55:15] (Connecting from stat1003) [21:56:13] brb [21:57:51] icinga also looks normal for dbstore1002. [21:59:05] ganglia looks good too (to me) [21:59:30] qchris_: weird, let me check again [21:59:42] Btw I use "user=research" and a password that exists at least since August 2013 [22:00:18] if I connect to analytics- slave as user:research here’s what I get: [22:00:20] dartar [22:00:21] enwiki [22:00:22] information_schema [22:00:24] log [22:00:25] prod [22:00:26] rfaulk [22:00:27] staging [22:00:46] (from a SHOW DATABASES) [22:00:47] want me to act as the tiebreaker? ;p [22:01:18] please do, but I’m also trying to see if there’s anything weird on my end [22:01:21] DarTar: s/slave/store [22:01:23] yep, same here. 
[22:01:29] and yeah, it's analytics-store for the unified dbs [22:01:39] I assume that analytics-slave is just a alt name for s1-analytics-slave. [22:01:46] they appear to contain the same dbs, at least. [22:01:59] ok [22:02:04] Ironholds: Yes, it is just an alias pointing to s1 [22:02:11] * Ironholds nods. cool! [22:02:12] consistently with springle’s notes [22:02:43] qchris_: sorry for the noise [22:02:50] No worries. [22:03:20] analytics-store still has hiccups from time to time. So problems are totally plausible. [22:03:28] kk [22:12:42] O_o http://stats.grok.se/pt.q/top [22:14:25] What ... data: urls? [22:14:27] :-D [22:17:02] Mhmm ... Looking through the requests ... it seems requests to http://pt.wikipedia.org/wiki/data:image/png;base64,i[...] are really prominent these days [22:17:06] ? [22:20:55] some ancient silly client? [22:22:30] No :-/ [22:22:34] Chrome etc. [22:22:58] All Windows. [22:24:12] Meh. I am too tired to debug it right away. [22:24:41] It seems to be requests made by people. [22:24:56] And it happens some time back. [22:25:33] So nothing that is immediately killing people's Wikipedia experience, or else they would have shouted already. [22:25:46] Ironholds: ping? do you have direct realtime access to the mobile varnish logs? if so can you check something for me? [22:25:48] So I guess it can wait until I got some sleep :-) [22:27:19] YuviPanda: Something that I can check real quick? [22:27:50] YuviPanda, pong [22:27:56] ah, qchris_ got it :) [22:28:08] and not realtime, no, but qchris_ may. I have it on a, iirc, 15 minute or 1 hour delay. [22:28:19] qchris_: Ironholds sure! can you check for presence of the string isdca21b7e-c234-4660-9722-8c50f47519d8 in any of them? [22:28:29] just testing the mobile app session tracking code [22:28:34] *winces* [22:28:54] can't you just check EL? [22:29:00] Ironholds: this doesn't go into EL [22:29:06] (I cannot help with that, unless you mean in the UA.) [22:29:11] ..then where does it go? 
[22:29:21] and why are you issuing it? ;p [22:29:34] YuviPanda: Nothing in todays mobile sampled-100 stream. [22:29:45] qchris_: yeah, because I'm just testing :) anyway, it should show up [22:29:59] where? in the UA, in the x_analytics field...? [22:30:08] I mean, if it's not being used in a captured field I'm confused as to why it's there. [22:30:09] YuviPanda: So you mean sampling is in the way? [22:30:18] Let me set up unsampled ... [22:31:24] Ironholds: ah, so instead of sending it to EL on every link clicked, which would kill EL, or using sampling on the client side, we're just appending it to the URL and sending it so we can track sessions that way. either Toby or halfak suggested it at Zurich and I just got around to implementing that. [22:31:36] .. [22:31:42] so it's going to be in hadoop? [22:31:59] in the referer column, presumably? [22:32:23] I wish you...oh man. The best of luck :D. [22:32:28] Hadoop has gone down 3 times for me this week. [22:32:35] Ironholds: unsure yet, we can figure that out later. I do know that mobile access logs are at least *kept* so we can do something about it. [22:32:46] Ironholds: and the app release is going out tomorrow, and I didn't want to accidentally kill EL [22:32:53] yeah, I get that bit. [22:33:15] I'm just bemused by "we don't quite know how we're going to get the data to somewhere we can use it, but we're storing it". Anyway; I probably don't have that, I'm afraid. [22:33:19] YuviPanda: Unsampled logging set up. Grepping for isdca21b7e-c234-4660. [22:33:25] qchris_: woot! :) [22:33:33] But seeing nothing coming in. [22:33:49] qchris_: oh, is this capturing request to /w/api.php? [22:34:20] *requests [22:34:22] Full firehose. [22:34:25] So everything. [22:35:18] qchris_: ah, hmm. grep for appInstallID? [22:35:27] 1 sec... [22:36:04] Ok. 
[22:36:28] Seeing no requests coming in with appInstallID [22:37:17] bah [22:37:25] * YuviPanda wonders what's happening [22:38:13] So the request would look like [22:38:14] http://en.wikipedia.org/w/api.php?abc=appInstallID [22:38:15] ? [22:38:25] If I make such a request by hand, it shows up. [22:38:45] qchris_: hmm, am investigating my code [22:38:49] There was one! [22:39:08] Again! [22:39:09] qchris_: was there one now again? [22:39:12] qchris_: ah, cool! :) [22:39:14] works now, yay [22:39:27] qchris_: can you paste me the full URL? [22:39:36] In here? [22:39:56] qchris_: yeah, has no personal info. that's me making the requests and the UUID there is a temp random [22:40:15] It's not from a wikipedia IP [22:40:26] mhmmm. [22:40:41] qchris_: you can PM me if you want :) I am sitting at a friend's house in the UK [22:40:53] https://en.m.wikipedia.org/w/api.php?sectionprop=toclevel%7Cline%7Canchor&sections=0&noheadings=true&page=FILTERED&appInstallID=FILTERED&action=mobileview&prop=text%7Csections%7Clastmodified%7Cnormalizedtitle%7Cdisplaytitle%7Cprotection%7Ceditable&format=json&onlyrequestedsections=1 [22:41:01] qchris_: cool :) [22:41:02] ^ is that ok? [22:41:20] FILTERED two parts [22:42:18] qchris_: thank you! :) [22:42:30] yw [22:42:47] * qchris_ hears his bed scream for him ... [22:42:55] Anything else I can help? [22:43:03] qchris_: get some sleep, sir! :) [22:43:19] Thank you sir. I will. [22:43:29] See you :-) [22:51:06] Ironholds: actually, curious. do we have stats on how many hits the *new* wikipedia app is getting? detectable based on UA [22:51:53] the one with the new UA format? [22:51:58] Ironholds: yeah [22:52:06] no, but we could if it's a reasonably high-priority. 
[22:52:36] I add that caveat because if I spend one more day trying to get things on hive to work and failing 14 hours in, I will become a professional alcoholic ;p [22:52:53] Ironholds: ah, no, not high priority at all [22:53:07] Ironholds: it might be reasonable priority in about a week's time, though [22:58:02] to see adoption rates? [22:58:30] Ironholds: pretty much, yeah.
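
The session-tracking approach discussed earlier in this log appends an appInstallID query parameter to API request URLs so that sessions can later be reconstructed from the access logs. A minimal sketch of pulling that parameter out of a logged URL, assuming the URL is available as a plain string; the function name is hypothetical and this is not the pipeline actually used:

```python
from urllib.parse import parse_qs, urlsplit


def extract_app_install_id(url):
    """Return the appInstallID query parameter of a URL, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("appInstallID")
    return values[0] if values else None


# Shortened form of the request pasted in the log (values already FILTERED).
example = (
    "https://en.m.wikipedia.org/w/api.php"
    "?action=mobileview&page=FILTERED&appInstallID=FILTERED&format=json"
)
print(extract_app_install_id(example))  # prints "FILTERED"
```

Parsing the query string is stricter than a plain grep: the hand-made test request `http://en.wikipedia.org/w/api.php?abc=appInstallID` matches a grep for the string but carries no appInstallID parameter, so the function returns None for it.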