[08:25:42] Analytics-Tech-community-metrics, ECT-May-2015: Ensure that most basic Community Metrics are in place and how they are presented - https://phabricator.wikimedia.org/T94578#1275554 (Qgil)
[10:32:48] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Troubleshoot EL performance problems on 2015-05-06 - https://phabricator.wikimedia.org/T98588#1275743 (mforns)
[10:34:56] Analytics-EventLogging, Analytics-Kanban: Backfill EL missing data for 2015-05-06 - https://phabricator.wikimedia.org/T98729#1275747 (mforns) NEW a:mforns
[11:04:43] Analytics-EventLogging, Analytics-Kanban, Mobile: EL unable to decode may apps events {oryx} - https://phabricator.wikimedia.org/T96938#1275775 (mforns) In fact, the events are being truncated for being larger than EL can handle. The reason of the exceptional length of the events lies not in the size...
[14:47:40] I just got Kevin's vacation response and went into an infinite loop
[14:47:49] :)
[14:48:29] hrn
[14:48:32] * Ironholds headscratches
[14:48:46] does anyone remember how I made Labs play nicely pointing the web proxy to non-80 ports /last/ time? ;)
[15:00:52] Ironholds: can't you just specify the port in the proxy?
[15:05:18] milimetric, I did but it isn't working. Hrn.
[15:16:37] Ironholds: the other way you might have done it is with an nginx proxy?
[15:16:44] or apache proxy? That's how we do the limn dashboards
[15:17:05] but there are two reasons why my web proxies sometimes appear to not work
[15:17:17] they're forwarding traffic to the port I specify just fine but my instance is rejecting them because:
[15:17:31] 1. I'm not allowed to use that port (there are certain ranges, I forget)
[15:17:51] 2. the firewall or whatever is blocking the traffic
[15:18:24] yeah, I'm very deliberately using the same port setup as for the datavis.wmflabs instance
[15:18:27] (which works)
[15:18:35] I'll debug in a sec, I think. Gotta finish this email :)
[15:24:10] Quarry: Seeing all of created queries in one page - https://phabricator.wikimedia.org/T98688#1276295 (Capt_Swing) See also: # https://phabricator.wikimedia.org/T77948 # https://phabricator.wikimedia.org/T88920
[15:30:12] morning halfak :)
[15:32:16] milimetric, this is bizarre
[15:32:34] the application is definitely accepting 8080 connections, the web proxy claims to be pointing to 8080, and yet it times the hell out
[15:39:07] Analytics-EventLogging, Analytics-Kanban, Mobile: EL unable to decode may apps events {oryx} - https://phabricator.wikimedia.org/T96938#1276349 (mforns) Open>Resolved
[15:53:07] apparently none of the search engineers can see the group because of labs permissions.
[15:56:07] yuvipanda? ;)
[16:25:23] hi folks. ottomata is away on a bike trip this week so i'll do my best to cover for him. please ping me if analytics operational issues come up. however, i won't be available tomorrow morning. also btw i'm not on analytics-internal anymore.
[16:25:44] hi jgage, I am trying to cover the analytics bit of it :)
[16:25:59] thanks for being there with us jgage :)
[16:25:59] great ok :) i will back you up as needed.
[16:26:23] joal can we cancel our devops meeting tomorrow? i need to help my mother move to a new home.
[16:26:29] Sure, np :)
[16:26:33] thanks amigo
[16:39:05] Analytics-Tech-community-metrics, ECT-May-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1276502 (sduenas) We were able to collect around 71700 but now, we're stuck because we're getting pretty 503 errors. Oldest issue we retrieved was T25551 (we use modification...
[16:45:18] Ironholds: you want to open up port 8080 on Special:NovaSecurityGroups
[16:46:06] "no such special page"
[16:46:33] try the singular
[16:49:33] oh, doy
[16:49:36] thanks, yuvipanda :)
[16:49:40] yw
[16:49:50] "Failed to add rule. "
[16:50:13] okay, successfully added rule!
[16:50:33] halfak, got a search weekly checkin thing! We should reschedule our weeklies :)
[16:50:37] (not now, because meeting, but soon)
[16:51:12] Ironholds, OK! Does that mean you won't make the meeting in 10 mins?
[16:54:44] halfak, yep :(
[16:54:54] Ironholds, kk no worries.
[16:58:39] yuvipanda, works perfectly! You is genius
[16:58:46] yw.
[16:59:11] now to go to the French embassy
[17:23:17] Analytics-Tech-community-metrics, ECT-May-2015: Maniphest backend for Metrics Grimoire - https://phabricator.wikimedia.org/T96238#1276693 (epriestley) Offset-based paging costs approximately `O(offset)` (we must load, policy filter, and then discard all results to compute the offset) so the cost of the qu...
[17:42:03] halfak, milimetric http://searchdata.wmflabs.org/ may interest
[17:42:11] dynamic dashboarding with ~40 lines of R
[17:42:26] :) yay
[17:42:57] Ironholds, cool! Does the 40 lines include the generation of the dataset?
[17:44:03] naw, that's an additional 10
[17:44:21] I should also say it dynamically updates the dataset as new entries come in, up to and including "updating your graph in front of you"
[17:50:04] Ironholds, that's pretty cool. 50 lines total and I can stand something like this up?
[17:50:23] halfak, yep! oh, and all of that text?
[17:50:27] that's stored in markdown
[17:50:37] visualisation platforms where I never have to touch HTML to update things: HELL YES.
[17:50:59] and if you want to get more complicated you can insert custom CSS or HTML from file or inline, but the purpose is if you don't want to, you don't have to
[17:51:14] HTML isn't so bad.
[17:51:19] Browsers such
[17:51:20] *k
[17:51:31] Markdown doesn't help with that :|
[17:51:45] that's true
[17:51:56] still, it's fun!
[17:52:10] it also means I've successfully reconstructed Limn in a form Someone Else maintains
[17:52:11] Indeed. I'd still rather Markdown generally for long-form text.
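The 8080 symptom above (the application accepts connections on the instance, the Labs web proxy points at 8080, yet requests time out) was fixed by adding a rule on Special:NovaSecurityGroup. A minimal Python sketch of a diagnostic that separates milimetric's two failure modes: run it against localhost on the instance, then from another host. It only checks plain TCP reachability; the function name is hypothetical and any host/port you pass is illustrative.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'.

    Other OSErrors (DNS failure, unreachable network) are left to propagate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on that port/interface.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are likely being dropped, e.g. by a firewall rule.
        return "timeout"
```

If the local probe returns "open" but the remote one returns "timeout", traffic is being dropped before it reaches the application, which is the firewall/security-group case; "refused" instead suggests the app is not listening on the interface the proxy targets.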
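epriestley's comment above (truncated in the relay) likely explains why the Metrics Grimoire Maniphest crawler got into trouble after collecting around 71700 items: with offset paging each request loads and discards every row before the offset, so crawling n items in pages of size k costs roughly n²/(2k) row loads in total. The toy Python model below counts those loads and, for contrast, runs the same crawl with cursor (keyset) paging that seeks past the last id seen. The store, its methods and the cursor approach are illustrative assumptions; the truncated comment does not show which alternative was recommended or what the Phabricator API offered at the time.

```python
import bisect
from typing import List, Optional


class ToyTaskStore:
    """Hypothetical in-memory stand-in for a paged task API.

    rows_loaded counts how many rows the server touches per crawl.
    """

    def __init__(self, n_tasks: int) -> None:
        self.ids = list(range(1, n_tasks + 1))  # sorted, like a primary key
        self.rows_loaded = 0

    def page_by_offset(self, offset: int, limit: int) -> List[int]:
        # Offset paging: load everything up to offset + limit, then drop the
        # first `offset` rows. Cost grows with the offset.
        loaded = self.ids[: offset + limit]
        self.rows_loaded += len(loaded)
        return loaded[offset:]

    def page_after(self, after: Optional[int], limit: int) -> List[int]:
        # Cursor paging: seek directly past the last id seen (as an index
        # lookup would), so each request only loads up to `limit` rows.
        start = 0 if after is None else bisect.bisect_right(self.ids, after)
        page = self.ids[start : start + limit]
        self.rows_loaded += len(page)
        return page


def crawl_by_offset(store: ToyTaskStore, limit: int) -> List[int]:
    seen: List[int] = []
    while True:
        page = store.page_by_offset(len(seen), limit)
        seen.extend(page)
        if len(page) < limit:  # short or empty page means we reached the end
            return seen


def crawl_by_cursor(store: ToyTaskStore, limit: int) -> List[int]:
    seen: List[int] = []
    while True:
        page = store.page_after(seen[-1] if seen else None, limit)
        seen.extend(page)
        if len(page) < limit:
            return seen
```

For 1,000 tasks in pages of 100, the offset crawl in this model loads 6,500 rows while the cursor crawl loads 1,000, and the gap widens quadratically as the collection grows.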
[17:52:17] exactly
[17:52:27] the trends/outages sections, ferinstance
[18:02:46] Ironholds: you've got a few composable iterators and abstract factories to go before you can claim you've reconstructed limn. I would recommend against that :)
[18:03:02] milimetric, from the user end, then ;p
[18:03:03] Ironholds: you and I should talk about dashiki and where it's at
[18:03:06] we got to work on it a bit
[18:03:11] cool!
[18:03:18] getting data to labs is still my biggest bugbear
[18:03:34] I gotta go to lunch now but feel free to schedule me anytime after
[18:15:25] Wikimetrics has been stalled for like 30 minutes trying to validate a cohort.
[19:35:49] joal, yt?
[19:46:42] ragesoss: how many users in the cohort?
[19:47:07] milimetric: I tried with just 1.
[19:47:13] same problem.
[19:47:25] the actual cohort I'm interested in is about 2100.
[19:47:38] yeah, that's very small, lemme check
[19:50:31] Hi milimetric.
[19:50:46] hi halfak
[19:50:50] It looks like stat1002 is having some issues (/tmp is filling up)
[19:50:53] ragesoss: i bounced the server, wanna try again?
[19:50:59] ottomata seems to be out.
[19:51:01] oooh... andrew's gone for 4 days :(
[19:51:05] So I'm not sure who to turn to.
[19:51:11] Any ideas?
[19:51:26] I'd file a phab bug with operations, ping jgage about it, and go from there
[19:51:46] jgage: stat1002 tmp is filling up, you're our only hope
[19:52:04] :)
[19:52:40] milimetric, kk filling out the phab ticket
[19:54:44] milimetric, i'm doing a tech talk on thursday, want to be a local backup?
[19:56:39] yurik: what in the world is a human backup? :)
[19:57:12] backs up another human in case of an internet failure :)
[19:57:34] you doing the talk in Philly?
[19:58:54] yurik: oh! you mean a tech talk for WMF
[19:58:59] yep
[19:59:02] sure, you can send me the slides
[19:59:08] no slides yet :)
[19:59:12] will work on those soon
[19:59:14] well, when you have them :)
[19:59:18] sure
[19:59:23] want to make a few examples too?
[19:59:54] i think it would be better to show a demo than slides
[20:00:07] just line up a few demo graphs
[20:00:20] the slides would be useful at first to explain the main principle
[20:00:29] yeah, a few slides for some basics
[20:00:38] yep
[20:00:40] demo the rest of the time
[20:00:43] and discussion!
[20:00:47] yep yep
[20:01:04] ok, let's practice for the wikimania :)
[20:01:09] I think the current Vega grammar is too wonky
[20:01:19] brb, meeting
[20:03:10] yeah
[20:03:16] that's where templates come in
[20:22:15] hi halfak
[20:22:37] I have seen stat1002 having issues
[20:26:51] ori took care of it a while ago ;)
[20:45:55] thanks milimetric, that seems to have fixed it.
[21:03:28] (PS3) Milimetric: Add error message to failed reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203241 (https://phabricator.wikimedia.org/T88610) (owner: Mforns)
[21:14:45] Analytics-EventLogging, Analytics-Kanban: Forwarder modification to produce to multiple outuputs, 1 to zmq and 1 to Kafka - https://phabricator.wikimedia.org/T98779#1277503 (ggellerman) NEW a:JAllemandou
[21:15:13] Analytics-EventLogging, Analytics-Kanban: Forwarder modification to produce to multiple outuputs, 1 to zmq and 1 to Kafka [8 points] - https://phabricator.wikimedia.org/T98779#1277513 (ggellerman)
[21:15:16] milimetric: getting the new laptop today!
[21:15:31] madhuvishy: happy birthday indeed!
[21:16:02] milimetric: ha ha i also got new keyboard mouse and all that. so many gifts from office :)
[21:16:27] Analytics-EventLogging, Analytics-Kanban: Forwarder modification to produce to multiple outuputs, 1 to zmq and 1 to Kafka [8 points] - https://phabricator.wikimedia.org/T98779#1277503 (ggellerman) a:JAllemandou>mforns
[21:18:00] Analytics-EventLogging, Analytics-Kanban: Configuration for Python & Kafka in Beta labs [8 points] - https://phabricator.wikimedia.org/T98780#1277521 (ggellerman) NEW a:madhuvishy
[21:22:10] Analytics-EventLogging, Analytics-Kanban: Code to write a new Camus consumer and store the data in two Hive tables [21 points] - https://phabricator.wikimedia.org/T98784#1277563 (ggellerman) NEW a:JAllemandou
[21:22:40] Analytics-EventLogging, Analytics-Kanban: Code to write a new Camus consumer and store the data in two Hive tables [21 points] - https://phabricator.wikimedia.org/T98784#1277571 (ggellerman)
[21:27:53] (CR) Milimetric: [C: 2] Add error message to failed reports (2 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/203241 (https://phabricator.wikimedia.org/T88610) (owner: Mforns)
[21:28:00] (PS2) Milimetric: Permit rerun of failed report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208017 (https://phabricator.wikimedia.org/T88610) (owner: Mforns)
[21:40:16] (PS3) Milimetric: Permit rerun of failed report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208017 (https://phabricator.wikimedia.org/T88610) (owner: Mforns)
[21:40:25] (CR) Milimetric: Permit rerun of failed report (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208017 (https://phabricator.wikimedia.org/T88610) (owner: Mforns)
[21:40:33] (CR) Milimetric: [C: 2] Permit rerun of failed report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208017 (https://phabricator.wikimedia.org/T88610) (owner: Mforns)
[21:40:59] (PS5) Milimetric: Reduce size of public reports folder [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756) (owner: Mforns)
[21:49:56] (CR) Milimetric: [C: 2] "Good trick, this should help the size a lot." [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/208637 (https://phabricator.wikimedia.org/T95756) (owner: Mforns)
[21:52:32] Anyone know how to test event logging in production, esp. after the https://phabricator.wikimedia.org/T98404 change?
[21:52:37] leila: yuvipanda ^
[21:54:02] bearND: I am not sure what you mean by 'test in production' :)
[21:56:08] joal: so, the event logging endpoint is going to change. So, there are patches outstanding to have the apps change those. Now we need a way to verify the events still come through when we do that.
[21:56:27] joal: the zsub command doesn't seem to produce output anymore.
[21:57:05] I didn't know zsub bearND
[21:57:29] If the endpoint has not changed yet, I assume there is no easy way to test ?
[21:57:56] bearND: I'm heading to a meeting in a minute and will be back in an hour. can look into this with you when I'm back. In the meantime: there may be delays between the actual event time and the time you see them in the raw files or DBs
[21:58:04] Or I don't know them :) Which is more than probable !
[21:58:13] have you waited for some time bearND?
[21:58:56] joal: nuria created a wiki page to help verify event logging, but only if we use a temporary endpoint, instructions are here: https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs
[21:59:44] yes bearND, this uses test environment
[22:00:13] And I don't know if the new endpoints are set up already in the new env
[22:00:18] in the test env sorry
[22:04:15] leila: great. Thanks. As I was saying the zsub command on stat1003 does not produce any output, even after waiting for several minutes.
[22:10:35] leila: i'm going to get my colleague coreyfloyd to join this channel as well, since it's getting too late for me (am currently in Europe). You can also mention him. I'm going to check the log tomorrow morning if I'm not around by then.
[22:10:49] leila: So, my main questions right now are: Who is the point person for event logging? Where/how can I access real event logging data?
[22:29:39] leila / milimetric : echoing bearND's question, is there a file i can tail or a topic i can subscribe to with zsub or its like for verifying hot event log data? or is mysql kind of the main way to verify stuff has gone through?
[22:31:11] dr0ptp4kt: you're probably interested in events that end up in the mysql database, right?
[22:31:17] there's a whole pipeline leading up to that
[22:31:40] and there are logs, and you can watch zmq and all that
[22:31:55] but you just care about the end result that you can run computation on, right?
[22:32:03] bearND: here
[22:32:11] hey coreyfloyd
[22:32:14] coreyfloyd: great!
[22:32:20] sup all
[22:32:20] I was just seeing bearND's question about EL
[22:32:30] all of us in analytics are point people for EL
[22:32:37] I'm around now so I can help
[22:33:04] milimetric: Yes, coreyfloyd and I want to verify if EL still works before we merge the respective patches that change the EL endpoints
[22:33:12] milimetric: cool - so I have dr0ptp4kt helping me right now… we were sending some events over the new API to make sure they are in production
[22:33:19] gotcha
[22:33:26] ok, so does any of you have access to stat1003?
[22:33:36] from there, you can access the mysql server on analytics-store.eqiad.wmnet
[22:33:39] milimetric: yes we have access to stat1003
[22:33:44] and on that server, the "log" database is the event logging db
[22:33:46] milimetric: i don’t have access to production analytics db
[22:34:08] coreyfloyd: there are many dbs for different things
[22:34:13] EL has a production master/slave cluster
[22:34:17] but that's not what people touch
[22:34:40] milimetric: how do I know if I have access? Is there an access list somewhere?
[22:34:42] that data is replicated to an analytics box, which is where 99% of the people use the data
[22:34:50] coreyfloyd: ssh stat1003.eqiad.wmnet
[22:35:15] from there, the credential file for the mysql box is at: /etc/mysql/conf.d/research-client.cnf
[22:35:17] so you can do:
[22:35:22] mysql --defaults-file=/etc/mysql/conf.d/research-client.cnf -hanalytics-store.eqiad.wmnet
[22:35:31] coreyfloyd: no I do not have access…
[22:35:40] milimetric: i mean
[22:35:43] that was very meta
[22:35:46] :)
[22:35:58] ok, so file a ticket in phabricator under Ops-Access-Requests
[22:36:03] (cc that project I mean)
[22:36:19] one sec coreyfloyd, I'll find you a similar request
[22:36:20] milimetric: same here. I'm not in researchers group
[22:36:58] dr0ptp4kt: milimetric: is it cool if we apps ppl become members of researchers?
[22:37:10] bearND / dr0ptp4kt / coreyfloyd: two things
[22:37:22] first, you need stat1003 access if you want to analyse the data, not *test*
[22:37:25] milimetric: i'm in researchers. but
[22:37:36] that's very important, testing should happen on beta labs, and I'll help you with that in a sec
[22:37:52] second, here's a similar ticket (notice they also want access to "test" which I'm about to chastise them on :))
[22:37:53] https://phabricator.wikimedia.org/T98209
[22:38:01] milimetric: what's the zmq syntax or what's the file to tail for prod?
[22:38:31] dr0ptp4kt: isn't it easier to look at the db?
[22:38:35] milimetric: looks like there's about a 20 minute delay on the table in question for stat1003 mysql
[22:38:41] milimetric: so we don’t want to “test” really. We are modifying the endpoint for analytics… so we need to make sure it works
[22:39:05] coreyfloyd: is the setup in beta labs not enough for that?
[22:39:29] milimetric: i don’t think so, right?
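dr0ptp4kt observed about a 20 minute delay before events showed up in the "log" database on analytics-store, so a zero count right after sending proves nothing. A minimal sketch of a lag-aware check, written against Python's built-in sqlite3 so it runs anywhere: it counts rows for a user-agent prefix and wiki since the events were sent, and reports whether enough time has passed for the count to be conclusive. The function name, the table name, the column names timestamp, userAgent and wiki, and the 14-digit UTC timestamp format are all assumptions about the EventLogging table layout, not confirmed in this log. Against the real MySQL server you would use a MySQL driver, whose placeholder style (commonly %s) differs from sqlite3's ?.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Tuple

# ASSUMPTION: timestamps are 14-digit UTC strings (YYYYMMDDHHMMSS), so string
# comparison matches chronological order.
TS_FORMAT = "%Y%m%d%H%M%S"


def check_sent_events(
    conn: sqlite3.Connection,
    table: str,
    ua_prefix: str,
    wiki: str,
    sent_at: datetime,
    now: datetime,
    expected_lag: timedelta,
) -> Tuple[int, bool]:
    """Return (count, conclusive) for matching events at or after `sent_at`.

    `conclusive` is False while `now - sent_at` is still inside the expected
    ingestion lag, i.e. a zero count at that point is not evidence of breakage.
    """
    # Table names cannot be bound parameters; only pass trusted names here.
    sql = (
        f'SELECT COUNT(*) FROM "{table}" '
        "WHERE timestamp >= ? AND userAgent LIKE ? AND wiki = ?"
    )
    params = (sent_at.strftime(TS_FORMAT), ua_prefix + "%", wiki)
    (count,) = conn.execute(sql, params).fetchone()
    return count, now - sent_at >= expected_lag
```

Filtering on a narrow slice (one wiki such as testwiki, one user-agent prefix such as WikipediaApp/4) mirrors what was done in the conversation to tell freshly sent test events apart from normal traffic.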
[22:39:37] dr0ptp4kt: true about the delay, the logs are not accessible though
[22:39:38] milimetric: the whole point is the production end point is changing
[22:39:45] the zmq syntax is on the EL wiki, one sec
[22:40:10] milimetric: so if we change the endpoint how do we know if it actually works without looking at the production db?
[22:40:11] coreyfloyd: help me understand what this endpoint is
[22:40:21] milimetric: just one sec…
[22:41:01] milimetric: here's the Android patch of the change: https://gerrit.wikimedia.org/r/#/c/208315/6/wikipedia/src/main/java/org/wikipedia/analytics/EventLoggingEvent.java
[22:41:25] milimetric: old endpoint: "https://bits.wikimedia.org/event.gif" new endpoint: "https://meta.wikimedia.org/beacon/event"
[22:42:30] ori made a similar patch for the iOS app
[22:43:04] milimetric: looks like a ton of events just flushed, so i see those in mysql now. cc bearND coreyfloyd
[22:43:15] ah, this change. An equivalent change should be done in beta labs and EL code tested there
[22:43:23] milimetric: if you do happen to find that syntax for tailing / subscribing with a grep lemme know.
[22:43:34] dr0ptp4kt: I couldn't find it, looked all over mediawiki
[22:44:15] milimetric: yeah, the beta stuff looked good, just wanted to make sure on prod. looks like we're good. confined the app to using testwiki instead of one of the other true language wikis. that's high enough confidence for today given that there aren't a lot of WikipediaApp/4* UAs actually hitting the endpoint in question
[22:44:16] milimetric: ok - thanks for the help - dr0ptp4kt can see the new events… milimetric we will also update our beta urls… we just didn’t want to ship this without knowing it works.
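The change discussed above swaps the base URL the apps send events to, from https://bits.wikimedia.org/event.gif to https://meta.wikimedia.org/beacon/event. A minimal Python sketch of building and decoding a beacon URL, assuming (not confirmed in this log) that the client serializes the event as JSON, percent-encodes it into the query string and terminates it with ';'. The function names and the event fields used in any example are hypothetical. Checking that a payload round-trips unchanged under both base URLs is a cheap sanity check before going looking for rows in the database.

```python
import json
from urllib.parse import quote, unquote

# Base URLs quoted in the conversation above.
OLD_ENDPOINT = "https://bits.wikimedia.org/event.gif"
NEW_ENDPOINT = "https://meta.wikimedia.org/beacon/event"


def beacon_url(event: dict, endpoint: str = NEW_ENDPOINT) -> str:
    """Build a beacon URL for one event.

    ASSUMPTION: compact JSON, fully percent-encoded (safe=''), placed in the
    query string and terminated with ';'.
    """
    payload = json.dumps(event, separators=(",", ":"), sort_keys=True)
    return f"{endpoint}?{quote(payload, safe='')};"


def decode_beacon(url: str) -> dict:
    """Inverse of beacon_url: recover the event dict from a beacon URL."""
    _, _, query = url.partition("?")
    # ';' inside the payload is always percent-encoded, so stripping it
    # only removes the terminator.
    return json.loads(unquote(query.rstrip(";")))
```

Keeping the base URL as a single parameter means the payload encoding is untouched by the endpoint switch, so only the destination needs verifying.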
[22:44:56] coreyfloyd: cool, I think ideally we want to get to a point where if people's EL is working in beta, we can guarantee it works in prod
[22:45:24] that way people don't have to worry about all these nasty details and get access to stuff they don't need
[22:46:56] dr0ptp4kt: the lag used to be up to two hours last week before we improved performance. But I don't think you can get to the logs anymore unless you have access to eventlog1001.eqiad.wmnet, which almost nobody does
[22:47:19] that's kind of a good thing since there's a lot more PII in EL events than there used to be
[22:48:05] milimetric: yeah, no access there. couldn't find stuff with find / -name udp2zmq 2>/dev/null on 02 and 03 boxen
[22:48:20] dr0ptp4kt / coreyfloyd: for the future, also, if you're just trying to double check something like this, let me know what to look for and I'm happy to go poke around
[22:48:24] milimetric: was looking at https://wikitech.wikimedia.org/wiki/EventLogging
[22:48:43] dr0ptp4kt: all documentation for EL should be on mediawiki.org, our docs are a total mess