[00:07:51] Analytics, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1707539 (GWicke)
[00:08:10] Analytics, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1707320 (GWicke)
[00:08:57] Analytics, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1707320 (GWicke) I went ahead and hijacked the task description with a summary of my own. Please edit to reflect the discussion more accurately!
[00:09:21] Analytics, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1707542 (GWicke)
[00:11:13] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Microtask: Create a very simple REST API - https://phabricator.wikimedia.org/T114838#1707546 (jgbarah) NEW
[00:12:11] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Microtask: Create a very simple REST API - https://phabricator.wikimedia.org/T114838#1707546 (jgbarah)
[00:14:52] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1707561 (jgbarah)
[00:16:40] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1088193 (jgbarah) >>! In T60585#1706581, @Oakshweta11 wrote: > @jgbarah could you suggest what I should do next? I would...
[00:18:15] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1707575 (jgbarah) >>! In T60585#1677452, @Simmimourya3107 wrote: > Is there any chance that this project gets proposed for...
[01:49:01] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1707671 (Milimetric)
[01:54:16] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1707673 (Milimetric) Here's what I understood from our last discussion, I think it's fairly close to the task summary, but it's a bit simpler in my mind. We have two ways of fro...
[02:47:08] Analytics-Tech-community-metrics: Microtask: Create a very simple REST API - https://phabricator.wikimedia.org/T114838#1707725 (NiharikaKohli)
[07:24:48] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1707946 (Anmolkalia) Hi, @jgbarah, I am facing a problem in running the original mediawiki_analysis.py. I am writing this in the terminal "./mediawiki_analysis.py --database mwdb --db-user root...
[07:44:54] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1707960 (jgbarah) >>! In T114437#1704569, @Anmolkalia wrote: > Hi, I have a doubt. The dbms we will be using in the backend is still MySQL, right? So, I should continue using MySQL datatypes? Sib...
[07:47:12] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1707962 (jgbarah) >>! In T114437#1704738, @Anmolkalia wrote: > Hi, @jgbarah, this is what the mapping file looks like. Let me know if this is fine. {F2661511} At first glance, it seems reas...
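For readers following the SQLAlchemy port discussion above, here is a minimal sketch of what such a declarative mapping can look like. The table and column names are made up, not MediaWikiAnalysis' real schema; using SQLAlchemy's generic types while pointing the engine at MySQL (mirroring the --database mwdb --db-user root flags quoted above) is one common way to keep the MySQL backend without hard-coding MySQL datatypes.

```python
# Hypothetical mapping sketch, not the actual MediaWikiAnalysis schema.
from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Page(Base):
    """One row per wiki page; generic types let the MySQL dialect pick the
    concrete column types (Integer -> INT, String(255) -> VARCHAR(255))."""
    __tablename__ = "pages"
    id = Column(Integer, primary_key=True)
    title = Column(String(255), nullable=False)
    created = Column(DateTime)

# Engine URL mirroring the CLI flags quoted above (database mwdb, user root).
engine = create_engine("mysql://root@localhost/mwdb")
Base.metadata.create_all(engine)      # emits CREATE TABLE statements for MySQL
Session = sessionmaker(bind=engine)   # session factory for inserts/queries
```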
[07:49:03] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1707963 (jgbarah) >>! In T114437#1707946, @Anmolkalia wrote: > Hi, @jgbarah, I am facing a problem in running the original mediawiki_analysis.py. I am writing this in the terminal "./mediawiki_a...
[11:48:47] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production. - https://phabricator.wikimedia.org/T114881#1708445 (mforns) NEW a:mforns
[11:50:12] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} - https://phabricator.wikimedia.org/T114881#1708457 (mforns)
[11:58:55] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} - https://phabricator.wikimedia.org/T114881#1708469 (mforns)
[12:03:43] (PS2) Joal: Add CassandraXSVLoader to refinery-job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174)
[12:18:48] Analytics-Backlog, Analytics-Wikimetrics: Some special characters break Wikimetrics' encoding {dove} - https://phabricator.wikimedia.org/T114884#1708492 (mforns) NEW
[12:19:06] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} - https://phabricator.wikimedia.org/T114881#1708501 (mforns) The problem was that a user inserted a tag with the character(s): ``` ՛ր ``` And it seems that the encoding we use does not support that. It...
[12:19:40] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} [3 pts] - https://phabricator.wikimedia.org/T114881#1708504 (mforns)
[12:33:52] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} [2 pts] - https://phabricator.wikimedia.org/T114881#1708524 (mforns)
[12:40:44] Analytics-Backlog, Analytics-Wikimetrics: Some special characters break Wikimetrics' encoding {dove} - https://phabricator.wikimedia.org/T114884#1708539 (mforns)
[12:42:59] (PS4) Joal: [WIP] Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (https://phabricator.wikimedia.org/T108174)
[12:45:41] (CR) Joal: [C: -1] "Need to remove comments on pom.xml" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/232448 (https://phabricator.wikimedia.org/T108174) (owner: Joal)
[13:33:23] Analytics-Tech-community-metrics, DevRel-October-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1708649 (Qgil)
[13:33:38] Analytics-Tech-community-metrics, Developer-Relations, DevRel-October-2015: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1708654 (Qgil)
[14:06:43] Analytics-EventLogging, Performance-Team, Patch-For-Review: Support kafka in eventlogging client on terbium - https://phabricator.wikimedia.org/T112660#1708738 (Ottomata) See: https://phabricator.wikimedia.org/T109567#1683710 and https://phabricator.wikimedia.org/T114199
[14:07:07] Analytics-Backlog, Analytics-EventLogging: Upgrade eventlogging servers to Jessie - https://phabricator.wikimedia.org/T114199#1708740 (Ottomata)
[14:07:09] Analytics-EventLogging, Performance-Team, Patch-For-Review: Support kafka in eventlogging client on terbium - https://phabricator.wikimedia.org/T112660#1708739 (Ottomata)
[14:17:57] Analytics, Services: restbase is not listening on port 7231 on aqs* - https://phabricator.wikimedia.org/T114742#1708783 (mobrovac) Open>Resolved AQS is now up && running, resolving.
[14:17:59] Analytics-Kanban, netops, operations, Patch-For-Review: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts] {slug} - https://phabricator.wikimedia.org/T107056#1708785 (mobrovac)
[14:25:57] Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1708828 (Oakshweta11) I'll start with T114838: Microtask: Create a very simple REST API Thanks.
[14:27:17] ottomata: i did miss a lot, eh? what? you are writing js now?
[14:28:04] haha
[14:28:06] yes! :)
[14:30:40] (CR) Ottomata: [C: 2 V: 2] Setting up wmf branch to use archiva.wikimedia.org and maven release plugin [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/244043 (owner: Ottomata)
[14:31:36] hehe, ooo
[14:31:39] it does look similar
[14:31:43] oops, wrong chat
[14:35:09] ottomata: do you have a sec?
[14:35:50] (PS1) Ottomata: Fix scm url for gerrit camus [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/244161
[14:36:07] (CR) Ottomata: [C: 2 V: 2] Fix scm url for gerrit camus [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/244161 (owner: Ottomata)
[14:36:57] nuria: yes
[14:37:12] ottomata: ok, question about madhuvishy's patch
[14:37:31] ottomata: seems to me that we do not want schemas here: https://gerrit.wikimedia.org/r/#/c/243990/
[14:37:49] ottomata: but rather in an outside depot that both mediawiki and our java apps can share
[14:38:23] ottomata: because we will not be able to deploy the
[14:38:32] ottomata: refinery camus to mw stack, right?
[14:38:42] nuria: eventually yes
[14:38:44] but we don't have that yet
[14:38:51] and we can't create it now without a lot of bikeshedding
[14:39:04] ottomata: but search code is almost done right?
[14:39:42] ottomata: they are having to duplicate those schemas to unit test at least
[14:39:46] ottomata: as things are now
[14:40:14] ottomata: right?
[14:40:24] ottomata: so we are having schemas in two places
[14:40:26] nuria: at the moment, yes, these schemas will be duplicated.
[14:40:36] we will talk about this problem on thursday :)
[14:40:47] ottomata: ok
[14:52:48] joal: i owe you a million crs
[14:53:04] ottomata: did you put the camus artifacts in archiva?
[14:55:10] (CR) Nuria: "Let's talk about this today." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (owner: Madhuvishy)
[14:55:18] Analytics-Kanban: Spike: understand wikistats enough to estimate replacing pageview data source {lama} [8 pts] - https://phabricator.wikimedia.org/T114660#1708905 (Milimetric) a:Milimetric
[14:56:00] nuria: am trying...
[14:56:13] we are trying to make the camus wmf branch able to mvn release to archiva
[14:56:35] ottomata: every time it builds?
[14:56:39] no
[14:56:42] ottomata: ahhh
[14:56:44] the same way we do for refinery
[14:56:45] Analytics-Kanban: Gain permission to delete articles on wikitech (needed for doc cleanup) [3 pts] - https://phabricator.wikimedia.org/T114672#1708908 (Milimetric) a:Milimetric
[14:57:12] Analytics-Kanban: Gain permission to delete articles on wikitech (needed for doc cleanup) [3 pts] - https://phabricator.wikimedia.org/T114672#1708914 (Nuria) And mediawiki, right?
[15:00:04] nuria: don't worry :)
[15:00:18] joal: which one is the most pressing one?
[15:00:42] ottomata: let me know if i can help with archiva stuff, did madhu "fake" maven yesterday to build her jars?
[15:00:45] cc madhuvishy
[15:00:50] joal: keyspaces are there!
[15:01:22] nuria: https://gerrit.wikimedia.org/r/#/c/240868/ would be great (not in detail, camus needs to be pushed to archiva first, but the approach and so on)
[15:01:29] milimetric: awesome!
[15:02:26] ottomata: Thanks for the camus mvn stuff!
[15:04:02] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1708950 (Nuria) > but we should not make correlating the webrequest/pageviews dataset with mediawiki logs easier, mmm... this is precisely what this ticket i...
[15:04:27] joal: looking
[15:05:23] joal: idea is to use a spark job for monitoring, correct?
[15:05:25] joal: nuria, it's mostly working, i'm trying to do release:perform now
[15:05:38] i think it's working
[15:05:43] nuria: it's not even spark: it's core java :)
[15:06:01] ottomata: you have 0.1.0-wmf6 in archiva (com.linkedin.*)
[15:06:02] ottomata: so, should i be able to build
[15:06:28] ottomata: is that what is expected?
[15:06:39] nuria: I think you should be able to build my code yes :)
[15:06:44] ottomata: Still: Failed to execute goal on project refinery-camus: Could not resolve dependencies for project org.wikimedia.analytics.refinery.camus:refinery-camus:jar:0.0.20-SNAPSHOT: The following artifacts could not be resolved: com.linkedin.camus:camus-schema-registry:jar:0.1.0-wmf6, com.linkedin.camus:camus-etl-kafka:jar:0.1.0-wmf6,
[15:06:44] com.linkedin.camus:camus-kafka-coders:jar:0.1.0-wmf6: Failure to find com.linkedin.camus:camus-schema-registry:jar:0.1.0-wmf6 in https://archiva.wikimedia.org/repository/mirrored/ was cached in the local repository, resolution will not be reattempted until the update interval of wmf-mirrored has elapsed or updates are forced -> [Help 1]
[15:07:08] mwarf :(
[15:07:50] nuria, that's weird: I can see the jar in archiva
[15:08:06] joal: man .. mvn every time!
[15:08:11] let me re-build
[15:08:12] :D
[15:09:08] Analytics-Kanban: Gain permission to delete articles on wikitech and mediawiki (needed for doc cleanup) [3 pts] - https://phabricator.wikimedia.org/T114672#1708959 (Milimetric)
[15:09:47] nuria: they aren't uploaded yet, am working on it
[15:10:04] ottomata: k, that's what i thought
[15:10:05] Ah, right, I can see them on search though ... weird
[15:10:13] Nuria: yes, I did mvn install:install-file locally to get my build to work
[15:10:29] And yes, those schema registry classes were only for testing
[15:11:24] The dummy schema one is in a different package now - org.wikimedia.. And that's the one I'm using in the properties
[15:11:39] So I'm sure it's using the right class
[15:12:17] joal: i think some of them went...
[15:12:22] it's in the middle of it
[15:12:25] and it failed once before
[15:13:59] madhuvishy: wait, the bindings (which is what we use) are in the same package right? https://gerrit.wikimedia.org/r/#/c/243990/3/refinery-camus/src/main/java/com/linkedin/camus/example/records/DummyLog2.java
[15:14:24] madhuvishy: ah wait, no
[15:14:28] Yup
[15:14:35] It's in my new one
[15:15:29] madhuvishy: right, but on the same "package" com.linkedin.camus.example.records;
[15:15:40] So, I wanted to test my new setup, so I thought I would make a search schema registry, and make sure everything else is working. When that didn't work, I put the dummy one in there so I could see if it was something else
[15:16:03] madhuvishy: I see, there are several things
[15:16:06] Nuria: true
[15:16:27] the dummy one (since it exists in the other jar), if you have the other jar, is being loaded from there
[15:16:37] Can be fixed by changing the namespace on the local class and trying
[15:16:54] And I didn't put the example one on the path
[15:17:05] ahhh, ok, that makes sense
[15:17:15] the other thing is bindings
[15:17:34] madhuvishy: does your jar have bindings for the cirrus search schema?
[15:17:40] Yes
[15:17:49] I pushed it with the source too
[15:19:23] It's in /home/madhuvishy/avro-kafka/refinery-camus-0.20 something jar
[15:19:26] hey milimetric
[15:19:33] hey hey
[15:19:39] nuria: ^
[15:19:52] the top 'all-days' for the top end-point, is it 'all-day' or 'all-days'?
[15:20:08] madhuvishy: ok, and you built it by adding the jars to your ~/.m2?
[15:20:12] milimetric: --^
[15:20:32] all-days: https://github.com/wikimedia/restbase/blob/master/specs/analytics/v1/pageviews.yaml#L193
[15:21:33] joal: if you're still testing that's cool, but I think the per-article data should be loaded first if possible, that's the one people are blocked on us for
[15:22:11] (CR) Nuria: [WIP] Add refinery-camus module (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (owner: Madhuvishy)
[15:22:46] ay so many things ... i am getting lost ....
[15:23:09] nuria: yup, let's catch up after standup
[15:23:10] madhuvishy: ok, i put on the patch all the things i wanted to capture
[15:23:39] madhuvishy: ok
[15:23:51] I do wanna have a common registry that does topic-schema mapping and returns the appropriate schema
[15:24:01] Both those registries should ideally be gone
[15:24:03] madhuvishy: ya, let's do that today
[15:24:18] But I'd first like the search schema to work
[15:24:26] madhuvishy: the topic/schema mapping will be in the properties file
[15:24:36] Yup
[15:24:40] madhuvishy: we will just pass a property that tells us what schema to load
[15:24:46] and we will load it at runtime
[15:24:49] milimetric: starting tonight (still a few things to have squared)
[15:25:08] madhuvishy: what did you use to generate bindings?
[15:25:27] madhuvishy: the mvn plugin?
[15:27:27] In Wikimedia Report Card terms, what does "reach" refer to?
[15:27:38] Further, are there any breakdowns by country, at least for the big countries?
[15:28:00] nuria: yes, it's all in the pom
[15:28:11] mvn compile will generate it, so will package etc.
[15:28:29] milimetric: ^^
[15:28:42] madhuvishy: did you build it by putting the jars in your .m2? since they are not in archiva?
[15:28:49] yes
[15:29:11] nuria: http://pastebin.com/pue28vUV
[15:30:07] madhuvishy: ja, so handy! i always did it low tech by hand
[15:30:46] harej: it's a comScore term, it means the percentage of people in that region that have visited our projects
[15:31:25] ottomata: standup baby
[15:31:30] harej: I think there are country breakdowns, Erik Zachte usually processes that data and makes it available on a monthly basis
[15:34:11] YES [INFO] BUILD SUCCESS for camus.
[15:34:21] So I see for Europe that in June 2015, there were 134.5 million uniques from Europe and the reach was 31.7%. But 31.7% of Europe isn't 134.5 million; that would be closer to the 300-millions. How should these numbers be interpreted?
[15:35:55] harej: those are devices, not people
[15:36:14] which are devices, not people? reach or uniques?
[15:39:29] Analytics-Kanban: Update camus-wmf to be deployed by maven (missing jars otherwise) {hawk} [8 pts] - https://phabricator.wikimedia.org/T114657#1709089 (Ottomata) a:Ottomata
[15:39:51] bd808, I implemented the changes you suggested here: https://gerrit.wikimedia.org/r/#/c/241984/ Have you seen them? :]
[15:40:07] Analytics-Kanban: Update camus-wmf to be deployed by maven (missing jars otherwise) {hawk} [8 pts] - https://phabricator.wikimedia.org/T114657#1702293 (Ottomata) Done! https://archiva.wikimedia.org/#artifact~releases/com.linkedin.camus/camus-etl-kafka/0.1.0-wmf6
[15:40:45] mforns: I think I saw the email notification but didn't do a full review. I'll do that now and refresh the beta cherry-pick if it looks good
[15:41:17] bd808, thank you very much!
[15:41:19] milimetric: where can I find the country breakdowns?
[15:41:21] bd808: if you +1 i will merge
[15:41:43] nuria: which refers to devices and not people: reach or unique visitors?
[15:42:04] harej: unique visitors refers to devices
[15:42:42] harej: reach is basically unique visitors divided by how many devices comScore figures each person accesses our projects from in each different area
[15:43:20] to get more data, either email Erik Zachte directly or the analytics list, as other people might have crunched the numbers the same way you have
[15:43:46] and our pageview/reach data is down y/y; this makes me sad, unless there is a logical explanation?
[15:43:48] harej: this is important though: the comScore data is wildly inaccurate for us because it doesn't include any data from mobile devices. And it's considered wildly inaccurate anyway
[15:44:01] harej: yes, very logical: they don't include mobile devices :)
[15:44:23] harej: this is a more accurate count of pageviews: https://vital-signs.wmflabs.org
[15:44:45] (there's a decline there but it's mostly seasonal, which will show up once we have more data)
[15:45:10] there are annotations at the bottom for some outages and there's a breakdown by type of access on the left, so you can see the mobile data
[15:45:37] and how do I use this site to get data for all the projects?
[15:45:58] I'm not getting any useful information; just blank screens.
[15:46:10] harej: search for "totals" on the top left
[15:46:19] harej: hm? sounds like an error...
[15:46:30] maybe file a bug on phabricator, but try clearing your cache first
[15:46:31] yes; incidentally, how well does this site play with Internet Explorer 11?
[15:46:55] oh :) not sure, I just recently got IE 11 so I'll try it now :)
[15:47:57] harej: we might need to fix some bugs, try chrome/ff
[15:48:36] oh, if I *had* those browsers on this computer I would use them without being prompted to!
[15:48:58] harej: just tried on IE 11, works fine. File a bug with what you're seeing if you're getting blank screens. My IE 11 loaded everything in about 1 second
[15:49:53] harej: second milimetric, for your data questions an e-mail to analytics@ would be best
[15:50:17] analytics@lists.wikimedia.org?
[15:50:21] yes
[15:50:27] mforns: I updated the beta cluster cherry-pick. Can you force a few errors so we can see if it still does what we want?
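An aside on harej's reach arithmetic above: if reach is read as unique devices divided by comScore's estimated online population for the region (an assumption about comScore's definition, not something stated in this channel), the quoted numbers are consistent with a roughly 424M online base for Europe rather than its total population.

```python
# Back-of-the-envelope check of the Report Card numbers quoted above,
# assuming reach = unique devices / estimated online population.
uniques = 134.5e6  # June 2015 European uniques (devices)
reach = 0.317      # 31.7% reach

implied_online_base = uniques / reach
print(round(implied_online_base / 1e6))  # ~424 (million): well below Europe's
                                         # total population, as harej noticed
```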
[15:50:39] bd808, sure, just a sec
[15:54:22] Analytics-Backlog, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, Patch-For-Review: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1709138 (bd808)
[15:55:49] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1709142 (GWicke) @milimetric, that's basically option 1). Sounds good to me. Regarding the root, shall we use `/api/rest_v1/stats/`, `/api/rest_v1/metrics/`, `/api/rest_v1/data/...
[15:56:11] bd808, error logs are in logstash-beta
[15:57:11] mforns: https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/eventlogging looks ok to me. Does the data for the events look right to you?
[15:57:15] Analytics-Backlog: Flag in x-analytics in varnish any request that comes with no cookies whatsoever - https://phabricator.wikimedia.org/T114370#1709145 (Nuria) p:Triage>Unbreak!
[15:58:01] bd808, yes they look good :]
[15:58:10] sweet.
[15:58:18] (Abandoned) Ottomata: [WIP] How to send job results via email using oozie [analytics/refinery] - https://gerrit.wikimedia.org/r/182350 (owner: Nuria)
[15:58:22] (Abandoned) Ottomata: Add generic oozie component for emailing data [analytics/refinery] - https://gerrit.wikimedia.org/r/210632 (owner: Madhuvishy)
[16:01:16] Analytics-Backlog: Oozie sends emails when any job fails - https://phabricator.wikimedia.org/T114901#1709169 (JAllemandou) NEW
[16:04:52] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1709189 (Milimetric) Since it's proxying to the analytics query service, we chatted a bit and figured /api/rest_v1/analytics/ would make the most sense. I agree /data is somethi...
[16:04:59] mforns: logstash change applied
[16:05:07] ottomata, mmmmmm
[16:05:36] prod dashboard at https://logstash.wikimedia.org/#/dashboard/elasticsearch/eventlogging-errors
[16:05:52] COOL
[16:08:00] bd808, no eventlogging error logs are entering production logstash
[16:08:13] we should have like one event every other second, on average
[16:08:22] hmmm...
[16:09:01] I see the kafka input running in the process list
[16:09:05] aha
[16:09:10] so, how scalable is this new eventlogging stuff? I would kinda like to start collecting information about the results people are clicking on from Special:Search (to build a feedback loop), but by definition that feedback loop is only useful with minimal sampling
[16:10:14] mforns: the prod connect string looks like this: zk_connect => "conf1001.eqiad.wmnet,conf1002.eqiad.wmnet,conf1003.eqiad.wmnet/kafka/eqiad"
[16:10:17] our other option is to build out some sort of bounce and suck the data straight out of web request logs, which is also possible
[16:10:26] is that the right syntax for multiple hosts?
[16:11:04] bd808, yes it seems right, although the port is not there...
[16:11:28] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1709209 (GWicke) To me, 'metrics' seems to be a bit more accurate description of what is available in that hierarchy. 'Analytics' has a higher-level ring to me. [The wiki says](ht...
[16:11:30] will ask ot-tomata
[16:12:05] ebernhardson, we recently did some load testing on EventLogging
[16:12:22] and achieved more than 7500 evt/sec on a single machine
[16:12:38] in theory, EventLogging is now linearly scalable
[16:12:47] it's not writing to a single mysql master anymore either?
[16:13:14] mforns: the beta config doesn't have a port either: zk_connect => "deployment-zookeeper01.eqiad.wmflabs/kafka/deployment-kafka"
[16:13:14] so if we add more machines to it, we can continue scaling until...?
[16:13:19] sweet!
[16:14:08] ebernhardson, it continues to write to mysql, but in case of such large event flows, we would blacklist the schema
[16:14:22] and its events would only be stored in hdfs
[16:14:30] mforns: ok cool, we would only actually want them in hdfs anyways
[16:15:17] ebernhardson, you should anyway talk to ottomata for the details and confirmation
[16:15:20] :]
[16:15:49] sure, and it will probably be a month before we get around to building out this feedback loop, we have a few other hadoop things in the pipeline first; it was just another on our list of ways to improve search :)
[16:15:59] bd808, you're right, the format seems correct to me
[16:17:54] bd808: that format is correct, it would be good if the ports were there, but i think it picks the default if not
[16:17:55] 2181
[16:18:01] ebernhardson: everything mforns says is correct
[16:18:18] i would really like to push more through eventlogging soon, so we can make use of the work we did
[16:18:26] i don't want to push a ton more all at once
[16:18:37] but i'd be fine with doubling or tripling our current throughput for your use case
[16:18:40] maybe a little more
[16:19:22] sounds reasonable, i don't actually know how many clicks we get in special:search... should probably find out
[16:19:51] ebernhardson: we could probably ramp it up, right?
[16:19:58] start doing it sampled, and then slowly ramp it up?
[16:20:00] ottomata: certainly, start with a sampling and then go up
[16:20:33] mforns: The logstash process is showing me that there are 11 threads in the JVM labeled "
i'm also not sure what % of events we need to make a useful feedback loop, i'm mostly guessing many because our most common queries are run a couple hundred times a day, and we serve 100M requests/day
[16:20:57] so there is a wide variety to capture
[16:21:05] hm, mforns, bd808, there are very few EventErrors
[16:21:05] bd808, 11 threads?
[16:21:26] it's java. gawd knows why it spawns threads
[16:21:32] (PS5) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 (https://phabricator.wikimedia.org/T108174)
[16:21:37] actually, even more scary, it's jruby
[16:21:40] bd808: if you check here, can you see just EventError?
[16:21:40] http://grafana.wikimedia.org/#/dashboard/db/eventlogging?panelId=5&fullscreen
[16:22:01] it averages over a minute, so it looks like < 1 per second
[16:22:05] bd808: Java spawns threads to ensure its threads are running
[16:22:15] i guess that should be plenty though, about 0.5 per second is 30 per minute
[16:22:44] ottomata, yes, that's the usual
[16:22:57] ottomata: yeah, and we haven't seen any at all in the logstash backend yet
[16:23:02] * bd808 looks at the filter again
[16:24:01] hmmm.. puppet did something different in generating the config file between beta and prod
[16:24:08] and it would cause it all to fail
[16:24:21] "tags => eventlogging_EventErrorkafka"
[16:24:27] wow
[16:24:36] instead of "tags => ["eventlogging_EventError", "kafka"]"
[16:24:47] curious
[16:25:02] so there is the problem. Now I guess we need to figure out how to fix it
[16:25:11] * bd808 checks versions
[16:25:13] ok
[16:25:29] brb
[16:26:13] even more strangely, the beta and prod hosts have the exact same version of puppet installed
[16:27:24] mmm
[16:28:23] hm
[16:28:48] hmmm, that's weird
[16:28:51] yeah, i guess if you expect tags to be an array
[16:29:05] to render it properly in the template you should stringify it
[16:29:06] maybe
[16:29:37] ["<%= @tags.join(",") %>"]
[16:29:38] ?
[16:29:42] ottomata, makes sense, but the same file worked in logstash-beta, no? crazy
[16:29:53] ["<%= @tags.join('","') %>"]
[16:29:57] yeah, that is very strange
[16:30:05] yeah, that would be safer I guess
[16:30:11] different versions of ruby?
[16:30:22] this is all on the puppet side
[16:30:30] ja but puppet is ruby
[16:30:41] oh right.
[16:30:46] .erb == embedded ruby
[16:30:53] so <%= %> blocks are ruby code
[16:31:10] they should basically be the same. both are jessie and have puppet 3.7.2-4 installed
[16:31:20] hm, yeah, dunno
[16:31:26] bd808: hm
[16:31:30] what if instead of the .join
[16:31:32] you do
[16:31:36] @tags.to_s
[16:31:37] ?
[16:31:38] huh
[16:31:40] that is what <%= should do.
[16:31:42] hm
[16:31:44] strange, dunno
[16:32:08] !log Started cassandra load jobs from 2015-10-01
[16:32:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[16:37:02] bd808, ottomata, https://gerrit.wikimedia.org/r/#/c/244191/
[16:38:55] mforns: somehow I think it won't matter
[16:38:59] .to_s is what <%= should do.
[16:39:10] but, i guess we can try!
[16:39:10] ottomata, yes
[16:39:18] aha
[16:41:32] yeah, no change mforns
[16:41:37] :/
[16:42:05] mforns: build the string manually?
[16:42:24] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1709316 (GWicke) @nuria, I think @csteipp means the logs MediaWiki stores in its log tables, and not "operational" logs. There aren't any plans to store requ...
[16:42:34] ottomata: some analytics box crashed and daniel did a hard reboot, I think he was pingin' you on -ops fyi
[16:45:26] ottomata, looking at it
[16:55:11] ottomata: did you apply that patch on the puppetmaster after you merged? If so it didn't change anything in the generated config.
[16:55:37] I have a patch ready that uses join() instead
[17:03:14] bd808, I think ot-tomata is having lunch, can I see the patch?
[17:03:38] mforns: https://gerrit.wikimedia.org/r/#/c/244198/
[17:03:42] thx!
[17:06:40] bd808, have you tested this in logstash-beta? do you need me to send errors to kafka?
[17:10:11] I tested it but just by comparing the prior config to the new one (they were identical)
[17:10:11] bd808, aha, OK
[17:10:11] bd808, hey thank you for working on that, appreciate it a lot :]
[17:10:12] sure. logstash is kind of my baby so I feel obligated to help get more usage of it
[17:10:12] aha, anyway thx
[17:14:42] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} [2 pts] - https://phabricator.wikimedia.org/T114881#1709457 (Fhocutt) Oh, no. Thanks for tracking this down, @mforns.
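A toy re-creation of the template bug chased above, with plain Python standing in for the ERB template (so whatever ruby behavior actually differed between beta and prod is deliberately left out of it). The broken and fixed strings match what bd808 pasted and what the join() fix in https://gerrit.wikimedia.org/r/#/c/244198/ generates.

```python
# How the logstash `tags` line can come out of a template, depending on how
# the list is stringified.
tags = ["eventlogging_EventError", "kafka"]

# What prod's generated config contained: elements concatenated with no
# separators or quotes, i.e. not a valid logstash array.
broken = "tags => " + "".join(tags)          # tags => eventlogging_EventErrorkafka

# What the join() fix renders, equivalent to ["<%= @tags.join('","') %>"]
# in the ERB template: an explicit, unambiguous array literal.
fixed = 'tags => ["%s"]' % '","'.join(tags)  # tags => ["eventlogging_EventError","kafka"]

print(broken)
print(fixed)
```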
[17:28:41] ottomata: gabriel makes the valid point that "analytics" may be too abstract from the consumer's point of view, and that "metrics" is more approachable. Do you or anyone else feel strongly about "analytics"? I'd like to move forward soon
[17:28:42] https://phabricator.wikimedia.org/T114830#1709209
[17:29:04] Analytics-Kanban, Analytics-Wikimetrics: Wikimetrics' cohort page is returning 500 in production {dove} [2 pts] - https://phabricator.wikimedia.org/T114881#1709498 (egalvezwmf) Thank you!
[17:31:22] milimetric: no issue for me
[17:31:49] milimetric: issue with jobs though --> problem with keyspaces having capitals (shitty cassandra code again ...)
[17:32:17] milimetric: I need to go for tonight, I have a friend at home
[17:32:31] I'll try to solve it and start the loading tomorrow asap :(
[17:36:29] joal: no worries, thx
[17:36:29] lemme know if i can help
[17:36:29] milimetric: I'll explain it to you tomorrow, it's about quoted identifiers in cassandra
[17:36:30] Have a good end of day all! see y'all tomorrow :)
[17:42:02] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1709578 (Ottomata) I suggested analytics not because it is connected to the 'Analytics team' or the 'Analytics cluster', but that we have named this endpoint the Analytics Query...
[17:46:04] mforns: bd808 looks better
[17:46:04] + tags => ["eventlogging_EventError", "kafka"]
[17:46:17] cool!
[17:46:17] https://logstash.wikimedia.org/#/dashboard/elasticsearch/eventlogging-errors
[17:46:21] here they are!
[17:46:38] hey, we can haz data
[17:47:32] next nice step would be getting the schema name added to the event we see in kafka
[17:47:55] so we can drill down on what schema is producing bad data
[17:49:11] Analytics, Security-Reviews: Security review of Analytics Query Service - https://phabricator.wikimedia.org/T114918#1709604 (csteipp) NEW
[17:51:40] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster - https://phabricator.wikimedia.org/T114830#1709619 (Milimetric) I agree with the wiki on what "analytics" means, and I think this particular part of the path isn't too related to the analytics query service. We may hit t...
[17:54:15] bd808: aye, should not be hard to do
[17:55:28] bd808, ottomata, woohoo! cool!!
[17:55:41] +1 schema name
[17:55:57] will create a task for that
[17:55:58] ok, moving to cafe, back shortly
[17:56:25] hey thank you guys for helping me *a lot* with that!
[17:56:47] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1709639 (Milimetric) a:Milimetric
[18:11:10] Analytics, Security-Reviews: Security review of Analytics Query Service - https://phabricator.wikimedia.org/T114918#1709722 (mobrovac) The main component is a #RESTBase cluster on `aqs100[123]`. It uses the vanilla version, with an additional module - [`pageviews`](https://github.com/wikimedia/restbase/b...
[18:13:01] madhuvishy: interview was cacelled
[18:13:05] *cancelled
[18:13:09] nuria: aah
[18:14:39] let me know if you wanna continue
[18:20:33] madhuvishy: yessss
[18:20:36] batcave!
[18:24:56] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1709771 (GWicke) @milimetric, the new x-request-handler syntax should make this relatively straightforward to accomplish. You could use a simple proxy hierarchy wi...
[19:27:02] ebernhardson: yt?
[19:28:48] nuria: yup
[19:29:15] ebernhardson: we have had success processing records from your schema but
[19:29:36] ebernhardson: the schema needed some modifications for defaults
[19:29:50] ebernhardson: we are committing a patch so you can see
[19:30:03] ebernhardson: how are you validating your records against the schema?
[19:30:09] there's always a but :) more defaults is probably ok. I prefer the strictness of not having defaults, but i can see how it makes the whole thing smoother
[19:30:21] (and the schema upgrade policy requires it)
[19:30:48] ebernhardson: they do not seem to be optional
[19:31:03] ebernhardson: were you using any tools to validate your records against your schema?
[19:31:33] nuria: in php we validate the schema with the avro library: https://github.com/wikimedia/avro-php/blob/master/lib/avro/schema.php#L389
[19:31:47] (it says wikimedia, but that's just because apache doesn't distribute it in a way that works for our production)
[19:32:00] err, validate the datum against the schema
[19:32:00] ebernhardson: so if your records validated there, are discrepancies between that library and the avro tools one in java
[19:32:23] hmm, i'm pretty sure i manually checked the json i sent you validated (i wrote it by hand, not generated by the code yet)
[19:32:33] lemme double check i still have that on my disk
[19:32:36] ebernhardson: can you check again?
[19:32:48] ebernhardson: sounds good, cc madhuvishy
[19:34:17] (PS4) Madhuvishy: [WIP] Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990
[19:35:31] ebernhardson: the message was valid - the schema we had to change
[19:35:43] ebernhardson: see changes to schema: https://gerrit.wikimedia.org/r/#/c/243990/3..4/refinery-camus/src/main/avro/CirrusSearchRequestSet.avsc
[19:35:54] madhuvishy: oh, you mean it wouldn't load the schema? i thought you meant the datum didn't match the schema
[19:35:58] php 100% thinks that's a valid schema :S
[19:36:06] ebernhardson: it is the same problem
[19:36:15] ebernhardson: the data did not validate against the schema
[19:36:20] ebernhardson: we changed the schema
[19:36:36] ebernhardson: but you could change the data too to solve the issue
[19:36:44] ebernhardson: hmmm, it's a bit confusing. the thing that didn't work was - for union types like ["null", "int"], it expects the message to explicitly state the type
[19:36:54] for now we removed all the union types
[19:37:07] also camus expects all the fields in the schema to have defaults
[19:37:17] interesting, yes php thinks that the schema is valid
[19:37:18] ebernhardson: makes sense?
[19:37:25] ebernhardson: the schema compiles
[19:37:35] i mean that the message matches the schema
[19:37:41] interesting
[19:37:49] do you have the avro-tools jar?
[19:37:52] no nulls would be annoying, but i guess that's more an annoyance for oliver and mikhail :)
[19:37:58] ebernhardson: right, different bindings
[19:38:07] it just means they have to do funny stuff to get averages that don't include the unknown values
[19:38:22] ebernhardson: there might be a way to do nulls
[19:38:27] like avg(null, 1, 2, 3) is not the same as avg(0, 1, 2, 3) or avg(-1, 1, 2, 3) or however you signify a "non-value"
[19:38:30] i am not an expert at avro schemas
[19:38:37] neither am i, i just read the docs one day :)
[19:38:58] if you have the avro-tools jar, you can do
[19:39:00] java -jar avro-tools-1.7.6.jar jsontofrag --schema-file CirrusSearchRequestSet.avsc searchmessage.json
[19:39:03] ebernhardson: bottom line is that a list [null, 1, 2, 3]
[19:39:08] is not legal avro
[19:39:15] where the last arg is the json message you gave me
[19:39:24] if this throws an error, we have a problem
[19:39:35] ok
[19:40:22] apart from that, all fields should have defaults, feel free to play around with the null stuff - and if your schema validates against the tools jar, we can use it
[19:40:40] indeed, the java one complains "Expected start-union. Got VALUE_STRING"
[19:40:43] interesting
[19:40:44] yup
[19:40:57] ebernhardson: this is what i found on that http://stackoverflow.com/questions/27485580/how-to-fix-expected-start-union-got-value-number-int-when-converting-json-to-av
[19:43:04] hmm, i guess i need to file an upstream bug with apache for that. I've been through the code and nowhere is that taken into account in the php version of the avro library
[19:43:45] oh, actually i guess that's only specific to json encoding (which is unsupported by the php library, it just does binary)
[19:50:33] overall i think that will be fine without unions, just having the data is most important
[20:01:03] ebernhardson: hmmm, maybe it will work with binary? i'll test that in a bit
[20:01:18] "ebernhardson: hmmm, it's a bit confusing. the thing that didn't work was - for union types like ["null", "int"], it expects the message to explicitly state the type"
[20:01:22] i kinda remember this, and thought it was weird
[20:01:31] yeah
[20:01:33] in the json encoding, it's like you can't just send the value of one of the types
[20:01:39] yeah
[20:01:47] you actually have to send a sub-object with the value and a key that specifies the type, right?
[20:01:51] yupp
[20:01:58] maybe it'll be fine with binary
[20:02:00] i'll check
[20:13:11] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1710142 (Milimetric) a:kevinator>Milimetric
[20:18:34] What is the best way to get page view data for over a thousand articles all at once?
[20:27:34] harej: did you try the pageview dumps?
[20:28:15] harej: for example: https://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-05/
[20:28:40] I was just going to have a Python script query grokse 1,400 times, but that reminds me that I do actually have a script for parsing through the pagecount dump.
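Recapping the Avro point settled above, with a hedged sketch: the record and field below are made up, not the real CirrusSearchRequestSet schema. In Avro's JSON encoding, a value for a union schema must name its branch (null being the only exception), while the binary encoding tags the branch by index, which is why a binary-only writer like the PHP library never trips over this.

```python
import json

# Hypothetical record with one nullable field, standing in for the real schema.
schema = {
    "type": "record",
    "name": "Example",
    "fields": [
        # Union type: the default must match the first branch (null).
        {"name": "hitsTotal", "type": ["null", "int"], "default": None},
    ],
}

bad = {"hitsTotal": 42}            # JSON encoding rejects this: "Expected start-union ..."
good = {"hitsTotal": {"int": 42}}  # branch named explicitly, as the SO link describes
also_ok = {"hitsTotal": None}      # null is the exception: written bare, no wrapper

print(json.dumps(schema))
print(json.dumps(good))
```

A rough sketch of the dump-scanning approach harej lands on, instead of 1,400 grokse requests. The filename and titles are placeholders; pagecounts-raw lines have the form "project page_title count bytes", with titles URL-encoded and spaces as underscores.

```python
import gzip

# Hypothetical inputs: one hourly pagecounts-raw file and the titles to keep.
DUMP = "pagecounts-20140501-000000.gz"
wanted = {"Main_Page", "Barack_Obama"}  # placeholder title set

counts = {}
with gzip.open(DUMP, "rt", encoding="utf-8", errors="replace") as f:
    for line in f:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:  # skip malformed lines
            continue
        project, title, views, _size = parts
        if project == "en" and title in wanted:
            counts[title] = counts.get(title, 0) + int(views)

print(counts)
```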
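Summing a full month means repeating this scan over each hourly file in the directory linked above; at over a thousand titles, the set lookup keeps each pass linear in the file size, which is the point of this approach over per-article HTTP queries.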
[20:32:45] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1710260 (GWicke)
[20:37:40] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1710269 (Spage) a:Ottomata
[20:39:08] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1710278 (Milimetric) @GWicke, it seems like the right way would be to factor out the docs from specs/analytics/v1/pageviews.yaml and use them in both the front-end...
[20:56:14] milimetric, yt?
[20:56:21] hi mforns
[20:56:24] hi!
[20:56:50] I forgot to say in standup that when trying to solve the wikimetrics issue, I broke staging :/
[20:57:10] I thought the problem might be with the last change from ha-shar
[20:57:22] which has to do with requirements
[20:57:29] possible yea
[20:57:36] so I executed the setup.py script
[20:57:48] and after that I could not recover staging
[20:57:56] it seems like a path problem
[20:58:08] Analytics-Backlog: Sanitize pageview_hourly - https://phabricator.wikimedia.org/T114675#1710357 (kevinator)
[20:58:16] i'll brb, gotta help carry some stuff, but mforns don't worry about it
[20:58:30] milimetric, no problem
[20:58:31] setup.py though? did pip install -e . change?
[20:59:02] I didn't try that
[20:59:30] don't worry, I'll look at it, and if I cannot solve it, I'll ping you afterwards
[20:59:31] (CR) Nuria: "Things working now." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (owner: Madhuvishy)
[20:59:42] Analytics-Backlog: Sanitize pageview_hourly - https://phabricator.wikimedia.org/T114675#1702699 (kevinator)
[21:04:39] mforns: back. If it's easier, you can just uninstall completely, wipe all the egg files and dirs, etc. and re-install
[21:11:22] milimetric, I don't understand the relation between /srv/wikimetrics/dist/...egg, /srv/wikimetrics/...egg-info and /usr/local/lib/python2.7/dist-packages/wikimetrics*
[21:11:42] mforns: I don't understand that crap either, I just delete without prejudice
[21:11:57] I kind of treat broken python installations like farmers treat Avian Flu
[21:12:35] xD
[21:14:39] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1710421 (ezachte) I figured we can produce all breakdowns by geography (middle column of TBD diagram) with two datasets, one for views, one for edits....
[21:36:31] milimetric, wikimetrics-staging is back, thx!
[21:36:41] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1710522 (Milimetric) It's up to @Tnegrin if he wants to get geographic breakdowns, @ezachte. I'm leaning towards handling that along with the other br...
[21:36:45] cool, thx mforns
[22:04:42] (PS5) Madhuvishy: [WIP] Add refinery-camus module [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521)
[22:06:01] nuria: ^
[22:07:51] (Abandoned) Madhuvishy: [WIP] Add support to camus to consume schemaID-less avro [analytics/camus] - https://gerrit.wikimedia.org/r/243845 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy)
[22:28:02] madhuvishy: helou
[22:28:08] madhuvishy: batcave
[22:28:09] ?
[22:28:22] yup
[22:28:41] i'm there
[22:28:45] omw
[22:32:50] bye a-team! see you tomorrow!
[22:33:00] ciao!
[22:33:08] bye mforns :)
[22:33:12] :]
[22:41:08] Analytics-Kanban, RESTBase: configure RESTBase pageview proxy to Analytics' cluster {slug} [3 pts] - https://phabricator.wikimedia.org/T114830#1710968 (GWicke) @milimetric: yes, that ought to be possible. Only complication is that we'll have to template the URL of the actual backend service, and sub-specs...
[22:48:25] (CR) Nuria: "Besides comments the only things needed is to commit our properties file we have been using to refinery side (temporarily) until these can" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/243990 (https://phabricator.wikimedia.org/T113521) (owner: Madhuvishy)
[22:49:24] Analytics-Kanban, Analytics-Wikistats: Feed Wikistats traffic reports with aggregated hive data {lama} [8 pts] - https://phabricator.wikimedia.org/T114379#1710977 (ezachte) The quick survey shows most support for continuation of the geographic reports (report 21-24), more than other breakdowns https://ww...
[23:53:54] Analytics, Security-Reviews: Security review of Analytics Query Service - https://phabricator.wikimedia.org/T114918#1711162 (csteipp) Thanks @mobrovac. Once that code forwards the request to restbase, e.g., https://github.com/wikimedia/restbase/blob/master/mods/pageviews.js#L221, what does request look?...