[00:08:30] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1798090 (ezachte) @Nemo_bis > Will the "Views/hr" column in the index for each project (https://stats.wikimedia.org/wiktionary/EN...
[00:17:34] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1798101 (Nemo_bis) It's interesting to see that some languages are unaffected by the new calculations, for instance Vietnamese Wi...
[00:23:55] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1798119 (Nuria) >Other results confirm past suspicions about crawlers, for instance French Wikiquote and Serbian Wikinews. Also p...
[00:33:54] nuria: Hm.. where can I find the raw data for https://phabricator.wikimedia.org/T88504 ? It was on some data-dumps.wikimedia.org url but can't find it.
[00:34:09] on hdsf
[00:34:15] Krinkle: let me send you teh cmd
[00:34:18] *the
[00:34:32] Krinkle: hdfs dfs -text /wmf/data/archive/browser/general/desktop_and_mobile_web-2015-9-27/*
[00:35:08] Krinkle: we are hoping to work on the viz soon, we need to wrap up some stats.wikimedia.org work
[00:35:14] Krinkle: did you had any new ideas
[00:35:15] ?
[00:35:17] Also, unless the query changed since the sample I got on 2015-10-27, the data is cut off too high. E.g. the lowest entry I have it 0.50% for Windows7-IE8. The metric of 0.5% was already on a high side for browser usage, but since this is fragmented by OS, it is even less details. I'm missing important data and currently blocking me from an informed decision
[00:35:18] about IE8 usage.
[00:35:22] Analytics-Kanban, Analytics-Wikistats: {lama} Wikistats 2.0 - https://phabricator.wikimedia.org/T107175#1798144 (Jdforrester-WMF) @Nuria, Why did you remove T88504 as a blocker to this task? There's no commentary here or there. As this is probably the most vital statistical (non-load) information for engi...
[00:35:31] Krinkle: it is .1%
[00:35:39] In August at https://docs.google.com/a/wikimedia.org/spreadsheets/d/1n9FhSqcBGM9iKXrlHsP0EZI0gU89Rmz5m51uglUGVjs/edit?usp=drive_web It was 0.97 percent
[00:35:55] The data https://docs.google.com/spreadsheets/d/1fUJrNztr-cWyo0NIwUK5OHNMMEotb_vPgFZE6RnwqZ0/edit#gid=0 is insufficient because it doesn't include Ie8 data from other Windows versions.
[00:36:21] Krinkle: sorry, the report doesn't report below 0.1%
[00:36:34] I'm not asking below 0.1%
[00:36:39] Krinkle: did you do your query over ALL requests?
[00:36:40] but I do need totals for browsers upto 0.1%
[00:36:49] not by OS
[00:37:32] Krinkle: ah let me see, report details up to 0.1% but counting Os , is that it?
[00:37:42] I'll file that task about visualising browser data. I forgot that last week
[00:37:57] Krinkle: i did already, no need
[00:38:04] OK. Link?
[00:38:05] Krinkle: it is on our backlog
[00:38:17] Krinkle: https://phabricator.wikimedia.org/T118329
[00:38:25] Krinkle: i wanted to talk to erik z
[00:39:03] Analytics-Backlog: Visualization of Browser data to substitute current reports on wikistats - https://phabricator.wikimedia.org/T118329#1798151 (Krinkle)
[00:39:04] Krinkle: cause once this is done we thought squid reports could be removed and he agreed
[00:39:23] I agree too
[00:39:46] Krinkle: let's look at reports for a sec, they should be reporting 0.1%
[00:39:53] but it is important that I and PMs can see total % usage by browser by version. Up to 0.1% ideally.
[00:40:30] The report I had from september only went up to 0.5%, and more importantly that was fragmented by OS. So really it was cut off higher, because numbers or higher when you merge them from different OSes.
[00:40:49] Krinkle: that was our 1st run, since then that is been corrected
[00:40:58] please ssh into 1002
[00:41:13] and go into /home/nuria/wikistats-browser/
[00:41:15] E.g. there are 4 different Windows versions using IE8. They can amount to something we want to support together.
[00:41:19] OK checking :)
[00:41:24] i just downloaded all reports there
[00:43:02] Krinkle: let me know
[00:44:06] Krinkle: looks like you got your answer though, ie 8 is > .1 adding all software versions
[00:44:16] OK. That one includes Win7 and WinXP. Amounting to 0.75% for IE 8 when added together.
[00:44:34] I think it is still missing some values since the total is 0.94 when I query Hive last week.
[00:44:48] But that's not too bad, but less than ideal.
[00:45:05] Krinkle: well if you round to 0.1 with two decimals precision yes, you miss
[00:46:14] Do you believe it is feasible to additionally query without the os breakdown? That way we can also round to 0.1, but after aggregation. I'm okay either way.
[00:46:36] Krinkle: it is feasible but it is very informative to have os specially for moibile
[00:46:36] I suppose we can aggregate inside the visualisation as well.
[00:46:39] *mobile
[00:46:50] Both can be useful yes.
[00:46:58] as chrome views from android and IOS are based on completely different browsers
[00:47:02] both called chrome
[00:47:34] Krinkle: so , i'd say,m if you have more queries this report doesn't satisfy we can change it or add an additional report
[00:47:37] afaik those are identified by ua-parser as "Chrome Mobile" and "Chrome iOS"
[00:48:26] Krinkle: good
[00:48:46] https://www.irccloud.com/pastebin/E4wc8i8q/
[00:48:50] but you're right. That is not the case for all browsers. I acknowledge that. I'm not saying OS isn't useful. But there is a time for both.
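The cut-off problem discussed here (a 0.1% threshold applied to each browser-and-OS row rather than to each browser) can be illustrated with a small sketch. It assumes invented per-(browser, OS) shares in percent; none of these figures come from the actual report. Applying the threshold before summing across operating systems drops the small IE8 rows, while summing first and thresholding afterwards keeps them.

```python
from collections import defaultdict

# Hypothetical shares of all requests, in percent, per (browser, OS).
# Invented for illustration; not taken from the real browser report.
ROWS = [
    ("IE 8", "Windows 7", 0.48),
    ("IE 8", "Windows XP", 0.22),
    ("IE 8", "Windows Vista", 0.07),
    ("IE 8", "Windows Server 2003", 0.06),
    ("Firefox 42", "Windows 7", 3.10),
]

CUTOFF = 0.1  # minimum share (percent) for a row to be reported


def cutoff_then_sum(rows, cutoff):
    """Drop small (browser, OS) rows first, then total what is left per browser."""
    totals = defaultdict(float)
    for browser, _os, share in rows:
        if share >= cutoff:
            totals[browser] += share
    return {browser: round(total, 2) for browser, total in totals.items()}


def sum_then_cutoff(rows, cutoff):
    """Total each browser across all OSes first, then apply the cut-off."""
    totals = defaultdict(float)
    for browser, _os, share in rows:
        totals[browser] += share
    return {browser: round(total, 2) for browser, total in totals.items() if total >= cutoff}


print("per-OS cut-off:   ", cutoff_then_sum(ROWS, CUTOFF))
print("aggregate cut-off:", sum_then_cutoff(ROWS, CUTOFF))
```

With these made-up numbers, thresholding first reports 0.7% for IE 8 while aggregating first reports 0.83%. That is the same kind of gap as the one between the per-OS report and a direct Hive query, and it is why aggregating (in the query or in the visualisation) before applying the cut-off matters.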
[00:49:00] so right, it is reporting 90%
[00:49:07] we can play with the .1
[00:49:10] As long as the visualisation can merge them my use case and that of Editing department is resolved.
[00:49:16] WE don't need a separate query per se.
[00:49:22] We*
[00:49:46] i think we might update it up to 0.05 and with this one we can satisfy all those queries,
[00:50:03] Is there currently a web-accessible url to these dumps that doesn't require authentication?
[00:50:08] Krinkle: report is running weekly , erik z requested also a monthly one so we will have both
[00:50:20] Cool. Yeah, i was going to ask that next :)
[00:50:27] Krinkle: i wonder if we can throw together a viz for this quarter
[00:50:41] Krinkle: ya, we need both, weekly to see browser updates
[00:51:05] Krinkle: and correlate recently filed bugs
[00:51:13] Krinkle: and monthly for bird eye view
[00:53:00] Krinkle: but do not worry we get how important this data is
[00:53:18] Krinkle: i just want it to be more accessible before we publish it WMF-wide
[00:57:41] nuria: Well, I have to make decisions now. So I am sharing these results with various people in WMF for the time being.
[00:58:00] It's no different than last 6 months, except the query was ran automatically instead of by me :)
[00:58:10] from hdfs
[01:14:01] if only we had a platform for dashboarding automagic dataaa
[01:25:42] Or page views!
[04:30:20] Analytics: MobileWikiAppDailyStats should not count Googlebot - https://phabricator.wikimedia.org/T117631#1798355 (JKatzWMF) @dbrant how do you think we should document this to ensure that anyone else looking at the data (or we in the future) know to lookout for bots. Could other bots conceivably enter the...
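The "it is reporting 90%" remark presumably refers to coverage: the share of all requests accounted for by the rows that survive the cut-off. A minimal sketch, assuming an invented list of row shares in percent (not real report data), shows how such a coverage figure would be computed and why lowering the cut-off from 0.1% to 0.05% can only raise it.

```python
# Hypothetical per-row shares of all requests, in percent (they sum to 100).
# Invented for illustration; the real report has a much longer tail.
SHARES = [40.0, 25.0, 15.0, 10.0, 5.0, 2.0, 1.5, 0.8, 0.3,
          0.12, 0.09, 0.07, 0.06, 0.04, 0.02]


def coverage(shares, cutoff):
    """Percentage of the total accounted for by rows at or above the cut-off."""
    total = sum(shares)
    kept = sum(share for share in shares if share >= cutoff)
    return 100.0 * kept / total


for cutoff in (0.1, 0.05):
    print(f"cut-off {cutoff}%: report covers {coverage(SHARES, cutoff):.2f}% of requests")
```

The toy list has a short tail, so its coverage is far higher than the real report's; only the direction of the change carries over. Every row kept at 0.1% is also kept at 0.05%, so coverage never decreases when the cut-off is lowered.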
[11:51:51] Analytics-Tech-community-metrics: Pull user profile data from wikitech.wikimedia.org and use it in community metrics - https://phabricator.wikimedia.org/T53050#1798874 (Aklapper) stalled>declined a:Aklapper
[14:32:00] Analytics-Tech-community-metrics, DevRel-November-2015: Tech community KPIs for the WMF metrics meeting - https://phabricator.wikimedia.org/T107562#1799018 (Qgil) Open>Resolved I'm closing this one. The incentive of the metrics meeting and the WMF Engineering meeting has been useful to get these KPI...
[14:33:37] (PS5) DCausse: Add support for custom timestamp in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[14:34:43] (PS6) DCausse: Add support for custom timestamp in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[14:44:11] (PS7) DCausse: Add support for custom timestamp in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[15:32:50] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [30.0]
[15:50:55] hola a-team
[15:51:41] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[15:59:32] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[16:00:09] hoolaaa
[16:01:03] hola nuria
[16:02:07] Analytics-Backlog, Analytics-EventLogging: EventLogging (MySQL?) Kafka consumer stops consuming after Kafka metadata change - https://phabricator.wikimedia.org/T118315#1799176 (Ottomata) I think this is a bug in pykafka, that may have just been fixed. https://github.com/Parsely/pykafka/issues/314 https:/...
[16:02:09] ottomata, joal: this can be merged right?
[16:02:10] https://gerrit.wikimedia.org/r/#/c/251238/
[16:03:20] Analytics-Backlog, Analytics-EventLogging: Eventlogging monitoring of consumers (process nanny) - https://phabricator.wikimedia.org/T115495#1799177 (Ottomata) Open>declined a:Ottomata Declined in favor of T118315
[16:04:46] (CR) Nuria: "The changes to properties files will need to be done in puppet, can you do those and link to that changeset here? (ignore if you already C" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[16:06:03] (CR) Ottomata: Add initial oozie job for CirrusSearchRequestSet (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse)
[16:06:09] (PS8) DCausse: Add support for custom timestamp in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873)
[16:11:34] ottomata: nice, you fixed the code and all my tests are passing now, will dedicate time to write more tests today, did not do any yesterday
[16:12:31] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[16:15:01] dcausse: if you add the comment that ottomata is suggesting linking to your hard work on that phab ticket oi think we are aredy to merge
[16:15:04] *ready
[16:15:43] nuria: aye cool
[16:15:57] ottomata: avro in hive makes me sad
[16:16:28] (CR) DCausse: Add initial oozie job for CirrusSearchRequestSet (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse)
[16:17:14] (PS10) DCausse: Add initial oozie job for CirrusSearchRequestSet [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575)
[16:17:42] nuria: done :)
[16:19:34] dcausse: all righttttt, please aadd any handy tips you might have to https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Camus
[16:20:20] (CR) Nuria: [C: 2 V: 2] Add initial oozie job for CirrusSearchRequestSet [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse)
[16:20:36] \o/
[16:21:44] nuria: concerning avro decoder I worked only unit tests :/
[16:21:52] can I use stat1002 to test?
[16:21:59] dcausse: that is plenty right?
[16:22:21] dcausse: yes, of course, you need to pass jars to job
[16:22:29] dcausse: so decoders are found
[16:22:29] nuria: ok will try
[16:22:49] dcausse: see my examples here: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Camus
[16:22:57] thanks
[16:26:23] dcausse: and correct/update as needed, the setup of classpath to test this is not exactly .. ahem.. the most intuitive thing. If iam not arround madhuvishy can also help
[16:26:49] ok :)
[16:28:32] (CR) Ottomata: [C: 1] "Nice!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse)
[16:31:04] ja dcausse cool, you can test in prod with nuria's example, just modify and change your camus .properties file so that it imports into paths in your homedir
[16:31:24] might want to modify .properties so it only imports a little bit from kafka just for testing
[16:31:47] ottomata: nothing will be deleted from kafka right?
[16:32:03] right
[16:32:07] you can't delete from kafka :)
[16:32:12] cool :)
[16:32:20] camus just stores offsets in hdfs somewhere, wherever you say
[16:32:34] after the first run, it will read from there and contintue wherever it left off
[16:32:38] dcausse: you can look on 1002 on my homedir
[16:32:47] ok then I should be able to test the full chain
[16:32:51] nuria: thanks
[16:32:53] dcausse: yup, excatly
[16:33:13] dcausse: nuria/avro-kafka.. but really i think what you need is on wiki
[16:33:30] ok
[16:33:41] dcausse: if you need to, you can insert data into the 'test' topic in kafka, but you need to be careful to make camus only read from where you where you are inserting...not sure how to best do that
[16:34:23] writing to kafka is done by mediawiki and it's also a pain to setup, so I'd rather use prod data if it's ok :)
[16:34:29] ottomata: i think there is a timestamp field on avro properties
[16:34:43] ottomata: where you can tell camus to read msgs as of today
[16:35:05] ottomata: that way dcausse doesn't get our test messages from past times
[16:38:57] yeah dcausse is totally fine
[16:39:08] aye
[16:43:03] ottomata, dcausse , even better, that way you do not get distracted by exceptions taht might not apply to you
[16:46:12] * addshore may go and play with opentsdb, influxdb and blueflood in labs for a bit!
[16:53:15] Analytics-Kanban: Pageview API showcase App {slug} - https://phabricator.wikimedia.org/T117224#1799266 (mforns) The new version with the suggested changes is now in the same gist and same url for playing.
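The description of how Camus resumes ("camus just stores offsets in hdfs somewhere... after the first run, it will read from there and continue wherever it left off") can be illustrated with a toy model. This is not Camus code and uses none of its APIs. It is a minimal Python sketch, assuming an in-memory list standing in for one Kafka partition and a dict standing in for the offset files kept in HDFS. Each run starts from the last saved offset, stops after a per-run record budget (loosely analogous to a per-task time limit), and saves where it stopped; reading never removes anything from the log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Stand-in for one Kafka partition: an append-only list of messages."""
    messages: list = field(default_factory=list)


@dataclass
class ToyOffsetStore:
    """Stand-in for the offset files an import job keeps between runs."""
    offsets: dict = field(default_factory=dict)


def run_import(topic, partition, store, max_records):
    """Pull at most max_records messages, starting where the previous run stopped.

    Returns the pulled messages and whether the partition was fully consumed.
    """
    start = store.offsets.get(topic, 0)  # in this toy model the first run starts at offset 0
    end = min(start + max_records, len(partition.messages))
    pulled = partition.messages[start:end]  # slicing reads; nothing is deleted
    store.offsets[topic] = end  # checkpoint so the next run continues from here
    fully_pulled = end == len(partition.messages)
    return pulled, fully_pulled


part = ToyPartition([f"msg-{i}" for i in range(5)])
store = ToyOffsetStore()
first, done_first = run_import("test", part, store, max_records=3)
second, done_second = run_import("test", part, store, max_records=3)
print(first, done_first)
print(second, done_second)
```

The first run stops early with the partition not fully pulled, and the second picks up at the saved offset. Pointing the offset store (and the destination) at a home directory, as suggested in the log, keeps such test runs separate from the production import state.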
[16:55:50] Analytics-Backlog: Make AQS return 0 instead of no values {slug} - https://phabricator.wikimedia.org/T118402#1799275 (mforns) NEW
[16:58:28] Analytics-Backlog: AQS should expect article names uriencoded just once {slug} - https://phabricator.wikimedia.org/T118403#1799289 (mforns) NEW
[17:04:36] Analytics-Backlog, Analytics-Kanban, operations, Monitoring, Patch-For-Review: Turn off sqstat udp2log instance - https://phabricator.wikimedia.org/T117727#1799309 (Nuria)
[17:04:39] Analytics-Kanban, operations, Monitoring, Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1799308 (Nuria) Open>Resolved
[17:09:43] Analytics-Kanban, CirrusSearch, Discovery, Discovery-Cirrus-Sprint, Patch-For-Review: Setup oozie task for adding and removing CirrusSearchRequestSet partitions in hive - https://phabricator.wikimedia.org/T117575#1799341 (Nuria)
[17:10:55] argh I can't build on stat1002 because of a test that uses /tmp/testcamus/ :/
[17:11:15] * dcausse disabling mvn tests
[17:14:05] dcausse: i rsync jars ...
[17:14:12] ah ok
[17:15:12] Analytics-Kanban, Patch-For-Review: Exclude MobileMenu from Pageviews - https://phabricator.wikimedia.org/T117345#1799346 (kevinator) Please update the wiki changelog as well: https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
[17:15:49] Analytics-Backlog, Analytics-EventLogging: EventLogging (MySQL?) Kafka consumer stops consuming after Kafka metadata change - https://phabricator.wikimedia.org/T118315#1799348 (Ottomata) We should reproduce this in labs with current version of pykafka, then upgrade, and then see if the problem goes away!
[17:24:57] Analytics-Kanban: Pageview API showcase App {slug} - https://phabricator.wikimedia.org/T117224#1799363 (Nuria) If @milimetric can test this on IE it will be great.
[17:30:32] Ironholds: i think you need to update pageview definition log to note your changes with hidebanners and such right? cc kevinator
[17:31:48] ^^^ https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
[17:40:32] the --check flag is for camus?
[17:52:44] Analytics-Kanban: Pageview API documentation for end users {slug} - https://phabricator.wikimedia.org/T117226#1799387 (mforns) http://rest.wikimedia.org/en.wikipedia.org/v1/?doc doesn't seem to describe our endpoints.
[17:58:07] hmm somethig's wrong: Topic not fully pulled, max task time reached at 2015-11-04T20:03:09.000Z, pulled 1035965 records
[17:58:33] and I can't see anything in etl.destination.path, the path is not even created :(
[17:58:50] will continue tomorrow, thanks for your help!
[18:09:51] Analytics-EventLogging, MobileFrontend: Schema:MobileWebEditing: What are commons sorts of errors? - https://phabricator.wikimedia.org/T118366#1798324 (phuedx)
[18:12:18] Analytics-EventLogging, MobileFrontend: Schema:MobileWebEditing: What are commons sorts of errors? - https://phabricator.wikimedia.org/T118366#1799436 (phuedx) p:Triage>Normal
[18:29:13] Analytics-Kanban: Pageview API documentation for end users {slug} - https://phabricator.wikimedia.org/T117226#1799492 (Nuria) Right, we need a friendlier doc in wikitech that an end user can use w/o having to know anything about restbase
[19:32:44] Analytics-Kanban: Pageview API documentation for end users {slug} - https://phabricator.wikimedia.org/T117226#1799571 (GWicke) I think you are looking for https://wikimedia.org/api/rest_v1/?doc.
[21:12:03] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1799843 (Ottomata) Is it time to consider creating a standalone repo for these schemas? If so, then that means it is time for r...
[21:14:02] Analytics-Cluster: logrotate kafkaServer-gc.log on kafka brokers - https://phabricator.wikimedia.org/T118421#1799847 (Ottomata) NEW
[22:16:54] laters a-team! ;)
[22:17:06] bye ott... :]
[22:17:08] l8r
[22:22:08] Analytics-Kanban: Pageview API documentation for end users {slug} - https://phabricator.wikimedia.org/T117226#1800008 (mforns) Thanks @GWicke!
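The ticket "AQS should expect article names uriencoded just once" is easier to follow with a concrete example. This is a minimal sketch using Python's standard `urllib.parse`. The article title is hypothetical, and the sketch makes no claim about how AQS itself decodes paths. Encoding an already-encoded title turns each `%` into `%25`, so a server that decodes once ends up with the single-encoded string rather than the title.

```python
from urllib.parse import quote, unquote

title = "Café/Menu"  # hypothetical article title with a non-ASCII character and a slash

once = quote(title, safe="")   # 'Caf%C3%A9%2FMenu'
twice = quote(once, safe="")   # 'Caf%25C3%25A9%252FMenu'

# A title encoded exactly once round-trips with a single decode.
assert unquote(once) == title

# A doubly encoded title needs two decodes; one decode leaves it still encoded.
assert unquote(twice) == once
assert unquote(unquote(twice)) == title

print(once)
print(twice)
```

Passing `safe=""` makes `quote` encode the slash as well, which matters for titles such as subpages that contain `/` when they are placed into a single URL path segment.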