[00:28:29] (CR) Yurik: [C: 2] Add script for hourly cronjobs [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/319250 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem) [00:28:46] (CR) Yurik: [V: 2] Add script for hourly cronjobs [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/319250 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem) [00:36:27] (PS1) MaxSem: WIP: count page with geo tags [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/319260 (https://phabricator.wikimedia.org/T149722) [00:45:15] (PS1) MaxSem: DO NOT MERGE WIP: count page with geo tags [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/319262 (https://phabricator.wikimedia.org/T149722) [08:54:41] Analytics-Kanban, Operations, Traffic, Patch-For-Review: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2762438 (elukey) This is probably due to: https://github.com/openssl/openssl/issues/1799 [09:02:19] Analytics-Kanban, Operations, Traffic: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2762540 (elukey) [09:42:18] Analytics-Kanban, Patch-For-Review: Count pageviews for all wikis behind varnish - https://phabricator.wikimedia.org/T130249#2762623 (JAllemandou) Tested this morning through pivot and pageview-api -- Seems very ok :) [10:00:05] hi everyone. is the data from the event logging schema MobileWikiAppArticleSuggestions somewhere publicly available? [10:22:23] mschwarzer: Hi! I think that it is only contained in the EL tables on our databases, but I might be wrong. What is your use case? [10:41:13] elukey: I plan to use the data to measure the performance of article recommender systems ( https://phabricator.wikimedia.org/T142477 ) but before collecting new data I would like to test the evaluation on the existing data. [10:42:30] elukey: can the data be made public? 
I currently don't have access to the analytics infrastructure. [10:50:43] mschwarzer: I am a bit ignorant about specific data in EL (I am an ops engineer :) but afaik it is easier to set you up to access the analytics infrastructure rather than going through the process of making data publicly available [10:52:02] are you doing this work as a volunteer or are you part of the WMF? (sorry to ask but it helps to understand what to do) [10:53:28] joal: --^ [10:53:43] hi elukey [10:53:46] o/ [10:54:07] would you mind reviewing mschwarzer's use case? I am a bit ignorant and probably I might say stupid things :) [10:55:03] mschwarzer: I agree with elukey that making data public is usually complicated - except in the case where the data doesn't contain PII [10:56:47] mschwarzer: see https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/Publishing [10:58:03] Also mschwarzer: https://meta.wikimedia.org/wiki/Schema_talk:MobileWikiAppArticleSuggestions [10:58:48] mschwarzer: The purge line of the last link tells you that the schema contains PII by default, which is usual for event-logging (capsule contains user agent for instance) [10:59:30] mschwarzer: So if you want to publish a dataset, it'll mean removing PII from it, etc, which means having access to EL infra [10:59:39] elukey, mschwarzer: makes sense ? [11:00:07] elukey: also, you never say stupid things :) [11:00:33] joal, elukey: for my use case I don't need PII, thus, I don't mind it being removed from the data. [11:01:50] mschwarzer: As said, removing PII etc still needs to be done, meaning having access to EL infra - I really think you won't get far without that [11:02:05] ah nice one https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/Publishing ! [11:03:21] joal: is it possible that somebody with access does that for me? or do I personally need to request the access? 
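A minimal sketch of the "removing PII" step joal mentions, assuming events are available as plain dictionaries. The capsule field names `userAgent` and `clientIp`, the sample values, and the `action` payload key are illustrative assumptions, not taken from the schema page; the real list of fields to drop would have to follow the publishing guidelines linked above.

```python
# Capsule-level fields treated as PII in this sketch.
# ASSUMPTION: these names are illustrative; check the actual schema and capsule.
PII_FIELDS = {"userAgent", "clientIp"}


def strip_pii(event: dict) -> dict:
    """Return a copy of the event without the assumed PII capsule fields."""
    return {key: value for key, value in event.items() if key not in PII_FIELDS}


# Hypothetical event, for illustration only.
sample = {
    "schema": "MobileWikiAppArticleSuggestions",
    "userAgent": "example-agent/1.0",
    "clientIp": "192.0.2.1",
    "event": {"action": "shown"},
}

cleaned = strip_pii(sample)
print(cleaned)
```

The function returns a new dictionary rather than mutating its input, so the raw event stays intact for anyone who is allowed to see it.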
[11:04:21] mschwarzer: This should be discussed internally by the team, but we usually tend to help people do it themselves rather than doing it for them [11:07:05] joal: thanks for the info. ..just for clarification, for the production shell access I need an NDA, right? [11:07:13] mschwarzer: https://wikitech.wikimedia.org/wiki/Volunteer_NDA :) [11:07:26] I was about to give you the link [11:07:36] thanks elukey :) [11:07:45] I think that you'll also need https://phabricator.wikimedia.org/L3 [11:07:51] for the production access [11:07:52] thanks :) [11:08:14] usually you need to open a phab task explaining the use case in detail and why you need it [11:10:50] i'll do that. thanks for the support. [11:11:18] np mschwarzer :) [11:27:34] Analytics-Kanban, Operations, Traffic: Varnishlog with Start timestamp but no Resp one causing data consistency check alarms - https://phabricator.wikimedia.org/T148412#2762917 (elukey) After a chat with @ema we decided to test a very basic use case, namely if a TCP RST from the client could cause th... [11:32:27] joal: https://github.com/edenhill/librdkafka/issues/777 is interesting [11:32:55] I found it this morning while trying to figure out why cp3045 had tons of delivery errors last week [11:33:18] indeed, interesting !!! [11:33:37] elukey: I'm sure ottomata will be glad to read this :) [11:35:25] already spammed ottomata :) [11:38:54] joal: Moritz is merging https://gerrit.wikimedia.org/r/311138 that will change Firewall rules for ZooKeeper. This should be a no-op but let's keep our eyes open :) [11:39:40] k elukey [11:55:08] elukey: Away for some time, I will be back later [12:03:45] sure! [13:17:13] elukey, joal: some more questions regarding NDA: do I need the support of a WMF employee before I submit the request? or can I just explain my project and then hopefully some employee will support it? and is the production shell access the same as the EL infra access? 
[13:20:36] mschwarzer: yes you need somebody that will support your use case, but probably the Analytics team could help in this. The production shell will grant you access to a subset of hosts (the analytics ones), and from there you'll be able to query data on the EL dbs [13:20:46] are you working with some team in the WMF now? [13:26:06] elukey: not directly. i got some feedback on my phab ticket but currently nobody from WMF is actively working on the project. [13:30:53] mschwarzer: all right, maybe we could ask somebody in the WMF that follows the Android app development to vouch for you? [13:31:19] (not sure if you have already followed up with them, probably yes) [13:34:32] Analytics-Kanban, EventBus, Wikimedia-Stream, Services (watching), User-mobrovac: Bikeshed what events should be exposed in public EventStreams API - https://phabricator.wikimedia.org/T149736#2763302 (Ottomata) Not yet, haven't had time. [14:01:53] helloo team :] [14:10:43] o/ [16:08:12] I couldn't find my generic filter task... someone deleted it 'cause yall think I'm crazy, right? :) [16:08:31] milimetric: I think you're crazy but didn't delete the task ;) [16:09:01] I see. Fair enough :) [16:32:09] Thanks urandom for paving the way for cassandra :) [16:32:11] (trying to connect to our meeting but hangouts is dying) [16:32:16] milimetric: same [16:32:31] (if anyone's in, please forward our apologies) [16:32:55] joal: ottomata : milimetric : https://appear.in/dumps [16:32:59] let's try this [17:13:10] Analytics-Kanban, Patch-For-Review: Count pageviews for all wikis behind varnish - https://phabricator.wikimedia.org/T130249#2764055 (MusikAnimal) Woohoo! Thank you @Nuria and all who helped make this happen! :) [17:16:01] nuria: is there a full list of supported wikis? [17:16:47] musikanimal: yes. https://github.com/wikimedia/analytics-refinery/blob/master/static_data/pageview/whitelist/whitelist.tsv [17:17:02] awesome, thanks! 
:) [17:24:40] musikanimal: to be precise, we will allow all public wikis from now on, and that whitelist may not have some of them, we'll add them as we see them. [17:25:37] ok sounds good, I'll keep my whitelist synced with that one [17:32:07] milimetric: no, whitelist has everything [17:32:17] milimetric: otherwise we will get alrams [17:32:23] *alarms [17:35:05] nuria: exactly, whitelist doesn't have *everything*, if it did, we would never get alarms :) [17:35:43] right now, it may not have some wikis that haven't received traffic since November 1st. And it of course won't have wikis that are going to be added in the future. [17:37:24] a-team: tomorrow Traffic is planning to start the Text migration to Varnish 4 [17:37:34] brace yourselves, winter is coming [17:38:01] * milimetric puts on a coat [17:38:03] xD [17:41:39] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2764193 (GWicke) [17:44:22] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2764211 (GWicke) [17:50:59] Analytics-Cluster, Operations, Packaging: libcglib3-java replaces libcglib-java in Jessie - https://phabricator.wikimedia.org/T137791#2764216 (MoritzMuehlenhoff) I think so [17:52:10] Analytics, ChangeProp, Citoid, ContentTranslation-CXserver, and 11 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2764221 (GWicke) [17:54:44] heading to cafe [18:17:57] going offline! [18:18:01] o/ [18:36:20] Analytics-Kanban, Wikipedia-iOS-App-Backlog, iOS-app-feature-Analytics: Drop in iOS app pageviews since version 5.2.0 - https://phabricator.wikimedia.org/T148663#2764417 (JAllemandou) `iPhone` string is currently necessary in the user_agent for webrequest being considered as pageview, and it looks th... 
[18:36:38] (PS1) Joal: Update PageviewDefinition fixing iOS bug [analytics/refinery/source] - https://gerrit.wikimedia.org/r/319374 (https://phabricator.wikimedia.org/T148663) [18:37:58] nuria: do you mind having a look at my comment and CR above --^ [18:38:17] Joal: in a meeting, can talk in a bit [18:50:33] joal, do you have 5 minutes to talk on oozie code organization? :] [18:50:45] mforns: I do :) [18:50:48] mforns: batcave [18:50:51] to the batcave :] [19:02:42] nuria: gotta get to dinner soon [19:05:33] nuria: I'll pass by after dinner or we can talk tomorrow [19:20:34] Analytics-Kanban, EventBus, Operations: Open up kafka1003 port 9092 in Analytics vlan ACL - https://phabricator.wikimedia.org/T149835#2764651 (Ottomata) [19:25:34] Analytics-Kanban, EventBus, Operations: Open up kafka1003 port 9092 in Analytics vlan ACL - https://phabricator.wikimedia.org/T149835#2764698 (faidon) Open>Resolved Done! [19:25:37] Analytics-Kanban, EventBus, Operations, Patch-For-Review: setup/install/deploy kafka1003 (WMF4723) - https://phabricator.wikimedia.org/T148849#2764700 (faidon) [19:35:51] Analytics-Kanban, Mobile-Content-Service, Wikipedia-Android-App-Backlog, Patch-For-Review, Spike: [Feed] Establish criteria for blacklisting likely bot-inflated most-read articles - https://phabricator.wikimedia.org/T143990#2585510 (Fjalapeno) @Mholloway what are you thinking about this patch... [19:38:39] Analytics-Kanban, Mobile-Content-Service, Wikipedia-Android-App-Backlog, Patch-For-Review, Spike: [Feed] Establish criteria for blacklisting likely bot-inflated most-read articles - https://phabricator.wikimedia.org/T143990#2764764 (Mholloway) I'm all for it if someone is willing to give it a... [19:43:52] !log manually stopped an old wikistats_git pageviews cron in spetrea's crontab on stat1003. no output from it since 2013, and spetrea doesn't really have an account [19:43:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:52:12] nuria: here? 
[19:52:23] joal: yes, just looked at the code on the iOS patch [19:52:34] joal: looks good, was doing one more select [19:52:49] nuria: have you seen my comment on the task? [19:52:51] joal: to triple check one more thing [19:52:55] sure [20:00:09] joal: there is still 1 issue [20:00:16] joal: let me update ticket [20:00:19] sure [20:00:19] see: {"browser_major":"-","os_family":"iOS","os_major":"9","device_family":"Generic Smartphone","browser_family":"Mobile Safari UIWebView","os_minor":"1","wmf_app_version":"5.2.1.942"} false WikipediaApp/5.2.1.942 (iPhone OS 9.1; Phone) [20:00:35] joal: webviews on android and ios are kind of on their own [20:00:51] I don't get it [20:00:52] joal: we need to see what is supposed to happen with those [20:01:29] Joal: your code change will catch this one too as a pageview, that is not a problem [20:01:29] You mean we shouldn't tag them as pageview? [20:01:47] joal: I want to verify that the iOS UA setting also applies to webview requests [20:01:49] in fairness nuria, those were already caught as pageview [20:02:09] joal: right, yes. [20:02:28] The thing we should double check is access method on those [20:02:57] access_method should possibly be mobile web instead of mobile app ? [20:03:01] Not even sure myself [20:03:33] joal: no, it's the app [20:03:37] k [20:03:42] a webview doesn't imply mobile-web [20:03:49] it is just a container for content [20:04:10] k, it means we use a 'web browser' thing inside the app [20:04:17] then it should be app ? [20:06:45] Analytics-Kanban, Wikipedia-iOS-App-Backlog, iOS-app-feature-Analytics: Drop in iOS app pageviews since version 5.2.0 - https://phabricator.wikimedia.org/T148663#2764951 (Nuria) >As to the "iPhone" vs "Phone", I am not sure why "iPhone" is being searched for. According to the file history for this co... 
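A minimal sketch of the user-agent check under discussion, accepting either an `iPhone` or an `iOS` token in a `WikipediaApp` user agent. This is not the actual PageviewDefinition code from analytics/refinery/source; the pattern, the function name `looks_like_ios_app`, and the second and third sample user agents are assumptions for illustration. Only the first sample string is quoted from the log above.

```python
import re

# ASSUMPTION: a simplified stand-in for the real check. It accepts WikipediaApp
# user agents that contain either the older "iPhone" token or the newer "iOS" token.
IOS_APP_UA = re.compile(r"^WikipediaApp/.*\b(?:iPhone|iOS)\b")


def looks_like_ios_app(user_agent: str) -> bool:
    """Return True if the user agent looks like the Wikipedia iOS app."""
    return bool(IOS_APP_UA.search(user_agent))


# Quoted from the log above (webview request).
webview_ua = "WikipediaApp/5.2.1.942 (iPhone OS 9.1; Phone)"
# Hypothetical examples of the newer OS naming and of a non-iOS app.
new_style_ua = "WikipediaApp/5.2.0.0 (iOS 10.0; Phone)"
android_ua = "WikipediaApp/2.4.0 (Android 6.0; Phone)"

for ua in (webview_ua, new_style_ua, android_ua):
    print(ua, looks_like_ios_app(ua))
```

Matching on word boundaries keeps the bare `Phone` device token from counting as `iPhone`, so the hypothetical Android string is not picked up.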
[20:08:34] joal: right, the originating source of traffic is the app, the webview might be showing content that comes from anywhere: restbase, php api, mobile web... it is partially a browser but not the same as any browser you might have installed [20:08:54] I think I get it :) [20:09:44] nuria: I don't think they realised the change occurred in the actual os name: In previous versions it was 'iPhone OS blah', and now is 'iOS blah' [20:10:59] joal: unless it is a webview, and then it is "iPhone blah Phone", without any iOS in some instances [20:11:29] nuria: it was that before for all, and now only for webviews yes [20:11:57] nuria: catching both 'iPhone' and 'iOS' should do the trick though :) [20:12:28] joal: you are right, and the CR is correct. I just would like the iOS team to verify their code for the Webview UA [20:14:08] nuria: do you recall the parent task for bots in top ? [20:14:27] joal: yes, i added a bunch of subtasks to that one [20:14:37] Analytics, Analytics-Kanban, Pageviews-API: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2764980 (JAllemandou) [20:14:40] nuria: another one to add ;) [20:14:48] https://phabricator.wikimedia.org/T138207 [20:14:51] cc joal [20:14:58] Many thanks nuria [20:16:14] Analytics, Analytics-Kanban, Pageviews-API: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2765036 (JAllemandou) This issue is caused by usual hidden bot traffic. I link this task as a subtas... 
[20:17:52] Analytics, Analytics-Kanban, Pageviews-API: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2765191 (JAllemandou) [20:17:54] Analytics, Research-and-Data-Backlog: Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#2393202 (JAllemandou) [20:19:19] Done for tonight a-team, see you tomorrow ! [20:19:24] bye joal ! [20:20:50] byeeee [20:26:06] Analytics: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2744325 (JAllemandou) Hive query: ``` SELECT user_agent_map, SUM(view_count) FROM wmf.pageview_hourly WHERE year = 2016 AND month = 10... [20:26:57] Analytics: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2766008 (JAllemandou) [20:27:01] Analytics: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2744325 (JAllemandou) a:JAllemandou>None [21:02:52] Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Empty body in EventBus request - https://phabricator.wikimedia.org/T148251#2766246 (Ottomata) p:Triage>Normal [21:30:08] laters a-team! [21:30:16] bye ottomata! [21:33:09] Analytics: Non existing article is one of the most viewed according to the data returned by the /metrics/pageviews/top/ API - https://phabricator.wikimedia.org/T149178#2744325 (Nuria) In this case i think mediawiki might be returning 200 to a non existing page, correct? 
For us to be counting these as pageviews [21:43:15] Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Empty body in EventBus request - https://phabricator.wikimedia.org/T148251#2766484 (mmodell) [21:45:10] Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Empty body in EventBus request - https://phabricator.wikimedia.org/T148251#2718379 (mmodell) [21:45:13] Analytics-Kanban, EventBus, Patch-For-Review, Services (watching): Empty body in EventBus request - https://phabricator.wikimedia.org/T148251#2718379 (mmodell) This seems to have gotten more noisy with #wmf-deploy-2016-11-01_1.29.0-wmf.1 [22:51:24] bye team, see you tomorrow!