[00:48:02] <krrrit-wm>	 (PS1) Madhuvishy: [WIP] Add script to drop old eventlogging partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253)
[00:55:54] <HaeB>	 madhuvishy: so which eventlogging data is defined as old?
[00:57:05] <HaeB>	 or less abstractly: the WikimediaBlogVisit_5308166 table has historical data back to 2013 we'd like to keep for the time being - do we need to flag that, or is that purge just for privacy sensitive data?
[03:19:13] <madhuvishy>	 HaeB: this is just for the new data flowing into hadoop
[03:20:05] <madhuvishy>	 anything that's already in mysql will have the outcome defined by the recent EL schema audit that we did
[03:20:46] <madhuvishy>	 HaeB: https://meta.wikimedia.org/wiki/Schema_talk:WikimediaBlogVisit
[03:22:20] <madhuvishy>	 looks like it's set to auto purge after 90 days based on what ori told us
[03:22:38] <HaeB>	 madhuvishy: huh
[03:23:11] <HaeB>	 need to talk to him...but i'm pretty certain that's not based on the input of the blog team..
[03:23:24] <madhuvishy>	 I dont think the auto purging has been applied to it yet, checking
[03:24:09] <HaeB>	 i checked very recently, the 2013 data is still there
[03:24:29] <madhuvishy>	 Yup
[03:25:45] <madhuvishy>	 We are blocked on ops on executing that, but before we delete data we were gonna let everyone know. In this case, since ori was marked as owner, and he confirmed the next steps, we marked it as auto purge after 90 days. But we can change that of course.
[03:26:17] <HaeB>	 ok, i'll ping him
[03:26:45] <HaeB>	 i'm actually meeting folks from the current comms team about the use of this data tomorrow ;)
[03:27:44] <HaeB>	 fwiw, i went to the lightning talk today and totally understadn the purging of private data after a period, but this one doesn't have IPs or other PII
[03:30:03] <madhuvishy>	 HaeB: all schemas have user agents and hashed ips in the eventlogging capsule
[03:30:32] <madhuvishy>	 if we can purge those columns, and none of the other columns have sensitive data, we can keep the data indefinitely
[03:32:10] <madhuvishy>	 Feel free to ping me or mforns once you know, and we'll update the status.
[03:37:15] <HaeB>	 madhuvishy: oh right... we never used that data though, and i somehow thought that all user agents were set to "null" ... but checking again that was only the case in the beginning (had been looking at the start of the table ...)
[03:37:38] <HaeB>	 and the hashed IPs are considered PII too?
[03:38:16] <HaeB>	 in any case, purging these should not be a problem (especially if it's only after 90 days)
[03:38:27] <madhuvishy>	 HaeB: yeah, it might be possible to follow session behaviors in combination with other data.
[03:39:25] <madhuvishy>	 HaeB: alright then, let us know when you've checked with ori too, and we'll update the owner and team too.
[03:39:32] <HaeB>	 got it
[03:39:33] <HaeB>	 thanks!
[03:39:48] <madhuvishy>	 no problem!
[08:14:37] <wikibugs>	 Analytics-Engineering, Analytics-Wikimetrics: Once public a report cannot be made private - https://phabricator.wikimedia.org/T113452#1665453 (Lokal_Profil) NEW
[08:20:36] <wikibugs>	 Analytics-Engineering, Analytics-Wikimetrics: Cannot remove invalid members from cohort - https://phabricator.wikimedia.org/T113454#1665480 (Lokal_Profil) NEW
[08:27:44] <wikibugs>	 Analytics-Tech-community-metrics, Research consulting, Research-and-Data: Quantifying the "sum of all contributors" - https://phabricator.wikimedia.org/T113406#1665506 (Qgil) There is a list of all people who contributed code to Wikimedia projects at http://korma.wmflabs.org/browser/contributors.html...
[09:01:11] <wikibugs>	 Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1665558 (Qgil) @acs @dicortazar this project is still featured at #Possible-Tech-Projects. Do you want to propose it for #O...
[09:07:15] <wikibugs>	 Analytics-General-or-Unknown, Possible-Tech-Projects: Pageviews for Wikiprojects and Task Forces in Languages other than English - https://phabricator.wikimedia.org/T56184#1665584 (Qgil) This is a message posted to all tasks under "Need Discussion" at #Possible-Tech-Projects. #Outreachy-Round-11 is aroun...
[09:13:25] <wikibugs>	 Analytics-Tech-community-metrics, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1665628 (Qgil) This is a message posted to all tasks under "Backlog" at #Possible-Tech-Projects. #Outreachy-Round-11 is around the corner....
[09:35:07] <wikibugs>	 Analytics-Tech-community-metrics, MediaWiki-Extension-Requests, Possible-Tech-Projects: A new events/meet-ups extension - https://phabricator.wikimedia.org/T99809#1665662 (Qgil) This is a message sent to all #Possible-Tech-Projects. The new round of [[ https://meta.wikimedia.org/wiki/Grants:IEG | Wik...
[10:51:47] <mforns>	 hi a-team!
[13:49:24] <nuria>	 joal: yt?
[13:49:28] <joal>	 hey nuria
[13:49:29] <joal>	 I am
[13:49:54] <nuria>	 joal: we had a meeting with leila yesterday to wrap up the last-access cookie
[13:50:04] <joal>	 right
[13:50:39] <nuria>	 joal: let's have another meeting to talk a bit about bot detection
[13:50:48] <joal>	 nuria: for sure
[13:50:51] <nuria>	 as a possible follow up
[13:51:07] <joal>	 There is the same kind of request from the discovery team
[13:51:14] <nuria>	 is there?
[13:51:28] <joal>	 yup
[13:51:38] <nuria>	 is it a phab item?
[13:52:06] <joal>	 I don't think so, let me add
[13:52:11] <joal>	 ask sorry
[13:52:41] <nuria>	 did the request came via e-mail?
[13:53:42] <joal>	 talking with dcausee
[13:53:47] <wikibugs>	 Analytics-Engineering, Analytics-Wikimetrics: Once public a report cannot be made private - https://phabricator.wikimedia.org/T113452#1666336 (Lokal_Profil) Note that the report in question does not actually seem to be public any more... but does still show up as public under "my reports".
[13:53:49] <joal>	 There is a ticket:
[13:53:52] <joal>	 https://phabricator.wikimedia.org/T112846
[13:54:17] <wikibugs>	 Analytics-Backlog, Discovery, Discovery-Analysis-Sprint: Display automata and humans separately on zero results rate graph - https://phabricator.wikimedia.org/T112846#1666338 (JAllemandou)
[13:54:22] <joal>	 Added the ticket to our backlog
[13:54:26] <joal>	 nuria: --^
[13:55:50] <nuria>	 joal: mmm.. not quite the same
[13:56:11] <joal>	 nuria: let's dicuss that, I think it's close
[13:56:34] <nuria>	 yes it is, but serach has different ways to serach, some js based some not
[13:56:39] <nuria>	 *search
[13:58:18] <nuria>	 but sure, is similar
[14:10:33] <milimetric>	 hey joal
[14:10:39] <milimetric>	 how goes with cleanup?
[14:10:56] <milimetric>	 I'm not finding patches that are owned by the community, mostly WMF folks
[14:28:56] <nuria>	 milimetric: i found  one EL patch I was going to look into
[14:29:04] <joal>	 milimetric: Hey !
[14:29:25] <joal>	 milimetric: spent most of my time discussing with David Causse (discovery team)
[14:29:49] <nuria>	 milimetric: now .. ahem trying to remember how you enable EL and navigationtiming on vagrant to work arround sampling
[14:29:57] <joal>	 We did not really clean (*shame*), but it was a good discussion about potential collaboration
[14:30:17] <nuria>	 joal: anything you want to share?
[14:30:26] <joal>	 Sure :)
[14:30:27] <milimetric>	 nuria: did you enable the roles?
[14:30:47] <joal>	 discovery team is after stats to better rank their search results
[14:30:55] <nuria>	 milimetric: ya, there was something else that needed changing so it is not sampled... need to find what it was
[14:31:17] <nuria>	 joal: does discovery team have server side data?
[14:31:22] <joal>	 They will send data in hadoop (ottomata is onto it with ebernhardson)
[14:31:28] <milimetric>	 nuria: the navigationtiming role doesn't get you unsampled?  We should change that
[14:31:43] <nuria>	 milimetric: no, it is -on purpose- sampled
[14:32:01] <joal>	 nuria: I have not asked
[14:32:05] <milimetric>	 hm, even in just the testing environment?
[14:32:10] <nuria>	 milimetric: and i think perf team wants it that way , but like anything in mediawiki, there is 1 config
[14:32:14] <joal>	 I think they plan to send data from Wikimedia cleint side
[14:32:16] <nuria>	 for dev and prod
[14:32:18] <milimetric>	 oh, right
[14:32:30] <milimetric>	 yay :)
[14:32:33] <nuria>	 joal: i think that sending data from client is lame
[14:32:42] <milimetric>	 I'm currently trying to learn Visual Editor so I can contribute thre
[14:32:58] <milimetric>	 it's gonna take a while :)
[14:33:09] <nuria>	 joal: makes a lot more sense to send it from the server that has access to all kinds of queries rather than using users bandwidth
[14:33:16] <nuria>	 joal: specially in mobile
[14:33:39] <joal>	 nuria: one difficulty server side is to build what they call 'discussion'
[14:33:40] <nuria>	 milimetric: *bow* to your bravery
[14:34:01] <joal>	 meaning user-session pre-aggregated logs
[14:34:21] <milimetric>	 lol.  we'll see if it's bravery and not stupidity
[14:34:30] <nuria>	 joal: sure, but that *some* metrics are not doable server side doesn't mean you have to send all metrics from the client, right?
[14:34:37] <joal>	 If at some point we end up having a user identifer, server-side is find by me
[14:34:49] <joal>	 nuria: for sure
[14:35:15] <joal>	 I don't have many opinions at for where the logs come from, I was trying to understand their need :)
[14:35:56] <joal>	 some are pageview related --> possibly dedicated runs with specific metrics sending data to ES
[14:36:31] <nuria>	 joal: it's analogous to try to measure pageviews by sending request from client
[14:36:33] <joal>	 Some need specific data we don't have already in sync with DBs (link graph) --> to compute pagerank
[14:36:43] <nuria>	 joal: instead of measuring them from the server side
[14:36:57] <joal>	 nuria: they are interested in pageviews we already compute
[14:37:21] <milimetric>	 joal: the "getting in sync with the DBs" problem is one I think is at the root of future analytics work
[14:37:32] <joal>	 milimetric: Indeed !
[14:37:48] <joal>	 milimetric: And an interesting one for sure :)
[14:37:56] <nuria>	 joal: ya, that makes sense too, but what i am saying is that it seems that not sending search queries from the server will make them miss a lot of queries
[14:38:15] <joal>	 nuria: I am not sure of that, you may be right
[14:38:25] <nuria>	 joal: cause for once, they will miss anyone w/o a js client ( a minority, sure but important in mobile specially)
[14:38:43] <nuria>	 joal: and also the apps -if they use the search which i do not know-
[14:39:00] <joal>	 nuria: I am not the one to be convinced here :)
[14:39:07] <joal>	 I think you should talk to ebernhardson :)
[14:39:18] <nuria>	 joal: so where logs come from is important to get the "whole" picture
[14:39:44] <joal>	 nuria: Yes
[14:40:00] <joal>	 nuria: But there is already stuff we can do for them even without talking about new stuff
[14:40:37] <joal>	 nuria: getting server-side log shouldn't too difficult, I don't know why they are after client-side first
[14:47:47] <nuria>	 joal: ya, i do not know either but i think is also our obligation to ensure the data is the best it can be and in this case i think it warrants a little research on why logging needs to come from the client
[14:48:11] <joal>	 hm
[14:49:10] <joal>	 nuria: I think andrew has had discussion, and I also think Ironholds is involved
[14:49:52] <joal>	 nuria: finally, not speaking of client/server side jobs, there is already a good amount of pageview data they'd be interested in
[14:50:16] <nuria>	 joal: ya, makes sense for "search suggestions"
[14:50:27] <joal>	 nuria: indeed
[14:57:23] <joal>	 nuria: could be a good idea to invite elleery to the bot meeting ?
[14:57:43] <nuria>	 joal: ah yessss
[14:57:51] <joal>	 Thx :)
[15:23:40] <wikibugs>	 Analytics-Cluster, Analytics-Kanban: Create Kafka deployment checklist on wikitech {hawk} [5 pts] - https://phabricator.wikimedia.org/T111408#1666558 (ggellerman) a:Ottomata
[15:27:31] <krrrit-wm>	 (PS2) Milimetric: success_by_user_type: Split the 5–99 cohort into 5–9 and 10–99 [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/237534 (owner: Jforrester)
[15:30:14] <milimetric>	 mforns: do you think there's an elegant way we could handle the case when someone adds a column to a query that's run by reportupdater?
[15:30:27] <mforns>	 milimetric, mmmmm
[15:30:42] <mforns>	 sure there is... :]
[15:31:18] <milimetric>	 wanna talk in batcave mforns ?
[15:31:35] <mforns>	 sure!
[15:31:37] <mforns>	 omw
[15:48:02] <joal>	 Hey ottomata, have a minute?
[15:49:21] <ottomata>	 joal i have a few!
[15:49:25] <ottomata>	 interview starting soon
[15:49:37] <joal>	 np, catchup after if ok
[15:50:01] <ottomata>	 k, i have another meeting right after though
[15:50:06] <ottomata>	 if it i short now is better prob
[15:50:07] <joal>	 Mwarf
[15:50:22] <joal>	 Will wait tomorrow
[15:50:25] <ottomata>	 ok
[15:50:25] <joal>	 :)
[15:52:33] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1666634 (Anomie) To help whoever has the conversation with Ops: a sampling of api-feature-usage.log has an average 89.48 bytes for the user agent per line. archive/api.log-20150923...
[15:54:10] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1666641 (EBernhardson) If size is a concern we could consider moving this from fluorine to the kafka->hadoop logging pipeline. It is much better suited for dealing with and process...
[15:59:35] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1666653 (Anomie) Can that pipeline handle an additional 5000–5500 requests per second on top of whatever it already gets hit with?
[16:02:14] <milimetric>	 hey a-team, marcel and I are hanging out in the batcave talking about patches, life, etc.
[16:02:27] <milimetric>	 so, consider that your party invitation ^
[16:02:51] <joal>	 A party ! Great :)
[16:05:30] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1666674 (EBernhardson) The kafka pipeline currently serves the entire flood of web request logs from the varnish servers (peak 150000 req/s). I just wrote the mediawiki->kafka inte...
[16:07:04] * ebernhardson is guessing at 150k req/s, but sounds close enough :)
[16:22:46] <nuria>	 ottomata, madhuvishy : what is the state of the beta labs instance of eventlogging ?
[16:27:08] <nuria>	 ottomata, madhuvishy : as can devs use it to test their EL changes?
[16:36:42] <krrrit-wm>	 (CR) Milimetric: "couple comments" (2 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109) (owner: Mforns)
[16:38:27] <nuria>	 ottomata: ping me when you are back.
[16:42:48] <joal>	 hey kevinator, do you have aminute ?
[16:42:55] <kevinator>	 sure
[16:43:16] <joal>	 calling you then kevinator :)
[17:03:33] <ottomata2>	 kevinator: did we cancel meeting with d'ana today?
[17:05:55] <ottomata>	 nuria:  am back , need FOOD thought wasssup?
[17:06:08] <nuria>	 ottomata: do eat this is NOT urgent
[17:06:14] <nuria>	 ottomata: ping me later
[17:06:45] <ottomata>	 k
[17:07:24] <kevinator>	 yes, ottomata I thought you were busy today... so no meeting with D'Ana
[17:07:35] <kevinator>	 I'm meeting with her solo so we can go over the process
[17:07:41] <ottomata>	 ahhh but we really need to fix this!  you should still interview before I do!
[17:07:58] <kevinator>	 yes, we're discussing that now
[17:08:02] <ottomata>	 oh, you are in meeting?
[17:08:13] <kevinator>	 yes I'm with D'Ana now
[17:08:16] <ottomata>	 i guess if you are fixing that then I don't need to join
[17:08:20] <ottomata>	 that's all I wanted to talk about.
[17:08:26] <kevinator>	 no, i think 1:1 is best to get this done
[17:08:30] <ottomata>	 k
[17:08:32] <kevinator>	 ttyl
[17:08:35] <ottomata>	 k laters, thanks
[17:08:50] <nuria>	 thanks for doing than kevinator
[17:08:54] <nuria>	 *that
[17:36:38] <madhuvishy>	 nuria: if we find a patch that's been -1-ed and doesn't make sense, what should we do with it?
[17:37:01] <joal>	 hey ottomata, two questions for you if you have time
[17:37:24] <joal>	 one on yesterday's error on load job, one on camus
[17:40:05] <nuria>	 madhuvishy: on our analytics code?
[17:40:13] <madhuvishy>	 nuria: yeah
[17:40:25] <madhuvishy>	 nuria: https://gerrit.wikimedia.org/r/#/c/202729/
[17:41:59] <ottomata>	 joal:  nuria, hallo!
[17:42:01] <nuria>	 madhuvishy: nice eh?
[17:42:04] <ottomata>	 yes ask me!
[17:42:15] <nuria>	 madhuvishy: for that one in particular i would say 'abandon'
[17:42:27] <madhuvishy>	 nuria: yeah, it doesn't make any sense.
[17:42:33] <wikibugs>	 Analytics-Backlog: Install snzip on stat1002 and stat1003 {hawk} - https://phabricator.wikimedia.org/T112770#1667000 (Milimetric) Just heard an update at Scrum of Scrums that Ops is giving Analytics ownership of this kind of work.  They said they'd help if needed.
[17:42:34] <nuria>	 ottomata: what is the status of eventlogging on beta labs?
[17:42:49] <ottomata>	 ah, should be fine, no?  on deployment-eventlogging03?
[17:42:52] <ottomata>	 it is using kafka there
[17:42:53] <nuria>	 ottomata: as in.. should it be working or are you guys testing stuff there?
[17:42:57] <ottomata>	 should be working
[17:43:04] <ottomata>	 is it not?
[17:43:16] <nuria>	 ottomata: i do not think is receiving events
[17:43:36] <nuria>	 https://www.irccloud.com/pastebin/KKIYEGza/
[17:43:50] <nuria>	 as logfiles have not been updated in quite a while
[17:43:59] <ottomata>	 hm, i checked it a last week and i saw events go into mysql...
[17:44:01] <ottomata>	 let's see..
[17:44:23] <ottomata>	 nuria: you are looking at /etc
[17:44:27] <ottomata>	 those are not log files
[17:44:44] <ottomata>	 otto@deployment-eventlogging03:/var/log/eventlogging$ ls -l
[17:44:44] <ottomata>	 total 2028
[17:44:44] <ottomata>	 -rw-r--r-- 1 eventlogging eventlogging 954279 Sep 23 17:41 all-events.log
[17:44:44] <ottomata>	 drwxrwxr-x 2 eventlogging eventlogging   4096 Sep 23 06:34 archive
[17:44:44] <ottomata>	 -rw-r--r-- 1 eventlogging eventlogging 695277 Sep 23 17:38 client-side-events.log
[17:44:44] <ottomata>	 -rw-r--r-- 1 eventlogging eventlogging 407983 Sep 23 17:41 server-side-events.log
[17:44:58] <joal>	 ottomata: yesterday's failure were not real ones: due to varnish-kafka restart I think (sequence started at 0)
[17:45:02] <ottomata>	 ok cool
[17:45:08] <joal>	 And it still raised errors
[17:45:17] <krrrit-wm>	 (Abandoned) Madhuvishy: Modify access rules [analytics/camus] (refs/meta/config) - https://gerrit.wikimedia.org/r/202729 (owner: Alexey)
[17:45:19] <joal>	 Shall we prevent that ?
[17:45:27] <ottomata>	 ah, if we can, yes, that would be awesome
[17:45:57] <milimetric>	 heads up a-team: Security will be banging on our cluster next week doing secret testing things :)
[17:46:23] <ottomata>	 secret!  yikes ok.
[17:46:26] <joal>	 ottomata: you did the biggest part of the work in sequence_stats_hourly: don't consider hosts when sequence starts at 0 :)
[17:46:44] <joal>	 I'll create sa ticket :)
[17:46:47] <ottomata>	 ja, we could make SUCCESS file depenent on that result instead
[17:46:53] <ottomata>	 even say if loss < 1%
[17:47:01] <ottomata>	 maybe :)
[17:47:11] <nuria>	 ottomata: you are so rightt, man i forgot all this
[17:47:34] <joal>	 Then what that means is we should use sequence_stats_hourly instaed of faulty hosts for SUCCESS flaggign, right ?
[17:48:05] * milimetric lunch
[17:48:45] <joal>	 Also, Got a review of camus code ottomata : the timestamp in offests files is not linked to the message but to the execution --> can't be used for load cvheck
[17:48:57] <ottomata>	 yes, joal i think that would be good
[17:49:05] <ottomata>	 ????
[17:49:08] <ottomata>	 oh
[17:49:09] <ottomata>	 in the files.
[17:49:10] <ottomata>	 really?
[17:49:12] <ottomata>	 nawwww
[17:49:13] <ottomata>	 really?
[17:49:14] <joal>	 yes sir
[17:49:22] <ottomata>	 what's the point of them then?!
[17:49:38] <ottomata>	 i thought i looked at them though
[17:49:41] <ottomata>	 when we were having troulbes
[17:49:42] <joal>	 Debugging execution I guess
[17:49:53] <joal>	 I already asked that question at that time :)
[17:49:56] <ottomata>	 and saw old timestamps during a current execution for certain partitions
[17:50:37] <joal>	 ottomata: https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/common/EtlKey.java#L78
[17:51:17] <madhuvishy>	 joal: https://gerrit.wikimedia.org/r/#/c/185377/ This patch is trying to extract project and project qualifier from uri host, don't we already have all this functionality now?
[17:51:37] <ottomata>	   // if event can't be decoded,
[17:51:37] <ottomata>	     // this time will be used for
[17:51:37] <ottomata>	     // debugging.
[17:51:38] <ottomata>	 hmmmm
[17:51:39] <joal>	 madhuvishy: we do !!
[17:52:27] <madhuvishy>	 joal: cool, will comment and abandon patch
[17:53:00] <joal>	 in webrequest_table: normalized_host field
[17:53:04] <joal>	 madhuvishy: --^
[17:53:15] <madhuvishy>	 yup
[17:53:18] <joal>	 But maybe ellery code is better, you can check ;)
[17:53:35] <joal>	 ottomata: So camus timestamp, no good :(
[17:54:33] <joal>	 I'll update the ticket again with using sequence_stats_hourly :)
[17:54:35] <joal>	 ottomata: --^
[17:55:09] <krrrit-wm>	 (Abandoned) Madhuvishy: first draft of host parsing udf. extracts the project and project qualifier from the uri_host. Added mobile_qualifier option. Changed tabs to spaces. Use '__null__' string to denote expected null values in test csv Change-Id: I3b48f0cf2b4836d452e00a460a7b [analytics/refinery/source] - https://gerrit.wikimedia.org/r/185377 (owner: Ewulczyn)
[17:55:10] <ottomata>	 joal:  i don't think that's right
[17:55:16] <ottomata>	 that is just the set() method
[17:55:33] <ottomata>	 the main constructor takes an another key and sets the this.time = other.time
[17:55:48] <ottomata>	 I think that that set() call is for constructors not starting with a key
[17:56:07] <krrrit-wm>	 (PS4) Mforns: Make reportupdater support execution of scripts [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109)
[17:56:54] <joal>	 ottomata: there is no constructor using time
[17:57:00] <joal>	 except the other one
[17:57:21] <joal>	 How would you build the one out of messages ?
[17:57:27] <ottomata>	 joal:  https://github.com/linkedin/camus/blob/master/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/mapred/EtlMultiOutputCommitter.java#L58
[17:58:18] <ottomata>	 hmm i get your question though...thinking
[17:58:28] <krrrit-wm>	 (PS5) Mforns: Make reportupdater support execution of scripts [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109)
[17:58:29] <joal>	 ottomata: https://github.com/linkedin/camus/search?utf8=%E2%9C%93&q=setTime
[17:58:34] <joal>	 interesting --^
[17:58:53] <ottomata>	 this one
[17:58:54] <ottomata>	 https://github.com/linkedin/camus/blob/00fbb5fd7f61ca5ed2b69948566204e15bd30bee/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/mapred/EtlRecordReader.java#L315
[17:58:56] <ottomata>	 yeah
[17:59:11] <krrrit-wm>	 (CR) Mforns: Make reportupdater support execution of scripts (2 comments) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109) (owner: Mforns)
[18:00:02] <joal>	 ottomata: I looked at timsteamps in files though, and messages in kafka --> not precisely the same :(
[18:00:02] <mforns>	 milimetric, I made the changes to the RU, that looks *a lot* better, thanks!
[18:00:26] <joal>	 ottomata: got to go diner, but let's discuss that more tomorrow :)
[18:00:32] <ottomata>	 ok
[18:01:51] <joal>	 milimetric: I keep you dashiki task in mind, in case I have a irrepressible desire to code javascript :)
[18:01:57] <krrrit-wm>	 (Abandoned) Nuria: Create a cohort from campaign participations [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/126927 (owner: Awight)
[18:02:22] <milimetric>	 lol
[18:02:24] <joal>	 have a good end of day a-team !
[18:02:32] <kevinator>	 ciao
[18:02:33] <ottomata>	 byeyaa
[18:02:51] <mforns_gym>	 by joal !
[18:03:23] <nuria>	 joal: ciao joal, let's talk tomorrow about fingerprinting
[18:04:30] <madhuvishy>	 bye joal
[18:05:00] <madhuvishy>	 ottomata: is this patch still relevant? https://gerrit.wikimedia.org/r/#/c/164653/
[18:05:22] <HaeB>	 milimetric: any reason why https://reportcard.wmflabs.org/graphs/active_editors is still stuck in june?
[18:05:30] <HaeB>	 (https://stats.wikimedia.org/EN/TablesWikimediaAllProjects.htm has the august numbers already)
[18:06:33] <milimetric>	 HaeB: it's manually updated, and I didn't see the files from Erik yet, unless I missed something
[18:07:56] <HaeB>	 shall we file a task? (i'm mainly concerned about the ETA for the september data, because of the publication of the quarterly report next month... as usual ;)
[18:22:39] <krrrit-wm>	 (PS2) Madhuvishy: [WIP] Add script to drop old eventlogging partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253)
[18:27:52] <madhuvishy>	 ottomata: around?
[18:28:08] <ottomata>	 uyp hiya
[18:28:36] <wikibugs>	 Analytics-Cluster, Analytics-Kanban: Create Kafka deployment checklist on wikitech {hawk} [5 pts] - https://phabricator.wikimedia.org/T111408#1667227 (Ottomata) Done.
[18:28:46] <madhuvishy>	 I added the script to drop old EL partitions - wondering if there should be a more common script for both webrequest and EL
[18:28:58] <ottomata>	 milimetric:  review MMEMMEMMEEEE https://gerrit.wikimedia.org/r/#/c/238854/6
[18:29:08] <madhuvishy>	 in this case, the major difference was there are no hive partitions to drop
[18:29:49] <madhuvishy>	 ottomata: why does jenkins bot hate you so much in that patch
[18:30:26] <ottomata>	 haha
[18:30:29] <ottomata>	 it hates me
[18:31:07] <madhuvishy>	 :P
[18:31:22] <madhuvishy>	 so check this out and tell me what you think - https://gerrit.wikimedia.org/r/240299
[18:31:37] <ottomata>	 yeah, madhuvishy, i was at one point trying to make everything super awesome and generic
[18:31:57] <ottomata>	 but was having trouble and it was getting complicated
[18:32:07] <ottomata>	 and nuria convinced me to just do the most right thing without getting too complicated
[18:32:14] <ottomata>	 but, i you can adapt the original script easily, then sure!
[18:32:23] <ottomata>	 maybe there can be an option for not deleting hive partitoins
[18:32:27] <ottomata>	 like there is for not deleting data
[18:32:33] <ottomata>	 if everything is the same otherwise
[18:32:53] <madhuvishy>	 Hmmm, okay let me poke around and see. the options there are raw and refined
[18:33:20] <madhuvishy>	 if i add one for el - it would be el, raw, refined - kinda weird
[18:34:06] <madhuvishy>	 may be i can change types to webrequest-raw, webrequest-refined and eventlogging-raw
[18:34:35] <madhuvishy>	 that would mean changing the existing cron that's running
[18:36:29] <ottomata>	 hmmm
[18:36:35] <ottomata>	 no madhuvishy, maybe just keep it simple
[18:36:37] <ottomata>	 if there is more to change
[18:36:41] <ottomata>	 let's just keep it generic
[18:36:41] <wikibugs>	 Analytics-Backlog: Install snzip on stat1002 and stat1003 {hawk} - https://phabricator.wikimedia.org/T112770#1667278 (Halfak) It seems like this library will not work for snappy compression from Hadoop.  @ottomata, unless we want to fix this upstream, then I don't see a good reason to spend more time getting...
[18:37:29] <madhuvishy>	 ottomata: so 2 different scripts?
[18:38:06] <madhuvishy>	 hmmm
[18:38:44] <ottomata>	 sorry
[18:38:50] <ottomata>	 ya meat specific
[18:38:50] <ottomata>	 yeah
[18:38:52] <ottomata>	 meant*
[18:38:56] <ottomata>	 not DRY but mehhhh
[18:39:20] <madhuvishy>	 ottomata: yeah okay, we can make it generic but things would have to change - not sure if worth it
[18:39:26] <ottomata>	 yeah no i mean
[18:39:29] <ottomata>	 not worth it i think
[18:39:38] <ottomata>	 what you are doing is good
[18:40:04] <madhuvishy>	 ottomata: ya also, if we have to put in some extra rules for EL that are schema based, might be better to keep it separate
[18:40:15] <ottomata>	 yea
[18:40:56] <madhuvishy>	 ottomata: okay then, feel free to review it when you get a chance. where do the actual cron jobs get scheduled? puppet?
[18:41:08] <ottomata>	 yes
[18:41:29] <krrrit-wm>	 (PS3) Madhuvishy: Add script to drop old eventlogging partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253)
[18:43:31] <krrrit-wm>	 (CR) Ottomata: "One comment, otherwise LGTM" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253) (owner: Madhuvishy)
[18:45:56] <krrrit-wm>	 (PS4) Madhuvishy: Add script to drop old eventlogging partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253)
[18:47:01] <ottomata>	 madhuvishy:  ready for merge ? :)
[18:47:15] <madhuvishy>	 ottomata: Yup :) nothing will happen though right?
[18:47:19] <ottomata>	 naw
[18:47:22] <madhuvishy>	 ya cool
[18:47:24] <krrrit-wm>	 (CR) Ottomata: [C: 2] Add script to drop old eventlogging partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253) (owner: Madhuvishy)
[18:47:31] <krrrit-wm>	 (CR) Ottomata: [V: 2] Add script to drop old eventlogging partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/240299 (https://phabricator.wikimedia.org/T106253) (owner: Madhuvishy)
[18:47:46] * madhuvishy hunts for related puppet code
[19:00:37] <ottomata>	 madhuvishy:  you probably alrady found
[19:00:38] <ottomata>	 but
[19:00:38] <ottomata>	 https://phabricator.wikimedia.org/T83580
[19:00:41] <ottomata>	 oops
[19:00:44] <ottomata>	 https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/analytics/refinery.pp#L82
[19:00:45] <madhuvishy>	 ya found :)
[19:04:05] <madhuvishy>	 ottomata: https://gerrit.wikimedia.org/r/240449
[19:05:08] <wikibugs>	 Analytics-EventLogging, MediaWiki-extensions-MultimediaViewer: 60% of MultimediaViewerNetworkPerformance events dropped (exceeds maxUrlSize) - https://phabricator.wikimedia.org/T113364#1667404 (Jdlrobson) p:Triage>Normal
[19:06:42] <ottomata>	 madhuvishy:  missing change of ${log_file} var in cron
[19:45:57] <milimetric>	 ottomata: ok, but I have to finish this reportupdater stuff first
[19:47:01] <ottomata>	 OOOK
[19:48:06] <krrrit-wm>	 (CR) Milimetric: [C: 2] Make reportupdater support execution of scripts [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/237398 (https://phabricator.wikimedia.org/T112109) (owner: Mforns)
[19:55:22] <madhuvishy>	 ottomata: oops, fixed.
[19:57:41] <ottomata>	 ok madhuvishy looks good,
[19:57:45] <ottomata>	 i'm goign to deploy refinery and try your script
[19:57:49] <ottomata>	 if that works then we can merge this too
[20:00:28] <madhuvishy>	 okay cool
[20:00:42] <madhuvishy>	 also, ottomata I was asking before - https://gerrit.wikimedia.org/r/#/c/164653/ is this still relevant?
[20:01:29] <ottomata>	 i dunno, lets' say no
[20:01:30] <ottomata>	 so old!
[20:01:32] <ottomata>	 abandoning.
[20:01:38] <krrrit-wm>	 (Abandoned) Ottomata: Add oozie util workflow to drop partitions from Hive tables [analytics/refinery] - https://gerrit.wikimedia.org/r/164653 (owner: Ottomata)
[20:02:34] <madhuvishy>	 ottomata: :)
[20:04:43] <madhuvishy>	 milimetric: this is for you - https://gerrit.wikimedia.org/r/#/c/181179/ this repo looks old/inactive - should it be deprecated?
[20:04:45] <ottomata>	 llooks good madhuvishy, merging.
[20:04:51] <madhuvishy>	 ottomata: thanks
[20:05:21] <milimetric>	 madhuvishy: yeah, ori asked me to do that a while back, but I don't think anything came of it.  I'll abandon
[20:05:47] <madhuvishy>	 milimetric: okay, can I add it here? https://etherpad.wikimedia.org/p/GerritCleanupDay
[20:06:17] <krrrit-wm>	 (Abandoned) Milimetric: Add basic flask server with highcharts [analytics/abacist] - https://gerrit.wikimedia.org/r/181179 (owner: Milimetric)
[20:06:26] <milimetric>	 madhuvishy: yes, thx
[20:08:42] <madhuvishy>	 milimetric: np, done :)
[20:13:42] <wikibugs>	 Analytics: Setup pipeline for search logs to travel through kafka and camus into hadoop - https://phabricator.wikimedia.org/T113521#1667718 (EBernhardson) a:Ottomata
[20:14:21] <wikibugs>	 Analytics, Analytics-Backlog, Analytics-Cluster: Setup pipeline for search logs to travel through kafka and camus into hadoop - https://phabricator.wikimedia.org/T113521#1667723 (Ottomata)
[20:14:46] <ottomata>	 oop madhuvishy missed this
[20:14:46] <ottomata>	 https://gerrit.wikimedia.org/r/#/c/240566/
[20:15:17] <madhuvishy>	 ottomata: ouch, sorry about that, thanks!
[20:19:05] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1667747 (EBernhardson) I checked with otto, 6k req/s and 150GB/day is no problem to send to the existing kafka cluster.  The biggest difference of logging to the kafka->hadoop pipe...
[20:20:06] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1667748 (MeganHernandez_WMF) Hey @Jgreen checking to see if the impression numbers should be sampled differentl...
[20:20:59] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1667756 (Anomie) I don't know anything about kafka or hadoop, so...
[20:21:33] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1667759 (Ottomata) BTW logs going to Kafka don't just have to go into Hadoop for processing.  We use kafkatee in a few places to consume, sample and filter from kafka into simple l...
[20:22:44] <ottomata>	 madhuvishy:  works!  thank you!
[20:23:11] <madhuvishy>	 ottomata: awesome.
[20:23:20] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1667761 (Jgreen) >>! In T97676#1667748, @MeganHernandez_WMF wrote: > Hey @Jgreen checking to see if the impress...
[20:23:54] <madhuvishy>	 milimetric: I'm planning to run a hive class of sorts for a bunch of people requesting access atm next week. Do you know more people who want access we should reach out to?
[20:24:25] <milimetric>	 madhuvishy: what's the current group?
[20:24:58] <madhuvishy>	 Zhou, Dan, Edward, Jonathan, and Neil mentioned today he wanted to do some stuff.
[20:25:16] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1667768 (ellery) @jgreen do the banner impression numbers in the table  pgheres.bannerimpressions correctly ref...
[20:25:25] <milimetric>	 madhuvishy: the folks from wikidata but I'm thinking they're already ok
[20:25:29] <milimetric>	 you could reach out and let them know
[20:26:02] <milimetric>	 (maybe just email / phab ping your friend Lydia)
[20:26:24] <madhuvishy>	 okay will do
[20:28:12] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1667776 (EBernhardson) Within search our primary destination for our logging data will be hive. This is an SQL-like frontend that accepts mostly standard SQL queries on your data t...
[20:29:25] <wikibugs>	 Analytics, Analytics-Backlog, Analytics-Cluster: Setup pipeline for search logs to travel through kafka and camus into hadoop - https://phabricator.wikimedia.org/T113521#1667778 (Ottomata)
[20:34:14] <nuria>	 milimetric: i e-mailed lydia a while back but got no answer
[20:36:47] <wikibugs>	 Analytics-EventLogging, Fundraising-Backlog, Unplanned-Sprint-Work, Fundraising Sprint Tom Waits, Patch-For-Review: Promise returned from LogEvent should resolve when logging is complete - https://phabricator.wikimedia.org/T112788#1667801 (Ejegg) @awight, did you move this by mistake?  I'm gett...
[20:38:35] <wikibugs>	 Analytics-EventLogging, Fundraising-Backlog, Unplanned-Sprint-Work, Fundraising Sprint Tom Waits, Patch-For-Review: Promise returned from LogEvent should resolve when logging is complete - https://phabricator.wikimedia.org/T112788#1667810 (awight) Sorry, it looks like I never CR+1'd even, yeah...
[20:39:16] <wikibugs>	 Analytics-EventLogging, Fundraising-Backlog, Unplanned-Sprint-Work, Fundraising Sprint Tom Waits, Patch-For-Review: Promise returned from LogEvent should resolve when logging is complete - https://phabricator.wikimedia.org/T112788#1667814 (Nuria) @ejegg Thank you for your changes , sorry I coul...
[20:45:34] <wikibugs>	 Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog, operations: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1667863 (Jgreen) >>! In T97676#1667768, @ellery wrote: > @jgreen do the banner impression numbers in the table...
[20:49:13] <milimetric>	 mforns: I could probably learn this myself but it feels easier to ask
[20:49:25] <mforns>	 milimetric, sure
[20:49:28] <milimetric>	 for the previous results, as they are keyed by "date"
[20:49:36] <milimetric>	 can the value there be an array of arrays?
[20:49:40] <milimetric>	 or just an array of values?
[20:50:01] <mforns>	 it is an array of arrays if the report config has funnel=true
[20:50:06] <milimetric>	 gotcha
[20:50:08] <mforns>	 I know funnel is a terrible name
[20:50:12] <milimetric>	 :)
[20:50:16] <mforns>	 we can change that
[20:50:19] <milimetric>	 it's ok, that's still the layout name in dashiki
[20:52:40] <mforns>	 milimetric, if you look at utils:get_previous_results or executor:execute_sql or writer.write_results you'll see the "if report.is_funnel"
[20:52:51] <milimetric>	 right
[20:53:42] <mforns>	 btw, thanks for the review
[21:09:33] <milimetric>	 mforns: I'm not sure if I did something wrong or there's a minor bug
[21:09:39] <milimetric>	 there's this line in the teardown:             os.remove('test/fixtures/output/writer_test.tsv')
[21:09:42] <mforns>	 milimetric, batcave?
[21:09:52] <milimetric>	 test/fixtures/output/writer_test1.tsv
[21:10:03] <milimetric>	 (oh, small issue I think)
[21:10:06] <mforns>	 ok
[21:10:27] <milimetric>	 so my test is now adding lines to writer_test1.tsv
[21:10:37] <mforns>	 aha
[21:11:59] <milimetric>	 i was just wondering if you had a similar test that created different output and you validated without changing the comitted output files
[21:12:50] <mforns>	 yes, that's kind of a manual thing, you can add the files that you want to remove after executing the test - to self.paths_to_clean
[21:13:00] <mforns>	 and the tearDown will delete them
[21:13:17] <mforns>	 I don't know if that's what you want
[21:13:42] <milimetric>	 well... hm
[21:13:46] <milimetric>	 batcave :)
[21:13:49] <mforns>	 ok :]
[21:15:47] <nuria>	 ejegg: yt?
[21:16:58] <ejegg>	 yep!
[21:18:19] <ejegg>	 yep, i'm here nuria
[21:18:39] <nuria>	 ejegg: regarding the sendbeacon patch
[21:18:48] * AndyRussG lurks
[21:19:08] <nuria>	 ejegg: you are correct that for browsers that do not support sendBeacon you will not see logging when moving away from page
[21:19:44] <ejegg>	 yeah, we were hoping to learn what series of banner views leads a person to donate
[21:19:52] <nuria>	 that is correct. you can read epic bug on this topic:
[21:20:13] <ejegg>	 so we wanted to log a banner history event for everyone that clicks through the fundraising banner
[21:20:31] <nuria>	 (long read) https://phabricator.wikimedia.org/T44815
[21:20:50] <ejegg>	 and longstanding, apparently!
[21:21:02] <nuria>	 the bottom line is that you cannot log (w/o delaying page loads) in the absence of sendbeacon
[21:21:09] <nuria>	 and we do not do it on purpose
[21:21:25] <AndyRussG>	 nuria: yeah we were looking at the solution used by gettingstarted, but on close examination, it seemed that it didn't work
[21:22:25] <AndyRussG>	 nuria: the browsers that don't support sendbeacon may represent more or less of certain segments of users/potential donors so excluding them could skew the data
[21:23:03] <awight>	 Hi!  I think the promise makes it possible to use as fire+forget, but for higher value data the calling code can decide to delay the page load or run a fallback path.  AndyRussG: is that what we were planning to do?
[21:23:21] <nuria>	 AndyRussG, ejegg brb
[21:23:21] <AndyRussG>	 currently the banner that we're testing on production just waits 200 ms after making the eventlogging call before navigating away (only for non-sendBeacon browsers)
[21:23:49] <ejegg>	 which skews data according to network speed
[21:24:03] <awight>	 yeah, .2s is too short
[21:24:08] <AndyRussG>	 hmmm
[21:24:13] <ejegg>	 and longer would needlessly delay folks on speedy networks
[21:24:54] <awight>	 Perhaps our approach should be to add the json to the paymentswiki request?  Which would still require the promise to resolve.
[21:24:56] <AndyRussG>	 FWIW here's the banner http://meta.wikimedia.beta.wmflabs.org/wiki/Special:CentralNoticeBanners/edit/CN_banner_history_test_banner
[21:25:42] <AndyRussG>	 nuria: ejegg: awight: I had two ideas about other possible solutions
[21:27:04] <AndyRussG>	 One is that eventLogging could just resolve the promise as it currently does, but in the case of non-sendBeacon browsers, when it resolves send the Javascript object representing the new img tag that was created to send the log
[21:27:27] <kevinator>	 hey milimetric, got a sec?
[21:27:28] <AndyRussG>	 That would let the calling code decide whether it wants to set an onload event or something
[21:27:35] <milimetric>	 kevinator: sure
[21:27:42] <kevinator>	 batcave?
[21:28:17] <AndyRussG>	 The other is that we could just substitute the donate button link for a link to the event log URL. And then, on the server, redirect.
[21:28:29] <milimetric>	 kevinator: omw
[21:28:44] <AndyRussG>	 I think that's the traditional solution... pretty sure it's been used by Google and the like for eons
[21:29:31] <AndyRussG>	 We could POST instead of GET to avoid payload issues, at least in that part...
[21:29:50] <ejegg>	 Seems like a much bigger change to eventlogging, allowing redirects
[21:31:11] <ejegg>	 caching seems problematic, and you'd have to avoid being an open redirect somehow
[21:32:14] <AndyRussG>	 ejegg: couldn't varnish just grab a param and redirect to that?
[21:32:31] <ejegg>	 without validating?
[21:32:54] <ejegg>	 lets anybody make a link to wikipedia that bounces to their malicious site
[21:33:06] <AndyRussG>	 ejegg: hmmm
[21:34:38] <awight>	 AndyRussG: donno about returning the img element, it would do the job, but having different signatures is messy.  I don't think ejegg's patch prevents the next page from loading, right?
[21:35:03] <awight>	 AndyRussG: about the second solution though, why not just send all the data to paymentswiki and do the EL on the server side?
[21:36:14] <ejegg>	 awight: right, my patch doesn't block, just lets you wait if you want to
[21:37:00] <nuria>	 ejegg: back
[21:37:14] <AndyRussG>	 awight: (first point) no it doesn't, though it does put more infrastructure in EL for such a case
[21:37:40] <AndyRussG>	 awight: ejegg: re: the redirect, you're right, hrm... For example, Google does validate its click-through URLs before redirecting
[21:38:19] <nuria>	 ejegg: let me catch up
[21:38:38] <ejegg>	 hi nuria!  Just discussing other possible workarounds
[21:38:56] <AndyRussG>	 awight: sending it all on to paymentswiki is certainly a solution! I think we'd looked at that, but stopped since EL isn't installed there... however...
[21:39:10] <awight>	 AndyRussG: oooh right.  We would have to go through donatewiki
[21:39:33] <AndyRussG>	 we already have a hack in the bannerHistoryLogger that creates the EL URL, just to make sure it doesn't get too long
[21:39:49] <AndyRussG>	 We could certainly send that to whatever comes after the banner
[21:39:53] <awight>	 The hit isn't unpalatable, since this is just for the people w/o sendBeacon, who we are inconveniencing either way
[21:40:17] <awight>	 sigh, no sendBeacon on any IE or Safari though
[21:40:33] <ejegg>	 maybe check for DNT too, and send those folks along w/o delay
[21:40:37] <nuria>	 AndyRussG, awight before doing a workarround I think you should look at what % of our user base on desktop
[21:40:45] <nuria>	 doesn't have sendbeacon support
[21:41:03] <AndyRussG>	 ejegg: right
[21:41:03] <nuria>	 AndyRussG, awight it is probably way less than you might think
[21:41:15] <ejegg>	 all safari, including all ios users, right?
[21:41:23] <nuria>	 AndyRussG, awight cause the HUGE majority of our users use chrome
[21:41:48] <nuria>	 ejegg, awight so i would postulate that you do not need to do a workarround
[21:42:03] <nuria>	 ejegg: are you planning onlogging with a sampling rate or 1:1
[21:42:24] <ejegg>	 for the navigation click, yes, we plan to log all of them
[21:42:33] <ejegg>	 otherwise it's a tiny sample rate
[21:42:44] <AndyRussG>	 http://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
[21:42:44] <awight>	 Something like 12% or more of our readers don't have sendBeacon, reading here: https://stats.wikimedia.org/wikimedia/squids/SquidReportClients.htm
[21:42:52] <ejegg>	 which we log not too long into the page view
[21:42:58] <AndyRussG>	 awight: jinx!
[21:43:12] <nuria>	 some like over 3-5% do not have javascript
[21:43:15] <nuria>	 and thus not EL
[21:43:41] <ejegg>	 on desktop, IE is 12%, Safari is 4%
[21:43:54] <nuria>	 man..a-team needs to really update those reports
[21:43:59] <awight>	 :)
[21:44:10] <nuria>	 i been wanting to do that for ages
[21:44:12] <ejegg>	 iPad is 3%, iPhone is 11%
[21:44:34] <nuria>	 lemme get the last report made for mobile, cause the squid reports we know are incorrect
[21:44:47] <awight>	 Just checking, ejegg's patch is a bad idea because it makes EL heavier, or because it allows the caller to do naught non-fire-and-forget things?
[21:44:56] <awight>	 *naughty
[21:44:58] <nuria>	 kills me taht we do not have this data readily available for you guys
[21:44:59] <ejegg>	 So total about 30% don't do sendBeacon
[21:45:02] <AndyRussG>	 nuria: it's also a different sort of user that may be on these browsers without sendBeacon, so omitting them can skew the stats
[21:45:24] <nuria>	 awight: because is a performace issue waiting to happen
[21:45:31] <awight>	 hmm
[21:45:43] <nuria>	 awight: gives you the idea that is ok not to fire and forget
[21:46:10] <nuria>	 so yes, we want fire and forget
[21:46:32] <nuria>	 lemme give you an idea of % of non beacon support on desktop
[21:46:36] <awight>	 ^ yeah what AndyRussG said, the browser shares will look very different in specific countries where we're running these campaigns, compared to the global stats
[21:47:44] <nuria>	 ejegg: have in mind that you do not need every single data point to take good decisions
[21:47:49] <ejegg>	 nuria: I'm pretty sure that patch doesn't add any more operations to LogEvent beyond the current state of EL
[21:47:55] <AndyRussG>	 ...or also, within any given region, for example, non-technical users who visit infrequently, use bad software, but do donate, vs. tech-savvy sendBEeacon-endowed users who always close FR banners
[21:48:24] <nuria>	 ejegg: statististics help you infer without having to sample the whole population
[21:48:33] <ejegg>	 nuria: but browser use does correlate with other demographics that might influence donations
[21:49:03] <AndyRussG>	 nuria: this particular question is for Ellery though. In FR-tech we don't actually process and study the data. I don know that Ellery can often focus on quite small percentage differences
[21:50:20] * AndyRussG sympathizes that nuria is the only voice championing performance in this discussion...
[21:50:26] <awight>	 :)
[21:50:28] <nuria>	 AndyRussG: have in mind that EL - by definition, because it requires js support- will never give you the whole user base
[21:50:38] <nuria>	 AndyRussG: so you are always working with a subset
[21:50:50] <ejegg>	 Yeah, but the non-js folks never see the banners anyway
[21:50:59] <AndyRussG>	 nuria: true... also ^ what ejegg just said
[21:51:48] <AndyRussG>	 Heh untill we make CentralNotice fully server-side ESI
[21:52:08] * awight hears the wailing of the banshee
[21:53:21] <nuria>	 jaja (spanish laugh)
[21:53:39] <nuria>	 ok, let's get on track here, lemme get a better browser list
[21:54:12] <AndyRussG>	 ejegg: nuria: awight: Here is the hacky cut'n'paste-y function function we have in banner history that creates the EL URL to check its size ('cause of the EL payload size bug): https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCentralNotice/b248eac052184e14c57e3a7cea4c37230471fed2/resources%2Fsubscribing%2Fext.centralNotice.bannerHistoryLogger.js#L196
[21:54:47] <AndyRussG>	 We could just send that same data in a post along to the next wiki, and that wiki could call the URL, w/out needing to have EL installed
[21:54:58] <awight>	 AndyRussG: I think it'll be easy to get a number on the number of people who should have sent banner history associated with a donation but did not
[21:55:04] <awight>	 We could start there...
[21:58:30] <nuria>	 Krinkle: did you compiled a browser list the other day? ( i know a-team needs to get this and i am the 1st one to say that)
[21:58:53] <Krinkle>	 I made one last month yeah
[21:59:06] <AndyRussG>	 awight: yea good point :)
[21:59:14] <mforns>	 nuria, do you need help?
[21:59:28] <nuria>	 Krinkle: can you share, and i will make it my personal mission this is reported by analytics next quarter
[21:59:33] <Krinkle>	 https://docs.google.com/a/wikimedia.org/spreadsheets/d/1n9FhSqcBGM9iKXrlHsP0EZI0gU89Rmz5m51uglUGVjs/edit?usp=drive_web
[22:00:23] <ejegg>	 nuria: we fr-tech folks have a meeting in a sec. can we continue this discussion later?
[22:00:24] <Krinkle>	 nuria: Aside from the query I ran, I did one extra thing by hand which is to combine all entires with <10K hits into one entry on the bottom
[22:00:28] <Krinkle>	 for privacy reasons
[22:00:43] <nuria>	 ejegg: sure, i start early, so tomorrow is good too
[22:01:04] <nuria>	 Krinkle: sounds good, we have updated ua parser, so the other bucket shall decrease
[22:01:16] <ejegg>	 ok, cool!
[22:01:29] <nuria>	 Krinkle: man, i am putting getting this list in my goals, like it is going to be my ONLY GOAL
[22:01:43] <Krinkle>	 I'm excited :)
[22:12:17] <krrrit-wm>	 (PS1) Milimetric: Handle new or re-arranged columns [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/240599
[22:13:30] <krrrit-wm>	 (CR) Milimetric: "as soon as this [1] change is merged, we can merge this patch without touching the files." [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/237534 (owner: Jforrester)
[22:13:34] <wikibugs>	 Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon - https://phabricator.wikimedia.org/T44815#1668290 (Ejegg) Does the GettingStarted workaround even work?  Looks like the LogEvent promise is always resolved or rejected when the fun...
[22:36:18] <awight>	 nuria: Thanks again for the browser numbers, that's already helping inform our internal conversations!
[22:37:06] <nuria>	 awight: ok, let me know if you want to talk about this further tomorrow or today
[22:47:34] <wikibugs>	 Analytics-Kanban: Introduction to Hive class {flea} - https://phabricator.wikimedia.org/T113545#1668452 (madhuvishy) NEW a:madhuvishy
[22:50:31] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1668471 (Tgr) IMO if the goal is to put data into Hadoop, we should pick one of the other solutions from T102079#1622808. But if someone needs quick and dirty access right now, api...
[23:12:21] <wikibugs>	 Analytics-Backlog, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client {stag} - https://phabricator.wikimedia.org/T106257#1668575 (EBernhardson)
[23:12:23] <wikibugs>	 Analytics-EventLogging, MW-1.26-release, Patch-For-Review, WMF-deploy-2015-09-22_(1.26wmf24): Kafka Client for MediaWiki - https://phabricator.wikimedia.org/T106256#1668573 (EBernhardson) Open>Resolved a:Ottomata>EBernhardson
[23:15:21] <mforns>	 bye everyone! see you tomorrow!
[23:15:46] <awight>	 ejegg: AndyRussG: Are we relenting on the EventLogging patch for now?
[23:16:29] <ejegg>	 maybe see how bad ewulczyn thinks it'll skew his data?
[23:16:37] <wikibugs>	 Analytics-Backlog, MediaWiki-API, Research-and-Data: log user agent in api.log - https://phabricator.wikimedia.org/T108618#1668621 (EBernhardson) I've built out option 3 from T102079 and that is exactly what i'm suggesting to do here. It would be rather pointless to put the text logs into kafka but it...
[23:16:56] <awight>	 ok that sounds great
[23:17:38] <awight>	 nuria: Thanks for all the help, we'll talk to ewulcyzn about the underrepresentation and respond on the gerrit changeset!
[23:18:56] <AndyRussG>	 awight: ejegg: I think posting the URL and sending from DI will work, also
[23:19:10] <AndyRussG>	 nuria: thanks a ton!! :)
[23:19:30] <awight>	 AndyRussG: but that solution would be a fallback which we would want to trigger based on the EL promise not resolving.
[23:20:10] <AndyRussG>	 awight: why not just do that for all non-sendBeaconners? It would also avoid any delay
[23:21:04] <awight>	 You mean check for sendBeacon support in CN?  That's fine, it does muddy separation of responsibility a bit but no big deal
[23:21:18] <AndyRussG>	 yeah already pretty muddies
[23:21:26] <awight>	 :)
[23:21:27] <AndyRussG>	 We could ask EL to make a public method for that instead
[23:22:11] <AndyRussG>	 And also a method for getting the real EL URL
[23:22:26] <AndyRussG>	 awight: ^
[23:22:43] <awight>	 Not sure exposing isSendBeacon is right either, I'd want to ask EL exactly the question the promise answers--was the data sent yet?
[23:23:37] <awight>	 But since we can measure how much data is missing, I'm happy with the redirect to donatewiki whenever !sendBeacon
[23:23:58] <awight>	 Why would we want the real EL URL
[23:24:11] <awight>	 Server-side EL doesn't use it
[23:28:26] <AndyRussG>	 awight: ah hrmm silly me, yes
[23:28:45] <AndyRussG>	 awight: what do you mean, measure how much data is missing? and how does that affect this?
[23:30:45] <awight>	 Well, we can just analyze campaign results and get a number for how many donors were missing their banner history information.  If it was going to be unknown how many were missing, I'd wanna be more certain that we're plugging all the known holes a reader could fall through.
[23:31:46] <AndyRussG>	 awight: hmmm... well, we coud try to plug the holes anyway, too
[23:32:42] <awight>	 I'm happy that we can get a number to tell us how big the holes are.
[23:33:16] <AndyRussG>	 awight: did we talk about how hard it might be to install EL where DI lives? (paymentswiki?)
[23:33:21] <AndyRussG>	 Maybe it's easy enuf
[23:33:35] <awight>	 30% (non-sendbeacon browsers) is not OK, but I assume we'll just get a trickle of donors who never manage to send the banner history info
[23:33:36] <AndyRussG>	 And so we could post the log and then log server-side
[23:33:55] <awight>	 AndyRussG: I think installing EL on frack is a nonstarter, at least at this time of year
[23:34:08] <awight>	 there's also the bigger issue of getting that data back to a cluster where ewulczyn can use it
[23:34:29] <AndyRussG>	 At least some should get through the 200 ms! I get it OK on all the tests
[23:34:34] <awight>	 The donatewiki redirect is a pain, it adds a few seconds... donno how we can avoid it though
[23:35:30] <AndyRussG>	 what about if we just send the data to donatewiki, then donatewiki includes it in the web page, and then it's sent client-side?
[23:35:40] <AndyRussG>	 awight: we don't need EL installed on the client just to call a URL
[23:36:14] <awight>	 The end destination is paymentswiki, though.  Not sure what client-side donatewiki EL would help with?
[23:37:21] <AndyRussG>	 awight: it would give us a chance to send the log for all donors. The end destination just needs to get the ID, I think
[23:38:20] <awight>	 but donatewiki could EL the history server-side
[23:38:47] <wikibugs>	 Analytics-Kanban: Introduction to Hive class {flea} - https://phabricator.wikimedia.org/T113545#1668768 (madhuvishy) Scheduled for 29th September, 11 am PST.
[23:38:55] <AndyRussG>	 awight: so we have EL on donatewiki but not paymentswiki?
[23:39:12] <awight>	 yep!
[23:39:15] <awight>	 donatewiki is in the main cluster
[23:39:31] <awight>	 We'd need to enable the extension, but that's it
[23:39:54] <AndyRussG>	 awight: ah cool! Hmmm. And donatewiki... Like where you go after a susccessful donation?
[23:40:09] <awight>	 I think wmfwiki is where the thank-you page lives
[23:40:10] * AndyRussG apologizes for ignorance about the other side of the shop
[23:40:23] <awight>	 no worries!  back to #wikimedia-fundraising...
[23:49:46] <Juandev>	 hi