[00:34:52] Analytics-EventLogging: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2672617 (Tbayer)
[03:57:07] Analytics, Analytics-EventLogging: Some recent ExternalLinksChange data lost - https://phabricator.wikimedia.org/T146815#2673154 (Nuria) >As a result of some recent EventLogging issues We have not had any issues with EL this quarter that we know of. Will take a look at whether these events are being received...
[03:57:24] Analytics-Kanban: Some recent ExternalLinksChange data lost - https://phabricator.wikimedia.org/T146815#2673155 (Nuria)
[04:13:40] Analytics: Responses on pageview API should be lighter - https://phabricator.wikimedia.org/T145935#2645635 (MusikAnimal) I think it would be favourable to still return the request data but at the base of the response, something like: ``` { article: 'Example', access: 'all-access', agent: 'user', gran...
[05:36:03] Analytics-EventLogging: Various EventLogging schemas losing events since around September 8/9 - https://phabricator.wikimedia.org/T146840#2673201 (Tbayer) > Interestingly, Firefox does not seem to be affected there, which leaves Chrome as the main suspect in case of that particular schema (because it exclude...
[10:06:30] Analytics, Fundraising-Analysis, Fundraising-Backlog, MediaWiki-extensions-CentralNotice: Provide performant query access to banner show/hide numbers - https://phabricator.wikimedia.org/T90649#1064105 (Milimetric) +1 to the streaming way. We have to get there eventually and this task is already...
[10:10:12] Analytics: Responses on pageview API should be lighter - https://phabricator.wikimedia.org/T145935#2673692 (Milimetric) Yes, we can make the optimization to factor out everything that won't vary in the response.
[10:12:01] Analytics: Responses on pageview API should be lighter - https://phabricator.wikimedia.org/T145935#2673694 (Milimetric)
[10:14:02] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#2673695 (Milimetric) \o/ thanks Lego, I'm gonna check out that code, maybe I can give these...
[10:23:48] Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#2673720 (Milimetric) The union all approach hits some limits as mysql doesn't let you do that beyond I think like 50 or 100 times or something like that. @yuvipanda, yes, not p...
[12:15:20] joal, hi! yt?
[12:15:33] Hey mforns
[12:15:38] heya!
[12:16:34] I was load testing druid and... it crashed
[12:16:42] mforns: :(
[12:17:26] strangest thing is that, even when I stopped the request load, druid logs continue to flow fast
[12:17:42] and CPU utilization is super high
[12:18:02] wow
[12:19:40] joal, I wonder if pivot, seeing that druid is not responding, continues to send queries to it
[12:19:55] completely possible mforns :(
[12:20:33] joal, do you know if we need ottomata to restart pivot?
[12:20:39] mforns: for pivot, sure
[12:20:49] mforns: for druid, I think we have admin rights
[12:21:05] k, I'll try to restart druid
[12:21:29] thanks mforns
[12:25:48] Analytics-Tech-community-metrics: "Backlog" widget on "Gerrit-Backlog" has redundant "Changesets" column (always "1") - https://phabricator.wikimedia.org/T146891#2673894 (Aklapper)
[12:25:52] joal, do you know if we are already using a dedicated zookeeper for each druid machine, or do we still have to do that?
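The T145935 exchange above (MusikAnimal at 04:13, Milimetric at 10:10) is about returning the request parameters once at the base of the response instead of repeating them in every item. A minimal Python sketch of that factoring follows; the per-item field names mirror the snippet quoted at 04:13 plus the usual timestamp/views pair, but the flattened output layout and the slim_response helper are illustrative assumptions, not the format agreed on the task.

```
def slim_response(full):
    """Move keys that are identical across all items to the base of the response."""
    items = full["items"]
    if not items:
        return full
    # keys whose value is the same in every item get hoisted to the top level
    constant = {
        k: v for k, v in items[0].items()
        if all(item.get(k) == v for item in items)
    }
    slim_items = [
        {k: v for k, v in item.items() if k not in constant}
        for item in items
    ]
    return {**constant, "items": slim_items}

# Example: only "timestamp" and "views" vary, so everything else is hoisted once.
full = {"items": [
    {"project": "en.wikipedia", "article": "Example", "granularity": "daily",
     "access": "all-access", "agent": "user", "timestamp": "2016092900", "views": 100},
    {"project": "en.wikipedia", "article": "Example", "granularity": "daily",
     "access": "all-access", "agent": "user", "timestamp": "2016093000", "views": 120},
]}
print(slim_response(full))
```

For a series over a single article, everything except timestamp and views is emitted once, which is where the size saving discussed on the task would come from.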
[12:26:06] Analytics-Tech-community-metrics: "Backlog" widget on "Gerrit-Backlog" has redundant "Changesets" column (always "1") - https://phabricator.wikimedia.org/T146891#2673894 (Aklapper)
[12:26:30] mforns: dedicated: we have a zookeeper per druid host
[12:26:37] joal, thx
[12:28:50] Analytics-Tech-community-metrics: "Backlog" widget on "Gerrit-Backlog" seems to cover only last two years, misses oldest open changesets - https://phabricator.wikimedia.org/T146893#2673934 (Aklapper)
[12:29:02] Analytics-Tech-community-metrics: "Backlog" widget on "Gerrit-Backlog" seems to cover only last two years, misses oldest open changesets - https://phabricator.wikimedia.org/T146893#2673934 (Aklapper) p:Triage>Normal
[12:29:16] Analytics-Tech-community-metrics: "Backlog" widget on "Gerrit-Backlog" has redundant "Changesets" column (always "1") - https://phabricator.wikimedia.org/T146891#2673947 (Aklapper) p:Triage>Low
[12:29:30] joal, I do not have sudo rights :[
[12:29:45] mforns: hm
[12:29:59] joal, but seems druid is going back to normal slowly
[12:30:05] k
[12:32:36] mforns: druid1001 seems not to be overworking anymore
[12:33:18] joal, yes, pivot and druid are back
[12:33:54] great mforns, good perf test, system seems resilient :)
[12:34:18] xD, joal, but I did very few requests...
[12:34:28] mforns: Arf :(
[12:34:57] joal, I'll repeat and try to find the threshold of meaningful requests per second
[12:35:33] mforns: k, question: what datasource?
[12:35:40] pageview_hourly
[12:36:06] mforns: right, makes sense
[13:01:05] o/
[13:02:41] Hi halfak, as planned, I can't make it as of now
[13:02:54] Understood, joal. :)
[13:13:44] o/ milimetric
[13:13:51] heyo halfak
[13:14:07] (I figured we weren't meeting, sorry if you were waiting for me!)
[13:14:07] Hey dude. I moved the live systems meeting to now. Does that work for you?
[13:14:26] I think it should be short today. I have a couple of questions about your thoughts on some things.
[13:14:30] yeah, now works, I thought you didn't wanna meet 'cause joseph
[13:14:34] oh totally, jumping in
[13:32:33] halfak, milimetric: still talking?
[13:32:39] yep
[13:32:52] halfak, milimetric: ok, joining (except if you prefer me not to ;)!
[14:29:28] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#2674224 (Nuria) @milimetric: changes will be deployed with the next mediawiki deployment
[14:31:30] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinksChange - https://phabricator.wikimedia.org/T115119#2674231 (Milimetric) @Nuria: but it's not merged yet.
[15:44:51] hey nuria_ Can you help us get unblocked with https://phabricator.wikimedia.org/T146064 ?
[15:45:44] I'd rather you explicitly resolve this on the task, nuria_.
[16:03:02] Analytics: Quantify false positives when filtering for number of distinct user agents per page in top pages computation - https://phabricator.wikimedia.org/T146911#2674479 (Nuria)
[16:16:56] leila: got your ping, sorry, just finished a meeting
[16:17:52] np, nuria_. I just want you to communicate that we're blocked, and this is critical to move forward.
[16:18:25] leila: right, my concern is that we do not want to have raw IPs long term and this patch doesn't include cleanup
[16:18:52] leila: manual cleanup is faulty as someone needs to remember to run a select at a certain date, etc.
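At 12:34:57 above, mforns says he will repeat the test and look for the requests-per-second threshold at which Druid (datasource pageview_hourly) falls over. A rough sketch of such a ramp test is below; it is not the harness actually used, and the broker endpoint, the queried interval, and the view_count metric name are all assumptions about the cluster rather than verified values.

```
import time
import requests

# Assumed broker endpoint and query: druid1001 is mentioned above, 8082 is
# Druid's default broker port, and the interval and "view_count" metric name
# are placeholders for whatever is actually loaded.
BROKER = "http://druid1001.eqiad.wmnet:8082/druid/v2/"
QUERY = {
    "queryType": "timeseries",
    "dataSource": "pageview_hourly",
    "granularity": "hour",
    "intervals": ["2016-09-01/2016-09-02"],
    "aggregations": [{"type": "longSum", "name": "views", "fieldName": "view_count"}],
}

def error_rate(rate_per_s, duration_s=30):
    """Send rate_per_s queries per second for duration_s seconds; return the failure ratio."""
    failures = sent = 0
    for _ in range(int(rate_per_s * duration_s)):
        start = time.time()
        sent += 1
        try:
            requests.post(BROKER, json=QUERY, timeout=5).raise_for_status()
        except requests.RequestException:
            failures += 1
        # crude pacing: sleep off whatever is left of this request's time slot
        time.sleep(max(0.0, 1.0 / rate_per_s - (time.time() - start)))
    return failures / sent

for rate in (1, 2, 5, 10, 20, 50):
    print(rate, "req/s ->", error_rate(rate))
```

Stepping the rate and recording the failure ratio per step gives a crude picture of where the broker starts timing out or shedding requests.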
[16:19:30] leila: I was trying to think what else we could do, but the best I can think of is what I was saying about preemptively calculating signatures
[16:19:42] nuria_: it's not clear if we can drop the IP altogether at the end of the 90 days.
[16:20:00] we may have to hash them with salt, and delete the salt, for example.
[16:20:09] this I will know once we get a chance to work with this data in the first place
[16:20:27] nuria_, ^
[16:21:10] nuria_: my suggestion is that we extract, and in a month, I get back to it with purging options.
[16:21:30] I can create a phab task so I don't drop the ball on it, nuria_. I know exactly what your concern is.
[16:21:47] once we know what the reasonable purging option is, we can update the job to include it, nuria_.
[16:22:09] leila: why couldn't we hash them as we collect them now, just like we used to do in EL?
[16:22:34] because the actual IP address information may be valuable for us.
[16:22:44] for example, if we want to sample based on the last n digits of the IP, nuria_
[16:22:58] hashing makes it harder, as we would need to think now about all the things that we may need.
[16:24:39] nuria_: I appreciate the care you're putting into making sure the sensitive data doesn't leak out, but I need to work with this data before I can say what parts of it I need and what parts I don't. I'm signing up for being responsible for it, and making a task to get back to it in a month and review purging options.
[16:25:01] "leak out" -> comes out of the cluster without care.
[16:26:05] leila: ok, let's review in a month, but purging options for a table created by an oozie job also need to be code changes, for which you will need access to engineering resources
[16:26:34] understood nuria_.
[16:26:51] milimetric: I will let marcel CR your semantic change 'cause it doesn't look like I'm going to get to it today.
[16:27:14] leila: can you send me the task you created about purging?
[16:27:27] yup, working on it, and will attach it to the other task
[16:27:47] nuria_: np, cc mforns ^
[16:29:30] Analytics, Research-and-Data, Research-collaborations, Research-management, Patch-For-Review: Oozie job to extract data for WDQS research - https://phabricator.wikimedia.org/T146064#2649398 (leila)
[16:30:33] Analytics, Research-and-Data, Research-collaborations, Research-management, Patch-For-Review: Oozie job to extract data for WDQS research - https://phabricator.wikimedia.org/T146064#2649398 (leila) @schana please go ahead with this task. Nuria and I talked off-list and created a task to revie...
[16:33:43] (CR) Nuria: Add Oozie job to extract data for WDQS research (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/311964 (https://phabricator.wikimedia.org/T146064) (owner: Nschaaf)
[16:33:53] leila: updated review
[16:34:12] leila: ball is in schana's court
[16:38:21] nuria_: does this capture what you want in the commit message? "Purging of the sensitive data is tracked in T146915.
[16:38:21] This job is to provide data for
[16:38:21] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries"
[16:38:22] T146915: Purging of sensitive data for WDQS research - https://phabricator.wikimedia.org/T146915
[16:39:36] schana: sure, make sure you run the job (ask for help as needed) and quantify how long it takes, etc. Keep in mind that when running it you need to publish data to /tmp, as running as your user you cannot update the main HDFS partitions
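leila's points at 16:20:00 and 16:22:44 above are the crux of the purging discussion: a keyed (salted) hash can be made unlinkable later by destroying the salt, whereas hashing at collection time would already rule out things like sampling on the last digits of the IP. Here is a minimal sketch of both ideas, assuming nothing about the actual WDQS job or its schema; SALT, hash_ip, and sample_ip are illustrative names only.

```
import hashlib
import hmac
import secrets

# SALT would live separately from the data; deleting it later makes the hashes
# unlinkable, which is the "hash with salt, delete the salt" option above.
SALT = secrets.token_bytes(32)

def hash_ip(ip):
    """Keyed hash of an IP; without SALT the small IPv4 space cannot simply be enumerated."""
    return hmac.new(SALT, ip.encode("ascii"), hashlib.sha256).hexdigest()

def sample_ip(ip, modulus=10):
    """Sampling on the last digits of the raw IP (16:22:44) -- only possible
    before hashing, which is why hashing at collection time was a concern."""
    last_octet = int(ip.rsplit(".", 1)[-1])
    return last_octet % modulus == 0

print(hash_ip("198.51.100.23"), sample_ip("198.51.100.23"))
```

Destroying SALT after the agreed retention window would be one way to implement the purge idea tracked in T146915 without rewriting the stored hashes.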
[16:40:07] schana: oozie is a bit dry at the beginning: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie
[16:40:20] nuria_: I don't have access to stat1003, and last time joal ran the job
[16:40:30] schana: you can do it on 1004/1002
[16:40:45] maybe that's what I meant
[16:41:10] yeah, I **do** have access to 1003, but not the others
[16:44:16] sorry team, I forgot to log in after the stand-up restart
[16:44:41] schana: ok, do file for access to 1002, just open a phab task and tag it with access request
[16:45:03] schana: 1002 and 1004
[16:45:17] okay nuria_
[16:45:28] (PS3) Nschaaf: Add Oozie job to extract data for WDQS research [analytics/refinery] - https://gerrit.wikimedia.org/r/311964 (https://phabricator.wikimedia.org/T146064)
[16:45:39] amended the commit for now
[16:46:51] (CR) Nuria: "Thanks for doing the changes. Please update the CR when you have been able to run the job. Please request help on IRC as needed." [analytics/refinery] - https://gerrit.wikimedia.org/r/311964 (https://phabricator.wikimedia.org/T146064) (owner: Nschaaf)
[16:59:52] nuria_: running late, hopefully here in 10 mins
[16:59:56] np
[16:59:58] joal: np
[17:09:06] nuria_:
[17:09:08] here
[17:09:14] k, omw
[18:19:52] schana: did you file the request for access? I can approve it if so
[18:26:35] Analytics-Kanban, Patch-For-Review: Make and deploy simple proof of concept dashboard for Daily Edits and Daily Pages Created on simplewiki - https://phabricator.wikimedia.org/T146775#2674985 (Nuria) Open>Resolved
[18:26:48] Analytics-Kanban, MediaWiki-extensions-WikimediaEvents, Collab-Team-Q1-July-Sep-2016, Patch-For-Review, WMF-deploy-2016-10-04_(1.28.0-wmf.21): EL alarms raw/validated 20160926 - https://phabricator.wikimedia.org/T146674#2674986 (Nuria) Open>Resolved
[18:29:47] nuria_: just filed
[18:30:02] schana: ok, do cc me and I can approve
[18:30:26] nuria_: I added you as a subscriber
[19:03:28] Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#2675100 (Quiddity) >>! In T95582#2667746, @Milimetric wrote: > @Quiddity, we're very close to allowing this kind of query in Hadoop, which you've worked with before, right? Gra...
[19:33:43] I'm out for now, y'all, meeting with a potential designer, see ya later
[20:09:03] Analytics-Kanban: Some recent ExternalLinksChange data lost - https://phabricator.wikimedia.org/T146815#2675293 (Nuria) I cannot see any data in this table from 2015-09-25 onwards, nor in the logs for validated events. There are events on prior days though: 22, 23, 24.
[20:10:10] Analytics-Tech-community-metrics, Developer-Relations: Measuring Time To First Code Change (TTFCC) - https://phabricator.wikimedia.org/T137201#2675298 (Qgil) Our team doesn't have hands to work on this task during #devrel-oct-dec-16, but a possible plan would be: * @Peter pushes this topic in the contex...
[20:10:56] Analytics-Kanban: Some recent ExternalLinksChange data lost - https://phabricator.wikimedia.org/T146815#2675301 (Nuria) I assume these are server side events, correct?
[20:18:01] Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#2675345 (matmarex) >>! In T95582#2673720, @Milimetric wrote: > The union all approach hits some limits as mysql doesn't let you do that beyond I think like 50 or 100 times or so...
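On T95582 (10:23:48 and 20:18:01 above), the approach under discussion is to run the same query against many wiki databases with UNION ALL, which reportedly breaks down somewhere around 50-100 unioned selects. Below is a sketch of how such statements might be generated in chunks; the chunk size, helper name, and database names are assumptions for illustration, not Quarry code.

```
# Assumed safe chunk size, per the "50 or 100" recollection quoted above.
MAX_UNIONS = 50

def union_all_queries(wikis, inner_sql):
    """Yield one UNION ALL statement per chunk of wikis.

    inner_sql must contain a {db} placeholder, e.g.
    "SELECT '{db}' AS wiki, COUNT(*) AS pages FROM {db}.page"
    """
    for i in range(0, len(wikis), MAX_UNIONS):
        chunk = wikis[i:i + MAX_UNIONS]
        yield "\nUNION ALL\n".join(inner_sql.format(db=db) for db in chunk)

# Illustrative database names; a real run would cover every wiki replica.
wikis = ["enwiki_p", "dewiki_p", "frwiki_p"]
for sql in union_all_queries(wikis, "SELECT '{db}' AS wiki, COUNT(*) AS pages FROM {db}.page"):
    print(sql + ";")
```

Chunking sidesteps whatever the real MySQL limit turns out to be, at the cost of issuing several statements instead of one.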
[23:30:39] Analytics-Kanban, MediaWiki-extensions-WikimediaEvents, Collab-Team-Q1-July-Sep-2016, Patch-For-Review, WMF-deploy-2016-10-04_(1.28.0-wmf.21): EL alarms raw/validated 20160926 - https://phabricator.wikimedia.org/T146674#2668022 (Etonkovidova) no "clientValidated":false" was found in vagrant l...