[08:44:00] tgr|away: thanks! Just created https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Run_queries_in_Cron and replaced the hive -f example with a beeline one
[09:31:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3614203 (Jhernandez)
[09:46:50] Analytics-Kanban, Analytics-Wikistats: Addition of Unique Devices metric - https://phabricator.wikimedia.org/T170461#3614255 (fdans)
[09:46:52] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (3/4) - Data issues - https://phabricator.wikimedia.org/T170937#3614256 (fdans)
[09:48:06] Analytics-Kanban, Analytics-Wikistats: Use daily granularity for 1-month time ranges - https://phabricator.wikimedia.org/T173372#3614258 (fdans)
[10:26:23] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3614339 (elukey)
[11:45:28] * elukey lunch!
[11:57:51] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3614494 (Mholloway)
[12:03:46] Analytics-Tech-community-metrics: Filtering on "author_name" in Git can show unrelated activity by other users - https://phabricator.wikimedia.org/T176135#3614501 (Aklapper)
[12:05:16] Analytics-Tech-community-metrics: Filtering on "author_name" in Git can show unrelated activity by other users - https://phabricator.wikimedia.org/T176135#3614501 (Aklapper)
[12:33:40] taking a break a-team
[13:38:34] morning
[13:38:39] afternoon
[13:39:49] fdans: should I review https://phabricator.wikimedia.org/T171487 or help with the two things you have in progress?
[13:40:03] Joseph, when you're back, let's sync up
[13:42:02] milimetric: if you can take a look at that one it would be great :)
[13:48:10] fdans: patch fails because...
[13:48:24] https://www.irccloud.com/pastebin/tOK1yvil/
[13:48:34] yeah I've no idea what that means
[13:49:19] well the DC_Store thing is just some mac thing you should exclude
[13:49:32] DS_Store
[13:49:48] yeah but the other errors are strange
[13:50:02] yeah, why would it not apply... doesn't it checkout a totally different branch?
[13:50:14] you can checkout the branch directly - error-handling
[13:50:15] is there such a thing as "rebasing" since this is an older change?
[13:50:28] milimetric: no, it applies the diff to the current branch
[13:50:39] which is stupid if you ask me, but...
[13:50:50] right, ok, I'll just review the code and accept / reject based on that, you can figure out how to merge it and remove the DS_whatever
[13:50:56] hi! milimetric and fdans, can I help with any of your tasks or should I grab another wikistats one? like the one with Volker's feedback?
[13:51:17] mforns: I think we should wait for Joseph and work on the backend
[13:51:27] milimetric, ok
[13:51:39] in the meantime I was thinking of merging anything outstanding on the front end
[13:52:07] mforns: do you want me to land your change?
[13:52:12] mforns: you were already reviewing https://phabricator.wikimedia.org/T171487 would you like to finish that?
[13:52:23] fdans, wait I will test safari and edge
[13:52:30] now
[13:52:33] ok
[13:52:42] ok, then I'll do the error one
[13:55:48] joal, milimetric, have you seen the thread about dropping rev_sha1 for multi content revisions?
[13:55:50] wikitech-l
[13:55:54] might be good to have your input there
[13:56:33] checkin
[14:07:12] Analytics, Analytics-EventLogging, AbuseFilter, CirrusSearch, and 30 others: Possible WMF deployed extension PHP 7 issues - https://phabricator.wikimedia.org/T173850#3614768 (Reedy) >>! In T173850#3542522, @Anomie wrote: > I'm not impressed with the false positive rate for this tool. The warnings...
[14:09:46] fdans, do you have 5 mins to batcave and help me land the wiki selector change?
[14:09:55] yesss
[14:09:57] omw
[14:10:04] also to talk about https://phabricator.wikimedia.org/D755 and see if I can have it working on my machine?
[14:10:07] omw too
[14:25:51] thanks ottomata, very important thread
[14:26:57] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 2 others: Update node-rdkafka version to 2.0 - https://phabricator.wikimedia.org/T176126#3614815 (mobrovac)
[14:33:14] fdans: added some comments to D755
[14:33:14] D755: Add visual error handling to dashboard and detail - https://phabricator.wikimedia.org/D755
[14:37:00] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 2 others: Update node-rdkafka version to 2.0 - https://phabricator.wikimedia.org/T176126#3614857 (elukey) >>! In T176126#3614757, @Ottomata wrote: > Luca, I thought the ultimate problem was that we couldn't properly set the...
[14:39:11] milimetric: thank youuuuu
[14:42:58] ottomata: kafka on jumbo is producing metrics to prometheus :)
[14:43:25] woohooo
[14:44:44] Analytics-EventLogging, Analytics-Kanban: EventLogging tests fail for python 3.4 in Jenkins - https://phabricator.wikimedia.org/T164409#3614868 (mforns) a:mforns
[14:49:01] milimetric: v nice response thank you
[14:54:00] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Options for implementing JobQueue statistics methods - https://phabricator.wikimedia.org/T175957#3614904 (GWicke)
[14:54:19] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 2 others: Options for implementing JobQueue statistics methods - https://phabricator.wikimedia.org/T175957#3609097 (GWicke) Added the "fetch metrics from graphite / prometheus" option.
[15:10:02] Analytics: Improve Wikistats UI for mobile - https://phabricator.wikimedia.org/T176143#3614935 (mforns)
[15:23:54] Analytics-Kanban: Write document on wikitech on why do we want to migrate back to gerrit from differential - https://phabricator.wikimedia.org/T176145#3614982 (Nuria)
[15:25:43] Analytics-Kanban, Analytics-Wikistats: Implement Topic Selector Widget - https://phabricator.wikimedia.org/T167676#3614995 (Nuria) a:mforns
[15:34:13] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922#3615014 (elukey) a:elukey
[15:42:17] ottomata: Could use some urgent help regarding https://phabricator.wikimedia.org/T176105
[15:42:27] We've made schema changes many times when we consumed from ZMQ without issue
[15:42:43] This is the first time we've made a schema change with the kafka consumer, and it seems it stopped consuming events since the schema change.
[15:43:14] We subscribe to eventlogging_NavigationTiming, not to a specific revision, so I'm unsure as to why the newer ones wouldn't be reported.
[15:43:50] Output from our consumer went from 150/min to <1/min. Raw Kafka stats also affected, but much less so: https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&var-schema=NavigationTiming&from=now-7d&to=now
[15:43:56] from 300/min to 250/min
[15:44:33] Analytics-Tech-community-metrics: Filtering on "author_name" in Git can show unrelated activity by other users - https://phabricator.wikimedia.org/T176135#3615047 (Aklapper) Open>Invalid [[ https://www.elastic.co/guide/en/kibana/master/lucene-query.html | Lucene query syntax. ]] One day I will rememb...
[15:45:11] (CR) Ottomata: [V: 2 C: 2] Add oozie util workflow to launch spark jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/201009 (https://phabricator.wikimedia.org/T94596) (owner: Ottomata)
[15:45:40] looking
[15:46:26] Sep 18 15:27:09 hafnium python[11198]: 2017-09-18 15:27:09,094 Error sending JoinGroupRequest_v0 to node 22 [NodeNotReadyError: 22]
[15:46:49] but then it joins correctly
[15:46:50] mmmm
[15:47:18] Krinkle:
[15:47:26] i think your events are not validating with your schema change
[15:47:59] https://meta.wikimedia.org/w/index.php?title=Schema%3ANavigationTiming&type=revision&diff=17216284&oldid=16305090
[15:48:03] This is the only change as far as I can tell
[15:48:36] do you have clients that send secureConnectionStart as not an int?
[15:48:44] i see
[15:48:44] None is not of type 'integer'
[15:48:47] in eventlogging logs
[15:48:54] looking for that so i can link in logstash...
[15:49:01] don't know kibana well...
[15:49:46] Krinkle: i think you have clients that set that field to null
[15:49:59] ottomata: Aye, I think some might be setting it to a negative number by accident
[15:50:03] I see it now
[15:50:10] I guess negative is not allowed right?
[15:50:17] And either way, we don't want those
[15:50:23] "type": "integer" should allow negative
[15:50:29] but the errors i'm seeing say they are null
[15:50:58] Krinkle: i am not sure if these errors still go to logstash...
[15:50:58] but
[15:51:21] Krinkle: not related but I opened https://phabricator.wikimedia.org/T176149
[15:51:29] kafkacat -C -b kafka1012.eqiad.wmnet:9092 -t eventlogging_EventError
[16:24:03] ottomata: Thanks, I think I found the problem.
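The validation failure above can be illustrated with a minimal sketch. It hand-rolls the JSON Schema `"type": "integer"` check in plain Python; it is not EventLogging's actual validator, and the example field values are made up. A JSON `null` (Python `None`) fails the check, while a negative number passes, which matches ottomata's point that the type alone does not forbid negatives.

```python
def is_json_schema_integer(value):
    """Simplified check for JSON Schema "type": "integer".

    bool is excluded first because Python's True/False are int subclasses,
    but JSON true/false are not numbers. Floats are rejected here for
    simplicity.
    """
    if isinstance(value, bool):
        return False
    return isinstance(value, int)


def validate_field(event, field):
    """Return None if the field passes, else an error string phrased like
    the log line quoted in the chat ("None is not of type 'integer'")."""
    if field not in event:
        return None  # absent optional field: not a type error in this sketch
    value = event[field]
    if is_json_schema_integer(value):
        return None
    return f"{value!r} is not of type 'integer'"


# Hypothetical events: a normal timing, a negative timing, and a null one.
for event in ({"secureConnectionStart": 120},
              {"secureConnectionStart": -5},
              {"secureConnectionStart": None}):
    print(validate_field(event, "secureConnectionStart"))
```

Only the third event is reported; filtering out unwanted negative values would need a separate constraint or client-side fix.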
[16:24:21] cool
[16:28:30] Analytics-Kanban: vet edit data on the data lake - https://phabricator.wikimedia.org/T153923#3615233 (Nuria)
[16:31:29] hey, I am trying a fairly simple hive query, but don't seem to have much luck getting results: https://gist.github.com/gwicke/8987a57f176fde8d64c7f209274e8d34
[16:33:21] what's the problem?
[16:33:29] I tried selecting the first couple of hits with limit as well
[16:33:41] just no results?
[16:33:51] the query succeeds, but no results
[16:33:56] same for https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive/Queries#Distinct_IPs
[16:35:16] mforns, milimetric: https://gist.github.com/jobar/ba849b1be53e4c50c9006a86f3805f97
[16:35:19] * gwicke sent a long message: gwicke_2017-09-18_16:35:16.txt
[16:35:40] hm gwicke i get results for a query like that one
[16:35:44] you sure you are using wmf.webrequest?
[16:35:46] in the wmf database
[16:35:47] ?
[16:36:12] I am on stat1004
[16:36:29] and am using beeline -f
[16:36:51] yeah gwicke, you need either fully qualified table name
[16:36:53] with database name
[16:36:54] or
[16:36:55] use wmf;
[16:36:59] in your script
[16:37:01] change table name to
[16:37:03] wmf.webrequest
[16:37:05] and it'll work
[16:37:52] ah, retrying..
[16:38:13] yay, thanks!
[16:38:15] i guess there is a 'webrequest' table in the default db?
[16:38:19] you shoulda got a no table exists
[16:38:19] looking
[16:38:22] it shouldn't be there if there is one
[16:39:20] yeah, I didn't get any errors
[16:39:28] yeah, weird there was an empty table
[16:39:30] i dropped it
[16:39:56] thanks!
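ottomata's fix (qualify the table as `wmf.webrequest`, or start the script with `use wmf;`) can be sketched as a small Python helper that prepares a script for `beeline -f`, the way gwicke was running it on stat1004. The function names and paths are hypothetical, and the helper only builds text and a command string; it does not run Beeline.

```python
import shlex


def build_hive_script(query, database="wmf"):
    """Return HiveQL text that selects the database before the query.

    Hive resolves an unqualified table name such as `webrequest` against the
    current database, which is `default` unless a `use` statement changes it.
    Prefixing `use wmf;` (or writing `wmf.webrequest` in the query) avoids
    silently reading a same-named table in `default`.
    """
    body = query.rstrip().rstrip(";")  # normalise to exactly one trailing ';'
    return f"use {database};\n{body};\n"


def beeline_shell_command(script_path):
    """Return a shell-quoted `beeline -f <script>` line, e.g. for a crontab."""
    return shlex.join(["beeline", "-f", str(script_path)])
```

The returned script text would be saved to a `.hql` file, and the command line placed in a crontab entry as described on the wikitech page linked at the start of the log.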
[17:10:45] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3615426 (JoeWalsh) I'd like to keep access without an expiration date for access to data from the mobile apps
[17:12:18] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3615446 (JMinor) I am a PM on the WMF staff and need continuing access to stats machines for product analysis.
[17:13:07] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3615455 (elukey)
[17:13:26] Hi all!!! Is there a way to get the results of a Pivot query as a CSV? Or should I query Druid directly? Thx!!! :)
[17:14:17] AndyRussG: hi! I'd use Beeline instead, do you have access to it ?
[17:15:35] elukey yes, I'll do that too, using Pivot for some quick exploration, and I did get some useful results there, so I thought maybe I could grab them in some other format...
[17:17:39] (for example, from table format view to csv...)
[17:21:10] AndyRussG: little share icon
[17:21:12] up top
[17:21:17] top right
[17:21:19] export to csv
[17:22:11] yes I found it now, really nice :)
[17:24:22] going afk people, have a nice day!
[17:24:30] * elukey off!
[17:24:40] elukey: ottomata thx!
[17:24:59] Yea very nice!
[17:37:29] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3615558 (Mhurd) I'd also like to keep access w/o expiration for apps data access.
[17:46:49] Analytics-Kanban: vet edit data on the data lake - https://phabricator.wikimedia.org/T153923#3615588 (Nuria) Three vettings: - import from mediawiki into hadoop, analytics vetted that fairly well - metric calculation on metrics calculated on top of data lake versus wikistats metrics - vetting usage of raw...
[17:57:01] Hi gwicke :) Just so that you know: your current running query is huge :)
[17:57:33] milimetric: do you have a link to a presentation where you showed that chrome bug with pivot thing?
[17:59:16] gwicke: 1 month of webrequest (any webrequest_source) is ~35Tb - Given Sept. is not finished, it's only 20Tb this time, but not bad
[18:09:54] Analytics-Kanban: Write document on wikitech on why do we want to migrate back to gerrit from differential - https://phabricator.wikimedia.org/T176145#3615704 (Nuria)
[18:11:02] nm milimetric nuria linked to task, got screenshots there
[18:11:07] ottomata: I couldn't find it, maybe joal has it, but I can link you
[18:11:23] https://phabricator.wikimedia.org/T141506#2523165
[18:52:37] AndyRussG: be aware of data gudelines when sharing data https://office.wikimedia.org/wiki/Data_access_guidelines
[19:03:22] mforns: still around?
[19:03:56] milimetric, yes!
[19:04:07] queries?
[19:04:31] do you want to split the queries and sync up in like 1 hour?
[19:04:48] or prefer to pair
[19:04:50] ?
[19:07:00] mforns: ah, sorry,
[19:07:09] batcave?
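A back-of-the-envelope check of joal's numbers, assuming webrequest volume is spread evenly across the month (a simplification, since traffic varies from day to day): on September 18 roughly 18 of the month's 30 days had been collected, so ~35 TB for a full month prorates to about 21 TB, consistent with the ~20 TB he mentions.

```python
def prorated_size_tb(full_month_tb, days_elapsed, days_in_month):
    """Linear estimate of how much of a month's data exists so far.

    Assumes uniform daily volume; real traffic is not uniform.
    """
    return full_month_tb * days_elapsed / days_in_month


print(prorated_size_tb(35, 18, 30))  # 21.0
```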
[19:07:39] yep
[19:07:42] omw
[19:54:10] milimetric, mforns: It looks like denormalization (multiplication of events) accross 5 dimesions is too much - I'll let the job finish, but it's really heavy
[19:54:22] mmmm
[19:54:41] milimetric, mforns: It worked fine with 2, without keeping heavy dims in them (titles, ids and so on)
[19:55:00] I think Druid columnar compression will help, but can't be sure it'll fit
[20:17:54] hm
[20:19:22] milimetric: problem here really comes from data multiplication because of dimension denormalization
[20:20:13] milimetric: The digest-base table is 33Gb, and one daily digest-stage, map side only, generated more than 5Tb
[20:21:21] I'm willing to check reduction, but I'm not feeling it's gonna work :(
[20:22:23] milimetric: The good thing remaining is that we have a middle ground solution, not perfect for flexibility and reusability, but working
[20:25:06] tomorrow I wanna pick your brain about what generated 5Tb I don't get that, but I have to run to the doctor's office to drop something off before they closer
[20:25:08] *close
[20:25:11] so ttyl
[20:46:04] nuria_: yes for sure, thanks :)
[20:52:08] Analytics, DBA, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3616219 (bd808)
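joal's observation that denormalizing across 5 dimensions is far heavier than across 2 can be illustrated with a cube-style expansion. This is an assumption about what the digest job does, not taken from its code: if each event is emitted once per combination of "keep this dimension's value" or "roll it up to all", then k dimensions produce 2**k rows per event, so 2 dimensions give 4x and 5 give 32x. That alone does not explain the full 33 GB to 5 TB jump (roughly 150x), which is consistent with milimetric wanting to dig into it. The dimension names and the `ALL` marker below are hypothetical.

```python
from itertools import product

ALL = "all"  # hypothetical marker for a rolled-up dimension


def expand_event(event, dimensions):
    """Yield one row per combination of (actual value | ALL) per dimension.

    Hypothetical illustration of dimension denormalization: each event is
    copied 2**len(dimensions) times, so output grows exponentially with the
    number of dimensions.
    """
    for mask in product((False, True), repeat=len(dimensions)):
        row = dict(event)
        for dim, rolled_up in zip(dimensions, mask):
            if rolled_up:
                row[dim] = ALL
        yield row


event = {"wiki": "enwiki", "user_type": "anonymous", "count": 1}  # hypothetical
print(len(list(expand_event(event, ["wiki", "user_type"]))))  # 4
```

Keeping heavy, high-cardinality columns (titles, ids) on every copied row multiplies the byte size further, which fits joal's note that the 2-dimension version worked once those were left out.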