[00:24:21] Analytics, Community-Tech: Data missing in page creation datasets - https://phabricator.wikimedia.org/T185019#3904866 (kaldari) @Milimetric: Any idea why there might be data missing here?
[01:25:45] Analytics, Community-Tech: Data missing in page creation datasets - https://phabricator.wikimedia.org/T185019#3905124 (Milimetric) So, I'm not sure, but my bet is that there was some outage that caused the data to land there after the reports were run. When that happens, you can re-run the reports: htt...
[01:26:35] Analytics, Community-Tech: Data missing in page creation datasets - https://phabricator.wikimedia.org/T185019#3905125 (Milimetric) and if you're nervous about what that did, you can check the .reruns folder: ``` milimetric@stat1006:/srv/reportupdater$ cat /srv/reportupdater/jobs/reportupdater-queries/p...
[03:21:21] (CR) Milimetric: "couple of follow-ups on stuff Fran said and a couple of things of my own. But you and Fran feel free to merge and deploy tomorrow morning" (12 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) (owner: Mforns)
[03:28:24] (CR) Milimetric: [C: -1] "couldn't build/run because you forgot some dependencies, check the jenkins error logs, they're the same ones I get" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[07:01:17] mforns: eventlogging cleaner on db1107 completed! \o/
[07:01:43] now I want to see another run today (it will happen in a few hours) and also triple check the tables
[07:01:51] buuuut looking good
[07:01:58] this time no errors reported
[07:05:20] * elukey brb
[08:01:54] so I briefly checked min(timestamp) of all tables, it is consistent with the whitelist
[08:02:04] will do other checks but everything looks good
[08:16:08] * elukey errand for a bit!
[08:44:21] rebooting stat boxes
[08:47:38] \o/
[08:53:31] stat boxes rebooted!
[08:53:47] !log disabled camus as prep step for analytics1003 reboot
[08:53:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:05:58] rebooting an1003
[09:11:41] all right, an1003 back in service
[09:11:48] let's see if everything looks right
[09:11:55] moritzm: Hadoop reboots completed :)
[09:15:38] \o/
[09:31:52] (CR) Mforns: "Thanks for the reviews guys, will take care of comments today." (10 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/402387 (https://phabricator.wikimedia.org/T179530) (owner: Mforns)
[09:46:31] !log removed upstart config for brrd on eventlog1001 (failing and spamming syslog, old leftover?)
[09:46:34] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
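A minimal sketch of the kind of min(timestamp) check described at 08:01 above, assuming the EventLogging tables live in a `log` database reachable from the stat boxes and that mysql client credentials are already configured; the host, database name and column name are assumptions based on the conversation:
```
# Hypothetical spot check: print the oldest timestamp per EventLogging table so it
# can be eyeballed against the purging whitelist. Host, database name and the
# credentials setup are assumptions.
for t in $(mysql -h db1107.eqiad.wmnet -N -e 'SHOW TABLES FROM log'); do
  printf '%s ' "$t"
  mysql -h db1107.eqiad.wmnet -N -e "SELECT MIN(timestamp) FROM log.\`$t\`"
done
```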
[09:46:57] !log rebooted analytics1003
[09:46:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:49:45] mforns o/
[09:50:31] I am currently trying to figure out what happened in
[09:50:32] https://grafana.wikimedia.org/dashboard/db/eventlogging?orgId=1&from=1516112999752&to=1516115881761
[09:53:28] !log disable druid middlemanager on druid1002 as prep step for reboot
[09:53:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:22:07] !log reboot druid1002 for kernel upgrades
[10:22:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:24:51] last host standing is druid1001
[10:37:11] !log re-run webrequest-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's reboot)
[10:37:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:38:54] !log stopped all crons on hadoop-coordinator-1
[10:38:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:39:08] this should hopefully stop the spam --^
[10:59:11] so webrequest-druid-hourly-wf-2018-1-17-8 seems consistently failing
[10:59:15] I've re-run it two times
[10:59:27] and the logs on druid1002's overlord don't make a lot of sense
[11:25:38] it keeps failing, lovely
[11:27:48] hellooooo
[11:44:14] !log restart druid middlemanager on druid1002
[11:44:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:44:41] !log re-run pageview-druid-hourly-wf-2018-1-17-9 and pageview-druid-hourly-wf-2018-1-17-8 (failed due to druid1002's middlemanager being in a weird state after reboot)
[11:44:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:44:49] mforns: --^
[11:45:53] elukey, thanks! the hour 9 just failed, should I rerun it?
[11:46:15] check the log :)
[11:46:43] just re-run it, it should be completed in a minute
[11:47:08] it is unfortunate that druid sometimes gets into this state when either zookeeper or the indexer service gets changed
[11:47:09] elukey, ok ok, thanks a lot
[11:47:19] aha
[11:53:28] it took me a bit to figure out that the middlemanager was the issue :(
[11:53:46] mforns: did you read what I posted about the el cleaner earlier on?
[11:53:57] elukey, no, wait
[11:54:06] no, no, was off
[11:54:11] what was it?
[11:54:42] it completed! No issues registered, I'll let today's run go before saying victory but we are close
[11:54:58] moreover I checked all the min(timestamps) across the tables
[11:55:06] \\\\\\o//////
[11:55:07] comparing them with the whitelist
[11:55:12] and they look sane
[11:55:34] \o/ \o/
[11:57:37] also mforns not sure what happened in https://grafana.wikimedia.org/dashboard/db/eventlogging?orgId=1&from=1516112099034&to=1516116301963
[11:57:45] yesterday burrow alarmed
[11:57:49] and I noticed the gap
[11:58:01] but it is not clear to me from the logs what happened
[11:58:57] whenever you have time (not urgent) could you review it?
[11:59:07] I'll try to do it as well after lunch
[12:17:59] going to lunch!
[12:18:02] * elukey lunch!
[12:31:52] elukey, no no, I will do it! thanks for the heads up
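A rough sketch of how failed coordinator actions like the druid-hourly ones re-run above could also be re-run from the Oozie CLI instead of Hue; the coordinator job id is a placeholder, and OOZIE_URL is assumed to already be configured on the host:
```
# Hypothetical rerun of a failed coordinator action; <coord_job_id> is a placeholder.
oozie job -info <coord_job_id>             # list actions and find the failed one
oozie job -rerun <coord_job_id> -action 8  # re-run that action (e.g. hour 8)
```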
[13:23:50] Analytics-Kanban: Make banner-activiy success file cleaner not fail when there's nothing to be cleaned - https://phabricator.wikimedia.org/T185100#3905982 (mforns)
[13:24:58] (PS1) Mforns: Make banner-actvity cleaner not fail when there's nothing to drop [analytics/refinery] - https://gerrit.wikimedia.org/r/404662 (https://phabricator.wikimedia.org/T185100)
[13:25:54] Analytics-Kanban, Patch-For-Review: Make banner-activiy success file cleaner not fail when there's nothing to be cleaned - https://phabricator.wikimedia.org/T185100#3905982 (mforns)
[13:39:12] (PS10) Fdans: Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529)
[13:41:57] (CR) jerkins-bot: [V: -1] Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[13:47:27] so I got an alert for banner impression stream not pushing data for the past 30 mins
[13:47:59] https://grafana.wikimedia.org/dashboard/db/prometheus-druid?refresh=1m&panelId=41&fullscreen&orgId=1
[13:48:28] but https://yarn.wikimedia.org/proxy/application_1515441536446_8509/ doesn't look blocked
[13:51:13] elukey, hm
[13:53:14] elukey, no idea on what to do here
[13:57:27] but indeed pivot is showing no data since 1 hour ago, actually flapping since 2 hours ago
[13:58:40] and also you can see a huge drop in activity since Jan 15th at 7pm
[13:58:56] https://tinyurl.com/yal7a2ba
[14:01:21] hm, it has been going on for a couple days now, and the daily job ran already and confirmed the data for Jan 15th and Jan 16th
[14:01:32] so I'd say it's not an issue with the streaming job
[14:01:36] rather a data issue
[14:05:20] the banner data query is pretty straightforward...
[14:07:25] yeah..
[14:07:39] mforns: what is the filter/source for kafka data?
[14:07:51] we can easily check if data is there or not
[14:08:08] elukey, no the issue should be already in webrequest table
[14:08:32] I suspect the data comes corrupt or simply does not come
[14:09:11] it makes sense though that banner impression data decreased a lot over the past weeks
[14:10:26] elukey, yes, the number of webrequests with uri_path = '/beacon/impression' was in the order of millions in Jan 12th, and since 15th is in the order of thousands.
[14:11:06] let's tail the webrequest log and see :)
[14:11:20] sure :]
[14:15:09] ( ^_^)/
[14:15:34] o/
[14:15:40] mforns: kafkacat shows some activity
[14:16:13] iirc spark is pulling directly from webrequest right?
[14:16:16] elukey, I think even with the low volume since the 15th, there is actually a problem today
[14:16:22] yeah
[14:16:26] elukey, no
[14:16:31] I think it pulls from kafka
[14:16:47] through tranquility
[14:16:47] yeah sorry I wanted to say webrequest data on kafka
[14:16:53] ah! sorry my bad
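A minimal sketch of the Hive check mentioned above (counting '/beacon/impression' webrequests for a given hour), assuming the usual wmf.webrequest partition layout; the partition values and the webrequest_source filter are illustrative assumptions:
```
# Hypothetical Hive query from a stat box; partition values are just an example.
hive -e "
  SELECT COUNT(*)
  FROM wmf.webrequest
  WHERE webrequest_source = 'text'
    AND year = 2018 AND month = 1 AND day = 17 AND hour = 13
    AND uri_path = '/beacon/impression';
"
```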
[14:17:02] yes, I think so
[14:17:29] no wait tranquillity handles the realtime indexers on druid
[14:17:32] I checked, and there is a regular number of webrequests with uri_path = '/beacon/impression' during the hours of the alerts in hive webrequest
[14:17:40] oh ok
[14:18:07] (PS11) Fdans: Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529)
[14:18:13] so the overlord console is not showing any indexer
[14:18:17] for realtime data
[14:18:24] aha
[14:19:37] ok now I am going to brutally kill the spark job
[14:20:00] don't hurt yourself :]
[14:20:31] (CR) jerkins-bot: [V: -1] Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[14:21:12] !log forced kill of banner impression data streaming job to get it restarted
[14:21:14] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:21:43] now the cron on an1003 should figure out that nothing is there in ~5m and restart the job
[14:22:46] ok
[14:27:37] job respawned, no indexer on druid
[14:29:42] (PS12) Fdans: Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529)
[14:32:19] (CR) Fdans: "@Milimetric sorry, my package.json got bamboozled in one of the many rebases and of course I hadn't erased my node_modules so everything w" (3 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[14:34:32] (CR) jerkins-bot: [V: -1] Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[14:37:11] now spark+tranquillity handle the realtime indexing jobs
[14:38:21] via the indexing service (overlord+middlemanager+peons)
[14:38:42] (using its apis and zookeeper directly)
[14:38:58] the peons acting as real time indexers get data via POST
[14:39:03] and then druid indexes that data
[14:39:30] at the moment all the regular "batch" indexing jobs, that need to contact the overlord, are working
[14:40:01] so I am wondering if rebooting druid1002 caused some issues for tranquillity
[14:40:04] maybe in zookeeper?
[14:41:30] 2018-01-17T14:02:59,126 INFO io.druid.indexing.overlord.RemoteTaskRunner: Tried to delete status path[/druid/analytics-eqiad/indexer/status/druid1003.eqiad.wmnet:8091/index_realtime_banner_activity_minutely_2018-01-17T14:00:00.000Z_1_0] that didn't exist! Must've gone away already?
[14:41:55] maybe the middlemanager is acting weirdly?
[14:42:47] !log restart druid middlemanager on druid1003 as attempt to unblock realtime streaming
[14:42:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:45:22] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 3 others: Update node-rdkafka version to v2.x - https://phabricator.wikimedia.org/T176126#3906116 (mobrovac) >>! In T176126#3904165, @Ottomata wrote: > We've also got librdkafka 0.11 backported for Jessie in our apt repo no...
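A small sketch of how the "no realtime indexer" symptom above could be checked against the overlord HTTP API rather than the console; the host and port are assumptions (8090 is the usual overlord port), while the endpoints are standard Druid overlord API paths:
```
# Hypothetical check: list running indexing tasks (realtime banner_activity peons
# should show up here) and see which overlord currently thinks it is the leader.
curl -s http://druid1001.eqiad.wmnet:8090/druid/indexer/v1/runningTasks | python -m json.tool
curl -s http://druid1001.eqiad.wmnet:8090/druid/indexer/v1/leader
```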
[14:49:14] sorry elukey got distracted
[14:49:55] at least at the end of the day, the daily job will replace the missing data with good data
[14:50:04] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 3 others: Update node-rdkafka version to v2.x - https://phabricator.wikimedia.org/T176126#3906135 (Ottomata) I'd prefer to pin the version in puppet, then restrict it everywhere for SCB. If we fix up that patch to somethin...
[14:50:27] mforns: sure but it bothers me that everything is so fragile to reboots
[14:50:32] aha
[14:52:02] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 3 others: Update node-rdkafka version to v2.x - https://phabricator.wikimedia.org/T176126#3906136 (mobrovac) I would prefer having a stable and known-to-work version everywhere rather than confining the problem to SCB
[14:55:45] INFO: line 617: Update /var/run/eventlogging_cleaner with the current end_ts 20171019110001
[14:55:51] oh yesssss
[14:56:45] Analytics-Kanban, DBA: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3906141 (elukey)
[14:58:11] Analytics-Kanban, DBA: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3363866 (elukey) The first run completed without any errors, and then another one (cleaning up only daily data) ran as well setting the following: ``` INFO: line 617: Update /var/run/eventloggi...
[14:58:29] \o/
[15:00:02] !!!!!!!!!!!!!!
[15:00:26] :D
[15:00:31] * elukey hugs mforns
[15:00:31] :'D
[15:00:36] :''''''''D
[15:00:46] * mforns hugs elukey
[15:00:50] xD
[15:01:45] elukey, gonna run grab some food and be back in a while
[15:02:20] sure!
[15:03:23] gooood next hour kicked in and we now have realtime indexers again
[15:53:19] mforns: I keep seeing Warning: Duplicate entry 'blablabla' for key etc.. in the eventlogging logs
[15:53:33] elukey, is it super frequent?
[15:53:45] not super frequent but it seems to be happening regularly
[15:53:52] looking
[15:54:34] I am checking now eventlogging_consumer-mysql-m4-master-00.log.1
[15:57:27] Analytics, TCB-Team: Where should keys used for stats in an extension be documented? - https://phabricator.wikimedia.org/T185111#3906321 (WMDE-Fisch)
[15:58:34] elukey, duplicates may happen when there are restarts
[15:58:50] but shouldn't happen theoretically otherwise
[15:59:36] I'm tailing eventlogging_consumer-mysql-m4-master-00.log for 5 minutes now and didn't see any duplicates
[16:01:09] mforns_brb: try to open it with say less
[16:01:30] there are some from this morning
[16:01:39] and afaict we haven't done anything at that time
[16:01:44] elukey, yes, maybe there was a restart this morning?
[16:01:48] oh ok
[16:02:54] maybe those align with kafka consumer group changes, but I believe we'd need to investigate
[16:03:08] not now, but maybe open a task?
[16:05:00] yes sure
[16:07:05] (CR) Ottomata: [C: 1] Make banner-actvity cleaner not fail when there's nothing to drop [analytics/refinery] - https://gerrit.wikimedia.org/r/404662 (https://phabricator.wikimedia.org/T185100) (owner: Mforns)
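A tiny sketch of how the duplicate-key warnings discussed above could be pulled out of the consumer logs to see whether they line up with restarts; the log directory on eventlog1001 is an assumption:
```
# Hypothetical: show the "Duplicate entry" warnings with their timestamps
# (log directory is an assumption, the file name is the one mentioned above).
grep "Duplicate entry" /srv/log/eventlogging/eventlogging_consumer-mysql-m4-master-00.log* | less
```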
[16:13:22] ottomata: when you add something to the blacklist, I was thinking we should re-check those schemas and data every once in a while to see if they've fixed the problems. Also, probably we should ping whoever's generating the data?
[16:16:15] milimetric: those job ones are special
[16:16:20] in that they aren't eventlogging schemas
[16:16:23] they are from mediawiki job queue
[16:16:25] so who knows
[16:16:35] they don't really have a schema
[16:17:00] for eventlogging analytics stuff, definitely.
[16:17:04] ok. Do the job queue / future job queue people know about this and the blacklist?
[16:17:28] like Giuseppe
[16:17:46] also, I think I'm gonna try calling him Juice from now on, see how that goes :)
[16:17:49] haha
[16:17:56] yes, but they've never looked at the refined data
[16:18:00] i keep meaning to give them a tour
[16:18:04] maybe this friday during ops hangouts...
[16:18:05] :)
[16:18:40] cool
[16:20:45] Analytics, TCB-Team: Where should keys used for stats in an extension be documented? - https://phabricator.wikimedia.org/T185111#3906399 (WMDE-Fisch)
[16:24:01] !log re-run all the pageview-druid-hourly failed jobs via Hue
[16:24:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:29:57] Analytics, TCB-Team: Where should keys used for stats in an extension be documented? - https://phabricator.wikimedia.org/T185111#3906421 (WMDE-Fisch)
[16:30:08] joal: ops sync wanna?
[16:44:20] heh, fdans is it a known problem that the countries are all wrong?
[16:44:37] like, it thinks Antarctica is Algeria and Australia is Bahamas
[16:45:43] what the hell
[16:46:07] milimetric: this will all be correct when we switch to ISO codes
[16:46:12] right, I figured
[16:46:16] but it's funny
[16:46:22] also, Argentina and a couple of others aren't right
[16:46:27] but oddly enough everything else seems ok
[16:47:55] milimetric: it does make me wonder if it's the shoddy provisional table I made (isoLookup.js) or if it's the map geometries
[16:48:12] if it's the former there's no problem
[16:49:08] fdans: there are some more failed tests too: https://integration.wikimedia.org/ci/job/analytics-wikistats2-npm-browser-node-6-docker/96/console
[16:49:19] oh, and it doesn't recognize Brazil!
[16:50:35] milimetric: yeah I'm investigating those, they aren't failing for me locally
[16:50:45] oh wait they are the merge tests
[16:50:49] ok that makes sense
[16:54:38] there are still jobs failing for druid, this time page view indexing
[16:54:42] checking it now
[16:59:32] :/
[17:01:37] ping fdans
[17:11:01] Analytics-Kanban, Patch-For-Review: Make banner-activity success file cleaner not fail when there's nothing to be cleaned - https://phabricator.wikimedia.org/T185100#3906566 (Reedy)
[17:17:34] hey folks. Did stat1005 get rebooted yesterday?
[17:17:45] I seem to have lost some work (or lost my mind)
[17:18:26] halfak: hey! This morning EU time, I've sent multiple emails to announce it
[17:18:45] (engineering@, analytics@ and personal emails to people holding a screen/tmux session)
[17:19:23] elukey, ahh I lost my email
[17:19:42] :(
[17:20:18] it was for the meltdown upgrades, needed to deploy the new kernel
[17:20:27] No worries. You did well. It's good to know when the reboot happened. That'll help me recover :)
[17:20:43] We should have a keyword to put in emails so I can set up a trigger to highlight the emails.
[17:20:51] Maybe [Reboot] or something like that.
[17:20:59] Or [Downtime]
[17:21:11] yeah I was about to ask what would be best to pop up in people's email lists
[17:21:42] I don't have a strong opinion. Any keyword I can flag is great ^_^
[17:21:56] Either way, thanks for the info and thanks for making sure to send that email
[17:22:28] It's my own fault I missed it and it doesn't matter that much. It's nice to know I'm not losing my mind -- or my screen sessions :D
[17:29:42] !log restarted all druid overlords on druid100[123] (weird race condition messages about who was the leader for some task)
[17:29:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:30:17] halfak: it happens, if I can make the whole alert email spam more effective I am all for it
[17:30:45] so let's try to do this - next time, in the personal emails that I'll send for tmux/screen sessions, I'll put [Reboot]
[17:31:00] Awesome. I'll set up the filter with that keyword in brackets
[17:33:17] !log killed the banner impression spark job (application_1515441536446_27293) again to force it to respawn (real time indexers not present)
[17:33:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:52:23] nuria_: aqs per country vs wikistats looks mostly legit https://usercontent.irccloud-cdn.com/file/ZF2053kz/Screen%20Shot%202018-01-17%20at%2018.49.32.png
[18:04:54] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3906791 (Ottomata) FYI: @Samwalton9 and @Samtar, your access expired on 2018-01-01 and your accounts have been removed. Thanks! :)...
[18:05:04] streaming data back
[18:05:43] ottomata: I had to restart all the druid overlords, they were "confused" about who was the leader
[18:06:07] will see tomorrow what explodes with druid1001
[18:06:29] * elukey off!
[18:08:01] huh that is weiiird
[18:08:03] and not good
[18:08:08] cmon druid you've been so wonderful!
[18:23:41] hey joal. did you see Gerard's question on wiki-research-l re clickstream?
[18:24:03] joal: I think the question boils down to, did what Ellery said at https://meta.wikimedia.org/wiki/Research_talk:Wikipedia_clickstream#Not_found get done for this recurrent release?
[18:48:52] Analytics: Estimate how long a new Dashiki Layout for Qualtrics Survey data would take - https://phabricator.wikimedia.org/T184627#3907003 (egalvezwmf)
[18:50:50] Analytics: Estimate how long a new Dashiki Layout for Qualtrics Survey data would take - https://phabricator.wikimedia.org/T184627#3890442 (egalvezwmf) @mcruzWMF Can you upload the screenshots? I was going to upload to commons but I wasn't sure if there are attributions. Thanks!
[19:00:23] (CR) Milimetric: Map component and Pageviews by Country metric (12 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[19:07:16] Analytics, Community-Tech: Data missing in page creation datasets - https://phabricator.wikimedia.org/T185019#3907074 (Nettrom) Open>Resolved a:Nettrom I checked the dashboard for enwiki and spot-checked a dataset, and the data appears to be in working order. Thanks for helping take care of t...
[19:09:49] thanks milimetric :)
[19:56:19] ottomata: where can i run a spark shell? anywhere? 1005 works after giving some errors
[19:56:43] nuria_: stat1005 or stat1004
[19:56:43] either
[19:56:47] the errors you see are probably normal
[19:56:52] about trying to pick a port
[19:56:58] it tries a few til it finds an open one
[19:57:25] ah i see, ok
[19:57:35] I have not used this in ages -> me hacker now
[20:29:00] (PS13) Fdans: Map component and Pageviews by Country metric [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529)
[20:29:42] (CR) Fdans: Map component and Pageviews by Country metric (10 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/392661 (https://phabricator.wikimedia.org/T181529) (owner: Fdans)
[20:30:11] nuria_: can we make a joint goal with services to upgrade main kafka clusters next quarter, if all goes well and gets finished with jumbo this quarter?
[20:31:54] ottomata: sounds great
[20:32:52] yeehaw
[20:34:28] ottomata: added line 53 https://etherpad.wikimedia.org/p/analytics-goals
[20:36:14] haha nuria_ FYI, you've said this several times, the name of the big project is not event streams
[20:36:25] that's a different thing
[20:36:26] http://wikitech.wikimedia.org/wiki/EventStreams
[20:36:34] ottomata: yes, argh, so true
[20:36:58] Stream Data Platform is working name
[20:37:00] ya?
[20:37:06] we are talking about renaming
[20:37:07] it
[20:37:09] but that is it for now
[20:37:09] k
[20:37:48] yes
[20:37:51] yes
[20:38:00] gr8
[20:40:36] ottomata: for the spark shell to have access to algebird and refinery i have to pass it like --jar /srv/deployment/analytics/refinery/artifacts/refinery-core.jar
[20:41:43] ottomata: but where do i find third party deps (algebird) in the box, do i need to rsync them there?
[20:42:50] Analytics-Cluster, Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3907314 (Ottomata) p:Triage>Normal
[20:44:01] algebird?
[20:44:05] nuria_: what is depending on that?
[20:44:07] something you want to do?
[20:44:10] if so, then ya
[20:44:13] you just rsync or download them there
[20:44:33] if you want them productionized/deployed with refinery, we'll have to get them into archiva and then added to refinery/artifacts somewhere
[20:44:36] or find a .deb package
[20:45:18] or, if you are writing refinery-source code that will be deployed that depends on algebird, then you'll need to add the dependency in a pom.xml file
[20:47:03] Analytics-Cluster, Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3907347 (Ottomata)
[20:55:09] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#3907367 (Ottomata) Yeehaw, FYI, all Kafka clients have been ported from analytics to jumbo in deployment-prep in Cloud VPS. EventLogging was a breeze there. The...
[20:55:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#3907368 (Ottomata) If all is still well tomorrow, I will delete the analytics instances in deployment-prep.
[20:56:43] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3907372 (Ottomata)
[20:56:53] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#3840663 (Ottomata) a:Ottomata
[21:00:03] Analytics, Discovery: Send Mediawiki Kafka logs to Kafka jumbo cluster with TLS encryption - https://phabricator.wikimedia.org/T126494#3907389 (Ottomata)
[21:08:41] Analytics-Cluster, Analytics-Kanban: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#3907424 (Ottomata) @Jgreen FYI, we'll need to coordinate this soon :)
[22:44:00] Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3907835 (zhuyifei1999) Query 4835 works in around 6 seconds for me. What's wrong?
[22:52:22] Analytics: Create dashboard for interlanguage navigation stats - https://phabricator.wikimedia.org/T185156#3907880 (Milimetric)
[23:28:26] Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3908049 (IKhitron) See five top queries on my profile. They usually run 1-5 second. Now half of them run dozens of seconds.
[23:45:41] ottomata: i was getting some compilation errors but i guess we need to use spark2-shell? that seems to work best with the spark we have
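A small sketch of what the spark2-shell setup discussed above might look like on a stat box, assuming the algebird jar is fetched manually first; the proxy host and the algebird version are assumptions, while the refinery jar path is the one mentioned at 20:40:
```
# Hypothetical: fetch an algebird jar from Maven Central (proxy and version are
# assumptions), then start spark2-shell with both jars on the classpath.
# Note the Spark flag is --jars (plural), taking a comma-separated list.
https_proxy=http://webproxy.eqiad.wmnet:8080 \
  wget https://repo1.maven.org/maven2/com/twitter/algebird-core_2.11/0.13.0/algebird-core_2.11-0.13.0.jar

spark2-shell \
  --master yarn \
  --jars /srv/deployment/analytics/refinery/artifacts/refinery-core.jar,algebird-core_2.11-0.13.0.jar
```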