[07:18:08] (PS1) Mforns: Make use of the new explode by file feature [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/307243 (https://phabricator.wikimedia.org/T132481) [07:20:21] (CR) Mforns: [C: -1] "Wait after related change has been deployed." [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/307243 (https://phabricator.wikimedia.org/T132481) (owner: Mforns) [07:39:08] (PS1) Mforns: Make use of the new explode by file feature [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/307244 (https://phabricator.wikimedia.org/T132481) [07:39:47] (CR) Mforns: [C: -1] "Wait until the related patch is deployed." [analytics/limn-ee-data] - https://gerrit.wikimedia.org/r/307244 (https://phabricator.wikimedia.org/T132481) (owner: Mforns) [08:08:32] goood morning [08:08:47] I should have found the varnishkafka issue [08:54:33] Hi a-team ! [08:56:49] * elukey waves to joal o/ o/ o/ o/ [08:57:17] \o o/ \o/ elukey! [08:59:08] elukey: reading emails, let me know when you have time to catch me up on things [09:00:00] sure! 
I am merging a change for Varnishkafka [09:00:16] it should allow more incomplete transactions to be kept in memory [09:00:22] from 1000 to 5000 [09:00:33] since we are seeing timeouts for upload :/ [09:01:12] Arf :( [09:01:32] Ok, so vk merged and applied for upload cache [09:01:49] * joal will need some time to be back on track [09:16:46] all right the change has been merged [09:32:06] Analytics, Pageviews-API, Tool-Labs-tools-Pageviews: siteviews data for 2016 August 27 appears to be empty - https://phabricator.wikimedia.org/T144159#2590263 (Amire80) [09:48:27] joal: I know that you are super busy but if you have a minute I'd need to ask you something about https://phabricator.wikimedia.org/T144158 [09:48:52] From the sal Andrew restarted some brokers [09:49:10] but it seems like preferred replica election wasn't run in the end [09:49:37] elukey: reading [09:49:39] he was testing EL kafka client rebalancing so I am not sure if this is meant to be like this or not [09:49:50] (I mean, for ongoing tests) [09:50:58] ah snap, the dates do not coincide [09:50:59] mmmm [09:51:08] it is the 18th not the 8th [09:51:13] anyhow, it seems weird [09:53:13] elukey: I confirm that certain current leaders are not preferred ones [09:53:23] elukey: however I can't say if it's on purpose :) [09:56:12] joal: I checked the *18th* in the sal and there was some network maintenance on main routers [09:56:20] so I am going to run a preferred replica election [09:56:28] maybe we'd need some alarms [09:58:25] elukey: not sure how to alarm on this: when leaders are not preferred ones? [09:59:01] elukey: but I agree, it'd be good to know that the cluster is not in regular shape [09:59:08] maybe we could figure out a way to alarm on differences in a metric like established connections or something similar [09:59:18] from the avg or the majority of metrics [09:59:24] not sure, just thinking out loud :) [09:59:44] elukey: why not [10:00:55] joal: anyhow, how did the vacations go?
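The preferred replica election elukey mentions above is done with the stock Kafka tooling shipped with the brokers. A minimal sketch, assuming a placeholder ZooKeeper connect string (not the production one); the command is built into a variable and echoed rather than executed:

```shell
# Sketch: trigger a preferred replica election so partition leadership
# moves back to the preferred brokers after network maintenance.
# ZK below is a placeholder connect string, not the real analytics one.
ZK="zk-placeholder:2181/kafka/cluster"
CMD="kafka-preferred-replica-election.sh --zookeeper $ZK"
# Without a --path-to-json-file argument the tool runs the election
# for every topic-partition in the cluster.
echo "$CMD"
```

Run with no partition list, as here, it re-elects preferred leaders cluster-wide, which matches running it once after maintenance.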
[10:01:10] I realized that I haven't asked :) [10:01:11] elukey: holidays were great :) [10:01:31] elukey: I have met almost all my family (parents and brothers) [10:01:35] and moved a bit [10:01:40] that was great [10:02:36] How about you? You had a week last week, didn't you? [10:04:43] yes! Lisbon and Porto, lovely :) [10:04:51] :D [10:04:55] Very hot I guess [10:05:32] Not that much, but I met mobrovac over there and he said that the weather was much worse (hot) the week before [10:05:39] so I might have been very lucky :) [10:06:15] it is important to be lucky, it helps in many ways :) [10:08:06] ah yes then when I got back Cassandra was behaving like crazy [10:08:17] so all the luck was counter-balanced [10:08:22] and now vk hates me again [10:08:36] karma :D [10:08:42] Arf ... Being lucky with Cassandra is something we have not experienced yet [10:08:50] ahahhahaah [10:09:39] So what's up with cassandra? [10:17:20] I didn't have time to investigate sadly, but from the 17th the dropped READ messages restarted https://grafana.wikimedia.org/dashboard/db/aqs-elukey [10:17:44] unrelated: I just noticed a big jump in pageviews top project [10:18:13] anyhow, I think it might be due to a new traffic pattern that causes more sstable reads [10:18:26] maybe somebody started to request big queries [10:18:33] elukey: maybe, what I have observed is that a cassandra restart seems to help [10:18:42] elukey: maybe a RAM flush helps a bit? [10:18:50] elukey: I don't know :( [10:19:18] elukey: About top projects, more people viewing it seems good :) [10:19:29] ah yes with the new cluster I'd be happy [10:19:30] :P [10:19:48] huhu, we first need to make sure it actually changes something ;) [10:19:48] joal: what I wanted to do was to check traffic pattern changes from the 17th [10:19:52] maybe big queries [10:20:06] elukey: feasible [10:33:10] all right going afk for lunch, ttl!
[10:33:12] * elukey lunch [10:57:37] * elukey back [11:07:17] (PS2) Mforns: Support passing the exploded values by file path [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306966 (https://phabricator.wikimedia.org/T132481) [11:08:00] (PS2) Mforns: Disable the deprecated option by_wiki [analytics/reportupdater] - https://gerrit.wikimedia.org/r/306968 (https://phabricator.wikimedia.org/T132481) [11:32:07] (PS1) Mforns: Make use of the new explode by file feature [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/307273 (https://phabricator.wikimedia.org/T132481) [11:32:42] (CR) Mforns: [C: -1] "Wait for the related patch to be deployed." [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/307273 (https://phabricator.wikimedia.org/T132481) (owner: Mforns) [11:43:34] (PS1) Mforns: Make use of the new explode by file feature [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/307274 (https://phabricator.wikimedia.org/T132481) [11:44:03] (CR) Mforns: [C: -1] "Wait for the related change to be deployed." [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/307274 (https://phabricator.wikimedia.org/T132481) (owner: Mforns) [11:47:24] Analytics-Kanban, Patch-For-Review: Make reportupdater support passing the values of an explode_by using a file path - https://phabricator.wikimedia.org/T132481#2590554 (mforns) Ok, this can be reviewed. - The first patch adds the new feature to reportupdater, with backwards compat. - The patches 2, 3,... [12:17:27] elukey: here? 
[12:25:12] yes :) [12:25:37] I'm wondering about the coordinator alert thing (the one from 2 days ago in text) [12:25:41] elukey: --^ [12:26:59] yes I sent an email about it, I am not sure what to do :/ [12:27:03] * elukey ignorant [12:27:15] elukey: That's not what I meant :) [12:27:26] ahhaha yes yes I know [12:27:30] I was joking [12:27:35] elukey: I just want to confirm: The red coords for misc, I don't see them [12:28:17] https://hue.wikimedia.org/oozie/list_oozie_bundle/0006700-160512101937480-oozie-oozi-B - I went in here [12:28:23] oh elukey, I think I have understood [12:28:36] yeah I am sure it is a red herring [12:28:38] but I was worried [12:28:43] better triple check [12:28:44] :D [12:28:55] elukey: because of the % change for data acceptance while testing VK, I stopped the coordinator from the bundle and manually started another one [12:29:17] ahhhhhh [12:29:21] elukey: So there is a running coord with red flags :) [12:29:32] elukey: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0045463-160519124420827-oozie-oozi-C/ [12:29:52] ok, one thing solved :) [12:30:07] now, the important thing [12:30:31] elukey: Have you stopped and waited for camus to have stopped when you restarted the hadoop things on the 27th? [12:31:09] elukey: From the chan, looks like yes [12:32:59] I did but some of them might have run since for some reason I failed to disable all the crons [12:33:07] hm [12:33:07] I rechecked ~30 mins later and corrected [12:33:23] elukey: it seems to be the thing I assume [12:33:24] yeah I was sure that they were all disabled with puppet [12:33:43] yes probably [12:33:49] elukey: I'll double check the checker logs [12:34:40] what should we do in these cases? Re-run camus manually? [12:36:07] elukey: depends on the error [12:36:19] elukey: if error there isw 1 [12:39:40] ?
[12:47:41] elukey: very weird errors in camus webrequest logs [12:49:45] I checked them but didn't really get their meaning [13:19:40] elukey: I found what happened, corrected the thing, and will restart the load job [13:20:59] can you enlighten me? :) [13:22:10] elukey: I'll do my best :) [13:22:48] good moorrrninnngg [13:23:21] elukey: Because of some camus jobs cut in the middle, indexes in history files have been in an incoherent state for 2 runs [13:23:27] ottomata: morning' ! [13:23:56] uh oh [13:24:04] how goes? just checking emails now [13:24:17] ottomata: good ! [13:24:47] elukey: the camus partition checker doesn't deal well enough with errors (I will correct that RIGHT NOW) [13:25:24] elukey: Currently when an error occurs on a topic, the checker doesn't process the topics that still were to be worked [13:25:30] it fails [13:25:42] hm [13:25:58] elukey: So, there were problems on camus runs at time 00 and 10, and success after [13:26:26] ottomata: o/ [13:26:29] elukey: unfortunately, text partition was to be flagged at time 10 [13:26:29] HIIIIIIII [13:26:49] elukey: does it make sense what I'm writing? [13:27:27] yes it does.. how did you fix it though? [13:28:08] elukey: hacky way: Copied the camus pro file in my folder, and modified it to only run on the text topic [13:28:19] ottomata: the camus issue is almost due to my hadoop daemons restarts [13:28:30] *is almost for sure due to [13:28:31] hm, ok [13:28:40] i wish camus was a little more resilient :/ [13:28:47] we shouldn't have to always make sure it is stopped [13:28:53] elukey: I'll patch the checker for this not to happen anymore - meaning, if recovery is possible (it was the case in that example), it should go for it [13:29:01] my bad, I missed disabling some crons by accident and realized only afterwards..
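joal's "hacky way" above (copying the Camus file and restricting it to the text topic) could look roughly like the following. The file path and topic names are illustrative stand-ins; `kafka.whitelist.topics` is a stock Camus property, but the real production file and topic list are assumptions here:

```shell
# Stand-in for the production Camus properties file (illustrative content)
cat > /tmp/camus.text-only.properties <<'EOF'
kafka.whitelist.topics=webrequest_text,webrequest_upload,webrequest_misc
EOF
# Restrict the re-run to the text topic only, as in joal's workaround,
# then run the Camus job against the edited copy instead of the prod file
sed -i 's/^kafka.whitelist.topics=.*/kafka.whitelist.topics=webrequest_text/' \
    /tmp/camus.text-only.properties
grep '^kafka.whitelist.topics=' /tmp/camus.text-only.properties
```

The point of the copy is that the production properties file stays untouched while the broken hour is replayed for a single topic.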
of course camus punished me [13:29:33] ottomata: unfortunately we have the same kind of issues for other jobs: some oozie jobs fail when restarting without stopping the daemons [13:29:53] thanks joal :) [13:30:01] no problem elukey :) [13:30:15] elukey: sorry for having made not resilient enough code ;) [13:31:04] ottomata, elukey: do you need me more now? If not, I take a break until standup ! [13:31:41] not i! [13:31:51] joal: no, thanks to you for the patience :-) [13:32:47] be back for standup then :) Later a-team ! [13:32:52] laters [13:33:06] ottomata: where did you go these past days? vacation? [13:33:46] ja!, thursday and friday I was up in vermont at a wedding of two good friends of mine [13:33:58] they rented a house, and we all went up and camped out for a few days there [13:34:18] nice! [13:38:10] Analytics-Kanban, EventBus, Wikimedia-Stream: Public Event Streams - https://phabricator.wikimedia.org/T130651#2590955 (Ottomata) Interesting! @Pchelolo can you link to code? Would like to be inspired :) [14:13:47] * elukey afk for a bit [14:23:35] (PS1) Milimetric: Fix js error in cohort member removal [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/307291 (https://phabricator.wikimedia.org/T113454) [14:23:52] (CR) Milimetric: [C: 2 V: 2] Fix js error in cohort member removal [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/307291 (https://phabricator.wikimedia.org/T113454) (owner: Milimetric) [14:26:07] hour hostname invalid_dt [14:26:07] 8 cp4006.ulsfo.wmnet 6827 [14:26:07] 8 cp4005.ulsfo.wmnet 10777 [14:26:07] 9 cp4005.ulsfo.wmnet 2 [14:26:07] 10 cp4006.ulsfo.wmnet 1 [14:26:09] 11 cp4006.ulsfo.wmnet 5 [14:26:12] 11 cp4005.ulsfo.wmnet 2 [14:26:14] 11 cp4007.ulsfo.wmnet 14 [14:26:17] 13 cp4014.ulsfo.wmnet 4 [14:26:24] and I have applied the vk patch at ~9 [14:26:26] \o/ [14:26:59] O.o!!! [14:27:03] o/ [14:27:19] (invalid_dt == LENGTH(dt) != 19) [14:34:41] back [14:34:47] elukey: YAY !!!
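The pasted per-hour counts above come from an ad-hoc check of which only the `LENGTH(dt) != 19` condition is stated. A hedged reconstruction follows; the table name, source filter, and partition values are assumptions, and the command is echoed rather than executed:

```shell
# Hypothetical reconstruction of the query behind the pasted results:
# count webrequest rows whose dt field is not a full 19-character
# timestamp (e.g. "2016-08-29T14:26:07"), per hour and caching host.
# wmf.webrequest and the partition values are assumed, not confirmed.
QUERY="
SELECT hour, hostname, COUNT(*) AS invalid_dt
FROM wmf.webrequest
WHERE webrequest_source = 'upload'
  AND year = 2016 AND month = 8 AND day = 29
  AND LENGTH(dt) != 19
GROUP BY hour, hostname
ORDER BY hour;"
echo "hive -e \"$QUERY\""
```

With the vk patch applied at ~9, a drop in `invalid_dt` for later hours is exactly what the pasted output shows.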
[14:35:49] joal: so this time we don't hit the timeout but too many incomplete records (> 1000) in memory [14:35:59] now the limit is 5000 [14:36:04] and it seems to be working [14:36:10] we might want to bump it up to 10k [14:37:01] elukey: thinking out loud: growing this number means we use more ram for this - Could this mess with the caching space or something in that area? [14:41:57] halfak: o/ [14:42:05] halfak: I thought you might like it: https://twitter.com/ThePracticalDev/status/770234031078273024 [14:44:15] joal: this is a good thought, but atm I haven't observed any huge growth in memory utilization.. we'll need to find a good balance :) [14:44:42] cool elukey :) [14:53:04] ottomata: would it make sense to add a graphite monitor for https://grafana-admin.wikimedia.org/dashboard/db/kafka?panelId=6&fullscreen&edit ? [14:53:27] something like "if any metric is 0 for more than 24 hours" [14:53:35] like per host? [14:53:45] yes [14:53:47] we have an alert about total message rate [14:53:52] maybe ja [14:54:05] ah really?
[14:54:25] yeah, but not per host [14:54:27] just the sum of those [14:55:55] ahhh okok [14:56:01] it would be a nice follow up for https://phabricator.wikimedia.org/T144158 [14:57:16] afaiu network maintenance caused brokers to lose the leader position [14:57:25] and they didn't get it back for 11 days [14:57:49] nothing huge of course but it might lead to issues [14:58:17] very difficult to spot if you don't happen to look at the dashboard by chance [15:01:09] yeah sounds like a good idea [15:06:49] Analytics-Dashiki, Analytics-Kanban: Bookmarkable date filters for browser stats dashboard - https://phabricator.wikimedia.org/T143689#2591184 (Nuria) a:Nuria [15:07:03] ottomata: https://phabricator.wikimedia.org/T143536 - this might interest you (mw-vagrant should move to jessie sooner or later) [15:07:18] Analytics-Dashiki, Analytics-Kanban: Bookmarkable date filters for browser stats dashboard - https://phabricator.wikimedia.org/T143689#2575865 (Nuria) https://gerrit.wikimedia.org/r/#/c/306980/ [15:19:57] elukey: oo that would be nice [15:38:15] Analytics, Pageviews-API, Tool-Labs-tools-Pageviews: siteviews data for 2016 August 27 appears to be empty - https://phabricator.wikimedia.org/T144159#2590263 (Milimetric) A job failed, is being rerun now, so things should be in order soon. We'll track and close this when it's resolved. Thanks for...
[15:38:58] Analytics, Analytics-Kanban, Pageviews-API, Tool-Labs-tools-Pageviews: siteviews data for 2016 August 27 appears to be empty - https://phabricator.wikimedia.org/T144159#2591279 (Milimetric) p:Triage>Normal [15:40:11] Analytics: Global Unique Devices Counts - https://phabricator.wikimedia.org/T143927#2591284 (Milimetric) p:Triage>Normal [15:40:41] Analytics-Kanban, Tool-Labs-tools-Pageviews: siteviews data for 2016 August 27 appears to be empty - https://phabricator.wikimedia.org/T144159#2591287 (Milimetric) [15:41:13] Analytics: Show pageviews prior to 2015 in dashiki - https://phabricator.wikimedia.org/T143906#2582806 (Milimetric) p:Triage>Normal [15:41:44] Analytics: ResearchSpike: Pivot UI: react, plywood - https://phabricator.wikimedia.org/T143828#2591292 (Milimetric) p:Triage>Normal [15:44:00] Analytics: ResearchSpike: Pivot UI: react, plywood - https://phabricator.wikimedia.org/T143828#2591297 (Milimetric) Open>Resolved a:Milimetric Pivot already can query example files, checked in with pivot source right now. They're adding other modules to query mysql or postgresql, so we can wait... 
[15:44:23] Analytics: Revamp Eventlogging so anyone can use it - https://phabricator.wikimedia.org/T143794#2591303 (Milimetric) p:Triage>Normal [15:45:05] Analytics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2591312 (Milimetric) p:Triage>Normal [15:45:41] Analytics-Kanban: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2576713 (Milimetric) [15:46:27] Analytics: delete useless wikimetrics.report or cohort records - https://phabricator.wikimedia.org/T120713#2591317 (Milimetric) [15:49:11] Analytics: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#2591339 (Milimetric) Open>declined We're going stop vital signs metric creation on labs and replace it with our hadoop pipeline. Lab... [15:49:46] Analytics, Analytics-Wikimetrics, Easy: Upgrade wikimetrics code to check labs lag table - https://phabricator.wikimedia.org/T119514#2591342 (Milimetric) Open>declined turning off vital signs generation in wikimetrics, so closing this. 
[15:50:23] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#2591347 (Milimetric) Open>declined deprecating vital signs metric generation in wikimetrics [15:52:39] Analytics, Fundraising-Analysis: FR tech hadoop onboarding {flea} - https://phabricator.wikimedia.org/T118613#2591373 (Milimetric) Open>Resolved a:Milimetric If any more support is needed, please open another task [15:54:52] Analytics, Analytics-Dashiki: Fix annotation date parsing in Firefox {crow} - https://phabricator.wikimedia.org/T112273#2591382 (Milimetric) Open>Resolved a:Milimetric looks good now [15:55:47] Analytics, Analytics-Dashiki, Wikimedia-Site-requests: Need a Dashiki namespace so we can protect configs {crow} - https://phabricator.wikimedia.org/T112268#1630543 (Milimetric) [15:58:49] Analytics: Workshop to teach analysts, etc about Quarry, Hive, Wikimetrics and EL {flea} - https://phabricator.wikimedia.org/T105544#2591427 (Milimetric) Open>Resolved a:Milimetric We did some of this with the Hive class, we'll continue to think of relevant tech talks [16:00:29] Analytics, Analytics-EventLogging: eventlogging user agent data should be parsed so spiders can be easily identified {flea} - https://phabricator.wikimedia.org/T121550#2591436 (Milimetric) [16:01:15] Analytics, Analytics-EventLogging, WMF-Legal, Privacy: Allow opting out from logging some of the default EventLogging fields on a schema-by-schema basis - https://phabricator.wikimedia.org/T108757#1529694 (Milimetric) We'll preprocess the user-agent as part of T121550, so that should help with th... [16:07:19] Analytics-Kanban: Continue New AQS Loading - https://phabricator.wikimedia.org/T140866#2591458 (JAllemandou) a:JAllemandou>Nuria [16:14:57] elukey: Do we sync up on logistics for Big data conf tomorrow morning? 
[16:15:21] Analytics: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#2591538 (Jdforrester-WMF) So… This is not Declined, just being done in another way? I'm confused. [16:16:49] (CR) Ottomata: "- static_data: oh, ja didn't remember that it already existed. Let's leave it then." [analytics/refinery] - https://gerrit.wikimedia.org/r/306292 (https://phabricator.wikimedia.org/T141476) (owner: Milimetric) [16:18:34] joal: +1, sure [16:20:25] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Set up dedicated Druid Zookeeper - https://phabricator.wikimedia.org/T138263#2591641 (Ottomata) @JAllemandou Let's figure this out soon, this is ready to go otherwise. [16:40:19] Analytics, Analytics-Dashiki, Wikimedia-Site-requests: Need a Dashiki namespace so we can protect configs {crow} - https://phabricator.wikimedia.org/T112268#1630543 (Dereckson) You don't need an extension to add a edit warning on the top of any page of a namespace. Check https://en.wikipedia.org/wik... [16:44:16] ottomata: trying to gather info on druid data stored in zookeeper [16:49:07] ottomata: hdfs dfs -ls [16:49:10] oops [16:52:14] Analytics-EventLogging: Add sampling support in EventLogging - https://phabricator.wikimedia.org/T67500#709928 (JKatzWMF) Hi @nuria, I just want to bump this thread as the issue has come up enough times in the last few months that I asked the team for solutions and they pointed me to this. The web team has... [16:56:33] What is the prevailing analytics dashboard? Is it still Vital Signs? [17:18:27] going offline team! Talk with you tomorrow! [17:36:10] hare: we don't have a single dashboard that shows everything [17:36:39] but we're working on a new version of wikistats, and that will eventually present all the data we have [17:36:43] Hooray [17:36:47] mforns: hey, back [17:36:53] hi milimetric [17:36:55] :] [17:37:13] to the cave? 
[17:38:09] mforns: ^ [17:38:15] yes! milimetric [17:40:28] ottomata: It looks like it should be safe to fully change zookeeper for druid [17:40:46] ottomata: I'm logging off, we can talk and take actions tomorrow if you agree [17:40:52] See you tomorrow a-team ! [17:54:20] Analytics-EventLogging: Add sampling support in EventLogging - https://phabricator.wikimedia.org/T67500#2592095 (Nuria) @JKatzWMF : The Eventlogging client has added sampling abilities client side. Thus far the only way to keep track on sampling is to send it with the schema though (suboptimal, I know, but... [17:56:35] joal: ok awesome! [17:59:02] Analytics-Kanban, Analytics-Wikimetrics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2592126 (Nuria) [18:00:47] (PS1) MaxSem: Prepare production branch [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307344 [18:01:04] (CR) MaxSem: [C: 2 V: 2] Prepare production branch [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307344 (owner: MaxSem) [18:42:48] mforns, milimetric : let me know if you are done and you want to talk about different ways to stop vital signs creation on wikimetrics [18:43:07] nuria_, we are on it right now [18:43:15] ok nuria_, we'll ping you [18:47:44] ok [19:08:23] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#2592494 (Jdforrester-WMF) As with {T120036}, this isn't Declined? [19:13:03] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#1843258 (Nuria) sorry, yes should be declined in favor of the work we are doing towards edit history reconstruction in mw. 
[19:13:57] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#2592518 (Nuria) SEE: https://phabricator.wikimedia.org/T143924 [19:14:07] Analytics: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#1843248 (Nuria) Please see: https://phabricator.wikimedia.org/T143924 [19:19:23] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#2592534 (Jdforrester-WMF) OK, will mark these two as blocked by that then. [19:19:36] Analytics-Kanban: Wikistats 2.0. Edit Reports: Setting up a pipeline to source Historical Edit Data into hdfs {lama} - https://phabricator.wikimedia.org/T130256#2592536 (Jdforrester-WMF) [19:19:38] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#2592535 (Jdforrester-WMF) [19:19:46] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#1843258 (Jdforrester-WMF) declined>stalled [19:20:06] Analytics: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#2592538 (Jdforrester-WMF) [19:20:08] Analytics-Kanban: Wikistats 2.0. 
Edit Reports: Setting up a pipeline to source Historical Edit Data into hdfs {lama} - https://phabricator.wikimedia.org/T130256#2131074 (Jdforrester-WMF) [19:20:12] Analytics: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#1843248 (Jdforrester-WMF) declined>stalled [19:36:57] Analytics-Kanban, Analytics-Wikimetrics: Stop vital signs metric creation on wikimetrics - https://phabricator.wikimedia.org/T143715#2592579 (Nuria) Stopped metrics via turning reports that were recurrent and executed via wikimetrics bot off: mysql> update report set recurrent=0, old_recurrent=1 where... [19:42:05] Analytics: Look through wikimetrics/scripts and clean them up {dove} - https://phabricator.wikimedia.org/T123956#2592612 (Nuria) Open>declined Not doing any work on wikimetrics any time soon. closing. [19:42:16] Analytics, Analytics-Wikimetrics: Look through wikimetrics/scripts and clean them up {dove} - https://phabricator.wikimedia.org/T123956#2592615 (Nuria) [19:56:14] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2592642 (Nuria) {F4414209} [19:57:00] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2592645 (Nuria) {F4414211} [19:57:25] Analytics-Kanban: AQS Cassandra READ timeouts caused an increase of 503s - https://phabricator.wikimedia.org/T143873#2592650 (Nuria) The # of SSTables read correlates with read messages dropped [20:01:46] Analytics-Kanban, Analytics-Wikimetrics, Browser-Support-Firefox, Patch-For-Review: Cannot remove invalid members from cohort - https://phabricator.wikimedia.org/T113454#2592669 (Nuria) Open>Resolved [20:09:02] ottomata: is it going to be OK to move through the rsync mount the data for caching folks? 
[20:09:21] it's about 14 files of 4G each [20:10:25] nuria_: ja i think so 3T avail on stat1001 [20:10:56] ottomata: k, will put it then at https://datasets.wikimedia.org/limn-public-data/caching/ [20:12:15] k [20:28:05] ottomata: hey, ok, I'm free [20:28:15] not sure how much I have left in the tank after that painful experience :) [20:28:17] but free [20:30:27] milimetric: we can do later [20:30:27] haha [20:30:33] tomorrow or whatever [20:30:56] unless you have tips on how to write good standalone tests when I need things like kafka running [20:31:09] have been trying docker and travis, not getting very far [20:31:33] like integration tests that spin up kafka and listen for events? [20:32:41] i have to spin up kafka ja [20:33:07] we usually mock any services, because whether or not the network is behaving doesn't interest us in tests [20:33:17] but are these tests for the network specifically? [20:34:41] ottomata: if you explained an example test it would help [20:36:05] milimetric: subscribe to allowed topic, filter on a couple of fields, consume a few messages [20:36:10] test that some data in the topic is filtered [20:36:18] or [20:36:19] even [20:36:22] just subscribe and consume a few messages [20:36:29] or [20:36:39] assign at a particular offset [20:36:47] test that the message consumed is what is expected [20:37:18] how is the filtering and offset stuff done? Is that standard kafka api that should work? [20:37:22] no [20:37:25] filtering is custom [20:37:29] offset is "standard" [20:37:52] k [20:38:00] I wouldn't test the offset or other standard features [20:38:04] i'm sure I can test the filtering function with just unit tests, but i want to test that the main function works as expected [20:38:07] naw i have to [20:38:12] that's most of what this thing is [20:38:16] socket.io + kafka [20:38:19] if i don't test that [20:38:23] then i will be testing hardly anything [20:38:31] it's just a socket.io interface on top of a kafka client [20:38:47] that's a good thing.
those libraries should test their standard features themselves [20:38:57] if you test it, it would just be duplicate work [20:38:58] but i want to test my use of those features with socket.io [20:39:07] how else will i be sure that if i change something [20:39:21] say, add timestamp offset support to the way we can subscribe [20:39:25] still works [20:40:10] that I do with mocks [20:40:29] the kafka client will change though [20:40:32] when timestamp support is available [20:40:38] so I mock any outside service and I call it and then inspect the mock to make sure it was called the way I expected it [20:40:40] then i will add code to use the changed client api [20:40:45] how do I test that code I added? [20:41:05] I'm not sure of the details but I can say generally [20:41:45] if you have the api mocked, and you used to call it with like {offset: x} and now you call it with {offset: x, timestamp: y} then the mock inspection should fail because you're passing a parameter it doesn't know about [20:42:04] so you'd change your test to expect it to also pass timestamp: y and then it should pass [20:42:10] i dunnoooooooo [20:42:12] i mean i get the idea [20:42:13] but [20:42:23] or if you're TDD, you would change your test first to expect timestamp and it would fail, then you'd implement and make it pass [20:42:26] this library is specifically tied to kafka and a kafka client [20:42:50] right, but it's not responsible for that client's internal correctness [20:42:52] i want to test that the code I wrote works specifically with the kafka client [20:42:56] only that it correctly talks to it [20:43:13] yeah, the mock should let you do that if you inspect how you call it thoroughly [20:43:43] you're still testing what you want, but not the service itself, just the interface to the service, translated into a mock [20:43:53] haha, sounds like a ton of work [20:44:04] nah, modern mocking libs are easy [20:44:10] let me find an example in dashiki [20:45:33] milimetric: btw, kafka is
not http [20:45:54] ottomata: ok so here we stub $.ajax (jquery calling out to the network): https://github.com/wikimedia/analytics-dashiki/blob/master/test/app/apis.js#L18 [20:46:10] and here we test that our code called it with the URL we expected: https://github.com/wikimedia/analytics-dashiki/blob/master/test/app/apis.js#L41 [20:46:49] and if we cared (which in this case we don't) about what $.ajax returns, we could mock that here: https://github.com/wikimedia/analytics-dashiki/blob/master/test/app/apis.js#L33 [20:47:49] Analytics, Analytics-Dashiki, Wikimedia-Site-requests: Need a Dashiki namespace so we can protect configs {crow} - https://phabricator.wikimedia.org/T112268#2592767 (Milimetric) Yeah, but we want an extension for other reasons like rendering the content of the page nicely if it's a Dashiki:____ page. [20:50:27] milimetric: what is going on with sinon? [20:50:31] looks like http mocking to me :) [20:50:45] yeah, in that case, but there's mocking libraries to help with whatever you need [20:50:53] sinon just has a bunch of features to mock async and all that [20:51:32] I'm not saying it's like super duper straightforward but in the long run it keeps you from having complicated tests that break when services are down and other stuff unrelated to your code correctness [20:51:55] milimetric: why not just use docker/travis with kafka running? [20:52:53] you said you were trying and not getting very far. I had the same kind of problems when trying to set up integration tests instead of unit tests [20:52:54] milimetric: i think i'm not getting how i would use this [20:53:15] oh sorry, didn't mean to suggest this exact library, I have to see what you actually need to mock [20:53:22] just giving an example mock [20:53:28] (that it's not a lot of code) [20:54:13] you got code for a test that would use a local instance of a service? 
I'll check it out and see if I can recommend a good simple way to mock it [20:54:38] i guess it seems like a lot of work, services does stuff like this for change-prop [20:54:39] https://github.com/wikimedia/change-propagation/blob/master/.travis.yml [20:54:42] but, that's just travis [20:54:52] i'd like to be able to run consistent tests locally [20:55:02] sure milimetric um [20:55:06] i'm still using socket.io tests though [20:55:14] https://github.com/ottomata/kasocki/blob/master/test/Kasocki.js [20:55:38] subscribe tests were easy [20:55:40] but now i want to do consumption [20:55:49] and to do that I have to prep the topics in a running broker to have the data I want [20:59:55] ok, so in your case require('socket.io-client')(`http://localhost:${port}/`) should just work no matter what you do to your code, so that could be mocked. [21:00:04] so the mock would be an object that has fake .on and .emit functions [21:00:39] those could get pretty complicated, they could act async so you can test as if they were actually being called and queued up for execution later [21:01:13] and you could mock .disconnect too [21:01:33] and so you could have a function that does meaningful work, call it from your test [21:01:59] and then you would inspect your mock like client.emit was called with such and such, and client.on was called with these parameters, and then client.disconnect was called [21:02:16] sorry it's a bit hand-wavey, we can walk through a full example tomorrow when I have more energy? 
[21:03:07] milimetric: ok we can talk more [21:03:12] but that really doesn't test what i want to test [21:03:15] i want to test the code i wrote [21:03:26] not the socket.io-client and a mock [21:03:50] i want to test this [21:03:50] https://github.com/ottomata/kasocki/blob/master/lib/Kasocki.js [21:03:51] :) [21:03:54] yeah, then I just misunderstood, the idea is to get 100% coverage of the code you wrote, but not of other libraries [21:04:12] things like [21:04:13] https://github.com/ottomata/kasocki/blob/master/lib/Kasocki.js#L498 [21:05:03] oh yeah, ok, I see, we can dive into it more tomorrow [21:06:22] k thanks [22:16:19] newbie question - given an sql query what's the easiest way to pipe that out of hive into a file? [22:16:43] something like hive -e 'select *' > /temp.tsv ? [22:44:38] Analytics-EventLogging: Add sampling support in EventLogging - https://phabricator.wikimedia.org/T67500#2593207 (JKatzWMF) @Nuria Thanks! I wasn't aware of that possibility, but it makes sense and seems like a decent fit for now, at least from a queryer's perspective. [23:02:28] (PS1) MaxSem: Remove unused uses [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307447 [23:02:30] (PS1) MaxSem: Refactor the logging command [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307448 [23:02:32] (PS1) MaxSem: Fix graphite port [analytics/discovery-stats] - https://gerrit.wikimedia.org/r/307449
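The newbie question at 22:16 is essentially answered by the asker's own guess: `hive -e` prints query results to stdout as tab-separated text, so a plain shell redirect is enough. A sketch with an illustrative query and output path (the table name is made up); the command is assembled and echoed rather than run against a real cluster:

```shell
# Illustrative query and output path. hive.cli.print.header is a real
# Hive CLI setting that adds column headers to the TSV output.
QUERY="SELECT page_title, view_count FROM some_db.some_table LIMIT 10;"
CMD="hive --hiveconf hive.cli.print.header=true -e \"$QUERY\" > /tmp/out.tsv"
echo "$CMD"
```

For large result sets, `INSERT OVERWRITE DIRECTORY` into HDFS followed by `hdfs dfs -get` avoids streaming everything through the client, but for ad-hoc queries the redirect above is the easiest route.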