[01:16:09] Analytics, Analytics-Wikistats: New page stats are inaccurate for fawiki - https://phabricator.wikimedia.org/T183208#3846927 (Huji)
[01:16:20] Analytics, Analytics-Wikistats: New page stats are inaccurate for fawiki - https://phabricator.wikimedia.org/T183208#3846939 (Huji)
[01:28:30] (PS1) Ottomata: JsonRefine - Be nicer about what type coercions we will accept [analytics/refinery/source] - https://gerrit.wikimedia.org/r/399126 (https://phabricator.wikimedia.org/T182000)
[01:38:39] (PS2) Ottomata: JsonRefine - Be nicer about what type coercions we will accept [analytics/refinery/source] - https://gerrit.wikimedia.org/r/399126 (https://phabricator.wikimedia.org/T182000)
[01:39:54] (PS3) Ottomata: JsonRefine - Be nicer about what type coercions we will accept [analytics/refinery/source] - https://gerrit.wikimedia.org/r/399126 (https://phabricator.wikimedia.org/T182000)
[01:41:48] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3847000 (Smalyshev) Here's how I see the process for handling releases (see also {T183020}): 1. WDQS logs are placed in separate p...
[03:47:25] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3847071 (Nuria) @Smalyshev: Take a look at information we keep on pageview hourly, for long time keeping we need to remove PII and...
[03:53:50] Analytics, Analytics-Wikistats: roadmap of migration to Wikistats 2 - https://phabricator.wikimedia.org/T183180#3847077 (Nuria) >As said in the launch statement, migration of (part of the) existing reports will happen over time. Is there a road map, with timing and priorities? Not yet, it is not likely w...
[07:13:36] (PS5) Mforns: Add link path to router-link [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149)
[07:17:11] (CR) Mforns: "I think this works as expected now, ready to review :]" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149) (owner: Mforns)
[07:18:21] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: [Wikistats2] Add link path to router-link - https://phabricator.wikimedia.org/T183149#3844925 (mforns)
[09:46:12] just added all the puppetization for running the eventlogging cleaner stuff on db1107
[09:46:25] the cron should start at 11 UTC
[10:22:59] https://gerrit.wikimedia.org/r/#/c/399153/2 needs to be merged
[10:23:19] will wait for Marcel to triple check :)
[10:46:18] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3847559 (elukey)
[11:34:53] we might have a problem
[11:35:11] on kafka1023 (the old notebook1002) we have some kafka disk partitions almost filled
[11:35:35] for some reason multiple text partitions ended up in the same disk partition on kafka1023
[11:52:03] * elukey lunch!
[13:01:55] mforns: o/
[13:02:02] hola amigo
[13:02:23] whenever you have time can you check the code review for the el's whitelist?
[13:02:32] ready to clean up the el master :)
[13:09:33] https://gerrit.wikimedia.org/r/#/c/399153/2/modules/profile/files/mariadb/misc/eventlogging/eventlogging_purging_whitelist.tsv
[13:27:03] Heya elukey - I finally manage to be here !
[13:27:39] elukey: About kafka - Is there anything I can help with?
[13:31:32] joal: hey! Nothing really urgent, the disk usage seems stable, need to chat with andrew about this though
[13:31:39] I don't want pages during the holidays :D
[13:31:46] elukey: I hear that :)
[13:31:59] elukey: do we try to update druid when you'll have time?
[13:36:06] elukey, will do, I'm having some trouble with internet connection...
[13:36:43] mforns: ack!
[13:38:21] joal: how urgent is it? It would need to review metrics and make some calculations
[13:38:54] elukey: I think it's fairly urgent - pivot is pretty-much unusable on pageview data for instance
[13:39:12] elukey: :(
[13:39:54] for all pageview data or for big queries?
[13:40:01] I checked and the swap seem to be stable
[13:40:21] elukey: Simple request for yesterday data timesout
[13:40:53] elukey: I don't know what, but I have the feeling something has change in druid
[13:41:06] elukey: Those requests used to work
[13:41:41] elukey: I'm trying to think of the dimodifications we've made, and I can't think of anything except addition of metrics
[13:42:29] yep
[13:43:22] joal: lol https://grafana.wikimedia.org/dashboard/db/prometheus-druid?panelId=1&fullscreen&orgId=1&from=now-7d&to=now
[13:44:32] lemme check one metric that I didn't add in the dashboard
[13:48:18] elukey: This chart you pasted tell me there are simple changes we can make hat maybe could help ;)
[13:58:11] yep I know, I pasted it several times :)
[13:58:17] there is also https://grafana.wikimedia.org/dashboard/db/prometheus-druid?orgId=1&from=now-2d&to=now&panelId=42&fullscreen
[13:58:50] now this is the rate of count (datapoints) for the broker query metric
[13:59:15] Analytics, Analytics-Wikistats: New page stats are inaccurate for fawiki - https://phabricator.wikimedia.org/T183208#3846927 (JAllemandou) Hi @Huji, Thanks for your ticket, it is very clear and well documented :) I'll try to give you answers to some of the things you pointed: - We know about the chart...
[13:59:28] is there anything that keeps requesting pageview data?
[14:00:15] elukey: I'm thinking of soembody having a pivot open with regular reloading
[14:01:11] but even that seems super low
[14:01:21] I nkow you posted that chart multiple times, I was just overwhelmed by data vetting :S
[14:01:41] Now I feel I can do other things, like writing a christmas list for java 8
[14:01:47] elukey: --^ :-P
[14:02:38] please do!!
[14:02:40] \o/
[14:06:27] mforns: o/
[14:06:41] the other weird thing that is happening is burrow alarming for eventlogging lagging
[14:06:57] mforns: About the problem you mentioned yesterday for mediawiki_history_old - How do you think we should execute?
[14:08:27] Analytics, Analytics-Wikistats: New page stats are inaccurate for fawiki - https://phabricator.wikimedia.org/T183208#3847999 (Huji) >>! In T183208#3847949, @JAllemandou wrote: > - We know about the chart misalignment (T182817), we will work on correcting this (this is a real bug !). Then this part of th...
[14:10:16] !log temporary changed JVM Heap settings for the druid broker on druid1001 - Xmx25g Xms10g (run puppet and restart the daemon to rollback)
[14:10:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:14:49] so removed ~15g to Xms and a query like https://goo.gl/W3fNEZ still fails
[14:15:02] elukey: right
[14:15:16] elukey: That doesn't smell
[14:15:19] good
[14:24:32] joal: something interesting..
on druid1001's historical metrics log I can see the following
[14:24:35] ","version":"0.9.2"}]
[14:24:38] Event [{"feed":"metrics","timestamp":"2017-12-19T14:20:24.968Z","service":"druid/historical","host":"druid1001.eqiad.wmnet:8083","metric":"query/segment/time","value":40004,"dataSource":"pageviews-hourly","duration":"PT86400S","hasFilters":"false","id":"6e250e50-dd6a-4982-892a-7fd41d9881bd","interval":["2017-12-13T00:00:00.000Z/2017-12-14T00:00:00.000Z"],"numComplexMetrics":"0","numMetrics":"1"
[14:24:44] ,"segment":"pageviews-hourly_2017-12-13T00:00:00.000Z_2017-12-14T00:00:00.000Z_2017-12-14T01:14:27.556Z","status":"failed","type":"timeseries","version":"0.9.2"}]
[14:24:48] a lot of them
[14:24:57] checkout the "status":"failed"
[14:25:08] the metric is "Milliseconds taken to query individual segment. Includes time to page in the segment from disk."
[14:25:27] elukey: Would that mean that a segment is corrupted
[14:25:30] ?
[14:25:49] (PS4) Ottomata: JsonRefine - Be nicer about what type coercions we will accept [analytics/refinery/source] - https://gerrit.wikimedia.org/r/399126 (https://phabricator.wikimedia.org/T182000)
[14:27:00] joal: I have no idea :D
[14:27:27] elukey: :D
[14:27:42] I'd have expected an error in the historical.log
[14:27:45] but only INFO
[14:28:46] ottomata: o/
[14:28:58] whenever you have time checkout kafka1023's df -h :(
[14:29:01] hiii
[14:29:04] hiiiii
[14:29:19] woah
[14:29:21] /dev/sdc1 1.8T 1.7T 81G 96% /var/spool/kafka/c
[14:29:27] /dev/sde1 1.8T 5.2G 1.7T 1% /var/spool/kafka/e
[14:29:27] hmmm
[14:29:28] haha
[14:29:42] there are a ton of webrequest partitions in some disk partitions
[14:29:45] and others are free
[14:29:50] yeah exactly :D
[14:30:02] well NICE JOB KAFKA
[14:30:08] a good reason to use raid :)
[14:30:16] sheesh
[14:30:17] hm
[14:30:34] hiya
[14:30:39] iii
[14:30:40] hi
[14:31:00] elukey: if we could operate this way until we decom kafka analytics, i'd just leave it
[14:31:10] but i'm worried about sdc1
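Editor's sketch (not part of the original log): the two `df -h` rows pasted above show one Kafka data disk at 96% and another at 1%. A minimal Python sketch of flagging such mounts from `df` output follows. The function name and the 90% threshold are assumptions chosen for illustration, not anything the team ran.

```python
from typing import List, Tuple


def nearly_full_mounts(df_output: str, threshold_pct: int = 90) -> List[Tuple[str, int]]:
    """Return (mount_point, use_pct) for each filesystem at or above threshold_pct."""
    flagged = []
    for line in df_output.strip().splitlines():
        fields = line.split()
        # Expected columns: filesystem, size, used, avail, use%, mount point.
        if len(fields) < 6:
            continue
        pct = fields[4].rstrip("%")
        if not pct.isdigit():
            continue  # skips the "Use%" header row and malformed lines
        if int(pct) >= threshold_pct:
            flagged.append((fields[5], int(pct)))
    return flagged


# The two rows pasted in the channel at 14:29.
sample = """\
/dev/sdc1 1.8T 1.7T 81G 96% /var/spool/kafka/c
/dev/sde1 1.8T 5.2G 1.7T 1% /var/spool/kafka/e
"""
print(nearly_full_mounts(sample))  # [('/var/spool/kafka/c', 96)]
```

In practice the input could come from `subprocess.run(["df", "-h"], capture_output=True, text=True).stdout`; the sketch keeps the parsing separate so it can be checked against pasted text.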
[14:32:16] elukey: its totally caught up, so i think 80G free might be enough
[14:32:21] hard to be sure though
[14:33:19] ottomata: I thought the same but wanted to brainbounce with you, it would be really horrible to get a page during the holidays :D
[14:34:01] ottomata: another thing that we could do it to lower down the maximum space allowed for each topic partition on disk
[14:34:07] now it is ~350G
[14:34:24] hm
[14:34:31] we could do that, but that'd be for the whole cluster, right?
[14:34:41] yeah
[14:37:55] so joal confirmed, if we find what that "status":"failed" means for the historical we probably solve the issue
[14:40:24] joal: also - why https://grafana.wikimedia.org/dashboard/db/prometheus-druid?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=pageviews-hourly does not show any increase in cached objects for the broker?
[14:40:29] err historical
[14:42:01] elukey: I have no clue :(
[14:42:21] elukey: maybe because query fails?
[14:42:48] elukey: ah no, I think that druid only caches once: either at broker or historical level
[14:42:59] So if broker caches, no historical cache gets used
[14:44:06] interesting, we have /var/lib/druid/segment-cache
[14:44:44] (PS6) Fdans: Add pageviews by country endpoint [analytics/aqs] - https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520)
[14:48:30] elukey: From what I read in the historical metric log, it means queries have failed
[14:49:09] I don't find errors in any historical.log
[14:49:17] historical-metrics.log
[14:49:23] it says queries have failed
[14:49:57] on which host?
[14:50:01] 1001
[14:50:12] tail -10000 historical-metrics.log | grep failed | less
[14:50:49] you are talking about the error that I pasted before right? the status: faile?
[14:50:53] Interestingly elukey - It seems memory on druid1001 is still almost full !
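Editor's sketch (not part of the original log): the `tail ... | grep failed` above only shows that failures exist. A hedged Python sketch of tallying them per datasource and metric follows. It assumes each log line contains one JSON array that starts at the first `[{` and runs to the end of the line, as in the `Event [{...}]` paste earlier; the real file layout may differ, and the sample line keeps only a subset of the pasted fields.

```python
import json
from collections import Counter
from typing import Iterable


def count_failed(lines: Iterable[str]) -> Counter:
    """Count events with status == "failed", keyed by (dataSource, metric)."""
    failed = Counter()
    for line in lines:
        start = line.find("[{")
        if start == -1:
            continue  # no JSON payload on this line
        try:
            events = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or differently formatted line
        for event in events:
            if event.get("status") == "failed":
                failed[(event.get("dataSource"), event.get("metric"))] += 1
    return failed


# Trimmed version of the event pasted at 14:24 (fields removed for brevity).
sample_lines = [
    'Event [{"feed":"metrics","service":"druid/historical",'
    '"metric":"query/segment/time","value":40004,'
    '"dataSource":"pageviews-hourly","status":"failed","type":"timeseries"}]',
    "some unrelated log line",
]
print(count_failed(sample_lines))
```

Grouping by datasource would show whether the failures are limited to `pageviews-hourly` or spread across datasources, which is the question the channel turns to later.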
[14:50:58] correct elukey
[14:51:27] sure, but I'd expect in historical.log something like "Look man, this went really wrong, aborting"
[14:51:30] :D
[14:51:47] elukey: looking at htop, there seem to have a druid process asking for ~60G virtual memory!
[14:52:44] on druid1002 I can see 2017-12-19T14:49:56,956 INFO io.druid.server.QueryResource: Timeout waiting for task. [1f9a48c0-56fa-442a-998a-75d42e4dceb5]
[14:52:52] INFO
[14:52:54] sigh
[14:52:56] :(
[14:55:08] elukey: in broker start line: -XX:MaxDirectMemorySize=64g
[14:55:33] elukey: not sure about what it means though
[14:59:35] need coffee to refresh my brain :D
[15:03:36] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Review the alert message about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T182958#3848278 (Milimetric) @Trizek-WMF sorry I'm not sure if you mean your suggestion is still clearer. To make sure, I'll copy my suggestio...
[15:04:36] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Review the alert message about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T182958#3848283 (Trizek-WMF) Sorry I haven't been clear enough. Yours is perfectly fine, that is what I meant to say. :)
[15:07:27] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Review the alert message about adblocker preventing AQS requests - https://phabricator.wikimedia.org/T182958#3848289 (Milimetric) No problem! Thank you for the task, will update the patch now.
[15:08:06] joal: I think it is a segment size problem, because if we restrict the time window it works fine
[15:08:59] (PS2) Milimetric: Change addblocker text [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398598 (https://phabricator.wikimedia.org/T182958) (owner: Nuria)
[15:09:14] (CR) Milimetric: [C: 2] Change addblocker text [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398598 (https://phabricator.wikimedia.org/T182958) (owner: Nuria)
[15:09:41] elukey: I get errors for other datasources
[15:09:55] with smaller segment sizes
[15:10:05] (like banner activity)
[15:10:39] elukey: banner_activity for last monh doesn't work
[15:10:54] No filtering, goupby, nothin nothing
[15:11:26] (Merged) jenkins-bot: Change addblocker text [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398598 (https://phabricator.wikimedia.org/T182958) (owner: Nuria)
[15:12:33] joal: so I am going to lower down all the Xms settings for the daemons
[15:12:41] this needs to be done
[15:12:47] druid.segmentCache.numLoadingThreads might also be useful
[15:12:50] elukey: Wokrs for me
[15:12:52] since it is 1 now
[15:13:07] elukey: I have no clue what this guys does
[15:13:57] Analytics-Kanban, Analytics-Wikistats: Add a tooltip to all non-obvious concepts like split categories, abbreviations - https://phabricator.wikimedia.org/T177950#3848324 (Milimetric)
[15:14:00] elukey: I think direct memory size should also be changed
[15:15:53] joal: let's do one at the time
[15:16:19] k elukey, watching metrics, I can also try queries at your will
[15:16:25] Analytics-Kanban, Analytics-Wikistats: Add clear definitions to all metrics, along with links to Research: pages - https://phabricator.wikimedia.org/T183261#3848328 (Milimetric)
[15:16:35] fdans, back
[15:16:42] baticueva?
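Editor's sketch (not part of the original log): the discussion above weighs lowering `Xms` and `MaxDirectMemorySize`. A small Python sketch of adding up what a JVM command line is allowed to claim follows. Combining the temporary broker heap flags logged at 14:10 with the `-XX:MaxDirectMemorySize=64g` seen at 14:55 into one command line is an assumption made for illustration. The total covers heap plus direct buffers only, so the real process footprint (metaspace, thread stacks, mapped segment files) would be larger.

```python
import re
from typing import Dict

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a JVM size such as '25g' or '512m' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised JVM size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def jvm_memory(flags: str) -> Dict[str, int]:
    """Extract initial heap, max heap and max direct memory (bytes) from JVM flags."""
    sizes: Dict[str, int] = {}
    for flag in flags.split():
        if flag.startswith("-Xmx"):
            sizes["max_heap"] = parse_size(flag[4:])
        elif flag.startswith("-Xms"):
            sizes["initial_heap"] = parse_size(flag[4:])
        elif flag.startswith("-XX:MaxDirectMemorySize="):
            sizes["max_direct"] = parse_size(flag.split("=", 1)[1])
    return sizes


# Assumed combination of flags mentioned at different points in the log.
broker = jvm_memory("-Xmx25g -Xms10g -XX:MaxDirectMemorySize=64g")
ceiling_gib = (broker["max_heap"] + broker["max_direct"]) / 1024 ** 3
print(f"heap + direct memory ceiling: {ceiling_gib:.0f} GiB")  # 89 GiB
```

Comparing that ceiling with the physical RAM of the host, and with the other Druid daemons sharing it, is the calculation the channel refers to; the host's RAM is not stated in the log, so the sketch stops at the sum.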
[15:16:43] Analytics, Analytics-Wikistats: New page stats are inaccurate for fawiki - https://phabricator.wikimedia.org/T183208#3846927 (Milimetric) +1, very true about lack of transparency, @Huji, and we know this is an issue. We are working on two tasks that will make this better: 1. Labeling all non-obvious in...
[15:16:48] joal: https://gerrit.wikimedia.org/r/39919 - care to double check?
[15:17:07] mforns: I figured it out! don't worry :)
[15:17:11] fdans, ah never mind
[15:17:15] :]
[15:17:24] elukey: ??
[15:17:25] joal: sorry https://gerrit.wikimedia.org/r/399195
[15:17:28] :D
[15:17:29] right
[15:18:42] elukey - Sounds sane
[15:18:54] (CR) Ottomata: Add refinery-drop-hive-partitions (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/392733 (https://phabricator.wikimedia.org/T181064) (owner: Ottomata)
[15:19:06] (PS2) Ottomata: Add refinery-drop-hive-partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/392733 (https://phabricator.wikimedia.org/T181064)
[15:19:47] (CR) Ottomata: [V: 2 C: 2] Add refinery-drop-hive-partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/392733 (https://phabricator.wikimedia.org/T181064) (owner: Ottomata)
[15:20:22] elukey: look at that http://druid.io/docs/latest/configuration/production-cluster.html
[15:20:24] aqs working with per country data with bucketing and K applied
[15:20:27] * fdans dances
[15:20:55] We have the same config for some memory setting, but when looking at hardware, well, seems to be different :D
[15:20:56] Analytics, Analytics-Wikistats: Display of radio buttons in Wikistats 2 is somewhat confusing - https://phabricator.wikimedia.org/T183185#3848366 (Milimetric) Yeah, I agree these are confusing, we'll look for a better widget (I was actually hoping to use OOJS UI here but haven't had time to modularize it...
[15:20:58] elukey: --^
[15:21:10] * joal danes with fdans :D
[15:21:17] s/danes/dances
[15:21:42] nonono joal, I like danes with fdans
[15:21:43] http://puppytoob.com/wp-content/uploads/2017/04/Great-Dane-Puppies.jpg
[15:21:47] :D
[15:22:35] Analytics, Analytics-Wikistats: Make the colors used the line charts in Wikistats 2 more easy to recognize. - https://phabricator.wikimedia.org/T183184#3848371 (Milimetric) Yeah, we have to rethink this. Do you think even thicker lines would be a good idea? Or maybe just when a color is lighter than #9...
[15:23:29] Thanks milimetric for commenting on the ticket from Huji :)
[15:23:42] joal: that was the best-written ticket1!
[15:23:43] love it
[15:23:46] love our users
[15:23:50] Dec 19 15:23:19 druid1002 druid[27431]: Too small initial heap for new size specified
[15:23:52] Yessir, very much agreed !
[15:23:53] ufff
[15:24:33] elukey: :( Ok - -XX:NewSize=6G - I'm sorry I didn't catch that :(
[15:24:37] joal: when we go into beta, we should put a big banner up for everyone that's been using the alpha saying "THANK YOU SO MUCH YOU ALL ROCK FOR HELPING US MAKE THIS UI BETTER"
[15:24:51] milimetric: +1000
[15:25:22] joal: me too :(
[15:26:22] elukey: should we change those NewSize and MaxDirectMemory? I'd say NewSize=2G, maxDirectMemory=24G
[15:26:38] It would at least resonnate less crazy with the hardaware
[15:27:31] yeah let's do some calculations
[15:27:41] elukey: batcave?
[15:28:21] r3.8xlarge (Cores: 32, Memory: 244 GB, SSD)
[15:28:23] ahahah
[15:28:29] you have now :)
[15:29:23] even better: r3.8xlarge (Cores: 32, Memory: 244 GB, SSD - this hardware is a bit overkill for the broker but we choose it for simplicity
[15:29:32] joal: joining
[15:31:12] (CR) Mforns: "Looks good overall, but still I'm conflicted about my own naming idea. Can we discuss with the team?"
(2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520) (owner: Fdans)
[15:33:01] (CR) Mforns: [V: 2 C: 2] "LGTM!" [analytics/dashiki] - https://gerrit.wikimedia.org/r/398411 (https://phabricator.wikimedia.org/T182477) (owner: Milimetric)
[15:40:26] (CR) Fdans: [V: 2 C: 2] "Yesssss, much needed change, and extra cool that you identified the problem with ctrl-click link opening. The site doesn't feel like it's " [analytics/wikistats2] - https://gerrit.wikimedia.org/r/398854 (https://phabricator.wikimedia.org/T183149) (owner: Mforns)
[15:41:39] thanks a lot fdans :]
[15:41:49] fdans, you have a mac right?
[15:42:00] could you test the command+click
[15:42:01] ?
[15:42:23] I could not, 'cause linux
[15:44:02] (CR) Fdans: Add pageviews by country endpoint (2 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520) (owner: Fdans)
[15:44:33] mforns: yes I tested that before, both in chrome and safari, and it works perfectly
[15:44:41] awesome, thanks!
[15:46:31] (PS14) Fdans: Add pageview by country oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521)
[15:50:55] (PS15) Fdans: Add pageview by country oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521)
[15:58:34] (CR) Fdans: "@Joal I'm happy with the change as it is right now.
i wonder if there is a better way to bucketise in the query" [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) (owner: Fdans)
[16:01:06] fdans: standuuup
[16:02:10] elukey: I feel the same now working with you on Druid that I felt when we were fighting cassandra
[16:02:27] joal: ahhaha yes
[16:02:28] Thanks a lot elukey for raising the bar on our tools :)
[16:02:48] joal: let's hope that the bar will not fall on our heads :D
[16:02:50] :)
[16:15:51] Analytics, Analytics-Wikistats: Display of radio buttons in Wikistats 2 is somewhat confusing - https://phabricator.wikimedia.org/T183185#3848624 (Erik_Zachte) No, I'm fine with second bullet point, I just meant to say 'there is too much happening when I click the radiobutton'. But subdivide on the chose...
[16:18:48] Analytics, Analytics-Wikistats: Make the colors used the line charts in Wikistats 2 more easy to recognize. - https://phabricator.wikimedia.org/T183184#3848628 (Erik_Zachte) My preference would be the second option: Just when a color is lighter than #999999 add a thin black outline. But using lighter col...
[16:21:01] (PS1) Milimetric: Add link to CC0 license [analytics/dashiki] - https://gerrit.wikimedia.org/r/399212
[16:30:41] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3848696 (Erik_Zachte) For the record, I'm copying from a mail exchange with @Milimetric : > The disadvantage is that the list becomes really hard to scroll...
[16:32:20] Analytics, Analytics-Wikistats: New page stats are inaccurate for fawiki - https://phabricator.wikimedia.org/T183208#3846927 (Nemo_bis) > the underlying problem is lack of transparency This used to be fixed by linking definitions at https://www.mediawiki.org/wiki/Analytics/Metric_definitions
[16:34:47] !log manually started eventlogging cleaner on db1107 to purge/sanitize data up to 90 days ago (tmux is running for user eventlogcleaner)
[16:34:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:34:57] mforns, ottomata --^
[16:35:03] elukey: I'm also looking a server-board for druid100[123], I can't find anyting really in the past few days
[16:35:54] (PS2) Milimetric: Fix visualizer when no startDate specified in conf [analytics/dashiki] - https://gerrit.wikimedia.org/r/399212
[16:43:18] (PS1) Milimetric: Update browser reports [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/399220
[16:43:48] (CR) Milimetric: [V: 2 C: 2] Update browser reports [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/399220 (owner: Milimetric)
[16:44:23] nuria_: I deployed the browser dashboard, but didn't deploy vital-signs or that stub reportcard dashiki thing, because I'm guessing we should sunset those, right?
[16:44:33] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3848739 (jmatazzoni) @Erik_Zachte, just to be clear. In T179530#3839200 above, I'm not describing an interface problem (though I agree there is one). I'm de...
[16:44:47] milimetric: nah, let's keep them, they are cheap to keep and linked everywhere
[16:45:11] why don't we just update the links to point to wikistats 2?
[16:45:38] (CR) Milimetric: [V: 2 C: 2] Fix visualizer when no startDate specified in conf [analytics/dashiki] - https://gerrit.wikimedia.org/r/399212 (owner: Milimetric)
[16:46:13] milimetric: because wikistats2 doesn't let you yet compare 5 projects on a graph thus there is no equivalent
[16:46:40] milimetric: let's just keep them until we have abette multiproject UI
[16:46:44] *a better
[16:46:46] ok, makes sense
[16:46:48] deploying
[16:51:36] (PS7) Fdans: Add pageviews by country endpoint [analytics/aqs] - https://gerrit.wikimedia.org/r/393591 (https://phabricator.wikimedia.org/T181520)
[16:51:56] mforns: name changed :)
[16:52:02] (PS1) Milimetric: Remove unused deploy config [analytics/dashiki] - https://gerrit.wikimedia.org/r/399222
[16:52:25] (CR) Milimetric: [V: 2 C: 2] Remove unused deploy config [analytics/dashiki] - https://gerrit.wikimedia.org/r/399222 (owner: Milimetric)
[16:52:27] fdans, looking
[16:52:54] !log temporarily stop superset to test druid's performances
[16:52:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:53:40] Analytics, Analytics-Wikistats: When searching for a project language, display a full list of languages - https://phabricator.wikimedia.org/T182960#3848764 (Erik_Zachte) For the record, I'm copying from a mail exchange with @Milimetric : The disadvantage is that the list becomes really hard to scroll th...
[16:54:41] Analytics-Kanban, Analytics-Wikistats: Wikistats Bug: Menu to select projects doesn't work (sometimes?) - https://phabricator.wikimedia.org/T179530#3848770 (Erik_Zachte) Ah, sorry I mixed up two tasks then. Just moved my comments to https://phabricator.wikimedia.org/T182960
[16:54:41] elukey: could you run this in production cqlsh?
[16:54:52] (PS1) Milimetric: Update vital-signs and reportcard [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/399224
[16:55:01] (CR) Milimetric: [V: 2 C: 2] Update vital-signs and reportcard [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/399224 (owner: Milimetric)
[16:55:14] https://www.irccloud.com/pastebin/8r6ui04y/
[16:55:27] ok, nuria_, deployed, we should see them whenever puppet runs
[16:55:29] elukey whenever you have a moment :)
[16:56:10] fdans: sure, has it been reviewed by others?
[16:56:34] elukey: nooope
[16:56:58] has it been tested in labs?
[16:57:03] elukey: yes
[16:57:09] all right, goood :)
[16:57:16] let's ask to some aqs master a review
[16:57:17] it only affects the top by country keyspace
[16:57:24] "only"
[16:57:44] elukey: do we have a ticket for druid perf issues?
[16:57:44] mmmm drop keyspace "local_group_default_T_top_bycountry";
[16:57:49] nuria_: not yet :)
[16:58:10] elukey: this is to add the access column to the keyspace
[16:58:28] (this keyspace isn't being used in production, I created it a couple weeks ago to test the oozie jobs)
[16:58:38] you could have specified that :D
[16:58:52] "hey Luca let's drop this data in production"
[16:58:52] yeah luca but where's the mystery then
[16:58:57] hahaha
[16:59:10] hehe
[17:00:04] anyhow, let's get a quick review from joal or anybody and proceed
[17:01:17] (yes I know I am paranoid, welcome to dealing with Luca the annoying ops session)
[17:03:32] nooo you're the best luca
[17:04:25] (CR) Nuria: "nice, thanks for cleaning up" [analytics/dashiki] - https://gerrit.wikimedia.org/r/399222 (owner: Milimetric)
[17:04:37] elukey: if it'll make you feel less uneasy, that script was generated by cassandra through DESCRIBE in beta, I didn't write it :D
[17:07:56] fdans: yep but every statement that goes to cassandra in prod needs a review
[17:08:05] there are a ton of settings that need to be kept in mind
[17:08:11] just want to
make sure that we do things properly
[17:11:28] Analytics, Analytics-Wikistats: When searching for a project language, display a full list of languages - https://phabricator.wikimedia.org/T182960#3848807 (Trizek-WMF) So the problem here is to find a way to have people understanding that thy have to type something into the input field, if we asume typi...
[17:13:30] Analytics-Kanban: Druid Woes - https://phabricator.wikimedia.org/T183273#3848809 (Nuria)
[17:13:41] elukey: https://phabricator.wikimedia.org/T183273 here it is
[17:13:42] fdans, elukey - reading
[17:15:15] fdans, elukey: sounds correct to me !
[17:15:33] thank you joal!
[17:16:02] joal: can you execute it? I am a bit swamped with db1107 and druid :(
[17:16:09] elukey: I can !
[17:16:53] !log Initilizaing new cassandra keyspace for pageviews/top-by-country
[17:16:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:17:16] ahah - there was a glitch
[17:17:22] hopefully I reread it!
[17:18:37] fdans: done :)
[17:19:25] elukey: no change for pivot - still timeout
[17:19:28] :(
[17:19:35] elukey: stop metric gathering?
[17:20:22] it started way before the issues, I am not convinced that it is the problem
[17:20:30] I'll dig more into it
[17:21:33] elukey: while I agree, procedure for test seems also not to bad: remove superset, then remove metrics - and see - if it doesn't change anything, it just means we've reach a threshold in term of data handled by the cluster
[17:24:42] joal: I'd need to restart the daemons to do that, I'd prefer not to
[17:25:03] ok elukey
[17:27:35] I mean, I'll do it as last step, but it feels really weird
[17:27:55] just rolled back in puppet the jvm settings so they'll be the same as before
[17:28:13] k
[17:28:19] and re-enabled superset
[17:28:26] !log re-enabled superset
[17:28:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:28:33] elukey: You have other track leads?
[17:29:36] I'd like to have more info about why the historical times out
[17:30:18] elukey: So do I !
[17:30:27] elukey: outputing more logs?
[17:31:54] it could be an option yes
[17:33:06] joal: how long ago were you able to query one week of pageview data without any issue?
[17:34:26] elukey: Difficult to think of - I built the superset dashboard (doing exactly that) last week, or the week before
[17:36:21] elukey: I created the superset pageview dashboard tha time: 2017-12-12 16:42:20
[17:36:41] elukey: meaning that day (12th), pageview was still queriable
[17:36:48] elukey: It was slow, but queriable
[17:40:37] joal: even now it is queriable, but not for say a week
[17:40:44] right
[17:41:22] elukey: dashboard do 2 quesries, 1 for a week of data split by access-method, the other for one day of data, split by country
[17:44:15] (CR) Nuria: "Thanks" [analytics/dashiki] - https://gerrit.wikimedia.org/r/399212 (owner: Milimetric)
[17:45:42] joal: can you retry now in pivot?
[17:45:48] sure elukey
[17:46:47] elukey: timeout feels longer, but still
[17:46:50] failed :(
[17:47:00] elukey: https://phabricator.wikimedia.org/T182628#3848935
[17:47:24] joal: it didn't fail for me
[17:47:30] so, the original dell quote had 8 core processors
[17:47:41] while existent hadoop nodes have 6 core
[17:47:42] elukey: I'm doing a different query, using a small amount of data
[17:47:49] (1 month of pageview-daily0
[17:47:54] i asked rob to see if we could get new nodes with 6 cores to keep things homogeneous
[17:48:10] but, the only 6 core systems are have much slower clock speeds
[17:48:25] (CR) Joal: [C: -1] "Still some things not ok for me, plus some naming :)" (12 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) (owner: Fdans)
[17:48:34] so i dunno what to do.
if we get a system with more cores, we might need to do some special casing for memory/worker allocation
[17:49:49] ottomata: the ratio core/memory is a lot different?
[17:50:05] ottomata: maybe we could bump memory if we have more cores?
[17:50:05] Analytics, Analytics-Wikistats: When searching for a project language, display a full list of languages - https://phabricator.wikimedia.org/T182960#3848944 (Nuria) >See what has been done for Compact languages link. Th This is heavily desktop-focused, our approach has to work for desktop and mobile and t...
[17:50:06] ottomata: we can keep the actual config for the moment, worst that can happen is that they'll be a tiny bit underused
[17:50:13] yeah
[17:50:17] true
[17:50:24] go with the 8 cores you mean?
[17:50:28] yeah
[17:50:35] (32 effective cores vs 24(?))
[17:50:53] they memory should be the same so I'd say yes
[17:51:00] yeah
[17:51:24] but, if we were to special case configure, we'd do it with more workers. but if we did that, the workers would have less ram
[17:51:50] so if we were planning to support a hetergeneous cluster, we should probably aim to have a consistent core/rem ratio
[17:51:59] as joal just said ^^^ :)
[17:53:16] I agree, but it is hard to get the same hw over the years, and I'd prefer not to have a slower clock rate just to keep numbers consistent
[17:53:48] I imagine our current machine have 64G RAM and 24 cores - So if we bump to 32 cores, having 96Gb RAM would do :)
[17:54:13] this is another option
[17:54:35] milimetric: given that mforns is on a very different TZ , could you fill in for him at sos tomorrow? (if not I can do it)
[17:54:46] but worst that can happen is that we keep the same config as we are and it may happen that some cores will not get used completely
[17:54:55] that is not a major issue
[17:54:57] nuria_: no problem, I can do SoS tomorrow
[17:55:02] mforns: I got your back
[17:55:05] elukey: 1/3 of cores unused - What a shame !
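Editor's sketch (not part of the original log): the arithmetic behind "64G RAM and 24 cores, so 32 cores would want 96G" can be checked in a few lines of Python. The 64 GB / 24 core figures are joal's guess at 17:53, not confirmed hardware specs, and the list of candidate RAM sizes is an assumption for illustration.

```python
def ram_for_same_ratio(current_ram_gb: float, current_cores: int, new_cores: int) -> float:
    """RAM needed on the new node to keep the GB-per-core ratio unchanged."""
    return current_ram_gb / current_cores * new_cores


# Figures guessed in the channel for the existing Hadoop workers.
needed_gb = ram_for_same_ratio(current_ram_gb=64, current_cores=24, new_cores=32)

# Hypothetical purchasable configurations, smallest first.
candidate_sizes_gb = (64, 96, 128)
choice_gb = min(size for size in candidate_sizes_gb if size >= needed_gb)

print(f"needed: {needed_gb:.1f} GB, smallest fitting option: {choice_gb} GB")
# needed: 85.3 GB, smallest fitting option: 96 GB
```

Keeping the ratio at roughly 2.7 GB per core requires about 85 GB, so 96 GB is the smallest listed option that does not reduce memory per worker.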
[17:55:10] ok, cc mforns THANKS [17:55:15] thanks milimetric and nuria_ [17:55:20] thanks milimetric :) [17:55:32] elukey: yeah, i think I agree at going with 8 cores [17:55:41] but, i think maybe we should bump ram too [17:56:00] then slowly as the older nodes get replaced, we'll expect to have 8 cores with 96 ram too [17:56:17] joal: I agree with you, if we have more ram ok, my point is that if this is not possible for some reason X I'd prefer more cores that might be underused than the same cores with less clock speed :) [17:56:41] aye [17:56:49] ok, i'll ask robh if we can get a quote for bumped ram then [17:56:53] I think we have an agreement :) [17:56:54] makes sense elukey :) [17:56:59] <3 [17:57:23] i'll ask for 96 and 128 options, since 128 can move to 32gb dimms [17:57:24] so pivot is now consistently returning data in no time for one week of page views [17:57:27] and may be cheaper at that amount [17:57:48] so the v4 8 core cpu at 2.1GHz is better than another cpu at 1.7GHz and 6 cores? [17:57:50] elukey: cached data I guess [17:57:57] as long as we bump ram to 96gb min [17:58:16] ottomata: that sound right?
(i honestly only skimmed backlog ;) [17:58:27] elukey: [17:58:28] https://pivot.wikimedia.org/#pageviews-daily/line-chart/2/EQUQLgxg9AqgKgYWAGgN7APYAdgC5gQAWAhgJYB2KwApgB5YBO1Azs6RpbutnsEwGZVyxALbVeAfQlhSY4AF9kwYhBkc86FWs7AKVOoxZt1XTDnxEylJQaat2nbub7VBS4XPyVFy1Q+Z4ANqafibAMmIAYgA2GBgMVAAmAK4MxNq8AAoAjACyCmi+GfgR1ABKxOQA5uJKKWnFwDn5Ssxg1OYAtNnyALryA8jBNPR2xo5mvAJCouL4jBgAVtSqBcDM8WAAgqETWg68Gwxg+qNGB6Y8+NPus7wAbqTUAO4SEBjJ5Cc+iaRMu7xEiwINRyL9qgpFOhbOcTE4pq4Zp5gF [17:58:33] JSmsAEbJCAAa2o2wBpn2YVKACEsbiTnVUukLk1sgARNZHfHFEKNZmnQz2OGTa6I27I1GycQ/P4rRrEZggsEUKqQ3rIMAMZLUJSPF5vD5fFCBYDq17vT4nBWBBXkZLRaKFX5icjjXgLZarJTRWSkE64ACs8iAA== [17:58:36] Arf sorry elukey [17:59:05] robh: yep exactly, to keep the ratio between cores/memory the same [17:59:29] elukey: https://gist.github.com/jobar/c1d089052aa543efed43f051a4adac53 [17:59:43] robh yes, was about to update ticket :) [17:59:52] and yeah let's bump ram, 96 or 128 will be great [18:00:05] cool, thats what we've done on pretty much every old to new system refresh goign from v3 to v4 of the cpu [18:00:09] so its quite normal [18:00:16] whatever we do, when we get new hardware in the future (which will probably happen early next FY because of OOW nodes), we'll aim to get more of the same [18:00:35] more memory per core is actually a very good thing for hadoop stuff [18:00:45] elukey: seems a problem to access data in time (back to druid) [18:00:53] ottomata: MOARRRRR ! [18:03:23] (03PS16) 10Fdans: Add top by country pageviews oozie jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) [18:03:45] (03CR) 10Fdans: "@Joal addressed all comments, thank you for the review!! 
:)" (11 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) (owner: Fdans) [18:06:10] ok, dell quote updates requested =] [18:06:18] i havent asked for hp because i dont wanna [18:06:29] im not impressed with their raid so far [18:07:00] (and we keep having other odd issues we dont have with dells since we are used to running dell) also your current hadoop are all dell r730xd or r720xd [18:07:06] seems nice to keep it the same. [18:07:31] (its less of an issue on 1u generic systems, but having clusters with hardware raid its awfully nice to make them match) [18:08:05] fine w me! :) [18:08:17] (CR) Joal: [C: -1] "You went too fast @fdans :-P" (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) (owner: Fdans) [18:09:12] joal: I've no idea what happened in bundle.xml, sorry, maybe i hit cmd-z by accident :( [18:12:02] joal: IIRC another thing that changed was marcel pushing el data to druid right? [18:13:16] joal testing query so that I stop making a fool of myself... [18:13:21] elukey: Correct elukey - I looked at data size, nothing huge (compared to mediawiki-history-reduced for instance) [18:13:33] fdans: :D [18:13:46] fdans: Don't worry, it's no big deal [18:20:18] joal: I'd be inclined to test debug on log4j for one historical [18:20:35] elukey: do you wish me to have a look with you? [18:20:50] elukey: for the moment I'm in meeting, but I can do soon [18:21:05] sure sure [18:31:15] * fdans has joined the clandestine slack team [18:33:11] * elukey afk for 1h, bbl!
[18:45:09] (PS17) Fdans: Add top by country pageviews oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) [18:45:19] joal: query tested 👌🏼 [18:45:25] Yay :) [18:50:43] Gone for dinner guys [19:08:51] ottomata: here if you want [19:09:44] so /var/spool/kafka/c has only 68GB left [19:10:53] the main issue seems to be [19:11:16] the trouble is we can't assign partitions to disks. [19:11:26] we could reassign a large partition or two to another broker [19:11:39] OR, we could do as you suggested and just lower the topic byte retention [19:13:22] ah can we re-assign partitions? [19:13:51] I am wondering if stopping the broker, moving topic partitions data to another disk partition, start the broker would work [19:15:02] ha, hm it actually probably would... [19:15:57] elukey: we could try with an unused test topic... [19:16:56] in theory the mapping disk -> topic partition should happen when kafka boots [19:17:00] so it should work [19:17:11] i think so too [19:18:44] I am wondering why kafka did this mess [19:19:00] In the beginning I thought it was something that I've misconfigured [19:19:03] but [19:19:04] log.dirs=/var/spool/kafka/a/data,/var/spool/kafka/b/data,/var/spool/kafka/c/data,/var/spool/kafka/d/data,/var/spool/kafka/e/data,/var/spool/kafka/f/data,/var/spool/kafka/g/data,/var/spool/kafka/h/data,/var/spool/kafka/i/data,/var/spool/kafka/j/data,/var/spool/kafka/k/data,/var/spool/kafka/l/data [19:19:04] elukey: it doesn't know anything about the size of the partitions [19:19:23] so, when it gets them assigned, i think it just does it randomly/round robin [19:19:45] yeah but it never happened!
Maybe we were super lucky [19:19:47] ufff [19:20:13] yeah, each logdir has the same number of partitions [19:20:36] elukey: probably its more because the topics and partitions were created together when the cluster was a whole [19:20:36] so [19:20:41] you create a webrequest topic with 12 partitions [19:20:58] its going to randomly round robin assign those to dirs [19:21:07] but if you all of a sudden create all partitions for all topics [19:21:09] it will do the same [19:21:14] but not consider which topic [19:21:18] it only considers the number of partitions i think [19:21:44] elukey: i'm making some coffee, be sitting down in a min and then maybe we can try this? [19:22:33] ottomata: I'd need to step away from keyboard to have dinner with family in ~30/40 min, otherwise later on when I am back, sorry :( [19:23:08] ok no worries [19:23:14] we can do tomorrow maybe elukey? [19:23:20] yeah it seems fine now [19:23:38] ok, i'll ACK the alert [19:25:04] joal: I've set debug for a bit on druid1002, and something useful came up, but still haven't got the time to check logs in depth. Will do tomorrow morning! [19:25:53] ebernhardson: Thanks for the help with webrequests -> queries the other day. [19:26:29] I think I have created the table I was hoping for. [19:27:20] ebernhardson: I have some rough desktop data, and wonder if you have a way to sanity check if it's in the correct range. [19:40:55] elukey: can you tell me where you stored the logs? [19:42:36] Hi joal and ottomata: Question for you about cluster resources. [19:43:02] Hi Si [19:43:07] Hi Shilad sorry [19:43:11] np! [19:43:34] joal: I have been monitoring the cluster usage to see when I should run jobs. [19:43:59] (I now have searches in sessions, so I want to rebuild them) [19:44:02] Shilad: sure, what specifically? [19:44:07] (Yay!) [19:44:25] I have seen a few recurring jobs that use a ton of vcores. [19:44:38] Wondering if this is intentional, or if it just expands because of job size. [19:44:56] depends.
Hive specifically will take whatever it can get but also return things fairly generously [19:45:10] For example, one is running right now. [19:45:17] spark is a little worse on that aspect, i generally have to give it explicit instructions about max resources or it will take everything and keep it [19:45:35] (A hive job) It seems to run every hour or so... maybe more often? [19:46:00] I want to launch a series of individual jobs, one per day of logs, to rebuild the session tables. [19:46:17] hmm, flemmerich's stuff is research, rather than things in our repos, so hard to check what it is exactly [19:46:35] It seems to run pretty regularly. [19:46:55] I can limit the cores in my jobs to 16 or 32. In practice the job is only using that many for a short amount of time. [19:47:05] I am wondering about scheduling these. [19:47:21] Should I just run them (in the nice queue), and let yarn worry about it? [19:47:36] Analytics, Analytics-Wikistats: Display of radio buttons in Wikistats 2 is somewhat confusing - https://phabricator.wikimedia.org/T183185#3849231 (Nuria) >(I was actually hoping to use OOJS UI here but haven't had time to modularize it properly). I think it should not be our task to modularize OOJS UI, l... [19:47:37] generally i find the cluster is least busy afternoon to evening, PST. Generally running things in the nice queue and letting yarn figure it out is good enough [19:47:55] joal: /var/log/druid/historical.log, some of it has debug, but mostly for jmx/exporter stuff.. I think I didn't catch a query timeout logged [19:48:09] (I did it only on druid1002, tomorrow we can do it to all the brokers) [19:48:14] (err historicals) [19:48:49] Okay. Thanks, joal and ebernhardson. [19:49:01] Shilad: np, thanks ebernhardson :) [19:49:11] noted elukey - Thanks [19:49:45] ebernhardson: Re my earlier question about searches. [19:50:08] I created one day of desktop sessions (in hive table shilad.sessions). [19:50:31] Less than 3% of sessions have a text search.
Any idea if that is correct? [19:50:57] joal: we'll find the issue tomorrow! AQS team back :D [19:51:04] :) [19:51:13] elukey: FYI, i just tested our partition dir move idea in mw vagrant [19:51:14] works great. [19:51:56] Shilad: unfortunately, we don't have metrics on the % of sessions that perform searches. Our (sampled) user tracking starts when they type the first letter into a search box or land on special:search results (if they, for example, came to search from a link they followed). [19:52:10] ottomata: yyyyyyeeeeeeaaaahhhhhhh [19:52:12] \o/ [19:52:28] this is an awesome news [19:52:36] elukey: very interesting debug logs indeed! let's discuss tomorrow :) [19:52:38] elukey: i'd feel good about doing it now [19:52:42] Shilad: generally though, we know there are around 1M full text searches per day on enwiki, so comparing page views to full text searches 3% doesn't seem out of the ballpark [19:52:47] if you want, otherwise if you are leaving lets wait til tomorrow [19:52:48] ebernhardson: Got it. Well... maybe this is useful then? [19:52:51] Shilad: 1M desktop full text searches i mean [19:53:00] ebernhardson: I have 6M searches across all wikis for Dec 1st. [19:53:04] So sounds about right. [19:53:13] ottomata: yeah I am leaving now, will try to ping you back if I come back soon, otherwise tomorrow morning? [19:54:06] (/me afk) [19:55:53] ok sounds good [19:55:54] ottomata: I can be here in support, but with no root power :) [19:56:31] eberhardson: Breakdown of num searches per session: 80M have 0 searches, 2M have 1, 700K have 2, 175K have 3, 97K have 4, roughly 153K have five or more. [20:00:27] Shilad: lemme see how that compares to the direct query logging [20:00:39] ottomata: the bytes in graph for kafka is per second? https://grafana.wikimedia.org/dashboard/db/kafka?refresh=5m&orgId=1 [20:01:37] ottomata: answering my own question: yes [20:02:11] ottomata: so 30MB per sec [20:04:21] haha wow this slack convo... 
[20:04:28] yes [20:06:45] Shilad: i compared against the backend logs and they roughly agree, but vary a bit. In part though this could be because our backend logs contain additional things. I see 2.68M sessions with 1 query, 339k with 2, 99k with 3, 40k with 4, and i didn't have it sum up the >= 5 [20:07:20] these are with array_contains(requests.querytype, 'full_text') and source='web' for dec 1 [20:10:35] ebernhardson: Excellent! Thank you for checking. [20:11:01] Shilad: one thing potentially, are you looking at the response code for search requests? If a web request with search=??? query parameter returns a 3xx (redirect) http code to send a user directly to a page those wouldn't be counted in my numbers. I could probably adjust the query to count them though. My count should only contain those that land on the full text results page [20:11:58] ebernhardson: Yes, I am including those. I wasn't exactly sure where to draw the line on what constitutes a text search. [20:12:17] Analytics, EventBus, Services (later): Document ChangeProp and EventBus monitoring - https://phabricator.wikimedia.org/T171533#3849284 (Pchelolo) Began making the documentation page for the whole system on [[ https://wikitech.wikimedia.org/wiki/Kafka_Job_Queue | WikiTech ]]. It's still an fairly earl... [20:12:44] Shilad: at least for the search team, we usually consider the redirect to be an autocomplete hit (although it doesn't technically have to come from autocomplete, some users type full titles). [20:14:16] ebernhardson: Interesting. Lots of the redirects seem to fall in the category of things I want to capture: User types "Obama" -> Barack Obama [20:14:35] Shilad: i suppose worth noting, when a user selects something from the autocomplete drop down it still submits it to search.
As long as the submitted query is near enough (light case folding and whatnot, always true when selecting a result from autocomplete) to a real title they get redirected to that page [20:14:37] ebernhardson: Maybe I'll try it both ways and see how it works. [20:14:49] Shilad: it's certainly still a search, but sometimes it's worth distinguishing autocomplete search and full text search [20:15:05] Shilad: you might find this interesting: https://phabricator.wikimedia.org/F11851691 [20:16:11] ebernhardson: This is great! [20:17:59] Analytics-Kanban, Operations, hardware-requests, ops-eqiad: Decommission db104[67] - https://phabricator.wikimedia.org/T181784#3849298 (Cmjohnson) Disks are wiped [20:18:09] Analytics-Kanban, Operations, hardware-requests, ops-eqiad: Decommission db104[67] - https://phabricator.wikimedia.org/T181784#3849299 (Cmjohnson) [20:18:57] ebernhardson: Should I move this conversation to another search-related IRC channel? If so, which one? [20:20:00] Shilad: we talk search in #wikimedia-discovery. Typically it's pretty quiet through the US day though, having some activity in the morning when EU and US overlap [20:21:23] ebernhardson: Thanks. I'll move over there now. [20:22:53] It was nice to eavesdrop on the search side :) Thanks Shilad and ebernhardson :) [20:23:40] ebernhardson: I'm having trouble connecting to wikimedia-discovery for some reason. [20:23:57] Maybe I'll just ask my "last" question here: [20:24:05] ebernhardson: I'm wondering if the AC -> full-text conversions in https://phabricator.wikimedia.org/F11851691 are HTTP 3xxs? [20:24:15] Or are those counted in a different part of the diagram? [20:25:40] (CR) Joal: [C: 1] "Looks good :) Thanks fdans!" [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521) (owner: Fdans) [20:26:37] Shilad: should be independent, but would have to double check with mpopov (irc: bearloga) to be sure.
That is based off different data than these backend logs, it's frontend javascript doing the logging so it has built in knowledge of if something was autocomplete or fulltext or whatever, doesn't have to derive it from information in the logs [20:26:58] ebernhardson: Thanks! [20:44:44] ebernhar|lunch Shilad: that's correct, the AC -> FT conversion is from reconstructing sessions with event logging data and checking search queries + sequence of events + timestamps. [20:48:33] Analytics-Kanban: Alert on age of backups on analytics1002 - https://phabricator.wikimedia.org/T182327#3820110 (Ottomata) a:Ottomata [21:02:13] PROBLEM - Age of most recent Analytics meta MySQL database backup files on analytics1002 is CRITICAL: CRITICAL: 0/1 -- /srv/backup/mysql/analytics-meta: Does not exist [21:03:52] PROBLEM - Age of most recent Hadoop NameNode backup files on analytics1002 is CRITICAL: CRITICAL: 0/1 -- /srv/backup/hadoop/namenode: Does not exist [21:08:39] bearloga: Thanks! [21:08:50] ^^ it works! but isn't yet true :) [21:17:29] ebernhardson and bearloga: Do you know anything about search-redirect.php? [21:18:05] I've been including it in my search counts, but wonder if it will double count because it leads to a "normal" search. [21:19:47] It looks like it leads to Special:Search [21:21:22] Shilad: heh, i've no clue what that is :) we have so much old stuff that just lives forever ... [21:21:33] Hah! [21:23:04] ebernhardson: I think you are right about removing 3xxs. I looked at the top queries for Dec 1st and #5 was "Justice League (film)".
[21:23:28] It's hard to argue that is a natural language phrase somebody would use to describe Justice League :) [21:25:11] Shilad: heh, yea :) Almost certainly that was chosen from the autocomplete [21:29:13] RECOVERY - Age of most recent Analytics meta MySQL database backup files on analytics1002 is OK: OK: 1/1 -- /srv/backup/mysql/analytics-meta: 0hrs [21:30:02] RECOVERY - Age of most recent Hadoop NameNode backup files on analytics1002 is OK: OK: 1/1 -- /srv/backup/hadoop/namenode: 21hrs [21:45:51] (PS6) Shilad Sen: Spark job to create desktop page ids viewed and searches performed in each session. [analytics/refinery/source] (nav-vectors) - https://gerrit.wikimedia.org/r/383761 (https://phabricator.wikimedia.org/T174796) [21:46:10] Gone for tonight - Bye a-team [21:47:36] Analytics, Analytics-Cluster, Analytics-EventLogging: Move EventLogging analytics processes to Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T183297#3849554 (Ottomata) p:Triage>Normal [21:52:35] ottomata: o/ [21:52:38] I am back [21:52:46] if you want we can make the kafka experiment [22:05:52] milimetric: i was going to add a wikistats 2 link to analytics.wikimedia.org [22:06:08] milimetric: i can redeploy dashiki too to add footnote [22:08:05] nuria: oh I already added the link to the cc0 license and deployed [22:09:39] ottomata: all right we'll do it tomorrow then :) [22:09:41] * elukey afk! [22:12:45] oh elukey sorry! [22:12:48] missed your ping [22:12:51] ya lets do tomorrow [22:15:18] milimetric: oohhh i see it [22:15:28] milimetric: thank you!
[22:16:50] (PS1) Nuria: Add link to wikistats to analytics.wikimedia.org [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/399309 (https://phabricator.wikimedia.org/T182904) [22:17:16] milimetric: if you take a look at this one i can deploy too: https://gerrit.wikimedia.org/r/#/c/399309/ [22:32:58] Analytics: Decomission old analytics kafka cluster - https://phabricator.wikimedia.org/T183303#3849703 (Nuria) [22:34:38] Analytics, Patch-For-Review: Update druid to latest release (0.10) - https://phabricator.wikimedia.org/T164008#3849723 (Nuria) [22:35:20] Analytics-Kanban: Put in service 8 new hadoop nodes - https://phabricator.wikimedia.org/T182926#3849725 (Nuria) a:elukey [22:36:17] Analytics-Kanban: Refresh zookeeper nodes on eqiad - https://phabricator.wikimedia.org/T182924#3849738 (Nuria) [22:38:41] Analytics: Private geo wiki data in new analytics stack - https://phabricator.wikimedia.org/T176996#3849753 (Nuria) a:Milimetric [22:39:33] Analytics, Analytics-EventLogging: Purge refined JSON data after 90 days - https://phabricator.wikimedia.org/T181064#3849755 (Nuria) [22:40:22] Sorry, nuria, still omw from the doctor, will look tonight [22:40:57] milimetric: argh sorry [22:41:04] milimetric: i thought you were done [22:41:24] np, took forever [22:43:47] Analytics-Kanban: Hadoop expansion Hardware - https://phabricator.wikimedia.org/T182029#3849778 (Nuria) [22:43:49] Analytics-Kanban: Put in service 8 new hadoop nodes - https://phabricator.wikimedia.org/T182926#3849780 (Nuria) [22:44:45] Analytics-Kanban, Analytics-Wikistats: Wikistats2: The granularity selector does not work for tops metrics - https://phabricator.wikimedia.org/T180266#3849782 (Nuria) [22:46:34] Analytics-Kanban, Analytics-Wikistats: Wikistats2: The granularity selector does not work for tops metrics - https://phabricator.wikimedia.org/T180266#3751801 (Nuria) I do not get what option do you have to change granularity in
top metrics? Maybe screenshot? [22:49:13] Analytics-Kanban, Analytics-Wikistats, I18n: Move non-SI prefixes to user- or locale-specific interface - https://phabricator.wikimedia.org/T179906#3849789 (Nuria) p:Triage>High [22:50:05] Analytics-Kanban: Beta: Wikistats split webpack bundle - https://phabricator.wikimedia.org/T181841#3849791 (Nuria) [22:50:09] Analytics-Kanban, Patch-For-Review: Wikistats Beta: split webpack bundle - https://phabricator.wikimedia.org/T182601#3849793 (Nuria) [22:53:10] Analytics-Kanban, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301#3753283 (Nuria) These are repos we use: analytics/analytics.wikimedia.org analytics/aqs analytics/aqs/deploy analytics/camus analytics/kafkatee... [22:53:20] Analytics, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301#3849799 (Nuria) [22:54:51] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#2192798 (Nuria) Can we go ahead and close ticket? [22:56:37] Analytics, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301#3849804 (hashar) [22:56:50] Analytics, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301#3753283 (hashar) [22:57:27] Analytics, Continuous-Integration-Config: Add CI to all analytics/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180301#3753283 (hashar) Thanks @Nuria I guess I will dig in them sometime after the holidays.
[22:58:12] Analytics-Kanban: Making geowiki data public - https://phabricator.wikimedia.org/T131280#3849827 (Nuria) [22:58:14] Analytics-Kanban: Read the python code and design the Hadoop version - https://phabricator.wikimedia.org/T182944#3849826 (Nuria) [22:58:38] Analytics-Kanban: Read the python code and design the Hadoop version - https://phabricator.wikimedia.org/T182944#3839376 (Nuria) [22:58:40] Analytics: Private geo wiki data in new analytics stack - https://phabricator.wikimedia.org/T176996#3849829 (Nuria) [23:03:59] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Handle error due to lack of data - https://phabricator.wikimedia.org/T182224#3849838 (Nuria) Open>Resolved [23:04:12] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review, User-Elukey: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3849840 (Nuria) Open>Resolved [23:04:29] Analytics-Kanban, Patch-For-Review: EventLogging purging alarm - https://phabricator.wikimedia.org/T182978#3849843 (Nuria) Open>Resolved [23:05:57] Analytics-Kanban, Patch-For-Review: Provide oozie job running ClickStream spark job regularly - https://phabricator.wikimedia.org/T175844#3849844 (Nuria) Open>Resolved [23:06:09] Analytics-Kanban, Patch-For-Review: Wikistats Beta: split webpack bundle - https://phabricator.wikimedia.org/T182601#3849845 (Nuria) Open>Resolved [23:06:24] Analytics-Kanban, Analytics-Wikistats: Add link to new wikistats 2.0 to wikistats 1.0 pages - https://phabricator.wikimedia.org/T182001#3849846 (Nuria) Open>Resolved [23:08:40] Analytics-Kanban: Load test druid backend via siege - https://phabricator.wikimedia.org/T182603#3849848 (Nuria) Open>Resolved [23:08:58] Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3849851 (Nuria) Open>Resolved [23:09:03] Analytics, Discovery, Wikidata,
Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3849852 (Smalyshev) Thinking about it, I don't think we ever would need more that hourly resolution for anything related to queries... [23:09:49] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: The laptop attempts a vertical take-off when loading ii.wikipedia.org - https://phabricator.wikimedia.org/T182700#3849860 (Nuria) Open>Resolved [23:10:13] Analytics-Kanban, Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3849861 (Nuria) Open>Resolved [23:10:52] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Bug: Bar Chart disappears - https://phabricator.wikimedia.org/T182461#3849862 (Nuria) Open>Resolved [23:11:11] Analytics-Kanban: Update mediawiki_history_reduced oozie job loading AQS druid backend - https://phabricator.wikimedia.org/T178504#3849863 (Nuria) Open>Resolved [23:11:40] Analytics-Kanban, Patch-For-Review: Check data from new API endpoints against existing sources - https://phabricator.wikimedia.org/T178478#3849864 (Nuria) Open>Resolved [23:12:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Enable more accurate smaps based RSS tracking by yarn nodemanager - https://phabricator.wikimedia.org/T182276#3849865 (Nuria) Open>Resolved [23:12:46] Analytics-Kanban, Patch-For-Review: Top is ordering items by name and not by value - https://phabricator.wikimedia.org/T182772#3849866 (Nuria) Open>Resolved [23:14:46] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3849867 (Nuria) @Smalyshev We like to default to public if possible, the more eyes on the data the more useful it can be.
[23:15:10] Analytics-Kanban: Fix mediawiki-history page reconstruction bug (deletes and restores) - simple patch - https://phabricator.wikimedia.org/T179690#3849868 (Nuria) Open>Resolved [23:16:02] Analytics-Kanban, Patch-For-Review: Add documentation for .m suffix code to pagecounts-ez doc page - https://phabricator.wikimedia.org/T180452#3849870 (Nuria) Open>Resolved [23:16:19] Analytics-Kanban: Fix mediawiki history page reconstruction bug (similar timestamps) - https://phabricator.wikimedia.org/T179074#3849871 (Nuria) Open>Resolved [23:31:31] Analytics, Pageviews-API: Wikistats Bug : wrong data in Top viewed articles (about frwiki) - https://phabricator.wikimedia.org/T182954#3849889 (Nuria) duplicate>Invalid [23:37:41] Analytics-Kanban, Analytics-Wikistats: Add clear definitions to all metrics, along with links to Research: pages - https://phabricator.wikimedia.org/T183261#3848328 (Nuria) mmm.. more like we need an FAQ rather than super detail explanations. started one here: https://wikitech.wikimedia.org/wiki/Analytic... [23:46:48] Analytics-Kanban, Analytics-Wikistats: Add clear definitions to all metrics, along with links to Research: pages - https://phabricator.wikimedia.org/T183261#3849907 (Nuria) Not to say we do not need to label stuff but i think a short FAQ to the point might be more useful than long metric descriptions. [23:59:22] (CR) Milimetric: [V: 2 C: 2] "Totally didn't need to wait for me on this! :)" [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/399309 (https://phabricator.wikimedia.org/T182904) (owner: Nuria)