[00:55:24] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Partially purge MobileWikiAppiOSUserHistory eventlogging schema - https://phabricator.wikimedia.org/T195269 (fdans) Open>Resolved
[05:30:39] Analytics, Contributors-Analysis, Product-Analytics: Decommission edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (Nuria)
[06:24:17] Analytics, Analytics-Kanban: Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Nuria) {F23713534} indeed this works, see pretty slider in turnilo
[06:24:42] Analytics, Analytics-Kanban: Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Nuria) Code: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445553/
[06:25:05] nuria_: good evening :)
[07:20:05] Good morning elukey :)
[07:31:48] o/
[07:53:17] elukey: if you have time, can we test Nuria's patch for turnilo?
[07:55:48] sure
[07:55:59] I just fixed the commit msg, jenkins wasn't happy about it
[07:57:12] do you want me to merge?
[08:00:10] elukey: I have a doubt about the validity, so whether we merge or deploy manually is for you to decide
[08:00:43] elukey: Reassigning manually a dimension to number when it is actually indexed as string seems incorrect to me - But maybe turnilo is smart enough and generates correct requests?
[08:00:49] Analytics, EventBus, Services (next), Wikimedia-Incident: Clean up leftover topics - https://phabricator.wikimedia.org/T199510 (mobrovac) p:Triage>Normal
[08:01:51] Analytics, EventBus, Services (next), User-Elukey, Wikimedia-Incident: Clean up leftover topics - https://phabricator.wikimedia.org/T199510 (elukey)
[08:06:18] (PS1) Sahil505: Changed color shades for sections & charts [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445574 (https://phabricator.wikimedia.org/T183184)
[08:06:21] joal: yeah looks strange
[08:06:25] let's apply it manually
[08:07:04] +1 elukey
[08:07:56] (CR) Sahil505: [C: -1] "WIP" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445574 (https://phabricator.wikimedia.org/T183184) (owner: Sahil505)
[08:08:23] datacube is webrequest_sampled right?
[08:08:31] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (Ryasmeen) Verified this issue on production. Looks good to me. However, found ano...
[08:08:34] webrequest_sampled_128 yes
[08:09:36] done
[08:09:41] checking
[08:11:52] elukey: it works :)
[08:12:17] elukey: it seems that plywood is smart enough to convert numeric-range queries to string regexp :)
[08:12:58] ack, merging then
[08:14:22] elukey: And the grouping is actually pretty cool !
[08:14:42] elukey: Ok for me, we can merge :)
[08:17:46] done :)
[08:18:27] Thanks elukey :)
[08:59:32] Any idea why unique devices for wikidata increased 170% in June ? https://stats.wikimedia.org/v2/#/wikidata.org
[09:00:08] seems a bit .... unexpected
[09:01:47] (PS8) Jonas Kress (WMDE): Introduce oozie job to schedule generating metrics for Wikidata co-editors [analytics/refinery] - https://gerrit.wikimedia.org/r/443409 (https://phabricator.wikimedia.org/T193641)
[09:12:42] interesting addshore !
[09:12:49] thats what I thought :p
[09:13:17] Is there a process for asking it to be investigated? should I file a phab ticket or?
[09:13:52] phab ticket is the way
[09:13:59] will do, thanks!
[09:14:07] I can already tell you the spike comes from the offset part of the computation
[09:14:54] addshore: We compute uniques from 2 values: uniques-by-cookies (with last-access-cookie), and what we call unique offset, based on fingerprinting
[09:15:03] the spike we see here is from fingerprinting
[09:15:43] Analytics, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Addshore)
[09:15:47] i can even tell you that it comes from a specific period
[09:16:04] interesting
[09:16:19] thats still one hell of a spike
[09:16:20] addshore: https://gist.github.com/jobar/87b2a7f95b4aff4bdf504c5d6023e6d5
[09:17:44] addshore: anything special from May-30 to June-20 (with a special look between May-30 and June-4)?
[09:17:52] *looks*
[09:19:24] (CR) Joal: [C: 1] "LGTM, let's wait for @nuria before merging." [analytics/refinery] - https://gerrit.wikimedia.org/r/443409 (https://phabricator.wikimedia.org/T193641) (owner: Jonas Kress (WMDE))
[09:20:34] I wonder if this has something to do with us turning off the special:ItemDisambiguation page and people potentially finding another way around it?
[09:20:39] *thinks more*
[09:21:18] joal: do requests / pageloads increase? or is it just a spike in devices those page loads are coming from?
[09:22:24] addshore: no spike in pageviews
[09:23:07] and unique devices only comes from page views? not from other requests? (api and what not) ?
[09:24:39] addshore: pageviews only [09:24:55] super interesting, well, I cant think of anything that will have done that [09:25:07] addshore: not true sorry - pageviews OR redicrect-to-pageviews [09:26:05] joal: split by country code, they all come from 1 country [09:27:22] Nice addshore :) [09:27:35] all because you taught me how to use turnilo :P [09:27:41] :D [09:27:55] now i can look at page views from that country for that time period ;) [09:28:46] addshore: Indeed :) [09:28:55] addshore: SPIKEEEEE ! [09:29:28] All Chromium [09:30:13] * addshore forgot pageviews_daily doesnt have a split by pages :P [09:30:17] * addshore moves to the next data set [09:33:23] doesn't have it either addshore :) [09:34:20] and a split by uri_path doesnt seem to help [09:34:51] With the split: [09:34:53] https://turnilo.wikimedia.org/#webrequest_sampled_128/3/N4IgbglgzgrghgGwgLzgFwgewHYgFwhLYCmAtAMYAWcATmiADQgYC2xyOx+IAomuQHoAqgBUAwoxAAzCAjTEaUfAG1QaAJ4AHLgVZcmNYlO4B9E3sl6ASnGwBzYkryqQUNLXoEATAAYAjAAcpD4ArKReISI+PnjRsT4AdNE+AFqSxNgAJty+gcEAbOH5UTFx0UnRaQC+ALpVDGpaOq5oNBD2kobGBOSYMNit6ia9mfogcOQYONztkmCIMI4qICIAEiA1TNiYnlKIUMT1jdrcbm0dBkbcMG0mlJhukhNTuASzTPMIi07KIADuAISfwgAGsIJl0HAEpgaHYNlsdvg9ggDrUmFBNEg0D9js0zu04UxMhA2NgoFhXiAbhATJp0JRJF [09:34:53] AYZ5QF1uJQINjJMTDJMKdxRlByBliR16oQSZz8NgYAgEEdmE1uBYiSSMuTpgQzCqQHYaLZZbROepuAAFPwAEUZzPwrKuWvMau5EF5L24cCFIoJIHFSBYUrwMrlaJAbA9NyWeFA0AAsrKMEj9sR0QSEM0OVymCx4xBlhmlHUmJp2iRMha1WSKT9NiBi9hSwBlG0EfOSNMOLK2kDE0kaynU2n0tuSzwhcUcuyUJCTzxB+VAA [09:35:00] ooh long link pasting.... 
[09:35:19] http://goo.gl/ZjrHR7
[09:35:31] remove the split and the spike comes back
[09:39:11] looks like it all comes from a single ISP
[09:40:54] * elukey likes seeing people using isp data in turnilo \o/
[09:41:18] :D Well, I started with Ips and then realized that they were all very similar, hadn't seen the ISP bit before :)
[09:42:10] still can't make a split for uri_path show anything in particular, so looks like its just hundreds of thousands of requests from that ISP being detected as different devices for some reason
[09:42:32] addshore: Trying to understand that bit
[09:43:37] joal: there are a shit tonne of different UAs now, the pageviews table grouped them all as Chromium I think, but in the webrequest table the UAs all have slightly different versions listed, I guess that accounts for the devices
[09:43:43] addshore: we added ispdata country etc.. only recently, and you are the first one that I am seeing use it :)
[09:43:57] elukey: and I'm using them both :D
[09:43:59] addshore: Then it's the thing
[09:44:59] cool, so, how much of that am I allowed to paste / explain in a public phab ticket then? :P Am I allowed to use UA examples? / say the country / say the ISP?
[09:46:27] Analytics, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Addshore) a:Addshore
[09:55:50] addshore: this is a tricky question :(
[09:56:28] I guess I can write up a kind of description leaving out all of the details, and then put the details in an NDA covered paste?
[09:58:33] addshore: all the explanation can be made public, we probably don't want to give examples of the UA I think
[09:58:53] ack :)
[10:04:40] joal: https://wikitech.wikimedia.org/w/index.php?title=Incident_documentation/20180711-kafka-eqiad&redirect=no#Kafka_heap_size_considerations
[10:04:50] whenever you have time let me know if they make sense
[10:04:51] addshore: https://gist.github.com/jobar/bdba15a530767fdb7ae20ec01564fe2e
[10:05:02] * joal reads
[10:05:13] interesting
[10:05:24] addshore: gist data extracted from https://wikimedia.org/api/rest_v1/metrics/pageviews/top-by-country/wikidata.org/all-access/2018/01
[10:06:42] Analytics, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Addshore) So, thanks to @JAllemandou for reminding me that turnilo should be the thing I use to investigate this. It looks like the main spike was between M...
[10:06:45] Analytics, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Addshore) a:Addshore>None
[10:07:00] joal: I added my comment with my bit of looking, would be great if you could add a comment with your findings too!
[10:07:18] so that looks like this is all new traffic?
[10:08:22] addshore: I think it is some kind of automatic crawling that doesn't declare itself
[10:08:27] I still wonder if it is legit traffic, or some script rotating its UAs
[10:08:36] joal: thats my thought as well
[10:09:01] * addshore goes to hue quickly :P
[10:09:28] addshore: let me show you some swap goodies if you may :)
[10:09:42] swap goodies? :P
[10:09:51] SWAP :)
[10:10:17] that is SWAP?
:P
[10:10:45] addshore: https://wikitech.wikimedia.org/wiki/SWAP
[10:10:55] oooh
[10:11:14] addshore: You'll never go to hue for this anymore :D
[10:11:20] hahahhahaaa
[10:11:27] * addshore has a meeting in 4 mins :P
[10:11:38] addshore: and there is a ssh tunnel that'll be always up :)
[10:11:59] addshore: later then
[10:12:15] addshore: leaving me time to prep for the show ;)
[10:13:08] well, i have logged in :P
[10:13:12] well, it is loading
[10:30:32] (PS2) Sahil505: Changed color shades for sections & charts [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445574 (https://phabricator.wikimedia.org/T183184)
[10:37:27] joal: Is there some way to link to the analysis from wikistats for the spike? I know that can be done for other dashboards
[10:37:50] addshore: I don't think I get what you want
[10:38:02] annotations! thats the word
[10:38:09] Ah !
[10:38:12] to indicate why there is a spike :)
[10:38:38] Annotations in wikistats-v2 are either already there or almost, so it'll be possible :)
[10:38:50] cool!
[10:42:10] Analytics, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Addshore) Open>Resolved a:Addshore It looks like this might be some bot or script scraping stuff that isn't identified as a script in any way, and...
[10:42:54] joal: I also added a note about mirror maker
[10:43:13] in that case i think we were incredibly lucky that empty topics did not get replicated over to codfw
[10:43:14] Analytics, WMDE-Analytics-Engineering, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Addshore)
[10:44:35] (CR) Sahil505: [C: -1] "WIP" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/445574 (https://phabricator.wikimedia.org/T183184) (owner: Sahil505)
[10:48:31] elukey: very clear as usual :)
[11:36:46] * elukey lunch!
[12:47:22] o/ are the eventlogging-processor error logs shipped to hadoop at all?
[12:47:41] (i thought i'd have a go at verifying https://phabricator.wikimedia.org/T196904)
[12:57:35] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) a:phuedx>Ottomata I don't have access to `eventlog1002` so I don'...
[13:00:09] phuedx: o/ not that I know of
[13:00:17] do you need anything to check on eventlog1002?
[13:01:08] yeah -- there was a low-frequency processing error occurring where a VirtualPageView event couldn't be processed because it was too long
[13:01:37] we (readers web) deployed a fix for this as part of yesterday's train and i wanted to check if the error had disappeared
[13:01:51] i checked the event.eventerror table but to no avail
[13:05:59] checking in the logs
[13:06:05] <3
[13:07:27] so I am grepping grep -rni 'Unable to process.*VirtualPageView' eventlogging-processor@client-side-*
[13:07:40] and sort by time
[13:07:44] let's see the last occurrence
[13:08:47] so I see one at Jul 13 13:06
[13:08:56] UTC time, so basically now
[13:09:30] (checking the full entry in the logs)
[13:09:54] phuedx: yeah it seems like the one posted in the task
[13:10:17] just gone over the ticket to find that the error was occurring ~2 times a minute
[13:10:59] elukey: there's not much that we can do about clients that haven't refreshed and gotten the latest version of page previews. maybe i'll check back in a couple of days too?
[13:11:41] phuedx: yeah sure, I have the grep handy in my history so I can quickly report anomalies
[13:11:51] ta <3
[13:11:52] let's recheck on monday
[13:11:54] :)
[13:12:07] do you mind if i dump this chat in the phab task verbatim?
[13:14:41] sure!
[13:15:16] I can give you the last timings of the occurrences of the issue
[13:15:22] Jul 13 12:51
[13:15:22] Jul 13 12:52
[13:15:22] Jul 13 12:53
[13:15:22] Jul 13 12:53
[13:15:22] Jul 13 12:58
[13:15:24] Jul 13 12:59
[13:15:27] Jul 13 13:05
[13:15:29] Jul 13 13:06
[13:15:32] Jul 13 13:09
[13:15:34] Jul 13 13:14
[13:18:30] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) @elukey was good enough to check the logs on `eventlog1002`: ``` 14:00:1...
[13:21:50] Analytics, Analytics-Kanban, Patch-For-Review: Add interface::add_ip6_mapped { 'main': } to all the Analytics hosts - https://phabricator.wikimedia.org/T199180 (elukey) @ayounsi only the PTRs right? Or should it be the case to add the AAAA too? I am a bit afraid of seeing Hadoop starting use ipv6 aft...
[13:23:08] Analytics, Analytics-Kanban, Patch-For-Review: Add interface::add_ip6_mapped { 'main': } to all the Analytics hosts - https://phabricator.wikimedia.org/T199180 (ayounsi) Only PTR indeed. AAAA is at your discretion.
[14:06:54] (PS1) Fdans: Suppresses subprocess stdout and stderr to avoid false alarms [analytics/refinery] - https://gerrit.wikimedia.org/r/445621 (https://phabricator.wikimedia.org/T198966)
[14:12:17] (CR) Joal: [C: 1] "LGTM !" [analytics/refinery] - https://gerrit.wikimedia.org/r/445621 (https://phabricator.wikimedia.org/T198966) (owner: Fdans)
[14:21:04] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey) So now on stat* and notebook* we have a /etc/gitconfig rule that forces all git users to use the http[s] proxy. The conf1006 fl...
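Editor's note: the ten occurrence times pasted at 13:15 can be compared with the "~2 times a minute" figure quoted from the ticket at 13:10. The snippet below is only a rough sketch of that arithmetic, not something run in the channel; the variable names are made up for illustration, and ten log entries are a very small sample.

```python
from datetime import datetime

# Occurrence times as pasted in the channel at 13:15 (UTC).
occurrences = [
    "Jul 13 12:51", "Jul 13 12:52", "Jul 13 12:53", "Jul 13 12:53",
    "Jul 13 12:58", "Jul 13 12:59", "Jul 13 13:05", "Jul 13 13:06",
    "Jul 13 13:09", "Jul 13 13:14",
]

# The year is not in the log lines; strptime defaults it, which is fine
# because only differences between timestamps are used.
times = [datetime.strptime(t, "%b %d %H:%M") for t in occurrences]

# Minutes between the first and last occurrence in the sample.
span_minutes = (times[-1] - times[0]).total_seconds() / 60

# Average gap between consecutive occurrences.
mean_gap_minutes = span_minutes / (len(times) - 1)

print(f"{len(times)} errors over {span_minutes:.0f} min, "
      f"mean gap {mean_gap_minutes:.2f} min")
```

On this sample the errors are a bit more than two minutes apart on average, against roughly two per minute reported in the ticket before the fix.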
[14:42:14] (PS2) Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600)
[14:59:56] joal: hello! i did test turnilo's patch
[15:00:13] joal: but to truly work needs a restart
[15:00:27] cc elukey on turnilo restart
[15:02:53] nuria_: change already live :)
[15:09:00] (PS3) Fdans: Adds empty dir removal to hive partition dropping jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600)
[15:20:42] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (mforns) @phuedx @elukey I think a minimal amount of errors is expected. AFAIK we...
[15:24:05] hey nuria_ - I asked elukey for a double check because it seemed weird to be able to have a numeric dimension while indexed as text - But it works :)
[15:31:27] joal: yaya, i started my own turnilo in thorium which .. ahem , might still be running and tested it cause i thought the same thing
[15:52:11] Analytics, Analytics-EventLogging, Page-Previews, Readers-Web-Backlog, Readers-Web-Kanbanana-Board: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904 (phuedx) >>! In T196904#4422959, @mforns wrote: > So, the errors should be a lot l...
[15:56:11] (CR) Mforns: [V: 2 C: 2] "Niceee LGMT!"
[analytics/wikistats2] - https://gerrit.wikimedia.org/r/444012 (https://phabricator.wikimedia.org/T198510) (owner: Sahil505)
[16:01:22] ping fdans
[16:39:55] Analytics, MinervaNeue, Readers-Web-Backlog, Design: Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (Tbayer)
[17:12:48] Analytics, Analytics-Kanban, Patch-For-Review: Add interface::add_ip6_mapped { 'main': } to all the Analytics hosts - https://phabricator.wikimedia.org/T199180 (ayounsi) Maybe out of scope here, but looking at something else I noticed that notebook1004 (in the analytics vlans) have an autoconfigured IP.
[17:37:09] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (ayounsi) In addition to T198623#4415961 We have notebook1003 and notebook1004 sending `ICMPv6 Multicast Listener Report` every 2 minute...
[17:55:27] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey) https://www.ietf.org/proceedings/50/I-D/nfsv4-rpc-ipv6-00.txt ``` IPv6 enabled RPC service must join a well known multicast...
[18:00:17] Analytics, MinervaNeue, Readers-Web-Backlog, Design: Sticky header instrumentation - https://phabricator.wikimedia.org/T199157 (ovasileva)
[18:21:05] Analytics, Analytics-Kanban, Patch-For-Review: Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Nuria) browser versions cannot be cleanly converted to numbers cause data has '-' . to be clear, it is possible to filter those in turnilo...
[18:21:18] Analytics, Analytics-Kanban, Patch-For-Review: Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Nuria) Please reopen if needed
[18:21:31] Analytics, Analytics-Kanban, Patch-For-Review: Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Nuria) Open>Resolved
[18:33:02] joal: let me know what you think of this one
[18:35:06] nuria_: you were talking about T167494 ?
[18:35:07] T167494: Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494
[18:35:38] joal: no, something else but got idea from that ticket
[18:35:58] https://usercontent.irccloud-cdn.com/file/ygEcX4C4/Screen%20Shot%202018-07-13%20at%2011.35.42%20AM.png
[18:36:04] joal: see screenshot
[19:06:27] Analytics, Analytics-Kanban: Problems with external referrals? - https://phabricator.wikimedia.org/T195880 (Nuria) Nuria to look into the differences between "unknown" and "none"
[19:16:47] nuria_: Adding metrics is a very good idea :)
[19:20:31] joal: ok, sold
[19:23:38] nuria_: Which ones is another discussion topic :)
[19:36:19] Analytics: Singapore does not appear on wikistats map - https://phabricator.wikimedia.org/T199571 (Nuria)
[19:37:28] joal: let me know what you think of this one though, it might be not optimal to do it this way
[19:45:23] Analytics, WMDE-Analytics-Engineering, Wikidata, User-Addshore: Investigate June Unique devices increase of 170% for wikidata - https://phabricator.wikimedia.org/T199517 (Nuria) {F23734550} It coincides with a spike of pageviews from thailand, that seems like a bot accessing the desktop size, wi...
[19:48:39] nuria_: The amusing part about --^ is that the UA actually changes (minor-minor version is never seen more than 2/3 times for the same IP), as well as page viewed
[19:49:06] joal: well-known technique to avoid shedding from servers
[19:49:25] joal: ya....
[19:54:14] joal: changed measure of percentage of bot traffic a bit: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445654/
[22:10:28] Analytics, Analytics-Kanban, Patch-For-Review, Performance-Team (Radar): Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Krinkle) @Nuria Thanks! This is really great. I'm also impressed by how quickly it responds to queries.
[22:10:42] Analytics, Analytics-Kanban, Performance-Team (Radar): Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Krinkle)
[22:10:44] Analytics, Analytics-Kanban, Performance-Team (Radar): Some fields in webrequest druid dataset should be ingested as numbers - https://phabricator.wikimedia.org/T167494 (Krinkle)
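Editor's note: the pattern described at 19:48 (one IP cycling through many user agents, each exact version seen only two or three times) can be expressed as a simple heuristic. The sketch below is illustrative only: the `(ip, user_agent)` record layout, the function name `suspicious_ips`, both thresholds and the sample data are assumptions made for this note, not the webrequest schema or the bot-traffic measure from the puppet change linked above.

```python
from collections import Counter, defaultdict

def suspicious_ips(requests, max_hits_per_ua=3, min_distinct_uas=5):
    """Flag IPs that use many distinct UAs, none of them more than a few times.

    `requests` is an iterable of (ip, user_agent) pairs; thresholds are
    hypothetical defaults chosen for the example.
    """
    ua_counts = defaultdict(Counter)
    for ip, user_agent in requests:
        ua_counts[ip][user_agent] += 1

    flagged = []
    for ip, counts in ua_counts.items():
        many_uas = len(counts) >= min_distinct_uas
        each_ua_rare = max(counts.values()) <= max_hits_per_ua
        if many_uas and each_ua_rare:
            flagged.append(ip)
    return sorted(flagged)

# Made-up sample: one IP rotating the last version component on every
# request, one IP sending the same UA ten times.
sample = [("198.51.100.7", f"Chrome/66.0.3359.{n}") for n in range(10)]
sample += [("203.0.113.9", "Chrome/66.0.3359.139")] * 10

flagged = suspicious_ips(sample)
print(flagged)
```

A check like this would also flag large NATs or shared proxies with genuinely diverse browsers, so at best it gives a starting list for manual review.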