[00:05:08] cluster seems very sluggish currently [00:14:06] 21 apps pending (in "ACCEPTED" state), with launch times up to more than an hour ago - is that normal? https://yarn.wikimedia.org/cluster/apps/ACCEPTED [00:15:39] a-team ^ [00:20:29] nuria_: it seems you have two very large queries running in parallel, with 1350000 MB (1.35 terabyte) memory allocated each? [01:02:56] ok, cluster seems back to normal [06:42:35] morning! I'd need to run for an errand in a bit, will be back hopefully in ~1h [06:42:43] np elukey :) [08:16:37] joal: qq about aqs [08:16:56] (if you have time, otherwise whenever you want) [08:17:16] in the source repo I finally got to the point in which npm install returns only [08:17:19] npm WARN eslint-config-node-services@2.2.5 requires a peer of eslint@^4.12.0 but none was installed. [08:17:36] before there were two errors: [08:17:36] ├── UNMET PEER DEPENDENCY eslint@^4.12.0 [08:17:36] ├── UNMET PEER DEPENDENCY eslint-config-wikimedia@0.4.0 [08:17:50] and of course I've updated preq [08:18:04] weird thing is that eslint-config-node-services seems not to have deps [08:18:08] https://www.npmjs.com/package/eslint-config-node-services [08:18:17] but I am fairly ignorant about npm :) [08:18:34] hm [08:18:41] I've not had this error yet [08:19:06] ah meaning that if you run npm install all good? [08:19:16] in previous cases, yes :( [08:19:29] atm my diff is [08:19:30] - "eslint-config-node-services": "^2.2.2", [08:19:30] - "eslint-config-wikimedia": "^0.4.0", [08:19:30] + "eslint-config-node-services": "^2.2.5", [08:19:30] + "eslint-config-wikimedia": "^0.5.0", [08:19:36] - "preq": "^0.5.2", [08:19:36] + "preq": "^0.5.6", [08:19:42] so very minimal [08:20:18] yes, but eslint has hanged ;) [08:21:37] elukey: when aplying the same changes you do, I get the same errors (at least, it's good :) [08:22:49] the good thing is that https://github.com/wikimedia/eslint-config-node-services has known owners [08:22:53] :P [08:23:06] :) [08:23:45] https://github.com/wikimedia/eslint-config-node-services/blob/master/package.json#L26 indeed there is a dep for eslint 4.12 [08:23:56] current version is 4.19.1 elukey [08:24:13] 4.12.0 and above exist [08:24:30] I love nodejs [08:24:35] * elukey asks to services [08:25:19] elukey possibly it needs an explcit import? [08:30:39] totally ignorant :) [08:30:56] ah you mean in your package.json [08:31:03] yup [08:32:12] bingo :) [08:32:19] \o/ ! [08:33:16] elukey: for deps, I usually take example on restbase pakage.json (https://github.com/wikimedia/restbase/blob/master/package.json#L56) [08:34:29] so if I use 4.12.0 I get a dependency tree as output, if I use 4.19.1 I don't [08:35:21] ah no it was maybe doing its things [08:35:46] gooood [08:36:40] so joal, should I send a code review for the source package? And then run the server.js stuff [08:36:47] err source repo [08:37:16] elukey: Indeed - First, update the code (in our case, package.json) [08:37:31] then, update the deploy repo --> server.js etc [08:38:46] (03PS1) 10Elukey: Update package dependencies [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439851 (https://phabricator.wikimedia.org/T190213) [08:40:17] wondering if it is enough to add eslint 4.12 though [08:40:36] elukey: have you run npm test? [08:41:39] sure! [08:41:48] Grat :) [08:41:50] * elukey runs npm test [08:41:52] +e [08:41:53] hahahaha [08:41:55] :D [08:42:25] in fact they fail! 
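The fix elukey lands on above ("possibly it needs an explicit import" → "bingo") is declaring eslint itself in the aqs source repo's own package.json, next to the bumped lint packages. A rough sketch of the resulting devDependencies block, using only the versions quoted in the diff above (preq stays a regular dependency, bumped to ^0.5.6); whether eslint ends up pinned at ^4.12.0 or ^4.19.1 is what the next part of the conversation sorts out:

    "devDependencies": {
        "eslint": "^4.12.0",
        "eslint-config-node-services": "^2.2.5",
        "eslint-config-wikimedia": "^0.5.0"
    }
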
[08:42:41] '/o\ [08:43:11] elukey: when adding eslint, I still get a warning about eslint-plugin-jsdoc@3.1.3 [08:44:23] alone or with the new deps as well? [08:44:41] With new deps [08:45:33] what warning do you get exactly? [08:47:56] UNMET PEER DEPENDENCY eslint-plugin-jsdoc@3.1.3 [08:48:43] what package does it refers to? I don't get it :( [08:49:23] elukey: in eslint@4.19.1 [08:49:39] npm WARN eslint-config-node-services@2.2.5 requires a peer of eslint-plugin-jsdoc@^3.2.0 but none was installed. [08:50:34] so I added 4.12.0 explicitly not 4.19.1, but changing it and re-doing npm install works as well [08:51:49] elukey: I did the same as you I think? Or maybe not :( [08:52:57] I am probably the one making the error joal, usually this is the end result :D [08:53:05] nope not sure at all [08:53:25] npm is like magic - You linda never get what you expect :) [08:55:00] elukey: re-ran npm install after having removed node_modules folder - all ok [08:55:38] This is the kind of things you can expect from npm (just sayin' - nothing personal against npm :-P) [08:57:48] * elukey notes down the trick [09:14:03] I like when npm install crashes my vm [09:20:37] elukey: I was sure you'd gonna love that tool :) [09:23:06] joal: yes I am in love with it [09:32:22] elukey: I'm finishing my patch for MediawikiHistoryChecker - I'll be on tests failure with you in minutes [09:32:37] so I removed the node_modules dir [09:32:43] re-ran npm install [09:32:45] and now I get [09:32:45] npm ERR! file /root/.npm/nan/2.7.0/package/package.json [09:32:45] npm ERR! code EJSONPARSE [09:33:10] \o/ ! Thank you NPM :) [09:33:23] hm [09:33:40] I have never seen that one either [09:33:50] but now that I think about it, it might be due to me running a special setup in vagrant [09:33:54] so I'll nuke and retry [09:33:59] k [09:38:27] ok so now npm tests returns [09:38:28] Error: eslint-config-node-services: [09:38:28] Configuration for rule "indent" is invalid: [09:38:29] Value "off" is the wrong type. [09:38:34] :P [09:43:23] :-S [09:44:05] ah updating mocha-eslint as in restbase changes the error [09:44:22] lib/druidUtil.js 119:1 error Expected indentation of 4 spaces but found 41 indent [09:44:25] loooool [09:44:32] So nice [09:44:39] The error I got was: "TypeError: this.log is not a function\n at connectionPool.acquire.then.then.then (/home/jo/wmf/code/analytics/aqs/node_modules/restbase-mod-table-sqlite/lib/clientWrapper.js:90:18) [09:45:26] it feels like playing Jenga [09:45:37] :D [09:48:10] so those errors are due to function indentation [09:48:22] elukey: yes, linting :( [09:48:23] I vote to not care about them :D [09:48:31] Works for me :) [09:51:22] hi, I'm trying to use page histories from the data lake and the results I get seem wrong [09:51:29] e.g. [09:51:56] 0: jdbc:hive2://analytics1003.eqiad.wmnet:100> select wiki_db wiki, count(*) count from wmf.mediawiki_page_history where snapshot='2018-05' and page_namespace = 8 and page_title like '%.js' and wiki_db = 'ttwikibooks' group by wiki_db; [09:52:06] ttwikibooks 9 [09:52:09] but [09:52:37] mysql:research@analytics-store.eqiad.wmnet [ttwikibooks]> select count(*) from revision join page on rev_page = page_id where page_namespace = 8 and page_title like '%.js' and rev_timestamp > '20180101000000'; [09:52:55] +----------+ [09:52:55] | count(*) | [09:52:56] +----------+ [09:52:56] | 0 | [09:52:56] +----------+ [09:53:05] what am I doing wrong? 
[09:53:18] Hi tgr [09:54:17] I wouldn't say you are doing things wrong, I'd say you're not fully aware of how the page_history is set :) [09:54:58] tgr: page_history dataset is about page-events, not revisions - So we're talking about create, move, delete and restore [09:55:30] I see [09:55:45] in that case, not what I want. Still seems wrong, though. [09:56:01] Also tgr, page_history spans the entire life of wikis [09:56:49] ah, so the snapshot means the whole history as seen in that month? [09:57:00] correct tgr [09:57:50] The 9 events you find in hive are aroubd 2010 [09:58:19] tgr: If you're after revisions, you should use denormalized-history, and specify event_entity = revision [09:58:57] I see there is also a mediawiki_revision table [09:59:06] am I better off with that one, performance-wise? [09:59:09] tgr: denormalized history is named wmf/mediawiki_hsitory [09:59:27] tgr: for small wikis, yes [09:59:34] for big wikis, no :) [09:59:57] For by-wiki query (like group by wiki), denormalized-history is better [10:00:59] I see. Thanks a lot, that was a big help! [10:01:09] np tgr :) [10:01:37] tgr: Please ask for more if you want, it's great these datasets get used :) [10:03:01] thanks you for making them! being able to query stats over all wikis is something I sorely missed before [10:03:16] I do imagine that yes :) [10:03:27] tgr: I'm running an example query, will send you a gist in minutes [10:04:50] is there a way to run something on small wikis only? [10:05:05] tgr: not easily - you'd need to devise a list [10:05:10] an easy way I mean, I could construct an in clause from small.dblist [10:05:15] ah, ok [10:05:20] That would be the easiest [10:11:41] (03PS1) 10Elukey: Update aqs to db369e6 [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439868 [10:12:08] ah! [10:12:14] tgr: https://gist.github.com/jobar/f0cb85130c7315ba8ef18a94a9d11772 [10:12:41] elukey: you ready-to-deploy man :) [10:13:22] (03PS1) 10Joal: Add MediawikiHistoryChecker spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) [10:15:08] joal: thanks! having the timestamp format in the docs would be nice; apparently I have been brainwashed by MediaWiki to think that all timestamps are YYYYMMDDHHmmss :) [10:15:29] in general, having example queries there would be great [10:18:14] tgr - I support this idea :) [10:18:34] tgr: Which wikipage are you looking at, so that I add an example? [10:20:40] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history but there are lots of similar pages so maybe some more general place would be better [10:20:42] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Goal, and 3 others: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus - https://phabricator.wikimedia.org/T190327#4274949 (10Pchelolo) [10:20:46] 10Analytics, 10Commons, 10EventBus, 10MediaWiki-JobQueue, and 5 others: Make gwtoolsetUploadMediafileJob JSON-serializable - https://phabricator.wikimedia.org/T192946#4274946 (10Pchelolo) 05Open>03Resolved Thank you, logs indicate the job's being serialized properly now. 
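A sketch of the kind of query joal is pointing tgr at (this is not the contents of the linked gist): counting revisions made since 2018 on ttwikibooks pages in namespace 8 whose titles end in .js, using the denormalized wmf.mediawiki_history table instead of page_history. Field names and the timestamp format ('YYYY-MM-DD HH:mm:ss' rather than MediaWiki's YYYYMMDDHHmmss) are from memory and worth checking against the table schema on wikitech:

    SELECT wiki_db, COUNT(*) AS revisions
    FROM wmf.mediawiki_history
    WHERE snapshot = '2018-05'            -- each snapshot covers the full history of every wiki
      AND event_entity = 'revision'
      AND event_type = 'create'
      AND wiki_db = 'ttwikibooks'
      AND page_namespace = 8
      AND page_title LIKE '%.js'
      AND event_timestamp >= '2018-01-01'
    GROUP BY wiki_db;
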
[10:20:52] (brb) [10:47:12] (03PS2) 10Elukey: Update package dependencies [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439851 (https://phabricator.wikimedia.org/T190213) [10:47:29] (wasn't updated to the last version) [11:18:12] joal: mmmm I am reviewing the list of modules updated, and I don't see preq [11:18:27] does https://gerrit.wikimedia.org/r/#/c/analytics/aqs/+/439851/1/package.json need to be merged before starting the ./service.js etc.. ? [11:22:00] The build script will update the pointer of the deploy repository's submodule, create a Docker container in which it will install the module dependencies and send the changes to Gerrit. Review them and merge. [11:22:05] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Page views and Country name table columns overlapping in the Page Views By Country metric on Dashboard - https://phabricator.wikimedia.org/T191121#4275097 (10sahil505) [11:22:21] of course yes! [11:22:38] (03CR) 10Elukey: [V: 032 C: 032] Update package dependencies [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439851 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [11:22:57] (03Abandoned) 10Elukey: Update aqs to db369e6 [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439868 (owner: 10Elukey) [11:25:50] (03PS1) 10Elukey: Update aqs to d94601a [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439879 [11:26:36] nope [11:29:21] better: preq not among node modules, but changes included [11:32:04] elukey: meh? [11:32:50] is it because they are dev deps? [11:33:07] 10Analytics, 10Analytics-Wikistats: Improve UI for ‘All wikis’ & ‘Explore Topics’ dropdown - https://phabricator.wikimedia.org/T196982#4275109 (10sahil505) [11:36:51] (03PS1) 10Elukey: Update travis.yaml with the correct nodejs versions [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439882 [11:37:32] (03CR) 10Elukey: [V: 032 C: 032] Update travis.yaml with the correct nodejs versions [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439882 (owner: 10Elukey) [11:40:39] 10Analytics, 10Analytics-Wikistats: Correct Net Bytes Difference Metric value on Dashboard - https://phabricator.wikimedia.org/T196983#4275130 (10sahil505) [11:51:08] joal: TIL - when you use ^ in the deps it means "whatever latest minor version from X onwards) [11:52:04] Ahhh ! I thought i could have been major ones as well ! [11:53:48] and it is a dev dep, so in prod we will not see it (npm install --production will avoid it) [11:54:04] hyperswitch depends on it [11:54:11] and we already have 0.5.6 in prod :D [11:56:18] (03Abandoned) 10Elukey: Update aqs to d94601a [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439879 (owner: 10Elukey) [11:58:02] soooo we are ready for keepAlive :) [12:02:35] hm - So no change was needed? Wow [12:02:42] I wouldn't have guessed that [12:04:03] (Peter explained to me all this 10 mins ago, I had no clue) [12:06:29] elukey: So, now that we know that - How do we configure the thing to use keep_alive? 
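One nuance on the ^ (caret) ranges discussed above: npm's caret is a little narrower than "whatever latest minor version from X onwards" — it allows updates that do not change the left-most non-zero digit, so for 0.x packages it only permits patch-level bumps:

    "^2.2.2"  matches >=2.2.2 <3.0.0   (minor and patch updates allowed)
    "^0.5.2"  matches >=0.5.2 <0.6.0   (patch updates only, because the major version is 0)

That is why eslint-config-wikimedia had to be bumped by hand from ^0.4.0 to ^0.5.0 in the diff quoted earlier.
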
[12:08:19] this is what I am trying to figure out :) [12:08:29] Marko did this for restbase [12:08:30] https://github.com/wikimedia/restbase/pull/984/commits/f2309e244e7b8235ef3503a84405addc1b4bcf2d [12:08:36] but the config is rather different from ours [12:08:42] so basically [12:08:42] + agentOptions: [12:08:43] + keepAlive: true [12:12:32] Doesn't sound too complicated [12:13:08] yeah, not familiar with our aqs hyperswitch's config though :( [12:40:42] (03PS2) 10Joal: Add MediawikiHistoryChecker spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) [12:53:02] (03PS3) 10Joal: Add MediawikiHistoryChecker spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) [13:13:27] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Intervals/buckets for data arround pageviews per country in wikistats maps - https://phabricator.wikimedia.org/T188928#4024359 (10fdans) a:03fdans [13:50:51] 10Analytics, 10EventBus, 10ORES, 10Patch-For-Review, and 3 others: Numeric keys in ORES models causing downstream Hive ingestion to fail - https://phabricator.wikimedia.org/T195979#4275547 (10Ottomata) Thanks @awight. I just tried to re-enable, but there are more (possibly MANY more) problems with this da... [13:51:05] 10Analytics, 10EventBus, 10ORES, 10Patch-For-Review, and 3 others: Invalid field names in ORES models causing downstream Hive ingestion to fail - https://phabricator.wikimedia.org/T195979#4275548 (10Ottomata) [13:53:43] 10Analytics, 10EventBus, 10ORES, 10Patch-For-Review, and 3 others: Invalid field names in ORES models causing downstream Hive ingestion to fail - https://phabricator.wikimedia.org/T195979#4275583 (10awight) @Ottomata Thanks for the investigation and explanations! This should be fun ;-) [13:59:35] 10Analytics, 10EventBus, 10ORES, 10Scoring-platform-team (Current), and 2 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000#4275600 (10Ottomata) [14:06:06] joal: if you have time, do you mind to discuss the structure of the aqs config yaml? [14:06:22] elukey: yes - 1min? [14:06:41] sure whenever you want [14:09:49] 10Analytics, 10EventBus, 10ORES, 10Scoring-platform-team (Current), and 2 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000#4275641 (10Pchelolo) > To avoid this, can we change the schema so that scores is an object keyed by model nam... [14:16:09] elukey: here ! [14:16:26] :) [14:16:40] bc? [14:16:44] sure [14:20:20] mforns: ok, I'm looking at the routing issue again [14:20:37] did you find anything out so far? [14:20:48] milimetric, hey, I'm almost theeeeerreee, but don't know if you'll like the "solution" I found [14:20:57] batcave? 
[14:21:00] omw [14:21:04] k [14:21:09] oh, it might be busy, batcave-2 [14:21:22] ok [14:24:15] elukey: https://github.com/wikimedia/analytics-aqs/blob/master/sys/mediawiki-history-metrics.js#L280 [14:35:38] joal: so basically https://github.com/wikimedia/analytics-aqs/blob/c1edede11fbcc4688931f416d6049cefce2b47d2/lib/druidUtil.js#L115 [14:35:46] needs to have agentOptions [14:35:56] with keepAlive: true [14:35:59] and that's it [14:36:11] it gets passed directly to preq [14:36:20] awesome - super easy [14:38:44] 10Analytics, 10EventBus, 10ORES, 10Patch-For-Review, and 3 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000#4275722 (10Ottomata) @Ladsgroup are there any compatibility constraints between model versions? [14:54:32] (03PS1) 10Elukey: druidUtil.js: use HTTP Keep Alive for Druid connections [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439924 (https://phabricator.wikimedia.org/T190213) [14:55:12] 10Analytics, 10Analytics-Wikistats: Correct Net Bytes Difference Metric value on Dashboard - https://phabricator.wikimedia.org/T196983#4275767 (10Nuria) Looks like in the dashboard every number has two digits in decimal precision, let's try to see if we can with css keep everything in one line. [15:01:27] (03PS2) 10Elukey: druidUtil.js: use HTTP Keep Alive for Druid connections [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439924 (https://phabricator.wikimedia.org/T190213) [15:01:32] ping fdans [15:02:17] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10Services (doing): Support connection/rate limiting in EventStreams - https://phabricator.wikimedia.org/T196553#4275812 (10Ottomata) [15:02:46] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10Services (doing): Move EventStreams to main Kafka clusters - https://phabricator.wikimedia.org/T185225#4275828 (10Ottomata) [15:28:45] ottomata: forgot to ask - I saw that kafka-jumbo1005 was down for the NIC in the end, was it required a lot of work to put it back into service? [15:28:55] or just Chris going to the DC and swap the hw? [15:28:57] (03CR) 10Joal: [C: 031] "LGTM - Let's test when you want :)" [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439924 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [15:29:30] joal: good question - how to we test this? [15:29:31] :D [15:29:35] *do [15:29:39] I have no clue !!! [15:29:45] tcp dump? [15:30:08] the main issue is that druid in labs is in project analytics aqs in deployment-prep [15:30:28] elukey: yup, he just swaped the hw [15:30:29] and all came back [15:30:33] nice :) [15:30:41] ottomata: and all those vk alerts? [15:30:54] I saw a couple of them popping up in the backlog [15:32:05] elukey: that i'm not sure. 
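A rough sketch of the change joal and elukey settle on above — the actual patch is "druidUtil.js: use HTTP Keep Alive for Druid connections" (gerrit 439924), and this is not its literal code. The idea is that preq takes request-style options, so the Druid POST helper in lib/druidUtil.js just gains an agentOptions block asking Node's HTTP agent to reuse TCP connections (function and variable names below are illustrative):

    const preq = require('preq');

    // Hypothetical shape of the Druid query helper in lib/druidUtil.js
    function makeQuery(uri, query) {
        return preq.post({
            uri: uri,
            headers: { 'content-type': 'application/json' },
            body: query,
            // Keep the TCP connection to the Druid broker open between requests
            agentOptions: { keepAlive: true }
        });
    }
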
i bounced those few and they seemed fine afer [15:32:07] they all just had timeouts [15:32:17] ack :) [15:32:48] I was wondering if there is a corner case in how vk handles kafka failures [15:41:47] (afk for ~30m) [15:52:13] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4276014 (10ovasileva) p:05Triage>03High [15:52:41] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4276016 (10Tbayer) What was our reason again (in T184793 and T186728) to record both `source_title` and `source_url` in [[https://meta.... [15:58:33] elukey: yeah, something weird happened there for sure [15:58:36] not sure what though [15:59:25] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4276036 (10Ottomata) Hm, it might! As long as the total char length is less than the URL limit, it would. [16:02:56] HMMMM elukey i just thought of something with eventstreams on main [16:03:14] i don't think we can do it active-active [16:03:21] or even failover the kafka cluster [16:03:32] because, SSE clients save the offsets they use [16:03:41] and when reconnecting resume from those offsets [16:03:46] and the offsets will be different in each kafka cluster [16:03:58] *-) [16:04:23] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats: Correct Net Bytes Difference Metric value on Dashboard - https://phabricator.wikimedia.org/T196983#4276069 (10sahil505) p:05Triage>03High [16:05:34] so, by putting this on main [16:05:41] we can only really put it on main-eqiad [16:05:43] not either [16:05:52] so, that loses our redundancy benifit of doing so [16:06:05] we do get the advantage of not relying on MirrorMaker when eqiad is active [16:06:09] but blagh [16:06:12] it might not be worth it [16:06:17] maybe we should just do jumbo after all [16:18:06] ottomata: agreed, but in case of a failover to codfw we'd have a redundant service, even if clients would need to consume from the start of the topic/partition [16:18:37] so I can see the value of having it on main rather than on jumbo [16:18:49] plus we don't rely on mirror maker that is a plus [16:19:25] we could advertise in wikitech [16:19:39] that there will be some cases in which the offset might need a reset or not work [16:20:00] so whoever is consuming knows about it [16:27:57] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog, 10Epic: [EPIC] Explore an API for logging events sampled by session - https://phabricator.wikimedia.org/T168380#4276297 (10Jdlrobson) [16:28:49] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog, 10Epic: [EPIC] Explore an API for logging events sampled by session - https://phabricator.wikimedia.org/T168380#4276301 (10Jdlrobson) This feels like a good technical project relating to beta data, but it would need some coordination and discuss... [16:30:42] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog, 10Reading-Infrastructure-Team-Backlog, 10Epic: [EPIC] Explore an API for logging events sampled by session - https://phabricator.wikimedia.org/T168380#4276304 (10Jdlrobson) This may be of interest to RI. 
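Some context for ottomata's concern above, as a minimal sketch (it assumes the public EventStreams endpoint and the eventsource npm package; the exact id format may differ). EventStreams/KafkaSSE puts the consumer position into the SSE id field, and the client hands it back as Last-Event-ID when it reconnects — and because that position is expressed as Kafka offsets, it only makes sense against the same Kafka cluster it came from:

    const EventSource = require('eventsource');

    const es = new EventSource('https://stream.wikimedia.org/v2/stream/recentchange');
    es.onmessage = (event) => {
        // Roughly: [{"topic":"eqiad.mediawiki.recentchange","partition":0,"offset":123456}]
        // On reconnect the client sends this back as Last-Event-ID, so resuming only
        // works against the cluster whose offsets these are.
        console.log(event.lastEventId);
    };
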
[16:33:40] (03CR) 10Ppchelko: [C: 031] druidUtil.js: use HTTP Keep Alive for Druid connections [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439924 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [16:35:38] yeahhhh but elukey we'd ahve to do active failover [16:35:41] and to do a failover [16:35:46] we'd force clients to miss messages [16:35:53] and then when failling back to main eqiad [16:35:54] do it again [16:36:04] whereas, if mirror maker breaks, or if eqiad is unavailable for a whiel [16:36:08] messages are paused [16:36:14] but when things are fixed folks won't miss anything [16:42:19] sure but maybe say one/two weeks of eqiad failover to codfw might not be tolerated by clients [16:42:40] say for example that we failover to codfw and do invasive network maintenance in eqiad [16:42:43] or stuff like that [16:44:00] and the active failover is of course a "nuclear" option [16:44:06] to use only when needed [16:44:08] but we'd have it [16:45:47] (03CR) 10Elukey: [V: 032 C: 032] druidUtil.js: use HTTP Keep Alive for Druid connections [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439924 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [16:46:02] 10Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4276366 (10Ottomata) Hm, likely I just don't understand how the plugins work. Is it that the plugins only work record data on the backend when PHP is executed? I'd be surprised if that were the case, but... [16:48:02] (03PS1) 10Elukey: Update aqs to 433d1ef [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439969 [16:52:12] 10Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4276395 (10Nuria) Please @varment be so kind as to share the details about the technical objections. [16:54:48] elukey: in that case failling over from jumbo -> codfw would be the same thing [16:55:07] ottomata: sure but we wouldn't care about mirror maker :) [16:55:35] I mean, I will support any decision, but given the amount of work that we put when mirror maker broker [16:55:42] I'd really love not to rely on it if possible [16:55:44] that's it :) [17:01:28] ottomata: I will respond on Phab, but tldr: I do not know the specifics on why those plugins are problematic, but am getting the notes from Reaktiv and will share when I have more info. In short though, I think we have exhausted the existing WordPress plugin options. [17:02:06] true [17:04:46] varnent: ok, ya would be good to know why i guess. i don't understand how the extensions are different than piwik in those regards, but probably its just because i don't know how the plugins work [17:05:38] Yeah - I am working on getting that info. But I am aware they have a very tall order given the requests from Advancement, Blog team, and Security - which they have the total notes on and frankly I do not even understand the specific details of at this point as it's gotten somewhat complex on multiple fronts. :) [17:06:49] 10Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4276458 (10Varnent) @Ottomata and @Nuria - I have requested the technical information from Reaktiv and will share as soon as I can. However, they have reviewed these extensions and done two broad reviews of... 
[17:07:12] (03PS1) 10Sahil505: Corrected Dashboard Metric value CSS [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/439978 (https://phabricator.wikimedia.org/T196983) [17:08:12] My hunch is that security was a major roadblock as many of these want to send info in ways we find problematic. [17:08:58] That they came to the same analytics solutions (albeit more focused on Piwik's replacement) does not surprise me. [17:10:20] 10Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4276472 (10Varnent) @Ottomata and @Nuria, here is the writeup they provided: {F22147444} [17:10:35] @Ottomata and @Nuria - writeup now in Phab [17:10:44] (03CR) 10Sahil505: [C: 031] "Have tested this for different (max possible) statistics value & % change label." [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/439978 (https://phabricator.wikimedia.org/T196983) (owner: 10Sahil505) [17:13:17] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Page views and Country name table columns overlapping in the Page Views By Country metric on Dashboard - https://phabricator.wikimedia.org/T191121#4276477 (10sahil505) [17:14:18] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct Net Bytes Difference Metric value on Dashboard - https://phabricator.wikimedia.org/T196983#4275130 (10sahil505) p:05High>03Triage [17:15:29] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Correct Net Bytes Difference Metric value on Dashboard - https://phabricator.wikimedia.org/T196983#4276484 (10sahil505) p:05Triage>03High [17:26:13] fdans: o/ [17:26:24] do you have a minute for a js linter consultation? :) [17:26:40] elukey: always [17:26:52] thanks :) [17:27:00] so aqs uses the eslint package [17:27:20] yep [17:28:11] and some functions calls do need to wrap at some point into multiple lines [17:28:30] but the only solution that it accepts atm is something along the lines of [17:28:50] (cannot paste in here correctly grr) [17:28:55] druidUtil.makeTimeseriesQuery = function( [17:28:58] uri,dataSource, granularity, filter, aggregations, postAggregations, [17:29:02] intervals) { [17:29:05] return makeQuery(uri, { [17:29:11] that's horrible in my opinion [17:29:26] is there any reccomended way about doing this? [17:29:38] it fails for https://eslint.org/docs/rules/indent [17:29:46] I am reading it but not finding a way to resolve this [17:29:57] (travis tests fail generating noise etc..) [17:30:00] (really annoying) [17:31:36] let's see [17:32:25] I like how it is done now (C style) [17:34:49] elukey: how about the way the last function in that module does it? [17:34:49] https://github.com/wikimedia/analytics-aqs/blob/433d1efe29369404259dd176ceb8b254a3601dd4/lib/druidUtil.js#L132 [17:35:06] it doesn't like it [17:35:45] elukey: ok, let's do main-eqiad [17:35:54] and put a note about failover: basically don't do it [17:36:04] elukey: can you pass me the whole function? [17:36:16] fdans: what do you mean? 
[17:36:26] oh sorry, nvm [17:39:20] elukey: how about doing // eslint-disable-line [17:39:29] yes I just used /* eslint-disable */ [17:39:33] I was about to say that :D [17:39:40] less painful :D [17:40:03] * fdans hates eslint [17:41:34] (03PS1) 10Elukey: Fix eslint indentation and add a missing ; [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439986 (https://phabricator.wikimedia.org/T190213) [17:41:41] fdans: do you mind to review --^ [17:43:46] (03CR) 10Fdans: [C: 032] Fix eslint indentation and add a missing ; [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439986 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [17:45:21] 10Analytics: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4276524 (10Ottomata) Alright, thanks. I've also just looked a bit into wp-statistics code, and it seems they do not use client side logging to grab the data. https://github.com/wp-statistics/wp-statistics... [17:47:03] thanks :) [17:47:10] (03CR) 10Elukey: [V: 032] Fix eslint indentation and add a missing ; [analytics/aqs] - 10https://gerrit.wikimedia.org/r/439986 (https://phabricator.wikimedia.org/T190213) (owner: 10Elukey) [17:47:44] (03Abandoned) 10Elukey: Update aqs to 433d1ef [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439969 (owner: 10Elukey) [17:50:10] (03PS1) 10Elukey: Update aqs to 02d5c80 [analytics/aqs/deploy] - 10https://gerrit.wikimedia.org/r/439989 [17:55:17] joal: I think that we are ready to deploy [17:55:37] in theory we could test it in labs adding a "fake" druid endpoint on localhost with netcat [17:55:40] or something like that [17:55:59] let's sync tomorrow morning (or whenever you have time :) [17:56:06] going afk team! [17:56:09] * elukey off! [17:56:48] elukey: tomorrow is kids day for me [17:56:52] elukey: I'll be here later [18:15:54] 10Analytics, 10Analytics-Kanban: Confusing results in Turnilo - https://phabricator.wikimedia.org/T196785#4276611 (10Nuria) a:03Nuria [18:25:38] nuria_: checker tested - it works [18:25:52] nuria_: tell me when you want me to give you a tour [18:31:19] 10Analytics, 10Analytics-Kanban, 10EventBus, 10ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000#4276662 (10Ottomata) [18:31:43] 10Analytics, 10Analytics-Kanban, 10EventBus, 10ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000#4275600 (10Ottomata) a:05Ladsgroup>03Ottomata [18:34:59] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10Services (doing): Move EventStreams to main Kafka clusters - https://phabricator.wikimedia.org/T185225#4276668 (10Ottomata) [18:35:28] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Patch-For-Review, 10Services (watching): Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418#3974509 (10Ottomata) [18:36:24] neilpquinn: Some docs of interest: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater [18:36:45] ottomata: Hiiii. June 19 is a WMF Holiday for US staff. do you mind moving Analytics hang time to some other day in that week or next? [18:40:57] oh! sure. 
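For reference, the two escape hatches fdans and elukey mention above, sketched against the makeTimeseriesQuery signature quoted earlier (this is not the literal content of the "Fix eslint indentation" patch). The scoped form silences only the indent rule around the offending declaration; elukey's /* eslint-disable */ variant at the top of a file turns linting off for the whole file:

    /* eslint-disable indent */
    druidUtil.makeTimeseriesQuery = function(uri, dataSource, granularity, filter,
                                             aggregations, postAggregations, intervals) {
        return makeQuery(uri, {
            // ... build the Druid timeseries query ...
        });
    };
    /* eslint-enable indent */
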
[18:42:09] man that meeting is hard to schedule [18:45:28] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10Services (doing): Move EventStreams to main Kafka clusters - https://phabricator.wikimedia.org/T185225#4276724 (10Ottomata) [18:46:01] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Consider increasing retention for mediawiki event topics - https://phabricator.wikimedia.org/T196409#4276727 (10Ottomata) Why not eh?! [18:47:46] (03PS4) 10Joal: Add MediawikiHistoryChecker spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) [19:00:10] 10Quarry: Add message when the server is unreachable - https://phabricator.wikimedia.org/T197027#4276776 (10Framawiki) p:05Triage>03Low [19:05:22] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching): Consider increasing retention for mediawiki event topics - https://phabricator.wikimedia.org/T196409#4276809 (10Ottomata) [19:05:25] 10Analytics, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Wikimedia-Stream: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#4276807 (10Ottomata) [19:06:01] 10Analytics, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Wikimedia-Stream: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#3970994 (10Ottomata) I'll make this 31 days just to bump it up to a month. We have plenty of space for this. [19:09:06] 10Quarry: Define in a single place the pseudoname of unnamed queries - https://phabricator.wikimedia.org/T197029#4276818 (10Framawiki) p:05Triage>03Low [19:15:53] 10Analytics, 10Discovery, 10Wikidata, 10Wikidata-Query-Service, 10Wikimedia-Stream: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#4276857 (10Ottomata) Doing the following for all main-eqiad and main-codfw: ``` for t in \ mediawiki.page-create... [19:21:24] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Wikidata, and 2 others: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#4276878 (10Ottomata) [19:22:03] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Wikidata, and 2 others: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#3970994 (10Ottomata) mediawiki eventbus topics should now be retained for 31 days in main Kafka clusters. If we add a new mediawiki topic, we... 
[19:22:08] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Wikidata, and 2 others: Increase kafka event retention to 14 or 21 days - https://phabricator.wikimedia.org/T187296#4276880 (10Ottomata) [19:22:30] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Wikidata, and 2 others: Increase kafka event retention to 31 - https://phabricator.wikimedia.org/T187296#3970994 (10Ottomata) a:03Ottomata [19:23:20] 10Analytics, 10Analytics-Kanban, 10Wikimedia-Stream, 10Patch-For-Review: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009#4276884 (10Ottomata) [19:24:34] 10Analytics, 10Services (watching): Enable TLS and authorization for cross DC MirrorMaker - https://phabricator.wikimedia.org/T196081#4276887 (10Ottomata) [19:24:36] 10Analytics, 10EventBus: TLS encryption for cross DC Kafka main MirrorMaker instances - https://phabricator.wikimedia.org/T194764#4276889 (10Ottomata) [19:28:56] (03PS1) 10Joal: Add check step in mediawiki-history jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/440005 (https://phabricator.wikimedia.org/T192481) [19:29:34] (03PS5) 10Joal: Add MediawikiHistoryChecker spark job [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) [19:34:31] (03PS1) 10Framawiki: [WIP, DON'T MERGE] Port to Python3 [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/440007 (https://phabricator.wikimedia.org/T192698) [19:34:52] (03CR) 10jerkins-bot: [V: 04-1] [WIP, DON'T MERGE] Port to Python3 [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/440007 (https://phabricator.wikimedia.org/T192698) (owner: 10Framawiki) [19:36:49] (03CR) 10Framawiki: "Practically each export format need fix :(" [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/440007 (https://phabricator.wikimedia.org/T192698) (owner: 10Framawiki) [19:45:28] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Wikidata, and 2 others: Increase kafka event retention to 31 - https://phabricator.wikimedia.org/T187296#4276952 (10Pchelolo) > If we add a new mediawiki topic, we need to remember to run this command for it. Or implement T157092 :) [19:45:43] 10Analytics, 10Analytics-Kanban, 10Discovery, 10Wikidata, and 3 others: Increase kafka event retention to 31 - https://phabricator.wikimedia.org/T187296#4276953 (10Pchelolo) [19:55:19] (03CR) 10Joal: [V: 031] "Tested on cluster against correct and incorrect data." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/439869 (https://phabricator.wikimedia.org/T192481) (owner: 10Joal) [20:13:23] (03PS2) 10Joal: Add check step in mediawiki-history jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/440005 (https://phabricator.wikimedia.org/T192481) [20:13:48] (03CR) 10Joal: [V: 031] "Validated on cluster for success and failure." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/440005 (https://phabricator.wikimedia.org/T192481) (owner: 10Joal) [20:27:12] (03PS3) 10Joal: Add validation step in mediawiki-history jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/440005 (https://phabricator.wikimedia.org/T192481) [20:27:19] Gone for tonight team - see you tomorrow [20:53:39] ottomata: question if you may [20:53:44] yaa [20:54:02] ottomata: teh bumping up of retention is a config in kafka itself? [20:54:31] ottomata: per topic, right? 
[20:54:55] nuria_: yes per topic [20:55:00] but it is not in a file anywhere [20:55:09] its stored in zookeeper and set by running a command [20:55:16] ottomata: and how does that get persisted if we restart cluster? [20:55:21] it is in zookeeper [20:55:29] ottomata: zookeeper cluster [20:55:29] but, it would not be persisted if we say/ spun up a new cluster [20:55:38] so unfortuntetly it is unpuppetized [20:55:40] as are ACLs [20:55:42] if we used them [20:56:03] ottomata: ok, should we add item to puupetize? [20:56:15] ehhhh [20:56:17] it will not be easy [20:56:22] not sure [20:56:24] it will be like puppetizing mysql access permissions [20:56:41] it'll be a buncha hacky puppet execs [20:56:48] unless there some external tool already... [20:56:53] that can ensure configs [20:57:05] ottomata: ah, cause they are stored on system itself [20:57:09] yeah [20:57:49] ottomata: i see [21:13:19] 10Analytics, 10Services (watching): Enable TLS and authorization for cross DC MirrorMaker - https://phabricator.wikimedia.org/T196081#4277186 (10Ottomata) Assuming the certificate CN is `kafka_mirror_maker`, we will need to add the following ACLs to clusters to which we connect over SSL port 9093: ``` kafka a... [21:13:59] 10Analytics, 10Analytics-Kanban, 10Services (watching): Enable TLS and authorization for cross DC MirrorMaker - https://phabricator.wikimedia.org/T196081#4277187 (10Ottomata) [21:29:17] (03PS2) 10Mforns: [WIP] Trying to improve routing logic [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/438030 (owner: 10Milimetric) [23:12:29] 10Analytics, 10Analytics-Dashiki, 10Community-Tech, 10Patch-For-Review: Add draft namespace creations to page creation dashboard - https://phabricator.wikimedia.org/T176375#4277431 (10Liuxinyu970226) [23:14:11] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog: Some VirtualPageView are too long and fail EventLogging processing - https://phabricator.wikimedia.org/T196904#4277519 (10Jdlrobson) @ottomata @tbayer i believe the idea was to be consistent with pageviews. Even if we drop one of those it's likely...
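The "command" ottomata refers to above is the per-topic config tool that ships with Kafka; the invocation in the Phabricator comment is truncated, so purely as an illustration (the ZooKeeper connect string is a placeholder and the topic name is only an example), bumping one topic's retention to 31 days (2678400000 ms) looks roughly like:

    kafka-configs.sh --zookeeper <zookeeper-connect-string> \
        --entity-type topics --entity-name eqiad.mediawiki.revision-create \
        --alter --add-config retention.ms=2678400000

The override lands in ZooKeeper rather than in any puppet-managed file, which is exactly why it would not survive spinning up a fresh cluster, as ottomata notes above.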