[00:20:13] Analytics, Analytics-EventLogging, Performance-Team, Technical-Debt: JsonData and EventLogging have multiple classes with the same name - https://phabricator.wikimedia.org/T159079#3056276 (Legoktm) Just to clarify for history, these classes were originally written by robla for the JsonData extens...
[04:29:08] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3403243 (Nuria) Tested fix on beta and I think it looks good, can @kaldari verify? events flowing are...
[06:24:54] morning!
[06:25:07] dbstore1002's alter tables are 88/312
[06:25:13] in just a night of work
[06:26:40] db1047 is 174/212
[06:27:23] maybe by the end of this week we'll be done ?
[06:39:52] Analytics-Kanban, Wikimedia-Stream: Port RCStream clients to EventStreams - https://phabricator.wikimedia.org/T156919#3403311 (Aklapper)
[07:19:28] pfff elukey
[07:19:55] elukey: unexpected need to care for Lino a bit more this morning - will ping you when I have time to deploy
[07:20:28] joal: sure!
[08:04:13] * elukey commutes to the co-working
[08:06:19] Analytics, Project-Admins: Ceate tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3403380 (Aklapper) a:Aklapper>None See https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#Creating_new_projects for required data (in this case: a projec...
[08:06:55] Analytics, Project-Admins: Create tag "Analytics-Data-Quality" on Phabricator - https://phabricator.wikimedia.org/T169560#3401740 (Aklapper)
[08:20:51] Analytics: Push mediawiki history data into labs.
Public history data lake - https://phabricator.wikimedia.org/T169572#3403436 (JAllemandou)
[08:25:52] (PS1) Joal: Correct sqoop script (overwrite on retry) [analytics/refinery] - https://gerrit.wikimedia.org/r/363136
[08:26:21] elukey: --^ If you have a minute, I'd like this to be deployed today :)
[08:26:37] joal: sure!
[08:27:47] elukey: I'm also retrying sqoop one last time
[08:31:41] failed again?
[08:33:31] joal: qq - where is --delete-target-dir used ?
[08:33:47] I am a bit ignorant with docopt but I don't see it in the parameters
[08:33:53] elukey: manual launches failed yesterday
[08:34:20] elukey: this is a sqoop-job option, used only when try_number >1 (RE-try)
[08:34:52] ahhh
[08:35:17] sorry didn't get the whole thing, you are building the list of parameters for sqoop
[08:35:20] okok
[08:38:40] (CR) Elukey: [C: 1] "From my understanding of the script it LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/363136 (owner: Joal)
[08:39:16] joal: --^ I assumed that you tested it and the config correctly populates the number of tries done
[08:42:38] elukey: I know it does through logs (using the config try for logging :)
[08:42:39] https://www.theverge.com/2017/7/3/15917950/nasdaq-nyse-stock-market-data-error
[08:43:24] :)
[08:43:45] elukey: I'd have used 123.45 :)
[08:43:52] hahahah
[08:43:57] the classic "oooops"
[08:45:31] elukey: kinda easier to catch :)
[08:54:50] https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
[08:55:09] elukey: you're a bit late, it buzzed a lot 2 days ago ;)
[08:55:15] :-P
[08:55:44] elukey: kafka 0.11 is out is also the thing
[08:56:13] oh yes but it is a major driver for us to deploy 0.11 rather than 0.10 :)
[08:57:00] atm we should have at-least-once with varnishkafka, since it retries on failures
[08:57:00] elukey: for main cluster I understand - Is it really that important for jumbo?
[08:57:36] it might be interesting to have, not having duplicates for webrequests is nice
[08:58:03] I mean, it is basically a setting to turn on in the brokers and the producer only needs to support it
[08:58:05] elukey: It'll be interesting to check performance impact
[08:58:37] if it is how they have described in the link I think it is negligible
[08:58:52] it only persists a sequence number
[08:59:28] elukey: it is advertised as negligible (as it should be) - testing will be interesting :)
[09:00:32] we test it in production!
[09:00:34] :P
[09:00:45] indeed, no other way :)
[09:01:23] elukey: https://blogs.msdn.microsoft.com/seliot/2011/04/25/i-dont-always-test-my-code-but-when-i-do-i-do-it-in-production/
[09:03:53] yep :)
[09:05:07] elukey: gone again, will be around 12:30 - If you're ok, let's deploy at that moment (or after lunch)
[09:05:35] sure thing!
[10:08:23] Analytics-Tech-community-metrics: "sortinghat export" creates a JSON file with whitespace at the end of each line, Owlbot does not - https://phabricator.wikimedia.org/T169614#3403800 (Aklapper)
[10:16:59] https://www.datadoghq.com/blog/monitor-hadoop-metrics/
[10:17:11] * elukey adds metrics to grafana and to puppet
[10:18:38] TotalLoad looks nice
[10:19:05] planning to add alarms on percentage of hdfs space used (70 warning, 80 critical)
[10:19:12] and missing/corrupt hdfs blocks
[10:19:27] over a timeframe of multiple hours
[10:20:07] Analytics, Analytics-Cluster, User-Elukey: Monitor more HDFS health metrics - https://phabricator.wikimedia.org/T163908#3403857 (elukey)
[10:20:16] renamed --^
[10:20:43] awesome elukey :)
[10:20:52] elukey: sqoop failed with same error as last time :(
[10:20:53] Analytics, Analytics-Cluster, User-Elukey: Monitor more HDFS health metrics - https://phabricator.wikimedia.org/T163908#3214017 (elukey) https://www.datadoghq.com/blog/monitor-hadoop-metrics was very useful, added several metrics to grafana under the hdfs section.
[10:21:23] elukey: I suggest deploying now and doing sqoop later
[10:24:17] elukey: nice post on hadoop metrics !
[10:25:03] joal: sure!
[10:25:22] elukey: Ok, merging the patch for sqoop
[10:26:03] (CR) Joal: [V: 2 C: 2] "Self-merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/363136 (owner: Joal)
[10:30:44] joal: added these ones https://gerrit.wikimedia.org/r/#/c/363154/1/modules/role/manifests/analytics_cluster/hadoop/master.pp
[10:31:47] (fixing the cr, values are wrong :/)
[10:31:57] awesome elukey - I don't understand metrics settings well, but looks good :)
[10:33:08] joal: so here is a TL;DR:
[10:33:24] HDFS percentage used: 70% warning, 80% critical
[10:33:42] missing/corrupt blocks: 2 warning, 5 critical (checked historic data and it seems good enough)
[10:34:06] the last one for 180mins, the first for 30 mins
[10:35:22] k
[10:36:14] elukey: the thing that didn't make total sense to me was the "percentage => 60" field
[10:37:05] ah yes, it is (IIRC) the number of datapoints of the timespan that are crossing the thresholds
[10:37:17] I use 60 usually but it could be other values
[10:37:59] named percentage?
[10:38:09] sounds weird
[10:38:37] Analytics-Kanban, Patch-For-Review, User-Elukey: Monitor more HDFS health metrics - https://phabricator.wikimedia.org/T163908#3403892 (elukey)
[10:39:32] # $percentage - Number of datapoints exceeding the
[10:39:33] # threshold. Defaults to 1%.
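A minimal sketch of the "percentage" semantics described above, assuming the check counts the share of datapoints in the time window that reach a threshold and fires when that share is at least `percentage`. This is not the code behind monitoring::graphite_threshold; the function name `threshold_status`, the `>=` comparisons, and the sample values are hypothetical and only illustrate the 70/80 HDFS-usage example with percentage => 60.

```python
from typing import Iterable, Optional


def threshold_status(
    datapoints: Iterable[Optional[float]],
    warning: float,
    critical: float,
    percentage: float = 1.0,
) -> str:
    """Classify a window of datapoints (hypothetical helper, not the real check).

    A level fires when at least `percentage` percent of the non-null
    datapoints are at or above that level's threshold (assumed semantics).
    """
    # Graphite-style series can contain gaps; ignore missing values.
    values = [v for v in datapoints if v is not None]
    if not values:
        return "UNKNOWN"

    def share_at_or_above(limit: float) -> float:
        # Percentage of datapoints in the window that reach the limit.
        return 100.0 * sum(1 for v in values if v >= limit) / len(values)

    if share_at_or_above(critical) >= percentage:
        return "CRITICAL"
    if share_at_or_above(warning) >= percentage:
        return "WARNING"
    return "OK"


# HDFS space used (%), with 70 warning / 80 critical and percentage => 60.
print(threshold_status([65, 72, 75, 71, 68], warning=70, critical=80, percentage=60))
```

With these sample values, three of the five datapoints are at or above 70, so the warning level fires; a single spike above the threshold would not, which is why a value like 60 smooths out short-lived peaks.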
[10:39:36] joal: --^
[10:39:52] (define monitoring::graphite_threshold()
[10:40:00] ok
[10:42:46] Analytics-Kanban, Patch-For-Review, User-Elukey: Monitor more HDFS health metrics - https://phabricator.wikimedia.org/T163908#3403899 (elukey) a:elukey
[10:43:37] Analytics-Kanban, Patch-For-Review, User-Elukey: Monitor more HDFS health metrics - https://phabricator.wikimedia.org/T163908#3214017 (elukey)
[10:44:05] * elukey sees that joal does not trust him
[10:47:35] joal: I'd go to a quick lunch if you don't mind, but if you are about to make big changes I'll wait
[10:51:18] (PS1) Joal: Bump mediawiki sqoop orm jar to 0.0.2 [analytics/refinery] - https://gerrit.wikimedia.org/r/363163
[10:51:26] elukey: trying that --^
[10:51:39] elukey: once done with that, I'll deploy :)
[10:52:01] elukey: If you can just double check space on stat1002 before you leave, that'll be ok :)
[10:52:54] elukey: Ah, I had not seen your previous message: I obviously don't trust you when you use names that are weird :-P
[10:55:05] ahahahahah
[10:55:25] stat1002 looks ok
[10:55:33] I'll wait for the end of the deployment :)
[10:55:46] thanks elukey
[10:56:02] (CR) Joal: [V: 2 C: 2] "Self merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/363163 (owner: Joal)
[10:57:05] !log Deploying refinery with scap
[10:57:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:57:34] elukey: I DIDN'T forget the scap message :)
[10:58:57] \o/
[11:03:12] elukey: success in scap deploy :)
[11:03:34] super
[11:03:37] !log Deploying refinery onto hdfs
[11:03:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:03:53] elukey: please have lunch, we'll merge and check druid deletion patch when you're back
[11:07:10] joal: we can merge it now, no problem, I thought you were doing other things!
[11:08:20] elukey: deployment done, I'm restarting failed uniques job, we can merge druid-deletion patch
[11:08:29] all right, merging
[11:08:37] elukey: Next thing (after lunch and my afternoon break) is sqoop regular fail
[11:08:42] thanks
[11:10:02] joal: wasn't the fix deployed with the refinery?
[11:10:03] !log Restart unique_devices-per_project_family-monthly-coord after correction deployed
[11:10:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:11:32] joal: merged!
[11:11:46] elukey: right - let's wait for something to break ;)
[11:13:37] elukey: okey, uniques monthly job seems better, hopefully druid-data-deletion will be as well
[11:13:56] elukey: puppet has run, crontab is updated
[11:14:22] super
[11:14:25] elukey: running the cron command manually to check
[11:14:27] I was running it now
[11:14:31] heheh
[11:14:32] (puppet)
[11:15:32] elukey: I messed up by not calling python explicitly on the cron command :(
[11:15:47] elukey: I'm very sorry
[11:16:14] joal: ah it is not in the shebang?
[11:16:26] no it is
[11:16:27] it is, but file is not executable :(
[11:17:29] ah right, I forgot to double check.. lemme see how to fix it
[11:18:37] (PS1) Joal: Add execution right to druid data drop script [analytics/refinery] - https://gerrit.wikimedia.org/r/363166
[11:18:43] elukey: --^
[11:19:12] * joal hides in sadness
[11:19:19] yeah just saw the others, I thought it was managed via puppet
[11:19:34] come ooonnn if these are all the major issues for a deployment we are really good :)
[11:19:44] :)
[11:20:03] let's merge and deploy!
[11:20:06] yup
[11:20:21] elukey: I'll merge and deploy, can you check space on stat1002?
[11:20:43] already done, it is good
[11:20:46] (CR) Joal: [V: 2 C: 2] "Merging for redeploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/363166 (owner: Joal)
[11:21:18] !log Redeploying refinery with scap
[11:21:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:26:47] looks good now
[11:27:35] elukey: just checked: looks good indeed :)
[11:27:45] Thanks mate for having waited
[11:27:57] you are super welcome
[11:28:01] We'll take care of sqoop later on after standup if you agree
[11:28:05] sure
[11:28:10] Awesome :)
[11:28:15] Enjoy your lunch !
[11:28:17] didn't get what is missing to do, I thought you fixed it?
[11:28:31] hm elukey - I think that's all :)
[11:28:48] elukey: I fixed some aspect of it, but there still is some other issue
[11:29:15] ahhhhhh
[11:29:21] all right :)
[11:30:09] I think emmanuel is doing stuff on the DBs, and I wonder if this is not what makes the jobs fail
[11:30:17] We'll discuss that later :)
[11:41:52] * elukey lunch!
[14:03:02] hellooo
[14:04:42] o/
[14:21:53] joal: I forgot, do you want to change the yarn queues?
[15:00:30] joal teaaaaaaaaam europe!
[15:02:54] elukey: updating the yarn queues would be great !
[15:03:08] joal: do you want to do it after standup?
[15:05:22] sounds good elukey
[15:17:37] (we are going to do it tomorrow to properly drain the cluster first)
[15:28:49] (PS1) Addshore: Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363202
[15:32:49] (PS2) Addshore: Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363202
[15:33:05] (PS1) Addshore: Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363203
[15:33:12] (CR) Addshore: [C: 2] Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363202 (owner: Addshore)
[15:33:15] (CR) Addshore: [C: 2] Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363203 (owner: Addshore)
[15:33:20] (Merged) jenkins-bot: Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363202 (owner: Addshore)
[15:33:25] (Merged) jenkins-bot: Remove revisionslider form bf list [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363203 (owner: Addshore)
[15:51:06] (PS1) Addshore: Remove 2 betafeatures that are not in wgBetaFeaturesWhitelist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363208
[15:51:18] (CR) Addshore: [C: 2] Remove 2 betafeatures that are not in wgBetaFeaturesWhitelist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363208 (owner: Addshore)
[15:51:23] (PS1) Addshore: Remove 2 betafeatures that are not in wgBetaFeaturesWhitelist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363209
[15:51:26] (Merged) jenkins-bot: Remove 2 betafeatures that are not in wgBetaFeaturesWhitelist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363208 (owner: Addshore)
[15:51:28] (CR) Addshore: [C: 2] Remove 2 betafeatures that are not in wgBetaFeaturesWhitelist [analytics/wmde/scripts] -
https://gerrit.wikimedia.org/r/363209 (owner: Addshore)
[15:51:36] (Merged) jenkins-bot: Remove 2 betafeatures that are not in wgBetaFeaturesWhitelist [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/363209 (owner: Addshore)
[17:18:56] * elukey off!
[17:51:54] Analytics-Tech-community-metrics, Developer-Relations (Jul-Sep 2017): Find out (and fix) why we have a higher number of identity entries than before switching to new Bitergia DB scheme - https://phabricator.wikimedia.org/T168217#3405259 (Aklapper) Open>Resolved I found a bunch of `uuid`s (like `c...
[17:59:09] Quarry: Provide a way to hyperlink texts - https://phabricator.wikimedia.org/T74874#771075 (Framawiki) [[ https://phabricator.wikimedia.org/T74874 | Asked by a user here ]]
[18:01:04] Analytics, MediaWiki-API, RESTBase-API, Services: Top API user agents stats - https://phabricator.wikimedia.org/T142139#3405280 (Tgr) >>! In T142139#2927394, @Tgr wrote: > ApiAction is collected via varnishkafka and does include cached requests. Stats aggregation is T137321. Actually, ApiAction...
[18:10:05] Guys, looks like I'm going to be off from now on - Not sure yet, but looks like it's on its way
[18:17:42] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3405328 (mforns) Note that the records were not duplicated into tables. They were misdirected into a wr...
[18:28:36] joal, !!!!!!!!!!!!!
[18:28:38] :D
[19:04:24] joal: \o/
[19:10:45] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3405392 (kaldari) Since the records were misdirected rather than split into both tables, I think it wou...
[19:50:21] Analytics: Bucketize userEditCount in EL instrumentation - https://phabricator.wikimedia.org/T169672#3405428 (mforns)
[19:53:30] Analytics: Bucketize userEditCount in EL instrumentation - https://phabricator.wikimedia.org/T169672#3405455 (mforns)
[19:59:41] Analytics: Fix EventLogging editCountBucket fields historically - https://phabricator.wikimedia.org/T169674#3405456 (mforns)
[22:45:05] Quarry: Provide a way to hyperlink texts - https://phabricator.wikimedia.org/T74874#3405668 (MZMcBride) >>! In T74874#3405277, @Framawiki wrote: > [[ https://phabricator.wikimedia.org/T74874 | Asked by a user here ]] Wrong link?