[01:52:58] Analytics-Kanban, Easy: Investigate requests flagged as pageview in analytics header coming from bots - https://phabricator.wikimedia.org/T135251#3795244 (Nuria)
[01:57:21] Analytics-Kanban, Easy: Investigate requests flagged as pageview in analytics header coming from bots - https://phabricator.wikimedia.org/T135251#3795247 (Nuria) I could not find any requests from googlebot in couple days I spot-checked in October, now requests marked with "pageviews=1" in x-analytics do...
[01:58:38] Analytics: Fix EventLogging editCountBucket fields historically - https://phabricator.wikimedia.org/T169674#3795250 (Nuria)
[01:59:36] Analytics-Cluster, Analytics-Kanban: Beeline does not print full stack traces when a query fails - https://phabricator.wikimedia.org/T136858#3795251 (Nuria) a:Milimetric>None
[02:04:36] Analytics-Cluster, Analytics-Kanban: Beeline does not print full stack traces when a query fails - https://phabricator.wikimedia.org/T136858#3795260 (Nuria) a:Nuria
[03:09:45] (PS4) Milimetric: [WIP] Work so far on simplifying and fixing breakdowns [analytics/wikistats2] - https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556)
[03:09:53] (CR) jerkins-bot: [V: -1] [WIP] Work so far on simplifying and fixing breakdowns [analytics/wikistats2] - https://gerrit.wikimedia.org/r/391490 (https://phabricator.wikimedia.org/T180556) (owner: Milimetric)
[03:11:19] fdans: it works, yay! :) Still rough around the edges and I've only fixed the line graph, but it works in firefox
[09:21:47] https://github.com/wikimedia/operations-software-druid_exporter
[09:21:51] \o/
[09:36:10] hehe elukey :)
[09:38:32] (PS2) Joal: Grow RestbaseMetrics spark job to count MW API [analytics/refinery/source] - https://gerrit.wikimedia.org/r/392700 (https://phabricator.wikimedia.org/T176785)
[09:39:08] (CR) Joal: [C: 2] Update MR cassandra loader to local quorum write [analytics/refinery/source] - https://gerrit.wikimedia.org/r/392623 (owner: Joal)
[09:40:41] so kafka1018 is still down
[09:40:54] not sure if you followed the mess that happened yesterday joal
[09:40:58] https://phabricator.wikimedia.org/T181518
[09:41:10] wow, no I didn't
[09:41:19] thanks for letting me know
[09:42:13] elukey: is the cluster overwhelmed?
[09:42:34] Analytics-Kanban: Fix mediawiki-history page reconstruction bug (deletes and restores) - simple patch - https://phabricator.wikimedia.org/T179690#3733045 (JAllemandou)
[09:42:52] nono it can take one host down, it is only complaining about partitions not fully replicated
[09:42:54] (Merged) jenkins-bot: Update MR cassandra loader to local quorum write [analytics/refinery/source] - https://gerrit.wikimedia.org/r/392623 (owner: Joal)
[09:43:00] right
[09:43:33] elukey: I was asking since there seems to be a plan to quickly replace k18 with a new spare
[09:43:48] I hope there is! :D
[09:43:58] huhu
[09:44:04] need to ask to Chris/Rob/Faidon about the options
[09:44:14] worst case scenario we'll use notebook1002
[09:44:54] we reimage it and let it take kafka1018's spot
[09:45:02] elukey: my wonder is about how long it'll take to fix k18 - if not so long, maybe we should just operate with -1 node instead of spending a lot of time reinstantiating a new one?
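elukey's point that the Kafka cluster "can take one host down" and is "only complaining about partitions not fully replicated" can be illustrated with a minimal sketch. It assumes the usual Kafka model: each partition has an assigned replica set and an in-sync replica set (ISR); a partition is under-replicated when its ISR is smaller than its replica set, and with `acks=all` producers it stays writable as long as the ISR still has at least `min.insync.replicas` members. The topic name, broker ids and `min_insync=2` below are hypothetical, not the real cluster's settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    topic: str
    partition: int
    replicas: List[int]  # broker ids assigned to this partition
    isr: List[int]       # broker ids currently in sync


def drop_broker(partitions: List[Partition], broker: int) -> List[Partition]:
    """Simulate a broker going down: it falls out of every ISR (assignments are unchanged)."""
    return [
        Partition(p.topic, p.partition, p.replicas, [b for b in p.isr if b != broker])
        for p in partitions
    ]


def under_replicated(partitions: List[Partition]) -> List[Tuple[str, int]]:
    """Partitions whose ISR is smaller than their assigned replica set."""
    return [(p.topic, p.partition) for p in partitions if len(p.isr) < len(p.replicas)]


def unwritable(partitions: List[Partition], min_insync: int) -> List[Tuple[str, int]]:
    """Partitions that would reject acks=all writes (ISR below min.insync.replicas)."""
    return [(p.topic, p.partition) for p in partitions if len(p.isr) < min_insync]


if __name__ == "__main__":
    # Hypothetical 3-way replicated topic spread over four brokers.
    parts = [
        Partition("example_topic", 0, [1, 2, 3], [1, 2, 3]),
        Partition("example_topic", 1, [2, 3, 4], [2, 3, 4]),
        Partition("example_topic", 2, [3, 4, 1], [3, 4, 1]),
    ]
    after = drop_broker(parts, 1)
    print(under_replicated(after))          # [('example_topic', 0), ('example_topic', 2)]
    print(unwritable(after, min_insync=2))  # []
```

Losing one broker out of three replicas leaves every partition with two in-sync copies, so the cluster keeps serving writes while reporting under-replication, which matches what the log describes.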
[09:45:06] just a dumb idea
[09:45:23] nono it is a good idea, the option is on the table too
[09:45:30] ok :)
[09:45:39] but we have never done it before
[09:51:01] joal: another thing - I am half way done with the puppet refactoring to allow prometheus exporters in hadoop, after that I'd need to reboot the whole cluster for kernel+jvm updates
[09:51:25] elukey: no problem for me
[09:51:32] elukey: many thanks for doing that :)
[09:51:48] elukey: by the way - do you want me to showcase you warp10?
[09:52:21] joal: if you don't mind later on I'd be interested! Not a lot of time now :(
[09:52:36] I know the feeling :)
[12:05:17] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3796172 (elukey)
[12:05:54] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3741372 (elukey) stalled>Open
[12:12:38] +HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xms2048m -javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar=10.64.53.19:51010:/etc/hadoop/prometheus_hdfs_datanode_jmx_exporter.yaml"
[12:12:42] ALMOST!
[12:12:42] \o/
[12:12:55] now I need to convince andrew about this puppet code :P
[12:13:28] I also discovered that we don't have jvm metrics for the journalnodes
[12:13:32] * elukey cries in a corner
[12:21:27] * elukey lunch
[13:15:28] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 2 others: Update node-rdkafka version to v2.x - https://phabricator.wikimedia.org/T176126#3796314 (mobrovac)
[13:17:42] Analytics, ChangeProp, EventBus, Reading-Infrastructure-Team-Backlog, and 2 others: Update node-rdkafka version to v2.x - https://phabricator.wikimedia.org/T176126#3796318 (mobrovac) @faidon back-ported [librdkafka v0.11.1](https://people.wikimedia.org/~faidon/librdkafka-0.11.1/). Next step invol...
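The `HADOOP_DATANODE_OPTS` line elukey pasted attaches the Prometheus JMX exporter as a Java agent. As I understand the exporter's documentation, the agent argument after `=` has the form `[host:]port:config.yaml`, so here it would listen on 10.64.53.19, port 51010, and read its rules from the YAML file. The sketch below only splits such a `-javaagent` flag into its parts; `parse_jmx_agent` and `JmxAgent` are hypothetical helpers written for illustration, they do not handle IPv6 hosts, and they assume the config path contains no colon.

```python
from typing import NamedTuple, Optional


class JmxAgent(NamedTuple):
    jar: str
    host: Optional[str]  # None when the argument gives only a port
    port: int
    config: str


def parse_jmx_agent(flag: str) -> JmxAgent:
    """Split '-javaagent:<jar>=[host:]<port>:<config>' into its parts (IPv4/hostname only)."""
    prefix = "-javaagent:"
    if not flag.startswith(prefix):
        raise ValueError(f"not a -javaagent flag: {flag!r}")
    # Everything before the first '=' is the jar path, the rest is the agent argument.
    jar, sep, args = flag[len(prefix):].partition("=")
    if not sep:
        raise ValueError("missing '=' between jar path and agent arguments")
    parts = args.split(":")
    if len(parts) == 3:
        host, port, config = parts
    elif len(parts) == 2:
        host, (port, config) = None, parts
    else:
        raise ValueError(f"unexpected agent arguments: {args!r}")
    return JmxAgent(jar, host, int(port), config)


if __name__ == "__main__":
    flag = ("-javaagent:/usr/share/java/prometheus/jmx_prometheus_javaagent.jar="
            "10.64.53.19:51010:/etc/hadoop/prometheus_hdfs_datanode_jmx_exporter.yaml")
    agent = parse_jmx_agent(flag)
    print(agent.host, agent.port)  # 10.64.53.19 51010
```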
[14:01:58] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3796406 (elukey)
[14:28:31] joal: I'd need to reboot all the druid hosts.. do you have a min to coordinate/brain-bounce them on irc ?
[14:28:43] Hi elukey
[14:28:52] give me a minute and I'll be here
[14:29:57] sure, thanks!
[14:31:40] Heya elukey
[14:31:43] ready now
[14:32:54] soooo
[14:33:15] (PS1) Fdans: [wip] Add pageview by country oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521)
[14:33:28] I'd say to start with druid100[456]
[14:33:45] for each host, stop all the daemons, reboot, check that they are up
[14:34:19] mw history load jobs should be fine right? Didn't check but they should be once a day no?
[14:35:12] (PS2) Fdans: [wip] Add pageview by country oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521)
[14:35:27] elukey: no automatic loading currently on [456], only manual - so no problem
[14:35:51] ah nice
[14:38:32] rebooting druid1004 now
[14:44:05] Shilad what do you ned?
[14:44:07] need?
[14:46:53] (PS3) Fdans: [wip] Add pageview by country oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/394062 (https://phabricator.wikimedia.org/T181521)
[15:00:19] joal: do you have a couple of minutes to talk about the grouping sets trickaroo?
[15:00:35] I do fdans
[15:00:54] à la batcaveaux!
[15:00:59] :d
[15:08:11] joal: lost you on hangouts!
[15:26:29] will be a bit late for ops sync people, sorry! (some phone calls to make)
[15:28:37] (PS5) Joal: Update mediawiki-history page reconstruction [analytics/refinery/source] - https://gerrit.wikimedia.org/r/388267 (https://phabricator.wikimedia.org/T179074)
[15:28:39] (PS1) Joal: Update mediawiki-history page reconstruction - 2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/394068
[15:29:23] milimetric: Hello :) If you have a minute, I'd love a quick CR on the patches above - before I go and test them this evening
[15:29:58] milimetric: idea is not to be able to merge, rather to ensure I have not be blind on some stuff
[15:30:04] will look in a few minutes joal
[15:31:05] thanks :)
[15:57:11] joal: to be honest, I couldn't recreate the problems and evaluate the code very well in my mind. But nothing seems obviously wrong and I really love the explanation of the two situations you're solving in the comments above the new tests. So good.
[15:57:27] to do a proper review I would need a lot more time
[15:57:46] makes sense milimetric
[15:58:05] but the first two are good, one I had reviewed before and it didn't change much, and the other looks good
[15:58:27] Only choice that differs from what we discussed yesterday is the update of create-event-startTimestamp
[15:58:36] It made life way easier in linking after
[16:00:34] ping milimetric
[16:03:21] aha, I see joal: https://gerrit.wikimedia.org/r/#/c/394068/1/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/page/PageHistoryBuilder.scala@362
[16:06:24] (CR) Milimetric: [C: 2] Update mediawiki-history page reconstruction - 2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/394068 (owner: Joal)
[16:06:28] (CR) Milimetric: [C: 2] Update mediawiki-history page reconstruction [analytics/refinery/source] - https://gerrit.wikimedia.org/r/388267 (https://phabricator.wikimedia.org/T179074) (owner: Joal)
[16:10:49] (Merged) jenkins-bot: Update mediawiki-history page reconstruction [analytics/refinery/source] - https://gerrit.wikimedia.org/r/388267 (https://phabricator.wikimedia.org/T179074) (owner: Joal)
[16:10:51] (Merged) jenkins-bot: Update mediawiki-history page reconstruction - 2 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/394068 (owner: Joal)
[16:32:47] nuria_: can you invite me to SoS?
[16:40:00] fdans: done
[16:40:42] Analytics-Dashiki, Analytics-Kanban: Browser-major numbers represented as a percentage (when they're not) on the desktop tabular view - https://phabricator.wikimedia.org/T181164#3796956 (Nuria) Open>Resolved
[16:41:10] Analytics-Kanban, Easy: Investigate requests flagged as pageview in analytics header coming from bots - https://phabricator.wikimedia.org/T135251#3796960 (Nuria) Open>Resolved
[16:41:31] Analytics, DBA, Patch-For-Review: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#3796962 (Nuria)
[16:41:34] Analytics-Kanban, Patch-For-Review, User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3796961 (Nuria) Open>Resolved
[16:41:39] elukey: got a second to cave?
[16:43:36] fdans: sure
[16:48:31] Analytics, ORES, Scoring-platform-team: Enable ores::base on stat1006 - https://phabricator.wikimedia.org/T181646#3796974 (Halfak)
[16:48:38] o/
[16:48:56] I can't remember why ores::base isn't already on stat1006
[16:52:26] halfak: different os?
[16:52:44] halfak: that might not support the install?
[16:52:49] halfak: total wild guess
[16:52:53] halfak: might be wrong 100%
[16:53:09] It could be. We're aiming to support stretch in our new cluster so that's a short-term problem at worst.
[16:53:32] was it supposed to be on stat1006? :)
[16:54:16] stat1006 would be a better place to do ORES work.
[16:54:38] Because it's somewhat under-utilized and we're not going to be using Spark/Hive any time soon.
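The Druid reboot plan agreed on earlier in the log (start with druid100[456]; for each host, stop all the daemons, reboot, check that they are up) is a plain rolling restart: one host at a time, halting as soon as a host does not come back, so at most one node is ever out of service. The sketch below captures only that control flow. The `stop_daemons`, `reboot` and `daemons_up` callables are hypothetical placeholders for whatever tooling is actually used, and the retry count and wait interval are arbitrary.

```python
import time
from typing import Callable, Iterable, List


def rolling_reboot(
    hosts: Iterable[str],
    stop_daemons: Callable[[str], None],
    reboot: Callable[[str], None],
    daemons_up: Callable[[str], bool],
    checks: int = 30,
    wait: Callable[[], None] = lambda: time.sleep(10),
) -> List[str]:
    """Reboot hosts one at a time; halt before the next host if one fails to recover."""
    done: List[str] = []
    for host in hosts:
        stop_daemons(host)
        reboot(host)
        # Poll until the daemons report healthy, waiting between attempts.
        for _ in range(checks):
            if daemons_up(host):
                break
            wait()
        else:
            # Never saw the daemons come back: stop so only this host is degraded.
            raise RuntimeError(f"{host}: daemons not up after reboot, halting")
        done.append(host)
    return done
```

In a dry run the three callables can simply print or record what they would do, which makes the ordering easy to check before touching real hosts.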
[16:55:07] I'm setting up on stat1005 right now though to see if it'll work.
[16:58:58] halfak: probably because it was only added for hadoop stuff, and stat1006 is not a hadoop node
[16:59:02] so we can probably include that easily there
[16:59:15] \o/
[16:59:17] halfak: i betcha it is on stat1004 though
[16:59:19] which is jessie
[16:59:26] since stat1004 is a hadoop node
[16:59:26] I'll utilize stat1005 :)
[16:59:32] Jessie is going down.
[16:59:33] k
[16:59:38] But I could use it for now.
[17:16:12] joal: https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=druid_analytics&var-instance=All
[17:16:19] I forgot to mention it to you
[17:16:34] it is kinda like the old Ganglia overview
[17:16:38] that you liked a lot :)
[17:16:57] Thanks elukey :)
[17:32:01] nuria_ milimetric i’ll be in in 2 min, had to go out a second
[17:33:40] ottomata: I heard a rumor there is a GPU on one of the stats servers. Could you tell me more?
[17:50:14] * elukey off!
[17:52:10] Shilad: https://phabricator.wikimedia.org/T148843
[17:53:21] Shilad: especially more recent comments. tl;dr, if you can figure out how to use it, you will be a hero
[19:04:34] Hah. Thank you! I won't get to it for a few weeks, but I'm very interested.
[19:04:46] ottomata: ^
[19:06:45] :)
[20:30:05] heads-up: the web team plans to re-enable the popups instrumentation for another week (to record data from a new version of the schema) https://phabricator.wikimedia.org/T181493
[20:43:32] ottomata: Just scanning through the phab ticket on GPUs and noticed it's an AMD GPU. That definitely makes tensorflow much trickier.
[21:19:32] !log rerun webrequest-druid-hourly-wf-2017-11-29-18
[21:19:33] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:36:57] joal: me learning about hive rank
[21:37:02] joal: so many things
[21:38:05] nuria_: by hive rank you probably mean SQL-analytics functions: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
[21:46:03] !log rerun pageview-druid-hourly-wf-2017-11-29-18 and pageview-druid-hourly-wf-2017-11-29-19
[21:46:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
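For reference, the Hive windowing functions joal linked include `ROW_NUMBER()`, `RANK()` and `DENSE_RANK()`, used as e.g. `RANK() OVER (PARTITION BY project ORDER BY views DESC)`. The minimal Python sketch below mimics their semantics to show how they treat ties: `ROW_NUMBER` is always unique, `RANK` gives tied rows the same value and leaves a gap afterwards, and `DENSE_RANK` gives ties the same value without a gap. It is an illustration only, not how Hive evaluates them, and the `(project, page, views)` rows are made-up sample data.

```python
from itertools import groupby
from typing import Any, Callable, List, Tuple


def window_ranks(
    rows: List[Any],
    partition_key: Callable[[Any], Any],
    order_key: Callable[[Any], Any],
    descending: bool = True,
) -> List[Tuple[Any, int, int, int]]:
    """Return (row, row_number, rank, dense_rank) per row, like f() OVER (PARTITION BY .. ORDER BY ..)."""
    out = []
    # groupby only merges adjacent rows, so sort by the partition key first.
    for _, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        ordered = sorted(group, key=order_key, reverse=descending)
        rank = dense_rank = 0
        previous = object()  # sentinel that never equals a real sort value
        for row_number, row in enumerate(ordered, start=1):
            value = order_key(row)
            if value != previous:
                rank = row_number  # ties share a rank; the next distinct value skips ahead
                dense_rank += 1    # dense rank only ever advances by one
                previous = value
            out.append((row, row_number, rank, dense_rank))
    return out


if __name__ == "__main__":
    views = [("en", "A", 10), ("en", "B", 10), ("en", "C", 5), ("de", "X", 7)]
    for row, rn, rk, drk in window_ranks(views, lambda r: r[0], lambda r: r[2]):
        print(row, rn, rk, drk)
    # ('de', 'X', 7) 1 1 1
    # ('en', 'A', 10) 1 1 1
    # ('en', 'B', 10) 2 1 1
    # ('en', 'C', 5) 3 3 2
```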