[01:03:41] PROBLEM - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad
[01:17:47] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464#4220038 (Ottomata) Just got another ``` [2018-05-21 00:05:59,778] 2363 [mirrormaker-thread-5]...
[01:20:39] !log bouncing main -> jumbo MirrorMaker with increased max.request.size - T189464
[01:20:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[01:20:43] T189464: Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo - https://phabricator.wikimedia.org/T189464
[01:22:51] RECOVERY - Kafka MirrorMaker main-eqiad_to_jumbo-eqiad average message consume rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 3097 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_jumbo-eqiad
[05:59:44] Hi team - forgot to mention it last week but today is bank holiday in France, I'll be off
[06:24:27] ack!
[06:39:30] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4189390 (SimonPoole) IMHO there are multiple, mainly communications related issues, that continue to lead to confusion * people actually need to read the text of the CC0 licence, you...
[07:28:45] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728#4220248 (Cirdan) >>!
In T193728#4213806, @Micru wrote: >> since I hold the rights to that text > > The concept of "rights" is quite flexible, as shows Wikipedia. The Wikipedias are b...
[07:55:42] (CR) Elukey: "Thanks Dan!" (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/433597 (https://phabricator.wikimedia.org/T194055) (owner: Elukey)
[08:27:45] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#4220305 (Halfak) We talked in IRC and I think @Pchelolo and @Ottomata understand this now. The TLDR is that i...
[08:30:54] (PS2) Elukey: [WIP] Index some fields from isp_data and geocoded_data in Druid's webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/433597 (https://phabricator.wikimedia.org/T194055)
[08:36:08] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Add maxmind ip info to webrequest dataset on druid - https://phabricator.wikimedia.org/T194055#4220340 (elukey) >>! In T194055#4212116, @Nuria wrote: > For webrequest data even city does not seem of much help (given that it is sampled t...
[09:25:22] Analytics, Analytics-Cluster: Archiva web UI formats links to download artifacts with local IP address - https://phabricator.wikimedia.org/T67628#4220443 (elukey) Open>Resolved a:elukey Don't see this issue anymore, closing.
[09:27:28] Analytics: Register refinery-hive UDF functions as stored names, so we don't have to CREATE TEMPORARY FUNCTION every time.
- https://phabricator.wikimedia.org/T95455#4220449 (elukey)
[09:28:22] Analytics: Create HivePartitioner in Camus - https://phabricator.wikimedia.org/T92494#4220450 (elukey)
[09:28:51] Analytics: Admin can perform "Database Auditing" on the cluster to be aware of actions of cluster users - https://phabricator.wikimedia.org/T85288#4220451 (elukey)
[09:29:32] Analytics: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. {pika} - https://phabricator.wikimedia.org/T89397#4220454 (elukey)
[09:29:57] Analytics: MD5 checksums missing from pagecounts-all - https://phabricator.wikimedia.org/T73710#4220455 (elukey)
[09:32:10] Analytics: Make oozie write external stats per default - https://phabricator.wikimedia.org/T70569#4220457 (elukey)
[09:32:33] Analytics: Make oozie use system's libpath per default - https://phabricator.wikimedia.org/T70568#4220458 (elukey)
[09:32:54] Analytics: System deletes non-sanction tables older than 90 days - https://phabricator.wikimedia.org/T67949#4220459 (elukey)
[09:33:16] Analytics: pagecount for api.php - https://phabricator.wikimedia.org/T47121#4220460 (elukey)
[09:33:31] Analytics, Operations, Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#4220461 (elukey)
[09:34:25] Analytics, Analytics-Cluster, Operations, Packaging: libcglib3-java replaces libcglib-java in Jessie - https://phabricator.wikimedia.org/T137791#4220464 (elukey) Open>declined
[09:35:45] Analytics, Analytics-Cluster: Queries in Hue always return an empty result set - https://phabricator.wikimedia.org/T128039#4220468 (elukey) Open>declined Given how old this task is, I suppose that there is not much to do. Please re-open if necessary!
[09:36:53] Analytics, Analytics-Cluster: hive shows a flood of warnings on content_type LIKE ''..."
- https://phabricator.wikimedia.org/T113014#4220473 (elukey) Open>declined Closing super stale tasks in the Analytics-Cluster queue. This one doesn't seem to be an issue anymore, but please re-open if necessary.
[09:37:29] Analytics: Install SparkR on the cluster - https://phabricator.wikimedia.org/T102485#4220476 (elukey)
[09:37:54] Analytics: Identify mobile apps as a separate agent_type - https://phabricator.wikimedia.org/T96324#4220477 (elukey)
[09:39:08] Analytics: Augment refinery-dump-status-webrequest-partitions script to show more useful status of webrequest raw partitions - https://phabricator.wikimedia.org/T92499#4220480 (elukey)
[09:43:05] Did some cleanup :P
[10:30:30] Analytics, Analytics-Wikistats: Beta: Y-axis units and rounding issues - https://phabricator.wikimedia.org/T187429#4220539 (sahil505) Just to put it up, the #2 issue is not specific to just line charts but it is also observed in bar charts. Here is an image:{F18421670}
[10:31:05] Analytics, Analytics-Wikistats: Beta: Y-axis units and rounding issues - https://phabricator.wikimedia.org/T187429#4220547 (sahil505)
[10:31:59] * elukey lunch + errand!
[10:50:16] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:20:27] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on stat1005 is OK: OK
[12:54:29] (PS1) Sahil505: Corrected y-axis labels to one decimal place to avoid similar labels [analytics/wikistats2] - https://gerrit.wikimedia.org/r/434326 (https://phabricator.wikimedia.org/T187429)
[13:00:18] (CR) Sahil505: "> Uploaded patch set 1." [analytics/wikistats2] - https://gerrit.wikimedia.org/r/434326 (https://phabricator.wikimedia.org/T187429) (owner: Sahil505)
[13:58:03] o/
[13:58:56] ottomata: o/
[14:22:04] ottomata elukey ok for me to deploy refinery source?
[14:24:43] (PS1) Fdans: Update changelog.md for v0.0.64 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434340
[14:27:08] fdans: sure I don't see any issue.. qs: did you triple check that all the patches have been merged?
[14:28:03] elukey: everything that I'm listing on the changelog has been merged afaik
[14:28:06] super
[14:28:23] also, do we need to do some follow ups after the deploy? Job restart, etc..
[14:28:26] ?
[14:29:24] elukey: my ua_parser change affects the parsing of the user agent strings, so I imagine so, right?
[14:29:55] also I have to send an email to the analytics list about the changes
[14:30:40] sure, I didn't follow the ua_parser change but let's chat during standup about what are the follow ups after the deploy
[14:30:46] but you can proceed if you want!
[14:33:08] elukey: thank you! merging changelog
[14:35:15] elukey: I'll be in the batcave for anything :)
[14:36:10] (CR) Fdans: [C: 2] Update changelog.md for v0.0.64 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434340 (owner: Fdans)
[14:40:28] !log Deploying refinery-source v0.0.64 using Jenkins
[14:40:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[15:07:10] (PS1) Fdans: Bump up version of jar versions in oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/434346
[15:25:05] Analytics: Install SparkR on the cluster - https://phabricator.wikimedia.org/T102485#4220921 (Ottomata) Open>Resolved a:Ottomata This is part of spark2.
[15:25:24] Analytics: Identify mobile apps as a separate agent_type - https://phabricator.wikimedia.org/T96324#4220925 (Milimetric) Open>declined access_method contains this information
[15:25:53] Analytics: Make oozie write external stats per default - https://phabricator.wikimedia.org/T70569#4220927 (Ottomata) Open>declined
[15:26:07] Analytics: Make oozie use system's libpath per default - https://phabricator.wikimedia.org/T70568#4220929 (Ottomata) Open>declined
[15:27:53] Analytics: Create HivePartitioner in Camus - https://phabricator.wikimedia.org/T92494#4220934 (Ottomata) Open>declined Not going to do this, hope to move away from Camus one day.
[15:29:34] Analytics: Admin can perform "Database Auditing" on the cluster to be aware of actions of cluster users - https://phabricator.wikimedia.org/T85288#4220941 (Milimetric) Open>declined
[15:30:30] Analytics: System deletes non-sanction tables older than 90 days - https://phabricator.wikimedia.org/T67949#4220944 (Milimetric) Open>Invalid
[15:31:12] Analytics: Register refinery-hive UDF functions as stored names, so we don't have to CREATE TEMPORARY FUNCTION every time. - https://phabricator.wikimedia.org/T95455#4220947 (Milimetric) p:Low>Lowest
[15:32:49] Analytics: Augment refinery-dump-status-webrequest-partitions script to show more useful status of webrequest raw partitions - https://phabricator.wikimedia.org/T92499#4220950 (Milimetric) Open>declined We usually need to look further if the email flags any problems, and it doesn't happen very often
[15:34:59] Analytics: pagecount for api.php - https://phabricator.wikimedia.org/T47121#4220953 (Milimetric) Open>Resolved a:Milimetric This is covered by the Pageview definition, and the pageview dumps, https://meta.wikimedia.org/wiki/Research:Page_view
[15:59:34] ottomata: https://gerrit.wikimedia.org/r/434350 - anything against it?
Just reimaged one druid node in labs and realized that without use_cdh the option is missing :(
[16:00:36] oh! yes.
[16:00:37] sounds good
[16:03:16] Analytics, EventBus: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778#4220999 (Ottomata)
[16:03:29] thanks!
[16:10:38] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#4221008 (Ottomata)
[16:11:26] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3319947 (Ottomata)
[16:11:49] Analytics, EventBus, ORES, Reading-Infrastructure-Team-Backlog, and 3 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3319947 (Ottomata)
[16:17:51] Analytics, Android-app-Bugs, Wikipedia-Android-App-Backlog: Count link previews on the Android app - https://phabricator.wikimedia.org/T194961#4221017 (LGoto) Hi @mpopov Is this for you?
[16:30:55] Analytics, Android-app-Bugs, Wikipedia-Android-App-Backlog: Count link previews on the Android app - https://phabricator.wikimedia.org/T194961#4221056 (mpopov) >>! In T194961#4221017, @LGoto wrote: > Hi @mpopov Is this for you?
Analytics Engineering as far as I know
[16:34:36] ottomata: sooo I'm also deploying this https://github.com/wikimedia/analytics-refinery-source/commit/444cec0ee73d952e224b065b2a16341eb3390bda
[16:35:01] which affects the refinery-job jar, but I'm not sure which versions should be bumped up
[16:36:25] fdans: that isn't run as an oozie
[16:36:26] job
[16:36:31] it's scheduled by cron in puppet
[16:36:32] so
[16:36:34] we can do that later
[16:36:37] and watch it more carefully
[16:36:45] the cron should use the old jar for now
[16:37:06] ohh ok, otherwise the only occurrence I find is in webrequest load
[16:38:42] (PS2) Fdans: Bump up version of jar versions in oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/434346
[16:38:45] great
[16:38:47] sounds right to me
[16:39:01] (CR) Ottomata: [C: 2] Bump up version of jar versions in oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/434346 (owner: Fdans)
[16:39:03] (CR) Ottomata: [V: 2 C: 2] Bump up version of jar versions in oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/434346 (owner: Fdans)
[16:46:04] !log deploying refinery
[16:46:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:46:26] joal: o/ - are you working with KIS on d-1/hadoop-coordinator now?
[17:05:35] ottomata: one side effect of not having the hadoop-cdh extension in druid is that in the ingestion task the name node host domain is needed, as opposed to simply the name of the hadoop cluster (the one stated in /etc/hadoop/conf/etc..)
[17:05:50] just discovered it now
[17:07:42] Analytics, Analytics-Kanban, User-Elukey: Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4221075 (Ottomata)
[17:07:53] Analytics, Analytics-Kanban, User-Elukey: Reimage stat1004 with Debian Stretch - https://phabricator.wikimedia.org/T192640#4145568 (Ottomata) a:elukey>Ottomata
[17:12:10] elukey: i think i missed something, we can't use the cdh extension anymore?
[17:12:25] (or forgot something)
[17:16:46] yes do you remember that we discussed Druid bumping the hadoop-client to 2.7.3 + some jackson issues that were solved by not using the cdh extension with 0.11?
[17:16:57] Analytics, EventBus: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778#4221089 (Ottomata)
[17:17:01] YES
[17:17:01] ok
[17:17:15] don't think i made the connection between that and not using hadoop extension
[17:17:16] hm.
[17:17:23] really? changing that means we need to provide that config?
[17:17:28] elukey: maybe we just need to manually set HADOOP_HOME
[17:17:30] or something?
[17:17:32] HADOOP_CONF_DIR
[17:17:33] ?
[17:18:14] I think that if we provide the config it will work
[17:18:21] i don't think that the reading of e.g. hdfs-site.xml for the HA cluster name is a cloudera specific thing
[17:18:26] it just needs to know where to look ya?
[17:18:27] ya ok
[17:18:38] going to check a couple of things
[17:18:44] thanks for the brain bounce :)
[17:24:24] ottomata: to log in to hue I need my labs login right? It's not working for moi
[17:25:23] fdans: did you try with your wikitech creds?
[17:25:44] elukey: yessir
[17:25:59] fdans: shell username
[17:26:03] is this the first time that you login ?
[17:26:03] ldap/labs pw
[17:26:43] elukey: I remember logging in to hue ages ago, so it shouldn't be
[17:27:52] I just synced your account in hue anyway
[17:28:10] user fdans
[17:28:15] are you able to access now
[17:28:15] ?
[17:28:34] yessss elukey thank you
[17:35:38] ottomata: I'm going to restart the webrequest load bundle, can we hang out for a couple mins in the batcave?
[17:35:54] sure, gimme 10 mins?
[17:35:55] maybe less?
[17:36:16] sure! I'll be in there
[18:01:49] Analytics, EventBus, Patch-For-Review: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778#4221138 (Ottomata)
[18:02:04] Analytics, EventBus, Patch-For-Review: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778#4179461 (Ottomata)
[18:03:48] Analytics, Analytics-Kanban, EventBus, Patch-For-Review: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778#4179461 (Ottomata)
[18:16:57] ottomata: HADOOP_CONF_DIR did the trick! \o/
[18:17:07] yes!
[18:17:07] :)
[18:17:53] tomorrow I'll amend the code change for the druid 0.11, so everything will work even if we reimage
[18:17:56] thanks!
[18:18:15] going off team, talk with you tomorrow o/
[18:22:48] ottomata: let's merge this thing? don't know how you feel about deploying now or waiting
[18:22:48] https://gerrit.wikimedia.org/r/#/c/427624/
[18:23:55] fdans: sure let's do it!
[18:24:00] gimme 5 mins to finish an email :)
[18:24:17] I'll be in the batcaviar
[18:46:44] !log restarting eventlogging with python-ua-parser 0.8.0
[18:46:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:13:08] (PS1) Fdans: Add nyc.wikimedia and mai.wikimedia to the whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/434370
[19:16:38] !log restarted eventlogging file log consumers with new consumer groups beginning at end of topic
[19:16:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:20:34] way better now fdans https://grafana-admin.wikimedia.org/dashboard/db/kafka-by-topic?refresh=5m&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=All&var-topic=eventlogging-valid-mixed&from=now-1h&to=now&panelId=33&fullscreen
[19:22:03] also interesting
[19:22:03] https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=jumbo-eqiad&var-topic=All&var-consumer_group=eventlogging_consumer_files_00&from=now-30d&to=now
[19:22:07] that's why it did that
[19:22:24] it stopped committing offsets on may 16?
[19:22:27] why? weird.
[19:33:40] Analytics, Research: Upload XML dumps to hdfs - https://phabricator.wikimedia.org/T186559#4221222 (Neil_P._Quinn_WMF)
[19:39:26] Analytics, Analytics-Kanban, EventBus, Patch-For-Review: SSL and inter broker encryption for Kafka main - https://phabricator.wikimedia.org/T193778#4221240 (Ottomata) @elukey Done in deployment-prep, looks like it does for jumbo, works fine. If you are available, what say you about doing this to...
[19:43:46] Hello ottomata ! Quick question: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Data_retention_and_auto-purging#Black-listed_schemas says "schemas that are black-listed like this, do not currently have the options of partial or minimal purging. " Is it still true?
[19:44:40] ah, chelsyx no. no longer true
[19:44:47] i'm not sure if we have 100% deployed the hive based purging stuff
[19:44:58] but we do support it, mforns would know more but he is out today
[19:45:31] Cool. Thanks! I will file a ticket :)
[20:47:21] Analytics, Operations, Research, Traffic, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#4221400 (gh87) From Safari 11.1 (13605.1.33.1.4) at mac OS X High Sierra 10.13.4: Below happens when going from one article to another:...
[20:52:11] someone straight up deleted my passport-mediawiki-oauth repository from gerrit
[20:52:12] nice...
[21:24:49] !log granted User:CN=kafka_fundraising_client read permissions for group fundraising* on kafka-jumbo (for kafkatee webrequest consumption: kafka acls --add --allow-principal User:CN=kafka_fundraising_client --consumer --topic '*' --group 'fundraising*'
[21:24:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:59:29] Analytics, Product-Analytics: Partially purge MobileWikiAppiOSUserHistory eventlogging schema - https://phabricator.wikimedia.org/T195269#4221538 (chelsyx)
[23:01:40] Analytics, Product-Analytics: Partially purge MobileWikiAppiOSUserHistory eventlogging schema - https://phabricator.wikimedia.org/T195269#4221551 (chelsyx)