[00:07:56] Analytics, Editing-team (Q3 2019-2020 Kanban Board), Product-Analytics (Kanban): Enable Editing Team members to run queries independently - https://phabricator.wikimedia.org/T224029 (JTannerWMF)
[00:13:53] Analytics, Growth-Team (Current Sprint), Product-Analytics (Kanban): Homepage: purge sanitized event data through 2019-11-04 - https://phabricator.wikimedia.org/T244312 (nettrom_WMF)
[00:21:58] Analytics, Editing-team, Product-Analytics (Kanban): Enable Editing Team members to run queries independently - https://phabricator.wikimedia.org/T224029 (JTannerWMF)
[00:22:31] Analytics, Editing-team (Tracking), Product-Analytics (Kanban): Enable Editing Team members to run queries independently - https://phabricator.wikimedia.org/T224029 (JTannerWMF)
[00:26:47] Analytics, Editing-team (Tracking), Product-Analytics (Kanban): Enable Editing Team members to run queries independently - https://phabricator.wikimedia.org/T224029 (Nuria) Hopefully nobody feels blocked by this task, as those who have requested access already have it.
[02:39:07] Analytics, Readers-Web-Backlog (Tracking): Report unhandled jQuery $.Deferred errors in client side error logging - https://phabricator.wikimedia.org/T244261 (Tgr) Thanks @Niedzielski, that matches my understanding. (Aside: testing exception throwing in the dev toolbar console can be unreliable. At least...
[06:12:42] goood morning
[07:11:22] Analytics, Pageviews-Anomaly: Topviews Analysis of the Hungarian Wikipedia is flooded with spam - https://phabricator.wikimedia.org/T237282 (Bencemac) The flood fortunately stopped, but the most viewed articles of 2019 are still strongly occupied by them (5., 7., 8., 10–16., etc.). Would a disclaimer be possibl...
[10:45:38] ok presto code review to enable kerberos+tls https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570248/
[10:45:52] will wait for Andrew's review before merging, since it is quite big
[10:47:51] Analytics, Product-Analytics: Enable shell access to presto from jupyter/stats machines - https://phabricator.wikimedia.org/T243312 (elukey) This is currently blocked by enabling kerberos for Presto (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570248/), hopefully we should be able to unblock t...
[10:48:50] Analytics: Defining a better authentication scheme for Druid and Presto - https://phabricator.wikimedia.org/T241189 (elukey)
[10:48:53] Analytics, Security-Team, SecTeam Discussion, User-Elukey: VPN access to superset/turnilo instead of LDAP - https://phabricator.wikimedia.org/T242998 (elukey) Open→Declined After a chat with Moritz at all hands there seems to be more willingness to test a 2FA solution before thinking abo...
[10:50:09] Analytics, Privacy Engineering, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (Miriam) @JFishback_WMF and @leila - thanks, I got the data! I'll share it through stats today.
[10:55:15] Analytics: Defining a better authentication scheme for Druid and Presto - https://phabricator.wikimedia.org/T241189 (elukey) I think that the title of this task is a little bit misleading. Druid and Presto will need to get Kerberos authentication enabled; the problem will be how to authenticate properly all...
[11:27:50] (CR) DCausse: [C: +1] Add spark code for wikidata json dumps parsing (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[11:30:31] * elukey lunch!
[11:46:08] Analytics, Privacy Engineering, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (Miriam) @JAllemandou so I moved the datasets to: /srv/published/datasets/research/zika-researc...
[13:17:49] Analytics, Cite, Reference Previews, Research, and 2 others: Instrument Cite to record the number of footnote marks and references list entries rendered in each article - https://phabricator.wikimedia.org/T241833 (awight) >>! In T241833#5848149, @Miriam wrote: > Hi @awight ! I believe @tizianopic...
[13:20:56] Analytics, Privacy Engineering, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (JAllemandou) Moving the `zika-research` folder to `one-off` is a good idea, as the latter alre...
[13:47:36] Analytics, Privacy Engineering, Research, Privacy, Security: Release data from a public health related research conducted by WMF and formal collaborators - https://phabricator.wikimedia.org/T242844 (Miriam) Cool! Thanks @JAllemandou - the data is now under: /srv/published/datasets/one-off/zik...
[14:01:29] Analytics, Cite, Reference Previews, Research, and 2 others: Instrument Cite to record the number of footnote marks and references list entries rendered in each article - https://phabricator.wikimedia.org/T241833 (Ottomata) Hm, yeah you'd have to add this to revision-create, you'd have to somehow...
[14:12:30] (CR) Joal: "Thanks for the quick review @DCausse :)" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726 (https://phabricator.wikimedia.org/T209655) (owner: Joal)
[15:32:53] elukey: hope you don't mind:
[15:32:53] https://gerrit.wikimedia.org/r/c/operations/puppet/+/570362/1/modules/role/manifests/analytics_test_cluster/coordinator.pp
[15:33:59] nono please go :)
[15:34:37] there are some issues on analytics1030 now, one disk is broken and apparently it was part of the lvm physical layout
[15:34:40] still need to fix it
[15:48:55] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Nuria) @bd808 (cc @Gilles) Is there a possibility that we have filenames that...
[16:05:31] Analytics, Multimedia, Tool-Pageviews: Fix double encoding of urls on mediarequests api - https://phabricator.wikimedia.org/T244373 (Nuria)
[16:07:30] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Nuria) >Thanks! I will fix this clientside for now. Opened ticket to fix on API...
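The "double encoding of urls" bug in T244373 above is the classic case of percent-encoding an already-encoded title, which then needs two decodes to round-trip. A minimal sketch of the failure mode, with a hypothetical file name (not taken from the actual mediarequests API):

```python
from urllib.parse import quote, unquote

# Hypothetical media file title containing non-ASCII and reserved characters.
name = "Café_(orchestra).jpg"

once = quote(name)    # correct: percent-encode the title one time
twice = quote(once)   # bug: encoding again turns every '%' into '%25'

# A single decode of a double-encoded title does not recover the original:
print(unquote(twice) == name)            # False: still percent-encoded
print(unquote(unquote(twice)) == name)   # True: needs two decodes
```

Fixing it client-side (as mentioned in the log) means decoding twice; fixing it on the API means ensuring the title is encoded exactly once before the request is issued.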
[16:12:42] elukey@analytics1030:~$ presto --catalog analytics_test_hive
[16:12:42] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
[16:12:42] presto> select * from wmf.webrequest where year=2019 and month=11 and day=18 limit 10;
[16:12:45] Error running command: Kerberos error for [presto@analytics1030.eqiad.wmnet]: Unable to obtain password from user
[16:12:48] \o/ \o/
[16:13:03] it works if I kinit
[16:19:49] COOOOL
[16:19:52] very great
[16:24:14] elukey: ❤️🇮🇹
[16:24:42] (re: kerberos on presto, the flag makes it confusing)
[16:24:52] ahahha
[16:28:04] ottomata: getting a coffee, will be 2 mins late
[16:28:25] k
[16:53:41] Hi all, I'm having trouble accessing Hue (Error: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient). Any help troubleshooting would be appreciated! (Also, please let me know if I should ask elsewhere)
[17:04:55] eyener: hi! can you try the workaround in https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hue#Hive_query_errors_with_Kerberos ?
[17:06:54] Analytics, Analytics-Kanban, Multimedia, Tool-Pageviews: Fix double encoding of urls on mediarequests api - https://phabricator.wikimedia.org/T244373 (Nuria) a:MusikAnimal→fdans
[17:12:06] Analytics, Analytics-Kanban, Multimedia, Tool-Pageviews: Fix double encoding of urls on mediarequests api - https://phabricator.wikimedia.org/T244373 (Nuria)
[17:16:03] Analytics: Add druid load job for data quality table - https://phabricator.wikimedia.org/T244379 (mforns)
[17:17:01] elukey that worked, thank you! A few queries I'm trying via both Hue and beeline are returning 0 rows (which is not expected). I was also having trouble with an ALTER statement giving me a 'permissions denied' error on the same database - would this be something I could get help troubleshooting?
[17:17:43] yes sure!
We are in meetings now so you can write; somebody will answer (if not urgent)
[17:23:11] Analytics, Analytics-Kanban, Patch-For-Review: Update wikitext-processing on hadoop various aspects - https://phabricator.wikimedia.org/T238858 (JAllemandou) a:JAllemandou
[17:23:44] Analytics: Convert siteinfo dumps from json to parquet - https://phabricator.wikimedia.org/T244380 (JAllemandou)
[17:24:52] Not urgent, thanks elukey! I'll post the query here, but let me know if a ticket / email would be better.
[17:25:24] eyener: probably a task is better so we can work on it asynchronously
[17:25:30] you can tag "Analytics"
[17:25:49] Perfect, thanks for the guidance elukey. I'll make that now.
[17:25:58] thanks!
[17:55:15] PROBLEM - Webrequests Varnishkafka log producer on cp5012 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.132.0.112: Connection reset by peer https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[17:55:22] meeeeh
[17:55:52] the host is failing on #ops
[17:59:50] Analytics, Analytics-Kanban: Fix hdfs-rsync `prune-empty-dirs` feature - https://phabricator.wikimedia.org/T243832 (JAllemandou) p:Triage→High
[18:13:13] dinner with kids, back in a bit to deploy
[18:14:01] RECOVERY - Webrequests Varnishkafka log producer on cp5012 is OK: PROCS OK: 1 process with args /usr/bin/varnishkafka -S /etc/varnishkafka/webrequest.conf https://wikitech.wikimedia.org/wiki/Analytics/Systems/Varnishkafka
[18:26:24] ah mforns what was the issue with druid and requests?
[18:26:32] if you want we can bc for a moment and check
[18:29:03] Analytics: Ingest data quality into druid for visualization - https://phabricator.wikimedia.org/T244388 (Nuria)
[18:30:09] Analytics-Kanban, Better Use Of Data, Product-Analytics, User-Elukey: Upgrade to Superset 0.35.2 - https://phabricator.wikimedia.org/T242870 (Nuria) Open→Resolved
[18:30:11] Analytics-Kanban, Better Use Of Data, Product-Analytics: Superset Updates - https://phabricator.wikimedia.org/T211706 (Nuria)
[18:30:39] Analytics, Analytics-Kanban, User-Elukey: No queries run in Hue - https://phabricator.wikimedia.org/T242306 (Nuria) Open→Resolved
[18:31:10] Analytics, Analytics-Kanban: Remove unused CDH packages from Hadoop nodes - https://phabricator.wikimedia.org/T242754 (Nuria) Open→Resolved
[18:31:30] Analytics, Analytics-Kanban, User-Elukey: Investigate Hue alarms - https://phabricator.wikimedia.org/T241649 (Nuria) Open→Resolved
[18:41:50] going to complete the presto work tomorrow, o/
[19:03:15] ottomata: hello!
[19:03:59] ottomata: I'm about to send a patch on hdfs-tools-deploy - shall I remove version 0.0.2 and keep only current (0.0.3) and new (0.0.5)?
[19:07:33] (CR) Joal: [C: +2] Change format of data_quality_stats to parquet [analytics/refinery/source] - https://gerrit.wikimedia.org/r/566836 (https://phabricator.wikimedia.org/T241375) (owner: Mforns)
[19:11:58] (Merged) jenkins-bot: Change format of data_quality_stats to parquet [analytics/refinery/source] - https://gerrit.wikimedia.org/r/566836 (https://phabricator.wikimedia.org/T241375) (owner: Mforns)
[19:14:13] a-team I'm going to release refinery-0.0.113 - last call before deploy :)
[19:17:20] (PS1) Joal: Bump changelog.md to v0.0.113 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/570403
[19:19:10] nuria: about archiving with the analytics user, you first need to create the parent folder and chown it to analytics with the hdfs user (from an-coord1001)
[19:20:16] looks like no one is here currently - moving the deploy along
[19:20:43] (CR) Joal: [V: +2 C: +2] "Merge for deploy" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/570403 (owner: Joal)
[19:25:02] joal: I am here!
[19:25:06] joal: bc?
[19:25:17] sure nuria
[19:39:35] ottomata: Hello!!! I need help with jenkins please :(
[19:40:14] hi be with you in 5 mins!
[19:40:26] Thanks ottomata :)
[19:45:26] joal: is it the password thing?
[19:45:42] expired pw?
[19:45:59] I think so ottomata? not sure at all - logs say: ReasonPhrase:Unauthorized.
[19:49:44] i thought we documented this last time...
[19:50:24] joal: what's the deploy etherpad?
[19:50:30] https://etherpad.wikimedia.org/p/analytics-weekly-train
[19:53:03] I'm looking for a table in hdfs with ores scores for revisions
[19:53:23] halfak says that it exists :)
[19:53:23] o/ ottomata ^
[19:53:53] Ooh.
Looks like this might be relevant https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/ORES
[19:54:20] OOOO
[19:54:26] that does seem like what i'm looking for
[19:56:30] joal try deploy again
[19:56:34] sure
[19:57:13] groceryheist: i'm not totally sure where those tables come from, but the originating data is event.mediawiki_revision_score
[19:59:20] so my spark can't find ores.revision_score
[19:59:25] but it does find event.mediawiki_revision_score
[19:59:33] Aha!
[20:00:30] oh my. that table is partitioned like crazy
[20:00:47] "Listing leaf files and directories for 14063 paths:"
[20:01:54] (PS1) Joal: Deploy hdfs-tools 0.0.5 [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/570420
[20:02:01] ottomata: --^ please :)
[20:08:21] (CR) Ottomata: [C: +1] Deploy hdfs-tools 0.0.5 [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/570420 (owner: Joal)
[20:13:46] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/hdfs-tools/deploy] - https://gerrit.wikimedia.org/r/570420 (owner: Joal)
[20:15:04] ottomata: anything special when deploying hdfs-tools?
[20:15:22] simple `scap deploy "Message"` ottomata ?
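The "partitioned like crazy" exchange above (Spark listing 14063 leaf paths before running anything) is why predicates on partition columns matter so much. A toy sketch of the idea, using a hypothetical Hive-style `year=/month=/day=` directory layout rather than the real event.mediawiki_revision_score partitioning:

```python
# Toy model of a Hive-style partitioned table layout. The counts here are
# made up for illustration, not the real table's 14063 paths.
paths = [
    f"year={y}/month={m}/day={d}"
    for y in (2019, 2020)
    for m in range(1, 13)
    for d in range(1, 29)
]
print(len(paths))  # every leaf directory must be listed without a filter

# A predicate on the partition columns (e.g. WHERE year=2019 AND month=11)
# lets the engine discard non-matching directories before reading any data:
pruned = [p for p in paths if p.startswith("year=2019/month=11/")]
print(len(pruned))
```

In Spark SQL or Hive this corresponds to always constraining the partition columns in the WHERE clause, as in the `year=2019 and month=11 and day=18` webrequest query earlier in the log.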
[20:20:19] ya
[20:20:25] ok :)
[20:20:39] !log Deploy hdfs-tools 0.0.5 using scap
[20:20:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:29:20] !log Refinery-source released in archiva by jenkins
[20:29:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:29:23] Thanks ottomata :)
[20:33:33] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/565558 (https://phabricator.wikimedia.org/T238858) (owner: Joal)
[20:33:54] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/566834 (https://phabricator.wikimedia.org/T241375) (owner: Mforns)
[20:34:55] Analytics, Analytics-Kanban, Patch-For-Review: Make history and current wikitext available in hadoop - https://phabricator.wikimedia.org/T238858 (JAllemandou)
[20:51:41] !log Deploy refinery using scap
[20:51:43] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:04:33] ok here we are - starting operations after deploy
[21:06:50] !log Kill-restart mediawiki-wikitext-history-coord and mediawiki-wikitext-current-coord
[21:06:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:09:57] mforns: question on mediawiki_history_dumps - shall I drop all existing data as it is not correctly formatted?
[21:10:34] joal, fine by me :]
[21:10:39] mforns: I'd do that, and then restart the job starting 2019-11 (included), so to have at least 3 versions
[21:10:45] ok - doing so
[21:11:11] !log Kill-restart mediawiki-history-dumps-coord, drop existing data, and restart at 2019-11
[21:11:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:12:24] joal, how's deployment? do you need assistance?
[21:12:31] or rubber duck
[21:12:33] I started with the easy ones :)
[21:12:45] I'll come to data-quality in minutes
[21:12:55] k
[21:14:21] Kill data_quality_stats-hourly-bundle and data_quality_stats-daily-bundle
[21:14:27] !log Kill data_quality_stats-hourly-bundle and data_quality_stats-daily-bundle
[21:14:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:16:39] ottomata: Could you please merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/566822/ (docs only)
[21:19:40] one comment joal
[21:19:43] please
[21:19:52] put on patch
[21:24:50] git pull
[21:24:52] oops
[21:25:11] mforns: batcave about data-quality data?
[21:25:22] joal, sure!
[21:46:21] ottomata: patch updated - Thanks for the comment
[21:47:22] merged!
[21:49:33] Thanks a lot ottomata :)
[21:51:00] ok deployment done :)
[21:53:10] Analytics, Analytics-Kanban: Make history and current wikitext available in hadoop - https://phabricator.wikimedia.org/T238858 (JAllemandou)
[21:53:12] Analytics: Provide data dumps in the Analytics Data Lake - https://phabricator.wikimedia.org/T186559 (JAllemandou)
[21:58:44] I'm calling that a day - see you team tomorrow
[21:59:27] Thanks for the email on the alarm mforns :)
[22:02:21] byyeee
[22:25:44] Analytics: Kerberos password for Trey Jones (tjones) - https://phabricator.wikimedia.org/T244416 (TJones)
[23:10:18] back in business
[23:21:18] Something's funny with the SAL mirror on Wikitech
[23:22:22] https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:22:37] did something change about how that works/
[23:22:38] ?