[00:19:11] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Drop old mediawiki_history_reduced snapshots - https://phabricator.wikimedia.org/T197888 (10fdans) @Nuria confirmed that the table is only holding 6 snapshots right now. This task can be marked as done. [00:41:43] (03Abandoned) 10Fdans: Replace time range selector [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/498746 (owner: 10Fdans) [00:43:33] (03CR) 10Fdans: [C: 04-1] Add concept of metric groups, rotate in dashboard (031 comment) [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: 10Fdans) [00:47:19] (03CR) 10Nuria: [C: 03+2] Fix docs [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/498744 (owner: 10Awight) [00:52:27] (03Merged) 10jenkins-bot: Fix docs [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/498744 (owner: 10Awight) [02:08:43] 10Analytics, 10Analytics-SWAP: notebook1004: can't stat() fuse.fuse_dfs file system /mnt/hdfs - https://phabricator.wikimedia.org/T219111 (10GTirloni) p:05Triage→03Normal [02:09:34] 10Analytics, 10Analytics-SWAP: notebook1004: can't stat() fuse.fuse_dfs file system /mnt/hdfs - https://phabricator.wikimedia.org/T219111 (10GTirloni) [03:50:35] 10Analytics: Replace current time range selector - https://phabricator.wikimedia.org/T219112 (10fdans) [07:11:53] morning! [08:09:45] Morning elukey [08:15:57] elukey: looks like an-coord1001 is not in the 'Promotheus machine stats' grafana dashboard list [08:16:02] elukey: is that expected? 
[08:16:18] bonjour [08:17:07] RECOVERY - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is OK: OK [08:17:29] this was me --^ [08:17:50] in theory no, there should be a prometheus exporter for basic machine stats on all hosts [08:17:53] 10Analytics-Kanban, 10Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Yair_rand) @Nuria The WMF internal wiki isn't publicly accessible, making its contents unavailable to the community. Unless there's particular reason th... [08:18:07] that dashboard is to be deprecated though, looking for the new one [08:18:13] to see if the issue is still there [08:18:34] elukey: I'm interested for the new one- Couldn't find it [08:18:48] that should be https://grafana.wikimedia.org/d/000000377/host-overview [08:18:59] Filippo is re-organizing everythig [08:19:01] \o/ ! [08:19:02] *everything [08:19:02] Thanks :) [08:19:28] finishing one task update and then I'll check those metrics, really strange :( [08:19:41] joal: btw 9h for a run of sqoop?? [08:19:49] that is more than awesome! 
[08:20:31] elukey: the number of processors and mappers make the diff - number of mappers can stay (cluster-related), but the number of processors depends on the machine [08:21:21] elukey: on an-coord1001 there are 16 CPUs, and critical stuff (hive-metastore, hive-server, oozie) [08:21:53] elukey: no an-coord1001 in 'host overview' either :( [08:22:33] elukey: Sqoop was happening from an-coord, so we'll need to move to 10 processors I think - It'll take longer :) [08:24:48] ah yes makes sense [08:29:02] ok checking an-coord [08:29:12] so I have two things to check [08:29:23] 1) an-coord is not in the prometheus master target list [08:29:36] 2) an-coord's base exporter doesn't work [08:31:23] http: Accept error: accept tcp [::]:9100: accept4: too many open files; retrying in 1s [08:31:26] ahahahah [08:37:20] ahhh [08:37:20] elukey@an-coord1001:~$ sudo lsof -p 228483 | wc -l [08:37:21] 1031 [08:37:26] this is the pid of the exporter [08:37:45] elukey@an-coord1001:~$ sudo cat /proc/228483/limits | grep file [08:37:46] Max file size unlimited unlimited bytes [08:37:48] Max core file size 0 unlimited bytes [08:37:51] Max open files 1024 4096 files [08:37:55] Max file locks unlimited unlimited locks [08:40:45] and the process is in uninterruptible sleep due to IO [08:49:22] it seems to have leaked some sockets [08:49:35] but now it is blocking my attempts to kill it [08:52:26] ah! 
[08:52:35] it was blocked on /mnt/hdfs [08:52:44] unmounted and forced the mount [08:52:59] now the exporter works [08:53:14] I hate the fuse hdfs mount point [08:53:50] joal: I can see metrics flowing in host-overview now [08:54:04] need to follow up with filippo about alarming though [08:54:32] I think that we got a weird use case, namely the systemd unit showing as running but the process stuck in that sleep [08:55:41] brb [09:26:07] 10Analytics, 10Dumps-Generation, 10WikiCite, 10Wikidata, 10Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (10Lea_Lacroix_WMDE) +1 with what was said above, I also thought that it was just a change of da... [10:29:47] elukey: question for you - I see labsdb1012 activity but I'm testing sqoop - would that be you? [10:32:14] joal: nope, I guess it is replication of data [10:32:18] what kind of activity? [10:32:26] elukey: select-range :S [10:32:27] weird [10:32:47] where do you see that select range? [10:33:46] elukey: https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1012&var-port=9104&panelId=31&fullscreen [10:33:55] sorry - select-SCAN [10:34:12] no idea what that is [10:35:44] nothing that I can see from show processlist [10:36:17] ah no there is a select [10:36:55] but for information_schema [10:37:02] I think it is something related to replication [10:37:02] meh? [10:37:08] ok :) [10:37:47] ah I think it is tendril [10:37:51] elukey: indeed - pattern seems stable across days (excuse me for pinging) [10:37:54] tendril ? [10:39:37] np! it is https://tendril.wikimedia.org [10:39:52] it shows how dbs relate to each other, if they are lagging, etc.. [10:40:27] Ohhhhh :) [10:40:31] Nice link !! 
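The open-files check from the exporter debugging earlier in this log (a count of 1031 from lsof against a soft "Max open files" limit of 1024) can be sketched as a small script. This is only an illustration under stated assumptions: the helper names are hypothetical, it reads the Linux /proc layout, and the lsof count in the log also includes a header line and non-descriptor entries, so the two numbers are not strictly the same measure.

```python
import os
import re


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return tuple(
                None if value == "unlimited" else int(value)
                for value in match.groups()
            )
    raise ValueError("no 'Max open files' row found")


def is_at_soft_limit(open_count, soft_limit):
    """True when the process has used up its soft descriptor limit."""
    return soft_limit is not None and open_count >= soft_limit


def count_open_fds(pid):
    """Count descriptors via /proc/<pid>/fd (Linux only, may need root)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Example with the numbers pasted in the log (not read from a live host):
SAMPLE_LIMITS = "Max open files            1024                 4096                 files"
soft, hard = parse_open_files_limit(SAMPLE_LIMITS)
exhausted = is_at_soft_limit(1031, soft)
```

On a live host one would pass `count_open_fds(pid)` instead of the pasted number; a process stuck on a hung FUSE mount may still need the mount fixed before it responds at all, as happened here.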
[10:42:41] elukey: I was wondering as I am testing sqooping from an-store cluster and wasn't expecting to see any activity on labsdb1012 :) [10:43:26] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10dr0ptp4kt) >> - Based on 1A, the volume of visits coming from integrated transla... [10:46:51] :) [11:35:19] * elukey lunch + errand! [12:46:47] (03PS1) 10Hashar: Fix PHP CodeSniffer [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498831 [12:46:50] (03PS1) 10Hashar: build: add php-parallel-lint [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498832 (https://phabricator.wikimedia.org/T179963) [12:46:58] (03CR) 10jerkins-bot: [V: 04-1] Fix PHP CodeSniffer [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498831 (owner: 10Hashar) [12:52:00] (03CR) 10Hashar: "recheck" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498831 (owner: 10Hashar) [12:52:05] (03CR) 10Hashar: "recheck" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498832 (https://phabricator.wikimedia.org/T179963) (owner: 10Hashar) [12:54:35] (03CR) 10Ladsgroup: [C: 03+2] Fix PHP CodeSniffer [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498831 (owner: 10Hashar) [12:54:42] (03CR) 10Ladsgroup: [C: 03+2] build: add php-parallel-lint [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498832 (https://phabricator.wikimedia.org/T179963) (owner: 10Hashar) [12:54:56] (03Merged) 10jenkins-bot: Fix PHP CodeSniffer [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498831 (owner: 10Hashar) [12:55:06] (03Merged) 10jenkins-bot: build: add php-parallel-lint [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/498832 (https://phabricator.wikimedia.org/T179963) (owner: 10Hashar) [12:57:47] (03PS6) 10Joal: Add revision_hidden_parts to mediawiki-history 
[analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492304 (https://phabricator.wikimedia.org/T178587) [13:06:51] (03PS7) 10Joal: Add revision_hidden_parts to mediawiki-history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492304 (https://phabricator.wikimedia.org/T178587) [13:09:43] (03PS9) 10Joal: Add change_tags to mediawiki_history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492320 [13:13:17] (03PS10) 10Joal: Add change_tags to mediawiki_history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492320 [13:20:38] (03PS8) 10Joal: Add revision_deleted_parts to mediawiki-history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492304 (https://phabricator.wikimedia.org/T178587) [13:20:57] (03PS11) 10Joal: Add change_tags to mediawiki_history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492320 [13:21:25] (03PS12) 10Joal: Update mediawiki-reconstruction with log info [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/493012 [13:29:38] (03PS5) 10Joal: Update mw user-history timestamps [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463) [13:43:26] (03PS1) 10Joal: Correct names in mediawiki-history sql package [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/498861 [13:45:39] elukey: if you're nearby [13:54:26] joal: I am now sorry [13:54:37] no problem elukey ) [13:54:57] elukey: I am wondering about how I should update puppet [13:55:14] elukey: sqoop-related [13:55:30] sure [13:55:46] in the sqoop_mediawiki.pp file, num_mappers and num_processors are defined globally [13:56:23] It was good before we were having the beast, but now I think we'd like to change the parameters depedning on how much data the job will be gathering [13:56:53] Question: Can I define the parameters to be passed to the erb templates in the file blocks? 
[13:57:44] in theory no, but in the erb file you can use a variable defined in the class [13:57:58] :( [13:58:08] do you need to define num_mappers/num_processors for each script file? [13:58:39] if so we can use different variable names [13:59:05] and each template will gather the ones that it needs [13:59:15] like: num_mappers_sqoop_production = x [13:59:20] elukey: I'd like different values for mediawiki/production and private [13:59:35] elukey: otherwise, can I hard-code those values in the script? [14:00:01] elukey: mediawiki/production get all-time data, while private gets 1 month [14:00:08] joal: I would create variables in the puppet class so we can see how the scripts differ [14:00:13] this is a big difference :) [14:00:25] ok will do :) [14:00:31] super :) [14:00:35] Thanks :) [14:03:11] (03CR) 10Ottomata: [C: 03+1] "Looks fine to me but joal should confirm." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/498700 (https://phabricator.wikimedia.org/T144100) (owner: 10Awight) [14:07:36] 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team Backlog (Later), 10Services (next): Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 (10Pchelolo) [14:09:55] 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team Backlog (Later), 10Services (next): Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 (10Pchelolo) [14:25:29] 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team Backlog (Later), 10Services (next): Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 (10Ottomata) Can do, @Pchelolo you want {eqiad,codfw}.mediawiki.job.htmlCacheUpdate topics to be bumped to 8 partitions? [14:28:17] 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team Backlog (Later), 10Services (next): Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 (10Pchelolo) @Ottomata yes, but not just yet, we still need to prepare the patches etc. 
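The idea agreed on above (one set of clearly named variables per sqoop script, each template picking up only the ones it needs) can be modelled outside Puppet. The sketch below is not the Puppet class or an ERB template; it uses Python's string.Template as a stand-in, and every name and number in it is a placeholder invented for illustration, apart from the 10 processors mentioned in the conversation.

```python
from string import Template

# Placeholder values: only the 10 processors figure comes from the log.
# The variable names mirror the "num_mappers_sqoop_production = x" style.
CLASS_VARIABLES = {
    "num_mappers_sqoop_production": 64,
    "num_processors_sqoop_production": 10,
    "num_mappers_sqoop_private": 4,
    "num_processors_sqoop_private": 2,
}

# Hypothetical script bodies; each one references only its own variables.
SCRIPT_TEMPLATES = {
    "production": Template(
        "NUM_MAPPERS=$num_mappers_sqoop_production\n"
        "NUM_PROCESSORS=$num_processors_sqoop_production\n"
    ),
    "private": Template(
        "NUM_MAPPERS=$num_mappers_sqoop_private\n"
        "NUM_PROCESSORS=$num_processors_sqoop_private\n"
    ),
}


def render_script(job):
    """Render one job's script from the shared variable scope."""
    return SCRIPT_TEMPLATES[job].substitute(CLASS_VARIABLES)
```

Keeping all four values side by side in one place is what makes the difference between the all-time production job and the one-month private job visible at a glance.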
[14:53:51] 10Analytics, 10Datasets-General-or-Unknown, 10Patch-For-Review, 10Security, 10good first bug: Pageview dumps incorrectly formatted, need to escape special characters - https://phabricator.wikimedia.org/T144100 (10awight) Updated the `page_title` field documentation to reflect what happens in code, https:... [14:55:57] hey team :] [14:56:38] 10Analytics-Kanban, 10Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (10Nuria) @Yair_rand the public guidelines as to data retention are public in the privacy policy: https://foundation.wikimedia.org/wiki/Privacy_policy [14:57:39] Hi mforns :) Welcome back :) [14:57:52] joal: hello [14:58:02] joal: was looking at https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/498861/1//COMMIT_MSG [14:58:20] elukey: question for before standup: how many CPUs (processors) can I dedicate to sqooping on an-coord1001 you think ? [14:58:23] joal: but i do not get the rename , "registrar" is not a word in English is it? [14:58:42] elukey: machine has 16 cpus, and handles hive and oozie [14:59:19] (03CR) 10Nuria: [C: 04-1] Correct names in mediawiki-history sql package (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/498861 (owner: 10Joal) [14:59:28] nuria: registrar is indeed an english word! [15:00:18] team, I might be a couple minutes late to standup, google is asking me for 2FA, and my phone is dead, will have to charge it and wait to get the code [15:00:35] or... join with my personal account [15:00:35] nuria: it's actually `registerer` which is not :) [15:00:36] ottomata: i see, are we thinking of this one? "https://registrar.washington.edu/" [15:00:56] yup [15:00:58] i guess [15:01:18] nuria: I changed that name after having discussed with milimetric about registerer not being an english word :) [15:02:17] standup? 
[15:02:17] ping ottomata , milimetric standdduppp [15:02:42] OH a meet is in the calendar now! [15:02:55] I’m just finished with the inspection but I won’t be able to jump on until later [15:03:03] np milimetric :) [15:03:17] tryhing to do hangout... [15:04:02] a-team, can anyone send me the link to the batcave? [15:04:10] https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave [15:04:12] mforns: https://hangouts.google.com/hangouts/_/wikimedia.org/a-batcave?pli=1&authuser=1 [15:04:17] thaaanks! [15:39:41] (03CR) 10Nuria: Escape whitespaces in page_title (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/498700 (https://phabricator.wikimedia.org/T144100) (owner: 10Awight) [15:42:47] (03CR) 10Nuria: [C: 04-1] Correct names in mediawiki-history sql package (032 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/498861 (owner: 10Joal) [15:43:54] 10Analytics, 10EventBus, 10WMF-JobQueue, 10Core Platform Team Backlog (Later), 10Services (next): Partition htmlCacheUpdate job topic - https://phabricator.wikimedia.org/T219159 (10Ottomata) Oo, actually we should get @herron to do this :) [15:48:16] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Services (watching): EventGate should be able to configure hasty and guaranteed kafka producers individually - https://phabricator.wikimedia.org/T219032 (10fdans) [15:49:08] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Services (watching): EventGate should be able to configure hasty and guaranteed kafka producers individually - https://phabricator.wikimedia.org/T219032 (10fdans) p:05Triage→03High [15:52:17] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Ingest data from PrefUpdate EventLogging schema into Druid - https://phabricator.wikimedia.org/T218964 (10fdans) [15:52:27] 10Analytics, 10Analytics-Kanban, 10Product-Analytics: Ingest data from PrefUpdate EventLogging schema into Druid - https://phabricator.wikimedia.org/T218964 (10fdans) a:03fdans [15:52:56] 10Analytics, 
10Analytics-Kanban, 10Product-Analytics: Ingest data from PrefUpdate EventLogging schema into Druid - https://phabricator.wikimedia.org/T218964 (10fdans) p:05Triage→03High [15:55:21] 10Analytics, 10Analytics-Wikistats: URLs in description break annotations format - https://phabricator.wikimedia.org/T218845 (10fdans) p:05Triage→03High [15:59:08] 10Analytics, 10Product-Analytics: prefUpdate schema contains multiple identical events for the same preference update - https://phabricator.wikimedia.org/T218835 (10fdans) This schema has no owner and it's the owner who has to correct the instrumentation. cc @jlinehan fyi [16:00:05] 10Analytics, 10Analytics-Data-Quality, 10Product-Analytics: A few alterblocks events have event_timestamps from before 2001 - https://phabricator.wikimedia.org/T218824 (10fdans) p:05Triage→03Normal [16:01:21] 10Analytics, 10Analytics-Kanban, 10Analytics-SWAP: notebook1004: can't stat() fuse.fuse_dfs file system /mnt/hdfs - https://phabricator.wikimedia.org/T219111 (10fdans) a:03elukey [16:02:07] 10Analytics, 10Analytics-Wikimetrics: csv upload tricks you into staring at a "Refresh" button - https://phabricator.wikimedia.org/T65402 (10fdans) 05Open→03Declined Wikimetrics will be sunset this quarter. 
[16:02:42] 10Analytics, 10Analytics-Wikimetrics: Reports download with the filename "Bytes" - https://phabricator.wikimedia.org/T65401 (10fdans) 05Open→03Declined Wikistats will be sunset by the end of this quarter [16:04:04] 10Analytics, 10Analytics-Wikimetrics: "Created" date in report queue showing up strange - https://phabricator.wikimedia.org/T56300 (10fdans) 05Open→03Declined Wikistats will be sunset by the end of this quarter [16:17:22] 10Analytics, 10Analytics-Wikistats: Long annotations text being clipped - https://phabricator.wikimedia.org/T218846 (10mforns) p:05Triage→03Normal [16:18:37] 10Analytics, 10Analytics-Kanban, 10Analytics-SWAP: notebook1004: can't stat() fuse.fuse_dfs file system /mnt/hdfs - https://phabricator.wikimedia.org/T219111 (10elukey) 05Open→03Resolved Thanks a lot, it has been fixed this morning! [16:23:14] 10Analytics: "Latest X" filter in Turnilo picks the wrong dates - https://phabricator.wikimedia.org/T219040 (10Nuria) This happens on other datasources, could be a turnilo bug or rather missconfuration, need to check which one. 
[16:23:23] 10Analytics: "Latest X" filter in Turnilo picks the wrong dates - https://phabricator.wikimedia.org/T219040 (10mforns) p:05Triage→03Normal [16:24:52] 10Analytics: Replace current time range selector - https://phabricator.wikimedia.org/T219112 (10mforns) p:05Triage→03High [16:29:42] 10Analytics, 10Operations, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10mforns) @aborrero ping :] [16:29:51] 10Analytics, 10Operations, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10mforns) p:05High→03Normal [16:30:39] 10Analytics: Alarms for virtualpageview should exist (probably in oozie) for jobs that have been idle too long - https://phabricator.wikimedia.org/T213716 (10mforns) p:05High→03Normal [16:31:07] 10Analytics, 10Operations, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10elukey) This is my bad, I should I have followed up on this task. There are more variables since I added my last comment to add i... 
[16:33:13] 10Analytics, 10Analytics-Wikistats: Add user_is_bot_by_group to MediaWiki history - https://phabricator.wikimedia.org/T219177 (10mforns) [16:34:10] 10Analytics: vet edit data on the data lake - https://phabricator.wikimedia.org/T153923 (10mforns) [16:34:12] 10Analytics, 10Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (10mforns) 05Open→03Resolved Closing this one, the remaining work was moved to this other task: T219177 [16:35:31] 10Analytics, 10Analytics-Wikistats: Add user_is_bot_by_group to MediaWiki history - https://phabricator.wikimedia.org/T219177 (10mforns) p:05Triage→03Normal [16:37:16] 10Analytics, 10Scoring-platform-team: [Discuss] ORES model development and deployment processes - https://phabricator.wikimedia.org/T216246 (10Harej) [16:38:39] 10Analytics: Mediawiki History: moves counted twice in Revision - https://phabricator.wikimedia.org/T189044 (10mforns) @JAllemandou @Milimetric (grosking) Should we then add some documentation in regards to that? [16:40:02] 10Analytics, 10MobileFrontend, 10Performance-Team (Radar), 10Readers-Web-Backlog (Tracking), 10Technical-Debt: Figure out XAnalytics stuff - https://phabricator.wikimedia.org/T190381 (10mforns) This is less relevant every time, because we're are moving towards an event-centric analytics framework. Moving... 
[16:44:07] 10Analytics, 10ORES, 10Scoring-platform-team (Current): Emit synthetic mediawiki.revision-score events for both datacenters - https://phabricator.wikimedia.org/T214545 (10Harej) p:05Triage→03Lowest a:05awight→03None [16:45:46] 10Analytics, 10Operations, 10Product-Analytics, 10Patch-For-Review, 10User-Elukey: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (10diego) Finally it's not just me squeezing notebooks memory :) [16:45:57] 10Analytics, 10ORES, 10Scoring-platform-team, 10Patch-For-Review, 10Services (watching): Wire ORES recent_score events into Hadoop - https://phabricator.wikimedia.org/T209732 (10Harej) p:05Triage→03Lowest a:05awight→03None [16:46:01] 10Analytics, 10ORES, 10Scoring-platform-team: Emit synthetic mediawiki.revision-score events for both datacenters - https://phabricator.wikimedia.org/T214545 (10Harej) [16:58:35] 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10Cmjohnson) [17:01:22] 10Analytics, 10Operations, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10Cmjohnson) [17:01:27] 10Analytics, 10Analytics-Kanban, 10Operations, 10ops-eqiad: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (10Cmjohnson) 05Open→03Resolved Resolving [17:19:38] 10Analytics, 10good first bug: Productionize and run 2018 job for Global Innovation Index from Hadoop Geowiki data - https://phabricator.wikimedia.org/T190535 (10mforns) p:05High→03Normal [17:19:54] 10Analytics, 10good first bug: Productionize and run 2018 job for Global Innovation Index from Hadoop Geowiki data - https://phabricator.wikimedia.org/T190535 (10mforns) p:05Normal→03High [17:30:52] 10Analytics: Create test Kerberos identities/accounts for some selected users from Analytics - https://phabricator.wikimedia.org/T212258 (10mforns) p:05High→03Triage [17:30:54] 10Analytics: Create test 
Kerberos identities/accounts for some selected users from Analytics - https://phabricator.wikimedia.org/T212258 (10mforns) p:05Triage→03High [17:30:56] 10Analytics, 10EventBus: EventGate Helm chart should POST test event for readinessProbe - https://phabricator.wikimedia.org/T218680 (10Ottomata) > it might make more sense to create a specialized GET /healthz endpoint that does just produces (and deletes if required/prudent?) a hardcoded test event in kafka.... [17:31:23] 10Analytics: Upgrade python ua parser to 0.6.3 version - https://phabricator.wikimedia.org/T212854 (10mforns) p:05High→03Triage [17:31:25] 10Analytics: Upgrade python ua parser to 0.6.3 version - https://phabricator.wikimedia.org/T212854 (10mforns) p:05Triage→03High [17:31:47] 10Analytics: Alarm on data quality issues - https://phabricator.wikimedia.org/T159840 (10mforns) p:05Normal→03Triage [17:31:51] 10Analytics: Alarm on data quality issues - https://phabricator.wikimedia.org/T159840 (10mforns) p:05Triage→03Normal [17:31:58] 10Analytics, 10Analytics-Wikistats: Wikistats 2: New Pages split by editor type wrongly claims no anonymous users create pages - https://phabricator.wikimedia.org/T185342 (10mforns) p:05Normal→03Triage [17:32:00] 10Analytics, 10Analytics-Wikistats: Wikistats 2: New Pages split by editor type wrongly claims no anonymous users create pages - https://phabricator.wikimedia.org/T185342 (10mforns) p:05Triage→03Normal [17:32:15] 10Analytics: Mediawiki History: moves counted twice in Revision - https://phabricator.wikimedia.org/T189044 (10mforns) p:05Normal→03Triage [17:32:17] 10Analytics: Mediawiki History: moves counted twice in Revision - https://phabricator.wikimedia.org/T189044 (10mforns) p:05Triage→03Normal [17:32:25] 10Analytics, 10Analytics-Data-Quality: Spike: Quantify how many EventLogging requests we get from non-wiki* hostnames or apps - https://phabricator.wikimedia.org/T190840 (10mforns) p:05Normal→03Triage [17:32:28] 10Analytics, 10Analytics-Data-Quality: Spike: 
Quantify how many EventLogging requests we get from non-wiki* hostnames or apps - https://phabricator.wikimedia.org/T190840 (10mforns) p:05Triage→03Normal [17:33:15] 10Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10mforns) p:05Normal→03Triage [17:33:18] 10Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (10mforns) p:05Triage→03Normal [17:33:27] 10Analytics: [reportupdater] Add a configurable hive client - https://phabricator.wikimedia.org/T193169 (10mforns) p:05Normal→03Triage [17:33:29] 10Analytics: [reportupdater] Add a configurable hive client - https://phabricator.wikimedia.org/T193169 (10mforns) p:05Triage→03Normal [17:33:31] 10Analytics: [reportupdater] eliminate the funnel parameter - https://phabricator.wikimedia.org/T193170 (10mforns) p:05Normal→03Triage [17:33:33] 10Analytics: [reportupdater] eliminate the funnel parameter - https://phabricator.wikimedia.org/T193170 (10mforns) p:05Triage→03Normal [17:33:35] 10Analytics: [reportupdater] consider not requiring date as a first colum of query/script results - https://phabricator.wikimedia.org/T193174 (10mforns) p:05Normal→03Triage [17:33:37] 10Analytics: [reportupdater] consider not requiring date as a first colum of query/script results - https://phabricator.wikimedia.org/T193174 (10mforns) p:05Triage→03Normal [17:34:06] ok, phunnel script is working again [17:36:05] Heya team [17:36:18] elukey: may I launch sqoop from an-coord1001 with 10 processors? [17:36:35] +2 [17:36:46] Starting ! 
[17:49:42] (03PS9) 10Joal: Add revision_deleted_parts to mediawiki-history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492304 (https://phabricator.wikimedia.org/T178587) [17:52:28] (03PS10) 10Joal: Add revision_deleted_parts to mediawiki-history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492304 (https://phabricator.wikimedia.org/T178587) [17:52:43] (03PS12) 10Joal: Add change_tags to mediawiki_history [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/492320 [17:56:53] (03PS13) 10Joal: Update mediawiki-reconstruction with log info [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/493012 [17:59:18] (03PS6) 10Joal: Update mw user-history timestamps [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463) [18:31:27] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Drop old mediawiki_history_reduced snapshots - https://phabricator.wikimedia.org/T197888 (10Nuria) 05Open→03Resolved [18:34:50] (03CR) 10Ottomata: "Annoying q: if we are renaming these (and these are dt-ish strings (not integers), should we name them accordingly?" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463) (owner: 10Joal) [18:35:13] nuria: if you have time, do you want to check histograms in superset 0.31? [18:35:35] elukey: yes, i can do that now, out of meetings for 20 mins [18:36:37] elukey: i can test and let you know [18:36:43] elukey: command is ssh -L 9080:analytics-tool1004.eqiad.wmnet:80 analytics-tool1004.eqiad.wmnet: [18:36:54] correct [18:37:03] elukey: what is the port in analytics-tool1004.eqiad.wmnet [18:37:36] 9080 [18:37:49] sorry, I meant 80 [18:37:54] local port is 9080 [18:38:03] there is httpd in there doing ldap auth [18:38:11] nuria: --^ [18:38:42] elukey: so ssh -L 9080:analytics-tool1004.eqiad.wmnet:80 analytics-tool1004.eqiad.wmnet: 80 ? 
[18:39:12] ah no you can use [18:39:24] your command without the ':' at the end [18:39:50] ssh -L 9080:analytics-tool1004.eqiad.wmnet:80 analytics-tool1004.eqiad.wmnet [18:40:02] and then localhost:9080 in the browser [18:41:27] elukey: ok, let me see about histograms, it will be a few mins [18:42:34] sure [18:45:20] elukey: ya, +1 [18:45:23] elukey: they work [18:46:46] elukey: see if you can see it [18:46:48] http://localhost:9080/superset/explore/?form_data=%7B%22datasource%22%3A%22356__druid%22%2C%22viz_type%22%3A%22histogram%22%2C%22slice_id%22%3A167%2C%22url_params%22%3A%7B%7D%2C%22granularity%22%3A%22PT1H%22%2C%22druid_time_origin%22%3Anull%2C%22time_range%22%3A%22Last+day%22%2C%22all_columns_x%22%3A%5B%22response_size%22%5D%2C%22adhoc_filters%22%3A%5B%5D%2C%22row_limit%22%3A10000%2C%22groupby%22%3A%5B%5D%2C%22color_ [18:46:48] scheme%22%3A%22bnbColors%22%2C%22link_length%22%3A%2210%22%2C%22x_axis_label%22%3A%22%22%2C%22y_axis_label%22%3A%22%22%2C%22global_opacity%22%3A1%2C%22normalized%22%3Afalse%7D [18:46:53] elukey: argh sorry [18:48:13] elukey: if you look at http://localhost:9080/chart/list/ is now at the top [18:49:44] elukey: want me to test more things? i can do that [18:49:47] nuria: ah lovely! So 0.31 seems the good one [18:50:10] I need to tweak two things before [18:50:19] is it ok if we do it later on in the week? [18:54:22] (03CR) 10Joal: "> Patch Set 6:" [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463) (owner: 10Joal) [18:54:59] (03PS2) 10Joal: Correct names in mediawiki-history sql package [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/498861 [18:55:02] 10Analytics, 10Anti-Harassment, 10EventBus, 10Core Platform Team (Multi-DC (TEC1)), and 3 others: Warning: get_class expects object (string given) from EventBusHooks.php - https://phabricator.wikimedia.org/T218952 (10mobrovac) 05Open→03Resolved Patch merged, will go out with the next train. Resolving. 
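The port question in the tunnel exchange above comes down to the shape of the ssh -L argument, which is local_port:target_host:target_port followed by the host to log in to. A minimal sketch that assembles the same command is given below; the function name is hypothetical and nothing is executed, the command is only built.

```python
import shlex


def build_tunnel_command(local_port, target_host, target_port, ssh_host):
    """Build an ssh local port-forward command as an argument list.

    local_port is opened on the local machine; connections to it are
    forwarded through ssh_host to target_host:target_port.
    """
    forward = f"{local_port}:{target_host}:{target_port}"
    return ["ssh", "-L", forward, ssh_host]


HOST = "analytics-tool1004.eqiad.wmnet"
command = build_tunnel_command(9080, HOST, 80, HOST)
command_line = shlex.join(command)
```

With the values from the log this yields the command without the trailing ':', and the forwarded service is then reachable at localhost:9080 in the browser.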
[19:00:54] (03CR) 10Mforns: [V: 03+2 C: 03+2] "LGTM! Sorry for the delay, was on vacation last week." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/497499 (https://phabricator.wikimedia.org/T218594) (owner: 10Bearloga) [19:01:07] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Core Platform Team Kanban (Done with CPT), 10Services (done): Decrease timeout for EventBus extension for analytics events - https://phabricator.wikimedia.org/T218260 (10Pchelolo) 05Open→03Resolved Merged and deployed as a part of SWAT. Resolving. [19:03:31] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 4 others: EventBus extension should never log unserialized events - https://phabricator.wikimedia.org/T218254 (10Ottomata) 05Open→03Resolved [19:06:45] Hi. I have this board: https://language-reportcard.wmflabs.org/cx2/#mt-engines [19:07:15] The labels on the left, like Apertium, Google, scratch, etc., come straight from the database. [19:09:22] One of them is "scratch", which is how it appears in the database, but on the web dashboard, I'd like the label to be different. [19:09:38] What's the nicest way to do it? [19:10:08] Can I write SQL in a way that shows another value? Or do I have to write a post-processing script? Or is there some other trick? [19:12:37] a-team ^ [19:12:39] 10Analytics, 10Anti-Harassment, 10EventBus, 10Core Platform Team (Multi-DC (TEC1)), and 4 others: Warning: get_class expects object (string given) from EventBusHooks.php - https://phabricator.wikimedia.org/T218952 (10dmaza) >>! In T218952#5046392, @Krinkle wrote: > Tagging AHT as this affect events from on... [19:15:22] aharoni: sounds like you could use a case statement [19:15:52] case col when 'scratch' then 'Scratch' else col end, something like that [19:16:26] milimetric: the thing that is documented here? 
- https://dev.mysql.com/doc/refman/5.7/en/case.html [19:17:34] aharoni: https://dev.mysql.com/doc/refman/5.7/en/control-flow-functions.html#operator_case [19:17:48] but we use mariadb, though I don't see any obvious difference [19:18:25] milimetric: great, thank you. I wasn't familiar with this. I hoped that there is something in SQL that can do it, and it looks like the right thing. I'll try it. [19:21:22] * elukey off! [19:24:59] (03CR) 10Ottomata: "RIGHT RIGHT I'm sorry I keep forgetting that. You'll probably have to remind me over and over again." [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463) (owner: 10Joal) [19:30:36] (03PS1) 10Amire80: Clean up the dashboard labels [analytics/limn-language-data] - 10https://gerrit.wikimedia.org/r/498981 (https://phabricator.wikimedia.org/T210135) [19:33:36] 10Analytics, 10Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (10Milimetric) I remember discussing this recently, and the idea we had then was to have a single field, something like `bot_detected_by` which would be a list of `name-regex, grou... [19:34:55] 10Analytics, 10Analytics-Wikistats: Add user_is_bot_by_group to MediaWiki history - https://phabricator.wikimedia.org/T219177 (10Nuria) Moving @Milimetric 's comment: I remember discussing this recently, and the idea we had then was to have a single field, something like bot_detected_by which would be a list... 
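The CASE suggestion for relabelling "scratch" in the dashboard query can be tried on a toy table. The sketch below runs against SQLite from Python's standard library purely as a stand-in for MariaDB; the table name, column name and sample rows are assumptions made for the example, not the real schema behind the report.

```python
import sqlite3


def relabel_engines(engines):
    """Count rows per engine, showing 'scratch' as 'Scratch' via CASE."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the real source data.
    conn.execute("CREATE TABLE translations (engine TEXT)")
    conn.executemany(
        "INSERT INTO translations (engine) VALUES (?)",
        [(engine,) for engine in engines],
    )
    query = """
        SELECT CASE engine WHEN 'scratch' THEN 'Scratch' ELSE engine END AS label,
               COUNT(*) AS total
        FROM translations
        GROUP BY label
        ORDER BY label
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


counts = relabel_engines(["Apertium", "scratch", "Google", "scratch"])
```

Values that do not match any WHEN branch fall through to ELSE and keep their stored spelling, so only the one label changes and no post-processing script is needed.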
[19:50:44] milimetric: if you have time today, CRs are waiting for you :)
[19:51:38] ok joal, will try to catch up with pings and get to those after
[19:52:06] milimetric: sounds great :)
[19:52:37] milimetric: I have finalized the one about revision_deleted_parts, and rebased others
[19:53:08] cool, nice progress
[19:53:48] milimetric: I have also reworked the page_history one, but not yet ready to be seen ;)
[20:00:32] Analytics-Kanban, Patch-For-Review: Coordinate work on minor changes for Edit Data Quality - https://phabricator.wikimedia.org/T213603 (Milimetric) Good checking. The user events from before the 90s are funny and weird, but I am hopeful the new patch fixes the `event_timestamp` for the page create events.
[20:08:59] (CR) Nuria: [C: +2] Update refinery sqoop to use dedicated labsdb host [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[20:10:16] (CR) Nuria: "Let's deploy this with the wednesday deployment this week?" [analytics/refinery] - https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) (owner: Milimetric)
[20:11:09] (CR) Milimetric: "If it gets the +2, sure, happy to put it on the train." [analytics/refinery] - https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) (owner: Milimetric)
[20:33:53] hi milimetric . I made https://gerrit.wikimedia.org/r/#/c/analytics/limn-language-data/+/498981/ . There's a Dashiki board that uses this data. If this patch is merged, will the board be just deployed? Or does it need more intervention and deployment by the Analytics team?
[20:36:36] Also, will it update all the previous data?
[20:39:28] Analytics: Review parent task for any potential pageview definition improvements - https://phabricator.wikimedia.org/T156656 (Milimetric) Sorry to have missed this ping @awight, and thanks for the work!
The `pagecounts-raw` data is the older stuff, where you updated the docs as mentioned in T144100#5053676....
[20:42:36] a-team ^
[20:42:42] (CR) Milimetric: [C: +2] Clean up the dashboard labels [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/498981 (https://phabricator.wikimedia.org/T210135) (owner: Amire80)
[20:43:07] aharoni: I merged it, that'll start by itself, and the dataset will get updated automatically with the new columns
[20:43:29] aharoni: let me know if anything looks wrong though
[20:43:53] milimetric: thanks! How long will it take? A few hours?
[20:45:47] aharoni: it runs weekly, so it'll start showing up on the next run, I think next Monday-ish: https://github.com/wikimedia/analytics-limn-language-data/blob/master/mt_engines/config.yaml
[20:46:09] milimetric: oh, so a week from now?
[20:46:32] Analytics, Analytics-Kanban, Analytics-Wikimetrics: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (mforns) https://metrics.wmflabs.org is now pointing to https://eventmetrics.wmflabs.org
[20:47:09] aharoni: yeah, I believe the weekly jobs run on Sunday but this one is a day delayed, so sometime Monday. You can always delete old data and force reruns with the rerun feature
[20:47:37] milimetric: how do I do it?
[20:47:44] the rerun
[20:48:08] aharoni: I think I have to do it if you want to, I think you tried before and you don't have sudo -u stats on stat1007, right?
[20:48:34] milimetric: I don't think that I even tried. If you can do it, I'll appreciate it.
[20:48:48] aharoni: so just let me know if it's something you want. First make sure this report can be re-run, like, did any of the underlying data change and do you mind losing the format/column names as they have been so far?
[20:49:19] milimetric: Yes, I want it. It's possible to rerun everything from the start. The data is still there.
[20:49:22] aharoni: yeah, we talked about this before, remember: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Reportupdater#Re-runs
[20:49:39] ok, aharoni I'll rerun everything
[20:49:44] Thank you.
[20:50:26] (PS4) Bmansurov: WIP: Oozie: add article recommender [analytics/refinery] - https://gerrit.wikimedia.org/r/496885 (https://phabricator.wikimedia.org/T210844)
[20:51:22] (CR) Bmansurov: "> Patch Set 2:" [analytics/refinery] - https://gerrit.wikimedia.org/r/496885 (https://phabricator.wikimedia.org/T210844) (owner: Bmansurov)
[20:54:25] ok aharoni, marked for rerun, but not sure when it's going to get picked up. If you don't see it by tomorrow, ping me, I can force it
[20:54:39] milimetric: Thanks!
[20:57:56] Analytics, Analytics-Kanban, Analytics-Wikimetrics: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (Nuria) Many thanks, let's please make sure to update all docs.
[21:07:46] Analytics, Analytics-Dashiki: Detect bad hash in tabs layout - https://phabricator.wikimedia.org/T219235 (Milimetric)
[21:27:07] (CR) Nuria: [C: -1] "Super thanks for doing these changes!" (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/498702 (https://phabricator.wikimedia.org/T144100) (owner: Awight)
[21:32:26] (CR) Nuria: "* All metrics is still red and in lowercase on upper right corner, let's please capitalize and change font color."
[analytics/wikistats2] - https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[21:33:34] fdans: CR-ed the "All metrics " patch , the css issue is corrected but there are few others that still are not , I have listed them on gerrit
[21:44:08] (PS7) Nuria: Add concept of metric groups, rotate in dashboard [analytics/wikistats2] - https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[21:45:29] (CR) Nuria: "Sorry about my comments prior, I think this patch just needed rebasing." [analytics/wikistats2] - https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[21:51:20] (CR) Nuria: "Still a bit confused on what is in what patch." (1 comment) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/494241 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[21:51:51] (PS2) Nuria: Create metrics matrix component [analytics/wikistats2] - https://gerrit.wikimedia.org/r/498748 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[21:54:26] (CR) jerkins-bot: [V: -1] Create metrics matrix component [analytics/wikistats2] - https://gerrit.wikimedia.org/r/498748 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[21:54:48] (CR) Nuria: [C: -1] "This change does not load for me after rebasing."
[analytics/wikistats2] - https://gerrit.wikimedia.org/r/498748 (https://phabricator.wikimedia.org/T187806) (owner: Fdans)
[22:06:28] (CR) Nuria: WIP: Oozie: add article recommender (5 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/496885 (https://phabricator.wikimedia.org/T210844) (owner: Bmansurov)
[22:10:07] (CR) Nuria: Correct names in mediawiki-history sql package (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/498861 (owner: Joal)
[22:42:57] Analytics: "Latest X" filter in Turnilo picks the wrong dates - https://phabricator.wikimedia.org/T219040 (Nuria) Ya, i think this is a turnilo bug on date calculations. We need to see if it present on latest version before upgrading.
[23:11:08] Analytics-Kanban, Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Nuria) FYI @Amire80 https://phabricator.wikimedia.org/T207171 has no privacy issues for the most part.
[23:12:02] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) This work needs the bot filtering be active first, otherwise you would just get "fake" top10 lists per country as much of the data will be distorted...
[23:22:21] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Amire80) >>! In T207171#5056278, @Nuria wrote: > This work needs the bot filtering be active first, otherwise you would just get "fake" top10 lists per coun...
[23:26:35] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) >Is this also an issue for the topviews that are shown per language? Yes, it is an issue with any top list. Now, topviews has a "spam" list so titles...
[23:29:02] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Amire80) >>! In T207171#5056374, @Nuria wrote: >>Is this also an issue for the topviews that are shown per language? > Yes, it is an issue with any top list...
[23:32:40] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) >OK, so can the same list be reused for the top views per language and per country? If they have spam, it's probably OK that they have the same spam...
[23:39:07] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Amire80) What I'm trying to say is that in the pageviews tool there is now a topviews tool that shows data per language, and there should also be a topviews...
[23:43:33] Analytics, Analytics-EventLogging, Analytics-Kanban, Fundraising-Backlog, and 2 others: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Nuria) Looking at code patch on vagrant.
[23:51:50] Analytics, MediaWiki-extensions-GrowthExperiments, Product-Analytics, Growth-Team (Current Sprint): Homepage: instrumentation - https://phabricator.wikimedia.org/T216586 (MMiller_WMF)
[23:52:39] Analytics, Language-strategy, Tool-Pageviews: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) >So maybe spam exists, but it doesn't look like a huge blocking problem. It is, at times bot traffic amounts to 10% of our total traffic. This is not...