[00:25:55] Analytics, Editing-team, Performance-Team, observability: VE edit data stopped at ~2019-11-24Z01:25 - https://phabricator.wikimedia.org/T239121 (Jdforrester-WMF) >>! In T239121#5695254, @Nuria wrote: > @Jdforrester-WMF Is this data published from parsoid servers or browser clients? Client-side.
[01:11:09] (PS4) Nuria: Create table to hold calculations of session features [analytics/refinery] - https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360)
[01:13:43] (PS5) Nuria: Create table to hold calculations of session features [analytics/refinery] - https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360)
[01:29:16] (PS6) Nuria: [WIP] Table and workflow for features computations per session [analytics/refinery] - https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360)
[01:34:52] Analytics, Analytics-Kanban, Patch-For-Review: Hourly Feature extraction for bot detection from webrequest - https://phabricator.wikimedia.org/T238360 (Nuria) Code is WIP but if @mforns and @JAllemandou could take a look would be very helpful
[05:48:03] (PS7) Nuria: [WIP] Table and workflow for features computations per session [analytics/refinery] - https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360)
[07:10:39] !log apply systemd user limits to stat1006,stat1007 and notebook100*
[07:10:41] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:12:53] Analytics, Analytics-Kanban, Operations, Product-Analytics, and 2 others: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (elukey) >>! In T212824#5695358, @JAllemandou wrote: > @elukey: We should apply the same treatment for stat1007 :) Done! Only stat1...
[09:16:29] !log apply systemd user limits to stat1005
[09:16:31] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:17:56] Analytics, Analytics-Kanban, Operations, Product-Analytics, and 2 others: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (elukey)
[09:19:22] Analytics, Analytics-Kanban, Operations, Product-Analytics, and 2 others: notebook/stat server(s) running out of memory - https://phabricator.wikimedia.org/T212824 (elukey) I am inclined to close this task, and re-open if more alarms will appear. We'll likely have to tune limits but I hope that t...
[09:21:00] brb
[09:54:38] joal: just ran the python script that allocates memory like crazy on stat1007, nothing exploded
[09:54:47] and it was killed once crossing the 90% mark
[09:54:58] [Wed Nov 27 09:50:45 2019] Memory cgroup out of memory: Kill process 75000 (python) score 999 or sacrifice child
[10:11:57] https://pypi.org/project/apache-superset/ 0.35.1 is out!
[10:58:49] (PS4) Elukey: WIP - Superset 0.35.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/552238
[11:10:41] (PS5) Elukey: WIP - Superset 0.35.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/552238
[11:17:41] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Cparle) FWIW that query also looks sensible to me. Unless someone can point out something wrong with it I think...
[11:57:16] Analytics, Analytics-Kanban, Operations, SRE-Access-Requests: Add system user analytics-privatedata to the anaytics-privatedata-users group - https://phabricator.wikimedia.org/T238306 (jbond) Open→Resolved I think this is complete but please reopen if there are still outstanding tasks
[11:57:19] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Prepare the Hadoop Analytics cluster for Kerberos - https://phabricator.wikimedia.org/T237269 (jbond)
[12:32:00] so good news is that superset 0.35.1 in pypi builds super quick and easy
[12:32:21] everything works except one dashboard, I found a little bug
[12:32:34] that seems a regression, I didn't notice it last week
[12:34:26] Hi elukey - \o/ for limits :)
[12:39:09] \o/
[12:48:33] (PS6) Elukey: WIP - Superset 0.35.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/552238
[12:58:28] (PS7) Elukey: WIP - Superset 0.35.1 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/552238
[13:30:11] (PS2) Joal: [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127)
[13:38:46] (PS3) Joal: [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127)
[14:47:50] there you go https://github.com/apache/incubator-superset/issues/8676
[14:47:53] sigh
[14:48:13] I didn't check all the dashboards apparently last week
[15:00:20] Analytics, Analytics-Kanban, Better Use Of Data, Product-Analytics: Upgrade to Superset 0.35 - https://phabricator.wikimedia.org/T236690 (elukey) Version 0.35.1 was released on pypi, I tested it with https://gerrit.wikimedia.org/r/#/c/analytics/superset/deploy/+/552238/ on an-tool1005 Good thing...
[15:36:38] (PS2) Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484)
[15:47:29] (PS3) Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484)
[15:54:55] (PS10) Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486)
[15:58:14] (PS11) Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486)
[15:58:27] (PS12) Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486)
[15:59:05] (PS4) Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484)
[16:00:44] (PS13) Mforns: Refactor data_quality oozie bundle to fix too many partitions [analytics/refinery] - https://gerrit.wikimedia.org/r/547320 (https://phabricator.wikimedia.org/T235486)
[16:10:02] elukey: hola!
[16:10:38] elukey: in order not to have big files on people's homedirs on stats machines do we put them .. where? /srv/?
[16:13:07] nuria: hola!
[16:13:29] the home dirs are all under /srv/home
[16:13:38] (there is a symlink)
[16:13:49] elukey: so no problems with space anymore in users homedirs , right?
[16:13:51] and /srv is on a big partition (on stat etc..)
[16:14:04] well it depends on the space available on the partition
[16:14:12] but usually there is plenty of spae
[16:14:14] *spece
[16:14:16] argh
[16:14:17] space
[16:14:24] the only crowded one is stat1007
[16:15:43] elukey: k !
[16:16:46] nuria: qq about https://phabricator.wikimedia.org/T220575 - shouldn't it be owned by somebody in Kate's team?
[16:16:57] the files I mean
[16:17:29] elukey: I will own them cause they are litigation related
[16:19:23] ah okok got it
[16:19:55] now it is clearer :)
[16:29:51] elukey: i need to transfer those to my homedir, super sorry
[16:29:51] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata)
[16:31:12] nuria: np! I didn't ask to put pressure, your questions reminded me of that task and I asked :)
[16:37:47] Analytics, Discovery, Event-Platform, Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (TJones) Open→Resolved
[16:42:24] Analytics, Operations, ops-eqiad: Degraded RAID on dbstore1003 - https://phabricator.wikimedia.org/T239217 (elukey) Thanks a lot!
[16:59:17] (CR) Joal: [V: +2] "Confirmed working on cluster :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[17:30:31] Analytics, Discovery, Event-Platform, Wikidata, and 3 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (JAllemandou) Does this being closed mean we can access data on kafka?
[17:39:31] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) Rough first patchset submitted to get the conversation started. Very little is certain here since this is essentially the MVP trans...
[17:43:45] going afk for a bit, checking later!
[17:44:12] bye elukey
[17:44:44] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) I am pinging @awight here for his comments as a long time mediawiki developer, adding him to CR
[17:58:44] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5697973, @Nuria wrote: > I am pinging @awight here for his comments as a long time mediawiki developer, adding him t...
[18:25:27] Analytics: Label high volume bot spikes in pageview data as automated traffic - https://phabricator.wikimedia.org/T238357 (Isaac) > weblight data will be excluded from the classification entirely, the way it gets to us it does not have any client IP that we can use. This is true for any other proxy as out tr...
[18:26:49] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Nuria) We will be scooping the slot tables to the cluster so these queries can run in parallel.
[18:33:43] (PS4) Joal: [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127)
[19:05:46] (PS5) Joal: [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127)
[19:06:54] (CR) Joal: [V: +2] "Tested as well (change to get_mediawiki_section_dbname_mapping as well)" [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[19:07:02] this is ready for review nuria --^
[19:14:19] Ah nuria - actually we miss another table :(
[19:23:08] (CR) Mforns: "LGTM overall as WIP patch! Left some minor comments, some on typos others on naming, etc." (14 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/552943 (https://phabricator.wikimedia.org/T238360) (owner: Nuria)
[19:43:19] (CR) Nuria: [C: +2] [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop (14 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[19:43:23] (CR) Nuria: [C: -1] [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[19:44:09] joal: code reviewed, mean to do -1 cause i think we need to do some name changes
[20:07:04] (PS1) Joal: Add newly sqooped tables to hive and mw-load job [analytics/refinery] - https://gerrit.wikimedia.org/r/553405 (https://phabricator.wikimedia.org/T239127)
[20:07:37] nuria: tables created: joal.mediawiki_wbc_entity_usage, joal.mediawiki_slot_roles and joal.mediawiki_slots
[20:13:41] Analytics, Analytics-Kanban, Patch-For-Review: Import slots/slots_roles and wikibase.wbc_entity_usage through scoop - https://phabricator.wikimedia.org/T239127 (Nuria)
[20:13:44] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Nuria)
[20:20:29] (CR) Nuria: [C: +1] "Looks good, I imagine we have tested this so virtual +2 if that is the case" [analytics/refinery] - https://gerrit.wikimedia.org/r/553405 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[20:36:33] (CR) Joal: "Replies here, patch follows." (14 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[20:40:42] Analytics, Product-Analytics, Epic: Provide feature parity between the wiki replicas and the Analytics Data Lake - https://phabricator.wikimedia.org/T212172 (Nuria) Just an FYI in this ticket that while we will be working to fix quality bugs in mw like: T230915 it is not likely we will be able to add...
[20:42:55] (PS6) Joal: [WIP] Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127)
[20:43:01] Analytics: Remove postal code and longitude / latitude from geocoded data object - https://phabricator.wikimedia.org/T236740 (Nuria) longitude / latitude in geocoded_data object in webrequest is quite imprecise (as it comes from the IP) so it is a field that at this time is just eating space and not very u...
[20:43:13] Analytics: Remove postal code and longitude / latitude from geocoded data object - https://phabricator.wikimedia.org/T236740 (Nuria) ping @kzimmerman
[20:43:27] Analytics: Remove postal code and longitude / latitude from geocoded data object on webrequest data - https://phabricator.wikimedia.org/T236740 (Nuria)
[20:43:53] Analytics: Remove postal code and longitude / latitude from geocoded data object on webrequest data - https://phabricator.wikimedia.org/T236740 (Nuria)
[20:47:35] Analytics: Set up automatic deletion for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (Nuria) Open→Resolved
[20:54:30] Analytics, Multimedia, Tool-Pageviews: Add ability to the pageview tool in labs to get mediarequests per file similar to existing functionality to get pageviews per page title - https://phabricator.wikimedia.org/T234590 (Nuria) The Apis are now deployed so code changes can happen, we are still loadin...
[20:55:07] Analytics, Reading Depth, Readers-Web-Backlog (Tracking): [Bug] Many JSON decode ReadingDepth schema errors from wikiyy - https://phabricator.wikimedia.org/T212330 (Nuria) Open→Declined
[20:55:26] Analytics, Reading Depth, Readers-Web-Backlog (Tracking): [Bug] Many JSON decode ReadingDepth schema errors from wikiyy - https://phabricator.wikimedia.org/T212330 (Nuria) Declining, ReadingDepth instrumentation has been turned off
[20:56:08] Analytics: Remove postal code and longitude / latitude from geocoded data object on webrequest data - https://phabricator.wikimedia.org/T236740 (Nuria) p:Low→Triage
[20:57:12] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[20:57:35] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) @fgiunchedi I'm curious about these stack traces. I think we may have covered this before, but in production the code is minified,...
[21:01:00] (CR) Joal: [V: +2] "Tested (changes and new tables)" [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[21:01:27] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) >Or is this primarily for use in a development setting where minification has not yet happened? The primary usage I think will be prod...
[21:01:56] (PS7) Joal: Add slot, slot_roles and wbc_entity_usage to sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127)
[21:02:15] (CR) Joal: [V: +2] "With forgotten changes in previous patch ..." [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[21:03:07] nuria: I'm sqooping content and content_models table for all DBs now (tested on small number) - Will go for dinner, and then back to finalize on the sqoop patch if ok for you
[21:06:31] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (awight) Looks fun! I'm wondering whether there has been consideration of https://github.com/wikimedia/mediawiki/blob/master/resources/src/me...
[21:15:04] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (awight) >>! In T235189#5698440, @jlinehan wrote: > @fgiunchedi I'm curious about these stack traces. I think we may have covered this before,...
[21:57:07] (PS5) Mforns: Add data quality metric: traffic variations per country [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484)
[22:02:33] nuria: hi - I'm gonna stop my day soon - wanna talk about sqoop change?
[22:06:36] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5698449, @Nuria wrote: > The primary usage I think will be production where it will be enabled for a few wikis and i...
[22:09:07] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5698465, @awight wrote: > Minified code can be mapped back to the full source using standardized [[ https://develope...
[22:13:22] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5698465, @awight wrote: > In my limited experience, stack traces are almost always valuable. Absolutely, I'd never...
[22:19:20] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (jlinehan) >>! In T235189#5698453, @awight wrote: > Looks fun! I'm wondering whether there has been consideration of https://github.com/wikim...
[22:22:48] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) @awight I did not know about https://github.com/wikimedia/mediawiki/blob/master/resources/src/mediawiki.base/mediawiki.errorLogger.js...
[22:43:42] (CR) Nuria: "Just naming changes that I think we need to correct before to merge, virtual +1 once corrected." (9 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/553160 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[22:49:07] (CR) Nuria: Add data quality metric: traffic variations per country (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/550498 (https://phabricator.wikimedia.org/T234484) (owner: Mforns)
[22:56:49] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (awight) >>! In T235189#5698566, @Nuria wrote: > We welcome any advice of how to integrate these two. That would depend on how flexible we c...
[23:00:05] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (awight) > it seems to be logging at 100%. So it sounds like the volume is low enough to consume unsampled. Although you might still conside...
[23:03:06] Analytics, Better Use Of Data, Epic, Patch-For-Review, and 2 others: Prototype client to log errors in vagrant - https://phabricator.wikimedia.org/T235189 (Nuria) >So it sounds like the volume is low enough to consume unsampled. Rather, we will be adding config and enabling it for just 1 wiki at...
[23:55:44] oh thats a fun "upgrade". tox 3.7 that i was using uses absolute path for {toxinidir} and friends, but 3.10 used in CI uses relative paths. Which breaks how i was cheating with `ln -s`..gotta find another way
[23:59:25] heh, although it tries to be smart so something like {env:PWD}/{toxinidir}/.. doubles things up...so for extra beauty we get this hack: /{toxinidir}/... which doubles up the leading /, but convinces tox to not strip the path
[23:59:39] and wrong room, sorry :P