[00:05:00] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (MattCleinman) I believe I have reproduction steps: 1. It's via the Explore Feed - on a "random article of the day" which has...
[08:07:46] Analytics-Clusters, Operations, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (elukey) For this quarter I'd propose to stop the work on moving netflow to eventgate/mep, keeping the current 'ad-hoc' configuration, and then re-evaluat...
[09:16:09] so I think I might have found a solution for the journal nodes
[09:16:16] instead of restarting, stopping all and then start again
[09:16:21] * elukey cries in a corner
[09:18:57] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade the Hadoop test cluster to BigTop - https://phabricator.wikimedia.org/T244499 (elukey) I've done another round of rollout/rollback, and I found the following interesting log: ` 2020-07-09 08:52:37,907 INFO org.apache.ha...
[10:12:39] Analytics-Clusters: Neflow data pipeline - https://phabricator.wikimedia.org/T257554 (elukey) p:Triage→Medium
[10:13:11] Analytics-Clusters: Neflow data pipeline - https://phabricator.wikimedia.org/T257554 (elukey)
[10:13:17] Analytics, Operations, netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (elukey)
[10:13:19] Analytics: Setup refinment/sanitization on netflow data similar to how it happens for other event-based data - https://phabricator.wikimedia.org/T245287 (elukey)
[10:13:21] Analytics: Set up automatic deletion/snatization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (elukey)
[10:13:25] Analytics-Clusters, Operations, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (elukey)
[10:14:34] Analytics: Setup refinment/sanitization on netflow data similar to how it happens for other event-based data - https://phabricator.wikimedia.org/T245287 (elukey)
[10:14:41] Analytics-Clusters, Operations, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (elukey)
[10:16:06] Analytics: Set up automatic deletion/snatization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (elukey) T248865 seems to be doable long term, but for the moment I'd proceed with dropping old data periodically, keeping netflow as "custom" for the time being.
[10:26:24] * elukey lunch!
[10:28:19] Analytics-Clusters, Analytics-Radar, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (fgiunchedi) I copied `prometheus-burrow-exporter` to `buster-wikimedia` and tested in a WMCS instance. puppet runs successfully and burrow from Buster...
[11:10:15] Analytics-Clusters, Analytics-Radar, Operations, observability: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (fgiunchedi) Burrow + exporter seems to be working as expected when connected to real kafka/zk, we are good to go with production VMs
[12:43:17] o/
[12:43:47] don't cry Luca, I'm here
[12:45:40] Analytics-Clusters, Cassandra, User-Elukey: Cassandra3 migration plan proposal - https://phabricator.wikimedia.org/T249756 (elukey) We agreed on upgrading Cassandra to 3.11 to start experimenting with rolling upgrades for Cassandra. The high level plan is to test the in place upgrade in cloud/labs fi...
[12:45:49] Analytics-Clusters, Cassandra, User-Elukey: Cassandra3 migration plan proposal - https://phabricator.wikimedia.org/T249756 (elukey)
[12:46:03] Analytics-Clusters, Analytics-Kanban, Cassandra, User-Elukey: Cassandra3 migration plan proposal - https://phabricator.wikimedia.org/T249756 (elukey)
[12:47:18] I kind of think not using Kafka Connect is a mistake
[12:48:45] Analytics, Cassandra: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (elukey)
[12:48:51] joal: hellooo this change in the deployment etherpad doesn't seem to be merged? https://gerrit.wikimedia.org/r/c/analytics/refinery/+/610151
[12:48:53] Analytics-Clusters, Cassandra: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (elukey)
[12:49:12] oh sorry joal, just realized you're off today, pls don't respond :)
[12:53:26] (CR) Fdans: [V: +2] Correct unique-devices per-project-family bug [analytics/refinery] - https://gerrit.wikimedia.org/r/610151 (https://phabricator.wikimedia.org/T257358) (owner: Joal)
[13:15:14] elukey o/
[13:15:41] hola!
[13:18:49] ottomata: I have thoughts about event sourcing :)
[13:34:00] Analytics, Dumps-Generation: Sample HTML Dumps - Request for feedback - https://phabricator.wikimedia.org/T257480 (Ottomata)
[13:34:08] milimetric: i would love to hear them!
[13:34:59] ok, so first thought: we should do it ourselves, with mediawiki history. I wonder how feasible this is, wanna chat?
[13:35:17] Analytics-Clusters, Operations, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (Ottomata) Sure!
[13:36:05] milimetric: gimme 5-10 mins
[13:36:08] finishing email checking
[13:36:09] anytime
[13:49:16] ok milimetric
[13:49:26] ok, will join in 1 min
[13:49:46] in bc k
[14:20:59] elukey, we are having problems with python versions on JupyterHubs when using PySpark. This started a few days ago,
[14:21:57] do you have any intuition why this problem appears now?
[14:22:15] can you give me more details about what problem you see? :)
[14:23:21] elukey: https://phabricator.wikimedia.org/T256997
[14:24:14] ahhh yes!
[14:24:31] we are going to roll out a fix, but there is a temp workaround
[14:24:33] lemme get it
[14:24:50] Cool, thx
[14:36:12] (in a meeting, will get the info in some mins)
[14:45:52] Analytics, ChangeProp, Core Platform Team, Event-Platform, and 2 others: Run EventBus tests in MediaWiki core CI - https://phabricator.wikimedia.org/T257583 (Pchelolo)
[14:54:01] no rush
[15:00:34] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (Nuria) >I believe I have reproduction steps: Nice! I looked at requests more in detail and the only thing it looked a bit s...
[15:00:59] a-team: are we standupping?
[15:02:35] ottomata, elukey, fdans: standdduppp
[15:07:14] dsaez: https://phabricator.wikimedia.org/T234629#6192867
[15:07:35] in theory in a bit it should not be needed anymore (a bit == some days)
[15:18:28] elukey, could that line be added to the jupyterhub kernels configuration, or do we need to run a separate pyspark instance?
[15:21:36] dsaez: I'll follow up with Andrew to build the new spark pkg today/tomorrow so we can see if it resolves the issue
[15:21:54] okk ... cool, thx
[16:21:47] Analytics-Radar, Better Use Of Data, Desktop Improvements, Product-Infrastructure-Team-Backlog, and 7 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Jdlrobson)
[16:40:51] ottomata: o/
[16:41:11] forgot to ask - I rebuilt the last version of spark2 for buster, can I upload + copy to stretch?
[16:41:32] dsaez seems to have an issue with jupyter hub https://phabricator.wikimedia.org/T256997
[16:43:39] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (MattCleinman) Ignore my repro steps, they do not actually work.
[16:45:49] (will check later, bbiab)
[17:31:10] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (Nuria) Let me dig out which are urls frequently requested (in the thousands) to see if we find a pattern.
[17:33:40] elukey: trying to remember......
[17:33:52] i think perhaps the versions matter? not sure...
[17:34:03] i guess not, since the python versions are separate
[17:34:10] dunno if there would be e.g. shared lib problems
[17:34:14] i can't remember what we do
[17:34:17] i'd say yes?
[17:34:21] give it a try
[17:34:56] ottomata, elukey pointed to a workaround you suggested, which is to define the env var like this: PYSPARK_PYTHON=python3.7 pyspark2 --master yarn
[17:35:10] yeah, but then the jupyter kernel needs fixing, sigh
[17:35:13] but I don't know if this needs to go in the jupyterhub configuration or where
[17:35:17] actually i have a workaround in the spark package
[17:35:26] https://gerrit.wikimedia.org/r/c/operations/debs/spark2/+/602386
[17:35:29] not released though
[17:35:38] i guess we should do it if it is messing up jupyter on buster
[17:36:28] ottomata: this is the error: https://phabricator.wikimedia.org/T256997
[17:37:33] ya the problem is that stat1008 uses python 3.7 by default
[17:37:45] and the hadoop workers are stretch and use python 3.5 by default
[17:37:56] so, that env needs to be set properly to force the workers to use the same version as the driver
[17:37:59] on the CLI that workaround is easy
[17:38:08] in jupyter it isn't, because the kernel builds the env itself
[17:38:20] the fix I linked to makes the default always set to the version the driver uses
[17:39:01] actually it might need more testing
[17:39:06] ipython is a problem
[17:39:21] i don't know if jupyter needs to be launched with ipython
[17:39:26] if it does, it won't work
[17:39:35] hmmm or hmm
[17:39:41] it might, sigh, i dunno
[17:39:45] needs more thought
[17:44:59] Analytics, VPS-project-codesearch: Add analytics/* gerrit repos to code search - https://phabricator.wikimedia.org/T249318 (Milimetric) Open→Resolved Thank you very much for doing it better than I was going to! I don't use uBlock so I have no opinions, but it seems like the main request here is...
[17:51:53] Analytics, Fundraising-Backlog: Dashboard for CentralNotice impression rates using Druid, centralnotice_analytics and CN events - https://phabricator.wikimedia.org/T254792 (Milimetric) Looked at this a little bit closer.
I don't see any reason it can't be implemented whenever you prioritize it. One sma...
[18:40:49] Analytics-Radar, Gerrit, Operations: update git-review to >= 1.27 on all stretch hosts across the board - https://phabricator.wikimedia.org/T257609 (Dzahn)
[18:42:49] Analytics-Radar, Gerrit, Operations: update git-review to >= 1.27 on all stretch hosts across the board - https://phabricator.wikimedia.org/T257609 (Dzahn) per T257496#6294709 I imported the buster 1.27 package to stretch (first tested on mwdebug1001, then imported on apt.wikimedia.org with included...
[18:47:43] Analytics-Radar, Gerrit, Operations: upgrade git-review to >= 1.27 on all stretch hosts across the board - https://phabricator.wikimedia.org/T257609 (Dzahn)
[18:48:08] ottomata: yes, I meant the release for spark2 is still in progress
[18:48:38] Analytics-Radar, Gerrit, Operations: upgrade git-review to >= 1.27 on all stretch hosts across the board - https://phabricator.wikimedia.org/T257609 (Dzahn) Open→Resolved a:Dzahn Upgraded the package on stat1004, stat1006 and stat1007 (apt-get update, apt-get install git-review). https:/...
[18:49:27] elukey: we should build that and install on stat1008 and try some things
[18:49:55] ottomata: it is already on apt1001, do you want me to skip apt for the moment?
[18:50:17] oh you built it i see, cool!
[18:50:23] no no, apt is fine
[18:50:26] go for it
[18:50:31] let's hold off on stretch though
[18:50:37] just do buster and let's try on stat1008
[18:51:26] root@apt1001:/srv/wikimedia# reprepro lsbycomponent spark2
[18:51:26] spark2 | 2.4.4-bin-hadoop2.6-2 | stretch-wikimedia | main | amd64, i386, source
[18:51:29] spark2 | 2.4.4-bin-hadoop2.6-3 | buster-wikimedia | main | amd64, i386, source
[18:51:51] all right, installing on stat1008
[18:52:34] !log upgrade spark2 to 2.4.4-bin-hadoop2.6-3 on stat1008
[18:52:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:53:59] ottomata: stat1008 ready
[18:56:36] I just tested a simple pyspark sql query and it worked fine
[18:57:09] did you do it with yarn?
[18:57:16] I did yes
[18:57:19] oh!
[18:57:20] that is good
[18:57:24] you did e.g. a count or something?
[18:57:30] (i'm still waiting for mine to load...)
[18:57:35] I did spark.sql("SELECT * FROM wmf.webrequest where year=2020 and month=3 and day=16 and hour=0 limit 10").show();
[18:57:50] nice, that is good
[18:57:59] try to just count that hour
[18:58:04] sure
[18:58:07] make sure an executor is actually launched
[18:58:21] limit and/or show might be able to optimize around that
[18:58:46] yep, worked
[18:58:51] spark.sql("SELECT count(*) FROM wmf.webrequest where year=2020 and month=5 and day=16 and hour=0").show();
[18:59:47] great!
[18:59:57] i just can't recall off the top of my head if we need to build specially for stretch...
[19:00:06] i think we don't, because the deps are per python package
[19:00:08] sorry
[19:00:10] per python version
[19:00:27] yeah, the last time we decided to copy (in reprepro I mean)
[19:00:47] because there was the version problem, with the same source
[19:02:17] ok
[19:02:20] go for it, i think it works
[19:02:28] let's check on e.g. stat1004 or something
[19:02:36] so reprepro copy, install there
[19:02:41] then let's check and make sure e.g.
pandas or numpy works
[19:03:24] yep, is it ok if I complete this tomorrow morning? In the meantime, we can ask dsaez to re-check if anything changed (maybe we are lucky and we don't have to mess with jupyter)
[19:03:34] (on stat1008)
[19:08:04] I'll take that as a silent yes, will report tomorrow how the tests go :)
[19:08:30] dsaez: o/ if you have time let me know if pyspark works on jupyter now (please stop the kernel and start it again to test)
[19:08:33] * elukey afk!
[19:44:02] elukey: for sure
[19:44:06] !
[19:44:21] sorry, was in 1on1 with nuria
[19:48:19] Analytics, Dumps-Generation: Sample HTML Dumps - Request for feedback - https://phabricator.wikimedia.org/T257480 (Milimetric) As an example, here is our dump of metadata for all history and how we structure it to solve the problem of "enwiki is massive while other wikis are tiny": https://dumps.wikimedi...
[19:49:51] Analytics, Analytics-Kanban, Core Platform Team Workboards (Initiatives): Design Document that proposes an alternative architecture for historic data endpoints - https://phabricator.wikimedia.org/T241184 (Milimetric) I don't see why not, making it read-only now. The first section on the lambda archi...
[19:56:30] dr apt laters!
[20:21:59] (CR) Nettrom: [C: +1] eventlogging: Remove unused props from PrefUpdate [analytics/refinery] - https://gerrit.wikimedia.org/r/588105 (https://phabricator.wikimedia.org/T249894) (owner: Krinkle)
[20:29:43] (CR) Nuria: [C: +2] "Merging." [analytics/refinery] - https://gerrit.wikimedia.org/r/588105 (https://phabricator.wikimedia.org/T249894) (owner: Krinkle)
[20:46:23] Analytics, Analytics-Kanban, Patch-For-Review: Create job that backfills Pagecounts-EZ (2011 - 2016) data via hadoop correcting issues - https://phabricator.wikimedia.org/T252857 (Milimetric) Just curious, but is this data supposed to be in the `pageview_historical` table now? I only see data for 20...
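The PYSPARK_PYTHON workaround and the packaged spark2 fix discussed above boil down to a small resolution rule: an explicitly set env var wins, and the patched default falls back to the driver's own interpreter version rather than a bare `python`. Here is a minimal sketch of that rule; this is a simplified model, not Spark's actual code (for the Jupyter side, note that kernelspecs accept an `env` map in `kernel.json`, which is one place such a variable can be pinned):

```python
import sys


def resolve_executor_python(env):
    """Pick the interpreter that PySpark executors should run.

    Simplified model: an explicit PYSPARK_PYTHON wins; otherwise fall
    back to the driver's own minor version (e.g. "python3.7" on buster),
    which is the behaviour the patched spark2 package aims for.
    """
    explicit = env.get("PYSPARK_PYTHON")
    if explicit:
        return explicit
    # No explicit setting: mirror the driver's version instead of "python".
    return "python{}.{}".format(*sys.version_info[:2])


# With the CLI workaround the variable is set explicitly:
print(resolve_executor_python({"PYSPARK_PYTHON": "python3.7"}))  # python3.7
```

With the old default, an empty environment meant executors ran whatever `python` resolved to on each worker, which is how the 3.5/3.7 mismatch crept in.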
[22:49:30] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (Nuria) Looking at this list below it seems to me we have a bug related to urls with punctuation marks. See url list with numb...
[22:53:46] Analytics, Product-Analytics: Re-process webrequests from 2020-05-18 so that page views from latest Wikipedia app releases are counted - https://phabricator.wikimedia.org/T256516 (JoeWalsh) That makes sense, thanks so much for pulling the data! We'll be able to use this to ensure we have a complete fix f...
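A closing note on the T256997 failure mode discussed earlier in the log: PySpark refuses to pair a driver and a worker running different Python minor versions, which is exactly what happens when a buster driver (3.7) schedules executors on stretch workers (3.5). The sketch below is an illustrative stand-in for that guard, not Spark's actual worker-startup code, and the message format here is only approximate:

```python
def check_worker_version(driver_version, worker_version):
    """Raise if driver and worker Python minor versions differ.

    Illustrative stand-in for the guard PySpark applies when an
    executor starts a Python worker process.
    """
    if driver_version != worker_version:
        raise RuntimeError(
            "Python in worker has different version {} than that in driver {}".format(
                worker_version, driver_version
            )
        )


# A 3.7 driver on stat1008 against a 3.5 stretch worker fails:
try:
    check_worker_version("3.7", "3.5")
except RuntimeError as err:
    print(err)
```

This is why both remedies above work: forcing PYSPARK_PYTHON on the CLI and defaulting the executor interpreter to the driver's version both make the two sides agree.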