[06:09:45] o/
[06:56:17] ebernhardson: o/ - as FYI rocm is way more stable on stat1005, I updated it yesterday to 2.7.1 (latest upstream). Miriam worked on tensorflow and it seems processing stuff well, if you want to do some tests and report back I'll be super happy :)
[07:09:12] Good morning elukey :)
[07:12:24] bonjour!
[07:12:47] I forgot to ask to Andrew all the info related to notebooks
[07:12:48] sigh
[07:13:16] joal: when you have time, do you want to discuss https://phabricator.wikimedia.org/T231067 ?
[07:13:30] I'd love to know your opinion
[07:13:36] Sure elukey! You're always my number one priority :)
[07:13:36] nothing urgent, when you have 10 mins :)
[07:13:40] <3
[07:13:50] now, later?
[07:15:38] when you have time :)
[07:15:50] I added some thoughts to the task
[07:16:09] we can discuss them in the bc if you want (probably quicker)
[07:16:10] We can do it now
[07:16:13] sure
[07:16:17] To the cave!
[07:16:19] ack joining
[07:16:30] (I feel like cro-magnon when saing that)
[08:18:12] Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) @MoritzMuehlenhoff what do you think about our plan? Could it be enough to consider backporting openjdk-8 to buster? The value added would not only be related to Hadoop, but to all java-based systems like Kafka/Dr...
[08:18:56] https://issues.apache.org/jira/browse/KAFKA-7264 talks about java 11 support for kafka, fixed for 2.1.0..
[08:44:07] Analytics-Kanban: Deprecate Python 2 software from the Analytics infrastructure - https://phabricator.wikimedia.org/T204734 (elukey)
[08:55:17] I believe that we should start working on https://www.python.org/doc/sunset-python-2/ sooner rather than later
[08:56:07] joal: do you know if we use libraries in refinery/etc.. that are definitely not available in python3?
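Editor's sketch for the question just asked about libraries that may not be available in Python 3: one rough first pass is to ask a Python 3 interpreter which module names it can locate. The function name and the example module names are hypothetical, since the log does not list the actual refinery dependencies, and a module being importable says nothing about whether its behaviour changed between versions.

```python
import importlib.util


def missing_in_python3(module_names):
    """Return the top-level module names the running interpreter cannot find."""
    missing = []
    for name in module_names:
        # find_spec returns None when no importable module of that name exists.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical candidates; replace with the real dependency list.
    print(missing_in_python3(["json", "some_py2_only_library"]))
```

Run under the Python 3 environment that the jobs would actually use, since the answer depends on what is installed there.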
[08:56:24] hm - I actually have no idea
[08:56:49] elukey: nothing pops in my mind, but I really don't know well our python libs
[09:02:39] Analytics, Analytics-Cluster, User-Elukey: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (elukey)
[09:03:39] Analytics: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (MGerlach) a:leila→MGerlach Martin will work on this project as part of his onboarding
[09:10:33] (PS1) Joal: Squash all commits on uap-java from 2016-04 [analytics/ua-parser/uap-java] - https://gerrit.wikimedia.org/r/536129 (https://phabricator.wikimedia.org/T212854)
[09:14:34] (PS2) Joal: Squash all commits on uap-java from 2016-04 [analytics/ua-parser/uap-java] - https://gerrit.wikimedia.org/r/536129 (https://phabricator.wikimedia.org/T212854)
[09:35:59] !log drop old database 'superset' from analytics-meta (an-coord1001) after a precautionary backup
[09:36:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:44:24] Analytics, Analytics-Kanban: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) Had a chat with Manuel, and we have some alternatives: 1) use xtrabackup, that is similar to what we are currently doing with lvmsnapshot, but in theory safer and possibly faste...
[10:28:47] (PS1) Ladsgroup: Use the new wdqs address [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/536144 (https://phabricator.wikimedia.org/T176875)
[10:45:46] * elukey lunch
[11:06:44] Analytics, Operations, Traffic: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (jbond) p:Triage→Normal
[11:44:57] (CR) WMDE-leszek: [C: +2] Use the new wdqs address [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/536144 (https://phabricator.wikimedia.org/T176875) (owner: Ladsgroup)
[11:46:12] (Merged) jenkins-bot: Use the new wdqs address [analytics/wmde/toolkit-analyzer] - https://gerrit.wikimedia.org/r/536144 (https://phabricator.wikimedia.org/T176875) (owner: Ladsgroup)
[11:55:42] nice --^
[11:58:57] elukey: I have a question for you
[11:59:25] sure
[11:59:28] (https://pythonclock.org/ is awesome)
[11:59:58] elukey: I'm fighting badly with uap-core and uap-java repos, mostly because our repo versions have squashed cherry-pick history
[12:00:39] elukey: do you think it'd be ok to push force fresh versions of those, matching versions of the base github repos?
[12:01:28] joal: in theory yes, I don't see any problem with it
[12:01:42] Then, if we need to update the code for us, possibly we could use [WMF] commit tags, facilitating rebasing when updating (if needed, we should always provide PRs associated with those internal patches)
[12:02:00] +1
[12:02:02] ok elukey - I'll wait for nuria's opinion on that - Hopefully it'll be ok :)
[12:02:16] Thanks buddy
[12:02:45] in my mind, we should aim to eventually "return" to a full upstream maintained solution, if somebody work on it (or restarted to work on it)
[12:02:56] we usually fork stuff but we forget to check upstream periodically :d
[12:03:03] maintaining forks is painful
[12:03:03] yes
[12:52:14] (PS10) Fdans: Add per file mediarequests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589)
[12:53:01] (CR) jerkins-bot: [V: -1] Add per file mediarequests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[12:53:53] ಠ_ಠ
[12:55:03] * joal looks away
[12:55:36] (PS11) Fdans: Add per file mediarequests endpoint to AQS [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589)
[13:04:41] (PS14) Fdans: Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[13:08:23] (CR) Nuria: [C: +1] "Looks good, let's make sure to push to a branch and test in beta (you can push a commit on top of this one)" (1 comment) [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[13:09:02] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog, Epic: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (fgiunchedi)
[13:09:12] @nuria i'm doing a final test on the cassandra load job
[13:09:26] lol look at me using @s like we're on slack
[13:09:39] * fdans if only
[13:11:30] (CR) Nuria: [C: +1] "Looks good, maybe @joal can take one last look here for anything I might have missed?" [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[13:12:17] fdans: k
[13:12:17] Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (MoritzMuehlenhoff) So, let me summarise to make I got this correctly. We have the following two options: 1. Upgrade to CDH 6.3 on Stretch which provides Hadoop and Scala supporting both Java 8 and 11 and then reimage eac...
[13:13:03] Analytics, Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (Nuria)
[13:27:33] ottomata: created ticket for production access https://phabricator.wikimedia.org/T232707
[13:28:03] o/
[13:28:46] mgerlach: is 'mgerlach' your wikitech username?
[13:28:59] and, is that a good shell username too (it is nice if they are the same)
[13:29:16] (but they don't have to be...your wikitech might be Mgerlach_WMF or something?)
[13:29:36] ottomata: yes (I am 99% sure)
[13:29:40] ok cool
[13:31:01] mgerlach: o/ (Luca, Analytics team :)
[13:31:37] (don't think that we met yet but I might be wrong :D)
[13:33:38] elukey: o/ probably not
[13:33:49] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (Ottomata) a:Ottomata→None
[13:34:09] mgerlach: luca is my counter part on all things analytics opsey/backend!
[13:34:11] (PS15) Fdans: Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[13:34:36] mgerlach: great, i edited your task and updated it
[13:35:03] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (Ottomata)
[13:35:18] ottomata: great, thanks.
[13:37:12] Analytics, Analytics-Kanban, Patch-For-Review: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (Ottomata) 2 minutes is definitely acceptable, and I think we could run the backup once a day even.
[13:44:34] Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (Ottomata) A tricky part about Java upgrades is that from what I can tell, any inter JVM process communication seems to fail between different java versions. So, Hadoop <-> Hadoop stuff will fail if the processes are runn...
[14:19:00] I'm going to write a book called "How to recover when Oozie has broken your spirit"
[14:32:05] Analytics, EventBus, Scoring-platform-team: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Halfak) Confirmed. We expect no overlap in revision-scored where a revision is scored twice, but should that happen, new event.
[14:34:56] Analytics, EventBus, Scoring-platform-team: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Ottomata) @Halfak I think we've confirmed this before too, but I want to make super sure! All predictions are strings (or can be cast to strings), an...
[14:39:28] Analytics, EventBus, Scoring-platform-team: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Halfak) Hmm yes. You can cram a bool or int into a string. Predictions are all int, bool, string, or list of strings.
[14:41:14] (PS16) Fdans: Add cassandra loading job for requests per file metric [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149)
[14:57:33] fdans: also, "the things you can learn while oozie executes your job"
[14:58:24] "the things you can learn while oozie takes 15 minutes to kill your job"
[15:03:53] Analytics, Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (leila) p:Triage→Normal
[15:10:41] Analytics, Cleanup, Editing-team: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (TheSandDoctor) This appears to be a duplicate of T228981 but pinging different users...?
[15:17:48] Analytics, Cleanup, Editing-team: Deletion of limn-edit-data repository - https://phabricator.wikimedia.org/T228982 (fdans) @TheSandDoctor nope. This task is regarding the deletion of https://github.com/wikimedia/analytics-limn-edit-data The task you mention is for the deletion of https://github.com...
[15:26:14] (CR) Nuria: "This updates also to the newest version of uap-core?" [analytics/ua-parser/uap-java] - https://gerrit.wikimedia.org/r/536129 (https://phabricator.wikimedia.org/T212854) (owner: Joal)
[15:29:52] (CR) Fdans: [V: +1] "Verified that this works fine. Compared values resultant from cassandra load job with mediarequests datasetr, including the last change re" [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[15:32:24] ottomata: did you see https://issues.apache.org/jira/browse/KAFKA-7264 ?
[15:32:29] (also, o/)
[15:34:50] i did not!
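Editor's sketch relating to the T225211 exchange above, where predictions are said to be int, bool, string, or list of strings, all castable to strings. A minimal illustration of that cast, assuming a target shape of one list of strings per prediction; the target shape, the function name, and the lowercase rendering of booleans are assumptions, not something the log confirms about the final schema.

```python
def prediction_to_strings(prediction):
    """Normalize an int, bool, str, or list-of-str prediction to a list of strings."""
    if isinstance(prediction, list):
        return [str(item) for item in prediction]
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(prediction, bool):
        # Lowercase "true"/"false" is an assumed formatting choice.
        return [str(prediction).lower()]
    if isinstance(prediction, (int, str)):
        return [str(prediction)]
    raise TypeError(f"unsupported prediction type: {type(prediction).__name__}")
```

Wrapping scalars in a one-element list keeps a single value type for every entry of a map, which is one way to fit mixed prediction types into a map-typed field.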
[15:35:04] wonder what the upgrade process is
[15:35:10] i bet good, they try to make that stuff easy
[15:35:11] ish
[15:38:04] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (leila) I approve. thanks!
[15:39:27] (CR) Fdans: [V: +1] "Verified to be working in beta. This is ready to be merged." [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[15:43:16] joal nuria both changes can be merged at yalls convenience
[15:56:20] fdans: Will review this evening after meetings
[16:00:46] ping fdans , ottomata standdduppp
[16:01:16] oh
[16:04:04] Analytics, Analytics-Kanban, Patch-For-Review: Version analytics meta mysql database backup - https://phabricator.wikimedia.org/T231208 (elukey) a:Ottomata→elukey
[16:12:04] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (Nuria) Approved on my end as well.
[17:09:40] Analytics, Analytics-Cluster, Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (fdans)
[17:10:27] Analytics, Analytics-SWAP, Product-Analytics, Patch-For-Review: Upgrade all SWAP users to JupyterLab 1.0 - https://phabricator.wikimedia.org/T230724 (fdans)
[17:12:08] Analytics, Analytics-Kanban, Patch-For-Review: Upgrade n ua parser to latest version for both java and python - https://phabricator.wikimedia.org/T212854 (Nuria)
[17:14:23] Analytics, Operations, Patch-For-Review, User-Elukey: Archival of home directories on servers with very large homes - https://phabricator.wikimedia.org/T215171 (fdans)
[17:15:57] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (fdans) Open→Resolved
[17:16:04] Analytics, Analytics-Kanban, Operations, Patch-For-Review: Sunset Wikimetrics - https://phabricator.wikimedia.org/T211835 (fdans) Resolved→Open
[17:27:33] Analytics, Analytics-EventLogging, Analytics-Kanban, Operations, and 2 others: Upgrade python-kafka - https://phabricator.wikimedia.org/T221848 (Ottomata) Open→Resolved Closing this task. {T222941} is still an issue, but for now we will ensure that we don't accidentally upgrade to 1.4.6 o...
[17:32:35] Analytics, Analytics-Kanban, Patch-For-Review: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (Ottomata) Status: 1.4.1 is on eventlog1002, 1.4.6 is in apt and used by coal (IIUC). We aren't going to solve https:...
[17:36:39] Analytics, Patch-For-Review: Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (Ottomata)
[17:59:36] Analytics, Patch-For-Review, Performance-Team (Radar): Eventlogging processors are frequently failing heartbeats causing consumer group rebalances - https://phabricator.wikimedia.org/T222941 (Krinkle)
[18:06:26] (CR) Joal: [C: -1] "Many non-important comments, one important (pageTitle normalization gone, breaking other API)." (11 comments) [analytics/aqs] - https://gerrit.wikimedia.org/r/534824 (https://phabricator.wikimedia.org/T231589) (owner: Fdans)
[18:24:47] (Abandoned) Joal: Squash all commits on uap-java from 2016-04 [analytics/ua-parser/uap-java] - https://gerrit.wikimedia.org/r/536129 (https://phabricator.wikimedia.org/T212854) (owner: Joal)
[18:29:39] fdans: I hope you don't hate me too much for the review :S
[18:34:35] (CR) Joal: [C: +1] "Except from the naming discussion on value fields (aqs CR), looks great. Thanks a lot for the coordinator in addition to the bundle :)" (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/533921 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[18:38:56] Gone for diner - Back in a bit
[18:40:46] ottomata: when you can, can you do a pass over the onboarding etherpad we have together and assign months to the items that are unassiged. For example, eventlogging.
[18:41:58] done
[18:42:08] ottomata: thanks!
[18:42:55] joal: one more q for you, too. does T232123 include some intro to http://pythonhosted.org/mediawiki-utilities/ ? or shall we plan for mgerlach to learn that separately?
[18:42:56] T232123: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123
[19:31:37] leila: no mediawiki-utilities in what we currently (yet!)
[19:39:02] for the ones interested in huge-data analytics: https://blog.acolyer.org/2019/09/11/procella/
[19:39:23] and the paper: https://ai.google/research/pubs/pub48388/
[19:39:34] The numbers come from another dimension...
[19:40:06] Gone for tonight team - see you tomorrow
[19:57:14] (PS9) Mforns: [WIP] Add Oozie job for mediawiki history dumps [analytics/refinery] - https://gerrit.wikimedia.org/r/530002 (https://phabricator.wikimedia.org/T208612)
[20:41:53] (CR) Mforns: "This is tested and ready for final review!" [analytics/refinery] - https://gerrit.wikimedia.org/r/530002 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[20:43:40] (CR) Mforns: [C: -2] "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[20:45:16] (PS7) Mforns: [WIP] Add spark job to create mediawiki history dumps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/528504 (https://phabricator.wikimedia.org/T208612)
[21:52:54] ottomata: friendly remainder about: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/535910/
[21:53:21] DOH nurriaaa
[21:53:27] i forgot!
[21:53:29] i can merge now
[21:53:34] but am leaving soon, not working tomorrow
[21:58:18] nuria: should I merge and let it go?
[22:00:28] ottomata: ya, if something goes bad i will revert and enlist a friendly ops for +2
[22:00:33] ok
[22:00:33] ottomata: sounds ok?
[22:04:55] merged and applied nuria
[22:05:06] k
[22:05:11] byyyeee
[22:05:22] ottomata: applied meaning you re-run puppet?
[22:21:14] Analytics, Operations, Traffic: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (Nuria) cc @Ottomata just in case he can do the change too
[23:02:55] Analytics, Operations, Traffic: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (Nuria)
[23:09:36] Analytics: Client_IP and Ip are always the same , even for proxied requests for opera mini - https://phabricator.wikimedia.org/T232795 (Nuria)
[23:11:37] Analytics: Client_IP and Ip are always the same , even for proxied requests for opera mini - https://phabricator.wikimedia.org/T232795 (Nuria) Verfiable with this query: select ip, client_ip,x_analytics, user_agent from wmf.webrequest where year=2019 and month=09 and day=01 and hour=01 and user_agent lik...