[07:24:03] goood morning
[07:25:58] Good morning!
[07:26:07] elukey: today is webrequest-change day :)
[07:26:50] bonjour! OK taking a day of holidasy!
[07:26:54] *holidays :D
[07:26:57] :D
[07:27:19] elukey: it should be no-op from your side
[07:28:47] joal: http://www.quickmeme.com/meme/3r73wi
[07:29:12] :D
[07:29:21] okok jokes aside, lemme know if/how I can helo
[07:29:24] *help
[07:29:37] I could start trying to write words in correct english for example
[07:30:53] elukey: https://image.slidesharecdn.com/testinginproduction-150602121522-lva1-app6892/95/testing-in-production-6-638.jpg?cb=1433247388
[07:31:23] elukey: anyhow, I'm gonna move gently and ask for your pair of eyes first if ok for you
[07:32:58] ahahahhahaahh
[07:33:07] ack!
[07:40:07] joal: also, we have analytics-hive.eqiad.wmnet working for an-coord1002
[07:40:33] yes elukey - I have been stopped while testing last week and forgot to get back on it
[07:40:40] I promise I'll do today
[07:41:09] nono I didn't mean to push you, I was just bringing it up, didn't remember what we decided last week :)
[07:41:22] yup
[07:41:27] first: test!
[07:41:31] (in prod, obviously)
[07:44:06] I am almost done in prepping a presentation for the team about bigtop and the failover for coord, I had an idea about the db failover that could work nicely
[07:44:15] but I need to verify it with Manuel first
[07:44:28] elukey: I'm eager to know :)
[07:44:43] basically, my idea is to create an idential copy of an-coord1001 on 1002, mariadb included
[07:44:59] then we set mariadb on 1002 as replica of 1001
[07:45:26] so 1001 will have two replicas: db1108, on which we also do backups etc.., and 1002
[07:45:43] if 1001 brutally dies, we just need to flip config to 1002, that's it
[07:45:50] Nice :)
[07:45:57] when 1001 is up again, we just make it as replica of 1002, etc..
[07:46:07] and in theory, we could also ease failovers of the coordinator
[07:46:18] say we want to reboot 1001, we do
[07:46:27] 1) set mariadb on 1001 as read-only
[07:46:42] 2) stop 1002's replica config
[07:46:46] 3) move clients to 1002
[07:47:10] that's it, there is surely some brief moment in which oozie/superset/etc.. are not happy about read only
[07:47:37] elukey: Still a lot better than a full teardown
[07:47:44] if this works and it is not crazy, it could mean a super fast failover if needed
[07:47:55] elukey: how complicated would that be to move clients to 1002?
[07:48:26] joal: if my ideas are correct, just flipping a couple of dns entries
[07:48:36] \o/
[07:50:08] also we'd have a replicated hive metastore, and a replicated hive server (but not load balanced initially)
[07:50:23] elukey: would you agree for a quick batcave for me to underline the plan about webrequest change?
[07:50:35] joal: yep, ok in ~20 mins?
[07:50:53] elukey: I'd like to move almost now, before new hour comes in
[07:51:00] elukey: can do it in here as well
[07:51:20] joal: ah sure then, let's do it
[07:51:23] joining bc
[07:51:33] ack
[07:55:51] !log Kill webrequest-bundle oozie job for table update
[07:55:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:00:00] !log Drop webrequest table
[08:00:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:01:08] !log Recreate wmf.webrequest hive table with new partitioning
[08:01:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:01:50] !log Repair wmf.webrequest hive table partitions
[08:01:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:10:09] (PS5) Joal: Improve webrequest-refine query [analytics/refinery] - https://gerrit.wikimedia.org/r/638086 (https://phabricator.wikimedia.org/T267008)
[08:10:32] (CR) Joal: [V: +2 C: +2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/638086 (https://phabricator.wikimedia.org/T267008) (owner: Joal)
[08:13:33] !log Deploying refinery with scap
[08:13:35] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:25:04] !log Deploying refinery onto HDFS
[08:25:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:31:43] !log Restart webrequest bun
[08:31:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:31:58] !log Restart webrequest bundle oozie job with update
[08:32:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:33:26] ok elukey - operation done - Monitoring the first run before rerunning the beginning of the day
[08:35:12] nice!
[08:38:04] Arf - failure :(
[08:41:30] MEH - There was a typo :]9
[08:43:15] !log Kill webrequest bundle to correct typo
[08:43:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:45:10] !log Correct webrequest job directly on HDFS and restart webrequest bundle oozie job
[08:45:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:56:42] (PS1) Joal: Fix typo in refine_webrequest.hql [fixed in prod] [analytics/refinery] - https://gerrit.wikimedia.org/r/641142
[08:57:03] (PS2) Joal: Fix typo in refine_webrequest.hql [analytics/refinery] - https://gerrit.wikimedia.org/r/641142
[08:57:29] (CR) Joal: [V: +2 C: +2] "Merging - already fixed in prod" [analytics/refinery] - https://gerrit.wikimedia.org/r/641142 (owner: Joal)
[09:13:30] !log Rerun webrequest-refine for hours 0 to 6 of day 2020-11-16 - This will prevent webrequest-druid-daily to get loaded with incoherent data due to bucketing change
[09:13:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:34:40] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (JAllemandou) My perspective on `page_id` at pageview level: - `page_id` is //mostly// available in pageviews - not prese...
[09:52:54] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (JAllemandou) >>! In T267283#6620924, @Nuria wrote: > @JAllemandou Given that user fingerp...
[10:41:14] Morning!
[10:41:19] o/
[10:41:30] !log about to update stat1008 to new kernel and rocm
[10:41:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:34:11] * elukey afk! lunch!
[11:34:26] klausman: great news about stat1008 \o/ (I saw the email)
[11:35:06] pinging miriam explicitly, Tobias upgraded rocm on stat1008 too
[11:35:09] :)
[11:35:43] yayyy thanks klausman and elukey!! thanks SO much!
[11:36:27] miriam: tf-rocm's compatibility is changed, 2.3.1 is required now
[11:36:50] (will reaad laterzzz)
[11:36:50] You are most welcome :)
[11:37:07] noted, thanks!
[11:37:19] elukey: once you're back, we can take a look at the ATS thingy
[12:01:14] (CR) Gilles: [C: +1] Make compatible with Python 3 [analytics/statsv] - https://gerrit.wikimedia.org/r/639223 (https://phabricator.wikimedia.org/T267269) (owner: Dave Pifke)
[12:31:24] hi team :)
[12:46:52] * klausman out for lunch and groceries, bbiab
[14:21:49] hola fdans
[14:22:08] klausman: ping me anytime if you want to chat
[14:25:04] joal: ah I forgot, there is an interesting issue now for bigtop https://issues.apache.org/jira/browse/BIGTOP-3445
[14:25:07] :D :D :D
[14:25:35] Yay :) we'll be able to test :)
[14:26:02] I am also super interested in seeing the difference with swift
[14:31:34] elukey: yes! sorry, got distracted :)
[14:33:49] klausman: do you prefer in here, on meet, tcp over pigeon,.. ?
[14:33:54] *pidgeon
[14:33:56] :D
[14:34:13] meet works
[14:34:21] gimme a secn and I'll be in the BC
[14:34:29] ahhh yes yes even more
[14:34:34] ping me when ready, no rush
[14:37:48] Sono pronto, capitano.
[14:39:28] ack :D
[14:58:50] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Isaac) > For the sake of consistency, I'd rather continue using page_title as identifier. Thanks @JAllemandou for these a...
[15:29:02] heya teamm!
[15:37:53] hola hola
[15:51:33] Analytics, Analytics-Kanban: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (fdans) Open→Resolved
[15:51:34] Analytics, Analytics-Kanban: Improve mediawiki-wikitext spark job repartitioning - https://phabricator.wikimedia.org/T263736 (fdans) Open→Resolved
[15:51:37] Analytics, Analytics-Kanban: Prevent dumps-dependent jobs to wait indefinitely - https://phabricator.wikimedia.org/T263529 (fdans) Open→Resolved
[15:51:39] Analytics, Analytics-Kanban: Filter out EventLogging data with bunk user-agents - https://phabricator.wikimedia.org/T266130 (fdans) Open→Resolved
[15:51:41] Analytics, Analytics-Kanban: Possible issue between Maxmind and Hive 2.x libs in Refinery source - https://phabricator.wikimedia.org/T266322 (fdans) Open→Resolved
[15:51:43] Analytics, Analytics-Kanban: Purge raw webrequest_stats and webrequest_stats_hourly - https://phabricator.wikimedia.org/T262826 (fdans) Open→Resolved
[15:51:45] Analytics, Analytics-Kanban: Check whether mediawiki production event data is equivalent to mediawiki-history data over a month - https://phabricator.wikimedia.org/T262261 (fdans) Open→Resolved
[15:51:47] Analytics, Analytics-Kanban, Event-Platform: Eventgate crashes on invalid event - https://phabricator.wikimedia.org/T260839 (fdans) Open→Resolved
[15:51:49] Analytics, Analytics-Kanban, Patch-For-Review: [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki , including reverts using apache hudi - https://phabricator.wikimedia.org/T258532 (fdans)
[15:51:51] Analytics, Analytics-Kanban, Analytics-Wikistats: Stats for newer projects not available - https://phabricator.wikimedia.org/T258033 (fdans) Open→Resolved
[15:51:53] Analytics-Radar, Performance-Team, MW-1.36-notes (1.36.0-wmf.8; 2020-09-08), Patch-For-Review: Invalid navigation timing events - https://phabricator.wikimedia.org/T254606 (fdans)
[15:51:55] Analytics, Analytics-Kanban, Patch-For-Review: jsonschema-tools should have option to materialize schemas with default max/min validation for e.g. max long, max double, etc. - https://phabricator.wikimedia.org/T258659 (fdans) Open→Resolved
[15:51:57] Analytics, Analytics-Kanban: Hadoop Hardware Orders FY2019-2020 - https://phabricator.wikimedia.org/T243521 (fdans)
[15:51:59] Analytics, Analytics-Kanban: Analytics Ops Technical Debt - https://phabricator.wikimedia.org/T240437 (fdans)
[15:52:01] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Create the new Hadoop test cluster - https://phabricator.wikimedia.org/T255139 (fdans) Open→Resolved
[15:52:03] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (fdans) Open→Resolved
[15:52:47] Analytics, Operations, SRE-Access-Requests: Requesting access to Production Shell Access (analytics-privatedata-users) for Rmaung - https://phabricator.wikimedia.org/T266250 (herron) Open→Resolved Since this has been awaiting input for several weeks, I'll temporarily transition it to closed d...
[16:07:40] joal: kinda oddball question, i see wmf.webrequest had a new schema deployed today. What are the chances that even though webrequest.page_id was changed to bigint in refinery in 2017, that in prod is wasn't bigint until this morning?
[16:08:01] (having issue with downstream task that can't put that bigint in an int column)
[16:08:14] wow
[16:08:17] hi ebernhardson
[16:08:24] very possible it is :S
[16:09:05] i'm suspicious, but hard to find definitive proof :)
[16:09:22] hm
[16:10:45] ebernhardson: I'm trying to find an idea of how to make this sure
[16:11:15] yea, it's not very obvious how to check. Part of why we commit schemas to git, to have a second place to check :)
[16:11:59] yeah - usually when there is a schema change we apply it, so it feels bizzare if it has not been - but very possible
[16:15:12] hmm, parquet-tools should be able to report the schema when it was written. I hvae to jump into a meeting now but will look a little closer after
[16:15:34] (but that schema of course is not the table schema, its parquets version )
[16:17:17] ebernhardson: I can do that with spark too
[16:23:12] ottomata: should I move all the event migration tasks to the event platform column?
[16:23:39] ebernhardson: I confirm the parquet files have an int before today's migrationg
[16:23:48] ebernhardson: Man this is unexpected :(
[16:26:44] joal: mine is not super complicated, now that i know why it happened it's not hard to cast on our side. But you might run into other things
[16:26:59] ebernhardson: I hear that
[16:27:26] Thanks for the heads up ebernhardson, and sorry for the inconvenient bug :S
[16:27:29] Analytics, Event-Platform, Product-Infrastructure-Data: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (Mholloway)
[16:28:39] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 2 others: Clients need to generate an ISO 8601 formatted timestamp - https://phabricator.wikimedia.org/T240460 (fdans) p:Low→High
[16:35:17] Analytics, Event-Platform, Product-Infrastructure-Data: Client-side error logging should use Elastic Common Schema (ECS) fields when possible - https://phabricator.wikimedia.org/T267602 (fdans) p:Triage→Medium
[16:39:05] Analytics-Radar, Pageviews-API, RESTBase-API, Wikifeeds, Chinese-Sites: views error in mostread feed - https://phabricator.wikimedia.org/T267624 (fdans) The way the rank is computed is probably excluding bots, which the "views" field might not be doing.
[16:39:19] (CR) Neil P. Quinn-WMF: Oozie job for Wikipedia Preview stats (3 comments) [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953) (owner: Sbisson)
[16:40:50] Analytics, Better Use Of Data, Event-Platform: Adopt conventions for server receive and client/event timestamps in non analytics event schemas - https://phabricator.wikimedia.org/T267648 (fdans) p:Triage→Medium
[16:59:03] ottomata: if you can later, I'd like to sync up with you on the UI, to see if I'm missing sth
[17:00:23] sure!
[17:43:35] Analytics, Product-Analytics: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history - https://phabricator.wikimedia.org/T266374 (LGoto) p:Medium→Low
[17:50:10] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Add dimensions to editors_daily dataset - https://phabricator.wikimedia.org/T256050 (cchen)
[17:51:43] ottomata: quick question! Who should we ask in analytics to approve server access? For tasks like this one? https://phabricator.wikimedia.org/T267314 Thanks!
[17:52:02] miriam: I am the new approver!
[17:52:40] miriam: i guess I should ask
[17:52:45] who is Swagoel?
[17:54:42] mforns: o/ - do you have 10 mins to chat ?
[17:54:49] Analytics, Product-Analytics, Structured Data Engineering, Patch-For-Review, and 2 others: [L] Instrument MediaSearch results page - https://phabricator.wikimedia.org/T258183 (nettrom_WMF)
[17:55:00] heya elukey I have a meeting in 5 but until then!
[17:55:03] bc?
[17:55:11] ah snap no it requires more time :D
[17:55:16] ah ok
[17:55:23] we can do it tomorrow, np
[17:55:26] my meeting ends in 35
[17:55:30] ok, tomorrow then
[17:55:38] ah ok in 35 is good
[17:55:40] I'll wait
[17:55:42] :)
[17:55:45] ok!
[18:10:35] hi ottomata sorry, just finished the meeting
[18:12:32] Swagoel is a prospective Harvard student who is collaborating with us in a volunteer capacity. Swati has worked in the past with leila and another former collaborator. She will be working on the "maps of visual knowledge gap" project, for which she will need access to the wikidata tables on hive
[18:12:54] elukey: meeting ended early, wanna bc?
[18:14:18] mforns: sure!
[18:14:32] elukey: ok, omw
[18:15:58] mforns: eventstreams patch up!
[18:15:59] https://gerrit.wikimedia.org/r/c/mediawiki/services/eventstreams/+/641215
[18:16:24] ottomata: lookin, wanna talk ui in 15 mins?
[18:17:20] mforns: ok! or now would be better for me if you are avail?
[18:17:39] ottomata: in meeting
[18:17:42] ah k
[18:23:37] thanks ottomata :)
[18:23:44] for the approval!
[18:25:21] razzi: if you have time, do you want to join bc?
[18:25:26] ya
[18:45:29] ottomata: done with the meeting, if you have time, or else I'll be here later, or tomorrow!
[18:47:31] ya lets' do now
[18:47:43] in bc
[18:48:13] mforns: ^
[18:48:25] ottomata: ok coming!
[19:03:53] * elukey off!
[19:03:55] o/
[19:10:27] gottta go afk for a bit, back later
[19:32:44] Analytics, Operations, SRE-Access-Requests: Requesting access to production shell groups for JAnstee - https://phabricator.wikimedia.org/T266249 (herron) Open→Resolved I'll transition this to closed for the time being due to inactivity. When ready to proceed please add a comment of manager a...
[20:26:34] * ottomata back
[22:26:01] Analytics, Analytics-Kanban, Better Use Of Data, Event-Platform, and 2 others: Clients need to generate an ISO 8601 formatted timestamp - https://phabricator.wikimedia.org/T240460 (Ottomata) I've deployed the eventgate-wikimedia change in all staging clusters. I'll do the prod ones tomorrow. If...
[23:48:55] Quarry, cloud-services-team (Kanban): Do some checks of how many queries will break in a multiinstance environment - https://phabricator.wikimedia.org/T267989 (Bstorm)