[05:07:09] Analytics, DBA, User-Kormat: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (Marostegui) >>! In T256966#6274434, @Kormat wrote: > This host was reimaged to buster recently (2020-06-22) as part of T254870, and the symptoms do sound very like https://jira.mariadb.or...
[06:21:19] good morning folks
[07:03:34] so I just tried the cookbooks for the Bigtop upgrade in hadoop test, the change-distro.py failed during the last step but it was due to an inconsistency in the standby namenode left by another upgrade/rollback attempt
[07:03:44] the rest worked really nicely :)
[07:08:13] I'll wait a bit and then I'll attempt a rollback
[07:12:02] Analytics-Clusters, Analytics-Kanban, Operations, Patch-For-Review: Create a profile to standardize the deployment of JVM packages and configurations - https://phabricator.wikimedia.org/T253553 (elukey) The goal of this task has been reached, there are some remaining systems to be migrated (CI, c...
[07:12:12] Analytics-Clusters, Analytics-Kanban, Operations, Patch-For-Review: Create a profile to standardize the deployment of JVM packages and configurations - https://phabricator.wikimedia.org/T253553 (elukey)
[07:14:37] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Decomission notebook hosts - https://phabricator.wikimedia.org/T249752 (elukey)
[07:20:23] Hi elukey - awesome :)
[07:21:40] (CR) Joal: "Tested on cluster" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/609465 (https://phabricator.wikimedia.org/T255548) (owner: Joal)
[07:24:17] Analytics, DBA, User-Kormat: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (Kormat) > From the reported logs In that case let me supply more logs :) The errors from line 15 onwards are what made me think of that mariadb upstream issue. {P11741}
[07:24:48] 160/3
[07:25:17] oops - 53.33 for the result
[07:29:58] Analytics, DBA, User-Kormat: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (Marostegui) Ah, I only saw the ones reported on the task initial creation. Those are definitely similar to the ones we did see during the crashes with labsdb hosts. Going to comment on th...
[07:34:26] Analytics, DBA, Upstream, User-Kormat: dbstore1005 s8 mariadb instance crashed - https://phabricator.wikimedia.org/T256966 (Marostegui)
[07:51:37] !log enable binlog on matomo's database on matomo1002
[07:51:38] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:56:21] joal: do you want to test/switch aqs' druid datasource?
[07:56:29] yessir!
[08:01:00] elukey: --^ :)
[08:02:03] joal: aqs1004 ready for testing :)
[08:02:08] \o/
[08:02:13] Teeeeeeeesting!
[08:02:45] ready for me elukey :)
[08:03:00] joal: so ok to rollout?
[08:03:08] please yes!
[08:03:47] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (Marostegui) >>! In T234826#6274823, @elukey wrote: > Status of the databases: > > `analytics-meta` has binlog enabled, with ROW forma...
[08:07:49] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (elukey) Yes correct I was checking and reporting the current status to find with you folks if there was any blocker that I didn't know...
[08:09:12] !log roll restart aqs on aqs100[4-9] to pick up new druid settings
[08:09:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:10:24] joal: done!
[08:12:28] checking UI elukey :)
[08:15:42] big drop of contribs last month on frwiki
[08:18:35] ok UI looks good - new monthly data released :)
[08:18:41] Thanks a lot elukey :)
[08:19:27] super
[08:33:46] piwik looks ok after enabling the binlog for the db (pre-requisite to enable replication later on)
[08:55:43] * elukey interview time, bbl
[08:57:31] Analytics-Radar, Analytics-Wikistats, Chinese-Sites: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki - https://phabricator.wikimedia.org/T180118 (VulpesVulpes825) Open→Declined As the final Wikistats-1 dump-based reports have been released, Wikis...
[09:24:03] (PS2) Joal: Fix mediawiki-history skewed join bug [analytics/refinery/source] - https://gerrit.wikimedia.org/r/609465 (https://phabricator.wikimedia.org/T255548)
[10:32:59] * elukey lunch!
[13:14:04] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Create anaconda .deb package with stacked conda user envs - https://phabricator.wikimedia.org/T251006 (Ottomata) Ok, I think you are right. Will do that.
[14:22:01] some insights on pageviews thanks to the new filter/split component: https://usercontent.irccloud-cdn.com/file/eEzCsF34/%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%BC%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%83%E3%83%88%202020-07-06%20%E5%8D%88%E5%BE%8C4.21.33.png
[14:22:47] automated pageviews on mobile app have increased by an order of magnitude over the last few days
[14:23:41] fdans: very much expected, it had dropped a month ago (or two) :)
[14:24:10] * joal really wants that feature - screen-shot is too much teasing fdans
[14:24:18] :)
[14:25:29] joal: yea I was thinking about the issue we discussed last week but automated seems a disproportionately high increase
[14:26:17] hm - interesting fdans!
[14:26:26] I hadn't noted the automated aspect
[14:26:56] joal: hmm, not disproportionate though, now that I'm seeing it compared to user
[14:27:14] fdans: that's what I was checking, but with turnilo :)
[14:27:58] fdans: seems back in the same ranges it was before the change
[14:28:09] with some automated - but not all
[14:28:50] we should nonetheless let them know about that, to try to pinpoint potential pre-loading issues (maybe the app preloads, therefore leading to a lot of events, therefore being flagged as automated)
[14:29:18] https://usercontent.irccloud-cdn.com/file/vlUPGer0/%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%BC%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%83%E3%83%88%202020-07-06%20%E5%8D%88%E5%BE%8C4.28.58.png
[14:29:30] \o/
[14:29:33] looking at daily though.... automated is the blue line
[14:29:45] red is user, and it does seem back to normal
[14:29:54] but automated seems way higher than before
[14:29:57] true
[14:30:16] hm - let's discuss with the android team
[14:30:25] (CR) Nuria: Correcting docs (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/609558 (owner: Nuria)
[14:30:26] the detection model is more learned now?
[14:30:32] nope
[14:30:35] heuristics
[14:32:02] (CR) Joal: Correcting docs (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/609558 (owner: Nuria)
[14:36:21] elukey / ottomata: Presto from Superset seems to hit a 30-second timeout. I thought we set it to 60, did I remember it wrong? The timeout seems to come from a web server, not Presto itself: "The proxy server could not handle the request. Reason: Error reading from remote server"
[14:37:33] milimetric: I don't recall setting timeouts for that, we can check for sure. 30s timeout seems to be related to a big query though :D
[14:38:36] yep :D https://grafana.wikimedia.org/d/pMd25ruZz/presto?orgId=1
[14:38:37] elukey: I'm doing some basic stuff on the history tables, we should tune it to make sure it can handle that. Like for example, count of all events by wiki
[14:39:51] milimetric: I think that there is a lot of data moving from hdfs to the presto workers in your query
[14:40:08] yeah... we need colocation for sure
[14:40:08] but sure we can track the timeout and review it
[14:40:24] also one presto worker is down, hw completely burned
[14:40:40] (4 worker nodes, one query manager on an-coord1001)
[14:40:42] basically, if it can't handle this, then it won't be able to handle the majority of use cases that people have for Presto
[14:41:10] k, makes sense, it was finishing this kind of thing before (I think)
[14:41:40] I could be wrong about some subtlety, I don't have a good intuition yet on how well it optimizes the data it transfers from hdfs
[14:42:11] milimetric: I clearly blame you for the query :D
[14:43:26] this is all fair and good, as it should be
[14:58:50] milimetric: have you tried to run the same query from the presto-cli?
[14:58:54] just to compare results
[14:59:02] not yet, good point
[15:00:02] because IIUC you can set something like SET SESSION query_max_execution_time = '30s'; in there
[15:02:15] ping ottomata
[15:02:58] ping ottomata
[15:08:56] the query took 41 seconds to complete, but it did look at all the rows (5.6B)
[15:09:18] so that 30-second timeout is basically not enough to let us analyze mediawiki_history, I'll bring it up PS
[15:09:44] is there a place where we set it for superset?
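The timeout arithmetic in the exchange above (a 41-second query against a 30-second limit, versus the 60 seconds milimetric remembered) can be sketched roughly as follows. This is only an illustration under stated assumptions: `parse_duration` and `fits_within` are hypothetical helpers, not part of Presto, Superset, or any Wikimedia tooling, and the unit table covers only a few suffixes in the style of the `'30s'` value quoted in the log.

```python
import re

# Assumed unit table for duration strings written like '30s' or '2m'.
# Only a handful of suffixes are handled; this is not Presto's own parser.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert a duration string such as '30s' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def fits_within(query_seconds: float, timeout: str) -> bool:
    """Return True if a query of the given length finishes before the timeout."""
    return query_seconds <= parse_duration(timeout)


if __name__ == "__main__":
    observed_seconds = 41.0  # duration reported in the log for the full-table count
    for timeout in ("30s", "60s"):
        verdict = "completes" if fits_within(observed_seconds, timeout) else "times out"
        print(f"timeout {timeout}: query {verdict}")
```

Under these assumptions the 41-second query fails a 30-second limit and passes a 60-second one, which matches the conclusion in the log that the 30-second timeout is too short for mediawiki_history.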
[15:16:47] Analytics, Analytics-Kanban, Event-Platform: Backfill wdqs_external_sparql_query without filtering on meta.domain - https://phabricator.wikimedia.org/T256797 (Ottomata) ` 20/07/03 10:57:01 INFO Refine: Successfully refined 1510 of 1510 dataset partitions into table `event`.`wdqs_external_sparql_query...
[16:00:50] a-team: I tried to add a week-by-week breakdown in https://etherpad.wikimedia.org/p/analytics-staff-meeting
[16:00:56] to list who's available in what weeks
[16:01:07] if you want to add your info please do so :)
[16:03:18] (if it is not a good idea I can drop all)
[17:19:44] * elukey off!
[17:32:37] Analytics, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (jwang)
[17:40:02] elukey: thank you, other a-team can add their info
[17:40:08] done!
[17:45:12] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform: eventgate-wikimedia should expose runtime stream configuration - https://phabricator.wikimedia.org/T253157 (Nuria) Open→Resolved
[17:45:15] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Automate ingestion and refinement into Hive of event data from Kafka using stream configs and canary/heartbeat events - https://phabricator.wikimedia.org/T251609 (Nuria)
[17:46:21] Analytics, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (SNowick_WMF)
[17:48:25] Analytics, Analytics-Kanban, Operations, observability: systemd::syslog conf should use :programname equals instead of startswith - https://phabricator.wikimedia.org/T251606 (Nuria) Open→Resolved
[17:49:04] Analytics, Analytics-Kanban: Unique devices, retrofit with bot detection code - https://phabricator.wikimedia.org/T250744 (Nuria) Let's write docs before we close ticket (cc @JAllemandou )
[17:49:42] Analytics, Analytics-Kanban: Make ActorSignatureGenerator class a non-singleton - https://phabricator.wikimedia.org/T255660 (Nuria) Open→Resolved
[17:50:41] Analytics-Clusters, Analytics-Kanban, Operations, Patch-For-Review: Create a profile to standardize the deployment of JVM packages and configurations - https://phabricator.wikimedia.org/T253553 (Nuria) Open→Resolved
[17:51:13] Analytics, Analytics-Kanban: Purge old files on Archiva to free some space - https://phabricator.wikimedia.org/T254849 (Nuria) Open→Resolved
[17:51:16] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move Archiva to Debian Buster - https://phabricator.wikimedia.org/T252767 (Nuria)
[17:51:34] Analytics, Analytics-Kanban: Make anomaly detection correctly handle holes in time-series - https://phabricator.wikimedia.org/T251542 (Nuria) Open→Resolved
[17:51:58] Analytics, Analytics-Kanban, Analytics-Wikistats: permanent links in wikistats don't (always) work - https://phabricator.wikimedia.org/T254076 (Nuria)
[17:52:06] Analytics, Analytics-Kanban, Analytics-Wikistats: permanent links in wikistats don't (always) work - https://phabricator.wikimedia.org/T254076 (Nuria) Stalled→Resolved
[17:52:54] Analytics, Analytics-Kanban, Product-Analytics: 'namespace_is_content' column in pageview data returns 1, 0 and NULL as booleans in Superset/Turnilo - https://phabricator.wikimedia.org/T255222 (Nuria) Open→Resolved
[17:53:10] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Repurpose notebook100[3,4] - https://phabricator.wikimedia.org/T256363 (Nuria) Open→Resolved
[17:53:46] Analytics, Analytics-Kanban: Delete pageview_actor_hourly data after 90 days - https://phabricator.wikimedia.org/T256417 (Nuria) Open→Resolved
[17:55:51] Analytics, Analytics-Kanban, Dumps-Generation, Patch-For-Review: Some xml-dumps files don't follow BZ2 'correct' definition - https://phabricator.wikimedia.org/T243241 (Nuria) Open→Resolved
[17:56:09] Analytics, Analytics-Kanban: Unify puppet roles for stat and notebook hosts - https://phabricator.wikimedia.org/T243934 (Nuria)
[17:56:12] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Decomission notebook hosts - https://phabricator.wikimedia.org/T249752 (Nuria) Open→Resolved
[18:08:00] Analytics, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (Mayakp.wiki)
[18:10:39] Analytics, Product-Analytics (Kanban): Collect metrics/tables which might be touched by IP masking feature - https://phabricator.wikimedia.org/T255816 (Mayakp.wiki) @jwang : We have been discussing this in our 1:1s with the product teams. @MNeisler will be investigating this for Web and Editing.
[18:11:46] Analytics, Event-Platform: eventgate-wikimedia should emit metrics about validation errors - https://phabricator.wikimedia.org/T257237 (Ottomata)
[19:22:03] Analytics-EventLogging, Analytics-Radar, QuickSurveys, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), and 2 others: QuickSurveys EventLogging missing ~10% of interactions - https://phabricator.wikimedia.org/T220627 (Isaac) > I should be clearer: what I meant is that sendBeacon will consistently fail i...
[19:49:21] gone for tonight
[21:37:21] Analytics-Clusters, Analytics-Kanban, Operations, Patch-For-Review: Create a profile to standardize the deployment of JVM packages and configurations - https://phabricator.wikimedia.org/T253553 (hashar) Thank you @elukey for the new Puppet Java profile and for taking into account suggestions for t...
[22:03:35] Analytics, Analytics-Kanban: Create intermediate dataset: pageview with actor information - https://phabricator.wikimedia.org/T255467 (Nuria) Open→Resolved
[22:03:37] Analytics, Analytics-Kanban: Unique devices, retrofit with bot detection code - https://phabricator.wikimedia.org/T250744 (Nuria)
[22:04:03] Analytics, Analytics-Kanban: Aggregate pageview from pageview_actor_hourly - https://phabricator.wikimedia.org/T256049 (Nuria) Open→Resolved
[22:06:15] Analytics, Analytics-Kanban, Patch-For-Review: Update clickstream and interlanguage jobs to use `pageview_actor_hourly` table instread of webrequest - https://phabricator.wikimedia.org/T255779 (Nuria) Open→Resolved
[22:11:13] Analytics, Analytics-Kanban, Patch-For-Review: Create job that backfills Pagecounts-EZ (2011 - 2016) data via hadoop correcting issues - https://phabricator.wikimedia.org/T252857 (Nuria) Open→Resolved
[22:11:15] Analytics, Analytics-Kanban, Patch-For-Review: Migrate pagecounts-ez generation to hadoop - https://phabricator.wikimedia.org/T192474 (Nuria)
[22:11:50] Analytics, Analytics-Kanban: Language selector is not working anywhere now - https://phabricator.wikimedia.org/T246971 (Nuria) Open→Resolved
[22:12:28] Analytics, Analytics-Kanban: Temporarily remove hourly traffic alarms from analytics-alerts - https://phabricator.wikimedia.org/T254256 (Nuria) Open→Resolved
[22:12:52] Analytics-Clusters, Analytics-Kanban: Add new kafka brokers kafka-jumbo100[789] to the jumbo-eqiad Kafka cluster - https://phabricator.wikimedia.org/T252675 (Nuria) Open→Resolved
[22:12:54] Analytics: Analytics Hardware for Fiscal Year 2019/2020 - https://phabricator.wikimedia.org/T244211 (Nuria)
[22:14:26] Analytics, Analytics-Kanban, Patch-For-Review: Order mediawiki_history dumps by event_timestamp - https://phabricator.wikimedia.org/T254233 (Nuria) Open→Resolved
[22:17:00] Analytics, Analytics-Kanban: Refine should DROP IF EXISTS before ADD PARTITION - https://phabricator.wikimedia.org/T246235 (Nuria)
[22:17:09] Analytics, Analytics-Kanban: Refine should DROP IF EXISTS before ADD PARTITION - https://phabricator.wikimedia.org/T246235 (Nuria) Open→Resolved
[22:36:43] Analytics, Analytics-Kanban: Table wmf_raw.mediawiki_imagelinks seems to be missing data - https://phabricator.wikimedia.org/T254188 (Nuria)
[22:37:18] Analytics, Analytics-Kanban: Table wmf_raw.mediawiki_imagelinks seems to be missing data - https://phabricator.wikimedia.org/T254188 (Nuria) Open→Resolved
[22:38:33] Analytics, Analytics-Kanban, Event-Platform: Backfill wdqs_external_sparql_query without filtering on meta.domain - https://phabricator.wikimedia.org/T256797 (Nuria) Open→Resolved
[22:38:35] Analytics, Event-Platform: Refine should add field to indicate if event is from wikimedia domain instead of filtering - https://phabricator.wikimedia.org/T256677 (Nuria)
[22:42:22] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform: Camus should look for multiple possible timestamp fields to use for hourly partitioining - https://phabricator.wikimedia.org/T256370 (Nuria) Open→Resolved
[22:42:26] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (Nuria)
[23:07:35] Analytics, Analytics-EventLogging, Analytics-Kanban, Event-Platform, and 2 others: Decommission EventLogging backend components by migrating to MEP - https://phabricator.wikimedia.org/T238230 (Nuria) > scripts/eventlogging_legacy_schema_convert.js is this script just used via node on the repo in...
[23:08:18] Analytics, Analytics-Kanban: Spike, see how easy/hard is to scoop all tables from Eventlogging log database - https://phabricator.wikimedia.org/T250709 (Nuria) Open→Resolved
[23:12:45] Analytics, Analytics-Kanban: Rename pageview_actor_hourly to pageview_actor - https://phabricator.wikimedia.org/T256415 (Nuria) I guess in this case we are updating this table hourly but not aggregating anything hourly. Renaming this soon seems a bit of a hassle but if @JAllemandou thinks it should be...
[23:14:49] Analytics, Product-Analytics: Keep canonical_data.wikis updated - https://phabricator.wikimedia.org/T241741 (Nuria)
[23:17:36] Analytics, Analytics-Kanban: Bump up SLA of pageview jobs after deploying bots check - https://phabricator.wikimedia.org/T252220 (Nuria) Open→Declined
[23:18:10] Analytics: [HiveToDruid] Add support for ingesting subfields of map columns - https://phabricator.wikimedia.org/T208589 (Nuria)
[23:23:46] Analytics, Analytics-Kanban: Refactor Mediawiki-Database ingestion - https://phabricator.wikimedia.org/T209178 (Nuria) Open→Resolved
[23:35:42] Analytics, Product-Analytics, Epic: Create data quality alarm on agent-type - https://phabricator.wikimedia.org/T257276 (Nuria)
[23:36:06] Analytics: Create data quality alarm on agent-type - https://phabricator.wikimedia.org/T257276 (Nuria)
[23:37:56] Analytics, Product-Analytics, Epic: API pageview counts for 'Mobile app' are incorrect since switch to mobile-html - https://phabricator.wikimedia.org/T256508 (Nuria) @JoeWalsh let's talk at your convenience about a path to migrate pageviews for the app to be event based
[23:38:33] Analytics: Create data quality alarm on access-method - https://phabricator.wikimedia.org/T257276 (Nuria)
[23:46:08] Analytics, Analytics-Kanban, Product-Analytics (Kanban): PageviewDefinition should detect /api/rest_v1/page/mobile-html requests as pageviews - https://phabricator.wikimedia.org/T256514 (Nuria) While these changes are in it is worth mentioning that the velocity at which the requests are being fired f...
[23:56:16] Analytics, Product-Analytics, Epic: Definition of what constitutes a mobile pageview - https://phabricator.wikimedia.org/T257277 (Nuria)