[08:11:02] Hi team - nuria was right in saying cassandra loading jobs would be faster - backfilling single jobs (one day) were taking ~45minutes, and with the new jar ~25 minutes !!
[08:11:24] This also means the load on the cassandra machines is higher in term of compactions - I'm monitoring that closely
[08:13:00] fdans: would you still be nearby today?
[08:13:12] by any chance (I know you're supposed to be in holidays)
[08:15:36] joal: I'm correcting a couple things in the cron'd command
[08:17:51] In term of logging size - Tada: 1 random day of mediarequests-per-file, before change - 90G ,after change - 1M - THERE YOU GO !
[08:18:32] woooooo
[08:24:17] loooool
[08:24:18] :D
[08:25:37] very nice
[08:26:00] fdans: I am wondering if it would make sense to add MAILTO=analytics-alerts@etc..
[08:26:14] so we'll know if something is broken while you're away
[08:30:41] nic job joal
[08:30:45] *nice
[08:35:44] elukey: I'm leaving for errand, will be back in 1h30
[08:35:55] ack
[08:52:34] elukey already did!
[08:53:26] <3
[09:42:39] Back :)
[09:58:58] Analytics, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review, and 2 others: Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (Addshore) Deployment has happened, so we can get ready to look for changes.
[10:04:58] Analytics, WMDE-Analytics-Engineering, Wikidata, Patch-For-Review, and 2 others: Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (Ladsgroup) Added to https://grafana.wikimedia.org/d/000000188/wikidata-special-entitydata in the top...
[10:18:41] I almost forgot about https://phabricator.wikimedia.org/T234234
[10:18:52] it is a big work
[10:31:14] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (jcrespo) > buster + last version of mariadb Buster will install 10.3. We have available 10.3, 10.4 and percona-server 8.0. Those (and buster) are in theory supported, but please not...
[10:34:56] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (elukey) >>! In T234826#5684098, @jcrespo wrote: >> buster + last version of mariadb > > Buster will install 10.3. We have available 10.3, 10.4 and percona-server 8.0. Those (and bus...
[10:35:17] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (Marostegui) >>! In T234826#5683929, @elukey wrote: > High level plan: > > 1) Upgrade db1108, if possible, to buster + last version of mariadb. Since we'd like to keep the current `l...
[10:38:05] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (elukey) >>! In T234826#5684102, @Marostegui wrote: >>>! In T234826#5683929, @elukey wrote: >> High level plan: >> >> 1) Upgrade db1108, if possible, to buster + last version of mari...
[10:38:45] Analytics, User-Elukey: Architecture of recent changes on top of kafka. Produce Design Document. - https://phabricator.wikimedia.org/T234234 (elukey)
[10:40:00] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (Marostegui) We are still using stretch in all our production hosts. But we are soon going to start exploring what to do with the Buster upgrade, so we can keep in touch for that :)
[10:48:16] Analytics, User-Elukey: Architecture of recent changes on top of kafka. Produce Design Document. - https://phabricator.wikimedia.org/T234234 (elukey) Today I quickly joined irc.wikimedia.org with my IRC client, and checked a couple of channels like en.wikimedia. The `rc-pmtpa` bot indeed writes a ton of...
[10:54:18] Analytics, User-Elukey: Architecture of recent changes on top of kafka. Produce Design Document. - https://phabricator.wikimedia.org/T234234 (elukey) @Krinkle I am pinging you since you have probably the most context: do you think that it could be something doable to migrate the bots consuming from irc.w...
[11:36:27] * elukey lunch!
[12:40:04] (Abandoned) Thiemo Kreuz (WMDE): Make connections persistent in WikimediaDb lib class [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/493214 (https://phabricator.wikimedia.org/T216613) (owner: Thiemo Kreuz (WMDE))
[13:42:07] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (Marostegui) >>! In T234826#5684318, @jcrespo wrote: >> I was planning to have only one mariadb instance acting as multi-source > > I strongly suggest to use several instances- it wo...
[14:12:41] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (elukey) Makes sense, I'll go for multi-instance then, I have no intention to resurrect another dbstore1002 :) Regarding the ownership of the service, I was hoping that we could have...
[14:17:55] Analytics, Analytics-Kanban: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (Marostegui) >>! In T234826#5684639, @elukey wrote: > Makes sense, I'll go for multi-instance then, I have no intention to resurrect another dbstore1002 :) Yaay!! <3 > > Regarding...
[14:21:21] nuria: as per our talk about MCR yesterday: https://www.mediawiki.org/wiki/Multi-Content_Revisions/Database_Schema#revision
[14:26:13] (PS1) Joal: Update geoeditors-daily to include bot edits [analytics/refinery] - https://gerrit.wikimedia.org/r/552510 (https://phabricator.wikimedia.org/T238855)
[14:27:04] (CR) Joal: [V: +2] "Queries tested and data vetted." [analytics/refinery] - https://gerrit.wikimedia.org/r/552510 (https://phabricator.wikimedia.org/T238855) (owner: Joal)
[14:30:11] ping addshore - Would you have a minute for a question?
[14:43:18] joal: yup
[14:43:34] addshore: \o/ :)
[14:43:57] addshore: there are rows with interesting rc_type values in the cu_changes tables for wikidata :0
[14:44:00] :)
[14:44:15] oooh, such as?
[14:44:17] addshore: namely an undocumented rc_type of 142
[14:44:29] addshore: any bell ringing?
[14:45:42] hmmmmmmmmmmm
[14:45:48] only on wikidata.org?
[14:46:12] checking
[14:47:00] do you have an example full row I can see? :)
[14:47:05] addshore: at least for sure not on enwiki nor commons
[14:48:42] looks like something flow related?
[14:48:56] I see things like v1,new-post and v1,reply etc
[14:49:11] in cuc_comment
[14:49:17] ah, completely possible
[14:49:28] =[
[14:49:29] [=
[14:52:22] addshore: other wikis have it as well - noticeably mediawikiwiki, frwiki, zhwiki
[14:53:02] Thanks a lot for the quick answer :)
[15:20:00] Analytics, User-Elukey: Architecture of recent changes on top of kafka. Produce Design Document. - https://phabricator.wikimedia.org/T234234 (MoritzMuehlenhoff) >>! In T234234#5684132, @elukey wrote: > difficult to complete before the Debian Jessie EOL deadline (end of June 2020) JFTR, the jessie deadli...
[15:23:42] Analytics, User-Elukey: Architecture of recent changes on top of kafka. Produce Design Document. - https://phabricator.wikimedia.org/T234234 (elukey) >>! In T234234#5684813, @MoritzMuehlenhoff wrote: >>>! In T234234#5684132, @elukey wrote: >> difficult to complete before the Debian Jessie EOL deadline (e...
[15:35:07] Analytics, Analytics-Kanban: Rerun pingback reports to categorize software versions correctly. - https://phabricator.wikimedia.org/T238389 (mforns) @CCicalese OK, *now* the dashboard looks good. Thanks for the patience!
[15:49:42] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (matthiasmullie) As far as I can tell: @mpopov's queries (`wbc_entity_usage` based) include both MediaInfo items,...
[16:03:05] Analytics, Analytics-Kanban: Rerun pingback reports to categorize software versions correctly. - https://phabricator.wikimedia.org/T238389 (CCicalese_WMF) @mforns It looks great! Thank you!
[16:07:41] Analytics, Growth-Team, Product-Analytics: Growth: implement wider data purge window - https://phabricator.wikimedia.org/T237124 (mforns) Hey all, Would it be OK to keep original data for 270 days (can you keep all fields?), or do you need some fields to be sanitized to be able to keep it for 270 day...
[16:08:01] hi team. quick question: in Turnilo, pageviews_hourly, "View Count" includes page views (read-only) and edits or both?
[16:09:37] hi sukhe, shouldn't include edits, we have a separate dataset about them
[16:09:52] are you interested in some particular number?
[16:10:34] elukey: thank you, I just wanted to confirm the formal definition of "pageviews" and while I thought it didn't mean edits, I just wanted to confirm
[16:10:50] (CR) Nuria: Update geoeditors-daily to include bot edits (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/552510 (https://phabricator.wikimedia.org/T238855) (owner: Joal)
[16:12:26] sukhe: more formal answer :) https://meta.wikimedia.org/wiki/Research:Page_view
[16:12:51] elukey: ah thanks!
[16:12:57] * sukhe needs to upgrade his search skills
[16:13:15] sukhe: nono please feel free to ping us anytime, we are happy to help :)
[16:14:16] thank you :folded hands emoji: :)
[16:20:20] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Nuria) @Addshore : disclaimer: I know next to nothing about this but how are you taking into account that the re...
[16:21:49] Analytics, Analytics-Kanban: Hourly Feature extraction for bot detection from webrequest - https://phabricator.wikimedia.org/T238360 (Nuria) a:Nuria
[16:22:23] Analytics, Analytics-Kanban: Import siteinfo dumps onto HDFS - https://phabricator.wikimedia.org/T234333 (Nuria)
[16:23:10] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (Nuria) Open→Resolved
[16:23:12] Analytics-EventLogging, Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (Nuria)
[16:23:57] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (Nuria) I closed this but i think there is an open question of whether the dump needs to happen again cc @jcrespo
[16:24:04] Analytics, Analytics-EventLogging, Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (Nuria) Resolved→Open
[16:24:06] Analytics-EventLogging, Analytics-Kanban: Sunset MySQL data store for eventlogging - https://phabricator.wikimedia.org/T159170 (Nuria)
[16:24:19] Analytics, Analytics-Kanban, Services (watching): Add cassandra loading job for mediarequests per referer - https://phabricator.wikimedia.org/T232858 (Nuria) Open→Resolved
[16:25:27] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Abit)
[16:53:14] Analytics, Analytics-Kanban, Patch-For-Review: Logging level of cassandra should be warning or error but not debug - https://phabricator.wikimedia.org/T236698 (Nuria) Open→Resolved
[17:00:44] ping elukey
[17:01:20] cominggg
[17:01:20] sorry
[17:02:12] Analytics, Analytics-Kanban, Patch-For-Review: Add bot edits to geoeditors-daily - https://phabricator.wikimedia.org/T238855 (JAllemandou) a:JAllemandou
[17:48:17] Analytics, Event-Platform, Wikimedia-Stream: Eliminate usage of mocha-eslint from eventstreams - https://phabricator.wikimedia.org/T238937 (Pchelolo)
[17:52:01] Analytics, ChangeProp, Event-Platform, MediaWiki-JobQueue, and 3 others: Separate ChangeProp and JobQueue Redis - https://phabricator.wikimedia.org/T183586 (Pchelolo) Open→Declined the idea is still solid, but I don't think there's a pressing need to do it now.
[17:52:05] Analytics, ChangeProp, Event-Platform, MediaWiki-JobQueue, and 5 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088 (Pchelolo)
[17:54:45] Analytics, ChangeProp, Core Platform Team, Event-Platform, and 2 others: Make Kafka JobQueue use Special:RunSingleJob - https://phabricator.wikimedia.org/T182372 (Pchelolo)
[17:57:04] Analytics, Event-Platform, MediaWiki-JobQueue, Core Platform Team (Needs Cleaning - Services Operations): Create scripts to estimate Kafka queue size per wiki - https://phabricator.wikimedia.org/T182259 (Pchelolo) Open→Declined
[17:57:14] Analytics, CPT Initiatives (MCR), Multi-Content-Revisions (New Features): MCR: Import all slots from XML dumps - https://phabricator.wikimedia.org/T220525 (Nuria)
[17:58:20] Analytics, ChangeProp, Core Platform Team, MediaWiki-JobQueue, and 2 others: Consider the possibility of separating ChangeProp and JobQueue on Kafka level - https://phabricator.wikimedia.org/T199431 (Pchelolo) It's still a viable idea, but I don't think we have the capacity to work on it now. Ice...
[18:02:46] bbiab
[18:03:35] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (Nuria) So, per my comment above, I think the number of items is actually smaller than the one @Addshore has comp...
[18:24:04] Analytics, CPT Initiatives (MCR), Multi-Content-Revisions (New Features), User-ArielGlenn: MCR: Import all slots from XML dumps - https://phabricator.wikimedia.org/T220525 (ArielGlenn)
[18:34:07] Analytics, Cleanup, Event-Platform, Gerrit, and 4 others: Archive eventgate-ci repository from gerrit - https://phabricator.wikimedia.org/T229111 (Jdforrester-WMF)
[18:52:03] hey a-team: IIRC, there' now no EventLogging to MariaDB. When that got turned off, did it also turn off logging to MariaDB on betalabs?
[18:54:51] Nettrom: in theory no :)
[19:04:03] Nettrom: no, taht shoudl work same
[19:04:09] *that should work
[19:15:40] * elukey off
[19:17:29] thanks nuria and elukey, I'm digging into it a bit now
[19:20:18] Nettrom: the mysql instance on labs fails not infrequently, if you do not find what you are looking for it might need a restart, let me know
[19:20:48] nuria: yeah, I. just restarted it to see if that helps :)
[19:21:31] Nettrom: nice
[19:39:12] Analytics, Analytics-Kanban, Inuka-Team, Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (Nuria) KaIOs appears like 1000 times a day but still i do not see it in any of the pageview_hourly data
[19:39:18] Analytics, Analytics-Kanban: Create kerberos principals for users - https://phabricator.wikimedia.org/T237605 (Pcoombe) Hello, requesting Kerberos credentials for Hadoop access on stat100x and notebook100x. My username is `pcoombe`. Thanks!
[19:45:49] nuria: yep, turning all the things off and on again seemed to do the trick :)
[20:56:03] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (mpopov) Here are the missing screenshots: >>! In T238878#5683048, @Nuria wrote: > The work done by @mpopov (if...
[21:19:07] Analytics, Analytics-Kanban, Inuka-Team, Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (Nuria) the uap-core version has the changes: [nuriaruiz@nurieta][/workplace/analytics/uap-java/uap-core]$ more regexes.yaml | grep -i kaio # KaiOS os_...
[21:28:29] Analytics, Analytics-Kanban: Doubts and questions about Kerberos and Hadoop - https://phabricator.wikimedia.org/T238560 (spatton) Hey @elukey, @Nuria, and analytics team: we in online fundraising **really** appreciate your willingness to move this release back. We understand its importance and appreciate...
[21:32:36] Analytics, Readers-Web-Backlog (Needs Product Owner Decisions): % of "none" referers seems too high - https://phabricator.wikimedia.org/T195880 (Isaac) I wanted to add a couple data points / hypotheses to this discussion: * Chrome Mobile Version 38 that Nuria mentions as #3 in T195880#4429156 is actually...
[23:02:57] Analytics, Analytics-Kanban, Inuka-Team, Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (Nuria) Never mind , this is working: Mozilla/5.0 (Mobile; LYF/F220B/LYF-F220B-003-01-35-080419;Android; rv:48.0) Gecko/48.0 Firefox/48.0 KAIOS/2.5 {"browser_...
[23:03:01] Analytics, Inuka-Team (Kanban): Add KaiOS to the list of OS query options for pageviews in Turnilo - https://phabricator.wikimedia.org/T231998 (Nuria)
[23:03:04] Analytics, Analytics-Kanban, Inuka-Team, Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (Nuria) Open→Resolved
[23:03:11] Analytics, Analytics-Kanban, Inuka-Team, Patch-For-Review: Update ua parser on analytics stack - https://phabricator.wikimedia.org/T237743 (Nuria)
[23:03:31] Analytics, Analytics-Kanban, User-Elukey: Create test Kerberos identities/accounts for some selected users in hadoop test cluster - https://phabricator.wikimedia.org/T212258 (Nuria) Open→Resolved
[23:03:33] Analytics: Enable Security (stronger authentication and data encryption) for the Analytics Hadoop cluster and its dependent services - https://phabricator.wikimedia.org/T211836 (Nuria)
[23:05:14] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (mpopov) I was looking at [[ https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase/+/814e7...