[03:17:50] 10Analytics, 10Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (10Milimetric) @Krinkle, the schema wouldn't change, it's fine as it is. The rest is correct. The confusion might come from the fact that we don't...
[03:22:58] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Pchelolo) Hm... After hacking this around a little bit more, I think cre...
[04:40:51] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Nuria) >My WIP code for the library is here, but the more I look into it...
[06:08:55] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10Marostegui) ` root@DBSTORE[staging]> drop table mep_word_persistence; Query OK, 0 rows affected (5.53 sec) `
[06:09:03] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10Marostegui) ` root@DBSTORE[staging]> drop table mep_word_persistence; Query OK, 0 rows affected (5.53 sec) `
[06:10:03] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui)
[06:10:06] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Convert Aria/Tokudb tables to InnoDB on dbstore1002 - https://phabricator.wikimedia.org/T213706 (10Marostegui) 05Open→03Resolved This can be closed as there are no more Aria tables on the staging database: ` root@dbstore1002.eqiad.wmnet[(none)]> select TABL...
[06:35:47] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui)
[07:38:18] good morning team
[07:38:35] * fdans is still awful at managing his stupid jet lag
[07:38:52] 10Analytics, 10ExternalGuidance, 10Product-Analytics, 10Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (10chelsyx) Hello A-team! We are asked to build a dashboard and need to pipe data from multiple sources to the same place: pageview...
[07:39:55] hello :)
[07:43:41] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10elukey) So slaporte's crontab on stat1007 is the following: ` 46 23 * * * sh /home/slaporte/send_report.sh ` That is everyday at 23:46. I copied to my home and tested it, seems working fine...
[08:00:08] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10elukey) I did some tests swapping the recipients with me and Francisco, we have both received the email. Not sure if @Slaporte added the cron yesterday or not, but everything should work fine....
[08:07:49] Morning fdans
[08:11:05] helloooo joal
[08:11:13] joal: just who I needed
[08:11:24] joal: could I ask you something in the batcave?
[08:11:49] Please fdans - joining
[08:18:52] joal: back!
sorry
[08:19:20] fdans: To the cave !
[08:39:28] 10Analytics, 10Analytics-Kanban, 10DBA: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10elukey) p:05Triage→03High
[08:41:12] 10Analytics, 10Analytics-Kanban, 10DBA: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10elukey)
[08:41:57] 10Analytics, 10Analytics-Kanban: update mw scooping to be able to scoop from new db cluster - https://phabricator.wikimedia.org/T215290 (10elukey)
[08:41:59] 10Analytics, 10Analytics-Kanban: Update reportupdater to be able to query the new db cluster that will substitute 1002 - https://phabricator.wikimedia.org/T215289 (10elukey)
[08:42:02] 10Analytics, 10Analytics-Kanban, 10DBA: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10elukey)
[08:42:44] 10Analytics, 10Analytics-Kanban, 10DBA: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10elukey)
[08:43:55] 10Analytics, 10Analytics-Kanban, 10DBA: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10elukey) @leila @Halfak Hi! The new dbstore100[3-5] hosts are ready, so I'd ask your teams to start using those and see what's missing/not-working/etc.. Let me know!
[08:47:01] RoanKattouw: sorry forgot to answer about the multi-ip point that you brought up.. I completely get the confusion, I am checking if maybe DNS SRV records could help.. the main issue that I can see is that even if sX-etc.. gets its separate IP, then the ports will be shared on the same host (we have only three dbstores) :(
[08:56:50] 10Analytics, 10Analytics-Kanban, 10User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10Marostegui)
[09:17:39] (03CR) 10Addshore: [C: 03+1] "Looks pretty fine :)" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/489097 (https://phabricator.wikimedia.org/T213894) (owner: 10Ladsgroup)
[09:29:58] 10Analytics, 10Analytics-Kanban, 10WMDE-Analytics-Engineering, 10Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (10Addshore) 05Open→03Resolved
[09:45:20] 10Analytics, 10Analytics-Kanban, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10JAllemandou) @Ottomata: Double reading is way to go when we have schema discrepancies that can't be solve through casting (struct -> ma...
[09:53:18] (going afk for a bit, be back in ~20 mins)
[10:13:36] addshore: Good morning - Would you have a minute for me?
[10:13:54] hehe, yes, i was almost about to poke you but was going to dive in to something else first
[10:13:55] fire away!
[10:14:15] I wonder about the wikidata json dumps release
[10:14:22] yup
[10:14:51] So far I have noticed they are produced bi-weekly (2 times per month), but not at regular monthly-aligned dates
[10:15:10] so, im not 100% sure what the schedule there is :/
[10:15:31] im curious, did you try loading all of the current revisions from the XML dumps / the content already in hadoop?
[10:15:36] For instance, they have been produced on 2019-01-02 but then on 2019-02-01
[10:16:42] addshore: /wmf/data/wmf/mediawiki/wikitext_history/snapshot=2018-12/wiki_db=wikidatawiki
[10:16:58] addshore: I'm now interested in json dumps for parsing easiness
[10:17:13] addshore: Would there be a way to try to align production dates to month dates ?
[10:17:29] So, the JSON in the wikitext_history should be pretty consistent, not sure if parsing from hadoop would be easy though?
[10:17:56] I imagine the intention is the dumps are generated on the same day, but they take some time, let me have a look
[10:18:02] addshore: a big job extracting the data while it's already produced on a regular basis :(
[10:18:44] Actually for feb, json-dumps have been made available on 2019-02-04 my bad
[10:18:54] 02-01 were lexemes
[10:19:46] And, I don't mind the data not being available after a few days, but naming the folder after the job-start (1st of month for instance) would facilitate gathering new data
[10:22:23] joal: https://phabricator.wikimedia.org/T209390 might be relevant ?
[10:22:44] or maybe not, but might still be useful :P
[10:23:01] not directly relevant addshore, but indeed interesting :)
[10:23:23] addshore: I'm more after a "regular way" to get my hands on json and rdf dumps :)
[10:23:35] instead of parsing folders to get what I'm after
[10:24:54] https://www.irccloud.com/pastebin/2YFTOfXc/
[10:25:02] that seems to be the cron
[10:25:09] weekday 1
[10:25:48] feb 4th was a monday!
[10:25:49] addshore: that's what I understood - Dumps get produced on mondays
[10:26:00] that's funny, heh
[10:26:07] :)
[10:27:44] I think you might have to talk to Ariel :)
[10:28:20] Okey - He's the dump-master :)
[10:28:28] Thanks for that addshore
[10:28:44] addshore: While we're at it - Please fire your question you thought you might not ;)
[10:29:21] https://phabricator.wikimedia.org/T92966 might also interest you
[10:30:02] So, I was going to try to figure out how many times per wiki does a logged in user load the same wikidata item (page in NS 0)
[10:30:26] wow addshore - I need to process that )
[10:31:04] addshore: links between wikis and wikidata items are retrievab
[10:31:11] "a logged in user" is the hard bit, and I was just going to group by IP & UA or something similar to get a kind of userish value?
[10:31:20] le through url in wikidata right?
[10:31:52] Yup
[10:32:34] so, namespace_id = 0, is_pageview = 1, uri_host = 'www.wikidata.org', x_analytics_map['loggedIn'] = 1
[10:32:38] addshore: so it's about linking pageviews to wikidata-items using titles (there'll be some defect here, but should be reasonably accurate)
[10:33:35] do you think using some combination of IP & UA might lead to something vaguely indicating a single user?
[10:33:37] ish
[10:33:42] addshore: I have not understood then - You're after wikidata.org pageviews
[10:33:59] correct addshore - We use some fingerprinting
[10:34:19] HASH(IP + UA + accept-language) is usually what we do
[10:34:45] gotcha!
[10:35:08] I'll give it a go in a bit and ping you if i need help!
[10:35:09] The thing I don't get is "how many times per wiki"
[10:36:32] sorry, not how many times per wiki :/
[10:36:34] per week
[10:36:36] :D
[10:36:57] Ahhhhh ! this makes a lot more sense in my mind now :)
[10:37:27] you're after recurrent viewers of single items
[10:38:05] yes :)
[10:38:47] I think the idea you have of fingerprinting + the filter you had makes sense
[10:39:32] Something else to use addshore: since you have is_pageview = true, you can use pageview_info['page_title'] to get titles instead of relying on uri_path
[10:40:54] ack!
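
A minimal sketch of the kind of query being discussed above, assembled from the filters and the HASH(IP + UA + accept-language) fingerprint joal describes; it is an illustration only, not the contents of addshore's pastebin, and the wmf.webrequest column and partition names are assumptions that may need adjusting:

    # Rough PySpark sketch: repeat loads of the same Wikidata item (NS 0) by
    # logged-in "user-ish" fingerprints, over one example day of webrequest data.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wikidata-repeat-views-sketch").getOrCreate()

    per_fingerprint = spark.sql("""
        SELECT
            pageview_info['page_title']            AS page_title,
            HASH(ip, user_agent, accept_language)  AS user_fingerprint,
            COUNT(*)                               AS view_count
        FROM wmf.webrequest
        WHERE webrequest_source = 'text'
          AND year = 2019 AND month = 2 AND day = 4   -- single day; extend to a full week as needed
          AND is_pageview
          AND namespace_id = 0
          AND uri_host = 'www.wikidata.org'
          AND x_analytics_map['loggedIn'] = '1'
        GROUP BY
            pageview_info['page_title'],
            HASH(ip, user_agent, accept_language)
    """)
    per_fingerprint.createOrReplaceTempView("per_fingerprint")

    # Per item: how many fingerprints loaded it, and how many loads in total.
    per_item = spark.sql("""
        SELECT page_title,
               COUNT(*)        AS users_loading_an_entity,
               SUM(view_count) AS entity_load_count
        FROM per_fingerprint
        GROUP BY page_title
    """)
    per_item.show(20)
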
[10:54:35] https://www.irccloud.com/pastebin/0hpGyPj7/
[10:54:47] joal: ^^ it seems to work, but my brain had a hard time naming the final 2 columns
[10:55:36] maybe actually, users_loading_an_entity and entity_load_count
[11:03:41] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui)
[11:04:06] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui)
[11:32:23] so I had an idea for dbstore host/ports
[11:33:06] I filed a code change to add SRV records.. if SRE is ok, this would become a quick way to get dbstore host/port combination for s1
[11:33:09] answers = dns.resolver.query('_s1-analytics._tcp.eqiad.wmnet', 'SRV')
[11:33:12] host, port = answers[0].target, answers[0].port
[11:33:51] so the "glue code" would become even simpler
[11:34:10] 1) get wiki/section mapping (like itwiki -> s2)
[11:34:31] 2) SRV query for _s2-analytics._tcp.eqiad.wmnet
[11:34:38] 3) connect to host/port
[11:34:44] joal: --^
[11:38:03] Amir1: ^^
[11:39:36] That looks nice
[11:42:27] the mapping between wiki/section would still be needed
[11:42:34] but not all the logic for the port
[11:42:39] hopefully it'll get merged :)
[11:43:14] * elukey lunch!
[12:03:33] heyaaa
[12:17:36] hola marcelo!
[12:22:20] I don't think it's jetlag fdans, I think it's a weird virus. I hear other friends around the country having weird sleep problems and me too
[12:22:24] addshore: except for the hour filter (you have multiple days, I assume you want every hour of them), seems correct :)
[12:22:49] yes, i fixed that when I ran it :)
[12:23:14] hey a-team, is there anyone who wants to pair with me and get bored vetting event_sanitized data before I activate the deletion script for the raw event database? :))))
[12:23:45] not now, but around before standup
[12:24:22] mforns: before standup is not my easiest time really, but can do after if you want :)
[12:25:40] milimetric: dunno but I just bought a box of melatonin pills and this BS ends today
[12:26:17] elukey: I think I don't understand exactly the syntax you're using - My main understanding is: The port-changing aspect of the servers will be part of the python function providing the correct server to call - Ok ?
[12:26:25] joal, thanks!
[12:48:56] elukey: for when you're back - https://gerrit.wikimedia.org/r/489194
[12:49:09] elukey: Data has been checked, green flag everywhere
[12:50:47] joal: re syntax - the SRV record contains hostname and port basically, so I just get it from the record itself, rather than having to replicate the logic to find the port in every script
[12:52:49] does it sound good? Any concern?
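
A minimal sketch of the three-step "glue code" elukey outlines above, assuming the dnspython and pymysql libraries and the _sN-analytics._tcp.eqiad.wmnet SRV records from the (not yet merged) patch; the wiki-to-section mapping here is a hand-written placeholder:

    # Hypothetical lookup: resolve the analytics dbstore host/port for a wiki via SRV,
    # then connect to it.
    import dns.resolver
    import pymysql

    # Step 1: wiki -> section mapping (placeholder; in practice this would come from
    # mediawiki-config dblists or similar).
    WIKI_TO_SECTION = {'enwiki': 's1', 'itwiki': 's2'}

    def dbstore_for(wiki):
        section = WIKI_TO_SECTION[wiki]
        # Step 2: SRV query, e.g. _s2-analytics._tcp.eqiad.wmnet
        answers = dns.resolver.query('_{}-analytics._tcp.eqiad.wmnet'.format(section), 'SRV')
        return str(answers[0].target).rstrip('.'), answers[0].port

    def connect(wiki, user, password):
        # Step 3: connect to the resolved host/port
        host, port = dbstore_for(wiki)
        return pymysql.connect(host=host, port=port, db=wiki, user=user, password=password)
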
[12:53:10] (didn't get what syntax you were referring to, if the dns record or python)
[12:53:22] for aqs, merging + depooling aqs1004
[12:58:55] joal: aqs1004 depooled and ready to test
[13:33:43] (03PS1) 10GoranSMilovanovic: xmlConfig re-factor [analytics/wmde/WDCM-Overview-Dashboard] - 10https://gerrit.wikimedia.org/r/489204
[13:34:02] (03CR) 10GoranSMilovanovic: [C: 03+2] xmlConfig re-factor [analytics/wmde/WDCM-Overview-Dashboard] - 10https://gerrit.wikimedia.org/r/489204 (owner: 10GoranSMilovanovic)
[13:34:08] (03Merged) 10jenkins-bot: xmlConfig re-factor [analytics/wmde/WDCM-Overview-Dashboard] - 10https://gerrit.wikimedia.org/r/489204 (owner: 10GoranSMilovanovic)
[13:35:58] sorry elukey, missed your ping
[13:36:01] elukey: testing now
[13:36:50] elukey: we hax dataz - deployment can proceed !
[13:37:00] Thanks mate
[13:37:00] (03PS1) 10GoranSMilovanovic: xmlConfig - refactor [analytics/wmde/WDCM-Usage-Dashboard] - 10https://gerrit.wikimedia.org/r/489205
[13:37:06] ack!
[13:37:12] (03CR) 10GoranSMilovanovic: [C: 03+2] xmlConfig - refactor [analytics/wmde/WDCM-Usage-Dashboard] - 10https://gerrit.wikimedia.org/r/489205 (owner: 10GoranSMilovanovic)
[13:37:16] what about the dns/python thing?
[13:37:21] does it sound reasonable?
[13:37:35] elukey: getting port as part of the DNS sounds actually better :) No code hack
[13:37:46] exactly
[13:37:50] only a DNS request
[13:40:27] aqs deploy completed
[13:42:31] testing wikistats/v2 UI
[13:45:42] everything looks fine on UI - Let's call it a win :)
[13:45:57] !log wikistats2 snapshot updated to 2019-01
[13:45:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:50:53] joal: if you have a minute I'd need to discuss with you what topic(s?) we'd need to import from camus in the hadoop testing cluster
[13:51:25] elukey: sure !
[13:51:29] is the sampled webrequest enough in your opinion?
[13:51:36] maybe we can start from there and then see
[13:52:01] elukey: sampled webrequest is definitely the first one - it'll allow us to test the full chain
[13:52:16] And I think it actually should be enough
[13:52:29] (03PS1) 10GoranSMilovanovic: xml config + re-factor [analytics/wmde/WDCM-Semantics-Dashboard] - 10https://gerrit.wikimedia.org/r/489209
[13:52:38] (03CR) 10GoranSMilovanovic: [C: 03+2] xml config + re-factor [analytics/wmde/WDCM-Semantics-Dashboard] - 10https://gerrit.wikimedia.org/r/489209 (owner: 10GoranSMilovanovic)
[13:53:26] joal: all right, it is now a matter of pulling data from one webrequest_text partition and then pushing it to webrequest_text_test or something similar
[13:53:46] and then add a camus job via puppet to pull the data, purge it, etc..
[13:54:00] correct elukey :)
[13:54:16] should we do it in Spark or maybe kafkacat or else?
[13:54:23] kafkatee sorry
[13:55:56] elukey: kafkatee should be enough if we don't care about resiliency
[13:57:49] could be yes
[13:57:57] I wanted to avoid a deployment of kafkatee
[13:58:09] elukey: spark-streaming then :)
[13:58:16] elukey: shouldn't be difficult
[13:58:44] I can try to come up with a basic job
[13:58:48] so I'll learn something
[13:58:59] elukey: if you want :) Should be fun
[14:11:00] 10Analytics, 10Research: Add (scoop) wikidatadawiki.wb_items_per_site MariaDB table to wmf_raw - https://phabricator.wikimedia.org/T215616 (10diego)
[14:13:20] milimetric: do you have a couple minutes to batcave?
[14:14:39] sorry fdans, I'm en route to meet up with like 10 other wmfers here in nyc
[14:15:08] milimetric: yall didn't have enough of each other last week?
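
A minimal sketch of the kind of "basic job" elukey mentions above (13:58): a Spark Structured Streaming job mirroring the production webrequest_text topic into a test topic that Camus on the testing cluster could then consume. Broker address, topic names and checkpoint path are placeholders, the spark-sql-kafka package is assumed to be on the classpath, and the team may well have chosen kafkatee or a different Spark API instead:

    # Hypothetical Kafka -> Kafka mirroring job (Spark Structured Streaming).
    # Run with e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.11:<spark-version>
    from pyspark.sql import SparkSession

    BROKERS = "KAFKA_BROKER_HOST:9092"  # placeholder broker list

    spark = SparkSession.builder.appName("webrequest-test-mirror-sketch").getOrCreate()

    source = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", BROKERS)
              .option("subscribe", "webrequest_text")        # source topic (placeholder name)
              .load())

    # Pass key/value through unchanged to the test topic.
    query = (source.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
             .writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", BROKERS)
             .option("topic", "webrequest_text_test")        # destination topic (placeholder name)
             .option("checkpointLocation", "/tmp/webrequest_test_mirror_ckpt")
             .start())

    query.awaitTermination()
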
[14:16:37] haha, apparently quite the opposite
[14:17:48] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10Patch-For-Review, and 2 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10elukey) If https://gerrit.wikimedia.org/r/489170 is approved by SRE (should be),...
[14:18:35] milimetric: --^
[14:18:59] (not now, whenever you have time :)
[14:20:06] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10Patch-For-Review, and 2 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10JAllemandou) I like it, thanks @elukey !
[14:23:57] Wow elukey - I'm sorry I didn't even realize I was asking for a deploy on friday (aqs) - thanks again a lot, I'll try to be more careful next time :S
[14:28:16] nah it was a minor one, I'd have said no otherwise :)
[14:29:03] <#
[14:29:07] <3
[14:29:17] * joal should learn to type before trying emojis ...
[14:32:09] afk for a bit :)
[15:07:12] 10Analytics, 10Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (10Jgreen) a:03Jgreen
[15:22:19] elukey: well if the DNS records get approved (and I like them and hope they do), then there's nothing for me to approve :) And my patches will be pretty easy
[15:23:40] hopefully yes! The problem will be to translate wikiname -> section, that requires mediawiki-config deployed
[15:24:27] or we could use https://noc.wikimedia.org/conf/dblists/s1.dblist etc..
[15:24:32] 10Analytics, 10Discovery-Search, 10Multimedia, 10Reading-Admin, and 3 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10dr0ptp4kt)
[15:24:35] so basically GETting them
[15:24:40] but might be overkill
[15:27:40] 10Analytics, 10Analytics-Kanban, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10Ottomata) Ok, I ready to cave and go back to double reading for JSON data. :/
[15:29:52] (03CR) 10Mforns: [C: 03+1] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482663 (owner: 10Joal)
[15:30:49] (03CR) 10Joal: [C: 03+2] "Merging" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482663 (owner: 10Joal)
[15:30:53] Thanks mforns :)
[15:30:58] np!
[15:31:22] (03PS2) 10Joal: Use spark dynamic allocation in mediawiki-history [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482663
[15:31:35] (03CR) 10Joal: [V: 03+2 C: 03+2] Use spark dynamic allocation in mediawiki-history [analytics/refinery] - 10https://gerrit.wikimedia.org/r/482663 (owner: 10Joal)
[15:31:40] sorry for spam team
[15:34:06] spam?
[15:36:40] (03CR) 10Mforns: [C: 04-1] "I think there are a couple typos in the property names!" (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: 10Fdans)
[15:46:06] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Finalize eventlogging to druid ingestion - https://phabricator.wikimedia.org/T206342 (10mforns)
[15:46:42] spam?
[15:47:20] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Finalize eventlogging to druid ingestion - https://phabricator.wikimedia.org/T206342 (10mforns) The docs are here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Hive_to_Druid Moving to done!
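
A tiny illustration of the noc.wikimedia.org dblist idea elukey floats above (15:24) for building the wikiname -> section mapping by GETting the published dblists; the one-database-per-line file format and the list of sections are assumptions here, and as he notes this may be overkill compared to reading mediawiki-config directly:

    # Hypothetical wiki -> section mapping built from the published dblists.
    import requests

    SECTIONS = ['s1', 's2', 's3', 's4', 's5', 's6', 's7', 's8']  # assumed section list

    def wiki_to_section():
        mapping = {}
        for section in SECTIONS:
            resp = requests.get('https://noc.wikimedia.org/conf/dblists/{}.dblist'.format(section))
            resp.raise_for_status()
            for line in resp.text.splitlines():
                line = line.strip()
                if line and not line.startswith('#'):
                    mapping[line] = section  # e.g. 'itwiki' -> 's2'
        return mapping
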
[15:54:09] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Ottomata) Ya I think up to your discretion. I'd be ok with a generic li...
[16:00:58] 10Analytics, 10Discovery-Search, 10Multimedia, 10Reading-Admin, and 3 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10Miriam) @Gilles thanks for this! Images and graphics have very different underlying image statistics: it is therefore fairly easy for a classifier...
[16:01:28] 10Analytics, 10Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (10Jgreen) a:05Jgreen→03elukey @elukey I'm fairly confident this data is no longer needed but I made backups to the fundraising archive until I can confirm that. You...
[16:12:57] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10Ottomata) Thanks so much for the review! I'll respond to the AJV/schema stuff first. > if all JSON schem...
[16:16:15] 10Analytics, 10Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (10elukey) All cleaned up, thanks!
[16:16:23] 10Analytics, 10Analytics-Kanban, 10Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (10elukey)
[16:20:29] (03Abandoned) 10Milimetric: [WIP] Analyze external link insertion and deletion [analytics/limn-edit-data] - 10https://gerrit.wikimedia.org/r/301432 (https://phabricator.wikimedia.org/T115119) (owner: 10Milimetric)
[16:31:33] 10Analytics, 10Product-Analytics, 10Reading-analysis: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (10mforns) @mpopov Just a heads-up that we'll be turning on the deletion script that will delete unsanitized EL data...
[16:39:13] joal: thoughts on my geoeditors blunder?
[16:39:20] if you want to chat pre-standup, I'm around
[16:40:09] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10Ottomata) > media-src *; img-src *; style-src *; csp directives for the x-webkit-csp and x-content-securit...
[16:43:34] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Update big spark jobs conf with better settings - https://phabricator.wikimedia.org/T213525 (10Milimetric) reviewed doc and fixed some spelling. I don't know what spill files are, but the rest made sense to me.
[16:45:03] 10Analytics-EventLogging, 10Analytics-Kanban: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive - https://phabricator.wikimedia.org/T209503 (10mforns) I've been doing some data vetting for the last 2 days. I've found 2 minor issues with the data: - Some fi...
[16:59:06] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: CI Support for Schema Registry - https://phabricator.wikimedia.org/T206814 (10Pchelolo)
[16:59:09] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Pchelolo) 05Open→03Stalled > Although, I'm not sure if we should bot...
[16:59:52] mforns: ping standup
[17:00:56] mforns: holaaa
[17:02:56] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Ottomata) Hm, can we do this piece by piece? I'm fine with aborting the...
[17:09:06] 10Analytics: Move FR banner-impression jobs to events (lambda) - https://phabricator.wikimedia.org/T215636 (10JAllemandou)
[17:11:47] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Pchelolo) That I already have, just need to revert a couple of commits t...
[17:42:54] * elukey off!
[17:43:58] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10Ottomata) > lodash <= 4.17.5 EventGate uses ^4.17.11 > eslint-config-wikimedia > Outdated version (old: 0...
[17:58:39] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Ottomata) Ya, but I think we shouldn't allow that. Since JSONSchema doe...
[18:00:46] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Finalize eventlogging to druid ingestion - https://phabricator.wikimedia.org/T206342 (10Nuria) 05Open→03Resolved
[18:23:13] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10Ottomata) Interesting! > The Ajv maintainers seem to recommend validating against this bundled meta-schema...
[18:28:51] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10Pchelolo) > Since we will (probably? for now?) only be producing new schemas to EventGate, I think it woul...
[18:33:44] 10Analytics, 10Pageviews-API, 10Tool-Pageviews: Pageviews agent=bot is always 0 - https://phabricator.wikimedia.org/T197277 (10Nuria) Indeed, that is a mistake that should be corrected on docs.
[18:33:56] 10Analytics, 10Pageviews-API, 10Tool-Pageviews, 10good first bug: Pageviews agent=bot is always 0 - https://phabricator.wikimedia.org/T197277 (10Nuria)
[18:58:41] 10Analytics, 10Analytics-Kanban, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10JAllemandou) I must say I also pushed for stopping double reading, and continue to think so. I wonder if having a first refine step gat...
[19:04:35] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Update big spark jobs conf with better settings - https://phabricator.wikimedia.org/T213525 (10JAllemandou) For the record @Milimetric : spilled files are the temporary files generated between steps when data doesn't fit in memory (they're called spilled b...
[19:38:31] thanks jo! makes sense
[19:51:43] 10Analytics, 10Analytics-Kanban, 10User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10leila) @elukey notified the team.
[20:06:13] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10Ottomata) > The Ajv maintainers seem to recommend validating against this bundled meta-schema - ajv/lib/re...
[20:21:31] 10Analytics, 10Analytics-Kanban, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10Ottomata) Not sure I understand...?
[20:23:59] 10Analytics, 10Analytics-Kanban, 10User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10bmansurov) I haven't used dbstore1002 so all good on my end.
[20:32:34] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Ottomata)
[20:33:55] 10Analytics: Generate edit totals by country by month - https://phabricator.wikimedia.org/T215655 (10Milimetric)
[20:34:12] 10Analytics, 10Analytics-Kanban: Generate edit totals by country by month - https://phabricator.wikimedia.org/T215655 (10Milimetric) p:05Triage→03High a:03Milimetric
[20:43:42] (03PS1) 10Milimetric: Create geoeditors edits monthly dataset [analytics/refinery] - 10https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655)
[20:44:58] (03CR) 10Milimetric: "I've tested the hive and hql pieces but not yet tested the oozie job." [analytics/refinery] - 10https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) (owner: 10Milimetric)
[20:45:32] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Ottomata)
[20:46:49] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Ottomata) a:05Ottomata→03Dzahn Thanks Daniel! The Analytics usages are gone. I'm assigning...
[20:51:07] (03CR) 10Ottomata: Change email send workflow to notify of completed jobs (031 comment) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: 10Fdans)
[21:27:28] ottomata: yt?
[21:29:55] ottomata: superset is kaput
[21:30:05] ottomata: and also crashed yesterday due to use i think
[21:33:49] (03CR) 10Nuria: [C: 03+1] Create geoeditors edits monthly dataset [analytics/refinery] - 10https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) (owner: 10Milimetric)
[21:56:38] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10sbassett) > lodash <= 4.17.5 > > EventGate uses ^4.17.11 It does, but npm is creating a hard dependency...
[22:24:06] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (10sbassett) Some additional follow-up: > Relative URIs (ones with the path portion only, e.g. /mediawiki/re...
[22:27:35] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn)
[22:28:16] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn)
[22:30:20] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10DBA, and 2 others: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070 (10Dzahn) >>! In T162070#4939472, @Ottomata wrote: > Thanks Daniel! The Analytics usages are gone....