[00:01:41] Analytics, EventBus, MediaWiki-Core-Testing, Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (holger.knust) Antoine, I tried to recreate the test results locally using the instructions in the link you shared but it failed (...
[01:10:40] Analytics-EventLogging, Analytics-Kanban, EventBus, Product-Analytics, and 5 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (Nuria) @Ottomata let's not include cookie info and just have a nocookie marker that is useful in this context...
[01:11:30] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (MNeisler) @mforns Thanks! Yes, happy to discuss and coordinate on this. I reviewed this task with @Neil_P._Quinn_WMF today. I'm going to first work on d...
[01:28:27] Analytics, ExternalGuidance, Product-Analytics, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (chelsyx) **Update**: I checked the table in hadoop again and found all of my test...
[04:23:34] Analytics, ExternalGuidance, Product-Analytics, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (Nuria) @santhosh Let's please clean the code that sends data to graphana if we th...
[07:20:38] Analytics: Clean up analytics data for user zhousquared - https://phabricator.wikimedia.org/T216679 (elukey) p:Triage→Normal
[07:27:03] hello people
[07:27:12] going to deploy the refinery to stat1006 and notebooks
[07:49:49] Analytics, Analytics-Wikistats, Wikisource, Wikisource-Community-User-Group: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (satdeep_gill)
[08:08:32] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (elukey) @mpopov: analytics-mysql deployed on notebooks and stat1006, I have als...
[08:09:01] so analytics-mysql available everywhere
[08:14:46] RoanKattouw: o/ - we added a tool called 'analytics-mysql' on the analytics nodes (like stat1006), it should help querying the new dbstore nodes. All details in https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas, lemme know what you think about it (there might be some bugs in the beginning, if so please report them! :)
[08:15:12] Analytics, Analytics-Wikistats: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (Bodhisattwa)
[08:23:24] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) @Melderick Restarts are done automatically by the script running the specific dump, up to a certain number of times....
[08:24:07] Analytics, Analytics-Wikistats: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (JAllemandou) Hi @satdeep_gill wikistats is not mainained anymore. However wikistats2 is. Punjabi Wikisource project statistics will be present in next month update of wikistats2 as per T2...
[08:40:04] joal: o/
[08:40:06] bonjour
[08:40:12] hello elukey :)
[08:40:18] when you have time I'd need to chat with you about some hive dbs
[08:40:35] I have time :)
[08:40:55] :)
[08:40:58] so the task is https://phabricator.wikimedia.org/T200875
[08:41:17] if you see in the description there is one hive db that I'd need to move under baho's namespace
[08:41:55] I saw that there is a "rename" alter command that should be handy to use
[08:42:17] mmm even if I can see a lot of temp tables
[08:42:32] I might ask Leila to check if we can clean them up first
[08:42:46] anyway, the more generic question is what procedure should we follow in these cases
[08:43:03] (user not active, valuable data, etc..)
[08:43:13] elukey: thinking about this for a minute
[08:43:25] :)
[08:43:35] we can discuss in bc if you have concerns etc..
[08:43:53] No concern really, just trying not moss a bit
[08:43:56] miss sorry
[08:44:27] I can think of 2 ways here: RENAME in hive, or physical hdfs move and hive recreation
[08:44:49] rename seems to move managed tables IIUC
[08:44:55] seems neat
[08:45:11] the second one is more expensive in term of human work, but would be more reliable on various aspects (enforcing data is moved, checking user rights, etc)
[08:46:53] yep, but I'd try the rename first anyway, if it works it would be wayyy less boring for us :D
[08:47:08] makes sense
[08:47:44] what do you think about the procedure though? I like the idea that we tranfer things to we have an "active" owner that we can ping
[08:48:02] Baho proposed a sort of "parking space" for old research data
[08:48:19] but I am afraid that it will become a black hole of horrors
[08:48:24] in the long run
[08:48:38] (like nobody checking it, PII data breaching policies, etc..)
[08:50:55] elukey: I support the idea of no-black-hole on the hadoop cluster :)
[08:51:28] joal: super :)
[08:51:38] so I'll open another task to see if we can trim down the tables
[08:51:46] Leila will probably want to kille me
[08:51:47] *kill
[08:51:52] :D
[08:53:33] elukey: we need to maintain our reputation of data droppers :)
[08:54:52] ahahahha
[08:55:24] joal: for next fiscal, there is a big elephant in the room that I was thinking about... CDH 6 ?
[08:55:37] elukey: bigtop 1.2?
[08:55:41] 3 sorry :)
[08:56:15] yeah I am not super convinced about it, especially after finding that we can get the deb sources for CDH (that was my major concern)
[08:56:39] I am following their mailing list and the security part concerns me
[08:57:05] CDH seems more oriented in keeping a sort of "Stability" of tools supported
[08:57:16] I hear you - It makes me a bit sad, but I defintely understand :)
[08:57:20] in big top people sometime propose to drop things
[08:57:29] or to focus on kubernetes more etc..
[08:57:32] that makes sense
[08:57:46] but probably not the best for our use case
[08:58:00] anyway, I am in favor of event trying it
[08:58:09] no opposing to more/better opensource :)
[08:59:32] I know that elukey - Thanks for dedication in that <3
[09:00:44] and probably having the testing cluster would be way better for testing
[09:00:53] i am starting to think that we should have a 'permantent' one
[09:01:02] possibly moving old hosts to testing
[09:01:09] I'll have a chat with SRE
[09:01:12] they will not like it
[09:01:16] :D
[09:01:37] :)
[09:02:45] * joal feels akwardly good at having an ops-folk supporting stuff the ops-team won't like :)
[09:15:53] we can come up with an agreement, like keeping the host a bit more for the "Testing" cluster.
If they break we just decom them
[09:17:58] for sure elukey - it also probably depends on DC-space
[09:19:42] probably that too
[09:20:02] but I think that the major concern is keeping old hw that is OOW
[09:20:13] k
[09:29:54] Quarry, Patch-For-Review, Security: Setup CSP http header - https://phabricator.wikimedia.org/T214637 (GTirloni) Open→Resolved a:Framawiki
[09:32:18] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (fgiunchedi) >>! In T213976#4968603, @Ottomata wrote: > Alright, I'm not familiar with Swift, but if we were to do t...
[09:35:30] chown -R in hdfs dfs is really slow
[09:35:51] I am moving files to Baho's user dir and it take ages to chown them with -R
[09:36:03] even if the dir structure is not nested a lot
[09:36:14] elukey: many files I guess
[09:36:59] anyway, just done :)
[09:37:06] \o/ :)
[09:37:23] the hive db that I thought contained tables was in reality /user/etc.., I got confused
[09:37:29] so no alters for the moment
[09:37:39] I also have experienced that recursive chown/chmod on HDFS for folders containing a large number of files is slow
[09:37:41] (except the ones that we'll have to do for Ellery's data :D)
[09:39:26] all right first big task to clean up research data done
[09:39:31] \o/
[09:53:09] Analytics: Old job_tracker setting in oozie properties - https://phabricator.wikimedia.org/T216519 (elukey) It seems a property that we set but then never use anywhere? @JAllemandou thoughts? Tried to find info in git history but no luck up to now..
[09:53:45] Analytics, Contributors-Analysis, Product-Analytics: Make an Analytics Data Lake table to provide meta info about wikis - https://phabricator.wikimedia.org/T184576 (Neil_P._Quinn_WMF) @EBernhardson, actually this is quite close to being done! If you look at the [wiki segmentation spreadsheet](https:/...
[09:53:55] elukey: no specific thoughts --^ If oozie works without that prop set, let's remove it !!!
[09:57:57] Quarry, Patch-For-Review, Security: Setup CSP http header in Quarry - https://phabricator.wikimedia.org/T214637 (Legoktm)
[10:21:02] (CR) Elukey: [V: +2 C: +2] Add notebook100[3,4] and stat1006 to the scap targets [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/491827 (https://phabricator.wikimedia.org/T212386) (owner: Elukey)
[10:31:18] https://kafka-summit.org/events/kafka-summit-london-2019/
[10:33:02] Analytics, EventBus, MediaWiki-Core-Testing, Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (hashar) My bad sorry. The container `releng/quibble-stretch` no more ship with PHP since: ed2233f0 - Turn quibble-stretch into a...
[10:40:18] joal: in wmf raw, is it camus that adds the _IMPORTED file right?
[10:45:04] I always forget
[10:45:05] ufff
[10:52:08] -c --check If set, a CamusPartitionChecker job will be submitted after the camus run if any (checking and flagging imported partitions).
[10:52:11] yes
[10:52:23] so why it does not add the IMPORTED flag?
[10:53:22] mmmm the logs talk about the camus dir
[11:19:46] --check-java-opts '-Dkafka.whitelist.topics="test_webrequest"'
[11:19:49] sigh
[11:20:24] fixing in puppet..
[11:21:20] (should have been test_webrequest_text)
[11:27:15] ah now very interesting task
[11:27:24] run the checker for all the past partitions
[11:44:00] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (Ladsgroup) In general it would be great if the storage would be decoupled from the analytics cluster through an API...
[11:50:15] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues?
- https://phabricator.wikimedia.org/T216707 (aborrero)
[11:52:55] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (elukey) This host is still not actively used as far as I know, we got to a dead end after deploying HDFS/Presto on the ab...
[11:54:00] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (mforns) @MNeisler Cool :] Here's the Druid transforms expression list, so that you know the possibilities and the limitations: http://druid.io/docs/late...
[11:54:27] * elukey reads https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-camus/src/main/scala/org/wikimedia/analytics/refinery/camus/CamusPartitionChecker.scala
[11:55:02] elukey: sorry was away - I possibly can help :)
[11:55:15] elukey: I think your topic is webrequest_test_text, no ?
[11:55:24] not test_webrequest_text
[11:55:55] also elukey, checker is able to handle multiple partitions at once
[11:56:06] You can manually run it and ask for how long back it should look
[11:56:51] Analytics-Kanban, Patch-For-Review: Coordinate work on minor changes for Edit Data Quality - https://phabricator.wikimedia.org/T213603 (JAllemandou)
[11:58:53] sigh you are right
[11:58:59] but why I didn't get any error in the logs?
[11:59:53] elukey: hm - whitelist defines all potential topics to match in the present data, not expected ones
[12:00:07] elukey: I could even be a regex IIRC
[12:01:46] elukey: sorry for not having been available on time :S
[12:01:56] joal: for the backfill, is it doable with the -d date ?
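Editor's sketch: joal's point above is that the whitelist only filters the topics that are actually present in the data, so a mistyped name matches nothing and no error is raised. The Python below illustrates that behaviour under the assumption (joal only says "IIRC") that the whitelist is treated as a regex. `select_topics` and the topic list are hypothetical and are not taken from Camus or the checker.

```python
import re


def select_topics(whitelist_pattern, topics_present):
    """Return the topics present in the data that fully match the whitelist regex.

    Hypothetical helper for illustration only; not the real checker code.
    """
    pattern = re.compile(whitelist_pattern)
    return [topic for topic in topics_present if pattern.fullmatch(topic)]


# Made-up list of topics found in the imported data.
present = ["webrequest_test_text", "webrequest_text", "webrequest_upload"]

# Mistyped whitelist: nothing matches, and nothing fails either.
print(select_topics("test_webrequest_text", present))  # []

# Corrected whitelist: the test topic is selected.
print(select_topics("webrequest_test_text", present))  # ['webrequest_test_text']
```

An empty selection is a valid result for a filter, which is consistent with the logs showing no error.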
[12:02:11] elukey: give me a minute to double check params meanig
[12:02:15] nah don't worry I am learning things
[12:02:26] this is why I was reading the checker code
[12:02:33] but scala is a bit tough :D
[12:02:44] elukey: I can hear that :)
[12:02:55] elukey: better to use -n X
[12:03:07] checking
[12:03:11] elukey: -d is for a dedicated date, -n X test the latest X runs
[12:03:15] ( I am fixing the whitelist)
[12:03:25] ahhh
[12:03:40] how does the checker works? High level I mean
[12:04:24] elukey: it reads camus management files (written by camus after each run, read by next run to know where to start)
[12:04:41] in camus history
[12:04:46] elukey: yes sir
[12:05:06] in camus history, there is 1 folder per run
[12:05:30] and when the checker figures out that the last one was in the hour before, it tries to flag the hour
[12:05:31] and the folder comntains files about r ead partitions
[12:08:01] hdfs@analytics1030:/mnt/hdfs/wmf/data$ find -name _IMPORTED
[12:08:01] ./raw/webrequest/webrequest_test_text/hourly/2019/02/20/16/_IMPORTED
[12:08:04] ./raw/webrequest/webrequest_test_text/hourly/2019/02/20/17/_IMPORTED
[12:08:07] ..
[12:08:10] \o/
[12:08:13] elukey: nope, it actually checks previous-offset and current-offset for every run and topic
[12:08:16] :)
[12:08:44] joal: but where does it get that it needs to flag an hour?
[12:09:13] (I mean, when it is the time to add _IMPORTED)
[12:09:42] elukey: if, per topic, the calendar hour is between the biggest-previous-offset-timnestamp an
[12:09:54] and the smallest-current-offset-timestamp
[12:10:01] makes sense ?
[12:11:00] Basically checking if for the checked-run (by topic), every partition read is in the next hour while it for at least one of them of them it started before
[12:11:15] elukey: Might be easier in da cave :)
[12:13:17] joal: sure
[12:26:10] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues?
- https://phabricator.wikimedia.org/T216707 (GTirloni) This server has 128GB of RAM. There are two VM's currently running on it: * ca-worker-2 - 122GB * canary-an100...
[12:31:17] Analytics, DBA, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (diego) I think we are talking about three different things: i) page_id -> CurrentWikidataItem: this was my original request, and...
[12:32:59] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (elukey) Shouldn't be a big deal, the puppet code related to workers can figure out by itself how to self adjust settings...
[12:49:00] Analytics, DBA, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (JAllemandou) We're on the same page @diego :) I can precompute the table described in ii) if needed, and will surely do it once we...
[13:01:04] * elukey lunch!
[13:06:35] Analytics, MediaWiki-Database, Research, Wikidata: Improve interlingual links across wikis through Wikidata IDs - https://phabricator.wikimedia.org/T215616 (Marostegui) Going to remove the #DBA tag from here as there are not really any actionables (yet) for the DBAs and we already provided some i...
[13:25:21] Analytics, EventBus, MediaWiki-Core-Testing, Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (hashar) Reproduced on my Debian Stretch with the command line: ` DISPLAY='' \ ZUUL_URL=https://gerrit.wikimedia.org/r/p \ ZUUL_PR...
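Editor's sketch: one reading of joal's 12:09 description of when the checker flags an hour (the hour boundary has to fall between the biggest previous-offset timestamp and the smallest current-offset timestamp, per topic). This is an interpretation of the chat only, not a port of CamusPartitionChecker.scala. The function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def hours_to_flag(previous_ts, current_ts):
    """Return the start of every calendar hour completed by one run for one topic.

    previous_ts: per-partition timestamps of the offsets the run started from.
    current_ts:  per-partition timestamps of the offsets the run stopped at.
    An hour counts as complete when its end boundary lies after the latest
    starting point and at or before the earliest stopping point.
    """
    latest_start = max(previous_ts)
    earliest_end = min(current_ts)
    # First hour boundary strictly after the latest starting point.
    boundary = latest_start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    flagged = []
    while boundary <= earliest_end:
        flagged.append(boundary - timedelta(hours=1))
        boundary += timedelta(hours=1)
    return flagged


# Made-up example: two partitions, both read from before 17:00 to after 17:00.
prev = [datetime(2019, 2, 20, 16, 55), datetime(2019, 2, 20, 16, 58)]
cur = [datetime(2019, 2, 20, 17, 2), datetime(2019, 2, 20, 17, 5)]
print(hours_to_flag(prev, cur))  # the 16:00 hour is complete
```

If any single partition has not yet crossed the hour boundary, the minimum of the current timestamps stays below it and nothing is flagged for that run.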
[13:49:23] Analytics, Analytics-Wikistats: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (satdeep_gill) @JAllemandou Will that add Punjabi Wikisource to http://francisco.dance/wikigrowth/.
[13:56:27] Analytics, Analytics-Wikistats: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (JAllemandou) Not automatically. The data powering the website you refer has been (more or less) manually extracted from the api by @fdans. Another export will need to be done next month f...
[14:29:09] hey irccloud users
[14:29:22] is there a way to make it not keep me online in freenode when I quit irccloud?
[14:30:04] ottomata: if you don't want people/things notifying you you can change your nickname
[14:31:12] Analytics-EventLogging, Analytics-Kanban, EventBus, Product-Analytics, and 5 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (Ottomata) Ok! Until we need it i'm going to leave it out of schemas, but most likely it will be a `has_cookie:...
[14:33:44] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (Ottomata) > it would be great if the storage would be decoupled from the analytics cluster through an API Well, AP...
[14:35:54] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (Ottomata) Ya we can reduce the VM RAM size for sure. However, most likely all of this hardware is going to be moved out...
[14:50:03] elukey: oooo one day...
https://github.com/apache/kafka/pull/6295
[14:50:27] btw that is based on Kafka Connect ^ :)
[14:52:34] Analytics: How to get display statistics of the content publised on Commons - https://phabricator.wikimedia.org/T201180 (Milimetric) Yeah, going to close as a duplicate of that. The main obstacle now is expanding storage on the API cluster to fit the mediacounts dataset.
[14:53:09] Analytics: How to get display statistics of the content publised on Commons - https://phabricator.wikimedia.org/T201180 (Milimetric)
[14:53:11] Analytics, Tool-Pageviews: Statistics for views of individual Wikimedia images - https://phabricator.wikimedia.org/T210313 (Milimetric)
[14:54:18] elukey: btw oozie webrequest text spamming analytics-alerts!
[14:54:26] test
[14:54:58] ottomata: I have sent an email, my bad
[14:55:11] :(
[14:55:12] oops
[14:55:17] meant the other paren!
[14:55:18] :)
[14:56:06] the weird thing is that I am receiving spam on my email (ok) but also in alerts
[14:56:54] ah snap coordinator.properties
[14:57:58] Analytics, Dumps-Generation, ORES, Scoring-platform-team, and 3 others: Decide whether we will include raw features - https://phabricator.wikimedia.org/T211069 (Milimetric) This increase in data sounds fine, and the proposed example path looks fine too. Hundreds of subfolders are only annoying w...
[14:58:08] mmm should only be bundle.properties that I have fixed
[14:58:13] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (MoritzMuehlenhoff) >>! In T216707#4971943, @Ottomata wrote: > Ya we can reduce the VM RAM size for sure. However, most l...
[14:59:46] oozie I really hate you
[15:00:32] ah the send_error_email
[15:00:34] ufffff
[15:00:36] sorry people
[15:12:40] ok better now
[15:19:33] nope
[15:20:10] so I have uploaded send_error xml with my email to the proper location
[15:20:22] then bundle.properties references my email address
[15:21:00] no idea why it keeps spamming
[15:22:32] Analytics, Analytics-Kanban, Patch-For-Review: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964 (Milimetric) it's not you, it's Java. But I can't help without details, ping me on IRC, I'm very behind on my phab pings as you can see.
[15:24:04] Analytics-EventLogging, Analytics-Kanban, EventBus, Product-Analytics, and 5 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (Ottomata) BTW, want to make sure this is clear and ok with yall. By having `http.request_headers`, I'm sugges...
[15:33:02] Analytics, Analytics-Kanban, Anti-Harassment: Add ipblocks_restrictions table to Data Lake - https://phabricator.wikimedia.org/T209549 (Milimetric) p:Normal→High a:Milimetric
[15:35:56] Analytics-EventLogging, Analytics-Kanban, EventBus, Product-Analytics, and 5 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (Ottomata) > Until we need it i'm going to leave it out of schemas Oh, sorry, I forgot this is actually used i...
[15:36:27] elukey: how far are you on T210706? I can help do any boring stuff you've got left
[15:36:28] T210706: Move AQS to nodejs 10 - https://phabricator.wikimedia.org/T210706
[15:38:54] milimetric: already done everything sre related, now it is the time for upgrading the code to nodejs10 and test it
[15:39:03] do you have time for it?
[15:39:08] I mean upgrading the modules etc..
[15:39:11] then we can test in labs
[15:40:47] elukey: I will make some time either before or after standup
[15:43:04] milimetric: <3
[15:43:14] lemme know if you are ok with the plan that I made in there
[15:43:23] (if it makes sense etc..)
[15:57:22] (CR) Milimetric: [V: +2 C: +2] "tested locally, deploying browser dashboard with new build" [analytics/dashiki] - https://gerrit.wikimedia.org/r/489998 (https://phabricator.wikimedia.org/T210589) (owner: Fdans)
[16:02:52] Analytics, Analytics-Kanban: Add caused_by_user_text to mediawiki_page_history - https://phabricator.wikimedia.org/T167608 (JAllemandou) a:JAllemandou
[16:03:27] Analytics, Analytics-Data-Quality, Analytics-Kanban, Product-Analytics: mediawiki_history datasets have null user_text for IP edits - https://phabricator.wikimedia.org/T206883 (JAllemandou) a:JAllemandou
[16:14:53] fdans: I'm seeing a ghost in pageviews.js
[16:14:59] when I look at it on github, it's fine
[16:15:27] https://www.irccloud.com/pastebin/ImvufOGt/
[16:15:34] ?!
[16:16:15] milimetric: wat
[16:16:27] yeah...
[16:16:36] I have no idea how this could be happening
[16:17:49] fdans: please check your own version and tell me if you see the same thing.
[16:17:55] package.json says it's 1.4.1
[16:20:50] milimetric: yep, I see the same thing
[16:21:01] but why do we care? dashiki doesn't use pageviews per country
[16:24:02] PROBLEM - eventbus grafana alert on icinga2001 is CRITICAL: CRITICAL: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is alerting: EventBus POST Response Status alert.
[16:24:37] Analytics, Analytics-Wikistats: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (satdeep_gill) Alright! In that case feel free to close this task.
[16:26:58] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues?
- https://phabricator.wikimedia.org/T216707 (Ottomata) https://phabricator.wikimedia.org/T207321#4882266 and below. tl;dr Running reliable 'production' service in Cl...
[16:27:26] eventubs is alarming for topic mediawiki.job.cirrusSearchElasticaWrite. MessageSizeTooLargeError: MESSAGE_SIZE_TOO_LARGE
[16:27:30] again
[16:28:10] :(
[16:28:32] Analytics, Core Platform Team Backlog (Watching / External), Services (watching): Evaluate using TypeScript on node projects - https://phabricator.wikimedia.org/T206268 (Ottomata) Cool! Not going to focus on this now, but if I (or anyone!) ever finds time, I'd be happy to move that way.
[16:31:22] RECOVERY - eventbus grafana alert on icinga2001 is OK: OK: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is not alerting.
[16:36:04] fdans: no this is just like slapped at the end of the file, so it's always making that call after exporting the pageviews module, meaning anyone importing it will make the call and get a console.log
[16:37:30] Analytics, Analytics-Cluster, cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (JAllemandou) No near-term plan for presto for me. Tests worked fine, we can move them :) Thanks!
[16:38:46] fdans: the pageviews.min file doesn't have it...
even weirder, I guess I'll switch require.config to point to that for now
[16:41:36] Analytics, Analytics-Wikistats: Punjabi Wikisource on Wikistats metrics - https://phabricator.wikimedia.org/T216680 (JAllemandou)
[16:41:40] Analytics, Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Punjabi Wikisource WikiStats 2.0 - https://phabricator.wikimedia.org/T215082 (JAllemandou)
[16:43:42] Analytics, EventBus, Internet-Archive, MediaWiki-extensions-WikimediaEvents, and 2 others: Edits to Flow pages result in a page-links-change event with no performer - https://phabricator.wikimedia.org/T216726 (bmansurov)
[16:48:50] fdans: the rate-of-growth dashboard is a big hit, good work
[16:50:35] ottomata: ahhh it is a grafana alert! Just realized it
[16:50:59] there is one for graphite 400/500s
[16:51:33] mmmmmm
[16:52:00] Analytics, EventBus, Internet-Archive, MediaWiki-extensions-WikimediaEvents, and 2 others: Edits to Flow pages result in a page-links-change event with no performer - https://phabricator.wikimedia.org/T216726 (bmansurov) So here's the page in question: https://fr.wikipedia.org/wiki/Sujet:Uumkuv2o...
[16:54:52] (PS1) Milimetric: Update dashboards with latest dashiki build [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/491999
[16:55:25] (CR) Milimetric: [V: +2 C: +2] Update dashboards with latest dashiki build [analytics/analytics.wikimedia.org] - https://gerrit.wikimedia.org/r/491999 (owner: Milimetric)
[16:55:42] (PS1) Milimetric: Fix pageviews.js reference [analytics/dashiki] - https://gerrit.wikimedia.org/r/492000
[16:55:55] (CR) Milimetric: [V: +2 C: +2] Fix pageviews.js reference [analytics/dashiki] - https://gerrit.wikimedia.org/r/492000 (owner: Milimetric)
[16:57:03] Analytics, User-Elukey: Staging environment for upgrades of superset - https://phabricator.wikimedia.org/T212243 (elukey) There is a caveat though - Debian Buster and Python 3.6.
Would it make sense to do the following first: * create analytics-tool1004 with Buster when Moritz is ready * deploy Superset...
[17:11:34] harej: aw, thank you so much, it was a fun lil project
[17:13:42] it's interesting that people want metrics for specific projects but that reports aren't run for such projects. is there a way to decide which wikis reports are run for? i imagine it would be wasteful to do analytics on literally every wiki, but then how do we know which wikis to measure if we don't have the data?
[17:13:52] Quarry, Security: Use data attributes instead of unsafe-inline var definitions within Quarry template files - https://phabricator.wikimedia.org/T216653 (sbassett) p:Triage→Low
[17:14:52] harej: we try to be 'all-wikis' by default
[17:16:02] harej: yeah basically, it's simple for us to show stats for pageviews because we get alerts whenever a new site declares pageview
[17:17:15] for editing metrics, dependant on mediawiki history reconstruction, we need to explicitly add that wiki to the list of groups, and it is not always in sync with the pageviews one, which cause us not to load data for a bunch of wiki sites
[17:18:13] harej: luckily, mediawiki-history metrics "go back in time", so even if we don't have data right now, just adding the site to our list means next month we'll have editing stats for that site all the way to its creation date
[17:20:02] so on one side, yes, we should have a dynamic list of wikis that is synced to reality, but on the other side, if a wiki isn't there it just means no one has asked for it, because it really takes no effort to add it
[17:21:01] possibly a stupid question, but is there any reason the wiki list just couldn't be dynamically generated from sitematrix or anything like that?
[17:22:01] harej: not stupid question at all
[17:23:18] the original reason is we have to group wikis according to their size when computing these stats (i.e.
enwiki needs a group of its own while other groups can have dozens of wikis together), so our current list has grouping info for every wiki
[17:23:43] it's feasible to still do it dynamically, but it's a pending task for us
[17:24:01] Right, I think you can get around that but it would take work
[17:25:49] yeah, exactly
[17:26:29] I'm thinking now that I should add a "don't see your wiki?" kind of link in my site for people to complain about missing wikis
[17:27:44] Analytics, Analytics-Kanban, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (elukey)
[17:27:54] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey)
[17:31:04] Analytics, Analytics-Kanban, Operations, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (elukey) @RobH I'd like to establish the next steps for this task, so we can order a new GPU and test it as soon as possible, since a lot of the new Fiscal Year plannin...
[17:35:08] Analytics, Product-Analytics: Timestamp column in EventLogging tables have incompatible collation - https://phabricator.wikimedia.org/T216658 (Milimetric) Open→Declined This sucks but we're not likely to work on it, as we're moving away from mysql. We don't want to be mean though, so we can help...
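Editor's sketch: the size-based grouping discussed above (enwiki in a group of its own, many small wikis sharing a group) could in principle be derived from a dynamically fetched wiki list with a simple greedy pass. The Python below is only an illustration of that idea. The size measure, the cap and the numbers are made up, and this is not how the team's current list is produced.

```python
def group_wikis(sizes, max_group_size):
    """Greedily pack wikis into groups whose total size stays under a cap.

    sizes: mapping of wiki name -> size measure (hypothetical, e.g. revision count).
    A wiki at or above the cap gets a group of its own.
    """
    groups = []
    current, current_total = [], 0
    # Largest first, so the oversized wikis are isolated up front.
    for wiki, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size >= max_group_size:
            groups.append([wiki])
            continue
        if current and current_total + size > max_group_size:
            groups.append(current)
            current, current_total = [], 0
        current.append(wiki)
        current_total += size
    if current:
        groups.append(current)
    return groups


# Made-up sizes, for illustration only.
example = {"enwiki": 900, "dewiki": 300, "frwiki": 250, "tawiki": 40, "pawikisource": 5}
print(group_wikis(example, max_group_size=500))
```

A newly created wiki would start small and simply land in the last shared group, which is the property a dynamically generated list needs.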
[17:35:31] Analytics, EventBus, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Later), Services (next): EventBusRCFeedFormatter should clean up events from nulls - https://phabricator.wikimedia.org/T216567 (Milimetric) p:Triage→High
[17:35:39] Analytics, EventBus, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Later), Services (next): EventBusRCFeedFormatter should clean up events from nulls - https://phabricator.wikimedia.org/T216567 (Milimetric) p:High→Normal
[17:36:12] Analytics, Analytics-Kanban, EventBus, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), and 2 others: extensions/EventBus/includes/EventBusRCFeedEngine.php:45 PHP Notice: Undefined index: eventServiceName - https://phabricator.wikimedia.org/T216561 (Milimetric) p:Triage→Normal
[17:36:15] neilpquinn: o/
[17:36:32] any comment about my dbstore1002 sunset proposal in two weeks?
[17:36:46] [ ] No way! You are completely crazy Luca
[17:36:55] [ ] Yes I am happy to nuke that thing!
[17:37:32] [X] You are completely crazy AND please nuke that thing elukey :)
[17:37:50] joal: excuse me, please don't interfere with my poll :P
[17:37:51] Analytics, Pageviews-API: Add wikimania.wikimedia.org to pageview whitelist - https://phabricator.wikimedia.org/T216525 (Milimetric) This is not only not in the whitelist, it's excluded from the pageview definition. The logic was that we don't want to include traffic to wikis that are not about "content...
[17:37:57] 10Analytics, 10Pageviews-API: Add wikimania.wikimedia.org to pageview whitelist - https://phabricator.wikimedia.org/T216525 (10Milimetric) p:05Triage→03Normal [17:38:05] * joal won't do it again [17:38:06] 10Analytics, 10Pageviews-API: Add wikimania.wikimedia.org to pageview definition - https://phabricator.wikimedia.org/T216525 (10Milimetric) [17:39:38] 10Analytics: Old job_tracker setting in oozie properties - https://phabricator.wikimedia.org/T216519 (10Milimetric) p:05Triage→03Normal [17:40:20] 10Analytics, 10Analytics-Kanban, 10Operations, 10Wikimedia-Stream, 10Services (watching): Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Milimetric) p:05Triage→03High [17:41:07] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (10Milimetric) p:05Triage→03High a:03Ottomata [17:44:57] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10Ottomata) Hm, not sure I understand. Why do the spark jobs contact Maven central at runtime? Can they just add dependencies in pom.xml? [17:47:48] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10EBernhardson) Your python projects have pom.xml? :) [17:50:08] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10EBernhardson) I suppose i should be a little more clear, running a pyspark job with extra java dependencies currently looks something li... 
[17:52:21] 10Analytics, 10Analytics-Cluster, 10Operations, 10Traffic: Respect X-Forwarded-For only from trustworthy sources - https://phabricator.wikimedia.org/T56783 (10Milimetric) 05Open→03Declined >>! In T56783#2688311, @BBlack wrote: > Or is this basically now an off-topic ticket going nowhere? My money's on... [17:53:46] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Kanban (Doing), and 2 others: Add monolog adapters for Eventbus - https://phabricator.wikimedia.org/T216163 (10Ottomata) p:05Normal→03High [17:53:55] 10Analytics, 10Analytics-Kanban, 10EventBus, 10Patch-For-Review, 10Services (watching): EventBus mediawiki extension should support multiple 'event service' endpoints - https://phabricator.wikimedia.org/T214446 (10Ottomata) p:05Normal→03High [17:54:01] 10Analytics, 10Analytics-Kanban, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10Ottomata) p:05Normal→03High [17:54:05] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) p:05Normal→03High [17:54:10] 10Analytics, 10Analytics-Kanban, 10EventBus, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), and 2 others: extensions/EventBus/includes/EventBusRCFeedEngine.php:45 PHP Notice: Undefined index: eventServiceName - https://phabricator.wikimedia.org/T216561 (10Ottomata) p:05Normal→03High [17:55:39] 10Analytics, 10EventBus, 10serviceops, 10Services (watching): Datacenter aware configs for EventGate topic prefixes - https://phabricator.wikimedia.org/T213564 (10Ottomata) I think we will just have different values.yaml files in prod that specify --set topic_prefix=XXXX appropriately. 
[17:56:42] PROBLEM - eventbus grafana alert on icinga2001 is CRITICAL: CRITICAL: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is alerting: EventBus POST Response Status alert. [17:57:41] elukey: I was actually just pinging the rest of the analysts five minutes ago to make sure I knew how they felt :) [17:57:53] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 3 others: Modern Event Platform: Stream Connectors - https://phabricator.wikimedia.org/T214430 (10Ottomata) [17:58:03] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (10Ottomata) p:05Normal→03High [17:58:05] but it seems like we're ready to rip the bandage off, if you get what I mean :) [17:58:08] 10Analytics: Alert on validation errors on new stream intake service - https://phabricator.wikimedia.org/T210457 (10Ottomata) p:05Normal→03High [17:58:34] aka we seem to be ready to nuke that thing hahaha [17:58:47] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 5 others: Modern Event Platform: Stream Intake Service: EventGate security review - https://phabricator.wikimedia.org/T208251 (10Ottomata) p:05Normal→03High [17:58:59] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Later), 10Services (later): Make schemas use required $schema property with absolute path (not absolute URL) to the schema - https://phabricator.wikimedia.org/T208361 (10Ottomata) p:05Normal→03High [17:59:11] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Watching / External), and 2 others: Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (10Ottomata) p:05Normal→03High [17:59:36] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog 
(Watching / External), and 2 others: RFC: Modern Event Platform: Stream Intake Service - https://phabricator.wikimedia.org/T201963 (10Ottomata) Can we close this? [18:00:46] 10Analytics, 10EventBus, 10Internet-Archive, 10MediaWiki-extensions-WikimediaEvents, and 2 others: Edits to Flow pages result in a page-links-change event with no performer - https://phabricator.wikimedia.org/T216726 (10Pchelolo) I'm not sure how Flow works internally, which hooks are called and why doesn'... [18:01:19] neilpquinn: <3 [18:01:38] RECOVERY - eventbus grafana alert on icinga2001 is OK: OK: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is not alerting. [18:02:33] 10Analytics, 10EventBus, 10Internet-Archive, 10MediaWiki-extensions-WikimediaEvents, and 2 others: Edits to Flow pages result in a page-links-change event with no performer - https://phabricator.wikimedia.org/T216726 (10Cyberpower678) Why am I subscribed to this? [18:03:13] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10Cmjohnson) [18:05:43] elukey: okay, I have now heard from everybody, and we are officially okay with you euthanizing dbstore1002 :) [18:06:22] * elukey dances [18:06:26] \o/ [18:10:47] 10Analytics, 10Contributors-Analysis, 10Product-Analytics: Make an Analytics Data Lake table to provide meta info about wikis - https://phabricator.wikimedia.org/T184576 (10Neil_P._Quinn_WMF) a:03Neil_P._Quinn_WMF [18:13:03] 10Analytics, 10Analytics-Kanban, 10User-Marostegui: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589 (10Neil_P._Quinn_WMF) [18:13:08] 10Analytics, 10Product-Analytics, 10Research, 10WMDE-Analytics-Engineering, and 2 others: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (10Neil_P._Quinn_WMF) [18:13:16] 10Analytics, 10Product-Analytics, 10Research, 10WMDE-Analytics-Engineering, and 3 others: 
Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (10Neil_P._Quinn_WMF) [18:16:32] PROBLEM - eventbus grafana alert on icinga2001 is CRITICAL: CRITICAL: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is alerting: EventBus POST Response Status alert. [18:19:02] RECOVERY - eventbus grafana alert on icinga2001 is OK: OK: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is not alerting. [18:29:31] ottomata: CDH6 for Q1/Q2 ? [18:29:32] :D [18:30:23] * elukey off! [18:35:52] 10Analytics, 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10Cmjohnson) a:05Cmjohnson→03ayounsi @arzhel This server needs to go into the cloud-support vlan but it's not available to me for row C.... [18:47:38] 10Analytics, 10Analytics-Kanban, 10EventBus, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), and 2 others: extensions/EventBus/includes/EventBusRCFeedEngine.php:45 PHP Notice: Undefined index: eventServiceName - https://phabricator.wikimedia.org/T216561 (10thcipriani) @Ottomata I backported your fix and deplo... [18:58:22] 10Analytics, 10EventBus, 10MediaWiki-Core-Testing, 10Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (10hashar) Found it. The Selenium test suite uses Mocha and eventually for the Page test it dies with: (node:1165) UnhandledPromis... [19:01:51] 10Analytics, 10EventBus, 10MediaWiki-Core-Testing, 10Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (10Pchelolo) Thank you a lot @hashar . I guess this ticket can be closed, however I have a last question - do you think we could som... 
[19:07:57] 10Analytics, 10EventBus, 10MediaWiki-Core-Testing, 10Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (10hashar) Definitely, the test suite is broken. Quibble is fine itself, it just run commands. The issue is somewhere in the webdriv... [19:21:34] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10cloud-services-team (Kanban): CloudVPS: cloudvirtan1002 puppet failures due to memory allocation issues? - https://phabricator.wikimedia.org/T216707 (10Ottomata) Ok, deleting all instances for now... [19:21:57] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Presto cluster online and usable with test data pushed from analytics prod infrastructure accessible by Cloud (labs) users - https://phabricator.wikimedia.org/T204951 (10Ottomata) Deleting all instances because {T216707}. [19:30:16] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10Ottomata) Ah, probably the proper thing to do would be to get the jars from archiva (with wget, rsync or even git fat) and then include... [19:33:39] 10Analytics, 10Analytics-Kanban, 10EventBus, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), and 2 others: extensions/EventBus/includes/EventBusRCFeedEngine.php:45 PHP Notice: Undefined index: eventServiceName - https://phabricator.wikimedia.org/T216561 (10Ottomata) 05Open→03Resolved [19:40:34] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10EBernhardson) I'd really rather not write custom code to handle all of the dependencies myself. We have a dependency resolution layer in... 
[19:43:37] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10Ottomata) Hm, I guess it's just generally how we do things. Not opposed to a working ivysettings.xml. [19:52:45] 10Analytics, 10MediaWiki-extensions-WikimediaEvents, 10The-Wikipedia-Library, 10Patch-For-Review: ExternalLinksChange Logging instrumentation is completely broken - https://phabricator.wikimedia.org/T162365 (10Milimetric) This latest change I pushed removes any EventLogging instrumentation from the SpamBla... [20:03:27] ottomata: i haven't been able to ssh into deployment-aqs01.deployment-prep.eqiad.wmflabs or wikimetrics-01.eqiad.wmflabs but I can ssh into other labs machines like tools-login [20:03:38] I'm using bastion-eqiad.wmflabs.org [20:04:37] I see, tools is a different domain, wonder what's going on... [20:23:47] hm [20:28:54] PROBLEM - eventbus grafana alert on icinga2001 is CRITICAL: CRITICAL: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is alerting: EventBus POST Response Status alert. [20:30:29] milimetric: it looks like they've changed recommendations since I last checked [20:30:29] https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyJump_ssh_option_(recommended) [20:30:49] but for non tool labs, the proper bastion is primary.bastion.wmflabs.org [20:31:19] RECOVERY - eventbus grafana alert on icinga2001 is OK: OK: EventBus ( https://grafana.wikimedia.org/d/000000201/eventbus ) is not alerting. [20:32:36] 10Analytics, 10Product-Analytics: Timestamp column in EventLogging tables have incompatible collation - https://phabricator.wikimedia.org/T216658 (10Tbayer) >>! In T216658#4972706, @Milimetric wrote: > This sucks but we're not likely to work on it, as we're moving away from mysql. We don't want to be mean tho... 
[20:33:12] thanks ottomata, I searched all over wikitech, no idea how you found that [20:33:35] milimetric: left column, Cloud VPS Help -> Access [20:34:46] :) I just searched for "bastion", "proxy", etc... I guess search's still not that great [20:46:55] 10Analytics, 10Product-Analytics: Timestamp column in EventLogging tables have incompatible collation - https://phabricator.wikimedia.org/T216658 (10Tbayer) >>! In T216658#4973620, @Tbayer wrote: >>>! In T216658#4972706, @Milimetric wrote: >> This sucks but we're not likely to work on it, as we're moving away... [20:49:16] 10Analytics, 10Product-Analytics: Timestamp column in EventLogging tables have incompatible collation - https://phabricator.wikimedia.org/T216658 (10Ottomata) I haven't looked into it, but the naming of PrefUpdate_5563398_15423246 is unusual. IIRC, tables with an extra suffix are some kind of backup or archiv... [21:02:31] (03PS1) 10Fdans: [wip] Refactor dashboard metric widget [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/492060 (https://phabricator.wikimedia.org/T187806) [21:03:28] 10Quarry, 10Security: Use data attributes instead of unsafe-inline var definitions within Quarry template files - https://phabricator.wikimedia.org/T216653 (10Bawolff) I should emphasize of course, that quarry has a very low risk profile, so its really not worth worrying too much (As much as I love better secu... [21:03:43] (03PS2) 10Fdans: [wip] Refactor dashboard metric widget [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/492060 (https://phabricator.wikimedia.org/T187806) [21:06:16] (03CR) 10jerkins-bot: [V: 04-1] [wip] Refactor dashboard metric widget [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/492060 (https://phabricator.wikimedia.org/T187806) (owner: 10Fdans) [21:06:29] yeayea whatever [21:20:19] 10Analytics, 10Product-Analytics: Timestamp column in EventLogging tables have incompatible collation - https://phabricator.wikimedia.org/T216658 (10Tbayer) >>! 
In T216658#4973662, @Ottomata wrote: > I haven't looked into it, but the naming of PrefUpdate_5563398_15423246 is unusual. IIRC, tables with an extra... [21:27:58] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move AQS to nodejs 10 - https://phabricator.wikimedia.org/T210706 (10Milimetric) Started trying to help on this, ran into two problems: 1. keyspaces aren't created yet on deployment-aqs01. That means aqs didn't really start successfully but systemd says... [21:40:03] 10Analytics, 10Product-Analytics, 10Research, 10WMDE-Analytics-Engineering, and 2 others: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (10Milimetric) >>! In T172410#4965393, @jcrespo wrote: > While this may look like an annoyance, we don't usually talk a... [21:45:44] 10Analytics: [Bug] Type mismatch for a few other schemas - https://phabricator.wikimedia.org/T216771 (10Milimetric) [21:50:52] 10Analytics: Coarse alarm on data quality for refined data based on entropy calculations - https://phabricator.wikimedia.org/T215863 (10Milimetric) [21:52:44] 10Analytics, 10Product-Analytics, 10Research, 10WMDE-Analytics-Engineering, and 2 others: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (10leila) DEAL! :) [21:52:59] 10Analytics: Coarse alarm on data quality for refined data based on entropy calculations - https://phabricator.wikimedia.org/T215863 (10Milimetric) [21:54:02] bd808: hiya [21:54:13] do you think we should separate out user-agent and api-user-agent? [21:54:14] https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/491887/4/includes/api/ApiMain.php [21:56:38] ottomata: the existing code uses $this->getUserAgent() which is the concatenation of both. 
[21:56:49] ya [21:57:20] bd808: we were going to put user-agent into this headers map [21:57:23] since it is a header [21:57:28] (03PS2) 10Milimetric: [WIP] Create geoeditors edits monthly and yearly [analytics/refinery] - 10https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) [21:57:32] but, the concatenated value isn't honestly a header [21:57:45] so i think that's why petr did it this way [21:58:10] whatever stuff you are using is going to have to be changed to adapt to new schema anyway (we will help), just wondering if this is ok, or if it is better or worse [21:58:11] *shrug* you are trying to fit context specific data into a context free box [21:58:27] aye bd808 if we want something other than the header [21:58:30] we can make an event-specific field [21:58:37] api_user_agent or something [21:58:53] that is not in the header map [21:59:54] I don't think it's a big deal in any direction. the original intent of collecting the UA data was to do bucketing/reporting based on UA, but that got shot down when Legal decided that UA == PII after we started implementation [22:00:19] but I do think we probably want to collect Api-user-agent somewhere [22:01:42] I can't remember if this specific data is important for the user facing SpecialPage or not... (or that's even fed by this logging) [22:02:21] bd808 i highly doubt it is [22:02:34] bd808: by default all data is only going to be kept 90 days anyway [22:02:51] it's not. I was thinking about ApiFeatureUsage but it is a separate thing [22:02:55] but, if you don't need this data [22:03:01] it's best not to log it [22:03:14] api user agents sounds like interesting stuff tho [22:03:47] Ok, if it's ok with you then, we'll keep it like this with different fields in the map for each header [22:04:03] yeah, that should be fine [22:04:04] that will make querying interesting too then, you can find out what user agents are or aren't being overridden more easily maybe [22:14:50] thanks bryan!
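[editor's note] The header-map choice agreed above (one field per header rather than the concatenated value) can be illustrated with a rough Python sketch. This is not the EventBus or ApiMain.php code: the function names, the lower-cased key style, and the order used for the concatenated value are all assumptions made for illustration only.

```python
from typing import Dict, Mapping

# Headers the sketch keeps; the real schema may track a different set (assumption).
WANTED_HEADERS = ("user-agent", "api-user-agent")


def build_header_map(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Store each header under its own lower-cased key instead of one merged string."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {name: lowered[name] for name in WANTED_HEADERS if name in lowered}


def concatenated_user_agent(headers: Mapping[str, str]) -> str:
    """Hypothetical stand-in for the combined value discussed in the chat.

    The ordering (Api-User-Agent first, then User-Agent) is an assumption.
    """
    parts = [headers.get("api-user-agent", ""), headers.get("user-agent", "")]
    return " ".join(part for part in parts if part)


def overrides_user_agent(headers: Mapping[str, str]) -> bool:
    """With separate fields, 'did the client send Api-User-Agent?' is a key lookup."""
    return "api-user-agent" in headers
```

With separate keys, a query can tell whether a client overrode its user agent by checking for the `api-user-agent` field, which the merged string would not allow without parsing.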
[22:15:02] and sorry i keep putting an 'i' in your name in comments! [22:28:47] ottomata: :) there are too many Br?[iy]r?[ao]n's to keep straight [22:52:08] bd808: that's a tshirt waiting to happen [22:52:22] \'Br?[iy]r?[ao]n\' [22:53:12] (03PS1) 10Milimetric: Add suggestion to use x1 if db not found [analytics/refinery] - 10https://gerrit.wikimedia.org/r/492208 [22:53:14] (03PS1) 10Milimetric: Use db_mapping to find the hostname [analytics/refinery] - 10https://gerrit.wikimedia.org/r/492209 (https://phabricator.wikimedia.org/T215290) [23:42:26] 10Analytics, 10Dumps-Generation, 10Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (10Melderick) @ArielGlenn I see. Thanks for the information. @Nicolastorzec Yes, I don't worry about configuring cron to automatica... [23:49:51] 10Analytics, 10Analytics-Cluster, 10User-EBernhardson: Setup ivysettings.xml for sourcing spark job dependencies from archiva - https://phabricator.wikimedia.org/T216093 (10EBernhardson) This actually turned out quite trivial, the ivysettings.xml needed: ` ...