[01:06:18] Analytics, Analytics-Kanban: Provide edit tags in the Data Lake edit data - https://phabricator.wikimedia.org/T161149 (Neil_P._Quinn_WMF) >>! In T161149#5034672, @Nuria wrote: > So even if change tags are not incorporated to march snapshot they are available in hadoop and scooped monthly so you could cal...
[04:43:26] (CR) Nuria: Update refinery sqoop to use dedicated labsdb host (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[06:49:47] Analytics, Knowledge-Integrity, Research, Epic, Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (RyanSteinberg) >>! In T213969#5028074, @leila wrote: > @Lauren.maggio can you make sure someone on your team is ready to start asse...
[06:52:27] Analytics, ChangeProp, Community-Tech, Core Platform Team, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Joe) @Mooeypoo thanks for clarifying requirements of the current project better, it's much clearer no...
[07:37:00] Analytics-Kanban, Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Amire80) Somewhat related: {T207171}. And I join the people who are asking to make this information available. Information about countries where the n...
[08:02:00] morning!
[08:02:09] so superset upstream seems responsive to some bugs opened
[08:02:36] Good news Luca :)
[08:02:42] (there seems to be only one active committer though, that is also the creator of Airflow)
[08:02:43] Good morning :)
[08:02:50] bonjour!
[08:02:52] :)
[08:03:41] joal: do you remember the edit filter that we were checking yesterday? (rather than no data returned, it was "count")
[08:03:53] IIRC it wasn't set up correctly in the first place
[08:03:58] but I don't recall how we fixed it
[08:04:11] (I am trying a fix and I want to see if it works for regular data)
[08:04:16] elukey: start-time was incorrect I think
[08:05:03] yeah it was infinite, I put Feb 1st now but still nothing.. also IIRC the time granularity?
[08:05:22] not that I remember
[08:06:09] can you quickly check on analytics-tool1004 when you have time?
[08:06:52] elukey: on superset.wikimedia.org, using `since 2 years` in `edit filter` chart provides results
[08:07:10] elukey: anything special you'd like me to check on analytics-tool1004?
[08:11:17] (PS11) Joal: Update refinery sqoop to use dedicated labsdb host [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550)
[08:11:42] joal: ah yes with years it works :)
[08:11:57] joal: the bubble chart issue that you found should be fixed
[08:12:21] together with the filter box one (that was not showing up in your pageview dashboard for example)
[08:12:27] maps are still broken
[08:12:33] looking
[08:13:46] hm - I don't recall the filter view missing, but he :)
[08:14:21] it was https://github.com/apache/incubator-superset/issues/7074
[08:14:40] the filter box was reporting "IndexError: list index out of range"
[08:15:00] ok ! I didn't notice (my bad :S) I'm not a good tester
[08:15:19] nono I don't recall if I applied some manual fixes or not
[08:15:30] I noticed it when upgrading to 0.31
[08:15:37] but now it is completely solved
[08:16:17] \o/ - This is awesome you do elukey :)
[08:16:28] in theory if they fix the world map bubble and the other little issue with filter box (key error on no data returned) we should be good
[08:18:25] elukey: I confirm bubbles bubble!
[08:18:25] :)
[08:18:29] niceeee
[08:19:15] https://www.youtube.com/watch?v=s7IYR_rELyE
[08:19:18] elukey: --^
[08:21:32] ahahahahah
[08:29:53] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (MoritzMuehlenhoff) I confirm that she has an NDA in place, so I've added uid=julianglen to the nda group in LDAP.
[08:46:39] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (elukey) All right everything should be set! If you have trouble signing in please make sure that you also test `Julia.glen` :)
[08:46:52] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (elukey)
[09:47:18] I found the problem with TLS in hadoop test
[09:47:39] basically due to a puppet issue one yarn node manager didn't have the TLS config
[09:47:46] and one single shuffler was causing a mess
[09:47:51] * elukey cries in a corner
[09:55:48] :S
[10:00:15] but now hive seems to run fine, so I am ready to test an oozie job
[10:00:30] \o/ !!! ninja-catch elukey :)
[10:01:45] elukey: this is also interesting learning: if a single node has wrong config, all use non-secured :S
[10:02:28] joal: nono in this case the errors were that the shuffler was erroring due to TLS handshake not understood
[10:02:46] and the reducers impacted were causing retries etc..
[10:02:53] ending up in failure (eventually)
[10:02:58] * joal feels like a shuffler- handshake not understood :S
[10:03:00] so all good from this side
[10:03:20] my bad, didn't explain very well
[10:03:46] now the shuffler only listens for TLS traffic, rather than regular TCP/HTTP one
[10:03:49] on the same port
[10:04:04] so the reducers call a https:// link when they need data from mappers, using the shuffler
[10:04:18] on one host the shuffler was still configured for TCP/HTTP
[10:04:30] so when the reducer was sending a TLS client hello message to start the handshake
[10:04:31] therefore failing !!! - Makes sense
[10:04:43] the shuffler was erroring out since it didn't get the message
[10:04:50] exactly
[10:04:51] :)
[10:05:16] man - pfff - Super good catch
[10:05:57] the logs in this case are good if you already experienced something similar, otherwise really confusing
[10:06:30] the other interesting bit in https://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html is
[10:06:37] IMPORTANT: Using encrypted shuffle will incur in a significant performance impact. Users should profile this and potentially reserve 1 or more cores for encrypted shuffle.
[10:06:53] Yes elukey - I recall having read that
[10:07:09] but I am not sure if this still holds, the JVM is way better in handling TLS nowadays
[10:07:31] elukey: I assume we're gonna experience and see :)
[10:08:22] in theory this could be the first big change to our prod cluster, enabling TLS where needed
[10:08:32] then stronger auth etc..
[10:08:43] +1 elukey :)
[10:09:31] :)
[10:43:18] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (JAllemandou) Only thing remaining here that has not been worked is a field about being a bot or not instead of relying on the groups. We have `user_is_anonymous` and `user_is_bo...
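elukey's diagnosis above comes down to configuration drift. One NodeManager's shuffle handler was still serving plain HTTP while every reducer fetched map output over https://. The sketch below is a minimal pre-flight check that would flag a host like that. It assumes the per-host `mapred-site.xml` values have already been collected into a dict. How they are gathered, the host names, and the helper name `find_shuffle_tls_outliers` are all hypothetical. The property `mapreduce.shuffle.ssl.enabled` is the one documented on the Hadoop EncryptedShuffle page linked above, where its default is false.

```python
from __future__ import annotations

from collections import Counter

# Property name from the Hadoop EncryptedShuffle docs (set in mapred-site.xml).
SHUFFLE_TLS_PROPERTY = "mapreduce.shuffle.ssl.enabled"


def find_shuffle_tls_outliers(node_configs: dict[str, dict[str, str]]) -> list[str]:
    """Return hosts whose shuffle TLS setting differs from the cluster majority.

    node_configs maps a (hypothetical) NodeManager host name to the
    mapred-site.xml properties collected from that host. A missing
    property is treated as Hadoop's documented default, "false".
    """
    settings = {
        host: conf.get(SHUFFLE_TLS_PROPERTY, "false").strip().lower()
        for host, conf in node_configs.items()
    }
    if not settings:
        return []
    majority, _count = Counter(settings.values()).most_common(1)[0]
    return sorted(host for host, value in settings.items() if value != majority)


if __name__ == "__main__":
    # Hypothetical cluster where puppet did not deploy the TLS config to one host.
    cluster = {
        "worker-a": {SHUFFLE_TLS_PROPERTY: "true"},
        "worker-b": {SHUFFLE_TLS_PROPERTY: "true"},
        "worker-c": {},  # property missing: this shuffle handler still speaks plain HTTP
    }
    print(find_shuffle_tls_outliers(cluster))  # ['worker-c']
```

This sketch compares each host against the cluster majority. During a deliberate rollout you would probably compare against the intended value instead.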
[10:53:49] ahh so oozie and hive do read mapred-site.xml
[10:53:58] finally one oozie job working
[10:54:00] with TLS
[10:54:03] \o/
[10:54:21] elukey: hat down for you :)
[11:12:53] Analytics-Kanban, Patch-For-Review: Coordinate work on minor changes for Edit Data Quality - https://phabricator.wikimedia.org/T213603 (JAllemandou) Data check details: - I ran mediawiki-history-check on data generated by this patch and failures are coming from expected changes: - No failure for user-...
[11:14:55] Analytics-Kanban, Patch-For-Review: Coordinate work on minor changes for Edit Data Quality - https://phabricator.wikimedia.org/T213603 (JAllemandou) The validation checks above are done without the patch for page-history refactor (https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/493390) becau...
[11:20:17] * elukey lunch
[11:28:44] Analytics, ChangeProp, Community-Tech, Core Platform Team, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (Pchelolo)
[12:21:54] Analytics, ExternalGuidance, Product-Analytics, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (Pginer-WMF) Thanks @chelsyx, this looks good! I went through the report in more d...
[13:18:28] TIL: the journalnode protocol is not built on top of HTTP
[13:18:42] but it is IPC
[13:18:59] so in theory both yarn and hdfs TLS config could be avoided (in theory)
[13:19:10] since we care about the http in the shuffler
[13:21:08] but with TLS everywhere it might be better if we want to re-enable in the future httpfs
[13:21:11] mmmmm
[13:38:54] Analytics, Dumps-Generation, WikiCite, Wikidata, Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) The xml/sql dumps run twice a month; the 'full' dumps, containing all content for...
[14:10:34] (PS11) Joal: Update mediawiki-reconstruction with log info [analytics/refinery/source] - https://gerrit.wikimedia.org/r/493012
[14:10:52] milimetric: hello - Do you work today?
[14:11:38] hey joal, yes, sorry, had another crazy morning
[14:12:02] how can I help joal
[14:12:12] No prob milimetric, but since I'll leave soon to get kids, I'd like to grab you for a few minutes before
[14:12:22] batcave?
[14:12:38] going to the batcave to find out how you're making fun of me ;)
[14:19:21] Analytics, Fundraising-Backlog, Product-Analytics: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata) So, wow. EventLogging's schemaschema.json is forcing that the array items is itself an array, which only validates that each element in the...
[14:24:06] Analytics, Analytics-Kanban, Patch-For-Review: Update wikimedia-history revision data with deleted field (and find it a new name?) - https://phabricator.wikimedia.org/T178587 (Milimetric) `revision_deleted_by_page_deleted`? It's so long and ugly but maybe it's clear enough and doesn't have the poten...
[14:53:45] joal: confirmed that change was deployed to all projects on Thursday September 18th, 2014. I'd use 2014-09-19 as the cutoff. The release process is super well documented: https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap#Schedule_for_the_deployments and https://www.mediawiki.org/wiki/MediaWiki_1.24/wmf21#Core_changes shows the change is indeed included in 1.24/wmf21
[14:56:04] (CR) Milimetric: "skim over my comments from PS9 when you go over this again" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/493390 (https://phabricator.wikimedia.org/T190434) (owner: Joal)
[15:00:49] Analytics-Kanban, Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Nuria) @Yair_rand The guidelines are not confidential, they apply to a set of systems that only people working at WMF (and research collaborators) have...
[15:02:11] Analytics, Fundraising-Backlog, Product-Analytics: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata) Oof: `lang=php public function getSequenceChildRef( $i ) { TODO: make this conform to draft-03 by also allowing single object ` https:/...
[15:03:09] Analytics, WMDE-Analytics-Engineering, Wikidata, Wikidata-Campsite: Track WDQS updater UA in wikidata-special-entitydata grafana dashboard - https://phabricator.wikimedia.org/T218998 (Addshore)
[15:09:35] Analytics, Fundraising-Backlog, Product-Analytics: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata) I can make the EventLogging extension accept the proper items schema object format, but I can't seem to make it properly validate instances...
[15:15:50] Analytics, Product-Analytics: Ingest data from PrefUpdate EventLogging schema into Druid - https://phabricator.wikimedia.org/T218964 (ovasileva) As mentioned in the task, since we just did the first deployment for AMC, it would be really helpful if we could monitor the opt-in rate as we proceed with comm...
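Ottomata's comments on T218617 turn on how JSON Schema draft-03 treats `items`. A single schema object applies to every element of the array. An array of schemas is positional ("tuple typing"): element i is checked against schema i, and any extra elements fall under `additionalItems`, which accepts anything by default. The sketch below shows only that distinction, using a deliberately tiny, hypothetical validator that understands nothing but `type`. It is not EventLogging's validator and not the `jsonschema` library.

```python
from __future__ import annotations

from typing import Any

# Minimal mapping from JSON Schema primitive type names to Python types.
# Only a handful of types are handled: this illustrates `items`, nothing more.
_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def _matches_type(value: Any, schema: dict[str, Any]) -> bool:
    expected = schema.get("type")
    if expected is None:
        return True  # an empty schema accepts anything
    if isinstance(value, bool) and expected in ("integer", "number"):
        return False  # bool subclasses int in Python, but JSON keeps them distinct
    return isinstance(value, _TYPES[expected])


def array_items_valid(instance: list[Any], items: dict[str, Any] | list[dict[str, Any]]) -> bool:
    """Check an array against a draft-03 style `items` keyword (type checks only)."""
    if isinstance(items, dict):
        # Single schema: every element must match it.
        return all(_matches_type(v, items) for v in instance)
    # Tuple typing: element i is checked against items[i]. Elements beyond
    # len(items) are additional items; with no additionalItems constraint
    # they are accepted without any check.
    return all(_matches_type(v, s) for v, s in zip(instance, items))


if __name__ == "__main__":
    data = ["ok", 42]
    print(array_items_valid(data, {"type": "string"}))    # False: 42 is not a string
    print(array_items_valid(data, [{"type": "string"}]))  # True: only position 0 is checked
```

The second call shows why an array-form `items` can quietly validate far less than the schema author intended.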
[15:17:39] Analytics, Dumps-Generation, WikiCite, Wikidata, Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Rosiestep) @ArielGlenn - Reflecting on what you've said, the 1st and the 20th are fine for Wo...
[15:34:16] Analytics, ChangeProp, Community-Tech, Core Platform Team, and 6 others: Provide the ability to have time-delayed or time-offset jobs in the job queue - https://phabricator.wikimedia.org/T218812 (aezell) Thanks for the insight @Joe. I appreciate your giving some thought to this. I'll share some...
[15:42:45] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (Julia.glen) julia.glen is the one. Thanks!
[16:01:37] ping elukey standup and joal ?
[16:03:13] Analytics, Analytics-Kanban, Fundraising-Backlog, Product-Analytics, Patch-For-Review: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata)
[16:03:41] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (Nuria) Did you test that you had access @Julia.glen ? Can you confirm things are working?
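T216160 proposes moving wikidata-entities dump generation from a fixed weekday to fixed days of the month, and Rosiestep notes that the 1st and the 20th would work. The sketch below shows how the two scheduling rules differ. It assumes run days of 1 and 20 purely for illustration. The real schedule is whatever the task settles on, and the function names are hypothetical.

```python
from __future__ import annotations

import datetime as dt

# Hypothetical run days, taken from the days mentioned in the thread.
RUN_DAYS = (1, 20)


def next_day_of_month_run(after: dt.date, run_days: tuple[int, ...] = RUN_DAYS) -> dt.date:
    """Return the first date strictly after `after` whose day-of-month is in run_days.

    Assumes every run day is <= 28, so it exists in every month.
    """
    year, month = after.year, after.month
    while True:
        for day in sorted(run_days):
            candidate = dt.date(year, month, day)
            if candidate > after:
                return candidate
        # No remaining run day in this month: move on to the next month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def next_weekday_run(after: dt.date, weekday: int = 0) -> dt.date:
    """Return the first date strictly after `after` that falls on `weekday` (Monday=0)."""
    days_ahead = (weekday - after.weekday() - 1) % 7 + 1
    return after + dt.timedelta(days=days_ahead)


if __name__ == "__main__":
    day = dt.date(2019, 1, 25)
    for _ in range(4):
        day = next_day_of_month_run(day)
        print(day.isoformat())  # 2019-02-01, 2019-02-20, 2019-03-01, 2019-03-20
```

A day-of-month rule lands on the same calendar dates every month, while a weekday rule drifts across them. That predictability is the trade-off discussed in the task.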
[16:10:10] Thanks milimetric for the deploy-date check - I'll document all that with comments in the code :)
[16:11:25] Analytics, Analytics-Kanban, Fundraising-Backlog, Product-Analytics, Patch-For-Review: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata) a:chelsyx→Ottomata
[16:11:41] Analytics, Analytics-EventLogging, Analytics-Kanban, Fundraising-Backlog, and 2 others: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (Ottomata)
[16:14:52] Analytics, Analytics-Kanban, Patch-For-Review: Eventlogging mysql consumer restarting for several hours due to schema parsing errors - https://phabricator.wikimedia.org/T218831 (Nuria) a:Nuria
[16:23:59] (PS4) Joal: Update mw user-history timestamps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/497604 (https://phabricator.wikimedia.org/T218463)
[16:49:05] Analytics, Analytics-Kanban, Patch-For-Review: Update wikimedia-history revision data with deleted field (and find it a new name?) - https://phabricator.wikimedia.org/T178587 (JAllemandou) And after saying it, and saying it again, I'm actually ok to go with `revision_is_page_deleted`, even I think it...
[16:52:29] (CR) Nuria: [C: +1] "Ready to merge I think when we have tested this version of code works." [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[16:53:51] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (Nuria) @JAllemandou what is the difference between user_is_bot_by_name and user_is_bot_by_group?
[16:56:58] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (Julia.glen) Yes. Julia.glen is the one that works. Thanks
[16:59:40] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (JAllemandou) @nuria: we set `user_is_bot_by_name` using a regex: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/ana...
[17:03:07] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (Nuria) then i think user_is_flagged_as_bot_in_mediawiki_group (please reword) but something longer and more explicit
[17:08:57] Analytics, Analytics-Cluster, Analytics-Kanban: Access to yarn.wikimedia.org for julia.glen - https://phabricator.wikimedia.org/T218815 (Nuria) Open→Resolved
[17:20:11] Analytics, Analytics-Wikistats: Feedback on hive table mediawiki_history by Erik Z - https://phabricator.wikimedia.org/T178591 (JAllemandou) the `bot_by_name` and `bot_by_group` terminology is the one we decided to use a while ago - Let's see if others have opinion :)
[17:24:06] Analytics, Analytics-Kanban, EventBus, Operations, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (Ottomata) I tcpdumped while reproducing this on kubestage1001 and the kafka brokers. I could see the looped metadata requ...
[17:31:25] * elukey off!
[17:31:39] Bye elukey - enjoy your weekend :)
[17:31:46] elukey: kudos again on TLS :)
[17:38:25] Analytics, Analytics-Kanban, EventBus, Operations, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (Ottomata) Interesting fact: kafka-jumbo1003 and kafka-jumbo1006 are the only brokers in this cluster that are not in Row...
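The naming discussion on T178591 is about two different signals. `user_is_bot_by_name` is derived from a regex over the user name. The actual pattern lives in refinery-source behind the truncated link above and is not reproduced here. `user_is_bot_by_group` reflects membership in a MediaWiki bot group. The sketch below is hypothetical and only makes the distinction concrete. The regex, the `User` type, and the assumption that the relevant group is MediaWiki's standard `bot` group are illustrative, not the refinery definitions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical name heuristic: user names ending in "bot" (case-insensitive).
# The real refinery regex is different and more thorough.
BOT_NAME_PATTERN = re.compile(r"bot$", re.IGNORECASE)

# Assumed group set; "bot" is MediaWiki's standard bot user group.
BOT_GROUPS = frozenset({"bot"})


@dataclass
class User:
    name: str
    groups: set[str] = field(default_factory=set)


def is_bot_by_name(user: User) -> bool:
    """Heuristic signal: the user name looks like a bot name."""
    return BOT_NAME_PATTERN.search(user.name) is not None


def is_bot_by_group(user: User) -> bool:
    """Explicit signal: the account belongs to a bot user group."""
    return not BOT_GROUPS.isdisjoint(user.groups)


if __name__ == "__main__":
    for u in (User("ExampleBot"), User("Example", {"bot"}), User("Robotics fan")):
        print(u.name, is_bot_by_name(u), is_bot_by_group(u))
    # ExampleBot True False
    # Example False True
    # Robotics fan False False
```

The two flags can disagree in either direction, which is why Nuria asks for a more explicit field name.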
[17:48:09] joal: o/
[18:31:11] Analytics, Analytics-Kanban, EventBus, Operations, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (Ottomata) Ohooohooooo! Could this be related to IPv6? On a [[ https://logstash.wikimedia.org/goto/1a4ba12d778db9ae95d75db...
[19:02:59] Analytics, Analytics-Kanban, EventBus, Operations, and 3 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (Ottomata) From https://github.com/edenhill/librdkafka/wiki/FAQ > librdkafka will use the system resolver to resolve the...
[19:44:24] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on notebook1004 is CRITICAL: CRITICAL
[20:25:05] Analytics, EventBus, Services (watching): EventGate should be able to configure hasty and guaranteed kafka producers individually - https://phabricator.wikimedia.org/T219032 (Ottomata)
[21:48:14] Analytics, Dumps-Generation, WikiCite, Wikidata, Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Melderick) Hello, can't we try to have a bit of both worlds ? Keep a weekly wikidata dump and...
[22:23:09] Analytics: "Latest X" filter in Turnilo picks the wrong dates - https://phabricator.wikimedia.org/T219040 (Tbayer)
[23:45:35] Analytics, Analytics-EventLogging, Analytics-Kanban, Fundraising-Backlog, and 2 others: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (chelsyx) Sorry to hear that @Ottomata :( Let me know when you need any changes on my side.
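On T218268, Ottomata quotes the librdkafka FAQ (cut off above) saying that librdkafka uses the system resolver for broker host names. That is why IPv6 comes up: a broker name with both A and AAAA records resolves to more than one address family. The sketch below uses only Python's standard `socket.getaddrinfo` to group a host's resolved addresses by family, so you can see whether IPv6 addresses are in play. The helper name is hypothetical, the host and port are placeholders, and this does not reproduce librdkafka's own address selection.

```python
from __future__ import annotations

import socket


def addresses_by_family(host: str, port: int) -> dict[str, list[str]]:
    """Resolve host with the system resolver and group unique IPs by address family."""
    grouped: dict[str, list[str]] = {"ipv4": [], "ipv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET:
            key = "ipv4"
        elif family == socket.AF_INET6:
            key = "ipv6"
        else:
            continue  # ignore any other address families
        ip = sockaddr[0]
        if ip not in grouped[key]:
            grouped[key].append(ip)
    return grouped


if __name__ == "__main__":
    # A numeric address resolves without DNS, so this runs offline.
    print(addresses_by_family("127.0.0.1", 9092))  # {'ipv4': ['127.0.0.1'], 'ipv6': []}
```

Against a real broker host name, a non-empty `ipv6` list would mean clients relying on the system resolver may try IPv6 addresses for that broker.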