[00:21:05] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 2 others: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (Neil_P._Quinn_WMF) I discussed this with @Nuria in a meeting today and she clarified that T212386 //is// important w...
[00:30:18] Analytics, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (Neil_P._Quinn_WMF) >>! In T212386#4886726, @Neil_P._Quinn_WMF wrote: > Product A...
[00:37:03] Analytics-EventLogging, Analytics-Kanban, EventBus, Product-Analytics, and 4 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (Neil_P._Quinn_WMF)
[01:03:12] Analytics, Knowledge-Integrity, Research, Epic: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (DarTar) @bmansurov thanks for the detailed explanation. I defer to @tizianopiccardi and @Miriam on the best strategy to check for potential regressions f...
[01:04:56] Analytics, Analytics-Kanban, Product-Analytics: Bug: can't make a YoY time series chart in Superset - https://phabricator.wikimedia.org/T210687 (Tbayer) OK, that works, thanks ([[https://superset.wikimedia.org/superset/explore/?form_data=%7B%22datasource%22%3A%22352__druid%22%2C%22viz_type%22%3A%22ti...
[01:14:42] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Neil_P._Quinn_WMF) @Milimetric How quickly will you be able to set the cube up in Turnilo once we provide the transform spec? We're trying to prioritize...
[01:39:22] Analytics, Product-Analytics: Columns named "dt" in the Data Lake have different formats - https://phabricator.wikimedia.org/T212529 (Neil_P._Quinn_WMF) I going to boldly broaden this into a plea to fully standardize datetimes/timestamps in the Data Lake 🙏 There are actually two more formats that Morten...
[01:59:44] Analytics, Product-Analytics: Standardize datetimes/timestamps in the Data Lake - https://phabricator.wikimedia.org/T212529 (Neil_P._Quinn_WMF)
[02:45:37] Analytics, Page-Issue-Warnings: event_pageissues Turnilo view contains no valid data from before January 5 - https://phabricator.wikimedia.org/T214136 (Tbayer)
[06:36:52] Analytics, Analytics-Kanban, DBA, User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (Marostegui) Thanks @greg - I have done the mysqldump for it. @elukey I have left it at: `c...
[06:39:13] Analytics, Analytics-Kanban, DBA, User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (Marostegui)
[07:00:23] morning!
[07:00:28] o/
[07:17:17] Analytics, Analytics-Kanban, DBA, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui)
[08:01:21] Hi folks
[08:01:31] bonjour!
[08:01:38] Bonjour elukey :)
[08:02:36] Analytics, Analytics-Kanban, DBA, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (elukey)
[08:04:03] Morning all!
[08:04:09] Morning addshore :)
[08:04:11] :D
[09:31:36] joal: I am going to merge a puppet change that enables the role coordinator on analytics1030
[09:31:45] but without all the timers/crons
[09:31:50] only hive/oozie/etc..
[09:31:54] elukey: from a camus perspective, it'll try to get the whole stuff?
[09:31:59] Ah ok :)
[09:32:03] +1 !
[09:43:24] joal: we'll also need to remove stuff that pushes to Druid
[09:43:35] even if it would be great to have it tested properly
[09:44:27] elukey: 2 things push to druid IIRC - oozie (jobs not started, ok) and some spark job (through timer - ok?)
[09:45:16] joal: yeah I am not deploying refinery so nothing will start
[09:45:24] sounds good
[09:46:12] the other point is that when we'll enable security then druid must follow as well
[09:46:25] difficult to test right now
[09:46:30] I might ask for 3 vms
[09:46:54] Could be interesting for testing purposes indeed
[09:47:07] elukey: We could use one of the hadoop worker nodes?
[09:47:11] is this about Kerberos? I had tested Druid in labs https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/474672/ successfully
[09:47:56] moritzm: yep yep but it would be great to have the same scenario tested in prod as well with all the spark jobs pushing data etc..
[09:48:10] to be really sure what nothing explodes after we enable the madness
[09:48:46] ideally with the testing cluster we could have a replica of the data flows and debug them before deciding to turn on security in the "official" one
[09:51:46] moritzm: does it make sense?
[09:51:54] totally
[09:52:10] I just couldn't grasp what the context of the discussion was about :-)
[09:52:20] okok :D
[09:52:29] maybe we can simply spin up Ganeti VMs for them?
should work fine without baremetal
[09:52:59] I was thinking the same, a little cluster on ganeti
[09:54:56] joal: if we really want to be brave after security we could also test CDH6
[09:55:01] :P
[09:55:17] * joal fears nothing when riding along elukey
[09:55:25] * elukey hugs joal
[09:56:21] http://www.quickmeme.com/p/3w2l1z
[09:58:39] Or we can spin up a hadoop cluster 3.0 in kubernetes
[10:00:06] elukey: about kubernetes, I'm very interested if you have enough understanding of how data oriented software can be setup (namelly hdfs) :)
[10:00:32] I was joking :D
[10:00:51] I know! I'm very interested nonetheless !
[10:07:34] PROBLEM - Check if the Hadoop HDFS Fuse mountpoint is readable on analytics1030 is CRITICAL: NRPE: Command check_check_hadoop_mount_readability not defined
[10:08:15] ah!
[10:08:51] but notifications are disabled
[10:08:53] mmmm
[10:09:26] PROBLEM - Check the last execution of hdfs-balancer on analytics1030 is CRITICAL: NRPE: Command check_check_hdfs-balancer_status not defined
[10:12:28] PROBLEM - Hive Metastore on analytics1030 is CRITICAL: NRPE: Command check_hive-metasore not defined
[10:14:16] PROBLEM - Hive Server on analytics1030 is CRITICAL: NRPE: Command check_hive-server2 not defined
[10:20:59] Analytics, Analytics-Kanban, DBA, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui)
[11:43:05] * elukey lunch!
[13:35:42] Analytics, Knowledge-Integrity, Research, Epic: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (bmansurov) Update: the patches have been merged yesterday. We just need to close the open conversation about other remaining items.
[14:34:51] Analytics, Operations, Research, Article-Recommendation, User-Marostegui: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (bmansurov) @Marostegui thanks, your last suggestion is captured a {T211980}.
[14:35:43] Analytics, Operations, Research, Article-Recommendation, User-Marostegui: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (bmansurov)
[14:35:45] Analytics, Research, Article-Recommendation: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (bmansurov)
[14:35:54] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (elukey) Side notes related to the first puppet run of a coordinator: *...
[15:07:07] Analytics, Page-Issue-Warnings: event_pageissues Turnilo view contains no valid data from before January 5 - https://phabricator.wikimedia.org/T214136 (mforns) p:Triage→Unbreak! a:mforns
[15:25:54] PROBLEM - Hue Server on analytics1039 is CRITICAL: NRPE: Command check_hue not defined
[15:25:59] Analytics, Page-Issue-Warnings: event_pageissues Turnilo view contains no valid data from before January 5 - https://phabricator.wikimedia.org/T214136 (mforns) @Tbayer Thanks for the heads up. I think I know what happened. On January 5th, I deployed a new feature of HiveToDruid, the job that loads data...
[15:26:11] Analytics, Analytics-Kanban, Page-Issue-Warnings: event_pageissues Turnilo view contains no valid data from before January 5 - https://phabricator.wikimedia.org/T214136 (mforns)
[15:34:23] joal / fdans / mforns: anyone interested in pairing on this infuriating join problem?
[15:34:58] milimetric: sure, I can only do 25min tho, I got all hands planning meeting
[15:34:59] milimetric, yessss, I like to be infuriated :]
[15:35:20] fdans: no prob, I'll infuriate mforns, to the batcave!
[15:35:26] k
[15:35:37] milimetric, grabbing some water
[15:37:20] RECOVERY - Hue Server on analytics1039 is OK: PROCS OK: 1 process with command name python2.7, args /usr/lib/hue/build/env/bin/hue
[15:53:07] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Set up a Analytics Hadoop test cluster in production that runs a configuration as close as possible to the current one. - https://phabricator.wikimedia.org/T212256 (elukey) Side notes related to the Hadoop UI role: * libssl1.0.0's dummy...
[15:54:49] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Nuria) That is quite easy, it just will take a few hours to load data with different transformations and see how the data looks in turnilo, probably @mf...
[16:03:11] Analytics, DBA, Operations, ops-eqiad, Patch-For-Review: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (faidon) >>! In T213748#4890195, @RobH wrote: > Synced up with Chris via IRC: > > All systems were able to come back up within a2 without incident. The s...
[16:05:09] Analytics, Operations, ops-eqiad: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (RobH)
[16:05:18] Analytics, DBA, Operations, ops-eqiad, Patch-For-Review: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (RobH) Resolved→Open a:RobH→Cmjohnson >>! In T213748#4892549, @faidon wrote: >>>! In T213748#4890195, @RobH wrote: >> Synced up with Chris...
[16:12:16] Analytics, DBA, Operations, ops-eqiad, Patch-For-Review: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (Cmjohnson) Correct, it was only the fuse
[16:15:14] Analytics, Operations, Research, Article-Recommendation, User-Marostegui: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (Ottomata) > I don't think writing from Hadoop directly to M2 master is a good idea. But it is not really my call....
[16:19:48] Analytics, Operations, Research, Article-Recommendation, User-Marostegui: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (Nuria) >Finding a separate place to run the import scripts is probably the path of least resistance. +1
[16:33:29] milimetric: o/
[16:33:32] do you have a minute?
[16:33:39] yeah elukey
[16:33:45] thanks :)
[16:34:01] so I am working on moving report updater to systemd timers
[16:34:28] the only problem is that using the new puppet code means using the systemd::syslog class
[16:34:48] so you mean we lose file logs?
[16:34:48] that enforces a convention
[16:34:53] nono
[16:35:06] the convention is: base_path/$title
[16:35:15] base path can be configured, like /var/log
[16:35:31] $title is the name of the job in puppet, like say limn-language-data-interlanguage
[16:35:50] in theory we could have something like
[16:35:55] /srv/reportupdater/log as base path
[16:36:02] oh yeah, that's totally fine to change
[16:36:09] and then a log in: /srv/reportupdater/log/limn-language-data-interlanguage/limn-language-data-interlanguage.log
[16:36:10] as long as they make sense and the path isn't like alsalsiejalijsfliaj, it's good
[16:36:24] yep, totally fine
[16:36:25] or even something different as file name
[16:36:37] do we rsync those somewhere?
[16:36:40] nope
[16:36:47] people log in and tail if something's wrong
[16:36:49] mostly I do that
[16:37:07] I or whoever's on ops duty
[16:37:12] like, general AE
[16:38:00] ack!
[16:41:28] milimetric: proposal
[16:41:47] /srv/reportupdater/log/limn-language-data-interlanguage/syslog.log
[16:41:49] good?
[16:42:13] ok, sounds good
[16:46:06] milimetric: sorry, and this
[16:46:16] /srv/reportupdater/log/reportupdater-limn-language-data-interlanguage/syslog.log
[16:46:18] Analytics, Analytics-EventLogging, EventBus, Security-Team, and 3 others: Modern Event Platform: Stream Intake Service: AJV usage security review - https://phabricator.wikimedia.org/T208251 (sbassett) @Ottomata et al - just fyi, this is officially in-progress and I hope to have a review completed...
[16:46:19] too long?
[16:46:23] yes probably
[16:46:31] the other looks better
[16:46:34] uff I hate this convention
[16:46:53] the main issue is that it would be great to name "reportupdater-limn-language-data-interlanguage" the puppet resource
[16:46:57] but not the directory
[16:48:07] elukey: why does "reportupdater" have to get repeated?
[16:48:13] oh I see
[16:48:17] yeah :(
[16:48:31] uh.... that's fine elukey, nobody cares about the name of these logs
[16:48:37] go for it, make it pretty in puppet
[17:47:58] fdans: looks like adam cannot come to meeting, do you want to jump in hangout and work on your dev env
[17:48:17] ok nuria!
[17:48:38] our room nuria?
[17:49:01] fdans: sure
[18:11:42] o/, y'all.
Cross-posting what I just posted in -perf:
[18:12:14] There may be a bug in schema for the event.navigationtiming Hive table:
[18:12:14] The deviceMemory property of the event is (correctly) specified as a number – Navigator.deviceMemory is a floating point value
[18:12:14] Whereas the event.deviceMemory field in the corresponding Hive table is specified as BIGINT
[18:12:14] Since the beginning of the year, there've been just over 15,000 rows inserted with the event.deviceMemory field set to 0
[18:45:08] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Neil_P._Quinn_WMF) a:Neil_P._Quinn_WMF→None >>! In T211173#4892533, @Nuria wrote: > That is quite easy, it just will take a few hours to load da...
[18:46:04] * elukey off!
[19:11:43] milimetric: heya - back to duplicates after diner - Can you confirm me the table and snapshot we are looking at?
[19:11:59] I'm assuming mediawiki_history snapshot 2018-11
[19:12:08] eqi stat1004
[19:12:11] oopxs
[19:13:11] joal: the code to load mediawiki into druid .. is it committed to refinery anywhere?
[19:13:53] nuria: oozie/mediawiki/history/druid
[19:15:45] joal: yes
[19:15:53] great, thanks milimetric
[19:33:28] Analytics, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Nuria) The code to ingest this data already exists but it does not work well due to number of dimensions and how hard it is to understand the dimension...
[20:03:15] milimetric: have a minute for me in da cave?
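A minimal sketch of the deviceMemory type mismatch reported at 18:12 above. It assumes (this is not confirmed in the channel) that fractional values are truncated toward zero when they are coerced into the BIGINT column. `to_bigint` is a hypothetical stand-in for that coercion, not an actual Hive or refinery function. The sample values are the ones the Device Memory API can report.

```python
# Hypothetical stand-in for coercing a JSON number into a Hive BIGINT column.
# Assumption: the coercion truncates toward zero, like Python's int().
def to_bigint(value: float) -> int:
    return int(value)


# Values Navigator.deviceMemory can report (approximate gigabytes of RAM).
DEVICE_MEMORY_VALUES = [0.25, 0.5, 1, 2, 4, 8]


def truncated_values(values):
    """Map each reported value to what would be stored under truncation."""
    return {v: to_bigint(v) for v in values}


def lossy(values):
    """Return the reported values that would not survive the coercion."""
    return [v for v in values if to_bigint(v) != v]


if __name__ == "__main__":
    for reported, stored in truncated_values(DEVICE_MEMORY_VALUES).items():
        print(f"reported={reported} stored={stored}")
    print("lossy:", lossy(DEVICE_MEMORY_VALUES))
```

Under that assumption, clients reporting 0.25 or 0.5 would be stored as 0, which would be consistent with the rows described above. A floating point column type such as DOUBLE would keep those values intact.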
[21:48:02] (PS8) Milimetric: Update mediawiki-history comment and actor joins [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[21:55:44] Analytics, Analytics-Kanban, Page-Issue-Warnings: event_pageissues Turnilo view contains no valid data from before January 5 - https://phabricator.wikimedia.org/T214136 (Tbayer) >>! In T214136#4892512, @mforns wrote: > @Tbayer Thanks for the heads up. I think I know what happened. > > On January 5th...
[22:12:45] (CR) Nuria: Update mediawiki-history comment and actor joins (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/480796 (https://phabricator.wikimedia.org/T210543) (owner: Joal)
[23:44:24] Analytics, Anti-Harassment (AHT Sprint 35): 👩‍👧 Track how often blocked users attempt to edit - https://phabricator.wikimedia.org/T189724 (jrbs)