[06:47:38] good morning
[07:33:47] Good morning elukey
[07:33:53] bonjour
[07:34:52] joal: this morning I have been thinking if it is safe to proceed with the analytics-hive.eqiad.wmnet CNAME in prod, since we don't run the same hadoop distro
[07:35:09] my tests are based on bigtop, not cdh
[07:35:18] I am fairly confident that it should work anyway, but..
[07:35:26] hm
[07:35:44] elukey: what does it take to take with the prod system?
[07:36:11] joal: sorry I didn't get it :D
[07:36:23] elukey: we don't need to remove the current kerb settings right, it's an additional one (with CNAME and all)?
[07:36:52] elukey: If this is the case, we can set it up, test, and move when we're sure it works
[07:37:24] joal: nono hive needs to run with one kerb principal, either the CNAME one or the host-specific one
[07:37:36] so if we change it, we have to modify all the clients
[07:38:39] Ah! meh - I completely missed the fact that the principal change impacts the server as well, not just the client (it is obvious now that you say it, but I didn't connect it)
[07:39:00] the only alternative that I can think of is to add a hive server to an-coord1002 with the new credentials, let it connect to the same metastore, point the cname to it and then move clients over
[07:39:32] That's a nice one elukey
[07:39:35] it may work for oozie, but the hive version is very old in CDH
[07:39:45] right
[07:40:39] elukey: adding a hive server for test makes a lot of sense - new small hive-server with new config, tests (manual and with oozie), if everything ok rollout?
[07:42:29] joal: I am wondering if two hive servers need specific session-related settings for the metastore, I need to investigate
[07:43:05] elukey: another way is to stop the cluster for some time, test, and restart with either new config, or old
[07:43:39] elukey: I'd be ok doing that, but need to prep the change for oozie principals
[07:44:18] we need bigtooooppppp
[07:44:20] :D
[07:44:41] :)
[07:44:51] this cluster is bigtopyan
[08:05:13] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (JAllemandou) > In your dataset, this could be e.g. user_agent_map+city. The problem is th...
[08:08:19] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (TedTed) > It seems that next steps are to do some tryouts exploring the idea of "a single...
[08:28:33] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (TedTed) > Open question: Would using a fingerprint user-identifier based on user-agent a...
[08:30:11] Analytics: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (elukey)
[08:30:20] Analytics: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (elukey) >>! In T264896#6577474, @JAllemandou wrote: > I have experienced problems with jobs pagination as well: > - Go to a webrequest-load coordinator page (or any coordinator with many historical jobs) - Th...
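Note on the 07:34-07:44 Kerberos discussion: the reason a principal change touches every client is that a Kerberized HiveServer2 JDBC URL names the server's service principal explicitly. A minimal sketch of the two URL shapes follows. The host name `hive-host.example.org`, the realm `EXAMPLE.ORG`, the port and the database are placeholders rather than values from this log; only the CNAME `analytics-hive.eqiad.wmnet` comes from the conversation.

```python
def hive_jdbc_url(connect_host: str, principal_host: str, realm: str,
                  port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL that carries the server's Kerberos principal.

    The principal in the URL must match the one the server runs with,
    so changing the server principal means changing every client URL.
    """
    return (f"jdbc:hive2://{connect_host}:{port}/{database}"
            f";principal=hive/{principal_host}@{realm}")


# Hypothetical placeholders, not taken from the log.
REALM = "EXAMPLE.ORG"
HOST_SPECIFIC = "hive-host.example.org"
# The CNAME discussed in the log.
CNAME = "analytics-hive.eqiad.wmnet"

# Host-specific principal: clients name the physical host.
host_url = hive_jdbc_url(HOST_SPECIFIC, HOST_SPECIFIC, REALM)
# CNAME principal: clients name the alias, so the backing host can change.
cname_url = hive_jdbc_url(CNAME, CNAME, REALM)

print(host_url)
print(cname_url)
```

With the CNAME form, moving the Hive server to another host would only need a DNS change, provided the server itself runs with the CNAME principal.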
[08:59:55] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (elukey) I was able to fix 1044, it was a problem with one broken disk not configured properly in the DELL raid controller setup. 1046 is still not working :(
[09:00:31] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1048.eqiad.wmnet', 'analytics1049.eqiad.wmnet', 'analytics1050.eq...
[09:00:40] so only one old worker node not booting
[09:09:13] elukey: I sent you a message to proof-read when you have a minute, please :)
[09:15:09] joal: answered :)
[09:15:20] \o/ thanks mate :)
[10:01:54] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1048.eqiad.wmnet'] ` Of which those **FAILED**: ` ['analytics1050.eqiad.wmnet', 'analytics1049.eqiad.wmnet'] `
[10:18:11] Analytics, Product-Analytics: Add timestamps of important revision events to mediawiki_history - https://phabricator.wikimedia.org/T266375 (Count_Count) It would be great for counter-vandalism tools and bots, if those tag changes could be published via the Recent Changes event stream (https://stream.wiki...
[10:35:59] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1049.eqiad.wmnet', 'analytics1050.eqiad.wmnet'] ` The log can be...
[10:47:46] Analytics, Operations, ops-eqiad: analytics1046 stuck in booting - https://phabricator.wikimedia.org/T267392 (elukey)
[10:48:33] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (elukey) Opened https://phabricator.wikimedia.org/T267392 for analytics1046
[11:06:05] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1049.eqiad.wmnet', 'analytics1050.eqiad.wmnet'] ` and were **ALL** successful.
[11:06:52] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1052.eqiad.wmnet', 'analytics1051.eqiad.wmnet'] ` The log can be...
[11:22:02] bigtop upstream is starting the process for 1.5! \o/
[11:27:02] * elukey afk! lunch
[11:35:55] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1051.eqiad.wmnet', 'analytics1052.eqiad.wmnet'] ` and were **ALL** successful.
[11:36:25] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1053.eqiad.wmnet', 'analytics1054.eqiad.wmnet'] ` The log can be...
[12:05:02] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1054.eqiad.wmnet', 'analytics1053.eqiad.wmnet'] ` and were **ALL** successful.
[13:49:17] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1055.eqiad.wmnet', 'analytics1056.eqiad.wmnet', 'analytics1057.eq...
[14:21:18] oh noes also 1057 gets stuck in booting
[14:21:22] * elukey cries in a corner
[14:21:26] :(
[14:25:02] Analytics, Operations, ops-eqiad: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (elukey)
[14:28:15] Analytics, Analytics-Kanban: Set up automatic deletion/snitization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (ayounsi) T231339#5442835 the ones we can drop. The logic is that everything we only use for DDoS detection we can get rid of quite quickly, but what we need for tr...
[14:50:10] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1057.eqiad.wmnet'] ` Of which those **FAILED**: ` ['analytics1057.eqiad.wmnet'] `
[15:05:23] hellooo teamm
[15:08:49] o/
[15:51:37] Analytics-Clusters, Patch-For-Review: Create a temporary hadoop backup cluster - https://phabricator.wikimedia.org/T260411 (elukey) After a full round of reimages only 1046 and 1057 are not available, since they don't boot anymore.
Let's see if dcops can help in https://phabricator.wikimedia.org/T267392
[15:54:37] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:01:31] a-team: I'll be hanging out in the cave in 30 minutes if anyone wants to socialize
[16:01:37] :]
[16:05:17] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:06:22] Are there any hidden gotchas i should be aware of using hdfs:///path/to/file instead of hdfs://analytics-hadoop/path/to/file ?
[16:06:38] basically, wondering if i can simplify my integration environment by never using the name of the cluster in prod code
[16:10:26] ebernhardson: in theory it should be fine since your hdfs client should read from /etc/hadoop/conf
[16:12:27] elukey: ok, sounds like what i was expecting. Thanks!
[16:17:07] (PS14) Sbisson: Oozie job for Wikipedia Preview stats [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953)
[16:17:47] (CR) Sbisson: "I addressed all the comments and retested. I think it could be good to go." (5 comments) [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953) (owner: Sbisson)
[16:26:44] * elukey afk!
[16:26:45] o/
[16:29:09] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Milimetric) >>! In T267283#6608149, @TedTed wrote: >> Open question: Would using a finge...
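Note on the 16:06-16:12 question about `hdfs:///path/to/file`: a URI without an authority is resolved by the Hadoop client against the `fs.defaultFS` property from its configuration directory. The sketch below mimics that resolution in plain Python for illustration only. The inline `core-site.xml` content is an assumed example (the cluster name `analytics-hadoop` is taken from the question, not from a real config file), and `default_fs` and `qualify` are hypothetical helper names, not Hadoop APIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Assumed example of what a core-site.xml might contain; not copied from prod.
CORE_SITE_XML = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://analytics-hadoop</value>
  </property>
</configuration>"""


def default_fs(core_site_xml: str) -> str:
    """Return the fs.defaultFS value from a core-site.xml document."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value").strip()
    raise KeyError("fs.defaultFS is not set")


def qualify(uri: str, fs: str) -> str:
    """Fill in the authority of an hdfs:/// URI from the default filesystem."""
    parts = urlsplit(uri)
    if parts.netloc:
        # The cluster name is already spelled out; leave the URI alone.
        return uri
    base = urlsplit(fs)
    return urlunsplit((base.scheme, base.netloc, parts.path,
                       parts.query, parts.fragment))


fs = default_fs(CORE_SITE_XML)
print(qualify("hdfs:///path/to/file", fs))
print(qualify("hdfs://analytics-hadoop/path/to/file", fs))
```

Under that assumption both spellings point at the same file, so code that omits the cluster name follows whatever cluster the local client configuration selects.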
[16:30:28] (I'm socializing with myself)
[16:52:43] Analytics, Analytics-Kanban: Set up automatic deletion/snitization for netflow data set in Hive - https://phabricator.wikimedia.org/T231339 (mforns) > T231339#5442835 the ones we can drop. > > The logic is that everything we only use for DDoS detection we can get rid of quite quickly, but what we need f...
[17:17:37] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Nuria) >We have IPs in a temporary dataset, called pageview_actor that feeds into pagevie...
[17:24:54] Analytics, Analytics-Kanban, Growth-Team, Product-Analytics, Patch-For-Review: Migrate Growth EventLogging schemas to Event Platform - https://phabricator.wikimedia.org/T267333 (mforns) Thanks for the clarification @nettrom_WMF, will work on that once we agree on T240460.
[17:43:43] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (RobH) Just got notice this was received, so it should be delivered to our cage/storage now.
[17:44:20] Analytics, Operations, ops-eqiad: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (wiki_willy) a:Cmjohnson
[17:47:36] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (mforns) Looks great @lexnasser! I feel the threshold on unique actors is working well. I was surprised to see that the nu...
[19:00:55] !log launched backfilling of data quality stats for os_family_entropy_by_access_method
[19:00:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:12:15] ah milimetric!
I got distracted and forgot about the socialize hour!!!!!! :CCCCCCC
[19:13:00] sorry...
[19:13:01] it's ok mforns, we just talked about your hair the whole time
[19:13:08] xDDDDDD
[19:13:10] (we did not, that was a bad joke :))
[19:13:12] as I imagined
[19:13:44] ok, then there were more people, ok
[19:14:20] yeah, it's nice actually, not to worry too much about how many people join each week or if we necessarily have it each week, I'm getting into that informality of it
[19:14:43] aha
[19:27:16] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Isaac) Thanks all for this fascinating (and hopefully productive) conversation! > With t...
[21:58:05] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (TedTed) > potentially might allow the release of data from years back with ability to do...
[22:08:30] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:19:10] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[22:56:51] Analytics, Data-release, Privacy Engineering, Research, Privacy: Evaluate a differentially private solution to release wikipedia's project-title-country data - https://phabricator.wikimedia.org/T267283 (Nuria) Thanks @TedTed for all these pointers, on my end I need to digest all this info be...
[23:23:27] Analytics-Clusters, DC-Ops, Operations, ops-eqiad: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (wiki_willy) Just a side note - these Netbox errors should go away, once the assets are entered into Netbox: https://netbox.wikimedia.org/extras/rep...
[23:24:56] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:35:38] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers