[01:57:15] PROBLEM - Throughput of EventLogging EventError events on alert1001 is CRITICAL: 82.52 ge 30 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[02:05:31] RECOVERY - Throughput of EventLogging EventError events on alert1001 is OK: (C)30 ge (W)20 ge 0.6726 https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=13&fullscreen&orgId=1
[06:30:36] good morning
[06:37:21] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (elukey) @jcrespo perfect timing, I have finished https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Mysql_Meta#Restore_a_Ba...
[06:40:21] today I wanted to migrate hive to an-coord1002, but I just realized a biiiig problem
[06:41:08] we have an-coord1001 (keytab + host:port) all over refinery
[06:41:19] in oozie config files
[06:41:52] this is another big part of the SPOF, because if an-coord1001 fails then a big refinery deployment is also needed, to also restart jobs
[06:42:06] no bueno
[06:42:47] I am wondering if there is a solution to this mess
[07:15:41] Good morning elukey
[07:15:43] hm
[07:26:20] bonjour joal
[07:26:39] I was trying to find a way for oozie to read action defaults from hdfs, but it seems too magical
[07:27:44] we cannot really think about a complete deployment/restart of all jobs in case an-coord1001 goes down
[07:28:01] or just like in this case to move hive to another server
[07:28:02] elukey: I completely agree :S
[07:28:28] elukey: could we have a redirection address?
[07:29:46] a dns CNAME you mean?
[07:30:17] yeah - In order to be able to reroute traffic if a backend is down
[07:30:28] I thought about it but we have two configs
[07:30:29] hive/an-coord1001.eqiad.wmnet@WIKIMEDIA
[07:30:34] jdbc:hive2://an-coord1001.eqiad.wmnet:10000/default
[07:30:45] so the latter can use a CNAME, the former no sadly
[07:31:23] hm
[07:36:10] I am checking oozie.service.HadoopAccessorService.action.configurations
[07:36:22] "If the ACTION.xml file exists, its properties will be used as defaults properties for the action. If the path is relative is looked within the Oozie configuration directory; though the path can be absolute (i.e. to point to Hadoop client conf/ directories in the local filesystem."
[07:38:37] so IIUC, under /etc/oozie/conf/action-conf on an-coord1001 we could place hive2.xml (or similar) and set parameters on it
[07:38:42] elukey: this way is great to prevent us having so much boilerplate
[07:39:02] elukey: I still think changes will require job restarts
[07:39:10] ah yes yes you are right
[07:39:16] elukey: But the first gain is already tremendous
[09:08:17] Analytics, Analytics-Kanban, Patch-For-Review: Fix TLS certificate location and expire for Hadoop/Presto/etc.. and add alarms on TLS cert expiry - https://phabricator.wikimedia.org/T253957 (elukey) Documentation updated, finally the task is done!
[09:30:28] joal: I have to move a journalnode from analytics1052 to an-worker1080, since the former host is in decom
[09:30:35] my idea is the following:
[09:30:49] - merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/635507
[09:31:04] - stop the journalnode on 1052, copy its partition data to 1080
[09:31:22] - run puppet on 1080, so it starts the JN, keeping the one on 1052 down
[09:31:32] - run puppet on the masters, and restart them
[09:31:45] in theory at the end they'll have the list of JNs updated
[09:31:52] does it sound ok?
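A note on the journalnode plan above: in a quorum-journal HDFS setup the NameNodes read the JN list from `dfs.namenode.shared.edits.dir`, which is a `qjournal://host:port;host:port;.../journalId` URI, and they only pick up a changed list on restart. The Python sketch below only illustrates what "the list of JNs updated" amounts to; it is not the actual puppet change. The two other hostnames and the journal ID are hypothetical placeholders, and 8485 is assumed as the JN RPC port.

```python
def swap_journalnode(qjournal_uri: str, old_host: str, new_host: str) -> str:
    """Return a qjournal URI with old_host replaced by new_host.

    Expects the form qjournal://host1:port;host2:port;host3:port/journalId.
    """
    prefix = "qjournal://"
    if not qjournal_uri.startswith(prefix):
        raise ValueError("not a qjournal URI: " + qjournal_uri)
    # Split "host:port;host:port;..." from the trailing journal ID.
    authority, _, journal_id = qjournal_uri[len(prefix):].partition("/")
    swapped = []
    for node in authority.split(";"):
        host, _, port = node.partition(":")
        if host == old_host:
            host = new_host
        swapped.append(host + ":" + port if port else host)
    return prefix + ";".join(swapped) + "/" + journal_id


# Hypothetical example: jn-a, jn-b and the journal ID "example-cluster"
# are made up; only analytics1052 and an-worker1080 come from the log.
before = (
    "qjournal://analytics1052.eqiad.wmnet:8485;"
    "jn-a.example.org:8485;jn-b.example.org:8485/example-cluster"
)
after = swap_journalnode(
    before, "analytics1052.eqiad.wmnet", "an-worker1080.eqiad.wmnet"
)
print(after)
```

The quorum size stays the same because one node is swapped for another rather than added, which fits the plan of keeping the JN on 1052 down while the one on 1080 starts from the copied data.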
[09:37:38] elukey: it sounds ok - I assume you'll disable puppet on the masters before the start of the op so that they don't get updated inadvertently
[09:39:12] joal: in theory no need, even if the config changes we need to restart them to force the refresh
[09:39:20] ack elukey
[09:39:24] Analytics-Clusters, Analytics-Kanban, Patch-For-Review, User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (jcrespo) Everything there looks fine! There may be procedures that I could help you simplify to be done more easily, we can talk on a...
[09:52:19] I am restarting the master hdfs daemons now
[10:22:53] to be sure, I am going to roll restart the journals
[10:22:55] (CR) Faidon Liambotis: Add Refine transform function for Netflow data set (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/634328 (https://phabricator.wikimedia.org/T254332) (owner: Mforns)
[10:26:00] !log move journalnode from analytics1052 (to be decommed) to an-worker1080
[10:26:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:28:30] joal: the new JN seems happy
[10:41:15] !log decommission analytics1052 from the hadoop cluster
[10:41:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:49:18] Analytics: On analytics.wikimedia.org, Safari on iOS ignores collapsed tabs interaction; stats dashboard is poorly navigable on Safari on iOS and Chrome on Android - https://phabricator.wikimedia.org/T266071 (Aklapper)
[10:50:17] Analytics: On analytics.wikimedia.org, Safari on iOS ignores collapsed tabs interaction; stats dashboard is poorly navigable on Safari on iOS and Chrome on Android - https://phabricator.wikimedia.org/T266071 (Aklapper) Hi @gh87, thanks for taking the time to report this! This ticket seems to mix several unre...
[10:53:45] happy JNs mean happy us :)
[10:59:16] I didn't recall that the namenode uses Paxos for the edit log
[10:59:24] I thought it was something simpler
[10:59:39] anyway, the good thing is that the journalnodes don't talk with each other
[10:59:45] so swapping them is easy
[10:59:53] This is great
[11:00:13] I'll update the docs so I don't forget :D
[11:01:05] also two nodes left to decom :)
[11:01:12] (one is in progress)
[11:01:17] * joal is eager to get started copying data!
[11:02:20] joal: I think that using bigtop directly is fine, is it ok for you?
[11:02:39] yessir!!!
[11:03:14] elukey: it'll be a good fire-test for our storage
[11:03:54] ack then
[11:03:59] going to have lunch! ttl
[12:15:28] I found http://oozie.apache.org/docs/4.2.0/AG_ActionConfiguration.html that is interesting, explaining the oozie option that we were discussing
[12:23:22] so IIUC something like hive2=/etc/hive/conf/hive-site.xml could be enough?
[12:27:29] mmm no
[12:35:55] ok I need to test this in hadoop test
[12:43:14] (CR) Ayounsi: [C: +1] Add Refine transform function for Netflow data set (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/634328 (https://phabricator.wikimedia.org/T254332) (owner: Mforns)
[12:55:39] Analytics, Browser-Support-Apple-Safari: On analytics.wikimedia.org, Safari on iOS ignores collapsed tabs interaction - https://phabricator.wikimedia.org/T266122 (gh87)
[13:05:46] helloooo
[13:06:13] hola hola
[13:18:23] yoohooo
[13:19:54] morninnn
[13:31:33] Analytics-Clusters, Analytics-Kanban, User-Elukey: Repurpose db1108 as generic Analytics db replica - https://phabricator.wikimedia.org/T234826 (Nintendofan885)
[13:34:10] https://www.oreilly.com/library/view/apache-oozie-essentials/9781785880384/ch05s03.html
[13:36:18] not super useful but interesting --^
[13:36:37] I am also trying to use the oozie cli in hadoop test since hue is not there yet, what a pain :D
[13:42:49] Analytics, Operations,
SRE-Access-Requests, Patch-For-Review: Add sbisson to analytics-privatedata-users and create a kerberos identity - https://phabricator.wikimedia.org/T265969 (Arrbee) This is an approved request for Stephane. Thanks.
[14:09:09] Analytics: Filter non-mediawiki hostnames at ingestion time - https://phabricator.wikimedia.org/T266130 (mforns)
[14:09:53] Analytics: Dashboards on analytics.wikimedia.org are less readable on smartphones - https://phabricator.wikimedia.org/T266132 (gh87)
[14:10:56] Analytics: analytics.wikimedia.org is less readable on smartphones - https://phabricator.wikimedia.org/T266132 (gh87)
[14:13:06] joal yt?
[14:13:32] wanna talk about unique metrics proxy?
[14:20:05] elukey: do you have any opinions on the Turnilo thing? I initially thought the left-most IP made the most sense as the intended value of the XFF header, but is that not the case in other environments?
[14:25:36] milimetric: in theory the leftmost should be the external ip, did upstream complain?
[14:26:25] no, but I realized that they were choosing to trust the 1st hop, which is the right-most on purpose. I'm not 100% sure if that's a bug or not, but it made me wonder
[14:26:41] Quarry: Quarry down for logged in users - https://phabricator.wikimedia.org/T265997 (Count_Count) Works again.
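On the XFF question above (14:20 to 14:26): each proxy appends the address it received the request from, so the left-most entry of `X-Forwarded-For` is the client-supplied origin and the right-most entry is the hop closest to the application. A minimal Python sketch of the two choices follows; the function names and addresses are illustrative only, and this is not Turnilo's code.

```python
from typing import List, Optional


def xff_hops(header: str) -> List[str]:
    """Split an X-Forwarded-For value into a list of addresses."""
    return [hop.strip() for hop in header.split(",") if hop.strip()]


def leftmost_ip(header: str) -> Optional[str]:
    """Original client as reported along the chain (can be spoofed by the client)."""
    hops = xff_hops(header)
    return hops[0] if hops else None


def rightmost_ip(header: str) -> Optional[str]:
    """Address appended by the last proxy before the application."""
    hops = xff_hops(header)
    return hops[-1] if hops else None


# Illustrative addresses: one external client followed by two internal proxies.
example = "203.0.113.7, 10.64.0.12, 10.64.32.5"
print(leftmost_ip(example))
print(rightmost_ip(example))
```

Trusting only the right-most entry is the conservative choice when the proxy chain is unknown; taking the left-most one is only meaningful if every hop in between is trusted.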
[14:26:42] I guess it's probably a bug, since their setting is called "always" and that implies trust the whole XFF header
[14:28:25] yep
[14:30:12] milimetric: that feature has been way more complex than any of us could have guessed
[14:30:19] Analytics, Operations, puppet-compiler, Jenkins, Release-Engineering-Team (CI & Testing services): Puppet CI idea: Add a PCC-Nodes tag to commit message to launch PCC job on new patch in gerrit - https://phabricator.wikimedia.org/T266139 (Ottomata)
[14:31:22] Spookreeeno: heh, nah, I just wasn't that familiar with that whole stack and our environment is as complicated as possible to maximize confusion :)
[14:32:26] milimetric: tbh when I first saw the task I was expecting it to be like a few steps. Simple environments don't exist!
[14:35:30] Analytics: Stats dashboards on analytics.wikimedia.org are poorly navigable on mobile devices - https://phabricator.wikimedia.org/T266142 (gh87)
[14:36:01] Analytics: On analytics.wikimedia.org, Safari on iOS ignores collapsed tabs interaction; stats dashboard is poorly navigable on Safari on iOS and Chrome on Android - https://phabricator.wikimedia.org/T266071 (gh87) Open→Invalid
[14:37:23] Analytics: On analytics.wikimedia.org, Safari on iOS ignores collapsed tabs interaction; stats dashboard is poorly navigable on Safari on iOS and Chrome on Android - https://phabricator.wikimedia.org/T266071 (gh87) Splitting into separate tasks: T266122, T266132, T266142
[14:40:54] elukey: o/
[14:41:02] o/
[14:41:14] having a very strange systemd unit backslash escaping thing, have you dealt with this before?
[14:41:24] /lib/systemd/system/camus-eventgate-analytics-external_events.service:9: Ignoring unknown escape sequences: "-Dkafka.whitelist.topics=(eqiad|codfw)\.eventgate-analytics-external\.test\.event"
[14:41:37] i think it has changed because I removed one level of quotes?
[14:41:38] not sure.
[14:41:38] but
[14:41:45] maybe i have to put like a billion \\\ in puppet?
[14:42:16] there are some chars that need to be escaped
[14:43:12] the ExecStart string is not bash, but it is interpreted by its own systemd parser
[14:43:24] right
[14:43:27] this is why I always suggest to make a /usr/local/bin script
[14:43:29] googled as much
[14:43:34] ha yeah...
[14:43:35] hm
[14:44:06] what did you change?
[14:44:57] ah https://gerrit.wikimedia.org/r/c/operations/puppet/+/635549/3/modules/profile/manifests/analytics/refinery/job/camus.pp right?
[14:45:20] hm i think it changed from
[14:45:20] --check-java-opts '-Dkafka.whitelist.topics="(eqiad|codfw)\.eventgate-analytics-external\.test\.event"'
[14:45:20] to
[14:45:20] --check-java-opts '-Dkafka.whitelist.topics=(eqiad|codfw)\.eventgate-analytics-external\.test\.event'
[14:45:20] without the double quotes around the actual regex
[14:45:35] i'll try and add them back in...
[14:46:04] when do you get the Ignoring unknown escape sequences ?
[14:46:41] i didn't see it until i did sudo systemctl status camus-eventgate-analytics-external_events
[14:47:36] Analytics, Mobile: analytics.wikimedia.org is less readable on smartphones - https://phabricator.wikimedia.org/T266132 (Reedy)
[14:48:39] ottomata: does the dot need to be escaped?
[14:48:55] yes
[14:48:59] part of the regex
[14:49:02] meaning literal .
[14:49:13] well in this case
[14:49:16] it isn't the . that is escaped
[14:49:21] it is the backslash
[14:49:36] we want the regex to have \.
[14:49:38] so it probably needs a \\
[14:49:44] in puppet i have \\.
[14:49:47] so i guess i need \\\.
[14:49:50] but, it was \\. before
[14:49:52] with the double quotes
[14:49:55] so i'm trying that first
[14:50:03] yeah then \\\, lemme check, I think we already had this problem before with Marcel
[14:50:34] yes see data_purge.pp
[14:50:35] oh
[14:50:57] ottomata: --^
[14:51:18] # Also, we need the systemd to escape our \w, AND we need puppet to do the same.
So we use
[14:51:21] # \\\\w to result in \\w in systemd which then ends up executing with a \w.
[14:52:37] anyway, I think that we should put all into bash scripts, maybe having something that does it automatically for the user
[14:56:49] klausman: o/
[14:56:56] I am reading https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/RELEASE.md#release-212 and I have a doubt
[14:57:01] ya elukey strangely
[14:57:05] with the extra double quote
[14:57:06] s
[14:57:10] the \. works
[14:57:50] klausman: it explicitly mentions "Switches ROCM builds to use ROCM 3.7" meanwhile it does not for 3.8 in recent releases.. Did they release tensorflow-rocm for 3.8? If not we can't upgrade :(
[14:58:33] \o
[14:59:17] I will see and compare the filelist in both
[14:59:20] Analytics, Analytics-Kanban: Fix the remaining bugs open on for Hue next - https://phabricator.wikimedia.org/T264896 (Milimetric) a:Milimetric→None
[15:02:05] klausman: <3
[15:04:54] which python version would we be running against?
[15:05:12] (they have packages of tf-rocm for 3.6, 3.7 and 3.8)
[15:06:32] 3.7
[15:13:55] Analytics, Operations, puppet-compiler, Jenkins, Release-Engineering-Team (CI & Testing services): Puppet CI idea: Add a PCC-Nodes tag to commit message to launch PCC job on new patch in gerrit - https://phabricator.wikimedia.org/T266139 (jbond) @Ottomata This is already possible however you...
[15:14:54] Analytics, Operations, puppet-compiler, Jenkins, Release-Engineering-Team (CI & Testing services): Puppet CI idea: Add a PCC-Nodes tag to commit message to launch PCC job on new patch in gerrit - https://phabricator.wikimedia.org/T266139 (jbond)
[15:20:33] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (lexnasser) @Amire80 I definitely agree that San Marino is an edge case.
Do you think there are any other metrics that cou...
[15:21:46] (PS1) Mforns: Correct denominator in structured-data wikidata query [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/635564 (https://phabricator.wikimedia.org/T264945)
[15:23:44] (CR) Mforns: [V: +2] "Indeed, Isaac was right, the where clause filter works." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/635564 (https://phabricator.wikimedia.org/T264945) (owner: Mforns)
[15:31:50] Heya mforns Here I am now if you wish to talk :)
[15:36:57] joal: I think that the oozie option to pick action defaults works, I am still testing but it looks promising
[15:37:17] if so I'll send a patch, we may remove hive-related stuff from the .properties file
[15:37:32] but still a massive restart is needed, a real downside
[15:37:41] elukey: That's super great - There are a bunch of stuff we should remove
[15:38:31] elukey: in case of an-coord host down, jobs need to be restarted, but only the default config needs to be changed - Am I getting it correctly?
[15:38:56] joal: if what I am testing is 100% proven (I hope so) then yes
[15:39:04] That's great
[15:39:20] well I hoped for a smoother transition :(
[15:39:25] Yeah I know
[15:39:55] it is good that we know all these things now, I'll make sure to add everything in the failover procedure
[15:40:04] yup
[15:40:14] also we'd need a full restart if we want the hive-server on an-coord1002
[15:40:26] elukey: we probably could use the script I wrote (long ago) to help with restarts
[15:41:20] elukey: bin/refinery-oozie-rerun
[15:41:39] yep!
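Back on the systemd escaping thread (14:41 to 14:57): the data_purge.pp comment quoted at 14:51 describes two layers that each consume one level of backslash escaping, Puppet's string parsing first and systemd's ExecStart parsing second. The Python sketch below models only that doubling rule; `unescape_once` is a made-up helper, and the real Puppet and systemd parsers handle many more escape sequences and quoting rules than this.

```python
def unescape_once(text: str) -> str:
    """Collapse every escaped backslash (two backslashes) into a single one.

    Simplified model of one parsing layer, covering only backslash doubling.
    """
    return text.replace("\\\\", "\\")


# Four backslashes in the Puppet manifest, as in the quoted comment.
in_puppet = r"\\\\w"
in_unit_file = unescape_once(in_puppet)  # what ends up in the unit file
at_exec = unescape_once(in_unit_file)    # what the command finally receives
print(in_puppet, "->", in_unit_file, "->", at_exec)

# A lone "\." has no doubled backslash, so this model leaves it untouched.
print(unescape_once(r"\."))
```

Moving the command into a /usr/local/bin script, as suggested at 14:43, avoids the second layer because the unit file then only carries the script path.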
[15:41:51] going to take a little break before meetings :)
[15:42:04] elukey: still a lot of manual checking, but possibly quite a bit less overhead than full-manual
[15:54:41] Analytics, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (razzi)
[15:55:31] Analytics, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (razzi)
[15:56:21] (CR) Isaac Johnson: [C: +1] "Looks good -- thanks!" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/635564 (https://phabricator.wikimedia.org/T264945) (owner: Mforns)
[15:56:25] Analytics, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (razzi) Good idea @elukey - added.
[15:58:25] (CR) Nuria: "The file would need to be corrected by hand as "numerator" of this calculation does not exist historically. The tunning session metrics fol" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/635564 (https://phabricator.wikimedia.org/T264945) (owner: Mforns)
[15:59:35] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Amire80) >>! In T207171#6568590, @lexnasser wrote: > @Amire80 I definitely agree that San Marino is an edge case. Do you...
[16:07:17] sorry joal, was in a meeting, maybe after standup?
[16:07:21] sure!
[16:31:49] fdans: can you give me (us) permits to see the jd doc?
[16:31:51] mforns: meet.google.com/gpk-qhdc-seb
[16:37:04] joal: i asked grant (via google doc) to share with team
[16:37:19] Thanks nuria
[16:54:14] (PS1) Sbisson: Oozie job for Wikipedia Preview stats [analytics/wmf-product/jobs] - https://gerrit.wikimedia.org/r/635578 (https://phabricator.wikimedia.org/T261953)
[16:55:37] nuria: mforns just shared :)
[17:00:04] fdans: thankssss
[17:02:03] nuria: when you have time https://gerrit.wikimedia.org/r/c/operations/puppet/+/635227
[17:02:09] also fdans --^
[17:31:36] ottomata: https://gerrit.wikimedia.org/r/c/operations/puppet/+/635227 :D (when you have time)
[17:31:48] Analytics, Analytics-Kanban, Event-Platform, Privacy Engineering, and 4 others: Remove http.client_ip from EventGate default schema (again) - https://phabricator.wikimedia.org/T262626 (Jdlrobson)
[17:32:01] * elukey afk!
[18:16:46] ahh +1 elukey
[18:16:49] i am the approver now
[18:17:01] to all who would like to be approved: beware...
[18:17:17] Analytics, Operations, SRE-Access-Requests, Patch-For-Review: Add sbisson to analytics-privatedata-users and create a kerberos identity - https://phabricator.wikimedia.org/T265969 (Ottomata) I am the Analytics approver now. APPROVED.
[18:17:56] ottomata: power!!!!
[18:19:43] :P
[18:21:06] * Spookreeeno goes back to avoiding doing useful work
[18:47:22] oh elukey !
[18:47:28] an-test-coord1001 puppet disabled!!!
[18:47:29] :)
[18:48:33] erggg i'm going to enable it....
[18:48:36] hope i don't destroy things
[19:09:17] razzi: FYI how goes with the ganeti thing
[19:09:18] ?
[19:09:24] s/FYI/BTW
[19:13:55] ottomata: Just ate lunch, about to pick it back up
[19:15:18] k lemme know if i can help
[19:18:15] quick question: do I want to include the ```+node 'an-test-client1001.eqiad.wmnet' {
[19:18:15] + role(analytics_test_cluster::client)
[19:18:15] +}
[19:18:15] ```
[19:18:16] in the patch with the mac address, or should that be a follow-up? Or does it not matter?
[19:19:02] hm, doesn't matter too much
[19:19:03] but
[19:19:07] probably better to do two patches
[19:19:13] cool
[19:19:13] so your first puppet runs on the host don't try to apply that
[19:19:16] actually
[19:19:21] you could add the node entry
[19:19:22] with just
[19:19:25] role(standby)
[19:19:26] i think...
[19:19:38] hm no...
[19:19:39] something..
[19:19:43] role(insetup)
[19:19:43] ?
[19:19:50] yes
[19:19:52] that ^^^
[19:20:05] but i'm not sure if it matters
[19:22:46] (PS3) Nuria: [WIP] Adding quality alarms for mobile app data [analytics/refinery] - https://gerrit.wikimedia.org/r/633579 (https://phabricator.wikimedia.org/T257692)
[19:25:37] Cool, good to be specific. ottomata check out https://gerrit.wikimedia.org/r/c/operations/puppet/+/635624
[19:28:33] +1 razzi
[19:28:52] cool
[19:30:27] razzi: ok if i puppet merge yours? i got one going at the same time
[19:30:38] Yeah sgtm
[19:34:21] Analytics, Analytics-Kanban, Patch-For-Review: Update Wikidata usage metric - https://phabricator.wikimedia.org/T264945 (mforns) The table `wmf_raw.mediawiki_page` is not historical and only has snapshots since 2020-04. Thus, we cannot re-calculate the denominator using that same table and query. --...
[19:38:23] Analytics, Analytics-Kanban, Patch-For-Review: Update Wikidata usage metric - https://phabricator.wikimedia.org/T264945 (Nuria) >I found differences of <0.1% for recent months and <0.3% for older months. I think that's acceptable. +1
[19:49:46] We're just about to enter the maintenance window (starts in 10 minutes, lasts an hour) for switching from nginx to envoy. There will be several puppet merges in (hopefully) quick succession
[19:51:00] See https://phabricator.wikimedia.org/T240439
[19:57:46] am here for ya razzi if you need anything
[20:00:38] (CR) Mforns: [V: +2 C: +2] "I got a +1 here and another in the Phab task."
[analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/635564 (https://phabricator.wikimedia.org/T264945) (owner: Mforns)
[20:02:35] !log stop nginx on matomo1002.eqiad.wmnet to switch to envoy
[20:02:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:03:24] Hey, I don't know if you've seen this: https://phabricator.wikimedia.org/T257118 there are lots of VMs in beta cluster that are related to analytics (aqs, zookeeper, kafka, ...) that are unclaimed (by claim, I mean just saying that it's being used) and beta cluster has reached its quota (and I'm trying to clean it a bit). Can you take a look and mark the ones that you think are used? https://phabricator.wikimedia.org/T257118
[20:03:49] and delete the ones that are not
[20:04:05] I'm slowly shutting down VMs these days
[20:05:00] Amir1: done, those are all needed
[20:05:05] oh aqs
[20:05:09] not sure about that ping milimetric probably?
[20:05:15] !log stop nginx on analytics-tool1004.eqiad.wmnet to switch to envoy (superset)
[20:05:17] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:06:41] still good, thanks!
[20:06:42] ottomata: it's in the docs to do beta testing on, but we use it very very rarely
[20:06:50] I mean cc Amir1
[20:07:03] milimetric: can it be scaled down a bit if possible?
[20:07:23] !log stop nginx on analytics-tool1007.eqiad.wmnet to switch to envoy (turnilo)
[20:07:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:07:54] Amir1: um, maybe, but I'm swamped I'm not sure when we'd get to that. If you need the space and can't find it anywhere else, just delete them and we'll figure something out.
[20:08:57] noted.
thanks
[20:10:11] !log stop nginx on analytics-tool1001.eqiad.wmnet to switch to envoy (hue)
[20:10:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:10:43] Analytics, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (razzi)
[20:12:48] !log stop nginx on analytics-tool1001.eqiad.wmnet to switch to envoy (hue-next)
[20:12:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:12:59] Analytics, Mobile: analytics.wikimedia.org is less readable on smartphones - https://phabricator.wikimedia.org/T266132 (gh87)
[20:15:03] Analytics, Patch-For-Review, User-Elukey: Move https termination from nginx to envoy (if possible) - https://phabricator.wikimedia.org/T240439 (razzi)
[20:23:27] Alright, all internal hosts have been switched to use envoy for tls!
[20:24:19] nice stuff!
[20:24:26] ottomata: if you're cool with it, I'm going to go ahead and deploy envoy to stats.wikimedia.org as well, since it should be no-downtime
[20:24:31] https://gerrit.wikimedia.org/r/c/operations/puppet/+/634667
[20:24:33] ya that sounds fine razzi
[20:24:44] Cool
[20:24:54] i actually am going to head out very soon, so if you are comfortable doing it without me or luca here, plz proceed!
[20:25:04] i think either way it's an easy revert
[20:25:06] so ya go ahead
[20:29:30] ottomata: I added envoy to stats.wikimedia.org, but I'm realizing I don't actually know where changes to hieradata/common/profile/trafficserver/backend.yaml would be applied, so I'm pausing here for now
[20:30:12] I assume running puppet agent on thorium.eqiad.wmnet would not apply the changes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/634669
[20:31:57] ah yeah, actually that might be good to wait for
[20:32:02] to sync up with luca and or someone from traffic
[20:32:17] that makes a change on the frontends
[20:32:22] for proxying traffic
[20:32:26] yeah good to pause
[20:34:38] Cool. Catch you later ottomata !
[20:35:04] (leaving soon but not yet right this minute but) l8888rs!! :)
[20:48:11] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Amire80) ... Another kind of related scenario that someone has just brought up in the Wikimedia Telegram chat: "Is there...
[20:50:46] Analytics, Analytics-Kanban, Patch-For-Review: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (razzi) I deployed this to an-launcher1002 but got a permission error: ` razzi@an-launcher1002:~$ sudo journalctl -u archive-maxmind-geoip-database.service | cat -- Logs begi...
[20:53:44] Analytics, Analytics-Kanban, Patch-For-Review: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (Ottomata) Hm, I think you can `hdfs dfs -chmod 775 /wmf/data/archive/geoip` and it should work.
[21:18:39] PROBLEM - Check the last execution of archive-maxmind-geoip-database on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit archive-maxmind-geoip-database https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[21:22:46] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Isaac) > Have in mind that per population data is not necessarily needed (it will be great to have at some point but it f...
[23:36:27] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) >A daily release to provide quick information for editors interested in very targeted editing. I suspect that this...
[23:38:08] Analytics, Analytics-Wikistats, Inuka-Team, Language-strategy, and 2 others: Have a way to show the most popular pages per country - https://phabricator.wikimedia.org/T207171 (Nuria) I would implement the daily "top" 1st and once that is in place I would add the monthly job, given the very diffe...
[23:50:32] Is there a dataset of the most downloaded files on Commons?