[02:26:49] (PS5) Milimetric: Publish monthly geoeditor numbers [analytics/refinery] - https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280)
[03:03:51] (PS6) Milimetric: Publish monthly geoeditor numbers [analytics/refinery] - https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280)
[03:05:29] (CR) Milimetric: "ok, this is ready for review, see file produced by test here: /user/milimetric/archive/geoeditors/public/geoeditors-monthly-2019-06.tsv" [analytics/refinery] - https://gerrit.wikimedia.org/r/530878 (https://phabricator.wikimedia.org/T131280) (owner: Milimetric)
[06:04:40] !log delete content of /tmp/* on HDFS
[06:04:42] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:05:29] ahahah even for deleting it goes in OOM
[06:07:16] ok with a bigger heap size it works
[06:07:25] it is moving files under the HDFS Trash
[06:14:09] it seems that hive is the spammer in this case
[06:15:18] I am cleaning up now stuff from 2017
[06:24:56] !log remove /tmp/hive-staging_hive_(2017|2018)* data from HDFS instead of /tmp/* to avoid causing hive failures (it needs to write temporary data for the current running jobs)
[06:24:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:32:55] 38.6M -> 36.1M :D
[06:38:47] sorry, I was missing the trash
[06:38:56] 35.7M now
[06:39:04] so ~3M files cleaned up
[06:39:06] :O
[06:42:38] Analytics, Analytics-Cluster: 500k files in hdfs /tmp - https://phabricator.wikimedia.org/T234954 (elukey) Thanks a lot Erik for this task, we weren't aware of it. It seems that hive leaves intermediate files in tmp when something fails, and we had a ton of garbage left from past years. Just cleaned up t...
[06:43:11] Analytics, Analytics-Cluster, Analytics-Kanban: 500k files in hdfs /tmp - https://phabricator.wikimedia.org/T234954 (elukey)
[06:46:12] Analytics, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (elukey) T214364 has to be taken into consideration since it lists the missing dependencies that we had to create for CDH on stretch.
[06:47:58] brb
[07:05:48] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (MGerlach) @MoritzMuehlenhoff opening this again since I cannot access the cluster anymore, e.g. via 'ssh mgerlach@stat1007.eqiad.wmnet' This happended aft...
[07:08:43] Hi team
[07:08:51] Thanks for the cleanup elukey!
[07:09:26] !log drop test_wmf_netflow fro druid analytics and restart turnilo
[07:09:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:09:30] joal: bonjour!
[07:12:34] elukey: about the hive-staging tmp files, I wonder if there would be a way to configure automatic retention
[07:15:33] from what I can read it seems that there isn't a way
[07:16:04] :S
[07:16:40] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Upgrade Spark to 2.4.x - https://phabricator.wikimedia.org/T222253 (10JAllemandou) There are a couple of jobs I'd like to check (mediawiki-history, checker, mobile-app-session jobs and wikidata jobs). If we can't run in yarn, I'll do...
[07:19:20] Analytics, Analytics-Kanban: Check home leftovers of smalyshev - https://phabricator.wikimedia.org/T231861 (elukey) Open→Resolved Everything cleaned up!
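For reference, a minimal sketch of the kind of cleanup described above, done with the stock HDFS CLI. Only the /tmp/hive-staging_hive_(2017|2018)* pattern comes from the !log entry; the heap size, the exact command invocation, and the progress check are assumptions.

```bash
# Hedged sketch of the /tmp hive-staging cleanup discussed above.
export HADOOP_CLIENT_OPTS="-Xmx4g"   # assumed value; the default client heap OOMs on a deletion this large

# hdfs dfs -rm -r moves the matches into the HDFS .Trash rather than deleting
# them outright; add -skipTrash to reclaim namenode objects immediately.
hdfs dfs -rm -r '/tmp/hive-staging_hive_2017*' '/tmp/hive-staging_hive_2018*'

# Rough progress check: count directories and files left under /tmp.
hdfs dfs -count /tmp
```

As noted in the 07:12-07:16 exchange below, there appears to be no built-in retention setting for these staging directories, so this kind of cleanup has to be triggered or scheduled externally.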
[07:19:25] Analytics, Analytics-Kanban: Check home leftovers of smalyshev - https://phabricator.wikimedia.org/T231861 (elukey)
[07:24:38] so joal, the only "blocker" for kerberos now is https://phabricator.wikimedia.org/T234229
[07:25:02] ok
[07:25:17] I asked to some people to test the Hadoop test cluster in https://phabricator.wikimedia.org/T212258
[07:25:38] \o/
[07:25:38] but in theory we could start planning the day of the madness
[07:27:00] there is some light at the end of the tunnel
[07:28:28] I can see a shadow through that light elukey - https://vignette.wikia.nocookie.net/gems-of-war/images/5/50/Troop_Kerberos.png/revision/latest/scale-to-width-down/350?cb=20160128204958
[07:29:55] :D
[07:30:03] * elukey starts https://etherpad.wikimedia.org/p/analytics-kerberos-deployment
[07:32:27] I wonder about the hdfs-labs rsync
[07:38:56] it would be nice to get rid of the hdfs fuse mountpoint
[07:39:02] it is really really brittle
[07:39:09] I do agree on that
[07:39:24] I wonder about how to copy data without opening a hole
[07:40:17] in theory we already have a hole
[07:40:27] ah
[07:40:29] since the scripts are pulling data from rsync
[07:40:56] we rely on the fs checks basically
[07:41:01] (this is how i see it)
[07:41:49] if we kerberize rsync, then the labstore nodes will keep copying but with some auth
[07:42:31] that is more or less similar (in my opinion) to copy directly from hdfs (but with proper krb auth)
[07:43:20] do you have concerns joal ?
[07:43:56] (brb sorry)
[07:49:32] elukey: I'm wondering about attack vector from labs host
[07:49:46] if there are kerb creds on that host
[07:50:10] I thought we originally were rsncing FROM a stat machine
[07:53:48] the rsync starts on the labstore node, logs in into stat1007 and pulls data from /mnt/hdfs afaik
[07:55:06] hm - I had the impression we prefered to have analytics hosts initiating copy, to open holes from our network (at the moment we were copying to cassandra for instance)
[07:57:48] it is configured in profile::dumps::distribution::datasets::fetcher
[07:57:59] (that is added to the labstore nodes)
[07:58:56] and dumps::web::fetches::stats
[08:00:05] in theory those nodes have a leg in labs, but they are not running vms etc..
[08:01:00] right
[08:01:03] if we kerberize the labstores, then they'll likely have a keytab stored somewhere, accessible by a certain user
[08:01:23] so the attack vector is to be able to read that file
[08:01:45] yup
[08:02:29] and it will be able to read only what is allowed on hadoop
[08:02:32] particularly if the machine is accessible as a labs tool - if the thing is only running file server, I guess it's less problematic
[08:02:51] elukey: right - we'll need proper user/group setting
[08:03:34] we could use what we have now, in theory it works
[08:03:46] the main problem is the authentication part :D
[08:05:43] right :)
[08:10:53] I asked to Moritz to take a look, we'll have an authoritative answer :)
[10:30:41] * elukey lunch!
[10:33:17] Analytics, User-Elukey: CDH Jessie dependencies not available on Stretch - https://phabricator.wikimedia.org/T214364 (MoritzMuehlenhoff) Can we narrow down which component needs libssl1.0.0? One of the many outdated/bundled ones? libssl1.0.0 is OpenSSL 1.0.1 which is EOLed by upstream. There's still a v...
[10:57:53] (PS3) Fdans: Add backfill queries for per referer mediarequests [analytics/refinery] - https://gerrit.wikimedia.org/r/541817 (https://phabricator.wikimedia.org/T228149)
[11:13:02] (PS4) Fdans: Add backfill queries for per referer mediarequests [analytics/refinery] - https://gerrit.wikimedia.org/r/541817 (https://phabricator.wikimedia.org/T228149)
[11:20:54] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (MoritzMuehlenhoff) Try running ` ssh-add ~/.ssh/id_ed25519 ` It will ask you for the passphrase of our SSH key. After running doing that, can you retry...
[11:26:14] Analytics: Analytics Access for Grant - https://phabricator.wikimedia.org/T235260 (gsingers)
[11:30:42] (CR) Fdans: [V: +1] "Both queries tested successfully" [analytics/refinery] - https://gerrit.wikimedia.org/r/541817 (https://phabricator.wikimedia.org/T228149) (owner: Fdans)
[11:57:41] Analytics, Operations, SRE-Access-Requests: Requesting access to analytics cluster for Martin Gerlach - https://phabricator.wikimedia.org/T232707 (MGerlach) That solved it. Thanks.
[11:58:04] I'm looking for a best practice recommendation, to produce graphs, reports, or other monitoring for a low-volume eventlogging stream.
[11:58:44] Hi awight - not sure I can help with best, but I somtimes practice :)
[11:58:46] Grafana would be fine, but daily granularity is all we really need.
[11:59:04] awight: I think reportupdater would be the easiest
[11:59:13] "commonly practiced" :)
[11:59:31] joal: Okay nice. So e.g. I write a daily hql query in reportupdater-queries?
[11:59:52] awight: Reportupdater is a scheduling tool, lighter than oozie, that we use to generate thinks a smaller scale
[12:00:13] indeed awight - A folder with our config and queries (you can easily generate multiple reports)
[12:01:01] That sounds like just what I need, thanks!
[12:01:05] and by reports we means TSV files - For display we usually use dashiki (but I'll let milimetric, fdans or mforns confirm, as I'm really not into UI)
[12:01:10] np awight :)
[12:01:26] awight: Thank you for using our stuff ;)
[12:02:00] For context, this is to monitor a new feature "reference previews" that may affect users' interactions with citations, hopefully for the better: T233108.
[12:02:01] T233108: Basic dashboards for Reference Previews tracking - https://phabricator.wikimedia.org/T233108
[12:02:48] ack awight
[12:05:36] great, a TSV might be all we need for this project.
[12:28:28] Just a note--I see a lot of HQL queries (all?) in that repo, which are formatting a date string from partition columns year, month, day to compare against script $1 and $2.
[12:28:35] That defeats the partitioning, I believe.
[12:29:33] awight: I have not checked RU queries - But will after your comment :)
[12:31:45] +1. I'll do it the other way around for my script, parsing the parameter strings instead.
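A sketch of the partition pruning point awight raises above (the table name is a placeholder and the date literal stands in for the report's $1 start-date parameter, so this is an illustration, not a quote from the reportupdater-queries repo): wrapping the year/month/day partition columns in a formatting expression means Hive cannot use the predicate to prune partitions, while comparing the raw columns lets it read only the matching partitions.

```sql
-- Pattern being described above (sketch): building a date string out of the
-- partition columns defeats partition pruning, so every partition is scanned.
SELECT COUNT(*) AS events
FROM event.some_eventlogging_table
WHERE CONCAT(CAST(year AS STRING), '-',
             LPAD(CAST(month AS STRING), 2, '0'), '-',
             LPAD(CAST(day AS STRING), 2, '0')) = '2019-10-08';

-- Pruning-friendly alternative for a daily report: parse the date parameter
-- outside the query and compare the integer partition columns directly.
SELECT COUNT(*) AS events
FROM event.some_eventlogging_table
WHERE year = 2019 AND month = 10 AND day = 8;
```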
[12:51:09] !log deployed eventlogging python3 version in deployment-prep
[12:51:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:51:19] awight: --^ thanks a lot for the help btw
[12:52:12] elukey: Happy to help rubber-duck debug :-)
[12:54:12] (PS1) Awight: New report for Reference Previews [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/542419 (https://phabricator.wikimedia.org/T233108)
[12:54:13] at the moment I don't see anything on fire
[12:56:03] joal: ^ my attempt at optimizing the time partition matching. One thing that bothers me is that I don't see a simple way to do this with multi-day queries. For example, (month >= start.month and month < end.month) is incorrect when crossing the new year, since it becomes the impossible (month >= 12 and month < 1).
[12:56:26] I'm sure there's a way to do this comparison, but it wasn't obvious to me.
[12:56:30] gtg, thanks again!
[12:56:38] Thank you awight :)
[13:05:34] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (elukey) Deployed in beta, no errors from python3 but kafka-python seems not doing its work. All the kafka consumers are stuck, this is the stacktrace of all-ev...
[13:05:53] sigh --^
[13:51:56] helloooo
[13:55:26] hi mforns
[13:55:33] hey joal :]
[13:58:57] o/
[14:03:55] hey team, sorry had to restart
[14:19:47] Analytics, Analytics-Kanban: HivePartition (refinery::Hive.py) does not allow partition values to have dots (.) - https://phabricator.wikimedia.org/T235268 (mforns)
[14:21:32] (PS1) Mforns: Allow HivePartitions to have dots (.) in their values [analytics/refinery] - https://gerrit.wikimedia.org/r/542441 (https://phabricator.wikimedia.org/T235268)
[14:30:43] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (elukey) Tried to switch the kafka consumer to the confluent one, but ended up in a similar situation: ` Waiting for the GIL
Analytics: MediaWiki history dumps have some events in 2025 - https://phabricator.wikimedia.org/T235269 (mforns)
[14:37:59] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (elukey) PEBCAK of the day: I picked the wrong consumer, since this pulls events from valid mixed, that does't get populated if the processors are not working.....
[14:41:22] a-team someone knows how to reach the WMF staff calendar?
[14:42:42] I just searched for it in Google calendar, no?
[14:47:31] mforns: do you need to add the cal to yours? It should be searchable in the box on the left in theory
[14:48:21] milimetric, elukey, I did, but it's not there (at least for me)
[14:48:55] weird, maybe email oit?
[14:51:07] mforns: https://office.wikimedia.org/wiki/Office_IT/Calendars#List_of_WMF_Calendars
[14:51:30] then you should be able to add it
[14:51:35] with the + button
[14:51:55] elukey, <3
[14:51:57] thanks
[15:05:21] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (leila) @elukey the day has arrived. Thanks for your patience. Please purge all home directories in the task Description. We have extracted the data we need for release. @Isaac Thanks for your help t...
[15:12:52] leila: \o/
[15:13:02] * leila hides
[15:13:04] :D
[15:13:27] just to confirm, I can nuke all right?
[15:13:33] ow yes
[15:13:38] the data extracted is elsewhere etc..
[15:13:43] correct.
[15:13:48] super, proceeding :)
[15:13:56] * leila holds her breath
[15:23:20] Analytics: browser dashboards not updated since 09/29 - https://phabricator.wikimedia.org/T235278 (Nuria)
[15:28:54] eventlogging seems to work in beta with py3
[15:28:57] * elukey dances
[15:29:42] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (elukey)
[15:30:07] elukey, mforns : i think after the recentreportupdater changes reports have not executed last week: https://phabricator.wikimedia.org/T235278
[15:30:26] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (elukey) Open→Resolved Everything cleaned up! @Isaac if possible please clean the data that I copied originally in your home directories (if not needed anymore of course!).
[15:30:57] nuria, I think that is fine, 2019-09-29 is the start of the week
[15:31:06] RU weekly reports start on sunday
[15:31:30] we do not have yet a full week after that
[15:31:35] mforns: but it is missing next week, there should be one on teh 6/7
[15:32:08] next computation will take place next monday and will fill in the 2019-10-06 data point
[15:32:42] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (Isaac) Done! Thanks for the reminder @elukey !
[15:33:00] current last data point spans from 2019-09-29 to 2019-10-05 both included
[15:33:24] the next week should be from 2019-10-06 to 2019-10-12
[15:33:52] but it's still missing today and tomorrow
[15:39:14] Analytics: browser dashboards not updated since 09/29 - https://phabricator.wikimedia.org/T235278 (mforns) This is normal behavior I think. In reportupdater weekly reports span weeks starting on sunday 00:00:00 and ending following saturday 23:59:59. Also, the data-point receives the label of the start date...
[15:48:54] Analytics: browser dashboards not updated since 09/29 - https://phabricator.wikimedia.org/T235278 (mforns) Note that the pingback reports are still bad after the migration to Hive. But not because of scheduling reasons, rather the queries are still not doing what the MySQL ones did before. Working on that r...
[15:56:40] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (elukey) Fixed some little issues here and there and updated the code review. Re-deployed in beta, the service seems now working fine!
[16:25:40] awight: re your > 12 < 1 comment , those are string comparators 2018-12-12 versus 2019-01-01, right?
[16:26:07] awight: not integer n> 12 and n <1
[16:28:11] Analytics: Add partition pruning for wmf.browser_general - https://phabricator.wikimedia.org/T235283 (mforns)
[16:40:46] * elukey off!
[16:59:20] !Log re-refining events from WikipediaPortal schema T234461
[16:59:20] T234461: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461
[17:01:43] Analytics, Analytics-Kanban: Superset not able to load a reading dashboard - https://phabricator.wikimedia.org/T234684 (JKatzWMF) @elukey Thanks for making this adjustment! That makes a huge difference.
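On awight's earlier multi-day question and nuria's 16:25 reply: the concatenated, zero-padded form compares correctly as strings across a year boundary ('2018-12-12' < '2019-01-01'), at the cost of the pruning concern discussed earlier; a range that crosses the new year can also be expressed directly on the integer partition columns by writing out the boundary cases. A sketch, with a purely hypothetical table and an example range of 2018-12-28 (inclusive) to 2019-01-04 (exclusive):

```sql
-- Equivalent of '2018-12-28' <= date < '2019-01-04', expressed only in terms
-- of the integer partition columns so Hive can still prune partitions.
SELECT COUNT(*) AS events
FROM event.some_eventlogging_table
WHERE (year > 2018 OR (year = 2018 AND (month > 12 OR (month = 12 AND day >= 28))))
  AND (year < 2019 OR (year = 2019 AND (month < 1  OR (month = 1  AND day < 4))));
```

The impossible-looking clauses (month > 12, month < 1) are harmless; they are only there so the same shape works for ranges that do not cross a year boundary.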
[17:53:17] Analytics, Research: Taxonomy of new user reading patterns - https://phabricator.wikimedia.org/T234188 (Isaac) Adding in a few links for reference: Definitely want to tie this to two projects from the Growth Team: [[https://www.mediawiki.org/wiki/Growth/Understanding_first_day|Understanding First Day]] a...
[18:52:55] Analytics, Analytics-EventLogging, Analytics-Kanban, Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (Nuria) oh-oh backfilling not working. These are the sites however, mostly span and www.wikipedia.org ` >select distinct webhost, count(*) fro...
[18:53:08] (CR) Mforns: Add the MobileWebUIActionsTracking schema to EventLogging whitelist (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/541946 (https://phabricator.wikimedia.org/T234563) (owner: MNeisler)
[18:58:48] (PS1) Nuria: Removing editCountBucket from Popup schema [analytics/refinery] - https://gerrit.wikimedia.org/r/542540
[19:11:49] (CR) Ottomata: [C: +1] Allow HivePartitions to have dots (.) in their values (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/542441 (https://phabricator.wikimedia.org/T235268) (owner: Mforns)
[19:13:36] Analytics, Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop - https://phabricator.wikimedia.org/T223414 (mforns) @CCicalese_WMF Hi :] After checking the results calculated by reportupdater for t...
[20:29:55] Hi ! I have a quick question about getting on to stat1007. whenever i try to ssh it prompts me for a password but Nuria mentioned i should not have to provide one. Does anyone know what i might be doing wrong https://usercontent.irccloud-cdn.com/file/6y259pjA/image.png
[20:38:33] jkumalah: have you set up your ssh keys so you can acces production?
[20:38:36] *access?
[20:59:19] I went through the whole process and just got approved
[20:59:33] i'm not sure if i need to do additional configuration
[21:19:39] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery] - https://gerrit.wikimedia.org/r/542540 (owner: Nuria)
[21:20:56] (CR) Mforns: Allow HivePartitions to have dots (.) in their values (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/542441 (https://phabricator.wikimedia.org/T235268) (owner: Mforns)
[22:01:57] (CR) Nuria: [C: +2] Removing editCountBucket from Popup schema [analytics/refinery] - https://gerrit.wikimedia.org/r/542540 (owner: Nuria)
[22:02:31] jkumalah: did you gave you ssh keys to someone from SRE via phab?
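For the stat1007 password prompt above, a rough checklist (a sketch only — the key filename is an example, and the bastion/ProxyJump setup is the one documented on wikitech under Production access, not reproduced here). Production hosts authenticate with the SSH key whose public half was attached to the approved access request, so a password prompt usually means no usable key is being offered:

```bash
# Sketch only: the key path is an assumption; use the key pair whose public
# half was attached to the Phabricator access request.
ls ~/.ssh/id_ed25519 ~/.ssh/id_ed25519.pub

# Load the key into the agent so ssh can offer it (same advice as in T232707 above).
eval "$(ssh-agent)"
ssh-add ~/.ssh/id_ed25519

# Retry with verbose output to see whether the key is actually offered and accepted.
ssh -v stat1007.eqiad.wmnet
```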
[22:16:04] jkumalah: approval does just work for you to be able to get your keys listed in production
[22:16:42] jkumalah: if you did not generated ssh keys (or that sounds complicated) please sync in with devs on your team so they can help you
[22:35:34] Analytics, Analytics-EventLogging, Operations, decommission, ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Jclark-ctr)
[22:42:45] Analytics, DC-Ops, Operations, decommission, ops-eqiad: Decommission old Kafka analytics brokers: kafka1012,kafka1013,kafka1014,kafka1020,kafka1022,kafka1023 - https://phabricator.wikimedia.org/T226517 (Jclark-ctr)
[23:16:08] Analytics, Analytics-EventLogging, Operations, decommission, ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul) ` papaul@asw2-b-eqiad# show | compare [edit interfaces] - ge-5/0/12 { - description dbproxy1004; - }
[23:17:11] Analytics, Analytics-EventLogging, Operations, decommission, ops-eqiad: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul)
[23:27:36] Analytics, Analytics-EventLogging, Operations, decommission, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul)
[23:29:16] Analytics, Analytics-EventLogging, Analytics-Kanban, Wikimedia-Portals: Sudden drop in WikipediaPortal events - https://phabricator.wikimedia.org/T234461 (Nuria) I should really read the docs I WROTE, re running refine
[23:29:29] Analytics, Analytics-EventLogging, Operations, decommission, and 2 others: Decommission dbproxy1004 and dbproxy1009 - https://phabricator.wikimedia.org/T228768 (Papaul) @Jclark-ctr once you add dbproxy1009 to the decom Sheet, you can resolve the task. Thanks