[04:27:03] PROBLEM - Check the last execution of monitor_refine_mediawiki_events on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [06:14:33] morning! [06:14:57] Hi elukey_ - Hi team [06:15:20] I'll be on and off today, Naé is sick and I need to keep her home :( [06:16:46] joal: ack don't worry, please take care of her :) [06:21:06] !log restart camus mediawiki_events on an-coord1001 with increased mapreduce heap size [06:21:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:41:27] !log manually applied https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/538235/ on an-coord1001 [07:41:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:44:29] !log restart manually refine_mediawiki_events on an-coord1001 with --since 48 to force the refinement after camus backfilled the missing data [07:44:30] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:46:18] ok everything seems good now [08:05:31] RECOVERY - Check the last execution of monitor_refine_mediawiki_events on an-coord1001 is OK: OK: Status of the systemd unit monitor_refine_mediawiki_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:17:27] PROBLEM - Check the last execution of refinery-drop-webrequest-raw-partitions on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-webrequest-raw-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:20:57] PROBLEM - Check the last execution of refinery-drop-eventlogging-partitions on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-eventlogging-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:22:14] all right, this is of course my fault since we are running a mixture of 
python2 and python3 [08:22:17] on an-coord1001 [08:22:29] need to quickly deploy refinery to apply the last fixes [08:22:35] without hdfs/etc.. [08:24:49] !log deploy refinery to apply all the python2 -> python3 fixes [08:24:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [08:35:31] 10Analytics, 10Analytics-EventLogging: Update client-side event validator to support (at least) draft 3 of JSON Schema - https://phabricator.wikimedia.org/T182094 (10phuedx) 👌 Also, the client-side event validation was removed some time ago anyway! Thanks for tidying up, @Ottomata. [08:40:47] (03PS1) 10Elukey: hdfs.py: use startsWith with a string when checking sh()'s output [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) [08:41:14] (03CR) 10Elukey: [V: 03+1] "Works on an-coord1001" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) (owner: 10Elukey) [08:51:19] PROBLEM - Check the last execution of refinery-drop-webrequest-refined-partitions on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit refinery-drop-webrequest-refined-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:51:55] RECOVERY - Check the last execution of refinery-drop-eventlogging-partitions on an-coord1001 is OK: OK: Status of the systemd unit refinery-drop-eventlogging-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [08:53:20] yes yes [09:24:08] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: Move the Analytics Refinery to Python 3 - https://phabricator.wikimedia.org/T204735 (10elukey) ` Sep 23 08:33:35 an-coord1001 refinery-drop-older-than[40168]: File "/srv/deployment/analytics/refinery/bin/refinery-drop-older-than", line 595, in S... 
[09:25:00] !log temporarily disable *drop* timers on an-coord1001 to verify refinery python change with the team [09:25:01] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:25:30] as I wrote in the task I am 99% positive that everything works fine, but I'll wait for Marcel's confirmation to be super sure [09:25:37] this should be the last fix hopefully [09:35:33] 10Analytics, 10Analytics-EventLogging, 10Technical-Debt (Deprecation): Eventlogging is not compatible with python3? - https://phabricator.wikimedia.org/T233591 (10awight) [09:40:05] 10Analytics, 10Analytics-EventLogging, 10Technical-Debt (Deprecation): Eventlogging is not compatible with python3? - https://phabricator.wikimedia.org/T233591 (10elukey) Thanks a lot for the task :) We started collecting python2 -> python 3 use cases in https://phabricator.wikimedia.org/T204734, and there... [09:42:08] 10Analytics, 10Analytics-EventLogging, 10Technical-Debt (Deprecation): Eventlogging is not compatible with python3? - https://phabricator.wikimedia.org/T233591 (10awight) >>! In T233591#5515270, @elukey wrote: > We started collecting python2 -> python 3 use cases in https://phabricator.wikimedia.org/T204734,... [09:42:19] 10Analytics, 10Analytics-EventLogging, 10Technical-Debt (Deprecation): Eventlogging is not compatible with python3? 
- https://phabricator.wikimedia.org/T233591 (10awight) [09:42:22] 10Analytics: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (10awight) [09:43:37] 10Analytics: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (10awight) [09:43:45] 10Analytics: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (10awight) [10:42:12] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10elukey) Only on db1107 there are `_log_sql_aff*` files for up 507G, that are not on db1108. No idea what they are about. If... [10:49:55] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10elukey) There is also another option that we could think about to avoid spending a ton of time on this. Since we want to allow... [11:02:04] * elukey lunch! [11:11:19] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10jcrespo) s/mysqldump/mydumper/ Regarding the machines, I would suggest to either decom both or keep both. Keeping a service w... [12:05:11] ls [12:05:15] oops :) [12:07:03] :) [12:14:03] 10Analytics, 10Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (10JAllemandou) Hi @MGerlach, Awesome results :) I have some requests for improvement before we plan on how we move forward: - Can you confirm which spark-kernel yo... 
[12:17:51] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 6 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10JAllemandou) Hi @Ottomata - I like the `annotations` subobject - Explicit for the win. I however have no goo... [12:44:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10elukey) Yes I am aware, but in case of a big disaster (like host completely broken) we could think about an alternative starti... [12:46:39] 10Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10elukey) [12:47:24] 10Analytics: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10elukey) T233604 tracks the work to import the openjdk-8 package to a special component for Debian Buster, thanks Moritz! [12:52:53] https://issues.apache.org/jira/browse/BIGTOP-3021 - bigtop removed Hue a long time ago [12:54:01] (03PS1) 10Joal: Update subnet lists for IpUtil [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/538607 (https://phabricator.wikimedia.org/T233504) [12:55:26] but I think that in https://github.com/apache/bigtop/tree/bigtop-alpha they are working on Hadoop 3 [12:57:59] elukey: without hue, not so nice I guess [12:58:10] also oozie gone [12:58:22] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Archive data on eventlogging MySQL to analytics replica before decomisioning - https://phabricator.wikimedia.org/T231858 (10Marostegui) >>! In T231858#5515959, @elukey wrote: > Yes I am aware, but in case of a big disaster (like host completely broke... 
[12:58:26] woa [13:08:40] 10Analytics-Kanban: Per referer mediarequests returns requests count as string - https://phabricator.wikimedia.org/T233622 (10fdans) [13:09:31] (03PS1) 10Fdans: Cast mediarequests value as int before submitting the response [analytics/aqs] - 10https://gerrit.wikimedia.org/r/538611 (https://phabricator.wikimedia.org/T233622) [13:10:37] (03CR) 10jerkins-bot: [V: 04-1] Cast mediarequests value as int before submitting the response [analytics/aqs] - 10https://gerrit.wikimedia.org/r/538611 (https://phabricator.wikimedia.org/T233622) (owner: 10Fdans) [13:11:08] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 6 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Ottomata) > The capsule is coming back! Haha, not quite. The fields would still be defined explicitly. But... [13:13:06] (03PS2) 10Fdans: Cast mediarequests value as int before submitting the response [analytics/aqs] - 10https://gerrit.wikimedia.org/r/538611 (https://phabricator.wikimedia.org/T233622) [13:16:38] (03PS4) 10Fdans: Correct parameters in mediarequest cassandra jobs [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936 [13:16:54] (03CR) 10Fdans: Correct parameters in mediarequest cassandra jobs (032 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/537936 (owner: 10Fdans) [13:18:18] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 6 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Ottomata) Or even more DRY: `lang=yaml title: analytics/mediawiki/revision/create $id: /analytics/mediawiki... 
[13:18:52] (03PS1) 10Joal: Add network-origin to the geoeditors-daily table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504) [13:20:49] (03PS2) 10Joal: Add network-origin to the geoeditors-daily table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538613 (https://phabricator.wikimedia.org/T233504) [13:21:01] (03CR) 10Ottomata: [C: 03+2] Update subnet lists for IpUtil [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/538607 (https://phabricator.wikimedia.org/T233504) (owner: 10Joal) [13:24:03] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Jul-Sep 2019), 10Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (10JAllemandou) Provided 2 patches based on the request above as examples. More dis... [13:30:23] 10Analytics, 10Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (10MGerlach) Thanks for the feedback @JAllemandou **Regarding your questions:** - yes, I used PySpark Yarn (large) - looking at all namespaces is ok - I will... [13:47:01] a-team :D https://usercontent.irccloud-cdn.com/file/LIDjKE3J/Screen%20Shot%202019-09-23%20at%203.46.30%20PM.png [13:47:21] it looks a lot less exciting when you add the image breakdown though [13:47:23] COOOL [13:47:29] https://usercontent.irccloud-cdn.com/file/cRLtlw6x/Screen%20Shot%202019-09-23%20at%203.46.39%20PM.png [13:47:43] wow! [13:47:56] nice work! [13:50:05] does anyone have a clue on why image requests are increasing wildly and in a suspiciously linear way since mid July? 
[13:52:15] there are things like wiki love monuments that may have contributed [13:53:21] in terms of absolute numbers this metric is pretty insane [13:54:20] "3 billion images every day are requested to the Wikipedia servers" some people over here are gonna be drooling over that line [13:55:01] 10Analytics, 10Better Use Of Data, 10Product-Infrastructure-Team-Backlog, 10Epic: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (10Ottomata) [13:55:06] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Legacy (Watching / External), 10Services (watching): Modern Event Platform: Stream Intake Service - https://phabricator.wikimedia.org/T201068 (10Ottomata) [13:57:47] fdans: Nice :) A funny approx we can make is that on average, a wikipedia page contains 5 images :) [13:58:02] ~3b images for ~600M pageviews :) [13:58:35] joal: yes I was thinking about that :) [13:59:00] joal: how about that perfect upward slope in the last 2 months though? [13:59:09] fdans: no clue about thatr [13:59:32] fdans: interestingly, the slope is also true for videos (it seems) [13:59:51] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (10elukey) Now I get it thanks, this is what we already do for other datasets, so it is not a new use case. The rsync should be from the dumps... 
[14:02:17] joal: but pageviews have not increased [14:02:17] http://localhost:8000/dist-dev/#/all-projects/reading/total-page-views/normal|bar|3-month|~total|daily [14:02:30] nope [14:05:42] (sorry for the localhost address) [14:08:35] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Legacy (Watching / External), 10Services (watching): Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) [14:15:53] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (10Ottomata) [14:22:05] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) [14:27:56] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Ottomata) [14:56:22] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Allow all Analytics tools to work with Kerberos auth - https://phabricator.wikimedia.org/T226698 (10mforns) @elukey > Eventually the data will be accessible via datasets.w.o or dumps.w.o right? 
Yes, from dumps.w.o [14:56:52] mforns: o/ [14:57:07] hey elukey :] [14:57:13] hello :) [14:57:20] when you have a minute I'd need your review for https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/538569/ [14:57:41] I applied the fix on an-coord1001 and it looks good, but all the drop scripts depend on it [14:57:51] so before doing something nasty, I wanted to triple check :) [14:57:58] (I ran one drop script and it worked fine) [14:58:08] (the others are all disabled) [15:21:17] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10CPT Initiatives (Modern Event Platform (TEC2)), and 2 others: Modern Event Platform: Stream Configuration: Implementation - https://phabricator.wikimedia.org/T233634 (10Ottomata) p:05Triage→03Normal [15:23:59] 10Analytics, 10Analytics-Cluster, 10DC-Ops, 10Operations, 10ops-eqiad: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted - https://phabricator.wikimedia.org/T232069 (10elukey) This host can keep running with one disk less, fixed it with https://gerrit.wikimedia.org/r/#/c/operations... [15:24:37] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Jul-Sep 2019), 10Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (10fdans) p:05Triage→03High [15:25:12] 10Analytics: Change HDFS balancer threshold - https://phabricator.wikimedia.org/T231828 (10fdans) [15:25:30] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Jul-Sep 2019), 10Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (10Ottomata) Not sure about the `is_cloud_vps` name...can the dashboard just examin... 
[15:26:05] 10Analytics: Upgrade eventlogging to Python 3 - https://phabricator.wikimedia.org/T233231 (10fdans) p:05Triage→03High [15:26:23] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Ottomata) p:05Triage→03High [15:26:39] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) p:05Triage→03Normal [15:26:44] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (10Ottomata) p:05Triage→03Normal [15:26:51] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (10Ottomata) a:05Ottomata→03None [15:27:22] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) a:05Ottomata→03None [15:30:26] 10Analytics, 10Operations, 10Traffic: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (10Ottomata) 05Open→03Declined Nuria I think we can decline this yes? Doing so, feel free to reopen if I am wrong. [15:35:54] (03PS3) 10Elukey: [WIP] Move codebase to python3 [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/537268 (https://phabricator.wikimedia.org/T204736) [15:38:24] 10Analytics, 10Operations, 10Traffic: Cookies and misc services caching - https://phabricator.wikimedia.org/T232453 (10fdans) cc @Aklapper gasserandreas seems to be moving stuff around our board, could you take a look at it? Seems malicious. 
[15:38:56] (03CR) 10Elukey: "This is the first version that passes tests on localhost, but it will likely fail in Jenkins since it should be still using python2." [analytics/reportupdater] - 10https://gerrit.wikimedia.org/r/537268 (https://phabricator.wikimedia.org/T204736) (owner: 10Elukey) [15:42:55] 10Analytics: Add urlshortener to Turnilo - https://phabricator.wikimedia.org/T233336 (10fdans) https://www.mediawiki.org/wiki/Extension:UrlShortener [16:04:47] ottomata: odd thing, in hive event.mediawiki_cirrussearch_request ends on sept 20, 23:00. Nothing for sept 21 00:00 onwards [16:05:42] ... [16:05:45] looking [16:11:58] mforns: o/ [16:12:07] heyy elukey [16:12:59] I was looking if other scripts use any of the outputs returned by Hdfs.methods and have to deal with bytes encoding [16:13:09] I forgot to mention during standup that after my fix on an-coord1001 refinery-drop-eventlogging-partitions.service restarted [16:13:27] and it worked.. I got a heart attack when I saw 2019-09-23T08:46:56 INFO Removing 439 directories for tree depth 6. [16:13:34] but it seems in line with the usual [16:13:43] and also on hdfs I can see the past three months [16:13:54] xD [16:14:27] good point on checking other scripts [16:14:42] yea, tree depth 6 is the hourly level, 439 is probably like a couple days [16:14:53] for all schemas [16:15:05] yep yep but at first sight it seemed doomsday [16:15:06] :D [16:15:27] the main problem of ls() is that it uses sh() directly [16:15:44] that on python2 should have always returned a text string [16:15:47] like now [16:15:58] but for some reason it wasn't so b'Found ' was probably needed? [16:16:39] elukey, I remember having had to workaround the fact that sh() returned bytes in python2 [16:16:50] outside of Hdfs.py [16:17:22] that's why I was trying to remember what it was and find it [16:18:52] ah ok all right so my fix to force text string is not working everywhere then [16:24:41] not sure! 
[16:26:59] I mean, it may not work if you compare it with bytes strings [16:27:16] so if it doesn't work, it will fail for sure and we'll notice (like in this case) [16:27:45] but in my mind bytes strings are a bit cumbersome to use in scripts when you pass stuff around to functions [16:28:57] yea, of course [16:29:05] I think I found one in: [16:29:26] import-mediawiki-dumps line 117 [16:29:34] elukey, ^ [16:30:34] yes good catch! [16:31:21] ah so ls() does the splitline() before returning [16:31:22] good [16:31:54] (03PS2) 10Elukey: hdfs.py: use startsWith with a string when checking sh()'s output [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) [16:34:04] ah snap there is also 'file_type': 'f' if parts[0].decode('utf-8')[0] == '-' else 'd', [16:34:09] in the ls() itself [16:34:35] elukey, there's a couple in Hdfs.py itself, marking them in the CR [16:35:42] yes yes all the decode() ones [16:35:43] sigh [16:35:59] (03PS3) 10Elukey: hdfs.py: use startsWith with a string when checking sh()'s output [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) [16:37:09] (03CR) 10Mforns: hdfs.py: use startsWith with a string when checking sh()'s output (034 comments) [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) (owner: 10Elukey) [16:37:24] (03PS4) 10Elukey: hdfs.py, import-mediawiki-dumps: use sh()'s output as text not binary [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) [16:37:37] no, good that we found them! 
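(Editor's aside: the bytes-vs-str pitfall discussed above is easy to reproduce. A minimal sketch, assuming a simplified stand-in for refinery's sh() helper; the names here are illustrative, not the actual refinery code.)

```python
import subprocess

def sh(command):
    # Stand-in for refinery's sh() helper (hypothetical): on Python 3,
    # subprocess returns bytes by default, so decode before handing the
    # output to string-based callers.
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout.decode("utf-8")

output = sh(["echo", "Found 3 items"])
assert output.startswith("Found")

# Without the decode, the same check raises on Python 3:
# b"Found 3 items".startswith("Found")
# -> TypeError: startswith first arg must be bytes or a tuple of bytes, not str
```

On Python 2 the comparison silently worked because subprocess output and string literals were both byte strings; Python 3 turns the mismatch into a hard TypeError, which is why the startswith() sites and the stray decode('utf-8') calls all had to change together.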
[16:38:03] I don't think the failures would be catastrophic, they just will make the script error [16:38:19] yes exactly [16:38:24] (03PS5) 10Elukey: hdfs.py, import-mediawiki-dumps: use sh()'s output as text not binary [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) [16:41:08] elukey, I think it looks good now! should we add the import-mediawiki-dumps one to this change? [16:41:39] mforns: already done [16:41:48] oh, sorry no see [16:42:33] oh, you created another patch sorry [16:42:34] we can probably merge/deploy quickly via scap if you are ok [16:42:42] well renamed it [16:42:49] it is the same one [16:43:04] ? sorry, I'm messing up with Gerrit urls... [16:43:30] 10Analytics, 10Analytics-Kanban, 10Cloud-Services, 10Developer-Advocacy (Jul-Sep 2019), 10Patch-For-Review: add whether an edit happened on cloud VPS to geoeditors-daily dataset - https://phabricator.wikimedia.org/T233504 (10bd808) >>! In T233504#5513225, @Nuria wrote: > Can someone from #cloud-services-... [16:44:24] (03CR) 10Mforns: [C: 03+2] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) (owner: 10Elukey) [16:44:28] <3 [16:44:41] ottomata, joal - going to do a quick refinery deploy ok? [16:44:47] (03CR) 10Elukey: [V: 03+2] hdfs.py, import-mediawiki-dumps: use sh()'s output as text not binary [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538569 (https://phabricator.wikimedia.org/T204735) (owner: 10Elukey) [16:45:43] mforns: does it work for you? 
[16:46:19] all right will take this as yes :D [16:46:42] !log deploy refinery again (no hdfs, no source) to deploy the latest python fixes [16:46:44] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:52:49] elukey, sure :] [16:56:14] mforns: seems working :) [16:56:19] I found an interesting warning [16:56:20] :D [16:56:20] ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8' [16:56:29] hm [16:56:44] it seems that we do open /dev/null for writing but never closing it [16:56:47] this seems unrelated though.. [16:56:50] oh yes [16:56:54] I just mentioned [16:57:10] that might be drop-older-than [16:57:15] I can not recall another script that does that [16:57:26] lemme check [16:58:20] lol [16:58:20] refinery-drop-older-than[83604]: TypeError: a bytes-like object is required, not 'str' [16:58:26] elukey, yea, drop-older-than is missing a couple closes [16:58:30] it is in [16:58:30] Sep 23 16:55:52 an-coord1001 refinery-drop-older-than[83604]: File "/srv/deployment/analytics/refinery/python/refinery/hive.py", line 353, in query [16:58:32] !! [16:58:33] Sep 23 16:55:52 an-coord1001 refinery-drop-older-than[83604]: f.write(query) [16:59:26] ahh because it writes the tmp file [16:59:58] f.write(query.decode()) [17:00:03] that should fix it [17:00:14] where in drop-older-than? [17:00:20] or is it Hive.py? [17:00:42] in hive.py:353 [17:01:03] but it gets called in File "/srv/deployment/analytics/refinery/bin/refinery-drop-older-than", line 146, in drop_partitions [17:02:13] that kinda makes sense [17:02:13] elukey, but why does a file need bytes to write? [17:03:23] this is a good point, maybe the file is opened in binary mode? [17:04:17] hmm [17:04:39] https://docs.python.org/3/library/tempfile.html#tempfile.TemporaryFile [17:04:48] "The mode parameter defaults to 'w+b' so that the file created can be read and written without being closed." 
[17:05:14] "Binary mode is used so that it behaves consistently on all platforms without regard for the data that is stored. " [17:05:52] so in theory we could use mode='w' [17:05:56] and solve the problem [17:07:20] aha [17:07:24] makes sense! [17:08:51] code review incoming :) [17:08:54] (03PS1) 10Elukey: hive.py: open the temporary file in text mode [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538675 (https://phabricator.wikimedia.org/T204735) [17:09:23] mforns: to avoid another deploy, I can quickly change it on an-coord1001 and then test it [17:09:27] if you like the patch --^ [17:09:36] lookin [17:09:58] (03CR) 10Mforns: [C: 03+2] "LGTM!" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538675 (https://phabricator.wikimedia.org/T204735) (owner: 10Elukey) [17:10:04] ok! [17:12:14] mforns: worked! [17:12:17] for refinery-drop-webrequest-raw-partitions.service it dropped [17:12:19] :D [17:12:21] 10Analytics, 10Operations, 10User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (10RobH) p:05Triageβ†’03Normal [17:12:24] Dropping 18 Hive partitions from table wmf_raw.webrequest. [17:12:32] Removing 26 directories for tree depth 6. [17:13:04] yea, looks good [17:15:58] RECOVERY - Check the last execution of refinery-drop-webrequest-raw-partitions on an-coord1001 is OK: OK: Status of the systemd unit refinery-drop-webrequest-raw-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:16:28] ah refinery-drop-webrequest-raw-partitions.service has only 31 days [17:16:31] as retention [17:16:34] didn't know that [17:16:53] ok all looks good [17:16:58] (03CR) 10Elukey: [V: 03+2] hive.py: open the temporary file in text mode [analytics/refinery] - 10https://gerrit.wikimedia.org/r/538675 (https://phabricator.wikimedia.org/T204735) (owner: 10Elukey) [17:17:17] \o/ [17:18:50] thanks mforns! 
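(Editor's aside: the TemporaryFile behaviour quoted from the docs above can be demonstrated directly. A minimal sketch; the query string is illustrative, not the actual hive.py code, and mode='w+' is used here only so the content can be read back.)

```python
import tempfile

query = "SHOW PARTITIONS wmf_raw.webrequest;"

# Default mode is 'w+b' (binary), so writing a str raises TypeError.
with tempfile.TemporaryFile() as f:
    try:
        f.write(query)
    except TypeError as e:
        print(e)  # a bytes-like object is required, not 'str'

# Opening in text mode accepts str directly; this is the shape of the
# fix applied to hive.py.
with tempfile.TemporaryFile(mode="w+") as f:
    f.write(query)
    f.seek(0)
    assert f.read() == query
```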
[17:19:05] no thank you, sorry for you working so late [17:19:11] nah super fine :) [17:19:18] I think that I'll need to deploy a third time [17:19:22] RECOVERY - Check the last execution of refinery-drop-webrequest-refined-partitions on an-coord1001 is OK: OK: Status of the systemd unit refinery-drop-webrequest-refined-partitions https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [17:19:33] maybe this may hit statXX where report updater runs? [17:19:45] not sure if it uses tmp files [17:20:50] going out for a run, then I'll re-check [17:21:09] it should be ok, but in case I'll fix it as soon as I am back with another deploy [17:21:12] ttl o/ [17:21:40] Thanks elukey and mforns!! [17:35:06] elukey, reportupdater does not use tempfile library [17:41:34] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Operations, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (10Ottomata) [18:16:42] 10Analytics, 10Operations, 10User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (10RobH) [18:17:49] 10Analytics, 10Operations, 10User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (10RobH) a:05RobH→03elukey @ekuley, Assigning this to you since you initially requested the hardware. If someone else needs to implement, feel free to reassign or resolve this task as nee... [18:23:49] 10Analytics, 10Analytics-Kanban, 10Research-Backlog, 10Patch-For-Review: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10mforns) @ArielGlenn thanks for chiming in! > How big are these dumps for one set, and how many sets do we intend to... [18:26:34] elukey: yt? [18:33:01] ebernhardson: i think what is happening is that there is too much data for the current camus task to pull in its max allotted time [18:33:17] testing that now, and will fix in some way if so... 
[18:34:20] we also might need to add partitions to these topics [18:34:31] right now only a single consumer process can pull [18:35:51] (03CR) 10Joal: [C: 03+2] Cast mediarequests value as int before submitting the response [analytics/aqs] - 10https://gerrit.wikimedia.org/r/538611 (https://phabricator.wikimedia.org/T233622) (owner: 10Fdans) [18:36:36] (03Merged) 10jenkins-bot: Cast mediarequests value as int before submitting the response [analytics/aqs] - 10https://gerrit.wikimedia.org/r/538611 (https://phabricator.wikimedia.org/T233622) (owner: 10Fdans) [18:37:00] ottomata: more partitions for that topic makes sense [18:37:05] ottomata: thanks for looking! [18:37:17] yeah, also i just noticed that this camus job only has map.tasks set to 10 [18:37:24] and there are at least 34 topic/partitions to pull from [18:37:27] so it queues them up [18:37:40] going to increase that and see if cirrussearch starts pulling again [18:54:19] ottomata: I am now [18:54:34] did I break something?? [18:59:04] (going afk again, ping me on the phone if needed) [19:21:27] elukey: hey sorry [19:21:41] camus still broken, i think that job is just configured bad and never caught up after this weekend [19:22:09] wasn't sure what was going on so was pinging you [19:22:11] but i think i know now [19:29:28] 10Analytics, 10EventBus, 10Product-Analytics: Review draft Modern Event Platform schema guidelines - https://phabricator.wikimedia.org/T233329 (10Neil_P._Quinn_WMF) >>! In T233329#5511829, @Ottomata wrote: > Thanks for comments! Thanks for working on this! 😁 >> So...we can't reliably upgrade producer and c... [19:30:55] ottomata: heya - question a [19:30:59] ya [19:31:00] again sorry [19:31:32] in clouds, is that expected that the available disk-space on various setup would be the same ? [19:31:44] ? [19:31:47] It is advertised to be different, but seems to be the same once the instance is created [19:31:53] in clouds? 
[19:31:57] yup [19:32:01] not sure i understand the q [19:32:15] I created an instance, then dropped it and recreated it with bigger disk space [19:32:29] But, disk space was actually the same (ram changes, but not disk) [19:32:32] ah [19:32:42] i never even checked disk space before [19:32:48] i would expect it to be the new thing [19:32:59] you might want to ask in #wikimedia-cloud [19:33:02] so hd I [19:33:13] I'll ask in the clouds chan I guess [19:40:55] 10Analytics, 10LDAP-Access-Requests: log-in credential confusion for Hive - https://phabricator.wikimedia.org/T233648 (10herron) p:05Triage→03Normal [19:45:01] 10Analytics, 10LDAP-Access-Requests: log-in credential confusion for Hive - https://phabricator.wikimedia.org/T233648 (10Ottomata) Hue needs to have accounts manually created. Just did. Try now with 'daisy' [20:02:17] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, and 2 others: Figure out how to $ref common schema across schema repositories - https://phabricator.wikimedia.org/T233432 (10Ottomata) I think using git submodules would be the least confusing. In all the other options, there is some 'd... [20:05:08] joal: it was a bug? 
[20:05:44] ottomata: not sure - They told me it happens every now and then that big volumes don't get mounted (you get disks, but not configured) [20:05:48] oh hm [20:16:18] 10Analytics, 10Operations, 10Traffic: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (10Nuria) [20:35:15] (03CR) 10Nuria: "I know this is merged just pointing out the issue with sizes and js" (031 comment) [analytics/aqs] - 10https://gerrit.wikimedia.org/r/538611 (https://phabricator.wikimedia.org/T233622) (owner: 10Fdans) [20:56:26] !log created new camus job for high volume mediawiki analytics events: mediawiki_analytics_events [20:56:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [21:43:26] 10Analytics, 10Analytics-Kanban, 10Research-Backlog, 10Patch-For-Review: Release edit data lake data as a public json dump /mysql dump, other? - https://phabricator.wikimedia.org/T208612 (10Bstorm) That size seems fine for the current disk available.
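(Editor's aside, closing out the Camus thread above: with map.tasks set to 10 against at least 34 topic/partitions, each map task has to pull several partitions one after another, so the job "queues them up" and can run out of its allotted time once volume grows. The round-robin split below is a hypothetical illustration only, not Camus' actual partition planner, which also weighs partition sizes and offsets.)

```python
# Toy model of the scheduling constraint: assign 34 topic-partitions
# to 10 map tasks round-robin and look at each task's queue depth.
num_map_tasks = 10
partitions = [f"partition-{i}" for i in range(34)]

assignment = {task: partitions[task::num_map_tasks]
              for task in range(num_map_tasks)}

queue_depths = [len(queue) for queue in assignment.values()]
print(queue_depths)  # [4, 4, 4, 4, 3, 3, 3, 3, 3, 3]
```

Every task pulls its 3-4 partitions serially, so one oversized partition delays everything behind it in that task's queue; raising map.tasks (or adding partitions plus consumers) is what restores parallelism.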