2021-01-27 02:11:55
|
<wikibugs>
|
'Analytics-Radar: Presto error in Superset - only when grouping - https://phabricator.wikimedia.org/T270503 (''EYener) Hi @JAllemandou thanks for the reply! I am pulling this task back up and opened the dashboard to implement these suggestions. However, I encountered a new error on all charts: presto error: Fa...'
|
2021-01-27 07:18:20
|
<elukey>
|
good morning
|
2021-01-27 07:33:06
|
<wikibugs>
|
('CR) ''Thiemo Kreuz (WMDE): [C: ''+1] Collect metrics of all wikis [analytics/reportupdater-queries] - ''https://gerrit.wikimedia.org/r/655886 (https://phabricator.wikimedia.org/T271894) (owner: ''WMDE-Fisch)'
|
2021-01-27 07:33:17
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''elukey)'
|
2021-01-27 07:33:48
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''elukey)'
|
2021-01-27 07:52:40
|
<elukey>
|
I am checking rack availability for the new hadoop workers, and I found that in some racks we have more than 5 workers
|
2021-01-27 07:52:46
|
<elukey>
|
no bueno
|
2021-01-27 07:57:17
|
<elukey>
|
I am trying to spread nodes evenly across rows, so a rack going down with say 7 nodes on top shouldn't cause a ton of issues, but it is not great either
|
2021-01-27 08:01:12
|
<elukey>
|
ah no max seems to be 6
|
2021-01-27 08:06:21
|
<elukey>
|
no sigh 7 in rack C4
|
2021-01-27 08:31:28
|
<elukey>
|
ok completed the review, overall after the recent workers addition we have
|
2021-01-27 08:31:31
|
<elukey>
|
19 A
|
2021-01-27 08:31:33
|
<elukey>
|
19 B
|
2021-01-27 08:31:33
|
<elukey>
|
that looks very good
|
2021-01-27 08:31:36
|
<elukey>
|
21 C
|
2021-01-27 08:31:38
|
<elukey>
|
19 D
|
2021-01-27 08:31:46
|
<elukey>
|
so the new 6 nodes can be spread anywhere
|
2021-01-27 08:31:54
|
<elukey>
|
will comment in the task
|
2021-01-27 08:38:26
|
<wikibugs>
|
'Analytics-Clusters, ''DC-Ops, ''SRE, ''ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (''elukey) Hi @wiki_willy thanks a lot for following up! I redid the calculations of the workers' distribution after the last racking and this is what I g...'
|
2021-01-27 08:38:40
|
<elukey>
|
added my notes to --^
|
2021-01-27 09:17:10
|
<wikibugs>
|
'Analytics, ''Performance-Team: Coal graphs died around 2021-01-26 20:50 UTC - https://phabricator.wikimedia.org/T273033 (''Gilles)'
|
2021-01-27 09:26:14
|
<wikibugs>
|
'Analytics, ''Performance-Team, ''Patch-For-Review: Coal graphs died around 2021-01-26 20:50 UTC - https://phabricator.wikimedia.org/T273033 (''Gilles) Seems like coal simply needed to be restarted, it hadn't been since python3-snappy was installed on the host a few days ago for navtiming's sake. Won't hurt...'
|
2021-01-27 09:27:32
|
<wikibugs>
|
'Analytics: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (''elukey) I had some thoughts about bottlenecks and the only one that came to mind, not mentioned in the description of the task, is the database. The only an-airflow...'
|
2021-01-27 09:36:49
|
<wikibugs>
|
'Analytics, ''Performance-Team, ''Patch-For-Review: Coal graphs died around 2021-01-26 20:50 UTC - https://phabricator.wikimedia.org/T273033 (''Gilles) p:''Triage→''High'
|
2021-01-27 09:44:01
|
<wikibugs>
|
'Analytics, ''SRE, ''ops-eqiad: Degraded RAID on an-worker1099 - https://phabricator.wikimedia.org/T273034 (''elukey)'
|
2021-01-27 10:22:29
|
<wikibugs>
|
'Analytics, ''Better Use Of Data, ''Event-Platform, ''Product-Infrastructure-Data, and 3 others: EventLogging PHP EventServiceClient should use EventBus->send(). - https://phabricator.wikimedia.org/T272863 (''hashar) That is the MediaWiki installer failing: `counterexample * A dependency error was encount...'
|
2021-01-27 10:26:09
|
<wikibugs>
|
'Analytics, ''Better Use Of Data, ''Event-Platform, ''Product-Infrastructure-Data, and 5 others: EventLogging PHP EventServiceClient should use EventBus->send(). - https://phabricator.wikimedia.org/T272863 (''hashar)'
|
2021-01-27 10:37:39
|
<wikibugs>
|
'Analytics, ''Better Use Of Data, ''Event-Platform, ''Product-Infrastructure-Data, and 5 others: EventLogging PHP EventServiceClient should use EventBus->send(). - https://phabricator.wikimedia.org/T272863 (''hashar) The CI config change to add EventBus to the wmf-quibble* jobs is https://gerrit.wikimedia...'
|
2021-01-27 11:15:56
|
<elukey>
|
!log add client_port and debug fields to X-Analytics in webrequest varnishkafka streams
|
2021-01-27 11:15:58
|
<stashbot>
|
Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
|
2021-01-27 11:21:14
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Patch-For-Review: Add client TCP source port to webrequest - https://phabricator.wikimedia.org/T271953 (''elukey) Both changes deployed by Valentin, I checked the client_port field in webrequest_text on Kafka and it works nicely. The debug header needs to be triggered by an...'
|
2021-01-27 12:23:08
|
<elukey>
|
lunch!
|
2021-01-27 12:30:01
|
<klausman>
|
Same.
|
2021-01-27 12:42:21
|
<wikibugs>
|
'Analytics, ''Better Use Of Data, ''Event-Platform, ''Product-Infrastructure-Data, and 5 others: EventLogging PHP EventServiceClient should use EventBus->send(). - https://phabricator.wikimedia.org/T272863 (''hashar) On the dummy change https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/6589...'
|
2021-01-27 13:02:39
|
<joal>
|
!log Copy /wmf/data/event to backup cluster (30TB) - T272846
|
2021-01-27 13:02:41
|
<stashbot>
|
Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
|
2021-01-27 13:02:42
|
<stashbot>
|
T272846: Backup HDFS data before BigTop upgrade - https://phabricator.wikimedia.org/T272846
|
2021-01-27 13:36:14
|
<dsaez>
|
hey a-team, good morning/afternoon/evening . I'm having issues with the pageviews API, it works correctly from the browser, but using python I get this error: https://pastebin.pl/view/7fa0efeb
|
2021-01-27 13:36:32
|
<dsaez>
|
I'm wondering if there is any user agent issue
|
2021-01-27 13:37:29
|
<mforns>
|
hi dsaez :] looking into this
|
2021-01-27 13:38:03
|
<dsaez>
|
thx mforns
|
2021-01-27 13:38:32
|
<elukey>
|
dsaez: yes it is me
|
2021-01-27 13:39:23
|
<elukey>
|
or it should be me, let's try to see :)
|
2021-01-27 13:39:33
|
<elukey>
|
are you using python-requests?
|
2021-01-27 13:39:52
|
<elukey>
|
because we added a specific block in Varnish the other day after a big surge in traffic
|
2021-01-27 13:40:05
|
<elukey>
|
following https://meta.wikimedia.org/wiki/User-Agent_policy
|
2021-01-27 13:40:20
|
<elukey>
|
so the block returns a 403 in this case but it should mention the UA policy
|
2021-01-27 13:40:26
|
<elukey>
|
that I don't see in your paste
|
2021-01-27 13:40:31
|
<elukey>
|
what is the HTTP error code returned?
|
2021-01-27 13:41:09
|
<elukey>
|
also, can you give us the link to check?
|
2021-01-27 13:41:16
|
<wikibugs>
|
'Analytics: Add user to analytics-privatedata-users group - https://phabricator.wikimedia.org/T273058 (''gmodena)'
|
2021-01-27 13:41:35
|
<mforns>
|
elukey: it's a 403
|
2021-01-27 13:41:43
|
<dsaez>
|
yep 403
|
2021-01-27 13:41:56
|
<dsaez>
|
I'm using requests
|
2021-01-27 13:42:13
|
<dsaez>
|
for example https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/de.wikipedia/all-access/user/Johann_Wolfgang_von_Goethe/daily/2015101300/2015102700
|
2021-01-27 13:42:19
|
<dsaez>
|
that works from the browser
|
2021-01-27 13:42:22
|
<mforns>
|
indeed, the error message mentions: Scripted requests from your IP have been blocked, please see https://meta.wikimedia.org/wiki/User-Agent_policy.
|
2021-01-27 13:43:39
|
<dsaez>
|
but this: requests.get(that_url) returns the error
|
2021-01-27 13:44:26
|
<dsaez>
|
elukey, sorry, I don't get it. This is an API, so what is the expected UA?
|
2021-01-27 13:45:31
|
<mforns>
|
dsaez: The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted. See: https://meta.wikimedia.org/wiki/User-Agent_policy
|
2021-01-27 13:45:42
|
<elukey>
|
mforns: ah didn't see that yes
|
2021-01-27 13:46:16
|
<mforns>
|
dsaez: I think you can use requests to send user agent:
|
2021-01-27 13:46:20
|
<elukey>
|
dsaez: You'd need to provide a UA that can tell us how to contact you in case the volume of requests is big
|
2021-01-27 13:46:42
|
<mforns>
|
response = requests.get(url, headers = {'User-agent': 'blahblah'})
|
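A minimal sketch of what a policy-compliant call could look like, building on mforns's snippet above; the client name and contact address below are placeholders for illustration, not values from this conversation:

```python
import requests

# Hypothetical identifier following the format described at
# https://meta.wikimedia.org/wiki/User-Agent_policy:
#   <client name>/<version> (<contact information>) <library>/<version>
HEADERS = {
    "User-Agent": "my-pageviews-tool/0.1 (analytics-user@example.org) python-requests/2.25"
}

# The per-article pageviews URL dsaez was querying.
url = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "de.wikipedia/all-access/user/Johann_Wolfgang_von_Goethe/daily/"
    "2015101300/2015102700"
)

response = requests.get(url, headers=HEADERS)
response.raise_for_status()  # a 403 here would indicate the UA block
print(response.json())
```

|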
2021-01-27 13:47:02
|
<elukey>
|
this is far from perfect, we may lift the block very soon (it was due to an emergency) but in general we should follow the UA policy for the APIs
|
2021-01-27 13:47:51
|
<dsaez>
|
got it. Sounds very strict, I've done two calls.
|
2021-01-27 13:49:02
|
<elukey>
|
yes yes I know, we also have to figure out throttling, it is a temporary measure
|
2021-01-27 13:49:15
|
<elukey>
|
but in the long term we suggest to everybody to use a proper UA
|
2021-01-27 13:49:28
|
<dsaez>
|
got it
|
2021-01-27 13:49:29
|
<dsaez>
|
in fact
|
2021-01-27 13:49:33
|
<dsaez>
|
it's not blocked
|
2021-01-27 13:49:57
|
<dsaez>
|
if I add the blahblah agent, it's enough
|
2021-01-27 13:50:08
|
<elukey>
|
yes please use a better UA :D
|
2021-01-27 13:50:13
|
<dsaez>
|
hahaha
|
2021-01-27 13:50:14
|
<dsaez>
|
sure
|
2021-01-27 14:02:27
|
<wikibugs>
|
'Analytics-Radar, ''Release-Engineering-Team, ''observability, ''serviceops, and 2 others: Create a separate 'mwdebug' cluster - https://phabricator.wikimedia.org/T262202 (''jijiki)'
|
2021-01-27 14:02:30
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''serviceops, ''User-jijiki: Mechanism to flag webrequests as "debug" - https://phabricator.wikimedia.org/T263683 (''jijiki) ''Open→''Resolved @Milimetric patch is merged! We are setting debug=1 in the X-Analytics header if "X-Wikimedia-Debug" is present. Thank you fo...'
|
2021-01-27 14:03:50
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Patch-For-Review: Add client TCP source port to webrequest - https://phabricator.wikimedia.org/T271953 (''jijiki) Debug header works, we tested it with @elukey:)'
|
2021-01-27 14:04:57
|
<elukey>
|
joal: are you around?
|
2021-01-27 14:24:29
|
<elukey>
|
joal: I killed the copy (client + map-reduce job), we were causing network alarms :(
|
2021-01-27 14:53:07
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''akosiaris) This is weird. I don't think we have encountered this before. ExecStop in the systemd unit file runs `ifdown ens5` but running that on the host returns ` root@kafka-test1006:...'
|
2021-01-27 14:53:39
|
<wikibugs>
|
'Analytics: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (''Ottomata) > if we want to have two/three more Airflow instances Do we want/need this? > store a little mariadb instance on every deployment of Airflow, getting re...'
|
2021-01-27 15:00:13
|
<wikibugs>
|
'Analytics, ''Better Use Of Data, ''Event-Platform, ''Product-Analytics, ''Product-Infrastructure-Data: MEP: Should stream configurations be written in YAML? - https://phabricator.wikimedia.org/T269774 (''Ottomata) > Create a new repo for stream configs and add it as a git submodule to operations/mediaw...'
|
2021-01-27 15:14:06
|
<wikibugs>
|
'Analytics: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (''elukey) @Ottomata the main problem that I can see now is that multi-tenancy is not really something that Airflow does well (and the people from Polidea confirmed th...'
|
2021-01-27 15:17:34
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''elukey) @akosiaris not reliably, but today I rebooted the 4 schema VMs and one of them got back with the same issue..'
|
2021-01-27 15:24:53
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''MoritzMuehlenhoff) >>! In T273026#6780528, @akosiaris wrote: > This is weird. I don't think we have encountered this before. > > ExecStop in the systemd unit file runs `ifdown ens5` but...'
|
2021-01-27 15:28:31
|
<joal>
|
heya elukey
|
2021-01-27 15:28:37
|
<joal>
|
sorry I'm with kids
|
2021-01-27 15:28:41
|
<joal>
|
good that you killed it
|
2021-01-27 15:28:52
|
<joal>
|
Let's review togother when I have time
|
2021-01-27 15:30:46
|
<elukey>
|
ack! I just pinged to check if you were around, I used the hammer :D
|
2021-01-27 15:38:14
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''akosiaris) >>! In T273026#6780640, @MoritzMuehlenhoff wrote: >>>! In T273026#6780528, @akosiaris wrote: >> This is weird. I don't think we have encountered this before. >> >> ExecStop in...'
|
2021-01-27 15:39:11
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''elukey) I recall VMs only from my past experience, I encountered this problem a couple of times before this one.'
|
2021-01-27 15:45:08
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''JMeybohm) >>! In T269160#6777382, @elukey wrote: > Waiting for @JMeybohm'...'
|
2021-01-27 15:46:24
|
<wikibugs>
|
'Analytics, ''SRE, ''ops-eqiad: Degraded RAID on an-worker1099 - https://phabricator.wikimedia.org/T273034 (''elukey) @Ottomata @razzi this is the first datanode disk failure after the change that I made to use facter to populate the available partitions that Yarn and HDFS can use on a given worker node. In...'
|
2021-01-27 15:47:22
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''elukey) >>! In T269160#6780685, @JMeybohm wrote: >>>! In T269160#6777382,...'
|
2021-01-27 15:47:31
|
<wikibugs>
|
'Analytics, ''Event-Platform: Rematerialise all event schemas with enforceNumericBounds: true - https://phabricator.wikimedia.org/T273069 (''Ottomata)'
|
2021-01-27 15:47:37
|
<wikibugs>
|
'Analytics, ''Event-Platform: Rematerialize all event schemas with enforceNumericBounds: true - https://phabricator.wikimedia.org/T273069 (''Ottomata)'
|
2021-01-27 15:49:36
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''MoritzMuehlenhoff) >>! In T273026#6780670, @akosiaris wrote: > Do you by any chance remember if it was on VMs only? Or was it physical hosts too? From my memory only VMs. I've checked my...'
|
2021-01-27 15:50:54
|
<elukey>
|
ottomata: if you are ok I'd helm eventstreams-internal!
|
2021-01-27 15:51:30
|
<elukey>
|
cd /srv/deployment-charts/helmfile.d/services/eventstreams-internal; helmfile -e codfw -i apply
|
2021-01-27 15:51:33
|
<elukey>
|
and then eqiad
|
2021-01-27 15:51:39
|
<elukey>
|
does it sound ok?
|
2021-01-27 15:51:58
|
<ottomata>
|
go for it!
|
2021-01-27 15:51:59
|
<ottomata>
|
yes!
|
2021-01-27 15:52:10
|
<ottomata>
|
(no lvs yet, right?
|
2021-01-27 15:52:11
|
<ottomata>
|
)
|
2021-01-27 15:52:49
|
<ottomata>
|
not sure if i can test very easily without, would have to do some curl --resolve magic and look up lots of stuff, but if the kube logs look good we can assume it works
|
2021-01-27 15:52:58
|
<ottomata>
|
will look at logs after you apply
|
2021-01-27 15:53:30
|
<elukey>
|
no lvs exactly
|
2021-01-27 15:54:56
|
<elukey>
|
ok we can start with
|
2021-01-27 15:54:56
|
<elukey>
|
Error: pods is forbidden: User "eventstreams-internal" cannot list resource "pods" in API group "" in the namespace "eventstreams-internal"
|
2021-01-27 15:56:56
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''elukey) ` Error: pods is forbidden: User "eventstreams-internal" cannot l...'
|
2021-01-27 16:00:39
|
<wikibugs>
|
'Analytics, ''Performance-Team: Coal graphs died around 2021-01-26 20:50 UTC - https://phabricator.wikimedia.org/T273033 (''Gilles) ''Open→''Resolved a:''Gilles Restarting coal fixed the data, as expected: {F34044291}'
|
2021-01-27 16:01:02
|
<elukey>
|
ah I may know why
|
2021-01-27 16:01:26
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''JMeybohm) You probably have not yet depoyed the admin part (the new names...'
|
2021-01-27 16:04:19
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''elukey) >>! In T269160#6780761, @JMeybohm wrote: > You probably have not...'
|
2021-01-27 16:06:48
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''JMeybohm) Apart from you testing my attention again (kube_env admin [codf...'
|
2021-01-27 16:13:09
|
<wikibugs>
|
('PS1) ''Mforns: Make HiveToDruid return exit code when deployMode=client [analytics/refinery/source] - ''https://gerrit.wikimedia.org/r/659017 (https://phabricator.wikimedia.org/T271568)'
|
2021-01-27 16:14:23
|
<ottomata>
|
elukey: should we start referring to presto as trino?
|
2021-01-27 16:14:34
|
<ottomata>
|
was thinking about adding presto support to wmfdata python
|
2021-01-27 16:14:40
|
<ottomata>
|
looked for a client
|
2021-01-27 16:14:41
|
<ottomata>
|
https://github.com/trinodb/trino-python-client
|
2021-01-27 16:14:44
|
<ottomata>
|
looks like the one maybe
|
2021-01-27 16:15:19
|
<elukey>
|
ottomata: to avoid too much work, I'd just upgrade to the latest presto (fb presto) and then think about migrating to trino later
|
2021-01-27 16:15:31
|
<elukey>
|
I thought we agreed on this during a standup :D
|
2021-01-27 16:19:06
|
<wikibugs>
|
'Analytics, ''Patch-For-Review: Follow up on Druid alarms not firing when Druid indexations were failing due to permission issues - https://phabricator.wikimedia.org/T271568 (''mforns) After some tests, I think the problem lies in the code: ` if (spark.conf.get("spark.master") != "yarn") { sys.exit(if (su...'
|
2021-01-27 16:19:45
|
<wikibugs>
|
'Analytics, ''Analytics-EventLogging, ''Analytics-Kanban, ''Event-Platform, and 2 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (''Ottomata)'
|
2021-01-27 16:19:50
|
<ottomata>
|
elukey: my memory is poor
|
2021-01-27 16:20:09
|
<ottomata>
|
if I did presto in wmfdata then, should I use https://github.com/prestodb/presto-python-client instead?
|
2021-01-27 16:20:30
|
<ottomata>
|
trino one has more recent commits
|
2021-01-27 16:20:52
|
<elukey>
|
yes that client should be ok in my opinion
|
2021-01-27 16:21:19
|
<elukey>
|
to clarify - if we want to move to trino I am 100% onboard, it just seemed like too much work for us
|
2021-01-27 16:21:38
|
<elukey>
|
but if you want to move to trino +1
|
2021-01-27 16:23:04
|
<ottomata>
|
elukey: naw i'm not trying to expedite move to it
|
2021-01-27 16:23:12
|
<ottomata>
|
just wondering what our language should be, but
|
2021-01-27 16:23:21
|
<ottomata>
|
it sounds like for my q: we should keep saying 'presto'
|
2021-01-27 16:23:27
|
<ottomata>
|
i can use a trino client now
|
2021-01-27 16:23:38
|
<ottomata>
|
and later when we switch, rename it to 'trino' in wmfdata
|
2021-01-27 16:23:40
|
<ottomata>
|
e.g. ^
|
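For reference, a hedged sketch of what wmfdata-style Presto access through the trino-python-client could look like; the host, port, catalog, and table names here are illustrative assumptions, not the actual cluster settings:

```python
from trino.dbapi import connect

# Illustrative connection settings; the real coordinator host/port and
# catalog names for the Analytics cluster are not given in this log.
conn = connect(
    host="presto-coordinator.example.org",
    port=8080,
    user="analytics-user",
    catalog="hive",
    schema="wmf",
)

cur = conn.cursor()
# At the time, the trino client still spoke the protocol Presto used,
# so it can be used now and simply relabeled "trino" in wmfdata later.
cur.execute(
    "SELECT uri_host, COUNT(*) AS requests "
    "FROM webrequest "
    "WHERE year = 2021 AND month = 1 AND day = 27 "
    "GROUP BY uri_host LIMIT 10"
)
print(cur.fetchall())
```

|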
2021-01-27 16:32:48
|
<joal>
|
Here I am
|
2021-01-27 16:33:18
|
<joal>
|
elukey: Hi :)
|
2021-01-27 16:33:29
|
<joal>
|
elukey: I'm sorry again about the network mess :(
|
2021-01-27 16:36:00
|
<joal>
|
razzi: Hello :) would you have a minute for me?
|
2021-01-27 16:47:52
|
<elukey>
|
joal: not your fault :)
|
2021-01-27 16:48:00
|
<joal>
|
I wondered :S
|
2021-01-27 16:48:06
|
<joal>
|
:)
|
2021-01-27 16:48:10
|
<elukey>
|
no I mean it was the data copy
|
2021-01-27 16:48:14
|
<joal>
|
I probably shouldn't be back :)
|
2021-01-27 16:48:19
|
<elukey>
|
but you didn't really do it on purpose
|
2021-01-27 16:48:20
|
<joal>
|
It was elukey
|
2021-01-27 16:48:23
|
<elukey>
|
so not your fault :)
|
2021-01-27 16:48:25
|
<joal>
|
well, I did!
|
2021-01-27 16:48:35
|
<elukey>
|
uffff
|
2021-01-27 16:48:47
|
<elukey>
|
I strongly disagree :D
|
2021-01-27 16:49:00
|
<joal>
|
We knew it would put load on the network - We just didn't know how much and how much was too much :)
|
2021-01-27 16:49:01
|
<elukey>
|
but I cannot really convince you otherwise :D
|
2021-01-27 16:49:04
|
<joal>
|
hehehe :)
|
2021-01-27 16:49:27
|
<joal>
|
anyway - Shall I try with half the number of mappers?
|
2021-01-27 16:50:47
|
<joal>
|
elukey: --^
|
2021-01-27 16:51:38
|
<wikibugs>
|
('CR) ''Elukey: [C: ''+1] "Completely ignorant about this but the option looks present for 2.4 and it makes sense to me, thanks Marcel!" [analytics/refinery/source] - ''https://gerrit.wikimedia.org/r/659017 (https://phabricator.wikimedia.org/T271568) (owner: ''Mforns)'
|
2021-01-27 16:52:14
|
<mforns>
|
thanks for the CR elukey :]
|
2021-01-27 16:53:38
|
<elukey>
|
joal: yes let's try!
|
2021-01-27 16:53:40
|
<wikibugs>
|
('CR) ''Joal: [C: ''+1] "LGTM!Thanks @mforns" [analytics/refinery/source] - ''https://gerrit.wikimedia.org/r/659017 (https://phabricator.wikimedia.org/T271568) (owner: ''Mforns)'
|
2021-01-27 16:53:46
|
<joal>
|
ack elukey - launching the thing
|
2021-01-27 16:53:51
|
<elukey>
|
joal: is there a way to throttle it a bit too?
|
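DistCp itself has knobs for this: `-m` caps the number of concurrent mappers and `-bandwidth` caps each mapper's throughput in MB/s (both standard DistCp options). A sketch of a gentler invocation follows; the mapper count, bandwidth value, and destination URI are illustrative:

```python
import subprocess

# Throttled re-run of the backup copy: 32 mappers at 40 MB/s each
# bounds the aggregate around 1.25 GB/s. The destination URI is a
# placeholder; the source path is the one from joal's !log above.
subprocess.run(
    [
        "hadoop", "distcp",
        "-update",           # only copy files that are missing or differ
        "-m", "32",          # fewer concurrent map tasks
        "-bandwidth", "40",  # per-map bandwidth cap, in MB/s
        "hdfs://analytics-hadoop/wmf/data/event",
        "hdfs://backup-cluster/wmf/data/event",
    ],
    check=True,
)
```

|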
2021-01-27 16:54:17
|
<mforns>
|
thx for CR joal, do you know why we are not returning an exit code inside YARN?
|
2021-01-27 16:55:28
|
<joal>
|
mforns: I imagine we could, but there would be no way to actually take advantage of it I think
|
2021-01-27 16:56:26
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''elukey) es-internal deployed in both eqiad and codfw, next steps are: -...'
|
2021-01-27 16:56:37
|
<mforns>
|
joal: aha
|
2021-01-27 16:58:30
|
<wikibugs>
|
'Analytics, ''Analytics-EventLogging, ''Analytics-Kanban, ''Event-Platform, and 2 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (''mforns)'
|
2021-01-27 16:58:40
|
<wikibugs>
|
'Analytics: Filter out webrequest where debug=1 from pageview - https://phabricator.wikimedia.org/T273083 (''JAllemandou)'
|
2021-01-27 17:00:11
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''EventStreams, and 5 others: Set up internal eventstreams instance exposing all streams declared in stream config (and in kafka jumbo) - https://phabricator.wikimedia.org/T269160 (''Ottomata) @elukey [[ https://logstash.wikimedia.org/goto/b408da9f4b39f66a...'
|
2021-01-27 17:01:58
|
<ottomata>
|
fdans: milimetric joal yoohoo!
|
2021-01-27 17:02:04
|
<joal>
|
elukey: file-listing done, actual copy starting
|
2021-01-27 17:02:12
|
<joal>
|
ottomata: tuning-session!
|
2021-01-27 17:02:23
|
<ottomata>
|
oh ho ok
|
2021-01-27 17:02:53
|
<joal>
|
elukey: 8.8M files to be copied
|
2021-01-27 17:03:33
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''Legoktm) p:''Triage→''Low'
|
2021-01-27 17:04:31
|
<joal>
|
elukey: I also have a question when you have a minute
|
2021-01-27 17:05:43
|
<mforns>
|
joal: ping standup?
|
2021-01-27 17:05:58
|
<joal>
|
mforns: tuning session? shall I maybe not be there?
|
2021-01-27 17:06:06
|
<joal>
|
fdans: --^ ?
|
2021-01-27 17:06:12
|
<mforns>
|
oh!
|
2021-01-27 17:10:29
|
<wikibugs>
|
'Analytics, ''SRE: Errors for ifup@ens5.service after rebooting Ganeti VMs - https://phabricator.wikimedia.org/T273026 (''akosiaris) I 'll take your word for it. +1 on the cleanup thing.'
|
2021-01-27 17:13:40
|
<elukey>
|
Amir1: the client_port flag is now in new webrequest data, so if you need to check/use it you can :)
|
2021-01-27 17:13:50
|
<elukey>
|
what is the ideal use case? Query via Superset?
|
2021-01-27 17:13:55
|
<elukey>
|
or do you use hive via cli?
|
2021-01-27 17:13:59
|
<elukey>
|
(or even presto)
|
2021-01-27 17:14:00
|
<Amir1>
|
Awesome
|
2021-01-27 17:14:05
|
<Amir1>
|
I do hive
|
2021-01-27 17:14:07
|
<Amir1>
|
beeline
|
2021-01-27 17:14:19
|
<elukey>
|
perfect
|
2021-01-27 17:14:23
|
<joal>
|
Amir1: I suggest you try spark ;)
|
2021-01-27 17:14:29
|
<Amir1>
|
I need to ask the cu in ukwiki
|
2021-01-27 17:14:46
|
<Amir1>
|
usually yes but this one is a specific problem :D
|
2021-01-27 17:15:03
|
<Amir1>
|
Thank you!
|
2021-01-27 17:15:28
|
<wikibugs>
|
'Analytics, ''Analytics-EventLogging, ''Analytics-Kanban, ''Event-Platform, and 2 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (''Ottomata) a:''Gilles→''Ottomata'
|
2021-01-27 17:18:15
|
<joal>
|
elukey: I just restarted the copy job - I realized I messed up and had not changed the number of mappers :(
|
2021-01-27 17:19:12
|
<Amir1>
|
https://phabricator.wikimedia.org/T265692#6781099 let the CU know
|
2021-01-27 17:27:08
|
<wikibugs>
|
'Analytics, ''SRE: archiva artifact links point to 127.0.0.1 - https://phabricator.wikimedia.org/T164993 (''elukey)'
|
2021-01-27 17:34:50
|
<joal>
|
razzi: not sure if you got my previous ping with the irc issues - trying again
|
2021-01-27 17:35:37
|
<razzi>
|
joal: didn't see the ping, please go again :)
|
2021-01-27 17:35:43
|
<joal>
|
Hi razzi :)
|
2021-01-27 17:35:49
|
<joal>
|
I have a question
|
2021-01-27 17:35:53
|
<joal>
|
if you have a minute
|
2021-01-27 17:36:07
|
<razzi>
|
indeed I do
|
2021-01-27 17:36:28
|
<joal>
|
razzi: Can you confirm that user eyener is in analytics-privatedata-users group?
|
2021-01-27 17:36:50
|
<joal>
|
I think elukey told me 10 times how to do it, and I still can't recall :(
|
2021-01-27 17:37:59
|
<razzi>
|
joal: I can confirm that user is in analytics-privatedata-users by running `groups eyener`
|
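The same check from Python's standard library, in case shelling out is inconvenient (a small sketch; note it only sees supplementary group members, not users whose primary group matches):

```python
import grp

def in_group(user: str, group: str) -> bool:
    # gr_mem lists the supplementary members of `group` on this host;
    # a user whose *primary* group is `group` would not appear here.
    return user in grp.getgrnam(group).gr_mem

print(in_group("eyener", "analytics-privatedata-users"))
```

|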
2021-01-27 17:38:54
|
<joal>
|
ack razzi - I wouldn't have expected I could run the groups command as non-root - Thanks a lot!!
|
2021-01-27 17:39:19
|
<razzi>
|
you're welcome :)
|
2021-01-27 17:56:23
|
<joal>
|
eyener: Hi! I'm reading your comment on the presto error ticket
|
2021-01-27 18:03:03
|
<elukey>
|
razzi: do you want to reboot an-launcher1002?
|
2021-01-27 18:03:36
|
<razzi>
|
elukey: yeah, bc?
|
2021-01-27 18:03:58
|
<razzi>
|
elukey: or maybe it's not that involved and we can do so async
|
2021-01-27 18:04:09
|
<elukey>
|
razzi: I think that we can do it in here if you are ok
|
2021-01-27 18:05:57
|
<elukey>
|
razzi: to recap - first thing is to check what's running with 'systemctl list-timers'
|
2021-01-27 18:06:33
|
<elukey>
|
we have to identify the prefixes to stop
|
2021-01-27 18:06:44
|
<elukey>
|
ah also, let's disable puppet
|
2021-01-27 18:06:58
|
<elukey>
|
with something like "Razzi - prepping for reboot"
|
2021-01-27 18:07:06
|
<razzi>
|
elukey: sounds good
|
2021-01-27 18:07:13
|
<elukey>
|
one first example could be
|
2021-01-27 18:07:27
|
<elukey>
|
sudo systemctl stop 'reportupdater-*.timer'
|
2021-01-27 18:07:41
|
<elukey>
|
the important bit here is the .timer at the end
|
2021-01-27 18:07:59
|
<elukey>
|
since if you do stop reportupdater-* you'll target the services, which might be running
|
2021-01-27 18:08:06
|
<elukey>
|
we want to stop scheduled executions
|
2021-01-27 18:08:16
|
<elukey>
|
(and basically gently draining)
|
2021-01-27 18:08:41
|
<elukey>
|
eventually you'll end up with systemctl list-timers showing only system level timers
|
2021-01-27 18:08:44
|
<elukey>
|
like logrotate etc..
|
2021-01-27 18:08:47
|
<elukey>
|
that are fine to run
|
2021-01-27 18:09:04
|
<elukey>
|
once done, we'll need to check if any java/python processes are running
|
2021-01-27 18:09:20
|
<elukey>
|
if yes, let's wait until they finish, otherwise green light to reboot
|
2021-01-27 18:09:33
|
<elukey>
|
then puppet enable + run and the maintenance is done :)
|
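The drain procedure elukey walks through could be scripted roughly as below. This is only a sketch: the timer globs are the ones named in this conversation, and a real run would still need the manual check for in-flight java/python processes before rebooting.

```python
import subprocess

# Timer units named in this conversation; stopping the .timer (not the
# .service) pauses future scheduled runs while any execution already
# in flight is left to finish on its own.
TIMER_GLOBS = [
    "reportupdater-*.timer",
    "hdfs-cleaner-*.timer",
    "mediawiki-*.timer",
    "hdfs-balancer*.timer",
]

def drain_for_reboot():
    # Disable puppet first so it does not restore the timers mid-drain.
    subprocess.run(
        ["sudo", "puppet", "agent", "--disable", "Razzi - prepping for reboot"],
        check=True,
    )
    for glob in TIMER_GLOBS:
        subprocess.run(["sudo", "systemctl", "stop", glob], check=True)
    # After this, only system-level timers (logrotate, apt-daily, ...)
    # should show up in the schedule.
    subprocess.run(["systemctl", "list-timers"], check=True)

drain_for_reboot()
```

|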
2021-01-27 18:09:35
|
<razzi>
|
I don't see reportupdater- timers in systemctl list-timers
|
2021-01-27 18:09:49
|
<elukey>
|
Wed 2021-01-27 19:00:00 UTC 53min left Wed 2021-01-27 18:00:00 UTC 6min ago reportupdater-browser.timer
|
2021-01-27 18:10:03
|
<elukey>
|
on what host are you?
|
2021-01-27 18:10:09
|
<razzi>
|
:) an-master oops
|
2021-01-27 18:10:18
|
<elukey>
|
ah yes it makes sense then :D
|
2021-01-27 18:10:35
|
<razzi>
|
Reenabled puppet, now on to an-launcher1002
|
2021-01-27 18:10:51
|
<wikibugs>
|
'Analytics-Radar: Presto error in Superset - only when grouping - https://phabricator.wikimedia.org/T270503 (''JAllemandou) Hi @EYener > presto error: Failed to list directory: hdfs://analytics-hadoop/wmf/data/event_sanitized/CentralNoticeBannerHistory/year=2021/month=1/day=9/hour=21 I have not experienced th...'
|
2021-01-27 18:12:42
|
<joal>
|
elukey: are we still ok in term of network?
|
2021-01-27 18:14:36
|
<elukey>
|
joal: it seems so yes, no complaints about link saturation
|
2021-01-27 18:14:46
|
<joal>
|
ack elukey - thanks for checking
|
2021-01-27 18:15:02
|
<joal>
|
elukey: something else, if I may?
|
2021-01-27 18:15:58
|
<elukey>
|
joal: sure what's up
|
2021-01-27 18:16:44
|
<joal>
|
elukey: we're gonna need all users set up on the backup cluster :(
|
2021-01-27 18:17:16
|
<joal>
|
elukey: the /user folder is looking wrong despite me having resynced
|
2021-01-27 18:17:24
|
<joal>
|
in terms of ownership
|
2021-01-27 18:17:36
|
<elukey>
|
joal: you wiped and re-copied right?
|
2021-01-27 18:17:52
|
<joal>
|
elukey: I ran distcp -update - which should do the same
|
2021-01-27 18:18:53
|
<eyener>
|
joal Awesome! You fixed it! :) I am not sure what the issue was but every chart in that dash was failing to load yesterday
|
2021-01-27 18:19:20
|
<joal>
|
eyener: eh :) Fixing without touching is my preferred way - usually doesn't work though :)
|
2021-01-27 18:20:02
|
<joal>
|
thanks for letting me know eyener - sorry for no good answer on updating charts (yet)
|
2021-01-27 18:21:20
|
<eyener>
|
Ha no worries joal - appreciate you checking it out. I've asked in the Superset slack workspace as well and haven't received a reply but I'll let you know if I ever figure it out
|
2021-01-27 18:21:29
|
<eyener>
|
maybe some jinja templating or something...?
|
2021-01-27 18:22:11
|
<joal>
|
very possible eyener - /me is no superset ninja for sure
|
2021-01-27 18:22:13
|
<elukey>
|
joal: not sure, have you tried to explicitly wipe and copy a single user dir? Just to see if perms are weird
|
2021-01-27 18:22:28
|
<elukey>
|
in theory users are already deployed on the cluster, on all nodes
|
2021-01-27 18:22:31
|
<elukey>
|
masters + workers
|
2021-01-27 18:22:35
|
<joal>
|
MQH
|
2021-01-27 18:22:37
|
<joal>
|
MEH
|
2021-01-27 18:22:50
|
<wikibugs>
|
'Analytics, ''Product-Infrastructure-Data, ''Wikimedia-Logstash, ''observability, ''Patch-For-Review: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (''Ottomata) In a meeting with devs doing client error logging today, we realized that conf...'
|
2021-01-27 18:23:06
|
<joal>
|
elukey: I'll try wipe-out for real and see if it changes anything
|
2021-01-27 18:23:29
|
<joal>
|
elukey: and I'll use 64 mappers as my basis
|
2021-01-27 18:24:01
|
<elukey>
|
perfect thanks
|
2021-01-27 18:24:11
|
<elukey>
|
if it doesn't work we can check again but it is weird
|
2021-01-27 18:24:16
|
<joal>
|
sure elukey
|
2021-01-27 18:24:33
|
<joal>
|
thanks for confirming that the hardware should be ready
|
2021-01-27 18:25:15
|
<mforns>
|
joal, I believe the changes you did to hdfs cleaner need to be deployed?
|
2021-01-27 18:25:40
|
<joal>
|
mforns: I think elukey did?
|
2021-01-27 18:25:47
|
<joal>
|
maybe not?
|
2021-01-27 18:26:01
|
<mforns>
|
joal: isn't the hdfs cleaner in refinery repo?
|
2021-01-27 18:26:20
|
<elukey>
|
mforns: yep the three timers have been deployed
|
2021-01-27 18:26:34
|
<mforns>
|
ok elukey thanks
|
2021-01-27 18:26:38
|
<joal>
|
mforns: I have not changed the code - only added puppet stuff :)
|
2021-01-27 18:26:46
|
<wikibugs>
|
'Analytics-Clusters, ''DC-Ops, ''SRE, ''ops-eqiad: (Need By: TBD) rack/setup/install an-worker11[18-41] - https://phabricator.wikimedia.org/T260445 (''wiki_willy) Hi @elukey - thanks for the mapping. What makes it tough is that the remaining 6x hosts need to be on 10g switches, which really limits our op...'
|
2021-01-27 18:26:46
|
<mforns>
|
ok ok :]
|
2021-01-27 18:26:54
|
<joal>
|
thanks for checking mforns
|
2021-01-27 18:31:52
|
<elukey>
|
mforns: in theory I should be on-call now right? Anything to hand over?
|
2021-01-27 18:31:59
|
<elukey>
|
forgot to ask during standup
|
2021-01-27 18:32:06
|
<elukey>
|
razzi: how are things going?
|
2021-01-27 18:32:15
|
<mforns>
|
elukey: no no, it's tomorrow
|
2021-01-27 18:32:33
|
<razzi>
|
elukey: good, have stopped some more timers, still going through the list
|
2021-01-27 18:32:45
|
<elukey>
|
okok
|
2021-01-27 18:36:47
|
<razzi>
|
I believe the following services should be kept, am I missing any?
|
2021-01-27 18:36:47
|
<razzi>
|
export_smart_data_dump.service
|
2021-01-27 18:36:47
|
<razzi>
|
logrotate.service
|
2021-01-27 18:36:47
|
<razzi>
|
man-db.service
|
2021-01-27 18:36:47
|
<razzi>
|
systemd-tmpfiles-clean.service
|
2021-01-27 18:38:19
|
<razzi>
|
oh and apt-daily.service and apt-daily-upgrade.service
|
2021-01-27 18:38:30
|
<elukey>
|
yes yes
|
2021-01-27 18:38:45
|
<elukey>
|
the only one that you missed is the hdfs-cleaner-*
|
2021-01-27 18:38:59
|
<elukey>
|
those are the periodic jobs that clean up some dirs in hdfs
|
2021-01-27 18:40:29
|
<elukey>
|
razzi: --^
|
2021-01-27 18:41:36
|
<razzi>
|
cool
|
2021-01-27 18:45:34
|
<elukey>
|
razzi: can you stop them?
|
2021-01-27 18:45:45
|
<elukey>
|
so we can proceed with the next steps :)
|
2021-01-27 18:46:08
|
<razzi>
|
yes yes, got distracted
|
2021-01-27 18:47:51
|
<elukey>
|
razzi: also there are mediawiki-* and hdfs-balancer
|
2021-01-27 18:48:30
|
<elukey>
|
we should really think about changing the names, adding something like analytics- in front
|
2021-01-27 18:48:34
|
<razzi>
|
How about prometheus-nic-firmware-textfile / prometheus_intel_microcode?
|
2021-01-27 18:49:17
|
<elukey>
|
those are fine, the prometheus exporters can be left aside
|
2021-01-27 18:49:23
|
<elukey>
|
they just expose metrics
|
2021-01-27 18:49:30
|
<razzi>
|
ok cool
|
2021-01-27 18:50:00
|
<elukey>
|
then we need to make sure that no java/python processes are running, and if so we'd need to wait
|
2021-01-27 18:50:33
|
<razzi>
|
so wait should hdfs-balancer and mediawiki* be stopped?
|
2021-01-27 18:54:05
|
<elukey>
|
yep yep
|
2021-01-27 18:54:16
|
<elukey>
|
those don't need to run while we reboot
|
2021-01-27 18:55:50
|
<razzi>
|
ok should be all set to reboot
|
2021-01-27 18:57:10
|
<elukey>
|
razzi: what about java/python processes running?
|
2021-01-27 18:57:16
|
<razzi>
|
oh right
|
2021-01-27 18:57:38
|
<elukey>
|
also you didn't stop the hdfs-cleaner timers
|
2021-01-27 18:58:21
|
<mforns>
|
ottomata: not sure if I need a +1 for these, but just in case, can you look? :]
|
2021-01-27 18:58:23
|
<mforns>
|
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/659022
|
2021-01-27 18:58:28
|
<mforns>
|
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/658426
|
2021-01-27 19:00:15
|
<razzi>
|
elukey: alright, stopped
|
2021-01-27 19:01:59
|
<fkaelin>
|
We would like to run a spark job that downloads all commons images from swift and stores the base64 image bytes in a column on hdfs; there will be roughly 7TB of data. Is there a recommended folder to store such a dataset, ie so that the size will not cause problems and it is available for others on the team?
|
2021-01-27 19:04:28
|
<elukey>
|
fkaelin: I suspect that you are working with Miriam :D
|
2021-01-27 19:05:34
|
<elukey>
|
fkaelin: so there are a couple of things to check - how many files are we talking about? (the hdfs namenode suffers a bit when we add millions more files etc..)
|
2021-01-27 19:05:45
|
<elukey>
|
I am more concerned about that than those 7TB of space
|
2021-01-27 19:06:04
|
<elukey>
|
razzi: so next steps? :)
|
2021-01-27 19:06:31
|
<razzi>
|
I see a couple of python processes: eventlogging_to_druid_navigationtiming_hourly and eventlogging_to_druid_navigationtiming_daily
|
2021-01-27 19:06:31
|
<razzi>
|
and a couple java ones org.wikimedia.analytics.refinery.job.HiveToDruid
|
2021-01-27 19:06:31
|
<razzi>
|
No idea how long they'll take to finish
|
2021-01-27 19:06:47
|
<elukey>
|
razzi: perfect
|
2021-01-27 19:06:54
|
<elukey>
|
one thing to check is when they started
|
2021-01-27 19:07:02
|
<elukey>
|
one on Jan25
|
2021-01-27 19:07:13
|
<elukey>
|
the other on Jan21
|
2021-01-27 19:07:30
|
<elukey>
|
or not sorry lemme check better
|
2021-01-27 19:07:34
|
<elukey>
|
I might say something silly
|
2021-01-27 19:08:20
|
<elukey>
|
mmm yes weird they have been running for a while
|
2021-01-27 19:08:50
|
<elukey>
|
mforns: holaaaaa
|
2021-01-27 19:08:55
|
<elukey>
|
do you have a min?
|
2021-01-27 19:09:36
|
<elukey>
|
the navtiming hourly + daily hive2druid indexations seem to be taking a long time, they started hours and hours ago
|
2021-01-27 19:09:45
|
<elukey>
|
has it ever happened that they got stuck?
|
2021-01-27 19:10:17
|
<fkaelin>
|
elukey yes, that is work with miriam. the image bytes will be stored as base64 encoded strings in a schema, so the number of files depends on whatever blocksize hadoop/spark chooses
|
2021-01-27 19:11:39
|
<elukey>
|
fkaelin: okok so 7TB is a bit much but we have a lot of space, and it is a one off, the only thing that we should check is how many files will be generated.. if it is say 10 million it might be a problem, if we are talking about a few thousand I think it is fine
|
2021-01-27 19:12:20
|
<elukey>
|
fkaelin: can we run a test on a subset of data to see how many files are generated?
|
2021-01-27 19:13:26
|
<elukey>
|
our blocksize for hadoop is 256M IIRC
|
2021-01-27 19:14:06
|
<elukey>
|
razzi: since we cannot leave things stopped for so long, let's reboot an-launcher1002
|
2021-01-27 19:14:15
|
<elukey>
|
those two jobs seem stuck
|
2021-01-27 19:14:25
|
<elukey>
|
(we need to downtime first)
|
2021-01-27 19:19:50
|
<elukey>
|
razzi: I am rebooting the node myself, we should not wait this long
|
2021-01-27 19:20:13
|
<elukey>
|
we stopped camus for a long time and when it restarts it lags for a while
|
2021-01-27 19:20:26
|
<elukey>
|
so when doing maintenance let's focus on the task please :)
|
2021-01-27 19:21:37
|
<razzi>
|
elukey: alright yeah
|
2021-01-27 19:24:49
|
<mforns>
|
elukey: in meeting! it finishes in 25mins
|
2021-01-27 19:25:03
|
<elukey>
|
mforns: all good! We can follow up tomorrow
|
2021-01-27 19:25:27
|
<mforns>
|
elukey: but yes, it happened start of the year!
|
2021-01-27 19:25:42
|
<elukey>
|
sigh :(
|
2021-01-27 19:25:55
|
<elukey>
|
razzi: ok host is up, can you re-enable and run puppet?
|
2021-01-27 19:27:16
|
<razzi>
|
elukey: re-enable timers via systemctl start?
|
2021-01-27 19:27:35
|
<elukey>
|
razzi: a puppet run is sufficient to restore all puppet-defined timers
|
2021-01-27 19:27:50
|
<razzi>
|
gotcha, that makes sense
|
2021-01-27 19:27:56
|
<ottomata>
|
sorry mforns looks like you got em +ed :)
|
2021-01-27 19:28:12
|
<mforns>
|
ottomata: yes, no problemo, they deployed :]
|
2021-01-27 19:29:04
|
<ottomata>
|
razzi: hm i did migrate a bunch of navigationtiming data to event platform today!
|
2021-01-27 19:29:11
|
<ottomata>
|
i wouldn't expect it to cause issues
|
2021-01-27 19:29:13
|
<ottomata>
|
but..would it?
|
2021-01-27 19:29:32
|
<ottomata>
|
mforns: can you think of anything in hive to druid that would need to be changed to deal with events with migrated schemas?
|
2021-01-27 19:29:37
|
<ottomata>
|
the hive table was migrated yesterday
|
2021-01-27 19:29:39
|
<elukey>
|
one job was stuck since the 21st :(
|
2021-01-27 19:29:46
|
<elukey>
|
the other from the 25th
|
2021-01-27 19:29:55
|
<mforns>
|
ottomata: in a meeting, but will respond in a bit!
|
2021-01-27 19:30:13
|
<ottomata>
|
hm yeah i didn't touch navigation timing until yesterday
|
2021-01-27 19:30:15
|
<ottomata>
|
also meeting! :)
|
2021-01-27 19:30:24
|
<elukey>
|
all right I am going to dinner, ttl!
|
2021-01-27 19:30:29
|
<razzi>
|
cya elukey
|
2021-01-27 19:31:07
|
<ottomata>
|
l8rs
|
2021-01-27 19:31:23
|
<wikibugs>
|
'Analytics-Kanban, ''Better Use Of Data, ''Product-Analytics, ''Product-Infrastructure-Data: Roll-up raw sessionTick data into distribution - https://phabricator.wikimedia.org/T271455 (''sdkim) a:''mforns→''Mayakp.wiki'
|
2021-01-27 19:37:22
|
<razzi>
|
afk for lunch
|
2021-01-27 20:02:37
|
<joal>
|
gone for tonight team - see you tomorrow
|
2021-01-27 20:09:03
|
<fkaelin>
|
elukey for the tests I used the default blocksize which seems to be 64MB. So for 7TB of data we are looking at ~100k files, or ~25k if we set the blocksize to 256MB.
|
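Spelled out, the arithmetic behind those estimates (binary units, and roughly one output file per block, as assumed above):

```python
# 7 TB of image data split into HDFS block-sized output files.
total_mb = 7 * 1024 * 1024      # 7 TiB expressed in MiB

files_64mb = total_mb // 64     # 114,688 -> the "~100k files" figure
files_256mb = total_mb // 256   # 28,672  -> the "~25k" figure

print(files_64mb, files_256mb)
```

|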
2021-01-27 20:12:26
|
<fkaelin>
|
elukey the job will run over a couple days on a small number of workers (aiming for ~100qps to swift), so the hdfs files will be created at a slow pace.
|
2021-01-27 20:27:16
|
<wikibugs>
|
'Analytics: Generalize the current Airflow puppet/scap code to deploy a dedicated Analytics instance - https://phabricator.wikimedia.org/T272973 (''EBernhardson) Concur with regard to multi-tenancy, I tried to setup our airflow initially in a way that used the builtin multi-tenancy but as soon as I started inte...'
|
2021-01-27 20:50:34
|
<mforns>
|
ottomata: I don't see any thing that would need to be changed for HiveToDruid re. migrated schemas...
|
2021-01-27 20:50:58
|
<mforns>
|
ottomata: maybe the only thing would be the meaning of dt field
|
2021-01-27 20:51:22
|
<mforns>
|
but IIUC the meaning of dt does not change right?
|
2021-01-27 20:52:28
|
<mforns>
|
and all other fields are available with the same name in a backwards compatible way... so, I'd say no changes needed
|
2021-01-27 20:54:26
|
<eyener>
|
joal if you're around, I'm getting another iteration of the `presto error: Failed to list directory: hdfs://analytics-hadoop/wmf/data/event_sanitized/CentralNoticeBannerHistory/year=2021/month=1/day=9/hour=1` error when I try to edit the Banner History dash
|
2021-01-27 20:57:45
|
<wikibugs>
|
('PS1) ''Mforns: Add en.wikidata to pageview whitelist [analytics/refinery] - ''https://gerrit.wikimedia.org/r/659081'
|
2021-01-27 20:59:58
|
<ottomata>
|
mforns: no it's the same with legacy data
|
2021-01-27 21:00:07
|
<ottomata>
|
dt only means event time for new schemas
|
2021-01-27 21:02:49
|
<mforns>
|
I see ottomata, HiveToDruid will work for new schemas the same, the only difference (if we want to use a time field other than dt for a given dataset) would be we have to explicitly specify it from druid_load.pp (which is already supported)
|
2021-01-27 21:03:03
|
<ottomata>
|
cool!
|
2021-01-27 21:08:05
|
<wikibugs>
|
'Analytics, ''Analytics-EventLogging, ''Analytics-Kanban, ''Event-Platform, and 2 others: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 (''mforns)'
|
2021-01-27 21:08:49
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform: MobileWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T267347 (''mforns)'
|
2021-01-27 21:09:00
|
<wikibugs>
|
'Analytics, ''Analytics-Kanban, ''Event-Platform, ''Patch-For-Review: DesktopWebUIActionsTracking Event Platform Migration - https://phabricator.wikimedia.org/T271164 (''mforns)'
|
2021-01-27 21:15:58
|
<wikibugs>
|
'Analytics, ''Better Use Of Data: Create Oozie job for session length - https://phabricator.wikimedia.org/T273116 (''mforns)'
|
2021-01-27 21:33:53
|
<ottomata>
|
fkaelin: default block size should be 256MB
|
2021-01-27 21:33:55
|
<ottomata>
|
https://yarn.wikimedia.org/conf
|
2021-01-27 21:33:59
|
<ottomata>
|
dfs.blocksize
|
2021-01-27 21:38:43
|
<ottomata>
|
mforns: tomorrow my morning i'm going to migrate my nav timing schemas to all wikis, if you are around we can do yours at the same time (without a deployment window)
|
2021-01-29 23:22:06
|
<razzi>
|
an-test-presto1001 is out of disk space and is causing alarms, but since it's a test node I'm not going to bother with it for now
|
2021-01-29 23:25:04
|
<wikibugs>
|
'Analytics: Presto should warn or prevent users from querying without Hive partition predicates - https://phabricator.wikimedia.org/T273004 (''razzi) One way to go about this may be to use `hive.max-partitions-per-scan`. From the docs: | hive.max-partitions-per-scan | Maximum number of partitions for a single...'
|
2021-01-29 23:27:02
|
<razzi>
|
There is also a problem on kafka-test1009: after rebooting, I see
|
2021-01-29 23:27:02
|
<razzi>
|
```
|
2021-01-29 23:27:02
|
<razzi>
|
razzi@kafka-test1009:~$ sudo systemctl list-units --failed
|
2021-01-29 23:27:02
|
<razzi>
|
UNIT LOAD ACTIVE SUB DESCRIPTION
|
2021-01-29 23:27:02
|
<razzi>
|
● ifup@ens5.service loaded failed failed ifup for ens5
|
2021-01-29 23:27:03
|
<razzi>
|
```
|
2021-01-29 23:27:03
|
<razzi>
|
Again, since it's a test node, I'm going to leave it alone
|
2021-01-29 23:41:56
|
<wikibugs>
|
'Analytics, ''Product-Data-Infrastructure, ''Wikimedia-Logstash, ''observability, ''Patch-For-Review: Create a separate logstash ElasticSearch index for schemaed events - https://phabricator.wikimedia.org/T265938 (''colewhite) >>! In T265938#6781389, @Ottomata wrote: > In a meeting with devs doing clien...'
|