Wikimedia IRC logs browser

2020-10-14 08:10:06	<wikibugs>	serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move mwaddlink-query from github to gerrit - https://phabricator.wikimedia.org/T261403 (kostajh) >>! In T261403#6538868, @MGerlach wrote: > @kostajh >> 1. take https://github.com/martingerlach/mwaddlink-query and move ut...
2020-10-14 08:11:56	<wikibugs>	serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move mwaddlink-query from github to gerrit - https://phabricator.wikimedia.org/T261403 (kostajh) >>! In T261403#6540928, @thcipriani wrote: >> Second, should this go under `mediawiki/services/{service-name}`, or somewhere...
2020-10-14 08:18:28	<wikibugs>	serviceops, MediaWiki-Parser, Parsoid, Platform Team Workboards (Green): CAPEX for ParserCache for Parsoid - https://phabricator.wikimedia.org/T263587 (ArielGlenn) In case we wanted to cannibalise some servers from the restbase cluster as we move their content to parsercache backends, assuming su...
2020-10-14 10:56:25	<effie>	apergos: I love the term cannibalise
2020-10-14 10:56:59	<apergos>	thanks :-)
2020-10-14 10:57:36	<apergos>	you know what I love? see the staff channel
2020-10-14 12:39:32	<hnowlan>	what's the recommended way to run the mtail tests? `python -m unittest discover . -p "*_test.py"` or something else?
2020-10-14 12:42:51	<wikibugs>	serviceops, Growth-Structured-Tasks, Growth-Team, Release-Engineering-Team: Move mwaddlink-query from github to gerrit - https://phabricator.wikimedia.org/T261403 (kostajh) >>! In T261403#6538868, @MGerlach wrote: >> 3. Growth engineers can work with you and Release Engineering to set up the Depl...
2020-10-14 12:45:34	<volans>	hnowlan: there is a tox env, so `tox -e mtail` (from the root of puppet) should do
2020-10-14 12:45:37	<volans>	but I never tried
2020-10-14 12:53:38	<hnowlan>	volans: ah nice, didn't notice that. thanks
2020-10-14 15:10:29	<addshore>	O/ coming back to this static sites related k8s stuff. Should I potentially start an email discussion about the ticket? (Not sure precisely where).
2020-10-14 15:10:52	<addshore>	We would love to figure out the direction so that we can setup a group of people to tackle it
2020-10-14 15:19:19	<wikibugs>	serviceops, Kubernetes: Support TLS for service-to-service communication in k8s staging - https://phabricator.wikimedia.org/T260917 (JMeybohm) I did test the change on deploy2001 and it behaves as expected (e.g. different from what PCC suggests). Means this PCC not showing a diff here is definitely a bug.
2020-10-14 15:27:16	<cdanis>	addshore: this is re: T264710 ?
2020-10-14 15:28:37	<addshore>	Yup
2020-10-14 15:39:21	<wikibugs>	serviceops, Push-Notification-Service, Product-Infrastructure-Team-Backlog (Kanban), User-jijiki: High latency on push notification service initialization - https://phabricator.wikimedia.org/T265258 (LGoto) p:Triage→Medium
2020-10-14 15:47:23	<addshore>	cdanis: yup (with a ping ;))
2020-10-14 15:51:05	<akosiaris>	addshore: probably not already. We 've seen the task, haven't had time to comment yet
2020-10-14 15:51:17	<akosiaris>	s/already/yet/ (sorry translation hiccup)
2020-10-14 15:53:09	<jayme>	+1 on that. I was about to blubber something about the possibility of creating a very lightweight nginx based image for that purpose
2020-10-14 15:53:11	<cdanis>	akosiaris: addshore: I think it would be possible to build something k8sified that looked a lot like the 'microsites' change model does now (merge to gerrit --> automatically deployed) ... with some effort :)
2020-10-14 15:55:39	<akosiaris>	I am not even sure k8sification is the best path forward. A way to populate a swift container also seems tempting.
2020-10-14 15:55:43	<jayme>	cdanis: yeah..that's probably the stuff I talked to mutante about (at least I guess so). Would be desirable to have a generic solution here.
2020-10-14 15:56:09	<akosiaris>	but alas, it's late in the evening and I don't have all the requirements yet clear in my head
2020-10-14 15:56:20	<akosiaris>	jayme: you probably want to add that use case in the task
2020-10-14 15:56:43	<jayme>	Yeah, indeed I want to :)
2020-10-14 15:57:29	<addshore>	cdanis: copying how microsites currently works but doing it in a container sounds like another good potential starting point
2020-10-14 15:58:05	<addshore>	Perfectly happy for our initial usecase to be a testing ground and iterate through ideas too
2020-10-14 15:58:51	<jayme>	the hard part is "The hosting location can be pointed to from sub paths of query.wikidata.org" I guess
2020-10-14 15:59:15	<cdanis>	that part makes things quite messy in the traffic layer, yes
2020-10-14 15:59:46	<jayme>	or maybe it is not if you can just add the foobar.microsite.svc.wmnet as upstream config to your nginx and proxy_pass
2020-10-14 16:00:19	<jayme>	your == WDQS nginx :)
2020-10-14 16:00:46	<addshore>	jayme: yes, that sounds like what I thought might work (but not sure myself as I don't know about our k8s bits and networking)
2020-10-14 16:03:03	<jayme>	that should work in case we're aiming a k8s approach. Not sure about the "just put it in swift" idea, though. But that can be figured out.
2020-10-14 16:03:25	<addshore>	Adding swift into the possibilities increases complexity somewhat. If this service ends up being nodejs to take advantage of service runner, then it may as well just fetch from git itself?
2020-10-14 16:04:04	<addshore>	However that has the downside of a dep on git, and if git goes away, the service can't come up? So swift in the middle might make sense
2020-10-14 16:05:10	<jayme>	git fetching during runtime/startup you mean? I would rather bake the files into the docker image tbh. to not have that dependency
2020-10-14 16:05:12	<cdanis>	addshore: well, the advantage of swift is it eliminates a SPOF on gerrit
2020-10-14 16:05:21	<addshore>	Ack
2020-10-14 16:05:49	<addshore>	Starting to sound like 2 services almost? 1 to update swift? 1 to serve from swift?
2020-10-14 16:06:11	<cdanis>	potentially, one to update swift and to also send CDN purges
2020-10-14 16:06:13	<addshore>	The updating one perhaps doesn't need to be k8sifyed? Thoughts?
2020-10-14 16:06:30	<jayme>	cant the ci pipeline push to swift?
2020-10-14 16:06:35	<addshore>	CDN purges, good point, hmm
2020-10-14 16:07:08	<addshore>	If the ci pipeline can push to swift that sounds great
2020-10-14 16:07:25	<jayme>	still, purging is a problem indeed
2020-10-14 16:08:19	<addshore>	Yes, especially given the usecase of a path under query.wikidata.org. does this hosting service even know what URLs to purge? Probably not...
2020-10-14 16:09:09	<addshore>	How does that currently happen for microsites?
2020-10-14 16:09:33	<cdanis>	it's possible it doesn't
2020-10-14 16:09:54	<addshore>	If not, that sounds like an easy level of functionality to match ;)
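
Note: the purge question above (does the hosting service even know which URLs to purge?) can be illustrated with a minimal Python sketch. It assumes the deploy step knows both the list of changed files and the public base URL the site is mounted under; the function name `purge_urls` and the example base URL are hypothetical and do not come from the discussion. No real purge API is called.

```python
def purge_urls(changed_files, public_base):
    """Map repo-relative file paths to the public URLs that would need purging.

    public_base is an assumption: the hosting service only knows it if
    someone configures the sub path the site is served from.
    """
    base = public_base.rstrip("/")
    urls = []
    for path in changed_files:
        path = path.lstrip("/")
        urls.append(base + "/" + path)
        # A directory index is usually also reachable via the directory URL.
        if path.endswith("index.html"):
            urls.append(base + "/" + path[: -len("index.html")])
    return urls


if __name__ == "__main__":
    # Hypothetical sub path, for illustration only.
    for url in purge_urls(["index.html", "css/site.css"],
                          "https://query.wikidata.org/example-site"):
        print(url)
```

The sketch shows why the sub-path case is awkward: without `public_base` being supplied from outside, the service has no way to derive the URLs.
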
2020-10-14 16:12:41	<addshore>	The only thing to circle back to is, is this complexity and effort worth it, Vs a super light weight nodejs service per static site? I'd be inclined to personally pay for the sticks of ram and extra CPU rather than create the thing we just talked about :0
2020-10-14 16:18:51	<jayme>	...or some nginx that we can probably run with even less resources than nodejs. Indeed.
2020-10-14 16:20:53	<addshore>	The one flag there is that it would be good to have structured logging and metrics via Prometheus, but I guess nginx can do those now?
2020-10-14 16:22:01	<jayme>	addshore: structured logging yes. They have json encoding support in logging stanzas now. For (really useful) prometheus metrics I think one still needs to use some lua extension
2020-10-14 16:22:26	<addshore>	https://github.com/nginxinc/nginx-prometheus-exporter https://medium.com/bolt-labs/using-json-for-nginx-log-format-793743064fc4
2020-10-14 16:23:09	<jayme>	the exporter is pretty shit tbh. as it just consumes the status page when not on nginx plus :-/
2020-10-14 16:23:25	<addshore>	heh
2020-10-14 16:24:37	<jayme>	don't know...maybe thats even enough for simple things as serving files. We could get telemetry from envoy then as well (response time/size buckets)
2020-10-14 16:24:40	<addshore>	The comment at https://phabricator.wikimedia.org/T264710#6532140 which talks about metrics requirements sounds like it is mainly geared toward the idea of one static sites service for all sites. I imagine advanced metrics are less useful when each site is its own service
2020-10-14 16:26:44	<addshore>	status codes would also be useful I guess? and I figure that will end up in logstash, but not available anywhere else
2020-10-14 16:27:06	<addshore>	(I'll write all of this discussion up in the ticket after)
2020-10-14 16:27:07	<jayme>	hm..you still want to get an idea about response times, errors etc.
2020-10-14 16:27:21	<jayme>	addshore: cool, thanks! Was about to ask
2020-10-14 16:28:31	<addshore>	https://blog.ruanbekker.com/blog/2020/04/25/nginx-metrics-on-prometheus-with-the-nginx-log-exporter/ sounds promising?
2020-10-14 16:29:25	<jayme>	or something like this https://github.com/knyar/nginx-lua-prometheus
2020-10-14 16:30:29	<jayme>	(to not have to write the log and read it again)
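
Note: the "write the log and read it again" approach can be sketched in Python as follows. This is only an illustration of the idea, not how any of the linked exporters are implemented. It assumes nginx writes one JSON object per line with the fields `status` and `request_time` (both field names and the bucket bounds are assumptions), and it counts status codes plus cumulative response-time buckets in the style of a Prometheus histogram.

```python
import json
from collections import Counter

# Hypothetical bucket upper bounds in seconds.
BUCKETS = (0.005, 0.05, 0.5, 5.0)


def aggregate(lines):
    """Return (status_counts, bucket_counts) for JSON access-log lines."""
    status_counts = Counter()
    bucket_counts = {bound: 0 for bound in BUCKETS}
    bucket_counts[float("inf")] = 0
    for line in lines:
        try:
            entry = json.loads(line)
            status = str(entry["status"])
            duration = float(entry["request_time"])
        except (ValueError, KeyError, TypeError):
            continue  # skip lines that are not well-formed log entries
        status_counts[status] += 1
        # Cumulative buckets: a request counts in every bucket it fits under.
        for bound in bucket_counts:
            if duration <= bound:
                bucket_counts[bound] += 1
    return status_counts, bucket_counts


if __name__ == "__main__":
    sample = [
        '{"status": "200", "request_time": "0.003"}',
        '{"status": "404", "request_time": "0.120"}',
    ]
    print(aggregate(sample))
```

The in-process lua approach jayme links avoids this second pass over the log entirely, which is the trade-off being discussed.
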
2020-10-14 16:30:44	<addshore>	nice
2020-10-14 16:31:21	<addshore>	So, a well tuned nginx base image for use by static sites could be a fairly simple approach that serviceops might be happy with? :) (not pressuring for a decision right now, i'm already very happy with this discussion), but i figure it's between the one service to rule them all approach, and this
2020-10-14 16:32:08	<jayme>	Guess that could at least be a cheap (in terms of resources) option, yes
2020-10-14 16:32:41	<addshore>	Right, I'm going to write some of this up in the phab task, and link to the chat logs :)
2020-10-14 16:33:06	<jayme>	great, thanks!
2020-10-14 16:33:26	<_joe_>	why not envoy instead of nginx? the configuration is sooo sweet and simple, you'd like it addshore :P
2020-10-14 16:33:41	<jayme>	because static files _joe_ :)
2020-10-14 16:33:55	<_joe_>	envoy can serve static content
2020-10-14 16:34:03	<_joe_>	they just say you shouldn't
2020-10-14 16:34:05	<jayme>	really?
2020-10-14 16:34:05	<_joe_>	:P
2020-10-14 16:34:08	<jayme>	ah, yeah
2020-10-14 16:34:28	<jayme>	I remember them always saying "this is not a webserver!" :D
2020-10-14 16:34:29	<_joe_>	to be clear, i was joking all along
2020-10-14 16:35:01	<addshore>	https://github.com/envoyproxy/envoy/issues/378
2020-10-14 16:35:03	<addshore>	:p
2020-10-14 16:35:51	<rzl>	we'll just throw up a staticoid behind it, it's fine
2020-10-14 16:36:04	<_joe_>	stabs rzl
2020-10-14 16:36:30	<rzl>	stop your stopwatches
2020-10-14 16:36:34	<rzl>	365 days exactly, very impressive
2020-10-14 16:36:43	<bblack>	implements staticoid as haproxy->envoy->varnish->ats->envoy->haproxy->nginx->lighttpd, all over localhost
2020-10-14 16:37:02	<rzl>	the good ol' "haproxy sandwich" as it's called
2020-10-14 16:37:10	<_joe_>	bblack: you forgot restbase, and a complimentary call to the action api
2020-10-14 16:37:49	<addshore>	i just spilt my tea
2020-10-14 16:37:53	<_joe_>	jokes aside, I think the engineering difficulty is not saying "ok, we use nginx as a base"
2020-10-14 16:37:55	<_joe_>	but
2020-10-14 16:38:02	<bblack>	and a per-character ratelimiter to make sure nobody types the http protocol too fast, using a remote ratelimit service hosted in AWS
2020-10-14 16:38:09	<_joe_>	how do you manage the repo feeding it?
2020-10-14 16:38:36	<_joe_>	because the obvious choice is "all static sites in one repo"
2020-10-14 16:38:51	<_joe_>	and then, who has merge rights there?
2020-10-14 16:39:28	<jayme>	the non obvious choice is one "service" per microsite, still
2020-10-14 16:39:40	<addshore>	starts printing his static site and filling envelopes
2020-10-14 16:39:58	<_joe_>	jayme: sigh
2020-10-14 16:40:32	<_joe_>	before we are able to do that, we need to upgrade k8s
2020-10-14 16:40:34	<jayme>	yeah, I know. But that's more or less buying time from humans with compute resources, right
2020-10-14 16:40:59	<jayme>	that was to "sigh"
2020-10-14 16:41:05	<_joe_>	also that
2020-10-14 16:41:19	<jayme>	we def. need to upgrade k8s to have an ingress to handle this stuff...indeed
2020-10-14 16:41:23	<bblack>	humans always cost more than compute resources
2020-10-14 16:41:31	<_joe_>	that ^^
2020-10-14 16:41:34	<addshore>	I enjoy my human time, and with the back of napkin maths it's worth spending on these compute resources
2020-10-14 16:42:10	<jayme>	:)
2020-10-14 16:43:40	<_joe_>	tbh the best solution would be to have separate repos, and any repo that merges a new tag triggers an image build
2020-10-14 16:43:47	<_joe_>	at the latest tag, for all repos
2020-10-14 16:44:06	<jayme>	hm..thats nice as well
2020-10-14 16:44:08	<_joe_>	that would save more human time, and also compute power
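
Note: one reading of the suggestion above (a new tag in any site repo triggers a single image build that takes every repo at its latest tag) can be sketched in Python. The repo names, the tag format and the helper names are hypothetical; how tags would actually be listed and how the build would be triggered are left out, since the discussion does not settle them.

```python
import re


def version_key(tag):
    """Sort key for tags such as 'v1.10.2': compare the numeric parts."""
    return tuple(int(number) for number in re.findall(r"\d+", tag))


def build_manifest(tags_by_repo):
    """Pick the latest tag of every repo that has at least one tag."""
    return {
        repo: max(tags, key=version_key)
        for repo, tags in tags_by_repo.items()
        if tags
    }


if __name__ == "__main__":
    # Hypothetical repos and tags, for illustration only.
    print(build_manifest({
        "site-a": ["v1.2.0", "v1.10.0"],
        "site-b": ["v0.1.0"],
    }))
```

The resulting mapping is what a build step would need in order to check out each repo at the right tag before assembling the combined image.
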
2020-10-14 16:45:33	<jayme>	need to run, will check back on the task tomorrow o/
2020-10-14 16:51:32	<addshore>	I guess the only question there _joe_ is in terms of deployments and how that would work?
2020-10-14 16:51:47	<addshore>	everyone with a site on the static sites service can deploy new images?
2020-10-14 16:52:58	<_joe_>	basically, yes
2020-10-14 16:53:14	<_joe_>	you know, in an ideal world where we have a ci we trust
2020-10-14 16:53:21	<_joe_>	we should just do CD for such stuff
2020-10-14 16:53:29	<_joe_>	git push => published
2020-10-14 16:53:38	<_joe_>	but we don't
2020-10-14 16:54:01	<_joe_>	I'm not giving a jenkins installation that's exposed to the internet the ability to deploy anything to kubernetes
2020-10-14 16:55:52	<addshore>	Right, i'll try to write that up as a proposal in the ticket too
2020-10-14 17:04:26	<addshore>	Written up a latest iteration with all of the things discussed
2020-10-14 17:04:37	<addshore>	The bot didn't say anything, so here is a link https://phabricator.wikimedia.org/T264710
2020-10-14 17:18:26	<cdanis>	_joe_: one repo per site is how microsites work already, right?
2020-10-14 17:18:48	<_joe_>	cdanis: now you're expecting too much of my memory
2020-10-14 17:18:52	<cdanis>	https://wikitech.wikimedia.org/wiki/Microsites
2020-10-14 17:18:54	<cdanis>	looks like it
2020-10-14 17:40:32	<addshore>	How easy would this magic step of new tag in one repo = new build of image from another repo be?
2020-10-14 17:42:05	<addshore>	I actually thought in this direction a bit originally, as the static site we have has a build step, and it would be nice to blubberize it individually, so we still have some image we can use for dev etc, but then when that image is built, trigger a "static site" image that pulls files from the various built images into a single container
2020-10-14 18:29:36	<_joe_>	addshore: you are one awful bash script away from the solution, yes
2020-10-14 18:29:43	<_joe_>	and blubber allows variants
2020-10-14 18:30:09	<_joe_>	but i'm not sure the pipeline allows to build the same image from different repos
2020-10-14 18:31:18	<_joe_>	the interesting thing would be a tag in one of those repos triggering the rebuild of the image from a container repo
2020-10-14 18:39:21	<addshore>	I might write a bit more in the ticket tonight about the above solution. I personally think it gives us the best bits of both sides

Wikimedia IRC logs browser - #wikimedia-serviceops