[00:19:26] <wm-bot>	 !log lucaswerkmeister@tools-bastion-13 tools.ranker deployed 1abc7122fa (l10n updates: lb, skr-arab)
[00:19:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[00:19:49] <wm-bb>	 <lucaswerkmeister> hm, why CrashLoopBackOff
[00:19:56] <wm-bb>	 <lucaswerkmeister> kubectl logs shows
[00:19:57] <wm-bb>	 <lucaswerkmeister> ++ whoami
[00:19:58] <wm-bb>	 <lucaswerkmeister> whoami: cannot find name for user ID 54606
[00:20:07] <wm-bb>	 <lucaswerkmeister> (54606 being the UID of tools.ranker)
[00:20:45] <wm-bb>	 <lucaswerkmeister> did another webservice restart, same result
[00:21:04] <wm-bb>	 <lucaswerkmeister> at least k8s is keeping the old pod alive and so the tool is still working, I guess ^^
[00:22:28] <wm-bb>	 <lucaswerkmeister> I’m guessing the failing command is https://gerrit.wikimedia.org/g/operations/docker-images/toollabs-images/+/7aaade322111aa942667d7a58b0a7dfd9832d954/shared/python/webservice-runner#6
[00:26:17] <wm-bot>	 !log lucaswerkmeister@tools-bastion-13 tools.ranker (new code / l10n version is not actually live yet due to T385847)
[00:26:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[00:26:39] <wm-bb>	 <lucaswerkmeister> issue filed, maybe someone else can make sense of it
[09:45:17] <wm-bot>	 !log lucaswerkmeister@tools-bastion-13 tools.ranker (new code version is now live and has been for ~8h thanks to T385847 being fixed)
[09:45:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[12:12:38] <dhinus>	 does anybody here know how to fix T385871?
[12:12:39] <stashbot>	 T385871: Wikidocumentaries is down - https://phabricator.wikimedia.org/T385871
[12:13:30] <dhinus>	 it might need a manual restart (VM was rebooted 9 days ago)
[12:19:07] <andrewbogott>	 dhinus: did you already log in and poke?  If not I'll have a try
[12:21:14] <andrewbogott>	  / is full
[12:23:41] <andrewbogott>	 !log wikidocumentaries / volume is full. Freed a bit of space by cleaning up /var/log/journal and 'docker container prune'
[12:23:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidocumentaries/SAL
[12:24:26] <andrewbogott>	 !log wikidocumentaries rebooted hupu2.wikidocumentaries.eqiad1.wikimedia.cloud in hopes the service will come back on its own
[12:24:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidocumentaries/SAL
[12:52:18] <dhinus>	 andrewbogott: thanks for fixing it! I had a quick look but didn't check the disk space!
[12:55:39] <andrewbogott>	 I didn't fix it, I just tried to fix it :)
[12:57:11] <dhinus>	 ah ok :D is the web service still down?
[12:57:18] <dhinus>	 I pinged one of the maintainers in the task
[12:57:43] <dhinus>	 ok saw your comment in the task now
[12:58:28] <andrewbogott>	 yeah, still down
[17:58:24] <andrewbogott>	 !log tools "SET GLOBAL read_only=OFF;" on tools-db-4; both -5 and -4 were set to read-only. No idea why or how...
[17:58:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:09:11] <birdcup>	 Howdy! Need help adding a couple of emails to a CloudVPS Grafana project Alerts notification list or (if possible) how to get edit access to set up alerting rules and templates myself. Ty!!
[18:13:10] <mutante>	 birdcup: I think it would depend on which project it's about and then contacting the admins of that project
[18:21:49] <birdcup>	 ok, thank you! will go look into that
[18:28:08] <mutante>	 birdcup: if you know the project name, then try to find it here: https://openstack-browser.toolforge.org/project/   click on it and see who is a member
[18:29:46] <bd808>	 grafana alerts is going to be separate from project membership unless the project is running its own grafana stack somehow
[18:31:23] <taavi>	 soo what exactly do you mean by "grafana alerts"? since https://grafana.wmcloud.org/ is just a querying interface for several prometheus instances and does not contain any alerting features on its own
[18:32:25] * bd808 doesn't need to summon taavi because he is self summoning :)
[18:33:16] <taavi>	 cloud vps definitely does not have a self-service alerting managing system (or if we have one, someone's intentionally built that to be hidden from me somehow)
[18:36:36] <inflatador>	 Does anyone know the correct S3 URLs for swift objects? I have no problem with downloading w/ `s3cmd get s3://categories/data-20250201.jnl.xz`, but the URL I get from `s3cmd info`, `http://object.eqiad1.wikimediacloud.org/categories/data-20250201.jnl.xz`, results in a 404. Note that I am using https, not http as specified in the URL
[18:37:40] <taavi>	 inflatador: https://object.eqiad1.wikimediacloud.org/PROJECT:BUCKET/FILE_PATH per https://wikitech.wikimedia.org/wiki/Help:Object_storage_user_guide#S3_API
[18:37:40] <birdcup>	 on Grafana if you go to Alerting > Contact points, (it at least seems like) one should be able to create a set of rules that, if fulfilled, will result in alerts being sent out. https://grafana.com/tutorials/alerting-get-started-pt4/
[18:38:01] <taavi>	 which grafana instance are we talking about?
[18:38:40] <inflatador>	 taavi many thanks!
[18:39:34] <birdcup>	 so trying to figure out how to make that happen since I don't seem to have access to log in via grafana-rw.wmcloud.org. also this is for https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=catalyst&var-instance=All&from=now-2d&to=now the Catalyst instance.
[18:44:51] <taavi>	 that grafana instance is a basically read-only querying interface to several prometheus instances and can't send any alerts or be used to modify any alerting configuration..
[18:45:00] <taavi>	 this feels like an XY problem to me. which problem are you actually trying to solve?
[20:51:07] <arturo>	 !log tools resize tools-legacy-redirector to have 2 vCPU T385908
[20:51:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:51:11] <stashbot>	 T385908: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908
[22:06:34] <jeena>	 taavi: We wanted to send our team an alert when our instance is down, for example. Is there an existing way to do this?
[23:01:28] <bd808>	 jeena: there is not
[23:02:07] <bd808>	 well... at least there is not a generic way that has been exposed to all projects
[23:04:37] <bd808>	 jeena: Making a phab task explaining what y'all are interested in doing would help get a conversation started about how we could add your project to the existing monitoring tooling in the metricsinfra project
[23:05:10] <jeena>	 Thanks bd808. I figured out how to add a contact point and alert policy on grafana, but since smtp isn't configured we are gonna try with the slack app, unless that reveals something else that needs configuring
[23:05:28] <bd808>	 the slack app for what?
[23:06:18] <bd808>	 https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra has some info about the metricsinfra project
[23:06:45] <jeena>	 Oh apparently you have to make a slack app and connect it to your workspace to get webhook alerts on slack? I'm not familiar with the workflow
[23:06:52] <jeena>	 I'll check the link you sent
[23:07:06] <jeena>	 Also here is our task https://phabricator.wikimedia.org/T385330
[23:08:02] <bd808>	 jeena: broadly speaking I think it is a poor use of your team's energy to try and fumble around and build things on our shared infrastructure without talking to the WMCS team.
[23:08:45] <bd808>	 thcipriani: ^
[23:09:24] <jeena>	 That makes sense. Would you be available for a quick video chat?
[23:09:32] <jeena>	 Just to get on the same page
[23:09:37] <bd808>	 sure
[23:09:44] <jeena>	 Okay cool
[23:11:07] <jeena>	 I sent you a link on slack
[23:49:56] <andrewbogott>	 Wikimedia Cloud Services (wikitech.wikimedia.org) | Status: tools-db instability, please ask for !help if the database is unavailable or read-only | Ask questions here, but please provide links and context. Use "!help" if nobody responds. | More details and channel logs at https://wikitech.wikimedia.org/wiki/Help:IRC | Code of Conduct applies: https://www.mediawiki.org/wiki/CoC
[23:59:32] <bd808>	 andrewbogott: did you mean to change the motd status and not just announce to the channel?
[23:59:56] <bd808>	 !status tools-db instability, please ask for !help if the database is unavailable or read-only
[23:59:57] <wmopbot>	 Too long status