[00:19:26] !log lucaswerkmeister@tools-bastion-13 tools.ranker deployed 1abc7122fa (l10n updates: lb, skr-arab)
[00:19:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[00:19:49] hm, why CrashLoopBackOff
[00:19:56] kubectl logs shows
[00:19:57] ++ whoami
[00:19:58] whoami: cannot find name for user ID 54606
[00:20:07] (54606 being the UID of tools.ranker)
[00:20:45] did another webservice restart, same result
[00:21:04] at least k8s is keeping the old pod alive and so the tool is still working, I guess ^^
[00:22:28] I’m guessing the failing command is https://gerrit.wikimedia.org/g/operations/docker-images/toollabs-images/+/7aaade322111aa942667d7a58b0a7dfd9832d954/shared/python/webservice-runner#6
[00:26:17] !log lucaswerkmeister@tools-bastion-13 tools.ranker (new code / l10n version is not actually live yet due to T385847)
[00:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[00:26:39] issue filed, maybe someone else can make sense of it
[09:45:17] !log lucaswerkmeister@tools-bastion-13 tools.ranker (new code version is now live and has been for ~8h thanks to T385847 being fixed)
[09:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.ranker/SAL
[12:12:38] anybody here knows how to fix T385871?
[12:12:39] T385871: Wikidocumentaries is down - https://phabricator.wikimedia.org/T385871
[12:13:30] it might need a manual restart (VM was rebooted 9 days ago)
[12:19:07] dhinus: did you already log in and poke? If not I'll have a try
[12:21:14] / is full
[12:23:41] !log wikidocumentaries / volume is full. Freed a bit of space by cleaning up /var/log/journal and 'docker container prune'
[12:23:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidocumentaries/SAL
[12:24:26] !log wikidocumentaries rebooted hupu2.wikidocumentaries.eqiad1.wikimedia.cloud in hopes the service will come back on its own
[12:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidocumentaries/SAL
[12:52:18] andrewbogott: thanks for fixing it! I had a quick look but didn't check the disk space!
[12:55:39] I didn't fix it, I just tried to fix it :)
[12:57:11] ah ok :D is the web service still down?
[12:57:18] I pinged one of the maintainers in the task
[12:57:43] ok saw your comment in the task now
[12:58:28] yeah, still down
[17:58:24] !log tools "SET GLOBAL read_only=OFF; " on tools-db-4; both -5 and -4 were set to read only. No idea why or how...
[17:58:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:09:11] Howdy! Need help adding a couple of emails to a CloudVPS Grafana project Alerts notification list or (if possible) how to get edit access to set up alerting rules and templates myself. Ty!!
[18:13:10] birdcup: I think it would depend on which project it's about and then contacting the admins of that project
[18:21:49] ok, thank you! will go look into that
[18:28:08] birdcup: if you know the project name, then try to find it here: https://openstack-browser.toolforge.org/project/ click on it and see who is a member
[18:29:46] grafana alerts is going to be separate from project membership unless the project is running its own grafana stack somehow
[18:31:23] soo what exactly do you mean by "grafana alerts"? since https://grafana.wmcloud.org/ is just a querying interface for several prometheus instances and does not contain any alerting features on its own
[18:32:25] * bd808 doesn't need to summon taavi because he is self summoning :)
[18:33:16] cloud vps definitely does not have a self-service alerting managing system (or if we have one, someone's intentionally built that to be hidden from me somehow)
[18:36:36] Does anyone know the correct S3 URLs for swift objects? I have no problem with downloading w/ `s3cmd get s3://categories/data-20250201.jnl.xz` , but the URL I get from `s3cmd info` , `http://object.eqiad1.wikimediacloud.org/categories/data-20250201.jnl.xz` , results in a 404 . Note that I am using https, not http as specified in the URL
[18:37:40] inflatador: https://object.eqiad1.wikimediacloud.org/PROJECT:BUCKET/FILE_PATH per https://wikitech.wikimedia.org/wiki/Help:Object_storage_user_guide#S3_API
[18:37:40] on Grafana if you go to Alerting > Contact points (it at least seems like) one should be able to create a set of rules that if fulfilled, will result in alerts being sent out. https://grafana.com/tutorials/alerting-get-started-pt4/
[18:38:01] which grafana instance are we talking about?
[18:38:40] taavi many thanks!
[18:39:34] so trying to figure out how to make that happen since I don't seem to have access to login via grafana-rw.wmcloud.org. also this is for https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=catalyst&var-instance=All&from=now-2d&to=now the Catalyst instance.
[18:44:51] that grafana instance is a basically read-only querying interface to several prometheus instances and can't send any alerts or be used to modify any alerting configuration..
[18:45:00] this feels like an XY problem to me. which problem are you actually trying to solve?
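(Editor's note on the 18:37:40 object storage answer: a minimal sketch of the quoted URL pattern `https://object.eqiad1.wikimediacloud.org/PROJECT:BUCKET/FILE_PATH`. The helper name `object_url` and the project name `exampleproject` are hypothetical; the log never states which project owns the `categories` bucket. The sketch only builds the URL string and does not check that the object exists or is publicly readable.)

```python
from urllib.parse import quote

# Host as quoted in the log; an assumption for any other deployment.
BASE_URL = "https://object.eqiad1.wikimediacloud.org"


def object_url(project: str, bucket: str, path: str) -> str:
    """Build a URL of the form BASE_URL/PROJECT:BUCKET/FILE_PATH."""
    # Keep ':' literal so the PROJECT:BUCKET separator survives percent-encoding.
    container = quote(f"{project}:{bucket}", safe=":")
    # Keep '/' literal so nested object paths stay intact.
    return f"{BASE_URL}/{container}/{quote(path.lstrip('/'), safe='/')}"


# "exampleproject" is a placeholder, not the real project name.
print(object_url("exampleproject", "categories", "data-20250201.jnl.xz"))
```

The URL reported by `s3cmd info` in the log has no `PROJECT:` prefix before the bucket name, which is the one difference from the pattern given in the reply.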
[20:51:07] !log tools resize tools-legacy-redirector to have 2 vCPU T385908
[20:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:51:11] T385908: toolforge-legacy-redirector: constant failed probes by prometheus - https://phabricator.wikimedia.org/T385908
[22:06:34] taavi: We wanted to send our team an alert when our instance is down, for example. Is there an existing way to do this?
[23:01:28] jeena: there is not
[23:02:07] well... at least there is not a generic way that has been exposed to all projects
[23:04:37] jeena: Making a phab task explaining what y'all are interested in doing would help get a conversation started about how we could add your project to the existing monitoring tooling in the metricsinfra project
[23:05:10] Thanks bd808. I figured out how to add a contact point and alert policy on grafana, but since smtp isn't configured we are gonna try with the slack app, unless that reveals something else that needs configuring
[23:05:28] the slack app for what?
[23:06:18] https://wikitech.wikimedia.org/wiki/Nova_Resource:Metricsinfra has some info about the metricsinfra project
[23:06:45] Oh apparently you have to make a slack app and connect it to your workspace to get webhook alerts on slack? I'm not familiar with the workflow
[23:06:52] I'll check the link you sent
[23:07:06] Also here is our task https://phabricator.wikimedia.org/T385330
[23:08:02] jeena: broadly speaking I think it is a poor use of your team's energy to try and fumble around and build things on our shared infrastructure without talking to the WMCS team.
[23:08:45] thcipriani: ^
[23:09:24] That makes sense. Would you be available for a quick video chat?
[23:09:32] Just to get on the same page
[23:09:37] sure
[23:09:44] Okay cool
[23:11:07] I sent you a link on slack
[23:49:56] Wikimedia Cloud Services (wikitech.wikimedia.org) | Status: tools-db instability, please ask for !help if the database is unavailable or read-only | Ask questions here, but please provide links and context. Use "!help" if nobody responds. | More details and channel logs at https://wikitech.wikimedia.org/wiki/Help:IRC | Code of Conduct applies: https://www.mediawiki.org/wiki/CoC
[23:59:32] andrewbogott: did you mean to change the motd status and not just announce to the channel?
[23:59:56] !status tools-db instability, please ask for !help if the database is unavailable or read-only
[23:59:57] Too long status