[12:16:57] please update the monday meeting doc
[12:17:13] trying to catch up from being away last week and missing yesterday's meeting, not easy :)
[12:21:55] anything I can help you with paravoid?
[12:22:38] most of our goals have no updates :)
[12:23:31] I'm checking too
[13:53:40] jijiki: do the jobrunners serve HTTP?
[13:54:38] to whom?
[13:54:47] anyone
[13:54:49] or anything
[13:55:13] then yes
[13:55:28] they don't show up on https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard as an option
[13:55:57] oh I see where you were going with this
[13:56:18] I take it that the goal of this dashboard is for issues that can directly affect users
[13:57:06] are the MW jobrunners in their own 'cluster'?
[13:57:10] yeah
[13:57:44] jobrunners and jobscalers are the same servers
[13:57:55] but the cluster is jobrunners
[13:58:21] let me give you a list of jobs that jobrunners run :)
[13:58:52] https://phabricator.wikimedia.org/T219148
[13:59:28] is that the 'misc' cluster?
[14:00:16] I guess what I'm first wondering is if this data exists for them / is meaningful
[14:00:17] there is a redis misc cluster and there used to be (I think) an upload misc cluster
[14:00:29] right but I mean the silly 'cluster' variable we define in hiera
[14:00:45] if there was a mw misc cluster, I don't know, I am afraid
[14:00:53] I see three values for it amongst things that are exporting the mediawiki_http_requests_duration_sum variable: api_appserver, appserver, misc
[14:01:29] mmm nah I don't know
[14:19:11] uughh I'm trying to track down the misc vs jobrunners thing
[14:19:26] I am glad it is you and not me 🙃
[14:20:20] yeah prometheus isn't scraping those atm, I'll comment on the review
[14:21:04] If that dashboard is used for incident reports, what about having a counter that shows exactly how many 5xx have been served in the selected time window?
[14:22:14] thanks for the reviews btw!
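A counter of exactly how many 5xx were served in the selected window boils down to summing how much a cumulative per-status request counter grew over that window. The minimal Python sketch below assumes a hypothetical input of (timestamp, cumulative value) samples keyed by HTTP status code; the log doesn't say which metric would back such a panel. In Prometheus itself this is normally the job of `increase()`, which additionally extrapolates to the window edges, so its numbers can differ slightly from this exact-sample version.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: for each HTTP status code, a time-ordered list of
# (unix_timestamp, cumulative_counter_value) samples.
Samples = List[Tuple[float, float]]


def counter_increase(samples: Samples, start: float, end: float) -> float:
    """Sum the growth of a cumulative counter between start and end.

    A drop in value is treated as a counter reset (e.g. a process restart),
    so the post-reset value counts as new growth. No extrapolation to the
    window edges is done, unlike PromQL's increase().
    """
    window = [v for t, v in samples if start <= t <= end]
    total = 0.0
    for prev, cur in zip(window, window[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def count_5xx(by_status: Dict[str, Samples], start: float, end: float) -> float:
    """Total 5xx responses served in [start, end] across all 5xx codes."""
    return sum(
        counter_increase(samples, start, end)
        for status, samples in by_status.items()
        if status.startswith("5")
    )


if __name__ == "__main__":
    data = {
        "200": [(0, 1000), (60, 1600), (120, 2300)],
        "500": [(0, 10), (60, 14), (120, 3)],  # counter reset between 60 and 120
        "503": [(0, 0), (60, 5), (120, 9)],
    }
    print(count_5xx(data, 0, 120))  # 500 grew by 4 + 3, 503 by 9
```

Treating any decrease as a reset mirrors how Prometheus handles counters; the trade-off is that a genuinely corrupted sample would be counted as growth rather than rejected.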
[14:23:10] godog: I think I know
[14:23:18] jobrunners can run really long
[14:23:22] and really short jobs
[14:23:46] so latency maybe doesn't make sense here
[14:24:19] e.g. if one uploads a loong video that needs to be encoded to whatever open source codec
[14:25:05] fair enough, so yeah it wouldn't be very indicative
[14:25:08] it == latency
[14:26:18] XioNoX: yeah that'd be a good indicator in general, I'm not sure the dashboard itself is used for incident reports but sth like that e.g. in kibana would be helpful
[14:27:49] there are a few ways in kibana to see total # of 50x
[14:28:01] it is my usual source for impact estimation on pomos
[22:22:47] chaomodus: is https://github.com/netbox-community/netbox/commit/30d160500704cfe442fcd7ec2d1f79aa9507d371 not suitable for https://phabricator.wikimedia.org/T230449 ?
[22:55:01] hah no that's exactly what i was looking at
[22:55:18] If I'm reading it right
[22:55:25] well that makes that task much simpler...
[23:09:30] haha :)
[23:11:07] of course it's not supported in pynetbox and also does not actually reserve the ip address
[23:12:14] ah perhaps it does allocate it
[23:12:51] ah yes
[23:12:54] okay i can use this
[23:14:05] XioNoX: something weird tho, look at https://netbox.wikimedia.org/ipam/prefixes/131/ip-addresses/
[23:14:13] the top 2 are ones that got allocated with this
[23:14:31] oooh i see
[23:14:35] there's like child prefixes
[23:15:10] okay
[23:15:13] no worries this is correct.
[23:16:36] paravoid: good find, I had misinterpreted the ticket I read about that change, and didn't see the POST method (the GET method is what we thought was available)
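As discussed at the end of the log, the per-prefix available-IPs endpoint accepts a POST that allocates an address (creates the IP address object), whereas a GET only lists free ones. Since pynetbox didn't support it at the time, a minimal standard-library Python sketch follows. It assumes the endpoint path /api/ipam/prefixes/<id>/available-ips/, NetBox token authentication, and that the POST body accepts IP-address fields such as `description`. The host name, token, and helper names are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Any


def available_ips_url(netbox_url: str, prefix_id: int) -> str:
    """Per-prefix available-IPs endpoint (path assumed, see lead-in)."""
    return f"{netbox_url.rstrip('/')}/api/ipam/prefixes/{prefix_id}/available-ips/"


def build_allocate_request(netbox_url: str, token: str, prefix_id: int,
                           description: str = "") -> urllib.request.Request:
    """Build a POST asking NetBox to pick the next free address in the prefix
    and create an IP address object for it, which reserves the address.

    A GET on the same URL would only list free addresses without reserving any.
    """
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        available_ips_url(netbox_url, prefix_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # NetBox API token auth
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def allocate_next_ip(netbox_url: str, token: str, prefix_id: int,
                     description: str = "") -> Any:
    """Send the request and return the decoded JSON response."""
    req = build_allocate_request(netbox_url, token, prefix_id, description)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `allocate_next_ip("https://netbox.example.org", "<token>", 131, "test host")` would target the prefix with ID 131 (the one linked in the log). Building the request separately from sending it keeps the URL and payload easy to inspect before touching a live NetBox.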