[00:05:11] 06Labs, 15User-bd808: Provision novaobserver credentials on all Labs hosts - https://phabricator.wikimedia.org/T160929#3116647 (10Andrew) 05Open>03Resolved [00:48:56] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:15:41] 10Tool-Labs-tools-Xtools, 03Community-Tech-Sprint: Output data for new XTools: Articleinfo - https://phabricator.wikimedia.org/T157706#3013867 (10MusikAnimal) I think all the data is there now, check out http://localhost:8000/articleinfo/en.wikipedia.org/Bonzun (this one has checkwiki errors at the time of wri... [01:28:58] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [03:01:01] 10Wikibugs: Introduce wikibugs in #wikimedia-ve channel - https://phabricator.wikimedia.org/T160973#3116833 (10White-Master) [03:02:09] 10Wikibugs, 06Wikimedia-Venezuela: Introduce wikibugs in #wikimedia-ve channel - https://phabricator.wikimedia.org/T160973#3116846 (10White-Master) [06:35:48] PROBLEM - Puppet run on tools-exec-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [07:10:51] RECOVERY - Puppet run on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [10:06:50] 10Wikibugs, 06Wikimedia-Venezuela: Introduce wikibugs in #wikimedia-ve channel - https://phabricator.wikimedia.org/T160973#3117296 (10Aklapper) Configuration is in https://phabricator.wikimedia.org/source/wikibugs/browse/master/channels.yaml if someone wants to write a patch :) [10:29:10] 06Labs: Providing index of backlinks table to labs replicas - https://phabricator.wikimedia.org/T159984#3117397 (10Ebraminio) @Umherirrender: Thanks a lot for your help on this, but can I also know the reason behind this? I have a very specific query something like this: SELECT pl_title, COUNT(*) FROM pagelinks... 
[11:48:06] 06Labs, 10DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3117533 (10Dereckson) p:05Triage>03Low [11:50:44] 06Labs, 06Operations, 10hardware-requests: Codfw: (1) hardware access request for labtest - https://phabricator.wikimedia.org/T154706#3117556 (10chasemp) >>! In T154706#3116218, @RobH wrote: > @chasemp: > > Is there a specific existing server that meets this requirement to base a new spec off of? > There... [11:50:53] 06Labs, 06Operations, 10hardware-requests: Codfw: (1) hardware access request for labtest - https://phabricator.wikimedia.org/T154706#3117562 (10chasemp) [11:54:29] 06Labs, 06Operations, 10hardware-requests: Eqiad: (2) hardware access request for labcontrol1003/1004 - https://phabricator.wikimedia.org/T158207#3117577 (10chasemp) >>! In T158207#3116228, @RobH wrote: > Is there a specific cpu seed we have to stick to? 24 cores without HT is dual 12 core CPUs. Anything b... [11:57:40] 06Labs, 06Operations, 10hardware-requests: Eqiad: (2) hardware access request for labnet1003/1004 - https://phabricator.wikimedia.org/T158204#3117579 (10chasemp) >>! In T158204#3116230, @RobH wrote: > Is there a specific cpu seed we have to stick to? 24 cores without HT is dual 12 core CPUs. Anything between... [12:01:58] 06Labs: Request creation of getstarted labs project - https://phabricator.wikimedia.org/T160884#3117582 (10chasemp) Thanks for the overview @freddy2001. We'll get to this within the week. [12:03:37] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3117585 (10chasemp) >>! In T143349#3109558, @chasemp wrote: > A note that the appointed time grows nigh, and this is quickly becoming the most mysterious item left... [12:03:52] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3117586 (10chasemp) >>! 
In T143349#3113723, @hashar wrote: > I have finally deleted all three Precise instances from the `integration` labs project and updated the... [12:05:18] 06Labs, 10Horizon: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) - https://phabricator.wikimedia.org/T154860#3117589 (10chasemp) 05Open>03Resolved [12:40:49] 06Labs, 06Operations, 10hardware-requests: Eqiad: (2) hardware access request for labnet1003/1004 - https://phabricator.wikimedia.org/T158204#3117643 (10chasemp) 05Open>03stalled Let's hold on this one out of the pending 3 for last, I want to do some more review on CPU specs since the existing is such a... [13:41:14] 06Labs, 10Horizon: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) - https://phabricator.wikimedia.org/T154860#3117744 (10Andrew) 05Resolved>03Open Fixing dschwen's login was a bit of a hack... I'd like to keep this open until the actual cau... [13:41:24] 06Labs, 10Horizon: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) - https://phabricator.wikimedia.org/T154860#3117746 (10Andrew) p:05Triage>03Normal [13:50:56] Hi [13:52:35] can anyone help me to mass add tawiki categories to wikidata [14:12:30] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2507274 (10hashar) Potentially this one is solved for good. I closed the umbrella task I had (T152599) and haven't noticed such hang for a while now. There is a subtask about upgrading... [14:14:46] 06Labs: Measure capacity and utilization of labvirt**** servers - https://phabricator.wikimedia.org/T107067#3117866 (10hashar) I have added a graph of the CPU sum of `{system,user,nice,iowait,irq,softirq}` per labvirt hosts using a 1 day moving median. View over the last 7 days: https://grafana.wikimedia.org/d... 
[14:49:01] 10MediaWiki-extensions-OpenStackManager, 13Patch-For-Review: Wikitech 'Requested domain is invalid' - https://phabricator.wikimedia.org/T160995#3117956 (10bd808) [15:00:52] 06Labs, 10MediaWiki-extensions-OpenStackManager, 13Patch-For-Review: Wikitech 'Requested domain is invalid' - https://phabricator.wikimedia.org/T160995#3117969 (10Andrew) [15:02:47] 06Labs, 10MediaWiki-extensions-OpenStackManager, 13Patch-For-Review: Wikitech 'Requested domain is invalid' - https://phabricator.wikimedia.org/T160995#3117974 (10Andrew) 05Open>03Resolved [15:13:37] !log wikilabels staged wikilabels-wmflabs-deploy:c26481b [15:13:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL [15:24:40] !log wikilabels deployed wikilabels-wmflabs-deploy:c26481b T161002 [15:24:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL [15:24:43] T161002: Late march wikilabels deployment - https://phabricator.wikimedia.org/T161002 [15:35:29] !log wikilabels staged wikilabels-wmflabs-deploy:3f295c0 [15:35:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL [16:16:31] 06Labs, 10Labs-Infrastructure: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3118187 (10Andrew) [16:18:56] !log moving tools-exec-1406 to labvirt1011 to ease CPU usage on labvirt1004 [16:18:57] Unknown project "moving" [16:19:32] ^ andrewbogott wah wah stashbot error [16:19:51] !log tools moving tools-exec-1406 to labvirt1011 to ease CPU usage on labvirt1004 [16:19:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [16:20:00] stashbot should really ping the logging user for those errors [16:20:01] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help. 
[16:21:04] andrewbogott: yeah a pingback on username seems best [16:21:23] PROBLEM - Host tools-exec-1406 is DOWN: CRITICAL - Host Unreachable (10.68.18.13) [16:29:56] 06Labs, 10Labs-Infrastructure: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3118228 (10hashar) Went creating a lame graph that for each labvirt node graph the CPU usage * 2: https://grafana.wikimedia.org/dashboard/db/labs-capacity-planning?panelId=91&ful... [17:06:48] !log tools moving tools-webgrid-lighttpd-1404 to labvirt1012 to ease pressure on labvirt1004 [17:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:12:28] PROBLEM - Host tools-webgrid-lighttpd-1404 is DOWN: CRITICAL - Host Unreachable (10.68.17.144) [17:24:55] (03PS1) 10BryanDavis: Ping sending nick when reporting !log errors [labs/tools/stashbot] - 10https://gerrit.wikimedia.org/r/343925 [17:25:44] chasemp, andrewbogott: ask an ye shall receive ^ [17:54:44] RECOVERY - Host tools-exec-1406 is UP: PING OK - Packet loss = 0%, RTA = 1.46 ms [18:30:27] 06Labs: Providing index of backlinks table to labs replicas - https://phabricator.wikimedia.org/T159984#3118695 (10Umherirrender) A page in mediawiki is always in a namespace, so seaching for a page named 'C' can return the page in the main namespace, but also a user page or a template, because in all these name... [18:30:45] How would I go about shutting down a tool in non-compliance? [18:33:11] Dispenser: well, I think you wouldn't but you could report a tool in non-compliance and then a Tool admin would attempt to address it. What flavor of non-compliance? security issue or licensing issue? [18:33:34] https://phabricator.wikimedia.org/T160205 Privacy Policy I believe [18:34:10] 06Labs: Providing index of backlinks table to labs replicas - https://phabricator.wikimedia.org/T159984#3085420 (10jcrespo) Thanks for the help provided, Umherirrender. 
[18:35:02] Dispenser: ok thanks [18:47:55] RECOVERY - Host tools-webgrid-lighttpd-1404 is UP: PING OK - Packet loss = 0%, RTA = 2.19 ms [18:53:47] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3118792 (10ArthurPSmith) @Dispenser wikidata-externalid-url is installed on tool-labs which fully preserves user privacy, I'm not sure what your concern is? Please clarify where you think any policy has... [19:03:16] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3118815 (10Dispenser) "People follow a link to the Tool Labs, which has the WMF privacy policy, and they end up quietly disclosing their IPs in violation of that policy. You //can// redirect users to [... [19:03:40] (03CR) 10Andrew Bogott: [C: 031] Ping sending nick when reporting !log errors [labs/tools/stashbot] - 10https://gerrit.wikimedia.org/r/343925 (owner: 10BryanDavis) [19:19:26] 06Labs, 10Gerrit, 06Operations, 06Release-Engineering-Team, 07LDAP: Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3118904 (10Paladox) Can this be closed as resolved please? [19:19:53] 06Labs, 10Gerrit, 06Operations, 06Release-Engineering-Team, 07LDAP: Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3118905 (10demon) Haven't had time. [19:22:30] 06Labs, 06Operations: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3118922 (10chasemp) https://gerrit.wikimedia.org/r/#/c/343636/ [19:25:29] 06Labs, 10Gerrit, 06Operations, 06Release-Engineering-Team, 07LDAP: Remove user gerrit2 from ldap - https://phabricator.wikimedia.org/T160122#3118954 (10Paladox) >>! In T160122#3118905, @demon wrote: > Haven't had time. Oh, sorry, i thought it was done. 
[19:47:33] 10Tool-Labs-tools-Xtools, 06Community-Tech: [Epic] Rewrite XTools: Articleinfo - https://phabricator.wikimedia.org/T157602#3119102 (10kaldari) [19:47:37] 10Tool-Labs-tools-Xtools, 03Community-Tech-Sprint: Output data for new XTools: Articleinfo - https://phabricator.wikimedia.org/T157706#3119101 (10kaldari) 05Open>03Resolved [19:56:45] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119126 (10ArthurPSmith) @Dispenser, ok the issue is that people clicking an "external id" link are going to an external site? Is there any situation in which it is not obvious this is going to an exter... [19:59:16] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119144 (10ArthurPSmith) specifically, looking at The Godfather, which you mention here, there are close to 3 dozen OTHER external id links that similarly would show user IP information if followed. [20:13:31] marxarelli: btw, look at the tools k8s master role now - it's using a 'profile' (same as prod), so it should be easier to construct a different master role if you want [20:18:23] 06Labs, 10Tool-Labs, 10Prod-Kubernetes, 10Tools-Kubernetes, 07kubernetes: Fully document process for building a new version of Kubernetes debs - https://phabricator.wikimedia.org/T161031#3119203 (10yuvipanda) [20:18:59] 06Labs, 10Tool-Labs, 10Prod-Kubernetes, 10Tools-Kubernetes, 07kubernetes: Fully document process for building a new version of Kubernetes debs - https://phabricator.wikimedia.org/T161031#3119219 (10yuvipanda) @akosiaris started some docs with https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Building... 
[20:24:28] 06Labs, 10Tool-Labs, 10Prod-Kubernetes, 10Tools-Kubernetes, 07kubernetes: Coordinate Kubernetes efforts between Tool Labs and Production - https://phabricator.wikimedia.org/T153943#3119235 (10bd808) [20:26:59] 06Labs, 10Tool-Labs, 10Prod-Kubernetes, 10Tools-Kubernetes, 07kubernetes: Coordinate Kubernetes efforts between Tool Labs and Production - https://phabricator.wikimedia.org/T153943#3119244 (10bd808) WP:BOLD edits have been made to the main content here. Lets see if we can get decisions that have been mad... [20:27:05] 06Labs, 10Tool-Labs, 10Prod-Kubernetes, 10Tools-Kubernetes, 07kubernetes: Coordinate Kubernetes efforts between Tool Labs and Production - https://phabricator.wikimedia.org/T153943#3119245 (10yuvipanda) [20:43:37] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119363 (10Dispenser) @ArthurPSmith I didn't make the rule, just making sure its enforced. # The status bubble in Chromium says "...wmflabs.org/.../wikida..." this varies with browser width. There migh... [20:48:54] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119406 (10ArthurPSmith) Hmm, I think the big issue may be point 3. Do you have an example where this might have come up? I could certainly make it an interstitial easily enough, but that makes these li... [20:51:01] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119415 (10ArthurPSmith) a:03ArthurPSmith (claiming task - if this really needs to be done I can certainly take care of it) [21:08:04] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119470 (10ArthurPSmith) Hmm, Ok, I read through the discussion you linked with @coren - I certainly see there can be a privacy violation regarding expectations in cases as were discussed there. I think... 
[21:31:59] 06Labs, 10Tool-Labs: Add interstitial to wikidata-externalid-url - https://phabricator.wikimedia.org/T160205#3119726 (10Dispenser) I was thinking something like Internet in a Box / [[http://schoolserver.org/|XSCE School Server]] project where for "protect the children" an interstitial is necessary. I don't kn... [21:45:52] 06Labs, 10Labs-Infrastructure: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3119772 (10hashar) labvirt1004 had its load bump since March 7th {F6828118 size=full} and that fits nicely with a shift of CPU from labvirt1001 to labvirt1004: {F6828140 size=fu... [21:50:02] 06Labs, 10Labs-Infrastructure: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3119810 (10hashar) I lack data from the OpenStack side but a theory would be that a lot of Nodepool instances ends up being scheduled on the same host. Maybe because that is the... [22:18:09] 06Labs, 10Labs-Infrastructure: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3119938 (10Andrew) It is actually possible to explicitly tell the scheduler to not put multiple nodepool instances on the same labvirt. That would work if the total number of nod... [22:28:20] 06Labs, 10Labs-Infrastructure: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3120002 (10hashar) I guess that prevents the scheduler to select a compute node that already has an instance in that antiaffinity group isn't it ? We have a pool of 25 instances... 
[22:28:36] andrewbogott: yeah ServerGroupAntiAffinityFilter would be a good fit to force the load to spread on all labvirt :( [22:28:54] but we got a pool of 25 instances > to the 14 labvirt eek [22:32:10] andrewbogott: most probably the scheduler is over provisioning whatever host somehow [22:32:23] seems that was the case for labvirt1001 at some point (the time to boot instance was pretty bad) [22:32:38] and could have shifted to labvirt1004 which now has high systemcpu/user cpu [22:32:59] maybe it is overprovisioning a host based on whatever the scheduler might do :( [22:35:52] The labvirt1001 thing is unrelated, it doesn't seem to do with load [22:36:14] 10Tool-Labs-tools-Xtools, 06Community-Tech: XTools: Top edits - 'All' namespaces option - https://phabricator.wikimedia.org/T160721#3120041 (10kaldari) [22:36:21] The scheduler only knows about vcpu allocation, not about actual load. And of course most instances don't use anywhere close to their allotted cpus, so overprovisioning works fine [22:36:28] except if we happen to get unlucky :( [22:36:36] which is exactly the case of nodepool [22:36:56] thanks, yuvipanda. i think i have the master set up semi-properly but still wrestling a bit with authz/n [22:37:00] since when there are lot of jobs, the instance will most probably use 100% cpu [22:37:28] yeah [22:37:30] so potentially the scheduler could think that compute node X has lot of vcpu room, keep adding the 25 or so nodepool instances to it [22:37:47] assuming that 25 instances * 2 vcpu are only going to consume like a couple real CPU [22:37:57] (with whatever is the cpu ratio over commitment) [22:38:16] but the nodepool instances are unusual cause they consume bunch of cpu and thus send the compute node to sky rocket cpu [22:38:50] marxarelli: right. we've a thurs meeting for it I guess [22:38:56] which would explain a few outages labs had. 
A compute node being over busy, some messages are not processed by it or take too long [22:39:19] timeout, the rabbit mq / amqp whatever thing gets overcrowded : labs dies because of CI unusual load [22:39:38] :-( [22:40:12] hashar: I don't think it would ever cause outages, just slowdowns. [22:40:30] yuvipanda: i'm just trying to auth as an admin for now :) configured it to use abac for authz and a static token file for authn but it doesn't seem to be working for me [22:40:34] But, it's a problem… I'm wondering if I should try to make a special schedule filter that turns away instances at certain CPU metrics [22:40:55] 10Tool-Labs-tools-Xtools, 06Community-Tech: XTools: Top edits - 'All' namespaces option - https://phabricator.wikimedia.org/T160721#3120063 (10kaldari) [22:41:10] do you happen to have any logs of the scheduler decisions? [22:41:18] I don't mind spending some time digging into them [22:41:29] yes, but they're noisy and not very interesting :) [22:42:06] * hashar throws perl at the log files [22:42:41] I guess the scheduler weighs the hosts somehow [22:44:32] It's not weighted, it just filters based on limits. so it will often fill one node until it's full before moving on to the next one [22:44:35] 10Tool-Labs-tools-Xtools, 06Community-Tech: XTools: Top edits - 'All' namespaces option - https://phabricator.wikimedia.org/T160721#3120073 (10kaldari) [22:44:36] It's not as smart as I might like [22:46:30] 10Tool-Labs-tools-Xtools, 06Community-Tech: Add a server-side caching service for the new XTools - https://phabricator.wikimedia.org/T161057#3120084 (10kaldari) [22:46:36] 10Labs-project-Wikistats: Language name in rank.php should show up the language name in that language, not in English. 
- https://phabricator.wikimedia.org/T111607#3120100 (10Dzahn) a:03Dzahn [22:49:14] 10Tool-Labs-tools-Xtools, 06Community-Tech: Add a server-side caching service for the new XTools - https://phabricator.wikimedia.org/T161057#3120114 (10kaldari) [22:50:38] 10Tool-Labs-tools-Xtools, 06Community-Tech: Add a server-side caching service for the new XTools - https://phabricator.wikimedia.org/T161057#3120084 (10kaldari) p:05Triage>03Normal [22:57:09] 10Tool-Labs-tools-Xtools, 06Community-Tech: Add a server-side caching service for the new XTools - https://phabricator.wikimedia.org/T161057#3120084 (10Matthewrbowker) https://symfony.com/doc/current/components/cache.html - Appears to be easily available for symfony. I would avoid Redis because it makes xtool... [22:59:41] andrewbogott: what if it actually weighs them? and the best one would always be the same? [23:00:09] I am looking at what to weigh... [23:00:10] found a setting scheduler_host_subset_size = 1 (default) that says: A value of 1 chooses the first host returned by the weighing functions [23:00:12] e.g. not random [23:00:12] options are many [23:00:16] https://www.irccloud.com/pastebin/lln1VPJD/ [23:00:39] especially weird since some of those are good things and some are bad... [23:00:50] so potentially, if the scheduler does weigh the nodes somehow and we happen to have one that is always the best, all new instances end up on it [23:00:58] yeah that is a rather complicated config :( [23:01:11] Like, if a host has high cpu.idle.time then it's a good candidate! But if it has high iowait.time then it's bad, right? [23:01:50] And I'm worried about using just one metric… like if I prefer systems with high cpu.idle.time then a system that's IO bound will just get more and more instances, making IO slower and slower, which will free up the CPU, etc. etc. 
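[editor's sketch] The theory discussed above (hosts are filtered on allocation limits, ranked by a weight, and with scheduler_host_subset_size = 1 the top-ranked host always wins) can be illustrated with a toy model. This is not Nova code: the Host class, the idle_score metric, the host names and the numbers are all hypothetical, and the weight is deliberately one that does not change as instances are allocated.

```python
import random
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_vcpus: int      # vCPUs still allocatable under the overcommit ratio
    idle_score: float    # hypothetical utilization metric, higher is better


def pick_host(hosts, vcpus_needed, subset_size=1, rng=random):
    """Filter on allocation limits, rank by weight, pick from the top N."""
    candidates = [h for h in hosts if h.free_vcpus >= vcpus_needed]
    if not candidates:
        raise RuntimeError("no host passes the filters")
    ranked = sorted(candidates, key=lambda h: h.idle_score, reverse=True)
    return rng.choice(ranked[:max(1, subset_size)])


def place_many(hosts, count, vcpus_each, subset_size=1, seed=0):
    """Place `count` instances one by one and return the chosen host names."""
    rng = random.Random(seed)
    placements = []
    for _ in range(count):
        host = pick_host(hosts, vcpus_each, subset_size, rng)
        host.free_vcpus -= vcpus_each  # allocation changes, the weight does not
        placements.append(host.name)
    return placements


def make_hosts():
    # Three made-up compute nodes with plenty of allocatable vCPUs.
    return [Host("host-a", 100, 0.9), Host("host-b", 100, 0.5), Host("host-c", 100, 0.7)]


stacked = place_many(make_hosts(), count=25, vcpus_each=2, subset_size=1)
spread = place_many(make_hosts(), count=25, vcpus_each=2, subset_size=2)
print(set(stacked))
print(sorted(set(spread)))
```

With subset_size=1 all 25 instances land on host-a, because it stays the best-ranked host until its vCPU allocation runs out. A larger subset only spreads placements across the top-ranked hosts; it does not make the weight reflect actual load.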
[23:01:51] yeah sounds about right [23:03:18] liberty mentions an IoOpsFilter that tentatively filters out hosts having high io [23:03:55] That's a confusing name; in this context IoOps just means how many instances are being created or deleted or suspended on a given host [23:04:08] which apparently causes the scheduler to skip the compute node when there are more than "max_io_ops_per_host" operations such as: building an instance (cough nodepool), snapshotting etc [23:04:17] but that does not seem to take into account the actual instance I/O [23:04:47] maybe that could be used instead of the antiaffinity group? [23:04:58] eg if there are already 4 instances being built, skip it [23:05:05] yeah, it's a possibility [23:05:35] (which lets up to 13 * 4 = 52 instances be spawned) [23:06:00] (it is not like I have any idea what I am talking about, I am just making wild guesses) [23:10:45] hashar: I am going to try a hotfix like https://gerrit.wikimedia.org/r/#/c/344051/ to see if it affects behavior [23:10:50] not until after I eat dinner though :) [23:12:12] andrewbogott: I will be out by that time [23:12:27] that's fine, I won't leave it unattended in any case [23:12:43] seems the default for scheduler_weight_classes is all_weighers [23:12:50] so I guess that will make it weigh solely on cpu [23:12:52] yeah, I can't for the life of me tell what that means [23:13:26] RAM and disk space don't overprovision for the most part, though… since CPU is the main thing we overprovision it might be sound to just use that [23:13:31] but, I'll see what it does... 
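[editor's sketch] The IoOpsFilter idea above (skip a compute node that already has too many builds or snapshots in flight) and the 13 * 4 = 52 arithmetic can be written out as a small model. The function names and the sample host counts are hypothetical; only the option name max_io_ops_per_host and the figures 13 and 4 come from the discussion, and the strict "fewer than the limit" comparison is an assumption that matches "if there are already 4 instances being built, skip it".

```python
def passes_io_ops_filter(num_io_ops, max_io_ops_per_host):
    """A host passes while it has fewer in-flight operations than the limit."""
    return num_io_ops < max_io_ops_per_host


def eligible_hosts(io_ops_by_host, max_io_ops_per_host):
    """Return the hosts that may still receive a new instance build."""
    return [
        name
        for name, ops in io_ops_by_host.items()
        if passes_io_ops_filter(ops, max_io_ops_per_host)
    ]


def max_concurrent_builds(host_count, max_io_ops_per_host):
    """Upper bound on simultaneous builds across the whole pool of hosts."""
    return host_count * max_io_ops_per_host


# Made-up snapshot: host-b already has 4 builds in flight and is skipped.
in_flight = {"host-a": 1, "host-b": 4, "host-c": 3}
print(eligible_hosts(in_flight, max_io_ops_per_host=4))
print(max_concurrent_builds(host_count=13, max_io_ops_per_host=4))
```

As noted in the conversation, a limit like this only counts operations such as builds and snapshots. It says nothing about how much CPU or disk I/O the running instances consume afterwards.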
[23:13:39] so I guess the scheduler will create instances on host that have the least cpu usage [23:14:23] maybe the default all_weighers gives a super high score for free RAM [23:14:34] and the instance having the most free RAM ends up receiving all the new instances [23:15:22] but yeah wild guess from my side sorry :( ideally for each instance creation we would get a log of nodes weight and the decision that got taken [23:18:15] andrewbogott: have a good dinner and thanks! [23:22:45] 10Tool-Labs-tools-Xtools: XTools: Top edits - 'All' option - https://phabricator.wikimedia.org/T160720#3120230 (10Samwilson) [23:22:48] 10Tool-Labs-tools-Xtools, 06Community-Tech: XTools: Top edits - 'All' namespaces option - https://phabricator.wikimedia.org/T160721#3120232 (10Samwilson) [23:23:45] 10Tool-Labs-tools-Xtools, 06Community-Tech: XTools: Top edits - 'All' namespaces option - https://phabricator.wikimedia.org/T160721#3108804 (10Samwilson) [23:29:38] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3120259 (10hashar) To summarize the wild guesses I made to andrew over IRC: The scheduler possibly weights the hosts, the default being `all_weighers` which... [23:31:45] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Rebalance tools exec nodes with an eye towards CPU usage - https://phabricator.wikimedia.org/T161006#3120265 (10hashar) And a paper that happens to mention the case we have https://01.org/sites/default/files/utilization_based_scheduing_in_openstack_compute_n... 
[23:32:02] andrewbogott: and after all my guesses, I eventually find a paper explaining it all for us https://01.org/sites/default/files/utilization_based_scheduing_in_openstack_compute_nova_1.docx :] [23:50:38] 10Tool-Labs-tools-Xtools, 06Community-Tech: Add a server-side caching service for the new XTools - https://phabricator.wikimedia.org/T161057#3120084 (10Samwilson) There are multiple cache adapters available, and each installation of xTools can choose which is most suitable: http://api.symfony.com/3.2/Symfony/C... [23:52:38] 06Labs, 06Operations, 10hardware-requests: Eqiad: (2) hardware access request for labnet1003/1004 - https://phabricator.wikimedia.org/T158204#3029672 (10faidon) I'm not sure yet exactly which configurations are affected by the SSD/memory shortage, but I'm wondering if it would affect this order by virtue of...