[00:00:50] quiddity: this is good fodder for the tool takeover committee I think
[00:01:02] indeed! I will suggest
[00:55:15] Tool-Labs-tools-Xtools, Community-Tech: Build new front-end for xtools-articleinfo - https://phabricator.wikimedia.org/T159395#3066378 (kaldari)
[00:55:36] Tool-Labs-tools-Xtools, Community-Tech: Build new front-end for xtools-articleinfo - https://phabricator.wikimedia.org/T159395#3066378 (kaldari) p:Triage>Normal
[04:09:59] Labs-project-Phabricator: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3066662 (jsn.sherman)
[07:37:37] Labs, Operations, Traffic, Puppet, Technical-Debt: Uniform cluster nomenclature across puppet - https://phabricator.wikimedia.org/T159411#3066814 (Joe)
[07:45:01] Labs, Operations, Traffic, Easy, and 2 others: Convert all of our site.pp/roles to the role/profile paradigm - https://phabricator.wikimedia.org/T159412#3066827 (Joe)
[09:22:34] Labs, Labs-Infrastructure, DBA, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#3066980 (Marostegui)
[09:22:38] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: labsdb1005 (mysql) maintenance for reimage - https://phabricator.wikimedia.org/T157358#3066977 (Marostegui) Open>Resolved a:Marostegui I am closing this as nothing has been reported so far. If something arises, feel free to reo...
[09:23:48] Labs, Labs-Infrastructure, DBA, Datasets-General-or-Unknown: Rebuild old timestamp format tables - https://phabricator.wikimedia.org/T151607#3066984 (Marostegui) p:Normal>Low
[10:27:39] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3067283 (jcrespo)
[11:07:47] Labs, The-Wikipedia-Library: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3067388 (Aklapper) I assume that context is T149433. (Removing #labs-project-phabricator as this has nothing to do with that Labs project)
[11:37:02] Labs: Blacklist apache from unattended-upgrades on tools puppetmaster - https://phabricator.wikimedia.org/T159254#3067485 (MoritzMuehlenhoff) @scfc: Updates for Apache in jessie are relatively rare, I could simply drop a note in the labs channel when that happens?
[11:51:50] Should it be possible to successfully apply any role that is listed on horizon to any labs project? i.e. if applying a particular role makes it impossible to instantiate a VM, is that a bug (that I should report)? Or is it likely that many roles won't work out of the box because they depend on other things?
[12:08:28] tarrow: many roles will not work
[12:08:43] a lot of what is in puppet is aimed at production
[12:08:48] and would not work out of the box on labs
[12:09:07] but if you have some use case for a specific role, I guess it can be adjusted to accommodate labs usage
[12:10:44] not sure; I'm trying to use role::toollabs::elasticsearch. I'm applying it to a new instance which is (I think) pointed at my own standalone puppet master. If you apply it before it boots then it fails to get to the stage where you can ssh in.
[12:12:00] I think I know why: it tries to read a secret from a file that doesn't exist in the "not secret" secret repo
[12:13:59] if I make that secret file on my puppetmaster and apply the role once I can ssh to the box then it applies (It doesn't work but that's *my* problem)
[12:15:12] but if I apply it before I spawn the instance then the instance is never possible to ssh into (even with the dummy secret now existing on my puppet master)
[12:20:44] tarrow: role::toollabs::* most probably rely on a toollabs-specific environment
[12:21:34] if a secret is missing, a dummy one can probably be added to labs/private.git repo (make sure to only use dummy data in there)
[12:22:33] maybe you can directly use the ::elasticsearch class
[12:22:38] yep; but it needs to be added to the global labs private repo right? I can't just add it to the one on my puppetmaster
[12:23:00] there is some doc on https://doc.wikimedia.org/puppet/puppet_classes/elasticsearch.html
[12:23:22] (generated from modules/elasticsearch/manifests/init.pp )
[12:23:45] this way you don't depend on whatever toollabs context happens to be
[12:24:12] and will not be impacted if toollabs suddenly change something in their role::toollabs::elasticsearch class
[12:25:08] it is lunch time for me
[12:25:35] What I was hoping to do was fix a bug that happens on the toollabs instance on a copy on my labs project and then (if it works) submit a patch to role::toollabs::elasticsearch
[12:25:53] bon appetit :)
[16:11:52] bd808: I want to add more wikis where an OAuth grant I have can operate... I guess I have to create a new OAuth consumer and deactivate the former?
[16:12:37] TabbyCat: bd808 is out for the week and I'm not sure about the answer to that, you could wait till monday or make a ticket
[16:15:37] chasemp: oh, yep, I remember now his red dot at Phabricator
[16:15:42] it can wait
[16:15:58] I'm just trying to repeat it but is this a known bug: spawn a VM on horizon, apply a puppet role, delete the instance, wait some time, make a new instance with the same name, find it has the role applied when it shouldn't?
[16:16:02] I can't even create the pywikibot family file yet as the script is failing :)
[16:16:51] tarrow: it depends on how the role is applied but in general it's via instance name outright or instance prefix so that would be expected
[16:17:12] as in, if you create an instance of the same name (which itself has issues) it's expected it's a recreation
[16:17:27] but it's a point to be argued I imagine about whether we should tie that together or not
[16:17:34] ah, even though the instance id is different?
[16:17:59] puppet doesn't know about instance id's so it's not a factor
[16:20:00] ah cool, just me being silly then. I should make a fresh name for each new test instance.
[16:22:53] tarrow: it's more or less standard if something isn't a special snowflake to do some kind of 'function-01' style for a few reasons
[16:23:11] it makes it easier to reason about which version or era of an instance or deployment had certain behaviors
[16:23:30] and if you do any kind of log collection or monitoring or dealing with config or perf issues over a midterm duration
[16:23:34] It seems that puppet "knows" that you've destroyed an instance though because you can make a new one with the same name and it will connect to the puppetmaster despite the fact it offers a certificate signed by a different key.
If you generate a new key on the same host without destroying it then you can no longer connect to the labs puppetmaster
[16:23:48] it gets confusing if multiple servers with distinct setups at possibly different flavors and such are all named x
[16:24:18] yeah, I got up to test-10 and rolled back to test-01 because I didn't think anything would have persisted
[16:24:28] tarrow: yes creation and deletion are handled but the custom backend is ignorant of it, that's possibly a bug honestly
[16:24:34] or at least it should be explicit to leave it
[16:25:02] tarrow: fair point yeah, rollover at some point has to be sane afa numbering
[16:25:26] tarrow: my memory says now there is a task somewhere already discussing the lifecycle of puppet application in horizon
[16:25:32] I'll try to dig it up
[16:25:56] thanks; I'm new to puppet and have spent the last week going further and further down the rabbit hole
[16:26:54] puppet is one big rabbit hole tarrow you are not alone
[16:27:06] but it's our rabbit hole and we love to hate it
[16:27:22] I don't quite understand how when I set a new puppetmaster in hiera and run the agent I can then still make changes on horizon and those roles are applied.
But it also seems that the very first run of the instance is against the default puppetmaster
[16:28:08] which sucks if you want to apply a role that prevents the client from starting up on the default puppetmaster but will work on your standalone one
[16:28:22] two things may be relevant, 1 yes there are 2 puppet runs on all new instances before it's turned over for meeting a project specific master
[16:28:29] that sets up the keys and such for us all and various updates
[16:28:38] and then the second is puppet has a mechanism known as external node classifier
[16:28:51] where you can use external logic or a script to lookup applicable roles and settings
[16:29:12] we have a service that holds those values and parameters and I believe all the masters look to it
[16:29:23] whether they are the main labs master or project master by defalt
[16:29:24] default
[16:29:40] unfortunately it seems that the custom roles are applied before the important things like copying keys so if your custom role fails then you can't login
[16:30:02] that's possible, I'm not sure
[16:30:17] order of operation in puppet is fixed but I'm not confident on how it plays out over equally weighted roles atm
[16:30:41] I mean, if you remove the assignment in horizon it should then run
[16:30:45] afaik
[16:31:23] but then you have to reapply it once it has successfully run the first two times which stops you from prefixing
[16:32:26] if the role application is broken ...yeah
[16:32:41] and also trash the VM and make a new one with a totally new name/number if you make a mistake because you can't unapply it
[16:33:37] I think that's pretty much all true yes
[16:33:53] bootstrapping a broken role application where it doesn't already exist at all is going to have challenges
[16:34:10] :)
[16:34:52] If I eventually understand it I'll try and document a puppet on labs for idiots guide.
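(Aside on the external node classifier mentioned at 16:28:38: a minimal sketch, not the actual Labs service. Puppet's ENC contract is that the master runs an executable with the node's certname as its only argument and reads YAML with a `classes` key from its stdout. The prefix table, role name and hostnames below are invented for illustration. The point is only that the lookup sees a name, never an instance id, so a recreated instance with an old name picks up whatever was assigned to that name.)

```python
#!/usr/bin/env python3
"""Toy external node classifier (ENC): hostname in, YAML out."""
import sys

# Hypothetical assignments keyed by hostname prefix; not real Labs data.
PREFIX_CLASSES = {
    "es-test-": ["role::example::elasticsearch"],
}


def classify(certname):
    """Return the sorted classes whose prefix matches the node name."""
    classes = set()
    for prefix, names in PREFIX_CLASSES.items():
        if certname.startswith(prefix):
            classes.update(names)
    return sorted(classes)


def to_enc_yaml(classes):
    """Render the minimal YAML document an ENC prints on stdout."""
    if not classes:
        return "---\nclasses: []\n"
    body = "".join("  - {}\n".format(name) for name in classes)
    return "---\nclasses:\n" + body


# Only emit output when called the way a master would call it: one argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(to_enc_yaml(classify(sys.argv[1])))
```

(A master would be pointed at a script like this with `node_terminus = exec` and `external_nodes = <path to script>` in puppet.conf. How the Labs masters are actually wired to the service described above is not stated in the channel.)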
[16:34:56] I would say we use the same mechanisms in our own projects as admins and suffer the same pains
[16:35:08] so it's not for lack of caring but it's a difficult problem with almost guaranteed edge cases
[16:35:20] that would be great
[16:50:44] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3068149 (chasemp)
[17:03:20] Labs: Show Already Used Hostnames in OpenStack Horizon - https://phabricator.wikimedia.org/T159460#3068188 (Tarrow)
[17:11:36] Labs, Tool-Labs, Epic: Find a solution for tools-exec-gift on Trusty - https://phabricator.wikimedia.org/T156981#3068216 (chasemp) A note from yesterday: > annika: chasemp: can you take a look at job 1820616, please? it doesn't schedule on tools-exec-gift-trusty-01.tools.eqiad.wmflabs and i don't ex...
[18:25:43] Tool-Labs-tools-Xtools, Community-Tech: Migrate XTools from Ubuntu Precise to Trusty - https://phabricator.wikimedia.org/T157123#3068565 (DannyH)
[18:26:24] Tool-Labs-tools-Xtools, Community-Tech, Documentation: Improve documentation for setting up XTools on a local machine - https://phabricator.wikimedia.org/T157609#3068567 (DannyH)
[18:28:28] Tool-Labs-tools-Xtools, Community-Tech: Investigation: Plan for rewriting XTools - https://phabricator.wikimedia.org/T154551#3068572 (DannyH)
[19:27:07] (PS1) Mattflaschen: Add RecentChanges and Watchlist to #wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/340806
[20:16:30] Labs, Tool-Labs, translatewiki.net: update node.js on tools.telegrambot - https://phabricator.wikimedia.org/T159368#3068849 (yuvipanda) Upon more thought, the easiest thing for you to do now is probably to use either https://github.com/tj/n or https://github.com/creationix/nvm.
[20:17:19] Labs: Delete test-spm-1.project-proxy - https://phabricator.wikimedia.org/T159257#3068856 (yuvipanda) a:yuvipanda>None I've no idea who created this instance or what they might be using it for?
[21:02:46] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3068974 (hashar) Nodepool emits a metric similar to the fullstack one. That is from the time an instance is created internally to nodepool until the time it has been added as a Jenkins slave....
[21:07:49] Labs, Tool-Labs, Tool-Labs-standards-committee, Technical-Tool-Request: New Maintainer needed for LanguageTool WikiCheck on Tool Labs - https://phabricator.wikimedia.org/T152049#3068997 (Huji)
[21:12:21] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3069031 (hashar) There are some labvirt* that shows a bump in CPU guest / Load average. labvirt1001 specially is concerning (load raising to 45+). Graphs over 30 days: [[ https://grafana.wiki...
[21:41:01] yuvipanda: so about https://phabricator.wikimedia.org/T159368#3068849
[21:43:38] do I need to sudo to `npm install -g n`?
[21:43:58] because that asks for a password, and I don't even remember setting one
[21:52:35] No, and you do not have sudo rights aharoni
[21:53:35] yuvipanda: mmm, so how do I run it?
[21:54:22] because it says "Error: EACCES, mkdir '/usr/local/lib/node_modules'" etc.
[21:54:25] Aharoni try using nvm. It might be a better fit
[21:54:33] For the tools environment which is pretty restricted
[21:57:56] aharoni in general if you want to use newer software than what is supported you are a little on your own. We can point you to ways of accomplishing what you want but not much over that.
[21:58:37] We would like to support this use case better but we don't have the resources to ATM. Sorry!
[21:59:57] yuvipanda: not a big deal :) it's all very experimental
[22:01:23] Aharoni cool.
I think nvm should work for you but might need help from node experts.
[22:11:16] aharoni: What I did was separately download new node and shim it into my path
[22:11:39] trying to use nvm and have it work on the grid and webgrid was a nightmare
[22:11:58] (hence why I opened the ticket for a new k8s image)
[22:12:07] let me find the webscript I have...
[22:15:14] aharoni: https://phabricator.wikimedia.org/P5013
[22:16:46] I'm off to bed soon but ping me if you want some advice
[22:16:54] tarrow: thanks, I'll try. going to bed, too :)
[22:17:11] there's always tomorrow :)
[23:11:39] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3069402 (MZMcBride)
[23:16:14] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3069452 (chasemp) leaked 3 more now leaving to debug tomorrow > PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova...
[23:28:44] Labs, labs-sprint-116, labs-sprint-117, labs-sprint-118, and 5 others: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#3069491 (EBernhardson) Open>Resolved a:EBernhardson This was a proof of concept, that proof is completed. We know that...