[00:01:55] Labs, Labs-Infrastructure, Discovery, Maps, and 2 others: Update coastline data in OSM postgres db (osmdb.eqiad.wmnet) - https://phabricator.wikimedia.org/T140296#2460301 (MaxSem)
[02:48:42] Labs, Patch-For-Review: Allocate vlan and IPs for labtest VMs - https://phabricator.wikimedia.org/T123817#1938645 (AlexMonk-WMF) So the private range has already been reserved for codfw labs, and labtest instances are using those. The public IPs have been reserved, though I imagine they'd need some speci...
[02:56:56] Labs, Tool-Labs, Community-Tech-Tool-Labs, Tracking: Missing Toolserver features in Tools (tracking) - https://phabricator.wikimedia.org/T60791#2460517 (Quiddity)
[03:04:36] Labs, Tool-Labs, Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#2460535 (Quiddity)
[04:09:26] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2460744 (Quiddity)
[04:16:49] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2460767 (Quiddity)
[04:18:13] Tool-Labs-tools-Other, Tracking: merl tools (tracking) - https://phabricator.wikimedia.org/T69556#2460771 (Quiddity)
[05:49:38] hi - I get a 502 Bad Gateway using https://tools.wmflabs.org/wp-world/marks.php...
[06:10:10] Labs, Beta-Cluster-Infrastructure, MediaWiki-General-or-Unknown, Operations: Create a poolcounter instance in deployment-prep - https://phabricator.wikimedia.org/T112501#2460920 (greg)
[10:33:54] Labs-Kubernetes, Patch-For-Review: Can't start k8s webservice for tool "admin-beta" - https://phabricator.wikimedia.org/T140303#2461505 (yuvipanda) Open>Resolved a:yuvipanda Done! admin-beta now runs on k8s
[10:38:20] Labs, Labs-Kubernetes, Tool-Labs: Disable service accounts - https://phabricator.wikimedia.org/T140347#2461508 (yuvipanda)
[12:24:11] hey labs people, I have an instance that I know is not supported anymore, but it's doing something really weird: there are lots and lots of "nc" processes
[12:24:15] nc as in netcat
[12:24:42] they're all just running at the same time. This is limn1, one of the last self-hosted puppetmasters and puppet has been disabled on it for a while
[12:25:13] just checking with yall, I'm about to kill all those processes, but just in case one of you is running them for some reason
[12:28:50] milimetric: what are they doing?
[12:29:06] zhuyifei1999_: no idea, I just killed them all and nothing bad happened
[12:29:10] (just wondering)
[12:29:55] they were just all taking 2% of the CPU using up the whole box: https://tools.wmflabs.org/nagf/?project=analytics#h_limn1
[12:33:48] milimetric: https://graphite.wmflabs.org/render/?title=limn1+Load+last+day&width=800&height=250&from=-1day&hideLegend=false&uniqueLegend=true&target=alias%28color%28stacked%28analytics.limn1.loadavg.01%29%2C%22%23bbbbbb%22%29%2C%221-min%22%29&target=alias%28color%28analytics.limn1.loadavg.processes_running%2C%22%232030f4%22%29%2C%22Procs%22%29
[12:33:58] the load is about 500
[12:34:20] looks insane
[12:34:28] that means 500 procs, right?
[12:34:38] no
[12:35:03] load = 500 means you need 500 cpus to make it not overloaded iirc
[12:35:37] lol, ok
[12:35:53] thx, no worries though, problem seems solved for now, I'll investigate more if it recurs
[12:38:29] try strace it if it recurs, most of the cpu usage seems kernel usage, so there must be tons of kernel calls
[13:14:19] Labs, Tool-Labs, DBA, Tracking: Certain tools users create multiple long running queries that take all memory from labsdb hosts, slowing it down and potentially crashing (tracking) - https://phabricator.wikimedia.org/T119601#2461836 (jcrespo)
[13:14:23] Labs, Tool-Labs, DBA, Operations, Traffic: Antigng-bot improper non-api http requests - https://phabricator.wikimedia.org/T137707#2461834 (jcrespo) Open>Resolved a:jcrespo
[15:02:19] Labs, Tool-Labs, Community-Tech-Tool-Labs: Add publicly-editable tag system to http://tools.wmflabs.org/?list - https://phabricator.wikimedia.org/T139991#2462288 (bd808) a:bd808>None
[15:03:05] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Split OGE grid status data collection out of admin tool - https://phabricator.wikimedia.org/T140251#2462293 (bd808) a:bd808
[15:03:40] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2462298 (bd808) a:bd808
[15:05:18] yuvipanda: wasn't the cluster that delivered graphite and grafana, usage data going to be upgraded. I'm not getting any data from it now.
[15:06:50] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Split OGE grid status data collection out of admin tool - https://phabricator.wikimedia.org/T140251#2462324 (bd808) Working on this as the https://tools.wmflabs.org/gridengine-status tool with source code at {R1921}. I started as a human read...
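The load-average exchange above, as a sketch: on Linux the load average is an exponentially damped average (over roughly 1, 5 and 15 minutes) of tasks that are runnable or in uninterruptible sleep, so it is not literally a number of CPUs needed, but dividing it by the CPU count gives a rough overload ratio. This minimal Python example assumes a Linux host exposing `/proc/loadavg`; `parse_loadavg` and `load_per_cpu` are illustrative names, not existing Labs tooling.

```python
import os
from typing import Optional, Tuple

# Illustrative helpers (hypothetical names); only the /proc read is Linux-specific.


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_per_cpu(load: float, cpus: Optional[int] = None) -> float:
    """Divide a load average by the CPU count (defaults to this host's count).

    A ratio well above 1.0 means more tasks want to run than there are CPUs.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    return load / cpus


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        one_min, _, _ = parse_loadavg(f.read())
    print(f"1-min load {one_min:.2f}, per CPU {load_per_cpu(one_min):.2f}")
```

For example, if limn1 had 4 CPUs (an assumption; the log does not say), a 1-minute load of 500 would give `load_per_cpu(500.0, 4) == 125.0`.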
[15:08:18] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Modernize the admin tool's codebase - https://phabricator.wikimedia.org/T140254#2458217 (bd808) Working on this as the https://tools.wmflabs.org/admin-beta/ tool with source code at {R1922}.
[15:09:30] Labs, Labs-Infrastructure: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462332 (Andrew) a:Andrew
[15:14:57] CP678|Laptop: yuvi is off today, but -- https://grafana.wikimedia.org/dashboard/db/labs-project-board -- looks like it has up to date data on the cyberbot project to me. Note that occasionally the javascript that grafana uses will flake out and leave a graph empty. Hitting the reload button (circular arrows) in the top right corner once or twice usually fixes this.
[15:15:51] Similar data is available at https://tools.wmflabs.org/nagf/?project=cyberbot too
[15:28:54] Change on www.mediawiki.org a page OAuth/For Developers was modified, changed by NKohli (WMF) link https://www.mediawiki.org/w/index.php?diff=2187955 edit summary: Reveal secret about /authenticate instead of /authorize
[15:32:57] bd808: before yuvipanda did something with it roughly a month ago, it never flaked out.
[15:33:08] I want that kind of operation again.
[15:33:42] CP678|Laptop: {{cn}} We only started messing with it because the physical server was dying under the load of collecting data
[15:34:13] so "never" may mean "you had not noticed"
[15:35:12] graphite is a touchy beast. even our prod servers for it are always on the ragged edge of dying
[15:36:17] it is completely possible to make it a very reliable service, but that takes more hardware and humans than Wikimedia can afford to devote to it honestly
[15:38:23] I've had nearly unlimited money and lots of humans before and it was still rigged up and constant tending :)
[15:38:51] but yeah yuvi's work had little visible impact for users it was focused around performance for teh system itself which is loads better
[15:39:03] so he really holds none of the blame if it's messy
[15:39:09] or no more so than all of the rest of us
[15:54:37] bd808: no I mean never.
[15:54:57] I could stare at for a whole half-hour without a single flake.
[15:55:15] Now there's always one graph missing during a refresh.
[15:56:44] that sounds more like a grafana bug than a problem with the labs graphite server, but I guess you'd need to look at the js errors and ajax requests in your client to decide for sure.
[15:58:02] yes
[15:59:14] labs-project-board makes a metric ton of parallel requests to graphite.wmflabs.org :/
[16:01:54] is that a VM?
[16:03:53] chasemp: maybe? its fronted by novaproxy-01 which actually might be part of the problem
[16:28:20] Labs, Patch-For-Review: Allocate vlan and IPs for labtest VMs - https://phabricator.wikimedia.org/T123817#2462551 (AlexMonk-WMF)
[17:08:39] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462739 (Andrew) I merged a patch thinking it would lower the default for new projects but, surprisingly, it lowered the quotas for all projects that had defaults previously...
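On the dashboard that "makes a metric ton of parallel requests" to graphite.wmflabs.org: Graphite's render API accepts the `target` parameter more than once, so several series can be fetched in a single HTTP request. The sketch below only illustrates that capability under the assumption that a client could batch its queries; `graphite_render_url` is a hypothetical helper, and the metric names are taken from the limn1 render URL pasted earlier in the log.

```python
from typing import Iterable, Mapping
from urllib.parse import urlencode

# Hypothetical helper; metric names below come from the limn1 render URL in the log.


def graphite_render_url(base: str, targets: Iterable[str],
                        params: Mapping[str, str]) -> str:
    """Build a single /render/ URL that requests several series at once.

    Each target becomes its own repeated `target=` query parameter.
    """
    query = list(params.items()) + [("target", t) for t in targets]
    return f"{base.rstrip('/')}/render/?{urlencode(query)}"


url = graphite_render_url(
    "https://graphite.wmflabs.org",
    ["analytics.limn1.loadavg.01", "analytics.limn1.loadavg.processes_running"],
    {"from": "-1day", "format": "json"},
)
print(url)
```

A dict is used for the extra parameters because `from`, one of the render API's own parameter names, is a reserved word in Python and cannot be passed as a keyword argument.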
[17:11:21] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462761 (Andrew) {F4274459}
[17:12:48] getAllEmbedings ($Templ)
[17:12:53] oops
[17:12:56] http://www.commitstrip.com/en/2015/11/10/coder-epitaphs/
[17:13:42] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462768 (Andrew) This is all done now, except for (optional) pestering of admins for the above projects.
[17:15:15] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462788 (chasemp) >>! In T140158#2462768, @Andrew wrote: > This is all done now, except for (optional) pestering of admins for the above projects. I think we should break t...
[17:16:05] yay, now 3 of four of my statistic things at horizon are red
[17:16:13] * Luke081515 doesn't like red circels
[17:22:04] Labs: Review resource usage for projects with quotas over the default. - https://phabricator.wikimedia.org/T140381#2462860 (Andrew)
[17:22:18] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462872 (Andrew) Open>Resolved OK, followup is now in T140381
[17:24:12] andrewbogott: how do you want that? ^ brngt project admins to comment on that, or make subtask or...?
[17:24:52] Luke081515: sorry, what's the question?
[17:25:04] What is the 'that' in how do you want that?
[17:25:12] andrewbogott: concerning T140381
[17:25:12] T140381: Review resource usage for projects with quotas over the default. - https://phabricator.wikimedia.org/T140381
[17:25:59] Luke081515: probably we'll start a wiki page about it, but for now that bug is just a note to follow up post-purge
[17:26:09] hm, ok
[17:26:35] Luke081515: note that I was careful to ensure that no project is currently in quota violation
[17:26:39] So there shouldn't be anything pressing
[17:26:45] at least for my project, I think I can confirm that I need the actual quota, I deleted all uneeded ressources some days ago
[17:26:53] atm it's just a bookkeeping change/a way to notice future growth
[17:27:06] Luke081515: that's fine, feel free to add a note about your project on that ticket if you like
[17:27:37] andrewbogott: yeah, ok. Actually I just don't like big red things at horizon ;). My first is always that that is something bad, not normal
[17:27:39] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2462970 (chasemp) Resolved>Open small reopen window here as I asked @andrew if he could put the details on non-default allocations. the above means we have essentia...
[17:27:51] In this case it just means 'full' not 'danger'
[17:27:56] Agreed it shouldn't really show as red
[17:28:19] maybe orange would be ok
[17:28:32] I always assosiate red with an error or alert first :-/
[17:29:36] Labs: Review resource usage for projects with quotas over the default. - https://phabricator.wikimedia.org/T140381#2462995 (Luke081515) From IRC: 19:26 < Luke081515> at least for my project, I think I can confirm that I need the actual quota, I deleted all uneeded ressources some days ago 19:27 < andrewb...
[17:29:59] Luke081515: if you don't say what your project is in the comment, it doesn't really help me :)
[17:30:43] oh :D
[17:30:54] luckily I can edit comments ;)
[17:31:24] andrewbogott: added now
[17:34:42] Labs: Review resource usage for projects with quotas over the default. - https://phabricator.wikimedia.org/T140381#2463069 (Andrew)
[17:35:45] Labs, Labs-Infrastructure, Patch-For-Review: Shrink default quota for labs projects - https://phabricator.wikimedia.org/T140158#2463073 (Andrew) Open>Resolved I updated T140381 with every usage above default quotas.
[17:51:33] Labs, Horizon: Allow users to edit proxies - https://phabricator.wikimedia.org/T140391#2463148 (mobrovac)
[18:13:55] andrewbogott: hi there
[18:14:17] how can i include a puppet role in a private manifest ?
[18:18:13] Is revision text accessible from labs, without API calls to production wikipedia?
[18:18:26] maybe in dumps ragesoss
[18:18:41] think there's a mount for dumps
[18:26:48] ragesoss: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database
[18:27:19] chasemp: is instance spawning still restricted ?
[18:28:40] Yes there has been a reaource driven freeze for a week or so. We are now talking about how/when to revert it matanya
[18:28:48] resource even
[18:29:31] thanks chasemp i'll think of a solution to my problems
[18:30:03] what about my question about calling roles from operations puppet in my private puppet manifest ?
[18:35:50] Never saw and I dont think I understand the question
[18:36:16] seems reasonable but ops/puppet wont behave as a stable upstream
[18:36:23] buyer beware?
[19:37:58] yuvipanda: http://x-team.com/2016/07/introduction-kubernetes-architecture/ - Quite a pleaseant read
[19:39:22] chasemp: to clarify: https://github.com/Toollabs/video2commons/blob/0b5ce84f59444a2bf1d42cb2eba78e9c77a4ad39/puppet/backend.pp
[19:39:36] this includes puppet code running on a labs instance ^]
[19:40:18] I would like to include the defined type from operations puppet, such as git::clone, or include some classes
[19:40:29] sure but what's the question?
[19:40:36] whehter that's allowed or a good idea or ?
[19:40:43] the question is: is that possible, or i need to right it again on my own ?
[19:40:58] *write
[19:41:03] you can have multiple module collections
[19:41:07] yuvipanda: Though I found the way it thinks of services/pods somewhat confusing, different from what I thought so far.
[19:41:22] bd808: can you elaborate please ?
[19:41:33] matanya: so in theory you could have ops/puppet.git cloned and include it in your puppet configuration
[19:41:55] matanya: https://docs.puppet.com/puppet/latest/reference/dirs_modulepath.html
[19:42:11] why in theory ?
[19:42:17] bad practice ?
[19:42:23] because I haven't done it in practice :)
[19:42:52] it's going to be confusing if you count on a consistent interface or really any sanity or consistency at all
[19:42:59] we tend to refactor on the fly and it can be deep or shallow
[19:43:02] options or context
[19:43:20] so while I would grab a defined type or a lib from puppet and stash it in my own project in an ugly way sure
[19:43:30] it's all buyer beware
[19:43:31] matanya: are you not using the ops/puppet.git already on these labs hosts?
[19:43:43] I do, that is why i asked
[19:43:54] it calles all the tools/ mocules
[19:43:58] *modules
[19:44:25] and i don't think ops would won't my stuff on the prod git repo (i.e.ops/puppet)
[19:44:29] why not move your custom module into ops/puppet and manage via cherry-picks when needed like we do with beta cluster?
[19:44:52] I don't want to mess with it too much
[19:45:00] it is a live serving tool
[19:45:07] with quite a lot of traffic
[19:45:33] all the more reason to have it's puppet in the main repo I think
[19:46:08] yes, but who will look after it from ops? i don't think there is any capacity within ops for merging it
[19:46:58] matanya: don't tell me you don't have people who owe you code review ;)
[19:47:13] well, i do :) quite a few
[19:47:17] if it is well factored then it shouldn't be a problem
[19:47:30] i will give it a shot
[19:47:42] thanks for the encouragment
[19:47:43] at worst there are weekly puppet swat windows now
[19:47:53] yes, i am aware
[19:48:13] if you run a project local puppetmaster you can use cherry-picks from gerrit to keep yourself unblocked
[19:48:19] it's not too tricky
[19:48:31] but breaks often
[19:48:35] i tried that path
[19:48:43] not in my experience
[19:48:49] but I guess that may vary
[19:49:10] if you use pip or other dev, bleeding edge stuff than ...
[19:49:14] *then
[19:49:28] if I'm you for non-wmf oriented advice and I want my thing to be simple
[19:49:40] I would have multiple paths to module dirs with one of htem being the ops/puppet repo
[19:49:47] and I would update that as needed for yourself
[19:49:53] and keep it simple if you can
[19:50:08] instead of trying to push it to main ops repo ?
[19:50:26] he needs all the ops/puppet stuff too though or labs changes will break everything. This is the puppetception problem that we had with labs-vagrant
[19:50:37] yes, that
[19:50:49] well yeah you can't use it to manage things that will conflict :)
[19:50:57] so then it depends on why want to do it
[19:50:58] I certainly wouldn't use ops/puppet outside of WMF networks. that's madness
[19:51:02] and it's entirely specific use case dependent
[19:51:12] it is all on labs
[19:51:24] that's not a use case
[19:51:34] I mean, do you want to use real::server_ip defined type and that's it
[19:51:41] that was a comment to bd808
[19:52:00] sure my comment was somewhat general :)
[19:52:21] it would be nice to *really* solve this problem (easy puppet for projects)
[19:52:26] yes
[19:52:31] it's on my top 10 most wanted list
[19:52:34] but I don't have the time or rights to really do much about it
[19:52:37] I need like 10 defined types from the ops repo, and all the general good stuff from the labs modules
[19:52:38] puppet environments per project
[19:52:42] or some logical equiv
[19:52:58] matanya: that's pretty deep integration then bd808 is probably right
[19:53:03] you'll get pain from either side
[19:53:09] so it's up to you what's most painful
[19:53:14] no pain, no gain
[19:53:47] I start playing around and see what kills me
[19:54:18] I've played with https://github.com/voxpupuli/librarian-puppet a bit on my laptop
[19:54:30] which is kind of like bundler for puppet manifests
[19:56:43] matanya: the way to do pip that looks like it is going to get traction in production is using wheels. Right now ORES is doing that via git repo that hosts the wheels it needs
[19:57:12] I thought of using python-pip from the apt repo
[19:57:27] I <3 wheels
[19:57:35] Super easy to build and maintain
[19:57:42] I'm happy we made that investment
[19:57:52] pip makes wheels very easy to build and install
[19:58:13] pip wheel --no-deps the-package-i-want
[19:58:15] Done
[19:58:29] (of course, mind your arch if it's not a pure python package)
[19:58:40] But pip will tell you when you do that wrong too :)
[20:07:14] yuvipanda: around?
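On the "mind your arch" caveat about `pip wheel --no-deps`: under the wheel naming convention (PEP 427) the last three dash-separated fields of a wheel filename are the Python, ABI and platform tags, and a pure-Python wheel carries the platform tag `any`. The sketch below is a simplified parser for checking that before copying a wheel to another host; `parse_wheel_filename` and `is_pure_python` here are hypothetical helpers and the filenames in the example are made up. For real use, the `packaging` library's own `parse_wheel_filename` is the more thorough choice.

```python
from typing import NamedTuple, Optional

# Simplified, hypothetical helpers: no validation of the individual fields.


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, version, build, python, abi, platform)


def is_pure_python(filename: str) -> bool:
    """A platform tag of 'any' means the wheel is not tied to an architecture."""
    return parse_wheel_filename(filename).platform == "any"
```

For example, `is_pure_python("example_pkg-1.0-py3-none-any.whl")` is true, while a wheel tagged `cp35-cp35m-manylinux1_x86_64` is only installable on a matching x86_64 Linux CPython 3.5.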
[20:07:47] think I'm having some issues with Kubernetes
[20:09:23] musikanimal: he's off today, you can ask in -labs
[20:09:48] yuvipanda should not be allowed to take time off :)
[20:10:15] heh for some reason I thought we were in another room so you already did ask in -labs :)
[20:10:16] no biggie, I have a workarond
[20:10:20] best to put it out there and see if anyone knows
[20:10:21] ok
[20:10:27] haha np, thanks
[20:11:08] I've found restarting the kubernetes webservice is a bit buggy sometimes
[20:11:34] I have to kill it entirely, wait a few seconds, then start
[20:12:00] hm some part of it isn't synchronous? I don't really know musikanimal but it's worth a ticket if you can recreate it
[20:12:20] doesn't happen every time, but he did mention something about race conditions
[20:12:34] hence why waiting a few seconds seems to do the trick
[20:12:54] hmmm.. that seems worth a bug report
[20:13:17] I will play around and see if I can reliably reproduce it
[20:13:21] there was a bug where it always waited $TIMEOUT seconds due to a flipped status check
[20:13:27] but that should be fixed not
[20:13:30] *now
[20:13:52] that timeout may have been masking another problem though
[20:17:14] I think the issue may resolve itself somehow. If I wait a while and do `webservice restart`, it appears to finish, I'll refresh the page, it hangs then I get a 502
[20:17:37] more `webservice restart`s don't work, but if I `stop` then `start` it works
[20:17:59] then shortly thereafter `webservice restart` doesn't bring it down, and restarts very quickly
[20:21:00] musikanimal: weird. "restart" is just a short cut for "stop && start"
[20:21:14] no hidden magic there
[20:21:32] I guess the difference is that I waited a bit in between
[20:21:51] but anyway subsequent restarts seem to work fine
[20:22:26] This first happened with /pageviews-test yesterday, I did a stop/start and then restarts worked after that. Today the same thing happened, the first restart brought the tool down
[20:23:04] and it didn't give you an error message at the cli when it failed?
[20:23:11] it seems like the restart vs stop/start is a red herring, it's just different state post stop/start regardless of mechanism
[20:23:44] there is a "# FIXME: Treat pending state differently" in start() that might be somehow related
[20:24:15] I didn't see any errors, no
[20:25:14] oh... i see a way it could go badly
[20:25:22] stop() waits max 15s and then returns
[20:26:06] Yay i got graphite and grafana running on labs
[20:26:08] http://gerrit-grap.wmflabs.org/login
[20:26:13] if the pod isn't shutting down then I think it's a bit undefined what start() will actually do
[20:26:15] http://gerrit-graph.wmflabs.org/
[20:26:21] same thing happened with /pageviews just now, the issue seems consistent
[20:26:43] musikanimal: write it up please :)
[20:26:48] sure thing
[20:34:00] Change on www.mediawiki.org a page OAuth was modified, changed by Kaldari link https://www.mediawiki.org/w/index.php?diff=2188088 edit summary: [[OAuth/For Developers]]
[20:40:17] Labs-Kubernetes: Apparent issue with restarting Kubernetes webservice - https://phabricator.wikimedia.org/T140415#2463927 (MusikAnimal)
[20:46:46] chasemp: is it acceptable to ask to backport packages from stretch to apt.wikimedia.org ?
[20:47:08] matanya: tbh I can't recall what our backports policy is
[20:47:44] that is a moritzm question chasemp ?
[20:48:17] matanya: sure, but depends on the package
[20:48:31] python packages
[20:48:43] moritzm: e.g https://packages.debian.org/stretch/python-guess-language
[21:04:13] bearloga: Is the discovery-stats project still used for something?
[21:04:59] bd808: Hi! I'm helping ellery debug a flask service he's trying to run on tools. It's python3 and we're doing webservice uwsgi-plain start
[21:05:25] but it doesn't seem to be picking it up - if i stop it, and check status - it claims that it's still running
[21:05:30] the tool is wiki-talk
[21:05:58] Tool-Labs-tools-Erwin's-tools: Unknown Error/MySQL errors - https://phabricator.wikimedia.org/T140421#2464064 (Supernino)
[21:06:02] the public url won't resolve, and hitting it doesn't show up in access logs either
[21:06:10] error logs also don't seem to be updating
[21:06:14] madhuvishy: and you think I know what's going on? ;)
[21:06:18] ha ha
[21:06:20] error logs are sloooow
[21:06:23] nfs and all that
[21:06:37] was wondering if you had encountered this before :)
[21:07:04] bd808: and stopped webservices can keep claiming they are running? :)
[21:07:13] let me look at the tool for a second
[21:07:18] thanks
[21:07:47] i think a bunch of things were tested in the past so it has multiple copies of things - let me know if something is confusing
[21:08:07] the uwsgi.ini file looks right to me though
[21:08:29] the gird job is stuck in dr state
[21:08:50] which I think means it didn't shutdown nicely
[21:08:52] aah
[21:09:00] qstat shows that
[21:09:19] what is dr?
[21:09:34] Jeff_Green: Are you still active with the Labs 'fundraising' project? It looks to be abandoned.
[21:12:01] madhuvishy: not ignoring. trying to find description of state info for OGE
[21:12:25] bd808: no problem :) I'm looking too
[21:12:33] "d" is for deletion, r is .. something
[21:12:53] oh r is running
[21:12:57] yeah
[21:13:00] deleted, running
[21:13:04] i dont even
[21:13:23] http://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html -- look for "the status of the job"
[21:13:50] so "d" means that qdel asked for it to die, but "r" means it hasn't yet
[21:14:09] right
[21:14:09] ah you deleted me and I"m still running
[21:14:56] madhuvishy: use your sudo super power to `qdel --force 8670884`
[21:15:07] bd808 I created a tool on labs called readmore last week, but I cannot create files. Is there a way to reset the permission?
[21:15:33] ewulczyn_______: what are you trying that means 'cannot create files'?
[21:15:39] as you or as the bot user?
[21:15:41] chasemp:
[21:15:50] the permissions for the tool are set to root
[21:15:55] and not the tool user
[21:16:07] Yuvi mentioned some race condition and this happening sometimes
[21:16:11] become readmore; mkdir test: mkdir: cannot create directory ‘test’: Permission denied
[21:16:13] drwxr-xr-x 3 root root 4096 Jul 9 22:51 /data/project/readmore/
[21:16:24] interesting
[21:16:25] yup
[21:16:57] should be more like "drwxrwsr-x 7 tools.bd808-test tools.bd808-test 4096 Jun 30 17:12 /data/project/bd808-test"
[21:17:13] madhuvishy: did it actually stop now?
[21:17:29] state is still dr
[21:17:48] did you try the --force qdel?
[21:17:48] nope
[21:17:52] yeah
[21:18:04] that is what i did
[21:18:20] ewulczyn_______: try again
[21:18:49] chasemp: its fixed, thank you!
[21:18:49] madhuvishy: worked when I did it!
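To decode Grid Engine job states like the `dr` discussed above, here is a small lookup built from the "status of the job" list in the qstat(1) man page linked in the log. It is a sketch: `decode_state` and `stuck_in_deletion` are hypothetical helpers, and the `q` entry is an assumption based on the common `qw` (queued, waiting) state rather than a quote from that list.

```python
from typing import List

# Single-letter job states from the qstat(1) "status of the job" entry.
# The man page lists both "s" and "S" as suspended.
# "q" is an assumption (from the common "qw" state), not quoted from that entry.
QSTAT_STATES = {
    "d": "deletion",
    "E": "error",
    "h": "hold",
    "r": "running",
    "R": "restarted",
    "s": "suspended",
    "S": "suspended",
    "t": "transferring",
    "T": "threshold",
    "w": "waiting",
    "q": "queued",
}


def decode_state(state: str) -> List[str]:
    """Expand a qstat state string such as 'dr' into readable words."""
    return [QSTAT_STATES.get(letter, f"unknown({letter})") for letter in state]


def stuck_in_deletion(state: str) -> bool:
    """True when qdel was requested ('d') but the job is still running ('r')."""
    return "d" in state and "r" in state
```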
[21:18:51] I have never seen this happen though so I'm really curious what the race is
[21:19:04] bd808: may be i don't have superpowers
[21:19:27] `sudo qdel --force 8670884` -- warning: root forced the deletion of job 8670884
[21:20:00] bd808: yeah i can't sudo
[21:20:02] cool
[21:20:07] it died now
[21:20:18] but i couldn't kill it
[21:20:30] madhuvishy: you should talk to your boss about getting added to the right perm groups :)
[21:20:37] :D
[21:21:01] technically I start on 18th so I haven't started asking yet ;)
[21:22:43] logs without timestamps are pretty useless :/
[21:22:47] it's running now - but isn't really loading
[21:22:49] * bd808 glares at uwsgi.log
[21:22:54] and logs aren't updating
[21:23:24] yeah i was looking at access.log
[21:23:29] since it has timestamps
[21:23:35] but last ones are in feb
[21:25:12] !log change perms for tools.readmore to correct bot
[21:25:13] change is not a valid project.
[21:25:16] !log tools change perms for tools.readmore to correct bot
[21:25:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[21:28:31] madhuvishy: hmmm... it looks like the default route should be pretty quick. there must be other things wrong
[21:29:02] Labs, Operations: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (chasemp)
[21:29:10] bd808: yeah - but it would at least show up on access logs?
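The webservice restart trouble and the job stuck in `dr` above both involve starting a new instance before the old one is really gone (bd808 notes that stop() only waits up to 15s before returning). A hedged sketch of a safer pattern: stop, poll until the old instance has actually exited, and refuse to start if it has not. `stop`, `start` and `is_running` are hypothetical callables standing in for whatever a real tool would use (for example a wrapper around qstat); this is not the webservice script's code.

```python
import time
from typing import Callable


def restart_when_stopped(
    stop: Callable[[], None],
    start: Callable[[], None],
    is_running: Callable[[], bool],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
) -> None:
    """Stop a service, wait until the old instance has really exited, then start.

    Raises TimeoutError (and does not call start) if the old instance is
    still reported as running after `timeout` seconds.
    """
    stop()
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            raise TimeoutError("old instance still running; not starting a new one")
        time.sleep(poll_interval)
    start()
```

Failing loudly on timeout is the design choice here: returning silently after a fixed wait is what leaves start() in the undefined state described above.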
[21:29:43] I'm not certain that uwsgi-plain actually runs through lighttpd
[21:29:48] Labs, Operations: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464190 (chasemp) p:Triage>Normal
[21:29:58] * bd808 hasn't messed in python land for web tools much
[21:30:04] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mholloway was created, changed by Mholloway link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Mholloway edit summary: Created page with "{{Tools Access Request |Justification=Creating and maintaining tools to support the WMF's Wikipedia mobile applications. |Completed=false |User Name=Mholloway }}"
[21:30:19] bd808: hmmm, okay - i'll see if i can find out
[21:30:28] Labs, Operations, Ops-Access-Requests: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (chasemp)
[21:30:44] there are a lot of old log files in that too
[21:30:46] *tool
[21:30:49] yeah
[21:30:56] let me get rid of them all and see
[21:31:27] you'll need to restart the service too. oge doesn't notice when the inode disappears
[21:31:44] andrewbogott: I occasionally use it. but I can shut that instance down and only turn it on when I needed it. sorry I haven't!
[21:31:47] that leads to the .nfs000000000* files
[21:32:02] bearloga: are you subscribed to labs-l?
[21:32:47] bearloga: if not, please subscribe now. https://lists.wikimedia.org/mailman/listinfo/labs-l
[21:32:58] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mholloway was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=751038 edit summary:
[21:33:08] andrewbogott: will do!
[21:33:33] bearloga: also, please visit https://wikitech.wikimedia.org/wiki/Purge_2016 and follow the instructions at the top
[21:35:56] andrewbogott: was that something I created? I don't even remember!
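Why deleting the old log files is not enough without a restart: on a local filesystem, removing a file that a running process still has open only removes its name, and the process keeps writing to the nameless inode, so a fresh log never appears at that path. On NFS the client instead renames a still-open, deleted file to a hidden `.nfsXXXX` name (the "silly rename"), which is where the `.nfs000000000*` files mentioned above come from. The sketch below demonstrates only the local-filesystem behaviour, inside a temporary directory; `write_after_unlink` is a hypothetical name, and the demo assumes a POSIX system (deleting an open file fails on Windows).

```python
import os
import tempfile
from typing import Tuple


def write_after_unlink(directory: str) -> Tuple[bool, int, int]:
    """Open a log, delete its path, keep writing, and report what happened.

    Returns (path_exists_after_unlink, chars_written_after_unlink, inode_size).
    """
    path = os.path.join(directory, "access.log")  # hypothetical log name
    with open(path, "w") as log:
        log.write("before delete\n")
        log.flush()
        os.unlink(path)  # like `rm access.log` while the server still runs
        written = log.write("after delete\n")  # still succeeds: inode is alive
        log.flush()
        size = os.fstat(log.fileno()).st_size  # data went to the orphaned inode
        exists = os.path.exists(path)
    return exists, written, size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_after_unlink(tmp))
```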
[21:36:28] andrewbogott: if i'm the only user you can remove it
[21:36:35] Jeff_Green: admins are you, AndyRussG, awight, Cdentinger
[21:37:04] huh. i guess I would check in with them
[21:37:17] Jeff_Green: shall I remove you as admin then?
[21:37:21] sure
[21:37:45] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Niedzielski was created, changed by Niedzielski link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Niedzielski edit summary: Created page with "{{Tools Access Request |Justification=Wikimedia Android mobile and Content Service development. |Completed=false |User Name=Niedzielski }}"
[21:37:59] Jeff_Green: can you suggest an irc channel where I might find some/all of them?
[21:39:19] andrewbogott: #wikimedia-fundraising
[21:39:33] well, that makes sense :)
[21:50:51] SPF|Cloud: is the wmt project still active? You are its only admin.
[21:51:00] andrewbogott: not really
[21:51:08] Shall I delete it?
[21:51:11] I'm aware of the purge, just don't know what I should do with ti
[21:51:44] can I request it again if I want to work on it again?
[21:52:13] yes, or you can delete your instances and just keep it around as an empty project pending future work.
[21:53:06] I'll delete the instances
[21:53:40] ok. Please sign on the purge page as well so I don't bug you again :)
[21:54:25] sure
[21:54:46] I'm only waiting for wmt-exec to reboot properly, otherwise I can't delete it yes
[21:54:46] yet*
[21:57:19] Wikibugs: Do not notify #Trash tasks to IRC - https://phabricator.wikimedia.org/T140426#2464291 (Danny_B)
[22:03:59] andrewbogott: I'm not able to boot wmt-exec
[22:04:16] well, I mean, I can't SSH into it
[22:04:18] SPF|Cloud: why does booting matter? You're going to delete it, right?
[22:04:32] I want to check if there's any data on it I want to backup first
[22:05:10] southparkfan@wmt-bastion:~$ ssh southparkfan@wmt-exec.wmt.eqiad.wmflabs
[22:05:10] ssh: connect to host wmt-exec.wmt.eqiad.wmflabs port 22: Connection refused
[22:12:29] SPF|Cloud: I can't get it to do anything useful either. It won't update the syslog even
[22:12:52] console doesn't work either?
[22:13:43] getting a console on a VM is not straightforward.
[22:14:05] Hi, how can I scp files into my tool's home directory?
[22:14:29] wmt-apache seems to have crashed too, hard rebooting that
[22:15:29] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Niedzielski was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=751280 edit summary:
[22:17:23] andrewbogott: is this instance unrecoverable now?
[22:19:42] SPF|Cloud: if it's very important I can probably rescue some data on there, if you know what you need. It won't happen today though.
[22:19:55] I don't know if there's anything important on it
[22:19:59] that's the problem
[22:21:20] that doesn't sound very urgent :)
[22:21:36] bd808: could you please use your superpowers and kill the 2 jobs that show up on qstat in the tool 'readmore'
[22:21:44] that's fine
[22:21:56] but that also means I won't delete wmt-exec (yet) ;)
[22:21:57] i have a theory but cannot test if they don't die in the first place
[22:22:07] bastion/apache are now being deleted
[22:23:55] madhuvishy: {{done}}
[22:24:05] bd808: <3 thanks
[22:25:40] bd808: How can I transfer (large) files to my tool dir? I can scp to my home directory, but cannot move the files from there to my tool. I also cannot scp straight to the tool dir.
[22:26:22] ewulczyn_______: you should be able to mv to the tool after the files are on a bastion
[22:26:43] what error are you getting?
[22:26:47] scp /path/to/file username@a:/path/to/destination
[22:26:58] ewulczyn_______ ^^
[22:27:00] bd808: mv: cannot create regular file ‘/data/project/readmore/test.txt’: Permission denied
[22:27:31] so username will be replaced with the user name you use to ssh into instances
[22:27:44] and the a will be replaced with instance name
[22:28:13] and first part /path/to/file will be the file you are wanting to copy over to another instance
[22:28:22] then /path/to/destination is the destination file
[22:28:27] ewulczyn_______: how large are we talking about?
[22:28:28] where you want it stored
[22:28:35] http://unix.stackexchange.com/questions/106480/how-to-copy-files-from-one-machine-to-another-using-ssh
[22:29:05] (PS1) Legoktm: Send #Trash to /dev/null [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/299086 (https://phabricator.wikimedia.org/T140426)
[22:29:19] chasemp: right now, I can't even transfer a small test file. Eventually, the file will be about 2GB.
[22:29:57] (CR) Legoktm: [C: 2] Send #Trash to /dev/null [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/299086 (https://phabricator.wikimedia.org/T140426) (owner: Legoktm)
[22:30:22] paladox: I am able to scp from my local machine to my home dir, but am getting permission errors if I want to scp to my tool dir.
[22:30:25] (Merged) jenkins-bot: Send #Trash to /dev/null [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/299086 (https://phabricator.wikimedia.org/T140426) (owner: Legoktm)
[22:30:37] ewulczyn_______: it's a known thing, what file do you want in your tools dir
[22:30:47] by known thing I mean we don't allow scp directly as the tool now
[22:30:51] ewulczyn_______ oh, try for the destination location /home/usernamehere/
[22:31:05] then once it is ssh over you can mv it around
[22:31:12] since i did that
[22:31:22] chasemp: can I move a file from my homedir to my tool?
[22:31:35] you should be able to yes, what file are you trying?
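paladox's recipe boils down to `scp SOURCE USER@HOST:DEST`, with DEST under your own home directory, because scp straight into a tool directory is refused. Below is a minimal Python sketch of assembling that command. It assumes an OpenSSH `scp` client on the local machine. `build_scp_command` and `copy_to_home` are hypothetical helpers, and the user and host names are placeholders rather than real Tool Labs hosts.

```python
import shlex
import subprocess
from typing import List


def build_scp_command(local_path: str, user: str, host: str, remote_path: str) -> List[str]:
    """Return argv for: scp LOCAL_PATH USER@HOST:REMOTE_PATH.

    USER is the shell name you use to ssh into instances; HOST is the
    instance (e.g. a bastion). REMOTE_PATH should live under /home/USER/,
    since copying directly into a tool directory is not allowed.
    """
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


def copy_to_home(local_path: str, user: str, host: str) -> None:
    # Hypothetical helper: copy into the user's home directory on HOST.
    # check=True raises CalledProcessError if scp exits non-zero.
    cmd = build_scp_command(local_path, user, host, f"/home/{user}/")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values for illustration only; prints the shell command.
    print(shlex.join(build_scp_command(
        "test.txt", "exampleuser", "bastion.example.org", "/home/exampleuser/")))
```

Once the file is in your home directory, the second step from the conversation applies: `mv` it into `/data/project/<tool>/` as a member of the tool's group.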
[22:31:57] Wikibugs, Patch-For-Review: Do not notify #Trash tasks to IRC - https://phabricator.wikimedia.org/T140426#2464533 (Legoktm) Open>Resolved a:Legoktm
[22:32:44] chasemp: mv /home/ewulczyn/test.py /data/project/readmore/
[22:32:55] mv: cannot create regular file ‘/data/project/readmore/test.py’: Permission denied
[22:33:10] "readmore" is my tool
[22:33:18] you need to sudo su
[22:34:06] don't have permissions to do that
[22:34:14] become tool
[22:34:43] Reedy: that isn't working either
[22:34:57] as in, can become tool
[22:35:04] where tool is replaced with the tool name?
[22:35:09] but can't move from user to tool dir as user or tool
[22:35:15] of course
[22:35:35] you could just temporarily give "everyone" read
[22:35:37] become tool
[22:35:48] cp /home/user/file.whatever
[22:36:11] I may know what's up but give me a sec
[22:36:28] actually... Shouldn't the user account be part of the tools group?
[22:36:35] so, should have the rights to move it there?
[22:36:52] uh, isn't this what `take` is for?
[22:36:57] ewulczyn_______: try again to copy as you to the /data/project/readmore
[22:37:13] legoktm: maybe so yeah
[22:37:23] chasemp: works, thanks
[22:37:43] what was happening?
[22:38:00] race condition hosted the ownership and mod values
[22:38:09] take for instance drwxrwsr-x 12 tools.admin tools.admin 4096 Jun 29 13:53 admin
[22:38:22] then see her as was: drwxr-xr-x 10 tools.readmore tools.readmore 4096 Jul 14 22:25 .
[22:38:42] now: drwxrwxr-x 10 tools.readmore tools.readmore 4096 Jul 14 22:25 readmore/
[22:39:02] she was a member of the group but members of the group didn't have write permission :)
[22:40:20] different ls command generated my examples there :)
[22:40:33] right
[22:43:28] oh s/hosted/hosed
[22:44:27] legoktm: so we have a flask app that we are trying to run in the tool - uwsgi picks it up fine, and loads it - but then the tool doesn't load at all
[22:44:49] and it won't die either if you stop it
[22:44:57] (python3 app)
[22:45:13] and nothing in error logs either
[22:45:23] seen anything like this before?
[22:48:15] madhuvishy: is debug mode enabled?
[22:48:29] also, does uwsgi work with python3 now? I remember it didn't before
[22:49:38] https://github.com/legoktm/checker/blob/master/README lets you test stuff so you can bypass the labs webservice stuff to make sure things work
[22:50:17] legoktm: looks like not
[22:52:04] legoktm: not sure, the docs say to use uwsgi-plan
[22:52:06] plain
[22:52:18] and on the uwsgi end, things look fine
[22:52:39] but the job enters some weird state, and then I can't kill it
[22:52:46] making debugging - hard
[22:54:09] chasemp: how do you feel about adding me to tools admins now-ish? :D
[22:56:21] I have to run here ...5 minutes ago but andrewbogott or bd808 if you have a minute to add madhu to tools admins I'm +1ing the idea today
[22:56:59] sure. admin on the tools project is what's needed right?
[22:57:24] and cloudadmin on wikitech?
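chasemp's diagnosis of the readmore permission problem hinges on the group write bit. The directory was `drwxr-xr-x` with group `tools.readmore`, so a group member could list it but could not create files in it. After the fix it was `drwxrwxr-x`. The Python sketch below decodes an `ls -l` mode string into permission bits using the standard `stat` module, then checks whether a non-owner group member may create files in the directory. The helper names are hypothetical. The sketch ignores ACLs and assumes the user is not the directory's owner, since the owner bits would apply instead.

```python
import os
import stat


def parse_ls_mode(mode: str) -> int:
    """Convert an `ls -l` mode string such as 'drwxr-xr-x' to permission bits.

    The leading file-type character is ignored. Handles r/w/x plus the
    setuid/setgid/sticky letters (s, S, t, T) in the execute position.
    """
    if len(mode) != 10:
        raise ValueError(f"expected 10 characters, got {mode!r}")
    # (read, write, execute, special) masks for user, group, other.
    triads = [
        (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR, stat.S_ISUID),
        (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP, stat.S_ISGID),
        (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH, stat.S_ISVTX),
    ]
    bits = 0
    for i, (r, w, x, special) in enumerate(triads):
        chunk = mode[1 + 3 * i : 4 + 3 * i]
        if chunk[0] == "r":
            bits |= r
        if chunk[1] == "w":
            bits |= w
        if chunk[2] in "xst":   # lowercase s/t imply execute is also set
            bits |= x
        if chunk[2] in "sStT":  # setuid/setgid/sticky bit present
            bits |= special
    return bits


def _group_can_create(bits: int) -> bool:
    # Creating a file in a directory needs write plus execute (search).
    return bool(bits & stat.S_IWGRP) and bool(bits & stat.S_IXGRP)


def group_member_can_write(mode: str) -> bool:
    """True if a non-owner member of the directory's group may create files."""
    return _group_can_create(parse_ls_mode(mode))


def path_group_member_can_write(path: str) -> bool:
    # Same check against a real directory via os.stat instead of ls output.
    return _group_can_create(stat.S_IMODE(os.stat(path).st_mode))


if __name__ == "__main__":
    for listing in ("drwxr-xr-x", "drwxrwxr-x", "drwxrwsr-x"):
        print(listing, oct(parse_ls_mode(listing)), group_member_can_write(listing))
```

On the three listings from the conversation, it reports `0o755 False` for the old readmore directory and `0o775 True` after the fix. For tools.admin it reports `0o2775 True`, where the `s` is the setgid bit.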
[22:57:50] first one gets her going today, the second I'm cool w/ but idk if requires some andrew direction on use
[22:58:18] ok thanks gotta run
[22:58:28] yeah just being able to sudo qdel for now would be great
[22:58:30] thanks chasemp :)
[22:58:43] !log tools Added Madhuvishy as projectadmin
[22:58:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:58:58] bd808: :D
[22:58:59] thank you
[22:59:03] madhuvishy: log out of the tools bastion you are on and log back in
[22:59:09] yup
[23:01:13] madhuvishy: more on-wiki rights now too -- https://wikitech.wikimedia.org/w/index.php?title=Special:Log&page=User%3AMadhuvishy
[23:02:05] !log tools.admin Added Madhuvishy as maintainer
[23:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL, Master
[23:02:55] bd808: still asks me for password
[23:03:13] hmmm
[23:03:26] what's your shell name?
[23:05:08] madhuvishy
[23:07:08] Labs, Operations, Ops-Access-Requests, Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (bd808) * [x] [[https://tools.wmflabs.org/sal/log/AVXrojH3gCrwkbTdmhun|Added as admin on Tools project]] * [x] [[https://wikitech.wikimedia.or...
[23:07:54] I don't see anything obviously wrong
[23:10:51] let me check for an explicit sudo policy
[23:11:18] ah that's the trick
[23:12:19] !log tools Added Madhuvishy to project "roots" sudoer list
[23:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[23:12:32] bd808: :) cool
[23:14:44] yess i can sudo now :D
[23:14:50] w00t
[23:15:24] most projects don't have that extra level of config so I forgot about it