[00:01:58] Coren: could be, but only one security group was on the Instance create, and Help:Labs-vagrant doesn't mention configuring any other role. In 2014 I would have rebooted and run labs-vagrant provision over and over... 8-) [00:02:37] this is so close to being unbelievably easy. biab [00:02:48] spagewmf: you need to open port 80 :) [00:03:01] documentation bug; you either wanted to add port 80 to your default group or (better) create a webserver group with port 80 open and assign it at creation. [00:03:36] not being able to add instances to security groups after creation is so confusing [00:03:45] * bd808 whines [00:04:05] s/confusing/stupid/ [00:41:53] 6Labs, 10Tool-Labs, 6operations, 7Monitoring: Add catchall tests for toollabs to catchpoint - https://phabricator.wikimedia.org/T97321#1271185 (10yuvipanda) 5Open>3declined a:3yuvipanda Superseded by T97748 and friends. We're having fine-grained tests there rather than catchall ones. [00:47:49] 6Labs, 10Beta-Cluster: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1271190 (10yuvipanda) NFS is paging a lot now, so I'd highly appreciate it if this can happen sooner rather than later :) [00:48:40] Coren: whois isn't installed anymore..... [00:51:56] 6Labs, 6WMF-Legal: Discussion: can I park WikiSpy under a separate, simpler domain? - https://phabricator.wikimedia.org/T97846#1271192 (10ZhouZ) I have discussed this with @Slaporte and assuming our understanding of the situation is correct (@d33tah will register for a domain name which will be hosted on a thi... [00:52:51] Betacommand: I don't think it ever explicitly was. Open a ticket and we can add it. [00:53:02] Coren: under what? 
[00:53:34] tool-labs [00:55:05] 10Tool-Labs: Whois needs re-installed - https://phabricator.wikimedia.org/T98555#1271194 (10Betacommand) 3NEW a:3coren [00:55:23] Coren: didn't think you could uninstall whois on linux [00:55:41] Betacommand: I think it was part of base on Precise but not in by default on Trusty [00:55:55] Coren: that's my thought too in the ticket [00:56:28] 6Labs: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1271202 (10yuvipanda) 3NEW [00:56:43] still kind of annoying when I am about to work on a backlog, only to find that my tools are busted [00:58:19] Coren: what, a few hours or a few days to get it rolled out? [00:58:36] Prolly tomorrow morning. It's a trivial thingy. [01:03:09] Coren: as I remember there used to be a webserver security group available at creation, but nowadays at instance creation there's only the Security groups [x] checkbox. [02:16:17] Any reason why my project's instance was shutdown and how I can turn it back on? It's utrs-primary [02:16:32] Reboot isn't working [02:32:39] Izhidez: nothing in the channel or wiki logs [02:51:00] 6Labs, 6operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1271404 (10coren) >>! In T96063#1268232, @mark wrote: > With the current stability and performance problems of NFS with RAID6, this is definitely not a "nice to have" but something that needs to... [03:08:46] Betacommand: ya I don't even see anything on the list that could have caused it... so I'm wondering what's going on [03:16:15] Oh lovely. Just got a failure to reboot instance message [03:25:05] Izhidez: hey! I am out atm - can you file a bug and I'll look at it as soon as I'm at a computer? [03:25:38] andrewbogott: ^ another victim of the same thing that afflicted dynamicproxy-gateway I suspect [03:26:07] yuvipanda: sure. Hopefully it goes to the right place. I'll do it as soon as I'm home [03:26:51] Izhidez: cool. 
add the 'labs' project and / or cc me [03:27:46] Izhidez: what project? [03:28:17] UTRS / instance: utrs-primary [03:37:13] Izhidez: I think I see what’s wrong; it will take a few minutes to fix [03:38:25] K thanks for looking andrewbogott [03:39:50] Izhidez: ok… can you connect now? [03:40:45] Give me a few mins, just walking home. On mobole [03:40:54] *mobile [03:59:54] andrewbogott: yes it's back up. thank you. any idea what the issue was? [04:00:45] It was obscure. Your instance is, I presume, very old? [04:01:12] It uses copy-on-write and required a backing image that wasn’t present on the new host because… I didn’t know anyone needed it anymore. [04:03:14] (which is a complicated way of saying: I broke it) [04:06:48] yes our instance is quite old. since tool labs dropped off [04:06:53] err. toolserver [04:11:22] andrewbogott: is it an issue also that the puppet was last run christmas day? [04:17:54] Izhidez: it means that your instance is likely to degrade and fall apart, yes. [04:18:04] But not related to this particular problem today. [04:18:26] eeeep. can we do anything to fix that, or should I file a bug and leave that for tomorrow after rest? [04:19:01] Izhidez: If puppet isn’t running it’s because a user of the instance broke it. [04:19:16] oh lovely [04:19:23] So… I could potentially help you sort it out, but typically it’s your responsibility to maintain your own instances. [04:21:53] i'm all for maintaining my own instance, and I'd love to not bug you guys about issues, I just need a starting point. I don't even know what puppet tbh, except it helps maintain the system. If i've done anything to break it, I don't know what [04:25:02] I think I found a few texts on it. I'll mail the list/ask TParis if he knows anything about it from when he admin'd the instance. 
[04:25:36] if he doesn't know I'll ask labs-l so I don't waste time here with you guys [04:41:35] 6Labs, 10wikitech.wikimedia.org: Re-enable OAuth in Wikitech - https://phabricator.wikimedia.org/T98567#1271586 (10yuvipanda) 3NEW [05:27:42] 6Labs, 10Beta-Cluster, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1271652 (10bd808) The apache2 logs in beta cluster now match the production config. This means that access logs are written to local disk at /var/log/apache2/other_vhosts_access.log on each host. Er... [05:39:15] 10Tool-Labs: xtools-ec has multiple webservices running - https://phabricator.wikimedia.org/T98432#1271675 (10yuvipanda) Now at 216! I'm killing them all now because I guess we'll have them come back soon enough :) [05:46:34] 6Labs: Allow NFS to be enabled / disabled granularly for Labs projects - https://phabricator.wikimedia.org/T98571#1271685 (10yuvipanda) 3NEW [05:47:54] 6Labs: Allow NFS to be enabled / disabled granularly for Labs projects - https://phabricator.wikimedia.org/T98571#1271692 (10yuvipanda) [05:52:57] 6Labs: Allow NFS to be enabled / disabled granularly for Labs projects - https://phabricator.wikimedia.org/T98571#1271694 (10yuvipanda) For backups, maybe offer an extremely bandwidth limited NFS setup? Or actually an Object store would probably be good enough - tar stuff up and put it in there [06:57:57] 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1271727 (10yuvipanda) 3NEW [07:01:38] 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1271734 (10yuvipanda) So it's ok because we distribute the GPG key ourselves, but this should probably be in the local debrepo. 
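For context on the "http debian repo" task above: a third-party apt source like the one under discussion is just a one-line entry plus a locally installed archive signing key. The URL and suite below are illustrative, not the actual repo in use:

```
# /etc/apt/sources.list.d/mariadb.list  (illustrative)
deb http://repo.example.org/mariadb/repo/5.5/ubuntu trusty main
```

Since apt verifies the archive's GPG signature against a key distributed separately, the plain-http transport is not by itself the weak point; that is the argument made in the ticket.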
[07:08:40] 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1271740 (10yuvipanda) p:5Triage>3Lowest [07:37:32] 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1271765 (10MoritzMuehlenhoff) As mentioned on IRC; if we trust the archive key and if mariadb manages their private archive key securely that's as secure as pulling packages from a standard Debian or U... [07:44:47] 6Labs, 6operations: upgrade salt in labs - https://phabricator.wikimedia.org/T98578#1271787 (10ArielGlenn) 3NEW a:3ArielGlenn [08:04:09] 6Labs, 6operations: upgrade salt in labs - https://phabricator.wikimedia.org/T98578#1271817 (10ArielGlenn) [08:05:03] 6Labs, 6operations: upgrade salt in labs - https://phabricator.wikimedia.org/T98578#1271819 (10ArielGlenn) 5Open>3Resolved This has been done. Some instances were skipped, a full list follows. 1) Instances that were shut off at the time: Instance: i-000000fd Status: SHUTOFF hostname: mwreview-merl Insta... [08:05:57] 6Labs, 6operations: upgrade salt in labs - https://phabricator.wikimedia.org/T98578#1271822 (10ArielGlenn) [08:30:07] 6Labs, 10Labs-Infrastructure: Increase RAID6 sync_speed_min to a sensible level - https://phabricator.wikimedia.org/T98456#1271871 (10mark) [08:30:37] 6Labs, 10Labs-Infrastructure, 6operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1271872 (10mark) [08:32:07] 6Labs, 10Labs-Infrastructure, 6operations: Investigate ways of getting off raid6 for labs store - https://phabricator.wikimedia.org/T96063#1207452 (10mark) @Coren: Where can I see the mapping of raid array (md125 etc) to shelf? Is this documented? 
[08:34:04] 6Labs, 10Labs-Infrastructure, 6operations: Migrate Labs NFS storage from RAID6 to RAID10 - https://phabricator.wikimedia.org/T96063#1271874 (10mark) [08:54:26] PROBLEM - Puppet staleness on tools-mailrelay-01 is CRITICAL 100.00% of data above the critical threshold [43200.0] [09:04:10] prolly no one awake in here who cares but: [09:04:43] https://phabricator.wikimedia.org/T98578 this lists instances in labs that did not get the salt update, and why [09:05:03] I'll ping again later when a couple of the sf folks are likely to be around [09:11:25] (03PS1) 10Ricordisamoa: Split CSS into a separate stylesheet [labs/tools/extreg-wos] - 10https://gerrit.wikimedia.org/r/209692 [09:12:29] (03CR) 10Ricordisamoa: "Not sure of whether/how we could use tools-static." [labs/tools/extreg-wos] - 10https://gerrit.wikimedia.org/r/209692 (owner: 10Ricordisamoa) [09:52:55] (03PS1) 10Ricordisamoa: Automatically generate toolinfo.json [labs/tools/extreg-wos] - 10https://gerrit.wikimedia.org/r/209697 [09:53:05] [13intuition] 15ChameleonWiki closed pull request #35: Localisation file for template linking and transclusion check. (06master...06master) 02https://github.com/Krinkle/intuition/pull/35 [09:54:50] Anyone know how I can wget a wmflabs.org URL from the tools labs shell? [09:58:35] magnus__: use http://tools-webproxy/toolname for http://tools.wmflabs.org/toolname instead [10:01:57] liangent: That gives me: Resolving tools-webproxy (tools-webproxy)... failed: Name or service not known. [10:06:51] liangent: Ah, it's "tools-webproxy-01"! [10:17:12] [13intuition] 15ChameleonWiki opened pull request #40: Localisation files for template linking and transclusion check (06master...06ttc_take2) 02https://github.com/Krinkle/intuition/pull/40 [10:17:48] liangent: No, that doesn't really work either - returns the tool index but nothing "below" (php script) [10:25:35] magnus__: at least this one is working for me ... 
curl -v 'http://tools-webproxy-01/citations/doibot.php' [10:53:42] 6Labs: Allow NFS to be enabled / disabled granularly for Labs projects - https://phabricator.wikimedia.org/T98571#1272046 (10coren) IMO an object store and some form of more generalized (and distributed) log collector should be able to cover the vast majority of what is currently (mis)using NFS and should be our... [11:13:00] 10Tool-Labs: Investigate / get rid of http debian repo from toollabs - https://phabricator.wikimedia.org/T98575#1272069 (10coren) I do believe it's mostly moot by now; this was added when MariaDB was new and shiny (relatively speaking) but now that we use it in prod we probably want to use our packages instead.... [11:17:07] 6Labs, 10Labs-Infrastructure, 6operations: Migrate Labs NFS storage from RAID6 to RAID10 - https://phabricator.wikimedia.org/T96063#1272083 (10coren) @mark: It's in the slides (https://commons.wikimedia.org/wiki/File:WMF_Labs_storage_presentation.pdf) but also ridiculously straightforward: shelves are mapped... [11:19:57] 6Labs, 10Labs-Infrastructure, 6operations: Migrate Labs NFS storage from RAID6 to RAID10 - https://phabricator.wikimedia.org/T96063#1272087 (10coren) A note: while it will probably increase the amount of necessary juggling, the entire setup would be //immensely// improved with raid10 if - rather than one she... [11:53:57] 6Labs, 10Labs-Infrastructure, 6operations: Migrate Labs NFS storage from RAID6 to RAID10 - https://phabricator.wikimedia.org/T96063#1272119 (10mark) Can you work out a plan, and list all the individual steps (ideally with command line invocations) on this ticket? 
[12:26:33] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [12:28:18] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 68022 bytes in 0.864 second response time [12:41:34] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [12:44:23] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 767754 bytes in 6.406 second response time [12:50:26] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds [13:11:42] 6Labs, 10Labs-Infrastructure: Instance in SHUTOFF state, not rebooting - https://phabricator.wikimedia.org/T98602#1272222 (10jkroll) 3NEW [13:52:36] Coren: 10,000 rows per query worked successfully. In fact I was executing around two of them per second. I wonder if I could do 100,000 rows per query ;) [13:54:57] The resulting database has nearly 11,000,000 rows! [13:57:10] harej: I think you're reaching for the point of diminishing returns at that point. [13:58:56] Sure. Also, can I make my database readable by others? [14:06:14] harej: just name it something_p [14:06:25] and that will automatically work? [14:24:51] I think so, yes [14:32:01] 10Tool-Labs, 7Regression: https://tools.wmflabs.org/ does not link many tools anymore - https://phabricator.wikimedia.org/T98609#1272374 (10JanZerebecki) 3NEW [15:08:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 55.56% of data above the critical threshold [0.0] [15:29:31] Coren, is there some real-time index that shows how much resources my project takes up compared to others? I want to see if I'm a resource hog or if I'm underestimating the capacity of Tool Labs. [15:29:56] harej: Not as such; at least not in a way that would give you a meaningful comparison. [15:30:54] I'll assume then that I am using a proper amount of resources until you yell at me. [15:31:12] harej: It's more likely one of the DBAs that would do the yelling, but yeah. 
:-) [15:33:36] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [0.0] [15:41:39] Coren: seems like 10,000 rows was the magic number. I tried a conservative increase to 15,000 and *boy* is it slow. [15:42:35] harej: Yeah, that's the way of transactions. You eventually start running into invisible barriers as you run out of cache, buffer or transaction windows. [15:43:00] harej: I'd be a little surprised if you get significant improvement between 1000 and 10000 though. [15:43:08] Did you try the former? [15:43:28] No, I went from 1 to 10,000 and was satisfied [15:44:30] ah so. Coren (early bird gets the work :-P) https://phabricator.wikimedia.org/T98578 this lists instances in labs that did not get the salt update, and why [15:44:54] feel free to pass it on to whoever should see it and fix up their stuff [15:45:24] apergos: Will do. [15:45:30] thank you! [16:10:35] 6Labs, 10Wikimedia-Extension-setup, 10wikitech.wikimedia.org: Re-enable OAuth in Wikitech - https://phabricator.wikimedia.org/T98567#1271586 (10Krenair) [16:26:20] 6Labs, 10Wikimedia-Extension-setup, 10wikitech.wikimedia.org, 5Patch-For-Review: Re-enable OAuth in Wikitech - https://phabricator.wikimedia.org/T98567#1272609 (10Krenair) a:3Krenair Definitely been set up before: ```mysql:wikiadmin@silver [labswiki]> show tables like 'oauth\_%'; +-----------------------... [18:38:18] Coren: so I was doing an audit of error / log files on NFS, and there are about 1.3T of them. 1.1T of that is just one file, error file for wp-world [18:39:12] Coren: all of which is just php notices [18:39:23] Coren: if I delete that file now, will I kill NFS? [18:39:50] Wait, which log? deployment-prep? [18:40:01] Coren: no, tools [18:40:07] Coren: wp-world tool’s php error log is 1.1T [18:40:14] Ow. [18:40:26] and growing [18:40:39] Now may not be the best time though; doing backup so I/O is rather tight. [18:40:54] It /shouldn't/ be too hard, but I'd rather not take a chance. 
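The row-batching exchange above (one row per statement vs. 10,000, with diminishing returns past some chunk size) comes down to amortizing per-statement and per-transaction overhead. A hedged sketch of what that looks like against the tools database via the mysql client; the database, table, and column names are all made up for illustration:

```
# one multi-row INSERT per chunk of ~10,000 tuples, instead of 10,000
# single-row INSERTs — far fewer round trips and transaction commits
mysql citations_p <<'SQL'
INSERT INTO doi_index (page, doi) VALUES
  ('Page_1', '10.1000/1'),
  ('Page_2', '10.1000/2');
  -- ... up to ~10,000 tuples per statement ...
SQL
```

Past the sweet spot the statement outgrows server-side buffers and transaction windows, which is why 15,000 rows per query was slower than 10,000.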
[18:40:59] Coren: yeah, but we’re backing up 1.1T of error logs. I’ll file a task so I won’t forget [18:41:12] Coren: do you have an approximate ETA for the backup? [18:41:19] yuvipanda: No we're not - the rsync config explicitly does not copy logs. :-) [18:41:40] Coren: aaha! sweet :D [18:42:05] * Coren is crazy but hardly stupid. [18:42:26] but apparently I am because that didn’t occur to me at all as a possible optimization [18:42:28] :) [18:42:35] yuvipanda: It's running at iowait idle, so somewhere between 60-70 hours total I'd guesstimate. It's about 10h in now. [18:42:41] Coren: ah, cool [18:43:09] Coren: me and bd808 are vaguely talking about a logstash service we can use. is a bit far off tho [18:43:36] yuvipanda: Yeah, a good object store and a good logstash-like thing would go a long way. [18:43:54] Coren: yup. [18:44:01] FYI: check replication-rsync.conf in my replication patch if you want to see the whole list of exceptions. [18:45:31] Coren: geohack has about 100G of access/error logs, so that’s the next big offender [18:54:03] yuvipanda: we have a ticket for sge + webgrid logrotation, right? [18:54:36] valhallasw`nuage: yeah but I think we should just ship them to logstash and truncate at X MB [18:54:53] Yeah, logrotation is not all that useful. It saves space, but does squat for IO [18:55:04] I mean, it's useful in general - just not what we need most atm. [18:55:05] valhallasw`nuage: partially also because logrotate + gzip is IO intensive again [18:55:05] yuvipanda: as far as I can see, logstash has no auth, so we can't really use it for tools [18:55:19] yuvipanda: then don't gzip and just rotate? [18:55:21] that's just a mv [18:55:23] valhallasw`nuage: me and bd808 are scheming on ways to fix that [18:55:33] sounds good [18:55:35] valhallasw`nuage: long term, NFS for logs is a no-no. So at *best* it’s a bandaid. 
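The "don't gzip, just rotate" idea above can be sketched in shell. This is my illustration, not an actual Tool Labs script: a copytruncate-style rotation skips the IO-heavy compression pass, and truncating in place (instead of removing the file) keeps a writer's open file descriptor valid so the tool keeps logging without a restart.

```shell
LOG=$(mktemp)                      # stand-in for a tool's error.log on NFS
echo "PHP Notice: undefined index" >> "$LOG"
cp "$LOG" "$LOG.1"                 # keep the old contents; plain copy, no gzip
truncate -s 0 "$LOG"               # empty the live file in place
stat -c %s "$LOG"                  # prints 0
```

Note this still doesn't address the IOPS problem of writing the logs in the first place, which is why shipping them off NFS entirely is the long-term plan in the discussion.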
[18:55:52] valhallasw`nuage: and our problem is also IOPS and not space [18:56:08] mkay [18:56:16] valhallasw`nuage: current thinking is to steal shamelessly from heroku and implement https://devcenter.heroku.com/articles/logging [18:56:20] that solves auth problems for us [18:56:39] stealing from heroku is an automatic +1 from me [18:56:54] valhallasw`nuage: :) we’re stealing all concepts from heroku :) [18:57:01] service manifests are just procfiles :) [18:57:07] I mean, they have had their stuff working for 5 years now or so [18:57:21] then why are we not calling it that? [18:57:37] or is it a-procfile-but-slightly-different-so-it's-not-really-compatible? :P [18:57:38] valhallasw`nuage: so we’ll basically have a kibana install only for admins to access, and then implement an equivalent for what’s in that link for others [18:57:48] *nod* [18:57:50] valhallasw`nuage: latter, sadly - because we have to support lighty, nodejs, etc [18:57:59] ok, fair enough [18:58:52] valhallasw`nuage: https://github.com/yuvipanda/tools-webservice [18:59:12] valhallasw`nuage: I’m moving all the webservices bits (currently in shell scripts and python scripts that call each other) into a proper python module [19:00:06] valhallasw`nuage: still a massive WIP :) [19:00:26] like all the other things :P [19:00:28] valhallasw`nuage: that ‘one commit’ has been amended millions of times. I’m going to get it to some decent shape there, then import it to gerrit and have it be reviewed [19:00:39] valhallasw`nuage: more massive than other things :) [19:13:52] I'm getting a "channel 0: open failed: administratively prohibited: open failed" error when trying to ssh into my labs instance. [19:15:11] polybuildr: try with -vvv [19:15:11] ? [19:15:54] legoktm: That was with -vvv [19:15:59] oh [19:16:01] Also, I have a suspicion I'm doing something very stupid. 
[19:16:01] yuvipanda: ^ [19:16:41] I have an approved request here https://phabricator.wikimedia.org/T98174 [19:17:05] Does that mean I can ssh into spam-honeypot.eqiad.wmflabs? [19:17:10] I'm sure I'm missing something. [19:18:32] yuvipanda: If we really want to copy heroku we could use their logging aggregator -- https://github.com/heroku/logplex [19:19:33] bd808: not much of a receptive upstream, I’d guess [19:19:56] also erlang [19:20:29] hi polybuildr. have you read https://wikitech.wikimedia.org/wiki/Help:Instances and https://wikitech.wikimedia.org/wiki/Help:Contents [19:20:57] yuvipanda: checking [19:21:11] polybuildr: it typically means you're trying to connect to a host that doesn't exist [19:21:50] valhallasw@tools-bastion-01:~$ ping spam-honeypot.eqiad.wmflabs [19:21:51] ping: unknown host spam-honeypot.eqiad.wmflabs [19:21:54] ^ that might be the issue [19:22:08] polybuildr: you still need to build an actual virtual machine [19:22:34] polybuildr: you got approval for a project, and each project can consist of multiple VMs [19:22:39] valhallasw`nuage: ah. :P I thought it might be something like that. [19:23:16] valhallasw`nuage: So what do I do next? Or should I go read up on the FAQs first? [19:23:34] polybuildr: check https://wikitech.wikimedia.org/wiki/Help:Instances#Creating_an_instance [19:23:50] polybuildr: and maybe log out and in again on wikitech before you do that, because the interface doesn't always update when you get new rights [19:24:49] polybuildr: at least the ssh connection seems to be working now! :-) [19:25:30] valhallasw`nuage: I was in college at the time, left now. No longer behind the proxy. [19:26:24] ah, okay [19:36:38] bd808: mmh. 
On the one hand, it's good to use a pre-existing setup, otoh, there are WMF people with logstash experience [19:36:54] and I don't know if we have anyone who does erlang [19:37:09] WMF people == me mostly ;) [19:37:48] I think we will be able to do fun stuff with elasticsearch as the backend [19:38:01] getting auth right might be a bit tricky, though :/ [19:38:13] yeah. there are things we can do [19:38:19] but that would be true with heroku's stuff as well, as we have a different auth setup anyway [19:38:21] valhallasw`nuage: got in! Thanks! :D [19:38:55] sharding by project/tool should let us lock things down reasonably [19:40:38] *nod* [19:42:10] valhallasw`nuage: Question. I don't use Vagrant for MW development on my local machine. I need to set up a wiki on my Labs instance. The recommended way seems to be vagrant. Should I go with vagrant or do a normal manual install (which I'm more comfortable with)? [19:42:18] bd808: added you to the tools logstash task; I tried getting a logstash host to start, but I couldn't quite get it to work [19:42:39] polybuildr: there should be a manifest for mediawiki. Let me see if I can find docs... [19:43:06] labs-vagrant is much nicer than the older single mediawiki role [19:43:07] polybuildr: https://wikitech.wikimedia.org/wiki/Labsvagrant [19:43:17] sorry, https://wikitech.wikimedia.org/w/index.php?title=Help:Labs-vagrant [19:44:09] So your recommendation is to stick with Vagrant, right? [19:44:34] polybuildr: I would recommend it, yes. [19:44:45] bd808: Alright, thanks. 
:) [19:45:22] and as a bonus you can make puppet role(s) for things you need that it doesn't have and we can review and merge them to make things easier for you to repeat [19:45:55] it's also fairly easy to make local roles for things that really don't have value for sharing with others [19:46:40] labs-vagrant doesn't really use vagrant, but it shares the Puppet code and helper scripts with the MediaWiki-Vagrant project [19:46:58] bd808: Well, the trouble is that I don't even know what Puppet and Vagrant do, I just have a really vague idea. [19:47:03] But I shall read up. :) [19:47:18] (I am however working on making it actually use Vagrant and the vagrant-lxc plugin) [19:49:52] Also, how does a labs instance allow you to sudo without entering any password? [19:52:03] polybuildr: https://wikitech.wikimedia.org/wiki/Help:Sudo_Policies [19:52:50] if it asks for a password, that means you're not allowed to sudo (I think) [19:53:01] valhallasw`nuage: It isn't asking for a password. [19:53:10] I don't know how Linux machines can be configured to do that, which is why I'm asking. :P [19:53:11] valhallasw`nuage: https://phabricator.wikimedia.org/T41788 [19:53:49] valhallasw`nuage: My bad. Found out how. [20:43:35] yuvipanda: https://gerrit.wikimedia.org/r/#/c/203876/ [20:44:47] yuvipanda: should I make https://phabricator.wikimedia.org/T91979 into more of a script or shall I just close it? [20:51:19] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 767761 bytes in 3.882 second response time [21:02:45] valhallasw`nuage: yeah, I think we should put that script in the ops/puppet repo and auto provision it on toollabs::redis instances [21:03:14] yuvipanda: then we also need to provision redis-rdb-tools somehow [21:03:17] build a deb pkg? [21:03:19] ugh [21:03:22] I'll create a new task for that [21:03:26] valhallasw`nuage: needs rebase (that patch) [21:03:29] valhallasw`nuage: is trivial, I can do it [21:03:42] valhallasw`nuage: or you can try! 
python-stdeb makes it like a 30s process [21:03:45] that's because you couldn't be bothered to merge it for weeks! :-p [21:03:58] yeah, but I also have to figure out what to do with the .deb and stuff [21:04:04] so that's more than 30s process :P [21:04:15] valhallasw`nuage: that too :) [21:04:18] valhallasw`nuage: true, first time isn’t 30s [21:04:22] valhallasw`nuage: but I can build it for you if you want [21:04:30] nah I'll make a task for myself [21:04:56] valhallasw`nuage: sweet :D [21:05:20] 10Tool-Labs: Puppetize redis reporting - https://phabricator.wikimedia.org/T98641#1273413 (10valhallasw) 3NEW [21:05:23] ^ [21:05:42] 10Tool-Labs: Puppetize redis usage reporting tools - https://phabricator.wikimedia.org/T98641#1273424 (10valhallasw) [21:18:23] 6Labs, 6WMF-Legal: Request to review privacy policy and rules - https://phabricator.wikimedia.org/T97844#1273445 (10ZhouZ) Hi @d33tah, thank you for your contributions on WikiLabs. The WMF legal team has reviewed your issue and has some thoughts in regards to your questions. As you may know, user privacy is... [21:25:34] valhallasw`nuage: <3 thanks [21:39:06] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1273516 (10valhallasw) There seems to be some sort of leak again, growing to ~6GB from the 500MB we had two weeks ago: http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1430297062.188&target=tools.tools-redis.redis... [21:44:27] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1273531 (10Dfko) Interesting... Getting back on it. Can you post that dump for me somewhere? [21:44:59] yuvipanda: ^ what did you give dfko last time? the .rdb or the memory.csv dump with only the rq keys? [21:45:34] valhallasw`nuage: latter [21:45:38] ok [21:48:37] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1273533 (10valhallasw) Yep, it's in /home/dfko/redis/dfko.csv. 
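The python-stdeb route mentioned above is roughly the following; this is a hedged sketch that assumes python-stdeb and its build dependencies are installed, and uses redis-rdb-tools only because it is the package being discussed:

```
# inside the source tree of the package being packaged
python setup.py --command-packages=stdeb.command bdist_deb
# the resulting .deb lands in deb_dist/, ready for the local apt repo
ls deb_dist/*.deb
```

The first-time cost is mostly figuring out where to publish the .deb, which is the part of the process the "more than 30s" remark refers to.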
[21:50:55] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1273534 (10Dfko) Thanks [22:00:44] 6Labs, 10Labs-Infrastructure, 3ToolLabs-Goals-Q4: Limit NFS bandwith per-instance - https://phabricator.wikimedia.org/T98048#1273559 (10coren) Not in puppet currently, but the server (labstore1001) has ~root/setup-server-tc setting up an outgoing NFS limit with fair queuing with fairly conservative settings,... [22:14:47] valhallasw`nuage: > if it asks for a password, that means you're not allowed to sudo (I think) [22:15:01] could that be the require authentication option? [22:16:48] Negative24: err? I think it has to do with sudo groups as defined on wikitech, but I'm not sure of the details [22:16:57] this was specifically in a labs context [22:17:28] valhallasw`nuage: reason I asked is because I have in some cases been asked for a password and it takes my LDAP password [22:17:39] hm, okay [22:17:44] I don't know :- [22:17:46] :-) [22:18:05] Hey all - I’m trying to figure how to get access to the EventLogging DB to test our logging - anyone know where I should start? [22:26:44] anyone have a sec to help me finally start using https://phabricator.wikimedia.org/T56054 ? [22:33:25] Betacommand: The .jsubrc? [22:33:51] Coren: yeah, However I think I figured it out, after failing the first time :P [22:34:39] Ah, that reminds me. Last changeset of the day is for you. :-) [22:34:47] the whois ? [22:34:53] * Coren nods. [22:34:58] thanks [22:37:46] Coren does it add it to login too? [22:38:03] Yes, dev nodes are a strict superset of exec nodes. 
[22:40:01] should know in a few minutes if the jsubrc worked [22:41:04] Coren I also have most of the info for the stats tool replacement, just need to work on formatting and throwing it into a table now [22:42:01] Coren: and it looks like I screwed something up [22:42:27] Coren: logs should be dumped to /data/project/betacommand-dev/sge_logs [22:42:32] but it's empty [22:43:02] never mind, think it was a permissions issue [22:44:01] Betacommand: Neat. The patch has been merged; next puppet run should make you full of whoisness. [22:44:36] thanks [22:46:21] 10Tool-Labs, 5Patch-For-Review: Whois needs re-installed - https://phabricator.wikimedia.org/T98555#1273659 (10coren) 5Open>3Resolved Merged. [22:50:57] Coren: using the default -o and -e commands for a directory creates separate files for each job, appending the jobid to the filename. is there a way to avoid that? [22:51:41] If you want to avoid creating numbered files, you have to specify a filename and not a directory. i.e.: log/afile rather than just log/ [22:51:55] Hm. [22:52:09] I see what you would have wanted to do though. [22:52:56] without specifying a directory all the commands dump to jobname.out or .err [22:53:27] Looking to keep the same format, just move the output to a log subdirectory [22:53:49] Betacommand: Ah, -e supports "$JOB_NAME" and "$JOB_ID" [22:54:00] (Also a few others, check man qsub) [22:54:24] So you can do something like '-e logs/$JOB_NAME.err' [22:56:28] Betacommand Coren do either one of you know who I can ask to get access to the EventLog testing? [22:56:50] coreyfloyd: You need to ask one of the project admins. [22:57:30] Coren: Thanks - is there an admin list I can look at [22:58:47] or someone specifically you think I should ask? [22:59:46] coreyfloyd: Do you know what project it's part of? 
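The qsub pseudo-variables Coren mentions can be combined like this; the tool name and paths are illustrative, and the single quotes matter — they stop the shell from expanding $JOB_NAME itself, leaving it for Grid Engine:

```
# per-job logs under logs/, named after the job instead of numbered files
jsub -N mytask -o 'logs/$JOB_NAME.out' -e 'logs/$JOB_NAME.err' ./mytask.sh
```

This keeps the familiar jobname.out / jobname.err naming while moving the output into a log subdirectory, which is what Betacommand was after.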
[23:01:01] Coren: I know this is the list I need to be on: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep [23:01:34] reference: https://wikitech.wikimedia.org/wiki/EventLogging/Testing/BetaLabs [23:02:42] bblack: is there docs on sslcert (puppet class)? [23:02:47] Ah, deployment-prep; you want to ask (likely) one of Hashar, yuvipanda, Ottomata for sure. Probably legoktm too. [23:02:57] not it [23:03:09] Coren: ahh thanks [23:03:12] * bd808 reads backscroll [23:03:17] yuvipanda: freeze tag! [23:03:22] stay put [23:03:41] I sometimes add people to deployment-prep [23:03:51] hi [23:04:17] * legoktm points to Krenair [23:04:21] hey guys - anyone volunteering to help? [23:04:22] lol [23:04:34] can you link to your phabricator profile for me coreyfloyd? [23:04:54] Krenair: https://phabricator.wikimedia.org/p/Fjalapeno/ [23:05:55] coreyfloyd, try logging in now [23:08:38] coreyfloyd? [23:08:56] Krenair: sorry - figuring out where to login (i was only on step 1 - get access) [23:09:23] oh, you just ssh to deployment-eventlogging02.eqiad.wmflabs like you would to any other labs instance [23:10:21] Krenair: ok… [23:10:32] Krenair: oh getting unable to resolve host name… [23:11:01] i’m on vpn… do I need to be connected some other way for it to work? Sorry I’m new to labs [23:11:45] Oh you've never actually used labs? [23:11:56] I don't think VPNs should get in the way as long as your VPN allows you to SSH out [23:12:32] coreyfloyd: do you have proxycommand set up? [23:12:36] Krenair: yeah - never needed to until now [23:12:50] You want https://wikitech.wikimedia.org/wiki/Help:Access [23:12:55] legoktm: no… [23:13:09] ok, read the page Krenair linked :P [23:13:17] Krenair: ok - awesome - thats what i needed! [23:13:19] thanks [23:13:47] If you have some sort of production shell access, it's a bit like that [23:13:53] you have to ssh via a bastion host [23:14:14] Krenair: ok - that makes sense [23:15:44] coreyfloyd, oh, by VPN did you mean the WMF corp thing? 
[23:15:53] Krenair: yeah
[23:15:56] I seem to recall someone mentioning that before
[23:16:04] Yeah the corp stuff is entirely unconnected
[23:16:27] I don't think you can go straight from there to the labs hosts
[23:17:02] Krenair: ok - yeah i wasn’t sure - i disconnected now
[23:17:20] when in doubt, jiggle all the handles
[23:19:32] Krenair: looks like I have some homework to do - I’m on a mac…
[23:20:21] bblack: never mind. Figured it out
[23:23:56] 6Labs, 10Beta-Cluster, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1273803 (10bd808) I created a new instance in deployment-prep named `deployment-fluorine.eqiad.wmflabs`. This host is an m1-large instance with 58G of local disk storage at /srv/mw-log (symlinked to...
[23:26:32] 10Tool-Labs: Clean up huge logs on toollabs - https://phabricator.wikimedia.org/T98652#1273808 (10yuvipanda) 3NEW
[23:29:28] Krenair: just about got this done… getting my ssh key denied… the publickey is definitely in gerrit… not sure what the issue is - any ideas?
[23:30:25] coreyfloyd: not really a solution, but throwing a -vvv on the connection at least says more and might give you dieas
[23:30:35] *ideas
[23:32:15] coreyfloyd, hmm
[23:33:06] coreyfloyd, do you know what your shell username is?
[23:33:22] https://www.irccloud.com/pastebin/0xCcoyoB
[23:33:27] Ah.
[23:33:31] i thought fjalapeno…
[23:33:37] You don't appear to have any SSH public keys in your account.
[23:33:49] And I think your shell username is coreyfloyd, the same as your IRC nick
[23:33:53] hmm, in gerrit it says i have 2…
[23:33:58] Gerrit is separate.
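[editor's note] Alongside the `-vvv` suggestion above, it helps to confirm which key you actually hold before hunting for where to upload it. A minimal sketch — it generates a throwaway key in a temp dir purely for demonstration; in practice you would point `ssh-keygen -lf` at your real `~/.ssh/*.pub`:

```shell
# Generate a throwaway key for demonstration only (the path is an
# example, not one from the conversation above).
tmpdir=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -f "$tmpdir/labs_key" -q
# Print the key's fingerprint — compare it against the key listed in
# your wikitech preferences (not Gerrit; Gerrit keys are separate).
ssh-keygen -lf "$tmpdir/labs_key.pub"
```

If the fingerprint `ssh -vvv` offers never matches what wikitech has on file, the key was uploaded to the wrong place — exactly the Gerrit-vs-wikitech mixup below.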
[23:34:08] You need to put your public key in your wikitech preferences for labs
[23:34:10] oh - weird the instructions tell me to do it in gerri…
[23:35:24] Krenair: oh no they don’t - they say wikitech then the next instruction talks about gerrit… i just read through it
[23:36:18] * coreyfloyd face palm
[23:37:35] Alright so I see you have ssh public keys for labs now coreyfloyd
[23:38:14] Krenair: yeah - still getting the same error though…
[23:38:24] what command are you running exactly?
[23:38:45] and did you set a particular username in your ~/.ssh/config?
[23:39:22] ssh deployment-eventlogging02.eqiad.wmflabs -vvv
[23:39:35] https://www.irccloud.com/pastebin/GDSD7oxr
[23:39:51] I think that is all right…
[23:40:08] coreyfloyd, line 17
[23:40:15] needs to be "coreyfloyd"
[23:40:42] that was it
[23:40:52] so my shell is coreyfloyd…
[23:41:08] Krenair: thanks - not sure why they are different
[23:41:15] There are two usernames you'll generally associate with labs:
[23:41:50] The username you use to log into Wikitech, Phabricator, Gerrit, and various restricted miscellaneous services if you're in one of the groups nda/wmf/ops
[23:42:12] And the username you use to log into servers with
[23:42:49] One is stored in LDAP as cn/sn, the other is uid
[23:43:24] ok - thanks - yeah - accounts are a bit confusing… i created most of them the first day and haven’t really looked at them since
[23:48:19] Coren: are https://dumps.wikimedia.org/other/wikidata/ accessible via tool labs or do I have to manually download them?
[23:50:00] You know, I don't actually know. :-) I don't think they are but they should be.
[23:52:44] Coren: should I file a bug for it? how easy/hard do you think it's going to be?
[23:53:20] legoktm: It shouldn't be /too/ hard but Ariel is the one that can say for sure. I'm guessing it's no harder than adding a directory to the list of things to rsync.
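[editor's note] The "line 17 needs to be coreyfloyd" fix above is the `User` directive in `~/.ssh/config`: the shell username (LDAP `uid`) goes there, not the wiki login (LDAP `cn`/`sn`). A sketch using the two names from the conversation:

```
# The shell username goes on the User line. Logging in as the wikitech
# account name ("Fjalapeno") is what produces the publickey-denied error
# pasted above.
Host deployment-eventlogging02.eqiad.wmflabs
    User coreyfloyd
```

Equivalently, `ssh coreyfloyd@deployment-eventlogging02.eqiad.wmflabs` on the command line overrides whatever `User` the config resolves to.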
[23:53:29] ok
[23:53:41] 6Labs, 10Beta-Cluster, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1273883 (10bd808) MediaWiki debug logs are now switched to deployment-fluorine
[23:53:59] Coren: last question! if I download it manually, is it alright to put them on NFS? or should they go somewhere else?
[23:54:41] legoktm: Depends on what you want to do with it; but if you need to put it on NFS please put it on scratch. :-)
[23:55:08] I'm writing a script that reads from it and dumps stuff in mysql accordingly
[23:55:13] what's scratch?
[23:55:54] Well, /data/scratch
[23:56:03] It's global to all projects.
[23:56:31] But it's explicitly ephemeral - think of it as a generally long-lived /tmp
[23:56:54] well worst case I just have to re-download it
[23:58:57] !screen
[23:58:58] script /dev/null
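[editor's note] The scratch advice above — re-downloadable data belongs on the ephemeral, project-global `/data/scratch`, not regular NFS — can be sketched as follows. `SCRATCH` defaults to a local temp dir here so the sketch runs anywhere; on Labs you would set it to `/data/scratch`. The per-user subdirectory layout and the dump filename are assumptions.

```shell
# Stage a manually downloaded dump on scratch space. Everything under
# $SCRATCH is treated as ephemeral ("a generally long-lived /tmp"), so
# only put re-creatable data here.
SCRATCH="${SCRATCH:-${TMPDIR:-/tmp}/scratch-demo}"   # /data/scratch on Labs
workdir="$SCRATCH/${USER:-demo}/wikidata-dumps"
mkdir -p "$workdir"
cd "$workdir"
# On Labs you would fetch the dump here, e.g. with wget from
# https://dumps.wikimedia.org/other/wikidata/ (pick the file you need
# from that listing; the name is omitted here on purpose).
ls -d "$workdir"
```

A loader script can then read from `$workdir` and write into MySQL; if scratch is cleaned, the worst case — as noted above — is re-downloading the dump.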