[00:12:15] bd808, thanks for all your help though! Time to get hacking again ))
[00:12:30] np
[00:19:04] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876819 (jcrespo) > it would be great if staff did acknowledge that this is the case, and could pro-actively report this to the volunteer who wri...
[00:44:50] GIF Movie Gear v4.3.0. Bought this over 10 years ago, still gets updates. Somehow the installer is only 988 KB with manual, examples, and everything
[01:19:26] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Awight was created, changed by Awight link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Awight edit summary: Created page with "{{Tools Access Request |Justification=I'd like to help with the tools.ores project. Thanks! |Completed=false |User Name=Awight }}"
[02:19:41] Hey, I want to add user:Awight to this service group but I can't find him
[02:19:43] https://wikitech.wikimedia.org/w/index.php?title=Special:NovaServiceGroup&action=managemembers&projectname=tools&servicegroupname=tools.ores&returnto=Special%3ANovaServiceGroup
[02:19:47] can anyone help?
[02:24:19] can't find him?
[02:24:43] yeah
[02:24:46] I tried
[02:24:56] ?
[02:25:35] Krenair: One the link when I want to add the user, it's not there
[02:25:39] *on
[02:26:19] Oh
[02:26:28] Well he'll need to be a member of the project first
[02:27:20] and he's not
[02:27:26] so get a project admin to sort that out
[02:27:44] I think that project has a form for it somewhere
[02:28:18] by project do you mean tools?
[02:28:35] there it is: https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request
[02:28:54] your URL says projectname=tools, so I would assume so Amir1 :)
[02:29:17] Thanks :)
[02:29:35] but I don't want this for myself
[02:29:39] no
[02:29:42] I want it for someone else
[02:29:45] you get him to fill it out
[02:29:52] and this form doesn't support that :/
[02:29:53] then you can add him to your group inside tools
[02:30:09] once a tools admin has processed it
[02:30:35] I want to know if any of the admins are around now?
[02:30:45] to process the request
[02:33:40] bd808, Reedy, YuviPanda?
[02:34:06] there are also a bunch of ops who have admin access there and appear online
[02:35:26] although all have been idle for ages
[02:39:23] oh okay
[02:39:26] thanks
[02:41:36] * bd808 reads backscroll
[02:43:35] Amir1: So Awight doesn't have a tools account yet?
[02:43:44] yeah
[02:43:53] but he has a lbas account
[02:43:57] *labs
[02:44:54] hmmm... I think I know how to approve a request but I don't know how to just give someone an account that hasn't requested one
[02:45:14] don't you just add them to the project like you would any other project?
[02:45:50] Oh. that is probably all it takes.
[02:46:00] * bd808 looks at the tools project members list
[02:48:04] Amir1: I think he will show up now in the service group management scrrn
[02:48:11] *screen
[02:48:34] awesome
[02:48:36] thanks
[02:48:52] yw
[02:49:28] added him
[03:03:11] YuviPanda: no rush, but tools-elastic-0[123] aren't accepting my ssh key. tools-puppetmaster-01 does though so I'm guessing the elastic hosts have some problem from the ldap juggling last week.
[03:54:18] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1876967 (Beetstra) @jcrespo: I was notified immediately, but unfortunately at the start of my weekend, with an email which is hardly telling me a...
[07:15:15] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1877049 (Beetstra) @jcrespo: I have implemented new counting tables based on the three 'offending' queries above (and will implement if there are...
[07:34:38] PROBLEM - Host tools-worker-04 is DOWN: CRITICAL - Host Unreachable (10.68.16.122)
[07:35:02] bd808: uh, I'll look tomorrow.
[08:12:54] Labs, Tool-Labs: Install Perl module Redis. - https://phabricator.wikimedia.org/T121341#1877078 (valhallasw) Open>Invalid a:valhallasw libredis-perl is already installed: valhallasw@tools-bastion-01:~$ aptitude show libredis-perl Package: libredis-perl State: installed Automatically installed: n...
[09:50:16] Hi everyone
[09:52:08] myabdurashid: hi
[09:52:10] :o
[09:52:12] need help?
[09:56:39] highly welcome
[11:08:17] Hi everyone, I need technical help. I want to set a link to a toollabs page as an edit summary. The address is https://tools.wmflabs.org/kasparbot/persondata/challenge.php?q=entity%3AQ53447 . [[toollabs:kasparbot/persondata/challenge.php?q=entity%3AQ53447|test]] doesn't work. The question mark seems to be treated as a special character. How can I avoid this?
[11:13:30] any permission issue on trusty?
[11:14:36] from some exec host(s), some files can't be opened
[11:14:47] with "permission denied"
[14:19:55] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:24:49] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 958541 bytes in 2.733 second response time
[16:04:12] Labs, MediaWiki-extensions-OpenStackManager, MediaWiki-General-or-Unknown, wikitech.wikimedia.org, and 3 others: MWException after account creation on wikitech - https://phabricator.wikimedia.org/T117553#1877844 (Reedy) > [15:50:56] could you just add a comment "+1 by basisbit; required...
[16:06:24] andrewbogott: are there problems with crontab again, tasks do not start sometimes
[16:10:29] PROBLEM - Puppet failure on tools-mail-01 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[16:24:18] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Awight was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=228130 edit summary:
[16:39:52] !log rcm turned off rcm-5, instance is currently unused
[16:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL, Master
[16:41:04] Labs, Labs-Infrastructure: [Horizon] Design broken - https://phabricator.wikimedia.org/T120646#1877934 (Luke081515) Can we solve this fast? I need horizon sometimes and at the moment this is annoying.
[16:50:29] RECOVERY - Puppet failure on tools-mail-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:23:09] chasemp: do you think I can bug you about the network thing?
[17:23:28] doctaxon: hi! can you tell me which one was missed?
[17:23:35] YuviPanda: sure, let's sync up this afternoon
[17:23:39] we can at least take a crack at it
[17:23:50] chasemp: ok, yeah. it's just a SNAT issue or something
[17:24:13] let's see how deep the or something hole goes :) I also have another related thing to ask etc
[17:24:30] maybe like 30m post this meeting we can talk
[17:24:35] or 1h or something
[17:24:39] sure
[17:24:46] it's intermittent
[17:24:48] which bugs me
[17:25:05] I wonder if the ip subnet that flannel allocates to a host is also used elsewhere
[17:25:11] Yuvipanda - I misinterpreted the timezone, it's all best - sorry
[17:25:17] is 192.168.0.0/16 used elsewhere?
[17:25:25] doctaxon: np! Thanks for watching out!
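The question above, whether the flannel range 192.168.0.0/16 is also used elsewhere, amounts to an overlap check between networks. A minimal sketch with Python's standard `ipaddress` module follows; the `OTHER_NETS` list is hypothetical and only illustrates the check, it is not taken from the actual Labs network configuration.

```python
import ipaddress


def overlapping_networks(candidate, others):
    """Return the entries of `others` that share addresses with `candidate`."""
    cand = ipaddress.ip_network(candidate)
    return [net for net in others if cand.overlaps(ipaddress.ip_network(net))]


# Range named in the question above.
FLANNEL_NET = "192.168.0.0/16"

# Hypothetical ranges assumed to be in use elsewhere (illustration only).
OTHER_NETS = ["10.68.16.0/21", "192.168.122.0/24", "172.16.0.0/12"]

if __name__ == "__main__":
    for net in overlapping_networks(FLANNEL_NET, OTHER_NETS):
        print(f"{FLANNEL_NET} overlaps with {net}")
```

Any range reported here would be a candidate for the kind of intermittent routing or SNAT confusion being discussed, though an overlap alone does not prove it is the cause.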
[17:34:36] YuviPanda: on 192 that's an interesting question, I think no but there was something that made me question that it wasn't used somewhere odd and openstack-y and isolated
[17:34:42] but maybe not, will outline when we caht
[17:34:44] chat even
[17:34:49] ok
[18:00:23] YuviPanda: 1h?
[18:00:46] chasemp: kk
[18:26:28] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ejegg was created, changed by Ejegg link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Ejegg edit summary: Created page with "{{Tools Access Request |Justification=Share aggregated counts of clicks on fundraising emails (see requirements here: https://phabricator.wikimedia.org/T114010). I basically..."
[18:26:35] is tools-dev unusably slow only for me?
[18:28:15] Steinsplitter: let me look
[18:29:13] Steinsplitter: the load is ~1.5, but that should be OK. (4 cpu host). It's also not very slow for me
[18:29:22] (tools-bastion-02)
[18:30:24] strange, ok. thx
[18:31:21] Steinsplitter: if it's slow again, please check with 'top' if there's anything obvious
[18:31:29] it might be a transient issue
[19:00:42] chasemp: ping when around :D
[19:08:23] YuviPanda: ping
[19:08:32] chasemp: pong
[19:08:37] chasemp: so what's happening is:
[19:08:48] # on tools-worker-01, a docker container can't reach the outside world
[19:08:55] # on all other workers, it can
[19:09:09] this isn't restricted to tools-worker-01 - also happened to -04 and -05, and I took them out of rotation
[19:09:15] they have all the same software and the same kernel
[19:09:18] so not sure what's going on
[19:09:26] *but* they can communicate *inside the kubernetes network itself*
[19:09:28] do you have an active example?
[19:09:29] err
[19:09:30] flannel network
[19:09:33] yes
[19:11:09] where is it? :)
[19:11:15] yeah am setting it back up moment
[19:11:20] ok
[19:12:48] chasemp: ok, if you get on tools-worker-01
[19:12:57] I'm trying to send dns requests from a container in there
[19:13:13] chasemp: it has ip 192.168.34.1
[19:13:24] tcpdump -i any host 192.168.34.1
[19:13:31] should show requests going out but nothing coming back
[19:18:26] YuviPanda: hi! can I bug you a bit (about https://phabricator.wikimedia.org/T121313) whenever you have some time?
[19:19:04] ashley: sure! I'll take a look in maybe 30-45mins?
[19:19:25] super, thank you very much :D
[19:41:35] YuviPanda: question about labs. Is it possible to run an arbitrary domain (say, playlist.wikiedu.org) on a wmflabs instance?
[19:41:52] or will only *.wmflabs.org domains work?
[19:56:23] Labs, Labs-Infrastructure: maps-wma1 instance unresponsive (second time in 3 days) - https://phabricator.wikimedia.org/T121431#1878872 (Aklapper)
[20:07:44] ragesoss: in meeting i'll look after
[20:11:00] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ejegg was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=228562 edit summary:
[20:33:56] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mavrikant was created, changed by Mavrikant link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Mavrikant edit summary: Created page with "{{Tools Access Request |Justification=https://tr.wikipedia.org/wiki/Kullanıcı:KET_Bot I wanna move KET Bot to here from AWS. https://tr.wikipedia.org/wiki/Kullanıcı:M..."
[21:01:03] chasemp: oh shit we are a bit fucked.
[21:01:13] chasemp: there's pam stuff causing cron to fail everywhere
[21:01:14] for which reason?
[21:01:23] and puppet runs on cron
[21:01:46] see: orchestration convo from 20minutes ago and collectively groan in pain
[21:01:56] yeah
[21:02:01] can you point me to an example host?
[21:02:08] is this toollabs only or ?
[21:02:53] chasemp: at least toollabs (look at any tools-webgrid-lighttpd-140x instance (try 1402))
[21:02:57] am checking elsewhere
[21:03:16] look at syslog
[21:03:21] alright
[21:03:44] hmm
[21:03:48] ores-web-02 works fine
[21:03:51] so maybe we aren't fucked
[21:04:27] I see the errors but don't know what they mean as far as user impact
[21:04:34] hmmm
[21:04:49] maybe we aren't fucked
[21:05:29] something's wrong but maybe not the something you thought
[21:06:00] yeah
[21:07:25] chasemp: it looks like root's crons are working
[21:07:29] chasemp: and all user's crons are failing
[21:07:40] chasemp: with a PERMISSION DENIED
[21:07:44] root crons don't depend on ldap but user crons do?
[21:08:10] chasemp: yeah, so it's directly caused by https://gerrit.wikimedia.org/r/#/c/258663/
[21:08:27] chasemp: I know because this actually happened on friday and I had to fix it that night because tool labs crons run a lot of wiki bots
[21:08:41] chasemp: so I've hand reverted that patch on tools-submit (where tools crons run) and that's worked
[21:08:47] chasemp: no idea why however
[21:08:56] what's the downside to revert?
[21:11:30] ragesoss: that needs configuration on the WMF end, so it's probably best to create a task in the #labs project?
[21:11:56] valhallasw`cloud: not necessarily, they can just point the A record at a public IP
[21:12:04] but we prefer not to do that
[21:12:08] yes, but that needs a public ip, which also needs a request
[21:12:18] ragesoss: so, it is technically possible, but we prefer not doing it :D
[21:12:24] but can be made to work
[21:12:43] you just need to put up appropriate notices to comply with labs ToS let me find the bug where we did this for another person
[21:14:06] Labs, WMF-Legal: Discussion: can I park WikiSpy under a separate, simpler domain? - https://phabricator.wikimedia.org/T97846#1879165 (yuvipanda) So, you can get a separate IP and point your DNS at it. I'm not sure what to do about SSL certs - maybe now that letsencrypt is usable that can be used? Sorry ab...
[21:14:07] ragesoss: https://phabricator.wikimedia.org/T97846
[21:15:36] YuviPanda: okay, cool. the context is that I'll be working on, essentially, a social media gadget that lets people create a "playlist" of, for example, 'my top 3 Wikipedia articles about asteroids'. not sure where it'll be hosted (possibly partly on labs just for the processor-heavy backend processes) and wanted to figure out how much could *potentially* run on labs.
[21:15:55] ragesoss: isn't that the same as the 'Gather' extension?
[21:16:01] hey ragesoss, how is that different from collections?
[21:16:22] ashley: so... that instance looks hosed.
[21:16:43] well, the collecting part is not really the focus of this, and it's not necessarily connected to a Wikipedia account.
[21:17:05] the real focus of the project is to design twitter cards that can go viral.
[21:17:43] we may tie it in to Gather, if the ability to create anonymous collections is ready before we wrap it up.
[21:18:08] Labs, Labs-Infrastructure: maps-wma1 instance unresponsive (second time in 3 days) - https://phabricator.wikimedia.org/T121431#1879181 (yuvipanda) I can't seem to ssh into this even as root :|
[21:18:09] but the database end of what we have planned is really trivial.
[21:18:45] YuviPanda: uh-huh. can anything be done about it or is it one of those "scrap it and start over" things? (if it's the latter, I don't really have problem with that -- basically everything, save for the data and two custom quickie shell scripts, is on git.wm.o anyway)
[21:23:13] Dschwen: how old is the maps-wma1 isntance?
[21:23:16] *instance
[21:26:38] ashley: that would be wonderful.
[21:26:49] ashley: sorry about the instability :(
[21:28:45] YuviPanda: no worries, shit happens sometimes and there's nothing we can do about it; thanks for looking into it. :) what needs to be done to have the current instance scrapped and a fresh one created? will whining at you work? or do I need to file a more formal request somewhere?
[21:30:24] ashley: you can just go to Special:NovaInstance on wikitech and create a new one. https://wikitech.wikimedia.org/wiki/Help:MediaWiki-Vagrant_in_Labs has more detailed instructions
[21:30:46] ashley: can you leave your current instance as is and just create a new one? so we can poke at it to see what's going on
[21:31:00] sure thing
[21:31:07] thanks
[21:31:28] Labs: Can't ssh to social-tools1 from bastion-01 - https://phabricator.wikimedia.org/T121313#1879210 (yuvipanda) My root key doesn't let me in to this instance :( Not sure what's happening.
[22:04:57] chasemp: ^ maps-wma1 and social-tools1
[22:10:35] maps-wma1 I can't get in, I confirmed with yurik on 2015-12-09
[22:10:42] oh
[22:10:53] the social-tools1 I can't get into and have no idea
[22:10:53] I see
[22:10:55] ok
[22:11:03] so these aren't new?
[22:11:22] at least the maps one has been around for a bit
[22:11:31] let me see here
[22:11:33] https://phabricator.wikimedia.org/T121431#1879181 Dschwen says they were able to login earlier
[22:11:49] and I am not sure if it's the same project yurik is involved in
[22:12:10] oh I'm sorry
[22:12:16] yurik said it's not his I just looked :D
[22:12:21] they suggested wikidata
[22:12:43] but without root working
[22:12:46] I'm out of ideas atm
[22:12:48] nope, not mine
[22:13:17] chasemp: I'm going to try hitting it with salt
[22:13:30] YuviPanda: here's another on my wishlist, I have used virsh before to mount VM consoles in this kind of case
[22:13:39] but it has to be pre-set-up to be a sane thing for fallback
[22:13:44] but I believe we can do it
[22:13:52] when something is hit with salt, it gets preserved forever
[22:13:55] guys, our project is maps-team
[22:14:11] maps is volunteer-controlled, though we have access
[22:14:22] chasemp: <3
[22:14:27] chasemp: needs to happen ya
[22:14:30] is our mgmt access
[22:14:50] yurik: and no, you can not resize instances, and I can not resize instances either
[22:15:02] MaxSem: can you by chance access maps-wma1.maps.eqiad.wmflabs now?
[22:15:28] salt can't reach it but that's no surprise
[22:15:53] it has gone totally dark then
[22:15:54] chasemp, Permission denied (publickey).
[22:15:59] k tx MaxSem
[22:16:10] YuviPanda: is this the day of immediate feedback on discussion or what :)
[22:16:36] everything sucks
[22:16:52] well yeah but you can't let that get you down :)
[22:17:20] heh
[22:17:41] shall I restart that instance and see if that helps?
[22:17:48] gonna doooo itttt
[22:17:53] andrewbogott and I told someone friday, if root is bad and salt is bad and users cannot log in themselves
[22:18:01] it's a new instance situation
[22:18:13] sure
[22:18:31] > | created | 2014-03-06T16:24:50Z |
[22:18:34] not a new instance
[22:18:39] nope
[22:18:51] I'm wondering whether we could have puppet spin up a second ssh server with password auth, or a call-back kind of situation
[22:19:04] because these kinds of issues do seem to pop up every now and then
[22:19:22] if we could mount console with virsh we would a hard fallback
[22:19:34] valhallasw`cloud: the problem here I suspect is that puppet isn't running at all
[22:19:35] have a even
[22:19:35] oh, yeah, that would be a more reasonable solution
[22:19:39] yeah, virsh +1
[22:19:49] chasemp: can you open a bug for it? I think that'll be super important
[22:20:11] yes
[22:24:12] restart didn't do anything
[22:24:21] Labs: Figure out how to get console on VMs with virsh - https://phabricator.wikimedia.org/T121452#1879329 (chasemp) NEW
[22:24:23] Labs, Labs-Infrastructure: maps-wma1 instance unresponsive (second time in 3 days) - https://phabricator.wikimedia.org/T121431#1879336 (yuvipanda) A restart didn't do anything either.
[22:24:59] there isn't much else we are prepared to do
[22:25:35] Labs, Labs-Infrastructure: maps-wma1 instance unresponsive (second time in 3 days) - https://phabricator.wikimedia.org/T121431#1879339 (yuvipanda) Hmm, I wonder / suspect if puppet ever ran on this instance in a while? https://tools.wmflabs.org/nagf/?project=maps is all empty.
[22:26:32] chasemp: yeah
[22:27:26] Labs, Labs-Infrastructure: maps-wma1 instance unresponsive (second time in 3 days) - https://phabricator.wikimedia.org/T121431#1879341 (yuvipanda) We can try with @andrew or @coren's key when they're back from their vacation since it's been there longer than mine or chase's, but I also recommend rebuildin...
[22:27:44] chasemp: for social-tools1 I can confirm puppet is clean (from graphite data)
[22:27:55] so not sure why that tanked
[22:28:09] try salt w/ both?
[22:28:15] I tried salt with wma
[22:28:17] let me try for this
[22:30:25] chasemp: interesting I can
[22:30:27] hit social-tools
[22:30:41] cat: /etc/ldap.yaml: No such file or directory
[22:30:47] nice can probably fix at least root then
[22:30:56] forcing puppet run on that one
[22:31:04] haaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
[22:31:09] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Resource type mw-extension doesn't exist on node social-tools1.social-tools.eqiad.wmflabs
[22:31:11] Warning: Not using cache on failed catalog
[22:31:13] Error: Could not retrieve catalog; skipping run
[22:31:15] yup
[22:31:20] really old instance using mediawiki_singlenode
[22:31:29] ashley: ^
[22:31:52] ashley: I highly recommend you set up a new one with mediawiki_vagrant setup. this one is recoverable but only with massive effort
[22:32:11] bd808: so mediawiki_singlenode instances are going to die off bit by bit if people have customized them
[22:32:48] chasemp: for context, mediawiki_singlenode was a BeforeMyTime way of setting up a mw instance on a labs instance. except it was... fragile. it was replaced with labs-vagrant and now mediawiki-vagrant-in-labs.
[22:33:06] was it puppet based or ?
[22:33:23] there was a puppet role for it ya
[22:33:35] well, still is
[22:33:42] modules/role/manifests/deprecated I think
[22:33:56] kk
[22:33:59] but you can customize it with some mw-extension stuff I don't really know about and that's been broken from BeforeMyTime too
[22:34:21] ashley: I can give you ssh access for right now if you want
[22:34:27] ashley: the instance will die soon though
[22:34:55] YuviPanda: you can just uncheck the manifest in wikitech?
[22:35:07] hmm
[22:35:09] probably
[22:35:21] then it'll get all updates, just not for the deprecated mw_sn thing
[22:35:21] but I don't know what that'll do to the code
[22:35:36] nothing? puppet never removes anything, unless explicitly asked to
[22:35:47] prolly
[22:36:12] i'm doing it now
[22:38:02] good call on that one
[22:38:07] probably sanest course of action
[22:39:18] valhallasw`cloud: haha ofc
[22:39:20] I can't uncheck it
[22:39:29] since that role has been removed a long time ago
[22:39:52] just re-add it to the list?
[22:40:17] as in https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup
[23:01:47] valhallasw`cloud: ashley ok, so social-tools1 is back up
[23:07:18] :o :D
[23:14:10] YuviPanda: damn you're awesome ;D everything's fine and dandy once again, thank you so much \o/
[23:14:36] ashley: yw. I highly recommend moving it tho
[23:14:50] valhallasw`cloud: maybe I should remove that role from all instances
[23:15:22] how would I do that? (remember, I'm a total Labs n00b -- I'm a code monkey, not ops, and I don't even work here!)
[23:17:15] ashley: https://wikitech.wikimedia.org/wiki/Help:MediaWiki-Vagrant_in_Labs
[23:18:17] oh boy
[23:19:05] that looks like something I'll be wanting to tackle after Christmas (and preferably with you or someone equally wise guru around who I can poke all the time for tips and guidance)
[23:19:15] :)
[23:19:23] ashley: but next time this breaks, we might not be able to help fix
[23:20:18] I know >.< it's just that I've never worked with vagrant or anything and this doesn't seem to be exactly as simple as "click here to migrate stuff and wait for the magic to happen in the background"
[23:21:17] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mavrikant was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=228885 edit summary: