[01:07:58] YuviPanda: I can’t even remember what I was going to ask about
[01:16:06] Labs, Tool-Labs, DBA, Stewards-and-global-tools: Throttling linkwatcher tool user as it is consuming 100% CPU - https://phabricator.wikimedia.org/T121094#1894139 (Trijnstel) Any update on this one?
[03:25:34] For a developer API, I'm using a web proxy on Wikimedia Labs going through eqiad. The developers will be looking at the responses, including the headers. A tribute to Terry Pratchett is cute and all, but do we really need to add the X-Clacks-Overhead? This is going to confuse developers who will need to investigate this strange header, wasting time that they could have spent better on understanding the API and the standa
[03:48:57] YuviPanda: ^
[03:49:00] I agree, FWIW
[08:20:10] I am running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 2.8.0.0 [libirc v. 1.0.3] my source code is licensed under GPL and located at https://github.com/benapetr/wikimedia-bot I will be very happy if you fix my bugs or implement new features
[08:20:10] @help
[08:20:33] You are trusted identified by name .*@wikipedia/.*
[08:20:33] @whoami
[08:20:51] @rss+ gmane http://rss.gmane.org/gmane.org.wikimedia.labs
[08:20:51] Item was inserted to feed
[08:20:55] @rss-on
[08:20:55] Permission denied
[08:20:58] ...
[08:21:01] I trust: .*@wikimedia/.* (trusted), .*@mediawiki/.* (trusted), .*@wikimedia/Ryan-lane (admin), .*@wikipedia/.* (trusted), .*@nightshade.toolserver.org (trusted), .*@wikimedia/Krinkle (admin), .*@[Ww]ikimedia/.* (trusted), .*@wikipedia/Cyberpower678 (admin), .*@wirenat2\.strw\.leidenuniv\.nl (trusted), .*@unaffiliated/valhallasw (trusted), .*@mediawiki/yuvipanda (admin), .*@wikipedia/Coren (admin),
[08:21:01] @trusted
[08:21:36] * legoktm gently pokes YuviPanda
[08:22:53] the context is https://phabricator.wikimedia.org/T121775
[08:29:03] ahaha
[08:29:08] there are two wikibugs instances running
[08:29:23] but because of the redis queue system, they're not duplicating messages
[09:08:41] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Zeeshan was created, changed by Zeeshan link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Zeeshan edit summary: Created page with "{{Tools Access Request |Justification=for accessing the edit histories of pages |Completed=false |User Name=Zeeshan }}"
[09:30:19] Tool-Labs-tools-Other: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#1894352 (Mrjohncummings) >>! In T116363#1873639, @matmarex wrote: > Well then, I went ahead, created a tool and crunched the data. Anybody, please feel free to wr...
[11:31:44] (PS1) Alexandros Kosiaris: osm: Add dummy osm_update_pass [labs/private] - https://gerrit.wikimedia.org/r/260344
[11:33:51] (CR) Alexandros Kosiaris: [C: 2 V: 2] osm: Add dummy osm_update_pass [labs/private] - https://gerrit.wikimedia.org/r/260344 (owner: Alexandros Kosiaris)
[14:41:36] Labs, Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: Instance creation fails - https://phabricator.wikimedia.org/T120586#1894763 (hashar) So this task is about Nova getting weird back on Sunday December 6th. Got fixed by Yuvi by restarting `nova-conductor` a...
[14:42:05] Labs, Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: Instance creation fails - https://phabricator.wikimedia.org/T120586#1894765 (hashar) p:High>Normal Lowering priority since the immediate issue has been fixed.
[15:54:37] legoktm: That's actually very nice behaviour.
[17:41:10] Labs, MediaWiki-Revision-deletion, Tracking: Need to access revision histories of wikipedia pages - https://phabricator.wikimedia.org/T122035#1895286 (Krenair)
[17:41:27] Labs, MediaWiki-Revision-deletion, Tracking: Need to access revision histories of wikipedia pages - https://phabricator.wikimedia.org/T122035#1894323 (Krenair) You want to access revision-deleted information from labs?
[17:41:39] Labs, MediaWiki-Revision-deletion: Need to access revision histories of wikipedia pages - https://phabricator.wikimedia.org/T122035#1895290 (Krenair)
[17:42:45] Labs, MediaWiki-Revision-deletion: Need to access revision histories of wikipedia pages - https://phabricator.wikimedia.org/T122035#1895297 (yuvipanda) p:High>Triage
[17:58:33] Labs, MediaWiki-Revision-deletion: Need to access revision histories of wikipedia pages - https://phabricator.wikimedia.org/T122035#1895336 (Luke081515) But why T122035#1894335?
[18:16:56] Labs, Tool-Labs: Move tools-mail to trusty - https://phabricator.wikimedia.org/T96299#1895372 (coren)
[18:17:32] Labs, Continuous-Integration-Infrastructure, Continuous-Integration-Scaling, Patch-For-Review: Instance creation fails - https://phabricator.wikimedia.org/T120586#1895375 (coren) a:coren>None
[18:17:53] Labs, Tool-Labs: Have a 'undergoing scheduled maintenance' page for labs set up for scheduled maintenance - https://phabricator.wikimedia.org/T90595#1895378 (coren) a:coren>None
[18:17:56] why two wikibugs?
[18:18:19] Labs, Incident-Labs-NFS-20151216, Labs-Infrastructure, operations, ops-eqiad: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#1895381 (coren) a:coren>chasemp
[18:19:34] Labs, Incident-Labs-NFS-20151216, operations: Investigate better way of deferring activation of Labs LVM volumes (and corresponding snapshots) until after system boot - https://phabricator.wikimedia.org/T121629#1895383 (yuvipanda) Need to figure out if lvm snapshots need to be activated for COW to work
[18:20:14] Labs, Discovery, Maps: Replacements for a.toolserver.org, b.toolserver.org, c.toolserver.org not available - https://phabricator.wikimedia.org/T103272#1895386 (Alexrk2) tiles.wmflabs.org *.toolserver.org Doesn't seem to work anymore. No tiles are provided. https://tools.wmflabs.org/geohack/geohack.ph...
[18:20:38] !log tools.wikibugs duplicate wikibugs, trying qmod -rj
[18:20:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:21:52] so we have a rogue wikibugs
[18:21:53] gah.
[18:24:33] !log tools.wikibugs using `listlogins` in nickserv, we find one running on 208.80.155.186 (-1409), one on 208.80.155.145 (-1405, just restarted)
[18:24:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:24:45] Labs, Discovery, Maps: Replacements for a.toolserver.org, b.toolserver.org, c.toolserver.org not available - https://phabricator.wikimedia.org/T103272#1895405 (Kghbln) Affirmative. The tiles are gone, e.g. on WikiVoyage.
[18:26:07] !log tools.wikibugs killed wikibugs manually, no SGE in sight.
[18:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:26:39] what on earth
[18:26:51] Labs: Instances locking up randomly - https://phabricator.wikimedia.org/T121998#1895407 (chasemp) p:Triage>High
[18:27:27] !log tools.wikibugs yet it respawns! What on earth. Again from 208.80.155.186, and killed again.
[18:27:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:27:37] what on earth?!
[18:27:54] chasemp: I'm going to do the failover now
[18:27:58] andrewbogott: do you have logs from those where you could add to https://phabricator.wikimedia.org/T121998
[18:28:04] YuviPanda: ok, how can I help?
[18:28:07] good vibes?
[18:28:13] Labs: Instances locking up randomly - https://phabricator.wikimedia.org/T121998#1895412 (coren) May or may not be significant, but `tools-proxy-01` has (partially) live userland at least insofar as nginx continues accepting connection and proxying. ssh, on the other hand, times out.
[18:28:23] I didn’t capture logs, although I have pasted some stuff into the SAL, let me look
[18:28:57] !log tools.wikibugs what's even weirder is that it starts both wikibugs.py and redis2irc.py, which are two distinct SGE jobs. Uuh?
[18:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:29:04] I meant irc logs just to recall what vm etc andrewbogott
[18:29:48] chasemp: :D look at tools-worker-08?
[18:30:04] Labs: Instances locking up randomly - https://phabricator.wikimedia.org/T121998#1895416 (Andrew) 2015-12-16 16:24 andrewbogott: rebooting tools-exec-1221 as it was in kernel lockup 2015-12-16 23:14 andrewbogott: rebooting tools-exec-1407, unresponsive 2015-12-18 15:16 andrewbogott: rebooting locked up host...
[18:30:09] !log tools.wikibugs ah, there are SGE processes running. OK, killing those as well.
[18:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:30:15] uh.
[18:30:16] chasemp: I copied the SAL entries into that bug
[18:30:18] thanks valhallasw`cloud
[18:30:22] andrewbogott: oh sweet
[18:30:22] now everything is dead!
[18:30:23] :D
[18:30:32] there should be a good way to statically link a SAL entry
[18:31:38] !log tools.wikibugs and restarted with fab start-jobs. Welcome back, wikibugs.
[18:31:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[18:31:53] !log tools failover proxy to tools-proxy-02
[18:31:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[18:36:44] chasemp: yeah, failover initiated, traffic already flowing through -02 now
[18:36:55] YuviPanda: reboot on -01?
[18:37:27] chasemp: yeah, want me to do it? difference with tools-worker-08 was that worker-08 lost userspace processes too
[18:37:36] (that too is unrebooted)
[18:37:37] sure
[18:37:43] YuviPanda: so I've got ssh connections to all worker-** that are responding, so if it borks again, we can see if this allows us to poke around.
[18:37:45] if this issue was happening in k8s land pre-nfs
[18:37:50] and it has only been happening outside of k8s post-nfs
[18:38:04] I'm inclined to say it's a similar issue but maybe not the same issue
[18:39:24] chasemp: possibly, yeah. proxy is in strange world tho
[18:39:39] valhallasw`cloud: which are responding? can you list?
[18:40:25] 1,2,3,,6,7,9
[18:40:30] 4,5 are not in dns, 8 doesn't respond
[18:41:33] valhallasw`cloud: right, 4 5 died earlier and were deleted, and 8 is dead
[18:41:39] but rest seem stable
[18:41:51] chasemp: ok, rebooting -01 now
[18:41:57] tx
[18:44:30] !log tools reboot tools-proxy-01
[18:44:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[18:46:29] YuviPanda: ok, and now also exec-120{1-5} and -140{1-5}.
[18:46:36] #spreadyourchances
[18:46:46] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 383 bytes in 0.002 second response time
[18:46:54] Labs, Discovery, Maps: Replacements for a.toolserver.org, b.toolserver.org, c.toolserver.org not available - https://phabricator.wikimedia.org/T103272#1895447 (Yurik) Would it make sense to switch dewiki to the new maps server, just like it-wiki, ru-wiki, en-wikivoyage and ru-wikivoyage did? Italian...
[18:47:04] YuviPanda: ^ is that the proxy change?
[18:47:28] um
[18:47:36] internal dns vs external?
[18:47:38] ah
[18:47:40] yes
[18:47:45] goddamit, I didn't wait for 20mins
[18:47:50] * YuviPanda forces run on holmium
[18:48:06] chasemp: yeah, so internal hosts hitting tools.wmflabs.org fail. fixing that now
[18:48:14] ok tx
[18:49:40] andrewbogott: chasemp > The last Puppet run was at Wed Dec 9 15:52:45 UTC 2015 (16430 minutes ago).
[18:49:44] on labservices1001
[18:49:47] * YuviPanda runs puppet
[18:49:55] that is surprising
[18:50:05] why would that be?
[18:50:09] YuviPanda: my fault, although I put it in the sal
[18:50:17] it was disabled, right?
[18:50:22] oh ok
[18:50:33] andrewbogott: no
[18:50:37] running -tv made it run
[18:50:42] well then… weird
[18:51:00] that cron is maybe dying early
[18:52:18] anyway, that should fixup shinken soon
[18:56:50] chasemp: andrewbogott only suspicious line in kernel log is
[18:56:52] > Dec 21 18:43:50 tools-proxy-01 kernel: [6982360.881574] nfs: server labstore.svc.eqiad.wmnet not responding, still trying
[18:56:55] so maybe it's all just NFS?
[18:57:05] ugh
[18:57:18] do we really need nfs on the proxies?
[18:57:29] or is it just that it's a per project concern atm?
[18:57:30] there was a terrible reason we needed it
[18:57:36] no we can disable them per-instance
[18:57:38] the k8s master has them off
[18:57:45] let me go find that terrible reason
[18:57:51] puppet fails without it
[19:00:37] I have some suspicions nfs is still currently periodically unstable tbh
[19:00:43] I imagine we all do
[19:01:22] yeah
[19:01:29] ok am looking at why proxy needs NFS
[19:01:41] it's because tools.admin is deployed from there
[19:01:44] let me split that out
[19:01:52] and de-NFS the proxies
[19:02:01] hey thanks
[19:02:09] lemme make task
[19:03:24] Labs, Tool-Labs: nginx puppet manifest requires nfs so error page cannot be updated over puppet - https://phabricator.wikimedia.org/T110836#1895472 (yuvipanda) This should be fine now that we're using aptly and it doesn't use NFS.
[19:08:03] Labs, Discovery, Maps: Replacements for a.toolserver.org, b.toolserver.org, c.toolserver.org not available - https://phabricator.wikimedia.org/T103272#1895499 (Kghbln) > Would it make sense to switch dewiki Perhaps it would. But what about all the other wikis out there not run by WMF?
[19:10:57] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Zeeshan was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=237408 edit summary:
[19:30:42] YuviPanda: I'm wondering if we should just unpuppetize tools.admin completely
[19:31:03] and treat it as a tool rather than infra
[19:31:20] valhallasw`cloud: yup, that's what I'm doing now.
[19:31:29] valhallasw`cloud: remove the git::clone, and just add a cron
[19:31:36] YuviPanda: why a cron?
[19:31:43] just git pull manually after changes
[19:31:47] it's a separate git repo anyway
[19:32:17] that's fine too, but people have grown used to it auto updating
[19:32:24] since it has a git::clone now with ensure => latest
[19:32:31] people = scfc, you and me
[19:32:56] although tools.admin is a bit bigger
[19:33:10] yah
[19:36:48] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 964555 bytes in 2.986 second response time
[19:52:21] hello! just one thing. i am trying to connect to tool labs using transmit for mac. there is an option i was using a while ago which is 'mount as disk', very comfortable for programming. now it says "Internal error (rc=1) while attempting to mount the file system. For now, the best way to diagnose is to look for error messages using Console".
[19:52:46] any idea of what has changed and what can I do
[19:53:38] marmick: username?
[19:53:56] marcmiquel
[19:55:08] marmick: Dec 21 19:50:20 tools-bastion-01 sshd[29797]: Received disconnect from : disconnected by user
[19:55:29] marmick: so I think you'll have to dig into the logs for transmit to see what's happening...
[19:55:31] i did log in as in any other ftp. then i logged out
[19:55:42] but what didn't work is the function
[19:55:45] "mount as a disk"
[19:55:56] Ah.
[19:56:02] which is setting up the folder
[19:56:07] as if it was local
[19:56:52] marmick: right, so that sounds like an issue with Transmit. Try checking the Console, as suggested?
[19:56:54] and/or restarting
[19:58:39] ok. im on it
[19:58:43] thanks
[20:07:38] valhallasw`cloud: got it working again :)
[20:27:59] Labs, Discovery, Maps: Replacements for a.toolserver.org, b.toolserver.org, c.toolserver.org not available - https://phabricator.wikimedia.org/T103272#1895859 (Yurik) >>! In T103272#1895499, @Kghbln wrote: > Perhaps it would. But what about all the other wikis out there not run by WMF? I think we shou...
[20:28:52] !log ores depool ores-web-01 from lb
[20:28:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[20:35:27] do precise instances build any more? I'm at "error" twice now trying to create a small instance
[20:36:30] andrewbogott: chasemp ^
[20:36:40] thanks yuvi
[20:37:00] apergos: I think that’s my fault, give me a few minutes to sort out dhcp for new instances?
[20:37:03] sure sure
[20:37:08] I can even do this tomorrow
[20:37:27] I got the trusty one to build but maybe that was lucky timing
[20:45:46] I'm going to bow out of this channel again because freenode + autojoin of over 10 channels + pidgin is stupid
[21:47:41] PROBLEM - Puppet failure on tools-worker-05 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0]
[21:49:41] PROBLEM - Puppet failure on tools-worker-04 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [0.0]
[21:50:18] ^ is me
[21:52:33] RECOVERY - Puppet failure on tools-worker-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:30:21] YuviPanda: is it reasonable that a command that takes 7 to 15 seconds on my laptop takes 22 *minutes* when run on bastion? Is it mostly NFS latency? Fierce scheduling?
[23:30:35] (tools-bastion-01)
[23:36:04] abartov: uh, you shouldn't be running scripts/stuff on the bastion
[23:36:24] (and it really shouldn't be taking 22 minutes :/)
[23:36:35] I'm testing my tool and need to see some debugging output. In production, I run it via the grid.
[23:38:40] (looks like bastion may be thrashing: "KiB Mem: 8176828 total, 8034300 used, 142528 free, 61432 buffers")
[23:41:39] abartov: in that case, you should use tools-dev
[23:42:02] legoktm: ah! thanks. Okay, my flight is boarding. I'll revisit in ~10 hours.
[23:42:12] o/ have a safe flight!
[23:42:38] (how do I reach tools-dev?)
[23:46:13] abartov: dev.tools.wmflabs.org