[05:47:31] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Arifys was created, changed by Arifys link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Arifys edit summary: Created page with "{{Tools Access Request |Justification=develop my bot |Completed=false |User Name=Arifys }}" [08:19:56] Good morning! [09:23:35] 6Labs, 10Continuous-Integration-Infrastructure, 5Continuous-Integration-Scaling, 5Patch-For-Review: Instance creation fails - https://phabricator.wikimedia.org/T120586#1905895 (10Aklapper) p:5Lowest>3Normal [13:57:02] Hi, good morning [13:57:09] What happened with magnus-tool? [13:57:40] https://tools.wmflabs.org/magnus-toolserver/commonsapi.php?image=File:Ludwig%20van%20Beethoven%20-%20Symphonie%205%20c-moll%20-%201.%20Allegro%20con%20brio.ogg [14:08:34] YuviPanda, Coren: Is something wrong with the webservice setup? lighttpd jobs claim to be running, but visiting URLs claims "is not currently serviced". For example, https://tools.wmflabs.org/oauth-hello-world/ and https://tools.wmflabs.org/anomiebot/ [14:09:29] i experience the same [14:09:49] several others, but not all, are not serviced right now [14:10:14] https://tools.wmflabs.org/whois/ for example and mine too [14:10:44] :( [14:11:07] Again somebody is playing with the webservices setup [14:11:28] I hate this situation [14:11:55] Nothing I did, but if it's just redis having a headache, just restarting your webservice should fix it. [14:11:56] you can bring it back up [14:12:04] with a simple webservice stop and start [14:12:06] * Coren goes and does a mass poke. [14:12:23] It's not my webservice [14:12:45] However I am using this webservice in my application [14:13:44] and Magnus Manske has not been around here for a long time [14:14:15] Coren: could you restart this webservice? [14:14:25] Yeah, I'm about to. [14:14:58] if magnus tool is down, wikiradio is down too [14:15:28] you don't have access to toollabs? 
[14:15:35] yes [14:15:45] I have access, however, it's not my webservice [14:15:57] I can't do a "become commonsapi" [14:16:01] I'm poking all webservices now; this should wake them up. [14:16:08] Give it a few minutes. [14:16:23] ok [14:16:33] ready [14:16:35] (It's a lot of jobs to restart) [14:16:36] Coren: thanks [14:19:16] Coren: hm. Did webservices restart earlier without informing redis somehow? [14:19:36] let me grep the accounting log for anything obvious... [14:19:39] valhallasw`cloud: Not that I can see. It looks like redis amnesia to me. [14:20:55] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:21:17] ^^ unsurprising side effect [14:24:34] <_joe_> do you need assistance? [14:24:56] I don't know if it is related to this problem, however no sound file in Commons is working [14:25:25] The_Photographer: I assume you mean magnus-toolserver, not commonsapi? [14:25:25] <_joe_> the home is recovering I' [14:25:27] <_joe_> d say [14:25:46] valhallasw`cloud: no [14:26:03] valhallasw`cloud: try to play this file https://commons.wikimedia.org/wiki/File:Claude_Debussy_-_La_fille_aux_cleveux_de_lin_-_David_Hernando_Vitores_-_Kayoko_Morimoto_%28Wasei_Duo%29.ogg [14:26:15] that's not what I'm responding to [14:26:32] and playback in commons is completely unrelated to tool labs? [14:26:50] valhallasw`cloud: yes [14:26:57] <_joe_> The_Photographer: what valhallasw`cloud said; btw that works perfectly for me [14:27:15] The_Photographer: there is no tool called 'commonsapi', but there is a 'commonsapi.php' in 'magnus-toolserver' [14:27:22] is that what you were referring to? [14:27:33] valhallasw`cloud: yes, now it's working [14:27:36] ok [14:29:18] valhallasw`cloud: are you Magnus? 
[14:29:28] no [14:32:23] 6Labs, 10Tool-Labs: wikiviewstats webservice crashing all the time - https://phabricator.wikimedia.org/T122506#1906100 (10valhallasw) 3NEW [14:33:41] Coren: there seems to have been a mass webserver crash around 13:50 UTC [14:34:01] including magnus-toolserver [14:34:40] Hm. 20-odd minutes before I restarted them; but I don't get why none of them would have notified the proxy on restart. [14:34:41] and then there's the mass-rescheduling you did around 15:16 [14:34:44] er, 13:16 [14:34:52] no, 14:16 #timesarehard [14:34:55] no, it's 1h20 [14:35:48] Didn't you just say 13:50? [14:35:50] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 966112 bytes in 6.220 second response time [14:36:04] yes, UTC. It's now 14:35 UTC [14:36:27] That's not 1h20. :-) [14:36:43] #mathishardaswell :-D [14:36:51] you are, of course, correct [14:37:20] At any rate, the reschedule did force the proxy to receive the updates. :-) So whatever happened the first time wasn't just 'webservices crashed and restarted' [14:37:49] Because /that/ works as expected. [14:37:54] right. The SGE exit code is '0', but that doesn't tell us much [14:38:43] service.log says 2015-12-28T13:51:23.183129 No running webservice job found, attempting to start it [14:41:06] ugh, magnus' error log is 10GB >_< [14:42:19] 6Labs, 10Tool-Labs: Remove overly-large log files - https://phabricator.wikimedia.org/T122508#1906120 (10valhallasw) 3NEW [14:50:04] 6Labs, 10Tool-Labs: investigate webservices crash 28 dec 2015 - https://phabricator.wikimedia.org/T122509#1906131 (10valhallasw) 3NEW [14:52:25] valhallasw`cloud: my god [14:53:24] valhallasw`cloud: if magnus is not here, could you add me to the commonsapi.php project? [14:53:39] The_Photographer: no. [14:53:50] I have explained to you before how project management works [14:55:06] Coren, thanks, i am about here but headed to a place with normal internet. Will have to pick your brain on the mass restart mechanism. 
[14:55:37] Also, chasemp here i guess irc is unhappy with me :) [14:57:44] Guest58868: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Admin has a few notes, but it's SGE so most of the time the solution is 'get a list of jobs with information you don't need, cut/sed/etc to get an actual list, xargs qmod -rj' [14:58:44] * Coren guesses Guest58868 is a disguised Yuvi [14:58:53] Thanks valhallasw [14:58:59] chase :) [15:00:03] Oh. Misparsed "Also, (chasemp here) ..." as "Also, casemp[:] (here ...)" and figured that if you were talking to chasemp you couldn't *be* chasemp. :-) [15:00:50] Heh, trying to release my nick here brb [15:01:02] I can't see my source updated, what happened? [15:01:19] Guest58868: just use chasemp_ as nick? that's what I do when I'm fighting with nickserv :-) [15:01:32] I edited a PHP file, I restarted my webservice and the HTML is still old [15:01:39] It's not a client problem [15:01:47] I tested the problem on several machines [15:01:50] pc [15:06:16] The_Photographer: odd. Restarting the webservice should be unnecessary for php to begin with [15:08:22] The_Photographer: check the access.log/error.log for any information [15:09:10] valhallasw`cloud: git says "update" however it's still the old version [15:09:35] The_Photographer: I'm not sure what that means. Is the file on tool labs the one you expect? [15:09:58] i.e. if you view it with `less`? [15:11:29] valhallasw`cloud: the deployed php is old in comparison with the current file [15:12:05] right, so the issue is in your git deployment? [15:12:27] valhallasw`cloud: ready, a cache problem with the webservice, I restarted 4 times [15:13:30] The_Photographer: in "16:11 valhallasw`cloud: the deployed php is old in comparison with the current file ", what do you mean with 'the deployed php' and what with 'the current file'? [15:17:02] valhallasw`cloud: the published php was old in comparison with the source file [15:18:41] The_Photographer: which two things are you comparing? 
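(Editor's note: valhallasw's mass-restart recipe above, "get a list of jobs with information you don't need, cut/sed/etc to get an actual list, xargs qmod -rj", can be sketched as below. This is a hypothetical illustration: the qstat output is canned sample data, the job IDs are made up, and no grid engine command is actually invoked.)

```python
# Hypothetical sketch of the SGE mass-restart recipe from the log:
# reduce `qstat` output to bare job IDs, then feed them to
# `qmod -rj` (reschedule job). The qstat output here is canned; on a
# real grid master you would capture it with subprocess instead.
qstat_output = """\
job-ID  prior    name        user      state
--------------------------------------------
 123456 0.25000  lighttpd-a  tools.a   r
 123457 0.25000  lighttpd-b  tools.b   r
"""

# Skip the two header lines, keep the first column of each data row
# (the shell equivalent of the cut/sed step).
job_ids = [line.split()[0]
           for line in qstat_output.splitlines()[2:] if line.strip()]
print(job_ids)
# Real-world equivalent of `... | xargs qmod -rj` (not run here):
#   subprocess.run(["qmod", "-rj", *job_ids], check=True)
```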
[15:19:12] valhallasw`cloud: html generated with php and php source file [15:21:19] *which* php source file [15:21:25] the one you have locally or the one on tool labs? [15:59:54] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Jayantanth was modified, changed by Jayantanth link https://wikitech.wikimedia.org/w/index.php?diff=241909 edit summary: [16:02:04] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Jayantanth was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=241919 edit summary: [16:37:32] 6Labs, 10DBA, 5Patch-For-Review: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#1906273 (10jcrespo) @jcrespo The reason this is "hard" is because it involves replication filters and requires a mysql restart (downtime) so it has to be properly scheduled. This is a reminder fr... [16:51:14] 6Labs, 10Tool-Labs: investigate webservices crash 28 dec 2015 - https://phabricator.wikimedia.org/T122509#1906294 (10valhallasw) I'm leaving the 'why webservers crashed', and moving to 'why was the proxy incorrectly forwarding' now, and will use `magnus-toolserver` for that (we got IRC pings this tool was offl... [16:53:02] conclusion: 'ugh, redis'. 
[16:53:13] there's no obvious way to synchronise this :/ [16:59:09] (03PS26) 10Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 [16:59:46] 6Labs, 10Tool-Labs: Possible race condition in webservice HSET/HDEL - https://phabricator.wikimedia.org/T122515#1906301 (10valhallasw) 3NEW [17:01:40] (03CR) 10Ricordisamoa: "PS26 moves URL creation from EditManager (js) to EditDispatcher (py)" [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 (owner: 10Ricordisamoa) [17:02:36] (03CR) 10Ricordisamoa: "PS26 also partially fixes URLs for new entities" [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 (owner: 10Ricordisamoa) [17:34:24] (03PS27) 10Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 [17:36:12] (03CR) 10Ricordisamoa: "PS27 adds documentation to some private EditManager methods" [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 (owner: 10Ricordisamoa) [17:48:09] YuviPanda: about then? [17:48:17] wasn't sure if you were on mobile or not [18:01:18] (03PS28) 10Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 [18:04:14] (03CR) 10Ricordisamoa: "PS28 changes info.alert in EditManager#updateStatus() to an object and documents it" [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 (owner: 10Ricordisamoa) [18:19:28] chasemp: hey [18:20:39] YuviPanda: got a couple q's on this redis tools proxy situation but going to grab a bite, maybe we can reconvene in a bit? [18:21:07] chasemp: yeah, that'll also let me read backscroll [18:21:11] kk [18:23:28] valhallasw`cloud: Coren thanks for handling it! 
[18:23:36] np [18:45:14] 6Labs, 10Tool-Labs: Uwsgi breaks flask project-relative URLs - https://phabricator.wikimedia.org/T85362#1906581 (10Ricordisamoa) [19:15:01] valhallasw`cloud: I too have no idea why the webservice jobs all just died [19:16:08] YuviPanda: Given that they also managed to restart without the proxy knowing, it's really odd. [19:16:42] Coren: valhallasw`cloud has a very plausible theory for that involving a race condition [19:17:00] https://phabricator.wikimedia.org/T122515 that is [19:17:09] * Coren reads [19:17:51] Hah. How evil. [19:18:23] so this meant that webservicemonitor restarted them before the post-execute script was executed [19:18:31] so not fully sure why the mass die-off happened [19:19:01] Yeah; it seems iffy though that so many would end up having the order reversed; but the mass death is the more interesting problem. [19:21:32] 6Labs, 10Tool-Labs: Possible race condition in webservice HSET/HDEL - https://phabricator.wikimedia.org/T122515#1906645 (10coren) The only way around this that I can think of is for the post-exit script to first check that the kv that is actually there really /is/ the one it expects to delete before going ahea... [19:23:36] YuviPanda: So what we thought might have been a redis replication fail when you did the switchover might have been another case of this? [19:24:09] Hm. Not really; we had /wrong/ entries that time and not missing ones. [19:25:36] Coren: no, I think it's two different causes manifesting in similar places. [19:39:11] 6Labs, 10Tool-Labs: Possible race condition in webservice HSET/HDEL - https://phabricator.wikimedia.org/T122515#1906689 (10scfc) We could use a different data structure in Redis. Currently: ``` 127.0.0.1:6379> HGETALL prefix:admin 1) ".*" 2) "http://tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs:47897" 127.... 
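(Editor's note: the suspected race in T122515 and the check-before-delete fix Coren proposes above can be sketched as follows. A plain dict stands in for the redis hash `prefix:<tool>`; the function names, hosts, and ports are illustrative, not the actual tool labs code.)

```python
# Sketch of the suspected HSET/HDEL race and the check-before-delete
# fix. A dict stands in for the redis hash `prefix:<tool>`; all names,
# hostnames, and ports below are made up for illustration.
routes = {}

def register(pattern, backend):
    """Webservice start: HSET the route for this tool (pattern -> backend)."""
    routes[pattern] = backend

def deregister_unchecked(pattern):
    """Naive post-exit script: HDEL unconditionally (racy)."""
    routes.pop(pattern, None)

def deregister_checked(pattern, backend):
    """Fixed post-exit script: only delete the entry we ourselves created."""
    if routes.get(pattern) == backend:
        del routes[pattern]

# The race: webservicemonitor restarts the webservice (which re-registers)
# *before* the dying job's post-exit script has run.
register(".*", "http://old-host:47897")   # original webservice job
register(".*", "http://new-host:51234")   # restarted job registers first
deregister_unchecked(".*")                # stale post-exit: route lost!
assert ".*" not in routes                 # proxy now has no backend at all

# With the checked variant, the stale delete becomes a harmless no-op.
register(".*", "http://old-host:47897")
register(".*", "http://new-host:51234")
deregister_checked(".*", "http://old-host:47897")
assert routes[".*"] == "http://new-host:51234"
```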
[19:53:09] 6Labs, 10Tool-Labs: Possible race condition in webservice HSET/HDEL - https://phabricator.wikimedia.org/T122515#1906725 (10scfc) I'm not sure if my memory serves me correctly, but I think we previously did just that with `portgranter`, i. e. only remove the entry that was created, and we ran into problems when... [20:07:10] 6Labs, 10WikiProject-X, 7Tracking: New Labs project: WPX - https://phabricator.wikimedia.org/T122534#1906771 (10Harej) 3NEW [20:10:35] YuviPanda: on the proxy hosts in tools do they each use a local redis instance respectively then? [20:10:49] 6Labs: nfs-exports.service is failing on labstore1001 often - https://phabricator.wikimedia.org/T122250#1906787 (10yuvipanda) https://gerrit.wikimedia.org/r/#/c/261266/ for mitigation, since most of the failures seem to be for the really large tools project (which has more than 100 instances) [20:11:17] chasemp: yeah. [20:11:26] chasemp: so what happened last week was different than this week [20:11:38] chasemp: last week was a failover issue [20:11:56] while this week https://phabricator.wikimedia.org/T122515 [20:12:37] so I was leaning towards asking, it seems we are using redis::legacy and not redis::instance [20:12:41] or I thought but [20:12:44] service redis-instance-tcp_6379 status [20:12:50] is running [20:13:21] chasemp: really? I thought we switched [20:13:29] I remember switching too [20:14:16] maybe I'm misunderstanding [20:14:33] so tools-redis is what's using redis::legacy [20:14:37] the proxy is in the dynamicproxy class [20:15:00] ok [20:15:56] chasemp: https://phabricator.wikimedia.org/T119936 is ultimate solution I think [20:16:08] the redis solution was fine when no similar alternatives existed, but now they do... [20:16:33] is it this same mechanism for the in-tools proxy and the general labs one? 
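(Editor's note: for context on the question above, here is a toy model of the route lookup both proxies perform, based on the `prefix:admin` hash structure scfc pasted earlier. The dict, helper name, and the simplification to exact `.*` matching are mine; the real implementation is nginx + lua querying redis.)

```python
# Toy model (my own simplification) of the proxy lookup: the tools proxy
# keys on the first path segment of the request URL and resolves it via
# a redis hash like `prefix:admin` mapping a pattern to a backend URL.
routes = {
    "prefix:admin": {
        ".*": "http://tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs:47897",
    },
}

def backend_for(path):
    """Return the backend URL for a request path like '/admin/foo', or None."""
    tool = path.lstrip("/").split("/", 1)[0]
    patterns = routes.get("prefix:" + tool, {})
    # The real lua code matches path patterns; ".*" is the common case.
    return patterns.get(".*")

print(backend_for("/admin/tools"))
```

The general labs proxy differs only in keying on the request's domain instead of the URL prefix, as the log notes just below.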
[20:16:54] yeah [20:17:00] they just have different routing logic [20:17:05] domain based vs URL based [20:17:19] underlying mechanism is the same (nginx + lua + redis) [20:19:43] valhallasw`cloud: thanks for the investigation but also the detailed task stuff, helps immensely [20:29:29] so what is dumps-1.dumps.eqiad.wmflabs I wonder [20:33:01] hydriz 9763 11378 0 Dec17 pts/3 00:47:22 python runner.py --verbose [20:33:02] hammering nfs [20:33:30] chasemp: usually I just kill it and notify the project owners. [20:34:15] chasemp: Nemo_bis seems to be an admin [20:35:41] I mean it's not generally a fault thing as they are using a file system naively (in a reasonable way, I mean) but [20:35:51] nfs isn't really that robust throughput-wise so projects bump into each other [20:36:01] and this one project is using the lion's share and triggering alerts [20:36:02] * YuviPanda nods [20:36:05] also this is labstore1003 [20:36:07] so not as big a deal [20:36:14] since it's readonly [20:36:19] right [20:37:28] I'm not sure the alert level makes sense tbh [20:37:38] I wonder if labstore1003 is caught in the web of alerts for the 1001/1002 use case [20:38:38] it did not page though, right [20:38:41] i did not get anything at least [20:38:52] no but alerted in irc etc [20:38:58] chasemp: yeah, I think we should increase limits for labstore1003 [20:39:01] over threshold, but threshold for what [20:39:07] we only have 2 levels of notifications currently [20:39:26] sure, I mean to say it pinged us but nothing is wrong really [20:39:32] ah, the thresholds, right [20:40:35] YuviPanda: why do we use nfs for a ro use case? [20:41:04] chasemp: it's providing dumps and other datasets. 
they're very large and updated from prod dumps [20:41:18] sure but prod dumps use http right [20:41:47] I'm wondering why nfs as tech for something where we never do what we need nfs for [20:41:54] or [20:41:58] could we just kill nfs use here [20:42:03] and serve over http [20:43:34] chasemp: lots of tools read it, so if everyone reads it over HTTP that's a lot of waste and work. Plus many of the individual dumps are pretty large, so if I curl and cache that fills up my instance right there. and ofc, for tools it's even worse... [20:43:46] and our http interface for dumps is also throttled [20:43:48] I think [20:43:59] which we should be doing here I imagine but also [20:44:10] are the downsides to http (or a prod-like setup) more than the downside of having more nfs? [20:44:28] honest question [20:44:45] fwiw: dumps can also be copied from prod via rsync, and it's just adding another source and they could also rsync them [20:45:11] I'm catching up now, maybe we can't do nfs for this etc [20:45:16] everyone is using this as their local copy of the dumps [20:45:25] I'm bending my brain around that now [20:45:26] yeah, that's what it's intended to be used as :) [20:45:45] before my time, so I don't know the original rationale. [20:45:57] but that's what everyone uses now. maybe apergos or Coren knows more? [20:46:01] nfs all the things maybe [20:46:10] I do know that getting people to use HTTP instead of this is a losing battle. [20:46:25] because ppl or machines? 
[20:46:35] Hardly; the reason this was used was twofold: the dumps machine that did http at the time was having pain [20:47:12] The second reason is that a lot of people already had ts-related code that did naive file operations that could not be changed easily to http [20:47:29] ahh, public dumps was in a weird place at the time [20:47:53] what does the second one mean, they had code expecting local copies that used ts [20:47:54] pre-nfs [20:47:59] That, but also there wasn't a dedicated server for lab dumps; they were sent to the file servers directly [20:48:06] and using nfs was a way to preserve that and work around the buggy public dump server? [20:48:16] chasemp: Basically [20:48:22] ok thanks, that sheds light [20:49:03] Coren: does the threshold tripping on labstore1003 mean trouble as is? [20:49:16] or is it caught in the web of looking like a labstore but doing totally one-off things [20:49:40] Well, it's a labstore insofar as it serves storage over NFS to labs [20:49:57] But I would say you can be much more forgiving on 1003; it's all dumps and only dumps. [20:50:32] understood, but the tech and the function being distinct in this case is a big divide [20:50:32] ok [20:50:38] YuviPanda: is that task harming something? [20:50:56] can probably be kill -STOP'ed for a few hours with no damage [20:51:11] not fully sure, chasemp was investigatingish [20:51:12] Nemo_bis: it was tripping threshold/use alarms for labstore1003 is all [20:51:16] In particular, nothing *but* dumps users are impacted by dumps users giving a big load. [20:51:26] I think those alarms are probably too low [20:51:49] chasemp: Probably; they were meant to be conservative but can probably be relaxed for 1003 especially [20:52:30] Well, archiving stuff on archive.org helps us *reduce* the load [20:52:42] yeah I think we should just up the threshold [20:52:48] Coren: the original public dumps being less savvy, why not repurpose this server as a mirror but same mechanism? [20:53:02] i.e. 
why didn't we go w/ this as an http mirror and ppl keep their local copy mechanics [20:53:06] why move to nfs back then [20:53:17] the average is hardly 20 MB/s, quite low if you ask me :) https://tools.wmflabs.org/nagf/?project=dumps#h_overview_network-bytes [20:53:32] chasemp: Because storage. In practice, you ended up with a bazillion copies of the current dump all over the fs. [20:53:52] so it was a solve two problems with one nfs sized stone thing [20:53:56] chasemp: But that can probably be reconsidered now that Ariel overhauled the dumps. [20:54:07] understood [20:54:43] chasemp: Well, it was gluster at the time. One big storage unit; so we said "have 1 copy of the dumps > have N copy of the dumps where N is unbounded" [20:55:02] I didn't know glusterfs ever lived here tbh [20:55:15] chasemp: Then we got rid of gluster; then we split off dumps. [20:55:29] gotcha [20:55:36] chasemp: Ryan was optimistic in the early days that it could be made to work reliably. [20:55:44] which thing? gluster? [20:55:48] * Coren nods. [20:56:11] ah [20:56:17] Funnily enough, moving to NFS was a *major* reliability improvement. Hilarious in retrospect. :-) [20:56:29] well nfs as a ro shared data drive [20:56:51] I meant over gluster in general. Once upon a time, homes and /data/project were there. [20:57:09] * Coren still has nightmares about split brains. [21:12:22] 6Labs: nfs-exports.service is failing on labstore1001 often - https://phabricator.wikimedia.org/T122250#1907035 (10chasemp) for posterity I did a bunch of digging on this to come to the above: > sudo journalctl -u nfs-exports.service --no-pager > /home/rush/nfsexport.log ``` Dec 22 15:03:51 labstore1001 nfs-e... [21:13:53] chasemp: are you able to ssh into tools-redis-01? [21:14:30] hung [21:14:36] accept key and then hangs [21:15:07] interesting [21:15:34] by that I mean ofc 'crap' [21:15:50] ideas? 
[21:15:53] chasemp: paravoid had a way of getting shell on the instances btw [21:16:09] chasemp: so this is unrelated to instances getting hung on k8s since this one is on trusty and different kernel. [21:16:10] virsh console? [21:16:13] and userspace is fine [21:16:15] yeah [21:16:24] VNC [21:17:05] chasemp: so salt has been super reliable now. I can hit that instance from salt [21:17:36] well, for the instances it knows about :) [21:17:39] that's cool tho [21:17:41] yeah [21:17:46] chasemp: super low expectations :D [21:17:57] anyway [21:18:11] I still clock it at about 75-80% of instances on its best day but that's a quasi-salt problem [21:18:13] anyhoo [21:18:17] are you on the console now? [21:18:19] no [21:18:23] I don't know how to get on console :D [21:18:28] do you? I can reset root pw.. [21:18:33] I was working on de-NFSing that [21:18:48] I don't know if it was done, then no [21:19:18] I should bug him to document it better next time heh [21:19:28] I'm going to decom it and do a switchover [21:19:34] and leave it be for investigation [21:19:35] k [21:19:53] YuviPanda: you said you can run commands on it? [21:19:57] via salt [21:20:12] chasemp: yeah [21:20:18] salt 'tools-redis-01.tools.eqiad.wmflabs' cmd.run [21:24:03] * YuviPanda logs out and back into wikitech [21:24:18] give me a sec w/ it I guess [21:24:38] chasemp: oh yeah I am not touching it atm. feel free to play with it as long as the redis process doesn't crash :D [21:24:44] am just setting up a new instance now [21:24:50] since I've wanted to move it off anyway [21:27:01] !log tools created tools-redis-1001 [21:27:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [21:30:45] chasemp: I need to disable puppet on both the redises, let me know when I can do that (you can continue poking after) [21:30:54] go ahead [21:31:14] chasemp: ok [21:31:53] !log disable puppet on tools-redis-01 and -02 [21:31:53] disable is not a valid project. 
[21:32:01] !log tools disable puppet on tools-redis-01 and -02 [21:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [21:39:03] YuviPanda: is it possible console is stuck as well [21:39:24] chasemp: not sure. but redis is still accessible I think [21:39:26] and it responds with pong [21:57:34] PROBLEM - Puppet failure on tools-redis-1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [22:01:57] 10PAWS, 7Documentation: Write inline documentation - https://phabricator.wikimedia.org/T122545#1907242 (10awight) [22:31:18] !log tools disable NFS on tools-redis-1001 and 1002 [22:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [22:34:49] !log attempt to unmount nfs volumes on tools-redis-01 to debug but it hangs (I am on console and see root at console hang on login) [22:34:49] attempt is not a valid project. [22:34:56] !log tools attempt to unmount nfs volumes on tools-redis-01 to debug but it hangs (I am on console and see root at console hang on login) [22:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [22:39:38] PROBLEM - Puppet failure on tools-redis-1002 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [22:41:34] hello [22:43:16] any chance of a sudo apt-get install mono-complete ? [22:44:07] pengo: it used to be there in the past but was removed as some people believe that labs shouldn't contain development files for whatever reasons [22:44:16] err? [22:44:28] pengo: sure, file a bug, I can do it in about 20-30mins [22:44:29] what? 
[22:44:33] in past = ~3 years ago [22:44:36] ok cool [22:44:40] the 'err' was targeted at pengo [22:44:42] err [22:44:44] petan [22:44:46] ok [22:45:13] I wasn't really confused before but now I am :P [22:45:18] i can compile on my pc, just thought that someone else might want to edit my source code at some point :) [22:45:28] yes I thought the same few years ago... [22:45:59] btw YuviPanda that discussion about mono-complete not being supposed to be in labs is actually probably even logged on irc, I didn't make it up [22:46:18] it was Core n who opposed it afaik [22:47:14] ok well i'll write a ticket up anyway [22:52:02] 6Labs, 10Tool-Labs, 7Tracking: Request for mono-complete package on wmflabs - https://phabricator.wikimedia.org/T122551#1907357 (10Pengo) 3NEW [22:53:09] pengo: I just checked, mono-complete is actually already installed [22:53:14] oh [22:53:30] it is too [22:53:32] 6Labs, 10Tool-Labs, 7Tracking: Request for mono-complete package on wmflabs - https://phabricator.wikimedia.org/T122551#1907367 (10yuvipanda) I actually just checked - mono-complete is actually installed (via `exec_environ.pp`) and should be there on all instances. [22:53:43] well i'm getting this error https://stackoverflow.com/questions/10490155/unable-to-run-net-app-with-mono-mscorlib-dll-not-found-version-mismatch [22:53:45] 6Labs, 10Tool-Labs, 7Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#1907371 (10yuvipanda) [22:53:47] 6Labs, 10Tool-Labs, 7Tracking: Request for mono-complete package on wmflabs - https://phabricator.wikimedia.org/T122551#1907368 (10yuvipanda) 5Open>3Invalid a:3yuvipanda marking as invalid :) [22:54:33] oh.. maybe it's mono-devel i need [22:55:05] that's installed too [22:55:11] all the packages mentioned there are installed [22:55:22] pengo: where are you getting the error? with jsub? [22:55:27] or running it on tools-login itself? 
[22:55:37] xbuild [22:55:38] if so, try adding -l release=trusty to jsub to make it run on trusty and see if that goes away [22:56:29] i can actually run the compiled .exe (compiled on another machine) so i guess that lib must be there somewhere [22:57:59] Hey to get that replica.my.cnf, I need to create a new tool via https://wikitech.wikimedia.org/w/index.php?title=Special:NovaServiceGroup&action=addservicegroup&projectname=tools ? [23:04:56] pengo: how are you running it? [23:04:58] xzise: yes [23:05:26] YuviPanda, don't hate me.... [23:05:27] ok thanks [23:06:07] YuviPanda, [23:06:17] that's my index.php [23:06:35] i'm sure there's a better way [23:06:39] awww :D [23:06:49] pengo: I don't know what passthru does [23:07:14] pengo: I wonder if PHP is just not passing enough environment [23:07:31] pengo: you should try http://www.mono-project.com/docs/web/fastcgi/lighttpd/ [23:07:32] it execs a command line and sends the output to the webpage [23:07:41] you can add lighttpd config to .lighttpd.conf I think [23:07:43] nah that runs fine [23:07:46] and try out with webservice restart [23:07:55] it's compiling it on the server that i can't do [23:08:11] but it's fine if i upload the .exe [23:08:15] oh I see [23:08:23] wait why are you compiling it via PHP? [23:09:06] i'm not [23:09:14] that's just how i'm running it [23:09:22] xbuild /p:Configuration=Release every.csproj [23:09:26] to compile [23:09:40] ah [23:09:43] so that's what's failing [23:09:56] ok, unfortunately I have to say I don't know enough mono to figure out what's going on :( [23:11:00] src/Categories.cs(135,28): error CS0246: The type or namespace name `Tuple' could not be found. Are you missing an assembly reference? [23:11:11] maybe Tuple is a new class [23:11:29] i could just get rid of it [23:14:06] pengo: hmm yeah maybe we have an older version of mono (the version in ubuntu trusty) [23:14:50] they're up to T now? 
[23:15:03] they're up to X now [23:15:07] T is from almost a year and a half ago now [23:15:18] oh [23:16:48] guess it's been a while since I paid attention to their release names [23:18:15] so i guess they'll upgrade to 16.04 eventually? [23:18:26] cause it's not an urgent issue :) [23:20:02] we'll probably provide debian jessie at some point [23:20:05] which has newer packages [23:20:17] and docker container support at which point we can separate package availability from distro [23:20:56] oh cool [23:24:15] and umm.. if i want to rename my tool i should just make a new one and request the old one deleted, right? [23:25:40] yeah [23:25:48] every-other-wiki-has was a good idea for a name at the time [23:33:42] hmm.. my page currently takes 103.8413296 seconds to execute.. i should probably do something about that [23:38:08] 6Labs, 10Tool-Labs, 7Tracking: Request for mono-complete package on wmflabs - https://phabricator.wikimedia.org/T122551#1907492 (10Pengo) Ah probably actually an outdated library or something. Thanks for checking. [23:39:44] (03PS29) 10Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 [23:42:22] (03CR) 10Ricordisamoa: "PS29 fixes EditDispatcher in demo mode" [labs/tools/wikidata-slicer] - 10https://gerrit.wikimedia.org/r/241296 (owner: 10Ricordisamoa)