[00:09:17] Getting 503 errors when trying to log into mobile site on Beta Labs:
[00:09:28] Request: GET http://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin&returnto=Main+Page&returntoquery=welcome%3Dyes, from 127.0.0.1 via deployment-cache-mobile03 deployment-cache-mobile03 ([127.0.0.1]:3128), Varnish XID 196729703
[00:09:36] Error: 503, Service Unavailable at Wed, 05 Nov 2014 00:07:14 GMT
[00:09:56] It been broken for a while now
[00:10:59] andrewbogott_afk: ^
[00:11:13] http://en.m.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin
[00:38:15] Wikimedia Labs: http://en.m.wikipedia.beta.wmflabs.org is down - https://bugzilla.wikimedia.org/72997 (Rummana Yasmeen) NEW p:Unprio s:normal a:None http://en.m.wikipedia.beta.wmflabs.org is down. Could not test anything on VE before wmf7 deployment.
[00:43:56] Wikimedia Labs: http://en.m.wikipedia.beta.wmflabs.org is down - https://bugzilla.wikimedia.org/72997#c1 (Daniel Zahn) appears to work for me over here, i see the Main_Page http://downforeveryoneorjustme.com/en.m.wikipedia.beta.wmflabs.org
[00:45:41] Wikimedia Labs: http://en.m.wikipedia.beta.wmflabs.org is down - https://bugzilla.wikimedia.org/72997#c2 (Ryan Kaldari) If I try to go to http://en.m.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin I get a 503 error.
[00:46:26] Wikimedia Labs: http://en.m.wikipedia.beta.wmflabs.org is down - https://bugzilla.wikimedia.org/72997#c3 (Ryan Kaldari) It's been broken pretty much all day.
[00:50:41] Wikimedia Labs: http://en.m.wikipedia.beta.wmflabs.org is down - https://bugzilla.wikimedia.org/72997#c4 (Rummana Yasmeen) Created attachment 17025 --> https://bugzilla.wikimedia.org/attachment.cgi?id=17025&action=edit Screenshot
[00:53:57] Wikimedia Labs: http://en.m.wikipedia.beta.wmflabs.org is down - https://bugzilla.wikimedia.org/72997#c5 (Daniel Zahn) i see.
yea, confirmed error page on http://en.m.wikipedia.beta.wmflabs.org/w/index.php?title=Special:UserLogin i just looked at main page and thought "down" means completely unreachabl...
[01:04:41] kaldari: The hhvm error log on deployment-mediawiki01 says "Fatal error: Call to undefined method WebResponse::getheader() in /srv/mediawiki/php-master/extensions/MobileFrontend/includes/MobileContext.php on line 1024"
[01:24:26] Wikimedia Labs / deployment-prep (beta): no log in deployment-bastion:/data/project/logs from "503 server unavailable" on beta labs - https://bugzilla.wikimedia.org/72275#c5 (Bryan Davis) Fatals come from the apache or hhvm process logs. It looks like we haven't recorded any new apache2.log data in the...
[02:18:35] hi, I need to run a script that uses 7z, 7z is installed in tools-login but not in tools-exec-* instances, can someone install 7z in exec instances?
[02:20:04] danilo: yea, but we'll have to make a patch for it in gerrit
[02:20:10] to make puppet install it
[02:20:39] hmm, actually see this:
[02:20:44] manifests/exec_environ.pp: 'p7zip',
[02:20:52] manifests/dev_environ.pp: 'p7zip-full', # requested by Betacommand to extract files using 7zip
[02:21:04] that looks like it should be on exec as well
[02:21:11] just not the "-full" variety
[02:22:19] the non-full variety says it provides 7zr, but p7zip-full provides 7z and 7za which support more compression formats.
[02:24:55] I'm using https://github.com/halfak/Mediawiki-Utilities to make a search in dumps, and it uses 7z for to decompress de dump
[02:25:32] *to decompress the dump
[02:25:56] danilo: i'm making a patch..
[02:26:17] ok, thank you
[02:26:18] that is reasonable to do the same we already do in other places
[02:27:47] danilo: https://gerrit.wikimedia.org/r/171192 just quoted you, ok?
[02:28:16] ok
[03:59:34] Beta cluster having any issue?
[04:07:43] kart_: it 503s for me
[06:10:52] (copied from #wikimedia-qa) hey everybody, seems like beta labs has gone south, everything is 503 error, even load.php and http://en.wikipedia.beta.wmflabs.org/w/api.php
[06:12:25] Nov 5 06:12:11 deployment-mediawiki02 hhvm: Failed to initialize central HHBC repository:#012 Failed to initialize schema in /run/hhvm/cache/fcgi.hhbc.sq3: RepoQuery::step(repo=0x7f0a7a419800) error: 'CREATE TABLE main.Unit_559716689_1415146855(unitSn INTEGER PRIMARY KEY, md5 BLOB, preload INTEGER, bc BLOB, data BLOB, UNIQUE (md5));' --> (13) database or disk is full#012 Failed to open /var/www/.hhvm.hhbc: 14 - unable to open database file
[06:14:41] !log deployment-prep restarted hhvm on beta app servers
[06:14:46] Logged the message, Master
[07:28:42] RECOVERY - ToolLabs: Puppet failure events on labmon1001 is OK: OK: All targets OK
[07:29:40] YESSS
[07:29:41] finally
[07:29:53] andrewbogott_afk: Coren ^ fixed all the trusty errors :) libvips is still a problem, though...
[09:55:28] RECOVERY - ToolLabs: Puppet freshness check on labmon1001 is OK: OK: All targets OK
[10:04:50] !ping
[10:04:50] !pong
[10:06:58] !log tools cleaned out pacct and atop logs on tools-login
[10:07:04] Logged the message, Master
[10:26:10] hello all. Where should i request new packages ? MongoDB
[10:27:18] YuviPanda: you got an email, need to finish gdal install.
[10:27:43] there's no mongodb in toollabs, and probably won't be for a while...
[10:28:05] yug: the images you sent don't actually load in my email. copy pasting the text would be better, I think.
[10:28:26] yug: also we can't install qgis, since it's a GUI application, and servers can't really have any GUI applications
[10:28:37] (my understanding of qgis, at least)
[10:30:00] Yes, correct for qgis, it s a gui. Ok for mongo too.
I am still discovering lab's rules and boundaries :) YuviPanda
[10:30:07] :)
[10:33:27] YuviPanda: For gdal, a google search show me the package 'python-gdal' for trusty and other ubuntus.
[10:33:37] sure, that can be installed easily.
[10:34:28] YuviPanda: ok, it seems to include the bunch of gdal_*.py i am looking for.
[10:34:40] alright, let me install it...
[10:34:55] ping me when done, so i retest my scripts.
[10:36:09] yug: ok
[10:36:14] yug: which machine are you testing them on?
[10:36:37] @tool-trusty
[10:37:12] YuviPanda: i will ping arun on that as well
[10:37:17] yug: ok
[10:37:36] yug: tools-trusty has python-gdal now
[10:38:05] YuviPanda: Ok, let's release the beast!
[10:41:49] YuviPanda: Script running under my eyes, gdal py passed (at first look).
[10:41:56] yug: yay :)
[10:42:06] we'll have to put it on the grid later, but good start now :)
[10:43:58] YuviPanda: Other issues are local, i have to edit my code to switch from global node module to local node modules (topojson)
[10:44:07] ah, yeah :)
[11:07:38] YuviPanda: I see gdal generated the requested files. How could i browse, open or download these file (png images) to See if they are correct?
[11:08:53] YuviPanda: i write you a short email with the path.
[11:15:30] YuviPanda: Done.
[11:21:09] akoopal: are you about?
[11:21:26] akoopal: http://tools.wmflabs.org/erwin85/randomarticle.simple.php is giving an "internal error"
[11:21:51] thedj ^^
[12:02:58] Tpt: just salut :)
[12:33:43] sDrewth: lemme take a quick look.
[12:34:48] thanks
[12:37:02] 'backend is overloaded'
[12:38:37] k :-/
[12:40:02] i can try to stop/start the webservice..
[12:40:12] * thedj doesn't know this much about tools to be honest...
[12:41:01] sDrewth: better ?
[12:41:19] ugh, i see rly need to fix the css there as well.
[12:43:19] yep, better, thx
[12:43:28] won' know how long..
[12:43:52] at enWS we use it to enable it to a pick a page to proofread
[12:45:04] the problem with all these unmaintained tools is that they all do something different from the next....
[12:45:20] i mean the skin + css in itself is a freaking disaster area.
[12:45:25] yep
[12:46:05] space in numbers of cases
[12:46:42] still better than nothing, and the way that it had deteriorated at toolserver
[12:47:07] yup, at least you can easily transfer tools between interested parties now.
[12:48:15] and restarting tools can be done by following instructions, whereas hacking not so
[12:49:38] so a doofus like me can restart certain tools, leaving others to code
[12:51:53] the better question is, why do we need to restart websservices all the time.
[13:00:07] I will leave that question for those who are technically capable of answering it, because my answer is "to make the tools start working again"
[13:07:57] thedj: Most of the time, it's coding errors that accumulate - tools that do not properly check that resources they allocated at the start remain available (DB connections, logins, etc), or resource leaks.
[13:14:45] Coren: can we put that explanation in the FAQ ?
[13:15:39] because as a 'can you fix this tool? here are the rights'-maintainer of abandoned tools, some of this stuff is a bit difficult to figure out :)
[13:50:40] Coren: Hey! Do you know if https://bugzilla.wikimedia.org/show_bug.cgi?id=72154 is Labs or Datasets territory?
[13:54:07] * Coren checks.
[13:56:29] "Yes". It's one or the other. :-) Lemme dig a bit and try to figure out which.
[13:58:07] ... actually, I can't figure out which one is failed because I can't see something failed.
[13:58:55] Wikimedia Labs / tools: Dumps not updating again. - https://bugzilla.wikimedia.org/72154#c2 (Marc A. Pelletier) dewiki, at least, contains 20141024 updated Nov 5. Do you have a specific example of a failed dump?
[14:04:23] Coren: has the DB corruption been fixed yet?
[14:04:36] See email by sean on labs-l
[14:06:45] thanks, wasnt sure if those had been implemented or where still being discussed
[14:07:23] There's still discussion about how to best limit overlong queries, but those are orthogonal to the actual repairs.
[14:18:25] Wikimedia Labs / Other: Can't create account on phab-01: 'worker_activetask' is full - https://bugzilla.wikimedia.org/72403#c2 (Andre Klapper) NEW>RESO/WOR Works for me; I was able to create yet another account there. Please reopen if still an issue
[14:19:10] Wikimedia Labs / deployment-prep (beta): Mobile redirect goes to wrong domain name on beta labs - https://bugzilla.wikimedia.org/71079#c13 (Andre Klapper) rkaldari: Can you still reproduce your issue here? What are the exact steps that Brion can try?
[14:49:11] Wikimedia Labs / deployment-prep (beta): HHVM fcgi restart during scap runs cause 503s (and failed tests) - https://bugzilla.wikimedia.org/72366#c22 (Chris McMahon) *** Bug 73010 has been marked as a duplicate of this bug. ***
[15:56:58] Hi, anyone knows why this tool is down? http://tools.wmflabs.org/grantmaking/geo-data-prototype.html
[15:59:41] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%) WARN: tools.tools-login.diskspace._var.byte_avail.value (100.00%)
[16:05:33] icinga-wm: thats a curious URL - what is on it ?
[16:05:51] oh sorry - Oscar_
[16:06:52] darkblue_b: stats of wiki editors
[16:07:18] * darkblue_b looks at http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Report,_June_2014
[16:07:36] darkblue_b: per country*
[16:07:53] what about French islands ?
[16:07:58] are they listed as France ?
[16:08:16] something like stats.wikimedia.org but more in detail
[16:08:19] at any rate, I am interested in the tech stack
[16:10:10] maybe something like this ..
http://meta.wikimedia.org/wiki/Wikimedia_Foundation_Report,_June_2014#mediaviewer/File:2013-14_Funding_by_location.jpg
[16:18:25] Wikimedia Labs / deployment-prep (beta): HHVM fcgi restart during scap runs cause 503s (and failed tests) - https://bugzilla.wikimedia.org/72366#c25 (Bryan Davis) PATC>NEW I have removed the hhvm restart step from scap for now. I'm pretty sure that we will want to put it back in some form in the...
[16:45:54] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (10.00%) WARN: tools.tools-login.diskspace._var.byte_avail.value (100.00%)
[17:01:08] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%) WARN: tools.tools-login.diskspace._var.byte_avail.value (25.00%)
[17:26:49] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[17:33:17] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%)
[17:55:27] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%)
[18:21:05] Coren: could you add the trusty.tools fingerprint to https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints ?
[18:21:47] legoktm: Yes, I most certainly could. :-)
[18:21:53] (Will do today)
[18:21:55] thanks
[18:22:09] I'll just hope I connected to the right server :P
[18:22:39] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[18:41:53] andrewbogott: alaso, libvips-tools is another package we can't install on trusty because of package issues
[18:41:55] can you take a look
[18:42:08] sure
[18:42:20] tools-trusty?
[18:42:23] Or a different host?
[18:45:36] andrewbogott: yeah
[18:45:37] andrewbogott: same host
[18:45:46] andrewbogott: I have it disabled in puppet now, but we should turn it back on
[18:45:48] andrewbogott: also libvips-dev
[19:06:53] hey there, is this better now, Yuvi, Andrew or Coren?
[19:06:55] https://gerrit.wikimedia.org/r/#/c/171192/2
[19:07:04] * Coren looks.
[19:07:26] thanks:)
[19:12:02] danilomac: you are "danilo" ?
[19:12:21] if yea, then that was that change for 7z you requested and it got merged
[19:14:34] YuviPanda: do you know where libvips-tools came from? Did I build it?
[19:14:48] Did you?
[19:15:29] !log tools exec nodes have p7zip-full now
[19:15:31] Package[p7zip-full]/ensure: ensure changed 'purged' to 'latest'
[19:15:32] Logged the message, Master
[19:17:07] mutante: ok, thanks!
[19:21:36] danilo: yw, it should work now, well i ran puppet on tools-exec-03
[19:36:48] mutante: my script still run in tools-login but not in grid, but now the error is "KeyboardInterrupt"
[19:37:35] do I need to add a argument to jsub to run in tools-exec-3?
[19:38:49] danilo: eh.. i don't know, sry
[19:42:36] the 7z is installed in all exec isntances or just in exec-03?
[19:47:09] hum, it is in all instaces, it is maybe an error in my script...
[20:06:22] it was lack of memory, I put '-mem 500m' in jsub and now it works :)
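
Editor's sketch, not part of the log: the 7z thread above ends with a grid job that only ran once it was given more memory. One common way to keep a dump-scanning script's memory use low is to have 7z write the extracted data to stdout and read it one line at a time. The Python below illustrates that pattern under stated assumptions. The function names (build_7z_command, stream_lines, count_matches) are hypothetical and are not taken from Mediawiki-Utilities. The only 7z options used are `e` (extract) and `-so` (write to stdout).

```python
import subprocess
import sys


def build_7z_command(archive_path):
    """Return the argv that asks 7z to extract an archive to stdout."""
    # `e` extracts; `-so` sends the extracted data to stdout instead of disk.
    return ["7z", "e", "-so", archive_path]


def stream_lines(argv):
    """Run argv and yield its stdout one decoded line at a time.

    Only one line is held in memory at once, so the size of the
    decompressed data does not determine the script's memory use.
    """
    process = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    try:
        for raw_line in process.stdout:
            yield raw_line.decode("utf-8", errors="replace").rstrip("\r\n")
    finally:
        # Always release the pipe and reap the child, even on early exit.
        process.stdout.close()
        process.wait()


def count_matches(lines, needle):
    """Count the lines that contain needle, consuming the iterable lazily."""
    return sum(1 for line in lines if needle in line)


if __name__ == "__main__":
    # Hypothetical usage: python scan_dump.py dump.xml.7z "search text"
    archive, needle = sys.argv[1], sys.argv[2]
    print(count_matches(stream_lines(build_7z_command(archive)), needle))
```

Whether the script in the log would have needed less memory with this approach is not known from the conversation. The fix reported there was simply to pass '-mem 500m' to jsub.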