[00:00:07] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#2074234 (yuvipanda) This now works properly, and I can push and pull! However, docker has decided to do incredibly braindead things and ties image names to...
[00:00:28] andrewbogott: ^ I need to do this sooner than latter, will probably add that to the lua script
[00:00:36] just a vague fyi
[00:30:13] yuvipanda: Hey, do you know why it happens? When I want to use grid engine for python3 (via virtualenv) it returns this error to me: /data/project/dexbot/p3/bin/python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /data/project/dexbot/p3/bin/python3)
[00:30:13] I want to know if it's reported before or should I file a bug?
[00:30:14] or I'm doing something wrong ;)
[00:51:44] Amir1|afk: -l release=trusty :)
[01:04:36] oh thanks
[01:12:52] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2074458 (Shangkuanlc) Hi @zhuyifei1999, Sorry if I make you feel pressured, this really helps us to move forward. The volunteer engineering team has been working on this for about a year, and it...
[01:26:35] PROBLEM - SSH on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:31:27] RECOVERY - SSH on tools-webgrid-lighttpd-1208 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[02:57:49] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:43:54] i'm having issues with commonswiki_p database
[05:44:18] particularly when calling sql query from php
[05:45:50] it always returns null
[05:46:09] Labs, Tool-Labs: zoomviewer seems to be down - https://phabricator.wikimedia.org/T97790#2074807 (dschwen) Open>Resolved
[05:46:16] but if i run the same query from console, i get result
[05:58:16] RECOVERY - Puppet failure on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:14:34] hmm, further debugging shows that the connection to commonswiki database is being refused
[06:14:38] i wonder why
[06:34:42] Labs, Labs-Infrastructure, Operations: Estimate hardware requirements for relevance lab elasticsearch servers - https://phabricator.wikimedia.org/T128433#2074820 (Peachey88)
[06:53:29] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:58:23] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 796808 bytes in 6.941 second response time
[07:32:15] Labs, Tool-Labs, pywikibot-core: Tool Labs: shared Pywikibot code not available - https://phabricator.wikimedia.org/T125505#2074894 (Ato_01) Open>Resolved
[07:50:00] RECOVERY - Puppet failure on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:34:52] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[09:16:06] RECOVERY - Puppet failure on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:53:20] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2075094 (scfc) @Krenair: I don't know. What is the result for "Tim Landscheidt"?
[10:58:07] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:26:05] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2075262 (Krenair) ```> var_dump( $wgMemc->get( wfMemcKey( 'openstackmanager', 'roles', 'Tim Landscheidt' ) ) ); array(2) { [0]=> string(4) "user" [1]=> string(12) "projectad...
[11:30:18] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2075264 (Krenair) a:Krenair
[12:26:01] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:42:37] (PS64) Ricordisamoa: Initial commit [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296
[13:47:10] (CR) Ricordisamoa: "PS64 updates grunt-contrib-jshint from ~0.12.0 to ~1.0.0" [labs/tools/wikidata-slicer] - https://gerrit.wikimedia.org/r/241296 (owner: Ricordisamoa)
[14:54:39] Hey folks. SSH to tool labs is timing out a lot today.
[14:57:56] where are you coming from?
[15:33:36] Hi, thanks for all the help so far.
[15:34:02] I run into an issue where my Python job raises an KeyboardInterrupt exception
[15:34:10] It's a long running task
[15:34:22] It doesn't happen locally, nor on my own server
[15:35:07] So I expect it to be something OpenGrid specific, but I have no idea. Phabricator doesn't seem to have tickets about this.
[15:35:40] Is this a bug? Or did I hit some kind of limitation?
[15:50:48] fako: your job is probably being killed for using too much memory
[15:52:41] (SGE sends SIGINT and gives you a few seconds to clean up after yourself before sending SIGKILL)
[15:53:45] Ah, ok
[15:54:06] I'll dig a bit deeper into memory usage then. Thank you.
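Given the explanation above (SGE sends SIGINT, then SIGKILL a few seconds later), and since Python's default SIGINT handler raises KeyboardInterrupt, a long-running job can catch that exception to save its progress and note its peak memory before it is killed. This is a minimal sketch assuming a simple loop-based job; `run_job`, `process_item`, and `save_progress` are hypothetical names, and the `resource` module is Unix-only. Peak RSS may not match exactly what the grid measures, so treat it as a rough indicator.

```python
import resource
import sys


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_job(items, process_item, save_progress):
    """Process items in order; on SIGINT, save progress and re-raise.

    Python's default SIGINT handler raises KeyboardInterrupt, so catching it
    lets the job use the short grace period before the hard SIGKILL.
    """
    done = 0
    try:
        for item in items:
            process_item(item)
            done += 1
    except KeyboardInterrupt:
        # Keep this quick: only a few seconds remain before SIGKILL.
        save_progress(done)
        print(f"interrupted after {done} items, peak RSS {peak_rss()}",
              file=sys.stderr)
        raise
    return done
```

Re-raising keeps the job's exit status honest, so the grid still sees it as interrupted rather than as a clean finish.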
[16:17:46] hey yuvipanda, could I be added to the deployment project in labs? (Username: nschaaf)
[16:33:07] were there any changes to commonswiki database? (access setings, location...)
[16:33:47] i'm getting Can't connect to MySQL server on 'commonswiki.labsdb' (111) [111 = connection refused]
[16:36:25] Danny_B, I can connect to it without problems
[16:53:39] FyI: I just fixed a stalled rebase on deployment-puppetmaster:/var/lib/operations/puppet
[16:53:55] actually, I should probably log that:
[16:54:07] !log deployment-prep fixed a stalled rebase on deployment-puppetmaster:/var/lib/operations/puppet
[16:54:08] Please !log in #wikimedia-releng for beta cluster SAL
[16:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[17:29:31] tom29739: console or php?
[17:38:32] Danny_B, I've been using DataGrip on my dev machine. It works on that and console. These is what I see: http://prntscr.com/a9wvlq
[17:38:36] Tool-Labs-tools-Other, Possible-Tech-Projects: Fix TreeViews to provide pageviews statistics for all articles of any wikiproject etc. - https://phabricator.wikimedia.org/T56184#552874 (Sumit) IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has...
[17:39:04] Danny_B, I've been the command: 'sql commonswiki' to connect on console
[17:39:41] Danny_B, what have you been running?
[17:49:57] tom29739: i can't connect from my php script. i can connect to other dbases though. (this script used to work at least a month ago, i have changed nothing in it)
[17:50:39] Danny_B, can you connect from console to it?
[17:50:53] like to the dbase.
[17:52:05] Sorry, my client is acting up today
[17:52:28] yes
[17:54:03] Danny_B, what connection details are you using?
[17:58:03] Labs, Labs-Infrastructure, Operations: Estimate hardware requirements for relevance lab elasticsearch servers - https://phabricator.wikimedia.org/T128433#2076499 (TJones)
[18:08:19] Danny_B: on what host are you running the script?
[18:09:17] Danny_B: it sounds like the script is using labsdb1002 rather than the host commonswiki.labsdb points to now
[18:14:47] valhallasw`cloud: can you please start https://tools.wmflabs.org/derivative/deri1.php ?
[18:15:09] valhallasw`cloud: /etc/hosts says 10.64.37.4
[18:15:35] Danny_B: right, that's probably wrong. Try removing the entries and using dns instead
[18:15:53] matanya: it is started.
[18:16:02] as you can see by the fact that it 404s ;-)
[18:16:13] * sorry, restart
[18:19:19] valhallasw`cloud: do i have to restart anzthing?
[18:19:44] removing did not help
[18:19:54] in fact now i don't even get the error
[18:20:01] but null
[18:21:12] matanya: uh, why? That file just seems not to exist?
[18:21:30] Danny_B: that's... Weird
[18:21:38] hmmm qstat
[18:21:39] error: unable to send message to qmaster using port 6444 on host "tools-grid-master.tools.eqiad.wmflabs": can't resolve host name
[18:22:23] ho, I see, already working on it
[18:25:22] valhallasw`cloud: what's the current host? i'll try to add it
[18:25:54] oh, sorry valhallasw`cloud i'll check what the users complains about
[18:26:13] toollabs down
[18:26:23] yes that's me
[18:26:32] should be back in a moment
[18:26:46] as soon as this puppet run completes
[18:26:46] tssk, no bamboo for you ;)
[18:26:55] btw, is the topic still up to date?
[18:27:26] should be back up
[18:27:44] not yet here
[18:28:08] valhallasw`cloud: Can you restart wikibugs?
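The commonswiki problem earlier in this exchange came down to /etc/hosts pinning `commonswiki.labsdb` to an old address (10.64.37.4) instead of letting DNS answer. Below is a minimal Python sketch for spotting such pinned entries in hosts-format text; `pinned_entries` is a hypothetical helper, and it assumes the usual `IP name [alias ...]` layout with `#` comments described in hosts(5).

```python
def pinned_entries(hosts_text, hostname):
    """Return the addresses that a hosts-format file pins for hostname."""
    addresses = []
    for line in hosts_text.splitlines():
        # Drop any comment, then split "IP name [alias ...]" on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            addresses.append(fields[0])
    return addresses
```

For example, `pinned_entries(open("/etc/hosts").read(), "commonswiki.labsdb")` returning a non-empty list means the name is being resolved from the file. On a typical Linux setup, removing those lines lets the resolver fall back to DNS, as suggested above.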
[18:28:32] hmm
[18:28:47] still 502
[18:30:00] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[18:30:02] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [0.0]
[18:30:12] apparently nginx is using a different dns server
[18:30:16] * yuvipanda waits for puppet to run there
[18:30:51] Luke081515: sjoerddebruin back up now
[18:30:55] yep
[18:30:56] sorry about that :|
[18:31:09] np
[18:31:09] I need to revert all of this morning's work now :|
[18:31:14] yikes
[18:31:22] but I ges someone should restart wikibugs?
[18:31:36] but shouldn't affect anyone
[18:31:38] else
[18:47:29] so more just a question...because we can't get the disk space we would need for an ongoing project (~4T) in the labs virt cluster we are putting together a hardware ask for dedicated hardware. But other than disk space there is no real reason we need dedicated hardware. Are there any options i'm not thinking of for allowing to run in labs? I'm guessing we couldn't just buy dedicated disks, and that all the labvirt's have every bay fille
[19:00:48] * valhallasw`cloud prods wikibugs
[19:01:39] hrm.
[19:01:45] yuvipanda: do bots need a kick in general?
[19:02:04] I'm not clear on what happened w/ the issues a bit earlier
[19:05:03] wikibugs
[19:05:06] wikibugs reload
[19:05:45] tom29739: ??
[19:06:10] There's a command for reloading it somewhere
[19:06:25] I'm pretty sure there isn't.
[19:10:24] No, there isn't, I'm thinking of wmbot
[19:10:34] That has commands
[19:13:03] ebernhardson: there is a mechanism for waht you mean but we haven't deployed it anywhere so it's a wishlist item at this point, but also there are two layouts you could mean: a cluster of hosts that offer ES as a service to labs things or physical hosts within teh labs realm that are managed as VM's
[19:13:13] they are pretty different
[19:17:08] chasemp: shouldn't need to I think
[19:17:19] chasemp: what happened earlier was: https://gerrit.wikimedia.org/r/#/c/274164/
[19:17:25] chasemp: well, in terms of the ask right now we are thinking about basically asking for 2 dedicated machines of the same spec as nobelium. These wouldn't be live updating though
[19:17:31] chasemp: I didn't realize that setting a zonefile would make the recursor *not* reach out to the other backends
[19:17:40] thus resulting in empty responses to 'em all
[19:18:18] chasemp: so to most bots it'll just be a transient DNS outage
[19:18:30] only if they were explicitly reaching out to other tools hosts
[19:18:43] wikibugs needs kick because it tries to connect to tools-redis
[19:19:00] yuvipanda: ah, yeah, could you?
[19:19:29] what, kick wikibugs?
[19:19:31] sure
[19:19:57] done
[19:20:30] yuvipanda: I just did
[19:20:35] it's doing stuff
[19:20:52] ah
[19:20:56] I think you now killed wb2-phab? :P
[19:21:01] I qmod -rj'd both
[19:21:05] which is what I usually do :)
[19:21:14] yuvipanda: well, the job is dead somehow :/
[19:21:22] valhallasw`cloud: phab is having issues too atm
[19:21:24] so maybe it's that?
[19:21:31] sporadoic ones at least
[19:21:35] maybe
[19:21:39] phab should be back and ok for a bit afaik
[19:21:46] but I'm more inclined to blame sge
[19:21:48] 2016-03-01 19:21:01,985 - wikibugs.wb2-phab - INFO - Shutting down
[19:21:53] fun
[19:21:56] that looks like a SIGINT from SGE
[19:22:06] but then it didn't come back up? dunno
[19:22:14] anyway, should be OK again
[19:22:39] uh.
[19:22:45] * valhallasw`cloud prods wikibugs
[19:24:05] wikibugs-static :p
[19:24:20] hm, that was the wrong wikibugs I killed
[19:24:20] bah.
[19:25:02] * valhallasw`cloud prods tools-exec-1406
[19:25:26] that one.
[19:25:55] instructions for that bot anywhere?
[19:26:16] There aren't any that I can find
[19:26:31] https://wikitech.wikimedia.org/wiki/Wikibugs
[19:26:47] but valhallasw`cloud wrote most of it, so he knows more :)
[19:26:58] production bot box
[19:27:26] Ah, it's just not linked anywhere on Wikitech or meta, so maybe that's why I couldn't find it
[19:29:04] tx yuvipanda, this is good I can never remember the names
[19:29:10] :)
[19:30:05] chasemp: wikitech, but mostly the readme / fabfile in the repo
[19:30:21] however, in this case it's SGE being annoying I think
[19:30:37] * valhallasw`cloud kills all jobs
[19:30:50] now where is that other wikibugs
[19:31:40] are we now completely wikibugs-free?
[19:31:58] seems so. good.
[19:32:05] I can't see any in here
[19:32:11] * valhallasw`cloud fab start_jobs it
[19:33:54] RECOVERY wikibugs is back
[19:35:05] :D
[19:37:38] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#2077031 (yuvipanda) Meh, that screwed up, reverting all the CNAME work...
[20:02:14] PAWS: Allow restarting PAWS hub without taking down all the instances - https://phabricator.wikimedia.org/T128508#2077129 (yuvipanda)
[20:03:23] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#2077142 (scfc) Is @Joe's T123628 a duplicate of this task? AFAIUI, there the registry would be a container and the name issue solved like other containers?...
[20:05:00] Labs, Tool-Labs: Install a docker registry to be used by kubernetes - https://phabricator.wikimedia.org/T123628#2077150 (yuvipanda)
[20:05:02] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#2077151 (yuvipanda)
[20:06:40] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#1808509 (yuvipanda) Indeed it's the same, I've merged it in. The reason it's not just a container is mostly because we don't have swift on the horizon yet...
[20:07:45] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#2077164 (yuvipanda) That ticket also has a far more complex setup for a ful PaaS system that we aren't doing yet (and when we do do it, we shouldn't be buil...
[20:39:46] hmm .. why don't i see the list of instances or links to create instances / proxies on wikitech projectadmin resource pages?
[20:45:26] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:46:36] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:48:13] subbu: log out and back in
[20:48:24] ah, ok.
[20:48:25] you're lying, shinken-wm
[20:49:11] It is
[20:49:25] is it down, tom29739?
[20:49:29] it works for me...
[20:49:41] It works for me too
[20:49:55] I meant I agree that shinken-wm is lying
[20:51:11] yuvipanda, why is it saying that if it's not down? and that puppet failure was from hours ago.
[20:51:40] maybe it's lagged
[20:52:48] does it need restarting or something?
[20:55:27] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 797130 bytes in 9.426 second response time
[20:58:35] yuvipanda, i added a proxy nlwiki.wtexp to nlwiki.wikitextexp.eqiad.wmflabs at port 8080 but http://nlwiki.wtexp.wmflabs.org/ keeps spinning .. on the vm, localhost:8080/wiki/Main_Page resolves successfully .. is tihs specific to the proxy or vagrant?
[20:59:19] i rebooted the vm already .. just in case that would help .. but, it didn't.
[20:59:22] subbu: did you open port 8080 in the project's firewall?
[20:59:43] subbu: https://wikitech.wikimedia.org/wiki/Special:NovaSecurityGroup
[21:00:04] no. thanks. :)
[21:01:56] \o/
[21:02:33] bd808, I'm having problems with mediawiki-oauthclient
[21:02:38] I think we have a bug somewhere to make the error page for the proxy remind you to check the security groups
[21:02:59] tom29739: what sort of problems?
[21:07:39] bd808, this: http://prntscr.com/aa02xb
[21:08:03] I think it can't load the SSL keychain or something
[21:08:18] curl is installed and it's trying to connect to metawiki
[21:08:28] hmm... looks like it. I haven't tested any of that on a Windows host before
[21:08:45] It doesn't work on toollabs either
[21:09:08] Well that would be odd. I have a tool using that library
[21:09:28] bd808, I can get VM using Ubuntu up and running if that would help
[21:10:26] It's loading from composer
[21:11:08] tom29739: which oauth server are you pointing it at?
[21:12:05] bd808, $endpoint = 'https://meta.wikimedia.org/w/index.php?title=Special:OAuth';
[21:12:06] $redir = 'https://meta.wikimedia.org/view/Special:OAuth?';
[21:12:56] I just changed the localhost to meta from the example because I don't have a MediaWiki instance on my computer
[21:13:56] *nod* the tool I have setup points to mediawiki.org but otherwise looks mostly the same
[21:17:11] bd808, I was told that the OAuth central wiki was meta so I used that
[21:18:41] bd808, does toollabs have curl enabled? Like as a PHP extension?
[21:26:27] tom29739: it should have
[21:27:05] strange, I was getting an error with it not being there the other day
[21:31:56] tom29739: it may help you to debug if you make a small test project that only tries to do the OAuth communication without any other things that may cause complications. I know that it is possible from tools though because it works with https://tools.wmflabs.org/bash
[21:44:55] tom29739: you don't need to use HTTPS with OAuth 1.0
[21:46:05] although if you want to use meta you probably have to
[21:46:18] but you really shouldn't have cert problems with meta
[21:48:03] tom29739: find out the exact URL that curl request goes to, try it from the command line with curl -v, see if that works
[21:48:13] tgr, which one should I use? I heard something about mediawikiwiki being the central oauth wiki and then that changed to meta.
[21:48:46] Like which wiki
[21:48:57] OR can I use any of them
[22:02:08] tom29739: doesn't really matter, whatever your users will be most comfortable with
[22:02:37] any Wikimedia single-sign-on wiki will work
[22:02:49] OAuth credentials are global
[22:03:33] but they all have the same certificate so that will not help with your curl problems
[22:09:33] tgr, now I'm, getting this 'Empty HTTP response! Status: 301'
[22:10:14] This is just my basic debug script, copypasted from the example
[22:13:56] http://termbin.com/lw5a
[22:44:04] Labs, Labs-Infrastructure, Monitoring, Operations: labstore monitoring - "Last run result for unit .. was exit-code" - https://phabricator.wikimedia.org/T128526#2077860 (Dzahn)
[22:45:07] Labs, Labs-Infrastructure, Tool-Labs, Discovery, and 3 others: labstore monitoring - "Last run result for unit .. was exit-code" - https://phabricator.wikimedia.org/T128526#2077887 (Dzahn)
[22:45:44] Labs, Labs-Infrastructure, Tool-Labs, Discovery, and 3 others: labstore monitoring - "Last run result for unit .. was exit-code" - https://phabricator.wikimedia.org/T128526#2077892 (Dzahn)
[23:03:36] tom29739: that sample code is a bit old
[23:04:02] (patches welcome :)
[23:04:32] you should use https everywhere, Wikipedia does not accept HTTP connections since May last year or so
[23:17:23] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2077988 (Milimetric) @Bianjiang: could you be more specific about what you're trying to accomplish? The pageview API provides access to project-level aggregat...
[23:20:47] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2078015 (DannyH)
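The 'Empty HTTP response! Status: 301' error is consistent with tgr's point that Wikimedia wikis no longer accept plain HTTP. An http:// request is likely redirected to https://, and a client that does not follow the redirect sees an empty body. Below is a minimal Python sketch for auditing the client's configured URLs, assuming they are kept as a simple key/URL mapping. The keys `endpoint` and `redir` are hypothetical, mirroring the PHP `$endpoint`/`$redir` variables above, and `upgrade_endpoints` is a hypothetical helper.

```python
HTTP_PREFIX = "http://"


def upgrade_endpoints(config):
    """Return (fixed_config, upgraded_keys), rewriting http:// URLs to https://.

    URLs that are already https:// are returned byte-for-byte unchanged, so
    details such as a trailing "?" on the redirect URL are preserved.
    """
    fixed, upgraded = {}, []
    for key, url in config.items():
        if url.lower().startswith(HTTP_PREFIX):
            fixed[key] = "https://" + url[len(HTTP_PREFIX):]
            upgraded.append(key)
        else:
            fixed[key] = url
    return fixed, upgraded
```

Plain string replacement is used here instead of `urllib.parse.urlunsplit`, because round-tripping through `urlsplit`/`urlunsplit` drops an empty trailing `?`, which the `$redir` URL relies on.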