[00:00:46] 06Labs, 10Tool-Labs: Data collection for tools job_count seems to be broken - https://phabricator.wikimedia.org/T149634#2758878 (10bd808) 05Open>03Resolved a:03chasemp {F4682058}
[00:05:26] legoktm ^^ it actually does try and re connect ssh.
[00:08:12] 06Labs, 10MediaWiki-extensions-Newsletter: Clear 'nl_*' tables in http://newsletter-test.wmflabs.org/ - https://phabricator.wikimedia.org/T149651#2758891 (1001tonythomas)
[00:09:18] paladox: On which line?
[00:10:32] (03PS4) 10Paladox: Adds a grrrit-wm restarting command for you to type in irc [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/318976 (https://phabricator.wikimedia.org/T149609)
[00:11:06] marktraceur hi, line 166
[00:11:15] It calls this ircClient.addListener('join', waitForChannelJoins);
[00:11:54] which calls this https://phabricator.wikimedia.org/diffusion/TGRT/browse/master/src/relay.js;8ff7cf21a1f72783afad72f6fbbd6b3eb2e9402c$140
[00:12:08] then https://phabricator.wikimedia.org/diffusion/TGRT/browse/master/src/relay.js;8ff7cf21a1f72783afad72f6fbbd6b3eb2e9402c$149
[00:12:16] then https://phabricator.wikimedia.org/diffusion/TGRT/browse/master/src/relay.js;8ff7cf21a1f72783afad72f6fbbd6b3eb2e9402c$113
[00:12:22] marktraceur ^^
[00:13:06] paladox: OK, that starts a new connection, but doesn't kill the old one AFAICT
[00:14:54] marktraceur, oh
[00:14:55] Would this
[00:14:56] }).on('end', function() {
[00:14:56] console.log('Client disconnected');
[00:14:57] });
[00:14:58] ?
[00:15:09] paladox: Also, it only works if the IRC client reconnects automatically, which I don't know enough to say it does
[00:15:21] paladox: No, that's not an order to disconnect, it's a listener for disconnections.
[00:15:28] oh
[00:15:40] also we have a new command we are testing that should restart the bot
[00:15:48] paladox: Let's talk about logical or
[00:15:51] !grrrit-wm-
[00:15:56] paladox: What does "a" || "b" evaluate to?
[00:16:06] nicknames
[00:16:16] What?
[00:20:56] paladox: Try running the expression "a" || "b" in a javascript console and tell me if the result is what you expected
[00:21:44] Oh i thought you were talking about when i did || for nicknames never mind
[00:21:44] anyways i think a || b is it allowing you to do a or b
[00:22:38] paladox: I'm not asking you what it does, I'm asking you to evaluate the expression and tell me what the result is
[00:23:02] Oh, i doint know how to do that
[00:23:54] paladox: Open a console, fire up nodejs, type in "a" || "b", and it should give you the result.
[00:24:16] oh
[00:24:20] ok thanks
[00:24:53] "a" || "b"
[00:24:59] > "a" || "b"
[00:24:59] 'a'
[00:24:59] >
[00:25:06] marktraceur ^^
[00:25:11] oh so a will win
[00:25:13] paladox: OK, you didn't need to paste all four lines.
[00:25:21] paladox: Yeah, that's what we call short-circuiting.
[00:25:25] sorru
[00:25:26] sorry
[00:25:34] paladox: I imagine you expected something like [ "a", "b" ]
[00:25:41] paladox: But that's an array.
[00:25:46] Oh yep
[00:25:50] paladox: Those square brackets are how we create arrays
[00:26:40] yep, so something like var whitelist = [ "paladox" || "mutante" || "Krenair" || "hashar" || "ostriches" || "greg-g" || "twentyafterfour" || "apergos" || "robh" ]; 
[00:26:42] will work?
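The short-circuiting marktraceur is describing can be seen in a few lines of plain JavaScript (runnable in node; the values are just illustrations):

```javascript
// || returns the first operand if it is truthy, otherwise the second.
// It does NOT build a list of alternatives.
const a = "a" || "b"; // "a" is truthy, so "b" is never even evaluated
console.log(a);       // prints: a

const fallback = "" || "b"; // "" is falsy, so the second operand wins
console.log(fallback);      // prints: b

// Square brackets with commas are how arrays are created:
const pair = ["a", "b"];
console.log(pair.length); // prints: 2
```

So `[ "paladox" || "mutante" || ... ]` would short-circuit to a one-element array `[ "paladox" ]`; a whitelist needs commas between the entries instead.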
[00:26:48] Oh my god no
[00:26:54] paladox: || is not how we define arrays
[00:27:04] oh sorry
[00:27:09] i forgot to remove that
[00:29:11] paladox: And just in case you thought you were done, when you've fixed that part, try running "a" === [ "a", "b" ] in your console and see what happens
[00:29:22] oh
[00:29:58] marktraceur it says false
[00:30:10] Yeah
[00:30:12] Of course it does
[00:30:26] paladox: Which means when you try to check from === whitelist, that won't work
[00:30:27] oh
[00:30:42] paladox: Luckily, you know how to use indexOf
[00:31:02] Well i got that from http://code.runnable.com/UkmYEow-67ktAAFM/irc-command-bot-with-node-irc-for-node-js
[00:31:19] oh now i get it
[00:31:48] 10Tool-Labs-tools-Pageviews: Add Mediaviews to Pageviews suite - https://phabricator.wikimedia.org/T149642#2758938 (10MusikAnimal)
[00:32:15] something like from === whitelist.indexOf
[00:32:21] marktraceur ^^
[00:32:52] paladox: OK, I stand corrected, you also copied your use of indexOf
[00:32:59] yep
[00:33:01] paladox: Why don't you spend a few minutes looking at your code
[00:33:10] paladox: See if you can figure out how indexOf works
[00:33:20] paladox: If not, then maybe try googling "javascript indexOf"
[00:33:26] Ok
[00:33:27] thanks
[00:33:30] paladox: Then once you think you have an answer, try running it in nodejs
[00:34:18] oh now i get it. from.indexOf === whitelist
[00:34:25] http://www.w3schools.com/jsref/jsref_indexof.asp
[00:34:30] paladox: That is not even slightly correct
[00:35:49] oh from.indexOf(whitelist) ?
[00:36:02] from.indexOf(whitelist) !== -1
[00:36:10] paladox: Have you tried running something like that in a JS console?
[00:36:11] marktraceur ^^
[00:36:36] http://stackoverflow.com/questions/1789945/how-to-check-if-one-string-contains-another-substring-in-javascript
[00:41:37] paladox: Have you tried running the code in a JS console?
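The membership check marktraceur is steering toward calls `indexOf` on the *array*, with the nick as the argument — the reverse of `from.indexOf(whitelist)`. A minimal sketch, using the variable names `whitelist` and `from` from the conversation (the names are illustrative; the actual grrrit-wm patch may differ):

```javascript
// A string is never === an array, so `from === whitelist` is always false.
console.log("a" === ["a", "b"]); // prints: false

// What the check actually wants: is `from` one of the whitelist entries?
// Array.prototype.indexOf returns the element's index, or -1 if absent.
var whitelist = ["paladox", "mutante", "Krenair"]; // example names
var from = "paladox"; // nick of the user issuing the command

console.log(whitelist.indexOf(from) !== -1);      // prints: true
console.log(whitelist.indexOf("nobody") !== -1);  // prints: false
```

`from.indexOf(whitelist)` is the wrong way around: that asks whether an array appears inside a string, which it never does.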
[00:46:20] marktraceur yes didnt work since i was using undefined code
[00:46:27] anyways i have to go sorry
[00:46:32] paladox: Yeah, so test something similar
[00:47:01] ok
[00:47:17] I'll be around later
[00:47:20] marktraceur: ill take a look i still have to write the backend for that lol
[00:47:30] Zppix: What backend?
[00:47:42] Functionality
[00:47:55] Keeping it up to date via db
[00:47:56] marktraceur grrrit-wm
[00:48:16] Zppix: OK, and you understand how arrays work in JS right
[00:48:29] I know a decent amount of js
[00:48:46] Zppix: I guess I can be hopeful, thin
[00:48:48] then.
[00:48:49] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[00:49:02] marktraceur: lol
[00:50:07] Zppix: I also have qualms about using nicks as "secure" identifiers, e.g., but that's not a JS concern, and I'm sure I'll have an opportunity to raise it at a later stage in the CR.
[00:50:36] I plan on using hostnames
[00:50:51] Excuse me cloaks
[00:51:29] Well, cloaks aren't universal
[00:53:58] I thought of that... You see i also plan on at some point making it so users have to "auth to the bot" to attempt to use any major high importance stuff
[00:54:47] OK, I look forward to seeing further improvements
[00:55:08] Me too
[01:23:53] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:55:50] !log deployment-prep Managed to mess up the deployment-puppetmaster02 cert, had to move those nodes back
[02:55:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[04:55:13] (03PS1) 10BryanDavis: Add COPYING file [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319018
[04:55:16] (03PS1) 10BryanDavis: StewardBot: ratelimit @steward pings [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019
[05:07:04] 10Tool-Labs-tools-stewardbots: Evaluate cleanup on StewardBot's code - https://phabricator.wikimedia.org/T149404#2751215 (10bd808) Do you know what version of irclib is in use? I had to go back quite a way in the history of https://github.com/jaraco/irc even to see where the library was named irclib. I've been...
[06:04:27] PROBLEM - Puppet run on tools-worker-1014 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:39:25] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:28:27] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759234 (10Paladox)
[07:28:36] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2755213 (10Paladox) 05stalled>03Open
[07:56:21] (03PS2) 10BryanDavis: StewardBot: ratelimit @steward pings [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019 (https://phabricator.wikimedia.org/T148110)
[07:58:43] (03CR) 10MarcoAurelio: "Does this support !steward as well?" [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019 (https://phabricator.wikimedia.org/T148110) (owner: 10BryanDavis)
[08:03:34] (03CR) 10MarcoAurelio: "SASL in SSL mode should be fine I think, but I have no idea how to do that." [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/318229 (https://phabricator.wikimedia.org/T149265) (owner: 10Platonides)
[08:24:42] (03CR) 10MarcoAurelio: [C: 032] Introduce tox + flake8 [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/318521 (https://phabricator.wikimedia.org/T128503) (owner: 10Hashar)
[08:25:27] (03Merged) 10jenkins-bot: Introduce tox + flake8 [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/318521 (https://phabricator.wikimedia.org/T128503) (owner: 10Hashar)
[08:29:05] (03CR) 10MarcoAurelio: "check experimental" [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019 (https://phabricator.wikimedia.org/T148110) (owner: 10BryanDavis)
[08:31:28] (03CR) 10MarcoAurelio: "check experimental" [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/318229 (https://phabricator.wikimedia.org/T149265) (owner: 10Platonides)
[08:35:47] 10Tool-Labs-tools-stewardbots, 10Continuous-Integration-Config, 13Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2759303 (10MarcoAurelio) Maybe we shouldn't be using the `mediawiki` queue. Is there a `labs` queue? (Sometimes the mediawiki queue...
[08:36:39] 10Tool-Labs-tools-stewardbots, 10Continuous-Integration-Config, 13Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2759304 (10MarcoAurelio) I've merged the above change. It always fails on tox, but I guess it's because the bot code is old.
[08:40:24] 10Tool-Labs-tools-stewardbots: Evaluate cleanup on StewardBot's code - https://phabricator.wikimedia.org/T149404#2759305 (10MarcoAurelio) >>! In T149404#2759123, @bd808 wrote: > Do you know what version of irclib is in use? I had to go back quite a way in the history of https://github.com/jaraco/irc even to see...
[09:52:48] Is the tools elastic search cluster just for general use? I couldn't see any documentation about it; I don't want to tread on anyone's toes but I'd like to do some elasticy stuff and if I could do that without having VMs on labs that would be cool.
[12:18:10] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759556 (10Aklapper) @Paladox: Why did you re-add the #Labs team project?
[12:19:39] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759569 (10Aklapper) (In general: Could people please be specific, avoid using only "this" and "it", actually be explicit what they're talking about, and take more time to phrase sentences that do not of...
[12:41:58] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/DatGuy was created, changed by DatGuy link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/DatGuy edit summary: Created page with "{{Tools Access Request |Justification=I am already a bot operator on the English Wikipedia. Soon, I might also have a bot that does continuous edits. I will need the tool labs..."
[12:48:39] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759613 (10Paladox) Sorry, I didn't see the labs tag had already been added and then removed. Reason why I added the tag is because labs needs to create this new labs project so I can create the instance.
[12:53:12] 06Labs, 06Operations: cronspam from labstores, labcontrol, labstestservices - https://phabricator.wikimedia.org/T149574#2759635 (10MoritzMuehlenhoff) The "Cron /usr/bin/rsync --delete --delete-after -aSO /srv/glance//images/ labcontrol1002.wikimedia.org:/srv/glance//images/" message...
[13:27:32] !log tools reboot tools-exec-1404 post depool for test
[13:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:40:24] Is anyone around to help me figure out why I can't SSH to one of my instanceS?
[13:43:34] addshore: possibly, what: project / instance / user name / current behavior do you see
[13:43:48] chasemp: db.cognate.eqiad.wmflabs
[13:44:23] the scripts that conntect to it cant get to the mysql and I get channel 0: open failed: connect failed: No route to host when trying to ssh to it
[13:45:08] wild guess if whatever script doesn't know how to use the bastion to access
[13:45:28] by script I mean the wikis setup to read from the db (whoich are hosted on labs)
[13:45:35] you can't connect directly to that instance which would probably surface as no route to host
[13:45:37] it was working a few days ago
[13:45:59] my ssh config is setup to proxy through bastion, sshing to other instances works just fine
[13:46:09] yeah host seems unresponsive to me as well
[13:46:17] I can attempt a power cycle?
[13:46:22] possibly something there went awry
[13:49:11] chasemp: yeh, go for it!
[13:49:22] I have also tried that once in the last 10 mins!
[13:50:43] chasemp: looking at nagf it dies aprox 1 week ago
[13:55:37] Hello, I am looking for help with java/jdbc/tomcat on toollabs. Most important: what jdbc-driver do I have to select in the datasource object?
[13:56:21] gradzeichen: bd808 sent out a java tutorial recently to labs-l I would look for that and read through it first (I haven't had time to myself)
[13:57:13] ok, will do
[13:57:57] addshore: I can see that it's up and running but cannot get in still...can you ping andrewbogott to take a look when he is around?
[13:58:09] it did in fact reboot but still is dark so yeah, that's interesting
[13:59:07] addshore: is ssh allowed in your default security group?
[13:59:13] andrewbogott: yup
[14:01:38] chasemp: checked it, it does not speak about database access
[14:02:38] gradzeichen: I'm not sure, try asking bd808 when he's about it's still early there tho
[14:02:46] possibly a good addition to the existing help
[14:03:22] no rush regarding that instance, I just span up another one and got it all setup!
[14:03:36] so do with db.cognate.eqiad.wmflabs as you wish! :)
[14:03:56] addshore: I want to investigate a bit more; I'll delete it when I finish
[14:04:17] andrewbogott: I saw it reboot and saw it live on labvirt1001 but oddly unavail
[14:04:22] curious what you find :)
[14:04:33] I bet that grub is broken :(
[14:07:36] paladox: are you around?
[14:29:00] addshore: db is back, do you want anything there or should I just delete it now?
[14:30:19] chasemp: this was a side-effect of the kernel upgrades last week… I /think/ that this is one that failed the automatic upgrade process — I tuned grub by hand and set it to boot the precise kernel (which didn't exist since it was a trusty box)
[14:30:42] Fixed by mounting by hand as per https://wikitech.wikimedia.org/wiki/OpenStack#Mounting_an_instance.27s_disk and changing menu.lst
[14:33:33] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2755213 (10Krenair) New project requests don't just need to be in #Labs, they also need to block T76375 - but I'm not sure this qualifies for a project of it's own. Why not just an extra tool, or even a...
[14:34:49] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759828 (10Zppix) >>! In T149529#2759181, @greg wrote: > Of what? Again, please use the names of things you want a test instance of. I'm still confused on what you need. You haven't listed anything yet o...
[14:44:18] 10Tool-Labs-tools-stewardbots: Evaluate cleanup on StewardBot's code - https://phabricator.wikimedia.org/T149404#2759864 (10bd808) >>! In T149404#2759305, @MarcoAurelio wrote: > Nope, sorry. I've been having a quick look through the files in WMF labs and I couldn't find anything. Maybe a shared library for all W...
[14:48:09] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/DatGuy was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=943577 edit summary:
[14:56:06] andrewbogott: ooooh I see!
[14:56:14] Yes, feel free to nuke it now ! :)
[14:58:10] (03PS3) 10BryanDavis: StewardBot: ratelimit @steward pings [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019 (https://phabricator.wikimedia.org/T148110)
[14:59:43] (03CR) 10BryanDavis: "> Does this support !steward as well?" [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019 (https://phabricator.wikimedia.org/T148110) (owner: 10BryanDavis)
[15:01:33] (03CR) 10BryanDavis: "> - tox-jessie https://integration.wikimedia.org/ci/job/tox-jessie/13041/console" [labs/tools/stewardbots] - 10https://gerrit.wikimedia.org/r/319019 (https://phabricator.wikimedia.org/T148110) (owner: 10BryanDavis)
[15:44:55] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760101 (10Paladox) @Krenair oh, I guess we could have it on tools. But we need the ability to perminatly stop this test bot since it will just duplicate things if it is left running.
[15:59:07] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760158 (10greg) I saw complaints last night of testing related to this making noise in our production Gerrit (and Phab?); what is your testing plan and how will you ensure that you are not disruptive in...
[16:02:47] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760170 (10Paladox) @greg we could test on either an instance. Or duplicate grrrit-wm on the tools labs so that the production one is always working, and we can test using the test bot under a different...
[16:05:19] tarrow: In theory the elasticsearch cluster in tool labs is for anyone to use. In practice I'm the only one who has used it yet. If you open a phab task about your use case and I can create you a set of credentials that will give you write access to an index.
[16:07:41] 06Labs, 10Gerrit, 10grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760183 (10Paladox) Ive managed to create a instance on the git project. It is a small instance.
[16:08:29] bd808: can we add this to wikitech somewhere even if it's a "ask bd808?"
[16:08:37] I honestly didn't recall the state of things either :)
[16:11:04] chasemp: yeah. I should write that (and a zillion other things)
[16:11:10] heh
[16:11:19] now I know the answer is ask you
[16:11:45] I said I would document the search relevancy cluster like a month ago, and no movement
[16:11:54] gradzeichen: I don't think we have a mysql jdbc connector jar that is globally installed. I think that https://dev.mysql.com/downloads/connector/j/ would work for you
[16:17:39] bd808: can i install myself?
[16:18:03] or needs this to be installed globally to work with tomcat?
[16:18:45] gradzeichen: hmmm.. good question about how to integrate with the tomcat setup. I really haven't played with that.
[16:19:14] additional question: toollabs has java1.7 installed
[16:19:24] current version of java is 1.8
[16:19:48] if i compile locally with 1.8, it will not work on labs
[16:20:01] and i have to compile on labs
[16:20:11] gradzeichen: yes. we are stuck with 1.7 for the foreseeable future
[16:20:28] ok, but i really need jdbc to go on
[16:20:47] there are some phab tasks about upgrading with lots of problems that we haven't figured out how to solve.
[16:21:13] we may be able to do 1.8 in the kubernetes containers, but probably never on the OGE grid
[16:22:01] trying to shim this into SGE seems like a fools game yeah
[16:23:45] We are looking for alternatives to tomcat on kubernetes too
[16:23:54] some more light weight servlet container
[16:24:08] I have very little exp w/ tomcat actually
[16:24:34] way back when running a confluence/jira stack which was a nightmare and that's about it
[16:26:32] gradzeichen: so back to your question, yes you can download the jars and put them somewhere in your tool's $HOME. Then I guess we need to find out how to configure the classpath for tomcat to pick them up or find the right magic directory to put them in
[16:26:52] I think you should be abel to jsut put them in your war file somehow
[16:27:27] * bd808 is a bit hazy on how wars work anymore. it's been 10+ years since he was a full-time java developer
[16:29:09] i downloaded the jar and uploaded it to my account. i will try to put it in my war, but at the moment my ssh is frozen.
[16:47:18] bd808: Thanks, I've made a ticket here: https://phabricator.wikimedia.org/T149709
[16:48:33] 06Labs, 10Tool-Labs, 15User-bd808: Possible use of tools-lab-elasticsearch cluster - https://phabricator.wikimedia.org/T149709#2760359 (10bd808)
[16:50:00] tarrow: I can probably get some stuff started for this Wednesday/Thursday. It will give me a good reason to document what we have and the potential limitations of the setup.
[16:50:29] Thank you! That's great :)
[16:50:59] the big one is basically that our Elasticsearch is a shared environment much like our redis service. Everyone needs to play nice or things will go badly for everyone.
[16:52:42] (03CR) 10Paladox: "test" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/318976 (https://phabricator.wikimedia.org/T149609) (owner: 10Paladox)
[16:52:42] (03CR) 10Paladox: "test" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/318976 (https://phabricator.wikimedia.org/T149609) (owner: 10Paladox)
[16:55:31] sure; I would probably appreciate details on 'nice'. In the past I've only used ElasticSearch in an environment where I'm the only user. If you can let me know some rough guidelines to stick to that would be great.
[16:55:48] paladox: please put your testing bot in non-public channels. We don't need the noise. Maybe something like ##grrrit
[16:56:01] Ok sorry
[16:56:45] tarrow: I think we will have to figure it out as we go :) Mostly the concern I would have with the existing setup would be running things out of RAM with complex queries.
[16:57:13] ok :)
[16:58:56] 06Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Some versions of an image not rendering at all at wikitech - https://phabricator.wikimedia.org/T145811#2760437 (10MoritzMuehlenhoff) 05Open>03Resolved a:03MoritzMuehlenhoff Fixed, wikitech renders images again.
[17:03:56] (03Draft1) 10Paladox: testing bot [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/319106
[17:06:22] (03CR) 10Paladox: "recheck" [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/319106 (owner: 10Paladox)
[17:26:54] 06Labs, 06Operations, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2760585 (10yuvipanda) The issues with role::puppetmaster::standalone not being able to be its own client are fixed now! htt...
[17:32:08] 10Quarry, 10Analytics-Wikimetrics: Include Tulu Wikipedia in Metrics and Quarry - https://phabricator.wikimedia.org/T148950#2760605 (10Pavanaja)
[17:57:13] !log tools depool exec nodes on labvirt1002
[17:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[18:00:46] halfak: btw, let us know if snuggle no longer requires NFS :) will be happy to remove
[18:01:53] PROBLEM - Host tools-services-01 is DOWN: CRITICAL - Host Unreachable (10.68.16.29)
[18:02:01] PROBLEM - Host tools-webgrid-generic-1403 is DOWN: CRITICAL - Host Unreachable (10.68.18.52)
[18:02:32] PROBLEM - Host tools-webgrid-lighttpd-1405 is DOWN: CRITICAL - Host Unreachable (10.68.17.65)
[18:02:34] PROBLEM - Host tools-exec-gift is DOWN: CRITICAL - Host Unreachable (10.68.16.40)
[18:02:50] PROBLEM - Host tools-redis-1001 is DOWN: CRITICAL - Host Unreachable (10.68.22.56)
[18:02:56] @quiet shinken-wm
[18:03:03] not sure if that's good enough?
[18:03:06] let's see
[18:03:49] PROBLEM - Host tools-exec-1203 is DOWN: CRITICAL - Host Unreachable (10.68.16.133)
[18:03:49] PROBLEM - Host tools-webgrid-lighttpd-1204 is DOWN: CRITICAL - Host Unreachable (10.68.18.49)
[18:03:51] :/
[18:04:21] PROBLEM - Host tools-webgrid-lighttpd-1401 is DOWN: CRITICAL - Host Unreachable (10.68.16.34)
[18:04:47] PROBLEM - Host tools-exec-1405 is DOWN: CRITICAL - Host Unreachable (10.68.18.3)
[18:05:09] PROBLEM - Host tools-k8s-etcd-03 is DOWN: CRITICAL - Host Unreachable (10.68.21.239)
[18:05:13] PROBLEM - Host tools-exec-1210 is DOWN: CRITICAL - Host Unreachable (10.68.17.147)
[18:05:21] @bd808: I compiled the jdbc-driver into my war-file and deployed. I does not work and I think it - conceptually - cannot, as the driver probably needs to be in the servers classpath, not in the classpath of the servlet?
[18:05:26] apparently not
[18:50:40] hi madhuvishy. question: there is a Tools maintenance on the way, right? and this is the one that may result in tools' unavailability for up to 48 hours?
[18:51:03] leila: I just sent an email! It was tomorrow, but we had to push it
[18:51:13] leila: hey! there is a general labs maintenance underway, but that's different from the one madhuvishy announced
[18:51:28] this should only have partial disruption on and off for a bit.
[18:51:37] leila: anything you want me to keep an eye on wlm-wise?
[18:51:49] yeah yuvipanda. thanks.
[18:51:58] madhuvishy: cool. /me goes to your email.
[18:53:06] \o/, madhuvishy. just in time. :D
[18:53:50] leila: :)
[18:54:38] yuvipanda: we may need your support if the maintenance stays at 11/14. that's during WLM international jury process, and we can't have Montage down for 48 hours.
[18:54:47] can you let me know how you can help, yuvipanda?
[18:55:03] leila: sure! I can work with you to make sure we have montage available uninterrupted in that time
[18:55:05] or even, madhuvishy: any chance that scheduled time-window be reconsidered?
[18:55:17] leila: I'm also fairly sure it won't take 48 hours
[18:55:22] yuvipanda: thank you. :)
[18:55:25] leila: it's easier for us to make montage be available rather than resched it
[18:55:33] I see, yuvipanda.
[18:55:47] leila: can you setup an email thread between me and the people doing montage now?
[18:55:53] leila: aah, it's the only slot available for us before I leave to India. chase is unavailable next week
[18:56:17] thanks yuvipanda!
[18:56:46] np madhuvishy
[18:59:45] gradzeichen: yeah. I think you may be right about that for Tomcat. I need to grab some lunch first but I can try to dig into the config files for how we have tomcat running on the job grid. I'm sure there is a way to make things work. It may even be that we already have some mysql jdbc driver jar in the path.
[19:02:54] bd808: is LDAP having problems?
[19:03:02] 2016-11-01T19:02:21.189156+00:00 oxygen nslcd[1065]: [334873] no available LDAP server found: Server is unavailable
[19:03:02] 2016-11-01T19:02:21.210945+00:00 oxygen diamond[536]: sudo: ldap_start_tls_s(): Can't contact LDAP server
[19:03:02] 2016-11-01T19:02:21.223114+00:00 oxygen diamond[536]: sudo: unable to resolve host oxygen
[19:03:29] and he didn't found chronium, hydrogen, acamar, and achernar .wikimedia.org
[19:04:02] so I can't login via ssh key at oxygen.rcm.eqiad.wmflabs
[19:07:45] can a labs-admin please take a look at that instance? I can't ssh into it
[19:07:54] I will be back in ~30-45 min after dinner
[19:10:27] !log tools depooled tools nodes from labvirt1004 and 1007
[19:10:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:18:30] yuvipanda hi it seems that grrrit-wm wont reconnect
[19:18:33] Sagan, will be a dns problem rather than an ldap problem
[19:18:43] if you can't connect to chromium, hydrogen, acamar and achernar
[19:18:51] when it said ping timeout, i ssh in to check, and when doing this
[19:18:52] kubectl get pods
[19:19:02] it shows
[19:19:02] tools.lolrrit-wm@tools-bastion-03:~/lolrrit-wm$ kubectl get pods
[19:19:02] Unable to connect to the server: dial tcp 10.68.17.142:6443: getsockopt: no route to host
[19:19:09] paladox: because there's ongoing labs maintenance, and the kubernetes master is just being restarted
[19:19:17] Oh
[19:19:18] ok
[19:20:03] yuvipanda, what about the network?
[19:20:22] non-tools instances shouldn't be seeing problems connecting to dns right?
[19:20:28] It seems i carn't ssh into a new instance i created
[19:20:47] Krenair: it should be fine, but my suspicion on Sagan's issue is probably an instance that hasn't run puppet in a long long time
[19:20:56] and has wrong ldap and puppetmaster addresses
[19:20:57] I managed to ssh in the first time, then it just stalled, then i closed my console and reopended it and tryed ssh in and it is not working
[19:21:07] $ ssh bot-gerrit
[19:21:07] channel 0: open failed: connect failed: No route to host
[19:21:07] stdio forwarding failed
[19:21:08] ssh_exchange_identification: Connection closed by remote host
[19:21:16] paladox: ongoing labs maintenance, the bastion hosts are also probably restarting just now
[19:21:18] so hang on :)
[19:21:23] Oh
[19:21:25] thanks
[19:22:02] Oh, but i can ssh into one instance but carn't into the new one.
[19:22:27] hmm might be something else. I can't really help right now tho, sorry
[19:22:31] Ok
[19:22:42] Krenair: but yeah, network in general should be untouched
[19:22:48] ok
[19:23:35] it seems to be stuck on rebooting
[19:26:34] 10Tool-Labs-tools-Other: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#2760981 (10harej-NIOSH) Going to close this task as complete since the metrics crunching is now underway; T149642 is the task for implementing the UI.
[19:26:35] 10Tool-Labs-tools-Other: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#2760981 (10harej-NIOSH) Going to close this task as complete since the metrics crunching is now underway; T149642 is the task for implementing the UI.
[19:26:47] 10Tool-Labs-tools-Other: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#2760983 (10harej-NIOSH) 05Open>03Resolved
[19:26:49] 10Tool-Labs-tools-Other: create tool to crunch metrics for views (play started) of video and audio files - https://phabricator.wikimedia.org/T116363#2760983 (10harej-NIOSH) 05Open>03Resolved
[19:28:05] 06Labs, 10grrrit-wm: Request creation of labs project - https://phabricator.wikimedia.org/T149733#2761000 (10Paladox)
[19:28:06] 06Labs, 10grrrit-wm: Request creation of labs project - https://phabricator.wikimedia.org/T149733#2761000 (10Paladox)
[19:28:17] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761014 (10Paladox)
[19:28:19] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761014 (10Paladox)
[19:30:23] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761000 (10yuvipanda) This should just be another tool rather than a labs project I think.
[19:30:29] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761000 (10yuvipanda) This should just be another tool rather than a labs project I think.
[19:31:03] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761018 (10Paladox) Oh then tool please?
[19:31:07] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761018 (10Paladox) Oh then tool please?
[19:31:18] Why did it do ^^ that twice
[19:31:54] now is not really a good time to try to do this paladox, it could be caught in a reboot during the op
[19:31:57] I would wait this maint out
[19:32:07] chasemp that wasent me
[19:32:13] i doint do wikibugs
[19:34:29] I'll deal with wikibugs once the maintenance is done
[19:34:39] !log tools migrate tools-elastic-03 to labvirt1009
[19:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:51] Howdy. We just deployed our beta Wikipedia Library Card platform. It was working well but now after announcing new signups we are getting 502 errors on all pages. I contacted our developer, but can anyone else see what's going on here?
[19:40:52] http://twl-test.wmflabs.org/
[19:43:53] Ocaasi: we are currently in a large maintenance period which while it should be relatively short comes with maximum volatility
[19:44:03] ah, good to know!
[19:44:07] a message to labs-announce should come at the end
[19:52:00] !log tools move tools-elastic-03 to labvirt1010, -02 already in 09
[19:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:00:51] is someone free to take a look at my instance issue?
[20:01:40] Sagan: check out status in the topic, maint in progress labs wise
[20:01:42] wide even
[20:02:54] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761118 (10Paladox) 05Open>03declined
[20:02:55] 06Labs, 07Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2761119 (10Paladox)
[20:02:56] 06Labs, 10grrrit-wm: Request creation of grrrit-wm-test labs project - https://phabricator.wikimedia.org/T149733#2761118 (10Paladox) 05Open>03declined
[20:03:02] 06Labs, 07Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2761119 (10Paladox)
[20:03:16] Sagan: can you file a bug on phabricator?
[20:03:31] I looked at it a tiny bit, and you've a really strange resolv.conf and I've no idea how that happened [20:04:29] yuvipanda: ok [20:05:05] Sagan: try now [20:05:31] yuvipanda: nice, that works [20:05:39] still need to file a bug? :) [20:05:42] what did you do? [20:05:45] Sagan: can you still file a bug so I can track this? [20:05:51] ok [20:07:05] (03PS5) 10Paladox: Adds a grrrit-wm restarting command for you to type in irc [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/318976 (https://phabricator.wikimedia.org/T149609) [20:08:27] !log tools depooled things on labvirt1006 and 1008 [20:08:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:08:46] 06Labs, 10Labs-Infrastructure, 10Labs-project-other: Can't ssh to oxygen.rcm.eqiad.wmflabs - https://phabricator.wikimedia.org/T149737#2761146 (10Luke081515) [20:08:48] 06Labs, 10Labs-Infrastructure, 10Labs-project-other: Can't ssh to oxygen.rcm.eqiad.wmflabs - https://phabricator.wikimedia.org/T149737#2761146 (10Luke081515) [20:09:08] yuvipanda: ^ [20:09:14] 10Labs-project-Wikistats: new possible wikifarms / hives detected - check for lists - https://phabricator.wikimedia.org/T38570#2761161 (10RobiH) Loads of new data here: https://wikiapiary.com/ https://wikiapiary.com/wiki/Websites https://wikiapiary.com/wiki/Farm:Farms Maybe even potential to join forces with t... [20:09:15] 10Labs-project-Wikistats: new possible wikifarms / hives detected - check for lists - https://phabricator.wikimedia.org/T38570#2761161 (10RobiH) Loads of new data here: https://wikiapiary.com/ https://wikiapiary.com/wiki/Websites https://wikiapiary.com/wiki/Farm:Farms Maybe even potential to join forces with t...
[20:09:56] 06Labs, 10Labs-Infrastructure, 10Labs-project-other: Can't ssh to oxygen.rcm.eqiad.wmflabs - https://phabricator.wikimedia.org/T149737#2761166 (10yuvipanda) 05Open>03Resolved a:03yuvipanda Somehow the instance's /etc/resolv.conf got to: ``` root@oxygen:~# cat /etc/resolv.conf domain rcm. search rcm.... [20:09:58] 06Labs, 10Labs-Infrastructure, 10Labs-project-other: Can't ssh to oxygen.rcm.eqiad.wmflabs - https://phabricator.wikimedia.org/T149737#2761166 (10yuvipanda) 05Open>03Resolved a:03yuvipanda Somehow the instance's /etc/resolv.conf got to: ``` root@oxygen:~# cat /etc/resolv.conf domain rcm. search rcm.... [20:12:37] !log tools.stashbot Test message after the elasticsearch vms were rearranged to live on separate physical hosts [20:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL [20:32:16] !log tools depool tools things on labvirt1005 and 1009 [20:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:34:33] Is Tool Labs going down for a reboot? [21:19:24] yuvipanda: ok, so which name should I use - wdqs-puppet.eqiad.wmflabs or wdqs-puppet.wikidata-query.eqiad.wmflabs [21:19:34] latter [21:19:44] Error: Could not retrieve catalog from remote server: Server hostname 'wdqs-puppet.eqiad.wmflabs' did not match server certificate; expected one of wdqs-puppet.wikidata-query.eqiad.wmflabs, DNS:puppet, DNS:puppet.wikidata-query.eqiad.wmflabs, DNS:wdqs-puppet.wikidata-query.eqiad.wmflabs [21:19:52] Could https://petscan.wmflabs.org/ be restarted? It throws 502... [21:19:55] yup, you need the latter [21:20:01] Is this a planned outage? [21:20:03] yuvipanda: that's what I used [21:20:21] first or just now? [21:20:29] since the error you pasted suggests otherwise? [21:20:36] yuvipanda: right now it's configured for it [21:20:37] Urbanecm: Probably, they're doing rolling reboots. [21:20:40] but it doesn't work [21:20:54] SMalyshev: i see. this is wdqs-puppet?
[21:20:58] yuvipanda: yes [21:21:03] ok, I'll take a look now [21:21:07] yuvipanda: thanks! [21:21:16] Matthew_: I was informed about a planned outage which was announced at 18:00 UTC but this time passed. Could it be this? [21:21:35] Urbanecm: it started at that time, but is ongoing [21:21:39] Urbanecm: Maybe. My instances just came up. [21:21:50] Like, 2 minutes ago. So it appears to be still ongoing. [21:22:00] yuvipanda: And when will it end? [21:22:04] Matthew_: [21:22:07] ^ [21:22:18] Urbanecm: we're done with 10 of 13 [21:22:22] I don't know, yuvipanda would have a better answer. [21:22:28] From how many? [21:22:41] Sorry, I've misread [21:22:49] of isn't the same as or :D [21:23:17] at most another hour [21:23:52] SMalyshev: I just hand fixed the master to point to 'wdqs-puppet.wikidata-query.eqiad.wmflabs' and it seems to work fine? [21:24:04] can you change that in hiera to the same value? [21:24:20] hiera now is: puppetmaster: wdqs-puppet.wikidata-query.eqiad.wmflabs [21:24:23] yuvipanda, I recreated deployment-puppetmaster02 and ran into different issues :/ [21:24:35] SMalyshev: ok. the node lgtm [21:24:44] yuvipanda: yes, works for me too, thanks! [21:24:49] SMalyshev: \o/ cool [21:24:53] Krenair: uh oh. [21:25:52] yuvipanda, I tried making it its own puppetmaster... but it has an ssl error [21:26:08] Krenair: I see. I'll look in a minute [21:26:19] ty [21:28:12] yuvipanda: so how can I manually change the puppetmaster on a host? because I changed hiera for another host, but it is not being picked up because I think it still tries to use the old one [21:28:14] Krenair: I think step 2.3 was missed in https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster#Step_2:_Setup_a_puppet_client [21:28:23] Krenair: works for me now [21:28:29] SMalyshev: you can edit /etc/puppet/puppet.conf [21:28:37] ah, ok, thanks!
[21:28:45] SMalyshev: following 2 in https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster#Step_2:_Setup_a_puppet_client should work [21:28:50] ah [21:28:55] ty [21:29:07] yuvipanda: nope, it still says: Error: /File[/var/lib/puppet/lib]: Could not evaluate: Connection refused - connect(2) Could not retrieve file metadata for puppet://wdqs-puppetmaster.wikidata-query.eqiad.wmflabs/plugins: Connection refused - connect(2) [21:29:13] uses old hostname [21:29:27] SMalyshev: there might be two entries in /etc/puppet/puppet.conf [21:29:34] (this is one of the things role::puppet::self did) [21:29:45] SMalyshev: you might have to change 'em both [21:31:23] autosigning doesn't seem to work... :( [21:32:38] SMalyshev: yeah, we discovered that yesterday. [21:32:41] I hope to have a fix for that today or tomorrow [21:34:05] !log tools depool tools nodes on labvirt1012 [21:34:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:49:00] 10Labs-project-Wikistats: new possible wikifarms / hives detected - check for lists - https://phabricator.wikimedia.org/T38570#2761612 (10Dzahn) @Robih I heard not that long ago that Wikiapiary was not maintained anymore, did somebody take that over? [21:50:26] mutante ^^ markahashberg has taken over wikiapiary [21:50:31] can't spell his name [21:51:08] mutante https://en.wikipedia.org/wiki/User:MarkAHershberger [21:51:43] 10Labs-project-Wikistats: new possible wikifarms / hives detected - check for lists - https://phabricator.wikimedia.org/T38570#416608 (10Paladox) @Dzahn wikiapiary is now maintained by @MarkAHershberger https://en.wikipedia.org/wiki/User:MarkAHershberger [21:52:03] 06Labs, 10Quarry: 502 Bad Gateway on HTTP-requests to quarry.wmflabs.org - https://phabricator.wikimedia.org/T121502#1880330 (10MuhammadShuaib) Again we face the same issue. [21:52:26] paladox: oh, hexmode !
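[editor's note] The puppet.conf fix discussed above can be sketched as follows. The hostnames are taken from the conversation; the duplicate `server =` entries are the artifact of role::puppet::self mentioned in the log, the exact file layout is an assumption, and GNU sed's in-place `-i` is assumed. The sketch works on a copy in /tmp so it is safe to dry-run anywhere:

```shell
# Simulate a puppet.conf left behind by role::puppet::self, with the
# (hypothetical) old master name appearing in two sections.
cat > /tmp/puppet.conf.example <<'EOF'
[main]
server = wdqs-puppet.eqiad.wmflabs

[agent]
server = wdqs-puppet.eqiad.wmflabs
EOF

# Point every "server =" line at the FQDN the master's certificate
# actually carries -- both entries must be changed, as noted above.
sed -i 's/^server = .*/server = wdqs-puppet.wikidata-query.eqiad.wmflabs/' /tmp/puppet.conf.example

grep -n 'server =' /tmp/puppet.conf.example
```

On a real host the file would be /etc/puppet/puppet.conf, edited as root, followed by a fresh `puppet agent --test` run.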
[21:52:45] paladox: that's cool [21:52:47] 06Labs, 10Quarry: 502 Bad Gateway on HTTP-requests to quarry.wmflabs.org - https://phabricator.wikimedia.org/T121502#2761620 (10MuhammadShuaib) 05Resolved>03Open [21:53:40] 06Labs, 10Quarry: 502 Bad Gateway on HTTP-requests to quarry.wmflabs.org - https://phabricator.wikimedia.org/T121502#1880330 (10Matthewrbowker) >>! In T121502#2761617, @MuhammadShuaib wrote: > Again we face the same issue. Appears to be related to https://lists.wikimedia.org/pipermail/labs-announce/2016-Nove... [21:54:59] !log tools stop gridengine-master on tools-grid-master in preparation for reboot [21:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [21:58:40] yuvipanda, autosigning is enabled but I moved a node over and got: [21:58:46] Info: Certificate Request fingerprint (SHA256): AA:52:00:49:A0:BA:31:98:B8:01:55:21:29:EE:30:36:E4:8A:61:40:31:39:7A:57:5A:E0:E7:D8:A5:DD:D0:71 [21:58:46] Info: Caching certificate for ca [21:58:46] Exiting; no certificate found and waitforcert is disabled [21:58:47] D8: Add basic .arclint that will handle pep8 and pylint checks - https://phabricator.wikimedia.org/D8 [21:59:11] do I need to restart the puppetmaster for that? [21:59:28] Krenair: good question. 
[21:59:39] Krenair: I think you might have to restart puppetmaster at least once after turning on autosigning [21:59:43] ah [21:59:49] puppet doesn't handle that I suppose [21:59:54] I'm not sure tho - that should've happened automatically I think [22:00:38] !log deployment-prep started moving nodes back to the new puppetmaster [22:00:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [22:01:21] 10Labs-project-Wikistats: new possible wikifarms / hives detected - check for lists - https://phabricator.wikimedia.org/T38570#2761633 (10Dzahn) Thanks Paladox, that's good news :) [22:08:42] 06Labs, 10Quarry: 502 Bad Gateway on HTTP-requests to quarry.wmflabs.org - https://phabricator.wikimedia.org/T121502#2761641 (10Matthewrbowker) 05Open>03Resolved Works for me now. Is related to my previous comment. [22:13:26] I'm getting "Danger: There was an error submitting the form. Please try again." when I try to apply the vagrant lxc role to an instance I just launched as a replacement for one that stopped responding [22:13:54] Is this something I should just wait until things stabilize before attempting again? [22:19:46] !log deployment-prep stopped and started udp2log-mw on -fluorine02 [22:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [22:21:15] !log deployment-prep started mysql on -db04 [22:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [22:21:52] okay [22:21:56] well it's back in read-only [22:22:10] bearloga: try in a couple mins? [22:22:14] !log deployment-prep started mysql on -db03 to hopefully pull us out of read-only mode [22:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL [22:23:31] andrewbogott: Killing grid nodes with running jobs on them? :-( [22:23:53] ? 
[22:23:55] multichill: I usually drain them before killing them [22:24:40] I just noticed one of the jobs I fired up before I left was killed with; " Broadcast message from root@tools-exec-1402" [22:24:51] " The system is going down for power off NOW!" [22:25:04] Don't you guys know by now how to operate a grid? [22:25:13] clearly not. [22:25:54] multichill: we announced that we're going to do this last week. [22:26:09] yuvipanda: works now :) [22:26:15] bearloga: cool! [22:26:32] You're operating a grid. Why is my job scheduled on a node that is going down in less than an hour? [22:26:37] yuvipanda: idk if/what you did but thank you :) [22:26:48] multichill: this attitude isn't going to help you. [22:27:20] multichill: please tone it down, and I'm going to ignore you until you're asking a bit more productive questions. Thanks. [22:27:25] puppet syntax errors will cause 'Could not find class' right? [22:27:38] Krenair: could, yeah. [22:27:40] (unrelated to my puppetmaster move, this just happened to come up) [22:27:51] so there is Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::cache::text for deployment-cache-text04.deployment-prep.eqiad.wmflabs on node deployment-cache-text04.deployment-prep.eqiad.wmflabs [22:27:53] Krenair: depending on where and what context it's happening in. [22:27:59] this is using the old puppetmaster [22:28:08] which has: modules/role/manifests/cache/text.pp:class role::cache::text( [22:28:27] 'Current branch production is up to date.' [22:29:46] yuvipanda: Sorry man. Just sad. Is this by design or just an oops? [22:30:19] multichill: I think unfortunately it's by design. 
we tried doing that a few times before (where we'd first stop new jobs from running, then drain restartables, wait for nonrestartables to finish) [22:30:40] multichill: however, that took multiple days, and we also have no clear way of figuring out things that really aren't restartable vs things running from cron :( [22:30:42] it takes MUCH longer to do things that way, and we're only barely fitting this in in one day as it is :( [22:31:22] In a previous job I used to be involved in grids slightly bigger (1000 nodes) and draining properly was a sport [22:32:40] multichill: I think if we figured out the cron stuff better we'd be able to do it too. plus a lot of jobs just run forever, and we've no way of knowing [22:32:47] so things would have to abruptly die at some point anyway [22:32:55] and this allows us to get it over with as quickly as possible [22:33:25] 06Labs, 10Labs-Infrastructure, 10Labs-project-other: Can't ssh into xenon.rcm.eqiad.wmflabs - https://phabricator.wikimedia.org/T149750#2761673 (10Luke081515) [22:33:58] Yeah, you need expected runtime to be able to schedule properly during drain [22:34:03] yeah [22:34:04] and we don't have that [22:34:13] which means we can't really plan [22:34:35] yuvipanda, know anything about this on shinken-01? [22:34:39] The last Puppet run was at Tue Nov 1 18:03:20 UTC 2016 (263 minutes ago). Puppet is disabled. reason not specified [22:34:43] Does jsub support it now? I haven't really checked. I had some jobs that went nowhere and a time limit would have prevented some lost cycles there [22:34:47] multichill: but I understand the frustration, and apologize. [22:34:57] multichill: yeah, I think it does, but not really sure what the option is [22:36:50] multichill: also, we're rebooting tools-login (tools-bastion-03) again, just sent a message...
[22:38:20] As long as you're staying clear from tools-exec-1412, I'm good ;-) [22:39:54] multichill: even otherwise, we'd need to force everyone to specify it if we want to be able to plan this properly [22:39:55] and that's really the biggest problem. [22:39:57] multichill: that is also restarting [22:40:09] (a bunch of nodes didn't pick up kernel upgrades when we rebooted the labvirts) [22:40:39] yuvipanda: How do I submit my job now in a way that it doesn't get killed again? [22:40:41] multichill: this is a bit of a snafu tho, we'll try to automate this next time so we don't repool before checking kernels on the instances [22:40:59] multichill: wait for maybe 10mins, will probably get an 'all clear, we are done with maint', and then go ahead? [22:41:10] @bd808: Any news on the tomcat config yet? [22:41:12] sleep 3600; jsub..... [22:41:26] multichill: also with the k8s backend for jsub (which I hope to start working on soon), we'll probably have other options [22:41:38] what I want is a 'restart this job if the node goes away, and run it until the job errors or completes' [22:41:46] which I think would've made your situation less painful [22:41:54] so you wouldn't have to manually resubmit [22:42:09] In this case I'm creating tons of new items on Wikidata [22:42:10] !log tools.admin Accidentally deleted the main webservice job to tools.wmflabs.org. Should be back up now. [22:42:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL [22:42:24] Sometimes the database has lag and I end up creating tons of dupes if I'm not careful [22:42:25] Heh [22:42:34] multichill: right [22:42:40] yuvipanda: bastion-02 is all clear? [22:42:48] multichill: yeah [22:43:01] gradzeichen: gah. Haven't had time to look yet. Give me 10 minutes. [22:43:03] Sleep 3600 it is :-) [22:43:14] * bd808 is horribly forgetful today [22:44:20] yuvipanda: Dirty cow? 
[22:44:33] multichill: yeah [22:44:43] this is already far later than we'd have liked [22:45:04] multichill: but we also wanted to do a bunch of other upgrades that also needed restarts, so.. [22:45:15] !log tools.admin Restarted toolhistory job on trusty and fixed bigbrother entry to launch it there in the future [22:45:17] It wasn't rated that high anyway [22:45:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL [22:46:35] multichill: ok, all clear now [22:48:19] Urbanecm: I see petscan is still down [22:48:26] it probably didn't come back from the restart properly [22:48:38] Urbanecm: can you file a bug, and I'll look into it now? [22:48:57] bd808: btw, bigbrother is spamming about something admin [22:49:13] arg [22:49:18] yuvipanda, bloody puppet [22:49:23] 'could not find class'?! [22:49:34] Turns out, it was an extra 0xC2 which it didn't like in the class file! [22:49:38] aaaaaaaa [22:49:40] - # topic into�many JSON based kafka topics for further [22:49:40] + # topic into many JSON based kafka topics for further [22:49:40] lol [22:49:42] also not lol [22:49:49] yuvipanda: Thanks man, you're lucky I already filled out the survey, hehe ;-) (did you get a lot of submissions here?) [22:49:56] multichill: :D [22:50:06] multichill: I am not sure - bd808 is doing it this year [22:50:19] so what else needs fixing... [22:50:32] right, yuvipanda know anything about the shinken-01 puppet disabling? [22:50:39] !log tools.admin bigbrother sucks. use `jstart` instead of `jsub -once -continuous` [22:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.admin/SAL [22:50:44] Krenair: yeah, that was me. I should've left a message :| [22:51:01] Krenair: feel free to re-enable. it was to kill ircecho and shut up shinken-wm [22:51:12] indeed.
[22:51:30] I'll get to that after I've committed and uploaded this puppet patch [22:52:06] Urbanecm: it looks like magnus was running petscan inside a screen, and that died with the reboot [22:52:16] Urbanecm: I don't know how to bring it back, so will have to ping magnus [22:52:38] yuvipanda: I think I fixed the bb spam but it looks like it queued a ton of them [22:52:47] bd808: yeah [22:52:48] not surprised [22:53:07] like multiple per minute! [22:53:51] bd808: :D [22:54:32] yuvipanda: I just submitted job 9992829, party at 10000000? :P [22:54:38] multichill: :D [22:54:49] :appropriate-emoji: [22:54:53] !log shinken enabled puppet, shinken is back [22:54:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL [22:55:03] ty Krenair [22:55:45] !log petscan start petscan inside a screen for magnus [22:55:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Petscan/SAL [22:56:24] Urbanecm: I've tried to start it, and it looks like it might work [22:56:31] yuvipanda, https://gerrit.wikimedia.org/r/#/c/319243/ [22:56:32] multichill: I know it rolls over at some point, not sure when [22:56:34] RECOVERY - Puppet run on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:57:02] Krenair: can you get a +1 from 5 different people with at least 3 different hair colors? [22:57:02] PROBLEM - Puppet run on tools-puppetmaster-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [22:57:12] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.94 ms [22:57:14] I modified a comment [22:57:20] ty [22:57:20] RECOVERY - Host tools-worker-1005 is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms [22:58:04] PROBLEM - SSH on tools-docker-builder-02 is CRITICAL: Connection refused [22:58:43] I think I just replaced c2 a0 with 20 [22:59:21] what was next...
right, deployment-mira ssh via keyholder getting denied [22:59:31] andrewbogott: can you bump tools-docker-02? It doesn't seem to be back up [22:59:45] gradzeichen: Ok, I found the startup script that gets run for the tomcat backend -- https://phabricator.wikimedia.org/diffusion/OSTW/browse/master/scripts/deprecated-tomcat-starter [23:00:20] Krenair: want any help in moving other nodes? or do you wanna do them one by one? [23:01:26] nah [23:01:29] I can deal with that [23:01:33] This, however: [23:01:37] Nov 1 23:00:22 deployment-mediawiki04 sshd[9136]: Failed publickey for mwdeploy from 10.68.20.135 port 60658 ssh2: RSA cb:43:c3:76:3a:68:b0:29:1a:8f:3f:31:1f:87:f8:7c [23:01:43] Nov 1 23:00:38 deployment-mediawiki04 sshd[9144]: Accepted publickey for mwdeploy from 10.68.21.205 port 56836 ssh2: RSA cb:43:c3:76:3a:68:b0:29:1a:8f:3f:31:1f:87:f8:7c [23:01:48] yuvipanda: tools-docker-02? In what project? [23:01:49] THEY'RE THE SAME KEY [23:01:58] I don't see anything with that name in tools [23:02:02] andrewbogott: tools-docker-builder-02 [23:02:08] sorry, too many words [23:02:18] Krenair: ouch [23:02:27] Krenair: check if there's a warning about rdns above that maybe? [23:02:36] I ran into something like that with clush [23:02:36] not that I see [23:02:49] there is this though [23:02:50] Nov 1 23:00:22 deployment-mediawiki04 sshd[9136]: pam_access(sshd:account): access denied for user `mwdeploy' from `deployment-mira.deployment-prep.eqiad.wmflabs' [23:02:54] correct hostname [23:02:55] ah right [23:03:09] maybe a security/access.conf thing? [23:03:19] no PTR weirdness on that IP [23:03:30] ok [23:03:39] possibly access.conf [23:03:44] yuvipanda: do you know if the tomcat backend even works on our trusty webgrid nodes? [23:04:04] bd808: the realistic answer is 'no'. I know that it worked when I last touched it. 
[23:04:07] ugh [23:04:08] +:ALL:LOCAL [23:04:08] + : mwdeploy : 10.68.21.205 [23:04:09] -:ALL EXCEPT (project-deployment-prep) root:ALL [23:04:10] great [23:04:34] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22) [23:05:09] Yeah this is local puppet stuff [23:05:15] Can't believe no one else has ever noticed it though [23:06:34] PROBLEM - Host tools-docker-builder-02 is DOWN: CRITICAL - Host Unreachable (10.68.23.105) [23:08:08] RECOVERY - Host tools-docker-builder-02 is UP: PING OK - Packet loss = 0%, RTA = 0.61 ms [23:08:25] yuvipanda: that host says [23:08:28] Welcome to emergGive root password for maintenance [23:08:28] (or type Control-D to continue): [23:08:53] so it's hosed? [23:08:59] yuvipanda: I'm trying to wrap up so I can go to dinner, but if that's user-facing I can dig deeper... [23:09:14] Definitely hosed, may be possible to rescue with an offline mount [23:09:18] andrewbogott: it's just me and bd808 facing, so it can wait for tomorrow [23:09:21] that's not one of the ones I did a special reboot for just now is it? [23:09:31] andrewbogott: we can try a minimal rescue, and if that doesn't work I can rebuilkd [23:09:33] *rebuild [23:10:05] ok [23:10:11] remind me tomorrow to dig deeper [23:12:31] andrewbogott: ok! [23:20:42] yuvipanda: andrewbogott: Sorry for being a bit grumpy before [23:21:25] (03PS6) 10Paladox: Adds a grrrit-wm restarting command for you to type in irc [labs/tools/grrrit] - 10https://gerrit.wikimedia.org/r/318976 (https://phabricator.wikimedia.org/T149609) [23:24:13] multichill: no worries! 
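[editor's note] For context on the `pam_access(sshd:account): access denied` failure above: /etc/security/access.conf is read top to bottom and the first matching rule wins, so the deploy hosts' IPs have to appear in a `+` rule before the catch-all `-` rule. A sketch of the fixed file, with the IPs copied from the conversation; the surrounding rules mirror the pasted diff and are otherwise assumptions:

```
# /etc/security/access.conf -- format is  permission : users : origins
# First matching rule wins, so order matters.
+ : ALL : LOCAL
+ : mwdeploy : 10.68.21.205 10.68.20.135
- : ALL EXCEPT (project-deployment-prep) root : ALL
```

Leaving a deploy host's IP out of the `+` line makes mwdeploy fall through to the final `-` rule, which is exactly the "same key, different result" symptom seen in the log.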
understandable, and thank you for understanding our plight too :) [23:25:21] Okay, success [23:28:17] yuvipanda, https://gerrit.wikimedia.org/r/319249 [23:28:39] these are beta-only classes [23:29:12] seem to be duplicates but I'm not solving that problem right now [23:29:22] it behaves as expected [23:30:50] Krenair: ok [23:30:59] Krenair: I assume deployment_hosts is hand maintained in hiera and has the right IPs? [23:31:20] it does have the right IPs [23:31:27] -+ : mwdeploy : 10.68.21.205 [23:31:27] ++ : mwdeploy : 10.68.21.205 10.68.20.135 [23:31:33] the extra one is correct [23:31:40] it's actually maintained in the puppet repo though [23:31:47] in modules/network/manifests/constants.pp [23:32:00] Krenair: I merged it [23:32:05] ah ok [23:32:06] nice [23:32:35] So, my next thing is [23:32:43] Think I'm going back to moving puppetmasters [23:33:01] * yuvipanda nods [23:34:27] @bd808: so, if you add mysql-connector-java-5.1.40-bin.jar to $CLASSPATH then it might work? [23:35:45] gradzeichen: I think I may have figured it out. $CATALINA_HOME points to /data/project/$tool/public_tomcat. 
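[editor's note] The jar placement being worked out here can be sketched as below. Tomcat 7's common class loader picks up jars from $CATALINA_HOME/lib, making them visible to both the container and every webapp, whereas a jar inside the war's WEB-INF/lib is private to that app. Paths here are throwaway stand-ins so the sketch is safe to run; on Tool Labs the real directory would be /data/project/<tool>/public_tomcat:

```shell
# Stand-in CATALINA_HOME; substitute the tool's real public_tomcat directory.
CATALINA_HOME=/tmp/demo-public_tomcat
mkdir -p "$CATALINA_HOME/lib"

# Stand-in for the real driver jar named in the conversation.
touch /tmp/mysql-connector-java-5.1.40-bin.jar

# Drop the driver into the common class loader's search path.
cp /tmp/mysql-connector-java-5.1.40-bin.jar "$CATALINA_HOME/lib/"

ls "$CATALINA_HOME/lib"
```

After copying the jar, the tomcat process has to be restarted for the common loader to pick it up.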
According to https://tomcat.apache.org/tomcat-7.0-doc/class-loader-howto.html it should magically load jar files found in $CATALINA_HOME/lib [23:36:04] Krenair: I'm going afk to turn on some laundry, I'll be back in 5-10min [23:36:13] yuvipanda, k, thanks [23:36:37] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [23:36:48] autosigning appears to work \o/ [23:37:00] gradzeichen: it should also work to put the mysql jar in /WEB-INF/lib/ in your jar file, but that will only let your app code see it and not the outer tomcat process [23:40:17] Hi it seems that login.tools.wmflabs.org is really slow [23:40:32] it is slow to ssh in and i have an 80mbps connection [23:43:29] oh, nope [23:43:30] it doesn't [23:44:37] gradzeichen: I added some notes at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Java_.28Tomcat.29 -- If you get things to actually work it might be nice to review and expand those docs. [23:46:50] Krenair: am back [23:47:02] https://docs.puppet.com/puppet/3.8/reference/ssl_autosign.html talks about having a whitelist file [23:47:43] Krenair: hmm, it's the same thing we use for the labs master [23:48:16] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218) [23:49:24] Krenair: I see [23:49:25] > autosign = true [23:49:40] Krenair: which should work [23:49:52] Krenair: although, I guess we should at least do the whitelist domains [23:53:32] Krenair: given our environment, I think I see no difference between all 3 autosigning methods [23:53:35] they're all equivalent to naive [23:53:36] yuvipanda, see deployment-cache-text04 [23:53:43] ok looking [23:54:56] krenair@deployment-puppetmaster02:~$ sudo puppet cert list [23:54:56] Warning: Setting templatedir is deprecated.
See http://links.puppetlabs.com/env-settings-deprecations [23:54:56] (at /usr/lib/ruby/vendor_ruby/puppet/settings.rb:1139:in `issue_deprecation_warning') [23:54:57] "deployment-cache-text04.deployment-prep.eqiad.wmflabs" (SHA256) 26:FB:13:42:5E:E4:70:7E:66:55:5A:73:76:EE:DF:35:B7:4D:A0:5C:41:FC:33:38:7B:DD:57:D8:7C:1A:26:8A [23:54:57] D8: Add basic .arclint that will handle pep8 and pylint checks - https://phabricator.wikimedia.org/D8 [23:55:55] right [23:56:38] * legoktm pets stashbot [23:57:18] Krenair: am going to compare the config with labcontrol1001 [23:57:53] Krenair: ok, there's no material difference
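[editor's note] On the autosigning discussion above: in Puppet 3.8 the master's `autosign` setting accepts `true` (naive, sign every request), `false`, or a path to a whitelist file containing one certname or glob per line. The whitelist variant the log leans toward would look roughly like this; the conventional file path is shown, and the `*.eqiad.wmflabs` glob is an assumption based on the labs hostnames in this log:

```ini
# /etc/puppet/puppet.conf on the puppetmaster -- instead of "autosign = true",
# point at a whitelist file:
[master]
autosign = /etc/puppet/autosign.conf
```

/etc/puppet/autosign.conf would then contain e.g. `*.eqiad.wmflabs`, one glob per line. As noted in the log, when every host that can reach the master already matches the glob, this is effectively equivalent to naive autosigning.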