[00:32:02] !ping
[00:32:02] !pong
[00:32:31] !ping
[00:32:32] !pong
[00:35:48] !pung
[00:35:48] !derp
[00:36:29] <^d> !peng
[01:23:24] Wikimedia Labs / Infrastructure: Difeerent results with queries in labs versus production - https://bugzilla.wikimedia.org/72413 (nuria) NEW p:Unprio s:normal a:None Different results with queries in labs versus production around september 21st. The following query returns different (very...
[01:23:38] Wikimedia Labs / Infrastructure: Different results with queries in labs versus production - https://bugzilla.wikimedia.org/72413 (nuria)
[05:45:32] who runs quarry
[05:46:09] YuviPanda|zzzz: ping
[05:48:35] petan: ping
[05:48:35] Hi PiRSquared17, you just managed to say pointless nick: ping. Now please try again with some proper meaning of your request, something like nick: I need this and that. Or don't do that at all, it's very annoying. Thank you
[05:48:45] okay
[05:50:23] is anyone here?
[06:02:56] awful
[06:23:25] Wikimedia Labs / tools: Missing page revisions on enwiki - https://bugzilla.wikimedia.org/72226 (Sean Pringle) a:Marc A. Pelletier>Sean Pringle
[06:23:39] Wikimedia Labs: Discrepancy between enwiki_p.pagelinks on labs and production - https://bugzilla.wikimedia.org/71176 (Sean Pringle) a:Sean Pringle
[06:24:43] Wikimedia Labs / tools: Rows missing from enwiki_p on s1, c1 - https://bugzilla.wikimedia.org/71084 (Sean Pringle) a:Marc A. Pelletier>Sean Pringle
[06:27:24] Wikimedia Labs / tools: Missing page revisions on enwiki - https://bugzilla.wikimedia.org/72226#c1 (Sean Pringle) NEW>ASSI Sync in progress. Cause is not yet confirmed, with https://mariadb.atlassian.net/browse/MDEV-6551 a possibility. Very interested to hear if anyone observes this with recent...
[06:28:09] Wikimedia Labs / tools: Rows missing from enwiki_p on s1, c1 - https://bugzilla.wikimedia.org/71084#c2 (Sean Pringle) NEW>ASSI Sync in progress. Cause is not yet confirmed, with https://mariadb.atlassian.net/browse/MDEV-6551 a possibility. Very interested to hear if anyone observes this with rec...
[06:28:23] Wikimedia Labs / Infrastructure: missing database entries at categorylinks table on dewiki db - https://bugzilla.wikimedia.org/70711#c1 (Sean Pringle) Sync in progress. Cause is not yet confirmed, with https://mariadb.atlassian.net/browse/MDEV-6551 a possibility. Very interested to hear if anyone obse...
[06:28:25] Wikimedia Labs: Discrepancy between enwiki_p.pagelinks on labs and production - https://bugzilla.wikimedia.org/71176#c1 (Sean Pringle) Sync in progress. Cause is not yet confirmed, with https://mariadb.atlassian.net/browse/MDEV-6551 a possibility. Very interested to hear if anyone observes this with rece...
[06:29:09] Wikimedia Labs / Infrastructure: missing database entries at categorylinks table on dewiki db - https://bugzilla.wikimedia.org/70711 (Sean Pringle)
[06:29:09] Wikimedia Labs / tools: Missing page revisions on enwiki - https://bugzilla.wikimedia.org/72226 (Sean Pringle)
[06:29:10] Wikimedia Labs / tools: Rows missing from enwiki_p on s1, c1 - https://bugzilla.wikimedia.org/71084 (Sean Pringle)
[06:30:26] Wikimedia Labs / Infrastructure: missing database entries at categorylinks table on dewiki db - https://bugzilla.wikimedia.org/70711 (Sean Pringle)
[06:30:26] Wikimedia Labs / tools: Rows missing from enwiki_p on s1, c1 - https://bugzilla.wikimedia.org/71084 (Sean Pringle)
[06:30:26] Wikimedia Labs: Discrepancy between enwiki_p.pagelinks on labs and production - https://bugzilla.wikimedia.org/71176 (Sean Pringle)
[06:42:41] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (20.00%)
[06:44:41] Wikimedia Labs / Infrastructure: Different results with queries in labs versus production - https://bugzilla.wikimedia.org/72413 (Sean Pringle) a:Sean Pringle
[06:45:11] Wikimedia Labs / tools: Missing page revisions on enwiki - https://bugzilla.wikimedia.org/72226 (Sean Pringle)
[06:45:11] Wikimedia Labs / Infrastructure: Different results with queries in labs versus production - https://bugzilla.wikimedia.org/72413 (Sean Pringle)
[06:45:11] Wikimedia Labs / Infrastructure: Different results with queries in labs versus production - https://bugzilla.wikimedia.org/72413#c1 (Sean Pringle) Sync in progress. Cause is not yet confirmed, with https://mariadb.atlassian.net/browse/MDEV-6551 a possibility. Very interested to hear if anyone observes...
[06:52:15] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[06:56:34] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (30.00%)
[07:06:09] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[07:47:07] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (11.11%)
[07:55:17] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[11:55:10] I've been brought out of my Wikibreak to fix my broken tools on labs, but I can't even get in. I keep getting A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[11:55:32] Coren, ^
[12:20:51] !ping
[12:20:51] !pong
[12:27:51] YuviPanda: http://quarry.wmflabs.org/query/808
[12:28:03] PiRSquared: hey! I responded to your PM
[12:28:08] PiRSquared: I'm in the process of fixing it now.
[12:28:14] okay
[12:28:17] thanks :)
[12:28:24] PiRSquared: thanks for finding it and letting me know!
[12:28:52] I actually found it by accident when there was some HTML in an edit summary
[12:39:31] bye :-)
[12:55:55] YuviPanda, I can't seem to connect to labs.
[12:56:08] Cyberpower678: via ssh?
[12:56:15] yes.
[12:56:37] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[12:56:40] YuviPanda, ^
[12:56:47] Cyberpower678: try now? I'm monitoring the tools-login auth log
[12:57:42] YuviPanda, A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
[12:58:12] Cyberpower678: hmm, so I see no connection attempts from you. Perhaps port 22 is blocked on your network? are you able to log in via ssh elsewhere?
[12:58:47] Why would it be blocked? And where would it likely be blocked from?
[12:59:43] YuviPanda, ^
[12:59:46] Cyberpower678: your local network administrator?
[13:00:23] ssh tunneling is commonly used to get around http content filtering, so some places block port 22 as well
[13:00:23] I'm on a home network and manage the router myself. Or are you referring to my ISP?
[13:00:32] hmm, your ISP shouldn't be blocking it, no
[13:00:51] !ping
[13:00:51] !pong
[13:01:08] Well I'm in charge of the network in my apartment, and I don't recall activating any port blocks.
[13:01:58] Cyberpower678: can you try sshing to some other server somewhere else and see if it works?
[13:02:36] I don't have anywhere else to SSH into.
[13:03:17] MySQL Workbench works though.
[13:03:24] Strange
[13:03:29] oh, that is...
[13:03:30] strange
[13:03:44] Cyberpower678: try sshing to trusty.tools.wmflabs.org?
[13:04:21] ok, nevermind, I misremembered that
[13:04:29] Cyberpower678: mind if I PM?
[13:04:37] sure
[13:05:50] !ping
[13:05:50] !pong
[13:07:01] !ping?
[13:07:03] !ping
[13:07:03] !pong
[13:12:06] !ping
[13:12:06] !pong
[13:17:08] Earwig|away: I dunno what you are doing on tools-login, but it's *really* heavy and should really be done on tools-dev please.
[13:21:19] Earwig|away: Specifically, it's I/O heavy. That watch should really be set to something much higher than 1s
[13:22:04] Coren: so for Cyberpower678 sshing to tools-login fails but bastion works.
[13:22:27] I'm not fully sure what's happening. port 22 seems fine on his network
[13:23:19] xtools still kaput?
[13:23:58] warpath: Cyberpower678 is around, I expect for that reason.
[13:24:30] he destroys everything :\
[13:24:31] yeah, but he's having strange SSH issues
[13:24:49] Cyberpower678: You don't seem to be hitting the right IP at all. What, exactly, is the IP you are trying to reach?
[13:25:34] Coren, I take that as an insult.
[13:25:44] ... what?
[13:25:59] warpath: Cyberpower678 is around, I expect for that reason.
[13:26:47] What, how can you possibly take that as an insult? You *are* here because there are issues with xtools.
[13:27:09] Cyberpower678: I think he meant there are issues with xtools and you are here to attempt to fix it.
[13:27:16] and not you are causing the issues :)
[13:28:01] Yes, yes, "you are here because X is broken", not "X is broken because you are here"
[13:28:39] Cyberpower678: What is the exact hostname/IP you are attempting to connect to?
[13:29:11] Coren, oh. Yes. Someone has gotten my attention about the desperate issue with xtools, so I'm trying to fix them.
[13:29:21] So I take that back then.
[13:30:06] * YuviPanda goes afk
[13:30:13] tools-login-eqiad.wmflabs.org resolves to 198.105.244.65 Port: 22
[13:31:07] IT'S THE HOST.
[13:31:17] IT RESOLVES TO A BAD IP. :/
[13:32:12] Host tools-login-eqiad.wmflabs.org not found: 3(NXDOMAIN)
[13:32:12] That hostname has been obsolete since the end of migration, and was removed from DNS quite some time ago. I have no idea why your computer resolves it at all.
[13:32:50] Use 'tools-login.wmflabs.org' or 'login.tools.wmflabs.org' both of which are canonical.
[13:33:43] Perhaps you have something in your local hosts file?
[13:37:01] 198.105.244.65 isn't even one of the WMF's IPs, it's owned by "Search Guide Inc"; it looks like you have an evil DNS server that hijacks incorrect host names to point them at http://searchguide.windstream.net
[13:38:32] My ISP is Windstream
[13:38:39] Blame them.
[13:38:58] Coren, I thought eqiad was going to be maintained forever.
[13:39:01] Huh. That URL is a reskinned Yahoo search result page with ads. How... evil. Yep, I blame them. :-)
[13:39:24] I know. But they provide the internet.
[13:39:41] eqiad is. The hostname only existed to distinguish the two "tools-login" during the migration.
[13:40:07] * Cyberpower678 rewrites all the connection settings.
[13:40:29] Sorry, I thought that was clear during the migration.
[13:41:00] Having DC-specific hostnames prevents load balancing or HA.
[13:41:52] No biggy.
[13:42:06] * Cyberpower678 is in and has all settings rewritten.
[13:42:51] Did I miss something, or are there supposed to be 4 webgrids running on xtools?
[13:43:50] Cyberpower678: There really shouldn't be. Ew. How did /that/ happen?
[13:44:16] That might explain why xtools is down. They would likely be clashing with each other.
[13:44:17] I note one is long-running, and the other three were started in quick succession.
[13:45:25] I deleted the 3 newest ones.
[13:45:50] I'm still getting 404.
[13:45:51] Likely you'll have to restart the other one too; its entry for the proxy would have been hijacked by the latter 3
[13:45:52] Hmm...
[13:47:32] 10/8 is right around the time a corruption bug in the job list was fixed; it's likely xtools got confused as a result (there were a couple tools that had issues then) but because it "looked" up it wasn't flagged as "obviously broken"
[13:47:56] Coren, I'm a little rusty from not using the terminal, but remind me how to jsub a .sh file to run every 5 minutes?
[13:48:18] You need to stuff that in your crontab
[13:48:54] But also, if all you want is to restart the webservice, you can use bigbrother instead.
[13:49:20] Literally, just put 'webservice' in ".bigbrotherrc" in your tool's home.
[13:49:43] * Coren can do it for you if you want.
[13:50:14] Please do. I'm not up to speed with the changes here.
[13:50:41] {{done}}
[13:50:51] I note xtools is happy and running. Yeay.
[13:51:03] So what now. Will webservice restart on its own?
[13:51:17] Yes, up to three times in any 24h window.
[13:51:30] Cool.
[13:51:35] Can you do the same for supercount
[13:52:23] Sure. {{done}}
[13:52:55] Thank you.
[13:56:42] Coren, When will Big Brother kick in and restore webservice on supercount?
[13:57:27] bigbrother normally checks every minute, but changes in its configuration (like adding it) may take up to five.
[13:58:04] Webservice dies as soon as it starts now. :/
[13:58:09] But I see it trying to start it now. Look at 'bigbrother.log'
[13:58:37] 2014-10-23 13:56:36: (configfile.c.912) source: /var/run/lighttpd/supercount.conf line: 551 pos: 41 parser failed somehow near here: fastcgi.server
[13:58:52] I didn't modify it.
[13:59:03] It's been working for the past few months.
[14:00:28] Your .lighttpd.conf is subtly broken, it lacks an EOL. It may have worked by "accident" previously.
[14:01:05] Accident?
[14:01:08] * Coren fixes that
[14:01:21] It's been powering supercount for the past few months.
[14:01:54] Thank you Coren.
[14:01:56] Well, not having an EOL means that if the default config /happens/ to have a blank line after your configuration is merged, then it'll work. But that's happy coincidence, not correct. :-)
[14:02:13] At least, it didn't have a BOM at the start. :-P
[14:02:58] Looks like it's happy.
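[editor's note] The two file-level problems mentioned in the exchange above (a config fragment with no final EOL, and a BOM at the start) can be checked mechanically before a fragment gets merged into a larger config. What follows is only a minimal sketch in Python, not anything Tool Labs actually runs; the function names `check_config_bytes` and `check_config_file` are hypothetical and were made up for this illustration.

```python
import codecs
from pathlib import Path


def check_config_bytes(data: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a config fragment.

    Hypothetical helper for illustration; it only looks at the first and
    last bytes and does not parse lighttpd syntax at all.
    """
    problems = []
    # A UTF-8 byte order mark at the start can trip parsers expecting plain text.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 BOM")
    # Without a final newline, the last line runs into whatever gets
    # concatenated after the fragment when configs are merged.
    if data and not data.endswith(b"\n"):
        problems.append("missing trailing newline")
    return problems


def check_config_file(path: str) -> list[str]:
    """Read a file (path supplied by the caller) and run the byte checks."""
    return check_config_bytes(Path(path).read_bytes())


if __name__ == "__main__":
    # One fragment without a final EOL, one with it.
    print(check_config_bytes(b"fastcgi.server = ()"))
    print(check_config_bytes(b"fastcgi.server = ()\n"))
```

An empty problem list only means neither of these two byte-level issues is present; it says nothing about whether the configuration itself is valid.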
[14:03:25] Tool Labs tools / X!'s tools: Xtools offline - https://bugzilla.wikimedia.org/72104 (Cyberpower678) NEW>RESO/FIX p:Unprio>High a:Cyberpower678
[14:04:20] Cyberpower678: You should add a maintainer or two to your tools so that you aren't pulled out of vacation at the slightest problem. :-)
[14:05:13] Coren, it's hard to find one not biased towards removing optin globally.
[14:07:54] We try to respect community consensus.
[14:11:14] * Cyberpower678 is signing off.
[14:47:47] I have a labs instance that doesn't show up for the project I thought it was in
[14:47:59] any way I can verify on the host itself or find what project it does belong to?
[14:53:21] chasemp: Trivially, /etc/wmflabs-project
[14:54:03] (Assuming puppet ever ran on the box)
[14:58:46] so it shows project I thought
[14:58:52] but I don't see it under manage instances
[15:01:24] Try to log off and back on again? I know there have been issues with Wikitech lately.
[15:01:36] k
[15:20:09] chasemp: Did it help?
[15:21:34] got pulled into a meeting
[15:21:46] Meetings happen. :-)
[15:35:06] Coren: sure enough logged out=>in now it shows up
[15:35:09] thanks
[15:48:50] Gridengine puppetization is taking shape. Someone want to review https://gerrit.wikimedia.org/r/168306 ?
[17:49:31] ori, if you're around, I just got another of those suspicious intermittent 503 errors from beta labs seconds ago: https://saucelabs.com/jobs/b588a985089947578012dee238e20670 if it's HHVM there might be something in the logs just now
[18:17:52] We could successfully set up the mx for beta ( deployment-mx ) which would be a polonium equivalent for beta and patches that should go to production mx can be tested there, live - just like our BounceHandler extension
[18:18:45] after discussions, currently it's routing only our labs instance ( http://mediawiki-verp.wmflabs.org/ ) outgoing e-mails - for starters
[18:19:24] we analysed how it would respond to bounces and found that the bounces got correctly placed into mediawiki-verp's table, 'bounce-records'
[18:20:17] as a next step, we would want beta to rely on deployment-mx for outgoing mails ( once Jeff_Green makes sure the DKIM passes )
[20:53:58] YuviPanda|zzz: What's your sql for dummies web tool again?
[21:01:03] multichill: quarry.wmflabs.org
[21:02:10] legoktm: Thank you, couldn't remember the name :-)
[22:55:34] !log tools reboot tools-shadow, upstart seems hosed
[22:55:38] Logged the message, Master
[23:04:32] RECOVERY - ToolLabs: Puppet freshness check on labmon1001 is OK: OK: All targets OK
[23:54:07] *FINALLY* got a puppetized gridengine config worth a damn
[23:57:37] :D