[00:14:41] Labs: Request to create Gather labs project - https://phabricator.wikimedia.org/T89185#1029569 (Andrew) Open>Resolved a:Andrew I've created a new project, 'gather' with admin member Robmoen. Please file a new cleanup bug when you finish with the project so I can reclaim the space. -Andrew [00:14:42] Labs: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1029572 (Andrew) [00:17:01] Labs: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1029588 (Andrew) [00:17:02] Labs, Wikimedia-IEG-grant-review: Create "grantreview" labs project - https://phabricator.wikimedia.org/T88852#1029585 (Andrew) Open>Resolved a:Andrew Created. Bryan, you're the only project admin for the moment -- you can grant rights to the project's heirs as you see fit. [00:17:43] Labs, Wikimedia-Fundraising-CiviCRM, Wikimedia-Fundraising: Create new labs project: fundraising-integration - https://phabricator.wikimedia.org/T88599#1029599 (Andrew) ok... if this is going to make use of the 'integration' project can I close this bug? [00:31:10] !log grantreview Added Niharika29 as a project admin [00:31:14] Logged the message, Master [01:01:45] I am unable to setup a proxy for https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000085e.eqiad.wmflabs. Whenever I try I get http://pastie.org/pastes/9937474/text [01:02:35] I am following https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/Labs-vagrant [01:02:44] prtksxna: I’ll try [01:02:52] Thanks andrewbogott! [01:03:23] Hm, I bet the firewall needs tuning [01:04:45] Got the same error? [01:06:15] prtksxna: try now? [01:07:38] andrewbogott: Great success http://oooo.wmflabs.org/wiki/Main_Page [01:07:46] cool [01:21:46] oooo.wmflabs.org? ಠ_ಠ [01:22:28] object oriented object orientation [01:23:52] Does labs support puny code host names? 
ಠ_ಠ.wmflabs.org would be awesome as the name of a monitoring service [01:51:51] hehehe http://ಠ-ಠ.wmflabs.org/wiki/Main_Page [01:52:25] ಠ_ಠ [01:53:55] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1029727 (Aklapper) [01:54:03] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1028550 (Aklapper) sounds good to me [01:54:19] legoktm: http://╯‵д′╯彡┻━┻.wmflabs.org/wiki/Main_Page [01:54:45] That may be the name of my next laptop [01:54:51] my IRC client won't even let me click on that [01:55:08] try xn--d1a644lha820cjib27ad0264k.wmflabs.org [01:55:20] I used copy+paste [01:57:24] why is monobook the default [01:59:41] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1029759 (Legoktm) Why wasn't -devtools merged into -releng? > Now: The signal/noise ratio is too low in -releng and I think we can decide to ke... [03:06:51] Labs: Request to create Gather labs project - https://phabricator.wikimedia.org/T89185#1029825 (yuvipanda) There is already a mobile project and this instance should just live there. Any admin on the mobile project (max? Jon? Kaldari?) should be able to add you as an admin and create instances for you [03:23:18] andrewbogott: This is strange, I am suddenly getting Permission denied (publickey) for the instances I was able to SSH into an hour ago. [03:26:45] prtksxna: what instance? [03:27:20] https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000862.eqiad.wmflabs and https://wikitech.wikimedia.org/wiki/Nova_Resource:I-0000085e.eqiad.wmflabs [03:27:31] andrewbogott: I ssh into bastion and then ssh ahead [03:27:45] ‘slatetest’? [03:28:23] andrewbogott: yup [03:29:21] prtksxna: try now? 
[03:29:37] andrewbogott: Works! What changed? [03:29:52] I didn’t do anything, I just wanted to see the problem. [03:30:57] /o\ [03:31:07] yep! [03:55:27] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1029868 (greg) >>! In T89153#1029759, @Legoktm wrote: > Why wasn't -devtools merged into -releng? Because it was good to have a dedicated chann... [04:08:09] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1029877 (Dzahn) >>! In T89153#1029868, @greg wrote: > Because it was good to have a dedicated channel for the bugzilla->phab migration I think... [04:13:48] Hi, I need to request a new project for my Extension? =] [04:27:54] Labs: MediaWiki Extension "ImportArticles" project - https://phabricator.wikimedia.org/T89208#1029894 (Cblair91) NEW [04:32:31] (PS1) Greg Grossmeier: Remove Phabricator and Code-Review from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/189901 (https://phabricator.wikimedia.org/T89153) [04:32:46] (CR) jenkins-bot: [V: -1] Remove Phabricator and Code-Review from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/189901 (https://phabricator.wikimedia.org/T89153) (owner: Greg Grossmeier) [04:35:07] (PS2) Greg Grossmeier: Remove Phabricator and Code-Review from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/189901 (https://phabricator.wikimedia.org/T89153) [05:08:42] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1029923 (MZMcBride) >>! In T89153#1029868, @greg wrote: >>>! In T89153#1029759, @Legoktm wrote: >> Why wasn't -devtools merged into -releng? >... 
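The punycode question raised earlier in this log (whether a hostname such as ಠ-ಠ.wmflabs.org can be used, and the `xn--` form pasted as a workaround) can be illustrated with a small sketch. It assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper names are made up for illustration, and nothing here reflects how the Labs proxy itself handles such names.

```python
# Sketch only: convert an internationalized hostname to its ASCII ("xn--") form
# and back, using Python's built-in "idna" codec (IDNA 2003 rules).
# The helper names below are hypothetical, not part of any Labs tooling.


def to_ascii_hostname(hostname: str) -> str:
    """Encode each label of the hostname into its ASCII-compatible form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Decode an ASCII hostname with xn-- labels back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    name = "ಠ-ಠ.wmflabs.org"  # hostname taken from the chat
    ascii_name = to_ascii_hostname(name)
    print(ascii_name)                       # the form a DNS client would look up
    print(to_unicode_hostname(ascii_name))  # round-trips to the original name
```

Only the non-ASCII label is rewritten; plain labels such as `wmflabs` and `org` pass through unchanged, which is why an IRC client that cannot open the Unicode link can still open the `xn--` one.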
[06:39:53] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0] [07:39:46] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [07:55:27] PROBLEM - Puppet failure on tools-exec-12 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [08:04:46] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0] [08:04:52] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [08:25:22] RECOVERY - Puppet failure on tools-exec-12 is OK: OK: Less than 1.00% above the threshold [0.0] [08:29:59] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [08:35:09] PROBLEM - Puppet failure on tools-exec-13 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [08:36:59] PROBLEM - Puppet failure on tools-exec-08 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [08:51:28] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [09:00:07] RECOVERY - Puppet failure on tools-exec-13 is OK: OK: Less than 1.00% above the threshold [0.0] [09:05:55] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [09:06:03] PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [09:07:01] RECOVERY - Puppet failure on tools-exec-08 is OK: OK: Less than 1.00% above the threshold [0.0] [09:08:33] PROBLEM - Puppet failure on tools-webproxy is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [09:12:17] PROBLEM - Puppet failure on tools-exec-03 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [09:13:05] PROBLEM - Puppet failure on tools-mail is CRITICAL: CRITICAL: 37.50% of 
data above the critical threshold [0.0] [09:13:48] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:21:02] PROBLEM - Puppet failure on tools-exec-04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:27:03] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [09:28:35] RECOVERY - Puppet failure on tools-webproxy is OK: OK: Less than 1.00% above the threshold [0.0] [09:30:52] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [09:31:11] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [09:33:50] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0] [09:35:58] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0] [09:36:28] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0] [09:37:17] RECOVERY - Puppet failure on tools-exec-03 is OK: OK: Less than 1.00% above the threshold [0.0] [09:43:12] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [09:46:08] RECOVERY - Puppet failure on tools-exec-04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:46:58] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0] [09:56:16] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0] [09:56:35] Hi, I'm unable to connect to my VM - even after a reboot [09:56:40] $ ssh mwoffliner1.eqiad.wmflabs [09:56:40] ssh: Could not resolve hostname bastion.wmflabs.org: Name or service not known [09:56:41] ssh_exchange_identification: Connection closed by remote host [09:57:23] now it works....! Sorry... 
[10:02:22] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [10:32:22] RECOVERY - Puppet failure on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0] [10:45:27] PROBLEM - Puppet failure on tools-exec-10 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [10:47:17] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [10:54:58] PROBLEM - Puppet failure on tools-exec-14 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:02:35] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:05:45] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:17:12] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0] [11:19:41] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:22:57] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:22:57] PROBLEM - Puppet failure on tools-exec-08 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [11:24:57] RECOVERY - Puppet failure on tools-exec-14 is OK: OK: Less than 1.00% above the threshold [0.0] [11:25:53] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0] [11:26:36] PROBLEM - Puppet failure on tools-exec-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:27:36] RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0] [11:32:15] PROBLEM - Puppet failure on tools-exec-15 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:32:27] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: 
CRITICAL: 62.50% of data above the critical threshold [0.0] [11:35:24] RECOVERY - Puppet failure on tools-exec-10 is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:10] PROBLEM - Puppet failure on tools-exec-13 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [11:37:59] PROBLEM - Puppet failure on tools-uwsgi-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:39:43] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:39:55] RECOVERY - Puppet staleness on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [3600.0] [11:48:03] RECOVERY - Puppet failure on tools-exec-08 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:43] PROBLEM - Puppet failure on tools-master is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:52:51] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [11:57:30] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0] [11:59:00] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [12:02:52] RECOVERY - Puppet failure on tools-uwsgi-01 is OK: OK: Less than 1.00% above the threshold [0.0] [12:08:57] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [12:09:47] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [12:10:42] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [12:11:36] RECOVERY - Puppet failure on tools-exec-02 is OK: OK: Less than 1.00% above the threshold [0.0] [12:14:07] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [12:14:55] PROBLEM - Puppet failure on tools-webgrid-01 is 
CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [12:17:45] RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0] [12:19:37] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [12:29:02] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0] [12:33:54] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [12:34:12] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0] [12:34:48] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0] [12:40:44] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0] [12:44:34] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0] [12:44:54] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0] [13:15:56] Kelson: looks like a temporary DNS issue [13:16:15] Betacommand: yes, it probably was :) [13:30:09] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [13:36:12] PROBLEM - Puppet failure on tools-mail is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [13:37:26] PROBLEM - Puppet failure on tools-webgrid-05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [13:43:43] PROBLEM - Puppet failure on tools-exec-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [13:45:53] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:45:55] Tool-Labs, pywikibot-core, Pywikibot-login.py, Possible-Tech-Projects: Pywikibot: Implement support for OAuth - https://phabricator.wikimedia.org/T74065#1030901 (Qgil) Wikimedia will [[ 
https://phabricator.wikimedia.org/T921 | apply to Google Summer of Code and Outreachy ]] on Tuesday, February 17. If you wan... [13:47:23] PROBLEM - Puppet failure on tools-exec-10 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:53:26] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [13:55:13] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0] [13:57:21] PROBLEM - Puppet failure on tools-webgrid-06 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [13:57:39] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [13:58:31] !log DNS is fuzzy again [13:58:32] DNS is not a valid project. [13:58:35] bah [13:58:59] PROBLEM - Puppet failure on tools-uwsgi-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [14:01:07] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [14:02:19] RECOVERY - Puppet failure on tools-webgrid-05 is OK: OK: Less than 1.00% above the threshold [0.0] [14:08:38] RECOVERY - Puppet failure on tools-exec-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:10:19] Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1030971 (hashar) The TTL issue seems to have fixed the DNS response for a while now. Since last Friday, it is flapping again, causing multiple issues on beta cluster and continuous integ... [14:11:29] Coren: hey marc :-]  DNS on labs is flapping again :/ [14:12:05] Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1030973 (coren) p:Normal>Unbreak! This has gotten to "serious showstopper" level and I'm going to drop what I'm doing to fix this. 
[14:12:27] RECOVERY - Puppet failure on tools-exec-10 is OK: OK: Less than 1.00% above the threshold [0.0] [14:12:36] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1030975 (coren) [14:12:53] seems virt1000 had some outage a few minutes ago [14:12:55] based on ganglia [14:13:56] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1030984 (coren) a:yuvipanda>coren [14:14:54] There are too many things that self-amplify on this thing. [14:15:46] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0] [14:17:09] PROBLEM - Puppet failure on tools-mail is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0] [14:17:23] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1030996 (Technical13) @coren @LuisV_WMF @yuvipanda would this same ticket apply if someone just wanted to be added to a tool project but not actually usurp the tool? I ask because @Atet... [14:17:29] RECOVERY - Puppet failure on tools-webgrid-06 is OK: OK: Less than 1.00% above the threshold [0.0] [14:18:28] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0] [14:18:51] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1031000 (akosiaris) Just an update on the sysctl side. I worked with @yuvipanda to puppetize a firewall rule for NOTRACK for DNS so that the net.netfilter.nf_conntrack_max solution... [14:19:08] Coren: If you have a minute, can you tell me if pid 25681 on tools-exec-13 is hung up on anything obvious? [14:21:25] anomie: It's stuck in a loop reading from a closed socket, afaict [14:21:54] Coren: Huh. Which fd? [14:22:40] 7. 
tools-exec-13.eqiad.wmflabs:57665->text-lb.eqiad.wikimedia.org:https [14:23:19] Actually, it's getting EAGAIN in a loop. [14:23:49] lsof seems to think it's "established". Weird. [14:23:53] RECOVERY - Puppet failure on tools-uwsgi-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:24:20] Hm. I misread - it's not closed, it's looping doing a read with no data and nowait. [14:24:45] PROBLEM - Puppet failure on tools-exec-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [14:24:48] You have a select() that times out, but you're still reading from the fd even though there is no data. [14:24:53] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [14:25:04] (short timeout, 100ms) [14:25:55] It's probably in LWP::UserAgent, since it's an https connection. Unless you want to look at it anymore, I'm just going to qdel it and resubmit. [14:27:04] anomie: Coren : beta cluster has some issue currently [14:27:16] might be related to DNS but I haven't looked at the varnish box yet [14:27:27] I can't reliably reach the backend for commons.beta.../api.php [14:29:00] UserAgent implements its own timeout, and that may be what isn't kicking in - it has a default of 180 but can be made to not timeout. [14:29:32] hashar: from the 'net you mean? [14:29:53] from another labs instance [14:30:14] the varnish cache is served just fine though, requests apparently hit the backend but somehow I get no reply [14:31:10] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [14:32:30] Coren: Huh. On tools-login, qstat is giving me errors now. Wasn't a few minutes ago. [14:32:32] error: commlib error: access denied (server host resolves rdata host "tools-login" as "tools-login.eqiad.wmflabs") [14:32:32] error: unable to contact qmaster using port 6444 on host "tools-master.eqiad.wmflabs" [14:33:59] Yeah, the DNS issue has gotten bad again. 
I expect there's a project that's hitting unusually hard on DNS but I'm tired of putting duct tape and bailing wire on that thing. [14:34:13] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1031023 (akosiaris) Wikitech is currently refusing logins. The error displayed after a long period of wait is: Incorrect password entered. Please try again. It might not be related to the wikitech move but it... [14:34:20] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1031024 (akosiaris) p:High>Unbreak! [14:37:15] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1031034 (chasemp) >>! In T88300#1031023, @akosiaris wrote: > Wikitech is currently refusing logins. The error displayed after a long period of wait is: > > Incorrect password entered. Please try again. confirm... [14:38:32] I see the immediate culprit though. Something is *hammering* requests for mx-tw.mail.gm0.yahoodns.net. - hundreds/s [14:40:06] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:40:59] Hi Coren. Did you see my question on [[Phab:T87730]? [14:41:08] Hi Coren. Did you see my question on [[Phab:T87730]]? [14:41:12] @link [14:41:12] https://wikitech.wikimedia.org/wiki/Phab:T87730 [14:41:33] T13|mobile: neat trick :) [14:42:04] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1031051 (coren) No; since the software in Labs is all Open Source you can just Do It. [14:42:06] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [14:42:24] chasemp: even though it redirects anyways, I should probably add phab to wm-bot directly at some point. 
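The hung job diagnosed a little earlier in this log (a `select()` that times out, followed by a read on a non-blocking fd that keeps returning EAGAIN) can be sketched as follows. The job in the chat was Perl using LWP::UserAgent; this is Python for illustration only, the function name and its parameters are hypothetical, and it shows one way to avoid the pattern rather than the actual fix applied.

```python
import select
import socket


def read_when_ready(sock, timeout=0.1, max_polls=50):
    """Wait for data with select() and read only when the socket is readable.

    Hypothetical helper: it gives up after max_polls timeouts instead of
    spinning forever, which is the overall deadline the hung job lacked.
    """
    for _attempt in range(max_polls):
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            # select() timed out: nothing to read, so do NOT call recv().
            # Calling recv() here on a non-blocking socket is what yields
            # EAGAIN (BlockingIOError in Python) over and over.
            continue
        return sock.recv(4096)
    raise TimeoutError("no data within %d polls" % max_polls)


if __name__ == "__main__":
    left, right = socket.socketpair()
    left.setblocking(False)
    try:
        left.recv(1)  # reading with no data pending reproduces the EAGAIN case
    except BlockingIOError as exc:
        print("recv without data:", exc)
    right.sendall(b"hello")
    print(read_when_ready(left))
```

The design point is the bounded loop: a short per-poll timeout (100 ms in the chat) is harmless only if something else eventually ends the wait.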
[14:42:35] RECOVERY - Puppet failure on tools-exec-wmt is OK: OK: Less than 1.00% above the threshold [0.0] [14:43:27] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1031054 (Technical13) >>! In T87730#1031051, @coren wrote: > No; since the software in Labs is all Open Source you can just Do It. In that case, can I get an admin to add me to the maint... [14:45:03] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1031074 (coren) Oh, wait - my apologies: I misunderstood your request as "add a tool project" as opposed to "be added to a tool project". For the latter, the response is "yes, that's act... [14:45:53] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:47:14] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1031089 (Technical13) @coren that's what I thought. Is there any process for this or discussion about setting up a process I can watch and if so where might I find links to those things,... [14:49:37] RECOVERY - Puppet failure on tools-exec-01 is OK: OK: Less than 1.00% above the threshold [0.0] [14:50:19] Ah. I see what is going on atm. Someone doing a joe job on the toolserver.org MX and the bounces being silly. [14:51:46] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [14:52:06] this channel is again getting over spammed [14:52:35] petan: labs DNS is flappy so shin ken keep spamling about [14:52:39] spamming about it [14:53:04] hm... 
yes I like to have nagios messages on irc, but all these bots in 1 channel MEH [14:53:23] I think I need to enhance my irc client [14:53:31] so that it put nagios to different window :D [14:56:47] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [15:00:08] PROBLEM - Puppet failure on tools-exec-13 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [15:01:14] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0] [15:02:42] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<12.50%) [15:04:00] PROBLEM - Puppet failure on tools-webgrid-04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [15:08:12] Ah-HA! [15:09:54] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:00] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0] [15:10:52] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0] [15:11:07] Why in Baal's name is dnsmasq returning SERVFAIL instead of NXDOMAIN when the name doesn't exist? 
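The SERVFAIL versus NXDOMAIN distinction in the last message lives in the low four bits of the flags word in the DNS response header (RFC 1035): 2 is SERVFAIL, 3 is NXDOMAIN. Resolvers generally treat NXDOMAIN as a definitive "no such name", while SERVFAIL reads as a server-side failure that clients tend to retry, which would add load. A minimal sketch for inspecting that code is below; the function names are hypothetical, `ask()` needs a reachable server and is not called here, and this is not how the issue was diagnosed in the channel.

```python
import socket
import struct

# Response codes defined in RFC 1035, section 4.1.1.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
               3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: id, flags (only RD set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def rcode_of(response):
    """Return the response code name from a raw DNS response."""
    if len(response) < 12:
        raise ValueError("DNS header is 12 bytes; response too short")
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODE_NAMES.get(code, "RCODE%d" % code)


def ask(server, name, timeout=2.0):
    """Send one UDP query to server:53 and return the rcode name (needs network)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    return rcode_of(response)


if __name__ == "__main__":
    # Hand-crafted headers, so the demo runs without a DNS server.
    servfail = struct.pack("!HHHHHH", 0x1234, 0x8182, 1, 0, 0, 0)
    nxdomain = struct.pack("!HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
    print(rcode_of(servfail), rcode_of(nxdomain))
```

Pointing `ask()` at a resolver with a name that should not exist is a quick way to see which of the two codes it returns.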
[15:12:36] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK [15:16:47] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0] [15:20:09] RECOVERY - Puppet failure on tools-exec-13 is OK: OK: Less than 1.00% above the threshold [0.0] [15:21:47] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0] [15:28:59] RECOVERY - Puppet failure on tools-webgrid-04 is OK: OK: Less than 1.00% above the threshold [0.0] [15:31:38] wikitech seems a but busted on the password front [15:32:53] I tried to reset my password and when I submitted to the form to reset the password with my temporary password and my new password - i get incorrect password, please try again [15:53:53] manybubbles: I've heard of that issue earlier; I was hoping it was related to the DNS thing - no such luck. I'll be looking into it shortly. [15:54:14] thanks - I'll just anon it up for a while! [15:56:52] andrewbogott_afk: Or if you arrive before I get to it, ^^ [16:03:37] also morebots impacted? replies when asked but doesn't seem to !log [16:13:07] <^d> Coren: Those 2 instances are still hosed. Should I just delete + recreate at this point? [16:21:17] ^d: Hm. I might be able to fix them manually - there is nothing left to learn from them. Gimme a sec. [16:21:29] <^d> mmk. I only really need 07 [16:21:32] <^d> 09 I can kill [16:21:43] Fixing 07 [16:23:22] ^d: {{done}}. Give it 4-5 minutes and reboot it. [16:23:29] <^d> Thanks! [16:40:39] Coren: any news on wikitech? afaict people are !logging on -operations but that goes into the void unless morebots does retries and so on [16:41:53] -operations and -releng [16:42:45] godog: I think I have the DNS issue under control now; I'm making sure it sticks and I'll jump on that issue next. [16:43:17] godog: Thankfully, morebots queues things it was unable to actually do. [16:43:33] sweet, thanks! 
good to know at least it isn't losing anything [16:45:15] Release-Engineering, Wikimedia-Labs-wikitech-interface: add [[wikitech:Release Engineering/SAL]] to [[wikitech:mediawiki:sidebar]] - https://phabricator.wikimedia.org/T73165#1031450 (greg) Open>Resolved Apparently I had/have the rights: https://wikitech.wikimedia.org/w/index.php?title=MediaWiki%3ASideba... [16:45:28] Release-Engineering, Wikimedia-Labs-wikitech-interface: add [[wikitech:Release Engineering/SAL]] to [[wikitech:mediawiki:sidebar]] - https://phabricator.wikimedia.org/T73165#1031452 (greg) a:coren>greg [17:00:38] have there been login problems on wikitech today? [17:11:22] ragesoss: Known issue. I'm looking into it now. [17:21:14] godog: Can you try lokking into wikitech for me? (I don't want to risk looking my auth token) [17:21:25] s/lokking/logging/ [17:23:01] Coren: sure, wrong password [17:23:37] oh, ffs, there's absolutely no logging going on. [17:23:58] * Coren is on it. [17:29:45] andrewbogott_afk: I can use your help once you wake. [17:33:21] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1031700 (coren) a:Andrew>coren [17:34:56] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1031710 (coren) I've managed to reduce the pressure on dnsmasq enough to unbreak the immediate major fail, but there are still issues (there was a bounce storm on the toolserver.org... [17:37:39] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1031724 (akosiaris) @Coren, I am not sure I follow, how did you reduce the pressure on dnsmasq ? And which rules did you tighten to account for the toolserver.org MX joe job ? [17:41:22] (CR) Legoktm: [C: 2] "I don't agree with this." 
[labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/189901 (https://phabricator.wikimedia.org/T89153) (owner: Greg Grossmeier) [17:41:35] (Merged) jenkins-bot: Remove Phabricator and Code-Review from -releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/189901 (https://phabricator.wikimedia.org/T89153) (owner: Greg Grossmeier) [17:42:55] damn, life would be easier if my laptop didn’t lock up every day or so. [17:43:05] * andrewbogott shakes fist at Yosemite [17:43:58] Coren: what’s up? [17:46:06] Labs: Move wikitech web interface to a dedicated server - https://phabricator.wikimedia.org/T88300#1031767 (Andrew) This was unrelated to the move, except inasmuch as I'd hoped the move would fix it. Keystone was dead on virt1000, presumably because of OOM. Keystone is required for a wikitech login, hence b... [17:50:53] <^d> Coren: elastic07 is dandy. thanks! [17:51:07] Coren: any particular action we should take to kick morebots into flushing its backlog? [17:52:33] eh wikibugs? [17:53:18] godog: I'm not sure. I know it logs to local files first and buffers there but I don't know if there is a magic incantation to have it catch up to it [17:53:34] !log tools.wikibugs Updated channels.yaml to: 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng [17:53:37] there we go [17:53:38] Logged the message, Master [17:53:51] greg-g: ^ [17:54:50] tada [17:54:56] now, did it catch up the old messages? [17:55:21] doesn't look like it [17:56:13] andrewbogott: for the record, what did you fix? [17:56:34] Coren: https://phabricator.wikimedia.org/T88300#1031767 [17:56:43] I restarted keystone on virt1000. It must’ve oom’d again overnight :( [17:57:46] Oh bleh. [17:57:55] yeah [17:59:00] Labs: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1031811 (Andrew) NEW a:Cmjohnson [17:59:34] greg-g: is it behind? [18:00:02] legoktm: I mean, it doesn't log things that were logged during the outage, right? 
[18:00:41] there was an outage??
[18:01:10] Labs, Wikimedia-Labs-Infrastructure: Internal DNS look-ups fail every once in a while - https://phabricator.wikimedia.org/T72076#1031820 (coren) A huge pile of incoming bounces were stuck in the queue waiting to be forwarded to mostly bad email addresses alongside genuine toollab users, making exim do a lot o...
[18:01:51] greg-g: judging from a quick glance at the code, no it doesn't
[18:03:08] Ah, I was under the impression that it did. Perhaps it's not for the same category of outages because I've seen it catch up with backlog before.
[18:08:32] Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1031856 (RobH) virt1000 is an R610 with a total of 16GB ram, installed via 4 * DIMM DDR3 Synchronous 1333 MHz 4GB sticks. I show the system has 12 dimm slots, and 4 of them are filled. If w...
[18:09:46] andrewbogott: hi, can you explain in clear words why changing 'true' to true in puppet can break the world, apart from changing a string to a bool?
[18:11:13] matanya: I believe it’s because ‘true’ != true
[18:11:32] So if there are two places in the code that use the quoted string, and one of them is a comparison, the logic flips.
[18:11:37] make sense?
[18:11:38] andrewbogott: It isn't, which is why instances of 'true' should be hunted down and killed. :-)
[18:12:29] that part i know :) i was trying to explain to someone who to verify there 'true' can be safely converted to true
[18:12:42] the xy problem ... :)
[18:12:48] How to verify...
[18:12:52] testing?
[18:13:11] But also, you need to search for every occurrence of the variable that was set to ‘true’, and every use of that variable via parameters…
[18:13:14] etc. etc. :(
[18:13:17] Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram?
- https://phabricator.wikimedia.org/T89266#1031910 (RobH) fyi: determining what memory banks are in use: sudo lshw -class memory (or just pull the -class and following for a full hardware output, but it's a bit overwhelming.)
[18:13:25] Man, I must not understand your question because I’m just telling you what you already know.
[18:14:52] like say one has managehome => 'true', how can he know that every managehome is not broken apart from going over every use of managehome?
[18:15:50] hope that clarifies andrewbogott :) i'm asking badly today ...
[18:16:04] Question makes sense, but I’m pretty sure the answer is ‘you can’t’
[18:16:17] That might be an argument for fixing every instance of ‘true’ throughout the codebase in one go.
[18:16:26] :(
[18:17:31] thanks andrewbogott that answer will suffice for now :)
[18:18:05] It’s kind of terrible news. The only real solution is to never have made that mistake in the first place :(
[18:18:40] yeah, well...
[18:18:53] i'm out, see you later and thanks!
[18:19:46] Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1031919 (coren) Honestly, I'm a little worried to know that those services manage to explode 16G of ram and suspect there is something broken that more memory is more likely to hide than fix.
[18:19:48] ‘later!
[18:20:21] Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram? - https://phabricator.wikimedia.org/T89266#1031921 (Andrew) Supporting 400 puppet clients? It doesn't surprise me that that uses a lot of ram.
[18:22:44] Coren: sorry, that was a little curt. I certainly don’t object to analysis of current memory usage. Partly I don’t really know how to approach the problem.
[18:22:49] Labs, hardware-requests, ops-eqiad, operations: Can virt1000 take more ram?
- https://phabricator.wikimedia.org/T89266#1031925 (coren) That's 40M per connection even if they were all simultaneous - probably a lot more given that we stagger much of it - that's a //lot// even for cruddy code like puppet.
[18:23:32] Heh. Don't worry about it. Curt doesn't affect me. :-)
[18:23:51] Hm
[18:24:00] I'm worried that there is a bigger underlying problem that adding ram will just make worse in the end.
[18:24:05] It is still running daily backup jobs of the giant old wikitech db
[18:24:18] but that timing doesn’t line up with last night’s outage
[18:29:46] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1031940 (greg) Open>Resolved >>! In T89153#1031745, @gerritbot wrote: > Change 189901 merged by jenkins-bot: Thanks @legoktm. And your st...
[18:29:58] Release-Engineering, Engineering-Community, Wikibugs: Only use -devtools irc channel for phab-related ticket announcements - https://phabricator.wikimedia.org/T89153#1031943 (greg) p:Triage>Normal
[18:30:06] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[18:33:55] oh heh, phabricator decided to just add to the projects field all the projects I mentioned
[18:34:06] this was https://phabricator.wikimedia.org/T89270 - Labs' Phabricator tags overhaul fwiw
[18:34:11] input welcome :)
[18:36:21] That's a neat phabricator trick.
If you say a project name it adds it to the project header section
[18:36:50] that trick I think dies on next update it's not universally popular
[18:37:23] Count me in the camp of wishing it was gone
[18:38:27] As long as it still works when creating a project by email, I guess it's OK
[18:38:48] or, well, I'd rather have it understand 'Projects: #x #y #z' in an email
[18:40:18] (CR) Jforrester: [C: 1] Limit shown projects to 4 [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011) (owner: Merlijn van Deen)
[18:40:21] yeah the email case
[18:40:24] it seems they are going to do
[18:40:30] !project: foo
[18:40:48] but it's still in discussion I think
[18:40:58] (CR) Legoktm: "I'd rather we just make sure the URL is always visible and never cropped off..." [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011) (owner: Merlijn van Deen)
[18:41:19] valhallasw`cloud: https://secure.phabricator.com/T6819#95182
[18:43:37] that feature never worked for bugs created via conduit :(
[18:45:54] yeah a few of the transaction type things never did
[18:46:11] they never got the "created" line item either (had to construct that manually for migration as an example)
[18:48:12] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[19:01:05] chasemp: ah cool
[19:06:20] (CR) Merlijn van Deen: "I think it's and/and rather than or/or.
Unfortunately, making sure the URL is in there is non-trivial, as there can also be text /after/ t" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011) (owner: Merlijn van Deen)
[19:44:27] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[19:53:36] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<37.50%)
[19:58:34] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[21:02:21] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[21:20:34] Project-Creators, Wikimedia-Labs-wikistats, MediaWiki-extensions-OpenStackManager, Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface, Labs, Labs-Vagrant, Tool-Labs-tools-Article-request, Tool-Labs, Wikimedia-Labs-Other, Beta-Cluster, Wikimedia-Labs-extdist, Wikimedia-Labs-General: Labs' Phabricator...
[21:25:24] T13|mobile, I do think this bug mentioned on my talk page can be Cyberbot's fault.
[21:26:12] Why's that?
[21:26:25] A str_replace bug to replace Book: with Book talk: can easily create the bug mentioned.
[21:27:34] T13|mobile, ^
[21:27:53] The software is supposed to know what the NSs are unless it contains an unconventional character
[21:28:06] That's not it.
[21:28:41] Cyberbot I is parsing the text Book:War Book: Volume 1 and processing that page.
[21:29:42] Now it wants to post on that talk page.
What's one way of doing that? str_replace Book: with Book_talk: and initializing the page. However, what do you get when you perform str_replace on Book:War Book: Volume 1?
[21:29:45] T13|mobile, ^
[21:31:12] Depending on how you do it, the worst case would be Book_talk:War Book_talk: Volume 1
[21:31:28] Which is what it's doing.
[21:31:31] :p
[21:31:56] Which would still start with Book_talk: and the software shouldn't be putting that in NS:0
[21:32:19] That's not the issue being mentioned though. :p
[21:32:27] But it is.
[21:33:02] The issue is why are they in both NS:0 and NS:109
[21:33:10] No it's mentioning that the bot is creating an incorrectly named page.
[21:34:56] See Betacommand's comment in [[Phab:T87645#1007067]]
[21:35:02] @link
[21:35:02] https://wikitech.wikimedia.org/wiki/Phab:T87645#1007067
[21:39:31] Oh wow. My bot is making a real mess on the DB. :p
[21:39:35] Cyberpower678: ^^
[21:39:54] Not just your bot
[21:44:09] It's not the bot's fault, someone broke namespace detection
[21:48:17] Betacommand: did you hear back from coren?
[21:50:15] A LOT of content in that list is a result of my bot. :p
[21:53:31] harej: yeah, it's doable with some constraints
[21:53:42] and those constraints are...
[21:54:01] by the way, using OAuth may not be totally feasible, if we are collecting usernames by passing around one tablet for everyone to log in on
[21:56:50] harej: keeping as little personal data as needed, clearly stating what we are collecting, who has access, how long we are keeping what parts, and removing personal data once it's no longer needed
[22:06:19] be back in a bit
[22:10:54] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1032781 (coren) Not yet; this discussion needs to take place somewhere and starting it is on my mid-priority todo list. That said, if you find a suitable venue to get the ball rolling an...
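A minimal sketch of the str_replace problem discussed above (21:26 to 21:31), written in Python rather than the bot's own code, which is not shown in this log. The function names `naive_talk_title` and `prefix_only_talk_title` are hypothetical and only illustrate the difference between replacing every occurrence of the namespace text and rewriting it only where it is the leading prefix of the title.

```python
def naive_talk_title(title: str) -> str:
    # Replaces every occurrence of "Book:", including ones inside the page name.
    return title.replace("Book:", "Book_talk:")


def prefix_only_talk_title(title: str, ns: str = "Book", talk_ns: str = "Book_talk") -> str:
    # Rewrites the namespace only when it is the leading prefix of the title.
    prefix = ns + ":"
    if title.startswith(prefix):
        return talk_ns + ":" + title[len(prefix):]
    # Titles outside the given namespace are returned unchanged.
    return title


title = "Book:War Book: Volume 1"
print(naive_talk_title(title))        # Book_talk:War Book_talk: Volume 1
print(prefix_only_talk_title(title))  # Book_talk:War Book: Volume 1
```

The naive version reproduces the "worst case" title quoted in the chat; the prefix-only version is one possible way to avoid it, under the assumption that the namespace is always the first colon-terminated segment of the title.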
[22:13:40] Betacommand, did I miss anything since I disconnected?
[22:31:37] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1032866 (Technical13) @coren http://meta.wikimedia.org/wiki/Requests_for_comment/Abandoned_Labs_tools and I've pinged everyone in this ticket on the page on Meta.
[23:30:55] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[23:45:52] Labs: Request to create Gather labs project - https://phabricator.wikimedia.org/T89185#1033004 (rmoen) @yuvipanda As I cannot see the mobile project on wikitech, I didn't know it existed until after creating this task. I have an instance running in project Gather. Again, I think we only need the instance u...
[23:45:54] RECOVERY - Puppet staleness on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [3600.0]
[23:47:15] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0]