[00:55:24] PROBLEM - SSH on tools-elastic-02 is CRITICAL: Server answer
[01:00:25] RECOVERY - SSH on tools-elastic-02 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[01:36:26] PROBLEM - SSH on tools-elastic-02 is CRITICAL: Server answer
[01:46:25] RECOVERY - SSH on tools-elastic-02 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[04:48:50] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[05:23:49] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:09:31] PROBLEM - Puppet staleness on tools-checker-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[08:52:25] bd808: "At the very least it must tell users with 2FA enabled that they cannot use Striker." <= do you mean that anyone with 2fa enabled can't use striker?
[08:54:14] * zhuyifei1999_ needs to choose between horizon and striker, if that's true :/
[09:23:22] Tool-Labs-tools-stewardbots: Automatically start the irc bots - https://phabricator.wikimedia.org/T144461#2676571 (MarcoAurelio) Open>Resolved I think this is resolved. If you disagree, please reopen.
[09:28:35] Tool-Labs-tools-Pageviews, I18n: massviews-category-description lego for "category" - https://phabricator.wikimedia.org/T146973#2676575 (Nemo_bis)
[09:42:03] Labs, Labs-Infrastructure, DBA, Upstream: db1069: convert user_groups table to InnoDB across all the wikis - https://phabricator.wikimedia.org/T146121#2676629 (Marostegui) I have just converted S2 user_group tables to InnoDB. Note: Percona has not replied yet to the bug report after I sent the s...
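The Puppet alerts in this log report what share of recent datapoints sit above a threshold (0.0 failed resources for "Puppet run", 43200.0 for "Puppet staleness", presumably seconds, i.e. 12 hours). A minimal sketch of that percentage, assuming the check sees a plain list of samples; the real check's data source, window, and critical percentage are not shown in the log, and `percent_above`/`describe` are hypothetical helpers that only mimic the alert wording.

```python
from typing import Iterable, Optional


def percent_above(values: Iterable[Optional[float]], threshold: float) -> float:
    """Return the percentage of datapoints strictly greater than `threshold`.

    Missing samples (None) are ignored; an empty window counts as 0%.
    """
    points = [v for v in values if v is not None]
    if not points:
        return 0.0
    above = sum(1 for v in points if v > threshold)
    return 100.0 * above / len(points)


def describe(values: Iterable[Optional[float]], threshold: float) -> str:
    """Render a message shaped like the alert text seen in the log."""
    pct = percent_above(values, threshold)
    return f"{pct:.2f}% of data above the critical threshold [{threshold}]"


if __name__ == "__main__":
    # Three of five hypothetical runs reported failures (> 0.0).
    print(describe([0, 1, 2, 0, 3], 0.0))
```

With three failing samples out of five, the message reads "60.00% of data above the critical threshold [0.0]", the same shape as the tools-services-02 alert, though the real sample count behind that alert is not in the log.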
[10:48:15] !log tools.heritage Manually run populate_image_table.py to populate https://commons.wikimedia.org/wiki/Commons:Monuments_database/Indexed_images/Statistics
[10:48:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[11:34:14] (PS1) Jean-Frédéric: Update India base category [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313388
[13:00:22] Labs, Labs-Infrastructure, DBA: Implement a frontend failover solution for labsdb (possibly HAProxy) - https://phabricator.wikimedia.org/T141097#2676968 (chasemp)
[13:01:53] Labs, Labs-Infrastructure, DBA, Patch-For-Review: Initial setup and provision of labsdb1009, labsdb1010 and labsdb1011 - https://phabricator.wikimedia.org/T140452#2676972 (chasemp)
[14:39:41] andrewbogott: yuvipanda: anomie: There seems to be a serious session handling bug. I can't log out, and it seems to be impacting my ability to log in to Horizon.
[14:39:49] On wikitech that is.
[14:40:18] CP678|Laptop: I don't understand. You can't log out of what?
[14:40:27] I tried purging my cache, but that didn't fix it.
[14:40:42] andrewbogott: I said wikitech
[14:41:05] I can still even access my preferences after logging out
[14:41:43] So, you click the 'log out' link… what happens? It reloads the same page that still has the 'log out' link?
[14:41:50] It says I logged out.
[14:42:08] But then I go to some other page, and I'm logged in again.
[14:42:25] Worked fine when I tried it, logged into and out of wikitech several times.
[14:43:18] Well I was trying to display the quotas, but it was blank. So I tried logging out and logging back in.
[14:43:27] From the quota page.
[14:44:01] It says I'm logged out, but when I go to the main page, for example, I'm logged back in.
[14:45:24] CP678|Laptop: Do SUL wikis misbehave in the same way (e.g. Wikipedia, Meta, etc.)?
[14:45:39] I can't reproduce the problem here either
[14:45:44] No
[14:45:54] They work fine, and I logged out successfully.
[14:46:20] For Wikipedia, I can't log back in.
[14:46:20] There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again.
[14:47:07] I'm at an offsite and have to go in a minute. Probably best to open a bug and try to attract the attention of someone who has worked on sessions and/or the LDAP auth plugin.
[14:47:11] It's happening on all of my devices.
[14:47:29] CP678|Laptop: In any case, your wikitech session shouldn't have anything to do with your horizon session (other than using the same credentials)
[14:50:30] CP678|Laptop: Still can't reproduce. Check the cookies your browser is sending and make sure things are sane, e.g. sending the right cookies and not sending the same cookie twice with different values or something.
[14:50:51] Well, something is corrupted. I get "invalid credentials" on Horizon, I can't log out on Wikitech, and I can't log in on Wikipedia.
[14:51:01] On every one of my devices.
[14:51:57] anomie: what is the likelihood of it happening on all of my devices, including my iPhone, which I don't even know how to clear the cookies of?
[14:53:55] Did something change in the MW software
[14:53:58] * anomie tries to look at the logs for mention of "Cyberpower678" and sees lots of logs about hits to the long-deprecated action=tokens.
[14:54:01] recently?
[14:54:26] CP678|Laptop: There have been no non-emergency deploys this week since ops are at an offsite.
[14:54:43] hmm...
[14:55:41] action=tokens sounds like an API thing. My problem is on index.php
[14:55:45] :/
[14:55:57] It's your bot, which happens to include your username in its user agent.
[14:56:16] What's the UA?
[14:56:50] "wAPI/1.1 (Bot: Cyberbot I Operator: Cyberpower678 Contact: English Wikipedia Email )"
[14:57:31] Oh. That's probably NoomBot, which I took over. I haven't maintained the script in a while.
[15:01:59] Got back into Wikipedia. :D
[15:26:09] Tool-Labs-tools-Pageviews, I18n: massviews-category-description lego for "category" - https://phabricator.wikimedia.org/T146973#2676575 (MusikAnimal) Yeah, this one is tricky... I try not to leave markup in the message so I can ensure there's a working link. For casing, I'm actually converting the 'cate...
[15:26:33] Labs, Tool-Labs, Pywikibot-core: Running a core script fails with 'permission denied' creating a logfile folder - https://phabricator.wikimedia.org/T146996#2677290 (Xqt)
[15:26:45] Labs, Tool-Labs, Pywikibot-core: Running a core script fails with 'permission denied' creating a logfile folder - https://phabricator.wikimedia.org/T146996#2677305 (Xqt) p:Triage>Low
[15:56:16] CP678|Laptop: "There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again." <= https://github.com/wikimedia/mediawiki/blob/c42f06642091e1cb1142066b0a52a69a0c7e783d/languages/i18n/en-ca.json#L20
[15:56:18] https://github.com/wikimedia/mediawiki/blob/fdff075d0c050f797746bc9b9dcf2736ac499100/includes/specials/SpecialCreateAccount.php#L40
[15:57:45] looks like your browser isn't keeping the cookies (and mw cann't verify the tokens)
[15:57:48] *can't
[16:00:50] https://github.com/wikimedia/mediawiki/blob/20995c9635a1e59dea3e2dbd176b8ac16ae57a75/includes/specialpage/AuthManagerSpecialPage.php#L412
[16:26:09] I couldn't log into Horizon this morning.
[16:27:10] It appears I can now, but it's acting weird.
[16:27:54] It's not showing any instances, or the quota graphs, or security groups, or... basically anything.
[16:28:05] I get a little red popup box that says: Error: Unable to retrieve usage information.
[16:28:32] And "Error: Unable to retrieve instances."
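anomie's suggestion at 14:50:30, to check that the browser is not sending the same cookie twice with different values, can be done mechanically on a copied request header. Browsers can legitimately hold two cookies with the same name when they differ by domain or path, and a server that reads the "wrong" one will not find the session it expects. A minimal sketch, assuming you paste the raw `Cookie` request header from the browser's developer tools; `conflicting_cookies` and the cookie names in the example are hypothetical, not MediaWiki's actual cookie names.

```python
from collections import defaultdict


def conflicting_cookies(cookie_header: str) -> dict:
    """Map each cookie name sent more than once with differing values to its values."""
    seen = defaultdict(list)
    for fragment in cookie_header.split(";"):
        name, sep, value = fragment.strip().partition("=")
        if not sep:
            continue  # ignore fragments without '=' (empty or malformed)
        seen[name].append(value)
    # A name repeated with an identical value is harmless; report only conflicts.
    return {name: vals for name, vals in seen.items() if len(set(vals)) > 1}


if __name__ == "__main__":
    header = "session_id=abc; user=Foo; session_id=def"  # hypothetical header
    print(conflicting_cookies(header))
```

An empty result does not prove the cookies are sane (they may simply be missing, which is what zhuyifei1999_ suspects at 15:57:45), but a non-empty one points straight at a duplicate that could break token verification.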
[16:28:53] I've tried logging in then out again.
[16:30:34] Wikitech works though.
[17:48:44] zhuyifei1999_: that's the last-ditch option if we end up not finding a better solution, and I think it will only apply to things like changing your password and managing your SSH public keys. (Also, I will find a better solution!)
[17:49:05] k
[18:01:28] yuvipanda: offsite?
[18:03:55] bd808: do you have rights to increase quota?
[18:04:26] matanya: no, and the new process is to file a phab task and wait for the weekly techops meeting
[18:04:42] * bd808 looks for the parent task
[18:05:12] matanya: T140904
[18:05:19] ok, sigh. thanks
[18:05:26] stashbot: r u sick?
[18:07:50] matanya: the basic reason for the policy change is that we were handing out quota without really paying much attention and Labs use has grown to the point that we just don't have tons of spare cpu/ram/disk
[18:08:16] bd808: the reasoning is clear
[18:08:17] we have a reasonable cushion today but want to pay more attention to how it gets spent
[18:08:42] it just doesn't play well with my limited time nowadays
[18:10:02] !log tools Investigating elasticsearch cluster issues affecting stashbot
[18:10:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[18:15:37] !log tools Rebooting tools-elastic-02.tools.eqiad.wmflabs via wikitech; couldn't ssh in
[18:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[18:21:20] !log stashbot Restarted bot after fixing tool's elasticsearch cluster
[18:21:20] stashbot is not a valid project.
[18:21:20] Did you mean tools.stashbot instead of stashbot?
[18:21:32] !log tools.stashbot Restarted bot after fixing tool's elasticsearch cluster
[18:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL, Master
[18:25:15] Labs: Request increased quota for Video labs project - https://phabricator.wikimedia.org/T147013#2677778 (Matanya)
[18:25:45] bd808: ^
[18:25:53] PROBLEM - Puppet staleness on tools-elastic-02 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [43200.0]
[18:26:03] hope it is in the right format
[18:27:29] Looks ok to me. How short are you?
[18:27:42] the full 4cpu + 16G?
[18:28:33] yes bd808
[18:28:35] * bd808 wishes these transcodes would actually happen in prod
[18:28:51] 50/50 vcpus
[18:29:01] RAM Used 53,248 of 54,000
[18:29:08] *nod* I think there will be a normal labs techops meeting on Monday
[18:29:29] I will be away for about a week starting tomorrow
[18:29:55] so i will need to delay my project like a week instead of doing it today
[18:30:01] but that is ok, i guess
[18:35:55] RECOVERY - Puppet staleness on tools-elastic-02 is OK: OK: Less than 1.00% above the threshold [3600.0]
[19:03:19] (PS1) Jean-Frédéric: Allow to set lang parameter in update_database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313451
[19:03:21] (PS1) Jean-Frédéric: Expand ReadMe on development environment [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313452
[19:37:19] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:41:09] Labs, Tool-Labs, Wikimedia-Incident: Unmount unneeded NFS mounts from tool labs hosts - https://phabricator.wikimedia.org/T136222#2327313 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is still val...
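The quota figures quoted at 18:28:51 and 18:29:01 show why "the full 4cpu + 16G" is needed: the project has no spare vCPUs and well under 1 GB of spare RAM. A minimal sketch of that arithmetic, assuming the RAM figures are in MB (as Horizon's usage panel typically shows) and reading "16G" as 16 GiB = 16,384 MB; `shortfall` is a hypothetical helper, not part of Horizon or the quota process.

```python
def shortfall(used: int, limit: int, requested: int) -> int:
    """How much of `requested` does not fit in the remaining quota (0 if it all fits)."""
    free = max(limit - used, 0)
    return max(requested - free, 0)


# Figures quoted in the log. Assumptions: RAM in MB, "16G" means 16 GiB.
VCPU_USED, VCPU_LIMIT, VCPU_REQUEST = 50, 50, 4
RAM_USED_MB, RAM_LIMIT_MB, RAM_REQUEST_MB = 53_248, 54_000, 16 * 1024

vcpu_short = shortfall(VCPU_USED, VCPU_LIMIT, VCPU_REQUEST)
ram_short = shortfall(RAM_USED_MB, RAM_LIMIT_MB, RAM_REQUEST_MB)

if __name__ == "__main__":
    print(f"vCPU short by {vcpu_short}, RAM short by {ram_short} MB")
```

Only 752 MB of RAM remains (54,000 − 53,248), so nearly the entire 16 GiB request, and all 4 vCPUs, would have to come from a quota increase.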
[19:41:14] Labs, Tool-Labs, Wikimedia-Incident: Switch toollabs-webservice to be deployed with an actual deployment mechanism - https://phabricator.wikimedia.org/T136168#2325488 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment expla...
[19:41:31] Labs, Operations, Wikimedia-Incident: Investigate better way of deferring activation of Labs LVM volumes (and corresponding snapshots) until after system boot - https://phabricator.wikimedia.org/T121629#1884194 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is...
[19:42:03] Labs, Wikimedia-Incident: Labs: investigate alternatives to maps' storage requirements - https://phabricator.wikimedia.org/T103264#1385921 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is still valid,...
[19:42:07] Labs, Labs-Sprint-102, Labs-Sprint-103, Labs-Sprint-104, and 3 others: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1360124 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, plea...
[19:42:12] Labs, Wikimedia-Incident: Create scripts to help stagger restarts of labs VMs by different criteria - https://phabricator.wikimedia.org/T94613#1168086 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is...
[19:42:18] Labs, Labs-Q4-Sprint-1, Wikimedia-Incident: Create a simple checklist to follow for announcing / doing planned maintenance (on labs) - https://phabricator.wikimedia.org/T94608#1168013 (greg) This follow-up task from an incident report has not been updated recently. If it is no longer valid, please ad...
[20:10:39] Striker, Community-Tech-Tool-Labs, User-bd808: Striker should respect TitleBlacklist bans on new account names - https://phabricator.wikimedia.org/T147024#2678188 (bd808)
[20:35:26] PROBLEM - Puppet run on tools-worker-1014 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[21:10:25] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:42:22] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[22:47:20] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[23:01:16] is there any way to use the old wikitech-based web proxy manager? http://discovery-experimental.wmflabs.org/ gives me a 502 after I set up a web proxy with Horizon. The backend is listed as :3838 rather than http://.shiny-r.eqiad.wmflabs:3838 (which is what our other, working web proxies are set up as)
[23:01:50] bearloga: no, wikitech is on its way out as a cloud management platform
[23:06:22] can someone please help me with this problem then? discovery-experimental.wmflabs.org is supposed to point to discovery-testing.shiny-r.eqiad.wmflabs:3838 but it's not working. I'm SSH'd into discovery-testing and running `curl http://localhost:3838` gives the output I expect, so everything is OK there, which makes me think something funny is going on with DNS/web proxy stuff on labs
[23:13:39] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms
[23:15:09] Striker, Community-Tech-Tool-Labs, User-bd808: Striker should respect TitleBlacklist bans on new account names - https://phabricator.wikimedia.org/T147024#2678846 (bd808) Based on T110751#2298602 the right thing to do sounds like [[https://wikitech.wikimedia.org/wiki/Special:ApiSandbox#action=query&f...
[23:20:10] bearloga: the problem looks to be that the instance is listening on localhost (via loopback) but not on the public address (on eth0)
[23:20:50] `sudo lsof -i -n | grep 3838` verifies it's listening on 127.0.0.1:3838 rather than the expected 0.0.0.0:3838
[23:20:54] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[23:21:19] as for why... not sure yet; I have to look into how the LXC forwarded_port stuff works in Vagrant
[23:21:39] you have to tell it to bind to 0.0.0.0
[23:21:44] it's a security feature
[23:22:05] mw-vagrant handles it magically with our puppet role for labs
[23:22:11] * bd808 looks for the setting
[23:22:55] looks like host_ip
[23:23:08] yeah, that's it
[23:27:15] ebernhardson bd808: going to try that now
[23:27:26] ebernhardson bd808: thanks! will let you know if it works :)
[23:29:35] ebernhardson bd808: that was it! awesome, thank you!!! :D
[23:41:54] Labs, MediaWiki-extensions-OATHAuth: Move two-factor auth data (TOTP seed) from labswiki database to LDAP - https://phabricator.wikimedia.org/T136350#2678972 (bd808) Open>declined I've become convinced that moving the secrets to LDAP is not a good solution. We need some sort of centralized servic...
[23:45:33] Striker, Community-Tech-Tool-Labs, User-bd808: Striker should respect TitleBlacklist bans on new account names - https://phabricator.wikimedia.org/T147024#2678990 (bd808) We should add an ip block check as well. The OAuth step of account creation will check for a block on meta (or whatever wiki is us...
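The 502 diagnosed at 23:20 comes down to where the service is bound. `curl http://localhost:3838` succeeds for either a loopback or a wildcard bind, but the web proxy connects to the instance's eth0 address, which a 127.0.0.1 bind never accepts. A minimal sketch that classifies the address part of an `lsof -i -n` listener entry, assuming you pass only the `host:port` portion (without the trailing `(LISTEN)`); lsof usually prints a wildcard bind as `*:port` rather than `0.0.0.0:port`, so both are handled. `listen_scope` is a hypothetical helper.

```python
import ipaddress


def listen_scope(addr: str) -> str:
    """Classify a 'host:port' listen address such as those shown by `lsof -i -n`."""
    host, _, _port = addr.rpartition(":")
    host = host.strip("[]")  # IPv6 addresses are bracketed, e.g. [::1]:3838
    if host in ("", "*"):
        return "all-interfaces"  # wildcard bind as lsof prints it
    ip = ipaddress.ip_address(host)
    if ip.is_unspecified:
        return "all-interfaces"  # explicit 0.0.0.0 or ::
    if ip.is_loopback:
        return "loopback-only"  # reachable only from the instance itself
    return "single-interface"  # bound to one specific address, e.g. eth0's


if __name__ == "__main__":
    for entry in ("127.0.0.1:3838", "*:3838", "0.0.0.0:3838"):
        print(entry, listen_scope(entry))
```

A `loopback-only` result for port 3838 matches the symptom in the log; per the exchange above, the fix was Vagrant's `host_ip` setting for the forwarded port, which moved the bind to 0.0.0.0.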