[01:11:40] (PS2) BryanDavis: Add css layout for small screens [labs/toollabs] - https://gerrit.wikimedia.org/r/273775 (https://phabricator.wikimedia.org/T119830)
[01:16:10] Labs, Tool-Labs, Patch-For-Review, User-bd808: Make error pages mobile friendly - https://phabricator.wikimedia.org/T119830#2071182 (bd808) Patch at https://gerrit.wikimedia.org/r/273775 updated to incorporate single column layout for `#thebigtable` inspired by @Dispenser's suggestion in T119830#2...
[01:20:07] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2071184 (Bianjiang) Is there any recommended way to get all article's pageviews on a daily/weekly basis? or I can build my own solution by calling the stats AP...
[01:29:52] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2071195 (ezachte) Bianjang, Daily and monthly aggregates are at https://dumps.wikimedia.org/other/pagecounts-ez/merged/ Erik
[01:51:33] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2071204 (Bianjiang) Thanks. Not sure if those dump will be affected by T114019?
[02:56:44] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2071273 (ezachte) Bianjiang, probably not. Dumps 2.0 is about database dumps, not traffic log dumps. Erik
[06:32:55] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2071384 (zhuyifei1999) @scfc Any progress on the code review? And what about the name collision?
[07:30:54] Labs, Tool-Labs, pywikibot-core: Tool Labs: shared Pywikibot code not available - https://phabricator.wikimedia.org/T125505#2071460 (Ato_01) It is working again.
[07:35:43] Labs, Operations, ops-eqiad: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2071461 (jcrespo) @liangent, user databases were lost and will not be able to be recovered.
[08:11:32] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071492 (jcrespo)
[08:15:42] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071504 (jcrespo) All queries killed and user throttled to 1 connection per user until this is resolved.
[08:36:54] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071492 (mahmoud) Can you expand upon the issue at hand? Do you know when it started or how similar the in-flight queries are/were?
[08:37:39] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071525 (mahmoud) Also what are the headings to these columns in the issue description?
[08:50:00] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071546 (jcrespo) >>! In T128355#2071525, @mahmoud wrote: > Also what are the headings to these columns in the issue description? Sorry, I assumed everybody would b...
[09:00:46] Labs, Operations, ops-eqiad: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2071553 (jcrespo) I would happily would, but I would prefer to actually solve the issue for you forever so you can self-serve. One tip before I research what is failing: * It is probable that commons a...
[09:05:34] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071555 (mahmoud) Ah, if it's just that pattern, it's probably the same issue just magnified across all the languages. Stephen and I will look into it, thanks!
[09:28:34] Labs, Operations, ops-eqiad: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2071592 (jcrespo) @liangent Your problem is that your script is trying to execute `CREATE TRIGGER` with DEFINER=`root`@`208.80.154.151`, for which you have no rights. Removing the DEFINER, wit...
[09:29:13] Labs, Operations, ops-eqiad: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2071593 (jcrespo) Also, please use a different ticket for importing issues, as this is offtopic.
[09:41:55] (CR) MarcoAurelio: "Which specific problems present this *simple* HTML page, which can be editted by any online HTML editor as I did? If the problem is mainte" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/273772 (owner: MarcoAurelio)
[09:43:52] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071599 (mahmoud) OK, Stephen and I both have work tomorrow and nothing has jumped out at us, bug wise. I'll just put it out there that there were no code changes a...
[09:52:54] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071602 (jcrespo) Your code allows 338 simultaneous connections to the replica databases. That is a bug.
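jcrespo's 09:28:34 comment says liangent's import fails because `CREATE TRIGGER` carries DEFINER=`root`@`208.80.154.151`, which the tool account has no rights to, and the truncated remainder points at removing the DEFINER. A minimal Python sketch of that idea, assuming the dump fits in memory as a string; the regex and the `strip_definers` name are hypothetical, not something from the ticket. When the DEFINER clause is omitted, MySQL uses the account executing the statement as the definer.

```python
import re

# Hypothetical pattern for a MySQL DEFINER clause such as
# DEFINER=`root`@`208.80.154.151`, DEFINER='u'@'localhost' or DEFINER=root@localhost.
# The leading \s* also swallows the space before the clause.
DEFINER_RE = re.compile(
    r"\s*DEFINER\s*=\s*"
    r"(?:`[^`]*`|'[^']*'|\w+)"        # user part: backticked, quoted or bare
    r"@"
    r"(?:`[^`]*`|'[^']*'|[\w.%-]+)",  # host part: backticked, quoted or bare
    re.IGNORECASE,
)


def strip_definers(sql: str) -> str:
    """Return sql with every DEFINER=user@host clause removed."""
    return DEFINER_RE.sub("", sql)


print(strip_definers(
    "CREATE DEFINER=`root`@`208.80.154.151` TRIGGER t "
    "BEFORE INSERT ON x FOR EACH ROW SET NEW.a = 1;"
))
```

This is a blunt text substitution: on a mysqldump versioned comment such as `/*!50017 DEFINER=...*/` it leaves an empty `/*!50017*/`, and it would also rewrite matching text inside string data, so the cleaned dump is worth a look before importing.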
[09:56:52] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071606 (mahmoud) Haha, I'm sure some would argue 338 is small potatoes, but I'm not one to argue. So, what's the connection limit, or recommendation? We'll see to...
[10:05:46] (PS2) MarcoAurelio: Updating HTML for main ~stewardbots page. [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/273772
[10:07:23] (CR) MarcoAurelio: "I have simplified the code. You can preview it accessing , pasting the code and usin" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/273772 (owner: MarcoAurelio)
[10:10:38] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071618 (jcrespo) > Haha I do not consider this a laughing matter. Replica labs is a fundamental piece of infrastructure for Wikimedia wikis, and many communities r...
[10:12:40] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071620 (jcrespo) > Also, is there an idle timeout or recommended connection keep alive time? Connections that are idle for over 300 seconds are killed to avoid unu...
[10:26:16] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071659 (mahmoud) Cool, I'm also up at 2:20am still looking at this, so I think you can assume I take this seriously, despite my attempts at a friendly and community...
[10:30:39] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2071661 (jcrespo) No rush here.
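The s52490 thread (T128355) states two server-side facts: after 338 simultaneous connections the account was throttled to 1 connection per user, and replica connections idle for over 300 seconds are killed. The recommended limit itself is cut off in the log, so `max_connections=2` and `max_idle_seconds=240` below are placeholders. A minimal Python sketch, assuming a zero-argument `connect()` factory whose connections have a `close()` method (as DB-API drivers do); `BoundedConnectionPool` is a hypothetical name. A semaphore caps how many connections one process has checked out, and idle connections older than the cutoff are closed rather than reused.

```python
import threading
import time
from contextlib import contextmanager


class BoundedConnectionPool:
    """Check out at most max_connections at once; drop stale idle ones."""

    def __init__(self, connect, max_connections=2, max_idle_seconds=240.0):
        self._connect = connect            # zero-arg factory returning a connection
        self._max_idle = max_idle_seconds  # placeholder, kept below the 300 s kill
        self._slots = threading.BoundedSemaphore(max_connections)
        self._idle = []                    # (connection, last_used) pairs
        self._lock = threading.Lock()

    @contextmanager
    def connection(self):
        self._slots.acquire()  # blocks while max_connections are checked out
        conn = None
        healthy = False
        try:
            conn = self._take_idle()
            if conn is None:
                conn = self._connect()
            yield conn
            healthy = True
        finally:
            if conn is not None:
                if healthy:
                    with self._lock:
                        self._idle.append((conn, time.monotonic()))
                else:
                    conn.close()  # don't recycle a connection that saw an error
            self._slots.release()

    def _take_idle(self):
        now = time.monotonic()
        with self._lock:
            while self._idle:
                conn, last_used = self._idle.pop()
                if now - last_used < self._max_idle:
                    return conn
                conn.close()  # too old: the server may already have killed it
        return None
```

Each worker wraps its queries in `with pool.connection() as conn:`. The cap is per process. If a tool runs one process per wiki language (mahmoud's 09:05:34 comment suggests the problem may be "magnified across all the languages"), the per-user budget has to be split among those processes.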
[13:17:53] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2071965 (scfc) @zhuyifei1999: Sorry for the delay. Regarding the name collision, I thought that it was already a problem, but it turned out that that would only happen when a change of mine is me...
[13:25:35] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2072025 (scfc) (I don't get why: ``` /usr/bin/qsub -sync y -o /dev/null -e /dev/null -i /dev/null -q "webgrid-generic" -l h_vmem=512m -b y -N "setup-$tool" tomcat7-instance-create public_tomcat >...
[13:29:48] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2072033 (scfc) Adding `-l release=trusty` worked (I did not check whether that actually produces a working Tomcat server). I'll amend your patch, @zhuyifei1999, make new packages and deploy them.
[13:31:52] (PS3) Tim Landscheidt: setup-tomcat: Add to install list & Change queue [labs/toollabs] - https://gerrit.wikimedia.org/r/272699 (https://phabricator.wikimedia.org/T118094) (owner: Zhuyifei1999)
[13:32:50] (CR) Tim Landscheidt: [C: 2] setup-tomcat: Add to install list & Change queue [labs/toollabs] - https://gerrit.wikimedia.org/r/272699 (https://phabricator.wikimedia.org/T118094) (owner: Zhuyifei1999)
[13:33:13] (CR) Tim Landscheidt: [V: 2] setup-tomcat: Add to install list & Change queue [labs/toollabs] - https://gerrit.wikimedia.org/r/272699 (https://phabricator.wikimedia.org/T118094) (owner: Zhuyifei1999)
[13:34:02] PROBLEM - Puppet failure on tools-bastion-05 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[13:34:39] (CR) Tim Landscheidt: [C: 2] "(Did not test this, but looks good.)" [labs/toollabs] - https://gerrit.wikimedia.org/r/273775 (https://phabricator.wikimedia.org/T119830) (owner: BryanDavis)
[13:36:11] (PS1) Tim Landscheidt: Cut release 1.10 [labs/toollabs] - https://gerrit.wikimedia.org/r/273893
[13:37:30] (CR) Tim Landscheidt: [C: 2] Cut release 1.10 [labs/toollabs] - https://gerrit.wikimedia.org/r/273893 (owner: Tim Landscheidt)
[13:47:43] when I want to use grid engine for python3 (via virtualenv) it returns this error to me: /data/project/dexbot/p3/bin/python3: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /data/project/dexbot/p3/bin/python3)
[13:47:58] I want to know if it's reported before or should I file a bug?
[13:48:11] or I'm doing something wrong ;)
[13:51:58] Labs, Tool-Labs: zoomviewer seems to be down - https://phabricator.wikimedia.org/T97790#2072059 (Shyamal) Resolved>Open
[13:58:10] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2072078 (scfc) Open>Resolved a:zhuyifei1999
[13:59:03] (PS1) Giuseppe Lavagetto: Adding non-null content to the wikimedia.org-wiki-mail.key file [labs/private] - https://gerrit.wikimedia.org/r/273896
[13:59:37] (CR) Giuseppe Lavagetto: [C: 2 V: 2] Adding non-null content to the wikimedia.org-wiki-mail.key file [labs/private] - https://gerrit.wikimedia.org/r/273896 (owner: Giuseppe Lavagetto)
[13:59:42] RECOVERY - Puppet failure on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:13:43] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2072103 (zhuyifei1999) Thanks. I was asking because a mail in Wikitech-I was giving me pressure :)
[14:17:41] (CR) Tim Landscheidt: [C: -1] "This would create binary packages that are different for different OS releases and cannot be handled by aptly. debian-mentors suggested t" [labs/toollabs] - https://gerrit.wikimedia.org/r/268563 (owner: Tim Landscheidt)
[14:18:35] Labs, Tool-Labs, Patch-For-Review: setup-tomcat does not work - https://phabricator.wikimedia.org/T118094#2072112 (Shoichi) I got the ~/public_tomcat/. It works. Thank you all very much!
[14:36:37] Labs, Tool-Labs: zoomviewer seems to be down - https://phabricator.wikimedia.org/T97790#2072131 (dschwen) Restarted the webserver 45mins ago. The rest of the time I spent figuring out my LDAP password to reply :-)
[14:57:05] Labs, Tool-Labs, Patch-For-Review, User-bd808: Make error pages mobile friendly - https://phabricator.wikimedia.org/T119830#2072195 (Dispenser) Open>Resolved
[15:28:49] Yoo hoo, everybody!
[15:29:37] I got an email about a failed puppet run on one of my instances, and I have no idea what steps to take (other than ask for support on #wikimedia-labs)
[15:32:57] ugh, my puppte.log is 0 bytes
[15:33:04] puppet.log
[15:33:15] and my /var/log/puppet is empty, too
[15:45:02] dschwen: run puppet manually
[15:45:13] that usually gives you similar results and/or an error message
[15:46:02] Ok, let me check..
[15:46:29] puppet agent -tv ?
[15:47:10] (CR) ArthurPSmith: [C: 1] "A little more cleanup, this looks fine, thanks!" [labs/tools/ptable] - https://gerrit.wikimedia.org/r/273749 (owner: Ricordisamoa)
[15:47:37] E: Package 'git-fat' has no installation candidate
[15:47:48] No problem, I already got fat :-(
[15:48:41] (CR) Ricordisamoa: [C: 2] Share PropertyAlreadySetException and TableCell from base.py [labs/tools/ptable] - https://gerrit.wikimedia.org/r/273749 (owner: Ricordisamoa)
[15:49:09] (Merged) jenkins-bot: Share PropertyAlreadySetException and TableCell from base.py [labs/tools/ptable] - https://gerrit.wikimedia.org/r/273749 (owner: Ricordisamoa)
[15:50:25] ha, I know the author of git-fat
[15:50:34] small world
[15:52:57] Anyhow Reedy , It looks like my apt sources are not up to date
[15:53:44] Package git-fat is not available, but is referred to by another package.
[16:14:31] I have deb http://apt.wikimedia.org/wikimedia precise-wikimedia main universe thirdparty in my sources.list.d
[16:16:54] apt-cache policy shows the package with a priority of -1001
[16:17:00] that's... rather low
[16:22:16] ooohhhh crap
[16:22:27] I figured it out
[16:22:48] I penalized the crap out of that source
[16:22:56] :-/
[16:28:00] well, fixed. puppet ran.
[16:46:31] liangent
[16:52:57] Labs, Operations, ops-codfw: Figure out what labstore hardware is viable in codfw - https://phabricator.wikimedia.org/T128083#2072530 (Papaul) labstore2002 is plugged into the switch but there is no activity light on the switch port ge-1/0/1 labstore2003 only has production DNS no mgmt DNS labstore2004...
[17:03:04] Labs, Operations, ops-eqiad: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2072589 (liangent) Thanks. It will not be a reimport but a move, so I'll have to take care of host names every time... There's no way to ensure reliability I think, unless you have prev.commonswiki.labsd...
[17:21:33] yurik: Could you update the MOTD for tools login to use HTTPS in the urls?
[17:21:40] kaldari: Did it work with Intuition?
[17:22:41] Krinkle: I got your notes. I'm going to try it today.
[17:23:03] Krinkle, ??
[17:23:09] yuvipanda: ^
[17:23:10] The server-side part is working great
[17:23:14] sigh
[17:23:17] Sorry :)
[17:23:31] Krinkle, i get those pings about once a day
[17:23:43] i guess i'm used to it by now )
[17:23:45] yurik: IRCcloud has a bug where it changes the tab order depending on who last spoke in any channel. You probably said something since I started typing
[17:24:07] i have been quiet on labs since forever )
[17:24:14] yurik: It's global
[17:24:16] bleh
[17:24:17] I'm in a lot of channels
[17:24:21] that's idiotic
[17:24:41] oh well
[17:30:13] Labs: New project for job candidate tests - https://phabricator.wikimedia.org/T127970#2072759 (chasemp) a:chasemp>jcrespo tossing over to jaime, let me know if I can help but I think it's in your corner from here on out?
[17:30:27] Labs, Labs-Infrastructure: nfs-export alert on labstore1001 - https://phabricator.wikimedia.org/T127835#2072761 (chasemp) p:Triage>Normal
[17:31:38] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2072766 (jcrespo)
[17:31:40] Labs: New project for job candidate tests - https://phabricator.wikimedia.org/T127970#2072764 (jcrespo) Open>Resolved The setup is resolved. I will create a separate ticket for extra help or closing.
[17:33:55] Labs, Labs-Infrastructure: nfs-export alert on labstore1001 - https://phabricator.wikimedia.org/T127835#2072773 (chasemp) Open>Resolved this is OK now and has been for awhile, we have little history in icinga but I see no recent errors for it. I know it was flaky for awhile but seems to be ok. It'...
[17:40:55] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2072806 (DannyH)
[17:51:13] Krinkle: the real fix is T102367 rather than a MOTD note
[17:51:13] T102367: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367
[17:52:16] Which is currently blocked by stats.grok.se I guess (T102457)
[17:52:16] T102457: Fix all http-only tools in tools.wmflabs.org - https://phabricator.wikimedia.org/T102457
[17:52:40] bd808: we could have a whitelised HTTPS proxy
[17:52:52] bd808: calling out to stats.grok.se from JS is already a privacy policy violation
[17:53:07] :/ yeah
[17:53:27] * bd808 is using a Labs tool right now that is very much in violation
[17:53:47] http://hatjitsu.wmflabs.org is a huge pile of external js
[17:54:02] and doesn't work over https either because of that
[18:02:19] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#2072882 (Andrew) new datapoint: Both labs-ns2 (labservices1001) and labs-ns3 (holmium) caused alerts during the outage on the 24th.
[18:10:22] Hi, is there any documentation anywhere on how to sync. the code from gerrit to the tool-labs?
[18:10:56] Labs, Tool-Labs, Operations, Traffic, HTTPS: Fix all http-only tools in tools.wmflabs.org - https://phabricator.wikimedia.org/T102457#1364862 (Krinkle) >>! In T102457#1391885, @yuvipanda wrote: > I can also help the maintainer of stats.grok.de setup SSL, if so desired. Or maybe Magnus' tool sho...
[18:11:04] mafk: 'git pull'
[18:11:39] Krinkle: so, after I "become ", then git pull ?
[18:12:04] mafk: Git isn't a Wikimedia-specific technology. It's a general tool to maintain and fetch code between servers and users.
[18:12:22] You have to set up the repo once in your tool account, and then 'git pull' will sync it indeed
[18:12:36] mafk: Which Gerrit repository do you want to copy to there?
[18:13:11] Krinkle: we have a labs/tools/stewardbots gerrit repository and I'd like to merge code from there to tools:~stewardbots
[18:13:47] mafk: OK
[18:13:49] git clone https://gerrit.wikimedia.org/r/labs/tools/stewardbots
[18:13:56] then $ cd stewardbots
[18:13:58] I did that already
[18:14:02] and $ git pull; to update the code
[18:14:16] I have the repository in my pc, and have submitted a patch to gerrit already
[18:14:43] Was the patch merged?
[18:14:44] https://gerrit.wikimedia.org/r/#/c/273772/
[18:14:47] This one?
[18:15:03] https://gerrit.wikimedia.org/r/#/projects/labs/tools/stewardbots,dashboards/default:recent
[18:15:19] I'm told that the random usernames assigned to helpees on #wikipedia-en-help are generated by a tool on WMFLabs - is this the case? And if so, how can I request a modification?
[18:15:23] Once the patch is merged, it will be included in the next 'git pull'
[18:15:24] Krinkle: not yet, I have +2 rights though; and would like to know what to do after I submit the change (cr+2 v+2)
[18:15:48] mafk: It seems Matanya disagrees? Please log in to Gerrit and reply to the concerns there
[18:15:56] Then those with CR+2 rights can merge the change indeed
[18:16:04] I don't know this repository in particular, so maybe ask Matanya
[18:16:11] I already replied, unless he's posted more comments
[18:16:14] Or someone else that co-owns this tool
[18:16:27] If you are a co-owner and feel comfortable merging it, go ahead.
[18:16:40] CR+2 and V+2 and then "Publish and Submit"
[18:16:56] (the V+2 is normally automated but this code doesnt' have tests yet so it has to be verified manually)
[18:17:17] I'd preferr if matanya removed his -2 first
[18:17:37] yeah, I'd love jenkins could run some tests there
[19:16:23] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Unable to add service group to service groups - https://phabricator.wikimedia.org/T128400#2073155 (yuvipanda)
[19:16:53] andrewbogott: ^ filed it
[19:17:17] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Unable to add service group to service groups - https://phabricator.wikimedia.org/T128400#2073170 (yuvipanda) I want 'paws-public' to be able to access 'paws' files, so I'll probably just add them into LDAP manually?
[19:41:15] PROBLEM - Puppet failure on tools-redis-1002 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[19:46:03] PROBLEM - Puppet failure on tools-webgrid-generic-1405 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[19:46:22] hu?: Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not parse for environment production: invalid byte sequence in US-ASCII at /etc/puppet/manifests/role/planet.pp:1 on node wikidev16videos.wikidata-dev.eqiad.wmflabs
[19:46:28] fresh instance
[19:47:51] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[19:48:21] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[19:50:01] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:55:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[19:56:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[19:58:34] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:58:34] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[20:00:16] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[20:00:17] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:01:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:01:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:04:37] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[20:05:55] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0]
[20:05:56] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:06:01] PROBLEM - Puppet failure on tools-bastion-05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:06:11] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:06:21] current puppet failures should be fixed with https://gerrit.wikimedia.org/r/#/c/273964/
[20:07:28] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[20:07:32] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2073373 (scfc)
[20:09:06] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[20:10:32] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2073373 (Samtar) Same for me, Windows 8.1, Firefox 44.0.2
[20:11:47] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Unable to add service group to service groups - https://phabricator.wikimedia.org/T128400#2073403 (yuvipanda) I've added `uid=tools.paws-public,ou=people,ou=servicegroups,dc=wikimedia,dc=org` to tools.paws so that works fine for now.
[20:18:14] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2073373 (Krenair) https://wikitech.wikimedia.org/wiki/MediaWiki:Sidebar/Group:user only shows one... This system is based on user roles - I wonder if one of the recent role changes...
[20:21:13] RECOVERY - Puppet failure on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:21:13] RECOVERY - Puppet failure on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:21:37] Labs, wikitech.wikimedia.org: "Manage Service Groups" is linked twice in sidebar - https://phabricator.wikimedia.org/T128404#2073449 (Krenair) ```> $roles = $wgMemc->get( wfMemcKey( 'openstackmanager', 'roles', 'Alex Monk' ) ); > var_dump( $roles ); array(2) { [0]=> string(12) "projectadmin" [1]=>...
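The Puppet alerts above report the share of recent datapoints above a critical threshold of 0.0, and the denominators vary between checks (11.11% is consistent with 1 of 9 samples, 42.86% with 3 of 7, 62.50% with 5 of 8). The check's code is not in the log, so this Python sketch only reproduces the message wording under stated assumptions: the metric is a series of per-run Puppet failure counts, missing samples are `None` and skipped, anything under 1% is OK (taken from the RECOVERY wording), and warning levels are ignored. `percent_above` and `puppet_status` are hypothetical names.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float = 0.0) -> float:
    """Percentage of non-missing datapoints strictly above threshold."""
    values = [v for v in datapoints if v is not None]  # assumed: gaps are None
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def puppet_status(datapoints, threshold=0.0, ok_below=1.0):
    """Render an OK/CRITICAL line in the style of the alerts in this log.

    ok_below is an assumed cutoff; the real check may also have a WARNING level.
    """
    pct = percent_above(datapoints, threshold)
    if pct < ok_below:
        return f"OK: Less than {ok_below:.2f}% above the threshold [{threshold}]"
    return f"CRITICAL: {pct:.2f}% of data above the critical threshold [{threshold}]"


# One failed run out of nine samples, as for tools-bastion-05 at 13:34:02.
print(puppet_status([0, 0, 0, 0, 0, 0, 0, 0, 1]))
```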
[20:30:56] RECOVERY - Puppet failure on tools-webgrid-generic-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:35:17] RECOVERY - Puppet failure on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:35:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:36:03] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:39:39] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:00] RECOVERY - Puppet failure on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:41:32] Labs, Tool-Labs, Operations, Traffic, HTTPS: Fix all http-only tools in tools.wmflabs.org - https://phabricator.wikimedia.org/T102457#2073490 (Nemo_bis) >>! In T102457#2072902, @Krinkle wrote: > Or maybe Magnus' tool should use the new [Page View API](https://wikitech.wikimedia.org/wiki/Analytic...
[20:42:42] Labs, Tool-Labs, Operations, Traffic, HTTPS: Fix all http-only tools in tools.wmflabs.org - https://phabricator.wikimedia.org/T102457#1364862 (Dzahn) >>! In T102457#2073490, @Nemo_bis wrote: > This bug is fixed as regards Magnus tools, I think; and it's probably too generic for the whole of Tool...
[20:44:10] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:47:30] RECOVERY - Puppet failure on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:50:07] Labs, Tool-Labs, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2073537 (Nemo_bis)
[20:50:49] Labs, Tool-Labs, Operations, Traffic, HTTPS: Fix all http-only tools in tools.wmflabs.org - https://phabricator.wikimedia.org/T102457#2073559 (Nemo_bis) >>! In T102457#2073493, @Dzahn wrote: > Use as a tracking ticket. Make subtasks for individual tools who are still http-only.? We need a list...
[20:51:15] Labs, Tool-Labs, Operations, Traffic, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#2073569 (Nemo_bis)
[20:51:19] Labs, Tool-Labs, Operations, Traffic, HTTPS: Make Magnus tools on tools.wmflabs.org work in HTTPS - https://phabricator.wikimedia.org/T102457#2073566 (Nemo_bis) Open>Resolved a:Magnus
[20:51:30] Labs, Tool-Labs, Operations, Traffic, HTTPS: Make Magnus tools on tools.wmflabs.org work in HTTPS - https://phabricator.wikimedia.org/T102457#1364862 (Nemo_bis)
[20:59:26] Labs, Tool-Labs, Operations, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2073607 (Ricordisamoa)
[21:03:02] Nemo_bis: thanks!
[21:16:35] Labs, Tool-Labs, Operations, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2073537 (scfc) I don't think it is feasible to detect those tools this way because there would be way too many code paths that are only triggered under spec...
[21:17:47] Labs, Tool-Labs: Puppet errors on tools-web-static-01 and tools-web-static-02 - https://phabricator.wikimedia.org/T128411#2073655 (scfc)
[21:33:59] Labs, Tool-Labs, Operations, Traffic, HTTPS: Detect tools.wmflabs.org tools which are HTTP-only - https://phabricator.wikimedia.org/T128409#2073765 (Dzahn) step 0. figure out which instance(s) are "on the proxy" step 1. what is the relevant role class? i think "dynamicproxy" module but no role...
[22:07:00] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#2073883 (Andrew) To recap: Designate writes domain information to the pdns mysql database directly. Records, on the other hand, are relayed to pdns via axfr. That means there are potentially three wri...
[22:13:06] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[22:14:42] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:24:13] !log ganglia project created, bastion/jessie/aggregator testing, admins dzahn, filippo
[22:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia/SAL, Master
[22:27:15] !log ganglia nice, this project exited in the past and SAL continues as if it was never deleted :)
[22:27:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia/SAL, Master
[22:29:31] !log ganglia why can i not add instances in this project? there is no "Add instance" link on Special:NovaInstance, unlike for other projects i'm admin of :(
[22:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia/SAL, Master
[22:30:27] is it known that instance creation is broken like that?
[22:30:39] i can manually construct the URL and go to https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=create&project=ganglia&region=eqiad
[22:30:52] and the dropdown also has no values
[22:31:13] !log ganglia https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=create&project=ganglia&region=eqiad broken
[22:31:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia/SAL, Master
[22:31:49] ah, let me first try the logout/login stuff agin
[22:32:40] yes, indeed, that helped
[22:33:06] andrewbogott, could that be due to our caching?
[22:33:08] ^
[22:33:54] !log logout/login fixed it. created instance jessie-bastion-01
[22:33:54] logout/login is not a valid project.
[22:33:59] probably
[22:34:01] !log ganglia logout/login fixed it. created instance jessie-bastion-01
[22:34:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ganglia/SAL, Master
[23:04:03] RECOVERY - Puppet failure on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:07:56] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:13:33] Labs, Tool-Labs: Fix 'unknown's in shinken - https://phabricator.wikimedia.org/T99072#2074100 (scfc) A lot of checks with "UNKNOWN" have "(Service Check Timed Out)" or "execution of the check script exited with exception timed out". AFAIUI, Shinken itself does not do actively check Puppet runs, etc., but...