[00:27:00] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [01:02:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [01:02:02] 06Labs, 10wikitech.wikimedia.org: Upgrade SMW to 1.9 or later - https://phabricator.wikimedia.org/T62886#2833823 (10Reedy) p:05Normal>03Lowest [01:02:11] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: osm4wiki fills error.log with warnings about uninitialized variables - https://phabricator.wikimedia.org/T151568#2833824 (10scfc) [01:13:53] (03PS3) 10BryanDavis: Add client side registration form validation [labs/striker] - 10https://gerrit.wikimedia.org/r/313142 (https://phabricator.wikimedia.org/T144710) [01:17:10] 10Striker: Gernerate minified js and source maps as a build step - https://phabricator.wikimedia.org/T151974#2833848 (10bd808) [01:17:40] (03CR) 10BryanDavis: "PS3 adds an un-minified version of parsley.js and I created T151974 to track the desire to make minification a build step." [labs/striker] - 10https://gerrit.wikimedia.org/r/313142 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [01:20:36] bd808: how hard is it to set up striker locally? or should I just review the code? [01:20:57] legoktm: there is a mw-vagrant role [01:21:07] and it comes with instructions for the manual bits [01:21:27] ...I....don't have mw-vagrant set up anymore :S [01:21:52] :disapproving stare: [01:22:09] You won't want to set it up manually ;) [01:22:12] but containers!!!111 [01:22:29] it needs a SUL wiki, an LDAP wiki, and a Phabricator instance [01:23:01] did you ever get mw-vagrant and the LXC provider to work on Fedora? 
[01:23:21] its pretty slick on ubuntu/debian [01:23:52] nope [01:24:02] legoktm: there's a testing project in labs too if you'd like access to that instead of a local install [01:24:13] I just have a systemd-nspawn container running now [01:24:20] bd808: well, I assume you've tested all of these patches right? [01:24:28] yes! [01:24:31] many times [01:24:45] mostly they need other eyeballs to look for dumb things [01:28:12] (03CR) 10Legoktm: [C: 032] Add client side registration form validation [labs/striker] - 10https://gerrit.wikimedia.org/r/313142 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [01:29:42] (03Merged) 10jenkins-bot: Add client side registration form validation [labs/striker] - 10https://gerrit.wikimedia.org/r/313142 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [01:31:05] * bd808 hugs legoktm [01:50:33] 06Labs, 10Tool-Labs, 06Security-Team: Allow tools to have their own X.tools.wmflabs.org subdomain - https://phabricator.wikimedia.org/T125589#2833904 (10Krinkle) [01:50:54] 06Labs, 10Tool-Labs, 06Security-Team: Allow tools to have their own ".tools.wmflabs.org" subdomain - https://phabricator.wikimedia.org/T125589#1991911 (10Krinkle) [02:04:57] (03PS2) 10BryanDavis: Update client side validation for username and shellname [labs/striker] - 10https://gerrit.wikimedia.org/r/316205 [02:04:59] (03PS3) 10BryanDavis: Check request ip for account creation blocks on Wikitech [labs/striker] - 10https://gerrit.wikimedia.org/r/316026 (https://phabricator.wikimedia.org/T147024) [02:05:01] (03PS4) 10BryanDavis: Create LDAP and Striker users from registration form data [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) [02:05:03] (03PS3) 10BryanDavis: Validate new usernames with action=query&list=users&usprop=cancreate [labs/striker] - 10https://gerrit.wikimedia.org/r/316025 (https://phabricator.wikimedia.org/T147024) [02:05:05] (03PS4) 10BryanDavis: Use consistent naming for 
accounts [labs/striker] - 10https://gerrit.wikimedia.org/r/313145 [02:05:07] (03PS4) 10BryanDavis: Add striker.labsauth.utils.oauth_from_session helper [labs/striker] - 10https://gerrit.wikimedia.org/r/313144 (https://phabricator.wikimedia.org/T144710) [02:05:09] (03PS4) 10BryanDavis: Add a goal prompt for SSH public key upload [labs/striker] - 10https://gerrit.wikimedia.org/r/313146 (https://phabricator.wikimedia.org/T144710) [04:34:57] 06Labs, 10Tool-Labs: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980#2834080 (10bd808) [04:47:59] 06Labs, 10Tool-Labs, 15User-bd808: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980#2834141 (10bd808) a:03bd808 Looking at the nodes, I think it will be easiest to remove tools-exec-1201 through tools-exec-1211 and leave consecutively numbered hosts. [04:48:45] !log tools draining tools-exec-1201 [04:48:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:52:36] !log tools drained tools-exec-1201 (T151980) [04:52:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:52:40] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [04:55:59] !log tools disabled queues on tools-exec-1202 (T151980) [04:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:56:19] !log tools disabled queues on tools-exec-1203 (T151980) [04:56:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:56:34] !log tools disabled queues on tools-exec-1204 (T151980) [04:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:56:47] !log tools disabled queues on tools-exec-1205 (T151980) [04:56:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:57:14] !log tools disabled queues on tools-exec-1206 (T151980) [04:57:18] Logged the message at 
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:57:30] !log tools disabled queues on tools-exec-1207 (T151980) [04:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:57:44] !log tools disabled queues on tools-exec-1208 (T151980) [04:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:57:50] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [04:58:02] !log tools disabled queues on tools-exec-1209 (T151980) [04:58:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:58:16] !log tools disabled queues on tools-exec-1210 (T151980) [04:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [04:58:36] !log tools disabled queues on tools-exec-1211 (T151980) [04:58:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:00:16] !log tools drained tools-exec-1202 (T151980) [05:00:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:00:45] !log tools drained tools-exec-1203 (T151980) [05:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:04:33] !log tools drained tools-exec-1204 (T151980) [05:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:04:36] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [05:07:14] !log tools drained tools-exec-1205 (T151980) [05:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:10:57] !log tools drained tools-exec-1206 (T151980) [05:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:11:01] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [05:12:29] !log tools drained tools-exec-1207 (T151980) [05:12:32] Logged the message at 
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:13:00] !log tools drained tools-exec-1208 (T151980) [05:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:14:26] !log tools drained tools-exec-1209 (T151980) [05:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:18:32] !log tools drained tools-exec-1211 (T151980) [05:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:18:35] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [05:20:41] !log tools rescheduled continuous jobs on tools-exec-1210; 2 task queue jobs remain (T151980) [05:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [05:26:11] 06Labs, 10Tool-Labs, 15User-bd808: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980#2834171 (10bd808) The queues are disabled on tools-exec-1201 through tools-exec-1211. All continuous jobs have been rescheduled using `qmod -rj`. There are 2 task queue jobs still running on too... 
[06:40:40] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [06:58:02] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [07:00:40] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:05:59] PROBLEM - Puppet run on tools-exec-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [07:15:40] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [07:33:02] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [07:35:40] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [07:40:59] RECOVERY - Puppet run on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [08:48:55] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [09:28:54] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [09:41:29] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2834366 (10Marostegui) >>! In T147052#2834292, @gerritbot wrote: > Change... 
[09:48:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [10:13:07] (03CR) 10Lokal Profil: [C: 032] Fix Flake8 violation E305 and pin flake8 dependency [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323692 (owner: 10Jean-Frédéric) [10:14:45] (03Merged) 10jenkins-bot: Fix Flake8 violation E305 and pin flake8 dependency [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323692 (owner: 10Jean-Frédéric) [10:19:38] (03PS1) 10Lokal Profil: [Blocked] Change DB engine to InnoDB [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/324396 (https://phabricator.wikimedia.org/T138517) [10:21:57] (03CR) 10Jean-Frédéric: [C: 031] "LGTM" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/324396 (https://phabricator.wikimedia.org/T138517) (owner: 10Lokal Profil) [10:23:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [11:29:03] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2834556 (10Marostegui) I have deleted all the databases that the script su... [11:51:03] Hi [11:51:06] Hi [11:51:24] Is the data dump on labs time laggged? 
[11:51:26] https://quarry.wmflabs.org/query/5979 [11:51:50] I've got 45 rows here showing up as orphaned, that when I check on English Wikipedia aren't [11:53:41] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:54:38] ShakespeareFan00, there is at ticket to report drift issues: https://phabricator.wikimedia.org/T138967 [11:55:14] lag can be seen on https://tools.wmflabs.org/replag/ [11:55:29] which just queries the heartbeat_p.heartbeat tables [11:57:24] 06Labs, 10DBA: Labs database replica drift - https://phabricator.wikimedia.org/T138967#2834642 (10ShakespeareFan00) https://quarry.wmflabs.org/query/5979 This as of 30 Novemeber 2016 is showing 45 rows that the query says are 'orphaned' but when checked on English Wikipedia certainly arent. [11:58:26] The replag doesn't show anything obvious as to why my 45 rows are incorrectly in the query [12:01:27] I get 61 rows on production [12:01:43] are that supposed to be 0 rows? [12:01:54] (because in the comment it is not clear) [12:03:39] the thing is [12:03:54] you are querying *link tables [12:04:03] those are filled-in asynchronously [12:04:35] which means 2 things: they are not consistent with the contents at all time (independently of the replication lag) [12:04:48] and the asnyc tasks can fail even if the edits suceeded [12:07:44] jynus: It was 45 when I queried it [12:08:03] And yes it should have been 0 as i was clearing up entries in it [12:08:21] yes, I am executing the same query on production and giving you the result there [12:08:38] sometimes reparsing/purging the pages fixes the issue [12:08:50] I'd been doing that anyway :) [12:09:14] Unless of course there's some oversight in the query specfic to the pages concerned. 
[12:10:25] that, I cannot say, but if there is someone that can help is probably on this channel [12:33:42] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [12:54:08] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 1.44 ms [12:57:19] (03CR) 10Lokal Profil: [C: 04-1] "Two typos otherwise fine." (032 comments) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323689 (owner: 10Jean-Frédéric) [12:57:42] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22) [12:58:00] (03CR) 10Lokal Profil: "recheck" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/309844 (owner: 10Jean-Frédéric) [12:59:43] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms [13:04:40] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218) [13:34:27] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2834847 (10Marostegui) I have been taking a look at the tables the script... [13:37:14] (03PS3) 10Jean-Frédéric: Add script to print a ready-made deploy message for SAL [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323689 [13:38:19] (03CR) 10Jean-Frédéric: "Good catch with the typos − fixed :)" [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323689 (owner: 10Jean-Frédéric) [14:06:52] 06Labs, 10Labs-Infrastructure, 10DBA, 13Patch-For-Review: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2834958 (10Marostegui) I have dropped all the tables suggested by the scri... 
[14:48:40] 06Labs: accessing Wiktionary dumps from etytree project - https://phabricator.wikimedia.org/T151933#2835048 (10chasemp) 05Open>03Resolved [14:54:10] 06Labs: accessing Wiktionary dumps from etytree project - https://phabricator.wikimedia.org/T151933#2835056 (10Epantaleo) thanks! [15:07:00] can I have a tool labs admin to restart a tool? [15:09:14] mafk: I'm about what tool are we talking about? [15:09:23] tools.coibot [15:09:43] COIBot is not answering any IRC commands for ~one week aprox. [15:10:06] in the wiki-side, it seems it's working fine [15:10:21] I've asked the mantained on Meta and Wikitech some days ago (Beetstra) [15:15:29] mafk: it seems to be crashing every day since at least hte 18th [15:16:06] Yep, it has some issues lately [15:17:09] but I restarted it as indicated [15:17:19] !log tools restart coibot 'coibot.sh -o syslog.output -e syslog.errors -r yes' [15:17:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [15:19:06] thanks chasemp - I saw the bot went down on IRC, waiting for respawn [15:19:54] which isn't doing [15:20:56] started and died mysteriously [15:20:58] Wed Nov 30 15:17:00 UTC 2016 [15:20:58] Starting coibot on tools-exec-1419 [15:25:07] mafk: DBD::mysql::st execute failed: SELECT command denied to user 's51229'@'10.68.23.223' for table 'coibot_rangecontentblacklist' at Parser.pl line 1883 [15:25:48] :S [15:25:49] DBI connect('coibot;bots-sql2','coibot',...) failed: Unknown MySQL server host 'bots-sql2' (0) at ReportSaver.pl line 96. [15:25:49] Dying painfully because of faulthy mysql handle at ReportSaver.pl line 98. 
[15:26:14] mafk: I got it started here but eventually it dies [15:26:21] and dumps errors like that [15:26:37] chasemp: I guess it needs Beetstra's magic touch [15:26:51] something is up here yes, and I don't have the time to dig in unf [15:27:14] mafk: if you are interested in being a maintainer we could explore adding you to poke at it :) [15:27:19] but atm [15:27:19] tools.coibot:*:51229:beetstra [15:27:21] chasemp: can you paste into a Phab Paste the whole error sequence of when restarting so I can show to him? [15:27:27] sure [15:28:24] restarting clean to get a distinct error log [15:28:27] chasemp: thanks & already asked Beetstra to add me there, but I'm a n00b in this area so I'd only be able to do basic stuff like restarting the bot... when and if it wants [15:29:27] https://phabricator.wikimedia.org/P4543 [15:30:00] sure I understand, no time like the present though :) [15:30:17] thanks for the paste, I'll mail it to him [15:48:34] 06Labs, 10Tool-Labs, 06WMF-Legal: Install unrar on Tool Labs - https://phabricator.wikimedia.org/T151794#2835137 (10Dispenser) $ unrar-free --list [[https://commons.wikimedia.org/wiki/File:Camera_10125.jpg|Camera_10125.jpg]] ``` unknown archive type, only plain RAR 2.0 supported(normal and solid archives), S... 
[15:59:29] (03CR) 10Lokal Profil: [C: 032] Add script to print a ready-made deploy message for SAL [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323689 (owner: 10Jean-Frédéric) [16:00:04] (03CR) 10Lokal Profil: [C: 032] Enable YAML linting via tox using yamllint [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/309844 (owner: 10Jean-Frédéric) [16:04:27] (03Merged) 10jenkins-bot: Add script to print a ready-made deploy message for SAL [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/323689 (owner: 10Jean-Frédéric) [16:05:44] (03Merged) 10jenkins-bot: Enable YAML linting via tox using yamllint [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/309844 (owner: 10Jean-Frédéric) [16:12:48] !log tools.heritage Deploy latest from Git master: 0ecce3e, 095108c, d452948, f8922c9, a6d9634, 4ab4148, d9afa6b [16:12:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL [17:09:17] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [17:23:28] jynus: Do you happen to have some time to give some advice about some queries that stopped completing? Used to be 3-4 minutes but now just gets killed because of timeout. See for example http://tools.wmflabs.org/multichill/queries2/commons/paintings_without_wikidata_ci.sql [17:25:24] you can use SHOW EXPLAIN to get the EXPLAIN [17:26:14] multichill, https://phabricator.wikimedia.org/P4545 [17:26:46] templatelinks was recently altered, check if you have to rewrite your queries to optimize for that [17:28:56] jynus: That's useful info. https://www.mediawiki.org/w/index.php?title=Manual:Templatelinks_table&action=history doesn't seem to have any recent changes. Is it up to date? [17:29:28] yes, what was not up to date was production [17:29:58] https://phabricator.wikimedia.org/T139090 [17:33:48] jynus: I have no clue how I would improve this. Would I need to force an index? 
Or did this query just become impossible after the index change? [17:34:38] you cannot force indexes on views [17:35:04] you may be able to rewrite it to make it faster [17:35:44] if you need support to add more views to optimize queries, file a bug [17:35:58] https://gerrit.wikimedia.org/r/#/c/190774/3/maintenance/archives/patch-tl_from_namespace.sql <- is this the actual index change? [17:36:12] there are a couple of _userindex variants on some views [17:36:26] yes [17:37:27] you could also alter the order in which the tables are queried [17:37:31] with straight join [17:37:40] it all depends on each specific case [17:38:31] I can give you a copy of production, and even optimized views, but production is optimized for mediawiki usage [17:38:54] that change was a huge win for certain wiki queries [17:39:31] alternatively, it may not have nothing to do with that change [17:39:41] I'm a complete noob when it comes to query optimization. [17:39:46] and you just hit a link with millions of usages [17:39:59] which makes the query innherently slow [17:40:36] multichill, then I would recommend you coming to https://phabricator.wikimedia.org/T149624 [17:41:15] That's a different continent, not going there [17:41:25] And I don't believe in remote participation [17:41:26] or reading wonderful free guides such as http://www.slideshare.net/jynus/query-optimization-from-0-to-10-and-up-to-57 [17:42:30] sadly, there is not a --run-faster mysql parameter (yet) [17:42:35] :-) [17:43:22] I looked at a couple of most of them were plain unreadable junk. This looks much better. 
[17:43:43] I have a local mediawiki instance so I can always look at the DB layout there [17:44:15] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0] [17:44:29] you have the source here: https://github.com/jynus/query-optimization [17:47:04] Was time for a rewrite anyway so I think I'm just going to break it up in a couple of smaller queries and do some more local processing [17:47:51] execution limit may be higher in the future [17:48:09] as we will have dedicated resources for analytics-like processing [17:48:25] I grepped in my log. Was between 3-5 minutes so it's burning resources it shouldn't burn [17:48:50] https://phabricator.wikimedia.org/T147051 [17:50:54] 7 JOINs is a bit too much anyway, changed it to 3. [17:51:33] Let's see how long that takes [17:51:49] despite the common myth, joins are not a performance problem [17:52:04] the issue there is the large range on templatelinks [17:52:54] cut off [17:54:27] It does make it easier to focus ;-) The query is doing a filepage.page_id=ctl.tl_from JOIN. Isn't tl_from indexed? [17:58:37] jynus: How big is the templatelinks table anyway for Commons? Much bigger than other wiki's? [17:59:51] incredible big [18:00:13] like hunderds of GB big, IIRC [18:00:28] what it is worse [18:00:40] a single template can have half of the table [18:00:53] I think discovery run into this problem recently [18:01:33] there is a single text row for each time a template is used, which is quite common on commons :-) [18:01:34] I remember back in the day we did some hacks on Commons to keep some links from entering the database because tables exploded. 
I think it was interwiki links [18:01:55] Each page uses on average probably 30 templates [18:01:58] there is a session about that on the wikitech17 [18:02:14] it being large is a problem [18:02:26] Good reason to finally switch some junk to LUA [18:02:27] some templates having millions of usages is worse [18:02:51] because people assume they will get a few dozen records [18:02:55] !bash < jynus> sadly, there is not a --run-faster mysql parameter (yet) [18:02:56] bd808: Stored quip at https://tools.wmflabs.org/bash/quip/AVi2Z1eBQMK9DA-FJpXK [18:02:56] and they get millions [18:03:58] So for now, stay away from the template table on Commons? [18:04:33] Do you graph the size of that table somewhere jynus? Might be a user change that made the number of links explode? [18:05:59] multichill, I wanted to do that, and had the code for it [18:06:17] but some people expressed concern for privacy [18:06:35] so I have to implement it, but under NDA [18:07:01] Huh? How is the size of the templatelinks table a privacy concern? [18:07:04] on a private prometheus (which is actually more work) [18:07:08] multichill, not that one [18:07:11] but others [18:07:18] it is easy to have all [18:07:28] Like for example? [18:07:30] not easy to cherry pick this one yes [18:07:35] this one no [18:08:00] I do not know, I asked, security/performance said no [18:08:02] For which Commons table the size would be a privacy concern? [18:08:10] ask them [18:08:35] there is also another practical problem [18:08:49] Some sort of privacy ghost haunting the foundation? I've seen some odd changes with reason "privacy" :P [18:08:54] we have close to 100.000 tables, so I need more development [18:09:09] but I welcome patches [18:09:41] I cannot record that size every minute for s3, unlike the other thigs I expose [18:09:53] so needs more work [18:09:57] For performance top 10 wiki's are probably most interesting I guess? 
[18:10:38] which means I have to implement that and I have no time right now because I am trying to fix labs [18:11:09] Priorities.... I guess I'm just going to hit the competition ( https://commons.wikimedia.org/w/index.php?title=File:Laguna_Hedionda,_Bolivia,_2016-02-03,_DD_50.JPG&action=cirrusdump ) [18:13:51] 06Labs, 10Tool-Labs, 13Patch-For-Review: error starting webservice - https://phabricator.wikimedia.org/T142932#2835715 (10scfc) 05Open>03Resolved a:05valhallasw>03Andrew >>! In T142932#2827290, @chasemp wrote: > Sure, I was thinking of webservices running in k8s land. We had some issues a week ago t... [18:24:39] Hi, my home folder seems to have changed to /mnt/nfs/labstore-secondary-tools-home/tom29739 from the /home/tom29739 it was before. Is that supposed to happen? [18:27:25] tom29739_ are you on tools? [18:27:34] Yep it is expected on tools [18:27:40] as it was migrated to nfs. [18:28:07] paladox, yes, but the user homes have been on NFS for ages. [18:28:17] Oh [18:28:52] If I SSH to tools-bastion-02 then I get the same home folder that I get on tools-bastion-03 [18:29:33] tom29739_: that's the actual path you should still be referencing /home as usual [18:29:44] wherever that path surfaced don't worry about it and use /home [18:29:48] OK. [18:34:35] andrewbogott: Hi, The new nwoffliners servers you have achieved to organized for me are meanwhile setup and ready to start in December for a montly run. [18:35:02] andrewbogott: Thank you again very much for your help here. [18:35:56] andrewbogott: But I would have a last request. One time the new ZIM files generated we need to get them retrieved per rsync from our downlaod server via rsync. Might that be possible, like for mwoffliner [1..3 [18:36:21] ] to assign them a public ip (on mwoffliner4 and mwoffliner5)? [18:38:04] Kelson: is it possible to rsync out to a static IP on the other end? 
We could do this I believe but using static IPs for an rsync target only isn't a great option [18:40:04] chasemp: Yes, I understand... It's just not how the whole stuff is thought. I try as much as possible to avoid to use rsync in push mode. [18:40:37] chasemp: maybe using one of he VM as a gateway would be the solution.... [18:40:41] can you rsync via ssh that hops through the bastion? [18:41:05] it would be easy to create a wikitech account just for that and add it to the project [18:42:01] bd808: It's an alternative indeed... [18:42:38] seems like yes http://unix.stackexchange.com/questions/43094/how-to-use-rsync-with-a-remote-remote-host [18:43:00] think rsync just reads your ssh config right? [18:43:04] then you can give back 3 static ips instead of taking 2 more :) [18:44:29] bd808: chasemp: Have to rethink again about that problem. Just thought this would be easy to assign public ips. Thx for your quick feedbacks. [18:45:11] Kelson: sure yeah, thanks for thinking on it [18:45:15] it is, but they're limited in number [18:45:24] we have a /25 for the whole of labs [18:45:53] can't really just give them out for convenience alone [18:47:09] Krenair: yes, if you are short on ips (which is probably the case, everybody is more or less) I'll do my best to find a way to free at least 2 of the one I use. [18:48:32] someday we'll have IPv6 for Labs. That will be a happy day [18:49:35] bd808: oh yes! 
[18:51:14] 06Labs, 10Labs-Infrastructure, 10Icinga, 06Operations: remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" - https://phabricator.wikimedia.org/T152024#2835831 (10Dzahn) p:05Triage>03Low [18:52:46] 06Labs, 10Labs-Infrastructure, 10Icinga, 06Operations: remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" - https://phabricator.wikimedia.org/T152024#2835820 (10Dzahn) [19:19:19] 06Labs, 10Labs-Infrastructure, 10Icinga, 06Operations: remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" - https://phabricator.wikimedia.org/T152024#2835945 (10Krenair) "gridmaster host resolution" is a tools project specific thing, why is it even in icinga instead of shinken? [20:02:43] (03CR) 10Andrew Bogott: [C: 04-1] Create LDAP and Striker users from registration form data (033 comments) [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:03:24] 06Labs, 10Tool-Labs: /etc/cron.daily/logrotate: gzip: stdin: file size changed while zipping - https://phabricator.wikimedia.org/T96007#1206058 (10scfc) I can't find any "gzip: stdin: file size changed while zipping" in the mails I received last week and I don't remember any verbose `cron` mails from `logrotat... 
[20:09:13] (03CR) 10BryanDavis: Create LDAP and Striker users from registration form data (033 comments) [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:10:30] (03CR) 10Andrew Bogott: [C: 031] Add striker.labsauth.utils.oauth_from_session helper [labs/striker] - 10https://gerrit.wikimedia.org/r/313144 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:12:47] (03CR) 10Andrew Bogott: Create LDAP and Striker users from registration form data (032 comments) [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:16:57] (03CR) 10Andrew Bogott: [C: 031] Create LDAP and Striker users from registration form data (031 comment) [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:22:56] (03CR) 10Andrew Bogott: [C: 032] Create LDAP and Striker users from registration form data [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:23:09] (03CR) 10Andrew Bogott: [C: 032] Add striker.labsauth.utils.oauth_from_session helper [labs/striker] - 10https://gerrit.wikimedia.org/r/313144 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:24:20] (03Merged) 10jenkins-bot: Create LDAP and Striker users from registration form data [labs/striker] - 10https://gerrit.wikimedia.org/r/313143 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:24:35] (03Merged) 10jenkins-bot: Add striker.labsauth.utils.oauth_from_session helper [labs/striker] - 10https://gerrit.wikimedia.org/r/313144 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:24:43] (03CR) 10Andrew Bogott: [C: 032] Use consistent naming for accounts [labs/striker] - 10https://gerrit.wikimedia.org/r/313145 (owner: 10BryanDavis) [20:25:34] (03Merged) 10jenkins-bot: Use 
consistent naming for accounts [labs/striker] - 10https://gerrit.wikimedia.org/r/313145 (owner: 10BryanDavis) [20:29:18] (03CR) 10Andrew Bogott: [C: 04-1] Add a goal prompt for SSH public key upload (031 comment) [labs/striker] - 10https://gerrit.wikimedia.org/r/313146 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:40:10] (03PS5) 10BryanDavis: Add a goal prompt for SSH public key upload [labs/striker] - 10https://gerrit.wikimedia.org/r/313146 (https://phabricator.wikimedia.org/T144710) [20:40:26] (03CR) 10BryanDavis: Add a goal prompt for SSH public key upload (031 comment) [labs/striker] - 10https://gerrit.wikimedia.org/r/313146 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:47:47] (03CR) 10Andrew Bogott: [C: 032] Validate new usernames with action=query&list=users&usprop=cancreate [labs/striker] - 10https://gerrit.wikimedia.org/r/316025 (https://phabricator.wikimedia.org/T147024) (owner: 10BryanDavis) [20:50:56] (03CR) 10Andrew Bogott: [C: 032] Check request ip for account creation blocks on Wikitech [labs/striker] - 10https://gerrit.wikimedia.org/r/316026 (https://phabricator.wikimedia.org/T147024) (owner: 10BryanDavis) [20:55:28] (03CR) 10Andrew Bogott: [C: 032] Update client side validation for username and shellname [labs/striker] - 10https://gerrit.wikimedia.org/r/316205 (owner: 10BryanDavis) [20:55:56] (03CR) 10Andrew Bogott: [C: 032] Add a goal prompt for SSH public key upload [labs/striker] - 10https://gerrit.wikimedia.org/r/313146 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:56:15] * bd808 hugs andrewbogott [20:56:26] what's this going to cost me? 
;) [20:56:56] (03Merged) 10jenkins-bot: Add a goal prompt for SSH public key upload [labs/striker] - 10https://gerrit.wikimedia.org/r/313146 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:58:18] (03CR) 10BryanDavis: [C: 04-2] "Needs to be rebuilt" [labs/striker/staticfiles] - 10https://gerrit.wikimedia.org/r/313152 (https://phabricator.wikimedia.org/T144710) (owner: 10BryanDavis) [20:59:33] bd808: I'll think of something :) [20:59:47] 06Labs, 10Tool-Labs, 10DBA: Tool Labs: Add skin, language, and variant to user_properties_anon - https://phabricator.wikimedia.org/T152043#2836353 (10Krinkle) [21:00:00] 06Labs, 10Tool-Labs, 10DBA, 07Regression: Tool Labs: Add skin, language, and variant to user_properties_anon - https://phabricator.wikimedia.org/T152043#2836368 (10Krinkle) p:05Triage>03High [21:06:45] I created a new host toolsbeta-valhallasw-puppet-compiler-01, but puppet is not running due to 'puppet-agent[557]: Could not request certificate: getaddrinfo: Name or service not known"'. This may be due to the old local puppetmaster, but I can't find any references to it (Hiera:Toolbeta has been deleted) [21:08:18] (I wouldn't know why else a dns resolution would fail, but unfortunately the log does not indicate which address is being resolved...) [21:10:22] jynus: I think I eliminated all usage of the templatelinks table and running the bot now. 
Fingers crossed ;-) [21:19:23] yuvipanda: I just noticed I still had a bot using http://wdq.wmflabs.org/ and it seems to be down [21:22:46] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:23:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1407 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:23:38] àha [21:23:40] "Error: Failed to apply catalog: Parameter source failed on File[/usr/local/sbin/puppet-run]: Could not understand source" [21:23:44] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:24:24] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:24:27] so at least puppet is running now, albeit broken ;-) [21:24:32] PROBLEM - Puppet run on tools-docker-registry-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:24:56] PROBLEM - Puppet run on tools-worker-1005 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:25:04] PROBLEM - Puppet run on tools-exec-1218 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:25:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1204 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:25:39] PROBLEM - Puppet run on tools-docker-registry-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:25:55] PROBLEM - Puppet run on tools-proxy-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:26:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:26:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:26:43] PROBLEM - Puppet run on tools-exec-1404 is CRITICAL: CRITICAL: 22.22% of data above the 
critical threshold [0.0] [21:26:47] PROBLEM - Puppet run on tools-webgrid-lighttpd-1404 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:27:09] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:27:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:27:15] 06Labs, 10wikitech.wikimedia.org: Script to check blocks no longer works - https://phabricator.wikimedia.org/T152047#2836476 (10scfc) [21:27:15] PROBLEM - Puppet run on tools-logs-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:27:16] PROBLEM - Puppet run on tools-redis-1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:27:16] PROBLEM - Puppet run on tools-exec-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:27:34] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:27:40] PROBLEM - Puppet run on tools-exec-1408 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:28:10] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:28:14] PROBLEM - Puppet run on tools-exec-1205 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:28:20] PROBLEM - Puppet run on tools-worker-1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:28:30] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:28:38] PROBLEM - Puppet run on tools-worker-1022 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:28:38] PROBLEM - Puppet run on tools-exec-1215 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:29:02] PROBLEM - Puppet run 
on tools-exec-1415 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:29:10] PROBLEM - Puppet run on tools-exec-1413 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:29:18] PROBLEM - Puppet run on tools-worker-1015 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:29:21] PROBLEM - Puppet run on tools-worker-1019 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:30:04] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:30:18] PROBLEM - Puppet run on tools-webgrid-lighttpd-1413 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:30:28] PROBLEM - Puppet run on tools-webgrid-lighttpd-1406 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:30:28] PROBLEM - Puppet run on tools-static-10 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:31:59] PROBLEM - Puppet run on tools-worker-1003 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:31:59] PROBLEM - Puppet run on tools-exec-1403 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:32:11] PROBLEM - Puppet run on tools-exec-1419 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:32:29] PROBLEM - Puppet run on tools-worker-1014 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:32:55] PROBLEM - Puppet run on tools-exec-1213 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:33:47] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:33:53] PROBLEM - Puppet run on tools-exec-1211 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:34:05] matanya: do you have any stats handy for how much content the 'video' project has contributed to commons? 
[21:34:22] yes andrewbogott [21:34:22] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:34:22] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:34:27] few secs [21:34:50] PROBLEM - Puppet run on tools-exec-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:36:00] PROBLEM - Puppet run on tools-exec-1202 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:36:28] 5665 videos to date, andrewbogott [21:37:10] matanya: awesome, do you happen to have total data size as well? (We're just gathering up some Labs usage stats for a report) [21:37:38] andrewbogott: as in bytes used? [21:37:53] matanya: yes, but only if you already have it [21:38:00] Or total minutes or hours would be good [21:38:08] (that one's almost certainly harder) [21:38:08] let me check [21:40:00] andrewbogott: you will need to write some python to calculate that, i fear [21:40:14] matanya: ok — that's fine, I'll wait and see if we really need it. [21:40:16] Thank you! [21:40:22] something along the lines of: get files in https://commons.wikimedia.org/wiki/Category:Uploaded_with_video2commons [21:40:36] query api for length/size [21:40:57] is video2commons == labs 'video' project? [21:41:08] yes [21:41:24] video2commons is the name of the app and the frontend [21:42:00] the video project hosts that, but mostly the transcoders, downloaders, redis, converters etc [21:42:19] nice [21:43:19] hope i am not drawing too many resources again [21:43:38] matanya: nope! 
big numbers are good for propaganda :) [21:43:53] ah, that yeah :) [22:02:15] RECOVERY - Puppet run on tools-redis-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [22:02:47] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:12] RECOVERY - Puppet run on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:40] RECOVERY - Puppet run on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0] [22:03:42] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [22:04:22] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0] [22:04:30] RECOVERY - Puppet run on tools-docker-registry-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:04:56] RECOVERY - Puppet run on tools-worker-1005 is OK: OK: Less than 1.00% above the threshold [0.0] [22:05:04] RECOVERY - Puppet run on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0] [22:05:33] RECOVERY - Puppet run on tools-static-10 is OK: OK: Less than 1.00% above the threshold [0.0] [22:05:37] RECOVERY - Puppet run on tools-docker-registry-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:05:37] RECOVERY - Puppet run on tools-webgrid-lighttpd-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [22:05:57] RECOVERY - Puppet run on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:06:07] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [22:06:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [22:06:39] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: 
Less than 1.00% above the threshold [0.0] [22:06:47] RECOVERY - Puppet run on tools-webgrid-lighttpd-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:10] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:10] RECOVERY - Puppet run on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:16] RECOVERY - Puppet run on tools-logs-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:16] RECOVERY - Puppet run on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [22:07:38] RECOVERY - Puppet run on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [22:08:23] RECOVERY - Puppet run on tools-worker-1004 is OK: OK: Less than 1.00% above the threshold [0.0] [22:08:29] RECOVERY - Puppet run on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [22:08:37] 06Labs, 10wikitech.wikimedia.org: Script to check blocks no longer works - https://phabricator.wikimedia.org/T152047#2836593 (10MarcoAurelio) Those users have been reblocked with hideuser by me yesterday because of their abusiveness as you are well aware of. I cannot say if there's a technical workaround for y... 
[22:08:40] RECOVERY - Puppet run on tools-worker-1022 is OK: OK: Less than 1.00% above the threshold [0.0] [22:08:54] RECOVERY - Puppet run on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:02] RECOVERY - Puppet run on tools-exec-1415 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:14] RECOVERY - Puppet run on tools-exec-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:18] RECOVERY - Puppet run on tools-worker-1015 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:20] RECOVERY - Puppet run on tools-worker-1019 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:20] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [22:09:24] RECOVERY - Puppet run on tools-docker-builder-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:10:02] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [22:10:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1413 is OK: OK: Less than 1.00% above the threshold [0.0] [22:10:29] RECOVERY - Puppet run on tools-webgrid-lighttpd-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [22:11:01] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [22:11:59] RECOVERY - Puppet run on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [22:11:59] RECOVERY - Puppet run on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [22:12:29] RECOVERY - Puppet run on tools-worker-1014 is OK: OK: Less than 1.00% above the threshold [0.0] [22:12:53] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0] [22:13:47] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:49] RECOVERY - Puppet run on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:09] PROBLEM - Puppet run on 
tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [22:26:59] PROBLEM - Puppet run on tools-worker-1008 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [22:27:00] 06Labs, 10wikitech.wikimedia.org: Script to check blocks no longer works - https://phabricator.wikimedia.org/T152047#2836627 (10scfc) 05Open>03declined The script runs logged out and (after becoming oversighter) rewriting that is too much work for me for a single vandal, so I'll limit myself to vandals I s... [22:27:49] 06Labs, 10wikitech.wikimedia.org: Script to check blocks no longer works - https://phabricator.wikimedia.org/T152047#2836631 (10scfc) [22:41:59] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0] [22:45:15] 10Tool-Labs-tools-Other: Some yifeibot tasks seem to hang indefinately - https://phabricator.wikimedia.org/T152054#2836683 (10bd808) [22:47:27] !log tools.yifeibot Deleted 2 jobs running on tools-exec-1210 for many hours/days (T151980) [22:47:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.yifeibot/SAL [22:47:32] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [22:48:55] 10Tool-Labs-tools-Other: Some yifeibot tasks seem to hang indefinately - https://phabricator.wikimedia.org/T152054#2836708 (10bd808) I killed the first two jobs so I could finish decommissioning the host they were running on. [22:54:36] !log tools Removed tools-exec-12[00-11] from @general hostgroup [22:54:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [23:01:19] 06Labs, 10DBA: Prepare and check storage layer for new fi.wikivoyage.org - https://phabricator.wikimedia.org/T151756#2836726 (10jcrespo) I've filtered the database on sanitarium (db1069) and checked it is filtered on labs. I suppose views are pending. 
[23:04:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [23:05:05] 06Labs, 10Labs-Infrastructure, 10DBA, 10Datasets-General-or-Unknown, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2836731 (10jcrespo) I wanted to sanitize this for T151756, I realized the database hasn't been... [23:06:48] !log tools Removed tools-exec-12[00-11] from gridengine (T151980) [23:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [23:06:52] T151980: Reduce Precise OGE exec hosts to 10 - https://phabricator.wikimedia.org/T151980 [23:08:58] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Quietmouse was created, changed by Quietmouse link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Quietmouse edit summary: Created page with "{{Tools Access Request |Justification=I am going to do a number of mesh networking projects here. I am setting up a mesh network at a K-12 school using a solution from open-me..."
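Editor's note on the 21:40 exchange: matanya's suggestion (get the files in Category:Uploaded_with_video2commons, then query the API for length/size) could look roughly like the sketch below. Nobody in the channel posted this; it is an illustration only. It assumes the standard MediaWiki action API on Commons, with `generator=categorymembers` and `prop=imageinfo&iiprop=size`, and it assumes that video files report a `duration` field (in seconds) alongside `size` (in bytes). The function names and the User-Agent string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"
CATEGORY = "Category:Uploaded_with_video2commons"


def fetch_json(params):
    """GET a single API response and decode it as JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; Wikimedia asks clients to send a descriptive one.
    req = urllib.request.Request(
        url, headers={"User-Agent": "video2commons-stats-sketch/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_file_info(fetch=fetch_json):
    """Yield one imageinfo dict per file in the category, following continuation."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": CATEGORY,
        "gcmtype": "file",
        "gcmlimit": "500",
        "prop": "imageinfo",
        "iiprop": "size",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            # Pages whose imageinfo arrives in a later batch have no key yet.
            for info in page.get("imageinfo", []):
                yield info
        if "continue" not in data:
            break
        # Merge the continuation tokens into the next request.
        params = {**params, **data["continue"]}


def summarize(infos):
    """Total file count, bytes, and hours of video (assumes 'duration' is seconds)."""
    count = 0
    total_bytes = 0
    total_seconds = 0.0
    for info in infos:
        count += 1
        total_bytes += info.get("size", 0)
        total_seconds += float(info.get("duration", 0))
    return {"files": count, "bytes": total_bytes, "hours": total_seconds / 3600}
```

Calling `summarize(iter_file_info())` would walk the whole category, which for several thousand files means a dozen or so requests. The `fetch` parameter exists so the aggregation can be tried against canned responses before touching the live API.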