[00:01:58] icinga.wmflabs.org is deleted? sigh
[00:02:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:28:17] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[01:03:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:43:13] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[01:54:29] Labs, Labs-Infrastructure, operations, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1699192 (Andrew) Ah, mdns is a new service which I haven't thought much about. Thanks for fixing in the short-term, I'll work on a better solutio...
[01:59:13] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[02:18:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:34:11] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:25:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[03:32:01] Labs: Labs project: popcorn - https://phabricator.wikimedia.org/T114514#1699236 (Revi)
[04:05:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:15:23] Tool-Labs-tools-Article-request: Click targets for checkboxes should include the labels - https://phabricator.wikimedia.org/T114496#1699283 (Matthewrbowker) Open>Resolved Done, see https://github.com/Matthewrbowker/articlerequest/commit/161f5a4e5a06de8a2825fde84d0475957eb6f3a6
[05:26:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[06:55:20] MediaWiki-extensions-OpenStackManager, Librarization, MediaWiki-extensions-Translate: Bring in spyc for OpenStackManager and Translate via composer - https://phabricator.wikimedia.org/T75945#1699297 (Nikerabbit) If you care or need about standard compliant parsing and generation. It's the uncommon sy...
[07:23:27] Labs, Tool-Labs, Adminbot, operations: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1699308 (Steinsplitter) means that mwclient is broken right now on labs?
[07:31:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:50:48] Labs, Tool-Labs, Adminbot, operations: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1699312 (zhuyifei1999) >>! In T114365#1699308, @Steinsplitter wrote: > means that mwclient is broken right now on labs? @Andrew fixed it
[08:27:14] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[09:02:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:28:18] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[09:39:32] Labs, Tool-Labs, Database: s51053 is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699378 (jcrespo) NEW a:jcrespo
[09:50:41] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699388 (valhallasw)
[09:53:04] Labs, Tool-Labs: provide easier way to contact people abusing resources - https://phabricator.wikimedia.org/T114560#1699391 (valhallasw) NEW
[10:15:28] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699408 (JackPotte) The site https://tools.wmflabs.org/jackbot/snottywong/ has a very high number of visitors, it had been developed by snottywong for the E...
[11:18:01] Labs, Tool-Labs: provide easier way to contact people abusing resources - https://phabricator.wikimedia.org/T114560#1699437 (valhallasw)
[12:03:59] abusing? or misusing? intent is important
[12:08:44] my running jobs all died about 8 minutes ago :-/
[12:18:02] sDrewth: which ones?
[12:18:24] which jobs?
[12:18:57] I had six djvutext.py running putting text onto enWS
[12:19:08] restarted now
[12:19:35] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699515 (jcrespo) Hi, @JackPotte, can you confirm you are s51053? High number of hits are not currently an issue. However, mainly in labsdb1001 (enwiki, c...
[12:25:00] sDrewth: which user, which jobs, how did you submit them, ...
[12:26:24] wikisource-bot, jobs submitted from command line hours ago that were trundling away
[12:27:16] example nohup python pwb.py djvutext.py -lang:en -family:wikisource -djvu:Encyclopedia_of_Virginia_Biography_volume_5.djvu -index:Encyclopedia_of_Virginia_Biography_volume_5.djvu -pt:30 -log -always >> Virg5.log 2>> Virg5.err &
[12:28:08] sDrewth: I count 12 jobs running, all on tools-login
[12:28:16] where they shouldn't be running in the first place
[12:28:34] so I'm going to guess the jobs didn't die, but your connection died, and because of the nohup they are still running
[12:28:49] ah, okay
[12:28:49] sDrewth: please run them on the grid.
[12:29:51] I thought that I would still be able to see them, even if nohup'd
[12:30:29] coet|cawiki: there are tools.cobain jobs running on tools-login again. Could you clear them out? Thanks.
[12:31:09] Negative24: you have some jobs running on tools-login as well (python main.py, 3 processes). Please move them to the grid.
[12:33:58] valhallasw`cloud: if submitted to the grid, do they still need to be nohup'd?
[12:34:24] sDrewth: no, I think nohup might even cause sge to lose the jobs (not sure)
[12:34:49] k, sooooo much to learn for nuff nuffs like me
[12:35:25] yeah, sge is not the most user-friendly system :/
[12:39:40] and if I want to submit a list of ten files to slowly churn through, I am not sure how to string them together in SGE, hence why I just go for plain and simple
[12:40:01] and I don't want to dump them all on at the one time
[12:40:42] sDrewth: the easiest is to create a shell script that has all of the steps in it, then submit that shell script to the grid
[12:41:21] then they will all be done in order
[12:41:44] you can probably also just submit them at the same time; pywikibot will make sure you don't edit too fast, even with multiple processes
[12:58:20] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699540 (Krenair) >>! In T114559#1699515, @jcrespo wrote: > Hi, @JackPotte, can you confirm you are s51053? You can check this stuff like so: ```krenair@t...
[13:00:22] or just run 'ps uwx' and look at the user column
[13:01:50] valhallasw`cloud: sorry, I'll look at it as soon as possible
[13:02:47] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699544 (jcrespo) @Krenair, yes, but many people do not use Phabricator, or IRC, or email, or the Wikis,... there is no single point of contact. Hence T1145...
[13:28:22] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699568 (JackPotte) So I propose to study the queue system with https://github.com/wikimedia/analytics-quarry-web during my next free time: in six months.
[13:37:53] coet|cawiki: thanks!
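A minimal sketch of the 12:40:42 suggestion (one shell script that contains all the steps and is submitted to the grid as a single job, so the files are processed in order). The script name run_djvu.sh, the helpers process_one and run_all, and the PWB override are hypothetical and only for illustration. The pwb.py options are copied from the 12:27:16 example; nohup, the trailing & and the log redirections are left out on the assumption that the grid keeps the job running and captures its output itself. The log does not name the submit command, so that step is not shown.

```shell
#!/bin/bash
# run_djvu.sh (hypothetical name): process several DjVu files one after another.
# Usage: ./run_djvu.sh FILE1.djvu FILE2.djvu ...

# Process one DjVu file with the same pwb.py options as the 12:27:16 example.
# PWB can be overridden (e.g. PWB=echo) to try the loop without editing anything.
process_one() {
    ${PWB:-python pwb.py} djvutext.py -lang:en -family:wikisource \
        "-djvu:$1" "-index:$1" -pt:30 -log -always
}

# Run the files strictly in order; a failure is reported on stderr
# but does not stop the remaining files.
run_all() {
    local f
    for f in "$@"; do
        process_one "$f" || echo "failed: $f" >&2
    done
}

# Only start work when file names were given on the command line.
if [ "$#" -gt 0 ]; then
    run_all "$@"
fi
```

Running it as `PWB=echo ./run_djvu.sh a.djvu b.djvu` prints the argument list that would be passed for each file instead of starting pywikibot, which is a cheap way to check the list before submitting the script to the grid.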
[14:33:38] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699676 (jcrespo) Ok, meanwhile, I will reduce the impact on db side (as I intended originally with this ticket), because we have received complaints of slow...
[14:43:47] valhallasw`cloud: hmm that's funny
[14:45:05] valhallasw`cloud: oh those are some zombie processes from some threads that didn't hang up. I just killed them all
[14:51:35] Negative24: OK, thanks :)
[15:03:17] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:40:06] Labs, Tool-Labs, Database: s51053 (tools.jackbot) is abusing resources on labsdbs, throttle his grants - https://phabricator.wikimedia.org/T114559#1699714 (jcrespo) Open>Resolved
[16:09:11] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:09:57] ssh only slow for me?
[16:27:46] Steinsplitter, check now
[16:28:01] if it is still slow, then it is definitely you :-)
[16:30:25] fast again ;o
[16:49:22] Labs, Tool-Labs, Database, Labs-Q4-Sprint-1, and 5 others: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1699768 (jcrespo) This is almost finished, but the following tables failed to be imported: ``` s51071__templatetiger_p/desort5wiki s51071__templatetiger_...
[16:59:34] Labs, Tool-Labs: provide easier way to contact people abusing resources - https://phabricator.wikimedia.org/T114560#1699782 (scfc) `getent group 51053` will give the member list of `tools.jackbot`, if that is helpful for this task.
[17:04:15] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:25:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[18:30:18] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:56:13] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]