[00:01:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:19:48] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2929491 (dschwen)
[00:58:10] win 9
[01:35:47] PROBLEM - Puppet run on tools-worker-1025 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[01:36:51] PROBLEM - Puppet run on tools-elastic-03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[01:37:39] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[02:11:52] RECOVERY - Puppet run on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:15:47] RECOVERY - Puppet run on tools-worker-1025 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:17:39] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:10:06] (CR) Legoktm: [C: 2] Adding more configuration for #brickimedia [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331204 (owner: SamanthaNguyen)
[03:10:29] (Merged) jenkins-bot: Adding more configuration for #brickimedia [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331204 (owner: SamanthaNguyen)
[03:10:38] (CR) jenkins-bot: Adding more configuration for #brickimedia [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/331204 (owner: SamanthaNguyen)
[06:49:10] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:29:11] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:15:22] akosiaris: For the mega missing comma patch, how would you suggest it get split up?
[13:16:48] friendly12345: per puppet module I'd say
[13:17:06] 1 patch per puppet module sounds fine
[13:20:03] You do realise the amount of patches that would add up to?
And it's still going to be a large number of changes in modules/roles.
[13:21:16] friendly12345: yeah, in general it's easier to merge a large number of small patches and not a small number of large patches
[13:21:23] regardless of type of change
[13:21:58] So 60+ patches
[13:24:26] friendly12345: yeah it would probably be around that number
[13:24:52] it's fine if you decide to do 2-3 modules in one go
[13:25:06] assuming the changes per module are minimal
[13:25:21] the point is, keep the change small, otherwise it's unreviewable
[13:32:10] Tool-Labs-tools-Pageviews: "Ranked X of the most-viewed pages for ..." states wrong rank - https://phabricator.wikimedia.org/T154986#2930461 (Tbayer)
[15:18:38] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2930634 (Aklapper)
[15:18:45] Labs, Tool-Labs, Community-Tech-Tool-Labs, User-bd808: Facilitate Volunteer NDA application process for potential Tool Labs standards committee appointees - https://phabricator.wikimedia.org/T154625#2918138 (Aklapper)
[15:31:59] PROBLEM - Puppet run on tools-worker-1003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:43:06] Labs, wikitech.wikimedia.org, Google-Code-In-2016, Patch-For-Review, Technical-Debt: Identify/Cleanup ContentHandler deprecated calls (and hook subscribers) in Wikitech specific extensions branches - https://phabricator.wikimedia.org/T147924#2930683 (FilipGCI) a:FilipGCI
[15:45:22] Tool-Labs-tools-Other: Authentication Error on Using CommonsHelper - https://phabricator.wikimedia.org/T143221#2561066 (BU_Rob13) This has happened to me several times as well recently. At my home computer, I literally cannot use CommonsHelper anymore, and have been unable to for months. Oddly, when I swappe...
[16:00:53] !log wikilabels deploying f01a39e into staging (T154897)
[16:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[16:00:57] T154897: Chinese translations are not being loaded - https://phabricator.wikimedia.org/T154897
[16:02:23] confirming it's working, going to prod
[16:03:21] !log wikilabels deploying f01a39e into prod (T154897)
[16:03:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL
[17:06:50] Labs, wikitech.wikimedia.org, Google-Code-In-2016, Patch-For-Review, Technical-Debt: Identify/Cleanup ContentHandler deprecated calls (and hook subscribers) in Wikitech specific extensions branches - https://phabricator.wikimedia.org/T147924#2931060 (FilipGCI) For now, checked "SemanticForms"...
[17:23:11] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Methaspirin was created, changed by Methaspirin link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Methaspirin edit summary: Created page with "{{Tools Access Request |Justification=Updating and creating Wikidata articles to reflect up-to-date PubChem content. |Completed=false |User Name=Methaspirin }}"
[17:58:25] Tool-Labs-tools-Pageviews: Build custom topviews API - https://phabricator.wikimedia.org/T155018#2931200 (MusikAnimal)
[17:59:04] Tool-Labs-tools-Pageviews: "Ranked X of the most-viewed pages for ..." states wrong rank - https://phabricator.wikimedia.org/T154986#2930461 (MusikAnimal) Yup, that'd be correct. Thanks for reporting! I think I'm going to first build a custom Topviews API that acts like the normal /top RESTBase endpoint, exc...
[18:02:50] Labs, wikitech.wikimedia.org, WMF-deploy-2017-01-03_(1.29.0-wmf.7): Wikitech blank page and no logs with mediawiki 1.29.0-wmf.7 - https://phabricator.wikimedia.org/T154618#2931223 (Reedy) FYI, I have created a 3.7 branch...
``` reedy@ko-kra:~/SemanticForms$ git checkout 3.7 Note: checking out '3.7'....
[18:08:55] (CR) Andrew Bogott: [C: 2] Add a password strength meter [labs/striker] - https://gerrit.wikimedia.org/r/329018 (https://phabricator.wikimedia.org/T153935) (owner: BryanDavis)
[18:12:12] (CR) Andrew Bogott: [C: 2] Allow changing LDAP password [labs/striker] - https://gerrit.wikimedia.org/r/328622 (https://phabricator.wikimedia.org/T153935) (owner: BryanDavis)
[18:14:34] (CR) Andrew Bogott: [C: 2] Cleanup a few things with account registration form [labs/striker] - https://gerrit.wikimedia.org/r/328620 (owner: BryanDavis)
[18:30:08] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2931330 (dschwen)
[18:30:44] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2565438 (dschwen) I have upgraded my remaining instances.
[18:34:50] Tool-Labs-tools-Database-Queries: max_user_connections is (too) low - running lots of simple queries - https://phabricator.wikimedia.org/T155025#2931359 (dschwen)
[18:37:39] andrewbogott: stewardbot is failing due to high load on the grid
[18:37:55] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2931404 (Andrew)
[18:38:13] matanya: is that running on precise by chance?
[18:38:31] trusty andrewbogott
[18:38:38] andrewbogott: https://dpaste.de/AakZ
[18:38:40] ok, well, so much for that theory
[18:49:07] matanya: I'm not ignoring you but we're very shorthanded today so it might be a bit
[18:50:27] Labs, wikitech.wikimedia.org, Google-Code-In-2016, Patch-For-Review, Technical-Debt: Identify/Cleanup ContentHandler deprecated calls (and hook subscribers) in Wikitech specific extensions branches - https://phabricator.wikimedia.org/T147924#2931480 (FilipGCI) Ok!
So: **"SemanticForms"** - pa...
[18:54:30] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Methaspirin was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=1296424 edit summary:
[19:00:01] matanya: things look mostly ok - do you see that it's not doing what it's meant to do, or just those failure messages in qstat?
[19:00:05] And, if you resubmit does it help?
[19:00:13] (Things don't look especially overloaded.)
[19:02:22] matanya: also, what is the actual tool name for stewardbot?
[19:04:58] andrewbogott: i didn't work, can we meet in person for a few minutes ?
[19:05:12] *it
[19:05:17] oh, are you at the dev summit?
[19:05:24] !log tools Killed 3 jobs from tools.arnaub that were causing high load on tools-exec-1411
[19:05:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:05:31] (Madhu just killed some jobs, so maybe try once more)
[19:05:38] andrewbogott: i was with you in the elevator this morning :)
[19:05:49] I was not at all awake then :(
[19:05:53] But I remember, in retrospect!
[19:06:31] Still not working? (I'm happy to come find you but also hoping that I can listen to halfak's talk)
[19:07:06] seems like not working
[19:07:31] matanya: what happens when you submit a job?
[19:08:08] it says : Your job 755956 ("stewardbot") has been submitted
[19:08:36] but when i do qstat on it, it shows what i pasted above
[19:09:01] https://www.irccloud.com/pastebin/vWUsvaMH/
[19:09:25] yes, that
[19:11:33] matanya: we're on the steps right by the front door
[19:12:30] matanya: what's the name of your tool?
[19:12:31] I am in joe's session
[19:12:35] ah, ok
[19:12:37] stewardbot
[19:12:49] # become stewardbot
[19:12:49] become: no such tool 'stewardbot'
[19:13:04] stewardsbots
[19:13:08] ah!
[19:13:10] *stewardbots
[19:13:17] sorry, typos...
[19:13:22] matanya: that paste is just 4 out of 20 nodes rejecting jobs.
Haven't looked further, but that's not itself too scary
[19:13:25] Labs, wikitech.wikimedia.org, Google-Code-In-2016, Patch-For-Review, Technical-Debt: Identify/Cleanup ContentHandler deprecated calls (and hook subscribers) in Wikitech specific extensions branches - https://phabricator.wikimedia.org/T147924#2931580 (FilipGCI) Open>Resolved Ok, patch...
[19:14:05] i see errors in the logs : File "/usr/lib/python2.7/dist-packages/irclib.py", line 785, in send_raw
[19:14:05] raise ServerNotConnectedError, "Not connected."
[19:14:05] ServerNotConnectedError: Not connected.
[19:14:12] yeah, I think this is a red herring, unless you actually see that the bot isn't doing its job
[19:14:22] but I think that overloading is not the problem
[19:14:29] that's likely freenode
[19:14:41] ah, that makes sense
[19:14:58] Do we still have some exec nodes without public IPs?
[19:16:08] * andrewbogott checks
[19:17:13] I released a few more precise ips last week so if we have exec nodes that haven't had them added yet we should have some to spare
[19:17:15] i fixed it. bd808 guess what, the conf had a server name hardcoded, rather than service name ...
[19:17:32] :) life is a circle
[19:18:01] thanks for the help andrewbogott and bd808
[19:18:48] (matanya and I were talking in meat space about hard coded ips/hosts vs service names this morning)
[19:33:37] Labs, Labs-Infrastructure, Tool-Labs, DBA, Wikimedia-Developer-Summit (2017): Labsdbs for WMF tools and contributors: get more data, faster - https://phabricator.wikimedia.org/T149624#2931640 (jcrespo) Slides at: https://commons.wikimedia.org/wiki/File:Labsdbs-_get_more_data,_faster.pdf
[19:38:48] Labs, DBA: Prepare and check storage layer for wikimania2018wiki - https://phabricator.wikimedia.org/T155041#2931704 (Krenair)
[19:43:47] Tool-Labs-tools-Database-Queries: max_user_connections is (too) low - running lots of simple queries - https://phabricator.wikimedia.org/T155025#2931725 (dschwen)
[20:02:47] Labs, DBA, Wikidata, Performance, and 3 others: Create a new project in labs for testing RedisLock in Wikidata - https://phabricator.wikimedia.org/T155042#2931757 (Ladsgroup)
[20:05:16] PROBLEM - Puppet staleness on tools-worker-1003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[20:50:34] PAWS, MediaWiki-extensions-OAuth, Pywikibot-OAuth, Pywikibot-core: PAWS can not login - https://phabricator.wikimedia.org/T136114#2931882 (Dvorapa)
[20:50:52] PAWS, MediaWiki-extensions-OAuth, Pywikibot-OAuth: PAWS can not login - https://phabricator.wikimedia.org/T136114#2931884 (Dvorapa)
[21:13:12] Labs, DBA, Wikidata, Performance, and 3 others: Create a new project in labs for testing RedisLock in Wikidata - https://phabricator.wikimedia.org/T155042#2931952 (chasemp) would it make sense instead to bump up the quota for wikidata-dev to accommodate? Could be temp or permanent as well dependi...
[21:26:22] Labs, Labs-Infrastructure: Deprecate precise instances in Labs by 03/31/2017 - https://phabricator.wikimedia.org/T143349#2931982 (dschwen) @Andrew I upgraded the other two instances to Xenial. Given that the upgrade was rather painless (so far... I hope all the puppet stuff is still working as intended....
[21:28:34] Yoo hoo. Any ops in here? I'd love some input on https://phabricator.wikimedia.org/T155025 (me hitting max_user_connections frequently)
[21:29:00] Obviously the answer I can already give myself is "Fix yo s%^t!" :-D
[21:29:26] But there might be mitigating circumstances.
[21:30:11] The tool connects to many databases (one for each language) and does a bunch of really simple (i.e. inexpensive, I think) queries in bursts.
[21:30:43] Persistent connections don't help me (connection limit is 5, but I have connections to about 20-30 different dbs)
[21:30:46] you shouldn't need to do separate connections for each db on the same server
[21:31:54] I see... but... the server hostname is different each time
[21:32:15] I wonder if that makes php mysqli think those are different servers
[21:33:16] MW has some ways around this in terms of "reusing connections" and such
[21:33:46] hm, and I think persistent connections are database specific anyways
[21:33:56] MW does not connect to many different DBs
[21:34:06] not usually
[21:34:09] but in maintenance scripts it can
[21:34:21] and the db connections are slooow
[21:34:24] but those are not run in parallel
[21:34:45] I have different users querying different language data more or less at the same time
[21:34:59] my persistent connections don't live very long at all
[21:39:24] I think, if it's the case you need more connections because it's a high use tool... You can ask
[21:40:40] yeah, thanks
[21:41:05] I'm already using php apc (apcu) to do in memory caching of the hot data
[21:41:09] helps a lot
[22:23:13] (CR) Lokal Profil: [C: 1] "Looks good. Only two questions about line lengths."
(2 comments) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/331407 (owner: Jean-Frédéric)
[22:31:16] Labs, Tool-Labs: Create NodeJS container for Tool Labs - https://phabricator.wikimedia.org/T155063#2932218 (Tarrow)
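
Note on the 21:30 to 21:34 exchange about max_user_connections: the suggestion there was that separate connections per database should not be needed when the databases live on the same server, even if each one is reached through a different hostname. A minimal sketch of that idea follows, in Python rather than the PHP the tool actually uses. The class name, the helper, and the hostnames and database names in the usage note are all hypothetical; the resolver and the connect function are passed in by the caller, so nothing here assumes a particular driver or how the real database hosts are laid out.

```python
from typing import Callable, Dict, Generic, TypeVar

Conn = TypeVar("Conn")


class ServerConnectionCache(Generic[Conn]):
    """Share one connection per physical server instead of one per database.

    Hypothetical helper: `resolve` maps a hostname to a server address and
    `connect` opens a connection to that address. Both are supplied by the
    caller, so no specific database driver is assumed.
    """

    def __init__(
        self,
        resolve: Callable[[str], str],
        connect: Callable[[str], Conn],
    ) -> None:
        self._resolve = resolve
        self._connect = connect
        self._by_address: Dict[str, Conn] = {}

    def get(self, hostname: str) -> Conn:
        # Different hostnames may point at the same server, so key the
        # cache on the resolved address rather than on the hostname.
        address = self._resolve(hostname)
        if address not in self._by_address:
            self._by_address[address] = self._connect(address)
        return self._by_address[address]


def qualified_table(database: str, table: str) -> str:
    """Build a database-qualified table name, so one shared connection can
    query several databases without switching its default database."""
    return f"`{database}`.`{table}`"
```

In real use, `resolve` could be `socket.gethostbyname` and `connect` whatever function the chosen MySQL driver provides. Queries would then name tables as, for example, `qualified_table("dewiki_p", "page")`, where the database name is only an illustration. Whether this helps depends on how many distinct servers sit behind the hostnames, which the log does not say.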