[00:01:54] PROBLEM - Puppet run on tools-exec-1441 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [00:09:05] 06Labs, 06Developer-Relations (Apr-Jun 2017), 03Google-Summer-of-Code (2017), 10Outreachy (Round-14): Set up a Zulip instance on tool Labs - https://phabricator.wikimedia.org/T163169#3192436 (10srishakatux) @Aklapper Yeah! Realizing that shouldn't have been done, removing the tag now. [00:13:38] 06Labs, 10Tool-Labs: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3192457 (10kaldari) [00:29:06] RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [00:36:50] RECOVERY - Puppet run on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [00:52:07] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other, 15User-bd808: bigbrother not trying to start missing iabot job - https://phabricator.wikimedia.org/T163265#3192543 (10bd808) p:05Triage>03High a:03bd808 [01:21:34] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [01:25:06] PROBLEM - Puppet run on tools-exec-1439 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [01:32:53] PROBLEM - Puppet run on tools-exec-1441 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:56:33] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [02:00:05] RECOVERY - Puppet run on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0] [02:07:53] RECOVERY - Puppet run on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:25] 06Labs, 10Labs-Infrastructure: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3192725 (10Krinkle) [02:20:41] 06Labs, 10Labs-Infrastructure: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3192731 (10Krinkle) 
https://wmflabs.org now redirects to which is a good starting point. For Tool Labs, https://tools.wmfla... [02:21:55] 06Labs, 10Labs-Infrastructure, 10Tool-Labs-tools-Other, 15User-bd808: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3192732 (10Krinkle) [02:23:17] PROBLEM - Puppet run on tools-exec-1433 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [02:40:25] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3192768 (10Andrew) I haven't tracked down the relationship, but I suspect this issue is a symptom of token overload, addressed in T163259. That bug should be fixed (albeit poorly) -- does th... [02:40:42] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Shorter token life for novaobserver/novaadmin - https://phabricator.wikimedia.org/T163259#3191526 (10Andrew) [02:40:44] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3192770 (10Andrew) [02:49:44] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Shorter token life for novaobserver/novaadmin - https://phabricator.wikimedia.org/T163259#3192772 (10Andrew) There were about 550,000 tokens found by the queries in those two added crons: novaobserver and novaadmin tokens too young to expire but more than .... [02:58:17] RECOVERY - Puppet run on tools-exec-1433 is OK: OK: Less than 1.00% above the threshold [0.0] [03:26:16] 06Labs, 10Tool-Labs: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3192825 (10Anomie) Third run, no deviation outside the norm in [[https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&panelId=14&fullscreen&orgId=1&var-server=labsdb... 
[03:31:29] 06Labs, 10Labs-Infrastructure, 10Tool-Labs-tools-Other, 15User-bd808: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3192834 (10bd808) >>! In T45580#3192731, @Krinkle wrote: > One thing we could do is add a view to PROBLEM - Puppet run on tools-exec-1436 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [04:10:58] RECOVERY - Puppet run on tools-exec-1436 is OK: OK: Less than 1.00% above the threshold [0.0] [04:27:04] 10Tool-Labs-tools-Xtools, 03Community-Tech-Sprint: Have Edit Counter use same architecture and front-end as the other pieces that have been re-written - https://phabricator.wikimedia.org/T160481#3192865 (10Samwilson) https://github.com/x-tools/xtools-rebirth/pull/15 is pretty much ready for review. There are s... [05:03:52] PROBLEM - Puppet run on tools-exec-1441 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [05:13:29] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [05:22:47] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3054591 (10Legoktm) >>! In T159021#3132565, @bd808 wrote: > The logged exception was: > ``` > UnexpectedValueException from line 196 of /srv/mediawiki/php-1.29.0-wmf.13/includes/user/UserGrou... [05:23:47] 06Labs, 10Labs-Infrastructure, 06Community-Tech-Tool-Labs: invisible-unicorn (dynamicproxy) should provide an easy way to see where a host routes without knowing the project - https://phabricator.wikimedia.org/T115752#3192912 (10bd808) Putting the host routes for each project into https://tools.wmflabs.org/o... 
[05:38:53] RECOVERY - Puppet run on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [05:48:32] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [06:20:06] PROBLEM - Puppet run on tools-exec-1432 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [06:29:52] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3054591 (10TTO) Do we have a stack trace for this? The only possibly relevant call to User::addGroup() I could find is https://phabricator.wikimedia.org/diffusion/EOST/browse/master/special/S... [06:47:41] PROBLEM - Puppet run on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [06:55:08] RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [07:22:41] RECOVERY - Puppet run on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:52:26] 10Tool-Labs-tools-Attribution-Generator, 06TCB-Team: English Quotation Marks - https://phabricator.wikimedia.org/T163309#3193058 (10Katja_Ullrich_WMDE) [08:05:08] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3193077 (10Legoktm) ``` 2017-02-25 00:50:02 [63aa22b668ebe4dde03fc343] silver labswiki 1.29.0-wmf.13 exception ERROR: [63aa22b668ebe4dde03fc343] /wiki/Special:NovaProject UnexpectedValueExc... [08:24:27] Hi! What do I need to get r/o-access to the wikidata database? Currently I can access dewiki_p on dewiki.labsdb and dewikisource_p on dewikisource.labsdb (and a private one on tools-db) [08:26:49] Wurgl do you want to change wikidata, or just having a place to write "near" wikidata? 
[08:27:58] ah, sorry, I missread r/o as r/w [08:28:32] wikidata is on wikidatawiki_p / wikidatawiki.labsdb [08:28:56] if you need to find a database, there is a meta_p database with a list of all project names [08:29:12] r/o <-- read only [08:29:44] wikidata is on wikidatawiki_p / wikidatawiki.labsdb [08:30:41] Thanks! Works [08:33:20] the list of complete dbnames is at: https://tools.wmflabs.org/replag/ [08:33:32] as taken from the meta database [08:36:13] I just need that magic Q for some articles. Maybe 10 accesses per day [08:37:53] you do not need to ask for permission- it is there for you to use it :-) [08:38:51] I am new to this bot things [08:49:04] PROBLEM - Puppet run on tools-exec-1435 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [08:59:53] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3193126 (10TTO) Is it possible that for some reason, the user to which OpenStackManager is trying to add the group doesn't actually exist? Although User::addGroup could do a better job of han... [09:24:02] RECOVERY - Puppet run on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0] [09:30:02] jynus: Is there somewhere a schema for the wikidata database, or do I have to type a lot of statements like "show create table ...' [09:30:47] Wurgl, 2 options [09:30:54] the documented schema is [09:31:21] https://phabricator.wikimedia.org/source/mediawiki/browse/master/maintenance/tables.sql [09:31:57] but sometimes small differences are behind or ahead of that, plus extension create its own tables [09:32:07] Something similar to this one? https://upload.wikimedia.org/wikipedia/commons/9/94/MediaWiki_1.28.0_database_schema.svg [09:33:02] then you have the information_schema_p list [09:35:13] the question is: I have a page (with a page_id) in dewiki_p - how to find that magic Q29469999? [09:45:08] SELECT eu_entity_id FROM wbc_entity_usage WHERE eu_page_id=?
[09:50:37] select * from wbc_entity_usage where eu_entity_id = 'Q29469999'; --> Empty set (0.03 sec) :-( [09:52:15] Now problem, I will find out [09:52:19] -w [09:52:35] [dewiki]> SELECT * FROM wbc_entity_usage WHERE eu_entity_id = 'Q29469999'; [09:52:47] see the current db^ [09:53:53] aha [09:53:56] Fine [10:14:30] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [10:29:52] PROBLEM - Puppet run on tools-exec-1441 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [10:49:30] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [11:09:52] RECOVERY - Puppet run on tools-exec-1441 is OK: OK: Less than 1.00% above the threshold [0.0] [11:21:05] PROBLEM - Puppet run on tools-exec-1432 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:56:05] RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [12:03:06] 06Labs, 10Tool-Labs: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3193373 (10Marostegui) There was no issues in the 8:51 run either: https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&panelId=14&fullscreen&orgId=1&var-server=labs...
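(Editor's sketch, not part of the log.) The 09:35-09:53 exchange comes down to running the `wbc_entity_usage` lookup while connected to the client wiki's database (dewiki in the log) rather than the Wikidata one, where the same query returned an empty set. The snippet below is only a minimal illustration of that lookup: it uses an in-memory SQLite table as a stand-in for the replica, and the function names `entity_ids_for_page` and `make_demo_db`, the sample page IDs, and the second entity ID are assumptions made for the example. Only the table name, the two column names, and `Q29469999` come from the log.

```python
import sqlite3


def entity_ids_for_page(conn, page_id):
    """Return the distinct entity IDs recorded for one page, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT eu_entity_id FROM wbc_entity_usage "
        "WHERE eu_page_id = ? ORDER BY eu_entity_id",
        (page_id,),
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_db():
    """Build a tiny stand-in table (hypothetical data, SQLite instead of the replica)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wbc_entity_usage (eu_entity_id TEXT, eu_page_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO wbc_entity_usage VALUES (?, ?)",
        [("Q29469999", 12345), ("Q29469999", 12345), ("Q42", 67890)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(entity_ids_for_page(demo, 12345))
```

The function returns a list because a page can have more than one row in this table; on the real replicas the connection would point at the wiki that hosts the page, as the `[dewiki]>` prompt in the log indicates.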
[12:56:06] PROBLEM - Puppet run on tools-exec-1439 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [13:16:07] PROBLEM - Free space - all mounts on tools-proxy-01 is CRITICAL: CRITICAL: tools.tools-proxy-01.diskspace._public_dumps.byte_percentfree (No valid datapoints found)tools.tools-proxy-01.diskspace.root.byte_percentfree (<55.56%) [13:20:07] !log tools clean up disk space on tools-proxy-01 [13:20:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:23:36] !log tools stop docker on tools-proxy-01 [13:23:39] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [13:31:07] RECOVERY - Puppet run on tools-exec-1439 is OK: OK: Less than 1.00% above the threshold [0.0] [13:48:51] chasemp, tools issues expected? [13:49:08] I see yes [13:49:15] jynus: we know something up there and are working to get it figured out [13:49:23] TTL exceeded ? [13:49:32] (04:47:07 μμ) icinga-wm: PROBLEM - Host tools.wmflabs.org is DOWN: CRITICAL - Time to live exceeded (tools.wmflabs.org) [13:49:36] routing problems ? [13:49:36] that message is a mystery to me akosiaris [13:49:51] ok looking [13:50:29] yeah routing for sure [13:50:39] I can see labnet1001 sending back to cr2-eqiad [13:51:10] akosiaris: my guess is andrew has decoupled the floating IP and now it's a routing issue [13:51:13] now that I thinka bout it [13:51:28] he is trying to get that IP attached to the secondary and it seems to be not going well [13:51:43] we have the switchover in 10 minutes [13:51:55] can we please revert whatever it was ? [13:52:09] it wasn't a planned maint so we can't, it's down either way unf [13:52:14] oh [13:52:20] damn [13:52:24] working on it [13:52:34] ok, shout if you need help, I am around [13:52:47] akosiaris: afa we know it's unrelated but things seem weird so I'm not entirely sure [13:52:48] well, ping not shout .. but anyway [13:53:59] chasemp: whatever you did just fixed it! 
[13:54:05] the loop is gone [13:54:07] nice! [14:01:07] RECOVERY - Free space - all mounts on tools-proxy-01 is OK: OK: tools.tools-proxy-01.diskspace._public_dumps.byte_percentfree (No valid datapoints found) [14:14:19] Hello, Amitie_10g here [14:14:37] Amitie_10g: how can we assist you? [14:15:44] I'm running a bot in PHP, so, is possible to install an specific PECL module (Judy) to the Grid, in order to optimise the memory consumption for arrays? [14:16:57] It may be a bit before someone is able to do that (we're doing a datacentre switchover today) [14:17:27] Ahh yeah [14:17:43] I tried from the Tool account but unsuccess. [14:18:10] I'll wait for the maintenance and I'll return soon [14:18:33] The same question but in another way [14:19:20] Is possible for normal Tool accounts to install PEAR/PECL modules? If not, where an operator can request that? Here, at Phabricator? [14:19:26] PROBLEM - Puppet run on tools-exec-1437 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [14:20:39] Amitie_10g: I believe its done through phabricator, but im not 100% sure let me look. 
[14:30:31] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Angela was created, changed by Angela link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Angela edit summary: Created page with "{{Tools Access Request |Justification=Final project of the University |Completed=false |User Name=Angela }}" [15:01:23] 10Tool-Labs-tools-Xtools, 03Community-Tech-Sprint: Fix caching problems with XTools - https://phabricator.wikimedia.org/T162753#3193881 (10MusikAnimal) p:05Normal>03High [15:04:27] RECOVERY - Puppet run on tools-exec-1437 is OK: OK: Less than 1.00% above the threshold [0.0] [15:05:41] 06Labs, 06Operations: kube-proxy pulls in docker and starts service even when it isnt needed - https://phabricator.wikimedia.org/T163336#3193909 (10chasemp) [15:06:26] 06Labs, 06Operations: kube-proxy pulls in docker and starts service even when it isnt needed - https://phabricator.wikimedia.org/T163336#3193924 (10chasemp) p:05Triage>03Normal [15:07:40] 10Tool-Labs-tools-Xtools, 03Community-Tech-Sprint: Fix caching problems with XTools - https://phabricator.wikimedia.org/T162753#3193941 (10MusikAnimal) This is blocking testing of T160481, T162752, T162754, and anything else we do moving forward. @Samwilson I looked into this for an hour or two last, but got... [15:10:11] !log tools apt-get install psmisc on tools-proxy-0[12] [15:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [15:23:35] PROBLEM - Puppet run on tools-exec-1442 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [15:23:44] andrewbogott: there is an outstanding "1 nodepool alien(s) present" alert for 1d2h [15:24:11] paravoid: yeah, that's a false alert of some sort, I can't find the instance it's worried about. [15:24:15] hashar, any idea what that's about? 
[15:39:44] 06Labs, 10Tool-Labs: Judy PECL module not available in Tool labs, need to install it - https://phabricator.wikimedia.org/T163340#3194094 (10Amitie_10g) [15:40:22] I just opened that request at Phabricator. [15:47:57] (03CR) 10Lokal Profil: [C: 032] Track number of tracked images (on top of found images) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/338009 (owner: 10Jean-Frédéric) [15:49:02] (03Merged) 10jenkins-bot: Track number of tracked images (on top of found images) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/338009 (owner: 10Jean-Frédéric) [15:49:07] 10Tool-Labs-tools-Xtools, 03Community-Tech-Sprint: XTools: Clean up "Pages created" tool - https://phabricator.wikimedia.org/T162752#3194138 (10MusikAnimal) [15:50:04] (03CR) 10jenkins-bot: Track number of tracked images (on top of found images) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/338009 (owner: 10Jean-Frédéric) [15:58:36] RECOVERY - Puppet run on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [15:59:37] (03CR) 10Lokal Profil: "I would definitly als owant to stor the per-country settings in the json file. The main issue with that approach is that I'm not sure how " (031 comment) [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/342198 (owner: 10Lokal Profil) [16:04:15] !log test [16:04:53] !log test [16:06:13] chasemp andrewbogott ^^ spammer [16:07:54] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Revent was created, changed by Revent link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Revent edit summary: Created page with "{{Tools Access Request |Justification=Invited by zhuyfei1999 to have access to the live logs of Embedded Data Bot, a Commons adminbot, for better monitoring and stepping on Wi..." [16:09:02] thanks paladox [16:09:13] Your welcome :) [16:10:12] bd808: mind approving Revent's tools access request? [16:11:28] zhuyifei1999_: pushing buttons... 
[16:11:33] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Revent was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=1756894 edit summary: [16:12:18] thanks :) [16:14:02] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Tonitrus was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=1756897 edit summary: [16:17:48] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Codeofdusk was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=1756899 edit summary: [16:22:05] PROBLEM - Puppet run on tools-exec-1432 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [16:23:16] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Shorter token life for novaobserver/novaadmin - https://phabricator.wikimedia.org/T163259#3194271 (10Andrew) Currently: root@MISC m5[keystone]> SELECT COUNT(*) FROM token; +----------+ | COUNT(*) | +----------+ | 49163 | +----------+ I'll check again in... [16:31:20] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Srishakatux was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=1756903 edit summary: [16:35:07] andrewbogott: paravoid : "1 nodepool alien(s) present" that is a left over instance . I have deleted it. I will look at making the alarm more spammy so we catch it eariler [16:35:21] * hashar vanishes for dinner [16:35:28] hashar: I could tell which one it was, everything was in 'active' state [16:35:33] um… couldn't tell [16:35:54] will fill a task about it later tonight and cc you :} I gotta escape! 
[16:40:07] PROBLEM - Puppet run on tools-exec-1438 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [16:40:55] 06Labs, 10Tool-Labs-tools-Other: wsexport tool writing output to $HOME/tool/temp puts load on Tool Labs NFS server - https://phabricator.wikimedia.org/T163208#3194367 (10chasemp) chasemp_freenode_#wikimedia-labs_20170418.log 1 tools-exec-1430 1 tools-exec-1437 1 tools-exec-1439 1 tools-... [16:46:54] 06Labs, 10Tool-Labs-tools-Other: wsexport tool writing output to $HOME/tool/temp puts load on Tool Labs NFS server - https://phabricator.wikimedia.org/T163208#3194397 (10bd808) @chasemp has suggested that we try an experiment to isolate the impact of these jobs to a single exec node on the grid. This would inv... [16:49:06] bd808: symlink $HOME/tool/temp to /tmp? [16:50:25] valhallasw`cloud: I think the files have to either get back to $HOME at some point or that we have to pin the builder jobs to the same exec node as the webservice [16:50:41] ah [16:51:13] I think what happens in this tool is that a web request comes in, it spawns a background job, and then polls for the output file to return to the client [16:52:30] right, that makes sense. I interpreted /tool/temp to be 'a temp dir that is used until a file is copied back to /tool/output/...' but that's not right then [16:56:45] 06Labs, 10Tool-Labs: Judy PECL module not available in Tool labs, need to install it - https://phabricator.wikimedia.org/T163340#3194094 (10valhallasw) What is the error message you receive when you try to install the PECL module? * There is no php5-judy package available, so we cannot make it available in ph... [16:57:06] RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [17:00:01] Hi all, I've run into a problem with my tool commons-mass-description at toollabs. I can't start kubernetes webservice (it says I'm using gridengine one) but when I try to stop the gridengine it says it isn't running. How may I fix it? 
[17:00:38] Urbanecm: ugh. I think I've seen webservice get confused like this before [17:00:53] let me log into the tool account and take a look around [17:01:14] What do you mean by "let me"? [17:02:06] bd808: ^ [17:02:15] "give me a mintue" [17:02:21] I'm looking now [17:03:13] so qstat is empty and webservice status says its not running. What happens when you run `webservice --backend=kubernetes python start`? [17:04:13] Looks like you already have another webservice running, with a gridengine backend [17:04:14] You should stop that webservice by issuing: [17:04:14] webservice --backend=gridengine stop [17:04:14] And then start it again with backend kubernetes by issuing: [17:04:15] webservice --backend=kubernetes start [17:04:19] This is my output [17:06:11] hmmm.. I bet that is because of the state of service.manifest. Try `webservice stop` to clear that state [17:06:28] I know it's not running but I think there is bookkeeping that got missed here [17:07:02] I've run webservice stop and then webservice --backend=kubernetes python start but I get the same output [17:07:56] qstat shows it running on the grid again [17:08:08] I have empty output [17:09:36] Urbanecm: uwsgi.log shows errors. "ImportError: No module named site" [17:09:45] I think its in a restart loop because of that [17:11:11] yeah. it keeps dying as soon as it starts [17:11:59] is your venv build to python3 on kubernetes? 
[17:13:08] I think webservice has gone crazy here :/ [17:14:07] Urbanecm: I'm going to delete the service.manifest file and see if that makes webservice stop trying to start the grid job [17:14:54] i'm tailing service.log now to see if it tries to start again [17:15:08] RECOVERY - Puppet run on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [17:15:28] !log tools.commons-mass-description rm service.manifest to try and stop restart loop [17:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.commons-mass-description/SAL [17:16:57] !log tools.commons-mass-description webservice --backend=kubernetes python start [17:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.commons-mass-description/SAL [17:17:06] Urbanecm: I think it is working now [17:18:03] I'm not sure why, but webservice's reconciliation loop got stuck. webservice stop should have cleared the service.manifest file but it was not doing so [17:18:20] This is probably a bug in webservice [17:31:00] 06Labs, 10Tool-Labs: webservice stop says service not running but service.manifest not cleared - https://phabricator.wikimedia.org/T163355#3194566 (10bd808) [17:31:17] Urbanecm: long winded task created to look into why the manifest was not cleared ^ [17:32:11] 06Labs: IO issues for Tools instances flapping with iowait and puppet failure - https://phabricator.wikimedia.org/T161898#3146683 (10Phe) Are you sure you used the IO amount to get your report? I did a fix in phetools, but I didn't get why I was mentioned in this report, it's unclear if it was worth or not. [17:39:11] 06Labs, 10Tool-Labs: Judy PECL module not available in Tool labs, need to install it - https://phabricator.wikimedia.org/T163340#3194664 (10Amitie_10g) Yes, there is not php5-judy, and it is only available from PECL. When I trying to install judy from PECL (in bastion), the following error appears: ``` $ pecl... 
[17:39:59] Meanwhile the developers answer me at Phabricator, could anyone assist for some of my request? [17:40:05] 06Labs: IO issues for Tools instances flapping with iowait and puppet failure - https://phabricator.wikimedia.org/T161898#3194672 (10Phe) By the way, until I'm at it, I see you ulimit unlimited for core file size and core file are dumped to nfs, I saw recently, I don't remember where, a 1.5GB core file, that sur... [17:40:37] Is possible to set an alternative php_dir value for PECL modules (to be used for php-cli)? [17:47:27] Amitie_10g: -R ? [17:47:29] pecl help install [17:49:10] The question is how can I install the PECL module to the Grid from the Bastion. [17:49:35] PROBLEM - Puppet run on tools-exec-1442 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [17:49:51] .user.ini, or via https://secure.php.net/manual/en/function.dl.php [17:49:59] I'll try [17:51:16] Well, I need phpize... [17:52:04] 06Labs, 10Tool-Labs: Judy PECL module not available in Tool labs, need to install it - https://phabricator.wikimedia.org/T163340#3194719 (10Amitie_10g) After trying to install from bastion to an alternative php_dir, I got phpize missing, and php5-dev is required for that. [17:52:44] I set the alternative php_dir with pecl install -R [17:54:53] Notice that my bot needs Judy only for php-cli; the Webserver don't need it. [17:55:14] Amitie_10g: that shouldn't matter; both phps are fully under your control [17:56:18] Yes, my scripts, but not the libraries needed. [17:59:55] Let me see [18:00:32] a php array is a dynamic sparse array. Is this actually needed or an optimization made before the need has been proven? 
[18:01:30] Yes, but I'm finding a way to reduce the memory consumption, and Judy is the answer I'm finding, because, the memory cosnumed grown more than 128 MB [18:02:05] I tweaked my script to increment the memory limit to 256 and I'll see if it stops or not [18:02:34] Considering the array is populated with a JSON with 1000 rows [18:02:49] and several megabytes [18:03:43] Id rather see you give your tool up to 521M of RAM than have yet another deb package to support [18:03:44] Ahh, but before make Judy useful (for PHP, Python and other languages), first, libjudy should be installed, globally from the Ubuntu repo [18:04:19] I'll try with 256 MB for now, and I'll increment to 512 MB if fails [18:05:49] someday we may have a nice way to handle this with custom containers for things running on kubernetes, but we are quite a ways off from that today [18:11:06] PROBLEM - Puppet run on tools-exec-1438 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:23:33] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3194896 (10Andrew) If anything it's most likely that OSM is adding a user to a group when the user is already in the group -- much of T150091 involved duplicating OSM behavior in a keystone c... [18:28:04] 06Labs, 10Labs-Infrastructure: Wikitech error when adding users to projects - https://phabricator.wikimedia.org/T159021#3194909 (10Andrew) @Legoktm are you able to actually produce this issue, or are you still digging back into that previous occurrence? 
[18:29:37] RECOVERY - Puppet run on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [18:42:24] PROBLEM - Puppet run on tools-exec-1431 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [18:46:08] RECOVERY - Puppet run on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [19:17:26] RECOVERY - Puppet run on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [19:20:30] 10Tool-Labs-tools-Xtools, 06Community-Tech: Remove references to "Range Contributions" and "Autoblock" within xTools code - https://phabricator.wikimedia.org/T163374#3195216 (10Matthewrbowker) [19:40:30] I have a problem with permissions in the tools.persondata directory. How can I request a change of the ownership (new owner should be tools.persondata ) [19:43:21] 06Labs, 10Cassandra, 06Services (blocked): Request increased quota for services-testbed labs project - https://phabricator.wikimedia.org/T163375#3195307 (10Eevans) [19:45:31] Wurgl: use the `take` command [19:45:56] Thanks [19:48:35] This is not standard linux? Never seen this command, and it is not part of my Suse? [19:48:40] However, it works [19:52:47] no, it's not standard unix, it's specific to tool labs [19:52:57] Aha! Okay [19:56:24] 06Labs, 10Labs-Infrastructure, 07artificial-intelligence: Provide large disk space to wikibrain for memory-mapped file - https://phabricator.wikimedia.org/T161554#3195368 (10Halfak) [20:14:54] 06Labs, 10Labs-Infrastructure, 07artificial-intelligence: Provide large disk space to WikiBrain for memory-mapped file - https://phabricator.wikimedia.org/T161554#3195439 (10Halfak) [20:22:59] 06Labs, 10Labs-Infrastructure, 10Tool-Labs-tools-Other, 15User-bd808: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3195464 (10Andrew) This page is just a proof of concept (not live-updating) but is this the kind of thing we're talking about? https://wik... 
[20:32:10] 06Labs, 10Labs-Infrastructure, 10Tool-Labs-tools-Other, 15User-bd808: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3195531 (10bd808) >>! In T45580#3195464, @Andrew wrote: > This page is just a proof of concept (not live-updating) but is this the kind of th... [20:37:00] 06Labs, 10Labs-Infrastructure, 10Tool-Labs-tools-Other, 15User-bd808: Automatically updated list of all configured domains - https://phabricator.wikimedia.org/T45580#3195539 (10Andrew) I figured a big dump that users can search is better than a search widget since it's not that much data -- but, y'know, ei... [20:41:14] (03CR) 10Andrew Bogott: [C: 032] Send tool maintainers a notice when a git repo is created [labs/striker] - 10https://gerrit.wikimedia.org/r/348390 (owner: 10BryanDavis) [20:58:36] TBolliger: you around by any chance ? [20:59:42] !log tools.wsexport Pinning jsub jobs to tools-exec-1426 for T163208 [20:59:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wsexport/SAL [20:59:45] T163208: wsexport tool writing output to $HOME/tool/temp puts load on Tool Labs NFS server - https://phabricator.wikimedia.org/T163208 [21:01:30] 06Labs, 10Tool-Labs-tools-Other: wsexport tool writing output to $HOME/tool/temp puts load on Tool Labs NFS server - https://phabricator.wikimedia.org/T163208#3195669 (10bd808) Here's the list of exec nodes with puppet failures so far this month from my irc logs sorted by frequency: ``` $ grep 'Puppet run on t... 
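(Editor's sketch, not part of the log.) The 21:01 task comment describes listing exec nodes with Puppet failures from IRC logs, sorted by frequency; the original `grep` pipeline is truncated in the log and is not reconstructed here. As a rough equivalent under stated assumptions, the Python below counts `PROBLEM - Puppet run on <host> is CRITICAL` alerts per host. The regular expression is modelled on the alert lines in this log, and the names `ALERT_RE` and `count_puppet_failures` are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the icinga alert lines in this log (an assumption,
# not the command used in the task comment).
ALERT_RE = re.compile(r"PROBLEM - Puppet run on (\S+) is CRITICAL")


def count_puppet_failures(log_text):
    """Count PROBLEM alerts per host, most frequent first, ties by host name."""
    counts = Counter(ALERT_RE.findall(log_text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = (
        "[00:01:54] PROBLEM - Puppet run on tools-exec-1441 is CRITICAL: ...\n"
        "[00:36:50] RECOVERY - Puppet run on tools-exec-1441 is OK: ...\n"
        "[01:21:34] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: ...\n"
        "[01:32:53] PROBLEM - Puppet run on tools-exec-1441 is CRITICAL: ...\n"
    )
    for host, failures in count_puppet_failures(sample):
        print(failures, host)
```

RECOVERY lines are ignored on purpose, so each flap is counted once, at the point it went critical.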
[21:05:31] !log tools.wsexport Killed conversion job 1199220 running since 2017-01-19 [21:05:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wsexport/SAL [21:51:14] 06Labs, 10Tool-Labs-tools-Other: wsexport tool writing output to $HOME/tool/temp puts load on Tool Labs NFS server - https://phabricator.wikimedia.org/T163208#3195828 (10bd808) The long term awesome solution for the wsexport tool might be to move it to a project of its own and put the web frontend on a VM that... [22:01:52] (03CR) 10Andrew Bogott: [C: 032] Notify admins of new membership requests [labs/striker] - 10https://gerrit.wikimedia.org/r/348389 (https://phabricator.wikimedia.org/T162508) (owner: 10BryanDavis) [22:02:16] 06Labs, 06Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3195874 (10chasemp) [22:02:22] 06Labs, 06Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3195887 (10chasemp) p:05Triage>03Normal [22:05:32] 06Labs, 06Operations: Ensure kubelet is stopped on Tools Proxy hosts - https://phabricator.wikimedia.org/T163391#3195909 (10chasemp) [22:12:10] 06Labs, 06Operations: Ensure kubelet is stopped on Tools Proxy hosts - https://phabricator.wikimedia.org/T163391#3195948 (10chasemp) p:05Triage>03High [22:15:50] 06Labs, 06Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3195949 (10chasemp) [22:16:19] 06Labs, 06Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3195874 (10chasemp) [22:16:41] (03PS2) 10BryanDavis: Add links to user's accounts to membership request [labs/striker] - 10https://gerrit.wikimedia.org/r/348386 (https://phabricator.wikimedia.org/T162508) [22:16:51] 06Labs, 06Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3195874 (10chasemp) [22:16:57] (03CR) 10Dereckson: [C: 031] Add 
configs, docs and credit contributors [labs/tools/Wikimedia-Emoji-Bot] - https://gerrit.wikimedia.org/r/348010 (owner: D3r1ck01) [22:21:55] Matanya — yes, I'm here [22:22:40] Labs, Operations: Determinte appropriate proxy_read_timeout setting for Tools Proxy - https://phabricator.wikimedia.org/T163393#3195964 (chasemp) [22:22:54] Labs, Operations: Determinte appropriate proxy_read_timeout setting for Tools Proxy - https://phabricator.wikimedia.org/T163393#3195976 (chasemp) p:Triage>Normal [22:26:45] Labs, Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3195874 (madhuvishy) Related - https://phabricator.wikimedia.org/T143639 that documents some of this, and also has been assigned to me for a while [23:03:01] Labs, Operations: Update documentation for Tools Proxy failover - https://phabricator.wikimedia.org/T163390#3196059 (chasemp) [23:03:56] Labs, Operations: Determinte appropriate proxy_read_timeout setting for Tools Proxy - https://phabricator.wikimedia.org/T163393#3195964 (madhuvishy) Original task on the timeout increase from 10m to 1 hour - T120335 [23:06:20] Labs, Operations: Determine appropriate proxy_read_timeout setting for Tools Proxy - https://phabricator.wikimedia.org/T163393#3196073 (madhuvishy) [23:16:41] test [23:17:25] 18:16 - 16:53 = 2.5h (since a puppet alert due to IO reasons from time of wsexport pin) [23:18:18] fwiw [23:18:21] 2334 tools.w+ 20 0 355132 118684 15448 R 100.0 1.5 0:14.23 ebook-convert [23:18:23] 2139 tools.w+ 20 0 686564 506484 8972 R 99.1 6.2 3:09.80 ebook-convert [23:18:24] running there now [23:42:45] I just verified https://tools.wmflabs.org/wsexport/tool/book.php is working also just as a note [23:45:05] Labs: IO issues for Tools instances flapping with iowait and puppet failure - https://phabricator.wikimedia.org/T161898#3196166 (chasemp) >>! 
In T161898#3194599, @Phe wrote: > Are you sure you used the IO amount to get your report? I did a fix in phetools, but I didn't get why I was mentioned in this report,... [23:53:06] Labs, Operations: Ensure we can survive a loss of labservices1001 - https://phabricator.wikimedia.org/T163402#3196191 (chasemp) [23:53:15] Labs, Operations: Ensure we can survive a loss of labservices1001 - https://phabricator.wikimedia.org/T163402#3196206 (chasemp) p:Triage>High [23:53:38] Labs, Operations: Ensure we can survive a loss of labservices1001 - https://phabricator.wikimedia.org/T163402#3196191 (chasemp) [23:54:16] PROBLEM - Puppet run on tools-exec-1433 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [23:55:21] gah