[00:35:27] I've been trying to set up my little silex website thingy on tool labs but can't get the rewrite rules to work properly
[00:35:36] I tried this config http://silex.sensiolabs.org/doc/web_servers.html#lighttpd
[00:35:50] Anyone with experience with this who can spare a few mins to help?
[02:33:16] Quarry, Analytics-Backlog: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688177 (Bawolff) Can't you already write db queries of this form using UNION and foreign table references? (Of course, that's not very user friendly)
[03:22:24] Quarry: 'New query' highlighted when looking at existing queries - https://phabricator.wikimedia.org/T106411#1688191 (Ricordisamoa) Because of https://github.com/wikimedia/analytics-quarry-web/blob/2f23db6fa4c2891cefdb40fd2f13e00b6514ba9a/quarry/web/templates/query/view.html#L1
[03:32:15] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:33:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:23:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[04:24:14] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[04:30:06] Labs, Patch-For-Review: Remove NFS completely from dynamicproxy project - https://phabricator.wikimedia.org/T102369#1688264 (yuvipanda) All gone except /data/project!
[04:32:50] Quarry, Analytics-Backlog: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688272 (Bawolff) I mean, like http://quarry.wmflabs.org/query/5417
[04:38:10] Labs: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1688277 (yuvipanda) Ok, so the proxies are running fine except NovaProxy is no longer worky! Need to figure out wtf is going on
[04:38:45] Quarry, Analytics-Backlog: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688278 (Ricordisamoa) >>! In T95582#1688177, @Bawolff wrote: > Can't you already write db queries of this form using UNION and foreign table references? (...
[04:47:28] !log copied data.db from dynamicproxy-gateway to novaproxy-01
[04:47:28] copied is not a valid project.
[04:48:17] Labs: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1688281 (yuvipanda) Figured it out. Had to copy the sqlite db from dynamicproxy to novaproxy, since that's where the api gets its data from and not from Redis. Everything is working fine now!
[04:55:33] Labs: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1688282 (yuvipanda) Switching over is: # Point the IP to the other instance on Special:NovaAddress (this immediately switches public traffic, rest of steps are for Special:NovaProxy) # Make sure that /etc/dynamicproxy-ap...
[05:03:52] Quarry, Analytics-Backlog: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1688289 (Bawolff) >>! In T95582#1688278, @Ricordisamoa wrote: >>>! In T95582#1688177, @Bawolff wrote: >> Can't you already write db queries of this form us...
[05:31:45] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 400 Bad Request - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 509 bytes in 0.002 second response time
[05:31:52] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 400 Bad Request - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 509 bytes in 0.002 second response time
[05:32:34] PROBLEM - Puppet failure on tools-proxy-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:33:29] shinken-wm: is lying
[05:33:36] aaah
[05:33:42] probably because it's pointing to the wrong active-host
[05:33:46] PROBLEM - SSH on tools-proxy-01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:33:50] this is what happens when switchovers require 4-5 different places
[05:36:20] PROBLEM - Puppet failure on tools-proxy-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[05:41:49] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 898527 bytes in 2.673 second response time
[05:41:53] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 898527 bytes in 2.441 second response time
[05:42:43] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/BengaliHindu was created, changed by BengaliHindu link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/BengaliHindu edit summary: Created page with "{{Tools Access Request |Justification=I would like to test bots for Bengali Wikipedia. |Completed=false |User Name=BengaliHindu }}"
[06:01:38] PROBLEM - Puppet failure on tools-packages is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[06:02:29] PROBLEM - Puppet failure on tools-packages is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[06:03:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:04:16] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:05:23] PROBLEM - Puppet failure on tools-proxy-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[06:15:22] RECOVERY - Puppet failure on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:16:19] RECOVERY - Puppet failure on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:17:32] RECOVERY - Puppet failure on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:26:54] PROBLEM - SSH on tools-proxy-02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:28:06] hmm
[06:28:19] ^ is an interesting error
[06:28:21] since I clearly can ssh
[06:28:31] oooh
[06:28:36] I wonder if it's debian vs trusty issue
[06:28:39] hmm
[06:28:45] also not sure why only testing-shinken- has problems
[06:29:01] Krenair: you should take down testing-shinken- :P
[06:35:19] PROBLEM - Puppet failure on tools-proxy-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[06:36:22] PROBLEM - Puppet failure on tools-proxy-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:36:41] RECOVERY - Puppet failure on tools-packages is OK: OK: Less than 1.00% above the threshold [0.0]
[06:37:32] RECOVERY - Puppet failure on tools-packages is OK: OK: Less than 1.00% above the threshold [0.0]
[06:40:26] !log tools migrated webproxy to tools-proxy-01
[06:40:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[06:47:26] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:50:17] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:52:41] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:54:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[06:55:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]
[06:55:18] RECOVERY - Puppet failure on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:56:23] RECOVERY - Puppet failure on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:56:40] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:07:10] PROBLEM - Host ToolLabs is DOWN: CRITICAL - Host Unreachable (tools.wmflabs.org)
[07:07:28] PROBLEM - Host tools-webproxy-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.139)
[07:07:57] PROBLEM - Host ToolLabs is DOWN: CRITICAL - Host Unreachable (tools.wmflabs.org)
[07:08:49] 9woah
[07:08:52] woah
[07:08:53] don't freak out shinken
[07:08:56] it's fine
[07:10:12] PROBLEM - Host tools-webproxy-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.139)
[07:10:34] PROBLEM - Host tools-webproxy-02 is DOWN: CRITICAL - Host Unreachable (10.68.17.145)
[07:10:49] !log deleted tools-webproxy-01 and -02, running on proxy-01 and -02 now
[07:10:50] deleted is not a valid project.
[07:12:03] !log tools deleted tools-webproxy-01 and -02, running on proxy-01 and -02 now
[07:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[07:12:12] that's just a shinken false alarm
[07:12:46] PROBLEM - Host tools-webproxy-02 is DOWN: CRITICAL - Host Unreachable (10.68.17.145)
[07:13:16] andrewbogott: sorry if those alerts woke you up, all is well
[07:16:59] Labs, Tool-Labs: Setup a way to store secrets and access them from puppet inside the Tool Labs project - https://phabricator.wikimedia.org/T112005#1688392 (yuvipanda) @Joe suggested we just have a private repo on labs puppetmaster, which I highly approve of!
[07:57:14] RECOVERY - Host ToolLabs is UP: PING OK - Packet loss = 0%, RTA = 0.45 ms
[07:57:20] RECOVERY - Host ToolLabs is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms
[08:00:59] yes well done shinken
[08:04:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:05:13] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:06:08] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/BengaliHindu was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=186966 edit summary:
[10:00:00] Labs, Tool-Labs: proxylistener errors on tools-proxy-01 - https://phabricator.wikimedia.org/T114223#1688653 (scfc) NEW
[10:00:05] (CR) Hashar: "The problem is right now the list of envs Jenkins runs is hardcoded in the Zuul configuration to 'flake8' and 'py34'. And I would like to" [labs/tools/forrestbot] - https://gerrit.wikimedia.org/r/242348 (owner: Hashar)
[10:07:52] Labs, Tool-Labs: querycache and querycachetwo tables aren't available on labs sql dbs - https://phabricator.wikimedia.org/T65782#1688676 (scfc) a:coren>Slaporte @Slaporte: Feel free to assign to someone else from Legal.
[10:36:17] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Nedol was created, changed by Nedol link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Nedol edit summary: Created page with "{{Tools Access Request |Justification=Wikipedia DB access |Completed=false |User Name=Nedol }}"
[11:16:37] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Nedol was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=187037 edit summary:
[11:53:10] Labs, wikitech.wikimedia.org: Adding a user to a project results in a blank page with the user added to the project but no shell access - https://phabricator.wikimedia.org/T114229#1688874 (scfc) NEW
[12:54:34] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[12:56:25] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:14:29] JohnFLewis: have I been added to bastion yet? the message on gerrit hasn't changed and I have tried to ssh with no luck. Again thank you for the help :)
[13:15:14] johnflewis@bast1001:~$ id asherman
[13:15:14] uid=12989(asherman) gid=500(wikidev) groups=500(wikidev),707(bastiononly)
[13:15:29] yep. should be able to ssh.
[13:28:04] JohnFLewis: the ssh key passphrase prompt still points to id_rsa instead of id_rsa_phab, which I created for bastion separate from labs (my labs key is id_rsa_wmf). I have made sure I am using the correct passphrase with the app Keychain Access, however the app only shows two ssh keys (a git key and id_rsa), not id_rsa_wmf or id_rsa_phab.
[13:28:28] my config file looks like this: Host stat1003 User asherman IdentityFile ~/.ssh/id_rsa_phab ProxyCommand ssh -a -W %h:%p asherman@bast1001.wikimedia.org
[13:28:48] i type ssh stat1003 and get the prompt i described above
[13:29:08] asherman: what's the config for bast1001 again?
[13:30:06] btw the password Keychain Access showed me for id_rsa (the one the prompt is pointing to) was the one I thought it was, so I am not using the wrong passphrase...... does this help https://wikitech.wikimedia.org/wiki/SSH_access#SSH_configuration
[13:30:14] that link mentions bastion1001
[13:31:20] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:32:30] asherman: look at my config (https://tools.wmflabs.org/paste/view/55e1540d) and compare it with yours/try mine (changing key and username)
[13:34:41] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:02:14] JohnFLewis: I AM IN :)))))) your config pointed me at the right ssh key and my original password for that one worked!
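A note on the SSH exchange above. The pasted config that fixed the problem is not in the log, so the following is only one plausible reading: the `ProxyCommand` hop to bast1001 had no key configured of its own, so ssh fell back to the default `id_rsa` and prompted for that passphrase. A minimal sketch under that assumption, written to a scratch file instead of `~/.ssh/config`. The separate `bast1001` stanza and the `IdentitiesOnly` lines are assumptions; the host names, user name and key file come from the conversation.

```shell
# Sketch only: write an example config to a scratch file, not ~/.ssh/config.
# Assumption: the bastion hop needs its own stanza so it uses id_rsa_phab too.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
Host bast1001
    HostName bast1001.wikimedia.org
    User asherman
    IdentityFile ~/.ssh/id_rsa_phab
    IdentitiesOnly yes

Host stat1003
    User asherman
    IdentityFile ~/.ssh/id_rsa_phab
    IdentitiesOnly yes
    # Refer to the bastion by its alias so the stanza above applies to the hop.
    ProxyCommand ssh -a -W %h:%p bast1001
EOF
echo "wrote example config to $config_file"
```

To try it without touching the real config, something like `ssh -F "$config_file" stat1003` should pick up both stanzas.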
[14:02:27] :)
[14:02:43] JohnFLewis: Thank you so much again :)
[14:05:04] (PS1) Hashar: Run flake8 with tox [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549
[14:05:09] (CR) jenkins-bot: [V: -1] Run flake8 with tox [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549 (owner: Hashar)
[14:13:17] (CR) Hashar: "recheck" [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549 (owner: Hashar)
[14:15:36] (CR) Hashar: [C: 2] Run flake8 with tox [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549 (owner: Hashar)
[14:16:27] (CR) Hashar: [C: 2] Run flake8 with tox [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549 (owner: Hashar)
[14:17:04] Labs: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1689267 (Andrew) This is probably a good thing to document on https://wikitech.wikimedia.org/wiki/Labs_troubleshooting
[14:17:21] (CR) Hashar: [C: 2] Run flake8 with tox [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549 (owner: Hashar)
[14:17:32] (Merged) jenkins-bot: Run flake8 with tox [labs/migration-assistant] - https://gerrit.wikimedia.org/r/242549 (owner: Hashar)
[14:30:43] if the heritage db maintainer on tools is here: your DDLs are getting blocked by the high usage of that database: ping me if I can help speed it up
[14:36:20] PROBLEM - Puppet failure on tools-proxy-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:37:24] PROBLEM - Puppet failure on tools-proxy-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[14:46:28] Labs: Eliminate SPOFs in Labs infrastructure (Tracking) - https://phabricator.wikimedia.org/T105723#1689426 (Andrew)
[14:46:29] Labs, Labs-Infrastructure, Labs-Sprint-107, Patch-For-Review: holmium is a spof - https://phabricator.wikimedia.org/T106142#1689423 (Andrew) Open>Resolved a:Andrew labservices1001 is now the secondary nameserver for labs instances. Nothing hits pdns auth directly, but that server is also l...
[14:54:14] (CR) Aude: [C: 2] Report analytics/limn-wikidata-data to wikidata-feed [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/241060 (owner: Addshore)
[14:54:51] (Merged) jenkins-bot: Report analytics/limn-wikidata-data to wikidata-feed [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/241060 (owner: Addshore)
[14:56:16] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:57:12] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:59:14] yuvipanda: around?
[15:23:51] <_joe_> aude: it's a bit early for yuvi :)
[15:26:00] _joe_: i realize that :/
[15:26:39] * aude just wonders if https://wikitech.wikimedia.org/wiki/Grrrit-wm#Deploying works now (with fabric)
[15:27:20] aude: no, yuvi moved it to kubernetes
[15:27:26] valhallasw`cloud: ah
[15:27:50] so, what do we do now?
[15:27:55] what's wrong?
[15:28:03] * aude wants to deploy a change
[15:28:11] and would like to know how we do it now
[15:29:08] I think it needs manual intervention on the kubernetes host, as the auth is not implemented yet
[15:29:16] oh
[15:29:18] :(
[15:29:31] * aude can wait
[15:36:47] :D
[15:43:23] <_joe_> aude: I can help with deploying to kubernetes
[15:43:29] <_joe_> but not now, in a meeting
[15:43:42] <_joe_> valhallasw`cloud: auth is implemented, only in a lame way :)
[15:55:34] Labs, Tool-Labs: querycache and querycachetwo tables aren't available on labs sql dbs - https://phabricator.wikimedia.org/T65782#1689724 (Slaporte) @scfc, I'm curious which user groups can see the data in querycache/querycachetwo through MediaWiki. Are any of these special pages restricted to admins? Can...
[16:01:20] _joe_: not urgent but don't want to forget
[16:01:36] <_joe_> aude: sorry, meeting
[16:01:40] k
[16:03:00] aude I shall do it in an hour or so
[16:03:09] yuvipanda: ok
[16:03:14] yuvipanda: is the puppet failure on tools-proxy-02 unexpected or is that box a work in progress?
[16:03:27] yuvipanda: also PLEASE DOCUMENT. love, valhallasw
[16:03:53] Valhalla even if documented there isn't anything anyone not me can do atm
[16:03:58] because the docker image is
[16:04:04] yuvipanda/grrrit
[16:04:07] on dockerhub
[16:04:13] which only I can push to
[16:05:06] andrewbogott _Joe_ is using it ATM to test the kubernetes proxy support
[16:05:15] it isn't the active proxy so...
[16:05:17] yuvipanda: ok, I’ll ignore then
[16:05:22] ok
[16:05:26] although the failure is interesting :)
[16:05:29] <_joe_> andrewbogott: it's me yes
[16:05:57] andrewbogott novaproxy is now redundant btw. I added docs on switchover to the bug, I should move them somewhere else and mark it done
[16:05:59] <_joe_> andrewbogott: the failure right now is that python-requests is not explicitly declared in puppet right now
[16:06:38] yuvipanda: I put a link in that bug about where the docs should go
[16:06:47] awesome
[16:07:01] I'll do that when I manage to get off the bed
[16:07:02] _joe_: oh, the error changed… I was intrigued by the present => true thing
[16:07:16] anytime soon, I tell myself...
[16:07:42] <_joe_> yuvipanda: take your time, it's only 9 AM and you stopped working at 1 AM!
[16:07:58] <_joe_> you're allowed more than 8 hours of non-work you know
[16:08:10] <_joe_> actually you're supposed to work for 8 hours
[16:10:56] someone set the alarm while I was still in the office
[16:11:06] I was hard to miss, was playing loud music!
[16:30:09] Labs, wikitech.wikimedia.org: Adding a user to a project results in a blank page with the user added to the project but no shell access - https://phabricator.wikimedia.org/T114229#1689856 (Krenair) Can you let me know if this happens again and at what time?
[16:45:57] yuvipanda: it's not just rebuilding, it's also killing/restarting in case that's necessary :P
[16:46:03] and those should be doable by me, I think
[16:46:11] maybe by impersonating you, but still.
[16:48:36] Valhalla true. I'll document those
[16:48:59] thanks :-)
[18:01:28] Labs, Salt: clean up old ec2id-based salt keys on labs - https://phabricator.wikimedia.org/T103089#1690233 (ArielGlenn) @Andrew, can you have a look at these last few?
[18:04:42] I need to upload a few files to our labs project, anyone who would provide me some handholding?
[18:06:00] dennyvrandecic: sure. What's working/not working? And which project?
[18:06:40] I want to move a few files into tools.wmflabs.org/wikidata-primary-sources/data/
[18:07:18] and I always keep forgetting where I have to ssh in and with which account, etc.
[18:07:44] let me check what your username is..
[18:08:33] dennyvrandecic: ssh vrandezo@tools.wmflabs.org
[18:08:50] dennyvrandecic: then, once logged in, run become wikidata-primary-sources
[18:09:25] if you want to scp files in, the easiest is to scp them as your normal user (to /home/vrandeze), then to copy them as wikidata-primary-sources user
[18:09:30] vrandezo
[18:09:37] typing is hard :<
[18:09:46] that sounds good! thanks this was superuseful
[18:10:21] so, let's see if I can ssh into tools. I may have not set up my keys yet... (new computer since last time)
[18:11:34] Connection timed out? that's not the error message i expected
[18:12:02] eh. sorry, login.tools.wmflabs.org
[18:12:10] ah, thanks!
[18:15:52] ok, I have scped the file to my home directory as me
[18:16:16] now how do I move them to the project directory and become primary sources?
[18:16:22] sorry for being so dense
[18:17:36] dennyvrandecic: become wikidata-primary-sources
[18:17:40] (the command is called become)
[18:17:51] then cd public_html/data
[18:18:08] and cp /home/vrandezo/...
[18:18:36] thanks for your patience! valhallasw`cloud!
[18:19:36] you're welcome :)
[18:21:10] :( my file in my home directory is world readable and yet after I became the project, I cannot access it
[18:23:01] hmm. it seems permissions on /home are more strict than they were. yuvipanda, anything you can remember about that?
[18:23:22] no I haven't touched it in a while...
[18:23:23] well
[18:23:26] ever actually
[18:23:53] ok, so in that case, do it the other way around; copy the files from /home/vrandezo to /data/project/... as user vrandezo
[18:24:12] we should fix this soon I hope.
[18:24:53] dennyvrandecic: but afterwards, you might need to `take ` as the tool user
[18:24:58] sorry this is so complicated :(
[18:25:05] that's ok
[18:25:35] is there a doc on doing this, though? harassing you on IRC doesn't seem to be the most effective way in the long run
[18:25:43] (even though it works pretty well right now :) )
[18:26:36] wah, I cannot cp into the /data/project file because I have no rights to do so! :D
[18:26:58] wtf
[18:27:12] I could chmod the directory to be world writable for a moment while i do that, but that sounds yucky...
[18:27:22] group writable should be enough
[18:27:31] ok
[18:27:32] PROBLEM - Puppet staleness on tools-proxy-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[18:27:43] dennyvrandecic: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Updating_files
[18:27:44] ^ ignore
[18:27:54] (the warning that is)
[18:28:49] ok, will try :) thanks again for the patience and the link to the help
[18:30:36] PROBLEM - Puppet staleness on tools-proxy-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [43200.0]
[18:31:35] valhallasw`cloud: thanks! I moved it to a local directory inside the project as my user account, then take'd it as the project, and then moved it to the public_html directory. that worked and is now available.
[18:31:43] Thank you for your help! valhallasw`cloud
[18:31:44] pfew!
[18:32:10] you're still welcome :)
[18:32:20] :)
[18:37:57] Labs, Tool-Labs, Wikimedia-Mailing-lists: Shutdown toolserver-l mailman list - https://phabricator.wikimedia.org/T113845#1690440 (Dzahn) a:Dzahn
[18:38:44] Labs, Tool-Labs, Wikimedia-Mailing-lists: Shutdown toolserver-l mailman list - https://phabricator.wikimedia.org/T113845#1690446 (Dzahn) @Nosy79 ping me on IRC when you got a minute? let's do these missing steps to close the list as mentioned above by @multichill
[19:19:23] Labs, wikitech.wikimedia.org: Adding a user to a project results in a blank page with the user added to the project but no shell access - https://phabricator.wikimedia.org/T114229#1690604 (scfc) Will do in the future. If you want to look at the existing logs, the times should be roughly the modifications...
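For reference, the upload sequence that finally worked in the 18:04 to 18:31 conversation can be summarised as a dry run. This sketch only prints the steps: `become` and `take` exist only on the Tool Labs hosts, and `become` opens a new interactive shell, so the lines are meant to be typed by hand. The file name, the staging directory and the tool home placeholder are hypothetical (the log elides the real path under /data/project), and the argument form for `take` is assumed.

```shell
# Dry run: print the steps instead of executing them.
user="vrandezo"                       # from the log
tool="wikidata-primary-sources"       # from the log
file="example-data.tsv"               # hypothetical file name
tool_home="/path/to/tool/home"        # placeholder; the log elides this path
staging="$tool_home/incoming"         # hypothetical group-writable directory

# The prompt prefixes show which account each step runs as.
plan="$(cat <<EOF
local\$ scp $file $user@login.tools.wmflabs.org:
local\$ ssh $user@login.tools.wmflabs.org
$user\$ cp ~/$file $staging/
$user\$ become $tool
tool\$ take $staging/$file
tool\$ mv $staging/$file ~/public_html/data/
EOF
)"
printf '%s\n' "$plan"
```

The order matters because, as the log shows, the tool account could not read files in the user's home directory, so the copy has to be done as the user and ownership fixed afterwards with `take`.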
[19:19:44] Labs, Labs-Infrastructure, Labs-Sprint-107, Patch-For-Review, labs-sprint-116: holmium is a spof - https://phabricator.wikimedia.org/T106142#1690605 (Andrew)
[19:20:02] Labs, labs-sprint-116: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1690607 (Andrew)
[19:20:17] Labs, Tool-Labs, Database, Labs-Q4-Sprint-1, and 5 others: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1690609 (Andrew)
[19:20:35] Labs, Tool-Labs, ToolLabs-Goals-Q4, labs-sprint-116: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1690612 (Andrew)
[19:33:45] Labs, Tool-Labs, Database, Labs-Q4-Sprint-1, and 5 others: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1690679 (jcrespo) I've been updating Mark on this, the backup is running, but a series of coincidences happened here: * This host has MyISAM tables, which...
[20:12:45] Labs, Labs-Infrastructure, labs-sprint-116: Audit private IP allocation for Labs instances - https://phabricator.wikimedia.org/T113982#1690844 (Andrew) First, history! To begin with, there was https://gerrit.wikimedia.org/r/#/c/20873/3/manifests/network.pp This confuses me a bit -- it reserves four d...
[20:13:10] Labs, Labs-Infrastructure, labs-sprint-116: Audit private IP allocation for Labs instances - https://phabricator.wikimedia.org/T113982#1690846 (Andrew)
[20:22:51] Labs, Labs-Infrastructure, labs-sprint-116: Audit private IP allocation for Labs instances - https://phabricator.wikimedia.org/T113982#1690878 (Andrew) Here's what I conclude: * Everything under 10.68 (aka 10.68.0.0/16) is reserved for labs instances. * Everything that matters (docs, configs, etc) now...
[20:24:23] Labs, Labs-Infrastructure, labs-sprint-116: Audit private IP allocation for Labs instances - https://phabricator.wikimedia.org/T113982#1690883 (Andrew) Note that dns/templates/10.in-addr.arpa agrees with the above: ; eqiad labs realm ; 10.68.0.0/24 - labs-instances1-a-eqiad $ORIGIN 0.68.{{ zonename }...
[20:25:09] Labs, Labs-Infrastructure, labs-sprint-116: Audit private IP allocation for Labs instances - https://phabricator.wikimedia.org/T113982#1690891 (Andrew) a:Andrew>mark Mark, if you concur then we can close this.
[20:37:19] PROBLEM - Puppet failure on tools-proxy-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:38:09] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:38:32] PROBLEM - Puppet failure on tools-proxy-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:39:34] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:45:33] RECOVERY - Puppet staleness on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[20:47:34] RECOVERY - Puppet staleness on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[20:56:51] Krenair: can you kill testing-shinken-
[20:57:29] yes
[20:58:42] Krenair: thanks
[20:59:04] Basically it's working, but with a couple of live hacks
[20:59:11] and unmerged commits
[20:59:29] Need to sort out nickserv password, and need to fix the channel list
[20:59:46] ok
[20:59:55] the other part of that commit, for icinga, is going to get a password in production.
[21:00:05] ah I see
[21:00:07] shinken... I guess we're going to have to make it optional for that :/
[21:00:09] yes would need a private commit
[21:00:11] yeah
[21:00:26] Yeah I'll arrange that with ori at some point I guess
[21:00:35] Krenair: \o/ awesome
[21:06:29] Krenair: good news, shinken-wm doesn't identify :-p
[21:06:50] ... How is this good news?
[21:07:00] It's neither good nor is it news.
[21:07:26] Oh, sort out as in 'actually register the account' :-p
[21:08:16] there's several wm bots not actually authing, and I don't think it has ever caused any issues to be honest
[21:14:37] RECOVERY - Puppet failure on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:13:35] Labs, Labs-Infrastructure, labs-sprint-116: Make sure nova is re-using old private IPs - https://phabricator.wikimedia.org/T113648#1691248 (Andrew) Something looks wrong in the fixed_ip table in the nova db. Each IP has two entries. Here's a pair for an existing instance: MariaDB MISC m5 localhost n...
[22:48:32] RECOVERY - Puppet failure on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:35] PROBLEM - Puppet failure on tools-proxy-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[22:56:23] RECOVERY - Puppet failure on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:56:36] Krenair: you need to kill testing-shinken- again :)
[22:57:05] I would, but... clearly something is wrong
[22:57:23] RECOVERY - Puppet failure on tools-proxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:21] well
[22:58:25] maybe not in the irc part
[22:59:30] yuvipanda, FWIW, `sudo service tcpircbot-testing-shinken- stop` on shinken-ircbot-testing.shinken.eqiad.wmflabs fixes it
[22:59:49] hmm ok
[23:00:03] Krenair: so when I want to keep these down I'd just do chmod -x on the executable
[23:00:07] and that'll get rid of it
[23:00:29] ok
[23:01:33] did that on /srv/tcpircbot/tcpircbot.py
[23:04:32] RECOVERY - Puppet failure on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:06:19] Krenair: kk thanks
[23:20:07] valhallasw`cloud: Hmm, ForrestBot is unhappy. Could you consider merging https://gerrit.wikimedia.org/r/#/c/242369/ so I can deduce why? :-)
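On the `chmod -x` trick mentioned at 23:00:03: the idea is that a service which starts its program by executing the file directly can no longer launch it once the execute bit is gone. The sketch below shows only that mechanism on a throwaway script; `fake-bot.sh` is a hypothetical stand-in, not the real /srv/tcpircbot/tcpircbot.py, and whether the real init script executes the file directly is an assumption.

```shell
# Demonstrates the permission mechanics on a temporary script only.
workdir="$(mktemp -d)"
bot="$workdir/fake-bot.sh"          # hypothetical stand-in for the bot executable
printf '#!/bin/sh\necho "bot running"\n' > "$bot"

chmod +x "$bot"
before="$("$bot")"                  # runs normally while the file is executable

chmod -x "$bot"
if "$bot" >/dev/null 2>&1; then     # direct execution is refused without an execute bit
    after="started"
else
    after="refused"
fi

echo "before: $before / after: $after"
rm -rf "$workdir"
```

One caveat: this does not stop anything that invokes the file through an interpreter (for example `python /srv/tcpircbot/tcpircbot.py`), so stopping the service, as in the 22:59:30 message, is still needed for an instance that is already running.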