[00:17:34] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[00:57:47] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2143252 (ZhouZ) Hi @chasemp, Thanks for pointing out these potential inconsistencies and lack of clarity on these terms. These are all things we will be looking at a...
[01:24:39] !log ores deployed ores-wikimedia-config:8af4377
[01:24:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[01:24:48] and good night!
[01:27:06] ugh, phab-01 is broken again?
[01:28:03] (no login, even as root)
[01:28:20] and of course it uses the central salt master so I can't try that either
[01:31:21] and phab-02
[01:31:22] and phab-04
[01:35:23] Labs: Login to phab-0[124].phabricator.eqiad.wmflabs is broken, even as root - https://phabricator.wikimedia.org/T130693#2143312 (Krenair)
[01:36:44] Labs: Login to phab-0[124].phabricator.eqiad.wmflabs is broken, even as root - https://phabricator.wikimedia.org/T130693#2143324 (Krenair)
[01:46:48] Labs: Login to phab-0[124].phabricator.eqiad.wmflabs is broken, even as root - https://phabricator.wikimedia.org/T130693#2143325 (Krenair) dealt with puppet/phab-03
[01:48:00] !log phabricator Unbroken puppet by dropping local commits, at least some of these were already merged to master. Not sure about 2f5e74c65e399fd5ecfb4d3a6eade28c191113ff or 46831af9a4e433d1caec82a8fbc881e2a6d8427d
[01:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL, Master
[03:42:29] Labs, Tool-Labs: Setup an easy to use logrotate based system for rotating tools logs - https://phabricator.wikimedia.org/T68623#2143387 (MusikAnimal) We're using a simple bash script for xtools, which I've now added to pageviews: ``` for logfile in *.err *.out *.log do tail -c 100000 $logfile > temp.$$;...
[06:55:46] PROBLEM - Puppet run on tools-webgrid-lighttpd-1208 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[07:35:46] RECOVERY - Puppet run on tools-webgrid-lighttpd-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:50:50] !log ores-staging made a new security group called webserver enabling access to port 8080
[07:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores-staging/SAL, Master
[07:52:50] !log revscoring deleted DNS proxy mw-revscoring pointing out to mediawiki.revscoring.eqiad.wmflabs
[07:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Revscoring/SAL, Master
[07:53:50] !log ores-staging made a new DNS proxy called mw-revscoring pointing out to mediawiki-ores.ores-staging.eqiad.wmflabs:8080
[07:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores-staging/SAL, Master
[08:30:23] (CR) Glaisher: "Can you check the logs to see what error it was? It works fine for me locally." [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[08:34:52] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[10:03:27] !log puppet3-diffs added Elukey as a project member
[10:03:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs/SAL, Master
[10:28:03] RECOVERY - Puppet run on tools-proxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:36:43] jynus: i left you a msg on your commons talkpage.
[11:37:29] Steinsplitter, that is nice, but again, wrong channel
[11:37:46] on-wiki is a bad place for communicating bugs
[11:37:54] use phabricator instead
[11:38:11] you should help volunteers
[11:38:18] not playing bureaucratic games
[11:38:31] if you go to phabricator, I said "this probably require intervention by labs admins to manually generate one for you."
[11:38:39] I am not a labs admin
[11:38:52] I cannot help you anymore
[11:39:42] well, if you told me this before then i don't have to get angry *_*
[11:39:48] I said that
[11:39:53] on the right channel
[11:39:56] phabricator
[11:40:25] phabricator is what wikimedia employees and volunteers use to coordinate about bugs
[11:41:00] "jcrespo placed this task up for grabs." means "I am not working on this"
[11:41:11] "jcrespo moved this task to Blocked external/Not db team on the DBA workboard."
[11:41:23] means this is not a DBA task
[11:42:05] I would recommend reading about phabricator on https://www.mediawiki.org/wiki/Phabricator/Project_management
[11:42:53] we have millions of users to attend, we need to be strict about how to communicate problems in order to attend all
[11:42:56] years ago everything was less bureaucratic, but okay. time is changing. wikipedia is becoming bigger.
[11:43:05] exactly
[11:43:40] what would it happen if every user sent me a message on wiki, do you think I would answer to all myself?
[11:44:21] there are multiple people working on task, and we need to coordinate among us to do as much as possible
[11:44:47] s/would/could/ s/task/tasks/
[11:45:45] having rules means Quality of process, and a guarantee that your (all people's) issues are processed and resolved correctly
[11:47:13] as a veteran of the project, I need to be stricter for you to follow the rules (the same way we do for veterans on-wiki)
[11:50:20] when möller was deputy director a ping was enough, but ok - stuff has changed
[11:55:21] that sounds like corruption/lack of transparency to me
[11:57:42] it is called collaboration....
[11:58:19] sounds like "I am a friend of someone, give me preferential treatment"
[11:58:35] tickets, on the other side, can be easily audited
[12:00:10] I have >200 pending tickets (only myself), sorry for wanting some kind of organization https://phabricator.wikimedia.org/tag/dba/ that every open source project has
[12:02:03] I should add *serious* before open source
[12:34:54] jynus: i created a workaround (using another user, you closed the old user) so the bug can be closed. :)
[12:51:08] chasemp, andrewbogott, yuvipanda https://gerrit.wikimedia.org/r/#/c/278899/ is generally good to go, but the caveat is that labs instances which don't use unattended-upgrades for some reason will have an exim installed, which doesn't support add|keep_environment yet, so exim will fail to start
[12:52:26] AFAIK you've started to use clustershell for labs instances, so maybe one of you can run a labs-wide upgrade for those instances? or we just send a headsup mail advising people to upgrade if they run into that problem
[14:32:35] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2144395 (chasemp) >>! In T130637#2142322, @Legoktm wrote: > Does labs-announce have reply-to set to labs-l? It seems to yep > Reply-To: labs-l@lists.wikimedia.org But I do see people either replying or sending t...
[15:24:00] andrewbogott: can you delay rebooting encoding02 for 2 hours? I've signaled the node to stop accepting new tasks, but there's still a task running on it
[15:26:50] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2144531 (bd808) There actually is a potential technical solution to disallowing loading javascript, images, etc from external domains. The [[http://www.html5rocks.com/...
[15:29:24] andrewbogott: If you're here, can you give cac.rcm.eqiad.wmflabs a last try? maybe 'sudo chown -R puppet /var/lib/puppet' helps
[15:31:02] Why is there an image 'ubunuty-14.04-trusty (testing)' there?!
[15:32:23] Luke081515: kernel upgrades are in progress, various things are in flux
[15:32:56] andrewbogott: Is instance creation blocked by this, or not?
[15:33:28] Luke081515: I'd recommend that you wait a bit
[15:33:35] ok
[15:33:59] andrewbogott: Can you ping me, if this is finished?
[15:34:20] Luke081515: there will be a general announcement when the upgrades are done
[15:34:29] here, or at the mailinglist?
[15:34:42] mailing list
[15:35:38] ok, thanks
[15:36:06] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2144560 (chasemp) As discussed on irc a bit this seems right to me. Thanks @bd808
[15:48:37] jynus: Currently here?
[15:50:32] question solved
[16:19:33] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2144661 (tom29739) @bd808 Does that stop links to external sites, or just the loading of javascript, images, etc. on the page from external sites?
[16:24:28] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:35:50] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2144709 (bd808) >>! In T129936#2144661, @tom29739 wrote: > @bd808 Does that stop links to external sites, or just the loading of javascript, images, etc. on the page f...
[16:36:46] Labs, Tool-Labs, Security-Team: Procure *.tools.wmflabs.org certificate - https://phabricator.wikimedia.org/T130649#2144726 (RobH) a:yuvipanda This is all done via the private procurement task. I'm assigning this back to you just so you are aware. The key file is in the private repo, with the publ...
[17:39:51] !log ores deployed ores-wikimedia-config:39b622e
[17:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[17:41:40] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2145156 (bd808)
[17:42:09] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2145172 (bd808)
[17:42:12] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2120419 (bd808)
[17:44:12] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2145156 (bd808) Marked as blocked by {T129936} because we need to determine what the official policy is before enabling enforcement. If the final policy incl...
[17:44:45] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2145180 (bd808) I created {T130748} to discuss the technical details of enforcement once we have decided here what the official policy actually is.
[18:15:46] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2145364 (chasemp) >>! In T130748#2145206, @valhallasw wrote: > We can also choose to monitor instead of blocking, by using https://www.w3.org/TR/CSP/#content...
[18:15:55] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2145366 (chasemp) p:Triage>Normal
[18:24:12] Labs: Add Content-Security-Policy header enforcing 3rd party web interaction restrictions to proxy responses - https://phabricator.wikimedia.org/T130748#2145423 (valhallasw) >>! In T130748#2145364, @chasemp wrote: > Honest question, what would we do once we had the list of things that would otherwise be blocke...
[19:28:21] PROBLEM - SSH on tools-bastion-02 is CRITICAL: Connection refused
[19:30:31] PROBLEM - Host tools-flannel-etcd-03 is DOWN: CRITICAL - Host Unreachable (10.68.22.169)
[19:54:39] RECOVERY - Host tools-flannel-etcd-03 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[19:58:51] PROBLEM - Host tools-webgrid-lighttpd-1409 is DOWN: CRITICAL - Host Unreachable (10.68.18.43)
[19:58:59] PROBLEM - Host tools-webgrid-lighttpd-1410 is DOWN: CRITICAL - Host Unreachable (10.68.18.44)
[19:59:01] PROBLEM - Host tools-bastion-02 is DOWN: CRITICAL - Host Unreachable (10.68.16.44)
[19:59:45] PROBLEM - Host tools-exec-1206 is DOWN: CRITICAL - Host Unreachable (10.68.17.105)
[19:59:55] PROBLEM - Host tools-exec-1408 is DOWN: CRITICAL - Host Unreachable (10.68.18.14)
[20:00:05] PROBLEM - Host tools-exec-1204 is DOWN: CRITICAL - Host Unreachable (10.68.17.88)
[20:00:41] PROBLEM - Host tools-k8s-etcd-02 is DOWN: CRITICAL - Host Unreachable (10.68.18.64)
[20:01:03] PROBLEM - Host tools-exec-1201 is DOWN: CRITICAL - Host Unreachable (10.68.17.49)
[20:01:08] PROBLEM - Host tools-exec-1218 is DOWN: CRITICAL - Host Unreachable (10.68.18.19)
[20:01:14] PROBLEM - Host tools-webgrid-lighttpd-1411 is DOWN: CRITICAL - Host Unreachable (10.68.17.51)
[20:01:44] PROBLEM - Host tools-exec-1202 is DOWN: CRITICAL - Host Unreachable (10.68.16.57)
[20:01:52] PROBLEM - Host tools-webgrid-generic-1405 is DOWN: CRITICAL - Host Unreachable (10.68.16.110)
[20:02:01] PROBLEM - Host tools-exec-1213 is DOWN: CRITICAL - Host Unreachable (10.68.17.252)
[20:02:08] PROBLEM - Host tools-puppetmaster-01 is DOWN: CRITICAL - Host Unreachable (10.68.22.61)
[20:02:13] PROBLEM - Host tools-webgrid-generic-1404 is DOWN: CRITICAL - Host Unreachable (10.68.18.53)
[20:02:29] PROBLEM - Host tools-exec-cyberbot is DOWN: CRITICAL - Host Unreachable (10.68.16.39)
[20:02:33] PROBLEM - Host tools-exec-1217 is DOWN: CRITICAL - Host Unreachable (10.68.18.20)
[20:02:39] PROBLEM - Host tools-exec-1209 is DOWN: CRITICAL - Host Unreachable (10.68.17.129)
[20:03:22] Hey folks. How's the rebooting going?
[20:03:58] early in the process, staging the kernel updates took longer than expected
[20:04:14] Gotcha. Will hold tight and blame ORES downtime on yall ;)
[20:04:20] PROBLEM - Puppet run on tools-worker-1003 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:04:36] *yall --> kernel exploits
[20:04:51] yall are kernel exploits ;)
[20:06:45] PROBLEM - Puppet run on tools-worker-1007 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:07:05] PROBLEM - Puppet run on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:07:43] Just need to make it clear: I <3 labs ops and happily deal with a little bit of downtime for the awesomeness of labs.
[20:08:03] * halfak mashes F5
[20:09:33] PROBLEM - Puppet run on tools-proxy-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:10:33] RECOVERY - Host tools-exec-1204 is UP: PING OK - Packet loss = 0%, RTA = 1.38 ms
[20:11:01] RECOVERY - Host tools-exec-1217 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[20:11:02] RECOVERY - Host tools-exec-1209 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[20:11:05] RECOVERY - Host tools-exec-1218 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[20:11:11] RECOVERY - Host tools-exec-cyberbot is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[20:11:27] RECOVERY - Host tools-webgrid-generic-1405 is UP: PING OK - Packet loss = 0%, RTA = 473.92 ms
[20:11:31] halfak: :D
[20:11:38] RECOVERY - Host tools-webgrid-lighttpd-1409 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms
[20:11:41] chasemp: I'm going to shut up shinken-wm
[20:11:46] thanks
[20:11:46] RECOVERY - Host tools-k8s-etcd-02 is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms
[20:11:47] RECOVERY - Host tools-exec-1202 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms
[20:11:54] RECOVERY - Host tools-exec-1213 is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms
[20:11:59] RECOVERY - Host tools-bastion-02 is UP: PING OK - Packet loss = 0%, RTA = 1.67 ms
[20:12:08] halfak: the downtime is harming your wikilabels script again, causing slow js again for all users
[20:12:15] RECOVERY - Host tools-webgrid-lighttpd-1410 is UP: PING OK - Packet loss = 0%, RTA = 0.96 ms
[20:12:15] RECOVERY - Host tools-exec-1206 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms
[20:12:17] you've promised to change it
[20:13:13] RECOVERY - Host tools-puppetmaster-01 is UP: PING OK - Packet loss = 0%, RTA = 0.52 ms
[20:29:12] i see, things not so hot
[20:29:31] planned maintenance, nuria :)
[20:29:53] haha, i did not mean to imply otherwise
[20:29:57] sjoerddebruin, thanks for the notice.
[20:30:09] I've been converting to hosting the JS on MediaWiki
[20:30:14] I'll update the instructions now. :)
[20:32:37] sjoerddebruin, https://meta.wikimedia.org/w/index.php?title=Wiki_labels&type=revision&diff=15471821&oldid=14953398
[20:32:47] \o/
[20:32:48] redis down on Tool Labs? I guess that's probably related to the planned maintenance?
[20:32:49] Please direct anyone having an issue to the updated instructions.
[20:32:56] sorry for the delay on this.
[20:33:15] np, thanks for taking care now
[20:33:52] MusikAnimal: yup, should be coming back up
[20:34:58] we have a lot of VM's
[20:35:34] thanks
[20:39:24] sjoerddebruin, new to-dos are to implement better messaging in the gadget so that it'll tell you when the server is down and can't load any campaigns.
[20:39:27] * halfak files tasks
[20:39:40] yeah, most errors are not so helpful :P
[20:40:04] I'll CC you on the tasks. It would be helpful to have you post about your experience.
[20:40:10] Should have it posted in a couple minutes.
[20:42:27] sjoerddebruin, fyi: https://phabricator.wikimedia.org/T130773 and https://phabricator.wikimedia.org/T130774
[20:42:45] great!
[20:47:14] (There was a space missing)
[20:49:22] are the reboots done, or is there still something to do?
[20:51:10] Luke081515: still ongoing
[20:51:55] Why are there lots of reboots today?
[20:52:33] tom29739: there's a kernel exploit affecting ubuntu trusty machines (at least)
[20:55:02] yuvipanda: What do you guess, how long will they take?
[20:55:09] * Luke081515 needs to create a trusty instnace
[20:55:12] *instance
[20:55:23] at least a couple of hours?
[21:01:28] bd808: Do you have an example role for me, with database changes? Maybe I'm able then to implement it at the role globalblocking ;)
[21:02:05] Luke081515: role::centralauth does similar things (provisions a database, touches all the wikis)
[21:03:58] bd808: Sorry, I forgot it, can you tell me the directory for extension roles again? :-/
[21:24:35] bd808: Ok, I found it
[21:47:45] * tom29739 pokes stashbot.
[21:48:13] bd808: the ES cluster is going to restart soon
[21:48:17] or is probably already restarting
[21:48:33] yuvipanda: *nod* it should be fine but I'll check on it later
[21:49:04] ok!
[21:57:27] I'm getting a 502 Bad Gateway for http://swis.wmflabs.org/ Is this due to ongoing work or does someone need to kick something?
[21:58:13] best to wait until the reboots are completed before kicking things :)
[22:00:02] * tom29739 shakes his fist at tools-bastion-02
[22:00:25] what did it do?