[00:19:12] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:45:00] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#2266224 (scfc) @chasemp: Thanks. Does that mean that some Labs instance(s) are effectively causing the outages? Could you identify those with your debugging? While it is nice (and necessary) to armor...
[01:07:21] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2266279 (mmodell) Looks like the conduit API is in place now: https://phabricator.wikimedia.org/conduit/met...
[01:08:44] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2266280 (mmodell) Also, https://phabricator.wikimedia.org/conduit/method/user.ldapquery/ is live on this in...
[01:54:46] Labs, Tool-Labs, Community-Tech-Tool-Labs, Diffusion, User-bd808: Create application to manage Diffusion repositories for a Tool Labs project - https://phabricator.wikimedia.org/T133252#2266309 (bd808) >>! In T133252#2266279, @mmodell wrote: > Looks like the conduit API is in place now: https...
[02:03:46] Labs, Tool-Labs, labs-sprint-117, Community-Tech-Tool-Labs, and 6 others: Organize a (annual?) toollabs survey - https://phabricator.wikimedia.org/T95155#2266311 (bd808) Open>Resolved At long last the detailed results have been [[https://meta.wikimedia.org/wiki/Research:Annual_Tool_Labs_S...
[04:46:25] Tool-Labs, Patch-For-Review: Install xml2 on labs - https://phabricator.wikimedia.org/T134146#2266484 (scfc)
[05:02:53] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266490 (eranroz) I agree though I'm not sure how to do it technically (am I allowed to grant access to the database to other users?) I would like to GRANT ALL PRIVI...
[06:12:31] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266541 (jcrespo) > I agree though I'm not sure how to do it technically (am I allowed to grant access to the database to other users?) Do not worry, this only neede...
[06:20:42] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266543 (jcrespo) From IRC: ``` Niharika: which database? I'm not finding anything on tools-db that is obviously owned by s52615. bd808: "s51306__...
[06:20:50] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266544 (jcrespo)
[06:21:00] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2264219 (jcrespo) p:Triage>Normal
[06:23:05] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266563 (jcrespo) s51306__copyright_p only exists on labsdb1001.
[06:32:55] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266572 (jcrespo) Open>Resolved a:jcrespo @Niharika your grants (for s52615) have been updated to include: ``` GRANT ALL PRIVILEGES ON `s51306\_\_copyrig...
[08:56:53] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266758 (Niharika) >>! In T134392#2266572, @jcrespo wrote: > @Niharika your grants (for s52615) have been updated to include: > > ``` > GRANT ALL PRIVILEGES ON `s513...
[09:08:31] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266780 (jcrespo) > Where exactly am I supposed to run this command? You are not supposed to run anything, I did it already (that is why it is resolved). You already...
[09:09:17] Labs, Labs-Infrastructure, DBA: Add shared edit rights between eranbot and plagiabot tools - https://phabricator.wikimedia.org/T134392#2266781 (Niharika) >>! In T134392#2266780, @jcrespo wrote: >> Where exactly am I supposed to run this command? > > You are not supposed to run anything, I did it alr...
[09:30:12] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:31:04] !log wikilabels restarting uwsgi-wikilabels-web manually to reset db connections
[09:31:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[09:33:14] hmm
[09:34:00] not sure why now
[09:35:06] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 824703 bytes in 4.236 second response time
[09:35:07] Tool-Labs, Patch-For-Review: Install xml2 on labs - https://phabricator.wikimedia.org/T134146#2266799 (valhallasw) Open>Resolved
[09:50:08] YuviPanda: xml2 is a pretty fun tool to have :D
[09:52:25] generally better than grepping xml
[09:53:02] :D nice
[09:53:05] Labs, Tool-Labs, labs-sprint-117, Community-Tech-Tool-Labs, and 6 others: Organize a (annual?) toollabs survey - https://phabricator.wikimedia.org/T95155#2266817 (Qgil)
[09:53:15] although, i've never had to grep xml outside of gridengine...
[09:54:00] I have. But generally in Python, so no need for xml2 there
[09:54:03] right
[09:54:09] but nice to discover useful tools :D
[09:54:12] * valhallasw`cloud shudders thinking back of SOAP
[10:58:41] bd808: Nice to see the results of the survey are finally out! :-D
[13:11:45] !log tools cherry-pick https://gerrit.wikimedia.org/r/#/c/280652/ on puppetmaster
[13:11:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[13:19:02] godog: \o/
[13:20:38] YuviPanda: \o/ indeed! also mind if I allow tcp/9090 on tools on default security group for prometheus-node-exporter ?
[13:25:01] godog: sure. but inside just tools you don't need security groups for it, right?
[13:25:09] godog: but if you want it to be exported outside, yeah, feel free to
[13:27:09] YuviPanda: ah heh what I wanted to do is also host-level monitoring inside the tools project, for that we'd need prometheus-node-exporter installed on the tools instances, which listens on :9090 and then prometheus polls it
[13:27:25] godog: you don't need to touch security groups for that, yeah.
[13:27:40] godog: yeah, is there a role for it? tools-worker-**** are also all on the same puppetmaster so you can apply the role to them via wikitech
[13:29:29] yeah, the role is under another code review, ok thanks! node-exporter can wait but I'll see if I can get the k8s discovery going in the next half an hour
[13:31:00] godog: \o/ awesome.
[13:31:14] godog: let me know if you need auth help
[13:38:00] for sure! thanks
[13:38:28] godog: yw. also, if you run something on tools-k8s-master-01 (which is also under this puppetmaster) you can hit k8s master on localhost and don't need auth
[13:59:44] YuviPanda: ack! yeah where are the tokens again?
I remember you created a readonly one for me a while ago
[14:00:10] Labs, DBA, Patch-For-Review: Move labs pdns database off of m5-master - https://phabricator.wikimedia.org/T128737#2267118 (Andrew)
[14:05:30] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[14:07:45] PROBLEM - Puppet run on tools-worker-1007 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[14:09:37] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:10:58] Labs, DBA, Patch-For-Review: Move labs pdns database off of m5-master - https://phabricator.wikimedia.org/T128737#2267137 (Andrew)
[14:34:37] godog: yeah, they're in the puppetmaster. check /var/lib/git/labs/secret
[14:36:24] godog: I created a 'godog' account of 'infrastructure-readonly'. You can use that, or just add another account in the hieradata/common.yaml file there. infrastructure will give it write / read access to everything, infrastructure-readonly will give just read
[14:45:28] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:46:37] YuviPanda: *nod* thanks a lot! I reused the readonly one for 'filippo' which was also infrastructure-readonly, looks like it is working!
[14:47:09] YuviPanda: polling kubelets on tcp/10255
[14:47:32] godog: omg \o/
[14:47:36] godog: is there a web interface?
[14:47:57] RECOVERY - Puppet run on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:49:13] YuviPanda: yeah probably the easiest now is if you ssh -L9090:localhost:9090 tools-prometheus-01.eqiad.wmflabs
[14:49:45] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:49:55] godog: hmm, connection denied?
[14:50:03] ah, nvm
[14:51:08] godog: loading but really slowly.
half is my connection I suppose
[14:52:04] YuviPanda: but by itself the web interface isn't very interesting, most of the interesting things come from querying the metrics, e.g. kubelet_running_container_count
[14:53:05] godog: woo, this is awesome
[14:53:08] chasemp: ^
[14:53:10] valhallasw`cloud: ^
[14:53:19] hmmmm?
[14:53:41] * valhallasw`cloud tries
[14:53:43] valhallasw`cloud: metrics/monitoring for k8s with a web interface :)
[14:55:01] godog: so we can plug this into alertmanager as well I suppose
[14:55:06] ha, cool.
[14:55:43] valhallasw`cloud: look at container_memory_rss for example
[14:56:22] so we can do things like 'find all containers with memory usage > x'
[14:56:24] with this
[14:56:51] sounds good
[14:56:51] YuviPanda: yup, I haven't tried it yet heh, also hooking it up to grafana
[14:56:59] godog: wooo, awesome.
[14:57:58] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[14:57:58] PROBLEM - Puppet run on tools-exec-1219 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[14:58:23] uh-oh
[14:58:48] godog: you can expose it via a webproxy I guess
[14:58:49] that DNS time of the day again?
[14:58:55] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: DNS lookup failed for tools-redis-1001.tools.eqiad.wmflabs Resolv::DNS::Resource::IN::A at /etc/puppet/modules/toollabs/manifests/init.pp:184 on node tools-webgrid-generic-1401.tools.eqiad.wmflabs
[14:59:40] ^ andrewbogott bad news?
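The 'find all containers with memory usage > x' idea from 14:56 can be sketched in Python. This is only an illustration, not what was run on tools-prometheus-01: it assumes the result has the shape of a Prometheus instant-query JSON response, and the sample data, the `name` label, the pod names and the helper `containers_over` are all hypothetical. In PromQL itself a comparison such as `container_memory_rss > x` would probably do the same filtering server-side.

```python
# Hypothetical, hand-written sample shaped like a Prometheus instant-query
# response for container_memory_rss. Pod names and values are made up, and
# the "name" label is an assumption (real label names depend on the exporter).
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "container_memory_rss", "name": "pod-a"},
             "value": [1462460182.0, "52428800"]},    # 50 MiB
            {"metric": {"__name__": "container_memory_rss", "name": "pod-b"},
             "value": [1462460182.0, "209715200"]},   # 200 MiB
            {"metric": {"__name__": "container_memory_rss", "name": "pod-c"},
             "value": [1462460182.0, "314572800"]},   # 300 MiB
        ],
    },
}


def containers_over(response, threshold_bytes):
    """Return (name, rss_bytes) pairs above the threshold, largest first."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    hits = []
    for sample in response["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, raw_value = sample["value"]
        rss = float(raw_value)
        if rss > threshold_bytes:
            hits.append((sample["metric"].get("name", "<unnamed>"), rss))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, rss in containers_over(SAMPLE_RESPONSE, 100 * 1024 * 1024):
        print(f"{name}: {rss / (1024 * 1024):.0f} MiB")
```

Filtering client-side like this is mainly useful when the threshold varies per caller; for a fixed alert the comparison would more naturally live in the query or in an alerting rule.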
[14:59:46] valhallasw`cloud: can you run puppet again
[14:59:48] * YuviPanda loads https://grafana.wikimedia.org/dashboard/db/labs-dns-dashboard
[15:00:00] * valhallasw`cloud nods
[15:00:02] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:00:15] chasemp: I see everything has fallen over to holmium now
[15:00:41] Connection initially died with ' Write failed: Broken pipe' -- not sure if that's my internet or something related.
[15:00:46] * valhallasw`cloud is running puppet now
[15:00:54] just a second, this is a bit of a tangle
[15:00:59] recursor works for me now
[15:01:07] YuviPanda: andrewbogott is in the middle of maint but I'm not sure if this is part of it...I'm guessing yes (and I'm in a meeting atm)
[15:01:13] ok
[15:01:30] PROBLEM - Puppet run on tools-exec-1213 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[15:02:42] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[15:02:54] PROBLEM - Puppet run on tools-exec-1202 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:03:22] hmm
[15:03:56] it works for me right now, so I suppose this is the tangle from the failover
[15:04:00] labs-recursor1.wikimedia.org seems to be down
[15:04:02] * YuviPanda doesn't touch anything
[15:04:04] chasemp: yup
[15:04:39] I have an idea of why maybe but I don't entirely get the effect
[15:04:48] andrewbogott: I think this is fallout from removal of 208.80.155.118
[15:04:55] yep
[15:05:03] ....why?
[15:05:37] (puppet is really slow on tools-webgrid-generic-1401 ; still hanging after 'loading facts')
[15:05:47] I don't know. My only guess so far is that the recursors are swapped, recursor0 is on holmium and recursor1 on labservices?
[15:05:55] but, setting that aside for now
[15:06:42] andrewbogott: can I enable and run puppet to see if it's added?
[15:06:44] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:06:45] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:06:59] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:06:59] chasemp: all done
[15:07:02] I see it now :)
[15:07:03] (dig continues to work, so I'm still pinning these failures on the failover itself)
[15:07:06] the .118 IP is back
[15:07:11] and axfr is working with that udp change
[15:07:15] nice
[15:07:28] so we have a mystery but i /think/ everything is working at the moment
[15:07:37] I have one more test to run before I can close the window
[15:07:44] yeah I really didn't think xfr was over udp but hey makes sense on fix and failure at least
[15:07:48] yep
[15:07:56] if you don't mind give me a moment to run through things before calling it
[15:08:01] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:09:42] ok, new instance dns working as well
[15:09:43] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first PHP webservice" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134493#2267264 (bd808)
[15:09:53] so, chasemp, everything looks good to me (apart from the .118 mystery)
[15:09:56] for you?
[15:11:18] Labs, DBA, Patch-For-Review: Move labs pdns database off of m5-master - https://phabricator.wikimedia.org/T128737#2267283 (Andrew) This required the emergency application of https://gerrit.wikimedia.org/r/#/c/287093/ (axfr on udp, who knew?) but otherwise went as expected.
[15:11:18] does this work for you
[15:11:18] dig tools-bastion-03.eqiad.wmflabs @labs-recursor1.wikimedia.org
[15:11:27] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:11:33] (from inside labs)
[15:11:51] chasemp: yup
[15:12:04] seems to
[15:12:30] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first Python webservice" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134494#2267284 (bd808)
[15:12:51] yep all seems well
[15:12:57] oh, I see why puppet is so slow. It's trying to read the world from my home directory? stat("/home/valhallasw/.gem/ruby/1.9.1/gems/web-console-2.2.1/lib/puppet/type/monitoring::service.rb", 0x7ffcc48768e0) = -1 ENOENT (No such file or directory) ... fun with gems.
[15:13:17] haha awwruby but that also seems terrible meaning you can compromise everything via paths
[15:13:37] andrewbogott: call it?
[15:13:44] sudo -i puppet agent -tv indeed works /much/ faster
[15:13:50] um… I forgot to test domain creation, trying that now
[15:13:54] yep ok
[15:14:00] (better test deletion too)
[15:14:09] different db perms etc
[15:14:27] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first Pywikibot bot" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134495#2267300 (bd808)
[15:14:42] * valhallasw`cloud wonders why the -i flag is not standard for sudo
[15:15:51] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first Python webservice" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134494#2267284 (valhallasw) https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Python_application_stub is an example of this, but not as acce...
[15:15:57] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first Pywikibot bot" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134495#2267300 (yuvipanda) Steal some from https://www.mediawiki.org/wiki/Manual:Pywikibot/PAWS_walk-through. Also I want us to support crons in...
[15:16:14] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first Pywikibot bot" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134495#2267328 (yuvipanda) and interactive bots can already be run there and are being run there
[15:16:36] godog: btw, the prometheus interface is all readonly right? so we can safely expose it to the world?
[15:18:22] chasemp: I'm going to have to hack on this for a bit, but in the meantime instance creation/deletion and CI should be fine.
[15:18:35] andrewbogott: what's the deal?
[15:18:38] ok
[15:18:41] I don't know yet
[15:18:46] just domain creation is broken somehow
[15:18:53] which makes sense, since it's a different code path from records
[15:18:57] hm right
[15:19:04] so the pool_manager inserts
[15:19:05] are not working
[15:20:19] @seen volans
[15:20:19] Cyberpower678: Last time I saw volans they were talking in the channel, they are still in the channel #wikimedia-databases at 5/5/2016 3:03:42 PM (16m37s ago)
[15:20:27] Labs, Tool-Labs, Community-Tech-Tool-Labs, Documentation: Create a "my first Pywikibot bot" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134495#2267300 (jayvdb) As an aside, I think we should be trying to enable the "my first Pywikibot bot" experience to be within #PAWS. For that,...
[15:20:40] andrewbogott: ok let me know how I can help, I'm resetting the topic since this is an edge case and we don't expect DNS to have stability issues
[15:20:47] yep, agreed
[15:20:58] Labs, Tool-Labs: Web requests fail after a period of time - https://phabricator.wikimedia.org/T133090#2267350 (Nettrom) I'm not sure if this ticket should be closed now. While I rarely experience issues with accessing my web services, the problem with the server being restarted appears to persist. I wasn...
[15:23:28] YuviPanda: yeah it is effectively r/o, though by default also go's /debug/ is exposed, so you could e.g. run 'go tool pprof http://localhost:9090/debug/pprof/heap'
[15:24:03] godog: ouch, I see. so shall we put it in front of an nginx that denies /debug? or is that harmless?
[15:24:09] I'm not sure how harmful that is
[15:26:40] yeah I don't know either, possibly it isn't but haven't looked closely yet, also if it is hooked up to grafana we don't need the web interface proxied/exposed
[15:27:27] godog: right. but can we hook it up to grafana.wikimedia.org without exposing it to the internet?
[15:28:59] yes and no, we could reverse-proxy it and deny clients not coming from wmf's networks
[15:32:59] RECOVERY - Puppet run on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:33:50] godog: can you take care of that? that'd be awesome
[15:39:56] andrewbogott: is this newly bad?
[15:39:59] rush@tools-bastion-03:~$ host 10.68.16.187 | wc -l
[15:39:59] 41
[15:40:00] or same old
[15:40:29] don't know
[15:40:39] I don't recognize that IP, is it a proxy?
[15:40:43] Or something that's leaking?
[15:41:05] it's CI
[15:41:08] rush@tools-bastion-03:~$ host 10.68.16.187 | awk '{print $5}' | sort | uniq | grep 'ci-' | wc -l
[15:41:08] 40
[15:41:14] ci-jessie-wikimedia-87247.contintcloud.eqiad.wmflabs.
[15:41:14] ci-jessie-wikimedia-88992.contintcloud.eqiad.wmflabs.
[15:41:14] ci-jessie-wikimedia-89288.contintcloud.eqiad.wmflabs.
[15:41:16] ci-jessie-wikimedia-90080.contintcloud.eqiad.wmflabs.
[15:41:18] ci-jessie-wikimedia-91103.contintcloud.eqiad.wmflabs.
[15:41:20] ci-jessie-wikimedia-91181.contintcloud.eqiad.wmflabs.
[15:41:22] ci-trusty-wikimedia-85713.contintcloud.eqiad.wmflabs.
[15:41:24] ci-trusty-wikimedia-87718.contintcloud.eqiad.wmflabs.
[15:41:26] ci-trusty-wikimedia-88194.contintcloud.eqiad.wmflabs.
[15:41:28] etc
[15:41:31] RECOVERY - Puppet run on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:41:37] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:41:54] definitely seems leaky?
[15:41:59] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:42:48] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:43:04] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:43:04] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:45:26] chasemp: there have been leaks, historically, but not ongoing as far as I know. Can you tell, is that number growing right now?
[15:46:02] seems not to be, and I'm not sure when it's from but it seemed worth mentioning
[15:46:38] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:18:35] Labs, DBA, Patch-For-Review: Move labs pdns database off of m5-master - https://phabricator.wikimedia.org/T128737#2267514 (Andrew)
[16:19:26] Labs: Why is the IP for labs-recursor1 assigned to two hosts? - https://phabricator.wikimedia.org/T134501#2267515 (Andrew)
[16:20:00] andrewbogott: morning!
[16:20:15] andrewbogott: there is an outstanding labvirt1008 disk space warning for the past 8d, in case you weren't aware of it
[16:20:39] paravoid: I'll try to look at that today, thanks for the reminder.
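The PTR-leak check from 15:41 (`host 10.68.16.187 | awk '{print $5}' | sort | uniq | grep 'ci-' | wc -l`) can be approximated in Python for anyone who wants to track whether the number is growing. This is a rough sketch rather than the command that was run: it assumes the usual `host` output layout where the pointer name is the fifth whitespace-separated field, the helper name `count_ci_names` is hypothetical, and the sample text reuses two hostnames from the log plus one made-up non-CI name.

```python
def count_ci_names(host_output):
    """Count distinct PTR names containing 'ci-' in `host <ip>` output.

    Mirrors: awk '{print $5}' | sort | uniq | grep 'ci-' | wc -l
    """
    names = set()
    for line in host_output.splitlines():
        fields = line.split()
        # Field 5 (index 4) is the pointer target in lines such as
        # "187.16.68.10.in-addr.arpa domain name pointer <name>."
        if len(fields) >= 5 and "ci-" in fields[4]:
            names.add(fields[4])
    return len(names)


# Sample input: two names are taken from the log above, the duplicate line
# and "some-other-host" are invented to exercise the de-duplication.
SAMPLE = """\
187.16.68.10.in-addr.arpa domain name pointer ci-jessie-wikimedia-87247.contintcloud.eqiad.wmflabs.
187.16.68.10.in-addr.arpa domain name pointer ci-jessie-wikimedia-87247.contintcloud.eqiad.wmflabs.
187.16.68.10.in-addr.arpa domain name pointer ci-trusty-wikimedia-85713.contintcloud.eqiad.wmflabs.
187.16.68.10.in-addr.arpa domain name pointer some-other-host.eqiad.wmflabs.
"""

if __name__ == "__main__":
    print(count_ci_names(SAMPLE))
```

Running the same count at intervals and comparing the results would be one way to answer the "is that number growing right now?" question without eyeballing the list.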
[16:20:43] * andrewbogott wants more hardware
[16:21:47] :)
[16:24:14] i dunno if it's just something old that's somehow still on my labs VM, but `apt-get update` is timing out trying to contact webproxy.eqiad.wmnet:8080
[16:25:22] ebernhardson: iirc that is not meant to happen anymore, update puppet?
[16:26:05] yup, what chasemp said
[16:26:30] chasemp: ok i'll dig into it some. It seems this group of machines with a custom puppet master has fallen out of sync (although the master looks up to date with prod branch). Will have some fun :)
[16:33:05] Labs, Team-Practices, Privacy: http://hatjitsu.wmflabs.org loads resources from numerous 3rd party sites - https://phabricator.wikimedia.org/T134288#2267565 (MBinder_WMF) Android, iOS, Reading Web, and Analytics Engineering all use this tool (to name a few of which I am aware). I'm happy to help coor...
[16:45:44] looks like if apt::use_proxy was turned on at some point and then turned off, there is no ensure => absent to kill the file. I just deleted it and re-ran puppet, it didn't come back.
[17:20:54] PROBLEM - Host tools-worker-1011 is DOWN: PING CRITICAL - Packet loss = 100%
[17:23:28] (PS13) BryanDavis: Rewrite jsub in python [labs/toollabs] - https://gerrit.wikimedia.org/r/285435 (https://phabricator.wikimedia.org/T132475)
[17:24:35] YuviPanda: http://wdq.wmflabs.org/stats gives Times : - 2015-06-03T11:28:31Z
[17:26:03] (CR) BryanDavis: "PS13 suppresses most supported qsub arguments from `--help` output:" [labs/toollabs] - https://gerrit.wikimedia.org/r/285435 (https://phabricator.wikimedia.org/T132475) (owner: BryanDavis)
[17:31:49] Labs, Team-Practices, Privacy: http://hatjitsu.wmflabs.org loads resources from numerous 3rd party sites - https://phabricator.wikimedia.org/T134288#2267691 (bd808) >>! In T134288#2267565, @MBinder_WMF wrote: > Android, iOS, Reading Web, and Analytics Engineering all use this tool (to name a few of w...
[17:43:53] Labs, Tool-Labs: Create Labs project and Puppet magic to do redirects for things that have moved to Tool Labs - https://phabricator.wikimedia.org/T134508#2267756 (bd808)
[17:44:08] Labs, Tool-Labs: Create Labs project and Puppet magic to do redirects for things that have moved to Tool Labs - https://phabricator.wikimedia.org/T134508#2267771 (bd808)
[18:03:15] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:06:59] ^well it's up for me atm anyway
[18:07:35] but it is ungodly slow
[18:08:07] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 824673 bytes in 4.059 second response time
[18:20:30] Any reason why the Java on labs hasn't been upgraded to 1.8?
[18:22:16] *tool labs
[18:24:32] Matthew_: long story recorded in https://phabricator.wikimedia.org/T121279
[18:24:38] and tasks linked from there
[18:25:00] tldr is we run ubuntu trusty which has no java8 packages and we do not have the manpower to maintain them. it'll be available when we make kubernetes available
[18:25:20] andrewbogott, Krenair: I hear you have a test version of wikitech for testing OpenStackManager stuff. Any chance I could get you to see if https://gerrit.wikimedia.org/r/#/c/286704/ + https://gerrit.wikimedia.org/r/#/c/286706/ work on it? And if you really want to help test stuff, https://gerrit.wikimedia.org/r/#/c/280945/ (and the patches leading up to it) + https://gerrit.wikimedia.org/r/#/c/286705/ (and the patches leading up to it) +
[18:25:20] https://gerrit.wikimedia.org/r/#/c/286706/ + https://phabricator.wikimedia.org/P3008?
[18:25:59] YuviPanda: Okay. Because all of the recommended packages for editing en.wp are Java8 only...
[18:26:11] anomie, you know how we stage things on mw1017?
[18:26:54] YuviPanda: https://en.wikipedia.org/wiki/Wikipedia:Creating_a_bot#Java this page. So yeah. No worries.
[18:27:17] Krenair: I know /src/mediawiki has the current deployment branches that we can live-hack then access with cool browser extensions to set an HTTP header
[18:27:18] ok! :) hopefully we'll have a java8 webservice setup in a month and something
[18:27:55] anomie, well you can do something similar using labtestweb2001.codfw.wmnet + labtestwikitech.wikimedia.org (no funny http headers, it's a separate wiki served by a single host like normal wikitech)
[18:28:12] Okay! And meanwhile I can maybe downgrade to 7 temporarily, if I can find a package.
[18:28:16] YuviPanda: Installing openjdk-8-jre-headless on the jessie hosts won't do it? It looks like the package is available.
[18:28:29] anomie: indeed, if we had any jessie hosts.
[18:28:42] YuviPanda: Aren't tools-exec-14* all jessie?
[18:28:47] anomie: gridengine was removed from jessie for being abandonware
[18:28:52] anomie: nope they're all trusty
[18:29:09] YuviPanda: Oh. But apt-cache show openjdk-8-jre-headless still says the package is available?
[18:29:12] hence the '14'
[18:29:14] since they're all ubuntu 14.04 :)
[18:29:34] * anomie got confused because cat /etc/debian_version said "jessie/sid"
[18:30:30] anomie: I think it was setup by valhallasw`cloud or moritzm at some point as a testing setup, but was soon abandoned because it was a lot of work. that's the 'we do not have the manpower to maintain it' part, I think.
[18:30:40] anomie, actually my bad, it's not .codfw.wmnet, it has a public IP so it's just labtestweb2001.wikimedia.org, but serving labtestwikitech.wikimedia.org
[18:31:03] anomie: I know that we're maintaining java8 versions for jessie though, so maybe they've recently added it for trusty? I'm unsure, but think it's unlikely
[18:35:03] Krenair: ssh: connect to host labtestweb2001.wikimedia.org port 22: Connection timed out
[18:35:16] anomie, are you going via the production bastion?
[18:35:27] * anomie will try that
[18:35:36] yeah it won't work unless you use the bastion
[18:35:43] same as silver
[19:06:34] anomie: I am sort of here but it looks like you have what you need already
[19:06:54] andrewbogott: For the moment at least. Thanks.
[19:16:15] Labs, Tool-Labs, Tracking: [Tracking] Tools that should get deleted - https://phabricator.wikimedia.org/T133777#2268208 (Danny_B)
[19:17:17] Tool-Labs-tools-Other, Community-Tech, I18n: Add jQuery i18n to Pageviews Analysis - https://phabricator.wikimedia.org/T133766#2268213 (MusikAnimal) Open>Resolved a:MusikAnimal @Purodha Should be all set! Feel free to use `{{PLURAL}}` and `{{GENDER}}` for any messages that have parameters...
[19:21:35] Labs, Operations, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2268224 (Danny_B)
[19:23:27] PROBLEM - SSH on tools-pastion-01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:25:10] Labs, Tool-Labs, Tracking: Overhaul logging setup for Tools (Tracking) - https://phabricator.wikimedia.org/T127367#2268230 (Danny_B)
[19:25:17] Labs, Tracking: Measure capacity and utilization of labs services (Tracking) - https://phabricator.wikimedia.org/T107066#2268232 (Danny_B)
[19:29:33] (PS1) Dzahn: mw_rc_irc: add "secret" files without real secret data [labs/private] - https://gerrit.wikimedia.org/r/287136
[19:29:54] Labs, Tracking: [Tracking] Create labtest cluster - https://phabricator.wikimedia.org/T120293#2268239 (Danny_B)
[19:34:10] (PS2) Dzahn: mw_rc_irc: add "secret" files without real secret data [labs/private] - https://gerrit.wikimedia.org/r/287136
[19:35:10] (CR) Dzahn: [C: 2 V: 2] mw_rc_irc: add "secret" files without real secret data [labs/private] - https://gerrit.wikimedia.org/r/287136 (owner: Dzahn)
[19:35:15] Labs, Labs-Sprint-104, Tracking: Recover files from old corrupted file system (Tracking) -
https://phabricator.wikimedia.org/T104334#2268257 (Danny_B)
[19:35:20] Labs, Tracking: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#2268259 (Danny_B)
[19:44:20] Labs, Tracking: Find replacements for various things that people were using NFS for but should not have been (Tracking) - https://phabricator.wikimedia.org/T104193#2268271 (Danny_B)
[19:44:23] Labs, Tool-Labs, Puppet, Tracking: Fully puppetize Grid Engine (Tracking) - https://phabricator.wikimedia.org/T88711#2268273 (Danny_B)
[19:48:35] Labs, Labs-Sprint-108, Labs-Sprint-109, Labs-Sprint-114, and 2 others: Have catchpoint checks for all labs services (Tracking) - https://phabricator.wikimedia.org/T107058#2268292 (Danny_B)
[19:59:28] Labs, Tracking: Fix often reported problems from the Tool Labs Survey (Tracking) - https://phabricator.wikimedia.org/T114442#2268301 (Danny_B)
[19:59:30] Labs, Tracking: Eliminate SPOFs in Labs infrastructure (Tracking) - https://phabricator.wikimedia.org/T105723#2268302 (Danny_B)
[20:00:38] Has xdebug been enabled on tools recently? I'm getting this warning: 'You are running composer with xdebug enabled. This has a major impact on runtime performance. See https://getcomposer.org/xdebug' when I run composer, which I wasn't before.
[20:07:45] Labs-Other-Projects, Discovery, Maps, Tracking: [tracking] OSM on Labs - https://phabricator.wikimedia.org/T60797#2268328 (Danny_B)
[20:10:08] Labs, Labs-Infrastructure, Tracking: Get SSL certificates for wmflabs.org (tracking) - https://phabricator.wikimedia.org/T57957#2268336 (Danny_B)
[20:10:10] Labs, Labs-Infrastructure, Tracking: Upgrade OpenStack to the Folsom release (tracking) - https://phabricator.wikimedia.org/T48817#2268337 (Danny_B)
[20:43:01] Labs, DBA, Patch-For-Review: Move labs pdns database off of m5-master - https://phabricator.wikimedia.org/T128737#2268415 (Andrew) Remaining tasks: [] wait a while [] verify that there's no longer any traffic to the 'pdns' database on m5-master [] clean up
[20:44:50] Labs, Team-Practices, Privacy: http://hatjitsu.wmflabs.org loads resources from numerous 3rd party sites - https://phabricator.wikimedia.org/T134288#2268420 (MBinder_WMF) @bd808 suh-weet
[21:07:14] YuviPanda: openjdk8 for trusty? Tried compiling, but no success
[21:07:21] Failing test cases
[21:07:46] But maybe it's in backports by now?
[21:23:41] Labs: Why is the IP for labs-recursor1 assigned to two hosts? - https://phabricator.wikimedia.org/T134501#2268541 (Andrew) I have this sorted out now, but it's confusing. labservices1001 runs labs-ns0, aka 208.80.155.117 labservices1001 runs labs-recursor1, aka 208.80.155.118 holmium runs labs-ns1, aka 208...
[21:26:14] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#2268543 (Andrew) An important fact (which i knew, but forgot, and just now re-learned as part of fixing T134501): labservices1001 is labs-ns0 but labs-recursor1 holmium is labs-ns1 but labs-recursor0...
[21:26:43] Labs, Patch-For-Review: Periodic internal labs dns outages - https://phabricator.wikimedia.org/T124680#2268546 (Andrew)
[21:26:45] Labs: Why is the IP for labs-recursor1 assigned to two hosts?
- https://phabricator.wikimedia.org/T134501#2267515 (10Andrew) 05Open>03Resolved [21:27:39] (03PS1) 10Mattflaschen: Add wildcard for Collaboration team (Collab-Team(-.*)?) [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/287146 [21:29:10] (03CR) 10Quiddity: [C: 031] "lgtm" [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/287146 (owner: 10Mattflaschen) [21:34:53] (03CR) 10Legoktm: [C: 032] Add wildcard for Collaboration team (Collab-Team(-.*)?) [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/287146 (owner: 10Mattflaschen) [21:35:26] (03Merged) 10jenkins-bot: Add wildcard for Collaboration team (Collab-Team(-.*)?) [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/287146 (owner: 10Mattflaschen) [21:36:59] !log tools.wikibugs Updated channels.yaml to: 6e0a2d140f923140c73b0dd5a356816a1cb47aa5 Add wildcard for Collaboration team (Collab-Team(-.*)?) [21:37:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master [21:58:13] 06Labs, 10Tool-Labs, 15User-bd808: Create Labs project and Puppet magic to do redirects for things that have moved to Tool Labs - https://phabricator.wikimedia.org/T134508#2268607 (10bd808) a:03bd808 [22:15:44] hi I read something about potential dns problems today on the mailing list. Does that mean that I can not access any external servers from my instances? [22:30:12] physikerwelt: I can actually access them [22:30:22] at my instances, but on labs, not tools [22:43:29] physikerwelt: the dns changes should be all done now. Are you actively having dns lookup problems? [22:56:13] bd808: no the problem is that I get 404 even with ips, not even apt-get update works [22:56:37] well that's not good. Which project? Math? 
[22:56:44] yes [22:56:57] it's the mathosphere instance [22:57:25] bd808: I need to leave the building , I'll create a phabricator ticket [22:57:32] good plan [22:58:30] !log math Joined project as admin [22:58:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Math/SAL, Master [23:00:26] physikerwelt: `curl -Lv google.com` works for me on mathosphere. I guess I need more info on what's busted [23:19:26] 06Labs, 06Team-Practices, 07Privacy: http://hatjitsu.wmflabs.org loads resources from numerous 3rd party sites - https://phabricator.wikimedia.org/T134288#2268728 (10bd808) A redirect has been setup, so http://hatjitsu.wmflabs.org now redirects to https://tools.wmflabs.org/hatjitsu/ @MBinder_WMF Let me know... [23:20:55] MaxSem: I'm going to kill the jitsu instance in the mobile project now. I've got a replacement running on Tool Labs [23:21:28] bd808, thanks for doing this! [23:21:33] !log redirects Configured redirect for hatjitsu.wmflabs.org [23:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Redirects/SAL, Master [23:22:26] !log mobile Deleted jitsu instance. Replaced with https://tools.wmflabs.org/hatjitsu/ [23:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mobile/SAL, Master [23:23:30] MaxSem: yw [23:40:25] 06Labs: mathosphere.math.eqiad.wmflabs does not respond irregulary - https://phabricator.wikimedia.org/T120637#2268757 (10Physikerwelt) 05Open>03Resolved The problem does not occur anymore. [23:41:57] 06Labs, 10Math, 10MathSearch, 10Mathoid: Fix 503 on Help:Forumla purge - https://phabricator.wikimedia.org/T104549#2268761 (10Physikerwelt) [23:51:32] 06Labs, 10Tool-Labs, 13Patch-For-Review, 15User-bd808: Create Labs project and Puppet magic to do redirects for things that have moved to Tool Labs - https://phabricator.wikimedia.org/T134508#2268803 (10bd808) Project created and configured to handle hatjitsu.wmflabs.org. See https://wikitech.wikimedia.org...
[23:52:51] 06Labs, 10Tool-Labs, 13Patch-For-Review, 15User-bd808: Create Labs project and Puppet magic to do redirects for things that have moved to Tool Labs - https://phabricator.wikimedia.org/T134508#2268821 (10bd808) @yuvipanda all that is left to do here is merge the prod no-op Puppet patch at https://gerrit.wik... [23:54:14] 06Labs, 10Math, 10MathSearch, 10Mathoid: Set up an domain name for the instance - https://phabricator.wikimedia.org/T134539#2268829 (10Physikerwelt) [23:58:54] bd808: thank you for testing, the problem occurs only from within the Vagrant instance