[00:32:44] YuviPanda|zzz: you were.
[00:33:17] Not typing, i thought. Anyway, I'm also sleeping now :-P
[00:35:42] Coren: ping
[06:40:51] PROBLEM - Puppet failure on tools-exec-04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[06:42:05] Tool-Labs, Wikidata: Lost connection to MariaDB server during query - https://phabricator.wikimedia.org/T76699#948590 (Springle) No smoking gun yet. As discussed a few times on labs-l in order fight abuse and replag we kill things explicitly when: 1. A query runs for more that 28800 seconds 2. One or more q...
[06:42:38] Tool-Labs: Lost connection to MariaDB server during query - https://phabricator.wikimedia.org/T76956#948591 (Springle) (dupe comment from T76699) No smoking gun yet. As discussed a few times on labs-l in order fight abuse and replag we kill things explicitly when: 1. A query runs for more that 28800 second...
[06:43:51] Tool-Labs: Lost connection to MariaDB server during query - https://phabricator.wikimedia.org/T76956#948595 (Springle) The ~4sec estimate ... could it be when running a query on a connection that has just slept for ~60sec? Might be an overlap.
[07:10:50] RECOVERY - Puppet failure on tools-exec-04 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:19:12] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[11:38:34] YuviPanda|zzz: ding, dong, I'm baaaaaaack
[11:38:48] valhallasw`cloud: aah, wanted to put g-p-u back on uwsgi :D
[11:38:55] since I fixed the thing you reported
[11:39:00] but probably not right now, I’ve to go soon.
[11:39:00] YuviPanda|zzz: well, make sure to try uploading something :-p
[11:39:04] :P
[11:44:13] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:36:24] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#948821 (Inkowik) Yes, I would like to access the database replicas. I am also looking for a place where I can run my bot automatically, because on my local system I don't have the possibility to do so.
[13:59:17] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#948862 (LuisV_WMF) I hope you won't be offended if I'm skeptical that you have the skills/ability to write a compiler that is worth keeping proprietary but not get access to a server that can run a b...
[14:01:03] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#948865 (yuvipanda) There exists lots of proprietary compilers in the world that you can use without having to write one yourself.
[14:01:47] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#948866 (valhallasw) >>! In T74253#948862, @LuisV_WMF wrote: > I hope you won't be offended if I'm skeptical that you have the skills/ability to write a compiler that is worth keeping proprietary but...
[14:03:43] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#948868 (LuisV_WMF) My apologies, I misunderstood.
[14:06:39] Hello every one
[14:07:48] Can anyone tell me, how to access wikipedia database? I'm now connected to putty. now need further information or reading material regarding accessing database
[14:11:49] asad_: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database
[14:13:31] valhallasw`cloud: thanks... looking into this... Can you also tell me how to become member of project ?
[14:13:51] asad_: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
[14:14:35] valhallasw`cloud: thanks...
[16:37:15] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<33.33%)
[18:47:17] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[19:41:06] (PS5) Merlijn van Deen: + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583
[19:41:08] (PS1) Merlijn van Deen: Add config-fetcher to fab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182234
[19:41:10] (PS1) Merlijn van Deen: Move operations projects to #wikimedia-operations [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182235
[19:41:24] (CR) jenkins-bot: [V: -1] Move operations projects to #wikimedia-operations [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182235 (owner: Merlijn van Deen)
[19:41:26] (CR) jenkins-bot: [V: -1] Add config-fetcher to fab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182234 (owner: Merlijn van Deen)
[19:41:28] (PS2) Merlijn van Deen: Move operations projects to #wikimedia-operations [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182235
[19:41:44] (CR) jenkins-bot: [V: -1] Move operations projects to #wikimedia-operations [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182235 (owner: Merlijn van Deen)
[19:42:13] (PS3) Merlijn van Deen: Move operations projects to #wikimedia-operations [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182235
[19:43:28] (PS2) Merlijn van Deen: Add config-fetcher to fab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182234
[19:43:40] (CR) jenkins-bot: [V: -1] Add config-fetcher to fab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182234 (owner: Merlijn van Deen)
[19:44:45] (PS3) Merlijn van Deen: Add config-fetcher to fab [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182234
[19:45:18] (PS4) Merlijn van Deen: Move operations projects to #wikimedia-operations [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/182235
[20:37:16] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[21:02:16] RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:19:06] PROBLEM - Puppet failure on tools-exec-08 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[21:40:41] PROBLEM - Puppet failure on tools-exec-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:46:55] Is the labs dns service freaking out? The deployment-prep project is logging a ton of puppet failures and they all seem to be caused by getaddrinfo failures.
[21:47:21] Lots of stuff like "Error: /Stage[main]/Deployment::Salt_master/File[/srv/salt/top.sls]: Could not evaluate: getaddrinfo: Temporary failure in name resolution Could not retrieve file metadata for puppet:///modules/deployment/states/top.sls: getaddrinfo: Temporary failure in name resolution"
[21:49:06] RECOVERY - Puppet failure on tools-exec-08 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:10:34] RECOVERY - Puppet failure on tools-exec-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:24:59] PROBLEM - Puppet failure on tools-webproxy is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[22:31:20] so... what's the deal with name resolving on the tools instances? lookups seems to fail more often than expected.
[22:49:56] RECOVERY - Puppet failure on tools-webproxy is OK: OK: Less than 1.00% above the threshold [0.0]
[23:01:22] Coren: any luck looking at the DNS situation?
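(Editor's note) The "Temporary failure in name resolution" text quoted above is the standard message for the EAI_AGAIN error from getaddrinfo, which signals a transient failure that may succeed on a later attempt. A minimal sketch of how a tool might ride out such failures, assuming a simple exponential backoff; the function name, attempt count, and delays are hypothetical and not something anyone in the channel proposed:

```python
import socket
import time


def resolve_with_retry(host, port=None, attempts=4, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host, retrying only on transient (EAI_AGAIN) failures.

    `resolver` and `sleep` are injectable so the logic can be exercised
    without touching a real DNS server. All defaults are illustrative.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror as err:
            transient = err.errno == socket.EAI_AGAIN
            last_try = attempt == attempts - 1
            # Permanent errors (e.g. unknown name) and the final attempt
            # are re-raised unchanged so callers still see the real cause.
            if not transient or last_try:
                raise
            # Back off: 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

A caller would use it as `resolve_with_retry("wikitech.wikimedia.org", 443)`. Retrying only hides the symptom on the client side; it does nothing for the overloaded resolver itself.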
[23:01:54] hi YuviPanda, bd808 was just talking about that
[23:02:21] Yeah that's the DNS server dying
[23:02:43] Coren was investigating a few days ago and I was hoping he could tell me what is up
[23:02:56] judging by the number of puppet failures in beta today I'd say its close to dead
[23:04:44] I know only that you have to be really really careful restarting it or it will be dead dead
[23:04:54] Because of how it plays with OS
[23:05:28] Unfortunately I picked *this* day of all days to not being my laptop with me and stop working at midnight and it goes off now
[23:05:54] I suppose all I can do now is hope Coren sees the pings
[23:11:28] bd808: I'll be able to look at it in about 10h if it still has problems.
[23:11:55] YuviPanda: No worries. It will get fixed eventually. (the zen of labs)
[23:12:11] Heh :)
[23:12:26] bd808: is it having any effects other than shinken spam?
[23:12:53] That's all I've noticed and chrismcmahon hasn't been yelling.
[23:13:09] there was one tools user comlaining in backscroll
[23:13:14] *complaining
[23:13:42] maybe a minor freakout in a Jenkins job output. I don't really care until deploys start up again.
[23:14:00] Heh
[23:14:07] I'll take a look tomorrow
[23:14:16] And also set up an infra check
[23:14:20] For DNS
[23:15:13] * YuviPanda goes to sleep
[23:18:07] YuviPanda: I know what the problem is, I just have no method to fix it short of replacing the server the instances resolve against entirely. Ryan had a couple ideas about how to do it, but I'm not sure I don't just want to slap a normal recursor in front of dnsmasq
[23:18:43] We should do that as a start yeah
[23:18:59] Coren: also I guess restarting temporarily relieves pressure?
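
(Editor's note) The "infra check for DNS" mentioned at 23:14 is not specified in the log. One way such a check could work, sketched under assumptions: resolve a set of names several times and grade the failure ratio with the conventional Nagios-style status codes (0 OK, 1 WARNING, 2 CRITICAL). The function name, sample count, and thresholds are all hypothetical.

```python
import socket

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios-style status codes


def check_dns(names, samples=5, warn=0.1, crit=0.4,
              resolver=socket.getaddrinfo):
    """Resolve each name `samples` times and grade the failure ratio.

    Returns (status, ratio). Thresholds are illustrative assumptions,
    not values taken from any real monitoring configuration.
    """
    failures = 0
    total = 0
    for name in names:
        for _ in range(samples):
            total += 1
            try:
                resolver(name, None)
            except socket.gaierror:
                # Count every resolver error, transient or not.
                failures += 1
    ratio = failures / total if total else 0.0
    if ratio >= crit:
        status = CRITICAL
    elif ratio >= warn:
        status = WARNING
    else:
        status = OK
    return status, ratio
```

A wrapper script could call `status, ratio = check_dns(["wikitech.wikimedia.org"])`, print the ratio, and exit with `status`. Sampling repeatedly matters here because the failures described in the log are intermittent, so a single lookup would often pass.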