[00:27:39] Labs, Patch-For-Review: Flaky tools-checker pages - https://phabricator.wikimedia.org/T136775#2347776 (yuvipanda) p:Triage>High I've increased the timeout to 300s (5x server timeout), but there's an nginx in front of the uwsgi (for no good reason, I feel now), with its own timeouts probably. So I...
[00:29:26] Labs, Patch-For-Review: Flaky tools-checker pages - https://phabricator.wikimedia.org/T136775#2347791 (yuvipanda) a:yuvipanda
[00:31:04] replication database down?
[01:02:30] musikanimal: something is certainly up with the replica dbs. I can't even get https://tools.wmflabs.org/replag/ to load
[01:02:55] YuviPanda: ^ any idea where to start troubleshooting that?
[01:03:21] the webservice is up because https://tools.wmflabs.org/replag/?source comes back quickly
[01:09:14] bd808: ugh, just got back
[01:09:28] `sql enwiki_p` is just out to lunch
[01:09:38] `sql local` works
[01:09:44] I'm trying to ssh to labsdb1001
[01:09:56] and mysql is running
[01:09:58] hmm
[01:10:06] but mysql is stuck
[01:10:10] can't get there as root
[01:10:25] ok, labsdb1003 works
[01:10:27] hmm
[01:10:46] bd808: ok, so labsdb1003 works but 1001 is out
[01:10:53] 01:10:38 up 545 days, 22:56, 1 user, load average: 27.08, 27.10, 27.06
[01:10:57] I hope that's not a shitty disk
[01:11:10] "up 545 days" ?!
[01:11:16] yeah
[01:11:29] I highly suspect a messed up disk
[01:11:45] and 02 is already out right?
[01:11:50] yeah
[01:11:52] 1003 is good
[01:11:53] so are we down to one db server?
[01:11:59] I'm checking if it is a messed up disk
[01:12:41] should dns be switched to point the 1001 traffic to 1003?
[01:12:52] that's an option but that's possibly going to overload 1003
[01:12:59] yeah
[01:13:00] I'm going to first verify what's up with 1001
[01:13:14] k. i'll stand back and send you good vibes
[01:13:42] bd808: I think if it is dead, I should switch and then enforce a strong time limit (~30mins?)
[01:13:46] on the remaining db
[01:18:50] also sending good vibes! :)
[01:19:01] ty musikanimal
[01:19:06] musikanimal: can you also file a bug?
[01:19:17] sure
[01:22:19] YuviPanda: reboot the whole server.. maybe a memory leak ?
[01:22:26] its swapping
[01:22:52] and didnt we have a bug that was XFS related
[01:23:10] mutante: yeah, that's my next thought
[01:23:20] i think it might help
[01:23:23] ok
[01:23:29] Labs, Labs-Infrastructure: replica labsdb1001 down - https://phabricator.wikimedia.org/T136787#2347893 (MusikAnimal)
[01:24:27] ^ I don't really know what else to put in the task description heh
[01:25:47] mutante: hmm, I can't seem to get into labsdb1001.mgmt.eqiad.wmnet
[01:25:51] root@
[01:25:54] tells me wrong password
[01:26:26] YuviPanda: use admin@
[01:26:28] Cisco
[01:26:31] ah
[01:26:50] first time at a cisco!
[01:27:12] i am surprised i remembered this
[01:27:29] but definitely caught me , heh
[01:27:30] mutante: [18008003.342666] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[01:27:37] bingo
[01:27:41] over and over again
[01:27:42] xfs bug, memory leak
[01:27:46] what i thought
[01:27:51] did we have a fix?
[01:27:53] fix is to reboot
[01:27:54] afaict
[01:27:56] * YuviPanda searches
[01:28:00] gotta ask Muehlenhoff
[01:28:07] something was in phab.. i think ..
[01:28:34] i tried earlier but did not find it right away
[01:28:34] it is still booting tho
[01:29:02] musikanimal: that line about the deadlock.. 
now :)
[01:29:21] right
[01:29:26] mutante: it's booting tho
[01:29:44] hmm, it could also be broken RAM i guess
[01:29:58] mutante: it's going through with boot tho
[01:30:37] ok, lets see how mysql behaves
[01:30:42] mutante: yeah, is booted
[01:31:06] ok, mysql did not get started
[01:31:09] mutante: i wonder if i've to start mysql manually
[01:31:20] '/tmp/mysql.sock' (2 "No such file
[01:31:21] uhm
[01:31:26] looks like it
[01:31:44] service mysql start had no effect it looks like
[01:32:05] starting it with the init script
[01:32:15] root@labsdb1001:~# /etc/init.d/mysql status
[01:32:15] /opt/wmf-mariadb10 * MySQL is not running
[01:32:15] root@labsdb1001:~# /etc/init.d/mysql start
[01:32:15] /opt/wmf-mariadb10
[01:32:16] Starting MySQL
[01:32:17] ............
[01:32:43] and now it keeps adding dots
[01:33:35] ah
[01:33:38] ok
[01:33:46] but this takes loong
[01:34:19] ok, it is done
[01:34:27] yay
[01:34:27] * Manager of pid-file quit without updating file.
[01:34:29] it seems back
[01:34:39] running now
[01:34:52] dont know the root pass yet
[01:35:02] mutante: for mysql?
[01:35:04] finds it
[01:35:20] yep, got it
[01:35:23] MariaDB [(none)]>
[01:35:32] is it ok now?
[01:35:36] looks to be
[01:35:42] show processlist;
[01:35:42] bd808: do you know why replag tool is still not working?
[01:35:46] says its doing stuff
[01:35:48] musikanimal: can you verify this is back up?
[01:35:53] yup
[01:35:57] YuviPanda: nope. I was just looking
[01:36:12] nope == don't know why replag is still busted
[01:36:29] ah ok
[01:37:38] Labs, Labs-Infrastructure: replica labsdb1001 down - https://phabricator.wikimedia.org/T136787#2347925 (yuvipanda) It was swapping, we rebooted it and it is back up. Seems to be an XFS related memory leak that @Dzahn remembers as having stuck elsewhere too. Lots of repeated: ``` Jun 1 22:27:59 labsdb10...
[01:38:01] YuviPanda: working now -- https://tools.wmflabs.org/replag/
[01:38:09] 5h lag on s1
[01:38:17] actually almost 6
[01:38:23] yeah
[01:38:33] so I'm guessing it was borked for a while
[01:38:40] and increasing
[01:38:42] not decreasing
[01:38:47] although I'll give it a while I guess
[01:39:19] yeah. the increasing just means that it isn't getting replication data yet
[01:39:29] Labs, Labs-Infrastructure: replica labsdb1001 down - https://phabricator.wikimedia.org/T136787#2347932 (Dzahn) After it was rebooted mysql was not running yet: 18:37 < mutante> root@labsdb1001:~# /etc/init.d/mysql status 18:37 < mutante> /opt/wmf-mariadb10 * MySQL is not running 18:37 < mutante> root@la...
[01:40:00] bd808: nope, it's actually lag in db1069
[01:40:43] bd808: so everything except enwiki is going down
[01:40:46] while enwiki is going up
[01:44:03] YuviPanda: working on my end
[01:44:10] musikanimal: ok!
[01:44:56] very high replag though, but not complaining
[01:47:57] thank you YuviPanda !
[01:48:45] and mutante
[01:49:35] :)
[01:53:04] musikanimal: mutante bd808 I see lag on db1069 is slowly going down, and then lag on labsdb1001 will catch up once that's done
[01:53:18] cool
[01:53:29] i'm afraid i found the next issue meanwhile
[01:53:43] ores - connection refused
[01:53:48] on scb1001/2001
[01:53:49] YuviPanda:
[01:54:18] ores on scb?
[01:54:20] mutante: that's a deploy still in progress by Alex
[01:54:32] oh! ok
[01:54:46] alright
[01:55:47] Labs: High replag on db1069 - https://phabricator.wikimedia.org/T136789#2347957 (yuvipanda)
[01:56:48] Labs, DBA: High replag on db1069 - https://phabricator.wikimedia.org/T136789#2347971 (yuvipanda)
[01:57:46] mutante: thanks for the help!
[01:57:55] musikanimal: bd808 thanks for poking and alerting!
[01:58:35] YuviPanda: welcome
[02:04:35] Labs, DBA: High replag on db1069 - https://phabricator.wikimedia.org/T136789#2347975 (yuvipanda) People with access can keep a look on the db1069 replag at https://tendril.wikimedia.org/tree
[02:12:22] Labs, DBA: High replag on db1069 - https://phabricator.wikimedia.org/T136789#2347978 (yuvipanda) p:Triage>High
[02:27:18] Labs, Patch-For-Review: Flaky tools-checker pages - https://phabricator.wikimedia.org/T136775#2347989 (yuvipanda) Not going to touch nginx, since the uwsgi version we have doesn't seem to support binding to port 80 securely and dropping privs.
[04:34:23] I was trying to install a python application for the pywikibot-core on toolslabs and I found that `pip` is not installed.
[04:34:32] Is there any way for me to get pip installed on there ?
[04:59:13] Labs: dumps-stats.dumps.eqiad.wmflabs instance was hammering NFS - https://phabricator.wikimedia.org/T134148#2348072 (Hydriz) @chasemp I haven't been using the instance much since then, so it did not cause any issues. However, I will still need the space that the NFS storage provides and it is likely that th...
[06:31:08] Hey, so Im trying to get the dump containing all media files from commons. I can find it in dumps.wikimedia at https://dumps.wikimedia.org/other/mediatitles/20160407/
[06:31:33] But it does not seem to be in the `/public/dumps/public` folder in labs
[06:42:23] Labs, DBA, Operations: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2348194 (jcrespo) Not necessarily, it is probably a less wide table and will not have so many metadata issues as revision.
[06:56:53] (PS1) Alexandros Kosiaris: Actually use the correct role for ores redis_password [labs/private] - https://gerrit.wikimedia.org/r/292317
[06:59:01] (PS2) Alexandros Kosiaris: Actually use the correct role for ores redis_password [labs/private] - https://gerrit.wikimedia.org/r/292317
[06:59:18] (CR) Alexandros Kosiaris: [C: 2 V: 2] Actually use the correct role for ores redis_password [labs/private] - https://gerrit.wikimedia.org/r/292317 (owner: Alexandros Kosiaris)
[07:45:16] Labs: dumps-stats.dumps.eqiad.wmflabs instance was hammering NFS - https://phabricator.wikimedia.org/T134148#2348239 (Nemo_bis) In my understanding, the Labs physical hosts still have plenty of free disk: https://ganglia.wikimedia.org/latest/graph_all_periods.php?title=Labs+machines+disk&vl=&x=&n=&hreg[]=lab...
[07:45:47] Labs: Dumps instances occasionally hammer NFS for temporary storage - https://phabricator.wikimedia.org/T134148#2348240 (Nemo_bis)
[07:53:49] Labs, DBA: High replag on db1069 - https://phabricator.wikimedia.org/T136789#2348243 (yuvipanda) db1069 has no replag now, and replag on labsdb1001 is also reducing.
[08:03:40] Labs, Labs-Infrastructure: replica labsdb1001 down - https://phabricator.wikimedia.org/T136787#2347893 (MoritzMuehlenhoff) This XFS bug was reported here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1382333 The fix landed in 3.13.0-40.69, but before the crash occurred labsdb1001 was still running...
[08:04:14] Labs, Labs-Infrastructure: replica labsdb1001 down - https://phabricator.wikimedia.org/T136787#2348246 (yuvipanda) \o/ awesome!
[08:07:35] AbdealiJK: I was trying to install a python application for the pywikibot-core on toolslabs and I found that `pip` is not installed. 
<= use virtualenv
[08:07:45] slow, but it works
[08:07:53] zhuyifei1999_, yep, I got that working :) Thanks
[08:08:04] ok great :)
[08:16:31] Labs, DBA: High replag on db1069 - https://phabricator.wikimedia.org/T136789#2348276 (yuvipanda) Open>Resolved a:yuvipanda All clear.
[08:16:55] Labs, Labs-Infrastructure: replica labsdb1001 down - https://phabricator.wikimedia.org/T136787#2348279 (yuvipanda) Open>Resolved a:yuvipanda
[08:30:04] Labs, Labs-Infrastructure, Operations, ops-eqiad: connect usb external disk to labmon1001 - https://phabricator.wikimedia.org/T136242#2348321 (yuvipanda) This actually sped up a lot over the evening, and finished now! \o/ So we're good I think.
[12:05:08] Labs, Operations, netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2348847 (faidon) Thanks. Testing again production (e.g. upload.wikimedia.org, but any host including bast1001 would do) would also be a useful data poin...
[12:06:19] Labs, Operations, netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2343755 (faidon) p:Triage>Normal
[12:50:26] Labs, Tool-Labs: wikiviewstats is using 232G on Tools - https://phabricator.wikimedia.org/T136198#2348968 (Technical13) >>! In T136198#2345929, @chasemp wrote: > @technical13 are you actively involved in this project? I'm probably the most active and although I've been away for personal medical reasons,...
[14:30:30] Labs, Tool-Labs, Operations, netops: Someone seems to be running a port scanner in labs - https://phabricator.wikimedia.org/T136829#2349244 (Peachey88)
[14:44:38] Labs, Tool-Labs, Operations, netops: Someone seems to be running a port scanner in labs - https://phabricator.wikimedia.org/T136829#2349321 (Andrew) @Reedy, you're right, I was briefly confusing the src and dst ports. 
I'll rename
[14:45:01] Labs, Tool-Labs, Operations, netops: bitninja upset about us running a crawler - https://phabricator.wikimedia.org/T136829#2349322 (Andrew)
[14:47:33] gifti: You're the maintainer of 'German Wikipedia Broken Weblinks Bot' right?
[14:47:48] the replag on s1 is getting better -- https://tools.wmflabs.org/replag/
[14:53:03] bd808, that lag is due to the importing
[14:53:52] *nod* It was pretty horrible last night when we got labsdb1001 restarted after the xfs lockup
[14:54:31] yes, I got it to 0
[14:54:54] then restarted the long running labs db fixing again
[16:02:54] https://tools.wmflabs.org/guc/ isn't working - 502
[16:03:00] Krinkle: ^
[16:09:48] Tool-Labs-tools-Global-user-contributions: 'sockets disabled, connection limit reached' (likely due to scraping) - https://phabricator.wikimedia.org/T136842#2349560 (valhallasw)
[16:10:50] (PS1) Merlijn van Deen: robots.txt: disallow /guc/ [labs/toollabs] - https://gerrit.wikimedia.org/r/292381 (https://phabricator.wikimedia.org/T136842)
[16:13:59] (CR) Glaisher: [C: 1] robots.txt: disallow /guc/ [labs/toollabs] - https://gerrit.wikimedia.org/r/292381 (https://phabricator.wikimedia.org/T136842) (owner: Merlijn van Deen)
[16:17:59] Labs, Labs-Infrastructure: Copy graphite data from labmon1001 to an external HDD - https://phabricator.wikimedia.org/T136226#2349588 (RobH)
[16:36:43] (PS2) Yuvipanda: robots.txt: disallow /guc/ [labs/toollabs] - https://gerrit.wikimedia.org/r/292381 (https://phabricator.wikimedia.org/T136842) (owner: Merlijn van Deen)
[16:36:56] (CR) Yuvipanda: [C: 2] robots.txt: disallow /guc/ [labs/toollabs] - https://gerrit.wikimedia.org/r/292381 (https://phabricator.wikimedia.org/T136842) (owner: Merlijn van Deen)
[16:37:51] (Merged) jenkins-bot: robots.txt: disallow /guc/ [labs/toollabs] - https://gerrit.wikimedia.org/r/292381 (https://phabricator.wikimedia.org/T136842) (owner: Merlijn van Deen)
[16:41:33] 
Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Jarallah II was modified, changed by BryanDavis link https://wikitech.wikimedia.org/w/index.php?diff=605207 edit summary:
[17:15:20] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349820 (Andrew)
[17:23:58] Labs, Labs-Infrastructure, Patch-For-Review: I/O on labmon1001 is very slow - https://phabricator.wikimedia.org/T127957#2349894 (RobH)
[17:24:00] Labs, Labs-Infrastructure: Copy graphite data from labmon1001 to an external HDD - https://phabricator.wikimedia.org/T136226#2349892 (RobH) Open>Resolved Yuvi confirmed via IRC the copy completed. system is reimaging.
[17:26:53] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (MoritzMuehlenhoff) The report only contains alerts of the kind: "A visitor reached our honeynet and sent...
[17:35:51] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (scfc) Maybe I'm missing something, but the link (https://bitninja.io/incidentReport.php?details=91e8f633...
[17:38:21] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (chasemp) It's hard for me to take the BitNinja reports seriously. Previously, it is my understanding, t...
[17:38:45] Glaisher: checking
[17:39:08] Krinkle: valhallasw`cloud already filed https://phabricator.wikimedia.org/T136842
[17:40:03] valhallasw`cloud: Hm..
[17:40:07] Weird. 
guc is POST only
[17:40:18] (in web browsers, it will auto-submit with JavaScript)
[17:40:28] * Krinkle reboots guc
[17:40:39] Krinkle: googlebot POSTs for some reason :/
[17:41:17] Krinkle: in any case, the socket disabled warning means there are 250 or so concurrent connections open
[17:41:36] valhallasw`cloud: I'd like the home page to remain index
[17:41:38] ed
[17:41:43] Can we somehow block querystring only?
[17:42:03] I'll also see if I can fix the tool itself to handle this better
[17:43:40] ^they disappeared right away?
[17:43:49] my scroll was messed up
[17:43:52] ha
[17:44:32] Krinkle: sure, it's this robots.txt: https://phabricator.wikimedia.org/diffusion/LTOL/browse/master/www/robots.txt
[17:47:26] Labs, Tool-Labs, Tracking: Contact tool maintainters using large amounts of disk space (tracking) - https://phabricator.wikimedia.org/T136212#2350055 (chasemp)
[17:47:28] Labs, Tool-Labs: wikiviewstats is using 232G on Tools - https://phabricator.wikimedia.org/T136198#2350052 (chasemp) Open>Resolved a:chasemp >>! In T136198#2348968, @Technical13 wrote: >>>! In T136198#2345929, @chasemp wrote: >> @technical13 are you actively involved in this project? > > I'm...
[17:51:09] Labs, Operations: labnet100[12].eqiad.wmnet need to be reimaged with RAID - https://phabricator.wikimedia.org/T136718#2350094 (chasemp)
[17:54:05] Labs, Tool-Labs: icelab is using 245G in Tools - https://phabricator.wikimedia.org/T136197#2350102 (chasemp) There seems to be no activity on this tool. I haven't been able to get in touch with a maintainer, and it is using a lot of space here for seemingly no reason. I plan to archive this tool in a f...
[17:55:25] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (doctaxon) @Giftpflanze: They're apparently talking about our bot?!
[18:04:42] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2350165 (Giftpflanze) @doctaxon apparently they do, but it seems the claim is unsubstantiated
[18:07:52] andrewbogott: i guess i am
[18:11:40] Labs, Tool-Labs, Operations, netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2350197 (MoritzMuehlenhoff) I've replied to them requesting to whitelist the captcha warnings. Will update the Ph...
[18:21:37] Labs, Community-Tech-Sprint, User-bd808: Create project Google-api-proxy - https://phabricator.wikimedia.org/T136862#2350266 (bd808)
[18:22:11] Labs, Community-Tech-Sprint, User-bd808: Create project Google-api-proxy - https://phabricator.wikimedia.org/T136862#2350266 (bd808)
[18:22:13] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2350287 (bd808)
[18:22:48] gifti: I was pinging you about https://phabricator.wikimedia.org/T136829 but it looks like there's been recent activity on that ticket
[18:23:04] I'm convinced that bitninja is being overreactive and I've already asked them to chill out (as has Moritz)
[18:23:21] but I was also hoping to offer them an olive branch along the lines of "we throttled that bot so it won't hit you as often"
[18:23:30] is that a fairly easy thing for you to do if necessary?
[18:24:25] Since they are a distributed monitoring service I'm not sure there is a good way to throttle.
[18:24:45] andrewbogott, what is the actual effect on labs?
[18:24:49] bd808: well, that's presuming that that one log is what they actually care about
[18:25:01] as in, have you received complaints from sites not working?
[18:25:05] they seem to not really know what their beef is
[18:25:10] from our own users?
[18:25:14] jynus: no.
[18:25:23] ok :-)
[18:25:32] But, since the bot in question is checking for dead links… if they blacklist us then that bot will quietly mark them as dead and move on
[18:25:35] I wouldn't expect to get a complaint
[18:25:35] I thought things were completely different
[18:25:43] I saw you too much worried
[18:26:24] jynus: I was briefly worried when I thought the complaint had merit :) I'm still slightly worried that they're going to knock a bunch of references off of wikipedia by accident.
[18:27:55] Labs, Community-Tech-Sprint, User-bd808: Create project Google-api-proxy - https://phabricator.wikimedia.org/T136862#2350334 (bd808) Open>Resolved https://wikitech.wikimedia.org/wiki/Nova_Resource:Google-api-proxy
[18:27:57] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2350337 (bd808)
[18:32:36] Labs, Labs-Infrastructure: Static IP for google-api-proxy project - https://phabricator.wikimedia.org/T136865#2350348 (bd808)
[18:33:09] andrewbogott, YuviPanda: ^ can I get a +1 quota for static ip on the google-api-proxy project?
[18:33:20] bd808: yep, just a minute...
[18:35:10] bd808: done
[18:35:20] thanks andrewbogott
[18:38:43] Labs, Labs-Infrastructure: Static IP for google-api-proxy project - https://phabricator.wikimedia.org/T136865#2350403 (bd808) Open>Resolved a:Andrew
[18:38:57] Labs, Patch-For-Review: Flaky tools-checker pages - https://phabricator.wikimedia.org/T136775#2350407 (yuvipanda) I tested the changes by unmounting the NFS mounts on the checker host. This took 5 minutes to be spotted, which is ok. Remounting them fixed it and checks turned green. I'm going to turn pagi...
[18:44:57] Labs, Patch-For-Review: Flaky tools-checker pages - https://phabricator.wikimedia.org/T136775#2350410 (yuvipanda) Open>Resolved
[18:45:04] Any known issues with instance creation? 
I'm trying to bring up a jessie instance and seeing it melt on hooking up to the Labs salt master
[18:45:13] https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&instanceid=ddf6f405-6aa6-42bd-910d-3f6d01ea48f1&project=google-api-proxy&region=eqiad
[18:45:40] bd808: same as https://phabricator.wikimedia.org/T136656?
[18:45:55] yup
[18:46:54] Labs, Labs-Infrastructure: Creating new instance failed - https://phabricator.wikimedia.org/T136656#2343333 (bd808) I'm having the same problem with the google-api-proxy-01.google-api-proxy.eqiad.wmflabs instance based on the jessie base image.
[18:47:07] bd808: creating another one worked for me
[18:47:29] transient failures are the best!
[18:55:35] Labs, Labs-Infrastructure: Creating new instance failed - https://phabricator.wikimedia.org/T136656#2350504 (bd808) Creation of the next instance of the same type in the same project worked fine.
[19:00:21] huh. how about an open bug about empty default security group for new projects?
[19:04:40] bd808: don't think there's one but it's happened in the past too
[19:04:59] Is it worth filing a bug about?
[19:05:14] bd808: intermittently happens, so probably
[19:05:18] so we can at least track incidents
[19:07:12] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2350514 (bd808)
[19:11:44] and another one! Disassociating a floating ip didn't release it back to the project
[19:12:30] my quote shows 1/1 used but I both explicitly disassociated and deleted the instance that had it
[19:12:36] *quota
[19:14:41] bd808: can you try logging out and back in to see if that affliction has carried over?
[19:15:27] logout doesn't fix it
[19:15:37] andrewbogott: ^
[19:15:49] I'm writing it up
[19:16:28] bd808: when you say the quota shows 1/1
[19:16:35] the quota on horizon or on wikitech?
[19:16:37] And, what project is this?
[19:16:48] horizon, google-api-proxy
[19:17:59] Labs, Labs-Infrastructure, Horizon: Disassocaiting floating IP does not release it back for reuse - https://phabricator.wikimedia.org/T136872#2350559 (bd808)
[19:18:27] three bugs in a row means time for me to eat lunch
[19:45:00] Labs, Labs-Infrastructure, Horizon: Disassocaiting floating IP does not release it back for reuse - https://phabricator.wikimedia.org/T136872#2350559 (Andrew) This looks to me like a caching bug in the horizon project summary.
[19:45:25] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2350667 (Andrew)
[20:08:12] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2350744 (bd808) When I try to associate I get "Error: Unable to associate floating IP." as a response.
[20:16:39] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#2350758 (BBlack) I assume we still can't have jessie exec nodes. Looking at Launchpad, it seems that Ubuntu has openjdk-8 packages that are maintained for Wily (15.10), Xenial (16....
[20:17:19] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#2350763 (BBlack) Meant to include this link above re Ubuntu: https://launchpad.net/ubuntu/+source/openjdk-8
[20:18:29] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2350770 (bd808)
[20:46:21] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#2350884 (valhallasw) See {T121020} for discussion. The lack of official packages for 14.10 and 15.04 suggests to me that backporting to those distributions is nontrivial, and there...
[20:48:02] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#2350887 (yuvipanda) I can definitely make k8s work, but on looking at the scripts merlbot is running, they all make *heavy* use of SGE options specified in the script file (look at...
[20:56:26] Labs, Labs-Infrastructure: Reinstall labmon1001 with new disk configuration (and jessie) - https://phabricator.wikimedia.org/T136227#2350896 (RobH) So in attempting to reinstall, this system will start with a working serial redirection, and then cease output for no reason. System then will attempt to co...
[20:57:23] Labs, Labs-Infrastructure, Operations, ops-eqiad: Reinstall labmon1001 with new disk configuration (and jessie) - https://phabricator.wikimedia.org/T136227#2350898 (RobH) a:Cmjohnson Assigning this to @Cmjohnson and adding #ops-eqiad for him to remove power entirely from labsmon1001 and add...
[21:14:28] aude: we're discussing the impending death of a bot that's very important to de.wikipedia.org. We need to notify people and, ideally, scare up the maintainer or a new maintainer...
[21:14:53] Can you suggest where we should start for that? Is there a mailing list you can recommend, or some technical-community person at wmde I should talk to?
[21:29:39] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#2351107 (bd808) p:Normal>High This bot will begin to break randomly on 2016-06-12 and be completely broken by 2016-07-12 if the insecure POST issue is not handled. >>! In T...
[21:42:45] andrewbogott: oh noes :(
[21:43:14] i suggest to poke birgit at
[21:43:17] wmde
[21:43:55] https://wikimedia.de/wiki/Mitarbeitende (not sure her irc nick)
[21:49:07] andrewbogott: aude: https://lists.wikimedia.org/mailman/listinfo/vereinde-l
[21:49:39] thats the mailing list of the WMDE org
[21:57:27] Labs, Tool-Labs: Figure out a way to support java 1.8 on tool labs (Merl's bot) - https://phabricator.wikimedia.org/T121279#2351195 (BBlack) >>! In T121279#2350884, @valhallasw wrote: > See {T121020} for discussion. The lack of official packages for 14.10 and 15.04 suggests to me that backporting to thos...
[22:00:29] aude, mutante, thanks!
[22:00:37] mutante: will they take it amiss if I email that list in english?
[22:01:46] andrewbogott: i think it's ok and they will understand
[22:03:49] hello, any tool labs admins able to give this guy a restart? https://tools.wmflabs.org/templatecount
[22:04:11] I think it's still hanging from when the replicas went down yesterday. I had to manually restart xTools as well
[22:04:22] musikanimal: I'll have a look
[22:04:27] ty
[22:05:39] bam, that did it. Thanks!
[22:06:09] andrewbogott: I think this one needs a kick too http://tools.wmflabs.org/templatetransclusioncheck/
[22:06:21] musikanimal: I tried to restart templatecount, did anything happen?
[22:06:29] hm, looks like yes
[22:06:30] probably a bunch of others, I just mentioned those two because people brought it up at en:WP:VPT
[22:06:31] yes that worked
[22:11:21] thanks again!
[22:15:06] bd808: I want to give the LdapAuthentication AuthManager patch some testing, do you think I can blow up anything by enabling AuthManager on labstestwiki?
[22:15:15] technically that seems to be a production wiki
[22:18:03] Labs, Discovery, Operations, hardware-requests, Discovery-Search-Sprint: eqiad: (2) Relevance forge servers - https://phabricator.wikimedia.org/T131184#2351298 (debt)
[22:20:54] tgr: it seems reasonable to me. Poke andrewbogott and Krenair too so they can help debug if things go wrong
[22:21:24] tgr: that's the right place to test
[22:21:37] puppet is currently disabled on that box but I don't think that matters.
[22:21:50] I'm not positive that scap pushes out to that box, you may need to sync manually
[22:23:18] labtestwikitech has been added to scap
[22:23:54] Labs, Discovery, Operations, hardware-requests: rack/upgrade/setup/install/deploy relforge100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T136708#2351316 (debt)
[22:24:46] eh
[22:24:48] files/dsh/group/mediawiki-installation:labtestweb2001.wikimedia.org
[22:27:38] that's right
[22:36:44] (PS1) Krinkle: Don't index username queries [labs/tools/guc] - https://gerrit.wikimedia.org/r/292494
[22:36:56] (CR) Krinkle: [C: 2 V: 2] Don't index username queries [labs/tools/guc] - https://gerrit.wikimedia.org/r/292494 (owner: Krinkle)
[22:37:41] (PS1) Krinkle: Revert "robots.txt: disallow /guc/" [labs/toollabs] - https://gerrit.wikimedia.org/r/292495
[22:39:41] Krinkle: I can push that robots.txt change out for you when you think the tool is ready for it
[22:39:51] Deploying it now, one sec.
[22:40:26] It's been deployed since 6 hours ago, but as local hack.
[22:41:58] bd808: Go ahead :)
[22:42:16] (CR) BryanDavis: [C: 2] "Let's give this a shot" [labs/toollabs] - https://gerrit.wikimedia.org/r/292495 (owner: Krinkle)
[22:42:19] I'll monitor access log over the next few hours to make sure there's no POST requests from Google anymore.
[22:42:57] 10.68.21.49 tools.wmflabs.org - [02/Jun/2016:22:35:47 +0000] "POST /guc/ HTTP/1.1" 200 8549 "https://tools.wmflabs.org/guc/?user=79.40.163.186" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
[22:43:07] (Merged) jenkins-bot: Revert "robots.txt: disallow /guc/" [labs/toollabs] - https://gerrit.wikimedia.org/r/292495 (owner: Krinkle)
[22:43:13] Well, for what its worth, it looks like it didn't even work in the first place?
[22:43:40] lol. I don't know how often gbot grabs robots.txt
[22:44:23] i wonder if you can tell it stuff via that google webmaster tools thing
[22:44:32] now renamed to search console
[22:46:04] Krinkle: {{done}} https://tools.wmflabs.org/robots.txt
[22:47:16] (PS1) Krinkle: Try to avoid auto-submit for bots or spiders on GET [labs/tools/guc] - https://gerrit.wikimedia.org/r/292498
[22:47:27] (CR) Krinkle: [C: 2] Try to avoid auto-submit for bots or spiders on GET [labs/tools/guc] - https://gerrit.wikimedia.org/r/292498 (owner: Krinkle)
[22:48:07] Labs, Tool-Labs: toolserver-home-archive is using 52G on Tools - https://phabricator.wikimedia.org/T136202#2326568 (Dispenser) This file might actually have Toolserver data that was considered lost. I'm downloading it over the 6 days (ISP caps), but I'm out of the country for the next two weeks. So I c...
[22:52:06] Wow, something is taking a long time - http://tools.wmflabs.org/snapshots/?action=updatelog (from 16:00 to 22:50, took: 24,633 seconds)
[22:54:59] bd808: thx
[22:55:57] (CR) Krinkle: [V: 2] Try to avoid auto-submit for bots or spiders on GET [labs/tools/guc] - https://gerrit.wikimedia.org/r/292498 (owner: Krinkle)
[22:56:57] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2351396 (bd808) Several hours later I have logged out and logged back in again and still see 1/1 used and get an "Error: Unable...
[23:16:22] bd808: do you want me to allocate another IP to you just to unblock you?
[23:16:37] YuviPanda: yes please
[23:16:45] bd808: kk on it
[23:17:47] !log google-api-proxy increase floating IP quota to 2 due to T136872
[23:17:47] T136872: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872
[23:17:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Google-api-proxy/SAL, Master
[23:17:52] bd808: try now
[23:18:30] worked!
[23:18:38] bd808: \o/ cool
[23:19:23] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2351573 (bd808)
[23:22:35] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2350559 (yuvipanda) I've allocated an additional floating IP to this to unblock @bd808 for right now.
[23:29:43] brion: do you want to keep that floating IP for the ogvjs project in the long term? 
is ok if you do, but if you can do without I'd prefer returning it back to pool since you get SSL with proxy
[23:30:04] Labs, Operations, wikitech.wikimedia.org: distribution upgrade for wikitech-static instance - https://phabricator.wikimedia.org/T94585#2351602 (Dzahn) Open>Resolved a:Dzahn Meanwhile this was done in T126385 and is on jessie.
[23:35:50] Labs, Labs-Infrastructure, Horizon: Disassociating floating IP does not show it as available in the horizon project quota summary - https://phabricator.wikimedia.org/T136872#2351621 (bd808) The ip 208.80.155.245 is now in use by the project. If horizon/nova show another ip allocated or in use that ca...
[23:39:01] andrewbogott: where does labstestwiki run? silver?
[23:40:26] tgr: labtestweb2001.wikimedia.org
[23:40:50] tgr: it is in scap dsh lists