[00:02:39] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<22.22%)
[00:31:14] Question: Is there a way to watch every new diff that gets published to a WMF site?
[00:37:39] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[06:30:52] Labs: Investigate / remove swap from labs instances - https://phabricator.wikimedia.org/T88450#1013718 (yuvipanda) p:Triage>Low Can look at graphite data for memory / disk use before deciding.
[06:34:22] !log deployment-prep killed deployment-mediawiki01 host. FOREEVERRR
[06:34:29] Logged the message, Master
[06:36:58] good day.
[06:37:01] !log deployment-prep created deployment-mediawiki01 host
[06:37:03] Logged the message, Master
[06:39:21] hello hello queen of france
[06:40:52] PROBLEM - Puppet failure on tools-exec-04 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[06:49:19] YuviPanda: can you please restart the webserver for https://tools.wmflabs.org/videoconvert ?
[06:50:09] matanya: doing
[06:50:15] thanks :)
[06:50:30] matanya: done. and migrated it to trusty too
[06:50:33] tell me if that causes problems :)
[06:51:30] seems to be working well! thanks a lot
[06:52:36] cool :)
[06:57:16] YuviPanda: is this tool legit? https://tools.wmflabs.org/chromium/
[06:58:09] matanya: ugh, that seems bogus
[06:58:27] matanya: hmm, interesting.
[06:58:47] matanya: it seems to be someone experimenting with iojs
[06:58:55] which seems alright
[06:59:10] it downloads chrome for me
[06:59:17] yeah
[06:59:18] it does for me too
[06:59:25] I'm not sure why it's doing that
[06:59:58] matanya: it seems to be
[07:03:41] YuviPanda: Would you be able to add me to the bastion project? I'm not seeing myself listed on there, but I'm shown as a member of my project.
[07:03:50] looking
[07:04:07] AlexZ_: what's your wikitech username?
[07:04:34] YuviPanda: Azariv
[07:04:55] back from when I was staff and not Az1568 :)
[07:07:05] AlexZ_: ah :)
[07:08:05] AlexZ_: I have added you
[07:08:13] Thanks YuviPanda!
[07:08:16] AlexZ_: yw
[07:10:49] RECOVERY - Puppet failure on tools-exec-04 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:50:33] PROBLEM - Free space - all mounts on tools-webgrid-02 is CRITICAL: CRITICAL: tools.tools-webgrid-02.diskspace.root.byte_percentfree.value (<22.22%)
[08:07:15] Can someone start the webservice at https://tools.wmflabs.org/videoconvert/ ?
[08:07:46] Nemo_bis: I started it a while ago
[08:12:27] Gerrit-Patch-Uploader: Gerrit Patch Uploader does not work because no space left of device - https://phabricator.wikimedia.org/T88517#1013871 (Fomafix) NEW
[08:14:28] Gerrit-Patch-Uploader: Gerrit Patch Uploader does not work because no space left of device - https://phabricator.wikimedia.org/T88517#1013880 (yuvipanda) It should probably clone to its homedir, and not to /tmp
[08:43:47] Ouch, forgot to refresh before asking ;)
[08:53:29] Tool-Labs-tools-Other: videoconvert tool loads external resources - https://phabricator.wikimedia.org/T88519#1013923 (Nemo_bis) NEW a:Prolineserver
[09:21:34] Tool-Labs-tools-Other: videoconvert tool loads external resources - https://phabricator.wikimedia.org/T88519#1013973 (Prolineserver) Open>Resolved
[10:40:05] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1014054 (yuvipanda) NEW
[11:26:03] !log deployment-prep deleted instance deployment-mediawiki02
[11:26:09] Logged the message, Master
[11:34:59] !log deployment-prep created instance deployment-mediawiki02
[11:35:02] Logged the message, Master
[13:22:03] Tool-Labs-tools-Other: Migrate to Tool Labs: https://toolserver.org/~vvv/adminstats.php - https://phabricator.wikimedia.org/T63030#1014292 (TTO) a:TTO>None
[13:35:25] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1014307 (yuvipanda) Until then, new instances won't have NFS. You can 'fix' this by: 1. running manage-nfs-volumes-deamon on labstore1001 2. Restarting the instance. 3. Praying
[13:51:42] !log deployment-prep deleted deployment-jobrunner01, trusty version coming up
[13:51:45] Logged the message, Master
[13:53:31] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1014336 (yuvipanda) p:High>Unbreak!
[13:56:30] !log deployment-prep created deployment-jobrunner01, trusty instance
[13:56:33] Logged the message, Master
[14:09:04] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1014395 (coren) /exp is a normal "side effect" of NFSv4 setup which must export everything from a single tree; so this the normal behaviour in our quasi-multi-tenant setup: there is a bind mount f...
[14:11:43] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1014403 (coren) manage-nfs-volumes-daemon is indeed behaving strangely, it fails to start properly from upstart.
[15:04:09] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1014659 (coren) Open>Resolved a:coren manage-nfs-volumes-daemon could not start properly from upstart because a few of the exports in /etc/exports.d (maintained by that script) was owned b...
[16:55:04] hi -- I am trying to write a python script to query the wiki database on the tools-login server
[16:55:39] but I get an "Access denied" error
[16:55:59] does anyone has any tips
[16:56:02] ?
[16:56:56] I am using "enwiki.labsdb" as hostname, but I am not sure that's correct
[16:57:40] stefano: yeah, that should be OK. Are you using the replica.my.cnf defaults_file?
[16:58:34] valhallasw`cloud: I am using the user and password in the file
[16:58:49] (i.e., that are in the file)
[16:58:54] stefano: odd. What's the exact error you get?
[16:59:21] mysql.connector.errors.OperationalError: 1045: Access denied for user 'u11553'@'%' (using password: YES)
[17:00:49] I am asking about the host as the '%' seems suspicious
[17:01:19] *nod*. that would typically be your own ip
[17:01:51] stefano: does mysql -h enwiki.labsdb work in the shell?
[17:04:28] yes, adding the -u and -p options it works
[17:05:52] hrm.
[17:05:53] db = MySQLdb.connect(host='enwiki.labsdb', read_default_file='~/replica.my.cnf')
[17:05:55] ^ this works for me
[17:08:54] yes, that actually works for me as well :)
[17:09:38] thank you very much for you help valhallasw`cloud!
[17:11:47] yw!
[17:27:57] Labs: Make sure that manage-nfs-volumes-deamon can not run as root - https://phabricator.wikimedia.org/T88579#1015248 (yuvipanda) NEW a:coren
[17:49:27] Labs: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1015324 (coren) I see two big problems with this - implementation issues rather than a conceptual ones: (1) right now the numbering space between production and labs is disjoint, and possibly overlapping (I think it is, but mostly for th...
[18:24:18] Labs: New labs instances not being allowed to mount anything from NFS - https://phabricator.wikimedia.org/T88527#1015430 (Andrew) I suspect that this problem started yesterday, when we hit the magic number of 5000 users. That broke a bunch of ldap queries (due to a hard query-limit) which probably caused man...
[18:25:07] YuviPanda: ^ possible explanation for what broke ^
[18:25:32] andrewbogott: hmm I didn't look at it until today
[18:25:43] hm
[18:25:55] And when I did run the script as root it worked
[18:26:08] As in instances started getting their NFS mounts
[18:26:34] Well, /I/ ran the user key script yesterday by hand, maybe that’s what did it.
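A minimal Python sketch of the fix stefano landed on (16:55 to 17:09): let the MySQL client library read ~/replica.my.cnf itself rather than copying the user and password out of it by hand. The helper names `parse_replica_cnf` and `connect_enwiki` are hypothetical. The sketch also assumes, without confirmation from the log, that the file's values may be wrapped in quotes. The MySQL client strips such quotes, but a hand-rolled reader may not, which is one plausible way to end up with error 1045. `MySQLdb` is the third-party module (MySQL-python / mysqlclient) used in the channel.

```python
import configparser
import os


def parse_replica_cnf(text):
    """Return (user, password) from the [client] section of a MySQL option file.

    Assumption: values may be wrapped in single or double quotes. The MySQL
    client strips those quotes, but configparser keeps them, so strip them here.
    interpolation=None keeps a '%' inside a password from being treated as
    configparser interpolation syntax.
    """
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(text)
    client = parser["client"]
    return client["user"].strip("'\""), client["password"].strip("'\"")


def connect_enwiki(cnf_path="~/replica.my.cnf"):
    """Connect as in the channel: MySQLdb reads the option file itself."""
    import MySQLdb  # third-party; imported lazily so the parser works without it

    return MySQLdb.connect(host="enwiki.labsdb",
                           read_default_file=os.path.expanduser(cnf_path))
```

Passing `read_default_file` leaves quoting and any other `[client]` settings to the driver. For mysql.connector, Connector/Python 2.0 and later accept a comparable `option_files` argument.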
[18:29:34] andrewbogott: ah hmm
[18:29:37] Maybe
[18:29:46] But the last log entry was for 2nd feb
[18:30:00] Anyway am being dragged off now
[18:30:03] np
[18:30:07] I’m caught up, thank you!
[18:30:15] andrewbogott: Sounds like a reasonably good explanation - it'd certainly explain why it broke suddenly.
[18:30:16] andrewbogott: did we raise the 5k limit for ldap?
[18:30:25] YuviPanda: yes, to 12000
[18:31:27] I'm still impressed by the linux kernel not choking on 16k mounts. :-)
[18:31:32] Cool
[18:31:36] Cya tomorrow :)
[18:31:49] Thanks for fixing up, Coren :)
[18:33:44] andrewbogott: And yes, a quick look at the logs confirms the volume daemon broke when LDAP went boom; knowing what to look for makes it easy. :-)
[18:33:52] PROBLEM - Puppet staleness on tools-exec-15 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[18:34:51] Coren: It had been broken for several days, right? Since the 31st, I think?
[18:35:02] * Coren nods.
[18:35:06] I only noticed because a user was complaining about not accessing bastion, and I saw that his keys weren’t there.
[18:35:23] That should probably be monitored.
[18:35:39] Yeah; only enumeration would die - lookups by an index would still work.
[18:35:46] Yeah, I’ll make a note. You mean testing the ldap query, right?
[18:35:51] * Coren nods.
[18:36:15] Although 12000 should be enough for a while, it'd raise other issues earlier.
[18:36:43] Too many things depend on LDAP atm.
[18:37:08] [for it to not be better monitored]
[18:38:44] https://phabricator.wikimedia.org/T88584
[19:29:51] My tool at WMFlabs is currently returning the following: The URI you have requested, /os/coor_g.php, is not currently serviced. Any suggestions? RHaworth
[19:30:36] RHaworth: webservice start
[19:30:41] Wikibugs: Match on usage of Additional Hashtags, so that project renames don't break the bot - https://phabricator.wikimedia.org/T87825#1015676 (Quiddity) >>! In T87825#1000112, @Legoktm wrote: > Yes, it would be appreciated if you could fix the config. > > We had previously discussed storing immutable PHID...
[19:34:36] Wikibugs: wikibugs project renames - https://phabricator.wikimedia.org/T87846#1015682 (valhallasw) Open>Resolved a:valhallasw Resolved by @legoktm in I151c1fb1fe2620ef665c308f089af17882c53544
[19:58:36] Wikibugs: wikibugs project renames - https://phabricator.wikimedia.org/T87846#1015738 (Quiddity)
[20:19:25] valhallasw`cloud: Thanks. It required 'webservice restart' because it has run before. That gave: 'webservice status Your webservice is running'. But when I try to use it, I still get the error message and in shell access it changes to 'webservice status Your webservice is scheduled: Job is in error state'. I see a file called error.log but nothing has been written to it. Any more suggestions?
[20:20:39] RHaworth: qstat, then qdel
[20:20:53] then run webservice start again
[20:29:16] valhallasw`cloud: Thanks, but it does not work.
[21:07:12] RHaworth: not sure what else to try then :/
[21:07:30] Thanks anyway
[21:08:15] maybe Coren or YuviPanda can take a look
[21:11:53] MusikAnimal, I think I might have an idea on getting OAuth to work on different webservices.
[21:14:56] * Coren checks.
[21:15:23] Coren, check what?
[21:16:27] Cyberpower678: RHaworth's webservice job, I presume :-p
[21:16:34] ah, you just joined
[21:16:45] RHaworth: "error state" means the job couldn't even be started. You can see the error if you do 'qstat -j "
[21:16:47] :p
[21:19:00] RHaworth: Is this the 'os' tool?
[21:19:11] yes
[21:19:30] RHaworth: qstat -j 7877097
[21:19:47] RHaworth: Specifically, the problem is that your error.log isn't writable.
[21:20:20] RHaworth: Because you own it, and not your tool, and it doesn't have group write permission.
[21:21:21] RHaworth: (1) webservice stop, (2) fix perms or owner, (3) webservice start, (4) profit!
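Coren's diagnosis (21:19 to 21:20) is ordinary POSIX permission logic. The webservice runs as the tool account, error.log belongs to RHaworth, and without the group-write bit the tool cannot write to it. A minimal Python sketch of that rule follows. The function name `can_write` and the numeric uids and gids in the usage lines are hypothetical, and the sketch assumes the tool account is in the file's group. ACLs and root's override are ignored.

```python
import stat


def can_write(mode, file_uid, file_gid, uid, gids):
    """Return True if a process with this uid and group set may write the file.

    Simplified POSIX rule: exactly one permission class applies, chosen in the
    order owner, group, other. ACLs and root's override are not modelled.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical ids: RHaworth is uid 1000; the tool is uid 50000 in group 50000.
TOOL_UID, TOOL_GIDS = 50000, {50000}
print(can_write(0o644, 1000, 50000, TOOL_UID, TOOL_GIDS))   # owned by the user, no group write
print(can_write(0o664, 1000, 50000, TOOL_UID, TOOL_GIDS))   # "fix perms": group write added
print(can_write(0o644, 50000, 50000, TOOL_UID, TOOL_GIDS))  # "fix owner": tool owns the file
```

For a real file, `os.stat(path)` supplies `st_mode`, `st_uid` and `st_gid`. The first call returns False and the other two return True, which matches the two fixes Coren offers. A log file does not need execute bits, so mode 660 grants the same read/write access as 770.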
[21:22:59] Wikimedia-Fundraising-CiviCRM, Wikimedia-Fundraising, Labs: Create new labs project: fundraising-integration - https://phabricator.wikimedia.org/T88599#1015947 (awight) NEW
[21:25:01] Wikimedia-Fundraising-CiviCRM, Wikimedia-Fundraising, Labs: Create new labs project: fundraising-integration - https://phabricator.wikimedia.org/T88599#1015967 (awight)
[21:25:44] Coren, (4) profit! I'd like some of that.
[21:26:58] RHaworth, for number 2 do (1) take $HOME/error.log (2) chmod 770 $HOME/error.log
[21:27:54] Coren, the discussion for tool takeovers seems to have grinded to a quick standstill.
[21:28:19] Cyberpower678: It has. Lemme try to poke the right people now who should be at their office(s)
[21:28:51] Also a Phab request I filed has gone unnoticed as well. Can I poke you with it?
[21:28:57] Wikimedia-Fundraising-CiviCRM, Wikimedia-Fundraising, Labs: Create new labs project: fundraising-integration - https://phabricator.wikimedia.org/T88599#1015973 (awight)
[21:29:50] Cyberpower678: Lots of travel in the past couple weeks. Sure, point me at it.
[21:30:49] Coren,
[21:30:53] cat /data/project/xtools/tmp/session/sess_034s3nuttijecq4oqql6bld7f6
[21:31:01] /facepalm
[21:31:12] That's not what I wanted to do.
[21:31:26] Paste fail? :-)
[21:31:34] Yep
[21:31:40] https://phabricator.wikimedia.org/T88123
[21:31:44] Here we go.
[21:33:02] Coren, ^
[21:33:31] Labs: Create xtools project on Labs with domain xtools.wmflabs.org - https://phabricator.wikimedia.org/T88123#1015977 (coren) a:coren
[21:34:52] *like
[21:39:18] valhallasw`cloud, Coren & Cyberpower678: Many thanks, that has fixed it. But the permissions problem on error.log may be the result of my fiddling with that file and access.log. It may not be the original problem from a few days ago. I note that access.log had grown to 500M bytes. Is there a limit on the size of access.log?
[21:40:05] RHaworth: As a rule, no - though you may want to truncate it now and then if you have no use for the data.
[21:41:45] RHaworth, We've had xtools access logs grow as large as 10GB iirc. Maybe even larger.
[21:50:01] Cyberpower678: Do you have a phab number for takeover of xtools already?
[21:50:17] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1016028 (Andrew) If there's no strong consensus for this then I'm ok with just keeping the wikitech db on the new host, Silver. There's a backup cron that runs there and a copy to wikitech-static which means that it m...
[21:50:19] takeover of xtools
[21:50:20] ?
[21:50:35] nvm, found what I needed
[21:50:45] It's already under active management.
[21:51:43] Coren: when I catch up with springle I’m going to schedule some wikitech downtime to migrate the db over to silver. It might make sense to schedule labstore downtime in the same window…
[21:52:02] andrewbogott: It would, I suppose.
[21:52:31] Is there anything I need to know about rebooting labstore1001? This page seems to be… no longer correct. https://wikitech.wikimedia.org/wiki/Labs_NFS
[21:52:44] Or at least I couldn’t find the start-nfs script anyplace.
[21:53:07] (This is just Ghost followup, I don’t have any labstore agenda beyond just doing a reboot.)
[21:57:56] * Coren checks.
[21:58:56] andrewbogott: That should still be good, with some tweaks to the details. The known issue about XFS is of course no longer true at all.
[21:59:14] Is there still a start-nfs script? I’ll look again...
[21:59:28] root@labstore1001:~# which start-nfs
[21:59:28] /usr/local/sbin/start-nfs
[21:59:48] well, damn, there it is
[21:59:52] I withdraw my complaint!
[22:00:06] So, it’s just a) cycle power b) start-nfs c) profit?
[22:01:32] andrewbogott: In theory - there have been issues with the port bonding which sometimes need watching on reboot; though nothing bad happened the last 2-3 times
[22:01:44] cool
[22:01:53] Well, in any case I will warn you (and everyone) before I try.
[22:02:48] andrewbogott: mkay. It would be doubleplusgood if we used that opportunity to add the new shelf though because otherwise we need to make another downtime.
[22:03:13] true! Is that blocked on procurement or could we get that set up soon?
[22:03:48] (Welcome back, btw! How are your lungs?)
[22:04:31] andrewbogott: My lungs are no longer full of fail, which is good. My understanding is that the shelf is in the DC but requires some moving of hardware to rack.
[22:04:50] I.e.: was dependent on Chris being back in Ashburn.
[22:05:53] weird to think that wikipedia physically exists only an hour or so from me and i almost never think about that
[22:15:27] Labs: Increase storage available to labs NFS server - https://phabricator.wikimedia.org/T85607#1016091 (coren) This appears to have been delivered according to RT. Putting what seems to be the physical move as blocker.
[22:15:53] Labs: Increase storage available to labs NFS server - https://phabricator.wikimedia.org/T85607#1016092 (coren)
[22:22:36] PROBLEM - Free space - all mounts on tools-webproxy is CRITICAL: CRITICAL: tools.tools-webproxy.diskspace._var.byte_percentfree.value (<22.22%)
[22:27:36] RECOVERY - Free space - all mounts on tools-webproxy is OK: OK: All targets OK
[23:06:08] propably a stupid question - I can run Java on labs, right?
[23:07:23] dennyvrandecic: yes
[23:07:29] thx
[23:08:53] Labs: Wikitech should use shared misc-host mysql - https://phabricator.wikimedia.org/T88311#1016303 (Reedy) >>! In T88311#1016028, @Andrew wrote: > If there's no strong consensus for this then I'm ok with just keeping the wikitech db on the new host, Silver. There's a backup cron that runs there and a copy t...
[23:10:03] dennyvrandecic: why you'd want to... ;)
[23:10:35] cause the developer who wants to help me out with a specific task is more experienced in java
[23:10:42] we can also go for go... ;)
[23:15:54] dennyvrandecic: you can even run a tomcat webservice :-D
[23:16:07] yep, that's probably what we are going to do :)