[00:36:33] hi all - does MediaWiki have an IRC channel itself ?
[00:37:04] * dbb => darkblue_b
[00:37:32] dbb: i think #mediawiki is it
[00:37:58] yes thx
[00:50:06] the instance estest1001.search.eqiad.wmflabs apparently has hung tasks, according to 'get console output' on wikitech. I additionally can't get to it via ssh. No clue how long it's been in this error state, any guesses at what went wrong before i just reboot it from wikitech?
[00:50:32] e.g.: INFO: task jbd2/vda1-8:192 blocked for more than 120 seconds.
[00:51:01] hmm that's disk access
[00:51:06] I wonder if virt1005 is full >_>
[00:52:47] nah, looks to only be ~50% in ganglia
[00:52:50] YuviPanda: ^
[00:52:59] hmm ok
[00:53:12] ebernhardson: I... guess... restart? and we'll investigate more deeply if it happens again?
[00:53:16] sure
[00:53:28] sorry!
[00:53:34] no worries, it's a mystery to me too :)
[01:01:02] Labs: Do a VM cleanup day - https://phabricator.wikimedia.org/T119476#1827275 (yuvipanda) NEW
[01:01:23] ebernhardson: is it back up and behaving properly now?
[01:02:02] YuviPanda: yup a reboot made it happy again
[01:11:31] Labs, Discovery, Elasticsearch, Patch-For-Review: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#1827312 (EBernhardson) turned on wikidatawiki and commonswiki as well. Will see what a day worth of logs looks like and maybe turn on a few more tomorrow.
[02:11:03] Labs, Database: Create a script that'll allow easier swatting of queries that're overloading labsdb - https://phabricator.wikimedia.org/T119479#1827341 (yuvipanda) NEW
[03:25:54] Labs, Labs-Team-Backlog, Patch-For-Review, ToolLabs-Goals-Q4: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1827377 (coren) labstore* NFS daemons now use the ldap shim and grab group membership information from LDAP regardless of the systemwide setting. Left: turn off LDAP in ns...
[05:20:17] YuviPanda: if a person is a member of a project in labs, can he/she delete instances created by another member?
[05:20:36] kart_: only admins can create or delete instances
[05:20:40] as in, projectadmins
[05:21:15] YuviPanda: okay. Thanks!
[06:53:01] Labs, Tool-Labs, Tool-Labs-tools-Erwin's-tools, Monitoring: monitor webservice / 504 errors for erwin - https://phabricator.wikimedia.org/T90800#1827533 (Nemo_bis) Well, it was just down for 2 days until I restarted it (warned by supernino on IRC).
[10:36:03] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Antigng was created, changed by Antigng link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Antigng edit summary: Created page with "{{Tools Access Request |Justification=[[:zh:User:Antigng-bot]] |Completed=false |User Name=Antigng }}"
[11:15:37] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1827789 (Luke081515)
[11:15:40] Labs, Tool-Labs, operations, Database: labsdb1002 and labsdb1003 crashed, affecting replication - https://phabricator.wikimedia.org/T119315#1827787 (Luke081515) Open>Resolved Replag is gone: http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag Thanks for this quick fix.
[11:16:39] Labs, Tool-Labs, operations, Database: labsdb1002 and labsdb1003 crashed, affecting replication - https://phabricator.wikimedia.org/T119315#1827790 (jcrespo) p:Unbreak!>High The previous tables have been checked. Things seem back to normal.
[11:19:16] Labs, Tool-Labs, operations, Database: labsdb1002 and labsdb1003 crashed, affecting replication - https://phabricator.wikimedia.org/T119315#1827795 (jcrespo) This is not 100% fixed to me, some checks and actionables I mentioned are pending, although lower priority but I suppose we can track those on...
[11:26:34] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1827808 (jcrespo) How to check if this is fixed now?
[11:33:42] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1827821 (Shanmugamp7) >>! In T119414#1827808, @jcrespo wrote: > How to check if this is fixed now? You can use [[https://tools.wmflabs.org/guc/?user=198...
[12:24:06] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1827903 (jcrespo) This looks resolved to me, but I will let @Vituzzu close it or comment on it.
[12:27:37] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1827905 (Glaisher) I checked a few IPs that have made edits recently and it seems to be resolved.
[12:27:50] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1827906 (Glaisher) Open>Resolved a:jcrespo
[12:32:09] you may love this: https://phabricator.wikimedia.org/T71463#1827909
[12:33:35] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Isart was created, changed by Isart link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Isart edit summary: Created page with "{{Tools Access Request |Justification=Hello, I would like to get access to the read replicas to help with the DBA tickets on phabricator |Completed=false |User Name=Isart }}"
[12:48:04] Hello, do I need an account set up to save dashboards on http://graphite.wmflabs.org ?
[12:48:28] or are they saved locally (as it seems)
[12:48:54] or as it doesn't seem, sorry!
[12:51:49] odd, now I'm seeing one saved..please disregard http://graphite.wmflabs.org/dashboard/#maps_warper1
[12:53:05] ahh I think there is a little bug. The "share" URL for that graph is http://graphite.wmflabs.org/dashboard/maps_warper1 which gives either an error or a blank dashboard, but the URL given above seems to work
[13:07:35] can I create my own mysql db on tool labs?
[13:09:31] You should be able to...
[13:09:47] abartov: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Steps_to_create_a_user_database_on_tools-db
[13:10:47] Reedy: thanks! I didn't realize the prefix thing.
[13:10:52] heh
[13:10:53] :)
[13:11:30] Reedy: hmm, nope, access denied.
[13:12:01] Paste the command and the error?
[13:12:33] https://www.irccloud.com/pastebin/I1LjaFNp/
[13:13:05] well doesn't that suck
[13:13:06] * Reedy tries
[13:14:47] abartov: Try two underscores
[13:14:58] create database s52767__something;
[13:15:03] It's not obvious in the docs
[13:15:10] ERROR 1044 (42000): Access denied for user 'u1226'@'%' to database 'reedy_test'
[13:15:12] MariaDB [(none)]> CREATE DATABASE reedy__test;
[13:15:12] ERROR 1044 (42000): Access denied for user 'u1226'@'%' to database 'reedy__test'
[13:15:13] MariaDB [(none)]> CREATE DATABASE u1226__test;
[13:15:13] Query OK, 1 row affected (0.08 sec)
[13:16:08] Reedy: zomg two underscored!
[13:16:14] underscores, even.
[13:16:15] thanks!
[13:16:31] I think I might make that explicit in the docs
[13:17:30] Reedy: good idea.
:)
[13:18:32] done
[13:39:42] once upon a time only one underscore was required to create a db
[13:42:28] Reedy, it is documented
[13:42:49] jynus: yeah, but it's not exactly obvious if you don't copy/paste it etc
[13:43:05] so people look at it and go foo_bar not foo__bar
[13:45:15] It said: " The name of the credential user is followed by two underscores and then the name of the database: "
[13:45:36] before your edit
[13:50:41] but double clarifying is a good addition if that helps someone
[13:52:54] by the way, all (most) weirdness on labs has an actual reason (normally due to past incidents), even if it seems too weird at first
[13:53:30] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mardam was created, changed by Mardam link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Mardam edit summary: Created page with "{{Tools Access Request |Justification=I want to contribute to the wikidata-exports project |Completed=false |User Name=Mardam }}"
[14:04:08] jynus: so what's the actual reason behind double underscores? I personally remember creating a database with only a single underscore
[14:04:53] if I remember correctly, the previous grants
[14:05:11] were given with 'name__%'
[14:05:37] which meant that you had access to everything starting with your name
[14:05:57] this was later corrected to 'name\_\_%'
[14:06:23] so the fact that it was possible to create databases with one underscore was a permission error
[14:06:32] oh lol
[14:07:12] this had some security implications, but allow me to skip those
[14:07:40] (you can do archeology on phabricator, though, but I do not like finger pointing)
[14:08:58] well, I'm doing archaeology on operations/puppet right now
[15:56:09] Labs, Labs-Team-Backlog, Patch-For-Review, ToolLabs-Goals-Q4: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1828248 (coren) Open>Resolved This is done; labstore* hosts are now normal prod servers with base::admin.
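A note on the 14:04-14:06 explanation above: in MySQL/MariaDB grants, an unescaped '_' in a database-name pattern matches any single character and '%' matches any run of characters, so a grant on 'name__%' is looser than it looks. The Python sketch below only imitates that matching rule to show the difference between the two patterns. It is not the actual Labs grant code; grant_matches is a made-up helper, and 'u12260' is a hypothetical second user.

```python
import re


def grant_pattern_to_regex(pattern):
    """Translate a MySQL-style database-name pattern into a compiled regex.

    Unescaped '_' matches any single character, unescaped '%' matches any
    run of characters, and a backslash makes the next character literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: take the next one literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def grant_matches(pattern, db_name):
    """Return True if db_name is covered by the grant pattern (illustrative only)."""
    return grant_pattern_to_regex(pattern).fullmatch(db_name) is not None


OLD_PATTERN = "u1226__%"       # unescaped: each '_' is a one-character wildcard
NEW_PATTERN = r"u1226\_\_%"    # escaped: two literal underscores are required

for name in ("u1226_test", "u1226__test", "u12260_other"):
    print(name, grant_matches(OLD_PATTERN, name), grant_matches(NEW_PATTERN, name))
```

Under the old pattern all three names match, including the one that starts with a different user's prefix, which is presumably the kind of security implication hinted at above. Under the escaped pattern only 'u1226__test' matches, which lines up with the CREATE DATABASE results pasted at 13:15.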
[16:04:50] Labs, Discovery, Maps: Request to enable critical alert notifications for maps-warper instance on shinken.wmflabs.org - https://phabricator.wikimedia.org/T116053#1828271 (Chippyy) I've set up http://graphite.wmflabs.org/dashboard/#maps_warper1 which may be the kind of persistent monitoring we are after...
[17:22:21] Labs, Tool-Labs: Migrate some tools nodes away from labvirt1002, it's getting full - https://phabricator.wikimedia.org/T119399#1828460 (Andrew) labvirt1002 is running at 5% free space now... a little better but I'd like to get it down low enough that icinga isn't worrying (which is, I think, 8%)
[17:52:09] Labs, Labs-Infrastructure, operations, ops-eqiad, Patch-For-Review: Update tag and racktables for holmium: rename to labservices1002. - https://phabricator.wikimedia.org/T119533#1828624 (Andrew) NEW a:Andrew
[17:52:49] Labs, Labs-Infrastructure, operations, ops-eqiad, Patch-For-Review: Update tag and racktables for holmium: rename to labservices1002. - https://phabricator.wikimedia.org/T119533#1828624 (Andrew) um... not yet, though, I have to merge some other stuff.
[18:18:28] Labs, Tool-Labs, Incident-20150617-LabsNFSOutage: Re-enable cron for tools on tool labs - https://phabricator.wikimedia.org/T104614#1828792 (coren) Open>Resolved All crontabs that had entries currently commented out that were //not// commented out before the labstore crash and that had not been...
[18:19:41] Labs, Tool-Labs: Migrate some tools nodes away from labvirt1002, it's getting full - https://phabricator.wikimedia.org/T119399#1828805 (coren) p:Triage>High Evaluating migration candidates.
[18:39:44] Labs, Tool-Labs, Patch-For-Review: sge master not starting up on tools-master - https://phabricator.wikimedia.org/T109316#1828887 (coren) This is done, but comes with a (different, arguably better) caveat: rather than have to start the `gridengine-master` service if the instance is (re)booted, puppet h...
[18:39:48] Labs, Tool-Labs, Patch-For-Review: sge master not starting up on tools-master - https://phabricator.wikimedia.org/T109316#1828888 (coren) Open>Resolved
[18:39:49] Labs, Tool-Labs: Puppetize gridengine master configuration - https://phabricator.wikimedia.org/T95747#1828889 (coren)
[19:09:06] Labs, operations, Puppet: Self hosted puppetmaster is broken - https://phabricator.wikimedia.org/T119541#1829090 (yuvipanda) NEW
[19:09:56] Labs, operations: Untangle labs/production roles from labs/instance roles - https://phabricator.wikimedia.org/T119401#1829098 (yuvipanda) So my plan is to move everything that is to do with labs infrastructure in some form or way into labs/ and then rename all other things that run on top of labs to just b...
[19:26:13] Labs, Tool-Labs, Incident-20150617-LabsNFSOutage: Re-enable cron for tools on tool labs - https://phabricator.wikimedia.org/T104614#1829148 (valhallasw) Did you also send an e-mail to the affected accounts? The re-enabled cron might come as a surprise to some otherwise.
[19:38:33] Labs, Tool-Labs, Incident-20150617-LabsNFSOutage: Re-enable cron for tools on tool labs - https://phabricator.wikimedia.org/T104614#1829197 (coren) >>! In T104614#1829148, @valhallasw wrote: > Did you also send an e-mail to the affected accounts? The re-enabled cron might come as a surprise to some ot...
[19:56:37] Labs: Document support levels for tools and labs projects - https://phabricator.wikimedia.org/T116598#1829268 (chasemp) p:Normal>Low talked to @kaldari a bit and proposed a discussion time for next week
[20:00:37] Coren: umm, what cronjobs of mine ('legobot') did you enable? everything was fine...
[20:00:47] legoktm: Lemme check the diffs.
[20:01:02] tools.legobot@tools-bastion-01:~$ crontab -l
[20:01:02] crontabs/tools.legobot/: fopen: Permission denied
[20:01:03] wtf?
[20:01:21] ...crontab -e is empty...
[20:01:28] legoktm: Hang on.
[20:01:34] * legoktm doesn't touch anything
[20:02:13] Ah. How dumb. The patch changed ownership. :-)
[20:02:15] * Coren fixes that.
[20:02:30] Done.
[20:02:36] Now lemme see that diff of yours.
[20:03:02] -#0,10,20,30,40,50 * * * * jsub -N chu -once -mem 600M -quiet php /data/project/legobot/harej/chu.php
[20:03:02] +0,10,20,30,40,50 * * * * jsub -N chu -once -mem 600M -quiet php /data/project/legobot/harej/chu.php
[20:03:27] okay yeah, I had intentionally disabled that one...
[20:03:45] * legoktm re-disables
[20:04:14] Ah! It was enabled before the crash; then disabled when we did the recovery. Then you uncommented it, then commented it out again?
[20:04:30] Yeah
[20:04:31] Heh. No way my process could have figured that one out! :-)
[20:04:45] Coren: by looking at mtime? :)
[20:05:06] or disabling with a comment like # auto-disabled maybe?
[20:05:39] YuviPanda: That doesn't tell me *what* changed or not. I suppose we could version control crontabs though. :-)
[20:06:04] legoktm: Yeah, hindsight being 20/20 and all that. If I had to do it again, I'd comment out with something like #XXX# or somesuch.
[20:06:06] Coren: sure but if it has a newer mtime it means someone has touched it and done things to it and so we shouldn't touch it again?
[20:08:30] YuviPanda: Good point; I was only looking at actual contents. But a quick look at the timestamps shows only two such crontabs, and the other had only commented out comments.
[20:08:59] Coren: ok! but if you had already modified them, the mtime would've changed no?
[20:09:10] YuviPanda: I still have the originals handy.
[20:09:15] ah ok
[20:09:16] ok
[21:21:39] Labs, Discovery, Elasticsearch, Patch-For-Review: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#1829708 (EBernhardson) overnight load looked quite reasonable, turned on nlwiki, frwiki and eswiki this morning. merges now look to be backing up and getti...
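On the crontab thread above (20:03-20:09): the "#XXX# or somesuch" idea can be sketched roughly as below. This is only an illustration of the marker approach in Python, not the script that was actually run on Tool Labs; the marker string and both function names are assumptions made up for the example. The point is that a distinctive prefix lets a later pass re-enable only the lines the automated process disabled, leaving anything a user commented out by hand with a plain '#' alone.

```python
AUTO_MARKER = "#XXX#"  # hypothetical marker, after the suggestion in the log


def auto_disable(lines):
    """Comment out every active crontab line using the distinctive marker.

    Blank lines and lines that are already comments are left untouched.
    """
    disabled = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            disabled.append(AUTO_MARKER + line)
        else:
            disabled.append(line)
    return disabled


def auto_reenable(lines):
    """Re-enable only the lines carrying the marker; manual comments stay as they are."""
    return [
        line[len(AUTO_MARKER):] if line.startswith(AUTO_MARKER) else line
        for line in lines
    ]


job = "0,10,20,30,40,50 * * * * jsub -N chu -once -mem 600M -quiet php /data/project/legobot/harej/chu.php"

# Mass-disable after the outage, then the user later disables the job by hand.
after_outage = auto_disable([job])
user_edited = ["#" + job]

print(auto_reenable(after_outage))  # the job comes back
print(auto_reenable(user_edited))   # the user's manual comment is respected
```

With plain '#' used for both cases, the two states are indistinguishable from the file contents alone, which is the situation described at 20:04.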
[21:27:43] Labs, Discovery, Elasticsearch, Patch-For-Review: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#1829713 (EBernhardson) I've also increased the disk throughput limit from 20MB/s to 25MB/s. This will negatively impact query performance, but will help it...
[22:08:24] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1829846 (chasemp)
[22:08:25] Labs, Labs-Team-Backlog: Support bare-metal server allocation in labs -- bootstrap mode - https://phabricator.wikimedia.org/T95185#1829845 (chasemp)
[22:09:31] Labs, Discovery, Elasticsearch, Patch-For-Review: Replicate production elasticsearch indices to labs - https://phabricator.wikimedia.org/T109715#1829867 (EBernhardson) after applying those changes the merge rate on nobelium looks to be going back towards healthy: {F3011543}
[22:47:13] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1829992 (Vituzzu) Resolved>Open
[22:47:29] Tool-Labs-tools-Global-user-contributions, Stewards-and-global-tools: Global user contributions doesn't work - https://phabricator.wikimedia.org/T119414#1825671 (Vituzzu) Got the error message again :/
[22:58:47] Labs, Community-Tech: Labs project for the Community Tech team - https://phabricator.wikimedia.org/T118944#1830044 (DannyH)
[23:03:32] Labs, Tool-Labs, Database: Client loses connection to database replica and cannot connect to it any further - https://phabricator.wikimedia.org/T119577#1830067 (Giftpflanze) NEW
[23:13:39] ^ i'd really like this bug to be resolved fast
[23:15:17] gifti: You could start by actually putting some information in the bug
[23:15:37] Labs, Tool-Labs, Database: Client loses connection to database replica and cannot connect to it any further - https://phabricator.wikimedia.org/T119577#1830110
(yuvipanda) Can you provide more details? when does it lose connection? what error message are you getting when you try to connect again? Is it...
[23:15:42] i wonder what information that would be
[23:15:49] oh look
[23:15:55] gifti: Is this the first time you're reporting a bug?
[23:16:38] Generally assume nobody knows what you're talking about, so include: source system, destination system, kind of application, timestamps, and any information that could provide context
[23:17:49] Steps on how to reproduce tend to be very helpful gifti
[23:18:21] no, it's not my first time … but i didn't know what i could possibly add
[23:19:37] gifti: How I or someone else can reproduce it.
[23:21:48] Labs, Tool-Labs, Database: Client loses connection to database replica and cannot connect to it any further - https://phabricator.wikimedia.org/T119577#1830131 (Giftpflanze) The last run of my job started at 24/11/2015 21:15 UTC and ended at 22:43 (then I got the error that I lost connection). The code...
[23:23:40] gifti: it's generally helpful to provide exact error messages / code rather than descriptions (or in addition to descriptions)
[23:23:54] Labs, Tool-Labs, Database: Client loses connection to database replica and cannot connect to it any further - https://phabricator.wikimedia.org/T119577#1830162 (yuvipanda) Can you provide the exact error messages you are getting?
[23:23:58] yeah, but i already deleted them …
[23:24:17] gifti: Come on, YuviPanda is not a psychic.
[23:24:43] well, /I/ think this is pretty precise
[23:24:49] gifti: yeah, not much I can do without ability to reproduce
[23:24:57] the bug
[23:25:24] hm, is it enough for you to see the program?
[23:25:29] or should i do more?
[23:26:21] how about the exact error message to begin with and then maybe a small program that exhibits the behavior you are seeing?
[23:26:55] ok
[23:27:13] gifti: thanks!
[23:38:35] so, how can I figure out why my non-lighttpd service isn't reached?
[23:39:12] I followed the instructions with the portgrabber
[23:39:25] * YuviPanda mumbles about portgrabber, should replace it soooooon hoppeffullly
[23:39:31] abartov: which tool is this?
[23:39:39] um Coren are you around and able to help abartov?
[23:39:43] https://tools.wmflabs.org/authorlang-game/
[23:40:37] abartov: if Coren isn't around in 5mins I'll take a look
[23:40:54] thanks
[23:47:53] YuviPanda: do I need to 'webservice start' if I'm not using lighttpd?
[23:48:10] abartov: so where are the instructions you were following?
[23:48:15] I'm taking a look now
[23:48:35] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Other_web_servers
[23:50:23] abartov: ok let me investigate
[23:50:48] abartov: is this a ruby app?
[23:50:58] always :)
[23:51:09] :D
[23:52:31] hmmm
[23:52:37] it never gets scheduled on the webgrid nodes
[23:52:59] * YuviPanda tests
[23:53:18] hmmmm other things are scheduled there
[23:54:38] (this is the URL I expect to work, and return a bit of JSON
[23:54:41] https://tools.wmflabs.org/authorlang-game/game/main?action=desc
[23:55:04] ok!
[23:56:07] baaaam
[23:56:45] abartov: ok it's scheduled now
[23:56:52] abartov: and returning a 404
[23:57:09] abartov: I think the problem is now - are you expecting your URLs to be /authorlang-game/ or just /?
[23:57:15] it's gotta be the former unfortunately
[23:57:22] abartov: I'll amend docs now
[23:57:47] I can deal with the URLs.
[23:57:51] what was the problem scheduling it?
[23:57:53] abartov: and you don't need webservice commands
[23:58:05] abartov: so by default we still say 'give us a ubuntu precise host'
[23:58:15] abartov: except we don't have any ubuntu precise hosts for generic webservices
[23:58:28] abartov: so this has been broken for about 9 months now and you're the first person to notice
[23:58:43] YuviPanda: gah!
[23:58:56] I have a 80% done setup that'll get rid of all this and provide beautiful nice webservices that don't fuck with portgrabber, but so many things to do :(
[23:59:00] * YuviPanda weeps some more
[23:59:05] abartov: at least I fixed the doc now
[23:59:22] abartov: so you can restart your webserver the old fashioned way - qdel and then the full jstart
[23:59:28] or you can do qmod -rj
[23:59:39] abartov: I shall notify you when I fix this