[00:09:16] Wikibugs: Match on usage of Additional Hashtags, so that project renames don't break the bot - https://phabricator.wikimedia.org/T87825#1000074 (Quiddity) NEW
[00:17:56] Wikibugs: Match on usage of Additional Hashtags, so that project renames don't break the bot - https://phabricator.wikimedia.org/T87825#1000112 (Legoktm) Yes, it would be appreciated if you could fix the config. We had previously discussed storing immutable PHIDs rather than human readable names but decided...
[00:25:11] Quarry: Quarry does not respect ORDER BY sort order in result set - https://phabricator.wikimedia.org/T87829#1000128 (MahmoudHashemi) NEW a:yuvipanda
[00:37:45] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[01:07:46] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:43:39] can someone enable this tool? > http://tools.wmflabs.org/steinsplitter/rc-uploads.php
[04:43:53] it didn't come back after yesterday's reboot :(
[07:27:41] Wikibugs: wikibugs project renames - https://phabricator.wikimedia.org/T87846#1000544 (Quiddity) NEW
[07:28:04] Gerrit-Patch-Uploader: Gerrit Patch Uploader is down - https://phabricator.wikimedia.org/T87847#1000553 (Fomafix) NEW
[08:32:48] comets: a bit late, but I brought it back up
[08:33:40] Gerrit-Patch-Uploader: Gerrit Patch Uploader is down - https://phabricator.wikimedia.org/T87847#1000603 (yuvipanda) Open>Resolved a:yuvipanda After effects from the great ToolLabs restart of 2015
[08:35:45] In tool labs, how can I retrieve the contents of a specified page/revision?
[08:35:57] I can't find the text table on the replica databases...
[08:36:39] devunt: you’ll have to use the API
[08:36:41] or the dumps
[08:36:54] Wikimedia doesn’t use the text table at all, since there’s just way too much text.
[08:40:02] oh, I have to retrieve ~50 pages' contents in one http request
[08:41:51] but fifty api requests at once is too slow and also a waste of resources I think
[08:46:14] YuviPanda, any suggestion for my circumstance?
[08:46:23] I really have no idea how to deal with this
[08:49:18] devunt: I think you can retrieve multiple page contents in one HTTP request?
[08:51:48] that 'one HTTP request' means an api request from my tool?
[08:56:54] devunt: yes. action=query, maybe?
[08:56:59] I think you can pass in multiple page titles
[08:59:14] ok I'll try. thank you
[09:39:14] YuviPanda, It worked. thank you!
[09:44:36] devunt: yw!
[10:40:18] YuviPanda: Hi, It seems wikimedia-labs I-0000075f.eqiad.wmflabs is in a dead/frozen state. I can not access it anymore nor restart it. Any clue?
[11:46:57] Tool-Labs: Clean up list of projects on Tool Labs home page and add Tomcat tools - https://phabricator.wikimedia.org/T51937#1000799 (TTO) Just wanted to give a poke here. The home page, with its complete alphabetical list of tools, is longer than ever, and it is becoming increasingly less useful. I would ask...
[12:58:54] anybody here familiar with port assignment? my tomcat app on tool labs doesn't start as port 4001 is already in use by a different tomcat.
[13:13:54] PROBLEM - Free space - all mounts on tools-exec-14 is CRITICAL: CRITICAL: tools.tools-exec-14.diskspace._var.byte_percentfree.value (<33.33%)
[13:40:47] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1000846 (coren) My opinion is that we should keep that process fairly lightweight; tools are meant to be easy to be maintainable within the community and are required to be open sourced f...
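Editor's note on the 08:40-09:39 exchange about fetching roughly 50 page texts at once: the MediaWiki API's action=query accepts several titles separated by `|`, so one request can cover a whole batch. Below is a minimal Python sketch that only builds the request URLs and does not send them. The endpoint URL, the helper name `build_content_urls`, and the batch size of 50 (the usual per-request title limit for accounts without higher API limits) are assumptions for illustration and were not stated in the channel.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; substitute the wiki you actually query.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_content_urls(titles, batch_size=50):
    """Return one action=query URL per batch of titles (hypothetical helper)."""
    urls = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        params = {
            "action": "query",
            "prop": "revisions",
            "rvprop": "content",
            "titles": "|".join(batch),  # several titles in a single request
            "format": "json",
        }
        # urlencode percent-encodes the "|" separator as %7C
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in build_content_urls(["Foo", "Bar", "Baz"]):
        print(url)
```

With 120 titles this yields three URLs, so a tool needs three HTTP requests instead of 120.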
[15:08:07] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1000889 (LuisV_WMF) No objection here to the general idea, though as noted some care should be taken around personal data/passwords.
[15:58:52] hi
[16:00:26] could someone show me the path to access the bastion?
[16:00:57] thanks
[16:39:20] The_Photographer: If you want the bastion for tools, that's tools-login.wmflabs.org. The standard labs bastion is bastion1.wmflabs.org or bastion.wmflabs.org
[16:39:49] andrewbogott: could you show me the manual to begin to develop?
[16:39:52] Kelson: I'll take a look at your instance. Can you tell me what the actual name and project is for I-0000075f.eqiad.wmflabs?
[16:40:50] The_Photographer: It really depends on what you're hoping to do. do you mean for Tools?
[16:40:57] https://wikitech.wikimedia.org/wiki/Help:FAQ
[16:41:39] andrewbogott: I want to create a tool to check files uploaded to commons, specifically the size and the exif information
[16:42:32] The_Photographer: ok -- I can't help you with specifics (someone else might be able to) but this is a good place to start: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs
[16:42:50] andrewbogott: ok
[16:43:05] The_Photographer: I presume you have an account and tools access already?
[16:43:27] andrewbogott: well I remember my old shell account in toolserver
[16:43:53] andrewbogott: and today i have this message https://wikitech.wikimedia.org/wiki/User_talk:The_Photographer
[16:44:17] The_Photographer: ok, great. One more step, I need to add you to the 'tools' project. One moment.
[16:44:56] ok, done
[16:45:08] you should be able to log in to tools-login now, once you set up a keypair and such
[16:45:09] andrewbogott: ok thanks
[16:48:10] Kelson: OK, I located and started your instance, it should start behaving in 5 minutes or so
[17:04:25] andrewbogott: for formal procedure https://wikitech.wikimedia.org/wiki/Shell_Request/The_Photographer
[17:05:06] The_Photographer: thanks -- I should mark as already done, right?
[17:05:25] andrewbogott: of course yes
[17:22:16] Tool-Labs, WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1001016 (Aklapper) p:Triage>Normal
[17:26:41] !log tools reschedule all tomcat jobs
[17:26:47] Logged the message, Master
[18:25:18] Labs: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1001152 (faidon) NEW a:coren
[19:14:46] andrewbogott_afk: thanks, it works. Any way to reboot the VM on my own?
[19:24:31] https://tools.wmflabs.org/paste/ is not working
[19:34:14] MediaWiki-extensions-OpenStackManager: Convert OpenStackManager to use extension registration - https://phabricator.wikimedia.org/T87950#1001696 (Legoktm)
[19:34:17] * Nemo_bis would never have used a pastebin in labs
[19:35:29] Nemo_bis: you prefer to give stuff to an unrelated third party?
[19:36:57] phabricator has a pastebin that can be used too. I don't have access to that tool to restart it.
[19:37:22] <^d> Phab's pastebin is nice.
[19:39:07] Nikerabbit: yes
[19:39:52] MediaWiki-extensions-OpenStackManager: Convert OpenStackManager to use extension registration - https://phabricator.wikimedia.org/T87950#1001995 (Legoktm)
[19:39:59] how do you spell Phab?
[19:40:16] Nemo_bis: ok, I don't
[19:41:02] Nikerabbit: then there are phabricator and Domas http://p.defau.lt/new.html
[19:49:56] tools-exec-11 is temporarily unavailable?
[19:50:23] (webservice won't start.)
[19:51:07] hey
[19:51:15] a930913: webservices don't run on exec nodes, but on webgrid nodes
[19:51:30] Mjbmr: https://phabricator.wikimedia.org/paste/
[19:51:40] a930913: if qstat says 'queue unavailable', that's OK
[19:51:40] https://phabricator.wikimedia.org/paste/create/
[19:52:00] what's that?
[19:52:08] valhallasw`cloud: Lemme go to my computer.
[19:52:18] a930913: but if all webgrid nodes are unavailable, that would be an issue
[19:52:56] a930913: webgrid-05 looks OK in terms of load, so it should probably schedule there
[19:53:04] Mjbmr: that, among other things, is how you spell phab and where the paste function is
[19:54:04] Coren: webgrid-01 to -04 look severely overloaded; each has 300+% load listed on https://tools.wmflabs.org/?status
[19:54:18] valhallasw`cloud: "Your webservice is scheduled:
[19:54:21] queue instance "continuous@tools-exec-11.eqiad.wmflabs" dropped because it is temporarily not available
[19:54:40] a930913: that's... odd.
[19:54:44] valhallasw`cloud: Indeed.
[19:55:16] a930913: in any case, check qstat -j and check the queue details at the bottom
[19:55:35] valhallasw`cloud: What job number?
[19:55:36] a930913: that should tell you if all webgrid queues are full (there's 01 to 05)
[19:55:54] a930913: I thought webservice start tells you, but if not, just run qstat
[19:56:05] valhallasw@tools-login:~$ ssh tools-webgrid-01 uptime
[19:56:06] 19:55:22 up 1 day, 22:22, 0 users, load average: 25.50, 24.22, 24.54
[19:56:09] O_o
[19:56:15] what's sigma?
[19:57:21] a930913: qstat will return something like '7668932 0.30263 lighttpd-g tools.gerrit r 01/28/2015 02:09:48 webgrid-lighttpd@tools-webgrid 1'
[19:57:41] valhallasw`cloud: Overloaded errors.
[19:57:46] a930913: then qstat -j will tell you which queues have been dropped and for which reason
[19:57:48] valhallasw`cloud: Pastebinning output now.
[19:57:55] a930913: ok!
[19:58:00] Paste is not serviced...
[19:58:04] Woo.
[19:58:13] a930913: are all 01 to 05 there, or just 01-04?
[19:58:28] valhallasw`cloud: 01-04
[19:58:39] a930913: then your webservice should just run on -05....
[19:59:08] valhallasw`cloud: http://pastebin.com/AWpaCQ50
[20:01:17] a930913: doh! -05 is a trusty host, so it won't schedule there by default
[20:03:14] a930913: -01 is becoming more relaxed now, so the server should start
[20:03:52] valhallasw`cloud: Nope.
[20:04:12] :/
[20:07:07] valhallasw`cloud: Running on trusty doesn't work either.
[20:07:25] a930913: no, because it's still scheduled for a precise host
[20:07:33] a930913: I think you can use webservice2 to schedule it for a trusty host
[20:07:40] valhallasw`cloud: Yeah.
[20:07:42] Didn't work.
[20:07:52] :/
[20:08:02] valhallasw`cloud: hard resource_list: h_vmem=4g,release=trusty"
[20:08:16] that should work, yeah.
[20:08:56] and what doesn't work? does it stay in qw state?
[20:11:40] valhallasw`cloud: Ah, trusty /is/ working but `status` is still showing error.
[20:11:56] a930913: it's showing an E state?!
[20:12:02] a930913: that's super weird
[20:12:26] valhallasw`cloud: No, it shows a r state.
[20:12:29] But.
[20:12:31] tools.cluestuff@tools-login:~$ webservice status
[20:12:33] Your webservice is scheduled:
[20:12:35] queue instance "continuous@tools-exec-11.eqiad.wmflabs" dropped because it is temporarily not available
[20:13:02] a930913: oh. oooooooooooooh!
[20:13:13] that's just a bug in the status script!
[20:13:14] valhallasw`cloud: Oh?
[20:13:17] Oh.
[20:13:23] :D
[20:13:24] Ok, so trusty works.
[20:15:10] Weirdness.
[20:15:27] valhallasw`cloud: Also, any idea why my mailing list posts never get through?
[20:15:39] a930913: no. are you subscribed?
[20:15:45] valhallasw`cloud: Yes.
[20:16:36] a930913: hm. and you don't get any message back that it has been rejected?
[20:16:41] valhallasw`cloud: Nope.
[20:17:02] well, I don't have access to the labs-l settings, so I can't check if anything is wrong there
[20:17:11] maybe try re-subscribing
[20:18:54] valhallasw`cloud: Mmm, thanks for the help btw.
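Editor's note on the qstat troubleshooting above: the job number asked about at 19:55:35 is the first column of a qstat row, and the state column (`r` for running, `qw` for queued and waiting) is what the later "does it stay in qw state?" question refers to. A minimal Python sketch that pulls both out of the sample row quoted in the channel; the helper name `parse_qstat_line` is made up, and the column order is taken from that one sample row rather than from any scheduler documentation.

```python
# Sample row exactly as quoted in the channel at 19:57:21.
QSTAT_LINE = ("7668932 0.30263 lighttpd-g tools.gerrit r 01/28/2015 02:09:48 "
              "webgrid-lighttpd@tools-webgrid 1")


def parse_qstat_line(line):
    """Extract the leading columns of one qstat row (hypothetical helper).

    Only the first five whitespace-separated fields are read, because a
    waiting job may not list a queue in the later columns.
    """
    fields = line.split()
    return {
        "job_id": fields[0],
        "priority": float(fields[1]),
        "name": fields[2],
        "user": fields[3],
        "state": fields[4],
    }


if __name__ == "__main__":
    job = parse_qstat_line(QSTAT_LINE)
    print("job", job["job_id"], "is in state", job["state"])
```

The `job_id` value is what would be passed to `qstat -j` to see which queues were dropped and why.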
[20:24:30] Wikibugs: Limit number of projects - https://phabricator.wikimedia.org/T88011#1002250 (valhallasw) NEW
[20:27:53] (PS1) Merlijn van Deen: Limit shown projects to 4 [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011)
[20:28:51] (CR) Merlijn van Deen: "(Oxford comma ftw)" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011) (owner: Merlijn van Deen)
[20:28:53] (CR) jenkins-bot: [V: -1] Limit shown projects to 4 [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011) (owner: Merlijn van Deen)
[20:30:19] (PS2) Merlijn van Deen: Limit shown projects to 4 [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/187469 (https://phabricator.wikimedia.org/T88011)
[20:41:45] i'm under spam attack of bigbrother, failing again and again to start job ''. how can I tell it to stop? [tried to rm .bigbrotherrc but today it sent me mails again]
[20:43:53] Kelson: ordinarily you can reboot from the wikitech interface. This was an unusual case which defied the GUI.
[20:44:11] andrewbogott: ok. thx for the precision
[20:44:35] eranroz: I don't immediately know how to fix that, but Yuvi and Coren are both unreachable today so I guess I'll take a stab :) What's the name of the tool?
[20:45:05] eranroz: it's a known bug that removing .bigbrotherrc doesn't work
[20:45:16] andrewbogott: hewiki-tools
[20:45:18] eranroz: however, bigbrother should give up after 5 (?) attempts, I think?
[20:45:39] valhallasw`cloud: Ah, maybe you have better ideas about how to troubleshoot this?
[20:45:42] valhallasw`cloud: yes. but it gives up for today
[20:46:09] valhallasw`cloud: (only for today) tomorrow is a new day, and bigbrother tries again :)
[20:46:48] eranroz: maybe replace it by a bigbrotherrc that just starts a webservice
[20:46:59] that's unlikely to break... except the webgrid nodes are overloaded.
Hm.
[20:47:33] valhallasw`cloud: so that .bigbrotherrc content should be "webservice start"?
[20:47:41] I think so, yes
[20:48:05] valhallasw`cloud: Thank you! eranroz, bug me tomorrow if this continues to be a problem
[20:48:37] valhallasw`cloud: thanks, i will try it. andrewbogott - thanks
[21:00:17] I'm positively surprised by the tools mainpage taking "only" 4 seconds to load http://www.webpagetest.org/result/150129_7J_1006/
[21:00:34] (Objective measuring triggered by complaints of slowness)
[21:04:24] YuviPanda, the grid seems broken.
[21:05:45] petan, tools seems to be very slow. The grid seems to be responding very slowly.
[21:05:46] what?! O_o
[21:06:39] Cyberpower678: Yuvi is en route to India. Can you tell me more specifically what you're seeing?
[21:07:01] qstat takes forever to execute
[21:07:10] Tasks take forever to start.
[21:07:16] Tools take forever to load.
[21:07:37] Scratch number 3
[21:07:46] Tools are loading normally again.
[21:08:33] Things seem to be returning to normal.
[21:08:41] Great! I did nothing :)
[21:08:50] Must've been a momentary high load
[21:10:39] andrewbogott: could be NFS -- webgrid nodes were reporting ~50 load an hour ago
[21:11:22] tools-login is at 4 with just 4% cpu usage
[21:29:02] valhallasw`cloud: Cyberpower678 everything OK now?
[21:29:08] @mysql experts: does sb. know how i can force sql sequence database to use the same encoding as the replication tables? i tried http://pastebin.com/QSu8W2zD but it is not working for march/März which contains an umlaut
[21:29:12] I'll be flying for the next 30h
[21:30:16] YuviPanda, seems to be oiled well again.
[21:31:10] Cool
[21:32:05] hey guys. I am trying to make a query repeatedly and it is very very slow. maybe you know some way to rephrase it and speed it up a bit.
[21:32:26] SELECT count(page_title) FROM page p, pagelinks pl WHERE pl.pl_title=%s AND pl.pl_from=p.page_id AND p.page_is_redirect=1 AND p.page_namespace=0
[21:32:45] this basically says how many redirects a wikipedia article has
[21:32:54] pointing to it.
[21:34:24] marcmiquel: add pl_namespace in where
[21:35:14] aha. any other change?
[21:35:17] index is (pl_namespace,pl_title), so pl_title alone is not indexed
[21:36:55] SELECT count(page_title) FROM page p, pagelinks pl WHERE p.page_namespace=0 AND pl.pl_title=%s AND pl.pl_from=p.page_id AND p.page_is_redirect=1
[21:38:07] marcmiquel: i would write "SELECT count(page_id) FROM pagelinks INNER JOIN page ON pl_from=page_id WHERE pl_namespace=0 AND pl_title=%s AND page_is_redirect=1 AND page_namespace=0" it is more readable
[21:38:26] readable by humans or by the mysql db?
[21:38:32] by humans
[21:38:33] YuviPanda, scratch that. Something keeps rapidly killing the webservice on xtools.
[21:38:56] i intuitively think without inner joins
[21:39:02] is that a problem for my queries?
[21:39:06] and count(page_id) because it is the primary key and mysql knows that this is never null
[21:39:24] maybe for the query optimizer, you have to test it
[21:39:43] Cyberpower678: just xtools or the split out tools too?
[21:39:53] do i test it just using both queries?
[21:39:59] just xtools.
[21:40:12] But I literally restarted the webservice 20 minutes ago.
[21:40:25] And now it seems big brother rebooted it again.
[21:40:26] marcmiquel: yes, but think of the query cache while testing
[21:40:45] how can it affect query cache?
[21:41:39] Cyberpower678: I'm about to get on a flight sadly
[21:41:48] SET SESSION query_cache_type = OFF;
[21:41:51] Will be out of action for 30h
[21:42:59] YuviPanda, ever since labs rebooted, xTools has been crashing every few minutes I feel.
[21:43:22] Do you have any idea why?
[21:43:34] No.
[21:44:21] Merlissimo: why off?
[21:44:55] YuviPanda, not to mention excessive load times,
[21:45:41] marcmiquel: if you test query runtime while developing your second try will be much faster because of the cache
[21:46:15] but you should improve it, so that it is also fast on the first try
[21:46:52] aha. u're suggesting to stop the cache to really know what is going on
[21:47:02] YuviPanda, tools-login is slow again
[21:47:07] qstat
[21:47:40] * Cyberpower678 bangs his head on the table.
[21:48:30] andrewbogott: ^
[21:49:30] I'll look shortly
[21:50:58] Now it's fast again.
[21:51:12] Ok something is going on and MusikAnimal is noticing the same thing too.
[21:52:12] And slow again....
[21:53:28] andrewbogott, YuviPanda: and now I have two webservices running on xtools-ec??
[21:56:15] Merlissimo: your query with inner join is light years ahead of the others
[21:57:47] andrewbogott, something I am concerned about is that it takes a really long time before grid submitted jobs are running
[21:57:58] They sit in the queue for a really long time.
[21:58:30] OK. I'm poking but I don't see anything obvious.
[21:58:30] We may need to wait for Coren to recover
[21:58:33] I think this is all grid related.
[21:58:42] The grid queue is incredibly slow
[21:58:49] recover?
[21:58:54] sge scheduler interval is 15 seconds
[21:59:12] Cyberpower678: he's out sick, I believe
[21:59:17] Oh
[22:00:06] Gah. It's been 9 minutes and my webservice task is still sitting waiting to be started.
[22:00:38] Cyberpower678: webgrid is overloaded
[22:00:53] by what?
[22:00:58] (it's been 20+h for myself :D)
[22:01:24] MusikAnimal, T13|inClass: ping
[22:01:58] I'll ping you back on break
[22:02:14] T13|inClass, about getting our own project...
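Editor's note on the redirect-count query from 21:32-21:56: the suggested form (explicit INNER JOIN plus a pl_namespace filter, so the (pl_namespace, pl_title) index applies) can be tried out locally. This is a sketch only. It uses Python's sqlite3 with a tiny invented schema containing just the columns the query touches, so it shows the shape and result of the query, not MySQL's optimizer behaviour or timing. All rows are made up, the helper names are hypothetical, and sqlite3 takes `?` where the channel paste used `%s`.

```python
import sqlite3

# Query as suggested in the channel, with pl_namespace in the WHERE clause.
REDIRECT_COUNT_SQL = """
SELECT count(page_id)
FROM pagelinks
INNER JOIN page ON pl_from = page_id
WHERE pl_namespace = 0
  AND pl_title = ?
  AND page_is_redirect = 1
  AND page_namespace = 0
"""


def count_redirects(conn, title):
    """Count main-namespace redirects that link to `title`."""
    return conn.execute(REDIRECT_COUNT_SQL, (title,)).fetchone()[0]


def make_demo_db():
    """Build an in-memory database with invented rows, for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT,
            page_is_redirect INTEGER
        );
        CREATE TABLE pagelinks (
            pl_from INTEGER,
            pl_namespace INTEGER,
            pl_title TEXT
        );
        CREATE INDEX pl_namespace_title ON pagelinks (pl_namespace, pl_title);
    """)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?, ?)",
        [
            (1, 0, "Target", 0),
            (2, 0, "Redirect_A", 1),        # counted
            (3, 0, "Redirect_B", 1),        # counted
            (4, 0, "Ordinary_article", 0),  # links to Target, not a redirect
            (5, 1, "Talk_redirect", 1),     # redirect outside namespace 0
        ],
    )
    conn.executemany(
        "INSERT INTO pagelinks VALUES (?, ?, ?)",
        [(2, 0, "Target"), (3, 0, "Target"), (4, 0, "Target"), (5, 0, "Target")],
    )
    return conn


if __name__ == "__main__":
    print(count_redirects(make_demo_db(), "Target"))
```

Timing comparisons between the query variants still have to be done on the real replica, with the query cache switched off as suggested in the channel.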
[22:02:33] marcmiquel: after your problem is solved, do you have any idea for my question from before
[22:02:39] Cyberpower678: load average: 36.27, 33.99, 30.37
[22:02:42] on tools-webgrid-01
[22:02:49] even though Cpu(s): 0.2%us, 0.5%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
[22:02:58] My webservice isn't even running.
[22:03:16] sorry, Merlissimo, what was the question?
[22:03:16] It won't start. It's been idling in qw for the past 12 minutes
[22:03:19] yes, because the grid is not scheduling it, because the nodes are /already/ overwhelmed
[22:03:35] So what's causing the overload?
[22:04:16] marcmiquel: http://pastebin.com/QSu8W2zD is not working for march/März
[22:04:24] valhallasw`cloud, ^^
[22:04:36] Cyberpower678: want to try killing that job, and try again with webservice2 --release trusty start?
[22:04:45] andrewbogott, no
[22:04:50] That will schedule on a Trusty node which is somewhat experimental but should be idle at least
[22:04:58] yeah, that will schedule it on tools-webgrid-05 which is not overloaded as badly
[22:05:01] It also breaks xTools
[22:06:51] Cyberpower678: if you add release='*' to your job submission it could run on the trusty web grid which has high load but is not yet overloaded
[22:07:12] andrewbogott, I think MusikAnimal and T13|inClass would like to have our own xTools project to work on.
[22:07:26] xtools.wmflabs.org
[22:07:32] valhallasw`cloud: are you able to see why the webgrid is overloaded? It shouldn't be, ordinarily...
[22:07:36] Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
[22:07:36] vda 0.00 1.67 0.20 1.24 2.35 100.38 142.71 0.32 224.92 156.05 235.78 9.89 1.42
[22:07:39] webgrid 1-4 are in overloaded/alarm state, that's why no new jobs are scheduled there
[22:07:40] andrewbogott: ^ iostat
[22:07:44] what is the purpose of this query Merlissimo? it looks rather complicated. seq_0_to_360)a is what i don't get.
[22:08:05] andrewbogott: if I interpret that correctly, the wait time for requests to vda is 225ms
[22:08:20] andrewbogott: there's no cpu usage, so the high load also suggests I/O issues
[22:08:27] valhallasw`cloud: yeah, but that doesn't explain what it's doing :)
[22:08:38] marcmiquel: on dewiki there is a subpage created for every day and i want to get the pageid of the subpages for the last 360 days
[22:09:14] andrewbogott, can you provide us with our own project?
[22:09:36] marcmiquel: the sequence database simply returns a table with 360 rows having values from 0 to 360, increasing by one per row
[22:09:43] Cyberpower678: I can, but… is this an emergency? Tools will most likely return to normal as soon as Coren is able to do some routine maintenance.
[22:09:53] andrewbogott: I'm not sure what more I can test -- I don't think I can access NFS logs
[22:10:17] andrewbogott: (at least, I wouldn't expect non-root-users to be able to read them)
[22:10:17] andrewbogott, people complain quite fast when xTools goes down,
[22:10:18] Merlissimo: that's completely another level for me. I would make simpler queries and code.
[22:11:07] Cyberpower678: running on trusty is not possible?
[22:11:29] Merlissimo: YuviPanda tried it and it blew up.
[22:11:33] Merlissimo, nope. I get a string of errors, and a broken tool.
[22:12:17] Cyberpower678: your own project wouldn't help. iostat on bastion also has 220ms wait times.
[22:12:52] I thought with our own project comes our own resources to manage such as webservices.
[22:12:56] Merlissimo: the xtools code is incompatible with newer versions of php and hence breaks
[22:13:12] andrewbogott: xtools would still like to have its own project due to the number of tools we are maintaining to improve our ability to manage resources.
[22:13:48] Cyberpower678: sure, but if the filesystem doesn't work, everything still goes up in flames
[22:14:03] I think the time has come that we need to move out of tools.
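Editor's note on the iostat -x paste from 22:07:36: the roughly 225 ms wait read off at 22:08:05 is the `await` column. Rather than counting columns by hand, the header and the device row can be zipped together. A small Python sketch using the exact two lines from the log; the helper name `parse_iostat_row` is made up for illustration, and since column names differ between sysstat versions the field list should be treated as belonging to this paste only.

```python
# Header and device row exactly as pasted in the channel.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz "
          "await r_await w_await svctm %util")
ROW = ("vda 0.00 1.67 0.20 1.24 2.35 100.38 142.71 0.32 224.92 156.05 "
       "235.78 9.89 1.42")


def parse_iostat_row(header, row):
    """Map iostat -x column names to float values for one device row."""
    names = header.split()[1:]      # drop the "Device:" label
    device, *values = row.split()   # first field is the device name
    return device, dict(zip(names, map(float, values)))


if __name__ == "__main__":
    device, stats = parse_iostat_row(HEADER, ROW)
    print(device, "await:", stats["await"], "r_await:", stats["r_await"],
          "w_await:", stats["w_await"])
```

For this paste `await` is 224.92, with writes (`w_await` 235.78) waiting longer than reads (`r_await` 156.05).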
[22:14:06] Cyberpower678: but at least you'd get your web server started, that's a fact
[22:14:34] valhallasw`cloud, the file system seems to be reliable. Everything else in tools isn't.
[22:18:08] valhallasw`cloud: what flags did you use for your iowait? I'm not seeing the same stats
[22:18:15] andrewbogott: iostat -x
[22:18:38] thx
[22:18:50] I noticed these new errors in syslog on integration slaves: https://gist.github.com/Krinkle/b219a1d4d3d5112f3be8/raw - snafu or should I file a task?
[22:18:57] ldap_search_ext() failed
[22:20:06] Krinkle: are those ongoing? The ldap servers are certainly up...
[22:20:54] I restarted them a couple of days ago, so there might've been a brief interruption, although I didn't do them both at once
[22:21:59] 4 matches in the past few hours on integration-slave1007
[22:22:06] I'm not monitoring this. Just noticed it.
[22:22:25] Maybe check if it happens on other instances? e.g. tool labs maybe
[22:47:45] PROBLEM - Free space - all mounts on tools-exec-15 is CRITICAL: CRITICAL: tools.tools-exec-15.diskspace._var.byte_percentfree.value (<55.56%)