[00:32:20] 6Labs: 'virt1' entry at markmonitor? - https://phabricator.wikimedia.org/T102689#1919296 (10Andrew) Doni writes: ``` HI Andrew, Here is the response we got from the registrar. Please be advised the host is associated to the domain that is not in your sponsorship. Please note that you can rename the host fo...
[00:55:30] 6Labs, 10WikiProject-X, 7Tracking: New Labs project: WPX - https://phabricator.wikimedia.org/T122534#1919338 (10Andrew) I'm happy to set up a project, but can you tell us a bit more about what particular limitation you're hitting in tool labs?
[01:17:16] YuviPanda: what’s your favorite example of something that queries ldap from python?
[01:17:25] I know you dislike ldapsupportlib, for example
[01:18:51] andrewbogott: the ssh ldap lookup thing is not too bad, but I like create-dbusers better
[01:19:15] andrewbogott: the former works on precise and the latter doesn't
[01:20:09] ok
[01:20:13] I think I want to work on precise
[01:20:32] wait, ‘the ssh ldap lookup thing’ — any idea what that’s actually called?
[01:21:30] andrewbogott: ssh-key-ldap-lookup
[01:21:40] ok, probably I could have figured that out
[01:21:41] thanks :)
[01:21:45] :D
[04:13:42] (03CR) 10Legoktm: [C: 032] Generate a gitinfo.json to be included in tarballs [labs/tools/extdist] - 10https://gerrit.wikimedia.org/r/262366 (https://phabricator.wikimedia.org/T122769) (owner: 10Legoktm)
[04:26:33] (03Merged) 10jenkins-bot: Generate a gitinfo.json to be included in tarballs [labs/tools/extdist] - 10https://gerrit.wikimedia.org/r/262366 (https://phabricator.wikimedia.org/T122769) (owner: 10Legoktm)
[05:00:54] 6Labs: 'virt1' entry at markmonitor? - https://phabricator.wikimedia.org/T102689#1919601 (10RobH) a:5Andrew>3RobH I got my MarkMonitor login fixed so I should be able to run this down in the portal and see if we cannot change it as advised. We're in our staff meetings tomorrow and Friday, so I likely won't...
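A minimal sketch of the kind of LDAP-from-Python lookup discussed above (in the spirit of ssh-key-ldap-lookup, not its actual implementation). The server, base DN, and attribute names are illustrative placeholders; only the filter-building helpers below are meant to run as-is.

```python
def escape_filter_value(value):
    """Escape characters that are special in LDAP search filters
    (RFC 4515), so user input cannot alter the filter structure."""
    special = {'\\': r'\5c', '*': r'\2a', '(': r'\28', ')': r'\29', '\x00': r'\00'}
    return ''.join(special.get(ch, ch) for ch in value)

def ssh_key_filter(uid):
    """Build a search filter for one user's posixAccount entry."""
    return '(&(objectClass=posixAccount)(uid=%s))' % escape_filter_value(uid)

# With python-ldap the actual lookup would then be roughly (placeholder
# server and base DN, not the Wikimedia configuration):
#   import ldap
#   conn = ldap.initialize('ldap://ldap.example.org')
#   results = conn.search_s('ou=people,dc=example,dc=org',
#                           ldap.SCOPE_SUBTREE, ssh_key_filter('someuser'),
#                           ['sshPublicKey'])
```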
[06:04:39] 6Labs, 10WikiProject-X: New Labs project: WPX - https://phabricator.wikimedia.org/T122534#1919640 (10Harej)
[06:42:33] 6Labs, 10WikiProject-X: New Labs project: WPX - https://phabricator.wikimedia.org/T122534#1919661 (10Harej) @Andrew, our tool labs project relies on a complex network of Python scripts. The scripts occasionally fail and I have no way of telling when this happens. This is because of how the grid engine reports...
[09:22:07] 10Quarry: Cannot download data from a query with Unicode characters in its title - https://phabricator.wikimedia.org/T123031#1919728 (10Dalba) 3NEW
[09:27:17] 10Quarry: Cannot download data from a query with Unicode characters in its title - https://phabricator.wikimedia.org/T123031#1919740 (10Dalba)
[09:31:16] 6Labs, 10wikitech.wikimedia.org: Promote @valhallasw to contentadmin on wikitech - https://phabricator.wikimedia.org/T123032#1919747 (10scfc) 3NEW
[10:06:09] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1919778 (10valhallasw) 3NEW
[10:09:48] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1919796 (10valhallasw) BDB issues again. From /data/project/.system/gridengine/spool/qmaster/messages: ``` 01/07/2016 09:48:45|worker|tools-grid-master|E|The job -j of user(s) tools.toolschecker does not exist 01/07/2016 09:48:45|...
[10:12:17] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1919799 (10valhallasw) Issues started early this morning: ``` 01/07/2016 04:05:50|worker|tools-grid-master|W|rescheduling job 2005690.1 01/07/2016 04:06:11|worker|tools-grid-master|E|error writing object with key "USER:tools.zkbot"...
[10:21:52] 6Labs, 10Labs-Infrastructure, 7LDAP: No LDAP records for two users?
- https://phabricator.wikimedia.org/T123036#1919806 (10scfc) 3NEW
[10:31:12] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1919815 (10valhallasw) I tried the following: * make a backup of the database, * run db_recover on that, * move the recovered database in place but this did not solve the issue. However, moving the original data files back and...
[10:42:11] 6Labs, 10Labs-Infrastructure, 7LDAP: No LDAP records for two users? - https://phabricator.wikimedia.org/T123036#1919822 (10valhallasw) I think @Andrew removed the users from ldap.
[10:55:11] valhallasw`cloud: <3
[11:02:38] valhallasw`cloud: I did a bunch of tests and everything seems ok
[11:09:37] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1920015 (10yuvipanda) p:5Unbreak!>3High Augh. And <3 to @valhallasw for dealing with it! I did a bunch of tests and everything seems ok. I'll check back on it tomorrow to see if... anything seems amiss.
[11:18:21] YuviPanda: I'm not even sure what fixed it, though
[11:18:30] in the end, I think it just amounted to 'restart the grid master'
[11:18:44] which sounds like a bad way to deal with corruption, generally...
[11:21:59] valhallasw`cloud: yeah
[11:22:04] valhallasw`cloud: i wonder if bdb needs to be on NFS
[11:22:16] valhallasw`cloud: if we can sacrifice 'autofailover' (which doesn't really work anyway...)
[12:40:48] /msg NickServ VERIFY REGISTER thparkth uylbqudukdlb
[12:40:57] sorry about that :)
[12:41:32] darn extra space
[12:55:22] YuviPanda: mmm. I think the toolserver did without, but I'm not 100% sure. We can also consider having tools-master export NFS rather than having everything on /data
[12:55:37] YuviPanda: otoh, I'd rather keep stuff as much the same as we can, and just move to k8s asap
[13:03:21] hi guys! :) just a quick question. is there an article which lists wikipedia language editions by year of creation or can I retrieve this info from mediawiki in tool labs?
[16:06:38] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1920358 (10Phe) Not all things are ok, qstat --> 1958791 0.30662 lighttpd-p tools.phetoo dr 01/04/2016 23:48:09 webgrid-lighttpd@tools-webgrid 1 it runs on tools-webgrid-lighttpd-1205 but that task no longer exists on t...
[16:12:34] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1920372 (10Phe) task 2052506 on the same server (dplbot) seems to have the same trouble
[16:18:47] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1920393 (10valhallasw) Force-deleted job 1958791. Dplbot's webservice is running: ``` valhallasw@tools-bastion-01:~$ ssh tools-webgrid-lighttpd-1205.eqiad.wmflabs ps aux | grep dpl 51290 13527 0.0 0.0 52160 2844 ? Ss...
[16:22:05] 6Labs, 10Tool-Labs: tools-proxy lost dplbot - https://phabricator.wikimedia.org/T123072#1920409 (10valhallasw) 3NEW
[16:22:57] 6Labs, 10Tool-Labs: tools-proxy lost dplbot - https://phabricator.wikimedia.org/T123072#1920417 (10valhallasw)
[16:26:35] I'm having insane packet loss to labs, but it might just be my connection
[16:27:19] (something like 35%)
[16:32:45] 6Labs, 10Tool-Labs: SGE master down - https://phabricator.wikimedia.org/T123034#1920439 (10Phe) but marked as Deleting on toollabs status page since some hours: 2052506 lighttpd-dplbot dplbot Webgrid-lighttpd / Deleting 2016-01-07 11:14:53 28s 1/3
[16:44:30] 6Labs, 10Tool-Labs: tools-proxy lost dplbot - https://phabricator.wikimedia.org/T123072#1920458 (10valhallasw) ``` HSET prefix:dplbot .* http://tools-webgrid-lighttpd-1205.tools.eqiad.wmflabs:54744 HDEL prefix:dplbot .* HSET prefix:dplbot .* http://tools-webgrid-lighttpd-1205.tools.eqiad.wmflabs:33555 ``` so...
[16:45:02] 6Labs, 10Tool-Labs: tools-proxy lost dplbot - https://phabricator.wikimedia.org/T123072#1920459 (10valhallasw) 5Open>3Resolved a:3valhallasw
[16:53:25] is it possible to do lighttpd + fastcgi python on toollabs?
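The HSET/HDEL commands quoted in T123072 above show how the tools proxy keeps its routing table in a Redis hash: one `prefix:<tool>` key per tool, mapping a URL pattern to a backend URL. A toy sketch of that layout (the helper name `route_args` is my own, not part of the proxy code):

```python
def route_args(tool, backend_host, backend_port, pattern='.*'):
    """Build the (hash key, field, value) triple the proxy stores:
    HSET prefix:<tool> <url pattern> http://<backend>:<port>"""
    return ('prefix:%s' % tool, pattern,
            'http://%s:%d' % (backend_host, backend_port))

key, field, value = route_args('dplbot',
                               'tools-webgrid-lighttpd-1205.tools.eqiad.wmflabs',
                               33555)
# With redis-py, re-registering the route would then be roughly:
#   import redis
#   redis.Redis().hset(key, field, value)
```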
[17:05:22] valhallasw`cloud: thanks for your vandalism cleanup — clearly I missed a bunch of edits last night.
[17:07:01] 6Labs, 10Labs-Infrastructure, 7LDAP: No LDAP records for two users? - https://phabricator.wikimedia.org/T123036#1920476 (10Andrew) yeah, sorry -- I was trying to stop ongoing vandalism and used the first approach I could think of. I need to brush up on my wiki management skills :(
[17:12:39] phe: yes, please see the docs for an example. You can also use uwsgi
[17:20:18] MusikAnimal, ping
[17:20:26] allo
[17:58:51] valhallasw`cloud: yeah, I think so too.
[17:59:08] valhallasw`cloud: I think there's consensus here that we should just do k8s and plug it into webservice + jsub
[17:59:22] YuviPanda: right.
[17:59:29] valhallasw`cloud: I'm building 'official' wikimedia images now / yesterday
[17:59:32] YuviPanda: we don't have any logs on the nfs server about which server wrote to what file, I suppose?
[17:59:41] these are ones that do not require us to trust dockerhub
[17:59:45] valhallasw`cloud: no don't think so
[17:59:58] I'm wondering if the issue is that master and shadow (or maybe some other host?) write to the bdb files at the same time
[18:00:20] hmm
[18:00:25] but the shadow isn't supposed to
[18:00:29] valhallasw`cloud: I wonder if we should shut down the shado
[18:00:32] *nod*
[18:00:32] w
[18:00:51] we can't have shadow/master without nfs, right?
[18:01:34] valhallasw`cloud: yeah, but we can probably make hot backups of bdb
[18:03:22] valhallasw`cloud: should start building docker images
[18:03:31] tools-master docker images? :P
[18:03:33] no
[18:03:35] :P
[18:03:38] exec node ones
[18:03:59] ?
[18:06:45] 6Labs, 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure: deployment-mediawiki03 : apt broken trying to reach out webproxy.eqiad.wmnet - https://phabricator.wikimedia.org/T122953#1920548 (10faidon)
[18:07:03] valhallasw`cloud, I was wondering if there is anything faster than a binary search?
[18:07:53] I'm guessing no.
[18:08:01] But I thought I'd ask.
[18:08:42] Cyberpower678: I think it's the same as searching in a sorted array, so O(log n)
[18:09:37] valhallasw`cloud, huh?
[18:09:56] Cyberpower678: https://en.wikipedia.org/wiki/Big_O_notation
[18:11:05] oh boy.
[18:11:31] Cyberpower678: it basically tells you how the runtime scales with the number of elements (in your case, the number of revisions)
[18:11:35] The binary search does keep my memory down, but the time is still pretty high.
[18:12:05] x being elements, y being time?
[18:12:13] yes
[18:12:29] but it doesn't tell you what the prefactor is
[18:13:06] This is why I'm working on caching that data in a database.
[18:13:09] I'm not sure how you're retrieving revisions, but I imagine that is probably your bottleneck?
[18:14:18] valhallasw`cloud, I start in the middle of the revision.
[18:14:22] history
[18:15:25] Then I increment/decrement the search needle to N/4 based on results, noting the upper and lower range accordingly.
[18:16:01] And so on until the revision history is a decent size, then that gets downloaded and searched through.
[18:16:46] how much time do those parts take, roughly?
[18:17:24] * Cyberpower678 processes some data from the execution profilers.
[18:18:20] if most of the time is spent in the first part, you can trade in some bandwidth for speed, by loading multiple revisions at once (i.e. instead of just loading N/2, you preload N/16, 2N/16, ....). You could also check for simple things such as keeping connections open
[18:18:53] if it's in the second part, you could choose to reduce the 'decent size' further by taking a few more steps first
[18:19:29] It's the first part
[18:19:46] Average execution time for the function is 2.02 seconds per URL
[18:20:06] that's quite a lot.
[18:20:11] Exactly
[18:20:28] this is running from tool labs, right?
[18:20:35] I've been working on caching the data, so it can be recalled from a database in the future but...
[18:20:47] No.
My computer
[18:21:14] what's the api call you're making?
[18:21:15] I can use profilers on my computer to track SQL, memory, hits, and execution time.
[18:22:48] I doubt it's the API calls. The profiler suggests that the average curl_exec time is 226 ms
[18:23:18] that's still a lot, but what on earth are you doing that takes another 1.8 seconds?
[18:24:13] I need to give it a closer read, give me a sec...
[18:24:45] Oh wait, it is the API calls
[18:25:03] If you do it enough times, like 10, you can easily get 2 seconds
[18:25:10] right
[18:25:32] you're doing something like https://en.wikipedia.org/w/api.php?action=query&revids=1&export ?
[18:26:03] action=query&prop=revisions&format=php&rvdir=newer&rvprop=timestamp%7Ccontent&rvlimit=1&rawcontinue=&rvstartid={$history[$needle]['revid']}&rvendid={$history[$needle]['revid']}&titles=".urlencode( $page )
[18:26:45] These are the parameters for the first part.
[18:27:49] *nod*
[18:28:05] keeping the connection open or not does not seem to matter a lot
[18:28:27] The curl handle remains open until the function concludes.
[18:29:07] I'm not sure if that also means curl keeps the connection open -- in any case, if I test with requests, the response time is ~200ms irrespective of the method
[18:30:42] however, on tool labs it's ~60 ms for a new connection and ~40 ms for an existing connection
[18:30:49] so that gives you a factor of five
[18:31:17] So that's something out of my control unless I use asynchronous background queries.
[18:31:44] And query ahead to make it available for use, when the program needs it.
[18:32:04] yeah. And you can ask for multiple revisions in one go (the N/16, 2N/16, ... etc trick)
[18:32:32] Won't it compile from one to the other though?
[18:33:03] Rather than giving me 1 from both ends?
[18:34:37] valhallasw`cloud, ^
[18:51:30] Cyberpower678: not sure what you mean
[18:57:24] valhallasw`cloud, how can I retrieve 3 specific revisions of a page with a single API query?
[18:57:34] Cyberpower678: &revids=1|2|3
[19:05:09] ugh, the NFS backup host might be having I/O issues
[19:05:29] YuviPanda: could that explain the load issues on labstore1001?
[19:06:02] no
[19:08:55] valhallasw`cloud, oh
[19:09:03] I never even noticed that.
[19:14:26] backups almost back now
[19:14:29] but not quite yet
[19:51:38] 6Labs, 10Labs-Infrastructure, 7LDAP: No LDAP records for two users? - https://phabricator.wikimedia.org/T123036#1920760 (10scfc) 5Open>3Invalid a:3scfc No problem.
[20:49:45] YuviPanda: uhhhhh
[20:50:00] The last Puppet run was at Thu Dec 17 17:19:43 UTC 2015 (30448 minutes ago).
[20:50:22] [12:50:09] Labs-project-extdist: Puppet is not running on extdist instances - https://phabricator.wikimedia.org/T123090#1920839 (Legoktm) NEW
[20:51:16] (03PS1) 10Legoktm: Send Labs-* to `#wikimedia-labs` [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/262920
[20:51:33] (03PS2) 10Legoktm: Send Labs* to `#wikimedia-labs` [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/262920
[20:51:48] (03CR) 10Legoktm: [C: 032] Send Labs* to `#wikimedia-labs` [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/262920 (owner: 10Legoktm)
[20:52:25] (03Merged) 10jenkins-bot: Send Labs* to `#wikimedia-labs` [labs/tools/wikibugs2] - 10https://gerrit.wikimedia.org/r/262920 (owner: 10Legoktm)
[20:52:41] valhallasw`cloud: around? :/
[20:52:50] legoktm: ya
[20:53:03] so puppet isn't running on the extdist instances, and hasn't since Dec 17th
[20:53:10] do you know how I could debug why?
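The revision-bisection strategy Cyberpower678 describes earlier (probe the middle of the history, narrow the range, repeat) can be sketched as below. This is an illustrative reconstruction, not his actual code: the toy `history` list stands in for revision text that would really come from one `action=query&prop=revisions` API call per probe, and it assumes the property being searched for is monotonic (once the link appears, it stays).

```python
def first_revision_with(revisions, predicate):
    """Return the index of the earliest revision satisfying `predicate`,
    assuming the predicate, once true, stays true for later revisions
    (e.g. "the link has been added"). Needs only O(log n) probes."""
    lo, hi = 0, len(revisions)
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(revisions[mid]):
            hi = mid          # revision `mid` already has it: look earlier
        else:
            lo = mid + 1      # not yet present: look later
    return lo                 # == len(revisions) if no revision matches

# Toy stand-in for fetched revision text, oldest first:
history = ['', '', 'http://example.org', 'http://example.org']
idx = first_revision_with(history, lambda text: 'example.org' in text)
# idx == 2
```

To trade bandwidth for latency as suggested in the discussion, each probe could request several candidate revisions in one API call (the `&revids=1|2|3` form) instead of one at a time.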
[20:53:13] !log tools.wikibugs Updated channels.yaml to: db26b7db94db89a49fac63df54d0189cf39ffc90 Send Labs* to `#wikimedia-labs`
[20:53:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[20:53:31] legoktm: tail /var/log/puppet.log
[20:54:02] Error: Failed to apply catalog: Parameter path failed on File[undef]: File paths must be fully qualified, not 'undef' at /etc/puppet/modules/extdist/manifests/init.pp:52
[20:54:02] Wrapped exception:
[20:54:02] File paths must be fully qualified, not 'undef'
[20:54:46] oh fuck
[20:55:03] https://gerrit.wikimedia.org/r/#/c/259639/
[20:58:07] valhallasw`cloud: thanks. How does https://gerrit.wikimedia.org/r/#/c/262921/ look?
[20:59:01] does [ path1 path2 etc] work for file?
[20:59:02] oh, it was used below
[20:59:15] ya, lgtm
[22:37:22] 6Labs, 10Tool-Labs: missing database on replica server - https://phabricator.wikimedia.org/T105713#1921030 (10jcrespo) Just wanted to say that this happened recently again- I had to drop the table due to corruption. Ping me if you want me to recreate as empty.
[23:20:56] 6Labs: Provide a simple way to backup arbitrary files from instances - https://phabricator.wikimedia.org/T104206#1921036 (10chasemp) >>! In T104206#1812133, @Halfak wrote: > Just +1ing this. > > I'm struggling to implement a robust backup strategy for #ores and #wikilabels right now. Because I don't want the...
[23:37:49] 6Labs, 7Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#1921059 (10Andrew)
[23:37:51] 6Labs, 10WikiProject-X: New Labs project: WPX - https://phabricator.wikimedia.org/T122534#1921055 (10Andrew) 5Open>3Resolved a:3Andrew @Harej -- I've created a new project called 'wpx' with you as the sole project admin. You can add additional admins via the 'Manage Projects' tab and create instances vi...
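The Puppet failure quoted in the extdist debugging above ("File paths must be fully qualified, not 'undef'") typically comes from a `file` resource whose `path` parameter evaluates to undef at catalog compile time. A hypothetical fragment showing the failure mode; the resource and variable names are illustrative, not the actual extdist manifest:

```puppet
# If $log_dir is undef when the catalog compiles, this resource fails
# validation with exactly the error above ("File paths must be fully
# qualified, not 'undef'").
file { 'extdist-log-dir':
  ensure => directory,
  path   => $log_dir,   # must be an absolute path, e.g. '/var/log/extdist'
}
```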