[00:18:09] Betacommand: X can play tricks.. even a super-experienced colleague gets cornered sometimes
[00:23:44] darkblue_b: Server wouldn't /have/ X in the first place, so whatever the issue was that wasn't it. :-)
[00:24:34] * Betacommand goes back to figuring out how to mount a windows share in linux
[00:26:48] oh right - hm
[00:26:54] tasksel --task-packages server
[00:27:31] images here are often debian/ubuntu ?
[00:27:57] Ubuntu here.
[00:28:10] Both Precise and Trusty are available.
[00:31:47] Coren: people may be interested in geo things.. I look after PostGIS here.. http://svn.osgeo.org/osgeo/livedvd/gisvm/trunk/bin/
[00:31:59] I am not that great with basic X ;-)
[00:32:53] that's predicated on LUbuntu 14.04 for the last release cycle
[00:33:02] it was XUbuntu before that...
[00:33:49] For the most part, the XUbuntu/LUbuntu/KUbuntu are the same except for the window manager.
[00:34:19] If it works on one, it'll work on any (though installing it might need to install more libraries, etc)
[00:34:38] ok makes sense.. I don't deal with the GUI menus
[00:36:37] that build system writes log file.. I wrote a hack that parses the output, and makes a CSV of the packages that are installed (and de-installed)
[04:49:18] !log deployment-prep clean up coredump on deployment-prep
[04:49:20] Logged the message, Master
[04:52:57] YuviPanda: were those hhvm-related?
[04:53:09] if so, those should automatically be moved to the shared space, if not, that's a bug
[04:53:36] YuviPanda: also, if you !log in -qa it goes to the RelEng SAL and I won't miss it/have to catch it in here :P
[04:55:48] greg-g: yeah, ori moved 'em to /var about a week ago
[04:55:58] greg-g: and you could imagine how that affects labs
[04:56:15] greg-g: it was a hhvm core dump
[04:56:28] greg-g: on labs, /var fills up even after one core dump
[05:00:10] greg-g: also our SAL setup needs fixing, I think
[05:14:30] Wikimedia Labs / deployment-prep (beta): Puppet failures on deployment-pdf01 - https://bugzilla.wikimedia.org/73506 (Yuvi Panda) NEW p:Unprio s:normal a:None Error: Sysctl::Parameters[wikimedia base]: Could not evaluate: can't dup Symbol Notice: instanceproject: deployment-prep Notice: /Sta...
[05:17:20] !log deployment-prep force apt-get install -f to unstuck puppet
[05:17:23] Logged the message, Master
[05:19:28] heh, that's one more machine fixed
[05:51:31] YuviPanda: I just got love from icinga
[05:51:39] kart_: yup, just fixed it :)
[05:52:11] kart_: I managed to sneak in a puppet variable called '$lover_name'
[05:52:14] Thanks :)
[05:52:24] kart_: yw!
[05:52:40] kart_: oh, you might know. how does one disable coredumps forever? :)
[05:54:00] YuviPanda: hhvm?
[05:54:10] kart_: yeah, but literally, I want to disable core dumps on labs
[05:56:18] YuviPanda: that should be application specific, but I don't have any idea about 'all coredumps'
[05:56:24] alright
[09:35:05] Wikimedia Labs / deployment-prep (beta): Puppet fails to run on deployment-eventlogging02 - https://bugzilla.wikimedia.org/73479#c2 (Yuvi Panda) PATC>RESO/FIX Fixed with https://gerrit.wikimedia.org/r/#/c/173758/ :)
[09:36:07] Wikimedia Labs / deployment-prep (beta): Puppet fails on deployment-sca01 - https://bugzilla.wikimedia.org/73508 (Yuvi Panda) NEW p:Unprio s:normal a:None yuvipanda@deployment-sca01:~$ sudo puppet agent -tv Warning: Unable to fetch my node definition, but the agent run will continue: Warnin...
[12:58:38] Wikimedia Labs / Infrastructure: wikidata (federated?) database not available for many wikis - https://bugzilla.wikimedia.org/73511 (Magnus Manske) NEW p:Unprio s:normal a:None Despite #57876, for many smaller languages, the (federated?) wikidatawiki_p database is missing/empty: MariaDB [e...
[13:28:26] YuviPanda: is there a way to check if i'm breaking labs ?
[14:35:30] matanya: If labs breaks when you start and works again when you stop then you're breaking labs. :-)
[14:35:46] matanya: Seriously though, what are you apprehending?
[14:36:17] Coren: andrewbogott set me up on virt1016, with my super special vm, which i max 100% load
[14:36:43] like 100% of time, wanted to know what is the effect on the rest of users
[14:37:21] and other things, like, can i clone this machine if i need more horse power, or i'm imposing load on others
[14:37:37] trying to be social ... :)
[14:40:15] Coren: https://imgur.com/7iXRxYL <-- status of the poor machine
[14:46:56] matanya: Well, the one resource we're shortest on is RAM and not CPU so it should be okay.
[14:47:18] Good news Coren
[14:47:19] At the very least, you're not all that noticeable globally.
[14:47:47] Coren: so I can clone this machine, to spread the load ?
[14:49:10] That's not going to spread anything, actually. If you're not actually processing more data it's slightly /less/ efficient because of the context switching - and if you're processing more data then you're not spreading the load you're just increasing it. :-)
[14:51:26] Coren: in spread I mean i can use more machines to do encoding at the same time = which is in fact permission to increase load :)
[14:52:42] Ah, trying your hand at spin are ya? Give me a minute to take a look at how things are going.
[14:52:57] How much ram does that instance take?
[15:01:04] between 8 and 16
[15:01:30] matanya: Hm. Can you survive without a second one until the new hardware gets to us?
[15:01:37] yes
[15:01:46] just slower encoding
[15:02:00] matanya: Then I'd rather you wait a bit - we're already seriously overcommitted in ram. :-(
[15:02:28] ok, i can configure it to use less, and draw more cpu power
[15:02:37] but no rush anyway
[15:03:04] To be actually helpful, you'd also need to use actually smaller /instances/ otherwise what your code doesn't use the kernel will.
[15:03:40] ok
[15:04:02] i'll keep this one running, and won't bring new ones until approved
[15:04:23] Shouldn't be all that long; last I heard the hardware was on its way.
[15:04:34] and then utilise more cpu, (preferably 16-32) and less ram (i.e. 8?)
[15:05:08] Also, we'll get the codfw Labs up within the next couple months and that'll vastly increase our capacity.
[15:06:37] Thanks Coren :) in the mean time you can sit back and enjoy the cinema i created in commons with this machine: https://commons.wikimedia.org/w/index.php?title=Special:ListFiles/Matanya&ilshowall=1
[15:07:51] Ooo. Historic!
[15:12:55] PROBLEM - ToolLabs: Low disk space on /var on labmon1001 is CRITICAL: CRITICAL: tools.tools.diskspace._var.byte_avail.value (22.22%)
[15:20:14] RECOVERY - ToolLabs: Low disk space on /var on labmon1001 is OK: OK: All targets OK
[15:45:56] matanya: What devilry is this! The pictures, they move!
[15:46:20] :D
[15:47:50] Wikimedia Labs: Replication behind or missing records - https://bugzilla.wikimedia.org/72908#c3 (nuria) Actually the initial user shows now in labsdb just fine (replication issue must have been solved) MariaDB [cswiki_p]> select user_id, user_registration from user where user_name = 'VlaMaul'; +---------...
[16:01:52] Wikimedia Labs: Replication behind or missing records - https://bugzilla.wikimedia.org/72908#c4 (nuria) a:Sean Pringle But other users on the cohort do not appear on labs but appear in production, for example, this user only shows up in prod: mysql:research@analytics-store.eqiad.wmnet [cswiki]> select...
[16:30:50] Wikimedia Labs: Replication behind or missing records - https://bugzilla.wikimedia.org/72908#c5 (vojtech.dostal@wikimedia.cz) Thank you. You are right, VlaMaul is all right now, but 18 new users are not working.
[16:43:07] Coren: btw, I migrated matanya's instance to virt1006, so it's mostly out of the way of other things.
[16:48:13] !log deployment-prep delete deployment-analytics01, a tortoise from an ancient time.
[16:48:19] Logged the message, Master
[17:02:38] Wikimedia Labs / deployment-prep (beta): Puppet failure on instance udplog - https://bugzilla.wikimedia.org/73516 (Yuvi Panda) NEW p:Unprio s:normal a:None Error: Failed to apply catalog: Could not find dependency File[/usr/lib/ganglia/python_modules] for File[/usr/lib/ganglia/python_module...
[18:06:01] Coren: any luck on finding a stable way to disable coredumps?
[18:06:32] YuviPanda: Sorry, wasn't working on that today - I'm on codfw storage.
[18:06:45] Coren: cool. just wanted to make sure it's on your radar. I've a workaround in place for now
[18:06:53] (it puts coredumps on CWD)
[18:07:39] * Coren would still like to know wth turns core dumps on.
[18:07:57] Coren: I found some dumps after, and saved 'me
[18:07:59] *em
[18:08:06] What are they dumps of?
[18:08:17] lighty, php, and other things
[18:08:22] let me check where I put 'em
[18:08:39] Coren: /home/yuvipanda/core
[18:08:41] lighty and php
[18:09:14] Huh.
[18:09:27] indeed
[18:09:29] my reaction too
[18:23:23] Wikimedia Labs / deployment-prep (beta): Puppet failures on deployment-bastion - https://bugzilla.wikimedia.org/73520 (Yuvi Panda) NEW p:Unprio s:normal a:None Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass trusted_group to Class[Keyholder] on node i-00...
[20:37:28] !log deployment-prep cleaned out logs on deployment-bastion
[20:37:30] Logged the message, Master
[20:40:07] !log tools cleaned out /tmp on tools-login
[20:40:10] Logged the message, Master
[20:41:50] Coren: andrewbogott any reason not to setup a tidy setup on toollabs cleaning out everything in /tmp older than a week?
[20:41:53] Wikimedia Labs / tools: Clean out files in /tmp older than 5 days - https://bugzilla.wikimedia.org/73527 (Yuvi Panda) NEW p:Unprio s:normal a:Marc A. Pelletier This keeps the root filesystem (on which /tmp is) with enough free space, resulting in lesser pages for the admins, making them hap...
[20:42:24] * YuviPanda checks age of the socket files on webgrid
[20:42:30] YuviPanda: I don't object
[20:43:06] fun / terrible part of having working monitoring is discovering all these terrible things that've been happening for a while and then get sidetracked fixing them :)
[20:44:26] hmm, I wonder if tidy stays clear of sockets
[20:46:11] sockets would still be open, right?
[20:47:10] well
[20:47:13] if I use puppet's tidy
[20:47:16] it'll try to delete them
[20:47:17] and fail
[20:47:20] if they're still open
[20:47:24] which is perhaps good enough
[20:47:31] but tidy does this by generating a file resource for each file
[20:47:32] (yes)
[20:47:42] so that'll slow things down
[20:47:47] and *perhaps* even cause puppet errors
[20:55:13] andrewbogott: anyway, I'll just wait for it to fuck up again to see if it annoys me, and if it does write a simple cron
[20:56:06] Ah, if it tries and fails to delete open files and then throws an error… that's not so good
[20:56:33] yeah
[21:04:22] Coren: andrewbogott fyi, http://shinken.wmflabs.org/ now has free space checks for everything in toollabs/betacluster, and the puppet checks are also pretty solid :)
[21:04:35] now to figure out IRC and then see why mail isn't being delivered properly
[21:04:57] * YuviPanda goes to sleep
[21:04:57] night
[21:05:21] Hm. Lots (proportionally) of "Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.27)"
[21:12:55] Oh, god, does the image metadata for a DJVU include the text layers?
[21:30:22] WDQ is throwing 502. :/
[21:33:58] Coren: Is that lab's fault or Magnus'?
[21:35:02] w930913: There are no current issues with labs that I see or heard of.
[22:09:43] YuviPanda: is there a good way to add a redirect from https://tools.wmflabs.org/wikipedia-android-builds/ to https://android-builds.wmflabs.org/?
[22:10:59] YuviPanda: someone in mobile channel mentioned that the old server is still the #1 hit in Google search
[22:30:59] YuviPanda: fyi, I've added a redirect clause in the .lighttpd.conf. I hope that is correct. Seems to work
[22:36:54] bearND: Yes, the redirect in the lighttpd config would be my choice too.
[22:47:41] w930913: great
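
Note on the /tmp cleanup discussed between 20:41 and 20:56: the "simple cron" alternative to puppet's tidy might look roughly like the sketch below. It is not what was actually deployed on toollabs. The function name `clean_old_files`, its parameters, and the one-week default are assumptions for illustration. It removes only regular files, so sockets (the concern raised at 20:44) are never touched, and a failed delete is skipped rather than raised as an error.

```python
import os
import stat
import time


def clean_old_files(root, max_age_days=7, now=None, dry_run=False):
    """Remove regular files under root not modified for max_age_days.

    Hypothetical helper: sockets, FIFOs, device nodes, symlinks and
    directories are left alone. Returns the list of paths removed
    (or that would be removed when dry_run is True).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: never follow symlinks
            except OSError:
                continue  # file vanished between listing and stat
            if not stat.S_ISREG(st.st_mode):
                continue  # skip sockets, FIFOs, symlinks, devices
            if st.st_mtime >= cutoff:
                continue  # modified recently enough, keep it
            if not dry_run:
                try:
                    os.remove(path)
                except OSError:
                    continue  # could not delete; skip quietly
            removed.append(path)
    return removed
```

A daily cron job could call `clean_old_files("/tmp", max_age_days=7)`; running it first with `dry_run=True` lists what would be deleted without removing anything.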