[00:02:09] Krenair: yes
[00:02:22] did you file a ticket in phabricator?
[00:02:27] yes
[00:02:43] https://phabricator.wikimedia.org/T94332
[00:03:47] YuviPanda: Well I'll talk it over with twentyafterfour and see if he knows of any hidden caches. At least we know what the problem is.
[02:50:39] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1161294 (yuvipanda) 20 to 24 are active now \o/ @Coren @scfc I added them to OGE and hand copied /etc/hosts. Anything else left?
[03:46:28] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1161327 (scfc) I don't think so, //but// currently there are no grid jobs running on them despite them being in the output of `qstat -f`. I assume this will change, but before reso...
[05:04:52] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1161359 (yuvipanda) Yeah, I see jobs there now. However, they seem to not be overcommitting VMEM - VMEM seems to be pegged at RAM. Is that ok?
[06:38:22] PROBLEM - Puppet failure on tools-exec-12 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [0.0]
[07:59:55] Tool-Labs, Wikimedia-General-or-Unknown: Missing information template links in templatelinks database - https://phabricator.wikimedia.org/T89441#1161453 (Aschroet) More than a month ago the ticket was moved to "In Progress". I wonder if there is any progress on the issue?
[08:51:18] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Damzow was created, changed by Damzow link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fDamzow edit summary: Created page with "{{Tools Access Request |Justification=Planning to run he.wikipedia User:DamBot on it. |Completed=false |User Name=Damzow }}"
[10:23:38] hello, my webservice isn't starting, i'm getting "(network.c.358) can't bind to port: 14004 Address already in use" on error.log, any idea?
[10:24:57] just wait, that helped me
[10:25:29] but otherwise ask an admin to resolve it
[10:25:34] or file a ticket
[10:29:22] i'll grab a cup of tea for now
[10:29:27] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Samudranb was created, changed by Samudranb link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fSamudranb edit summary: Created page with "{{Tools Access Request |Justification=I am a grad student at UC Berkeley, helping out Prof Coye Cheshire for a study that he is running. I am developing a software which will..."
[10:29:35] I think someone ran into one of those issues before
[10:29:40] Maybe Coren can help
[10:33:56] links on wm-bot aren't working, btw.
[10:38:34] RECOVERY - Puppet failure on tools-exec-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:48:08] Alchimista: netstat -anp | grep "my port"
[11:26:15] Labs, operations, Monitoring, Patch-For-Review: Setup alarms for labstore* to check for network saturation - https://phabricator.wikimedia.org/T92629#1162002 (fgiunchedi) a:coren
[13:00:02] YuviPanda: Can you check out https://gerrit.wikimedia.org/r/#/c/199639/2 too? The monitoring patch depends on it.
[13:14:31] Coren: hey around?
[13:15:05] Amir1: Only partly, busy for a few hours. Something urgent?
[13:15:32] Coren: No, I just want to remind you to send me the files
[13:15:46] You told me you would send them on Monday
[13:16:16] sorry for being too pushy
[13:16:17] I told you I'd look for them today, yes. :-) I will. I don't know if I'll find anything helpful.
[13:49:13] twentyafterfour: I'm only here for an hour. I talked with epriestley and he seems to know what the problem is with phab-02
[13:49:31] Negative24: oh really?
[13:50:18] Yeah. So a cached version of css is being served that's out of date
[13:53:50] Negative24: any clue how to fix that?
[13:55:03] twentyafterfour: No idea. Yuvi says nginx doesn't cache and I checked apache to see if mod_cache was enabled and it wasn't
[13:56:41] You can see that https://phab-02.wmflabs.org/res/phabricator/f1eab25d/core.pkg.css (current version) and https://phab-02.wmflabs.org/res/phabricator/aaaaaaaa/core.pkg.css (broken cache version) are completely different
[14:27:27] twentyafterfour: We could completely bypass nginx for a short time by using a public ip instead
[14:38:14] twentyafterfour: I don't know how nginx is set up but this guy may also have the same problem: http://stackoverflow.com/questions/6236078/how-to-clear-the-cache-of-nginx
[14:39:20] supposedly sendfile is used to send static copies of generated files
[14:45:33] Labs, Beta-Cluster, operations: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#1162629 (fgiunchedi) p:High>Normal the immediate issue of /var filling up with core dumps seems fixed, hence priority normal, however the path used for cores doesn't seem to exist (...
[14:45:49] Labs, Phabricator: Phab-02 sending old stylesheet copies - https://phabricator.wikimedia.org/T94413#1162631 (Negative24) NEW
[14:46:21] Labs, Beta-Cluster, operations: HHVM core dumps in labs - https://phabricator.wikimedia.org/T1259#1162639 (fgiunchedi)
[14:46:45] Labs, Phabricator: Phab-02 sending old stylesheet copies - https://phabricator.wikimedia.org/T94413#1162642 (Negative24)
[14:47:26] twentyafterfour: Created task ^
[14:49:16] twentyafterfour: I have to go now but I'll be on later today.
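The lighttpd error at 10:23 ("can't bind to port: 14004 Address already in use") means some other process already holds that port on the exec host, and the `netstat -anp | grep` tip at 10:48 is how to find out which one. A minimal Python sketch of the yes/no half of that check, assuming you only want to know whether the port is free before restarting the webservice: it tries to bind the port and treats `EADDRINUSE` as "busy". The name `port_in_use` is hypothetical; the check has to run on the host where the job was scheduled, it briefly occupies the port itself, and it cannot tell you which process is the holder (`netstat` or `ss` still does that).

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES for a privileged port is not a "busy" answer
    return False  # the probe socket is closed again when the with-block exits


if __name__ == "__main__":
    print(port_in_use(14004))  # the port from the error message in the log
```

The bind attempt reproduces exactly what lighttpd does at startup, so a `True` here corresponds to the same "Address already in use" failure.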
[14:59:28] Labs, Phabricator: Phab-02 sending old stylesheet copies - https://phabricator.wikimedia.org/T94413#1162666 (Negative24)
[15:01:37] Labs, Phabricator: Phab-02 sending old stylesheet copies - https://phabricator.wikimedia.org/T94413#1162669 (Negative24)
[15:04:39] Labs: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1162689 (coren) The copy completed successfully over the weekend so I will be rescheduling a switch for today.
[15:09:42] Coren: heya! I just woke up.
[15:09:53] My timezone math is all fucked up now...
[15:10:34] That'll be some fun jetlag days for a bit. :-)
[15:10:40] Coren: I did review it already :)
[15:11:10] Coren: no jet lag surprisingly. Mostly I am used to you coming online at 8 pm or so and Europe saying hi at noon
[15:11:48] YuviPanda: Hi from Europe, not at noon ;
[15:11:52] ;) *
[15:12:02] :D
[15:12:31] So I used to be able to go 'oh it is 10pm here this is the approx time in EU east coast sf'
[15:12:34] Can't do that...
[15:12:50] YuviPanda: That's why I track time in UTC. My box uses it as local timezone.
[15:13:28] So I know my day starts around 13h, SF around 16h, etc. :-)
[15:13:45] Is it ever more efficient to make a table that maps entity ID to label (wikidata) instead of hitting the wikidatawiki db?
[15:13:55] Coren: I do that for half the year too :D
[15:14:15] But as of this week, I'm UTC+1.
[15:14:46] a930913: Even better would be to keep a local cache. :-)
[15:15:14] Coren: Yeah, but in what?
[15:15:33] I keep a per-run cache.
[15:15:48] In a python dictionary.
[15:16:50] a930913: I don't know enough about your dataset to give you an intelligent answer. It depends on how static the mapping is, and how frequently it changes. The simplest solution might be to write that dictionary out at the end and use it to preseed on the next run, add new values to it when you lack a mapping, and refresh the whole thing at some interval regardless?
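Coren's suggestion here (write the per-run dictionary out at the end, preseed the next run from it, fill misses as they come up, and rebuild from scratch at some interval) could look like the following minimal sketch. It assumes labels are keyed by string entity IDs such as `Q42` and uses a JSON flat file rather than sqlite, in line with Coren's later warning about databases on the networked filesystem. `CACHE_PATH`, `MAX_AGE` and the `fetch` callback (standing in for the wikidatawiki query) are all hypothetical names.

```python
import json
import os
import tempfile
import time

CACHE_PATH = "label_cache.json"   # hypothetical flat file in the tool's directory
MAX_AGE = 7 * 24 * 3600           # hypothetical: rebuild from scratch weekly


def load_cache(path=CACHE_PATH, max_age=MAX_AGE):
    """Preseed from the last run's file unless it is missing, unreadable or too old."""
    try:
        if time.time() - os.path.getmtime(path) > max_age:
            return {}  # periodic full refresh
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):  # no file yet, or a truncated/corrupt one
        return {}


def get_label(cache, entity_id, fetch):
    """Return the cached label, calling `fetch` (e.g. a DB query) only on a miss."""
    if entity_id not in cache:
        cache[entity_id] = fetch(entity_id)
    return cache[entity_id]


def save_cache(cache, path=CACHE_PATH):
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(cache, f)
        os.replace(tmp_path, path)  # last writer wins if two runs save at once
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A run would call `load_cache()` once, use `get_label()` for every lookup, and call `save_cache()` at the end. Renaming a finished temp file means a run that starts mid-save reads either the old file or the new one rather than half of one; when two runs save concurrently the later rename simply wins, which matches Coren's point that the worst case is a slightly stale dictionary. On NFS, treat the rename as best effort rather than a hard guarantee.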
[15:17:19] I'm guessing there that the mappings are fairly static, and that most changes are additions. YMMV
[15:17:50] Coren: But how do I deal with multiple processes?
[15:18:35] And how precious is memory as a resource on the grid?
[15:18:38] a930913: It's probably not an issue. The worst that can happen in that case is that a run writes a slightly older dictionary that is missing a couple entries which the next one will pick up.
[15:19:08] a930913: Fairly precious, but if you are talking megs then probably irrelevant in the long run.
[15:19:52] Coren: https://tools.wmflabs.org/dimensioner/progress/9
[15:20:11] I'm not sure how much is the cache, but that's 2GB so far :(
[15:20:20] (I meant the cost of the dictionary)
[15:20:44] Labs, ToolLabs-Goals-Q4: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1162741 (Andrew) We may be able to change the default dns server on labnet1001's dhcp server. That will fix part of this. The remainder can (I think) be fixed by adding appropriate settings to head/ta...
[15:20:53] a930913: As in many things, I think you'll have to rely on empirical research and not theory. :-)
[15:21:29] Coren: But these take ages to research :( That job has been running since Friday.
[15:21:56] * a930913 waddles off to research.
[15:23:45] YuviPanda: " Coren: I did review it already :)" - odd, I don't see your review. Have you saved your draft comments?
[15:24:38] Coren: no I posted. I see them...
[15:24:49] Coren: that's the roles patch right?
[15:24:56] Not inline comments
[15:25:02] Just a comment
[15:25:14] Oh, which John was it who replied to the "Simultaneous job limit" email?
[15:25:30] Oh! Silly me, I had presumed you'd have given a +1 or -1 and didn't scroll down.
[15:25:33] * Coren facepalms.
[15:25:39] :)
[15:25:52] Clarifying question so no numbers
[15:28:42] a930913: Unless I'm mistaken, that was Betacommand.
[15:29:19] * a930913 stabs Betacommand for attention.
[15:29:43] Coren: Aha, sqlite!
[15:30:34] a930913: Ohgodno! Don't do a local database on a networked filesystem. That way only pain and suffering lies.
[15:30:43] a930913: Flat file ftw.
[15:31:01] Oh :(
[15:31:11] The panic :D
[15:31:36] How big /is/ that mapping table anyways? There can't be more than a few hundred thousand properties all told?
[15:32:27] It will hit at least one label per entity matched.
[15:32:28] (If that many)
[15:32:42] So that #9 is matching all people.
[15:33:04] Hm.
[15:33:30] Okay, that's large enough a dataset that actual data will be needed to decide on a proper solution. Sorry. :-)
[15:33:59] I wonder how much resource generating the whole thing from the dump will be...
[15:38:22] a930913: Yes?
[15:38:51] Betacommand: You are the John who replied to the "Simultaneous job limit" email?
[15:38:57] yeah
[15:39:43] Betacommand: How do you grab 500 pages from the dump?
[15:40:02] Does the master thread generate minidumps?
[15:41:08] a930913: I have a master thread that reads the dump file, parses it into a usable format, then passes them to worker threads in 500 page chunks
[15:41:36] the workers do the actual parsing of the page
[15:41:54] Ah, so you reparse the whole thing.
[15:42:05] No reparsing
[15:42:39] master creates page objects with the data, and passes those to workers
[15:42:57] Oh, I'm forgetting they're in the same process :p
[15:43:03] Labs, operations: bond0 connection on labstore1001 is unpuppetized - https://phabricator.wikimedia.org/T92622#1162796 (fgiunchedi) p:Triage>Normal what was the problem with the switch btw? has it been fixed?
[15:43:08] the workers parse the page objects for whatever I am looking for
[15:43:23] a930913: No I actually use multi-processing
[15:43:48] Betacommand: So how do you pass the objects?
[15:44:15] In the creation of the worker process
[15:44:33] STDIN?
[15:44:54] NO
[15:44:58] from multiprocessing import Process
[15:45:38] p = Process(target=parse, args=(q,error))
[15:45:40] p.start()
[15:46:20] where q is the group of page objects, and error is the thing I'm scanning for
[15:46:24] Ah, that's cool.
[15:47:30] throw in multiprocessing.active_children() and a .sleep() and you have 90% of your master process
[15:55:15] Coren: +1 on the roles split. Do babysit :)
[15:55:45] YuviPanda: That part should be essentially a noop
[15:58:22] Coren: still. It is good practice to watch it run in the background
[15:58:42] Oh, I didn't say I wouldn't, just that I don't expect issues.
[15:58:57] Also I basically always run it in the foreground. :-)
[16:01:24] Labs, operations: bond0 connection on labstore1001 is unpuppetized - https://phabricator.wikimedia.org/T92622#1162915 (coren) Not as far as I know. Last news I heard @faidon had a fair idea of what the issue might be, but fixing it would require downtime and the matter was set aside. @yuvipanda: the bond...
[16:16:37] Labs, operations, Monitoring, Patch-For-Review: Setup alarms for labstore* to check for network saturation - https://phabricator.wikimedia.org/T92629#1162974 (coren) Open>Resolved Now properly monitored with thresholds at levels suggested by Faidon.
[16:16:46] Labs, operations, Monitoring: Setup alarms for labstore* to check for network saturation - https://phabricator.wikimedia.org/T92629#1162976 (coren)
[16:25:12] Labs, Patch-For-Review: Make a labs_storage module - https://phabricator.wikimedia.org/T93781#1162999 (coren) Open>Resolved a:coren This was done (with roles). Replication review is going to the module.
[16:30:15] hi all, if I get a gerrit repository for an extension, whose only member is myself, will developers with +2 on mediawiki core have +2 rights here?
[16:30:57] codezee: That depends on how the repo was set up. That would be the case only if the rights are set up that way.
[16:31:17] codezee: IIRC, Andre doesn't do that by default.
[16:31:36] codezee: everyone in the mediawiki project will have +2, I believe
[16:33:46] Coren, Glaisher thanks for the info
[16:44:18] Labs, Continuous-Integration, operations: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1163086 (fgiunchedi) p:High>Low looks like everything was working as expected on the alerting side, not sure there's any other action?
[17:08:40] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1163228 (scfc) I believe overcommitting was only enabled on the `lighttpd` exec nodes, and even there I think it is no longer enabled. So I don't think that is a problem.
[17:29:44] Labs: Ensure that opsen are paged on failure of labstore1001's NFS service - https://phabricator.wikimedia.org/T76402#1163356 (coren) Part of the issue is defining "failure"; if the host stops working we're already being notified - perhaps a check somewhere outside labs itself that the filesystem can be effec...
[17:36:48] Tool-Labs, Patch-For-Review: Enable "Access-Control-Allow-Origin: *" header on tools-static.wmflabs.org - https://phabricator.wikimedia.org/T93466#1163391 (Ricordisamoa) >>! In T93466#1141531, @yuvipanda wrote: > Done I just stumbled on this task by accident. Thanks for resolving it!
[17:37:00] Tool-Labs: Enable "Access-Control-Allow-Origin: *" header on tools-static.wmflabs.org - https://phabricator.wikimedia.org/T93466#1163392 (Ricordisamoa)
[17:44:08] Labs, Tool-Labs, ToolLabs-Goals-Q4: Make sure tools-db is backed up in some form - https://phabricator.wikimedia.org/T88716#1163413 (coren) (Presumably, back up //offsite//) This requires one of two things: either we dump the database to labstore2001 (which also gets rsyncs of labstores) or we add a...
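Betacommand's master/worker design (15:41 to 15:47 above) fits in a short sketch: the master cuts the parsed dump into 500-page chunks, starts one `multiprocessing.Process` per chunk, and polls `multiprocessing.active_children()` with a `time.sleep()` to cap how many workers run at once. Assumptions, none of them from the log: pages arrive as `(title, text)` tuples, the scan is a plain substring match, and results come back over a `multiprocessing.Queue`; `parse`, `chunked`, `run_master` and `MAX_WORKERS` are illustrative names, not Betacommand's code. The fork start method is requested explicitly, which assumes a POSIX host such as the grid's Linux exec nodes, so each worker inherits its chunk in memory instead of receiving it over STDIN or a pipe.

```python
import multiprocessing
import queue
import time

CHUNK_SIZE = 500   # pages per worker, as described in the log
MAX_WORKERS = 4    # hypothetical cap on simultaneous worker processes


def parse(pages, needle, results):
    """Worker: send back the titles of pages whose text contains `needle`."""
    results.put([title for title, text in pages if needle in text])


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def run_master(pages, needle, max_workers=MAX_WORKERS):
    """Master: hand out chunks to at most `max_workers` processes at a time."""
    ctx = multiprocessing.get_context("fork")  # POSIX only
    results = ctx.Queue()
    hits, started, received = [], 0, 0

    def drain():
        # A worker does not exit until its queued data has been flushed to the
        # pipe, so keep reading results while waiting for a free slot.
        nonlocal received
        while True:
            try:
                hits.extend(results.get_nowait())
            except queue.Empty:
                return
            received += 1

    for chunk in chunked(pages, CHUNK_SIZE):
        # active_children() also reaps workers that have already finished.
        while len(multiprocessing.active_children()) >= max_workers:
            drain()
            time.sleep(0.05)
        ctx.Process(target=parse, args=(chunk, needle, results)).start()
        started += 1

    while received < started:  # block for the remaining results
        hits.extend(results.get())
        received += 1
    for proc in multiprocessing.active_children():
        proc.join()
    return hits


if __name__ == "__main__":
    fake_dump = [(f"Page {i}", "foo" if i % 100 == 0 else "bar") for i in range(2000)]
    print(len(run_master(fake_dump, "foo")))  # prints 20
```

Draining the queue inside the throttle loop matters: without it, a large batch of unread results could keep finished workers alive, `active_children()` would never drop below the cap, and the master would spin forever.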
[17:46:38] Tool-Labs: webservice2 failing to start Python web services with xml.etree.ElementTree.ParseError - https://phabricator.wikimedia.org/T92039#1163423 (Ricordisamoa)
[17:49:03] Labs: Storage capacity & redundancy expansion (tracking) - https://phabricator.wikimedia.org/T85604#1163440 (coren) Status update: * labstore2001 upgraded * copy done, mountpoint swap scheduled for today (Mar 30) 22h UTC Todo: * Finish review/tweaks of replication code and flip the switch
[18:08:25] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1163566 (scfc) >>! In T92491#1155873, @BBlack wrote: >>>! In T92491#1112781, @coren wrote: >> Apt tools indeed use proper locking, but do so to ensure exclusive runs not concurrency. But Yuvi is correct t...
[18:19:56] (PS2) Awight: Use block style YAML [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/196852
[18:20:14] (CR) Awight: "PS2: manual rebase" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/196852 (owner: Awight)
[18:21:33] (CR) Awight: "ping" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/199665 (owner: Awight)
[18:28:09] Labs, Wikimedia-Labs-Infrastructure, Patch-For-Review, ToolLabs-Q4-Sprint-1: Alert when conntrack table is full on labnet1001 - https://phabricator.wikimedia.org/T90437#1163749 (coren)
[18:28:27] Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1163755 (coren)
[18:28:46] Labs, Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1163758 (coren)
[18:29:35] Labs, Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make sure tools-db is replicated somewhere - https://phabricator.wikimedia.org/T88718#1163763 (coren)
[18:31:59] Labs, Patch-For-Review, ToolLabs-Q4-Sprint-1: Process for user backups - https://phabricator.wikimedia.org/T85608#1163795 (coren)
[18:32:52] Labs, Patch-For-Review: Process for user backups - https://phabricator.wikimedia.org/T85608#950693 (coren)
[18:33:16] Labs, ToolLabs-Q4-Sprint-1: Sync up the new labs NFS project filesystem with the live one - https://phabricator.wikimedia.org/T93792#1163809 (coren)
[18:33:35] Tool-Labs-tools-Other: bring back missing-from-wikipedia - https://phabricator.wikimedia.org/T72199#1163813 (terrrydactyl) Coren helped get the site up at: https://tools.wmflabs.org/missing-from-wikipedia/index I'll keep monitoring it for a while to make sure it stays up.
[19:19:46] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1164116 (scfc) Not strictly unattended upgrades: ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily...
[19:26:42] Tool-Labs-tools-Other: SVG translate spits PHP errors instead of working - https://phabricator.wikimedia.org/T94433#1164164 (Aklapper) Could you please open the browser's console (or whatever it is called in your browser) and reload the page that you see the problem on? If there is a problem or an error with...
[19:28:20] Tool-Labs-tools-Other: SVG translate spits PHP errors instead of working - https://phabricator.wikimedia.org/T94433#1164166 (Jarry1250) @aklapper Could you add a new project, Tool-Labs-tools-svgtranslate ? I think you need special perms to do that, and apparently I can't add this to a project until it exists :/
[19:30:18] Labs, Patch-For-Review, ToolLabs-Goals-Q4: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1164177 (scfc) The problem with the whole setup (resolvconf and dhclient, that is) is that it does a lot of DWIM, and so I find it hard to get a consistent state to migrate to. `re...
[19:37:15] Change on www.mediawiki.org a page OAuth (obsolete info) was modified, changed by Guillaume (WMF) link https://www.mediawiki.org/w/index.php?diff=1506757 edit summary: Remove obsolete status template per [[phab:T94180|T94180]]
[19:38:25] Labs, Patch-For-Review, ToolLabs-Goals-Q4: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1164219 (scfc) And forgot: As `/run` where resolvconf stores its copy is on a temporary file system, the state of the instances depends on whether the instance was rebooted after th...
[19:41:53] (PS1) Anomie: Send MediaWiki-API-Team and Blocked-on-MediaWiki-API-Team to #mediawiki-core [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/200657 (https://phabricator.wikimedia.org/T94471)
[19:44:36] (CR) Legoktm: [C: 2] Send MediaWiki-API-Team and Blocked-on-MediaWiki-API-Team to #mediawiki-core [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/200657 (https://phabricator.wikimedia.org/T94471) (owner: Anomie)
[19:45:06] (Merged) jenkins-bot: Send MediaWiki-API-Team and Blocked-on-MediaWiki-API-Team to #mediawiki-core [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/200657 (https://phabricator.wikimedia.org/T94471) (owner: Anomie)
[19:45:55] !log tools.wikibugs Updated channels.yaml to: b2c38567b32d82881baab5c3227f14a9b8e9fff5 Send MediaWiki-API-Team and Blocked-on-MediaWiki-API-Team to #mediawiki-core
[19:45:59] Logged the message, Master
[19:56:06] Change on www.mediawiki.org a page Wikimedia Labs was modified, changed by Guillaume (WMF) link https://www.mediawiki.org/w/index.php?diff=1506889 edit summary: [-66] Remove obsolete status template per [[phab:T94180|T94180]]
[19:57:16] Change on www.mediawiki.org a page OAuth (obsolete info) was modified, changed by Guillaume (WMF) link https://www.mediawiki.org/w/index.php?diff=1506903 edit summary: Remove obsolete status template per [[phab:T94180|T94180]]
[19:57:40] Wikimedia-Labs-Infrastructure, Continuous-Integration, Continuous-Integration-Isolation: Figure out how to dedicate specific virt nodes to a specific labs project - https://phabricator.wikimedia.org/T84989#1164322 (Andrew)
[19:57:52] Wikimedia-Labs-Infrastructure, Continuous-Integration, Continuous-Integration-Isolation: Figure out how to dedicate specific virt nodes to a specific labs project - https://phabricator.wikimedia.org/T84989#936078 (Andrew) I renamed this because 'baremetal' refers to a particular use case which this is...
[20:04:32] Change on www.mediawiki.org a page OAuth (obsolete info)/en-gb was modified, changed by FuzzyBot link https://www.mediawiki.org/w/index.php?diff=1506963 edit summary: Updating to match new version of source page
[20:04:32] Change on www.mediawiki.org a page OAuth (obsolete info)/zh was modified, changed by FuzzyBot link https://www.mediawiki.org/w/index.php?diff=1506964 edit summary: Updating to match new version of source page
[20:05:08] Change on www.mediawiki.org a page OAuth (obsolete info)/en was modified, changed by FuzzyBot link https://www.mediawiki.org/w/index.php?diff=1506971 edit summary: Importing a new version from external source
[20:15:06] Coren: https://phabricator.wikimedia.org/project/sprint/board/1139/query/open/ lists all the tasks needed to complete the quarterly goal, btw. It's incomplete, so do add tasks and move them to the appropriate column. not urgent tho
[20:58:31] hi all; question #1: does bigbrother support webservices running uwsgi-python? and question #2: if it doesn't, how do I stop bigbrother from attempting to restart a job that was previously lighttpd and is no longer, since deleting the .bigbrotherrc file does not stop it
[20:58:34] Filesystem switch in 90s
[20:59:01] Earwig: It does. I'll help you with the syntax in a few minutes.
[20:59:05] thanks
[21:02:38] * Coren stares at the filesystem.
[21:03:02] Filesystem switch complete.
[21:03:08] * Coren watches things like a hawk.
[21:05:17] ... things seem to be going even better than I anticipated.
[21:07:05] Oh, duh!