[00:12:37] YuviPanda, no mergy?
[00:12:55] https://github.com/mediawiki-utilities/python-mwapi/pull/16
[00:25:42] * halfak watches his local ORES for any unexpected errors.
[00:26:07] Heh. The stream was ahead of the API for a few requests.
[00:26:21] Either that or there were some revisions that were deleted *very* quickly.
[03:07:39] worker-01 is offline
[03:09:48] Nothing obvious in the logs. I'll save a snapshot and get the cron back in place.
[03:10:56] halfak: staaah
[03:10:56] p
[03:11:01] halfak: or am I too late?
[03:11:03] need to strace it
[03:11:18] Not too late
[03:11:21] ok
[03:11:29] I've not touched it.
[03:11:33] Just copied the log.
[03:11:35] ok
[03:11:40] let me strace and see what's going on
[03:12:19] it's stuck on a recvfrom(99,
[03:12:25] now to see what 99 is
[03:13:14] celery 19314 www-data 99u IPv4 4969122 0t0 TCP ores-worker-01.ores.eqiad.wmflabs:44394->ores-redis:6379 (ESTABLISHED)
[03:13:15] ok
[03:13:20] so that's the connection to
[03:13:20] redis
[03:13:28] that it's trying to read something from
[03:13:31] and is stuck ther
[03:13:32] e
[03:13:53] hmm
[03:14:22] let me attempt a tcpdump
[03:16:23] I wonder if it is that darn keep-alive setting
[03:16:29] it might be.
[03:16:33] let me look at it on the other side
[03:16:36] kk
[03:19:09] id=20453 addr=10.68.18.75:44394 fd=103 name= age=11853 idle=11853 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=NULL
[03:19:12] is the stuck connection
[03:19:20] it's been idle for a while
[03:19:37] halfak: do you have a link to where you found the tcpkeepalive thing?
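Note on the keep-alive idea raised above: TCP keepalive asks the OS to probe an idle connection, so that a dead peer is eventually reported as an error instead of leaving a `recvfrom` blocked indefinitely. A minimal Python sketch using only the standard library is below. The helper name `enable_keepalive` and the idle/interval/count values are assumptions for illustration; the log does not say which settings, if any, were applied to the ORES workers.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    The idle, interval and count defaults are illustrative assumptions,
    not values taken from the ORES or Celery configuration.
    """
    # SO_KEEPALIVE is the portable switch that enables probing at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform specific, so set each one only
    # when this Python build exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

redis-py also accepts `socket_keepalive` and `socket_keepalive_options` when creating a client, which may be a more direct route; whether that fits the Celery setup discussed here is not established in the log.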
[03:22:05] * halfak finds it
[03:22:46] https://groups.google.com/forum/#!msg/celery-users/M9h4X4iclmM/y-r2Mf2LSuoJ
[03:22:51] It was the second link in the email
[03:23:02] ctrl-f for "You can enable your OS to answer TCP"
[03:23:04] YuviPanda, ^
[03:23:20] reading now
[03:24:12] hah
[03:24:12] https://gist.github.com/kencochrane/192800a29d98a37ba69c
[03:24:19] that strace pattern is very familiar >_>
[03:24:23] :D!
[03:24:58] still investigating
[13:03:30] halfak: ping me when you are around
[14:02:08] o/ Amir1
[14:02:34] YuviPanda, ORES was backed up again.
[14:02:54] But worker-01 was online and processing requests just fine when I restarted uwsgi
[14:03:00] Sorry to bork your strace
[14:05:38] YuviPanda, I restarted web-01 but left web-02 alone, so you should be able to check things there.
[14:27:34] hey halfak, I was afk
[14:28:00] I wrote some tests for wb-vandalism
[14:42:48] Amir1, awesome. Hopefully, I'll be submitting some PRs there once I've finished this push on ORES stability.
[14:42:58] I'll tell you more about that at the meeting.
[14:47:37] Speaking of which, ORES isn't doing too great right now.
[14:47:48] YuviPanda, I think we need to de-pool web-02
[14:48:00] * YuviPanda yawns
[14:48:02] Hi
[14:48:08] Groggggy
[14:48:15] Hey dude. Sorry for a ton of messages.
[14:48:23] halfak: /etc/nginx/sites-enabled/
[14:48:30] Edit and restart nginx?
[14:48:34] Or
[14:48:37] Hiera:ores
[14:48:40] On wikitech
[14:48:47] And edit that
[14:48:56] And run sudo puppet agent -tv
[14:49:03] On lb-02
[14:49:13] Do option 2
[14:49:17] Just remove the line for web-02?
[14:49:20] Just realized option 1 won't stick
[14:49:21] Yeah
[14:51:31] kk done
[14:51:36] * halfak continues to check things.
[14:52:06] halfak: I'll check in 5mins
[14:52:06] Yeah. that seems to have helped.
[14:52:11] No more timeouts.
[14:52:20] kk thanks YuviPanda
[14:52:28] I'll be heading to a meeting in 8 minutes
[15:06:45] * aetilley stumbles out of bed.
[15:16:42] halfak: so
[15:16:44] it is stuck in a look
[15:16:47] *loop
[15:16:48] celery-task-meta-fawiki:reverted:15934918:0.3.0
[15:16:52] it tries to get that key from redis
[15:17:02] and recvfrom(6, "$79\r\n\200\2}q\0(X\6\0\0\0resultq\1NX\t\0\0\0tracebackq\2NX\6\0\0\0statusq\3X\4\0\0\0SENTq\4X\10\0\0\0childrenq\5Nu.\r\n", 65536, 0, NULL, NULL) = 86
[15:17:03] it gets it
[15:17:05] and keeps trying
[15:33:47] halfak: how long is your meeting?
[15:41:03] halfak: https://github.com/celery/celery/issues/2374
[15:41:09] do we specify a timeout in our .get?
[15:46:31] halfak: I suspect that we are calling .get without a timeout somehow
[15:47:05] I have meetings solid until 1PM PDT :(
[15:47:22] Will try to respond when I have a moment to think.
[15:49:10] halfak: ok. I am unsure how much I can look through the code, however.
[15:49:18] halfak: also yay meetings, etc :)
[15:49:25] halfak: I might just turn it back on with some more logging
[15:50:47] YuviPanda, https://github.com/mediawiki-utilities/python-mediawiki-utilities/blob/master/mw/util/api.py#L39
[15:51:18] (that was back when requests didn't natively support retry)
[15:51:21] halfak: timeout in celery
[15:51:21] FYI
[15:51:36] since this is a redis/celery call
[15:51:40] oh!
[15:51:40] not http
[15:51:42] What
[15:51:48] What .get are you talking about?
[15:51:59] Oh! wait. I see.
[15:52:01] yes, celery-task-meta-fawiki:reverted:15934918:0.3.0
[15:52:12] is the key it constantly tries to get
[15:52:16] it returns empty
[15:52:20] so it sleeps and tries again
[15:52:25] that's what .get on a celery task does
[15:52:31] except it's supposed to time out
[15:52:50] so all the processes get stuck doing this
[15:52:57] and that causes too many requests to pile up
[15:53:04] https://github.com/wiki-ai/ores/blob/master/ores/score_processors/celery.py#L114
[15:53:08] That has no timeout
[15:54:13] halfak: that's probably it then
[15:56:48] OK. I'll get that in my update to ORES
[15:56:58] Merge my mwapi PR?
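The behaviour described above (fetch the task-meta key, find the task not finished, sleep, retry) only ends if the loop has a deadline. A minimal stdlib-only Python sketch of a bounded polling loop follows. It is not Celery's implementation; `wait_for_result`, `PollTimeout`, and the `fetch` callable are hypothetical names used for illustration. In Celery itself, `AsyncResult.get` accepts a `timeout` argument for the same purpose.

```python
import time


class PollTimeout(Exception):
    """Raised when no result shows up before the deadline (hypothetical name)."""


def wait_for_result(fetch, timeout, interval=0.5):
    """Poll fetch() until it returns a non-None value or the timeout expires.

    fetch stands in for "read the task-meta key from the result backend";
    it is an assumption of this sketch, not an ORES or Celery function.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if value is not None:
            return value
        # Without this check the loop never ends when the task is lost,
        # which matches the stuck-worker symptom described in the log.
        if time.monotonic() >= deadline:
            raise PollTimeout("no result after %.1f seconds" % timeout)
        time.sleep(interval)


if __name__ == "__main__":
    try:
        wait_for_result(lambda: None, timeout=0.2, interval=0.05)
    except PollTimeout as error:
        print("gave up:", error)
```

With a deadline in place, a lost task costs one worker a bounded wait rather than tying it up until someone restarts the service.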
[15:57:06] Then the revscoring PR
[15:57:18] * halfak has 3 minutes between meetings
[15:57:21] Time to code!
[15:57:41] halfak: done
[15:57:43] on mwapi
[15:58:40] halfak: I want to hold off on the revscoring one atm, so we can get the ores one sorted...
[15:58:53] (and it adds a new dependency!)
[15:58:59] atm -> we can do that later today
[15:59:03] s/we/I/
[15:59:23] OK.
[15:59:28] Sounds good.
[16:00:21] No rush here. Just tell me when you want to meet.
[16:00:36] skype
[16:01:15] halfak: I'm going to go for a run now...
[16:01:31] halfak: I should be back in... 45mins? in time for your next break
[16:01:39] Perfect
[16:01:43] o/
[16:01:55] halfak: cool :)
[16:02:10] REVSCORING TEAM, ASSEMBLE!
[16:02:25] ToAruShiroiNeko_, Amir1, aetilley
[16:02:34] :)
[16:02:36] Oh! We'll probably be using skype.
[16:02:37] * YuviPanda slinks away
[16:02:39] I'm in the hangout.
[16:03:05] slink (v): move smoothly and quietly with gliding steps, in a stealthy or sensuous manner.
[16:03:17] SensuousPanda
[16:51:34] halfak: :D
[16:59:11] halfak: I guess I didn't catch you this break :) I'll get ready and go to the office for our 12 meeting
[17:03:41] Amir1 you there
[17:03:56] do you remember what halfak and I agreed to talk about?
[17:57:34] ToAruShiroiNeko_: hey, I just arrived. you were talking about troll vandals I think
[17:59:41] before that, halfak was asking me to gather some info from the communities
[18:00:03] o/ got 60 seconds.
[18:00:33] ToAruShiroiNeko_, should have put it in the etherpad.
[18:00:36] Woops
[18:00:46] * halfak --> next meeting
[18:01:21] aw
[19:04:43] halfak: can you paste here?
[19:04:50] https://etherpad.wikimedia.org/p/open_infra_workshop
[19:46:37] halfak, I made a few preliminary comments on a phab card. For some reason I can't see it in the revscoring phab board, even though I made the card there.
[19:47:08] Make sure "revscoring" is in the projects list on the card.
[19:48:42] I did. I just noticed it went into backlog. one sec
[19:49:32] Fixed.
[19:49:36] srry
[20:01:43] Also can anybody remind me of that "log" command which spat out the error from last time?
[20:02:09] I'll write it down this time.
[20:03:00] checklog or printlog or something and some associated flags.
[20:07:55] aetilley: sudo journalctl -u uwsgi * -f
[20:08:18] That one. Thanks.
[20:08:39] Which does not have "log" anywhere in it. oops....
[20:11:26] halfak: that was fascinating :)
[20:13:53] halfak: also https://etherpad.wikimedia.org/p/quarry-wdqs-integration for my SPARQL / SQL integration
[20:14:11] :D Thanks for coming YuviPanda
[20:14:19] lol @ late professors
[20:14:26] halfak: fascinating how papers were such a strong focus
[20:14:27] Very stereotypical
[20:14:31] oh?
[20:14:45] * halfak shakes fists at the academics
[20:14:56] Yeah. Papers are the currency of legitimacy
[20:15:41] that's interesting and strange :)
[20:16:19] Yeah. I've got a big rant about the nature of papers and publishing. There's a strong incentive to publish early and often, so most of the published stuff is increment or crap.
[20:16:24] *incremental
[20:16:46] right. as a rank outsider it feels very much like a case of incentives gone wrong.
[20:16:51] "We performed this analysis" Why did you do that? "We'll talk about that in the next paper!"
[20:16:58] but hey, I remember how much of university was utterly pointless
[20:17:00] Definitely is.
[20:17:03] and 'gone wrong' on so many levels
[20:17:16] 1. university says we must pass at least 80% of students
[20:17:30] 2. college decides to fulfill this by making people re-take tests until they pass
[20:17:32] 3. LOL
[20:18:37] 4. Profit
[20:19:06] indeed
[20:19:09] lots of them do profit
[20:21:25] YuviPanda: That was to be called from inside debian-jessie correct?
[20:21:37] yup
[20:21:54] Interesting. "Failed to add match '*': Invalid argument"
[20:21:55] * halfak finally finishes his meetings
[20:22:07] wait
[20:22:16] aetilley: try just ores*?
[20:22:17] uwsgi-*
[20:22:18] instead of uwsgi*
[20:22:21] that too
[20:22:39] haha
[20:22:44] yeah, running
[20:24:26] halfak: \o/ on the timeout
[20:24:37] halfak: I'm going to make my way to the office now
[20:24:37] and eat some food
[20:25:24] OK. I'll have some repaired ORES/Revscoring stuff to try on staging later today
[20:25:30] Ok, it appears to be looping through some error. I will try to print out one iteration.
[20:33:20] just emailed you something
[20:34:04] halfak: ^
[20:38:40] * halfak looks at email.
[20:38:48] oh wait, I have an idea
[20:39:29] no, no I don't.
[20:39:58] pip install redis-py
[20:40:01] heh.
[20:40:09] That should have been installed for you
[20:40:44] Well remember that this was a fresh clone of ores, per Yuvi's suggestion
[20:40:48] so no pip
[20:41:22] It's in the venv
[20:41:33] In the vagrant
[20:41:39] ...probably?
[20:41:53] When did you get that error?
[20:42:04] Oh! It's in the uwsgi startup logs
[20:42:06] hmm
[20:42:26] yeah... I don't know why it didn't install all of the dependencies
[20:42:29] * halfak checks his vagrant
[20:43:03] There is no venv in vagrant.
[20:43:43] There should be one in vagrant:/srv/ores/
[20:43:57] yes
[20:44:36] source venv/bin/activate?
[20:44:39] on the vagrant: sudo -u www-data /srv/ores/venv/bin/pip install -r /srv/ores/config/requirements.txt
[20:47:28] No such file/dir
[20:47:53] Which one
[20:48:55] autocomplete sees srv/ores but not config
[20:49:08] lemme look
[20:49:17] Me either!
[20:49:18] WTF
[20:49:29] yeah there's nothing in ores except for venv
[20:49:30] OH! It's in your home dir
[20:49:35] derp
[20:49:42] /vagrant
[20:50:00] sudo -u www-data /srv/ores/venv/bin/pip install -r /vagrant/requirements.txt
[20:50:06] ^ on the guest
[20:50:25] Yes. Done.
[20:50:30] now where were we?
[20:50:46] ah yes, one sec
[20:52:45] Now did we need to pip install that other thing?
[20:53:00] Browser still can't see 127.0.0.0:8081
[20:54:23] I mean 127.0.0.1:8081
[20:56:02] lemme venv
[20:56:54] source venv/bin/activate correct?
[20:58:20] "downloading/unpacking redis-py"
[20:58:48] Not found.
[20:59:52] Yeah. I'm getting the same. I just realized that this is because redis is an optional install now.
[21:01:46] aetilley, sudo /srv/ores/venv/bin/pip install redis
[21:02:58] Success.
[21:03:07] Now...
[21:03:54] Browser Sees Ores!!!!!!!
[21:04:07] * aetilley dances
[21:04:22] Woot!
[21:04:48] Ok. I'm going to stretch.
[21:05:07] Any last word before then?
[21:05:13] https://github.com/wiki-ai/ores/issues/86
[21:05:18] Na
[21:05:24] Glad we made it :)
[21:05:28] ditto
[21:05:29] ttys
[21:05:33] o/