[01:50:32] are you deploying wikilabels :O
[05:12:50] (PS4) Ladsgroup: Rework highlighting frontend to make it work everywhere [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930)
[05:15:47] (CR) Ladsgroup: Rework highlighting frontend to make it work everywhere (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[11:16:45] (CR) Gergő Tisza: Rework highlighting frontend to make it work everywhere (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[12:28:35] (PS5) Ladsgroup: Rework highlighting frontend to make it work everywhere [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930)
[12:28:40] (CR) Ladsgroup: Rework highlighting frontend to make it work everywhere (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[12:58:56] Amir1: if I set up ORES with vagrant, where should the test stats come from? the local ORES service or constant configuration?
[12:59:49] tgr: It was my issue too, took half of day of frustration which finally I just added scores in the database
[13:00:47] sorry, probably should have mentioned that's not working
[13:00:59] I only figured it out yesterday myself
[13:01:28] I can add a php settings file with hardcoded levels easily but that feels like a hack
[13:01:54] should the revid metric provide stats somehow?
[13:06:11] (CR) Gergő Tisza: Rework highlighting frontend to make it work everywhere (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[13:13:23] tgr: I think thresholds should be included for testwikis
[13:13:47] included in the PHP config you mean?
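[Editor's note: the hard-coded-levels workaround tgr describes above (a settings file with fixed threshold levels instead of asking a live ORES service) might look roughly like the sketch below. All model names, level names, and numbers are made up for illustration; they are not real ORES threshold values, and the real workaround would live in PHP config rather than Python.]

```python
# Hypothetical hard-coded threshold table for a test wiki.
# Model names, level names, and numbers are illustrative only.
HARDCODED_THRESHOLDS = {
    "damaging": {
        "maybebad": {"min": 0.16, "max": 1.00},
        "likelybad": {"min": 0.56, "max": 1.00},
        "verylikelybad": {"min": 0.92, "max": 1.00},
    },
}

def get_threshold(model, level):
    """Return the configured {min, max} range, or None if unconfigured."""
    return HARDCODED_THRESHOLDS.get(model, {}).get(level)
```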
[13:26:51] (PS6) Ladsgroup: Rework highlighting frontend to make it work everywhere [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930)
[13:27:34] (CR) Ladsgroup: Rework highlighting frontend to make it work everywhere (1 comment) [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[13:34:16] yeah
[13:46:57] (CR) Gergő Tisza: Rework highlighting frontend to make it work everywhere (2 comments) [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[13:48:14] o/
[14:01:46] (CR) Ladsgroup: "This looks outside of scope of this patch. Do you want to take a pass and make a follow up or amend this one?" [extensions/ORES] - https://gerrit.wikimedia.org/r/358311 (https://phabricator.wikimedia.org/T155930) (owner: Ladsgroup)
[14:04:06] Zppix: o/
[14:04:27] How are you glorian_wd
[14:05:11] I am good Zppix. I am working on the wikiclass PR now. I guess I will submit a revision of my PR shortly. :)
[14:05:16] how about you?
[14:05:37] Working on some miraheze stuff atm and good
[14:55:12] o/
[14:55:16] done with morning meetings.
[14:55:30] Looking at a pull request for mwrefs and then dealing with the failed wikilabels deploy
[15:05:59] o/ Amir1
[15:06:07] Do we have the ORES deploy in beta yet?
[15:10:29] halfak: moring
[15:10:49] o/ zppix :D
[15:12:32] so whats on the agenda? halfak
[15:12:41] deployments.
[15:12:55] Wikilabels has a failed deploy. Looks like the error message you fixed is erroring out.
[15:12:57] what do you need me to do?
[15:13:07] how?
[15:13:07] And ORES has a deployment planned for today but I think we're behind.
[15:13:11] what did i do wrong
[15:13:18] See the deploy task.
[15:13:22] ok
[15:14:29] i have no clue how to fix that error halfak
[15:14:50] It actually tells you how to fix it in the error itself
[15:15:39] oh okay
[15:16:07] ill fix it and then you goign to redploy?
[15:18:53] halfak: fixed
[15:22:00] halfak: not yet, On it
[15:22:17] halfak: I can do it as a part of the prod deploy
[15:22:22] it won't be super complex
[15:23:07] Amir1, am working on it now
[15:23:14] Will have a patchset shortly.
[15:23:19] okay
[15:23:30] Can you look at wikilabels and zppix's fix?
[15:23:35] I have an error pasted in the phab card.
[15:24:31] okay
[15:25:44] halfak: do you mean https://github.com/wiki-ai/wikilabels/pull/187 ?
[15:26:06] Yeah
[15:26:18] You need to make that error get thrown.
[15:26:40] The testdata should allow you to see the error if you try to request a workset for enwiki damaging_and_goodfaith campaign.
[15:31:28] Amir1, https://gerrit.wikimedia.org/r/358616 for when you are ready.
[15:32:50] halfak: merged, I'm trying to do post for this to get it thrown
[15:33:08] Amir1, want to trade ORES for wikilabels now?
[15:33:15] (that went faster than expected)
[15:34:15] noo
[15:34:18] I'm on it
[15:34:26] kk
[15:35:13] OK then I'm going to push ORES to beta?
[15:35:15] Amir1, ^
[15:35:20] yeah
[15:35:22] thanks
[15:36:56] uh... hmm... getting permission issues in our deploy repo on deployment-tin
[15:44:58] halfak: I checked and it fixes the problem
[15:45:15] Amir1, great. Want to go to staging with it?
[15:45:45] one small thing before moving on
[15:47:14] the bot icinga2-wm is about to quit pending some python improvements as such like moving to python-irc (from python-irclib). So the bot may break or something and may have to be reverted.
[15:50:41] testing zppix
[15:53:44] Zppix, I think your next step should be getting your dev environment working :)
[15:53:48] No more editing from the browser.
[15:54:22] halfak: i cannot github issue not me
[15:54:39] nope. I do not accept this.
[15:55:04] halfak: https://github.com/wiki-ai/wikilabels/pull/188
[15:55:13] very fast, right now it shows this error to me
[15:55:20] {"error": {"code": "conflict", "message": "No tasks available for user_id=3349 in campaign_id=5Please try again or check campaign status /stats/enwiki/"}}
[15:56:12] hmm the stats link is not really fun, what do you think halfak
[15:58:27] Amir1, agreed. It didn't get the full url that I wanted.
[15:58:34] We need some way to get the hostname
[15:58:45] Problem is that because of our lb, the hostname is weird.
[15:58:58] We need to get it from the http request in an intelligent way.
[15:59:52] Amir1, new code is in beta
[15:59:57] awesome
[16:00:07] halfak: that's for later work
[16:00:19] Amir1, ok with me
[16:00:38] {{merged}}
[16:00:40] [1] https://meta.wikimedia.org/wiki/Template:merged
[16:06:26] halfak: o/
[16:06:47] o/ glorian_wd
[16:06:50] could you tell me what does it mean by "content_from_api" in https://pypi.org/project/pywikibase/0.1a/
[16:07:44] hmm
[16:09:21] halfak: if you could explain with an example, it would be helpful for me to grasp
[16:27:45] halfak: its true tho
[16:28:49] glorian_wd, sounds like it'd be revscoring.datasource.revision_oriented.revision.text
[16:28:55] Zppix, explain.
[16:29:36] halfak: github always screws up my git history and causes the repo on my end to go corrupt
[16:30:56] Zppix, doesn't make sense. How do you know this is happening?
[16:32:15] because i check the git history and compare to github and i notice that is duplicated the history twice so 1 entry is now rduplicated
[16:34:39] Zppix, demo for me?
[16:35:17] you would have to have my git history files
[16:36:50] make me a paste of command line messups :)
[16:37:44] i will asap im doing a lot of things right now
[16:51:59] Zppix, OK cool :)
[17:51:23] Amir1, I downgraded that task to "high"
[17:51:29] It looks like user-error.
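[Editor's note: "get the hostname from the http request in an intelligent way" behind a load balancer usually means preferring the proxy-supplied X-Forwarded-Host header over Host. A minimal sketch follows; the default hostname and the header-precedence order are assumptions, not wikilabels' actual implementation.]

```python
def external_host(headers, default="labels.wmflabs.org"):
    """Pick the hostname the client actually used, preferring proxy headers.

    `headers` is any dict-like of HTTP request headers. The default value
    here is a placeholder, not real wikilabels configuration.
    """
    for key in ("X-Forwarded-Host", "Host"):
        value = headers.get(key)
        if value:
            # X-Forwarded-Host may be a comma-separated chain of hosts
            # added by successive proxies; the first entry is the original.
            return value.split(",")[0].strip()
    return default
```

With this, an error message could embed a full stats URL such as `"https://%s/stats/enwiki/" % external_host(request_headers)` instead of a bare path.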
[17:51:41] Also, we should probably quiet logging for user-errors like ParamError
[17:51:51] halfak: if it's not exploding our service, I'm fine :D
[17:52:16] the thing is the user is putting so much pressure that ores service as a whole is suffering
[17:52:23] https://grafana.wikimedia.org/dashboard/db/ores-extension?orgId=1&from=now-24h&to=now
[17:52:32] this is for jobs that mediawiki is sending
[17:52:47] the failure rate for this should minimal
[17:53:26] halfak: is there any UA to contact?
[17:53:56] Amir1, we don't track that
[17:54:07] Amir1, no suffering
[17:54:07] yes we do
[17:54:19] Oh. I don't know where
[17:54:19] uwsgi does that
[17:54:46] I think the ORES extension might be the source of the issue.
[17:55:11] halfak: I checked and number of jobs didn't get increased at that time
[17:55:11] Generating that parse error doesn't slow ORES down at all.
[17:55:19] It's super fast. Faster than a redis lookup.
[17:58:17] I think queue got fulfilled and the timeout for Wikimedia is low
[17:58:23] I take care of that
[17:59:12] But we're getting a ton of requests for revids=last. :/ That seems to be the source of the error.
[17:59:33] halfak: https://logstash.wikimedia.org/goto/ed28714489c9460dc6d5a3011aa4a0f0
[17:59:41] checkout User agent
[18:02:01] Why is changeprop not hitting /precache/?
[18:08:59] afk for dinner
[18:09:02] be back soon
[18:13:36] halfak: there?
[18:22:07] working on something hard codezee
[18:22:08] sorry
[18:22:12] (PS1) Anomie: API: Split description messages into summary + additional text [extensions/ORES] - https://gerrit.wikimedia.org/r/358724
[18:26:35] ohh, ok...I'll ask later
[18:27:32] back
[18:27:55] halfak: I think they haven't got to implement it
[18:28:00] Gotcha.
[18:28:05] This shit is bananas
[18:28:14] or we didn't let them know about the endpoint I ask Petr
[18:28:17] I can't get a single score to be generated on scb1001
[18:28:39] what the fuck?
[18:29:06] Right. it's totally dead.
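[Editor's note: the "quiet logging for user-errors like ParamError" idea from the start of this stretch could be implemented as a logging filter that swallows records whose attached exception is a known client mistake. ParamError comes from the discussion; the logger name and everything else below is an assumption, not ORES's actual code.]

```python
import logging

class UserErrorFilter(logging.Filter):
    """Drop records whose attached exception is the caller's fault."""

    USER_ERRORS = ("ParamError",)  # exception class names to silence

    def filter(self, record):
        if record.exc_info and record.exc_info[0] is not None:
            if record.exc_info[0].__name__ in self.USER_ERRORS:
                return False  # swallow it; the client already got an error response
        return True

# Hypothetical wiring -- logger name is illustrative:
logging.getLogger("ores").addFilter(UserErrorFilter())
```

The filter keys on the exception class name rather than the class object so the logging config need not import application code.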
[18:29:09] halfak: and what about other nodes?
[18:29:23] haven't tried any others in eqiad.
[18:29:42] okay, let me try
[18:29:47] Same story for 1002
[18:31:59] scb1003 seems to work
[18:32:51] Everything is working again O.O
[18:32:55] did you do somethign Amir1?
[18:33:15] nope, I just tried again
[18:33:21] logged and tried
[18:33:24] I need to leave for food. I skipped my lunch
[18:33:27] brb
[18:33:48] okay
[18:33:50] https://grafana.wikimedia.org/dashboard/db/ores-extension?orgId=1
[18:33:53] another spike
[18:34:06] we are definitely under pressure
[18:36:09] Amir1: out of curiosity, are these stats related to the extension's usage on beta cluster?
[18:36:35] codezee: nope, It's about the pressure on the production nodes
[18:37:00] which are because of ORES right?
[18:37:20] these are ores production nodes
[18:38:13] oh, so since they are live, their optimal running is of critical importance I guess?
[18:39:44] codezee: exactly
[18:43:51] halfak: I need to be afk, I will be back soon
[18:52:17] back
[19:03:24] I'm back but I am in a critical meeting.
[19:03:30] I'll be away for an hour.
[19:04:21] halfak: It seems that number of requests is not different otherwise bytes sent would increase https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Service%20Cluster%20B%20eqiad&h=scb1004.eqiad.wmnet&r=day&z=default&jr=&js=&st=1497380564&event=hide&ts=0&v=1777351.13&m=bytes_out&vl=bytes%2Fsec&ti=Bytes%20Sent&z=large
[19:04:57] I'm guessing it started to get slow because some nodes hit SWAP
[19:05:24] Still getting 500s from scb1001
[19:05:46] https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=scb1001.eqiad.wmnet&m=bytes_out&s=by+name&mc=2&g=mem_report&c=Service+Cluster+B+eqiad
[19:07:14] is there anything i can do to help with ores issues?
[19:07:44] I don't know, I will let you know if I get something
[19:07:49] ok
[19:12:53] halfak: the timeout for prod is 10 sec. so It exceeds that time
[19:13:14] It was before, but it isn't now.
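[Editor's note: the node-by-node checking being done by hand here (scb1001 dead, scb1002 dead, scb1003 fine) can be scripted. A minimal sketch follows; the port and URL path are placeholders, not the service's real endpoint, and the `fetch` parameter exists only so the probe can be exercised without network access.]

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

# Node list taken from the incident discussion above.
NODES = ["scb1001.eqiad.wmnet", "scb1002.eqiad.wmnet",
         "scb1003.eqiad.wmnet", "scb1004.eqiad.wmnet"]

def probe(host, fetch=urlopen, timeout=10):
    """Return (host, status): the HTTP status code, or the error class name."""
    url = "http://%s:8081/scores/" % host  # port and path are assumptions
    try:
        with fetch(url, timeout=timeout) as resp:
            return host, resp.status
    except HTTPError as e:
        return host, e.code          # e.g. the random 500s seen here
    except (URLError, OSError) as e:
        return host, type(e).__name__  # timeout, refused connection, etc.
```

Running `[probe(h) for h in NODES]` would have surfaced the dead nodes in one pass instead of four manual checks.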
[19:13:21] scb1001 is still getting random 500s
[19:13:29] But other servers seem to be responding cleanly.
[19:13:33] Still no idea what's going on.
[19:13:38] No error messages in app.log
[19:13:55] try moving to scb1002
[19:50:17] halfak: I am aware that you seem busy with some critical issue. FYI, I have modified my PR based on your feedback.
[19:50:42] Now, I am working on the feedback for Geritt GetSuggestions patch
[19:57:59] glorian_wd: i saw that Ill waitfor aaron to look at it first
[20:46:09] https://wikitech.wikimedia.org/wiki/Incident_documentation/20170613-ORES
[20:46:11] Done
[20:46:12] OK
[20:46:20] What have I been ignoring no.
[20:46:24] codezee!
[20:46:28] Oh he's gone :(
[20:46:30] glorian_wd,
[20:46:33] looking at pr now
[20:46:58] halfak: it's already late in codezee's place now :)
[20:46:59] kk thanks
[20:48:42] glorian_wd, I see that you have still not implemented your features in wikidatawiki.py
[20:49:00] I think I did?
[20:49:28] I see a lot of files here
[20:49:38] And a whole folder called "wikidata"
[20:49:50] Ah I see
[20:49:59] I find it's more cleaner using folder
[20:50:00] I also don't see any test cases.
[20:50:11] rather than dumping into one file
[20:50:33] but shall I just put everything into wikidatawiki.py?
[20:50:54] sec. let me see how easy it is to just take this one.
[20:51:11] for the test cases, I haven't really gone through it. I think I need time to figure it out
[20:51:31] I think prod monitoring needs to be proritized for scb hosts at least halfak just my $.02
[20:51:43] So far, I tested the functions with extractors from revscoring, and see whether the functions can extract the features correctly
[20:52:17] extract the features from wikiclass correctly*
[20:57:12] glorian_wd, how did you test?
[20:58:54] halfak: https://gist.github.com/GlorianY/bf23c7bd924848a053cf92b0f50682d4
[21:04:40] glorian_wd, had something come up. Go ahead and take a shot at moving all of the stuff to one file.
Don't define any DenpendentSet classes.
[21:04:50] Just datasources and features.
[21:05:02] Only datasources when the intent is to re-use the data in multiple features.
[21:05:12] I see
[21:05:14] ok
[21:05:51] But, I got to work on the GetSuggestions patch first. Otherwise, this will just take time to be merged
[21:07:27] OK
[21:08:29] halfak: if i communicate with ops what you need for prod mointoring with you there explainging your needs ( i can translate into icinga speak for them) that would be beneficial for now with scb issues
[21:09:02] We should have caught those 500 errors.
[21:09:09] No idea why it wasn't caught with the checks we have.
[21:09:13] 1. figure that out
[21:09:18] 2. fix it :)
[21:09:48] i would if i had access to the needed things so sadly thatsup to yall
[21:10:04] (note to self never say yall ever again)
[21:10:27] yall is fine
[21:11:30] But yeah, I don't have rights to see it either. Usually I ask questions about icinga in -operation
[21:11:31] s
[21:12:00] do you know what their current stup for you is ?
[21:12:41] I don't have a coherent map of that, no.
[21:14:21] paladox: is prod's icinga checks setup public?
[21:14:32] What do you mean?
[21:14:35] i think so
[21:14:45] I dunno. I'm not that familiar.
[21:14:47] where would one access that?
[21:14:54] https://github.com/wikimedia/puppet/search?utf8=✓&q=icinga+checks&type=
[21:15:01] puppeyt
[21:15:03] puppet
[21:16:05] halfak: I am just realized that your Wikipedia article hasn't yet linked to your Wikidata item
[21:16:13] *not that important*
[21:16:19] how did that happen.
[21:16:42] I saw in the "In Other Projects" section
[21:16:48] lol I have an article on ukwiki :)
[21:18:25] i can link it whats the wikidata link and wikipedia link
[21:20:17] Looks like they are linked now
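[Editor's note: halfak's guidance above (no DependentSet classes; just datasources and features, with a datasource only when its data is reused by several features) can be illustrated with toy stand-ins. revscoring's real Datasource/Feature API has different signatures; this sketch only shows the dependency shape.]

```python
# Toy stand-ins for revscoring-style datasources and features.
# A datasource earns its keep when several features depend on it,
# so the underlying extraction happens once per shared definition.

class Datasource:
    def __init__(self, name, extract):
        self.name, self.extract = name, extract

class Feature:
    def __init__(self, name, depends_on, compute):
        self.name, self.depends_on, self.compute = name, depends_on, compute

    def __call__(self, raw):
        # Pull the shared intermediate value, then derive this feature.
        return self.compute(self.depends_on.extract(raw))

# One datasource reused by two features (names are hypothetical):
item_claims = Datasource("item.claims", lambda raw: raw.get("claims", {}))

n_claims = Feature("n_claims", item_claims, len)
has_p31 = Feature("has_P31", item_claims, lambda claims: "P31" in claims)
```

A datasource used by only one feature adds indirection for no benefit; in that case the extraction can live inside the feature itself.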