[07:28:29] 10Scoring-platform-team, 10Wikilabels: [Discuss] Wikilabels routes refactor - https://phabricator.wikimedia.org/T165046#3414738 (10Jan_Dittrich) > The user would be completing those 5-item "worksets" without being exposed to a workset concept. @Pginer-WMF: So the idea would be to give the explicit concept "Wo...
[08:05:29] 10Scoring-platform-team, 10Wikilabels: [Discuss] Wikilabels routes refactor - https://phabricator.wikimedia.org/T165046#3414789 (10Pginer-WMF) >>! In T165046#3414738, @Jan_Dittrich wrote: >> The user would be completing those 5-item "worksets" without being exposed to a workset concept. > > @Pginer-WMF: So th...
[09:11:40] 10Scoring-platform-team, 10Operations: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3414940 (10akosiaris)
[09:11:54] 10Scoring-platform-team, 10Operations: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3258960 (10akosiaris)
[09:13:27] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3414946 (10akosiaris)
[09:13:32] 10Scoring-platform-team, 10Operations: rack/setup/install ores1001-1009 - https://phabricator.wikimedia.org/T165171#3258960 (10akosiaris) 05Open>03Resolved Per @Ladsgroup 's comment we better handle the service implementation in T168073. Which is btw gonna be stalled as we are going to stress test a bit th...
[09:13:55] 10Scoring-platform-team-Backlog, 10ORES, 10Patch-For-Review: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3391572 (10akosiaris)
[09:13:57] 10Scoring-platform-team-Backlog, 10ORES: Switch ORES to dedicated cluster - https://phabricator.wikimedia.org/T168073#3355113 (10akosiaris) 05Open>03stalled Stalling this while T169246 takes place
[09:46:30] 10Scoring-platform-team-Backlog, 10Graphite, 10ORES, 10Operations, 10User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3415026 (10fgiunchedi)
[09:51:56] 10Scoring-platform-team-Backlog, 10ORES, 10Patch-For-Review: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3415063 (10akosiaris) ores1001-ores1009 have been pretty much in the same way scb clusters are setup, with the following exceptions in order to isolate the stress te...
[09:59:38] 10Scoring-platform-team, 10DBA, 10Operations, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3415085 (10jcrespo) Someone announced 60 seconds of downtime, which I do not think is reasonable- rebooting fully a server and all its services takes around 3...
[10:44:55] 10Scoring-platform-team, 10MediaWiki-extensions-ORES, 10MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), 10Patch-For-Review, and 2 others: [Discuss] Make ORES Review Tool preferences more prominent - https://phabricator.wikimedia.org/T167910#3415124 (10Trizek-WMF) >>! In T167910#3414308, @jmat...
[14:07:31] (03PS4) 10Umherirrender: build: Updating mediawiki/mediawiki-codesniffer to 0.10.0 [extensions/ORES] - 10https://gerrit.wikimedia.org/r/360227 (owner: 10Legoktm)
[14:07:37] (03CR) 10Umherirrender: [C: 032] build: Updating mediawiki/mediawiki-codesniffer to 0.10.0 [extensions/ORES] - 10https://gerrit.wikimedia.org/r/360227 (owner: 10Legoktm)
[14:21:51] (03Merged) 10jenkins-bot: build: Updating mediawiki/mediawiki-codesniffer to 0.10.0 [extensions/ORES] - 10https://gerrit.wikimedia.org/r/360227 (owner: 10Legoktm)
[14:32:28] o/
[14:46:50] o/
[14:54:21] Zppix, could you respond to your email notification about Wikilabels downtime with a note that the downtime will likely last 5 minutes, but the maintenance window will be for a whole hour?
[14:54:31] I mis-typed when I said "one minute"
[14:54:37] sure
[14:55:53] done :) halfak
[14:55:59] Thanks
[14:56:01] n
[14:56:03] np
[14:56:16] I think today I'm going to try to get all of the new models merged.
[14:56:29] We have a lot of models that have been languishing due to some prod issues.
[14:56:35] And my admin work :\
[14:57:00] https://lists.wikimedia.org/pipermail/wikitech-l/2017-July/088435.html halfak
[14:57:23] halfak: let me know if you need me to whip out my merge powers
[14:58:01] Zppix, will do. We should talk about what kind of things to look for in some of these reviews too :) Maybe we can do that with one of the model PRs
[14:58:21] halfak: sounds good maybe it will also help me learn to create a model for you all :)
[15:00:25] We'll need to get your linux env set up to push to github
[15:01:02] i got it setup overnight last night :)
[15:01:17] halfak ^
[15:01:32] Nice!
[15:01:42] 10Scoring-platform-team-Backlog, 10ORES, 10Patch-For-Review: Stress/capacity test new ores* cluster - https://phabricator.wikimedia.org/T169246#3415722 (10Halfak) Great! We'll need a nice way to send the requests and we'll probably want a stupid round-robin strategy. I think that should be a pretty straigh...
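The "stupid round-robin strategy" Halfak mentions in that last task comment can be sketched roughly as follows. This is a hypothetical illustration, not the actual stress-test tooling: the hostnames and request count are made up, and actually firing the scoring requests (e.g. with urllib) is left as a comment.

```python
from itertools import cycle

# Hypothetical subset of the new cluster; the real test targets
# ores1001-ores1009 per T169246.
HOSTS = ["ores1001", "ores1002", "ores1003"]

def round_robin_targets(hosts, n_requests):
    """Yield one backend host per request, cycling through the
    cluster in order (the simple round-robin strategy)."""
    pool = cycle(hosts)
    for _ in range(n_requests):
        yield next(pool)

# Sending the actual scoring requests would go here; this sketch
# only decides which backend each request hits.
for host in round_robin_targets(HOSTS, 6):
    print(host)
```

Real load-testing tools offer weighted or latency-aware balancing, but for an isolated capacity test a plain cycle keeps the per-host load trivially predictable.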
[15:03:29] halfak: I found out why it did it (stupid me had 2 different git programs installed somehow)
[15:04:56] Oh good. I'm glad it was a somewhat obvious solution
[15:05:00] What text editor do you prefer?
[15:05:25] I use npp halfak
[15:06:19] Oh! Windows then?
[15:06:28] Dev in linux will be much easier for most of our stuff.
[15:06:48] halfak: i will work on moving to linux later on
[15:06:55] We end up using some libraries that just work better on linux (e.g. enchant for spell checking, and signals for interrupts)
[15:06:58] halfak: windows is easier for me to use as i've used it for years
[15:07:32] sure. I can get that. Maybe we can just have you run tests on a labs VM
[15:07:57] sure if you have one i can use or tell me where i should request one
[15:07:58] halfak
[15:09:16] We'll have one you can use. I'm working on setting one up that we'll be able to use as a shared dev and model-building space.
[15:09:26] https://phabricator.wikimedia.org/T169809
[15:09:32] halfak: is https://github.com/wiki-ai/wikilabels-wmflabs-deploy/pull/38 ready to go?
[15:09:39] Yup
[15:09:42] Plz merge
[15:09:47] And I'll send to staging :)
[15:09:53] ok
[15:10:04] done
[15:10:27] Zppix, note the blocking subtask for that last one I linked to. We're just waiting on the Cloud folks to get to this task. It's delayed for a good reason, so don't poke 'em OK?
[15:11:03] halfak: no worries ill just use windows for now
[15:11:39] halfak: i trust cloud team knows what they are doing, i have no knowledge behind that kind of server management so i have no proper reason for bothering them
[15:12:01] 10Scoring-platform-team, 10DBA, 10Operations, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3415739 (10Halfak) Announcements have been updated. Thanks for the note. Shall we always announce a 1 hour maintenance window for DB maintenance?
[15:15:07] 10Scoring-platform-team, 10Wikilabels: [Discuss] Wikilabels routes refactor - https://phabricator.wikimedia.org/T165046#3415745 (10Halfak) It's interesting that you raise this issue. I've not heard from any users that it was a confusing concept. Is having "worksets" without calling them "worksets" less confu...
[15:15:14] 10Scoring-platform-team, 10DBA, 10Operations, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3415746 (10jcrespo) It varies from maintenance to maintenance, depending on the work to be done. Some take more some take less- the "normally" was meant as "N...
[15:17:11] halfak: i updated github.com/wiki-ai/ to include a link to our mediawiki.org page (https://www.mediawiki.org/wiki/Wikimedia_Scoring_Platform_team)
[15:19:02] (03PS2) 10Umherirrender: Remove redundant or useless PHPCS rules [extensions/ORES] - 10https://gerrit.wikimedia.org/r/363516 (owner: 10Legoktm)
[15:19:46] (03CR) 10Umherirrender: [C: 032] Remove redundant or useless PHPCS rules [extensions/ORES] - 10https://gerrit.wikimedia.org/r/363516 (owner: 10Legoktm)
[15:20:48] Zppix, nice. I wonder if we can get ebernhardson and lzia in on this.
[15:21:00] halfak: how so?
[15:21:07] They both do wiki AI stuff too.
[15:21:12] Different teams.
[15:21:12] halfak: on what? :)
[15:21:33] We have this wiki-ai org on github. Seems limiting that there's only ORES/wikilabels/etc stuff in there.
[15:21:37] halfak: i mean im with you whatever you decide
[15:21:40] But regardless it would be good to link to your work
[15:21:55] i can add them to the org if you want halfak ill just need their github acct
[15:22:36] (03Merged) 10jenkins-bot: Remove redundant or useless PHPCS rules [extensions/ORES] - 10https://gerrit.wikimedia.org/r/363516 (owner: 10Legoktm)
[15:22:39] ahh. We could although mine is not nearly so general as what you are working on. It's probably only useful to someone doing things like I am (calculating labels from click logs, collecting feature vectors from elasticsearch, etc)
[15:23:53] Nothing wrong with that. Most of our stuff is MediaWiki specific. But anyway, just want to leave that door open.
[15:24:16] I'm stoked that you've appropriated this channel.
[15:24:22] :D
[15:24:38] And mailing list
[15:24:47] I've been trying to convince lzia to come and hang out
[15:39:44] 10Scoring-platform-team, 10DBA, 10Operations, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3415818 (10Halfak)
[15:40:52] 10Scoring-platform-team, 10DBA, 10Operations, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3369055 (10Halfak)
[15:41:01] 10Scoring-platform-team, 10DBA, 10Operations, 10cloud-services-team: Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3369055 (10Halfak) Gotcha. Next time, we should add these details to the task description and I'll pick them up from there when making announcements. :) In...
[19:03:25] halfak: where is the skip button dialog message found for wikilabels
[19:04:17] isn't overfitting fun: CV on a 1.1M sample dataset, and then eval the final model on a 10M sample dataset: cv-train-ndcg@10: 0.9143, cv-test-ndcg@10: 0.9320, holdout-test-ndcg@10: 0.8275
[19:07:42] ebernhardson, how did you overfit that? Did you accidentally tune to your CV set?
[19:08:13] Zppix, https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/i18n/en.json
[19:08:21] Rather: https://github.com/wiki-ai/wikilabels/blob/master/wikilabels/i18n/en.json#L4
[19:08:54] ty
[19:09:31] halfak: i think the hyperparameter optimization made a horrible choice for maximum tree depth and minimum samples per node, but i'm not sure why yet. It decided to use a depth of 10 and a min weight of 10, which are both the extremes of my parameter search and generally not even a good idea when i do CV on the full 40M samples
[19:10:05] Interesting. Sounds like you either got very very unlucky or there's something weird going on.
[19:10:07] i suppose it's partly because somehow the cv-test is saying those were really good values ...
[19:12:28] i've trained this same dataset before and had sane results, the only thing different is i added two new features: wp10 and page_created_timestamp, not sure what best to try: either more cv folds (currently 5) or one new feature at a time to see if one of them is just throwing things way out
[19:15:20] You are using a gradient boosting strategy, right?
[19:15:54] ebernhardson, ^
[19:15:54] yes, this is a gradient boosting ensemble with xgboost (using lambdarank as the objective)
[19:16:17] Weird. I'm surprised it would be fragile like that.
[19:16:18] halfak can you review my last pr
[19:16:24] Zppix, already did
[19:16:32] wow thats quick
[19:18:12] replied halfak
[19:21:21] replied
[19:21:37] ok, im going to go bug wikibugs maintainers it appears it died
[19:24:02] done
[19:30:33] halfak: I read the test requirements on https://phabricator.wikimedia.org/T168007 . Is https://github.com/wiki-ai/ores/pull/215/files#diff-20dd115b72122ad56214dad325ca7df9 one such example of a request?
[19:31:22] codezee, thanks for catching that.
[19:31:23] https://github.com/wiki-ai/ores/pull/216
[19:31:35] I'd made that change yesterday in frustration because it stood in my way :)
[19:33:33] ohh looks like its already done.. :P
[19:33:49] codezee, I just merged the models you built for Romanian Wikipedia. Did you build them on ores-compute-01?
[19:34:01] halfak: yes
[19:34:17] halfak: I think it needs to be deleted right?
[19:37:08] Eventually, yeah. Just wanted to make sure they were built in our prod env. :)
[19:37:17] ohh, ok...
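One cheap sanity check for the situation ebernhardson describes, where the tuned depth of 10 and min weight of 10 are "both the extremes of my parameter search", is to flag any chosen hyperparameter that landed on the boundary of its search range. The helper below is hypothetical (not part of xgboost or ORES); a boundary hit usually means the grid should be widened, or, as here, that the CV signal itself is suspect.

```python
def boundary_params(best_params, search_space):
    """Return names of tuned parameters whose chosen value is the
    minimum or maximum of the values that were searched; such hits
    suggest widening the grid or distrusting the CV estimate."""
    flagged = []
    for name, value in best_params.items():
        grid = sorted(search_space.get(name, []))
        if grid and value in (grid[0], grid[-1]):
            flagged.append(name)
    return flagged

# Illustrative grids only -- not the actual search from the chat.
search = {"max_depth": [4, 6, 8, 10],
          "min_child_weight": [1, 5, 10],
          "eta": [0.05, 0.1, 0.3]}
best = {"max_depth": 10, "min_child_weight": 10, "eta": 0.1}
print(boundary_params(best, search))
```

With a search result like the one above, `max_depth` and `min_child_weight` would be flagged while `eta` (an interior value) would not.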
[19:38:03] halfak: fyi i changed to task instead of edit
[19:41:01] halfak: I've setup working ORES locally for future tasks but I'm not sure what exactly celery is, I saw it being mentioned as failing in a parallel thread with SignalTimeout...
[19:42:16] besides I think nosetests fail locally saying no module celerytest, ... update required to requirements?
[19:46:19] halfak: i dont think celery was added to ores requirements.txt want me to add it if its not?
[19:46:42] nevermind it is
[19:46:50] Zppix: celery is there in ORES requirements but still this error :/
[19:46:53] codezee: try running pip install -r requirements.txt
[19:47:32] codezee, we just stopped using celerytest
[19:47:40] but it's still in text-requirements.txt
[19:47:58] test*
[19:48:19] halfak: if we're not using it is it okay if i remove it from test-requirements.txt?
[19:50:41] oh i see
[19:51:59] ok, so I have the latest version and i get "ERROR: Failure: ImportError (cannot import name 'CELERY_TEST_BACKEND_CONFIG')" on nosetests
[19:55:09] codezee: halfak ^ there you go
[20:04:52] nice
[20:05:23] codezee, not sure what is up there.
[20:06:40] codezee, I just merged something that fixes our CI
[20:06:54] That was causing different errors than you got, but this might help anyway
[20:06:57] Try to pull master again.
[20:09:46] Yay! tests pass \o/
[20:16:08] Nice!
[20:36:38] 10Scoring-platform-team-Backlog, 10Wikimania-Hackathon-2017: ORES @ the Wikimania Hackathon - https://phabricator.wikimedia.org/T170015#3416777 (10Halfak)
[20:38:19] PROBLEM - ping4 on ores-worker-09 is WARNING: PING WARNING - Packet loss = 0%, RTA = 104.37 ms
[20:38:34] uh oh
[20:38:35] PROBLEM - ping4 on ores-redis-01 is CRITICAL: PING CRITICAL - Packet loss = 44%, RTA = 2291.60 ms
[20:38:38] PROBLEM - ping4 on ores-web-03 is CRITICAL: PING CRITICAL - Packet loss = 60%, RTA = 2221.11 ms
[20:38:43] hmm
[20:38:54] PROBLEM - ping4 on ores-lb-02 is CRITICAL: PING CRITICAL - Packet loss = 70%, RTA = 2333.73 ms
[20:39:03] did labs have some hiccups in its dns
[20:39:03] halfak everything is going to hell
[20:39:22] PROBLEM - ping4 on ores-worker-09 is CRITICAL: PING CRITICAL - Packet loss = 0%, RTA = 258.00 ms
[20:39:25] uh
[20:39:30] it just happened in -releng
[20:39:56] Confirmed that ores-lb-02 is online
[20:39:57] RECOVERY - ping4 on ores-lb-02 is OK: PING OK - Packet loss = 0%, RTA = 2.51 ms
[20:40:08] lol
[20:40:25] RECOVERY - ping4 on ores-worker-09 is OK: PING OK - Packet loss = 0%, RTA = 2.33 ms
[20:40:32] halfak well at least we know icinga2-wm works lol
[20:40:39] right :)
[20:40:41] RECOVERY - ping4 on ores-redis-01 is OK: PING OK - Packet loss = 0%, RTA = 66.43 ms
[20:40:42] lol
[20:40:43] RECOVERY - ping4 on ores-web-03 is OK: PING OK - Packet loss = 0%, RTA = 4.62 ms
[20:41:03] it just happened for phab-01 and wikistats when it pinged its domain.
[20:41:08] 10Scoring-platform-team, 10ORES, 10User-Zppix: Extend icinga check to catch 500 errors like those of the 20170613 incident - https://phabricator.wikimedia.org/T167830#3416793 (10Halfak) a:05Zppix>03Halfak
[20:41:09] it was probably a blip in the dns
[20:41:17] or something is about to go down.
[20:41:33] 10Scoring-platform-team, 10ORES, 10User-Zppix: Extend icinga check to catch 500 errors like those of the 20170613 incident - https://phabricator.wikimedia.org/T167830#3346179 (10Halfak) @akosiaris, see ^ No rush.
[20:41:51] halfak phab is being weird why was that task reassigned?
[20:42:12] Zppix, I just finished the work for it
[20:42:32] ok
[20:42:34] * halfak is trying to get the epic task for our last downtime event killed.
[20:42:56] * Zppix is trying to let me connect to phab on my desktop
[20:43:10] 10Scoring-platform-team-Backlog: Send celery logs and events to logstash - https://phabricator.wikimedia.org/T169586#3402704 (10Halfak) How do logs get sent to logstash anyway? I'm not even sure how to get started on this task.
[20:43:27] 10Scoring-platform-team-Backlog, 10ORES: Send celery logs and events to logstash - https://phabricator.wikimedia.org/T169586#3416820 (10Halfak)
[20:43:37] 10Scoring-platform-team-Backlog, 10ORES, 10Wikimedia-Logstash: Send celery logs and events to logstash - https://phabricator.wikimedia.org/T169586#3402704 (10Halfak)
[20:43:39] i think theres documentation on that on wikitech halfak iirc
[20:52:59] Zppix, any chance you want to look into that?
[20:53:07] I got stuck working on something else for the next hour.
[20:53:32] 10Scoring-platform-team-Backlog, 10Contributors-UX-Research, 10Wikilabels: Rename "abandon" button to something less confusing - https://phabricator.wikimedia.org/T138736#3416841 (10Zppix) 05Open>03Resolved Resolved already. Reopen if im incorrect
[20:53:42] halfak i can attempt to yes
[20:54:08] 10Scoring-platform-team, 10Contributors-UX-Research, 10Wikilabels: Rename "abandon" button to something less confusing - https://phabricator.wikimedia.org/T138736#3416846 (10Halfak)
[20:55:21] are we sending logs and such from labs or prod halfak?
[20:55:37] At least prod if not also labs
[20:55:57] Theres logstash for both, ill look into prod first since its a bit more important
[20:59:06] ill talk to bryan davis he seems to be the person to go to according to wikitech's page on logstash halfak
[20:59:12] ill update you as i get info
[21:00:42] thanks
[21:07:20] halfak: but generally you'd need to add a python log config that routes the messages to the proper logstash collector server and then add rules there via Puppet to process and store the messages according to bd808
[21:07:55] We are sending some logs to logstash. How do the logs get picked up from the local servers?
[21:09:09] he doesnt work with logstash atm so he doesnt know much more than what he told me
[21:09:17] ill see what i can find
[21:10:49] halfak do you know what department of wikimedia maintains logstash?
[21:10:52] is it ops?
[21:11:07] I think it must be.
[21:11:12] They'd know either way
[21:11:12] ill ask there
[21:26:40] halfak i think we use elasticsearch to log to logstash
[21:26:55] see wikitech.wikimedia.org/wiki/elasticsearch
[21:53:26] 10Scoring-platform-team, 10WMF-Communications, 10Wikimedia-Blog-Content: Announce new team: "Scoring Platform" - https://phabricator.wikimedia.org/T169755#3417001 (10Halfak) https://docs.google.com/document/d/12i0iXobCZX_6JySTGJcpPeqIoFMeq6O0NvOXZUju0iA/edit#heading=h.pk21ji5iohm1
[21:55:19] Zppix, I feel like there should be some puppet config about where to find the logs to import into logstash but I'm not finding that
[21:55:31] Im still asking no reply
[21:55:44] https://github.com/wikimedia/puppet/blob/34acf17716526dddda953067172f9f8257ab0e3e/modules/role/files/logstash/filter-ores.conf
[21:55:48] Oh wait. look at that
[21:56:04] https://github.com/wikimedia/puppet/commit/34acf17716526dddda953067172f9f8257ab0e3e
[21:56:06] Ha!
[21:56:25] why cant everything be so easy
[21:57:00] I've got to step away pretty soon. Helping a buddy pick up an old deck from his yard.
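A minimal sketch of the "python log config that routes the messages to the proper logstash collector server" that bd808 describes above, assuming a Logstash UDP input with a JSON codec. The host, port (11514), and field names here are invented for illustration; they are not the production WMF/ORES configuration, which is driven by Puppet (see the filter-ores.conf link above).

```python
import json
import logging
import socket

class LogstashUDPHandler(logging.Handler):
    """Sketch of a handler that serializes log records as JSON and
    ships them over UDP to a Logstash `udp { codec => json }` input.
    Host/port/field names are assumptions, not the real config."""

    def __init__(self, host="localhost", port=11514):
        super().__init__()
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def make_payload(self, record):
        # One JSON document per record; Logstash filter rules would
        # then route/store it (the Puppet side bd808 mentions).
        return json.dumps({
            "@message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }).encode("utf-8")

    def emit(self, record):
        try:
            self.sock.sendto(self.make_payload(record), self.addr)
        except OSError:
            self.handleError(record)
```

Attaching it would look like `logging.getLogger("ores").addHandler(LogstashUDPHandler())`; UDP is fire-and-forget, so a dead collector drops logs silently rather than blocking the service.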
[21:57:19] * halfak <3's manual labor at the end of a long day of hacking
[21:57:22] halfak ill look at that see if i cant figure it out
[21:57:25] have a good one
[21:57:27] Thanks dude.
[21:57:29] o/
[21:57:41] o/
[22:18:53] zppix: its actually the other way around, you log to logstash for the data to be inserted into elasticsearch
[22:22:11] i guess zppix isn't around, but also somehow i maintain logstash :P
[23:35:50] 10Scoring-platform-team-Backlog, 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-ORES, 10ORES: ORES spamming Beta Cluster's logstash - https://phabricator.wikimedia.org/T170026#3417317 (10greg)
[23:36:02] 10Scoring-platform-team-Backlog, 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-ORES, 10ORES: ORES spamming Beta Cluster's logstash - https://phabricator.wikimedia.org/T170026#3417330 (10greg)