[14:32:05] Scoring-platform-team, Analytics, EventBus: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Halfak) Confirmed. We expect no overlap in revision-scored where a revision is scored twice, but should that happen, new event.
[14:34:04] Scoring-platform-team, MediaWiki-API, Wikimedia-database-error, Wikimedia-production-error: Certain prop=revisions API queries timeout with "internal_api_error_DBQueryError" - https://phabricator.wikimedia.org/T121333 (Halfak) Just for clarity, who would be helpful with database performance issu...
[14:34:56] Scoring-platform-team, Analytics, EventBus: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Ottomata) @Halfak I think we've confirmed this before too, but I want to make super sure! All predictions are strings (or can be cast to strings), an...
[14:39:29] Scoring-platform-team, Analytics, EventBus: Change event.mediawiki_revision_score schema to use map types - https://phabricator.wikimedia.org/T225211 (Halfak) Hmm yes. You can cram a bool or int into a string. Predictions are all int, bool, string, or list of strings.
[14:44:30] ORES, Scoring-platform-team (Current), Repository-Admins, Release-Engineering-Team-TODO (201909): Update k18 password in phab diffusion - https://phabricator.wikimedia.org/T232661 (MarcoAurelio) I'll try with `phab` as shell, and K19; and report back.
[14:46:10] ORES, Scoring-platform-team (Current), Patch-For-Review, Puppet: Require git-lfs in ores::base puppet role - https://phabricator.wikimedia.org/T232494 (Halfak) We need git-lfs on the nodes we use to build models too. Hence, why I suggested ores::base. But otherwise, I think this sounds like a f...
[14:50:09] ORES, Scoring-platform-team (Current), Repository-Admins, Release-Engineering-Team-TODO (201909): Update k18 password in phab diffusion - https://phabricator.wikimedia.org/T232661 (MarcoAurelio) >>! In T232661#5487934, @MarcoAurelio wrote: > I'll try with `phab` as shell, and K19; and report back...
[14:50:19] o/
[14:50:47] I just saw your work on T232661. Are the K18 creds updated now?
[14:50:52] T232661: Update k18 password in phab diffusion - https://phabricator.wikimedia.org/T232661
[14:50:57] hauskatze, ^
[14:51:41] halfak: not yet. I switched phabricator to replicate via SSH to Gerrit, and used K19 instead.
[14:52:04] so far it works too
[14:55:06] Ohhh
[14:55:21] I tried that yesterday but with the wrong username
[14:55:27] so it obviously failed
[14:55:27] Scoring-platform-team, MediaWiki-API, Wikimedia-database-error, Wikimedia-production-error: Certain prop=revisions API queries timeout with "internal_api_error_DBQueryError" - https://phabricator.wikimedia.org/T121333 (Anomie) Open→Resolved a:Anomie For core modules, probably CPT des...
[14:55:49] so, ores/ores.git being out-of-date should be resolved
[14:56:42] Scoring-platform-team, MediaWiki-API, Wikimedia-database-error, Wikimedia-production-error: Certain prop=revisions API queries timeout with "internal_api_error_DBQueryError" - https://phabricator.wikimedia.org/T121333 (Halfak) Awesome writeup @anomie. Thank you.
[14:56:58] Thanks for your help. We have a bunch of repos that replicate in a similar way. Should we do this for all of them?
[14:58:02] Scoring-platform-team, MediaWiki-API, Core Platform Team Workboards (Clinic Duty Team), Wikimedia-database-error, Wikimedia-production-error: Certain prop=revisions API queries timeout with "internal_api_error_DBQueryError" - https://phabricator.wikimedia.org/T121333 (Anomie)
[14:58:19] I see lots of login errors in the console, yes.
I can migrate them for you if you let me know the repos
[14:59:11] There's also a weird branch name at https://phabricator.wikimedia.org/source/ores/branches/master/
[14:59:20] which is copied from GitHub
[14:59:21] source/editquality, articlequality, draftquality, drafttopic
[14:59:34] you may wish to kill that one if not needed - IDK :)
[14:59:48] I'll see what I can do
[15:00:10] Oh weird.
[15:00:46] editquality.git seems already done: https://phabricator.wikimedia.org/source/editquality/manage/uris/
[15:01:39] * halfak tries to figure out how to delete the branch.
[15:03:11] Hmm. The only way that could happen is if it got mirrored.
[15:03:47] Looking at editquality, something don't add up
[15:03:53] Or not enough coffee
[15:04:23] so editquality is mediawiki/services/ores/editquality right halfak ?
[15:04:54] because https://gerrit.wikimedia.org/g/mediawiki/services/ores/editquality is 3 years behind :/
[15:04:56] https://gerrit.wikimedia.org/r/scoring/ores/editquality
[15:05:11] I dunno what that repo is. It could probably be deleted.
[15:05:36] halfak: I heard about "draft topic" for the first time today but I couldn't find any info online other than https://www.mediawiki.org/wiki/ORES/Draft_topic
[15:05:51] editquality @ phab points to mediawiki/services/ores/editquality halfak
[15:06:03] https://phabricator.wikimedia.org/source/editquality/manage/uris/
[15:06:39] halfak: Does it exist somewhere or is it more at the idea stage?
[15:08:19] stephanebisson, drafttopic has been in production for over a year :)
[15:08:44] Alright so the editquality repo at Phab points to a wrong repo at Gerrit
[15:08:45] hauskatze, interesting. We have been having trouble getting changes pushed to editquality from phab.
[15:08:51] that's why it is not updating
[15:08:53] ha
[15:09:10] so if you confirm it's https://gerrit.wikimedia.org/g/scoring/ores/editquality/+/refs/heads/master
[15:09:19] I'll change the URI and re-replicate
[15:09:47] confirmed.
[15:09:56] Changing
[15:09:57] "url = https://gerrit.wikimedia.org/r/scoring/ores/editquality" from our deployment config repo
[15:10:00] thanks
[15:10:20] stephanebisson, any questions I can answer about drafttopic?
[15:11:35] Done. Forcing an update.
[15:12:13] * hauskatze reads the Penal Code in the meanwhile
[15:13:36] Updated :D
[15:13:45] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[15:13:53] moving to articlequality
[15:13:57] It's so weird that I have to delete a branch on phab. Like. How did it get there? I don't even know how to push changes to phab?
[15:14:18] If you deleted it on GitHub it'll get deleted on Phab & gerrit
[15:14:26] in the next replication run
[15:14:37] Weird. OK that works.
[15:14:56] Deleted.
[15:17:42] Updating articlequality to use SSH
[15:17:49] and forcing an update
[15:18:31] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 977 bytes in 8.023 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:22:17] halfak: Is there more public doc? I have to run now but I'm thinking about ways for readers to explore a topic and wondering if the AI-created topic graph would be more usable than the traditional category tree.
[15:22:39] it's taking a while to update the articlequality repo
[15:22:44] still in progress
[15:23:33] Thanks hauskatze
[15:23:56] stephanebisson, I have the research paper describing how it is built and what it is designed for. See https://dl.acm.org/citation.cfm?id=3274290
[15:24:22] That's probably the best reference right now.
[15:24:29] It's much more stable and useful than categories :)
[15:26:11] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[15:27:35] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.144 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:30:09] articlequality in sync. again
[15:30:34] Moving to draftquality
[15:31:20] * halfak pulls changes to the deploy config
[15:31:27] It works! We are unblocked!
[15:31:38] <3 hauskatze
[15:32:41] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:35:17] draftquality should be also in sync. back again
[15:35:25] rechecking
[15:35:50] Yup, looks like it.
[15:35:59] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 8.495 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:36:05] Now to drafttopic
[15:38:14] Updating...
[15:40:43] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[15:41:20] (PS1) Halfak: Bumps ORES to HEAD [services/ores/deploy] - https://gerrit.wikimedia.org/r/536221
[15:42:11] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 4.127 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:42:54] drafttopic isn't updating for some reason, checking
[15:43:57] Alright, forgot to set the right credential
[15:44:05] * hauskatze eyes the nespresso machine
[15:45:08] Lol
[15:47:04] Updated!
[15:47:17] halfak: all the repos you mentioned are now in sync.
[15:50:15] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[15:51:41] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 977 bytes in 0.435 second response time https://wikitech.wikimedia.org/wiki/ORES
[16:26:28] (CR) Accraze: [C: +2] Bumps ORES to HEAD [services/ores/deploy] - https://gerrit.wikimedia.org/r/536221 (owner: Halfak)
[16:26:56] ORES, Scoring-platform-team (Current): ORES deploy early Sept. 2019 - https://phabricator.wikimedia.org/T232660 (Halfak) https://gerrit.wikimedia.org/r/#/c/mediawiki/services/ores/deploy/+/536221
[16:28:18] (CR) Halfak: [V: +2] Bumps ORES to HEAD [services/ores/deploy] - https://gerrit.wikimedia.org/r/536221 (owner: Halfak)
[16:40:02] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/ORES
[16:41:41] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 5.678 second response time https://wikitech.wikimedia.org/wiki/ORES
[16:53:58] beta deploy looks good. I think we're ready to send this to prod in 7 minutes.
[17:00:13] doing the deploy!
[17:06:50] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.004 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:10:02] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 4.126 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:10:46] PROBLEM - ORES web node labs ores-web-02 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.003 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:13:56] RECOVERY - ORES web node labs ores-web-02 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.053 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:15:44] PROBLEM - ORES web node labs ores-web-01 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.003 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:17:24] RECOVERY - ORES web node labs ores-web-01 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 5.053 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:19:53] Deploy complete!
[17:19:56] Looks good.
[17:20:15] Also, we had a major event a moment ago where someone was absolutely HAMMERING ores and it took it in stride :)
[17:21:19] Most of it was served from the cache
[17:21:22] * halfak flexes muscles.
[17:21:28] OK off to lunch
[17:35:24] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[17:37:00] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1007 bytes in 9.274 second response time https://wikitech.wikimedia.org/wiki/ORES
[17:44:56] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/ORES
[18:02:10] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 2.002 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:33:32] ORES, Scoring-platform-team (Current), Repository-Admins, Release-Engineering-Team-TODO (201909): Update k18 password in phab diffusion - https://phabricator.wikimedia.org/T232661 (Halfak) Open→Resolved Looks like this is working for us. Thanks @MarcoAurelio et al.!
[18:33:34] ORES, Scoring-platform-team (Current): ORES deploy early Sept. 2019 - https://phabricator.wikimedia.org/T232660 (Halfak)
[19:40:13] o/ accraze
[19:40:17] Got some time to look at uwsgi
[19:40:18] ?
[19:41:43] yeah, let's do it!
[19:42:22] Cool. Call when ready.
[21:26:37] Deploying code to ores-web-04
[21:26:46] * halfak twiddles thumbs with great anticipation
[21:30:26] accraze, I get that session dropping issue on staging too.
[21:30:36] Eventually I get "connection aborted" so maybe unrelated.
[21:31:12] hmmm
[21:31:38] Aha! But it looks like I don't see any slowdowns when I re-create connections on the new ores-web-04
[21:31:46] So memory pressure must be related.
[21:31:51] so maybe 4GB memory is too low?
[21:31:57] How were we running 42 workers before without an issue!?
[21:32:18] Whatever. Let's just convert all the web workers to 8GB :)
[21:32:38] haha I think that's probably the best solution for now
[21:50:14] brb -- gettin a sandwich