[00:33:07] PROBLEM - ORES web node labs ores-web-06 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/ORES
[00:33:59] Scoring-platform-team, editquality-modeling, Growth-Team (Current Sprint), User-Tgr, artificial-intelligence: Update ORES filter thresholds for huwiki - https://phabricator.wikimedia.org/T230031 (Etonkovidova) >>! In T230031#5519143, @Tgr wrote: > @MMiller_WMF not anything in particular, I ju...
[00:34:03] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.029 second response time https://wikitech.wikimedia.org/wiki/ORES
[00:34:11] PROBLEM - ORES web node labs ores-web-04 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.024 second response time https://wikitech.wikimedia.org/wiki/ORES
[00:34:27] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/ORES
[00:38:13] (Merged) jenkins-bot: Make onSpecialContributionsGetFormFilters follow the OOUI mode [extensions/ORES] - https://gerrit.wikimedia.org/r/499791 (https://phabricator.wikimedia.org/T219238) (owner: Ladsgroup)
[00:42:03] (CR) jenkins-bot: Make onSpecialContributionsGetFormFilters follow the OOUI mode [extensions/ORES] - https://gerrit.wikimedia.org/r/499791 (https://phabricator.wikimedia.org/T219238) (owner: Ladsgroup)
[01:11:35] RECOVERY - ORES web node labs ores-web-04 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.090 second response time https://wikitech.wikimedia.org/wiki/ORES
[01:16:29] PROBLEM - ORES web node labs ores-web-04 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.038 second response time https://wikitech.wikimedia.org/wiki/ORES
[01:41:07] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/ORES
[01:45:59] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.033 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:26:04] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.210 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:26:28] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.039 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:27:14] RECOVERY - ORES web node labs ores-web-04 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.061 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:27:54] RECOVERY - ORES web node labs ores-web-06 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.271 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:40:08] PROBLEM - ORES web node labs ores-web-04 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.037 second response time https://wikitech.wikimedia.org/wiki/ORES
[06:41:42] PROBLEM - ORES worker labs on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.035 second response time https://wikitech.wikimedia.org/wiki/ORES
[07:03:30] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.032 second response time https://wikitech.wikimedia.org/wiki/ORES
[07:05:28] PROBLEM - ORES web node labs ores-web-06 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.059 second response time https://wikitech.wikimedia.org/wiki/ORES
[11:11:07] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 1009 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/ORES
[11:15:25] PROBLEM - ORES web node labs ores-web-05 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.131 second response time https://wikitech.wikimedia.org/wiki/ORES
[13:11:46] RECOVERY - ORES web node labs ores-web-06 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 977 bytes in 0.058 second response time https://wikitech.wikimedia.org/wiki/ORES
[13:16:38] PROBLEM - ORES web node labs ores-web-06 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.050 second response time https://wikitech.wikimedia.org/wiki/ORES
[14:02:51] Technical Advice IRC meeting starting in 60 minutes in channel #wikimedia-tech, hosts: @Lucas_WMDE & @mutante - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[14:52:48] Technical Advice IRC meeting starting in 10 minutes in channel #wikimedia-tech, hosts: @amir1 & @Lucas_WMDE - all questions welcome, more infos: https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
[15:04:34] I will let you know when I see halfak and I will deliver that message to them
[15:04:34] @notify halfak Ores-Redis02.experimental seems to be running/ is out of disk space per icinga2
[15:04:35] Error: Command “notify” not recognized. Please review and correct what you’ve written.
[15:11:55] RECOVERY - ORES web node labs ores-web-06 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.261 second response time https://wikitech.wikimedia.org/wiki/ORES
[15:26:56] Scoring-platform-team, Data-Services, Wikilabels, cloud-services-team (Kanban): postgresql on clouddb1002 needs some kind of puppet management of pg_hba.conf - https://phabricator.wikimedia.org/T209396 (Bstorm) a:Bstorm→None
[15:27:56] Scoring-platform-team, Data-Services, Wikilabels, Patch-For-Review, cloud-services-team (Kanban): clouddb1002 low on space -- move wikilabelsdb - https://phabricator.wikimedia.org/T224062 (Bstorm)
[15:35:46] PROBLEM - ORES web node labs ores-web-06 on ores.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 INTERNAL SERVER ERROR - 1968 bytes in 0.051 second response time https://wikitech.wikimedia.org/wiki/ORES
[16:19:47] Async Standup:
[16:20:10] Y: More Jade stuff: About ~75% done with the ProposeOrEndorse api module, worked some more on GetLabels module.
[16:20:10] T: Finish up ProposeOrEndorse and hopefully GetLabels. Increase test coverage. Investigate inconsistent user lookup handling.
[17:08:07] halfak: looks like you're away today
[17:08:21] Hey! Yes. I'm doing meetings all week.
[17:08:29] i'll be traveling to Chicago tomorrow through Sunday
[17:08:32] let's talk soon
[17:08:42] I started a new outline / draft in overleaf i'll invite you
[17:08:51] still have a few data issues to work out
[17:09:05] it would be convenient if recovering the older thresholds was doable
[17:09:14] Great. I'll be back to normal on Monday.
[17:10:04] in the recall_at_precision(min_precision=0.15)
[17:10:08] format
[17:10:35] if you can get some cycles to look into that it would be great
[17:10:46] I should be able to stay occupied without it though
[17:11:21] but like for some of the early adopters the cutoff we want to use has that format
[17:11:25] OK I can look into that. Can you get me some specifics? E.g. which models, which time periods?
[17:11:42] yeah
[17:12:02] i'll send you a csv shortly
[17:12:28] with the specific dates and thresholds
[17:31:46] Cool. I'll do my best. Probably on Monday :)
[17:41:40] ORES, Scoring-platform-team: ores-redis-02 is out of disk space. AOF file is too big - https://phabricator.wikimedia.org/T233831 (Halfak)
[17:46:07] ORES, Scoring-platform-team: ores-redis-02 is out of disk space. AOF file is too big - https://phabricator.wikimedia.org/T233831 (Halfak) It looks like we can fix this issue in redis config. Why is this only a problem now? Who knows.
[17:46:38] ORES, Scoring-platform-team: ores-redis-02 is out of disk space. AOF file is too big - https://phabricator.wikimedia.org/T233831 (Halfak) Thanks to @zppix for reporting. @ACraze, if this sounds exciting, you might take a look.
[18:00:31] ORES, Scoring-platform-team: ores-redis-02 is out of disk space. AOF file is too big - https://phabricator.wikimedia.org/T233831 (Halfak) We probably need to make some changes here: https://github.com/wikimedia/puppet/blob/b347052863d4d2e87b37d6c2d9f44f833cfd9dc2/modules/role/manifests/labs/ores/redis.p...
[18:05:41] ORES, Scoring-platform-team: ores-redis-02 is out of disk space. AOF file is too big - https://phabricator.wikimedia.org/T233831 (Halfak) Looks like we already have: `auto-aof-rewrite-min-size 512mb`
[18:11:13] ORES, Scoring-platform-team: ores-redis-02 is out of disk space. AOF file is too big - https://phabricator.wikimedia.org/T233831 (Halfak) @akosiaris, do you have experience with trimming the aof files and limiting their size? I'm starting to realize that I'm getting in deep here and a big reason we cho...
[18:11:52] RECOVERY - ORES web node labs ores-web-05 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 981 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:14:00] RECOVERY - ORES web node labs ores-web-06 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 939 bytes in 0.075 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:14:50] RECOVERY - ORES web node labs ores-web-04 on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 979 bytes in 0.080 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:15:02] RECOVERY - ORES worker labs on ores.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 977 bytes in 0.049 second response time https://wikitech.wikimedia.org/wiki/ORES
[18:20:39] OK we're back.
[18:20:43] Good enough for now.
[18:21:01] I left a note for akosiaris that we should follow up on.
[18:21:22] (sorry for that ping ako.siaris. Hopefully you're offline by now. Nothing urgent)
[18:57:16] ORES, Scoring-platform-team, WMDE-Analytics-Engineering, Wikidata, User-GoranSMilovanovic: track quality of all/top 10000 Wikidata items over time - https://phabricator.wikimedia.org/T195702 (Halfak) @abian, ORES models directly a measure "completeness". However, it turns out that accuracy a...
[19:25:11] Scoring-platform-team, editquality-modeling, artificial-intelligence: Why is jawiki's goodfaith model so bad? - https://phabricator.wikimedia.org/T230953 (Halfak) As a first pass, I'd love to just have someone try to use what we have deployed and tell us what it tends to get wrong. The best way to d...
[20:31:34] * accraze goes outside for a bit