[03:19:51] inflatador: codfw and eqiad done
[08:05:54] Caused by: java.time.format.DateTimeParseException: Text '2025-09-05T07:51:31.430+0000' could not be parsed, unparsed text found at index 23
[08:07:00] pfischer: the image_suggestion dag started to push tags but I think they ship a date format that is not supported
[08:49:57] gehel: if you have a moment https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/1185053 the SUP is currently blocked because of this
[10:02:36] lunch
[10:41:55] self merging
[11:10:29] sigh... Execution default of goal com.diffplug.spotless:spotless-maven-plugin:2.43.0:check failed: No such reference 'origin/main'
[11:10:33] https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/jobs/606770
[11:20:22] disabled for now in order to ship a fix for the search update pipeline
[12:11:30] the sup CI is kind of flaky failing with timeouts fetching devs from debian.org
[12:11:38] s/devs/debs
[12:23:04] the sup@eqiad was manually set to upgradeMode: stateless, not sure why, resetting to savepoint
[12:26:44] ok things are getting worse... now class org.apache.flink.api.java.typeutils.runtime.RowSerializer cannot be cast to class org.wikimedia.eventutilities.flink.EventRowSerializer (org.apache.flink.api.java.typeutils.runtime.RowSerializer is in unnamed module of loader 'app'; org.wikimedia.eventutilities.flink.EventRowSerializer is in unnamed module of loader
[12:26:46] org.apache.flink.util.ChildFirstClassLoader @516a6ef6)
[12:39:39] dcausse: scream if you need help!
[12:41:00] very puzzled by what's going on, we should not have RowSerializer in the state...
[12:42:46] I'm not going to be able to help much. Should you ask on Slack if someone can add 2 eyes and a brain to the discussion?
[12:43:38] sure, I will once I know what to ask :/
[12:44:45] :)
[12:44:57] by that time, you'll have the problem solved :)
[12:57:29] I feel that the state is borked...
[12:58:35] from a fresh state this works... going to assume that we were not doing the right thing before?
[13:08:36] Anyone up for a review of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/1181781 ? The code is pretty simple.. the secret sauce, such as it is, is in the data.
[13:10:41] Trey314159: sure, I'll take a look on monday, was waiting for Peter to take a look and forgot about it :/
[13:11:04] ahh.. makes sense.. thanks!
[13:17:08] I'm verifying that we applied all the plugins from https://gitlab.wikimedia.org/repos/search-platform/opensearch-plugins-deb/-/merge_requests/7/diffs , does `curl -s $EP/_cat/plugins | grep opensearch-extra-analysis-khmer | grep -v wmf6` seem like a reasonable test?
[13:32:54] we're getting lots of `CirrusSearchNodeIndexingNotIncreasing` alerts in #data-platform-alerts, but then I don't see them in the alerts page. hmm
[13:33:04] inflatador: yes I'm on it
[13:33:32] dcausse ACK, ping me if I can help w/anything
[13:41:53] \o
[13:43:50] o/
[13:48:09] .o/
[13:52:12] destroyed the producer release and re-shipping
[13:52:19] sounds reasonable
[13:53:17] possibly what happened is when we upgraded to flink 1.20 we forgot to update eventutilities to 1.4 (which has this: https://gerrit.wikimedia.org/r/c/wikimedia-event-utilities/+/1079506/11/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/EventRowTypeInfo.java)
[13:54:06] a new function to create the serializer and could have caused a fallback to RowSerializer instead of EventRowSerializer in the state
[13:54:21] use the createSerializer on the parent class
[13:54:52] this is kind of fragile... EventRowSerializer should probably not extend RowSerializer
[13:55:04] well... testing now, perhaps this is not that
[13:56:16] trying a restart to see if it can recover from its state
[13:56:25] sounds fragile indeed :S
[13:56:32] yes :(
[14:04:39] hmm, turns out can't convince phab paste to render a table :S
[14:13:31] a few examples of comparing query suggesters, it's very not obvious from here so i guess i need to boil these down into numbers: https://phabricator.wikimedia.org/T403826
[14:13:57] we still do unfortunate rewrites like "motogp records all categories" -> "motown records all categories"
[14:16:43] and now it fails on restarts because it can't find its savepoints...
[14:16:59] :S
[14:25:22] it's not storing anything...
[15:00:11] it deletes its own savepoint on restart...
[15:01:41] INFO","message":"Disposing savepoint s3://cirrus-streaming-updater.wikikube-eqiad/producer/savepoints/savepoint-ce49a0-d83d9e7ac011."
[15:04:10] hmm, that's odd
[15:04:19] i don't understand why :S
[15:06:40] and that's after a full reset, which shouldn't need a savepoint?
[15:06:43] sigh https://issues.apache.org/jira/browse/FLINK-38033
[15:07:18] exactly what I'm seeing
[15:07:28] that does look to match :S
[15:07:41] so, downgrade to 1.11? Does that force the not-us projects to downgrade as well?
[15:08:40] or upgrade to 1.12.1
[15:09:16] but yes the current state is pretty bad... anyone deploying a new version might hit this bug
[15:09:23] :/
[15:09:50] not sure what to do...
[15:10:42] I can restart the sup and wait for monday to deploy a fixed version or 1.11
[15:10:43] hmm, it's the operator itself that needs to change? Backwards seems safest, but not sure what the implications are
[15:10:54] if it can run for the weekend, next week is certainly better than friday
[15:11:00] ok
[15:11:16] going to resume the SUP and file a task for monday
[15:11:46] +1
[15:11:46] could be pretty bad for the wdqs-updater... for search we can drop the state more easily
[15:16:34] should we warn gmodena- or anyone else who uses Flink?
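Note on the 13:53-13:55 hypothesis: the idea floated there is that a framework starts calling a newly added "create serializer" method, a subclass only overrides the older one, and the call silently lands on the parent implementation. The sketch below models that fallback in plain Python. It is only an illustration of the mechanism, not the Flink or eventutilities code; all method and class names other than RowSerializer and EventRowSerializer are hypothetical, and the log itself leaves open whether this was the actual cause ("perhaps this is not that").

```python
class RowSerializer:
    """Stand-in for the generic serializer of the parent type info."""


class EventRowSerializer(RowSerializer):
    """Stand-in for the specialised serializer; it extends the generic one."""


class RowTypeInfo:
    """Parent type info exposing an old and a new factory method (hypothetical names)."""

    def create_serializer_legacy(self, execution_config):
        return RowSerializer()

    def create_serializer(self, serializer_config):
        # New entry point added by a framework upgrade.
        return RowSerializer()


class OldEventRowTypeInfo(RowTypeInfo):
    """Written before the upgrade: overrides only the legacy method."""

    def create_serializer_legacy(self, execution_config):
        return EventRowSerializer()


class FixedEventRowTypeInfo(OldEventRowTypeInfo):
    """Also overrides the new entry point."""

    def create_serializer(self, serializer_config):
        return EventRowSerializer()


def framework_serializer(type_info):
    """The upgraded framework only calls the new entry point."""
    return type_info.create_serializer(serializer_config=None)


old = framework_serializer(OldEventRowTypeInfo())
fixed = framework_serializer(FixedEventRowTypeInfo())
print(type(old).__name__)    # generic serializer: the override was bypassed
print(type(fixed).__name__)  # specialised serializer
```

Because the specialised class extends the generic one, nothing fails at the point where the wrong serializer is created; the mismatch only surfaces later as a cast error, which matches the "this is kind of fragile" remark at 13:54:52.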
[15:20:40] sure
[15:20:56] filing a ticket and will warn them on slack
[15:50:41] workout, back in ~40
[15:58:59] calling it a day, have a nice week-end
[16:24:57] extended regex seems to be working, can search for `insource:/\n\n\n/` and indeed it finds pages with 3 sequential newlines
[16:25:58] highlighting is meh, but what can you expect when searching for unprintable chars :P
[16:33:42] back
[16:53:49] lunch, back in ~1h
[21:01:11] ryankemper you have anything for pairing? I've just been looking at the OpenSearch on K8s stuff
[21:03:01] inflatador: mostly just tidying up gerrit patches. back home in 30, ping me if you’ve anything that needs to be looked at
[21:07:36] ryankemper np, I think we're good 'till next week. Enjoy the weekend!
[21:07:51] likewise!
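Note on the 13:17 plugin check: the `grep ... | grep -v wmf6` pipeline prints only the khmer plugin lines that lack `wmf6`, so empty output is the passing case. Empty output is also what you get if the plugin is not installed on a node at all, or if curl returned nothing, so the check cannot tell those apart. A rough sketch of a stricter check follows, assuming `_cat/plugins` returns whitespace-separated node, plugin and version columns; the helper name, node names and version strings are made up for illustration.

```python
def check_plugin(cat_plugins_text, plugin, expected_marker, expected_nodes):
    """Classify nodes by the state of one plugin.

    Returns three sorted lists: nodes running a version containing
    expected_marker, nodes running some other version, and expected
    nodes on which the plugin was not listed at all.
    """
    ok, stale = set(), set()
    for line in cat_plugins_text.splitlines():
        fields = line.split()
        # Assumed column order: node name, plugin name, version.
        if len(fields) < 3 or fields[1] != plugin:
            continue
        node, version = fields[0], fields[2]
        if expected_marker in version:
            ok.add(node)
        else:
            stale.add(node)
    missing = set(expected_nodes) - ok - stale
    return sorted(ok), sorted(stale), sorted(missing)


# Hypothetical sample output; real node names and versions will differ.
SAMPLE = """\
node-a opensearch-extra-analysis-khmer 1.3.20-wmf6
node-b opensearch-extra-analysis-khmer 1.3.20-wmf5
node-a opensearch-extra 1.3.20-wmf6
node-c opensearch-extra 1.3.20-wmf6
"""

ok, stale, missing = check_plugin(
    SAMPLE,
    plugin="opensearch-extra-analysis-khmer",
    expected_marker="wmf6",
    expected_nodes=["node-a", "node-b", "node-c"],
)
print("ok:", ok)
print("stale:", stale)
print("missing:", missing)
```

With the sample above, the grep pipeline would flag node-b but stay silent about node-c, where the plugin is absent; passing an explicit list of expected nodes is what makes that case visible.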