[22:32:40] mszabo: I'm still confused by this bug. It's not important that I be un-confused, but I'll ask my questions anyway in the hope of being a useful idiot.
[22:33:22] this looks to have been a bug in the PCRE JIT itself in the end that existed between 10.35 and 10.39
[22:33:39] We're seeing PHP timeouts with a particular regex, even though we set backtrack and recursion limits. I'm trying to understand why these limits aren't preventing the timeout. Is it possible the regex is making slow but steady progress, exhausting the PHP timeout before hitting the backtrack/recursion limits?
[22:34:34] or (as it sounds like) is this a bug in PCRE that causes its accounting to fail, leading to the limits not being enforced correctly?
[22:35:19] I believe the latter, because in bisecting, it shows up after https://github.com/PCRE2Project/pcre2/commit/21c40e638b902679bc4be88afb9f93f845dbf5b0 and goes away after https://github.com/PCRE2Project/pcre2/commit/dc5f96663597572f694147aeec3525003c35112
[22:36:11] but I doubt our backtrack and recursion limits would save us anyways
[22:36:30] <3 those detailed commit messages
[22:39:50] he's a university prof :)
[22:43:07] if once this bug is fixed we go back to a world where the backtrack and recursion limits effectively bound the execution time of bad regexes such that they don't tie up apache workers for minutes, is re2 still a compelling proposition, given that it's not 100% pcre-compatible and would require carrying a custom php extension?
[22:44:03] it makes sense for Google to burn down PCRE because Google is addicted to threads and statefulness. But the PHP execution model makes it less of an issue
[22:45:48] re2's future is a bit uncertain btw since its maintainer died last year (former teammate of mine and a really great person, sadly :()
[22:48:31] Wow that's awful, I was wondering what happened since I saw he stopped committing
[22:48:59] regarding pcre limits, one issue I see is that preg_match() returns 0 if there was no match, and false if an error occurred, which includes the case where the backtrack limit was hit
[22:49:09] most call sites just treat this as a "not matched" situation: https://codesearch.wmcloud.org/search/?q=preg_match%5C%28&files=%5C.php%24&excludeFiles=&repos=
[22:49:21] which is a tad scary
[22:50:14] I think there's not much value switching *everything* to re2 since regexes under our control are relatively easy to troubleshoot/adjust as needed, but an alternative engine may be warranted for user-driven regex evaluation (SB, AF, TB etc)
[22:56:33] yeah maybe
[22:59:07] missing preg_match() error-checking could perhaps be addressed with a phpcs check (or maybe by having a wrapper function that returns some php/mw equivalent of std::expected or rust's Result).
[23:01:15] yeah I suppose we have PHP to thank for making every preg_match() call require a corresponding if ( $result === false ) block :)
[23:01:43] then again I suppose for most regex this problem would be purely academic
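
For illustration only, here is a minimal sketch of the wrapper idea floated at 22:59:07. It assumes nothing beyond PHP's built-in preg_match() and preg_last_error(). The names `RegexResult` and `safeMatch` are hypothetical and are not existing PHP or MediaWiki APIs. The point is only that "no match" (0) and "error" (false) end up in separate fields, so a caller cannot conflate them by accident.

```php
<?php

/**
 * Hypothetical result type (not part of PHP or MediaWiki): keeps
 * "no match" and "engine error" apart instead of collapsing both to falsy.
 */
final class RegexResult
{
    /** @var bool True if preg_match() completed without an error. */
    public $ok;
    /** @var bool True if the pattern matched (only meaningful when $ok). */
    public $matched;
    /** @var array Captured groups; empty on no match or on error. */
    public $groups;
    /** @var int Value of preg_last_error(), e.g. PREG_BACKTRACK_LIMIT_ERROR. */
    public $errorCode;

    public function __construct(bool $ok, bool $matched, array $groups, int $errorCode)
    {
        $this->ok = $ok;
        $this->matched = $matched;
        $this->groups = $groups;
        $this->errorCode = $errorCode;
    }
}

/** Hypothetical helper: run preg_match() and report errors explicitly. */
function safeMatch(string $pattern, string $subject): RegexResult
{
    $groups = [];
    // @ hides the E_WARNING PHP raises for an uncompilable pattern; the
    // failure is still visible through the false return value.
    $ret = @preg_match($pattern, $subject, $groups);
    if ($ret === false) {
        return new RegexResult(false, false, [], preg_last_error());
    }
    return new RegexResult(true, $ret === 1, $groups, PREG_NO_ERROR);
}

// Usage: the caller has to look at $ok before trusting $matched.
$r = safeMatch('/^(\d+)\.(\d+)$/', '10.39');
if (!$r->ok) {
    echo "regex error, code {$r->errorCode}\n";
} elseif ($r->matched) {
    echo "minor version: {$r->groups[2]}\n";
} else {
    echo "no match\n";
}
```

With a wrapper along these lines, a pattern that runs into pcre.backtrack_limit would be expected to come back with `ok === false` and `errorCode === PREG_BACKTRACK_LIMIT_ERROR` rather than looking like an ordinary non-match. Whether the limit is actually enforced still depends on the PCRE build, which is the bug discussed above.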