[01:52:01] TimStarling: Meeting?
[01:52:24] just a second
[12:53:29] well
[12:53:38] guess it's time to go back
[12:54:08] hang on
[12:54:26] we already have +zq $~a here
[12:54:31] but the spammer tried anyway and I saw because +o
[12:54:31] ok
[15:07:06] > An example is article_text which is now page_title.
[15:07:17] Massive improvement. Glad that finally got out to prod.
[15:07:46] article->getTitle->get(Prefixed)?Text() vs page.page_title
[18:00:31] DanielK_WMDE_: so regarding https://phabricator.wikimedia.org/T203566 - if the plan is to let the parser cache roll over, I'm just curious whether that would affect third parties. In other words, are all three errors specific to wmf.19>wmf.20>wmf.19? I thought it was just in general about post-wmf.20 reading from pre-wmf.20, which would affect all wikis.
[18:00:39] e.g. in 1.31 to 1.32
[18:07:55] Krinkle: anomie has a better understanding of the details than I do. My understanding is that this only affects "ParserOutput objects that were created under wmf.20 and then re-serialized under wmf.19", and we still don't know when and how that even happened.
[18:08:27] The when is easy: wmf.20 produced a lot of errors, so we rolled back, as we do quite often.
[18:09:06] That, and the fact that we have multiple servers, is why code must always be backward compatible, even if it's just about the 30 seconds during deployments.
[18:09:09] how would that cause an object to be deserialized with one version, and be re-serialized by another?
[18:09:16] ^
[18:09:22] i can see it being serialized by one, and then de-serialized by another, sure.
[18:09:25] but that's not the same thing
[18:09:55] I don't know under what circumstances ParserOutput is fetched, mutated *and* saved.
[18:09:59] that seems odd indeed.
[18:10:15] But it's only the phab comment that claims that. I don't know if that's the case.
[18:10:16] maybe the analysis is off
[18:10:24] Afaik any object from wmf.19 read by wmf.20 would have these errors
[18:10:27] I don't see why it wouldn't.
[18:10:37] apparently they did not.
[18:10:49] and afaik those errors are the reason we rolled back in the first place.
[18:10:52] only relatively few cached POs had the problem
[18:10:57] not ALL
[18:11:24] Actually, the errors were from wmf.20 objects being read by wmf.19 during the deploy.
[18:11:34] Which made us roll back (among other errors)
[18:12:03] that situation is rare for 3rd parties
[18:12:10] does anyone except us run multiversion?
[18:12:24] I think some of the wikifarms do
[18:12:28] (not wikia)
[18:12:56] i have to leave the office now to find food and be back online in time for techcom
[18:13:42] my understanding was that the new code would de-serialize old objects ok, initializing the new field with the default value.
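The backward-compatibility pattern described just above ("new code deserializes old objects ok, initializing the new field with the default value") can be illustrated with plain PHP object serialization. The class, field names, and serialized blob below are hypothetical stand-ins, not the actual ParserOutput code; they only show how `__wakeup()` lets newer code read blobs written before a field existed.

```php
<?php
// Hypothetical class standing in for any value stored in the parser cache.
// NOT the actual ParserOutput code; it only illustrates the pattern.
class CachedRenderOutput {
	/** @var string */
	private $html;

	/** @var string[] Field added in the "new" version; absent from old blobs. */
	private $extraModules;

	public function __construct( string $html, array $extraModules = [] ) {
		$this->html = $html;
		$this->extraModules = $extraModules;
	}

	/**
	 * unserialize() leaves properties missing from an old blob at their
	 * declared default (null here) and then calls __wakeup(), which is
	 * where the backward-compatible default gets filled in.
	 */
	public function __wakeup() {
		if ( $this->extraModules === null ) {
			$this->extraModules = [];
		}
	}

	public function getExtraModules(): array {
		return $this->extraModules;
	}
}

// A hand-crafted "old" blob that only knows about the html property:
$oldBlob = 'O:18:"CachedRenderOutput":1:{'
	. 's:24:"' . "\0" . 'CachedRenderOutput' . "\0" . 'html";s:3:"<p>";'
	. '}';

$restored = unserialize( $oldBlob );
var_dump( $restored->getExtraModules() ); // array(0) {}  (default applied)
```

The reverse direction, old code reading a blob that carries fields it has never heard of, is what the rest of the discussion turns on, and it is not covered by this pattern.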
[18:13:59] if we can assume that, 3rd parties should be largely unaffected
[18:14:23] DanielK_WMDE_: I'm not thinking about multiversion
[18:14:38] I'm saying, if 1.32 has a problem reading 1.31 parser cache, that's a release blocker.
[18:15:17] It is not yet clear to me that none of the problems were triggered by wmf.20 reading wmf.19 PO objects.
[18:15:29] sure. as far as i know, this is not the case. but you are right, we should double-check that.
[18:15:58] i'm mentioning multiversion because you said earlier that wmf.20 objects were being read by wmf.19. that only happens when you have multiple versions running, or if you roll back
[18:16:58] DanielK_WMDE_: I agree, but disagree on semantics. Anyone with more than 1 server will have multiple versions running. It's not related to multiversion (which is "only" about doing it for a prolonged period of time for specific wikis)
[18:17:37] It's a core design principle of MW for at least a few years now that the cache be guarded against multiple versions of itself running for unknown amounts of time. For WMF primarily, but also more generally per ^.
[18:18:02] During upgrades / during deployments.
[18:18:25] Krinkle: ok, so old code reading new cache entries may cause errors, but that should only happen during roll-out. that's not nice, but not a blocker imho. new code erroring on all old cache entries would definitely be a blocker
[18:18:41] While it may be acceptable to have issues during the upgrade, the tricky issue with an unversioned cache is that the problem can persist if the last server to touch the key during the flip-flop battle is one with the "old" version.
[18:19:32] guarding against this by putting the version into the cache key is nice in theory, but kills the parser cache on every deployment :)
[18:19:52] E.g. page X, servers 1 and 2, version A on both. Deploy version B everywhere; it hits srv1, which updates the key for page X based on a user request. Then a user request hits srv2 and changes the key back to version A.
[18:20:02] i thought about this a bit https://phabricator.wikimedia.org/T203781
[18:20:02] Then the deploy hits srv2, but no one edits page X for a while.
[18:20:06] now the cache is broken until purged.
[18:20:22] DanielK_WMDE_: breaking changes to the cache format need to be treated as breaking changes.
[18:20:42] I'm not saying put the git hash in the cache key, but it does need a cache version that can be bumped if it is no longer compatible.
[18:20:46] Alternatively, don't make breaking changes :)
[18:21:08] it does, it's Parser::VERSION. We could add ParserOutput::VERSION.
[18:21:19] both have the same problem: we kill the app servers if we bump them
[18:21:23] Potentially with readers for multiple versions.
[18:21:29] in code, if that's what we want.
[18:21:57] that would be ideal, with hand-written serialization to json
[18:21:59] Anyway, there are plenty of obvious options that we both understand. But what obviously doesn't work is making a breaking change, without support for the old format, and not purging the cache, and letting it break for old data.
[18:22:01] but we don't have that
[18:22:17] Anything other than that is fine :)
[18:22:33] of course, but to my knowledge there *is* support for the old format.
[18:22:43] so the code is backwards compatible.
[18:22:51] I agree. That's my understanding as well.
[18:23:05] but you are raising the issue of forward compatibility, which is apparently a problem
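The trade-off discussed above (version in the cache key kills the parser cache on every deployment; an unversioned cache can flip-flop during a deploy) can also be handled by versioning the value rather than the key. The helper below is a hypothetical sketch, not MediaWiki's actual ParserCache logic; the callables stand in for whatever cache backend and rendering code are in use.

```php
<?php
// Sketch of "version the value, not the key", so a format bump does not
// cold-start the whole cache, and a reader that meets an unknown format
// treats it as a miss instead of an error.

const CACHE_FORMAT_VERSION = 2; // bump on breaking changes to the stored shape

/**
 * @param string   $key
 * @param callable $fetch  function ( string $key ): ?string  raw cache read
 * @param callable $store  function ( string $key, string $blob ): void
 * @param callable $render function (): array  regenerates the value on miss
 * @return array
 */
function getWithFormatGuard( string $key, callable $fetch, callable $store, callable $render ): array {
	$blob = $fetch( $key );
	if ( $blob !== null ) {
		$envelope = json_decode( $blob, true );
		// Only trust entries whose format this code knows how to read;
		// anything newer (or corrupt) falls through to regeneration, so
		// old and new servers can coexist during a deploy.
		if ( is_array( $envelope )
			&& ( $envelope['formatVersion'] ?? -1 ) <= CACHE_FORMAT_VERSION
			&& isset( $envelope['value'] )
		) {
			return $envelope['value'];
		}
	}

	$value = $render();
	$store( $key, json_encode( [
		'formatVersion' => CACHE_FORMAT_VERSION,
		'value' => $value,
	] ) );
	return $value;
}
```

The residual gap is the same flip-flop described above: a server still on the old code regenerates and re-stores in its own, older format, so the newer code has to keep a reader for at least the previous format — which is where the "readers for multiple versions" and hand-written JSON serialization ideas come in.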
[18:23:06] So we need to figure out why it is that old objects were re-saved with an incompatible class interpretation
[18:23:21] Yeah, so there are two issues, potentially.
[18:23:39] Old servers need to ignore new cache values or somehow have support for them.
[18:23:40] potentially, yes
[18:23:45] That's 1.
[18:23:55] and 2: figure out why old objects were broken by re-saving.
[18:24:17] yea. no idea how to put code into an old version :)
[18:24:23] Assuming that old stuff is not broken by re-saves, or that we never do re-saves, I think we can mostly tolerate issue 1 if it's just during deploys.
[18:24:30] so i have no idea how to address no. 1
[18:24:43] That's fine I think.
[18:24:53] right
[18:24:57] Main concern is why it is broken, and it is still broken in prod today last I checked, hence UBN :)
[18:25:25] afaik it's still broken because there are still some "bad" entries in the cache that suffer from the re-serialization bug
[18:25:34] hasn't it been 22 days yet?
[18:25:55] anyway, gotta go, or i'll be late for techcom
[18:26:18] It's been 20.5 days, I think.
[20:01:55] Noob question: I added some logging to Special:Export, and I can see log entries on mwlog1001.eqiad.wmnet in file /srv/mw-log/export.log as expected. But I can't find them in logstash. Should I be able to, or am I just searching incorrectly in kibana?
[20:02:06] Example log entry from export.log: 2018-09-26 20:01:27 [W6vllwrAIDsAAGvNdSMAAAAX] mw2171 commonswiki 1.32.0-wmf.23 export DEBUG: Special:Export POST, dir: [], offset: [], limit: [] {"dir":null,"offset":null,"limit":null}
[20:03:32] 'export' => 'debug',
[20:04:07] That should be enough...
[20:05:25] bpirkle: Are you looking in prod?
[20:07:23] hrm, I'm not sure I know. I'm hitting https://logstash.wikimedia.org and searching by various criteria, but I could be trying this in entirely the wrong place.
[20:09:31] bpirkle: at https://logstash.wikimedia.org/app/kibana#/dashboard/mediawiki-errors with channel:Export, I'm indeed also unable to make anything appear.
[20:09:57] I was ultimately able to make something appear by using X-Wikimedia-Debug and setting [x] Log on, which overrides the wmf-config per-channel settings and lowers all thresholds to 'debug'.
[20:10:03] export.log shows there are a lot of entries
[20:10:15] But that shouldn't have made a difference for this one, given it is already using debug() and including debug+
[20:10:42] So, yeah, seems like the config isn't working.
[20:11:51] That makes me both happy and sad. At least I wasn't missing something terribly obvious in kibana.
[20:12:19] export.log will satisfy my immediate needs, but I'm curious what went astray
[20:15:54] bpirkle: "type:mediawiki AND channel:export"
[20:16:03] gives 2 results
[20:21:47] Are messages being lost/dropped?
[20:21:59] Reedy: A "last 24 hours" search of "channel:export" finds (only) those two entries, but export.log has quite a lot of entries. Should I not expect those to correspond?
[20:22:18] They should
[20:40:32] mobrovac: got a link about these retry/read-only changes?
[20:40:50] yup, sec
[20:41:33] Krinkle: T204154
[20:41:34] T204154: Kafka JobQueue should respect DB readonly mode - https://phabricator.wikimedia.org/T204154
[21:01:28] The new Meet interface keeps popping up notifications in the bottom left corner, overlapping the buttons to exit the call
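On the export.log-versus-Logstash question above: the symptom (entries in the file log but almost nothing in Kibana) is what you get when the two sinks apply different minimum levels to the same channel. The sketch below uses plain Monolog, which MediaWiki's production logging is built on, to illustrate the effect; the file paths, thresholds, and handler choices are assumptions for illustration, not the actual wmf-config wiring.

```php
<?php
// Two handlers on one 'export' channel with different minimum levels:
// debug() entries reach the file-style sink but never the Logstash-style one.

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

require __DIR__ . '/vendor/autoload.php';

$logger = new Logger( 'export' );

// Stand-in for the file sink behind /srv/mw-log/export.log:
// accepts everything from DEBUG up.
$logger->pushHandler( new StreamHandler( '/tmp/export.log', Logger::DEBUG ) );

// Stand-in for a Logstash-bound sink with a higher threshold: debug()
// calls never reach it even though they land in the file above.
$logger->pushHandler( new StreamHandler( '/tmp/logstash-feed.log', Logger::WARNING ) );

$logger->debug( 'Special:Export POST', [ 'dir' => null, 'offset' => null, 'limit' => null ] );
// -> written to /tmp/export.log only
```

If something like this is what is happening, the fix would be in the per-channel logging configuration rather than in the Kibana query.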