[00:02:17] maybe a simpler alternative is to convert to raw HTML, use a regex to figure out where cuts start/end, replace those parentheses with something very unique, convert back to DOM, and then you can discard in a single pass
[00:02:55] it's more work internally, but most of it is done by domino
[00:04:21] parentheses in attribs etc. would still mess up the identification of ranges, but the only failure would be too much / too little text removed
[00:08:21] tgr: I've noticed the description of the "Picture of the day" in zh-wiki only shows "<", is that expected? https://zh.wikipedia.org/api/rest_v1/feed/featured/2018/01/10
[00:12:32] cooltey: probably weird EXIF data in the image
[00:13:00] should be caught somewhere along the way though
[00:13:29] ideally in the MediaWiki metadata extraction logic
[00:19:14] I see. Should we fix it? 😮
[00:19:23] cooltey: either broken by https://gerrit.wikimedia.org/r/#/c/402151/ or a longtime bug that only surfaces with some weird image content
[00:19:52] wanna file a bug about it?
[00:20:56] OK! I will file a ticket, thanks!
[00:31:59] tgr: a dumb question: which tags should I add to the ticket? 😅
[00:33:48] mcs, since that's the service that fails; multimedia (or mw-file-management), since it's file-related
[00:35:49] tgr: thanks a lot!
[18:24:33] jdlrobson: why do we have such constructs in mobile CSS: `@media print and (max-device-width: 720px)`
[18:27:46] view-source:https://en.m.wikipedia.org/wiki/Jade_Terrapin_from_Allahabad
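
For reference, the marker idea from the 00:02:17 to 00:04:21 messages could look roughly like the sketch below. It is Python rather than the service's own code, and it does not use domino: a simple split into tags and text stands in for the "convert back to DOM" step. The marker strings and the function names `mark_cuts` and `strip_cuts` are hypothetical, and nested parentheses are not handled.

```python
import re

# Hypothetical markers: private-use characters that should not occur in page content.
CUT_START = "\uE000CUT-START\uE000"
CUT_END = "\uE000CUT-END\uE000"

# Naive, non-nested parenthetical; it may span tags in the raw HTML.
PAREN_RE = re.compile(r"\(([^()]*)\)")
TAG_RE = re.compile(r"(<[^>]*>)")
MARKER_RE = re.compile(f"({re.escape(CUT_START)}|{re.escape(CUT_END)})")


def mark_cuts(html: str) -> str:
    """Replace each pair of parentheses in the raw HTML with unique markers."""
    return PAREN_RE.sub(lambda m: CUT_START + m.group(1) + CUT_END, html)


def strip_cuts(marked: str) -> str:
    """Single pass over the marked HTML: drop text between markers, keep tags."""
    out = []
    cutting = False
    # With a capturing group, re.split puts tags at odd indexes and text at even ones.
    for i, token in enumerate(TAG_RE.split(marked)):
        if i % 2 == 1:
            # Markers that landed inside a tag (e.g. in an href) become parentheses again.
            out.append(token.replace(CUT_START, "(").replace(CUT_END, ")"))
            continue
        for piece in MARKER_RE.split(token):
            if piece == CUT_START:
                cutting = True
            elif piece == CUT_END:
                cutting = False
            elif not cutting:
                out.append(piece)
    return "".join(out)


if __name__ == "__main__":
    sample = "<p>Jade Terrapin (<i>from</i> Allahabad) is a sculpture.</p>"
    print(strip_cuts(mark_cuts(sample)))
```

Tags inside a cut are kept so the markup stays balanced, which leaves empty elements behind; a real DOM pass could remove those nodes instead. A parenthesis pair that straddles an attribute and text still produces the "too much / too little text removed" failure mentioned at 00:04:21.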