[02:12:49] joakino: after adding section collapse on NR prototype, we forgot to hide the arrows in the print styles. in wikilater you can see the arrows next to the section names in pdf. minor fix
[17:58:37] nzr: joakino on vacatioonnnnn
[18:57:45] jdlrobson: oh yeah forgot
[20:26:38] jdlrobson: you on for the meeting later? Still see you as not responded
[22:13:40] https://etherpad.wikimedia.org/p/trend-prod
[22:42:29] niedzielski: hi, around ?
[22:42:52] matanya: o/ hey there
[22:43:31] niedzielski: I have a bug i don't know how to classify, have a sec to look ?
[22:43:55] matanya: sure!
[22:44:33] If you go to the hebrew wikipedia "in the news" card shows news about bolt losing his gold medal
[22:45:09] if you go to desktop version: v
[22:45:11] https://he.wikipedia.org/wiki/%D7%A2%D7%9E%D7%95%D7%93_%D7%A8%D7%90%D7%A9%D7%99
[22:45:42] then bolt is shown in the picture on the lower left part in the "in the news" section
[22:46:13] and the text says something along the lines "bolt (in the image) lost a gold medal"
[22:46:52] in the app the image is taken from an article about medals, and nulls the text - bolt is not in the image
[22:47:02] niedzielski: is this clear ?
[22:48:02] I guess the correct "fix" is not to say "in the image" but then how would a reader know it is bolt in the image on desktop ?
[22:48:14] (not ignoring you, just processing :] )
[22:48:47] matanya: ah, so you're getting at the parenthetical content?
[22:50:26] yes niedzielski
[22:51:12] matanya: i worked a little bit on processing some of these news pages. one of the issues was trying to identify the subject of a news story. for example, a story about the U.S. election might mention both Hillary Clinton and Donald Trump but the subject of the story is the election itself.
another issue, as you've identified, is how to eliminate parenthetical content that isn't wanted for feed presentation
[22:51:33] matanya: as far as i know, the data on these pages is well curated but not uniformly structured from one wiki to the next
[22:51:53] that is true
[22:52:26] matanya: i think the long term solution is to update the templates, for the short term we try to use heuristics when processing the data on each wiki
[22:53:29] matanya: i think the template would only need to add a CSS class to the parenthetical content to be stripped
[22:53:45] that would be nice
[22:54:14] matanya: the CSS would be unused by the desktop site but would provide enough structure for the Content Service backend to remove the extraneous information
[22:54:32] I like this solution
[22:58:59] matanya: yes, currently we go by whatever is the first link in (bold) tags. If there is none then it just takes the first link. coreyfloyd started a page about describing some of the markup: https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation#In_the_news
[22:59:26] thanks bearND
[22:59:39] matanya: here's the logic containing the news wiki page, the headline CSS selector, and the headline subject selector (for example, U.S. election): https://phabricator.wikimedia.org/diffusion/GMOA/browse/master/etc/feed/news-sites.js;2270a3810584e631f87eac533567f887ebfb03db$61
[22:59:53] er, as bearND said :]
[23:00:30] thank you both
[23:00:56] matanya: What you want to look at for your use case is the topic article URL. Note that this document is currently a draft but feedback is welcome.
[23:02:07] the topic article is not always what one links to, but i see your point
[23:02:29] matanya: yes any feedback would be helpful here!
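
Editor's sketch (not part of the log): the two ideas discussed above, stripping parenthetical content that a template has marked with a CSS class and picking the topic link as the first link inside bold, falling back to the first link, could look roughly like the Python below. This is only an illustration under stated assumptions, not the Content Service code. The class name `feed-exclude`, the helper `process_news_item`, and the sample markup are all hypothetical; the chat names no class, and the real selectors live in the linked news-sites.js.

```python
from html.parser import HTMLParser

# Hypothetical class name: the chat proposes "a CSS class" but does not name one.
STRIP_CLASS = "feed-exclude"

# Void elements never get a closing tag, so they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NewsItemParser(HTMLParser):
    """Collects visible text and candidate topic links from one news item."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.first_link = None       # first link anywhere in the item
        self.first_bold_link = None  # first link nested inside <b>
        self._bold_depth = 0
        self._skip_depth = 0         # > 0 while inside a stripped element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth += 1
            return
        attrs = dict(attrs)
        if STRIP_CLASS in (attrs.get("class") or "").split():
            self._skip_depth = 1
            return
        if tag == "b":
            self._bold_depth += 1
        elif tag == "a" and attrs.get("href"):
            if self.first_link is None:
                self.first_link = attrs["href"]
            if self._bold_depth and self.first_bold_link is None:
                self.first_bold_link = attrs["href"]

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
            return
        if tag == "b" and self._bold_depth:
            self._bold_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def process_news_item(html):
    """Return (text without stripped spans, topic link href or None)."""
    parser = NewsItemParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    topic = parser.first_bold_link or parser.first_link
    return text, topic


# Made-up markup resembling the Bolt example from the conversation.
sample = ('<li><a href="/wiki/Gold_medal">Gold medal</a> lost by '
          '<b><a href="/wiki/Usain_Bolt">Usain Bolt</a></b> '
          '<span class="feed-exclude">(pictured)</span>.</li>')
story_text, topic_href = process_news_item(sample)
```

With the sample above, the parenthetical span is dropped from the text and the bold link wins over the earlier plain link. As noted in the chat, the topic article is not always the right thing to link to, so a heuristic like this would still need per-wiki tuning.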