[06:29:29] <elukey> hello folks!
[06:29:42] <elukey> an-airflow1001 is having some problems with logs again
[06:30:04] <elukey> I can drop old logs if nobody is around, otherwise I can wait
[06:30:07] <elukey> dcausse: --^
[06:52:41] <elukey> dropped :)
[06:59:21] <dcausse> elukey: thanks :)
[09:41:20] <zpapierski> dcausse: do you remember where can I get media entity namespace, or perhaphs remember what it is?
[09:44:21] <dcausse> zpapierski: mediainfo entities do not have their own namespace, they are part of a "revision slot" (MCR: Multi content revision)
[09:44:44] <dcausse> should be the FILE namespace
[09:44:54] <zpapierski> hmm, I know little too little about MCR
[09:44:55] <dcausse> for the entities we care about
[09:44:55] <zpapierski> thanks
[09:45:50] <zpapierski> so, I assume 6
[09:46:32] <dcausse> yes
[10:21:22] <gehel> lunch
[10:24:52] <dcausse> lunch too
[12:12:13] <dcausse> zpapierski: fyi, there are some remotely related discussions in T230862
[12:12:13] <stashbot> T230862: Create a way to filter only WB-related changes from Commons recentchanges - https://phabricator.wikimedia.org/T230862
[12:12:55] <zpapierski> thanks for the heads - I was literally thinking about that
[12:14:33] <zpapierski> s/heads/heads up/
[12:15:09] <zpapierski> that would be nice to have this for events, I'm guessing this was only done for RC api
[12:19:28] <dcausse> yes I think so, it'd be great to have some hints in the events indeed
[12:19:51] <zpapierski> I looked for anything, but I didn't found anything useful
[12:20:17] <dcausse> yes not sure the usecase was ever needed since now
[12:20:34] <zpapierski> btw - do you know if pageId is something I can rely on to be the same as entity ID form sdc?
[12:21:04] <dcausse> commons MID is simply "M" + page_id
[12:21:08] <dcausse> so yes
[12:21:13] <zpapierski> terrific
[12:22:22] <dcausse> the tricky part is that a revision might not be structured data related but still impacts the RDF output (the "schema:version" triple is about the revision itself)
[12:22:53] <zpapierski> I assumed that this will simply produce empty diffs - isn't that the case?
[12:22:56] <zpapierski> aa
[12:23:00] <zpapierski> except the version
[12:23:22] <zpapierski> we could ignore that, not sure we should, though
[12:23:34] <dcausse> taking all revisions after the mediainfo slot has been created is a good approach
[12:23:42] <dcausse> it matches what the dump will produces
[12:24:20] <zpapierski> well, it requires less logic anyway, so I'm all for that :)
[12:24:52] <dcausse> well not sure it requires less logic :/
[12:25:32] <zpapierski> doesn't it? new revisions will produce at least a single triple change, with the version
[12:25:39] <zpapierski> how is it different from what we do now?
[12:25:59] <dcausse> you don't want to produce RDF if the entity has no mediainfo slot
[12:26:18] <dcausse> and you won't have this entity in the initial state
[12:26:41] <zpapierski> ah, ok - but that's the case whenever we'd create new updates for version change or not
[12:27:10] <zpapierski> and API, from what I understand, 404s when no mediaslot has been created
[12:27:37] <zpapierski> so from that perspective, it should be invisible to updater, at least after we deal with knowing what 404 means
[12:27:44] <zpapierski> (I know, that's not a small task)
[12:28:19] <dcausse> relying on 404 is perhaps possible but might not be enough
[12:28:34] <zpapierski> how so?
[12:30:56] <dcausse> say File1 revision 1 has no mediainfo slot it won't present in the RDF dumps thus unknowns from the flink state
[12:31:57] <dcausse> you receive File1 revision 2 (parent : 1) the event will be buffered thinking that the revision 1 has been misordered
[12:32:13] <dcausse> creating a lot of timers I'm afraid
[12:32:20] <zpapierski> I see
[12:32:42] <zpapierski> I wonder shouldn't we simply keep the revision for media items, even if there are no mediainfo triples
[12:33:07] <dcausse> the dump process would not be able to do that I think
[12:33:34] <zpapierski> dump probably not, but at least we'd limit impact
[12:34:12] <dcausse> so we'd need a hint in the events
[12:34:27] <dcausse> or we need an extra MW-call prior the reordering
[12:37:29] <zpapierski> That's true, some additional info is needed
[12:39:39] <dcausse> calling MW is always going to decrease the quality of the stream (eventual consistency/network errors)
[12:43:38] <zpapierski> not to mention additional latency :(
[12:44:28] <dcausse> indeed
[12:46:27] <zpapierski> otoh, not super sure how hints in events should be added - just informing about MCR slot isn't enough, if we plan to keep version change on each revision, but only after mediainfo slot is created
[12:46:42] <zpapierski> or maybe it is...
[12:47:18] <zpapierski> if given change provides 404 on request and the triggering event wasn't mediainfo slot related
[12:47:46] <zpapierski> it should be because mediaslot doesn't yet exist (unless eventual consistency)
[12:48:20] <zpapierski> in any case, even in case of eventual consistency, we at most loose a version bump, which shouldn't be a big deal
[12:48:29] <zpapierski> I feel like I'm missing something here
[12:48:41] <zpapierski> anyway, break for now
[12:49:19] <zpapierski> gehel, dcausse : are we skipping today's sync? everyone from WMDE seems to be out
[12:49:40] <dcausse> fine by me
[12:49:46] <gehel> makes no sense to keep it :/
[12:49:49] <gehel> thanks for checking!
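(Editor's note: the reordering concern raised between 12:30 and 12:34 can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not the updater's actual code. `RevisionEvent`, `mid_for`, `decide` and especially the `has_mediainfo_slot` field are hypothetical names; the log only says that such a hint in the events would be needed, not that it exists. The sketch takes the File namespace to be 6 and the MID to be "M" + page_id, as stated earlier in the log, and it ignores the case where the slot-creating revision itself arrives out of order.)

```python
from dataclasses import dataclass
from typing import Dict, Optional

FILE_NAMESPACE = 6  # Commons File namespace, per the 09:45 exchange


@dataclass
class RevisionEvent:
    page_id: int
    namespace: int
    rev_id: int
    parent_rev_id: Optional[int]
    # Hypothetical hint: True once the revision carries a mediainfo slot.
    has_mediainfo_slot: bool


def mid_for(page_id: int) -> str:
    """Commons MediaInfo ID: "M" followed by the page id."""
    return f"M{page_id}"


def decide(event: RevisionEvent, state: Dict[str, int]) -> str:
    """Return 'skip', 'emit' or 'buffer' for one revision event.

    `state` maps MID -> last revision emitted, standing in for the
    per-entity state that would be seeded from the RDF dump.
    """
    if event.namespace != FILE_NAMESPACE or not event.has_mediainfo_slot:
        # No slot yet: the dump would not contain this entity either.
        return "skip"
    mid = mid_for(event.page_id)
    last_seen = state.get(mid)
    if last_seen is None:
        # Entity unknown to the state: with the hint there is no need to
        # wait for the parent revision, which never produced any RDF.
        state[mid] = event.rev_id
        return "emit"
    if event.rev_id <= last_seen:
        return "skip"  # stale or duplicate revision
    if event.parent_rev_id == last_seen:
        state[mid] = event.rev_id
        return "emit"
    # The parent is genuinely missing: wait for it (this is where a timer
    # would be registered in a streaming job).
    return "buffer"
```

(With the hint, File1 revision 2 (parent: 1) is emitted immediately when revision 1 had no slot, instead of being buffered behind a parent that will never show up in the state.)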
[12:51:13] <dcausse> it depends on what we want: 1/ getting only mediainfo related changes: the revision history seems harder to reconstruct, 2/ getting all revisions only after the mediainfo slot has been created: we're inline with the dumps but the events must inform us that a slot is available
[13:19:13] <gehel> dcausse, zpapierski: did you receive task to grade (text parsing, java solution). I can't see your scorecards.
[13:19:37] <dcausse> gehel: no, only a python notebook recently
[13:20:42] <gehel> I'm pinging Amanda about it.
[13:21:08] <dcausse> I received two emails tho, but the first one is "Sorry, but we ran into an error loading this page." and thought it was the same as the one I received sometime later (the python one)
[13:30:18] <zpapierski> gehel: same here
[13:54:39] <zpapierski> relocating
[14:48:03] <gehel> zpapierski, dcausse: I've sent you the task by email
[14:48:11] <zpapierski> thx
[15:01:30] <gehel> ryankemper, ebernhardson: triage?
[15:01:39] <gehel> mpham: ^
[15:01:58] <mpham> hi, sorry, i'm out today!
[15:02:09] <gehel> mpham: soo sorry!
[15:59:21] <dcausse> dinner
[16:22:41] <zpapierski> relocating (and probably out for the day)
[16:23:30] <zpapierski> gehel: can I have some invite to the event you mentioned? I don't see it in mine of staff calendar
[18:00:26] <ryankemper> waiting on reviews of https://gerrit.wikimedia.org/r/c/labs/private/+/715570 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/715569 from sre but besides that the TLS certs for wcqs should be good to go (wrt https://gerrit.wikimedia.org/r/c/operations/puppet/+/713958/)
[19:02:39] <gehel> ebernhardson: sorry, I'm a few minutes late
[19:02:55] <ebernhardson> apparently i should join now :)
[19:03:00] <gehel> :9