[06:29:29] <elukey>	 hello folks!
[06:29:42] <elukey>	 an-airflow1001 is having some problems with logs again
[06:30:04] <elukey>	 I can drop old logs if nobody is around, otherwise I can wait
[06:30:07] <elukey>	 dcausse: --^
[06:52:41] <elukey>	 dropped :)
[06:59:21] <dcausse>	 elukey: thanks :)
[09:41:20] <zpapierski>	 dcausse: do you remember where I can get the media entity namespace, or perhaps remember what it is?
[09:44:21] <dcausse>	 zpapierski: mediainfo entities do not have their own namespace, they are part of a "revision slot" (MCR: Multi content revision)
[09:44:44] <dcausse>	 should be the FILE namespace
[09:44:54] <zpapierski>	 hmm, I know too little about MCR
[09:44:55] <dcausse>	 for the entities we care about
[09:44:55] <zpapierski>	 thanks
[09:45:50] <zpapierski>	 so, I assume 6
[09:46:32] <dcausse>	 yes
[10:21:22] <gehel>	 lunch
[10:24:52] <dcausse>	 lunch too
[12:12:13] <dcausse>	 zpapierski: fyi, there are some remotely related discussions in T230862 
[12:12:13] <stashbot>	 T230862: Create a way to filter only WB-related changes from Commons recentchanges - https://phabricator.wikimedia.org/T230862
[12:12:55] <zpapierski>	 thanks for the heads - I was literally thinking about that
[12:14:33] <zpapierski>	 s/heads/heads up/
[12:15:09] <zpapierski>	 that would be nice to have this for events, I'm guessing this was only done for RC api
[12:19:28] <dcausse>	 yes I think so, it'd be great to have some hints in the events indeed
[12:19:51] <zpapierski>	 I looked for anything, but I didn't find anything useful
[12:20:17] <dcausse>	 yes, not sure the use case was ever needed until now
[12:20:34] <zpapierski>	 btw - do you know if pageId is something I can rely on to be the same as the entity ID from SDC?
[12:21:04] <dcausse>	 commons MID is simply "M" + page_id
[12:21:08] <dcausse>	 so yes
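A minimal sketch of the mapping dcausse describes ("M" + page_id), assuming page IDs are positive integers. The helper names `mid_from_page_id` and `page_id_from_mid` are hypothetical, chosen for illustration only, and are not part of any Wikimedia library.

```python
import re

# Assumption: a Commons MediaInfo ID is the letter "M" followed by the page ID.
_MID_PATTERN = re.compile(r"M([1-9][0-9]*)")


def mid_from_page_id(page_id: int) -> str:
    """Build a MediaInfo entity ID from a page ID (hypothetical helper)."""
    if page_id <= 0:
        raise ValueError(f"page_id must be positive, got {page_id}")
    return f"M{page_id}"


def page_id_from_mid(mid: str) -> int:
    """Recover the page ID from a MediaInfo entity ID (hypothetical helper)."""
    match = _MID_PATTERN.fullmatch(mid)
    if match is None:
        raise ValueError(f"not a MediaInfo ID: {mid!r}")
    return int(match.group(1))
```

With this, `mid_from_page_id(42)` gives `"M42"` and `page_id_from_mid("M42")` gives `42`.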
[12:21:13] <zpapierski>	 terrific
[12:22:22] <dcausse>	 the tricky part is that a revision might not be structured data related but still impacts the RDF output (the "schema:version" triple is about the revision itself) 
[12:22:53] <zpapierski>	 I assumed that this will simply produce empty diffs - isn't that the case?
[12:22:56] <zpapierski>	 aa
[12:23:00] <zpapierski>	 except the version
[12:23:22] <zpapierski>	 we could ignore that, not sure we should, though
[12:23:34] <dcausse>	 taking all revisions after the mediainfo slot has been created is a good approach
[12:23:42] <dcausse>	 it matches what the dump will produce
[12:24:20] <zpapierski>	 well, it requires less logic anyway, so I'm all for that :)
[12:24:52] <dcausse>	 well not sure it requires less logic :/
[12:25:32] <zpapierski>	 doesn't it? new revisions will produce at least a single triple change, with the version
[12:25:39] <zpapierski>	 how is it different from what we do now?
[12:25:59] <dcausse>	 you don't want to produce RDF if the entity has no mediainfo slot
[12:26:18] <dcausse>	 and you won't have this entity in the initial state
[12:26:41] <zpapierski>	 ah, ok - but that's the case whether we'd create new updates for version changes or not
[12:27:10] <zpapierski>	 and the API, from what I understand, 404s when no mediainfo slot has been created
[12:27:37] <zpapierski>	 so from that perspective, it should be invisible to the updater, at least after we deal with knowing what a 404 means
[12:27:44] <zpapierski>	 (I know, that's not a small task)
[12:28:19] <dcausse>	 relying on 404 is perhaps possible but might not be enough
[12:28:34] <zpapierski>	 how so?
[12:30:56] <dcausse>	 say File1 revision 1 has no mediainfo slot: it won't be present in the RDF dumps, thus unknown to the Flink state
[12:31:57] <dcausse>	 you receive File1 revision 2 (parent: 1): the event will be buffered on the assumption that revision 1 has been misordered
[12:32:13] <dcausse>	 creating a lot of timers I'm afraid
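The scenario above can be sketched with a toy model. This is not the actual Flink updater: `RevisionEvent`, `ToyReorderer` and the "emit"/"buffer" results are hypothetical names used only to illustrate why an event whose parent revision is missing from the state ends up waiting on a timer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RevisionEvent:
    entity: str
    revision: int
    parent: Optional[int]  # None for the first revision of a page


@dataclass
class ToyReorderer:
    # Last revision known per entity, e.g. as loaded from an RDF dump.
    known: Dict[str, int] = field(default_factory=dict)
    # Events held back while waiting for their parent (one timer each).
    buffered: List[RevisionEvent] = field(default_factory=list)

    def process(self, event: RevisionEvent) -> str:
        last = self.known.get(event.entity)
        if event.parent is None or event.parent == last:
            self.known[event.entity] = event.revision
            return "emit"
        # The parent is unknown: assume it is late and wait for it.
        self.buffered.append(event)
        return "buffer"


# File0 was in the dump; File1 revision 1 had no mediainfo slot, so it was not.
reorderer = ToyReorderer(known={"File0": 10})
in_order = reorderer.process(RevisionEvent("File0", 11, parent=10))
stuck = reorderer.process(RevisionEvent("File1", 2, parent=1))
```

Here `in_order` is "emit" while `stuck` is "buffer": File1 revision 1 will never arrive, so the buffered event only leaves when its timer fires.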
[12:32:20] <zpapierski>	 I see
[12:32:42] <zpapierski>	 I wonder whether we shouldn't simply keep the revision for media items, even if there are no mediainfo triples
[12:33:07] <dcausse>	 the dump process would not be able to do that I think
[12:33:34] <zpapierski>	 dump probably not, but at least we'd limit impact
[12:34:12] <dcausse>	 so we'd need a hint in the events
[12:34:27] <dcausse>	 or we need an extra MW-call prior to the reordering
[12:37:29] <zpapierski>	 That's true, some additional info is needed 
[12:39:39] <dcausse>	 calling MW is always going to decrease the quality of the stream (eventual consistency/network errors)
[12:43:38] <zpapierski>	 not to mention additional latency :(
[12:44:28] <dcausse>	 indeed
[12:46:27] <zpapierski>	 otoh, not super sure how hints in events should be added - just informing about MCR slot isn't enough, if we plan to keep version change on each revision, but only after mediainfo slot is created
[12:46:42] <zpapierski>	 or maybe it is...
[12:47:18] <zpapierski>	 if a given change returns a 404 on request and the triggering event wasn't mediainfo slot related
[12:47:46] <zpapierski>	 it should be because the mediainfo slot doesn't exist yet (unless eventual consistency)
[12:48:20] <zpapierski>	 in any case, even in case of eventual consistency, we at most lose a version bump, which shouldn't be a big deal
[12:48:29] <zpapierski>	 I feel like I'm missing something here
[12:48:41] <zpapierski>	 anyway, break for now
[12:49:19] <zpapierski>	 gehel, dcausse : are we skipping today's sync? everyone from WMDE seems to be out
[12:49:40] <dcausse>	 fine by me
[12:49:46] <gehel>	 makes no sense to keep it :/
[12:49:49] <gehel>	 thanks for checking!
[12:51:13] <dcausse>	 it depends on what we want: 1/ getting only mediainfo related changes: the revision history seems harder to reconstruct, 2/ getting all revisions only after the mediainfo slot has been created: we're in line with the dumps but the events must inform us that a slot is available
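A rough sketch of how the two options differ as event filters. It assumes the events carry a hint about slots, which the discussion says they currently do not: the fields `changed_slots` and `revision_slots` are hypothetical, as is the function `keep_event`.

```python
from typing import Any, Dict

MEDIAINFO = "mediainfo"


def keep_event(event: Dict[str, Any], option: int) -> bool:
    """Decide whether an event should reach the updater (hypothetical)."""
    if option == 1:
        # Option 1: only revisions that touched the mediainfo slot.
        return MEDIAINFO in event.get("changed_slots", [])
    if option == 2:
        # Option 2: every revision once the mediainfo slot exists.
        return MEDIAINFO in event.get("revision_slots", [])
    raise ValueError(f"unknown option: {option}")


# A wikitext-only edit on a file that already has a mediainfo slot.
wikitext_edit = {
    "changed_slots": ["main"],
    "revision_slots": ["main", "mediainfo"],
}
kept_by_option_1 = keep_event(wikitext_edit, 1)
kept_by_option_2 = keep_event(wikitext_edit, 2)
```

The wikitext-only edit is dropped under option 1 and kept under option 2, which is what keeps option 2 consistent with the "schema:version" triple in the dumps.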
[13:19:13] <gehel>	 dcausse, zpapierski: did you receive a task to grade (text parsing, Java solution)? I can't see your scorecards.
[13:19:37] <dcausse>	 gehel: no, only a python notebook recently
[13:20:42] <gehel>	 I'm pinging Amanda about it.
[13:21:08] <dcausse>	 I received two emails tho, but the first one is "Sorry, but we ran into an error loading this page." and thought it was the same as the one I received sometime later (the python one)
[13:30:18] <zpapierski>	 gehel: same here 
[13:54:39] <zpapierski>	 relocating
[14:48:03] <gehel>	 zpapierski, dcausse: I've sent you the task by email
[14:48:11] <zpapierski>	 thx
[15:01:30] <gehel>	 ryankemper, ebernhardson: triage?
[15:01:39] <gehel>	 mpham: ^
[15:01:58] <mpham>	 hi, sorry, i'm out today!
[15:02:09] <gehel>	 mpham: soo sorry!
[15:59:21] <dcausse>	 dinner
[16:22:41] <zpapierski>	 relocating (and probably out for the day)
[16:23:30] <zpapierski>	 gehel: can I have an invite to the event you mentioned? I don't see it in mine or the staff calendar
[18:00:26] <ryankemper>	 waiting on reviews of https://gerrit.wikimedia.org/r/c/labs/private/+/715570 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/715569 from sre but besides that the TLS certs for wcqs should be good to go (wrt https://gerrit.wikimedia.org/r/c/operations/puppet/+/713958/)
[19:02:39] <gehel>	 ebernhardson: sorry, I'm a few minutes late
[19:02:55] <ebernhardson>	 apparently i should join now :)
[19:03:00] <gehel>	 :9