[06:49:56] <dcausse>	 sigh completely overlooked that we still have indices with old lucene versions... will file a task...
[06:51:34] <dcausse>	 we can probably keep the ones with version 7.x?
[06:52:06] <dcausse>	 sigh I see ttmserver in the list...
[08:33:18] <dcausse>	 filed T423993
[08:33:19] <stashbot>	 T423993: Upgrade old indices CirrusSearch opensearch cluster - https://phabricator.wikimedia.org/T423993
[09:33:31] <dcausse>	 errand+lunch
[12:39:10] <dcausse>	 cirrus is still checking for an opensearch version at 1.3.x
[12:39:41] <dcausse>	 translatewiki.net still runs elasticsearch 7.10.2
[12:40:12] <dcausse>	 pinged the LPL team about this
[12:42:52] <dcausse>	 wondering if we should have the beta cluster tested under 2.19 before moving forward with the prod clusters
[13:19:19] <inflatador>	 <o/
[13:22:19] <ebernhardson>	 \o
[13:23:01] <ebernhardson>	 we need to run a reindex for commons anyways, maybe just recreate everything
[13:23:11] * ebernhardson forgets if thats even an option in the automation, probably :P
[13:23:46] <dcausse>	 o/
[13:24:22] <dcausse>	 translate is a bit more annoying since it does not use aliases and will result in a downtime
[13:24:39] <ebernhardson>	 ahh, .ltrstore is going to be similar in prod IIUC :S
[13:24:54] <ebernhardson>	 brian and i pondered if we have to shuffle traffic with dns-discovery while recreating those
[13:24:57] <dcausse>	 wondering if we can go single DC to do all the annoying changes more safely
[13:25:02] <dcausse>	 yes
[13:26:04] <dcausse>	 another difficulty is that we have to decide what to do with elastic 7.10.2 support, if we explicitly drop that support we have to wait for translatewiki.net to switch to opensearch 1.3 (they're still on elastic)
[13:26:40] <ebernhardson>	 oh, hmm.  So far i think it's plausible, but plausible and a good idea are not the same :P
[13:26:47] <ebernhardson>	 (supporting 7, 1, and 2)
[13:27:10] <dcausse>	 yes, not great... likely that we'll break elastic at some point :/
[13:56:04] <dcausse>	 regarding cirrus meta store, I'd be tempted to drop most of it, I don't think the versions is something we care much about, it's only sanitizer state that might still be useful for third-parties (but I doubt anyone has setup something similar)
[13:57:23] <dcausse>	 for the analysis versions if actually required it could be fed directly in the index via the _meta field (https://docs.opensearch.org/latest/mappings/metadata-fields/index/)
[13:57:47] <dcausse>	 I used this to store some state for the knn index in dse-k8s
[13:59:54] <ebernhardson>	 yea i was wondering a little myself yesterday, what exactly does the metastore do? I feel like it's not really necessary to know when to run index upgrades 
[14:00:56] <ebernhardson>	 versioning doesn't seem that useful, namespace store is now unused, finally the saneitizer state.  Not sure what to do with the final one
[14:01:20] <ebernhardson>	 although we dont use that method, and i doubt other people run the saneitizer
[14:01:29] <dcausse>	 I mean we could keep it but it'll just be boostrapped in case it's used, which for us might mean never
[14:02:04] <dcausse>	 fair enough, we could ponder dropping the php saneitizer altogether
[14:02:43] <ebernhardson>	 Maybe just remove the automation loop, but keep the ability to check specific id's or namespaces orwhatever in-proecess
[14:02:58] <ebernhardson>	 i think the metastore is only needed for the automation loop part
[14:03:24] <ebernhardson>	 i'm trying to remember, i have a feeling (but not 100% sure) we've told people to run the saneitizer to fix something before
[14:06:43] <dcausse>	 the sanitizer has two modes, a standalone maint script that should not use the meta store and the continuous one that uses the meta store to keep the loop state
[14:07:25] <dcausse>	 +1 to drop the continuous mode
[14:09:21] <ebernhardson>	 +1
[14:11:17] <dcausse>	 files T424030 to take a decision regarding elasticearch 7.10.2
[14:11:18] <stashbot>	 T424030: Decide if CirrusSearch should still support elastic 7.10.2 in MW 1.46 - https://phabricator.wikimedia.org/T424030
[14:13:39] <ebernhardson>	 thanks
[14:16:02] <dcausse>	 and the metastore one: T424035
[14:16:02] <stashbot>	 T424035: Drop the CirrusSearch metastore - https://phabricator.wikimedia.org/T424035
[14:22:46] <inflatador>	 I can look into beta cluster. We just need a way to do host-specific hiera overrides, I think that is controlled in the Openstack web UI or something
[14:27:36] <dcausse>	 inflatador: thanks, I'll do a re-index of everything in the beta cluster
[14:27:59] <dcausse>	 the .tasks index gets auto-created? is it safe to just delete it?
[14:28:45] <ebernhardson>	 if you need to recreate metastore this is what i ended up using from mwscript-k8s: https://phabricator.wikimedia.org/P91297
[14:29:07] <dcausse>	 thanks!
[14:31:43] <dcausse>	 actually everything seems already on version 135249827, even the metastore
[14:31:48] <ebernhardson>	 nice
[14:32:11] <dcausse>	 no ttmserver sadly, I whish I could test their maint script there
[14:41:32] <inflatador>	 Re: beta cluster, looks like we can add hiera stuff via https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/master/deployment-prep/deployment-cirrussearch.yaml , will get a ticket started
[14:44:46] <inflatador>	 And maybe it's time to think about moving those non-cirrus indices to opensearch on k8s
[14:48:29] <dcausse>	 sure, translate might need the wmf plugins but hopefully should be fairly easy to build a custom image for them?
[14:50:10] <ebernhardson>	 big fan of the idea
[14:59:46] <inflatador>	 yeah, we should be able to do a custom image
[15:00:55] * ebernhardson has no clue what to do with this CI error...i guess wait and hope it works tomorrow: Could not open input file: tests/phpunit/generatePHPUnitConfig.php
[15:01:14] <ebernhardson>	 i mean i could dig into it, but i sure hope someone else already is...
[15:36:05] <carandraug>	 hi. I was talking on #wikimedia-hackathon about getting a "sematinc seach" software we wrote running on toolforge and was suggested I spoke to the "search team". We have a demo at https://meru.robots.ox.ac.uk/motd/ (searches only on videos from the media of the day)
[15:36:59] <ebernhardson>	 carandraug: thats pretty cool!  What kind of support would you need? In general toolforge if pretty flexible
[15:38:28] <carandraug>	 ebernhardson: you mean in terms of resources?
[15:39:36] <ebernhardson>	 carandraug: just in general, i'm happy to lend some support but i'm not really sure what you might need.  You get some default quota when signing up to toolforge and that can be expanded, i can certainly help you justify that as we are also familiar with the compute cost of vector search
[15:40:01] <carandraug>	 I need to study toolforge. I heard about it the first time last week. I don't know how to trigger the extraction of new features and re-indexing when new files are added for example
[15:41:29] <carandraug>	 ebernhardson: a GPU would be very nice. Doesn't need a particularly powerful one. Would that be available? For vector search we do faiss in CPU. What do you typically use?
[15:41:32] <ebernhardson>	 i'm also not deeply invested in toolforge, but at a high level i think of it as a service where you can either get VM's and set everything up yourself, or there are options to deploy containers.  I'm mostly familiar with the VM side of things.   Triggering things would probably be a scheduled job (via systemd) or some such
[15:42:57] <ebernhardson>	 GPU is extremely unlikely, we don't have any of them in the cloud side of things.  There are a few in the prod networks but very few.   For vector search we use opensearch's built in knn (uses faiss or hnsw internally), but that might be more than you want to deal with
[15:43:17] <ebernhardson>	 We do the batch generation of vectors in hadoop using thousands of cores in parallel
[15:43:27] <ebernhardson>	 (that unfortunately toolforge doesn't have access to)
[15:45:11] <carandraug>	 we use faiss directly and manage the database (sqlite) ourselves. Been working fine even with millions of images
[15:46:02] <carandraug>	 when you say you do batch generation of vectors, you mean for query? You wait until you have enough queries for a batch and compute their vectors?
[15:46:44] <ebernhardson>	 embedding the query happens in realtime on a GPU, but the index itself requires vectorizing hundred million+ passages and that part is done in hadoop
[15:47:33] <carandraug>	 instead of direct access to a GPU, access to a triton server with some models would also work (that's we do for our many demos). If that's also not possible, I could compute the features of the existing media on our cluster and compute the new ones on CPU (which is ~10x slower)
[15:47:36] <ebernhardson>	 it's really only the initial vectorization that takes massive compute, after that we can calculate a delta of only changed articles which is much more tractable, but still done in hadoop so it's all consistent
[15:48:39] <carandraug>	 by the way, the "we" on my previous statements is VGG, which is a research group in computer vision at the University of Oxford. All our work is FOSS
[15:49:22] <ebernhardson>	 triton would also be hard, afaik all GPUs owned by WMF are in a k8s cluster the runs "liftwing" which is a k8s-serving framework. That is only available in hte prod networks
[15:49:38] <ebernhardson>	 it basically has to be done on CPUs
[15:49:47] <ebernhardson>	 (or externally, public api or some such)
[15:52:46] <dcausse>	 for query time embedding it's probably fine to run on a cpu in wmcloud if the model is not massive, Santosh built something similar for text models at https://embed.toolforge.org/
[15:53:00] <carandraug>	 easier to start with CPU then. I know face search (uploading the image of a person face) works well enough with CPU, and I guess visual search (text describing an image) does as well. Object search (e.g., search for images with a "san jose sharks shirt") will likely be too slow on CPU
[15:55:46] <carandraug>	 would be ok to have something on toolforge then to search across all of wikimedia images then? (we have such a demo but didn't made it public because we found far too many nsfw content)
[15:56:45] <ebernhardson>	 carandraug: yea thats totally fine.  And indeed there is plenty of nsfw across the wikis, especially in commons. You won't have any problems linking them from toolforge, totally acceptable
[15:57:28] <ebernhardson>	 as a wmf provided service we try to not put a finger on the scale of nsfw vs nsfw, but in a toolforge project you can also choose to downrank that kind of content if it doesn't meet your goals
[15:58:11] <ebernhardson>	 s/nsfw vs nsfw/sfw vs nsfw/
[16:10:47] <carandraug>	 would it be ok to setup the motd demo on toolforge? That needs about 150GB of disk space. And less than 64GB (although I could make this requirements even smaller)
[16:11:27] <dcausse>	 but for 
[16:11:34] <dcausse>	 carandraug: and forgot a similar project from the wikidata team https://wd-vectordb.wmcloud.org/ 
[16:11:49] <carandraug>	 for the search results in videos we have thumbnails every .5 seconds but maybe you already have them?
[16:12:34] <ebernhardson>	 looking at a random toolforge project i'm in, the quota there is 8 instances, 8 vCPUs, 16G ram, 80G volume storage.  But can apply for more
[16:12:48] <ebernhardson>	 i suspect that's the typical quota handed out by default
[16:13:44] <ebernhardson>	 i'm not sure about video thumbnails honestly, videos are such a small part of the content we don't pay much attention to it
[16:18:40] <carandraug>	 would it be asking too muc for 32GB of memory?
[16:20:11] <carandraug>	 dcausse: this is much more similar to what I have in mind, thank you for sharing
[16:21:34] <dcausse>	 carandraug: best is to start a phabricator task to request a project see instructions at https://wikitech.wikimedia.org/wiki/Help:Cloud_VPS_project#Request_a_new_Cloud_VPS_project
[16:23:27] <dcausse>	 I suspect could vps is what you would need, toolforge sounds too restrictive for what you'd like to serve (but I could be wrong)
[16:34:46] <carandraug>	 dcausse: thank you for the advice. I will do that now
[16:42:59] <dcausse>	 dinner
[17:31:25] <inflatador>	 carandraug you might ask in #wikimedia-cloud channel as well, I think the team that owns toolforge hangs out there
[18:24:30] <ebernhardson>	 huh, from slack saw mention that we can potentially get cross-wiki config from mwcore without our awkward api abstraction. But not sure what the limitations are: https://codesearch.wmcloud.org/deployed/?q=wgConf-%3Eget\b&files=php