[00:02:10] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (chelsyx) Since MobileWikiAppShareAFact may be useful for iOS in the future, I submit the pat...
[00:32:14] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (Nuria) @fgiunchedi thoughts on this? looks like we are talking about 10-100 G files, not quite Terabytes
[00:56:16] Analytics, ORES, Scoring-platform-team (Current): Emit synthetic mediawiki.revision-score events for both datacenters - https://phabricator.wikimedia.org/T214545 (awight) After talking to @joal at All Hands, I'm now thinking that T211069 should be the first priority for integrating the scoring pipeli...
[01:18:04] Analytics, RESTBase, Core Platform Team Backlog (Later), Services (blocked): Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (Pchelolo)
[02:44:24] (CR) Milimetric: "fdans: I agree with you on the name but we can rename everything in another change.
I'm going to execute the edits part manually for now," [analytics/refinery] - https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) (owner: Milimetric)
[02:54:11] (CR) Milimetric: [C: -1] "Ok, data is there now, but you know what, as a matter of fact, I'm going to -1 this and add another job, to run the GII dataset once a yea" [analytics/refinery] - https://gerrit.wikimedia.org/r/489313 (https://phabricator.wikimedia.org/T215655) (owner: Milimetric)
[04:34:20] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month - https://phabricator.wikimedia.org/T215655 (Milimetric) For reference, this is how I generated the numbers sent to the GII folks: ` -- new data select coalesce(c.country, g.country_code) as country, sum...
[04:35:41] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month - https://phabricator.wikimedia.org/T215655 (Milimetric) Also, I created the table and filled it through 2019-01 manually. This way we can just restart the whole job and not have to worry about this addition of a...
[06:28:00] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Good news: it seems that Python 3.7 support should be available with 1.13, the next release - https://github.com/tensorflow/tensorflow/issues/20517
[06:39:38] Analytics, ExternalGuidance, Product-Analytics, Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (santhosh) > . What I was trying to add is for local wiki, distinguish create a new page when there is no page with the same titl...
[06:47:13] Analytics, ExternalGuidance, Product-Analytics, Patch-For-Review: Measure the impact of externally-originated contributions - https://phabricator.wikimedia.org/T212414 (chelsyx) >>! 
In T212414#4950138, @santhosh wrote: >> . What I was trying to add is for local wiki, distinguish create a new page...
[07:50:28] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) So I had to install all the packages listed above to make Tensorflow run, each time it was failing for a different missing lib. Now I am getting this: ` (tes...
[07:54:46] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) ` root@stat1005:/home/elukey/test# /opt/rocm/bin/rocm-smi ======================== ROCm System Management Interface ========================...
[07:55:47] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (MoritzMuehlenhoff) Tensorflow is also finding its way into Debian, BTW (currently only in experimental): https://packages.qa.debian.org/t/tensorflow.html
[09:11:58] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) I switched to the `rocm-dkml` kernel drivers and followed instructions for https://wiki.archlinux.org/index.php/AMDGPU#Set_required_module_parameters (Sea Is...
[09:27:28] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (fgiunchedi) >>! In T213976#4949742, @Nuria wrote: > @fgiunchedi thoughts on this? looks like we are talking about 1...
[09:31:45] (CR) Fdans: Change email send workflow to notify of completed jobs (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans)
[09:43:03] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (Miriam) > In the hundreds of megabytes I believe. @Halfak, @EBernhardson, @Miriam, @bmansurov, is this right? Will...
[09:49:52] * elukey hates GPUss
[09:50:45] Analytics, Analytics-Kanban, WMDE-Analytics-Engineering, Wikidata, and 3 others: track number of editors from other Wikimedia projects who also edit on Wikidata over time - https://phabricator.wikimedia.org/T193641 (WMDE-leszek) Much appreciated @JAllemandou !
[10:39:47] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) I have booted again without `amdgpu.dc=0` to reduce the number of variables (since basically it should be related to how to handle an external screen, harmle...
[10:41:22] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (MoritzMuehlenhoff) Maybe try 4.20-1 from experimental to narrow the kernel oops down?
[11:39:19] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Installed 4.20 from experimental but it seems that the kfd driver is not shipped: ` elukey@stat1005:~$ find /lib/modules/ -type f -name '*.ko' | grep kfd /l...
[11:40:51] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (MoritzMuehlenhoff) >>! 
In T148843#4950613, @elukey wrote: > Installed 4.20 from experimental but it seems that the kfd driver is not shipped: > > ` > elukey@stat100...
[12:25:52] Analytics, Operations, Research-management, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) so the amdkfd module is built by the `rock-dkml` package (when installing) grabbing the current kernel headers. Since I didn't install them (for both kernels...
[12:53:33] * elukey lunch!
[14:42:18] currently re-imaging stat1005 to stretch
[14:42:32] to see if the 4.9 kernel works
[14:42:44] 4.19 seems to lead to severe issues
[14:59:03] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 2 others: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410 (mpopov)
[14:59:15] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (mpopov) Resolved→Open Actually, I would like to request for https://git...
[15:17:22] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (jcrespo) Important, dblists are not the canonical place for database distributi...
[15:20:08] (CR) Mforns: [V: +2 C: +2] "LGTM! Thanks for keeping the alphabetical order :]" [analytics/refinery] - https://gerrit.wikimedia.org/r/490212 (https://phabricator.wikimedia.org/T209087) (owner: Chelsyx)
[15:24:53] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (mforns) Thanks @chelsyx! 
Please, can you tell me since when should I apply backfilling for M...
[15:25:36] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (mforns) @chelsyx, or did you just add those fields for events flowing in in the future?
[15:35:02] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (mpopov) >>! In T212386#4951229, @jcrespo wrote: > Important, dblists are not th...
[16:04:43] hey team :]
[16:21:29] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) After trying to configure the rock-dkml package on Stretch with 4.9 and 4.14, I found this: https://github.com/RadeonOpenCompute/ROCm/...
[16:24:59] (CR) Nuria: [C: -1] Change email send workflow to notify of completed jobs (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans)
[16:31:15] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (mpopov) @jcrespo: is it safe to assume that the current config of `s3 (default)...
[16:31:21] elukey: hi sorry! if you don't mind, can we skip or postpone til after standup?
[16:31:25] i have alex's attention now and won't later in the day
[16:31:40] ack!
[16:32:01] my update was that I have been crying the whole day with a GPU
[16:32:09] and lost the battle for the moment
[16:33:26] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (Nuria) @mforns , @chelsyx When fields are renamed I think it will be of use to keep old and...
[16:34:18] nice! a fun day!
[16:35:41] ottomata: if you are ok I'll merge my kafkatee/kafkacat change
[16:37:32] +1 elukey
[16:40:44] Analytics, Analytics-Wikistats, Patch-For-Review: Check wikistats numbers for agreggations for "all-wikipedias" - https://phabricator.wikimedia.org/T189626 (Nuria) Comments added to talk page.
[16:41:04] (PS6) Fdans: Change email send workflow to notify of completed jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894)
[16:49:26] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (mforns) @Nuria the change @chelsyx posted does indeed add renamed fields and keeps the old o...
[16:50:35] worked nicely!
[16:51:46] joal: about sir?
[17:02:34] ping elukey , ottomata , milimetric , joal
[17:03:27] ping a-team standdduppp
[17:05:50] ouch sorry!
[17:09:08] OH NO
[17:13:53] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/issues - very interesting to see that other people opened bugs for NULL pointer...
[17:35:07] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (jcrespo) Yes, the most likely changes in the future are an s0 (closed wikis wit...
[17:51:08] mforns and I are going to look at https://phabricator.wikimedia.org/T214384
[17:52:03] fdans: if you're deploying, will you restart the geoeditors job?
[17:52:05] the monthly one?
[17:52:07] * elukey off!
[17:52:24] milimetric, fdans is deploying wikistats no? I will deploy refinery
[17:52:28] milimetric: I think mforns is doing that part
[17:52:42] oh sorry, yes
[17:52:58] ok, mforns let me know if you need any help restarting
[17:53:03] I mentioned it in standup but just to be sure
[17:53:04] ok milimetric thanks :]
[17:55:32] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Nuria) Per standup conversation: let's look at how many records are affected by the bug in this case (we will know as every record that has a...
[17:57:27] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (chelsyx) @mforns According to https://meta.wikimedia.org/w/index.php?title=Schema:MobileWiki...
[17:58:06] (PS1) Fdans: Release 2.5.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/490378
[17:58:21] (CR) Fdans: [V: +2 C: +2] Release 2.5.4 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/490378 (owner: Fdans)
[17:59:43] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Ottomata) I still think this is a bad idea! 
BTW in {T215442} we determined that the only way we can really handle 'map' with Refine is to use the...
[18:05:18] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (Cmjohnson) @elukey is this a 1G or 10G rack?
[18:06:45] Analytics, Operations, ops-eqiad, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (Cmjohnson)
[18:13:32] nuria, milimetric, fdans another proposal to solve float vs. int problems in JS would be to use JS Number.EPSILON to force JS floats to be different from int representations; we could add a helper to the EL client that transforms 1.0 into 1.0 + Number.EPSILON, i.e.: el.float(1.0)
[18:15:17] or: N + rand(-1, 1) * Number.EPSILON
[18:16:15] a loss of precision in exchange for correct representations in json
[18:16:30] mforns but the el client doesn’t know the schema so each instrumentation would have to know to do that
[18:17:05] it’s a fine solution otherwise
[18:17:24] yes, the instrumentation would have to be explicit, like: event = {stringField: "hi", intField: 1, floatField: float(1)}
[18:17:49] yes
[18:17:57] but if we provide the helper, it would be minimal changes
[18:18:22] yeah, but if people forget, their data would be broken
[18:18:38] it seems hard to go change all the instrumentation
[18:18:50] my train of thought was, the problem comes from the way JS represents data. The sooner we fix that problem down the pipeline (the closer to JS) the better.
[18:19:30] hm, I disagree. 
The problem comes from inferring the schema
[18:19:33] yea, once it's broken, it needs a fix
[18:19:55] we should know the schema when refining
[18:20:18] not just to make sure we have the right types but for validation
[18:20:40] yes, makes sense
[18:20:48] oh, hm, we could add this epsilon at the validation phase, because we know the schema then
[18:20:48] i agree and disagree!
[18:21:00] it would slow down the processor a bit
[18:21:01] the problem is ultimately JSON's fault, but we also need to know the schema when refining
[18:21:15] i'd rather not rely on the
[18:21:19] the validator should just validate
[18:21:22] and i don't want to build that into eventgate
[18:21:34] JSON isn’t sentient, we are
[18:22:25] but the problem is not JSON, we can have 1.0 in JSON
[18:22:33] no you can't actually!
[18:22:40] it's the way JS stringifies into JSON
[18:22:42] hm
[18:22:44] ???
[18:22:45] ok that is true.........
[18:22:55] but maybe javascript itself is part of the problem?
[18:23:04] javascript can't have 1.0
[18:23:08] or more correctly
[18:23:13] there's no difference between 1 and 1.0 in javascript
[18:23:18] yea
[18:23:55] but now that milimetric says that, I think it wouldn't be far fetched to have the '.0' added at validation time, no?
[18:24:06] nawwww don't want to do it
[18:24:11] xD
[18:24:16] even if it was just eventlogging, i wouldn't want to do it in eventgate
[18:24:24] it is kind of part of validation...
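Editor's aside on the exchange above: a rough Python sketch of why a float such as 1.0 ends up looking like an integer once it has gone through JavaScript-style serialization, and what the epsilon helper proposed at 18:13 would and would not change. The function names (`js_like_dumps`, `inferred_type`, `nudge`) are made up for illustration and are not part of EventLogging, EventGate or Refine; the inference here is deliberately naive and only stands in for "no schema available".

```python
import json
import sys


def js_like_dumps(value):
    """Serialize a number the way JSON.stringify does: 1.0 is written as 1."""
    if isinstance(value, float) and value.is_integer():
        return json.dumps(int(value))
    return json.dumps(value)


def inferred_type(json_text):
    """Naive per-value inference from the serialized text, with no schema."""
    parsed = json.loads(json_text)
    return "bigint" if isinstance(parsed, int) else "double"


def nudge(value, eps=sys.float_info.epsilon):
    """The el.float() idea: add an epsilon (2**-52, same value as Number.EPSILON)."""
    return value + eps


samples = [0.5, 1.0, 4.0]

# Without any help, integral floats are indistinguishable from integers.
seen = [inferred_type(js_like_dumps(v)) for v in samples]

# With the nudge, 1.0 becomes 1.0000000000000002, but 4.0 + epsilon rounds
# back to 4.0 because epsilon is smaller than the float spacing at that size.
nudged = [inferred_type(js_like_dumps(nudge(v))) for v in samples]

print(seen)    # ['double', 'bigint', 'bigint']
print(nudged)  # ['double', 'double', 'bigint']
```

If this sketch is right, a fixed epsilon only helps for values of small magnitude, which is one more argument for taking the type from the schema instead of from the serialized value.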
[18:24:30] naw
[18:24:39] the validation shouldn't really change things (well, we do add defaults...but that is all)
[18:24:42] the schema should be the way we know
[18:24:54] all these problems are solved if we use the JSONSchema
[18:25:30] oh, ok, I wasn't aware there was a solution already
[18:25:41] yeah, I just feel strongly we should look over all the data and fix this problem wherever we find it
[18:25:54] my kafka connect + jsonschema prototype is the answer
[18:25:57] but we aren't going to deploy that soon
[18:26:05] And solve it for the near term as well
[18:26:08] but we need to support map types this quarter
[18:26:15] to do that, refine needs to use the schema
[18:26:20] i think joseph and I just want to build that into Refine
[18:26:23] rather than doing inference
[18:26:30] (perhaps we should have done that to begin with...)
[18:26:33] yeah, we can’t afford to wait for that, we’re potentially breaking a lot of data
[18:26:35] but, if we do that, we'll solve this problem
[18:26:44] integers will be bigints, numbers will be doubles
[18:26:48] or not, but we need to check and prevent
[18:26:55] wait for building it into refine?
[18:27:01] yeah
[18:27:13] that shouldn't be too hard....
[18:27:22] milimetric: for this one table we can just alter the schema now, right?
[18:27:36] we need to backfill too
[18:28:04] Analytics, Discovery, Operations, Research: Workflow to be able to move data files computed in jobs from analytics cluster to production - https://phabricator.wikimedia.org/T213976 (Nuria) Ok, I leave up to @fgiunchedi and @Ottomata to think about to how to productionize the "deployment" of model...
[18:28:09] and search to see if the problem exists elsewhere and fix it so it doesn’t happen between now and when the proper fix is in place
[18:29:59] milimetric: it probably exists elsewhere, but i'd assume it is rare
[18:31:30] milimetric: do we need to backfill to get the decimal part of the number?
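Editor's aside: the "integers will be bigints, numbers will be doubles" idea from 18:26:44 can be sketched in a few lines of Python. This is not Refine's code (nothing in the log shows it); the mapping table, the function names and the `editCount` field are assumptions made up for the example, and only `deviceMemory` is a field name that appears in this log.

```python
# Hypothetical mapping from JSON Schema types to Hive column types.
JSONSCHEMA_TO_HIVE = {
    "integer": "bigint",
    "number": "double",
    "string": "string",
    "boolean": "boolean",
}

# Python casts applied to values whose declared type is numeric.
CASTS = {"integer": int, "number": float}


def hive_columns(schema):
    """Derive column types from the declared schema, not from the data."""
    return {
        name: JSONSCHEMA_TO_HIVE[prop["type"]]
        for name, prop in schema["properties"].items()
    }


def coerce(event, schema):
    """Cast each value to its declared type, so 1 in a 'number' field becomes 1.0."""
    row = {}
    for name, value in event.items():
        declared = schema["properties"][name]["type"]
        cast = CASTS.get(declared)
        row[name] = cast(value) if cast else value
    return row


# Illustrative schema and event; the event carries deviceMemory as 1, the way
# a JavaScript client would serialize 1.0.
schema = {
    "properties": {
        "deviceMemory": {"type": "number"},
        "editCount": {"type": "integer"},
    }
}
event = {"deviceMemory": 1, "editCount": 42}

columns = hive_columns(schema)
row = coerce(event, schema)
print(columns)  # {'deviceMemory': 'double', 'editCount': 'bigint'}
print(row)      # {'deviceMemory': 1.0, 'editCount': 42}
```

The point of the sketch is only that the column type no longer depends on which values happen to show up in a given batch of events.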
[18:31:36] i just tested
[18:31:41] altering the table will not break old data
[18:31:45] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Nuria) @Ottomata can you clarify what you think is a bad idea so we are all on the same page?
[18:31:48] but will return them as e.g. 3.0
[18:32:05] yeah, to repair the old data
[18:32:28] so, rerefine from all data we have for navtiming
[18:32:43] that isn't so hard i guess, we just need to alter the table and launch a big refine job with --ignore_succcess_flag=true
[18:32:55] I am happy to do this, btw, mforns volunteered but I can do it if you want
[18:33:36] yea, I would do that after deployment train
[18:33:41] probably tomorrow
[18:33:43] man my compy is being weird...restarting...
[18:35:27] (CR) Nuria: [C: +1] "I think it looks good but waiting on joal's +2 as he might disagree." [analytics/refinery] - https://gerrit.wikimedia.org/r/484657 (https://phabricator.wikimedia.org/T206894) (owner: Fdans)
[18:38:53] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Nuria) I just checked and there are records with deviceMemory=0 in the tens of thousands so backfilling is probably a good idea.
[18:40:05] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Ottomata) @Nuria sorry, I mean assuming that all integers are floats is a bad idea. We should use double for decimals and bigints for integers, j...
[18:40:59] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (Nuria) @chelsyx we will backfill as much as we can but we might not have as much data.
[18:41:51] Analytics, Product-Analytics, Reading-analysis, Patch-For-Review: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (chelsyx) No problem @Nuria !
[18:41:53] Analytics, Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (Nuria) >sorry, I mean assuming that all integers are floats is a bad idea. Agree, I do not think we want to see numbers as edit_count of a user...
[18:46:42] (PS1) Mforns: Update changelog.md for v0.0.85 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/490385
[18:47:13] (CR) Mforns: [V: +2 C: +2] "Self-merging for deployment" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/490385 (owner: Mforns)
[18:47:26] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (EBernhardson) >>! In T148843#4947755, @elukey wrote: > Let's see if we can narrow down the packages needed: > > > - hsa-rocr-dev - AMD Hete...
[18:58:17] Analytics, Discovery-Search (Current work): Spike. 
Load search data into turnilo to test whether exploratory data can do away with some of the dashboards - https://phabricator.wikimedia.org/T216058 (Nuria)
[19:12:14] !log Deployed refinery-source v0.0.85 using jenkins
[19:12:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:25:55] Analytics, Data-Services: Rethink Cloud DB replicas - https://phabricator.wikimedia.org/T215858 (bd808)
[19:35:10] ping ottomata ml meeting?
[19:35:25] ml meeting!/
[19:35:27] OHG
[19:35:28] sorry
[19:35:32] ok
[19:36:11] (PS1) Mforns: Bump up jar version for mediawiki history jobs to v0.0.85 [analytics/refinery] - https://gerrit.wikimedia.org/r/490394
[19:39:52] (CR) Mforns: [V: +2 C: +2] "Self-merging for deployment" [analytics/refinery] - https://gerrit.wikimedia.org/r/490394 (owner: Mforns)
[19:46:35] !log Deploying refinery with scap
[19:46:36] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:13:36] fdans, you still there? I was looking into deploying AQS, and found this change:
[20:13:38] https://gerrit.wikimedia.org/r/#/c/analytics/aqs/+/477512/
[20:14:06] it's merged, but has jenkins errors post-merge
[20:14:13] is that an issue?
[20:14:22] also, has this patch been tested locally?
[20:15:47] milimetric, I didn't see any changes in refinery or refinery-source that needed restart of oozie job for geoeditors, is that needed?
[20:25:06] mforns: argh, my brain is completely broken, I -1ed the change myself
[20:25:17] heh, I saw that
[20:25:33] ok, I'll merge and deploy and restart later
[20:25:33] ok, then, I think train is finished
[20:25:39] choo choo!
[20:25:39] oh...
[20:25:45] xDDDD
[20:26:08] milimetric, I will deploy if you want to deploy this week
[20:26:28] just ping me, and I'll do that
[20:36:33] Analytics, EventBus, MediaWiki-Core-Testing, Quibble, and 4 others: Flaky quibble-vendor-mysql-hhvm-docker test in Jenkins - https://phabricator.wikimedia.org/T216069 (mobrovac) p:Triage→High
[21:26:26] mforns: question if you may
[21:26:33] sure :]
[21:26:48] mforns: i want to try a couple of things loading some search data into turnilo
[21:27:09] aha
[21:27:15] mforns: should i just copy one of the jobs in an-coord machine and change some parameters to test the load?
[21:27:30] nuria, that is an option
[21:27:51] or you can create a job call from the example in the docs, see:
[21:28:26] https://wikitech.wikimedia.org/wiki/Analytics/Systems/Hive_to_Druid_Ingestion_Pipeline#How_to_run_from_the_command_line
[21:28:55] mforns: I wanted to test it a bit before i would send the patch
[21:28:58] mforns: aahahahah
[21:29:10] mforns: i see, that is all wrapped up, ok, will do that
[21:29:15] Analytics, EventBus, Research, Wikidata, and 5 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (bmansurov) Thanks, everyone, for helping me with this task. I see the events are being emitted. Here's some data: `lang=json { "rev_id": 857576225, "perfor...
[21:29:54] nuria, we can delete the datasource after the test is finished
[21:30:17] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (Ottomata) Status: - eventgate-analytics deployed to staging k8s. It can be reached at http:...
[21:30:51] mforns: and once i make it final i would also need to add some config to turnilo, no? 
that i know how to test but wondering if we do that every time
[21:31:32] ottomata: wowow eventgate
[21:31:35] nuria, usually no additional turnilo config is needed...
[21:31:46] mforns: k
[21:32:06] Analytics, EventBus, Research, Wikidata, and 4 others: Surface link changes as a stream - https://phabricator.wikimedia.org/T214706 (bmansurov)
[21:32:17] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation: Deployment Pipeline - https://phabricator.wikimedia.org/T211247 (Ottomata) @Pchelolo If you want to test multi endpoint / monolog stuff in beta, you can add y...
[21:35:54] nuria, let me know if I can help, or you want to pair on that
[22:13:54] mforns: how about specifying measures?
[22:14:53] nuria, you can use --metrics metric1,metric2,...
[22:21:04] Analytics: Coarse alarm on data quality for refined data based on entrophy calculations - https://phabricator.wikimedia.org/T215863 (Milimetric) I understand it, yay! And I like it. We could even compute the tolerance from past data once in a while, and use that instead of our guess. That way this approac...
[22:30:05] mforns: and --time_measures? same right?
[22:30:15] nuria, yes,
[22:30:36] --time_measures will bucket numeric fields into string dimensions
[22:31:31] numeric time measure fields into strings like: 50ms-250ms, 1sec-4sec, etc.
[22:31:51] nuria, it assumes the initial value is in millisecs
[22:32:26] nuria, if you specify a field as time_measure, you don't need to specify it as --dimension
[22:32:41] mforns: k
[22:38:54] mforns: you have to sudo -u hdfs right?
[22:39:43] nuria, yes, it writes tmp data to hdfs
[22:40:04] hm, I think that is missing from the docs, will add
[22:47:38] mforns: i will modify docs really, no worries, was just doing that
[22:47:44] mforns: i must be missing something
[22:47:52] mforns: i get:
[22:47:54] isn't it working?
https://www.irccloud.com/pastebin/nFLZweTm/
[22:48:31] nuria, can I see the command? do you want to pair?
[22:48:50] mforns: ya, it is here:
[22:48:54] https://www.irccloud.com/pastebin/PKHxzLcV/
[22:49:10] ok lookin
[22:51:30] mforns: ok, i executed it just on command line
[22:52:01] nuria, maybe it's because of the spaces between commas in the comma separated lists?
[22:52:29] I think the interpreter will probably think it's another argument?
[22:52:58] I would try --dimensions blah,blah,blah without spaces
[22:53:08] or use quotes I guess
[23:04:10] mforns: k let me try
[23:21:16] nuria, I think there's still another space after comma in --time_measures
[23:21:36] mforns: right, corrected and got it to run
[23:21:44] cool!
[23:22:05] nuria, one month of data can take a while depending on data size
[23:22:12] mforns: just failed with "Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.IllegalArgumentException: Since equal to until (0) for last partition key 'hour'."
[23:22:34] hmmmm
[23:23:14] mforns: this is small about 10 per sec
[23:23:25] k
[23:23:26] mforns: let me fix time granularities
[23:28:03] nuria, granularities look good, though
[23:28:40] there's a typo in event.autocompletetype, but this is not the reason it fails...
[23:29:26] did it fail when you executed it for 1 month period, as in load_search.sh?
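Editor's aside on the 22:52 diagnosis: the effect of a space after a comma in a comma-separated flag value can be reproduced with Python's `shlex`, which splits a command line using POSIX shell rules. The flag names `--dimensions` and `--metrics` come from the conversation; the field names are placeholders and the real command from the pastebin is not reproduced here.

```python
import shlex

# A space after each comma makes the shell hand over several separate
# arguments instead of a single comma-separated value.
with_spaces = shlex.split("--dimensions wiki, country, agent_type --metrics hits")

# Without spaces the whole list arrives as one argument.
without_spaces = shlex.split("--dimensions wiki,country,agent_type --metrics hits")

# Quoting also keeps the list in one argument, but the spaces stay inside it.
quoted = shlex.split('--dimensions "wiki, country, agent_type" --metrics hits')

print(with_spaces)
# ['--dimensions', 'wiki,', 'country,', 'agent_type', '--metrics', 'hits']
print(without_spaces)
# ['--dimensions', 'wiki,country,agent_type', '--metrics', 'hits']
print(quoted)
# ['--dimensions', 'wiki, country, agent_type', '--metrics', 'hits']
```

Whether the ingestion job trims the spaces left inside a quoted value is not stated in the log, so the no-spaces form is the safer of the two suggestions.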