[00:06:59] (CR) Chelsyx: Add mobile_apps_uniques_by_country_daily and mobile_apps_uniques_by_country_monthly jobs in oozie (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/451566 (https://phabricator.wikimedia.org/T186828) (owner: Chelsyx)
[00:47:46] Analytics, Analytics-Kanban: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (Milimetric) As of June 21st, the query started returning 0. Indeed, there's no data after that day: ``` mysql:research@analytics-slave.eqiad.wmnet [log]> select max(rev_timestamp) from mediawi...
[01:37:23] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (Graham87) I only found out by accident, but it appears XML versions of these dumps were put up on dumps.wikimedia.org last October ... https://dumps.wikimedia.org/archive/2001-xml/ I'm planning to do my own imports from them. Bu...
[01:58:54] Analytics, Analytics-Kanban: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (kaldari) Thanks for your help Dan!!
[02:26:10] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (Graham87) I would be planning to work with them, but I get the following error using importDump.php to import them under MW1.25 (old, I know, but I don't think an update would fix this): A database query error has occurred. Qu...
[02:59:41] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (Graham87) Oh I see now from the source code for importUseModWikipedia.php: the account UseModWiki admin needs to exist first.
[03:38:45] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (Graham87) The import script seems to halt on the title "Vector space]" or somewhere around there, using the filtered XML dump. So it's not working quite yet.
[07:06:53] (CR) Joal: [C: +1] "Indeed - Missed that one :( Sorry ottomata" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/451784 (https://phabricator.wikimedia.org/T198908) (owner: Ottomata)
[09:18:41] (CR) Joal: Add spark yarn scala and pyspark 'large' kernels (1 comment) [analytics/jupyterhub/deploy] - https://gerrit.wikimedia.org/r/451781 (https://phabricator.wikimedia.org/T201519) (owner: Ottomata)
[09:20:26] Analytics, Datasets-General-or-Unknown, Documentation: Missing documentation for pageviews dataset - https://phabricator.wikimedia.org/T201653 (ArielGlenn) it's there for me. i just checked by clicking on the above link. (I also tried going directly from the analytics index page to see if the link is...
[09:33:31] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (Graham87) After replacing all instances where the title was "Vector_Space]" with "Vector_Space1", the XML file imported perfectly here!
[09:42:04] Analytics, Datasets-General-or-Unknown, Documentation: Missing documentation for pageviews dataset - https://phabricator.wikimedia.org/T201653 (Tgr) Someone just added it (the file is dated to two hours ago). Thanks! It still does not document the format though. Which is [[https://stackoverflow.com/...
[09:44:46] Analytics, Datasets-General-or-Unknown, Documentation: Missing documentation for pageviews dataset - https://phabricator.wikimedia.org/T201653 (ArielGlenn) Ah ha! I hope whoever added the file will update it with the information you need. (I wonder who that kind person was?)
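Graham87's workaround above (09:33) amounts to a scripted find-and-replace over the dump before running importDump.php. A minimal sketch in Python, assuming the 2001 dump fits in memory; the file names are placeholders, and the exact title spelling differs between his two messages:

```python
# Hypothetical pre-import cleanup, mirroring Graham87's manual replacement
# of the malformed "Vector_Space]" title with "Vector_Space1".
import re

with open("2001-wikipedia.xml", encoding="utf-8") as f:   # placeholder name
    xml = f.read()

# Only touch <title> elements, so occurrences in page text are preserved.
xml = re.sub(r"<title>Vector_Space\]</title>",
             "<title>Vector_Space1</title>", xml)

with open("2001-wikipedia-fixed.xml", "w", encoding="utf-8") as f:
    f.write(xml)
```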
[13:25:59] Analytics, Analytics-Kanban: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (Ottomata) I don't know of anything that would have turned off these imports into MySQL. Will look into it!
[13:26:50] (PS2) Ottomata: Add spark yarn scala and pyspark 'large' kernels [analytics/jupyterhub/deploy] - https://gerrit.wikimedia.org/r/451781 (https://phabricator.wikimedia.org/T201519)
[13:27:04] (CR) Ottomata: Add spark yarn scala and pyspark 'large' kernels (1 comment) [analytics/jupyterhub/deploy] - https://gerrit.wikimedia.org/r/451781 (https://phabricator.wikimedia.org/T201519) (owner: Ottomata)
[13:35:16] Analytics: Import 2001 wikipedia data - https://phabricator.wikimedia.org/T155014 (Graham87) I've imported a few pages, including the page on admins and the one on Atlas Shrugged. I've also created accounts and user pages on enwiki for [[https://en.wikipedia.org/wiki/User:Page_move_link_fixup_script |Page mov...
[13:40:04] joal: do you remember if we ever figured out how and when and what makes oozie admin -sharelibupdate work?!?!
[13:40:09] i remember it was magical
[13:40:12] sometimes it worked, sometimes it did nothing
[13:40:16] right now it is doing nothing!
[13:40:52] ottomata: I don't recall at all :(
[13:41:24] ottomata: every time we did it together, it kinda worked, but you told me it wasn't working sometimes
[13:42:14] yeah, and puppet should do it, but my comment was
[13:42:18] # For unknown reasons, oozie admin -sharelibupdate is really flaky.
[13:42:18] # Sometimes it succeeds, sometimes it does nothing. This script will
[13:42:18] # attempt to run it, but you might need to manually do so until
[13:42:18] # -shareliblist shows $spark2_sharelib in the output.
[13:42:38] :(
[13:45:08] ottomata: long SQL update?
[13:45:17] shouldn't be that long though, should it?
[13:45:28] no, it just returns immediately with no output and exit code 0
[13:45:43] ottomata: could be wrongly async
[13:48:41] hm
[13:49:08] ottomata: oozie server could need a shake?
[13:54:17] joal: not totally sure
[13:54:17] but
[13:54:23] i was able to do it via the oozie rest api!!!
[13:54:33] WAT?
[13:54:40] curl http://localhost:11000/oozie/v2/admin/update_sharelib
[13:54:41] did it
[13:54:50] feels like a CLI bug then
[13:54:53] i read through some code
[13:54:58] and saw that the oozie cli code was just calling the rest api
[13:55:08] gonna change the puppet script to just use that
[13:57:35] great ottomata - cool finding!
[14:00:57] joal was going to ask you
[14:01:01] for camus partition checker
[14:01:10] we usually use the camus wrapper
[14:01:23] for all the other uses of camus wrapper, puppet is specifying a very old refinery-camus jar
[14:01:25] which is fine
[14:01:26] but
[14:01:37] refinery 0.0.68 uses newer scala
[14:01:52] and the old camus wrapper gets scala deps from spark 1 classpath
[14:01:55] which is not newer scala
[14:02:04] so
[14:02:18] i either need to modify the camus script wrapper to take a parameter to change the classpath
[14:02:20] or
[14:02:27] we upgrade refinery-camus jar for all uses of camus :D
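For reference, ottomata's REST workaround (13:54) for the flaky `oozie admin -sharelibupdate` boils down to the call below. A minimal sketch assuming Python with `requests` on the Oozie host: the `update_sharelib` endpoint is straight from the log, while the `list_sharelib` polling mirrors the "-shareliblist shows $spark2_sharelib" check from the puppet comment (the loop, timing, and sharelib name are illustrative):

```python
import time
import requests

OOZIE = "http://localhost:11000/oozie"

def update_sharelib(expected="spark2", retries=10, delay=5):
    # The same call the CLI makes under the hood, per ottomata's
    # reading of the Oozie client code.
    requests.get(f"{OOZIE}/v2/admin/update_sharelib").raise_for_status()
    # Poll the sharelib list until the expected lib shows up.
    for _ in range(retries):
        libs = requests.get(f"{OOZIE}/v2/admin/list_sharelib").json()
        names = [lib["name"] for lib in libs.get("sharelib", [])]
        if expected in names:
            return True
        time.sleep(delay)
    return False
```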
[14:06:27] a-team: the web team is soon going to activate a new EL schema: https://meta.wikimedia.org/wiki/Schema:PageIssues and olliv and i were wondering how much it would take to make it (or part of it) compatible with Druid, to be able to access some of the data in Superset
[14:06:39] ..is https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines still the best reference for this?
[14:07:18] HaeB: those guidelines still apply but I can take a look too
[14:08:10] ...and the measures would actually be the number of actions per (say) minute, can druid aggregate/count individual rows into such a measure?
[14:10:50] HaeB: yes, that's what Druid's good at. So the dimensions all look good except pageToken and sessionToken, which Druid would ingest but they wouldn't be useful in any dimensional queries I don't think
[14:11:13] yes, they could and should be dropped
[14:11:19] can one specify that too?
[14:11:26] HaeB: I think we have to tweak something to get it to roll up at the minute resolution, but otherwise everything else looks good
[14:11:41] HaeB: we don't have conventions for excluding fields yet, that would be a good one though!
[14:12:02] how was it done for webrequest?
[14:12:10] I think there's some custom code there
[14:12:37] ottomata: what do you think, should we establish a convention where, like, if we see a prefix of "unique_" or something, as in unique_session_token, we don't use that field when loading into Druid?
[14:13:16] could be a suffix like _nodruid
[14:13:17] (ew)
[14:13:27] Analytics, Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (Ottomata) @EBernhardson try now. The -sharelibupdate command has always been very flaky. Sometimes it just doesn't work, and I don't k...
[14:13:29] some metadata would be nice
[14:13:43] hmm
[14:14:03] unique might be pretty limiting. hm
[14:14:07] yeah
[14:14:19] we could do it the other way, like we do for measure?
[14:14:27] explicitly label dimensions?
[14:14:38] anything that is not either dimension or measure would be skipped?
[14:14:43] seems to me that would make the whole schema harder to read
[14:14:46] yeah
[14:15:28] does the refine/druid loading code have access to the schema description?
[14:15:42] we could put something in there like DO NOT INGEST IN DRUID and check for it
[14:15:50] *field description
[14:16:17] or NOT A DIMENSION, better, in case we use something other than Druid
[14:16:22] milimetric: i don't think it does. also, the EL druid/loading code has never been fully finished, since we didn't have use for it yet, and it would be much easier if this was in the new system (e.g. no capsule)
[14:16:30] mforns: might know more since he worked on that for nav timing
[14:16:41] yeah, I thought he was off today
[14:16:44] ah ok
[14:16:49] (could be wrong)
[14:16:52] helo?
[14:16:56] sorry! you're here
[14:17:06] yea!
[14:17:08] reading
[14:17:18] mforns: what do you think, can/should we add something to the field description of a schema so that it doesn't get ingested as a dimension?
[14:17:44] like "*not a dimension*" and check for the exact string?
[14:18:17] milimetric: refine jobs in hadoop generally don't get the schema
[14:18:19] they could
[14:18:30] ottomata: About scala version, I have no strong opinion - Is there one you prefer?
[14:18:36] milimetric, ottomata, DataFrameToDruid does not have access to the schema
[14:18:42] joal: i think we should upgrade. the code hasn't really changed at all
[14:18:44] it has some conventions
[14:18:46] just the scala version it is compiled with
[14:19:06] ottomata: Works for me :)
[14:19:48] milimetric, we could define a set of tags that add semantics to a field
[14:19:54] mforns: seems like it's hard to come up with a good convention that doesn't make the schema ugly and says "this is a dimension, this is not a dimension"
[14:19:55] like @dimension
[14:20:21] hm
[14:20:23] mforns: yeah, that'd be great, but are you against reading the schema from DataFrameToDruid?
[14:20:40] ottomata: Actually refinery-camus jar already uses the "provided" scala-lang version
[14:20:50] ?
[14:20:57] yes
[14:20:59] but at least we could go with @additive @categorical or things like this
[14:21:00] joal, but camus wrapper
[14:21:02] sets the classpath
[14:21:13] we could read the schema from DataFrameToDruid why not?
[14:21:14] and gets scala from spark1 lib
[14:21:26] which is a different version than we compiled it with in refinery
[14:21:31] ottomata: We need to update either the wrapper or the wrapper calls
[14:21:33] if i use camus wrapper with latest refinery-camus
[14:21:33] it fails
[14:21:36] yeah
[14:21:39] but if we update the wrapper
[14:21:46] i'd be updating it for all usages
[14:21:54] would it be an issue?
[14:21:58] oh maybe 2.11 will work with the older jars?
[14:21:58] hm
[14:22:00] checking
[14:22:21] As long as all usages call the new version we're safe (I'm assuming you wanted to prevent that)
[14:22:51] oh, that's what i was saying
[14:22:55] just make everything use spark2 classpath
[14:23:00] yup
[14:23:03] and update to refinery-camus-0.0.69
[14:23:11] but maybe spark2 classpath works with old refinery too?
[14:23:57] I would expect not (scala updates usually don't go without changes), but we can try
[14:24:14] it sure does!
[14:24:25] mforns: yeah, that's great, I thought we talked about reading the schema before and someone was against it
[14:24:56] so that's awesome, mforns is it too distracting now to come up with the conventions, write them in the guidelines and tell Tilman the result?
[14:24:57] i'm not opposed, but it means you need to have the logic to find the schema and also know about the schema repo now
[14:24:58] probably good
[14:25:06] if we can, that bit should be pluggable
[14:25:08] milimetric, we're talking about schema registry or current schemas?
[14:25:14] because it is going to change
[14:25:45] maybe a function you can give to DataFrameToDruid that takes the DataFrame and returns the JSONSchema
[14:25:47] mforns: it's a new schema that we could try this out with, and carry the idea forward to the MEP if it works well
[14:26:08] but i agree that having the schema will be useful
[14:26:11] esp for things like this
[14:26:17] right ottomata, we keep the "where the schema is" abstract
[14:26:39] i wonder if things like this are more schema metadata? if in MEP it should go in the metadata part
[14:26:44] milimetric, OK, but we should thing a bit about the conventions no?
[14:26:45] instead of being in the schema field description
[14:26:48] so: @dimension, @measure would be good for start
[14:26:49] *think
[14:27:01] hm
[14:27:03] oh
[14:27:05] mforns: yeah, we can take all the time we want,
[14:27:08] like adding that to the schema itself?
[14:27:16] that might be cool.... it would take a while to figure that out
[14:27:16] to the description, that was mforns' idea, yea
[14:27:17] but yeah
[14:27:20] well i mean
[14:27:28] we are going to use a pared-down JSONSchema anyway
[14:27:31] not the full draft
[14:27:37] we might even make one
[14:27:42] that all event schemas need to conform to
[14:27:43] if we do that
[14:27:50] we could also add this tag convention to fields
[14:28:00] so it would be in the schema itself, rather than in freeform description
[14:28:05] e.g.
[14:28:18] oh so the other option you're proposing is the schema itself would be more like Druid's ingestion, like { dimensions: [], measures: [] } ?
[14:28:38] "country": {
[14:28:38] "type": "string",
[14:28:38] "tags": ["dimension"]
[14:28:38] }
[14:28:38] ?
[14:28:45] hmmm oh
[14:28:48] that might be even better
[14:28:52] ah, ok, can you do that on the current EL?
[14:28:58] since in later versions of jsonschema
[14:29:06] e.g. required: is separate from the fields
[14:29:09] so this probably should be too
[14:29:17] milimetric: mostly i'm considering for MEP
[14:29:22] not sure if you can do it now
[14:29:37] well, changing the structure now, definitely not
[14:29:40] i like your idea better
[14:29:41] tho
[14:29:54] marcel's idea, the tag in the description?
[14:29:56] properties: { ....}
[14:29:57] required [...]
[14:29:57] dimensions: [...]
[14:29:57] hmmm
[14:30:01] maybe
[14:30:02] oh
[14:30:04] hmm
[14:30:08] actually i dunno
[14:30:10] tags is more flexible
[14:30:18] i dunno i guess that is for future consideration :p
[14:30:20] for this use...
[14:30:30] i guess description is all we got?
[14:30:46] do you guys want to meet in batcave?
[14:30:49] i dunno, maybe EL would let you make a schema with non standard fields
[14:30:53] we could try it
[14:30:54] sure, to the batcave!
[14:32:29] Error: Invalid key "tags" in "Schema -> Properties (properties) -> database"
[14:33:23] nope, doesn't letya
[14:33:30] since it validates the schema against jsonschema schema
[14:33:36] so we'd need a custom jsonschema schema
[14:33:46] we could actually add that pretty easy to EventLogging extension now, if yall wanted to
[14:34:09] it is here https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/schemas/schemaschema.json
[14:36:50] JSONSchema draft 3 doesn't seem to allow it
[14:36:54] but draft 4 is fine with it
[14:37:19] as does draft 7
[14:38:43] milimetric: mforns, in general i think this is a pretty good idea for MEP, and would let us not have to use field naming conventions
[14:38:53] this is type metadata, and should probably go in the schema anyway
[14:38:54] just like the type does
[14:38:57] ottomata, yea
[14:39:09] what does MEP stand for?
[14:39:14] Modern Event Platform :)
[14:39:20] AKA EoF :p
[14:39:20] oh gotcha
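A sketch of the two conventions discussed above: mforns' exact-string marker in the free-form field description, and the custom `tags` key that EventLogging's schemaschema rejects but the standard draft-4 metaschema tolerates. This assumes the Python `jsonschema` package; field names and marker strings are illustrative:

```python
from jsonschema import Draft4Validator

properties = {
    "country":      {"type": "string", "tags": ["dimension"]},
    "action":       {"type": "string", "tags": ["dimension"]},
    "sessionToken": {"type": "string",
                     "description": "per-session token. *not a dimension*"},
}

# Option 1 (mforns): check the free-form description for an exact marker.
dimensions = [name for name, spec in properties.items()
              if "*not a dimension*" not in spec.get("description", "")]

# Option 2 (ottomata): a custom "tags" key per field. EventLogging's
# schemaschema rejects it ('Invalid key "tags"'), but the generic draft-4
# metaschema ignores unknown keywords, so this raises no SchemaError:
Draft4Validator.check_schema({"type": "object", "properties": properties})
```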
[14:41:09] oh man milimetric. the mysql eventbus stuff is still using the analytics kafka brokers :o
[14:52:44] !log restarting eventlogging-consumer@mysql-eventbus consuming from kafka jumbo-eqiad - T201420
[14:52:48] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:52:48] T201420: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420
[14:54:32] Analytics, Analytics-Kanban, Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (Ottomata) Wow, this is 100% my fault. We stopped using the Kafka cluster that this MySQL import process was configured to use back in June. The configs for this were just...
[15:03:04] (PS1) Ottomata: camus wrapper - Use Spark 2 jars to get Scala and Hadoop dependencies [analytics/refinery] - https://gerrit.wikimedia.org/r/451869 (https://phabricator.wikimedia.org/T198908)
[15:03:12] milimetric: standup?
[15:03:19] thanks for looking into it, everyone! to be clear, for this particular schema, it would be more of a nice-to-have, and a good opportunity to check the requirements right now as we finalize the schema's format. but in general there is quite a bit of interest in making EL data more accessible in such a way (later also with more complicated schemas)
[15:03:52] (CR) Ottomata: [C: +2] CamusPartitionChecker - only send emails if errors are encountered [analytics/refinery/source] - https://gerrit.wikimedia.org/r/451784 (https://phabricator.wikimedia.org/T198908) (owner: Ottomata)
[15:36:02] HaeB, would you like fields pageIdSource and pageTitle (from PageIssues) to make it into Druid?
[15:36:25] no, that can be left out too
[15:37:26] I think you won't need any change for that schema to be loaded into Druid automatically
[15:38:00] HaeB, hmmm, wait
[15:38:04] there's this array field
[15:38:12] issuesSeverity
[15:38:28] I don't think Druid supports that
[15:41:40] HaeB, also the EventLoggingToDruid job does flatten nested object fields, but does not handle array fields...
[15:43:28] if there's the possibility to transform that field into a simple string, that would help
[15:48:24] Analytics, Analytics-EventLogging, MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Performance-Team (Radar): Spin out a tiny EventLogging RL module for lightweight logging - https://phabricator.wikimedia.org/T187207 (Niedzielski) I don't know that this is enti...
[15:54:55] (PS5) Milimetric: Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705)
[15:55:02] (CR) jerkins-bot: [V: -1] Annotate wikistats [analytics/wikistats2] - https://gerrit.wikimedia.org/r/440971 (https://phabricator.wikimedia.org/T194705) (owner: Milimetric)
[15:55:25] I'm gonna take the afternoon off, be back later tonight
[16:44:34] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 5 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Jdlrobson)
[16:45:24] mforns: right, good point about the array... it only consists of one element though for several of the possible values of "action". if one could convert it into a string in these cases during ingestion, and set it to NULL otherwise (i.e. when it has >1 element), that would be useful
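The transform HaeB describes is small enough to sketch: keep `issuesSeverity` when the array collapses to a single element, and NULL it otherwise. This illustrates the per-record logic only; it is not the actual EventLoggingToDruid code, and the function name and sample values are made up:

```python
def flatten_issues_severity(event):
    """Druid can't ingest the array directly, so keep issuesSeverity only
    when it has exactly one element; otherwise set it to None (NULL)."""
    severity = event.get("issuesSeverity")
    if isinstance(severity, list) and len(severity) == 1:
        event["issuesSeverity"] = severity[0]
    else:
        event["issuesSeverity"] = None
    return event

# e.g. {"issuesSeverity": ["MEDIUM"]}        -> "MEDIUM"
#      {"issuesSeverity": ["LOW", "MEDIUM"]} -> None
```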
[16:59:56] Analytics, Datasets-General-or-Unknown, Documentation: Missing documentation for pageviews dataset - https://phabricator.wikimedia.org/T201653 (Tbayer) Instead of maintaining Readme documentation on dumps.wikimedia.org, we should link back to the corresponding documentation pages on Wikitech, which...
[17:01:00] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 5 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Ottomata) > This problem came up before in the context of consistently handling sampling...
[17:07:46] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 5 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Tbayer) >>! In T201124#4485329, @pmiazga wrote: > When refactoring, please keep in mind,...
[18:13:10] mforns: shall i merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/451780/1/modules/profile/manifests/analytics/refinery/job/data_purge.pp ?
[18:26:54] joal yargghh, trying to figure this map type thing out
[18:27:07] i think it's not going to work, and the probability array key, value thing is pretty cumbersome to query, no?
[18:27:16] maybe we should just deal with it and refine as structs?
[18:27:18] the schema will be nasty
[18:27:31] but it will be about as easy to query as the map
[18:27:36] and refine will work.
[18:27:58] for a minute i was trying to figure out if we could somehow in the json schema specify that an object is a map
[18:28:00] or make a type: map
[18:28:02] or something
[18:28:11] type: map :p
[18:28:15] in the jsonschema itself
[18:28:21] seems way to custom though
[18:28:23] too
[18:28:54] Analytics, Analytics-EventLogging, MediaWiki-extensions-WikimediaEvents, Page-Issue-Warnings, and 6 others: Provide standard/reproducible way to access a PageToken - https://phabricator.wikimedia.org/T201124 (Niedzielski) > Add a getter for the pageViewToken property defined in the ext.eventLoggi...
[18:40:54] if we had that tho
[18:41:09] i could make the jsonschema converter for Kafka Connect write parquet with a map directly....
[18:43:20] ottomata: same conclusion for me - Too cumbersome to convert using casts
[18:43:44] ottomata: only way I can think of is by reading data twice (which we managed to prevent using the cast thing)
[18:43:47] Meh
[18:44:32] aye reading it twice, the second time with the hive schema?
[18:44:32] yeah.
[18:44:33] hm
[18:44:51] if we knew the schema of the map field from the jsonschema though
[18:44:54] i think Kafka Connect could do it
[18:45:08] ottomata: Add a schema using map in the registry?
[18:45:15] yes
[18:45:27] my prototype converter
[18:45:27] https://github.com/ottomata/kafka-connect-jsonschema
[18:45:29] works
[18:45:39] but, there's no map type in JSONSchema
[18:45:45] we'd need to make something up
[18:45:51] Mwarf
[18:45:59] but, the code to convert Maps is there
[18:46:03] if we knew from the schema
[18:46:11] https://github.com/ottomata/kafka-connect-jsonschema/blob/master/src/main/java/org/wikimedia/kafka/connect/jsonschema/JsonSchemaConverter.java#L613
[18:47:12] ottomata: No registry yet - So this leaves us with either arrays and explicit field for names, or structs
[18:47:45] at least in the very short term, i prefer the structs option :/
[18:48:03] but joal, i can run this Kafka Connect code in prod now
[18:48:07] You're the Master, Master :)
[18:48:08] it uses the eventbus /schemas endpoint
[18:48:10] to look up the schema
[18:48:46] but ya, we'd need some nasty custom jsonschema stuff to make map work
[18:48:47] sooooo
[18:48:48] yeah
[18:48:57] ok. if we are going to keep structs
[18:49:06] i'd say let's keep them close to the ORES response
[18:49:08] removing the rev_id
[18:49:20] and possibly some extra hierarchy levels in the object
[18:49:28] putting model and version together in the scores object, etc.
[18:52:31] scores.damaging.probability.false
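Putting the struct option together: a scores object keyed by model name, with the model version folded in and rev_id dropped from inside it, so the query path above reads naturally. This sketches the shape under discussion, not the final schema; field names beyond those mentioned in the log are assumptions:

```python
revision_score_event = {
    "scores": {
        "damaging": {
            "model_version": "0.4.0",   # illustrative version string
            "prediction": False,
            "probability": {"false": 0.93, "true": 0.07},
        },
        # one struct per model, e.g. "goodfaith", "reverted", ...
    },
}

# With structs this stays queryable without exploding key/value arrays:
#   scores.damaging.probability.false
assert revision_score_event["scores"]["damaging"]["probability"]["false"] == 0.93
```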
[19:01:48] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata) > I've asked a question to Analytics whether the current schema of ORES will be good for their use-case if we r...
[19:25:13] Analytics, LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (herron) Looping in analytics to look into that
[19:28:47] ottomata, I copied the salt that I'm using to backfill Q3 (natural year 2018) over to /user/hdfs/eventlogging-sanitization-salt.txt, this way when the cron job is deployed, it should recognize it and be fine. I think you can merge it, yes
[19:29:05] thanks :]
[19:31:08] Analytics, Analytics-Kanban, wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Varnent) @nuria - good news - we now have access to piwik site via LDAP logins - bad news - wiki15 does not have access to the new website's data: {F24...
[19:32:12] Analytics, Analytics-Kanban, wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Varnent)
[19:32:13] Analytics, LDAP-Access-Requests: LDAP access for HWalls and GVarnum - https://phabricator.wikimedia.org/T201468 (Varnent) Open>Resolved a: Varnent @herron - thank you! I mentioned it in the primary ticket and so closing this one since it was specific to LDAP. :) Thank you again for your help!!
[19:34:11] hmm, i guess it is friday mforns, shall we merge on monday?
[19:34:17] not that it would break anything, but might cause cronspam
[19:35:20] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Halfak) A couple of notes: 1. For the drafttopic model, "prediction" will be a list of strings 2. The set of class names...
[19:38:45] Analytics, Analytics-EventLogging, EventBus, Operations, and 2 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (kchapman) This is being placed on Last Call closing August 22nd ending at 2pm PST (22:00 UTC, 23:00 CET)
[19:39:38] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata) New names won't hurt. Yargh, list of strings will. I had thought that prediction was always the same type....
[19:40:20] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Modify revision-score schema so that model probabilities won't conflict - https://phabricator.wikimedia.org/T197000 (Ottomata) Actually, it won't totally hurt since we are now key-ing by model name. As long as any given model always uses...
[21:34:01] Analytics, Analytics-Kanban, Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (kaldari) Hmm, that's unfortunate. I think @Nettrom would be the best person to answer your question: How important do you think it would be to backfill the missing data? I'...
[21:45:24] Analytics, Analytics-Kanban, Patch-For-Review: Page creation data no longer updates - https://phabricator.wikimedia.org/T201420 (Nettrom) I don't think backfilling all the data is very important. The only ones that appear to be affected are the NPP reviewers, and I should be able to run some queries...