[00:23:28] Analytics-Kanban: Implement purging settings for Schema:ReadingDepth - https://phabricator.wikimedia.org/T167439#3346722 (Tbayer)
[00:27:48] Analytics-Kanban: Implement purging settings for Schema:ReadingDepth - https://phabricator.wikimedia.org/T167439#3346723 (Tbayer) @mforns: Looks good, thanks!
[00:27:59] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3346724 (AndrewSu) My initial thought is that there will be two types of metrics. First, we want to look at **statement-level** me...
[00:34:09] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3346729 (Smalyshev) > Those statements might be part of the output of the SPARQL query, or they might simply be structural intermed...
[01:20:35] Analytics-Kanban, Patch-For-Review: Load webrequest raw data into druid so ops can use it for troubleshooting - https://phabricator.wikimedia.org/T166967#3346806 (faidon) This is definitely interesting, so many thanks on behalf of all of us for setting this up and thinking of us! :) Like every tool, I t...
[09:32:25] (PS1) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[10:00:32] (PS2) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[10:15:34] (PS3) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[10:23:33] (PS4) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[10:30:16] (PS5) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[11:07:31] Analytics-Kanban: Use native timestamp types in Data Lake edit data - https://phabricator.wikimedia.org/T161150#3347455 (JAllemandou) Code ready on spark side, but timestamps in Parquet have been added to Hive in 1.2 version and we have 1.1 :( https://issues.apache.org/jira/browse/HIVE-6384).
[11:08:55] Taking a break a-team, later !
[12:43:55] making some tea and then imma crush your last test milimetric
[12:45:48] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3347733 (Paladox) I created this fix https://gerrit-review.googlesource.com/#/c/110055/ that should fix it. I am unsure if there planni...
[12:47:16] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3124611 (Paladox) a: Paladox
[13:14:22] (CR) Mforns: "LGTM!" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/353287 (https://phabricator.wikimedia.org/T164021) (owner: Nuria)
[13:19:26] fdans: reading your code a little and thinking about next steps
[13:19:48] milimetric: what code?
[13:19:57] that passed the tests yesterday
[13:20:02] ah cool
[13:20:31] I'm with the double breakdown one
[13:21:15] the crossfilter api is surprisingly close
[13:21:33] it's almost not worth wrapping
[13:23:50] aaaaalmost...
[13:24:19] omg I can't believe what I just done works
[13:24:54] I love huge unmaintainable oneliners that will never make it into production
[13:27:49] milimetric: I'm very proud of this:
[13:28:12] https://usercontent.irccloud-cdn.com/file/DB4jLGet/Screen%20Shot%202017-06-14%20at%2015.27.34.png
[13:28:51] I aspire not to need a minifier one day
[13:31:22] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3347899 (Aklapper) p: Triage>Low
[13:32:23] lol
[13:32:37] fdans: that's aaaalmost readable
[13:33:09] milimetric: it's pretty crazy what you can do with cf's reducer
[13:33:37] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3124611 (Paladox) Making this a subtask to T156120 since gerrit 2.13 will have no more releases and they did not back port it to the 2....
[13:33:39] but you can do an arrow function and a ? operator there to make it a little smaller, no?
[13:33:52] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3347907 (Paladox)
[13:36:28] milimetric: yeaaah this is something i put together in dev tools, and pressed enter :)
[13:36:36] and yeeeah double breakdowns
[13:37:18] oh, dev tools allows arrow functions now, it's great
[13:38:12] yeah, I'm using one at the top, I'm just not that used to writing them down while sketching something
[13:41:29] milimetric: are we ever going to need more than two breakdowns?
[13:41:36] I know, I still have to look up all these fancy things
[13:41:47] fdans: more than two is not allowed by design right now
[13:41:52] two yes, because time and one other
[13:45:13] fdans: what do you think of the shape of the output for breakdowns? I wasn't super sure if that was a good idea
[13:45:42] like the breakdown dimension becomes the keys to the results
[13:45:58] the end user would still have to sort by keys then
[13:46:06] yeah I was thinking that
[13:46:17] but it favours the columns being generic
[13:46:33] not assuming that the keys are going to be dates
[13:46:56] could be an ordered array though, with only the broken down columns being specified
[13:47:20] that is sort of what crossfilter returns
[13:47:37] like if your object usually has a, b, c, d and you breakdown by a and c, you'd get results like: [{a: '...', c: '...', measure: 1}, {a: '......}, ... ]
[13:47:49] yeah, maybe that's better
[13:48:02] i'll change the tests
[13:48:11] hold on
[13:48:14] milimetric:
[13:48:15] holding
[13:48:19] let me pong and you change them
[13:48:21] k
[13:48:36] 15:33:09 < fdans> milimetric: it's pretty crazy what you can do with cf's reducer --> fdans, you're ready for scala :)
[13:48:48] :DDD
[13:49:02] joal you'd like crossfilter
[13:49:12] huhu
[13:49:18] I never had any problems with scala's functional stuff, it's the polymorphic syntax that kills me
[13:49:54] polymorphic as in "making the most of type generalisation?"
[13:49:57] (clarification: not syntax that deals with polymorphism, but that there are two three ways to do the same thing)
[13:50:08] Ah yes :)
[13:50:13] milimetric: pong! but beware that the breakdown fn is ugly af right now
[13:50:38] fdans: ok, do you agree with the new shape idea?
[13:50:48] And obviously, among those 2 or 3 ways, one is said to be "the scala way" by some people that don't even have special titles about scala
[13:51:13] milimetric I do, I think I can modify cf's output a bit
[13:51:23] yeah, same as with "pythonic" things, but somehow it manages to be super confusing in scala
[13:51:45] hehe
[13:51:48] fdans: I'll give it a shot (changing the tests + the code) and then we can chat in the batcave a bit about next steps?
[13:53:19] milimetric: sure
[13:54:38] milimetric: you should only need to touch both reduces at the end (forgot to move the reduce to once the if/else block is closed)
[13:55:14] k
[14:01:05] fdans: I think unique should sort alphabetically or use an optional function passed to it?
[14:02:33] milimetric: we can also use a Set?
[14:03:10] is that kept in order? I was thinking an Array so we can pass it to d3 scales
[14:03:17] and d3 extent
[14:03:40] (I'm still puzzling over breakdown, bothers me there have to be two paths :))
[14:05:17] milimetric: there def don't have to be two paths... that function needs a second iteration
[14:06:49] fdans: batcave?
[14:06:56] yas
[14:18:05] Analytics-Kanban, Analytics-Wikistats: AQS Api works with DimensionalData - https://phabricator.wikimedia.org/T167681#3348145 (Milimetric) a: Milimetric
[14:36:57] Analytics-Kanban, Analytics-Wikistats: Interface from Detail page to DimensionalData - https://phabricator.wikimedia.org/T167680#3340666 (fdans) a: fdans
[14:55:15] Analytics, Discovery-Analysis: Report updater should support Graphite mapping plugins - https://phabricator.wikimedia.org/T152257#3348326 (debt)
[14:58:22] fdans: I'm pretty sure we're not using dimensions properly here
[14:58:33] oh no, that's right
[14:58:35] if we do use them, it seems we should cache them
[14:58:37] we gotta reuse them
[14:58:40] yes
[14:58:44] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review, User-Elukey: Review Megacli Analytics Hadoop workers settings - https://phabricator.wikimedia.org/T166140#3348350 (Nuria) Open>Resolved
[14:58:48] sorry, forgot to mention that
[14:58:56] Analytics-Kanban, Patch-For-Review: Test failures in refinery master - https://phabricator.wikimedia.org/T166334#3348351 (Nuria) Open>Resolved
[14:59:15] Analytics-Kanban, Patch-For-Review: Correct pageview_hourly loading scheme on pivot homescreen - https://phabricator.wikimedia.org/T167068#3348356 (Nuria) Open>Resolved
[14:59:25] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Update puppet for new Kafka cluster and version - https://phabricator.wikimedia.org/T166162#3348358 (Nuria)
[14:59:27] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Update kafka.sh wrapper script for Kafka 0.10+ - https://phabricator.wikimedia.org/T166164#3348357 (Nuria) Open>Resolved
[14:59:44] Analytics-Kanban: Update druid unique Devices Dataset to only contain hosts having more than 1000 uniques - https://phabricator.wikimedia.org/T164183#3348359 (Nuria) Open>Resolved
[15:00:13] Analytics-Kanban: Initial Launch of new Wikistats 2.0 website - https://phabricator.wikimedia.org/T160370#3348363 (Nuria)
[15:00:15] Analytics-Kanban: Create yaml UI configuration files for Standard Metrics - https://phabricator.wikimedia.org/T166387#3348362 (Nuria) Open>Resolved
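(Editor's note: the 13:45-13:48 exchange above settles on a breakdown result shape in which only the broken-down columns appear as keys, e.g. rows with fields a, b, c, d broken down by a and c become [{a, c, measure}]. The sketch below is a hypothetical, language-neutral illustration of that shape in Python; the actual work discussed here wraps the crossfilter JavaScript library, and none of these names come from that codebase.)

```python
# Editorial sketch only: illustrate the proposed breakdown output shape,
# not the DimensionalData/crossfilter implementation being discussed.
from collections import defaultdict

def breakdown(rows, by, measure="measure"):
    """Group `rows` by the columns in `by`, summing `measure`.

    Returns a list shaped like [{a: ..., c: ..., measure: n}, ...],
    keyed only by the broken-down columns.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in by)
        totals[key] += row[measure]
    return [dict(zip(by, key), **{measure: value})
            for key, value in sorted(totals.items())]

rows = [
    {"a": "x", "b": 1, "c": "m", "d": 0, "measure": 2},
    {"a": "x", "b": 2, "c": "m", "d": 1, "measure": 3},
    {"a": "y", "b": 1, "c": "n", "d": 0, "measure": 5},
]
print(breakdown(rows, by=["a", "c"]))
# [{'a': 'x', 'c': 'm', 'measure': 5}, {'a': 'y', 'c': 'n', 'measure': 5}]
```

(With this shape the caller can still sort by whichever returned key it cares about, which is the concern raised at 13:45:58, and the columns stay generic rather than assuming the keys are dates.)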
[15:00:32] ping mforns elukey joal
[15:00:46] elukey is offtoday
[15:00:46] Analytics-Kanban: Initial Launch of new Wikistats 2.0 website - https://phabricator.wikimedia.org/T160370#3096561 (Nuria)
[15:01:00] Analytics-Kanban: Document that old deleted pages have empty fields in Analytics Cluster edit data - https://phabricator.wikimedia.org/T165201#3348367 (Nuria) Open>Resolved
[15:05:12] (CR) Mforns: "LGTM overall! Some comments that may be nonsense :]" (10 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150) (owner: Joal)
[16:10:06] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3348674 (Aklapper) No, this is not a subtask - it is exactly the other way round (subtasks need to get resolved before parent tasks can...
[16:10:27] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3348675 (Aklapper)
[16:10:44] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3124611 (Aklapper)
[16:12:27] Analytics, Operations, Traffic, Patch-For-Review: Implement Varnish-level rough ratelimiting - https://phabricator.wikimedia.org/T163233#3348682 (Nuria)
[16:14:31] Analytics-Tech-community-metrics, Gerrit, Upstream: Gerrit patchset 99101 cannot be accessed: "500 Internal server error" - https://phabricator.wikimedia.org/T161206#3348685 (Paladox) Oh i see. It was confusing which way. But i see now.
[16:17:18] a-team, ah! forgot to mention
[16:17:25] T166510
[16:17:29] https://phabricator.wikimedia.org/T166510
[16:17:48] not found?
[16:17:51] 404 for me ottomata
[16:17:52] quote for new druid nodes came in. We only got the quote so soon because we wanted to try to get them this quater
[16:18:00] Oh, maybe its secret because of quote
[16:18:13] aha
[16:18:15] so, anyway, we have a quote, so I suppose could place the order for 3 new druid nodes now if we wanted
[16:18:35] it'll still be part of next FY budget
[16:18:45] but we weren't really thinking about getting them so soon
[16:18:46] but we could!
[16:18:48] should we?
[16:18:58] Given we think of using druid for wikistats backend, I'd say yes
[16:19:13] +1
[16:20:48] nuria_: ^ opinion?
[16:21:05] ottomata: on meeting, can talk in a bit
[16:21:24] k
[17:04:03] Analytics, Operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#3348807 (Nuria)
[17:06:07] Analytics, Operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#1798302 (Nuria) >as well as the pageview API, which is currently low on backend capacity. Correction: pageview API has been rebuild since last comment and it can handle a LOT...
[17:06:56] +1 ottomata, the sooner we can test the ideas we have for this backend the better
[17:07:40] fdans: so I'm thinking we should set up a crossfilter.dimension each time "measure" is called, and use that to do all the other operations
[17:07:40] Analytics, Operations, Traffic, Patch-For-Review: Implement Varnish-level rough ratelimiting - https://phabricator.wikimedia.org/T163233#3190763 (Nuria) The fact that no requests have been throttled of late in PageviewAPI (see 429 graph below) kind of tells me that PageviewAPI has received too f...
[17:07:51] there's no reason for a new dimension otherwise
[17:08:19] there's no reason for one at all, really, it caches and optimizes operations like top and bottom which I don't think will be very common
[17:08:33] but it seems there's no other way to work with the data, so that's ok
[17:09:05] I found this tutorial which makes a lot more sense than the documentation: http://animateddata.co.uk/articles/crossfilter/
[17:09:56] leaving for now a-team - later
[17:10:03] bye joaaaal :]
[17:19:08] milimetric: yeah and cache dimensions in a dimensions object
[17:20:30] fdans: I wonder though, why would we cache them?
[17:20:56] I can't think of any situation where we'd need to change what we measure actually
[17:21:23] so the metric will have some config that will define what to measure, we can even pass that when we construct the DimensionalData
[17:21:58] and then that never changes, we can leave measure implemented but I think we can afford to .dispose() the active crossfilter dimension and create a new one just fine
[17:22:14] since I don't even know when measure will be called
[17:22:16] want me to do that?
[17:22:40] then I think I can finish breakdown and the ping/pong
[17:26:23] fdans: doing (lemme know if you disagree)
[17:27:34] milimetric: if we're disposing of them I think we're fine
[17:27:45] creating dimensions won't harm performance
[17:27:52] docs says it will
[17:27:59] it says "be careful!"
[17:28:00] it's keeping them
[17:28:00] :)
[17:28:11] not creating them I think
[17:28:27] so if you make one and don't keep it they get gc-ed?
[17:28:54] no because they are part of cf's state I think
[17:30:04] right, so then .dispose is required, and making them without tracking them is bad
[17:30:05] k, doing
[17:30:25] niiice
[17:46:52] ottomata: +1 to order nodes that way we have them in place for new backend
[17:54:08] oook
[17:54:09] !
[18:32:10] Analytics, Operations, Traffic, Patch-For-Review: Implement Varnish-level rough ratelimiting - https://phabricator.wikimedia.org/T163233#3349142 (GWicke)
[18:32:14] Analytics, Operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#3349137 (GWicke) Open>Resolved a: GWicke @bblack and myself looked into this yesterday after the deployment of the more aggressive global limits, and found that leg...
[18:33:53] Analytics: Incorporate data from the GeoIP2 ISP database to webrequest - https://phabricator.wikimedia.org/T167907#3349143 (Nuria)
[18:34:48] Analytics-Kanban, Patch-For-Review: Load webrequest raw data into druid so ops can use it for troubleshooting - https://phabricator.wikimedia.org/T166967#3349155 (Nuria) File ticket on this regard, we should follow up on this separately: https://phabricator.wikimedia.org/T167907 let's go ahead with add...
[18:40:13] Analytics-Cluster, Analytics-Kanban: Hadoop cluster expansion. Add Nodes - https://phabricator.wikimedia.org/T152713#3349167 (Ottomata)
[18:47:34] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an EventLogging event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3349213 (kaldari) > * What are the main metrics you wish to calculate? What is the frequenc...
[19:14:28] Analytics-Kanban: Provide cumulative edit count in Data Lake edit data - https://phabricator.wikimedia.org/T161147#3349375 (JAllemandou) a: JAllemandou
[19:14:53] (PS1) Joal: Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147)
[19:22:58] nuria_: yt?
[19:23:44] ottomata: on meeting , can talk in 10 mins
[19:23:58] k
[19:30:10] afk for a few, back shortly
[19:30:13] ottomata: back, wassupopopp
[19:30:16] oh ok
[19:30:38] ok so i'm puppetizing a change to insert some eventbus events into mysql m4 master
[19:30:44] qs:
[19:31:04] should we insert these into the same 'log' database? or a separate new 'event' database
[19:31:29] using the same db will be eaiser i guess, since eventlogging_sync replication only looks at log eb
[19:31:30] db
[19:31:57] ottomata: mmm, ya, i think (unless we are worried about number of tables in db) that shoudl be fine
[19:32:19] ottomata: more so when eventbus events are much more limited in variety than el events
[19:32:25] ottomata: also great for purging
[19:32:28] oh right
[19:32:35] because they won't be in whitelist, so they'll get auto purged?
[19:32:36] if they are in log db?
[19:32:42] ottomata: variety i mean that is less likely than teh number of tables will explode
[19:32:44] ok, i think i had another q but now i can't remmeber
[19:32:51] ottomata: right
[19:33:13] ottomata: unless data goes in whitelist it will be autopurged, we can probably ping kaldari about this
[19:33:24] ottomata: on ticket might be better
[19:34:14] ok
[19:34:36] ottomata: is all data on those events of public nature?
[19:34:50] ottomata: how is the capsule thingy going to work?
[19:35:01] ottomata: i see schema validation from your changes
[19:35:09] ottomata: is there a way to say there is no capsule?
[19:36:41] : is all data on those events of public nature? yes
[19:36:47] capsule, ya it sfine
[19:37:33] ottomata: we tell EL that there is "no" capsule
[19:37:53] ottomata: then we need to watch out what columns are the table created going to have so replication can work
[19:37:53] nuria_: https://github.com/wikimedia/eventlogging/blob/master/eventlogging/jrm.py#L271-L283
[19:38:02] oh right
[19:38:08] oh but it adds an id field top level
[19:38:09] hmmm
[19:38:14] ottomata: righttt
[19:38:22] ottomata: the timestamp and id are part of capsule
[19:38:33] ottomata: and both are needed for replication and purging
[19:38:46] ottomata: unique id at least
[19:39:18] i think it only needs id
[19:39:20] no?
[19:39:23] for replication?
[19:42:23] ah, nuria_ yeah, jrm.py is adding the id field
[19:42:27] https://github.com/wikimedia/eventlogging/blob/master/eventlogging/jrm.py#L337
[19:42:32] (PS6) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[19:42:37] better hope nobody adds a top level id field in an event bus schema
[19:43:20] HMmm, i should make some auto converstion from our _dt 8601 json timestamps to mysql timestamps, like ori's got going for MW style timestamps.
[19:43:22] ottomata: jaja ya, that would be something to document, but w/o capsule i am not sure whether column name will be id or event_id
[19:43:42] it is id
[19:43:46] ottomata: and i would look at purging code for assumptions as of columns
[19:43:56] the event_ prefex
[19:44:14] comes because when encapsulating, the event from the client side gets stuck in an event. subobject
[19:44:15] so
[19:44:39] { cap_field1, cap_field2, event: { field1, field 2} }
[19:44:42] and then when insertting into mysql
[19:44:46] the event is flattened
[19:44:47] so
[19:44:56] { cap_field1, cap_field2, event_field1, event_field2 }
[19:45:04] so
[19:45:06] with no 'encapsulate'
[19:45:13] there's no 'event' object to flatten
[19:45:20] there are lots of subobjects though
[19:45:22] those will be flattened
[19:45:25] e.g. performer
[19:45:36] performer.user_groups -> performer_user_groups
[19:45:42] meta.dt -> meta_dt
[19:45:45] ottomata: ok
[19:46:57] (PS2) Joal: Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147)
[19:48:04] but ya nuria_
[19:48:18] does the auto purging rely on a top level (capsule) timestamp field?
[19:48:54] Analytics, Operations, Traffic: Increase request limits for GETs to /api/rest_v1/ - https://phabricator.wikimedia.org/T118365#3349563 (Nuria) >which matches metrics end points explicitly limited at 100/s per client IP. mmm... looking at pageview API dashboard i can see some of lawful traffic (spike...
[19:49:11] ???
[19:49:12] # Timestamps are stored as VARCHAR(14) columns.
[19:49:12] impl = sqlalchemy.Unicode(14)
[19:49:13] in mysql?
[19:49:15] why?
[19:49:17] oooOkkkkk
[19:49:29] ottomata: i think it relies on unique ids and timestamps, let me see
[19:49:30] psshhh 20160522003038
[19:49:36] whyyyy?
[19:49:50] fine then! if ori's code doesn't convert them, I won't either!
[19:49:54] ISO 8601 will work fine for sorting!
[19:50:01] if we are string sorting anyway :)
[19:50:28] grrrr maybe I will do it
[19:50:32] :)
[19:51:15] ottomata: ayay
[19:52:50] mforns: where is the patch we are working on for purging script?
[19:53:54] nuria_, it's this one: https://gerrit.wikimedia.org/r/#/c/356383/
[19:54:18] Analytics-Kanban, Patch-For-Review, User-Elukey: Improve purging for analytics-slave data on Eventlogging - https://phabricator.wikimedia.org/T156933#3349569 (Nuria) Code here: https://gerrit.wikimedia.org/r/#/c/356383/
[19:55:07] mforns: does script assume unique ids?
[19:55:19] mforns: on records?
[19:55:24] ottomata: couldn't get to it again, and I have a meeting in 5, are you around after 17:00>?
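(Editor's note: at 19:44-19:45 above, ottomata describes how events without the EventCapsule still get nested sub-objects such as performer and meta flattened into underscore-joined MySQL column names. The sketch below is only an illustration of that described behaviour, not the actual eventlogging jrm.py code linked earlier.)

```python
# Editorial sketch of the flattening behaviour described above:
# nested keys become underscore-joined column names.
def flatten(event, prefix=""):
    """Flatten nested dicts: {'meta': {'dt': ...}} -> {'meta_dt': ...}."""
    flat = {}
    for key, value in event.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the column-name prefix.
            flat.update(flatten(value, prefix=column + "_"))
        else:
            flat[column] = value
    return flat

event = {
    "meta": {"dt": "2017-06-14T19:45:00Z"},
    "performer": {"user_groups": ["sysop"]},
    "page_title": "Example",
}
print(flatten(event))
# {'meta_dt': '2017-06-14T19:45:00Z', 'performer_user_groups': ['sysop'], 'page_title': 'Example'}
```

(Under this convention a capsule-wrapped client event lands in columns prefixed with event_, while capsule-less EventBus events only get prefixes from their own sub-objects, e.g. performer_user_groups and meta_dt as quoted above.)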
[19:55:24] nuria_, yes
[19:55:27] mforns: or just an autoincrement id cc ottomata
[19:55:46] nuria_, it uses uuid's not id
[19:55:51] soem tables do not have id
[19:55:53] cc ottomata
[19:55:59] but all of them have uuid's
[19:56:09] and they should be unique
[19:57:13] Analytics-Dashiki, Analytics-Kanban, MW-1.30-release-notes (WMF-deploy-2017-06-13_(1.30.0-wmf.5)), Patch-For-Review, Wikimedia-log-errors: Warning: JsonConfig: Invalid $wgJsonConfigModels['JsonConfig.Dashiki'] array value, 'class' not found - https://phabricator.wikimedia.org/T166335#3349577 (10...
[19:58:19] ah! it just works with sql alchemy's built in DateTime :)
[19:58:26] ottomata: so the purges assumes that all tables have a uuid column
[19:58:32] yes
[19:58:35] mforns: uhhhh
[19:59:00] not sure i understand, why is a uuid needed for purging?
[19:59:14] ottomata, we use it for the limit offset
[19:59:20] uuid?
[19:59:22] we can not delete all records at the same time
[19:59:31] like you say where uuid IN( ...)
[19:59:32] big list?
[19:59:45] so, one way is to select the uuids of the time range to delete
[20:00:00] and use limit offset to get small slices of the uuids
[20:00:05] yes
[20:00:14] can't you just delete with limit, offset?
[20:00:29] ottomata, yes! but there's also the case where we need to update
[20:00:37] for partial purging, some fields are set to NULL
[20:00:41] and others kept
[20:00:58] but
[20:00:59] ya, but can't you update with limit, offset too?
[20:01:02] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an EventLogging event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3349583 (Nuria) @kaldari: can you create pages on meta with definitions so we do not have t...
[20:01:22] ottomata, we could if all records had the same update values
[20:01:25] but they dont
[20:01:32] sorry
[20:01:35] oh, they aren't all just NULL?
[20:01:44] oh, different rows might have different update values?
[20:01:47] in the same schema?
[20:01:50] no, you can't update with limit offset in mariadb
[20:02:17] no, no I was confused, that happened when we were trying to update the userAgent (map) field
[20:02:30] ah
[20:02:41] but not the case any more, we could do update limit offset, but mariadb does not support that
[20:02:42] hmmmmmmmmmmmm
[20:03:03] but, the only reason schemas have uuid is because they are eventlogging analytics specific :(
[20:03:11] lemme look at uuid creation...
[20:03:29] ottomata, but right now I'm trying to change the script to not use uuids, and work on timestamps instead
[20:03:50] :D
[20:03:51] yeah!
[20:03:53] the slices won't be so exact, but I think it'll work
[20:04:04] but, also, the top level timestamp field is eventlogging capsule specifc
[20:04:04] hm
[20:04:20] however, we have to talk to Luca...
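(Editor's note: above, mforns describes moving the purging script away from uuid + LIMIT/OFFSET slices toward timestamp-based slices, since MariaDB does not support OFFSET on UPDATE/DELETE. The sketch below is a minimal illustration of that approach under assumed table and column names; it is not the patch under review at https://gerrit.wikimedia.org/r/#/c/356383/.)

```python
# Editorial sketch (hypothetical names): purge rows in timestamp slices
# instead of paging through uuids with LIMIT/OFFSET.
from datetime import datetime, timedelta

def purge_by_timestamp(cursor, table, start, end, slice_minutes=60):
    """Delete rows between start and end, one time slice at a time.

    EventLogging timestamps are stored as VARCHAR(14) MediaWiki-style
    strings (e.g. 20160522003038), so string comparison matches time order.
    """
    current = start
    while current < end:
        upper = min(current + timedelta(minutes=slice_minutes), end)
        cursor.execute(
            "DELETE FROM {} WHERE timestamp >= %s AND timestamp < %s".format(table),
            (current.strftime("%Y%m%d%H%M%S"), upper.strftime("%Y%m%d%H%M%S")),
        )
        current = upper

# Hypothetical usage with a DB-API cursor: purge rows older than 90 days.
# purge_by_timestamp(cursor, "ExampleSchema_99999999",
#                    start=datetime(2015, 1, 1),
#                    end=datetime.utcnow() - timedelta(days=90))
```

(For whitelisted schemas the DELETE would instead be an UPDATE that sets the non-whitelisted columns to NULL over the same timestamp slices, matching the partial-purging case discussed at 20:00:37.)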
[20:04:30] aha
[20:04:40] less so than uuid though
[20:04:49] i could make jrm.py add the timestamp field if it doesn't exist
[20:05:08] ottomata, makes sense
[20:05:29] it's an intuitive and sensible convention
[20:05:40] to have a timestamp field
[20:07:20] (PS7) Joal: Use native timestamps in mediawiki history [analytics/refinery/source] - https://gerrit.wikimedia.org/r/358916 (https://phabricator.wikimedia.org/T161150)
[20:08:45] (PS3) Joal: Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147)
[20:09:51] ottomata: side note: where were our notes about eventlogging and schemas and guidelines to make it easy to (schema wise) to use druid and eventlogging on hadoop
[20:11:32] uhhh hmm
[20:11:37] i think we posted the ehterpad link on some ticket
[20:13:07] not sure nuria
[20:15:11] ottomata: https://etherpad.wikimedia.org/p/analytics-notes
[20:16:05] haha
[20:16:06] -notes
[20:21:19] (PS4) Joal: Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147)
[20:22:38] Analytics, Analytics-EventLogging, Contributors-Analysis, EventBus, and 5 others: Record an EventLogging event every time a new content namespace page is created - https://phabricator.wikimedia.org/T150369#3349662 (kaldari) Sure, I was going to create it at https://meta.wikimedia.org/wiki/Researc...
[20:25:29] (CR) jerkins-bot: [V: -1] Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147) (owner: Joal)
[20:31:34] (PS5) Joal: Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147)
[20:39:05] ya nuria_ i thought someone was gonna put those on a wiki somewhere :)
[20:47:03] ottomata: how about https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Schemas
[20:48:27] ottomata: or maybe https://www.mediawiki.org/wiki/Extension:EventLogging/Guide#Creating_a_schema
[20:48:30] cc milimetric
[20:52:42] milimetric: should we add guidelines to external eventloggin extension? that actually might be good as guidelines are kind of universal no?
[20:54:51] nuria_: probably not extension i think
[20:55:00] this is for our use case of eventlogging for analytics
[20:55:06] but, maybe!
[20:55:07] :)
[21:08:27] ha, ottomata, mw timestamps *are* in ISO 8601 format: https://en.wikipedia.org/wiki/ISO_8601#General_principles
[21:08:35] they're just not in "extended" format
[21:08:49] nuria_: I'm making the page here: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Systems/EventLogging/Schema_Guidelines
[21:08:53] (blank for now)
[21:21:17] nuria_ / ottomata: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines
[21:21:37] preliminary thoughts, which I'm sending to mobile folks now
[21:41:02] We haz cumulative dataz !!!!
[21:41:24] (PS6) Joal: Add new fields in mediawiki_history job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/359019 (https://phabricator.wikimedia.org/T161147)
[22:00:19] joal: wow
[22:01:44] milimetric: looks real nice cc dbrant
[22:01:57] cc coreyfloyd
[22:02:10] please be so kind as to take a look: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Systems/EventLogging/Schema_Guidelines
[22:02:23] nuria_: cool, thanks!
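(Editor's note: on the 21:08 point above that MediaWiki-style VARCHAR(14) timestamps such as 20160522003038 sort chronologically as plain strings, converting them to the extended ISO 8601 form is a simple reparse. A small Python sketch, assuming the stored values are UTC:)

```python
# Editorial sketch: convert a MediaWiki-style 14-character timestamp
# to extended ISO 8601 and back.
from datetime import datetime

mw_ts = "20160522003038"
dt = datetime.strptime(mw_ts, "%Y%m%d%H%M%S")
print(dt.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2016-05-22T00:30:38Z (assuming UTC)
print(dt.strftime("%Y%m%d%H%M%S"))        # back to the MediaWiki form
```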
[22:03:09] (I sent an email... hm, maybe it didn't go through)
[22:08:54] nuria_: milimetric thanks!
[22:20:13] Analytics, Discovery, Wikidata, Wikidata-Query-Service: Data request for logs from SparQL interface at query.wikidata.org - https://phabricator.wikimedia.org/T143819#3350099 (Nuria) @Smalyshev @AndrewSu please take a look at other metric definitions we have. once you decide on a metric definition...
[22:21:28] (Draft2) Reedy: Add atjwiki [analytics/refinery] - https://gerrit.wikimedia.org/r/359062 (https://phabricator.wikimedia.org/T167720)