[00:07:28] yes, they need to be added to a cohort, such as the Misc cohort: https://en.wikipedia.org/wiki/Wikipedia:Education_program/Dashboard/Misc [00:08:22] courses based on the education program extension are legacy courses, and not recommended for future events, in terms of integration with the dashboard. [00:08:51] nothing major has broken yet with that integration, but if/when it does, it' [00:09:16] s unlikely to get fixed quickly. [00:09:22] harej: ^ [00:10:05] harej: the alternative is to create courses via the dashboard, and avoid the EP extension altogether. [00:10:07] but for future courses (or "courses"), I can just do it directly through the Dashboard? [00:10:20] yes. [00:10:27] And that won't touch the EP space on Wikipedia. [00:10:39] the features that make edits are disabled on that outreachdashboard, at present. [00:10:52] and in any case, it won't touch the EP space. [00:11:15] but if you just want to sign people up and track activity, then it should be all set. [00:11:23] Now, I have some datasets from other lists that I want to migrate to the Outreach Dashboard so that all the information is kept in one place. Will people get pinged if I do this? [00:11:56] definitely not as long as the editing features are turned off. [00:12:06] and probably not even if they are turned on. [00:12:24] I can't think of any notifications that would get triggered, just from adding users to a course. [00:12:35] Lovely. [00:12:40] Everything is coming together. [00:12:53] And it seems like you're including the global metrics now. [00:27:25] ragesoss: I have a list of course IDs here: https://en.wikipedia.org/wiki/Wikipedia:Education_program/Dashboard/DC but it's not showing up on outreachdashboard [00:29:14] Seems the only two cohorts in existence are miscellanea and art+feminism [00:29:25] Which generally conforms to my schema of the world, but... :P [00:39:56] [Did you do some major breaking change? I notice the new URL. 
We had an entry on the dashboard before but not anymore it seems.] [01:03:19] Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API overview - https://phabricator.wikimedia.org/T112956#1827288 (Nuria) @Dave_Braunschweig While the referrer use case falls outside the pageview API it is certainly one we have looked at internally. If you are inte... [02:52:28] harej: I added a dc cohort, so it should pick up those courses upon the next update. [02:52:54] Thank you sage [05:18:41] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1827432 (Beetstra) Does this scheme also include a quick-searchable domain (it is unclear to me) - I mean st... [09:33:17] !log Backfill october top data into cassandra (Json corrected) [10:01:27] PROBLEM - Throughput of event logging NavigationTiming events on graphite1001 is CRITICAL: (null) [11:11:14] (PS23) Joal: Add cassandra load job for pageview API [analytics/refinery] - https://gerrit.wikimedia.org/r/236224 [11:11:50] Analytics-Kanban: Backfill daily-top-articles in cassandra [2015-09-01 - 2015-11-16 (included)] [5 pts] {melc} - https://phabricator.wikimedia.org/T118991#1827783 (JAllemandou) a:JAllemandou [11:14:42] !log Start backfilling of pageview API august data [12:31:44] Analytics-General-or-Unknown, Database, Patch-For-Review: Create a table in labs with replication lag data - https://phabricator.wikimedia.org/T71463#1827909 (jcrespo) ``` MariaDB LABS localhost heartbeat_p > SELECT * FROM heartbeat; +-------+----------------------------+------+ | shard | last_updated... 
[13:15:24] (PS1) DCausse: Drop support for message without rev id in avro decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 [13:20:41] (PS2) DCausse: Drop support for message without rev id in avro decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 [13:32:35] joal, do you know where the job that gets the top-N articles lives? [13:32:57] I'm hoping to steal the code for a custom run using the latest definition, unless the latest def has been deployed already :) [13:33:09] (similarly, how goes backfilling?) [13:33:55] (CR) DCausse: "Schema is mandatory in kafka message unless a default schema has been provided in camus.properties." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [13:57:05] Ironholds: latest has been deployed and november is up-to-date (no spiders + correct json) :) [14:04:38] halfak: Hello ! [14:04:50] Hi joal ! [14:04:58] halfak: I didn't realise the meeting was canceled today [14:05:07] Do you still want to take some time ? [14:05:20] Me either. Just noticed a couple minutes ago. This week is an off week. [14:05:26] :) [14:05:31] So, not really canceled -- just biweekly. [14:05:38] But yeah. I can take some time now. [14:05:39] Yes, but I forgot about it :) [14:05:47] ok, batcave? 
[14:06:07] omw [14:07:31] (I'm around but working on stuff, let me know if you need me) [15:12:31] (PS1) Addshore: E_WARNING to E_USER_WARNING [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255116 [15:14:57] (CR) Addshore: [C: 2 V: 2] E_WARNING to E_USER_WARNING [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255116 (owner: Addshore) [15:15:32] joal: check out the slightly newer version / image ;) https://github.com/addshore/grafana-wmfpageviews-datasource [15:16:15] addshore: Wooow :) [15:18:29] addshore: when available on wikimedia grafana, let me know, I'll have a play :) [15:20:34] (PS1) Addshore: Update path of api log archives [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255117 [15:20:48] joal: will do! I think it just needs a tiny bit more rounding now [15:21:04] (CR) Addshore: [C: 2 V: 2] Update path of api log archives [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255117 (owner: Addshore) [15:21:07] addshore: np at all, it really looks cool ! [15:30:49] joal: I wonder if I should file a ticket for it to gather some comments ;) [15:34:27] addshore: you could send an email to analytics list if you want even more comments :) [15:34:36] Ooohh, now that is an idea! [15:35:05] ottomata: is there anything for me to test about the graphite stuff ? [15:35:13] Or should I wait for you ? [15:37:03] joal: sure, if you take the check_graphite script and pass it the exact arguments in the order that puppet will call it, what happens? [15:37:13] works :( [15:37:15] for our check, and also for the ones that godog says failed [15:37:27] on every try I did (different hosts, metrics) [15:40:17] ottomata: --^ [15:40:29] hm, ok. 
[15:40:33] that is what I was going to try [15:40:40] maybe it is a weird icinga thing as godog guesses [15:40:44] Just tried again, works for me :( [15:40:56] man: just tried again, works for me :) [15:41:12] ottomata: here is the command I launch : ./check_graphite -U http://graphite.wikimedia.org -T 10 check_threshold \ 'derivative(transformNull(varnishkafka.cp4020.webrequest.mobile.varnishkafka.kafka_drerr, 0))' \ -W 0 -C 20000 --from 10min --until 0min --perc 80 --over [15:41:21] (for instance) [15:52:54] hm, joal yeah not sure [15:52:58] will require more investigation... [15:53:10] ok ottomata [15:53:39] sorry for not being able to sort that myself ottomata :( [15:54:58] joal, yay! latest-> the version with the mobilemenu etc not included? [15:55:11] dcausse: show your patch, I think we can deploy with an easy strategy and later, once we are producing every msg with an id, drop the code that makes things backwards compatible cc ottomata [15:55:19] Ironholds: mobilemenu? [15:55:28] joal, Special:MobileMenu [15:55:44] I excluded it from the pageview calculations in a patch a few weeks back; do you know if it's deployed to the def yet? [15:55:44] Ohhh Ironholds , sorry, didn't understand [15:55:52] nuria: perfect, thanks! [15:55:57] Ironholds: There have been multiple bugs with the pageview api [15:55:57] oh, my bad, I left it ambiguous as to whether I meant the API or the pageview def [15:56:05] right [15:56:08] dcausse: let me review and run tests and such ok? [15:56:14] so the API is the latest, but the pageview def..? [15:56:14] nuria: sure [15:56:18] So new pageview def is not yet deployed :( [15:57:18] It's been an awful long time, we agreed with nuria it should be done, and decided to go for early next week because of Thanksgiving [15:57:24] Ironholds: --^ [15:57:32] *nods glumly* [15:57:39] thankee. Trying to do research projects and this is a big hindrance. 
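joal's check_graphite one-liner above is hard to read inline; here it is reflowed with shell line continuations. The arguments are identical to the pasted command (check_graphite is the local icinga-style plugin script being debugged, and it queries production graphite, so this is only for reference):

```shell
./check_graphite \
    -U http://graphite.wikimedia.org \
    -T 10 \
    check_threshold \
    'derivative(transformNull(varnishkafka.cp4020.webrequest.mobile.varnishkafka.kafka_drerr, 0))' \
    -W 0 -C 20000 \
    --from 10min --until 0min \
    --perc 80 --over
```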
[15:57:48] it's okay, they're not particularly time-sensitive [15:58:00] ok Ironholds [15:58:21] So that you know as well, we won't backfill the pageview (you know how costly it is) [15:58:25] Ironholds: --^ [15:58:43] Ironholds: I'll make sure to ping you when the new def is deployed [15:58:48] yay! Thankee :) [16:01:54] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1828267 (Milimetric) hm, interesting idea but I don't think the data stored from this process would be used... [16:08:41] ottomata: does it sound good to deploy the code as is: https://gerrit.wikimedia.org/r/#/c/251267/ [16:08:57] ottomata: and later do a second pass to require the id so deployment is backwards compatible [16:09:06] cc dcausse [16:09:25] that way it eases deployment and we get code where we want it in two passes [16:11:11] +1 i'm for this process [16:11:22] (CR) Ottomata: [C: 1] "You are all awesome." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [16:12:35] dcausse: ok, lemme look at all of this again and hopefully merge today. did you test this latest patch on 1002? 
[16:13:25] nuria: let me check the last time I tested and see if there were any changes after [16:13:33] dcausse: ok [16:15:40] nuria: it was PS12, running another test with latest to be sure [16:15:54] Analytics-Backlog: Upgrade wikimetrics code to check labs lag table - https://phabricator.wikimedia.org/T119514#1828287 (Nuria) NEW [16:16:32] (PS4) DCausse: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) [16:24:11] (CR) Nuria: [C: 1] Drop support for message without rev id in avro decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 (owner: DCausse) [16:34:15] (PS5) Mforns: Add sum aggregate by user report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254068 (https://phabricator.wikimedia.org/T117287) [16:38:10] hey joal, yt? [16:38:17] hey mforns [16:38:20] I am :) [16:38:24] :] [16:38:42] what's up ? [16:39:14] I was thinking that we will be working by ourselves at the end of this week, so I think it's a good idea to work together on the sanitizing of the pageview_hourly, no? [16:39:51] I was going to grab a task right now [16:40:11] is it OK with you if I grab one for that? [16:40:31] mforns: sounds awesome :) [16:41:02] joal, is there any task you would like to take yourself? [16:41:45] mforns: I will spend some time with nuria ensuring she gives me her existing code to check for user-path retrieval [16:41:45] they are kind of sequential, no? I guess the first one is https://phabricator.wikimedia.org/T118838 [16:42:08] correct mforns [16:42:22] and as said in the desc, nuria has some code, so I'll ask her :) [16:42:28] aha [16:42:58] joal: for "identity recognition" right? [16:43:08] correct nuria [16:43:22] joal: if so i will make a gist with selects, they were very useful. 
[16:43:25] nuria: identity recognition and/or path following [16:43:36] nuria: perfect :) [16:44:02] joal: ya, i compiled files and after (by hand) looked for "rare" UAS or very small cities [16:44:29] joal: see: https://wikitech.wikimedia.org/wiki/Analytics/Data/Preventing_identity_reconstruction [16:44:41] joal: will add link to gist [16:45:04] joal: actually code is on page [16:45:12] joal: so you are set cc mforns [16:45:13] nuria: with the HQL already in the page it's perfect :) [16:45:23] indeed nuria, thanks for that ! [16:45:35] cool nuria thanks :] [16:46:16] mforns: We can pair if you want, or do bits and then merge, as you prefer :) [16:46:30] man, I think I am not clear mforns :) [16:46:49] So, we can pair code, or do bits each on our side and then group and merge [16:46:52] pfff [16:46:55] Hard evening [16:47:01] Analytics: Set up metrics for Time on Site - https://phabricator.wikimedia.org/T119352#1828375 (Aklapper) [16:47:06] hehehe, I got it, I was writing [16:47:26] joal, are you planning to do other stuff today? I have the lightning talk 20-21 CET and before that talk setup and I want to rehearse some more [16:47:42] probably after the talks it's too late for you [16:47:46] Also a-team, my son needs to go to the doctor (again ....:(, so if I don't have news from my wife before standup, I won't attend [16:47:59] oh, I see [16:47:59] mforns: We can start tomorrow, I already have plenty as well :) [16:48:20] sure, tomorrow I will work more in the morning, so I'll ping you [16:48:32] great mforns [16:48:37] ok joal [16:52:39] a-team: need to go to technology meeting today, i have sent my e-scrum plus posted preliminary goals on mediawiki [16:52:47] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q3_Goals#Analytics [16:53:21] dcausse: let me know when you are done testing cause i think code can be merged [16:53:37] nuria: test still running... 
[16:54:13] Analytics-Tech-community-metrics, Phabricator, DevRel-November-2015, User-greg: Closed tickets in Bugzilla migrated without closing event? - https://phabricator.wikimedia.org/T107254#1828389 (Aklapper) [16:55:18] dcausse: ok, i will be in meeting for next hour but ping me with results [16:58:50] ottomata: around? [16:59:14] I was trying to read a few messages using kafkacat on stat1002 to test the logging stuff [16:59:56] this is what I tried - [16:59:57] kafkacat -b analytics1012.eqiad.wmnet:9092 -t eventlogging_EventError -o -10 [17:00:04] and I get [17:00:07] https://www.irccloud.com/pastebin/1AiDVRmH/ [17:00:42] kafka1012 [17:00:56] trying to join standup... [17:01:09] oh [17:01:14] ya me too [17:01:35] ugh, authentication issues, I'll be in the batcave in a sec [17:01:41] a-team: cannot get into batcave [17:02:15] looks like some auth issues [17:02:36] i got in now [17:04:26] madhuvishy: you're in? :/ [17:04:30] we're in parallel universe? [17:04:34] lol [17:04:34] ya [17:04:38] i can here marcel [17:04:45] hear [17:05:12] joal, do you hear me? [17:05:17] no :( [17:05:25] we can hear you though! :] [17:05:29] I don't even see you connected mforns [17:07:19] (CR) EBernhardson: "i've added (in the last couple minutes) the two version of CirrusSearchRequestSet schema to the event-schemas repository in Ib5f5eda60abce" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [17:34:12] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1828504 (jcrespo) This is the physical view: ``` root@db1046:/srv$ df -h | grep /srv /dev/mapper/tank-data 1.4T 1.3T 136G 91% /srv root@db1046:/srv$ du -h --max-depth=2 663G ./sqlda... 
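Reading ottomata's one-word reply above as the fix, the problem with madhuvishy's attempt was the broker hostname: kafka1012, not analytics1012. A hypothetical corrected invocation (standard kafkacat flags: `-C` forces consumer mode, `-o -10` starts ten messages before the end of the partition, `-e` exits at end-of-partition instead of waiting for new messages):

```shell
kafkacat -C \
    -b kafka1012.eqiad.wmnet:9092 \
    -t eventlogging_EventError \
    -o -10 -e
```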
[17:43:15] (PS19) DCausse: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) [17:46:53] (CR) DCausse: "re-added src/main/resources in pom.xml." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [17:51:56] (PS5) DCausse: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) [17:53:07] nuria: tested both patches: https://gerrit.wikimedia.org/r/#/c/251267/ and https://gerrit.wikimedia.org/r/#/c/252958/ [17:53:15] dcausse: and? [17:53:21] it works [18:02:03] ok [18:03:21] ottomata: can you look at ebernhardson latest comment on this patch: https://gerrit.wikimedia.org/r/#/c/251267/ [18:03:23] ? [18:03:41] ottomata: are there any issues to deploy refinery-camus using a git submodule for schemas? [18:11:45] halfak: around? [18:11:57] Yeah [18:12:23] halfak: I added three hql files on the shared workspace you might be interested in [18:12:31] halfak: currently in the process of testing them [18:12:58] joal, can't test now. Need to switch back to other work. [18:13:06] halfak: I am, sorry [18:13:10] :) [18:13:22] No worries. Thank you for continuing to push. [18:20:48] hi kevinator [18:21:01] something urgent has come up and I need to reschedule my lightning talk to next month [18:21:04] would this be okay? [18:21:41] yes, will be ok. [18:21:54] I'll move you name off the list for today lzia [18:22:03] thank you kevinator. [18:28:17] Analytics-Kanban, Analytics-Wikistats, Patch-For-Review: Feed Wikistats traffic reports with aggregated hive data {lama} [21 pts] - https://phabricator.wikimedia.org/T114379#1828844 (ezachte) Three new charts for per project totals, to do: 'Total articles' Preview: http://stats.wikimedia.org/EN/draft/... 
[18:31:43] joal: see ebernhardson's latest comment on this patch: https://gerrit.wikimedia.org/r/#/c/251267/, is there an issue to deploy a git submodule inside refinery? [18:32:32] nuria: no shouldn't be [18:32:42] ottomata: then , let's add it right? [18:32:51] hm, nuria, in deployment.yaml in puppet [18:32:53] need to add [18:33:00] checkout_submodules: true [18:33:02] should be it [18:35:05] ottomata: ok, ebernhardson do you want to add submodule? I will add teh puppet code [18:35:11] *the [18:39:10] ottomata: this one: ./hieradata/common/role/deployment.yaml [18:39:12] ? [18:39:50] ya [18:39:57] add it to analytics/refinery [18:40:02] or [18:40:02] hm [18:40:08] nuria does it need to be in analytics/refinery/source? [18:40:09] oh! [18:40:11] if so, then never mind [18:40:16] we don't deploy refinery/source [18:40:19] so, it'll just work [18:40:34] ottomata: but .. how does the module get initialized? [18:41:11] hey milimetric [18:41:17] hey YuviPanda [18:41:19] limn1 might be having puppet failures [18:41:21] just a fyi :) [18:41:31] I keep seeing that from shinken [18:41:37] but I've no idea why.. [18:42:05] ottomata: do we do the initialization by hand/ [18:42:07] ? [18:42:08] I was going to take a look at it but realized it's self hosted puppetmaster and you've a large patch on top of it :( [18:42:31] you should fix that failure at some point or we'll run into long term issues that might render the machine unusable... [18:42:36] k, I'll rebase and try again - it says it can't find apache... [18:42:57] :) I'm not worried, I've been keeping that thing maintained for years [18:43:17] we refactored some of the code last week, got rid of the webserver module [18:43:30] I fixed that in all the things that were 'in tree' but... 
[18:43:48] milimetric: yeah, just wanted to give you a headsup :) [18:43:52] nuria: we don't deploy refinery/source [18:44:03] we deploy artifacts compiled from refinery/source that we commit to refinery via git fat [18:44:16] ottomata: ah, sorry, now i get it [18:44:16] ah, YuviPanda that makes sense, so that's probably why I'm seeing this, right? "Error 400 on SERVER: Could not find class webserver::apache for limn1.analytics.eqiad.wmflabs on node limn1.analytics.eqiad.wmflabs" [18:44:22] so, if you need the avro schemas to work with java code, just add the submodule to refinery source [18:44:24] what's it called now? [18:44:27] no need to change deployment settings [18:44:28] milimetric: yeah, probably. it's just ::apache now [18:44:32] ottomata: understood [18:44:37] k, i'll fix that on top [18:44:54] ok [18:45:18] milimetric: I'm also trying to kill all the things in manifests/misc, and limn has some stuff there. I might move it somewhere else and I'll let you know when / if I do [18:45:41] YuviPanda: thx, I'm happy to mirror your changes on this instance [18:45:53] (CR) Nuria: "@EBernhardson : adding submodule sounds good" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [18:45:59] milimetric: cool! and thanks for being a responsible self hosted puppetmaster user! [18:46:03] YuviPanda: maybe I'll keep you for a bit longer though, now I get this: [18:46:06] "Could not find resource 'Exec[compile puppet.conf]' for relationship on 'Class[Puppetmaster::Ssl]' on node limn1.analytics.eqiad.wmflabs" [18:46:13] oh hmm [18:46:20] some others are reporting that too [18:46:32] milimetric: let me make a quick patch, moment [18:49:57] Analytics-Tech-community-metrics: Profile names in UTF-8 incorrectly displayed as ??? - https://phabricator.wikimedia.org/T119540#1829058 (Aklapper) NEW [19:09:23] milimetric: thanks! 
:) [19:09:31] milimetric: also unrelated, have you seen tmpnb.org [19:14:36] nice job mforns_meeting! :) [19:15:17] :S [19:18:26] ohhhh noooo :( [19:18:31] I missed mforns meetings :( [19:18:44] How has it gone mforns ? [19:19:15] (PS1) Addshore: Track spread of user & babel languages [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255158 [19:20:40] Hey folks. Is there a good reference for how to load a TSV into Hive for querying? [19:20:58] * halfak looks through docs on Wikitech [19:21:43] halfak: not sure about doc, but the process is load data onto hdfs, create external table with proper settings (TSV style) [19:22:00] ja halfak quick google will tell ya [19:22:03] Yeah... was hoping there would be a "here are the proper TSV settings" [19:22:19] halfak: ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' [19:22:19] STORED AS TEXTFILE [19:22:20] LOCATION '/ad_data/raw/reg_logs'; [19:22:26] RTFGoogle [19:22:27] :D [19:22:30] :D [19:23:08] Also, what's the best way to transfer large files between stat1003 and stat1002 these days? [19:23:18] halfak: finalised my hive scripts on c9, tables created and metadata extraction ongoing on altiscale IA [19:23:35] hm, can't help on that one halfak [19:23:40] joal, \o/ great! [19:24:27] joal, would you recommend bz2 compression for a 60GB TSV? 
[19:24:40] YuviPanda: that's cool, /me likes notebooks of all kinds [19:24:47] hm, halfak, depends what you want to do with it [19:24:52] query it in hive [19:24:53] milimetric: I'm very close to doing a deployment of that on labs :) [19:24:57] on toollabs even [19:25:03] with the new kubernetes stuff [19:25:09] YuviPanda: you should do a lightning talk when you do [19:25:11] halfak: https://wikitech.wikimedia.org/wiki/Analytics/FAQ [19:25:12] :p [19:25:12] * YuviPanda hopes to replace a lot of production-like puppet setups with this at some point [19:25:15] Since bz2 is splittable, sounds reasonable [19:25:39] a-team: My son has gastroenteritis again :( [19:25:46] oh no! [19:25:48] joal: Nooo! [19:25:51] :( [19:25:57] Oh yeah. Forgot about the sync server. [19:25:59] a-team: I'll be baby sitting tomorrow, so on-and-off [19:26:08] take care joal! [19:26:21] Yeah, he's ok, but needs some care, so I do :) [19:26:46] those viruses are the bad side of creche ... [19:27:01] aah [19:33:12] joal: you can see mforns talk recorded: https://www.youtube.com/watch?v=kE3lSfs1dzc [19:35:16] (CR) Addshore: [C: 2 V: 2] Track spread of user & babel languages [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255158 (owner: Addshore) [19:35:20] yes, sorry I had deactivated my IRC pings for the presentation [19:35:28] joal, nuria ^ [19:48:16] Analytics-Backlog, Wikipedia-iOS-App-Product-Backlog, iOS-5-app-production: Puppetize Piwik to prepare for production deployment - https://phabricator.wikimedia.org/T103577#1829238 (BGerstle-WMF) a:BGerstle-WMF>None [20:12:48] ebernhardson: yt? [20:12:53] nuria: yup [20:13:25] ebernhardson: should we add the depot? [20:13:33] ebernhardson: what is it called? 
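joal's TSV-settings answer to halfak can be pulled together into one sketch. The HDFS path is the one from joal's snippet; the database and column names are invented here for illustration, and bz2 fits in because Hive reads compressed text files in the table's directory transparently, with bz2 being splittable as joal notes:

```shell
# Step 1: put the TSV (optionally bz2-compressed) onto HDFS.
hdfs dfs -mkdir -p /ad_data/raw/reg_logs
hdfs dfs -put registrations.tsv /ad_data/raw/reg_logs/

# Step 2: map an external table over the directory with TSV settings.
hive <<'HQL'
-- Hypothetical columns; match them to the actual TSV layout.
CREATE EXTERNAL TABLE IF NOT EXISTS halfak.reg_logs (
    user_id  BIGINT,
    reg_date STRING,
    wiki     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/ad_data/raw/reg_logs';
HQL
```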
[20:13:37] nuria: i was thinking yes, unless it's particularly annoying [20:13:54] nuria: https://gerrit.wikimedia.org/r/mediawiki/event-schemas [20:15:19] ebernhardson: looks like deployment is not affected so it should be fine [20:16:18] nuria: the patch also needs to be merged, and it was waiting mostly on a question about what namespace to use. I added you as a reviewer because you had asked to use the analytics namespace before: https://gerrit.wikimedia.org/r/#/c/255134/ [20:16:44] ebernhardson: want to merge patch 1st and add depot later? that seems best probably [20:16:53] ebernhardson: i like to merge what has been tested [20:16:59] yea, that makes sense [20:17:09] it's a relatively easy change later instead of mixing it into a large patch [20:17:28] Thanks nuria for the link :) [20:20:27] (CR) Nuria: "On 2nd thought, let's add submodule on different patch to just merge what we have tested." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [20:21:41] ebernhardson: ok, reviewed, i *think* it makes sense to keep all schemas we are loading at runtime within teh same java package [20:22:00] *the [20:24:13] joal: I am merging the avro patch, ticket notes that you worked on deployment steps with dcausse [20:25:53] nuria: i don't really know how it affects java, willing to defer to your judgement :) [20:27:18] nuria: ok, does that mean we should deploy refinery-source soon ? [20:28:19] ebernhardson: since bindings are generated it looks nicer to have all bindings generated be part of same package but i could be convinced otherwise [20:28:29] joal: ya, next week right? 
[20:28:52] nuria: better yes [20:28:58] nuria: just wanted to confirm :) [20:29:42] joal:k [20:31:06] ottomata: I submitted a patch for the logging stuff - this change is super simple - I can also write up a config file example if you want [20:31:10] nuria: also I'm interested to get a brief overview of what you've merged when you'll have time :) [20:31:41] joal: ya, let's do it before merging, now? [20:32:09] sure [20:32:11] cabe? [20:32:15] cave? [20:32:45] i wouldn't mind listening in as well, i've just a general idea of the overall concept [20:32:55] please come in ebernhardson :) [20:35:28] oo k looking [20:36:31] ha, ok! hm, madhuvishy|lunch that would be helpful [20:36:42] i'm adding a config/ directory in the services branch [20:36:50] maybe add an example config file there [20:37:03] config/log.ini or whatever [20:39:34] halfak: hive metadata extraction finished: 1h10mins [20:40:02] Just ran a quick query to double check: SELECT user_id, count(1) as c from wmf_dumps.enwiki_20150901 where page_namespace = 0 group by user_id order by c desc limit 20; [20:40:17] Result in 130 secs [20:40:19] halfak: --^ [20:42:58] ottomata: okay, also jenkins fails on my patch with something about python3.3 interpreter [20:43:12] i think i saw something similar on my local when i ran tox too [20:43:15] 3.3? [20:43:18] ya [20:43:26] ohh, hm, i think master is behind in toxification :) [20:43:36] nuria: can you commit on services branch? [20:43:38] service branch? [20:43:48] ottomata: oh oh [20:44:01] okay i'll do that [20:48:53] danke [21:05:19] joal, Woot! Great! [21:09:18] halfak: currently playing with spark on the parquet metadata --> it's incredibly fast :) [21:09:42] joal: i am correcting commit message [21:09:47] joal: and merging [21:10:19] k nuria [21:10:22] * halfak slogs through an analysis that is way overdue. 
[21:11:05] (PS3) Nuria: Drop support for message without rev id in avro decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255105 (owner: DCausse) [21:12:42] good luck halfak [21:13:32] halfak: There is a guy that has made 657k edits on one single page :) [21:13:49] Ok, I think we can say that this thing works :) [21:13:50] a bot [21:13:51] ? [21:13:55] I hope ! [21:13:59] :) [21:13:59] Username or ID? [21:15:53] (PS20) Nuria: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) [21:16:25] (CR) Nuria: [C: 2] Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [21:17:51] Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1829687 (Milimetric) @jcrespo, thanks very much for the physical report. As far as team Analytics is concerned, we only need enough data on m4-master to facilitate backfilling. So, if w... [21:19:28] (Merged) jenkins-bot: Add support for custom timestamp and schema rev id in avro message decoders [analytics/refinery/source] - https://gerrit.wikimedia.org/r/251267 (https://phabricator.wikimedia.org/T117873) (owner: DCausse) [21:19:32] (PS6) Nuria: Add 2 payloads map fields to CirrusSearchRequestSet avro schema [analytics/refinery/source] - https://gerrit.wikimedia.org/r/252958 (https://phabricator.wikimedia.org/T118570) (owner: DCausse) [21:19:56] ottomata: a little ping for icinga - Now or tomorrow ? [21:21:21] joal tomorrow, s'ok? 
[21:21:27] no problemo ottomata :) [21:21:31] if i push this patch today then i can do that first tomorrow [21:22:01] that's no issue ottomata (little goodie: http://www.confluent.io/blog/apache-kafka-0.9-is-released) [21:22:05] joal, ottomata : table creations are run by hand right? [21:22:15] nuria: Yes m'dame [21:22:29] so ... when avro schema updates on avro table: https://gerrit.wikimedia.org/r/#/c/252956/1/hive/mediawiki/cirrus-searchrequest-set/create_CirrusSearchRequestSet_table.hql [21:22:42] we need to re-run [21:23:03] the table creation? [21:23:09] wowowow that's uncool nuria [21:23:21] ya...that seems so strange [21:23:27] I don't know if we can update schemas [21:23:45] I guess we can (should be linked in each partition), but needs to be thoroughly tested [21:23:51] in this case prior data can be lost [21:24:07] so it is no issue but .. what about going forward? [21:24:09] The thing is: if table re-creation for each change, then needs existing partition reloading ! [21:24:44] nuria: We have gone through that with parquet with ottomata [21:24:57] nuria: it's manageable as long as new schemas can handle old data [21:25:10] We really should have a closer look [21:25:21] joal: we have tested that on hive though [21:25:31] nuria: first time creation, fine [21:25:41] then schema change --> update table ? [21:25:51] Or re-create ? [21:26:01] If re-create, can we read old data with new schema ? [21:26:19] hive stores schema per record [21:26:27] If update, does it work as expected (meaning old partitions are read with the old schema, new ones with the new schema ) ? [21:26:57] nuria: per record ? [21:27:50] joal: i thought so yes, let's see if dcausse is around.. [21:28:04] not possible nuria, you imagine, 1 schema per record ? 
not per record [21:28:06] per file [21:28:13] Right, more sensible :) [21:28:21] and each file is guaranteed to have been written with the same schema [21:28:29] Then might work fine with table update [21:28:36] But we still should try :) [21:28:44] i think ebernhardson has tested this, but not sure how [21:29:07] ebernhardson: yt? [21:29:27] nuria: yup [21:29:44] did you test hive table updates when schema has changed? [21:30:06] i thought it was dcausse , cause we talked about this a bit before our avro troubles [21:30:08] i haven't, not sure about dcausse [21:30:15] right, ok [21:30:34] oh thought ebernhardson had a while ago, apologies for the ping [21:31:03] ottomata: it was the search team though cause we did talk about this, i will re-test [21:31:21] nuria: let me know if you want me to have a go at it tomorrow [21:31:50] i could just change the schema on ebernhardson.cirrussearchrequestset table and see what happens (but in 30 minutes, i'm just starting a meeting) [21:31:58] or you can :) [21:32:30] ebernhardson: i will give it a go [21:33:00] k nuria, I'm off for now, let me know by email if you want me to test tomorrow [21:33:04] k [21:33:07] if i had to guess, it will use the schema in the file as the writer schema, and whatever is in the table DDL as the reader schema [21:33:11] but just a guess [21:34:32] ebernhardson: I think it will use the table DDL to write newly inserted data, then the schema embedded in the file to read (immutable files in hive) [21:35:11] hmm, we don't write any data with hive it's all written by camus [21:35:27] (external table) [21:35:44] ebernhardson: right ... But we could [21:36:05] So maybe actually the schema is not needed in have def ;) [21:36:14] in hive definition sorry [21:37:30] hmm, perhaps [21:43:06] I'm off a-team ! See you tomorrow :) [21:43:23] bonne nuit! [21:45:56] bye joal ! 
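The "update rather than re-create" path nuria plans to test could look like the sketch below. This is an assumption-laden illustration, not what was actually run: the table name comes from ebernhardson's message above, the schema path is invented. It matches ebernhardson's guess about reader/writer schemas: each Avro file carries the schema it was written with, Hive's AvroSerDe treats the table's schema as the reader schema, so old partitions stay readable as long as fields added in the new schema have defaults:

```shell
hive <<'HQL'
-- Point the table at the evolved reader schema (hypothetical HDFS path).
ALTER TABLE ebernhardson.cirrussearchrequestset
SET TBLPROPERTIES (
    'avro.schema.url' = 'hdfs:///path/to/CirrusSearchRequestSet_v2.avsc'
);
HQL
```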
[21:57:56] ottomata: pushed a sample config
[21:58:05] i rebased off service
[21:58:14] i think jenkins is still failing though
[21:58:33] but that may be something i did
[21:58:34] hmm
[21:58:53] madhuvishy: you are still on master though, no?
[21:59:01] you should just push your review on top of service
[21:59:06] so, abandon this change
[21:59:10] then
[21:59:13] ummm
[21:59:22] oh
[21:59:23] i see
[21:59:28] i had a different branch
[21:59:35] which i rebased on top of service now
[21:59:36] madhuvishy: your local logging branch is on top of service, yes?
[21:59:38] ok yes
[21:59:46] so, you just need to tell gerrit you are submitting on that branch
[21:59:47] so
[21:59:48] instead of just
[21:59:50] git review
[21:59:51] do
[21:59:53] git review service
[21:59:58] (master is just the default)
[22:00:00] okay cool will do that
[22:00:00] also
[22:00:02] before you do that
[22:00:06] you should remove the Change-Id
[22:00:06] so
[22:00:10] git commit -a --amend
[22:00:11] ah right
[22:00:12] sorry
[22:00:14] just git commit --amend
[22:00:16] and delete the Change-Id
[22:00:19] okay
[22:00:23] which will allow git to make a new one for you
[22:00:25] then you can do
[22:00:27] git review service
[22:00:33] this will make a new changeset in gerrit
[22:00:33] alright, doing
[22:00:37] so you can abandon this one
[22:00:45] okay
[22:04:34] (CR) Milimetric: [C: 2] Add sum aggregate by user report [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/254068 (https://phabricator.wikimedia.org/T117287) (owner: Mforns)
[22:05:09] (Abandoned) Milimetric: [WIP] Add Rolling Recurring Old Active Editor [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/161521 (https://bugzilla.wikimedia.org/69569) (owner: Milimetric)
[22:06:02] thanks milimetric
[22:21:33] madhuvishy: wanna chat a bit about the RunReport vs RunGlobalReport thing?
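[Editor's sketch condensing the Gerrit steps ottomata walks through above, runnable against a throwaway repo: commit with a stale Change-Id trailer, amend it away so Gerrit mints a fresh one, then resubmit against the target branch. The file, message, and Change-Id value are made up, and the final `git review service` step needs a real Gerrit remote plus git-review installed, so it is left as a comment.]

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.email "dev@example.org"
git -C "$repo" config user.name "Dev"

# A commit whose message still carries the Change-Id trailer left over
# from the earlier review that targeted master.
echo hello > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "Add logging config

Change-Id: I0123456789abcdef0123456789abcdef01234567"

# Strip the Change-Id via amend so Gerrit creates a new changeset
# instead of updating the abandoned one.
msg=$(git -C "$repo" log -1 --pretty=%B | grep -v '^Change-Id:')
git -C "$repo" commit -q --amend -m "$msg"

# With a real Gerrit remote, the resubmission would then name the
# target branch explicitly (master is just git-review's default):
#   git review service
```

[Amending in place is enough here because git-review's commit hook regenerates a Change-Id when the trailer is absent.]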
[22:21:52] milimetric, madhuvishy, I'm looking at that code right now
[22:22:16] me too, figured it'd be good to review it anyway
[22:22:35] sure
[22:24:32] ok, I'll just leave comments there then
[22:24:59] milimetric, do you want to talk in the cave? I guess you wanted madhu to be there...
[22:25:21] we can chat, 2/3 on the same page is better than 1/3 :)
[22:25:32] sure
[22:25:34] batcave?
[22:25:55] yes (there)
[22:39:43] milimetric: mforns was at a 1-1, can join now if you are still talking
[22:39:55] yeah, madhuvishy, to the batcave!
[22:57:20] ottomata, ebernhardson: tested avro evolution on hive
[23:02:58] (CR) Mforns: [WIP] Setup celery task workflow to handle running reports for the Global API (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) (owner: Madhuvishy)
[23:03:18] bye a-team, see you tomorrow!
[23:03:26] see you later
[23:03:28] nite mforns
[23:03:33] :]
[23:07:19] (CR) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the Global API (1 comment) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) (owner: Madhuvishy)
[23:07:21] (CR) Nuria: Add 2 payloads map fields to CirrusSearchRequestSet avro schema (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/252956 (https://phabricator.wikimedia.org/T118570) (owner: DCausse)
[23:07:47] (CR) Milimetric: [WIP] Setup celery task workflow to handle running reports for the Global API (11 comments) [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308) (owner: Madhuvishy)
[23:12:21] madhuvishy: did you get CR on your wikimetrics stuff?
otherwise i can look at it now
[23:12:45] nuria: yeah, Dan and Marcel just reviewed it
[23:18:45] ottomata: okay, pushed to the service branch in a new patch
[23:18:55] https://gerrit.wikimedia.org/r/#/c/255275/
[23:47:12] (CR) Nuria: Implement ArraySum UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson)
[23:51:14] (CR) EBernhardson: Implement ArraySum UDF (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson)
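[Editor's note on the ArraySum UDF under review above: a Hive UDF of that name would sum the numeric elements of an array column. A minimal Python sketch of the presumed semantics — the actual patch's handling of nulls and empty arrays is an assumption here, not taken from the review:]

```python
def array_sum(values):
    """Sum the numeric elements of an array, skipping nulls (None).

    An empty, all-null, or null input yields 0. This mirrors the rough
    semantics of a Hive ArraySum UDF; the real implementation's null
    handling may differ.
    """
    if values is None:
        return 0
    return sum(v for v in values if v is not None)

print(array_sum([1, 2, None, 3]))  # -> 6
```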