[06:07:27] (PS1) Yuvipanda: Use a thread scoped session object [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251916
[06:07:29] (PS1) Yuvipanda: Fix hosts list in deploy script [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251917
[06:07:31] (PS1) Yuvipanda: Increase query limits to 30mins [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251918
[06:08:19] (CR) Yuvipanda: [C: 2] Use a thread scoped session object [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251916 (owner: Yuvipanda)
[06:08:32] (CR) Yuvipanda: [C: 2] Fix hosts list in deploy script [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251917 (owner: Yuvipanda)
[06:08:49] (CR) Yuvipanda: [C: 2] Increase query limits to 30mins [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251918 (owner: Yuvipanda)
[06:12:15] (Merged) jenkins-bot: Increase query limits to 30mins [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251918 (owner: Yuvipanda)
[06:12:50] Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1792561 (yuvipanda) Yes, that seems to have been a bug that I hopefully have fixed :) Usually when they get killed they get their status set to 'killed' but apparently not that one. Now queries get killed in 30min.
[06:15:45] (PS1) Yuvipanda: Add link to Flow board [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251919 (https://phabricator.wikimedia.org/T117647)
[06:16:09] (CR) Yuvipanda: [C: 2] Add link to Flow board [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251919 (https://phabricator.wikimedia.org/T117647) (owner: Yuvipanda)
[06:16:35] (Merged) jenkins-bot: Add link to Flow board [analytics/quarry/web] - https://gerrit.wikimedia.org/r/251919 (https://phabricator.wikimedia.org/T117647) (owner: Yuvipanda)
[06:21:18] Quarry, Patch-For-Review: Add chat or forum (Quarry) - https://phabricator.wikimedia.org/T117647#1792580 (yuvipanda) There's a flow board linked from the top bar now! \o/ I wonder how it'll be used.
[09:18:36] Analytics-EventLogging, MediaWiki-extensions-RelatedArticles, MobileFrontend, Patch-For-Review, and 2 others: Upstream Schema.js from MobileFrontend to EventLogging - https://phabricator.wikimedia.org/T117140#1792700 (phuedx) Open>Resolved This task tracked a technical background-ish task. I...
[10:11:01] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1792766 (Addshore) Per http://graphite.readthedocs.org/en/latest/overview.html Graphite does exactly what we need here. It "**Stores nume...
[10:36:45] Analytics, Discovery-Cirrus-Sprint: Decide how to deploy avro schemas in hive (CirrusSearchRequestSet) - https://phabricator.wikimedia.org/T118155#1792797 (dcausse) NEW
[11:14:05] Analytics, Discovery-Cirrus-Sprint: Decide how to deploy avro schemas in hive (CirrusSearchRequestSet) - https://phabricator.wikimedia.org/T118155#1792895 (dcausse)
[11:20:55] Analytics-Tech-community-metrics: Handling multiple affiliations (at once; like work vs spare time) in tech community metrics - https://phabricator.wikimedia.org/T95238#1792918 (Aklapper) p:Normal>Low
[11:24:32] Analytics-Tech-community-metrics: Connecting wikitech.wikimedia.org user profiles with community metrics - https://phabricator.wikimedia.org/T53050#1792967 (Aklapper) Open>stalled This task seems to be blocked by the (non-existing?) task to allow registered users on http://wikitech.wikimedia.org to set...
[11:25:08] Analytics-Tech-community-metrics: Pull user profile data from wikitech.wikimedia.org and use it in community metrics - https://phabricator.wikimedia.org/T53050#1792969 (Aklapper) p:Low>Lowest
[11:26:17] Analytics-Tech-community-metrics, Possible-Tech-Projects, Outreachy-Round-11: Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) - https://phabricator.wikimedia.org/T89135#1792973 (Aklapper)
[11:26:30] Analytics-Tech-community-metrics, Performance: Improve performance of MediaWikiAnalysis - https://phabricator.wikimedia.org/T114439#1792974 (Aklapper)
[11:28:19] Analytics-Tech-community-metrics: Port MediaWikiAnalysis to SQLAlchemy - https://phabricator.wikimedia.org/T114437#1792986 (Aklapper) a:Anmolkalia >>! In T114437#1732889, @jgbarah wrote: >>>! In T114437#1732753, @Anmolkalia wrote: >> So, I was able to correct the code. This is working fine now. {F2730722}...
[11:29:22] Analytics-Tech-community-metrics: Illegible overlapping tables on narrow screens due to CSS - https://phabricator.wikimedia.org/T97115#1792988 (Aklapper) p:Low>Lowest
[11:30:30] Analytics-Tech-community-metrics: In MediaWikiAnalysis, implement some missing information from the MediaWiki API - https://phabricator.wikimedia.org/T114440#1792991 (Aklapper)
[11:30:55] Analytics-Tech-community-metrics, DevRel-December-2015, Easy, Google-Code-In-2015: Clarify Demographics definitions on korma (Attracted vs. time served; retained) - https://phabricator.wikimedia.org/T97117#1792993 (Aklapper) a:Aklapper
[11:32:47] Analytics-Tech-community-metrics, DevRel-November-2015: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1792999 (Aklapper)
[11:32:49] Analytics-Tech-community-metrics, Phabricator: Metrics for Maniphest - https://phabricator.wikimedia.org/T28#1792998 (Aklapper)
[11:33:25] Analytics-Tech-community-metrics, Developer-Relations, DevRel-January-2016: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#1793001 (Aklapper)
[11:36:27] Analytics-Tech-community-metrics, Research consulting, Research-and-Data: Quantifying the "sum of all contributors" - https://phabricator.wikimedia.org/T113406#1793012 (Aklapper) Removing #Analytics-Tech-Community-Metrics project as what we can currently offer in this area has been described in T11340...
[11:40:49] Analytics-Tech-community-metrics, DevRel-November-2015: Improve Key performance indicator: code contributors new / gone - https://phabricator.wikimedia.org/T63563#1793028 (Aklapper)
[11:42:39] (PS1) Christopher Johnson (WMDE): minor lint and path fixes [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/251938
[12:34:22] Analytics-Tech-community-metrics, DevRel-November-2015: Improve Key performance indicator: code contributors new / gone - https://phabricator.wikimedia.org/T63563#1793104 (Aklapper) p:Normal>Low a:Aklapper
[12:38:22] Analytics-Tech-community-metrics, DevRel-November-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#1793122 (Aklapper) @MarkAHershberger: Ranking algorithm will be explained on page once https://github.com/Bitergia/mediawiki-dashboard/pull/70 is solved. Discussin...
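Yuvipanda's 06:07 Quarry patch "Use a thread scoped session object" refers to the pattern where every thread gets its own database session instead of sharing one. The following is a minimal stdlib sketch of that idea, not Quarry's actual code. It uses `threading.local`, and `FakeSession` is a hypothetical stand-in for a real session object.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for a real database session object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ThreadScopedSession:
    """Return one lazily created session per thread (the idea behind SQLAlchemy's scoped_session)."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()  # attributes stored here are visible only to the current thread

    def __call__(self):
        session = getattr(self._local, "session", None)
        if session is None:
            session = self._factory()
            self._local.session = session
        return session

    def remove(self):
        """Close and discard the current thread's session, e.g. at the end of a request."""
        session = getattr(self._local, "session", None)
        if session is not None:
            session.close()
            del self._local.session


Session = ThreadScopedSession(FakeSession)


def worker(results, idx):
    first, second = Session(), Session()
    results[idx] = (first is second, first)  # repeated calls in one thread share an object
    Session.remove()


results = [None, None]
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With SQLAlchemy itself, the equivalent is `scoped_session(sessionmaker(bind=engine))`, with `Session.remove()` called when each request finishes.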
[12:38:28] Analytics-Tech-community-metrics, DevRel-November-2015: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1785578 (Aklapper)
[12:38:29] Analytics-Tech-community-metrics, DevRel-November-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#1793125 (Aklapper)
[12:39:12] Analytics-Tech-community-metrics, DevRel-December-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#660476 (Aklapper)
[12:40:23] Analytics-Tech-community-metrics, Phabricator: Metrics for Maniphest - https://phabricator.wikimedia.org/T28#1793133 (Aklapper)
[12:40:25] Analytics-Tech-community-metrics: Add remaining KPIs to kpi_overview.html once available in korma - https://phabricator.wikimedia.org/T116572#1793132 (Aklapper)
[12:40:29] Analytics-Tech-community-metrics: Add remaining KPIs to kpi_overview.html once available in korma - https://phabricator.wikimedia.org/T116572#1793134 (Aklapper) p:Low>Lowest
[12:41:16] Analytics-Tech-community-metrics, DevRel-December-2015: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1793135 (Aklapper) a:Aklapper
[12:41:38] Analytics-Tech-community-metrics, DevRel-December-2015: Review/update mailing list repositories in korma - https://phabricator.wikimedia.org/T116285#1745496 (Aklapper)
[12:41:40] Analytics-Tech-community-metrics, Patch-For-Review: Tech metrics missing IRC channels - https://phabricator.wikimedia.org/T56230#1793140 (Aklapper)
[12:41:41] Analytics-Tech-community-metrics, DevRel-December-2015: Key performance indicator: Top contributors - https://phabricator.wikimedia.org/T64221#1793138 (Aklapper)
[12:46:31] Analytics-Tech-community-metrics: Create an API REST to list identities from Korma - https://phabricator.wikimedia.org/T91871#1793154 (Aklapper) p:Low>Lowest
[12:47:04] Analytics-Tech-community-metrics: Microtask: Create a very simple REST API for SortingHat - https://phabricator.wikimedia.org/T114838#1793157 (Aklapper) p:Normal>Lowest
[12:47:50] Analytics-Tech-community-metrics: Create an AngularJS app that consumes a fake identities, affiliations and countries API REST and allows to play with such dataset - https://phabricator.wikimedia.org/T91873#1793160 (Aklapper) p:Low>Lowest
[12:48:19] Analytics-Tech-community-metrics: Create an AngularJS app that consumes a fake identities, affiliations and countries API REST and allows to play with such dataset - https://phabricator.wikimedia.org/T91873#1097920 (Aklapper) p:Lowest>Low
[12:48:37] Analytics-Tech-community-metrics: Microtask: Create a very simple REST API for SortingHat - https://phabricator.wikimedia.org/T114838#1707546 (Aklapper) p:Lowest>Low
[12:48:46] Analytics-Tech-community-metrics: Create an API REST to list identities from Korma - https://phabricator.wikimedia.org/T91871#1793173 (Aklapper) p:Lowest>Low
[12:50:22] Analytics-Tech-community-metrics: MediaWiki.org stats should also consider discussion - https://phabricator.wikimedia.org/T62074#1793174 (Aklapper)
[12:55:48] Analytics-Tech-community-metrics, DevRel-November-2015: Fix 404s (VizGrimoireJS entirely broken) on korma's mediawiki.html - https://phabricator.wikimedia.org/T118167#1793181 (Aklapper) NEW a:Dicortazar
[12:55:57] Analytics-Tech-community-metrics: MediaWiki.org stats should also consider discussion activity (Talk/Thread namespaces) - https://phabricator.wikimedia.org/T62074#1793188 (Aklapper)
[12:56:06] Analytics-Tech-community-metrics: MediaWiki.org stats should also consider discussion activity (Talk/Thread namespaces) - https://phabricator.wikimedia.org/T62074#656677 (Aklapper)
[12:56:07] Analytics-Tech-community-metrics, DevRel-November-2015: Fix 404s (VizGrimoireJS entirely broken) on korma's mediawiki.html - https://phabricator.wikimedia.org/T118167#1793181 (Aklapper)
[12:56:15] Analytics-Tech-community-metrics: MediaWiki.org stats should also consider discussion activity (Talk/Thread namespaces) - https://phabricator.wikimedia.org/T62074#656677 (Aklapper) p:Low>Lowest
[12:58:45] Analytics-Tech-community-metrics, DevRel-December-2015: Clicking "Age of open changesets by Affiliation" explanation link goes to top of page - https://phabricator.wikimedia.org/T110874#1793192 (Aklapper)
[13:08:02] Analytics-Tech-community-metrics, DevRel-November-2015: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1793219 (Aklapper) a:Dicortazar
[13:10:40] Analytics-Tech-community-metrics, DevRel-December-2015: Empty "subject" and "creator" fields for mailing list thread on mls.html - https://phabricator.wikimedia.org/T116284#1793226 (Aklapper)
[13:10:55] Analytics-Tech-community-metrics, DevRel-December-2015: Contributor pages which show user name but not any other data should include an explanation - https://phabricator.wikimedia.org/T58111#1793227 (Aklapper)
[13:11:07] Analytics-Tech-community-metrics, DevRel-December-2015: NaN values for certain "Time from last patchset" values - https://phabricator.wikimedia.org/T115871#1793229 (Aklapper)
[13:11:35] Analytics-Tech-community-metrics, DevRel-December-2015: Time axis on repository.html only displays two months, repeated several items - https://phabricator.wikimedia.org/T115872#1793239 (Aklapper) p:Low>Lowest
[13:12:19] Analytics-Tech-community-metrics, DevRel-December-2015: Empty "subject" and "creator" fields for mailing list thread on mls.html - https://phabricator.wikimedia.org/T116284#1793242 (Aklapper) a:Dicortazar
[13:12:28] Analytics-Tech-community-metrics, DevRel-December-2015: NaN values for certain "Time from last patchset" values - https://phabricator.wikimedia.org/T115871#1793243 (Aklapper) a:Dicortazar
[13:12:47] Analytics-Tech-community-metrics: Clicking "Age of open changesets by Affiliation" explanation link goes to top of page - https://phabricator.wikimedia.org/T110874#1793244 (Aklapper) p:Low>Lowest
[13:13:04] Analytics-Tech-community-metrics, DevRel-December-2015: Contributor pages which show user name but not any other data should include an explanation - https://phabricator.wikimedia.org/T58111#1793246 (Aklapper) a:Dicortazar
[13:13:12] Analytics-Tech-community-metrics, DevRel-December-2015: Time axis on repository.html only displays two months, repeated several items - https://phabricator.wikimedia.org/T115872#1793247 (Aklapper) a:Dicortazar
[13:19:18] Analytics-Tech-community-metrics, DevRel-November-2015: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1793275 (Aklapper)
[14:00:48] Analytics-Cluster, Analytics-Engineering: analytics1032 has / mounted ro - https://phabricator.wikimedia.org/T118175#1793372 (faidon) NEW
[14:05:09] ottomata: are refinery artefacts like refinery-camus.jar deployed to all hive nodes with the same path in /srv/deployment/analytics ?
[14:09:06] dcausse: no
[14:09:11] but, they are deployed to hdfs
[14:09:16] argh :(
[14:09:53] hdfs dfs -ls /wmf/refinery/current
[14:10:11] you should be able to access them via an hdfs path
[14:10:18] https://phabricator.wikimedia.org/T118155
[14:10:21] hdfs://analytics-hadoop/wmf/refinery/current/...
[14:11:02] I thought we could use the :jar:file tricks to load the avro schema, but all nodes will try to lookup this path on the local filesystem (not hdfs)
[14:11:07] hmmmm, we should figure out the stored as avro thing, that's how I thought it should work
[14:11:17] dcausse: maybe try the :jar trick with an hdfs path
[14:12:57] hmm not sure how to write this path with hdfs :/
[14:13:45] the STORED AS AVRO is buggy, looks like the schema is not exactly the same, some fields are returning null instead of the actual data
[14:14:16] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793386 (Ottomata) I'm not very opinionated here, but I think an analytics graphite instance will be very useful for other things than ju...
[14:14:39] dcause, you just use hdfs://analytics-hadoop instead of file:///
[14:14:43] hdfs://analytics-hadoop/wmf/refinery/current/...
[14:15:07] ok will try something like :jar:hdfs:// ...
[14:30:57] Analytics, Discovery-Cirrus-Sprint: Decide how to deploy avro schemas in hive (CirrusSearchRequestSet) - https://phabricator.wikimedia.org/T118155#1793418 (dcausse) Option 2 won't work, artifacts are not deployed to all nodes and I'm unable to find a valid URL that will load a file in a jar stored in hdfs.
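At 14:13, dcausse noted that under STORED AS AVRO some fields come back null, which suggests the columns Hive derived do not match the Avro schema. One way to check this is to map each Avro field to the Hive type the AvroSerDe would expose and diff the result against the columns reported by `SHOW CREATE TABLE`. The sketch below assumes the usual AvroSerDe mappings (long to bigint, bytes to binary, nullable unions to the plain type). It does not resolve references to named types, and the sample schema and column list are hypothetical, not the real CirrusSearchRequestSet schema.

```python
import json

# Avro primitive -> Hive column type, following the AvroSerDe's usual mapping.
PRIMITIVES = {
    "null": "void", "boolean": "boolean", "int": "int", "long": "bigint",
    "float": "float", "double": "double", "bytes": "binary", "string": "string",
}


def hive_type(avro_type):
    """Translate a parsed Avro type into a Hive type string (named-type references unsupported)."""
    if isinstance(avro_type, str):
        return PRIMITIVES[avro_type]
    if isinstance(avro_type, list):  # a union
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) == 1:  # ["null", T] is simply a nullable T
            return hive_type(non_null[0])
        return "uniontype<%s>" % ",".join(hive_type(t) for t in non_null)
    kind = avro_type["type"]
    if kind == "array":
        return "array<%s>" % hive_type(avro_type["items"])
    if kind == "map":
        return "map<string,%s>" % hive_type(avro_type["values"])
    if kind == "record":
        return "struct<%s>" % ",".join(
            "%s:%s" % (f["name"], hive_type(f["type"])) for f in avro_type["fields"])
    if kind == "enum":
        return "string"
    if kind == "fixed":
        return "binary"
    return hive_type(kind)  # e.g. {"type": "long"}


def normalize(type_str):
    return type_str.replace(" ", "").lower()


def diff_columns(schema_json, hive_columns):
    """Return {column: (expected_from_avro, actual_in_hive)} for every mismatch."""
    schema = json.loads(schema_json)
    # Hive stores column names in lower case, so compare names case-insensitively.
    expected = {f["name"].lower(): normalize(hive_type(f["type"])) for f in schema["fields"]}
    actual = {name.lower(): normalize(t) for name, t in hive_columns.items()}
    problems = {}
    for name in sorted(set(expected) | set(actual)):
        want, got = expected.get(name), actual.get(name)
        if want != got:
            problems[name] = (want, got)
    return problems


schema = json.dumps({"type": "record", "name": "Example", "fields": [
    {"name": "ts", "type": "long"},
    {"name": "queryType", "type": ["null", "string"]},
    {"name": "hits", "type": {"type": "array", "items": "string"}},
]})
columns = {"ts": "bigint", "querytype": "string", "hits": "array<int>"}
print(diff_columns(schema, columns))  # {'hits': ('array<string>', 'array<int>')}
```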
[14:32:16] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [30.0]
[14:40:06] !log restarting eventlogging to see if it is ok after enabling firewall rules on kafka1014
[14:40:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master
[14:45:25] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[14:56:51] (PS7) DCausse: [WIP] Add initial oozie job for CirrusSearchRequestSet [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575)
[14:57:20] (CR) DCausse: [C: -1] "WIP" [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse)
[15:09:30] dcausse: did the 'STORED as avro' not worked?
[15:10:07] nuria: no :(, very strange failures, some fields data returns NULL
[15:11:45] I can try to re-create the table with some partitions if you want
[15:13:12] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 26.67% of data above the critical threshold [30.0]
[15:13:41] dcausse: you tried on your own db, right?
[15:13:49] nuria: yes
[15:15:12] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[15:17:06] dcausse: did you define fields with same types that are defined on schema?
[15:17:15] dcausse: i imagine yes
[15:17:26] i imagine you did that is
[15:17:55] nuria: in fact I created a table with the avro schema, then copied/pasted show create table and added STORED as AVRO
[15:18:14] ya... man ..
[15:18:37] maybe show create table is wrong :/
[15:18:43] and (to be super sure)
[15:19:00] did you tested your data validates to avro schema with avro tools?
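The PROBLEM and RECOVERY lines above report the share of recent datapoints that exceed a threshold. The reported percentages (53.33%, 26.67% and, later, 46.67%) are all multiples of 1/15, which fits a 15-point window. The check also appears to go CRITICAL once at least 25% of points exceed 30.0 and recover once fewer than 25% exceed 20.0. The sketch below models that evaluation. The window size, the 25% trigger and the `>=` comparison are inferences from the alert text, not the actual Icinga/Graphite check configuration.

```python
def evaluate(points, warning=20.0, critical=30.0, percentage=25.0):
    """Classify a window of datapoints the way the alert text suggests.

    Returns (status, share), where share is the percentage of non-null points
    above the threshold that decided the status.
    """
    valid = [p for p in points if p is not None]  # Graphite reports gaps as null
    if not valid:
        return "UNKNOWN", 0.0

    def share_above(threshold):
        return 100.0 * sum(1 for p in valid if p > threshold) / len(valid)

    critical_share = share_above(critical)
    if critical_share >= percentage:
        return "CRITICAL", critical_share
    warning_share = share_above(warning)
    if warning_share >= percentage:
        return "WARNING", warning_share
    return "OK", warning_share


window = [35.0] * 8 + [10.0] * 7  # 15 points, 8 of them above 30.0
status, share = evaluate(window)
print("%s: %.2f%% of data above the critical threshold [30.0]" % (status, share))
# CRITICAL: 53.33% of data above the critical threshold [30.0]
```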
[15:19:11] let me send you the cmd and write it on wikitech
[15:19:42] in fact when I use a table created with the avro schema stored in hdfs I can see all the data
[15:20:34] argh!!!
[15:21:17] we can also use avro.schema.literal and copy/paste the avro schema directly in the create table script
[15:21:51] but here also I found a bug... I have to minify the avro schema otherwise show create table fails with a strange JSON parse error :(
[15:22:03] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1793572 (Nuria) I -at best- we will be able to get to this by end of this quarter, not before but we can probably give you a more precise estimate later on this week.
[15:23:06] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793576 (Nuria) I second @ottomata
[15:23:49] dcausse: ya, i think it is because if schema is lengthy
[15:24:00] yes
[15:24:04] dcausse: it goes over size allotted to that field
[15:24:17] dcausse: boy you gotta love hive eh?
[15:24:26] hehe yes :)
[15:24:42] dcausse: and using the handy minification trick did things worked?
[15:24:45] * ebernhardson has also been annoyed with hive recently for other reasons. You really have to escape ; inside strings? :P
[15:24:51] nuria: yes
[15:24:52] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [30.0]
[15:26:08] dcausse: ok, then things work that way but we have the schema duplicated
[15:26:13] dcausse: is that correct?
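The workaround dcausse describes at 15:21 embeds the schema through `avro.schema.literal` and minifies it so that `SHOW CREATE TABLE` can parse it back. Both steps can be scripted. The sketch below minifies an `.avsc` document with Python's `json` module and renders a CREATE TABLE statement that uses the standard Hive Avro SerDe and container formats. The table name and HDFS location are hypothetical, and this is not the contents of the refinery's create_CirrusSearchRequestSet_table.hql. Because ebernhardson reports trouble with `;` inside Hive strings, the sketch refuses such schemas instead of guessing an escape.

```python
import json

AVRO_SERDE = "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
AVRO_INPUT = "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
AVRO_OUTPUT = "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat"


def minify_schema(avsc_text):
    """Re-serialize an Avro schema with no insignificant whitespace."""
    return json.dumps(json.loads(avsc_text), separators=(",", ":"))


def hql_string(value):
    """Render `value` as a single-quoted HQL string literal."""
    if ";" in value:
        # Semicolons inside strings are reported to confuse the Hive CLI; handle those by hand.
        raise ValueError("value contains ';', escape it for your Hive client by hand")
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_table_hql(table, location, avsc_text):
    """Build a CREATE TABLE statement whose columns come from an embedded Avro schema."""
    return "\n".join([
        "CREATE EXTERNAL TABLE IF NOT EXISTS `%s`" % table,
        "ROW FORMAT SERDE %s" % hql_string(AVRO_SERDE),
        "STORED AS INPUTFORMAT %s" % hql_string(AVRO_INPUT),
        "OUTPUTFORMAT %s" % hql_string(AVRO_OUTPUT),
        "LOCATION %s" % hql_string(location),
        "TBLPROPERTIES ('avro.schema.literal'=%s);" % hql_string(minify_schema(avsc_text)),
    ])


# Hypothetical pretty-printed schema, as it might appear in an .avsc file.
pretty = '{\n  "type": "record",\n  "name": "Example",\n  "fields": [\n    {"name": "ts", "type": "long"}\n  ]\n}'
print(create_table_hql("example", "hdfs://analytics-hadoop/tmp/example", pretty))
```

The embedded copy duplicates the schema, as nuria points out at 15:26. The script keeps only the `.avsc` file as the hand-edited source.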
[15:26:17] hm
[15:27:15] morning a-team :)
[15:27:21] madhuvishy: helou
[15:27:48] nuria: yes unless we add some scripts beside create_CirrusSearchRequestSet_table.hql that will extract the schema from the jar/minify and pass it to hive
[15:29:54] but the schema will be duplicated inside the table definition
[15:30:10] (in hive)
[15:31:51] dcausse: ok, how is this for another crazy idea
[15:32:17] dcausse: what if we specify schema as a url and point it to the github depot, thus using latest
[15:34:07] like: https://raw.githubusercontent.com/wikimedia/analytics-refinery-source/master/refinery-camus/src/main/avro/CirrusSearchRequestSet.avsc
[15:34:19] dcausse: only that --ahem- that is https
[15:34:19] nuria: we could, but the docs say it's only for testing
[15:34:21] ". For http schemas, this works for testing and small-scale clusters, but as the schema will be accessed at least once from each task in the job, this can quickly turn the job into a DDOS attack against the URL provider (a web server, for instance). Use caution when using this parameter for anything other than testing."
[15:34:41] we don't have some crazy several hundred node cluster though, so maybe it's ok
[15:35:19] another crazy idea: is /mnt/hdfs mounted on all hive nodes?
[15:35:30] ebernhardson: ya, i read that also not too crazy about this cause http requests might time out and such
[15:36:26] ottomata: what about /mnt/hdfs ? is it mounted on all hive nodes?
[15:37:14] no, dcausse
[15:37:19] argh :(
[15:37:26] /mnt/hdfs is kinda flaky
[15:37:38] it's a fuse mount
[15:37:45] even just for small reads? :)
[15:37:56] haha, i mean, it would probably work most of the time :)
[15:38:01] but, we don't want to mount it everywhere
[15:38:06] ok
[15:38:14] it occasionally locks up, and that would mean a big admin overhead to deal with
[15:38:21] sure
[15:38:24] dcausse: hdfs:// not work?
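The helper script dcausse floats at 15:27 would pull the schema out of the jar next to create_CirrusSearchRequestSet_table.hql. This is straightforward because a jar is an ordinary zip archive. The sketch below uses `zipfile` to do that. The entry path inside refinery-camus.jar is not given in the log (only the source path under refinery-camus/src/main/avro/ is), so the code searches by file-name suffix. The in-memory demo jar and its toy schema are hypothetical.

```python
import io
import json
import zipfile


def read_schema_from_jar(jar, suffix):
    """Return the text of the single jar entry whose name ends with `suffix`.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        matches = [name for name in zf.namelist() if name.endswith(suffix)]
        if len(matches) != 1:
            raise LookupError("expected exactly one %r in jar, found %r" % (suffix, matches))
        return zf.read(matches[0]).decode("utf-8")


# Build a small hypothetical jar in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    toy_schema = {"type": "record", "name": "CirrusSearchRequestSet",
                  "fields": [{"name": "ts", "type": "long"}]}
    zf.writestr("avro/CirrusSearchRequestSet.avsc", json.dumps(toy_schema, indent=2))
buf.seek(0)

text = read_schema_from_jar(buf, "CirrusSearchRequestSet.avsc")
minified = json.dumps(json.loads(text), separators=(",", ":"))  # ready for avro.schema.literal
print(minified)
```

Searching by suffix keeps the script working if the packaging path changes. The error on zero or several matches avoids silently picking the wrong schema.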
[15:38:36] not with the jar stuff :(
[15:38:38] aye
[15:38:39] hm
[15:38:41] ottomata: exactly
[15:38:57] the jar trick will work only with local filesystem
[15:39:11] wait, do you need the schema with every query? or just at table create time?
[15:39:18] ottomata: looks like we need to deploy schemas to a known location on hdfs
[15:39:31] ottomata: every job
[15:39:36] why every job?
[15:39:41] ottomata: I thought it was used only at create table time but it is read when querying :(-
[15:39:51] it's not stored in hive metadata as create table statement?!?
[15:39:54] ottomata: that's what the docs say, "the schema will be accessed at least once from each task in the job,"
[15:40:00] oh, that's if you specify path to jar or schema.
[15:40:03] hm
[15:40:12] what if you just provide the schema as a string when you create?
[15:40:18] ottomata: that works
[15:40:28] ottomata: but that duplicates the schema
[15:40:37] then i say just do that (for now), you only have to run create table once, and we do that manually for our tables
[15:41:08] i still promise a generic schema repo where we can keep this stuff
[15:41:13] by the end of the quarter :)
[15:41:20] :)
[15:41:33] all right seems that that is what we need to do to move forward
[15:42:05] so we agree to embed the schema in the create table script?
[15:43:27] nuria: what is the meeting with Erik that joseph mentioned in emails, can you invite me?
[15:43:37] madhuvishy: sure
[15:50:36] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 25.00% above the threshold [20.0]
[16:02:05] dcausse: you will be submitting another patch with the minimized schema and such right?
[16:02:46] nuria: yes, I'm just testing the purge python script and then it's ok I think
[16:03:57] dcausse: is the purge python script part of your patch?
[16:04:03] nuria: yes
[16:04:14] dcausse: ok, thank you , will look then
[16:04:29] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793758 (fgiunchedi) @addshore to clarify, more than functionality I was pointing out guarantees about the data stored. if the metrics ar...
[16:05:35] Analytics-EventLogging, Analytics-Kanban, MediaWiki-extensions-WikimediaEvents, Patch-For-Review: Provide oneIn() sampling function as part of standard library - https://phabricator.wikimedia.org/T112603#1793764 (Nuria)
[16:08:45] Analytics, Analytics-Kanban, Discovery, EventBus, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1793775 (Ottomata) Hi all, I talked to @gwicke a little bit more about this last Thursday. He impressed upon me a couple of good points I hadn't fully taken in before, and I wan...
[16:10:42] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793777 (JanZerebecki) I agree with T117732#1793386. (Though I have no clear opinion on whether it should be a different instance than th...
[16:14:37] Analytics-Kanban: Create depot to have different API clients for pageview API - https://phabricator.wikimedia.org/T118190#1793791 (Nuria) NEW
[16:33:23] Analytics-Tech-community-metrics, DevRel-December-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1793840 (Aklapper) NEW
[16:33:34] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793846 (Christopher) If you are going to use HDFS, why not just use HBase instead of Graphite?
[16:36:13] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793849 (Addshore) > I think there's value in a single shared instance for ease of use Well, this was also my initial thought. Until Joe...
[16:38:13] hey halfak, just passing by : https://googleblog.blogspot.fr/2015/11/tensorflow-smarter-machine-learning-for.html
[16:50:23] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793895 (Christopher) If not HBase, what about Cassandra? This is already puppetized. At least you will be using a storage solution tha...
[16:56:53] (PS8) DCausse: Add initial oozie job for CirrusSearchRequestSet [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575)
[16:58:59] (CR) DCausse: "Tested & fixed few problems with the python script : https://phabricator.wikimedia.org/P2294" [analytics/refinery] - https://gerrit.wikimedia.org/r/251238 (https://phabricator.wikimedia.org/T117575) (owner: DCausse)
[16:59:09] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793944 (JanZerebecki) As far as I understand Graphite, using it as a source for the backup means that we are loosing data after the rete...
[16:59:14] Analytics-Tech-community-metrics, DevRel-December-2015: Names on scr-contributors.html should link to corresponding people.html page - https://phabricator.wikimedia.org/T118192#1793945 (Aklapper)
[17:00:10] a-team will be 5 mins late to stand-up, my daughter has some fever
[17:00:22] I need to do a couple things
[17:00:37] Analytics, Discovery-Cirrus-Sprint: Decide how to deploy avro schemas in hive (CirrusSearchRequestSet) - https://phabricator.wikimedia.org/T118155#1793947 (dcausse) We decided to go for option 3. We'll review this decision once we have an official schema repo in hdfs.
[17:06:55] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793965 (Ottomata) Doing HBase would be a much larger project than just spinning up a graphite instance. We may eventually evaluate larg...
[17:07:49] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1793972 (Ottomata) Cassandra is a complicated clustered solution, graphite will just require a simple single instance running and intakin...
[17:21:38] Analytics, Discovery-Cirrus-Sprint: Decide how to deploy avro schemas in hive (CirrusSearchRequestSet) - https://phabricator.wikimedia.org/T118155#1794013 (dcausse) Open>Resolved
[17:26:25] Analytics-Kanban: Pageview API Press release {slug} - https://phabricator.wikimedia.org/T117225#1794030 (Milimetric) a:Milimetric
[17:27:45] milimetric: can I ask you a q in another hangout?
[17:27:57] sure ottomata
[17:28:09] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794036 (Addshore) > As far as I understand Graphite, using it as a source for the backup means that we are loosing data after the retent...
[17:36:01] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1794053 (JKatzWMF) @nuria thanks.
[17:47:34] madhuvishy: yt?
[17:47:39] nuria: yeah
[17:47:58] then 3:30 to 4pm today we can talk about nocookie?
[17:48:11] nuria: sure
[17:48:41] i think the apps session job changes are easy but testing on our end will be consuming some time cause i want to make sure job is run for some weeks and results are "sensical" before changing it.
[17:49:12] madhuvishy: also, note that there are two tickets 1) provide metrics (monthly) per device 2) provide metrics weekly
[17:49:18] nuria: as in?
[17:49:23] oh
[17:49:29] i haven't seen the other
[17:49:29] madhuvishy: we need some clarification
[17:49:44] madhuvishy: let me ping them on ticket
[17:50:02] okay
[17:50:21] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1794114 (Nuria)
[17:51:48] nuria: they want only weekly now?
[17:52:18] madhuvishy: ya, see, it is not clear cause tillman was saying that he still needs monthly for trends
[17:52:30] madhuvishy: let us clarify before we take that item
[17:53:10] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794129 (Christopher) I am not sure why this is considered to be "a simple use case" since as mentioned in T117735 there are at least two...
[17:53:37] nuria: yeah alright
[17:53:43] that is indeed a bit confusing
[17:57:21] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1794154 (Nuria) Then @Tbayer, @JKatzWMF Would you mind clarifying what is the requirement? You want monthly report to continue? (if so no changes to that report will be done) You also want weekly metrics...
[18:10:05] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794211 (Addshore) > Content metrics require long term (non-decaying) storage, operational metrics do not. Both cases can be covered by...
[18:12:22] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794213 (JanZerebecki) >>! In T117732#1794036, @Addshore wrote: >> As far as I understand Graphite, using it as a source for the backup m...
[18:17:09] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794224 (Addshore) > And when this setting changes without us noticing until the backup is overwritten, it overwrites something that is s...
[18:23:11] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794257 (JanZerebecki) > I really don't know why we are all expecting graphite to unexpectedly loose our data? I really don't know why w...
[18:29:51] Analytics-Backlog, WMDE-Analytics-Engineering, Wikidata, Graphite: Create a Graphite instance in the Analytics cluster - https://phabricator.wikimedia.org/T117732#1794281 (Addshore) As said above whatever the solution we will want to take backups in some form or another. Having to do backups of grap...
[18:31:25] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1794285 (madhuvishy) @Nuria, this is what I understand - We should change the report to add a column for "Device" or change the type column to reflect the device type, and produce per device and overall metri...
[18:32:58] Analytics-Backlog: Implement a simple public API to calculate global metrics {kudu} - https://phabricator.wikimedia.org/T117285#1794289 (Milimetric)
[18:37:33] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1794291 (Tbayer) >>! In T117615#1786815, @Tbayer wrote: >>>! In T117615#1781919, @Nuria wrote: >>> absolutely agree, but that's hypothetical as we don't have platform-specific data...
[18:39:54] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1794298 (Nuria) >We can backfill two months' worth of data, but not more. Please note that months are not calendar months, though. We can backfill now only month of October (as we...
[18:46:11] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1794303 (Tbayer) >>! In T117637#1794154, @Nuria wrote: > Then @Tbayer, @JKatzWMF > > Would you mind clarifying what is the requirement? > > You want monthly report to continue? (if so no changes to that re...
[19:05:01] Analytics-Backlog: Provide weekly app session metrics separately for Android and iOS - https://phabricator.wikimedia.org/T117615#1794357 (Tbayer) >>! In T117615#1794298, @Nuria wrote: >>We can backfill two months' worth of data, but not more. > Please note that months are not calendar months, though. > We c...
[19:09:12] Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1794369 (Tbayer) >>! In T117637#1794285, @madhuvishy wrote: > @Nuria, > > this is what I understand - We should change the report to add a column for "Device" or change the type column to reflect the device t...
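The app-session tickets above (T117637, T117615) ask for weekly metrics broken out by Android and iOS. The sketch below covers only the bucketing step. It groups hypothetical `(platform, timestamp)` session records by ISO week using `datetime.isocalendar()`. The record shape is an assumption, and this is not the refinery's app session job. ISO weeks always start on Monday, so every bucket spans exactly seven days. Pairing the ISO year with the week number keeps late-December and early-January dates in the correct week.

```python
from collections import Counter
from datetime import datetime


def weekly_counts(sessions):
    """Count sessions per (platform, ISO year, ISO week)."""
    counts = Counter()
    for platform, ts in sessions:
        iso_year, iso_week, _ = ts.isocalendar()
        counts[(platform, iso_year, iso_week)] += 1
    return counts


# Hypothetical session records.
sessions = [
    ("Android", datetime(2015, 11, 2, 8, 30)),   # Monday, ISO week 45 of 2015
    ("Android", datetime(2015, 11, 8, 23, 59)),  # Sunday, still week 45
    ("iOS", datetime(2015, 11, 9, 0, 1)),        # Monday, week 46
]
for key, n in sorted(weekly_counts(sessions).items()):
    print(key, n)
```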
[19:46:48] (PS1) Christopher Johnson (WMDE): adds property labels to property usage table [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252024
[19:47:29] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] minor lint and path fixes [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/251938 (owner: Christopher Johnson (WMDE))
[19:49:13] (CR) Christopher Johnson (WMDE): [C: 2 V: 2] adds property labels to property usage table [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252024 (owner: Christopher Johnson (WMDE))
[20:24:15] (PS1) Christopher Johnson (WMDE): adds static path for tsv output [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252033
[20:26:26] Quarry: Time limit on quarry queries - https://phabricator.wikimedia.org/T111779#1794532 (Jarekt) yuvipanda, it seems like there is still a problem, as http://quarry.wmflabs.org/query/2556 is running for 2 hours now. By the way, is there a way to get this query to go faster?
[20:28:41] (PS2) Christopher Johnson (WMDE): adds static path for tsv output [wikidata/analytics/dashboard] - https://gerrit.wikimedia.org/r/252033
[21:35:49] joal, I am sorry for being terrible, but which UDF did you suggest that had arrays as input?
[21:37:35] and second question, which repo would I go in to find the structure of hive tables that are fed into by kafka data?
[21:39:18] Ironholds: there isn't a repo that has specifically kafka data, but the create tables statements that we use are in refinery
[21:39:19] so
[21:39:32] https://github.com/wikimedia/analytics-refinery/tree/master/hive
[21:39:40] create_*.hql files in there
[21:39:41] ottomata, gotcha! I'm looking to tweak the column descriptions in the cirrus data
[21:39:51] ah, that, i don't know
[21:40:02] that is Cirrus stuff, not analytics :)
[21:40:16] in that I mean, its not an analytics team managed table :)
[22:13:56] Analytics, Reading-Admin, Zero: Country mapping routine for proxied requests - https://phabricator.wikimedia.org/T116678#1794671 (DFoy) @kevinator - we are unsure this is working properly - can you verify that we have correct countries identified when coming through a proxy?
[22:14:07] wow
[22:21:36] yeehaw! :)
[22:48:52] Analytics-Cluster, Analytics-Engineering: analytics1032 has / mounted ro - https://phabricator.wikimedia.org/T118175#1794742 (Ottomata) Hm, this happened on analytics1038 a week ago. I rebooted and ran fsck, but found no errors. After that it came back up again. Doing the same here, but twice in 7 days...
[23:00:08] ottomata, cool
[23:08:22] Analytics-Cluster, Analytics-Engineering: analytics1032 has / mounted ro - https://phabricator.wikimedia.org/T118175#1794794 (Ottomata) 1 reboot did not bring this back up. I then dropped into Ubuntu recovery, got a root shell, and tried to do fsck of /dev/mapper/analytics1032--vg-root. This showed no e...
[23:10:04] Analytics-Backlog: Create a tool that can read the elements of a wiki template and call the API {kudu} - https://phabricator.wikimedia.org/T117290#1794805 (Abit) There is a Norwegian wp bot used for writing contest that tracks a couple global metrics. Haven't looked at how is works yet, but maybe could be he...
[23:13:08] milimetric: I was just seeing https://phabricator.wikimedia.org/T117290#1794805 - I think you can already do this using mwparserfromhell