[00:21:31] Analytics: python-mwviews does not handle unicode in titles - https://phabricator.wikimedia.org/T123200#1926348 (Milimetric) a:Milimetric [00:23:23] nuria: what task tracks browser usage data? [00:23:27] the one you keep plugging? :) [00:39:02] madhuvishy: what's hue.wikimedia.org? [00:39:16] and can i have access if it has browser usage reports [00:39:20] jdlrobson: it's a GUI to hive [00:39:30] you can directly get from hive too [00:39:41] madhuvishy: keen to see output of https://phabricator.wikimedia.org/T88504 [00:39:41] jdlrobson: do you have stat1002 access? [00:39:45] madhuvishy: yes [00:39:55] then you should be able to log in with ldap [00:40:04] oh stat1003 is what i have [00:40:20] ah - no hadoop user? let me check puppet [00:41:31] jdlrobson: https://github.com/wikimedia/operations-puppet/blob/711405a425ca635675d7b3b3eb0cbf175a4b15f1/modules/admin/data/data.yaml#L226 [00:41:37] hmmm i don't think you are here [00:42:05] nope. How can I request access for all the team? [00:42:21] I want people to be able to see what % of our traffic is a certain browser easily [00:42:36] jdlrobson: can you request it - and specifically mention getting added to analytics-privatedata-users - for multiple people - i'm not sure how ops wants it [00:42:42] Analytics-Cluster, Analytics-Kanban, Reading-Admin, Easy, Patch-For-Review: PM sees reports on browsers (Weekly or Daily) {lama} [8 pts] - https://phabricator.wikimedia.org/T88504#1926478 (Jdlrobson) Is there a wiki page documenting how to see these reports? [00:42:51] jdlrobson: we'll soon have these reports public [00:43:08] milimetric is working on making a dashboard for this data [00:43:09] madhuvishy: you in office? 
[00:43:12] yes [00:43:15] on the other side [00:46:26] ops wants blood sacrifices [00:46:32] must be provably human [00:52:25] jdlrobson: https://hue.wikimedia.org/filebrowser/view/wmf/data/archive/browser/general [00:57:07] jdlrobson: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive [02:16:47] Analytics: python-mwviews does not handle unicode in titles - https://phabricator.wikimedia.org/T123200#1926807 (ResMar) @Aklapper Apologies, being unfamiliar with the organization of Phabricator tasks (well, with what place this toolset occupies in it) I was unsure of where to place this bug. Thank you! @... [03:26:34] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1926854 (Tgr) Beta version: http://g... [03:29:28] Analytics, MediaWiki-Authentication-and-authorization, Reading-Infrastructure-Team, MW-1.26-release, Patch-For-Review: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1926864 (Tgr) Although "no visible c... [03:59:08] Analytics: python-mwapi fails on import in Python 2.7 - https://phabricator.wikimedia.org/T123201#1926907 (ResMar) [04:00:21] Analytics: python-mwapi fails on import in Python 2.7 - https://phabricator.wikimedia.org/T123201#1926910 (ResMar) stalled>Open [04:00:58] Analytics: python-mwapi fails on import in Python 2.7 - https://phabricator.wikimedia.org/T123201#1923555 (ResMar) Thanks, I've gone and done so. Again, apologies, a little unfamiliar with what project maintains these tools. 
[13:00:05] hi a-team :] [13:02:49] Analytics-Cluster, Analytics-Kanban, Reading-Admin, Easy, Patch-For-Review: PM sees reports on browsers (Weekly or Daily) {lama} [8 pts] - https://phabricator.wikimedia.org/T88504#1927371 (mforns) @Jdlrobson https://wikitech.wikimedia.org/wiki/Analytics/Cluster/BrowserReports It's short, but I th... [13:06:12] Hey mforns :) [13:06:18] Howdy ? [13:06:26] hi joal, I'm good! you? [13:06:38] Yeah, good as well [13:47:34] (CR) Mforns: [C: 2 V: 2] "LGTM! thx" [analytics/limn-analytics-data] - https://gerrit.wikimedia.org/r/261592 (https://phabricator.wikimedia.org/T122626) (owner: Nuria) [14:03:51] o/ [14:04:00] Hey halfak ! [14:04:06] joal, just using our meeting time this morning to get some hacking done on the research cluster [14:04:06] sorry, missed the time, joining [14:04:09] not much to talk about [14:04:16] k halfak :) [14:04:24] let me know if I can help [14:04:41] halfak: --^ [14:04:48] joal, one thing I would like to discuss is the decision to not accept full paths in the ETL script. [14:05:03] We had built it to only accept the "wiki-date" we wanted to process. [14:05:22] hmm [14:05:31] It seems like that limits flexibility and will require us to edit the script if we ever change where the data lives or if we want to run a test dataset. [14:05:48] halfak: you are right [14:06:10] * halfak was worried he'd just forgotten the original thoughts [14:06:24] halfak: IIRC we chose that to make it easier for users [14:06:47] halfak: So I have no problem making full path parameters :) [14:07:45] joal, OK. maybe we can make a wrapper that will build standard sets of arguments based on a simple wiki-date param. [14:08:03] halfak: sounds like a great idea :) [14:08:21] * halfak has lost our cloud9 repo again :( [14:08:27] huhu [14:08:39] FOUND IT :) [14:08:43] :D [14:08:53] Why is "shared with me" always split from "mine" [14:09:02] "shared with me" is "mine" while it is shared! 
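Editor's sketch of the wrapper idea from the 14:04-14:08 exchange: the ETL script takes full paths, and a thin wrapper fills them in from a simple wiki-date key. This is only an illustration under stated assumptions. The base directory, the `xml`/`json` subdirectories, the flag names, and the example key are all hypothetical and do not come from the actual script.

```python
import argparse
import posixpath

# Hypothetical base directory; the real data location is not given in the log.
DEFAULT_BASE = "/user/example/wiki_dumps"


def standard_paths(wiki_date, base=DEFAULT_BASE):
    """Expand a wiki-date key into the (assumed) standard input/output paths."""
    return {
        "input": posixpath.join(base, "xml", wiki_date),
        "output": posixpath.join(base, "json", wiki_date),
    }


def build_parser():
    parser = argparse.ArgumentParser(description="ETL argument wrapper (sketch)")
    parser.add_argument("--wiki-date", help="shortcut that fills in standard paths")
    parser.add_argument("--input", help="full input path; overrides --wiki-date")
    parser.add_argument("--output", help="full output path; overrides --wiki-date")
    return parser


def resolve_args(argv):
    """Return the full paths the ETL job should use.

    Explicit --input/--output always win, so a test dataset or a moved
    data directory needs no change to the script itself.
    """
    parser = build_parser()
    args = parser.parse_args(argv)
    paths = standard_paths(args.wiki_date) if args.wiki_date else {}
    resolved = {
        "input": args.input or paths.get("input"),
        "output": args.output or paths.get("output"),
    }
    missing = [name for name, value in resolved.items() if value is None]
    if missing:
        parser.error("missing paths: " + ", ".join(missing))
    return resolved
```

For example, `resolve_args(["--wiki-date", "enwiki-20151201", "--output", "/tmp/test_out"])` keeps the standard input path but writes to the test location.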
[14:09:23] * halfak reflects on mine-ness with sleepy brain [14:09:34] Agreed ... Engineering-centered user-design maybe? [14:10:02] Oh! I suppose that seems likely. [14:11:24] * halfak starts working on moving his denormalized productivity measurements to the Research Cluster [14:15:32] Analytics-Backlog: Use a new approach to compute monthly top 1000 articles (brute force in hive doesn't work) - https://phabricator.wikimedia.org/T120113#1927411 (JAllemandou) Two reasons for this to have failed: - having to work every project/access-method at once --> The job needs to aggregate, sort and fi... [14:23:38] joal, I have a dataset with 50m rows (page stats) and a dataset with 600m rows (revision stats) that I'd like to join together. [14:23:42] How would you do that? [14:29:58] Arg! Looks like I've lost access to the s3 bucket with our wikimedia cluster [14:29:59] boo! [14:30:08] So I can't transfer the data at all. [14:30:09] :/ [14:41:42] halfak: really ? [14:41:50] Just worked it out. :) [14:41:55] Ahh :) [14:41:59] Looks like it's still working :D [14:42:06] cool [14:42:09] Also, I didn't realize that buckets were expensive. [14:42:35] I'm not sure who is paying for ours. [14:42:46] But altiscale deleted the google docs file with all the details. [14:42:56] good old brain remembered enough to operate. :) [14:42:56] about joining datasets, it depends how the code you are using (hive, core-MR, spark ...) [14:43:11] well done :) [14:43:12] joal, I have two datasets and would like to join them. No code yet. [14:43:46] If data is easily loadable (or even better already loaded) in hive, just use that :) [14:43:54] or Spark (of course) [14:44:11] Core MR for joins is not the easiest [14:45:03] OK. So load the two datasets into HIVE as external tables and then do a join query -- write the output in a place that I can reuse. [14:45:08] Makes enough sense to me. 
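Editor's sketch of the plan stated at 14:45:03 (declare both datasets as external Hive tables, join them, and write the result somewhere reusable). The Python below only assembles HiveQL strings and executes nothing. Table names, column names, types, HDFS locations, and the tab-separated file format are assumptions made up for illustration, since the log does not describe the real schemas.

```python
def external_table_ddl(table, columns, location):
    """Return HiveQL that declares an external table over existing files.

    Assumes tab-separated text files; swap the ROW FORMAT / STORED AS
    clauses for whatever format the data really uses.
    """
    column_list = ",\n  ".join(
        "{} {}".format(name, hive_type) for name, hive_type in columns
    )
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  {columns}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE\n"
        "LOCATION '{location}'"
    ).format(table=table, columns=column_list, location=location)


def join_into_table(target, select_columns, left, right, key):
    """Return a CREATE TABLE ... AS SELECT that stores the joined rows."""
    return (
        "CREATE TABLE {target} AS\n"
        "SELECT {select}\n"
        "FROM {left} l\n"
        "JOIN {right} r ON (l.{key} = r.{key})"
    ).format(
        target=target,
        select=", ".join(select_columns),
        left=left,
        right=right,
        key=key,
    )


# Hypothetical schemas for the ~50m-row page stats and ~600m-row revision stats.
statements = [
    external_table_ddl(
        "page_stats",
        [("page_id", "BIGINT"), ("page_views", "BIGINT")],
        "/user/example/page_stats",
    ),
    external_table_ddl(
        "revision_stats",
        [("rev_id", "BIGINT"), ("page_id", "BIGINT"), ("persisted_tokens", "BIGINT")],
        "/user/example/revision_stats",
    ),
    join_into_table(
        "revision_page_stats",
        ["r.rev_id", "r.page_id", "r.persisted_tokens", "l.page_views"],
        "page_stats",
        "revision_stats",
        "page_id",
    ),
]
```

The select list is spelled out rather than using `l.*, r.*` because both tables carry the join key, and a stored table cannot have two columns with the same name.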
[14:45:17] great [14:45:38] halfak: Easiest would be to write the data as a table itself [14:45:38] If I save the table as parquet, will that make it easy/hard to operate on it in spark? [14:45:50] CREATE TABLE blah AS SELECT ... [14:46:01] halfak: easy [14:46:25] hm, can't remember the version of spark there is on that cluster [14:46:53] Cool [14:47:18] halfak: If you go for hive and create a new external table, you probably want to specify the file format as well [14:49:49] halfak: any change on the research cluster ? [14:49:58] halfak: Seems I can't login anymore :( [14:50:15] Hmm... Weird. I know we switched to a new workbench, but that was a couple months ago. [14:50:25] Can you tell me what domain/port you're sshing to? [14:50:41] I had ia.z42 [14:50:52] Seems to be waterloo-ia now, right ? [14:51:02] port 1237 [14:51:06] woops no [14:51:30] waterloo.z43.altiscale.com:1450 [14:51:37] yessir !b [14:51:43] Updating ssh config [14:56:32] Arg! I need to figure out how to block off more time for this work. :/ [14:56:38] * halfak goes on to the next thing. [15:04:53] halfak: I sent an email, can't login to the new workbench (as if my key wasn't installed) [15:05:08] The /home mount is 100% full [15:05:11] That might be the problem. [15:05:16] Arfff [15:05:18] I'm emailing people about it right now [15:05:26] You rock :) [15:05:27] This happens to me every time I log in to get work done. [15:05:32] >:( [15:05:38] hmm [15:10:19] YuviPanda: plop plop ? [15:30:19] The data transfers! [15:41:04] Nooo! Flattening. [15:41:05] :( [15:41:10] I have to flatten my new JSON. [15:41:30] joal, just checking -- any thoughts on ways to not need to flatten JSON for querying in HIVE? [15:41:37] halfak: hmmmm [15:41:48] halfak: Having a schema that contains a struct [15:41:58] Darn [15:42:11] I wish it could just pretend that it is flat and replace levels with a special char. 
[15:42:15] halfak: I think the json-transformer can flatten at transform/sort time [15:42:31] joal, is this the serde? [15:42:49] nonono - the one to xml2json [15:43:11] Oh yeah. I'm working with the outputs of my persistence jobs. [15:43:15] But it doesn't help really if the json has already been generated [15:43:16] So, it's already JSON [15:43:21] ok, makes sense [15:43:40] hmm, if structure not too complex, using hive sub-struct could do [15:43:44] halfak: --^ [15:44:07] I suppose we'll want a "flatten.in_hadoop" script that will flatten JSON and give hierarchy with a special char. [15:44:32] e.g. page: {id: ...} --> page_id: ... [15:44:55] halfak: feasible [15:45:20] halfak: I would even say: easily feasible for most cases [15:46:22] jdlrobson: browser data visualization: https://phabricator.wikimedia.org/T118329 [15:47:51] jdlrobson: but note that data is available in cluster: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/BrowserReports [15:50:42] HELLO JETLAG! [15:51:36] ottomata: jajajaja [15:53:35] Analytics-Kanban: Gather metrics about cluster usage - https://phabricator.wikimedia.org/T121783#1927556 (JAllemandou) a:JAllemandou [16:03:46] * joal is away for a bit [16:06:18] Aaaand I could write and test a flattener if I could install anything on the machine but since there is no disk space, I'm fully blocked [16:06:19] Weeee [16:06:56] nuria, sorry! forgot [16:07:09] mforns: np [16:07:31] nuria, joining [16:07:37] mforns: k [16:08:11] halfak: whatchu talkin bout? [16:08:23] altiscale workbench server [16:08:45] "I know, let's give 40 people 10GB to work with. That'll teach them to play nicely together." [16:08:58] But in practice, one guy fills up all of the space and then goes on vacation for a week. [16:09:01] EVERY WEEK [16:09:32] Also, apparently at least 5GB of this data must be there no matter what. 
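Editor's sketch of the flattening idea from 15:44 (`page: {id: ...} --> page_id: ...`). It is a minimal pure-Python illustration, not the "flatten.in_hadoop" script mentioned in the log, which was not written during this conversation. The separator default and the choice to leave lists untouched are assumptions.

```python
import json


def flatten(doc, sep="_", prefix=""):
    """Flatten nested dicts, joining the key path with `sep`.

    Lists and scalar values are copied unchanged. If a flattened name
    collides with an existing top-level key, the later one wins.
    """
    flat = {}
    for key, value in doc.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


def flatten_lines(lines, sep="_"):
    """Flatten an iterable of JSON-lines records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(flatten(json.loads(line), sep=sep), sort_keys=True)
```

With these definitions, `flatten({"page": {"id": 12, "title": "A"}, "rev_id": 7})` returns `{"page_id": 12, "page_title": "A", "rev_id": 7}`. Passing `sys.stdin` to `flatten_lines` and printing each result would give a line-in, line-out filter of the kind usable as a streaming mapper.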
[16:15:40] Analytics-Cluster, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts missing top1000 files after 2016-01-01: rsync fails - https://phabricator.wikimedia.org/T122864#1927593 (Ottomata) a:ezachte>Ottomata [16:15:55] Analytics-Cluster, Analytics-Kanban, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts missing top1000 files after 2016-01-01: rsync fails - https://phabricator.wikimedia.org/T122864#1915529 (Ottomata) [16:21:49] qchris: hiya! yt? [16:21:54] https://phabricator.wikimedia.org/T122864 [16:22:01] i'm looking for what creates the top1000 files [16:22:09] but not finding it in refinery or puppet [16:22:13] There, but in a meeting. [16:22:36] Ezachte ran some strips to extract some parts of those files. [16:22:49] (That's unpuppetized and outside of refinery) [16:23:00] Let me check my emails after my meeting. I'll look it up. [16:23:09] ah! [16:23:11] ok [16:23:26] oook [16:23:27] s/strips/scripts/ [16:23:29] i know what's wrong then [16:23:34] he can't write into this dir [16:23:35] fixing [16:23:37] the new 2016 dir [16:24:31] thanks [16:25:22] Analytics-Cluster, Analytics-Kanban, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts missing top1000 files after 2016-01-01: rsync fails - https://phabricator.wikimedia.org/T122864#1927612 (Ottomata) Ok, yeah, I changed the perms on the new 2016 directory so you should be able to write... 
[16:25:53] Analytics-Cluster, Analytics-Kanban, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts missing top1000 files after 2016-01-01: rsync fails - https://phabricator.wikimedia.org/T122864#1927621 (Ottomata) a:Ottomata>ezachte [16:26:32] Analytics-Kanban, operations, HTTPS: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1927624 (Ottomata) a:Ottomata [16:39:03] a-team: have management meeting today , i have sent the e-scrum [16:46:34] Analytics-Kanban, operations, HTTPS, Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1927669 (Ironholds_backup) Will this do XFF resolution or just the immediate client IP? (Vast fix either way, mind!) [16:47:15] Analytics-Kanban, operations, HTTPS, Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1927671 (Ottomata) Yes, it does XFF. It is the new canonical way of IDing client IPs, and is done in varnish for all requests. [16:56:07] Analytics-Kanban, operations, HTTPS, Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1927694 (Ironholds_backup) Cooool! Super-excited about this :D [16:58:26] ottomata: o/ [17:00:00] milimetric: http://www.parrot.com/zik/usa/ [17:02:21] mforns: yooo [17:02:29] ottomata, trying to join! [17:02:35] it's kicking you out? [17:02:44] not letting me in.. [17:02:53] mforns: i invited you again, try now? 
[17:07:34] Analytics-Kanban, Patch-For-Review: Write hive code doing pageview data anonimisation with two tables [13 pts] {hawk} - https://phabricator.wikimedia.org/T118838#1927715 (Milimetric) a:mforns>JAllemandou [17:15:34] that pageview dashboard in case you missed it in standup: https://grafana.wikimedia.org/dashboard/db/pageviews [17:23:57] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1927749 (Sadads) @krenair could you review r/263145? [17:24:50] Analytics-Backlog, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1927753 (Krenair) To request reviews you need to use the 'Add reviewer' form in Gerrit [17:31:41] Analytics-Kanban, operations, HTTPS, Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1927780 (Ottomata) Ok, should be deployed. Can someone verify that new data looks good? [17:35:16] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1927791 (Ottomata) I believe we can close this task, ja? Got a few defined here: https://github.com/wikimedia/mediawiki-event-s... [17:36:32] ottomata: so this is the script that is now slowed to a crawl: https://github.com/wikimedia/operations-puppet/blob/f0df1ec45b3f70a5c041cef217751014824ca6ec/files/mariadb/eventlogging_sync.sh [17:36:52] last time it was stuck yuvi tailed the log and saw SQL statements happening, just very slowly [17:37:29] milimetric: do we know why we aren't using regular replication? 
[17:37:52] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1927800 (Eevans) >>! In T116247#1927791, @Ottomata wrote: > I believe we can close this task, ja? Got a few defined here: https... [17:37:53] not sure the root reason, other than this is how sean did it and we haven't had time to fix/improve [17:42:44] milimetric: https://github.com/wikimedia/operations-puppet/commit/f0df1ec45b3f70a5c041cef217751014824ca6ec [17:44:44] ottomata: makes sense, but doesn't make sense why it's lagging [17:45:11] ja [17:45:12] am looking [17:45:40] not sure how the script connects to mysql...no pw or user given [17:46:39] there's probably a .myconfig file in the right place [17:47:00] yup :) [17:47:40] milimetric: i think it's lagging because of a query [17:47:48] research | 10.64.36.103:54339 | log | Query | 5607 | Queried about 960000 rows | select day, [17:47:48] sum(if(success, repeated, 0)) / sum(repeated) as 'total', [17:47:48] sum(if(success | 0.000 | [17:47:48] ... [17:47:59] not totally sure [17:48:20] hmm, dunno though [17:48:32] maybe not [17:48:37] how long has this been happening? a few days? [17:52:25] milimetric: it looks to be about 5 or 6 hours behind, is that right? [17:57:36] sorry ottomata :) [17:57:44] it's a few hours behind for some tables [17:57:47] but 2 days behind for others [17:57:59] I only looked at 15 of the tables [17:58:35] ottomata: that query's only been running for 2 hours and hasn't seen *that* many rows [17:59:43] yeah [18:00:14] the box is very busy though [18:00:25] ok, so it's load then, which makes sense [18:00:29] looks like io [18:00:30] but I wonder what it's busy with [18:00:32] mostly [18:00:47] is it ok on disk space? 
[18:01:38] ja [18:01:40] 1.9 T avail [18:01:50] hmhmhm [18:02:50] hmm [18:03:21] only 50% loaded, but 30ish percent of that is iowait [18:06:57] hmm, milimetric don't think this is the problem though [18:07:02] not much change over the last week [18:07:15] or month [18:07:29] we're going to the office, I'll be back on later and keep thinking about it [18:07:45] hm, there is a very small bump up in cpu utilization, but not really significant i think [18:20:19] joal: let me know if you'll find any good suspect for the jobs failing in the cluster (with failing == needed to be restarted, still not 100% sure about the terminology) [18:20:31] Sure elukey :) [18:20:36] thanks! [18:27:39] joal: can you help me look at this for a bit? [18:27:40] https://gerrit.wikimedia.org/r/#/c/216341/10/oozie/last_access_uniques/monthly/last_access_uniques_monthly.hql [18:27:52] * joal reads [18:29:20] nuria: That's a complex piece of hive ! [18:29:35] I *think* i figured out what is wrong [18:29:46] batcave for discussion/explanation? [18:29:50] joal: let me fix it and i will bother you later if it doesn't work [18:30:02] k nuria [18:30:22] I also have a quick comment about normalized uri_host :) [18:30:28] nuria: --^ [18:30:42] yes [18:39:04] nuria: If you have time, I think explanations are worth :) [18:39:45] joal: i am omw to batcave to talk with madhu about query, want to join? [18:39:53] yup ! [18:50:47] leila fyi: https://phabricator.wikimedia.org/T121727#1928163 [18:51:35] I'll prepare a card jdlrobson. thanks. [18:51:45] by when do you need it jdlrobson? [18:52:16] even if just a placeholder - Adam said we are chatting with you tomorrow and it would be useful to have an artifact we can flesh out in that meeting [18:52:34] sounds good, jdlrobson. 
[18:52:37] We just want to get a sense of the amount of work you need from us (i'm assuming we'll work that out tomorrow) [18:52:45] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1928176 (Aklapper) [18:52:46] Analytics-Tech-community-metrics, DevRel-January-2016: Many profiles on profile.html do not display identity's name though data is available - https://phabricator.wikimedia.org/T117871#1928177 (Aklapper) [18:52:48] Analytics-Tech-community-metrics, DevRel-January-2016: Empty "subject" and "creator" fields for mailing list thread on mls.html - https://phabricator.wikimedia.org/T116284#1928178 (Aklapper) [18:52:50] Analytics-Tech-community-metrics, DevRel-January-2016: Time axis on repository.html only displays two months, repeated several items - https://phabricator.wikimedia.org/T115872#1928179 (Aklapper) [18:52:52] Analytics-Tech-community-metrics, DevRel-January-2016: NaN values for certain "Time from last patchset" values - https://phabricator.wikimedia.org/T115871#1928180 (Aklapper) [18:52:54] Analytics-Tech-community-metrics, DevRel-January-2016: "Age of open changesets by Affiliation" has some "NaN" values - https://phabricator.wikimedia.org/T110875#1928182 (Aklapper) [18:52:56] Analytics-Tech-community-metrics, DevRel-January-2016: Affiliations and country of resident should be visible in Korma's user profiles - https://phabricator.wikimedia.org/T112528#1928181 (Aklapper) [18:52:58] Analytics-Tech-community-metrics, DevRel-January-2016: "Tickets" (defunct Bugzilla) vs "Maniphest" sections on korma are confusing - https://phabricator.wikimedia.org/T106037#1928183 (Aklapper) [18:53:00] Analytics-Tech-community-metrics, DevRel-January-2016: Contributor pages which show user name but not any other data should include an explanation - https://phabricator.wikimedia.org/T58111#1928186 (Aklapper) [18:53:02] 
Analytics-Tech-community-metrics, DevRel-January-2016: Legend for "review time for reviewers" and other strings on repository.html - https://phabricator.wikimedia.org/T103469#1928184 (Aklapper) [18:54:15] Analytics-Backlog, Analytics-Wikistats, DevRel-January-2016: Clean the code review queue of analytics/wikistats - https://phabricator.wikimedia.org/T113695#1928189 (Aklapper) [19:15:39] Analytics-Tech-community-metrics, Developer-Relations, DevRel-January-2016: Check whether it is true that we have lost 40% of (Git) code contributors in the past 12 months - https://phabricator.wikimedia.org/T103292#1928303 (Aklapper) [19:21:21] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 6 others: Reliable publish / subscribe event bus - https://phabricator.wikimedia.org/T84923#1928329 (mobrovac) [19:21:27] Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1928325 (mobrovac) Open>Resolved Indeed. We are done here. [19:29:15] Analytics: Backfill pageview_hourly sanitization - 1 month - [8 pts] {hawk} - DUPLICATE THIS TASK FOR EACH MONTH TO BACKFILL - https://phabricator.wikimedia.org/T118842#1928360 (ggellerman) [19:29:17] Analytics: Deploy pageview sanitization and start ongoing process [5 pts] {hawk} - https://phabricator.wikimedia.org/T118841#1928361 (ggellerman) [19:29:19] Analytics: Productionize hive code with Oozie job and refinery inclusion [8 pts] {hawk} - https://phabricator.wikimedia.org/T118839#1928362 (ggellerman) [19:29:20] Analytics: Expose the results of the global metric at a public link, that's available immediately for the API {kudu} [8 pts] - https://phabricator.wikimedia.org/T118310#1928363 (ggellerman) [19:29:22] Analytics: Projections of cost and scaling for pageview API. 
{hawk} [8 pts] - https://phabricator.wikimedia.org/T116097#1928364 (ggellerman) [19:29:24] Analytics: Doc cleanup day 2.0 {flea} [15 pts] - https://phabricator.wikimedia.org/T112024#1928365 (ggellerman) [19:29:26] Analytics, Research-and-Data: Research Spike: Article Title normalization contains weird chars [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1928366 (ggellerman) [19:30:15] Analytics: Add instruction text next to the input fields in the Program Global Metrics Report {kudu} - https://phabricator.wikimedia.org/T121899#1928390 (ggellerman) [19:30:16] Analytics: Productionize last access jobs for daily and monthly calculations {bear} - https://phabricator.wikimedia.org/T122514#1928389 (ggellerman) [19:30:18] Analytics: 'is_spider' column in eventlogging user agent data {flea} - https://phabricator.wikimedia.org/T121550#1928391 (ggellerman) [19:30:20] Analytics, Analytics-Cluster, Patch-For-Review: Single Kafka partition replica periodically lags - https://phabricator.wikimedia.org/T121407#1928392 (ggellerman) [19:30:22] Analytics: Create Pageview API dashboard to monitor response times - https://phabricator.wikimedia.org/T121277#1928393 (ggellerman) [19:30:24] Analytics, Analytics-Wikimetrics: Include all timezones in global metrics report interface {kudu} - https://phabricator.wikimedia.org/T121167#1928394 (ggellerman) [19:30:26] Analytics: Use a new approach to compute monthly top 1000 articles (brute force in hive doesn't work) - https://phabricator.wikimedia.org/T120113#1928396 (ggellerman) [19:30:28] Analytics: Change the Pageview API's RESTBase docs for the top endpoint - https://phabricator.wikimedia.org/T120019#1928397 (ggellerman) [19:30:30] Analytics: ==== Immediate Above ==== - https://phabricator.wikimedia.org/T115634#1928399 (ggellerman) [19:30:33] Analytics, Project-Creators: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest - https://phabricator.wikimedia.org/T119151#1928398 (ggellerman) 
[19:30:35] Analytics, Analytics-EventLogging: Upgrade eventlogging servers to Jessie - https://phabricator.wikimedia.org/T114199#1928400 (ggellerman) [19:30:37] Analytics: Create new table for 'referer' aggregated data - https://phabricator.wikimedia.org/T112284#1928401 (ggellerman) [19:30:39] Analytics: Host a debrief of EventLogging cleanup {tick} - https://phabricator.wikimedia.org/T104351#1928403 (ggellerman) [19:30:49] Analytics: Story: VitalSignsUser selects Monthly Pageviews metric - https://phabricator.wikimedia.org/T75331#1928414 (ggellerman) [19:31:48] Analytics, Analytics-Wikimetrics: Bot to call global metrics to event page {kudu} - https://phabricator.wikimedia.org/T120330#1928441 (ggellerman) [19:31:50] Analytics, Ops-Access-Requests, operations: add mforns, milimetric, nuria,ottomata, madhuvishy and joal to piwik-roots - https://phabricator.wikimedia.org/T122325#1928442 (ggellerman) [19:31:52] Analytics: Define what constitutes a search pageview - https://phabricator.wikimedia.org/T120249#1928443 (ggellerman) [19:31:55] Analytics, WMDE-Analytics-Engineering: Provide machine readable directory indexes on http://datasets.wikimedia.org/aggregate-datasets/ - https://phabricator.wikimedia.org/T117480#1928446 (ggellerman) [19:31:57] Analytics: Create cron on 1002 to remove CirrusSearchRequest partitions - https://phabricator.wikimedia.org/T119897#1928444 (ggellerman) [19:31:59] Analytics, Privacy, Varnish: Connect Hadoop records of the same request coming via different channels - https://phabricator.wikimedia.org/T113817#1928449 (ggellerman) [19:32:03] Analytics, Beta-Cluster-Infrastructure, Deployment-Systems, Services, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1928448 (ggellerman) [19:32:08] Analytics, Deployment-Systems, Services, operations, Scap3: Deploy AQS with scap3 - https://phabricator.wikimedia.org/T114999#1928450 (ggellerman) [19:32:10] Analytics, Discovery: Display automata and humans separately on zero 
results rate graph - https://phabricator.wikimedia.org/T112846#1928456 (ggellerman) [19:32:12] Analytics: Re-baselining checkpoints periodically - https://phabricator.wikimedia.org/T112009#1928457 (ggellerman) [19:32:15] Analytics, Analytics-EventLogging, Performance-Team, Patch-For-Review: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#1928461 (ggellerman) [19:32:17] Analytics, Wikimedia-Developer-Summit-2016: Developer summit session: Pageview API from the Event Bus perspective - https://phabricator.wikimedia.org/T112956#1928455 (ggellerman) [19:32:19] Analytics, DBA: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1928462 (ggellerman) [19:32:23] Analytics: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#1928469 (ggellerman) [19:32:26] Analytics, Fundraising-Analysis, Research-and-Data-Archive: What's our projected ability to fundraise in the coming years - https://phabricator.wikimedia.org/T107606#1928471 (ggellerman) [19:32:29] Analytics, MediaWiki-API, Reading-Infrastructure-Team, Research-and-Data, and 3 others: Publish detailed Action API request information to Hadoop - https://phabricator.wikimedia.org/T108618#1928470 (ggellerman) [19:32:31] Analytics, Research consulting, Research-and-Data: Too few page views for June/July 2015 - https://phabricator.wikimedia.org/T106034#1928472 (ggellerman) [19:32:33] Analytics, Research management, Research-and-Data: Pipeline for data-intensive applications from research to productization to integration - https://phabricator.wikimedia.org/T105815#1928473 (ggellerman) [19:32:43] Analytics, Need-volunteer: Validate JsonSchemaContent using MediaWIki core's handling - https://phabricator.wikimedia.org/T76432#1928483 (ggellerman) [19:33:04] madhuvishy: did you try merging into the secret repo? 
[19:33:25] YuviPanda: not yet - will do in a few minutes [19:33:28] working on it [19:33:55] madhuvishy: cool cool [19:34:00] Analytics-Engineering, Analytics-Visualization: PM shares a deep link into Limn Dashboard [8 pts] - https://phabricator.wikimedia.org/T78743#1928502 (ggellerman) Open>declined [19:34:02] Analytics-Engineering: Investigate an automated way to load the data into warehouse - https://phabricator.wikimedia.org/T78099#1928503 (ggellerman) Open>declined [19:34:03] Analytics-Engineering: EPIC: data warehouse - https://phabricator.wikimedia.org/T76382#1928504 (ggellerman) [19:34:05] Analytics-Engineering: Update Gerrit Documentation - https://phabricator.wikimedia.org/T77059#1928508 (ggellerman) Open>declined [19:34:07] Analytics-Engineering: EPIC: data warehouse - https://phabricator.wikimedia.org/T76382#799107 (ggellerman) [19:34:09] Analytics-Kanban: {puma} Mondrian & Saiku - https://phabricator.wikimedia.org/T76739#1928511 (ggellerman) [19:34:11] Analytics-Engineering: LDAP authentication - https://phabricator.wikimedia.org/T76756#1928509 (ggellerman) Open>declined [19:34:13] Analytics-Engineering: puppetize Pentaho - https://phabricator.wikimedia.org/T76755#1928510 (ggellerman) Open>declined [19:34:15] Analytics-Kanban: {puma} Mondrian & Saiku - https://phabricator.wikimedia.org/T76739#819159 (ggellerman) [19:34:17] Analytics-Kanban: {puma} Mondrian & Saiku - https://phabricator.wikimedia.org/T76739#819159 (ggellerman) [19:34:19] Analytics-Engineering: EPIC: data warehouse - https://phabricator.wikimedia.org/T76382#799107 (ggellerman) [19:36:37] a-team: quick question on phab cleanup [19:36:40] (which we're doing now) [19:36:43] thanks Grace! [19:36:50] :) [19:36:57] so question: do you all care if Backlog and Incoming get merged? 
[19:37:12] all the Incoming is really tasks added since last time we all groomed [19:37:17] so we'd have to groom those and the backlog anyway [19:37:17] hmmm, i think we've looked through everything in backlog once [19:37:22] right [19:37:22] but not incoming [19:37:26] ok [19:37:35] we'll keep them separate then [19:37:40] although i don't know if we remember anything [19:37:42] I'm ok with both [19:38:09] or we could use a rubicon [19:38:35] Analytics, Analytics-Wikimetrics: Once public a report cannot be made private - https://phabricator.wikimedia.org/T113452#1928591 (ggellerman) [19:38:35] Analytics, Analytics-Wikimetrics: Cannot remove invalid members from cohort - https://phabricator.wikimedia.org/T113454#1928590 (ggellerman) [19:38:38] Analytics, Wikimedia-Mailing-lists: home page for the analytics mailing list should link to gmane - https://phabricator.wikimedia.org/T116740#1928589 (ggellerman) [19:38:40] Analytics, Analytics-Wikimetrics: Accented letters seem to be rejected in cohort names - https://phabricator.wikimedia.org/T111611#1928592 (ggellerman) [19:38:43] Analytics, operations, Privacy: Honor DNT header for access logs & varnish logs - https://phabricator.wikimedia.org/T98831#1928593 (ggellerman) [19:38:45] Analytics, Analytics-Cluster: Investigate getting redirect_page_id as an x_analytics field using the X analytics extension. 
{pika} - https://phabricator.wikimedia.org/T89397#1928594 (ggellerman) [19:38:47] Analytics, Analytics-EventLogging: Story: Analytics Eng can monitor database replication lag - https://phabricator.wikimedia.org/T86136#1928596 (ggellerman) [19:39:00] Analytics, Analytics-Dashiki: Pageview data files for mobile breakdowns: absence of data should not be represented as 'zero' - https://phabricator.wikimedia.org/T78025#1928614 (ggellerman) [19:39:02] Analytics, Wikimedia-Logstash: Kafka logging to Logstash - https://phabricator.wikimedia.org/T84907#1928613 (ggellerman) [19:39:04] Analytics: Provide a ua-parser service (ua-parser.wikimedia.org) using the One True Wikimedia UA-Parser™ - https://phabricator.wikimedia.org/T1336#1928617 (ggellerman) [19:39:06] Analytics, Analytics-Wikimetrics, Community-Wikimetrics, Easy, and 2 others: User reads result of validation after creating a cohort - https://phabricator.wikimedia.org/T76914#1928615 (ggellerman) [19:39:08] Analytics: Don't accept data from automated bots in Event Logging - https://phabricator.wikimedia.org/T67508#1928616 (ggellerman) [19:41:09] Analytics, Analytics-Cluster, Traffic, operations: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#1928633 (ggellerman) [19:41:11] Analytics, Analytics-Wikimetrics: Central repository of global metrics reports {kudu} - https://phabricator.wikimedia.org/T121286#1928634 (ggellerman) [19:41:13] Analytics: Send burrow lag statistics to statsd/graphite - https://phabricator.wikimedia.org/T120852#1928636 (ggellerman) [19:41:15] Analytics, Analytics-Wikimetrics: Display global metrics report results on same page as report inputs {kudu} - https://phabricator.wikimedia.org/T121262#1928635 (ggellerman) [19:41:17] Analytics: delete useless wikimetrics.report records - https://phabricator.wikimedia.org/T120713#1928637 (ggellerman) [19:41:19] Analytics, Easy: Standardize logic, names, and null handling across UDFs in refinery-source {hawk} -
https://phabricator.wikimedia.org/T120131#1928638 (ggellerman) [19:41:21] Analytics, Analytics-Cluster: Upgrade to CDH 5.5 - https://phabricator.wikimedia.org/T119646#1928642 (ggellerman) [19:41:23] Analytics: Vital Signs: Please provide an "all languages" de-duplicated stream for the Community/Content groups of metrics - https://phabricator.wikimedia.org/T120037#1928641 (ggellerman) [19:41:25] Analytics: Vital Signs: Please make the data for enwiki and other big wikis less sad, and not just be missing for most days - https://phabricator.wikimedia.org/T120036#1928640 (ggellerman) [19:41:27] Analytics, Analytics-Wikimetrics, Easy: Upgrade wikimetrics code to check labs lag table - https://phabricator.wikimedia.org/T119514#1928645 (ggellerman) [19:41:29] Analytics: Track pageview stats for outreach.wikimedia.org - https://phabricator.wikimedia.org/T118987#1928647 (ggellerman) [19:41:39] Analytics, Analytics-Dashiki: Fix annotation date parsing in Firefox {crow} - https://phabricator.wikimedia.org/T112273#1928661 (ggellerman) [19:41:39] Analytics: Update passport-mediawiki module URLs and documentation - https://phabricator.wikimedia.org/T113234#1928663 (ggellerman) [19:41:41] Analytics: Need a Dashiki namespace so we can protect configs {crow} - https://phabricator.wikimedia.org/T112268#1928664 (ggellerman) [19:41:43] Analytics: Allow clicking on links in Dashiki annotations - https://phabricator.wikimedia.org/T110459#1928665 (ggellerman) [19:41:45] Analytics, Analytics-Cluster: Create and maintain an Analytics Cluster in Beta Cluster in labs. 
- https://phabricator.wikimedia.org/T109859#1928666 (ggellerman) [19:41:47] Analytics, Analytics-EventLogging, Privacy: Opt-out from logging some of the default EventLogging fields - https://phabricator.wikimedia.org/T108757#1928667 (ggellerman) [19:41:49] Analytics, Analytics-EventLogging: Send raw server side events to Kafka using a PHP Kafka Client - https://phabricator.wikimedia.org/T106257#1928672 (ggellerman) [19:41:51] Analytics, Research consulting, Research-and-Data: Workshop to teach analysts, etc about Quarry, Hive, Wikimetrics and EL {flea} - https://phabricator.wikimedia.org/T105544#1928673 (ggellerman) [19:41:53] Analytics: RUBICON - https://phabricator.wikimedia.org/T105515#1928676 (ggellerman) [19:41:56] Analytics, Analytics-Dashiki, Editing-Analysis, VisualEditor, Patch-For-Review: Improve the edit analysis dashboard {lion} - https://phabricator.wikimedia.org/T104261#1928677 (ggellerman) [19:41:59] Analytics, Labs, Labs-Infrastructure: Report page views for labs instances - https://phabricator.wikimedia.org/T103726#1928678 (ggellerman) [19:42:01] Analytics, Analytics-Cluster: Deploy oozie reporting of last-access counts {bear} - https://phabricator.wikimedia.org/T103376#1928683 (ggellerman) [19:42:03] Analytics, Analytics-EventLogging: Analytics Eng monitors consumed EL events in Graphite (valid & write rate) {oryx} - https://phabricator.wikimedia.org/T97295#1928688 (ggellerman) [19:42:05] Analytics, Analytics-Cluster, Easy: Add better detection of wikipediaApp to user agent UDF {hawk} - https://phabricator.wikimedia.org/T96376#1928689 (ggellerman) [19:42:07] Analytics, Analytics-EventLogging: Analytics Eng monitors for death of EL processes in syslog {oryx} - https://phabricator.wikimedia.org/T97296#1928687 (ggellerman) [19:42:11] Analytics, Analytics-Cluster: Productionize Impala {hawk} - https://phabricator.wikimedia.org/T96331#1928694 (ggellerman) [19:42:13] Analytics, Analytics-Cluster: Report pageviews to the annual report - 
https://phabricator.wikimedia.org/T95573#1928696 (ggellerman) [19:42:23] Analytics, Analytics-Cluster: Process new PV hourly dumps for Vital Signs {hawk} - https://phabricator.wikimedia.org/T94592#1928710 (ggellerman) [19:42:24] Analytics, Analytics-Visualization: Mobile PMs has visualization on session-related metrics from Wikipedia Apps - https://phabricator.wikimedia.org/T94481#1928711 (ggellerman) [19:42:26] Analytics, Analytics-Wikimetrics: Troubleshoot Wikimetrics RAE reports - https://phabricator.wikimedia.org/T93217#1928716 (ggellerman) [19:42:28] Analytics, Analytics-Cluster: decide how to monitor the cluster {hawk} - https://phabricator.wikimedia.org/T91991#1928720 (ggellerman) [19:42:30] Analytics: Dashboard Directory Design Feedback - https://phabricator.wikimedia.org/T92502#1928718 (ggellerman) [19:42:32] Analytics: Publish aggregate geodumps of article pageviews - https://phabricator.wikimedia.org/T91331#1928721 (ggellerman) [19:42:35] Analytics, Analytics-EventLogging, MediaWiki-extensions-MultimediaViewer: Parse mediaviewer team's requirements for EventLogging {oryx} - https://phabricator.wikimedia.org/T90766#1928722 (ggellerman) [19:42:37] Analytics, Analytics-Cluster, Scrum-of-Scrums: Create Daily & Monthly pageview dump with country data - https://phabricator.wikimedia.org/T90759#1928724 (ggellerman) [19:42:39] Analytics, Analytics-Cluster: Estimate roughly of how many users might not have javascript capable/enable browsers, use CSS to crosscheck. 
- https://phabricator.wikimedia.org/T89847#1928728 (ggellerman) [19:42:41] Analytics: Analyze difference in Edit Schema "bounce rates" across wikis {lion} - https://phabricator.wikimedia.org/T89726#1928729 (ggellerman) [19:42:52] Analytics, Analytics-Dashiki, Analytics-Visualization: Build low level visualization of the paths through the application (starburst) in Dashiki - https://phabricator.wikimedia.org/T88374#1928744 (ggellerman) [19:42:53] Analytics, Analytics-Dashiki, Analytics-Visualization: Build high level timeseries view of key metrics (save / ready rate) in Dashiki - https://phabricator.wikimedia.org/T88373#1928745 (ggellerman) [19:42:55] Analytics, Analytics-EventLogging: Add ops-reportcard dashboard with analysis that shows the http to https slowdown on russian wikipedia - https://phabricator.wikimedia.org/T87604#1928748 (ggellerman) [19:42:57] Analytics, Analytics-Dashiki: Add time range selection to Limn dashboards (or new Dashiki dashboards) {frog} - https://phabricator.wikimedia.org/T87603#1928749 (ggellerman) [19:43:06] Analytics, Analytics-EventLogging, Privacy: Allow opting out from logging some of the default EventLogging fields on a schema-by-schema basis - https://phabricator.wikimedia.org/T108757#1928770 (Deskana) [19:43:25] Analytics, Analytics-EventLogging, Privacy: Allow opting out from logging some of the default EventLogging fields on a schema-by-schema basis - https://phabricator.wikimedia.org/T108757#1529694 (Deskana) Reworded the title to try to capture the intent of the task based on the above discussion; feel fr... 
[19:43:52] Analytics: Move IOS team piwiki usage to production instance - https://phabricator.wikimedia.org/T123262#1928790 (ggellerman) [19:43:53] Analytics: Create a central page in wikitech to act as a central hub so users know where to go for different types of data - https://phabricator.wikimedia.org/T122970#1928792 (ggellerman) [19:43:55] Analytics: Visualize unique devices data in dashiki - https://phabricator.wikimedia.org/T122533#1928793 (ggellerman) [19:43:57] Analytics: update comScore description on report card - https://phabricator.wikimedia.org/T122059#1928795 (ggellerman) [19:44:00] Analytics, Graph, Graphoid: Add usage to Grafana - https://phabricator.wikimedia.org/T122226#1928794 (ggellerman) [19:44:02] Analytics: We may be missing some more spiders when tagging pageviews {slug} - https://phabricator.wikimedia.org/T121934#1928796 (ggellerman) [19:44:04] Analytics, Analytics-Kanban, Patch-For-Review: Change analytics kafka cluster JMX metrics to be prefixed with cluster name and change alerts and dashboards [5 pts] - https://phabricator.wikimedia.org/T121643#1928797 (ggellerman) [19:44:06] Analytics: Spike: Investigate situation of logstash and hadoop logs - https://phabricator.wikimedia.org/T121418#1928798 (ggellerman) [19:44:08] Analytics: Make Pageview API date formats more flexible {slug} - https://phabricator.wikimedia.org/T118543#1928799 (ggellerman) [19:44:18] Analytics: Community has a Stats landing page with links - https://phabricator.wikimedia.org/T117496#1928809 (ggellerman) [19:44:18] Analytics: Make reportupdater output emtpy values when query returns no results. 
- https://phabricator.wikimedia.org/T117537#1928808 (ggellerman) [19:44:20] Analytics: Create API functionality to see the date ranges available - https://phabricator.wikimedia.org/T117361#1928811 (ggellerman) [19:46:27] three huurays for Grace :) [19:46:28] hip hip [19:46:31] * milimetric hurays [19:46:33] hip hip [19:46:35] * milimetric hurays [19:46:51] hip hip [19:46:59] Analytics, Analytics-Visualization: Mobile PMs has visualization on session-related metrics from Wikipedia Apps - https://phabricator.wikimedia.org/T94481#1928891 (Deskana) This task is still being kicked around, but I've long since moved on from the mobile space. Analytics should probably check in with... [19:47:14] milimetric: you did heavy lifting :) [19:52:30] Analytics, Analytics-Visualization, Reading-Admin: Mobile PMs has visualization on session-related metrics from Wikipedia Apps - https://phabricator.wikimedia.org/T94481#1928934 (dr0ptp4kt) [19:53:31] * joal is late byt wants to say HURRAY for grace :) [19:57:18] * joal got a few emails from phabricator [20:18:19] elukey: hola [20:23:01] Note to stat1002 users: I have a process taking a lot of resources (forgot to nice it ...) If it becomes overwhelming, let me know I'll kill it [20:30:19] madhuvishy, yt? [20:53:59] Analytics, Discovery, Maps: Add daily totals to grafana maps graph - https://phabricator.wikimedia.org/T122494#1929208 (Yurik) Solved, thanks to @krinkle: `alias(hitcount(sumSeries(varnish.$site.maps.frontend.request.client.total.rate), '24h'), 'Total')` See it live: https://grafana.wikimedia.org/d... [20:54:06] Analytics, Discovery, Maps: Add daily totals to grafana maps graph - https://phabricator.wikimedia.org/T122494#1929214 (Yurik) Open>Resolved a:Yurik [21:06:24] mforns: yup [21:06:40] hi madhuvishy [21:07:08] are you trying to use deployment-eventlogging03? 
[21:09:42] madhuvishy, I heard in stand-up you were using it [21:10:02] mforns: oh ya i was looking at logs to see if something was wrong [21:10:16] i'm not using it to test anything [21:10:42] ah ok ok :] I'm breaking it again [21:10:45] thx [21:11:20] Analytics-EventLogging, TimedMediaHandler, Wikimedia-Video: Record and report metrics for audio and video playback - https://phabricator.wikimedia.org/T108522#1929274 (Krenair) >>! In T108522#1921190, @TheDJ wrote: > @Krenair, you've done eventlogging for wikieditor at some point right ? Would you ca... [21:11:21] okay :) [21:18:49] Analytics: 'is_spider' column in eventlogging user agent data {flea} - https://phabricator.wikimedia.org/T121550#1929319 (Tbayer) Context: T117631 [21:21:59] Heyo! quick question: if I "move" a Schema: page to a new name, what happens to the corresponding database table? (will it remain as is, or will it get renamed too?) [21:22:41] dbrant: it will remain as is [21:23:19] ottomata: kthx! perfect [21:23:28] ottomata, dbrant: and probably you'll get some validation errors if you have events coming in for that schema, no? [21:23:34] Analytics, Datasets-Webstatscollector, Language-Engineering: Investigate anomalous views to pages with replacement characters - https://phabricator.wikimedia.org/T117945#1929350 (Tbayer) See also {T108867} . Possibly related, too: T104755 ("Wikimedia's URL-routing logic straddles five layers ...") [21:24:02] yeah, dbrant you probably shouldn't move the page [21:24:04] but just make a new schema [21:24:11] with a new name, leaving the old one in place [21:24:37] ottomata: mforns: i see; i'll do that, then! thanks [21:24:47] ottomata, btw :] [21:24:54] eh? :) [21:24:58] I'm trying to debug EL in beta [21:25:01] yes [21:25:02] k [21:25:04] wasssup? [21:25:28] and get errors in reading events, it says the event is a dict instead of an object (the wrapper) [21:26:31] where? [21:26:35] what are you reading with? 
[21:26:50] kafka [21:26:59] eventlogging-consumer [21:26:59] ? [21:27:06] it's the mysql-consumer, so it reads from kafka and produces to kafka [21:27:06] aha [21:27:11] ok consumer [21:27:17] it doesn't produce to kafka [21:27:20] unless its an error [21:27:22] no [21:27:26] that doesnt' produce to kafka at all, no? [21:27:30] ok, looking [21:27:32] oh sorry, no no I meant mysql [21:27:46] k [21:28:05] Analytics: Don't accept data from automated bots in Event Logging - https://phabricator.wikimedia.org/T67508#1929374 (Tbayer) Related: {T121550} [21:28:31] HmmMmmm [21:28:54] interesting! [21:28:55] yeah hm [21:29:18] ottomata, at what point is the parse method called from kafka reader? [21:29:26] it isn't [21:29:29] this is just in the processor right? [21:29:29] parse is for raw events [21:29:30] but [21:29:33] I see [21:29:43] the each event in the stream does need wrapped with Event.factory [21:29:49] looking still [21:29:51] thought I did this already... [21:30:09] (PS2) Madhuvishy: Fabric deployment setup for wikimetrics [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 [21:30:20] mmm I wonder if I didn't rebase the patch correctly... [21:30:53] ottomata, yes it's not rebased ... gosh sorry [21:31:01] ahhh, phew, ok. [21:31:20] actually, i'm reading code now and am not sure how this works... [21:31:30] hehehe [21:32:12] its working for you now mforns? [21:32:22] deploying [21:32:55] ottomata, no... [21:33:06] hMmM [21:33:08] weird, um [21:33:17] batcave? [21:33:33] sure! [21:37:55] (PS3) Madhuvishy: Fabric deployment setup for wikimetrics [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579 [21:44:22] milimetric: around? 
[21:45:13] Analytics-Cluster, Analytics-Kanban, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts missing top1000 files after 2016-01-01: rsync fails - https://phabricator.wikimedia.org/T122864#1929457 (ezachte) @Ottomata now both 2015 and 2016 have drwxr-xr-x instead of drwxrwxr-x So I can't upd... [21:48:50] Analytics-Tech-community-metrics, DevRel-February-2016: Key performance indicator: Top contributors: Should have sane Ranking algorithm which takes (un)reliability of user data into account - https://phabricator.wikimedia.org/T64221#1929463 (Aklapper) [22:30:23] milimetric: I think we should just use labs db for wikimetrics [22:30:33] if not, we'd have to do our own backups [22:30:38] which need nfs [22:40:17] Analytics-Cluster, Analytics-Kanban, Datasets-Archiving, Datasets-Webstatscollector: Mediacounts missing top1000 files after 2016-01-01: rsync fails - https://phabricator.wikimedia.org/T122864#1929627 (Ottomata) Hmm... ``` $ ls -l /mnt/hdfs/wmf/data/archive/mediacounts/daily total 8 drwxrwxr-x 7... [23:38:45] milimetric: let me know if you have a few mins [23:48:36] (PS4) Madhuvishy: Fabric deployment setup for wikimetrics [analytics/wikimetrics-deploy] - https://gerrit.wikimedia.org/r/261579