[02:26:09] <wikibugs>	 Analytics, MediaWiki-User-preferences, Tool-Labs-tools-Database-Queries: Gadget usage statistics for Portuguese Wikipedia - https://phabricator.wikimedia.org/T61480#1837499 (coren)
[09:01:59] <wikibugs>	 Analytics-Wikimetrics, Education-Program-Dashboard: I want WikiMetrics integration with the education dashboard that lets you easily pull reports about courses, institutions, etc. - https://phabricator.wikimedia.org/T92454#1837707 (awight) p:Triage>High
[09:26:54] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1837766 (jcrespo) ``` jynus@db1046:/srv$ df -h | grep /srv /dev/mapper/tank-data  1.4T  1.3T  106G  93% /srv jynus@db1046:/srv$ du -h --max-depth=2 691G ./sqldata/log 119M ./sqldata/mysql...
[09:40:35] <wikibugs>	 Analytics-Kanban: Cassandra Backfill July [5 pts] {melc} - https://phabricator.wikimedia.org/T119863#1837784 (JAllemandou) NEW
[10:38:23] <grrrit-wm>	 (PS1) Addshore: Add wikipedia ref counting script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255971 (https://phabricator.wikimedia.org/T119607)
[10:38:39] <grrrit-wm>	 (CR) Addshore: [C: 2 V: 2] Add wikipedia ref counting script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255971 (https://phabricator.wikimedia.org/T119607) (owner: Addshore)
[10:56:05] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1837894 (jcrespo) This is a list of the first record on db1046 for each table:   ``` mysql -A -BN -h db1046 log -e "SELECT table_name FROM information_schema.columns WHERE column_name='ti...
[11:08:45] <grrrit-wm>	 (PS1) Addshore: Add instanceof tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255975 (https://phabricator.wikimedia.org/T119074)
[11:08:48] <wikibugs>	 Analytics-Backlog, Research consulting, Research-and-Data-Archive: Analysis on traffic through the HTTPS transition - https://phabricator.wikimedia.org/T102431#1837912 (Aklapper) Open>Resolved >>! In T102431#1675396, @ellery wrote: > @Aklapper this task is complete.   @ellery: Is there a reason t...
[11:09:14] <grrrit-wm>	 (CR) Addshore: [C: 2 V: 2] Add instanceof tracking script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255975 (https://phabricator.wikimedia.org/T119074) (owner: Addshore)
[11:13:45] <grrrit-wm>	 (PS1) Joal: Update changelog.md for v0.0.23 deployment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255976
[11:17:47] <grrrit-wm>	 (PS1) Addshore: Rename wikidata_ -> wp_ ref script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255977
[11:17:55] <grrrit-wm>	 (CR) Addshore: [C: 2] Rename wikidata_ -> wp_ ref script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255977 (owner: Addshore)
[11:18:14] <grrrit-wm>	 (Merged) jenkins-bot: Rename wikidata_ -> wp_ ref script [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255977 (owner: Addshore)
[11:20:31] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1837970 (ori) >>! In T119380#1830707, @jcrespo wrote: > I have just one question, when and who decides when new tables are to be created within a schema?  At the moment it is done manuall...
[11:31:57] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1837986 (jcrespo) > If we get agreement on T119144, we could potentially drop the clientIp column (varchar(191)) from all tables.  Dropping columns is not an investment work persuing. Par...
[11:33:26] <icinga-wm>	 PROBLEM - DPKG on gadolinium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:43:04] <icinga-wm>	 PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: Puppet has 1 failures
[11:45:15] <icinga-wm>	 RECOVERY - DPKG on gadolinium is OK: All packages OK
[11:46:55] <icinga-wm>	 RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:54:24] <mforns>	 joal, I just wanted to ask about the scala code
[11:54:41] <mforns>	 do you want to continue pairing?
[11:55:19] <joal>	 mforns: depends :)
[11:55:42] <joal>	 mforns: I'd like to move forward some other things, but I'd also like to pair :)
[11:55:53] <mforns>	 aha
[11:56:12] <mforns>	 if you want I have also things to do, currently backfilling EL
[11:56:16] <joal>	 mforns: Let's discuss about the path forward, then I'll let you code and possibly get back to it when the other things I want to work will be done :)
[11:56:26] <joal>	 ok mforns
[11:56:31] <mforns>	 ok
[11:56:42] <joal>	 mforns: backfilling didn't automatically work ?
[11:56:52] <mforns>	 joal, have you added more modifications to the scala code?
[11:57:01] <mforns>	 backfilling... no... :(
[11:57:35] <joal>	 mforns: batcave?
[11:57:43] <mforns>	 ok
[12:39:16] <grrrit-wm>	 (PS1) Joal: Add LRUCache to webrequest spider identification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255986
[13:42:20] <grrrit-wm>	 (PS2) Joal: Update changelog.md for v0.0.23 deployment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255976
[13:45:18] <grrrit-wm>	 (CR) Joal: "Tested on hive with for isSpider only simple group by: new computation time between 2/3 and 1/2 of the previous one." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255986 (owner: Joal)
[13:52:12] <grrrit-wm>	 (PS1) Addshore: Add script to count refs by type [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255991 (https://phabricator.wikimedia.org/T119777)
[13:52:51] <grrrit-wm>	 (PS2) Addshore: Add script to count refs by type [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255991 (https://phabricator.wikimedia.org/T119777)
[14:09:42] <grrrit-wm>	 (PS3) Addshore: Add script to count refs by type [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255991 (https://phabricator.wikimedia.org/T119777)
[14:11:48] <grrrit-wm>	 (PS4) Addshore: Add script to count refs by type [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255991 (https://phabricator.wikimedia.org/T119777)
[14:15:21] <grrrit-wm>	 (PS1) Addshore: Count total statements in statements_per_entity [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255995
[14:16:00] <grrrit-wm>	 (CR) Addshore: [C: 2 V: 2] Add script to count refs by type [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255991 (https://phabricator.wikimedia.org/T119777) (owner: Addshore)
[14:16:11] <grrrit-wm>	 (CR) Addshore: [C: 2 V: 2] Count total statements in statements_per_entity [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/255995 (owner: Addshore)
[14:28:51] <wikibugs>	 Analytics, Traffic, operations, Patch-For-Review: Replace Analytics XFF/client.ip data with X-Client-IP - https://phabricator.wikimedia.org/T118557#1838151 (BBlack) With varnishkafka-1.0.7 deployed and the patch above merged, the webrequest stream now has a correct "client_ip" field that analytics...
[14:29:58] <wikibugs>	 Analytics, Traffic, operations, Patch-For-Review: Replace Analytics XFF/client.ip data with X-Client-IP - https://phabricator.wikimedia.org/T118557#1838153 (BBlack) (and, I just read @ottomata's comment above - we can certainly switch the data into "ip" instead of "client_ip".  That might be simple...
[14:33:07] <wikibugs>	 Analytics, Traffic, operations, Patch-For-Review: Replace Analytics XFF/client.ip data with X-Client-IP - https://phabricator.wikimedia.org/T118557#1838154 (JAllemandou) No problem for me to remove and reuse ip, and remove x_forwarded_for :)
[14:36:02] <ottomata>	 morning
[14:36:18] <Ironholds>	 hey ottomata :)
[14:36:20] <Ironholds>	 How was your weekend?
[14:36:23] <joal>	 Hi ottomata
[14:58:51] <wikibugs>	 Analytics-Tech-community-metrics, Possible-Tech-Projects: Misc. improvements to MediaWikiAnalysis (which is part of the MetricsGrimoire toolset) - https://phabricator.wikimedia.org/T89135#1838188 (01tonythomas)
[14:58:59] <wikibugs>	 Analytics-Tech-community-metrics, Possible-Tech-Projects, Epic: Allow contributors to update their own details in tech metrics directly - https://phabricator.wikimedia.org/T60585#1838192 (01tonythomas)
[15:59:21] <joal>	 Hi ebernhardson
[16:02:33] <ebernhardson>	 hi!
[16:07:30] <grrrit-wm>	 (CR) EBernhardson: Add page_id to webrequest and pageview_hourly (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 (owner: EBernhardson)
[16:07:36] <grrrit-wm>	 (PS7) EBernhardson: Add page_id to webrequest and pageview_hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/255318
[16:13:43] <joal>	 ebernhardson: I Didn't even have to tell you :)
[16:13:51] <joal>	 ebernhardson: thanks for the new patch
[16:14:09] <grrrit-wm>	 (CR) Joal: [C: 2 V: 2] "Looks good to me !" [analytics/refinery] - https://gerrit.wikimedia.org/r/255318 (owner: EBernhardson)
[16:15:45] <ebernhardson>	 joal: :) thanks for merging
[16:16:09] <joal>	 ebernhardson: Deployment planning today, hopefully successful deploy tomorrow :)
[16:17:30] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1838344 (Milimetric) So there seem to be two threads here.  Table level partitioning seems to me to complicate replication to the slaves and complicate application logic.  It doesn't seem...
[16:19:49] <wikibugs>	 Analytics, Traffic, operations, Patch-For-Review: Replace Analytics XFF/client.ip data with X-Client-IP - https://phabricator.wikimedia.org/T118557#1838361 (BBlack) Ok, the old "ip" field now has the X-Client-IP data in the webrequest logs.  The remaining pending patches here are:  the (updated) on...
[16:22:52] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1838382 (jcrespo) @Milimetric: deleting data will not solve immediately the problem, as deleting data logically doesn't mean space is freed from disk. Hence the partitioning suggestion....
[16:28:35] <wikibugs>	 Analytics-Kanban, operations, Database: db1046 running out of disk space - https://phabricator.wikimedia.org/T119380#1838414 (jcrespo) BTW, I found the acceleration issue: the automatic purge process was failing since some tables had been deleted.
[16:30:12] <wikibugs>	 Analytics-Kanban, Database: Delete obsolete schemas {tick} [5 pts] - https://phabricator.wikimedia.org/T108857#1838427 (jcrespo) Resolved>Open This task caused the purging process to fail (T119380). Table purge_schedule has to be updated to reflect the dropped tables.
[16:30:13] <wikibugs>	 Analytics-Kanban: Enforce policy for each schema: Sanitize {tick} [8 pts] - https://phabricator.wikimedia.org/T104877#1838431 (jcrespo)
[16:44:43] <joal>	 ottomata: as per https://gerrit.wikimedia.org/r/#/c/256002/, should we update refinement not to compute client_ip but copy it from IP ?
[16:55:26] <grrrit-wm>	 (PS1) Joal: Deprecate client_ip functions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/256024
[17:01:50] <nuria>	 a-team; coming to standup
[17:01:56] <kevinator>	 ottomata: standup?
[17:02:37] <grrrit-wm>	 (PS1) Joal: Remove client_ip computation from refine [analytics/refinery] - https://gerrit.wikimedia.org/r/256027
[17:05:34] <grrrit-wm>	 (CR) Ottomata: "I don't think either of these functions are deprecated. We just won't use them in webrequest jobs. They are still useful, if you have th" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/256024 (owner: Joal)
[17:05:43] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Create celery chain or other organization that handles validation and computation {kudu} [8 pts] - https://phabricator.wikimedia.org/T118308#1838677 (madhuvishy)
[17:05:44] <wikibugs>	 Analytics-Kanban: Implement the logic of each node in the celery chain {kudu} [5 pts] - https://phabricator.wikimedia.org/T118309#1838676 (madhuvishy)
[17:05:47] <grrrit-wm>	 (PS8) EBernhardson: Implement ArraySum UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452
[17:05:53] <wikibugs>	 Analytics-Kanban, Patch-For-Review: Create celery chain or other organization that handles validation and computation {kudu} [13 pts] - https://phabricator.wikimedia.org/T118308#1838679 (madhuvishy)
[17:06:26] <wikibugs>	 Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session:  Pageview API  overview - https://phabricator.wikimedia.org/T112956#1838690 (Milimetric) > I would like to use pageview data excluding spiders and bots but I need some kind of bulk download for all projects like it was before or...
[17:06:42] <grrrit-wm>	 (CR) EBernhardson: Implement ArraySum UDF (7 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/254452 (owner: EBernhardson)
[17:23:38] <wikibugs>	 Analytics-Kanban, Services: Response times pageview API. Dashboard .  - https://phabricator.wikimedia.org/T119886#1838795 (Nuria) NEW
[17:23:43] <wikibugs>	 Analytics-Kanban, Services: Response times pageview API. Dashboard . - https://phabricator.wikimedia.org/T119886#1838805 (Nuria) a:Nuria
[17:26:29] <grrrit-wm>	 (CR) Ottomata: [C: 1] Remove client_ip computation from refine [analytics/refinery] - https://gerrit.wikimedia.org/r/256027 (owner: Joal)
[17:28:26] <grrrit-wm>	 (Abandoned) Joal: Deprecate client_ip functions [analytics/refinery/source] - https://gerrit.wikimedia.org/r/256024 (owner: Joal)
[17:48:43] <wikibugs>	 Analytics-EventLogging, Analytics-Kanban: EventLogging Kafka consumer stops consuming after Kafka metadata change - https://phabricator.wikimedia.org/T118315#1838961 (Nuria)
[17:49:46] <wikibugs>	 Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old. - https://phabricator.wikimedia.org/T118527#1838966 (Milimetric) a:Ottomata
[17:50:20] <wikibugs>	 Analytics-Kanban, CirrusSearch, Discovery, operations, and 2 others: Delete logs on stat1002 in /a/mw-log/archive that are more than 90 days old {hawk} - https://phabricator.wikimedia.org/T118527#1838970 (Milimetric) p:Triage>High
[17:52:52] <wikibugs>	 Analytics-EventLogging, Analytics-Kanban: EventLogging Kafka consumer stops consuming after Kafka metadata change. See if upgrade fixes it. - https://phabricator.wikimedia.org/T118315#1838990 (Nuria)
[17:52:58] <wikibugs>	 Analytics-Kanban, Services: Response times pageview API. Dashboard . - https://phabricator.wikimedia.org/T119886#1838992 (GWicke) This data from the perspective of the frontend RESTBase cluster is already available in graphite, under restbase.v1_metrics*.
[17:57:58] <Ironholds>	 halfak, any idea where Morten is?
[17:58:42] <halfak>	 Ironholds, has he been away for a while?
[17:58:55] <halfak>	 He's in Seattle, so I can't go throw a snowball at him
[18:00:13] <Ironholds>	 ahh
[18:00:22] <Ironholds>	 you have snow already?!
[18:00:33] <halfak>	 Can ping him more directly if you're looking for him.
[18:00:38] <halfak>	 First major snowfall was today.
[18:00:45] <halfak>	 ~4 inches of heavy stuff
[18:00:49] * halfak went shoveling.
[18:04:54] <wikibugs>	 Analytics-EventLogging, Analytics-Kanban: EventLogging Kafka consumer stops consuming after Kafka metadata change. See if upgrade fixes it. [13 pts] - https://phabricator.wikimedia.org/T118315#1839067 (Ottomata)
[18:05:06] <wikibugs>	 Analytics-Kanban, Services: Response times pageview API. Dashboard . - https://phabricator.wikimedia.org/T119886#1839068 (Nuria) cache-control: max-age=3600, s-maxage=3600 should be added to AQS per @GWicke
[18:05:14] <wikibugs>	 Analytics-EventLogging, Analytics-Kanban: EventLogging Kafka consumer stops consuming after Kafka metadata change. See if upgrade fixes it. {oryx} [13 pts] - https://phabricator.wikimedia.org/T118315#1839069 (Milimetric)
[18:05:29] <halfak>	 Ironholds, shall I ping Nettrom directly?
[18:05:59] <Ironholds>	 halfak, naw, we're good, just an idle query :)
[18:06:01] <wikibugs>	 Analytics-Kanban, Services: Response times pageview API. Dashboard . - https://phabricator.wikimedia.org/T119886#1839071 (Nuria)
[18:06:11] <halfak>	 Oh!  Gotcha.
[18:12:30] <wikibugs>	 Analytics-Kanban, Services: Response times pageview API. Dashboard . [8] - https://phabricator.wikimedia.org/T119886#1839084 (Nuria)
[18:13:14] <mforns>	 joal, :]
[18:15:26] <wikibugs>	 Analytics-Kanban: Reformat pageview API responses to allow for status reports and messages {slug} - https://phabricator.wikimedia.org/T117017#1839105 (Milimetric) @Ironholds: so far we're still of the same opinion.  Filling the holes and/or adding more information to these responses should be the job of a hig...
[18:21:25] <joal>	 mforns: ?
[18:23:29] <mforns>	 joal, are you staying for a while or leaving already?
[18:23:41] <mforns>	 do we have time for scala?
[18:23:52] <joal>	 I'm still here, but I'll spend time with nuria on deployment plan
[18:23:56] <mforns>	 I see
[18:23:58] <mforns>	 ok
[18:24:01] <joal>	 mforns: You can go for scala :)
[18:24:25] <mforns>	 joal, I'll do EL backfilling and if I have time, will go for scala, ok
[18:24:43] <joal>	 mforns: If no scala no bother, we'll do tomorrow together :)
[18:26:48] <joal>	 kevinator: Do you give me a minute?
[18:26:51] <wikibugs>	 Analytics-Kanban: Reformat pageview API responses to allow for status reports and messages {slug} - https://phabricator.wikimedia.org/T117017#1839139 (Ironholds) Multiple requests to the API? Just hit it over and over again until it 404s to identify the range of data?
[18:29:01] <nuria>	 joal: omw to batcave
[18:29:07] <joal>	 k nuria
[18:29:58] <joal>	 nuria: you can't hear me :)
[18:46:58] <grrrit-wm>	 (PS2) Nuria: Add LRUCache to webrequest spider identification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255986 (owner: Joal)
[18:49:30] <grrrit-wm>	 (CR) Nuria: [C: 2] Add LRUCache to webrequest spider identification [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255986 (owner: Joal)
[18:53:14] <wikibugs>	 Analytics-Kanban: Remove avro schema from jar [1] - https://phabricator.wikimedia.org/T119893#1839308 (Nuria) NEW
[19:02:27] <nuria>	 dcausse: yt?
[19:02:27] <madhuvishy>	 milimetric: around?
[19:06:23] <wikibugs>	 Analytics-Backlog: Create cron on 1002 to remove CirrusSearchRequest partitions  - https://phabricator.wikimedia.org/T119897#1839364 (Nuria) NEW
[19:09:16] <joal>	 ottomata: are you around?
[19:09:37] <ottomata>	 ja
[19:10:06] <joal>	 reviewing tomorrow deploy with nuria : do you htink we should add a field in webrequest table for network_origin (now that we have functions)
[19:10:25] <joal>	 ottomata: --^
[19:10:35] <nuria>	 ebernhardson: yt?
[19:10:52] <ebernhardson>	 nuria: yup
[19:11:23] <nuria>	 ebernhardson: did you guys submitted a patch for camus.properties file on puppet for new cirrus search request runs?
[19:11:47] <nuria>	 ebernhardson: makes sense?
[19:12:45] <nuria>	 ebernhardson: see: https://github.com/wikimedia/operations-puppet/tree/production/modules/camus/templates
[19:12:51] <ebernhardson>	 nuria: hmm, i don't think we did unless it was in one of david's patches, but i'm not seeing one
[19:13:00] <ottomata>	 joal:  sorry on phone, few mins...
[19:13:07] <joal>	 np ottomata
[19:13:53] <nuria>	 ebernhardson, dcausse : i think we need the properties file that we tested with in puppet
[19:16:11] <ebernhardson>	 nuria: the only patch he has there is https://gerrit.wikimedia.org/r/#/c/252432/2/modules/camus/templates/mediawiki.erb but that only adjust's camus for the schema id
[19:17:02] <ottomata>	 joal:  sure why not, maybe ask bd808 (he submitted it, right?) if he'd like that
[19:17:56] <joal>	 ottomata: nuria think it's not needed, let's have bd808 decide :)
[19:18:04] <ottomata>	 k
[19:18:12] <nuria>	 ottomata: can you review thsi one: https://gerrit.wikimedia.org/r/#/c/252432/2
[19:18:26] <nuria>	 ottomata: I think is ready as dcausse tested it a while back
[19:18:59] <nuria>	 ottomata: i rather not add the bd808 ip dimernsions
[19:19:22] <nuria>	 ottomata: cause they are really mean to help qhen querying but i do not think tehy are of global interest to everyone
[19:19:23] <ottomata>	 ok cool
[19:19:23] <ottomata>	 i will merge
[19:19:31] <ottomata>	 nuria:  that's fine with me too
[19:19:36] <ottomata>	 i think i don't have an opinion on that one :)
[19:19:40] <nuria>	 ottomata: then let's proceeed
[19:21:15] <joal>	 ottomata: about the properties, will it break not having the new camus jar ?
[19:21:21] <ottomata>	 nuria:  properties updated on an27, they will be used during the next camus run
[19:21:22] <ottomata>	 oh
[19:21:28] <ottomata>	 uhhh, dunno!   will it?
[19:21:33] <joal>	 :D
[19:21:36] <nuria>	 ottomata: you ARE teh expert
[19:21:38] <nuria>	 *the
[19:21:38] <joal>	 Dunno either
[19:21:39] <nuria>	 ajaja
[19:21:48] <nuria>	 but  i do not think is an issue
[19:21:53] <ottomata>	 i'm not up to date on what has been merged or deployed
[19:21:54] <nuria>	 as it adds one property
[19:21:54] <ottomata>	 oh
[19:22:01] <joal>	 ¯\_(ツ)_/¯
[19:22:02] <nuria>	 that otherwise will be ignored
[19:22:03] <ottomata>	 no, they jar doesn't use these properties yet, right?
[19:22:06] <ottomata>	 ya
[19:22:08] <ottomata>	 then it will probably be fine
[19:22:14] <ottomata>	 i'll tail the logs and let you know if i see anything
[19:22:15] <nuria>	 no, until we deploy latest candidate
[19:22:32] <joal>	 ok ottomata, I'll deploy the new jar tomorrow and will monitor
[19:22:35] <ottomata>	 k
[19:24:56] <nuria>	 ebernhardson: there is no data on topic you guys need to save ?
[19:25:23] <bd808>	 nuria, joal: I think I agree with nuria that my ip origin function is probably not of general interest. I'll use it in the api specific tables when I finally get them setup
[19:25:39] <nuria>	 I think I checked with dcausse and (while changes are backwards compatible) there is no need to preserve data, correct?
[19:25:44] <joal>	 Thanks bd808 for the heads up
[19:25:49] <nuria>	 bd808: thank you
[19:31:34] <nuria>	 sometimes it looks like I am writing from an enigma machine....
[19:34:05] <wikibugs>	 Analytics-Backlog, Wikimedia-Developer-Summit-2016: Developer summit session:  Pageview API  overview - https://phabricator.wikimedia.org/T112956#1839444 (Nuria) @Symac: The poor response times are likely due to lack of caching and thus going to storage every time. We are working on fixing that and will u...
[19:34:11] <ebernhardson>	 nuria: we can lose the data, we havn't started using it yet. I thought david set it up such that that wasn't necessary though
[19:34:23] <nuria>	 ebernhardson: correct, just confirming
[19:34:34] <nuria>	 cc joal so he is on the loop
[19:35:18] <addshore>	 ottomata: if I want to make a http request between stat100[123] and wdqs1001.eqiad.wmnet what might I be falling over? It looks like the firewall is open on wdqs1001, and I guess any outbound stuff is allowed from the stat servers, is there some extra firewall / segregation in the moddile I am missing? :)
[19:40:37] <ottomata>	 yeah, no outbound stuff is allowed from the stat servers :p
[19:40:49] <ottomata>	 the analytics cluster is firewalled off from the prod servers
[19:41:11] <wikibugs>	 Analytics-Backlog: Move App session data to 7 day counts - https://phabricator.wikimedia.org/T117637#1839469 (Nuria) p:Triage>Normal
[19:41:14] <ottomata>	 you might be able to get away with setting http proxy
[19:41:14] <ottomata>	 though
[19:41:15] <ottomata>	 gm
[19:41:16] <ottomata>	 hm
[19:41:29] <ottomata>	 https://wikitech.wikimedia.org/wiki/Http_proxy
[19:41:30] <ottomata>	 not sure
[19:43:39] <addshore>	 hmmm, okay *has a think*
[19:45:17] <addshore>	 yeh, I could use the webproxy thing! then I guess I just need to change the rule at the other end to allow stuff from the webproxy / look at the X-Forwarded-For ip
[19:45:30] <wikibugs>	 Analytics-Kanban: Reformat pageview API responses to allow for status reports and messages {slug} - https://phabricator.wikimedia.org/T117017#1839488 (Nuria) @Ironholds: I think part of the problem can be fixed with docs. We should certainly document that the pageview API will never have data older than May 2...
[19:49:02] <nuria>	 ottomata: yt?
[19:49:48] <nuria>	 ottomata: is there a way to access hive CLI outside  1002? I think not ... but just triple checking...
[19:52:36] <wikibugs>	 Analytics-Kanban: Reformat pageview API responses to allow for status reports and messages {slug} - https://phabricator.wikimedia.org/T117017#1839501 (Ironholds) Agreed, never older than May, but are you planning on clearing old data out as new data comes in for storage purposes?
[19:53:00] <ottomata>	 nuria:  ja
[19:53:08] <ottomata>	 um, on analytics1027, ja
[19:53:20] <ottomata>	 i mean, anywhere it is installed and has the right configs and can talk to hiveserver/hadoop
[19:53:21] <ottomata>	 :)
[19:58:57] <wikibugs>	 Analytics-Cluster, Collaboration-Team-Current, Database: Replicate Echo tables to analytics-store - https://phabricator.wikimedia.org/T115275#1839508 (Neil_P._Quinn_WMF)
[20:17:59] <nuria>	 ottomata: see https://phabricator.wikimedia.org/T119108
[20:18:26] <nuria>	 ottomata: is there a away to access hive w/o accessing it through a machine taht has access to private data?
[20:19:20] <ottomata>	 ah
[20:20:00] <ottomata>	 nuria:  https://phabricator.wikimedia.org/T89887
[20:20:07] <ottomata>	 nuria, so
[20:20:11] <ottomata>	 there is an 'analytics-users' group
[20:20:13] <ottomata>	 that we basically don't use
[20:20:28] <ottomata>	 but, this group will allow access to hadoop/hive, but not access to the webrequest table
[20:20:35] <ottomata>	 since it is group-readable by analytics-privatedata-users
[20:20:38] <ottomata>	 but
[20:20:47] <ottomata>	 since currently we only give access to hadoop on stat1002
[20:21:02] <ottomata>	 and there aren't clean perms on stuff in /a
[20:21:13] <ottomata>	 anyone with stat1002 can access private logs via the local filesystem
[20:21:23] <ottomata>	 2 possible solutions:
[20:21:33] <ottomata>	 - 1 make hadoop accessable from stat1003 via analytics-users
[20:21:45] <ottomata>	 - 2, fix permissions on stat1002 so that local files are not accessible
[20:21:51] <ottomata>	 i think 2 is more difficult than it sounds
[20:22:01] <ottomata>	 there are a lot of files there, and many things create those files
[20:26:32] <ottomata>	 nuria: i would like to talk some eventlogging stuff if you have some batcave minutes
[20:26:37] <ottomata>	 (i have 1:1 with kevin in 30 mins)
[20:35:07] <ottomata>	 joal:  didn't realize, but bblack has merged this
[20:35:07] <ottomata>	 https://gerrit.wikimedia.org/r/#/c/256002/1
[20:35:21] <ottomata>	 should be fine, just need to deploy your patch to no longer do the logic
[20:36:28] <grrrit-wm>	 (CR) Ottomata: [C: 1] Update changelog.md for v0.0.23 deployment [analytics/refinery/source] - https://gerrit.wikimedia.org/r/255976 (owner: Joal)
[20:54:47] <ebernhardson>	 just starting out playing with pyspark and hive integration.  just using the shell (pyspark --master yarn --deploy-mode client) and running sqlCtx.sql("select project, sum(view_count) as view_count from wmf.pageview_hourly where year=2015 and month=11 and day=11 and hour=11 group by project") is taking several minutes, where the same query in the hive cli takes a total of 40s
[20:54:53] <ebernhardson>	 am i perhaps doing something wrong?
[20:55:50] <ebernhardson>	 i'm up to 28 minutes of cpu time on the pyspark version, doesn't seem right
[20:55:57] <ottomata>	 ebernhardson:  spark does not auto scale, so dunno
[20:56:02] <ottomata>	 how many mappers does hive make for you?
[20:56:11] <nuria>	 ottomata: back, let me know if you want to talk
[20:56:15] <ottomata>	 with spark you have to tell it how many executors to run
[20:56:23] <ottomata>	 nuria: k, let's talk after my 1:1 with kevin
[20:56:23] <ebernhardson>	 ottomata: 7 mappers, 2 reducers
[20:56:34] <ottomata>	 aye, default for spark is 2 exectuors (processes)
[20:56:35] <ebernhardson>	 ottomata: the odd thing though is across all that hive used 40s of cpu time (not real time, but aggregate)
[20:56:42] <nuria>	 ottomata: k
[20:56:42] <ottomata>	 dunno if that is the proble, but ja
[20:56:42] <ebernhardson>	 and pyspark is up to 30 minutes
[20:56:43] <ottomata>	 hm...
[20:56:45] <ottomata>	 hmmmmmm
[20:57:04] <ottomata>	 ebernhardson: not sure, but I have to say we haven't been impressed with the hive spark integration in our current version
[20:57:13] <ottomata>	 we like SparkSQL/dataframes
[20:57:17] <ottomata>	 but the hive integration was flaky
[20:57:26] <ottomata>	 you can try to load the files from hdfs directly into a dataframe
[20:57:32] <ebernhardson>	 hmm, ok. based on my reading of the doc's i can just read the parquet files directly intead
[20:57:35] <ottomata>	 yes
[20:58:49] <milimetric>	 madhuvishy: I accidentally closed IRC, sorry, what's up
[20:59:03] <ottomata>	 ebernhardson: here is an example with sequenceFile/json, but parquetFile shoudl work similiarly
[20:59:05] <ottomata>	 https://wikitech.wikimedia.org/wiki/Analytics/EventLogging#Spark
[20:59:06] <ottomata>	 Edit events with SparkSQL in Spark Python (pyspark):
[20:59:51] <ebernhardson>	 thanks
[21:00:02] <ottomata>	 ebernhardson: this is scala
[21:00:03] <ottomata>	 https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/AppSessionMetrics.scala#L188
[21:00:05] <ottomata>	 but same idea
[21:00:11] <ottomata>	 or, https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/AppSessionMetrics.scala#L212
[21:12:03] <madhuvishy>	 milimetric: hmmm trying to remember what
[21:12:25] <milimetric>	 :)
[21:12:30] <madhuvishy>	 milimetric: oh
[21:12:41] <madhuvishy>	 Amanda replied saying the name program metrics makes sense
[21:12:52] <madhuvishy>	 was wondering if I should go ahead and change the code
[21:14:18] <grrrit-wm>	 (PS1) Addshore: Add inline TODOs [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256117
[21:14:19] <milimetric>	 madhuvishy: yeah, totally
[21:14:25] <madhuvishy>	 milimetric: cool
[21:14:29] <grrrit-wm>	 (CR) Addshore: [C: 2 V: 2] Add inline TODOs [analytics/limn-wikidata-data] - https://gerrit.wikimedia.org/r/256117 (owner: Addshore)
[21:14:31] <milimetric>	 sounded like an even *more* global name :)
[21:14:39] <madhuvishy>	 ha ha
[21:15:09] <madhuvishy>	 GrantMakingMetrics?
[21:15:30] <madhuvishy>	 GrantMakingGlobalProgramMetrics
[21:18:36] <milimetric>	 oh madhuvishy I thought it could just be ProgramMetrics
[21:19:01] <madhuvishy>	 milimetric: he he yes that's what i am going to change it to
[21:19:02] <milimetric>	 or GrantProgramMetrics if you want to be more specific
[21:19:19] <milimetric>	 oh ok
[21:19:38] <madhuvishy>	 yeah GrantProgramMetrics sounds good too
[21:28:36] <ebernhardson>	 i'm going to have to dig into a bunch more documentation. `source = sqlCtx.parquetFile("hdfs://analytics-hadoop/wmf/data/wmf/pageview/hourly")` from pyspark also takes ages to run (this will be aggregating over a week's worth of data, so selecting individual directories or even months wont work right)
[21:31:00] <ottomata>	 ebernhardson: in either hive or spark, you'llh ave to specify the partitions you want
[21:31:09] <ottomata>	 you should just compute which days are 'last week'
[21:31:14] <ottomata>	 and load it up that way
[21:31:55] <ottomata>	 madhuvishy: give ebernhardson some tips :)
[21:31:57] <ottomata>	 https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/AppSessionMetrics.scala#L193
[21:31:57] <ottomata>	 :)
[21:32:17] <ottomata>	 ok nuria want to lbatcave?
[21:32:26] <ebernhardson>	 ottomata: i was thinking that would happen after initializing the variable that points to the directory. clearly i have a lot to learn about this :)
[21:32:26] <nuria>	 k
[21:32:30] <nuria>	 give me 2 mins
[21:32:54] <ottomata>	 yeah ebernhardson it isn't automatic
[21:33:05] <ottomata>	 just like in hive, you have to tell it what data to load, so you don't hit all of it
[21:34:29] <madhuvishy>	 yeah I wish it was smarter about it
[21:35:36] <madhuvishy>	 ebernhardson: it could also be slow because of too low driver or executor memory
[21:35:44] <madhuvishy>	 you can pass those in the cli
[21:36:03] <madhuvishy>	 i think the defaults are like 512MB
[21:37:01] <madhuvishy>	 like this:
[21:37:12] <madhuvishy>	 spark-submit --master yarn --driver-memory 2g --num-executors=8 --executor-cores=1   --executor-memory=2g --class
[21:38:11] <madhuvishy>	 ebernhardson: http://spark.apache.org/docs/1.3.0/configuration.html
[21:38:33] <ebernhardson>	 madhuvishy: eventually it hit the GC overhead limit and started spewing stack traces, so that might be as well. it seems sensible to follow that AppSessionMetrics lead and generate a list of directories rather than passing the top level one though
[21:39:12] <madhuvishy>	 ebernhardson: ya definitely ramp up memory then - it happened a lot for the session metrics job because we were loading 30 days of daya
[21:39:14] <madhuvishy>	 data
[22:20:46] <grrrit-wm>	 (PS9) Madhuvishy: [WIP] Setup celery task workflow to handle running reports for the ProgramMetrics API [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/253750 (https://phabricator.wikimedia.org/T118308)
[22:58:08] <wikibugs>	 Analytics, Discovery, EventBus, MediaWiki-General-or-Unknown, and 7 others: Define edit related events for change propagation - https://phabricator.wikimedia.org/T116247#1839888 (Ottomata) @gwicke and I discussed the schema/revision in meta issue in IRC today. He had an idea that I quite like!...
[23:21:30] <milimetric>	 mforns: I added hash-parsing to the API demo, wanted to be able to link directly like this: https://analytics.wmflabs.org/demo/pageview-api/#Beirut,Paris
[23:21:49] <mforns>	 milimetric, awesome!
[23:22:21] <milimetric>	 mforns: the change in case you wanna update the gist / code review:
[23:22:24] <milimetric>	 https://www.irccloud.com/pastebin/XPf2Y9l1/
[23:22:39] <mforns>	 thx
[23:25:34] <nuria>	 milimetric: excellent!
[23:26:06] <milimetric>	 :) it's not fancy like dashiki but I wanted to link from the blog post
[23:26:51] <nuria>	 milimetric: i was about to do that , just mentioned marcel the other day how bookmarks make your life so much easier
[23:27:52] <milimetric>	 yes, I'm kind of worried this leads us on the path of people filing bugs and feature requests for the demo, but we'll see :)
[23:27:53] <nuria>	 milimetric: but you need to remove those if you remove them from the search bar , right?
[23:27:54] <mforns>	 milimetric, nuria: I updated the gist
[23:28:13] <mforns>	 nuria, yes, or add new ones if you add them to the search bar
[23:28:25] <madhuvishy>	 milimetric: ha ha i was just about to ask - date ranges on the url?
[23:28:27] <mforns>	 but we can implement that later, I already updated the gist
[23:28:31] <milimetric>	 nuria: yeah, but that's what I mean, I kinda don't want it to get too fancy
[23:28:38] <milimetric>	 I was just doing it specifically to allow direct linking
[23:28:59] <mforns>	 makes sense
[23:29:01] <milimetric>	 see?!!  /me closes vim and forgets where on limn1 this code is
[23:29:19] <madhuvishy>	 :D
[23:29:32] <madhuvishy>	 project too please :P
[23:29:42] <mforns>	 hehehe
[23:59:15] <mforns>	 a-team, good night, see you tomorrow!
[23:59:23] <milimetric>	 have a nice night