[00:06:18] Wikimedia-General-or-Unknown, Analytics: Sudden drop in number of articles on nowiki on Nov29 (by 34k articles) - https://phabricator.wikimedia.org/T76356#808865 (Ironholds) To be absolutely clear, we don't maintain WikiApiary. If the bug is only appearing there, you probably want to contact the people who do... [00:06:47] Analytics-EventLogging: Engineer reads documentation on Wikitech to set up a dashboard from EL data [3 pts] - https://phabricator.wikimedia.org/T76364#808875 (kevinator) just found this extensive documentation on setting up a schema https://www.mediawiki.org/wiki/Extension:EventLogging/Guide#Creating_a_schema [00:07:25] Wikimedia-General-or-Unknown, Analytics: Sudden drop in number of articles on nowiki on Nov29 (by 34k articles) - https://phabricator.wikimedia.org/T76356#808877 (Ironholds) [00:10:07] Analytics, Analytics-Engineering: Analytics User uses CentralNotice cookie in x-analytics field of web-request logs - https://phabricator.wikimedia.org/T75835#808882 (Ironholds) This would be awesome to have. I want to do a study on the probability distribution of banner impressions/pageviews to clickthroughs... 
[00:34:01] Analytics-EventLogging, Analytics-Engineering: WMF engineer follows sets to collect EL data - https://phabricator.wikimedia.org/T76679#808947 (kevinator) [00:34:25] Analytics-EventLogging, Analytics-Engineering: WMF engineer follows steps to collect EL data - https://phabricator.wikimedia.org/T76679#808947 (kevinator) [00:39:32] Analytics, Analytics-Engineering: Epic: Analyst uses an operationalized Saiku - https://phabricator.wikimedia.org/T75246#808958 (kevinator) [00:42:28] Analytics-EventLogging, Analytics-Engineering: WMF engineer follows steps to collect EL data - https://phabricator.wikimedia.org/T76679#808965 (kevinator) p:Unbreak!>High [01:03:13] (PS3) Bmansurov: Add support for timezones while creating reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/175169 (https://bugzilla.wikimedia.org/72116) [01:19:33] (PS4) Bmansurov: Add support for timezones while creating reports [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/175169 [01:22:32] nuria__: Hi, do you think this is a good task for me to work on? https://phabricator.wikimedia.org/T76521 [01:29:57] Analytics-Wikimetrics, Analytics-Engineering: Support page should mention Phabricator, not Bugzilla - https://phabricator.wikimedia.org/T76521#809011 (bmansurov) a:bmansurov [01:35:15] (PS1) Bmansurov: Replace Bugzilla with Phabricator on the support page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/177460 [03:48:25] (PS1) Unicodesnowman: Make Dashiki responsive [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 [03:49:53] Analytics-Dashiki: Dashiki needs to have a friendlier mobile view. - https://phabricator.wikimedia.org/T75030#809113 (Unicodesnowman) [03:55:44] Analytics-Dashiki: Dashiki needs to have a friendlier mobile view. 
- https://phabricator.wikimedia.org/T75030#809115 (Unicodesnowman) Screenshots: Desktop before & after https://i.imgur.com/Wu0k9Qe.png (overflow fixed) Nexus 5 Portrait http://i.imgur.com/c0MxIJk.png Nexus 5 Landscape http://i.imgur.com/GNWCu... [04:06:09] Analytics-EventLogging: Provide a robust way of logging events without blocking until network request completes; use sendBeacon with client-side storage fallback - https://phabricator.wikimedia.org/T44815#809120 (Prtksxna) [07:28:54] Analytics-Wikimetrics: Story: Dashiki uses Mediawiki for storage [13 pts] - https://phabricator.wikimedia.org/T70448#809266 (kevinator) Here are the config files for Vital Signs (https://metrics.wmflabs.org/static/public/dash): * https://meta.wikimedia.org/wiki/Dashiki:CategorizedMetrics * https://meta.wikime... [07:29:22] Analytics-Dashiki: Story: Dashiki uses Mediawiki for storage [13 pts] - https://phabricator.wikimedia.org/T70448#809267 (kevinator) [07:41:52] Analytics-Dashiki, Analytics-Engineering: User sees banner in Dashiki when loading the site - https://phabricator.wikimedia.org/T76695#809275 (kevinator) p:Triage>Normal [09:58:25] Wikimedia-General-or-Unknown, Analytics: Sudden drop in number of articles on nowiki on Nov29 (by 34k articles) - https://phabricator.wikimedia.org/T76356#810307 (jeblad) To quote myself from Erik Mõllers page at meta: //It seems like 29. nov or shortly before there was a drop in number of articles at nowiki... [12:53:10] Analytics-Refinery: Raw webrequest partitions for 2014-12-03T17/1H not marked successful - https://phabricator.wikimedia.org/T76708 (QChris) NEW p:Triage a:QChris [13:03:51] Analytics-Refinery: Raw webrequest partitions for 2014-12-03T17/1H not marked successful - https://phabricator.wikimedia.org/T76708#818695 (QChris) All partitions show ~16% duplicates (no missing lines) for all hosts. Since analytics1027 got upgraded during that hour, I attribute it to the upgrade (although... 
[13:05:25] Analytics-Refinery: Raw webrequest partitions for 2014-12-03T17/1H not marked successful - https://phabricator.wikimedia.org/T76708#818696 (QChris) [13:06:25] Analytics-Refinery: Raw webrequest partitions for 2014-12-03T17/1H not marked successful - https://phabricator.wikimedia.org/T76708#818672 (QChris) [13:06:39] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to configuration updates - https://phabricator.wikimedia.org/T74300#818698 (QChris) [13:10:20] (PS1) QChris: [DO NOT SUBMIT] First shot at deduping [analytics/refinery] - https://gerrit.wikimedia.org/r/177522 [13:10:43] (CR) QChris: [C: -2] "Just parked code" [analytics/refinery] - https://gerrit.wikimedia.org/r/177522 (owner: QChris) [13:32:10] qchris: hiya! [13:32:14] heya! [13:32:21] as promised I am here and ready to deploy! [13:32:26] AWESOME! [13:32:30] shall we wait until 09:00:01? [13:32:32] Analytics-Tech-community-metrics: Key performance indicator: Bugzilla response time - https://phabricator.wikimedia.org/T63561#818766 (Qgil) [13:32:37] I think so. [13:32:39] k [13:32:45] At least for the C implementation. [13:32:49] i'm going to stop puppet on the two machines...which i always have to look up [13:32:55] oh are there changes to hive? [13:33:13] ah [13:33:14] i see [13:33:19] https://gerrit.wikimedia.org/r/#/c/177224/ [13:33:48] The relevant machines are oxygen and gadolinium for the C implementation. [13:35:26] If the Hive part is merged, I guess I can take care of the rest. [13:35:45] (I am curious of trying out my new deploy permissions :-) ) [13:36:00] (CR) Ottomata: [C: 2 V: 2] Ignore traffic from cache-local SSL terminators [analytics/refinery] - https://gerrit.wikimedia.org/r/177224 (owner: QChris) [13:36:01] ok! [13:36:02] merged. [13:36:10] Cool. Thanks. 
[13:36:16] ok, puppet is stopped on those nodes [13:36:22] i'm going to reprepro update apt [13:36:50] cool [13:36:50] http://apt.wikimedia.org/wikimedia/pool/main/w/webstatscollector/ [13:36:52] 0.5 [13:37:11] Neat! [13:37:25] So I geoss we're set up for 14:00:01 :-) [13:37:32] s/geoss/guess/ [13:37:43] ha, sorry, yes not 9:00. that was timezoneist [13:37:48] 14:00:01 [13:37:56] i shoudl be able to check that collector has a new file, right? [13:38:06] Right. [13:38:11] then upgrade and restart both filter and collector [13:38:12] cool [13:39:01] qchris: check this out: [13:39:02] http://grafana.wikimedia.org/#/dashboard/db/kafkatest [13:39:06] still needs some work, and there are things to do [13:39:08] but its kidna cool! [13:39:22] Whoa :-D [13:39:50] I need to do some rearranging of some varnishkafka things...i might restart bits varnishkafkas today, not sure [13:39:54] so , heads up for 0 resets there [13:40:25] Noted. Thanks. [13:41:13] Btw. I cannot find the ChangeLog change for webstatscollector's C implementation. Is that already in gerrit? [13:50:17] ottomata: Yesterday, we had an hour for which all partitions showed 16% dupes, and 0% missing. Since this affects pagecounts-all-raw, I'd want to dedupe and arrive at an hour with clean data. [13:50:18] Would it be ok for you if I just replace the the duped directory underneath /wmf/data/raw with the deduped one, or would you want to keep the duped around somewhere for extra safety? [13:52:45] naw, its ok with me, replace away [13:52:59] um, ithought i pushed it, maybe I didn't... [13:53:01] ah igues si didn't [13:53:34] (PS1) Ottomata: Bump version to 0.5-1 [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/177535 [13:53:44] qchris: ^ mergy? [13:54:32] (CR) QChris: [C: 2] Bump version to 0.5-1 [analytics/webstatscollector] - https://gerrit.wikimedia.org/r/177535 (owner: Ottomata) [13:57:43] qchris: i am curious, how do you deduplicate? [13:57:46] hive query [13:57:47] ? [13:57:58] yes. 
And looooots of duct tape. [13:58:05] https://gerrit.wikimedia.org/r/#/c/177522/1/hive/webrequest/dedupe_partition.hql [14:00:33] ok i have 14:00 file [14:00:34] proceeding! [14:00:45] cool [14:01:54] range? [14:01:58] oh right [14:01:58] range [14:02:00] forgot about that field [14:03:02] by the way, can anyone check over this to make sure it looks good? https://gerrit.wikimedia.org/r/#/c/177487/ [14:07:41] unicodesnowman: I am sure the dashiki devs noted, but I'll bring it up during our daily meeting in about ~1 hour. [14:09:11] qchris, thank you :) not in a rush at all, would just like to know if I did things correctly -- I'm new to contributing, from GCI [14:12:14] psshhh, qchris, analytics1003 has not kernel panic-ed yet! [14:12:34] ottomata: >.< [14:12:43] i was waiting for it to do it again! [14:12:53] before I resinstalled it and proceeded with the cluster install [14:12:58] upgrade* [14:12:59] gr [14:13:00] Yup. [14:13:31] I looked at it a bit and could not find anything immediate. [14:13:53] Some similar reports on kernels in not too far away versions. [14:14:02] But not many. [14:15:28] Analytics-Dashiki: Dashiki needs to have a friendlier mobile view. - https://phabricator.wikimedia.org/T75030#818811 (QChris) [14:59:56] !log Deduped webrequest Hive partitions for 2014-12-03T17/1H [15:00:27] milimetric: hangout says "not allowed" [15:00:30] :( [15:00:50] mee too [15:01:04] invited all of you again [15:18:29] milimetric, Hi [15:20:12] hi rtnpro [15:20:15] saw your PR [15:21:15] milimetric, am I heading in the desired direction? [15:24:18] milimetric, ^^ [15:27:55] Analytics-Dashiki: Dashiki needs to have a friendlier mobile view. - https://phabricator.wikimedia.org/T75030#818914 (Nuria) [15:30:01] milimetric: i can't find a tasking hangout link [15:30:08] is it in batcave? 
[15:30:41] s/It/At/ [15:30:49] Analytics-Wikimetrics: Script to populate predefined tags can not acces production DB - https://phabricator.wikimedia.org/T75519#818916 (Nuria) Open>Resolved [15:30:50] Analytics-Tech-community-metrics: Key performance indicator: Bugzilla response time - https://phabricator.wikimedia.org/T63561#818915 (chasemp) [15:31:11] Whoops. My comment above ^ wast meant to be in a /msg :-/ [15:31:28] trying to join [15:31:28] ottomata: yes, sorry [15:31:32] won't let you? [15:34:09] Analytics-Refinery: Raw webrequest partitions that were not marked successful due to configuration updates - https://phabricator.wikimedia.org/T74300#818931 (QChris) [15:34:09] Analytics-Refinery: Raw webrequest partitions for 2014-12-03T17/1H not marked successful - https://phabricator.wikimedia.org/T76708#818929 (QChris) [15:34:30] ggellerman____ & kevinator: i invited you again [15:35:46] milimetric, let me know when you have some time, I will wait [15:36:00] rtnpro: sorry :( [15:36:02] Wikimedia-General-or-Unknown, Analytics: Sudden drop in number of articles on nowiki on Nov29 (by 34k articles) - https://phabricator.wikimedia.org/T76356#818932 (Ironholds) The last database dump was on 11 November, so that wouldn't really help. Again, there is no unusual move activity or delete activity; I... 
[15:36:09] last few days have been crazy with meetings [15:36:16] i'm in meetings for another few hours [15:36:20] i will definitely comment on that soon [15:36:27] *soon => after meetings :( [15:36:32] milimetric, ok :) [15:38:11] Analytics-Engineering: Dedupe at ETL phase (placeholder) - https://phabricator.wikimedia.org/T76724#818942 (ggellerman) [15:42:38] Analytics-Engineering: EPIC: Productionizing Wikimetrics - https://phabricator.wikimedia.org/T76726 (ggellerman) NEW p:Triage [15:45:13] Wikimedia-General-or-Unknown, Analytics: Sudden drop in number of articles on nowiki on Nov29 (by 34k articles) - https://phabricator.wikimedia.org/T76356#818969 (Reedy) [15:45:25] Analytics-Wikimetrics, Analytics-Engineering: EPIC: Productionizing Wikimetrics - https://phabricator.wikimedia.org/T76726#818970 (kevinator) [15:46:56] Analytics-Wikimetrics, Analytics-Engineering: EPIC: Productionizing Wikimetrics - https://phabricator.wikimedia.org/T76726#818961 (kevinator) [15:47:05] Analytics-Wikimetrics, Analytics-Engineering: EPIC: Productionizing Wikimetrics - https://phabricator.wikimedia.org/T76726#818974 (kevinator) p:Triage>Normal [15:48:08] unicodesnowman: thanks for your submit! will try to give you a CR today [16:31:47] ottomata: "It's unlikely that an upgrade would succeed; I recommend you simply spin up a new node which is likely to be both faster and more reliable." [16:31:59] aye [16:32:07] i have succeeded! but only as a test [16:32:13] not something that was long running [16:40:34] qchris_away, around? 
have a q for you [16:48:33] (CR) BBlack: [C: 1] Assert lengths aren't negative [analytics/kafkatee] - https://gerrit.wikimedia.org/r/177152 (owner: CSteipp) [16:49:18] (CR) Ottomata: [C: 1] Assert lengths aren't negative [analytics/kafkatee] - https://gerrit.wikimedia.org/r/177152 (owner: CSteipp) [18:03:43] Analytics-Engineering: EPIC: Getting Mondrian & Saiku productionized - https://phabricator.wikimedia.org/T76739 (ggellerman) NEW p:Triage [18:18:31] Analytics-Dashiki, Analytics-Engineering: Vital Signs user reads description of metric - https://phabricator.wikimedia.org/T76741#819194 (ggellerman) [18:19:11] Analytics-Engineering: EPIC: Getting Mondrian & Saiku productionized - https://phabricator.wikimedia.org/T76739#819204 (kevinator) p:Triage>Normal [18:19:40] Analytics-EventLogging, Analytics-Engineering: Epic: Engineer has simpler way to deploy dashboard from EL data - https://phabricator.wikimedia.org/T75836#819206 (kevinator) [18:20:06] Analytics-Dashiki, Analytics-Engineering: Vital Signs user reads description of metric - https://phabricator.wikimedia.org/T76741#819208 (kevinator) p:Triage>High [18:29:53] yurikR: back. What's the question you have for me? [18:33:51] qchris, i'm having some very strange discrepancy between the log files and hive - somehow hive has about 15-20% lower row count [18:34:24] Analytics-Refinery: WSC data in a cube - https://phabricator.wikimedia.org/T76093#819223 (kevinator) p:Triage>Normal [18:35:02] qchris, i can send you two files - one generated from the log files for one day for a specific carrier, and another - same data from hadoop [18:35:24] the sequence numbers didn't match, and neither have row counts [18:35:48] the sequence numbers are not expected to match. [18:36:01] data in hive gets produced by varnishkafka, and [18:36:13] data in the logs is coming from varnishncsa. [18:36:36] Different varnish parts, hence different sequence numbers. [18:36:50] By logs ... which files do you exactly mean. 
[18:36:53] sampled-1000? [18:36:57] mobile-sampled-100? [18:37:50] yurikR: ^ [18:38:12] qchris, no, the logs in squid [18:38:17] sec [18:38:42] qchris, /a/squid/archive/zero/ [18:39:11] i think the data in limn comes from those files too [18:41:04] Yes, the limn graphs use the zero tsvs too. [18:43:54] So how do you extract pageviews from the logs? [18:47:08] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819244 (Catrope) [18:48:08] yurikR: ^ [18:48:09] qchris, i greped those logs for one date for one carrier, converted that file into a hive table [18:48:25] and ran my query against that table [18:48:43] except that i ran it to list all items that do NOT match a pageview count [18:49:00] and i also ran the same query on the rawrequests [18:49:05] and compared the results [18:49:12] the rawrequests count is lower [18:49:36] Sooo .... you're saying that the hive table wmf_raw.webrequest is missing rows? [18:49:55] seems that way [18:50:18] Do you have an example row? [18:50:33] i couldn't do direct comparison, i can send you two excel files :) [18:51:00] excel kinda crashes when i try to compare :( [18:51:04] And just in case ... you are aware that SSL requests cause two lines in the TSVs and only one row in the Hive table? [18:51:18] ?? [18:51:19] no [18:51:53] since i do my counting from hive... [18:52:27] now you know :-) [18:53:08] I am not sure how much SSL traffic is in zero.tsvs. So it might not align. [18:54:06] Also ... regarding the timestamp ... Hive really is sharp on the edges of the hours. 
[18:54:54] But on the other hand, the TSVs are getting logrotated around 06:30 [18:55:38] So for example the zero TSV with 20141204 in the file name effectively spans ~2014-12-03T06:30 until ~2014-12-04T06:30 [18:55:42] qchris, i did not go by the logfile name, i grepped several files [18:55:49] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819269 (Milimetric) a:Milimetric [18:55:55] Ok. [18:56:02] Let's have a concrete hour. [18:56:26] Is 2014-12-03T20:00/1H ok? [18:57:03] we could work on one hour too i guess ) [18:57:32] will make the data set smaller. But lets take 2014-12-02T20:00/1H [18:57:33] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819244 (Milimetric) I'm making this now. FYI - this will be for all edit-team related dashboarding, it's not restricted to the edit schema. [18:57:47] Ok. 2014-12-02T20:00/1H it is. [18:58:40] Analytics-Dashiki: Strange rendering glitches when removing lines - https://phabricator.wikimedia.org/T76745 (Catrope) NEW p:Triage [18:58:42] i converted and imported the log data in rawrequest (except for the 'range" column) as yurik.515_03_2014_12_02 [18:58:48] qchris, ^ [18:58:59] cool [18:59:41] (PS2) Ottomata: Ignore dia backup files for diagrams [analytics/refinery] - https://gerrit.wikimedia.org/r/177220 (owner: QChris) [18:59:48] (CR) Ottomata: [C: 2 V: 2] Ignore dia backup files for diagrams [analytics/refinery] - https://gerrit.wikimedia.org/r/177220 (owner: QChris) [19:00:56] Analytics-Dashiki: Removing lines updates URL hash, but editing URL hash or using back/forward buttons has no effect - https://phabricator.wikimedia.org/T76746#819281 (Catrope) [19:02:52] Analytics-Dashiki: Icon font 404ing on metrics-staging - https://phabricator.wikimedia.org/T76747 (Catrope) NEW p:Triage [19:07:31] yurikR: the wmf_raw.webrequest table in Hive has 420K rows for that hour. 
The zero.tsv also has 420K rows for that hour. [19:07:32] But! [19:07:47] The zero.tsv has 33K rows from the SSL terminators. [19:08:01] Maybe you counted those too? [19:08:42] 1) how do you tell them apart, 2) does limn graph exclude them? [19:08:43] * qchris hugs ottomata for all the merges. [19:09:38] Ad 1) the URL field starts in "https://" for the SSL terminators. [19:10:14] Ad 2) I totally do not remember. [19:10:18] qchris, but why do you think those are dups? [19:10:54] i think i generate https: myself as part of my log file to hive conv script [19:10:56] Because the SSL terminators log to udp2log and the varnishes do too. And an SSL request goes through both. [19:11:38] * yurikR commits dispicable acts [19:11:55] The zero.tsv have "https://" for SSL terminator requests, and "http://" for varnish requests. [19:12:31] qchris: the dashiki notification from unicodesnowman commit was on my inbox, it was just buried under 2 million phabricator e-mails [19:12:44] nuria__: k :-) [19:12:51] qchris, any way to count them just once from each file, but still know if they came from ssl? [19:13:11] now we mark them with xanalytics ssl=1 [19:13:19] but we started doing that half a year ago i think [19:13:35] You mean "https=1"? [19:14:03] right [19:14:08] Ok. [19:14:29] any way we can get older data from logs? [19:14:37] varnish logic around those parts is too hard for me to understand. But URLs starting in http:// or https:// is easier for me. [19:14:49] That's why I use http:// and https:// [19:15:02] sigh [19:15:07] but hive is accurate? [19:15:10] Last ~30 days is in hive. [19:15:27] Hive is not productionized. [19:15:37] I told you that before. More than once in the meantime. [19:15:54] I can look up which hours are good. [19:16:14] For the mobile partition, the hours are mostly good. 
[19:16:50] You can find known issues for that partition at [19:16:51] qchris, yes, you did (which btw differs from what tobie said) :) [19:16:52] https://wikitech.wikimedia.org/w/index.php?title=Analytics/Pagecounts-all-sites#Events_and_known_problems_since_2014-10-01 [19:17:03] Analytics: vet data loaded by Sean (user, page, edit tables) [13 pts] - https://phabricator.wikimedia.org/T76480#819300 (Nuria) [19:18:11] That table only covers the mobile and text partion. If you care about other partitions, it's a bit more involved. [19:18:16] qchris, mobile is all i need at the moment. My goal is to get you to sign off on 'accuracy' of our approach (reasonable) [19:18:32] Sure. [19:19:29] Sooooo coming back to the 2014-12-02T20 hour. [19:19:32] ok, so you think the difference between logs & hive as of 2 days ago is ssl related. [19:19:38] right, pulling a query [19:20:33] By how you imported the data from the TSVs, that would explain why the imported data does not match Hive's data. [19:22:51] Yes, I totally think that. Grepping the hour for example for 'Berlin', I get 23 hits from hive, and 26 from the tsv. [19:23:05] When diving down into the tsv, 3 of those 26 are https:// [19:23:17] Similar for other terms. [19:24:05] (Be carefull when compraing Main_Page and Angelsberg. Some of those are monitoring requests. Monitoring requests not necessarily handled in the same way) [19:26:30] qchris, http://pastebin.com/wg4HRfWu [19:28:51] I've got the result of the query now. I gues you want me to verify that with the TSV data. Right? [19:29:35] qchris, i ran both just now with my query: [19:29:36] 515-03 https en m wikipedia 26 [19:29:41] 515-03 https en m wikipedia 52 [19:30:03] 52 comes from the log file [19:30:06] 52=26*2. [19:30:10] so i guess it was the https :) [19:30:34] the only other diff: [19:30:36] 515-03 en m wikipedia 1644 [19:30:43] vs 1643 in hive [19:30:46] but i think that's minor [19:31:26] That might be a monitoring request that's only in the TSV. 
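(Editor's note: the comparison above amounts to counting TSV rows by the scheme at the start of the URL field and checking that the non-https count lines up with Hive. The following is only a minimal sketch. It assumes tab-separated lines with the URL in the 9th column, as qchris states later in this log. The function names and the sample rows are hypothetical, built to mirror the 'Berlin' numbers quoted above: 26 TSV hits, 3 of them https, 23 in Hive.)

```python
from collections import Counter

URL_COLUMN = 8  # 9th column, zero-based; an assumption about the TSV layout


def count_by_scheme(lines):
    """Count TSV rows by the scheme found at the very start of the URL column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= URL_COLUMN:
            counts["malformed"] += 1
            continue
        url = fields[URL_COLUMN]
        if url.startswith("https://"):
            counts["https"] += 1
        elif url.startswith("http://"):
            counts["http"] += 1
        else:
            counts["other"] += 1
    return counts


def make_row(url):
    """Build a fake row for illustration; only the URL column is meaningful."""
    return "\t".join(["x"] * URL_COLUMN + [url, "-"])


# Hypothetical sample: 23 varnish rows plus 3 SSL terminator rows.
sample = [make_row("http://en.m.wikipedia.org/wiki/Berlin")] * 23
sample += [make_row("https://en.m.wikipedia.org/wiki/Berlin")] * 3

counts = count_by_scheme(sample)
print("http:", counts["http"], "https:", counts["https"])
```

If the "http" count matches the Hive row count for the same hour and filter, the remaining gap is explained by the SSL terminator lines. Monitoring requests may still cause small differences, as noted above.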
[19:31:37] all this brings us to basically the issue of dups in log imports [19:32:02] dups for you ... useful information for others .-) [19:32:03] i guess i will have to remove the https://, and only treat them as https when https=1 is given [19:32:15] thanks, that might account for the issue we were having [19:32:26] will stop bugging you for now ) [19:32:39] Oh. Now I get your point. You're backfilling for historical data? [19:32:46] This is not for the new files? [19:32:51] s/files/hours/ [19:33:21] I thought you were importing the TSVs just to proved that Hive is wrong :-) [19:33:34] Yes, totally. When importing old TSVs, [19:33:42] throw the https:// away. [19:33:56] But be carefull to only grep for https in the 9th column. [19:34:08] And only at the beginning of it. [19:34:26] There are some requests where https:// occurs elsewhere. [19:34:34] Like in paramaters, or in the Referer. [19:35:45] Anywaaaaaaaay. Thanks for dealing with all that! :-) [19:36:13] ire [19:36:20] wrong text field! [19:36:28] :-) [19:39:04] Oh ... and yurikR, if you are importing old. Be aware previously, there was also squids that logged to TSVs. They also used a slightly different format. [19:39:09] https://wikitech.wikimedia.org/wiki/Cache_log_format [19:39:18] Should (hopefully) have all the details you need. [19:40:15] (Argh. Today my typing is double-extra-lousy :-( ) [19:43:08] qchris, yes, i take care of all the formatting diffs [19:43:18] and create properly formatted hive-style data [19:43:25] Cool. 
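(Editor's note: one reading of the advice above is to drop SSL terminator rows while importing old TSVs, testing only the 9th column and only its beginning, so that "https://" inside parameters or the Referer does not trigger a match. The following is a minimal sketch under that reading. The column index, the function name, and the sample rows are assumptions for illustration and are not taken from any existing import script.)

```python
URL_COLUMN = 8  # 9th column, zero-based; assumed position of the URL field


def drop_ssl_terminator_rows(lines):
    """Yield only rows whose URL column does not begin with 'https://'.

    Rows with too few columns are passed through untouched so that a
    later validation step can decide what to do with them.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > URL_COLUMN and fields[URL_COLUMN].startswith("https://"):
            continue  # logged by an SSL terminator; the varnish row is kept
        yield line


def make_row(url, referer="-"):
    """Build a fake row for illustration; only URL and referer are meaningful."""
    return "\t".join(["x"] * URL_COLUMN + [url, referer])


# Hypothetical rows: one varnish row, one SSL terminator row, and one
# varnish row where https:// appears only in the referer and in a parameter.
rows = [
    make_row("http://en.m.wikipedia.org/wiki/Berlin"),
    make_row("https://en.m.wikipedia.org/wiki/Berlin"),
    make_row("http://en.m.wikipedia.org/w/index.php?x=https://example.org",
             referer="https://www.example.org/"),
]

kept = list(drop_ssl_terminator_rows(rows))
print(len(kept), "of", len(rows), "rows kept")
```

Using `startswith` on the split field, rather than a plain substring search over the whole line, is what keeps the third sample row in the output.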
[19:43:36] i didn't know about the https [19:43:46] thx for clarifying ) [19:44:29] yw [19:46:24] (PS1) Milimetric: Add example query, config, and graph [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/177601 [19:47:29] (PS2) Milimetric: Add example query, config, and graph [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/177601 [19:48:52] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819344 (Milimetric) [19:54:52] (PS2) QChris: [DO NOT SUBMIT] First shot at deduping [analytics/refinery] - https://gerrit.wikimedia.org/r/177522 [19:56:13] (CR) Catrope: [C: -1] Add example query, config, and graph (1 comment) [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/177601 (owner: Milimetric) [20:00:42] (PS3) Milimetric: Add example query, config, and graph [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/177601 [20:01:00] (CR) Milimetric: [C: 2 V: 2] Add example query, config, and graph [analytics/limn-edit-data] - https://gerrit.wikimedia.org/r/177601 (owner: Milimetric) [20:01:42] ottomata: could you merge https://gerrit.wikimedia.org/r/#/c/177602/2 [20:01:51] (editing team's repo is all set with initial stuffs) [20:04:45] ooooo so many coming in! [20:04:47] ok [20:04:55] danke [20:06:15] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819375 (Jdforrester-WMF) [20:08:28] milimetric, leila: I just sent you both an invite for next Wednesday to discus LE analytics work. Please let me know if you can't make it. Thanks! [20:09:07] jsahleen: just making sure we're all required - what will we be covering in the meeting? [20:09:50] milimetric: I'd like to discuss how best to collect our metrics and see if we can get dashboards set up so they are ready when we deploy in January. [20:10:15] jsahleen: ok, so maybe leila is good to talk to about metrics and what to analyze. 
[20:10:28] milimetric: we can do the dashboards later if you want. [20:10:55] as far as dashboards, I just set them up for flow and the edit team. All it takes is for you to request it in Phabricator. Here's, for example, Editing team's request: https://phabricator.wikimedia.org/T76744 [20:11:27] I was able to do it quickly after they requested it, and I'll certainly do the same for you [20:11:42] jsahleen, let's keep the calendar event as a HOLD, pending on the outcome of the earlier meeting that will finalize the metrics and commitments [20:11:51] jsahleen: if you'd like, as soon as you have event logging data we can set up a very basic dashboard? Let me know [20:12:09] milimetric: Got it. No need to attend the meeting then. We can connect after everything is confirmed with Leila. [20:12:17] leila: Ok. I will set it as a hold. [20:12:23] thanks jsahleen [20:12:27] ok, cool - I await your command good sir [20:12:34] :) [20:15:31] milimetric: hoy! [20:15:52] ori: howdy [20:16:22] EL is blowing up like fireworks on Jan 1st in the Netherlands [20:16:32] blowing up in a good or bad way? [20:16:42] I'm setting up dashboards left and right man [20:16:43] milimetric: re: real time -- i'm just accepting your invitation now, but could i ask you to gather as much specific detail as possible about the user requirements? i'm a bit worried that "real-time" means "i get everything i want instantly", which is a hard spec to meet. [20:16:44] yeah - crap I gotta work on my analogies [20:17:03] I was mostly setting up that meeting for ottomata [20:17:19] he's hearing a lot of the same kind of thing from different folks [20:17:39] I think this meeting is to have a few of us talk about it and what we could do as a "first step" [20:17:44] cool [20:18:43] ja, ori, this would be a meeting for us to just brain bounce on a way to get people realtimeish app metrics, like we've kinda talked about before [20:18:51] btw, awesome to see HHVM doing so well. 
Congratz to you and Aaron [20:18:58] ottomata: yep! [20:19:06] I once had a guy in an interview explain to me why graphite wasn't really "real time" because it's impossible :) I said, yeah thanks for that info....duh [20:19:11] milimetric: YuviPanda deserves the credit [20:19:17] btw, ori: https://gerrit.wikimedia.org/r/#/c/177546/ :D [20:19:24] haha [20:19:44] ottomata: reviewing [20:20:12] ottomata: what about historic data in ganglia? are you okay with forfeiting that? (or having the discontinuity?) [20:20:19] chasemp: lol [20:20:25] i'm not changing the names of the metrics here [20:20:30] just the filenames [20:20:30] so [20:20:47] /etc/varnishkafka/webrequest.conf, /var/cache/varnishkafka/webrequest.stats.json [20:20:54] ori: heh I do quite like walking in at the last minute and taking all credit for a thing I was mostly not involved in :) [20:20:58] the contents of those files should remain the same [20:21:08] i will have to manually clean up the old named instances [20:21:47] Analytics-EventLogging, Analytics-Engineering: Epic: WMF Engineer reads documentation to set up a dashboard from EL data - https://phabricator.wikimedia.org/T76362#819407 (Milimetric) [20:21:48] Analytics-EventLogging: Engineer reads documentation on Wikitech to set up a dashboard from EL data [3 pts] - https://phabricator.wikimedia.org/T76364#819404 (Milimetric) [20:21:52] yeah, correct. i just refreshed my memory by looking at the varnishkafka module code. lgtm [20:22:04] danke [20:22:08] Analytics-EventLogging, Analytics-Engineering: Epic: WMF Engineer reads documentation to set up a dashboard from EL data - https://phabricator.wikimedia.org/T76362#798767 (Milimetric) [20:22:09] Analytics-EventLogging: Engineer knows by when to expect a dashboard from EL data [1 pts] - https://phabricator.wikimedia.org/T76365#819408 (Milimetric) Open>Resolved a:Milimetric We might be wrong on the timing 'cause I just did it in way less time. But ... 
resolving :) [20:22:31] i will wait until tomorrow before I try to merge that [20:33:19] (CR) Milimetric: [C: -1] "Thanks very much for the work. I left some comments in-line." (3 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 (owner: Unicodesnowman) [20:33:40] nuria__: sorry I just remembered you were going to review that ^ [20:34:12] milimetric: np, did you tried changing res with chrome mobile tools? [20:34:41] i did, and looked at his screenshots. I'm glad that the problem is getting tackled, but there's still some work [20:34:42] milimetric: there are lot of viewport sizes that chrome supports now [20:34:56] yeah, I was using that the other day to help a friend - thanks for pointing those out [20:35:08] I still always just resize my dev tools :) [20:35:27] ori: btw, not sure who set this up, but grafana is awesome [20:35:55] brb [20:41:03] (CR) Nuria: Make Dashiki responsive (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 (owner: Unicodesnowman) [20:42:31] Analytics-Engineering: EPIC: Getting Mondrian & Saiku productionized - https://phabricator.wikimedia.org/T76739#819435 (ggellerman) [20:42:44] ottomata: i did! [20:43:13] ori, super awesome [20:43:22] just now playing with it [20:43:22] http://grafana.wikimedia.org/#/dashboard/db/kafkatest [20:43:46] Analytics-Engineering: pick a version of Pentaho (Community Edition vs. Enterprise) - https://phabricator.wikimedia.org/T76753#819439 (ggellerman) [20:44:20] Analytics-Engineering: add Pentaho to Archiva - https://phabricator.wikimedia.org/T76754 (ggellerman) NEW p:Normal [20:44:41] Analytics-Engineering: puppetize Pentaho - https://phabricator.wikimedia.org/T76755#819453 (ggellerman) [20:44:50] ottomata: pretty, nice that it has a dashboard finder! 
[20:45:00] Analytics-Engineering: LDAP authentication - https://phabricator.wikimedia.org/T76756 (ggellerman) NEW p:Normal
[20:45:15] Analytics-Engineering: provision hardware and install - https://phabricator.wikimedia.org/T76757 (ggellerman) NEW p:Normal
[20:45:21] yeah, and you can apply all these amazing graphite functions to the data
[20:45:36] Analytics-Engineering: Have a PostgreSQL database to pull data from (warehouse database on analytics-store) - https://phabricator.wikimedia.org/T76758 (ggellerman) NEW p:Normal
[20:45:57] and an API even!
[20:45:59] well, that's graphite
[20:46:09] this uses graphite's api to pull out datapoints in json
[20:47:09] ottomata: graphite is awesome no doubt
[20:51:07] Analytics-Engineering: EPIC: Aggregating and Shaping the data for browsing - https://phabricator.wikimedia.org/T76761 (ggellerman) NEW p:Triage
[20:51:37] Analytics-Engineering: Creating the data in Hadoop from the raw request data, with Oliver's definition - https://phabricator.wikimedia.org/T76762 (ggellerman) NEW p:Triage
[20:51:47] WHAAAAA
[20:52:11] GET outta here, who's tracking that :p
[20:52:12] Analytics-Engineering: Define Oozie-executable job to aggregate the data (Hive?) - https://phabricator.wikimedia.org/T76763 (ggellerman) NEW p:Triage
[20:52:38] Analytics-Engineering: Schedule Oozie job and set up monitoring - https://phabricator.wikimedia.org/T76764 (ggellerman) NEW p:Triage
[20:53:00] Analytics-Engineering: Sqoop? data from Hadoop into a PostgreSQL database, Oozify and monitor this - https://phabricator.wikimedia.org/T76765 (ggellerman) NEW p:Triage
[20:53:28] Analytics-Engineering: Mondrian has access to the PostgreSQL database - https://phabricator.wikimedia.org/T76766 (ggellerman) NEW p:Triage
[20:53:37] ggellerman____: you are crazy and awesome!
[20:54:06] I am crazy, but thanks! :-)
[20:54:29] Analytics-Engineering: make cubes as windows into warehouse - https://phabricator.wikimedia.org/T76767#819541 (ggellerman)
[20:59:35] Analytics: Move stat1002 and stat1003 into Analytics VLAN - https://phabricator.wikimedia.org/T76346#819561 (Gage)
[21:00:03] Analytics-Wikimetrics, Analytics-Engineering: EPIC: Productionizing Wikimetrics - https://phabricator.wikimedia.org/T76726#819562 (ggellerman) see Dec 4, 2014 tasking mtg notes: http://etherpad.wikimedia.org/p/analytics-tasking
[21:00:10] Analytics: Move stat1002 and stat1003 into Analytics VLAN - https://phabricator.wikimedia.org/T76346#798392 (Gage)
[21:01:24] Analytics-Wikimetrics, Analytics-Engineering: [Dev - 8 pts] upgrade wikimetrics to trusty (using same install process as we have now. Probably just setting up new instances, not upgrading them) - https://phabricator.wikimedia.org/T76769 (ggellerman) NEW p:Normal
[21:02:52] Analytics-Wikimetrics, Analytics-Engineering: [Dev 13 pts] change wikimetrics in labs to use debian packages, try in staging 1st, make sure this works alongside of pip install with webserver part, too - https://phabricator.wikimedia.org/T76770 (ggellerman) NEW p:Normal
[21:13:31] (CR) Nuria: [C: 2 V: 2] Replace Bugzilla with Phabricator on the support page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/177460 (owner: Bmansurov)
[21:13:57] (Merged) jenkins-bot: Replace Bugzilla with Phabricator on the support page [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/177460 (owner: Bmansurov)
[21:22:12] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819633 (Milimetric)
[21:27:11] VisualEditor, Analytics-Engineering: Dashboard repository for Edit schema - https://phabricator.wikimedia.org/T76744#819648 (Jdforrester-WMF)
[21:27:39] hmm, milimetric|post just a thang: https://github.com/RuckusWirelessIL/pentaho-kafka-consumer
[21:28:30] ottomata: cool, but that's a kettle consumer
[21:28:35] we're not planning on installing that - though we could
[21:28:37] dunno what ees dees kettle
[21:28:40] it's their data transformation thingy
[21:28:51] basically their ETL solution
[21:29:32] i didn't know how to go about trying it out - but if you want to look into it, might be worth your time
[21:29:35] if it works as advertised, anyway
[21:29:35] aye
[21:30:01] * YuviPanda likes ottomata's spellings today :)
[21:37:48] Analytics-Wikimetrics, Analytics-Engineering: [Dev 13 pts] Fix oauth and do a quick pre-security review. - https://phabricator.wikimedia.org/T76779#819674 (ggellerman)
[21:38:39] Analytics-Wikimetrics, Analytics-Engineering: [Dev 8 pts] Change admin script so it doesn't have to make symlinks, configure production instance to write files to /a/limn-public-data - https://phabricator.wikimedia.org/T76780 (ggellerman) NEW p:Normal
[21:46:36] Analytics-Wikimetrics, Analytics-Engineering: Arrange for security review - https://phabricator.wikimedia.org/T76782 (ggellerman) NEW p:Normal
[21:51:03] milimetric: questuon: should teh alembic changeset be one on top of the current dw changes we have ?
[21:51:07] *the
[21:51:13] *question
[21:51:46] Analytics, Analytics-Engineering: THEME: Analyst uses an operationalized Saiku - https://phabricator.wikimedia.org/T75246#819752 (ggellerman)
[21:52:48] nuria__: sure, let's merge that one as "round 1" and you can add a second one for the alembic stuff
[21:53:03] !log deployed refinery 115f510
[21:53:22] k, have all pieces now, just need to make sure things work as advertised for our purpose
[21:56:00] Analytics, Analytics-Engineering: THEME: Analyst uses an operationalized Saiku - https://phabricator.wikimedia.org/T75246#819770 (ggellerman) see related epics: EPIC: Getting Mondrian & Saiku productionized https://phabricator.wikimedia.org/T76739 EPIC: Aggregating and Shaping the data for browsing https://p...
[21:56:14] (CR) Unicodesnowman: Make Dashiki responsive (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 (owner: Unicodesnowman)
[21:56:23] time to try and learn react :)
[21:58:13] unicodesnowman: hm... the problem might be simpler than that?
[21:58:33] your assumption that the graph needs to fill the page: that's true, but not a hard requirement
[21:59:03] most of the data we have won't display if the graph is too small. So we made min-width and min-height for a reason
[21:59:23] Analytics, Analytics-Engineering: THEME: Analyst uses an operationalized Saiku - https://phabricator.wikimedia.org/T75246#819780 (ggellerman)
[21:59:43] it's fine if the layout generally spills out - that just means people's screens are too small. If we have a specific problem with a projector layout or something, we can change the min-height and min-width for that media maybe
[22:00:00] Analytics, Analytics-Engineering: THEME: Analyst uses an operationalized Saiku - https://phabricator.wikimedia.org/T75246#753415 (ggellerman)
[22:00:00] oh, hmm
[22:00:32] do you think the overflow approach is good for mobile though? maybe I'll only apply the overflow in responsive.js
[22:00:33] css*
[22:00:59] unicodesnowman: your irc nick is my favorite
[22:01:36] ori, thanks! ☃
[22:01:50] Analytics-Wikimetrics, Analytics-Engineering: debianizing any (scheduler/queue) wikimetrics dependencies that are not in trusty / our apt
[22:01:59] such a good nick, yea :)
[22:02:24] um, unicodesnowman: not sure about mobile. That's always a pain... One thought I had was to force landscape mode?
[22:02:50] i mean some of these graphs don't work very well in such a small space.
[22:03:59] where does the whitelisted_mediawiki_projects hiveconf variable live, anyone?
[22:05:38] milimetric, landscape mode is better but the whole graph still can't be fitted in the screen, so I think scrolling is a good way to display the graph
[22:06:51] unicodesnowman: i agree, i think. If you set it up that way I think it would be a very worthwhile experiment anyway
[22:09:19] ok :) I'll try and get a collapsible menu working too
[22:11:30] unicodesnowman: cool, let me know if you need to brain-bounce anything. Knockout can be weird to work with if you're not used to it
[22:14:43] later everyone!
[22:15:53] take care!
[22:21:42] erm, how do I get knockout to update its components? I've been modifying components/project-selector.html and it's not showing up, even after gulping
[22:26:09] ah, browser caching issue.
[22:43:09] Analytics-Refinery: Find deployment host to deploy refinery from that has neither refinery-hive, nor passwords in hive-site.xml - https://phabricator.wikimedia.org/T76806#819971 (QChris)
[22:43:15] Analytics-Engineering: EPIC EL documentation & evangelism - https://phabricator.wikimedia.org/T76795#819978 (ggellerman)
[22:43:49] ha, good call qchris!
[22:43:56] mhmm?
[22:44:26] Oh. wikibugs already reported that :-)
[22:44:40] Ja. Oozie jobs are failing with missing that jar.
[22:44:51] It took me a bit to see where it's coming from.
[22:45:15] I monkey patched today's deployment.
[22:45:25] Hopefully that will make jobs pass again :-)
[22:46:15] Analytics-Dashiki: Dashiki needs to have a friendlier mobile view. - https://phabricator.wikimedia.org/T75030#819986 (Aklapper) >>! In T75030#818914, @Nuria wrote: > Will take a look at the code within the day today. Hi Nuria, once you've had time to review Unicodesnowman's patch in Gerrit, could I ask you t...
[22:49:04] Analytics-Engineering: set up weekly office hours for EL - https://phabricator.wikimedia.org/T76796#819987 (ggellerman)
[22:49:29] Analytics-Engineering: work with Rachel on how best to evangelize - https://phabricator.wikimedia.org/T76797#819988 (ggellerman)
[22:49:48] Analytics-Engineering: set up mtg w/Visual Editor (Ron) to talk about what we did months back - https://phabricator.wikimedia.org/T76798#819989 (ggellerman)
[22:50:09] Analytics-Engineering: set up hackathon at MWDS - https://phabricator.wikimedia.org/T76800#819992 (ggellerman)
[22:50:25] Analytics-Engineering: drive consistency across all of the schemas (standard fields & def'ns) - https://phabricator.wikimedia.org/T76801#819993 (ggellerman)
[22:50:52] Analytics-Engineering: think through capacity management & Information Architecture - https://phabricator.wikimedia.org/T76803#819994 (ggellerman)
[22:53:00] !log Monkey patched 2014-12-04T21.51.53Z--115f510 and "current" deployment of refinery to not add refinery-hive.jar to oozie's hive config (See {{PhabT|76806}})
[22:55:08] ok, i got to run, ttyl
[22:55:46] qchris: btw, shinken warning for free space on /var/log for your instance: http://shinken.wmflabs.org/host/qchris-master
[22:55:52] (guest/guest) uid/pw
[22:56:04] Oh. Cool :-)
[22:56:07] Thanks.
[22:56:19] qchris: :) yw!
[22:58:35] Mhmmm. Cannot ACK. I guess that's ok for a "guest".
[22:59:14] qchris: yeaaaah, ACL is a bit messed up still, and no easy way to fix that
[22:59:34] No worries :-) I just clean up /var/logs a bit.
[22:59:43] qchris: :) cool
[22:59:54] Btw ... Is there some easy way to opt out of monitoring instances?
[23:00:03] not yet, I'm afraid.
[23:00:08] but I think that should be done.
[23:00:08] I would not want you to get emails just because some of my instances are failing.
[23:00:15] k.
[23:00:16] well, I personally don't mind :P
[23:00:24] but I think we need far more fine grained control than what we have now
[23:00:27] :-D
[23:00:40] qchris: although, personally, in an ideal world, there wouldn't be an 'analytics' project, but a 'wikimetrics' project, etc
[23:00:51] Full ACK.
[23:01:11] qchris: and now that new project creations go through phab, that should be much easier!
[23:01:16] no more requests rotting for weeks
[23:02:40] Yay!
[23:05:12] Analytics-Wikimetrics: Get a separate Labs Project to host wikimetrics instances in - https://phabricator.wikimedia.org/T76808#820023 (QChris)
[23:05:43] YuviPanda: Let's see if ^ gets prioritized :-)
[23:06:55] qchris: so, clicking 'request project' in wikitech sidebar takes you to https://phabricator.wikimedia.org/T76375 with instructions on where to file the task :)
[23:07:16] qchris: so if you follow that all the labs admins get notified, and I usually do them as soon as I see them
[23:07:17] I'm gonna link that :-)
[23:07:47] * qchris pats YuviPanda on the shoulder for being a great admin :-)
[23:08:00] heh :)
[23:12:02] Analytics-Wikimetrics: Get a separate Labs Project to host wikimetrics instances in - https://phabricator.wikimedia.org/T76808#820045 (QChris) Just in case ... it seems T76375 is the new place to add requests under.
[23:25:00] Analytics: After switch to local ssl terminators, pagecounts-raw (C implementation of webstatscollector) overcounts HTTPS from some data-centers. - https://phabricator.wikimedia.org/T76390#820052 (QChris) ottomata deployed the fixed webstatscollector between 2014-12-04T14:00:00 and 2014-12-04T14:03:00. The co...
[23:26:00] (CR) Nuria: Make Dashiki responsive (1 comment) [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 (owner: Unicodesnowman)
[23:41:41] (PS2) Unicodesnowman: Make Dashiki responsive [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487
[23:43:45] (CR) Unicodesnowman: "Ok, done :) This is now just the mobile changes, without changing overflow on the desktop." [analytics/dashiki] - https://gerrit.wikimedia.org/r/177487 (owner: Unicodesnowman)
[23:46:27] Analytics-Wikimetrics: Tag cohort functionality needs to check for ownership - https://phabricator.wikimedia.org/T68483#820070 (bmansurov) a:bmansurov