[11:19:29] Analytics-Cluster: Raw text webrequest partitions for 2014-12-30T20/1H not marked successful - https://phabricator.wikimedia.org/T85692#952364 (QChris) NEW
[11:20:00] Analytics-Cluster: Raw text webrequest partitions for 2014-12-30T20/1H not marked successful - https://phabricator.wikimedia.org/T85692#952364 (QChris) It only affects amssq42.esams.wmnet, which has 101 duplicates between 2014-12-30T20:50:33 and 2014-12-30T20:50:38 analytics1012: [2014-12-30 20:52:02,145]...
[11:20:31] Analytics-Cluster: Raw text webrequest partitions for 2014-12-30T20/1H not marked successful - https://phabricator.wikimedia.org/T85692#952371 (QChris) Open>Resolved a:QChris Deduped the partition.
[11:20:32] Analytics-Cluster: Raw webrequest partitions that were not marked successful - https://phabricator.wikimedia.org/T72085#952374 (QChris)
[11:21:11] !log Marked raw text webrequest partition for 2014-12-30T20/1H ok (See {{PhabT|85692}})
[12:07:43] Analytics-Cluster: Raw text webrequest partitions for 2014-12-29T17/1H not marked successful - https://phabricator.wikimedia.org/T85695#952408 (QChris) NEW
[12:31:51] Analytics-Cluster: Raw text webrequest partitions for 2014-12-29T17/1H not marked successful - https://phabricator.wikimedia.org/T85695#952423 (QChris) Around 2014-12-29T17:30, commit 64c7ea2ca798666bdd1638f2fa5423a226610688 broke varnish's reporting to the correct kafka topic on 2014-12-29T18:22. It was fixe...
[12:34:50] Analytics-Cluster: Raw mobile webrequest partitions for 2014-12-29T17/1H not marked successful - https://phabricator.wikimedia.org/T85695#952424 (QChris)
[12:39:34] !log Marked raw mobile webrequest partition for 2014-12-29T17/1H ok (See {{PhabT|85695}})
[14:27:30] (CR) Ottomata: Mobile apps oozie jobs (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/181017 (owner: Nuria)
[14:29:09] (CR) Ottomata: "Cool! Makes sense. Lets definitely make this generic in the util/ directory. email can then be a parameter too, etc." [analytics/refinery] - https://gerrit.wikimedia.org/r/182350 (owner: Nuria)
[14:49:48] qchris: hello!
[14:49:56] ottomata: Heya! Happy new year!
[14:50:37] happy new year to you too!
[14:50:51] You already back in working mode?
[14:50:55] ja, working today
[14:50:58] didn't work on wed.
[15:06:30] Analytics: analytics-logbot is no longer in the #wikimedia-analytics channel - https://phabricator.wikimedia.org/T85698#952517 (QChris) NEW
[15:17:27] so qchris, analytics1021 is offline
[15:17:30] well.
[15:17:31] not a leader
[15:17:41] did you want to look at it for something while it was in this state?
[15:21:20] the esams-bits issue should surface more clearly while we only have 3 leaders.
[15:21:45] s/esams-bits/esams/
[15:22:15] I am currently cleaning up and rerunning the partitions that failed during christmas,
[15:22:33] so I haven't found time to look at the issue again.
[15:22:50] (And I am not sure if I am supposed to look into it in the first place)
[15:22:51] But!
[15:23:03] We tuned cp3022's timeout some time ago.
[15:23:29] And for one of the issues, it only had 252 duplicates, where all others had missings.
[15:23:51] Might be a coincidence.
[15:23:56] But still :)
[15:38:10] hmmm interesting
[15:38:13] missing is better than duplicate!
[15:46:14] I'd say the opposite. Duplicate is better than missing.
[15:46:19] We can always dedupe.
[15:46:37] But we cannot regenerate lines that are missing from kafka.
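The duplicate-versus-missing argument above can be sketched in a few lines. This is only an illustration, not the cluster's actual dedupe job: it assumes each raw line carries a (hostname, sequence) pair that identifies it, and the record layout and function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (hostname, sequence number, payload).
Record = Tuple[str, int, str]


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each (hostname, sequence) pair once, keeping the first occurrence."""
    seen = set()
    for hostname, sequence, payload in records:
        key = (hostname, sequence)
        if key in seen:
            continue  # duplicate: safe to drop, nothing is lost
        seen.add(key)
        yield hostname, sequence, payload


def missing_sequences(records: Iterable[Record]) -> Dict[str, List[int]]:
    """Report per-host gaps between the lowest and highest sequence seen.

    Gaps can only be reported, not repaired: the lines themselves are gone.
    """
    by_host: Dict[str, set] = {}
    for hostname, sequence, _ in records:
        by_host.setdefault(hostname, set()).add(sequence)
    gaps: Dict[str, List[int]] = {}
    for hostname, seqs in by_host.items():
        absent = sorted(set(range(min(seqs), max(seqs) + 1)) - seqs)
        if absent:
            gaps[hostname] = absent
    return gaps
```

Duplicates are removed by `dedupe` without losing information, while `missing_sequences` can only list what is absent, which is the asymmetry described in the conversation.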
[15:48:50] Analytics-Cluster: Raw text webrequest partitions for 2014-12-11T20/1H not marked successful - https://phabricator.wikimedia.org/T85699#952580 (QChris) NEW
[15:49:07] Analytics-Cluster: Raw webrequest partitions that were not marked successful due to depooled servers interfering with monitoring - https://phabricator.wikimedia.org/T74649#952593 (QChris)
[15:49:09] Analytics-Cluster: Raw text webrequest partitions for 2014-12-11T20/1H not marked successful - https://phabricator.wikimedia.org/T85699#952580 (QChris) Open>Resolved a:QChris It only affected cp1008, which is a test host for SSL terminators. cp1008 had its sequence numbers reset. No missing/duplicates.
[15:51:10] !log Marked raw text webrequest partition for 2014-12-11T20/1H ok (See {{PhabT|85699}})
[15:52:27] uh, that is what I meant to say.
[15:52:39] :-D
[15:54:54] Analytics-Cluster: Raw mobile webrequest partitions for 2014-12-29T17/1H not marked successful - https://phabricator.wikimedia.org/T85695#952601 (QChris) Open>Resolved a:QChris
[15:54:55] Analytics-Cluster: Raw webrequest partitions that were not marked successful due to deployments gone wrong - https://phabricator.wikimedia.org/T74299#952603 (QChris)
[16:44:08] qchris, fyi, at that time the config was conditionally restricted to apply only on mobiles
[16:44:43] ottomata: you mean for T74299?
[16:45:20] s/T74299/T85695/
[16:46:08] Two esams bits hosts look fishy around that time too.
[16:46:08] One is harmless.
[16:46:14] The other has lots of duplicates and missings.
[16:46:46] But I did not put it in the bug, as it might be unrelated, and we don't care about bits too much at this point.
[16:47:11] 74299, ja
[16:54:40] so qchris, i am playing with parquet webrequest data
[16:54:56] i was writing a mapreduce job to convert, but then realized it is kinda easier to do with just hive
[16:55:01] and I can cluster that way too.
[16:55:17] Like replacing the current storage format or in addition to it?
[16:55:29] i think in addition to it. i could probably get camus to write parquet
[16:55:35] but then the stuff wouldn't be clustered anyway
[16:55:52] so, ja, etl isn't really well defined
[16:56:01] but, i'm thinking about just starting on it sooner rather than later, and iterating.
[16:56:10] if I can get something settled
[16:56:19] Sounds great.
[16:56:27] maybe an oozie job that runs a hive query to select from and insert into parquet clustered by
[16:56:40] since we almost have pageview, I could even etl out an is_pageview field too
[16:56:51] Sounds great :-)
[16:56:56] and, if we get a good geocode soon (ananth?), I can add that too
[16:57:06] i'll still keep this dataset around only for 30 days
[16:57:41] picking clustering columns and numbers isn't obvious though.
[16:57:43] With those two (is_pageview + country) combined, I can even see a clear use case for the "columnar" aspect of parquet.
[16:57:49] ja
[16:58:03] at the very least it would make filtering much faster I think, right?
[16:58:10] if you only wanted to look at pageviews, for example
[16:58:22] I'd think so too.
[16:58:24] i guess hive would be able to read only the pageview column and filter that way?
[16:58:26] not really sure how that works.
[16:58:32] anyway, clustering, yeah hm. so
[16:58:47] i pick a column (or columns), and then hive hashes them and stores them in N files based on the hash
[16:58:48] so
[16:59:09] clustered by (COLUMN,LIST) into N buckets
[16:59:23] I'm choosing N mainly on data size
[16:59:33] i want to keep files roughly the same as HDFS block size
[16:59:45] Sounds like a good plan
[16:59:47] this is hard to do since partitions are different sizes, e.g. mobile is so much smaller than the others
[17:00:49] but, i'm not sure what column to cluster on.
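The bucketing idea described above (hash the clustering column, write each row to one of N files, pick N so that files come out near the HDFS block size) can be sketched roughly as follows. Everything here is an assumption for illustration: the 256 MiB block size is a made-up default, `zlib.crc32` merely stands in for whatever hash Hive applies, and the function names are hypothetical.

```python
import zlib

# Assumed block size for illustration only; the real cluster setting may differ.
ASSUMED_BLOCK_BYTES = 256 * 1024 * 1024


def bucket_count(partition_bytes: int, block_bytes: int = ASSUMED_BLOCK_BYTES) -> int:
    """Pick N so that each bucket file is at most roughly one block in size."""
    # Ceiling division without floats; always use at least one bucket.
    return max(1, -(-partition_bytes // block_bytes))


def bucket_for(value: str, n_buckets: int) -> int:
    """Map a clustering-column value to a bucket index in [0, n_buckets)."""
    # crc32 is a stand-in hash, not Hive's own hash function.
    return zlib.crc32(value.encode("utf-8")) % n_buckets
```

A small partition such as mobile yields a much smaller N than a large one under this rule, which is why a single fixed bucket count fits the different webrequest sources poorly.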
[17:00:54] i think for just sampling
[17:00:55] it doesn't matter
[17:00:59] ip might be a good candidate
[17:01:17] since there is a good amount of variety there
[17:01:24] but also isn't completely random
[17:01:27] Do people filter much on ip (I don't)
[17:01:31] no
[17:01:36] but if you are just sampling it doesn't matter
[17:01:39] i mean
[17:01:46] if you are using the bucketing just for sampling it doesn't matter
[17:01:47] Oh. Just for the bucketing :-)
[17:01:49] right.
[17:02:06] but, apparently hive can optimize if you are doing joins between tables that are clustered on the same columns
[17:02:10] Timestamp?
[17:02:10] So dt.
[17:03:28] Mhmm ...
[17:03:37] Not sure joins are such a good idea overall.
[17:03:49] What would people want to join against?
[17:04:44] right now nothing, but I suppose eventually...mediawiki data?
[17:04:51] article title maybe?
[17:05:32] Yup. But clustering by IP won't help much there.
[17:05:58] One could cluster by uri_path for that.
[17:06:12] That won't give the right cluster for all cases, but for most.
[17:06:41] hm, yeah, i'm not sure how hive does this with the joins. like, how it would know two tables are clustered by the same data. I suppose the hash of the join fields would have to be the same
[17:07:09] so, in this example, uri_path wouldn't be good, but extracting the title out of the path would
[17:07:12] ha, hm. or, page_id
[17:07:15] if we ever get that!
[17:07:35] page_id would be a great candidate for clustering :-)
[17:07:44] s/clustering/bucketing/
[17:07:56] hm, ok, i'll hold off worrying about that then. we can change later. :)
[17:08:01] At least for requests that have a page_id.
[17:08:03] will just use ip for now
[17:08:05] yeah.
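The guess above, that a join between two bucketed tables only works out if both hash the join field the same way, can be illustrated with a toy bucket-by-bucket join. This is a sketch under assumptions, not a description of how Hive implements it: `zlib.crc32` is a stand-in hash, rows are plain dicts, and the `page_id` column is hypothetical since the data did not have one at the time.

```python
import zlib
from collections import defaultdict


def bucketize(rows, key, n_buckets):
    """Group rows by hash(key) % n_buckets, like files of a bucketed table."""
    buckets = defaultdict(list)
    for row in rows:
        index = zlib.crc32(str(row[key]).encode("utf-8")) % n_buckets
        buckets[index].append(row)
    return buckets


def bucketed_join(left, right, key, n_buckets):
    """Join two row lists by comparing only buckets with the same index.

    This is valid only because both sides use the same hash and the same
    bucket count, so equal keys always land in the same bucket index.
    """
    left_buckets = bucketize(left, key, n_buckets)
    right_buckets = bucketize(right, key, n_buckets)
    joined = []
    for index in range(n_buckets):
        lookup = defaultdict(list)
        for right_row in right_buckets.get(index, []):
            lookup[right_row[key]].append(right_row)
        for left_row in left_buckets.get(index, []):
            for right_row in lookup.get(left_row[key], []):
                joined.append({**left_row, **right_row})
    return joined
```

Each bucket pair is processed on its own, so no bucket on one side ever needs to be compared against more than one bucket on the other.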
[17:08:05] hm
[17:08:21] yeah i wonder if that would end up with a rather large bucket, for those with page_id: "-" (or whatever)
[17:34:45] Analytics-Cluster: Raw webrequest partitions that were not marked successful due to oozie being overwhelmed - https://phabricator.wikimedia.org/T85704#952690 (QChris) NEW
[17:41:46] Analytics-Cluster: About half of the raw webrequest partitions for 2014-12-22T18/8H not marked successful - https://phabricator.wikimedia.org/T85705#952701 (QChris) NEW
[17:47:07] Analytics-Cluster: Raw webrequest partitions that were not marked successful due to oozie being overwhelmed - https://phabricator.wikimedia.org/T85704#952720 (QChris)
[17:47:08] Analytics-Cluster: About half of the raw webrequest partitions for 2014-12-22T18/8H not marked successful - https://phabricator.wikimedia.org/T85705#952717 (QChris) Open>Resolved a:QChris Resources on the cluster went scarce, and the jobs could not launch before they timed out. I re-started them by...
[18:59:10] Analytics-General-or-Unknown: Two webrequest partitions for 2014-12-26T06/1H not marked successful - https://phabricator.wikimedia.org/T85709#952778 (QChris) NEW
[19:03:44] Analytics-General-or-Unknown: Two webrequest partitions for 2014-12-26T06/1H not marked successful - https://phabricator.wikimedia.org/T85709#952816 (QChris) Analytics1021 got dropped out it's partition leader role around 2014-12-26T06:02 (which caused loss <1 second worth of traffic). --------------- * tex...
[19:03:57] Analytics-General-or-Unknown: Kafka broker analytics1021 not receiving messages every now and then - https://phabricator.wikimedia.org/T71667#952819 (QChris)
[19:03:58] Analytics-General-or-Unknown: Two webrequest partitions for 2014-12-26T06/1H not marked successful - https://phabricator.wikimedia.org/T85709#952817 (QChris) Open>Resolved a:QChris
[19:05:12] !log Marked raw text+upload webrequest partitions for 2014-12-26T06/1H ok (See {{PhabT|85709}})
[19:10:49] Analytics-Cluster: About half of the raw webrequest partitions for 2014-12-25T16/15H not marked successful - https://phabricator.wikimedia.org/T85710#952821 (QChris) NEW
[19:13:37] Analytics-Cluster: Raw webrequest partitions that were not marked successful due to oozie being overwhelmed - https://phabricator.wikimedia.org/T85704#952832 (QChris)
[19:13:39] Analytics-Cluster: About half of the raw webrequest partitions for 2014-12-25T16/15H not marked successful - https://phabricator.wikimedia.org/T85710#952827 (QChris) Open>Resolved a:QChris Resources on the cluster went scarce, and the jobs could not launch before they timed out. I re-started them by...
[21:03:10] (PS1) Ottomata: [WIP] First draft of refinement phase for webrequest [analytics/refinery] - https://gerrit.wikimedia.org/r/182478
[21:05:08] Analytics-General-or-Unknown: Some raw webrequest partitions for 2014-12-11T14/1H not marked successful - https://phabricator.wikimedia.org/T85712#952859 (QChris) NEW
[21:06:03] Analytics-General-or-Unknown: Kafka broker analytics1021 not receiving messages every now and then - https://phabricator.wikimedia.org/T71667#952868 (QChris)
[21:07:29] !log Marked raw bits, text, and upload webrequest partition for 2014-12-11T14/1H ok (See {{PhabT|85712}})
[21:21:21] !log Ran kafka leader re-election to bring analytics1021 back into the set of leaders
[22:57:17] (CR) QChris: Add UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata)
[23:01:28] (CR) QChris: Add UDF for classifying pageviews according to https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/180023 (owner: Ottomata)