[00:34:14] Analytics, Pageviews-API, Wikidata: "egranary digital library system" UA should be listed as a spider - https://phabricator.wikimedia.org/T135164#2290291 (Tbayer) >>! In T135164#2290869, @Addshore wrote: > Okay, so the first queries we used while looking also caught pages such as Special:RecentChange...
[00:50:17] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2291534 (Earwig) Is anyone gonna answer my question first?
[03:06:39] (PS5) Nuria: Match paths on request only if it is a web pageview [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168)
[03:08:09] (CR) jenkins-bot: [V: -1] Match paths on request only if it is a web pageview [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[03:18:30] (PS6) Nuria: Match paths on request only if it is a web pageview [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168)
[03:18:58] (CR) jenkins-bot: [V: -1] Match paths on request only if it is a web pageview [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[03:21:20] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2291694 (Nuria)
[03:21:22] Analytics-Kanban: Enforce policy for each schema: Sanitize {tick} [8 pts] - https://phabricator.wikimedia.org/T104877#2291695 (Nuria)
[03:21:24] Analytics, DBA: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#2291693 (Nuria) Open>Resolved
[04:21:42] (PS6) Nuria: [WIP] Fix unique devices bugs [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[04:28:26] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, and 2 others: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#2291749 (kaldari) @Sadads: No idea why it isn't logging currently. I remember @Ottomata saying someth...
[04:31:03] (PS7) Nuria: [WIP] Fix unique devices bugs [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[05:11:48] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2291758 (kaldari) @Earwig: Honestly I have no idea if Microsoft would consider your API t...
[08:11:21] Analytics, Pageviews-API, Wikidata: "egranary digital library system" UA should be listed as a spider - https://phabricator.wikimedia.org/T135164#2291917 (Addshore) >>! In T135164#2291517, @Tbayer wrote: >>>! In T135164#2290869, @Addshore wrote: >> Okay, so the first queries we used while looking als...
[10:59:38] Analytics, Revision-Slider, TCB-Team, Patch-For-Review: Data need: User Behaviour when comparing article revisions - https://phabricator.wikimedia.org/T134861#2279974 (Lea_WMDE) Just a summary of the data Jan and I are interested in: For the revision view as is (without the new revision slider):...
[11:03:05] Analytics, Pageviews-API, Wikidata: "egranary digital library system" UA should be listed as a spider - https://phabricator.wikimedia.org/T135164#2292179 (JAllemandou) @Tbayer : I suggested @Addshore to request webrequest on a specific hour for detailed user_agent analysis. For this check @Addshore,...
[11:06:15] Analytics, Revision-Slider, TCB-Team, Patch-For-Review: Data need: User Behaviour when comparing article revisions - https://phabricator.wikimedia.org/T134861#2292187 (Addshore) So the patch above collects the oldid and newid, from this we can get the timestamp of each and the position in history...
[11:08:57] Analytics, Pageviews-API, Wikidata: "egranary digital library system" UA should be listed as a spider - https://phabricator.wikimedia.org/T135164#2292190 (Addshore) >>! In T135164#2292179, @JAllemandou wrote: > @Tbayer : I suggested @Addshore to request webrequest on a specific hour for detailed user...
[13:36:14] Analytics: Count requests for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2130851 (Sadads) @Nuria do we have a sense of when this will happen? There are a fair number of dependencies on having this data available.
[13:50:53] mobrovac: o/
[13:51:01] do you have a minute for the cassandra cluster thing?
[13:54:13] elukey: hm, i will be intermittently available until the ops session
[13:54:25] elukey: have a simple question we can resolve quickly? :P
[13:54:50] mobrovac: already solved, thanks :) will upload the new patch, not urgent.. we'll review it next week
[13:55:03] that was fast :))))
[13:56:20] mobrovac: that was me being dumb, different :P
[13:57:27] haha
[13:57:31] (CR) Mforns: [C: -1] [WIP] Fix unique devices bugs (4 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[14:26:04] thx very much for the fix, Andrew, unblocked a couple of people that were waiting on me
[14:27:25] (CR) Hashar: "recheck" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/287264 (owner: Bearloga)
[14:31:11] (CR) Hashar: "I was merely validating that CI is all fine" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/287264 (owner: Bearloga)
[14:31:46] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#2292703 (Ottomata) We didn't get a chance to fully restart each broker with `inter.broker.protocol.version=0.9.0.X` this w...
[14:49:39] nuria_: Hi M'dame, you here ?
[15:08:57] joal: yes hello
[15:09:05] heya :)
[15:09:11] Modifying your code currently
[15:09:36] Do you mind if I change your tests into the way we originally did (CSV data file?)
[15:09:56] nuria_: --^
[15:11:05] joal: no, but those are 1 assertion tests, if you have several steps per test they do not work as well
[15:11:32] nuria_: true, you need 1 data row per assertion
[15:11:38] But still doable
[15:12:18] joal: in the case of "this is not a pageview" but now i change "x" and now it is a pageview
[15:12:37] yes: same two rows except for header
[15:12:46] joal: that is hard to see on the csv file as 1 record per assertion
[15:13:15] Yes, I understand, I'm just trying to keep one way of doing things
[15:13:38] Meaning, if we go CSV testing + core testing, at some point it's even less readable
[15:13:52] What is tested where ...
[15:20:16] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#1952692 (Bluerasberry) I wanted to speak up and say that I get the throttling notice. I need to use tools to ma...
[15:20:41] joal: mmm.. i disagree, csv testing is for 1 type of tests, very clear cut
[15:21:04] joal: more "workflow" like tests are not well-suited for that strategy
[15:22:14] Analytics-Kanban: Enable rate limiting on pageview api - https://phabricator.wikimedia.org/T135240#2292830 (Nuria)
[15:23:02] nuria: While I agree, I don't view a huge value to the workflow strategy in the pageview case ... We are testing a yes / no based on a number of values ...
[15:25:07] joal: ok, change it then
[15:28:34] Analytics: Count requests for all wikis/systems behind varnish - https://phabricator.wikimedia.org/T130249#2292857 (Nuria) @Sadads: not for at least 3 months, we are focusing on edit data after having worked on pageview data for a while. As I said before (and I understand this is less convenient) we data...
[15:32:44] milimetric, mforns : i think our problem in dashiki is not that hard to solve and deferred updates will help us, i have fixed 1 bug but there are other things we can do w/o doing a major refactor: http://knockoutjs.com/documentation/deferred-updates.html
[15:33:15] nuria_, aha
[15:48:29] mforns: there are many things but I think basically the issue is that the breakdown component needs to be reset when metric changes and it is not being so, this would be fixed by making the metric and breakdown same thing (what milimetric said yesterday) but i think it'd be better to control our dependency flow and not have 1 major object we are passing
[15:48:29] around. But if we cannot fix it in a seamless fashion i will undo my changes.
[15:48:39] that separated metric and breakdown
[15:54:10] hey a-team, i'd like to go first at standup and do it fast and then go to the ops session that starts in 5 mins
[15:54:16] ottomata: k
[15:54:17] np
[15:54:20] joe is doing one on conftool which sounds cool
[15:54:26] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2292924 (MusikAnimal) >>! In T124314#2292812, @Bluerasberry wrote: > I wanted to speak up and say that I get th...
[15:54:59] ^ maybe I should not link to that phab task in the throttling notice I added to the Pageviews tools?
[16:01:05] joal: standdduppp
[16:01:12] Soooorry
[16:01:13] joining
[16:01:37] MusikAnimal: https://phabricator.wikimedia.org/T135240
[16:05:50] nuria_: I assume this will enforce the throttling I am currently doing on the frontend? and hopefully not be too aggressive :)
[16:10:00] MusikAnimal: since you are in labs you will be throttled with other labs tools that access the api, throttling will be per IP
[16:14:17] nuria: maybe we can throttle per IP+UA ?
[16:14:19] nuria_: so in short, I should remove the notice in the Pageviews tools as the throttling isn't going away :/
[16:14:39] MusikAnimal: throttling is not on yet
[16:14:43] MusikAnimal: it will be
[16:14:50] right, but I have my own throttling right now
[16:14:59] and the notice is telling users that it is temporary
[16:15:03] which I guess is a lie
[16:16:15] right now I'm only making requests every 250ms. Additionally the "massviews" tool, which imports Page Piles, is truncated at 500 individual requests
[16:16:57] The outreach programs really want that limit to be removed, but I bet they don't mind dealing with the throttling
[16:19:36] they just want to see the data on all the requested pages. Some of these "page piles" list thousands of pages! If we can still allow them to query for these, but just slow it down however much we need to, then I think everyone will be happy
[16:34:26] nuria_: sorry to pester, but the one thing I have to let users know about is the "Error in Cassandra table storage backend" error. My understanding was this issue will eventually be resolved? Once you add the throttling and the "better response times on AQS", we should get all the data we need?
[16:35:37] I'm not only throttling requests, but after one collection of requests has finished, I'm making the user wait 90 seconds before submitting for another batch of requests
[16:35:51] yet I still occasionally get the Cassandra error
[16:37:13] MusikAnimal: all within bounds, say we will never be able to return 10 years of data shoudl we have it
[16:37:16] *should
[16:37:31] MusikAnimal: so we will be returning as much data as our new nodes can return in 2secs
[16:37:53] MusikAnimal: so error will be less frequent than it is now, yes.
[16:37:57] MusikAnimal: makes sense?
[16:38:03] ok cool
[16:38:06] yes, I think
[16:39:51] is https://phabricator.wikimedia.org/T124314 the most relevant phab task to be following? that will resolve the Cassandra errors?
[16:40:40] I can just remove the link and explain that it's being worked on
[16:50:37] (PS7) Nuria: Match paths on request only if it is a web pageview [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168)
[16:51:23] MusikAnimal: yes, https://phabricator.wikimedia.org/T124314
[16:51:28] ottomata: o/
[16:51:47] I saw your comments on the aqs code review, all the domains are already in ops/dns
[16:51:57] (and I ran a while ago auth-update)
[16:52:18] nuria_: batcave again for a minute?
[16:52:48] joal: in a minute?
[16:52:53] sure :)
[16:52:59] ping me when ready
[17:04:28] joal, did you want to talk about sth?
[17:04:49] ah k cool elukey!
[17:04:52] hiya!
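The client-side throttling MusikAnimal describes earlier in the log (one request every 250 ms, a batch truncated at 500 requests) could be sketched roughly as below. This is only an illustration under assumptions: `throttled_fetch`, its parameters and the injected `fetch` callable are hypothetical names, not code from the Pageviews tools, and the server-side rate limiting tracked in T135240 is a separate mechanism.

```python
import time


def throttled_fetch(titles, fetch, min_interval=0.25, max_requests=500,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch(title) for each title, spacing calls by min_interval seconds.

    titles must be a list. max_requests truncates the batch (500 mirrors the
    cap mentioned in the log); pass None to process every title, only slower.
    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    results = {}
    last_call = None
    for title in titles[:max_requests]:
        if last_call is not None:
            # Wait only for whatever part of the interval has not elapsed yet.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[title] = fetch(title)
    return results
```

Passing `max_requests=None` corresponds to what the outreach programs ask for in the log: no cap on the number of pages, with the delay between requests as the only brake.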
[17:06:40] o/
[17:07:30] so I am still working with Rob for the boot issue with aqs100[456], apparently we are so lucky that something really weird is happening with the installer (probably not dependent on partman)
[17:11:45] mforns: wanted to discuss again on pageview def :)
[17:11:54] joal, ok
[17:12:02] elukey: Maaaaaan :)
[17:12:22] elukey: You go all the way with partman, then the machine itself dies on you !
[17:14:25] joal: I told you, aqs doesn't like me :P (but I'll win!)
[17:15:03] also joal, https://gerrit.wikimedia.org/r/#/c/288373/ should resolve our issues and spin up a complete aqs cluster with cassandra multi instance
[17:15:21] of course we won't have LVS
[17:18:36] elukey: "Triumph without peril brings no glory" !
[17:19:48] gwicke: Hi, I have a quick question for you, do you have a minute ?
[17:20:11] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2293111 (MusikAnimal) @GWicke Got it! Makes perfect sense, you may disregard my concerns :) An API wrapper is a bit of work, but should be fun to implement. I will note that I've never actu...
[17:24:04] joal: yup, made it to the office
[17:24:34] great gwicke :)
[17:27:23] * gwicke is all ears
[17:27:25] so gwicke about /page/mobile-sections-lead
[17:27:56] I wonder about language-variants
[17:28:20] those aren't supported right now
[17:28:33] gwicke: Ok great
[17:28:35] https://phabricator.wikimedia.org/T122942 discusses options, but it's pending a more general solution
[17:29:10] might be a path prefix (ex: /zh-hans/api/rest_v1/..), or domains
[17:29:14] gwicke: I'm parsing the API url to extract page_title, and was wondering about variant for pageview data extraction
[17:29:23] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2293121 (Nuria) @MusikAnimal : to be clear we are not going to implement these suggestions: Specific day: /20160101/ One month: /201601/ Last 30 days: /-30d/ As they actually do not help w...
[17:29:26] I'll subscribe to the task to be kept in the loop
[17:30:15] kk; one potential gotcha to keep in mind for the title is that it's percent-encoded
[17:30:24] Thanks a lot for the info gwicke (currently implementing in emergency the parsing of pageview info for android new calls to restbase :)
[17:30:39] gwicke: I would have guessed :)
[17:30:46] kk ;)
[17:31:00] Have a good day and a good weekend gwicke, later :)
[17:31:07] it differs from MW article accesses in that slashes are encoded as well
[17:31:32] have a good weekend as well, joal!
[17:31:44] ok, I'm gonna double check on examples gwicke :)
[17:35:54] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2293132 (GWicke) > As they actually do not help with our cache hit ratios problems that much. I'm curious, what is this assertion based on ?
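A minimal sketch of the title extraction joal and gwicke discuss in this exchange. It assumes the request path has the shape /api/rest_v1/page/mobile-sections-lead/{title} and that the title is the first segment after that prefix; `extract_page_title` and the sample URLs are hypothetical and not taken from analytics/refinery/source. Because slashes inside a title are percent-encoded as well, the path is split on raw slashes first and only then decoded:

```python
from urllib.parse import unquote, urlsplit

# Assumed path prefix, based on the endpoint named in the discussion.
PREFIX = "/api/rest_v1/page/mobile-sections-lead/"


def extract_page_title(url):
    """Return the decoded page title from a mobile-sections-lead URL, or None."""
    path = urlsplit(url).path  # urlsplit leaves percent-escapes untouched
    if not path.startswith(PREFIX):
        return None
    # Split before decoding: an encoded slash (%2F) belongs to the title and
    # must not be mistaken for a path separator.
    encoded_title = path[len(PREFIX):].split("/")[0]
    if not encoded_title:
        return None
    return unquote(encoded_title)
```

For example, `extract_page_title("https://en.wikipedia.org/api/rest_v1/page/mobile-sections-lead/AC%2FDC")` returns `"AC/DC"`, whereas decoding the whole path before splitting would cut the title at the slash.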
[17:47:44] nuria_: just realized some weird stuff
[17:47:53] joal: yes
[17:48:09] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2293171 (Nuria) @GWicke Please see @JAllemandou comment above. Misses are due to article variability, some due to time ranges too but article variability seems to be a bigger issue.
[17:48:19] joal: listening
[17:48:44] while getting info for parsing title: it seems there is a bunch of pageview tagged requests that don't make it to pageviews (not only the restbase ones, but also some api ones)
[17:49:12] joal: do not make it to pageview_hourly you mean?
[17:49:21] yes, is_pageview = false
[17:50:00] wait, they have pageview=1 being api.php requests?
[17:50:18] nuria: batcave ?
[17:50:21] joal: sure
[17:52:35] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2293190 (GWicke) @Nuria, we have a lot of end points that all vary on the title, and they have significantly higher hit rates at similar overall request rates. I wouldn't discount the useful...
[17:55:22] Analytics: Investigate requests flagged as pageview in analytics header coming from bots - https://phabricator.wikimedia.org/T135251#2293216 (JAllemandou)
[17:57:52] milimetric: joal, fyi, druid cluster running in labs using deb + puppet
[17:57:56] druid10[123]
[17:58:18] am working on puppet role stuff to auto configure hdfs deep storage now, but it is there for use if you wanna try stuff
[18:25:31] hi all! I need some help fixing my log-in for stat1002. Is anyone available to help a lost soul? :)
[18:25:49] (PS8) Joal: Refactor pageview definition for mobile [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[18:26:05] nuria_: --^ Thanks ! Gone for dinner
[18:26:14] joal: k, CR-ing
[18:37:26] ottomata: awesome. That was fast :)
[18:48:43] (CR) Nuria: Refactor pageview definition for mobile (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[19:08:30] ping on my question above :)
[19:10:05] EGalvez: what's up with your login?
[19:11:06] it's not letting me in - madhu says it's related to some changes to Bast1001
[19:11:26] I can email you the error?
[19:11:37] (CR) Joal: "Thanks for quick review nuria, see my comments in reply." (3 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[19:12:41] (PS9) Joal: Refactor pageview definition for mobile apps [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[19:12:56] ottomata: Thanks man, that was fast indeed !P
[19:17:16] a-team, gone for real !
[19:17:25] milimetric: speaking about logins, I have issues as well
[19:17:37] although I suspect it's a misconfiguration in my case
[19:19:21] EGalvez: oh maybe you haven't logged in since the last change to bastion, in which case you'll need to remove your known_hosts file. For me, this is in /home/dan/.ssh/known_hosts
[19:19:31] strainu: what problems are you hitting?
[19:20:31] Connection closed by 208.80.155.129 ssh_exchange_identification: Connection closed by remote host
[19:20:41] madhu suggested I remove "Host bast1001.wikimedia.org" from my config file, but that did not seem to solve the problem
[19:20:47] I'm trying to follow https://wikitech.wikimedia.org/wiki/SSH_access
[19:21:14] Tried that, but got the same error "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED"
[19:21:21] but with bastion.wmflabs.org as proxy
[19:21:42] EGalvez: have you tried removing it from known_hosts?
[19:22:52] ohh I see
[19:23:23] So - I remove the line that starts with "bast1001.wikimedia.org" ?
[19:23:44] yes
[19:23:57] milimetric: using bast3001.wikimedia.org I get to password login
[19:24:08] so presumably the ssh key is not available on that server
[19:24:19] EGalvez: you can safely delete all of known_hosts, it'll just ask you to confirm the fingerprint of all the places you login, but there's nothing critical in that file
[19:24:34] but if you find the line for bastion you can delete that
[19:25:07] mforns_gym: i *think* I found the way to the small low-impact fix, lemme super test
[19:25:10] strainu: hm... bast3001... hang on, I'll paste my config up
[19:26:07] https://www.irccloud.com/pastebin/6LBanNMq/
[19:26:17] (CR) Nuria: "Looks good, leaving merge up to you." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/288458 (https://phabricator.wikimedia.org/T135168) (owner: Nuria)
[19:26:37] sweeet - I am in - thank you! milimetric and strainu
[19:26:39] strainu: that config ^ is pretty bullet-proof, and it's using bastion-eqiad.wmflabs.org
[19:26:57] of course you'll want to change the User and identity file path
[19:27:08] sure, let me try that
[19:28:32] nope, still connection closed
[19:29:01] I added my ssh key at https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack
[19:31:41] oh strainu this is the first time you're connecting to a labs machine?
[19:32:12] in that case, head over to #wikimedia-labs and ask them to give you shell access. I think that's all you should need
[19:32:12] from this computer yes
[19:32:19] (PS8) Nuria: [WIP] Fix unique devices bugs [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[19:32:25] oh - so your account has hit labs before
[19:32:29] I did connect to toollabs before
[19:32:34] and I do have shell rights
[19:32:42] at least that's what the wiki says
[19:32:51] hm, not sure then, you should ask them, something seems broken
[19:33:02] ok, thanks
[19:33:21] (PS9) Nuria: [WIP] Fix unique devices bugs [analytics/dashiki] - https://gerrit.wikimedia.org/r/288104 (https://phabricator.wikimedia.org/T122533) (owner: Mforns)
[19:33:25] sorry I can't help you much more if it's not the obvious stuff, someone probably needs to monitor the connection and I don't have rights for that
[19:46:24] Analytics: Pageview API: Limit (and document) size of data you can request - https://phabricator.wikimedia.org/T134524#2293567 (Nuria) @GWicke: I am pointing out that we are not doing these changes in the near future given that we have quite a lot of work to do to make sure the storage layer actually works...
[19:57:28] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2293607 (RobH) So I disabled the second controller port and it boots into the OS. It seems the OS installs onto one of the ports, but the other port is conflicting...
[20:53:21] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2293964 (RobH) Ok, So the boot from C is due to Jessie/Trusty detecting the second controller/port over the primary controller/port. The boot order has to be chang...
[20:58:35] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2293976 (RobH) I fixed the bios boot order on aqs100[456], setting port #2 to primary allows the bios to boot in the order that the jessie/trusty installer detects...
[21:00:55] nuria_, if we add .extend({throttle: 1}) to the end of the computed that updates mergedData, the problem disappears
[21:03:29] throttle:1 makes the mentioned computed execute after all other synchronous computeds (including breakdown-toggle)
[21:06:39] HMMMM i need a puppet druid hadoop brain bounce
[21:07:40] nuria_, I will push it for review, see what you think.
[21:40:14] bye a-team!
[21:40:19] nice weekend!