Wikipedia:Reference desk/Archives/Science/2012 August 1

Science desk
Welcome to the Wikipedia Science Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 1

Blood relations

I am married to my mother's elder brother's daughter <request for medical advice removed> are we first cousins or second cousins? -- Preceding unsigned comment added by Nalaka88 (talk | contribs) 06:40, 1 August 2012 (UTC)

We cannot give medical advice, so it would be best for you to seek guidance from a medical professional. Evanh2008 (talk|contribs) 06:43, 1 August 2012 (UTC)

Please see our medical disclaimer. BigNate37(T) 06:48, 1 August 2012 (UTC)
We can answer the non-medical question. Your father in law also being your uncle makes you and your wife first cousins. See Cousin#Basic_definitions. 203.27.72.5 (talk) 08:31, 1 August 2012 (UTC)
While we can't give medical advice, I can give you a link to Cousin marriage#Biological aspects. If you wish to know how that information relates to you, then you'll need to speak to a doctor or genetic counsellor. --Tango (talk) 11:47, 1 August 2012 (UTC)
This is far more a political or cultural issue than a medical one. A small detail not mentioned in the cousin marriage article is that in the traditionally more widely favored cross cousin arrangement, which this exemplifies, one greatgrandfather contributes his X to a daughter and his Y to a son, the third generation (opposite sexes of either variety) then marry; in this case no part of the greatgrandfather's X can meet up with itself in the fourth generation. (Of course, recessive sex-linked traits only matter to women anyway) Wnt (talk) 14:23, 1 August 2012 (UTC)
Chromosomes don't stay together. See chromosomal crossover. Also, you are ignoring all the non-sex chromosomes. --Tango (talk) 23:01, 1 August 2012 (UTC)

Vitamin C in raw meat

Where does the vitamin C in raw meat come from? Madifrop (talk) 15:43, 1 August 2012 (UTC)

From the animal's food of course. Roger (talk) 15:48, 1 August 2012 (UTC)
(ec) Most animals can produce vitamin C in their bodies (see Vitamin C), and of course animals can also gain vitamin C from their food, as we humans do. - Lindert (talk) 15:49, 1 August 2012 (UTC)
The ability to synthesize Vitamin C, a rather simple compound with a similar chemical structure to glucose, was lost in the ancestry of the haplorrhines, the primate group including the monkeys, apes, and tarsiers. This was possible because these animals received enough ascorbic acid in their diets (from fruits like figs), that they did not need to synthesize it, so losing the gene for the enzyme that catalysed its synthesis was no handicap. μηδείς (talk) 03:03, 2 August 2012 (UTC)

Shovel-shaped incisors

Why were Shovel-shaped incisors developed in humans and why are they prevalent in the Mongoloid races? What do people with non Shovel-shaped incisors have instead? Reticuli88 (talk) 15:57, 1 August 2012 (UTC)

According to [1], the crown shapes of maxillary central incisors in Caucasians are ovoid, square, or triangular. StuRat (talk) 21:58, 1 August 2012 (UTC)
See Sinodonty and Sundadonty. Dental patterns in humans go from generalized to specialized. "Proto-Sundadonty" is believed to be the ancestral dental pattern of all late Pleistocene human populations. Isolated populations that underwent the least genetic drift, like the Ainu, the prehistoric Jomon, modern Papua New Guineans, and Australian Aborigines, still resemble Proto-Sundadonts closely. Dental patterns of South Asians, Indochinese, Austronesians, southern Chinese, Arctic Native Americans (Aleut, Inuit, etc.), and some other Native Americans are predominantly the less generalized Sundadont dental pattern (which does not exhibit shoveled incisors). Not sure about modern European, western and central Asian, and African dental patterns, but I think they are also generalized and resemble Sundadonty. On the other hand, East Asians, Northeast Siberians, and the rest of the Native Americans, who underwent the most genetic drift from the original human migrants out of Africa, are predominantly the more derived Sinodonts. There are no whys about it, though. -- OBSIDIANSOUL 23:47, 1 August 2012 (UTC)

Extracting multiple regression lines from a single data set

Return maps of velocity and acceleration respectively, of Drosophila wandering in an arena.

I was inspired by the discussion here on how to extract nonlinear order from chaos in seemingly random time series such as the logistic map by using something called a "(Poincare) return map" (a little different from the Poincare map described here on Wikipedia). I used the same idea to look at the time series variables of my Drosophila, where I had tracked things like velocity, acceleration, angular velocity, etc.

I've shown two examples of the return maps I have gotten for all my Drosophila populations (more or less in the same form) to the right, though the acceleration plots and the velocity plots are from different populations. These are just examples of the plots I'm getting-- all the velocity and acceleration plots have the same basic form as these plots, although slopes and spread/distribution might differ.

I created the return maps by plotting f(t+1) on the y-axis versus f(t) on the x-axis, where f is either velocity or acceleration as appropriate. (The time difference between time t and t+1 is 1/15 of a second, i.e. time-series measurements are sampled at 15 Hz.) It reveals intriguing semidiscrete stochastic decision-making on the part of Drosophila, but how do I even statistically analyse the multiple trendlines, especially so I can detect differences between genetically-different populations? My normal approach (line/curve fitting) is useless here. I am thinking of extracting things like slopes, angle differences (maybe as a function of probability), and spread from the trendlines, but I wouldn't know how to begin. Can someone introduce me to "multiple trendline" analysis methods? (How would you pick out the main sequence from a Hertzsprung-Russell diagram for example?) Nothing gold can stay (talk) 21:28, 1 August 2012 (UTC)
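As a rough illustration of the construction described above, the following sketch builds a lag-1 return map for the logistic map mentioned in the question. The function names, the parameter r = 3.9 and the starting value 0.2 are illustrative assumptions and are not taken from the original analysis.

```python
def logistic_series(r, x0, n):
    """Iterate x -> r*x*(1-x) n times and return all n+1 values."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def return_map(series, lag=1):
    """Pair each sample f(t) with f(t+lag): the points of a return map."""
    return list(zip(series[:-lag], series[lag:]))

xs = logistic_series(r=3.9, x0=0.2, n=500)
points = return_map(xs)

# Every point of the lag-1 map lies on the parabola y = r*x*(1-x),
# even though the series itself looks erratic when plotted against time.
max_dev = max(abs(y - 3.9 * x * (1.0 - x)) for x, y in points)
print(len(points), max_dev)
```

For the fly data, the same `return_map` call would be applied to the velocity or acceleration series instead of `xs`.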

"It reveals intriguing semidiscrete stochastic decision-making on the part of Drosophila..." ... or, it reveals aliasing due to quantization noise. You're sampling a continuous trajectory; your resolution in time is band-limited by your video frame-rate; and your samples in space are band-limited by your camera's optical/digital resolution (number of millimeters per pixel). You're aliasing like crazy in three dimensions! (x,y, and t). All data derived from the measurements should be carefully processed to avoid creating spurious signals. But, if you're certain that you believe these charts, and simply want to fit equations to the scatter-plots, you should start by reading Statistical classification. There are many different types of classifier algorithms; these types of mathematical algorithms can be used to distinguish one "blob" from another "blob" in a scatter-plot or a point-cloud. Once you separate the data into separate "blobs," (rather, "once you classify the data into disjoint sets"), you can fit a linear equation, or any other sort of statistical parameterization, for each set. Nimur (talk) 21:40, 1 August 2012 (UTC)
However I observe the discretization only happens at fast speeds or fast accelerations, where the fly moves or accelerates a lot between frames. If it was due to quantization noise, you would expect the quantization to increase as speed or acceleration decreased. If the fly is moving or accelerating quickly, there is a greater spectrum of values that it could "land" on. My spatial resolution is roughly ~6.3 pixels per mm and temporal resolution at 15 Hz, thus my velocity resolution is at least 2.4 mm/s and acceleration resolution at least 36 mm/s^2. Nothing gold can stay (talk) 22:19, 1 August 2012 (UTC)
...You're not making a very convincing case. You've computed the derivative numerically, correct? Are you familiar with the noise problem introduced by applying the finite difference operator to approximate the derivative? In other words, solving for velocity v(t) by computing the difference in position x(t+1) - x(t) is a very great way to amplify noise and amplify the error in your measurement. You don't think it's a strange coincidence that the data seems clustered around a line with slope of 1/2, 1, and 2? What you're proposing is that statistically, a fly who is buzzing around at 42.5mm/s for 1/15 of a second is very likely to decide that during the next 1/15 second, it should fly at 21.25 mm/s. This doesn't strike you as a procedural error? Nimur (talk) 23:52, 1 August 2012 (UTC)
I have a good reason to suspect my tracking program superlocalises, so the resolution is in fact better than 2.4 mm/s. I don't think it's that likely to jump from 42.5 to 21.25 mm/s because the biggest steps would be in 2.4 mm/s, if not smaller. I do note that separation increases as a function of speed/acceleration. Also, I would think error in measurement in one frame is unlikely to propagate to another because the video samples at every frame and doesn't do any sort of dead reckoning. Correct me if I'm wrong? Nothing gold can stay (talk) 00:08, 2 August 2012 (UTC)
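The resolution figures quoted in this exchange can be reproduced with a short calculation. This is only a sketch: it assumes positions are localised to whole pixels, so any sub-pixel localisation by the tracker would make the real steps smaller. The constant names are illustrative.

```python
PIXELS_PER_MM = 6.3   # spatial resolution quoted in the thread
FRAME_RATE_HZ = 15.0  # sampling rate quoted in the thread

# Smallest position change that whole-pixel tracking can register.
position_step_mm = 1.0 / PIXELS_PER_MM

# One finite difference per derivative, each multiplying by the frame rate.
velocity_step = position_step_mm * FRAME_RATE_HZ    # mm/s
acceleration_step = velocity_step * FRAME_RATE_HZ   # mm/s^2

print(round(velocity_step, 1), round(acceleration_step, 1))
```

This gives roughly 2.4 mm/s and 36 mm/s^2, matching the values stated above.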

(Also, for both aesthetic and scientific reasons, may I suggest that you use squared axes when plotting velocity-vs-velocity, or any other case when you create a scatter-plot of dimensionally-identical quantities? It helps make the interpretation easier. If you used MATLAB to create the plots, use axis equal.) Nimur (talk) 21:51, 1 August 2012 (UTC)

As far as writing a program to handle the top graph, it should only compute distance from each line to those points closer to that line than the other two. You would have to specify that it needs to look for 3 lines, though. You could expect such a program to take far longer to run than if you manually separated the data into 3 blobs. StuRat (talk) 22:05, 1 August 2012 (UTC)
Okay, but how would it find the lines once I've specified the number of lines? Nothing gold can stay (talk) 22:32, 1 August 2012 (UTC)
It would initially place all the lines, say, along the X axis, then use a hill-climbing method to change the angle and elevation of each line, until it found the best location for each. If you could specify an initial guess for the placement of each line, that would certainly speed things up. If you can provide a CSV file with the data in that top graph, I could take a shot at writing a program for it. StuRat (talk) 22:49, 1 August 2012 (UTC)
Thanks a lot for your offer! :) I'll see if I can package it online. Maybe I'll upload it to my university web page. Nothing gold can stay (talk) 00:08, 2 August 2012 (UTC)
(update) Okay, I have my csv files at http://people.virginia.edu/~jrs5fg/returnvelo.csv and http://people.virginia.edu/~jrs5fg/returnaccel.csv; they both correspond to the same raw experimental data set. Basically, there are two columns in each file, the first being f(t) and the second being f(t+1). Nothing gold can stay (talk) 03:58, 2 August 2012 (UTC)
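One possible way to find several lines once their number is fixed is sketched below. It is not the hill-climbing program offered above: it swaps in a simpler alternating scheme (assign each point to its nearest line, refit each line by least squares, repeat), similar in spirit to k-means. The function names, the initial guesses and the synthetic data are all assumptions for illustration; the real two-column CSV files are not read here.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = m*x + c to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_line(point, lines):
    """Index of the line with the smallest perpendicular distance."""
    x, y = point
    dists = [abs(m * x - y + c) / (m * m + 1.0) ** 0.5 for m, c in lines]
    return dists.index(min(dists))

def fit_k_lines(points, initial_lines, iterations=20):
    """Alternate between assigning points to lines and refitting each line."""
    lines = list(initial_lines)
    groups = [[] for _ in lines]
    for _ in range(iterations):
        groups = [[] for _ in lines]
        for p in points:
            groups[nearest_line(p, lines)].append(p)
        # Refit only groups that span at least two distinct x values.
        lines = [fit_line(g) if len({x for x, _ in g}) >= 2 else old
                 for g, old in zip(groups, lines)]
    return lines, groups

# Synthetic stand-in for the velocity return map (NOT the real data).
random.seed(0)
true_slopes = [0.4, 1.0, 2.5]
points = [(x, m * x + random.gauss(0.0, 0.5))
          for m in true_slopes for x in range(20, 61)]
initial_guess = [(0.5, 0.0), (1.2, 0.0), (2.0, 0.0)]
lines, groups = fit_k_lines(points, initial_guess)
print([round(m, 2) for m, _ in lines])
```

Like any scheme of this kind it depends on the initial guesses and can settle on a poor local optimum. The synthetic points deliberately start at x = 20 so that the three arms do not overlap near the origin.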
StuRat, that will only work if you're planning to fit the data to a strictly convex model. "Hill climbing" algorithms also have a nasty tendency of finding numerical instabilities and local minima. Do you know the characteristics of this data-set inside and out? Are you an expert in the field of trajectory modeling in drosophila? If so, why don't you offer up some reliable sources for best practices and techniques? And if you aren't... that's why we don't offer to conduct research. I think our O.P. ought to get in touch with their P.I. for a little more guidance and direction. Nimur (talk) 23:43, 1 August 2012 (UTC)
Geez, what's with the vitriol ? I just offered to curve fit a graph, and it shouldn't matter what the data is. It's up to the researcher to decide what it means, not the curve fitting program, or programmer. Hill-climbing doesn't always work, but, looking at the top graph, hill climbing should work fairly well here, and is likely the only option that could fit this amount of data with three lines in a reasonable time frame. The only problem I see is where the data overlaps, but I don't see any solution for that problem. I'd restrict the curve fitting to the point where the data diverges. StuRat (talk) 23:56, 1 August 2012 (UTC)
The No Original Research policy applies to Wikipedia articles. This is not an article, it's a Ref Desk question. What on earth is wrong with an answer that contains original research, significant research even, as well as just a simple calculation or graph, if the provider is happy to provide it? Doing so may mean that the poster gets no official credit, but that's his problem. The NOR policy is appropriate to WP articles, as they comprise an encyclopedia (=collection of known facts), but here the sole criterion should be: does it help the OP? As with any sort of answer on Ref Desk, it is up to the OP to assess the merits of any OR provided. Wickwack 120.145.176.119 (talk) 00:52, 2 August 2012 (UTC)
I'm exhausting all available options here-- I've contacted other people in my department but they are away teaching at a mountain lake this summer, so I'm trying other options. Also my PI says I have to seek outside help to solve my problem, because he doesn't know the techniques to solve this problem himself. Nothing gold can stay (talk) 00:08, 2 August 2012 (UTC)
Just curious, do drosophila have three different modes of locomotion, like a horse has a trot, gallop, etc. ? StuRat (talk) 00:14, 2 August 2012 (UTC)
I don't think so...? (I haven't heard of it, at least...) From other papers I've seen where they looked at behavior through return maps, discrete locomotion behaviors (trot, gallop, etc.) would correspond to spherical'ish clusters on a return map, not lines. i.e. at f(t) there would be discrete clusters a, b, c ... (corresponding to the different velocity modes), and at f(t+1) a, b, c, and the return map would simply imply switching between discrete modes. This is opposed to the lines bx +c... also if measured velocity was in fact discretised, wouldn't the return map break up into such clusters? Instead, the lines are fairly continuous. Nothing gold can stay (talk) 00:30, 2 August 2012 (UTC)
Did you choose 1/15th of a second because that's the time frame that makes the lines have slopes of .5, 1 and 2, or did it just work out that way the first time you plugged in the data? It just seems incredibly suspicious to me. I showed your graph to my own lab's tracking expert, and he simply refused to believe the lines are real. Someguy1221 (talk) 00:36, 2 August 2012 (UTC)
It's 1/15 second because it seems to be the maximum sampling rate from VirtualDub (I can't seem to change it, I think my camera is capable of 1/29.97). Also I believe the slopes are closer to 0.4, 1 and 2.5....what's so suspicious about the lines? Again the structure of the return map suggests possible velocity is continuous, not discrete, just that change in velocity can follow three modes, and once the fly has "chosen" a mode, there is wide continuum within that mode in which to choose. Also, why is everyone ignoring the acceleration graph? Acceleration ranges from -500 to 500 mm/s^2 (and the resolution would be at least 36 mm/s^2). When a fly is at 0 mm/s, its pdf for the next frame ranges from 0 to 500 mm/s^2-- once such a positive acceleration is chosen in that frame, a negative acceleration is likely in the frame after that, proportional to the positive acceleration "chosen". And once a negative acceleration is chosen, there is a tendency to return to zero acceleration in the next next frame. Thus, a fly's acceleration plot wanders all over from -500 to 500 mm/s^2 over time, in a seemingly random fashion. If the plots suffered from quantization error I would expect the acceleration return map (as well as the velocity return map) to look a lot different, breaking up into clusters rather than lines. Can someone check my logic? Nothing gold can stay (talk) 00:45, 2 August 2012 (UTC)
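The concern raised earlier in this thread, that finite differences amplify measurement noise, can be checked on a toy model. The sketch below assumes a target moving at constant speed with independent Gaussian position noise in every frame. The noise level of 0.05 mm is an invented value, not a property of the actual tracker.

```python
import random
import statistics

random.seed(1)
RATE_HZ = 15.0      # frame rate from the thread
SIGMA_MM = 0.05     # assumed position noise per frame (illustrative only)
TRUE_SPEED = 30.0   # mm/s, held constant, so the true acceleration is zero

n = 20000
pos = [TRUE_SPEED * i / RATE_HZ + random.gauss(0.0, SIGMA_MM) for i in range(n)]
vel = [(q - p) * RATE_HZ for p, q in zip(pos, pos[1:])]   # first difference
acc = [(q - p) * RATE_HZ for p, q in zip(vel, vel[1:])]   # second difference

vel_noise = statistics.pstdev(vel)
acc_noise = statistics.pstdev(acc)

# For independent noise the differences have variances 2*sigma^2 and 6*sigma^2.
vel_expected = 2 ** 0.5 * SIGMA_MM * RATE_HZ
acc_expected = 6 ** 0.5 * SIGMA_MM * RATE_HZ ** 2
print(vel_noise, vel_expected)
print(acc_noise, acc_expected)
```

Under these assumptions a position error of 0.05 mm becomes a scatter of about 1 mm/s in velocity and about 28 mm/s^2 in acceleration, even though the true motion is perfectly steady.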

A note on using both return maps together

Return maps again, just so people don't have to scroll-up to follow this section.

Suppose a fly is at 30 mm/s at t=1. It has a good chance of decreasing around 10 mm/s in the next (1/15) second. Then its acceleration will be (v2 - v1) * 15 fps = -300 mm/s^2 at t=2. But according to the acceleration return map it's likely that acceleration will now go to zero in the next frame +/- 50 mm/s^2 (velocity would increase or decrease 3 mm/s max) at t=3, so we can deduce that velocity will most likely remain around 10 (+/- 3) mm/s at t=3.

But now that acceleration is close to zero, it will most likely choose acceleration values anywhere from 0 to 500 mm/s^2, so it would prefer one of the two modes (increasing or maintaining velocity) but not the decreasing velocity mode, such that velocity will be anywhere from 7 to 43 mm/s at t=4, but not below that, even though the velocity return map would allow for values between 0 to 7 mm/s otherwise. At t=5, it is likely the fly will decelerate (according to the acceleration return map, because it just accelerated) proportional to the acceleration chosen-- this locks out the increasing-velocity mode -- but because v can be anywhere from 7 to 43 mm/s (roughly) and acceleration anywhere from 0 to 500 mm/s^2 at t=5, we can't deduce much about t=6. (The acceleration return map cuts off because of how I cropped it but the lines reach +/- 500 mm/s^2.)

Of course, if v=30 mm/s at t=1, its v could also be 30 or 60 mm/s at t=2 (with less probability), going into loops that will be broken by drift. Walkthrough: If velocity stays the same (at 30 mm/s) at t=2, acceleration is zero at t=2, so it will likely accelerate at t=3, in which case the velocity return map indicates ~60 mm/s, in which case acceleration will be roughly +450 mm/s^2 at t=3. But then acceleration will likely be -450 mm/s^2 at t=4 (note it does not have a good chance of accelerating or staying at the same velocity, which you would expect if the discretization was due to quantization error) so it will go back to 30 mm/s at t=4. But this implies (according to the acceleration return map) that acceleration is zero at t=5, i.e. at t=5 v is 30 mm/s with acceleration at t=6 being anywhere from 0 to 500 mm/s^2, but with a bias to accelerate it at +450 mm/s^2 towards 60 mm/s at t=6 for the cycle to begin again. (Stochastic drift, or chance of being an outlier, allows the fly to escape this cycle.) But, if v = 60 mm/s at t=2, then a = ~+450 mm/s^2 at t=2, which means a ~= -450 mm/s^2 and v ~= 30 mm/s at t=3, a ~= 0 mm/s^2 and v ~= 30 mm/s at t=4 and v = 60 mm/s and a = 450 mm/s^2 at t=5 -- another short-term loop which will be broken by drift.

This is just for v = 30 mm/s at t=1 -- I haven't even gone through the deduction process for other values yet. If the discrete divergence of modes were simply due to quantization, these implied cycles wouldn't exist and you would see more than one preferred "next acceleration" for a given current acceleration at time t. (t+1 acceleration varies the most when acceleration is zero at time t.) Using the info from both plots, we can see sometimes one can "eliminate" one or two other modes from consideration in deciding behavior at the next step. I don't think I would be able to deduce this much information if the discrete modes were due simply to quantization error. Again, I think I would get clusters rather than continuous lines. Nothing gold can stay (talk) 01:48, 2 August 2012 (UTC)
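The arithmetic in the walkthrough above can be checked with a few lines. This sketch only converts the velocity sequences described in the text into frame-to-frame accelerations at 15 frames per second. The function name is illustrative.

```python
FRAME_RATE_HZ = 15.0

def accelerations(velocities, rate_hz=FRAME_RATE_HZ):
    """Frame-to-frame acceleration (mm/s^2) from velocities in mm/s."""
    return [(v_next - v_now) * rate_hz
            for v_now, v_next in zip(velocities, velocities[1:])]

# First scenario: 30 mm/s at t=1 dropping to about 10 mm/s at t=2.
drop = accelerations([30.0, 10.0])

# Loop scenario: v stays at 30, jumps to 60, falls back to 30, and repeats.
loop = accelerations([30.0, 30.0, 60.0, 30.0, 30.0, 60.0])   # t = 1 .. 6

# A 50 mm/s^2 acceleration changes velocity by about 3.3 mm/s in one frame.
velocity_change = 50.0 / FRAME_RATE_HZ

print(drop, loop, round(velocity_change, 1))
```

The loop sequence gives accelerations of 0, +450, -450, 0 and +450 mm/s^2 at t = 2 to 6, in line with the values quoted in the walkthrough.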

I find it intriguing that no other contributor (at the science desk!) except Nimur points out that this data doesn't make sense (biologically) and looks like a clear example of measurement and/or processing artifacts. What do you suggest as interpretation of your data? Drosophila has somehow only three modes of acceleration? And nobody noticed before? Do you have some kind of control experiment? That tests that your equipment and your analysis pipeline are really able to measure what you want? I would bet that a control experiment with some light point in random motion (or some other insect/small animal, which could be easier to arrange) captured the same way would give the exact same "split" of data points. And I think this is the way to proceed, if you ever want to publish this data in some form. Is there something comparable already published in your field? Or did you set up the method yourself? It would perhaps be helpful to look at how others have set up their measurement and analysis. Sorry for my harsh tone, but I really think the RefDesk Science could do better in such a central question as measuring artifacts, so I find it important to bring my point across. --TheMaster17 (talk) 09:11, 2 August 2012 (UTC)
Another important point concerning your expectation of cycles because of quantization is that you are dealing here with quantization in every single measured variable (x,y and t). It is not trivial to extrapolate what kind of artifacts you get when quantization errors have an effect on each other and on your derived values, such as v and a here. Circles are definitely not the shapes I would expect. --TheMaster17 (talk) 09:16, 2 August 2012 (UTC)
Do you know how return maps work? It is not that hard to conjecture what happens because the effect of noise and discretization on return maps has been studied. Please note that velocity or acceleration is not confined to discrete points. You seem to make this mistake a lot. At each point t, there seem to be 3 choices (sometimes fewer, possibly two or one, due to the patterns implied, which likely wouldn't happen with discretization), but each choice has a certain element of drift and randomness in it, and the output velocity or acceleration is dependent on this choice and the previous velocity or acceleration. Even if there were no stochastic noise at all, very chaotic behavior can arise from very determined behavior -- see the logistic map, in which determined equations make a function wander all over the place. People also haven't answered my question as to why discretization would separate modes most at fast accelerations and speeds, rather than slow ones. Nothing gold can stay (talk) 11:30, 2 August 2012 (UTC)
But do let me conduct the control experiment. But my goal is not to study this sort of thing, it's simply to find an assay that will differentiate genotypes or drugs. But few people seem to have answered my statistical questions.... Nothing gold can stay (talk) 11:25, 2 August 2012 (UTC)
It is surprising, but we shouldn't reject data out-of hand just because it's not what we expected. For example, the observation that the expansion of the universe is accelerating seemed very odd, at the time, but has since been verified by many others. Perhaps, using a car analogy, they tend to "floor it", "maintain cruising speed", or "slam on the brakes", rarely picking any acceleration in between. While I wouldn't have expected this, it's not exactly shocking. And, if it can be verified by others, this is an interesting observation worth publishing. StuRat (talk) 09:18, 2 August 2012 (UTC)


You are totally right, and I never proposed to just discard the data. But the OP really needs to address some questions if he wants to get a firm interpretation. And coming to your example, exactly for the same reasons the expansion of the universe was only accepted as fact after it was confirmed around the world by different measurements in different groups: to quote Laplace, "The weight of evidence for an extraordinary claim must be proportioned to its strangeness." And I really cannot imagine that this simple setup with camera was never tried by others. I remember, for example, a study about movements of some social insects (ants, bees or termites probably) which plotted routes of them inside the nest, which would probably use a similar setup. Concerning your "it's not exactly shocking" statement, I must admit that I would find it rather shocking when a complex animal like drosophila would only ever have three kinds of acceleration. Even plots of bacteria and other single celled organisms moving through or over medium show a diverse range of speeds and acceleration depending on a broad range of factors, and not just three discrete modes. This is why I wouldn't believe the data just as they are now. With appropriate controls this could change, of course. --TheMaster17 (talk) 09:55, 2 August 2012 (UTC)
Maybe this behavior is limited to drosophila. If so, it would be interesting to isolate the genes which cause it. StuRat (talk) 10:22, 2 August 2012 (UTC)
Dude, possible accelerations are not discrete, they vary continuously. However style of acceleration seems to be discretely dependent on t and the previous acceleration, but the freedom in which to vary is large (for example when previous acceleration (t=1) is zero, acceleration chosen is likely to be anything positive thus ensuring that when t=3, acceleration is likely to be anything negative), plus there is the stochastic randomness in which a fly chooses to "break" the trend or slowly drift away from it. One test to see if this is an artifact of data processing is to freely vary the time lag interval (i.e. t and t+2, etc.) I bet you that at t+5 the correlation would probably disappear. (Let me check this.) From what I see, groups have applied return maps to animal behavior, but generally things like "interval time between rats pressing levers" etc rather than motion. I don't think this is an extraordinary finding. Of course, my PI doesn't think this a thing worth pursuing until it can separate the effects of different drugs or genotypes.
Also, there are very few papers on Drosophila walking-tracking (there are a lot on a fly with clipped wings being held by a force gauge to measure pseudoflight) -- I have consulted other papers doing tracking studies, but they never looked at return maps. Nothing gold can stay (talk) 11:06, 2 August 2012 (UTC)
I can read the plots, that is why I said "discrete modes", not "discrete accelerations". But what you are proposing in your interpretation is that there are only three classes of points in your a and v plots for t and t+1. Why should this be real? What could be the biological cause of this? And I cannot really follow your narrative in your previous explanation. Why should a(t=3) be negative? If the fly is continuously moving (because it accelerated between t=1 and t=3), I would expect it to be around 0 sometime after this, after the animal has picked up velocity and is moving at its preferred speed (depending on the context). What you are proposing is really that the animals are jumping, in steps of 1/15 of a second, between accelerating and decelerating (sometimes no acceleration at all) continuously? This would be really wasteful. Another thing that comes to mind: Did you check that the alignment of your camera pixels does not influence the quantization of the position? You could, for example, get systematically different results for v (and therefore a) if the animal is running along a neat line of pixels, or in diagonal, because the quantization error differs systematically between the two cases. --TheMaster17 (talk) 12:03, 2 August 2012 (UTC)
But there are surely papers of walk-tracking for other insects. Did they ever describe something similar? I would see no reason why drosophila should be different in walking pattern than say, for example, an ant or a bug. And if it is really different, you would still need extraordinary proof for this extraordinary claim (and probably have a hypothesis why it is special). Before you really dive into this, I think you have to check your data and setup. Nothing feels as bad, in my opinion, as to realize after half a year that you hunted an artifact the whole time. Happened to me, too. --TheMaster17 (talk) 12:10, 2 August 2012 (UTC)
And let's not forget the important point that Nimur already mentioned: Using your (already quantized and therefore having a measurement error) x,y differences to compute numerically the first derivative v and then the second derivative a is not the way to go, because you will invariably inflate errors. There really is no reason why the fly should only be able to choose certain classes of velocities and accelerations at t+1 depending on its speed and acceleration at t. A fly has a longer memory than 1/15th of a second, and both speed and acceleration should form a continuum and not split into three classes at high values. --TheMaster17 (talk) 13:02, 2 August 2012 (UTC)
"There really is no reason why the fly should only be able to choose certain classes of velocities and accelerations at t+1 depending on its speed and acceleration at t." I can easily imagine a cause -- something like a central pattern generator -- different neurons are taking over movement at different points in time. I am going to sample at different points to see how the patterns change.
"This would be really wasteful." Sustained movement is often a cycle. Have you thought about how you run? You accelerate sharply, cruise for a bit, then decelerate. Is that wasteful? The same thing goes with wing flapping.
"But there are surely papers of walk-tracking for other insects." There are not very many using return maps. Drosophila is an important model organism.
I do not expect deviation from other insects. I have used my data to investigate other things like velocity-curvature relationship, and Drosophila appears to be close to the 2/3 power law described for velocity-(radius of curvature) throughout nature, from eye saccades to hand movement. Nothing gold can stay (talk) 14:37, 2 August 2012 (UTC)
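The pixel-alignment question raised in this exchange can be explored with a toy simulation. The sketch assumes a point moving at a constant 2.5 pixels per frame whose position is rounded to whole pixels, which is a deliberately crude model: a tracker that localises to sub-pixel precision would behave differently. The function name and the chosen speed are illustrative.

```python
import math

def measured_speeds(speed_px, heading_deg, frames=200):
    """Distinct per-frame speeds recovered from whole-pixel positions."""
    theta = math.radians(heading_deg)
    pixels = []
    for i in range(frames):
        x = speed_px * i * math.cos(theta)
        y = speed_px * i * math.sin(theta)
        # Round half up to the nearest pixel centre.
        pixels.append((math.floor(x + 0.5), math.floor(y + 0.5)))
    steps = [math.hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(pixels, pixels[1:])]
    return sorted({round(s, 2) for s in steps})

along_row = measured_speeds(2.5, 0.0)    # moving along a row of pixels
diagonal = measured_speeds(2.5, 45.0)    # moving diagonally across the grid
print(along_row)
print(diagonal)
```

In this model the true speed of 2.5 pixels per frame is never measured. Along a pixel row the measured speed alternates between 2 and 3 pixels per frame, and the diagonal heading yields a different set of values, so the recovered speeds depend on direction.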

Processing data

Velocity graph with circles every 10 units from origin, and data points divided into 3 colored "blobs", with the lines shown which were used to determine the "blobs".

I took the velocity data you supplied and made my first attempt at dividing it into 3 "blobs". However, as I noted before, we have a problem where the three blobs overlap. It's not a severe problem for the central blob, since presumably an approximately equal number of data points are lost (or gained) at both edges. However, a clear skewing effect is seen at the lower and upper blobs. So, we should drop the data in the region where the overlap occurs. I'll leave it to you to decide how much to drop, but it looks to me like data in the 40 unit range from the origin has this problem. Once you let me know how much data you'd like to drop, I can then provide you with CSV files for the 3 blobs, and you can then run those through your favorite curve fitting application. StuRat (talk) 08:17, 2 August 2012 (UTC)

Wow thanks! Could you provide me with the script perhaps? I have many data plots like this, so it's not the csv file I need. Also, is there a way to weight a probability mapping for points that seem to be "shared" by either blob-- perhaps a point in the middle would have a 50% weighting for one blob and 50% for the other. (Or perhaps a probability function can be used to decide which blob a point should belong to.) Can this technique be applied to the acceleration graph? Acceleration does seem more cleanly separated after all. Nothing gold can stay (talk) 11:11, 2 August 2012 (UTC)
I could assign a probability if we knew the distribution ahead of time, but we don't. For example, if I just said a point midway between two lines has a 50% chance of being in either, I'd likely be wrong. So, I really don't think we can get any useful data out of the place where all three blobs overlap. StuRat (talk) 17:48, 2 August 2012 (UTC)
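If a distribution is assumed, a soft weighting of shared points is straightforward to write down. The sketch below assumes Gaussian scatter of a single common width about every line and equal prior weight for each line. Both are assumptions that the data may not support, which is exactly the caveat raised above. The slopes are the rough values quoted earlier in the thread, and the width of 3 and the function name are invented for illustration.

```python
import math

def soft_weights(point, lines, sigma):
    """Relative weight of each line y = m*x + c for one point, assuming
    Gaussian scatter of width sigma about every line and equal priors."""
    x, y = point
    log_likes = [-0.5 * ((y - (m * x + c)) / sigma) ** 2 for m, c in lines]
    # Shift by the maximum before exponentiating to avoid underflow to 0/0.
    top = max(log_likes)
    likes = [math.exp(v - top) for v in log_likes]
    total = sum(likes)
    return [v / total for v in likes]

lines = [(0.4, 0.0), (1.0, 0.0), (2.5, 0.0)]   # approximate slopes from the thread
weights = soft_weights((10.0, 7.0), lines, sigma=3.0)
print([round(w, 3) for w in weights])
```

The example point sits vertically midway between the two shallower lines and is split equally between them under these assumptions. A different width for each line, or unequal priors, would shift that split.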
You would find less statistical skewing if you used an actual classifier algorithm, like I described in my very first post. But, since you're already taking great liberties with data-processing, why not just throw out any subset portion of the data that is statistically destroyed by an intermediate processing step! But seriously, if you're actually planning to pursue this as a topic of research, you've got to identify your methodological error first. Not after you spend a week analyzing invalid results. You're flying at 60 mm/sec down the wrong road here. "Science" is about understanding your data, and its natural causes, not about running an elaborate post-processing algorithm to a point-cloud. Though, if you announce to the world that you've discovered something huge based on a statistical anomaly, only to recant six months later because all you discovered is that you don't know how to operate and interpret your own experimental equipment, you'd be in good company. I've got a recommendation: Proakis and Manolakis, because I think you're missing some fundamentals of elementary signal-processing techniques. And once you've mastered regular signal processing, here's an even more advanced book, available for free online: Statistical Signal Processing. I'd also recommend that you review how your trajectories are generated from your raw data. I'd also look at your raw data. StuRat has called my responses vitriolic, and I apologize. I am not trying to discourage your work; but from over here, what it looks like is that somebody handed you a copy of MATLAB and told you to "use it" to process fly trajectories. They may as well put you in front of a Cray. It is also a very powerful number-crunching machine, but you don't know how to use it. The computer will spit results out, and StuRat can even write a program to color-code them, but... that doesn't mean anything. Nimur (talk) 15:41, 2 August 2012 (UTC)
I agree that the preliminary data needs to be checked for systemic error, but this doesn't necessarily need to be done prior to curve-fitting. Indeed, curve fitting may help to detect some types of systemic error. It could also be used to request better equipment or a colleague's assistance (in the same lab or another), to verify the results. StuRat (talk) 19:29, 2 August 2012 (UTC)
The author of the program I'm using got back to me, and apparently my program does some sort of superlocalising. "Ctrax calculates the position and orientation of flies by fitting ellipses to connected groups of foreground pixels using the spatial moments of each connected component. The precision of this calculation is limited by the bit depth and spatial resolution of the source images, but the relationship isn't straightforward." Nothing gold can stay (talk) 22:31, 2 August 2012 (UTC)

Biological explanation ?[edit]

Presumably max acceleration and braking take more energy than more gradual velocity changes. If this turns out to be real, one possible explanation for having 3 discrete acceleration modes might be preemptive predator evasion. Other flies randomly change direction when flying, whether they detect a predator or not. Perhaps these guys also randomly change acceleration, which certainly would make it difficult to catch them in flight. StuRat (talk) 19:35, 2 August 2012 (UTC)

Flies often do path-finding saccades, because they are constantly trying to explore new territory as well as revisit old ones to make sure they haven't missed anything. The velocity-curvature relationship roughly follows the 2/3 power law. I found that the pattern continues (but gets more noisy) comparing f(t+2) against f(t), f(t+5), and at f(t+10) you can still see the separate arms but they are very fat and short. (Going to upload pictures.) So apparently velocity and acceleration in one frame have a longer-term memory of a dozen or more frames back, which makes sense, because I am sampling at an arbitrary frequency. Also a reason why I argue against noise is that angular velocity's return map does not break up into discrete lines -- it's a semi-random map with a weak/moderate f(t+1) = -f(t) signal. Nothing gold can stay (talk) 21:45, 2 August 2012 (UTC)
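For readers unfamiliar with return maps, here is a minimal sketch of how the lagged pairs f(t), f(t+k) can be built and summarised with a correlation coefficient. The function names `lagged_pairs` and `pearson` and the alternating test series are illustrative assumptions, not part of the poster's analysis; a pure f(t+1) = -f(t) signal should give a lag-1 correlation of -1 and a lag-2 correlation of +1.

```python
def lagged_pairs(series, lag):
    """Pair each sample f(t) with the sample f(t + lag) (hypothetical helper)."""
    return list(zip(series, series[lag:]))


def pearson(pairs):
    """Pearson correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5


# Made-up series that flips sign every frame, i.e. f(t+1) = -f(t) exactly.
alternating = [1.0, -1.0] * 10
r_lag1 = pearson(lagged_pairs(alternating, 1))
r_lag2 = pearson(lagged_pairs(alternating, 2))
```

On real angular-velocity data the lag-1 value would presumably sit somewhere between -1 and 0 for a "weak/moderate" signal of this kind.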
I see that Nimur posted once in the meantime, trying to point you to relevant sources to understand your sampling problem. I just want to give you a simple thought experiment to help you grasp the basic problem of quantization and numerical derivatives: Imagine a point-like fly that is moving in x-direction (skipping y for simplicity) with a velocity of 3 mm/s. Your temporal resolution is 1 s, and your camera+software has "bins" of 5 mm into which it sorts the position of the fly. The point-fly starts at t=0 s at x=0 mm. At t=1 s, it is at x=3 mm, fitting into the first bin. t=2 s, real position is x=6 mm, second bin. t=3 s, real position x=9 mm, still second bin. t=4 s, real position x=12 mm, third bin. If you now look only at the bins, as this is the output you get from the software, you would think the fly stopped between t=2 s and t=3 s, giving you a classical quantization error. Although, in reality, the fly had constant speed and an acceleration of zero, you would, with your method, calculate non-zero acceleration and changing speeds just because of the way your detection algorithm works. Granted, this example is very simplified, but I just wanted to give you a start for thinking. In your real dataset, you have quantization of every measurement (x,y,t), plus unknown interactions with the software that computes the detected position from the pixel-positions (as I mentioned: does it make a difference if a fly runs diagonal to the pixel grid? Did I understand correctly that the sampling rate of your software is different than the sampling rate of your camera?), followed by the calculation of numerical derivatives from the already error-prone measurements. All those error sources could amplify each other, and without systematic testing for that it is impossible to give a proper interpretation of what is real and what is noise. --TheMaster17 (talk) 07:41, 3 August 2012 (UTC)
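The thought experiment above can be run as a short script. This is only a sketch of that simplified example (3 mm/s, 5 mm bins, 1 s steps); the function names are made up for illustration, and reporting the lower edge of each bin as the position is an assumption.

```python
import math


def binned_positions(speed_mm_s, bin_mm, n_steps, dt_s=1.0):
    """Bin index reported at each time step for a point moving at constant speed."""
    return [math.floor(speed_mm_s * i * dt_s / bin_mm) for i in range(n_steps + 1)]


def finite_differences(values, dt_s=1.0):
    """Forward differences of consecutive samples divided by the time step."""
    return [(b - a) / dt_s for a, b in zip(values, values[1:])]


bins = binned_positions(3.0, 5.0, 4)          # t = 0, 1, 2, 3, 4 s
reported_x = [b * 5.0 for b in bins]          # assumed: lower bin edge, in mm
apparent_v = finite_differences(reported_x)   # mm/s, from the binned positions
apparent_a = finite_differences(apparent_v)   # mm/s^2, derivative of a derivative
```

The true speed is a constant 3 mm/s and the true acceleration is zero, yet the binned positions give a speed that jumps between 0 and 5 mm/s and an acceleration that alternates in sign.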
No, the sampling rate of my software is whatever the sampling rate of the video is. Also the spatial resolution of 5 mm in your example is pretty bad (my spatial resolution is at least 0.2 mm, superlocalising brings this down to 0.1 mm or 0.05 mm although this is a complicated relationship). Also note that my data doesn't have obvious bins. The spatial resolution appears to be pretty high, if you actually looked at my data. Have you looked at the spatial resolution on the axes? Nothing gold can stay (talk) 15:03, 3 August 2012 (UTC)
The spatial resolution of your axes tells you nothing about the real resolution of your data. It may be a hint, but this depends entirely on the software and formatting you use. And as I said, my example is extremely simplified. But you can put arbitrarily small values into my example and would still get numerical pseudo-signals if you don't apply some sort of filter to compensate for the quantization. The numbers are therefore irrelevant, I just wanted to demonstrate the principle. And the principle holds, regardless of the way you bin your data (and that is what you are doing in every real world scenario: breaking up continuous variables into discrete parts). --TheMaster17 (talk) 08:16, 4 August 2012 (UTC)
Could your tracking software pick up a small steel ball? I have seen videos of Drosophila tracking experiments in which a small steel ball was added to the arena, and a motorized magnet located beneath the arena moved the ball along a defined trajectory. You could run your analysis software on recordings of such a ball as a control. This could be something as simple as strapping the magnet to a motor that makes it spin in a circle. Someguy1221 (talk) 07:52, 3 August 2012 (UTC)
Perhaps some of the people involved with mosquito flight track analysis and the Track3D system at Wageningen University and Research Centre might be useful contacts. See [2][3]. Sean.hoyland - talk 16:52, 3 August 2012 (UTC)
This sounds like a good idea to get a grasp of the limitations (and abilities) of your setup. You could vary speed, acceleration and trajectory of the ball somewhat, and see how this influences the graphs you are producing. --TheMaster17 (talk) 08:19, 4 August 2012 (UTC)
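Before building the hardware, the ball control can also be tried in simulation. The following is a rough sketch only: the radius, period, frame rate and the 0.2 mm pixel size are all assumed values, and snapping to a pixel grid is a crude stand-in for whatever the real tracking software does. It moves a point around a circle at constant speed, quantizes the positions, and compares the frame-to-frame speed before and after quantization.

```python
import math


def circle_track(radius_mm, period_s, fps, n_frames):
    """True (x, y) positions of a ball circling at constant speed."""
    step = 2 * math.pi / (fps * period_s)   # angle advanced per frame
    return [(radius_mm * math.cos(step * i), radius_mm * math.sin(step * i))
            for i in range(n_frames)]


def quantize(track, pixel_mm):
    """Snap each coordinate to the nearest multiple of the pixel size."""
    return [(round(x / pixel_mm) * pixel_mm, round(y / pixel_mm) * pixel_mm)
            for x, y in track]


def speeds(track, fps):
    """Frame-to-frame speed in mm/s from consecutive positions."""
    return [math.hypot(x1 - x0, y1 - y0) * fps
            for (x0, y0), (x1, y1) in zip(track, track[1:])]


# All parameters below are hypothetical, chosen only for illustration.
true_track = circle_track(radius_mm=20.0, period_s=10.0, fps=30, n_frames=300)
seen_track = quantize(true_track, pixel_mm=0.2)

true_v = speeds(true_track, fps=30)
seen_v = speeds(seen_track, fps=30)
spread_true = max(true_v) - min(true_v)   # essentially zero: constant speed
spread_seen = max(seen_v) - min(seen_v)   # non-zero: produced by the pixel grid
```

Any structure that shows up in a plot of `seen_v` against itself one frame later is produced by the measurement chain, since the simulated ball never changes speed.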
Also, this may be of interest, Valente D, Golani I, Mitra PP (2007) Analysis of the Trajectory of Drosophila melanogaster in a Circular Open Field Arena. PLoS ONE 2(10): e1083. doi:10.1371/journal.pone.0001083. Sean.hoyland - talk 18:46, 4 August 2012 (UTC)
This is not only of interest, this is the solution. The authors of the paper that Sean.hoyland has cited state explicitly that the trajectory of the fly has to be smoothed before calculating the numerical derivative if you want a meaningful value. And they even mention another method of getting the derivative in the methods. I really recommend that you try to reproduce their method before interpreting anything about your v and a values (which are probably products of your noisy position data). --TheMaster17 (talk) 20:40, 4 August 2012 (UTC)
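To illustrate why smoothing before differentiating matters, here is a minimal sketch. It does not reproduce the method of the cited paper; a plain centred moving average is swapped in as the smoother, and the frame rate, noise level and window size are assumptions chosen only for illustration.

```python
import random


def moving_average(values, half_width):
    """Centred moving average; the window shrinks near the ends of the series."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def central_difference(values, dt):
    """Numerical derivative at interior samples: (f[i+1] - f[i-1]) / (2 dt)."""
    return [(values[i + 1] - values[i - 1]) / (2 * dt)
            for i in range(1, len(values) - 1)]


def rms_error(estimates, truth):
    """Root-mean-square deviation of the estimates from a known constant."""
    return (sum((e - truth) ** 2 for e in estimates) / len(estimates)) ** 0.5


rng = random.Random(0)   # fixed seed so the run is repeatable
dt = 1 / 30              # assumed frame interval in seconds
true_speed = 3.0         # mm/s, constant, so the true acceleration is zero
noisy_x = [true_speed * i * dt + rng.gauss(0, 0.05) for i in range(300)]

raw_v = central_difference(noisy_x, dt)
smooth_v = central_difference(moving_average(noisy_x, 5), dt)

# Drop the ends, where the shortened window biases the smoothed positions.
interior = slice(5, -5)
raw_err = rms_error(raw_v[interior], true_speed)
smooth_err = rms_error(smooth_v[interior], true_speed)
```

With these made-up numbers the speed from the smoothed track should sit much closer to the true 3 mm/s than the speed from the raw track, and the gap would be wider still for acceleration, which differentiates the noise a second time.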