I had this tweet appear in my timeline, which links through to a Quora discussion about the internet formulation of

z/(x+y) + y/(x+z) + x/(y+z) = 4

and an example image, with the accompanying text:
“Roughly 99.999995% of the people don’t stand a chance at solving it, and that includes a good number of mathematicians at leading universities who just don’t happen to be number theorists. … A brute-force search with a computer is totally useless here… I don’t know how to fit the entire solution in a Quora answer without assuming that everyone already knows everything they need to know about elliptic curves.”
I just want to point out a few things about the problem, from my perspective as a non-mathematician data person. I am not going to speculate about whether solutions exist at all (read the interesting linked-through material about that); instead I am noting a data-driven approach: if a solution exists, this is how you could brute force it.
Principle One: Can you make a problem more like a problem that you have already solved?
Well, there is nothing in the formulation saying x, y, and z have to be different numbers, so let’s only look for a solution where z = x. This makes the problem harder in the sense that the (infinite) set of possible solutions where x and z are the same is (infinitely) smaller than the set where all three values differ. However, it is easier because suddenly we are only dealing with 2 variables:

2x/(x+y) + y/(2x) = 4
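Because the left-hand side is symmetric in x, y, and z, it does not matter which two of the three variables we set equal: with repeated value r and remaining value s it always collapses to the same two-variable form. A quick base-R sanity check of that reduction (the function names here are mine):

```r
# Original three-variable left-hand side
lhs3 <- function(x, y, z) z / (x + y) + y / (x + z) + x / (y + z)
# Reduced two-variable form: r is the repeated value, s the remaining one
lhs2 <- function(r, s) 2 * r / (r + s) + s / (2 * r)

set.seed(1)
r <- sample(1:1000, 50)
s <- sample(1:1000, 50)
# Whichever two of the three variables are set to r, the value is the same
stopifnot(all(abs(lhs3(s, r, r) - lhs2(r, s)) < 1e-12),
          all(abs(lhs3(r, s, r) - lhs2(r, s)) < 1e-12),
          all(abs(lhs3(r, r, s) - lhs2(r, s)) < 1e-12))
```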
In brute forcing it on a computer, the problem has gone from a 3d search space to a 2d search space, which is closer to impractical than impossible.

But if we are set on brute forcing, can we do better?
Principle Two: Visualize the data
It is a pretty-much instant process to feed all the combinations of x and y values from 1 to 500 into the equation and see how close the answers are to 4. Of the 124750 pairs tested, I graphed the 200 pairs closest to but under 4 and the 200 pairs closest to but over 4.
library(combinat)
library(dplyr)

# All 124750 combinations of two different values from 1 to 500
xy <- data.frame(combn2(1:500))
names(xy) <- c("x", "y")

# Evaluate the reduced equation, then keep the 200 pairs landing
# closest above 4 and the 200 landing closest below it
xy2 <- xy %>%
  mutate(a = 2 * x / (x + y) + y / (2 * x),
         distance = abs(a - 4),
         abovebelow = ifelse(a > 4, "above", "below")) %>%
  arrange(abovebelow, distance) %>%
  group_by(abovebelow) %>%
  slice(1:200) %>%
  ungroup()

plot(xy2$x, xy2$y, xlim = c(0, 500), ylim = c(0, 500), pch = ".",
     xlab = "possible x values", ylab = "possible y values",
     frame.plot = FALSE)
So, possible solutions form along a line. Which is to say the ratio of x to y is nearly, but not quite, constant. As the solution (if one exists) will be on the line, this turns the problem from a 2d search space into a 1d (linear) search space. And 1d search spaces are, in principle, highly brute forcible.
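A quick base-R cross-check, independent of the dplyr pipeline above, makes the same point: the value computed in the code depends only on the ratio y/x, so the near-misses cluster around a single slope (substituting y = kx and solving 2/(1+k) + k/2 = 4 gives k = (7 + √65)/2 ≈ 7.5311).

```r
# Every integer pair whose value lands within 0.001 of 4 has y/x in a
# narrow band around (7 + sqrt(65)) / 2, the slope of the exact level set
grid <- expand.grid(x = 1:500, y = 1:500)
grid$a <- 2 * grid$x / (grid$x + grid$y) + grid$y / (2 * grid$x)
near <- grid[abs(grid$a - 4) < 0.001, ]
k <- (7 + sqrt(65)) / 2
range(near$y / near$x - k)  # a tight interval around zero
```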
Principle Three: Do a regression
We can put some specific bounds on the search space by taking a lower bound from the values that land just below 4 and an upper bound from the values that land just above 4, and regressing on those (initially 200 apiece) values.
above_model <- lm(y ~ x, data = xy2[1:200, ])
under_model <- lm(y ~ x, data = xy2[201:400, ])
# Intercept and slope of each bounding line
a_i <- above_model$coefficients[1]
a_b <- above_model$coefficients[2]
u_i <- under_model$coefficients[1]
u_b <- under_model$coefficients[2]
and work on the assumption that for a given value of x it is only worth testing y values between 0.0614424 + 7.4843108x and 0.0873257 + 7.5742664x. We can then shrink the search space by jumping forward through the possible integers and generating new test bounds:
predictors <- data.frame(x = (1000000 - 40):1000000)
predictors$y2 <- u_i + u_b * predictors$x  # lower bound on y
predictors$y1 <- a_i + a_b * predictors$x  # upper bound on y
Adding the “closest to 4” entries to the existing data, one could then rerun the regression to narrow the search space.
Repeat until fade.
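A minimal sketch of one pass of that loop, in base R (the helper `search_block` is my own framing, seeded with the bound coefficients quoted above): test only the integer y values between the two fitted lines for a block of x values, keep the closest pairs, and those are what you would re-fit on.

```r
f <- function(x, y) 2 * x / (x + y) + y / (2 * x)

# Bounding lines (intercept, slope) from the initial regressions
lower <- c(0.0614424, 7.4843108)
upper <- c(0.0873257, 7.5742664)

# Test every integer y between the bounds for each x in a block,
# returning the pairs sorted by distance from 4
search_block <- function(x_from, x_to, lower, upper) {
  rows <- lapply(x_from:x_to, function(x) {
    ys <- floor(lower[1] + lower[2] * x):ceiling(upper[1] + upper[2] * x)
    data.frame(x = x, y = ys, distance = abs(f(x, ys) - 4))
  })
  out <- do.call(rbind, rows)
  out[order(out$distance), ]
}

hits <- search_block(10000, 10100, lower, upper)
# Re-fitting lm(y ~ x) on the closest hits narrows the band for the next block
head(hits, 3)
```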
And then, of course, there is the option of doing some clever stuff with the overall regression line of the closest entries itself as the basis of the search, rather than the upper and lower bounds, and building in some search intelligence around the way the proportions approach or depart from integer values as you move through the series.
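One way to read that suggestion, as a sketch: walk along a single central slope, test only the rounded y for each x, and track how far the ideal y sits from an integer. Here I use the exact ratio k = (7 + √65)/2 ≈ 7.5311 that the regressions are straddling (a fitted central slope would do the same job).

```r
f <- function(x, y) 2 * x / (x + y) + y / (2 * x)
k <- (7 + sqrt(65)) / 2  # central slope the bounding regressions straddle

x <- 1:100000
y <- round(k * x)        # only one candidate y per x
distance <- abs(f(x, y) - 4)
best <- order(distance)[1:5]
# The best near-misses are exactly the x where k * x is nearly an integer
data.frame(x = x[best], y = y[best],
           offset = abs(k * x[best] - y[best]), distance = distance[best])
```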