Regression analysis


CC Brown
18th March 2005, 11:12.49 AM
Has anyone found or used a regression analysis program for Access? If so, what is it, and how hard is it to use?
Thanks

hurrikane
21st March 2005, 11:43.27 AM
Hey CC,

I have done some things with this...without much success I might add.

My latest toys are the analysis tools from salford-systems.com, CART and MARS.

They don't run on Access per se, but you export things to Excel and run from there.

Most of the things I have found specific to Access are too simplistic for the magnitude of data fields in HTR. A lot of static (of course that could be coming from inside my head).

overdog
5th April 2005, 07:24.34 PM
Hi CC;

I am a consultant in Applied Mathematics, was a technical advisor to Dr. Z, inventor of the place and show betting method and software, and am a beta tester for one of the world's largest statistical software companies. I also develop software using Artificial Intelligence to do predictions and forecasting. I have at various times worked with over a dozen Fortune 500 companies, doing data mining, including several on Wall Street. So I have had a long history of finding out what works and what doesn't in Quantitative Analysis. It is time consuming, but a lot of fun with horses, and can help make your passion a profitable hobby.

You can export Access data, or any data which comes in text, CSV (comma separated values) or several other formats, to Excel.

In Excel, click Tools, then Add-Ins, and load the Analysis ToolPak by ticking the check box next to its listing. Excel will then load the ToolPak every time it starts, until you uninstall it by unticking the box.
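
For anyone who would rather stay out of Excel, here is a minimal sketch of the same idea in plain Python: read a CSV export and fit a straight line by least squares. It uses one predictor only and is not meant to reproduce the ToolPak's output. The column names (`speed_fig`, `beaten_lengths`) and the numbers are invented for illustration.

```python
import csv
import io

# Hypothetical CSV export from Access; columns and values are made up.
CSV_TEXT = """speed_fig,beaten_lengths
90,0.0
85,2.5
80,5.0
75,7.5
"""

def read_columns(text, x_name, y_name):
    """Parse CSV text and return two lists of floats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    xs = [float(r[x_name]) for r in rows]
    ys = [float(r[y_name]) for r in rows]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # fails if every x is identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs, ys = read_columns(CSV_TEXT, "speed_fig", "beaten_lengths")
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Four rows is only enough to show the mechanics; real data would need far more.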

Be advised though, that Regression is EXTREMELY hard to do well.

Best results occur when:

1. You have at least 5 rows of data. Actually you can multiply that by 10 or 100 before I would BEGIN to trust the data.

2. It works best with 4 to 8 predictor factors. More than that and you are fooling yourself.

3. The factors should be TOTALLY unrelated to each other. In horseracing stats, this is very difficult to accomplish. Take Class, for example: use one factor only. If you used two or more, like average finish time AND finish position in classes with purses of $20K or better (or $20K or higher claiming class), you are really using "class" twice. (Higher class horses tend to run faster times.)

5. Convert ALL numbers to a scaled value. If you use, let's say, Cramer speed #'s in the 80's and 90's, and also want to use post position #'s (1 to 12), the equation will treat the higher Cramer #'s differently, giving them much more weight.

Scale each horse to a value which is a % of the MAXIMUM of all the horses in the race. So if the highest Cramer fig is 90, that becomes a 1.0. A horse who runs a Cramer 60 gets a score of .67. If the lowest were 45, he'd get a .5, etc.

Do this for all variables.

6. Plan your regression ahead of time. Mainly ask what else in the data available about this horse is expressed in another way in another column of HX4 or whatever. If there's even the SLIGHTEST doubt in your mind that they measure the same thing, DISCARD one or the other. Then try it first with one, then re-run with the other.

7. Treat some items as 1 OR 0, e.g. wet = 0, dry = 1; turf is 1 or 0, etc. If the variable is basically yes/no, treat it as a binary variable, meaning 1 or 0. By convention, 1 usually means yes and 0 means no. But be consistent. Aim to make all the positives 1 or 0 and the negatives the opposite.

8. By FAR the most important. When you get your output, do NOT assume correlation is causation. At least not until you have a large database population. Assume you need to have not less than 30 WINNERS with an angle. Keep adding sample races until you have enough total races to give you 30 winners. This may be 60 races (all odds-on faves in Gr1 stake races) or several hundred races.

9. Test your method on a COMPLETELY new sample of races. Do NOT use ANY of the original 30 winners, or the entire collection of races those winners came out of.
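
A rough Python sketch of points 3, 5, 7, 8 and 9 above, for anyone working outside Excel. All function names are made up for illustration. The Cramer figures are the ones from point 5, and the 50% win rate is an assumption chosen to match the odds-on favourites example in point 8.

```python
import math

def scale_to_race_max(values):
    """Point 5: express each value as a fraction of the race maximum."""
    top = max(values)  # assumes the maximum is not zero
    return [v / top for v in values]

def pearson(xs, ys):
    """Point 3: a correlation near +1 or -1 hints that two factors
    measure the same thing. Fails if either list is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def encode_binary(value, yes_value):
    """Point 7: 1 means yes, 0 means no."""
    return 1 if value == yes_value else 0

def races_needed(target_winners, win_rate):
    """Point 8: rough number of races expected to yield target_winners."""
    return math.ceil(target_winners / win_rate)

def split_races(race_ids, holdout_every=3):
    """Point 9: hold out whole races, so no test race is used for fitting."""
    unique = sorted(set(race_ids))
    test = unique[::holdout_every]
    train = [r for r in unique if r not in test]
    return train, test

print([round(s, 2) for s in scale_to_race_max([90, 60, 45])])  # [1.0, 0.67, 0.5]
print(encode_binary("dry", "dry"))                             # 1
print(races_needed(30, 0.5))                                   # 60
```

The every-third-race hold-out is only one way to split. The point is that the test races never touch the sample the regression was built on.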

Hope this helps,

Good hunting,

Regards,
Fraser Rawlinson

hurrikane
5th April 2005, 08:12.00 PM
Quote (overdog):

8. By FAR the most important. When you get your output, do NOT assume correlation is causation. At least not until you have a large database population. Assume you need to have not less than 30 WINNERS with an angle. Keep adding sample races until you have enough total races to give you 30 winners. This may be 60 races (all odds-on faves in Gr1 stake races) or several hundred races.

This could not be said better. BY FAR MOST IMPORTANT!!!!!!!

I'll even add that just because you have a large data set, do NOT assume correlation is causation.

You may well find, as I did, that Excel does not have the data capacity for the regression you are looking for, since a worksheet holds only about 65k rows.