
daily 08/14/2016

    • SISA allows you to do statistical analysis directly on the Internet. Click on one of the procedure names below, fill in the form, click the button, and the analysis will take place on the spot. Study the user-friendly guides to statistical procedures to see which procedure is appropriate for your problem.
    • The statistical distribution spreadsheets can only be used if you have MS Excel installed on your computer. The spreadsheets also seem to work fine in OpenOffice. Please “click” the links to the spreadsheets below. If your computer is configured the right way, the spreadsheets will be loaded automatically into Excel; otherwise, save the spreadsheets and open them as Excel files.

       

    • List of probability distributions

       

       

    • Know Thy Distributions.
    • Pareto
    • Gaussian
    • Exponential
    • von Mises?
    • The log-normal!
    • Fitting. Once you’ve got your distributions down, you should know how to fit them to data in slick ways. Start with maximum likelihood and go from there.
    • Classical hypothesis testing. I think p-values and frequentist hypothesis testing
    • Markov chains + bells + whistles.
    • Basic Bayesian thinking & modeling. Learn to think of everything as a probability distribution instead of just a single value (if appropriate). Be able to assemble the models & compute with them.
    • Some old-school stats and probability theory. E.g. “Random variables; transformations, conditional expectation, moment generating functions, convergence, limit theorems, estimation; Cramer-Rao lower bound, maximum likelihood estimation, sufficiency, ancillarity, completeness. Rao-Blackwell theorem. Some decision theory.”
    • Regression! First linear, then non-linear. (Gasp!)
    • Machine learning. I know you said “statistics,” but really if you want to be a “data scientist” then machine learning
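The "start with maximum likelihood" advice in the fitting item above can be sketched for two of the listed distributions. This is a minimal illustration using only the Python standard library, not anything taken from the linked article: the function names, the sample sizes, and the true parameter values are all assumptions chosen for the demo. Both estimators are the textbook closed forms (1 / sample mean for the exponential rate; mean and population standard deviation of log(x) for the log-normal).

```python
import math
import random
import statistics

def fit_exponential(samples):
    """MLE of the exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(samples)

def fit_lognormal(samples):
    """MLE of log-normal (mu, sigma): mean and population std of log(x)."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)

def exponential_log_likelihood(rate, samples):
    """Log-likelihood of i.i.d. exponential data at a given rate."""
    return len(samples) * math.log(rate) - rate * sum(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical synthetic data with known parameters (assumptions for the demo).
exp_data = [rng.expovariate(2.0) for _ in range(5000)]            # true rate 2.0
logn_data = [rng.lognormvariate(0.5, 0.75) for _ in range(5000)]  # mu 0.5, sigma 0.75

rate_hat = fit_exponential(exp_data)
mu_hat, sigma_hat = fit_lognormal(logn_data)
print(rate_hat, mu_hat, sigma_hat)
```

For distributions without a closed-form estimator (von Mises, for instance), the same idea would need a numerical optimizer over the log-likelihood instead.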
    • We have been comfortable with L2 optimization (Euclidean distance metric) for a long time, but there is a groundswell of activity in L1 optimization (taxi-cab distance metric). L1 optimization pushes us out of our comfort zone of mean-squared error optimality and associated 2nd-order thinking!
    • The highlight is the vast collection of material under 3 topics: Bayes Theorem, Cover Theorem, and Neuroscience & ad hoc methods. In ML practice, these ML methods are “wrapped” in “bootstrap” and “consensus” methods.
    • Cover Theorem
    • Estimating the Conditional Expectation,
    • Perceptrons
    • Input side: Bootstrap methods
       The objective is to maximize Training Set information use.
    • Feature subspace.
    • Output side: Consensus methods
       Solve the problem using independent ML methods and combine the results.
        • Combine “weak learners”. 
             

          • Random Forest. 
          • AdaBoost. 
          • And many more. 
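The input-side and output-side ideas above (bootstrap resampling of the training set, then combining weak learners by consensus) can be sketched together as plain bagging. This is only an illustration under stated assumptions: the weak learner is a one-dimensional threshold "stump", the data are synthetic with 10% label noise, and the helper names are made up here. It is not Random Forest or AdaBoost themselves.

```python
import random

def fit_stump(points):
    """Weak learner: choose the threshold t (from the sample's own x values)
    that minimizes the errors of the rule 'predict 1 if x > t'."""
    best_t, best_err = 0.0, len(points) + 1
    for t, _ in points:
        err = sum((x > t) != bool(y) for x, y in points)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_stumps(points, n_models, rng):
    """Input side: train each stump on a bootstrap resample (with replacement)."""
    models = []
    for _ in range(n_models):
        resample = [rng.choice(points) for _ in points]
        models.append(fit_stump(resample))
    return models

def vote(models, x):
    """Output side: majority vote over the individual stump predictions."""
    votes = sum(x > t for t in models)
    return int(2 * votes > len(models))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical data: label is 1 when x > 0.5, flipped with probability 0.1.
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

models = bagged_stumps(data, n_models=25, rng=rng)
print(vote(models, 0.9), vote(models, 0.1))
```

An odd number of models avoids ties in the vote.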

           

         

      • These are different learning algorithms
      • Line 11 – 12: These are two empty lists. In order to make a plot, I need to give plotly a list of values. As I go through each step in the calculation, I will add a value to the list
    • Conda is both a package manager and an environment manager
    • But let’s say that you want to use a package that requires a different version of Python than you are currently using
    • 2. Managing environments

    • TIP: Many frequently used options after two dashes (--) can be abbreviated with just a dash and the first letter. So --name and -n options are the same and --envs and -e are the same. See conda --help or conda -h for a list of abbreviations.
    • TIP: You can add much more to the conda create command. Type conda create --help for details.

       

    • conda info --envs 
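When this listing is needed inside a script, the active environment can be picked out of the text that the command prints, since conda marks it with an asterisk (*). The sketch below parses a captured copy of that output. The sample text and its paths are hypothetical, and the parser assumes named environments whose paths contain no spaces; it does not call conda itself.

```python
def active_environment(envs_output):
    """Return the name of the environment marked with '*', or None."""
    for line in envs_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split()
        if "*" in fields:
            return fields[0]  # first column is the environment name
    return None

# Hypothetical output, shaped like what `conda info --envs` prints.
SAMPLE_OUTPUT = """\
# conda environments:
#
snowflakes            *  /home/username/miniconda/envs/snowflakes
bunnies                  /home/username/miniconda/envs/bunnies
root                     /home/username/miniconda
"""

print(active_environment(SAMPLE_OUTPUT))  # snowflakes
```

In real use the text could come from subprocess.run(["conda", "info", "--envs"], capture_output=True, text=True).stdout.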

       

       

      • Windows: activate snowflakes
    • Environments are installed by default into the envs directory in your conda directory. You can specify a different path; see conda create --help for details.
    • Which of these environments are you using right now, snowflakes or bunnies? To find out, type the same command again: conda info --envs

       

      
      
    • NOTE: conda also puts an asterisk (*) in front of the active environment in your environment list; see above in “List all environments.”
    • View a list of packages and versions installed in an environment

       

    • TIP: Pip is only a package manager, so it cannot manage environments for you. Pip cannot even update Python, because unlike conda it does not consider Python a package. But it does install some things that conda does not, and vice versa. Both pip and conda are included in Anaconda and Miniconda.

       

    • The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
    • Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.
    • Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.
    • IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history.
    • An IPython notebook is a JSON document containing an ordered list of input/output cells which can contain code, text, mathematics, plots and rich media.
      • IPython Notebook provides a browser-based REPL built upon a number of popular open-source libraries.
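The "JSON document containing an ordered list of input/output cells" description can be made concrete with a tiny hand-built notebook. This is a sketch only: the field names follow the nbformat 4 layout as I understand it, the cell contents are made up, and summarize_cells is a hypothetical helper rather than part of any Jupyter API.

```python
import json

# A minimal notebook-shaped dictionary (nbformat 4 style field names assumed).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A tiny notebook\n"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["2"]},
                }
            ],
        },
    ],
}

def summarize_cells(nb):
    """List (cell type, joined source text) for each cell, in document order."""
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]

# Round-trip through JSON text, which is how a notebook file stores the data.
text = json.dumps(notebook, indent=1)
restored = json.loads(text)
print(summarize_cells(restored))
```

Because the file is ordinary JSON, any language with a JSON parser can read the cells in order without running a notebook server.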

         

           

    • IPython Notebook was added to IPython in the 0.12 release[8] (December 2011). IPython Notebook has been compared to Maple, Mathematica, and Sage.

       

      IPython notebooks frequently draw from SciPy stack[9] libraries like NumPy and SciPy, often installed along with IPython from one of many Scientific Python distributions.[9]


       

      Project Jupyter

       

      In 2014, Fernando Pérez announced a spin-off project from IPython called Project Jupyter. IPython will continue to exist as a Python shell and a kernel for Jupyter, while the notebook and other language-agnostic parts of IPython will move under the Jupyter name.[11] Jupyter added support for J

    • If you want to install your own packages inside the container, you can get into it and run any normal bash shell commands. In order to get into a container, you’ll need to run docker exec. Docker exec takes a specific container id, and a command to run. For instance, typing docker exec -it 4greg24134 /bin/bash will open a shell prompt in the container with id 4greg24134. The -it flags ensure that we keep an input session open with the container, and can enter commands.
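If the same step is scripted, the docker exec invocation described above can be assembled as an argument list. The sketch below only builds and prints the command; docker_exec_argv is a hypothetical helper, and the container id is the example id from the text, not a real container.

```python
import shlex

def docker_exec_argv(container_id, command="/bin/bash", interactive=True):
    """Build the argument list for `docker exec` against one container."""
    argv = ["docker", "exec"]
    if interactive:
        argv.append("-it")  # keep an input session open, as described above
    argv.append(container_id)
    argv.extend(shlex.split(command))  # the command to run inside the container
    return argv

argv = docker_exec_argv("4greg24134")
print(shlex.join(argv))  # docker exec -it 4greg24134 /bin/bash
```

Passing the list to subprocess.run(argv) from a terminal session would start the shell, assuming Docker is installed and a container with that id is running.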

Posted from Diigo. The rest of my favorite links are here.

Categories: Uncategorized