# Statistics

Engineering students interested in a survey of the mathematical theory of probability and statistics should consider the pair *STAT UN3203: Probability theory* and *UN3204: Statistical inference*. Students seeking a quicker overview that focuses more on probability theory should consider *STAT GU4001*. *STAT GU4109* (6 pts) covers the same material as *UN3203* and *UN3204* in a single semester. *STAT UN3205: Linear regression models* takes *UN3203* and *UN3204* as prerequisites; like other advanced offerings in statistics, it covers both theory and practical aspects of modeling and data analysis.

*STAT GR5203*, *GR5204*, and *GR5205* are the equivalents of *UN3203*, *UN3204*, and *UN3205*, respectively, but graduate students may not register for *UN3203*, *UN3204*, or *UN3205*.

Advanced offerings in probability theory, stochastic processes, and mathematical finance generally take *STAT UN3203* as a prerequisite; advanced offerings in statistical theory and methods generally take *STAT GR5204* and, in several cases, *UN3205* as prerequisites. An exception is *STAT GU4220: Data mining*, which has a course in computer programming as a prerequisite and *STAT UN3204* as a corequisite. *STAT GR5291* is an advanced survey of applied statistical methods.

Please note that *STAT UN3000* has been renumbered as *UN3203* and *STAT UN3659* has been renumbered as *UN3204*. For a description of the following course offered jointly by the Departments of Statistics and Industrial Engineering and Operations Research, see **Industrial Engineering and Operations Research**.

**STAT UN2102x Applied linear regression analysis**

*3 pts. Professor Feng.*

Prerequisite: One of STAT UN1001, UN1111, or UN1211. Develops critical thinking and data analysis skills for regression analysis in science and policy settings. Simple and multiple linear regression, nonlinear and logistic models, random-effects models, penalized regression methods. Implementation in a statistical package. Optional computer-lab sessions. Emphasis on real-world examples and on planning, proposing, implementing, and reporting.

**STAT UN2104y Applied statistical methods**

*3 pts. Professors Landwehr and Whalen.*

Prerequisite: STAT UN2102. Classical nonparametric methods, permutation tests; contingency tables, generalized linear models, missing data, causal inference, multiple comparisons. Implementation in statistical software. Emphasis on conducting data analyses and reporting the results. Optional weekly computer-lab sessions.

**STAT UN2105x Statistical applications and case studies**

*3 pts. Instructor to be announced.*

Prerequisite: STAT UN2104. A sample of topics and application areas in applied statistics. Topic areas may include Markov processes and queuing theory; meta-analysis of clinical trial research; receiver-operator curves in medical diagnosis; spatial statistics with applications in geology, astronomy, and epidemiology; multiple comparisons in bio-informatics; causal modeling with missing data; statistical methods in genetic epidemiology; stochastic analysis of neural spike train data; graphical models for computer and social network data.

**STAT UN2106x Applied data mining**

*3 pts. Professor Emir.*

Data mining is a dynamic and fast-growing field at the interface of statistics and computer science. The emergence of massive datasets containing millions or even billions of observations provides the primary impetus for the field. Such datasets arise, for instance, in large-scale retailing, telecommunications, and astronomy, and they pose computational and statistical challenges. This course will provide an overview of current practice in data mining. Specific topics covered include databases and data warehousing, exploratory data analysis and visualization, descriptive modeling, predictive modeling, pattern and rule discovery, text mining, Bayesian data mining, and causal inference. The use of statistical software will be emphasized.

**STAT UN3103x Mathematical methods for statistics**

*6 pts. Professor Hannah.*

Prerequisite: MATH UN1101 or permission of the instructor. A fast-paced coverage of those aspects of the differential and integral calculus of one and several variables and of the linear algebra required for the core courses in the Statistics major. The mathematical topics are integrated with an introduction to computing. Students seeking more comprehensive background should replace this course with MATH UN1102 and UN2010, and any COMS course numbered from W1003 to W1009.

**STAT UN3203x Introduction to probability**

*3 pts. Professor Lo.*

Prerequisites: MATH UN1101 and UN1102 or equivalent. A calculus-based introduction to probability theory. A quick review of multivariate calculus is provided. Topics covered include random variables, conditional probability, expectation, independence, Bayes’ rule, important distributions, joint distributions, moment generating functions, central limit theorem, laws of large numbers and Markov’s inequality.

**STAT UN3204y Introduction to statistical inference**

*3 pts. Professor Neath.*

Prerequisite: STAT UN3203 or GR5203, or equivalent. Calculus-based introduction to the theory of statistics. Useful distributions, law of large numbers and central limit theorem, point estimation, hypothesis testing, confidence intervals, maximum likelihood, likelihood ratio tests, nonparametric procedures, theory of least squares, and analysis of variance.

**STAT UN3205x Linear regression models**

*3 pts. Professor Zheng.*

Prerequisites: STAT UN3204 (or UN3001) and STAT UN3103 (or MATH UN1101, UN1102, and UN2110). Theory and practice of regression analysis. Simple and multiple regression, testing, estimation, prediction, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyze data. Equivalent to STAT W4315 except that enrollment is limited to undergraduate students.

**STAT UN3281x Theory of interest**

*3 pts. Professors Qadir, Szeto, and Xu.*

Prerequisite: MATH UN1101 or equivalent. Introduction to the mathematical theory of interest as well as the elements of economic and financial theory of interest. Topics include rates of interest and discount; simple, compound, real, nominal, effective, dollar (time)-weighted; present, current, future value; discount function; annuities; stocks and other instruments; definitions of key terms of modern financial analysis; yield curves; spot (forward) rates; duration; immunization; and short sales. The course will cover determining equivalent measures of interest; discounting; accumulating; determining yield rates; and amortization.

**STAT UN3997x and y Independent research**

*1 pt. Members of faculty.*

Prerequisite: Permission of a member of the department. May be repeated for credit. This course provides a mechanism for students who undertake research with a faculty member from the Department of Statistics to receive academic credit; students should only register for this course with permission of their project mentor.

**STAT GU4001x and y Introduction to probability and statistics**

*3 pts. Members of faculty.*

Prerequisites: MATH UN1101 and UN1102 or equivalent. A quick calculus-based tour of the fundamentals of probability theory and statistical inference. Probabilistic models, random variables, useful distributions, expectations, laws of large numbers, central limit theorem, point and confidence interval estimation, hypothesis tests, linear regression. Students seeking a more thorough introduction to probability and statistics should consider STAT UN3203 and UN3204.

**STAT GR5205x and y Linear regression models**

*3 pts. Members of faculty.*

Prerequisites: STAT UN3204 or equivalent, MATH UN1101, UN1102, UN2010 or permission of program adviser. Theory and practice of regression analysis. Simple and multiple regression, including testing, estimation, and confidence procedures, modeling, regression diagnostics and plots, polynomial regression, collinearity and confounding, model selection, geometry of least squares. Extensive use of the computer to analyze data.

**STAT GR5207x and y Elementary stochastic processes**

*3 pts. Professors Brown and Wang.*

Prerequisite: STAT UN3203, GR5203, or equivalent. Review of elements of probability theory. Poisson processes. Renewal theory. Wald’s equation. Introduction to discrete and continuous time Markov chains. Applications to queueing theory, inventory models, branching processes.

**STAT GR5221x and y Time series analysis**

*3 pts. Professor Safikhani.*

Prerequisite: STAT GR5205 or equivalent. Least squares smoothing and prediction, linear systems, Fourier analysis, and spectral estimation. Impulse response and transfer function. Fourier series, the fast Fourier transform, autocorrelation function, and spectral density. Univariate Box-Jenkins modeling and forecasting. Emphasis on applications. Examples from the physical sciences, social sciences, and business. Computing is an integral part of the course.

**STAT GR5222y Nonparametric statistics**

*3 pts. Professors Maleki and Sen.*

Prerequisite: STAT UN3204 or GR5204. Statistical inference without parametric model assumptions. Hypothesis testing using ranks, permutations, and order statistics. Nonparametric analogs of analysis of variance. Nonparametric regression, smoothing, and model selection.

**STAT GR5231y Survival analysis**

*Professor Shnaidman.*

Prerequisite: STAT GR5205. Survival distributions, types of censored data, estimation for various survival models, nonparametric estimation of survival distributions, the proportional hazard and accelerated lifetime models for regression analysis with failure-time data. Extensive use of the computer.

**STAT GR5232y Generalized linear models**

*3 pts. Professor Sobel.*

Prerequisite: STAT GR5205. Statistical methods for rates and proportions, ordered and nominal categorical responses, contingency tables, odds-ratios, exact inference, logistic regression, Poisson regression, generalized linear models.

**STAT GR5233x Multilevel models**

*3 pts. Instructor to be announced.*

Prerequisite: STAT GR5205. Theory and practice, including model-checking, for random and mixed-effects models (also called hierarchical or multilevel models). Extensive use of the computer to analyze data.

**STAT GR5234x Sample surveys**

*3 pts. Professors Ben-David and Wu.*

Prerequisite: STAT UN3204 or GR5204. Introductory course on the design and analysis of sample surveys. How sample surveys are conducted, why the designs are used, how to analyze survey results, and how to derive from first principles the standard results and their generalizations. Examples from public health, social work, opinion polling, and other topics of interest.

**STAT GR5242x Data mining**

*3 pts. Professors Mazumder, Motta, and Rabinowitz.*

Prerequisite: COMS W1003, W1004, W1005, W1007, or the equivalent. Corequisites: Either STAT UN3203 or GR5203, and either STAT UN3204 or GR5204. Data mining is a dynamic and fast-growing field at the interface of statistics and computer science. The emergence of massive datasets containing millions or even billions of observations provides the primary impetus for the field. Such datasets arise, for instance, in large-scale retailing, telecommunications, and astronomy, and they pose computational and statistical challenges. This course will provide an overview of current research in data mining and will be suitable for graduate students from many disciplines. Specific topics covered include databases and data warehousing, exploratory data analysis and visualization, descriptive modeling, predictive modeling, pattern and rule discovery, text mining, Bayesian data mining, and causal inference.

**STAT GR5261y Statistical methods in finance**

*3 pts. Professors ElBarmi, Wang, and Ying.*

Prerequisite: STAT UN3204 or GR5204. A fast-paced introduction to statistical methods used in quantitative finance. Financial applications and statistical methodologies are intertwined in all lectures. Topics include regression analysis and applications to the Capital Asset Pricing Model and multifactor pricing models, principal components and multivariate analysis, smoothing techniques and estimation of yield curves, statistical methods for financial time series, value at risk, term structure models and fixed income research, and estimation and modeling of volatilities. Hands-on experience with financial data.

**STAT GR5262y Stochastic processes for finance**

*3 pts. Professor Zhang.*

Prerequisite: STAT UN3203, GR5203, or equivalent. This course covers the theory of stochastic processes applied to finance. It covers the concepts of martingales, Markov chain models, and Brownian motion, along with stochastic integration and Ito’s formula as a theoretical foundation of processes used in financial modeling. It also introduces basic discrete and continuous time models of asset price evolution in the context of the following problems in finance: portfolio optimization, option pricing, and spot rate interest modeling.

**STAT GR5291x and y Advanced data analysis**

*3 pts. Professors Alemayehu and Liu.*

Prerequisite: STAT GR5205. At least one of GR5261, GR5232, GR5233, GR5221, GR5222, GR5231 is recommended. This is a course on getting the most out of data. The emphasis will be on hands-on experience, involving case studies with real data and using common statistical packages. The course covers, at a very high level, exploratory data analysis, model formulation, goodness of fit testing, and other standard and nonstandard statistical procedures, including linear regression, analysis of variance, nonlinear regression, generalized linear models, survival analysis, time series analysis, and modern regression methods. Students will be expected to propose a data set of their choice for use as case study material.

**STAT GR5703x Statistical inference and modeling**

*3 pts. Professor Hannah.*

Prerequisites: Working knowledge of calculus and linear algebra (vectors and matrices), and STAT GR5203 or equivalent. This course systematically covers the fundamentals of statistical inference and testing, and gives an introduction to statistical modeling. The first half of the course focuses on inference and testing, covering topics such as maximum likelihood estimates, hypothesis testing, likelihood ratio tests, and Bayesian inference. The second half of the course provides an introduction to statistical modeling via introductory lectures on linear regression models, generalized linear regression models, nonparametric regression, and statistical computing. Real-data examples will be used in lecture discussion and homework problems. This course lays the foundation for other courses in machine learning, data mining, and visualization.

*2016-2017 Academic Year: the system of course numbering and designated level is in transition; please consult an adviser.*