Reference
Last updated on 2024-10-18
Running and Quitting
- Python files have the `.py` extension.
- Can be written in a text file or a Jupyter Notebook.
  - Jupyter notebooks have the extension `.ipynb`.
  - Jupyter notebooks can be opened from Anaconda or through the command line by entering `$ jupyter notebook`.
  - Markdown and HTML are allowed in markdown cells for documenting code.
Variables and Assignment
- Variables are stored using `=`.
  - Strings are defined in quotations `'...'`.
  - Integers and floating point numbers are defined without quotations.
- Variables can contain letters, digits, and underscores `_`.
  - Cannot start with a digit.
  - Variables that start with underscores should be avoided.
- Use `print(...)` to display values as text.
- Can use indexing on strings.
  - Indexing starts at 0.
  - Position is given in square brackets `[position]` following the variable name.
  - Take a slice using `[start:stop]`. This makes a copy of part of the original string.
    - `start` is the index of the first element.
    - `stop` is the index of the element after the last desired element.
- Use `len(...)` to find the length of a variable or string.
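The indexing, slicing, and length rules above can be sketched as follows. The variable name and its value are hypothetical, chosen only for illustration.

```python
# Hypothetical example value; any string behaves the same way.
element = 'helium'

first = element[0]       # indexing starts at 0, so this is the first character
middle = element[1:4]    # characters at positions 1, 2, and 3 (4 is excluded)
length = len(element)    # number of characters in the string

print(first, middle, length)
```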
Data Types and Type Conversion
- Each value has a type. This controls what can be done with it.
  - `int` represents an integer.
  - `float` represents a floating point number.
  - `str` represents a string.
- To determine a variable's type, use the built-in function `type(...)`, including the variable name in the parentheses.
- Modifying strings:
  - Use `+` to concatenate strings.
  - Use `*` to repeat a string.
  - Numbers and strings cannot be added to one another.
    - Convert string to integer: `int(...)`.
    - Convert integer to string: `str(...)`.
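A minimal sketch of checking types and converting between them; the variable names and values are made up for illustration.

```python
count = 3            # an int
label = 'sample'     # a str

print(type(count))   # <class 'int'>
print(type(label))   # <class 'str'>

# Convert the integer to a string before concatenating.
joined = label + '_' + str(count)

# Repeat a string with *.
repeated = '=' * 5

# Convert the string to an integer before adding.
total = int('10') + count
```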
Built-in Functions and Help
- To add a comment, place `#` before the thing you do not wish to be executed.
- Commonly used built-in functions:
  - `min()` finds the smallest value.
  - `max()` finds the largest value.
  - `round()` rounds off a floating point number.
  - `help()` displays documentation for the function in the parentheses.
    - Other ways to get help include holding down `shift` and pressing `tab` in Jupyter Notebooks.
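As a rough illustration of the built-in functions listed above, with hypothetical values:

```python
# Hypothetical measurements used only for illustration.
values = [3.14159, 2.71828, 1.41421]

smallest = min(values)        # smallest value in the list
largest = max(values)         # largest value in the list
rounded = round(largest, 2)   # keep two decimal places

print(smallest, largest, rounded)

# help(round) would print the documentation for round().
```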
Libraries
- Importing a library:
  - Use `import ...` to load a library.
  - Refer to this library by using `module_name.thing_name`.
    - `.` indicates 'part of'.
- To import a specific item from a library: `from ... import ...`
- To import a library using an alias: `import ... as ...`
- Importing the math library: `import math`
  - Example of referring to an item with the module's name: `math.cos(math.pi)`.
- Importing the plotting library as an alias: `import matplotlib as mpl`
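The three import forms can be compared side by side using the standard `math` library. The alias `m` is an arbitrary choice for this sketch.

```python
import math               # load the whole library
from math import sqrt     # import one specific item
import math as m          # import under an alias (the name m is arbitrary)

a = math.cos(math.pi)     # refer to items with the module's name
b = sqrt(16)              # imported items are used without a prefix
c = m.floor(2.7)          # the alias stands in for the module name

print(a, b, c)
```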
Reading Tabular Data into DataFrames
- Use the pandas library to do statistics on tabular data. Load with `import pandas as pd`.
  - To read in a csv: `pd.read_csv()`, including the path name in the parentheses.
    - To specify a column's values should be used as row headings: `pd.read_csv('path', index_col='column name')`, where path and column name should be replaced with the relevant values.
- To get more information about a DataFrame, use `DataFrame.info()`, replacing `DataFrame` with the variable name of your DataFrame.
- Use `DataFrame.columns` to view the column names.
- Use `DataFrame.T` to transpose a DataFrame.
- Use `DataFrame.describe()` to get summary statistics about your data.
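A small sketch of reading and inspecting tabular data. To stay self-contained it reads from an in-memory string rather than a file on disk, and the column names and numbers are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """country,gdp_1952,gdp_2007
Australia,10039.6,34435.4
New Zealand,10556.6,25185.0
"""

# index_col makes the 'country' column the row headings.
data = pd.read_csv(io.StringIO(csv_text), index_col='country')

data.info()               # column types and non-null counts
print(data.columns)       # column names
print(data.T)             # transposed table
print(data.describe())    # summary statistics
```

With a real file, the first argument to `pd.read_csv` would be the path to that file.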
Pandas DataFrames
- Select data using `[i,j]`
  - To select by entry position: `DataFrame.iloc[..., ...]`
    - This is inclusive of everything except the final index.
  - To select by entry label: `DataFrame.loc[..., ...]`
    - Can select multiple rows or columns by listing labels.
    - This is inclusive to both ends.
  - Use `:` to select all rows or columns.
- Can also select data based on values using `True` and `False`. This is a Boolean mask. `mask = subset > 10000`
  - We can then use this to select values.
- To use a select-apply-combine operation we use `data.apply(lambda x: x > x.mean())` where `mean()` can be any operation the user would like to be applied to x.
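The selection rules above might look like this on a small table. The DataFrame contents are hypothetical and built inline so the sketch runs on its own.

```python
import pandas as pd

# Hypothetical data used only for illustration.
data = pd.DataFrame(
    {'gdp_1952': [1601.1, 8343.1, 7029.8],
     'gdp_2007': [5937.0, 33692.6, 35278.4]},
    index=['Albania', 'Belgium', 'Germany'],
)

# Position-based: rows 0 and 1 (row 2 is excluded), first column.
by_position = data.iloc[0:2, 0]

# Label-based: both 'Albania' and 'Belgium' are included.
by_label = data.loc['Albania':'Belgium', 'gdp_2007']

# ':' selects all rows.
all_rows = data.loc[:, 'gdp_1952']

# Boolean mask: True where the value exceeds 10000.
mask = data > 10000
selected = data[mask]   # entries where the mask is False become NaN

# Compare each value with the mean of its own column.
above_mean = data.apply(lambda x: x > x.mean())
```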
Plotting
- The most widely used plotting library is `matplotlib`.
  - Usually imported using `import matplotlib.pyplot as plt`.
  - To plot we use the command `plt.plot(time, position)`.
  - To create a legend use `plt.legend(['label1', 'label2'], loc='upper left')`
    - Can also define labels within the plot statements by using `plt.plot(time, position, label='label')`. To make the legend show up, use `plt.legend()`
  - To label x and y axis `plt.xlabel('label')` and `plt.ylabel('label')` are used.
- Pandas DataFrames can be used to plot by using `DataFrame.plot()`. Any operations that can be used on a DataFrame can be applied while plotting.
  - To plot a bar plot `data.plot(kind='bar')`
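A minimal plotting sketch, assuming made-up `time` and `position` values. The `Agg` backend is selected only so the script also runs without a display; in a Jupyter Notebook that line can be dropped.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical data used only for illustration.
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position, label='position')   # label is picked up by the legend
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
plt.legend(loc='upper left')                 # makes the legend show up
```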
Lists
- Defined within `[...]` and separated by `,`.
  - An empty list can be created by using `[]`.
- Can use `len(...)` to determine how many values are in a list.
- Can index just as done in previous lessons.
  - Indexing can be used to reassign values `list_name[0] = newvalue`.
- To add an item to a list use `list_name.append()`, with the item to append in the parentheses.
- To combine two lists use `list_name_1.extend(list_name_2)`.
- To remove an item from a list use `del list_name[index]`.
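The list operations above, applied in sequence to a hypothetical list of values:

```python
# Hypothetical values used only for illustration.
pressures = [0.273, 0.275, 0.277]

pressures[0] = 0.265               # reassign the first value
pressures.append(0.279)            # add one item to the end
pressures.extend([0.281, 0.283])   # add every item from another list
del pressures[1]                   # remove the item at index 1

count = len(pressures)
print(pressures, count)
```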
For Loops
- Start a for loop with `for number in [1, 2, 3]:`, with the following lines indented.
  - `[1, 2, 3]` is considered the collection.
  - `number` is the loop variable.
  - The action following the collection is the body.
- To iterate over a sequence of numbers use `range(start, end)`. The sequence stops just before `end`.
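A short sketch of both loop forms; the variable names are arbitrary.

```python
# Loop over a collection: number takes each value in turn.
total = 0
for number in [1, 2, 3]:
    total = total + number   # the indented line is the body

# Loop over a range: produces 0, 1, 2, 3 (the end value 4 is excluded).
squares = []
for i in range(0, 4):
    squares.append(i * i)

print(total, squares)
```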
Conditionals
- Defined similarly to a loop, using `if variable conditional value:`.
  - For example, `if variable > 5:`.
- Use `elif` followed by a condition for additional tests.
- Use `else:` for when none of the preceding tests is true.
- Can combine more than one conditional by using `and` or `or`.
- Often used in combination with for loops.
- Conditions that can be used:
  - `==` equal to.
  - `>=` greater than or equal to.
  - `<=` less than or equal to.
  - `>` greater than.
  - `<` less than.
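Conditionals combined with a for loop might look like this. The values and thresholds are hypothetical.

```python
# Hypothetical values used only for illustration.
masses = [3.5, 9.2, 0.4]
labels = []

for mass in masses:
    if mass > 5 and mass <= 10:   # both tests must be true
        labels.append('large')
    elif mass >= 1:               # checked only if the first test failed
        labels.append('medium')
    else:                         # runs when no test above was true
        labels.append('small')

print(labels)
```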
Looping Over Data Sets
- Use a for loop: `for filename in [file1, file2]:`
- To find a set of files using a pattern use `glob.glob`
  - Must import first using `import glob`.
  - `*` indicates "match zero or more characters"
  - `?` indicates "match exactly one character"
    - For example: `glob.glob('*.txt')` will find all files that end with `.txt` in the current directory.
- Combine these by writing a loop using: `for filename in glob.glob('*.txt'):`
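A self-contained sketch of looping over files matched by a pattern. It creates a few throwaway files in a temporary directory first; the file names are invented for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create hypothetical files so there is something to match.
    for name in ['a.txt', 'b.txt', 'c.csv']:
        with open(os.path.join(folder, name), 'w') as handle:
            handle.write('example\n')

    # '*' matches zero or more characters, so this finds every .txt file.
    matches = sorted(glob.glob(os.path.join(folder, '*.txt')))

    names = []
    for filename in matches:
        names.append(os.path.basename(filename))

print(names)
```

`glob.glob` returns matches in arbitrary order, which is why the results are sorted here.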
Writing Functions
- Define a function using `def function_name(parameters):`. Replace `parameters` with the variables to use when the function is executed.
- Run by using `function_name(parameters)`.
- To return a result to the caller use `return ...` in the function.
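A minimal example of defining and calling a function. The function name and the conversion it performs are chosen only for illustration.

```python
def fahrenheit_to_celsius(temp_f):
    # temp_f is the parameter; return hands the result back to the caller.
    return (temp_f - 32) * 5 / 9

boiling = fahrenheit_to_celsius(212)
print(boiling)
```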
Variable Scope
- A local variable is defined in a function and can only be seen and used within that function.
- A global variable is defined outside of a function and can be seen or used anywhere after definition.
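The difference between the two kinds of variable can be sketched as follows, using hypothetical names.

```python
scale = 10   # global: defined outside any function

def scaled(value):
    result = value * scale   # result is local; scale is read from outside
    return result

answer = scaled(4)

# The local variable does not exist outside the function.
try:
    result
    visible = True
except NameError:
    visible = False

print(answer, visible)
```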
Programming Style
- Document your code.
- Use clear and meaningful variable names.
- Follow the PEP8 style guide when setting up your code.
- Use assertions to check for internal errors.
- Use docstrings to provide help.
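A brief sketch of a docstring and an assertion used together. The function is hypothetical.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # The assertion catches an internal error before it causes a confusing failure.
    assert len(values) > 0, 'values must not be empty'
    return sum(values) / len(values)

average = mean([2, 4, 6])
print(average)

# help(mean) would display the docstring above.
```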
Glossary
- Arguments
- Values passed to functions.
- Array
- A container holding elements of the same type.
- Boolean
- A value that is either `True` or `False`.
- DataFrame
- The way Pandas represents a table; a collection of series.
- Element
- An item in a list or an array. For a string, these are the individual characters.
- Function
- A block of code that can be called and re-used elsewhere.
- Global variable
- A variable defined outside of a function that can be used anywhere.
- Index
- The position of a given element.
- Jupyter Notebook
- Interactive coding environment allowing a combination of code and markdown.
- Library
- A collection of files containing functions used by other programs.
- Local Variable
- A variable defined inside of a function that can only be used inside of that function.
- Mask
- A boolean object used for selecting data from another object.
- Method
- An action tied to a particular object. Called by using `object.method`.
- Modules
- The files within a library containing functions used by other programs.
- Parameters
- Variables used when executing a function.
- Series
- A Pandas data structure to represent a column.
- Substring
- A part of a string.
- Variables
- Names for values.