Fun With PowerShell: Let's Get Started (Digging Deeper into "The Pipeline")
In the first post in the Fun with PowerShell series, we wrote a little script that searched the Open Movie Database for movies containing the word "Avengers".
We learned about Invoke-RestMethod, the syntax for invoking commands, the concept of "pipelines", and the fact that anything we type in a script that isn't assigned to a variable or passed into a pipeline is printed out. We also learned about redirection and the special $null variable, which allows us to redirect output into nothingness.
If you're just interested in learning enough PowerShell to be useful, feel free to move on to the second post in the series (Fun With PowerShell: Deduplicating Records). But if you're curious to dig deeper, let's unpack a few of the concepts that we learned in more detail.
Invocation Syntax
As in other shells, you invoke a command in PowerShell by typing its name.
Get-Process
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
0 0.00 88.55 2.94 14 1 node
0 0.00 169.46 8.84 132 128 pwsh
0 0.00 3.26 0.00 131 128 runuser
0 0.00 0.75 0.00 1 1 sh
0 0.00 1.55 0.00 5 1 startNode.sh
0 0.00 3.14 0.00 128 128 startPwsh.sh
We invoked the command Get-Process, which is similar to Unix's ps command, and got back a table of data.
Parameters in PowerShell are similar to those in other shells, but are much more structured. Simply defining a PowerShell command is enough to create a reasonable help description: every command has an automatic -? parameter that prints out help for that command.
Get-Process -?
NAME
Get-Process
SYNOPSIS
Gets the processes that are running on the local computer or a remote computer.
SYNTAX
Get-Process [[-Name] <String[]>] [-ComputerName <String[]>] [-FileVersionInfo] [-Module] [<CommonParameters>]
Get-Process [-ComputerName <String[]>] [-FileVersionInfo] -Id <Int32[]> [-Module] [<CommonParameters>]
Get-Process [-ComputerName <String[]>] [-FileVersionInfo] -InputObject <Process[]> [-Module] [<CommonParameters>]
Get-Process -Id <Int32[]> -IncludeUserName [<CommonParameters>]
Get-Process [[-Name] <String[]>] -IncludeUserName [<CommonParameters>]
Get-Process -IncludeUserName -InputObject <Process[]> [<CommonParameters>]
DESCRIPTION
The Get-Process cmdlet gets the processes on a local or remote computer.
Without parameters, this cmdlet gets all of the processes on the local computer. You can also specify a particular process by process name or process ID (PID) or pass a process object through the pipeline
to this cmdlet.
By default, this cmdlet returns a process object that has detailed information about the process and supports methods that let you start and stop the process. You can also use the parameters of the
Get-Process cmdlet to get file version information for the program that runs in the process and to get the modules that the process loaded.
RELATED LINKS
Online Version: http://go.microsoft.com/fwlink/?linkid=821590
Debug-Process
Get-Process
Start-Process
Stop-Process
Wait-Process
REMARKS
To see the examples, type: "get-help Get-Process -examples".
For more information, type: "get-help Get-Process -detailed".
For technical information, type: "get-help Get-Process -full".
For online help, type: "get-help Get-Process -online"
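To see how little it takes to get that kind of help output, here is a minimal sketch using a hypothetical function, Get-Greeting, which is not part of the original script. Its name and its -Name parameter are assumptions made up for illustration; the only built-in pieces are the param block and the automatic -? parameter.

```powershell
# Hypothetical function for illustration; the name and parameter are made up.
# The typed parameter declaration is what PowerShell uses to build the syntax help.
function Get-Greeting {
    param(
        [string]$Name = 'world'
    )
    "Hello, $Name!"
}

# Call it like any other command.
Get-Greeting -Name 'PowerShell'

# Ask for the automatically generated help (name and syntax).
Get-Greeting -?
```

No help text was written by hand here; whatever -? prints is derived from the function definition itself.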
Let's refine our call to Get-Process by restricting it to processes named node:
Get-Process -Name node
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
0 0.00 103.52 4.45 14 1 node
So far, this looks like a (verbose, more on that later) equivalent to Bash. What's special about it?
Even though the output looks like a specially formatted table created by the Get-Process command, what the command actually produces is a set of objects, not text.
PowerShell Commands Return Objects
Let's take a closer look at what we got from Get-Process -Name node.
$processes = Get-Process -Name node
$processes.psobject
BaseObject : System.Diagnostics.Process (node)
Members : {PSConfiguration {Name, Id, PriorityClass, FileVersion}, PSResources {Name, Id, Handlecount, WorkingSet, NonPagedMemorySize, PagedMemorySize,
PrivateMemorySize, VirtualMemorySize, Threads.Count, TotalProcessorTime}, Name = ProcessName, SI = SessionId…}
Properties : {Name = ProcessName, SI = SessionId, Handles = Handlecount, VM = VirtualMemorySize64…}
Methods : {get_SafeHandle, get_Handle, get_BasePriority, get_ExitCode…}
ImmediateBaseObject : System.Diagnostics.Process (node)
TypeNames : {System.Diagnostics.Process, System.ComponentModel.Component, System.MarshalByRefObject, System.Object}
An object in PowerShell, as in most languages, has a bunch of properties and methods. Let's dive right into the guts and see what we have.
In short, we're looking at a Process object. The rest of the output shows us other details of this object, such as which properties it has, which methods it has, and its class hierarchy.
$processes.psobject.TypeNames
System.Diagnostics.Process
System.ComponentModel.Component
System.MarshalByRefObject
System.Object
Looking at the guts like this is cool, but most of the time you'll look at objects using higher-level tools, like Get-Member.
$processes | Get-Member
TypeName: System.Diagnostics.Process
Name MemberType Definition
---- ---------- ----------
Handles AliasProperty Handles = Handlecount
Name AliasProperty Name = ProcessName
NPM AliasProperty NPM = NonpagedSystemMemorySize64
PM AliasProperty PM = PagedMemorySize64
SI AliasProperty SI = SessionId
VM AliasProperty VM = VirtualMemorySize64
WS AliasProperty WS = WorkingSet64
Disposed Event System.EventHandler Disposed(System.Object, System.EventArgs)
ErrorDataReceived Event System.Diagnostics.DataReceivedEventHandler ErrorDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)
Exited Event System.EventHandler Exited(System.Object, System.EventArgs)
OutputDataReceived Event System.Diagnostics.DataReceivedEventHandler OutputDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)
BeginErrorReadLine Method void BeginErrorReadLine()
BeginOutputReadLine Method void BeginOutputReadLine()
...
You can narrow down what you're looking at by passing parameters to Get-Member:
$processes | Get-Member -MemberType Property
TypeName: System.Diagnostics.Process
Name MemberType Definition
---- ---------- ----------
BasePriority Property int BasePriority {get;}
Container Property System.ComponentModel.IContainer Container {get;}
EnableRaisingEvents Property bool EnableRaisingEvents {get;set;}
ExitCode Property int ExitCode {get;}
ExitTime Property datetime ExitTime {get;}
Handle Property System.IntPtr Handle {get;}
HandleCount Property int HandleCount {get;}
HasExited Property bool HasExited {get;}
...
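Once Get-Member has told you what is available, you can read properties and call methods with dot syntax. The sketch below uses the automatic $PID variable (the ID of the current PowerShell process) so that it runs regardless of what else is on the machine; that choice is an assumption for illustration, not something from the original script.

```powershell
# $PID is an automatic variable holding the current PowerShell process ID.
$me = Get-Process -Id $PID

# Properties are read with dot syntax.
$me.Id
$me.ProcessName

# Name is an alias property for ProcessName (see the Get-Member output).
$me.Name -eq $me.ProcessName

# Methods are called with parentheses.
$me.GetType().FullName
```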
Dealing With Collections
In the previous section, we used -Name to filter our call to Get-Process by process name. That's a nice convenience, but we can also filter the results of Get-Process using PowerShell's general-purpose facilities for working with collections.
Get-Process | where Name -eq node
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
0 0.00 103.87 5.99 14 1 node
Now we're getting somewhere. The Get-Process command passed a list of processes through the pipeline, and we used the general-purpose querying command where (an alias for Where-Object) to filter out the processes whose name is not equal to "node".
The syntax Name -eq node is PowerShell's syntax for comparing two things. Comparing numbers would be 10 -gt 20, for example. PowerShell chose this syntax rather than the more familiar 10 > 20 syntax because > is traditionally the redirection operator in shells.
This illustrates a common theme in PowerShell: the designers of PowerShell had to balance the syntax traditions of scripting languages like Ruby, Python and Perl with the syntax traditions of shells like Bash. When the two traditions are in strong conflict, PowerShell's designers typically chose the solution that most closely matched the traditions of interactive shells (in Bash, [ $num -gt $other ] is the syntax for comparisons).
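Here is a small sketch of these operators on made-up data, so the results do not depend on which processes happen to be running. The $items objects and their Name and CPU values are hypothetical. It also shows the script-block form of where, which should behave the same as the shorter Name -eq node form.

```powershell
# Comparison operators return booleans.
10 -gt 20          # False
'node' -eq 'node'  # True

# Hypothetical stand-ins for process objects.
$items = @(
    [pscustomobject]@{ Name = 'node'; CPU = 6.3 }
    [pscustomobject]@{ Name = 'pwsh'; CPU = 18.2 }
    [pscustomobject]@{ Name = 'sh';   CPU = 0.0 }
)

# Short form: property name, operator, value.
$short = $items | where Name -eq node

# Equivalent script-block form; $_ is the current pipeline object.
$long = $items | Where-Object { $_.Name -eq 'node' }

$short.Name
$long.Name
```

The short form is convenient for a single comparison; the script-block form is the one to reach for when the condition involves more than one property.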
The Pipeline: All Streams, All the Time
TL;DR The pipeline handles objects in a streaming manner, which means that zero, one or more objects all count as a "stream of objects". When a pipeline's objects are assigned to a variable, they turn into $null, a single object or an array of objects, depending on how many objects were emitted by the pipeline. To work with data uniformly, the @(...) operator takes the result of a pipeline and produces an array, no matter how many objects were emitted.
If you've been playing along, there's something curious about what we've seen so far.
(Get-Process).GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
(Get-Process | where Name -eq node).GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True False Process System.ComponentModel.Component
Get-Process | where CPU -gt 5
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
0 0.00 104.91 6.32 14 1 node
0 0.00 207.68 18.17 132 128 pwsh
(Get-Process | where CPU -gt 5).GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
When a command puts a single object into the pipeline, the result of that command is an instance of the object. But when a command puts multiple objects into the pipeline, the result is an array of instances.
But it's even a little more surprising than that:
Get-Process | where Name -eq node | where CPU -gt 5
NPM(K) PM(M) WS(M) CPU(s) Id SI ProcessName
------ ----- ----- ------ -- -- -----------
0 0.00 80.11 6.63 14 1 node
Even though the first where only lets a single object through, its output can still be piped into another where just fine.
In order to understand this, you need to understand that each command in a pipeline receives objects one at a time, and may emit zero or more objects further down the pipeline. In other words, when you pipe something into where, the where command doesn't treat a single element any differently than zero or two elements.
This is important to allow pipelines to work in a streaming manner. When a command in a pipeline sees an object, it doesn't know yet whether it will see another object, so PowerShell commands process objects as soon as they are received. In practice, this means that PowerShell pipelines treat zero, one or more objects as streaming collections.
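One way to watch this streaming behavior is to log when each stage sees each object. The two functions below, Step-One and Step-Two, are hypothetical helpers written for this illustration; their process blocks run once per incoming object.

```powershell
# A shared log so we can see the order in which stages run.
$log = [System.Collections.Generic.List[string]]::new()

# Hypothetical helpers: each records what it saw, then passes the object along.
function Step-One { process { $log.Add("one saw $_"); $_ } }
function Step-Two { process { $log.Add("two saw $_"); $_ } }

$result = 1..2 | Step-One | Step-Two

# The log is interleaved: each object travels through both stages
# before the next object enters the pipeline.
$log
# one saw 1
# two saw 1
# one saw 2
# two saw 2
```

If the first stage had to finish before the second one started, the two "one saw" lines would come first.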
However, when we want to stick the result of a pipeline into a variable, PowerShell produces an array if the last stage of the pipeline emitted more than one object, produces the object itself if the pipeline emitted exactly one object, and produces $null if the pipeline emitted zero objects.
Because it can be convenient to handle all of these cases uniformly, PowerShell provides a special operator, the array subexpression operator @(...), that turns the result of a pipeline into an array regardless of how many items were emitted.
@(Get-Process | where Name -eq node).GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
@(Get-Process | where Name -eq nonexistent).GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
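The same three cases can be reproduced without depending on which processes are running. This is a minimal sketch that filters the numbers 1 through 5 instead of processes; the variable names are made up for illustration.

```powershell
# Assigning a pipeline's output gives different shapes depending on the count.
$none = 1..5 | where { $_ -gt 10 }   # nothing emitted
$one  = 1..5 | where { $_ -gt 4 }    # one object emitted
$many = 1..5 | where { $_ -gt 3 }    # two objects emitted

$null -eq $none        # True
$one.GetType().Name    # Int32
$many.GetType().Name   # Object[]

# Wrapping the pipeline in @(...) always yields an array.
@(1..5 | where { $_ -gt 10 }).Count  # 0
@(1..5 | where { $_ -gt 4 }).Count   # 1
@(1..5 | where { $_ -gt 3 }).Count   # 2
```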