Fun With PowerShell: Let's Get Started (Digging Deeper into "The Pipeline")

In this article, we dig deeper into the concepts we learned in the first post in the Fun With PowerShell series, including the pipeline.

In the first post in the Fun With PowerShell series, we wrote a little script that searched the Open Movie Database for movies containing the word "Avengers".

We learned about Invoke-RestMethod, the syntax for invoking commands, the concept of "pipelines", and the fact that anything we type in a script that isn't assigned to a variable or passed into a pipeline is printed out. We also learned about redirection and the special $null variable, which allows us to redirect output into nothingness.

If you're just interested in learning enough PowerShell to be useful, feel free to move on to the second post in the series (Fun With PowerShell: Deduplicating Records). But if you're curious to dig deeper, let's unpack a few of the concepts that we learned in more detail.

Invocation Syntax

Like in other shells, you invoke a command in PowerShell by typing its name.

Get-Process
 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
      0     0.00      88.55       2.94      14   1 node
      0     0.00     169.46       8.84     132 128 pwsh
      0     0.00       3.26       0.00     131 128 runuser
      0     0.00       0.75       0.00       1   1 sh
      0     0.00       1.55       0.00       5   1 startNode.sh
      0     0.00       3.14       0.00     128 128 startPwsh.sh

We invoked the command Get-Process, which is similar to Unix's ps command, and got back a table of data.

Parameters in PowerShell are similar to those in other shells, but are much more structured. In fact, simply defining a PowerShell command is enough to produce a reasonable help description: every command has an automatic -? parameter that prints help for that command.

Get-Process -?

NAME
    Get-Process
    
SYNOPSIS
    Gets the processes that are running on the local computer or a remote computer.
    
    
SYNTAX
    Get-Process [[-Name] <String[]>] [-ComputerName <String[]>] [-FileVersionInfo] [-Module] [<CommonParameters>]
    
    Get-Process [-ComputerName <String[]>] [-FileVersionInfo] -Id <Int32[]> [-Module] [<CommonParameters>]
    
    Get-Process [-ComputerName <String[]>] [-FileVersionInfo] -InputObject <Process[]> [-Module] [<CommonParameters>]
    
    Get-Process -Id <Int32[]> -IncludeUserName [<CommonParameters>]
    
    Get-Process [[-Name] <String[]>] -IncludeUserName [<CommonParameters>]
    
    Get-Process -IncludeUserName -InputObject <Process[]> [<CommonParameters>]
    
    
DESCRIPTION
    The Get-Process cmdlet gets the processes on a local or remote computer.
    
    Without parameters, this cmdlet gets all of the processes on the local computer. You can also specify a particular process by process name or process ID (PID) or pass a process object through the pipeline 
    to this cmdlet.
    
    By default, this cmdlet returns a process object that has detailed information about the process and supports methods that let you start and stop the process. You can also use the parameters of the 
    Get-Process cmdlet to get file version information for the program that runs in the process and to get the modules that the process loaded.
    

RELATED LINKS
    Online Version: http://go.microsoft.com/fwlink/?linkid=821590
    Debug-Process 
    Get-Process 
    Start-Process 
    Stop-Process 
    Wait-Process 

REMARKS
    To see the examples, type: "get-help Get-Process -examples".
    For more information, type: "get-help Get-Process -detailed".
    For technical information, type: "get-help Get-Process -full".
    For online help, type: "get-help Get-Process -online"
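
To illustrate the claim that defining a command is enough to get help, here is a minimal sketch. Get-Greeting and its parameters are hypothetical names invented for this example; the point is only that Get-Help can build a syntax listing from the param block, even though no help text was written.

```powershell
# Get-Greeting is a hypothetical command invented for this example.
function Get-Greeting {
    param(
        # Required: who to greet.
        [Parameter(Mandatory)]
        [string] $Name,

        # Optional: how many greetings to emit (defaults to 1).
        [int] $Times = 1
    )

    # Each string that isn't captured is emitted into the pipeline.
    1..$Times | ForEach-Object { "Hello, $Name!" }
}

# No dedicated help text was written, yet Get-Help can still describe the syntax.
Get-Help Get-Greeting

# Parameters are bound by name, just like with built-in commands.
Get-Greeting -Name World -Times 2
```

The same structured parameter declarations that drive the help output are what let PowerShell validate types and report a missing mandatory parameter.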

Let's refine our call to Get-Process by restricting it to processes named node:

Get-Process -Name node
 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
      0     0.00     103.52       4.45      14   1 node

So far, this looks like a (verbose, more on that later) equivalent of Bash. What's special about it?

Even though the output looks like a specially formatted table created by the Get-Process command, the command in fact returns objects, and the table is just how PowerShell displays them.

PowerShell Commands Return Objects

Let's take a closer look at what we got from Get-Process -Name node.

$processes = Get-Process -Name node

$processes.psobject
BaseObject          : System.Diagnostics.Process (node)
Members             : {PSConfiguration {Name, Id, PriorityClass, FileVersion}, PSResources {Name, Id, Handlecount, WorkingSet, NonPagedMemorySize, PagedMemorySize,
                      PrivateMemorySize, VirtualMemorySize, Threads.Count, TotalProcessorTime}, Name = ProcessName, SI = SessionId…}
Properties          : {Name = ProcessName, SI = SessionId, Handles = Handlecount, VM = VirtualMemorySize64…}
Methods             : {get_SafeHandle, get_Handle, get_BasePriority, get_ExitCode…}
ImmediateBaseObject : System.Diagnostics.Process (node)
TypeNames           : {System.Diagnostics.Process, System.ComponentModel.Component, System.MarshalByRefObject, System.Object}

An object in PowerShell, like in most languages, has a bunch of properties and methods. Let's dive right into the guts and see what we've got.

In short, we're looking at a Process object. The rest of the output shows us other details of this object, like which properties it has, which methods it has, as well as the class hierarchy.

$processes.psobject.TypeNames
System.Diagnostics.Process
System.ComponentModel.Component
System.MarshalByRefObject
System.Object

Looking at the guts like this is cool, but most of the time you'll look at objects using higher-level tools, like Get-Member.

$processes | Get-Member
   TypeName: System.Diagnostics.Process

Name                       MemberType     Definition                                                                                                                                                   
----                       ----------     ----------                                                                                                                                                   
Handles                    AliasProperty  Handles = Handlecount                                                                                                                                        
Name                       AliasProperty  Name = ProcessName                                                                                                                                           
NPM                        AliasProperty  NPM = NonpagedSystemMemorySize64                                                                                                                             
PM                         AliasProperty  PM = PagedMemorySize64                                                                                                                                       
SI                         AliasProperty  SI = SessionId                                                                                                                                               
VM                         AliasProperty  VM = VirtualMemorySize64                                                                                                                                     
WS                         AliasProperty  WS = WorkingSet64                                                                                                                                            
Disposed                   Event          System.EventHandler Disposed(System.Object, System.EventArgs)                                                                                                
ErrorDataReceived          Event          System.Diagnostics.DataReceivedEventHandler ErrorDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)                                       
Exited                     Event          System.EventHandler Exited(System.Object, System.EventArgs)                                                                                                  
OutputDataReceived         Event          System.Diagnostics.DataReceivedEventHandler OutputDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)                                      
BeginErrorReadLine         Method         void BeginErrorReadLine()                                                                                                                                    
BeginOutputReadLine        Method         void BeginOutputReadLine()                                                                                                                                   
...

You can narrow down what you're looking at by passing parameters to Get-Member:

$processes | Get-Member -MemberType Property


   TypeName: System.Diagnostics.Process

Name                       MemberType Definition                                                             
----                       ---------- ----------                                                             
BasePriority               Property   int BasePriority {get;}                                                
Container                  Property   System.ComponentModel.IContainer Container {get;}                      
EnableRaisingEvents        Property   bool EnableRaisingEvents {get;set;}                                    
ExitCode                   Property   int ExitCode {get;}                                                    
ExitTime                   Property   datetime ExitTime {get;}                                               
Handle                     Property   System.IntPtr Handle {get;}                                            
HandleCount                Property   int HandleCount {get;}                                                 
HasExited                  Property   bool HasExited {get;}                           
...
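
Once Get-Member has told you which members exist, you can use them directly. The sketch below inspects the current PowerShell process through the automatic $PID variable, so it doesn't assume that a node process is running.

```powershell
# $PID is an automatic variable holding the current PowerShell process ID.
$current = Get-Process -Id $PID

# Properties are read with dot notation.
$current.Id
$current.ProcessName
$current.HasExited

# Methods are called with parentheses.
$current.GetType().FullName

# Select-Object keeps only the properties you ask for.
$current | Select-Object -Property Id, ProcessName
```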

Dealing With Collections

In the previous section, we used -Name to filter our call to Get-Process by process name. That's a nice convenience, but we can also filter the results of Get-Process using general-purpose collection facilities.

Get-Process | where Name -eq node
 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
      0     0.00     103.87       5.99      14   1 node

Now we're getting somewhere. The Get-Process command passed a list of processes through the pipeline, and we used the general-purpose querying command where (an alias for Where-Object) to filter out the processes whose name is not equal to "node".
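
The short form used above can also be written with a script block, in which $_ stands for the object currently passing through the pipeline. The sketch below runs both forms against a few made-up objects ($sample is an assumption for illustration, not real process data) so that it works even when no node process exists.

```powershell
# Made-up stand-ins for process objects; the values are illustrative only.
$sample = @(
    [pscustomobject]@{ Name = 'node'; CPU = 6.3 }
    [pscustomobject]@{ Name = 'pwsh'; CPU = 18.2 }
    [pscustomobject]@{ Name = 'sh';   CPU = 0.0 }
)

# Short form: property name, operator, value.
$short = $sample | where Name -eq node

# Long form: a script block evaluated once per object, with $_ as that object.
$long = $sample | Where-Object { $_.Name -eq 'node' }

$short.Name
$long.Name
```

The script block form is more verbose, but it allows arbitrary conditions, such as combining several comparisons with -and.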

The syntax Name -eq node is PowerShell's syntax for comparing two things. Comparing numbers would be 10 -gt 20, for example. PowerShell chose this syntax rather than the more familiar 10 > 20 syntax because > is traditionally the redirection operator in shells.

This illustrates a common theme in PowerShell: its designers had to balance the syntax traditions of scripting languages like Ruby, Python and Perl with the syntax traditions of shells like Bash. When the two traditions are in strong conflict, PowerShell's designers typically chose the solution that most closely matched the traditions of interactive shells (in Bash, [ $num -gt $other ] is the syntax for numeric comparisons).
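
Here are a few comparisons as a quick sketch of how these operators behave. One detail worth knowing is that string comparison is case-insensitive by default, and the c-prefixed variants (such as -ceq) are case-sensitive.

```powershell
# Numeric comparisons use -gt, -lt, -ge, -le, -eq and -ne.
$greater = 10 -gt 20          # False
$less    = 10 -lt 20          # True

# String equality ignores case unless you ask for the case-sensitive variant.
$sameIgnoringCase = 'node' -eq 'NODE'   # True
$sameExactCase    = 'node' -ceq 'NODE'  # False

$greater, $less, $sameIgnoringCase, $sameExactCase
```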

The Pipeline: All Streams, All the Time

TL;DR The pipeline handles objects in a streaming manner, which means that zero, one or more objects all count as a "stream of objects". When a pipeline's objects are assigned to a variable, they turn into $null, a single object or an array of objects, depending on how many objects were emitted by the pipeline. To work with data uniformly, the @(...) operator takes the result of a pipeline and produces an array, no matter how many objects were emitted.


If you've been playing along, there's something curious about what we've seen so far.

(Get-Process).GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

(Get-Process | where Name -eq node).GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    Process                                  System.ComponentModel.Component

Get-Process | where CPU -gt 5
 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
      0     0.00     104.91       6.32      14   1 node
      0     0.00     207.68      18.17     132 128 pwsh

(Get-Process | where CPU -gt 5).GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

When a command puts a single object into the pipeline, the result of that command is an instance of the object. But when a command puts multiple objects into the pipeline, the result is an array of instances.

But it's even a little more surprising than that:

Get-Process | where Name -eq node | where CPU -gt 5
 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
      0     0.00      80.11       6.63      14   1 node

Even though the first step through the pipeline only produces a single object, it can still be piped into another where just fine.

In order to understand this, you need to understand that each command in the pipeline receives objects one at a time, and possibly emits more objects into the pipeline. In other words, when you pipe something into where, the where command doesn't treat a single element any differently than zero or two elements.

This is important to allow pipelines to work in a streaming manner. When a command sees an object, it doesn't know yet whether it will see another one, so PowerShell commands process objects as soon as they are received. In practice, this means that PowerShell pipelines treat zero, one or more objects as streaming collections.
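
One way to see the streaming behavior is to record the order in which two pipeline stages run. This is a minimal sketch; the $log list and the "produce"/"consume" labels are names made up for the illustration.

```powershell
# A list that both stages append to, so we can inspect the order afterwards.
$log = [System.Collections.Generic.List[string]]::new()

$results = 1..3 |
    ForEach-Object { $log.Add("produce $_"); $_ } |       # stage 1: pass each number on
    ForEach-Object { $log.Add("consume $_"); $_ * 10 }    # stage 2: runs as soon as a number arrives

# The entries alternate: produce 1, consume 1, produce 2, consume 2, produce 3, consume 3.
$log

# $results holds 10, 20, 30.
$results
```

If the first stage had to finish before the second one started, all of the "produce" entries would come before any "consume" entry.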

However, when we stick the result of a pipeline into a variable, PowerShell produces an array if the last stage of the pipeline emitted more than one object, the object itself if it emitted exactly one object, and $null if it emitted zero objects.
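
The three cases can be sketched with plain numbers, which avoids depending on which processes happen to be running. The variable names are chosen for this example only.

```powershell
# Zero matches: the variable compares equal to $null.
$none = 1..5 | Where-Object { $_ -gt 10 }

# Exactly one match: the variable holds the object itself.
$one = 1..5 | Where-Object { $_ -eq 3 }

# Several matches: the variable holds an array.
$many = 1..5 | Where-Object { $_ -gt 3 }

$null -eq $none        # True
$one.GetType().Name    # Int32
$many.GetType().Name   # Object[]
```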

Because it can be convenient to handle all of these cases uniformly, PowerShell provides a special operator, the array subexpression operator @(...), that turns the result of a pipeline into an array regardless of how many items were emitted.

@(Get-Process | where Name -eq node).GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

@(Get-Process | where Name -eq nonexistent).GetType()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array
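
A small sketch of why this matters in practice: indexing into a bare single result can do something unexpected, while the wrapped result always behaves like an array. The $names list is made up for the illustration.

```powershell
$names = 'node', 'pwsh', 'sh'

# Without @(...), a single match is just a string.
$bare = $names | Where-Object { $_ -eq 'node' }

# With @(...), a single match is a one-element array.
$wrapped = @($names | Where-Object { $_ -eq 'node' })

$bare[0]       # n    (indexes into the string itself)
$wrapped[0]    # node (first element of the array)

# Zero matches still produce an (empty) array.
$empty = @($names | Where-Object { $_ -eq 'nonexistent' })
$empty.Count   # 0
```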